Using the many to spot the few

At present, the OpenITI / KITAB corpus comprises 10,243 text files, representing 6,268 unique titles. Such a large and growing number of texts makes quality control challenging. At the same time, it is precisely this large number of texts that can be the basis...