Message from the corpus team

The KITAB corpus is a constantly evolving collection of machine-readable, (mostly) premodern Arabic texts and metadata files maintained by KITAB. It is part of the larger OpenITI corpus.

As such, it is the very backbone of the research KITAB seeks to advance. It is with regard to this corpus that the humanities scholars and the computer scientists who make up KITAB’s team have joined forces to make possible new forms of computer-aided analysis of the premodern Arabic tradition.

The KITAB corpus is necessarily a work in progress, and since it contains the material with which KITAB works, it requires much attention on our part.

We constantly annotate and improve the quality of our texts and add new texts and metadata, and we put a good amount of conceptual work into making room for the specificities of the premodern Arabic textual tradition and acknowledging questions of survival bias.

Irrespective of their use by KITAB, all files contained in the KITAB corpus are freely available to everyone.

The KITAB corpus is not an online library or reading environment, although it shares some features with one insofar as individual works can be accessed online and downloaded.

Nonetheless, we are building our corpus to generate specific data, such as data regarding text reuse. From the perspective of KITAB, our corpus is meaningful because of its integration with such data sets, and it is from them that we build our applications.

The fundamental criterion that all material in our corpus must therefore meet is that it allows the successful application of our core digital tools; that is, it must be machine-readable for our purposes.

This requirement is decisive insofar as it bears upon our priorities and, collaterally, upon the issues of text quality and the availability of metadata.

KITAB is interested in vetted texts and in metadata that is as rich as possible. Our corpus evolves step by step and progresses, most notably, through subprojects that have their own particular needs and involve particular subsets of texts and metadata items.

But just as our diverse subprojects constantly drive us to improve our corpus according to our own needs, other users are encouraged to improve it according to theirs.

Please feel most welcome to reach out to us and share any ideas you might have with regard to our corpus or about a subproject you feel would be an important and suitable contribution to our corpus.

The KITAB corpus has benefited from many generous contributions from the field, and we hope that many more will follow.

Lorenz Nigst
Research Associate responsible for the KITAB corpus