Scholars working in Arabic can now download the entire corpus used by the KITAB team through Zenodo, an Open Science platform that supports open access.
The corpus is available here: https://doi.org/10.5281/zenodo.3082464
Number of authors: 1,859.
Number of titles: 4,288, totalling 755,689,541 words.
Number of versions (including multiple versions of the same titles): 7,144, totalling 1,520,667,360 words.
The texts sit within the OpenITI corpus (the KITAB project is the major contributor of Arabic texts to the OpenITI). As part of our commitment to open access, all future major versions of the corpus, as well as analytical data sets generated from the corpus with different methods, will be published on Zenodo. For the release notes, including the structure of the data, see here.
The goal of the OpenITI is to build a machine-actionable corpus of premodern texts in Islamicate languages in order to encourage computational analysis of the Islamicate written tradition. Most of the Arabic texts have been collected from open-access online collections of premodern and modern Arabic texts, such as http://shamela.ws/ and http://shiaonlinelibrary.com/.
If you use this version of the corpus, please cite it as follows:
Maxim Romanov and Masoumeh Seydi, OpenITI: A Machine-Readable Corpus of Islamicate Texts (Version 2019.1.1) [data set] (June 2019), Zenodo, doi:10.5281/zenodo.3082464.