Scholars working in Arabic can now download the entire corpus used by the KITAB team through Zenodo, an Open Science platform that supports Open Access.
The corpus is available here: https://doi.org/10.5281/zenodo.3082464
Number of authors: 1,859.
Number of titles: 4,288, totaling 755,689,541 words.
Number of versions (including multiple versions of the same titles): 7,144, totaling 1,520,667,360 words.
The texts sit within the OpenITI corpus (the KITAB project is the major contributor of Arabic texts to the OpenITI). As part of our commitment to Open Access, all major versions of the corpus, as well as analytical datasets generated from it with different methods, will be published on Zenodo in the future. For the release notes, including the structure of the data, see here.
The goal of the OpenITI is to build a machine-actionable corpus of premodern texts in Islamicate languages in order to encourage computational analysis of the Islamicate written tradition. Most of the Arabic texts have been collected from open-access online collections of premodern and modern Arabic texts, such as http://shamela.ws/ and http://shiaonlinelibrary.com/.
If you use the corpus, please cite it as follows:
Maxim Romanov and Masoumeh Seydi. 2019. “OpenITI: A Machine-readable Corpus of Islamicate Texts”. Zenodo. doi:10.5281/zenodo.3082464.