A new version of the corpus used by the KITAB team is now available for download on Zenodo, an Open Science platform that supports Open Access. This is the second release developed by the OpenITI organization. It is also accessible on GitHub.
The current release features 7,119 books, including all versions and editions (1,464,011,669 words), of which 4,285 are unique works written by 1,833 authors. Among these, 446 books are in OpenITI mARkdown. Moreover, the project team has made corrections to the book metadata. Major corrections are noted in the release note, which also provides statistics on the corpus, as well as a list of current and past contributors to the corpus. The release note is available here.
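To put these figures in proportion, the short Python sketch below derives a few ratios from the numbers quoted above. The variable and key names are illustrative only and are not part of the OpenITI release or its metadata; the ratios are simple averages and say nothing about how texts are actually distributed across authors or works.

```python
# Figures as stated in this release announcement.
TOTAL_BOOKS = 7_119          # all versions and editions
TOTAL_WORDS = 1_464_011_669  # word count across all books
UNIQUE_WORKS = 4_285
AUTHORS = 1_833
MARKDOWN_BOOKS = 446         # books in OpenITI mARkdown

# Illustrative derived ratios (names are hypothetical, chosen for this sketch).
stats = {
    "words_per_book": TOTAL_WORDS / TOTAL_BOOKS,
    "versions_per_work": TOTAL_BOOKS / UNIQUE_WORKS,
    "works_per_author": UNIQUE_WORKS / AUTHORS,
    "markdown_share": MARKDOWN_BOOKS / TOTAL_BOOKS,
}

for name, value in stats.items():
    print(f"{name}: {value:,.3f}")
```

On these figures, a work has on average between one and two versions in the corpus, and roughly 6% of the books are in OpenITI mARkdown.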
The release metadata describes the texts in this version. Arabic fields for titles, authors, and tags are being added in the current version. In addition to the release metadata, the latest status of the corpus can be searched in KITAB’s metadata application by fields such as book title, author, and tag. The application is continuously updated and is available here.
Broadly, OpenITI aims to develop a machine-actionable corpus of premodern texts in Islamicate languages that can facilitate computational analysis of the Islamicate written tradition. The analytical datasets that the KITAB team generates from this corpus (including text reuse statistics) will also be published on Zenodo in the future.
To cite this version, please use the following format. A bibliographic export is also available on the publication page:
Lorenz Nigst, Maxim Romanov, Sarah Bowen Savant, Masoumeh Seydi, & Peter Verkinderen. (2020). OpenITI: a Machine-Readable Corpus of Islamicate Texts (Version 2020.1.2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3891466