The KITAB team has released a new version (2021.2.5) of the OpenITI corpus on Zenodo. The release is open access. It is our fifth release overall and our second in 2021. You can also access the release through our GitHub repository.
The release features 10,342 books, counting all versions and editions (2,050,063,811 words); of these, 6,337 are unique books, written by 2,636 authors. The release adds 181 new ids. These reflect either new books (including OCRed books), each of which is assigned a new unique id, or changes to existing ids and their corresponding URIs, which also result in new ids. In terms of structural annotation, 120 books have a new status; 82 of them are fully available in structural OpenITI mARkdown, with a .mARkdown extension. The corpus currently contains 353 books in mARkdown, all reviewed and vetted by the annotation team.
For the major changes to the URIs and for statistics on the corpus, please see the release notes and the corresponding CSV files on the publication page and in the GitHub repository.
To cite this version, please include the following information (a bibliographical export is available on the publication page):
Lorenz Nigst, Maxim Romanov, Sarah Bowen Savant, Masoumeh Seydi and Peter Verkinderen, OpenITI: A Machine-Readable Corpus of Islamicate Texts (Version 2021.2.5) [data set] (October 2021), Zenodo, https://doi.org/10.5281/zenodo.5550338.