Text Reuse Data Release, version 2023.1.8

A new version of KITAB’s text reuse data is now available for download from Zenodo, an Open Science platform that supports Open Access. The current release comprises 1,641,766 CSV files of pairwise (book-to-book) text reuse alignments, with a total of 51,491,084 alignments between 8,600 books by 3,306 authors. The release also includes book-to-book statistics.

Both the alignments and statistical data can be used in computational analyses and visualizations. The KITAB team has been developing an application to explore this data and the underlying corpus. The application also features older versions of the corpus and text reuse data. Future versions of the OpenITI corpus and the corresponding text reuse data will be added once they are released.
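As a rough illustration of the kind of computational analysis the alignment files allow, the following Python sketch counts alignments per book pair. It assumes that each pairwise CSV file has a header row followed by one row per alignment. The book identifiers, column names, and delimiter used here are hypothetical placeholders, not the actual layout of the release, so please check the release notes before adapting it.

```python
import csv
import io
from collections import Counter


def count_alignments(csv_text, delimiter=","):
    """Count the data rows (assumed: one per alignment) in one pairwise CSV."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader, None)  # skip the header row (assumed to be present)
    return sum(1 for row in reader if row)


# Hypothetical sample data standing in for two pairwise files.
# The column names below are illustrative only.
sample_files = {
    ("BookA", "BookB"): "id1,id2,align_len\nms001,ms010,120\nms002,ms011,85\n",
    ("BookA", "BookC"): "id1,id2,align_len\nms005,ms020,60\n",
}

# Number of alignments for each book pair.
pair_counts = Counter(
    {pair: count_alignments(text) for pair, text in sample_files.items()}
)

# Release-wide average, using the totals stated in this announcement.
total_alignments = 51_491_084
total_files = 1_641_766
mean_per_pair = total_alignments / total_files

print(pair_counts.most_common())
print(f"Average alignments per pairwise file: {mean_per_pair:.1f}")
```

With real data, `csv_text` would be replaced by the contents of a downloaded file, and the `delimiter` argument adjusted to match the format documented in the release notes.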

Not all of the text reuse statistics are loaded into the application at present, but they will be by the end of the year (you will see warnings on pages where the data might be incomplete). We will also continue to develop the application. If you have any feedback or find bugs, please feel free to reach out to the KITAB team.

To access the release notes, citation information, and data, please use the information on the publication page.

Are you interested in using our data in your research? Do you have a research question related to the text reuse data, but are unsure how to proceed? At 2.30pm BST on July 23rd 2024, we will be holding our first KITAB Open House. Save the date and look out for an announcement in the coming weeks.




Masoumeh Seydi

