A new version of KITAB’s text reuse data is now available to download from Zenodo, an Open Science platform that supports Open Access. The current release contains 1,641,766 CSV files of pairwise (book-to-book) text reuse alignments, totalling 51,491,084 alignments between 8,600 books by 3,306 authors. The release also includes book-to-book statistics.
Both the alignments and the statistical data can be used in computational analyses and visualizations. The KITAB team has been developing an application for exploring this data and the underlying corpus. The application also includes older versions of the corpus and the text reuse data. Future versions of the OpenITI corpus and the corresponding text reuse data will be added as they are released.
Not all of the text reuse statistics have been loaded into the application yet; the remainder will be added by the end of the year (you will see warnings on pages where data may be incomplete). We will also continue to develop the application. If you have any feedback or find bugs, please feel free to reach out to the KITAB team.
For the release notes, citation information, and data, please see the publication page.
Are you interested in using our data in your research? Do you have a research question related to the text reuse data, but are unsure how to proceed? At 2.30pm BST on July 23rd 2024, we will be holding our first KITAB Open House. Save the date and look out for an announcement in the coming weeks.