About the Project: A Message from the PI

Sarah Savant

Welcome to ‘KITAB-Transform’

The KITAB project (Knowledge, Information Technology, and the Arabic Book) began as an effort to develop methods for detecting how authors copied from earlier works. Classical Arabic authors frequently drew on past works, cutting them into pieces and reassembling them to address their own outlooks and concerns. We wanted to discover the relationships between Arabic texts, as well as the profoundly intertextual circulatory systems in which they sit. The project received funding from the European Research Council (KITAB, no. 772989) and the Qatar National Library. The results can be seen in our web application, our blog, our many data releases on Zenodo, and our publications, including two monographs due out later this year from Edinburgh University Press.

From January 2026, we begin a new chapter. With KITAB-Transform (ERC, grant no. 101199672), we continue to investigate how authors work, but now, thanks to recent advances in machine learning and natural language processing (NLP), we can be more ambitious. Previously, we could detect near-verbatim matches; now we want to detect passages that are semantically similar but worded differently. Over the next five years and beyond, we will train models to find and align paraphrases and translations across the OpenITI corpus. Our training and evaluation datasets will classify and explain different kinds of matches, and we hope our models will learn to do the same.

Our work is aimed first at our research community. We now work with a growing number of historians and linguists who share our questions about authorial practices, book history, and narrative adaptation. Our earlier work focused on Arabic, but we are now also looking across languages, at the relationship between Arabic texts and works written in Persian (and, we hope, Urdu). We believe KITAB & KITAB-Transform should change how any student or researcher approaches historical sources.

We also aim to reach libraries, museums, and other institutions that hold book data. Our data on the relationships between books is vital both to understanding collections and to building systems that make them accessible. I am inspired by ImageNet and by the music collections of platforms such as Spotify, where pattern matching is put to other uses.

But we also believe that the artificial intelligence (AI) community cannot afford to ignore the language modeling we are doing. Classical Arabic, with its many challenges, is a good test case for natural language processing, and the OpenITI corpus and our data will increasingly offer testbeds for probing the capacities of language models.

In addition, historians must always work under conditions of epistemic uncertainty: source texts are often fragmentary and allusive, and the evidence is incomplete. The tasks commonly used to test large language models match these conditions poorly. Likewise, historians care deeply about productive disagreement, whereas machine learning evaluation often treats disagreement as error. A central aim of KITAB-Transform’s AI research is therefore to develop models and evaluation frameworks that treat uncertainty, loss, and disagreement as structured information rather than as noise. We expect to publish NLP papers, historically grounded NLP tasks, and benchmarks that advance AI for history and that also address wider gaps in current AI research.

The technology that powers KITAB is at the frontier of computer science research. Our main partner in computer science remains David Smith at Northeastern University. Together with our partners in the Open Islamicate Texts Initiative (OpenITI), we generate the corpus with which we work. To use our corpus, please start here. We are annotating and vetting works, with documentation available on GitHub.

Do read our blog, which offers a window onto our team members at work. We are working hard to bring all of our data and sources into the public domain. We want research communities everywhere to be able to take full advantage of what digital technology now allows us all to see and discover.

Thank you for your interest in KITAB & KITAB-Transform. Please do be in touch if you would like to get involved. We welcome your interest.

Warm regards,
Sarah Bowen Savant
Professor of History

Director, Centre for Digital Humanities
Aga Khan University International
Institute for the Study of Muslim Civilisations
Principal Investigator
KITAB & KITAB-Transform
