Posts by tags

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

Bias in the OpenITI corpus

January 9, 2026 12 minute read

Bias in the OpenITI corpus

Digital Lead (Subcontractor) – ERC Project KITAB-Transform

January 8, 2026 3 minute read

Digital Lead (Subcontractor) – ERC Project KITAB-Transform

Post 8: Bibliography

September 8, 2023 4 minute read

Antrim, Zayde, ‘Nostalgia for the Future: A Comparison between the Introductions to Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and al-Khaṭīb al-Baghdādī’s Taʾrīkh...

Post 7: People, Connections and Memory

September 8, 2023 13 minute read

Imagine yourself as a learned bookseller of the twelfth century. You have just been called in to assess the estate of a wealthy, prominent scholar who has died...

Post 6: Searches for References to written materials outside of Isnads

August 29, 2023 16 minute read

As noted in the last post, we struggled to verify book citations in the TMD, both within and outside of isnāds. We believe that our struggles reflect the cha...

Post 5: Ibn ‘Asakir’s Citation of Author Names in Isnads

August 24, 2023 47 minute read

We continue our investigation of Ibn ʿAsākir’s citations to address our third question about his working methods. When author names appear within his isnāds,...

Post 4: Ibn ‘Asakir’s Transmission Terms in Isnads

August 22, 2023 31 minute read

Our previous blog post featured a deep dive into the pool of informants whom Ibn ʿAsākir cites frequently. Now we turn to the big picture of how he says he a...

Post 3: Ibn ‘Asakir’s Direct Informants

August 17, 2023 49 minute read

Ibn ʿAsākir names many persons from whom he acquired information for the TMD. What can our data tell us about them?

Post 2: Ibn ‘Asakir and His History of Damascus, the Data Set

August 11, 2023 15 minute read

Digital humanists often say they would like to read more work in progress. Our blog posts represent such work. We worked intensively over months to create an...

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

Adventures in Alignments: Training an Algorithm to Recognise Text Reuse

August 7, 2020 9 minute read

Text reuse is the term that we use to describe cases where one book shares verbatim material with another. Text reuse can be studied manually through the rea...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

On Commentaries, Digressions, Transtextualities, and Rabbit Holes

December 3, 2019 5 minute read

Running the passim algorithm on the OpenITI corpus allows us to identify a vast number of instances of text reuse, but the quality of these results from a hi...

A Tale of 3 “Versions”

September 10, 2017 11 minute read

Measuring variation in the early tradition

Bias in the OpenITI corpus

January 9, 2026 12 minute read

Bias in the OpenITI corpus

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

Diversifying the OpenITI corpus, One Text at a Time

January 21, 2021 9 minute read

The vast majority of texts in the OpenITI corpus were sourced from three major collections of digital texts originally prepared by organisations based in the...

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

Contagion in the Corpus: The Black Death and Where to Find It

April 22, 2020 8 minute read

“How can I bear to pair fair words in rhyme

On Commentaries, Digressions, Transtextualities, and Rabbit Holes

December 3, 2019 5 minute read

Running the passim algorithm on the OpenITI corpus allows us to identify a vast number of instances of text reuse, but the quality of these results from a hi...

Post 8: Bibliography

September 8, 2023 4 minute read

Antrim, Zayde, ‘Nostalgia for the Future: A Comparison between the Introductions to Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and al-Khaṭīb al-Baghdādī’s Taʾrīkh...

Post 7: People, Connections and Memory

September 8, 2023 13 minute read

Imagine yourself as a learned bookseller of the twelfth century. You have just been called in to assess the estate of a wealthy, prominent scholar who has died...

Post 6: Searches for References to written materials outside of Isnads

August 29, 2023 16 minute read

As noted in the last post, we struggled to verify book citations in the TMD, both within and outside of isnāds. We believe that our struggles reflect the cha...

Post 5: Ibn ‘Asakir’s Citation of Author Names in Isnads

August 24, 2023 47 minute read

We continue our investigation of Ibn ʿAsākir’s citations to address our third question about his working methods. When author names appear within his isnāds,...

Post 4: Ibn ‘Asakir’s Transmission Terms in Isnads

August 22, 2023 31 minute read

Our previous blog post featured a deep dive into the pool of informants whom Ibn ʿAsākir cites frequently. Now we turn to the big picture of how he says he a...

Post 3: Ibn ‘Asakir’s Direct Informants

August 17, 2023 49 minute read

Ibn ʿAsākir names many persons from whom he acquired information for the TMD. What can our data tell us about them?

Post 2: Ibn ‘Asakir and His History of Damascus, the Data Set

August 11, 2023 15 minute read

Digital humanists often say they would like to read more work in progress. Our blog posts represent such work. We worked intensively over months to create an...

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

OpenITI and the Fihrist: Analysis

November 4, 2022 18 minute read

This is the third blog in a short series of blogs on the overlap between the OpenITI corpus and Ibn al-Nadim’s Fihrist. Please refer to the first part for a ...

OpenITI and the Fihrist: Methodology

November 4, 2022 14 minute read

This is the second blog in a short series of blogs on the overlap between the OpenITI corpus and Ibn al-Nadim’s Fihrist. Please refer to the first part for a...

OpenITI and the Fihrist

November 4, 2022 3 minute read

The corpus of texts the KITAB project uses as the basis for its research is a subsection of the OpenITI corpus. It contains Arabic-language texts of the firs...

A Ramble Through the Cluster Data, Part 2: Quantifying and Visualising Clusters.

June 21, 2022 9 minute read

In part 1, I introduced you to the cluster data set, a second passim data set that is slightly different from the pairwise data set that the KITAB team use i...

A Ramble Through the Cluster Data, Part 1: From Pairs to Clusters.

May 19, 2022 10 minute read

It should be no surprise to any reader of this blog that the KITAB project is primarily interested in studying Arabic text reuse. A large number of posts her...

From Networks to Named Entities and Back Again: Exploring Isnad Networks

May 31, 2021 9 minute read

From Networks to Named Entities and Back Again: Exploring Isnad Networks

Diversifying the OpenITI corpus, One Text at a Time

January 21, 2021 9 minute read

The vast majority of texts in the OpenITI corpus were sourced from three major collections of digital texts originally prepared by organisations based in the...

Al-Maktaba al-Shamila: a short history

December 3, 2020 10 minute read

(This is the first blog post in a longer series of posts about the sources of OpenITI.)

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

Adventures in Alignments: Training an Algorithm to Recognise Text Reuse

August 7, 2020 9 minute read

Text reuse is the term that we use to describe cases where one book shares verbatim material with another. Text reuse can be studied manually through the rea...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

Contagion in the Corpus: The Black Death and Where to Find It

April 22, 2020 8 minute read

“How can I bear to pair fair words in rhyme

When al-Tabari is Not (Just) al-Tabari: The Challenges Posed by Composite Editions in the OpenITI Corpus

January 10, 2020 4 minute read

In the past few months the KITAB team members have been closely studying the issue of versioning and composite editions in the OpenITI corpus. The problem of...

On Commentaries, Digressions, Transtextualities, and Rabbit Holes

December 3, 2019 5 minute read

Running the passim algorithm on the OpenITI corpus allows us to identify a vast number of instances of text reuse, but the quality of these results from a hi...

Detecting What Authors Took from Earlier Works

May 2, 2018 7 minute read

With text reuse detection, we rely on the power, speed and memory of a computer to find common passages between texts.

A Tale of 3 “Versions”

September 10, 2017 11 minute read

Measuring variation in the early tradition

Call for Papers: A Workshop on Citation (25th-26th July 2022)

April 22, 2022 1 minute read

Modeling Attribution and Acknowledgement in the Digital Humanities: Citation Practices and the Pre-Modern Arabic Book.

From Networks to Named Entities and Back Again: Exploring Isnad Networks

May 31, 2021 9 minute read

From Networks to Named Entities and Back Again: Exploring Isnad Networks

Mapping Who’s Who in Isnads – First Steps

October 5, 2020 10 minute read

One of the major challenges for those working with historical Arabic texts lies in names, and in the variety of ways that authors might refer to the same per...

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

Tracking Traditions: Identifying Isnads in the OpenITI Corpus

February 3, 2020 10 minute read

Due to its size and coverage, the OpenITI corpus is useful for a wide variety of research purposes. In particular, it represents an excellent opportunity to ...

Introducing Text Evaluation using LabelStudio

April 9, 2026 4 minute read

Data annotation and evaluation is a critical part of the KITAB-Transform workflow. Without user-friendly but highly customisable software, it will be impossi...

Leveraging the OpenITI Corpus for Text Identification: Two Examples from Geniza Documents

March 11, 2026 11 minute read

Among many other things, the steadily growing OpenITI corpus of machine-actionable texts constitutes a useful tool for identifying hitherto unidentified text...

OpenITI Release, version 2025.1.9

February 12, 2026 1 minute read

Citation: please use the citation information (available to export in various formats) on the publication page.

DRZCRPS: Towards a Corpus of Druze Poetry

February 12, 2025 2 minute read

Over the past few years, I have become quite interested in historical Druze religious poetry (from about the 17th–18th century CE onwards). Stumbling across ...

OpenITI release 2023.1.8

October 24, 2023 1 minute read

The 8th version (2023.1.8) of the OpenITI corpus is now available at Zenodo. The release is open access and is also accessible through our GitHub repository....

OpenITI release 2022.2.7

March 13, 2023 2 minute read

The KITAB team has released a new version (2022.2.7) of the OpenITI corpus at Zenodo. The release is open access. It is our seventh release (second release i...

OpenITI release 2022.1.6

November 18, 2022 1 minute read

The KITAB team has released a new version (2022.1.6) of the OpenITI corpus at Zenodo. The release is open access. It is our fifth release (second release in ...

OpenITI and the Fihrist: Analysis

November 4, 2022 18 minute read

This is the third blog in a short series of blogs on the overlap between the OpenITI corpus and Ibn al-Nadim’s Fihrist. Please refer to the first part for a ...

OpenITI and the Fihrist: Methodology

November 4, 2022 14 minute read

This is the second blog in a short series of blogs on the overlap between the OpenITI corpus and Ibn al-Nadim’s Fihrist. Please refer to the first part for a...

OpenITI and the Fihrist

November 4, 2022 3 minute read

The corpus of texts the KITAB project uses as the basis for its research is a subsection of the OpenITI corpus. It contains Arabic-language texts of the firs...

Oh Brethren, Where Are Ye? How to search for words and phrases in the OpenITI corpus, demonstrated with the phrase ‘Ikhwan al-Safa’

February 9, 2022 14 minute read

The OpenITI corpus is designed to facilitate many different forms of computational analysis. Within the KITAB project we spend the bulk of our time fine-tuni...

Some Suggestions on Using OpenITI Corpus to Present Enhanced Digital Versions of Large Collections: The Case of al-Dhari‘a Ila Tasanif al-Shi‘a

November 22, 2021 13 minute read

Tagging the structure of the texts in the OpenITI corpus is an important step towards the ultimate goal of the KITAB project. Studying the Arabic textual tradition...

OpenITI release 2021.2.5

October 20, 2021 1 minute read

The KITAB team has released a new version (2021.2.5) of the OpenITI corpus at Zenodo. The release is open access. It is our fifth release (second release in ...

Using the Many to Spot the Few

July 13, 2021 3 minute read

At present, the OpenITI/KITAB corpus comprises 10,243 text files, 6,268 of which are unique titles.

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

OpenITI release 2021.1.4

February 12, 2021 1 minute read

The KITAB team has released a new version (2021.1.4) of the OpenITI corpus at Zenodo. The release is open access and freely available. It is our fourth relea...

Diversifying the OpenITI corpus, One Text at a Time

January 21, 2021 9 minute read

The vast majority of texts in the OpenITI corpus were sourced from three major collections of digital texts originally prepared by organisations based in the...

Tracing the origins of a historical fragment focused on the Samanids

December 11, 2020 2 minute read

At the Arabic Pasts conference this year, Hugh Kennedy and I presented a paper in the panel dedicated to the Invisible East programme, chaired by the program...

Al-Maktaba al-Shamila: a short history

December 3, 2020 10 minute read

(This is the first blog post in a longer series of posts about the sources of OpenITI.)

OpenITI release 2020.2.3

October 19, 2020 1 minute read

A new version (version 2020.2.3) of the OpenITI corpus is available at Zenodo, an Open Science platform that supports open access. This is the third release ...

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

Preserving Pre-Modern Terminologies

August 5, 2020 11 minute read

To categorise things is a fundamental human and scholarly instinct and activity. And yet it is one not without obstacles, for we soon learn that the world is...

OpenITI, OCR, and Textual Criticism

July 16, 2020 5 minute read

In previous posts, other members of the KITAB team have talked about building the OpenITI corpus of Arabic and Persian sources. Many members of the team are ...

New Release of Our Open Access Arabic Corpus, OpenITI, version 2020.1.2

June 17, 2020 1 minute read

A new version of the corpus used by the KITAB team is now available to download at Zenodo, an Open Science platform that supports open access. This is the se...

Tagging the Structure of Texts in the OPENITI Corpus

June 12, 2020 5 minute read

With currently more than 7,000 titles, collected from a number of huge digital Arabic libraries (al-Jamiʿ al-Kabir, al-Maktaba al-Shamila, Shia Online, etc.)...

Contagion in the Corpus: The Black Death and Where to Find It

April 22, 2020 8 minute read

“How can I bear to pair fair words in rhyme

When al-Tabari is Not (Just) al-Tabari: The Challenges Posed by Composite Editions in the OpenITI Corpus

January 10, 2020 4 minute read

In the past few months the KITAB team members have been closely studying the issue of versioning and composite editions in the OpenITI corpus. The problem of...

First Open Access Release of Our Arabic Corpus

June 8, 2019 1 minute read

Scholars working in Arabic can now download the entire corpus used by the KITAB team through Zenodo, an Open Science platform that supports open access.

A New Application that Helps You Find Texts in the OpenITI Corpus

November 4, 2019 1 minute read

The Open Islamicate Texts Initiative (OpenITI) is a multi-institutional effort to construct the first open-access machine-actionable scholarly corpus of prem...

Post 8: Bibliography

September 8, 2023 4 minute read

Antrim, Zayde, ‘Nostalgia for the Future: A Comparison between the Introductions to Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and al-Khaṭīb al-Baghdādī’s Taʾrīkh...

Post 7: People, Connections and Memory

September 8, 2023 13 minute read

Imagine yourself as a learned bookseller of the twelfth century. You have just been called in to assess the estate of a wealthy, prominent scholar who has died...

Post 6: Searches for References to written materials outside of Isnads

August 29, 2023 16 minute read

As noted in the last post, we struggled to verify book citations in the TMD, both within and outside of isnāds. We believe that our struggles reflect the cha...

Post 5: Ibn ‘Asakir’s Citation of Author Names in Isnads

August 24, 2023 47 minute read

We continue our investigation of Ibn ʿAsākir’s citations to address our third question about his working methods. When author names appear within his isnāds,...

Post 4: Ibn ‘Asakir’s Transmission Terms in Isnads

August 22, 2023 31 minute read

Our previous blog post featured a deep dive into the pool of informants whom Ibn ʿAsākir cites frequently. Now we turn to the big picture of how he says he a...

Post 3: Ibn ‘Asakir’s Direct Informants

August 17, 2023 49 minute read

Ibn ʿAsākir names many persons from whom he acquired information for the TMD. What can our data tell us about them?

Post 2: Ibn ‘Asakir and His History of Damascus, the Data Set

August 11, 2023 15 minute read

Digital humanists often say they would like to read more work in progress. Our blog posts represent such work. We worked intensively over months to create an...

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

The DNA of a Book: Reading al-Nuwayrī’s (d. 733/1333) Nihāyat al-arab fī funūn al-adab

October 17, 2024 1 minute read

We released a data set of an experiment conducted by Sarah Bowen Savant and Sohail Merchant, who wanted to understand potentially how, and from what, Shihāb ...

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

Adventures in Alignments: Training an Algorithm to Recognise Text Reuse

August 7, 2020 9 minute read

Text reuse is the term that we use to describe cases where one book shares verbatim material with another. Text reuse can be studied manually through the rea...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

On Commentaries, Digressions, Transtextualities, and Rabbit Holes

December 3, 2019 5 minute read

Running the passim algorithm on the OpenITI corpus allows us to identify a vast number of instances of text reuse, but the quality of these results from a hi...

DRZCRPS: Towards a Corpus of Druze Poetry

February 12, 2025 2 minute read

Over the past few years, I have become quite interested in historical Druze religious poetry (from about the 17th–18th century CE onwards). Stumbling across ...

Arabic Pasts 2025: Programme

August 21, 2025 1 minute read

This annual exploratory and informal workshop offers the opportunity to reflect on methodologies, case studies, and research agendas for investigating histor...

Workshop: Classical ML to AI in Arabic and Islamic Studies

April 15, 2025 2 minute read

Workshop: From Classical ML to AI in Arabic and Islamic Studies: A Hands-On Workshop, July 1-4, 2025

Call For Papers: Arabic Pasts 2025

April 3, 2025 2 minute read

Arabic Pasts: Histories and Historiographies

Arabic Pasts 2024: Programme

September 10, 2024 less than 1 minute read

We are pleased to announce the programme for this year’s Arabic Pasts workshop, running from Thursday 3rd until Friday 4th of October 2024. We have yet anoth...

Call For Papers: Arabic Pasts 2024

May 28, 2024 1 minute read

Arabic Pasts: Histories and Historiographies

Arabic Pasts 2023: Programme

July 31, 2023 less than 1 minute read

We are pleased to announce the programme for this year’s Arabic Pasts, running from Thursday 5th until Friday 6th of October 2023. We have yet another exciti...

Call For Papers: Arabic Pasts 2023

March 25, 2023 1 minute read

Arabic Pasts: Histories and Historiographies

Lecture Announcement: SHARIAsource Lab Workshop: Ibn ʿAsākir and His History of Damascus: Named Entity Recognition and Text Reuse, Sarah Bowen Savant (Harvard Law School)

September 23, 2022 less than 1 minute read

On Tuesday, September 27, 2022, at 12:00-1:00 PM US EST in Lewis 214, KITAB’s Sarah Bowen Savant will lead a seminar on research in progress that uses the Open...

Arabic Pasts 2022: Programme

August 11, 2022 less than 1 minute read

We are pleased to announce the programme for this year’s Arabic Pasts. We have yet another exciting series of papers covering a range of topics and periods. ...

A Close and Distant Reading of Writerly Practices: Sarah Bowen Savant’s Inaugural Lecture

May 20, 2022 1 minute read

On Thursday 5th of May 2022 Sarah Bowen Savant gave her inaugural lecture as full professor at the AKU-ISMC.

Call for Papers: A Workshop on Citation (25th-26th July 2022)

April 22, 2022 1 minute read

Modeling Attribution and Acknowledgement in the Digital Humanities: Citation Practices and the Pre-Modern Arabic Book.

Call for papers: Arabic Pasts 2022

March 17, 2022 3 minute read

Arabic Pasts: Histories and Historiographies

Call for Papers – Arabic Pasts: Histories and Historiographies (Annual Workshop)

March 30, 2021 2 minute read

This annual exploratory and informal workshop offers the opportunity to reflect on history writing in Arabic. We encourage contributions focused on methodolo...

Arabic Pasts: Histories and Historiographies Research workshop (October 22-24, 2020 London)

September 29, 2020 1 minute read

This annual exploratory and informal workshop offers the opportunity to reflect on history writing in Arabic. This year the event will be held online to allo...

Arabic Pasts – 2018

November 8, 2018 less than 1 minute read

The ‘Arabic Pasts: Histories and Historiography’ workshop was held in the new Aga Khan Centre in London on the 12th and 13th of October and featured papers t...

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

From Networks to Named Entities and Back Again: Exploring Isnad Networks

May 31, 2021 9 minute read

From Networks to Named Entities and Back Again: Exploring Isnad Networks

Adventures in Alignments: Training an Algorithm to Recognise Text Reuse

August 7, 2020 9 minute read

Text reuse is the term that we use to describe cases where one book shares verbatim material with another. Text reuse can be studied manually through the rea...

OpenITI, OCR, and Textual Criticism

July 16, 2020 5 minute read

In previous posts, other members of the KITAB team have talked about building the OpenITI corpus of Arabic and Persian sources. Many members of the team are ...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

Tracking Traditions: Identifying Isnads in the OpenITI Corpus

February 3, 2020 10 minute read

Due to its size and coverage, the OpenITI corpus is useful for a wide variety of research purposes. In particular, it represents an excellent opportunity to ...

DRZCRPS: Towards a Corpus of Druze Poetry

February 12, 2025 2 minute read

Over the past few years, I have become quite interested in historical Druze religious poetry (from about the 17th–18th century CE onwards). Stumbling across ...

Oh Brethren, Where Are Ye? How to search for words and phrases in the OpenITI corpus, demonstrated with the phrase ‘Ikhwan al-Safa’

February 9, 2022 14 minute read

The OpenITI corpus is designed to facilitate many different forms of computational analysis. Within the KITAB project we spend the bulk of our time fine-tuni...

Some Suggestions on Using OpenITI Corpus to Present Enhanced Digital Versions of Large Collections: The Case of al-Dhari‘a Ila Tasanif al-Shi‘a

November 22, 2021 13 minute read

Tagging the structure of the texts in the OpenITI corpus is an important step towards the ultimate goal of the KITAB project. Studying the Arabic textual tradition...

OpenITI Release, version 2025.1.9

February 12, 2026 1 minute read

Citation: please use the citation information (available to export in various formats) on the publication page.

Calling for OpenITI Curators (information session)

February 10, 2026 1 minute read

Would you like to help build and diversify the OpenITI corpus? Frustrated that you cannot find your text? Do you wish that OpenITI had more consistent metada...

Introducing KITAB-Transform

January 30, 2026 6 minute read

On 2 January, we began a new European Research Council-funded project, ‘KITAB-Transform’ (ERC, grant no. 101199672). We hope that scholars will join us on wh...

Workshop: Classical ML to AI in Arabic and Islamic Studies

April 15, 2025 2 minute read

Workshop: From Classical ML to AI in Arabic and Islamic Studies: A Hands-On Workshop, July 1-4, 2025

Call For Papers: Arabic Pasts 2025

April 3, 2025 2 minute read

Arabic Pasts: Histories and Historiographies

Virtual Open House 3: Come Learn About Our Data

December 4, 2024 less than 1 minute read

Please join us for an online ‘Open House’ convened by the Centre for Digital Humanities at the Aga Khan University (International) in the United Kingdom and ...

Text Reuse Data Release, version 2023.1.8

June 14, 2024 1 minute read

A new version of KITAB’s text reuse data is now available to download at Zenodo, an Open Science platform that supports Open Access. The current release feat...

Call For Papers: Arabic Pasts 2024

May 28, 2024 1 minute read

Arabic Pasts: Histories and Historiographies

Call For Papers: Arabic Pasts 2023

March 25, 2023 1 minute read

Arabic Pasts: Histories and Historiographies

OpenITI release 2022.2.7

March 13, 2023 2 minute read

The KITAB team has released a new version (2022.2.7) of the OpenITI corpus at Zenodo. The release is open access. It is our seventh release (second release i...

OpenITI release 2022.1.6

November 18, 2022 1 minute read

The KITAB team has released a new version (2022.1.6) of the OpenITI corpus at Zenodo. The release is open access. It is our fifth release (second release in ...

A Close and Distant Reading of Writerly Practices: Sarah Bowen Savant’s Inaugural Lecture

May 20, 2022 1 minute read

On Thursday 5th of May 2022 Sarah Bowen Savant gave her inaugural lecture as full professor at the AKU-ISMC.

Call for Papers: A Workshop on Citation (25th-26th July 2022)

April 22, 2022 1 minute read

Modeling Attribution and Acknowledgement in the Digital Humanities: Citation Practices and the Pre-Modern Arabic Book.

OpenITI release 2021.2.5

October 20, 2021 1 minute read

The KITAB team has released a new version (2021.2.5) of the OpenITI corpus at Zenodo. The release is open access. It is our fifth release (second release in ...

Call for Papers – Arabic Pasts: Histories and Historiographies (Annual Workshop)

March 30, 2021 2 minute read

This annual exploratory and informal workshop offers the opportunity to reflect on history writing in Arabic. We encourage contributions focused on methodolo...

OpenITI release 2021.1.4

February 12, 2021 1 minute read

The KITAB team has released a new version (2021.1.4) of the OpenITI corpus at Zenodo. The release is open access and freely available. It is our fourth relea...

KITAB postdoc Gowaart Van Den Bossche wins BRAIS-De Gruyter dissertation prize – 2020

November 19, 2020 less than 1 minute read

The British Association for Islamic Studies (BRAIS) and De Gruyter have announced the outcome of the fifth (2020) round of the BRAIS–De Gruyter Prize in the ...

Arabic Pasts: Histories and Historiographies Research workshop (October 22-24, 2020 London)

September 29, 2020 1 minute read

This annual exploratory and informal workshop offers the opportunity to reflect on history writing in Arabic. This year the event will be held online to allo...

Call for Participation in KITAB (Knowledge, Information Technology, and the Arabic Book)

July 24, 2020 1 minute read

The KITAB project is seeking researchers who are interested in collaborating to advance their own, distinct research projects. The aim is to build a small gr...

Arabic Pasts – 2018

November 8, 2018 less than 1 minute read

The ‘Arabic Pasts: Histories and Historiography’ workshop was held in the new Aga Khan Centre in London on the 12th and 13th of October and featured papers t...

KITAB is welcoming a new member!

September 2, 2018 less than 1 minute read

Happy News from the ERC… and Some Details

January 16, 2018 4 minute read

The European Research Council has awarded KITAB a five-year, €2 million grant that will enable us to make major progress on our research agenda.

OpenITI release 2020.2.3

October 19, 2020 1 minute read

A new version (version 2020.2.3) of the OpenITI corpus is available at Zenodo, an Open Science platform that supports open access. This is the third release ...

First Open Access Release of Our Arabic Corpus

June 8, 2019 1 minute read

Scholars working in Arabic can now download the entire corpus used by the KITAB team through Zenodo, an Open Science platform that supports open access.

New KITAB visualizations

December 3, 2021 15 minute read

Much of our work at KITAB involves comparing books in order to understand their relationships. Our main tool for this is the passim software, which detects p...

Post 8: Bibliography

September 8, 2023 4 minute read

Antrim, Zayde, ‘Nostalgia for the Future: A Comparison between the Introductions to Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and al-Khaṭīb al-Baghdādī’s Taʾrīkh...

Post 7: People, Connections and Memory

September 8, 2023 13 minute read

Imagine yourself as a learned bookseller of the twelfth century. You have just been called in to assess the estate of a wealthy, prominent scholar who has died...

Post 6: Searches for References to written materials outside of Isnads

August 29, 2023 16 minute read

As noted in the last post, we struggled to verify book citations in the TMD, both within and outside of isnāds. We believe that our struggles reflect the cha...

Post 5: Ibn ‘Asakir’s Citation of Author Names in Isnads

August 24, 2023 47 minute read

We continue our investigation of Ibn ʿAsākir’s citations to address our third question about his working methods. When author names appear within his isnāds,...

Post 4: Ibn ‘Asakir’s Transmission Terms in Isnads

August 22, 2023 31 minute read

Our previous blog post featured a deep dive into the pool of informants whom Ibn ʿAsākir cites frequently. Now we turn to the big picture of how he says he a...

Post 3: Ibn ‘Asakir’s Direct Informants

August 17, 2023 49 minute read

Ibn ʿAsākir names many persons from whom he acquired information for the TMD. What can our data tell us about them?

Post 2: Ibn ‘Asakir and His History of Damascus, the Data Set

August 11, 2023 15 minute read

Digital humanists often say they would like to read more work in progress. Our blog posts represent such work. We worked intensively over months to create an...

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

Adventures in Alignments: Training an Algorithm to Recognise Text Reuse

August 7, 2020 9 minute read

Text reuse is the term that we use to describe cases where one book shares verbatim material with another. Text reuse can be studied manually through the rea...

Post 8: Bibliography

September 8, 2023 4 minute read

Antrim, Zayde, ‘Nostalgia for the Future: A Comparison between the Introductions to Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and al-Khaṭīb al-Baghdādī’s Taʾrīkh...

Post 7: People, Connections and Memory

September 8, 2023 13 minute read

Imagine yourself as a learned bookseller of the twelfth century. You have just been called in to assess the estate of a wealthy, prominent scholar who has died...

Post 6: Searches for References to written materials outside of Isnads

August 29, 2023 16 minute read

As noted in the last post, we struggled to verify book citations in the TMD, both within and outside of isnāds. We believe that our struggles reflect the cha...

Post 5: Ibn ‘Asakir’s Citation of Author Names in Isnads

August 24, 2023 47 minute read

We continue our investigation of Ibn ʿAsākir’s citations to address our third question about his working methods. When author names appear within his isnāds,...

Post 4: Ibn ‘Asakir’s Transmission Terms in Isnads

August 22, 2023 31 minute read

Our previous blog post featured a deep dive into the pool of informants whom Ibn ʿAsākir cites frequently. Now we turn to the big picture of how he says he a...

Post 3: Ibn ‘Asakir’s Direct Informants

August 17, 2023 49 minute read

Ibn ʿAsākir names many persons from whom he acquired information for the TMD. What can our data tell us about them?

Post 2: Ibn ‘Asakir and His History of Damascus, the Data Set

August 11, 2023 15 minute read

Digital humanists often say they would like to read more work in progress. Our blog posts represent such work. We worked intensively over months to create an...

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

Diversifying the OpenITI corpus, One Text at a Time

January 21, 2021 9 minute read

The vast majority of texts in the OpenITI corpus were sourced from three major collections of digital texts originally prepared by organisations based in the...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

OpenITI Release, version 2025.1.9

February 12, 2026 1 minute read

Citation: please use the citation information (available to export in various formats) on the publication page.

The DNA of a Book: Reading al-Nuwayrī’s (d. 733/1333) Nihāyat al-arab fī funūn al-adab

October 17, 2024 1 minute read

We released a data set of an experiment conducted by Sarah Bowen Savant and Sohail Merchant, who wanted to understand potentially how, and from what, Shihāb ...

OpenITI release 2023.1.8

October 24, 2023 1 minute read

The 8th version (2023.1.8) of the OpenITI corpus is now available at Zenodo. The release is open access and is also accessible through our GitHub repository....

OpenITI release 2022.2.7

March 13, 2023 2 minute read

The KITAB team has released a new version (2022.2.7) of the OpenITI corpus at Zenodo. The release is open access. It is our seventh release (second release i...

OpenITI release 2022.1.6

November 18, 2022 1 minute read

The KITAB team has released a new version (2022.1.6) of the OpenITI corpus at Zenodo. The release is open access. It is our sixth release (first release in ...

OpenITI release 2021.2.5

October 20, 2021 1 minute read

The KITAB team has released a new version (2021.2.5) of the OpenITI corpus at Zenodo. The release is open access. It is our fifth release (second release in ...

OpenITI release 2021.1.4

February 12, 2021 1 minute read

The KITAB team has released a new version (2021.1.4) of the OpenITI corpus at Zenodo. The release is open access and freely available. It is our fourth relea...

OpenITI release 2020.2.3

October 19, 2020 1 minute read

A new version (version 2020.2.3) of the OpenITI corpus is available at Zenodo, an Open Science platform that supports open access. This is the third release ...

New Release of Our Open Access Arabic Corpus, OpenITI, version 2020.1.2

June 17, 2020 1 minute read

A new version of the corpus used by the KITAB team is now available to download at Zenodo, an Open Science platform that supports open access. This is the se...

First Open Access Release of Our Arabic Corpus

June 8, 2019 1 minute read

Scholars working in Arabic can now download the entire corpus used by the KITAB team through Zenodo, an Open Science platform that supports open access.

Virtual Open House 3: Come Learn About Our Data

December 4, 2024 less than 1 minute read

Please join us for an online ‘Open House’ convened by the Centre for Digital Humanities at the Aga Khan University (International) in the United Kingdom and ...

Text Reuse Data Release, version 2023.1.8

June 14, 2024 1 minute read

A new version of KITAB’s text reuse data is now available to download at Zenodo, an Open Science platform that supports Open Access. The current release feat...

Dispatches from al-Tabari 7: Text Reuse Alignments

October 25, 2021 6 minute read

Post 7: Text Reuse Alignments

Post 8: Bibliography

September 8, 2023 4 minute read

Antrim, Zayde, ‘Nostalgia for the Future: A Comparison between the Introductions to Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and al-Khaṭīb al-Baghdādī’s Taʾrīkh...

Post 7: People, Connections and Memory

September 8, 2023 13 minute read

Imagine yourself as a learned bookseller of the twelfth century. You have just been called in to assess the estate of a wealthy, prominent scholar who has died...

Post 6: Searches for References to written materials outside of Isnads

August 29, 2023 16 minute read

As noted in the last post, we struggled to verify book citations in the TMD, both within and outside of isnāds. We believe that our struggles reflect the cha...

Post 5: Ibn ‘Asakir’s Citation of Author Names in Isnads

August 24, 2023 47 minute read

We continue our investigation of Ibn ʿAsākir’s citations to address our third question about his working methods. When author names appear within his isnāds,...

Post 4: Ibn ‘Asakir’s Transmission Terms in Isnads

August 22, 2023 31 minute read

Our previous blog post featured a deep dive into the pool of informants whom Ibn ʿAsākir cites frequently. Now we turn to the big picture of how he says he a...

Post 3: Ibn ‘Asakir’s Direct Informants

August 17, 2023 49 minute read

Ibn ʿAsākir names many persons from whom he acquired information for the TMD. What can our data tell us about them?

Post 2: Ibn ‘Asakir and His History of Damascus, the Data Set

August 11, 2023 15 minute read

Digital humanists often say they would like to read more work in progress. Our blog posts represent such work. We worked intensively over months to create an...

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

First Five Hundred Years of the Arabic Book: The Native Origin of the Authors

April 29, 2021 10 minute read

Quantitative and macroanalytic approaches

Mapping Who’s Who in Isnads – First Steps

October 5, 2020 10 minute read

One of the major challenges for those working with historical Arabic texts lies in names, and in the variety of ways that authors might refer to the same per...

Oh Brethren, Where Are Ye? How to search for words and phrases in the OpenITI corpus, demonstrated with the phrase ‘Ikhwan al-Safa’

February 9, 2022 14 minute read

The OpenITI corpus is designed to facilitate many different forms of computational analysis. Within the KITAB project we spend the bulk of our time fine-tuni...

Post 8: Bibliography

September 8, 2023 4 minute read

Antrim, Zayde, ‘Nostalgia for the Future: A Comparison between the Introductions to Ibn ʿAsākir’s Taʾrīkh Madīnat Dimashq and al-Khaṭīb al-Baghdādī’s Taʾrīkh...

Post 7: People, Connections and Memory

September 8, 2023 13 minute read

Imagine yourself as a learned bookseller of the twelfth century. You have just been called in to assess the estate of a wealthy, prominent scholar who has died...

Post 6: Searches for References to written materials outside of Isnads

August 29, 2023 16 minute read

As noted in the last post, we struggled to verify book citations in the TMD, both within and outside of isnāds. We believe that our struggles reflect the cha...

Post 5: Ibn ‘Asakir’s Citation of Author Names in Isnads

August 24, 2023 47 minute read

We continue our investigation of Ibn ʿAsākir’s citations to address our third question about his working methods. When author names appear within his isnāds,...

Post 4: Ibn ‘Asakir’s Transmission Terms in Isnads

August 22, 2023 31 minute read

Our previous blog post featured a deep dive into the pool of informants whom Ibn ʿAsākir cites frequently. Now we turn to the big picture of how he says he a...

Post 3: Ibn ‘Asakir’s Direct Informants

August 17, 2023 49 minute read

Ibn ʿAsākir names many persons from whom he acquired information for the TMD. What can our data tell us about them?

Post 2: Ibn ‘Asakir and His History of Damascus, the Data Set

August 11, 2023 15 minute read

Digital humanists often say they would like to read more work in progress. Our blog posts represent such work. We worked intensively over months to create an...

Post 1: Introducing Ibn ‘Asakir and His History of Damascus

August 11, 2023 32 minute read

The OpenITI corpus contains more than 11,000 works and now exceeds 2 billion words in size. Many of the corpus’s works are extraordinarily large, surpassing ...

A Ramble Through the Cluster Data, Part 2: Quantifying and Visualising Clusters.

June 21, 2022 9 minute read

In part 1, I introduced you to the cluster data set, a second passim data set that is slightly different from the pairwise data set that the KITAB team use i...

A Ramble Through the Cluster Data, Part 1: From Pairs to Clusters.

May 19, 2022 10 minute read

It should be no surprise to any reader of this blog that the KITAB project is primarily interested in studying Arabic text reuse. A large number of posts her...

Can Digital Humanities Be Informed by Bioinformatics? Visualising Passim Data for Multiple-Book Relationships

April 29, 2021 9 minute read

As KITAB’s research has shown, passim is an incredibly powerful tool for answering a variety of questions about book history and history in general. The algo...

Tracing the origins of a historical fragment focused on the Samanids

December 11, 2020 2 minute read

At the Arabic Pasts conference this year, Hugh Kennedy and I presented a paper in the panel dedicated to the Invisible East programme, chaired by the program...

Between Manuscripts and Digital Texts: Commentaries on Hadith Raʾs al-Jalut

September 30, 2020 12 minute read

For us as digital historians and corpus curators, faced with the complex history of reception and transmission as well as the distinct approach to learning a...

Adventures in Alignments: Training an Algorithm to Recognise Text Reuse

August 7, 2020 9 minute read

Text reuse is the term that we use to describe cases where one book shares verbatim material with another. Text reuse can be studied manually through the rea...

Algorithmic Reading of Shiʿi Hadith Collections: Direct Borrowing and Common Sources

June 22, 2020 13 minute read

It is not accidental that a large number of books in the OpenITI corpus belong to one important genre, prophetic Hadith – the sayings of the Prophet Muhammad...

On Commentaries, Digressions, Transtextualities, and Rabbit Holes

December 3, 2019 5 minute read

Running the passim algorithm on the OpenITI corpus allows us to identify a vast number of instances of text reuse, but the quality of these results from a hi...

A First Look at KITAB’s Data

September 7, 2018 4 minute read

The digital revolution is arriving rather late to Middle Eastern studies, but it is coming fast.

Detecting What Authors Took from Earlier Works

May 2, 2018 7 minute read

With text reuse detection, we rely on the power, speed and memory of a computer to find common passages between texts.

A Tale of 3 “Versions”

September 10, 2017 11 minute read

Measuring variation in the early tradition

New KITAB visualizations

December 3, 2021 15 minute read

Much of our work at KITAB involves comparing books in order to understand their relationships. Our main tool for this is the passim software, which detects p...

Some Suggestions on Using OpenITI Corpus to Present Enhanced Digital Versions of Large Collections: The Case of al-Dhari‘a Ila Tasanif al-Shi‘a

November 22, 2021 13 minute read

Tagging the structure of the texts in the OpenITI corpus is an important step towards the ultimate goal of the KITAB project. Studying the Arabic textual tradition...

Posts by tags

book-history

release

IbnAsakir

OpenITI

Shamela

Tender-Notice

author-practice

author-practice data

bias

book-forms

book-history

citation

corpus

corpus data

cultural-memory

data

dispersed-texts

druze-corpus

events

iran

khurasan

machine-learning

manuscripts

markdown