The vast majority of texts in the OpenITI corpus were sourced from three major collections of digital texts originally prepared by organisations based in the Middle East (see Peter Verkinderen’s excellent blog on the largest of them, Shamela). These collections have proven invaluable to researchers, not in the least because they have facilitated digital forms of analysis, but they all made choices about which textual materials should belong on their platforms and which should not. Shamela for example did not originally include any poetry and focused largely on Sunni traditionalism. Another significant omission in each of these collections is the rich corpus of Arabic texts written by and for non-Muslims.

While it would be naive to think that OpenITI is itself unbiased, we do aim to be as inclusive as possible and try to address several of the issues found in other collections. First, we bring the separate collections together and engage all of them in a continuous dialogue on text reuse. Second, we identify and try to fill gaps in the corpus through digitising texts ourselves or through collaborating with other research teams. This has led to the inclusion of texts digitised by other academic projects in fields that are of only marginal interest to the major text collections. Due to the concerted efforts of Peter Verkinderen and Lorenz Nigst, we now have a number of agrarian treatises, Greco-Arabic philosophical materials, texts of ancient wisdom, as well as some materials preserved in the so-called Cairo Geniza. We list all these various collections and individual contributors with their URI identifiers here.

In the meantime, I have also contributed a number of texts I am interested in by using Optical Character Recognition software, typing up manuscript treatises, and by contacting people who published texts online. To encourage people to also contribute such materials they may have to our corpus, I will here briefly describe a small set of texts I have contributed to the OpenITI in this way. If readers of this blog feel inclined to contribute textual materials in exchange for text reuse data, they should get in contact with KITAB team members and we will happily accept their offers.

Some time ago I was researching historical accounts about a Mamluk expedition against the Nusayri-ʿAlawis in the Lebanese Kisrawan region (the rugged mountain terrain north and northeast of Beirut) which took place in 704/1304. To my pleasant surprise, I found a number of Maronite sources who discuss these events. One of these was a type of text I had not encountered before in my readings of Arabic literature: an epic poem in the zajaliyya form written by the Maronite bishop Jibraʾil Ibn al-Qilaʿi (d. 921/1516). Ibn al-Qilaʿi had been instrumental in spreading adherence to the Catholic Pope amongst Maronites in the region of present-day Lebanon, and thus strengthening the full communion with Rome which the Maronite Church still follows. The poem is usually designated as Madiha fi Jabal Lubnan. and describes events of heroic resistance against outsiders by Maronite chiefs from the dawn of time up to his own day. It consists of 294 quatrains of which the first three hemistiches rhyme with each other and the fourth ends on a monorhyme sustained throughout the entire poem. While there are many Arabic poems discussing or commenting on historical events, they do not tend to be narrative in the way an epic poem usually is. Ibn al-Qilaʿi’s work, by contrast, is highly narrative, if not in a linear chronological way. Furthermore, Ibn al-Qilaʿi’s introductory verses express some interesting ideas about time, history and memory which I am hoping to unpack more in further research. Consider for example this quatrain which seems to conceive of an important role for poets in preserving cultural memory:

وكيف العصور بتتغير   *   وفيها العقول بتتحير  *  ولولا توجد في الاسطار  

 ما كان يخبر عنه انسان

How the ages change

while intellects get confused

if it weren’t for you finding in the lines of text

what informs man about himself

Because of its relatively short length I decided to digitise this poem by using Google’s OCR engine which can be used via Google Drive. The added personal benefit is that by post-correcting the digital text I am effectively skim reading it as a whole, familiarising myself with its structure and identifying interesting sections to return to in more detail. Ibn al-Qilaʿi’s text was edited by Bulus Qaraʾli in 1937 as part of his book Hurub al-muqaddamin (“The Wars of the [Maronite] Commanders”), a blend of primary texts and commentary by Qaraʾli on the Maronite community’s vicissitudes in dealing with their surroundings across the ages. Because of this hybrid publication form, digitising this edition posed a number of challenges and obliged me to make some editorial choices. We always record these kinds of decisions in a so-called YML-file, a specific format of markup language which we use to record various forms of metadata in a formalised and computer-readable way. We use three forms of such files: one for author metadata (which includes an author’s birth and death date, various geographical locations, as well as personal relationships), one for general book metadata (including a.o. variant titles and relationships to other books), and one for version metadata. Below is an example of my version metadata for Ibn al-Qilaʿi’s Madiha

00#VERS#CLENGTH##: 20224
00#VERS#LENGTH###: 4673
00#VERS#URI######: 0921JibrailIbnQilaci.MadihaFiJabalLubnan.GVDB20200821-ara1
90#VERS#ANNOTATOR: Gowaart Van Den Bossche
90#VERS#COMMENT##: This text was digitised with Google OCR and corrected by Gowaart. I have followed the edition's lack of consistently noting the hamza and using a generic ha instead of ta marbuta. The edition of the text is somewhat complicated. It interlaces the poem included here with a 19th-century anonymous mukhtasar (see Salibi, "Maronite Historians of Medieval Lebanon", p. 39) as well as a commentary by the editor. For OpenITI purposes I felt it was best to separate these three layers, and assign separate URIs to the poem and the mukhtasar, while deleting the editorial commentary. I have also deleted the titles, as these seem to be part of the commentary rather than the poem. As a result the pagination looks like the text has gaps, but this is entirely due to my reorganisation of the texts.  The poem is a zajaliyya in Lebanese colloquial. This poetic form is in quatrains, as opposed to the usual Arabic two hemistiches. Currently mARkdown does not accommodate this form (in which the first three hemistiches rhyme with one another, while the last hemistich follows the endrhyme of each other quatrain in the poem), so I have followed some other editors of other texts by splitting the quatrain into two parts. This way it looks identical to other Arabic poetry, except that the end rhyme only happens every two lines. 90#VERS#DATE#####: 2020-08-21

See Hamidreza’s blog for another example of a YML file and some discussion of how it should be used within the OpenITI annotation process

Having grown interested in Ibn al-Qilaʿi (a more malignant observer might say I fell down a substantial rabbit hole) I did some further research and stumbled upon the Academia page of the Lebanese cleric P. Elie Saade. He had edited a few zajaliyyas he found in manuscripts across his home country and published them on his page. I contacted him directly to ask whether he would want to contribute his editions to the OpenITI, something he happily agreed to do. As such, we were able to add one more poem written by Ibn al-Qilaʿi (a hagiography of the third century saint Mar Nuhra) to our corpus, as well as a 19th-century zajaliyya by a Lebanese priest. Saade had transcribed both of these into Arabic script from the Karshuni (Arabic in Syriac script) originals in which they have been preserved. He has also since then notified me of another zajaliyya he published which deals with the experience of the First World War in Lebanon. I hope to add this text to the corpus in the near future. You can find the OpenITI versions of the two first texts on the links below. Note the PES+date at the end of the URI, this identifies P. Elie Saade as contributor to the corpus. Note also that in the above-cited YML file the URI bears my initials GVDB as I was the one who digitised this text through OCR. In this way we keep track of the diverse origins of text versions in our corpus while harmonising their contents as much as possible. 


Through such additions our corpus becomes a little more balanced, but these are of course only small steps towards capturing the true diversity of the Arabic written tradition. There is still a lot of work to be done to achieve a more rounded balance in the corpus. Indeed, three new Christian Arabic texts within a corpus of some 5,000 individual texts is but a drop in an Islamic ocean. It will also come as no surprise that neither of these three texts showcases a lot (if any) reuse. The one case of reuse identified by passim for Ibn al-Qilaʿi is from a late 19th-century compendium of Christian Arabic texts compiled by the great Jesuit Chaldean Catholic scholar Louis Cheikho (d. 1927), in which a number of verses from the Madiha are directly quoted. Even fully capturing the relatively minor zajaliyya tradition poses further challenges. Because they are generally seen as of limited literary value (Kamal Salibi called Ibn al-Qilaʿi’s poems “dull”) and because of their highly regional character, only very few of these texts have been edited and published. As noted, the majority only exist in manuscripts written in Karshuni, so they require people fluent in both scripts to properly transcribe and edit them. They are also largely written in colloquial Arabic. This issue of script and language is another aspect we would like our metadata to reflect better, especially as more Geniza materials (much of which is written in Judaeo-Arabic, that is, Arabic written in Hebrew script) will be added to the corpus in the future. 

What I wanted to highlight more generally is that pooling the various resources of Arabic texts increasingly available online into one harmonised corpus is all well and good, but if we want to really remedy major gaps and issues we also need concerted efforts from scholars to identify gaps in the corpus, as well as to actively curate and contribute materials. Here are a few ways in which you can get involved:

1. If you notice relevant texts/genres/… are missing in our corpus, we can show you how to add these texts yourself (or you can simply notify us that we should include them, especially if you can propose a way to acquire them in digital format).

2. If you are a specialist, you could become a curator for texts for your field, identifying gaps, creating lists of texts to be digitized, and so on.

3. Inform us about other projects that have digitized texts that may not be in our corpus.


The most comprehensive treatment of Ibn al-Qilaʿi remains K.S. Salibi, Maronite Historians of Medieval Lebanon (Beirut 1959).

The photograph used in this blog was taken by Gowaart in May 2017 in one of the Maronite monasteries in the beautiful Kadisha Valley below Bsharré in North Lebanon.
Share this: