Challenges

The digital methods used by the KITAB team are rarely new. They have been adapted from approaches that are already commonly used in other disciplines. Computational analysis of text is well advanced for English and other Latin-script languages. If you have performed a Google search or used Turnitin, you have probably already relied on some of the methods that we use in our analyses.

The field of digital analysis is largely guided by the needs of the present. Many existing methods were developed for English or other Latin-script languages. Where they have been adapted or developed for Arabic, they have been geared towards Modern Standard Arabic (even Arabic dialects remain relatively underrepresented). This is particularly true of methods that rely on machine learning. For languages that are in current use, especially on the internet, there is a ready base of data that can be used to train a language model. Wikipedia, for example, provides an accessible supply of proper nouns for the automatic detection of names in texts. For more on how we use machine learning, see our work on subgenre classification.

These methods have not been developed with classical Arabic in mind. KITAB therefore works in collaboration with computer scientists, pairing expertise in computer science with expertise in the humanities to adapt digital methods for use with classical Arabic. This is an iterative, multi-stage process, and we expect the effectiveness of our methods to improve over time.

To observe the evolution of our methods and the processes that we follow to adapt them for classical Arabic, see our blog.