Introduction to our methods

Welcome to the methods section of the website. The KITAB team is proud to be involved in research that is at the cutting edge of computer science and natural language processing. We are keen to stress that we do not treat methods in natural language processing as a black box. These are methods that have to be intuitively refined to suit the languages and texts to which they are applied (on these issues, see challenges). They must be improved through collaboration between experts working with the methods (typically computer scientists) and experts working with the texts.

This is particularly the case because a number of our methods make use of machine learning. Machine learning is becoming well known, especially for its recent advances in the sciences. In our case, using machine learning involves identifying instances of certain features in a text (such as the isnad) – that is, creating a set of training data – and then using those instances to train a computer to recognise the same features in texts it has not previously seen. It is an iterative process that can involve the creation of several training data sets to further refine the computer’s model (for example, if the model performs poorly on certain genres of text). The work we do that uses machine learning is therefore a truly collaborative process. We use machine learning mostly for problems of subgenre classification, which in turn informs our research into citation practices and scholarly networks.
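For readers who find code easier to follow than prose, the label-train-predict cycle described above can be sketched in a few lines of Python. This is only an illustration and not the KITAB pipeline: the classifier (a small naive Bayes model with add-one smoothing), the function names `train` and `predict`, the labels and the transliterated toy passages are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled passages (the "training data").
# Real training data would be annotated by experts on the texts.
TRAINING_DATA = [
    ("haddathana fulan an fulan qala", "isnad"),
    ("akhbarana fulan qala haddathana fulan", "isnad"),
    ("wa kana al-malik fi al-madina", "other"),
    ("thumma sara al-jaysh ila al-balad", "other"),
]

def tokenize(text):
    # Naive whitespace tokenisation, enough for this toy example.
    return text.lower().split()

def train(examples):
    """Count how often each word occurs under each label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return {"label_counts": label_counts,
            "word_counts": word_counts,
            "vocab": vocab}

def predict(model, text):
    """Return the most probable label for a passage the model has not seen."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, count in model["label_counts"].items():
        score = math.log(count / total)  # prior probability of the label
        denom = sum(model["word_counts"][label].values()) + vocab_size
        for token in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((model["word_counts"][label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING_DATA)
print(predict(model, "haddathana zayd an amr"))
```

The iterative refinement mentioned above corresponds to adding newly annotated passages to `TRAINING_DATA` and calling `train` again, for example after finding a genre on which the predictions are poor.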

It is for these reasons that we give so much emphasis to methods here. Anyone using our data or reading our publications should be able to read about, understand and critique our methods. This section also gives us the space and opportunity to keep you updated about our work to improve the methods that we use for text analysis.

We have decided to split this section of our website into two. The first part, ‘How We Study’, describes the computational methods that we employ and that we are involved in developing. This part engages with the more technical dimensions of how these methods work and how we are working to improve them. Those involved in research in natural language processing or other fields of computer science may find these pages of greatest interest.

The second part, ‘What We Study’, showcases how we are employing these various methods to answer historical questions. This part is grounded in real research and provides links to our latest blog posts on the subject. If you are researching history or Arabic literature, these pages are more likely to be of interest to you.

Nonetheless, we encourage you to read these pages regardless of your disciplinary background. They remain our official guide to the methods that we use and the manner in which they work. We will reference these pages in our blog posts and publications and will update them as our work advances.

Why read these pages?

We have provided these pages with a number of audiences in mind (so we ask specialists to please bear with us). You might be interested in these pages if:

you are using our apps and visualisations for your research and wish to know how we produced the data behind them (in this case, we urge you to read this material so you can fully understand the caveats of our methods);
you work in the Digital Humanities (whether in Arabic or not) and would like to learn from the methods we used (in this case, we recommend that you also consult the documentation and check out our blog);
you work in natural language processing or similar fields in computer science and would like to know how we are adapting the latest approaches for classical Arabic – perhaps you have suggestions for methods that we should adopt.

Do you see a way you might help us further our research? If so, reach out to us and get involved.