Network analysis

Network analysis is a fundamental and well-developed field in the Digital Humanities. KITAB team members are using various computational methods to analyse networks through both our text reuse data and manually curated data sets, documenting connections between books and scholars. For this work we do not rely on a single defined tool, and we encourage you to read our blog to explore the various methods we draw on.

There are two main directions that we are following in network analysis:

Networks from isnads

As we note here, we seek to automatically identify isnads in texts across the corpus. One aim of this work is to convert the automatically identified isnads into strings of related names, which can then be interpreted as networks.

Converting isnads into networks is a complex process, because it requires addressing both name variation (that is, one person being referred to by multiple names) and shared names (that is, multiple people being referred to by the same name). We are working to resolve these problems computationally, using the position of names within isnads to infer instances of name variation and to distinguish shared names. We are also approaching the latter problem by examining the position of isnads within texts, since names might be shortened on their second or third mention.
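To make the conversion concrete, here is a minimal sketch in Python of how a string of names from one isnad might become network edges. It is an illustration under stated assumptions, not the method the team uses: the function names, the `aliases` lookup table and the placeholder names are all hypothetical, and each edge simply links a name to the name that follows it in the isnad. The lookup table addresses only name variation; it does nothing to distinguish shared names, which needs the positional evidence described above.

```python
from collections import Counter


def normalise(name, aliases):
    """Map a known variant to one canonical form (hypothetical lookup table)."""
    return aliases.get(name, name)


def isnad_to_edges(names, aliases):
    """Link each name to the next name in the isnad, in order of appearance."""
    canonical = [normalise(name, aliases) for name in names]
    return list(zip(canonical, canonical[1:]))


def build_network(isnads, aliases):
    """Count how often each directed edge occurs across many isnads."""
    edges = Counter()
    for names in isnads:
        edges.update(isnad_to_edges(names, aliases))
    return edges


# Placeholder data, not drawn from the corpus.
isnads = [
    ["Narrator A", "Narrator B", "Narrator C"],
    ["Narrator A", "N. B", "Narrator D"],
]
aliases = {"N. B": "Narrator B"}  # assumed to be one person under two names

network = build_network(isnads, aliases)
for (first, second), count in sorted(network.items()):
    print(first, "->", second, count)
```

The resulting edge counts can be passed to any graph library or visualisation tool; the sketch stays with the standard library so that it does not imply a particular tool.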

Sarah Savant and Masoumeh Seydi are working together to create ground truth of the names used in the isnads of two authors: al-Tabari and Ibn ʿAsakir. Ryan Muther is addressing the problem computationally, using the ground truth to test and improve his methods. This analysis is a work in progress, and for the latest updates you should see our blog posts on the subject.

Networks from text reuse data

We hope to convert some of our text reuse data into networks that illustrate how pieces of text are shared and disseminated. This is particularly challenging because of the size of our corpus and because of the large amount of Hadith within our texts (which can potentially be shared across hundreds of texts). The passim algorithm outputs a data set called ‘Cluster data’, which documents how the milestones in our texts are networked together. For an explanation of how passim works and what milestones are, see here. Unfortunately, this data set is messy and difficult to read, largely because of isnads, which create non-meaningful clusters of shared names.
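As a rough illustration of how cluster data could be read as a network, the Python sketch below links two books whenever they appear in the same cluster and weights the link by the number of clusters they share. The record layout (a cluster identifier paired with a book identifier), the function name and the `max_books` cut-off are assumptions made for this example; they are not the actual passim output format or a filter the team has adopted. The cut-off shows one possible way to set aside clusters shared by very many texts.

```python
from collections import Counter, defaultdict
from itertools import combinations


def book_network(records, max_books=None):
    """Weight each pair of books by the number of clusters they share.

    records: iterable of (cluster_id, book_id) pairs (assumed layout).
    max_books: if set, skip clusters that span more books than this.
    """
    books_by_cluster = defaultdict(set)
    for cluster_id, book_id in records:
        books_by_cluster[cluster_id].add(book_id)

    edges = Counter()
    for books in books_by_cluster.values():
        if max_books is not None and len(books) > max_books:
            continue  # cluster is too widely shared to be informative
        for pair in combinations(sorted(books), 2):
            edges[pair] += 1
    return edges


# Placeholder records, not real cluster data.
records = [
    (1, "book_a"), (1, "book_b"),
    (2, "book_a"), (2, "book_b"), (2, "book_c"),
    (3, "book_b"), (3, "book_c"),
]

all_edges = book_network(records)
filtered_edges = book_network(records, max_books=2)
```

With the placeholder records, cluster 2 spans three books, so setting `max_books=2` drops it and removes the only link between `book_a` and `book_c`.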

We have recently experimented with running passim without isnads (by excluding the automatically tagged isnads produced through this method), and we will explore how this changes the cluster data. We are also working on producing networks from the text reuse data using other methods. Watch this space!