Reading Thebaid 2

Posted on November 5, 2012 by ncoffee

Kyle Gervais of the University of Otago is working on a commentary on Statius Thebaid book 2, and emailed comments on his use of Tesserae. It’s encouraging to see scholars putting the system to use in this way, and to get some perceptive feedback.

I’ve been using Tesserae in writing my commentary on Statius, Thebaid 2. Of course, it’s not my primary tool for tracking down intertexts, since it doesn’t understand context and doesn’t handle synonyms or sound-alike words very well (although I understand that these are areas under development). I typically use it after I’ve written notes on a hundred lines or so, to help me catch any intertexts I’ve missed through traditional methods. I work at a slow pace (no more than two lines of poetry per day) and am very thorough in searching for intertexts (constant searches of the PHI database, consulting half a dozen ancient and modern commentaries and editions, trawling through papers on Statius and commentaries on other authors, and of course drawing on my own knowledge of the ancient sources). So it’s impressive how many new intertexts Tesserae picks up. An example:

After finishing Theb. 2.1-101, I ran the lines against the Aeneid on Tesserae (using the basic search mode). I got 740 hits, and in 30–45 minutes I skimmed through them and found 10 promising hits that I hadn’t found in the traditional ways (I’m sure I could have cut out a lot of the poor-quality hits by manipulating the search settings, but I worry about missing things, and find it just as easy to skim). Of the ten, four led nowhere. Of the remaining six:

One reinforced an intertextual frame I already recognized (Hector’s epiphany in Aen. 2 as a frame for Laius’ epiphany): Theb. 2.101 pectora et has uisus fatorum expromere uoces, Aen. 2.280 compellare virum et maestas expromere voces. Obviously no one (including me) had thought to search for expromere uoces.

One helped to flesh out Laius’ role as an agent of discord: Theb. 2.99 infula per crines, glaucaeque innexus oliuae [/ uittarum prouenit honos], Aen. 6.281 ‘[Discordia] vipereum crinem vittis innexa cruentis’. On a slow day, I might have searched the PHI for innex-, but on most days it would have seemed like a waste of time. Even if I had, I might have skimmed past Aen. 6.281 (since crinem wouldn’t have been highlighted).

Two revealed a subtle link between the underworld at Theb. 2.48ff. and Priam’s palace at Aen. 2.486ff.: Theb. 2.49 uacua atria ditat, Aen. 2.528 uacua atria lustrat; Theb. 2.51 stridor ibi et gemitus poenarum, atroque tumultu…, Aen. 2.486 at domus interior gemitu miseroque tumultu…. I never thought to search for uacua atria, and never would have searched for gemit- + tumult-.

Two were really exciting:

Baccho + matres pointed to: Theb. 2.79f. ipse etiam gaudens nemorosa per auia sanas / impulerat matres Baccho meliore Cithaeron and Aen. 7.580ff. tum quorum attonitae Baccho nemora avia matres / insultant thiasis (neque enim leue nomen Amatae) / undique collecti coeunt Martemque fatigant. A clear intertext, and more importantly, a good (very modern and very much in Statius’ style) explanation for Baccho meliore, which has been a crux: Bacchus is ‘better’ than he was in the Aeneid.

Finally, Theb. 2.42 (a mountain’s shadow on the water) exigit atque ingens medio natat umbra profundo and Aen. 5.422f. (Entellus) magna ossa lacertosque / exuit atque ingens media consistit harena (note the added correspondence between exigit and exuit, which Tesserae can’t [yet?] pick up). It’s a genuine and interesting intertext, I think, but I never would have found it myself: the contexts aren’t obviously similar, I wouldn’t have had time to search the PHI for atque, ingens, or medius (too many hits), and it wouldn’t have occurred to me to search for combinations of any of those three words. It’s most exciting to me because it’s the kind of intertext that always gets missed since we’re not very good at thinking in the proper way (my comment on the link: ‘An intertext perhaps best *read in reverse*, as an augmentation of Virgil: thanks to Statius the mighty Entellus casts a shadow big as a mountain’).
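The matches described above, where two lines share several words whatever their context, can be illustrated with a toy sketch. This is not Tesserae’s actual code: it lower-cases each line, unifies the u/v and i/j spellings (so that uoces and voces match), maps inflected forms to headwords through a small hypothetical lemma table, and reports the lemmas two lines share when there are at least two of them.

```python
import re

# Hypothetical, hand-made lemma table for illustration only; a real search
# would rely on a full Latin lemmatizer.
LEMMAS = {
    "medio": "medius",
    "media": "medius",
    "crines": "crinis",
    "crinem": "crinis",
}

def normalize(text):
    """Lower-case Latin text and unify orthographic variants (v -> u, j -> i)."""
    return text.lower().replace("v", "u").replace("j", "i")

def lemmas(line):
    """Return the set of (approximate) lemmas in a line of Latin verse."""
    words = re.findall(r"[a-z]+", normalize(line))
    return {LEMMAS.get(w, w) for w in words}

def shared_lemmas(line_a, line_b, min_shared=2):
    """Shared lemmas if there are at least `min_shared` of them, else an empty set."""
    common = lemmas(line_a) & lemmas(line_b)
    return common if len(common) >= min_shared else set()

theb = "exigit atque ingens medio natat umbra profundo"   # Theb. 2.42
aen = "exuit atque ingens media consistit harena"         # Aen. 5.423
print(shared_lemmas(theb, aen))   # atque, ingens, medius (set order may vary)
```

As in the Entellus example, exigit and exuit are not matched. The sketch also counts very common words such as et as matches, which is one reason a real search needs a way of scoring and ranking its hundreds of hits.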

On the shores of Lake Geneva

Posted on October 19, 2012 by ncoffee

In early November, Chris and I will travel to the Fondation Hardt in Geneva for a conference entitled “Lucain et Claudien, face à face: Une poésie politique entre épopée, histoire et panégyrique.”

Chris will hold a workshop explaining how the online tools work and how to use them. I’ll be giving a presentation on Claudian, the late-4th-century CE court poet, and how his epic makes use of the Civil War epic of his predecessor Lucan. My goal is to take advantage of Tesserae tools to offer a broader view of this interaction than has been available so far, and in the process to expand traditional conceptions of intertextuality a little.

The program is full of interesting topics to be addressed by distinguished scholars. We’re excited to be able to exchange views with this group. In particular, we’ll have a chance to speak face à face to Damien Nelis of Geneva and his colleagues and continue the discussion of our research partnership.

Visualizing Sound Patterns in Homer

Posted on October 16, 2012 by Chris Forstall

In his 1974 article “Sound-Patterns in Homer,” David W. Packard compared a wide range of critical opinions about the artistic use of sound in the poetics of the Iliad and Odyssey with a statistical analysis of letter frequencies. This is a seminal paper in digital humanities not only because Packard was a pioneer in designing the hardware and software necessary to digitize ancient Greek texts, but also because it addresses the interface between empirical data and critical interpretation, a problem that persists forty years on, despite huge advances in many areas of the field.

In the DHIB Textual Analysis Working Group, projects such as Tesserae attempt to adapt methods designed for such cold-blooded forensic purposes as authorship attribution and plagiarism detection to the humanistic goals of literary criticism. This means not only digitizing and analyzing, but also being able to return from statistics and data to subjective appreciation, and creating new value for readers. Here I want to show some preliminary results from my dissertation research, which benefits greatly from the intellectual cross-fertilization among the group’s various text-analysis efforts. I’ll draw some parallels to Packard’s work, emphasizing methods that I hope show the potential for digital interpretation as well as digital analysis of literary works.

The Iliad and Odyssey are, in one way or another, the products of a long oral tradition. Despite the uncertainty that intervening changes in both pronunciation and spelling impose on any understanding we can have of how these poems sounded in the first millennium BCE, it’s clear that sound was a vital component of their composition and appreciation. Packard was primarily investigating the question of whether sound patterns were the result of deliberate poetic artistry, but others have argued that they may have served an unconscious mnemonic role, allowing illiterate singers to store vast texts in memory using a sort of data compression.

In either case, digital analysis can aid us by providing the statistics to test theories about what sort of patterns exist. But can it also help us “read” the sounds of the poem in new ways, perhaps pointing us to new hypotheses we wouldn’t otherwise have formed?

Digital Analysis

Following Packard, I begin by breaking the poems down into an alphabet of sounds, most of which have one-to-one correspondence with orthographic characters. From these atoms we can work up hierarchically to lines, either via words and n-grams, or via syllables and feet. But for now, let’s just consider the sounds themselves. The question I want to examine is, do some sounds show an interesting distribution in the poems, and, if so, what does that look like?
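As a rough illustration of this first step, here is a simplified Python sketch (the original work used Perl, and this is not that code). It assumes Unicode Greek input and treats every letter as one sound. It drops accents and breathings, normalizes final sigma, and writes a rough breathing that precedes the word’s first consonant as Latin “h”, following the post’s own transliteration of initial /h/. Treating ῥ as plain ρ and counting pairs only within words are further assumptions of the sketch.

```python
import unicodedata

ROUGH_BREATHING = "\u0314"   # combining reversed comma (dasia) after NFD
VOWELS = set("αεηιουω")

def has_initial_h(decomposed):
    """True if a rough breathing occurs before the word's first consonant."""
    for c in decomposed:
        if c == ROUGH_BREATHING:
            return True
        if c.isalpha() and c not in VOWELS:
            return False     # a consonant came first (e.g. initial rho)
    return False

def sounds_of_word(word):
    """Split a Greek word into sounds: one per letter, plus 'h' for initial /h/."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # isalpha() drops combining accents, breathings and punctuation.
    letters = ["σ" if c == "ς" else c for c in decomposed if c.isalpha()]
    if has_initial_h(decomposed):
        letters.insert(0, "h")
    return letters

def sound_pairs(line):
    """Set of adjacent sound pairs occurring within the words of a line."""
    pairs = set()
    for word in line.split():
        s = sounds_of_word(word)
        pairs.update(a + b for a, b in zip(s, s[1:]))
    return pairs
```

For example, sounds_of_word("ἵππος") yields h, ι, π, π, ο, σ, so the word contributes the pairs hι, ιπ, ππ, πο and οσ.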

I downloaded the texts of the Iliad and the Odyssey from the Perseus Digital Library, concatenated them, then split them into 20-line samples. In order to get a feel for what kind of variation you might expect to see by chance alone, I created a control set where the lines of the two poems were randomly shuffled before splitting into 20-line samples. In fact, I did that ten different times. These ten control sets, then, represent a sort of background noise against which any pattern must clearly distinguish itself.
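The sampling and shuffling just described might look like this in Python. The 20-line sample size and the ten control sets follow the post. The fixed random seed, and leaving a short final sample in place, are assumptions of this sketch.

```python
import random

def split_samples(lines, size=20):
    """Cut a list of lines into consecutive samples of `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def make_control_sets(lines, n_sets=10, size=20, seed=0):
    """Shuffle the lines n_sets times and split each shuffle into samples."""
    rng = random.Random(seed)
    control_sets = []
    for _ in range(n_sets):
        shuffled = list(lines)    # copy, so the original order is preserved
        rng.shuffle(shuffled)
        control_sets.append(split_samples(shuffled, size))
    return control_sets
```

Each control set contains exactly the same lines as the poem, so any difference from the real text comes from the ordering alone.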

The graph below looks at the distribution of every unique pair of adjacent sounds that occurs in the two poems. The y-axis shows the portion of all samples in which a pair is found. Sound-pairs are ranged along the x-axis from most common (on average across the ten control sets) at the left, to least common at the right. The most common sound pairs occur in all samples, the least common in only one or two.
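The quantity on the y-axis can be sketched as the share of samples in which a feature occurs at least once, computed separately for the real text and for each control set. The x-axis order is the mean share across the control sets. Here `features_of_line` is a hypothetical stand-in for any function that returns the set of features in a line, such as a sound-pair extractor.

```python
from collections import Counter
from statistics import mean

def presence_fraction(samples, features_of_line):
    """Share of samples in which each feature (e.g. a sound pair) occurs at least once."""
    counts = Counter()
    for sample in samples:
        present = set()
        for line in sample:
            present |= features_of_line(line)
        counts.update(present)           # each feature counted once per sample
    return {feature: n / len(samples) for feature, n in counts.items()}

def order_by_control(control_fractions):
    """Order features from most to least common, averaged over the control sets."""
    features = set().union(*control_fractions)
    average = {f: mean(cf.get(f, 0.0) for cf in control_fractions) for f in features}
    return sorted(features, key=lambda f: (-average[f], f))
```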

There are ten superimposed red curves, one for each of the control sets. The black curve represents the poem in its proper order. You can see that the black curve falls away from the red in places. At those points, a sound pair is found in rather fewer samples than you’d expect by chance alone. This means that in the original text it clumps together in some samples, leaving others bare.
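One simple way to turn “the black falls away from the red” into a test is to flag sound pairs whose share of samples in the real text is lower than in every control set. This criterion, and the optional `margin`, are assumptions of this sketch rather than the post’s own procedure.

```python
def clumped_features(observed, control_fractions, margin=0.0):
    """Features found in fewer samples of the real text than in every control set.

    Returns {feature: shortfall}, largest shortfall first.
    """
    flagged = {}
    for feature, share in observed.items():
        lowest_control = min(cf.get(feature, 0.0) for cf in control_fractions)
        if share < lowest_control - margin:
            flagged[feature] = lowest_control - share
    return dict(sorted(flagged.items(), key=lambda item: -item[1]))
```

With only ten control sets this is a coarse screen, useful for picking candidates such as hι and δυ, not a significance test.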

Here’s a close-up showing two prime candidates for interesting behavior, hι and δυ. (I transliterated initial /h/ with a Latin “h” because it has no Greek letter.)

While this chart gives us a clue about which sounds might be interesting, it is a far cry from “interpretable” in a literary sense. Packard’s approach is similar. He begins with a chart showing, for each sound, the number of lines in which it does not occur at all, the number of lines in which it occurs once, twice, and so on (e.g. his Table 1). In another giant table, he lists all the lines in which a given sound occurs unusually frequently (e.g. his Table 3).
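Tables of the kind Packard uses are easy to reproduce in outline. The sketch assumes a `sounds_of_line` function that returns the list of sounds in a line. The threshold for “unusually frequent” is left as a parameter because the post does not state Packard’s criterion.

```python
from collections import Counter

def occurrence_distribution(lines, sound, sounds_of_line):
    """How many lines contain `sound` 0, 1, 2, ... times."""
    return Counter(sounds_of_line(line).count(sound) for line in lines)

def dense_lines(lines, sound, sounds_of_line, threshold):
    """(line index, count) for each line where `sound` occurs at least `threshold` times."""
    result = []
    for i, line in enumerate(lines):
        n = sounds_of_line(line).count(sound)
        if n >= threshold:
            result.append((i, n))
    return result
```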

These tables serve two functions for Packard. First, where a critic has claimed that a particular line is notable for the density of some sound or other, Packard can tell at a glance how many and which other lines share the same characteristic. Second, he can survey the most “interesting” single lines to see whether they tend to be particularly charged with literary significance. But can these data be reintegrated into a new reading? Can computational techniques be turned from analysis to interpretation?

Digital Interpretation

Packard makes an exciting attempt in this direction, although he cautions that as it stands it is overly simplistic, undertaken “purely as an experiment.” He turns to the work of Dionysius of Halicarnassus, a scholar of the first century BCE who assessed the relative “harshness” of every letter of the Greek alphabet and used this as the basis for poetic criticism. Assigning to every sound a numerical value based on Dionysius’ rankings, Packard calculates for every line in the Iliad and Odyssey a “Dionysian” harshness metric.
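The post gives neither Dionysius’ rankings nor Packard’s exact formula, so the sketch below simply assumes that a line’s score is the average harshness weight of its sounds. The weight table is a placeholder for illustration and is not Dionysius’ actual scale.

```python
# Placeholder weights for illustration only: NOT Dionysius' actual rankings.
EXAMPLE_WEIGHTS = {"α": 1.0, "ρ": 2.0, "ξ": 3.0}

def harshness_score(sounds, weights):
    """Average harshness of a line's sounds; sounds with no weight count as 0."""
    if not sounds:
        return 0.0
    return sum(weights.get(s, 0.0) for s in sounds) / len(sounds)
```

Averaging rather than summing keeps longer lines from scoring as harsher merely because they contain more sounds. Packard’s metric may have been defined differently.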

My approach to reintegrating sound frequencies into a subjective appreciation of the larger poem draws on techniques I used when I studied satellite image processing in the Earth and Environmental Science department at Lehigh University. There we would visualize three variables from a larger set simultaneously by assigning them to red, green, and blue intensities respectively. In the following figures, each square represents twenty lines of text. The texts proceed from left to right, top to bottom, beginning with the first line of the Iliad.
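The colour composite can be approximated by rescaling each of three per-sample measures to 0–255 and reading them as the red, green and blue channels of one pixel per 20-line sample. The min-max rescaling and the grid width are assumptions. The post does not say how intensities were scaled, and the images themselves were drawn in Processing.

```python
def rescale(values):
    """Linearly map values onto 0-255; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

def rgb_grid(red, green, blue, width):
    """Combine three per-sample measures into rows of (r, g, b) pixels.

    Samples run left to right, top to bottom, `width` samples per row.
    """
    pixels = list(zip(rescale(red), rescale(green), rescale(blue)))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```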

In this first image, the red value represents density of the sound-pair hι, green represents ιπ, and blue represents ππ. These sounds are all components of the word ἵππος, “horse,” and the biggest bright stripe (a little more than halfway down, on the left) represents the chariot race in Iliad Book 23. Compare the picture above with the one below, made in the same way but using the first control set.

The control set shows the same variability among samples, but no large-scale patterns like the bright stripe in the first picture.

In my first experiment, the three variables used to create the colors tended to co-vary, being part of the same relatively common word. In the next example, they show more independence. Here I used sound triplets: red shows the density of the string δυσ, green represents χιλ, and blue represents τυδ. The frequencies of these strings are dominated by the presence of three main characters, Odysseus, Achilles, and Diomedes (“son of Tydeus”).
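Counting a string of three sounds per sample is a small extension of the pair counts. In this illustrative sketch (not the original Perl), the text is lower-cased, accents and breathings are stripped and final sigma is normalized. The result is the number of occurrences per line within a sample. Search strings must therefore be given in the same plain form, e.g. "δυσ".

```python
import unicodedata

def plain(text):
    """Lower-case Greek with accents and breathings removed, final sigma normalized."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.replace("ς", "σ")

def ngram_density(sample_lines, ngram):
    """Average number of occurrences of a sound string per line of a sample."""
    count = 0
    for line in sample_lines:
        text = plain(line)
        start = text.find(ngram)
        while start != -1:              # overlapping matches are counted too
            count += 1
            start = text.find(ngram, start + 1)
    return count / len(sample_lines)
```

Because spaces are kept, a string never matches across a word boundary.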

The huge red region at the bottom is books 5-24 of the Odyssey. The green region in the middle is where Achilles returns to the fighting in the later part of the Iliad. Near the beginning is a blue section corresponding to the Aristeia of Diomedes.

For now, this analysis remains relatively crude, and limited to showing content-driven patterns in sound, rather than purely stylistic ones. My original aim was to perform principal components analysis on all the sound frequencies together, then assign the three color intensities to the first three principal components. So far, though, it’s turned up nothing appreciably different from what you see in the control sets.
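The planned principal components step could be sketched with NumPy’s singular value decomposition. Whether to standardize the frequencies first, and how to map component scores to colour intensities, are open choices. This version only centres each feature and rescales each of the first three components to 0–255.

```python
import numpy as np

def pca_rgb(freqs):
    """Project samples onto their first three principal components, scaled to 0-255.

    freqs: array of shape (n_samples, n_features), e.g. sound frequencies per sample;
    at least three samples and three features are assumed.
    """
    X = np.asarray(freqs, dtype=float)
    X = X - X.mean(axis=0)                           # centre each feature
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:3].T                            # coordinates on PCs 1-3
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)           # avoid division by zero
    return np.round(255 * (scores - lo) / span).astype(int)
```

Each row of the result can serve as the (r, g, b) value of one sample. The same treatment applied to a control set gives the baseline for comparison.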

Instead, let me close with a tribute to Packard’s approach. Here I’ve calculated his “Dionysian” score for each of my samples and assigned it to a grey scale value. Brighter samples are harsher sounding, to Dionysius of Halicarnassus’ ear, at any rate, while the black squares represent the most mellifluous passages.

But Packard’s metric was designed to examine the sound of individual lines. Perhaps it would be better read in this way:

The graphs above were made using R and the other pictures using Processing; I used Perl for everything in between. I’d appreciate advice or comments on any aspect of this from one and all…

Originally posted to the DHIB blog.

Unde Quoque

Posted on October 13, 2012 by ncoffee

It may seem odd for a digital humanities project begun in 2008 to get around to starting a blog four years later. Our only excuse is that we were concentrating on developing our intertextual study tools first. But the time for better communication is long overdue, so here we are!

It seems appropriate in our first post to say something about the origins and goals of Tesserae. The project started from a simple idea. In 2008, Amazon had a feature that showed users phrases that were particular, if not unique, to a given book. If it could tell what was rare in one book, surely that meant it was determining what phrases were common in multiple books. Amazon didn’t seem interested in pursuing this (the original feature was discontinued, I believe). But the Amazon feature, along with the emergence of plagiarism-detection software, prompted the question: Why not create a free website to automatically discover and analyze allusions that could serve as a resource for researchers, teachers, students, and the curious?

I took this idea to J.-P. Koenig of UB Linguistics, and in a rare absent-minded moment, he decided to humor me. We then started work on the project, joined by a talented Linguistics Ph.D. candidate, Shakthi Poornima. Progressive stages in the project’s development followed, traceable through the Older Versions link on the site’s main page.

The original idea eventually developed into the three main goals of Tesserae:

reveal unknown instances of intertextuality,
analyze intertextuality at various scales, from large to small, and
use comprehensive surveys and precise criteria to better define the phenomenon of intertextuality.

Some part of this work is susceptible to rather finite measurements of progress, and at this point we can claim with some justification that we’ve taken big steps forward toward all three goals. A fuller declaration of victory might come if we’re able to replicate the results of traditional scholarship convincingly. But even then much would remain to be done: conducting intertextual readings, exploring theoretical ramifications, experimenting with intertextual analysis via a variety of language features, and repurposing the detection of various language features for other kinds of study.

For the foreseeable future, then, these goals represent crisscrossing paths on a research journey. We can plan to travel along enjoyably, even if we’re not sure where the end lies, or what we’ll find when we get there.