Tesserae Text Date Project

        As part of developing the newest version of Tesserae, the development team is building a search feature that will allow users to search texts by their original date of publication.  To that end, I have completed a new Tesserae Text Date project, in which I researched and assigned known or approximate dates for all of the texts in the Tesserae corpus.  This work builds on a smaller project completed by Tesserae in the past, the Tesserae author + work data task for network analysis.  That earlier project assigned dates to the Latin poetry texts in the Tesserae corpus, using the Oxford Classical Dictionary as its source.  The current project expands significantly on this earlier work, with approximately 956 texts now dated.

        Assigning dates to ancient Greek and Roman texts is notoriously difficult for a variety of reasons.  In many cases we cannot know when an author composed a given text, and for that reason a range of dates is usually given.  Other texts are easier to date, notably speeches whose dates are corroborated by epigraphic evidence or other records.  For example, the public speeches of Demosthenes are generally assigned specific dates, and the speeches of Dinarchus all appear to come from the same trial, which has been assigned a firm date of 323 BCE (Lycurgus, Dinarchus, Demades, Hyperides. Minor Attic Orators, Volume II: Lycurgus. Dinarchus. Demades. Hyperides. Translated by J. O. Burtt. Loeb Classical Library 395. Cambridge, MA: Harvard University Press, 1954).

        To provide a single date for each text, I followed the method of the earlier Tesserae author + work data task for network analysis.  That method dated a text by selecting the latest date given for it.  If no date was attested for a text, the death date of the author was chosen.  In addition to this method, texts in the current project were dated according to their individual circumstances.  For example, the Loeb edition dates the Elegies of Propertius according to when the individual books may have been published.  For simplicity, and because Tesserae lists and searches the Elegies undivided by book, I chose the publication date of the final book as the date for the work as a whole (Propertius. Elegies. Edited and translated by G. P. Goold. Loeb Classical Library 18. Cambridge, MA: Harvard University Press, 1990).  Collections of letters by various authors were given the date of the author’s death, based on the practice in antiquity of collecting and publishing these documents posthumously.  The epistulae collections of Cicero serve as a model for this practice: we know that Atticus preserved and collected his correspondence with Cicero, and that the collection was published upon his death.  We also know that Cicero’s other letters were preserved and collected in a similar way (Cicero. Letters to Atticus, Volume I. Edited and translated by D. R. Shackleton Bailey. Loeb Classical Library 7. Cambridge, MA: Harvard University Press, 1999).

        Other examples of date selection based on individual circumstances include Florus’ Epitome Bellorum Omnium Annorum, which is considered to have been composed in the second half of Hadrian’s reign; the final year of that period was chosen.  In the case of Pindar’s Odes, composed in praise of various individuals and events over many years, the date of his death (438 BCE) was selected (Pindar. Olympian Odes. Pythian Odes. Edited and translated by William H. Race. Loeb Classical Library 56. Cambridge, MA: Harvard University Press, 1997).

        The works of Plutarch are generally accepted to have been composed during his retirement, and originally a range of dates was chosen.  In the end, the date of 120 CE was chosen, by which point he had died and the works were in their most complete form (Plutarch. Lives, Volume I: Theseus and Romulus. Lycurgus and Numa. Solon and Publicola. Translated by Bernadotte Perrin. Loeb Classical Library 46. Cambridge, MA: Harvard University Press, 1914).  The same practice was applied to the works of Aristotle.  What we have from Aristotle are collections or compilations of his lecture notes and essays, and these are not dated with any specificity, so I have chosen the date of his death, in keeping with the dating of letter and speech collections.  This information, along with his death date, can be found in Aristotle. Metaphysics, Volume II: Books 10-14. Oeconomica. Magna Moralia. Translated by Hugh Tredennick, G. Cyril Armstrong. Loeb Classical Library 287. Cambridge, MA: Harvard University Press, 1935.  Where a text is believed to have been composed during a defined period, e.g. Lactantius’ De Mortibus Persecutorum (313-316 CE), the latest date was chosen, since the work would have been complete by that year (http://www.earlychristianwritings.com/lactantius.html).
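
        The selection rule described above (take the latest attested date, and fall back on the author's death date when no date is attested) can be sketched in Python.  This is only an illustration and not the procedure actually used to build the spreadsheets: the function name `single_date` and the convention of writing BCE years as negative integers are assumptions made here.

```python
from typing import Optional, Sequence


def single_date(attested_years: Sequence[int],
                author_death_year: Optional[int]) -> Optional[int]:
    """Reduce the dating evidence for one text to a single year.

    Years are integers; BCE years are negative (an assumed convention).
    """
    if attested_years:
        # A range or list of proposed dates: keep the latest one.
        return max(attested_years)
    # No attested date: fall back on the author's death date (may be None).
    return author_death_year


# Lactantius' De Mortibus Persecutorum, composed 313-316 CE.
lactantius = single_date([313, 316], None)

# Pindar's Odes: no single attested date, so the death date (438 BCE) is used.
pindar = single_date([], -438)
```

        Texts that can only be placed in a century would need a different representation, so they are left out of this sketch.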

        In some cases, texts cannot be assigned exact dates, or even a range of specific dates.  This is notably the case with some late antique authors, early ecclesiastical figures and pseudo authors, but certain other ancient Greek and Roman texts present the same difficulty.  In these cases the texts were assigned to the period in which their authors were active, often only a given century.  For example, Phlegon’s De Mirabilibus is assigned to the time of Hadrian with no specific dates, so the second century was chosen.  The works of Aelian are difficult to date, and all that we know of the author is that he was born in 170 CE, so the second century is the range given to his texts (Aelian. On Animals, Volume I: Books 1-5. Translated by A. F. Scholfield. Loeb Classical Library 446. Cambridge, MA: Harvard University Press, 1958).  Texts by Pseudo authors are given dates based on the author they are associated with.  The two Pseudo Cicero texts in the Tesserae corpus have been assigned single dates.  The date was sourced from Shackleton Bailey (Cicero. Letters to Quintus and Brutus. Letter Fragments. Letter to Octavian. Invectives. Handbook of Electioneering. Edited and translated by D. R. Shackleton Bailey. Loeb Classical Library 462. Cambridge, MA: Harvard University Press, 2002), who states that the invective against Sallust attributed to Cicero appears to have been composed in 54 BCE.  Though a firm, single date has not been assigned to the second Pseudo Cicero text, the date of the invective against Cicero has been assigned in this project.

        Many of the remaining Pseudo texts in the Tesserae corpus have been assigned to particular centuries.  For example, Pseudo Cyprian’s Ad Flavium Felicem de Resurrectione Mortuorum has been dated to the 3rd century CE, the period when Cyprian was alive and active (http://opengreekandlatin.github.io/csel-dev/).  The works of Hilary of Poitiers are given a specific date based on the death of the author (368 CE), but the Pseudo Hilary texts are dated to the century associated with Hilary himself (4th century CE).  The Pseudo Tertullian texts have been dated in the same way, though some are given specific dates; in those cases, the specific date is listed in the Tesserae Text Date spreadsheet (http://www.tertullian.org/chronology.htm).  In the spreadsheet included in this post, the source column lists the sources for the text dates, along with notes describing why a particular date was assigned.

        The dates assigned to the Tesserae corpus were sourced from a variety of places.  The majority came from the Loeb volumes for specific texts and authors; others came from chronologies created by scholars.  The Chronological Table of Augustine’s Work compiled by James J. O’Donnell was invaluable for assigning dates to Augustine’s works.  Peter Kirby’s Early Christian Writings website and bibliography was another invaluable reference.  In many cases multiple sources were reviewed for individual texts and authors, and that information has been included in the sources column of the date spreadsheets.

        Every effort has been made to ensure accuracy in dating, researching, and citing the source information in this spreadsheet, and to be as consistent as possible.  However, as noted above, the dating process depended on the information available for individual texts and authors, so the method may vary slightly; in general, the latest possible date was assigned.  The information in the project spreadsheets may be updated as new information becomes available or as data needs to be corrected.  Eventually, we would like to include the list of dates, authors, and texts on the newest version of the Tesserae site as a separate page so that users can view the material in a single place.  We are also in the process of adding this material to the current Tesserae website.  The completed Tesserae Text Date spreadsheets referenced in this post comprise a spreadsheet of Tesserae Singular text dates and a spreadsheet of Tesserae text date Ranges; the latter includes the original spectrum of dates for individual texts.  Be sure to check out the spreadsheets and the new site when available!


Intertextuality in Flavian Epic Poetry: Contemporary Approaches

New book on Intertextuality in Latin Literature: https://www.degruyter.com/view/product/503007

Summary and goals:

“This collection of essays reaffirms the central importance of adopting an intertextual approach to the study of Flavian epic poetry and shows, despite all that has been achieved, just how much still remains to be done on the topic. Most of the contributions are written by scholars who have already made major contributions to the field, and taken together they offer a set of state of the art contributions on individual topics, a general survey of trends in recent scholarship, and a vision of at least some of the paths work is likely to follow in the years ahead. In addition, there is a particular focus on recent developments in digital search techniques and the influence they are likely to have on all future work in the study of the fundamentally intertextual nature of Latin poetry and on the writing of literary history more generally.”

On the Feasibility of Automated Detection of Allusive Text Reuse

New article compares Tesserae search performance with other methods!  Read the full article here: https://arxiv.org/pdf/1905.02973.pdf 


The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely, commonly based on none or very few shared words. Arguably, lexical semantics can be resorted to since uncovering semantic relations between words has the potential to increase the support underlying the allusion and alleviate the lexical sparsity. A further obstacle is the lack of evaluation benchmark corpora, largely due to the highly interpretative character of the annotation process. In the present paper, we aim to elucidate the feasibility of automated allusion detection. We approach the matter from an Information Retrieval perspective in which referencing texts act as queries and referenced texts as relevant documents to be retrieved, and estimate the difficulty of benchmark corpus compilation by a novel inter-annotator agreement study on query segmentation. Furthermore, we investigate to what extent the integration of lexical semantic information derived from distributional models and ontologies can aid retrieving cases of allusive reuse. The results show that (i) despite low agreement scores, using manual queries considerably improves retrieval performance with respect to a windowing approach, and that (ii) retrieval performance can be moderately boosted with distributional semantics.

Measuring Literary Influence at Scale with Tesserae’s Multitext Capability

By James O. Gawley and A. Caitlin Diddams


This paper details an approach to quantifying literary influence based on Tesserae’s multitext capability. Tesserae is an open source, web-based tool originally designed to locate allusions in Latin epic poetry. It accomplishes this by identifying language shared between two texts and sorting these intertexts according to formal features which have been shown to identify allusions.  Its multitext capability was designed to help researchers track phrases beyond the first instance of reuse. We use the multitext tool to eliminate possible alternative sources for shared language. This allows us to isolate the unique connection between texts. We normalize the number of unique connections according to an original formula so that the quantity of shared language in multiple searches can be meaningfully compared. In this paper we illustrate our method with an investigation that uses this technique to quantify the influence of Julius Caesar and Marcus Tullius Cicero on various authors of the Roman empire. The results of this study are in line with the assertions of philologists on the literary influence of these figures, and support the efficacy of our approach as a means of comparing relative authorial influence.

For the full text and author information see the following link:


New Publication on Intertextuality

Walter Scheirer and Chris Forstall, Tesserae team members, have recently published a new text: Quantitative Intertextuality: Analyzing the Markers of Information Reuse.  The text covers a new method of studying intertextuality through the use of a diverse array of computational and quantitative tools.  For more information and to get the text follow this link:


How to Calculate the Relative Influence of an Author

At the end of the first century, Quintilian asked “Is it not sufficient to model our every utterance on Cicero? For my own part, I should consider it sufficient, if I could always imitate him successfully. But what harm is there in occasionally borrowing the vigour of Caesar, the vehemence of Caelius, the precision of Pollio or the sound judgment of Calvus?”

As philologists of the 21st century, we might ask “How often did Roman authors actually borrow phrases from Caesar as opposed to Cicero?”

Caitlin Diddams and I recently published an article in Digital Scholarship in the Humanities which lays out best practices for:

  1. Determining which phrases shared between two authors did not come from a second possible source
  2. Measuring the relative strength of an “intertextual signal”
  3. Comparing the relative influence of multiple authors on a cross-section of literature

As a test-case, we compared the influence of Cicero and Caesar during the early imperial and late imperial periods.

The methodology we outline in this article can be used on any number of source and target authors, regardless of language. Our formula for calculating the strength of an intertextual signal can be used with any tool for detecting intertextuality (not just Tesserae).

To read the abstract and obtain the full article, visit the Oxford Journals website: https://academic.oup.com/dsh/article-abstract/doi/10.1093/llc/fqx038/4061474/Comparing-the-intertextuality-of-multiple-authors

Relative influence in our methodology is compared according to the ‘rate of intertextuality,’ which is a normalized representation of the number of results you get in a Tesserae search. Normalization is necessary because the length of a work influences the number of results obtained. Previous methods of normalization assumed that Tesserae’s scoring algorithm would perform consistently across various authors and genres of literature. We propose that best practice should avoid such assumptions wherever possible.

Our normalization method in brief (the following is excerpted from a pre-print copy of the article):

The number of results of two searches cannot be meaningfully compared until we consider how many results each search could have produced. The number of search results depends on two factors: the level of engagement between the authors and the length of the texts being compared. Longer texts create more sentence-by-sentence comparisons. There are more opportunities for unique intertexts to occur. The number which can be meaningfully compared is not the number of unique results of a Tesserae search, but the ratio of the results found to the results that could have been found. We normalize the number of results according to the following formula:

We define the rate of intertextuality as the number of connected phrases per pair of phrases considered. This is derived by dividing the absolute value of the set of results by the absolute value of the cross-product of the sets of sentences in source and target texts. This cross-multiplication is necessary because Tesserae compares every sentence in a source text to all of the sentences in a target text. Therefore the number of possible results in a comparison of any source and target is the product of the number of sentences in the source and the number of sentences in the target.
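
As a rough illustration of this normalization, the rate can be computed in a few lines of Python. This is a minimal sketch rather than the authors' own code: the function name `rate_of_intertextuality` and all of the counts in the example are invented for illustration.

```python
def rate_of_intertextuality(num_results: int,
                            source_sentences: int,
                            target_sentences: int) -> float:
    """Results found divided by the results that could have been found.

    The number of possible results is the number of sentence pairs compared:
    sentences in the source times sentences in the target.
    """
    possible_results = source_sentences * target_sentences
    if possible_results == 0:
        raise ValueError("source and target must each contain at least one sentence")
    return num_results / possible_results


# Two hypothetical searches against the same 2,000-sentence target.
search_a = rate_of_intertextuality(120, 1000, 2000)  # long source, more raw results
search_b = rate_of_intertextuality(90, 500, 2000)    # short source, fewer raw results
```

With these made-up numbers the second search returns fewer results in absolute terms but has the higher rate, which is the kind of difference the raw counts alone would hide.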

Measuring the Distinctiveness of Phrases in Latin Epic

Measuring the co-occurrence patterns of words with pointwise mutual information (PMI) can help identify bigram word-pairs that are unusually represented in the work of a given author. By comparing the PMI values of the Latin epic corpus to the PMI values of Vergil, for example, scholars can discover which word pairings are particularly Vergilian. Many of these Vergilian phrases will be obvious, such as pius Aeneas and puer Ascanius. Others, however, invite further investigation. Some word pairings are so unexpected that they may be sufficiently marked for quotation and imitation.

Tesserae is in the process of incorporating PMI data as an option for scoring search results. Tesserae scores currently rate rare words shared between two texts as more likely to constitute an allusion. This is problematic for capturing allusions from Vergil, who is known for combining common words in uncommon ways, in what ancient critics called a new form or affectation (cacozelia). The incorporation of comparative bigram frequencies can more accurately score allusions to Vergilian bigrams, which would otherwise be erroneously demoted. For example, if a search result is particularly indicative of the source author, but not of the corpus or target author, this might indicate that the target author is quoting a recognizable phrase. In this case, the Tesserae score should be increased. If the match is indicative of a target author’s shared language, but not of the corpus or the source author, it is less likely that the target author is trying to evoke the source author. In this case, the Tesserae score should be decreased.
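
One hypothetical way to express this adjustment rule in Python is sketched below. It does not describe how Tesserae actually implements or will implement the feature. The post does not say how "particularly indicative" is measured or how large the adjustment should be, so the function name `adjust_score` and the `margin` and `step` parameters are placeholders invented for this example.

```python
def adjust_score(base_score: float,
                 pmi_source: float,
                 pmi_target: float,
                 pmi_corpus: float,
                 margin: float = 0.2,
                 step: float = 1.0) -> float:
    """Raise or lower a match score using the bigram's PMI in three contexts.

    margin: how far one PMI must exceed the others to count as distinctive
            (hypothetical threshold).
    step:   size of the adjustment (hypothetical).
    """
    # Distinctive of the source author only: likely a recognizable phrase.
    if pmi_source - max(pmi_target, pmi_corpus) >= margin:
        return base_score + step
    # Distinctive of the target author only: likely the target's own idiom.
    if pmi_target - max(pmi_source, pmi_corpus) >= margin:
        return base_score - step
    # Otherwise leave the score unchanged.
    return base_score
```

For instance, `adjust_score(7.0, 0.6, 0.1, 0.05)` raises the score to 8.0, while `adjust_score(7.0, 0.1, 0.6, 0.05)` lowers it to 6.0.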

Many studies from the 1990s onward have shown the efficacy of analyzing word co-occurrence patterns in English. In 2000, Rydberg-Cox adapted existing methods for ancient Greek as a basis for philological research. PMI values represent a ratio of the “actual” versus the “expected” frequency with which two words appear near each other.

The actual frequency of a bigram is a measurement of the frequency with which a combination of words x and y occurs. The expected bigram frequency is a measurement of the frequency with which words x and y might have occurred as a bigram based on the frequencies of its constituent unigrams. This represents the bigram frequencies we would see if the distribution of each word were independent of the distribution of the other. In reality, contextual and syntactic relationships change the likelihood that word y will follow word x, and so the actual and expected frequency values diverge. Finally, because PMI overemphasizes low-frequency collocations, it is standard practice to cut off extremely rare words and to log and normalize the results.

Results from the Aeneid and the corpus are then normalized so that PMI values can be meaningfully compared. Normalization translates the scale of the PMI values from Vergil and from Latin epic authors to a range from -1 to 1. Positive PMI values indicate that once you read one word in Vergil, the uncertainty of the next words dramatically shrinks. Negative PMI values indicate that the presence of one word in Latin epic negligibly affects the possibility pool for the next word.
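
The calculation described above can be sketched in Python as follows. The post does not give the exact formulas, so several choices here are assumptions: PMI is taken in its standard form, log(p(x, y) / (p(x) p(y))); normalization uses the common "normalized PMI" variant, which divides by -log p(x, y) and yields values between -1 and 1; bigrams are simply adjacent words; and `min_count` is a hypothetical cutoff for rare words.

```python
import math
from collections import Counter


def normalized_pmi(tokens, min_count=2):
    """Return {(x, y): normalized PMI} for adjacent word pairs in `tokens`."""
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), pair_count in pair_counts.items():
        # Cut off extremely rare words, where PMI is least reliable.
        if word_counts[x] < min_count or word_counts[y] < min_count:
            continue
        # Actual bigram frequency (dividing by n rather than n - 1 is a
        # simplification that keeps the result within [-1, 1]).
        p_xy = pair_count / n
        # Expected frequency if the two words were distributed independently.
        expected = (word_counts[x] / n) * (word_counts[y] / n)
        pmi = math.log(p_xy / expected)
        scores[(x, y)] = pmi / -math.log(p_xy)
    return scores


# Toy input, not real corpus data.
tokens = "pius aeneas pius aeneas arma virum arma cano virum cano".split()
scores = normalized_pmi(tokens)
```

In this toy input pius is always followed by aeneas, so that pair receives a score close to 1, while arma virum scores lower because arma is also followed by cano.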

Consider the following example of a high PMI value from Vergil’s Aeneid 12.338: fumantis sudore quatit, miserabile caesis. Fumantis sudore describes horses frothing with sweat, and has a normalized PMI score of 0.684. Since PMI values for Vergil indicate that fumo usually occurs with incense, altars, food and homes, and sudor usually occurs with people, blood, and labor, fumantis sudore is a “marked” or unusual phrase in Vergil. Since fumantis sudore is not a high-ranking result in the PMI values of the Latin epic corpus, it is all the more likely to be an example of particularly “Vergilian” language.

The following graph shows the PMI values for Vergilian bigrams that also exist in the epic corpus. Most of the PMIs are positive, indicating strong associations between words. The data with the highest PMI values represents the strongest word associations in Vergil.

The next graph shows the PMI values for bigrams in the epic corpus that also exist in Vergil. Here, the PMIs are mostly negative. This indicates that in epic as a whole, word association is more flexible than in Vergil alone.

These graphs indicate that Vergil’s word associations are different from those in the epic corpus generally. For example, in Vergil aequore~toto occurs 6 times and has a normalized PMI of about 0.5. In the rest of Latin epic, aequore~toto appears 9 times and has a normalized PMI of 0.03. The difference is that in Vergil, aequore expects toto, whereas in epic generally, aequore does not prime the reader to expect toto.

The data does not tell us which author differs from the corpus more dramatically; other normalization factors will have to be put into place before we can compare, for example, Vergil’s distance from the corpus to Lucan’s distance from the corpus. Beyond their applications for Tesserae, co-occurrence patterns can improve our understanding of which phrases are more striking or marked than others, and of what makes an ancient author’s hand recognizable.

Appendix to “Measuring the Presence of Roman Rhetoric: An Intertextual Analysis of Augustine’s De Doctrina Christiana IV”

This appendix contains the intertextual parallels that inform the paper “Measuring the Presence of Roman Rhetoric: An Intertextual Analysis of Augustine’s De Doctrina Christiana IV” published in Mouseion Vol. 14 No. 3, Open Digital Corpora of Greek and Latin. The search parameters for these comparisons are listed at the beginning of each file. Please direct any questions to Caitlin Diddams at acstaab@buffalo.edu or James Gawley at jamesgaw@buffalo.edu.

Vita Washingtonii vs. DDC

Germania vs. DDC IV

Bello Gallico vs. DDC IV

Dialogus vs. DDC IV

Orator vs. DDC IV

Institutio Oratoria vs. DDC IV



This paper examines the intertextual relationship between Augustine’s De Doctrina Christiana IV and Cicero’s Orator. We use quantitative methods to compare Augustine’s level of engagement with Orator against his engagement with other handbooks of classical Latin rhetoric. Our results inform a close reading of the text as body metaphor in DDC 4.13. Augustine incorporates Ciceronian colometry into his presentation of the epistles to demonstrate Paul’s eloquence. We argue that Augustine’s comparatively heavy use of Cicero is an attempt to justify the use of rhetoric in Christian teaching while adapting that rhetoric to Christian purposes.

Expanding Tesserae into Late Antiquity

We are working to expand Tesserae’s Latin corpus into late antiquity. Currently, the major works of a set of more commonly read authors are available for searching on the site. See here for a list of authors or here for a list of works organized by author.

We are also adding this set of major authors in the near future.

Finally, we hope to add this set of even later, lesser known, or less often cited authors after some improvements of the user interface, which may include separating the numerous searchable texts by era.

If you would like to see authors added who are not on this list, or see a particular author prioritized, please email Caitlin Diddams at acstaab@buffalo.edu.