Tesserae Text Date Project

Posted on May 23, 2020 by Tessa Little

In the course of developing the newest version of Tesserae, the development team is working on a new search feature that would allow users to search texts by their original date of publication. To that end, I have completed a new Tesserae Text Date project, in which I researched and assigned approximate or known dates for all of the texts in the Tesserae corpus. This work builds on a smaller project completed by Tesserae in the past, the Tesserae author + work data task for network analysis. That earlier project assigned dates to Latin poetry texts in the Tesserae corpus, using the Oxford Classical Dictionary as a source for the dates. The current project expands significantly on this early work, with approximately 956 texts having been dated.

Assigning dates to ancient Greek and Roman texts is notoriously difficult for a variety of reasons; in many cases we cannot know when a certain author composed a certain text, and for that reason a range of dates is usually given. Other texts are easier to date, namely speeches where the dates are corroborated by epigraphic evidence or other records. For example, the public speeches of Demosthenes are generally assigned specific dates and in the case of Dinarchus’ speeches, they all appear to be from the same trial that has been assigned a firm date of 323 BC (Lycurgus, Dinarchus, Demades, Hyperides. Minor Attic Orators, Volume II: Lycurgus. Dinarchus. Demades. Hyperides. Translated by J. O. Burtt. Loeb Classical Library 395. Cambridge, MA: Harvard University Press, 1954.).

In an effort to provide a singular date, the prior Tesserae author + work data task for network analysis method was employed. This earlier method dated texts by selecting the latest date given. Where a date was not attested for a text, the death date of the author was chosen. In addition to this method, texts in the current project were dated based on their individual circumstances. For example, the Elegies of Propertius have been dated in the Loeb based on when individual books may have been published. For simplicity and because Tesserae lists and searches the Elegies undivided by books, I chose the publication date of the final book as the date for the work as a whole (Propertius. Elegies. Edited and translated by G. P. Goold. Loeb Classical Library 18. Cambridge, MA: Harvard University Press, 1990). Collections of letters for various authors were given the date of the author’s death, based on the practice in antiquity of collecting and publishing these documents posthumously. The epistulae collections of Cicero serve as a model for this practice, because we know that Atticus preserved and collected his correspondence with Cicero, the collection published on his death. We also know that Cicero’s other letters were preserved and collected in a similar way (Cicero. Letters to Atticus, Volume I. Edited and translated by D. R. Shackleton Bailey. Loeb Classical Library 7. Cambridge, MA: Harvard University Press, 1999).
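The two basic rules described above (take the latest attested date; otherwise fall back on the author's death date) can be sketched in a few lines of Python. This is only an illustration, not the project's actual workflow: the function name `singular_date`, the tuple representation of a date range, and the use of negative integers for BCE years are all assumptions made for the example. The sample values are dates mentioned elsewhere in this post.

```python
from typing import Optional, Tuple


def singular_date(date_range: Optional[Tuple[int, int]],
                  author_death: Optional[int]) -> Optional[int]:
    """Pick one year for a text (hypothetical helper).

    Years are plain integers; BCE years are negative (an assumption
    made for this sketch, not a convention stated by the project).
    """
    if date_range is not None:
        # Rule 1: when a range of dates is attested, take the latest one.
        return max(date_range)
    # Rule 2: with no attested date, fall back on the author's death year.
    return author_death


# Lactantius, De Mortibus Persecutorum: composed 313-316, so 316 is chosen.
print(singular_date((313, 316), None))   # 316

# Pindar's Odes: no single date, so the death date (438 BCE) is used.
print(singular_date(None, -438))         # -438
```

Texts that can only be placed in a century would need a third branch, which this sketch leaves out.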

Other examples of date selection based on individual text circumstances include Florus’ Epitome Bellorum Omnium Annorum, considered to have been composed in the second half of Hadrian’s reign, so the final year of that period was chosen. In the case of Pindar’s Odes, composed in praise of various individuals and events over many years, the date of his death (438 BCE) was selected (Pindar. Olympian Odes. Pythian Odes. Edited and translated by William H. Race. Loeb Classical Library 56. Cambridge, MA: Harvard University Press, 1997.)

The works of Plutarch are generally accepted to have been composed during his retirement, and originally a range of dates was chosen. In the end, the date of 120 was chosen, by which time he had died and the works were in their most complete form (Plutarch. Lives, Volume I: Theseus and Romulus. Lycurgus and Numa. Solon and Publicola. Translated by Bernadotte Perrin. Loeb Classical Library 46. Cambridge, MA: Harvard University Press, 1914.). This practice was implemented with the works of Aristotle as well. What we have from Aristotle are collections, or compositions, of his lecture notes and essays, and these are not dated with any specificity, so I have chosen the date of his death, in accordance with the dating of letter and speech collections. This information, along with his death date, can be found in Aristotle. Metaphysics, Volume II: Books 10-14. Oeconomica. Magna Moralia. Translated by Hugh Tredennick, G. Cyril Armstrong. Loeb Classical Library 287. Cambridge, MA: Harvard University Press, 1935. In the case where a text is believed to have been composed during a defined period, e.g. Lactantius’ De Mortibus Persecutorum (313-316), the latest date was chosen, since the work would have been completed in that year (http://www.earlychristianwritings.com/lactantius.html).

In some cases, texts cannot be assigned exact dates, nor even a range of specific dates. This is noticeably the case with some late antique authors, early ecclesiastical figures and pseudo authors, but certain other ancient Greek and Roman texts present the same difficulty. In these cases the method employed was to assign them to the period in which their authors were active, often only a given century. For example, the work of Phlegon De Mirabilibus is assigned to the time of Hadrian with no specific dates, so the second century was chosen. The works of Aelian are difficult to date and all that we know of the author is that he was born in 170 CE, so the second century is the range given to his text (Aelian. On Animals, Volume I: Books 1-5. Translated by A. F. Scholfield. Loeb Classical Library 446. Cambridge, MA: Harvard University Press, 1958.). Texts by Pseudo authors are given dates based on the author they are associated with. In the case of the two Pseudo Cicero texts in the Tesserae corpus, they have been assigned single dates. The date was sourced from Cicero. Letters to Quintus and Brutus. Letter Fragments. Letter to Octavian. Invectives. Handbook of Electioneering. Edited and translated by D. R. Shackleton Bailey. Loeb Classical Library 462. Cambridge, MA: Harvard University Press, 2002., who states that the invective against Sallust attributed to or associated with Cicero appears to have been composed in 54 BCE. Though a firm, singular date has not been assigned to the second Pseudo Cicero text, the date of the invective against Cicero has been assigned in this project.

Many of the remaining Pseudo texts in the Tesserae corpus have been assigned to particular centuries. For example, Pseudo Cyprian Ad Flavium Felicem de Resurrectione Mortuorum has been given a date of the 3rd century CE, the period when Cyprian was alive and active (http://opengreekandlatin.github.io/csel-dev/). The works of Hilary of Poitiers are given a specific date based on the death of the author (368 CE), but the Pseudo Hilary texts are dated to the century associated with Hilary himself (4th century CE). The Pseudo Tertullian texts have been dated in the same way, though some are given specific dates, and in those cases the specific date is listed in the Tesserae Text Date spreadsheet (http://www.tertullian.org/chronology.htm). In the spreadsheet included in this post, under the source column, one can view the sources for the text dates and notes describing why a particular date was assigned.

The dates assigned to the corpus of Tesserae texts were sourced from a variety of places: a majority of the dates were sourced from the Loeb volumes of specific texts and authors, while others were sourced from chronologies created by scholars. The Chronological Table of Augustine’s Work compiled by James J. O’Donnell was invaluable for assigning dates to Augustine’s works. Additionally, Peter Kirby’s Early Christian Writings website and bibliography was an invaluable reference. In many cases multiple sources were reviewed for individual texts and authors, and that information has been included under the sources column on the date spreadsheets.

Every effort has been made to ensure accuracy in dating, researching, and citing the source information in this spreadsheet, and to be as consistent as possible. However, as noted above, the process of dating was dependent on the information available for individual texts and authors, so the method of dating may vary slightly, but in general the process of assigning the latest possible date, as mentioned above, was employed. The information in the project spreadsheets may be updated as new information becomes available or the data needs to be corrected. Eventually, we would like to include the list of dates, authors, and texts on the newest version of the Tesserae site as a separate page so that users can view the material in a single place. We are also in the process of adding this material to the current Tesserae website. The completed Tesserae Text Date spreadsheets referenced in this post include a spreadsheet of Tesserae Singular text dates and a spreadsheet of the Tesserae text date Ranges; the latter includes the original spectrum of dates for individual texts. Be sure to check out the spreadsheet and the new site when available!

How to Calculate the Relative Influence of an Author

Posted on August 3, 2017 by James Gawley

At the end of the first century, Quintilian asked “Is it not sufficient to model our every utterance on Cicero? For my own part, I should consider it sufficient, if I could always imitate him successfully. But what harm is there in occasionally borrowing the vigour of Caesar, the vehemence of Caelius, the precision of Pollio or the sound judgment of Calvus?”

As philologists of the 21st century, we might ask “How often did Roman authors actually borrow phrases from Caesar as opposed to Cicero?”

Caitlin Diddams and I recently published an article in Digital Scholarship in the Humanities which lays out the best practices for determining:

Which phrases shared between two authors did not come from a second possible source
How to measure the relative strength of an “intertextual signal”
How to compare the relative influence of multiple authors on a cross-section of literature

As a test-case, we compared the influence of Cicero and Caesar during the early imperial and late imperial periods.

The methodology we outline in this article can be used on any number of source and target authors, regardless of language. Our formula for calculating the strength of an intertextual signal can be used with any tool for detecting intertextuality (not just Tesserae).

To read the abstract and obtain the full article, visit the Oxford Journals website: https://academic.oup.com/dsh/article-abstract/doi/10.1093/llc/fqx038/4061474/Comparing-the-intertextuality-of-multiple-authors

Relative influence in our methodology is compared according to the ‘rate of intertextuality,’ which is a normalized representation of the number of results you get in a Tesserae search. Normalization is necessary because the length of a work influences the number of results obtained. Previous methods of normalization assumed that Tesserae’s scoring algorithm would perform consistently across various authors and genres of literature. We propose that best practice should avoid such assumptions wherever possible.

Our normalization method in brief (the following is excerpted from a pre-print copy of the article):

The number of results of two searches cannot be meaningfully compared until we consider how many results each search could have produced. The number of search results depends on two factors: the level of engagement between the authors and the length of the texts being compared. Longer texts create more sentence-by-sentence comparisons. There are more opportunities for unique intertexts to occur. The number which can be meaningfully compared is not the number of unique results of a Tesserae search, but the ratio of the results found to the results that could have been found. We normalize the number of results according to the following formula:

We define the rate of intertextuality as the number of connected phrases per pair of phrases considered. This is derived by dividing the absolute value of the set of results by the absolute value of the cross-product of the sets of sentences in source and target texts. This cross-multiplication is necessary because Tesserae compares every sentence in a source text to all of the sentences in a target text. Therefore the number of possible results in a comparison of any source and target is the product of the number of sentences in the source and the number of sentences in the target.
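As a rough illustration of this normalization, the calculation can be written out in Python. The function name and all of the counts below are hypothetical, chosen only to show why raw result counts can mislead; they are not figures from the article.

```python
def rate_of_intertextuality(n_results: int,
                            n_source_sentences: int,
                            n_target_sentences: int) -> float:
    """Results found divided by results that could have been found."""
    # Every source sentence is compared with every target sentence,
    # so the number of possible results is the product of the two counts.
    possible = n_source_sentences * n_target_sentences
    if possible == 0:
        raise ValueError("both texts must contain at least one sentence")
    return n_results / possible


# Hypothetical searches: the first returns more results in absolute terms,
# but the second has the higher rate once text length is accounted for.
long_pair = rate_of_intertextuality(120, 1000, 2000)
short_pair = rate_of_intertextuality(90, 500, 1500)
print(long_pair < short_pair)   # True
```

Because the denominator depends only on sentence counts, the same function can be applied to the output of any tool that reports a number of matches between two texts.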

Late Antique Texts Available

Posted on April 3, 2017 by Caitlin Diddams

Allusion was a powerful tool in the hands of Late Antique authors. Tesserae now offers a wider selection of popular Late Antique texts. Authors such as Augustine, Ambrose, Cyprian, Tertullian, Orosius, Libanius, and many others are available for Latin/Greek or multi-text searches. For a full list of available texts, click here for Latin and here for Greek.

These new Late Antique texts are drawn from the Open Greek and Latin Project, DigilibLT, and the Latin Library. Our editorial decisions reflect those of the hosting site (see http://tesserae.caset.buffalo.edu/sources.php). Spurious works are listed as [Author] Pseudo. We are still in the process of incorporating more texts, but if you would like to advance the progress of a particular author or work, please email Caitlin Diddams (acstaab@buffalo.edu).

We hope you enjoy searching these texts!

Measuring the Distinctiveness of Phrases in Latin Epic

Posted on March 28, 2017 by Caitlin Diddams

Measuring the co-occurrence patterns of words with pointwise mutual information (PMI) can help identify bigram word-pairs that are unusually represented in the work of a given author. By comparing the PMI values of the Latin epic corpus to the PMI values of Vergil, for example, scholars can discover which word pairings are particularly Vergilian. Many of these Vergilian phrases will be obvious, such as pius Aeneas and puer Ascanius. Others, however, invite further investigation. Some word pairings are so unexpected that they may be sufficiently marked for quotation and imitation.

Tesserae is in the process of incorporating PMI data as an option for scoring search results. Tesserae scores currently rate rare words shared between two texts as more likely to constitute an allusion. This is problematic for capturing allusions from Vergil, who is known for combining common words in uncommon ways, in what ancient critics called a new form or affectation (cacozelia). The incorporation of comparative bigram frequencies can more accurately score allusions to Vergilian bigrams, which would otherwise be erroneously demoted. For example, if a search result is particularly indicative of the source author, but not of the corpus or target author, this might indicate that the target author is quoting a recognizable phrase. In this case, the Tesserae score should be increased. If the match is indicative of a target author’s shared language, but not of the corpus or the source author, it is less likely that the target author is trying to evoke the source author. In this case, the Tesserae score should be decreased.

Many studies from the 1990s on have shown the efficacy of analyzing word co-occurrence patterns in English. In 2000, Rydberg-Cox adapted existing methods for ancient Greek as a basis for philological research. PMI values represent a ratio of the “actual” versus the “expected” frequency with which two words appear near each other.

The actual frequency of a bigram is a measurement of the frequency with which a combination of words x and y occurs. The expected bigram frequency is a measurement of the frequency with which words x and y might have occurred as a bigram based on the frequencies of its constituent unigrams. This represents the bigram frequencies we would see if the distribution of each word were independent of the distribution of the other. In reality, contextual and syntactic relationships change the likelihood that word y will follow word x, and so the actual and expected frequency values diverge. Finally, because PMI overemphasizes low-frequency collocations, it is standard practice to cut off extremely rare words and to log and normalize the results.

Results from the Aeneid and the corpus are then normalized so that PMI values can be meaningfully compared. Normalization translates the scale of the PMI values from Vergil and from Latin epic authors to a range from -1 to 1. Positive PMI values indicate that once you read one word in Vergil, the uncertainty of the next words dramatically shrinks. Negative PMI values indicate that the presence of one word in Latin epic negligibly affects the possibility pool for the next word.
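The calculation can be sketched as follows. This is a minimal illustration and not the code used for the figures in this post: it assumes adjacent word pairs as bigrams, and it uses one common normalization (dividing PMI by the negative log of the bigram probability, often called NPMI), which yields values between -1 and 1. The post does not specify which normalization was actually applied, and the `min_count` cutoff here filters rare pairs rather than rare words. The token list is a toy example.

```python
import math
from collections import Counter


def npmi_table(tokens, min_count=1):
    """Return {(x, y): (pmi, npmi)} for adjacent word pairs in `tokens`."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    # Marginal counts are taken from the bigrams themselves, so that all
    # three probabilities share one denominator and NPMI stays in [-1, 1].
    first = Counter()
    second = Counter()
    for (x, y), count in bigrams.items():
        first[x] += count
        second[y] += count

    table = {}
    for (x, y), count in bigrams.items():
        if count < min_count:  # optional cutoff for very rare pairs
            continue
        p_xy = count / total                                # actual frequency
        expected = (first[x] / total) * (second[y] / total)  # if independent
        pmi = math.log2(p_xy / expected)
        # If a single pair accounts for every bigram, -log2(p_xy) is zero;
        # treat that degenerate case as perfect association.
        npmi = 1.0 if count == total else pmi / -math.log2(p_xy)
        table[(x, y)] = (pmi, npmi)
    return table


# Toy token list, invented for illustration only.
tokens = "pius aeneas et pius aeneas et puer ascanius".split()
for pair, (pmi, npmi) in sorted(npmi_table(tokens).items()):
    print(pair, round(pmi, 3), round(npmi, 3))
```

In the toy data, pius is always followed by aeneas and aeneas is always preceded by pius, so that pair receives the maximum normalized score, while et, which is followed by two different words, scores lower.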

Consider the following example of a high PMI value from Vergil’s Aeneid 12.338: fumantis sudore quatit, miserabile caesis. Fumantis sudore describes horses frothing with sweat, and has a normalized PMI score of 0.684. Since PMI values for Vergil indicate that fumo usually occurs with incense, altars, food and homes, and sudor usually occurs with people, blood, and labor, fumantis sudore is a “marked” or unusual phrase in Vergil. Since fumantis sudore is not a high ranking result in the PMI values of the Latin epic corpus, it is further likely to be an example of particularly “Vergilian” language.

The following graph shows the PMI values for Vergilian bigrams that also exist in the epic corpus. Most of the PMIs are positive, indicating strong associations between words. The data with the highest PMI values represents the strongest word associations in Vergil.

The next graph shows the PMI values for bigrams in the epic corpus that also exist in Vergil. Here, the PMIs are mostly negative. This indicates that in epic as a whole, word association is more flexible than in Vergil alone.

These graphs indicate that Vergil’s word associations are different from those in the epic corpus generally. For example, Vergil’s normalized PMI for aequore~toto is about 0.5, with the pair occurring 6 times. In the rest of Latin epic, aequore~toto appears 9 times and has a normalized PMI of 0.03. The difference is that in Vergil, aequore expects toto, whereas in epic generally, aequore does not prime the reader to expect toto.

The data does not tell us which author differs from the corpus more dramatically – other normalization factors will have to be put into place before we can compare, for example, Vergil’s distance from the corpus to Lucan’s distance from the corpus. Beyond its applications for Tesserae, co-occurrence patterns can improve our understanding of what phrases are more striking or marked than others, and of what constitutes the recognizability of an ancient author’s hand.

Greek Multitext Searching Available

Posted on March 4, 2017 by Caitlin Diddams

You can now use the multitext tool on our entire selection of Greek texts!

http://tesserae.caset.buffalo.edu/greek-multi-text.php.

The multitext search cross-checks your results against all other texts in the corpus. This will allow you to see whether a particular parallel is unique to your two selected works, or whether there is a broader precedent for the repeated expression.

Expanding Tesserae into Late Antiquity

Posted on May 6, 2016 by Caitlin Diddams

We are working to expand Tesserae’s Latin corpus into late antiquity. Currently, the major works of a set of more commonly read authors are available for searching on the site. See here for a list of authors or here for a list of works organized by author.

We are also adding this set of major authors in the near future.

Finally, we hope to add this set of even later, lesser known, or less often cited authors after some improvements of the user interface, which may include separating the numerous searchable texts by era.

If you would like to see authors added who are not on this list, or see a particular author prioritized, please email Caitlin Diddams at acstaab@buffalo.edu.

Lactantius Now Available

Posted on April 18, 2016 by Caitlin Diddams

All the major works of Lactantius, known as “the Christian Cicero,” are now available for Tesserae searches: Carmen de Passione Domini, De Ave Phoenice, De Ira Dei, De Mortibus Persecutorum, De Opificio Dei, Divinarum Institutionum, and the Epitome Divinarum Institutionum.

Echoes of Cicero

Posted on March 9, 2016 by Caitlin Diddams

When Augustine quotes portions of Paul’s epistles in De Doctrina Christiana 4, he records versions that are not attested by the Vulgate tradition or the Old Latin (Versio Antiqua) tradition. Since Augustine quotes closely from the Vulgate (and sometimes from the Versio Antiqua) for Gospel and Old Testament passages in DDC 4, why not for Paul’s epistles? My data suggests that these Pauline passages, which appear as examples of style rather than content, are in fact rendered in a more Ciceronian style than alternative translations.

The Pauline passages are a mosaic of Vulgate and Versio Antiqua renderings, mixed with variations that have no authority in either manuscript tradition. Variants are sometimes semantic changes, and at other times simply change the rhythm of the prose – an element of style Augustine is very concerned with (DDC 4.41). The following are examples taken from the first extended Pauline quotation (2nd Corinthians 11:16-31) in DDC 4.12:

DDC: Toleratis enim si quis vos in servitutem redigit
Vul: Sustinetis enim si quis vos in servitutem redigit
VA: Suffertis enim si quis vos in servitutem redigit
DDC: Si gloriari oportet in iis quae infirmitatis meae sunt
Vul: Si gloriari oportet quae infirmitatis meae sunt
VA: Si gloriari oportet quae infirmitatis meae sunt

Below are links to visual comparisons powered by Juxta Commons of Paul’s language as it appears in De Doctrina Christiana, and Sabatier’s Vulgate and Versio Antiqua, and exemplary passages from Psalms and Matthew. There are four options to visualize the differences: a heat map with hyperlinked variants, a side-by-side comparison, a histogram, and a VM model where all three versions can be viewed side-by-side (click “new version” after clicking on the VM button).

2nd Corinthians 11:16-31 – Augustine’s first example of Paul’s eloquence

Galatians 3:15-22 – Augustine’s example of the subdued style

Romans 12:1, 6-16; 13:6-8, 12-14 – Augustine’s example of the moderate style

Galatians 4:10-20 – Augustine’s example of the grand style

Psalm 15:4

Matthew 10:19-20

TEI markup of all textual variants: 2nd Corinthians (first) Galatians (subdued) Romans (moderate) Galatians (grand) Psalm 15-4 Matthew 10-19-20

While it is immediately clear that the Pauline passages show more variations than non-Pauline passages, it is difficult to get a sense of the degree of difference because the passages differ so severely in length. Juxta Commons, however, provides a quantitative measurement of distance from a base text to other versions. The graph below shows Juxta’s measurement of distance between the above passages as they appear in DDC and in the Vulgate / Versio Antiqua. The baseline drawn at 0.05 is the distance Juxta measures between Sabatier’s Vulgate and the Vulgate available on Perseus. It provides a sense of what degree of difference we might expect between manuscripts.

This degree of difference demands explanation. The Pauline passages appear as block quotations, discouraging the interpretation that Augustine is weaving in his own language, as is often the case in his other works, in an extemporaneous style. These passages are not tied together thematically, but appear as distinct units with little or no connective language. It is also notable that Pauline diction appears in De Doctrina Christiana as a model of style. Augustine argues at length that even though Paul was untaught in the classical rules of rhetoric, his eloquence displays the qualities of classical rhetoric such as climax, scala, and well-balanced membra and caesa (DDC 4.11). He follows his quotation of many Pauline passages with extended colometric analyses, drawing on Cicero’s rhetorical metaphor of the body established in Orator, and continuing in DDC 4 to outline the officia oratoris and the genera dicendi according to Cicero’s model.

Given the extreme level of engagement with Cicero in DDC 4, noted consistently in scholarship, might Augustine have chosen to deviate from the Vulgate and Versio Antiqua versions in order to present a more Ciceronian version of Paul in Latin? Whether Augustine quotes this passage from a manuscript no longer extant, or makes editorial choices of his own, Tesserae can lend an objective measurement to traditional stylistic analysis. The following Tesserae results are from my stylistic experiment to compare the three versions of 2nd Corinthians – Augustine’s closest rendition of Paul to the Vulgate / Versio Antiqua tradition – to the entire Ciceronian corpus. Notice that Augustine’s presentation of Pauline diction finds more instances of shared language than that of the Vulgate or Old Latin version. (For a full explanation of the search parameters that target style, rather than allusion, see below at “Explanation of Search Parameters.”)

Ciceronian corpus vs. 2nd Cor. in DDC

Ciceronian corpus vs. 2nd Cor. in Versio Antiqua

Ciceronian corpus vs. 2nd Cor. in Vulgate

*NB Tesserae expects to display results from one source text, but Cicero’s corpus includes many texts in one file. Tesserae only displays the location information for Cicero according to its initial processing. To find the location of a Ciceronian match, simply search for the exact string of characters from the above results in cicero.corpus_1.tess and cicero.corpus_2.tess. (The file is split into two parts so as not to exceed upload capacity; copy and paste cicero.corpus_2 into cicero.corpus_1 for the full .tess file.)

Examples of matches:

DDC: quoniam quidem multi gloriantur
Cicero Tusc. 3.66: quoniam quidem res in nostra potestate est
DDC: Iterum dico ne quis me existimet
Cicero Ver. 2.5.9: ne quis emeret nisi in demortui locum

These results are not particularly meaningful by themselves, but the composite of all the possible results like this is a key component of style.

This graph displays the total number of times Tesserae found a set of match-words in Cicero’s corpus shared by each version of 2nd Corinthians. Thus the first column shows the total number of times Tesserae found one to four word chunks in the same position relative to each other in DDC and Cicero’s corpus. No similar pattern emerges from the same three passages tested against the corpora of Caesar, Tacitus, or even against another rhetorical work, Quintilian’s Institutio Oratoria. (Compare here: CaesC DDC CaesC VA CaesC Vul; Q DDC Q VA Q Vul; TC DDC TC VA TC Vul.) This suggests that Augustine’s translation does not intertext more frequently with classical works generally, and suggests that Augustine’s presentation of Paul’s language is indeed more “Ciceronian” than other Latin translations.

Support for this method of stylistic analysis:

For centuries, Lactantius has been called “the Christian Cicero.” Jerome ascribes to Lactantius “Tullian eloquence” (Ep. 58.10), developed by Pico della Mirandola as “Ciceronem sed Christianum.” Even in the present day authors argue that Lactantius deserves this title due to the form and elegance of his language. Tertullian and Justin, other early late antique prose authors, receive no such recognition – in fact, their styles are often disparaged even though their works were extremely popular. Tesserae is able to capture this difference in style. Lactantius’ De Mortibus Persecutorum intertexts with Cicero’s corpus at a much higher rate than similarly sized selections from Tertullian’s Apologeticum and Justin’s Epitome:

(This graph displays the rate of intertextuality rather than the number of intertexts, since these large selections must be normalized by number of phrases. Rate of intertextuality = number of Tesserae results / (number of source text phrases * number of target text phrases).)

This stylistic measurement remains consistent even when very small chunks of text are compared to Cicero’s corpus. Following are Tesserae comparisons of small selections chosen at random from Lactantius, Tertullian, and Justin, each about the same size as 2nd Corinthians 11:16-31.

CC Justin small CC Justin small2 CC Justin small3

CC Lact small CC Lactantius small2 CC Lactantius small3

CC Tertullian small CC Tertullian small2 CC Tertullian small3

Averaging these results, we find that Lactantian language, even at small scales, intertexts with Cicero’s corpus much more than that of Tertullian or Justin:

This method supports the observations of ancients and scholars to this day who call the style of Lactantius Ciceronian. Such stylistic analysis can be used on other texts, like Augustine’s translations of Pauline diction, with greater confidence.

Future goals for this project include testing hundreds of small sections of Lactantius, Tertullian, and Justin (and other authors) against Cicero’s corpus to get a more stable average. This will allow me to add meaningful error bars to the above graph and be more confident in the significance of an author matching Cicero “twice as much” as another.
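One simple way such error bars might be computed is the standard error of the mean across samples. The sketch below is an assumption about how this could be done, not a description of the analysis behind the graph above; the function name is hypothetical and the sample rates are invented placeholders, not measured values.

```python
import math
import statistics


def mean_and_standard_error(rates):
    """Mean of per-sample rates and the standard error of that mean."""
    if len(rates) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(rates)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(rates) / math.sqrt(len(rates))
    return mean, sem


# Invented rates of intertextuality for three small samples of one author.
sample_rates = [2.1e-5, 1.8e-5, 2.4e-5]
mean, sem = mean_and_standard_error(sample_rates)
print(f"mean = {mean:.2e}, standard error = {sem:.2e}")
```

With only three samples per author the standard error is itself poorly estimated, which is why a much larger number of sections is needed before the error bars become informative.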

Explanation of Search Parameters:

The search parameters used in these Tesserae comparisons are very different from the pre-set options geared towards finding possible instances of allusion. My searches use the following parameters:

--unit phrase This makes a Tesserae search divide units of speech by phrase, rather than by line. This option should be chosen for all prose text comparisons.

--dist 4 This constricts a Tesserae match to words that have two or fewer intervening words. The normal parameter is 10, allowing words to match across lines of poetry or across long clauses in prose. It’s important for a stylistic search to look only for words that appear very close to each other, rather than words that might constitute an element of intertextuality across a long distance. For comparisons with much smaller corpora, such as the texts of 2nd Corinthians compared to Caesar’s corpus, I widened the dist metric to 6. This includes a few more matches and provides a more stable measurement for otherwise very sparse results.

--stop 0 This allows Tesserae to search for matches that include every word in the corpus. Usually this parameter is set to 10, which excludes the top 10 most common words in search texts from results. This is helpful for eliminating results in an allusion search, where the user probably won’t want to examine results including qui, sum, et, in, etc. For a stylistic search, however, it’s extremely important to include all words, developing a composite of all the combinations of words that make up an author’s style rather than unusual or salient instances that might be allusions.

--feature word This instructs Tesserae to search for exact word matches rather than matches based on shared stems, or lemmata. Whereas an allusion might contain shared lemmata in different forms depending on local grammatical constraints, exact word matching allows Tesserae to capture elements of style like a tendency towards accusative + infinitive, a preference for a particular tense of a verb, etc. This kind of matching is much more precise for stylistic analysis.

A note on phrase-based searching:

While many other stylistic measurements exist, Tesserae is unique in measuring shared language in the context of an author’s phrase (delimited by periods and colons). Tesserae also looks not only at word frequency, but at the relative position of sets of words to each other in phrases. This helps Tesserae capture style in ways that corpus word frequency measurements do not.

I welcome comments and suggestions at acstaab@buffalo.edu.

Vulgate available on Tesserae

Posted on February 22, 2016 by Caitlin Diddams

Jerome’s Vulgate is now available for Tesserae searches, either as full text or by individual book.

Our text, taken from Perseus, lacked the punctuation Tesserae needs to determine phrases in prose. We have added semicolons at the end of each verse so that verse functions as the Vulgate’s primary phrase unit. When the Vulgate is compared to a work of poetry, Tesserae also reads the poetry by phrase (rather than by line). We welcome comments and suggestions, and anticipate intriguing results. Happy searching,

The Tesserae Team

Collected Benchmark Sets

Posted on October 27, 2015 by Caitlin Diddams

An updated collection of all our benchmark data (click to download):

Greek to Greek:

Apollonius’ Argonautica vs. Homer’s Iliad and Odyssey

Hunter – Apollonius: Richard Hunter’s commentary. Partially complete. Hand ranked.

Apollonius’ Argonautica 3 vs. Homer’s Iliad and Odyssey.

aprhodemily: Complete. Unranked.

Greek to Latin:

Vergil’s Aeneid vs. Homer’s Iliad

Knauer – Iliad: Knauer’s commentary. Complete. Hand ranked.

aeneid1-iliad_include_uni_blank Raw.

aeneid1-iliad Raw.

Vergil’s Aeneid vs. Homer’s Odyssey

Knauer – Odyssey: Knauer’s commentary. Complete. Unranked.

Vergil’s Aeneid vs. Apollonius’s Argonautica

Vergil-Apollonius_Rhodius: Raw. Unranked.

Vergil’s Georgics vs. Homer’s Iliad and Odyssey

Georgiques: Partially complete. Partially ranked.

Latin to Latin:

Lucan’s Bellum Civile 1 vs. Vergil’s Aeneid

aen_luc1_hand: Complete. Hand ranked (derived from ‘bench4’ below)

Lucan-Vergil: Complete. Hand ranked.

slj.txt Complete Tesserae results. Scored. Raw.

bench3 Complete Tesserae results. Scored.

bench4 Complete Tesserae results. Scored.

Tesserae-2010-Benchmark Complete Tesserae results. Scored. Formatted and organized with matchwords in red.

Tesserae-2012-Benchmark Complete Tesserae results. Scored. Includes statistical calculations.

all_lucan Lucan’s Bellum Civile 2-10 vs. Vergil’s Aeneid. Raw.

Statius’ Achilleid vs. Various Authors

Achilleid: includes Vergil’s Aeneid, Ovid’s Metamorphoses, Heroides, and Amores, and Statius’s Thebaid. Complete. Unranked.

Please feel welcome to contact us with comments or questions.