Collected Benchmark Sets (Updated)

An updated collection of all our benchmark data, available for download:

Latin to Latin:

Lucan’s Bellum civile book I vs. Vergil’s Aeneid

Lucan.BC1-Verg.Aeneid.benchmark1 Complete. Hand ranked. Derived from ‘Lucan.BC1-Verg.Aeneid.tess.results3’ below.

Lucan.BC1-Verg.Aeneid.benchmark2 Complete. Hand ranked.

Lucan.BC1-Verg.Aeneid.tess.results1 Complete Tesserae results. Scored. Raw.

Lucan.BC1-Verg.Aeneid.tess.results2 Complete Tesserae results. Scored.

Lucan.BC1-Verg.Aeneid.tess.results3 Complete Tesserae results. Scored.

Lucan.BC1-Verg.Aeneid.benchmark.2010 Complete Tesserae results. Scored. Formatted and organized with match-words in red.

Lucan.BC1-Verg.Aeneid.benchmark.2012 Complete Tesserae results for Lucan’s Bellum civile II-IX vs. Vergil’s Aeneid. Scored. Includes statistical calculations. Raw.

Statius’ Achilleid vs. various (Latin)

Stat.Achilleid1.benchmark Complete. Unranked. Compiled during the Geneva Seminar.

Greek to Greek:

Apollonius’ Argonautica vs. Homer’s Iliad and Odyssey

Ap.Argonautica-Homer.benchmark Based on Richard Hunter’s commentary. Partially complete. Hand ranked.

Apollonius’ Argonautica book III vs. Homer’s Iliad and Odyssey

Ap.Argonautica3-Homer.benchmark Complete. Unranked.

Greek to Latin:

Vergil’s Aeneid vs. Homer’s Iliad

Verg.Aeneid1-Iliad.benchmark Complete. Hand ranked. Based on Knauer (1964). 

Verg.Aeneid1-Iliad.benchmark.raw.1 Raw.

Verg.Aeneid1-Iliad.benchmark.raw.2 Raw.

Vergil’s Aeneid vs. Homer’s Odyssey

Verg.Aeneid1-Odyssey.benchmark Complete. Unranked. Based on Knauer (1964).

Vergil’s Aeneid vs. Apollonius’ Argonautica

Verg.Aeneid-Ap.Argonautica.benchmark.Neils2001 Raw. Unranked.

Vergil’s Georgics vs. various (Greek and Latin)

Verg.Georgics4.benchmark Partially complete. Partially ranked.


Estimating the Size of the Corpus

We recently took stock of our corpus and thought users might find its aggregate numbers useful. A convenient point of comparison is the largest publication environment for open-source Greek and Latin texts: the Scaife Viewer, which includes the Open Greek and Latin texts and all CTS-compliant texts from the Perseus Digital Library.

The following is an estimated word count for the Tesserae corpus, broken into steps so that the calculation is transparent. The result also gives an overview of how the corpus compares with what is available in Scaife.

1.) Total corpus word-count for Version 5, Greek and Latin: 19,700,723 words

2.) Total word-count for Tesserae texts not included in the Scaife Viewer (“Tesserae-only texts”): 469,270 words

  • Incidentally, the “Tesserae-only texts” are all Latin.
  • The number is relatively small (less than 3% of the corpus as a whole) because the overwhelming majority of texts in the Tesserae corpus draw on the same repositories as Scaife (OGL, Perseus DL, CSEL, First 1K Greek, etc.).

3.) Total currently available in the Scaife Viewer: 67,900,000 words, of which 30,300,000 are Greek and 16,500,000 Latin

4.) Difference between the Scaife Viewer total and the Tesserae corpus, excluding the Tesserae-only texts: 67,900,000 − (19,700,723 − 469,270) = 48,668,547 words
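The arithmetic in steps 1 to 4 can be reproduced in a few lines. This is a minimal Python sketch that uses only the figures stated above; the variable names are illustrative and not part of any Tesserae tooling.

```python
# Figures as reported above (Tesserae Version 5 and the Scaife Viewer).
TESSERAE_TOTAL = 19_700_723   # step 1: whole Tesserae corpus, Greek and Latin
TESSERAE_ONLY = 469_270       # step 2: Tesserae texts not in the Scaife Viewer
SCAIFE_TOTAL = 67_900_000     # step 3: Scaife Viewer total (rounded figure)

# Words Tesserae shares with Scaife: everything except the Tesserae-only texts.
shared = TESSERAE_TOTAL - TESSERAE_ONLY

# Step 4: words in Scaife that Tesserae does not yet have.
gap = SCAIFE_TOTAL - shared

# Fraction of the Tesserae corpus made up of Tesserae-only texts.
tesserae_only_share = TESSERAE_ONLY / TESSERAE_TOTAL

print(f"Shared with Scaife:  {shared:,} words")
print(f"Gap to close:        {gap:,} words")
print(f"Tesserae-only share: {tesserae_only_share:.2%}")
```

Running it reproduces the 48,668,547-word figure from step 4 and puts the Tesserae-only texts at about 2.4% of the corpus, consistent with the "less than 3%" note in step 2.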

To search the entire body of texts available in the Scaife Viewer, Tesserae would need to add roughly 50 million (48,668,547) words of Greek and Latin from the Open Greek and Latin corpus and its associated repositories. There is plenty of room for Tesserae to grow in this new and evolving environment of open-source Greek and Latin texts.