New article compares Tesserae search performance with other methods! Read the full article here: https://arxiv.org/pdf/1905.02973.pdf
The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely, commonly based on none or very few shared words. Arguably, lexical semantics can be resorted to since uncovering semantic relations between words has the potential to increase the support underlying the allusion and alleviate the lexical sparsity. A further obstacle is the lack of evaluation benchmark corpora, largely due to the highly interpretative character of the annotation process. In the present paper, we aim to elucidate the feasibility of automated allusion detection. We approach the matter from an Information Retrieval perspective in which referencing texts act as queries and referenced texts as relevant documents to be retrieved, and estimate the difficulty of benchmark corpus compilation by a novel inter-annotator agreement study on query segmentation. Furthermore, we investigate to what extent the integration of lexical semantic information derived from distributional models and ontologies can aid retrieving cases of allusive reuse. The results show that (i) despite low agreement scores, using manual queries considerably improves retrieval performance with respect to a windowing approach, and that (ii) retrieval performance can be moderately boosted with distributional semantics.
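
To make the retrieval framing in the abstract concrete, here is a minimal sketch in Python. It is not the method from the paper and not how Tesserae works. It only illustrates the general idea that a referencing passage (the query) can be scored against candidate referenced passages (the documents) even when they share no words, provided some word-to-word similarity is available. The similarity table, its values, and all function names are made up for illustration; in practice such scores might come from a distributional model or an ontology. The scoring used here is a soft cosine over bag-of-words counts, chosen because it is simple to show.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-word similarities. The values are invented and only
# stand in for scores a distributional model or an ontology might provide.
TOY_SIMILARITY = {
    frozenset(["sea", "ocean"]): 0.8,
    frozenset(["ship", "vessel"]): 0.7,
}


def word_similarity(a, b, use_semantics):
    """Return 1.0 for identical words, a toy score for related ones, else 0.0."""
    if a == b:
        return 1.0
    if use_semantics:
        return TOY_SIMILARITY.get(frozenset([a, b]), 0.0)
    return 0.0


def soft_dot(x, y, use_semantics):
    """Similarity-weighted dot product of two bags of words (Counters)."""
    return sum(
        cx * cy * word_similarity(wx, wy, use_semantics)
        for wx, cx in x.items()
        for wy, cy in y.items()
    )


def score(query, document, use_semantics=True):
    """Soft cosine between a query and a document, both whitespace-tokenized."""
    q, d = Counter(query.split()), Counter(document.split())
    denom = sqrt(soft_dot(q, q, use_semantics) * soft_dot(d, d, use_semantics))
    return soft_dot(q, d, use_semantics) / denom if denom else 0.0


def rank(query, documents, use_semantics=True):
    """Order candidate documents from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda doc: score(query, doc, use_semantics),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = ["field plough", "ship sea"]
    # Purely lexical matching finds no overlap between query and candidate.
    print(score("vessel ocean", "ship sea", use_semantics=False))
    # With the toy similarities, the related candidate gets a nonzero score.
    print(score("vessel ocean", "ship sea"))
    print(rank("vessel ocean", candidates))
```

With purely lexical matching, the query "vessel ocean" scores zero against "ship sea" because the two share no words. Once the toy similarities are switched on, the same pair scores roughly 0.75 and ranks above the unrelated candidate. This is the kind of lexical sparsity the abstract describes, shown on an artificial example only.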