Building a Dynamic Lexicon from a Digital Library, 2008
Scope and Contents
We describe in detail our work toward creating a dynamic lexicon from the texts in a large digital library. By leveraging a small structured knowledge source (a 30,457-word treebank), we are able to extract selectional preferences for words from a 3.5-million-word Latin corpus. This is promising news for low-resource languages and for digital collections seeking to turn a small human investment into a much larger gain. The library architecture in which this work is developed allows us to query customized subcorpora to report on lexical usage by author, genre, or era, and to continually update the lexicon as new texts are added to the collection.
Dates
- Creation: 2008
Creator
- Bamman, David (Person)
- Crane, Gregory (Person)
Access
Open for research.
Extent
From the Series: 56 Digital Object(s)
Language of Materials
English
Subject
- Perseus Project (Organization)
Repository Details
Part of the Tufts Archival Research Center Repository
35 Professors Row
Tisch Library Building
Tufts University
Medford, Massachusetts 02155, United States
617-627-3737
archives@tufts.edu