Perseus Project

 Organization

Found in 106 Collections and/or Records:

A Document Recognition System for Early Modern Latin

 Digital Image
Call Number: tufts:PB.001.001.00021
Dates: 2006

A Document Recognition System for Early Modern Latin, 2006

 Item
Call Number: PB.001.001.00021
Scope and Contents: Large-scale digitization of manuscripts is facilitated by high-accuracy optical character recognition (OCR) engines. The focus of our work is on using these tools to digitize Latin texts. Many of the texts in the language, especially the early modern, make heavy use of special characters like ligatures and accented abbreviations. Current OCRs are inadequate for our purpose: their built-in training sets do not include all these special characters, and further, post-processing of OCR output is...
Dates: 2006

A New Generation of Textual Corpora: Mining Corpora from Very Large Collections

 Digital Image
Call Number: tufts:PB.001.001.00006
Dates: 2007

A New Generation of Textual Corpora: Mining Corpora from Very Large Collections, 2007

 Item
Call Number: PB.001.001.00006
Scope and Contents: While digital libraries based on page images and automatically generated text have made possible massive projects such as the Million Book Library, Open Content Alliance, Google, and others, humanists still depend upon textual corpora expensively produced with labor-intensive methods such as double-keyboarding and manual correction. This paper reports the results from an analysis of OCR-generated text for classical Greek source texts. Classicists have depended upon specialized manual...
Dates: 2007

An Ownership Model of Annotation: The Ancient Greek Dependency Treebank

 Digital Image
Call Number: tufts:PB.001.002.00008
Dates: 2009

An Ownership Model of Annotation: The Ancient Greek Dependency Treebank, 2009

 Item
Call Number: PB.001.002.00008
Scope and Contents: We describe here the first release of the Ancient Greek Dependency Treebank (AGDT), a 190,903-word syntactically annotated corpus of literary texts including the works of Hesiod, Homer and Aeschylus. While the far larger works of Hesiod and Homer (142,705 words) have been annotated under a standard treebank production method of soliciting annotations from two independent reviewers and then reconciling their differences, we also put forth with Aeschylus (48,198 words) a new model of treebank...
Dates: 2009

Analyzing Human Systems Across Time, Space, Language, and Culture

 Digital Image
Call Number: tufts:PB.001.003.00002
Dates: 2011

Analyzing Human Systems Across Time, Space, Language, and Culture, 2011

 Item
Call Number: PB.001.003.00002
Scope and Contents: Due to the rise of very large, heterogeneous collections, increasingly sophisticated multilingual services, and expanding high performance computing infrastructure, we are now in a position to begin studying 4000 years of linguistic data from around the world, tracing change within languages, the interaction of languages, the evolution and circulation of ideas, and the patterns of human society. Language has been an impenetrable barrier: we can reach any point on the globe in a matter of hours...
Dates: 2011

Beyond Digital Incunabula: Modeling the Next Generation of Digital Libraries (preprint)

 Digital Image
Call Number: tufts:gcrane-2006.00002
Dates: 2006

Beyond Digital Incunabula: Modeling the Next Generation of Digital Libraries (preprint), 2006

 Item
Call Number: PB.001.001.00002
Scope and Contents: Abstract: This paper describes several incunabular assumptions that impose upon early digital libraries the limitations drawn from print, and argues for a design strategy aimed at providing customization and personalization services that go beyond the limiting models of print distribution, based on services and experiments developed for the Greco-Roman collections in the Perseus Digital Library. Three features fundamentally characterize a successful digital library design: finer granularity of...
Dates: 2006