Computer Science and Engineering

Christian R. Shelton, Professor

Annotating Historical Archives of Images (2008)

by Xiaoyue Wang, Lexiang Ye, Eamonn Keogh, and Christian Shelton

Abstract: Recent initiatives like the Million Book Project and Google Print Library Project have already archived several million books in digital format, and within a few years a significant fraction of the world’s books will be online. While the majority of the data will naturally be text, there will also be tens of millions of pages of images. Many of these images will defy automated annotation for the foreseeable future, but a considerable fraction of the images may be amenable to automatic annotation by algorithms that can link the historical image with a modern counterpart, with its attendant metatags. In order to perform this linking, we must have a suitable distance measure which appropriately combines the relevant features of shape, color, texture, and text. However, the best combination of these features will vary from application to application and even from one manuscript to another. In this work, we propose a simple technique to learn the distance measure by perturbing the training set in a principled way. We show the utility of our ideas on archives of manuscripts containing images from natural history and cultural artifacts.

Download Information

Xiaoyue Wang, Lexiang Ye, Eamonn Keogh, and Christian Shelton (2008). "Annotating Historical Archives of Images." Joint Conference on Digital Libraries (pp. 341-350).

BibTeX citation

   author = "Xiaoyue Wang and Lexiang Ye and Eamonn Keogh and Christian Shelton",
   title = "Annotating Historical Archives of Images",
   booktitle = "Joint Conference on Digital Libraries",
   booktitleaddr = "{JCDL}",
   year = 2008,
   pages = "341--350",

