On the influence of word representations for handwritten word spotting in historical documents

Josep Lladós, Marçal Rusiñol, Alicia Fornés, David Fernández, Anjan Dutta

Research output: Contribution to journal › Article › Research › peer-review

40 Citations (Scopus)

Abstract

Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. In this paper we evaluate the performance of different word descriptors to assess the advantages and disadvantages of statistical and structural models in a query-by-example word spotting framework for historical documents. We compare four word representation models: sequence alignment using dynamic time warping (DTW) as a baseline reference, a bag-of-visual-words approach as a statistical model, a pseudo-structural model based on a Loci features representation, and a structural approach in which words are represented by graphs. The four approaches have been tested on two collections of historical data: the George Washington database and the marriage records from the Barcelona Cathedral. We experimentally demonstrate that statistical representations generally give better performance. However, large descriptors are difficult to deploy in a retrieval scenario where word spotting requires indexing millions of word images. © 2012 World Scientific Publishing Company.
Original language: English
Article number: 1263002
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 26
Issue number: 5
Publication status: Published - 1 Aug 2012

Keywords

  • feature representation
  • handwriting recognition
  • historical documents
  • shape descriptors
  • word spotting
