On the influence of word representations for handwritten word spotting in historical documents

Josep Lladós, Marçal Rusiñol, Alicia Fornés, David Fernández, Anjan Dutta

Research output: Contribution to journal, Article, Research, Peer-reviewed

49 Citations (Scopus)

Abstract

Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. In this paper we evaluate the performance of different word descriptors to assess the advantages and disadvantages of statistical and structural models in a framework of query-by-example word spotting in historical documents. We compare four word representation models: sequence alignment using DTW as a baseline reference, a bag-of-visual-words approach as a statistical model, a pseudo-structural model based on a Loci features representation, and a structural approach where words are represented by graphs. The four approaches have been tested with two collections of historical data: the George Washington database and the marriage records from the Barcelona Cathedral. We experimentally demonstrate that statistical representations generally give better performance. However, it cannot be neglected that large descriptors are difficult to implement in a retrieval scenario where word spotting requires indexing data with millions of word images. © 2012 World Scientific Publishing Company.
Original language: English
Article number: 1263002
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 26
Issue number: 5
Publication status: Published - 1 Aug 2012
