TY - JOUR
T1 - On the influence of word representations for handwritten word spotting in historical documents
AU - Lladós, Josep
AU - Rusiñol, Marçal
AU - Fornés, Alicia
AU - Fernández, David
AU - Dutta, Anjan
PY - 2012/8/1
Y1 - 2012/8/1
N2 - Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. In this paper we evaluate the performance of different word descriptors to assess the advantages and disadvantages of statistical and structural models in a framework of query-by-example word spotting in historical documents. We compare four word representation models, namely sequence alignment using DTW as a baseline reference, a bag of visual words approach as a statistical model, a pseudo-structural model based on a Loci features representation, and a structural approach where words are represented by graphs. The four approaches have been tested with two collections of historical data: the George Washington database and the marriage records from the Barcelona Cathedral. We experimentally demonstrate that statistical representations generally give better performance; however, it cannot be neglected that large descriptors are difficult to implement in a retrieval scenario where word spotting requires indexing data with millions of word images. © 2012 World Scientific Publishing Company.
KW - feature representation
KW - Handwriting recognition
KW - historical documents
KW - shape descriptors
KW - word spotting
UR - https://www.scopus.com/pages/publications/84870486488
U2 - 10.1142/S0218001412630025
DO - 10.1142/S0218001412630025
M3 - Article
SN - 0218-0014
VL - 26
JO - International Journal of Pattern Recognition and Artificial Intelligence
JF - International Journal of Pattern Recognition and Artificial Intelligence
IS - 5
M1 - 1263002
ER -