Efficient segmentation-free keyword spotting in historical document collections

Marçal Rusiñol, David Aldavert, Ricardo Toledo, Josep Lladós

Research output: Contribution to journal › Article › Research › peer-review

81 Citations (Scopus)

Abstract

© 2014 Elsevier Ltd. All rights reserved. In this paper we present an efficient segmentation-free word spotting method for historical document collections that follows the query-by-example paradigm. We use a patch-based framework in which local patches are described by a bag-of-visual-words model built on SIFT descriptors. By projecting the patch descriptors to a topic space with latent semantic analysis and compressing them with product quantization, we index the document information efficiently in terms of both memory and time. The proposed method is evaluated on four different collections of historical documents and achieves good performance in both handwritten and typewritten scenarios. The results outperform recent state-of-the-art keyword spotting approaches.
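The indexing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random histograms stand in for real bag-of-visual-words patch descriptors, the function names (`lsa_project`, `pq_train`, `pq_encode`, `pq_asymmetric_distances`) and all sizes (64 visual words, 16 topics, 4 subspaces, 8 centroids) are assumptions chosen for illustration, and the k-means routine is a deliberately simple one. The sketch assumes the topic dimension is divisible by the number of subspaces and that there are at most 256 centroids per subspace.

```python
import numpy as np

def lsa_project(histograms, n_topics):
    """Project histograms (n_patches, vocab_size) to a topic space via truncated SVD."""
    _, _, vt = np.linalg.svd(histograms, full_matrices=False)
    basis = vt[:n_topics].T              # (vocab_size, n_topics)
    return histograms @ basis, basis

def kmeans(data, k, n_iter, rng):
    """Plain Lloyd's k-means; returns the (k, dim) centroids."""
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:         # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def pq_train(vectors, n_subspaces, n_centroids, rng, n_iter=10):
    """Learn one codebook per subspace (product quantization)."""
    sub_dim = vectors.shape[1] // n_subspaces
    return [kmeans(vectors[:, i * sub_dim:(i + 1) * sub_dim], n_centroids, n_iter, rng)
            for i in range(n_subspaces)]

def pq_encode(vectors, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    sub_dim = codebooks[0].shape[1]
    codes = np.empty((len(vectors), len(codebooks)), dtype=np.uint8)
    for i, cb in enumerate(codebooks):
        chunk = vectors[:, i * sub_dim:(i + 1) * sub_dim]
        dists = ((chunk[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
        codes[:, i] = dists.argmin(axis=1)
    return codes

def pq_asymmetric_distances(query, codes, codebooks):
    """Approximate squared distances from an uncompressed query to all encoded vectors."""
    sub_dim = codebooks[0].shape[1]
    total = np.zeros(len(codes))
    for i, cb in enumerate(codebooks):
        q = query[i * sub_dim:(i + 1) * sub_dim]
        table = ((cb - q) ** 2).sum(axis=1)   # distance from q to every centroid
        total += table[codes[:, i]]           # look up instead of recomputing
    return total

# Synthetic stand-in for patch histograms (assumption: 200 patches, 64 visual words).
rng = np.random.default_rng(0)
histograms = rng.random((200, 64))
topics, basis = lsa_project(histograms, n_topics=16)
codebooks = pq_train(topics, n_subspaces=4, n_centroids=8, rng=rng)
codes = pq_encode(topics, codebooks)
dists = pq_asymmetric_distances(topics[0], codes, codebooks)
ranking = np.argsort(dists)                   # candidate patches, closest first
```

In this sketch each patch is stored as 4 bytes of codes instead of 16 floating-point topic weights, and a query is compared against the whole index through small per-subspace lookup tables, which is where the memory and time savings of this kind of scheme come from.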
Original language: English
Pages (from-to): 545-555
Journal: Pattern Recognition
Volume: 48
Issue number: 2
Publication status: Published - 1 Jan 2015

Keywords

  • Dense SIFT features
  • Historical documents
  • Keyword spotting
  • Latent semantic analysis
  • Product quantization
  • Segmentation-free
