Large-scale document image retrieval and classification with runlength histograms and binary embeddings

Albert Gordo, Florent Perronnin, Ernest Valveny

Research output: Contribution to journal › Article › Research › peer-review

30 Citations (Scopus)

Abstract

We present a new document image descriptor based on multi-scale runlength histograms. This descriptor does not rely on layout analysis and can be computed efficiently. We show how this descriptor can achieve state-of-the-art results on two very different public datasets in classification and retrieval tasks. Moreover, we show how we can compress and binarize these descriptors to make them suitable for large-scale applications. We can achieve state-of-the-art results in classification using binary descriptors of as few as 16-64 bits. © 2012 Elsevier Ltd. All rights reserved.
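
To illustrate the kind of feature the abstract refers to, the following is a minimal sketch of a single-scale runlength histogram over the rows of a binarized image. It is not the authors' implementation: the function names, the logarithmic bin boundaries, the choice of horizontal runs only, and the mean-threshold binarization are all assumptions made for illustration. In particular, the thresholding step is a naive stand-in and not the binary embedding used in the paper, and the multi-scale aspect is omitted.

```python
from itertools import groupby


def run_lengths(row, value):
    """Lengths of maximal runs of `value` in a sequence of 0/1 pixels."""
    return [sum(1 for _ in group) for key, group in groupby(row) if key == value]


def quantize(length, num_bins=9):
    """Assumed log-scale binning: 1, 2, 3-4, 5-8, ...; the last bin is open-ended."""
    return min((length - 1).bit_length(), num_bins - 1)


def runlength_histogram(image, num_bins=9):
    """L1-normalized histogram of horizontal white (0) and black (1) runs.

    `image` is a list of rows, each a list of 0/1 pixels (hypothetical input format).
    The first `num_bins` entries count white runs, the next `num_bins` black runs.
    """
    hist = [0.0] * (2 * num_bins)
    for row in image:
        for value in (0, 1):
            for length in run_lengths(row, value):
                hist[value * num_bins + quantize(length, num_bins)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def binarize(descriptor):
    """Naive stand-in for a binary code: 1 where a bin exceeds the mean value."""
    mean = sum(descriptor) / len(descriptor)
    return [1 if v > mean else 0 for v in descriptor]


image = [
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 1],
]
descriptor = runlength_histogram(image)
code = binarize(descriptor)
```

A multi-scale variant could, under the same assumptions, compute such histograms over several resolutions or sub-regions of the page and concatenate them; the abstract does not specify the details, so that step is left out here.
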
Original language: English
Pages (from-to): 1898-1905
Journal: Pattern Recognition
Volume: 46
Publication status: Published - 1 Jul 2013

Keywords

  • Classification
  • Compression
  • Large-scale
  • Retrieval
  • Visual document descriptor
