Word searching in unconstrained layout using character pair coding

Partha Pratim Roy, Umapada Pal, Josep Lladós

Research output: Contribution to journal › Article › Research › peer-review

Abstract

© 2014, Springer-Verlag Berlin Heidelberg. Word searching in documents with non-structural layout, such as graphical documents, is difficult because text words appear in arbitrary orientations alongside graphical symbols. This paper presents an efficient indexing and retrieval approach for word searching in such documents. The proposed indexing scheme stores the spatial information of a document's text characters in a character spatial feature table (CSFT). The spatial feature of each text component is derived from information about its neighboring components. Multi-scale and multi-oriented components are labeled as characters using support vector machines. For searching, the query string is split into its possible character pairs, which carry the positional information of its characters. Each character pair is used to locate the corresponding text components in the document with the help of the CSFT. The retrieved text components are then joined into sequences by matching their spatial information. A string matching algorithm then matches the query word against the character pair sequences found in the document. Experimental results are presented on two datasets of graphical documents: a maps dataset and a seal/logo image dataset. The results show that the method searches query words efficiently in unconstrained document layouts with arbitrary text orientation.
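
The pair-based lookup described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the dictionary standing in for the CSFT, and the toy component data are all hypothetical. The sketch assumes that only adjacent characters form pairs, that component labels are already known (the paper obtains them with support vector machines), and that hits are chained by exact matching, whereas the paper matches spatial information and then applies a string matching algorithm.

```python
from collections import defaultdict


def character_pairs(query):
    """Split a query word into its adjacent character pairs (an assumption)."""
    return [(query[i], query[i + 1]) for i in range(len(query) - 1)]


def build_pair_table(labels, neighbors):
    """Index neighboring component pairs by their character labels.

    labels:    component id -> recognized character
    neighbors: component id -> ids of spatially adjacent components
    This dictionary is a simplified, hypothetical stand-in for the CSFT.
    """
    table = defaultdict(list)
    for comp, adjacent in neighbors.items():
        for other in adjacent:
            table[(labels[comp], labels[other])].append((comp, other))
    return table


def search(query, table):
    """Chain pair hits so that each pair starts where the previous one ended."""
    pairs = character_pairs(query)
    if not pairs:
        return []
    chains = [list(hit) for hit in table.get(pairs[0], [])]
    for pair in pairs[1:]:
        hits = table.get(pair, [])
        chains = [chain + [b] for chain in chains for a, b in hits
                  if a == chain[-1] and b not in chain]
    return chains


# Toy document: components 0..4 spell "PARIS"; component 5 is an isolated "A".
labels = {0: "P", 1: "A", 2: "R", 3: "I", 4: "S", 5: "A"}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
table = build_pair_table(labels, neighbors)
print(search("PARIS", table))  # [[0, 1, 2, 3, 4]]
```

Because the table is keyed by character pairs and not by reading order, the lookup does not depend on the orientation of the word in the image. Only the neighbor relation has to be computed from the layout.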
Original language: English
Pages (from-to): 343-358
Journal: International Journal on Document Analysis and Recognition
Volume: 17
Issue number: 4
Publication status: Published - 1 Jan 2014

Keywords

  • Graphical document analysis
  • Graphics recognition
  • Information retrieval
  • Multi-oriented text recognition
  • Word spotting
