Text line extraction in graphical documents using background and foreground information

Partha Pratim Roy, Umapada Pal, Josep Lladós

Research output: Contribution to journal › Article › Research › peer-review

22 Citations (Scopus)

Abstract

In graphical documents (e.g., maps and engineering drawings) and artistic documents, text lines are often annotated in multiple orientations or along curved paths to label different locations or symbols. Optical character recognition of such documents requires that the individual text lines first be extracted. In this paper, we propose a novel method for segmenting such text lines based on the foreground and background information of the text components. To exploit the background information effectively, we use the water reservoir concept. In the proposed scheme, individual components are first detected and grouped hierarchically into character clusters using size and positional information. Next, each cluster is extended at its two extreme ends to determine potential candidate regions. Finally, with the help of these candidate regions, individual lines are extracted. Experimental results are presented on several datasets, including graphical documents, camera-based warped documents, and noisy images containing seals. The results demonstrate that our approach is robust and invariant to the size and orientation of the text lines present in the document. © 2011 Springer-Verlag.
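
The grouping step mentioned in the abstract (clustering components by size and positional information) can be illustrated with a minimal sketch. This is not the authors' algorithm and does not implement the water reservoir concept; it only assumes that each connected component is already available as a bounding box, and the names `Box`, `group_components`, `size_ratio`, and `gap_factor` are hypothetical choices made for illustration. Two components are merged when their sizes are comparable and their centers are close, using a Euclidean distance so that the test does not depend on text orientation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical input format)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

    @property
    def size(self):
        # Use the larger side so the measure is insensitive to rotation by 90 degrees.
        return max(self.w, self.h)


def group_components(boxes, size_ratio=2.0, gap_factor=1.5):
    """Group boxes of similar size whose centers lie close together.

    size_ratio and gap_factor are illustrative thresholds, not values
    taken from the paper. Returns a sorted list of index lists.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            big, small = max(a.size, b.size), min(a.size, b.size)
            # Size criterion: skip pairs whose sizes differ too much.
            if small == 0 or big / small > size_ratio:
                continue
            # Positional criterion: centers within gap_factor * larger size.
            (ax, ay), (bx, by) = a.center, b.center
            if hypot(ax - bx, ay - by) <= gap_factor * big:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling `group_components` on a list of `Box` objects returns lists of indices, one list per cluster. A real pipeline would obtain the boxes from a connected-component labeling step and would still need the later stages described in the abstract (cluster extension and candidate-region analysis) to produce text lines.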
Original language: English
Pages (from-to): 227-241
Journal: International Journal on Document Analysis and Recognition
Volume: 15
Issue number: 3
Publication status: Published - 1 Sep 2012

Keywords

  • Artistic documents
  • Foreground-background information
  • Graphical document analysis
  • Multi-oriented text line segmentation
