Structure detection and segmentation of documents using 2D stochastic context-free grammars

Francisco Álvaro, Francisco Cruz, Joan Andreu Sánchez, Oriol Ramos Terrades, José Miguel Benedí

    Research output: Contribution to journal; Article; Research; peer-review

    4 Citations (Scopus)

    Abstract

    © 2014 Elsevier B.V. In this paper we define a bidimensional extension of stochastic context-free grammars for structure detection and segmentation of images of documents. Two sets of text classification features are used to perform an initial classification of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of historical marriage license books to validate this approach. We also tested several inference algorithms for probabilistic graphical models and the results showed that the proposed grammatical model outperformed the other methods. Furthermore, grammars also provide the document structure along with its segmentation.
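
The decoding step summarized in the abstract (per-zone classification followed by selection of the most likely hypothesis under a stochastic grammar) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic CYK-style Viterbi search over rectangular regions of a grid of cells, and every name in it (`parse_page`, `leaf_labels`, the labels "Page", "Row", "Body", "Tax", and all probabilities) is a hypothetical choice made for illustration. It assumes binary rules that split a region either vertically ("V", top over bottom) or horizontally ("H", left beside right), and that each cell already carries classifier scores per label.

```python
import math
from functools import lru_cache


def parse_page(cell_prob, rules, start):
    """Return (log-probability, tree) of the best derivation of the whole grid.

    cell_prob[r][c] maps a label to the classifier probability for that cell.
    rules is a list of (lhs, direction, left_child, right_child, probability),
    where direction is "V" (top/bottom split) or "H" (left/right split).
    All of these structures are illustrative assumptions.
    """
    n_rows, n_cols = len(cell_prob), len(cell_prob[0])

    @lru_cache(maxsize=None)
    def best(sym, r0, r1, c0, c1):
        # Best derivation of `sym` over rows [r0, r1) and columns [c0, c1).
        result = (-math.inf, None)

        # A single cell can be labeled directly from the classifier output.
        if r1 - r0 == 1 and c1 - c0 == 1:
            p = cell_prob[r0][c0].get(sym, 0.0)
            if p > 0.0:
                result = (math.log(p), (sym, (r0, c0)))

        # Otherwise try every rule for `sym` and every split point.
        for lhs, direction, left, right, prob in rules:
            if lhs != sym:
                continue
            if direction == "V":
                splits = [((r0, k, c0, c1), (k, r1, c0, c1))
                          for k in range(r0 + 1, r1)]
            else:
                splits = [((r0, r1, c0, k), (r0, r1, k, c1))
                          for k in range(c0 + 1, c1)]
            for region_a, region_b in splits:
                lp_a, tree_a = best(left, *region_a)
                lp_b, tree_b = best(right, *region_b)
                lp = math.log(prob) + lp_a + lp_b
                if lp > result[0]:
                    result = (lp, (sym, direction, tree_a, tree_b))
        return result

    return best(start, 0, n_rows, 0, n_cols)


def leaf_labels(tree):
    """Collect {(row, col): label} from a parse tree returned by parse_page."""
    if len(tree) == 2:  # leaf: (label, (row, col))
        return {tree[1]: tree[0]}
    labels = leaf_labels(tree[2])
    labels.update(leaf_labels(tree[3]))
    return labels


# Toy 2x2 page: the classifier prefers "Tax" for cell (1, 0), but the
# hypothetical grammar only allows rows of the form Body | Tax.
cell_prob = [
    [{"Body": 0.9, "Tax": 0.1}, {"Body": 0.4, "Tax": 0.6}],
    [{"Body": 0.3, "Tax": 0.7}, {"Body": 0.2, "Tax": 0.8}],
]
rules = [
    ("Page", "V", "Row", "Row", 1.0),
    ("Row", "H", "Body", "Tax", 1.0),
]
log_prob, tree = parse_page(cell_prob, rules, "Page")
labels = leaf_labels(tree)
```

In this toy case the grammar overrides the local classifier decision for cell (1, 0), and the returned tree gives the page structure (two rows, each split into a body and a tax zone) together with the cell labeling, which mirrors the abstract's remark that a grammar yields structure along with segmentation. A realistic model would need a richer rule set and a coarser or adaptive region decomposition, since exhaustive search over all rectangles grows quickly with grid size.
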
    Original language: English
    Pages (from-to): 147-154
    Journal: Neurocomputing
    Volume: 150
    Issue number: Part A
    Publication status: Published - 20 Feb 2015

    Keywords

    • Document image analysis
    • Stochastic context-free grammars
    • Text classification features
