Table detection in business document images by message passing networks

Pau Riba*, Lutz Goldmann, Oriol Ramos Terrades, Diede Rusticus, Alicia Fornés, Josep Lladós

*Corresponding author for this work

Research output: Contribution to journal › Article › Research › peer-review

13 Citations (Scopus)

Abstract

Tabular structures in business documents offer a complementary dimension to the raw textual data: they encode the relationships among pieces of information. Digital mailroom applications have become a key service for workflow automation, so the detection and interpretation of tables is crucial. With recent advances in information extraction, table detection and recognition have gained interest in document image analysis, in particular for tables without rule lines and with no prior information about rows and columns. However, business documents usually contain sensitive content, which limits the number of public benchmark datasets. In this paper, we propose a graph-based approach for detecting tables in document images that does not require the raw content of the document. Hence, the sensitive content can be removed beforehand: instead of using the raw image or textual content, we propose a purely structural approach that keeps sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. Our main application domain is business documents. We have validated our approach on two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.
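To make the idea of a purely structural representation concrete, the following is a minimal sketch and not the authors' architecture. It assumes each text region has been reduced to an anonymized bounding box `(x, y, width, height)`, links each box to its nearest neighbours, and runs one round of unweighted mean-aggregation message passing. The function names `knn_edges` and `message_passing_step`, the k-nearest-neighbour graph construction, and the absence of learned weights are all illustrative assumptions.

```python
import math


def knn_edges(boxes, k=2):
    """Connect each box to its k nearest neighbours by centre distance.

    boxes: list of (x, y, width, height) tuples; no text or pixels needed.
    Returns a sorted list of directed (source, target) pairs, with both
    directions present so the graph behaves as undirected.
    """
    centres = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    edges = set()
    for i, ci in enumerate(centres):
        ranked = sorted(
            (j for j in range(len(centres)) if j != i),
            key=lambda j: math.dist(ci, centres[j]),
        )
        for j in ranked[:k]:
            edges.add((i, j))
            edges.add((j, i))
    return sorted(edges)


def message_passing_step(features, edges):
    """One message passing round without learned parameters (assumption).

    Each node's new feature vector is the average of its own vector and
    the mean of its neighbours' vectors. A trained GNN would replace both
    steps with learned functions.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in features]
    counts = [0] * len(features)
    for src, dst in edges:
        for d in range(dim):
            sums[dst][d] += features[src][d]
        counts[dst] += 1
    updated = []
    for i, feat in enumerate(features):
        if counts[i] == 0:
            updated.append(list(feat))  # isolated node keeps its features
            continue
        updated.append(
            [(feat[d] + sums[i][d] / counts[i]) / 2 for d in range(dim)]
        )
    return updated
```

Calling `knn_edges` on four boxes laid out as a 2 x 2 grid links each cell to its row and column neighbours but not to the diagonal one, which is the kind of local repetitive structure the abstract refers to. Node and edge classifiers would then operate on the features produced by several such rounds.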

Original language: English
Article number: 108641
Journal: Pattern Recognition
Volume: 127
Publication status: Published - 1 Jul 2022

Keywords

  • Anonymized document processing
  • Business document processing
  • Graph neural networks
  • Node and edge classification
  • Table detection
