Table detection in business document images by message passing networks

Pau Riba*, Lutz Goldmann, Oriol Ramos Terrades, Diede Rusticus, Alicia Fornés, Josep Lladós

*Corresponding author for this work

Research output: Contribution to journal, Article, Research, Peer-reviewed

20 Citations (Scopus)

Abstract

Tabular structures in business documents add a dimension that complements the raw textual data: they encode the relationships among pieces of information. Digital mailroom applications have become a key service for workflow automation, which makes the detection and interpretation of tables crucial. With recent advances in information extraction, table detection and recognition have gained interest in document image analysis, in particular for tables that lack rule lines and whose row and column layout is unknown. However, business documents usually contain sensitive content, which limits the number of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images that does not require the raw content of the document. The sensitive content can therefore be removed beforehand: instead of using the raw image or textual content, we propose a purely structural approach that keeps sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. Our main application domain is business documents. We have validated our approach on two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.
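
As a rough illustration of what "purely structural" can mean, and not the authors' implementation, the sketch below builds a graph from bounding boxes alone and runs one generic message passing round over it. Everything specific here is an assumption made for illustration: the k-nearest-neighbour graph construction, the geometric node features, the mean aggregation, the random untrained weights, and the names `knn_edges` and `message_passing_step`.

```python
import numpy as np

def knn_edges(boxes, k=2):
    """Connect each box to its k nearest neighbours by centre distance.

    boxes: (n, 4) array of [x_min, y_min, x_max, y_max]; no text is used.
    Returns an (n, n) symmetric 0/1 adjacency matrix without self-loops.
    Assumes k < n.
    """
    centres = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a box is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((len(boxes), len(boxes)))
    rows = np.repeat(np.arange(len(boxes)), k)
    adj[rows, nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the graph undirected

def message_passing_step(h, adj, w_self, w_neigh):
    """One round: average the neighbours' states, then update each node."""
    degree = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    messages = (adj @ h) / degree             # mean over neighbours
    return np.maximum(h @ w_self + messages @ w_neigh, 0.0)  # ReLU

# Toy layout (hypothetical): one wide title box above a 2x3 grid of cells.
boxes = np.array([
    [10, 5, 200, 20],
    [10, 40, 60, 55], [80, 40, 130, 55], [150, 40, 200, 55],
    [10, 60, 60, 75], [80, 60, 130, 75], [150, 60, 200, 75],
], dtype=float)

adj = knn_edges(boxes, k=2)

# Node features are geometry only: width, height, centre x, centre y.
h0 = np.stack([boxes[:, 2] - boxes[:, 0],
               boxes[:, 3] - boxes[:, 1],
               (boxes[:, 0] + boxes[:, 2]) / 2,
               (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)

rng = np.random.default_rng(0)                # untrained, random weights
w_self = rng.normal(size=(4, 8))
w_neigh = rng.normal(size=(4, 8))
h1 = message_passing_step(h0, adj, w_self, w_neigh)
```

In a trained model, the weights would be learned and the final node or edge states would feed a classifier that marks table regions; the sketch stops before that step because the abstract does not specify it.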

Original language: English
Article number: 108641
Journal: Pattern Recognition
Volume: 127
Publication status: Published - 1 Jul 2022
