Table detection in business document images by message passing networks

Pau Riba*, Lutz Goldmann, Oriol Ramos Terrades, Diede Rusticus, Alicia Fornés, Josep Lladós

*Corresponding author for this work

Research output: Contribution to journal, Article, Research, peer-reviewed

11 Citations (Scopus)

Abstract

Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, they encode the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition have gained interest in document image analysis, in particular in the absence of rule lines and of prior information about rows and columns. However, business documents usually contain sensitive content, which limits the number of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images that does not require the raw content of the document. Hence, the sensitive content can be removed beforehand and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain is business documents. We have carefully validated our approach on two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.
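
The structural idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the function names (`box_features`, `knn_edges`, `message_passing_step`), the k-nearest-neighbour graph construction, and the fixed averaging update are all assumptions made for illustration, and a real GNN would use learned weights. The sketch only shows how a graph can be built from anonymized bounding boxes (geometry only, no text or pixels) and how one round of message passing mixes each node's features with those of its neighbours.

```python
import math


def box_features(box):
    """Geometry-only node features: centre x, centre y, width, height."""
    x0, y0, x1, y1 = box
    return [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]


def knn_edges(boxes, k=2):
    """Link each box to its k nearest boxes by centre distance (assumed rule)."""
    centres = [box_features(b)[:2] for b in boxes]
    edges = set()
    for i, ci in enumerate(centres):
        others = [j for j in range(len(centres)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(ci, centres[j]))[:k]
        for j in nearest:
            edges.add((i, j))
            edges.add((j, i))  # store both directions: undirected graph
    return sorted(edges)


def message_passing_step(features, edges):
    """One round: average a node's features with the mean of its neighbours'.

    No learned parameters are involved; this is a fixed, illustrative update.
    """
    neighbours = {i: [] for i in range(len(features))}
    for src, dst in edges:
        neighbours[dst].append(src)
    updated = []
    for i, own in enumerate(features):
        if not neighbours[i]:
            updated.append(list(own))  # isolated node keeps its features
            continue
        dim = len(own)
        mean = [
            sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
            for d in range(dim)
        ]
        updated.append([0.5 * own[d] + 0.5 * mean[d] for d in range(dim)])
    return updated


# Four hypothetical word boxes laid out as a 2x2 grid, given as (x0, y0, x1, y1).
boxes = [(0, 0, 10, 5), (20, 0, 30, 5), (0, 10, 10, 15), (20, 10, 30, 15)]
edges = knn_edges(boxes, k=2)
states = message_passing_step([box_features(b) for b in boxes], edges)
```

In this toy grid every box has the same width and height, so those two features are unchanged by the update while the centre coordinates move towards the neighbours. That regularity is the kind of local repetitive pattern a trained network could pick up on.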

Original language: English
Article number: 108641
Journal: Pattern Recognition
Volume: 127
DOI
Status: Published - 1 Jul 2022
