TY - JOUR
T1 - Table detection in business document images by message passing networks
AU - Riba, Pau
AU - Goldmann, Lutz
AU - Terrades, Oriol Ramos
AU - Rusticus, Diede
AU - Fornés, Alicia
AU - Lladós, Josep
N1 - Publisher Copyright: © 2022
PY - 2022/7/1
Y1 - 2022/7/1
N2 - Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, they convey the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables are crucial. With the recent advances in information extraction, table detection and recognition have gained interest in document image analysis, in particular for tables that lack rule lines and whose row and column information is unknown. However, business documents usually contain sensitive content, which limits the number of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images that does not require the raw content of the document. Hence, the sensitive content can be removed beforehand and, instead of using the raw image or textual content, we propose a purely structural approach that keeps sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain is business documents. We have carefully validated our approach on two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.
AB - Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, they convey the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables are crucial. With the recent advances in information extraction, table detection and recognition have gained interest in document image analysis, in particular for tables that lack rule lines and whose row and column information is unknown. However, business documents usually contain sensitive content, which limits the number of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images that does not require the raw content of the document. Hence, the sensitive content can be removed beforehand and, instead of using the raw image or textual content, we propose a purely structural approach that keeps sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain is business documents. We have carefully validated our approach on two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.
KW - Anonymized document processing
KW - Business document processing
KW - Graph neural networks
KW - Node and edge classification
KW - Table detection
UR - http://www.scopus.com/inward/record.url?scp=85126385555&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2022.108641
DO - 10.1016/j.patcog.2022.108641
M3 - Article
AN - SCOPUS:85126385555
SN - 0031-3203
VL - 127
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108641
ER -