ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

Dataset

Description

Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios covered by past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of a structured text extraction scheme rather than on the scheme as a whole. To address these problems, we set up two tracks for the Structured text extraction from Visually-Rich Document images (SVRD) competition:

Track 1: HUST-CELL aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling.
Track 2: Baidu-FEST focuses on evaluating the end-to-end performance and generalization of Few-shot Structured Text extraction.
Compared to current document benchmarks, the benchmark for our two competition tracks greatly enriches the scenarios and contains more than 50 types of visually-rich document images (mainly from actual enterprise applications). In addition, our task settings not only include complex end-to-end entity linking and labeling in Track 1, but also provide zero-shot and few-shot tracks to objectively evaluate the performance and generalization of the competition schemes. We believe that our competition will attract many researchers in the fields of CV and NLP and bring new ideas to the field of Document AI. There are four main tasks in this competition, which are detailed in the Tasks tab.
Date made available: 30 Dec 2022
Date of data production: 10 Jan 2023
