Document Visual Question Answering

Dataset

Description

The "Document Visual Question Answering" (DocVQA) challenge, focuses on a specific type of Visual Question Answering task, where visually understanding the information on a document image is necessary in order to provide an answer. This goes over and above passing a document image through OCR, and involves understanding all types of information conveyed by a document. Textual content (handwritten or typewritten), non-textual elements (marks, tick boxes, separators, diagrams), layout (page structure, forms, tables), and style (font, colours, highlighting), to mention just a few, are pieces of information that can be potentially necessary for responding to the question at hand.

The DocVQA challenge is a continuous effort linked to various events. The challenge was originally organised in the context of the CVPR 2020 Workshop on Text and Documents in the Deep Learning Era. A paper reporting the results of this first edition was presented at the International Workshop on Document Analysis Systems (DAS). The second edition will take place in the context of the International Conference on Document Analysis and Recognition (ICDAR) 2021.
Date made available: 19 Feb 2023
Publisher: Computer Vision Center - Robust Reading Competition Portal
Date of data production: 2020 - 2023
