Document Collection Visual Question Answering

Rubèn Tito*, Dimosthenis Karatzas, Ernest Valveny

*Corresponding author for this work

Research output: Book chapter, Chapter, Research, Peer-reviewed

12 citations (Scopus)

Abstract

Current tasks and methods in Document Understanding aim to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices) that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA), a new dataset and related task in which questions are posed over a whole collection of document images, and the goal is not only to provide the answer to the given question but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset, we propose a new evaluation metric and baselines that provide further insights into the new dataset and task.

Original language: English
Title of host publication: Document Analysis and Recognition – ICDAR 2021 - 16th International Conference, Proceedings
Editors: Josep Lladós, Daniel Lopresti, Seiichi Uchida
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 778-792
Number of pages: 15
ISBN (print): 9783030863302
Publication status: Published - 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12822 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349
