LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach

Maria Pilligua*, Nil Biescas*, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, Sanket Biswas*

*Corresponding author for this work

Research output: Book chapter / Chapter / Research / Peer-reviewed

Abstract

The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining. Traditional methods often falter with variable document types, leading to poor performance. To overcome these limitations, this paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration (DIR) systems. We propose LayeredDoc, which utilizes two layers of information: the first targets coarse-grained graphic components, while the second refines machine-printed textual content. This hierarchical DIR framework dynamically adjusts to the characteristics of the input document, facilitating effective domain adaptation. We evaluated our approach both qualitatively and quantitatively using a new real-world dataset, LayeredDocDB, developed for this study. Initially trained on a synthetically generated dataset, our model demonstrates strong generalization capabilities for the DIR task, offering a promising solution for handling variability in real-world data. Our code is available on GitHub (https://github.com/mpilligua/LayeredDoc).
Original language: English
Title of host publication: Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings
Editors: Harold Mouchère, Anna Zhu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 27-39
Number of pages: 13
ISBN (electronic): 9783031706455
ISBN (print): 9783031706455, 9783031706448
DOI
Status: Published - 11 Nov 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14935 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349
