Multimodal page classification in administrative document image streams

Marçal Rusiñol, Volkmar Frinken, Dimosthenis Karatzas, Andrew D. Bagdanov, Josep Lladós

Research output: Contribution to journal › Article › Research › Peer-reviewed

36 Citations (Scopus)

Abstract

© 2014, Springer-Verlag Berlin Heidelberg. In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream, allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment, and we report results on a dataset of 70,000 pages.
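As a rough illustration of what a hierarchical representation of the pixel intensity distribution could look like, the sketch below computes mean grey levels over successively finer grids (1x1, 2x2, 4x4, ...) and concatenates them into one vector. The abstract does not specify the descriptor, so the pyramid layout, the use of the mean as the per-cell statistic, and the names `mean_intensity` and `pyramid_descriptor` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # greyscale page as rows of pixel values


def mean_intensity(image: Image, top: int, left: int,
                   bottom: int, right: int) -> float:
    """Mean pixel value over rows [top, bottom) and columns [left, right)."""
    total = 0.0
    count = 0
    for r in range(top, bottom):
        for c in range(left, right):
            total += image[r][c]
            count += 1
    # An empty cell can occur if the grid is finer than the image.
    return total / count if count else 0.0


def pyramid_descriptor(image: Image, levels: int = 3) -> List[float]:
    """Concatenate per-cell mean intensities over 1x1, 2x2, 4x4, ... grids.

    Hypothetical descriptor: level k splits the page into 2**k by 2**k cells,
    so the output length is the sum of 4**k for k in range(levels).
    """
    height, width = len(image), len(image[0])
    descriptor: List[float] = []
    for level in range(levels):
        cells = 2 ** level
        for i in range(cells):
            top, bottom = i * height // cells, (i + 1) * height // cells
            for j in range(cells):
                left, right = j * width // cells, (j + 1) * width // cells
                descriptor.append(
                    mean_intensity(image, top, left, bottom, right))
    return descriptor
```

A vector of this kind could then be passed to any off-the-shelf classifier, alone or combined with a textual feature vector; which combination strategy works best is what the paper reports evaluating.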
Original language: English
Pages (from-to): 331-341
Journal: International Journal on Document Analysis and Recognition
Volume: 17
Issue number: 4
Publication status: Published - 1 Jan 2014
