In this paper, we present a retrieval system for multipage administrative document images based on textual and visual representations of document pages. Each page is represented by either textual or visual information within a bag-of-words framework. We evaluate several fusion strategies that allow the system to perform multipage document retrieval on top of a single-page retrieval system. Results are reported on a large dataset of document images sampled from a banking workflow.
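As an illustration of how page-level retrieval could be lifted to the document level, the following is a minimal sketch and not the method evaluated in the paper. It assumes whitespace-tokenized text, cosine similarity between bag-of-words vectors, and two simple fusion rules (maximum and mean of per-page best matches). The names `bow`, `cosine`, `document_score`, and `mean` are hypothetical, and the abstract does not state which features, similarity measure, or fusion strategies the authors use.

```python
from collections import Counter
from math import sqrt


def bow(text):
    """Bag-of-words term counts for one page (assumed whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean(values):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(values) / len(values)


def document_score(query_pages, doc_pages, fuse=max):
    """Fuse page-level similarities into one document-level score.

    For every query page, keep its best-matching page in the candidate
    document, then combine those per-page scores with `fuse`
    (assumed here to be `max` or `mean`). Both page lists must be non-empty.
    """
    best = [max(cosine(q, p) for p in doc_pages) for q in query_pages]
    return fuse(best)


# Toy pages standing in for the text extracted from scanned documents.
query = [bow("loan contract signature"), bow("account statement balance")]
doc_a = [bow("loan contract signature"), bow("insurance claim form")]
doc_b = [bow("insurance claim form")]

ranking = sorted(
    {"doc_a": doc_a, "doc_b": doc_b}.items(),
    key=lambda item: document_score(query, item[1], fuse=mean),
    reverse=True,
)
```

Under these assumptions, `max` fusion rewards a document as soon as one of its pages matches well, whereas `mean` fusion favors documents that match every query page. A visual bag-of-words representation could be plugged in by replacing `bow` with a function that counts quantized local image descriptors.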
|Title of host publication|ICPR 2012 - 21st International Conference on Pattern Recognition|
|Number of pages|4|
|Publication status|Published - 2012|
|Series name|Proceedings - International Conference on Pattern Recognition|