TY - CHAP
T1 - Embedding document structure to bag-of-words through pair-wise stable key-regions
AU - Gao, Hongxing
AU - Rusiñol, Marçal
AU - Karatzas, Dimosthenis
AU - Lladós, Josep
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/4
Y1 - 2014/12/4
AB - Since document structure carries valuable discriminative information, many efforts have been made to extract and understand it, and layout analysis approaches are the most commonly used among them. In this paper, Distance Transform based MSER (DTMSER) is employed to efficiently extract the document structure as a dendrogram of key-regions, which roughly correspond to structural elements such as characters, words and paragraphs. Inspired by the Bag of Words (BoW) framework, we propose an efficient method for structural document matching that represents the document image as a histogram of key-region pairs encoding structural relationships. In experiments on document image retrieval, the proposed method shows a remarkable improvement over typical BoW and pyramidal BoW methods.
UR - http://www.scopus.com/inward/record.url?scp=84919933976&partnerID=8YFLogxK
U2 - 10.1109/ICPR.2014.500
DO - 10.1109/ICPR.2014.500
M3 - Chapter
AN - SCOPUS:84919933976
T3 - Proceedings - International Conference on Pattern Recognition
SP - 2903
EP - 2908
BT - Proceedings - International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
ER -