Word-Wise Thai and Roman Script Identification

Sukalpa Chanda, Umapada Pal, Oriol Ramos Terrades

Research output: Contribution to journal › Article › Research › Peer-reviewed

13 Citations (Scopus)

Abstract

In some Thai documents, a single text line of a printed page may contain words in both Thai and Roman scripts. For Optical Character Recognition (OCR) of such a page, it is better to first identify the Thai and Roman portions and then apply a script-specific OCR system to each identified portion. In this article, an SVM-based method is proposed for word-wise identification of printed Roman and Thai scripts within a single line of a document page. The document is first segmented into lines, and the lines are then segmented into character groups (words). The proposed scheme identifies the script of a character group by combining character features obtained from structural shape, profile behavior, component overlapping information, topological properties, and the water reservoir concept, among others. In an experiment on 10,000 words, the proposed scheme achieved 99.62% script identification accuracy.
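
The abstract does not say how the line and word segmentation is carried out. As a rough illustration only, the sketch below assumes a binarized page (1 = ink, 0 = background) and uses projection profiles, a common technique that may differ from the authors' actual procedure. The names `runs`, `segment_lines`, and `segment_words`, and the `min_gap` threshold, are hypothetical; feature extraction and the SVM classifier are not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed binary page: 1 = ink, 0 = background
Span = Tuple[int, int]   # half-open interval [start, end)


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Return spans of non-zero profile values, merging neighbouring
    spans separated by fewer than min_gap empty positions."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends at the first empty position
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same group
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Span]:
    """Text lines as row spans, from the horizontal projection profile."""
    return runs([sum(row) for row in img])


def segment_words(img: Image, line: Span, min_gap: int = 2) -> List[Span]:
    """Character groups (words) in one line as column spans, from the
    vertical projection profile restricted to that line's rows."""
    top, bottom = line
    width = len(img[0]) if img else 0
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile, min_gap)
```

Each word span returned by `segment_words` would then be cropped and passed to a feature extractor and script classifier. The `min_gap` value decides which gaps count as inter-character rather than inter-word, and would need tuning per font size and resolution.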
Original language: English
Article number: 11
Number of pages: 21
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 8
Issue number: 3
Publication status: Published - 1 Aug 2009
