A novel mutual nearest neighbor based symmetry for text frame classification in video

Palaiahnakote Shivakumara, Anjan Dutta, Trung Quy Phan, Chew Lim Tan, Umapada Pal

    Research output: Contribution to journal › Article › Research › peer-review

    21 Citations (Scopus)

    Abstract

    In multimedia retrieval for video, text frame classification is essential for text detection, event detection, event boundary detection, and related tasks. We propose a new text frame classification method that combines wavelet and median-moment features with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new MaxMin clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For these probable dominant text pixels, a mutual nearest neighbor based symmetry is explored with a four-quadrant formation centered at their centroid to determine whether a block is a true text block. A frame that produces at least one true text block is classified as a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. We also show that existing text detection methods tend to misclassify non-text frames as text frames, measured in terms of recall and precision at both the block and frame levels. © 2011 Elsevier Ltd. All rights reserved.
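
    The structural steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the 16 blocks are assumed to form a 4 x 4 grid, and the wavelet and median-moment features, the k-means and MaxMin clustering steps, and the exact symmetry test are omitted because the abstract does not specify them. Only the grid split, the quadrant formation about the centroid, the generic definition of mutual nearest neighbors, and the frame-level decision rule are shown.

```python
from math import dist


def split_into_blocks(height, width, rows=4, cols=4):
    """Return (top, left, bottom, right) bounds for a rows x cols grid.

    A 4 x 4 grid is an assumption; the abstract only says "16 equally
    sized blocks".
    """
    blocks = []
    for r in range(rows):
        for c in range(cols):
            top, bottom = r * height // rows, (r + 1) * height // rows
            left, right = c * width // cols, (c + 1) * width // cols
            blocks.append((top, left, bottom, right))
    return blocks


def quadrants_about_centroid(points):
    """Partition (x, y) points into four quadrants centered at their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    quads = [[], [], [], []]
    for x, y in points:
        index = (1 if x >= cx else 0) + (2 if y >= cy else 0)
        quads[index].append((x, y))
    return quads


def mutual_nearest_pairs(points_a, points_b):
    """Index pairs (i, j) where points_a[i] and points_b[j] are each
    other's nearest neighbor (the generic mutual nearest neighbor test)."""
    if not points_a or not points_b:
        return []
    nearest_in_b = [
        min(range(len(points_b)), key=lambda j: dist(p, points_b[j]))
        for p in points_a
    ]
    nearest_in_a = [
        min(range(len(points_a)), key=lambda i: dist(q, points_a[i]))
        for q in points_b
    ]
    return [(i, j) for i, j in enumerate(nearest_in_b) if nearest_in_a[j] == i]


def is_text_frame(block_is_text):
    """Frame-level rule from the abstract: one true text block suffices."""
    return any(block_is_text)
```

    How the mutual nearest neighbor pairs across quadrants are turned into a symmetry score for a block is left open here; that detail is in the paper itself rather than the abstract.
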
    Original language: English
    Pages (from-to): 1671-1683
    Journal: Pattern Recognition
    Volume: 44
    Issue number: 8
    Publication status: Published - 1 Aug 2011

    Keywords

    • Frame classification
    • Mutual nearest neighbor
    • Text block location
    • Video image
    • Wavelet-median moments
