RoadText Competition on Video Text Detection, Tracking and Recognition



The RoadText challenge is a competition that aims to advance the state of the art in scene text detection, recognition, and tracking in videos. The task is particularly challenging because of the characteristics of text in driving videos. Unlike text in other types of videos, the text in these videos is often incidental and widely dispersed across the scene. In addition, camera movement can introduce distortions such as motion blur, which make the text difficult to recognize.

To evaluate and improve methods for this task, the RoadText challenge will be based on the RoadText-1K[1] dataset, which contains 1000 dash cam videos. Each video is 10 seconds long and recorded at 30 frames per second, which gives 300 frames per video. Text object lifetimes in these videos are typically short, so models need to handle occlusions and cope with tiny text instances that are frequently affected by motion blur and significant perspective distortion. In many cases, a text instance may not be fully readable in any single frame, so detections from multiple frames must be combined to transcribe it successfully.
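
As an illustration of why multiple frames matter when a text instance is not fully readable in any single frame, the following is a minimal sketch of one possible way to combine per-frame recognition results for a single tracked text instance. It uses confidence-weighted voting over the candidate strings. The function name, the (text, confidence) input format, and the `min_confidence` parameter are assumptions made for this example; they are not part of the RoadText challenge, its evaluation protocol, or the RoadText-1K annotation format.

```python
from collections import defaultdict


def aggregate_track_transcription(frame_predictions, min_confidence=0.0):
    """Choose one transcription for a track by confidence-weighted voting.

    frame_predictions: iterable of (text, confidence) pairs, one per frame
    in which the tracked text instance was recognized (hypothetical format).
    Returns the winning string, or None if no prediction is usable.
    """
    scores = defaultdict(float)
    for text, confidence in frame_predictions:
        # Skip empty readings and readings below the confidence threshold.
        if not text or confidence < min_confidence:
            continue
        scores[text] += confidence
    if not scores:
        return None
    # Highest summed confidence wins; ties are broken by string order
    # so that the result is deterministic.
    return max(scores, key=lambda t: (scores[t], t))


# Hypothetical per-frame readings of one road sign across five frames.
track = [("ST0P", 0.41), ("STOP", 0.88), ("STOP", 0.73), ("", 0.10), ("5TOP", 0.35)]
print(aggregate_track_transcription(track))  # prints STOP
```

In this sketch, blurred frames contribute low-confidence misreadings that are outvoted by the sharper frames. A real system might instead align characters across frames or fuse features before recognition; voting is used here only because it is the simplest scheme to show.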

Overall, the RoadText challenge focuses on detecting, tracking, and recognizing text instances in videos, with an emphasis on developing models that are able to handle the unique challenges presented by text in driving videos. By addressing these challenges, the competition hopes to contribute to the development of technology that can assist with a variety of tasks, including automatic translation of road signs, improved navigation for self-driving vehicles, and more.
Date made available: 24 Dec 2022
Date of data production: 24 Dec 2022
