Title: DETECTING MOVING TEXT IN VIDEO USING TEMPORAL
INFORMATION
Weihua Huang, Palaiahnakote Shivakumara and Chew Lim Tan
School of Computing, National University of Singapore
{huangwh, shiva, tancl}_at_comp.nus.edu.sg
Motivation
Detecting and extracting text from video is becoming
increasingly important as the amount of online video
grows rapidly. The extracted text can be used for
automatic indexing and summarization of video
content, which in turn supports online search and
content management. At the moment, most video text
detection techniques in the literature focus on text
in a single frame. However, temporal information is
the key difference between extracting text from
video and extracting text from a still image.
- Forming Text Boxes
- Sub-blocks with velocity = 0: leave them to still
text detection
- Sub-blocks with velocity > 0: further estimate if
a block contains text. How? Number of edge pixels
per scan line, density of edge pixels
- Group neighboring candidate blocks with same
trajectory and velocity
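
The two steps above (an edge-based text check, then grouping) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the edge detector (a thresholded horizontal intensity jump), all threshold values, and the names `edge_map`, `looks_like_text` and `group_blocks` are hypothetical, and "same trajectory and velocity" is approximated by requiring identical motion vectors.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]       # grayscale sub-block as rows of intensities
GridPos = Tuple[int, int]     # (row, col) of a sub-block in the block grid
Vector = Tuple[int, int]      # motion vector (dy, dx)


def edge_map(block: Block, thresh: int = 40) -> List[List[bool]]:
    """Mark an edge wherever the horizontal intensity jump exceeds thresh (assumed detector)."""
    return [[abs(row[x + 1] - row[x]) > thresh for x in range(len(row) - 1)]
            for row in block]


def looks_like_text(block: Block, min_per_line: int = 2,
                    min_line_ratio: float = 0.5, min_density: float = 0.15) -> bool:
    """Combine edge pixels per scan line with overall edge density (thresholds are assumptions)."""
    edges = edge_map(block)
    per_line = [sum(row) for row in edges]
    cells = sum(len(row) for row in edges)
    if cells == 0:
        return False
    busy_lines = sum(1 for n in per_line if n >= min_per_line)
    return (busy_lines / len(edges) >= min_line_ratio
            and sum(per_line) / cells >= min_density)


def group_blocks(candidates: Dict[GridPos, Vector]) -> List[Tuple[int, int, int, int]]:
    """Merge 4-connected candidate blocks that share a motion vector.

    Returns bounding boxes (top, left, bottom, right) in grid coordinates, inclusive.
    """
    seen = set()
    boxes = []
    for start in sorted(candidates):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            r, c = stack.pop()
            members.append((r, c))
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in candidates and nb not in seen and candidates[nb] == candidates[start]:
                    seen.add(nb)
                    stack.append(nb)
        rows = [m[0] for m in members]
        cols = [m[1] for m in members]
        boxes.append((min(rows), min(cols), max(rows), max(cols)))
    return boxes
```

In this sketch a block must pass both the per-scan-line test and the density test, so a single strong edge (for example an object boundary) is less likely to be accepted as text.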
- Using Temporal Information For Video Text
Detection
- Does a text block move or remain still?
- Where does a text block move? How fast does it
move?
- For how long does a text block stay on screen?
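
As a rough illustration of how these three questions could be answered once a text block has been followed across frames, the sketch below summarizes a track of observed positions. The track format `(frame_index, x, y)`, the `Motion` record and the function `summarize_track` are assumptions made for this example; image coordinates are assumed to have y growing downward.

```python
from typing import List, NamedTuple, Tuple


class Motion(NamedTuple):
    moving: bool      # does the block move or remain still?
    direction: str    # "left", "right", "up", "down" or "still"
    speed: float      # pixels per frame along the dominant axis
    duration: int     # number of frames the block stays on screen


def summarize_track(track: List[Tuple[int, int, int]]) -> Motion:
    """Summarize a non-empty, time-ordered list of (frame_index, x, y) observations."""
    f0, x0, y0 = track[0]
    f1, x1, y1 = track[-1]
    duration = f1 - f0 + 1
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        return Motion(False, "still", 0.0, duration)
    # Keep only the dominant axis, matching the four-direction simplification.
    if abs(dx) >= abs(dy):
        direction, dist = ("right" if dx > 0 else "left"), abs(dx)
    else:
        direction, dist = ("down" if dy > 0 else "up"), abs(dy)
    span = f1 - f0
    speed = dist / span if span else 0.0
    return Motion(True, direction, speed, duration)
```

For example, a credit line observed at y = 200, 198, 196 over three consecutive frames would be reported as moving up at 2 pixels per frame for 3 frames.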
- Experiment Results
- Testing dataset: 8983 frames from 20 video clips
taken from movies (including music videos) and
news clips.
- Multiple languages covered, including English,
Chinese, Korean and Thai.
- Text detection is based on motion vectors
calculated from every three consecutive frames.
Performance of the proposed method
- Capturing the Motion Information
- Divide a video frame into sub-blocks
- Calculate a motion vector for each sub-block by
searching for the best-matching sub-block in a
future frame within a certain range.
- Simplification: consider only 4 directions (left,
right, up and down)
- Acceleration: skip sub-blocks with uniform color
(unlikely to be text)
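
A minimal sketch of the steps above, assuming grayscale frames stored as lists of rows. The matching cost (sum of absolute differences), the block size, the search range, the uniformity tolerance and the function names are all assumptions for illustration; the poster does not specify them.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities


def get_block(frame: Frame, top: int, left: int, size: int) -> List[List[int]]:
    """Cut a size x size sub-block out of the frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def is_uniform(block: List[List[int]], tol: int = 8) -> bool:
    """Acceleration step: treat a block as uniform if its intensity range is tiny."""
    pixels = [p for row in block for p in row]
    return max(pixels) - min(pixels) <= tol


def sad(a: List[List[int]], b: List[List[int]]) -> int:
    """Sum of absolute differences between two equally sized blocks (assumed cost)."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def motion_vector(cur: Frame, nxt: Frame, top: int, left: int,
                  size: int = 8, max_shift: int = 4) -> Optional[Tuple[int, int]]:
    """Return the best (dy, dx) shift, or None if the block is skipped as uniform."""
    block = get_block(cur, top, left, size)
    if is_uniform(block):
        return None
    height, width = len(nxt), len(nxt[0])
    best = (0, 0)
    best_cost = sad(block, get_block(nxt, top, left, size))
    for step in range(1, max_shift + 1):
        # Simplification: only left, right, up and down are searched.
        for dy, dx in ((0, -step), (0, step), (-step, 0), (step, 0)):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue
            cost = sad(block, get_block(nxt, t, l, size))
            if cost < best_cost:  # strict: ties keep the smaller shift
                best_cost, best = cost, (dy, dx)
    return best
```

Restricting the search to four directions means only `4 * max_shift` candidate positions are compared per block instead of a full two-dimensional window, which fits text that typically scrolls horizontally or vertically.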
- Discussions
- False positives are mainly due to movements of rigid
textured objects (such as the examples above)
- Mis-detections in some frames can be recovered from
detections in other frames, which is the effect
of temporal redundancy
- A queue is used for tracking blocks entering and
exiting the screen
- Experiments show that the acceleration step skips
83.76% of the blocks, making the method
computationally efficient
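
The poster does not describe how the queue is organized, so the following is only one possible sketch of queue-based tracking with recovery of missed detections. The `Track` class, the `step` function, the detection format `(x, y, vx, vy)` and the parameters `tol` and `max_missed` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Detection = Tuple[int, int, int, int]  # (x, y, vx, vy) of a detected text box


class Track:
    def __init__(self, det: Detection, frame: int) -> None:
        self.pos = (det[0], det[1])
        self.velocity = (det[2], det[3])
        self.first_frame = frame
        self.last_seen = frame
        self.recovered: List[int] = []  # frames filled in by prediction


def step(active: Deque[Track], detections: List[Detection], frame: int,
         width: int, height: int, tol: int = 2, max_missed: int = 3) -> List[Track]:
    """Advance every queued track by one frame and return the tracks that ended."""
    finished = []
    unmatched = list(detections)
    for _ in range(len(active)):
        track = active.popleft()
        px = track.pos[0] + track.velocity[0]
        py = track.pos[1] + track.velocity[1]
        match = next((d for d in unmatched
                      if abs(d[0] - px) <= tol and abs(d[1] - py) <= tol), None)
        if match is not None:
            unmatched.remove(match)
            track.pos, track.velocity = (match[0], match[1]), (match[2], match[3])
            track.last_seen = frame
        else:
            track.pos = (px, py)  # temporal redundancy: predict from earlier frames
        on_screen = 0 <= track.pos[0] < width and 0 <= track.pos[1] < height
        if not on_screen or frame - track.last_seen > max_missed:
            finished.append(track)        # block exits the screen or is lost
            continue
        if match is None:
            track.recovered.append(frame)
        active.append(track)
    for det in unmatched:                 # blocks entering the screen
        active.append(Track(det, frame))
    return finished
```

A track that misses a detection in one frame keeps moving with its last known velocity, so a later detection near the predicted position re-attaches to the same track instead of starting a new one.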