1Detection and Extraction of Artificial Text from
Videos
Christian Wolf and Jean-Michel Jolion
10th July 2001
Project: France Télécom Research & Development, 001B575
Laboratoire de Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA, 69621 Villeurbanne CEDEX
http://rfv.insa-lyon.fr/wolf,jolion
2Plan of the presentation
- Introduction (6 slides)
- Detection (8 slides)
- Image enhancement - multiple frame integration (3 slides)
- Binarisation of the text boxes (10 slides)
- Setup of the experiments (11 slides)
- Results (6 slides)
  - Detection
  - Binarisation
  - OCR
- Conclusion and outlook (2 slides)
Total: 46 slides
3Content based image retrieval
Result
Example image
Similarity Function
Indexing phase
4Similarity measures
similar
similar
Not similar
5Indexing using Text
Result
Key word
Keyword based Search
Patrick Mayhew
Indexing phase
Patrick Mayhew Min. chargé de lirlande de
Nord ISRAEL Jerusalem montage T.Nouel ... ... ...
... ...
6Video properties
7Text extraction general scheme
Image enhancement - Multiple frame integration
Detection of the text in single frames
Tracking
Video
"EVENEMENT" "ACTU" "SPELEOS" "Gouffre Berger
(Isére)" "aujourd'hui" "France 3 Alpes" "un
spéléologue sauveteur"
Segmentation/Binarisation
OCR
8Detection in single frames
Video
Connected components Analysis
Verification of geometric constraints
Calculation of the gradient
Accumulation
Verification of special cases
Binarisation
Combination of the rectangles
Mathematical Morphology
List of rectangles
9Detection in single frames examples
10A filter for text detection
Accumulation of horizontal gradients.
Justification: text forms a regular texture containing vertical edges which are aligned horizontally.
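A minimal sketch of such a filter, assuming a gray-level image stored as a list of rows. The central-difference gradient, the function name `accumulated_horizontal_gradient` and the window half-width of 6 pixels are illustrative assumptions, not the parameters used in the project.

```python
def accumulated_horizontal_gradient(image, half_width=6):
    """Return a response map that is high where vertical edges cluster horizontally.

    image: list of rows, each a list of gray values.
    half_width: assumed half-size of the horizontal accumulation window.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # Absolute horizontal derivative (central difference); border columns stay 0.
    grad = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(1, width - 1):
            grad[y][x] = abs(image[y][x + 1] - image[y][x - 1]) / 2.0

    # Accumulate the gradient magnitudes in a horizontal window around each pixel.
    acc = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo = max(0, x - half_width)
            hi = min(width, x + half_width + 1)
            acc[y][x] = sum(grad[y][lo:hi])
    return acc
```

A flat region gives a zero response, while a run of closely spaced vertical strokes gives a large one, which is what makes the map usable for a later binarisation step.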
11Mathematical morphology
12Detection in video sequences
Detection per single frame
List of rectangles per frame
13Integration of the rectangles → occurrences
At every new frame, the detected rectangles must
be matched with the stored text occurrences
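The matching step can be sketched as follows. The slides do not state the matching criterion, so the overlap measure (intersection area divided by the smaller rectangle's area), the threshold `min_overlap=0.5` and the names `Rect`, `Occurrence` and `integrate_frame` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Rect:
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self):
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)


@dataclass
class Occurrence:
    # List of (frame_index, Rect) pairs belonging to one text appearance.
    rects: list = field(default_factory=list)


def overlap_ratio(a, b):
    """Intersection area relative to the smaller of the two rectangles."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    smaller = min(a.area(), b.area())
    if w <= 0 or h <= 0 or smaller == 0:
        return 0.0
    return (w * h) / smaller


def integrate_frame(occurrences, frame_index, detections, min_overlap=0.5):
    """Attach each detected rectangle to the best-overlapping stored occurrence,
    or start a new occurrence when nothing overlaps enough."""
    # Snapshot the last rectangle of every stored occurrence before this frame.
    existing = [(occ, occ.rects[-1][1]) for occ in occurrences]
    for rect in detections:
        best, best_ratio = None, min_overlap
        for occ, last_rect in existing:
            ratio = overlap_ratio(rect, last_rect)
            if ratio >= best_ratio:
                best, best_ratio = occ, ratio
        if best is None:
            best = Occurrence()
            occurrences.append(best)
        best.rects.append((frame_index, rect))
    return occurrences
```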
14Suppression of false alarms Examples
All detections
After suppression of false alarms
15Image enhancement
16Interpolation Examples
Bi-linear interpolation
Robust bi-linear interpolation
Robust bi-cubic interpolation
17Interpolation thresholded examples
Bi-linear interpolation
Robust bi-linear interpolation
Robust bi-cubic interpolation
18Binarisation
- Different binarisation algorithms have been implemented and evaluated:
  - Fisher/Otsu and windowed Fisher/Otsu algorithm
  - Yanowitz-Bruckstein
  - Niblack, Sauvola
  - Our adaptive version of Niblack/Sauvola's method.
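As an illustration of the first entry in the list, here is a minimal sketch of a global Fisher/Otsu threshold, assuming 8-bit gray values given as a flat list. The function name `otsu_threshold` is hypothetical; a windowed variant would apply the same computation to the pixels of each window.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximises the between-class variance
    of the two classes {p <= t} and {p > t}."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0      # number of pixels with value <= t
    sum_low = 0         # sum of their gray values
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue
        weight_high = total - weight_low
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor 1 / total**2.
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```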
19Binarisation methods
Yanowitz-Bruckstein: The threshold surface is calculated from the edge information.
[Figure: threshold surface]
Windowed Fisher, Niblack, Sauvola: The threshold surface is calculated from the statistics collected in a window which is shifted across the image.
[Figure: threshold surface]
20Binarisation by Niblack
Niblack proposed a method which calculates a threshold surface by sliding a rectangular window over the image and calculating statistics on this window:
$T = m + k \cdot s$
m: mean, s: standard deviation, k: parameter (-0.2)
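A minimal sketch of this scheme, using the commonly cited form of Niblack's rule, T = m + k·s, with k = -0.2. The window half-size of 7 pixels, the function names and the convention that dark pixels are text (output 0) on a lighter background (output 255) are assumptions for illustration.

```python
import math


def window_stats(image, x, y, half):
    """Mean and standard deviation of the window centred on (x, y), clipped at the borders."""
    height, width = len(image), len(image[0])
    values = [image[j][i]
              for j in range(max(0, y - half), min(height, y + half + 1))
              for i in range(max(0, x - half), min(width, x + half + 1))]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def niblack_binarise(image, half=7, k=-0.2):
    """Binarise with the local threshold T = m + k * s."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            m, s = window_stats(image, x, y, half)
            threshold = m + k * s
            # Pixels at or above the local threshold are treated as background.
            out[y][x] = 255 if image[y][x] >= threshold else 0
    return out
```

Because the threshold always lies close to the local mean, any small variation in a window without text is split into two classes as well.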
21Binarisation by Niblack: Problems
Light textures in the background are a problem: they are classified as low-contrast text.
22Binarisation: Improvement by Sauvola
To overcome these problems, Sauvola et al. proposed an improved formula to calculate the threshold:
$T = m \cdot \left(1 + k \cdot \left(\frac{s}{R} - 1\right)\right)$
m: mean, s: standard deviation, k: parameter (0.5), R: parameter (dynamic range of the standard deviation), R = 128
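The effect can be checked numerically. The sketch below uses the commonly cited forms of both rules, T = m + k·s for Niblack and T = m·(1 + k·(s/R − 1)) for Sauvola et al., with the parameter values given on the slides; the function names and the example window statistics are made up for illustration.

```python
def niblack_threshold(mean, std, k=-0.2):
    """Niblack: T = m + k * s."""
    return mean + k * std


def sauvola_threshold(mean, std, k=0.5, dynamic_range=128.0):
    """Sauvola et al.: T = m * (1 + k * (s / R - 1))."""
    return mean * (1.0 + k * (std / dynamic_range - 1.0))


# A light, slightly textured background window (hypothetical values).
mean, std = 200.0, 4.0
pixel = 198

# Niblack's threshold sits just below the mean, so the pixel counts as text.
print(pixel < niblack_threshold(mean, std))   # True
# Sauvola's threshold drops far below the mean when the contrast is low.
print(pixel < sauvola_threshold(mean, std))   # False
```

With a low local standard deviation the term s/R is small, so the threshold falls to roughly half the mean and faint background texture is no longer taken for text.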
23Binarisation by Sauvola, examples
Original image
24Improvement: Adaptive dynamic range
Fixing the dynamic range at R = 128 might be acceptable for document images, but not for text boxes taken from videos: binarisation will not be correct if the contrast of the image is smaller. We therefore set the parameter R to the maximum of the standard deviations calculated over all windows.
To avoid two passes of the windowing algorithm, the mean and standard deviation can be stored in a table during the first pass and the threshold surface calculated on this data.
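The table-based scheme can be sketched as below: one windowing pass fills tables of means and standard deviations, R is taken as the largest stored standard deviation, and the threshold surface is then computed from the tables alone. The threshold uses the commonly cited Sauvola form; the function names, the window half-size and the dark-text-on-light-background convention are assumptions, and this is not the project's implementation.

```python
import math


def local_stats_table(image, half=7):
    """Single windowing pass: store mean and standard deviation for every pixel."""
    height, width = len(image), len(image[0])
    means = [[0.0] * width for _ in range(height)]
    stds = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            mean = sum(values) / len(values)
            means[y][x] = mean
            stds[y][x] = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return means, stds


def binarise_sauvola(image, half=7, k=0.5, dynamic_range=None):
    """Sauvola-style binarisation; dynamic_range=None selects the adaptive R."""
    means, stds = local_stats_table(image, half)
    if dynamic_range is None:
        # Adaptive variant: R is the maximum standard deviation over all windows.
        dynamic_range = max(max(row) for row in stds)
    height, width = len(image), len(image[0])
    out = [[255] * width for _ in range(height)]
    if dynamic_range == 0:
        return out  # perfectly flat image: nothing to segment
    for y in range(height):
        for x in range(width):
            threshold = means[y][x] * (1.0 + k * (stds[y][x] / dynamic_range - 1.0))
            if image[y][x] <= threshold:
                out[y][x] = 0  # text pixel
    return out
```

On a low-contrast box (for example a stroke of gray value 90 on a background of 120), the fixed R = 128 pushes the threshold below the stroke and the text is lost, while the adaptive R keeps it.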
25Improvement: Shift of the image range
The strong hypothesis on the gray values (text pixels must be near zero) is not justified for some video text boxes.
Gray value histogram
26Improvement: Shift of the image range
A correction of the image's histogram resolves this problem.
Original image
27Fast incremental calculation
Mean and variance can be calculated in one pass
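One way to do this is to keep the running sum and the running sum of squares, since the variance equals the mean of the squares minus the square of the mean. The sketch below applies the idea to a window sliding along a single row, updating both sums incrementally instead of recomputing them; the 1-D setting and the function name are simplifying assumptions.

```python
def sliding_mean_variance(row, window):
    """Mean and variance of every window position along a row,
    updated incrementally from the sum and the sum of squares."""
    if window <= 0 or window > len(row):
        raise ValueError("window must be between 1 and len(row)")

    total = float(sum(row[:window]))
    total_sq = float(sum(v * v for v in row[:window]))
    results = []
    for start in range(len(row) - window + 1):
        if start > 0:
            leaving = row[start - 1]
            entering = row[start + window - 1]
            # Update both sums with the pixel that leaves and the one that enters.
            total += entering - leaving
            total_sq += entering * entering - leaving * leaving
        mean = total / window
        # variance = E[x^2] - (E[x])^2, clamped against rounding below zero.
        variance = max(0.0, total_sq / window - mean * mean)
        results.append((mean, variance))
    return results
```

The same bookkeeping extends to a 2-D window by adding the entering column and subtracting the leaving column at each shift.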
28The experiments
- Description of the experiments
  - The videos used in the experiments.
  - Description of the evaluation process (OCR evaluation).
- Results for
  - Text detection
  - Binarisation
  - OCR
29Test videos
We performed experiments on five different MPEG-1 videos with a resolution of 384x288 pixels.
30AIM2 Commercials
AIM3 News
AIM4 Cartoon, News
AIM5 News
31Video example - France Télécom
22 minutes of video, 33000 frames
32The interface to the OCR software
Ideal situation: pass individual (binarised) text boxes to OCR software which recognises the contents box after box.
In reality: we used standard commercial OCR software for our tests. This software was designed to recognise scanned A4 or US letter pages and cannot directly process text boxes.
33OCR Page - Manual
An input image, ready for the OCR
34OCR Output
051Q07Ô7 NVerf 05JQ0707 PUBLICITE IPUBIIÏITE
IPUBLICITE prenez prenez prenez boyard
boyard boyard française française
française FRANCE FRANCE FRANCE
FRANCE FRANCE c'est plus musclé c'est plus
musclé iï 'J fort fort fort fort fort
.fort .fort .fort cotHfUet blé
cotHfUet blé cQtfUet blé uutàfruuk On va
beaucoup loin avec Itineris. Partout Partout
Partout Partout Partout Partout Partout Partout
Partout Partout I22h35 I22h35 I22h35 I22h35
I22h35 PUBLICITE \PUBLICITE \PUBLICITE gt3h55l23h55
l23h55l23h55l23h55l23h55 20h.50120h50
20h50120h50 20h50120h50 ,f ort boyard ,f ort
boyard ,f ort boyard ,f ort boyard 2,4 Kg J 2,4
Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J
2,4 Kg g 2,4 Kg J II II II II II II
II II II gà dents gà dents gà
dents IIH r Lessive classique lljir Lessive
classique IHT Lessive classique le temps le
temps le temps le temps le temps PUBLICITE
PUBLICITE PUBLICITE I Par Amour du Goût. Il Par
Amour du Goût. I en en en en en en en en
en révolution révolution révolution
35Post processing of OCR output
Post processed OCR output
Ground truth
dimanche 23h55 N Vert 05100707 Berlingo PUBLICITE
prenez diffusion simultanée en stéréo
sur boyard française FRANCE c'est plus
musclé PUBLICITE fort Coral blé complet fruits On
va beaucoup Plus loin avec Itineris. Bohême Partou
t 22h35 PUBLICITE 23h55 20h50 fort fort boyard
23h55 051Q07Ô7 PUBLICITE prenez boyard françai
se FRANCE c'est plus musclé fort blé
cotHfUet uutàfruuk On va beaucoup loin avec
Itineris. Partout I22h35 PUBLICITE
\ gt3h55l 20h.50 ,f ort boyard
36Automatic evaluation using markers
The manual processing of the OCR output (separating the output strings and finding the corresponding input box) is time-consuming and error-prone, especially when the quality of the OCR output is very poor.
Automatic processing of the OCR output can be achieved by placing marker images between the text boxes. The marker boxes contain text which is easily recognised by the OCR software.
In the results section we present results for both types of evaluation.
37An input image with markers, ready for the OCR
38OCR Evaluation
OCR output:
Tkenchar 037 Tkenchar 037 'gfrançaise 'gfrançaise 'gfrançaise 'gfrançaise Tkenchar 038 Tkenchar 038 Mpe pire de fje pire de fje pire de Tkenchar 039 Tkenchar 039
Raw ground truth:
@S Par Amour du Goût. @S en @S révolution @S la @S française @S le pire de @S 20H45
39OCR Evaluation: Wagner-Fischer
A measure of the resemblance of two character strings: the cost of transforming string A into string B is calculated. Basic transformation operations are used, each of which is assigned a cost. The cost function is minimised.
- Substitution cost
- Insertion cost
- Deletion cost
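A minimal sketch of the Wagner-Fischer dynamic programme. The cost values are not given on the slide, so unit costs for all three operations are assumed here, and the function name `edit_cost` is hypothetical.

```python
def edit_cost(source, target, sub_cost=1, ins_cost=1, del_cost=1):
    """Minimum total cost of transforming `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # cost[i][j]: cheapest way to turn source[:i] into target[:j].
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * del_cost
    for j in range(1, cols):
        cost[0][j] = j * ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitute
                cost[i][j - 1] + ins_cost,                       # insert target[j-1]
                cost[i - 1][j] + del_cost,                       # delete source[i-1]
            )
    return cost[-1][-1]


# OCR string against ground truth, taken from the examples above.
print(edit_cost("I22h35", "22h35"))  # 1 (one deletion)
```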
40Detection results - INA Videos
No suppression of false alarms
41Binarisation methods: Examples
Panels: Original image, Fisher, Fisher (windowed), Yanowitz B., Yanowitz B. PP, Niblack, Sauvola et al., Our method
42Binarisation methods: Examples
Panels: Original image, Fisher, Fisher (windowed), Yanowitz B., Yanowitz B. PP, Niblack, Sauvola et al., Our method
43OCR Results - Classification by binarisation
method
Robust bi-cubic interpolation
Results obtained using the manual evaluation
method (no markers in the input page).
44 pages
44OCR Results Interpolation methods
Results obtained using the automatic evaluation
method (including markers in the input page).
Robust bi-cubic interpolation
97 pages
45Conclusion
- We developed a system for detection, tracking, enhancement and binarisation of text.
- A detection performance of 93.5% is obtained.
- We derived a new binarisation method adapted to the type of text found in videos.
- The total recognition rate is surprisingly high, given the quality of the text, but not yet good enough for indexing purposes.
- OCR integration problem: no software development kits for direct access to the recognition functions are available. A collaboration with an OCR company seems to be inevitable.
46Outlook
The perspectives of our work lie in the extension of the existing algorithms to text with more difficult properties, and in the enhancement and deeper study of the existing techniques.
Scene text: The binarisation techniques developed in the last 30 years are aimed either at document images or at images from computer vision. The method we introduced in the framework of this project is an improvement on the work already presented, but the quality of the text is not yet satisfactory. In particular, the binarisation of scene text will demand the development of new methods.
Detection recall: We are convinced that the recall of the detection system can still be increased by further research, e.g. on the binarisation technique applied to the map of accumulated gradients.