Title: Video Shot Boundary Detection at RMIT University
1 Video Shot Boundary Detection at RMIT University
Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams
School of Computer Science & IT, RMIT University
tvolkmer, saied, hugh_at_cs.rmit.edu.au
2 Overview
- Our general approach
- The moving query window
- Details of the approach
- How we measure frame similarity
- Improvements for 2004 cut detection
- Detection of gradual transitions
- Evaluation
- Experimental results
- Conclusions
3 The Moving Query Window
- A moving query window consists of two equal-sized
half windows, surrounding a current frame
- The moving query window is advanced through the
video frame-by-frame
- Cut detection and gradual transition detection are
performed with separate decision stages during a
single pass
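A minimal sketch of how such a window could be advanced over a frame sequence. The function name, the `half_window` parameter, and the choice to skip positions near the start and end of the video are assumptions for illustration, not details taken from the slides.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def moving_query_windows(
    frames: Sequence[Frame], half_window: int
) -> Iterator[Tuple[int, List[Frame], Frame, List[Frame]]]:
    """Yield (index, pre_frames, current_frame, post_frames) per position.

    Assumption: only positions with a full half window on both sides
    are visited; the slides do not say how video boundaries are handled.
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    for i in range(half_window, len(frames) - half_window):
        pre_frames = list(frames[i - half_window:i])
        post_frames = list(frames[i + 1:i + 1 + half_window])
        yield i, pre_frames, frames[i], post_frames


if __name__ == "__main__":
    toy_video = list("abcdefg")  # stand-in for decoded frames
    for index, pre, current, post in moving_query_windows(toy_video, 2):
        print(index, pre, current, post)
```

Both decision stages can consume the same tuples, which is one way to keep detection to a single pass over the video.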
4 Frame feature representation
- We use one-dimensional, localised histograms with
4x4 regions in the HSV colour space (16 bins per
colour component)
- A colour histogram represents each frame region.
Corresponding regions are compared
- Different weights can be applied to each region
during comparison
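The representation above might be sketched as follows, using only the Python standard library. The image layout (rows of RGB tuples in [0, 1]), the histogram normalisation, and the L1 (Manhattan) distance between corresponding regions are assumptions; the slides do not name the distance measure.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]   # RGB, each component in [0, 1]
Image = Sequence[Sequence[Pixel]]    # rows of pixels
GRID = 4                             # 4x4 regions, as on the slide
BINS = 16                            # bins per colour component


def region_histograms(image: Image) -> List[List[float]]:
    """Return 16 region descriptors, each holding 3 x 16 normalised bins."""
    height, width = len(image), len(image[0])
    hists = [[0.0] * (3 * BINS) for _ in range(GRID * GRID)]
    counts = [0] * (GRID * GRID)
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            region = (y * GRID // height) * GRID + (x * GRID // width)
            # One one-dimensional histogram per H, S and V component.
            for channel, value in enumerate(colorsys.rgb_to_hsv(r, g, b)):
                bin_index = min(int(value * BINS), BINS - 1)
                hists[region][channel * BINS + bin_index] += 1.0
            counts[region] += 1
    return [[v / max(n, 1) for v in h] for h, n in zip(hists, counts)]


def frame_distance(
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-region L1 histogram distances (assumed measure)."""
    return sum(
        w * sum(abs(p - q) for p, q in zip(region_a, region_b))
        for w, region_a, region_b in zip(weights, a, b)
    )


# Example weightings: all regions equal, or the four central regions ignored.
UNIFORM = [1.0] * (GRID * GRID)
CENTRE_MASKED = [0.0 if i in (5, 6, 9, 10) else 1.0 for i in range(GRID * GRID)]
```

With `CENTRE_MASKED`, two frames that differ only in the four central regions have a distance of zero.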
5 Cut detection
- We disregard the four central regions of each
frame to avoid the effect of rapid activity (that
is, their weight is 0)
- Using the remaining regions, each frame in the
moving window is ranked by decreasing similarity
to the current frame
- Frame similarity is the sum of the inter-region
similarities
- The number of pre-frames that are ranked in the
top half of the rankings is monitored
- When a cut is passed, the number of top-ranked
pre-frames (usually) rises to a maximum and falls
to a minimum within a few frames
- We have determined an optimum window size and
optimum thresholds that are effective for all our
training sets
- Our cut detection is (now) parameter free
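The ranking step above can be sketched as below. The `distance` callable, the `upper`, `lower` and `max_gap` parameters, and the rule for turning the count sequence into cut positions are hypothetical; the slides state that fixed thresholds exist but do not give their values or the exact decision rule.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def pre_frames_in_top_half(
    pre_frames: Sequence[Frame],
    post_frames: Sequence[Frame],
    current: Frame,
    distance: Callable[[Frame, Frame], float],
) -> int:
    """Count pre-frames among the half of the window most similar to current."""
    labelled = [(distance(f, current), True) for f in pre_frames]
    labelled += [(distance(f, current), False) for f in post_frames]
    labelled.sort(key=lambda item: item[0])  # smallest distance first
    top_half = labelled[: len(labelled) // 2]
    return sum(1 for _, is_pre in top_half if is_pre)


def detect_cuts(
    counts: Sequence[int], upper: int, lower: int, max_gap: int
) -> List[int]:
    """Report positions where the count falls to `lower` or below within
    `max_gap` positions of having reached `upper` or above (assumed rule)."""
    cuts: List[int] = []
    last_peak = None
    for i, n in enumerate(counts):
        if n >= upper:
            last_peak = i
        elif n <= lower and last_peak is not None and i - last_peak <= max_gap:
            cuts.append(i)
            last_peak = None
    return cuts
```

Just before a cut, the post-frames already belong to the new shot, so the pre-frames fill the top half; just after it, the roles swap and the count drops.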
6 Gradual transition detection
- Pre-frames and post-frames are combined into two
distinct sets of frames. The average distance of
each set to the current frame is computed
- We use all frame regions (with identical weights)
- The ratio between the pre-frame set distance and
the post-frame set distance, the PrePostRatio, is
monitored
- The end of most gradual transitions is indicated
by a peak in the PrePostRatio curve
- We maintain a moving average PrePostRatio for
calculating a dynamic threshold to detect
transitions
- As a final decision step, we require a minimum
difference between the last frame of the previous
shot and the first frame of the new shot
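A rough sketch of the ratio and a dynamic threshold follows. The form of the threshold (a level factor times the moving average over a history buffer), the local-maximum test, and the small `eps` guard against division by zero are assumptions; the slides name the ingredients but not the exact formula.

```python
from collections import deque
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def pre_post_ratio(
    pre_frames: Sequence[Frame],
    post_frames: Sequence[Frame],
    current: Frame,
    distance: Callable[[Frame, Frame], float],
    eps: float = 1e-9,
) -> float:
    """Average pre-frame distance divided by average post-frame distance."""
    pre_dist = mean(distance(f, current) for f in pre_frames)
    post_dist = mean(distance(f, current) for f in post_frames)
    return pre_dist / (post_dist + eps)  # eps is an added safeguard


def detect_peaks(ratios: Sequence[float], history: int, level: float) -> List[int]:
    """Flag local maxima exceeding level * moving average of recent ratios."""
    if history < 1:
        raise ValueError("history must be at least 1")
    buffer: deque = deque(maxlen=history)
    peaks: List[int] = []
    for i, ratio in enumerate(ratios):
        if len(buffer) == history:
            threshold = level * mean(buffer)
            is_local_max = ratios[i - 1] < ratio and (
                i + 1 == len(ratios) or ratios[i + 1] <= ratio
            )
            if ratio > threshold and is_local_max:
                peaks.append(i)
        buffer.append(ratio)
    return peaks


def confirm_transition(
    last_of_previous: Frame,
    first_of_new: Frame,
    distance: Callable[[Frame, Frame], float],
    min_difference: float,
) -> bool:
    """Final decision step: the two shots must differ by a minimum amount."""
    return distance(last_of_previous, first_of_new) >= min_difference
```

A shorter `history` makes the threshold react faster to local changes, and a larger `level` trades recall for precision.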
7 PrePostRatio in detail
- A schematised dissolve between a shot A and a
shot B
- The PrePostRatio is usually minimal at the
beginning of a gradual transition and rises up to
a maximum at the end of the transition
8 PrePostRatio curve example
- The curve shows two short gradual transitions and
two cuts within a range of 1000 frames
9 Training and Evaluation
- We have trained on the TRECVID 2003 shot boundary
test set
- Main parameters for gradual transition detection
are:
  - The query window size
  - The size of the history buffer for dynamic
  thresholding
  - A threshold level factor
- Results are discussed on the next slides. (We
achieve similar or better results on the 2002
and 2001 test sets in blind runs.)
10 Results at TRECVID 2004
          All                Cuts               Gradual Transitions
SysID     Recall  Precision  Recall  Precision  Recall  Precision
rmit1     0.915   0.829      0.944   0.922      0.852   0.671
rmit2     0.901   0.850      0.944   0.921      0.810   0.714
rmit3     0.907   0.859      0.944   0.921      0.828   0.738
rmit4     0.893   0.870      0.944   0.921      0.783   0.762
rmit5     0.897   0.877      0.944   0.921      0.798   0.782
rmit6     0.883   0.885      0.944   0.921      0.753   0.802
rmit7     0.889   0.890      0.944   0.921      0.772   0.819
rmit8     0.871   0.893      0.944   0.921      0.715   0.824
rmit9     0.881   0.899      0.944   0.921      0.746   0.844
rmit10    0.860   0.900      0.944   0.921      0.681   0.844
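For reference, recall and precision follow the standard definitions sketched below. The function name and the example counts are illustrative only; they are not TRECVID data, and the rules for matching detected transitions to reference transitions are not covered here.

```python
from typing import Tuple


def recall_precision(
    true_positives: int, false_negatives: int, false_positives: int
) -> Tuple[float, float]:
    """Standard definitions: recall = TP / (TP + FN), precision = TP / (TP + FP)."""
    recall = true_positives / (true_positives + false_negatives)
    precision = true_positives / (true_positives + false_positives)
    return recall, precision


if __name__ == "__main__":
    # Illustrative counts only, not taken from the evaluation.
    print(recall_precision(true_positives=90, false_negatives=10, false_positives=30))
```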
11 Overall results
12 Frame recall and precision for gradual transitions
13 Discussion
- Cut detection is highly effective
- This year, recall is 94% and precision is 92%.
Improvements from 2003 are due to ignoring the
centre region
- Gradual transition detection has improved
significantly since 2003
- Recall is now between 68% and 85%, precision
between 67% and 84%
- A high detection threshold favours precision, a
low one favours recall
- A short detection threshold history length was
found to be preferable
- The final decision step reduces false positives
- For television news, we are able to use a fixed
moving query window size of 24 frames
- We experimented with a simple ASR technique in 10
additional runs, which removed detected
transitions that coincided with spoken words. This
was ad hoc and very unsuccessful
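The ASR-based filter might have looked roughly like the sketch below. The interval representation (inclusive start and end frame numbers for both transitions and recognised words) and the function name are hypothetical; the slides only say that transitions coinciding with spoken words were removed.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start_frame, end_frame), inclusive


def drop_transitions_during_speech(
    transitions: Sequence[Interval], words: Sequence[Interval]
) -> List[Interval]:
    """Keep only the transitions that overlap no recognised word interval."""

    def overlaps(a: Interval, b: Interval) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    return [t for t in transitions if not any(overlaps(t, w) for w in words)]
```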
14 Conclusions
- Disregarding the focus area of frames for cut
detection has improved our results by 3% in
recall and 9% in precision
- Our parameter-free ranking scheme is highly
effective in cut detection on a wide variety of
footage
- Our gradual transition detection method is
relatively simple and needs only a few parameters
- The additional, final preprocessing step reduces
false positives and improves results
significantly
- The use of localised histograms and more dynamic
thresholding also improved results in gradual
transition detection
- Our approach is computationally inexpensive,
simple to implement, and effective
- 15,500 seconds to process the video (around 4
hours, 18 minutes)
15 Questions?