Video Shot Boundary Detection at RMIT University - PowerPoint PPT Presentation


1
  • Video Shot Boundary Detection at RMIT University

Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams
School of Computer Science and IT, RMIT University
{tvolkmer, saied, hugh}@cs.rmit.edu.au
2
Overview
  • Our general approach
  • The moving query window
  • Details of the approach
  • How we measure frame similarity
  • Improvements for 2004 cut detection
  • Detection of gradual transitions
  • Evaluation
  • Experimental results
  • Conclusions

3
The Moving Query Window
  • A moving query window consists of two equal-sized
    half windows, surrounding a current frame
  • The moving query window is advanced through the
    video frame-by-frame
  • Cut detection and gradual transition detection are
    performed with separate decision stages during a
    single pass
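
A minimal sketch of how such a window could be advanced through a video, assuming the frames are already available as an in-memory sequence. The function name and the half_window parameter are illustrative and do not come from the slides.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def moving_query_window(
    frames: Sequence[T], half_window: int
) -> Iterator[Tuple[int, Sequence[T], T, Sequence[T]]]:
    """Yield (index, pre_frames, current_frame, post_frames).

    Only frames with a complete half window on both sides are visited,
    which is one possible way to handle the start and end of the video.
    """
    for i in range(half_window, len(frames) - half_window):
        pre_frames = frames[i - half_window:i]
        post_frames = frames[i + 1:i + 1 + half_window]
        yield i, pre_frames, frames[i], post_frames
```

Both decision stages can consume the same (pre_frames, current_frame, post_frames) triple, which is what allows a single pass over the video.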

4
Frame feature representation
  • We use one-dimensional, localised histograms with
    4x4 regions in the HSV colour space (16 bins per
    colour component)
  • A colour histogram represents each frame region.
    Corresponding regions are compared
  • Different weights can be applied to each region
    during comparison
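
The feature extraction described above might look roughly like the following sketch. It assumes each frame is a grid of (h, s, v) pixels scaled to [0, 1], and it uses a Manhattan (L1) distance between histograms. The slides do not name the distance measure, so that choice and all function names are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (h, s, v), each assumed to lie in [0, 1]


def region_histograms(
    frame: Sequence[Sequence[Pixel]], grid: int = 4, bins: int = 16
) -> List[List[float]]:
    """Return one histogram per region, with `bins` bins per colour component."""
    height, width = len(frame), len(frame[0])
    hists = [[0.0] * (3 * bins) for _ in range(grid * grid)]
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            region = (y * grid // height) * grid + (x * grid // width)
            for channel, value in enumerate(pixel):
                bin_index = min(int(value * bins), bins - 1)
                hists[region][channel * bins + bin_index] += 1.0
    # Normalise each colour component so that frame size does not matter.
    for hist in hists:
        for channel in range(3):
            start = channel * bins
            total = sum(hist[start:start + bins])
            if total:
                for k in range(start, start + bins):
                    hist[k] /= total
    return hists


def frame_distance(
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of L1 distances between corresponding regions."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(
        w * sum(abs(p - q) for p, q in zip(ha, hb))
        for w, ha, hb in zip(weights, a, b)
    )
```

Passing a weight of 0 for a region removes it from the comparison, and identical weights treat all regions equally.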

5
Cut detection
  • We disregard the four central regions of each
    frame to avoid the effect of rapid activity (that
    is, their weight is 0)
  • Using the remaining regions, each frame in the
    moving window is ranked by decreasing similarity
    to the current frame
  • Frame similarity is the sum of the inter-region
    similarities
  • The number of pre-frames that are ranked in the
    top half of the rankings is monitored
  • When a cut is passed, the number of top ranked
    pre-frames (usually) rises to a maximum and falls
    to a minimum within a few frames
  • We have determined an optimum window size and
    optimum thresholds that are effective for all our
    training sets
  • Our cut detection is (now) parameter free
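
The ranking step can be sketched as follows, assuming the distance from each window frame to the current frame has already been computed (a smaller distance means more similar). The upper, lower, and max_gap arguments are hypothetical placeholders: the slides say that optimum thresholds were determined but do not state their values.

```python
from typing import List, Sequence


def count_top_ranked_pre_frames(
    pre_distances: Sequence[float], post_distances: Sequence[float]
) -> int:
    """Count the pre-frames that land in the top half of the similarity ranking."""
    half = len(pre_distances)
    labelled = [(d, True) for d in pre_distances]
    labelled += [(d, False) for d in post_distances]
    ranked = sorted(labelled, key=lambda item: item[0])  # most similar first
    return sum(1 for _, is_pre in ranked[:half] if is_pre)


def find_cut_candidates(
    counts: Sequence[int], upper: int, lower: int, max_gap: int
) -> List[int]:
    """Report positions where the count falls to `lower` or below within
    `max_gap` frames of having reached `upper` or above."""
    candidates: List[int] = []
    for i, value in enumerate(counts):
        if value < upper:
            continue
        for offset, later in enumerate(counts[i + 1:i + 1 + max_gap], start=1):
            if later <= lower:
                if not candidates or candidates[-1] != i + offset:
                    candidates.append(i + offset)
                break
    return candidates
```

Just before a cut, the pre-frames belong to the same shot as the current frame and dominate the top half of the ranking. Just after it, the post-frames do, which produces the maximum-then-minimum pattern.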

6
Gradual transition detection
  • Pre-frames and post-frames are combined into two
    distinct sets of frames. The average distance of
    each set to the current frame is computed
  • We use all frame regions (with identical weights)
  • The ratio between the pre-frame set distance and
    the post-frame set distance, the PrePostRatio, is
    monitored
  • The end of most gradual transitions is indicated
    by a peak in the PrePostRatio curve
  • We maintain a moving average PrePostRatio for
    calculating a dynamic threshold to detect
    transitions
  • As a final decision step, we require a minimum
    difference between the last frame of the previous
    shot and the first frame of the new shot
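
One possible reading of the PrePostRatio and its dynamic threshold is sketched below. The class name, the eps guard against division by zero, and the exact thresholding rule (ratio compared against a multiple of the moving average) are assumptions. The slides name a history buffer size and a threshold level factor as parameters but do not give the formula.

```python
from collections import deque
from statistics import mean
from typing import Deque, Sequence


def pre_post_ratio(
    pre_distances: Sequence[float],
    post_distances: Sequence[float],
    eps: float = 1e-9,
) -> float:
    """Average pre-frame distance divided by average post-frame distance."""
    return mean(pre_distances) / (mean(post_distances) + eps)


class DynamicThreshold:
    """Flags ratios that exceed level_factor times the recent moving average."""

    def __init__(self, history_size: int, level_factor: float) -> None:
        self.history: Deque[float] = deque(maxlen=history_size)
        self.level_factor = level_factor

    def exceeds(self, ratio: float) -> bool:
        # Compare against the average of earlier ratios, then record this one.
        is_peak = bool(self.history) and (
            ratio > self.level_factor * mean(self.history)
        )
        self.history.append(ratio)
        return is_peak
```

A candidate flagged this way would still have to pass the final decision step, that is, a minimum difference between the last frame of the previous shot and the first frame of the new shot.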

7
PrePostRatio in detail
  • A schematised dissolve between a shot A and a
    shot B
  • The PrePostRatio is usually minimal at the
    beginning of a gradual transition and rises up to
    a maximum at the end of the transition

8
PrePostRatio curve example
  • The curve shows two short gradual transitions and
    two cuts within a range of 1000 frames

9
Training and Evaluation
  • We have trained on the TRECVID 2003 shot boundary
    test set
  • Main parameters for gradual transition detection
    are the query window size, the size of the history
    buffer for dynamic thresholding, and a threshold
    level factor
  • Results are discussed on the next slides. (We
    achieve similar or better results on the 2002
    and 2001 test sets in blind runs.)

10
Results at TRECVID 2004
         All                 Cuts                Gradual Transitions
SysID    Recall  Precision   Recall  Precision   Recall  Precision
rmit1    0.915   0.829       0.944   0.922       0.852   0.671
rmit2    0.901   0.850       0.944   0.921       0.810   0.714
rmit3    0.907   0.859       0.944   0.921       0.828   0.738
rmit4    0.893   0.870       0.944   0.921       0.783   0.762
rmit5    0.897   0.877       0.944   0.921       0.798   0.782
rmit6    0.883   0.885       0.944   0.921       0.753   0.802
rmit7    0.889   0.890       0.944   0.921       0.772   0.819
rmit8    0.871   0.893       0.944   0.921       0.715   0.824
rmit9    0.881   0.899       0.944   0.921       0.746   0.844
rmit10   0.860   0.900       0.944   0.921       0.681   0.844
11
Overall results
12
Frame recall and precision for gradual transitions
13
Discussion
  • Cut detection is highly effective
  • This year, recall is 94% and precision is 92%.
    Improvements from 2003 are due to ignoring the
    centre region
  • Gradual detection has improved significantly
    since 2003
  • Recall is now between 68% and 85%, precision
    between 67% and 84%
  • High detection threshold favours precision, low
    favours recall
  • Short detection threshold history length was
    found to be preferable
  • Final decision step reduces false positives
  • For television news, we are able to use a fixed
    moving query window size of 24 frames
  • Experimented with a simple ASR technique in 10
    additional runs, which removed detected
    transitions that coincided with spoken words. Ad
    hoc, very unsuccessful

14
Conclusions
  • Disregarding the focus area of frames for cut
    detection has improved our results by 3% in
    recall and 9% in precision
  • Our parameter-free ranking scheme is highly
    effective in cut detection on a wide variety of
    footage
  • Our gradual transition detection method is
    relatively simple and needs only a few parameters
  • The additional, final preprocessing step reduces
    false positives and improves results
    significantly
  • The use of localised histograms and more dynamic
    thresholding also improved results in gradual
    transition detection
  • Our approach is computationally inexpensive,
    simple to implement, and effective
  • 15,500 seconds to process the video (around 4
    hours, 18 minutes)

15
Questions?