A Music-driven Video Summarization System Using Content-aware Mechanisms - PowerPoint PPT Presentation

1
A Music-driven Video Summarization System Using
Content-aware Mechanisms
  • CMLab of CSIE, NTU

2
Outline
  • Introduction
  • The problem / proposed solution
  • Related work
  • System framework
  • Media analysis
  • Audio/video analysis
  • Importance functions
  • Synchronization (combining video with audio)
  • Profiles: Rhythmic / Medium
  • Parameters: Sequential / non-Sequential
  • Demonstration
  • Experimental results
  • Conclusions and future work

3
Introduction: The Problem
  • Digital video capture devices such as DVs have
    become affordable for end users.
  • There is still a tremendous barrier between
    amateurs (home users) and powerful video editing
    software (Adobe Premiere, CyberLink
    PowerDirector).
  • It is fun to shoot videos but frustrating
    to edit them.
  • As a result, people leave their precious shots in
    piles of DV tapes without editing or management.

4
Introduction: Users' Impatience
  • According to a survey on DVworld, there is a
    relation between a video's length and how many
    times users will review it after some days.
  • Video clips of no more than 5 minutes best fit
    humans' concentration span.
  • People are impatient with videos that have no
    scenario or voice-over, especially those with no music.

http://www.DVworld.com.tw/
5
Introduction: Proposed Solution
  • The music-driven video as a summarization
  • One study at MIT showed that improved
    soundtrack quality improves perceived video image
    quality.
  • Synchronizing video and audio segments enhances
    the perception of both.
  • Proposed solution
  • Create a musical video from home videos.
  • Synchronization is done by making the rhythm
    of the video fit that of the audio.
  • Because of users' direct sympathetic response to
    music, the created musical video looks professional
    and is more entertaining.

6
Introduction: Related Work
  • Literature
  • Jonathan Foote, Matthew D. Cooper, Andreas
    Girgensohn, "Creating music videos using
    automatic media analysis," ACM Multimedia 2002,
    pp. 553-560
  • A consumer product called muvee autoProducer
    has been announced to ease the burden of
    professional video editing.
  • Content-analysis technologies have been developed
    for years; can we use them to help auto-create
    musical videos?
  • The content-aware mechanisms

7
System Framework
[Framework diagram: audio features (volume, ZCR, brightness, bandwidth)
and video features (human face, flash light, motion strength, color
variance, camera operation, ...) feed the proposed framework]
8
Media Analysis: Audio Features
  • Frame-level features
  • Time-domain features
  • Volume: defined as the root mean square (RMS) of
    the audio samples
  • ZCR: the number of times the audio waveform
    crosses the zero axis in each frame
  • Frequency-domain features
  • Brightness: the centroid of the frequency spectrum
  • Bandwidth: the standard deviation of the frequency
    spectrum

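As a sketch, the four frame-level features above can be computed per frame as follows (illustrative only; the frame length, sample rate, and the RMS definition of volume are assumptions, not the authors' exact settings):

```python
import numpy as np

def frame_features(samples, sr=16000, frame_len=512):
    """Per-frame audio features: volume, ZCR, brightness, bandwidth."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len].astype(float)
        # Volume: root mean square energy of the frame
        volume = np.sqrt(np.mean(frame ** 2))
        # ZCR: number of times the waveform crosses the zero axis
        zcr = np.count_nonzero(np.diff(np.signbit(frame)))
        # Magnitude spectrum for the frequency-domain features
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        total = mag.sum() + 1e-12
        # Brightness: centroid of the frequency spectrum
        brightness = (freqs * mag).sum() / total
        # Bandwidth: standard deviation of the spectrum about the centroid
        bandwidth = np.sqrt(((freqs - brightness) ** 2 * mag).sum() / total)
        feats.append((volume, zcr, brightness, bandwidth))
    return feats
```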
9
Media Analysis: Audio Analysis
  • Generally the brightness curve is
    almost the same as the ZCR curve, so here
    we use the ZCR feature only.
  • Bandwidth is an important audio feature, but we
    cannot easily tell its real physical
    meaning in music when the bandwidth reaches
    its high/low values.
  • Furthermore, the relations between musical
    perception and bandwidth values are neither clear
    nor regular.

[Plot: brightness and ZCR curves over the 12 s to 34 s range]
10
Media Analysis: Audio Segmentation
  • First we cut the input audio into clips at points
    where the volume changes dramatically.
  • For each clip, we define a burst of ZCR as an
    attack, which may be the beat of a bass drum or the
    voice of a singer.

11
Media Analysis: Audio Segmentation
  • A dramatic volume change defines an audio clip
    boundary, while a burst of ZCR (an attack) in each
    clip defines a granular sub-segment within it.
  • In addition, we define the dynamic of an audio clip
    as our clip-level feature.
  • Faster-tempo music usually has clips with higher
    audio dynamics.
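The two-level segmentation above can be sketched as follows (the relative thresholds `vol_jump` and `zcr_burst`, and the attack-density definition of the clip dynamic, are assumptions):

```python
import numpy as np

def segment_audio(volume, zcr, vol_jump=2.0, zcr_burst=1.5):
    """Cut audio into clips at dramatic volume changes, then find
    ZCR-burst attacks inside each clip. Inputs are per-frame arrays."""
    # Clip boundaries: frames where the volume changes dramatically
    dv = np.abs(np.diff(volume))
    cuts = [i + 1 for i in np.nonzero(dv > vol_jump * dv.mean())[0]]
    boundaries = [0] + cuts + [len(volume)]
    clips = []
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        if b <= a:
            continue
        seg = zcr[a:b]
        # Attacks: ZCR bursts above a multiple of the clip's mean ZCR
        attacks = np.nonzero(seg > zcr_burst * (seg.mean() + 1e-12))[0] + a
        # Clip-level "dynamic": attack density (faster tempo -> higher)
        dynamic = len(attacks) / (b - a)
        clips.append({"start": a, "end": b,
                      "attacks": attacks.tolist(), "dynamic": dynamic})
    return clips
```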

12
Media Analysis: Video Analysis
  • First we apply shot-change detection to segment the
    video into shots.
  • Here we use a combination of pixel MAD (mean
    absolute difference) and pixel histogram
    difference to detect shot changes.
  • The hybrid method performs well for home videos!
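A minimal sketch of the hybrid rule, assuming a shot change is declared only when both the MAD and the histogram-difference cues agree (the thresholds are illustrative, not from the paper):

```python
import numpy as np

def shot_change(prev, curr, mad_thresh=30.0, hist_thresh=0.4, bins=32):
    """Hybrid shot-change test on two consecutive grayscale frames
    given as 2-D uint8 arrays."""
    # Pixel-wise mean absolute difference between the frames
    mad = np.mean(np.abs(prev.astype(float) - curr.astype(float)))
    # Normalized intensity-histogram difference (0 = identical, 1 = disjoint)
    h1 = np.histogram(prev, bins=bins, range=(0, 256))[0] / prev.size
    h2 = np.histogram(curr, bins=bins, range=(0, 256))[0] / curr.size
    hist_diff = 0.5 * np.abs(h1 - h2).sum()
    # Declare a shot change only when both cues agree (the hybrid rule)
    return mad > mad_thresh and hist_diff > hist_thresh
```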

13
Media Analysis: Video Analysis
  • Shot heterogeneity
  • Here we use the MPEG-7 ColorLayout descriptor to
    measure the similarity between frames
  • Used to measure a video shot's variety

[Examples: a high-heterogeneity shot vs. a low-heterogeneity shot]
14
Media Analysis: Camera Operation
  • Camera operations such as pan or zoom are widely
    used in amateur home videos. Detecting those
    camera operations helps capture the video
    taker's intention.
  • Our camera operation detection is performed on
    the basis of block-based motion vectors.
  • This method is simple and efficient.

[Decision rules on the block-based motion vectors classify pan and zoom;
otherwise, no camera operation]
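One plausible way to classify pan/zoom/none from block-based motion vectors is to test the vectors' coherence with the global mean direction (pan) or with the radial field about the frame center (zoom). The thresholds and exact rules below are assumptions, not the authors' method:

```python
import numpy as np

def classify_camera_op(mv, mag_thresh=1.0, coherence=0.7):
    """Classify a frame's camera operation from an (H, W, 2) array
    of per-block motion vectors."""
    h, w, _ = mv.shape
    mags = np.linalg.norm(mv, axis=2)
    if mags.mean() < mag_thresh:
        return "none"  # too little motion overall
    unit = mv / (mags[..., None] + 1e-12)
    # Pan: block vectors agree with the global mean direction
    mean_v = mv.reshape(-1, 2).mean(axis=0)
    mean_dir = mean_v / (np.linalg.norm(mean_v) + 1e-12)
    if (unit @ mean_dir).mean() > coherence:
        return "pan"
    # Zoom: block vectors point away from (or toward) the frame center
    ys, xs = np.mgrid[0:h, 0:w]
    radial = np.stack([xs - (w - 1) / 2, ys - (h - 1) / 2], axis=2)
    radial_dir = radial / (np.linalg.norm(radial, axis=2)[..., None] + 1e-12)
    if abs((unit * radial_dir).sum(axis=2).mean()) > coherence:
        return "zoom"
    return "none"
```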
15
Media Analysis: Video Features
  • High-level features
  • Human face feature
  • Use the face detector in the OpenCV library
  • Face feature ratio
  • Flashlight feature
16
Media Analysis: Video Features
  • Medium-level features
  • Medium-level features represent frames that are
    dynamic (higher motion activity) in nature.
  • Motion strength
  • Static frames tend to make people lose their
    patience when watching videos
  • Camera motion types
  • None, Pan, Zoom
  • Importance: Zoom > Pan > None

17
Media Analysis: Video Features
  • Low-level features
  • These model frames that are better to look at,
    i.e., they are used to select high-quality frames
    for the final production.
  • Frame brightness (luminance)

18
Media Analysis: Video Features
  • Color variance
  • We use histogram distributions to model the color
    variance

19
Media Analysis: Importance Functions
  • Video frame-level importance

A scaling factor, Sa, is defined from the
accompanying audio clip's dynamic (Adynamic)
20
Media Analysis: Importance Functions
  • Video segments with higher scores may have human
    faces in them, higher motion strength,
    or zooms and pans, depending on which
    features make them reach high values.
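The slides do not show the formula itself, but a hypothetical weighted-sum form of the frame-level importance, scaled by Sa, might look like this (all weights and the camera-operation scores below are assumptions, chosen only to reflect Importance: Zoom > Pan > None):

```python
def frame_importance(face_ratio, motion, camera_op, brightness, color_var,
                     s_a, weights=(0.35, 0.25, 0.2, 0.1, 0.1)):
    """Hypothetical weighted sum of the frame-level video features,
    scaled by the accompanying audio clip's dynamic via s_a."""
    cam_score = {"zoom": 1.0, "pan": 0.6, "none": 0.0}[camera_op]
    w_face, w_motion, w_cam, w_bright, w_color = weights
    base = (w_face * face_ratio + w_motion * motion + w_cam * cam_score
            + w_bright * brightness + w_color * color_var)
    # Sa scales the importance by the audio clip's dynamic (Adynamic)
    return s_a * base
```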

21
Media Analysis: Importance Functions
  • Shot-level importance
  • The shot-level importance is motivated by the
    following observations:
  • Shots with larger motion intensity take longer
    duration.
  • The presence of a face attracts viewers.
  • Shots of higher heterogeneity can take longer
    playing time.
  • Shots with more camera operations are more
    important.
  • Of course, shots with longer length are more
    important.
  • Static shots take shorter time, while dynamic
    shots can take longer, which gives better results
    after editing.

22
Synchronization: Profiles
  • General properties of home videos

[Table: general properties of home videos motivate the proposed four profiles]
23
Synchronization: Mechanisms
  • Before describing the synchronization process, we
    first introduce the video reduction rate Rva.
  • Basic synchronization mechanisms

[Diagram: the original video's timeline, consisting of n shots]
24
Synchronization: Rhythmic Profile
  • A basic synchronization unit (BSU)
  • consists of a starting time and a stopping time
    in the audio
  • e.g., an audio segment from the 25th
    second to the 31st second
  • In the medium profile, we use the LBSU (larger
    BSU), which may be 2 or 3 BSUs in length

25
Synchronization: Rhythmic Profile
  • For each BSU, the starting and stopping points
    are projected back to the video timeline.
  • Search the projected range to find candidate
    shots with the same length as the BSU.
  • We apply an audio scaling coefficient in the
    synchronization stage: the weight of the motion
    intensity of video shots is decreased when
    aligned with a slow audio clip, and nearly
    preserved when synchronized with a fast audio clip.

[Diagram: BSUs projected from the audio timeline onto the video timeline]
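The projection-and-search step can be sketched as follows (a simplified frame-level search for the highest-importance window, assuming Rva is the video-to-audio length ratio; the paper's actual shot-level matching may differ):

```python
def sync_rhythmic(bsus, frame_importance, rva, fps=30):
    """For each BSU (start_s, stop_s) in audio time, project its range
    onto the video timeline and pick the window of matching length
    with the highest total frame importance."""
    selections = []
    for start_s, stop_s in bsus:
        need = int((stop_s - start_s) * fps)  # frames this BSU requires
        # Project the BSU back to the (longer) video timeline
        lo = int(start_s * fps * rva)
        hi = min(int(stop_s * fps * rva), len(frame_importance))
        # Search the projected range for the best-scoring window
        best, best_score = lo, float("-inf")
        for s in range(lo, max(lo, hi - need) + 1):
            score = sum(frame_importance[s:s + need])
            if score > best_score:
                best, best_score = s, score
        selections.append((best, best + need))
    return selections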
26
Synchronization: Medium Profile
  • Each shot is reassigned a new length
    according to its shot importance; shots may
    become longer or shorter in proportion to the
    total length.
  • After projecting to the video space, the length
    budget is calculated according to the reduction
    rate, and the budget is then allocated to each
    inner shot according to its length.
  • If the allocated shot length is too short (in
    frames), its budget is transferred to
    neighboring shots.

[Diagram: shot length budgets allocated from the audio timeline to the
video timeline]
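The budget allocation and transfer step might be sketched like this (the minimum-length threshold and the transfer-to-neighbor rule are assumptions):

```python
def allocate_budget(shot_lengths, budget, min_len=15):
    """Split a total frame budget among shots in proportion to their
    original lengths; shots falling below min_len frames hand their
    share to a neighboring shot."""
    total = sum(shot_lengths)
    alloc = [budget * l / total for l in shot_lengths]
    for i, a in enumerate(alloc):
        if 0 < a < min_len:
            # Too short to keep: transfer this shot's budget to a neighbor
            j = i + 1 if i + 1 < len(alloc) else i - 1
            alloc[j] += a
            alloc[i] = 0.0
    return [int(round(a)) for a in alloc]
```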
27
Demonstration: Sample Videos
28
Experimental Results
The users' patience test results
  • We invited 20 people to join this subjective
    test; 10 of them have a computer science
    background and 10 do not.

The performance result of music-driven
summarization
29
Experimental Results
Answers about the comparison of the rhythmic and
medium profiles
Answers about the matching of video and audio
tempos
30
Conclusions
  • We have proposed and implemented a music-driven
    video summarization system that helps home
    users post-process their creations in a fully
    automatic way.
  • Many content-aware mechanisms are also proposed
    to analyze the input media. We combine the input
    video and audio according to their content
    features to form our musical videos.
  • In our subjective tests, all of the testers were
    amazed by our system and found it very
    impressive. Most of the testers would be glad to
    have such a tool to help them edit their creations.
  • In addition, our proposed system and content-aware
    mechanisms have been adopted by CyberLink Corp.,
    and commercialization is planned.

31
Future Work
  • It would be better to have users' feedback telling
    us which shots are "must have", which shots are
    "better to have", and which shots should be
    dropped.
  • In our work, we include proper transition effects
    between video shots. But we think the transition
    effects should consider the characteristics of
    both the accompanying audio clip and the video
    content.
  • By exploiting more audio and video features and
    gaining a deeper understanding of digital content
    semantics, we can get even better results, and the
    automatic video editing system can get closer to
    professional editors.