Title: A Music-driven Video Summarization System Using Content-aware Mechanisms
1 A Music-driven Video Summarization System Using Content-aware Mechanisms
- CMLab of CSIE, NTU
2 Outline
- Introduction
- The problem / Proposed solution
- Related works
- System framework
- Media analysis
- Audio/video analysis
- Importance function
- Synchronization (combining video with audio)
- Profiles: Rhythmic / Medium
- Parameters: Sequential / non-Sequential
- Demonstration
- Experimental results
- Conclusions and future work
3 Introduction: The Problem
- Digital video capture devices such as DVs have become more affordable for end users.
- There is still a tremendous barrier between amateurs (home users) and powerful video editing software (Adobe Premiere, CyberLink PowerDirector).
- It is interesting to shoot videos but frustrating to edit them.
- Finally, people leave their precious shots in piles of DV tapes without editing or management.
4 Introduction: Users' Impatience
- According to a survey on DVworld, there is a relation between video length and how many times users will review a video after some days.
- Video clips of no more than 5 minutes are best for human concentration.
- People are impatient with videos that have no scenario or voice-over, especially those with no music.
http://www.DVworld.com.tw/
5 Introduction: Proposed Solution
- The music-driven video as summarization
- One study at MIT showed that improved soundtrack quality improves perceived video image quality.
- Synchronizing video and audio segments enhances the perception of both.
- Proposed solution
- Create a musical video from home videos.
- The synchronization is done by making the rhythm of the video fit that of the audio.
- Because of users' direct sympathetic response to music, the created musical video is professional looking and more entertaining.
6 Introduction: Related Works
- Literature
- Jonathan Foote, Matthew D. Cooper, and Andreas Girgensohn, "Creating music videos using automatic media analysis," ACM Multimedia 2002, pp. 553-560.
- A consumer product called muvee autoProducer has been announced to ease the burden of professional video editing.
- Content-analysis technologies have been developed for years; can we use them to help the auto-creation of musical videos?
- The content-aware mechanisms
7 System Framework
[Figure: proposed framework. Audio features: volume, ZCR, brightness, bandwidth. Video features: human face, flashlight, motion strength, color variance, camera operation, ...]
8 Media Analysis: Audio Features
- Frame-level features
- Time-domain features
- Volume: defined as the MSR of the audio samples in each frame
- ZCR: the number of times the audio waveform crosses the zero axis in each frame
- Frequency-domain features
- Brightness: the centroid of the frequency spectrum
- Bandwidth: the standard deviation of the frequency spectrum
[Figure: frame-level audio feature curves from 0s to 90s]
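As a rough sketch (not the authors' exact implementation), the four frame-level features could be computed per short frame as follows; interpreting volume as RMS energy and weighting the centroid/deviation by the normalized spectrum are assumptions.

```python
import numpy as np

def frame_audio_features(samples, sr, frame_len=0.02):
    """Frame-level audio features: volume (RMS), ZCR, brightness, bandwidth."""
    n = int(sr * frame_len)                              # samples per frame
    feats = []
    for start in range(0, len(samples) - n, n):
        frame = samples[start:start + n].astype(np.float64)
        volume = np.sqrt(np.mean(frame ** 2))            # RMS energy
        zcr = np.sum(np.abs(np.diff(np.sign(frame)))) / 2  # zero crossings per frame
        spec = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(n, d=1.0 / sr)
        p = spec / (spec.sum() + 1e-12)                   # normalized spectrum
        brightness = np.sum(freqs * p)                    # spectral centroid
        bandwidth = np.sqrt(np.sum(((freqs - brightness) ** 2) * p))
        feats.append((volume, zcr, brightness, bandwidth))
    return np.array(feats)
```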
9 Media Analysis: Audio Analysis
- Generally the brightness curve is almost the same as the ZCR curve, so here we use the ZCR feature only.
- Bandwidth is an important audio feature, but we cannot easily tell the real physical meaning in the music when the bandwidth reaches its high/low values.
- Furthermore, the relation between musical perception and bandwidth values is neither clear nor regular.
[Figure: brightness and ZCR curves over 12s-34s]
10 Media Analysis: Audio Segmentation
- First we cut the input audio into clips where the volume changes dramatically.
- For each clip, we define a burst of ZCR as an attack, which may be the beat of a bass drum or the voice of a singer.
11 Media Analysis: Audio Segmentation
- The dramatic volume change defines the audio clip boundary, while the bursts of ZCR (attacks) in each clip define the granular sub-segments within it.
- Besides, we define the dynamics of an audio clip as our clip-level feature.
- Faster-tempo music usually has clips with higher audio dynamics.
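A minimal sketch of this segmentation, assuming precomputed frame-level volume and ZCR arrays; the thresholds and the "attacks per second" definition of clip dynamics are illustrative assumptions, since the slides do not give the exact rules.

```python
import numpy as np

def segment_audio(volume, zcr, vol_jump=2.0, zcr_burst=2.0, frames_per_sec=50):
    """Cut audio into clips at dramatic volume changes and mark ZCR attacks."""
    # Clip boundaries: frames where volume jumps well above its recent mean.
    boundaries = [0]
    for i in range(1, len(volume)):
        recent = np.mean(volume[max(0, i - frames_per_sec):i]) + 1e-9
        if volume[i] > vol_jump * recent:
            boundaries.append(i)
    boundaries.append(len(volume))

    clips = []
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        mean_zcr = np.mean(zcr[a:b]) + 1e-9
        attacks = [i for i in range(a, b) if zcr[i] > zcr_burst * mean_zcr]
        # Clip dynamics approximated as attacks per second (an assumption).
        dynamics = len(attacks) / ((b - a) / frames_per_sec)
        clips.append({"start": a, "end": b, "attacks": attacks, "dynamics": dynamics})
    return clips
```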
12 Media Analysis: Video Analysis
- First we apply shot change detection to segment the video into shots.
- Here we use a combination of the pixel MAD (Mean Absolute Difference) and pixel histogram difference methods to detect shot changes.
- The hybrid method performs well for home videos!
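A hedged sketch of such a hybrid detector: a cut is declared only when both the pixel MAD and the histogram difference are large. The thresholds and histogram binning are illustrative, not the paper's values.

```python
import cv2
import numpy as np

def detect_shot_changes(video_path, mad_thresh=30.0, hist_thresh=0.5):
    """Hybrid shot-change detection combining pixel MAD and histogram difference."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_gray, prev_hist, idx = [], None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_gray is not None:
            mad = np.mean(cv2.absdiff(gray, prev_gray))        # pixel-wise MAD
            hdiff = 0.5 * np.sum(np.abs(hist - prev_hist))     # histogram difference
            if mad > mad_thresh and hdiff > hist_thresh:       # require both cues
                cuts.append(idx)
        prev_gray, prev_hist, idx = gray, hist, idx + 1
    cap.release()
    return cuts
```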
13 Media Analysis: Video Analysis
- Shot heterogeneity
- Here we use the MPEG-7 ColorLayout descriptor to measure each frame's similarity.
- Used to measure a video shot's variety.
[Figure: example shots with high vs. low heterogeneity]
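A simplified, non-conformant approximation of the ColorLayout idea (8x8 downsample, YCrCb, a few DCT coefficients without the zig-zag scan), with shot heterogeneity taken as the spread of frame descriptors; both simplifications are assumptions.

```python
import cv2
import numpy as np

def color_layout(frame, coeffs=6):
    """Approximate ColorLayout: 8x8 downsample, YCrCb, first DCT coefficients."""
    small = cv2.resize(frame, (8, 8), interpolation=cv2.INTER_AREA)
    ycc = cv2.cvtColor(small, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    desc = []
    for c in range(3):
        d = cv2.dct(ycc[:, :, c])
        desc.extend(d.flatten()[:coeffs])
    return np.array(desc)

def shot_heterogeneity(frames):
    """Heterogeneity as the mean distance of frame descriptors to their centroid."""
    descs = np.array([color_layout(f) for f in frames])
    centroid = descs.mean(axis=0)
    return float(np.mean(np.linalg.norm(descs - centroid, axis=1)))
```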
14 Media Analysis: Camera Operation
- Camera operations such as pan or zoom are widely used in amateur home videos. Detecting those camera operations helps capture the video taker's intention.
- Our camera operation detection is performed on the basis of block-based motion vectors.
- This method is simple and efficient.
- If neither a pan nor a zoom pattern is found in the motion vectors, the frame is labeled as having no camera operation.
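One plausible realization of this rule on a grid of block motion vectors: a strong consistent global direction suggests a pan, motion diverging from (or converging to) the frame center suggests a zoom, otherwise no camera operation. The thresholds are illustrative assumptions.

```python
import numpy as np

def classify_camera_operation(mv, pan_thresh=1.0, zoom_thresh=1.0):
    """Classify a frame's camera operation from block motion vectors.

    mv: array of shape (H, W, 2) with one (dx, dy) vector per block."""
    h, w, _ = mv.shape
    mean_v = mv.reshape(-1, 2).mean(axis=0)
    if np.linalg.norm(mean_v) > pan_thresh:      # consistent global translation
        return "pan"
    # Radial component: projection of each vector on its direction from the center.
    ys, xs = np.mgrid[0:h, 0:w]
    radial = np.stack([xs - (w - 1) / 2.0, ys - (h - 1) / 2.0], axis=-1)
    radial /= (np.linalg.norm(radial, axis=-1, keepdims=True) + 1e-9)
    divergence = np.mean(np.sum(mv * radial, axis=-1))
    if abs(divergence) > zoom_thresh:            # vectors point in/out of the center
        return "zoom"
    return "none"                                # otherwise, no camera operation
```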
15 Media Analysis: Video Features
- High-level features
- Human face feature
- Use the face detector in the OpenCV library
- Face feature ratio
- Flashlight feature
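A minimal sketch of the face feature using OpenCV's bundled Haar cascade; interpreting the "face feature ratio" as face area over frame area is an assumption.

```python
import cv2

# Haar cascade face detector shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_feature_ratio(frame):
    """Fraction of the frame area covered by detected faces (assumed definition)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    frame_area = frame.shape[0] * frame.shape[1]
    face_area = sum(w * h for (x, y, w, h) in faces)
    return min(face_area / frame_area, 1.0)
```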
16 Media Analysis: Video Features
- Medium-level features
- Medium-level features represent frames that are dynamic (higher motion activity) in nature.
- Motion strength
- Static frames tend to make people lose their patience when watching videos.
- Camera motion types
- None, Pan, Zoom
- Importance: Zoom > Pan > None
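The slides do not give an exact motion-strength measure; one common proxy, shown here as an assumption, is the mean magnitude of dense optical flow between consecutive frames.

```python
import cv2
import numpy as np

def motion_strength(prev_frame, frame):
    """Motion strength as the mean dense optical-flow magnitude (assumed proxy)."""
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    return float(np.mean(np.linalg.norm(flow, axis=2)))
```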
17 Media Analysis: Video Features
- Low-level features
- Modeling frames which are better to be seen, i.e., used for selecting high-quality frames in the final production.
- Frame brightness (luminance)
18 Media Analysis: Video Features
- Color variance
- We use histogram distributions to model the color variance.
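A small sketch covering both low-level features; computing the color variance from the hue histogram is an assumed interpretation of "histogram distributions".

```python
import cv2
import numpy as np

def low_level_features(frame):
    """Low-level quality features: mean luminance and histogram-based color variance."""
    ycc = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    brightness = float(np.mean(ycc[:, :, 0]))              # frame luminance
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180]).flatten()
    p = hist / (hist.sum() + 1e-9)                          # hue distribution
    mean_bin = np.sum(p * np.arange(32))
    color_variance = float(np.sum(p * (np.arange(32) - mean_bin) ** 2))
    return brightness, color_variance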
19 Media Analysis: Importance Functions
- Video frame-level importance
A scaling factor, Sa, is defined from the accompanying audio clip's dynamics (Adynamic).
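The slide shows the actual importance formula only as an image; the sketch below is a hypothetical weighted sum of the features above, with Sa scaling the motion term by the audio dynamics (see the later synchronization slide). All weights are assumptions.

```python
def frame_importance(face_ratio, flashlight, motion, camera_op, brightness,
                     color_variance, audio_dynamics,
                     weights=(0.3, 0.1, 0.25, 0.15, 0.1, 0.1)):
    """Hypothetical frame-level importance; features assumed normalized to [0, 1]."""
    camera_score = {"zoom": 1.0, "pan": 0.5, "none": 0.0}[camera_op]
    s_a = min(max(audio_dynamics, 0.0), 1.0)   # Sa derived from Adynamic
    return (weights[0] * face_ratio + weights[1] * flashlight +
            weights[2] * s_a * motion +        # motion counts less for slow music
            weights[3] * camera_score +
            weights[4] * brightness + weights[5] * color_variance)
```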
20 Media Analysis: Importance Functions
- Video segments with higher scores may contain human faces, have higher motion strength, or contain zooms and pans, depending on which features make them reach high values.
21 Media Analysis: Importance Functions
- Shot-level importance
- The shot-level importance is motivated by the following observations:
- Shots with larger motion intensity take longer duration.
- The presence of a face attracts the viewer.
- Shots of higher heterogeneity can take longer playing time.
- Shots with more camera operations are more important.
- Of course, shots with longer length are more important.
- Static shots take shorter playing time, while dynamic shots can take longer; this gives better results after editing.
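A hypothetical combination of these observations into a single shot-level score; the weights and the linear form are assumptions, not the paper's formula.

```python
def shot_importance(motion_intensity, face_presence, heterogeneity,
                    camera_op_count, shot_length,
                    weights=(0.25, 0.25, 0.2, 0.15, 0.15)):
    """Hypothetical shot-level importance; inputs assumed normalized to [0, 1]
    (shot_length relative to the longest shot in the video)."""
    return (weights[0] * motion_intensity + weights[1] * face_presence +
            weights[2] * heterogeneity + weights[3] * camera_op_count +
            weights[4] * shot_length)
```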
22 Synchronization: Profiles
- General properties of home videos
- The proposed four profiles
23 Synchronization: Mechanisms
- Before we talk about the synchronization process, first we introduce the video reduction rate Rva.
- Basic synchronization mechanisms
[Figure: original video timeline, n shots]
24 Synchronization: Rhythmic Profile
- A basic synchronization unit (BSU)
- consists of a starting time and a stopping time in the audio
- e.g., an audio segment from the 25th second to the 31st second
- In the medium profile, we use the LBSU (Larger BSU), which may be 2 or 3 BSUs in length.
25 Synchronization: Rhythmic Profile
- For each BSU, the starting and stopping points of the BSU are projected back to the video timeline.
- Search the projected range to find candidate shots with the same length as the BSU.
- We apply an audio scaling coefficient in the synchronization stage. The weight of the motion intensity of video shots is decreased when aligned with a slow audio clip, while it is nearly preserved when synchronized with a fast audio clip.
[Figure: BSU boundaries projected onto the video timeline]
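A sketch of this projection-and-search step, assuming a frame-level importance array, BSU boundaries in seconds, and Rva as the factor that maps audio time back to the original video timeline; the window search and scoring are assumptions.

```python
def synchronize_rhythmic(bsus, frame_scores, r_va, fps=30):
    """For each BSU, pick the BSU-length video window with the highest
    average frame importance inside the projected range (sketch)."""
    selections = []
    for start_s, stop_s in bsus:                     # BSU boundaries in seconds
        need = int((stop_s - start_s) * fps)         # frames required for this BSU
        lo = int(start_s * r_va * fps)               # projected range on the video
        hi = min(int(stop_s * r_va * fps), len(frame_scores))
        best, best_score = lo, float("-inf")
        for s in range(lo, max(hi - need, lo) + 1):
            score = sum(frame_scores[s:s + need]) / need
            if score > best_score:
                best, best_score = s, score
        selections.append((best, best + need))       # chosen video frame range
    return selections
```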
26 Synchronization: Medium Profile
- Each shot is reassigned a new length according to its shot importance; shots may become longer or shorter in proportion to the total length.
- After projecting to the video space, the length budget is calculated according to the reduction rate, and then the budget is allocated to each inner shot according to its length.
- If the allocated shot length is too short (in frames), its budget is transferred to neighboring shots.
[Figure: shot length budget allocated between the video and audio timelines]
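A sketch of the budget allocation under stated assumptions: the minimum length threshold is illustrative, and a shot that falls below it is dropped and its budget redistributed among the remaining shots (the slide transfers it to neighboring shots specifically).

```python
def allocate_shot_lengths(shot_lengths, target_total, min_len=15):
    """Split the target length budget among shots in proportion to their
    (importance-reassigned) lengths; reclaim budget from too-short shots."""
    active = list(range(len(shot_lengths)))
    alloc = [0] * len(shot_lengths)
    while active:
        total_len = sum(shot_lengths[i] for i in active) or 1.0
        proposal = {i: target_total * shot_lengths[i] / total_len for i in active}
        too_short = [i for i in active if proposal[i] < min_len]
        if not too_short:
            for i in active:
                alloc[i] = int(round(proposal[i]))
            break
        # Drop too-short shots; their budget goes back to the remaining shots.
        active = [i for i in active if i not in too_short]
    return alloc
```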
27 Demonstration: Sample Videos
28 Experimental Results
The users' patience test result
- We invited 20 people to join this subjective test; 10 of them have a computer science background and 10 do not.
The performance result of the music-driven summarization
29 Experimental Results
Answers about the comparison of the rhythmic and medium profiles
Answers about the matching of video with audio tempos
30 Conclusions
- We have proposed and implemented a music-driven video summarization system that can help home users post-process their creations in a fully automatic way.
- Many content-aware mechanisms are also proposed to analyze the input media. We combine the input video and audio according to their content features to form our musical videos.
- According to our subjective tests, all of the testers were amazed by our system and found it very impressive. Most of the testers would be glad to have such a tool to help them edit their creations.
- Besides, our proposed system and content-aware mechanisms have also been adopted by CyberLink Corp., and commercialization is planned.
31 Future Work
- It would be better to have users' feedback telling us which shots are "must have", which shots are "better to have", and which shots should be dropped.
- In our work, we include proper transition effects between video shots. But we think the transition effects should consider the characteristics of both the accompanying audio clip and the video content.
- By exploiting more audio and video features and gaining more understanding of digital content semantics, we can get even better results, and the automatic video editing system can get closer to professional editors.