Title: Video Shot Structuring using Spectral Methods
Jean-Marc Odobez, Daniel Gatica-Perez and Mael Guillemot
{odobez, gatica, guillemo} at idiap dot ch
Institut Dalle Molle d'Intelligence Artificielle Perceptive, P.O. Box 592, CH-1920 Martigny, Switzerland
- Objective: clustering of home video shots.
- Method: spectral clustering (related to graph partitioning).
- Issues: model selection.
- Results:
  - Better results than Probabilistic Hierarchical Clustering.
  - Model selection with the eigengap.
  - Validation on home videos and soccer games.
The Spectral Clustering Algorithm
Spectral clustering captures perceptual organization.
Key ideas
- based on a pair-wise similarity matrix
- spectrum computation → embedded space
- clustering in the embedded space
Algorithm steps
1. Similarity matrix conditioning
2. Eigenvectors (2 here)
3. Row normalization (embedded space)
4. K-means clustering
Ng et al., "On spectral clustering: analysis and an algorithm," in Proc. NIPS, Dec. 2001.
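The four steps listed in this section can be sketched in a few lines of NumPy. This is a minimal illustration in the spirit of Ng et al., not the authors' implementation: the Gaussian affinity, the scale `sigma`, the function names and the plain k-means loop with farthest-first initialization are all assumptions made for the example.

```python
import numpy as np

def spectral_embedding(A, k):
    """Steps 1-3: condition the similarity matrix, take the k leading
    eigenvectors, and normalize each row to unit length."""
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))      # D^{-1/2} A D^{-1/2}
    w, v = np.linalg.eigh(L)             # eigenvalues in ascending order
    X = v[:, -k:]                        # eigenvectors of the k largest eigenvalues
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def kmeans(Y, k, n_iter=50):
    """Step 4: plain k-means; farthest-first initialization is an assumed choice."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_clustering(points, k, sigma=1.0):
    """Cluster points with a Gaussian affinity (assumed similarity measure)."""
    sq = ((points[:, None, :] - points[None]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)             # zero self-similarity, as in Ng et al.
    return kmeans(spectral_embedding(A, k), k)

# Toy data: two well-separated groups of 2-D points.
points = np.array([[0, 0], [0.1, 0], [0, 0.1],
                   [5, 5], [5.1, 5], [5, 5.1]], dtype=float)
labels = spectral_clustering(points, k=2)
```

For video shots, `A` would instead be built from the shot or key-frame similarities described in the experiments; only the affinity construction changes, the embedding and k-means steps stay the same.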
Automatic model selection
Value of K? → What is a good clustering? → Use of the eigengap.
- Eigengap
  - computed from λ1 and λ2, the 2 largest eigenvalues of the affinity (similarity) matrix A.
  - From matrix perturbation theory: assesses the stability of the first eigenvector extraction w.r.t. noise/perturbation.
  - large eigengap → v1 direction stable.
  - (Figure: eigenpairs (λ1, v1) and (λ2, v2) of a graph G, for a large and for a low eigengap.)
- Relationship with the Cheeger constant h
  - Graph bi-partitioning into (C, C̄), with volumes vol(C) and vol(C̄).
  - Interpretation: measure of the tightness of the graph.
    - low h → existence of a good cut
    - large h → tight graph (difficult to split into 2 subsets)
- Result
  - The h interpretation is valid for the eigengap: large eigengap → tight cluster.
- Selection scheme
  - simplest clustering model
  - tight extracted clusters: for a model with k clusters, the eigengap of each cluster's affinity sub-matrix is > thresh for all clusters i.
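As a rough illustration of the eigengap as a tightness measure and of the selection scheme, the sketch below compares a tight graph with a graph that has an obvious cut. The relative definition `1 - lambda_2 / lambda_1`, the toy affinity values, the threshold `0.5` and the helper `clusters_are_tight` are assumptions made for the example, not the formula or settings used by the authors.

```python
import numpy as np

def eigengap(A):
    """Relative gap between the two largest eigenvalues of a symmetric
    affinity matrix A (assumed definition: 1 - lambda_2 / lambda_1)."""
    w = np.linalg.eigvalsh(A)            # eigenvalues in ascending order
    lam1, lam2 = w[-1], w[-2]
    return 1.0 - lam2 / lam1

def clusters_are_tight(A, labels, thresh):
    """True if every cluster's affinity sub-matrix has an eigengap > thresh.
    Single-element clusters are treated as tight (assumed convention)."""
    labels = np.asarray(labels)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) > 1 and eigengap(A[np.ix_(idx, idx)]) <= thresh:
            return False
    return True

# Tight graph: every pair of nodes is strongly similar.
tight = np.full((6, 6), 0.9)
np.fill_diagonal(tight, 1.0)

# Graph with a good cut: two groups with weak links between them.
split = np.full((6, 6), 0.05)
split[:3, :3] = 0.9
split[3:, 3:] = 0.9
np.fill_diagonal(split, 1.0)

gap_tight = eigengap(tight)              # large: hard to split
gap_split = eigengap(split)              # small: a good cut exists

# Selection: keep the simplest model whose clusters are all tight.
one_cluster_ok = clusters_are_tight(split, [0, 0, 0, 0, 0, 0], thresh=0.5)
two_clusters_ok = clusters_are_tight(split, [0, 0, 0, 1, 1, 1], thresh=0.5)
```

On the second graph the single-cluster model is rejected and the two-cluster model is accepted, which is the behaviour the selection scheme relies on.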
Experiments and Results
Soccer game structuring
- Shot representation
  - 5 key-frames extracted per shot (different appearance)
  - Clustering of key-frames → shot class assignment by majority vote of key-frames
- Visual feature extraction: color (r,g,b) histogram, normalized frame number
- Multiple ground truth for scene boundary detection.
- Similarity computation
  - No temporal features
  - Key-frames are split into two spatial regions
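The shot class assignment by majority vote of key-frames could look like the sketch below. The function name, the input layout (one cluster label and one shot id per key-frame) and the tie-breaking rule (the label seen first wins) are assumptions made for illustration.

```python
from collections import Counter

def shot_labels_from_keyframes(keyframe_labels, keyframe_shot_ids):
    """Assign each shot the most frequent cluster label among its key-frames."""
    votes = {}
    for label, shot in zip(keyframe_labels, keyframe_shot_ids):
        votes.setdefault(shot, Counter())[label] += 1
    # most_common(1) returns the top (label, count) pair for each shot
    return {shot: c.most_common(1)[0][0] for shot, c in votes.items()}

# Two shots with 5 key-frames each, as in the shot representation above.
keyframe_labels = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
keyframe_shot_ids = [0] * 5 + [1] * 5
shot_labels = shot_labels_from_keyframes(keyframe_labels, keyframe_shot_ids)
```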
Home video structuring
- Similarity computation
  - Bhattacharyya distance
- Kodak home video database (20 videos, 6 hours in total)
- Multiple ground truth generated from third party.
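For reference, a Bhattacharyya distance between two color histograms can be computed as sketched below. The logarithmic form `-ln(BC)` is used here as an assumption; another form common in vision work is `sqrt(1 - BC)`. The function names and the toy histograms are illustrative only.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Overlap between two histograms, each normalized to sum to 1 first."""
    sp, sq = sum(p), sum(q)
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))

def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient; infinite for disjoint histograms."""
    bc = bhattacharyya_coefficient(p, q)
    if bc == 0:
        return math.inf
    return -math.log(min(bc, 1.0))       # clamp guards against rounding above 1

hist_a = [2, 1, 1]                       # toy 3-bin histograms (raw counts)
hist_b = [1, 1, 2]
d_same = bhattacharyya_distance(hist_a, hist_a)
d_diff = bhattacharyya_distance(hist_a, hist_b)
```

A similarity for the affinity matrix can then be derived from this distance, for example through a decreasing function such as an exponential; that mapping is also a design choice and is not specified here.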
- User study
  - Perceptual comparison of the K-means and Spectral clustering methods,
  - based on 3 pairs of summaries (10 minutes/video clip) and 10 subjects.
  - Question 1: report shots that do not fit in their clusters.
    - K-means: avg 9.13
    - Spectral: avg 8.43
  - Question 2: which summary do you prefer? Why?
    - K-means: 2 times (5 times similar)
    - Spectral: 23 times
  - Main comments
    - K-means: Too many similar-sized clusters. A small number of errors everywhere.
    - Spectral: More homogeneous clusters that make more sense.
- Performance evaluation
  - Shots in errors (SIE): edit distance under the best label matching.
  - SIEmed: fair measure of algorithm performance.
  - Overall performance: average over all videos of SIEmed/(number of shots in video).
  - Spectral method: better than Probabilistic Hierarchical Clustering, almost as good as human.
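One way to read "shots in errors under best label matching" is sketched below: cluster labels are matched one-to-one to ground-truth labels so that agreement is maximal, and the shots that still disagree are counted. The one-to-one matching, the brute-force search over permutations (only practical for a small number of clusters) and the function name are assumptions made for the example.

```python
from itertools import permutations

def shots_in_error(truth, predicted):
    """Number of shots mislabeled under the best one-to-one label matching."""
    t_labels = sorted(set(truth))
    p_labels = sorted(set(predicted))
    n = max(len(t_labels), len(p_labels))
    # counts[i][j]: shots with predicted label i and true label j,
    # zero-padded to a square matrix when the label counts differ
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(truth, predicted):
        counts[p_labels.index(p)][t_labels.index(t)] += 1
    best = max(sum(counts[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))
    return len(truth) - best

truth = [0, 0, 0, 1, 1, 2, 2]
predicted = ["a", "a", "b", "b", "b", "c", "c"]
sie = shots_in_error(truth, predicted)   # one shot of class 0 falls in cluster "b"
```

Dividing the result by `len(truth)` gives the per-video normalization by the number of shots mentioned in the performance evaluation.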
3rd International Workshop on Content Based
Multimedia Indexing, September 22-24, Rennes,
France.