Title: Video Classification
1Video Classification
- By Maryam S. Mirian
- For Multimedia Pattern Recognition Joint
Courses Project
2Outline
- What is Video Classification?
- Straightforward or Difficult?
- What are its Applications?
- What are its methods?
- Review of Video Classification Methods
- What is my own Project, exactly?
3What is Video Classification?
- Classify a Video (Shot) into one of Nc predefined
Classes
- Indoor / outdoor
- News / Sports
-
4Is Video Classification Difficult? Why?
- YES, Because
- Data Stream is a Multi-dimensional signal.
- It has a subjective nature.
5Classification
6Required Steps for Classification
- Object -> Feature Extraction -> Feature Reduction -> Classification
- Input: Observations; output: Class Labels
- Diagram annotations: "Using methods like PCA, LDA" (feature reduction);
"The most important and the most difficult part"
7Methods of Classification
- Bayesian Classification
- kNN Classification
- Neural Classification
- MLP
- RBF
- Classification based on Support Vector Machines
- Rule-based Classification
8Bayesian Decision Making
So, x belongs to w2
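The decision on this slide follows the Bayes rule: assign x to the class whose posterior is largest, which is proportional to the class-conditional likelihood times the prior. Below is a minimal sketch, assuming one-dimensional Gaussian class-conditional densities. The means, standard deviations and priors are made-up illustrative values, not taken from the slide.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_decide(x, classes):
    """Return the class name with the largest p(x | w) * P(w).

    classes maps a class name to (mean, std, prior). All parameters are
    hypothetical and only illustrate the decision rule.
    """
    scores = {
        name: gaussian_pdf(x, mean, std) * prior
        for name, (mean, std, prior) in classes.items()
    }
    return max(scores, key=scores.get)

# Illustrative two-class problem with equal priors (made-up numbers).
classes = {"w1": (0.0, 1.0, 0.5), "w2": (3.0, 1.0, 0.5)}
decision = bayes_decide(2.5, classes)
print(decision)
```

With equal priors and equal variances the rule reduces to picking the class with the nearest mean; unequal priors shift the decision boundary toward the less likely class.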
10kNN Decision Making
k = 5: 2 red neighbors and 3 black neighbors, so
x should be black!
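The kNN vote above can be sketched in a few lines. This is an illustration only: the 2-D points are invented so that the 5 nearest neighbors of x are 3 black and 2 red, and Euclidean distance is an assumption (the slide does not name a metric).

```python
from collections import Counter
import math

def knn_classify(x, samples, k=5):
    """Label x by majority vote among its k nearest labeled samples.

    samples is a list of (point, label) pairs; points are tuples of floats.
    Distance is Euclidean (an assumption made for this sketch).
    """
    nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up data: around the origin there are 3 black and 2 red points.
samples = [
    ((1.0, 0.0), "black"), ((0.0, 1.0), "black"), ((-1.0, 0.0), "black"),
    ((0.0, -1.2), "red"), ((1.1, 1.1), "red"),
    ((5.0, 5.0), "red"), ((6.0, 5.0), "red"),
]
label = knn_classify((0.0, 0.0), samples, k=5)
print(label)
```

An odd k avoids ties in a two-class problem, which is one reason k = 5 is a common choice.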
12MLP Classifier
13Video Content Analysis
14Applications of Automatic Video Classification
- Automatic video segmentation
- Content-based retrieval
- Browsing and retrieving digitized video
- Identifying close-up video frames before running
a computationally expensive face recognizer
- Effective management of the ever-increasing amount of
broadcast news video
- Personalization of news video
15Classify Shot or Video?
- One effective way to organize the video is to
segment the video into small, single-story units
and classify these units according to their
semantics.
- A shot represents a contiguous sequence of
visually similar frames. It is a syntactical
representation and does not usually convey any
coherent semantics to the users.
16Looking at Video Classification
17Ide et al. 1998
- Problem Domain: News video
- Features
- Videotext
- Motion
- Face
- Segmented the video into shots
- Used clustering techniques
- Classified each shot into 1 of 5 classes:
Speech/report, Anchor, Walking, Gathering, and
Computer graphics shots.
- Quite simple but seems effective for this
restricted class of problems.
18Huang et al. 1999
- Problem Domain: TV Programs
- news report
- weather forecast
- Commercials
- basketball games
- football games
- Features
- Audio
- Color
- motion
19Chen and Wong 2001
- Problem Domain
- news video
- News
- Weather
- Reporting
- Commercials
- Basketball
- Football
- Features
- Motion
- Color
- text caption
- cut rate
- used a rule-based approach
20Looking at Lekha Chaisorn et al. 2002 in More
Detail
21Basic Ideas
- Proposes a two-level, multi-modal framework.
- The video is analyzed at the shot and story unit
(or scene) levels.
- At the shot level, a Decision Tree is employed to
classify the shot into one of 13 pre-defined categories.
- At the scene level, HMM (Hidden Markov
Model) analysis is used to eliminate shot
classification errors.
- Results indicate that a high accuracy of over 95%
for shot classification can be achieved.
- The use of HMM analysis helps to improve the
accuracy of the shot classification and achieve
over 89% accuracy on story segmentation.
22Predefined Classes
23Features in Shot Level
- Low-level Visual Content Feature
- Color Histogram
- Temporal Features
- Background scene change
- Speaker change
- Audio
- Motion activity
- Shot duration
- High-level Object-based features
- Face
- Shot type
- Videotext
- Centralized Videotext
24Feature vector of a shot
- Si = (a, m, d, f, s, t, c)
- a: the class of audio, a ∈ {t = speech, m = music,
s = silence, n = noise, tn = speech + noise, tm =
speech + music, mn = music + noise}
- m: the motion activity, m ∈ {l = low, m = medium,
h = high}
- d: the shot duration, d ∈ {s = short, m = medium,
l = long}
- f: the number of faces, f ≥ 0
- s: the shot type, s ∈ {c = close-up, m = medium,
l = long, u = unknown}
- t: the number of lines of text in the scene,
t ≥ 0
- c: set to true if the videotexts present are
centralized, c ∈ {t = true, f = false}
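The shot feature vector Si = (a, m, d, f, s, t, c) can be modelled as a small record type. The sketch below is one possible encoding, not the paper's implementation: the class name `ShotFeatures`, the validation logic and the example values are all hypothetical, while the symbol sets follow the slide.

```python
from dataclasses import dataclass

# Allowed symbols for the categorical components, as listed on the slide.
AUDIO = {"t", "m", "s", "n", "tn", "tm", "mn"}  # speech, music, silence, noise, mixes
MOTION = {"l", "m", "h"}                        # low, medium, high
DURATION = {"s", "m", "l"}                      # short, medium, long
SHOT_TYPE = {"c", "m", "l", "u"}                # close-up, medium, long, unknown

@dataclass(frozen=True)
class ShotFeatures:
    """Hypothetical container for one shot's feature vector."""
    a: str   # audio class
    m: str   # motion activity
    d: str   # shot duration
    f: int   # number of faces
    s: str   # shot type
    t: int   # number of lines of text in the scene
    c: bool  # True if the videotext is centralized

    def __post_init__(self):
        checks = [
            (self.a in AUDIO, "a"),
            (self.m in MOTION, "m"),
            (self.d in DURATION, "d"),
            (self.s in SHOT_TYPE, "s"),
            (self.f >= 0, "f"),
            (self.t >= 0, "t"),
        ]
        for ok, name in checks:
            if not ok:
                raise ValueError(f"invalid value for component {name!r}")

# Invented example: speech, low motion, long shot duration, one face,
# close-up, two lines of centralized text.
example_shot = ShotFeatures(a="t", m="l", d="l", f=1, s="c", t=2, c=True)
print(example_shot)
```

Because every component is a small discrete symbol or a count, a vector like this is a natural input for a decision tree.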
25Decision Tree for Shot Classification
26Reading these papers, I decided about My own
Project.
27About Problem Domain
- Sport Classification seems OK
- Interesting Enough
- It is helpful for Sports-Lovers
28About Extracting Features
- Features used in video analysis
- Color, texture, shape, motion vector
- Criterion for choosing features: they should have
similar statistical behavior across time
- Color histogram: simple and robust
- Motion vectors: invariant to color and light
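To make the color histogram feature concrete, here is a minimal sketch, assuming a frame is given as a list of 8-bit (r, g, b) tuples. The number of bins per channel and the use of histogram intersection as a similarity measure are illustrative choices, not something the slides prescribe.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values 0..255.

    bins_per_channel must divide 256 so that every value maps to a valid bin.
    """
    if 256 % bins_per_channel != 0:
        raise ValueError("bins_per_channel must divide 256")
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Made-up frames: one all green, one mostly green with some red.
green = [(30, 160, 40)] * 4
mixed = [(30, 160, 40)] * 3 + [(200, 20, 30)]
similarity = histogram_intersection(color_histogram(green), color_histogram(mixed))
print(similarity)
```

Coarse bins are what make the feature robust: small lighting changes usually keep a pixel in the same bin.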
29So, My Own Project is
- Sports Video Classification: Football,
Basketball, ... (those well-defined sports I can
find video on!)
- Steps I should take
- Finding or Gathering a Video Collection
- Shot Detection
- Feature Extraction
- Key Frame (s) Extraction
- Selecting Middle Shot I-Frame
- Use of Clustering
-
- Motion Vector-based Features
- Straight Lines Detection
- Design a Classifier
- Test the Approach
30Looking at Ekin, Tekalp 2003: Research on
Football Video Classification
31Features
- Cinematic
- Result from common video composition and
production rules.
- Shot types, camera motions and replays.
- Object-based
- Described by their spatial features, e.g., color, texture,
and shape, and spatio-temporal features, such as
object motions and interactions.
32Robust Dominant Color Region Detection
- A soccer field has one distinct dominant color (a
tone of green) that may vary from stadium to
stadium, and also due to weather and lighting
conditions within the same stadium.
- The statistics of this dominant color, in the HSI
space, are learned by the system at start-up, and
then automatically updated to adapt to temporal
variations.
33Shot classification
- Long Shot
- A long shot displays the global view of the
field.
- In-Field Medium Shot
- A whole human body is usually visible.
- Close-Up Shot
- Shows the above-waist view of one person.
- Out of Field Shot
- The audience, coach, and other shots.
35How to Extend from a Frame to a Shot?
- For computational simplicity, they find the
class of every frame in a shot and assign the
shot the label of the majority of
frames.
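The majority rule above is easy to sketch. The function name and the example labels are hypothetical, and the tie-breaking behavior (first label seen wins) is a property of this sketch, not something stated in the paper.

```python
from collections import Counter

def shot_label(frame_labels):
    """Return the most frequent frame label in a shot.

    On a tie, Counter.most_common keeps first-seen order, so the label
    that appeared first among the tied ones is returned.
    """
    if not frame_labels:
        raise ValueError("a shot must contain at least one frame")
    return Counter(frame_labels).most_common(1)[0][0]

# Made-up per-frame labels for one shot.
frames = ["long", "long", "medium", "long", "close-up", "long"]
label = shot_label(frames)
print(label)
```

Voting over all frames also smooths out occasional per-frame misclassifications inside a shot.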
36Decision Scheme Based on G
- The first stage uses the G value and two thresholds,
TcloseUp and Tmedium, to determine the frame view
label.
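A two-threshold first stage of this kind might look like the sketch below. Several things here are assumptions rather than facts from the slide: G is treated as the fraction of dominant-color (field) pixels in the frame, in [0, 1]; the mapping of the three ranges to labels is a guess at the intended scheme; and the default threshold values are placeholders.

```python
def frame_view_label(g, t_close_up=0.1, t_medium=0.4):
    """First-stage frame view label from G and two thresholds.

    Assumed reading: little field color means a close-up or out-of-field
    frame, a lot of field color means a long view, and medium views fall
    in between. Threshold defaults are illustrative only.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g is expected to be a ratio in [0, 1]")
    if not t_close_up < t_medium:
        raise ValueError("t_close_up must be smaller than t_medium")
    if g < t_close_up:
        return "close-up or out of field"
    if g < t_medium:
        return "medium"
    return "long"

labels = [frame_view_label(g) for g in (0.02, 0.25, 0.80)]
print(labels)
```

Keeping the thresholds as parameters fits the later slide on parameter adaptation, where TcloseUp and Tmedium are listed as values to be tuned.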
38Soccer Event Detection
- Goal Detection
- Referee Detection
- Controversial calls, such as red-yellow cards and
penalties
- Penalty Box Detection
39Goal Detection
- Occurrence of a goal is generally followed by a
special pattern of cinematic features.
- A goal event leads to a break in the game.
- One or more close-up views of the actors of the
goal event are shown.
- One or more replay(s) are shown.
- The restart of the game is usually captured by a
long shot.
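The cinematic pattern above (close-ups, then replays, then a long shot at restart) can be searched for in a sequence of shot labels. The sketch below is a simplification under stated assumptions: the label names and single-letter codes are invented, shots are assumed to be already classified, and a real detector would likely add further conditions (for example on durations) that the slide does not list.

```python
import re

# Hypothetical one-letter codes for shot labels.
CODES = {"long": "L", "medium": "M", "close-up": "C",
         "out-of-field": "O", "replay": "R"}

def find_goal_candidates(shots):
    """Return (first, last) shot indices of each close-up+ replay+ long run."""
    encoded = "".join(CODES[s] for s in shots)
    return [(m.start(), m.end() - 1) for m in re.finditer(r"C+R+L", encoded)]

# Made-up shot sequence containing one candidate pattern.
shots = ["long", "medium", "close-up", "close-up", "replay", "long", "medium"]
candidates = find_goal_candidates(shots)
print(candidates)
```

Encoding shots as characters lets an ordinary regular expression express "one or more" directly, which matches the wording of the pattern.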
41Referee Detection
- It is assumed that there is a single referee in a
medium, out of field, or close-up shot.
- So there is no search for a referee in a long shot.
42Penalty Box Detection
- Field lines in a long view can be used to
localize the view and/or register the current
frame on the standard field model
43Interesting Summaries
- Goal summaries
- summaries with Referee and Penalty box objects
44Adaptation of Parameters
- Parameters
- Tcolor in dominant color region detection
- TcloseUp and Tmedium in shot classification
- referee color statistics
- The training stage can be performed in a very
short time to find Mean and Variance of a Normal
pdf.
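Estimating the mean and variance of a normal pdf from a handful of training samples takes only a few lines, which is consistent with the short training time mentioned above. This is a sketch: the sample values are invented, and using the population variance (dividing by n) rather than the sample variance is an assumption.

```python
import statistics

def fit_normal(samples):
    """Return (mean, variance) of a normal pdf fitted to the samples.

    Uses the population variance (divide by n); whether the original
    system used this or the unbiased estimator is not stated.
    """
    mean = statistics.fmean(samples)
    variance = statistics.pvariance(samples, mu=mean)
    return mean, variance

# Made-up training values, e.g. one color component of referee pixels.
training = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance = fit_normal(training)
print(mean, variance)
```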
45Results for High-Level Analysis and Summarization
46Results for High-Level Analysis and
Summarization(2)
- Referee detection results
47Results for High-Level Analysis and
Summarization(3)
- Penalty box detection results
48References
- Automatic soccer video analysis and
summarization, in Symp. Electronic Imaging
Science and Technology: Storage and Retrieval for
Image and Video Databases IV, IS&T/SPIE03, Jan.
2003, CA.
- The Segmentation and Classification of Story
Boundaries in News Video, Proceedings of the 6th IFIP
Working Conference on Visual Database Systems
(VDB6), Australia, 2002.
- Pattern Classification, by Duda, Hart, and Stork,
2000.
49Thanks for Your Attention