Title: Computer Science Department
A Speech/Music Discriminator using RMS and Zero-crossings
Costas Panagiotakis and George Tziritas
Department of Computer Science, University of Crete, Heraklion, Greece
Presentation Organization
- I. Introduction
- II. Segmentation
- III. Classification
- IV. Results
- V. Conclusion
Introduction (1/3)
Input
Figure 1: Original sound signal (44,100 or 22,050 Hz sampling rate)
Output
Figure 2: Real-time segmentation and classification (speech, music, silence)
Introduction (2/3)
Approaches
- Feature extraction (energy, frequency)
- Feature-based segmentation and classification
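A minimal sketch of the feature extraction step, assuming RMS as the energy feature and a count of sign changes as the frequency (zero-crossing) feature, both computed over non-overlapping 20 ms frames. The 20 ms frame length is an assumption inferred from the 50 RMS values per 1 s frame used in segmentation; the name `frame_features` is hypothetical.

```python
import numpy as np


def frame_features(signal, sample_rate, frame_ms=20):
    """Per-frame RMS and zero-crossing count (hypothetical helper).

    Assumes non-overlapping frames of `frame_ms` milliseconds; 20 ms gives
    50 values per second, e.g. 882 samples per frame at 44100 Hz.
    """
    x = np.asarray(signal, dtype=float)
    n = int(sample_rate * frame_ms / 1000)      # samples per frame
    num_frames = len(x) // n                    # drop the incomplete tail
    frames = x[: num_frames * n].reshape(num_frames, n)

    # Energy feature: root mean square of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))

    # Frequency feature: number of sign changes between consecutive samples
    # (exact zeros count as positive because signbit(0.0) is False).
    negative = np.signbit(frames)
    zc = np.count_nonzero(negative[:, 1:] != negative[:, :-1], axis=1)
    return rms, zc


# One second of a 100 Hz tone at 44.1 kHz gives 50 frames of 20 ms.
sr = 44100
t = np.arange(sr) / sr
rms, zc = frame_features(np.sin(2 * np.pi * 100 * t), sr)
print(rms.shape, round(float(rms[0]), 3))  # (50,) 0.707
```

Each 20 ms frame of the 100 Hz tone holds exactly two periods, so its RMS equals that of a unit sine, 1/√2.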
Main goals
- Real-time segmentation and classification
- Algorithmic and computational constraints
- Small number of features
- Low change-detection error (20 ms)
- Short minimum distance between two changes (1 s)
- High accuracy (95%)
Segmentation (1/3)
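Basic characteristics: RMS-based
The χ² distribution fits the RMS histograms well
$p(x) = \dfrac{b^{a+1}}{\Gamma(a+1)}\, x^{a} e^{-bx}, \quad a = \dfrac{m^2}{\sigma^2} - 1, \quad b = \dfrac{m}{\sigma^2}$
m: mean, σ²: variance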
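Fitting the χ² model to a frame only needs the sample mean and variance of its RMS values. The sketch below assumes the density p(x) = b^(a+1) x^a e^(-bx) / Γ(a+1), which is a gamma density with shape a+1 and rate b, and matches its first two moments, giving a = m²/σ² - 1 and b = m/σ². The helper names `fit_chi2` and `chi2_pdf` are hypothetical.

```python
import math

import numpy as np


def fit_chi2(rms):
    """Moment-matched (a, b) for p(x) = b^(a+1) x^a exp(-b x) / Gamma(a+1)."""
    x = np.asarray(rms, dtype=float)
    m = float(x.mean())   # sample mean
    s2 = float(x.var())   # sample variance (assumed > 0)
    a = m * m / s2 - 1.0
    b = m / s2
    return a, b


def chi2_pdf(x, a, b):
    """Evaluate the fitted density for x > 0, in log space for stability."""
    x = np.asarray(x, dtype=float)
    log_p = (a + 1.0) * math.log(b) + a * np.log(x) - b * x - math.lgamma(a + 1.0)
    return np.exp(log_p)


rng = np.random.default_rng(0)
samples = rng.gamma(shape=4.0, scale=0.5, size=200_000)  # mean 2, variance 1
a, b = fit_chi2(samples)
print(round(a, 1), round(b, 1))  # approximately 3 and 2
```

Because the fit matches moments, the fitted density reproduces the frame's mean, (a+1)/b = m, and variance, (a+1)/b² = σ², exactly.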
Two-stage algorithm
- Stage 1
- 1 s accuracy (low computational cost)
- Stage 2
- 20 ms accuracy (high computational cost)
Segmentation (2/3)
- Stage 1
- Partition the RMS sequence into 1 s frames (50 RMS values each)
- For a change to lie in frame i, frames i-1 and i+1 must differ
- Compute the frame distance D(i) (Matusita distance) from the similarity ρ(p1, p2) of the RMS distributions p1, p2 of frames i-1 and i+1
- Frame i is a candidate for Stage 2 (it contains a change) if D(i) > threshold and D(i) is a local maximum
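Stage 1 can be sketched as follows. Several choices here are assumptions rather than details from the slides: each frame's RMS values are summarised by a moment-matched χ² (gamma) fit; the similarity ρ is the Bhattacharyya coefficient ∫√(p1·p2) dx, which has a closed form for two gamma densities; the Matusita distance takes its standard form D = √(2 - 2ρ); and a local maximum means D(i) is at least as large as both neighbours. The threshold value and all function names are hypothetical.

```python
import math

import numpy as np

EPS = 1e-12  # guards against all-zero (silent) frames


def gamma_fit(rms):
    """Moment-matched gamma shape k (= a + 1) and rate b of one frame."""
    m = max(float(np.mean(rms)), EPS)
    s2 = max(float(np.var(rms)), EPS)
    return m * m / s2, m / s2


def similarity(k1, b1, k2, b2):
    """Bhattacharyya coefficient rho = integral of sqrt(p1 * p2) for two gammas."""
    k = 0.5 * (k1 + k2)
    c = 0.5 * (b1 + b2)
    log_rho = (0.5 * (k1 * math.log(b1) + k2 * math.log(b2)
                      - math.lgamma(k1) - math.lgamma(k2))
               + math.lgamma(k) - k * math.log(c))
    return math.exp(log_rho)


def matusita(rms1, rms2):
    """Assumed Matusita distance sqrt(2 - 2*rho) between two frames."""
    rho = similarity(*gamma_fit(rms1), *gamma_fit(rms2))
    return math.sqrt(max(0.0, 2.0 - 2.0 * rho))  # clamp rounding noise


def stage1_candidates(rms, threshold, per_frame=50):
    """Indices of 1 s frames that probably contain a change."""
    n = len(rms) // per_frame
    frames = [rms[j * per_frame:(j + 1) * per_frame] for j in range(n)]
    d = np.zeros(n)
    for i in range(1, n - 1):
        # A change inside frame i makes frames i-1 and i+1 differ.
        d[i] = matusita(frames[i - 1], frames[i + 1])
    return [i for i in range(1, n - 1)
            if d[i] > threshold and d[i] >= d[i - 1] and d[i] >= d[i + 1]]


# Synthetic RMS track at 20 ms steps: 4.5 s of one regime, then 5.5 s of another.
rng = np.random.default_rng(1)
rms = np.concatenate([rng.gamma(10.0, 0.02, 225), rng.gamma(10.0, 0.1, 275)])
print(stage1_candidates(rms, threshold=0.5))
```

In the synthetic track the change at 4.5 s lies inside frame 4 (4 to 5 s), the frame whose neighbours 3 and 5 are both pure but come from different regimes.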
[Figure: change in frame i; RMS versus time in 1 s frames, with the corresponding distance]
Segmentation (3/3)
- Stage 2
- 20 ms accuracy
- For each candidate frame i from Stage 1:
- 1. Move two successive 1 s windows, one before and one after the tested instant, through frame i
- 2. Find the time instant at which the two windows have the maximum Matusita distance between their RMS distributions
- Possible oversegmentation
Figure 10: The RMS data and the distance D
Figure 11: The segmentation result and the RMS data
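Stage 2 can be sketched with the same ingredients as Stage 1. Assumptions: the RMS values come every 20 ms (50 per second); the two 1 s windows are adjacent and meet at the tested instant, which moves through candidate frame i one RMS value at a time; and the Matusita distance is computed from moment-matched χ² (gamma) fits as D = √(2 - 2ρ). The helper names are hypothetical, and the distance helpers are repeated so the block runs on its own.

```python
import math

import numpy as np

EPS = 1e-12  # guards against all-zero (silent) windows


def gamma_fit(rms):
    """Moment-matched gamma shape k (= a + 1) and rate b of one window."""
    m = max(float(np.mean(rms)), EPS)
    s2 = max(float(np.var(rms)), EPS)
    return m * m / s2, m / s2


def matusita(rms1, rms2):
    """Assumed Matusita distance sqrt(2 - 2*rho) between two gamma fits."""
    (k1, b1), (k2, b2) = gamma_fit(rms1), gamma_fit(rms2)
    k, c = 0.5 * (k1 + k2), 0.5 * (b1 + b2)
    log_rho = (0.5 * (k1 * math.log(b1) + k2 * math.log(b2)
                      - math.lgamma(k1) - math.lgamma(k2))
               + math.lgamma(k) - k * math.log(c))
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.exp(log_rho)))


def stage2_refine(rms, frame, per_frame=50, hop_s=0.02):
    """Best change instant inside candidate 1 s frame `frame`, at 20 ms steps."""
    # Keep both 1 s windows inside the signal.
    start = max(frame * per_frame, per_frame)
    stop = min((frame + 1) * per_frame, len(rms) - per_frame + 1)
    best_t, best_d = None, -1.0
    for t in range(start, stop):
        # Two successive 1 s windows meeting at t: [t - 1 s, t) and [t, t + 1 s).
        d = matusita(rms[t - per_frame:t], rms[t:t + per_frame])
        if d > best_d:
            best_t, best_d = t, d
    if best_t is None:
        return None  # frame too close to the signal edges
    return best_t, best_t * hop_s


rng = np.random.default_rng(1)
rms = np.concatenate([rng.gamma(10.0, 0.02, 225), rng.gamma(10.0, 0.1, 275)])
t, seconds = stage2_refine(rms, frame=4)
print(t, seconds)  # expected near index 225, i.e. about 4.5 s
```

Evaluating 50 window pairs per candidate frame is what makes this stage costlier than Stage 1, which is why it runs only on the candidates.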
Classification (4/4)
Silence segment recognition
A segment is silence if E < threshold (E: segment energy)
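The silence rule is a single comparison. The sketch assumes E is the mean power of the segment, computed from its 20 ms RMS values; both that definition and the threshold value are assumptions, not taken from the slides, and `is_silence` is a hypothetical name.

```python
import numpy as np


def is_silence(rms_segment, threshold):
    """Classify a segment as silence when its energy E is below the threshold.

    Hypothetical energy definition: mean of the squared RMS values.
    """
    energy = float(np.mean(np.square(rms_segment)))
    return energy < threshold


# A near-silent segment and a loud one, with a hypothetical threshold.
print(is_silence(np.full(50, 0.001), threshold=1e-4))  # True
print(is_silence(np.full(50, 0.5), threshold=1e-4))    # False
```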
Decision-making algorithm
Results
Data
- Data source: 70% audio CDs, 15% WWW, 15% recordings
- 11.328 s of speech, 3.131 s of music
Segmentation performance
- 97% detection probability
- Change accuracy: 0.2 s
Actual features performance
[Figure: classification accuracy for different feature sets (Cz, ZC0, Fu and a σ²-based feature, individually, in combinations, and all together)]
Segmentation and Classification Demo
Sound Player Demo