Title: Tel : 83594243
1???????????
- ???
- Tel 83594243
- Office ????608B
- Email gswu_at_nju.edu.cn
2Contents
31 ??????
- ???????????????
- ??????????????60Hz20kHz?
- ?????????300Hz4kHz??,
- ??????????????????
- ?????????????,??????,????????????
- ?????????????????2?,?????????
- ????8 ??16??????
41 ??????
- ????????????,????????,??????????
- ????
- ????????????????????????????????????????
- ??
- ??????????,????????????????????????????????????
?????? - ??
- ?????????????,????/????????????????????????????
5???????
- ?????????????????
- ???????????
- ?????,??????????
- ??????????
- ??????????
- ???????,
- ????????????,???????,??????????
6???????
7???????
- ?? ?? ????
?????? - ???
- ???
- ???
81 ??????
- ?????????????????????????????????????????
- ?????????,?????????
- ???????,?????????????,????????????
- ????????????????????????
- ?????????????????????,???????????????,??????????,?
?,?????????????????
9?????????
- What? ???????????????????????????????
- ??
- 1 ?????????????,???????????????????
- 2 ????????????????,?????,?????????????????????
- 3 ??????(?????)???,?????????????,??????????
102???????
- ?????????????(???,???,?????...)
- ????????????
- ???????????(???????????...)
- ???????????
- ?????????????
112?????????
- ????(by example)???????(onomatopoeia)?
?????????????????,????????????????? - ????????????????????
- ?????????????????
- ??(simile)??? ????????/???????????,????????
122?????????
- ??????? ??????????,?????????????????????????????
? - ???????????????????,?????????????????????,????????
???????
13?????????
14?????????
- ???????,???????????????????
- ??????????????,?????????????????
- ???????????,??????? less?more??????
153 ????
16??????????
- ????????(broadcast radio, TV programs, video tapes, lectures, voice memo, voice mail, voice phonebook, etc.)
- ???? text and/or speech
- ????????,
- ????????????????????????????????????
17(1) ????????????
- ????????????????????(??)
- ?????,?????,
- ????? ??, ??, ????.
- ????????????? time-align
- ????????????????
- ???????,???????????,
- ????????????????
- ??OOV(Out of Vocabulary)??, 1?OOV??,
18(2) ???????????
- ???????????,????????????,????????????
- ????????????????,????????????
- ???????????????????,????????????
19(2) ???????????
- ?????????????????
- ??????,??????????????,????????
- ????????????????????????????,??????????????(???)?
- ??,?????????,??????????????,?????????????????
20(3) ?????????????
- ?????(Spotting)????????,????????????????,
- (???????????????????????????????)
- ??
- ??????????????, ?????
- ??
- ???????, ????
- ?????????????,??????.
21Sub-word Lattice Based Word Spotting
- ??(Sub-word)???????????????.????????????,?????.
- Sub-word Lattice????????.
- ?????????,?????????, ?????Sub-word
Lattice????????(????).
22???????????
- Growing interest in this area
- Video mail retrieval (Cambridge, UK)
- BBC news retrieval
- Digital library projects (CMU Informedia, Michigan MSU, Sheffield/Cambridge THISL Project, Maryland VoiceGraph, AT&T SCAN)
- ARPA broadcast news; TREC-6, 7, 8 SDR Workshop
234 ???????????
244 ???????????
- ????????????????,?????????????????????,??????????,
???????????????? - ??ASR???????????????,??,???????????????,??????????
???????? - ??,???????,???????????,???????????????????????????
?????????????
254 ???????????
- ???????????????????????????,??????????????????????
?? - ??????????????????????,?????????????????,?????????
???????????
26(1) ???????
- ?????????????
- ?????????????????(??),?????
- ?????????????,????N???????,?????????????????????,?
???????????????????????????
27(1) ???????
- ?????????????????
- ??????????????????,????Euclidean???????,????????(?
?)??,????????????????????? - ????????????????????,?????????,????????,????????
????
28(2) ????
- ??????,???????,????????????????,??????????????????
??,??????????? - ??
- ????????????????????????
- ?????????????,?????????????????,?????????????????
- ????????????????????,???????????????????????
29(3) ????
- ?????????????,????????????????
- ??,???????????????????,????????
- ????????,?????????????,??????????????????
- ???????????????????????
- ??,?????????????????
- ?????????????,??????????????????????????
30(3) ??????
- ?????????????????????????????
- ????????????????
- ??????????,???????????????????
- ??,?????????,?????????????,???????????
- ?????????????????????????,??????????
31(3) ???????
- ?????????????????????????????????????,????????????
???????? - ????????????,?????????????,?????????,??????,????
??????,???????????? - ??,???????,?????????????,?????????????????????????
?
32Audio Features
- Features derived in the time domain:
- Average energy
- Zero-crossing rate (ZCR): indicates how often the signal amplitude changes sign.
- Silence ratio: choosing the threshold may be tricky.
- Features derived in the frequency domain:
- Sound spectrum
- Bandwidth: music usually has a higher bandwidth than speech.
- Energy distribution: music usually has more high-frequency components than speech.
- Spectral centroid (brightness): the midpoint of the spectral energy distribution.
- Harmonicity: music is usually more harmonic than other sounds.
- Pitch: only periodic sounds give rise to a sensation of pitch. Pitch is subjective; it is related to, but not equivalent to, the fundamental frequency.
- Spectrogram: shows the relation between frequency, time, and intensity. Music spectrograms are more regular.
- Subjective features: pitch, timbre, etc.
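A minimal sketch of some of the features listed on this slide, assuming each frame is a plain list of samples. The function names, the energy threshold passed to `silence_ratio`, and the use of a naive DFT are illustrative assumptions, not part of the original slides.

```python
import cmath
import math


def average_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def silence_ratio(frames, energy_threshold):
    """Fraction of frames whose average energy falls below the threshold.

    The threshold is a free parameter; the slide notes that picking it is tricky.
    """
    silent = sum(1 for f in frames if average_energy(f) < energy_threshold)
    return silent / len(frames)


def power_spectrum(frame, sample_rate):
    """Naive O(n^2) DFT; returns (frequency_hz, power) pairs up to Nyquist."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append((k * sample_rate / n, abs(coeff) ** 2))
    return spectrum


def spectral_centroid(spectrum):
    """Power-weighted mean frequency (the 'brightness' of the frame)."""
    total = sum(p for _, p in spectrum)
    return sum(f * p for f, p in spectrum) / total
```

For a pure 1 kHz tone sampled at 8 kHz, `spectral_centroid(power_spectrum(tone, 8000))` lands at roughly 1000 Hz; for real audio an FFT routine would replace the naive DFT.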
39Audio Classification
- Here we first consider speech vs. music. There are of course other types of sound, but their characteristics vary greatly and are difficult to generalize.
Features            Speech          Music
Bandwidth           0-7 kHz         0-20 kHz
Spectral centroid   Low             High
Silence ratio       High            Low
ZCR                 More variable   Less variable
Regular beat        Non-existing    Often existing
40Audio Classification (cont'd)
- Step-by-step classification (from Lu, 2001)
41Audio Classification (cont'd)
- Feature-vector-based audio classification
- Values of a set of features are calculated and used as a feature vector.
- During the training stage, the average feature vector (reference vector) is found for each class of audio.
- During classification, the feature vector of an input is calculated, along with the vector distance between it and each reference vector. The input is assigned to the class whose reference vector is nearest.
- Audio segmentation
- A long sound track usually consists of a mixture of speech, music, and other sound types. The classification methods above can be used to segment a long audio piece into speech and music intervals. The procedure is windowing, classification, and then grouping.
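The training, classification, and segmentation steps above can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance is assumed as the vector distance, each window is assumed to have already been converted to a feature vector, and the function names and the fixed window length are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(examples):
    """Training stage: one reference vector per class.

    examples maps a class label to a list of feature vectors for that class.
    """
    return {label: mean_vector(vectors) for label, vectors in examples.items()}


def classify(references, vector):
    """Return the label whose reference vector is nearest (Euclidean, assumed)."""
    return min(references, key=lambda label: math.dist(references[label], vector))


def segment(references, window_vectors, window_seconds):
    """Windowing is assumed done; classify each window, then group neighbours.

    Returns a list of (label, start_seconds, end_seconds) intervals.
    """
    segments = []
    for i, vector in enumerate(window_vectors):
        label = classify(references, vector)
        start = i * window_seconds
        end = start + window_seconds
        if segments and segments[-1][0] == label:
            # Same class as the previous window: extend the current interval.
            segments[-1] = (label, segments[-1][1], end)
        else:
            segments.append((label, start, end))
    return segments
```

With reference vectors trained on a few labelled clips, `segment(references, window_vectors, 1.0)` returns intervals such as `("speech", 0.0, 2.0)`. A practical grouping step would probably also smooth out isolated single-window labels, which this sketch does not do.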
42More Advanced Audio Features
- High Zero-Crossing Rate Ratio (HZCRR)
- Low Short-Time Energy Ratio (LSTER)
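The formulas for these two ratios did not survive on this slide. The sketch below assumes the commonly cited definitions: HZCRR is the fraction of frames in a window whose ZCR exceeds 1.5 times the window average, and LSTER is the fraction of frames whose short-time energy is below 0.5 times the window average. The factors 1.5 and 0.5 and the function names are assumptions, not taken from the slides.

```python
def hzcrr(zcr_per_frame, factor=1.5):
    """High zero-crossing rate ratio over one analysis window.

    Counts frames whose ZCR is above factor * (average ZCR of the window).
    The default factor of 1.5 is an assumed, commonly quoted value.
    """
    average = sum(zcr_per_frame) / len(zcr_per_frame)
    high = sum(1 for z in zcr_per_frame if z > factor * average)
    return high / len(zcr_per_frame)


def lster(energy_per_frame, factor=0.5):
    """Low short-time energy ratio over one analysis window.

    Counts frames whose energy is below factor * (average energy of the window).
    The default factor of 0.5 is an assumed, commonly quoted value.
    """
    average = sum(energy_per_frame) / len(energy_per_frame)
    low = sum(1 for e in energy_per_frame if e < factor * average)
    return low / len(energy_per_frame)
```

Speech alternates between voiced, unvoiced, and silent frames, so both ratios would be expected to come out higher for speech than for music, in line with the "more variable ZCR" and "high silence ratio" rows of the speech vs. music comparison.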
43More Advanced Audio Features (cont'd)
- Spectrum Flux (SF)
- Band Periodicity (BP)
44More Advanced Audio Features (cont'd)
- Noise Frame Ratio (NFR)
- Rule: if $r_{i,j}(k_p) < \text{threshold}$, the frame is considered a noise frame.
- Linear Spectral Pair (LSP) Distance Measure
45More Advanced Audio Features (cont'd)
- Linear Spectral Pair (LSP) Distance Measure (cont'd)
- The LSP divergence shape is also a good measure for discriminating between different speakers. Let $C_p$ and $C_q$ denote the covariances of the $p$-th and $q$-th speech clips. If the dissimilarity between them is larger than a threshold, the two speech clips can be considered to come from two different speakers.
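The dissimilarity formula itself is missing from the slide. A minimal sketch, assuming the usual divergence-shape form $D = \frac{1}{2}\,\mathrm{tr}\big[(C_p - C_q)(C_q^{-1} - C_p^{-1})\big]$, is given below; the formula, the helper names, and the threshold parameter are assumptions for illustration only.

```python
def mat_sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_mul(a, b):
    """Plain matrix product of a (n x m) and b (m x k)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_inv(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def divergence_shape(cp, cq):
    """Assumed form: 0.5 * trace((Cp - Cq) * (Cq^-1 - Cp^-1))."""
    product = mat_mul(mat_sub(cp, cq), mat_sub(mat_inv(cq), mat_inv(cp)))
    return 0.5 * sum(product[i][i] for i in range(len(product)))


def different_speakers(cp, cq, threshold):
    """Decision rule from the slide; the threshold value is application-specific."""
    return divergence_shape(cp, cq) > threshold
```

The measure is zero when the two covariances are identical and symmetric in its arguments, so the order in which the two clips are compared does not matter.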
465 ????
47????
- ??????????????,???????????????????????????????
- ????????????,?MIDI?MP3???????????????????
- ??????????????,??????????????????????????????????,
????????????????????????
48The three basic features of a musical sound
- Pitch: related to the perception of the fundamental frequency of a sound; pitch is said to range from low or deep to high or acute sounds.
- Intensity: related to the amplitude, and thus to the energy, of the vibration; textual labels for intensity range from soft to loud. Intensity is also called loudness.
- Timbre: the sound characteristics that allow listeners to perceive two sounds with the same pitch and the same intensity as different.
49Dimensions of the Music Language
- Timbre
- Orchestration
- Acoustics
- sound quality, ambience, and style
- Rhythm
- Melody
- Harmony
- Structure
50Formats of Musical Documents
- Two Forms
- symbolic scores
- audio performances
- Three Formats
- Symbolic formats
- Audio formats
- The Musical Instrument Digital Interface (MIDI)
51Music Search
- Melodic retrieval based on index terms
- Melodic retrieval based on sequence matching
- Melodic retrieval based on geometric methods
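As an illustration of the sequence-matching approach, the sketch below compares melodies by the edit distance between their pitch-interval sequences. It assumes melodies are given as lists of MIDI note numbers; the interval representation (which makes matching independent of transposition), the unit edit costs, and the function names are illustrative choices rather than the method from the slides.

```python
def intervals(pitches):
    """Successive pitch differences in semitones (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            substitution = previous[j - 1] + (0 if x == y else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def rank(query_pitches, collection):
    """Return (distance, title) pairs sorted from best to worst match.

    collection maps a title to its melody as a list of MIDI note numbers.
    """
    query = intervals(query_pitches)
    scored = [(edit_distance(query, intervals(pitches)), title)
              for title, pitches in collection.items()]
    return sorted(scored)
```

A query such as `[60, 62, 64, 65]` matches a stored melody `[67, 69, 71, 72]` with distance 0, because both share the interval sequence `[2, 2, 1]` even though they are in different keys.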
52????
- ?????????????????????
- ?????????,??????????????
- ???????????????,??????????????????(?????????)?????
????? - ??????????????,??,??????????????????
53????
- ?????????????,?????????????????
- ?????????????????????,??????????????
- ??,?????????,??????????????????
- ????????MIDI?????????????
- ????????????,???? MIDI??,???????????
54? ?
- ???????????(???????)?????????,
- ???????????????
- ??????10??????
- ???????,???????????????????
- ??????????????????,????????????????????????,??????
????????????????
55(3) ????
- ???? ???????
- ????
- ?????????(?????????????????)
- ???????(?????????????????????,??????????????????(?
?????),?????????????
56????
- ???????????????????????????
- ????????????,????????????????,
- ????????????????
- ??????MIDI??,???????
- ?????????????,???????
57(2) ?????????
- ?????????????, ???????????????????
- ???? ????, ????
- ?? ????(??)??????
- ???????(????)??????
- ??,??,??,??,??.
- ?????????,??????????,??????????????????????
- ??????????????
58(3) ???????
- ??????????????????,?????????????????????,???????,
????????? - ??????????,??????
- ????????,????????????
- ??????????????,??????
- ??????,??????????????,???????,?????????