Tel : 83594243 - PowerPoint PPT Presentation

About This Presentation
Description:

Tel : 83594243 Office: 608B Email : gswu_at_nju.edu.cn * Wu Gangshan: Modern Information Retrieval * * Wu ... – PowerPoint PPT presentation

Slides: 59
Provided by: graphics1

Transcript and Presenter's Notes


1
???????????
  • ???
  • Tel 83594243
  • Office ????608B
  • Email gswu_at_nju.edu.cn

2
Contents
  • ??
  • ????
  • ????
  • ????
  • ????

3
1 ??????
  • ???????????????
  • ??????????????60Hz20kHz?
  • ?????????300Hz4kHz??,
  • ??????????????????
  • ?????????????,??????,????????????
  • ?????????????????2?,?????????
  • ????8 ??16??????

4
1 ??????
  • ????????????,????????,??????????
  • ????
  • ????????????????????????????????????????
  • ??
  • ??????????,????????????????????????????????????
    ??????
  • ??
  • ?????????????,????/????????????????????????????

5
???????
  • ?????????????????
  • ???????????
  • ?????,??????????
  • ??????????
  • ??????????
  • ???????,
  • ????????????,???????,??????????

6
???????
7
???????
  • ?? ?? ????
    ??????
  • ???
  • ???
  • ???

8
1 ??????
  • ?????????????????????????????????????????
  • ?????????,?????????
  • ???????,?????????????,????????????
  • ????????????????????????
  • ?????????????????????,???????????????,??????????,?
    ?,?????????????????

9
?????????
  • What? ???????????????????????????????
  • ??
  • 1 ?????????????,???????????????????
  • 2 ????????????????,?????,?????????????????????
  • 3 ??????(?????)???,?????????????,??????????

10
2???????
  • ?????????????(???,???,?????...)
  • ????????????
  • ???????????(???????????...)
  • ???????????
  • ?????????????

11
2?????????
  • ????(by example)???????(onomatopoeia)?
    ?????????????????,?????????????????
  • ????????????????????
  • ?????????????????
  • ??(simile)??? ????????/???????????,????????

12
2?????????
  • ??????? ??????????,?????????????????????????????
    ?
  • ???????????????????,?????????????????????,????????
    ???????

13
?????????
14
?????????
  • ???????,???????????????????
  • ??????????????,?????????????????
  • ???????????,??????? less?more??????

15
3 ????
16
??????????
  • ????????(broadcast radio, TV programs, video
    tapes, lectures, voice memo, voice mail, voice
    phonebook, etc.)
  • ???? text and/or speech
  • ????????,
  • ????????????????????????????????????

17
(1) ????????????
  • ????????????????????(??)
  • ?????,?????,
  • ????? ??, ??, ????.
  • ????????????? time-align
  • ????????????????
  • ???????,???????????,
  • ????????????????
  • ??OOV(Out of Vocabulary)??, 1?OOV??,

18
(2) ???????????
  • ???????????,????????????,????????????
  • ????????????????,????????????
  • ???????????????????,????????????

19
(2) ???????????
  • ?????????????????
  • ??????,??????????????,????????
  • ????????????????????????????,??????????????(???)?
  • ??,?????????,??????????????,?????????????????

20
(3) ?????????????
  • ?????(Spotting)????????,????????????????,
  • (???????????????????????????????)
  • ??
  • ??????????????, ?????
  • ??
  • ???????, ????
  • ?????????????,??????.

21
Sub-word Lattice Based Word Spotting
  • ??(Sub-word)???????????????.????????????,?????.
  • Sub-word Lattice????????.
  • ?????????,?????????, ?????Sub-word
    Lattice????????(????).

22
???????????
  • Growing interest in this area
  • Video mail retrieval (Cambridge, UK)
  • BBC news retrieval.
  • Digital library projects (CMU Informedia,
    Michigan MSU, Sheffield/Cambridge THIRL Project,
    Maryland VoiceGraph, AT&T SCAN).
  • ARPA broadcast news; TREC-6, 7, 8 SDR Workshop.

23
4 ???????????
24
4 ???????????
  • ????????????????,?????????????????????,??????????,
    ????????????????
  • ??ASR???????????????,??,???????????????,??????????
    ????????
  • ??,???????,???????????,???????????????????????????
    ?????????????

25
4 ???????????
  • ???????????????????????????,??????????????????????
    ??
  • ??????????????????????,?????????????????,?????????
    ???????????

26
(1) ???????
  • ?????????????
  • ?????????????????(??),?????
  • ?????????????,????N???????,?????????????????????,?
    ???????????????????????????

27
(1) ???????
  • ?????????????????
  • ??????????????????,????Euclidean???????,????????(?
    ?)??,?????????????????????
  • ????????????????????,?????????,????????,????????
    ????

28
(2) ????
  • ??????,???????,????????????????,??????????????????
    ??,???????????
  • ??
  • ????????????????????????
  • ?????????????,?????????????????,?????????????????
  • ????????????????????,???????????????????????

29
(3) ????
  • ?????????????,????????????????
  • ??,???????????????????,????????
  • ????????,?????????????,??????????????????
  • ???????????????????????
  • ??,?????????????????
  • ?????????????,??????????????????????????

30
(3) ??????
  • ?????????????????????????????
  • ????????????????
  • ??????????,???????????????????
  • ??,?????????,?????????????,???????????
  • ?????????????????????????,??????????

31
(3) ???????
  • ?????????????????????????????????????,????????????
    ????????
  • ????????????,?????????????,?????????,??????,????
    ??????,????????????
  • ??,???????,?????????????,?????????????????????????
    ?

32
Audio Features
  • Features derived in the time domain:
  • Average energy.
  • Zero-crossing rate (ZCR): indicates how frequently
    the sign of the signal amplitude changes.
  • Silence ratio: choosing the threshold may be tricky.
  • Features derived in the frequency domain:
  • Sound spectrum.
  • Bandwidth: music usually has a higher
    bandwidth than speech.
  • Energy distribution: music usually
    has more high-frequency components than speech.
  • Spectral centroid (brightness): the midpoint
    of the spectral energy distribution.
  • Harmonicity: music is usually more
    harmonic than other sounds.
  • Pitch: only periodic sounds give rise to a
    sensation of pitch. Pitch is subjective, related
    to but not equivalent to the fundamental frequency.
  • Spectrogram:
  • It shows the relation between frequency, time, and
    intensity. The spectrogram of music is more regular.
  • Subjective features:
  • Pitch, timbre, etc.
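
A minimal sketch of a few of the features listed above, computed on a single frame of samples. The function names, the per-sample silence threshold, and the use of a naive DFT with power weighting for the centroid are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import cmath
import math


def average_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def silence_ratio(frame, threshold):
    """Fraction of samples whose magnitude is below `threshold`.

    The threshold is a free parameter (an assumption of this sketch).
    """
    return sum(1 for s in frame if abs(s) < threshold) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Power-weighted mean frequency in Hz, using a naive O(n^2) DFT."""
    n = len(frame)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(coeff) ** 2
        weighted += (k * sample_rate / n) * power
        total += power
    return weighted / total if total else 0.0


def sine(freq, sample_rate, n):
    """Synthetic test tone (hypothetical helper for the demo)."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    sr, n = 8000, 256
    low, high = sine(250, sr, n), sine(2000, sr, n)
    print("ZCR:", zero_crossing_rate(low), zero_crossing_rate(high))
    print("centroid:", spectral_centroid(low, sr), spectral_centroid(high, sr))
```

In practice a long recording would be cut into short frames and these values computed per frame; an FFT library would replace the naive DFT.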

39
Audio Classification
  • Here we first consider speech vs. music.
    There are of course other types of sound, but their
    characteristics vary greatly and are difficult to
    generalize.

Features           Speech         Music
Bandwidth          0-7 kHz        0-20 kHz
Spectral centroid  Low            High
Silence ratio      High           Low
ZCR                More variable  Less variable
Regular beat       Non-existing   Often existing
40
Audio Classification (cont'd)
  • Step-by-step classification

(From Lu, 2001)
41
Audio Classification (cont'd)
  • Feature-vector-based audio classification
  • Values of a set of features are calculated
    and used as a feature vector.
  • During the training stage, the average
    feature vector (reference vector) is found for
    each class of audio.
  • During classification, the feature vector
    of an input is calculated, and the vector
    distances between the input feature vector and
    each of the reference vectors are calculated. The
    input is assigned to the class whose reference
    vector is nearest.
  • Audio Segmentation
  • A long sound track usually consists of a
    mixture of speech, music and other sound types.
    We can use the above classification methods to
    segment a long audio piece into speech and music
    intervals. The procedure is windowing,
    classification and then grouping.
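
The two procedures on this slide can be sketched as follows: reference vectors are per-class means, classification picks the nearest reference, and segmentation classifies each window and groups consecutive windows with the same label. The Euclidean distance, the function names, and the two-dimensional feature vectors in the demo are assumptions for illustration only.

```python
import math
from itertools import groupby


def mean_vector(vectors):
    """Component-wise average of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(examples):
    """Map each class label to its reference (average) vector.

    `examples` is a dict: label -> list of training feature vectors.
    """
    return {label: mean_vector(vecs) for label, vecs in examples.items()}


def classify(vector, references):
    """Return the label whose reference vector is nearest (Euclidean)."""
    return min(references, key=lambda lab: math.dist(vector, references[lab]))


def segment(window_vectors, references, window_seconds):
    """Windowing -> classification -> grouping.

    `window_vectors` holds one feature vector per fixed-length window.
    Returns (start_s, end_s, label) tuples for each run of equal labels.
    """
    labels = [classify(v, references) for v in window_vectors]
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append(
            (start * window_seconds, (start + length) * window_seconds, label)
        )
        start += length
    return segments


if __name__ == "__main__":
    # Made-up vectors: [silence ratio, normalized spectral centroid].
    refs = train({
        "speech": [[0.4, 0.2], [0.5, 0.3]],
        "music": [[0.1, 0.7], [0.05, 0.8]],
    })
    windows = [[0.42, 0.22], [0.48, 0.30], [0.08, 0.72],
               [0.10, 0.80], [0.44, 0.25]]
    print(segment(windows, refs, window_seconds=1.0))
```

Features with very different ranges would normally be normalized before distances are compared; that step is omitted here.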

42
More Advanced Audio Features
  • High Zero-Crossing Rate Ratio (HZCRR)
  • Low Short-Time Energy Ratio (LSTER)
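
The transcript gives only the names of these two features. As a hedged sketch: in the work by Lu et al. that these slides cite, HZCRR is the fraction of frames in a window whose ZCR exceeds 1.5 times the window's average ZCR, and LSTER is the fraction of frames whose short-time energy falls below 0.5 times the window's average. The factors and function names below are assumptions taken from that description, not from the slides themselves.

```python
def hzcrr(frame_zcrs, factor=1.5):
    """Fraction of frames whose ZCR is above `factor` x the window average.

    `frame_zcrs` holds one zero-crossing rate per frame of the window.
    """
    average = sum(frame_zcrs) / len(frame_zcrs)
    high = sum(1 for z in frame_zcrs if z > factor * average)
    return high / len(frame_zcrs)


def lster(frame_energies, factor=0.5):
    """Fraction of frames whose energy is below `factor` x the window average.

    `frame_energies` holds one short-time energy value per frame.
    """
    average = sum(frame_energies) / len(frame_energies)
    low = sum(1 for e in frame_energies if e < factor * average)
    return low / len(frame_energies)


if __name__ == "__main__":
    # Made-up per-frame values for one analysis window.
    print(hzcrr([0.1, 0.1, 0.1, 0.5]))
    print(lster([1.0, 1.0, 0.1, 0.1]))
```

Both ratios tend to be higher for speech than for music, since speech alternates between voiced, unvoiced, and silent frames.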

43
More Advanced Audio Features (cont'd)
  • Spectrum Flux (SF)
  • Band Periodicity (BP)

44
More Advanced Audio Features (cont'd)
  • Noise Frame Ratio (NFR)
  • Linear Spectral Pair (LSP) Distance Measure

Rule: if $r_{i,j}(k_p) < \text{threshold}$, then the frame is
considered a noise frame.
45
More Advanced Audio Features (cont'd)
  • Linear Spectral Pair (LSP) Distance Measure
    (cont'd)
  • The LSP divergence shape is also a good measure
    for discriminating between different speakers.
    Denote the covariances of the pth and qth speech
    clips by Cp and Cq. If the dissimilarity is larger
    than a threshold, then the two speech clips
    can be considered to come from two different
    speakers.
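
The dissimilarity formula itself did not survive in the transcript. One common form of the divergence shape distance, given here as an assumption rather than as the slide's own equation, uses only the covariance matrices $C_p$ and $C_q$ of the two clips (with $n$ the feature dimension):

```latex
D(p, q) = \frac{1}{2}\,\operatorname{tr}\!\left[ (C_p - C_q)\,(C_q^{-1} - C_p^{-1}) \right]
        = \frac{1}{2}\left[ \operatorname{tr}(C_p C_q^{-1}) + \operatorname{tr}(C_q C_p^{-1}) \right] - n
```

Under this form $D(p, q) = 0$ when $C_p = C_q$ and grows as the two covariances diverge, so the speaker-change rule reads $D(p, q) > \text{threshold}$.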

46
5 ????
47
????
  • ??????????????,???????????????????????????????
  • ????????????,?MIDI?MP3???????????????????
  • ??????????????,??????????????????????????????????,
    ????????????????????????

48
The three basic features of a musical sound
  • Pitch
  • Related to the perception of the
    fundamental frequency of a sound; pitch is said
    to range from low or deep to high or acute
    sounds.
  • Intensity
  • Related to the amplitude, and thus to
    the energy, of the vibration; textual labels for
    intensity range from soft to loud. Intensity
    is also called loudness.
  • Timbre
  • Defined as the sound characteristics
    that allow listeners to perceive as different two
    sounds with the same pitch and the same intensity.

49
Dimensions of the Music Language
  • Timbre
  • Orchestration
  • Acoustics
  • sound quality, ambience, and style
  • Rhythm
  • Melody
  • Harmony
  • Structure

50
Formats of Musical Documents
  • Two Forms
  • symbolic scores
  • audio performances
  • Three Formats
  • Symbolic formats
  • Audio formats
  • The musical instrument digital interface

51
Music Search
  • Melodic retrieval based on index terms
  • Melodic retrieval based on sequence matching
  • Melodic retrieval based on geometric methods
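
As a hedged illustration of the sequence-matching approach named above: melodies are reduced to sequences of pitch intervals (so that transposition does not matter) and compared with edit distance. The interval representation, the unit edit costs, the whole-melody comparison, and the MIDI note numbers in the demo are all assumptions of this sketch, not details taken from the slides.

```python
def intervals(pitches):
    """Successive pitch differences in semitones (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences, with unit costs."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute or match
            ))
        previous = current
    return previous[-1]


def rank(query, collection):
    """Sort melody names by interval edit distance to the query."""
    q = intervals(query)
    return sorted(
        collection,
        key=lambda name: edit_distance(q, intervals(collection[name])),
    )


if __name__ == "__main__":
    # MIDI note numbers for two short, hypothetical index entries.
    collection = {
        "twinkle": [60, 60, 67, 67, 69, 69, 67],
        "ode": [64, 64, 65, 67, 67, 65, 64],
    }
    # The first tune, transposed up a tone with one wrong note.
    query = [62, 62, 69, 69, 71, 72, 69]
    print(rank(query, collection))
```

A hummed query usually matches only part of a tune, so a practical system would use a substring or local-alignment variant instead of whole-sequence distance.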

52
????
  • ?????????????????????
  • ?????????,??????????????
  • ???????????????,??????????????????(?????????)?????
    ?????
  • ??????????????,??,??????????????????

53
????
  • ?????????????,?????????????????
  • ?????????????????????,??????????????
  • ??,?????????,??????????????????
  • ????????MIDI?????????????
  • ????????????,???? MIDI??,???????????

54
? ?
  • ???????????(???????)?????????,
  • ???????????????
  • ??????10??????
  • ???????,???????????????????
  • ??????????????????,????????????????????????,??????
    ????????????????

55
(3) ????
  • ???? ???????
  • ????
  • ?????????(?????????????????)
  • ???????(?????????????????????,??????????????????(?
    ?????),?????????????

56
????
  • ???????????????????????????
  • ????????????,????????????????,
  • ????????????????
  • ??????MIDI??,???????
  • ?????????????,???????

57
(2) ?????????
  • ?????????????, ???????????????????
  • ???? ????, ????
  • ?? ????(??)??????
  • ???????(????)??????
  • ??,??,??,??,??.
  • ?????????,??????????,??????????????????????
  • ??????????????

58
(3) ???????
  • ??????????????????,?????????????????????,???????,
    ?????????
  • ??????????,??????
  • ????????,????????????
  • ??????????????,??????
  • ??????,??????????????,???????,?????????