Title: Jiazhi Ou jzou@cs.cmu.edu
1. Wild Dolphin Project: 11-751 Speech Final Project
- by
- Jiazhi Ou jzou_at_cs.cmu.edu
- Tal Blum blum_at_cs.cmu.edu
2. Outline
- Wild Dolphin Project, Dolphin Speech
- Data, Labeling, Labeling problems
- Previous work
- Model training
- Experiments and Results
- Conclusions
3. The Wild Dolphin Project (WDP)
- The Wild Dolphin Project (WDP), founded by Dr.
Denise Herzing in 1985, is engaged in an
ambitious, long-term scientific study of a
specific pod of Atlantic spotted dolphins that
live 40 miles off the coast of the Bahamas, in
the Atlantic Ocean. For about 100 days each year,
Phase I research has involved the photographing,
videotaping, and audio taping of a group of
resident dolphins, aiming to learn about their
lives.
- http://www.wilddolphinproject.org/index.cfm
4. Dolphin Speech
- Dolphin speech is very different from human speech
- Range of frequencies is wider
- Two mechanisms for producing sound simultaneously
- Directionality of some of the frequencies
- Carried in water
- Can travel large distances
5. Dolphin Speech (2)
- Is used for
- Identification
- Communicating
- Fighting
- Defending
- Courting
- Warning
- Calling
- Hunting
6. Dolphin Speech (3)
- 3 main types
  - Whistles
    - Signature
    - Non-signature
  - Clicks
  - Spike trains
7. What Do We Know?
- Not much
- Each dolphin has a unique whistle, called its signature whistle
- The signature whistle is similar to the whistles of the dolphins that are in close contact with the baby dolphin
8. Data
- 164 files, each containing sounds of one dolphin whose name is known
- Average file length is 7 sec
- Total data length is less than 20 minutes, about half of which is silence
- The data does not contain all of the relevant frequencies
9. Labeling
- Dolphin Names
- Dolphin ID project
- Pause, Noise, Dolphin Signature Whistles, Dolphin
Non-Signature whistles.
10. Labeling Problems
- How do we distinguish between the 2 whistle types (signature and non-signature)?
- How do we distinguish between whistles and non-whistles?
- They co-occur
- How do we determine the duration of a label?
- Should labels that are close together be labeled as one label?
- This has an effect on the model
- Some signals are weak, probably due to a change in the dolphin's direction
11. Mapping from Labels to Models
Label                                                      Model
d                                                          Signature Whistles
dp, md                                                     Non-Signature Whistles
click, electnoise, electricnoise, h, H, MachineSpike, s    GARBAGE
pau                                                        PAUSE (Water)
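The label-to-model mapping above can be written as a small lookup table. This is a minimal sketch, not the project's Janus configuration: the names `LABEL_TO_MODEL`, `GARBAGE_LABELS`, and `label_to_model` are hypothetical, and raising an error on an unknown label is an assumption.

```python
# Label-to-model lookup built from the mapping table (names are hypothetical).
LABEL_TO_MODEL = {
    "d": "Signature Whistles",
    "dp": "Non-Signature Whistles",
    "md": "Non-Signature Whistles",
    "pau": "PAUSE",
}

# All of these labels are pooled into the single GARBAGE model.
# Labels are case-sensitive: both "h" and "H" appear in the table.
GARBAGE_LABELS = ["click", "electnoise", "electricnoise", "h", "H", "MachineSpike", "s"]
for garbage_label in GARBAGE_LABELS:
    LABEL_TO_MODEL[garbage_label] = "GARBAGE"


def label_to_model(label):
    """Return the model name for a transcription label.

    Raising on unknown labels is an assumption; the original setup may
    have handled them differently.
    """
    try:
        return LABEL_TO_MODEL[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}")


for example in ["d", "md", "MachineSpike", "pau"]:
    print(example, "->", label_to_model(example))
```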
12. Label Statistics
                              PAUSE   SIGWHISTLE   GARBAGE   DOLPHIN
Occurrences                     756          633        13        24
Accumulated time (in secs)      466          320       7.1      11.3
Average time per occurrence     0.6          0.5      0.55      0.47
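The last row of the statistics is the accumulated time divided by the number of occurrences. The short sketch below recomputes it from the first two rows; the variable names are illustrative only.

```python
# Counts and accumulated durations copied from the label statistics.
occurrences = {"PAUSE": 756, "SIGWHISTLE": 633, "GARBAGE": 13, "DOLPHIN": 24}
accumulated_sec = {"PAUSE": 466, "SIGWHISTLE": 320, "GARBAGE": 7.1, "DOLPHIN": 11.3}

# Average duration of one labeled segment, per model.
average_sec = {
    name: accumulated_sec[name] / occurrences[name] for name in occurrences
}

for name, value in average_sec.items():
    print(f"{name:<10} {value:.2f} s")
```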
13. Previous Work
- Dolphin-ID Project by Tanja, Alan and Yue
- Task: to identify dolphins by their signature whistles
- 51 files labeled by Alan
- 13 HMMs: 10 (one for each dolphin), DOLPHIN, PAUSE, and GARBAGE
- Used Janus for training and testing
- Tried different kinds of features
14. Our Work
- Model generalized signature whistles
- Label more files
- Create HMMs for signature whistles, non-signature whistles, garbage, and pause
- Train and test the HMMs using Janus
- Evaluate the test results with our own method
- Compare different model selections
15. Signal Processing
- Tanja's scripts
- Downsampling
- High-pass filter
- FFT
- LDA
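The first three steps listed above can be sketched with NumPy. This is not a reconstruction of Tanja's scripts, which are not shown on the slides: the sampling rate, decimation factor, filter coefficient, frame length, and hop size are all assumptions, the high-pass filter is a simple first-order difference, and the LDA step is omitted because it needs labeled frames.

```python
import numpy as np


def downsample(signal, factor):
    """Average non-overlapping blocks of `factor` samples (crude low-pass plus decimation)."""
    usable = len(signal) - len(signal) % factor
    return signal[:usable].reshape(-1, factor).mean(axis=1)


def high_pass(signal, coeff=0.97):
    """First-order difference filter: y[n] = x[n] - coeff * x[n-1]."""
    out = np.empty_like(signal)
    out[0] = signal[0]
    out[1:] = signal[1:] - coeff * signal[:-1]
    return out


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude short-time FFT with a Hann window, one row per frame."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)


# Synthetic one-second 4 kHz tone standing in for a recording (assumed 32 kHz rate).
rate = 32000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 4000 * t)

features = log_spectrogram(high_pass(downsample(tone, 2)))
print(features.shape)
```

Each row of `features` is one frame of spectral features; a transform such as LDA would then be estimated from rows paired with their labels.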
16. HMM Topologies
Signature Whistles
Non-Signature Whistles
Garbage
Pause (Water)
17. Model Selection
- Scheme 1
  - Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE
- Scheme 2
  - Signature Whistles, GARBAGE, PAUSE
- Scheme 3
  - 10 HMMs (one for each dolphin), GARBAGE, PAUSE
18. Evaluation
- We cannot use WER here since there are no words, just segments.
- Instead, we computed a confusion matrix over hidden states.
- Janus treats silence differently and does not show the silence classification, which complicates the evaluation.
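A confusion matrix of this kind can be sketched as follows. This is only an illustration, not the evaluation script used in the project: it assumes that reference and hypothesis are available as time-aligned `(start, end, label)` segments, it compares them frame by frame with an assumed 10 ms step, and all function names and the toy segments are hypothetical.

```python
from collections import Counter


def frame_labels(segments, frame_step=0.01):
    """Expand (start_sec, end_sec, label) segments into one label per frame."""
    labels = []
    for start, end, label in segments:
        n_frames = round((end - start) / frame_step)
        labels.extend([label] * n_frames)
    return labels


def confusion_counts(reference, hypothesis):
    """Count (reference label, hypothesis label) pairs over aligned frames."""
    ref = frame_labels(reference)
    hyp = frame_labels(hypothesis)
    # zip stops at the shorter sequence if the two lengths differ.
    return Counter(zip(ref, hyp))


def row_percentages(counts, classes):
    """Normalise each reference row so that it sums to 100."""
    table = {}
    for ref in classes:
        total = sum(counts[(ref, hyp)] for hyp in classes)
        table[ref] = {
            hyp: (100.0 * counts[(ref, hyp)] / total if total else 0.0)
            for hyp in classes
        }
    return table


# Toy example with made-up segments.
reference = [(0.0, 0.5, "PAUSE"), (0.5, 1.0, "SIG"), (1.0, 1.2, "PAUSE")]
hypothesis = [(0.0, 0.4, "PAUSE"), (0.4, 1.2, "SIG")]

counts = confusion_counts(reference, hypothesis)
percent = row_percentages(counts, ["SIG", "PAUSE"])
for ref, row in percent.items():
    print(ref, {hyp: round(value, 1) for hyp, value in row.items()})
```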
19. Experiments
- Data
  - 162 labeled files were used
  - Half of the data for training, half for testing
  - Swap the training set and test set
  - 162 test results all together
- Features
  - The same as those in the dolphin-ID project
- Model Selection
  - 3 different schemes
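The train/test swap described under Data is a two-fold cross-validation: every file is tested exactly once by a model that did not see it in training. A minimal sketch follows, assuming the files are simply split down the middle of a list; the slides do not say how the halves were chosen, and the file names are placeholders.

```python
def two_fold_splits(files):
    """Return the two (train, test) pairs obtained by swapping the halves."""
    half = len(files) // 2
    first, second = files[:half], files[half:]
    return [(first, second), (second, first)]


# Placeholder names standing in for the 162 labeled files.
files = [f"file_{i:03d}" for i in range(162)]

tested = []
for train, test in two_fold_splits(files):
    # Training and test sets never share a file within a fold.
    assert not set(train) & set(test)
    # Training and decoding with the recognizer would happen here.
    tested.extend(test)

print(len(tested), "test results in total")
```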
20. Results: Scheme 1
            Sig   Non-Sig   Garbage   Pause
Sig          58         6        18      34
Non-Sig      33         8        37      22
Garbage      77         0         5      18
Pause        31         6        27      34
21. Results: Scheme 2
            Sig   Garbage   Pause
Sig          79         9      21
Garbage      52        21      27
Pause        48        14      38
22. Results: Scheme 3
            Sig   Garbage   Pause
Sig          91       0.6       8
Garbage      80        10      10
Pause        69         1      30
23. Analysis of Results
- The results can only be as good as the labels
- Scheme 3 is the best at aligning signature whistles: it is speaker dependent
- Scheme 1 is the worst: there is not enough data to model non-signature whistles and garbage
- Scheme 2 is in the middle: it is speaker independent
- Pause is the most difficult to model: it contains many different things, and we modeled it with only 1 state
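The ranking of the schemes for signature whistles can be read off the Sig/Sig entry of each results table. The snippet below only restates those three reported values and sorts them; the variable names are illustrative.

```python
# Sig row, Sig column entry from each of the three results tables.
sig_diagonal = {"Scheme 1": 58, "Scheme 2": 79, "Scheme 3": 91}

# Order the schemes from best to worst on signature whistles.
ranking = sorted(sig_diagonal, key=sig_diagonal.get, reverse=True)

for scheme in ranking:
    print(scheme, sig_diagonal[scheme])
```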
24. Conclusion
- Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adjusted to the characteristics of the dolphin sounds.
- There is a lot of work to be done in the signal processing stage
- Partly supervised training
  - It might be better to construct models only for the labels we are sure of, and let the model learn what the signature whistles are or which units discriminate between different labels.
25. We Also Tried
- One-state model for non-signature whistles, garbage, and pause
  - Segmentation fault in training
- Loop-back model for signature whistles
  - The loop-back transition makes no difference
26. Acknowledgement
- Tanja Schultz
- Yue Pan
- Alan W Black
- Szu-Chen Stan Jou
- Hua Yu
27. Thank You!
- Jiazhi Ou
- Tal Blum
- jzou, tblum_at_cs.cmu.edu