Tanja Schultz, Alan Black, Bob Frederking

Slides: 21

Provided by: TanjaS6

Learn more at: http://www.cs.cmu.edu


1
Towards Dolphin Recognition

Tanja Schultz, Alan Black, Bob Frederking
Carnegie Mellon University
West Palm Beach, March 28, 2003

2
Outline

Speech-to-Speech Recognition
  Brief Introduction
  Lab, Research
  Data Requirements
    Audio data
    Transcriptions
Towards Dolphin Recognition
  Applications
  Current Approaches
  Preliminary Results

3
Part 1: Speech-to-Speech Recognition

4
Speech Processing Terms

Speech Recognition
  Converts spoken input into written text output
Natural Language Understanding (NLU)
  Derives the meaning of spoken or written input
(Speech-to-Speech) Translation
  Transforms text or speech in language A into text or speech in language B
Speech Synthesis (Text-to-Speech, TTS)
  Converts written text input into audible output

5
Speech Recognition
[Diagram: Speech Input -> Preprocessing -> Decoding / Search -> Postprocessing -> Synthesis]

6
Fundamental Equation of SR
$$P(W \mid x) = \frac{P(x \mid W)\, P(W)}{P(x)}$$
where x is the acoustic observation and W a candidate word sequence.
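Since P(x) is the same for every candidate W, a recognizer can drop it and choose the word sequence that maximizes P(x|W) P(W), usually in the log domain. The following minimal Python sketch illustrates only that decision rule; the candidate hypotheses and their acoustic and language-model log-probabilities are hypothetical numbers, not output of any real recognizer.

```python
import math

def decode(candidates):
    """Return the hypothesis W that maximizes log P(x|W) + log P(W).

    `candidates` maps each hypothesis W to (acoustic_logprob, lm_logprob).
    P(x) is left out: it is identical for every W, so it cannot change
    which W wins.
    """
    return max(candidates, key=lambda w: sum(candidates[w]))

# Hypothetical scores for one utterance x (illustrative numbers only).
candidates = {
    "recognize speech": (math.log(0.20), math.log(0.010)),
    "wreck a nice beach": (math.log(0.25), math.log(0.001)),
}
print(decode(candidates))  # recognize speech
```

The second hypothesis has the better acoustic score, but the language-model term P(W) tips the decision toward the first.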
7
SR Data Requirements
[Diagram: Audio Data; Sound set; Units built from sounds; Text Data; three elements marked "?"]

8
Janus Speech Recognition Toolkit (JRTk)

Unlimited and Open Vocabulary
Spontaneous and Conversational Human-Human Speech
Speaker-Independent
High Bandwidth, Telephone, Car, Broadcast
Languages: English, German, Spanish, French, Italian, Swedish, Portuguese, Korean, Japanese, Serbo-Croatian, Chinese, Shanghai, Arabic, Turkish, Russian, Tamil, Czech
Best Performance on Public Benchmarks
  DoD (English): DARPA Hub-5 Test 96, 97 (SWB-Task)
  Verbmobil (German): Benchmark 95-00 (Travel-Task)

9
Mobile Device for Translation and Navigation

10
Multilingual Meeting Support
The Meeting Browser is a tool for recording a new meeting, reviewing or summarizing an existing meeting, and searching a set of existing meetings for a particular speaker, topic, or idea.

11
Multilingual Indexing of Video

View4You / Informedia: automatically records broadcast news and lets the user retrieve video segments of news items on different topics using spoken-language input
  Non-cooperative speaker on video
  Cooperative user
  Indexing requires only low-quality translation

12
Part 2: Towards Dolphin Recognition

13
Towards Dolphin Recognition
Identification
  Speaker example: Whose voice is this?
  Dolphin version: Whose voice is it?
Verification/Detection
  Speaker example: Is this Bob's voice?
  Dolphin version: Is it Nippy's voice?
Segmentation and Clustering
  Speaker example: Which segments are from the same speaker? Where are the speaker changes?
  Dolphin version: Which segments are from the same dolphin? Where does the active dolphin change?
[Figure: a recording split into segments labelled Speaker A and Speaker B]
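If every short frame of a recording has already been labelled with a source (for example by per-dolphin models), the segmentation and clustering questions reduce to merging runs of identical labels and grouping the resulting segments. A minimal Python sketch, assuming a hypothetical frame-level label sequence and an assumed 10 ms frame shift; how the frame labels are produced is outside this sketch.

```python
from itertools import groupby

def segment(labels, frame_sec=0.01):
    """Merge consecutive identical frame labels into (label, start, end) segments.

    `labels` is a hypothetical per-frame labelling; `frame_sec` is an
    assumed frame shift in seconds.
    """
    segments, t = [], 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)  # length of this run in frames
        segments.append((label, round(t * frame_sec, 3), round((t + n) * frame_sec, 3)))
        t += n
    return segments

def change_points(segments):
    """Where does the active source change? (start of every segment but the first)"""
    return [start for _, start, _ in segments[1:]]

def cluster(segments):
    """Which segments come from the same source? Group them by label."""
    groups = {}
    for label, start, end in segments:
        groups.setdefault(label, []).append((start, end))
    return groups

labels = ["A"] * 3 + ["B"] * 2 + ["A"] * 4
segs = segment(labels)
print(segs)                 # [('A', 0.0, 0.03), ('B', 0.03, 0.05), ('A', 0.05, 0.09)]
print(change_points(segs))  # [0.03, 0.05]
print(cluster(segs))        # {'A': [(0.0, 0.03), (0.05, 0.09)], 'B': [(0.03, 0.05)]}
```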
14
Applications

Off-line applications (off the water, off the boat, off season)
  Data Management and Indexing
    Automatic assignment/labeling of already recorded (archived) data
    Automatic post-processing (indexing) for later retrieval
  Towards Important/Meaningful Units: DOLPHONES
    Segmentation and clustering of similar sounds/units
    Find out about unit frequencies
    Find out about correlation between sounds and other events
  Whistles correlated to Family Relationship
    Who belongs to whom
    Find out about the family tree?
    Can we find out more about social structure?
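One way to look for candidate units is to cluster feature vectors of individual sound tokens and then count how often each cluster occurs. The sketch below uses plain k-means as a stand-in clustering method (the slides do not name one); the two features per token (peak frequency in kHz, duration in seconds) and their values are hypothetical.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k tokens drawn without replacement
    assign = []
    for _ in range(iters):
        # Assignment step: each token goes to its nearest center.
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

# Hypothetical tokens: (peak frequency in kHz, duration in s).
tokens = [(8.0, 0.50), (8.2, 0.60), (7.9, 0.55), (15.0, 1.20), (15.3, 1.10), (8.1, 0.52)]
units = kmeans(tokens, k=2)
print(sorted(Counter(units).values()))  # [2, 4]  (tokens per candidate unit)
```

The number of clusters k has to be chosen in advance here; for real dolphin data that choice is itself an open question.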

15
Applications

On-line applications
  Identification and Tracking
    Who is currently speaking
    Who is around
  Towards Important/Meaningful Units
    Find out about correlation between sounds and other events
  Whistles correlated to Family Relationship
    Who belongs to whom
  Wide-range identification, tracking, and observation (sound carries farther than sight reaches)

16
Common Approaches
Two distinct phases

Training Phase
  Training speech for each dolphin (Nippy, Havana) -> Feature extraction -> Model training -> Model for each dolphin (Nippy, xyz, Havana)

Detection Phase
  Unknown audio (?) -> Feature extraction -> Detection decision -> Hypothesis: Havana
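The two phases can be sketched with one statistical model per dolphin. In this minimal Python sketch, a single diagonal Gaussian per dolphin stands in for whatever acoustic model is actually trained, feature extraction is assumed to have already produced the vectors, and all feature values are hypothetical.

```python
import math

def train(frames):
    """Training phase: fit a diagonal Gaussian (per-dimension mean and variance)."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Variance floor avoids division by zero for tiny training sets.
    variances = [max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-3)
                 for d in range(dims)]
    return means, variances

def log_likelihood(model, frames):
    """Total log density of the frames under a diagonal Gaussian model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for f in frames
               for x, m, v in zip(f, means, variances))

def identify(models, frames):
    """Detection phase: hypothesis = dolphin whose model scores the audio highest."""
    return max(models, key=lambda name: log_likelihood(models[name], frames))

# Hypothetical, already-extracted 2-D feature vectors per dolphin.
training_speech = {
    "Nippy": [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1)],
    "Havana": [(4.0, 0.5), (4.2, 0.7), (3.8, 0.4)],
}
models = {name: train(frames) for name, frames in training_speech.items()}
unknown = [(4.1, 0.6), (3.9, 0.5)]
print(identify(models, unknown))  # Havana
```

This is closed-set identification: the unknown audio is always assigned to one of the trained dolphins, even if it belongs to none of them.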

17
Current Approaches

A likelihood ratio test is used for the detection decision
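A likelihood ratio test compares how well the claimed dolphin's model explains the audio against a background model of everything else, and accepts the claim when the ratio exceeds a threshold. Minimal sketch with one-dimensional Gaussian models; the model parameters, the background model, and the threshold of 0 are assumptions for illustration.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood_ratio(frames, target, background):
    """Average per-frame log p(x|target) - log p(x|background).

    `target` and `background` are (mean, variance) pairs.
    """
    total = sum(gauss_logpdf(x, *target) - gauss_logpdf(x, *background) for x in frames)
    return total / len(frames)

def verify(frames, target, background, threshold=0.0):
    """Accept the claimed dolphin if the log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(frames, target, background) > threshold

nippy = (1.0, 0.1)        # hypothetical model of the claimed dolphin
background = (3.0, 4.0)   # hypothetical model of all other dolphins and noise
print(verify([1.1, 0.9, 1.0], nippy, background))  # True
print(verify([5.0, 4.8], nippy, background))       # False
```

Averaging per frame rather than summing keeps one threshold usable for files of different lengths; raising the threshold trades missed detections for fewer false acceptances.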

18
First Experiments - Setup

Take the data we got from Denise
Alan labeled about 160 files
Labels:
  dolphin sounds: 370 tokens
  electric noise (machine, clicks, others): 180 tokens
  pauses: 220 tokens
Derive the dolphin ID from the file name (educated guess): Caroh, Havana, Lag, Lat, LG, LH, Luna, Mel, Nassau, Nippy
Train one model per dolphin and one garbage model for the rest
Recognize each incoming audio file; the hypothesis consists of a list of dolphin and garbage models
Count the number of occurrences of each model per audio file and return the name of the dolphin with the highest count as the identified dolphin
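The counting step of the setup can be sketched as a majority vote over the recognizer's hypothesis for one file. This assumes the hypothesis is available as a list of model names and that the garbage model is called "garbage" (a hypothetical name); ties are broken by first occurrence, which the slides do not specify.

```python
from collections import Counter

GARBAGE = "garbage"  # hypothetical name of the catch-all model

def identify_file(hypothesis):
    """Return the dolphin model that occurs most often in one file's hypothesis.

    Garbage-model hits are ignored. Returns None if only garbage was
    recognized. On a tie, Counter.most_common keeps first-seen order.
    """
    counts = Counter(m for m in hypothesis if m != GARBAGE)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

hyp = ["garbage", "Nippy", "Havana", "Nippy", "garbage", "Nippy"]
print(identify_file(hyp))  # Nippy
```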