Tanja Schultz, Alan Black, Bob Frederking - PowerPoint PPT Presentation

Slides: 21
Provided by: TanjaS6
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes

1
Towards Dolphin Recognition
  • Tanja Schultz, Alan Black, Bob Frederking
  • Carnegie Mellon University
  • West Palm Beach, March 28, 2003

2
Outline
  • Speech-to-Speech Recognition
  • Brief Introduction
  • Lab, Research
  • Data Requirements
  • Audio data
  • Transcriptions
  • Towards Dolphin Recognition
  • Applications
  • Current Approaches
  • Preliminary Results

3
Part 1
  • Speech-to-Speech Recognition
  • Brief Introduction
  • Lab, Research
  • Data Requirements
  • Audio data
  • Transcriptions

4
Speech Processing Terms
  • Speech Recognition: converts spoken input into written text output
  • Natural Language Understanding (NLU): derives the meaning of the spoken or
    written input
  • (Speech-to-Speech) Translation: transforms text / speech in language A into
    text / speech in language B
  • Speech Synthesis (Text-to-Speech, TTS): converts written text input into
    audible output

5
Speech Recognition
Speech Input → Preprocessing → Decoding / Search → Postprocessing → Synthesis
6
Fundamental Equation of SR
$P(W \mid x) = \dfrac{P(x \mid W)\, P(W)}{P(x)}$
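Since P(x) is the same for every candidate word sequence W, a recognizer only needs to find the W that maximizes P(x | W) · P(W). The following is a minimal Python sketch of that decision rule, not the JRTk implementation. The candidate sequences and all probability values are invented for illustration.

```python
import math

def decode(acoustic_logp, lm_logp):
    """Return the hypothesis W maximizing log P(x|W) + log P(W).

    P(x) is dropped because it does not depend on W.
    """
    return max(acoustic_logp, key=lambda w: acoustic_logp[w] + lm_logp[w])

# Hypothetical scores for one utterance x (illustrative numbers only).
acoustic_logp = {                      # log P(x | W), acoustic model
    "recognize speech": math.log(0.020),
    "wreck a nice beach": math.log(0.025),
}
lm_logp = {                            # log P(W), language model
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.001),
}

best = decode(acoustic_logp, lm_logp)
print(best)
```

In this toy case the acoustically better hypothesis loses because the language model considers it far less likely.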
7
SR Data Requirements
[Diagram: Audio Data; Sound set; Units built from sounds; Text Data; three items marked "?"]
8
Janus Speech Recognition Toolkit (JRTk)
  • Unlimited and Open Vocabulary
  • Spontaneous and Conversational Human-Human Speech
  • Speaker-Independent
  • High Bandwidth, Telephone, Car, Broadcast
  • Languages: English, German, Spanish, French,
    Italian, Swedish, Portuguese, Korean, Japanese,
    Serbo-Croatian, Chinese, Shanghai, Arabic,
    Turkish, Russian, Tamil, Czech
  • Best Performance on Public Benchmarks
  • DoD (English): DARPA Hub-5 Test 96, 97
    (SWB task)
  • Verbmobil (German): Benchmark 95-00 (Travel task)

9
Mobile Device for Translation and Navigation
10
Multi-lingual Meeting Support
The Meeting Browser is a powerful tool that
allows us to record a new meeting, review or
summarize an existing meeting or search a set of
existing meetings for a particular speaker,
topic, or idea.
11
Multilingual Indexing of Video
  • View4You / Informedia: automatically records
    Broadcast News and allows the user to
    retrieve video segments of news items on
    different topics using spoken language input
  • Non-cooperative speaker on video
  • Cooperative user
  • Indexing requires only low quality translation

12
Part 2
  • Towards Dolphin Recognition
  • Applications
  • Current Approaches
  • Preliminary Results

13
Towards Dolphin Recognition
  • Identification: Whose voice is this?
  • Verification/Detection: Is this Bob's voice? Is it Nippy's voice?
  • Segmentation and Clustering: Which segments are from the same speaker
    (the same dolphin)? Where are the speaker changes (where are dolphins
    changing)?
[Figure labels: Speaker A, Speaker B]
14
Applications
  • off-line applications (off the water, off the
    boat, off season)
  • Data Management and Indexing
  • Automatic Assignment/Labeling of already recorded
    (archived) data
  • Automatic Post-Processing (Indexing) for later
    retrieval
  • Towards Important/Meaningful Units: DOLPHONES
  • Segmentation and Clustering of similar
    sounds/units
  • Find out about unit frequencies
  • Find out about correlation between sounds and
    other events
  • Whistles correlated to Family Relationship
  • Who belongs to whom
  • Find out about the family tree?
  • Can we find out more about social structure?
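The "segmentation and clustering of similar sounds/units" and "unit frequencies" items can be illustrated with a small Python sketch. It assumes each segmented sound has already been reduced to a single number (here a made-up peak frequency) and uses plain k-means as a stand-in clustering method; the talk does not say which features or clustering algorithm would be used.

```python
from collections import Counter

def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on scalar features; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
                  for v in values]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids

# Hypothetical peak frequencies (kHz) of segmented sounds; invented values.
peaks_khz = [5.1, 5.3, 12.0, 4.9, 11.6, 12.3, 5.0]
labels, centroids = kmeans_1d(peaks_khz, centroids=[4.0, 10.0])

# Unit frequencies: how often each cluster ("unit") occurs.
unit_frequencies = Counter(labels)
print(labels)
print(unit_frequencies)
```

Each cluster index plays the role of a candidate unit, and the counter gives how often that unit occurs in the data.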

15
Applications
  • on-line applications
  • Identification and Tracking
  • Who is currently speaking
  • Who is around
  • Towards Important/Meaningful Units
  • Find out about correlation between sounds and
    other events
  • Whistles correlated to Family Relationship
  • Who belongs to whom
  • Wide-range identification, tracking, and
    observation (since sound travels longer distances
    than images)

16
Common Approaches
  • Two distinct phases
  • Training Phase: training speech for each dolphin (Nippy, xyz, Havana) →
    feature extraction → model training → model for each dolphin
  • Detection Phase: unknown audio (?) → feature extraction → detection
    decision → hypothesis (e.g. Havana)
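The two phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each dolphin is modeled by a single diagonal-covariance Gaussian, and the 2-D feature vectors are invented. The slides do not specify the feature extraction or the model type actually used.

```python
import math

def train_gaussian(frames):
    """Fit a diagonal-covariance Gaussian to a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Variance floor avoids division by zero on tiny training sets.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-3)
           for d in range(dim)]
    return mean, var

def log_likelihood(frames, model):
    """Sum of per-frame log densities under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def identify(frames, models):
    """Detection phase: return the name of the best-scoring model."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

# Training phase: made-up 2-D feature vectors per dolphin.
training = {
    "Nippy":  [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]],
    "Havana": [[4.0, 0.5], [4.2, 0.4], [3.9, 0.7]],
}
models = {name: train_gaussian(frames) for name, frames in training.items()}

# Detection phase on an unlabeled recording.
unknown = [[4.1, 0.6], [3.8, 0.5]]
print(identify(unknown, models))
```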

17
Current Approaches
  • A likelihood ratio test is used for the detection
    decision
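A likelihood ratio test compares how well the audio fits the claimed dolphin's model against how well it fits a background model, and accepts the claim if the log ratio exceeds a threshold. The Python sketch below assumes 1-D Gaussian models, a zero threshold, and invented parameter values; using a "garbage" model as the background is also an assumption of this example.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood_ratio(frames, target, background):
    """Average per-frame log p(x | target) - log p(x | background)."""
    llr = sum(gaussian_logpdf(x, *target) - gaussian_logpdf(x, *background)
              for x in frames)
    return llr / len(frames)

def detect(frames, target, background, threshold=0.0):
    """Accept the claimed identity if the ratio exceeds the threshold."""
    return log_likelihood_ratio(frames, target, background) > threshold

# Hypothetical 1-D models as (mean, variance); values are illustrative only.
havana = (4.0, 0.25)      # model of the claimed dolphin
garbage = (2.5, 4.0)      # background model for everything else

accepted = detect([3.9, 4.2, 4.1], havana, garbage)
rejected = detect([1.0, 0.5, 1.4], havana, garbage)
print(accepted, rejected)
```

Raising the threshold trades missed detections for fewer false alarms; the right value would have to be tuned on held-out data.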


18
First Experiments - Setup
  • Take the data we got from Denise
  • Alan did the labeling of about 160 files
  • Labels
  • dolphin sounds: 370 tokens
  • electric noise (machine, clicks, others): 180
    tokens
  • pauses: 220 tokens
  • Derive dolphin ID from file name (educated guess)
  • (Caroh, Havana, Lag, Lat, LG, LH, Luna, Mel,
    Nassau, Nippy)
  • Train one model per dolphin, one garbage model
    for the rest
  • Recognize incoming audio file; hypotheses consist
    of a list of dolphin and garbage models
  • Count the number of models per audio file and return
    the name of the dolphin with the highest count as the
    one being identified
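The last steps of this setup can be sketched in Python. The file naming scheme is not given in the talk, so the prefix matching and the file name `nippy_03.wav` are hypothetical. Excluding the garbage model from the vote is also an assumption of this sketch.

```python
from collections import Counter

GARBAGE = "garbage"
DOLPHIN_IDS = ["Caroh", "Havana", "Lag", "Lat", "LG", "LH", "Luna",
               "Mel", "Nassau", "Nippy"]

def dolphin_id_from_filename(filename, known_ids):
    """Guess the dolphin ID from a file name by prefix match.

    The real naming scheme is not given in the talk; prefix matching
    is an assumption made for this sketch.
    """
    stem = filename.rsplit("/", 1)[-1].lower()
    # Try longer IDs first so a short ID cannot shadow a longer one.
    for dolphin in sorted(known_ids, key=len, reverse=True):
        if stem.startswith(dolphin.lower()):
            return dolphin
    return None

def identify_by_count(hypothesis):
    """Return the dolphin model occurring most often in a hypothesis."""
    counts = Counter(m for m in hypothesis if m != GARBAGE)
    if not counts:
        return None            # only garbage was recognized
    return counts.most_common(1)[0][0]

# Hypothetical recognizer output for one audio file.
hypothesis = ["garbage", "Nippy", "Havana", "Nippy", "garbage", "garbage"]
reference = dolphin_id_from_filename("nippy_03.wav", DOLPHIN_IDS)
identified = identify_by_count(hypothesis)
print(reference, identified, reference == identified)
```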

19
First Experiments - Results
20
Next steps
  • Step 1: To build a real system we need
  • MORE audio data MORE audio data MORE ...
  • Labels (the more accurate the better)
  • Idea 1: Automatic labeling, live with the errors
  • Idea 2: Manual labeling
  • Idea 3: Automatic labeling and post-editing
  • Step 2: Given more data
  • Automatic clustering
  • Try first steps towards unit detection
  • Step 3: Build a working system, make it small and
    fast enough for deployment