Recognition of spoken and spelled proper names

1
Recognition of spoken and spelled proper names
Authors: Michael Meyer, Hermann Hild
Reporter: CHEN, TZAN HWEI
2
Outline
  • Introduction
  • Experiments
  • Summary

3
Introduction
  • The recognition of increasingly large sets of
    spoken names is difficult.
  • Very large recognition vocabularies contain many
    easily confused words or even homophones.
  • This paper compares the performance of proper
    name recognition when a name is spoken only,
    spelled only, or both spoken and spelled.

4
Introduction (cont)
  • In what contexts do people speak and spell names?

Table 1: Three scenarios for speaking and
spelling a proper name
5
Experiments
  • Speech data
  • A database of about 2800 German last names spoken
    by 57 different speakers, according to scenario 2.
  • Recorded with a close-talking microphone at a
    sampling rate of 16 kHz.
  • The boundaries between all spoken and spelled
    names were identified in order to conduct
    scenario 1.

6
Experiments (cont)
  • The pronunciation dictionary covers about half of
    the 2800 names of our speech data.
  • The set of 1337 spoken and spelled names is used
    in all the experiments described below.
  • For the experiments, we use an MS-TDNN as a
    specialized letter recognizer.
  • We use the LVCSR of the JANUS system as the
    spoken name recognizer.

7
Experiments (cont)
  • JANUS
  • 60.0% names correct were achieved on the test set
    of the 1337 spoken last names.
  • When JANUS was used to recognize the spelled
    names, 93.3% names correct were achieved.
  • MS-TDNN
  • Achieved 96.5% correct spelled names on the test
    set.

8
Experiments (cont)
  • Small list
  • We assume that the list of names to be recognized
    is small enough, so that every name can be
    explicitly represented in the dictionary.
  • How can we combine the different information
    provided by the spoken and spelled names?

9
Experiments (cont)
  • After all, the pronunciations of the spelled
    letters approximate the sounds of the letters in
    the fluently spoken word.
  • TOM versus T-O-M
  • Exceptions are letters with unusual pronunciations
    and letter combinations that define their own
    pronunciation, such as
  • Sch, ch
  • In the following, we simply combine the two
    representations on the basis of their acoustic
    scores only.

10
Experiments (cont)
  • Scenario 1

11
Experiments (cont)
  • Scenario 2
  • At 86.1% correct, recognition on the entire
    utterance is worse than on the spelled part
    alone.
  • It is possible to adopt an approach similar to
    the one used in scenario 1.
  • The boundary of the first best hypothesis was
    used for the weighting of all hypotheses.
  • This results in a recognition rate of 89.1%.
  • Incorporating the MS-TDNN letter recognizer
    raises the recognition rate to 95.8%.

12
Experiments (cont)
Figure 1: Names correct for a weighted
combination of the N-best lists of spoken and
spelled names (scenarios 1 and 2)
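The weighted combination of two N-best lists can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `combine_nbest`, the linear interpolation of scores, the handling of names missing from one list, and the toy scores. The slides state only that the lists are combined using acoustic scores and a weight.

```python
def combine_nbest(spoken, spelled, weight=0.5):
    """Rank names by a weighted sum of two acoustic scores (higher is better).

    spoken / spelled map name -> score from the respective recognizer.
    A name missing from one list gets that list's worst score (an assumption).
    """
    floor_spoken = min(spoken.values())
    floor_spelled = min(spelled.values())
    combined = {}
    for name in set(spoken) | set(spelled):
        s = spoken.get(name, floor_spoken)
        l = spelled.get(name, floor_spelled)
        combined[name] = weight * s + (1.0 - weight) * l
    # Best (highest) combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy log-score lists, invented for illustration only.
spoken_nbest = {"Meier": -10.0, "Meyer": -10.5, "Maier": -11.0}
spelled_nbest = {"Meyer": -4.0, "Beyer": -4.5, "Meier": -6.0}

ranked = combine_nbest(spoken_nbest, spelled_nbest, weight=0.5)
best_name = ranked[0][0]
print(best_name)
```

Sweeping `weight` from 0 to 1 would trace a curve of the kind the figure caption describes, with the endpoints corresponding to using only one of the two lists.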
13
Experiments (cont)
  • Large lists
  • If the number of names exceeds the recognizer's
    maximum vocabulary size, a different approach has
    to be taken.
  • A two-step approach is employed.
  • A coarse recognition run is used to get a reduced
    list of name candidates.
  • These candidates are then processed in a second
    run, in which all the previously described
    techniques for small lists can be applied.
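The two-step idea can be sketched in Python as follows. This is not the JANUS pipeline: the coarse pass here is replaced by plain string similarity from the standard `difflib` module, and the names `coarse_pass`, `fine_pass`, and the toy name list are hypothetical. The sketch shows only the control flow of shortlisting first and rescoring second.

```python
import difflib


def coarse_pass(letter_hypothesis, all_names, shortlist_size=3):
    """First pass: cheaply shortlist names similar to a noisy letter string."""
    return difflib.get_close_matches(
        letter_hypothesis.lower(),
        [n.lower() for n in all_names],
        n=shortlist_size,
        cutoff=0.0,  # keep the top candidates even if similarity is low
    )


def fine_pass(letter_hypothesis, shortlist):
    """Second pass: rescore only the shortlist (a toy scorer stands in here)."""
    def score(name):
        matcher = difflib.SequenceMatcher(None, letter_hypothesis.lower(), name)
        return matcher.ratio()
    return max(shortlist, key=score)


names = ["Meyer", "Hild", "Schmidt", "Schneider", "Mueller", "Maier"]
shortlist = coarse_pass("schmitt", names)
winner = fine_pass("schmitt", shortlist)
print(shortlist, winner)
```

In a real system the second pass would be the more expensive recognizer run restricted to the shortlist, which is what keeps the vocabulary within the recognizer's limit.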

14
Experiments (cont)
  • In the case of scenario 1, the list of candidates
    can be easily reduced if only the spelled names
    are considered in the first pass.
  • For scenario 2, JANUS's recognition vocabulary
    contains only phonemes and letters.

15
Experiments (cont)
  • For scenario 2
  • A special language model is employed.
  • A list of the most similar names can be
    retrieved, and then used in another JANUS
    recognition run.
  • The letter segments are then re-recognized with
    the MS-TDNN.

16
Experiments (cont)
Table 3: Summary of results for the separated and
combined recognition of fluently spoken and
spelled last names
17
Summary
  • By combining the N-best lists of both the spoken
    and spelled recognition, the overall performance
    can be improved.
  • An input of either L or FL can be distinguished
    with almost 99% accuracy, resulting in 95.5%
    names correct without knowing a priori whether
    L or FL was spoken.

18
Caller Identification from Spelled-Out Personal
Data Over the Telephone
Reporter: CHEN, TZAN HWEI
19
Outline
  • Introduction
  • The personal identification algorithm
  • Tests and results
  • Conclusions

20
Introduction
  • The problem of automatically identifying the
    caller in a telephone conversation from the
    information spoken in the call is extremely
    difficult.
  • The identification must take place despite rather
    substantial speech recognition errors that may be
    made by the machine.

21
Introduction (cont)
  • We can find a solution to the problem if we make
    two assumptions.
  • We assume that there is a database of records
    containing personal information about the
    caller, which can serve as a reference during
    the identification process.
  • We ask our caller to spell the personal
    identifying items so that the spoken vocabulary
    is small and we can look for correlations with
    the items in the database.

22
The personal identification algorithm
Fig 1. The algorithm of personal identification
from spelled tokens
23
The personal identification algorithm (cont)
  • A Bayesian computation starts with an estimate,
    for each record in the database, of the
    probability that the record represents the
    identity of the caller.
  • It uses the acquired information and updates
    each record's probability that it corresponds to
    the current caller's identity.
  • The incremental computation is

24
The personal identification algorithm (cont)
  • Bayesian update of probabilities
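The slide's own formula is not reproduced in this transcript, so the following Python sketch shows only the generic form of Bayes' rule applied per record: posterior is proportional to prior times likelihood, followed by normalization. The function names, the per-character error model in `char_likelihood`, and the toy records are all assumptions for illustration.

```python
def bayes_update(priors, likelihoods):
    """Return posteriors given record -> prior and record -> P(observation | record)."""
    unnormalized = {r: priors[r] * likelihoods[r] for r in priors}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every record")
    return {r: p / total for r, p in unnormalized.items()}


def char_likelihood(observed, expected, p_correct=0.8, alphabet_size=26):
    """Toy model: independent character errors, spread uniformly over other symbols."""
    p_wrong = (1.0 - p_correct) / (alphabet_size - 1)
    likelihood = 1.0
    for o, e in zip(observed, expected):
        likelihood *= p_correct if o == e else p_wrong
    return likelihood


# Hypothetical records: first four letters of the last name.
records = {"r1": "MEYE", "r2": "MEIE", "r3": "HILD"}
posterior = {r: 1.0 / len(records) for r in records}  # uniform starting estimate

observed = "MEYE"  # what the recognizer returned for the spelled token
likelihoods = {r: char_likelihood(observed, v) for r, v in records.items()}
posterior = bayes_update(posterior, likelihoods)
print(max(posterior, key=posterior.get))
```

Calling `bayes_update` again with the likelihoods of the next spelled token, using the current posterior as the new prior, gives the incremental behavior described above.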

25
Tests and results
  • The system was tested using a database of one
    million records that was constructed by using
    random combinations of 4,375 female first names,
    1,129 male first names, and 88,799 last names.
  • The account numbers were generated so that the
    values for the last four digits of the number
    occurred with equal frequency throughout the
    database.
  • The city, postal code, and phone number fields
    were generated to correspond to locations in the
    U.K.
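A rough sketch of how such a synthetic database could be generated is given below. The record layout, the 8-digit account format, the function name `build_records`, and the tiny name lists are hypothetical; only the idea of random name combinations and evenly distributed last four digits comes from the slide.

```python
import random
from collections import Counter


def build_records(n, first_names, last_names, seed=0):
    """Create n records with random names and evenly spread last-four digits."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        prefix = rng.randrange(10**4)   # random leading digits (assumed format)
        last_four = i % 10**4           # cycle through 0000..9999 for equal frequency
        records.append({
            "first_name": rng.choice(first_names),
            "last_name": rng.choice(last_names),
            "account": f"{prefix:04d}{last_four:04d}",
        })
    return records


records = build_records(20000, ["Ann", "Bob"], ["Smith", "Jones", "Brown"])
counts = Counter(r["account"][-4:] for r in records)
print(len(counts), set(counts.values()))
```

Cycling through the last four digits gives exactly equal frequencies whenever the number of records is a multiple of 10,000, as it is for one million records.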

26
Tests and results (cont)
  • Our test involved identifying 300 different
    records in the database.
  • If the system was unable to make an
    identification of the target record after asking
    the user for all of the information, the caller
    was asked to make a second attempt using the same
    information.
  • If the system failed to produce a result after
    the second attempt, the call was terminated at
    that point.

27
Tests and results (cont)
  • For each telephone call, the users were asked
    eight questions:
  • Enter your ID, using your telephone keypad,
    followed by the pound key. If you make a mistake,
    press the star key.
  • You entered (the value entered in (1)). If this
    is correct, press 1. If it is not, press 2.
  • Please say the first four letters of your last
    name.
  • Please say the first four letters of your first
    name.
28
Tests and results (cont)
  • Please say the last four digits of your card
    number.
  • Could you please spell the city currently listed
    on your account?
  • Please say your phone number.
  • Please say the postal code currently listed on
    your account.

29
Tests and results (cont)
Fig 2. Summary of results from 300 calls.
30
Tests and results (cont)
Fig 3. Rate of ASR character misrecognition by
field
31
Tests and results (cont)
Fig 4. Rate of misrecognition of field by ASR
(misrecognition: at least one error made in the
spelled field value)
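The relation between the character-level rate and the field-level rate can be illustrated with a short Python sketch. It assumes that character errors are independent and equally likely at every position, which the slides do not claim; the rate and field lengths used are illustrative, not results from the paper.

```python
def field_error_rate(char_error_rate: float, field_length: int) -> float:
    """Probability that at least one of field_length characters is misrecognized."""
    return 1.0 - (1.0 - char_error_rate) ** field_length


# Illustrative values only: a 10% character error rate on fields of varying length.
for length in (4, 8, 11):
    print(length, round(field_error_rate(0.10, length), 3))
```

Under this assumption, longer fields are misrecognized far more often than individual characters, which is why an identification method has to tolerate high field-level error rates.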
32
Tests and results (cont)
Fig 5. Average cumulative number of records
examined by system
33
Conclusions
  • The method tolerates high misrecognition rates.
  • The method can be used with off-the-shelf
    components; it doesn't require a specialized ASR.
  • Future work: allow the personal identification
    information to be spoken instead of spelled.

34
[Flowchart of the personal identification algorithm (Fig 1). Recoverable
box labels: Record Rv that will be verified; Select another T; Request T
from caller; Collect subset near T; Add subset to S; Update the risk for
each record in S; Rm <- min risk in S; Rm Rv; Risk(Rv) < Risk(reject);
Another T?; Accept; Reject; operator. Branch labels: Yes / No.]