Title: Language modelling (word FST)
Slide 1: Language modelling (word FST)
- Operational model for categorizing mispronunciations (sketch below)
  - Prompted image: circus
  - Step 1: decode the visual image
    (circus)
    (cursus)
    (circus)
  - Step 2: convert graphemes to phonemes
    (/k y r s y s/)
    (/s i r k y s/)
    (/k i r k y s/)
  - Step 3: articulate the phonemes
    (/s i r k y s/)
    (/k y - k y r s y s/) (/ka - i - Er - k y s/)
  - Spoken utterance: correct, a miscue (step 3), or an error (steps 1, 2)
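For concreteness, a minimal Python sketch of the categorization logic implied by the three steps; the flag names are invented, and a real system would derive them from an alignment rather than take them as inputs.

```python
# Minimal sketch: which step deviated first decides the category.
# Deviations in steps 1-2 (decoding, grapheme-to-phoneme) are errors;
# deviations in step 3 (articulation: restarts, spelling) are miscues.

def classify_utterance(decoded_ok: bool, g2p_ok: bool, articulated_ok: bool) -> str:
    if not decoded_ok or not g2p_ok:
        return "error"       # wrong word decoded or wrong phoneme conversion
    if not articulated_ok:
        return "miscue"      # e.g. restart /k y - k y r s y s/ or spelling
    return "correct"

# Target "circus" decoded as "cursus" and then restarted: still an error,
# because the deviation originated in step 1.
print(classify_utterance(decoded_ok=False, g2p_ok=True, articulated_ok=False))
```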
Slide 2: Language modelling (word FST)
- Prevalence of errors of the different types (Chorec data)
- Children with RD tend to guess more often
- Important to model steps 1 and 3; step 2 is less important
Slide 3: Creation of the word FST model: step 1
- Correct pronunciation
- Predictable errors
  - (prediction model needed; see the sketch below)
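A rough sketch of what the step-1 branch set could look like before compilation into a transducer; the helper name, the correct-branch mass of 0.8, and the error scores are assumptions, and a real implementation would emit weighted FST arcs (e.g. with OpenFst) rather than a dict.

```python
# Step 1 of the word FST: one branch for the correct pronunciation plus one
# branch per predicted decoding error, with probabilities as arc weights.

def step1_branches(target: str, predicted_errors: dict[str, float],
                   p_correct: float = 0.8) -> dict[str, float]:
    """Spread the remaining probability mass over the predicted errors,
    proportionally to the prediction model's scores."""
    total = sum(predicted_errors.values()) or 1.0
    branches = {target: p_correct}
    for word, score in predicted_errors.items():
        branches[word] = (1.0 - p_correct) * score / total
    return branches

# Hypothetical prediction-model output for the target "circus":
print(step1_branches("circus", {"cursus": 0.6, "cirkus": 0.4}))
# {'circus': 0.8, 'cursus': 0.12, 'cirkus': 0.08}
```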
Slide 4: Creation of the word FST model: step 3
- Per branch in the previous FST (sketch below):
  - correctly articulated
  - restarts (fixed probabilities for now)
  - spelling (phonemic) (fixed probabilities for now)
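A speculative sketch of the per-branch step-3 expansion. The restart shape (first two phonemes, a break, then the whole word) mimics the /k y - k y r s y s/ example; the spelling variant is a crude stand-in that only inserts breaks, whereas real phonemic spelling would use letter names (/ka - i - Er .../). The probability values are the "fixed for now" placeholders, with made-up numbers.

```python
# Expand one decoding branch into articulation variants with fixed weights.
P_CORRECT, P_RESTART, P_SPELL = 0.90, 0.07, 0.03

def step3_expand(phonemes: list[str]) -> list[tuple[list[str], float]]:
    restart = phonemes[:2] + ["-"] + phonemes            # /k y - k y r s y s/
    spelling = [tok for ph in phonemes for tok in (ph, "-")][:-1]
    return [(phonemes, P_CORRECT), (restart, P_RESTART), (spelling, P_SPELL)]

for variant, p in step3_expand(["k", "y", "r", "s", "y", "s"]):
    print(" ".join(variant), p)
```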
Slide 5: Modelling image decoding errors
- Model 1: memory model (sketch below)
  - adopted in Project LISTEN
  - per target word (TW):
    - create a list of the errors found in the database
    - keep those with P(list entry = error | TW) > TH
  - advantages:
    - very simple strategy
    - can model real-word and non-real-word errors
  - disadvantages:
    - cannot model unseen errors
    - probably low precision
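The memory model reduces to counting: a minimal sketch, with made-up observation counts and an illustrative threshold TH = 0.05.

```python
# Keep an observed form as a predicted error for this target word if its
# relative frequency in the database exceeds the threshold TH.
from collections import Counter

def memory_model(observations: list[str], target: str, th: float = 0.05) -> dict[str, float]:
    counts = Counter(observations)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()
            if form != target and n / total > th}

obs = ["circus"] * 80 + ["cursus"] * 12 + ["cirkus"] * 6 + ["sirkus"] * 2
print(memory_model(obs, "circus"))   # {'cursus': 0.12, 'cirkus': 0.06}
```

Unseen forms get probability zero by construction, which is exactly the "cannot model unseen errors" disadvantage above.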
Slide 6: Modelling image decoding errors
- Model 2: extrapolation model (idea from ..)
  - look for existing words that
    - are expected to belong to the vocabulary of the child (mental lexicon)
    - bear a good resemblance to the target word
  - select lexicon entries from that vocabulary
  - feature-based: expose (dis)similarities with the TW
    - features: length differences, alignment agreement, word categories, graphemes in common, ...
  - decision tree → P(entry = decoding error | features)
    - keep those with P > TH (sketch below)
  - advantage: can model errors not seen before
  - disadvantage: can only model real-word errors
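A sketch of the decision-tree step with scikit-learn. The three features, the training rows, and the candidate words are invented stand-ins; the slide's actual feature set is richer (alignment agreement, word categories, ...).

```python
# Estimate P(entry = decoding error | features) with a decision tree and
# keep lexicon entries whose probability exceeds TH.
from sklearn.tree import DecisionTreeClassifier

# Features per candidate: [length difference, graphemes in common, same category]
X_train = [[0, 5, 1], [1, 4, 1], [3, 1, 0], [0, 2, 0], [1, 5, 1], [4, 0, 0]]
y_train = [1, 1, 0, 0, 1, 0]    # 1 = candidate was observed as a decoding error

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

TH = 0.5
candidates = {"cursus": [1, 4, 1], "cactus": [2, 3, 1]}
for word, feats in candidates.items():
    p = tree.predict_proba([feats])[0][1]     # probability of class 1
    if p > TH:
        print(word, round(p, 2))
```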
Slide 7: Modelling image decoding errors
- Model 3: rule-based model (under development; sketch below)
  - look for frequently observed transformations at the subword level:
    - grapheme deletions, insertions, substitutions (e.g. d → b)
    - grapheme inversions (e.g. leed → deel)
    - combinations
  - learn a decision tree per transformation
  - advantages:
    - more generic → a better recall/precision compromise
    - can model real-word and non-real-word errors
  - disadvantage:
    - more complex, time-consuming to train
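A toy sketch of the transformation rules as candidate generators; the rule set is a hand-picked sample, and the "decision tree per transformation" that decides where each rule actually applies is not modelled here.

```python
# Generate candidate misreadings by applying subword transformations.
SUBSTITUTIONS = {"d": "b", "b": "d"}      # e.g. mirror-letter confusion d -> b

def rule_candidates(word: str) -> set[str]:
    out = set()
    for i, ch in enumerate(word):
        out.add(word[:i] + word[i + 1:])                           # deletion
        if ch in SUBSTITUTIONS:
            out.add(word[:i] + SUBSTITUTIONS[ch] + word[i + 1:])   # substitution
    out.add(word[::-1])                                            # inversion
    out.discard(word)
    return out

print(sorted(rule_candidates("leed")))
# ['deel', 'eed', 'led', 'lee', 'leeb']
```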
Slide 8: Modelling results so far
- Measures (over target words with an error; see the sketch below):
  - recall R = number of predicted errors / total number of errors
  - precision P = number of predicted errors / number of predictions
  - F-rate = 2·R·P / (R + P)
  - branch = average number of predictions per word
- Data: test set from the Chorec database
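The measures written out as code; the variable names and example counts are mine, not Chorec results. "Number of predicted errors" is read here as the number of actual errors that the model predicted (hits).

```python
# Recall, precision, F-rate and branching factor from raw counts.
def measures(hits: int, total_errors: int, total_predictions: int, n_words: int):
    r = hits / total_errors               # recall
    p = hits / total_predictions          # precision
    f = 2 * r * p / (r + p)               # F-rate
    branch = total_predictions / n_words  # average number of predictions per word
    return r, p, f, branch

print(measures(hits=40, total_errors=100, total_predictions=80, n_words=50))
# (0.4, 0.5, 0.4444..., 1.6)
```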