1
Statistical Language Modelling Part I
Observable Models
  • Simon Lucas

2
Summary
  • Applications
  • The fundamentals
  • Observable v. hidden (latent) models
  • N-gram and scanning n-tuple models
  • Incremental classifiers and LOO optimisation
  • Evaluation methods
  • Results
  • Conclusions and further work

3
Statistical Language Models
  • Compute p(x|M), the probability of a sequence x
    given the model M
  • Java interface:

    public interface LanguageModel {
        public void train(SequenceDataset sd);
        public double p(int[] seq);
    }

4
Sequence Dataset
    public interface SequenceDataset {
        public int nSymbols();
        public int nSequences();
        public int[] getSequence(int i);
    }
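As a concrete illustration (not from the original slides), a minimal array-backed implementation of this interface might look like the following; the class and field names are my own:

    // Hypothetical array-backed SequenceDataset (a sketch, not from the deck).
    public class ArraySequenceDataset implements SequenceDataset {
        private final int[][] seqs;  // each sequence is an array of symbol indices
        private final int nSymbols;  // size of the symbol alphabet

        public ArraySequenceDataset(int[][] seqs, int nSymbols) {
            this.seqs = seqs;
            this.nSymbols = nSymbols;
        }
        public int nSymbols() { return nSymbols; }
        public int nSequences() { return seqs.length; }
        public int[] getSequence(int i) { return seqs[i]; }
    }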

5
Evaluating Language Models
  • Standard
  • Test-set perplexity (see the sketch below)
  • Preferred (by me!)
  • Recognition accuracy
  • Dictionary extrapolation
  • Perplexity assumes all models are playing by the
    same rules
  • The other evaluation methods make no such
    assumption
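For reference, test-set perplexity can be computed directly against the LanguageModel interface above. A minimal sketch, assuming the usual base-2, per-symbol convention (not stated on the slide):

    // Perplexity = 2^(-(1/N) * sum of log2 p(x|M)),
    // where N is the total number of symbols in the test set.
    public static double perplexity(LanguageModel m, SequenceDataset testSet) {
        double logProbSum = 0;
        int n = 0;
        for (int i = 0; i < testSet.nSequences(); i++) {
            int[] seq = testSet.getSequence(i);
            logProbSum += Math.log(m.p(seq)) / Math.log(2);  // log2 p(x|M)
            n += seq.length;
        }
        return Math.pow(2, -logProbSum / n);
    }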

6
Distributed Mode Evaluation
  • Use Algoval evaluation server
  • Currently at http://ace.essex.ac.uk
  • Download the developer pack
  • Configure model or write your own
  • Specify test parameters
  • Run tests
  • View results immediately on web site!

7
Sequence Recognition
  • Given a statistical language model
  • Can easily deploy it for sequence recognition
  • Build a model for each class
  • Assign the pattern to the class with the highest
    posterior
  • Better still, return the vector of posteriors
    for soft recognition (see the sketch below)
  • Interesting to compare these models against
    simple nearest-neighbour classifiers using LD and
    WLD (plain and weighted Levenshtein distance)
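A minimal sketch of this recipe, assuming equal class priors (the normalisation step and all names are my additions):

    // One trained LanguageModel per class; the posterior is the
    // normalised likelihood under the equal-priors assumption.
    public class SequenceRecognizer {
        private final LanguageModel[] classModels;

        public SequenceRecognizer(LanguageModel[] classModels) {
            this.classModels = classModels;
        }

        // Soft recognition: the vector of class posteriors.
        public double[] posteriors(int[] seq) {
            double[] post = new double[classModels.length];
            double sum = 0;
            for (int c = 0; c < classModels.length; c++) {
                post[c] = classModels[c].p(seq);  // likelihood p(x|M_c)
                sum += post[c];
            }
            for (int c = 0; c < post.length; c++) {
                post[c] /= sum;  // normalise so the posteriors sum to 1
            }
            return post;
        }

        // Hard recognition: the class with the highest posterior.
        public int classify(int[] seq) {
            double[] post = posteriors(seq);
            int best = 0;
            for (int c = 1; c < post.length; c++) {
                if (post[c] > post[best]) best = c;
            }
            return best;
        }
    }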

8
App 1: Recognising OCR Chain Codes
9
Results (OLD!)
10
SN-Tuple Method: Current Status for OCR
  • Actively being researched at IBM TJ Watson
  • See Ratzlaff, Proc. ICDAR 2001, pages 18-22 (on
    djvu.com; note NOT dejavu.com!)
  • Concludes that the sn-tuple is a viable method
    for on-line handwriting recognition

11
App 2: Contextual OCR

12
Dictionary Extrapolation
  • Previous slide showed how well we can do with
    noisy images, with the aid of dictionary context
  • BUT suppose the dictionary only has 50% coverage
  • Need a trainable model that can extrapolate from
    the given data
  • How to evaluate such a model?

13
Left-Out Rank Estimate
  • For each word in the dictionary
  • Create a new dictionary with that word left out
  • Create a set of words neighbouring the left-out
    word
  • Get the model to evaluate the likelihood of each
    neighbouring word and of the left-out word
  • Return a rank-based score between 1.0 and 0.0
    (from top to bottom of the list), as sketched
    below
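A sketch of the scoring step for a single left-out word; how the neighbouring words are generated is not specified here, so they are passed in, and the final estimate would average this score over every word in the dictionary:

    // Rank-based score for one left-out word: 1.0 if the model ranks it
    // above every neighbour, 0.0 if it ranks below all of them.
    public static double leftOutRank(LanguageModel model, int[] leftOutWord,
                                     int[][] neighbours) {
        double pLeftOut = model.p(leftOutWord);  // model trained without this word
        int beaten = 0;
        for (int[] neighbour : neighbours) {
            if (model.p(neighbour) < pLeftOut) beaten++;
        }
        return (double) beaten / neighbours.length;
    }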

14
App 3: Human Chromosome Recognition (Banded Images)
15
Example Data: 22 Human Chromosomes
  • Chromosome 10
  • / 1802 3 10 55 19 / AAaBaDdDe
    BbBbAaa
  • / 3843 84 10 55 18 / ABaAaDdDd
    CbAcAaa
  • / 7231 158 10 55 20 / ABaCaAcDd
    CbBdAaa
  • / 787 15 10 55 18 / ABaAaBbDe
    AaAaAaAaa
  • / 2459 60 10 54 19 / ABaBaAaCcCd
    CbAcAaa
  • / 3290 21 10 54 19 / ABaAaBbDc
    BbAcAaa
  • / 5591 122 10 54 17 / AAaAaAaEd
    BbAbAaa
  • Chromosome 15
  • / 1447 5 15 43 10 / AAaDbCd
    AaAba
  • / 2120 32 15 43 11 / BaEcAaCd
    AaAaAba
  • / 2759 16 15 43 9 / AADaAaAc
    AaAca

16
N-gram Recognizers
  • Bigram
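The bigram equation/diagram on the original slide is not reproduced in this transcript. As a stand-in, here is a minimal bigram model behind the LanguageModel interface, scoring a sequence as the product of the conditional probabilities p(x_t | x_{t-1}); the add-one smoothing is my assumption, not from the slide:

    // Sketch: bigram model with add-one smoothing (an assumption).
    public class BigramModel implements LanguageModel {
        private int[][] counts;   // counts[a][b] = times symbol b follows a
        private int[] rowTotals;  // rowTotals[a] = total successors of a
        private int nSymbols;

        public void train(SequenceDataset sd) {
            nSymbols = sd.nSymbols();
            counts = new int[nSymbols][nSymbols];
            rowTotals = new int[nSymbols];
            for (int i = 0; i < sd.nSequences(); i++) {
                int[] seq = sd.getSequence(i);
                for (int t = 1; t < seq.length; t++) {
                    counts[seq[t - 1]][seq[t]]++;
                    rowTotals[seq[t - 1]]++;
                }
            }
        }

        // p(x|M) as a product of smoothed bigram probabilities; the first
        // symbol's unconditional probability is omitted for brevity.
        public double p(int[] seq) {
            double p = 1.0;
            for (int t = 1; t < seq.length; t++) {
                p *= (counts[seq[t - 1]][seq[t]] + 1.0)
                   / (rowTotals[seq[t - 1]] + nSymbols);
            }
            return p;
        }
    }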

17
Leave One Out Error
  • Generally a good estimate of test-set error
  • Especially fast to compute for incremental
    classifiers: O(n)
  • As opposed to O(n²) for non-incremental ones

18
Incremental Classifiers
  • Can learn new patterns on demand without access
    to rest of training set
  • Can forget or unlearn patterns on demand also
  • Incremental: n-gram, n-tuple, nearest neighbour
    (memory- or counting-based methods); see the
    sketch below
  • Non-incremental: MLP, HMM, (SVM?) (latent-variable
    re-estimation methods)
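For a counting model such as the BigramModel sketch above, unlearning is just decrementing the counts, which is what makes leave-one-out O(n): each pattern can be unlearned, tested, and relearned in time independent of the training-set size. The untrain method below is my addition to that sketch:

    // Forget one sequence from the bigram counts (exact inverse of
    // training on it); add to the BigramModel class above.
    public void untrain(int[] seq) {
        for (int t = 1; t < seq.length; t++) {
            counts[seq[t - 1]][seq[t]]--;
            rowTotals[seq[t - 1]]--;
        }
    }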

19
Statistical Model Servers
  • A client/server model for statistical models
  • Each server supports a range of models
  • Each model can have many instances
  • Each instance can be invoked for training or
    estimation
  • Now we can independently evaluate the service,
    not just the model!

20
Results
  • Bioinformatics
  • Dictionary modelling

21
Statistical Language Modelling Part II
  • Ensembles of observable models
  • Latent variable models
  • HMM
  • SCFG
  • Category n-gram
  • Other applications: robot sensors?