Title: N-gram Models
1. N-gram Models
- CMSC 25000
- Artificial Intelligence
- March 1, 2005
2. Markov Assumptions
- Exact computation requires too much data
- Approximate probability given all prior words
- Assume finite history
- Bigram: probability of word given 1 previous word
- First-order Markov
- Trigram: probability of word given 2 previous words
- N-gram approximation: P(w_1, ..., w_n) ≈ Π_i P(w_i | w_{i-n+1}, ..., w_{i-1})
- Bigram sequence: P(w_1, ..., w_n) ≈ Π_i P(w_i | w_{i-1}) (see the sketch below)
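
A minimal sketch of the bigram approximation above, assuming a toy corpus and unsmoothed maximum-likelihood estimates; the names and data are illustrative, not from the slides:

    from collections import Counter

    # Toy corpus; any tokenized text would do.
    corpus = "the cat sat on the mat the cat ran".split()

    # MLE bigram estimates: P(w | prev) = c(prev, w) / c(prev)
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def bigram_prob(prev, word):
        """Unsmoothed MLE estimate of P(word | prev); 0 for unseen bigrams."""
        return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

    def sequence_prob(words):
        """First-order Markov approximation, conditioning on the first word."""
        p = 1.0
        for prev, word in zip(words, words[1:]):
            p *= bigram_prob(prev, word)
        return p

    print(sequence_prob("the cat sat".split()))  # (2/3) * (1/2) = 1/3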
3. Evaluating n-gram models
- Entropy, perplexity
- Information-theoretic measures
- Measure information in the grammar, or fit to the data
- Conceptually, a lower bound on bits needed to encode
- Entropy: H(X) = -Σ_x p(x) log₂ p(x), where X is a random variable and p its probability function
- Perplexity: 2^{H(X)}
- Weighted average of the number of choices (see the sketch below)
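
A small sketch of both definitions on a made-up distribution (the values are illustrative):

    import math

    # Example distribution over a 4-symbol alphabet (illustrative values).
    p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

    # Entropy: H(X) = -sum_x p(x) * log2 p(x)
    entropy = -sum(px * math.log2(px) for px in p.values())

    # Perplexity: 2^H, the weighted average number of choices.
    perplexity = 2 ** entropy

    print(entropy)     # 1.75 bits
    print(perplexity)  # ~3.36 effective choices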
4. Perplexity: Model Comparison
- Compare models with different history lengths
- Train models
- 38 million words of Wall Street Journal text
- Compute perplexity on a held-out test set
- 1.5 million words (20K-word vocabulary, smoothed)

N-gram order   Perplexity
Unigram        962
Bigram         170
Trigram        109
5. Does the model improve?
- Compute probability of the data under the model
- Compute perplexity
- A relative measure
- Does it decrease toward an optimum?
- Is it lower than a competing model's?

Iter         0      1      2      3      4      5      6      9      10
P(data)    9e-19  1e-16  2e-16  3e-16  4e-16  4e-16  4e-16  5e-16  5e-16
Perplexity 3.393  2.95   2.88   2.85   2.84   2.83   2.83   2.8272 2.8271
6. Entropy of English
- Shannon's experiment
- Subjects guess strings of letters; count the guesses
- Entropy of the guess sequence = entropy of the letter sequence
- ~1.3 bits (restricted text)
- Build a stochastic model on text, compute its entropy
- Brown et al. computed a trigram model on a varied corpus
- Computed per-character entropy of the model
- 1.75 bits
7. Using N-grams
- Language identification
- Take text samples
- English, French, Spanish, German
- Build character trigram models
- Test sample: compute its likelihood under each model
- Best match is the chosen language (see the sketch below)
- Authorship attribution
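
A minimal sketch of character-trigram language identification; the training strings are toy stand-ins for real corpora, and the floor probability for unseen trigrams replaces proper smoothing:

    import math
    from collections import Counter

    def trigram_model(text):
        """Character-trigram counts, normalized into probabilities."""
        counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
        total = sum(counts.values())
        return {tg: c / total for tg, c in counts.items()}

    def log_likelihood(text, model, floor=1e-6):
        """Log-likelihood of the sample; unseen trigrams get a small floor."""
        return sum(math.log(model.get(text[i:i + 3], floor))
                   for i in range(len(text) - 2))

    # Toy training data (real systems use large corpora and smoothing).
    models = {
        "English": trigram_model("the quick brown fox jumps over the lazy dog"),
        "French": trigram_model("le renard brun rapide saute par dessus le chien"),
    }

    sample = "the dog jumps"
    print(max(models, key=lambda lang: log_likelihood(sample, models[lang])))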
8. Sequence Models in Modern AI
- Probabilistic sequence models
- HMMs, n-grams
- Train from available data
- Classification with contextual influence
- Robust to noise/variability
- E.g., sentences vary in degrees of acceptability
- Provide a ranking of sequence quality
- Exploit large-scale data, storage, memory, CPU
9. Computer Vision
- CMSC 25000
- Artificial Intelligence
- March 1, 2005
10. Roadmap
- Motivation
- Computer vision applications
- Is a picture worth a thousand words?
- Low-level features
- Feature extraction: intensity, color
- High-level features
- Top-down constraints: shape from stereo, motion, ...
- Case study: vision as modern AI
- Fast, robust face detection (Viola & Jones, 2001)
11. Perception
- From observation to facts about the world
- Analogous to speech recognition
- Stimulus (percept) S, world W
- S = g(W)
- Recognition: derive the world from the percept
- W = g^{-1}(S)
- Is this possible?
12. Key Perception Problem
- Massive ambiguity
- Optical illusions
- Occlusion
- Depth perception
- Objects are closer than they appear
- Is it full-sized or a miniature model?
13. Image Ambiguity
14. Handling Uncertainty
- Identify the single perfectly correct solution?
- Impossible!
- Noise, ambiguity, complexity
- Solution
- Probabilistic model
- P(W|S) = α P(S|W) P(W) (derivation below)
- Maximize image probability and model probability
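
For reference, the normalizer α above is just standard Bayes' rule spelled out (this step is not shown on the slide):

    P(W \mid S) = \frac{P(S \mid W)\, P(W)}{P(S)} = \alpha\, P(S \mid W)\, P(W),
    \qquad \alpha = \frac{1}{P(S)} = \Big( \sum_{W'} P(S \mid W')\, P(W') \Big)^{-1}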
15. Handling Complexity
- Don't solve the whole problem
- Don't recover every object/position/color
- Solve a restricted problem
- Find all the faces
- Recognize a person
- Align two images
16. Modern Computer Vision Applications
- Face / Object detection
- Medical image registration
- Face recognition
- Object tracking
17. Vision Subsystems
18. Image Formation
19. Images and Representations
- Initially, pixel images
- Image as an N x M matrix of pixel values
- Alternate image codings
- Grey-scale: intensity values
- Color: encoded intensities of RGB values
20. Images
21. Grey-scale Images
22. Color Images
23. Image Features
- Grey-scale and color intensities
- Directly access image signal values
- Large number of measures
- Possibly noisy
- We only care about intensities as cues to the world
- Image features
- Mid-level representation
- Extracted from raw intensities
- Capture elements of interest for image understanding
24. Edge Detection
25. Edge Detection
- Find sharp demarcations in intensity
- 1) Apply spatially oriented filters
- E.g., vertical, horizontal, diagonal
- 2) Label above-threshold pixels with their edge orientation
- 3) Combine edge segments with the same orientation into a line (see the sketch below)
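
A minimal sketch of steps 1-2 using standard Sobel-style oriented filters; the kernels and threshold are common textbook choices, not taken from the slides, and SciPy is assumed to be available:

    import numpy as np
    from scipy.signal import convolve2d

    # Spatially oriented filters: Sobel kernels for two edge orientations.
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # vertical edges
    sobel_y = sobel_x.T                                       # horizontal edges

    def detect_edges(image, threshold=100.0):
        """Steps 1-2: filter, then label above-threshold pixels with orientation."""
        gx = convolve2d(image, sobel_x, mode="same", boundary="symm")
        gy = convolve2d(image, sobel_y, mode="same", boundary="symm")
        magnitude = np.hypot(gx, gy)       # edge strength per pixel
        orientation = np.arctan2(gy, gx)   # edge orientation per pixel
        return magnitude > threshold, orientation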
26. Top-down Constraints
- Goal: extract objects from images
- Approach: apply knowledge about how the world works to identify coherent objects and reconstruct 3D structure
27. Motion: Optical Flow
- Find correspondences in sequential images
- Units which move together represent objects
28. Stereo
29. Stereo Depth Resolution
30. Texture and Shading
31. Edge-Based 2D-to-3D Reconstruction
- Assume a world of solid polyhedra with 3-edge vertices
- Apply Waltz line labeling via constraint satisfaction
32. Basic Object Recognition
- Simple idea
- Extract 3-D shapes from the image
- Match against a shape library
- Problems
- Extracting curved surfaces from the image
- Representing the shape of the extracted object
- Representing the shape and variability of library object classes
- Improper segmentation, occlusion
- Unknown illumination, shadows, markings, noise, complexity, etc.
- Approaches
- Index into the library by measuring invariant properties of objects
- Alignment of image features with projected library object features
- Match the image against multiple stored views (aspects) of the library object
- Machine learning methods based on image statistics
33. Hand-written Digit Recognition
34. Summary
- Vision is hard
- Noise, ambiguity, complexity
- Prior knowledge is essential to constrain the problem
- Cohesion of objects, optics, object features
- Combine multiple cues
- Motion, stereo, shading, texture, etc.
- Image/object matching
- Library features: lines, edges, etc.
- Apply domain knowledge: optics
- Apply machine learning: neural nets, nearest neighbor, CSPs, etc.
35. Computer Vision Case Study
- "Rapid Object Detection using a Boosted Cascade of Simple Features", Viola & Jones '01
- Challenge
- Object detection
- Find all faces in arbitrary images
- Real-time execution
- 15 frames per second
- Needs simple features and classifiers
36. Rapid Object Detection Overview
- Fast detection with simple local features
- Simple, fast feature extraction
- Small number of computations per pixel
- Rectangular features
- Feature selection with AdaBoost
- Sequential feature refinement
- Cascade of classifiers
- Increasingly complex classifiers
- Repeatedly rule out non-object areas
37. Picking Features
- What cues do we use for object detection?
- Not direct pixel intensities
- Features
- Can encode task-specific domain knowledge (bias)
- Knowledge that would be difficult to learn directly from data
- Reduce training set size
- A feature system can speed processing
38. Rectangle Features
- Treat rectangles as units
- Derive statistics from them
- Two-rectangle features
- Two similar rectangular regions
- Vertically or horizontally adjacent
- Sum the pixels in each region
- Compute the difference between regions (see the sketch below)
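
A minimal sketch of a two-rectangle feature computed naively with direct pixel sums; the coordinates and region layout are illustrative. The integral image later in the deck makes this far cheaper:

    import numpy as np

    def two_rectangle_feature(image, x, y, w, h):
        """Difference of pixel sums over two horizontally adjacent w-by-h
        regions; (x, y) is the top-left of the left region. Naive O(w*h)."""
        left = image[y:y + h, x:x + w].sum()
        right = image[y:y + h, x + w:x + 2 * w].sum()
        return float(right - left)

    # A dark-left / bright-right pattern gives a large positive response.
    img = np.zeros((24, 24))
    img[:, 12:] = 255.0
    print(two_rectangle_feature(img, 6, 6, 6, 12))  # 18360.0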
39. Rectangle Features II
- Three-rectangle features
- 3 similar rectangles, horizontally or vertically adjacent
- Sum the outside rectangles
- Subtract from the center region
- Four-rectangle features
- Compute the difference between diagonal pairs
- HUGE feature set: ~180,000
40. Rectangle Features
41. Computing Features Efficiently
- Fast detection requires fast feature calculation
- Rapidly compute an intermediate representation
- Integral image
- Value at point (x, y) is the sum of pixels above and to the left:
- ii(x, y) = Σ_{x'≤x, y'≤y} i(x', y')
- Computed by recurrence:
- s(x, y) = s(x, y-1) + i(x, y), where s(x, y) is the cumulative row sum
- ii(x, y) = ii(x-1, y) + s(x, y)
- Compute any rectangle sum with 4 array references (see the sketch below)
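
A minimal numpy sketch of the integral image and the 4-reference rectangle sum described above; padding with a zero row and column is an implementation convenience, not from the slides:

    import numpy as np

    def integral_image(img):
        """ii(x, y): sum of pixels above and to the left, inclusive.
        A leading zero row/column removes border special cases."""
        ii = img.cumsum(axis=0).cumsum(axis=1)
        return np.pad(ii, ((1, 0), (1, 0)))

    def rect_sum(ii, x, y, w, h):
        """Sum of the w-by-h rectangle at top-left (x, y): 4 array references."""
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    img = np.arange(16.0).reshape(4, 4)
    ii = integral_image(img)
    assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()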
42. Rectangle Feature Summary
- Rectangle features
- Relatively simple
- Sensitive to bars, edges, simple structure
- Coarse
- Rich enough for effective learning
- Efficiently computable
43. Learning an Image Classifier
- Supervised training with +/- examples
- Many learning approaches possible
- AdaBoost
- Selects features AND trains the classifier
- Improves performance of simple classifiers
- Training error guaranteed to fall exponentially fast
- Basic idea: simple (weak) classifiers
- Boost performance by focusing on previous errors
44. Feature Selection and Training
- Goal: pick only useful features out of the ~180,000
- Idea: a small number of features can be effective
- Learner selects the single feature that best separates +/- examples
- Learner selects the optimal threshold for each feature
- Classifier: h(x) = 1 if p·f(x) < p·θ, 0 otherwise (θ a threshold, p a polarity; see the sketch below)
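
A minimal sketch of this selection step, assuming feature values are precomputed per example; a full AdaBoost round would also reweight the examples afterward, and the brute-force threshold search is illustrative:

    import numpy as np

    def train_weak_classifier(feature_values, labels, weights):
        """Find (threshold, polarity) minimizing weighted error for one feature.
        labels are in {0, 1}; weights sum to 1."""
        best = (np.inf, None, None)
        for theta in np.unique(feature_values):
            for polarity in (1, -1):
                preds = (polarity * feature_values < polarity * theta).astype(int)
                err = np.sum(weights * (preds != labels))
                if err < best[0]:
                    best = (err, theta, polarity)
        return best  # (weighted error, theta, polarity)

    def select_feature(all_feature_values, labels, weights):
        """Pick the single feature whose best weak classifier has lowest error."""
        results = [train_weak_classifier(fv, labels, weights)
                   for fv in all_feature_values]
        return int(np.argmin([r[0] for r in results])), results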
45. Basic Learning Results
- Initial classification: frontal faces
- 200 features
- Detects 95% of faces, with roughly 1 in 14,000 false positives
- Very fast
- Adding features adds to computation time
- Features are interpretable
- Darker region around the eyes than nose/cheeks
- Eyes are darker than the bridge of the nose
46. Primary Features
47. Attentional Cascade
- Goal: improved classification, reduced time
- Insight: small, fast classifiers can reject many sub-windows
- But have very few false negatives
- Reject the majority of uninteresting regions quickly
- Focus computation on interesting regions
- Approach: degenerate decision tree
- A.k.a. a cascade
- Positive results passed to high-detection classifiers
- Negative results rejected immediately
48. Cascade Schematic
All sub-windows -> CL 1 -(T)-> CL 2 -(T)-> CL 3 -(T)-> more classifiers
Each classifier's F branch -> reject sub-window
(A sketch of this control flow follows.)
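
A minimal sketch of the schematic's control flow, assuming each stage is a callable returning True (pass) or False (reject); the example stages are illustrative placeholders operating on numpy-array windows:

    def cascade_classify(window, stages):
        """Run increasingly complex stage classifiers; reject on first failure.
        Only windows that pass every stage are reported as detections."""
        for stage in stages:
            if not stage(window):
                return False  # early rejection: most windows exit here cheaply
        return True

    # Illustrative stages: cheap checks first, costlier ones later.
    stages = [
        lambda w: w.mean() > 10,  # stage 1: trivial brightness test
        lambda w: w.std() > 5,    # stage 2: some contrast present
        # ... later stages would be boosted classifiers over many features
    ]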
49. Cascade Construction
- Each stage is a trained classifier
- Tune the threshold to minimize false negatives
- Good first-stage classifier
- Two-feature strong classifier: eye/cheek and eye/nose features
- Tuned: detects 100% of faces with a 40% false positive rate
- Very computationally efficient
- ~60 microprocessor instructions
50. Cascading
- Goal: reject bad sub-windows quickly
- Most sub-windows are bad
- Reject early in processing, with little effort
- Good regions will trigger the full cascade
- Relatively rare
- Classification gets progressively more difficult
- The most obvious cases have already been rejected
- Deeper classifiers are more complex and more error-prone
51. Cascade Training
- Trade-off: accuracy vs. cost
- More accurate classifiers: more features, more complexity
- More features, more complexity: slower
- A difficult optimization
- Practical approach
- Each stage reduces the false positive rate
- Bound the reduction in false positives and the increase in misses
- Add features to each stage until its targets are met
- Add stages until overall effectiveness targets are met
52. Results
- Task: detect frontal upright faces
- Face/non-face training images
- Face: 5,000 hand-labeled instances
- Non-face: 9,500 from a random web crawl, hand-checked
- Classifier characteristics
- 38-layer cascade
- Increasing numbers of features per layer: 1, 10, 25, ... (6,061 total)
- Classification: an average of 10 features evaluated per window
- Most windows rejected in the first 2 layers
- Processes a 384x288 image in 0.067 seconds
53. Detection Tuning
- Multiple detections
- Many sub-windows around a face will alert
- Create disjoint subsets
- For overlapping detections, report only one
- Return the average of the corners
- Voting
- 3 similarly trained detectors
- Majority rules
- Improves overall performance (see the sketch below)
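
A minimal sketch of merging overlapping detections into one averaged box; the (x, y, w, h) box format and the greedy grouping are illustrative assumptions:

    import numpy as np

    def overlaps(a, b):
        """True if boxes (x, y, w, h) intersect."""
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    def merge_detections(boxes):
        """Greedily group boxes overlapping a seed; report corner averages."""
        merged, remaining = [], list(boxes)
        while remaining:
            seed, rest = remaining[0], remaining[1:]
            group = [seed] + [b for b in rest if overlaps(seed, b)]
            remaining = [b for b in rest if not overlaps(seed, b)]
            merged.append(tuple(np.mean(group, axis=0)))
        return merged

    print(merge_detections([(10, 10, 20, 20), (12, 11, 20, 20), (90, 90, 20, 20)]))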
54. Conclusions
- Fast, robust face detection
- Simple, efficiently computable features
- Simple trained classifiers
- The classification cascade allows early rejection
- Early classifiers are also simple and fast
- Good overall classification in real time
55. Some Results
56. Vision in Modern AI
- Goals
- Robustness
- Multi-domain applicability
- Automatic acquisition
- Speed: real time
- Approach
- Simple mechanisms, feature selection
- Machine learning: tune features and classification