Title: Sequence Models in Modern AI
1Sequence Models in Modern AI
- Probabilistic sequence models
- HMMs, N-grams
- Train from available data
- Classification with contextual influence
- Robust to noise/variability
- E.g. Sentences vary in degrees of acceptability
- Provides ranking of sequence quality
- Exploits large scale data, storage, memory, CPU
2Computer Vision
- CMSC 25000
- Artificial Intelligence
- March 1, 2007
- Motivation
- Computer vision applications
- Is a Picture worth a thousand words?
- Low level features
- Feature extraction: intensity, color
- High level features
- Top-down constraints: shape from stereo, motion, ...
- Case study: vision as modern AI
- Fast, robust face detection (Viola Jones 2002)
- From observation to facts about world
- Analogous to speech recognition
- Stimulus (Percept) S, World W
- $S = g(W)$
- Recognition: derive world from percept
- $W = g^{-1}(S)$
- Is this possible?
5Key Perception Problem
- Massive ambiguity
- Optical illusions
- Occlusion
- Depth perception
- Objects are closer than they appear
- Is it full-sized or a miniature model?
6Image Ambiguity
7Handling Uncertainty
- Identify single perfect correct solution
- Impossible!
- Noise, ambiguity, complexity
- Solution
- Probabilistic model
- $P(W \mid S) = \alpha\, P(S \mid W)\, P(W)$
- Maximize image probability and model probability
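A minimal sketch of the probabilistic formulation above, assuming a toy discrete set of candidate worlds for one fixed percept. The function names, the two candidate worlds, and all probabilities are made up for illustration.

```python
def posterior(prior, likelihood):
    """Return P(W|S) for every candidate world W.

    prior[w] is P(W = w); likelihood[w] is P(S | W = w) for one fixed percept S.
    """
    unnormalized = {w: likelihood[w] * prior[w] for w in prior}
    alpha = 1.0 / sum(unnormalized.values())  # normalizing constant
    return {w: alpha * p for w, p in unnormalized.items()}


def most_probable_world(prior, likelihood):
    """Pick the world that maximizes P(S|W) * P(W)."""
    post = posterior(prior, likelihood)
    return max(post, key=post.get)


# Hypothetical numbers: the percept is a small image of a car.
prior = {"full-sized car far away": 0.9, "miniature model up close": 0.1}
likelihood = {"full-sized car far away": 0.5, "miniature model up close": 0.6}

post = posterior(prior, likelihood)
best = most_probable_world(prior, likelihood)
```

Here the miniature explains the image slightly better, but the prior makes the full-sized car the more probable world.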
8Handling Complexity
- Don't solve the whole problem
- Don't recover every object/position/color
- Solve restricted problem
- Find all the faces
- Recognize a person
- Align two images
9Modern Computer Vision Applications
- Face / Object detection
- Medical image registration
- Face recognition
- Object tracking
10Vision Subsystems
11Image Formation
12Images and Representations
- Initially pixel images
- Image as NxM matrix of pixel values
- Alternate image codings
- Grey-scale intensity values
- Color encoding: intensities of RGB values
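A small sketch of the two codings above, assuming an image is stored as a nested list with one entry per pixel. Averaging the three channels is only one simple way to derive grey-scale values; the helper name `to_grey` and the sample pixels are illustrative.

```python
# A 2x3 color image: each pixel is an (R, G, B) triple of 0-255 intensities.
color = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def to_grey(image):
    """Convert to grey-scale by averaging the three channels of each pixel."""
    return [[sum(pixel) // 3 for pixel in row] for row in image]


grey = to_grey(color)  # an N x M matrix of single intensity values
```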
14Grey-scale Images
15Color Images
16Image Features
- Grey-scale and color intensities
- Directly access image signal values
- Large number of measures
- Possibly noisy
- Only care about intensities as cues to world
- Image Features
- Mid-level representation
- Extract from raw intensities
- Capture elements of interest for image
17Edge Detection
18Edge Detection
- Find sharp demarcations in intensity
- 1) Apply spatially oriented filters
- E.g. vertical, horizontal, diagonal
- 2) Label above-threshold pixels with edge orientation
- 3) Combine edge segments with same orientation
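Steps 1 and 2 above can be sketched as follows, assuming the simplest possible oriented filters (differences between neighboring pixels) and a hand-picked threshold. The function name `detect_edges` and the test image are illustrative, and step 3 (combining segments) is left out.

```python
def detect_edges(image, threshold):
    """Label interior pixels whose filter response exceeds the threshold.

    Returns a grid of labels: 'V' (vertical edge), 'H' (horizontal edge),
    or '.' (no edge). Border pixels are left unlabeled.
    """
    rows, cols = len(image), len(image[0])
    labels = [["."] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Step 1: apply two oriented difference filters.
            gx = image[r][c + 1] - image[r][c - 1]  # responds to vertical edges
            gy = image[r + 1][c] - image[r - 1][c]  # responds to horizontal edges
            # Step 2: label above-threshold pixels with the stronger orientation.
            if max(abs(gx), abs(gy)) > threshold:
                labels[r][c] = "V" if abs(gx) >= abs(gy) else "H"
    return labels


# Dark left half, bright right half: a vertical edge down the middle.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
labels = detect_edges(image, threshold=50)
```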
19Top-down Constraints
- Goal: extract objects from images
- Approach: apply knowledge about how the world works to identify coherent objects and reconstruct them
20Motion Optical Flow
- Find correspondences in sequential images
- Units which move together represent objects
22Stereo Depth Resolution
23Texture and Shading
24Edge-Based 2-3D Reconstruction
- Assume world of solid polyhedra with 3-edge vertices
- Apply Waltz line labeling via constraint satisfaction
25Basic Object Recognition
- Simple idea
- extract 3-D shapes from image
- match against shape library
- Problems
- extracting curved surfaces from image
- representing shape of extracted object
- representing shape and variability of library object classes
- improper segmentation, occlusion
- unknown illumination, shadows, markings, noise, complexity, etc.
- Approaches
- index into library by measuring invariant properties of objects
- alignment of image feature with projected library object feature
- match image against multiple stored views (aspects) of library object
- machine learning methods based on image
26Hand-written Digit Recognition
- Vision is hard
- Noise, ambiguity, complexity
- Prior knowledge is essential to constrain problem
- Cohesion of objects, optics, object features
- Combine multiple cues
- Motion, stereo, shading, texture, ...
- Image/object matching
- Library: features, lines, edges, etc.
- Apply domain knowledge: optics
- Apply machine learning: NN, CSP, etc.
28Computer Vision Case Study
- Rapid Object Detection using a Boosted Cascade of Simple Features, Viola/Jones 01
- Challenge
- Object detection
- Find all faces in an arbitrary image
- Real-time execution
- 15 frames per second
- Need simple features, classifiers
29Rapid Object Detection Overview
- Fast detection with simple local features
- Simple fast feature extraction
- Small number of computations per pixel
- Rectangular features
- Feature selection with Adaboost
- Sequential feature refinement
- Cascade of classifiers
- Increasingly complex classifiers
- Repeatedly rule out non-object areas
30Picking Features
- What cues do we use for object detection?
- Not direct pixel intensities
- Features
- Can encode task specific domain knowledge (bias)
- Difficult to learn directly from data
- Reduce training set size
- Feature system can speed processing
31Rectangle Features
- Treat rectangles as units
- Derive statistics
- Two-rectangle features
- Two similar rectangular regions
- Vertically or horizontally adjacent
- Sum pixels in each region
- Compute difference between regions
32Rectangle Features II
- Three-rectangle features
- 3 similar rectangles horizontally/vertically
- Sum outside rectangles
- Subtract from center region
- Four-rectangle features
- Compute difference between diagonal pairs
- HUGE feature set: 180,000
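The two- and three-rectangle features above can be sketched directly from their definitions. This version sums pixels one by one, so it is slow; the sign conventions (right minus left, center minus outside) and the function names are assumptions made for illustration.

```python
def region_sum(image, top, left, height, width):
    """Sum the pixels in a rectangle by visiting each one (slow but simple)."""
    return sum(image[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def two_rect_feature(image, top, left, height, width):
    """Two horizontally adjacent rectangles: right sum minus left sum."""
    left_sum = region_sum(image, top, left, height, width)
    right_sum = region_sum(image, top, left + width, height, width)
    return right_sum - left_sum


def three_rect_feature(image, top, left, height, width):
    """Three horizontally adjacent rectangles: center sum minus outside sums."""
    outside = (region_sum(image, top, left, height, width)
               + region_sum(image, top, left + 2 * width, height, width))
    center = region_sum(image, top, left + width, height, width)
    return center - outside


edge = [[1, 1, 5, 5],
        [1, 1, 5, 5]]
bar = [[1, 9, 1],
       [1, 9, 1]]
edge_response = two_rect_feature(edge, 0, 0, 2, 2)  # 20 - 4
bar_response = three_rect_feature(bar, 0, 0, 2, 1)  # 18 - (2 + 2)
```

The two-rectangle feature responds strongly to the intensity step in `edge`, and the three-rectangle feature to the bright bar in `bar`.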
33Rectangle Features
34Computing Features Efficiently
- Fast detection requires fast feature calculation
- Rapidly compute intermediate representation
- Integral image
- Value for point (x,y) is sum of pixels above and to the left
- $ii(x,y) = \sum_{x' \le x,\; y' \le y} i(x', y')$
- Computed by recurrence
- $s(x,y) = s(x,y-1) + i(x,y)$, where $s(x,y)$ is the cumulative row sum
- $ii(x,y) = ii(x-1,y) + s(x,y)$
- Compute rectangle sum with 4 array references
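A minimal sketch of the integral image and the four-reference rectangle sum, following the recurrence above. It assumes the image is a nested list indexed as `image[y][x]`; the function names are illustrative.

```python
def integral_image(i):
    """Build ii where ii[y][x] is the sum of i over all x' <= x and y' <= y."""
    height, width = len(i), len(i[0])
    s = [[0] * width for _ in range(height)]
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # s(x,y) = s(x,y-1) + i(x,y)
            s[y][x] = (s[y - 1][x] if y > 0 else 0) + i[y][x]
            # ii(x,y) = ii(x-1,y) + s(x,y)
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s[y][x]
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom and columns left..right (inclusive)."""
    def at(y, x):
        # Positions just outside the top or left border count as zero.
        return ii[y][x] if y >= 0 and x >= 0 else 0

    return (at(bottom, right) - at(top - 1, right)
            - at(bottom, left - 1) + at(top - 1, left - 1))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
lower_right = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Once `ii` is built in a single pass, the cost of any rectangle sum is independent of the rectangle's size.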
35Rectangle Feature Summary
- Rectangle features
- Relatively simple
- Sensitive to bars, edges, simple structure
- Coarse
- Rich enough for effective learning
- Efficiently computable
36Learning an Image Classifier
- Supervised training: +/- examples
- Many learning approaches possible
- Adaboost
- Selects features AND trains classifier
- Improves performance of simple classifiers
- Training error guaranteed to fall exponentially fast in the number of rounds
- Basic idea: simple (weak) classifier
- Boosts performance by focusing on previous errors
37Feature Selection and Training
- Goal: pick only useful features from the 180,000
- Idea: a small number of features is effective
- Learner selects the single feature that best separates +/-ve examples
- Learner selects optimal threshold for each feature
- Classifier: $h(x) = 1$ if $p\,f(x) < p\,\theta$, 0 otherwise (f = feature value, $\theta$ = threshold, p = polarity, +1 or -1)
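- Initialize weights $w_{1,i} = \frac{1}{2m}$ for negative and $\frac{1}{2l}$ for positive examples, where m = number of negatives, l = number of positives
- For t = 1, ..., T
- 1. Normalize the weights, so that $w_t$ is a probability distribution
- 2. For each feature j, train a classifier $h_j$ which is restricted to a single feature. Error is evaluated with respect to $w_t$
- 3. Choose the classifier $h_t$ with the lowest error $\epsilon_t$
- 4. Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly and $e_i = 1$ otherwise, and $\beta_t = \epsilon_t / (1 - \epsilon_t)$
- The final classifier is $h(x) = 1$ if $\sum_t \alpha_t h_t(x) \ge \frac{1}{2} \sum_t \alpha_t$ and 0 otherwise, where $\alpha_t = \log(1/\beta_t)$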
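The boosting loop above can be sketched as follows, assuming feature values have already been computed for every training example. The brute-force threshold search, the clamp on the error, and all function names are simplifications chosen for illustration, not the authors' implementation.

```python
import math


def best_stump(values, labels, weights):
    """Find the threshold and polarity with the lowest weighted error.

    The stump predicts 1 when polarity * value < polarity * threshold.
    """
    best = None
    for theta in sorted(set(values)) + [max(values) + 1]:
        for p in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if (1 if p * v < p * theta else 0) != y)
            if best is None or err < best[0]:
                best = (err, theta, p)
    return best


def adaboost(features, labels, rounds):
    """features[j][i] is the value of feature j on example i; labels are 0/1."""
    m = labels.count(0)  # number of negatives
    l = labels.count(1)  # number of positives
    weights = [1 / (2 * l) if y == 1 else 1 / (2 * m) for y in labels]
    chosen = []  # entries are (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]  # 1. normalize
        candidates = []
        for j, values in enumerate(features):  # 2. one stump per feature
            err, theta, p = best_stump(values, labels, weights)
            candidates.append((err, j, theta, p))
        err, j, theta, p = min(candidates)  # 3. lowest weighted error
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard (assumption): avoid log(0)
        beta = err / (1 - err)
        for i, y in enumerate(labels):  # 4. down-weight correct examples
            prediction = 1 if p * features[j][i] < p * theta else 0
            if prediction == y:
                weights[i] *= beta
        chosen.append((math.log(1 / beta), j, theta, p))
    return chosen


def strong_classify(chosen, x):
    """x[j] is the value of feature j for one example."""
    score = sum(a for a, j, theta, p in chosen if p * x[j] < p * theta)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in chosen) else 0


# Toy data: feature 0 separates the classes, feature 1 is noise.
features = [[1, 2, 3, 4, 5, 6], [5, 1, 4, 2, 6, 3]]
labels = [1, 1, 1, 0, 0, 0]
model = adaboost(features, labels, rounds=3)
```

Because each round picks exactly one single-feature stump, the list `chosen` doubles as the selected feature set.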
39Basic Learning Results
- Initial classification: frontal faces
- 200 features
- Finds 95% of faces, with a false positive rate of 1 in 14,000
- Very fast
- Adding features adds to computation time
- Features interpretable
- Region around eyes is darker than nose/cheeks
- Eyes are darker than bridge of nose
40Primary Features
41Attentional Cascade
- Goal: improved classification, reduced time
- Insight: small, fast classifiers can reject many regions
- But have very few false negatives
- Reject majority of uninteresting regions quickly
- Focus computation on interesting regions
- Approach: degenerate decision tree
- Aka cascade
- Positive results passed to high detection classifiers
- Negative results rejected immediately
42Cascade Schematic
[Figure: all sub-windows enter CL 1, then CL 2, then CL 3, then more classifiers; a sub-window rejected at any stage is discarded (Reject Sub-Window)]
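The cascade described above can be sketched as a loop that stops at the first rejection. The three stage functions, the feature names, and the thresholds are hypothetical stand-ins for trained classifiers.

```python
def cascade_classify(stages, window):
    """Run a sub-window through the stages in order.

    Returns (accepted, stages_evaluated). Each stage is a function that
    takes the window and returns True (pass on) or False (reject).
    """
    for n, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, n  # rejected immediately; later stages never run
    return True, len(stages)


# Hypothetical stages: cheap tests first, more demanding ones later.
stages = [
    lambda w: w["eye_cheek"] > 10,
    lambda w: w["eye_nose"] > 5,
    lambda w: w["detail"] > 20,
]

# Only the first feature is ever looked up for this background window.
background = {"eye_cheek": 2}
face = {"eye_cheek": 30, "eye_nose": 12, "detail": 25}

background_result = cascade_classify(stages, background)
face_result = cascade_classify(stages, face)
```

Most windows in a typical image behave like `background`, so average cost is dominated by the first stage.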
43Cascade Construction
- Each stage is a trained classifier
- Tune threshold to minimize false negatives
- Good first stage classifier
- Two-feature strong classifier: eye/cheek, eye/nose
- Tuned: detect 100% of faces with 40% false positives
- Very computationally efficient
- 60 microprocessor instructions
- Goal: reject bad regions (sub-windows) quickly
- Most regions are bad
- Reject early in processing, little effort
- Good regions will trigger full cascade
- Relatively rare
- Classification is progressively more difficult
- Rejected the most obvious cases already
- Deeper classifiers more complex, more error-prone
45Cascade Training
- Tradeoffs: accuracy vs cost
- More accurate classifiers: more features, more complex
- More features, more complex: slower
- Difficult optimization
- Practical approach
- Each stage reduces false positive rate
- Bound the required reduction in false positives and the allowed increase in misses per stage
- Add features to each stage until meet target
- Add stages until overall effectiveness targets met
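A short worked calculation of why per-stage targets are enough, assuming the overall rates are simply the product of the per-stage rates (each stage's rates measured on the windows that reach it). The ten stages and their rates are illustrative numbers, not the trained detector's.

```python
def cascade_rates(stage_rates):
    """Combine per-stage (detection_rate, false_positive_rate) pairs."""
    detection, false_pos = 1.0, 1.0
    for d, f in stage_rates:
        detection *= d   # a face must pass every stage
        false_pos *= f   # a non-face must fool every stage
    return detection, false_pos


# Ten stages, each keeping 99% of faces and passing 30% of non-faces.
detection, false_pos = cascade_rates([(0.99, 0.30)] * 10)
```

Under these assumptions the cascade keeps roughly 90% of faces while letting through about 6 in a million non-face windows, even though each stage alone is a weak filter.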
- Task: detect frontal upright faces
- Face/non-face training images
- Face: 5000 hand-labeled instances
- Non-face: 9500 random web-crawl, hand-checked
- Classifier characteristics
- 38-layer cascade
- Increasing number of features per layer: 1, 10, 25, ...; 6061 in total
- Classification: average of 10 features evaluated per window
- Most rejected in first 2 layers
- Process 384x288 image in 0.067 secs
47Detection Tuning
- Multiple detections
- Many subwindows around face will alert
- Create disjoint subsets
- For overlapping boundaries, only report one
- Return average of corners
- Voting
- 3 similarly trained detectors
- Majority rules
- Improves overall detection performance
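The multiple-detection step above (disjoint subsets of overlapping boxes, one averaged box per subset) can be sketched with a small union-find. Boxes are assumed to be `(left, top, right, bottom)` tuples, and "overlapping" is taken to mean sharing any area; both choices and the function names are illustrative.

```python
def overlaps(a, b):
    """True if boxes a and b, given as (left, top, right, bottom), share area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_detections(boxes):
    """Group overlapping boxes and return the average corners of each group."""
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the group's representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)  # put both boxes in one subset

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Average each coordinate over the boxes in a group.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]


detections = [
    (10, 10, 30, 30), (12, 12, 32, 32), (14, 8, 34, 28),  # same face
    (100, 100, 120, 120),                                 # a second face
]
merged = merge_detections(detections)
```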
- Fast, robust facial detection
- Simple, easily computable features
- Simple trained classifiers
- Classification cascade allows early rejection
- Early classifiers also simple, fast
- Good overall classification in real-time
49Some Results
50Vision in Modern AI
- Goals
- Robustness
- Multidomain applicability
- Automatic acquisition
- Speed: real time
- Approach
- Simple mechanisms, feature selection
- Machine learning: tune features, classification