Title: Data-driven Approaches for Texture and Motion
1Data-driven Approaches for Texture and Motion
- Alexei A. Efros
- University of Oxford
2Graphics and Vision
The Geometric Story
projection
3D scene
2D image
Computer Graphics
Computer Vision
3Eye of the Beholder
Claude Monet, Gare St. Lazare, Paris, 1877
4Eye of the Beholder
5Seeing less than you think
6Seeing less than you think
Geometry is not enough! Need learning
7Learning in Vision
Recognition
Modeling
Capture
Image / Video
slide by C.Bregler
8Learning in Graphics
Image / Video
Synthesis
Modeling
Capture
Image / Video
slide by C.Bregler
9Data-driven Approaches
Image / Video
Synthesis, recognition
Easy: just look up the answer!
Capture
Image / Video
10- A.I. for the postmodern world
- all questions have already been answered many times, in many ways
- Google is dumb, the intelligence is in the data
- This is exactly associative memory!
- No model, but inference still possible
- automatic translation
- dictionaryless spell checking
11- Main problem: find the right similarity metric
- Text is easy: well defined, segmented, compact
- Natural phenomena are hard (e.g. Genome)
- Visual data is 2D, even harder!
12Two Domains
- Texture
- Texture Synthesis
- Texture Transfer
- Human Motion
- Analysis
- Synthesis
- Applications
13Texture
- Texture depicts spatially repeating patterns
- Many natural phenomena are textures
radishes
rocks
yogurt
14Texture Synthesis
- Goal of texture synthesis: create new samples of a given texture
- Many applications: virtual environments, hole-filling, texturing surfaces
15The Challenge
- Need to model the whole spectrum from repeated
to stochastic texture
repeated
stochastic
Both?
16Previous Work
- Inspired by texture analysis and psychophysics
- Heeger & Bergen, SIGGRAPH '95
- Zhu et al., '98
- Portilla & Simoncelli, '98
- DeBonet, SIGGRAPH '97
courtesy DeBonet, '97
17Classical Texture Synthesis
Novel texture
Synthesis
Texture Model
Analysis
Sample texture
18Our Approach
Novel texture
Synthesis
Analysis
Sample texture
19Motivation from Language
- Shannon, '48 proposed a way to generate English-looking text using N-grams
- Assume a generalized Markov model
- Use a large text to compute probability distributions of each letter given the N-1 previous letters
- Starting from a seed, repeatedly sample this Markov chain to generate new letters
- Also works for whole words
WE NEED
TO
EAT
CAKE
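The N-gram procedure above can be sketched in a few lines of Python. This is a minimal illustration, not Shannon's or Mark V. Shaney's actual code: the function names `build_model` and `generate` are made up for this example, and the "probability distribution" is stored implicitly as a list of observed followers, so sampling uniformly from that list reproduces the empirical frequencies.

```python
import random
from collections import defaultdict

def build_model(text, n):
    """Map each (n-1)-character context to the characters that followed it.

    Assumes n >= 2. Repeated followers are kept, so frequent continuations
    are proportionally more likely to be sampled.
    """
    model = defaultdict(list)
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        model[context].append(nxt)
    return model

def generate(model, seed, length, rng=None):
    """Extend `seed` (exactly n-1 characters) by sampling the Markov chain."""
    rng = rng or random.Random(0)
    k = len(seed)                  # context length, n - 1
    out = seed
    while len(out) < length:
        followers = model.get(out[-k:])
        if not followers:          # context never seen in the corpus: stop
            break
        out += rng.choice(followers)
    return out
```

Replacing the string with a list of words (and joining with spaces) gives the whole-word variant.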
20Mark V. Shaney (Bell Labs)
- Results (using alt.singles corpus)
- As I've commented before, really relating to someone involves standing next to impossible.
- One morning I shot an elephant in my arms and kissed him.
- I spent an interesting evening recently with a grain of salt
- Notice how well local structure is preserved!
- Now, instead of letters, let's try pixels
21Pixel-based Algorithm (Efros & Leung)
Synthesizing a pixel
- Assuming Markov property, compute P(p | N(p)), the distribution of pixel p given its neighborhood N(p)
- Building explicit probability tables is infeasible
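Since explicit tables are infeasible, one way to approximate sampling from P(p | N(p)) is to search the sample texture for neighborhoods similar to the one around p and pick one at random. The sketch below is a simplified illustration under stated assumptions: grayscale input, a plain (unweighted) sum of squared differences, and a hypothetical tolerance parameter `eps`. The function name `synthesize_pixel` is also made up here.

```python
import numpy as np

def synthesize_pixel(sample, window, known, eps=0.1, rng=None):
    """Pick a value for the centre of `window` by sampling from `sample`.

    sample : 2D float array, the input texture
    window : (w, w) array around the pixel being synthesized (w odd)
    known  : (w, w) boolean mask, True where `window` is already filled
    """
    rng = rng or np.random.default_rng(0)
    w = window.shape[0]
    # All w-by-w neighbourhoods of the sample, shape (H-w+1, W-w+1, w, w).
    patches = np.lib.stride_tricks.sliding_window_view(sample, (w, w))
    # Sum of squared differences over the known pixels only.
    diff = (patches - window) * known
    ssd = (diff ** 2).sum(axis=(2, 3))
    # Keep every neighbourhood within (1 + eps) of the best match ...
    candidates = np.argwhere(ssd <= ssd.min() * (1 + eps))
    # ... and draw one uniformly instead of tabulating P(p | N(p)).
    r, c = candidates[rng.integers(len(candidates))]
    return patches[r, c, w // 2, w // 2]
```

Calling this once per unfilled pixel, growing outward from a seed, gives the overall synthesis loop; the exhaustive search per pixel is what makes the approach slow.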
22Neighborhood Window
input
23Synthesis Results
french canvas
raffia weave
24More Results
white bread
brick wall
25Homage to Shannon
26Hole Filling
27Extrapolation
28Image Quilting (Efros & Freeman)
non-parametric sampling
Input image
- Observation: neighbor pixels are highly correlated
29block
Input texture
B1
B2
Random placement of blocks
30Minimal error boundary
overlapping blocks
vertical boundary
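A minimal sketch of finding a minimal error boundary through a vertical overlap, assuming the error surface is the squared difference of the two overlapping strips and the cut moves at most one column per row. The function name `min_error_boundary` and this particular dynamic-programming formulation are illustrative choices, not code from the talk.

```python
import numpy as np

def min_error_boundary(overlap_a, overlap_b):
    """Return, for each row, the column of the cheapest vertical cut
    through the squared-difference surface of two overlapping strips."""
    err = (overlap_a.astype(float) - overlap_b.astype(float)) ** 2
    if err.ndim == 3:            # colour input: sum the error over channels
        err = err.sum(axis=2)
    rows, cols = err.shape
    cost = err.copy()
    # Forward pass: cheapest cumulative cost to reach each cell from the top,
    # moving straight down or one column diagonally.
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 2, cols)
            cost[r, c] += cost[r - 1, lo:hi].min()
    # Backward pass: trace the cheapest path from the bottom row upward.
    path = [int(np.argmin(cost[-1]))]
    for r in range(rows - 2, -1, -1):
        c = path[-1]
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        path.append(lo + int(np.argmin(cost[r, lo:hi])))
    return path[::-1]
```

Pixels to the left of the returned column in each row would come from one block and pixels to the right from the other.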
31Our Philosophy
- The "Corrupt Professor's Algorithm"
- Plagiarize as much of the source image as you can
- Then try to cover up the evidence
- Rationale
- Texture blocks are by definition correct samples of texture, so the only problem is connecting them together
38Failures (Chernobyl Harvest)
39Portilla & Simoncelli
Xu, Guo & Shum
input image
Wei & Levoy
Our algorithm
40Portilla & Simoncelli
Xu, Guo & Shum
input image
Wei & Levoy
Our algorithm
41Portilla & Simoncelli
Xu, Guo & Shum
input image
Wei & Levoy
Our algorithm
42Application: Texture Transfer
- Try to explain one object with bits and pieces of another object
- Same as texture synthesis, except for an additional constraint
- Consistency of texture
- Similarity to the image being explained
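One plausible way to combine the two constraints is a weighted sum of two squared-error terms, shown below. This is a sketch under assumptions: the weight `alpha`, the use of raw intensities for both terms, and the function name `pick_block` are all illustrative and not taken from the slides.

```python
import numpy as np

def pick_block(candidates, neighbour, overlap_mask, target_patch, alpha=0.5):
    """Index of the candidate block with the lowest combined error.

    candidates   : list of (h, w) arrays cut from the source texture
    neighbour    : (h, w) array holding already-synthesized output
    overlap_mask : (h, w) array, 1 where `neighbour` is valid, else 0
    target_patch : (h, w) patch of the image being explained
    alpha        : 1.0 = texture consistency only, 0.0 = target similarity only
    """
    costs = []
    for block in candidates:
        # Consistency of texture: agree with placed output in the overlap.
        texture_err = (((block - neighbour) ** 2) * overlap_mask).sum()
        # Similarity to the image being explained.
        target_err = ((block - target_patch) ** 2).sum()
        costs.append(alpha * texture_err + (1 - alpha) * target_err)
    return int(np.argmin(costs))
```

Setting `alpha` close to 1 favors seamless texture; setting it close to 0 favors fidelity to the target image.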
45parmesan
rice
46Two Domains
- Texture
- Texture Synthesis
- Texture Transfer
- Human Motion
- Analysis
- Synthesis
- Applications
47Looking at People
Far field
Near field
- 3-pixel man
- Blob tracking
- vast surveillance literature
- 300-pixel man
- Limb tracking
- e.g. Yacoob & Black, Rao & Shah, etc.
48Medium-field Recognition
49Appearance vs. Motion
50Goals
- Recognize human actions at a distance
- Low resolution, noisy data
- Moving camera, occlusions
- Wide range of actions (including non-periodic)
51Our Approach
- Motion-based approach
- Non-parametric: use a large amount of data
- Classify a novel motion by finding the most similar motion from the training set
- Related Work
- Periodicity analysis
- Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.
- Model-free
- Temporal Templates: Bobick & Davis
- Orientation histograms: Freeman et al.; Zelnik & Irani
- Using MoCap data: Zhao & Nevatia; Ramanan & Forsyth
52Gathering action data
- Tracking
- Simple correlation-based tracker
- User-initialized
53Figure-centric Representation
- Stabilized spatio-temporal volume
- No translation information
- All motion caused by person's limbs
- Good news: indifferent to camera motion
- Bad news: hard!
- Good test to see if actions, not just
translation, are being captured
54Efros, Berg, Mori and Malik
Image / Video
Synthesis, recognition
Capture
Image / Video
55Remembrance of Things Past
- Explain novel motion sequence by matching to previously seen video clips
- For each frame, match based on some temporal extent
input sequence
Challenge: how to compare motions?
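Matching "based on some temporal extent" can be sketched as summing a frame-to-frame similarity matrix along its diagonal over a window of frames, so that a frame matches well only if its temporal neighbors match too. The code below assumes such a matrix is already available; the names `temporal_similarity` and `best_matches`, and the zero padding at sequence ends, are assumptions made for this example.

```python
import numpy as np

def temporal_similarity(frame_sim, extent):
    """Sum frame-to-frame similarity along the diagonal over `extent` frames.

    frame_sim[i, j] is the similarity of input frame i to database frame j;
    `extent` is an odd window length in frames.
    """
    half = extent // 2
    n_a, n_b = frame_sim.shape
    padded = np.pad(frame_sim, half)      # zeros outside the sequences
    out = np.zeros_like(frame_sim, dtype=float)
    for k in range(extent):
        # Shift by the same offset along both time axes, so entry (i, j)
        # accumulates frame_sim[i + d, j + d] for d in -half..half.
        out += padded[k:k + n_a, k:k + n_b]
    return out

def best_matches(frame_sim, extent):
    """Best database frame for each input frame, using temporal context."""
    return temporal_similarity(frame_sim, extent).argmax(axis=1)
```

A longer extent suppresses accidental single-frame matches at the price of blurring short actions.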
56How to describe motion?
- Appearance
- Not preserved across different clothing
- Gradients (spatial, temporal)
- same (e.g. contrast reversal)
- Edges/Silhouettes
- Too unreliable
- Optical flow
- Explicitly encodes motion
- Least affected by appearance
- but too noisy
57Spatial Motion Descriptor
Image frame
Optical flow
58Spatio-temporal Motion Descriptor
Sequence A
S
Sequence B
t
59Football Actions: matching
Input Sequence
Matched Frames
input
matched
60Football Actions: classification
10 actions; 4500 total frames; 13-frame motion descriptor
61Classifying Ballet Actions
16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
62Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion descriptor. Woman player used as training, man as testing.
63Classifying Tennis
- Red bars show classification results
64Querying the Database
input sequence
database
65 2D Skeleton Transfer
- We annotate database with 2D joint positions
- After matching, transfer data to novel sequence
- Adjust the match for best fit
Input sequence
Transferred 2D skeletons
66 3D Skeleton Transfer
- We populate database with rendered stick figures from 3D Motion Capture data
- Matching as before, we get 3D joint positions (kind of)!
Input sequence
Transferred 3D skeletons
67Do as I Do Motion Synthesis
input sequence
synthetic sequence
- Matching two things
- Motion similarity across sequences
- Appearance similarity within sequence (like VideoTextures)
- Dynamic Programming
68Smoothness for Synthesis
- One term is the motion similarity between source and target frames
- The other is the appearance similarity within target frames
- For every source frame i, find the best target frame
- by maximizing a cost function that combines the two terms
- Optimize using dynamic programming
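The exact cost function is not reproduced in this transcript, so the following is only a sketch of how such an optimization could look: it assumes the objective is the sum of source-to-target motion similarities plus a weighted sum of appearance similarities between consecutive chosen target frames, and solves it with a Viterbi-style dynamic program. The names `best_target_frames`, `motion_sim`, `appearance_sim` and `weight` are hypothetical.

```python
import numpy as np

def best_target_frames(motion_sim, appearance_sim, weight=1.0):
    """Choose one target frame per source frame by dynamic programming.

    motion_sim[i, j]     : similarity of source frame i to target frame j
    appearance_sim[j, k] : how well target frame k follows target frame j
    weight               : relative importance of the smoothness term
    """
    n_src, n_tgt = motion_sim.shape
    score = np.empty((n_src, n_tgt))
    back = np.zeros((n_src, n_tgt), dtype=int)
    score[0] = motion_sim[0]
    for i in range(1, n_src):
        # total[j, k]: best score ending at j for frame i-1, then moving to k.
        total = score[i - 1][:, None] + weight * appearance_sim
        back[i] = total.argmax(axis=0)
        score[i] = total.max(axis=0) + motion_sim[i]
    # Trace back from the best final frame.
    path = [int(score[-1].argmax())]
    for i in range(n_src - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```

With `weight=0` each source frame simply gets its best motion match; increasing `weight` trades motion fidelity for smoother output.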
69Do as I Do
Source Motion
Source Appearance
3400 Frames
Result
70Do as I Say Synthesis
Action labels: run, walk left, swing, walk right, jog
synthetic sequence
- Synthesize given action labels
- e.g. video game control
71Do as I Say
- Red box shows when constraint is applied
72Application: Motion Retargeting
- Rendering new character into existing footage
- Algorithm
- Track original character
- Find matches from new character
- Erase original character
- Render in new character
- Need to worry about occlusions
73Demo
SHOW VIDEO
74Context-based Image Correction
Input sequence
3 closest frames
median images
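A rough sketch of the idea on this slide: for an input frame, find its closest frames in a database and take the per-pixel median to suppress corrupted pixels. Here "closest" is measured by plain pixel-wise squared difference, which is an assumption for illustration; the talk's matching may well use the motion descriptor instead. The function name `median_of_closest` is made up.

```python
import numpy as np

def median_of_closest(frame, database, k=3):
    """Per-pixel median of the k database frames closest to `frame`.

    frame    : (H, W) array
    database : (N, H, W) array of candidate frames
    """
    # Squared distance from the input frame to every database frame.
    dists = ((database - frame) ** 2).reshape(len(database), -1).sum(axis=1)
    closest = np.argsort(dists)[:k]
    return np.median(database[closest], axis=0)
```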
75Big Picture
- Modeling is good
- but many things are hard/impossible to model
- Data-driven: the data itself is the model
- the more data the merrier!
- we are running out of domains with good models
- Great gains from physics and geometry
- but many problems still unsolved, e.g.
- Capture/rendering of materials and weather
- Smart image/video capture and enhancement
- Object/scene recognition and navigation
- Visual data is now cheap and plentiful
- The time is right for data-driven methods!
76Acknowledgments
- Co-authors: Thomas Leung, Bill Freeman, Alexander Berg, Greg Mori, and Jitendra Malik.
- NSF, MURI, AZ
- Thank You
77EXTRA SLIDES
- (for those who didn't have enough)
78Varying Window Size
Increasing window size
79Summary
- The Efros & Leung algorithm
- Very simple
- Surprisingly good results
- Synthesis is easier than analysis!
- but very slow
80Follow-up work
- Optimizations and Improvements
- Wei & Levoy, '00 (based on Popat & Picard, '93)
- Harrison, '01
- Ashikhmin, '01
- Theory
- Levina, '02: proof of consistency
- Applications
- Surface Texture Synthesis: Ying et al., '01; Wei & Levoy, '01; Turk, '01; Gorla et al., '01; Soler et al., '02; Tong et al., '02
- Hertzmann et al., '01: Image Analogies
- Brooks et al., '02: Texture Editing
- Hertzmann et al., '02: Curve Analogies
- etc.