Title: Motion estimation
1. Motion estimation
- Introduction to Computer Vision
- CS223B, Winter 2005
- Richard Szeliski
2. Why Visual Motion?
- Visual Motion can be annoying
- Camera instabilities, jitter
- Measure it. Remove it.
- Visual Motion indicates dynamics in the scene
- Moving objects, behavior
- Track objects and analyze trajectories
- Visual Motion reveals spatial layout of the scene
- Motion parallax
3. Today's lecture
- Motion estimation
- background: image pyramids, image warping
- application: image morphing
- parametric motion (review)
- optic flow
- layered motion models
4. Image Pyramids
5. Image Pyramids
6. Pyramid Creation
- filter mask
- Gaussian Pyramid
- Laplacian Pyramid
  - created from the Gaussian pyramid by subtraction: L_l = G_l - expand(G_{l+1}) (see the sketch below)
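The subtraction rule above maps directly to code. Here is a minimal NumPy/SciPy sketch of Gaussian and Laplacian pyramid construction with the 5-tap binomial kernel mentioned later in the prefiltering slide; the helper names (blur, reduce_, expand_) are illustrative choices, not from the lecture.

```python
import numpy as np
from scipy import ndimage

KERNEL = np.array([1., 4., 6., 4., 1.]) / 16.0   # 5-tap binomial

def blur(img):
    # Separable low-pass filter: rows, then columns.
    img = ndimage.convolve1d(img, KERNEL, axis=0, mode="reflect")
    return ndimage.convolve1d(img, KERNEL, axis=1, mode="reflect")

def reduce_(img):
    # G_{l+1} = reduce(G_l): blur, then subsample by 2.
    return blur(img)[::2, ::2]

def expand_(img, shape):
    # Upsample by 2 (zero insertion), then interpolate with the same
    # kernel; the factor 4 compensates for the inserted zeros.
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)

def laplacian_pyramid(img, levels):
    gauss = [img.astype(float)]
    for _ in range(levels - 1):
        gauss.append(reduce_(gauss[-1]))
    # L_l = G_l - expand(G_{l+1}); the coarsest level keeps G itself.
    lap = [g - expand_(gn, g.shape) for g, gn in zip(gauss[:-1], gauss[1:])]
    return lap + [gauss[-1]]
```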
7. Octaves in the Spatial Domain
- lowpass images
8. Pyramids
- Advantages of pyramids
- Faster than Fourier transform
- Avoids ringing artifacts
- Many applications
- small images faster to process
- good for multiresolution processing
- compression
- progressive transmission
- Known as mip-maps in graphics community
- Precursor to wavelets
- Wavelets also have these advantages
9. Laplacian level 0 (left pyramid, right pyramid, blended pyramid)
10. Pyramid Blending
11. Image Warping
12. Image Warping
- image filtering: change the range of the image, g(x) = h(f(x))
- image warping: change the domain of the image, g(x) = f(h(x))
13. Image Warping (same definitions, illustrated with example images f and g)
14. Parametric (global) warping
- Examples of parametric warps: translation, rotation, aspect, affine, perspective, cylindrical
15. 2D coordinate transformations
- translation: x' = x + t, where x = (x, y)
- rotation: x' = R x + t
- similarity: x' = s R x + t
- affine: x' = A x + t
- perspective: x' ≅ H x, where x = (x, y, 1) (x is a homogeneous coordinate)
- These all form a nested group (closed under composition, with inverses)
16. Image Warping
- Given a coordinate transform x' = h(x) and a source image f(x), how do we compute a transformed image g(x') = f(h(x))?
17. Forward Warping
- Send each pixel f(x) to its corresponding location x' = h(x) in g(x')
- What if a pixel lands between two pixels?
18. Forward Warping
- Send each pixel f(x) to its corresponding location x' = h(x) in g(x')
- What if a pixel lands between two pixels?
- Answer: add contributions to several pixels and normalize later (splatting; see the sketch below)
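A minimal sketch of forward warping with bilinear splatting, assuming a grayscale float image and a user-supplied warp h that maps pixel grids to continuous destination coordinates; all names here are illustrative.

```python
import numpy as np

def forward_warp(f, h):
    """Splat each source pixel into the destination image."""
    H, W = f.shape
    acc = np.zeros((H, W))          # accumulated intensities
    wgt = np.zeros((H, W))          # accumulated splat weights
    ys, xs = np.mgrid[0:H, 0:W]
    xd, yd = h(xs, ys)              # float destination coordinates
    x0, y0 = np.floor(xd).astype(int), np.floor(yd).astype(int)
    ax, ay = xd - x0, yd - y0       # bilinear fractions
    # Distribute each pixel over its four destination neighbours.
    for dx, dy, w in [(0, 0, (1 - ax) * (1 - ay)), (1, 0, ax * (1 - ay)),
                      (0, 1, (1 - ax) * ay),       (1, 1, ax * ay)]:
        xi, yi = x0 + dx, y0 + dy
        ok = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
        np.add.at(acc, (yi[ok], xi[ok]), (w * f)[ok])
        np.add.at(wgt, (yi[ok], xi[ok]), w[ok])
    return acc / np.maximum(wgt, 1e-8)   # normalize the splats
```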
19. Inverse Warping
- Get each pixel g(x') from its corresponding location x = h^-1(x') in f(x)
- What if a pixel comes from between two pixels?
20. Inverse Warping
- Get each pixel g(x') from its corresponding location x = h^-1(x') in f(x)
- What if a pixel comes from between two pixels?
- Answer: resample the color value from the interpolated (prefiltered) source image
21. Interpolation
- Possible interpolation filters:
  - nearest neighbor
  - bilinear
  - bicubic (interpolating)
  - sinc / FIR
- Needed to prevent jaggies and texture crawl (see demo; a bilinear sketch follows)
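For comparison with the forward-warping sketch above, here is a minimal sketch of inverse warping with bilinear interpolation, assuming h_inv maps each destination pixel back into the source image; names are illustrative, and coordinates are simply clamped at the border.

```python
import numpy as np

def bilinear(f, x, y):
    # Sample f at continuous (x, y), clamping to the image border.
    H, W = f.shape
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    ax, ay = np.clip(x - x0, 0, 1), np.clip(y - y0, 0, 1)
    return ((1 - ax) * (1 - ay) * f[y0, x0] + ax * (1 - ay) * f[y0, x0 + 1] +
            (1 - ax) * ay * f[y0 + 1, x0] + ax * ay * f[y0 + 1, x0 + 1])

def inverse_warp(f, h_inv, out_shape):
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    xs_src, ys_src = h_inv(xs, ys)   # where each output pixel comes from
    return bilinear(f, xs_src, ys_src)
```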
22. Prefiltering
- Essential for downsampling (decimation) to prevent aliasing
- MIP-mapping [Williams '83]:
  - build a pyramid (but what decimation filter?)
    - block averaging
    - Burt & Adelson (5-tap binomial)
    - 7-tap wavelet-based filter (better)
  - trilinear interpolation
    - bilinear within each of 2 adjacent levels
    - linear blend between levels (determined by pixel size)
23. Prefiltering
- Essential for downsampling (decimation) to prevent aliasing
- Other possibilities:
  - summed area tables
  - elliptically weighted Gaussians (EWA) [Heckbert '86]
24. Image Warping: non-parametric
- Specify more detailed warp function
- Examples
- splines
- triangles
- optical flow (per-pixel motion)
25. Image Warping: non-parametric
- Move control points to specify spline warp
26. Image Morphing
27. Image Morphing
- How can we in-between two images?
- Cross-dissolve (all examples from Gomes et al. '99)
28. Image Morphing
- How can we in-between two images?
- Warp, then cross-dissolve = morph
29. Warp specification
- How can we specify the warp?
  - Specify corresponding points
  - interpolate to a complete warping function [Nielson, Scattered Data Modeling, IEEE CG&A '93]
30. Warp specification
- How can we specify the warp?
- Specify corresponding vectors
- interpolate to a complete warping function
31. Warp specification
- How can we specify the warp?
  - Specify corresponding vectors
  - interpolate [Beier & Neely, SIGGRAPH '92]
32. Warp specification
- How can we specify the warp?
- Specify corresponding spline control points
- interpolate to a complete warping function
33. Final Morph Result
34. Motion estimation
35. Classes of Techniques
- Feature-based methods
  - Extract salient visual features (corners, textured areas) and track them over multiple frames
  - Analyze the global pattern of motion vectors of these features
  - Sparse motion fields, but possibly robust tracking
  - Suitable especially when image motion is large (tens of pixels)
- Direct methods
  - Directly recover image motion from spatio-temporal image brightness variations
  - Global motion parameters recovered directly, without an intermediate feature motion calculation
  - Dense motion fields, but more sensitive to appearance variations
  - Suitable for video and when image motion is small (< 10 pixels)
36. The Brightness Constraint
- Brightness Constancy Equation: I(x, y, t) = I(x + u, y + v, t + 1)
- Or, better still, minimize the error: E(u, v) = Σ [I(x + u, y + v, t + 1) - I(x, y, t)]²
- Linearizing (assuming small (u, v)): I_x u + I_y v + I_t ≈ 0
37. Gradient Constraint (or the Optical Flow Constraint)
- ∇I · (u, v) + I_t = 0: each pixel constrains only the component of flow along the image gradient (normal flow)
38. Local Patch Analysis
39. Patch Translation (Lucas-Kanade)
- Assume a single velocity for all pixels within an image patch
- Minimizing E(u, v) = Σ (I_x u + I_y v + I_t)² yields the normal equations (Σ ∇I ∇I^T) (u, v)^T = -Σ ∇I I_t
- LHS: the sum of the 2x2 outer product tensors of the gradient vector (a NumPy sketch follows)
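A minimal sketch of the single-patch Lucas-Kanade solve, assuming two grayscale float frames and simple finite-difference derivatives; the function name and the conditioning threshold are illustrative choices, not from the lecture.

```python
import numpy as np

def lucas_kanade_patch(I0, I1, y, x, half=7):
    # Extract the patch and its spatial/temporal derivatives.
    p0 = I0[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    p1 = I1[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    Iy, Ix = np.gradient(p0)        # spatial gradients (rows, cols)
    It = p1 - p0                    # temporal derivative
    # Normal equations: (sum of gradient outer products) U = -sum grad*It
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    # Aperture problem: M is (near-)singular along an edge or in
    # textureless regions; only normal flow is available there.
    if np.linalg.cond(M) > 1e6:
        return None
    return np.linalg.solve(M, b)    # (u, v)
```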
40. The Aperture Problem
- Let M = Σ ∇I ∇I^T and b = -Σ ∇I I_t
- Algorithm: at each pixel, compute the flow U = (u, v) by solving M U = b
- M is singular if all gradient vectors point in the same direction
  - e.g., along an edge
  - of course, trivially singular if the summation is over a single pixel or there is no texture
  - i.e., only normal flow is available (the aperture problem)
- Corners and textured areas are OK
41. Aperture Problem and Normal Flow
42. Local Patch Analysis
43. Iterative Refinement
- Estimate the velocity at each pixel using one iteration of Lucas-Kanade estimation
- Warp one image toward the other using the estimated flow field (easier said than done)
- Refine the estimate by repeating the process
44-47. Optical Flow: Iterative Estimation (figure sequence: successive estimates converging toward x0)
48. Optical Flow: Iterative Estimation
- Some implementation issues:
  - Warping is not easy (make sure that errors in interpolation and warping are not bigger than the estimate refinement)
  - Warp one image, take derivatives of the other, so you don't need to re-compute the gradient after each iteration
  - Often useful to low-pass filter the images before motion estimation (for better derivative estimation and somewhat better linear approximations to image intensity)
50. Optical Flow: Aliasing
- Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity, i.e., how do we know which correspondence is correct?
  - nearest match is correct (no aliasing)
  - nearest match is incorrect (aliasing)
- To overcome aliasing: coarse-to-fine estimation.
51. Iterative refinement
BUT!!
52. Limits of the gradient method
- Fails when the intensity structure in the window is poor
- Fails when the displacement is large (typical operating range is a motion of about 1 pixel)
  - linearization of brightness is suitable only for small displacements
- Also, brightness is not strictly constant in images
  - actually less problematic than it appears, since we can pre-filter images to make them look similar
53. Coarse-to-Fine Estimation
54. Coarse-to-Fine Estimation
- (figure: build pyramids of images I and J; at each level, warp J toward I using the current flow to get Jw, refine the estimate, and propagate it to the next finer level; a sketch follows)
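A minimal sketch of the coarse-to-fine loop in the figure, assuming a dense one-step estimator flow_step(I, Jw) (e.g., per-pixel Lucas-Kanade) and a backward warp warp(J, u, v) are supplied; all helper names are illustrative.

```python
import numpy as np
from scipy import ndimage

def down(img):
    # Prefilter (to avoid aliasing), then subsample by 2.
    return ndimage.gaussian_filter(img, 1.0)[::2, ::2]

def up2(field, shape):
    # Nearest-neighbour upsampling of a flow component to `shape`.
    out = np.repeat(np.repeat(field, 2, axis=0), 2, axis=1)
    return out[:shape[0], :shape[1]]

def coarse_to_fine_flow(I, J, flow_step, warp, levels=4):
    pyr_I, pyr_J = [I.astype(float)], [J.astype(float)]
    for _ in range(levels - 1):
        pyr_I.append(down(pyr_I[-1]))
        pyr_J.append(down(pyr_J[-1]))
    u = np.zeros(pyr_I[-1].shape)
    v = np.zeros_like(u)
    for Ii, Ji in zip(pyr_I[::-1], pyr_J[::-1]):   # coarsest first
        if u.shape != Ii.shape:
            # Upsample the flow; x2 because displacements double with the grid.
            u = 2.0 * up2(u, Ii.shape)
            v = 2.0 * up2(v, Ii.shape)
        Jw = warp(Ji, u, v)          # warp J toward I with current flow
        du, dv = flow_step(Ii, Jw)   # refine on the residual motion
        u, v = u + du, v + dv
    return u, v
```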
55. Global Motion Models
- 2D Models
  - Affine
  - Quadratic
  - Planar projective transform (Homography)
- 3D Models
  - Instantaneous camera motion models
  - Homography + epipole
  - Plane + Parallax
56. Example: Affine Motion
- u(x, y) = a1 + a2 x + a3 y
- v(x, y) = a4 + a5 x + a6 y
- Substituting into the brightness constancy equation: I_x (a1 + a2 x + a3 y) + I_y (a4 + a5 x + a6 y) + I_t ≈ 0
- Each pixel provides 1 linear constraint in 6 global unknowns (a least-squares sketch follows)
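A minimal sketch of the resulting least-squares problem, assuming precomputed derivative images Ix, Iy, It; each pixel contributes one row in the six unknowns. Names are illustrative.

```python
import numpy as np

def affine_flow(Ix, Iy, It):
    H, W = Ix.shape
    ys, xs = np.mgrid[0:H, 0:W]
    x, y = xs.ravel().astype(float), ys.ravel().astype(float)
    ix, iy, it = Ix.ravel(), Iy.ravel(), It.ravel()
    # One row per pixel: Ix*(a1 + a2 x + a3 y) + Iy*(a4 + a5 x + a6 y) = -It
    A = np.stack([ix, ix * x, ix * y, iy, iy * x, iy * y], axis=1)
    a, *_ = np.linalg.lstsq(A, -it, rcond=None)
    return a   # (a1..a6): u = a1 + a2 x + a3 y, v = a4 + a5 x + a6 y
```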
57. Other 2D Motion Models
58. 3D Motion Models
59. Correlation and SSD
- For larger displacements, do template matching:
  - Define a small area around a pixel as the template
  - Match the template against each pixel within a search area in the next image
  - Use a match measure such as correlation, normalized correlation, or sum-of-squared differences (SSD)
  - Choose the maximum (or minimum) as the match
- Sub-pixel interpolation also possible (a sketch of the SSD search follows)
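A minimal sketch of SSD template matching over a square search window, assuming grayscale float images; the names and window sizes are illustrative.

```python
import numpy as np

def ssd_match(I0, I1, y, x, half=7, search=15):
    T = I0[y - half:y + half + 1, x - half:x + half + 1]   # template
    best, best_uv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            P = I1[yy - half:yy + half + 1, xx - half:xx + half + 1]
            if P.shape != T.shape:      # skip out-of-bounds shifts
                continue
            ssd = np.sum((P - T) ** 2)
            if ssd < best:
                best, best_uv = ssd, (dx, dy)
    return best_uv, best    # displacement minimizing the SSD surface
```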
60. SSD Surface: textured area
61. SSD Surface: edge
62. SSD Surface: homogeneous area
63. Discrete Search vs. Gradient-Based Estimation
- Consider an image I translated by a displacement (u0, v0).
- The discrete search method simply searches for the best estimate; the gradient method linearizes the intensity function and solves for the estimate.
64. Uncertainty in Local Estimation
- Consider an image I translated by a displacement (u0, v0).
- This assumes uniform priors on the velocity field.
65. Quadratic Approximation
- When the displacements are small, after some fiddling around, we can show that the SSD error surface is approximately quadratic in the displacement.
66. Posterior uncertainty
- At edges, the curvature matrix is singular, but just take the pseudo-inverse.
- Note that the error is always convex, since the matrix is positive semi-definite; i.e., this holds even for occluded points and other false matches, which seems a bit odd!
67. Match plus confidence
- Numerically compute the error for various displacements
- Search for the peak
- Numerically fit a quadratic to the error around the peak (a sketch follows)
- Find the sub-pixel estimate and its covariance
- If the fitted curvature matrix is negative, it is a false match
- Or, even better, if you can afford it, simply maintain a discrete sampling of the error surface
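A minimal sketch of the sub-pixel step, fitting a 1-D parabola per axis to the SSD values around the discrete minimum (iy, ix); a non-convex fit signals a false match. Names are illustrative.

```python
import numpy as np

def subpixel_peak(ssd, iy, ix):
    def offset(em, e0, ep):
        denom = em - 2.0 * e0 + ep        # curvature of the parabola
        if denom <= 0:                     # non-convex fit: false match
            return None
        return 0.5 * (em - ep) / denom     # minimum offset, in [-1, 1]
    dx = offset(ssd[iy, ix - 1], ssd[iy, ix], ssd[iy, ix + 1])
    dy = offset(ssd[iy - 1, ix], ssd[iy, ix], ssd[iy + 1, ix])
    if dx is None or dy is None:
        return None
    return ix + dx, iy + dy
```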
68. Quadratic Approximation and Covariance Estimation
69. Correlation Window Size
- Small windows lead to more false matches
- Large windows are better this way, but...
  - Neighboring flow vectors will be more correlated (since the template windows have more in common)
  - Flow resolution is also lower (same reason)
  - More expensive to compute
- Another way to look at this:
  - Small windows are good for local search: more precise and less smooth
  - Large windows are good for global search: less precise and more smooth
70. Optical Flow: Robust Estimation
- Noise distributions are often non-Gaussian, having much heavier tails; noise samples from the tails are called outliers.
- Sources of outliers (multiple motions):
  - specularities / highlights
  - JPEG artifacts / interlacing / motion blur
  - multiple motions (occlusion boundaries, transparency)
72. Robust Estimation
- Standard least squares estimation allows too much influence for outlying points.
73. Robust Estimation
- Robust gradient constraint: E(u, v) = Σ ρ(I_x u + I_y v + I_t)
- Robust SSD: E(u, v) = Σ ρ(I(x + u, y + v, t + 1) - I(x, y, t))
- Here ρ is a robust penalty that grows more slowly than the quadratic (a sketch follows)
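A minimal sketch of robust estimation via iteratively reweighted least squares, using the Geman-McClure penalty ρ(r) = r² / (r² + σ²) on the patch-translation model; the penalty choice and names are mine, not from the lecture.

```python
import numpy as np

def robust_patch_flow(Ix, Iy, It, sigma=1.0, iters=10):
    G = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # per-pixel gradients
    it = It.ravel()
    u = np.zeros(2)
    for _ in range(iters):
        r = G @ u + it                     # per-pixel residuals
        # IRLS weight w(r) = rho'(r)/r for Geman-McClure; outliers in
        # the heavy tails receive tiny weights.
        w = (2.0 * sigma ** 2) / (r ** 2 + sigma ** 2) ** 2
        M = (G * w[:, None]).T @ G
        b = -(G * w[:, None]).T @ it
        u = np.linalg.solve(M, b)
    return u   # robust (u, v)
```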
74. Robust Estimation
75. Layered Motion Models
- Layered models provide a 2.5-D representation, like cardboard cutouts.
- Key players:
  - intensity (appearance)
  - alpha map (opacity)
  - warp maps (motion)
76. Layered Scene Representations
77. Motion representations
- How can we describe this scene?
78. Block-based motion prediction
- Break the image up into square blocks
- Estimate a translation for each block
- Use this to predict the next frame, and code the difference (MPEG-2)
79. Layered motion
- Break the image sequence up into layers
- ?
- Describe each layer's motion
80. Outline
- Why layers?
- 2-D layers [Wang & Adelson '94; Weiss '97]
- 3-D layers [Baker et al. '98]
- Layered Depth Images [Shade et al. '98]
- Transparency [Szeliski et al. '00]
81. Layered motion
- Advantages:
  - can represent occlusions / disocclusions
  - each layer's motion can be smooth
  - video segmentation for semantic processing
- Difficulties:
  - how do we determine the correct number of layers?
  - how do we assign pixels?
  - how do we model the motion?
82. Layers for video summarization
83. Background modeling (MPEG-4)
- Convert masked images into a background sprite for layered video coding
84. What are layers?
- [Wang & Adelson, 1994]
  - intensities
  - alphas
  - velocities
85. How do we composite them?
86. How do we form them?
87. How do we form them?
88. How do we estimate the layers?
- Compute coarse-to-fine flow
- Estimate affine motion in blocks (regression)
- Cluster with k-means
- Assign pixels to the best-fitting affine region
- Re-estimate the affine motions in each region (see the sketch below)
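A minimal sketch of the first three steps, assuming a dense flow field (u, v) is already available; the block size and the use of SciPy's kmeans2 are my choices, not from the lecture. Pixel assignment and re-estimation would iterate on top of this.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def block_affine_params(u, v, block=16):
    params = []
    H, W = u.shape
    for y0 in range(0, H - block + 1, block):
        for x0 in range(0, W - block + 1, block):
            ys, xs = np.mgrid[y0:y0 + block, x0:x0 + block]
            A = np.stack([np.ones(block * block),
                          xs.ravel().astype(float),
                          ys.ravel().astype(float)], axis=1)
            au, *_ = np.linalg.lstsq(A, u[y0:y0+block, x0:x0+block].ravel(), rcond=None)
            av, *_ = np.linalg.lstsq(A, v[y0:y0+block, x0:x0+block].ravel(), rcond=None)
            params.append(np.concatenate([au, av]))   # 6-vector per block
    return np.array(params)

def motion_hypotheses(u, v, k=4):
    P = block_affine_params(u, v)
    centers, _ = kmeans2(P, k, minit="++")   # k affine motion hypotheses
    return centers
```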
89. Layer synthesis
- For each layer:
  - stabilize the sequence with the affine motion
  - compute the median value at each pixel (see the sketch below)
- Determine occlusion relationships
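The median step is a one-liner once the per-layer stabilized frames are stacked along a time axis; a minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def layer_image(stabilized):
    # stabilized: (T, H, W) frames aligned by this layer's affine motion.
    # The temporal median rejects pixels where other layers pass by.
    return np.median(stabilized, axis=0)
```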
90. Results
91. What if the motion is not affine?
- Use a regularized (smooth) motion field [Weiss, CVPR'97]
92. A Layered Approach to Stereo Reconstruction
- Simon Baker, Richard Szeliski, and P. Anandan
- CVPR'98
93. Layered Stereo
- Assign pixels to different layers (objects, sprites)
- Already covered in Stereo Lecture 2
94. Layer extraction from multiple images containing reflections and transparency
- Richard Szeliski, Shai Avidan, and P. Anandan
- CVPR'2000
- (extra bonus material)
95. Transparent motion
- Photograph (Lee) and reflection (Michael)
96. Previous work
- Physics-based vision and polarization [Shafer et al.; Wolff; Nayar et al.]
- Perception of transparency [Adelson]
- Transparent motion estimation [Shizawa & Mase; Bergen et al.; Irani et al.; Darrell & Simoncelli]
- 3-frame layer recovery [Bergen et al.]
97. Problem formulation
- (figure: two layers X and Y, each transformed by its own per-frame motion Motion_X,i and Motion_Y,i, combine into the observed images)
98. Image formation model
- Pure additive mixing of positive signals: m_k(x) = Σ_l (W_kl ∘ f_l)(x), or in operator form m_k = Σ_l W_kl f_l
- Assume the motion is planar (perspective transform, a.k.a. homography)
99. Two processing stages
- Estimate the motions and initial layer estimates
- Compute optimal layer estimates (for known motion)
100. Dominant motion estimation
- Stabilize the sequence by the dominant motion: robust affine [Bergen et al. '92; Szeliski & Shum]
101. Dominant layer estimate
- How do we form the composite (estimate)?
102. Average?
103. Median?
- Hint: all layers are non-negative
104. Min-composite
- The smallest value is an over-estimate of the layer (a numpy sketch follows)
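Since the layers mix additively and are non-negative, per-pixel extrema over the stabilized sequence bound the layers; a minimal numpy sketch of the composites used below (names illustrative):

```python
import numpy as np

def min_composite(stabilized):            # stabilized: (T, H, W)
    return stabilized.min(axis=0)         # over-estimate of this layer

def difference_sequence(stabilized):
    # Subtracting the min-composite leaves an under-estimate of the
    # remaining (secondary) layers in each frame.
    return stabilized - min_composite(stabilized)[None]

def max_composite(stabilized_diff):
    # After stabilizing the secondary motion, the per-pixel maximum
    # under-estimates the secondary layer.
    return stabilized_diff.max(axis=0)
```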
105. Difference sequence
- Subtract the min-composite from the original images: original - min-composite = difference image
106. Min-composite (figure: intensity vs. time; an over-estimate of the background layer)
107. Difference sequence (an under-estimate of the foreground layer)
108. Stabilizing secondary motion
109. Max-composite
- The largest value is an under-estimate of the layer
110. Min-max alternation
- Subtract the secondary layer (under-estimate) from the original sequence
- Re-compute the dominant motion and a better min-composite
- Iterate
- Does this process converge?
111. Min-max alternation
- Does this process converge? In theory, yes:
  - each iteration reduces the number of mis-estimated pixels (tightens the bounds); proof in the paper
112. Min-max alternation
- Does this process converge? In practice, no:
  - resampling errors and noise both lead to divergence; discussion in the paper
113. Two processing stages
- Estimate the motions and initial layer estimates
- Compute optimal layer estimates (for known motion)
114. Optimal estimation
- Recall the additive mixing of positive signals: m_k = Σ_l W_kl f_l
- Use constrained least squares (quadratic programming): min Σ_k ||Σ_l W_kl f_l - m_k||²  s.t.  f_l ≥ 0  (a sketch follows)
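A minimal sketch of this constrained step using non-negative least squares, assuming the warping/mixing operator has been flattened into a dense matrix W mapping the stacked layer pixels f to the stacked observations m (building W from the homographies is omitted here):

```python
import numpy as np
from scipy.optimize import nnls

def recover_layers(W, m):
    # Solves min ||W f - m||^2  subject to  f >= 0.
    f, residual = nnls(W, m)
    return f, residual
```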
115. Least squares example (recovered background and foreground)
116. Uniqueness of solution
- If any layer does not have a black region, i.e., if f_l ≥ c > 0, then this offset can be added to another layer (and subtracted from f_l)
117. Degeneracies in solution
- If the motion is degenerate (e.g., purely horizontal), regions (scanlines) decouple (without an MRF prior)
118. Noise sensitivity
- In general, low-frequency components are hard to recover for small motions (figure: mixed inputs, recovered layers, scaled errors)
119. Three-layer example
- Three layers with general motion work well
120. Complete algorithm
- Dominant motion with min-composites
- Difference (residual) images
- Non-dominant motion on differences
- Improve the motion estimates
- Unconstrained least-squares problem
- Constrained least-squares problem
121. Complete example (original, stabilized)
122. Complete example (difference, stabilized)
123. Final Results
124. Another example (original, stabilized, min-composite, residual x 2)
125. Results: Anne and books (original, background, foreground photo)
126. Transparent layer recovery
- Pure (additive) mixing of intensities
- simple constrained least squares problem
- degeneracies for simple or small motions
- Processing stages
- dominant motion estimation
- min- and max-composites to initialize
- optimization of motion and layers
127. Future work
- Mitigating degeneracies (regularization)
- Opaque layers (alpha estimation)
- Non-planar geometry (parallax)
128. Bibliography
- L. Williams. Pyramidal parametrics. Computer Graphics, 17(3):1-11, July 1983.
- L. G. Brown. A survey of image registration techniques. Computing Surveys, 24(4):325-376, December 1992.
- C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In IEEE 1975 Conference on Cybernetics and Society, pages 163-165, New York, September 1975.
- J. Gomes, L. Darsa, B. Costa, and L. Velho. Warping and Morphing of Graphical Objects. Morgan Kaufmann Publishers, San Francisco, California, 1999.
- G. M. Nielson. Scattered data modeling. IEEE Computer Graphics and Applications, 13(1):60-70, January 1993.
- T. Beier and S. Neely. Feature-based image metamorphosis. Computer Graphics (SIGGRAPH'92), 26(2):35-42, July 1992.
129. Bibliography
- J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV'92, pages 237-252, Italy, May 1992.
- M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1):75-104, 1996.
- H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):814-830, August 1996.
- Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR'97, pages 520-526, June 1997.
130. Bibliography
- J. Y. A. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625-638, September 1994.
- Y. Weiss and E. H. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 321-326, San Francisco, California, June 1996.
- Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'97), pages 520-526, San Juan, Puerto Rico, June 1997.
- P. R. Hsu, P. Anandan, and S. Peleg. Accurate computation of optical flow by using layered motion representations. In Twelfth International Conference on Pattern Recognition (ICPR'94), pages 743-746, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
131. Bibliography
- T. Darrell and A. Pentland. Cooperative robust estimation using layers of support. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):474-487, May 1995.
- S. X. Ju, M. J. Black, and A. D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 307-314, San Francisco, California, June 1996.
- M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions. International Journal of Computer Vision, 12(1):5-16, January 1994.
- H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):814-830, August 1996.
- M.-C. Lee et al. A layered video object coding system using sprite and affine motion model. IEEE Transactions on Circuits and Systems for Video Technology, 7(1):130-145, February 1997.
132. Bibliography
- S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In IEEE CVPR'98, pages 434-441, Santa Barbara, June 1998.
- R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In IEEE CVPR'2000, volume 1, pages 246-253, Hilton Head Island, June 2000.
- J. Shade, S. Gortler, L.-W. He, and R. Szeliski. Layered depth images. In Computer Graphics (SIGGRAPH'98) Proceedings, pages 231-242, Orlando, July 1998. ACM SIGGRAPH.
- S. Laveau and O. D. Faugeras. 3-D scene representation as a collection of images. In Twelfth International Conference on Pattern Recognition (ICPR'94), volume A, pages 689-691, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
- P. H. S. Torr, R. Szeliski, and P. Anandan. An integrated Bayesian approach to layer extraction from image sequences. In Seventh ICCV'99, pages 983-990, Kerkyra, Greece, September 1999.