Title: Motion estimation
1. Motion estimation
- Introduction to Computer Vision
- CS223B, Winter 2005
- Richard Szeliski
2. Why Visual Motion?
- Visual Motion can be annoying
- Camera instabilities, jitter
- Measure it. Remove it.
- Visual Motion indicates dynamics in the scene
- Moving objects, behavior
- Track objects and analyze trajectories
- Visual Motion reveals spatial layout of the scene
- Motion parallax
3. Today's lecture
- Motion estimation
- background: image pyramids, image warping
- application: image morphing
- parametric motion (review)
- optic flow
- layered motion models
4. Image Pyramids
5. Image Pyramids
6. Pyramid Creation
- filter mask
- Gaussian Pyramid
- Laplacian Pyramid
  - created from the Gaussian pyramid by subtraction: L_l = G_l - expand(G_{l+1}) (see the sketch below)
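The subtraction rule above maps directly to code. Here is a minimal NumPy/SciPy sketch of Gaussian and Laplacian pyramid construction with the 5-tap binomial kernel mentioned later in the prefiltering slide; the helper names (blur, reduce_, expand_) are illustrative choices, not from the lecture.

```python
import numpy as np
from scipy import ndimage

KERNEL = np.array([1., 4., 6., 4., 1.]) / 16.0   # 5-tap binomial

def blur(img):
    # Separable low-pass filter: rows, then columns.
    img = ndimage.convolve1d(img, KERNEL, axis=0, mode="reflect")
    return ndimage.convolve1d(img, KERNEL, axis=1, mode="reflect")

def reduce_(img):
    # G_{l+1} = reduce(G_l): blur, then subsample by 2.
    return blur(img)[::2, ::2]

def expand_(img, shape):
    # Upsample by 2 (zero insertion), then interpolate with the same
    # kernel; the factor 4 compensates for the inserted zeros.
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)

def laplacian_pyramid(img, levels):
    gauss = [img.astype(float)]
    for _ in range(levels - 1):
        gauss.append(reduce_(gauss[-1]))
    # L_l = G_l - expand(G_{l+1}); the coarsest level keeps G itself.
    lap = [g - expand_(gn, g.shape) for g, gn in zip(gauss[:-1], gauss[1:])]
    return lap + [gauss[-1]]
```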
7. Octaves in the Spatial Domain
- lowpass images
8. Pyramids
- Advantages of pyramids
- Faster than Fourier transform
- Avoids ringing artifacts
- Many applications
- small images faster to process
- good for multiresolution processing
- compression
- progressive transmission
- Known as mip-maps in graphics community
- Precursor to wavelets
- Wavelets also have these advantages
9. Laplacian level 0 (left pyramid, right pyramid, blended pyramid)
10. Pyramid Blending
11. Image Warping
12. Image Warping
- image filtering: change the range of the image, g(x) = h(f(x))
- image warping: change the domain of the image, g(x) = f(h(x))
13. Image Warping (same definitions, illustrated with example images f and g)
14. Parametric (global) warping
- Examples of parametric warps: translation, rotation, aspect, affine, perspective, cylindrical
15. 2D coordinate transformations
- translation: x' = x + t, where x = (x, y)
- rotation: x' = R x + t
- similarity: x' = s R x + t
- affine: x' = A x + t
- perspective: x' ≅ H x, where x = (x, y, 1) (x is a homogeneous coordinate)
- These all form a nested group (closed under composition, with inverses)
16. Image Warping
- Given a coordinate transform x' = h(x) and a source image f(x), how do we compute a transformed image g(x') = f(h(x))?
17. Forward Warping
- Send each pixel f(x) to its corresponding location x' = h(x) in g(x')
- What if a pixel lands between two pixels?
18. Forward Warping
- Send each pixel f(x) to its corresponding location x' = h(x) in g(x')
- What if a pixel lands between two pixels?
- Answer: add contributions to several pixels and normalize later (splatting; see the sketch below)
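A minimal sketch of forward warping with bilinear splatting, assuming a grayscale float image and a user-supplied warp h that maps pixel grids to continuous destination coordinates; all names here are illustrative.

```python
import numpy as np

def forward_warp(f, h):
    """Splat each source pixel into the destination image."""
    H, W = f.shape
    acc = np.zeros((H, W))          # accumulated intensities
    wgt = np.zeros((H, W))          # accumulated splat weights
    ys, xs = np.mgrid[0:H, 0:W]
    xd, yd = h(xs, ys)              # float destination coordinates
    x0, y0 = np.floor(xd).astype(int), np.floor(yd).astype(int)
    ax, ay = xd - x0, yd - y0       # bilinear fractions
    # Distribute each pixel over its four destination neighbours.
    for dx, dy, w in [(0, 0, (1 - ax) * (1 - ay)), (1, 0, ax * (1 - ay)),
                      (0, 1, (1 - ax) * ay),       (1, 1, ax * ay)]:
        xi, yi = x0 + dx, y0 + dy
        ok = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
        np.add.at(acc, (yi[ok], xi[ok]), (w * f)[ok])
        np.add.at(wgt, (yi[ok], xi[ok]), w[ok])
    return acc / np.maximum(wgt, 1e-8)   # normalize the splats
```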
19. Inverse Warping
- Get each pixel g(x') from its corresponding location x = h^-1(x') in f(x)
- What if a pixel comes from between two pixels?
20. Inverse Warping
- Get each pixel g(x') from its corresponding location x = h^-1(x') in f(x)
- What if a pixel comes from between two pixels?
- Answer: resample the color value from the interpolated (prefiltered) source image
21. Interpolation
- Possible interpolation filters:
  - nearest neighbor
  - bilinear
  - bicubic (interpolating)
  - sinc / FIR
- Needed to prevent jaggies and texture crawl (see demo; a bilinear sketch follows)
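For comparison with the forward-warping sketch above, here is a minimal sketch of inverse warping with bilinear interpolation, assuming h_inv maps each destination pixel back into the source image; names are illustrative, and coordinates are simply clamped at the border.

```python
import numpy as np

def bilinear(f, x, y):
    # Sample f at continuous (x, y), clamping to the image border.
    H, W = f.shape
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    ax, ay = np.clip(x - x0, 0, 1), np.clip(y - y0, 0, 1)
    return ((1 - ax) * (1 - ay) * f[y0, x0] + ax * (1 - ay) * f[y0, x0 + 1] +
            (1 - ax) * ay * f[y0 + 1, x0] + ax * ay * f[y0 + 1, x0 + 1])

def inverse_warp(f, h_inv, out_shape):
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    xs_src, ys_src = h_inv(xs, ys)   # where each output pixel comes from
    return bilinear(f, xs_src, ys_src)
```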
22. Prefiltering
- Essential for downsampling (decimation) to prevent aliasing
- MIP-mapping [Williams '83]:
  - build a pyramid (but what decimation filter?)
    - block averaging
    - Burt & Adelson (5-tap binomial)
    - 7-tap wavelet-based filter (better)
  - trilinear interpolation
    - bilinear within each of 2 adjacent levels
    - linear blend between levels (determined by pixel size)
23. Prefiltering
- Essential for downsampling (decimation) to prevent aliasing
- Other possibilities:
  - summed area tables
  - elliptically weighted Gaussians (EWA) [Heckbert '86]
24. Image Warping: non-parametric
- Specify more detailed warp function
- Examples
- splines
- triangles
- optical flow (per-pixel motion)
25. Image Warping: non-parametric
- Move control points to specify spline warp
26. Image Morphing
27. Image Morphing
- How can we in-between two images?
- Cross-dissolve (all examples from Gomes et al. '99)
28. Image Morphing
- How can we in-between two images?
- Warp, then cross-dissolve = morph
29. Warp specification
- How can we specify the warp?
  - Specify corresponding points
  - interpolate to a complete warping function [Nielson, Scattered Data Modeling, IEEE CG&A '93]
30. Warp specification
- How can we specify the warp?
- Specify corresponding vectors
- interpolate to a complete warping function
31. Warp specification
- How can we specify the warp?
  - Specify corresponding vectors
  - interpolate [Beier & Neely, SIGGRAPH '92]
32. Warp specification
- How can we specify the warp?
- Specify corresponding spline control points
- interpolate to a complete warping function
33. Final Morph Result
34. Motion estimation
35. Classes of Techniques
- Feature-based methods
  - Extract salient visual features (corners, textured areas) and track them over multiple frames
  - Analyze the global pattern of motion vectors of these features
  - Sparse motion fields, but possibly robust tracking
  - Suitable especially when image motion is large (tens of pixels)
- Direct methods
  - Directly recover image motion from spatio-temporal image brightness variations
  - Global motion parameters recovered directly, without an intermediate feature motion calculation
  - Dense motion fields, but more sensitive to appearance variations
  - Suitable for video and when image motion is small (< 10 pixels)
36. The Brightness Constraint
- Brightness Constancy Equation: I(x, y, t) = I(x + u, y + v, t + 1)
- Or, better still, minimize the error: E(u, v) = Σ [I(x + u, y + v, t + 1) - I(x, y, t)]²
- Linearizing (assuming small (u, v)): I_x u + I_y v + I_t ≈ 0
37. Gradient Constraint (or the Optical Flow Constraint)
- ∇I · (u, v) + I_t = 0: each pixel constrains only the component of flow along the image gradient (normal flow)
38. Local Patch Analysis
39. Patch Translation (Lucas-Kanade)
- Assume a single velocity for all pixels within an image patch
- Minimizing E(u, v) = Σ (I_x u + I_y v + I_t)² yields the normal equations (Σ ∇I ∇I^T) (u, v)^T = -Σ ∇I I_t
- LHS: the sum of the 2x2 outer product tensors of the gradient vector (a NumPy sketch follows)
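A minimal sketch of the single-patch Lucas-Kanade solve, assuming two grayscale float frames and simple finite-difference derivatives; the function name and the conditioning threshold are illustrative choices, not from the lecture.

```python
import numpy as np

def lucas_kanade_patch(I0, I1, y, x, half=7):
    # Extract the patch and its spatial/temporal derivatives.
    p0 = I0[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    p1 = I1[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    Iy, Ix = np.gradient(p0)        # spatial gradients (rows, cols)
    It = p1 - p0                    # temporal derivative
    # Normal equations: (sum of gradient outer products) U = -sum grad*It
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    # Aperture problem: M is (near-)singular along an edge or in
    # textureless regions; only normal flow is available there.
    if np.linalg.cond(M) > 1e6:
        return None
    return np.linalg.solve(M, b)    # (u, v)
```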
40. The Aperture Problem
- Let M = Σ ∇I ∇I^T and b = -Σ ∇I I_t
- Algorithm: at each pixel, compute the flow U = (u, v) by solving M U = b
- M is singular if all gradient vectors point in the same direction
  - e.g., along an edge
  - of course, trivially singular if the summation is over a single pixel or there is no texture
  - i.e., only normal flow is available (the aperture problem)
- Corners and textured areas are OK
41. Aperture Problem and Normal Flow
42. Local Patch Analysis
43. Iterative Refinement
- Estimate the velocity at each pixel using one iteration of Lucas-Kanade estimation
- Warp one image toward the other using the estimated flow field (easier said than done)
- Refine the estimate by repeating the process
44-47. Optical Flow: Iterative Estimation (figure sequence: successive estimates converging toward x0)
48. Optical Flow: Iterative Estimation
- Some implementation issues:
  - Warping is not easy (make sure that errors in interpolation and warping are not bigger than the estimate refinement)
  - Warp one image, take derivatives of the other, so you don't need to re-compute the gradient after each iteration
  - Often useful to low-pass filter the images before motion estimation (for better derivative estimation and somewhat better linear approximations to image intensity)
50. Optical Flow: Aliasing
- Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity, i.e., how do we know which correspondence is correct?
  - nearest match is correct (no aliasing)
  - nearest match is incorrect (aliasing)
- To overcome aliasing: coarse-to-fine estimation.
51. Iterative refinement
BUT!!
52. Limits of the gradient method
- Fails when the intensity structure in the window is poor
- Fails when the displacement is large (typical operating range is a motion of about 1 pixel)
  - linearization of brightness is suitable only for small displacements
- Also, brightness is not strictly constant in images
  - actually less problematic than it appears, since we can pre-filter images to make them look similar
53. Coarse-to-Fine Estimation
54. Coarse-to-Fine Estimation
- (figure: build pyramids of images I and J; at each level, warp J toward I using the current flow to get Jw, refine the estimate, and propagate it to the next finer level; a sketch follows)
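A minimal sketch of the coarse-to-fine loop in the figure, assuming a dense one-step estimator flow_step(I, Jw) (e.g., per-pixel Lucas-Kanade) and a backward warp warp(J, u, v) are supplied; all helper names are illustrative.

```python
import numpy as np
from scipy import ndimage

def down(img):
    # Prefilter (to avoid aliasing), then subsample by 2.
    return ndimage.gaussian_filter(img, 1.0)[::2, ::2]

def up2(field, shape):
    # Nearest-neighbour upsampling of a flow component to `shape`.
    out = np.repeat(np.repeat(field, 2, axis=0), 2, axis=1)
    return out[:shape[0], :shape[1]]

def coarse_to_fine_flow(I, J, flow_step, warp, levels=4):
    pyr_I, pyr_J = [I.astype(float)], [J.astype(float)]
    for _ in range(levels - 1):
        pyr_I.append(down(pyr_I[-1]))
        pyr_J.append(down(pyr_J[-1]))
    u = np.zeros(pyr_I[-1].shape)
    v = np.zeros_like(u)
    for Ii, Ji in zip(pyr_I[::-1], pyr_J[::-1]):   # coarsest first
        if u.shape != Ii.shape:
            # Upsample the flow; x2 because displacements double with the grid.
            u = 2.0 * up2(u, Ii.shape)
            v = 2.0 * up2(v, Ii.shape)
        Jw = warp(Ji, u, v)          # warp J toward I with current flow
        du, dv = flow_step(Ii, Jw)   # refine on the residual motion
        u, v = u + du, v + dv
    return u, v
```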
55. Global Motion Models
- 2D Models
  - Affine
  - Quadratic
  - Planar projective transform (Homography)
- 3D Models
  - Instantaneous camera motion models
  - Homography + epipole
  - Plane + Parallax
56. Example: Affine Motion
- u(x, y) = a1 + a2 x + a3 y
- v(x, y) = a4 + a5 x + a6 y
- Substituting into the brightness constancy equation: I_x (a1 + a2 x + a3 y) + I_y (a4 + a5 x + a6 y) + I_t ≈ 0
- Each pixel provides 1 linear constraint in 6 global unknowns (a least-squares sketch follows)
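A minimal sketch of the resulting least-squares problem, assuming precomputed derivative images Ix, Iy, It; each pixel contributes one row in the six unknowns. Names are illustrative.

```python
import numpy as np

def affine_flow(Ix, Iy, It):
    H, W = Ix.shape
    ys, xs = np.mgrid[0:H, 0:W]
    x, y = xs.ravel().astype(float), ys.ravel().astype(float)
    ix, iy, it = Ix.ravel(), Iy.ravel(), It.ravel()
    # One row per pixel: Ix*(a1 + a2 x + a3 y) + Iy*(a4 + a5 x + a6 y) = -It
    A = np.stack([ix, ix * x, ix * y, iy, iy * x, iy * y], axis=1)
    a, *_ = np.linalg.lstsq(A, -it, rcond=None)
    return a   # (a1..a6): u = a1 + a2 x + a3 y, v = a4 + a5 x + a6 y
```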
57. Other 2D Motion Models
58. 3D Motion Models
59. Correlation and SSD
- For larger displacements, do template matching:
  - Define a small area around a pixel as the template
  - Match the template against each pixel within a search area in the next image
  - Use a match measure such as correlation, normalized correlation, or sum-of-squared differences (SSD)
  - Choose the maximum (or minimum) as the match
- Sub-pixel interpolation also possible (a sketch of the SSD search follows)
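A minimal sketch of SSD template matching over a square search window, assuming grayscale float images; the names and window sizes are illustrative.

```python
import numpy as np

def ssd_match(I0, I1, y, x, half=7, search=15):
    T = I0[y - half:y + half + 1, x - half:x + half + 1]   # template
    best, best_uv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            P = I1[yy - half:yy + half + 1, xx - half:xx + half + 1]
            if P.shape != T.shape:      # skip out-of-bounds shifts
                continue
            ssd = np.sum((P - T) ** 2)
            if ssd < best:
                best, best_uv = ssd, (dx, dy)
    return best_uv, best    # displacement minimizing the SSD surface
```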
60. SSD Surface: textured area
61. SSD Surface: edge
62. SSD Surface: homogeneous area
63. Discrete Search vs. Gradient-Based Estimation
- Consider an image I translated by a displacement (u0, v0).
- The discrete search method simply searches for the best estimate; the gradient method linearizes the intensity function and solves for the estimate.
64. Uncertainty in Local Estimation
- Consider an image I translated by a displacement (u0, v0).
- This assumes uniform priors on the velocity field.
65. Quadratic Approximation
- When the displacements are small, after some fiddling around, we can show that the SSD error surface is approximately quadratic in the displacement.
66. Posterior uncertainty
- At edges, the curvature matrix is singular, but just take the pseudo-inverse.
- Note that the error is always convex, since the matrix is positive semi-definite; i.e., this holds even for occluded points and other false matches, which seems a bit odd!
67. Match plus confidence
- Numerically compute the error for various displacements
- Search for the peak
- Numerically fit a quadratic to the error around the peak (a sketch follows)
- Find the sub-pixel estimate and its covariance
- If the fitted curvature matrix is negative, it is a false match
- Or, even better, if you can afford it, simply maintain a discrete sampling of the error surface
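A minimal sketch of the sub-pixel step, fitting a 1-D parabola per axis to the SSD values around the discrete minimum (iy, ix); a non-convex fit signals a false match. Names are illustrative.

```python
import numpy as np

def subpixel_peak(ssd, iy, ix):
    def offset(em, e0, ep):
        denom = em - 2.0 * e0 + ep        # curvature of the parabola
        if denom <= 0:                     # non-convex fit: false match
            return None
        return 0.5 * (em - ep) / denom     # minimum offset, in [-1, 1]
    dx = offset(ssd[iy, ix - 1], ssd[iy, ix], ssd[iy, ix + 1])
    dy = offset(ssd[iy - 1, ix], ssd[iy, ix], ssd[iy + 1, ix])
    if dx is None or dy is None:
        return None
    return ix + dx, iy + dy
```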
68. Quadratic Approximation and Covariance Estimation
69. Correlation Window Size
- Small windows lead to more false matches
- Large windows are better this way, but...
  - Neighboring flow vectors will be more correlated (since the template windows have more in common)
  - Flow resolution is also lower (same reason)
  - More expensive to compute
- Another way to look at this:
  - Small windows are good for local search: more precise and less smooth
  - Large windows are good for global search: less precise and more smooth
70. Optical Flow: Robust Estimation
- Noise distributions are often non-Gaussian, having much heavier tails; noise samples from the tails are called outliers.
- Sources of outliers (multiple motions):
  - specularities / highlights
  - JPEG artifacts / interlacing / motion blur
  - multiple motions (occlusion boundaries, transparency)
72. Robust Estimation
- Standard least squares estimation allows too much influence for outlying points.
73. Robust Estimation
- Robust gradient constraint: E(u, v) = Σ ρ(I_x u + I_y v + I_t)
- Robust SSD: E(u, v) = Σ ρ(I(x + u, y + v, t + 1) - I(x, y, t))
- Here ρ is a robust penalty that grows more slowly than the quadratic (a sketch follows)
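A minimal sketch of robust estimation via iteratively reweighted least squares, using the Geman-McClure penalty ρ(r) = r² / (r² + σ²) on the patch-translation model; the penalty choice and names are mine, not from the lecture.

```python
import numpy as np

def robust_patch_flow(Ix, Iy, It, sigma=1.0, iters=10):
    G = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # per-pixel gradients
    it = It.ravel()
    u = np.zeros(2)
    for _ in range(iters):
        r = G @ u + it                     # per-pixel residuals
        # IRLS weight w(r) = rho'(r)/r for Geman-McClure; outliers in
        # the heavy tails receive tiny weights.
        w = (2.0 * sigma ** 2) / (r ** 2 + sigma ** 2) ** 2
        M = (G * w[:, None]).T @ G
        b = -(G * w[:, None]).T @ it
        u = np.linalg.solve(M, b)
    return u   # robust (u, v)
```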
74. Robust Estimation
75. Layered Motion Models
- Layered models provide a 2.5-D representation, like cardboard cutouts.
- Key players:
  - intensity (appearance)
  - alpha map (opacity)
  - warp maps (motion)
76. Layered Scene Representations
77. Motion representations
- How can we describe this scene?
78. Block-based motion prediction
- Break the image up into square blocks
- Estimate a translation for each block
- Use this to predict the next frame, and code the difference (MPEG-2)
79. Layered motion
- Break the image sequence up into layers
- ?
- Describe each layer's motion
80. Outline
- Why layers?
- 2-D layers [Wang & Adelson '94; Weiss '97]
- 3-D layers [Baker et al. '98]
- Layered Depth Images [Shade et al. '98]
- Transparency [Szeliski et al. '00]
81. Layered motion
- Advantages:
  - can represent occlusions / disocclusions
  - each layer's motion can be smooth
  - video segmentation for semantic processing
- Difficulties:
  - how do we determine the correct number of layers?
  - how do we assign pixels?
  - how do we model the motion?
82. Layers for video summarization
83. Background modeling (MPEG-4)
- Convert masked images into a background sprite for layered video coding
84. What are layers?
- [Wang & Adelson, 1994]
  - intensities
  - alphas
  - velocities
85. How do we composite them?
86. How do we form them?
87. How do we form them?
88. How do we estimate the layers?
- Compute coarse-to-fine flow
- Estimate affine motion in blocks (regression)
- Cluster with k-means
- Assign pixels to the best-fitting affine region
- Re-estimate the affine motions in each region (see the sketch below)
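A minimal sketch of the first three steps, assuming a dense flow field (u, v) is already available; the block size and the use of SciPy's kmeans2 are my choices, not from the lecture. Pixel assignment and re-estimation would iterate on top of this.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def block_affine_params(u, v, block=16):
    params = []
    H, W = u.shape
    for y0 in range(0, H - block + 1, block):
        for x0 in range(0, W - block + 1, block):
            ys, xs = np.mgrid[y0:y0 + block, x0:x0 + block]
            A = np.stack([np.ones(block * block),
                          xs.ravel().astype(float),
                          ys.ravel().astype(float)], axis=1)
            au, *_ = np.linalg.lstsq(A, u[y0:y0+block, x0:x0+block].ravel(), rcond=None)
            av, *_ = np.linalg.lstsq(A, v[y0:y0+block, x0:x0+block].ravel(), rcond=None)
            params.append(np.concatenate([au, av]))   # 6-vector per block
    return np.array(params)

def motion_hypotheses(u, v, k=4):
    P = block_affine_params(u, v)
    centers, _ = kmeans2(P, k, minit="++")   # k affine motion hypotheses
    return centers
```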
89. Layer synthesis
- For each layer:
  - stabilize the sequence with the affine motion
  - compute the median value at each pixel (see the sketch below)
- Determine occlusion relationships
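The median step is a one-liner once the per-layer stabilized frames are stacked along a time axis; a minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def layer_image(stabilized):
    # stabilized: (T, H, W) frames aligned by this layer's affine motion.
    # The temporal median rejects pixels where other layers pass by.
    return np.median(stabilized, axis=0)
```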
90. Results
91. What if the motion is not affine?
- Use a regularized (smooth) motion field [Weiss, CVPR'97]
92. A Layered Approach to Stereo Reconstruction
- Simon Baker, Richard Szeliski, and P. Anandan
- CVPR'98
93. Layered Stereo
- Assign pixels to different layers (objects, sprites)
- Already covered in Stereo Lecture 2
94. Layer extraction from multiple images containing reflections and transparency
- Richard Szeliski, Shai Avidan, and P. Anandan
- CVPR'2000
- (extra bonus material)
95. Transparent motion
- Photograph (Lee) and reflection (Michael)
96. Previous work
- Physics-based vision and polarization [Shafer et al.; Wolff; Nayar et al.]
- Perception of transparency [Adelson]
- Transparent motion estimation [Shizawa & Mase; Bergen et al.; Irani et al.; Darrell & Simoncelli]
- 3-frame layer recovery [Bergen et al.]
97. Problem formulation
- (figure: two layers X and Y, each transformed by its own per-frame motion Motion_X,i and Motion_Y,i, combine into the observed images)
98. Image formation model
- Pure additive mixing of positive signals: m_k(x) = Σ_l (W_kl ∘ f_l)(x), or in operator form m_k = Σ_l W_kl f_l
- Assume the motion is planar (perspective transform, a.k.a. homography)
99. Two processing stages
- Estimate the motions and initial layer estimates
- Compute optimal layer estimates (for known motion)
100. Dominant motion estimation
- Stabilize the sequence by the dominant motion: robust affine [Bergen et al. '92; Szeliski & Shum]
101. Dominant layer estimate
- How do we form the composite (estimate)?
102. Average?
103. Median?
- Hint: all layers are non-negative
104. Min-composite
- The smallest value is an over-estimate of the layer (a numpy sketch follows)
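Since the layers mix additively and are non-negative, per-pixel extrema over the stabilized sequence bound the layers; a minimal numpy sketch of the composites used below (names illustrative):

```python
import numpy as np

def min_composite(stabilized):            # stabilized: (T, H, W)
    return stabilized.min(axis=0)         # over-estimate of this layer

def difference_sequence(stabilized):
    # Subtracting the min-composite leaves an under-estimate of the
    # remaining (secondary) layers in each frame.
    return stabilized - min_composite(stabilized)[None]

def max_composite(stabilized_diff):
    # After stabilizing the secondary motion, the per-pixel maximum
    # under-estimates the secondary layer.
    return stabilized_diff.max(axis=0)
```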
105. Difference sequence
- Subtract the min-composite from the original images: original - min-composite = difference image
106. Min-composite (figure: intensity vs. time; an over-estimate of the background layer)
107. Difference sequence (an under-estimate of the foreground layer)
108. Stabilizing secondary motion
109. Max-composite
- The largest value is an under-estimate of the layer
110. Min-max alternation
- Subtract the secondary layer (under-estimate) from the original sequence
- Re-compute the dominant motion and a better min-composite
- Iterate
- Does this process converge?
111. Min-max alternation
- Does this process converge? In theory, yes:
  - each iteration reduces the number of mis-estimated pixels (tightens the bounds); proof in the paper
112. Min-max alternation
- Does this process converge? In practice, no:
  - resampling errors and noise both lead to divergence; discussion in the paper
113. Two processing stages
- Estimate the motions and initial layer estimates
- Compute optimal layer estimates (for known motion)
114. Optimal estimation
- Recall the additive mixing of positive signals: m_k = Σ_l W_kl f_l
- Use constrained least squares (quadratic programming): min Σ_k ||Σ_l W_kl f_l - m_k||²  s.t.  f_l ≥ 0  (a sketch follows)
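A minimal sketch of this constrained step using non-negative least squares, assuming the warping/mixing operator has been flattened into a dense matrix W mapping the stacked layer pixels f to the stacked observations m (building W from the homographies is omitted here):

```python
import numpy as np
from scipy.optimize import nnls

def recover_layers(W, m):
    # Solves min ||W f - m||^2  subject to  f >= 0.
    f, residual = nnls(W, m)
    return f, residual
```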
115. Least squares example (recovered background and foreground)
116. Uniqueness of solution
- If any layer does not have a black region, i.e., if f_l ≥ c > 0, then this offset can be added to another layer (and subtracted from f_l)
117. Degeneracies in solution
- If the motion is degenerate (e.g., purely horizontal), regions (scanlines) decouple (without an MRF prior)
118. Noise sensitivity
- In general, low-frequency components are hard to recover for small motions (figure: mixed inputs, recovered layers, scaled errors)
119. Three-layer example
- Three layers with general motion work well
120. Complete algorithm
- Dominant motion with min-composites
- Difference (residual) images
- Non-dominant motion on differences
- Improve the motion estimates
- Unconstrained least-squares problem
- Constrained least-squares problem
121. Complete example (original, stabilized)
122. Complete example (difference, stabilized)
123. Final Results
124. Another example (original, stabilized, min-composite, residual x 2)
125. Results: Anne and books (original, background, foreground photo)
126. Transparent layer recovery
- Pure (additive) mixing of intensities
- simple constrained least squares problem
- degeneracies for simple or small motions
- Processing stages
- dominant motion estimation
- min- and max-composites to initialize
- optimization of motion and layers
127. Future work
- Mitigating degeneracies (regularization)
- Opaque layers (alpha estimation)
- Non-planar geometry (parallax)
128. Bibliography
- L. Williams. Pyramidal parametrics. Computer Graphics, 17(3):1-11, July 1983.
- L. G. Brown. A survey of image registration techniques. Computing Surveys, 24(4):325-376, December 1992.
- C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In IEEE 1975 Conference on Cybernetics and Society, pages 163-165, New York, September 1975.
- J. Gomes, L. Darsa, B. Costa, and L. Velho. Warping and Morphing of Graphical Objects. Morgan Kaufmann Publishers, San Francisco, California, 1999.
- G. M. Nielson. Scattered data modeling. IEEE Computer Graphics and Applications, 13(1):60-70, January 1993.
- T. Beier and S. Neely. Feature-based image metamorphosis. Computer Graphics (SIGGRAPH'92), 26(2):35-42, July 1992.
129. Bibliography
- J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV'92, pages 237-252, Italy, May 1992.
- M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1):75-104, 1996.
- H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):814-830, August 1996.
- Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR'97, pages 520-526, June 1997.
130. Bibliography
- J. Y. A. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625-638, September 1994.
- Y. Weiss and E. H. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 321-326, San Francisco, California, June 1996.
- Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'97), pages 520-526, San Juan, Puerto Rico, June 1997.
- P. R. Hsu, P. Anandan, and S. Peleg. Accurate computation of optical flow by using layered motion representations. In Twelfth International Conference on Pattern Recognition (ICPR'94), pages 743-746, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
131. Bibliography
- T. Darrell and A. Pentland. Cooperative robust estimation using layers of support. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):474-487, May 1995.
- S. X. Ju, M. J. Black, and A. D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 307-314, San Francisco, California, June 1996.
- M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions. International Journal of Computer Vision, 12(1):5-16, January 1994.
- H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):814-830, August 1996.
- M.-C. Lee et al. A layered video object coding system using sprite and affine motion model. IEEE Transactions on Circuits and Systems for Video Technology, 7(1):130-145, February 1997.
132. Bibliography
- S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In IEEE CVPR'98, pages 434-441, Santa Barbara, June 1998.
- R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In IEEE CVPR'2000, volume 1, pages 246-253, Hilton Head Island, June 2000.
- J. Shade, S. Gortler, L.-W. He, and R. Szeliski. Layered depth images. In Computer Graphics (SIGGRAPH'98) Proceedings, pages 231-242, Orlando, July 1998. ACM SIGGRAPH.
- S. Laveau and O. D. Faugeras. 3-D scene representation as a collection of images. In Twelfth International Conference on Pattern Recognition (ICPR'94), volume A, pages 689-691, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
- P. H. S. Torr, R. Szeliski, and P. Anandan. An integrated Bayesian approach to layer extraction from image sequences. In Seventh ICCV'99, pages 983-990, Kerkyra, Greece, September 1999.