1
Motion estimation
  • Introduction to Computer Vision, CS223B, Winter 2005
  • Richard Szeliski

2
Why Visual Motion?
  • Visual Motion can be annoying
  • Camera instabilities, jitter
  • Measure it. Remove it.
  • Visual Motion indicates dynamics in the scene
  • Moving objects, behavior
  • Track objects and analyze trajectories
  • Visual Motion reveals spatial layout of the scene
  • Motion parallax

3
Today's lecture
  • Motion estimation
  • background: image pyramids, image warping
  • application: image morphing
  • parametric motion (review)
  • optic flow
  • layered motion models

4
Image Pyramids
5
Image Pyramids
6
Pyramid Creation
  • Gaussian Pyramid: repeatedly low-pass filter (filter mask) and subsample
  • Laplacian Pyramid
  • Created from Gaussian pyramid by subtraction: $L_l = G_l - \mathrm{expand}(G_{l+1})$
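A minimal 1-D sketch of Gaussian and Laplacian pyramid construction follows. It assumes the 5-tap binomial kernel (1, 4, 6, 4, 1)/16 listed under Prefiltering (Burt & Adelson), clamped borders, and zero-insertion upsampling for `expand`. All function names are illustrative. Each Laplacian level stores exactly what `expand` fails to predict, so adding the levels back reproduces the input.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # 5-tap binomial filter

def blur(signal):
    """Convolve with KERNEL, clamping indices at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def reduce_level(signal):
    """Low-pass filter, then keep every other sample (G_l -> G_{l+1})."""
    return blur(signal)[::2]

def expand(signal, length):
    """Upsample to `length` by zero insertion, then interpolate with KERNEL.
    The factor 2 compensates for the inserted zeros."""
    up = [0.0] * length
    for i, v in enumerate(signal):
        up[2 * i] = 2.0 * v
    return blur(up)

def gaussian_pyramid(signal, levels):
    pyr = [list(signal)]
    for _ in range(levels - 1):
        pyr.append(reduce_level(pyr[-1]))
    return pyr

def laplacian_pyramid(signal, levels):
    g = gaussian_pyramid(signal, levels)
    lap = []
    for l in range(levels - 1):
        e = expand(g[l + 1], len(g[l]))
        lap.append([a - b for a, b in zip(g[l], e)])  # L_l = G_l - expand(G_{l+1})
    lap.append(g[-1])  # coarsest level keeps the low-pass residual
    return lap

def reconstruct(lap):
    """Invert the Laplacian pyramid: G_l = L_l + expand(G_{l+1})."""
    sig = lap[-1]
    for level in reversed(lap[:-1]):
        e = expand(sig, len(level))
        sig = [a + b for a, b in zip(level, e)]
    return sig

if __name__ == "__main__":
    signal = [float(i % 7) for i in range(32)]
    lap = laplacian_pyramid(signal, 4)
    print([len(level) for level in lap])  # [32, 16, 8, 4]
    rebuilt = reconstruct(lap)
    print(max(abs(a - b) for a, b in zip(signal, rebuilt)) < 1e-9)  # True
```

Reconstruction is exact by construction, whatever filter `expand` uses. The kernel choice only affects how well each band is separated.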

7
Octaves in the Spatial Domain
  • Lowpass images
  • Bandpass images

8
Pyramids
  • Advantages of pyramids
  • Faster than Fourier transform
  • Avoids ringing artifacts
  • Many applications
  • small images faster to process
  • good for multiresolution processing
  • compression
  • progressive transmission
  • Known as MIP-maps in the graphics community
  • Precursor to wavelets
  • Wavelets also have these advantages

9
(figure panels: Laplacian level 0; left pyramid; right pyramid; blended pyramid)
10
Pyramid Blending
11
Image Warping
12
Image Warping
  • image filtering: change range of image
  • $g(x) = h(f(x))$
  • image warping: change domain of image
  • $g(x) = f(h(x))$

13
Image Warping
14
Parametric (global) warping
  • Examples of parametric warps

  • aspect, rotation, translation, perspective, cylindrical, affine
15
2D coordinate transformations
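  • translation: $\mathbf{x}' = \mathbf{x} + \mathbf{t}$, with $\mathbf{x} = (x, y)$
  • rotation: $\mathbf{x}' = R\mathbf{x} + \mathbf{t}$
  • similarity: $\mathbf{x}' = sR\mathbf{x} + \mathbf{t}$
  • affine: $\mathbf{x}' = A\mathbf{x} + \mathbf{t}$
  • perspective: $\underline{\mathbf{x}}' \cong H\underline{\mathbf{x}}$, with $\underline{\mathbf{x}} = (x, y, 1)$
    ($\underline{\mathbf{x}}$ is a homogeneous coordinate; $\cong$ denotes equality up to scale)
  • These all form a nested group (each is closed under composition and inversion)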

16
Image Warping
  • Given a coordinate transform $x' = h(x)$ and a
    source image $f(x)$, how do we compute a
    transformed image $g(x') = f(x)$?

17
Forward Warping
  • Send each pixel $f(x)$ to its corresponding
    location $x' = h(x)$ in $g(x')$
  • What if pixel lands between two pixels?

18
Forward Warping
  • Send each pixel $f(x)$ to its corresponding
    location $x' = h(x)$ in $g(x')$
  • What if pixel lands between two pixels?
  • Answer: add contribution to several pixels,
    normalize later (splatting)
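Here is a sketch of forward warping with splatting for a grayscale image stored as a list of rows. It assumes bilinear splat weights, which is one possible kernel. The name `forward_warp_splat` and the use of `None` to mark holes are illustrative choices.

```python
import math

def forward_warp_splat(src, h, out_h, out_w):
    """Forward-warp a grayscale image (list of rows) by bilinear splatting.

    Each source pixel f(x) is sent to x' = h(x); its value is shared among
    the four target pixels around x' with bilinear weights. Values and
    weights are accumulated separately and normalized at the end.
    Target pixels that receive no contribution are holes (None).
    """
    acc = [[0.0] * out_w for _ in range(out_h)]
    wsum = [[0.0] * out_w for _ in range(out_h)]
    for y, row in enumerate(src):
        for x, value in enumerate(row):
            xp, yp = h(x, y)
            x0, y0 = math.floor(xp), math.floor(yp)
            fx, fy = xp - x0, yp - y0
            for dx, dy, w in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                              (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
                tx, ty = x0 + dx, y0 + dy
                if w > 0 and 0 <= tx < out_w and 0 <= ty < out_h:
                    acc[ty][tx] += w * value
                    wsum[ty][tx] += w
    # Normalize later: divide accumulated values by accumulated weights.
    return [[acc[r][c] / wsum[r][c] if wsum[r][c] > 0 else None
             for c in range(out_w)] for r in range(out_h)]
```

For example, shifting the row `[1.0, 2.0, 3.0]` one pixel right into a 4-pixel output leaves a hole at the left: `[None, 1.0, 2.0, 3.0]`. The holes show why forward warping alone is often unattractive.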

19
Inverse Warping
  • Get each pixel $g(x')$ from its corresponding
    location $x = h^{-1}(x')$ in $f(x)$
  • What if pixel comes from between two pixels?

20
Inverse Warping
  • Get each pixel $g(x')$ from its corresponding
    location $x = h^{-1}(x')$ in $f(x)$
  • What if pixel comes from between two pixels?
  • Answer: resample color value from interpolated
    (prefiltered) source image
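A sketch of inverse warping with bilinear resampling follows, assuming a grayscale list-of-rows image and a caller-supplied inverse map `h_inv`. Both function names are hypothetical. No prefiltering is done here, so the sketch is adequate for magnification but would alias under strong minification (see Prefiltering).

```python
import math

def bilinear_sample(img, x, y):
    """Bilinearly interpolate img (list of rows) at real-valued (x, y).
    Returns None when (x, y) falls outside the image."""
    h, w = len(img), len(img[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return None
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom

def inverse_warp(src, h_inv, out_h, out_w):
    """For every target pixel x', fetch f(h^{-1}(x')) by interpolation."""
    return [[bilinear_sample(src, *h_inv(xp, yp)) for xp in range(out_w)]
            for yp in range(out_h)]
```

Because every target pixel is visited exactly once, inverse warping produces no holes. This is the usual reason to prefer it when $h^{-1}$ is available.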

21
Interpolation
  • Possible interpolation filters
  • nearest neighbor
  • bilinear
  • bicubic (interpolating)
  • sinc / FIR
  • Needed to prevent jaggies and texture crawl
    (see demo)

22
Prefiltering
  • Essential for downsampling (decimation) to
    prevent aliasing
  • MIP-mapping [Williams 83]
  • build pyramid (but what decimation filter?)
  • block averaging
  • Burt & Adelson (5-tap binomial)
  • 7-tap wavelet-based filter (better)
  • trilinear interpolation
  • bilinear within each of the 2 adjacent levels
  • linear blend between levels (determined by pixel
    size)

23
Prefiltering
  • Essential for downsampling (decimation) to
    prevent aliasing
  • Other possibilities
  • summed area tables
  • elliptically weighted Gaussians (EWA)
    [Heckbert 86]
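A summed area table lets any axis-aligned box average be computed with four lookups, whatever the box size. That property makes it usable as a prefilter when minifying. The sketch below covers only box filtering; the names and the half-open rectangle convention are illustrative assumptions.

```python
def summed_area_table(img):
    """S[y][x] = sum of img over rows < y and columns < x (one-pixel zero border)."""
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def box_average(S, x0, y0, x1, y1):
    """Mean of img over the half-open rectangle [x0, x1) x [y0, y1), using 4 lookups."""
    total = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))
```

For a 3x3 image with values 1 to 9, the average over the whole image is 5.0. The lower-right 2x2 block averages to 7.0.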

24
Image Warping: non-parametric
  • Specify more detailed warp function
  • Examples
  • splines
  • triangles
  • optical flow (per-pixel motion)

25
Image Warping: non-parametric
  • Move control points to specify spline warp

26
Image Morphing
27
Image Morphing
  • How can we in-between two images?
  • Cross-dissolve (all examples from [Gomes
    et al. 99])

28
Image Morphing
  • How can we in-between two images?
  • Warp, then cross-dissolve: morph

29
Warp specification
  • How can we specify the warp?
  • Specify corresponding points
  • interpolate to a complete warping
    function
  • [Nielson, Scattered Data Modeling, IEEE CGA 93]

30
Warp specification
  • How can we specify the warp?
  • Specify corresponding vectors
  • interpolate to a complete warping function

31
Warp specification
  • How can we specify the warp?
  • Specify corresponding vectors
  • interpolate [Beier & Neely, SIGGRAPH 92]

32
Warp specification
  • How can we specify the warp?
  • Specify corresponding spline control points
  • interpolate to a complete warping function

33
Final Morph Result
34
Motion estimation
35
Classes of Techniques
  • Feature-based methods
  • Extract salient visual features (corners,
    textured areas) and track them over multiple
    frames
  • Analyze the global pattern of motion vectors of
    these features
  • Sparse motion fields, but possibly robust
    tracking
  • Suitable especially when image motion is large
    (tens of pixels)
  • Direct methods
  • Directly recover image motion from
    spatio-temporal image brightness variations
  • Global motion parameters directly recovered
    without an intermediate feature motion
    calculation
  • Dense motion fields, but more sensitive to
    appearance variations
  • Suitable for video and when image motion is small
    (< 10 pixels)

36
The Brightness Constraint
  • Brightness Constancy Equation

Or, better still, minimize the squared brightness difference
Linearizing (assuming small (u,v))
37
Gradient Constraint (or the Optical Flow Constraint)
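The equations on the last two slides were lost in extraction. In one common notation, the chain from brightness constancy to the gradient constraint reads as follows. The notation is an assumption: frames are indexed by $t$, the displacement is $(u, v)$, and $I_t$ is the frame difference. The original slides may use a different sign convention.

```latex
% Brightness constancy (frame t to t+1, displacement (u, v)):
I(x + u,\, y + v,\, t + 1) = I(x, y, t)

% Better still, minimize the squared error over the displacement:
E(u, v) = \sum_{x, y} \big[ I(x + u,\, y + v,\, t + 1) - I(x, y, t) \big]^2

% Linearizing for small (u, v) with a first-order Taylor expansion:
I(x + u,\, y + v,\, t + 1) \approx I(x, y, t + 1) + I_x u + I_y v

% Substituting, with I_t = I(x, y, t + 1) - I(x, y, t), gives the gradient constraint:
I_x u + I_y v + I_t = 0 \quad\Longleftrightarrow\quad \nabla I \cdot (u, v)^T + I_t = 0
```

This is one equation in two unknowns per pixel, which is why a single pixel cannot determine $(u, v)$.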
38
Local Patch Analysis
39
Patch Translation [Lucas-Kanade]
Assume a single velocity for all pixels within an
image patch.
Minimizing the summed squared gradient-constraint error over the patch gives a 2x2 linear system.
LHS: sum of the 2x2 outer product tensor of the
gradient vector
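Below, the minimization and its 2x2 normal equations are written out. This assumes the gradient constraint $I_x u + I_y v + I_t = 0$ holds at every pixel of the patch $P$ for one shared $(u, v)$. The left-hand matrix is the summed outer-product tensor $\sum \nabla I \, \nabla I^T$.

```latex
% Sum the squared gradient constraint over the patch P:
E(u, v) = \sum_{(x, y) \in P} \big( I_x u + I_y v + I_t \big)^2

% Setting \partial E / \partial u = \partial E / \partial v = 0:
\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
= -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}
```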
40
The Aperture Problem
Let $M = \sum (\nabla I)(\nabla I)^T$ and $b = -\sum I_t \nabla I$
  • Algorithm: at each pixel compute $U = (u, v)^T$ by
    solving $MU = b$
  • M is singular if all gradient vectors point in
    the same direction
  • e.g., along an edge
  • of course, trivially singular if the summation
    is over a single pixel or there is no texture
  • i.e., only normal flow is available (aperture
    problem)
  • Corners and textured areas are OK
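Here is a sketch of the per-patch solve, including a test for the singular case described above. It assumes the derivatives $I_x, I_y, I_t$ have already been computed for the patch pixels. The relative threshold `eps` is a hypothetical choice, not a value from the slides.

```python
def lucas_kanade_patch(Ix, Iy, It, eps=1e-6):
    """Solve M [u, v]^T = b for one patch.

    Ix, Iy, It: lists of spatial and temporal derivatives at the patch pixels.
    M = sum of grad(I) grad(I)^T, b = -sum of I_t grad(I).
    Returns (u, v), or None when M is (near-)singular, i.e. the aperture
    problem: all gradients point the same way or there is no texture.
    """
    m11 = sum(gx * gx for gx in Ix)
    m12 = sum(gx * gy for gx, gy in zip(Ix, Iy))
    m22 = sum(gy * gy for gy in Iy)
    b1 = -sum(gx * gt for gx, gt in zip(Ix, It))
    b2 = -sum(gy * gt for gy, gt in zip(Iy, It))
    det = m11 * m22 - m12 * m12
    # Scale-aware singularity test: compare det with the squared trace of M.
    if det <= eps * (m11 + m22) ** 2:
        return None
    u = (m22 * b1 - m12 * b2) / det
    v = (m11 * b2 - m12 * b1) / det
    return u, v
```

With gradients that all lie along one direction (an edge), `det` is zero and the function returns `None`. In that case only the normal flow along the gradient direction could be recovered.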

41
Aperture Problem and Normal Flow
42
Local Patch Analysis
43
Iterative Refinement
  • Estimate velocity at each pixel using one
    iteration of Lucas and Kanade estimation
  • Warp one image toward the other using the
    estimated flow field
  • (easier said than done)
  • Refine estimate by repeating the process
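A 1-D sketch of this loop follows. It assumes the second signal is a pure translation of the first and uses linear interpolation for the warp. The name `estimate_shift`, the fixed iteration count, and the border `margin` are illustrative choices. Following the implementation notes on the following slides, the gradient is taken once from the unwarped signal `I`, and only `J` is re-warped each iteration.

```python
import math

def interp(signal, x):
    """Linear interpolation of a 1-D list at real position x (clamped to range)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(math.floor(x)), len(signal) - 2)
    t = x - i
    return (1 - t) * signal[i] + t * signal[i + 1]

def estimate_shift(I, J, iters=10, margin=4):
    """Iteratively estimate d such that J(x + d) ~= I(x)."""
    n = len(I)
    # Central-difference gradient of I, computed once (not per iteration).
    Ix = [0.0] * n
    for x in range(1, n - 1):
        Ix[x] = 0.5 * (I[x + 1] - I[x - 1])
    d = 0.0
    for _ in range(iters):
        num = den = 0.0
        for x in range(margin, n - margin):
            It = interp(J, x + d) - I[x]  # residual after warping J by current d
            num += Ix[x] * It
            den += Ix[x] * Ix[x]
        d -= num / den  # one Lucas-Kanade (Gauss-Newton) update
    return d

if __name__ == "__main__":
    f = lambda x: math.sin(2 * math.pi * x / 32) + 0.5 * math.sin(2 * math.pi * x / 13)
    I = [f(x) for x in range(64)]
    J = [f(x - 1.5) for x in range(64)]  # I moved right by 1.5 samples
    print(estimate_shift(I, J))  # close to the true shift of 1.5
```

A single linearization step undershoots here because the shift is a sizeable fraction of the shortest wavelength. The repeated warp-and-refine steps remove most of the remaining error.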

44
Optical Flow Iterative Estimation
45
Optical Flow Iterative Estimation
46
Optical Flow Iterative Estimation
47
Optical Flow Iterative Estimation
48
Optical Flow Iterative Estimation
  • Some Implementation Issues
  • warping is not easy (make sure that errors in
    interpolation and warping are not bigger than the
    estimate refinement)
  • warp one image, take derivatives of the other so
    you don't need to re-compute the gradient after
    each iteration.
  • often useful to low-pass filter the images before
    motion estimation (for better derivative
    estimation, and somewhat better linear
    approximations to image intensity)

49
Optical Flow Iterative Estimation

50
Optical Flow Aliasing
Temporal aliasing causes ambiguities in optical
flow because images can have many pixels with the
same intensity. I.e., how do we know which
correspondence is correct?
nearest match is correct (no aliasing)
nearest match is incorrect (aliasing)
To overcome aliasing: coarse-to-fine estimation.
51
Iterative refinement
BUT!!
52
Limits of the gradient method
  • Fails when intensity structure in window is poor
  • Fails when the displacement is large (typical
    operating range is motion of 1 pixel)
  • Linearization of brightness is suitable only for
    small displacements
  • Also, brightness is not strictly constant in
    images
  • actually less problematic than it appears, since
    we can pre-filter images to make them look similar

53
Coarse-to-Fine Estimation
54
Coarse-to-Fine Estimation
(figure: pyramid construction for images I and J; at each level, warp J into Jw, then refine the estimate)

55
Global Motion Models
  • 2D Models
  • Affine
  • Quadratic
  • Planar projective transform (Homography)
  • 3D Models
  • Instantaneous camera motion models
  • Homography + epipole
  • Plane + Parallax

56
Example Affine Motion
  • Substituting into the brightness constancy (B.C.) equation

Each pixel provides 1 linear constraint in 6
global unknowns
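Here is a sketch of the least-squares solve for the six global unknowns. It assumes the parameterization $u = a_1 + a_2 x + a_3 y$, $v = a_4 + a_5 x + a_6 y$. Substituting into $I_x u + I_y v + I_t = 0$ then gives one row $[I_x, xI_x, yI_x, I_y, xI_y, yI_y] \cdot \mathbf{a} = -I_t$ per pixel. The parameter ordering and the helper names are illustrative, and a plain Gaussian elimination stands in for a library solver.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def affine_flow(samples):
    """Least-squares affine motion from gradient constraints.

    samples: iterable of (x, y, Ix, Iy, It) per pixel.
    Model: u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y, so each pixel gives
    [Ix, x*Ix, y*Ix, Iy, x*Iy, y*Iy] . a = -It.
    """
    AtA = [[0.0] * 6 for _ in range(6)]
    Atb = [0.0] * 6
    for x, y, Ix, Iy, It in samples:
        row = [Ix, x * Ix, y * Ix, Iy, x * Iy, y * Iy]
        for i in range(6):
            Atb[i] += row[i] * (-It)
            for j in range(6):
                AtA[i][j] += row[i] * row[j]
    return solve_linear(AtA, Atb)
```

Unlike the patch-based translation solve, the 6x6 normal equations accumulate over the whole region. Pixels on different edges can therefore jointly constrain the motion even where each one alone suffers from the aperture problem.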
57
Other 2D Motion Models
58
3D Motion Models
59
Correlation and SSD
  • For larger displacements, do template matching
  • Define a small area around a pixel as the
    template
  • Match the template against each pixel within a
    search area in next image.
  • Use a match measure such as correlation,
    normalized correlation, or sum of squared
    differences (SSD)
  • Choose the maximum (or minimum) as the match
  • Sub-pixel interpolation also possible
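The sketch below runs an exhaustive SSD search and then refines the result to sub-pixel accuracy. It assumes integer candidate offsets supplied by the caller and fits a separate 1-D parabola along x and y, a simplification of the full 2-D quadratic fit discussed later. All names are hypothetical.

```python
def ssd(img, tmpl, ox, oy):
    """Sum of squared differences between tmpl and img placed at offset (ox, oy)."""
    return sum((img[oy + r][ox + c] - tmpl[r][c]) ** 2
               for r in range(len(tmpl)) for c in range(len(tmpl[0])))

def parabola_offset(e_minus, e_0, e_plus):
    """Vertex of the parabola through (-1, e_minus), (0, e_0), (1, e_plus)."""
    denom = e_minus - 2 * e_0 + e_plus
    return 0.0 if denom <= 0 else 0.5 * (e_minus - e_plus) / denom

def match_template(img, tmpl, search):
    """Exhaustive SSD search over offsets in `search` (list of (ox, oy)),
    followed by 1-D parabolic sub-pixel refinement along x and y."""
    best = min(search, key=lambda o: ssd(img, tmpl, *o))
    ox, oy = best
    e0 = ssd(img, tmpl, ox, oy)
    dx = dy = 0.0
    if (ox - 1, oy) in search and (ox + 1, oy) in search:
        dx = parabola_offset(ssd(img, tmpl, ox - 1, oy), e0, ssd(img, tmpl, ox + 1, oy))
    if (ox, oy - 1) in search and (ox, oy + 1) in search:
        dy = parabola_offset(ssd(img, tmpl, ox, oy - 1), e0, ssd(img, tmpl, ox, oy + 1))
    return best, (ox + dx, oy + dy)
```

Because the integer minimum is never larger than its neighbors, each parabolic correction stays within half a pixel of it.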

60
SSD Surface: Textured area
61
SSD Surface: Edge
62
SSD Surface: Homogeneous area
63
Discrete Search vs. Gradient Based Estimation
Consider image I translated by an unknown displacement.
The discrete search method simply searches for
the best estimate. The gradient method linearizes
the intensity function and solves for the estimate.
64
Uncertainty in Local Estimation
Consider image I translated by an unknown displacement.
The estimate assumes uniform priors on the velocity field.
65
Quadratic Approximation
When the residual displacement is small,
after some fiddling around, we can show that the error is approximately quadratic in the displacement.
66
Posterior uncertainty
At edges, the gradient matrix M is singular, but we can just take the
pseudo-inverse.
Note that the error is always convex, since M
is positive semi-definite, i.e., even for
occluded points and other false matches this is
the case. This seems a bit odd!
67
Match plus confidence
  • Numerically compute the error for various displacements
  • Search for the peak
  • Numerically fit a quadratic to the error around the
    peak
  • Find the sub-pixel estimate for the displacement and its covariance
  • If the fitted matrix is negative (not positive definite), it is a false match
  • Or even better, if you can afford it, simply
    maintain a discrete sampling of the displacements and their errors

68
Quadratic Approximation and Covariance Estimation
69
Correlation Window Size
  • Small windows lead to more false matches
  • Large windows are better this way, but
  • Neighboring flow vectors will be more correlated
    (since the template windows have more in common)
  • Flow resolution also lower (same reason)
  • More expensive to compute
  • Another way to look at this
  • Small windows are good for local search: more
    precise and less smooth
  • Large windows are good for global search: less
    precise and smoother

70
Optical Flow Robust Estimation
Issue 7: Noise distributions are often
non-Gaussian, having much heavier tails. Noise
samples from the tails are called outliers.
  • Sources of outliers (multiple motions)
  • specularities / highlights
  • jpeg artifacts / interlacing / motion blur
  • multiple motions (occlusion boundaries,
    transparency)

71
Robust Estimation

72
Robust Estimation
Standard Least Squares Estimation allows too much
influence for outlying points
73
Robust Estimation
Robust gradient constraint
Robust SSD
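The sketch below shows how least squares gives outliers too much influence and how a robust penalty fixes that. It estimates a single constant, such as one flow component pooled over a region, by iteratively reweighted least squares. The Lorentzian $\rho(r) = \log(1 + r^2 / 2\sigma^2)$, one of the robust penalties used by Black & Anandan, is assumed here. `sigma` and the iteration count are hypothetical settings. The same reweighting applies to residuals of the gradient constraint or of SSD.

```python
def least_squares_location(values):
    """Ordinary least squares for a constant: the plain mean."""
    return sum(values) / len(values)

def robust_location(values, sigma=1.0, iters=20):
    """Robust estimate of a constant by iteratively reweighted least squares
    with the Lorentzian rho(r) = log(1 + r^2 / (2 sigma^2)), whose weight
    psi(r) / r is 2 / (2 sigma^2 + r^2)."""
    est = least_squares_location(values)  # start from the least-squares answer
    for _ in range(iters):
        weights = [2.0 / (2.0 * sigma ** 2 + (v - est) ** 2) for v in values]
        est = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return est
```

With ten measurements of 1.0 and two outliers at 100.0, the least-squares mean is 17.5. The robust estimate settles near 1.0 because the outliers' weights shrink towards zero.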
74
Robust Estimation
75
Layered Motion Models
Layered models provide a 2.5D representation, like
cardboard cutouts.
Key players
  • intensity (appearance)
  • alpha map (opacity)
  • warp maps (motion)

76
Layered Scene Representations
77
Motion representations
  • How can we describe this scene?

78
Block-based motion prediction
  • Break image up into square blocks
  • Estimate translation for each block
  • Use this to predict next frame, code difference
    (MPEG-2)

79
Layered motion
  • Break image sequence up into layers
  • Describe each layer's motion

80
Outline
  • Why layers?
  • 2-D layers [Wang & Adelson 94; Weiss 97]
  • 3-D layers [Baker et al. 98]
  • Layered Depth Images [Shade et al. 98]
  • Transparency [Szeliski et al. 00]

81
Layered motion
  • Advantages
  • can represent occlusions / disocclusions
  • each layer's motion can be smooth
  • video segmentation for semantic processing
  • Difficulties
  • how do we determine the correct number of layers?
  • how do we assign pixels?
  • how do we model the motion?

82
Layers for video summarization
83
Background modeling (MPEG-4)
  • Convert masked images into a background sprite
    for layered video coding

84
What are layers?
  • Wang & Adelson, 1994
  • intensities
  • alphas
  • velocities

85
How do we composite them?
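Here is a sketch of one way to composite layers, using the standard back-to-front "over" operator with each layer's alpha map. It assumes the layers have already been warped into the current frame by their velocity maps, which is not shown. Whether Wang & Adelson composite exactly this way is an assumption, and `composite_over` is a hypothetical name.

```python
def composite_over(layers):
    """Back-to-front 'over' compositing of layers given as (intensity, alpha)
    pairs of equal-length 1-D lists, ordered from back (first) to front (last):
    out = alpha * intensity + (1 - alpha) * out."""
    out = [0.0] * len(layers[0][0])
    for intensity, alpha in layers:
        out = [a * i + (1.0 - a) * o for i, a, o in zip(intensity, alpha, out)]
    return out
```

An opaque background of 10 under a foreground of 50 with alphas `[0, 1, 0.5, 0]` yields `[10.0, 50.0, 30.0, 10.0]`.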
86
How do we form them?
87
How do we form them?
88
How do we estimate the layers?
  1. compute coarse-to-fine flow
  2. estimate affine motion in blocks (regression)
  3. cluster with k-means
  4. assign pixels to best fitting affine region
  5. re-estimate affine motions in each region
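Step 3 can be sketched as plain k-means on the per-block affine parameter vectors. This assumes unweighted Euclidean distance over the six parameters and initial centers chosen by the caller. Neither the distance metric nor the initialization used by Wang & Adelson is given here, and `kmeans` is a hypothetical helper.

```python
def kmeans(points, centers, iters=20):
    """Plain k-means on tuples (e.g. 6-vectors of affine motion parameters).

    points: list of equal-length tuples; centers: initial cluster centers.
    Returns (centers, labels)."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [min(range(len(centers)),
                      key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])))
                  for pt in points]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

In practice the translation and matrix terms of an affine model have very different scales. Normalizing them before clustering is often worthwhile.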

89
Layer synthesis
  • For each layer
  • stabilize the sequence with the affine motion
  • compute median value at each pixel
  • Determine occlusion relationships
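The median step can be sketched per pixel as follows. It assumes the frames have already been stabilized with the layer's affine motion, and uses 1-D lists for brevity. A pixel covered by some other moving content in fewer than half of the frames recovers the layer's own value. Occlusion ordering is not handled here.

```python
import statistics

def median_composite(frames):
    """Per-pixel temporal median of stabilized frames (equal-length lists)."""
    return [statistics.median(values) for values in zip(*frames)]
```

For example, five frames of a constant background of 5, each with a single pixel briefly covered by a value of 9, return `[5, 5, 5, 5]`.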

90
Results
91
What if the motion is not affine?
  • Use a regularized (smooth) motion field
  • Weiss, CVPR97

92
A Layered Approach To Stereo Reconstruction
  • Simon Baker, Richard Szeliski and P. Anandan
  • CVPR98

93
Layered Stereo
  • Assign pixel to different layers (objects,
    sprites)
  • already covered in Stereo Lecture 2

94
Layer extraction from multiple images containing
reflections and transparency
  • Richard Szeliski
  • Shai Avidan
  • P. Anandan
  • CVPR2000
  • extra bonus material

95
Transparent motion
  • Photograph (Lee) and reflection (Michael)

96
Previous work
  • Physics-based vision and polarization [Shafer et
    al.; Wolff; Nayar et al.]
  • Perception of transparency [Adelson]
  • Transparent motion estimation [Shizawa & Mase;
    Bergen et al.; Irani et al.; Darrell &
    Simoncelli]
  • 3-frame layer recovery [Bergen et al.]

97
Problem formulation
(figure: layers X and Y, with motions $\mathrm{Motion}_{X,i}$ and $\mathrm{Motion}_{Y,i}$)
98
Image formation model
  • Pure additive mixing of positive signals
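  • $m_k(\mathbf{x}) = \sum_l (W_{kl} \circ f_l)(\mathbf{x})$, where $W_{kl}$ warps layer $l$ into frame $k$
  • or
  • $\mathbf{m}_k = \sum_l W_{kl} \mathbf{f}_l$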
  • Assume motion is planar (perspective transform,
    aka homography)

99
Two processing stages
  • Estimate the motions and initial layer estimates
  • Compute optimal layer estimates (for known motion)

100
Dominant motion estimation
  • Stabilize sequence by dominant motion: robust
    affine [Bergen et al. 92; Szeliski & Shum]

101
Dominant layer estimate
  • How do we form composite (estimate)?

102
Average?
103
Median?
  • Hint: all layers are non-negative

104
Min-composite
  • Smallest value is an over-estimate of the layer

105
Difference sequence
  • Subtract min-composite from original image
(figure: original - min composite = difference image)
106
Min composite (overestimate of background layer)
(figure: intensity vs. time)
107
Difference sequence
(underestimate of foreground layer)
108
Stabilizing secondary motion
109
Max-composite
Largest value is an under-estimate of the layer
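Here is a 1-D sketch of the min-composite, difference, and max-composite steps. It assumes the layers are non-negative and mix additively, and that each sequence has already been stabilized on the relevant motion (stabilization itself is not shown). The function names are illustrative.

```python
def min_composite(frames):
    """Per-pixel minimum over frames stabilized on the dominant layer.
    With non-negative additive layers, this is >= the dominant layer
    at every pixel (an over-estimate)."""
    return [min(values) for values in zip(*frames)]

def difference_sequence(frames, composite):
    """Subtract the composite from every frame (original - min composite).
    Each difference frame is <= the secondary layer's contribution."""
    return [[f - c for f, c in zip(frame, composite)] for frame in frames]

def max_composite(frames):
    """Per-pixel maximum; applied to difference frames stabilized on the
    secondary motion, this is <= the secondary layer (an under-estimate)."""
    return [max(values) for values in zip(*frames)]
```

For example, take a static background `[3, 1, 4, 1, 5]` plus a reflection `[2, 1, 0, 0, 0]` that moves one pixel per frame. The min-composite recovers the background exactly. Once the difference frames are shifted back into alignment, the max-composite recovers the reflection.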
110
Min-max alternation
  • Subtract secondary layer (under-estimate) from
    original sequence
  • Re-compute dominant motion and better
    min-composite
  • Iterate
  • Does this process converge?

111
Min-max alternation
  • Does this process converge?
  • in theory, yes
  • each iteration reduces the number of mis-estimated
    pixels (tightens the bounds); proof in paper

112
Min-max alternation
  • Does this process converge?
  • in practice, no
  • resampling errors and noise both lead to
    divergence; discussion in paper
(figure panels: resampling error; noisy)

113
Two processing stages
  • Estimate the motions and initial layer estimates
  • Compute optimal layer estimates (for known motion)

114
Optimal estimation
  • Recall additive mixing of positive signals
  • $\mathbf{m}_k = \sum_l W_{kl} \mathbf{f}_l$
  • Use constrained least squares (quadratic
    programming)
  • $\min \sum_k \left\| \sum_l W_{kl} \mathbf{f}_l - \mathbf{m}_k \right\|^2$ s.t. $\mathbf{f}_l \ge 0$
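The sketch below solves a toy version of this constrained problem by projected gradient descent, a simple stand-in for a proper quadratic-programming solver. It assumes all mixing equations have been stacked into one dense matrix `A`, with one row per observed pixel and one column per unknown layer pixel, and all observations into `m`. The function name and the iteration count are hypothetical.

```python
def nnls_projected_gradient(A, m, iters=2000):
    """Minimize ||A f - m||^2 subject to f >= 0 by projected gradient descent.

    A: list of rows (the stacked mixing/warp operators W_kl);
    m: stacked observations m_k.
    """
    rows, cols = len(A), len(A[0])
    # Step 1/L, with L = 2 * ||A||_F^2 >= Lipschitz constant of the gradient.
    L = 2.0 * sum(a * a for row in A for a in row)
    f = [0.0] * cols
    for _ in range(iters):
        r = [sum(A[i][j] * f[j] for j in range(cols)) - m[i] for i in range(rows)]
        grad = [2.0 * sum(A[i][j] * r[i] for i in range(rows)) for j in range(cols)]
        f = [max(0.0, fj - gj / L) for fj, gj in zip(f, grad)]  # project onto f >= 0
    return f
```

Consider the system with rows `[1, 1]`, `[0, 1]`, `[1, 0]` and observations `[1, -1, 3]`. The unconstrained least-squares answer is (8/3, -4/3), which has a negative component. The constrained solution is (2, 0).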

115
Least squares example
(figure panels: background, foreground)
116
Uniqueness of solution
  • If any layer does not have a black region,
    i.e., if $f_l \ge c > 0$, then we can add this offset to
    another layer (and subtract it from $f_l$)

(figure panels: background, foreground)
117
Degeneracies in solution
  • If motion is degenerate (e.g., horizontal),
    regions (scanlines) decouple (without an MRF prior)

mixed
118
Noise sensitivity
  • In general, low-frequency components are hard to
    recover for small motions
(figure panels: recovered; mixed; scaled errors)
119
Three-layer example
  • 3 layers with general motion works well

120
Complete algorithm
  • Dominant motion with min-composites
  • Difference (residual) images
  • Non-dominant motion on differences
  • Improve the motion estimates
  • Unconstrained least-squares problem
  • Constrained least-squares problem

121
Complete example
(figure panels: original, stabilized)
122
Complete example
(figure panels: difference, stabilized)
123
Final Results

124
Another example
  • original, stabilized, min-composite, residual (×2)

125
Results: Anne and books
(figure panels: original, background, foreground (photo))
126
Transparent layer recovery
  • Pure (additive) mixing of intensities
  • simple constrained least squares problem
  • degeneracies for simple or small motions
  • Processing stages
  • dominant motion estimation
  • min- and max-composites to initialize
  • optimization of motion and layers

127
Future work
  • Mitigating degeneracies (regularization)
  • Opaque layers (α estimation)
  • Non-planar geometry (parallax)

128
Bibliography
  • L. Williams. Pyramidal parametrics. Computer
    Graphics, 17(3):1--11, July 1983.
  • L. G. Brown. A survey of image registration
    techniques. Computing Surveys, 24(4):325--376,
    December 1992.
  • C. D. Kuglin and D. C. Hines. The phase
    correlation image alignment method. In IEEE 1975
    Conference on Cybernetics and Society, pages
    163--165, New York, September 1975.
  • J. Gomes, L. Darsa, B. Costa, and L. Velho.
    Warping and Morphing of Graphical Objects.
    Morgan Kaufmann Publishers, San Francisco Altos,
    California, 1999.
  • G. M. Nielson. Scattered data modeling. IEEE
    Computer Graphics and Applications,
    13(1):60--70, January 1993.
  • T. Beier and S. Neely. Feature-based image
    metamorphosis. Computer Graphics (SIGGRAPH'92),
    26(2):35--42, July 1992.

129
Bibliography
  • J. R. Bergen, P. Anandan, K. J. Hanna, and R.
    Hingorani. Hierarchical model-based motion
    estimation. In ECCV'92, pp. 237--252, Italy, May
    1992.
  • M. J. Black and P. Anandan. The robust estimation
    of multiple motions: Parametric and
    piecewise-smooth flow fields. Comp. Vis. Image
    Understanding, 63(1):75--104, 1996.
  • H. S. Sawhney and S. Ayer. Compact representation
    of videos through dominant multiple motion
    estimation. IEEE Trans. Patt. Anal. Mach. Intel.,
    18(8):814--830, Aug. 1996.
  • Y. Weiss. Smoothness in layers: Motion
    segmentation using nonparametric mixture
    estimation. In CVPR'97, pp. 520--526, June 1997.

130
Bibliography
  • J. Y. A. Wang and E. H. Adelson. Representing
    moving images with layers. IEEE Transactions on
    Image Processing, 3(5):625--638, September 1994.
  • Y. Weiss and E. H. Adelson. A unified mixture
    framework for motion segmentation: Incorporating
    spatial coherence and estimating the number of
    models. In IEEE Computer Society Conference on
    Computer Vision and Pattern Recognition
    (CVPR'96), pages 321--326, San Francisco,
    California, June 1996.
  • Y. Weiss. Smoothness in layers: Motion
    segmentation using nonparametric mixture
    estimation. In IEEE Computer Society Conference
    on Computer Vision and Pattern Recognition
    (CVPR'97), pages 520--526, San Juan, Puerto Rico,
    June 1997.
  • P. R. Hsu, P. Anandan, and S. Peleg. Accurate
    computation of optical flow by using layered
    motion representations. In Twelfth International
    Conference on Pattern Recognition (ICPR'94),
    pages 743--746, Jerusalem, Israel, October 1994.
    IEEE Computer Society Press.

131
Bibliography
  • T. Darrell and A. Pentland. Cooperative robust
    estimation using layers of support. IEEE
    Transactions on Pattern Analysis and Machine
    Intelligence, 17(5):474--487, May 1995.
  • S. X. Ju, M. J. Black, and A. D. Jepson. Skin
    and bones: Multi-layer, locally affine, optical
    flow and regularization with transparency. In
    IEEE Computer Society Conference on Computer
    Vision and Pattern Recognition (CVPR'96), pages
    307--314, San Francisco, California, June 1996.
  • M. Irani, B. Rousso, and S. Peleg. Computing
    occluding and transparent motions. International
    Journal of Computer Vision, 12(1):5--16, January
    1994.
  • H. S. Sawhney and S. Ayer. Compact
    representation of videos through dominant
    multiple motion estimation. IEEE Transactions on
    Pattern Analysis and Machine Intelligence,
    18(8):814--830, August 1996.
  • M.-C. Lee et al. A layered video object coding
    system using sprite and affine motion model.
    IEEE Transactions on Circuits and Systems for
    Video Technology, 7(1):130--145, February 1997.

132
Bibliography
  • S. Baker, R. Szeliski, and P. Anandan. A layered
    approach to stereo reconstruction. In IEEE
    CVPR'98, pages 434--441, Santa Barbara, June
    1998.
  • R. Szeliski, S. Avidan, and P. Anandan. Layer
    extraction from multiple images containing
    reflections and transparency. In IEEE CVPR'2000,
    volume 1, pages 246--253, Hilton Head Island,
    June 2000.
  • J. Shade, S. Gortler, L.-W. He, and R. Szeliski.
    Layered depth images. In Computer Graphics
    (SIGGRAPH'98) Proceedings, pages 231--242,
    Orlando, July 1998. ACM SIGGRAPH.
  • S. Laveau and O. D. Faugeras. 3-d scene
    representation as a collection of images. In
    Twelfth International Conference on Pattern
    Recognition (ICPR'94), volume A, pages 689--691,
    Jerusalem, Israel, October 1994. IEEE Computer
    Society Press.
  • P. H. S. Torr, R. Szeliski, and P. Anandan. An
    integrated Bayesian approach to layer extraction
    from image sequences. In Seventh ICCV'99, pages
    983--990, Kerkyra, Greece, September 1999.