Title: Dense Motion Estimation
1. Dense Motion Estimation
- Reading: Szeliski, Chapter 8
2. Dense Motion Estimation
3. Dense Motion Estimation
- 2D motion in video sequence
- Object tracking
- Image stabilization
4. Motion Estimation
- Error metric
- Compare images
- Search technique
- Full search -- simple but slow
- Hierarchical coarse-to-fine
- Fourier transforms
- Incremental methods
- Optical flow
- Multiple independent motions
5. Translational Alignment
- Alignment between two images or image patches
6. Translational Alignment
- Minimum of the sum of squared differences (SSD)
- Assumption: corresponding pixel values remain the same in the two images
- Brightness constancy constraint (see the sketch below)
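Below is a minimal NumPy sketch of the translational SSD error under the brightness constancy assumption; the function name `ssd` and the integer-shift interface are illustrative, not from the text.

```python
import numpy as np

def ssd(I0, I1, u, v):
    """Sum of squared differences E(u, v) = sum_x [I1(x+u, y+v) - I0(x, y)]^2.

    I0, I1: 2D grayscale arrays; (u, v): integer displacement.
    Only the overlapping region is compared (illustrative, not optimized).
    """
    h, w = I0.shape
    # Region of I0 whose shifted counterpart in I1 is still inside the image.
    y0, y1 = max(0, -v), min(h, h - v)
    x0, x1 = max(0, -u), min(w, w - u)
    if y1 <= y0 or x1 <= x0:
        return np.inf  # no overlap for this shift
    r = (I1[y0 + v:y1 + v, x0 + u:x1 + u].astype(np.float64)
         - I0[y0:y1, x0:x1].astype(np.float64))
    return float(np.sum(r * r))
```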
7. Robust Error Metrics
- Robust norm of the error
- (Huber 1981; Hampel, Ronchetti, Rousseeuw et al. 1986; Black and Anandan 1996; Stewart 1999)
- Sum of absolute differences (SAD, the L1 norm)
- Grows less quickly than the quadratic penalty associated with least squares
- E_SAD is NOT differentiable at the origin, so it is not well suited to gradient descent approaches
8. Robust Error Metrics
- Smoothly varying robust penalty (Black and Rangarajan 1996)
- Quadratic for small values but grows more slowly away from the origin
- Geman-McClure function (see the sketch below)
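A small sketch of the Geman-McClure penalty applied to per-pixel residuals; the scale parameter `a` (where the penalty stops growing quadratically) and the function names are illustrative assumptions.

```python
import numpy as np

def geman_mcclure(r, a=1.0):
    """Geman-McClure robust penalty rho(r) = r^2 / (1 + (r/a)^2).

    Quadratic for |r| << a, but saturates towards a^2 for large residuals,
    so outliers are penalized far less than under plain least squares.
    """
    r = np.asarray(r, dtype=np.float64)
    return r * r / (1.0 + (r / a) ** 2)

def robust_error(I0, I1, a=10.0):
    """Sum of robust penalties over the per-pixel residuals of two aligned images."""
    resid = I1.astype(np.float64) - I0.astype(np.float64)
    return float(np.sum(geman_mcclure(resid, a)))
```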
9. Spatially Varying Weights
- Pixels may lie outside the image boundaries
- Partially or completely downweight the contribution of certain pixels
- Erase moving objects for background alignment
- Multiple moving objects
- Weighted (or windowed) SSD function
10. Weighted SSD
- With a large range of potential motions, the area of overlap varies
- This biases the metric towards smaller-overlap solutions; normalizing by the overlap area compensates (see the sketch below)
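A sketch of a weighted (windowed) SSD, assuming the two images have already been warped into alignment; the weight arrays `w0`, `w1` (zero for masked or out-of-bounds pixels) and the overlap normalization follow the description above.

```python
import numpy as np

def weighted_ssd(I0, I1, w0, w1, normalize=True):
    """Weighted SSD between two aligned images.

    w0, w1: per-pixel weights in [0, 1]; zero marks pixels outside the image
    border or on a moving object that should not influence the alignment.
    If normalize is True, divide by the total overlap weight so that
    smaller-overlap solutions are not unfairly favoured.
    """
    r = I1.astype(np.float64) - I0.astype(np.float64)
    w = w0 * w1                       # both pixels must be valid
    e = np.sum(w * r * r)
    if normalize:
        e /= max(np.sum(w), 1e-8)     # mean squared error over the overlap area
    return float(e)
```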
11. Bias and Gain (Exposure Differences)
- The images being aligned may not have been taken with the same exposure
- Simple model of linear intensity variation
- Bias and gain model
12. Bias and Gain
- Least squares with bias and gain
- Linear regression (see the sketch below)
- Color image
- Estimate bias and gain for each color channel
- Weighted prediction in video codecs
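A sketch of the bias-and-gain fit as ordinary linear regression on two aligned patches; the function names are illustrative, and for a color image the same fit would simply be repeated per channel, as noted above.

```python
import numpy as np

def fit_bias_gain(I0, I1):
    """Least-squares fit of the linear exposure model I1 ~ gain * I0 + bias."""
    x = I0.astype(np.float64).ravel()
    y = I1.astype(np.float64).ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)        # design matrix [I0, 1]
    (gain, bias), *_ = np.linalg.lstsq(A, y, rcond=None)
    return gain, bias

def bias_gain_ssd(I0, I1):
    """SSD after compensating for the estimated bias and gain."""
    gain, bias = fit_bias_gain(I0, I1)
    r = I1.astype(np.float64) - (gain * I0.astype(np.float64) + bias)
    return float(np.sum(r * r))
```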
13. Correlation
- Cross-correlation
- Instead of taking intensity differences, maximize the product of the two aligned images
- Does this make bias and gain modeling unnecessary?
- A bright patch in either image can dominate the product
14. Normalized Cross-Correlation
- The means of the corresponding patches are subtracted first
- NCC lies in [-1, 1]
- Works well when matching images taken with different exposures
- Degrades for noisy low-contrast regions (near-zero variance); see the sketch below
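A minimal NCC sketch for two equally sized patches; the `eps` guard against zero-variance patches is an illustrative choice.

```python
import numpy as np

def ncc(patch0, patch1, eps=1e-8):
    """Normalized cross-correlation between two equally sized patches.

    Subtracting each patch mean and dividing by the product of standard
    deviations keeps the score in [-1, 1] and makes it insensitive to bias
    and gain (exposure) differences; eps avoids division by zero for
    near-zero-variance patches, where NCC is unreliable anyway.
    """
    a = patch0.astype(np.float64) - patch0.mean()
    b = patch1.astype(np.float64) - patch1.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / (denom + eps))
```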
15. Normalized Cross-Correlation
- Normalized SSD score (Criminisi, Shotton, Blake et al. 2007)
- Produces results comparable to NCC
- More efficient when applied to a large number of overlapping patches using a moving-average technique
16. Hierarchical Motion Estimation
- How can we find the minimum of the error metric?
- Full search over some range of shifts
- Often used for block matching in motion-compensated video compression
- Simple to implement but slow
- To accelerate the search process
- Hierarchical motion estimation
17. Hierarchical Motion Estimation
- Steps
- Construct an image pyramid
- At coarser levels, search over a smaller number of discrete pixels
- The motion estimate at a coarse level is used to initialize a smaller local search at the next finer level
- Not guaranteed to produce the same result as a full search, but works almost as well and is much faster
18. Hierarchical Motion Estimation
- Image downsampling
- At the coarsest level, search for the displacement that minimizes the difference between the downsampled images
- Full search over the (reduced) range
- Predict a likely displacement for the next level
- The search over displacements is repeated at the finer level over a much narrower range
- Incremental refinement step with the warped image (see the sketch below)
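A coarse-to-fine sketch of this search. The pyramid depth, the search radii, and the simple 2x2 block-averaging downsampler are illustrative choices, and the per-shift error is normalized by the overlap area to reduce the small-overlap bias noted earlier.

```python
import numpy as np

def downsample(img):
    """One pyramid level: halve resolution by 2x2 block averaging."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w].astype(np.float64)
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def mean_ssd(I0, I1, u, v):
    """Mean squared difference over the overlap for an integer shift (u, v)."""
    h, w = I0.shape
    y0, y1 = max(0, -v), min(h, h - v)
    x0, x1 = max(0, -u), min(w, w - u)
    if y1 <= y0 or x1 <= x0:
        return np.inf
    r = (I1[y0 + v:y1 + v, x0 + u:x1 + u].astype(np.float64)
         - I0[y0:y1, x0:x1].astype(np.float64))
    return float(np.mean(r * r))

def full_search(I0, I1, center, radius):
    """Brute-force search for the best integer shift around `center`."""
    best, best_uv = np.inf, center
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            u, v = center[0] + du, center[1] + dv
            e = mean_ssd(I0, I1, u, v)
            if e < best:
                best, best_uv = e, (u, v)
    return best_uv

def hierarchical_search(I0, I1, levels=3, coarse_radius=4, fine_radius=1):
    """Coarse-to-fine estimate of a single translational motion."""
    pyr0, pyr1 = [I0], [I1]
    for _ in range(levels - 1):
        pyr0.append(downsample(pyr0[-1]))
        pyr1.append(downsample(pyr1[-1]))
    u, v = full_search(pyr0[-1], pyr1[-1], (0, 0), coarse_radius)
    for lvl in range(levels - 2, -1, -1):   # move to finer and finer levels
        u, v = 2 * u, 2 * v                 # predicted displacement at this level
        u, v = full_search(pyr0[lvl], pyr1[lvl], (u, v), fine_radius)
    return u, v
```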
19. Incremental Refinement
- So far, displacements are only estimated to the nearest integer pixel
- Higher accuracy is required for stabilization or stitching
- Sub-pixel estimates
- Evaluate several values of (u, v) around the best value
- Interpolate the matching score to find the analytic minimum
- Gradient descent on the SSD energy function
20. Incremental Refinement
- SSD energy and Taylor series expansion
Lucas and Kanade (1981)
21. Incremental Refinement
Optical flow constraint or brightness constancy
constraint
22. Incremental Refinement
23. Incremental Refinement
- For efficiency
- Precomputing the Hessian and the Jacobian images saves significant computation
- Precomputing the inner products between the gradient field and shifted versions of I_1 allows the iterative re-computation of e_i to be performed in constant time (independent of the number of pixels)
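A minimal sketch of the Lucas-Kanade / Gauss-Newton incremental refinement for a purely translational motion: at each iteration the target image is warped by the current estimate, the 2x2 Hessian A and the gradient-weighted residual b are accumulated, and the update is obtained by solving A Δu = b. The bilinear sampler, the border handling, and the 1/10-pixel stopping threshold are illustrative choices, not the text's exact implementation.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Bilinearly sample img at (possibly fractional) row/column coordinates."""
    h, w = img.shape
    ys = np.clip(ys, 0.0, h - 1.001)
    xs = np.clip(xs, 0.0, w - 1.001)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    fy, fx = ys - y0, xs - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x0 + 1] +
            fy * (1 - fx) * img[y0 + 1, x0] + fy * fx * img[y0 + 1, x0 + 1])

def lucas_kanade_translation(I0, I1, u0=(0.0, 0.0), iters=20, tol=0.1):
    """Refine a translational motion estimate (u, v) to sub-pixel accuracy."""
    I0 = I0.astype(np.float64)
    I1 = I1.astype(np.float64)
    h, w = I0.shape
    ys, xs = np.mgrid[1:h - 1, 1:w - 1].astype(np.float64)  # skip border pixels
    u, v = float(u0[0]), float(u0[1])
    for _ in range(iters):
        # Warped target and its gradients, evaluated at the shifted positions.
        warped = bilinear(I1, ys + v, xs + u)
        Ix = 0.5 * (bilinear(I1, ys + v, xs + u + 1) - bilinear(I1, ys + v, xs + u - 1))
        Iy = 0.5 * (bilinear(I1, ys + v + 1, xs + u) - bilinear(I1, ys + v - 1, xs + u))
        e = warped - I0[1:h - 1, 1:w - 1]                    # per-pixel residual e_i
        A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],    # Gauss-Newton Hessian
                      [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
        b = -np.array([np.sum(Ix * e), np.sum(Iy * e)])      # gradient-weighted residual
        du, dv = np.linalg.solve(A, b)
        u, v = u + du, v + dv
        if np.hypot(du, dv) < tol:      # stop once the correction is below ~1/10 pixel
            break
    return u, v
```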
24. Incremental Refinement
- Iterations
- The effectiveness relies on the quality of the Taylor series approximation
- When far away from the true displacement (say, 1-2 pixels), several iterations may be needed
- It is possible to estimate a value for J_1 using a least squares fit to a series of larger displacements in order to increase the range of convergence (Jurie and Dhome 2002), or to learn a special-purpose recognizer for a given patch
25. Incremental Refinement
- Stopping criterion
- Monitor the magnitude of the displacement correction Δu and stop when it drops below a certain threshold (say, 1/10 of a pixel)
- For larger motions
- Combine the incremental update rule with a hierarchical coarse-to-fine search strategy
26. Incremental Refinement
- The system can be poorly conditioned because of a lack of two-dimensional texture in the patch being aligned
27. Uncertainty Modeling
- Capture the reliability of a particular patch-based motion estimate
- Simplest model: a covariance matrix
- Captures the expected variance of the motion estimate in all possible directions
- Valid under small amounts of additive Gaussian noise
28. Uncertainty Modeling
- For larger amounts of noise, the linearization performed by the Lucas-Kanade algorithm is only approximate
- The minimum and maximum eigenvalues of the Hessian A can be interpreted as the (scaled) inverse variances in the least-certain and most-certain directions of motion (see the sketch below)
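A sketch of the simplest uncertainty model: the covariance of the patch-based translational estimate is taken as sigma_n^2 times the inverse of the Gauss-Newton Hessian A built from the patch gradients; sigma_n, the standard deviation of the additive image noise, is assumed to be given rather than estimated.

```python
import numpy as np

def motion_uncertainty(Ix, Iy, sigma_n=1.0):
    """Covariance of a patch-based translational motion estimate.

    Ix, Iy: horizontal/vertical gradients over the patch.
    Returns (cov, eigenvalues, eigenvectors) of the Hessian A; the smallest
    and largest eigenvalues give the (scaled) inverse variances along the
    least-certain and most-certain directions of motion, and a tiny smallest
    eigenvalue flags a poorly conditioned, low-texture patch.
    """
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]], dtype=np.float64)
    evals, evecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    cov = sigma_n ** 2 * np.linalg.pinv(A)      # pseudo-inverse guards the degenerate case
    return cov, evals, evecs
```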
29. Bias and gain, weighting, and robust error metrics
- Bias and gain: a 4x4 system of equations to estimate
- Weighted SSD using the Lucas-Kanade algorithm
- Robust error metrics
- Solved using the iteratively reweighted least squares (IRLS) technique
30. 8.2 Parametric Motion
- More sophisticated motion models
- Affine motion, for example, has 6 unknowns
- A full search over the possible range is impractical
- Lucas-Kanade algorithm → parametric motion models (Lucas and Kanade 1981; Rehg and Witkin 1991; Fuh and Maragos 1991; Bergen, Anandan, Hanna et al. 1992; Shashua and Toelg 1997; Shashua and Wexler 2001; Baker and Matthews 2004)
31. Parametric Motion
- Instead of using a single constant translation u
- Use a spatially varying motion field or correspondence map
32. Parametric Motion
33. Incremental Refinement
- Jacobian
- (Gauss-Newton) Hessian
- Gradient-weighted residual vector (see the sketch below)
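A sketch of how the Jacobian, Gauss-Newton Hessian, and gradient-weighted residual vector are assembled for an affine motion model with parameters p = (p0, ..., p5). The per-pixel loop and this particular parameterization are illustrative; a real implementation would vectorize the accumulation or use the patch-based approximation described on the next slide.

```python
import numpy as np

def affine_jacobian(x, y):
    """Jacobian dW/dp of the affine warp
       W(x, y; p) = (p0 + (1 + p1) x + p2 y,  p3 + p4 x + (1 + p5) y),
    evaluated at pixel (x, y); shape (2, 6)."""
    return np.array([[1.0, x, y, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, x, y]])

def affine_normal_equations(Ix, Iy, err, xs, ys):
    """Accumulate the 6x6 Gauss-Newton Hessian A and the gradient-weighted
    residual vector b for an affine motion update.

    Ix, Iy: image gradients at the warped positions; err: per-pixel residuals
    e_i; xs, ys: pixel coordinates. All inputs are flat arrays of equal length.
    The parameter update dp is then the solution of A dp = b.
    """
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for gx, gy, e, x, y in zip(Ix, Iy, err, xs, ys):
        J = affine_jacobian(x, y)           # 2x6 warp Jacobian
        sd = np.array([gx, gy]) @ J         # 1x6 "steepest descent" row: grad^T J
        A += np.outer(sd, sd)
        b += -e * sd
    return A, b
```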
34. Patch-based Approximation
- Computing A and b is expensive
- With N pixels and n parameters, the cost is O(n^2 N)
- Divide the image into sub-blocks P_j and only accumulate the simpler 2x2 quantities inside each block
35. Compositional Approach
- For complex parametric motions such as homographies
- Warp the target image I_1 according to the current motion estimate
36. Compositional Approach
- The warped image and the template are assumed to be fairly similar, so only an incremental parametric motion is required, i.e., the incremental motion can be evaluated around the identity warp
- Szeliski and Shum (1997)
37. Compositional Approach
38. Compositional Approach
- If the appearance of the warped and template images is similar enough, we can replace the gradient of the warped image I_1 with the gradient of the template I_0
- Pre-compute the Hessian matrix
- The residual vector b can also be partially precomputed, i.e., the steepest-descent images can be precomputed and stored for later multiplication with the error images e_i
39. Inverse Compositional Algorithm
Baker and Matthews (2004)
- Rather than (conceptually) re-warping the warped target image I_1(x), they instead warp the template image I_0(x) and minimize the resulting error
- Identical to the forward warped algorithm, except that
- the gradients of I_1 are replaced by the gradients of I_0
- the sign of e_i is flipped
40. Inverse Compositional Algorithm
41. Non-Linear Least Squares
- Solve using the (damped) normal equations
- Update the parameters
- The parameter λ is an additional damping parameter used to ensure that the system takes a downhill step in energy (squared error); it is an essential component of the Levenberg-Marquardt algorithm (see the sketch below)
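A sketch of the damped update at the heart of Levenberg-Marquardt, assuming A and b are the Gauss-Newton Hessian and gradient-weighted residual at the current parameter vector p (a NumPy array) and that `energy_fn(p)` re-evaluates the squared error; the callback, damping schedule, and retry limit are illustrative choices.

```python
import numpy as np

def lm_step(A, b, p, energy_fn, lam=1e-3):
    """One Levenberg-Marquardt style parameter update.

    Solves (A + lam * diag(A)) dp = b and only accepts the step if it lowers
    the energy; otherwise the damping lam is increased and the step retried,
    which ensures a downhill step in the squared error.
    """
    e0 = energy_fn(p)
    for _ in range(10):                               # try increasingly damped steps
        dp = np.linalg.solve(A + lam * np.diag(np.diag(A)), b)
        if energy_fn(p + dp) < e0:
            return p + dp, lam * 0.1                  # accept and relax the damping
        lam *= 10.0                                   # reject and damp harder
    return p, lam                                     # give up: no downhill step found
```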
42. 8.4 Optical Flow
- Optical flow (or optic flow) is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene.
- The concept of optical flow was first studied in the 1940s and ultimately published by the American psychologist James J. Gibson as part of his theory of affordance.
- Optical flow techniques utilize this motion of objects, surfaces, and edges
- Applications: motion detection, object segmentation, time-to-collision and focus-of-expansion calculations, motion-compensated encoding, and stereo disparity measurement
43. 8.4 Optical Flow
- Independent estimate of motion at each pixel
- The number of variables is twice the number of measurements -- an underconstrained problem
- Two typical approaches
- Patch-based or window-based approach
- Add smoothness terms on the u_i using regularization or Markov random fields and search for a global minimum
44. Optical Flow
http://en.wikipedia.org/wiki/Optical_flow
- Phase correlation: inverse of the normalized cross-power spectrum
- Block-based methods: minimizing the sum of squared differences or sum of absolute differences, or maximizing normalized cross-correlation
- Differential methods of estimating optical flow, based on partial derivatives of the image signal and/or the sought flow field and higher-order partial derivatives, such as
- Lucas-Kanade method: regarding image patches and an affine model for the flow field
- Horn-Schunck method: optimizing a functional based on residuals from the brightness constancy constraint and a particular regularization term expressing the expected smoothness of the flow field
- Buxton-Buxton method: based on a model of the motion of edges in image sequences
- Black-Jepson method: coarse optical flow via correlation
- General variational methods: a range of modifications/extensions of Horn-Schunck, using other data terms and other smoothness terms
- Discrete optimization methods: the search space is quantized, and image matching is addressed through label assignment at every pixel, such that the corresponding deformation minimizes the distance between the source and the target image. The optimal solution is often recovered through min-cut/max-flow algorithms, linear programming, or belief propagation methods.
45. Optical Flow
- Regularization-based framework: Horn and Schunck (1981)
- Instead of solving for each motion (or motion update) independently
- Simultaneously minimize over all flow vectors u_i
- Smoothness constraint
- Brightness constancy constraint (see the sketch below)
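A minimal Horn-Schunck sketch: the brightness constancy residual Ix*u + Iy*v + It is combined with a smoothness term weighted by alpha^2, and the flow is updated with the classic Jacobi-style iteration. The derivative approximations, the fixed iteration count, and the value of alpha are illustrative choices.

```python
import numpy as np

def horn_schunck(I0, I1, alpha=10.0, iters=100):
    """Dense optical flow (u, v) between two grayscale frames I0 -> I1."""
    I0 = I0.astype(np.float64)
    I1 = I1.astype(np.float64)
    Iy, Ix = np.gradient(I0)        # spatial derivatives (rows = y, cols = x)
    It = I1 - I0                    # temporal derivative
    u = np.zeros_like(I0)
    v = np.zeros_like(I0)

    def local_avg(f):
        # 4-neighbour average used by the smoothness term.
        return 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                       np.roll(f, 1, 1) + np.roll(f, -1, 1))

    for _ in range(iters):
        u_bar, v_bar = local_avg(u), local_avg(v)
        # Jointly enforce brightness constancy and smoothness.
        t = (Ix * u_bar + Iy * v_bar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v
```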
46. Optical Flow
- Combine local and global flow estimation
- Use a locally aggregated Hessian as the brightness constancy term
- Replace the per-pixel Hessian and residual with their locally aggregated versions
47. Optical Flow
- Combine global (parametric) and local motion models
- Estimate either per-image or per-segment affine motion models, combined with per-pixel residual corrections
- Varying image brightness
- Gradient descent and coarse-to-fine continuation methods to minimize the global energy function
- Combinatorial optimization methods based on Markov random fields
48. Multi-frame Motion Estimation
- Filter the spatio-temporal volume using oriented or steerable filters (Heeger 1988)
- Spatio-temporal filtering uses a 3D volume around each pixel to determine the best orientation in space-time, which corresponds to a pixel's velocity
49. Multi-frame Motion Estimation
- Spatio-temporal filters have moderately large extents, which severely degrades the quality of their estimates near motion discontinuities
- An alternative to full spatio-temporal filtering is to estimate more local spatio-temporal derivatives and use them inside a global optimization framework to fill in textureless regions (Bruhn, Weickert, and Schnörr 2005; Govindu 2006)
50. 8.5 Layered Motion
- Global smoothness? Local neighborhood constraints?
- Visual motion is caused by the movement of a number of objects at different depths
- Pixels are grouped into appropriate objects or layers
- The pixel motions can then be described more succinctly and estimated more reliably
51. Layered Motion
52. Layered Motion
- Compact representation
- Exploits the information available in multiple video frames
- Accurately models the appearance of pixels near motion discontinuities
- Image-based rendering
- Object-level video editing
53. Layered Motion
Wang and Adelson (1994)
- How to compute a layered representation of a video? (See the sketch below.)
- Estimate affine motion models over a collection of non-overlapping patches
- Cluster the estimates using K-means
- Alternate between
- assigning pixels to layers
- recomputing the motion estimates for each layer
- Construct the layers
- by warping and merging the various layer pieces from all frames together
- using a median filter to produce sharp composite layers that are robust to small intensity variations and to infer occlusion relationships between layers
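A sketch of the first steps of this pipeline, assuming a dense flow field (u, v) has already been estimated: fit an affine model to the flow inside each patch, cluster the six-parameter vectors with a small hand-rolled K-means, and assign every pixel to the layer whose affine model best predicts its flow. The function names and the flow-field input are illustrative; the original method iterates these steps and re-estimates the motion per layer.

```python
import numpy as np

def fit_patch_affine(u, v, xs, ys):
    """Least-squares affine flow model for one patch:
       u ~ a0 + a1*x + a2*y,  v ~ a3 + a4*x + a5*y."""
    A = np.stack([np.ones_like(xs), xs, ys], axis=1)
    au, *_ = np.linalg.lstsq(A, u, rcond=None)
    av, *_ = np.linalg.lstsq(A, v, rcond=None)
    return np.concatenate([au, av])          # 6-parameter motion vector

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means on the rows of X (no external dependencies)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def assign_pixels_to_layers(u, v, affines):
    """Assign each pixel to the layer whose affine model best predicts its flow."""
    h, w = u.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    errors = []
    for a in affines:
        pu = a[0] + a[1] * xs + a[2] * ys
        pv = a[3] + a[4] * xs + a[5] * ys
        errors.append((u - pu) ** 2 + (v - pv) ** 2)
    return np.argmin(np.stack(errors), axis=0)   # per-pixel layer index
```

Here `affines` would typically be the K-means cluster centers; alternating this pixel assignment with re-estimating each layer's motion gives the loop described above.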
54. Layered Motion
55. Layered Motion
Weiss and Adelson (1996)
- A probabilistic mixture model to infer both
- the optimal number of layers and
- the per-pixel layer assignments
- Per-layer affine motion → smooth, regularized per-pixel motion (Weiss 1997)
- Better handles curved layers
56. Layered Motion
- Distinction between motion estimation and layer assignment
- Layer colors are estimated afterwards
- Generalized to account for real-world rigid motion scenes
Baker, Szeliski, and Anandan (1998)
57. A Layered Approach to Stereo Reconstruction
Baker, Szeliski, and Anandan (1998)
- Motion of each frame
- described using a 3D camera model
- Motion of each layer
- described using a 3D plane equation
- Per-pixel residual depth offsets
- Initial layer estimation
- similar to Wang and Adelson (1994)
- Affine motion → homography
- Final model refinement
- Jointly re-optimize the layer pixel colors and opacities, and the depth, plane, and motion parameters
- by minimizing the discrepancy between the re-synthesized and observed motion sequences
58. A Layered Approach to Stereo Reconstruction
Baker, Szeliski, and Anandan (1998)
(g) before and (h) after residual depth estimation
59. A Layered Approach to Stereo Reconstruction
Baker, Szeliski, and Anandan (1998)
- Motion boundaries and layer assignments are much crisper
- Individual layer color values are also sharper
- because of the per-pixel depth offsets
- Requires a rough initial assignment
- Improvement: Torr, Szeliski, and Anandan (2001)
- Automated Bayesian techniques for
- initializing the system and
- determining the optimal number of layers
60. Layered Motion
- Active research area
- Sawhney and Ayer 1996
- Jojic and Frey 2001
- Xiao and Shah 2005
- Kumar, Torr, and Zisserman 2008
- Thayananthan, Iwasaki, and Cipolla 2008
- Schoenemann and Cremers 2008
- Alternate between segmentation and estimation of optical flow
61. Transparent Layers and Reflections
- Reflections in windows, picture frames, etc.
- Reflection model: how much intensity each layer contributes to the final image
[Figure: glass surface and the reflected/transmitted image]
62. The amount of reflected light is quite low compared to the transmitted light (the picture of the girl), and yet the algorithm is still able to recover both layers.
63. Transparent Layers and Reflections
- Even if the motions of the individual layers are known
- the recovered layers can suffer from low-frequency ambiguities
- especially when the layers lack dark pixels
- or when the motion is uni-directional
64. Transparent Layers and Reflections
Szeliski, Avidan, and Anandan (2000)
- Simultaneous estimation of motions and layers
- Alternating between
- robustly computing the motion of each layer
- making conservative estimates of the layer intensities
- Final motions and layers
- polished using gradient descent on a joint constrained least squares problem
- Parametric motion models
- only valid for planar reflectors or scenes with shallow depth
- More extensions: Swaminathan, Kang, Szeliski et al. 2002; Criminisi, Kang, Swaminathan et al. 2005; Tsin, Kang, and Szeliski 2006