Title: Video Epitomes
1 Video Epitomes
IEEE CVPR 2005, San Diego, CA, June 22, 2005
- Vincent Cheung
- Probabilistic and Statistical Inference Group
- Electrical and Computer Engineering
- University of Toronto
- Joint work with Brendan J. Frey (U. Toronto) and Nebojsa Jojic (Microsoft Research)
2 Image Epitome
- Jojic, N., Frey, B., Kannan, A. (2003). Epitomic analysis of appearance and shape. In Proc. IEEE ICCV.
- Miniature, condensed version of the image
- Models the image's textural components
- Applications
- object detection
- texture segmentation
- image retrieval
- compression
3 Learning the Epitome
4 Reconstruction with the Epitome
- Replace patches in an image / video with patches from the epitome
- Overlapping patches are averaged
- Made to agree using a variational parameter
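The patch replacement and overlap averaging can be sketched as follows. This is a minimal illustration for a 2-D image, assuming each output patch has already been assigned a single epitome patch; the `mapping` dictionary and the function name are hypothetical, and the variational agreement step mentioned on the slide is not modelled here.

```python
import numpy as np

def reconstruct_from_epitome(epitome, mapping, out_shape, patch):
    """Paste epitome patches into an output image and average the overlaps.

    mapping: dict {(row, col) in output: (row, col) in epitome} giving the
    top-left corner of the epitome patch chosen for each output patch
    (a hypothetical hard-assignment format used only for this sketch).
    """
    acc = np.zeros(out_shape, dtype=float)  # sum of pasted values per pixel
    cnt = np.zeros(out_shape, dtype=float)  # number of patches covering each pixel
    for (r, c), (er, ec) in mapping.items():
        acc[r:r + patch, c:c + patch] += epitome[er:er + patch, ec:ec + patch]
        cnt[r:r + patch, c:c + patch] += 1
    out = np.full(out_shape, np.nan)        # pixels covered by no patch stay NaN
    covered = cnt > 0
    out[covered] = acc[covered] / cnt[covered]
    return out

# Two overlapping 2x2 patches pasted into a 2x3 output.
epitome = np.arange(16, dtype=float).reshape(4, 4)
mapping = {(0, 0): (0, 0), (0, 1): (2, 2)}
recon = reconstruct_from_epitome(epitome, mapping, (2, 3), patch=2)
```

The middle column of `recon` is covered by both patches, so it holds the average of the two pasted values.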
5 Missing Observations Scenarios
6 Shifted Cumulative Sum Algorithm
- Compute
- Distances between all patches in input video and all patches in epitome (E-Step)
- Sufficient statistics (M-Step)
- Use cumulative sums to perform these computations efficiently, at a cost that is invariant to patch size
- Get computations for all patch sizes simultaneously
- Naïve: O(Vep)
- Convolution/FFT: O(Ve log e)
- SCS: O(Ve)
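One way to read the cumulative-sum idea is sketched below for a 1-D signal: for each relative shift between input and epitome, compute the pointwise squared errors once, take a cumulative sum, and obtain every patch distance as a difference of two cumulative-sum entries. The function names, the wrap-around epitome indexing, and the reading of V, e, and p as the sizes of the video, epitome, and patch are assumptions made for this sketch, not details taken from the slide.

```python
import numpy as np

def scs_distances(x, e, p):
    """Squared distances between every length-p patch of x and every
    length-p (wrap-around) patch of e, using one cumulative sum per shift.

    Returns D with D[i, j] = sum_k (x[i+k] - e[(j+k) % E])**2.
    """
    V, E = len(x), len(e)
    n = V - p + 1                                    # number of input patches
    D = np.empty((n, E))
    idx = np.arange(V)
    i = np.arange(n)
    for s in range(E):                               # one pass per relative shift
        sq = (x - e[(idx + s) % E]) ** 2             # O(V) pointwise errors
        cs = np.concatenate(([0.0], np.cumsum(sq)))  # O(V) cumulative sum
        D[i, (i + s) % E] = cs[p:] - cs[:-p]         # window sums, cost independent of p
    return D

def naive_distances(x, e, p):
    """Reference implementation: explicit sum over every patch pair."""
    V, E = len(x), len(e)
    k = np.arange(p)
    return np.array([[np.sum((x[i + k] - e[(j + k) % E]) ** 2)
                      for j in range(E)] for i in range(V - p + 1)])

rng = np.random.default_rng(0)
x = rng.normal(size=20)   # stand-in for the input video
e = rng.normal(size=6)    # stand-in for the epitome
```

Because the cumulative sums do not depend on the patch size, the same `cs` array could be reused to read off distances for several values of `p` at once.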
7 Video Super-Resolution
- Super-resolve a low-resolution wide-angle video given a high-resolution zoomed-in shot
- Learn the video epitome of the zoomed-in sequence
- Use the high-resolution epitome to reconstruct the low-resolution sequence
8 Video Super-Resolution Result (1)
[Figure: panels labeled "Low Res" and "Epitome"]
9 Video Super-Resolution Result (2)
10 Learning from Videos with Missing Observations (1)
- Fill in missing portions of a video
- Learn the epitome only on observed values
- Initialize the missing data with random values
- Iteratively reconstruct, only updating the missing values
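The initialize-then-iterate loop can be sketched generically. In this illustration the `reconstruct` argument is a hypothetical stand-in for reconstruction from a learned epitome; the demo uses a simple 3-tap smoother instead, and the step of learning the epitome on observed values is not implemented.

```python
import numpy as np

def fill_in(video, observed, reconstruct, n_iters=10, seed=0):
    """Iteratively fill missing entries, leaving observed entries untouched.

    video:       array holding arbitrary values where data is missing
    observed:    boolean mask, True where the value is known
    reconstruct: callable mapping an array to its reconstruction
                 (stand-in for reconstruction from a learned epitome)
    """
    rng = np.random.default_rng(seed)
    filled = video.astype(float).copy()
    # Initialize the missing data with random values in the observed range.
    filled[~observed] = rng.uniform(video[observed].min(),
                                    video[observed].max(),
                                    size=int((~observed).sum()))
    for _ in range(n_iters):
        recon = reconstruct(filled)
        # Only the missing values are updated; observations are kept as-is.
        filled[~observed] = recon[~observed]
    return filled

def smooth(a):
    """Placeholder reconstruction: 3-tap moving average with edge padding."""
    padded = np.pad(a, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

signal = np.arange(9, dtype=float)
observed = np.ones(9, dtype=bool)
observed[4] = False                 # pretend the middle sample is missing
corrupted = signal.copy()
corrupted[4] = 0.0
filled = fill_in(corrupted, observed, smooth, n_iters=40)
```

With this toy smoother the missing sample converges to the average of its neighbours; an epitome-based `reconstruct` would instead draw on patches seen elsewhere in the video.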
11 Learning from Videos with Missing Observations (2)
12 Video Missing Channels Fill-in
- Each RGB color channel for each pixel missing with 50% probability
- We know which channels are missing
13 Video Missing Channels Fill-in Result (1)
- Learn the epitome only on observed values
- Initialize the missing data with random values
- Iteratively reconstruct, only updating the missing values
14 Video Missing Channels Fill-in Result (2)
[Figure: panels labeled "Epitome Result" and "Gaussian Filter Result"]
15 Dropped Frames Recovery
- Streaming video with frames dropped
- Recover the dropped frames using only the epitome of the corrupted video
16 Conclusion
- Extended the concept of epitomes to video sequences
- Compactly representing spatial and temporal features
- Efficient algorithm for learning epitomes
- Video epitome is a natural representation for several applications
- Super-resolution
- Inpainting
- Missing channels
- Dropped frames
- Videos available at http://www.psi.toronto.edu/vincent/videoepitome.html
17 http://www.psi.toronto.edu/vincent/videoepitome.html
18 References
- [Cheu05] Cheung, V., Frey, B., and Jojic, N. (2005). Video epitomes. In Proc. IEEE CVPR.
- [Jojic03] Jojic, N., Frey, B., and Kannan, A. (2003). Epitomic analysis of appearance and shape. In Proc. IEEE ICCV.
- [Frey03] Frey, B. and Jojic, N. (2003). Advances in algorithms for inference and learning in complex probability models. IEEE Trans. PAMI.
19 Future Work
- Determining the size of the epitome
- Dependent on the complexity of the image / video
- Minimum description length
- Variational Bayesian
- Optimal patch size(s)
- Problem specific
- Additional transformations into the epitome
- Rotation
- Scale
- Additional video epitome applications
- Super-resolution
- Layer separation
- Object recognition
20 Image Epitome Examples
21 Video Epitome Example
22 Learning the Epitome
23 Image Epitome Issues
- Patch size: large patches to get large structure and nice stitching, small patches to get details
- Use small patches with a prior on indices to stitch them together
- Close patches in the image should map to close areas in the epitome
- Show screenshot of GUI
- Early results with index prior
- Epitome size
- Currently arbitrarily chosen
- Learn the size of the epitome
- Dependent on the problem, but can use a cost function (MDL approach) or a Chinese restaurant process
24 Computational Issues
- Learning the epitome under this generative model is computationally expensive
- Expectation step
- Estimate the posterior
- Compute a weighted Euclidean distance between patches in the image and patches in the epitome
- Maximization step
- Update the epitome
- Collect patch-based statistics using the posterior
- Want to implement these steps efficiently, e.g., by reusing computations
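The two steps above can be illustrated with a deliberately simplified 1-D sketch. It assumes a Gaussian epitome with per-entry mean `mu` and variance `phi`, a uniform prior over epitome positions, and wrap-around indexing; the function names are hypothetical, the variance update is omitted, and none of the efficiency tricks are applied.

```python
import numpy as np

def e_step(patches, mu, phi):
    """Posterior over epitome positions for each patch (uniform prior).

    The log-likelihood is a variance-weighted squared Euclidean distance
    between the patch and the epitome window starting at each position.
    """
    E, p = len(mu), patches.shape[1]
    win = (np.arange(E)[:, None] + np.arange(p)[None, :]) % E  # (E, p) indices
    m, v = mu[win], phi[win]
    diff = patches[:, None, :] - m[None, :, :]                 # (N, E, p)
    loglik = -0.5 * np.sum(diff ** 2 / v + np.log(2 * np.pi * v), axis=2)
    loglik -= loglik.max(axis=1, keepdims=True)                # numerical stability
    q = np.exp(loglik)
    return q / q.sum(axis=1, keepdims=True), win

def m_step(patches, q, win, E):
    """Update epitome means as posterior-weighted averages of mapped pixels."""
    N = len(patches)
    idx = np.broadcast_to(win, (N,) + win.shape)               # target epitome entry
    w = np.broadcast_to(q[:, :, None], (N,) + win.shape)       # posterior weights
    num = np.bincount(idx.ravel(), weights=(w * patches[:, None, :]).ravel(),
                      minlength=E)
    den = np.bincount(idx.ravel(), weights=w.ravel(), minlength=E)
    return num / np.maximum(den, 1e-12)                        # unused entries -> 0

mu = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])   # toy epitome means
phi = np.full(6, 0.01)                            # toy epitome variances
patches = np.array([[1.0, 4.0, 9.0]])             # one patch matching position 1
q, win = e_step(patches, mu, phi)
new_mu = m_step(patches, q, win, len(mu))
```

Both functions loop implicitly over every patch and every epitome position, which is the cost the slide describes as expensive.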
25 Shifted Cumulative Sum Algorithm
- Need to compute distances between all patches in the input video and all patches in the epitome for learning and reconstruction (to estimate the mapping posteriors)
- Use cumulative sums to efficiently compute distances between all patches, at a cost that is invariant to patch size
- Similar trick used in the M-step to collect sufficient statistics
- Naïve: O(Vep)
- Convolution/FFT: O(Ve log e)
- SCS: O(Ve)
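The patch-size invariance can be seen from a short identity, written here in 1-D. The notation is an assumption made for illustration: x is the input, e a wrap-around epitome of size E, s the relative shift, and p the patch size.

```latex
% 1-D notation, assumed for illustration only.
\begin{align*}
d_s(n) &= \bigl(x_n - e_{(n+s) \bmod E}\bigr)^2, \\
C_s(n) &= \sum_{m=0}^{n-1} d_s(m), \\
D\bigl(i,\,(i+s) \bmod E\bigr) &= \sum_{k=0}^{p-1} d_s(i+k) = C_s(i+p) - C_s(i).
\end{align*}
```

Under this reading, each shift needs one pass over the input to build the cumulative sum, after which every patch distance is a single subtraction regardless of p; repeating this for every shift is what gives a cost proportional to Ve.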
26 Shifted Cumulative Sum Algorithm
27 Shifted Cumulative Sum Algorithm
28 Shifted Cumulative Sum Algorithm
29 Shifted Cumulative Sum Algorithm
30 Collecting Sufficient Statistics
31 Image Missing Channels Fill-in
32 Missing Channels
- Generalization of the video inpainting problem
- Inpainting
- Missing entire pixels
- Missing Channels
- Missing one or more of the red, green, or blue (RGB) components of a given pixel
- Epitome must consolidate multiple patches together to piece together the missing channel information
- No training patch contains all the channel information
- Use the epitome to fill in the missing data
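A small sketch of how known-missing channels might be handled when comparing patches: the distance is accumulated only over observed entries, and the missing entries are then copied from the best match. This uses a hard nearest-candidate choice over a hypothetical list of candidate patches, which is a simplification of the probabilistic inference the epitome performs.

```python
import numpy as np

def masked_distance(patch, observed, candidate):
    """Sum of squared errors over observed channel values only."""
    return float((((patch - candidate) ** 2) * observed).sum())

def fill_missing_channels(patch, observed, candidates):
    """Keep observed values; copy missing ones from the closest candidate.

    candidates: list of arrays shaped like `patch` (hypothetical stand-ins
    for patches drawn from a learned epitome).
    """
    dists = [masked_distance(patch, observed, c) for c in candidates]
    best = candidates[int(np.argmin(dists))]
    return np.where(observed, patch, best)

rng = np.random.default_rng(1)
cand_a = rng.uniform(size=(2, 2, 3))        # 2x2 RGB patches
cand_b = rng.uniform(size=(2, 2, 3))
observed = rng.random((2, 2, 3)) < 0.5      # each channel missing with 50% probability
observed[0, 0, 0] = True                    # make sure at least one value is observed
patch = np.where(observed, cand_a, 0.0)     # missing channels zeroed out
filled = fill_missing_channels(patch, observed, [cand_b, cand_a])
```

Because the mask is known, unobserved channels contribute nothing to the distance, so a patch with only a few observed values can still be matched.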
33 Walking Video Epitome