Noah Snavely - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Photo Tourism and IM2GPS: 3D Reconstruction and Geolocalization from Internet Photo Collections
  • Noah Snavely
  • Cornell University

James Hays, CMU / MIT (Fall 2009) / Brown (Spring 2010-)
CVPR 2009, June 20, 2009
2
The world in photos
  • There are billions of photos online
  • Photographic record of the surface of the earth
  • Photo sharing on a massive scale

3
Flickr
[chart: Flickr photo uploads passing 1, 2, and 3 billion]
> 3.6 billion photos on Flickr, > 7.2 billion on Photobucket, > 15 billion on Facebook
4
Applications of Internet photo collections
Hays and Efros, Scene completion using millions
of photographs
Crandall et al., Mapping the world's photos
Simon et al., Scene summarization
5
Applications of Internet photo collections
Simon and Seitz, Scene segmentation
Kuthirummal et al., Camera calibration
  • See more cool work tomorrow at the Internet
    Vision Workshop

6
Today's agenda
  • Part I, Photo Tourism: 3D reconstruction and visualization from Internet photo collections
  • Part II, IM2GPS: the Internet as a data source, automatic geolocation of single images

7
Rough Schedule
  • 8:30 - 8:40 Introduction
  • Part I: Photo Tourism
  • 8:40 - 10:00 Image matching and structure from motion (Snavely)
  • 10:00 - 10:20 Break
  • 10:20 - 11:00 3D visualization of photo collections (Snavely)
  • Part II: IM2GPS
  • 11:00 - 12:30 Geolocalization of images, use of Internet as a data source (Hays)

8
Part I: Photo Tourism
9
Traditional structure from motion
  • Input: video sequence (handheld, mounted to a mechanical arm, or attached to a robot)
  • Output: 3D model

Beardsley et al., 3D model acquisition from
Extended Image Sequences, ECCV 96
Pollefeys et al., Visual modeling with a
handheld camera, IJCV 04
David Nister, Ph.D. Thesis
10
Traditional structure from motion
  • Input: video sequence (handheld, mounted to a mechanical arm, or attached to a robot)
  • Output: 3D model

Commercial SfM software from 2d3
11
Traditional structure from motion
  • Input: video sequence (handheld, mounted to an arm, or attached to a robot)
  • Output: 3D model
  • Input video characteristics:
  • Images are taken by a single camera
  • In a short amount of time
  • Moving continuously
  • Given in a logical temporal order

12
Internet structure from motion
  • Input: collection of photos resulting from Internet search

13
Internet structure from motion
  • Input: collection of photos resulting from Internet search
  • Input characteristics:
  • Taken by many different people and cameras

Motorola RAZR
Nikon D3
14
Internet structure from motion
  • Input: collection of photos resulting from Internet search
  • Input characteristics:
  • Taken at many different times of day, year, century

15
Internet structure from motion
  • Input: collection of photos resulting from Internet search
  • Input characteristics:
  • Given in essentially random order

16
SfM for unordered photo collections
  • Very different from traditional video sequences
  • Early work in this area by Schaffalitzky and
    Zisserman

Multi-view matching for unordered image sets, or
How do I organize my holiday snaps?, ECCV 02
17
SfM for unordered photo collections
Vergauwen and Van Gool, Web-based
reconstruction service, Machine Vision
Applications 2006
Brown and Lowe, Unsupervised 3D Object
Recognition and Reconstruction in Unordered
Datasets, 3DIM 06
http://www.arc3d.be/
18
Two important breakthroughs
  • Advances in wide-baseline feature matching (e.g.,
    SIFT)
  • Advances in multi-view geometry techniques

19
Overview of Part I
  • Basic SfM pipeline
  • Feature detection
  • Feature matching and track generation
  • Structure from motion (SfM)
  • Faster matching and SfM
  • Problem cases

20
Feature detection
Detect features using SIFT Lowe, IJCV 2004
21
Feature detection
Detect features using SIFT Lowe, IJCV 2004
22
Feature detectors
  • SIFT Lowe, IJCV 04
  • Binary available at http://www.cs.ubc.ca/~lowe/keypoints/
  • C implementation (by Andrea Vedaldi) available at http://www.vlfeat.org/ (also implements MSER)
  • Other implementations: http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
  • SURF Bay et al., CVIU 08
  • http://www.vision.ee.ethz.ch/~surf/
  • Many others

23
Feature detection
Detect features using SIFT Lowe, IJCV 2004
24
Wide-baseline feature matching
  • Match features between each pair of images

25
Wide-baseline feature matching
  • Standard approach for pairwise matching
  • For each feature in image A
  • Find the feature with the closest descriptor in
    image B

From Schaffalitzky and Zisserman 02
26
Wide-baseline feature matching
  • Compare the distance to the closest feature to
    the distance to the second closest feature
  • If the ratio of distances is less than a
    threshold, keep the feature
  • Why the ratio test?
  • Eliminates hard-to-match repeated features
  • Distances in SIFT space seem to be non-uniform
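As a toy illustration (not the tutorial's code), the ratio test can be sketched in a few lines of Python, using tiny synthetic 2-D descriptors in place of 128-D SIFT vectors:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """For each descriptor in A, accept its nearest neighbor in B only if
    the closest distance is less than `ratio` times the second-closest
    distance (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

# Descriptors 1 and 2 in B are near-duplicates (a repeated feature),
# so the ratio test rejects any match against them.
desc_a = np.array([[0.1, 0.0], [10.0, 10.05]])
desc_b = np.array([[0.0, 0.0], [10.0, 10.0], [10.0, 10.1]])
print(ratio_test_matches(desc_a, desc_b))  # [(0, 0)]
```

Note how the ambiguous repeated descriptor is dropped, which is exactly the behavior the slide motivates.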

27
Wide-baseline feature matching
  • Because of the high dimensionality of features,
    approximate nearest neighbors are necessary for
    efficient performance
  • See the ANN package by Mount and Arya
  • http://www.cs.umd.edu/~mount/ANN/

28
Wide-baseline feature matching
Refine matching using RANSAC with the 8-point algorithm to estimate fundamental matrices between pairs
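A minimal numpy sketch of the (unnormalized) 8-point algorithm on clean synthetic correspondences; real pipelines wrap this in RANSAC and add Hartley normalization, both omitted here:

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate the fundamental matrix F from >= 8 correspondences.
    Each pair contributes one row of A in the homogeneous system A f = 0
    (from x2^T F x1 = 0); solve by SVD, then enforce rank 2."""
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(x1, x2)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                       # fundamental matrices have rank 2
    return U @ np.diag(s) @ Vt

# Synthetic scene: random points, second camera rotated and translated.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (12, 3)) + np.array([0.0, 0.0, 5.0])
c, s_ = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s_], [0, 1, 0], [-s_, 0, c]])
X2 = X @ R.T + np.array([1.0, 0.2, 0.0])
x1, x2 = X[:, :2] / X[:, 2:], X2[:, :2] / X2[:, 2:]
F = eight_point(x1, x2)
```

With noise-free data like this, the epipolar residuals |x2^T F x1| come out essentially zero.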
29
The power of SIFT
30
Image connectivity graph
(graph layout produced using the Graphviz toolkit, http://www.graphviz.org/)
31
From pairwise matches to tracks
  • Once we have pairwise matches, next step is to
    link up matches to form tracks

Image 2
  • Each track is a connected component of the
    pairwise feature match graph
  • Each track will eventually grow up to become a 3D
    point

Image 1
Image 3
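The track-linking step described above amounts to computing connected components of the match graph; a minimal sketch with a union-find structure and made-up (image, feature) pairs:

```python
# Track generation as connected components of the pairwise match graph,
# using a tiny union-find (disjoint-set) structure.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Pairwise matches between (image, feature) nodes.
matches = [((1, 0), (2, 3)), ((2, 3), (3, 7)), ((1, 5), (3, 2))]
for a, b in matches:
    union(a, b)

tracks = {}
for node in list(parent):
    tracks.setdefault(find(node), []).append(node)
# Two tracks result: {(1,0),(2,3),(3,7)} and {(1,5),(3,2)}. A track
# containing two features from the same image would be inconsistent.
```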
32
From pairwise matches to tracks
  • Once we have pairwise matches, next step is to
    link up matches to form tracks

Image 2
  • Some tracks might be inconsistent (e.g., contain two different features from the same image)
  • We remove the offending features from the troublesome images

Image 1
Image 3
33
Image connectivity post track generation
Image matches after track generation
Raw image matches
34
The power of transitivity
35
...but most tracks are short
  • Example image collection with 3,000 images
  • 1,546,612 total tracks
  • 79% have length 2
  • 90% have length < 3
  • 98% have length < 10
  • Longest track: 385 features

36
The story so far
Input images -> Feature detection -> Matching + track generation -> Images with feature correspondence
37
The story so far
  • Next step: use structure from motion to solve for geometry (cameras and points)
  • First: what are cameras and points?

38
Points and cameras
  • Point: a 3D position in space (X)
  • Camera:
  • A 3D position (c)
  • A 3D orientation (R)
  • Intrinsic parameters (focal length f, aspect ratio, ...)
  • 7 parameters (3 + 3 + 1) in total

39
Structure from motion
Camera 1
Camera 3
R1,c1,f1
R3,c3,f3
Camera 2
R2,c2,f2
40
Solving structure from motion
Inputs: feature tracks
Outputs: 3D cameras and points
  • How do we solve the SfM problem?
  • Challenges
  • Large number of parameters (1000s of cameras,
    millions of points)
  • Very non-linear objective function

41
Solving structure from motion
Inputs: feature tracks
Outputs: 3D cameras and points
  • Important tool Bundle Adjustment Triggs et al.
    00
  • Joint non-linear optimization of both cameras and
    points
  • Very powerful, elegant tool
  • The bad news
  • Starting from a random initialization is very
    likely to give the wrong answer
  • Difficult to initialize all the cameras at once

42
Solving structure from motion
Inputs: feature tracks
Outputs: 3D cameras and points
  • The good news
  • Structure from motion with two cameras is
    (relatively) easy
  • Once we have an initial model, it's easy to add new cameras
  • Idea
  • Start with a small seed reconstruction, and grow

43
Incremental SfM
  • Automatically select an initial pair of images

44
Incremental SfM
45
Incremental SfM
46
Incremental SfM Algorithm
  • Pick a strong initial pair of images
  • Initialize the model using two-frame SfM
  • While there are connected images remaining
  • Pick the image which sees the most existing 3D
    points
  • Estimate the pose of that camera
  • Triangulate any new points
  • Run bundle adjustment

47
1. Picking the initial pair
  • We want a pair with many matches, but which has
    as large a baseline as possible

[example pairs: large baseline / very few matches; small baseline / lots of matches; large baseline / lots of matches (the desired case)]
48
1. Picking the initial pair
  • Many possible heuristics
  • Ours:
  • Choose the pair with at least 100 matches, such that the ratio (number of matches consistent with a homography) / (total number of matches) is as small as possible
  • A homography will be a bad fit if there is sufficient parallax (and the scene is not planar)

49
2. Two-frame reconstruction
  • Input two images with correspondence
  • Output camera parameters, 3D points
  • In general, there can be ambiguities if the
    cameras are uncalibrated (camera intrinsics are
    unknown)
  • We assume that the only intrinsic parameter is an
    unknown focal length

50
Finding calibration information
  • Many cameras list the focal length of a photo in
    its Exif metadata

File size: 85111 bytes
File date: 2005:12:16 04:17:12
Camera make: Panasonic
Camera model: DMC-FZ20
Date/Time: 2005:03:19 12:52:33
Resolution: 450 x 600
Flash used: No
Focal length: 6.0mm
Exposure time: 0.0012 s (1/800)
Aperture: f/5.6
ISO equiv.: 80
Whitebalance: Auto
Metering Mode: matrix
Exposure program: (auto)
51
http://www.dpreview.com/reviews/specs/Panasonic/panasonic_dmcfz20.asp
52
Finding calibration information
(same Exif data as above, plus:)
Sensor size: 5.75mm

Focal length (pixels) = Focal length (mm) x Image width (pixels) / Sensor size (mm)
= 6.0 mm x 600 pixels / 5.75 mm = 626.1 pixels
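The conversion on this slide is a one-liner; sketched here with the slide's DMC-FZ20 numbers:

```python
def focal_length_pixels(focal_mm, image_width_px, sensor_width_mm):
    """Focal length in pixels = focal length (mm) x image width (px)
    / sensor width (mm), measured along the same image dimension."""
    return focal_mm * image_width_px / sensor_width_mm

# Values from the example: 6.0 mm lens, 600 px wide image, 5.75 mm sensor.
f_px = focal_length_pixels(6.0, 600, 5.75)
print(round(f_px, 1))  # 626.1
```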
53
2. Two-view reconstruction
  • Two-view SfM: given two calibrated images with corresponding points, compute the camera and point positions
  • Solved by finding the essential matrix between the images
  • Best approach is the 5-point algorithm (as opposed to the 6-, 7-, or 8-point algorithms)

54
Five-point algorithm
Image 1
Image 2
Camera 2
Camera 1
55
Five-point algorithm
  • First practical solution to the 5-point
    algorithm Nister, An efficient solution to the
    5-point relative pose problem, PAMI 04
  • See also
  • Li and Hartley, Five-Point Motion Estimation
    Made Easy, ICPR 06

56
Two-view reconstruction
Camera 2
Camera 1
57
Two-view reconstruction
Camera 2
Camera 1
58
3bc. Pose estimation and triangulation
  • Next step grow the reconstruction by adding
    another image, triangulating new points

n-view triangulation
59
3bc. Pose estimation and triangulation
  • Next step grow the reconstruction by adding
    another image, triangulating new points
  • Both of these problems can be solved
    approximately using linear systems
    (Direct Linear Transformation (DLT))

60
3b. Pose estimation
  • Choose the image with the most matches to
    existing 3D points
  • Linear 6-point algorithm for finding the 3x4 projection matrix P
  • P can then be decomposed into K[R t] (intrinsics, rotation, and translation) using RQ decomposition
  • Use non-linear polishing to snap the camera into
    place
  • For calibrated cameras, there is also a 3-point
    algorithm
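A numpy sketch of the linear (DLT) pose step: each 3D-2D correspondence gives two rows of a homogeneous system in the 12 entries of the projection matrix. Normalization and the non-linear polish are omitted, and the camera and points below are made up:

```python
import numpy as np

def dlt_pose(X, x):
    """Estimate the 3x4 projection matrix P from n >= 6 correspondences
    between 3D points X and 2D observations x, via SVD of A p = 0."""
    rows = []
    for Xw, (u, v) in zip(X, x):
        Xh = np.append(Xw, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    return np.linalg.svd(np.array(rows))[2][-1].reshape(3, 4)

# Ground-truth camera and synthetic 3D points in front of it.
P_true = np.array([[500.0, 0.0, 320.0, 10.0],
                   [0.0, 500.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (10, 3)) + np.array([0.0, 0.0, 4.0])
Xh = np.hstack([X, np.ones((10, 1))])
proj = Xh @ P_true.T
x = proj[:, :2] / proj[:, 2:]
P_est = dlt_pose(X, x)   # equals P_true up to scale
```

Reprojecting the points through `P_est` reproduces the observations, since the estimate agrees with the true matrix up to an overall scale.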

61
3c. n-view triangulation
  • Objective function: sum of squared reprojection errors
  • Also solvable (approximately) using a simple
    linear system
  • Follow with a non-linear polishing
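The linear n-view triangulation can be sketched the same way: each observation contributes two rows, and the SVD gives the approximate point (before non-linear polishing). The two cameras and the point below are hypothetical:

```python
import numpy as np

def triangulate(Ps, xs):
    """n-view linear (DLT) triangulation: observation (u, v) under
    camera P contributes rows u*P[2] - P[0] and v*P[2] - P[1]."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    Xh = np.linalg.svd(np.array(rows))[2][-1]
    return Xh[:3] / Xh[3]

# Two simple cameras: P1 = [I | 0], P2 = [I | t].
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 3.0])

def project(P, X):
    ph = P @ np.append(X, 1.0)
    return ph[:2] / ph[2]

X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
```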

62
3bc. Pose estimation and triangulation
  • In practice, multiple images can be added at once
  • If the highest-matching image has N matches, add
    all images with at least 0.75 N matches (or at
    least 500 matches)

63
3d. Bundle adjustment
Camera 1
Camera 3
R1,c1,f1
R3,c3,f3
Camera 2
R2,c2,f2
64
3d. Bundle adjustment
  • Given:
  • Vectors of cameras C and 3D points X
  • A set of observed point projections
  • q_ij: the observed 2D location of point j in image i
  • adjust the cameras and points to minimize g, the sum of squared reprojection errors

65
Reprojection error
The reprojection error for point X_j in image i is the distance between its projection P(C_i, X_j) and the observed location q_ij. The objective function is

g(C, X) = sum_i sum_j w_ij || q_ij - P(C_i, X_j) ||^2

where w_ij is an indicator variable: 1 if point j is visible in camera i, 0 otherwise
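The objective can be written down directly; a small sketch with one camera, assuming an idealized projection (perspective divide only, no rotation or distortion) and a made-up observation:

```python
import numpy as np

def reprojection_objective(Ps, X, q, w):
    """g(C, X) = sum_ij w_ij * || proj(P_i, X_j) - q_ij ||^2, where
    w_ij = 1 if point j is visible in camera i, else 0."""
    g = 0.0
    for i, P in enumerate(Ps):
        for j, Xj in enumerate(X):
            if not w[i][j]:
                continue
            ph = P @ np.append(Xj, 1.0)
            g += np.sum((ph[:2] / ph[2] - q[i][j]) ** 2)
    return g

P = np.hstack([np.eye(3), np.zeros((3, 1))])   # one camera at the origin
X = [np.array([0.0, 0.0, 2.0])]                # one 3D point
q = [[np.array([0.1, 0.0])]]                   # observation 0.1 off in u
w = [[1]]
g = reprojection_objective([P], X, q, w)       # (0.1 - 0.0)^2 = 0.01
```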
66
Objective function
Projection equation (simplified version): rotate and translate the point into the camera's frame, perspective-divide, and scale by the focal length:

P(C, X) = f * [ (R(X - c))_x / (R(X - c))_z , (R(X - c))_y / (R(X - c))_z ]
67
Bundle adjustment
  • Minimizing g is a sparse non-linear least squares problem
  • Usual approach: approximate P with a linear function (its first-order Taylor expansion), minimize using linear least squares, and repeat until convergence

68
Bundle adjustment
  • Usual approach: approximate P by linearizing around a current guess C0, X0:
  • P(C, X) ~ P(C0, X0) + J [dC; dX]
  • where J is the Jacobian (matrix of partial derivatives of P with respect to the cameras and points)
69
Bundle adjustment
  • Linearized problem: find the step d = (dC, dX) that minimizes || r0 + J d ||^2, where r0 is the vector of current residuals
  • Then set (C, X) = (C0 + dC, X0 + dX) and repeat

70
Bundle adjustment
  • How do we minimize || r0 + J d ||^2 over the step d?
  • Take the least-squares solution to the overconstrained linear system J d = -r0
71
Bundle adjustment
  • (Over-constrained as long as 2 x numObservations > 7 x numCameras + 3 x numPoints)
  • Solved using the normal equations: J^T J d = -J^T r0

72
Bundle adjustment
  • Guess an answer
  • Linearize and compute an optimal step
  • Relinearize and repeat
  • This algorithm is known as Gauss-Newton
  • In practice, a modified algorithm known as
    Levenberg-Marquardt is used
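The linearize-solve-repeat loop is easy to see on a toy one-parameter problem (fitting a in y = exp(a*t) to noise-free data); this is plain Gauss-Newton, without the Levenberg-Marquardt damping used in practice, and the data below is made up:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=10):
    """Repeat: linearize r(x) ~ r(x0) + J dx, take the least-squares
    step dx = -(J^T J)^{-1} J^T r, and update the guess."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r, J = residual(x), jacobian(x)
        x = x + np.linalg.solve(J.T @ J, -J.T @ r)
    return x

# Noise-free data generated with a_true = 0.5; start the guess at 0.
t = np.linspace(0.0, 1.0, 20)
y = np.exp(0.5 * t)
residual = lambda a: np.exp(a[0] * t) - y
jacobian = lambda a: (t * np.exp(a[0] * t))[:, None]
a_hat = gauss_newton(residual, jacobian, [0.0])
```

On this well-conditioned zero-residual problem the iteration converges to machine precision in a handful of steps.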

73
Bundle adjustment
7 points, 3 cameras, 21 observations
21 + 21 = 42 variables; 21 x 2 = 42 equations
74
(No Transcript)
75
(No Transcript)
76
Typical problem (6 cameras, 100 points)
77
Other tricks
  • Many approaches to bundle adjustment use the
    Schur complement to reduce the size of the linear
    system
  • Schur complement factors out points to form a
    reduced system that is just the size of the
    number of camera parameters
  • Bundle adjustment then takes time O(n^3) in the number of cameras (less if the reduced camera system is sparse)
  • See Triggs et al., Bundle Adjustment A Modern
    Synthesis 00 for more details
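The Schur complement trick can be sketched on a small dense system; the block sizes here are arbitrary, and in real bundle adjustment the point block C is block-diagonal (one 3x3 block per point) and therefore cheap to invert:

```python
import numpy as np

# Normal equations in block form:
#   [A   B] [dc]   [u]
#   [B^T C] [dp] = [v]
# Eliminating the point block gives the reduced camera system
#   (A - B C^-1 B^T) dc = u - B C^-1 v.
rng = np.random.default_rng(2)
nc, npts = 4, 9                       # camera / point parameter counts
M = rng.normal(size=(nc + npts, nc + npts))
H = M @ M.T + np.eye(nc + npts)       # symmetric positive definite
b = rng.normal(size=nc + npts)
A, B, C = H[:nc, :nc], H[:nc, nc:], H[nc:, nc:]
u, v = b[:nc], b[nc:]

Cinv = np.linalg.inv(C)
S = A - B @ Cinv @ B.T                # Schur complement
dc = np.linalg.solve(S, u - B @ Cinv @ v)
dp = Cinv @ (v - B.T @ dc)            # back-substitute for the points
```

The result matches solving the full system directly, but the expensive solve involves only the camera block.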

78
Other tricks
  • Many packages use direct methods (e.g., Cholesky
    factorization, QR factorization) to solve the
    linear system
  • Recently, we've been trying iterative methods (e.g., conjugate gradient) to good effect (faster, smaller memory footprint)

79
Sparse bundle adjustment packages
  • Sparse Bundle Adjustment (SBA)
  • Lourakis and Argyros, http://www.ics.forth.gr/~lourakis/sba/
  • Simple Sparse Bundle Adjustment (SSBA)
  • Christopher Zach, http://www.cs.unc.edu/~cmzach/opensource.html

80
The problem of outliers
  • In spite of our best efforts to get clean
    matches, outliers remain
  • The sum-of-squared residuals objective function
    is statistically correct given a Gaussian noise
    model
  • Unfortunately, outliers break the Gaussian
    assumption

81
The problem of outliers
  • Possible solutions
  • After each run of bundle adjustment, remove
    outliers and rerun
  • Use a robust objective function

Credit: Triggs et al., Bundle Adjustment: A Modern Synthesis
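One common robust objective is the Huber loss, which is quadratic for small residuals and linear for large ones; a sketch (the residual values are made up):

```python
import numpy as np

def huber(r, delta=1.0):
    """0.5*r^2 for |r| <= delta, else delta*(|r| - 0.5*delta): large
    residuals grow linearly, so one outlier cannot dominate the sum."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r * r, delta * (a - 0.5 * delta))

residuals = np.array([0.2, -0.5, 50.0])   # the last one is an outlier
squared = 0.5 * residuals**2              # outlier contributes 1250.0
robust = huber(residuals)                 # outlier contributes only 49.5
```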
82
Radial distortion
  • In practice, radial distortion is a significant
    issue

83
Radial distortion
  • Typically modeled as a low-order polynomial in
    the distance from a pixel to the center of
    distortion (often assumed to be the image center)
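A sketch of the usual two-coefficient polynomial model in normalized image coordinates (the k1, k2 values below are made up, not the slide's):

```python
def radial_distort(x, y, k1, k2):
    """Scale a normalized image point by 1 + k1*r^2 + k2*r^4, where r is
    its distance from the center of distortion (assumed at the origin)."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return s * x, s * y

# Barrel distortion (negative k1) pulls points toward the center.
xd, yd = radial_distort(0.5, 0.0, -0.1, 0.0)   # (0.4875, 0.0)
```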

84
Radial distortion
  • Typical values

85
(No Transcript)
86
(No Transcript)
87
(No Transcript)
88
(No Transcript)
89
Timing information
90
Timing breakdown
Matching: O(n^2) in the number of input images (but easily parallelizable)
SfM: worst-case O(n^4) in the number of reconstructed images
91
SfM complexity
  • Dominated by the cost of bundle adjustment
  • If we add a constant number k of images in each round, then we do work proportional to
  • k^3 + (2k)^3 + (3k)^3 + ... + n^3
  • = O(n^4)
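The arithmetic behind the bound: with n/k rounds and cubic-cost bundle adjustment after each round,

```latex
\sum_{i=1}^{n/k} (ik)^3
  = k^3 \sum_{i=1}^{n/k} i^3
  = k^3 \cdot \frac{1}{4}\left(\frac{n}{k}\right)^2 \left(\frac{n}{k}+1\right)^2
  = \Theta\!\left(\frac{n^4}{k}\right)
  = O(n^4).
```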

92
Timing historical comparison
Ours: about 0.002 frames per second
(from David Nister's CVPR 2005 tutorial on real-time 3D reconstruction)
93
Faster image matching
  • Recent techniques are based on ideas from text
    retrieval (applying Google to images)
  • Create a vocabulary of visual features
    (words)
  • Given a database of images, represent each image
    as a collection of visual words (or a histogram
    of word frequencies)
  • Create an inverted file mapping visual words -> images
  • Compute histogram distances using the inverted
    file
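The inverted-file idea can be sketched with plain dictionaries (the image names and visual-word ids below are made up):

```python
from collections import defaultdict

# Each database image is a bag of quantized descriptors (visual words).
database = {"img0": [3, 7, 7, 12], "img1": [7, 12, 12, 40], "img2": [1, 2, 99]}

# Inverted file: visual word -> set of images containing that word.
inverted = defaultdict(set)
for image, words in database.items():
    for w in words:
        inverted[w].add(image)

def candidate_images(query_words):
    """Only images sharing at least one visual word with the query need
    to be scored, instead of comparing against every database image."""
    hits = set()
    for w in query_words:
        hits |= inverted.get(w, set())
    return hits

cands = candidate_images([7, 40])   # img0 and img1, but not img2
```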

94
Faster image matching
  • Idea first appeared in Sivic and
    Zisserman, Video Google A Text Retrieval
    Approach to Object Matching in Videos, ICCV 03

95
Faster image matching
Nister and Stewenius, Scalable Recognition with
a Vocabulary Tree, CVPR 06
Chum et al., Total Recall Automatic Query
Expansion with a Generative Feature Model for
Object Retrieval, ICCV 07
Real time visual image search with 50,000-image
database
Introduced the idea of query expansion for
increasing recall
96
Faster SfM
  • SfM is also very computationally intensive
  • How can we make it faster? We need either
  • Faster algorithms
  • Fewer images
  • Observation Internet collections represent very
    non-uniform samplings of viewpoint
  • Snavely, Seitz, Szeliski, CVPR 2008
  • Idea remove redundant images

97
The Pantheon
98
Stonehenge
99
Stonehenge
Full graph
Skeletal graph
100
Skeletal set
  • Goal: given an image graph G, select a small set S of important images to reconstruct, bounding the loss in quality of the reconstruction
  • Reconstruct the skeletal set S
  • Estimate the remaining images with much faster pose estimation steps

101
Properties of the skeletal set
  • Should touch all parts of the graph G
  • Dominating set
  • Should form a single reconstruction
  • Connected dominating set
  • Should result in an accurate reconstruction
102
Representing information in a graph
103
Representing information in a graph
104
Representing information in a graph
105
Representing information in a graph
  • Want to find a subgraph with
  • many leaves
  • small growth in estimated uncertainty between any
    pair of nodes

106
t-spanner problem
  • Given a graph G, find a spanning subgraph G' such that, for every pair of vertices (P, Q), the distance between P and Q in G' is at most t times the distance between P and Q in G

t: the stretch factor
Applications in wireless ad hoc networking
Peleg and Schäffer 1989; Althöfer et al. 1993; Li et al. 2000; Alzoubi 2003
4-spanner
3-spanner
107
Stonehenge
Skeletal graph (t = 16) (leaves omitted)
Full graph
108
Properties of approach
  • Results in a connected reconstruction (when
    possible)
  • Bounds expected increase in uncertainty of
    reconstructed model (bound is defined by t)
  • Remaining information can be used to refine the
    model after the initial reconstruction

109
Results
110
Pantheon
Full graph
Skeletal graph (t = 16)
111
Skeletal reconstruction: 101 images
After adding leaves: 579 images
After final optimization: 579 images
112
Pisa
1093 images registered (352 in skeletal set)
113
Trafalgar Square
2973 images registered (277 in skeletal set)
114
(No Transcript)
115
Statue of Liberty
7834 images registered (322 in skeletal set)
116
(No Transcript)
117
Running time
[chart: full reconstruction takes tens of days (10-50 days); the skeletal-set approach takes hours]
118
Structure from Motion Failure cases
  • Images too far apart
  • Some points need to be successfully matched in at
    least three images (the Rule of 3)

images courtesy Yasutaka Furukawa
119
Structure from Motion Failure cases
  • Repetitive structures

120
SfM Failure cases
  • Necker reversal

121
SfM Failure cases
  • Necker reversal

122
Gauge ambiguity
  • Without extra information, can only reconstruct
    scene up to an unknown similarity transform
    (translation, rotation, and scale).
  • We don't know where the scene is located, how it is oriented, or how big it is (is the cube 10 cm across or 1,000,000 km?)
  • (im2gps will help with this)

123
Gauge ambiguity
7 points, 3 cameras, 21 observations
21 + 21 = 42 variables
21 + 21 - 7 = 35 free variables (after fixing the 7-parameter similarity transform)
124
Gauge ambiguity
  • Often possible to estimate one of these parameters (the up vector) after reconstruction
  • Usually many cameras are parallel to a ground plane
  • Most people capture images with little camera twist

125
How good are Exif tags?
126
Dense 3D Modeling
Michael Goesele, Noah Snavely, Brian Curless,
Hugues Hoppe, Steve Seitz, ICCV 2007
127
References for Part I
  • Code available at http://phototour.cs.washington.edu/bundler
  • Image Matching
  • F. Schaffalitzky, A. Zisserman. Multi-view
    Matching for Unordered Image Sets, or How do I
    Organize my Holiday Snaps? ECCV 02.
  • Sivic and Zisserman, Video Google A Text
    Retrieval Approach to Object Matching in Videos,
    ICCV 03.
  • D. Nister and H. Stewenius. Scalable Recognition
    with a Vocabulary Tree. CVPR 06.
  • O. Chum et al. Total Recall Automatic Query
    Expansion with a Generative Feature Model for
    Object Retrieval. ICCV 07.

128
References for Part I
  • Code available at http://phototour.cs.washington.edu/bundler
  • Structure from Motion
  • N. Snavely, S. Seitz, R. Szeliski. Modeling the
    World from Internet Photo Collections. IJCV 08.
  • N. Snavely, S. Seitz, R. Szeliski. Skeletal Sets
    for Efficient Structure from Motion. CVPR 08.
  • B. Triggs, P. MacLauchlan, R. Hartley, A.
    Fitzgibbon. Bundle Adjustment A Modern
    Synthesis. ECCV 00.

129
Part I: Photo Tourism (continued)
130
(No Transcript)
131
Photo Tourism
132
Prague Old Town Square
133
Rendering
  • What can we use for rendering?
  • A sparse set of points
  • A sparse set of images
  • Representation too sparse for traditional 3D
    rendering algorithms (geometry too sparse) or
    image-based rendering (images too sparse)
  • Our approach
  • Assume that the scene consists of 3D planes,
    treat images as projectors onto these planes

134
Rendering transitions
135
Rendering transitions
136
Rendering transitions
137
Rendering transitions
Camera A
Camera B
For each image / pair of images, the projection
plane is computed as a best-fit plane to the set
of points
138
Yosemite
139
3D navigation Photo Tourism
Demo
140
Continuous navigation
Demo