Title: Photo Tourism and IM2GPS: 3D Reconstruction and Geolocalization from Internet Photo Collections

1. Photo Tourism and IM2GPS: 3D Reconstruction and Geolocalization from Internet Photo Collections
- Noah Snavely, Cornell University
- James Hays, CMU; MIT (Fall 2009); Brown (Spring 2010-)
CVPR 09, June 20, 2009
2. The world in photos
- There are billions of photos online
- A photographic record of the surface of the earth
- Photo sharing on a massive scale
3. Flickr
[Chart: Flickr's photo count growing past 1, 2, and 3 billion photos]
- > 3.6 billion photos on Flickr, > 7.2 billion on Photobucket, > 15 billion on Facebook
4. Applications of Internet photo collections
- Hays and Efros, Scene completion using millions of photographs
- Crandall et al., Mapping the world's photos
- Simon et al., Scene summarization
5. Applications of Internet photo collections
- Simon and Seitz, Scene segmentation
- Kuthirummal et al., Camera calibration
- See more cool work tomorrow at the Internet Vision Workshop
6. Today's agenda
- Part I: Photo Tourism (3D reconstruction and visualization from Internet photo collections)
- Part II: IM2GPS (the Internet as a data source: automatic geolocation of single images)
7. Rough schedule
- 8:30 - 8:40 Introduction
- Part I: Photo Tourism
  - 8:40 - 10:00 Image matching and structure from motion (Snavely)
  - 10:00 - 10:20 Break
  - 10:20 - 11:00 3D visualization of photo collections (Snavely)
- Part II: IM2GPS
  - 11:00 - 12:30 Geolocalization of images: use of the Internet as a data source (Hays)
8. Part I: Photo Tourism
9. Traditional structure from motion
- Input: video sequence (handheld, mounted to a mechanical arm, or attached to a robot)
- Output: 3D model
Beardsley et al., 3D Model Acquisition from Extended Image Sequences, ECCV 96
Pollefeys et al., Visual Modeling with a Handheld Camera, IJCV 04
David Nister, Ph.D. Thesis
10. Traditional structure from motion
- Input: video sequence (handheld, mounted to a mechanical arm, or attached to a robot)
- Output: 3D model
Commercial SfM software from 2d3
11. Traditional structure from motion
- Input: video sequence (handheld, mounted to an arm, or attached to a robot)
- Output: 3D model
- Input video characteristics:
  - Images are taken by a single camera
  - In a short amount of time
  - Moving continuously
  - Given in a logical temporal order
12. Internet structure from motion
- Input: collection of photos resulting from Internet search
13. Internet structure from motion
- Input: collection of photos resulting from Internet search
- Input characteristics:
  - Taken by many different people and cameras (from a Motorola RAZR to a Nikon D3)
14. Internet structure from motion
- Input: collection of photos resulting from Internet search
- Input characteristics:
  - Taken at many different times of day, year, century
15. Internet structure from motion
- Input: collection of photos resulting from Internet search
- Input characteristics:
  - Given in essentially random order
16. SfM for unordered photo collections
- Very different from traditional video sequences
- Early work in this area by Schaffalitzky and Zisserman: Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?", ECCV 02
17. SfM for unordered photo collections
- Vergauwen and Van Gool, Web-based Reconstruction Service, Machine Vision and Applications 2006 (http://www.arc3d.be/)
- Brown and Lowe, Unsupervised 3D Object Recognition and Reconstruction in Unordered Datasets, 3DIM 06
18. Two important breakthroughs
- Advances in wide-baseline feature matching (e.g., SIFT)
- Advances in multi-view geometry techniques
19. Overview of Part I
- Basic SfM pipeline:
  - Feature detection
  - Feature matching and track generation
  - Structure from motion (SfM)
- Faster matching and SfM
- Problem cases
20. Feature detection
Detect features using SIFT [Lowe, IJCV 2004]
21. Feature detection
Detect features using SIFT [Lowe, IJCV 2004]
22. Feature detectors
- SIFT [Lowe, IJCV 04]
  - Binary available at http://www.cs.ubc.ca/~lowe/keypoints/
  - C implementation (by Andrea Vedaldi) available at http://www.vlfeat.org/ (also implements MSER)
  - Other implementations: http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
- SURF [Bay et al., CVIU 08]
  - http://www.vision.ee.ethz.ch/~surf/
- Many others
23. Feature detection
Detect features using SIFT [Lowe, IJCV 2004]
24. Wide-baseline feature matching
- Match features between each pair of images
25. Wide-baseline feature matching
- Standard approach for pairwise matching: for each feature in image A, find the feature with the closest descriptor in image B
(from Schaffalitzky and Zisserman 02)
26. Wide-baseline feature matching
- Compare the distance to the closest feature against the distance to the second-closest feature
- If the ratio of distances is less than a threshold, keep the match
- Why the ratio test?
  - It eliminates hard-to-match repeated features
  - Distances in SIFT space seem to be non-uniform
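The ratio test can be sketched with brute-force numpy matching. This is illustrative only: a real pipeline would use approximate nearest neighbors (discussed on the next slide), and the 0.8 threshold is a common choice rather than a value taken from this tutorial.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """For each descriptor in A, find its two nearest neighbors in B and
    keep the match only if the nearest is much closer than the runner-up."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # distances to all of B
        j1, j2 = np.argsort(dists)[:2]               # nearest, second nearest
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches
```

A repeated structure (e.g., identical windows) produces two near-equal distances, so the ratio is close to 1 and the ambiguous match is rejected.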
27. Wide-baseline feature matching
- Because of the high dimensionality of the features, approximate nearest neighbor search is necessary for efficient performance
- See the ANN package by Mount and Arya: http://www.cs.umd.edu/~mount/ANN/
28. Wide-baseline feature matching
- Refine the matching using RANSAC with the 8-point algorithm to estimate fundamental matrices between pairs
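As a rough sketch of the estimation step, here is a normalized 8-point algorithm in numpy. In the actual pipeline this linear estimate would sit inside a RANSAC loop over random 8-match samples with an inlier test; the loop is omitted here for brevity.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear 8-point estimate of the fundamental matrix.
    x1, x2: Nx2 arrays of corresponding pixel coordinates (N >= 8)."""
    def normalize(pts):
        # Hartley normalization: center on the centroid, scale so the
        # mean distance from the origin is sqrt(2) (improves conditioning)
        c = pts.mean(axis=0)
        s = np.sqrt(2) / np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
        ph = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
        return ph, T
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence gives one row of the homogeneous system A f = 0,
    # where f is F flattened row-major and x2^T F x1 = f . kron(x2, x1)
    A = np.stack([np.kron(p2[i], p1[i]) for i in range(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint on F
    U, S, Vt2 = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt2
    F = T2.T @ F @ T1            # undo the normalization
    return F / np.linalg.norm(F)
```

With RANSAC, one would repeatedly fit F to 8 random matches and keep the F with the most matches whose symmetric epipolar distance falls below a pixel threshold.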
29. The power of SIFT
30. Image connectivity graph
(graph layout produced using the Graphviz toolkit: http://www.graphviz.org/)
31. From pairwise matches to tracks
- Once we have pairwise matches, the next step is to link up the matches to form tracks
- Each track is a connected component of the pairwise feature match graph
- Each track will eventually grow up to become a 3D point
[Figure: matched features linked across Images 1, 2, and 3]
32. From pairwise matches to tracks
- Once we have pairwise matches, the next step is to link up the matches to form tracks
- Some tracks might be inconsistent (e.g., contain two different features from the same image)
- We remove the features from the troublesome images
[Figure: an inconsistent track spanning Images 1, 2, and 3]
33. Image connectivity post track generation
- Raw image matches vs. image matches after track generation
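Since tracks are connected components of the match graph, they can be generated with a union-find structure over (image, feature) nodes. This sketch simply drops inconsistent tracks entirely, a simplification relative to the procedure above, which removes only the features from the troublesome images.

```python
class UnionFind:
    """Union-find over arbitrary hashable nodes (here, (image, feature))."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def build_tracks(pairwise_matches):
    """pairwise_matches: iterable of ((img_i, feat_i), (img_j, feat_j)).
    Returns consistent tracks as sorted lists of (image, feature) pairs."""
    uf = UnionFind()
    for a, b in pairwise_matches:
        uf.union(a, b)
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), []).append(node)
    tracks = []
    for members in groups.values():
        imgs = [img for img, _ in members]
        if len(imgs) == len(set(imgs)):   # consistent: one feature per image
            tracks.append(sorted(members))
    return tracks
```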
34. The power of transitivity
35. ...but most tracks are short
- Example: image collection with 3,000 images
- 1,546,612 total tracks:
  - 79% have length 2
  - 90% have length ≤ 3
  - 98% have length ≤ 10
  - Longest track: 385 features
36. The story so far
- Input images → feature detection → matching and track generation → images with feature correspondence
37. The story so far
- Next step: use structure from motion to solve for the geometry (cameras and points)
- First: what are cameras and points?
38. Points and cameras
- Point: a 3D position in space
- Camera:
  - a 3D position
  - a 3D orientation
  - intrinsic parameters (focal length, aspect ratio, ...)
  - 7 parameters (3 + 3 + 1) in total
39. Structure from motion
[Figure: Cameras 1-3, with parameters (R1, c1, f1), (R2, c2, f2), (R3, c3, f3), observing a set of 3D points]
40. Solving structure from motion
- Inputs: feature tracks
- Outputs: 3D cameras and points
- How do we solve the SfM problem?
- Challenges:
  - Large number of parameters (1000s of cameras, millions of points)
  - Very non-linear objective function
41. Solving structure from motion
- Important tool: bundle adjustment [Triggs et al. 00]
  - Joint non-linear optimization of both cameras and points
  - Very powerful, elegant tool
- The bad news:
  - Starting from a random initialization is very likely to give the wrong answer
  - It is difficult to initialize all the cameras at once
42. Solving structure from motion
- The good news:
  - Structure from motion with two cameras is (relatively) easy
  - Once we have an initial model, it's easy to add new cameras
- Idea: start with a small seed reconstruction, and grow
43. Incremental SfM
- Automatically select an initial pair of images
44. Incremental SfM
45. Incremental SfM
46. Incremental SfM: algorithm
1. Pick a strong initial pair of images
2. Initialize the model using two-frame SfM
3. While there are connected images remaining:
   a. Pick the image which sees the most existing 3D points
   b. Estimate the pose of that camera
   c. Triangulate any new points
   d. Run bundle adjustment
47. Step 1: Picking the initial pair
- We want a pair with many matches, but with as large a baseline as possible
[Figure: large baseline, very few matches; small baseline, lots of matches; large baseline, lots of matches]
48. Step 1: Picking the initial pair
- Many possible heuristics
- Ours: choose the pair with at least 100 matches such that the fraction of matches consistent with a homography is as small as possible
- A homography will be a bad fit if there is sufficient parallax (and the scene is not planar)
49. Step 2: Two-frame reconstruction
- Input: two images with correspondence
- Output: camera parameters, 3D points
- In general, there can be ambiguities if the cameras are uncalibrated (camera intrinsics are unknown)
- We assume that the only unknown intrinsic parameter is the focal length
50. Finding calibration information
- Many cameras list the focal length of a photo in its Exif metadata:

    File size        : 85111 bytes
    File date        : 2005:12:16 04:17:12
    Camera make      : Panasonic
    Camera model     : DMC-FZ20
    Date/Time        : 2005:03:19 12:52:33
    Resolution       : 450 x 600
    Flash used       : No
    Focal length     : 6.0mm
    Exposure time    : 0.0012 s (1/800)
    Aperture         : f/5.6
    ISO equiv.       : 80
    Whitebalance     : Auto
    Metering Mode    : matrix
    Exposure program : (auto)

51. http://www.dpreview.com/reviews/specs/Panasonic/panasonic_dmcfz20.asp
52. Finding calibration information
- From the Exif tags and the sensor size listed in the spec sheet (5.75 mm):
  Focal length (pixels) = Focal length (mm) × Image width (pixels) / Sensor size (mm)
                        = 6.0 mm × 600 pixels / 5.75 mm ≈ 626.1 pixels
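The conversion above is easy to script. The helper below is a hypothetical utility, not part of any released tool, and it assumes the image width corresponds to the sensor width dimension.

```python
def focal_px(focal_mm, image_width_px, sensor_width_mm):
    """Convert an Exif focal length in mm to a focal length in pixels,
    given the physical sensor width from a camera spec database."""
    return focal_mm * image_width_px / sensor_width_mm
```

For the DMC-FZ20 example, `focal_px(6.0, 600, 5.75)` reproduces the roughly 626.1 pixels computed on the slide.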
53. Step 2: Two-view reconstruction
- Two-view SfM: given two calibrated images with corresponding points, compute the camera and point positions
- Solved by finding the essential matrix between the images
- Best approach is the 5-point algorithm (as opposed to the 6-, 7-, or 8-point algorithms)
54. Five-point algorithm
[Figure: epipolar geometry between Image 1 (Camera 1) and Image 2 (Camera 2)]
55. Five-point algorithm
- First practical solution: Nister, An Efficient Solution to the Five-Point Relative Pose Problem, PAMI 04
- See also: Li and Hartley, Five-Point Motion Estimation Made Easy, ICPR 06
56. Two-view reconstruction
[Figure: Cameras 1 and 2 with triangulated points]
57. Two-view reconstruction
[Figure: Cameras 1 and 2 with triangulated points]
58. Steps 3b, 3c: Pose estimation and triangulation
- Next step: grow the reconstruction by adding another image and triangulating new points
[Figure: n-view triangulation]
59. Steps 3b, 3c: Pose estimation and triangulation
- Next step: grow the reconstruction by adding another image and triangulating new points
- Both of these problems can be solved approximately using linear systems (the Direct Linear Transform, DLT)
60. Step 3b: Pose estimation
- Choose the image with the most matches to existing 3D points
- Linear 6-point algorithm for finding the 3x4 projection matrix P
- P can then be decomposed into K[R t] (intrinsics, rotation, and translation) using RQ decomposition
- Use non-linear polishing to snap the camera into place
- For calibrated cameras, there is also a 3-point algorithm
61. Step 3c: n-view triangulation
- Objective function: sum of squared reprojection errors
- Also solvable (approximately) using a simple linear system
- Follow with non-linear polishing
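The linear (DLT) triangulation step can be sketched as follows. Each view contributes two rows of a homogeneous system, and the least-squares solution is the singular vector with the smallest singular value. Note that this minimizes an algebraic error, not the reprojection error, which is why a non-linear polish follows.

```python
import numpy as np

def triangulate(Ps, xs):
    """n-view linear (DLT) triangulation.
    Ps: list of 3x4 projection matrices; xs: list of (u, v) observations.
    Each view contributes two rows of the homogeneous system A X = 0."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])   # u * (row 3) - (row 1)
        rows.append(v * P[2] - P[1])   # v * (row 3) - (row 2)
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                         # null-space direction
    return X[:3] / X[3]                # dehomogenize
```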
62. Steps 3b, 3c: Pose estimation and triangulation
- In practice, multiple images can be added at once
- If the highest-matching image has N matches, add all images with at least 0.75N matches (or at least 500 matches)
63. Step 3d: Bundle adjustment
[Figure: Cameras 1-3, with parameters (R1, c1, f1), (R2, c2, f2), (R3, c3, f3), and the reconstructed points]
64. Step 3d: Bundle adjustment
- Given:
  - vectors of cameras C and 3D points X
  - a set of observed point projections: q_ij is the observed 2D location of point j in image i
- Adjust the cameras and points to minimize g, the sum of squared reprojection errors
65. Reprojection error
- Objective function: g(C, X) = Σ_i Σ_j w_ij ||P(c_i, X_j) - q_ij||²
  - P(c_i, X_j) is the projection of point X_j into camera i
  - w_ij is an indicator variable: 1 if point j is visible in camera i, 0 otherwise
[Figure: point X_j projects into camera i; the reprojection error is the 2D distance between the projection and the observation q_ij]
66. Objective function
- Projection equation (simplified version): P(c, X) projects point X into the camera by rotating X into the camera's coordinate frame, performing perspective division, and scaling by the focal length f
67. Bundle adjustment
- Minimizing g is a sparse non-linear least squares problem
- Usual approach: approximate P with a linear function, minimize using linear least squares, and repeat until convergence
68. Bundle adjustment
- Usual approach: approximate P by linearizing around a current guess (C0, X0):
  - P(C0 + δC, X0 + δX) ≈ P(C0, X0) + J δ, where J is the Jacobian (matrix of partial derivatives) and δ = (δC, δX)
69. Bundle adjustment
- Linearized problem: find the step δ that minimizes ||J δ - r||, where r is the vector of current residuals
- Then set (C0, X0) ← (C0 + δC, X0 + δX) and repeat
70. Bundle adjustment
- How do we minimize ||J δ - r||?
- Least-squares solution to the overconstrained linear system J δ = r
71. Bundle adjustment
- (Over-constrained as long as 2 × numObservations > 7 × numCameras + 3 × numPoints)
- Solved using the normal equations: JᵀJ δ = Jᵀr
72. Bundle adjustment
- Guess an answer
- Linearize and compute an optimal step
- Relinearize and repeat
- This algorithm is known as Gauss-Newton
- In practice, a modified algorithm known as Levenberg-Marquardt is used
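The Gauss-Newton loop can be illustrated on a toy curve-fitting problem, with a small exponential model standing in for the projection function P. The structure is the same: linearize, solve the normal equations, step, repeat. A real bundle adjuster would add Levenberg-Marquardt damping and exploit the sparsity of J.

```python
import numpy as np

def gauss_newton(f, jac, theta, y, iters=20):
    """Generic Gauss-Newton: linearize the residual around the current
    guess, solve the normal equations J^T J d = J^T r, step, repeat."""
    for _ in range(iters):
        r = y - f(theta)                       # current residuals
        J = jac(theta)                         # Jacobian at the guess
        d = np.linalg.solve(J.T @ J, J.T @ r)  # normal equations
        theta = theta + d
    return theta

# Toy stand-in for P: model y = exp(a*x) + b, parameters theta = (a, b)
x = np.linspace(0.0, 1.0, 50)
y = np.exp(1.5 * x) + 0.3                      # synthetic observations
f = lambda th: np.exp(th[0] * x) + th[1]
jac = lambda th: np.column_stack([x * np.exp(th[0] * x), np.ones_like(x)])
theta = gauss_newton(f, jac, np.array([1.0, 0.0]), y)
```

Just as the slides warn for SfM, this iteration only converges from a reasonable initial guess, which is why the pipeline grows the reconstruction incrementally instead of starting from scratch.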
73. Bundle adjustment
- Example: 7 points, 3 cameras, 21 observations
- 21 + 21 = 42 variables; 21 × 2 = 42 equations
76. Typical problem (6 cameras, 100 points)
77. Other tricks
- Many approaches to bundle adjustment use the Schur complement to reduce the size of the linear system
- The Schur complement factors out the points to form a reduced system that is just the size of the number of camera parameters
- Bundle adjustment then takes time O(n³) in the number of cameras (less if the reduced camera system is sparse)
- See Triggs et al., Bundle Adjustment: A Modern Synthesis, 00 for more details
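The Schur trick above can be demonstrated on a small dense system with the camera/point block structure. This is a toy: in a real bundle adjuster the point block C is block-diagonal (3x3 per point), so inverting it is cheap, whereas here it is inverted densely for illustration.

```python
import numpy as np

# Toy normal-equations system [[B, E], [E^T, C]] [dc; dp] = [v; w]:
# B is the camera block, C the point block, E the coupling terms.
rng = np.random.default_rng(1)
nc, npts = 4, 10                                 # toy block sizes
A = rng.normal(size=(nc + npts, nc + npts))
H = A @ A.T + (nc + npts) * np.eye(nc + npts)    # SPD, well-conditioned
b = rng.normal(size=nc + npts)
B, E, C = H[:nc, :nc], H[:nc, nc:], H[nc:, nc:]
v, w = b[:nc], b[nc:]

# Eliminate the points: solve only a camera-sized reduced system
Cinv = np.linalg.inv(C)
S = B - E @ Cinv @ E.T                           # Schur complement of C
dc = np.linalg.solve(S, v - E @ Cinv @ w)        # camera step
dp = Cinv @ (w - E.T @ dc)                       # back-substitute points
```

Substituting dp = C⁻¹(w - Eᵀdc) into the first block row gives (B - E C⁻¹ Eᵀ) dc = v - E C⁻¹ w, which is exactly the reduced camera system solved above.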
78. Other tricks
- Many packages use direct methods (e.g., Cholesky factorization, QR factorization) to solve the linear system
- Recently, we've been trying iterative methods (e.g., conjugate gradient) to good effect (faster, smaller memory footprint)
79. Sparse bundle adjustment packages
- Sparse Bundle Adjustment (SBA): Lourakis and Argyros, http://www.ics.forth.gr/~lourakis/sba/
- Simple Sparse Bundle Adjustment (SSBA): Christopher Zach, http://www.cs.unc.edu/~cmzach/opensource.html
80. The problem of outliers
- In spite of our best efforts to get clean matches, outliers remain
- The sum-of-squared-residuals objective function is statistically correct given a Gaussian noise model
- Unfortunately, outliers break the Gaussian assumption
81. The problem of outliers
- Possible solutions:
  - After each run of bundle adjustment, remove outliers and rerun
  - Use a robust objective function
(credit: Triggs et al., Bundle Adjustment: A Modern Synthesis)
82. Radial distortion
- In practice, radial distortion is a significant issue
83. Radial distortion
- Typically modeled as a low-order polynomial in the distance from a pixel to the center of distortion (often assumed to be the image center)
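A low-order polynomial model of this kind can be sketched as below. The two-parameter form (1 + k1·r² + k2·r⁴) is a common convention, not necessarily the exact parameterization used by this system.

```python
import numpy as np

def distort(xy, k1, k2):
    """Apply a two-parameter polynomial radial distortion model.
    xy: Nx2 coordinates relative to the distortion center. Each point is
    scaled radially by (1 + k1*r^2 + k2*r^4)."""
    r2 = (xy ** 2).sum(axis=1, keepdims=True)    # squared radius per point
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)
```

The distortion coefficients are simply two more parameters per camera that bundle adjustment can optimize along with rotation, position, and focal length.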
84. Radial distortion
89. Timing information
90. Timing breakdown
- Matching: O(n²) in the number of input images (but easily parallelizable)
- SfM: worst case O(n⁴) in the number of reconstructed images
91. SfM complexity
- Dominated by the cost of bundle adjustment
- If we add a constant number k of images in each round, then we do work proportional to k³ + (2k)³ + (3k)³ + ... + n³ (a sum of n/k terms whose total grows like n⁴/k), i.e., O(n⁴)
92. Timing: historical comparison
- Ours: about 0.002 frames per second
(from David Nister's CVPR 2005 tutorial on real-time 3D reconstruction)
93. Faster image matching
- Recent techniques are based on ideas from text retrieval (applying Google to images):
  - Create a vocabulary of visual features ("words")
  - Given a database of images, represent each image as a collection of visual words (or a histogram of word frequencies)
  - Compute an inverted file mapping visual words -> images
  - Compute histogram distances using the inverted file
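The inverted-file idea can be sketched with plain dictionaries. This toy version scores images by shared-word counts; real systems weight words (e.g., by tf-idf) and quantize descriptors into words with a learned vocabulary, both omitted here.

```python
from collections import defaultdict, Counter

def build_inverted_file(image_words):
    """Map each visual word id to the set of image ids that contain it."""
    inv = defaultdict(set)
    for img, words in image_words.items():
        for w in words:
            inv[w].add(img)
    return inv

def query(inv, words):
    """Rank database images by how many query words they share."""
    votes = Counter()
    for w in set(words):
        for img in inv.get(w, ()):
            votes[img] += 1
    return votes.most_common()
```

Because only images sharing at least one word with the query are ever touched, scoring is far cheaper than comparing the query against every database image.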
94. Faster image matching
- The idea first appeared in Sivic and Zisserman, Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV 03
95. Faster image matching
- Nister and Stewenius, Scalable Recognition with a Vocabulary Tree, CVPR 06: real-time visual image search with a 50,000-image database
- Chum et al., Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, ICCV 07: introduced the idea of query expansion for increasing recall
96. Faster SfM
- SfM is also very computationally intensive
- How can we make it faster? We need either:
  - faster algorithms, or
  - fewer images
- Observation: Internet collections represent very non-uniform samplings of viewpoint
- Idea: remove redundant images [Snavely, Seitz, Szeliski, CVPR 2008]
97. The Pantheon
98. Stonehenge
99. Stonehenge
- Full graph vs. skeletal graph
100. Skeletal set
- Goal: given an image graph G, select a small set S of important images to reconstruct, bounding the loss in quality of the reconstruction
- Reconstruct the skeletal set S
- Estimate the remaining images with much faster pose estimation steps
101. Properties of the skeletal set
- Should touch all parts of G: a dominating set
- Should form a single reconstruction: a connected dominating set
- Should result in an accurate reconstruction
102. Representing information in a graph
103. Representing information in a graph
104. Representing information in a graph
105. Representing information in a graph
- Want to find a subgraph with:
  - many leaves
  - small growth in estimated uncertainty between any pair of nodes
106. t-spanner problem
- Given a graph G, find a spanning subgraph G' such that, for every pair of vertices (P, Q), the distance between P and Q in G' is at most t times the distance between P and Q in G
- t is called the stretch factor
- Applications in wireless ad hoc networking [Peleg and Schäffer 1989; Althöfer et al. 1993; Li et al. 2000; Alzoubi 2003]
[Figure: a 4-spanner and a 3-spanner of an example graph]
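For intuition, here is the classic greedy spanner construction for an unweighted graph: keep an edge only if its endpoints are currently more than t hops apart in the spanner built so far. This illustrates the t-spanner definition only; it is not the skeletal-sets algorithm, which optimizes an uncertainty-based edge weight rather than hop count.

```python
from collections import deque

def bfs_dist(adj, s, g):
    """Hop distance from s to g in the current spanner (inf if disconnected)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        x = q.popleft()
        if x == g:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return float("inf")

def greedy_spanner(n, edges, t):
    """Greedy t-spanner of an unweighted graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    kept = []
    for u, v in edges:
        if bfs_dist(adj, u, v) > t:        # edge needed to meet the stretch bound
            adj[u].append(v)
            adj[v].append(u)
            kept.append((u, v))
    return kept
```

On the complete graph K4 with t = 2, the construction keeps only a 3-edge star: every remaining pair is already within 2 hops, so all other edges are redundant, mirroring how the skeletal graph discards redundant images.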
107. Stonehenge
- Full graph vs. skeletal graph (t = 16, leaves omitted)
108. Properties of the approach
- Results in a connected reconstruction (when possible)
- Bounds the expected increase in uncertainty of the reconstructed model (the bound is defined by t)
- The remaining information can be used to refine the model after the initial reconstruction
109. Results
110. Pantheon
- Full graph vs. skeletal graph (t = 16)
111. Skeletal reconstruction: 101 images; after adding leaves: 579 images; after final optimization: 579 images
112. Pisa
- 1093 images registered (352 in skeletal set)
113. Trafalgar Square
- 2973 images registered (277 in skeletal set)
115. Statue of Liberty
- 7834 images registered (322 in skeletal set)
117. Running time
[Chart: skeletal-set reconstructions finish in hours, versus estimated full-set times of 10 and 50 days]
118. Structure from motion: failure cases
- Images too far apart
- Some points need to be successfully matched in at least three images (the "Rule of 3")
(images courtesy Yasutaka Furukawa)
119. Structure from motion: failure cases
120. SfM failure cases
121. SfM failure cases
122. Gauge ambiguity
- Without extra information, we can only reconstruct the scene up to an unknown similarity transform (translation, rotation, and scale)
- We don't know where the scene is located, how it is oriented, or how big it is (is the cube 10 cm across or 1,000,000 km?)
- (im2gps will help with this)
123. Gauge ambiguity
- 7 points, 3 cameras, 21 observations
- 21 + 21 = 42 variables
- 21 + 21 - 7 = 35 variables once the 7-parameter gauge is fixed
124. Gauge ambiguity
- It is often possible to estimate one of these parameters (the up vector) after reconstruction:
  - many cameras are parallel to a ground plane
  - most people capture images with little camera twist
125. How good are Exif tags?
126. Dense 3D modeling
- Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, Steve Seitz, ICCV 2007
127. References for Part I
- Code available at http://phototour.cs.washington.edu/bundler
- Image matching:
  - F. Schaffalitzky and A. Zisserman. Multi-view Matching for Unordered Image Sets, or "How Do I Organize my Holiday Snaps?" ECCV 02.
  - J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 03.
  - D. Nister and H. Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR 06.
  - O. Chum et al. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. ICCV 07.
128. References for Part I
- Code available at http://phototour.cs.washington.edu/bundler
- Structure from motion:
  - N. Snavely, S. Seitz, R. Szeliski. Modeling the World from Internet Photo Collections. IJCV 08.
  - N. Snavely, S. Seitz, R. Szeliski. Skeletal Sets for Efficient Structure from Motion. CVPR 08.
  - B. Triggs, P. McLauchlan, R. Hartley, A. Fitzgibbon. Bundle Adjustment: A Modern Synthesis. ECCV 00.
129. Part I: Photo Tourism (continued)
131. Photo Tourism
132. Prague Old Town Square
133. Rendering
- What can we use for rendering?
  - A sparse set of points
  - A sparse set of images
- This representation is too sparse for traditional 3D rendering algorithms (geometry too sparse) or image-based rendering (images too sparse)
- Our approach:
  - Assume that the scene consists of 3D planes; treat the images as projectors onto these planes
134. Rendering transitions
135. Rendering transitions
136. Rendering transitions
137. Rendering transitions
- For each image / pair of images (Cameras A and B), the projection plane is computed as a best-fit plane to the set of points
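The best-fit plane above can be computed from the 3D points by SVD of the centered point set (the normal is the direction of least variance). This is a minimal sketch; a production system might fit the plane robustly to cope with outlier points.

```python
import numpy as np

def best_fit_plane(pts):
    """Least-squares plane through a set of 3D points (Nx3).
    Returns (centroid, unit normal); the plane minimizes the sum of
    squared orthogonal point-to-plane distances."""
    c = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - c)   # rows of Vt: principal directions
    return c, Vt[-1]                    # smallest-variance direction
```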
138. Yosemite
139. 3D navigation: Photo Tourism
- Demo
140. Continuous navigation
- Demo