Title: CV: methods of 3D sensing
1 CV methods of 3D sensing
- Structured light
- Shape-from-shading
- Photometric stereo
- Depth-from-focus
- Structure from motion
2 Alternate projection models
- orthographic
- weak perspective
- simpler mathematical models
- the approximations are often very good near the center of the FOV
- can use one as a first approximation and then switch to full perspective
3 Perspective vs orthographic projection
Orthographic projection is often used in design and blueprints: true (scaled) dimensions can be taken directly from the image.
4 Orthographic projection
5 Weak perspective is orthographic projection followed by scaling
6 Study of the approximation
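A small numeric sketch of the three models, assuming a camera at the origin looking along +Z with focal length f, and weak perspective scaled by one reference depth z0 for the whole object (here the mean depth of its points). The function names are illustrative, not from the slides. It shows why the approximation is good when the object's depth range is small compared with its distance.

```python
import numpy as np

def perspective(P, f):
    """Full perspective: (X, Y, Z) -> (f*X/Z, f*Y/Z) for each row of P."""
    P = np.asarray(P, dtype=float)
    return f * P[:, :2] / P[:, 2:3]

def orthographic(P):
    """Orthographic: (X, Y, Z) -> (X, Y); depth is simply dropped."""
    return np.asarray(P, dtype=float)[:, :2]

def weak_perspective(P, f, z0=None):
    """Weak perspective: orthographic projection, then one uniform scale f/z0.
    z0 is a single reference depth for the object (default: mean point depth)."""
    P = np.asarray(P, dtype=float)
    if z0 is None:
        z0 = P[:, 2].mean()
    return (f / z0) * orthographic(P)

# Object about 1000 units away with only +/-10 units of depth variation.
obj = np.array([[0.0, 0.0, 990.0],
                [50.0, 20.0, 1000.0],
                [-40.0, 30.0, 1010.0]])
f = 50.0
err = np.abs(perspective(obj, f) - weak_perspective(obj, f)).max()
print(err)  # about 0.02, roughly 1% of image coordinates near 2
```

Increasing the depth spread or moving points farther off-axis makes the error grow, which is the sense in which weak perspective is a first approximation to full perspective.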
7 P3P problem: solve for the pose of an object relative to the camera using 3 corresponding point pairs (Pi, Qi)
- Pi: 3 points in 3D
- Qi: 3 corresponding 2D image points
8 What is the pose of an object?
- pose means position and orientation
- work in the 3D camera frame defined by a known camera with known parameters
- common problem: given the image of a known model of an object, compute the pose of that object in the camera frame
- needed for object recognition by alignment and for robot manipulation
9 Recognition by alignment
- Have CAD models of the objects
- Detect image features of the objects
- Compute object pose from 3D-2D point matches
10 P3P solution approach
11 General PnP problem
- perspective n-point problem
- Given n 3D points from some model
- Given n 2D image points known to correspond to the 3D model points
- Given the perspective transformation with known camera parameters (not pose)
- Solve for the location of all n model points in terms of camera coordinates, or for the relative rotation and translation of the object model
12 Formal definition of PnP problem
Solutions exist for P3P: in most cases there are 2 solutions, and in rare cases there are 4 (see the Fischler and Bolles 1981 paper). An iterative solution, good for continuous tracking, is given below. A simpler solution using weak perspective was given by Huttenlocher and Ullman (1988).
13 Deriving 3 quadratic equations in 3 unknowns
We know the qi; by solving for the 3 ai we will know where each Pi is located.
We know the interpoint distances from the model.
The qi are unit vectors.
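Written out, assuming the center of projection is at the origin of the camera frame, qi is the unit vector from the origin toward image point Qi, ai > 0 is the unknown distance to Pi, and dij is the known model distance between Pi and Pj:

```latex
P_i = a_i\,q_i, \qquad \|q_i\| = 1, \qquad i = 1, 2, 3

d_{ij}^2 = \|a_i q_i - a_j q_j\|^2 = a_i^2 + a_j^2 - 2\,a_i a_j\,(q_i \cdot q_j)

f_{ij}(a_1, a_2, a_3) = a_i^2 + a_j^2 - 2\,a_i a_j\,(q_i \cdot q_j) - d_{ij}^2 = 0,
\qquad (i, j) \in \{(1,2),\ (2,3),\ (1,3)\}
```

These are the three quadratic equations in the three unknowns a1, a2, a3.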
14 Iteratively solving 3 equations in 3 unknowns
We want all three functions to be 0.
15 Approximate via Taylor series
Start with some guessed a1, a2, a3 and move along the gradient so that the three function values approach (0, 0, 0).
16 Solution using Newton's method
17 Our functions have simple partial derivatives
18 Iteration can be very fast
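A minimal sketch of the Newton iteration a <- a - J(a)^-1 f(a) for the three functions fij(a) = ai^2 + aj^2 - 2 ai aj (qi . qj) - dij^2, assuming the unit rays qi and model distances dij are given and the starting guess is close (for example the previous frame's answer during tracking). The function names and the synthetic data are illustrative only.

```python
import numpy as np

PAIRS = [(0, 1), (1, 2), (0, 2)]  # point pairs (i, j), 0-based: d = [d12, d23, d13]

def p3p_residuals(a, q, d):
    """f_ij(a) = a_i^2 + a_j^2 - 2 a_i a_j (q_i . q_j) - d_ij^2, one per pair."""
    return np.array([a[i] ** 2 + a[j] ** 2 - 2 * a[i] * a[j] * (q[i] @ q[j]) - d[k] ** 2
                     for k, (i, j) in enumerate(PAIRS)])

def p3p_jacobian(a, q):
    """df_ij/da_i = 2 a_i - 2 a_j (q_i . q_j); df_ij/da_j likewise; 0 for the third a."""
    J = np.zeros((3, 3))
    for k, (i, j) in enumerate(PAIRS):
        c = q[i] @ q[j]
        J[k, i] = 2 * a[i] - 2 * a[j] * c
        J[k, j] = 2 * a[j] - 2 * a[i] * c
    return J

def p3p_newton(q, d, a0, tol=1e-10, max_iter=50):
    """Newton's method: a <- a - J(a)^-1 f(a) until all three residuals are ~0."""
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        f = p3p_residuals(a, q, d)
        if np.max(np.abs(f)) < tol:
            break
        a = a - np.linalg.solve(p3p_jacobian(a, q), f)
    return a

# Synthetic check: model points in the camera frame, their image rays, and distances.
P = np.array([[1.0, 0.0, 3.0], [-1.0, 0.5, 3.5], [0.0, -1.0, 2.5]])
q = P / np.linalg.norm(P, axis=1, keepdims=True)   # unit vectors toward the image points
d = np.array([np.linalg.norm(P[i] - P[j]) for i, j in PAIRS])
a = p3p_newton(q, d, a0=[3.2, 3.7, 2.7])           # guess near the answer, as when tracking
P_est = a[:, None] * q                             # located model points Pi = ai qi
```

Because every term is quadratic in the ai, replacing all ai by -ai gives the same residuals; those mirror-image solutions lie behind the camera and are discarded.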
19 Notes on this P3P method
- the equations actually have 8 solutions
- 4 are behind the camera (each ai replaced by -ai)
- 4 are possible, but rare
- 2 are common; how do we get both solutions?
- method used by Ohmura et al. (1988) to track a human face at a workstation using points outside the eyes and one under the nose
- any 3 model points can align with any 3 image points, so a ship model can be matched to the image of a face
20 Using weak perspective
- the algorithm by Huttenlocher and Ullman is in closed form, with no iterations
- it produces 2 solutions
- these solutions can be used as starting points for the iterative perspective method
- additional point correspondences can be used to choose the correct starting point
21 Shape from shading methods
- Compute surface normals of diffuse objects from the intensities of surface pixels
22 Surface normals in C orthographic projection
23 Information used by such algorithms
- Typically use a weak perspective projection model
- The normal of the brightest surface element points toward the light
- At the object limb, the normal is perpendicular to the line of sight
- Use differential equations to propagate z from the boundary using the surface normal
- Smooth using neighbor information
24 Results from the Tsai-Shah algorithm
Left: from a computer-generated image of a vase; right: from a bust of Mozart.
25 Constraint on surface normals
There is a cone of constraint for a normal N relative to the light source.
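Under a Lambertian assumption (unit normal N, unit light direction L, albedo rho), one intensity measurement fixes only the angle theta between N and L, which is why N is constrained to a cone around L rather than determined:

```latex
I = \rho\,(N \cdot L) = \rho\cos\theta
\quad\Longrightarrow\quad
\theta = \arccos\!\left(\frac{I}{\rho}\right)
```

Every unit N at angle theta from L fits the measurement. A second light gives a second cone, and two such cones meet in at most two directions, so a third light is typically used to pick one.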
26 How to use the constraints?
27 Photometric stereo: calibrate by lighting a sphere, build lookup tables
28 Photometric stereo: 3 lights
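A sketch of three-light photometric stereo as a per-pixel linear solve (the classic Lambertian formulation associated with Woodham, not the sphere-calibrated lookup tables above). It assumes a Lambertian surface, no shadows, and three known, non-coplanar unit light directions: stacking the measurements gives I = L g with g = rho * N, so g = L^-1 I, rho = |g|, N = g/rho. Names are illustrative.

```python
import numpy as np

def photometric_stereo(intensities, lights):
    """Per-pixel albedo and unit normal from 3 images under 3 known distant lights.

    intensities: (3, H, W) array; lights: (3, 3) array, one unit light direction per row.
    Lambertian model per pixel: I_k = rho * (L_k . N), i.e. I = L @ g with g = rho * N.
    """
    I = np.asarray(intensities, dtype=float)
    L = np.asarray(lights, dtype=float)
    _, h, w = I.shape
    g = np.linalg.solve(L, I.reshape(3, -1))      # (3, H*W); lights must be non-coplanar
    rho = np.linalg.norm(g, axis=0)
    normals = g / np.where(rho > 0, rho, 1.0)     # avoid dividing by zero on dark pixels
    return rho.reshape(h, w), normals.T.reshape(h, w, 3)
```

Pixels in shadow for any light violate the linear model; this is one reason real setups need careful calibration.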
29 Photometric stereo online
30 Comments
- Photometric stereo is a brilliant idea
- Rajarshi Ray got it to work well even on specular objects, such as metal parts
- Requires careful setup and calibration
- Not a replacement for structured light, which has better precision and flexibility, as evidenced by its many applications
31 Depth from focus
- Humans and machine vision devices can use focus in a single image to estimate depth
32 Use the thin lens model
World point P is in focus at image point p
33 Automatic focus technique
- Consumer camera autofocus: many methods
- One method requires the user to frame the object in a small window (a face?)
- Focus is changed automatically until the contrast is best
- Search over focal length until the small window has the sharpest features (most energy)
34 Depth map from focus concept
- for each focal length fi in the range:
- set the focal plane at fi and take an image
- for all pixels (x, y) in the image, compute contrast[fi, x, y]
- then, for each pixel, set Depth[x, y] = the fi with maximum contrast[fi, x, y]
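The loop above can be sketched with NumPy, assuming the focal stack is already captured as a (K, H, W) array with one image per focus setting fi. The slides only say "contrast"; the focus measure used here (squared discrete Laplacian summed over a small window) is one common choice and is an assumption, as are the function names.

```python
import numpy as np

def local_contrast(img, radius=2):
    """Focus measure: squared discrete Laplacian, summed over a (2r+1) x (2r+1) window."""
    img = np.asarray(img, dtype=float)
    p = np.pad(img, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img
    energy = lap ** 2
    e = np.pad(energy, radius, mode="edge")
    h, w = energy.shape
    out = np.zeros_like(energy)
    for dy in range(2 * radius + 1):          # box sum over the window
        for dx in range(2 * radius + 1):
            out += e[dy:dy + h, dx:dx + w]
    return out

def depth_from_focus(stack, focus_settings, radius=2):
    """stack: (K, H, W) images, one per focus setting fi; returns Depth[y, x]."""
    contrast = np.stack([local_contrast(im, radius) for im in stack])  # contrast[fi, y, x]
    best = np.argmax(contrast, axis=0)        # index of the sharpest setting per pixel
    return np.asarray(focus_settings)[best]
```

Textureless regions have no contrast at any setting, so their depth is unreliable; real systems usually mask them out.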
35 A look at blur vs focal length
- Can define the resolution limit in line pairs per inch
- Can define the depth of field of sensing
36 Points P create a blurred image on non-optimal image planes
Point P is in focus on plane S, but out of focus on planes S' and S''.
37 How many line pairs can be resolved?
- imagine a target that is just a set of parallel black lines on white paper
- if the lines are far apart relative to the blur radius b, their image will be a set of lines
- if the lines are close relative to the blur radius b, a gray image without clear lines will be observed
38 Thin lens equation relates object depth to image plane via f
For world point P in focus, the thin lens equation is
1/f = 1/u + 1/v
where u is the object distance and v the image distance from the lens.
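A short sketch of using the thin lens equation in both directions, which is what a focus setting does: fixing the image plane v selects the one object depth u that is sharp. It assumes distances measured from the lens in the same units; the function name is hypothetical.

```python
def thin_lens_conjugate(d, f):
    """Return the other distance from 1/f = 1/u + 1/v, given u (or v) and f.
    The equation is symmetric in u and v, so one formula covers both directions."""
    if d <= f:
        raise ValueError("distance must exceed f to form a real image")
    return d * f / (d - f)

v = thin_lens_conjugate(1000.0, 50.0)  # image distance for an object at u = 1000 mm (about 52.63 mm)
u = thin_lens_conjugate(v, 50.0)       # object depth that is in focus at that v
```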
39 Derivation of the thin lens equation from geometry
40 To compute depth of field
- the blur changes for different image plane locations via simple geometry
- move the image plane forward: get blur
- move the image plane backward: get blur
- move the image plane to the extremes allowed by the limiting blur b and compute the depth of field
41 Extreme locations of v set the extremes of u
a is the aperture. By similar triangles, b/a = (v' - v)/v, so v'/v = (a + b)/a.
42 Compute near extreme of u
Apply the thin lens equation with v'.
Note that if b = 0, we obtain Un = u.
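The algebra, with v the image distance of the focused depth u (1/f = 1/u + 1/v), v' the image distance of the near extreme Un, and the similar-triangles relation v'/v = (a + b)/a:

```latex
\frac{1}{U_n} = \frac{1}{f} - \frac{1}{v'}
 = \frac{1}{f} - \frac{a}{a+b}\cdot\frac{1}{v}
 = \frac{1}{f} - \frac{a}{a+b}\left(\frac{1}{f} - \frac{1}{u}\right)
 = \frac{b}{(a+b)\,f} + \frac{a}{(a+b)\,u}

U_n = \frac{(a+b)\,f\,u}{a f + b u} = \frac{u\,(a+b)}{a + u b/f},
\qquad
U_r = \frac{u\,(a-b)}{a - u b/f}
```

The far extreme Ur follows from the same steps with v'/v = (a - b)/a, and setting b = 0 gives Un = Ur = u.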
43 Compute far extreme of u
DEF: The depth of field is the difference between the far and near object planes (Ur - Un) for the given imaging parameters and blur b. Smaller focal lengths f yield a larger DOF.
44 Example computation
- assume f = 50 mm, u = 1000 mm,
- b = 0.025 mm, a = 5 mm (so ub/f = 25/50)
- Un = 1000 (5 + 0.025) / (5 + 25/50) = 1000 (5.025)/5.5 ≈ 914 mm
- Ur = 1000 (5 - 0.025) / (5 - 25/50) = 1000 (4.975)/4.5 ≈ 1106 mm
45 Example computation
- assume f = 25 mm, u = 1000 mm,
- b = 0.025 mm, a = 5 mm (so ub/f = 25/25)
- Un = 1000 (5 + 0.025) / (5 + 25/25) = 1000 (5.025)/6.0 ≈ 838 mm
- Ur = 1000 (5 - 0.025) / (5 - 25/25) = 1000 (4.975)/4.0 ≈ 1244 mm
- A smaller f gives a larger DOF (about 406 mm here vs about 192 mm for f = 50 mm)
46 Large a needed to pinpoint u
- changing the aperture to 10 mm (with f = 50 mm, u = 1000 mm, b = 0.025 mm):
- Un ≈ 955 mm
- Ur ≈ 1050 mm
- changing the aperture to 20 mm:
- Un ≈ 977 mm
- Ur ≈ 1024 mm
- (See work of Murali Subbarao)
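A small sketch that reproduces the example numbers from the two formulas Un = u(a + b)/(a + ub/f) and Ur = u(a - b)/(a - ub/f). It assumes the same similar-triangles approximation and consistent units (mm); the function name is hypothetical, and treating a <= ub/f as an unbounded far limit is an interpretation of the formula, not something stated on the slides.

```python
import math

def depth_of_field(f, u, a, b):
    """Near/far in-focus object distances (Un, Ur) and DOF = Ur - Un.

    f: focal length, u: focused object distance, a: aperture diameter,
    b: largest acceptable blur; all in the same units (mm here).
    """
    k = u * b / f
    un = u * (a + b) / (a + k)
    # Under this model, if a <= ub/f the far limit never binds (Ur is unbounded).
    ur = u * (a - b) / (a - k) if a > k else math.inf
    return un, ur, ur - un

for f, a in [(50, 5), (25, 5), (50, 10), (50, 20)]:
    un, ur, dof = depth_of_field(f=f, u=1000, a=a, b=0.025)
    print(f"f={f} a={a}: Un={un:.0f} Ur={ur:.0f} DOF={dof:.0f}")
```

The loop covers the four cases from the example slides: shrinking f widens the DOF, while enlarging a narrows it and so pins down u more precisely.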
47 Structure from Motion
- A moving camera/computer computes the 3D structure of the scene and its own motion
48 Sensing 3D scene structure via a moving camera
We now have two views separated in time and space, compared to stereo, which has multiple views at the same time.
49 Assumptions for now
- The scene is rigid.
- The scene may move or the camera may move, giving a sequence of 2 or more 2D images.
- Corresponding 2D image points (Pi, Pj) are available across the images.
50 What can be computed
- The 3D coordinates of the scene points
- The motion of the camera
(Figure: a camera sees many frames of 2D points of a rigid scene with many 3D interest points. From Jebara, Azarbayejani, Pentland)
51 From 2D point correspondences, compute 3D points
WP and TR
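Once the two cameras' 3x4 projection matrices are known (simply assumed here, for example from a calibrated rig or an estimated relative pose), one common way to compute each 3D point from its 2D correspondence is linear (DLT) triangulation. This is a generic sketch, not the specific method of Jebara, Azarbayejani and Pentland; the function name is hypothetical.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices (assumed known).
    x1, x2: the matched image points (x, y) in each view.
    Each view gives two equations in the homogeneous point X:
    x * (P[2] @ X) - P[0] @ X = 0 and y * (P[2] @ X) - P[1] @ X = 0.
    """
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # least-squares null vector of A (unit norm)
    return X[:3] / X[3]        # back to inhomogeneous 3D coordinates
```

With noisy matches the four equations are inconsistent, and the SVD returns the least-squares compromise.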
52 Applications
- We can compute a 3D model of a landmark from a video
- We can create 3D television!
- We can compute the trajectory of the sensor relative to the 3D object points
53 Using only 2D correspondences, SfM can compute the 3D jig points up to one scale factor.
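A quick numeric illustration of why one scale factor remains: scaling every scene point and the camera translation by the same s > 0 leaves every projected image point unchanged, so 2D correspondences alone cannot fix absolute size. The camera poses and random points are made up for the demonstration.

```python
import numpy as np

def project(R, t, X, f=1.0):
    """Pinhole projection of world points X (N x 3) by a camera with pose (R, t)."""
    Xc = X @ R.T + t                      # world -> camera coordinates
    return f * Xc[:, :2] / Xc[:, 2:3]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5, 3)) + np.array([0.0, 0.0, 5.0])  # rigid scene
R = np.eye(3)
t1 = np.zeros(3)                          # first camera at the origin
t2 = np.array([-0.5, 0.0, 0.0])           # second camera moved sideways
s = 3.0                                   # any positive scale factor

# Scaling the scene and the camera translation together changes neither image.
same_view1 = np.allclose(project(R, t1, X), project(R, s * t1, s * X))
same_view2 = np.allclose(project(R, t2, X), project(R, s * t2, s * X))
```

In practice the scale is fixed by one known length, such as a known baseline or a measured object dimension.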
54 http://www1.cs.columbia.edu/jebara/htmlpapers/SFM/sfm.html (Jebara, Azarbayejani, Pentland)
- Two video frames with corresponding 2D interest points. 3D points can be computed with the SfM method.
- Some edges are detected from 2D gradients.
- Texture is mapped from the 2D frames onto a 3D polyhedral model.
- The 3D model can be viewed from arbitrary viewpoints!
55 Virtual museums? 3D TV?
- Much work, and software, from about 10 years ago.
- 3D models, including shape and texture, can be made of famous places (Notre Dame, Taj Mahal, Titanic, etc.) and made available to those who cannot travel to see the real landmark.
- Theoretically, only quality video is required.
- Usually, some handwork is needed.
56 Shape from Motion methods
- Typically require careful mathematics
- EX: from 5 matched points, get 10 equations to estimate 10 unknowns; there is also a more popular 8-point linear method
- Effects of noise mean many matches are needed, and there can still be large errors
- Methods can run in real time
- Rich literature, still evolving
57 Special mathematics
- Epipolar geometry is modeled
- Fundamental matrix: computed from a pair of cameras and point matches
- Essential matrix: specialization of the fundamental matrix when calibration is available
58 Epipolar constraint on view pair
A) The relative orientation of cameras C1 and C2 can be computed from many point matches.
B) 3D point positions (P) can also be computed from many point matches. The fundamental matrix represents the constraints.
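A sketch of the 8-point linear method mentioned above for estimating the fundamental matrix F, in its normalized form (Hartley's coordinate normalization, assumed here because it noticeably improves conditioning). It takes N >= 8 matched image points and returns F with x2^T F x1 = 0 in homogeneous coordinates. It is noise-free and non-robust; with real matches one would use many more points and a robust wrapper such as RANSAC. Function names are hypothetical.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: centroid to origin, mean distance sqrt(2).
    Returns homogeneous normalized points (N x 3) and the 3 x 3 transform T."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    return np.column_stack([pts, np.ones(len(pts))]) @ T.T, T

def eight_point(x1, x2):
    """Estimate F with [x2, 1] F [x1, 1]^T = 0 from N >= 8 matches (two N x 2 arrays)."""
    p1, T1 = normalize_points(np.asarray(x1, dtype=float))
    p2, T2 = normalize_points(np.asarray(x2, dtype=float))
    # Each match gives one linear equation in the 9 entries of F (row-major order).
    A = np.column_stack([p2[:, [0]] * p1, p2[:, [1]] * p1, p2[:, [2]] * p1])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # least-squares null vector of A
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt      # enforce rank 2 (F is singular)
    F = T2.T @ F @ T1                            # undo the normalization
    return F / np.linalg.norm(F)
```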
59 Revisit: internal parameters of the camera (5, 6, 7?)
- Properties of the actual camera, not its pose
- Actual focal length f
- Actual pixel size Sx, Sy
- Actual location Ix, Iy of the optical axis on the image array
- Can have skew Sk
- Can have radial distortion of the lens r
(Figure: sensor array and optical axis)
60 The 6 extrinsic/external parameters
- Define the pose of the camera in the world
- 3 rotation parameters relative to W
- 3 translation parameters
- Projection of world point WP to image point IP:
- IP = Mi Me WP
- where Me has 6 parameters and Mi has 5
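A sketch of IP = Mi Me WP in homogeneous coordinates. The layout of Mi is one common convention (assumed, not taken from the slides): f and the pixel sizes enter only as f/Sx and f/Sy, which together with Ix, Iy and the skew gives 5 parameters; radial distortion r is not modeled. Function names and the example numbers are illustrative.

```python
import numpy as np

def intrinsic_matrix(f, sx, sy, ix, iy, skew=0.0):
    """Mi from focal length f, pixel size (sx, sy), optical-axis pixel (ix, iy), skew.
    f, sx, sy appear only as f/sx and f/sy, so Mi has 5 free parameters."""
    return np.array([[f / sx, skew, ix],
                     [0.0, f / sy, iy],
                     [0.0, 0.0, 1.0]])

def extrinsic_matrix(R, t):
    """Me = [R | t] (3 x 4): 3 rotation + 3 translation parameters, world -> camera."""
    return np.hstack([np.asarray(R, dtype=float), np.reshape(t, (3, 1))])

def project(Mi, Me, wp):
    """IP = Mi Me WP in homogeneous coordinates, then divide by the third component."""
    ip = Mi @ Me @ np.append(np.asarray(wp, dtype=float), 1.0)
    return ip[:2] / ip[2]

Mi = intrinsic_matrix(f=8.0, sx=0.01, sy=0.01, ix=320.0, iy=240.0)  # f, sx, sy in mm
Me = extrinsic_matrix(np.eye(3), [0.0, 0.0, 0.0])                     # camera at world origin
```

A world point on the optical axis lands on pixel (Ix, Iy), which is a quick check of any calibration.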
61 Fundamental matrix F
- Represents the epipolar structure of 2 views of a scene
- Depends only on the internal parameters of the camera and the relative pose of the two views
- Not dependent on the scene
- Can compute F, E, and more from many correspondences; lots of literature and public software
- What actual mathematical methods? What point detection and point correspondence methods?
62 Summary of shape-from methods
- each uses a simple source of information and a math model that often uses minimal information
- Psychologist J.J. Gibson, and others, were aware of the information used by humans
- David Marr, around 1980, proposed the study of Type-I AI research:
- study the information processing problem
- identify what information is used
- develop/study algorithm choices
- favor algorithms suited to the human architecture
63 Recent years
- The trend is away from minimal models: minimal models are fragile
- Multiple channels cooperate and compete (see experiments by Ramachandran at UCSD)
- The human brain is more plastic than formerly believed: many things are learned, and new neurons and connections form