Title: CS 223-B Part A Lect. : Advanced Features
1CS 223-B Part A Lect. Advanced Features
- Sebastian Thrun
- Gary Bradski
http://robots.stanford.edu/cs223b/index.html
2Readings
- This lecture is in 2 separate parts: A -
Fourier, Gabor, SIFT and B - Texture and other
operators. B is optional due to time
limitations. Good to look through nevertheless.
- Read
- Computer Vision, Forsyth and Ponce
- Chapters 7 and (optional for texture) 9, but do
it lightly, just for the gist.
- David G. Lowe, Distinctive Image Features from
Scale-Invariant Keypoints, IJCV04.
- Just read/take notes on basic flow of the
algorithm.
- W. Freeman and E. Adelson, The Design and Use of
Steerable Filters, IEEE Trans. Patt. Anal. and
Machine Intell., Vol. 13, No. 9.
- Read pages 1-15.
3Left over questions
- Calibration question: the optimization is based
on gradient descent iterations, which depend on
finding a good initial starting guess.
- How do we scale image derivatives? Great
question.
- Images exist as brightness values over pixels.
What are the units, then, of a simple derivative
operator like [-1 0 1]?
In the features lecture, we only wanted to find
edges (identification), but what if we
had instead wanted to make measurements?
In optical flow, we end up wanting to
calculate the velocity v, which is found (in the
optical flow lecture) to be equal to It, the
temporal derivative (image difference I(t+1) -
I(t), in brightness), divided by the spatial
derivative Ix, in brightness/pixel:
vx [pixels] = It / Ix = brightness/(brightness/pixel).
Oops! [-1 0 1] spans two pixels, so our
derivative is a factor of 2 too
great => NEED TO NORMALIZE: Ix = [-1/2 0
1/2].
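The factor of 2 can be checked numerically. This is a minimal sketch, assuming a hypothetical brightness ramp whose true slope is 3 brightness units per pixel; the helper name `derivative` is introduced here for illustration only.

```python
import numpy as np

def derivative(signal, kernel):
    """Slide a 3-tap derivative kernel over a 1D signal (interior samples only)."""
    # np.correlate does not flip the kernel, so [-1 0 1] computes
    # I(x+1) - I(x-1), matching the notation on the slide.
    return np.correlate(signal, kernel, mode="valid")

# Hypothetical test signal: brightness ramp with slope 3 per pixel.
x = np.arange(10, dtype=float)
ramp = 3.0 * x

raw = derivative(ramp, np.array([-1.0, 0.0, 1.0]))         # difference over 2 pixels
normalized = derivative(ramp, np.array([-0.5, 0.0, 0.5]))  # brightness per pixel
```

The unnormalized kernel reports twice the true slope because it differences samples two pixels apart; halving the taps restores units of brightness per pixel.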
4Good Features beat Good Algorithms
- For tasks such as recognition, tracking, and
segmentation, experience shows:
- With the right features, all algorithms will
work well.
- With the wrong features, good algorithms will
work marginally better than bad/simple
algorithms, but they won't work well.
5Fourier Transform 1
- Foundational trick: represent signal/data in
terms of an orthogonal basis. For example, a
vector v in 3-space can be represented as a
projection onto 3 orthonormal vectors.
- In the same way, a function can be represented as
a point projected into a space of (infinitely
many) orthogonal functions. For Fourier
transforms, we project a function into a space of
cos and sin.
- Intuitively, how do we know this sin, cos basis
is orthogonal?
- Sin or Cos periodically spend as much time above
as below the axis. If the frequency is
mismatched, the functions will cancel each other
out over minus to plus infinity.
- Formally, one could use
- To prove
Eqns from Computer Vision IT412
6Fourier Transform 2
Fourier transform is defined as continuous
Inverse transform gets rid of freq. components
In general, Fourier transform is complex
The Fourier Spectrum is then
The Phase is then
We often view the Power Spectrum
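The equations for this slide were not preserved in the text. As a hedged reconstruction, the block below uses one common convention (the placement of the 2π factor is an assumption and may differ from the original slide), with R(u) and I(u) denoting the real and imaginary parts of F(u).

```latex
\begin{align}
F(u) &= \int_{-\infty}^{\infty} f(x)\, e^{-i 2\pi u x}\, dx
  && \text{(Fourier transform)} \\
f(x) &= \int_{-\infty}^{\infty} F(u)\, e^{i 2\pi u x}\, du
  && \text{(inverse transform)} \\
F(u) &= R(u) + i\, I(u)
  && \text{(complex in general)} \\
|F(u)| &= \sqrt{R^2(u) + I^2(u)}
  && \text{(Fourier spectrum)} \\
\phi(u) &= \tan^{-1}\!\left(\frac{I(u)}{R(u)}\right)
  && \text{(phase)} \\
P(u) &= |F(u)|^2 = R^2(u) + I^2(u)
  && \text{(power spectrum)}
\end{align}
```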
7Fourier Properties
Fourier Transform
Is linear
Its spatial scale is inverse to frequency
Shift goes to phase change
Fourier Transform Symmetries are
Is the complex conjugate
Convolution Property
Note that scale property implies delta function
goes to uniform
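The properties listed on this slide can be checked numerically on the discrete transform. The sketch below assumes NumPy's FFT conventions and random test signals; in the discrete setting, shifts and convolutions are circular.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
f = rng.standard_normal(N)
g = rng.standard_normal(N)
F, G = np.fft.fft(f), np.fft.fft(g)
idx = np.arange(N)

# Linearity: FT(a f + b g) = a F + b G
lin_lhs = np.fft.fft(2.0 * f + 3.0 * g)
lin_rhs = 2.0 * F + 3.0 * G

# Shift goes to phase change: FT(f[n - s]) = F[k] exp(-i 2 pi k s / N)
s = 5
shift_lhs = np.fft.fft(np.roll(f, s))
shift_rhs = F * np.exp(-2j * np.pi * idx * s / N)

# Symmetry for a real signal: F[-k] is the complex conjugate of F[k]
sym_lhs = F[(-idx) % N]
sym_rhs = np.conj(F)

# Convolution property: FT(f circularly convolved with g) = F G
circ_conv = np.array([np.sum(f * g[(n - idx) % N]) for n in range(N)])
conv_lhs = np.fft.fft(circ_conv)
conv_rhs = F * G

# Delta function goes to uniform
delta = np.zeros(N)
delta[0] = 1.0
delta_spectrum = np.fft.fft(delta)
```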
8Fourier Discrete (DFT)
- Animals and Machines live in a discrete world.
To move the continuous Fourier world to its
discrete version, we sample:
- => Multiply (in space) by an infinite series of
delta functions spaced apart
- => Convolve (in frequency) with a uniform series
of delta functions, inversely spaced
9Fourier Discrete (DFT) 2
All real world signals are band limited. That
is, they don't have infinite frequencies nor
infinite spatial extent. This is good, otherwise
our discrete Fourier copies would collide and
alias together. But what if we still sample too
seldom? Even band-limited copies will eventually
collide.
How do we keep the copies apart? Sample at
least twice the signal's band limit frequency =>
Nyquist Criterion
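Aliasing is easy to reproduce. This is a small sketch, assuming a hypothetical 7 Hz sinusoid sampled for one second at two different rates; the function name `dominant_frequency` is illustrative only.

```python
import numpy as np

def dominant_frequency(signal_hz, sample_rate_hz, duration_s=1.0):
    """Sample a sinusoid and return the frequency (Hz) of its largest DFT peak."""
    n = int(sample_rate_hz * duration_s)
    t = np.arange(n) / sample_rate_hz
    samples = np.sin(2 * np.pi * signal_hz * t)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    return freqs[np.argmax(spectrum)]

# 20 Hz sampling exceeds twice the 7 Hz signal frequency: the peak is at 7 Hz.
well_sampled = dominant_frequency(7.0, 20.0)

# 10 Hz sampling is below 14 Hz: the spectral copies overlap and the
# sinusoid shows up at |7 - 10| = 3 Hz instead.
aliased = dominant_frequency(7.0, 10.0)
```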
102D DFT
Discrete Fourier Transform (DFT)
Inverse DFT
Optimally implemented on serial machines via the
Fast Fourier Transform (FFT); the direct DFT can be
faster on parallel machines.
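Since the slide's equations were not preserved, here is a hedged sketch of the direct 2D DFT written as two matrix products, checked against NumPy's FFT. It assumes NumPy's normalization (no scaling on the forward transform, 1/(MN) on the inverse); other texts split the factor differently.

```python
import numpy as np

def dft_matrix(n):
    """n x n matrix W with W[k, x] = exp(-i 2 pi k x / n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def dft2(image):
    """Direct 2D DFT: transform the columns, then the rows."""
    m, n = image.shape
    return dft_matrix(m) @ image @ dft_matrix(n)

def idft2(spectrum):
    """Inverse 2D DFT using the conjugate matrices and a 1/(MN) factor."""
    m, n = spectrum.shape
    return np.conj(dft_matrix(m)) @ spectrum @ np.conj(dft_matrix(n)) / (m * n)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 6))
spectrum = dft2(image)
```

The matrix form costs O(N^2) per 1D transform versus O(N log N) for the FFT, which is why the FFT is preferred on serial hardware.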
11Fourier Examples
Raw Image
Fourier Amplitude
Sinusoid, higher frequency
DC term side lobes wide spacing
Sinusoid, lower frequency
DC term side lobes close spacing
Sinusoid, tilted
Tilted spectrum
Images from Steve Lehar, http://cns-alumni.bu.edu/~slehar,
An Intuitive Explanation of Fourier Theory
12More Fourier Examples
- Fourier basis element
- example, real part
- F_{u,v}(x,y)
- F_{u,v}(x,y) = const. for (ux + vy) = const.
- Vector (u,v)
- Magnitude gives frequency
- Direction gives orientation.
Slides from Marc Pollefeys, Comp 256 lecture 7
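A basis element can be generated and inspected directly. The sketch below assumes integer frequencies (u, v) on a square grid and the hypothetical helper name `fourier_basis`; it shows that the spectrum of the real part is concentrated at (u, v) and its mirror image.

```python
import numpy as np

def fourier_basis(u, v, size):
    """Real part of the DFT basis element with integer frequencies (u, v)."""
    y, x = np.mgrid[0:size, 0:size]
    return np.cos(2 * np.pi * (u * x + v * y) / size)

size = 32
u, v = 3, 4
basis = fourier_basis(u, v, size)

# The element is constant wherever u*x + v*y is constant,
# e.g. (x=4, y=0) and (x=0, y=3) both give 12.
same_phase = (basis[0, 4], basis[3, 0])

# The amplitude spectrum has peaks only at (v, u) and its mirror.
# Rows index y (frequency v), columns index x (frequency u).
spectrum = np.abs(np.fft.fft2(basis))
peak = tuple(int(i) for i in np.unravel_index(np.argmax(spectrum), spectrum.shape))

frequency = np.hypot(u, v)                   # cycles per image width
orientation = np.degrees(np.arctan2(v, u))   # direction of the (u, v) vector
```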
13More Fourier Examples
Here u and v are larger than in the previous
slide.
Slides from Marc Pollefeys, Comp 256 lecture 7
14More Fourier Examples
And larger still...
Slides from Marc Pollefeys, Comp 256 lecture 7
15Fourier Filtering
Multiply by a filter in the frequency domain =>
convolve with the filter in the spatial domain.
Fourier Amplitude
Images from Steve Lehar, http://cns-alumni.bu.edu/~slehar,
An Intuitive Explanation of Fourier Theory
16Fourier Lens
Remember that Fourier transform takes delta
functions to uniform, and uniform to delta?
Figures from Steve Lehar, http://cns-alumni.bu.edu/~slehar,
An Intuitive Explanation of Fourier Theory
17Phase Carries More Information
Raw Images
Reconstruct (inverse FFT) mixing the magnitude
and phase images
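The mixing experiment can be sketched as follows. This is an illustration only: two synthetic noise arrays stand in for the slide's images, and the function names are assumptions introduced here. Even with noise, the hybrid correlates with the image that supplied the phase, not the one that supplied the magnitude.

```python
import numpy as np

def mix_magnitude_and_phase(magnitude_source, phase_source):
    """Inverse FFT of one image's Fourier magnitude with another image's phase."""
    magnitude = np.abs(np.fft.fft2(magnitude_source))
    phase = np.angle(np.fft.fft2(phase_source))
    mixed = np.fft.ifft2(magnitude * np.exp(1j * phase))
    # Both inputs are real, so the imaginary part is only numerical noise.
    return np.real(mixed)

def correlation(a, b):
    """Pearson correlation between two images, flattened to vectors."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

rng = np.random.default_rng(0)
image_a = rng.standard_normal((64, 64))   # hypothetical stand-in image
image_b = rng.standard_normal((64, 64))   # hypothetical stand-in image

hybrid = mix_magnitude_and_phase(image_a, image_b)
corr_with_phase_source = correlation(hybrid, image_b)
corr_with_magnitude_source = correlation(hybrid, image_a)
```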
18Phase Coherence for Feature Detection?
Note that the Fourier components for a square
wave cohere (are in phase) at the step junction.
Here, they must all pass through zero right at
the step edge, and achieve local maxima at the
corners.
Phase coherence is maximal at corner points of
triangle and trapezoid waves too.
Images Peter Kovesi, Proc. VIIth Digital Image
Computing Techniques and Applications, Sun C.,
Talbot H., Ourselin S. and Adriaansen T. (Eds.),
10-12 Dec. 2003, Sydney
19Phase Coherence for Feature Detection
Gist of the idea: the Fourier transform yields a
series of real and imaginary sinusoidal terms. At
any point x, the local Fourier components will
each have an amplitude An(x) and a phase angle
φn(x). Vector addition of these terms yields a
vector E(x) at the average phase angle.
Morrone defined a measure that at absolute phase
coherence will be 1 (everything points in the
same direction) and for no phase coherence will
be zero. Local maxima indicate edges and
corners, insensitive to contrast in the image.
In practice, these local components are
calculated with Gabor filters at
several orientations that can yield oriented
edges and corners.
Images Peter Kovesi, Proc. VIIth Digital Image
Computing Techniques and Applications, Sun C.,
Talbot H., Ourselin S. and Adriaansen T. (Eds.),
10-12 Dec. 2003, Sydney
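The measure described above can be written as |E(x)| divided by the sum of the amplitudes An(x). The sketch below evaluates it at a single point from given amplitudes and phases; the function name and the small `eps` guard against division by zero are assumptions added here, and a real detector would obtain the components from a filter bank.

```python
import numpy as np

def phase_congruency(amplitudes, phases, eps=1e-12):
    """|sum_n A_n exp(i phi_n)| / sum_n A_n at one point; lies in [0, 1]."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    phases = np.asarray(phases, dtype=float)
    energy = np.abs(np.sum(amplitudes * np.exp(1j * phases)))
    return energy / (np.sum(amplitudes) + eps)

# All components in phase: every vector points the same way.
aligned = phase_congruency([1.0, 0.5, 0.25], [0.3, 0.3, 0.3])

# Phases spread evenly around the circle: the vectors cancel.
spread = phase_congruency([1.0, 1.0, 1.0, 1.0],
                          [0.0, np.pi / 2, np.pi, 3 * np.pi / 2])

# Scaling all amplitudes (a contrast change) leaves the measure unchanged.
high_contrast = phase_congruency([10.0, 5.0, 2.5], [0.3, 0.3, 0.3])
```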
20Phase Coherence for Feature Detection
Comparison of phase vs. Harris Corner detector.
Harris response varies by 2 or more orders of
magnitude: where to threshold? Phase can only vary
between 0 and 1 and is not sensitive to contrast or
lighting.
Images Peter Kovesi, Proc. VIIth Digital Image
Computing Techniques and Applications, Sun C.,
Talbot H., Ourselin S. and Adriaansen T. (Eds.),
10-12 Dec. 2003, Sydney
21Gabor filters and Jets
- Global information is used for physical systems
identification.
- Impulse response of a centrifuge to identify
resonance points, which indicate which spin
frequencies to avoid.
- Local information is used for physical signal
analysis.
- In images, it is the relationship of details that
matters, not (usually) things like average
brightness.
- In 1946, Gabor suggested representing signals
over space and time, called Information diagrams.
He showed that a Gaussian occupies minimal area
in such diagrams. Time and Frequency analysis
are the two extremes of such an analysis.
22Gabor filters and Jets
- Gabor filters are formed by modulating a complex
sinusoid by a Gaussian function.
- Gabor filters became popular in vision partly
because J.G. Daugman (1980, 88, 90) showed that
the receptive fields of most orientation
receptive neurons in the (cat's) brain looked
very much like Gabor functions.
- As with Gabor filters, the brain often makes use
of over-complete, non-orthogonal functions.
J.G. Daugman, Two dimensional spectral analysis
of cortical receptive field profiles, Vision
Res., vol. 20, pp. 847-856, 1980.
J. Daugman, Complete discrete 2-d gabor
transforms by neural network for image analysis
and compression, IEEE Transactions on Acoustics,
Speech, and Signal Processing, vol. 36, no. 7,
pp. 1169-1179, 1988.
Daugman, J.G. (1990) An information-theoretic
view of analogue representation in striate
cortex, Computational Neuroscience, Ed. Schwartz,
E. L., Cambridge, MA: MIT Press, 403-424.
23Gabor filters and Jets
Rotated Gaussian
Oriented Complex Sinusoid
2D Gabor filter
Depending on one's task (object ID, texture
analysis, tracking, ...) one must then decide what
size filters, in what orientations and what
frequencies to use.
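The three pieces named on this slide (rotated Gaussian, oriented complex sinusoid, their product) can be sketched as below. The parameter names `sigma_x`, `sigma_y`, `frequency` and `theta` are assumptions chosen for illustration, since the slide's own equations were not preserved; the filter is centered at the origin and `size` is assumed odd.

```python
import numpy as np

def gabor_kernel(size, sigma_x, sigma_y, frequency, theta):
    """Complex 2D Gabor: rotated Gaussian envelope times an oriented complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so x_r runs along the filter orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 / (2.0 * sigma_x**2) + y_r**2 / (2.0 * sigma_y**2)))
    # Complex sinusoid oscillating along x_r; frequency is in cycles per pixel.
    carrier = np.exp(2j * np.pi * frequency * x_r)
    return envelope * carrier

kernel = gabor_kernel(size=31, sigma_x=5.0, sigma_y=8.0,
                      frequency=0.1, theta=np.pi / 6)
```

The real part is an even (cosine-phase) filter and the imaginary part an odd (sine-phase) filter. Choosing the envelope widths, orientations and frequencies is the task-dependent decision the slide refers to.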
24Gabor filters and Jets
In practice, once the scales, orientation and
radial frequencies are chosen, one usually sets
up filters in quadrature (90° phase shift) pairs
and just empirically normalizes them such that
the response is zero to a uniform background.
Quadrature pairs: in practice the center point
(p,q) is set to (0,0).
The magnitude response is then calculated as
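A common form of the magnitude response is the square root of the summed squares of the even and odd filter outputs. The sketch below assumes that form, an isotropic envelope, and mean subtraction as the empirical normalization; the function names are illustrative, and responses are evaluated at a single patch center instead of by full convolution.

```python
import numpy as np

def quadrature_pair(size, sigma, frequency, theta):
    """Even (cosine) and odd (sine) Gabor filters centered at (0, 0)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    along = x * np.cos(theta) + y * np.sin(theta)
    even = envelope * np.cos(2 * np.pi * frequency * along)
    odd = envelope * np.sin(2 * np.pi * frequency * along)
    # Empirical normalization: remove the mean so that a uniform
    # background produces zero response.
    even -= even.mean()
    odd -= odd.mean()
    return even, odd

def magnitude_response(patch, even, odd):
    """sqrt(even_response^2 + odd_response^2) for a patch the size of the filters."""
    return np.hypot(np.sum(patch * even), np.sum(patch * odd))

half = 15
yy, xx = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
even, odd = quadrature_pair(size=31, sigma=5.0, frequency=0.1, theta=0.0)

uniform = np.full((31, 31), 7.0)
wave_a = np.cos(2 * np.pi * 0.1 * xx)               # grating in cosine phase
wave_b = np.cos(2 * np.pi * 0.1 * xx + np.pi / 2)   # same grating shifted 90 degrees

resp_uniform = magnitude_response(uniform, even, odd)
resp_a = magnitude_response(wave_a, even, odd)
resp_b = magnitude_response(wave_b, even, odd)
```

The two gratings differ only in phase, and the quadrature magnitude is nearly identical for both, which is the point of using pairs.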
25Gabor filters and Jets
Von der Malsburg organized Gabor filters at
multiple scales and orientations in a vector, or
Jet.
A graph of such Jets (Elastic Graph Matching)
has proven to be a good primitive for object
recognition.
L. Wiskott, J.-M. Fellous, N. Krüger, C. von der Malsburg,
Face Recognition by Elastic Bunch Graph
Matching, IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 19(7), July 1997,
pp. 775-779.
Image from Laurenz Wiskott,
http://itb.biologie.hu-berlin.de/~wiskott/
26Gabor filters and Jets Example
Gabor Filters used
Gang Song, Tao Wang, Yimin Zhang, Wei Hu,
Guangyou Xu, Gary Bradski, Face Modeling and
Recognition Using Bayesian Networks, Submitted
to CVPR 2004
27Scale
- 3D to 2D Perspective projections give widely
varying scale for the same object. Computer
vision needs to address scale.
- Gabor discussion above addressed image scale via
the sigma of the modulating Gaussians and the
frequency of the complex sinusoid.
- We can directly deal with scale by repeatedly
down-sampling the image to look for coarser and
coarser patterns. We call this scale space, or
Image Pyramids
28Image Pyramids
Commonly, we down-sample by 2 or sqrt(2). Sqrt(2)
obviously calls for inter-pixel interpolation.
For down-sample by 2, typical Gaussian sigma is
1.4. For sqrt(2), sigma is typically
sqrt(1.4).
A full power-of-2 pyramid adds only about one third
more pixels to process (1 + 1/4 + 1/16 + ... = 4/3);
a sqrt(2) pyramid doubles the count.
Laplacian Pyramid Error Pyramid
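A minimal pyramid sketch, assuming down-sampling by 2 with sigma = 1.4 as stated above. Nearest-neighbor upsampling is a simplification chosen here for brevity (smoother interpolation is more usual), and all function names are illustrative. Each Laplacian level stores the error between a Gaussian level and the upsampled next-coarser level, so the original can be rebuilt exactly.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()

    def blur_1d(row):
        padded = np.pad(row, radius, mode="edge")
        return np.convolve(padded, kernel, mode="valid")

    blurred = np.apply_along_axis(blur_1d, 1, image)
    return np.apply_along_axis(blur_1d, 0, blurred)

def upsample(image, shape):
    """Nearest-neighbor upsampling by 2, cropped to the target shape."""
    up = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def build_pyramids(image, levels, sigma=1.4):
    """Return (Gaussian pyramid, Laplacian/error pyramid) as lists of arrays."""
    gaussian = [image.astype(float)]
    for _ in range(levels - 1):
        gaussian.append(gaussian_blur(gaussian[-1], sigma)[::2, ::2])
    laplacian = [fine - upsample(coarse, fine.shape)
                 for fine, coarse in zip(gaussian[:-1], gaussian[1:])]
    laplacian.append(gaussian[-1])   # keep the coarsest level as the base
    return gaussian, laplacian

def reconstruct(laplacian):
    """Rebuild the original image by adding the error levels back in."""
    image = laplacian[-1]
    for residual in reversed(laplacian[:-1]):
        image = upsample(image, residual.shape) + residual
    return image

rng = np.random.default_rng(0)
image = rng.random((64, 64))
gaussian, laplacian = build_pyramids(image, levels=4)
pixel_ratio = sum(level.size for level in gaussian) / image.size
```

For a power-of-2 pyramid the total pixel count stays below 4/3 of the original image.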
29Steerability
Bill Freeman, in his 1992 thesis, determined the
necessary conditions for Steerability: the
ability to synthesize a filter of any orientation
from a linear combination of filters at fixed
orientations. The simplest example of this is
oriented first derivative of Gaussian filters, at
0° and 90°.
Steering Eqn
0°
90°
Synthesized 30°
Filter Set
Response
Taken from W. Freeman, T. Adelson, The Design
and Use of Steerable Filters, IEEE Trans.
Patt. Anal. and Machine Intell., vol 13, 9, pp
891-900, Sept 1991
Raw Image
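For the first derivative of a Gaussian, the steering equation from Freeman and Adelson is G1(θ) = cos(θ) G1(0°) + sin(θ) G1(90°). The sketch below checks this numerically for θ = 30°; the grid size, sigma, and function names are assumptions made for illustration.

```python
import numpy as np

def _grid(size):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return x, y

def gaussian_derivative_basis(size, sigma):
    """First derivative of Gaussian filters at 0 degrees (x) and 90 degrees (y)."""
    x, y = _grid(size)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return -x / sigma**2 * g, -y / sigma**2 * g

def steer(g0, g90, theta):
    """Synthesize the filter at angle theta from the two basis filters."""
    return np.cos(theta) * g0 + np.sin(theta) * g90

def direct_derivative(size, sigma, theta):
    """The same filter built directly as a derivative along direction theta."""
    x, y = _grid(size)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    along = x * np.cos(theta) + y * np.sin(theta)
    return -along / sigma**2 * g

theta = np.deg2rad(30.0)
g0, g90 = gaussian_derivative_basis(size=21, sigma=3.0)
steered = steer(g0, g90, theta)
direct = direct_derivative(size=21, sigma=3.0, theta=theta)

# Because filtering is linear, responses steer the same way as the filters.
rng = np.random.default_rng(0)
patch = rng.standard_normal((21, 21))
steered_response = np.cos(theta) * np.sum(patch * g0) + np.sin(theta) * np.sum(patch * g90)
direct_response = np.sum(patch * direct)
```

In practice this means an image needs to be filtered only with the basis set; the response at any other orientation is a weighted sum of those outputs.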
30Steerability
Freeman showed that any band limited signal could
form a steerable basis with as many bases as it
had non-zero Fourier coefficients. An important
example is the 2nd derivative of Gaussian
(Laplacian)
Taken from W. Freeman, T. Adelson, The Design
and Use of Steerable Filters, IEEE Trans. Patt.
Anal. and Machine Intell., vol 13, 9, pp
891-900, Sept 1991
31Steerable Pyramid
We may combine Steerability with Pyramids to get
a Steerable Laplacian Pyramid as shown below
Decomposition
Reconstruction
High pass, since band pass in pyramid low pass at
bottom.
Oriented
Low Pass
2 Level decomposition of white circle example
Images from http://www.cis.upenn.edu/~eero/steerpyr.html
32Scale Invariant Feature Transform
- Idea is to find local features that stay the same
(as much as possible) under:
- Scale change
- 2D rotation in the image x,y plane
- 3D rotation (affine variation)
- Illumination
- Collections of such features can be used for
reliable:
- 3D object recognition
- User interface, toy interface
- Robot localization, navigation and mapping
- Digital image stitching, organization
- 3D scene understanding
33Scale Invariant Feature Transform
- High Level Algorithm
- Find peak responses (over scale) in Laplacian
pyramid.
- Find response with sub-pixel accuracy.
- Only keep corner-like responses
- Assign orientation
- Create recognition signature
- Solve affine parameters (3D rot. changes)
34Scale Invariant Feature Transform
From Gaussian scale pyramid -- create
Difference of Gaussian (DOG) images
And find maximum response over space and scale
Images from David G. Lowe, Object recognition
from local scale-invariant features,
International Conference on Computer Vision,
Corfu, Greece (September 1999), pp. 1150-1157
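The DoG search can be sketched on a synthetic blob. This is a simplified illustration, not Lowe's implementation: it works at a single image resolution (no octaves), takes the single strongest response over space and scale (no neighborhood test for local extrema), and the scale schedule (`sigma0`, `k`, `n_scales`) is an assumption.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()

    def blur_1d(row):
        padded = np.pad(row, radius, mode="edge")
        return np.convolve(padded, kernel, mode="valid")

    blurred = np.apply_along_axis(blur_1d, 1, image)
    return np.apply_along_axis(blur_1d, 0, blurred)

def dog_stack(image, sigma0=1.0, k=np.sqrt(2.0), n_scales=7):
    """Blur at geometrically spaced sigmas and difference adjacent levels."""
    sigmas = [sigma0 * k**i for i in range(n_scales)]
    blurred = [gaussian_blur(image, s) for s in sigmas]
    dogs = [coarse - fine for fine, coarse in zip(blurred[:-1], blurred[1:])]
    return np.stack(dogs), sigmas

# Hypothetical test image: one Gaussian blob in the middle of the frame.
size = 65
center = size // 2
y, x = np.mgrid[0:size, 0:size]
blob = np.exp(-((x - center)**2 + (y - center)**2) / (2.0 * 3.4**2))

stack, sigmas = dog_stack(blob)
scale_idx, row, col = np.unravel_index(np.argmax(np.abs(stack)), stack.shape)
```

The strongest response lands on the blob center, at an intermediate scale related to the blob's size, which is what makes the detected scale usable for scale invariance.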
35Scale Invariant Feature Transform
At the location and scale of each peak found, find
the gradient orientation.
Use the gradients to keep only corner-like
peaks, in a manner similar to the Harris corner
detector.
At each peak location and scale, use gradients to
form slip-tolerant orientation histogram
recognition keys.
Images from David G. Lowe, Object recognition
from local scale-invariant features,
International Conference on Computer Vision,
Corfu, Greece (September 1999), pp. 1150-1157
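A gradient orientation histogram can be sketched as follows. The 36-bin (10° per bin) layout and the magnitude weighting are assumptions made here for illustration, and the sketch omits refinements such as Gaussian weighting of the window. Angles are measured in image coordinates (x to the right, y down).

```python
import numpy as np

def orientation_histogram(patch, n_bins=36):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))   # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    bin_width = 360.0 / n_bins
    bins = np.floor(angle / bin_width).astype(int) % n_bins
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)

def dominant_orientation(patch, n_bins=36):
    """Center (in degrees) of the histogram bin with the most gradient energy."""
    hist = orientation_histogram(patch, n_bins)
    return (np.argmax(hist) + 0.5) * (360.0 / n_bins)

# Hypothetical patch: brightness increasing equally along x and y,
# so every gradient points at 45 degrees.
y, x = np.mgrid[0:16, 0:16]
ramp = (x + y).astype(float)
angle = dominant_orientation(ramp)
```

Describing the neighborhood relative to this dominant angle is what gives the key its rotation invariance; the coarse bins give tolerance to small shifts.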
36Scale Invariant Feature Transform
To account for out of image plane (3D) rotation,
solve for affine distortion parameters.
For the features found, set up a system of equations,
which takes the form of a linear system. The
over-determined (least squares) solution is then
Eqns from David G. Lowe, Object recognition
from local scale-invariant features,
International Conference on Computer Vision,
Corfu, Greece (September 1999), pp. 1150-1157
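The equations on this slide were not preserved. As a hedged sketch of the least-squares step, the code below assumes the affine model [u v]' = M [x y]' + t, stacks two rows per matched feature into a system A p = b with p = (m1, m2, m3, m4, tx, ty), and solves it with NumPy's least-squares routine. The function name and test data are illustrative.

```python
import numpy as np

def fit_affine(model_xy, image_uv):
    """Least-squares affine fit: image_uv ~ model_xy @ M.T + t."""
    model_xy = np.asarray(model_xy, dtype=float)
    image_uv = np.asarray(image_uv, dtype=float)
    n = len(model_xy)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = model_xy   # rows for u: [x y 0 0 1 0]
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_xy   # rows for v: [0 0 x y 0 1]
    A[1::2, 5] = 1.0
    b = image_uv.reshape(-1)  # [u1, v1, u2, v2, ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params[:4].reshape(2, 2), params[4:]

# Hypothetical ground truth and five matched features
# (at least three non-collinear matches are needed).
true_M = np.array([[0.9, -0.2], [0.3, 1.1]])
true_t = np.array([5.0, -3.0])
rng = np.random.default_rng(0)
model_points = rng.uniform(0.0, 100.0, size=(5, 2))
image_points = model_points @ true_M.T + true_t

M, t = fit_affine(model_points, image_points)
```

With more than three matches the system is over-determined, so outlier matches show up as large residuals against the fitted parameters.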
37Scale Invariant Feature Transform
Recognition example. Learned models of SIFT
features, and got object outline from background
subtraction
Objects may then be found under occlusion and 3D
rotation
Images from David Lowe, Object Recognition from
Local Scale-Invariant Features Proc. of the
International Conference on Computer Vision,Corfu
(Sept. 1999)
38Scale Invariant Feature Transform
Image stitching example. Attach images together
from keypoints
Solving the homography
Finding similar images in a roll and stitching
Images from M. Brown and D. G. Lowe. Recognising
Panoramas. In Proceedings of the 9th
International Conference on Computer Vision
(ICCV2003)
39Scale Invariant Feature Transform
Localizing Example
Find different views of same scene in video (2)
Given key images, find and trigger on them (1)
2) Josef Sivic and Andrew Zisserman, Video
Google: A Text Retrieval Approach to Object
Matching in Videos, ICCV 2003
1) David G. Lowe, Distinctive Image Features from
Scale-Invariant Keypoints, Submitted to
International Journal of Computer Vision. Version
date June 2003
40Log-Polar Transform
Go from Euclidean (x,y) to log-polar space:
log(r e^{iθ}) => (log r, θ) space.
Log-polar transform is always done relative to a
chosen center point (xc,yc)
- Images, further advances in George Wolberg,
Siavash Zokai, ROBUST IMAGE REGISTRATION USING
LOG-POLAR TRANSFORM, ICIP 2000
Rotation and scale are converted to shifts along
the θ or log r axis. Shifting back to a
canonical location gives rotation and scale
invariance. If used on a Fourier image
(translation invariant), we get rotation, scale
and translation invariance (called Fourier-Mellin
transform) [1].
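The shift property can be verified on point coordinates. This sketch maps points (not a resampled image) into (log r, θ) about a chosen center; the function names and sample points are assumptions for illustration.

```python
import numpy as np

def to_log_polar(points, center):
    """Map (x, y) points to (log r, theta) relative to the center point."""
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])
    return np.column_stack([np.log(r), theta])

def scale_and_rotate(points, center, scale, angle):
    """Scale and rotate points about the center point."""
    c = np.asarray(center, dtype=float)
    d = np.asarray(points, dtype=float) - c
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle), np.cos(angle)]])
    return scale * (d @ rotation.T) + c

center = (10.0, 10.0)
points = np.array([[13.0, 11.0], [11.0, 12.0], [14.0, 10.5]])
moved = scale_and_rotate(points, center, scale=2.0, angle=0.4)

# Every point shifts by the same amount: log(scale) along the log r axis
# and the rotation angle along the theta axis.
shift = to_log_polar(moved, center) - to_log_polar(points, center)
```

The angle axis wraps around at ±π, so for image data the shift along θ is circular.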
41Bilateral Filtering
- We want smoothing that preserves edges. Typically
done via P. Perona and J. Malik anisotropic
diffusion. More clever is the Tomasi and
Manduchi approximation:
- Rather than just convolve with a Gaussian in
space,
- the convolution weights use a Gaussian in space
together with a Gaussian in gray level values.
C. Tomasi and R. Manduchi, "Bilateral Filtering
for Gray and Color Images", Proceedings of the
1998 IEEE International Conference on Computer
Vision, Bombay, India
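A brute-force sketch of the idea for a gray-level image follows. Each neighbor's weight is the product of a spatial Gaussian and a Gaussian on the gray-level difference, normalized by the sum of the weights. The parameter names and the noisy step-edge test image are assumptions for illustration; this is not an optimized implementation.

```python
import numpy as np

def bilateral_filter(image, sigma_space, sigma_range, radius):
    """Edge-preserving smoothing with spatial and gray-level Gaussian weights."""
    image = image.astype(float)
    h, w = image.shape
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros_like(image)
    weight_sum = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            spatial = np.exp(-(dx**2 + dy**2) / (2.0 * sigma_space**2))
            tonal = np.exp(-(shifted - image)**2 / (2.0 * sigma_range**2))
            weight = spatial * tonal
            out += weight * shifted
            weight_sum += weight
    return out / weight_sum

# Hypothetical test image: a step edge of height 100 with mild noise.
rng = np.random.default_rng(0)
step = np.zeros((16, 16))
step[:, 8:] = 100.0
noisy = step + rng.normal(0.0, 2.0, step.shape)

smoothed = bilateral_filter(noisy, sigma_space=2.0, sigma_range=10.0, radius=4)
```

Pixels across the edge differ by far more than `sigma_range`, so they get almost no weight: the noise inside each region is averaged away while the step stays sharp.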
42But Bio-Vision is more dynamic
- Artifacts of competitive edge/diffusion process
Neon Color Spreading Illusion
Best explanation is Grossberg and Mingolla: edge
detectors need to be shut off, performed by
competitive inhibition. When weaker edges meet
stronger, the weaker edge is suppressed, breaking
the dikes that hold back the diffusion process.
When the edges are disconnected, the illusion
goes away or is diminished (below).
Grossberg, S., Mingolla, E. (1985). Neural
Dynamics of Form Perception: Boundary Completion.
Psychol. Rev., 92, 173-211.
43Local vs. Global
Still, vision is a stranger thing than simple
processing
44Local vs. Global
Still, vision is a stranger thing than simple
processing
45Computer vision often misses the fact that vision
is an active sense
These lines are straight
Nothing is moving here