Title: Intelligent Vision Processor
1Intelligent Vision Processor
- John Morris
- Computer Science/Electrical Computer
Engineering,The University of Auckland
Iolanthe II rounds Channel Island
-Auckland-Tauranga Race, 2007
2Intelligent Vision Processor
- Applications
- ? Robot Navigation
- ? Collision avoidance autonomous vehicles
- ? Manoeuvring in dynamic environments
- ? Biometrics
- Face recognition
- ? Tracking individuals
- ? Films
- ? Markerless motion tracking
- ? Security
- ? Intelligent threat detection
- ? Civil Engineering
- ? Materials Science
- ? Archaeology
3Background
4Intelligent Vision
- Our vision system is extraordinary
- Capabilities currently exceed those of any single
processor - Our brains
- Operates on a very slow clock
- kHz region
- Massively parallel
- gt1010 neurons can compute in parallel
- Vision system (eyes) can exploit this parallelism
- 3 x 106 sensor elements (rods and cones) in
human retina
5Intelligent Vision
- Matching and recognition
- Artificial intelligence systems are currently not
in the race! - For example
- Face recognition
- We can recognize faces
- From varying angles
- Under extreme lighting conditions
- With or without glasses, beards, bandages,
makeup, etc - With skin tone changes, eg sunburn
- Games
- We can strike balls travelling at gt 100 km/h
- and
- Direct that ball with high precision
6Human vision
- Uses a relatively slow, but massively parallel
processor (our brains) - Able to perform tasks
- At speeds
- and
- With accuracy
- beyond capabilities of state-of-the-art
artificial systems
7Intelligent Artificial Vision
- High performance processor
- Too slow for high resolution (Mpixel) imagein
real time (30 frames per second) - Useful vision systems
- Must be able to
- Produce 3D scene models
- Update scene models quickly
- Immediate goal 20-30Hz to mimic human
capabilities - Long term goal gt30 Hz to provide enhanced
capabilities - Produce accurate scene models
8Intelligent Artificial Vision
- Use human brain as the fundamental model
- We know it works better than a conventional
processor! - We need
Artificial system Brain
Large numbers of (small) processing elements Neurons
Many parallel connections Nerves
9Human Vision Systems
- Higher order animals all use binocular vision
systems - Permits estimation of distance to an object
- Vital for many survival tasks
- Hunting
- Avoiding danger
- Fighting predators
- Distance (or depth) computed by triangulation
P
P
P
P-P is the disparity It increases as P comes
closer
10Human Vision Systems
- Higher order animals all use binocular vision
systems - Permits estimation of distance to an object
- Vital for many survival tasks
- Hunting
- Avoiding danger
- Fighting predators
- Distance (or depth) computed by triangulation
P
P
P
P-P is the disparity Increases as P comes
closer
11Artificial Vision
- Evolution took millions of years to optimize
vision - Dont ignore those lessons!
- Binocular vision works
- Verging optics
- Human eyes are known to swivel to fixate on an
object of interest
12Real vs Ideal Systems
- Real lenses distort images
- Distortion must be removed for high precision
work! - Easy
- but
- Conventional technique uses iterative solution
- Slow!
- Faster approach needed for real time work
Image of a rectangular gridwith a real lens
13Why Stereo?
- Range finders give depth information directly
- SONAR
- Simple
- Not very accurate (long l)
- Beam spread ? Low spatial resolution
- Lasers
- Precise
- Low divergence ? High spatial resolution
- Requires fairly sophisticated electronics
- Nothing too challenging in 2008
- Why use an indirect measurement when direct ones
are available?
14Why Stereo?
- Passive
- Suitable for dense environments
- Sensors do not interfere with each other
- Wide area coverage
- Multiple overlapping views obtainable without
interference - Wide area 3D data can be acquired at high rates
- 3D data aids unambiguous recognition
- 3rd dimension provides additional discrimination
- Textureless regions cause problems
- but
- Active illumination can resolve these
- Active patterns can use IR (invisible, eye-safe)
light
15Artificial Vision Challenges
16Artificial Vision - Challenges
- High processor power
- Match parallel capabilities of human brain
- Distortion removal
- Real lenses always show some distortion
- Depth accuracy
- Evolution learnt about verging optics millions of
years ago! - Efficient matching
- Good corresondence algorithms
17Artificial Vision
- Simple stereo systems are being produced
- Point Grey, etc
- All use canonical configuration
- Parallel axes, coplanar image planes
- Computationally simpler
- High performance processor doesnt have time to
deal with the extra computational complexity of
verging optics
Point Grey Research Trinocular vision system
18Artificial System Requirements
- Highly Parallel Computation
- Calculations are not complex
- but
- There are a lot of them in megapixel ( gt106 )
images! - High Resolution Images
- Depth is calculated from the disparity
- If its only a few pixels, then depth accuracy is
low - Basic equation (canonical configuration only!)
Baseline
Focal Length
Depth, z b f d p
Pixel size
Disparity
19Artificial System Requirements
- Depth resolution is critical!
- A cricket player can catch a 100mm ball
travelling at 100km/h - High Resolution Images Needed
- Disparities are large numbers of pixels
- Small depth variations can be measured
- but
- High resolution images increase the demand for
processing power!
Strange game played in former British
coloniesin which a batsmen defends 3 small
sticksin the centre of a large field against a
bowler whotries to knock them down!
20Artificial System Requirements
- Conventional processors do not have sufficient
processing power - but Moores Law says
- Wait 18 months and the power will have doubled
- but
- The changes that give you twice the poweralso
give your twice as many pixels in a rowand four
times as many in an image!
Specialized highly parallel hardwareis the only
solution!
21Processing Power Solution
22FPGA Hardware
- FPGA Field Programmable Gate Array
- Soft hardware
- Connections and logic functions are programmed
in much the same way as a conventional von Neuman
processor - Creating a new circuit is about as difficult as
writing a programme! - High order parallelism is easy
- Replicate the circuit n times
- As easy as writing a for loop!
23FPGA Hardware
- FPGA Field Programmable Gate Array
- Circuit is stored in static RAM cells
- Changed as easily as reloading a new program
24FPGA Hardware
- Why is programmability important?
- or
- Why not design a custom ASIC?
- Optical systems dont have the flexibility of a
human eye - Lenses fabricated from rigid materials
- Not possible to make a one system fits all
system - Optical configurations must be designed for each
application - Field of view
- Resolution required
- Physical constraints
-
- Processing hardware has to be adapted to the
optical configuration - If we design an ASIC, it will only work for one
application!!
25Correspondence or Matching
26Stereo Correspondence
Can you find all the matching points in these two
images?
Of course! Its easy!
The best computer matching algorithms get 5 or
more of the points completely wrong!
and take a long time to do it!Theyre not
candidates for real time systems!!
27Stereo Correspondence
- High performance matching algorithms are global
in nature - Optimize over large image regions using energy
minimization schemes - Global algorithms are inherently slow
- Iterate many times over small regions to find
optimal solutions
28Correspondence Algorithms
- Good matching performance, global, low speed
- Graph-cut, belief-propagation,
- High speed, simple, local, high parallelism,
lowest performance - Correlation
- High speed, moderate complexity, parallel, medium
performance
Dynamic programming algorithms
29Depth Accuracy
30Stereo Configuration
Points along these lineshave the same disparity
- Canonical configuration Two cameras with
parallel optical axes - Rays are drawn through each pixel in the image
- Ray intersections represent points imaged onto
the centre of each pixel
Depthresolution
- but
- To obtain depth information, a point must be seen
by both cameras, ie it must be in the Common
Field of View
31Stereo Camera Configuration
- Now, consider an object of extent, a
- To be completely measured, it must lie in the
Common Field of View - but
- place it as close to the camera as you can so
that you can obtain the best accuracy, say at D - Now increase b to increase the accuracy at D
- But you must increase D so that the object stays
within the CFoV! - Detailed analysis leads to an optimum value of b
? a
a
D
b
a
32Increasing the baseline
Increasing the baseline decreases performance!!
good matches
Images corridor set (ray-traced) Matching
algorithms P2P, SAD
Baseline, b
33Increasing the baseline
Examine the distribution of errors
Increasing the baseline decreases performance!!
Standard Deviation
Images corridor set (ray-traced) Matching
algorithms P2P, SAD
Baseline, b
34Increased Baseline ? Decreased Performance
- Statistical
- Higher disparity range
- increased probability of matching incorrectly -
youve simply got more choices! - Perspective
- Scene objects are not fronto-planar
- Angled to camera axes
- subtend different numbers of pixels in L and R
images - Scattering
- Perfect scattering (Lambertian) surface
assumption - OK at small angular differences
- increasing failure at higher angles
- Occlusions
- Number of hidden regions increases as angular
difference increases - increasing number of monocular points for
which there is no 3D information!
35Evolution
- Human eyes verge on an object to estimate its
distance, ie the eyes fix on the object in the
field of view
Configuration commonly used in stereo systems
Configuration discovered by evolution millions of
years ago
Note immediately that the CFoV is much larger!
36Look at the optical configuration!
- If we increase f, then Dmin returns to the
critical value!
Original f
Increase f
37Depth Accuracy - Verging axes, increased f
Now the depth accuracy has increased dramatically!
Note that at large f, the CFoV does not
extend very far!
38Summary
39Summary Real time stereo
- General data acquisition is
- Non contact
- Adaptable to many environments
- Passive
- Not susceptible to interference from other
sensors - Rapid
- Acquires complete scenes in each shot
- Imaging technology is well established
- Cost effective, robust, reliable
- 3D data enhances recognition
- Full capabilities of 2D imaging system
- Depth data
- With hardware acceleration
- 3D scene views available for
- ControlMonitoring
- in real time
- Rapid response ? rapid throughput
Host computer is free to process complex control
algorithms ? Intelligent Vision
Processing Systems which can mimic human vision
system capabilities!
40Our Solution
41System Architecture
FPGA
L Camera
SerialInterface Firewire/GigE/CameraLink
Line BuffersDistortion Removal Image Alignment
RCamera
PC
Host Higher orderInterpretation
Control Signals
CorrectedImages
Stereo Matching
Disparity? Depth
DepthMap
42Distortion removal
- Image of a rectangular grid from camera with
simple zoom lens - Lines should be straight!
- Store displacements of actual image from ideal
points in LUT - Removal algorithm
- For each ideal pixel position
- Get displacement to real image
- Calculate intensity of ideal pixel (bilinear
interpolation)
43Distortion Removal
- Fundamental Idea
- Calculation of undistorted pixel position
- Simple but slow
- Not suitable for real time
- but
- Its the same for every image!
- So, calculate once!
- Create a look up table containing ideal ? actual
displacements for each pixel
ud uud (1k2k4..)r2 r2 (uudvud)2
44Distortion Removal
- Creating the LUT
- One entry (dx,dy) per pixel
- For a 1 Mpixel image needs 8 Mpixels!
- Each entry is a float (dx,dy) requires 8 bytes
- However, distortion is a smooth curve
- Store one entry per n pixels
- Trials show that n64 is OK for severely
distorted image - LUT row contains 210 / 2 6 24 16 entries
- Total LUT is 256 entries
- Displacement for pixel j,k
- dujk (j mod 64) duj/64,k/64
- duj/64,k/64 is stored in LUT
- Simple, fast circuit
Since the algorithm runs along scan lines,this
multiplication is done by repeated addition
45Alignment correction
- In general, cameras will not be perfectly aligned
in canonical configuration - Also, may be using verging axes to improve depth
resolution - Calculate locations of epipolar lines once!
- Add displacements to LUT for distortion!
46Real time 3D data acquisition
- Real time stereo vision
- Implemented Gimelfarbs Symmetric Dynamic
Programming Stereo in FPGA hardware - Real time precise stereo vision
- Faster, smaller hardware circuit
- Real time 3D maps
- 1 depth accuracy with 2 scan line latency at 25
frames/se
System block diagram lens distortion
removal,misalignment correction and depth
calculator
Output is stream of depth values a 3D movie!
47Real time 3D data acquisition
- Possible Applications
- Collision avoidance for robots
- Recognition via 3D models
- Fast model acquisition
- Imaging technology not scanning!
- Recognition of humans without markers
- Tracking objects
- Recognizing orientation, alignment
- Process monitoring
- eg Resin flow in flexible (bag) moulds
- Motion capture robot training
System block diagram lens distortion
removal,misalignment correction and depth
calculator
Output is stream of depth values a 3D movie!
48FPGA Stereo System
Parallel Host Interface
FirewireCables
FPGA AlteraStratix
FirewirePhysical Layer ASIC
FirewireLink Layer ASIC
FPGA Prog Cable
49Summary
50Summary
- Challenges of Artificial Vision Systems
- Real-time Image processing requires compute
power! - Correspondence (Matching)
- Depth accuracy
- Evolution Lessons
- Emulate parallel processing capability of
humanbrain - Use verging optics
51Summary
- Our system
- FPGA front end processor
- Remove distortion
- Correct camera misalignment
- Stereo matching
- Using dynamic programming
- Latency
- Several scan lines (?1 millisecond)
- Depends on lens distortion and camera alignment
- Host does not have to wait for a whole image!
- Depth (distance) maps in real-time
- 3D vision!
- Frees host processor for image interpretation
- Use both technologies (FPGA, conventional CPU)
where they perform best!
52Ongoing Photogrammetry Projects
53Ongoing Projects
- Face Recognition
- Development of Face Models
- Animation
- Automated Driving
- With Daimler-Benz
- Stereo Algorithms
- Improved correspondence algorithms
- High Quality Rendering
- Movie special effects eg The Lord of the
Rings - Using reconfigurable hardware (FPGA)
54Spare slides
55Stereo matching
- Automated stereo systems find matching regions in
the two images - The separation of the matching regions is the
disparity from which depth is calculated - Matching algorithms generally search over a range
of possible disparities - Looking for the best match in the two images
Stereo Correspondence is a classical challenge
for AI systems Our brains match regions in images
without effort .. but computers struggle to match
as well!
56Stereo Photogrammetry
Pairs of images giving different views of the
scene
can be used to compute a depth (disparity) map ?
57DetailSystem Architecture
Pixel Buffers
Pixel AddressGeneratorRemoves distortionand
misalignment
Predecessormatrix (dynamic programming)
n DisparityCalculatorsOne for each
possibledisparity value
Stream of disparity values
58(No Transcript)