Title: Early Cognitive Vision
1Early Cognitive Vision
- Recursive Mid-Level Vision
- ECOVISION Summary from year 3
- ECOVISION Highlights in year 3
- Hardware implementation of flow stereo
- IMO detection and space variant mappings
- Motion-Stereo Gestalts for scene disambiguation
- Conclusions
2Hierarchical Image Processing
Pixels
Data and noise reduction; extraction of meaningful information (first steps)
Low-Level Vision
Features
Spatio-temporal context: grouping, segmentation, task-dependent attention; self-emergence of entities
Primitives
Mid-Level Vision
Gestalts
High-Level Vision (Cognition)
Higher cognitive aspects, reasoning
Objects
3Summary Motion Part
Normal Flow, Hardware Implementation
Smoothing by MT-cell filtering (Neuro)
First extract heading, subtract it, and then extract all other coarse flow segments.
Fine structure analysis relying on the rigid body motion (RBM) principle.
4Summary Stereo-Part
Early vision steps (year 1)
Gestalts in Space
Gestalts in Space-Time: recursive, predictive processing
5Real-time processing
- Hardware implementation (FPGA)
6Motivation
- Massive parallel processing
- Taking advantage of the digital technology advances
- Specific purpose processing architectures
- 6 Mgates on a single chip
- Motion processing
- Stereo processing
- Space variant mapping
- Motion-driven object tracking
7System-on-Chip: real-time processing
8Motion on chip
Different motion processing schemes evaluated in software (Lucas-Kanade, McGM, Horn-Schunck, Simoncelli-Heeger, etc.)
- Only two approaches have been addressed in hardware
- McGM
- Motivation: robust optic flow estimation.
- Status: only the first convolution stages implemented (towards a hybrid software/hardware approach)
- Lucas-Kanade
- Motivation: good quality vs computational complexity trade-off
- Status: fully working on an FPGA (system-on-chip)
Motion chip (LK) accuracy evaluated with benchmark sequences and tested in a real-world application scenario.
9Hardware Implementation of Lucas Kanade
Configuration | Kpps | Resolution | Fps
Medium Quality | 1776 | 160x120 | 97
Medium Quality | 1776 | 320x240 | 26
Medium Quality | 625 | 160x120 | 33
High Quality | 1776 | 160x120 | 95
High Quality | 1776 | 320x240 | 25
Low cost | 400 | 120x90 | 38
< 20 ?
Kpps: kilopixels per second. Fps: frames per second.
Averaging stage: Medium Quality (3x3), High Quality (5x5)
Spatial vs Temporal resolution
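As a software illustration of what the Lucas-Kanade circuit estimates, the sketch below solves the 2x2 least-squares system per pixel after summing gradient products over a 3x3 or 5x5 window (the "averaging stage" above). It is a minimal NumPy sketch, not the FPGA design: the function names, the zero-padded box sum and the `eps` threshold are all assumptions made for illustration.

```python
import numpy as np


def box_sum(a, k):
    """Sum `a` over a k x k window centred on each pixel (zero padded)."""
    pad = k // 2
    padded = np.pad(a, pad)
    h, w = a.shape
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out


def lucas_kanade(frame0, frame1, window=3, eps=1e-6):
    """Dense Lucas-Kanade flow (u, v) in pixels per frame, plus a validity mask."""
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    iy, ix = np.gradient(f0)      # np.gradient returns d/drow, then d/dcol
    it = f1 - f0                  # temporal derivative
    sxx = box_sum(ix * ix, window)
    sxy = box_sum(ix * iy, window)
    syy = box_sum(iy * iy, window)
    sxt = box_sum(ix * it, window)
    syt = box_sum(iy * it, window)
    det = sxx * syy - sxy ** 2
    valid = det > eps             # reject windows where the system is singular
    safe = np.where(valid, det, 1.0)
    # Closed-form solution of [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt]
    u = np.where(valid, (sxy * syt - syy * sxt) / safe, 0.0)
    v = np.where(valid, (sxy * sxt - sxx * syt) / safe, 0.0)
    return u, v, valid


# Synthetic check: the second frame is constructed so that the linearised
# brightness-constancy equation holds exactly for a known flow.
rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
gy, gx = np.gradient(frame0)
true_u, true_v = 0.3, -0.2
frame1 = frame0 - (true_u * gx + true_v * gy)
u, v, valid = lucas_kanade(frame0, frame1, window=3)
```

Passing `window=5` corresponds to the larger averaging stage; it needs more sums per pixel, which is in line with the higher resource use reported for the 5x5 model.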
10Hardware optic flow results
- The estimation is correct when the overtaking relative velocity is significant
- The system has been tested on the instrumented car
11Stereo on chip
Different stereo algorithms considered: Lucas-Kanade, phase-based (Silvio et al), block matching.
- Phase-based stereo processing (Silvio et al)
- Motivation: know-how at ECOVISION, low-complexity computation
- Status: fully working on an FPGA platform
[Figure: block diagram of the phase-based stereo system. Blocks: frame grabbers, Gabor filters, local contrast, direct phase difference calculation, VGA controller. Precision labels: 8, 9, 9, 11]
12Phase-Based Dynamic Stereopsis
Disparity as phase difference
Direct phase difference computation
(Equations not transcribed)
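The equations of this slide are not present in the transcript. As a hedged illustration only, the sketch below uses a common textbook formulation of phase-based disparity: each image row is filtered with a complex Gabor kernel, the phase difference is taken directly from the product of the left response with the conjugate of the right response (so the individual phases never need unwrapping), and the result is divided by the filter's peak frequency. The kernel parameters `k0` and `sigma`, the function names and the sign convention are assumptions, not values from the slides.

```python
import numpy as np


def gabor_response(row, k0, sigma):
    """Complex Gabor response of a 1D signal (peak frequency k0 rad/pixel)."""
    half = int(4 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2)) * np.exp(1j * k0 * x)
    return np.convolve(row, kernel, mode="same")


def phase_disparity(left_row, right_row, k0=0.5, sigma=6.0):
    """Disparity d such that right(x) is approximately left(x - d).

    Only unambiguous while |d| < pi / k0.
    """
    q_left = gabor_response(left_row, k0, sigma)
    q_right = gabor_response(right_row, k0, sigma)
    product = q_left * np.conj(q_right)
    # Phase difference taken directly from the complex product.
    delta_phi = np.arctan2(product.imag, product.real)
    return delta_phi / k0


# Toy example: a sinusoid shifted by two pixels.
x = np.arange(200)
k0 = 0.5
left = np.cos(k0 * x)
right = np.cos(k0 * (x - 2.0))
disparity = phase_disparity(left, right, k0=k0, sigma=6.0)
```

Values near the row borders are affected by the implicit zero padding of the convolution and should be ignored.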
13Stereo hardware
SPECS | Device occupation | On-chip memory | Mpps | Embedded multipliers | Image Resol. | Fps | Max. Fclk (MHz)
Global system | 6165 (18%) | 23 (15%) | 31.5 | 26 (18%) | 640x480 | 102 | 31.5
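The frame rate in the table follows from the pixel rate and the image size. The short check below assumes one pixel is produced per clock cycle, which is an inference from the pixel rate (31.5 Mpps) matching the clock (31.5 MHz); the function name is illustrative.

```python
def frames_per_second(pixels_per_second, width, height):
    """Frame rate implied by a sustained pixel throughput."""
    return pixels_per_second / (width * height)


# 31.5 Mpps at 640x480 gives roughly 102.5 frames per second.
fps = frames_per_second(31.5e6, 640, 480)
```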
14Direct phase difference calculation module
15Playing with stereo in real time: manipulation of objects
16Stereo system data flow
FPGA COARSE GRAIN PIPELINE
17Stereo hardware
SPECS | Device occupation | On-chip memory | Mpps | Embedded multipliers | Image Resol. | Fps | Max. Fclk (MHz)
Global system | 6165 (18%) | 23 (15%) | 31.5 | 26 (18%) | 640x480 | 102 | 31.5
Subcircuits | Device occupation | On-chip memory | Embedded multipliers | Cycles required | Max. Fclk (MHz)
2 frame grabbers, 2 VGA controllers | 1921 (5%) | 0 | 0 | 1 | 50
Local contrast | 792 (2%) | 11 (7%) | 5 (3%) | 1 | 120
Gabor filters | 610 (1%) | 0 | 8 (5%) | 1 | 83
Direct phase difference calculation | 861 (2%) | 1 (1%) | 8 (5%) | 1 | 47
Cameras calibration | 1070 (3%) | 0 | 0 | - | 50
18Hardware Implementation SPECS (Virtex E 2000)
- 3x3 model
- Fast version (1776 Kpps)
- Hardware slices occupation: 54%
- BlockRAM memory occupation: 17%
- Slow version (625 Kpps)
- Hardware slices occupation: 43%
- BlockRAM memory occupation: 17%
- 5x5 model
- Fast version (1776 Kpps)
- Hardware slices occupation: 82%
- BlockRAM memory occupation: 23%
- Low cost (3x3 model, 400 Kpps)
- Hardware slices occupation: 36%
- BlockRAM memory occupation: 8%
19Extracting speed from raw optic flow data: solutions
It is possible to compensate for the effect of perspective by doing a remapping.
[Figure annotations: reduce this area; expand this area]
- The advantages of the remapping are:
- The speed of the car is more uniform.
- The divergence caused by the expansion of the car is reduced.
20Space Variant Mapping (SVM): 102 Fps with the circuit running at 31.5 MHz.
Pipeline stage | Number of slices | Device occupation (%) | On-chip memory | Image size | Max. Clk (MHz)
Frame-Grabber and Manage Memory modules | 883 | 2.8 | 0 | 640 x 480 | 94.0
IPM | 2,454 | 7.8 | 0 | 640 x 480 | 69.6
Total system | 3,564 | 11.5 | 0 | 640 x 480 | 44.8
21Tracking examples
Foggy and rainy day.
22Tracking examples II
Truck overtaking
23Tracking examples III
Multiple car fast overtaking
24Extracting speed from raw optic flow data: difficulties
- Due to the effect of perspective:
- The car will appear to move faster as it approaches the camera (even though its real speed is constant)
- A spurious expansion is added to the translational movement of the car.
[Figure annotations: car is dark grey; car is white]
25Stereo and Motion stand-alone platforms
Motion processing platform
Stereo processing platform
26Segmenting Independent Motion: Overview
- Robust extraction of egomotion from optic flow
- Spatio-temporal filtering of residual flow field
- using motion angle
- using Kalman filter (partner Ita)
- Task-specific remapping
- improved optic flow computation (partner Eng)
27Egomotion Extraction
- Novel algorithm
- outperforms all linear algorithms
- performs close to optimal algorithms
[Figure legend: proposed method, linear, optimal]
28Egomotion Extraction
- Advantage over optimal algorithms: largely increased robustness to local minima
- important when using robust estimation techniques that introduce additional local minima (e.g. Tukey M-estimator)
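To see why a Tukey M-estimator can introduce local minima, recall that its weight function is redescending: residuals beyond a cut-off receive zero weight, which makes the underlying cost non-convex. The sketch below shows the standard biweight weights; the tuning constant `c = 4.685` is the conventional default for unit-variance residuals and is an assumption here, not a value from the slides.

```python
import numpy as np


def tukey_weight(residuals, c=4.685):
    """Tukey biweight weights: (1 - (r/c)^2)^2 inside the cut-off, 0 outside."""
    r = np.asarray(residuals, dtype=float) / c
    weights = (1.0 - r ** 2) ** 2
    return np.where(np.abs(r) < 1.0, weights, 0.0)


# Small residuals keep almost full weight; gross outliers are ignored entirely.
weights = tukey_weight([0.0, 1.0, 10.0])
```

In an iteratively reweighted least-squares loop these weights would multiply each flow constraint. Because outliers drop out completely, the result depends on the starting point, which is why robustness to local minima matters.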
29Motion Segmentation
- After egomotion computation, each optic flow vector is decomposed into a static (environment) and a moving (independent motion) component
- Spatio-temporal smoothing of angular deviations from the static components yields independently moving regions
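The two steps above can be sketched as follows, under simplifying assumptions: the static flow predicted from egomotion is taken as given, the smoothing is a plain mean over frames followed by a 3x3 box filter, and the threshold is arbitrary. None of these choices come from the slides, and the function names are illustrative.

```python
import numpy as np


def angular_deviation(flow, static_flow):
    """Unsigned angle (radians) between each flow vector and its static prediction."""
    dot = np.sum(flow * static_flow, axis=-1)
    cross = flow[..., 0] * static_flow[..., 1] - flow[..., 1] * static_flow[..., 0]
    return np.abs(np.arctan2(cross, dot))


def smooth_space_time(deviation, k=3):
    """Average over frames (axis 0), then apply a k x k spatial box mean."""
    mean_t = deviation.mean(axis=0)
    pad = k // 2
    padded = np.pad(mean_t, pad, mode="edge")
    h, w = mean_t.shape
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


# Toy sequence: 4 frames of 8 x 8 flow. The background moves along +x,
# while a 3 x 3 patch moves along +y instead.
static_flow = np.zeros((4, 8, 8, 2))
static_flow[..., 0] = 1.0
flow = static_flow.copy()
flow[:, 2:5, 2:5] = (0.0, 1.0)
smoothed = smooth_space_time(angular_deviation(flow, static_flow))
moving = smoothed > 0.5  # threshold in radians, chosen arbitrarily here
```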
30Motion Segmentation
- Using Kalman filtering, elementary flow components can be matched to residual flow vectors (vectors obtained after subtraction of static components) (partner Ita)
- incorporates spatio-temporal contextual information
- object-based segmentation
31Task-specific Remapping
- Rear view mirror scenario
- Inverse perspective mapping (partner Eng)
- Optic flow computed both in original and remapped space
- large velocities in original space are smaller in remapped space, which facilitates their calculation
[Figure panels: original flow, original space; remapped flow, remapped space; remapped flow, original space]
32Task-specific Remapping
- Fuse flow in original space and segment
independent motion
original flow only
fused flow
33Speed constancy achieved by the Inverse Perspective Mapping
- As the car approaches the camera, it appears to be accelerating although it travels at constant speed.
- It can be seen that in the speed image (bottom left) dark grey (low speed) progressively becomes white (high speed).
- In the remapped sequence the increase in the image speed of the car is significantly reduced.
34Speed constancy: a quantitative analysis
Methodology
- We manually segment the car in 20 frames in the original and remapped sequences.
- Over the car, we compute the mean speed and its standard deviation.
Results
- The car's mean speed is considerably more stable in the remapped sequence. The ratio between maximum and minimum speed is 1.35 compared to 7.71 in the original sequence.
- The dispersion of speed values over the car also shows remarkable stability in the remapped sequence.
remapped sequence
original sequence
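The methodology can be sketched in a few lines: per frame, take the flow magnitude inside the manually segmented car mask and compute its mean and standard deviation, then compare the largest and smallest per-frame means. The array layout (height x width x 2 flow, boolean mask) and the function names are assumptions, and the numbers in the example are synthetic, not the measured results.

```python
import numpy as np


def speed_stats(flow, mask):
    """Mean and standard deviation of flow magnitude inside a boolean mask."""
    speed = np.linalg.norm(flow, axis=-1)
    values = speed[mask]
    return values.mean(), values.std()


def max_min_ratio(mean_speeds):
    """Ratio between the largest and smallest per-frame mean speed."""
    mean_speeds = np.asarray(mean_speeds, dtype=float)
    return mean_speeds.max() / mean_speeds.min()


# Synthetic frame: every flow vector is (3, 4), so the speed is 5 everywhere.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 3.0
flow[..., 1] = 4.0
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
mean_speed, std_speed = speed_stats(flow, mask)

# Synthetic per-frame means, only to show the ratio computation.
ratio = max_min_ratio([2.0, 4.0, 5.0])
```

A ratio close to 1 means the image speed stays nearly constant across the sequence.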
35Recursive mid-level vision
36The Primitive Extraction Scheme
37Stereo and Grouping (1)
- Why use primitive grouping for stereo?
- Line primitives are ambiguous along an edge.
- Consistency in primitives should be conserved by stereo.
- Considering groups for stereo largely reduces the number of candidates.
38Stereo and Grouping (2)
Without grouping constraint
With grouping constraint
39Quantification Method
- Generated stereo colour sequences with ground truth using colour range data provided by Riegl (www.riegl.com).
- Advantages
- Natural images: natural textures, surfaces and illuminations
- Accurate ground truth for stereo and motion
40Performance
- Grouping for stereo
- Consistently improves stereo performance.
- Offers larger improvements for low similarity thresholds
- Combined use of lower thresholds yields better reliability / density trade-offs.
- The optimal choice of threshold depends on the application (need for reliability vs density)
41Performance
[Plot: performance (correct / total stereo matches) against similarity between groups and inner similarity]
42Performance
[Plot: performance against density]
43Grouping and Interpolation
44RBM Estimation (1)
Formalisation of RBM
Visual Entities
3D Point / 3D Line 3D Point / 3D Plane
Twists
Numerical Optimisation
Householder
Constraints
System of Linear Equations
Shortest Euclidean Distance of 3D Entities
Rosenhahn, Granert and Sommer 2002
45RBM Estimation (2)
- One needs only some twenty 3D-point-to-2D-line correspondences to compute an RBM, compared to some 10,000 primitives extracted.
- The search for the ego-motion is processed as follows
- Use strong grouping constraint to select a set of highly reliable correspondences, over time and stereo
- Estimate the quality of the computed motion using reprojection of 3D hypothesis.
- Discard correspondences leading to wrong motion.
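To illustrate why about twenty correspondences are enough, the sketch below treats each 3D-point-to-2D-line correspondence as a point-on-plane constraint (a 2D image line and the optical centre span a plane) and linearises the motion for small rotations, so that each correspondence contributes one linear equation in six unknowns. This is a simplified stand-in under those assumptions, not the twist formulation of Rosenhahn, Granert and Sommer; all names are illustrative and the data are synthetic.

```python
import numpy as np


def estimate_small_rbm(points, normals, offsets):
    """Least-squares (omega, v) from point-on-plane constraints.

    Each constraint reads n . (x + omega x x + v) = d. It is linear in the
    six unknowns because n . (omega x x) = (x x n) . omega.
    """
    points = np.asarray(points, dtype=float)
    normals = np.asarray(normals, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    A = np.hstack([np.cross(points, normals), normals])       # shape (N, 6)
    b = offsets - np.einsum("ij,ij->i", normals, points)      # shape (N,)
    twist, *_ = np.linalg.lstsq(A, b, rcond=None)
    return twist[:3], twist[3:]


# Synthetic data: 20 points moved by a known small motion, each observed
# through one random plane passing through its moved position.
rng = np.random.default_rng(1)
points = rng.normal(size=(20, 3))
omega_true = np.array([0.01, -0.02, 0.015])
v_true = np.array([0.1, 0.0, -0.05])
moved = points + np.cross(omega_true, points) + v_true
normals = rng.normal(size=(20, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
offsets = np.einsum("ij,ij->i", normals, moved)
omega_est, v_est = estimate_small_rbm(points, normals, offsets)
```

With real data the residuals of this system give a natural quality measure, in the spirit of the reprojection check in the list above.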
46Stereo over time (1)
- Stereo problem: structures parallel to the epipolar line (horizontal)
- Due to the physical set-up of the cameras (fronto-parallel).
- If the motion is known then stereo between the same camera at instants T and T+N can be computed.
- If the motion is not a pure lateral translation then the epipolar lines will have different orientations.
- Tri-focal constraint
47Reconstruction: all hypotheses
48Stereo Over Time (2)
Reconstruction of both stereos (Accumulation over
5 frames)
Standard stereo reconstruction (no horizontal
structure)
Reconstruction using Stereo over time (no radial
structures)
49Reconstruction combining standard (advanced) stereo with stereo-over-time
50Final step: 3D Accumulation over Time (doing everything)
- If there is a correspondence under transformation T
- increase confidence and
- merge the two entities
- else
- decrease confidence
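The update rule above can be sketched as a small loop. Everything concrete here is an assumption made for illustration: entities are reduced to a 3D position plus a confidence in [0, 1], a correspondence is the nearest observation within a fixed radius of the predicted position, merging is a simple average, and the confidence step is constant.

```python
from dataclasses import dataclass
import math


@dataclass
class Entity:
    position: tuple    # 3D position (x, y, z)
    confidence: float  # kept in [0, 1]


def accumulate(entities, observations, transform, radius=0.1, step=0.2):
    """Move entities by `transform`, then confirm or weaken each one."""
    updated = []
    for entity in entities:
        predicted = transform(entity.position)
        nearest = min(observations, key=lambda o: math.dist(o, predicted), default=None)
        if nearest is not None and math.dist(nearest, predicted) <= radius:
            # Correspondence found: merge the two entities, raise confidence.
            merged = tuple((p + o) / 2 for p, o in zip(predicted, nearest))
            updated.append(Entity(merged, min(1.0, entity.confidence + step)))
        else:
            # No correspondence: keep the prediction, lower confidence.
            updated.append(Entity(predicted, max(0.0, entity.confidence - step)))
    return updated


def shift_x(position):
    """Hypothetical transformation T: translate by one unit along x."""
    return (position[0] + 1.0, position[1], position[2])


tracked = [Entity((0.0, 0.0, 0.0), 0.5), Entity((5.0, 5.0, 5.0), 0.5)]
seen = [(1.02, 0.0, 0.0)]
result = accumulate(tracked, seen, shift_x)
```

Repeating this over several frames lets confirmed entities gain confidence while spurious ones fade.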
51 3D Accumulation Over Time: original frame
52 3D Accumulation Over Time: 1st iteration
53 3D Accumulation Over Time: 2nd iteration
54 3D Accumulation Over Time: 3rd iteration
55 3D Accumulation Over Time: 4th iteration
56 3D Accumulation Over Time: 5th iteration
57Some final result
58Conclusion
- The individual parts of the ECOVISION system work well and have been quantitatively tested.
- Some parts have been tested directly in cars
- Other parts have been tested off-line with real driving scenes
- Integration of Stereo and Motion using RBM was successful
- Integration of IMO detection and space variant mapping, too
- Integration of the above two parts has not been achieved in the tenure of this grant
- Real-time performance has been achieved with the hardware front ends.
- Real-time performance of the complete system would require about 12 more person-months (PMs) of programming
- Several grant proposals have been put in (locally and at the Commission) to continue this work.
60The pixel in the remapped image at coordinates (X, Y) comes from the coordinates (x, y) in the original image.
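This is a backward (lookup) mapping: for every output pixel, the source position is computed and sampled. A minimal nearest-neighbour sketch follows. The mapping function itself is not given in the slides, so `source_of` is a placeholder supplied by the caller, and the one-pixel shift used in the example is only a stand-in for a real inverse perspective mapping.

```python
import numpy as np


def remap_backward(image, source_of):
    """Fill each output pixel (X, Y) from input pixel (x, y) = source_of(X, Y).

    Output pixels whose source falls outside the image are left at zero.
    """
    h, w = image.shape
    Y, X = np.mgrid[0:h, 0:w]
    x, y = source_of(X, Y)
    xi = np.rint(x).astype(int)
    yi = np.rint(y).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(image)
    out[inside] = image[yi[inside], xi[inside]]
    return out


# Stand-in mapping: each output pixel reads from one column to its right.
image = np.arange(12).reshape(3, 4)
shifted = remap_backward(image, lambda X, Y: (X + 1, Y))
```

Iterating over output pixels guarantees that the remapped image has no holes, which a forward mapping from input to output would not.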
61Advantages of Grouping for RBM Estimation
- For each entity in the top row there are 6 correspondences. Grouping leads to a reduction from 6^6 = 46,656 to only 224 correspondences.
- Correspondences with no fitting attributes (e.g. colour) can be discarded.
- c) Local position and orientation can be quite distorted. Grouping can improve the accuracy of such local estimates (cf. interpolation).