Title: Computer Vision and Human Perception
1. Computer Vision and Human Perception
- A brief intro from an AI perspective
2. Computer Vision and Human Perception
- What are the goals of CV?
- What are the applications?
- How do humans perceive the 3D world via images?
- Some methods of processing images.
3. Goal of computer vision
- Make useful decisions about real physical objects
and scenes based on sensed images.
- An alternative goal (Aloimonos and Rosenfeld) is
the construction of scene descriptions from
images.
- How do you find the door to leave?
- How do you determine if a person is friendly or
hostile? .. an elder? .. a possible mate?
4. Critical Issues
- Sensing: how do sensors obtain images of the
world?
- Information: how do we obtain color, texture,
shape, motion, etc.?
- Representations: what representations should/does
a computer or brain use?
- Algorithms: what algorithms process image
information and construct scene descriptions?
5. Images: 2D projections of 3D
- The 3D world has color, texture, surfaces, volumes,
light sources, objects, motion, betweenness,
adjacency, connections, etc.
- A 2D image is a projection of a scene from a
specific viewpoint; many 3D features are
captured, some are not.
- Brightness or color: g(x,y) or f(row, column)
at a certain instant of time
- Images indicate familiar people, moving objects
or animals, health of people or machines
6. Image receives reflections
- Light reaches surfaces in 3D
- Surfaces reflect
- Sensor element receives light energy
- Intensity matters
- Angles matter
- Material matters
7. CCD camera has discrete elements
- Lens collects light rays
- CCD elements replace the chemicals of film
- Number of elements less than with film (so far)
8. Intensities near the center of the eye
9. Camera, programs, display
- Camera inputs to frame buffer
- Program can interpret data
- Program can add graphics
- Program can add imagery
10. Some image format issues
- Spatial resolution
- Intensity resolution
- Image file format
11. Resolution is pixels per unit of length
- Resolution decreases by one half in cases at
left
- Human faces can be recognized at 64 x 64 pixels
per face
12. Features detected depend on the resolution
- Can tell hearts from diamonds
- Can tell face value
- Generally need 2 pixels across line or small
region (such as eye)
13. Human eye as a spherical camera
- 100M sensing elements in the retina
- Rods sense intensity
- Cones sense color
- Fovea has tightly packed elements, more cones
- Periphery has more rods
- Focal length is about 20mm
- Pupil/iris controls light entry
- Eye scans, or saccades, to put image details on
the fovea
- 100M sensing cells funnel to 1M optic nerve
connections to the brain
14. Image processing operations
- Thresholding
- Edge detection
- Motion field computation
15. Find regions via thresholding
- Region has brighter or darker or redder color,
etc.
- If pixel ≥ threshold
- then pixel ← 1, else pixel ← 0
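The thresholding rule can be sketched in Python (a toy illustration; the function name, sample image, and threshold of 128 are made up):

```python
def threshold_image(image, threshold):
    """Produce a binary image: 1 where pixel >= threshold, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row]
            for row in image]

# Toy 3x3 grayscale image (values 0-255); threshold of 128 is arbitrary.
gray = [[ 10, 200,  50],
        [130, 255,  90],
        [  0, 140, 128]]
binary = threshold_image(gray, 128)
# binary == [[0, 1, 0], [1, 1, 0], [0, 1, 1]]
```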
16. Example: red blood cell image
- Many blood cells are separate objects
- Many touch, which is bad!
- Salt-and-pepper noise from thresholding
- How usable is this data?
17. Robot vehicle must see stop sign
sign = imread('Images/stopSign.jpg', 'jpg');
red = (sign(:,:,1) > 120) & (sign(:,:,2) < …) & (sign(:,:,3) < …);
imwrite(red, 'Images/stopRed120.jpg', 'jpg');
18. Thresholding is usually not trivial
19. Can cluster pixels by color similarity and by
adjacency
Original RGB Image
Color Clusters by K-Means
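K-means clustering in RGB space can be sketched in pure Python (a toy example with four made-up pixels; real use clusters all image pixels, with adjacency handled separately):

```python
import math
import random

def kmeans_colors(pixels, k, iters=10, seed=0):
    """Cluster RGB pixels by color similarity with basic k-means."""
    random.seed(seed)
    centers = random.sample(pixels, k)
    for _ in range(iters):
        # Assign each pixel to its nearest center (Euclidean distance in RGB).
        clusters = [[] for _ in range(k)]
        for p in pixels:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Move each center to the mean of its assigned pixels.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(ch) / len(cl) for ch in zip(*cl))
    return centers

# Two reddish and two bluish pixels: expect one red and one blue center.
pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 250), (20, 5, 240)]
centers = kmeans_colors(pixels, k=2)
```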
20. Some image processing ops
- Finding contrast in an image using neighborhoods
of pixels
- Detecting motion across 2 images
21. Differentiate to find object edges
- For each pixel, compute its contrast
- Can use max difference of its 8 neighbors
- Detects intensity change across boundary of
adjacent regions
LOG filter later on
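The max-difference-over-8-neighbors contrast measure can be sketched as follows (Python; the function name and step-edge test image are made up):

```python
def contrast_image(image):
    """For each interior pixel, contrast = max |pixel - neighbor|
    over its 8 neighbors (border pixels left at 0)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = max(abs(image[r][c] - image[r + dr][c + dc])
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                            if (dr, dc) != (0, 0))
    return out

# A vertical step edge: contrast is high along the boundary columns.
img = [[10, 10, 90, 90],
       [10, 10, 90, 90],
       [10, 10, 90, 90]]
edges = contrast_image(img)
# edges[1] == [0, 80, 80, 0]
```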
22. 4- and 8-neighbors of a pixel
- 4-neighbors are at multiples of 90 degrees
. N .
W * E
. S .
- 8-neighbors are at every multiple of 45 degrees
NW N NE
W  *  E
SW S SE
23. Detect motion via subtraction
- Constant background
- Moving object
- Produces pixel differences at boundary
- Reveals moving object and its shape
Differences computed over time rather than over
space
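Frame subtraction can be sketched as follows (Python; the object values and threshold are made up):

```python
def motion_mask(frame_a, frame_b, threshold):
    """Mark pixels whose intensity changed by more than threshold
    between two frames; a constant background subtracts to zero."""
    return [[1 if abs(a - b) > threshold else 0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# An "object" (value 200) moves one pixel right over a background of 50.
frame1 = [[50, 200, 50, 50]]
frame2 = [[50, 50, 200, 50]]
mask = motion_mask(frame1, frame2, threshold=20)
# mask == [[0, 1, 1, 0]]  (differences appear at the moving object's boundary)
```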
24. Two frames of aerial imagery
Video frames N and N+1 show slight movement: most
pixels are the same, just in different locations.
25. Best matching blocks between video frames N+1
and N (motion vectors)
The bulk of the vectors show the true motion of
the airplane taking the pictures. The long
vectors are incorrect motion vectors, but they do
work well for compression of image I2!
Best matches from the 2nd to the 1st image, shown
as vectors overlaid on the 2nd image. (Work by
Dina Eldin.)
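Block matching is typically done by minimizing a block difference score such as SAD (sum of absolute differences). A minimal Python sketch, with a made-up 1x2 block and search positions (the exact score used in the work above is not stated in the slides):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_match(frame, block, search_positions):
    """Find the (row, col) in frame where block matches best (minimum SAD)."""
    h, w = len(block), len(block[0])
    def block_at(r, c):
        return [row[c:c + w] for row in frame[r:r + h]]
    return min(search_positions, key=lambda rc: sad(block, block_at(*rc)))

# A 1x2 block from frame N, searched for in frame N+1 (shifted right by 1).
block = [[9, 7]]
frame_next = [[0, 9, 7, 0]]
pos = best_match(frame_next, block, [(0, 0), (0, 1), (0, 2)])
# pos == (0, 1): the block moved one pixel to the right.
```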
26. Gradient from 3x3 neighborhood
Estimate both magnitude and direction of the edge.
27. Prewitt versus Sobel masks
The Sobel mask uses weights of 1,2,1 and -1,-2,-1
in order to give more weight to the center
estimate. The scaling factor is thus 1/8 and not 1/6.
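A rough Python sketch of the Sobel estimate with the 1/8 scaling described above (function name and test image are made up):

```python
def sobel(image, r, c):
    """Sobel gradient estimate at interior pixel (r, c).
    Returns (gx, gy), each scaled by 1/8."""
    n = [[image[r + dr][c + dc] for dc in (-1, 0, 1)] for dr in (-1, 0, 1)]
    gx = ((n[0][2] + 2 * n[1][2] + n[2][2]) -
          (n[0][0] + 2 * n[1][0] + n[2][0])) / 8.0
    gy = ((n[2][0] + 2 * n[2][1] + n[2][2]) -
          (n[0][0] + 2 * n[0][1] + n[0][2])) / 8.0
    return gx, gy

# Vertical step edge from 0 to 8: gx is strong, gy is zero.
img = [[0, 8, 8],
       [0, 8, 8],
       [0, 8, 8]]
gx, gy = sobel(img, 1, 1)
# gx == 4.0, gy == 0.0
```

Edge magnitude and direction then follow as sqrt(gx² + gy²) and atan2(gy, gx), matching the magnitude/direction estimate of the preceding slide.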
28. Computational shortcuts
29. 2 rows of intensity vs difference
30. Masks show how to combine neighborhood values
Multiply the mask by the image neighborhood to
get first derivatives of intensity versus x and
versus y
31. Curves of contrasting pixels
32. Boundaries are not always found well
33. Canny boundary operator
34. LOG filter creates zero crossings at step edges
(2nd derivative of Gaussian)
3x3 mask applied at each image position
Detects spots
Detects step edges
Marr-Hildreth theory of edge detection
35. ∇²G(x,y): the Mexican hat filter
36. Positive center, negative surround
37. Properties of the LOG filter
- Has zero response on constant region
- Has zero response on intensity ramp
- Exaggerates a step edge by making it larger
- Responds to a step in any direction across the
receptive field.
- Responds to a spot about the size of the center
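The first three properties can be checked numerically. The sketch below uses a common 3x3 center-surround mask (positive center, negative surround) as a stand-in for the LOG; the exact mask values are an assumption, not from the slides:

```python
# 3x3 center-surround mask: positive center, negative surround.
MASK = [[-1, -1, -1],
        [-1,  8, -1],
        [-1, -1, -1]]

def apply_mask(neigh):
    """Sum of mask values times the 3x3 image neighborhood."""
    return sum(MASK[i][j] * neigh[i][j] for i in range(3) for j in range(3))

constant = [[5, 5, 5]] * 3               # constant region
ramp = [[1, 2, 3]] * 3                   # intensity ramp across columns
spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]] # bright spot at the center

# Zero response on a constant region and on a ramp; strong response on a spot.
print(apply_mask(constant))  # 0
print(apply_mask(ramp))      # 0
print(apply_mask(spot))      # 72
```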
38. Human receptive field is analogous to a mask
Xj are the image intensities.
Wj are the gains (weights) in the mask.
39. Human receptive fields amplify contrast
40. 3D neural network in the brain
Level j
Level j+1
41. Mach band effect shows human bias
42. Human bias and illusions support the receptive
field theory of edge detection
43. Human brain as a network
- 100B neurons, or nodes
- Half are involved in vision
- 10 trillion connections
- Neuron can have fanout of 10,000
- Visual cortex highly structured to process 2D
signals in multiple ways
44. Color and shading
- Used heavily in human vision
- Color is a pixel property, making some
recognition problems easy
- Visible spectrum for humans is 400 nm (blue) to
700 nm (red)
- Machines can see much more, e.g. X-rays,
infrared, radio waves
45. Imaging Process (review)
46. Factors that Affect Perception
- Light: the spectrum of energy that illuminates
the object surface
- Reflectance: ratio of reflected light to
incoming light
- Specularity: highly specular (shiny) vs.
matte surface
- Distance: distance to the light source
- Angle: angle between surface normal and light
source
- Sensitivity: how sensitive is the sensor
47. Some physics of color
- White light is composed of all visible
wavelengths (400-700 nm)
- Ultraviolet and X-rays are of much smaller
wavelength
- Infrared and radio waves are of much longer
wavelength
48. Models of Reflectance
We need to look at models for the physics of
illumination and reflection that will
1. help computer vision algorithms extract
information about the 3D world, and
2. help computer graphics algorithms render
realistic images of model scenes.
Physics-based vision is the subarea of computer
vision that uses physical models to understand
image formation in order to better analyze
real-world images.
49. The Lambertian Model: Diffuse Surface Reflection
A diffuse reflecting surface reflects light
uniformly in all directions.
Uniform brightness for all viewpoints of a planar
surface.
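Under the Lambertian model, reflected intensity follows Lambert's cosine law: it depends on the angle between the surface normal and the light direction, not on the viewpoint. A minimal Python sketch (function name and example vectors are illustrative):

```python
import math

def lambertian_intensity(normal, light_dir, albedo=1.0):
    """Lambert's cosine law: intensity = albedo * cos(angle between
    surface normal and light direction), clamped at zero."""
    def norm(v):
        m = math.sqrt(sum(x * x for x in v))
        return tuple(x / m for x in v)
    n, l = norm(normal), norm(light_dir)
    return albedo * max(0.0, sum(a * b for a, b in zip(n, l)))

# Light from straight above a horizontal surface: full brightness.
print(lambertian_intensity((0, 0, 1), (0, 0, 1)))  # 1.0
# Light at 60 degrees from the normal: cos(60 deg) = 0.5.
angle = math.radians(60)
print(lambertian_intensity((0, 0, 1), (math.sin(angle), 0, math.cos(angle))))
```

Note that the viewing direction never appears: this is exactly the "uniform brightness for all viewpoints" property stated above.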
50. Real matte objects
Light from ring around camera lens
51. Specular reflection is highly directional and
mirrorlike.
R is the ray of reflection. V is the direction
from the surface toward the viewpoint. α is the
shininess parameter.
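The R, V, shininess form above matches the Phong specular term (R · V) raised to the shininess power; the slides do not name the exact model, so treating it as Phong is an assumption. A minimal Python sketch:

```python
import math

def phong_specular(reflect_dir, view_dir, shininess):
    """Specular term (R . V)^shininess: bright only when the viewing
    direction V is close to the reflection ray R."""
    def norm(v):
        m = math.sqrt(sum(x * x for x in v))
        return tuple(x / m for x in v)
    r, v = norm(reflect_dir), norm(view_dir)
    return max(0.0, sum(a * b for a, b in zip(r, v))) ** shininess

# Viewer exactly on the reflection ray: maximum highlight.
mirror = phong_specular((0, 0, 1), (0, 0, 1), shininess=50)
# Viewer slightly off the reflection ray: highlight falls off sharply.
off = phong_specular((0, 0, 1), (0.5, 0, 1), shininess=50)
```

A larger shininess exponent makes the highlight narrower, which is why shiny surfaces show small, bright specularities while matte surfaces do not.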
52. CV: Perceiving 3D from 2D
- Many cues from 2D images enable interpretation of
the structure of the 3D world producing them
53. Many 3D cues
How can humans and other machines reconstruct the
3D nature of a scene from 2D images?
What other world knowledge needs to be added in
the process?
54. Labeling image contours interprets the 3D scene
structure
The logo on the cup is a mark on the material; the
shadow relates to illumination, not material.
An egg and a thin cup on a table top, lighted from
the top right.
55. Intrinsic image stores 3D info in pixels, not
intensity.
For each point of the image, we want depth to the
3D surface point, surface normal at that point,
albedo of the surface material, and illumination
of that surface point.
56. 3D scene versus 2D image
- Creases
- Corners
- Faces
- Occlusions (for some viewpoint)
- Edges
- Junctions
- Regions
- Blades, limbs, Ts
57. Labeling of simple polyhedra
Labeling of a block floating in space. BJ and KI
are convex creases. Blades AB, BC, CD, etc. model
the occlusion of the background. Junction K is a
convex trihedral corner. Junction D is a
T-junction modeling the occlusion of blade CD by
blade JE.
58. Trihedral blocks world image junctions: only 16
cases!
Only 16 possible junctions in 2D are formed by
viewing 3D corners formed by 3 planes from a
general viewpoint! From top to bottom:
L-junctions, arrows, forks, and T-junctions.
59. How do we obtain the catalog?
- think about solid/empty assignments to the 8
octants about the X-Y-Z origin
- think about non-accidental viewpoints
- account for all possible topologies of junctions
and edges
- then handle T-junction occlusions
60. Blocks world labeling
Left: block floating in space
Right: block glued to a wall at the back
61. Try labeling these: interpret the 3D structure,
then label the parts
What does it mean if we can't label them? If we
can label them?
62. 1975: researchers very excited
- very strong constraints on interpretations
- several hundred junctions in the catalogue when
cracks and shadows are allowed (Waltz); the
algorithm works very well with them
- but the world is not made of blocks!
- later on, curved blocks world work was done, but
it was not as interesting
63. Backtracking or interpretation tree
64. Necker cube has multiple interpretations
Label the different interpretations.
A human staring at one of these cubes typically
experiences changing interpretations. The
interpretation of the two forks (G and H)
flip-flops between front corner and back
corner. What is the explanation?
65. Depth cues in 2D images
66. Interposition cue
Def: Interposition occurs when one object
occludes another object, thus indicating that the
occluding object is closer to the viewer than the
occluded object.
67. Interposition
- T-junctions indicate occlusion: the top is the
occluding edge while the bar is the occluded edge
- Bench occludes lamp post
- Leg occludes bench
- Lamp post occludes fence
- Railing occludes trees
- Trees occlude steeple
68.
- Perspective scaling: the railing looks smaller at
the left; the bench looks smaller at the right; 2
steeples are far away
- Foreshortening: the bench is sharply angled
relative to the viewpoint; image length is
affected accordingly
69. Texture gradient reveals surface orientation
(In East Lansing, we call it corn, not maize.)
Note also that the rows appear to converge in 2D.
Texture gradient: a change of image texture along
some direction, often corresponding to a change
in distance or orientation in the 3D world
containing the objects creating the texture.
70. 3D cues from perspective
71. 3D cues from perspective
72. More 3D cues
Virtual lines
Falsely perceived interposition
73. Irving Rock, The Logic of Perception (1982)
- Summarized an entire career in visual psychology
- Concluded that the human visual system acts as a
problem-solver
- Triangle unlikely to be accidental; must be an
object in front of the background; must be
brighter since it's closer
74. More 3D cues
2D alignment usually means 3D alignment
2D image curves create perception of a 3D surface
75. Structured light can enhance surfaces in
industrial vision
Potatoes with light stripes
Sculpted object
76. Models of Reflectance
We need to look at models for the physics of
illumination and reflection that will
1. help computer vision algorithms extract
information about the 3D world, and
2. help computer graphics algorithms render
realistic images of model scenes.
Physics-based vision is the subarea of computer
vision that uses physical models to understand
image formation in order to better analyze
real-world images.
77. The Lambertian Model: Diffuse Surface Reflection
A diffuse reflecting surface reflects light
uniformly in all directions.
Uniform brightness for all viewpoints of a planar
surface.
78. Shape (normals) from shading
Clearly, intensity encodes shape in this case.
Cylinder with white paper and pen stripes
Intensities plotted as a surface
79. Shape (normals) from shading
Plot of intensity of one image row reveals the 3D
shape of these diffusely reflecting objects.
80. Specular reflection is highly directional and
mirrorlike.
R is the ray of reflection. V is the direction
from the surface toward the viewpoint. α is the
shininess parameter.
81. What about models for recognition?
- recognition: to know again
- How does memory store models of faces, rooms,
chairs, etc.?
82. Human capability is extensive
- A child of age 6 might recognize 3000 words
- And 30,000 objects
- A junkyard robot must recognize nearly all objects
- Hundreds of styles of lamps, chairs, tools, …
83. Some recognition methods
- Via geometric alignment to CAD models
- Via a trained neural net
- Via parts of objects and how they join
- Via the function/behavior of an object
84. Side view classes of Ford Taurus (Chen and
Stockman)
These were made in the PRIP Lab from a scale
model. Viewpoints in between can be generated
from x and y curvature stored on the boundary.
Viewpoints are matched to real image boundaries
via optimization.
85. Matching image edges to model limbs
Could recognize car model at stoplight or gate.
86. Object as parts and relations
- Parts have size, color, shape
- Parts connect together at concavities
- Relations are connect, above, right of, inside
of, …
87. Functional models
- Inspired by J.J. Gibson's Theory of Affordances
- An object is what an object does
- container: holds stuff
- club: hits stuff
- chair: supports humans
88. Louise Stark chair model
- Dozens of CAD models of chairs
- Program analyzed each for
- a stable pose
- a seat of the right size
- height off the ground of the right size
- no obstruction to the body on the seat
- the program would accept a trash can
- (which could also pass as a container)
89. Minsky's theory of frames (Schank's theory of
scripts)
- Frames are learned expectations: a frame for a
room, a car, a party, an argument, …
- Frame is evoked by the current situation; how?
(hard)
- Human fills in the details of the current frame
(easier)
90. Summary
- Images have many low-level features
- Can detect uniform regions and contrast
- Can organize regions and boundaries
- Human vision uses several simultaneous channels:
color, edge, motion
- Use of models/knowledge is diverse and difficult
- The last 2 issues are difficult in computer vision