1
Computer Vision and Human Perception
  • A brief intro from an AI perspective

2
Computer Vision and Human Perception
  • What are the goals of CV?
  • What are the applications?
  • How do humans perceive the 3D world via images?
  • Some methods of processing images.

3
Goal of computer vision
  • Make useful decisions about real physical objects
    and scenes based on sensed images.
  • An alternative goal (Aloimonos and Rosenfeld) is
    the construction of scene descriptions from
    images.
  • How do you find the door to leave?
  • How do you determine if a person is friendly or
    hostile? .. an elder? .. a possible mate?

4
Critical Issues
  • Sensing: how do sensors obtain images of the
    world?
  • Information: how do we obtain color, texture,
    shape, motion, etc.?
  • Representations: what representations should/does
    a computer or brain use?
  • Algorithms: what algorithms process image
    information and construct scene descriptions?

5
Images: 2D projections of 3D
  • The 3D world has color, texture, surfaces, volumes,
    light sources, objects, motion, betweenness,
    adjacency, connections, etc.
  • A 2D image is a projection of a scene from a
    specific viewpoint; many 3D features are
    captured, some are not.
  • Brightness or color: g(x,y) or f(row, column)
    for a certain instant of time
  • Images indicate familiar people, moving objects
    or animals, health of people or machines

6
Image receives reflections
  • Light reaches surfaces in 3D
  • Surfaces reflect
  • Sensor element receives light energy
  • Intensity matters
  • Angles matter
  • Material matters

7
CCD Camera has discrete elts
  • Lens collects light rays
  • CCD elts replace chemicals of film
  • Number of elts less than with film (so far)

8
Intensities near center of eye
9
Camera Programs Display
  • Camera inputs to frame buffer
  • Program can interpret data
  • Program can add graphics
  • Program can add imagery

10
Some image format issues
  • Spatial resolution; intensity resolution; image
    file format

11
Resolution is pixels per unit of length
  • Resolution decreases by one half in cases at
    left
  • Human faces can be recognized at 64 x 64 pixels
    per face

12
Features detected depend on the resolution
  • Can tell hearts from diamonds
  • Can tell face value
  • Generally need 2 pixels across line or small
    region (such as eye)

13
Human eye as a spherical camera
  • 100M sensing elts in retina
  • Rods sense intensity
  • Cones sense color
  • Fovea has tightly packed elts, more cones
  • Periphery has more rods
  • Focal length is about 20mm
  • Pupil/iris controls light entry
  • Eye scans, or saccades, to bring image details
    onto the fovea
  • 100M sensing cells funnel to 1M optic nerve
    connections to the brain

14
Image processing operations
  • Thresholding
  • Edge detection
  • Motion field computation

15
Find regions via thresholding
  • Region has brighter or darker or redder color,
    etc.
  • If pixel >= threshold, then pixel = 1; else
    pixel = 0 (see the sketch below)
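A minimal MATLAB sketch of this rule; the file name and
threshold value are assumptions for illustration, and
rgb2gray/imshow are Image Processing Toolbox functions.

    im = imread('Images/cells.jpg');           % hypothetical image
    if ndims(im) == 3, im = rgb2gray(im); end  % collapse color to gray
    T = 128;                                   % assumed threshold
    binary = im >= T;                          % pixel = 1 if >= T, else 0
    imshow(binary)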

16
Example: red blood cell image
  • Many blood cells are separate objects
  • Many touch (bad!)
  • Salt and pepper noise from thresholding
  • How usable is this data?

17
Robot vehicle must see stop sign
    sign = imread('Images/stopSign.jpg', 'jpg');
    % Keep pixels whose red channel is high and whose green and blue
    % channels are low; 120 matches the output file name, while the
    % green/blue thresholds of 80 are assumed values.
    red = (sign(:,:,1) > 120) & (sign(:,:,2) < 80) & (sign(:,:,3) < 80);
    imwrite(red, 'Images/stopRed120.jpg', 'jpg');
18
Thresholding is usually not trivial
19
Can cluster pixels by color similarity and by
adjacency
Original RGB Image
Color Clusters by K-Means
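A minimal MATLAB sketch of the color clustering step
(kmeans is from the Statistics Toolbox; the image name and
number of clusters are assumptions). This clusters by
color similarity only; grouping clustered pixels by
adjacency would be a further step.

    rgb = im2double(imread('Images/scene.jpg'));
    [h, w, ~] = size(rgb);
    pixels = reshape(rgb, h*w, 3);          % one row per pixel: [R G B]
    k = 5;                                  % assumed number of clusters
    [labels, centers] = kmeans(pixels, k);  % cluster in RGB color space
    clustered = reshape(centers(labels, :), h, w, 3);
    imshow(clustered)                       % each pixel gets its cluster mean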
20
Some image processing ops
  • Finding contrast in an image using neighborhoods
    of pixels; detecting motion across 2 images

21
Differentiate to find object edges
  • For each pixel, compute its contrast
  • Can use the max difference from its 8 neighbors
  • Detects intensity change across the boundary of
    adjacent regions (see the sketch below)

LOG filter later on
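A minimal MATLAB sketch of this contrast measure, the
maximum absolute difference between a pixel and its 8
neighbors (padarray is from the Image Processing Toolbox;
the image name and threshold are assumptions):

    im = double(rgb2gray(imread('Images/scene.jpg')));
    [h, w] = size(im);
    padded = padarray(im, [1 1], 'replicate');
    contrast = zeros(h, w);
    for dr = -1:1
      for dc = -1:1
        if dr == 0 && dc == 0, continue; end
        neighbor = padded((1:h)+1+dr, (1:w)+1+dc);
        contrast = max(contrast, abs(im - neighbor));
      end
    end
    edges = contrast > 30;   % assumed threshold for boundary pixels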
22
4 and 8 neighbors of a pixel
  • 4 neighbors are at multiples of 90 degrees
    (* marks the pixel itself)
        .  N  .
        W  *  E
        .  S  .
  • 8 neighbors are at every multiple of 45 degrees
        NW  N  NE
        W   *   E
        SW  S  SE

23
Detect Motion via Subtraction
  • Constant background
  • Moving object
  • Produces pixel differences at boundary
  • Reveals moving object and its shape

Differences computed over time rather than over
space
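A minimal MATLAB sketch of motion detection by
subtraction (the file names and threshold are
assumptions):

    f1 = double(rgb2gray(imread('Images/frame1.jpg')));
    f2 = double(rgb2gray(imread('Images/frame2.jpg')));
    d = abs(f2 - f1);    % differences over time, not space
    moving = d > 25;     % assumed threshold; nonzero mainly at the
    imshow(moving)       % boundary of the moving object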
24
Two frames of aerial imagery
Video frames N and N+1 show slight movement: most
pixels are the same, just in different locations.
25
Best matching blocks between video frames N+1 and
N (motion vectors)
The bulk of the vectors show the true motion of
the airplane taking the pictures. The long
vectors are incorrect motion vectors, but they do
work well for compression of image I2!
Best matches from 2nd to first image shown as
vectors overlaid on the 2nd image. (Work by Dina
Eldin.)
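A minimal MATLAB sketch of block matching with the sum of
absolute differences (SAD). The block size, search radius,
and the frames f1 and f2 (as in the earlier subtraction
sketch) are assumptions:

    B = 8; S = 4;                 % block size and search radius
    [h, w] = size(f1);
    vectors = zeros(floor(h/B), floor(w/B), 2);
    for r = 1:B:h-B+1
      for c = 1:B:w-B+1
        block = f2(r:r+B-1, c:c+B-1);
        best = inf; bestd = [0 0];
        for dr = -S:S
          for dc = -S:S
            r2 = r + dr; c2 = c + dc;
            if r2 < 1 || c2 < 1 || r2+B-1 > h || c2+B-1 > w, continue; end
            sad = sum(sum(abs(block - f1(r2:r2+B-1, c2:c2+B-1))));
            if sad < best, best = sad; bestd = [dr dc]; end
          end
        end
        vectors(ceil(r/B), ceil(c/B), :) = bestd;  % motion vector for this block
      end
    end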
26
Gradient from 3x3 neighborhood
Estimate both magnitude and direction of the edge.
27
Prewitt versus Sobel masks
Sobel mask uses weights of 1,2,1 and -1,-2,-1 in
order to give more weight to center estimate. The
scaling factor is thus 1/8 and not 1/6.
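A minimal MATLAB sketch of the Sobel gradient estimate
(im is a grayscale double image as in the earlier
sketches; conv2 flips the mask, which can flip the sign
of a derivative but not the magnitude):

    sx = [-1 0 1; -2 0 2; -1 0 1] / 8;   % 1/8 scaling, as noted above
    sy = sx';
    gx = conv2(im, sx, 'same');          % derivative of intensity vs. x
    gy = conv2(im, sy, 'same');          % derivative of intensity vs. y
    mag = sqrt(gx.^2 + gy.^2);           % edge magnitude
    dir = atan2(gy, gx);                 % edge direction in radians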
28
Computational shortcuts
29
2 rows of intensity vs difference
30
Masks show how to combine neighborhood values
Multiply the mask by the image neighborhood to
get first derivatives of intensity versus x and
versus y
31
Curves of contrasting pixels
32
Boundaries not always found well
33
Canny boundary operator
34
LOG filter creates zero crossing at step edges
(2nd der. of Gaussian)
3x3 mask applied at each image position
Detects spots
Detects step edges
Marr-Hildreth theory of edge detection
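A minimal MATLAB sketch of Marr-Hildreth style detection:
convolve with a LoG mask and mark zero crossings. fspecial
is from the Image Processing Toolbox; the mask size and
sigma are assumptions, and edge(im, 'log') packages the
same idea.

    logMask = fspecial('log', 9, 1.4);   % 9x9 LoG mask, sigma = 1.4
    resp = conv2(im, logMask, 'same');
    % A zero crossing is a sign change between vertical or
    % horizontal neighbors in the filter response.
    zcV = [resp(1:end-1,:) .* resp(2:end,:) < 0; false(1, size(resp,2))];
    zcH = [resp(:,1:end-1) .* resp(:,2:end) < 0, false(size(resp,1), 1)];
    edgesLoG = zcV | zcH;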
35
G(x,y) Mexican hat filter
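A standard closed form for this filter, added here for
reference (sign and normalization conventions vary; this
version has the positive center and negative surround
shown on the next slide):

    $-\nabla^2 G(x,y) = \frac{1}{\pi \sigma^4}\left(1 - \frac{x^2 + y^2}{2\sigma^2}\right) e^{-(x^2 + y^2)/(2\sigma^2)}$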
36
Positive center
Negative surround
37
Properties of LOG filter
  • Has zero response on constant region
  • Has zero response on intensity ramp
  • Exaggerates a step edge by making it larger
  • Responds to a step in any direction across the
    receptive field.
  • Responds to a spot about the size of the center

38
Human receptive field is analogous to a mask
Xj are the image intensities.
Wj are gains (weights) in the mask
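With those definitions, the response of the receptive
field (or mask) is the weighted sum of its inputs, a
standard formulation added here:

    $\text{response} = \sum_j w_j x_j$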
39
Human receptive fields amplify contrast
40
3D neural network in brain
Level j
Level j+1
41
Mach band effect shows human bias
42
Human bias and illusions support the receptive
field theory of edge detection
43
Human brain as a network
  • 100B neurons, or nodes
  • Half are involved in vision
  • 10 trillion connections
  • Neuron can have fanout of 10,000
  • Visual cortex highly structured to process 2D
    signals in multiple ways

44
Color and shading
  • Used heavily in human vision
  • Color is a pixel property, making some
    recognition problems easy
  • Visible spectrum for humans is 400 nm (blue) to
    700 nm (red)
  • Machines can see much more: e.g., X-rays,
    infrared, radio waves

45
Imaging Process (review)
46
Factors that Affect Perception
  • Light: the spectrum of energy that illuminates
    the object surface
  • Reflectance: ratio of reflected light to
    incoming light
  • Specularity: highly specular (shiny) vs. matte
    surface
  • Distance: distance to the light source
  • Angle: angle between surface normal and light
    source
  • Sensitivity: how sensitive is the sensor

47
Some physics of color
  • White light is composed of all visible
    wavelengths (400-700 nm)
  • Ultraviolet and X-rays are of much smaller
    wavelength
  • Infrared and radio waves are of much longer
    wavelength

48
Models of Reflectance
We need to look at models for the physics of
illumination and reflection that will
1. help computer vision algorithms extract
information about the 3D world, and
2. help computer graphics algorithms render
realistic images of model scenes.
Physics-based vision is the subarea of computer
vision that uses physical models to understand
image formation in order to better analyze
real-world images.
49
The Lambertian Model: Diffuse Surface Reflection
A diffuse reflecting surface reflects light
uniformly in all directions, giving uniform
brightness for all viewpoints of a planar surface.
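A standard statement of the Lambertian model, added here
(the notation is an assumption, not from the slides): with
unit surface normal N, unit direction to the light source
L, and diffuse albedo k_d,

    $I = k_d \, I_{\text{source}} \, (N \cdot L) = k_d \, I_{\text{source}} \cos\theta$

The viewing direction does not appear, which is why the
brightness is the same from all viewpoints.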
50
Real matte objects
Light from ring around camera lens
51
Specular reflection is highly directional and
mirrorlike.
R is the ray of reflection; V is the direction
from the surface toward the viewpoint; α is the
shininess parameter.
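A common Phong-style form of this model, added here as an
illustration in that notation, with k_s the specular
coefficient:

    $I_{\text{spec}} = k_s \, I_{\text{source}} \, (R \cdot V)^{\alpha}$

A large α gives a tight, mirrorlike highlight; a small α
spreads the highlight out.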
52
CV: Perceiving 3D from 2D
  • Many cues from 2D images enable interpretation of
    the structure of the 3D world producing them

53
Many 3D cues
How can humans and other machines reconstruct the
3D nature of a scene from 2D images?
What other world knowledge needs to be added in
the process?
54
Labeling image contours interprets the 3D scene
structure
Logo on cup is a mark on the material;
shadow relates to illumination, not material.
An egg and a thin cup on a table top, lighted
from the top right.

55
Intrinsic Image: stores 3D info in pixels, not
intensity.
For each point of the image, we want depth to the
3D surface point, surface normal at that point,
albedo of the surface material, and illumination
of that surface point.
56
3D scene versus 2D image
  • Creases
  • Corners
  • Faces
  • Occlusions (for some viewpoint)
  • Edges
  • Junctions
  • Regions
  • Blades, limbs, Ts

57
Labeling of simple polyhedra
Labeling of a block floating in space. BJ and KI
are convex creases. Blades AB, BC, CD, etc. model
the occlusion of the background. Junction K is a
convex trihedral corner. Junction D is a
T-junction modeling the occlusion of blade CD by
blade JE.
58
Trihedral Blocks World Image Junctions: only 16
cases!
Only 16 possible junctions in 2D formed by
viewing 3D corners formed by 3 planes and viewed
from a general viewpoint! From top to bottom:
L-junctions, arrows, forks, and T-junctions.
59
How do we obtain the catalog?
  • think about solid/empty assignments to the 8
    octants about the X-Y-Z-origin
  • think about non-accidental viewpoints
  • account for all possible topologies of junctions
    and edges
  • then handle T-junction occlusions

60
Blocks world labeling
Left: block floating in space
Right: block glued to a wall at the back
61
Try labeling these: interpret the 3D structure,
then label parts
What does it mean if we can't label them? If we
can label them?
62
1975: researchers very excited
  • very strong constraints on interpretations
  • several hundred junctions in the catalogue when
    cracks and shadows are allowed (Waltz); the
    algorithm works very well with them
  • but the world is not made of blocks!
  • later on, curved blocks world work was done, but
    it was not as interesting

63
Backtracking or interpretation tree
64
Necker cube has multiple interpretations
Label the different interpretations
A human staring at one of these cubes typically
experiences changing interpretations. The
interpretation of the two forks (G and H)
flip-flops between front corner and back
corner. What is the explanation?
65
Depth cues in 2D images
66
Interposition cue
Def: Interposition occurs when one object
occludes another object, thus indicating that the
occluding object is closer to the viewer than the
occluded object.
67
Interposition
  • T-junctions indicate occlusion: the top is the
    occluding edge while the bar is the occluded edge
  • Bench occludes lamp post
  • leg occludes bench
  • lamp post occludes fence
  • railing occludes trees
  • trees occlude steeple

68
  • Perspective scaling: railing looks smaller at
    the left; bench looks smaller at the right; 2
    steeples are far away
  • Foreshortening: the bench is sharply angled
    relative to the viewpoint; image length is
    affected accordingly

69
Texture gradient reveals surface orientation
(In East Lansing, we call it corn, not maize.)
Note also that the rows appear to converge in 2D.
Texture Gradient: change of image texture along
some direction, often corresponding to a change
in distance or orientation in the 3D world
containing the objects creating the texture.
70
3D Cues from Perspective
71
3D Cues from perspective
72
More 3D cues
Virtual lines
Falsely perceived interposition
73
Irving Rock: The Logic of Perception, 1982
  • Summarized an entire career in visual psychology
  • Concluded that the human visual system acts as a
    problem-solver
  • Triangle unlikely to be accidental: it must be an
    object in front of the background, and must be
    brighter since it's closer

74
More 3D cues
2D alignment usually means 3D alignment
2D image curves create perception of 3D surface
75
Structured light can enhance surfaces in
industrial vision
Potatoes with light stripes
Sculpted object
76
Models of Reflectance
We need to look at models for the physics of
illumination and reflection that will
1. help computer vision algorithms extract
information about the 3D world, and
2. help computer graphics algorithms render
realistic images of model scenes.
Physics-based vision is the subarea of computer
vision that uses physical models to understand
image formation in order to better analyze
real-world images.
77
The Lambertian Model: Diffuse Surface Reflection
A diffuse reflecting surface reflects light
uniformly in all directions, giving uniform
brightness for all viewpoints of a planar surface.
78
Shape (normals) from shading
Clearly intensity encodes shape in this case
Cylinder with white paper and pen stripes
Intensities plotted as a surface
79
Shape (normals) from shading
Plot of intensity of one image row reveals the 3D
shape of these diffusely reflecting objects.
80
Specular reflection is highly directional and
mirrorlike.
R is the ray of reflection; V is the direction
from the surface toward the viewpoint; α is the
shininess parameter.
81
What about models for recognition?
  • recognition: "to know again"
  • How does memory store models of faces, rooms,
    chairs, etc.?

82
Human capability is extensive
  • A child of age 6 might recognize 3,000 words
  • And 30,000 objects
  • A junkyard robot must recognize nearly all
    objects
  • Hundreds of styles of lamps, chairs, tools, ...

83
Some methods recognize
  • Via geometric alignment with CAD models
  • Via trained neural net
  • Via parts of objects and how they join
  • Via the function/behavior of an object

84
Side view classes of Ford Taurus (Chen and
Stockman)
These were made in the PRIP Lab from a scale
model. Viewpoints in between can be generated
from x and y curvature stored on the boundary.
Viewpoints matched to real image boundaries via
optimization.
85
Matching image edges to model limbs
Could recognize car model at stoplight or gate.
86
Object as parts + relations
  • Parts have size, color, shape
  • Connect together at concavities
  • Relations are connect, above, right of, inside
    of, ...

87
Functional models
  • Inspired by J.J. Gibson's Theory of Affordances
  • An object is what an object does
  • container: holds stuff
  • club: hits stuff
  • chair: supports humans

88
Louise Stark's chair model
  • Dozens of CAD models of chairs
  • Program analyzed each for
  • a stable pose
  • a seat of the right size
  • height off the ground of the right size
  • no obstruction to the body on the seat
  • the program would accept a trash can
  • (which could also pass as a container)

89
Minsky's theory of frames (Schank's theory of
scripts)
  • Frames are learned expectations: a frame for a
    room, a car, a party, an argument, ...
  • A frame is evoked by the current situation.
    How? (hard)
  • The human fills in the details of the current
    frame (easier)

90
Summary
  • Images have many low-level features
  • Can detect uniform regions and contrast
  • Can organize regions and boundaries
  • Human vision uses several simultaneous channels:
    color, edge, motion
  • Use of models/knowledge is diverse and difficult
  • The last 2 issues are difficult in computer
    vision