Title: 3D Scene Models


1
3D Scene Models
6.870 Object recognition and scene understanding
Krista Ehinger
2
Questions
  • What makes a good 3D scene model? How accurate
    does it need to be?
  • How far can you get with automatic surface
    detection? Where do you need human input?

3
Modelling the scene
  • Real scenes have way too many surfaces

4
Modelling the scene
  • Option 1: Diorama world

5
Tour Into the Picture (TIP)
  • Model the scene as 5 planes plus foreground objects
  • Easy implementation: planes/objects defined by
    humans

Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the
Picture: Using a spidery mesh interface to
make animation from a single image". ACM SIGGRAPH
1997
6
TIP Implementation
  • User defines vanishing point, rear wall of the
    scene (inner rectangle)
  • Given some assumptions about the camera,
    position/size of all planes can be computed...
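
One way to picture the user's input: rays from the vanishing point through the four corners of the inner rectangle split the image into five regions (rear wall, floor, ceiling, left wall, right wall). The sketch below is an illustration only, not code from Horry et al. The name `tip_region`, the image convention (y grows downward), and the requirement that the vanishing point lies strictly inside the rectangle are assumptions made here.

```python
import math

def tip_region(p, vp, rect):
    """Classify pixel p into one of the five TIP regions.

    p, vp: (x, y) pixel coordinates, with y growing downward.
    rect:  (x0, y0, x1, y1) inner rectangle; vp is assumed strictly inside it.
    """
    px, py = p
    vx, vy = vp
    x0, y0, x1, y1 = rect
    if x0 <= px <= x1 and y0 <= py <= y1:
        return "rear wall"
    # Follow the ray from the vanishing point through p and find which
    # side of the inner rectangle it crosses first.
    dx, dy = px - vx, py - vy
    tx = ((x1 if dx > 0 else x0) - vx) / dx if dx != 0 else math.inf
    ty = ((y1 if dy > 0 else y0) - vy) / dy if dy != 0 else math.inf
    if tx < ty:
        return "right wall" if dx > 0 else "left wall"
    return "floor" if dy > 0 else "ceiling"
```

For example, with `rect = (40, 30, 60, 50)` and `vp = (50, 40)`, a pixel directly below the rectangle falls on the floor and a pixel far to the left falls on the left wall.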

7
Defining the box
  • Define planes: floor at y = 0, ceiling at y = H
  • Given horizon (vanishing point), corners of
    floor and ceiling can be computed from 2D image
    position
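
A minimal sketch of that computation, under assumptions not stated on the slide: a pinhole camera looking horizontally, at height `cam_h` above the floor, with focal length `f` in pixels, principal column `u0`, and image rows growing downward so that the horizon sits at row `horizon_v`. The function names are hypothetical.

```python
def floor_point(u, v, horizon_v, u0, f, cam_h):
    """Back-project pixel (u, v) onto the floor plane y = 0.

    Returns (X, 0, Z) in a frame whose origin is on the floor directly
    below the camera, with Z pointing into the scene.
    """
    if v <= horizon_v:
        raise ValueError("floor pixels must lie below the horizon")
    z = f * cam_h / (v - horizon_v)   # similar triangles on the viewing ray
    x = (u - u0) * z / f
    return (x, 0.0, z)

def ceiling_height(v_top, v_bottom, horizon_v, cam_h):
    """Height H of the ceiling, from one vertical edge of the rear wall.

    v_bottom and v_top are the image rows where that edge meets the
    floor and the ceiling; both ends are at the same depth.
    """
    return cam_h * (1.0 + (horizon_v - v_top) / (v_bottom - horizon_v))
```

With `f = 500`, `cam_h = 1.5`, and the horizon at row 240, a floor pixel at row 490 lies 3 units in front of the camera, and a rear-wall edge running from row 490 up to row 40 gives a ceiling height of about 2.7.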

8
Defining the box
  • Once the positions of the planes are known,
    compute the texture of the planes
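
Computing a plane's texture amounts to a perspective warp: each texel of the rectified plane is looked up in the image quadrilateral that the plane occupies. The sketch below uses the standard closed-form unit-square-to-quadrilateral projective mapping with nearest-neighbor sampling. It is an illustration under assumptions (function names, image as a nested list, non-degenerate quad), not necessarily how TIP implements it.

```python
def square_to_quad(quad):
    """Return a function mapping (u, v) in the unit square to the image quad.

    quad lists the image corners for (0,0), (1,0), (1,1), (0,1), in that order.
    The quad is assumed non-degenerate (no zero determinant).
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    dx1, dx2, sx = x1 - x2, x3 - x2, x0 - x1 + x2 - x3
    dy1, dy2, sy = y1 - y2, y3 - y2, y0 - y1 + y2 - y3
    det = dx1 * dy2 - dy1 * dx2
    g = (sx * dy2 - sy * dx2) / det
    h = (dx1 * sy - dy1 * sx) / det
    a, b, c = x1 - x0 + g * x1, x3 - x0 + h * x3, x0
    d, e, f = y1 - y0 + g * y1, y3 - y0 + h * y3, y0

    def warp(u, v):
        w = g * u + h * v + 1.0          # projective divide
        return (a * u + b * v + c) / w, (d * u + e * v + f) / w
    return warp

def extract_texture(image, quad, out_w, out_h):
    """Sample an out_w x out_h texture from image (list of rows) inside quad."""
    warp = square_to_quad(quad)
    rows, cols = len(image), len(image[0])
    texture = []
    for j in range(out_h):
        v = j / max(out_h - 1, 1)
        row = []
        for i in range(out_w):
            u = i / max(out_w - 1, 1)
            x, y = warp(u, v)
            xi = min(max(round(x), 0), cols - 1)   # nearest pixel, clamped
            yi = min(max(round(y), 0), rows - 1)
            row.append(image[yi][xi])
        texture.append(row)
    return texture
```

For a trapezoid such as `[(0, 0), (10, 0), (8, 6), (2, 6)]`, the center of the texture maps to image point (5, 3.75) rather than (5, 3), which is the foreshortening one expects from a receding plane.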

9
What about foreground objects?
  • Assume a quadrangle attached to floor, compute
    attachment points, upper points
  • Hierarchical model of foreground objects

10
Extracting foreground objects
  • Foreground objects removed, added to mask
  • Holes in background filled in using photo
    completion software

11
TIP Demonstration
12
TIP Discussion
  • Pros
      • Accurate model (due to human input)
      • Deals with foreground objects, occlusions
  • Cons
      • Requires human input, not automatic
      • Model too simple for many real-world scenes

13
Modelling the scene
  • Option 2: Pop-up book world

14
Automatic Photo Pop-Up
  • Three classes of surface: ground, sky, vertical
  • Not just a box: can model more kinds of scenes
  • Automatic classification, no labeling

D. Hoiem, A.A. Efros, and M. Hebert, "Automatic
Photo Pop-up", ACM SIGGRAPH 2005.
15
Photo Pop-Up Implementation
  • Pixels → superpixels → constellations
  • Automatic labeling of constellations as ground,
    vertical, or sky
  • Define angles of vertical planes (using
    attachment to ground)
  • Map textures to vertical planes (as in TIP)

16
Superpixels, constellations
  • Superpixels are neighboring pixels that have
    nearly the same color (Tao et al., 2001)
  • Superpixels assigned to constellations according
    to how likely they are to share a label (ground,
    vertical, sky) based on difference between
    feature vectors
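
The grouping step can be sketched as a greedy assignment: seed each constellation with one superpixel, then add every remaining superpixel to the constellation it is most likely to share a label with. This is an illustrative sketch, not the authors' code. `same_label_prob` stands for a hypothetical precomputed matrix of pairwise same-label probabilities (all strictly positive), and the visiting `order` is passed in explicitly so the result is reproducible.

```python
import math

def form_constellations(same_label_prob, k, order):
    """Greedily group superpixels into k constellations.

    same_label_prob[s][t]: estimated probability that superpixels s and t
    share a label. order: sequence of superpixel indices; the first k
    entries seed the k constellations.
    """
    groups = [[s] for s in order[:k]]
    for s in order[k:]:
        def affinity(group):
            # Average log-likelihood that s shares a label with the group.
            return sum(math.log(same_label_prob[s][t]) for t in group) / len(group)
        max(groups, key=affinity).append(s)
    return groups
```

Running this with several values of `k` and several random orders would give multiple overlapping groupings of the same image, so a poor grouping in one run need not decide the final labels.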

17
Feature vectors
  • Color features: RGB, hue, saturation
  • Texture features: difference of oriented
    Gaussians, textons
  • Location (absolute and percentile)
  • Number of superpixels in constellation
  • Line and intersection detectors
  • Not used: constellation shape (contiguous, number
    of sides), some texture features
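
As a rough illustration of the color and location entries, the sketch below builds a small feature dictionary for one superpixel. The texture, line, and intersection features are omitted, and the function name, input layout, and the choice of the 10th and 90th percentiles are assumptions made for this example.

```python
import colorsys

def superpixel_features(pixels, width, height):
    """Color and location features for one superpixel.

    pixels: list of (x, y, (r, g, b)) with r, g, b in [0, 1].
    width, height: image size, used to normalize positions.
    """
    n = len(pixels)
    mean_rgb = tuple(sum(p[2][c] for p in pixels) / n for c in range(3))
    hue, sat, _ = colorsys.rgb_to_hsv(*mean_rgb)
    xs = sorted(p[0] / width for p in pixels)
    ys = sorted(p[1] / height for p in pixels)

    def percentile(values, q):
        # Simple lower-index percentile of an already sorted list.
        return values[int(q * (len(values) - 1))]

    return {
        "mean_rgb": mean_rgb,
        "hue": hue,
        "saturation": sat,
        "mean_x": sum(xs) / n,
        "mean_y": sum(ys) / n,
        "y_10th": percentile(ys, 0.1),
        "y_90th": percentile(ys, 0.9),
        "num_pixels": n,
    }
```

Vertical position is a strong cue in this setting: sky tends to sit high in the image and ground low, which is why both absolute and percentile locations are worth including.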

18
Training process
  • For each of 82 labeled training images:
      • Compute superpixels, features, pairwise
        likelihoods
      • Form a set of N constellations (N = 3 to 25),
        each labeled with ground truth
      • Compute constellation features
      • Compute constellation label, homogeneity
        likelihood

19
Training process
  • AdaBoost weak classifiers learn to estimate
    whether superpixels have the same label (based on
    feature vector)
  • Another set of AdaBoost weak classifiers learns
    constellation label, homogeneity likelihood
    (expressed as percent ground, vertical, sky,
    mixed)
  • Emphasis on classifying larger constellations
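
For readers unfamiliar with boosting, here is a minimal sketch of plain discrete AdaBoost with one-feature threshold stumps as the weak classifiers. It is a stand-in to show the idea; the weak learners and boosting variant used in the paper may differ, and all names below are hypothetical. Labels are +1 (for example "same label") and -1.

```python
import math

def train_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost(X, y, rounds=5):
    """Return a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score > 0 else -1
```

The reweighting step is what makes later weak classifiers concentrate on the examples the earlier ones got wrong.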

20
Building the 3D model
  • Along vertical/ground boundary, fit line segments
    (Hough transform); goal is to find simplest
    shape (fewest lines)
  • Project lines up from corners of boundary lines,
    cut and fold
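
The line-fitting step can be illustrated with a bare-bones Hough transform: every boundary pixel votes for all lines rho = x cos(theta) + y sin(theta) that pass through it, and the most-voted (theta, rho) bin is the dominant line. This sketch finds only the single strongest line, with 1-degree angle bins; it does not split the boundary into segments or search for the fewest lines, and the function name is an assumption.

```python
import math
from collections import Counter

def strongest_line(points, rho_step=1.0):
    """Return (theta_degrees, rho, votes) of the most-voted line.

    points: iterable of (x, y) boundary pixel coordinates.
    """
    votes = Counter()
    for x, y in points:
        for deg in range(180):
            theta = math.radians(deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(deg, round(rho / rho_step))] += 1
    (deg, rho_bin), count = votes.most_common(1)[0]
    return deg, rho_bin * rho_step, count
```

To fit several segments, one would typically remove the points explained by the strongest line and repeat, stopping once the remaining boundary is covered.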

21
Photo Pop-Up Demonstration
22
Photo Pop-Up Discussion
  • Pros
      • Automatic
      • Can handle a variety of scenes, not just boxes
  • Cons
      • No handling of foreground objects
      • Misclassification leads to very strange models
      • Only 2 kinds of surface: ground, vertical

23
Modelling the scene
  • Option 3: Actually try to model surface angles

24
3D Scene Structure from Still Image
  • Compute surface normal for each surface
  • No right-angle assumptions: surfaces can have any
    angle
  • Automatic (trained on images with known depth
    maps)
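
Once a surface's normal and its distance from the camera are known, the depth of every pixel on that surface follows from intersecting the pixel's viewing ray with the plane. The sketch below shows only this geometry, not the learned model. The plane is written as the set of points q with n·q = dist, with the camera at the origin and n a unit vector oriented so that dist > 0; the function names are hypothetical.

```python
def depth_along_ray(normal, dist, ray):
    """Distance along a unit viewing ray to the plane n.q = dist."""
    denom = sum(n * r for n, r in zip(normal, ray))
    if denom <= 0:
        raise ValueError("ray is parallel to the plane or points away from it")
    return dist / denom

def point_on_plane(normal, dist, ray):
    """3D point where the viewing ray meets the plane."""
    t = depth_along_ray(normal, dist, ray)
    return tuple(t * r for r in ray)
```

For a wall facing the camera 5 units away (`normal = (0, 0, 1)`, `dist = 5`), the central ray hits it at depth 5, while an oblique ray such as `(0.6, 0, 0.8)` travels about 6.25 units before reaching it.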

25
3D Scene Implementation
  • Segment image into superpixels
  • Estimate surface normal of each superpixel (using
    Markov Random Field model)
  • Optional: detect and extract foreground objects
  • Map textures to planes

Figure panels: original image; modeled depth map
A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene
Structure from a Single Still Image". In ICCV
workshop on 3D Representation for Recognition
(3dRR-07), 2007
26
Image features
  • Superpixel features (x_i)
      • Color and texture features as in Photo Pop-Up
      • Vector also includes features of neighboring
        superpixels
  • Boundary features (x_ij)
      • Color difference, texture difference, edge
        detector

27
Markov Random Field Model
  • First term: model planes in terms of image
    features of superpixels
  • Second term: model planes in terms of pairs of
    superpixels, with constraints...
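
As a rough guide, a pairwise Markov random field with these two kinds of term has the general shape below. The symbols are assumptions made here: α_i denotes the plane parameters of superpixel i, Z is a normalizing constant, f_1 and f_2 are the unary and pairwise factors, and x_i and x_ij are the superpixel and boundary features. The exact form used by Saxena et al. may differ.

```latex
P(\alpha \mid X) \;=\; \frac{1}{Z}
  \prod_{i} f_1(\alpha_i \mid x_i)
  \prod_{i,j} f_2(\alpha_i, \alpha_j \mid x_{ij})
```

The first product scores each superpixel's plane against its own image features; the second product runs over neighboring pairs and is where the structural constraints enter.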

28
Model constraints
  • Connected structure: except where there is an
    occlusion, neighboring superpixels are likely to
    be connected
  • Coplanar structure: except where there are folds,
    neighboring superpixels are likely to lie on the
    same plane
  • Co-linearity: long straight lines in the image
    correspond to straight lines in 3D
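
The first two constraints can be made concrete by comparing the depths that two neighboring planes assign along the same viewing rays. The sketch below uses hard thresholds purely for illustration; in a probabilistic model these would be soft penalties, and the names, tolerance, and plane convention (points q with n·q = dist, camera at the origin) are assumptions made here.

```python
import math

def ray_depth(plane, ray):
    """Depth along a unit ray to plane = (normal, dist), where n.q = dist."""
    normal, dist = plane
    return dist / sum(n * r for n, r in zip(normal, ray))

def depth_gap(plane_a, plane_b, ray):
    """How far apart the two planes are along one viewing ray."""
    return abs(ray_depth(plane_a, ray) - ray_depth(plane_b, ray))

def looks_connected(plane_a, plane_b, boundary_ray, tol=1e-6):
    # Connected: the planes meet at the shared boundary.
    return depth_gap(plane_a, plane_b, boundary_ray) < tol

def looks_coplanar(plane_a, plane_b, rays, tol=1e-6):
    # Coplanar: the planes agree on the boundary and away from it.
    return all(depth_gap(plane_a, plane_b, r) < tol for r in rays)

# A frontal wall and a slanted wall that meet along the central ray.
s = math.sqrt(0.5)
wall = ((0.0, 0.0, 1.0), 5.0)
slanted = ((s, 0.0, s), 5.0 * s)
```

Here `wall` and `slanted` agree at the boundary ray `(0, 0, 1)`, so they are connected, but they disagree along `(0.6, 0, 0.8)`, so they form a fold rather than a single plane.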

29
Foreground objects
  • Automatically detected foreground objects may be
    removed from the model (for example, pedestrians,
    using the Dalal-Triggs detector)
  • Detected objects add 3D cues (pedestrians are
    basically vertical, occlude other surfaces)

30
3D Scene Demonstration
31
Results
32
3D Scene Discussion
  • Pros
      • Handles a variety of scene types
      • Fairly accurate (about 2/3 of scenes correct)
      • Automatic
      • Handles foreground objects
  • Cons
      • Still fails on 1/3 of scenes

33
Discussion
  • Simple 3D models are adequate for many scenes
  • You can get pretty far without human input (but
    results would still be better with human
    annotation of scenes)
  • Extensions?
      • Use photo completion techniques to handle
        occlusions?
      • Massive training sets → better 3D models?