Presentation at Almende - PowerPoint PPT Presentation

1
Presentation at Almende
  • Machine Vision for Scene and Object Recognition
  • Dr. Marco A. Wiering
  • Department of Artificial Intelligence
  • University of Groningen

2
Outline of the Presentation
  • Machine vision and its applications
  • Image features
  • Shape and appearance descriptors
  • Image partitioning schemes
  • Scene recognition
  • Object recognition
  • Discussion

3
Machine Vision
  • Machine vision aims to understand the content
    of an image
  • Usually the input is an image and the machine
    vision system should output a class or category
  • Most machine vision algorithms consist of the
    following two parts:
  • Describe the contents of an image using a visual
    descriptor to create a feature vector
  • Use a machine learning algorithm to learn to map
    feature vectors to class labels

4
Applications
  • Content based image retrieval
  • Automatic camera surveillance
  • Autonomous robotics
  • Forensic research
  • Human-machine interaction
  • Video content analysis

5
Image features
  • There are many different image descriptors, but
    they all rely on the following information
  • Color
  • Texture
  • Edges and shapes

6
Color features: the color histogram
  • The simplest color-based descriptor is the color
    histogram
  • Each pixel is classified as belonging to one of N
    (e.g., N = 64) color categories
  • Then the system counts how many pixels in an
    image belong to each color category
  • After that the histogram is normalized
  • A problem with this approach is that it is not
    very discriminative
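
The steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming an RGB image with 8-bit channels and a uniform quantization of each channel into 4 levels (4 x 4 x 4 = 64 categories); the function name and the quantization scheme are assumptions, not taken from the presentation.

```python
import numpy as np

def color_histogram(image, levels_per_channel=4):
    """Normalized color histogram of an RGB uint8 image of shape (H, W, 3)."""
    # Quantize each channel into `levels_per_channel` levels (4 -> 64 colors).
    q = (image.astype(np.int64) * levels_per_channel) // 256
    # Combine the three quantized channels into one color category per pixel.
    categories = (q[..., 0] * levels_per_channel + q[..., 1]) * levels_per_channel + q[..., 2]
    n_bins = levels_per_channel ** 3
    # Count the pixels per category, then normalize so the bins sum to 1.
    counts = np.bincount(categories.ravel(), minlength=n_bins)
    return counts / counts.sum()

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
hist = color_histogram(image)
```

Because only counts are kept, two images with the same colors in different places get the same histogram, which is one reason the descriptor is not very discriminative.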

7
The color correlogram
  • A more advanced color feature is the color
    correlogram
  • Here the spatial distribution of colors is also
    taken into account
  • The correlogram is a 2D matrix that measures how
    often one color is a distance D apart from
    another color
  • Often D is set to 1
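
A rough sketch of such a matrix is given below. It assumes the pixels have already been mapped to color categories, and it counts only horizontal and vertical neighbors at distance D, which is a simplification; the function name and the row normalization are assumptions.

```python
import numpy as np

def color_correlogram(categories, n_colors, distance=1):
    """Co-occurrence matrix of color categories at a given pixel distance.

    `categories` is a 2D integer array holding one color category per pixel.
    Simplification: only horizontal and vertical neighbors are counted.
    """
    d = distance
    counts = np.zeros((n_colors, n_colors), dtype=np.float64)
    # Pairs (a, b) where b lies d pixels to the right of, or below, a.
    pairs = [
        (categories[:, :-d], categories[:, d:]),
        (categories[:-d, :], categories[d:, :]),
    ]
    for a, b in pairs:
        np.add.at(counts, (a.ravel(), b.ravel()), 1)
        np.add.at(counts, (b.ravel(), a.ravel()), 1)  # count both directions
    # Normalize rows: entry (i, j) estimates how often color j is found
    # at distance d from a pixel of color i.
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

categories = np.array([[0, 0, 1],
                       [0, 1, 1],
                       [2, 2, 2]])
correlogram = color_correlogram(categories, n_colors=3)
```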

8
MPEG-7 color features
  • The MPEG-7 standard has also identified
    some particularly important color features
  • Scalable color: uses a Haar transform on the HSV
    color histogram to create 64 features
  • Color layout: divides the image into 8x8
    non-overlapping blocks and uses the discrete
    cosine transform to create 12 features
  • Color structure: uses a sliding window and the
    Hue-max-min-diff color space to create 64 features

9
Texture
  • There are multiple texture features
  • We have used the MPEG-7 edge histogram
  • An image is divided into 4x4 non-overlapping
    blocks
  • Using an edge detector, one out of six edge types
    is extracted in each block
  • The edge types are horizontal, vertical, 45
    degrees, 135 degrees, non-directional, no-edge
  • Finally, an 80-bin histogram is constructed
    (16 blocks x 5 edge types); the no-edge type is
    left out
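
The following is a simplified sketch of this idea, not the MPEG-7 reference procedure. It assumes 2x2 pixel cells inside each of the 16 blocks, the 2x2 filter coefficients commonly quoted for the edge histogram descriptor, and an arbitrary threshold of 11; all three are assumptions made for illustration.

```python
import numpy as np

S = np.sqrt(2.0)
# One 2x2 filter per edge type (coefficients are an assumption, see lead-in).
EDGE_FILTERS = np.array([
    [[1, 1], [-1, -1]],    # horizontal edge
    [[1, -1], [1, -1]],    # vertical edge
    [[S, 0], [0, -S]],     # 45-degree edge
    [[0, S], [-S, 0]],     # 135-degree edge
    [[2, -2], [-2, 2]],    # non-directional edge
])

def edge_histogram(gray, threshold=11.0):
    """80-bin edge histogram (4x4 blocks x 5 edge types) of a 2D grey image."""
    h, w = gray.shape
    hist = np.zeros((4, 4, 5))
    bh, bw = h // 4, w // 4
    for by in range(4):
        for bx in range(4):
            block = gray[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw].astype(float)
            # Cut the block into 2x2 pixel cells: shape (rows, cols, 2, 2).
            rows, cols = bh // 2, bw // 2
            cells = block[:rows * 2, :cols * 2].reshape(rows, 2, cols, 2).swapaxes(1, 2)
            # Absolute response of every cell to every filter: (rows, cols, 5).
            responses = np.abs(np.einsum("rcij,kij->rck", cells, EDGE_FILTERS))
            strongest = responses.argmax(axis=2)
            # Cells whose best response stays below the threshold are "no-edge".
            is_edge = responses.max(axis=2) > threshold
            hist[by, bx] = np.bincount(strongest[is_edge], minlength=5)
            hist[by, bx] /= max(rows * cols, 1)  # fraction of cells per edge type
    return hist.ravel()

gray = np.zeros((32, 32))
gray[:, 17:] = 255.0  # a vertical step edge
hist = edge_histogram(gray)
```

In this example only the vertical-edge bins of the blocks containing the step are non-zero; all uniform cells fall into the unused no-edge type.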

10
Edge and shape descriptors
  • Nowadays, the most successful image descriptors
    extract information about edges and shapes
  • The best known ones are SIFT and histograms of
    oriented gradients (HOG)
  • They summarize the distribution of gradient
    orientations of local pixels in a histogram
  • We will first explain how orientations are
    computed

11
SIFT and HOG
  • Gradient magnitude
  • Gradient orientation
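
The formulas for these two quantities are not part of the transcript. A common definition, assuming the central-difference form used in the SIFT literature, is given below; the symbol L for the (smoothed) grey-level image is an assumption and not taken from the slide.

```latex
m(x, y) = \sqrt{\bigl(L(x+1, y) - L(x-1, y)\bigr)^2 + \bigl(L(x, y+1) - L(x, y-1)\bigr)^2}

\theta(x, y) = \operatorname{atan2}\bigl(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\bigr)
```

Here m is the gradient magnitude and \theta the gradient orientation at pixel (x, y).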

12
Orientation histogram
  • In a region, the orientations of all pixels with a
    magnitude larger than a threshold are put in a
    histogram with 8 bins
  • The image is split into 4x4 blocks. This results
    in 16 x 8 = 128 features
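
A minimal sketch of this 128-dimensional descriptor follows. It assumes gradients computed with `np.gradient`, orientations over the full 360 degrees, and plain counting of pixels above the threshold as described on the slide; full SIFT adds steps such as magnitude weighting and smoothing that are left out here.

```python
import numpy as np

def orientation_descriptor(gray, n_blocks=4, n_bins=8, threshold=0.0):
    """Concatenate one 8-bin orientation histogram per block (4x4 blocks -> 128)."""
    gray = gray.astype(float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dy, dx = np.gradient(gray)
    magnitude = np.hypot(dx, dy)
    orientation = np.arctan2(dy, dx) % (2 * np.pi)  # range [0, 2*pi)
    bins = np.minimum((orientation / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    h, w = gray.shape
    bh, bw = h // n_blocks, w // n_blocks
    features = []
    for by in range(n_blocks):
        for bx in range(n_blocks):
            rows = slice(by * bh, (by + 1) * bh)
            cols = slice(bx * bw, (bx + 1) * bw)
            # Only pixels with a large enough gradient magnitude are counted.
            keep = magnitude[rows, cols] > threshold
            features.append(np.bincount(bins[rows, cols][keep], minlength=n_bins))
    return np.concatenate(features).astype(float)

gray = np.tile(np.arange(32.0), (32, 1))  # intensity increases from left to right
descriptor = orientation_descriptor(gray, threshold=0.5)
```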

13
Shape and Appearance descriptors
  • If we have a histogram of orientations, we have
    some kind of representation of the shape
  • If we cluster the features that describe
    different blocks in different images, we create a
    visual code-book
  • An appearance descriptor computes the bag of
    visual keywords for an image
  • This is a histogram that counts how often each
    cluster centroid is present in an image
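
The code-book idea can be sketched as follows. This assumes plain k-means as the clustering method and random vectors in place of real block descriptors; the function names, the number of clusters, and the data are hypothetical.

```python
import numpy as np

def assign_words(descriptors, centroids):
    """Index of the nearest centroid (visual keyword) for every descriptor."""
    distances = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def build_codebook(descriptors, n_words, n_iter=20, seed=0):
    """Cluster block descriptors with k-means; the centroids form the code-book."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), n_words, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centroids)
        for k in range(n_words):
            members = descriptors[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids

def bag_of_words(descriptors, centroids):
    """Normalized histogram of visual keywords for the blocks of one image."""
    counts = np.bincount(assign_words(descriptors, centroids), minlength=len(centroids))
    return counts / counts.sum()

rng = np.random.default_rng(1)
training_blocks = rng.random((200, 8))  # block descriptors pooled over many images
codebook = build_codebook(training_blocks, n_words=10)
image_blocks = rng.random((16, 8))      # the 4x4 blocks of one image
histogram = bag_of_words(image_blocks, codebook)
```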

14
Partitioning an image
  • In the past usually a global representation of an
    image was made
  • Now there is more focus on local partitioning
    schemes such as fixed partitioning

15
Spatial pyramids
  • One of the most recent image partitioning schemes
    combines different resolution levels
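
One way to combine resolution levels is sketched below. It assumes that level l splits the image into 2^l x 2^l cells and that the per-cell histograms are simply concatenated without level weights; the slide does not give these details, so treat them as assumptions.

```python
import numpy as np

def spatial_pyramid(labels, n_labels, n_levels=3):
    """Concatenate per-cell label histograms over several resolution levels.

    `labels` is a 2D integer array, for example one color category or
    visual keyword per pixel. Level l uses a grid of 2**l x 2**l cells.
    """
    h, w = labels.shape
    parts = []
    for level in range(n_levels):
        cells = 2 ** level
        ys = np.linspace(0, h, cells + 1).astype(int)
        xs = np.linspace(0, w, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                cell = labels[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                counts = np.bincount(cell.ravel(), minlength=n_labels)
                parts.append(counts / max(cell.size, 1))  # normalize per cell
    return np.concatenate(parts)

rng = np.random.default_rng(0)
labels = rng.integers(0, 8, size=(32, 32))
features = spatial_pyramid(labels, n_labels=8)
```

With 3 levels there are 1 + 4 + 16 = 21 cells, so 8 labels give a vector of 168 features.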

16
Scene Recognition with CIREC
  • We have introduced the cluster-correlogram model,
    and named our system CIREC

17
Descriptors and Learning Algorithms
  • We used the four different MPEG-7 descriptors
    described before
  • Each descriptor is used to create a separate
    cluster-correlogram
  • We used k-nearest neighbors and support vector
    machines as machine learning algorithms
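
As an illustration of the simpler of the two learners, here is a minimal k-nearest-neighbors sketch. It assumes Euclidean distance and majority voting; the toy feature vectors stand in for real descriptors and are hypothetical.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority class among the k nearest training vectors."""
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return np.bincount(train_y[nearest]).argmax()

train_x = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
train_y = np.array([0, 0, 0, 1, 1, 1])
prediction = knn_predict(train_x, train_y, np.array([0.75, 0.25]))
```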

18
Corel Database
  • The Corel database consists of 10 scene
    categories: Africans, beaches, buildings, buses,
    dinosaurs, elephants, roses, horses, mountains,
    foods

19
Retrieval results

20
Overall retrieval results of CIREC
  • We measured how often a retrieved result belongs
    to the category of the query
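
This measure can be sketched as the precision over the top-n retrieved images. The sketch assumes that every image is used once as a query and that images are ranked by Euclidean distance between feature vectors; both are assumptions, and the toy data is hypothetical.

```python
import numpy as np

def retrieval_precision(features, labels, n_retrieved=10):
    """Average fraction of the top-n retrieved images in the query's category."""
    # Pairwise Euclidean distances between all feature vectors.
    distances = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    np.fill_diagonal(distances, np.inf)  # a query must not retrieve itself
    top = np.argsort(distances, axis=1)[:, :n_retrieved]
    hits = labels[top] == labels[:, None]
    return hits.mean()

features = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
precision = retrieval_precision(features, labels, n_retrieved=2)
```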

21
Results of other approaches
  • The MPEG-7 features score 67% correct with 10
    retrieved images
  • The color correlogram scores 74% correct
  • CIREC achieves the best results, and we also used
    it for categorization: given an image, predict to
    which category it belongs
  • CIREC with SVMs scores 93% correct!

22
Some misclassifications of CIREC
  • CIREC classifies the top row as beaches and the
    bottom row as mountains. This is wrong.

23
Object Recognition
  • We also introduced the stacking support vector
    machine with spatial pyramids
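
The general structure of stacking can be sketched as follows. This is not the authors' system: a simple nearest-class-mean scorer stands in for the SVMs, the second level is trained on the same data as the first, and the two descriptors are random toy data. All names are hypothetical.

```python
import numpy as np

class CentroidScorer:
    """Stand-in for an SVM: scores a vector by negative distance to each class mean."""

    def fit(self, x, y):
        self.classes = np.unique(y)
        self.means = np.stack([x[y == c].mean(axis=0) for c in self.classes])
        return self

    def scores(self, x):
        return -np.linalg.norm(x[:, None, :] - self.means[None, :, :], axis=2)

    def predict(self, x):
        return self.classes[self.scores(x).argmax(axis=1)]

def fit_stack(feature_sets, y):
    """First level: one classifier per descriptor. Second level: one classifier
    trained on the concatenated class scores of the first level."""
    first = [CentroidScorer().fit(x, y) for x in feature_sets]
    stacked = np.hstack([clf.scores(x) for clf, x in zip(first, feature_sets)])
    second = CentroidScorer().fit(stacked, y)
    return first, second

def predict_stack(first, second, feature_sets):
    stacked = np.hstack([clf.scores(x) for clf, x in zip(first, feature_sets)])
    return second.predict(stacked)

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
# Two hypothetical descriptors of different length for the same 40 images.
desc_a = rng.normal(size=(40, 5)) + y[:, None] * 3.0
desc_b = rng.normal(size=(40, 12)) + y[:, None] * 3.0
first, second = fit_stack([desc_a, desc_b], y)
predictions = predict_stack(first, second, [desc_a, desc_b])
```

The point of the design is that each descriptor gets its own classifier, so the second level only has to combine a few class scores instead of one very long concatenated feature vector.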

24
Features used
  • We used 10 different features
  • SIFT and HOG on 360 and 180 degrees
  • Edge histogram
  • Each of these 5 features is computed on
    intensity (grey) and on HSV color channels
  • We used 3 spatial resolution levels
  • We tested our approach on 20 object classes from
    Caltech-101

25
Caltech dataset

26
Results
  • Our system classified 83% of the images correctly
  • The naive approach that concatenated all features
    in one large input vector scored 75% correct
  • We have not used appearance based descriptors,
    which could improve the system

27
Discussion
  • Machine vision is an interesting research area
  • A lot of improvements have been made during the
    last decade
  • There are many possible applications
  • It is also a lot of fun!

28
Thank you for your attention