Presentation at Almende

1
Presentation at Almende

Machine Vision for Scene
and Object Recognition

Dr. Marco A. Wiering
Department of Artificial Intelligence
University of Groningen

2
Outline of the Presentation

Machine vision and its applications
Image features
Shape and appearance descriptors
Image partitioning schemes
Scene recognition
Object recognition
Discussion

3
Machine Vision

Machine vision aims to understand the
content of an image
Usually the input is an image and the machine
vision system should output a class or category
Most machine vision algorithms consist of the
following two parts:
Describe the contents of an image using a visual
descriptor to create a feature vector
Use a machine learning algorithm to learn to map
feature vectors to class labels

4
Applications

Content based image retrieval
Automatic camera surveillance
Autonomous robotics
Forensic research
Human-machine interaction
Video content analysis

5
Image features

There are many different image descriptors, but
they all rely on the following information:
Color
Texture
Edges and shapes

6
Color features: the color histogram

The simplest color-based descriptor is the color
histogram
Each pixel is classified as belonging to one of N
(e.g., N = 64) color categories
Then the system counts how many pixels in an
image belong to each color category
After that the histogram is normalized
A problem with this approach is that it is not
very discriminative, because it ignores where in
the image each color occurs
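
The counting and normalization steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes the N = 64 categories come from quantizing each RGB channel uniformly into 4 levels (4 x 4 x 4 = 64), which the slide does not specify. The function name color_histogram is hypothetical.

```python
def color_histogram(pixels, levels=4):
    """Normalized color histogram with levels**3 bins.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Assumption: uniform quantization of each RGB channel.
    """
    n_bins = levels ** 3
    step = 256 // levels          # width of one quantization level
    counts = [0] * n_bins
    total = 0
    for r, g, b in pixels:
        # Map the three quantized channels to a single bin index.
        idx = (r // step) * levels * levels + (g // step) * levels + (b // step)
        counts[idx] += 1
        total += 1
    if total == 0:
        return [0.0] * n_bins
    # Normalize so the histogram sums to 1 regardless of image size.
    return [c / total for c in counts]


pixels = [(0, 0, 0), (0, 0, 10), (255, 255, 255), (255, 255, 255)]
hist = color_histogram(pixels)
```

Two images with the same colors in different places produce the same histogram, which is the weakness the slide points out.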

7
The color correlogram

A more advanced color feature is the color
correlogram
Here the spatial distribution of colors is also
taken into account
The correlogram is a 2D matrix that measures how
often one color is a distance D apart from
another color
Often D is set to 1
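
A rough sketch of such a matrix is given below. It assumes the image has already been quantized to color indices, that "distance D" means chessboard distance (for D = 1, the 8 surrounding pixels), and that each row is normalized to a conditional probability. These choices are assumptions for illustration; the slide does not fix them. The name color_correlogram is hypothetical.

```python
def color_correlogram(img, n_colors, d=1):
    """Row-normalized color correlogram.

    img: 2D list of color indices in 0..n_colors-1.
    Entry [i][j] estimates how often color j is found at distance d
    from a pixel of color i (assumption: chessboard distance).
    """
    h = len(img)
    w = len(img[0]) if h else 0
    counts = [[0] * n_colors for _ in range(n_colors)]
    # All offsets whose chessboard distance is exactly d.
    offsets = [(dy, dx)
               for dy in range(-d, d + 1)
               for dx in range(-d, d + 1)
               if max(abs(dy), abs(dx)) == d]
    for y in range(h):
        for x in range(w):
            ci = img[y][x]
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    counts[ci][img[ny][nx]] += 1
    result = []
    for row in counts:
        s = sum(row)
        result.append([c / s if s else 0.0 for c in row])
    return result


img = [[0, 0],
       [1, 1]]
corr = color_correlogram(img, n_colors=2, d=1)
```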

8
MPEG-7 color features

The MPEG-7 standard also specifies several
important color features:
Scalable color: uses a Haar transform on an HSV
color histogram to create 64 features
Color layout: divides the image into 8x8
non-overlapping blocks and uses a discrete cosine
transform to create 12 features
Color structure: uses a sliding window and the
Hue-max-min-diff (HMMD) color space to create 64
features

9
Texture

There are multiple texture features
We have used the MPEG-7 edge histogram
An image is divided into 4x4 non-overlapping
blocks
Using an edge detector, one out of six edge types
is extracted in each block
The edge types are horizontal, vertical, 45
degrees, 135 degrees, non-directional, and no-edge
Finally an 80-bin histogram is constructed by not
using the no-edge type (16 blocks x 5 edge types)

10
Edge and shape descriptors

Nowadays, the most successful image descriptors
extract information about edges and shapes
The best known ones are SIFT and histograms of
oriented gradients (HOGs)
They collect the distribution of gradient
orientations over local pixels in a histogram
We will first explain how orientations are
computed

11
SIFT and HOG

Gradient magnitude
Gradient orientation
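
The formulas on this slide did not survive in the transcript. A common definition, given here as an assumption rather than as the presenter's exact formulation, takes horizontal and vertical intensity differences gx and gy, and uses magnitude = sqrt(gx^2 + gy^2) and orientation = atan2(gy, gx). The sketch below uses central differences; the function name gradients is hypothetical.

```python
import math


def gradients(gray):
    """Per-pixel gradient magnitude and orientation (degrees, 0..360).

    gray: 2D list of intensities. Border pixels are left at zero.
    Assumption: central differences; the y axis points down, so
    90 degrees means intensity increases toward the bottom.
    """
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag[y][x] = math.hypot(gx, gy)
            ori[y][x] = math.degrees(math.atan2(gy, gx)) % 360.0
    return mag, ori


horizontal_ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
vertical_ramp = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
mag_h, ori_h = gradients(horizontal_ramp)
mag_v, ori_v = gradients(vertical_ramp)
```

Taking the orientation modulo 180 instead of 360 ignores the sign of the gradient, which is presumably what "360 and 180 degrees" refers to later in the talk.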

12
Orientation histogram

In a region, the orientations of all pixels with a
gradient magnitude larger than a threshold are put
in a histogram with 8 bins
The image is split into 4x4 blocks. This results in
16 x 8 = 128 features
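
The 4x4 blocks with 8 bins each can be sketched as below. The code takes precomputed magnitude and orientation maps as input and simply counts pixels per bin, as the slide describes. Real SIFT and HOG implementations add weighting and normalization steps that are omitted here. The name orientation_descriptor and its parameters are hypothetical.

```python
def orientation_descriptor(mag, ori, grid=4, bins=8, threshold=0.0):
    """Concatenate one orientation histogram per block.

    mag, ori: 2D lists of gradient magnitude and orientation (degrees,
    0..360). Returns grid * grid * bins counts (4 * 4 * 8 = 128).
    """
    h, w = len(mag), len(mag[0])
    features = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * bins
            for y in range(by * h // grid, (by + 1) * h // grid):
                for x in range(bx * w // grid, (bx + 1) * w // grid):
                    # Only pixels with a strong enough gradient vote.
                    if mag[y][x] > threshold:
                        b = int(ori[y][x] / 360.0 * bins) % bins
                        hist[b] += 1
            features.extend(hist)
    return features


mag = [[1.0] * 8 for _ in range(8)]
ori = [[90.0] * 8 for _ in range(8)]
features = orientation_descriptor(mag, ori)
suppressed = orientation_descriptor(mag, ori, threshold=2.0)
```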

13
Shape and Appearance descriptors

If we have a histogram of orientations, we have
some kind of representation of the shape
If we cluster the features that describe
different blocks in different images, we create a
visual code-book
An appearance descriptor computes the bag of
visual keywords for an image
This is a histogram counting how often each
cluster centroid is present in an image
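
As an illustration of the bag of visual keywords, the sketch below assigns each block descriptor to its nearest codebook entry and counts the assignments. It assumes the codebook (the cluster centroids) has already been learned, for example with k-means, and uses squared Euclidean distance; neither detail is stated on the slide. The name bag_of_visual_words is hypothetical.

```python
def bag_of_visual_words(descriptors, codebook):
    """Normalized histogram of nearest-centroid assignments.

    descriptors: list of feature vectors from the blocks of one image.
    codebook: list of cluster centroids (assumed precomputed).
    """
    hist = [0] * len(codebook)
    for d in descriptors:
        # Index of the centroid closest to this descriptor.
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[best] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else [0.0] * len(codebook)


codebook = [[0, 0], [10, 10]]
descriptors = [[1, 0], [9, 9], [10, 11], [11, 10]]
bow = bag_of_visual_words(descriptors, codebook)
```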

14
Partitioning an image

In the past, an image was usually described by a
single global representation
Now there is more focus on local partitioning
schemes such as fixed partitioning

15
Spatial pyramids

One of the most recent image partitioning schemes
combines different resolution levels
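
One way to picture the combination of resolution levels is sketched below. It assumes that level l splits the image into 2^l x 2^l cells and that the per-cell histograms are concatenated, so 3 levels give 1 + 4 + 16 = 21 cells. The slide does not state the subdivision, so this layout is an assumption, and the name spatial_pyramid is hypothetical.

```python
def spatial_pyramid(labels, n_labels, levels=3):
    """Concatenate per-cell label histograms over several resolutions.

    labels: 2D list of discrete labels (e.g. color or visual-word indices).
    Assumption: level l uses a 2**l by 2**l grid of cells.
    """
    h, w = len(labels), len(labels[0])
    features = []
    for level in range(levels):
        cells = 2 ** level
        for cy in range(cells):
            for cx in range(cells):
                hist = [0] * n_labels
                for y in range(cy * h // cells, (cy + 1) * h // cells):
                    for x in range(cx * w // cells, (cx + 1) * w // cells):
                        hist[labels[y][x]] += 1
                features.extend(hist)
    return features


labels = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
pyramid = spatial_pyramid(labels, n_labels=2, levels=3)
```

The first entries describe the whole image, and later entries describe progressively smaller cells, so both global and local information end up in one vector.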

16
Scene Recognition with CIREC

We have introduced the cluster-correlogram model,
and named our system CIREC

17
Descriptors and Learning Algorithms

We used the four different MPEG-7 descriptors
described before
Each descriptor is used to create a separate
cluster-correlogram
We used k-nearest neighbors and support vector
machines as machine learning algorithms
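
For readers unfamiliar with k-nearest neighbors, a minimal sketch follows. It uses squared Euclidean distance and a majority vote, which are common defaults and not necessarily the settings used in CIREC. The toy feature vectors and the name knn_predict are hypothetical.

```python
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label of the majority among the k training vectors nearest to query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(v, query)), i)
        for i, v in enumerate(train_vectors)
    )
    votes = Counter(train_labels[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


train_vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11]]
train_labels = ["beaches", "beaches", "beaches", "mountains", "mountains"]
near_mountains = knn_predict(train_vectors, train_labels, [9, 9], k=3)
near_beaches = knn_predict(train_vectors, train_labels, [0.2, 0.2], k=3)
```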

18
Corel Database

The Corel database consists of 10 scene
categories: Africans, beaches, buildings, buses,
dinosaurs, elephants, roses, horses, mountains,
and foods

19
Retrieval results

20
Overall retrieval results of CIREC

We measured how often a retrieved result belongs
to the category of the query
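
This measure is commonly called precision at k. A small sketch, assuming each retrieved image carries a category label and that the top k results are evaluated; the name precision_at_k and the example labels are hypothetical, not measured results.

```python
def precision_at_k(query_label, retrieved_labels, k=10):
    """Fraction of the top-k retrieved items sharing the query's category."""
    top = retrieved_labels[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label == query_label) / len(top)


# Illustrative ranking only.
retrieved = ["horses"] * 7 + ["elephants"] * 3
p = precision_at_k("horses", retrieved, k=10)
```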

21
Results of other approaches

The MPEG-7 features score 67% correct with 10
retrieved images
The color correlogram scores 74% correct
CIREC achieves the best results, and we also used
it for categorization: given an image, predict to
which category it belongs
CIREC with SVMs scores 93% correct!

22
Some misclassifications of CIREC

CIREC classifies the top row as beaches and the
bottom row as mountains. This is wrong.

23
Object Recognition

We also introduced the stacking support vector
machine with spatial pyramids

24
Features used

We used 10 different features:
SIFT and HOG, each on 360 and 180 degrees
Edge histogram
These 5 features are each computed with intensity
(grey) and HSV color channels
We used 3 spatial resolution levels
We tested our approach on 20 object classes from
Caltech-101

25
Caltech dataset

26
Results

Our system scored 83% correctly classified images
The naive approach that concatenated all features
in one large input vector scored 75% correct
We have not used appearance-based descriptors,
which could improve the system

27
Discussion