Title: SWE 423: Multimedia Systems
Outline
- Image Processing Basics
- Image Features
- Image Segmentation
- Textbook Section 4.3
- Additional Reference: Wasfi Al-Khatib, Y. Francis Day, Arif Ghafoor, and P. Bruce Berra. Semantic modeling and knowledge representation in multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11(1):64-80, 1999.
Image Processing
- Image processing involves the analysis of scenes or the reconstruction of models from images representing 2D or 3D objects.
- Image Analysis
- Identifying Image Properties (Image Features)
- Image Segmentation
- Image Recognition
- We will look at image processing from a database perspective.
- Objective: Design of robust image processing and recognition techniques to support semantic modeling, knowledge representation, and querying of images.
Semantic Modeling and Knowledge Representation in Image Databases
- Feature Extraction.
- Salient Object Identification.
- Content-Based Indexing and Retrieval.
- Query Formulation and Processing.
Multi-Level Abstraction
- Semantic Modeling and Knowledge Representation Layer
- Object Recognition Layer
- Feature Extraction Layer
- Multimedia Data
Feature Extraction Layer
- Image features: Colors, Textures, Shapes, Edges, etc.
- Features are mapped into a multi-dimensional feature space, allowing similarity-based retrieval.
- Features can be classified into two types: Global and Local.
Global Features
- Generally emphasize coarse-grained pattern matching techniques.
- Transform the whole image into a functional representation.
- Finer details within individual parts of the image are ignored.
- Examples: Color histograms and coherence vectors, Texture, Fast Fourier Transform, Hough Transform, and Eigenvalues.
- What are some example queries?
Color Histogram
- Counts how many pixels of the image take a specific color.
- In order to control the number of colors, the color domain is discretized.
- E.g., consider the value of the two leftmost bits in each color channel (RGB).
- In this case, the number of different colors is equal to (2^2)^3 = 64.
- How can we determine whether two images are similar using the color histogram? (A sketch follows below.)
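A minimal sketch of this 64-color histogram in Python with NumPy, together with one possible similarity test (L1 distance of normalized histograms); the function names are illustrative, and the input is assumed to be an 8-bit RGB image stored as an H x W x 3 array:

```python
import numpy as np

def color_histogram(img):
    """64-bin color histogram using the two leftmost bits of each RGB channel.

    img: H x W x 3 uint8 array. Returns a length-64 vector of pixel counts.
    """
    bits = img >> 6                              # two most significant bits per channel
    bins = (bits[..., 0] << 4) | (bits[..., 1] << 2) | bits[..., 2]  # codes 0..63
    return np.bincount(bins.ravel().astype(np.int64), minlength=64)

def histogram_distance(img_a, img_b):
    """L1 distance of normalized histograms; small values mean similar color content."""
    h_a = color_histogram(img_a) / img_a[..., 0].size
    h_b = color_histogram(img_b) / img_b[..., 0].size
    return np.abs(h_a - h_b).sum()
```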
Color Coherence Vector
- Based on the color histogram.
- Each pixel is checked as to whether it lies within a sufficiently large one-color environment or not, i.e., in a region connected by a path of pixels of the same color.
- If so, the pixel is called coherent; otherwise, incoherent.
- For each color j, compute the number of coherent and incoherent pixels (α_j, β_j), j = 1, ..., J.
- When comparing two images with color coherence vectors (α_j, β_j) and (α'_j, β'_j), j = 1, ..., J, we may use the expression
  Σ_{j=1}^{J} (|α_j − α'_j| + |β_j − β'_j|)
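A sketch of how the coherence counts could be computed, reusing the 64-color quantization above; connected one-color regions are found with scipy.ndimage.label, and the region-size threshold tau is a tunable assumption (often around 1% of the image area):

```python
import numpy as np
from scipy import ndimage

def color_coherence_vector(img, tau=25):
    """Return (alpha, beta): coherent and incoherent pixel counts per color.

    A pixel is coherent if its same-color connected region has at least tau pixels.
    """
    bits = img >> 6
    colors = (bits[..., 0] << 4) | (bits[..., 1] << 2) | bits[..., 2]
    alpha = np.zeros(64, dtype=np.int64)
    beta = np.zeros(64, dtype=np.int64)
    for j in range(64):
        labels, _ = ndimage.label(colors == j)   # connected regions of color j
        sizes = np.bincount(labels.ravel())[1:]  # drop the background label 0
        alpha[j] = sizes[sizes >= tau].sum()
        beta[j] = sizes[sizes < tau].sum()
    return alpha, beta

def ccv_distance(ccv_a, ccv_b):
    """Sum over j of |alpha_j - alpha'_j| + |beta_j - beta'_j|."""
    return int(np.abs(ccv_a[0] - ccv_b[0]).sum() + np.abs(ccv_a[1] - ccv_b[1]).sum())
```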
Texture
- Texture is a small surface structure
- Natural or artificial
- Regular or irregular
- Examples include
- Wood bark
- Knitting patterns
- The surface of a sponge
Texture Examples
- Artificial/periodic
- Artificial/non-periodic
- Photographic/pseudo-periodic
- Photographic/random
- Photographic/structured
- Inhomogeneous (non-texture)
Texture
- Two basic approaches to study texture
- Structural analysis searches for small basic components and an arrangement rule.
- Statistical analysis describes the texture as a whole based on specific attributes (local gray-level variance, regularity, coarseness, orientation, and contrast).
- Either can be done in the spatial domain or the spatial frequency domain.
Global Features
- Advantages
- Simple.
- Low computational complexity.
- Disadvantages
- Low accuracy
Local Features
- Images are segmented into a collection of smaller regions, with each region representing a potential object of interest (fine-grained).
- An object of interest may represent a simple semantic object (e.g., a round object).
- Choice of features is domain specific
- X-ray imaging, GIS, etc. require spatial features (e.g., shapes may be calculated through edges and dimensions).
- Paintings, MMR imaging, etc. may use color features in specific regions of the image.
Edge Detection
- A given input image E is used to gradually compute a (zero-initialized) output image A.
- A convolution mask M runs across E pixel by pixel, linking the entries in the mask at each position M occupies in E with the gray values of the underlying image pixels.
- The result of the linkage (the sum over all products of a mask entry and the gray value of the underlying image pixel) is written to the output image A.
Convolution
- Convolution is a simple mathematical operation which is fundamental to many common image processing operators.
- Convolution provides a way of "multiplying together" two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality.
- This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values.
- The convolution is performed by sliding the kernel over the image, generally starting at the top left corner, moving the kernel through all positions where it fits entirely within the boundaries of the image.
Convolution Computation
- If the image E has M rows and N columns, and the kernel K has m rows and n columns, then the output image A will have M - m + 1 rows and N - n + 1 columns, and is given by
  $A(i,j) = \sum_{k=1}^{m} \sum_{l=1}^{n} E(i+k-1,\, j+l-1)\, K(k,l)$
- Example: page 60.
- http://homepages.inf.ed.ac.uk/rbf/HIPR2/sobel.htm
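A minimal sketch of this "valid" convolution in Python with NumPy, using the Sobel kernels from the HIPR2 page above; note that, like many image processing implementations, it slides the kernel without flipping it, which is strictly a cross-correlation (identical for symmetric kernels):

```python
import numpy as np

def convolve_valid(E, K):
    """Slide kernel K over image E wherever it fits entirely inside E.

    E: M x N array, K: m x n array. The output A has, as stated above,
    M - m + 1 rows and N - n + 1 columns.
    """
    M, N = E.shape
    m, n = K.shape
    A = np.zeros((M - m + 1, N - n + 1))
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i, j] = np.sum(E[i:i + m, j:j + n] * K)  # sum of entrywise products
    return A

# Sobel kernels estimate the horizontal and vertical gray-level gradient;
# the gradient magnitude is large at edges.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = sobel_x.T
# edges = np.hypot(convolve_valid(E, sobel_x), convolve_valid(E, sobel_y))
```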
Similarity Metrics
- Minkowski Distance
- Weighted Distance
- Average Distance
- Color Histogram Intersection
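Hedged sketches of these four metrics over feature vectors (e.g., normalized color histograms); the weights in the weighted distance and the averaging scheme are application choices, so the definitions below are one common reading rather than the only one:

```python
import numpy as np

def minkowski_distance(h1, h2, p=2):
    """p = 1 gives the city-block distance, p = 2 the Euclidean distance."""
    return float((np.abs(h1 - h2) ** p).sum() ** (1.0 / p))

def weighted_distance(h1, h2, w):
    """Euclidean distance with per-dimension weights w emphasizing important features."""
    return float(np.sqrt((w * (h1 - h2) ** 2).sum()))

def average_distance(h1, h2):
    """Mean absolute per-bin difference."""
    return float(np.abs(h1 - h2).mean())

def histogram_intersection(h1, h2):
    """For normalized histograms this lies in [0, 1]; 1 means identical."""
    return float(np.minimum(h1, h2).sum())
```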
Prototype Systems
- QBIC (http://www.hermitagemuseum.org)
- Uses color, shape, and texture features
- Allows queries by sketching features and providing color information
- Chabot (Cypress)
- Uses color and textual annotation
- Improved performance due to textual annotation (Concept Query)
- KMeD
- Uses shapes and contours as features
- Features are extracted automatically in some cases and manually in other cases
Demo (Andrew Berman and Linda G. Shapiro)
- http://www.cs.washington.edu/research/imagedatabase/demo/seg/
- http://www.cs.washington.edu/research/imagedatabase/demo/edge/
- http://www.cs.washington.edu/research/imagedatabase/demo/fids/
Image Segmentation
- Assigning a unique number to object pixels based on different intensities or colors in the foreground and background regions of an image
- Can be used in the object recognition process, but it is not object recognition on its own
- Segmentation Methods
- Pixel-oriented methods
- Edge-oriented methods
- Region-oriented methods
- ...
Pixel-Oriented Segmentation
- Gray values of pixels are studied in isolation.
- Looks at the gray-level histogram of an image and finds one or more thresholds in the histogram.
- Ideally, the histogram has a region without pixels (a bimodal distribution); the threshold is set in that region, and the image is divided into a foreground and a background accordingly.
- The major drawback of this approach is that object and background histograms overlap.
- A bimodal distribution rarely occurs in nature.
Edge-Oriented Segmentation
- Segmentation is carried out as follows
- Edges of an image are extracted (e.g., using the Canny operator)
- Edges are connected to form closed contours around the objects
- Hough Transform
- Usually very expensive
- Works well with regular curves (application in manufactured parts)
- May work in the presence of noise
Region-Oriented Segmentation
- A major disadvantage of the previous approaches is that they do not consider the spatial relationships of pixels.
- Neighboring pixels normally have similar properties.
- The segmentation (region growing) is carried out as follows (see the sketch after the next slide):
- Start with a seed pixel.
- A pixel's neighbors are included if they have some similarity to the seed pixel (homogeneity condition); otherwise they are not.
- Uses an eight-neighborhood (8-nbd) model
Region-Oriented Segmentation
- Homogeneity criterion: the gray-level mean value of a region is usually used, together with its standard deviation.
- Drawback: computationally expensive.
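The region-growing procedure from the previous slide, as a minimal Python sketch; the homogeneity condition here compares each candidate pixel against the running region mean with a tolerance tol (the standard-deviation variant mentioned above would track a second statistic):

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10.0):
    """Grow a region from seed = (row, col) using an eight-neighborhood."""
    H, W = gray.shape
    region = np.zeros((H, W), dtype=bool)
    region[seed] = True
    total, count = float(gray[seed]), 1          # running sum/count for the region mean
    frontier = deque([seed])
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]     # the 8-nbd model
    while frontier:
        y, x = frontier.popleft()
        for dy, dx in nbrs:
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and not region[ny, nx]:
                if abs(float(gray[ny, nx]) - total / count) <= tol:  # homogeneity condition
                    region[ny, nx] = True
                    total += float(gray[ny, nx])
                    count += 1
                    frontier.append((ny, nx))
    return region
```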
Water Inflow Segmentation
- Fill a gray-level image gradually with water.
- Gray levels of pixels are taken as heights.
- The higher the water rises, the more pixels are flooded.
- Hence, you have land and water regions.
- The land regions correspond to objects.
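A sketch of the idea for one fixed water level, assuming gray values as heights; land (pixels above the water line) is split into connected components, each a candidate object. Sweeping the level upward gives the gradual flooding described above:

```python
import numpy as np
from scipy import ndimage

def water_inflow(gray, water_level):
    """Flood all pixels below water_level; label each remaining land region."""
    land = gray >= water_level           # heights still above the water
    labels, n_objects = ndimage.label(land)
    return labels, n_objects             # labels: unique number per land region
```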
Object Recognition Layer
- Features are analyzed to recognize objects and faces in an image database.
- Features are matched with object models stored in a knowledge base.
- Each template is inspected to find the closest match.
- Exact matches are usually impossible and generally computationally expensive.
- Occlusion of objects and the existence of spurious features in the image can further diminish the success of matching strategies.
Template Matching Techniques
- Fixed Template Matching
- Useful if object shapes do not change with respect to the viewing angle of the camera.
- Deformable Template Matching
- More suitable for cases where objects in the database may vary due to rigid and non-rigid deformations.
Fixed Template Matching
- Image Subtraction
- The difference in intensity levels between the image and the template is used in object recognition.
- Performs well in restricted environments where imaging conditions (such as image intensity) between the image and the template are the same.
- Matching by Correlation
- Utilizes the position of the normalized cross-correlation peak between a template and an image.
- Generally immune to noise and illumination effects in the image.
- Suffers from high computational complexity caused by summations over the entire template.
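A brute-force sketch of matching by correlation; the nested sums over every placement make the cost mentioned above explicit (production code would use FFT-based correlation instead):

```python
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation score for every template placement."""
    M, N = image.shape
    m, n = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            w = image[i:i + m, j:j + n] - image[i:i + m, j:j + n].mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            out[i, j] = (w * t).sum() / denom if denom > 0 else 0.0
    return out

# The peak position is the recognized object location:
# i, j = np.unravel_index(np.argmax(scores), scores.shape)
```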
Deformable Template Matching
- The template is represented as a bitmap describing the characteristic contour/edges of an object shape.
- An objective function with transformation parameters that alter the shape of the template is formulated, reflecting the cost of such transformations.
- The objective function is minimized by iteratively updating the transformation parameters to best match the object.
- Applications include handwritten character recognition and motion detection of objects in video frames.
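A toy sketch of such an objective function, assuming the template is reduced to contour points and the image to a binary edge map; the transformation parameters (translation dx, dy and scale s) and the penalty weight lam are illustrative, and a real system would minimize iteratively (e.g., by gradient descent) rather than by the grid search shown:

```python
import numpy as np

def template_cost(edge_map, contour, dx, dy, s, lam=0.1):
    """Mismatch of the transformed contour against the edges, plus a deformation cost."""
    pts = np.rint(contour * s + np.array([dy, dx])).astype(int)   # (row, col) points
    H, W = edge_map.shape
    inside = (pts[:, 0] >= 0) & (pts[:, 0] < H) & (pts[:, 1] >= 0) & (pts[:, 1] < W)
    hits = edge_map[pts[inside, 0], pts[inside, 1]].sum()         # contour points on edges
    mismatch = len(contour) - hits                                # points missing an edge
    deformation = lam * (s - 1.0) ** 2 * len(contour)             # cost of deforming
    return mismatch + deformation

# best = min(((dx, dy, s) for dx in range(40) for dy in range(40)
#             for s in (0.8, 0.9, 1.0, 1.1, 1.2)),
#            key=lambda p: template_cost(edge_map, contour, *p))
```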
Prototype System: KMeD
- Medical objects belonging only to patients in a small age group are identified automatically in KMeD.
- Such objects have high contrast with respect to their background and have relatively simple shapes, large sizes, and little or no overlap with other objects.
- KMeD resorts to a human-assisted object recognition process otherwise.
Demo
- http://www.cs.washington.edu/research/imagedatabase/demo/cars/ (check car214)
Spatial Modeling and Knowledge Representation Layer (1)
- Maintains the domain knowledge for representing spatial semantics associated with image databases.
- At this level, queries are generally descriptive in nature and focus mostly on semantics and concepts present in image databases.
- Semantics at this level are based on "spatial events" describing the relative locations of multiple objects.
- An example involving such semantics is a range query involving spatial concepts such as "close by", "in the vicinity", or "larger than" (e.g., retrieve all images that contain a large tumor in the brain).
Spatial Modeling and Knowledge Representation Layer (2)
- Identify spatial relationships among objects once they are recognized and marked by the lower layer using bounding boxes or volumes.
- Several techniques have been proposed to formally represent spatial knowledge at this layer:
- Semantic networks
- Mathematical logic
- Constraints
- Inclusion hierarchies
- Frames
Semantic Networks
- First introduced to represent the meanings of English sentences in terms of words and the relationships between them.
- Semantic networks are graphs of nodes representing concepts, linked together by arcs representing relationships between these concepts.
- Efficiency in semantic networks is gained by representing each concept or object once and using pointers for cross references, rather than naming an object explicitly every time it is involved in a relation.
- Example: Type Abstraction Hierarchies (KMeD)
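A minimal sketch of a semantic network as a set of labeled arcs between concept nodes; the concepts and relations are illustrative, loosely echoing the brain-lesion example on the next slide:

```python
# Arcs are (concept, relationship, concept) triples; each concept appears
# once and is referred to by name, mirroring the pointer-based sharing above.
network = {
    ("lesion", "is-a", "medical-object"),
    ("lesion", "located-in", "brain"),
    ("brain", "part-of", "head"),
}

def related(concept, relation):
    """Concepts reachable from `concept` via one arc labeled `relation`."""
    return {dst for src, rel, dst in network if src == concept and rel == relation}

# related("lesion", "located-in") -> {"brain"}
```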
Brain Lesions Representation
TAH Example
Constraints-Based Methodology
- Domain knowledge is represented using a set of constraints in conjunction with formal expressions such as predicate calculus or graphs.
- A constraint is a relationship between two or more objects that needs to be satisfied.
Example: The PICTION System
- Its architecture consists of a natural language processing module (NLP), an image understanding module (IU), and a control module.
- A set of constraints is derived by the NLP module from the picture captions. These constraints (called Visual Semantics by the author) are used with the faces recognized in the picture by the IU module to identify the spatial relationships among people.
- The control module maintains the constraints generated by the NLP module and acts as a knowledge base for the IU module to perform face recognition functions.
Mathematical Logic
- Iconic Indexing by 2D Strings: uses projections of salient objects in a coordinate system.
- These projections are expressed in the form of 2D strings to form a partial ordering of object projections in 2D.
- For query processing, 2D subsequence matching is performed to allow similarity-based retrieval.
- Binary Spatial Relations: uses Allen's 13 temporal relations to represent spatial relationships.
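A simplified sketch of 2D-string construction, assuming each salient object has been reduced to a (name, x, y) centroid; only the "<" (strictly left-of / below) relation is encoded here, whereas full 2D strings also use "=" and ":" operators:

```python
def two_d_string(objects):
    """Build the (u, v) 2D string from (name, x, y) object centroids."""
    by_x = sorted(objects, key=lambda o: o[1])        # left-to-right projection
    by_y = sorted(objects, key=lambda o: o[2])        # bottom-to-top projection
    u = " < ".join(name for name, _, _ in by_x)
    v = " < ".join(name for name, _, _ in by_y)
    return u, v

# two_d_string([("tumor", 40, 25), ("brain", 10, 30), ("skull", 5, 5)])
# -> ("skull < brain < tumor", "skull < tumor < brain")
```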
Inclusion Hierarchies
- The approach is object-oriented and uses concept classes and attributes to represent domain knowledge.
- These concepts may represent image features, high-level semantics, semantic operators, and conditions.
Frames
- A frame usually consists of a name and a list of attribute-value pairs.
- A frame can be associated with a class of objects or with a class of concepts.
- Frame abstractions allow encapsulation of file names, features, and relevant attributes of image objects.
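A minimal sketch of a frame as a named bundle of attribute-value pairs; the slot names and values are purely illustrative of what an image-object frame might encapsulate:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A frame: a name plus a list of attribute-value pairs (slots)."""
    name: str
    slots: dict = field(default_factory=dict)

lesion_frame = Frame(
    name="brain-lesion",
    slots={
        "file": "patient042.img",   # hypothetical file name
        "shape": "round",
        "size": "large",
        "located-in": "brain",
    },
)
```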