Title: SWE 423: Multimedia Systems
Outline
- Image Processing Basics
- Image Features
- Image Segmentation
- Textbook Section 4.3
- Additional Reference: Wasfi Al-Khatib, Y. Francis Day, Arif Ghafoor, and P. Bruce Berra. Semantic modeling and knowledge representation in multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11(1):64-80, 1999.
Image Processing
- Image processing involves the analysis of scenes or the reconstruction of models from images representing 2D or 3D objects.
- Image Analysis
- Identifying Image Properties (Image Features)
- Image Segmentation
- Image Recognition
- We will look at image processing from a database perspective.
- Objective: Design of robust image processing and recognition techniques to support semantic modeling, knowledge representation, and querying of images.
Semantic Modeling and Knowledge Representation in Image Databases
- Feature Extraction.
- Salient Object Identification.
- Content-Based Indexing and Retrieval.
- Query Formulation and Processing.
Multi-Level Abstraction
- Semantic Modeling and Knowledge Representation Layer
- Object Recognition Layer
- Feature Extraction Layer
- Multimedia Data
Feature Extraction Layer
- Image features: Colors, Textures, Shapes, Edges, etc.
- Features are mapped into a multi-dimensional feature space, allowing similarity-based retrieval.
- Features can be classified into two types: Global and Local.
Global Features
- Generally emphasize coarse-grained pattern matching techniques.
- Transform the whole image into a functional representation.
- Finer details within individual parts of the image are ignored.
- Examples: Color histograms and coherence vectors, Texture, Fast Fourier Transform, Hough Transform, and Eigenvalues.
- What are some example queries?
Color Histogram
- Counts how many pixels of the image take a specific color.
- In order to control the number of colors, the color domain is discretized.
- E.g., consider the value of the two leftmost bits in each color channel (RGB).
- In this case, the number of different colors is equal to (2^2)^3 = 64.
- How can we determine whether two images are similar using the color histogram? (A sketch follows below.)
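A minimal sketch of this 64-color histogram in Python with NumPy, together with one possible similarity test (L1 distance of normalized histograms); the function names are illustrative, and the input is assumed to be an 8-bit RGB image stored as an H x W x 3 array:

```python
import numpy as np

def color_histogram(img):
    """64-bin color histogram using the two leftmost bits of each RGB channel.

    img: H x W x 3 uint8 array. Returns a length-64 vector of pixel counts.
    """
    bits = img >> 6                              # two most significant bits per channel
    bins = (bits[..., 0] << 4) | (bits[..., 1] << 2) | bits[..., 2]  # codes 0..63
    return np.bincount(bins.ravel().astype(np.int64), minlength=64)

def histogram_distance(img_a, img_b):
    """L1 distance of normalized histograms; small values mean similar color content."""
    h_a = color_histogram(img_a) / img_a[..., 0].size
    h_b = color_histogram(img_b) / img_b[..., 0].size
    return np.abs(h_a - h_b).sum()
```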
Color Coherence Vector
- Based on the color histogram.
- Each pixel is checked as to whether it lies within a sufficiently large one-color environment or not, i.e., in a region connected by a path of pixels of the same color.
- If so, the pixel is called coherent; otherwise, incoherent.
- For each color j, compute the number of coherent and incoherent pixels (α_j, β_j), j = 1, ..., J.
- When comparing two images with color coherence vectors (α_j, β_j) and (α'_j, β'_j), j = 1, ..., J, we may use the expression
  Σ_{j=1}^{J} (|α_j − α'_j| + |β_j − β'_j|)
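A sketch of how the coherence counts could be computed, reusing the 64-color quantization above; connected one-color regions are found with scipy.ndimage.label, and the region-size threshold tau is a tunable assumption (often around 1% of the image area):

```python
import numpy as np
from scipy import ndimage

def color_coherence_vector(img, tau=25):
    """Return (alpha, beta): coherent and incoherent pixel counts per color.

    A pixel is coherent if its same-color connected region has at least tau pixels.
    """
    bits = img >> 6
    colors = (bits[..., 0] << 4) | (bits[..., 1] << 2) | bits[..., 2]
    alpha = np.zeros(64, dtype=np.int64)
    beta = np.zeros(64, dtype=np.int64)
    for j in range(64):
        labels, _ = ndimage.label(colors == j)   # connected regions of color j
        sizes = np.bincount(labels.ravel())[1:]  # drop the background label 0
        alpha[j] = sizes[sizes >= tau].sum()
        beta[j] = sizes[sizes < tau].sum()
    return alpha, beta

def ccv_distance(ccv_a, ccv_b):
    """Sum over j of |alpha_j - alpha'_j| + |beta_j - beta'_j|."""
    return int(np.abs(ccv_a[0] - ccv_b[0]).sum() + np.abs(ccv_a[1] - ccv_b[1]).sum())
```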
Texture
- Texture is a small surface structure
- Natural or artificial
- Regular or irregular
- Examples include
- Wood bark
- Knitting patterns
- The surface of a sponge
Texture Examples
- Artificial/periodic
- Artificial/non-periodic
- Photographic/pseudo-periodic
- Photographic/random
- Photographic/structured
- Inhomogeneous (non-texture)
Texture
- Two basic approaches to study texture
- Structural analysis searches for small basic components and an arrangement rule.
- Statistical analysis describes the texture as a whole based on specific attributes (local gray-level variance, regularity, coarseness, orientation, and contrast).
- Either can be done in the spatial domain or the spatial frequency domain.
Global Features
- Advantages
- Simple.
- Low computational complexity.
- Disadvantages
- Low accuracy
Local Features
- Images are segmented into a collection of smaller regions, with each region representing a potential object of interest (fine-grained).
- An object of interest may represent a simple semantic object (e.g., a round object).
- Choice of features is domain specific
- X-ray imaging, GIS, etc. require spatial features (e.g., shapes may be calculated through edges and dimensions).
- Paintings, MMR imaging, etc. may use color features in specific regions of the image.
Edge Detection
- A given input image E is used to gradually compute a (zero-initialized) output image A.
- A convolution mask M runs across E pixel by pixel, linking the entries in the mask at each position M occupies in E with the gray values of the underlying image pixels.
- The result of the linkage (the sum over all products of a mask entry and the gray value of the underlying image pixel) is written to the output image A.
Convolution
- Convolution is a simple mathematical operation which is fundamental to many common image processing operators.
- Convolution provides a way of "multiplying together" two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality.
- This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values.
- The convolution is performed by sliding the kernel over the image, generally starting at the top left corner, moving the kernel through all positions where it fits entirely within the boundaries of the image.
Convolution Computation
- If the image E has M rows and N columns, and the kernel K has m rows and n columns, then the output image A will have M - m + 1 rows and N - n + 1 columns, and is given by
  $A(i,j) = \sum_{k=1}^{m} \sum_{l=1}^{n} E(i+k-1,\, j+l-1)\, K(k,l)$
- Example: page 60.
- http://homepages.inf.ed.ac.uk/rbf/HIPR2/sobel.htm
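A minimal sketch of this "valid" convolution in Python with NumPy, using the Sobel kernels from the HIPR2 page above; note that, like many image processing implementations, it slides the kernel without flipping it, which is strictly a cross-correlation (identical for symmetric kernels):

```python
import numpy as np

def convolve_valid(E, K):
    """Slide kernel K over image E wherever it fits entirely inside E.

    E: M x N array, K: m x n array. The output A has, as stated above,
    M - m + 1 rows and N - n + 1 columns.
    """
    M, N = E.shape
    m, n = K.shape
    A = np.zeros((M - m + 1, N - n + 1))
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i, j] = np.sum(E[i:i + m, j:j + n] * K)  # sum of entrywise products
    return A

# Sobel kernels estimate the horizontal and vertical gray-level gradient;
# the gradient magnitude is large at edges.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = sobel_x.T
# edges = np.hypot(convolve_valid(E, sobel_x), convolve_valid(E, sobel_y))
```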
Similarity Metrics
- Minkowski Distance
- Weighted Distance
- Average Distance
- Color Histogram Intersection
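Hedged sketches of these four metrics over feature vectors (e.g., normalized color histograms); the weights in the weighted distance and the averaging scheme are application choices, so the definitions below are one common reading rather than the only one:

```python
import numpy as np

def minkowski_distance(h1, h2, p=2):
    """p = 1 gives the city-block distance, p = 2 the Euclidean distance."""
    return float((np.abs(h1 - h2) ** p).sum() ** (1.0 / p))

def weighted_distance(h1, h2, w):
    """Euclidean distance with per-dimension weights w emphasizing important features."""
    return float(np.sqrt((w * (h1 - h2) ** 2).sum()))

def average_distance(h1, h2):
    """Mean absolute per-bin difference."""
    return float(np.abs(h1 - h2).mean())

def histogram_intersection(h1, h2):
    """For normalized histograms this lies in [0, 1]; 1 means identical."""
    return float(np.minimum(h1, h2).sum())
```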
Prototype Systems
- QBIC (http://www.hermitagemuseum.org)
- Uses color, shape, and texture features
- Allows queries by sketching features and providing color information
- Chabot (Cypress)
- Uses color and textual annotation
- Improved performance due to textual annotation (Concept Query)
- KMeD
- Uses shapes and contours as features
- Features are extracted automatically in some cases and manually in other cases
Demo (Andrew Berman and Linda G. Shapiro)
- http://www.cs.washington.edu/research/imagedatabase/demo/seg/
- http://www.cs.washington.edu/research/imagedatabase/demo/edge/
- http://www.cs.washington.edu/research/imagedatabase/demo/fids/
Image Segmentation
- Assigning a unique number to object pixels based on different intensities or colors in the foreground and background regions of an image
- Can be used in the object recognition process, but it is not object recognition on its own
- Segmentation Methods
- Pixel-oriented methods
- Edge-oriented methods
- Region-oriented methods
- ...
Pixel-Oriented Segmentation
- Gray values of pixels are studied in isolation.
- Looks at the gray-level histogram of an image and finds one or more thresholds in the histogram.
- Ideally, the histogram has a region without pixels (a bimodal distribution); the threshold is set in that region, and the image is divided into a foreground and a background accordingly.
- The major drawback of this approach is that object and background histograms overlap.
- A bimodal distribution rarely occurs in nature.
Edge-Oriented Segmentation
- Segmentation is carried out as follows
- Edges of an image are extracted (e.g., using the Canny operator)
- Edges are connected to form closed contours around the objects
- Hough Transform
- Usually very expensive
- Works well with regular curves (application in manufactured parts)
- May work in the presence of noise
Region-Oriented Segmentation
- A major disadvantage of the previous approaches is that they do not consider the spatial relationships of pixels.
- Neighboring pixels normally have similar properties.
- The segmentation (region growing) is carried out as follows (see the sketch after the next slide):
- Start with a seed pixel.
- A pixel's neighbors are included if they have some similarity to the seed pixel (homogeneity condition); otherwise they are not.
- Uses an eight-neighborhood (8-nbd) model
Region-Oriented Segmentation
- Homogeneity criterion: the gray-level mean value of a region is usually used, together with its standard deviation.
- Drawback: computationally expensive.
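The region-growing procedure from the previous slide, as a minimal Python sketch; the homogeneity condition here compares each candidate pixel against the running region mean with a tolerance tol (the standard-deviation variant mentioned above would track a second statistic):

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10.0):
    """Grow a region from seed = (row, col) using an eight-neighborhood."""
    H, W = gray.shape
    region = np.zeros((H, W), dtype=bool)
    region[seed] = True
    total, count = float(gray[seed]), 1          # running sum/count for the region mean
    frontier = deque([seed])
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]     # the 8-nbd model
    while frontier:
        y, x = frontier.popleft()
        for dy, dx in nbrs:
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and not region[ny, nx]:
                if abs(float(gray[ny, nx]) - total / count) <= tol:  # homogeneity condition
                    region[ny, nx] = True
                    total += float(gray[ny, nx])
                    count += 1
                    frontier.append((ny, nx))
    return region
```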
Water Inflow Segmentation
- Fill a gray-level image gradually with water.
- Gray levels of pixels are taken as heights.
- The higher the water rises, the more pixels are flooded.
- Hence, you have land and water regions.
- The land regions correspond to objects.
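A sketch of the idea for one fixed water level, assuming gray values as heights; land (pixels above the water line) is split into connected components, each a candidate object. Sweeping the level upward gives the gradual flooding described above:

```python
import numpy as np
from scipy import ndimage

def water_inflow(gray, water_level):
    """Flood all pixels below water_level; label each remaining land region."""
    land = gray >= water_level           # heights still above the water
    labels, n_objects = ndimage.label(land)
    return labels, n_objects             # labels: unique number per land region
```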
Object Recognition Layer
- Features are analyzed to recognize objects and faces in an image database.
- Features are matched with object models stored in a knowledge base.
- Each template is inspected to find the closest match.
- Exact matches are usually impossible and generally computationally expensive.
- Occlusion of objects and the existence of spurious features in the image can further diminish the success of matching strategies.
Template Matching Techniques
- Fixed Template Matching
- Useful if object shapes do not change with respect to the viewing angle of the camera.
- Deformable Template Matching
- More suitable for cases where objects in the database may vary due to rigid and non-rigid deformations.
Fixed Template Matching
- Image Subtraction
- The difference in intensity levels between the image and the template is used in object recognition.
- Performs well in restricted environments where imaging conditions (such as image intensity) between the image and the template are the same.
- Matching by Correlation
- Utilizes the position of the normalized cross-correlation peak between a template and an image.
- Generally immune to noise and illumination effects in the image.
- Suffers from high computational complexity caused by summations over the entire template.
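A brute-force sketch of matching by correlation; the nested sums over every placement make the cost mentioned above explicit (production code would use FFT-based correlation instead):

```python
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation score for every template placement."""
    M, N = image.shape
    m, n = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            w = image[i:i + m, j:j + n] - image[i:i + m, j:j + n].mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            out[i, j] = (w * t).sum() / denom if denom > 0 else 0.0
    return out

# The peak position is the recognized object location:
# i, j = np.unravel_index(np.argmax(scores), scores.shape)
```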
Deformable Template Matching
- The template is represented as a bitmap describing the characteristic contour/edges of an object shape.
- An objective function with transformation parameters that alter the shape of the template is formulated, reflecting the cost of such transformations.
- The objective function is minimized by iteratively updating the transformation parameters to best match the object.
- Applications include handwritten character recognition and motion detection of objects in video frames.
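A toy sketch of such an objective function, assuming the template is reduced to contour points and the image to a binary edge map; the transformation parameters (translation dx, dy and scale s) and the penalty weight lam are illustrative, and a real system would minimize iteratively (e.g., by gradient descent) rather than by the grid search shown:

```python
import numpy as np

def template_cost(edge_map, contour, dx, dy, s, lam=0.1):
    """Mismatch of the transformed contour against the edges, plus a deformation cost."""
    pts = np.rint(contour * s + np.array([dy, dx])).astype(int)   # (row, col) points
    H, W = edge_map.shape
    inside = (pts[:, 0] >= 0) & (pts[:, 0] < H) & (pts[:, 1] >= 0) & (pts[:, 1] < W)
    hits = edge_map[pts[inside, 0], pts[inside, 1]].sum()         # contour points on edges
    mismatch = len(contour) - hits                                # points missing an edge
    deformation = lam * (s - 1.0) ** 2 * len(contour)             # cost of deforming
    return mismatch + deformation

# best = min(((dx, dy, s) for dx in range(40) for dy in range(40)
#             for s in (0.8, 0.9, 1.0, 1.1, 1.2)),
#            key=lambda p: template_cost(edge_map, contour, *p))
```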
Prototype System: KMeD
- Medical objects belonging only to patients in a small age group are identified automatically in KMeD.
- Such objects have high contrast with respect to their background and have relatively simple shapes, large sizes, and little or no overlap with other objects.
- KMeD resorts to a human-assisted object recognition process otherwise.
Demo
- http://www.cs.washington.edu/research/imagedatabase/demo/cars/ (check car214)
Spatial Modeling and Knowledge Representation Layer (1)
- Maintains the domain knowledge for representing spatial semantics associated with image databases.
- At this level, queries are generally descriptive in nature and focus mostly on semantics and concepts present in image databases.
- Semantics at this level are based on "spatial events" describing the relative locations of multiple objects.
- An example involving such semantics is a range query involving spatial concepts such as "close by", "in the vicinity", or "larger than" (e.g., retrieve all images that contain a large tumor in the brain).
Spatial Modeling and Knowledge Representation Layer (2)
- Identify spatial relationships among objects once they are recognized and marked by the lower layer using bounding boxes or volumes.
- Several techniques have been proposed to formally represent spatial knowledge at this layer:
- Semantic networks
- Mathematical logic
- Constraints
- Inclusion hierarchies
- Frames
Semantic Networks
- First introduced to represent the meanings of English sentences in terms of words and the relationships between them.
- Semantic networks are graphs of nodes representing concepts, linked together by arcs representing relationships between these concepts.
- Efficiency in semantic networks is gained by representing each concept or object once and using pointers for cross references, rather than naming an object explicitly every time it is involved in a relation.
- Example: Type Abstraction Hierarchies (KMeD)
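A minimal sketch of a semantic network as a set of labeled arcs between concept nodes; the concepts and relations are illustrative, loosely echoing the brain-lesion example on the next slide:

```python
# Arcs are (concept, relationship, concept) triples; each concept appears
# once and is referred to by name, mirroring the pointer-based sharing above.
network = {
    ("lesion", "is-a", "medical-object"),
    ("lesion", "located-in", "brain"),
    ("brain", "part-of", "head"),
}

def related(concept, relation):
    """Concepts reachable from `concept` via one arc labeled `relation`."""
    return {dst for src, rel, dst in network if src == concept and rel == relation}

# related("lesion", "located-in") -> {"brain"}
```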
Brain Lesions Representation
TAH Example
Constraints-Based Methodology
- Domain knowledge is represented using a set of constraints in conjunction with formal expressions such as predicate calculus or graphs.
- A constraint is a relationship between two or more objects that needs to be satisfied.
Example: The PICTION System
- Its architecture consists of a natural language processing module (NLP), an image understanding module (IU), and a control module.
- A set of constraints is derived by the NLP module from the picture captions. These constraints (called Visual Semantics by the author) are used with the faces recognized in the picture by the IU module to identify the spatial relationships among people.
- The control module maintains the constraints generated by the NLP module and acts as a knowledge base for the IU module to perform face recognition functions.
Mathematical Logic
- Iconic Indexing by 2D Strings: uses projections of salient objects in a coordinate system.
- These projections are expressed in the form of 2D strings to form a partial ordering of object projections in 2D.
- For query processing, 2D subsequence matching is performed to allow similarity-based retrieval.
- Binary Spatial Relations: uses Allen's 13 temporal relations to represent spatial relationships.
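A simplified sketch of 2D-string construction, assuming each salient object has been reduced to a (name, x, y) centroid; only the "<" (strictly left-of / below) relation is encoded here, whereas full 2D strings also use "=" and ":" operators:

```python
def two_d_string(objects):
    """Build the (u, v) 2D string from (name, x, y) object centroids."""
    by_x = sorted(objects, key=lambda o: o[1])        # left-to-right projection
    by_y = sorted(objects, key=lambda o: o[2])        # bottom-to-top projection
    u = " < ".join(name for name, _, _ in by_x)
    v = " < ".join(name for name, _, _ in by_y)
    return u, v

# two_d_string([("tumor", 40, 25), ("brain", 10, 30), ("skull", 5, 5)])
# -> ("skull < brain < tumor", "skull < tumor < brain")
```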
Inclusion Hierarchies
- The approach is object-oriented and uses concept classes and attributes to represent domain knowledge.
- These concepts may represent image features, high-level semantics, semantic operators, and conditions.
Frames
- A frame usually consists of a name and a list of attribute-value pairs.
- A frame can be associated with a class of objects or with a class of concepts.
- Frame abstractions allow encapsulation of file names, features, and relevant attributes of image objects.
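A minimal sketch of a frame as a named bundle of attribute-value pairs; the slot names and values are purely illustrative of what an image-object frame might encapsulate:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A frame: a name plus a list of attribute-value pairs (slots)."""
    name: str
    slots: dict = field(default_factory=dict)

lesion_frame = Frame(
    name="brain-lesion",
    slots={
        "file": "patient042.img",   # hypothetical file name
        "shape": "round",
        "size": "large",
        "located-in": "brain",
    },
)
```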