SWE 423: Multimedia Systems

Transcript and Presenter's Notes
1
SWE 423: Multimedia Systems
  • Multimedia Databases

2
Outline
  • Image Processing Basics
  • Image Features
  • Image Segmentation
  • Textbook: Section 4.3
  • Additional Reference: Wasfi Al-Khatib, Y. Francis
    Day, Arif Ghafoor, and P. Bruce Berra. Semantic
    modeling and knowledge representation in
    multimedia databases. IEEE Transactions on
    Knowledge and Data Engineering, 11(1):64-80,
    1999.

3
Image Processing
  • Image processing involves the analysis of scenes
    or the reconstruction of models from images
    representing 2D or 3D objects.
  • Image Analysis
  • Identifying Image Properties (Image Features)
  • Image Segmentation
  • Image Recognition
  • We will look at image processing from a database
    perspective.
  • Objective: Design of robust image processing and
    recognition techniques to support semantic
    modeling, knowledge representation, and querying
    of images.

4
Semantic Modeling and Knowledge Representation in
Image Databases
  • Feature Extraction.
  • Salient Object Identification.
  • Content-Based Indexing and Retrieval.
  • Query Formulation and Processing.

5
Multi-Level Abstraction
Semantic Modeling And Knowledge
Representation Layer
Object Recognition Layer
Feature Extraction Layer
Multimedia Data
6
Feature Extraction Layer
  • Image features: Colors, Textures, Shapes, Edges, etc.
  • Features are mapped into a multi-dimensional
    feature space allowing similarity-based
    retrieval.
  • Features can be classified into two types: Global
    and Local.

7
Global Features
  • Generally emphasize coarse-grained pattern
    matching techniques.
  • Transform the whole image into a functional
    representation.
  • Finer details within individual parts of the
    image are ignored.
  • Examples: Color histograms and coherence vectors,
    Texture, Fast Fourier Transform, Hough Transform,
    and Eigenvalues.
  • What are some of the example queries?

8
Color Histogram
  • Counts how many pixels of the image take each
    specific color.
  • In order to control the number of colors, the
    color domain is discretized.
  • E.g., consider only the value of the two leftmost
    bits in each color channel (RGB).
  • In this case, the number of different colors is
    equal to __________
  • How can we determine whether two images are
    similar using the color histogram?
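A minimal NumPy sketch of the discretization described above (the function name and the normalization to relative frequencies are illustrative choices, not from the slides). Keeping the two leftmost bits of each RGB channel gives 4 levels per channel, i.e. 4 x 4 x 4 = 64 bins; the similarity question is picked up again under Similarity Metrics below (e.g. histogram intersection).

```python
import numpy as np

def color_histogram(image, bits=2):
    """Coarse RGB histogram: keep only the `bits` leftmost bits per channel.

    `image` is an (H, W, 3) uint8 array; the result has (2**bits)**3 bins
    (64 for the 2-bit case on the slide), normalized to sum to 1.
    """
    shift = 8 - bits                       # drop the low-order bits
    q = (image >> shift).astype(np.int32)  # each channel now in [0, 2**bits)
    # Combine the three quantized channels into a single bin index.
    bins = 1 << bits
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3)
    return hist / hist.sum()
```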

9
Color Coherence Vector
  • Based on the color histogram
  • Each pixel is checked as to whether it is within
    a sufficiently large one-color environment or
    not.
  • i.e., whether it lies in a region connected by a
    path of pixels of the same color
  • If so, the pixel is called coherent, otherwise
    incoherent
  • For each color j, compute the number of coherent
    and incoherent pixels (αj, βj), j = 1, ..., J
  • When comparing two images with color coherence
    vectors (αj, βj) and (α'j, β'j), j = 1, ..., J,
    we may use the expression
    Σj ( |αj − α'j| + |βj − β'j| )
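The sketch below makes the coherence test concrete, assuming a quantized label image (e.g. the 64-color map from the histogram sketch), 4-neighborhood connectivity, and an illustrative size threshold tau; the slides do not fix these choices, so treat them as assumptions.

```python
import numpy as np
from collections import deque

def color_coherence_vector(quantized, tau=25, n_colors=64):
    """Return (alpha, beta) per color: pixels in same-color connected regions
    of size >= tau are coherent (alpha), the rest incoherent (beta)."""
    h, w = quantized.shape
    visited = np.zeros((h, w), dtype=bool)
    alpha = np.zeros(n_colors, dtype=np.int64)
    beta = np.zeros(n_colors, dtype=np.int64)
    for sy in range(h):
        for sx in range(w):
            if visited[sy, sx]:
                continue
            color = quantized[sy, sx]
            # Flood-fill the connected region of this color (4-neighborhood).
            size, queue = 0, deque([(sy, sx)])
            visited[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx] \
                            and quantized[ny, nx] == color:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if size >= tau:
                alpha[color] += size
            else:
                beta[color] += size
    return alpha, beta

def ccv_distance(ccv1, ccv2):
    """The slide's comparison: sum over j of |alpha_j - alpha'_j| + |beta_j - beta'_j|."""
    (a1, b1), (a2, b2) = ccv1, ccv2
    return int(np.abs(a1 - a2).sum() + np.abs(b1 - b2).sum())
```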

10
Texture
  • Texture is a small surface structure
  • Natural or artificial
  • Regular or irregular
  • Examples include
  • Wood bark
  • Knitting patterns
  • The surface of a sponge

11
Texture Examples
  • Artificial/periodic
  • Artificial/non-periodic
  • Photographic/pseudo-periodic
  • Photographic/random
  • Photographic/structured
  • Inhomogeneous (non-texture)

12
Texture
  • Two basic approaches to study texture
  • Structural analysis: searches for small basic
    components and an arrangement rule.
  • Statistical analysis: describes the texture as a
    whole based on specific attributes (local
    gray-level variance, regularity, coarseness,
    orientation, and contrast).
  • Either is done in the spatial domain or in the
    spatial frequency domain.

13
Global Features
  • Advantages
  • Simple.
  • Low computational complexity.
  • Disadvantages
  • Low accuracy

14
Local Features
  • Images are segmented into a collection of smaller
    regions, with each region representing a
    potential object of interest (fine-grained).
  • An object of interest may represent a simple
    semantic object (e.g. a round object).
  • Choice of features is domain-specific
  • X-ray imaging, GIS, etc. require spatial
    features (e.g., shapes may be calculated through
    edges and dimensions).
  • Paintings, MMR imaging, etc. may use color
    features in specific regions of the image.

15
Edge Detection
  • A given input image E is used to gradually
    compute a (zero-initialized) output image A.
  • A convolution mask M runs across E pixel by pixel
    and, at each position that M occupies in E, links
    the entries of the mask with the gray values of
    the underlying image pixels.
  • The result of this linkage (the sum of all
    products of mask entries and the gray values of
    the underlying image pixels) is written to the
    output image A.

16
Convolution
  • Convolution is a simple mathematical operation
    which is fundamental to many common image
    processing operators.
  • Convolution provides a way of "multiplying
    together" two arrays of numbers, generally of
    different sizes but of the same dimensionality,
    to produce a third array of numbers of the same
    dimensionality.
  • This can be used in image processing to implement
    operators whose output pixel values are simple
    linear combinations of certain input pixel
    values.
  • The convolution is performed by sliding the
    kernel over the image, generally starting at the
    top left corner, so as to move the kernel through
    all the positions where the kernel fits entirely
    within the boundaries of the image.

17
Convolution Computation
  • If the image E has M rows and N columns, and the
    kernel K has m rows and n columns, then the
    output image A will have M - m + 1 rows and
    N - n + 1 columns, and is given by
    A(i, j) = Σ_{k=1..m} Σ_{l=1..n} E(i + k - 1, j + l - 1) K(k, l)
  • Example: page 60.
  • http://homepages.inf.ed.ac.uk/rbf/HIPR2/sobel.htm
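A direct, unoptimized NumPy rendering of that formula, followed by the two Sobel masks from the HIPR2 page referenced above; this is a sketch rather than the textbook's exact example.

```python
import numpy as np

def convolve(E, K):
    """Slide formula: A has (M - m + 1) x (N - n + 1) entries with
    A(i, j) = sum over k, l of E(i + k - 1, j + l - 1) * K(k, l)."""
    M, N = E.shape
    m, n = K.shape
    A = np.zeros((M - m + 1, N - n + 1))
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            A[i, j] = np.sum(E[i:i + m, j:j + n] * K)
    return A

# Sobel masks for horizontal and vertical gradients, as on the HIPR2 page.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
# Edge strength: np.hypot(convolve(E, sobel_x), convolve(E, sobel_y))
```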

18
Similarity Metrics
  • Minkowski Distance
  • Weighted Distance
  • Average Distance
  • Color Histogram Intersection
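Hedged sketches of three of these metrics for normalized feature histograms; the per-bin weight vector in the weighted distance is a simplifying assumption (retrieval systems such as QBIC use a full cross-bin weight matrix), and the average distance is omitted.

```python
import numpy as np

def minkowski_distance(h1, h2, p=2):
    """L_p distance; p = 1 gives the city-block metric, p = 2 the Euclidean one."""
    return float(np.sum(np.abs(h1 - h2) ** p) ** (1.0 / p))

def weighted_distance(h1, h2, w):
    """Weighted Euclidean distance with per-bin weights w (a simplification)."""
    d = h1 - h2
    return float(np.sqrt(np.sum(w * d * d)))

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms: sum of bin-wise minima."""
    return float(np.sum(np.minimum(h1, h2)))
```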

19
Prototype Systems
  • QBIC (http://www.hermitagemuseum.org)
  • Uses color, shape, and texture features
  • Allows queries by sketching features and
    providing color information
  • Chabot (Cypress)
  • Uses color and textual annotation.
  • Improved performance due to textual annotation
    (Concept Query)
  • KMeD
  • Uses shapes and contours as features.
  • Features are extracted automatically in some
    cases and manually in other cases.

20
Demo (Andrew Berman, Linda G. Shapiro)
  • http://www.cs.washington.edu/research/imagedatabase/demo/seg/
  • http://www.cs.washington.edu/research/imagedatabase/demo/edge/
  • http://www.cs.washington.edu/research/imagedatabase/demo/fids/

21
Image Segmentation
  • Assigning a unique number to object pixels
    based on different intensities or colors in the
    foreground and the background regions of an image
  • Can be used in the object recognition process,
    but it is not object recognition on its own
  • Segmentation Methods
  • Pixel-oriented methods
  • Edge-oriented methods
  • Region-oriented methods
  • ....

22
Pixel-Oriented Segmentation
  • Gray-values of pixels are studied in isolation
  • Looks at the gray-level histogram of an image and
    finds one or more thresholds in the histogram
  • Ideally, the histogram has a region without
    pixels, which is set as the threshold, and hence
    the image is divided into a foreground and a
    background based on that (Bimodal Distribution)
  • The major drawback of this approach is that the
    object and background histograms overlap.
  • A bimodal distribution rarely occurs in nature.
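A sketch of the idealized bimodal case, assuming an 8-bit gray image; the peak/valley heuristic and the window width are illustrative only, and real images usually call for something more robust (e.g. Otsu's method).

```python
import numpy as np

def threshold_segment(gray, threshold=None):
    """Histogram-based thresholding: pick the emptiest gray level between the
    two histogram peaks and split the image into foreground (1) and background (0)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    if threshold is None:
        p1 = int(np.argmax(hist))            # strongest peak
        rest = hist.copy()
        rest[max(0, p1 - 32):p1 + 32] = 0    # look for a second peak away from it
        p2 = int(np.argmax(rest))
        lo, hi = sorted((p1, p2))
        # Threshold at the valley between the two peaks (fallback: mid gray).
        threshold = lo + int(np.argmin(hist[lo:hi + 1])) if hi > lo else 128
    return (gray > threshold).astype(np.uint8)
```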

23
Edge-Oriented Segmentation
  • Segmentation is carried out as follows
  • Edges of the image are extracted (e.g., using the
    Canny operator)
  • Edges are connected to form closed contours
    around the objects.
  • Hough Transform
  • Usually very expensive
  • Works well with regular curves (application in
    manufactured parts)
  • May work in presence of noise
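A minimal, accumulator-only sketch of the straight-line Hough transform; the parameterization rho = x·cosθ + y·sinθ and the resolution choices are assumptions. It also makes the cost mentioned above visible: every edge pixel votes once per angle.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes in (rho, theta) space from a binary edge map.
    Peaks in the returned accumulator correspond to candidate straight lines."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))              # maximum possible |rho|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for t, theta in enumerate(thetas):
        # Each edge pixel votes for the line at angle theta passing through it.
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        acc[:, t] = np.bincount(rho, minlength=2 * diag + 1)
    return acc, thetas
```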

24
Region-Oriented Segmentation
  • A major disadvantage of the previous approaches
    is that they do not consider the spatial
    relationships between pixels.
  • Neighboring pixels normally have similar
    properties
  • The segmentation (region-growing) is carried out
    as follows
  • Start with a seed pixel.
  • A pixel's neighbors are included if they have
    some similarity to the seed pixel; otherwise they
    are not.
  • Homogeneity condition
  • Uses an eight-neighborhood (8-nbd) model

25
Region-Oriented Segmentation
  • Homogeneity criterion: the gray-level mean value
    of a region, together with its standard
    deviation, is usually used.
  • Drawback: computationally expensive.
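A region-growing sketch using the slide's 8-neighborhood model. Comparing each candidate against the running region mean with a fixed tolerance is a simplified stand-in for the mean/standard-deviation criterion; the seed, tolerance, and exact test are assumptions.

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, max_dev=10.0):
    """Grow a region from `seed` = (row, col); a neighbor joins while its gray
    value stays within `max_dev` of the running region mean (8-neighborhood)."""
    h, w = gray.shape
    grown = np.zeros((h, w), dtype=bool)
    sy, sx = seed
    grown[sy, sx] = True
    total, count = float(gray[sy, sx]), 1
    queue = deque([seed])
    nbd8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while queue:
        y, x = queue.popleft()
        for dy, dx in nbd8:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not grown[ny, nx]:
                mean = total / count
                if abs(float(gray[ny, nx]) - mean) <= max_dev:
                    grown[ny, nx] = True
                    total += float(gray[ny, nx])
                    count += 1
                    queue.append((ny, nx))
    return grown
```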

26
Water Inflow Segmentation
  • Fill a gray-level image gradually with water.
  • Gray-levels of pixels are taken as height.
  • The higher the water rises, the more pixels are
    flooded
  • Hence, you have lands and waters
  • Lands correspond to objects
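A toy rendering of the water-inflow idea, assuming an 8-bit gray image where darker pixels are "lower"; the set of water levels swept here is arbitrary.

```python
import numpy as np

def water_inflow(gray, levels=range(0, 256, 16)):
    """Treat gray levels as heights and raise the water step by step.
    At each level, pixels at or below the water line are flooded; the
    remaining 'land' pixels form the object mask at that level."""
    return [(level, gray > level) for level in levels]
```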

27
Object Recognition Layer
  • Features are analyzed to recognize objects and
    faces in an image database.
  • Features are matched with object models stored in
    a knowledge base.
  • Each template is inspected to find the closest
    match.
  • Exact matches are usually impossible and
    generally computationally expensive.
  • Occlusion of objects and the existence of
    spurious features in the image can further
    diminish the success of matching strategies.

28
Template Matching Techniques
  • Fixed Template Matching
  • Useful if object shapes do not change with
    respect to the viewing angle of the camera.
  •  Deformable Template Matching
  • More suitable for cases where objects in the
    database may vary due to rigid and non-rigid
    deformations.

29
Fixed Template Matching
  • Image Subtraction
  • Difference in intensity levels between the image
    and the template is used in object recognition.
  • Performs well in restricted environments where
    imaging conditions (such as image intensity)
    between the image and the template are the same. 
  • Matching by correlation
  • utilizes the position of the normalized
    cross-correlation peak between a template and
    image.
  • Generally immune to noise and illumination
    effects in the image.
  • Suffers from high computational complexity caused
    by summations over the entire template.
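A brute-force sketch of matching by correlation on grayscale arrays; the nested loops over every template position are exactly the summations blamed above for the high computational cost. The function name and the zero-mean normalization are illustrative choices.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Slide the template over the image and return the normalized
    cross-correlation surface; its peak position is the best match."""
    M, N = image.shape
    m, n = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt(np.sum(t * t))
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            patch = image[i:i + m, j:j + n].astype(float)
            p = patch - patch.mean()
            denom = np.sqrt(np.sum(p * p)) * t_norm
            out[i, j] = np.sum(p * t) / denom if denom > 0 else 0.0
    return out

# Best-match location: np.unravel_index(np.argmax(out), out.shape)
```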

30
Deformable Template Matching
  • Template is represented as a bitmap describing
    the characteristic contour/edges of an object
    shape.
  • An objective function with transformation
    parameters which alter the shape of the template
    is formulated reflecting the cost of such
    transformations.
  • The objective function is minimized by
    iteratively updating the transformation
    parameters to best match the object.
  • Applications include handwritten character
    recognition and motion detection of objects in
    video frames. 

31
Prototype System KMeD
  • Medical objects belonging only to patients in a
    small age group are identified automatically in
    KMeD.
  • Such objects have high contrast with respect to
    their background and have relatively simple
    shapes, large sizes, and little or no overlap
    with other objects.
  • KMeD resorts to a human-assisted object
    recognition process otherwise.

32
Demo
  • http://www.cs.washington.edu/research/imagedatabase/demo/cars/ (check car214)

33
Spatial Modeling and Knowledge Representation
Layer (1)
  • Maintain the domain knowledge for representing
    spatial semantics associated with image
    databases.
  • At this level, queries are generally descriptive
    in nature, and focus mostly on semantics and
    concepts present in image databases.
  • Semantics at this level are based on "spatial
    events" describing the relative locations of
    multiple objects.
  • An example involving such semantics is a range
    query that uses spatial concepts such as "close
    by", "in the vicinity", or "larger than" (e.g.,
    retrieve all images that contain a large tumor in
    the brain).

34
Spatial Modeling and Knowledge Representation
Layer (2)
  • Identify spatial relationships among objects,
    once they are recognized and marked by the lower
    layer using bounding boxes or volumes.
  • Several techniques have been proposed to formally
    represent spatial knowledge at this layer.
  • Semantic networks
  • Mathematical logic
  • Constraints
  • Inclusion hierarchies
  • Frames.

35
Semantic Networks
  • First introduced to represent the meanings of
    English sentences in terms of words and
    relationships between them.
  • Semantic networks are graphs of nodes
    representing concepts that are linked together by
    arcs representing relationships between these
    concepts.
  • Efficiency in semantic networks is gained by
    representing each concept or object once and
    using pointers for cross references rather than
    naming an object explicitly every time it is
    involved in a relation.
  • Example: Type Abstraction Hierarchies (KMeD)

36
Brain Lesions Representation
37
TAH Example
38
Constraints-based Methodology
  • Domain knowledge is represented using a set of
    constraints in conjunction with formal
    expressions such as predicate calculus or graphs.
  • A constraint is a relationship between two or
    more objects that needs to be satisfied.

39
Example: The PICTION System
  • Its architecture consists of a natural language
    processing module (NLP), an image understanding
    module (IU), and a control module.
  • A set of constraints is derived by the NLP module
    from the picture captions. These constraints
    (called Visual Semantics by the author) are used
    with the faces recognized in the picture by the
    IU module to identify the spatial relationships
    among people.
  • The control module maintains the constraints
    generated by the NLP module and acts as a
    knowledge-base for the IU module to perform face
    recognition functions.

40
(No Transcript)
41
Mathematical Logic
  • Iconic Indexing by 2D Strings: uses projections
    of salient objects in a coordinate system.
  • These projections are expressed in the form of 2D
    strings to form a partial ordering of object
    projections in 2D.
  • For query processing, 2D subsequence matching is
    performed to allow similarity-based retrieval.
  • Binary Spatial Relations: uses Allen's 13
    temporal relations to represent spatial
    relationships.

42
Inclusion Hierarchies
  • The approach is object-oriented and uses concept
    classes and attributes to represent domain
    knowledge.
  • These concepts may represent image features,
    high-level semantics, semantic operators and
    conditions.

43
Frames
  • A frame usually consists of a name and a list of
    attribute-value pairs.
  • A frame can be associated with a class of objects
    or with a class of concepts.
  • Frame abstractions allow encapsulation of file
    names, features, and relevant attributes of image
    objects.