Multimedia DBs - PowerPoint PPT Presentation

Slides: 30
Provided by: ValuedSony2

Transcript and Presenter's Notes

Title: Multimedia DBs


1
Multimedia DBs
2
Multimedia DBs
  • A multimedia database stores text, strings, and
    images
  • Similarity queries (content-based retrieval)
  • Given an image, find the images in the database
    that are similar (or you can describe the query
    image)
  • Extract features, index in feature space, answer
    similarity queries using GEMINI
  • Again, average values help!

3
Image Features
  • Features extracted from an image are based on:
  • Color distribution
  • Shapes and structure
  • ...

4
Images - color
What is an image? A 2-D RGB array
5
Images - color
Color histograms, and distance function
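
As a rough illustration of the histogram idea, the sketch below quantizes each RGB channel into a few bins and counts the fraction of pixels per color bin. The function name `color_histogram`, the default of 2 bins per channel, and the nested-list image layout are assumptions made for this example; they do not come from the slides.

```python
def color_histogram(image, bins_per_channel=2):
    """Normalized color histogram of a 2-D array of (r, g, b) pixels in 0..255."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    n_pixels = 0
    for row in image:
        for r, g, b in row:
            # Quantize each channel; clamp so the top values stay in the last bin.
            rb = min(r // step, bins_per_channel - 1)
            gb = min(g // step, bins_per_channel - 1)
            bb = min(b // step, bins_per_channel - 1)
            hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
            n_pixels += 1
    return [count / n_pixels for count in hist]

# A tiny 2x2 "image": two reddish pixels, one blue, one black.
tiny = [[(255, 0, 0), (250, 10, 5)],
        [(0, 0, 255), (0, 0, 0)]]
hist = color_histogram(tiny)
```

Each image becomes a fixed-length vector, so a distance function between two images can be defined on these vectors.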
6
Images - color
Mathematically, the distance function between a
vector x and a query q is
$D(x, q) = (x - q)^T A (x - q) = \sum_{i,j} a_{ij} (x_i - q_i)(x_j - q_j)$
What if $A = I$?
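
A minimal sketch of the quadratic-form distance above, written in plain Python. The two matrices are made-up illustrations: `identity` reduces the formula to squared Euclidean distance, while `cross` has non-zero off-diagonal entries to mimic two color bins that are perceptually similar.

```python
def quadratic_form_distance(x, q, A):
    """D(x, q) = sum over i, j of a_ij * (x_i - q_i) * (x_j - q_j)."""
    diff = [xi - qi for xi, qi in zip(x, q)]
    n = len(diff)
    return sum(A[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))

identity = [[1.0, 0.0], [0.0, 1.0]]   # A = I: no interaction between bins
cross = [[1.0, 0.5], [0.5, 1.0]]      # hypothetical similarity between bin 0 and bin 1

x, q = [1.0, 0.0], [0.0, 1.0]
d_identity = quadratic_form_distance(x, q, identity)   # 2.0
d_cross = quadratic_form_distance(x, q, cross)         # 1.0
```

With the off-diagonal terms, two histograms that put their mass in similar bins come out closer than under $A = I$.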
7
Images - color
  • Problem: "cross-talk"
  • Features are not orthogonal ->
  • SAMs (spatial access methods) will not work properly
  • Q: what to do?
  • A: feature-extraction question

8
Images - color
  • Possible answers:
  • avg red, avg green, avg blue
  • it turns out that this lower-bounds the histogram
    distance ->
  • no cross-talk
  • SAMs are applicable
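
The lower-bounding idea can be sketched as a filter-and-refine range query. This example is an assumption-laden stand-in: it uses plain Euclidean distance as the "expensive" distance instead of the histogram distance, and the bound sqrt(n) * |mean(x) - mean(q)| <= ||x - q||, which follows from the Cauchy-Schwarz inequality. The bound for the actual histogram distance is a separate result that is not reproduced here. A real system would also answer the filter step with an index rather than a scan.

```python
import math

def euclidean(x, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, q)))

def avg_lower_bound(x, q):
    """sqrt(n) * |mean(x) - mean(q)|, which never exceeds euclidean(x, q)."""
    n = len(x)
    return math.sqrt(n) * abs(sum(x) / n - sum(q) / n)

def range_query(database, q, eps):
    """Filter with the cheap bound, then refine with the full distance."""
    # No false dismissals: a true answer always passes the filter.
    candidates = [x for x in database if avg_lower_bound(x, q) <= eps]
    return [x for x in candidates if euclidean(x, q) <= eps]

database = [[1, 1, 1, 1], [9, 9, 9, 9], [0, 2, 0, 2], [5, 5, 5, 5]]
query = [1, 1, 1, 2]
hits = range_query(database, query, eps=2.0)
```

Here two of the four vectors are discarded by the average alone, and only the survivors pay for the full distance computation.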

9
Images - color
[Figure: response time vs. selectivity, comparing sequential scan against filtering with avg RGB]
10
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: how to normalize them?

11
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: how to normalize them?
  • A: divide by standard deviation
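
A small sketch of the normalization step, assuming each shape is a row of numeric features. The helper name `scale_by_std` and the sample values are hypothetical; the population standard deviation is used here, which is one of several reasonable choices.

```python
import statistics

def scale_by_std(rows):
    """Divide every feature column by its population standard deviation."""
    columns = list(zip(*rows))
    stds = [statistics.pstdev(col) for col in columns]
    # A constant column (std 0) carries no information; map it to 0.0.
    return [[v / s if s > 0 else 0.0 for v, s in zip(row, stds)] for row in rows]

# Hypothetical (area, perimeter) pairs on very different scales.
shapes = [[100.0, 1.0], [300.0, 3.0], [200.0, 2.0]]
scaled = scale_by_std(shapes)
```

After scaling, both columns have unit standard deviation, so neither feature dominates the Euclidean distance just because of its units.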

12
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: other features / distance functions?

13
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: other features / distance functions?
  • A1: turning angle
  • A2: dilations/erosions
  • A3: ...

14
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: how to do dim. reduction?

15
Images - shapes
  • Distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: how to do dim. reduction?
  • A: Karhunen-Loeve (i.e., centered PCA/SVD)
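
To make the Karhunen-Loeve answer concrete, here is a dependency-free sketch that finds only the first principal direction by power iteration on the covariance matrix of the mean-centered data. This is a simplification chosen for the example, not the method from the slides: in practice a library SVD routine would be used to get several components at once. The fixed start vector is an assumption and fails if it happens to be orthogonal to the top direction; the data must also not be all identical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (column means, top eigenvector of the covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] + [0.0] * (d - 1)          # arbitrary start vector (assumption)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(row, means, v):
    """1-D coordinate of a row along the principal direction."""
    return sum((x - m) * c for x, m, c in zip(row, means, v))

features = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
means, direction = first_principal_component(features)
reduced = [project(r, means, direction) for r in features]
```

For these points, which lie on a line, one coordinate keeps all pairwise distances exactly; a new query is mapped with the same `project` call.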

16
Images - shapes
  • Performance: 10x faster

[Figure: log(# of I/Os) vs. # of features kept, with "all kept" as the reference point]
17
Dimensionality Reduction
  • Many problems (like time-series and image
    similarity) can be expressed as proximity
    problems in a high-dimensional space
  • Given a query point, we try to find the points
    that are close
  • But in high-dimensional spaces things are
    different!

18
Effects of High-dimensionality
  • Assume a uniformly distributed set of points in
    high dimensions, $[0,1]^d$
  • Take a query with side length 0.1 in each
    dimension -> query selectivity in 100-d is $10^{-100}$
  • If we want constant selectivity (0.1), the length
    of the side must be ~1 ($0.1^{1/100} \approx 0.977$)!
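
The arithmetic behind these numbers is short enough to check directly. The function names below are made up for the example; the only assumption is uniformly distributed points in the unit hypercube, as stated on the slide.

```python
def selectivity(side, d):
    """Fraction of uniform points in [0,1]^d that fall in a cube query of this side."""
    return side ** d

def side_for_selectivity(sel, d):
    """Side length a cube query needs in order to capture the fraction `sel`."""
    return sel ** (1.0 / d)

sel_100d = selectivity(0.1, 100)             # about 1e-100
side_100d = side_for_selectivity(0.1, 100)   # about 0.977
```

A query that covers 97.7% of every axis still returns only 10% of the data in 100 dimensions.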

19
Effects of High-dimensionality
  • Surface is everything!
  • Probability that a point is closer than 0.1 to a
    (d-1)-dimensional surface: $1 - 0.8^d$
  • d = 2: 0.36
  • d = 10: ~1 (more precisely, 0.89)
  • d = 100: ~1
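
These probabilities follow from a one-line formula: a uniform point in the unit hypercube avoids the boundary band only if every coordinate lies in [0.1, 0.9], which happens with probability 0.8^d. The function name is an assumption for this sketch.

```python
def prob_near_surface(d, margin=0.1):
    """P(a uniform point in [0,1]^d is within `margin` of some face)."""
    return 1.0 - (1.0 - 2.0 * margin) ** d

p2 = prob_near_surface(2)      # about 0.36
p10 = prob_near_surface(10)    # about 0.89
p100 = prob_near_surface(100)  # indistinguishable from 1 for practical purposes
```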

20
Effects of High-dimensionality
  • Number of grid cells and surfaces
  • Number of k-dimensional surfaces in a
    d-dimensional hypercube: $\binom{d}{k} 2^{d-k}$
  • Binary partitioning -> $2^d$ cells
  • Indexing in high dimensions is extremely
    difficult: the "curse of dimensionality"
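
A quick way to see how fast these counts grow is to evaluate the standard combinatorial formula for the number of k-dimensional faces of a d-dimensional hypercube, C(d, k) * 2^(d-k). The function names are chosen for this sketch only.

```python
from math import comb

def num_faces(d, k):
    """Number of k-dimensional faces of a d-dimensional hypercube."""
    return comb(d, k) * 2 ** (d - k)

def num_cells(d):
    """Cells produced by splitting every dimension once (binary partitioning)."""
    return 2 ** d

cube = [num_faces(3, k) for k in range(3)]   # vertices, edges, faces of a 3-D cube
cells_100d = num_cells(100)
```

For the ordinary cube this gives 8 vertices, 12 edges, and 6 faces; in 100 dimensions a single binary split per axis already yields more than 10^30 cells.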

21
Dimensionality Reduction
  • The main idea: reduce the dimensionality of the
    space.
  • Project the d-dimensional points into a
    k-dimensional space so that
  • k << d
  • distances are preserved as well as possible
  • Solve the problem in low dimensions
  • (the GEMINI idea, of course)

22
DR requirements
  • The ideal mapping should
  • Be fast to compute: O(N) or O(N log N), but not
    $O(N^2)$
  • Preserve distances, leading to small discrepancies
  • Provide a fast algorithm to map a new query (why?)

23
MDS (multidimensional scaling)
  • Input: a set of N items, the pair-wise (dis)similarities,
    and the dimensionality k
  • Optimization criterion:
  • $\text{stress} = \left( \sum_{i,j} \left( D(S_i, S_j) - D(S_i^k, S_j^k) \right)^2 / \sum_{i,j} D(S_i, S_j)^2 \right)^{1/2}$
  • where $D(S_i, S_j)$ is the distance between time
    series $S_i$, $S_j$, and $D(S_i^k, S_j^k)$ is the Euclidean
    distance of the k-dim representations
  • Steepest descent algorithm:
  • start with an assignment (time series to k-dim
    point)
  • minimize stress by moving points
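
The stress criterion can be evaluated with a few lines of Python. This sketch only computes the stress of a given assignment; the steepest-descent loop that moves points to reduce it is not shown. The distance matrix and the two candidate embeddings are invented for the example, and the sums run over pairs i < j, which gives the same ratio as summing over all ordered pairs when the distances are symmetric.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def stress(dist, points):
    """dist[i][j]: original dissimilarity; points[i]: k-dim image of item i."""
    n = len(points)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += (dist[i][j] - euclidean(points[i], points[j])) ** 2
            den += dist[i][j] ** 2
    return math.sqrt(num / den)

# Three items whose pairwise distances form a 3-4-5 right triangle.
dist = [[0, 3, 4],
        [3, 0, 5],
        [4, 5, 0]]
stress_2d = stress(dist, [(0, 0), (3, 0), (0, 4)])   # exact embedding
stress_1d = stress(dist, [(0,), (3,), (4,)])         # one distance is distorted
```

The 2-D assignment reproduces every distance, so its stress is zero; squeezing the items onto a line preserves two distances and distorts the third.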

24
MDS
  • Disadvantages:
  • Running time is $O(N^2)$, because of slow
    convergence
  • Also, it requires O(N) time to insert a new point,
    which is not practical for queries

25
FastMap [Faloutsos and Lin, 1995]
  • Maps objects to k-dimensional points so that
    distances are preserved well
  • It is an approximation of Multidimensional
    Scaling
  • Works even when only distances are known
  • Is efficient, and allows efficient query
    transformation

26
FastMap
  • Find two objects that are far away from each other
  • Project all points on the line the two objects
    define, to get the first coordinate
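
The two steps above can be sketched using only a distance function. The projection uses the law of cosines: for pivots a and b, the coordinate of object o is (d(a,o)^2 + d(a,b)^2 - d(b,o)^2) / (2 d(a,b)). The pivot search is a simple "hop to the farthest object" heuristic; the function names, the number of rounds, and the 1-D sample objects are assumptions for this illustration, and the pivots must be distinct so that d(a,b) > 0.

```python
def choose_pivots(objects, dist, rounds=3):
    """Heuristic: repeatedly jump to the object farthest from the current one."""
    a = b = objects[0]
    for _ in range(rounds):
        a, b = b, max(objects, key=lambda o: dist(b, o))
    return a, b

def first_coordinate(obj, a, b, dist):
    """Projection of obj on the line through pivots a and b (law of cosines)."""
    d_ab = dist(a, b)
    return (dist(a, obj) ** 2 + d_ab ** 2 - dist(b, obj) ** 2) / (2.0 * d_ab)

# Hypothetical objects: numbers on a line, with |x - y| as the distance.
objects = [3, 0, 10]
dist = lambda x, y: abs(x - y)

a, b = choose_pivots(objects, dist)
coords = [first_coordinate(o, a, b, dist) for o in objects]
```

Only distances are used, so the same code works for objects that have no coordinates at all; a new query is mapped by calling `first_coordinate` with the stored pivots.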

27
FastMap - next iteration
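
A hedged sketch of what the next iteration needs: once a coordinate has been assigned, distances are recomputed on the hyperplane perpendicular to the pivot line via d'(x, y)^2 = d(x, y)^2 - (c(x) - c(y))^2, and the same pivot-and-project step is repeated with d'. The helper name `residual_distance` and the sample coordinate function are assumptions for this example.

```python
import math

def residual_distance(dist, coord):
    """Build the distance function for the next iteration.

    dist:  current distance function on objects
    coord: function returning the coordinate just assigned to an object
    """
    def new_dist(x, y):
        sq = dist(x, y) ** 2 - (coord(x) - coord(y)) ** 2
        # Clamp at zero: non-Euclidean inputs or rounding can make sq negative.
        return math.sqrt(max(sq, 0.0))
    return new_dist

# Hypothetical case: 2-D points whose first assigned coordinate is their x value.
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
coord = lambda p: p[0]
next_dist = residual_distance(dist, coord)
remaining = next_dist((0.0, 0.0), (3.0, 4.0))
```

Here the original distance is 5 and the first coordinates differ by 3, so the distance left for the second coordinate to explain is 4.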
28
Results
Documents / cosine similarity -> Euclidean
distance (how?)
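
One standard answer to the "how?" is to normalize every document vector to unit length: for unit vectors, the squared Euclidean distance equals 2 * (1 - cosine similarity), so ranking by one is equivalent to ranking by the other. The slides do not spell out their answer, so treat this as an assumption; the sample vectors are invented.

```python
import math

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term-frequency vectors for two documents.
u, v = [1.0, 2.0, 0.0], [2.0, 1.0, 1.0]
cos_uv = cosine_similarity(u, v)
d_uv = euclidean(normalize(u), normalize(v))
# d_uv ** 2 matches 2 * (1 - cos_uv) up to floating-point rounding.
```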