Multimedia DBs - PowerPoint PPT Presentation

Title:

Multimedia DBs

Description:

Extract features, index in feature space, answer similarity queries using GEMINI. Again, average values help! (Used QBIC IBM Almaden) Image Features ... – PowerPoint PPT presentation

Slides: 37
Provided by: ValuedSony2
Learn more at: https://www.cs.bu.edu

Transcript and Presenter's Notes


1
Multimedia DBs
2
Multimedia dbs
  • A multimedia database stores text, strings and
    images
  • Similarity queries (content based retrieval)
  • Given an image find the images in the database
    that are similar (or you can describe the query
    image)
  • Extract features, index in feature space, answer
    similarity queries using GEMINI
  • Again, average values help!
  • (Used in QBIC, IBM Almaden)

3
Image Features
  • Features extracted from an image are based on:
  • Color distribution
  • Shapes and structure
  • ...

4
Images - color
What is an image? A: a 2-d RGB array
5
Images - color
Color histograms, and distance function
6
Images - color
Mathematically, the distance function between a
vector x and a query q is
$D(x, q) = (x - q)^T A (x - q) = \sum_{i,j} a_{ij} (x_i - q_i)(x_j - q_j)$
$A = I$ ?
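
A minimal sketch of the quadratic-form distance above, in plain Python. The 3-bin histograms and the matrix `cross_talk` are made-up illustration values, not taken from QBIC. With A = I the distance reduces to the squared Euclidean distance; off-diagonal entries of A let similar colors partially cancel.

```python
def quadratic_form_distance(x, q, A):
    """Return (x - q)^T A (x - q) for equal-length vectors x, q and a square matrix A."""
    diff = [xi - qi for xi, qi in zip(x, q)]
    n = len(diff)
    return sum(A[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))


# Hypothetical 3-bin color histograms (illustration only).
x = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Assumed similarity matrix: a_ij > 0 for i != j models cross-talk between bins.
cross_talk = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]]

d_identity = quadratic_form_distance(x, q, identity)   # squared Euclidean distance
d_cross = quadratic_form_distance(x, q, cross_talk)    # smaller: bins 0 and 2 are "similar"
```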
7
Images - color
  • Problem: cross-talk
  • Features are not orthogonal ->
  • SAMs (spatial access methods) will not work properly
  • Q: what to do?
  • A: it is a feature-extraction question

8
Images - color
  • possible answers:
  • avg red, avg green, avg blue
  • it turns out that this lower-bounds the histogram
    distance ->
  • no cross-talk
  • SAMs are applicable
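
The filter-and-refine idea behind the avg-RGB features can be sketched as follows. This is an illustration under stated assumptions: the expensive distance here is a stand-in (RMS per-pixel distance), not QBIC's histogram distance, chosen because the avg-RGB distance provably lower-bounds it. The function names and the tiny 2-pixel images are hypothetical.

```python
import math


def avg_rgb(pixels):
    """3-d feature vector: average red, green, blue of a list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def cheap_distance(f1, f2):
    """Euclidean distance between two avg-RGB feature vectors."""
    return math.dist(f1, f2)


def expensive_distance(img1, img2):
    """Stand-in for the costly distance: RMS per-pixel distance (same pixel count)."""
    n = len(img1)
    return math.sqrt(sum(math.dist(p, r) ** 2 for p, r in zip(img1, img2)) / n)


def range_query(query, database, features, eps):
    """Return (indices within eps of query, number of expensive computations)."""
    qf = avg_rgb(query)
    hits, refined = [], 0
    for idx, img in enumerate(database):
        if cheap_distance(qf, features[idx]) > eps:
            continue  # safe to discard: cheap distance <= expensive distance
        refined += 1
        if expensive_distance(query, img) <= eps:
            hits.append(idx)
    return hits, refined


database = [
    [(255, 0, 0), (255, 0, 0)],    # red
    [(250, 10, 0), (240, 0, 10)],  # nearly red
    [(0, 0, 255), (0, 0, 255)],    # blue
]
features = [avg_rgb(img) for img in database]
query = [(255, 0, 0), (245, 5, 5)]
hits, refined = range_query(query, database, features, eps=20.0)
```

Because the cheap distance never exceeds the expensive one, the filter step produces no false dismissals; it only lets through some false alarms that the refine step removes.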

9
Images - color
[Figure: performance plot, time vs. selectivity, comparing sequential scan with avg-RGB filtering]
11
Images - shapes
  • distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • (Q: how to normalize them?
  • A: divide by standard deviation)
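
A small sketch of the normalization step, assuming each shape is a row of numeric features. The three rows below are invented values (area, perimeter, one moment) used only to show the scaling; they are not from the slides.

```python
import math
import statistics


def normalize_columns(rows):
    """Divide every feature (column) by its population standard deviation."""
    stdevs = [statistics.pstdev(col) for col in zip(*rows)]
    return [[v / s if s > 0 else 0.0 for v, s in zip(row, stdevs)] for row in rows]


# Hypothetical shape features: [area, perimeter, moment].
shapes = [[100.0, 40.0, 0.2], [400.0, 80.0, 0.4], [900.0, 120.0, 0.9]]
normalized = normalize_columns(shapes)

# After scaling, no single feature dominates the Euclidean distance.
d01 = math.dist(normalized[0], normalized[1])
```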

13
Images - shapes
  • distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • (Q: other features / distance functions?
  • A1: turning angle
  • A2: dilations/erosions
  • A3: ... )

15
Images - shapes
  • distance function: Euclidean, on the area,
    perimeter, and 20 moments
  • Q: how to do dim. reduction?
  • A: Karhunen-Loeve (centered PCA/SVD)
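
To make the Karhunen-Loeve answer concrete, here is a dependency-free sketch that centers the data, builds the covariance matrix, and finds the first principal axis by power iteration. Power iteration is a swapped-in technique chosen to avoid external libraries; a real system would more likely call an SVD routine. The sample points are invented.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]


def top_component(cov, iterations=200):
    """Dominant eigenvector of cov by power iteration (unit length)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumes the start vector is not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Hypothetical 2-d points lying close to the line y = 2x.
points = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered = center(points)
axis = top_component(covariance(centered))

# Reduce each point to one coordinate: its projection on the principal axis.
reduced = [sum(a * x for a, x in zip(axis, row)) for row in centered]
```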

16
Images - shapes
  • Performance: 10x faster

[Figure: log(# of I/Os) vs. # of features kept; one point is labeled "all kept"]
17
Dimensionality Reduction
  • Many problems (like time-series and image
    similarity) can be expressed as proximity
    problems in a high dimensional space
  • Given a query point we try to find the points
    that are close
  • But in high-dimensional spaces things are
    different!

18
Effects of High-dimensionality
  • Assume a uniformly distributed set of points in
    high dimensions, $[0,1]^d$
  • Let's have a query with side length 0.1 in each
    dimension -> query selectivity in 100-d is $10^{-100}$
  • If we want constant selectivity (0.1), the length
    of the side must be ~1! ($0.1^{1/100} \approx 0.977$)
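
The two numbers on this slide follow from one formula: for uniform data in the unit hypercube, a hypercube query with side s selects a fraction s^d of the points. A quick check (function names are illustrative):

```python
def selectivity(side, d):
    """Fraction of uniform points in [0,1]^d covered by a hypercube query of the given side."""
    return side ** d


def side_for_selectivity(sel, d):
    """Side length a hypercube query needs in order to cover the fraction sel."""
    return sel ** (1.0 / d)


sel_100d = selectivity(0.1, 100)             # about 1e-100
side_100d = side_for_selectivity(0.1, 100)   # about 0.977
```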

19
Effects of High-dimensionality
  • Surface is everything!
  • Probability that a point is closer than 0.1 to a
    (d-1)-dimensional surface: $1 - 0.8^d$
  • d = 2: 0.36
  • d = 10: ~1 (more precisely, about 0.89)
  • d = 100: ~1
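
These probabilities can be reproduced with a one-line formula, assuming uniform data in the unit hypercube: a point is farther than eps from every face only if all d coordinates fall in [eps, 1 - eps].

```python
def near_surface_probability(d, eps=0.1):
    """Probability that a uniform point in [0,1]^d lies within eps of some face."""
    return 1.0 - (1.0 - 2.0 * eps) ** d


p2 = near_surface_probability(2)      # 0.36
p10 = near_surface_probability(10)    # about 0.89
p100 = near_surface_probability(100)  # indistinguishable from 1 for practical purposes
```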

20
Effects of High-dimensionality
  • Number of grid cells and surfaces
  • Number of k-dimensional surfaces in a
    d-dimensional hypercube
  • Binary partitioning -> $2^d$ cells
  • Indexing in high dimensions is extremely
    difficult: the "curse of dimensionality"
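
The counts behind this slide can be computed directly. The formula for k-dimensional faces did not survive in the transcript; the sketch below uses the standard combinatorial result C(d, k) * 2^(d-k), which is supplied here as background rather than taken from the slide.

```python
import math


def k_faces(d, k):
    """Number of k-dimensional faces of a d-dimensional hypercube: C(d, k) * 2^(d - k)."""
    return math.comb(d, k) * 2 ** (d - k)


def binary_partition_cells(d):
    """Cells produced by splitting every dimension once."""
    return 2 ** d


cube = [k_faces(3, k) for k in range(3)]   # vertices, edges, faces of an ordinary cube
cells_100d = binary_partition_cells(100)   # far more cells than any data set has points
```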

21
X-tree
  • Performance impacted by the amount of overlap
    between index nodes
  • Need to follow different paths
  • Overlap, multi-overlap, weighted overlap
  • R-tree when overlap is small
  • Sequential access when overlap is large
  • When an overflow occurs
  • Split into two nodes if overlap is small
  • Otherwise create a super-node with twice the
    capacity
  • Tradeoffs made locally over different regions of
    data space
  • No performance comparisons with linear scan!

22
Pyramid Tree
  • Designed for Range queries
  • Map each d-dimensional point to 1-d value
  • Build B-tree on 1-d values
  • A range query is transformed into a set of 1-d
    ranges
  • More efficient than X-tree, Hilbert order, and
    sequential scan

23
Pyramid transformation
  • 2d pyramids (two per dimension) with top at the
    center of the data space
  • points in different pyramids are ordered based on
    pyramid id
  • points within a pyramid are ordered based on
    height
  • value(v) = pyramid(v) + height(v)

24
Vector Approximation (VA) file
  • Tile d-dimensional data-space uniformly
  • A fixed number of bits in each dimension (8)
  • 256 partitions along each dimension
  • $256^d$ tiles
  • Approximate each point by corresponding tile
  • size of approximation = 8d bits = d bytes
  • size of each point = 4d bytes (assuming a word
    per dimension)
  • 2-step approach, the first using VA file
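
A minimal sketch of building one approximation and deriving distance bounds from it, assuming points in the unit hypercube, 8 bits per dimension, and Euclidean distance. The function names are illustrative, not from the VA-file paper.

```python
import math

BITS = 8
CELLS = 1 << BITS  # 256 partitions along each dimension


def approximate(point):
    """One byte per dimension: the index of the cell that contains each coordinate."""
    return bytes(min(int(x * CELLS), CELLS - 1) for x in point)


def bounds(q, approx):
    """Lower and upper bound on the Euclidean distance from q to any point in the tile."""
    lb_sq = ub_sq = 0.0
    for qi, cell in zip(q, approx):
        lo, hi = cell / CELLS, (cell + 1) / CELLS
        near = max(lo - qi, qi - hi, 0.0)          # 0 if q falls inside the cell's interval
        far = max(abs(qi - lo), abs(qi - hi))
        lb_sq += near * near
        ub_sq += far * far
    return math.sqrt(lb_sq), math.sqrt(ub_sq)


point = [0.1, 0.7, 0.33]
q = [0.9, 0.2, 0.5]
a = approximate(point)          # 3 bytes instead of 3 words
lb, ub = bounds(q, a)
```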

25
Simple NN searching
  • d = distance to kth NN so far
  • For each approximation a_i
  •   If lb(q, a_i) < d then
  •     Compute r = distance(q, v_i)
  •     If r < d then
  •       Add point i to the set of NNs
  •       Update d
  • Performance depends on the ordering of the vectors
    and their approximations
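
The loop above can be sketched in Python as follows, assuming points in the unit hypercube, 256 cells per dimension, and Euclidean distance. The helper names and the four sample vectors are hypothetical; `fetched` counts how many full vectors had to be read.

```python
import heapq
import math


def cell_of(x, cells=256):
    """Index of the cell containing coordinate x in [0, 1]."""
    return min(int(x * cells), cells - 1)


def lower_bound(q, approx, cells=256):
    """Smallest possible Euclidean distance from q to a point in the approximated tile."""
    total = 0.0
    for qi, c in zip(q, approx):
        gap = max(c / cells - qi, qi - (c + 1) / cells, 0.0)
        total += gap * gap
    return math.sqrt(total)


def simple_knn(q, vectors, approximations, k):
    """Scan all approximations; fetch a vector only if its lower bound beats d."""
    best = []        # max-heap of the k best so far, stored as (-distance, index)
    d = math.inf     # distance to the kth NN so far
    fetched = 0
    for i, a in enumerate(approximations):
        if lower_bound(q, a) < d:
            fetched += 1
            r = math.dist(q, vectors[i])
            if r < d:
                heapq.heappush(best, (-r, i))
                if len(best) > k:
                    heapq.heappop(best)
                if len(best) == k:
                    d = -best[0][0]
    return sorted((-nd, i) for nd, i in best), fetched


vectors = [[0.1, 0.1], [0.9, 0.9], [0.15, 0.12], [0.5, 0.5]]
approximations = [[cell_of(x) for x in v] for v in vectors]
result, fetched = simple_knn([0.12, 0.1], vectors, approximations, k=1)
```

In this run only the first vector is fetched, because it happens to be the nearest; with a less favorable ordering more vectors would be read, which is the dependence on ordering noted on the slide.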

26
Near-optimal NN searching
  • d = kth smallest ub(q, a) so far
  • For each approximation a_i
  •   Compute lb(q, a_i) and ub(q, a_i)
  •   If lb(q, a_i) < d then
  •     If ub(q, a_i) < d then
  •       Add point i to the set of NNs
  •       Update d
  •     InsertHeap(Heap, lb(q, a_i), i)

27
Near-optimal NN searching (2)
  • d = distance to kth NN so far
  • Repeat
  •   Examine the next entry (l_i, i) from the heap
  •   If d < l_i then break
  •   Else
  •     Compute r = distance(q, v_i)
  •     If r < d then
  •       Add point i to the set of NNs
  •       Update d
  • Forever
  • Only a sub-linear (log n) number of vectors remain
    after the first phase
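
Both phases of the near-optimal search can be sketched together. The code assumes points in the unit hypercube, 256 cells per dimension, and Euclidean distance; the helper names and the randomly generated data are illustrative only.

```python
import heapq
import math
import random


def cell_bounds(q, approx, cells=256):
    """Lower and upper bound on the distance from q to any point in the tile."""
    lb_sq = ub_sq = 0.0
    for qi, c in zip(q, approx):
        lo, hi = c / cells, (c + 1) / cells
        near = max(lo - qi, qi - hi, 0.0)
        far = max(abs(qi - lo), abs(qi - hi))
        lb_sq += near * near
        ub_sq += far * far
    return math.sqrt(lb_sq), math.sqrt(ub_sq)


def near_optimal_knn(q, vectors, approximations, k):
    # Phase 1: scan approximations only; d is the kth smallest upper bound so far.
    ubs, candidates, d = [], [], math.inf
    for i, a in enumerate(approximations):
        lb, ub = cell_bounds(q, a)
        if lb < d:
            if ub < d:
                heapq.heappush(ubs, -ub)          # max-heap via negation
                if len(ubs) > k:
                    heapq.heappop(ubs)
                if len(ubs) == k:
                    d = -ubs[0]
            heapq.heappush(candidates, (lb, i))   # min-heap ordered by lower bound
    # Phase 2: visit candidates in increasing lower-bound order; d is the kth NN distance.
    best, d, fetched = [], math.inf, 0
    while candidates:
        lb, i = heapq.heappop(candidates)
        if d < lb:
            break                                  # no remaining candidate can improve the result
        fetched += 1
        r = math.dist(q, vectors[i])
        if r < d:
            heapq.heappush(best, (-r, i))
            if len(best) > k:
                heapq.heappop(best)
            if len(best) == k:
                d = -best[0][0]
    return sorted((-nd, i) for nd, i in best), fetched


rng = random.Random(0)  # fixed seed so the illustration is repeatable
vectors = [[rng.random() for _ in range(4)] for _ in range(200)]
approximations = [[min(int(x * 256), 255) for x in v] for v in vectors]
query = [0.5, 0.5, 0.5, 0.5]
result, fetched = near_optimal_knn(query, vectors, approximations, k=3)
```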

28
SS-tree and SR-tree
  • Use spheres for index nodes (SS-tree)
  • Higher fanout since storage cost is reduced
  • Use rectangles and spheres for index nodes (SR-tree)
  • Index node defined by the intersection of two
    volumes
  • More accurate representation of data
  • Higher storage cost

29
Metric Tree (M-tree)
  • Definition of a metric
  • $d(x,y) \ge 0$
  • $d(x,y) = d(y,x)$
  • $d(x,y) + d(y,z) \ge d(x,z)$
  • $d(x,x) = 0$
  • Non-vector spaces
  • Edit distance
  • $d(u,v) = \sqrt{(u-v)^T A (u-v)}$, used in QBIC
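
As an illustration of a metric on a non-vector space, the sketch below implements the standard edit (Levenshtein) distance and spot-checks the four properties on a small sample. Passing the check on a sample is evidence, not a proof; the helper `is_metric_on` and the word list are hypothetical.

```python
from itertools import product


def edit_distance(s, t):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]


def is_metric_on(sample, d):
    """Check non-negativity, symmetry, identity and triangle inequality on a finite sample."""
    for x, y, z in product(sample, repeat=3):
        if d(x, y) < 0 or d(x, y) != d(y, x) or d(x, x) != 0:
            return False
        if d(x, y) + d(y, z) < d(x, z):
            return False
    return True


words = ["kitten", "sitting", "mitten", "", "sit"]
edit_ok = is_metric_on(words, edit_distance)

# Counter-example: the squared difference violates the triangle inequality.
squared_ok = is_metric_on([0, 1, 2], lambda a, b: (a - b) ** 2)
```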

30
Basic idea
[Figure: parent node p with index entries (x, d(x,p), r(x)) and (y, d(y,p), r(y)); an object z in the subtree of y satisfies $d(y,z) \le r(y)$]
Index entry = (routing object, distance to parent, covering radius)
All objects in a subtree are within a distance of the
covering radius from the routing object.
31
Range queries
[Figure: parent node p with index entries (x, d(x,p), r(x)) and (y, d(y,p), r(y)); query q with range t; object z in the subtree of y]
$d(q,z) \ge d(q,y) - d(y,z)$
$d(y,z) \le r(y)$
So, $d(q,z) \ge d(q,y) - r(y)$
If $d(q,y) - r(y) > t$ then $d(q,z) > t$
Prune subtree y if $d(q,y) - r(y) > t$ (C1)
32
Range queries
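[Figure: parent p, entries x and y, query q with range t, object z in the subtree of y]
Prune subtree y if $d(q,y) - r(y) > t$ (C1)
$d(q,y) \ge d(q,p) - d(p,y)$
$d(q,y) \ge d(p,y) - d(q,p)$
So, $d(q,y) \ge |d(q,p) - d(p,y)|$
If $|d(q,p) - d(p,y)| - r(y) > t$ then $d(q,y) - r(y) > t$
Prune subtree y if $|d(q,p) - d(p,y)| - r(y) > t$ (C2)
C2 uses only the stored distance d(p,y) and the already computed d(q,p), so it needs no new distance computation.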
33
Range query algorithm
  • RQ(q, t, Root, Subtrees S1, S2, ...)
  • For each subtree Si
  •   prune if condition C2 holds
  •   otherwise compute distance to root of Si and
      prune if condition C1 holds
  •   otherwise search the children of Si
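
The range query algorithm with both pruning conditions can be sketched on a tiny hand-built tree. Everything concrete here is an assumption for illustration: objects are plain numbers, the metric is the absolute difference, the `Entry` class is a simplified stand-in for M-tree nodes, and the tree is constructed by hand rather than by M-tree insertion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    obj: float                    # routing object, or a data object in a leaf
    dist_to_parent: float         # stored d(obj, parent routing object)
    radius: float = 0.0           # covering radius r(obj); 0 for data objects
    children: List["Entry"] = field(default_factory=list)


def dist(a, b):
    """Stand-in metric: absolute difference of two numbers."""
    return abs(a - b)


def range_query(q, t, entries, d_q_parent: Optional[float], counter):
    """Return all data objects within distance t of q; counter[0] counts distance computations."""
    results = []
    for e in entries:
        # C2: prune using only stored and already computed distances.
        if d_q_parent is not None and abs(d_q_parent - e.dist_to_parent) - e.radius > t:
            continue
        d_q_e = dist(q, e.obj)
        counter[0] += 1
        # C1: prune using the distance to the routing object itself.
        if d_q_e - e.radius > t:
            continue
        if e.children:
            results.extend(range_query(q, t, e.children, d_q_e, counter))
        else:
            results.append(e.obj)
    return results


# Two subtrees with routing objects 10 and 50 (top-level entries have no parent).
roots = [
    Entry(10.0, 0.0, 3.0, [Entry(8.0, 2.0), Entry(10.0, 0.0), Entry(13.0, 3.0)]),
    Entry(50.0, 0.0, 5.0, [Entry(46.0, 4.0), Entry(50.0, 0.0), Entry(55.0, 5.0)]),
]
counter = [0]
found = range_query(12.0, 1.5, roots, None, counter)
```

In this run the data object 10 is discarded by C2 without computing its distance to q, and the whole subtree rooted at 50 is discarded by C1 after a single distance computation.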

34
Nearest neighbor query
  • Maintain a priority list of k NN distances
  • Minimum distance to a subtree with root x:
    $d_{min}(q,x) = \max(d(q,x) - r(x), 0)$
  • $|d(q,p) - d(p,x)| - r(x) \le d(q,x) - r(x)$
  • may not need to compute d(q,x)
  • Maximum distance to a subtree with root x:
    $d_{max}(q,x) = d(q,x) + r(x)$

[Figure: query q, routing object x with covering radius r(x), and an object z in the subtree of x]
$d(q,z) + r(x) \ge d(q,x)$, so $d(q,z) \ge d(q,x) - r(x)$
$d(q,z) \le d(q,x) + r(x)$
35
Nearest neighbor query
  • Maintain an estimate $d_p$ of the kth smallest
    maximum distance
  • Prune a subtree x if $d_{min}(q,x) > d_p$
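
A simplified best-first sketch of the nearest neighbor search. It departs from the slide in one labeled way: d_p is taken as the kth smallest distance among data objects already found, rather than an estimate built from subtree maximum distances. Objects are plain numbers with the absolute difference as the metric, and the `Entry` class and tree are hypothetical, hand-built stand-ins.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    obj: float                    # routing object, or a data object in a leaf
    radius: float = 0.0           # covering radius; 0 for data objects
    children: List["Entry"] = field(default_factory=list)


def d_min(q, e):
    """Smallest possible distance from q to anything stored under entry e."""
    return max(abs(q - e.obj) - e.radius, 0.0)


def knn(q, k, roots):
    tie = itertools.count()       # tie-breaker so the heap never compares Entry objects
    queue = [(d_min(q, e), next(tie), e) for e in roots]
    heapq.heapify(queue)
    best, d_p = [], math.inf      # best: max-heap of (-distance, object)
    while queue:
        bound, _, e = heapq.heappop(queue)
        if bound > d_p:
            break                 # every remaining entry has an even larger d_min
        if e.children:
            for c in e.children:
                b = d_min(q, c)
                if b <= d_p:      # prune subtrees with d_min > d_p
                    heapq.heappush(queue, (b, next(tie), c))
        else:
            heapq.heappush(best, (-bound, e.obj))   # radius 0, so bound == d(q, obj)
            if len(best) > k:
                heapq.heappop(best)
            if len(best) == k:
                d_p = -best[0][0]
    return sorted((-nd, o) for nd, o in best)


roots = [
    Entry(10.0, 3.0, [Entry(8.0), Entry(10.0), Entry(13.0)]),
    Entry(50.0, 5.0, [Entry(46.0), Entry(50.0), Entry(55.0)]),
]
neighbors = knn(12.0, 2, roots)
```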

36
References
  • Christos Faloutsos, Ron Barber, Myron Flickner,
    Jim Hafner, Wayne Niblack, Dragutin Petkovic,
    William Equitz: Efficient and Effective Querying
    by Image Content. JIIS 3(3/4): 231-262 (1994)
  • Stefan Berchtold, Daniel A. Keim, Hans-Peter
    Kriegel: The X-tree: An Index Structure for
    High-Dimensional Data. VLDB 1996: 28-39
  • Stefan Berchtold, Christian Böhm, Hans-Peter
    Kriegel: The Pyramid-Technique: Towards Breaking
    the Curse of Dimensionality. SIGMOD Conference
    1998: 142-153
  • Roger Weber, Hans-Jörg Schek, Stephen Blott: A
    Quantitative Analysis and Performance Study for
    Similarity-Search Methods in High-Dimensional
    Spaces. VLDB 1998: 194-205
  • Paolo Ciaccia, Marco Patella, Pavel Zezula:
    M-tree: An Efficient Access Method for Similarity
    Search in Metric Spaces. VLDB 1997: 426-435