Title: Multimedia DBs
1. Multimedia DBs
2. Multimedia DBs
- A multimedia database stores text, strings, and images
- Similarity queries (content-based retrieval)
  - Given a query image, find the images in the database that are similar (or describe the query image)
- Extract features, index them in feature space, and answer similarity queries using GEMINI
  - Again, average values help!
- (Used in QBIC, IBM Almaden)
3. Image Features
- Features extracted from an image are based on
  - Color distribution
  - Shapes and structure
  - ...
4. Images - color
- What is an image? A 2-d array of RGB values
5. Images - color
- Color histograms, and a distance function between them
6. Images - color
- Mathematically, the distance function between a vector x and a query q is
  D(x, q) = (x - q)^T A (x - q) = Σ_{i,j} a_ij (x_i - q_i)(x_j - q_j)
- A = I ?
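As a concrete illustration (not part of the original slides), here is a minimal sketch of the quadratic-form distance above; the 3-bin histograms and the similarity matrix A are made-up values.

```python
import numpy as np

def histogram_distance(x, q, A):
    """Quadratic-form distance D(x, q) = (x - q)^T A (x - q)."""
    diff = x - q
    return float(diff @ A @ diff)

# Toy 3-bin color histograms and a made-up color-similarity matrix A;
# A = I would reduce this to plain (squared) Euclidean distance.
x = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
A = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
print(histogram_distance(x, q, A))
```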
7. Images - color
- Problem: cross-talk
  - Features are not orthogonal ->
  - SAMs (spatial access methods) will not work properly
- Q: what to do?
- A: it is a feature-extraction question
8. Images - color
- Possible answers:
  - avg red, avg green, avg blue
- It turns out that this lower-bounds the histogram distance ->
  - no cross-talk
  - SAMs are applicable (see the filter-and-refine sketch below)
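A minimal sketch of the resulting GEMINI-style filter-and-refine search. For simplicity it uses a pixel-wise Euclidean distance as a stand-in for the expensive histogram distance; the key point is only that the cheap avg-RGB distance lower-bounds the expensive one, so the filter never discards a true answer. All data and function names are illustrative.

```python
import numpy as np

def avg_rgb(img):
    """Average R, G, B over all pixels -> a 3-d feature vector."""
    return np.asarray(img, dtype=float).reshape(-1, 3).mean(axis=0)

def avg_rgb_dist(a, b):
    """Cheap filter distance on the 3-d averages; it lower-bounds full_dist."""
    return float(np.linalg.norm(avg_rgb(a) - avg_rgb(b)))

def full_dist(a, b):
    """Expensive distance (here: pixel-wise Euclidean, standing in for the
    quadratic-form histogram distance of the previous slides)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def range_query(query, database, threshold):
    """GEMINI-style filter-and-refine: because the filter distance never
    exceeds the full distance, no true answer is discarded."""
    hits = []
    for i, img in enumerate(database):
        if avg_rgb_dist(query, img) <= threshold:    # filter step (cheap, indexable)
            if full_dist(query, img) <= threshold:   # refinement step (expensive)
                hits.append(i)
    return hits

# Tiny synthetic "images": 2x2 RGB arrays with made-up values.
rng = np.random.default_rng(0)
db = [rng.random((2, 2, 3)) for _ in range(5)]
print(range_query(db[0], db, threshold=0.5))
```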
9. Images - color
[Plot: response time vs. selectivity, comparing sequential scan with the avg-RGB method]
10. Images - shapes
- Distance function: Euclidean, on the area, perimeter, and 20 moments
- (Q: how to normalize them?
11. Images - shapes
- Distance function: Euclidean, on the area, perimeter, and 20 moments
- (Q: how to normalize them?
- A: divide by the standard deviation)
12. Images - shapes
- Distance function: Euclidean, on the area, perimeter, and 20 moments
- (Q: other features / distance functions?
13. Images - shapes
- Distance function: Euclidean, on the area, perimeter, and 20 moments
- (Q: other features / distance functions?
  - A1: turning angle
  - A2: dilations/erosions
  - A3: ...)
14. Images - shapes
- Distance function: Euclidean, on the area, perimeter, and 20 moments
- Q: how to do dim. reduction?
15. Images - shapes
- Distance function: Euclidean, on the area, perimeter, and 20 moments
- Q: how to do dim. reduction?
- A: Karhunen-Loève (= centered PCA/SVD); see the sketch below
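A minimal sketch of the two answers above: divide each shape feature by its standard deviation, then apply Karhunen-Loève (centered PCA via SVD) and keep the first few components. The 22-column feature matrix is random placeholder data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 22))        # 1000 shapes x (area, perimeter, 20 moments); placeholder data

# 1. Normalize: center each feature and divide by its standard deviation.
Xc = X - X.mean(axis=0)
Xn = Xc / Xc.std(axis=0)

# 2. Karhunen-Loève / PCA via SVD: keep the top-k principal components.
k = 4
U, S, Vt = np.linalg.svd(Xn, full_matrices=False)
X_reduced = Xn @ Vt[:k].T         # 1000 x k representation to index with a SAM

print(X_reduced.shape)
```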
16. Images - shapes
[Plot: log(# of I/Os) vs. number of features kept; the "all kept" case shown for reference]
17. Dimensionality Reduction
- Many problems (like time-series and image similarity) can be expressed as proximity problems in a high-dimensional space
- Given a query point, we try to find the points that are close
- But in high-dimensional spaces things are different!
18. Effects of High-dimensionality
- Assume a uniformly distributed set of points in high dimensions, [0,1]^d
- Take a query with side length 0.1 in each dimension -> query selectivity in 100-d is 0.1^100 = 10^-100
- If we want constant selectivity (0.1), the side length must be 0.1^(1/100) ≈ 0.98, i.e., almost the entire data space! (checked numerically below)
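A quick numeric check of these two claims (illustrative only):

```python
d = 100
print(0.1 ** d)            # selectivity of a query cube with side 0.1 in 100-d: 1e-100
print(0.1 ** (1 / d))      # side length needed for 10% selectivity: ~0.977
```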
19. Effects of High-dimensionality
- Surface is everything!
- Probability that a uniform point is closer than 0.1 to the (d-1)-dimensional surface of the cube is 1 - 0.8^d:
  - d = 2: 0.36
  - d = 10: ≈ 0.9
  - d = 100: ≈ 1
20. Effects of High-dimensionality
- Number of grid cells and surfaces
  - Number of k-dimensional surfaces in a d-dimensional hypercube: C(d, k) * 2^(d-k)
  - Binary partitioning -> 2^d cells
- Indexing in high dimensions is extremely difficult: the "curse of dimensionality"
21. X-tree
- Performance is impacted by the amount of overlap between index nodes
  - Need to follow different paths
  - Overlap, multi-overlap, weighted overlap
- R-tree-like behavior when overlap is small
- Sequential access when overlap is large
- When an overflow occurs (see the sketch below):
  - Split into two nodes if overlap is small
  - Otherwise create a super-node with twice the capacity
- Tradeoffs are made locally over different regions of the data space
- No performance comparisons with linear scan!
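A rough sketch of the overflow decision described above. The overlap measure, the 20% threshold, the Node layout, and the split_fn placeholder are simplifications of mine, not the published X-tree algorithm.

```python
from dataclasses import dataclass, field

MAX_OVERLAP = 0.2   # illustrative threshold; the real X-tree derives its own

@dataclass
class Node:                              # hypothetical, minimal index node
    mbr: tuple                           # (lows, highs), one value per dimension
    entries: list = field(default_factory=list)
    capacity: int = 4

def volume(lows, highs):
    v = 1.0
    for lo, hi in zip(lows, highs):
        v *= max(0.0, hi - lo)
    return v

def overlap_fraction(a, b):
    """Shared volume of two MBRs, as a fraction of their union's volume."""
    lows = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    highs = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    inter = volume(lows, highs)
    union = volume(*a) + volume(*b) - inter
    return inter / union if union > 0 else 0.0

def handle_overflow(node, split_fn):
    """Split if the two resulting MBRs overlap little; otherwise keep one
    super-node with twice the capacity (the decision is made locally)."""
    left, right = split_fn(node)                 # split_fn: an R*-tree-style split routine
    if overlap_fraction(left.mbr, right.mbr) <= MAX_OVERLAP:
        return [left, right]                     # R-tree-like split
    node.capacity *= 2                           # super-node
    return [node]
```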
22. Pyramid Tree
- Designed for range queries
- Map each d-dimensional point to a 1-d value
- Build a B-tree on the 1-d values
- A range query is transformed into a set of 1-d ranges
- More efficient than the X-tree, Hilbert order, and sequential scan
23. Pyramid transformation
- 2d pyramids, with their tops at the center of the data space
- Points in different pyramids are ordered by pyramid id
- Points within a pyramid are ordered by height
- value(v) = pyramid(v) + height(v) (see the sketch below)
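A minimal sketch of this mapping, following my reading of the Pyramid-Technique (Berchtold et al., 1998): the pyramid of a point is picked by the dimension that deviates most from the center, and the height is that deviation; tie-breaking and the query transformation are omitted.

```python
def pyramid_value(v):
    """Map a point v in [0,1]^d to a 1-d value: pyramid id + height."""
    d = len(v)
    # The dimension with the largest deviation from the center 0.5 picks the pyramid.
    j = max(range(d), key=lambda k: abs(v[k] - 0.5))
    pyramid = j if v[j] < 0.5 else j + d    # 2d pyramids in total
    height = abs(v[j] - 0.5)                # in [0, 0.5]
    return pyramid + height                 # integer part orders pyramids, fraction orders by height

print(pyramid_value([0.2, 0.9, 0.5]))       # falls in pyramid 1 + d = 4 (d = 3), height 0.4 -> 4.4
```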
24. Vector Approximation (VA) file
- Tile the d-dimensional data space uniformly
  - A fixed number of bits in each dimension (e.g., 8)
  - 256 partitions along each dimension
  - 256^d tiles
- Approximate each point by its corresponding tile
  - Size of an approximation: 8d bits = d bytes
  - Size of each point: 4d bytes (assuming a word per dimension)
- 2-step approach, the first step using the VA file (see the sketch below)
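A minimal sketch of building the approximations, assuming the data are already scaled to [0,1) and using the uniform 8-bit grid from the slide:

```python
import numpy as np

BITS = 8
CELLS = 2 ** BITS                     # 256 partitions along each dimension

def approximate(points):
    """Quantize points in [0,1)^d to their tile indices: d bytes per point."""
    cells = np.minimum((np.asarray(points) * CELLS).astype(int), CELLS - 1)
    return cells.astype(np.uint8)

rng = np.random.default_rng(1)
data = rng.random((5, 4))             # 5 points in 4-d, placeholder data
print(approximate(data))              # the VA file: 4 bytes per point vs. 16 bytes for the floats
```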
25. Simple NN searching
- d = distance to the kth NN so far
- For each approximation a_i:
  - If lb(q, a_i) < d then
    - Compute r = distance(q, v_i)
    - If r < d then
      - Add point i to the set of NNs
      - Update d
- Performance depends on the ordering of the vectors and their approximations (a runnable sketch follows)
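A runnable version of this scan (generalized to k neighbors); the lower bound lb(q, a) is taken as the distance from q to the tile encoded by a, and all data are placeholders.

```python
import heapq
import numpy as np

BITS, CELLS = 8, 256

def approximate(points):
    return np.minimum((np.asarray(points) * CELLS).astype(int), CELLS - 1)

def lb(q, cell):
    """Lower bound: distance from q to the tile encoded by `cell`."""
    lo, hi = cell / CELLS, (cell + 1) / CELLS
    gap = np.maximum(lo - q, 0) + np.maximum(q - hi, 0)
    return float(np.linalg.norm(gap))

def simple_nn(q, vectors, approx, k=1):
    """Sequential scan over the approximations; the exact distance is computed
    only when the lower bound beats the current kth-NN distance d."""
    best = []                                # max-heap of (-distance, index)
    d = float("inf")
    for i, a in enumerate(approx):
        if lb(q, a) < d:
            r = float(np.linalg.norm(q - vectors[i]))
            if r < d:
                heapq.heappush(best, (-r, i))
                if len(best) > k:
                    heapq.heappop(best)
                if len(best) == k:
                    d = -best[0][0]          # update the kth-NN distance
    return sorted((-neg, i) for neg, i in best)

rng = np.random.default_rng(2)
data = rng.random((1000, 8))                 # placeholder vectors
print(simple_nn(rng.random(8), data, approximate(data), k=3))
```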
26. Near-optimal NN searching
- d = kth-smallest ub(q, a) so far
- For each approximation a_i:
  - Compute lb(q, a_i) and ub(q, a_i)
  - If lb(q, a_i) < d then
    - If ub(q, a_i) < d then
      - Add point i to the set of NNs
      - Update d
    - InsertHeap(Heap, lb(q, a_i), i)
27. Near-optimal NN searching (2)
- d = distance to the kth NN so far
- Repeat
  - Examine the next entry (l_i, i) from the heap
  - If d < l_i then break
  - Else
    - Compute r = distance(q, v_i)
    - If r < d then
      - Add point i to the set of NNs
      - Update d
- Forever
- Sub-linear (~log n) vectors are visited after the first phase (both phases are sketched below)
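A runnable sketch of the two phases above. The upper bound ub(q, a) is taken as the distance from q to the farthest corner of the tile, the helper approximate() repeats the VA-file quantization from earlier, and all data are placeholders.

```python
import heapq
import numpy as np

BITS, CELLS = 8, 256

def approximate(points):
    return np.minimum((np.asarray(points) * CELLS).astype(int), CELLS - 1)

def bounds(q, cell):
    """(lower, upper) bound on the distance from q to any point in the tile."""
    lo, hi = cell / CELLS, (cell + 1) / CELLS
    near = np.maximum(lo - q, 0) + np.maximum(q - hi, 0)
    far = np.maximum(np.abs(q - lo), np.abs(q - hi))
    return float(np.linalg.norm(near)), float(np.linalg.norm(far))

def near_optimal_nn(q, vectors, approx, k=1):
    # Phase 1: scan the approximations, keep candidates keyed by lower bound.
    heap, ubs = [], []
    d = float("inf")                         # kth-smallest upper bound so far
    for i, a in enumerate(approx):
        l, u = bounds(q, a)
        if l < d:
            heapq.heappush(heap, (l, i))
            ubs = sorted(ubs + [u])[:k]
            if len(ubs) == k:
                d = ubs[-1]
    # Phase 2: visit candidates in lower-bound order; stop when the next
    # lower bound exceeds the kth exact distance found so far.
    best = []                                # max-heap of (-distance, index)
    d = float("inf")
    while heap:
        l, i = heapq.heappop(heap)
        if d < l:
            break
        r = float(np.linalg.norm(q - vectors[i]))
        if r < d:
            heapq.heappush(best, (-r, i))
            if len(best) > k:
                heapq.heappop(best)
            if len(best) == k:
                d = -best[0][0]
    return sorted((-neg, i) for neg, i in best)

rng = np.random.default_rng(3)
data = rng.random((1000, 8))                 # placeholder vectors
print(near_optimal_nn(rng.random(8), data, approximate(data), k=3))
```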
28. SS-tree and SR-tree
- SS-tree: uses spheres for index nodes
  - Higher fanout, since the storage cost per entry is reduced
- SR-tree: uses both rectangles and spheres for index nodes
  - An index node is defined by the intersection of the two volumes
  - More accurate representation of the data (see the mindist sketch below)
  - Higher storage cost
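A small sketch of why the intersection helps during search: the minimum possible distance from a query to an SR-tree node can be lower-bounded by the larger of the rectangle and sphere lower bounds (function names are mine, not from the paper).

```python
import numpy as np

def mindist_rect(q, lows, highs):
    """Smallest possible distance from q to an axis-aligned rectangle."""
    gap = np.maximum(lows - q, 0) + np.maximum(q - highs, 0)
    return float(np.linalg.norm(gap))

def mindist_sphere(q, center, radius):
    """Smallest possible distance from q to a bounding sphere."""
    return max(0.0, float(np.linalg.norm(q - center)) - radius)

def mindist_sr_node(q, lows, highs, center, radius):
    # The node's region lies in both volumes, so the larger of the two
    # lower bounds is still valid, and usually tighter than either alone.
    return max(mindist_rect(q, lows, highs), mindist_sphere(q, center, radius))

q = np.array([0.9, 0.9])
print(mindist_sr_node(q, np.array([0.0, 0.0]), np.array([0.5, 0.5]),
                      np.array([0.25, 0.25]), 0.4))
```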
29. Metric Tree (M-tree)
- Definition of a metric:
  - d(x, y) > 0
  - d(x, y) = d(y, x)
  - d(x, y) + d(y, z) ≥ d(x, z)
  - d(x, x) = 0
- Works for non-vector spaces too:
  - Edit distance (an example follows)
  - d(u, v) = sqrt((u - v)^T A (u - v)), as used in QBIC
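For concreteness, a standard Levenshtein edit distance, one example of a metric on a non-vector space (strings); this particular implementation is mine, not from the slides.

```python
def edit_distance(u, v):
    """Minimum number of insertions, deletions, and substitutions turning
    string u into string v; this is a metric on strings."""
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        curr = [i]
        for j, cv in enumerate(v, 1):
            curr.append(min(prev[j] + 1,                 # delete cu
                            curr[j - 1] + 1,             # insert cv
                            prev[j - 1] + (cu != cv)))   # substitute
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))   # 3
```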
30. Basic idea
[Diagram: parent node p with routing objects x and y; index entries (x, d(x,p), r(x)) and (y, d(y,p), r(y)); an object z in y's subtree with d(y, z) ≤ r(y)]
- Index entry = (routing object, distance to parent, covering radius)
- All objects in the subtree are within the covering radius of the routing object (a data-structure sketch follows).
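A minimal data-structure sketch of such an index entry (field names are mine; the real M-tree distinguishes leaf entries from routing entries in more detail):

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class MTreeEntry:
    routing_object: Any               # an object from the indexed metric space
    dist_to_parent: float             # d(routing object, parent routing object)
    covering_radius: float            # every object in the subtree lies within this radius
    children: List["MTreeEntry"] = field(default_factory=list)
    is_leaf: bool = False             # leaf entries hold actual data objects (radius 0)
```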
31. Range queries
[Diagram: query q with range t, routing objects x and y under parent p, and an object z in y's subtree]
- By the triangle inequality, d(q, z) ≥ d(q, y) - d(y, z), and d(y, z) ≤ r(y), so d(q, z) ≥ d(q, y) - r(y)
- Therefore, if d(q, y) - r(y) > t, then d(q, z) > t for every object z in the subtree
- Prune subtree y if d(q, y) - r(y) > t   (C1)
32. Range queries
[Same setting as before]
- Prune subtree y if d(q, y) - r(y) > t   (C1)
- Also, d(q, y) ≥ d(q, p) - d(p, y) and d(q, y) ≥ d(p, y) - d(q, p), so d(q, y) ≥ |d(q, p) - d(p, y)|
- Therefore, if |d(q, p) - d(p, y)| - r(y) > t, then d(q, y) - r(y) > t, without computing d(q, y)
- Prune subtree y if |d(q, p) - d(p, y)| - r(y) > t   (C2)
33. Range query algorithm
- RQ(q, t, Root, subtrees S1, S2, ...)
- For each subtree Si:
  - Prune if condition C2 holds (uses only stored distances)
  - Otherwise compute the distance to the routing object of Si and prune if condition C1 holds
  - Otherwise search the children of Si recursively (sketched below)
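A sketch of this recursion using the MTreeEntry structure above and an arbitrary metric dist; leaf handling and other bookkeeping are simplified.

```python
def range_query(q, t, entry, dist, d_q_parent=None, results=None):
    """Report all indexed objects within distance t of q, using the MTreeEntry
    nodes above; dist is the metric, d_q_parent is d(q, entry's routing object)."""
    if results is None:
        results = []
    for child in entry.children:
        # C2: prune using only stored distances (no new distance computation).
        if (d_q_parent is not None and
                abs(d_q_parent - child.dist_to_parent) - child.covering_radius > t):
            continue
        d_q_child = dist(q, child.routing_object)
        # C1: prune if even the closest possible object in the subtree is out of range.
        if d_q_child - child.covering_radius > t:
            continue
        if child.is_leaf:
            if d_q_child <= t:
                results.append(child.routing_object)
        else:
            range_query(q, t, child, dist, d_q_child, results)
    return results
```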
34. Nearest neighbor query
- Maintain a priority list of the k NN distances found so far
- Minimum distance to a subtree with root x: dmin(q, x) = max(d(q, x) - r(x), 0)
  - |d(q, p) - d(p, x)| - r(x) ≤ d(q, x) - r(x), so we may not need to compute d(q, x)
- Maximum distance to a subtree with root x: dmax(q, x) = d(q, x) + r(x)
[Diagram: for any object z in x's subtree, d(q, z) + r(x) ≥ d(q, x), hence d(q, z) ≥ d(q, x) - r(x); likewise d(q, z) ≤ d(q, x) + r(x)]
35. Nearest neighbor query
- Maintain an estimate dp of the kth-smallest maximum distance seen so far
- Prune a subtree x if dmin(q, x) > dp (see the sketch below)
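A compact sketch of the resulting branch-and-bound k-NN search over MTreeEntry nodes, using a best-first queue ordered by dmin. For simplicity, dp here is just the kth-smallest exact distance found so far, whereas the slides also prime it with dmax values; treat this as an approximation of the described method.

```python
import heapq
import itertools

def dmin(q, e, dist):
    """Smallest possible distance from q to anything in e's subtree."""
    return max(dist(q, e.routing_object) - e.covering_radius, 0.0)

def knn(q, root, dist, k=1):
    """Best-first k-NN over MTreeEntry nodes, pruning subtrees with dmin > dp."""
    tie = itertools.count()                     # breaks ties in the heaps
    best = []                                   # max-heap of (-distance, _, object)
    dp = float("inf")                           # kth-NN distance found so far
    queue = [(0.0, next(tie), root)]            # min-heap ordered by dmin
    while queue:
        lo, _, e = heapq.heappop(queue)
        if lo > dp:                             # prune: subtree cannot improve the result
            continue
        for c in e.children:
            if c.is_leaf:
                d = dist(q, c.routing_object)
                if d < dp or len(best) < k:
                    heapq.heappush(best, (-d, next(tie), c.routing_object))
                    if len(best) > k:
                        heapq.heappop(best)
                    if len(best) == k:
                        dp = -best[0][0]
            else:
                lo_c = dmin(q, c, dist)
                if lo_c <= dp:
                    heapq.heappush(queue, (lo_c, next(tie), c))
    return sorted(((-neg, obj) for neg, _, obj in best), key=lambda pair: pair[0])
```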
36. References
- Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, William Equitz: Efficient and Effective Querying by Image Content. JIIS 3(3/4): 231-262 (1994)
- Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree: An Index Structure for High-Dimensional Data. VLDB 1996: 28-39
- Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. SIGMOD Conference 1998: 142-153
- Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205
- Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435