Nearest Neighbours Search using the PM-tree

1
Nearest Neighbours Search using the PM-tree

Tomáš Skopal (1), Jaroslav Pokorný (1), Václav Snášel (2)

(1) Charles University in Prague, Department of Software Engineering, Czech Republic
(2) VSB - Technical University of Ostrava, Department of Computer Science, Czech Republic

2
Presentation Outline

Similarity search in Metric Spaces
M-tree
  the structure
  k-NN search
PM-tree (an extension of M-tree)
  motivation
  the structure
  k-NN search
Experimental Results

3
Similarity search in Metric Spaces

Similarity search
  methods for content-based retrieval in multimedia databases
  the similarity measure is often modelled by a metric d (satisfying the triangle inequality, symmetry, reflexivity, and non-negativity)
  similarity queries (query by example) are realized as metric queries
    range query (Q, rQ), specified by a query object Q and a query radius rQ
    k-NN query (Q, k), specified by a query object Q and the number k of nearest neighbours
Metric Access Methods (MAMs)
  designed to search metric datasets while keeping the search costs minimal
    search costs: the number of distance computations and the I/O costs
  only the distances between objects are used for indexing (the structure of the object representation is not used)
  many MAMs are not suitable for similarity search in large datasets: they are either static or have high I/O search costs
  M-tree and (recently) D-index are the only suitable candidates so far
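As a baseline for the cost measure above, the following Python sketch answers a range query (Q, rQ) and a k-NN query (Q, k) by a plain sequential scan that uses nothing but a distance function, and counts the distance computations it performs. The Euclidean metric, the tuple-of-floats object type, and the names `CountingMetric`, `range_query`, and `knn_query` are illustrative assumptions, not part of the presentation; a MAM aims to answer the same queries with fewer distance computations and I/Os.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


class CountingMetric:
    """Wraps a metric d and counts distance computations (the main search cost)."""

    def __init__(self, d: Callable[[Point, Point], float]) -> None:
        self.d = d
        self.count = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.count += 1
        return self.d(a, b)


def euclidean(a: Point, b: Point) -> float:
    # L2 distance: non-negative, reflexive, symmetric, satisfies the triangle inequality
    return math.dist(a, b)


def range_query(data: Sequence[Point], q: Point, r_q: float, d: CountingMetric) -> List[Point]:
    """Range query (Q, rQ): every object O with d(Q, O) <= rQ."""
    return [o for o in data if d(q, o) <= r_q]


def knn_query(data: Sequence[Point], q: Point, k: int, d: CountingMetric) -> List[Tuple[float, Point]]:
    """k-NN query (Q, k): the k objects closest to Q as (distance, object) pairs."""
    return heapq.nsmallest(k, ((d(q, o), o) for o in data))


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
    d = CountingMetric(euclidean)
    print(range_query(data, (0.0, 0.0), 1.5, d))            # [(0.0, 0.0), (1.0, 0.0)]
    print([o for _, o in knn_query(data, (0.0, 0.0), 2, d)])  # [(0.0, 0.0), (1.0, 0.0)]
    print(d.count)  # 8: the scan computes one distance per object and query
```

The scan always costs n distance computations per query; the index structures below try to avoid most of them.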

4
M-tree (metric tree)

dynamic, balanced, and paged tree structure (like, e.g., the B-tree or R-tree)
the leaves are clusters of indexed objects Oj (ground objects)
routing entries in the inner nodes represent hyper-spherical metric regions (Oi, rOi) that recursively bound the object clusters in the leaves
the triangle inequality allows irrelevant M-tree branches (i.e., metric regions) to be discarded during query evaluation
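The M-tree layout and its pruning rule can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes Euclidean distance, builds a tiny tree by hand with a hypothetical `make_routing_entry` helper (a real M-tree is built dynamically by insertions and node splits, and is paged), and omits refinements such as stored parent distances. A routing entry (Oi, rOi) is skipped when d(Q, Oi) > rQ + rOi, because the triangle inequality then gives d(Q, O) >= d(Q, Oi) - rOi > rQ for every object O in its subtree.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # ground objects (leaf) or RoutingEntry (inner)


@dataclass
class RoutingEntry:
    center: Point   # routing object Oi
    radius: float   # covering radius rOi
    child: Node     # subtree whose objects all lie in the region (Oi, rOi)


def subtree_objects(node: Node) -> Iterator[Point]:
    if node.is_leaf:
        yield from node.entries
    else:
        for e in node.entries:
            yield from subtree_objects(e.child)


def make_routing_entry(center: Point, child: Node) -> RoutingEntry:
    """Hypothetical helper: the covering radius is the largest d(Oi, Oj) in the subtree."""
    return RoutingEntry(center, max(math.dist(center, o) for o in subtree_objects(child)), child)


def range_search(node: Node, q: Point, r_q: float, out: List[Point], stats: dict) -> None:
    stats["nodes"] += 1  # one node access (one page read in a paged M-tree)
    for e in node.entries:
        if node.is_leaf:
            if math.dist(q, e) <= r_q:
                out.append(e)
        # Triangle inequality: if d(Q, Oi) > rQ + rOi, no object in (Oi, rOi)
        # can be within rQ of Q, so the whole branch is discarded.
        elif math.dist(q, e.center) <= r_q + e.radius:
            range_search(e.child, q, r_q, out, stats)


if __name__ == "__main__":
    leaf_a = Node(True, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
    leaf_b = Node(True, [(10.0, 10.0), (11.0, 10.0)])
    root = Node(False, [make_routing_entry((0.0, 0.0), leaf_a),
                        make_routing_entry((10.0, 10.0), leaf_b)])
    found, stats = [], {"nodes": 0}
    range_search(root, (0.5, 0.0), 1.0, found, stats)
    print(found, stats)  # [(0.0, 0.0), (1.0, 0.0)] {'nodes': 2}
```

In the example, the second branch is pruned without reading `leaf_b`, so only the root and one leaf are accessed.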

5
k-NN search in the M-tree

branch-and-bound algorithm (similar to that of the R-tree)
a modification of the range query algorithm, but the query radius rQ is dynamic
  rQ decreases from infinity to the distance to the k-th neighbour
two structures are used: a priority queue PR and a sorted array NN
PR stores requests for nodes that have not yet been filtered out of the search
  a request has the form [routing entry to a node N, dmin(N)], where dmin(N) is the lower bound of the distance from Q to any object in N, i.e.
  $d_{\min}(N) = \max\{0,\ d(Q, O_i) - r_{O_i}\}$
  where (Oi, rOi) is the region of N's routing entry (the requests in PR are sorted by dmin(N))
NN stores k candidate objects (or distance upper bounds)
  at the end of the algorithm run, NN contains the result, i.e., the k nearest neighbours
  an entry has the form [candidate object Oi, d(Q, Oi)] or [-, dmax(N)], where dmax(N) is the upper bound of the distance from Q to any object in N, i.e.
  $d_{\max}(N) = d(Q, O_i) + r_{O_i}$
PR keeps only requests with dmin(N) < rQ, where rQ is the distance stored in the k-th entry of NN; other requests are removed from PR, i.e., those whose regions do not overlap the dynamic query region (Q, rQ)
query processing: the requests in PR are processed in order of increasing dmin(N) (the request at the head of PR is taken first) → a node N is retrieved, and PR and NN are updated with the routing/ground entries of N
PR is initialized to [root, ∞]; NN is initialized with k entries [-, ∞], i.e., ([-, ∞], [-, ∞], ...)
optimal in I/O costs: the same I/O costs as the range query (Q, d(Q, NN[k])), whose radius is the distance to the k-th nearest neighbour
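The k-NN algorithm above can be sketched in Python with a binary heap as PR and a sorted list as NN. This is a hedged sketch under stated assumptions: Euclidean distance, a hand-built tree with known covering radii, PR started with dmin(root) = 0, and two implementation choices the slide does not state: a node's own [-, dmax(N)] placeholder is dropped from NN when that node is expanded (so its objects are not counted twice), and a request with dmin(N) equal to rQ is still expanded (so no placeholder can remain in the result). The names `knn_search`, `Node`, and `RoutingEntry` are hypothetical.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # ground objects (leaf) or RoutingEntry (inner)


@dataclass
class RoutingEntry:
    center: Point   # routing object Oi
    radius: float   # covering radius rOi
    child: Node


def knn_search(root: Node, q: Point, k: int) -> Tuple[List[Tuple[float, Point]], int]:
    """Returns the k nearest (distance, object) pairs and the number of accessed nodes."""
    # NN: sorted array of k entries (distance, object, node); object None means [-, dmax(node)]
    nn: List[Tuple[float, Optional[Point], Optional[Node]]] = [(math.inf, None, None)] * k
    tie = itertools.count()        # tie-breaker, so heapq never compares Node objects
    pr = [(0.0, next(tie), root)]  # PR: requests (dmin(N), _, N), smallest dmin first
    accessed = 0

    def r_q() -> float:
        return nn[k - 1][0]  # dynamic query radius = distance in the k-th NN entry

    def nn_update(entry: Tuple[float, Optional[Point], Optional[Node]]) -> None:
        nn.append(entry)
        nn.sort(key=lambda e: e[0])
        del nn[k:]

    while pr:
        dmin, _, node = heapq.heappop(pr)
        if dmin > r_q():
            break  # all remaining requests lie outside the dynamic query region
        accessed += 1
        # Sketch choice: drop this node's own [-, dmax] placeholder before adding its contents
        kept = [e for e in nn if e[2] is not node]
        nn[:] = kept + [(math.inf, None, None)] * (k - len(kept))
        for e in node.entries:
            if node.is_leaf:
                d_qo = math.dist(q, e)
                if d_qo < r_q():
                    nn_update((d_qo, e, None))
            else:
                d_qc = math.dist(q, e.center)
                e_dmin = max(0.0, d_qc - e.radius)  # dmin = max{0, d(Q, Oi) - rOi}
                e_dmax = d_qc + e.radius            # dmax = d(Q, Oi) + rOi
                if e_dmin <= r_q():
                    heapq.heappush(pr, (e_dmin, next(tie), e.child))
                    if e_dmax < r_q():
                        nn_update((e_dmax, None, e.child))
    return [(d_qo, o) for d_qo, o, _ in nn if o is not None], accessed


if __name__ == "__main__":
    leaf_a = Node(True, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
    leaf_b = Node(True, [(5.0, 5.0), (6.0, 5.0)])
    root = Node(False, [RoutingEntry((0.0, 0.0), 1.0, leaf_a),
                        RoutingEntry((5.0, 5.0), 1.0, leaf_b)])
    result, accessed = knn_search(root, (0.2, 0.0), 2)
    print([o for _, o in result], accessed)  # [(0.0, 0.0), (1.0, 0.0)] 2
```

Because PR is a heap ordered by dmin, the removal of requests with dmin(N) > rQ is done lazily: the loop stops at the first such request, since all later ones are at least as far.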

6
k-NN search in M-tree: example (k = 2)
7
k-NN search in M-tree: example (k = 2)
8
k-NN search in M-tree: example (k = 2)
5 nodes accessed, the same nodes as accessed by the range query (Q, d(Q, O5))
9
PM-tree motivation

metric regions in the M-tree are unnecessarily large
  → indexing of large portions of empty space (the dead space)
  → higher probability of intersection with the query region
  → less efficient search
reducing the volume of metric regions should lead to more effective discarding of irrelevant subtrees
the question is how to specify a compact metric region that bounds all the objects more tightly
  → generalization of the M-tree to other representations of the metric region shape

10
PM-tree region

utilization of global pivots (inspired by LAESA-like methods)
  given a fixed set of p global pivots Pi (selected from the dataset or a part of it)
  p hyper-ring regions (Pi, HRi) are defined for each routing entry
    array HR of p intervals $\langle HR_i.\mathrm{min},\ HR_i.\mathrm{max} \rangle$
    each interval HRi bounds the distances of the objects in the subtree to the respective pivot Pi
PM-tree region = M-tree region + HR array (the pivots Pi are shared by all PM-tree regions)
  the intersection of the hyper-sphere and the hyper-rings forms a smaller region that still bounds all the objects in the leaves
  the more pivots, the more tightly the region is bounded
the PM-tree is built in the same way as the M-tree, i.e., the hyper-rings only cut off parts of the M-tree sphere
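A PM-tree region can be illustrated by computing the HR array of a single cluster directly from its objects. In this sketch, which assumes Euclidean distance and hypothetical function names (`hyper_rings`, `in_pm_region`), each interval ⟨HRt.min, HRt.max⟩ is simply the range of d(Pt, O) over the cluster's objects; in the PM-tree itself the intervals are maintained during insertions. The example shows a point that lies inside the M-tree sphere but outside a hyper-ring, i.e., dead space that the PM-tree region cuts off.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Interval = Tuple[float, float]  # (HR_t.min, HR_t.max)


def hyper_rings(objects: Sequence[Point], pivots: Sequence[Point]) -> List[Interval]:
    """HR array: for each global pivot P_t, the range of d(P_t, O) over the objects O."""
    hr = []
    for p in pivots:
        dists = [math.dist(p, o) for o in objects]
        hr.append((min(dists), max(dists)))
    return hr


def in_pm_region(o: Point, center: Point, radius: float,
                 pivots: Sequence[Point], hr: Sequence[Interval]) -> bool:
    """O lies in the PM-tree region iff it is inside the M-tree ball and inside every hyper-ring."""
    if math.dist(center, o) > radius:
        return False
    return all(lo <= math.dist(p, o) <= hi for p, (lo, hi) in zip(pivots, hr))


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # M-tree region ((0, 0), 1)
    pivots = [(3.0, 0.0)]                             # one global pivot (hypothetical choice)
    hr = hyper_rings(cluster, pivots)                 # [(2.0, 3.1622...)]
    print(in_pm_region((0.5, 0.5), (0.0, 0.0), 1.0, pivots, hr))   # True
    print(in_pm_region((-1.0, 0.0), (0.0, 0.0), 1.0, pivots, hr))  # False: cut off by the ring
```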

11
PM-tree, query processing

the distances d(Q, Pi) for all i = 1..p must be computed before the query is processed
a metric region (Oi, rOi, HR) is relevant to (intersected by) a range query (Q, rQ) only if the hyper-sphere and all the hyper-rings overlap the range query region
  → the more hyper-rings, the lower the probability of intersection with the query
  → no additional distance computations are needed for the intersection test, since the distances d(Q, Pi) are already computed

[Figure: query Q against an M-tree region and against a PM-tree region]
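The relevance test for a range query can be sketched as follows, assuming Euclidean distance and hypothetical names (`query_pivot_distances`, `pm_region_relevant`). The p distances d(Q, Pt) are computed once per query; after that, each hyper-ring check compares only stored numbers, so the extra filtering adds no distance computations beyond the d(Q, Oi) that the M-tree test needs anyway.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Interval = Tuple[float, float]  # (HR_t.min, HR_t.max)


def query_pivot_distances(q: Point, pivots: Sequence[Point]) -> List[float]:
    """d(Q, P_t) for t = 1..p, computed once before the query is processed."""
    return [math.dist(q, p) for p in pivots]


def pm_region_relevant(d_q_oi: float, r_oi: float, hr: Sequence[Interval],
                       d_q_p: Sequence[float], r_q: float) -> bool:
    """Range-query test for the PM-tree region (Oi, rOi, HR).

    d_q_oi is d(Q, Oi), needed by the M-tree test anyway; d_q_p holds the
    precomputed d(Q, P_t), so the hyper-ring checks compute no new distances.
    """
    if d_q_oi > r_q + r_oi:  # the hyper-sphere does not overlap (Q, rQ)
        return False
    # every hyper-ring must overlap the query ball as well
    return all(dqp - r_q <= hi and dqp + r_q >= lo for dqp, (lo, hi) in zip(d_q_p, hr))


if __name__ == "__main__":
    center, r_oi = (0.0, 0.0), 1.0
    pivots = [(3.0, 0.0)]
    hr = [(2.0, math.sqrt(10.0))]  # ring of the cluster {(0,0), (1,0), (0,1)} w.r.t. the pivot
    q, r_q = (-1.5, 0.0), 0.6
    d_q_p = query_pivot_distances(q, pivots)
    # The M-tree sphere test alone accepts this region (1.5 <= 0.6 + 1.0), the ring rejects it
    print(pm_region_relevant(math.dist(q, center), r_oi, hr, d_q_p, r_q))  # False
```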
12
k-NN search in the PM-tree

3 modifications of the M-tree's k-NN algorithm
different intersection test between the query region (Q, rQ) and a PM-tree region (Oi, rOi, HR): in addition to the M-tree sphere test, every hyper-ring must overlap the query region
  $\forall t \in 1..p:\ d(P_t, Q) - r_Q \le HR_t.\mathrm{max} \ \wedge\ d(P_t, Q) + r_Q \ge HR_t.\mathrm{min}$
different dmin construction (possible distance increase to the farthest hyper-ring)
  $d_{\min}(N) = \max\{0,\ d(Q, O_i) - r_{O_i},\ HR_{\mathrm{farthest}}\}$
  $HR_{\mathrm{farthest}} = \max_{t = 1..p} \max\{d(P_t, Q) - HR_t.\mathrm{max},\ HR_t.\mathrm{min} - d(P_t, Q)\}$
different dmax construction (possible distance decrease to the farthest object in the nearest hyper-ring)
  $d_{\max}(N) = \min\{d(Q, O_i) + r_{O_i},\ HR_{\mathrm{nearest}}\}$
  $HR_{\mathrm{nearest}} = \min_{t = 1..p} \{d(P_t, Q) + HR_t.\mathrm{max}\}$
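The PM-tree versions of dmin and dmax can be sketched as two small functions over precomputed distances (Euclidean distance and the names `pm_dmin` and `pm_dmax` are assumptions). Both remain valid bounds by the triangle inequality, so for every object O in the region pm_dmin <= d(Q, O) <= pm_dmax, while the hyper-rings can only raise dmin and lower dmax compared with the M-tree bounds. With no pivots, the functions fall back to the M-tree formulas.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Interval = Tuple[float, float]  # (HR_t.min, HR_t.max)


def pm_dmin(d_q_oi: float, r_oi: float, hr: Sequence[Interval], d_q_p: Sequence[float]) -> float:
    """dmin = max{0, d(Q, Oi) - rOi, HR_farthest}."""
    hr_farthest = max((max(dqp - hi, lo - dqp) for dqp, (lo, hi) in zip(d_q_p, hr)),
                      default=0.0)
    return max(0.0, d_q_oi - r_oi, hr_farthest)


def pm_dmax(d_q_oi: float, r_oi: float, hr: Sequence[Interval], d_q_p: Sequence[float]) -> float:
    """dmax = min{d(Q, Oi) + rOi, HR_nearest}."""
    hr_nearest = min((dqp + hi for dqp, (_, hi) in zip(d_q_p, hr)), default=math.inf)
    return min(d_q_oi + r_oi, hr_nearest)


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # region (Oi, rOi) = ((0, 0), 1)
    center, r_oi = (0.0, 0.0), 1.0
    pivots = [(3.0, 0.0)]
    hr = [(min(math.dist(p, o) for o in cluster), max(math.dist(p, o) for o in cluster))
          for p in pivots]
    for q in [(-2.0, 0.0), (3.0, 0.0)]:
        d_q_p = [math.dist(q, p) for p in pivots]
        dmin = pm_dmin(math.dist(q, center), r_oi, hr, d_q_p)
        dmax = pm_dmax(math.dist(q, center), r_oi, hr, d_q_p)
        true_dists = [math.dist(q, o) for o in cluster]
        print(q, round(dmin, 3), round(dmax, 3), min(true_dists) >= dmin, max(true_dists) <= dmax)
```

For Q = (-2, 0) the hyper-ring raises dmin above the M-tree value of 1, and for Q = (3, 0) it lowers dmax below the M-tree value of 4, which is exactly the effect described on the slide.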

13
k-NN search in PM-tree: example (k = 2)
14
k-NN search in PM-tree: example (k = 2)
15
k-NN search in PM-tree: example (k = 2)
5 nodes accessed, the same nodes as accessed by the range query (Q, d(Q, O5))
16
Experimental Results (synthetic datasets)

synthetic vector datasets (4D–60D), 100,000 tuples, 1000 clusters
disk page sizes 1 KB–4 KB, index sizes 4.5 MB–55 MB

17
Experimental Results (image database)

WBIIS image database, approx. 10,000 256D vectors (gray histograms)
disk page size 32 KB, index sizes 16 MB–20 MB

18
References

[1] Skopal T., Pokorný J., Snášel V.: PM-tree: Pivoting Metric Tree for Similarity Search in Multimedia Databases. ADBIS 2004, Budapest, Hungary
[2] Skopal T.: Pivoting M-tree: A Metric Access Method for Efficient Similarity Search. DATESO 2004, Desná, Czech Republic
[3] Skopal T., Pokorný J., Krátký M., Snášel V.: Revisiting M-tree Building Principles. ADBIS 2003, Dresden, Germany, LNCS 2798, Springer
[4] Skopal T.: Metric Indexing in Information Retrieval. PhD thesis, VSB-Technical University of Ostrava, http://urtax.ms.mff.cuni.cz/~skopal/phd/thesis.pdf