1
Nearest Neighbours Search using the PM-tree
  • Tomáš Skopal¹, Jaroslav Pokorný¹, Václav Snášel²

¹ Charles University in Prague, Department of
Software Engineering, Czech Republic
² VSB - Technical University of Ostrava,
Department of Computer Science, Czech Republic
2
Presentation Outline
  • Similarity search in Metric Spaces
  • M-tree
      • the structure
      • k-NN search
  • PM-tree (an extension of M-tree)
      • motivation
      • the structure
      • k-NN search
  • Experimental Results

3
Similarity search in Metric Spaces
  • Similarity search
      • methods for content-based retrieval in multimedia
        databases
      • the similarity measure is often modelled by a
        metric d (satisfying the triangular inequality,
        symmetry, reflexivity, non-negativity)
      • similarity queries (query by example) are realized
        as metric queries
          • range query (Q, rQ), specified by a query
            object Q and a query radius rQ
          • k-NN query (Q, k), specified by a query object
            Q and the number of nearest neighbours k
  • Metric Access Methods (MAMs)
      • designed to search in metric datasets while
        keeping the search costs minimal
      • search costs = number of distance computations
        + I/O costs
      • only distances between objects are used for
        indexing (the structure of the object
        representation is not used for indexing)
      • many MAMs are not suitable for similarity search
        in large datasets
          • they are either static methods or have high
            I/O search costs
      • M-tree and (recently) D-index are the only
        suitable candidates so far
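As a point of reference for the two query types, the following minimal sketch answers both by a sequential scan, which costs one distance computation per object. This is the baseline that a MAM tries to beat. The names `euclidean`, `range_query`, and `knn_query` and the sample data are illustrative assumptions, not part of the presentation; any function satisfying the metric postulates could replace the Euclidean distance.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean (L2) distance, used here as an example metric d."""
    return math.dist(a, b)


def range_query(data, d, q, r_q):
    """Return all objects within distance r_q of the query object q."""
    return [o for o in data if d(q, o) <= r_q]


def knn_query(data, d, q, k):
    """Return the k objects nearest to q, closest first."""
    return heapq.nsmallest(k, data, key=lambda o: d(q, o))


# Hypothetical toy dataset of 2D vectors.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
q = (0.0, 0.0)

in_range = range_query(data, euclidean, q, 1.5)
nearest = knn_query(data, euclidean, q, 2)
```

Both functions touch every object, so the number of distance computations equals the dataset size regardless of the query.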

4
M-tree (metric tree)
  • dynamic, balanced, and paged tree structure
    (like e.g. B-tree, R-tree)
  • the leaves are clusters of indexed objects Oj
    (ground objects)
  • routing entries in the inner nodes represent
    hyper-spherical metric regions (Oi , rOi),
    recursively bounding the object clusters in
    leaves
  • the triangular inequality allows discarding of
    irrelevant M-tree branches (i.e. their metric
    regions) during query evaluation
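The discarding step can be illustrated with a small sketch. For any object O inside a region (Oi, rOi), the triangular inequality gives d(Q, O) ≥ d(Q, Oi) − rOi, so the region cannot contain an answer to a range query (Q, rQ) when d(Q, Oi) > rOi + rQ. The function name `sphere_overlaps_query` and the sample coordinates are assumptions made for illustration only.

```python
import math


def sphere_overlaps_query(d_q_oi, r_oi, r_q):
    """Return True if the region (Oi, rOi) may contain an object within r_q of Q.

    d_q_oi is the already computed distance d(Q, Oi).
    """
    return d_q_oi <= r_oi + r_q


# Hypothetical routing entry and query.
o_i, r_oi = (5.0, 0.0), 1.0
q = (0.0, 0.0)
d_q_oi = math.dist(q, o_i)  # a single distance computation per routing entry

discarded_for_small_radius = not sphere_overlaps_query(d_q_oi, r_oi, 3.0)
kept_for_large_radius = sphere_overlaps_query(d_q_oi, r_oi, 4.0)
```

With rQ = 3 the whole subtree is skipped without looking at any object stored below the routing entry.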

5
k-NN search in the M-tree
  • branch-and-bound algorithm (similar to that of the
    R-tree)
  • a modification of the range query algorithm in which
    the query radius rQ is dynamic
      • rQ decreases from infinity to the distance to
        the k-th nearest neighbour
  • two structures are used: a priority queue PR and a
    sorted array NN
  • PR stores requests for nodes not yet filtered from
    the search
      • a request has the form [routing entry to a node
        N, dmin(N)], where dmin(N) is the lower-bound
        distance from Q to all possible objects in N,
        i.e. $d_{min}(N) = \max\{0,\; d(Q, O_i) - r_{O_i}\}$
      • here (Oi, rOi) is the region of N's routing
        entry; requests in PR are sorted by dmin(N)
  • NN stores k candidate objects (or distance upper
    bounds)
      • at the end of the algorithm run, NN contains the
        result, i.e. the k nearest neighbours
      • an entry has the form [candidate object Oi,
        d(Q, Oi)] or [−, dmax(N)], where dmax(N) is the
        upper-bound distance from Q to all possible
        objects in N, i.e.
        $d_{max}(N) = d(Q, O_i) + r_{O_i}$
  • PR keeps only requests with dmin(N) < rQ, where the
    current rQ is the k-th (largest) distance or dmax
    value stored in NN; other requests are removed
    from PR
      • i.e. requests whose regions do not overlap the
        dynamic query region (Q, rQ) are removed
  • Query processing: requests are taken from the head
    of PR one by one (lowest dmin first) → a node N is
    retrieved, and the PR and NN structures are updated
    with the routing/ground entries of N
  • PR is initialized to ([root, ∞]); NN is initialized
    with k entries [−, ∞], i.e. NN = ([−, ∞], [−, ∞],
    ...)
  • optimal in I/O costs (the same I/O costs as the
    range query (Q, rQ) with rQ equal to the distance
    from Q to the k-th nearest neighbour)
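The algorithm described on this slide can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `Node` class, the hand-built two-leaf tree, and the function names are hypothetical; NN holds only candidate objects (the variant that also stores dmax upper bounds is omitted); and the root request starts with a lower bound of 0.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple                                  # routing object Oi
    radius: float                                  # covering radius rOi
    children: list = field(default_factory=list)   # child nodes (inner node)
    objects: list = field(default_factory=list)    # ground objects (leaf)


def d_min(q, node):
    """Lower bound on the distance from q to any object stored under node."""
    return max(0.0, math.dist(q, node.center) - node.radius)


def knn(root, q, k):
    """Return (k nearest objects, number of nodes accessed)."""
    nn = [(math.inf, None)] * k   # sorted ascending; nn[-1][0] is the dynamic rQ
    pr = [(0.0, 0, root)]         # heap of (dmin, tie-breaker, node)
    next_id = 1
    accessed = 0
    while pr:
        lower, _, node = heapq.heappop(pr)
        if lower > nn[-1][0]:
            break                 # every remaining request has an even larger dmin
        accessed += 1
        for o in node.objects:    # ground entries: try to improve NN
            dist = math.dist(q, o)
            if dist < nn[-1][0]:
                nn[-1] = (dist, o)
                nn.sort(key=lambda e: e[0])
        for child in node.children:   # routing entries: enqueue if still relevant
            lower_child = d_min(q, child)
            if lower_child <= nn[-1][0]:
                heapq.heappush(pr, (lower_child, next_id, child))
                next_id += 1
    return [o for _, o in nn if o is not None], accessed


# Hypothetical two-leaf tree.
leaf_a = Node((0.0, 0.0), 1.5, objects=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
leaf_b = Node((10.0, 10.0), 1.5, objects=[(10.0, 10.0), (11.0, 10.0)])
root = Node((5.0, 5.0), 9.0, children=[leaf_a, leaf_b])

result, nodes_accessed = knn(root, (0.2, 0.0), 2)
```

In this toy run the second leaf is never read: once two candidates are known, its lower bound exceeds the dynamic radius, so its request is dropped.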

6
k-NN search in M-tree: example (k = 2)
7
k-NN search in M-tree: example (k = 2)
8
k-NN search in M-tree: example (k = 2)
5 nodes accessed, the same nodes as accessed by the
range query (Q, d(Q, O5))
9
PM-tree motivation
  • metric regions in the M-tree are unnecessarily large
  • → indexing of large portions of empty space (the
    dead space)
  • → higher probability of intersection with the query
    region
  • → less efficient search
  • a reduction of the metric region volume should lead
    to more effective discarding of irrelevant
    subtrees
  • the question is how to specify a compact metric
    region bounding all the objects more tightly
    → a generalization of the M-tree to other metric
    region shapes

10
PM-tree region
  • utilization of global pivots (inspired by
    LAESA-like methods)
      • given a fixed set of p global pivots Pi
        (selected from the dataset or a part of it)
  • p hyper-ring regions (Pi, HRi) are defined for
    each routing entry
      • an array HR of p intervals
        $\langle HR_i.min,\; HR_i.max \rangle$
      • each interval HRi bounds the distances of the
        objects to the respective pivot Pi
  • PM-tree region = M-tree region + HR array
    (the pivots Pi are shared by all PM-tree regions)
      • the intersection of the hyper-sphere and the
        hyper-rings forms a smaller region bounding all
        the objects in the leaves
      • the more pivots, the more tightly bounded the
        region
  • the PM-tree is built the same way as the M-tree,
    i.e. the hyper-rings only cut off parts of the
    M-tree sphere
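To make the region definition concrete, the sketch below computes an HR array for a small cluster and tests whether a point lies inside the resulting PM-tree region. The function names `build_hr` and `in_pm_region`, the pivots, and the objects are all illustrative assumptions; how a real PM-tree maintains these intervals during insertions and splits is not covered here.

```python
import math


def build_hr(objects, pivots, d=math.dist):
    """Return one (min, max) interval per pivot bounding d(Pt, O) over the objects."""
    hr = []
    for p in pivots:
        dists = [d(p, o) for o in objects]
        hr.append((min(dists), max(dists)))
    return hr


def in_pm_region(o, center, radius, hr, pivots, d=math.dist):
    """True if o lies in the hyper-sphere and in every hyper-ring."""
    if d(center, o) > radius:
        return False
    return all(lo <= d(p, o) <= hi for p, (lo, hi) in zip(pivots, hr))


# Hypothetical pivots and a small cluster with routing object (3, 0), radius 1.
pivots = [(0.0, 0.0), (10.0, 0.0)]
objects = [(3.0, 0.0), (4.0, 0.0), (3.0, 1.0)]
center, radius = (3.0, 0.0), 1.0

hr = build_hr(objects, pivots)

# (2.5, 0) is inside the M-tree sphere but closer to the first pivot
# than any indexed object, so the first hyper-ring cuts it off.
probe = (2.5, 0.0)
inside_sphere = math.dist(center, probe) <= radius
inside_pm_region = in_pm_region(probe, center, radius, hr, pivots)
```

The probe point shows the dead space that the sphere alone would cover and the hyper-rings remove.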

11
PM-tree, query processing
  • the distances d(Q, Pi) for all i ≤ p must be
    computed prior to processing a query
  • a metric region (Oi, rOi, HR) is relevant to
    (intersected by) a range query (Q, rQ) exactly
    when all the hyper-rings and the hyper-sphere
    overlap the range query region
    → the more hyper-rings, the lower the probability
    of intersection with the query
  • → no additional distance computations are
    needed for the intersection test
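A minimal sketch of this test, assuming the pivot distances d(Q, Pt) have been computed once per query and are passed in as a list. The function name `pm_region_intersects` and the sample numbers are hypothetical.

```python
import math


def pm_region_intersects(d_q_oi, r_oi, d_q_p, hr, r_q):
    """Check whether the range query (Q, rQ) overlaps the sphere and every ring.

    d_q_oi: distance d(Q, Oi) to the routing object
    d_q_p:  precomputed distances d(Q, Pt), one per pivot
    hr:     list of (min, max) intervals, one per pivot
    """
    if d_q_oi > r_oi + r_q:
        return False  # the hyper-sphere alone already excludes the region
    return all(
        dqp - r_q <= hi and dqp + r_q >= lo
        for dqp, (lo, hi) in zip(d_q_p, hr)
    )


# Hypothetical setup: pivots, one region, one query.
pivots = [(0.0, 0.0), (10.0, 0.0)]
o_i, r_oi = (3.0, 0.0), 1.0
hr = [(3.0, 4.0), (6.0, 7.1)]
q, r_q = (1.5, 0.0), 1.0

d_q_p = [math.dist(q, p) for p in pivots]   # computed once per query
d_q_oi = math.dist(q, o_i)

sphere_only = d_q_oi <= r_oi + r_q
with_rings = pm_region_intersects(d_q_oi, r_oi, d_q_p, hr, r_q)
```

Here the sphere test alone would keep the region, while the first hyper-ring filters it out using only values that were already available.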

[Figure: a range query Q against an M-tree region and against a PM-tree region]
12
k-NN search in the PM-tree
  • 3 modifications of the M-tree's k-NN algorithm
  • a different intersection test between the query
    region (Q, rQ) and the PM-tree region (Oi, rOi, HR);
    in addition to the hyper-sphere test:
    $\bigwedge_{t=1}^{p} \big( d(P_t, Q) - r_Q \le HR_t.max \;\wedge\; d(P_t, Q) + r_Q \ge HR_t.min \big)$
  • a different dmin construction (possible distance
    increase to the farthest hyper-ring):
    $d_{min}(N) = \max\{0,\; d(Q, O_i) - r_{O_i},\; HR_{farthest}\}$
    $HR_{farthest} = \max_{t=1..p} \max\{ d(P_t, Q) - HR_t.max,\; HR_t.min - d(P_t, Q) \}$
  • a different dmax construction (possible distance
    decrease to the farthest object in the nearest
    hyper-ring):
    $d_{max}(N) = \min\{ d(Q, O_i) + r_{O_i},\; HR_{nearest} \}$
    $HR_{nearest} = \min_{t=1..p} \{ d(P_t, Q) + HR_t.max \}$
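The two bounds can be written down directly from the hyper-ring intervals. The sketch below is an illustration only: `pm_d_min` and `pm_d_max` are hypothetical names, the input values are made up, and the upper bound follows from d(Q, O) ≤ d(Q, Pt) + d(Pt, O) ≤ d(Q, Pt) + HRt.max for every pivot Pt.

```python
def pm_d_min(d_q_oi, r_oi, d_q_p, hr):
    """Lower bound on d(Q, O) for any object O in the PM-tree region."""
    hr_farthest = max(
        max(dqp - hi, lo - dqp) for dqp, (lo, hi) in zip(d_q_p, hr)
    )
    return max(0.0, d_q_oi - r_oi, hr_farthest)


def pm_d_max(d_q_oi, r_oi, d_q_p, hr):
    """Upper bound on d(Q, O) for any object O in the PM-tree region."""
    hr_nearest = min(dqp + hi for dqp, (lo, hi) in zip(d_q_p, hr))
    return min(d_q_oi + r_oi, hr_nearest)


# Hypothetical values: d(Q, Oi) = 2.5, rOi = 2.5, two pivots.
d_q_oi, r_oi = 2.5, 2.5
d_q_p = [0.5, 9.5]                 # d(Q, P1), d(Q, P2)
hr = [(3.0, 4.0), (6.0, 7.0)]      # HR1, HR2

m_tree_d_min = max(0.0, d_q_oi - r_oi)   # sphere only
m_tree_d_max = d_q_oi + r_oi             # sphere only
pm_lower = pm_d_min(d_q_oi, r_oi, d_q_p, hr)
pm_upper = pm_d_max(d_q_oi, r_oi, d_q_p, hr)
```

In this example the sphere alone gives the interval [0, 5], while the hyper-rings narrow it to [2.5, 4.5], so the request is ordered later in PR and the candidate bound in NN is tighter.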

13
k-NN search in PM-tree: example (k = 2)
14
k-NN search in PM-tree: example (k = 2)
15
k-NN search in PM-tree: example (k = 2)
5 nodes accessed, the same nodes as accessed by the
range query (Q, d(Q, O5))
16
Experimental Results (synthetic datasets)
  • synthetic vector datasets (4D to 60D), 100,000
    tuples, 1,000 clusters
  • disk page sizes: 1 KB to 4 KB; index sizes: 4.5 MB
    to 55 MB

17
Experimental Results (image database)
  • WBIIS image database: approx. 10,000 256D vectors
    (gray histograms)
  • disk page size: 32 KB; index sizes: 16 MB to 20 MB
