Generalized Multidimensional Data Mapping and Query Processing (GiMP)

Provided by: ydk

1
Generalized Multi-dimensional Data Mapping and Query Processing (GiMP)

Authors: Rui Zhang et al.
ACM TODS 2005
Presented by Youngdae Kim at IDS Lab.
18 Sep, 2007

2
Background

Multi-dimensional data
spatial data
geographic information
ex) Pohang located at (129, 35)
object with many fields
ex) employee relation with fields id, salary, name, age, address, ...
Queries
point query
give me object(s) located at (3, 5)
give me employee(s) with age = 35 and name = Jack
range query (window query)
give me all objects whose location overlaps the range [3, 7] and [4, 6]
kNN query
give me the k nearest neighbors of object a
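
To make the three query types concrete, here is a minimal brute-force sketch without any index. The sample points and the function names are made up for illustration and do not come from the paper.

```python
import math

# Hypothetical 2-D points, used only for illustration.
points = [(3, 5), (4, 5), (8, 1), (6, 4), (1, 9)]

def point_query(data, target):
    """Return every object located exactly at `target`."""
    return [p for p in data if p == target]

def range_query(data, lows, highs):
    """Return every object inside the axis-aligned window [lows, highs]."""
    return [p for p in data
            if all(lo <= x <= hi for x, lo, hi in zip(p, lows, highs))]

def knn_query(data, q, k):
    """Return the k objects nearest to q under Euclidean distance."""
    return sorted(data, key=lambda p: math.dist(p, q))[:k]

print(point_query(points, (3, 5)))
print(range_query(points, (3, 4), (7, 6)))  # window [3, 7] x [4, 6]
print(knn_query(points, (3, 5), 2))
```

Each function scans all points, which is exactly the cost an index structure is meant to avoid.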

[Figure: two-dimensional data space with axes d1 and d2]

3
Background (cont.)

Index structure
R-tree
packs regions that are close to each other into rectangles, recursively
does not use the stable DBMS index structure (e.g., B-tree)
not easy to integrate with current DBMSs (complicated concurrency and recovery problems exist)
why not use B-tree?
not easy to assign orders (or keys) to multi-dimensional data sequentially while preserving their proximity
but efficiency and reliability are high if we can use B-tree

[Figure: R-tree; points that are close in multi-d space: still close in one-d?]

4
Mapping-based Indexing Schemes

General strategy
mapping
multi-dimensional data -> one-dimensional data (key)
one-to-one or many-to-one
query processing using B-tree
transform multi-dimensional query into key range(s)
get matched entries using B-tree
discard false positives
we obtain a superset of the answers -> irrelevant data may be included
discard them
Examples
UB-tree, Pyramid technique, iMinMax, iDistance
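
The general strategy above can be sketched as a filter-and-refine loop. In this sketch a sorted Python list stands in for the B-tree, and the mapping `key = x + y` is a deliberately simple many-to-one mapping chosen for illustration. Neither is taken from the paper.

```python
import bisect

def build_index(points, key_fn):
    """Sort (key, point) pairs; the sorted list stands in for the B-tree leaves."""
    return sorted((key_fn(p), p) for p in points)

def search(index, key_ranges, predicate):
    """Scan each key range, then discard false positives with `predicate`."""
    keys = [k for k, _ in index]
    results = []
    for lo, hi in key_ranges:
        start = bisect.bisect_left(keys, lo)
        end = bisect.bisect_right(keys, hi)
        for _, p in index[start:end]:
            if predicate(p):          # refinement step
                results.append(p)
    return results

# Illustrative many-to-one mapping: key = x + y.
points = [(1, 1), (2, 3), (3, 2), (4, 4), (0, 5)]
index = build_index(points, lambda p: p[0] + p[1])

# The window 2 <= x <= 3, 2 <= y <= 3 maps to the key range [4, 6].
def in_window(p):
    return 2 <= p[0] <= 3 and 2 <= p[1] <= 3

hits = search(index, [(4, 6)], in_window)
print(hits)
```

The point (0, 5) has key 5 and is therefore fetched by the key range, but it lies outside the window and is dropped in the refinement step.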

5
Observations

Crux of mapping-based indexing scheme
mapping method
distance from reference point + scattering factor
query transformation
multi-dimensional window query transformed into one-dimensional range query
for kNN query, use the incremental mapping mechanism

[Figure: point p1 and reference point r; key (p1) = distance + scattering factor]

6
Contributions

Generalizes the mapping-based indexing and query processing process (GiMP)
defines a framework for easy extension
cf) GiST: generalizes tree-search indexing schemes
Suggests a measurement to predict the performance of mapping-based indexing schemes
Solves the mappability problem
Is there a one-to-one mapping for a given data space?

7
GiMP Structure

[Diagram: GiMP components on top of a B-tree]
Data Mapping
Components: Reference(P), Distance(P1, P2), Base(P)
Queries (point query, range query, nearest neighbor)
Components: MapRange(rg), MapAnnulus(Q, rmin, rmax)
Basic operations (insert, delete)
Components: Insert(P), Delete(P)

8
GiMP Data Mapping

Components
Reference (P)
reference point for P
ex) starting point with Z-value 0
Distance (P1, P2)
distance between P1 and P2 in multi-dimensional space
can be L1, Euclidean, Max, or any user-defined distance
Base (P)
value to be added to the transformed value
usually used for scattering keys
Key (P) = Base (P) + Distance (P, Reference (P))
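
One way to picture the three data-mapping components is as an abstract interface whose `key` method combines them. The class names below are hypothetical, and `OriginEuclidean` is a toy scheme invented for this sketch, not one of the schemes in the paper.

```python
import math
from abc import ABC, abstractmethod

class DataMapping(ABC):
    """Hypothetical interface mirroring Reference, Distance and Base."""

    @abstractmethod
    def reference(self, p):
        """Return the reference point for p."""

    @abstractmethod
    def distance(self, p1, p2):
        """Return the distance between p1 and p2."""

    @abstractmethod
    def base(self, p):
        """Return the value added to the distance to scatter keys."""

    def key(self, p):
        # Key(P) = Base(P) + Distance(P, Reference(P))
        return self.base(p) + self.distance(p, self.reference(p))

class OriginEuclidean(DataMapping):
    """Toy scheme: a single reference point at the origin, no scattering."""

    def reference(self, p):
        return (0.0,) * len(p)

    def distance(self, p1, p2):
        return math.dist(p1, p2)

    def base(self, p):
        return 0.0

print(OriginEuclidean().key((3.0, 4.0)))
```

A new scheme only overrides the three components; the key computation stays the same.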

9
GiMP Query Processing

Components
MapRange (rg)
transform given range (rg) into key range
MapAnnulus (Q, rmin, rmax)
transform given annulus into one-dimensional intervals, usually incremental mapping
used for kNN search

[Figure: MapRange yields a set of intervals [a1, b1], [a2, b2], ..., [an, bn]; MapAnnulus yields a set of intervals [a1, b1], [a2, b2] for the annulus between rmin and rmax]

10
GiMP Basic Operations

Components
Insert (P)
calculate Key (P) and insert it into the B-tree using the usual B-tree insertion operation
Delete (P)
use the usual B-tree deletion operation
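
A minimal sketch of the two basic operations, assuming a sorted in-memory list as a stand-in for the B-tree. `MappedIndex` and its methods are hypothetical names, and the key function passed in is arbitrary.

```python
import bisect

class MappedIndex:
    """Sorted list of (key, point) pairs standing in for a B-tree."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.entries = []

    def insert(self, p):
        # Compute Key(P), then do an ordinary ordered insertion.
        bisect.insort(self.entries, (self.key_fn(p), p))

    def delete(self, p):
        # Recompute Key(P) to locate the entry, then remove it if present.
        entry = (self.key_fn(p), p)
        i = bisect.bisect_left(self.entries, entry)
        if i < len(self.entries) and self.entries[i] == entry:
            del self.entries[i]
            return True
        return False

idx = MappedIndex(lambda p: p[0] + p[1])  # illustrative key function
for p in [(2, 3), (1, 1), (0, 5)]:
    idx.insert(p)
print(idx.entries)
```

Neither operation needs to know anything about the multi-dimensional layout; all of that is hidden inside the key function.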

11
GiMP UB-tree Instantiation

Data mapping (one-to-one)
use Z-value to map multi-dimensional data
P -> Z-value, one-to-one mapping
Reference (P) = the point with Z-value 0
Distance (P1, P2) = difference of Z-values
Base (P) = 0, since Z-value mapping is one-to-one
Key (P) = Base (P) + Distance (P, Reference (P)) = Z-value of P
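
As an illustration of this instantiation, the sketch below computes a Z-value by bit interleaving for two-dimensional integer coordinates. The bit order (x in the even positions, y in the odd positions) and the 16-bit width are assumptions of this sketch; the slides do not state which convention is used.

```python
def z_value(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions) into one integer."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def ub_key(p):
    """Key(P) = Base(P) + Distance(P, Reference(P)) with Base(P) = 0."""
    base = 0
    reference_z = z_value(0, 0)            # the point with Z-value 0
    distance = z_value(*p) - reference_z   # difference of Z-values
    return base + distance

print(ub_key((2, 2)))
```

Because the interleaving is reversible, two different points never share a key, which is why no scattering term is needed.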

12
GiMP UB-tree Instantiation (cont.)

Query processing
MapRange (rg)
find the Z-value range(s) corresponding to rg
ex) suppose rg is the orange region

intervals to search: [12, 15], [24, 27]
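
A brute-force sketch of MapRange for Z-values: enumerate the cells of a rectangle, compute their Z-values, and merge consecutive values into intervals. This is not the algorithm a real UB-tree uses, and the rectangle below is hypothetical. It was picked because, under the bit order assumed here (x in the even positions), it happens to produce the same two intervals as the example; the actual region in the figure is not recoverable from the transcript.

```python
def z_value(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions) into one integer."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def map_range(x_lo, x_hi, y_lo, y_hi):
    """Return merged Z-value intervals covering every cell of the rectangle."""
    zs = sorted(z_value(x, y)
                for x in range(x_lo, x_hi + 1)
                for y in range(y_lo, y_hi + 1))
    intervals = []
    for z in zs:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [tuple(iv) for iv in intervals]

print(map_range(2, 5, 2, 3))  # [(12, 15), (24, 27)]
```

Each interval then becomes one range scan on the B-tree.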

B-tree search

13
GiMP Pyramid Instantiation

Data mapping (many-to-one)
divide the d-dimensional space into 2d pyramids that share the center point of the space as their top and have a (d-1)-dimensional surface of the data space as their base
each of the 2d pyramids is divided into several partitions
each data point has a height
key (P) = height of P + pyramid number

[Figure: data space divided into pyramids p0 to p3 around the center point, showing a pyramid, a partition, the (d-1)-dimensional surface, and the height of point v]

14
GiMP Pyramid Instantiation (cont.)

Data mapping (many-to-one) (cont.)
Reference (P) = center point
Distance (P, Reference (P)) = height of P
Base (P) = pyramid number
Key (P) = Base (P) + Distance (P, Reference (P)) = pyramid number + height of P
ex) assume height of v = 2.5, then Key (v) = 1 + 2.5 = 3.5
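
A sketch of how the pyramid number and height could be computed, following the usual formulation of the Pyramid technique for points in the unit cube [0, 1]^d. The slides do not spell out these formulas, so the numbering of pyramids and the definition of height below are assumptions. Note that heights here lie in [0, 0.5], so the height of 2.5 in the example above is only illustrative.

```python
import math

def pyramid_key(v):
    """Return pyramid number + height for a point v in the unit cube."""
    d = len(v)
    # The dimension farthest from the center decides which pyramid v is in.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie on the low side, d..2d-1 on the high side.
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])
    return pyramid + height

print(pyramid_key((0.1, 0.6)))
print(pyramid_key((0.6, 0.9)))
```

Since the height never exceeds 0.5, the keys of different pyramids fall into disjoint ranges, so the pyramid number acts as the scattering term.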

15
GiMP Pyramid Instantiation (cont.)

Query processing
MapRange (rg)
find the key ranges for the partitions which overlap rg
ex) suppose rg is the dark-shaded region

[Figure: pyramids p0 to p3 over axes d0 and d1; the key ranges to search are the corresponding intervals for the light-shaded partitions]

16
GiMP Pyramid Instantiation (cont.)

Query processing (cont.)
MapAnnulus (Q, rmin, rmax)
incremental key range search
ex) suppose we first try (a) and then (b) for kNN search
at (a), range query transforms to [2 + hQ - r0, 2 + hQ + r0] for pyr2
save the lower bound (2 + hQ - r0) and upper bound (2 + hQ + r0)
at (b), range query transforms to [2, 2 + hQ - r0] and [2 + hQ + r0, 2 + hQ + r0 + dr] for pyr2 -> the keys to be searched form a continuous range

17
GiMP iDistance Instantiation

Data mapping (many-to-one)
data space is divided into Np partitions
each partition has a reference point
data point P belongs to partition i if i = argmin_j dist (P, rj)
key (P) = distance (P, ri) + i * c
Reference (P) = nearest reference point to P
Base (P) = i * c
Distance (P1, P2) = Euclidean distance between P1 and P2
Key (P) = Base (P) + Distance (P, Reference (P))
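
The key computation above can be sketched as follows. The reference points and the constant `c` are made-up values for illustration; `c` is assumed to be larger than any distance inside a partition so that the key ranges of different partitions do not overlap.

```python
import math

def idistance_key(p, refs, c):
    """Return i * c + dist(p, r_i), where r_i is the nearest reference point."""
    i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
    return i * c + math.dist(p, refs[i])

# Hypothetical reference points and scattering constant.
refs = [(0.0, 0.0), (10.0, 0.0)]
c = 100.0

print(idistance_key((3.0, 4.0), refs, c))
print(idistance_key((10.0, 2.0), refs, c))
```

All points on the same sphere around a reference point share one key, which is what makes the mapping many-to-one.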

[Figure: N partitions with reference points r1, r2, ..., rN; point p at distance d from its reference point, labelled with key (p)]

18
GiMP iDistance Instantiation (cont.)

Query processing
MapAnnulus (Q, rmin, rmax)
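
A possible sketch of MapAnnulus for iDistance, based on the triangle inequality: a point P in partition i that lies within rmax of Q must satisfy |dist(P, ri) - dist(Q, ri)| <= rmax. The function returns only the key intervals not yet covered by an earlier search with radius rmin, so it assumes the caller has already scanned that inner range and kept its candidates. The parameter `radii` (the largest distance from each reference point to any point in its partition) and all sample values are assumptions of this sketch.

```python
import math

def map_annulus(q, r_min, r_max, refs, radii, c):
    """Return the new key intervals when the search radius grows from r_min to r_max."""
    intervals = []
    for i, (ref, radius) in enumerate(zip(refs, radii)):
        d_q = math.dist(q, ref)
        base = i * c
        # Inner side: distances from ref in [d_q - r_max, d_q - r_min].
        lo, hi = max(0.0, d_q - r_max), min(radius, d_q - r_min)
        if lo <= hi:
            intervals.append((base + lo, base + hi))
        # Outer side: distances from ref in [d_q + r_min, d_q + r_max].
        lo, hi = max(0.0, d_q + r_min), min(radius, d_q + r_max)
        if lo <= hi:
            intervals.append((base + lo, base + hi))
    return intervals

# Hypothetical setup: two partitions, each of radius 10.
refs = [(0.0, 0.0), (20.0, 0.0)]
radii = [10.0, 10.0]
print(map_annulus((5.0, 0.0), 1.0, 2.0, refs, radii, c=100.0))
```

In this example the second partition contributes nothing, because even the enlarged query sphere does not reach it. The returned intervals are closed, so they share their endpoints with the range searched earlier.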

19
Performance of GiMP

Direct implementation vs GiMP

20
Performance Prediction

What dominates the overall performance?
the mapping process
how the query is mapped to the one-dimensional ranges
redundant mapping causes performance degradation
Mapping redundancy (mr)
ratio between the mapped region and the query region: mr = nm / na
mr = 1 is optimal

nm: the number of pages that contain the data points that are in the mapped region
na: the minimum number of pages that contain the data points in the answer set of a query Q

21
Performance Prediction (cont.)

Experimental results with amr (averaged mapping redundancy)
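
A small sketch of how mapping redundancy and its average might be computed from page counts. It assumes mr is the ratio of nm (pages touched by the mapped region) to na (minimum pages holding the answer set), and that amr is the arithmetic mean of mr over a set of queries. Both are assumptions, and the page counts below are invented.

```python
def mapping_redundancy(n_mapped_pages, n_answer_pages):
    """mr = nm / na; a value of 1 means no redundant pages were read."""
    return n_mapped_pages / n_answer_pages

def average_mapping_redundancy(page_counts):
    """Arithmetic mean of mr over (nm, na) pairs, one pair per query."""
    ratios = [mapping_redundancy(nm, na) for nm, na in page_counts]
    return sum(ratios) / len(ratios)

# Hypothetical workload of two queries given as (nm, na).
workload = [(6, 3), (4, 4)]
print(mapping_redundancy(6, 3))
print(average_mapping_redundancy(workload))
```

The second query reads exactly the pages it needs (mr = 1), while the first reads twice as many.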

22
Mappability Problem

Observation
naturally, one-to-one mapping schemes show better performance than many-to-one mapping schemes
Mappability
the possibility that a one-to-one mapping exists from a d-dimensional data space to a one-dimensional domain
existence of a one-to-one mapping depends on the nature of the data space (countable or uncountable)

23
Conclusion

Users can define their own mapping-based indexing scheme by implementing the components of GiMP
MR (mapping redundancy) is a governing factor in the efficiency of mapping-based indexing schemes, so it can be used as a performance prediction measurement
Existence of a one-to-one mapping depends on the nature of the data space
