Title: Indexing Spatio-Temporal Data Warehouses
1 Indexing Spatio-Temporal Data Warehouses
This work was supported by grants HKUST
6081/01E and 6070/00E from Hong Kong RGC.
- Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang
- Department of Computer Science
- Hong Kong University of Science and Technology
- Clear Water Bay, Hong Kong
- 26, Feb, 2002
2 Outline
- Preliminaries: spatial data warehouses and aggregate trees
- Applications and motivation
- Solution for static objects
- Solution for dynamic objects
- Performance study
- Conclusion
3 Preliminaries: Spatial Data Warehouses
- Each spatial object carries some aggregate information (e.g., each region may carry its population).
- A common query is the window aggregate query, which specifies a query window and retrieves the aggregate sum of all objects intersecting it.
- It is the analogue of the group-by in conventional data warehouses.
- Materialization techniques common in traditional data warehouses are of limited use, since the possible positions of query windows are infinite.
- Ad-hoc group-by
[Figure: four regions R1-R4 carrying the aggregate values 75, 12, 150 and 132, and a query window qs]
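The window aggregate query described above can be illustrated with a plain linear scan, which is the baseline an index has to beat. This is a minimal sketch: the `Region` class, the coordinates and the window are assumptions made for illustration; only the four aggregate values are taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Axis-aligned rectangle (x1, y1)-(x2, y2) carrying one aggregate value.
    x1: float
    y1: float
    x2: float
    y2: float
    aggregate: int


def intersects(r, window):
    """True if region r overlaps the window (wx1, wy1, wx2, wy2)."""
    wx1, wy1, wx2, wy2 = window
    return r.x1 <= wx2 and wx1 <= r.x2 and r.y1 <= wy2 and wy1 <= r.y2


def window_aggregate(regions, window):
    """Sum the aggregates of all regions intersecting the window."""
    return sum(r.aggregate for r in regions if intersects(r, window))


# Hypothetical layout; which value belongs to which region is assumed.
regions = [
    Region(0, 0, 2, 2, 150),
    Region(3, 0, 5, 2, 132),
    Region(0, 3, 2, 5, 75),
    Region(3, 3, 5, 5, 12),
]
total = window_aggregate(regions, (1, 1, 4, 1.5))  # touches the two lower regions
```

Every query scans all regions here, which is why a hierarchy over the regions becomes attractive as the data grows.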
4 Preliminaries: Spatial Data Warehouses
- A better approach is to deploy aggregate trees, which introduce a spatial hierarchy [Kline and Snodgrass, 1995; Papadias et al., 2001; Lazaridis and Mehrotra, 2001].
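[Figure: aggregation R-tree. The leaf entries R1-R4 carry the aggregate values 75, 12, 150 and 132; the intermediate entries R5 and R6 carry pre-computed sums (225 and 144 in the figure). The query qs retrieves the sum of the aggregates of the objects intersecting it.]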
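A rough sketch of how an aggregation R-tree can answer the query: when the window fully contains an intermediate entry, the stored sum is used and the subtree is not visited. The `Entry` class, the coordinates and the grouping of regions under the two intermediate entries are assumptions; only the values 75, 12, 150, 132 and the sums 225 and 144 come from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Entry:
    mbr: Rect
    aggregate: int  # for intermediate entries: sum over the whole subtree
    children: List["Entry"] = field(default_factory=list)  # empty for leaf entries


def window_sum(entries, window, descended):
    """Sum the aggregates of leaf entries intersecting the window."""
    total = 0
    for e in entries:
        if not intersects(e.mbr, window):
            continue
        if not e.children or contains(window, e.mbr):
            # Leaf entry, or subtree fully covered: use the stored value.
            total += e.aggregate
        else:
            descended.append(e.mbr)  # partial overlap: must look inside
            total += window_sum(e.children, window, descended)
    return total


# Hypothetical geometry and grouping.
left = Entry((0, 0, 2, 5), 225, [Entry((0, 0, 2, 2), 150), Entry((0, 3, 2, 5), 75)])
right = Entry((3, 0, 5, 5), 144, [Entry((3, 0, 5, 2), 132), Entry((4, 3, 5, 5), 12)])

descended = []
total = window_sum([left, right], (-1, -1, 3.5, 6), descended)
```

With this window the left subtree is covered entirely, so its sum 225 is taken without a descent; only the right subtree is opened, and one of its two regions qualifies.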
5 Spatio-Temporal DW: Applications and Motivation
- Spatio-temporal databases deal with objects whose properties may change with time.
- Traditional studies in spatio-temporal databases focus on retrieving the actual objects that satisfy the query predicates.
  - Retrieve all vehicles that appeared in the north district from 3pm to 5pm yesterday.
- A more useful type of query may retrieve, instead of the actual object IDs, the number of objects that satisfy the query conditions.
  - Retrieve the (approximate) number of vehicles in the north district from 3pm to 5pm yesterday.
- In the above example, the spatial objects (i.e., streets in the north district) that carry aggregate information (i.e., number of cars) are static. Other queries may involve dynamic objects.
  - Mobile phone antennas, which carry aggregate information about the users they serve and whose spatial extents (i.e., coverage areas) may change over time.
6 Example (Static Objects)
Query qs retrieves the aggregate sum (during the interval T1-T4) of all rectangles that intersect it.
7 Traditional Methods
- Pre-materialization
  - Even more difficult than in spatial DWs due to the inclusion of the temporal dimension.
- Use an aggregation tree.
  - When the aggregate of a region changes, create a 3D box. An aggregate 3D R-tree is used to index all these boxes.
  - Problem: the spatial extent of a region must be duplicated many times although it does not change.
[Figure: 3D boxes for region R1, labelled with the aggregate values 150 (T1), 145 (T3), 135 (T4) and 130 (T5)]
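To make the duplication problem concrete, the sketch below builds one 3D box per aggregate change for a static region. The function name, the coordinates and the pairing of values with timestamps (150 at T1, 145 at T3, 135 at T4, 130 at T5) are assumptions based on one reading of the figure labels.

```python
def boxes_for_region(extent, changes, end_time):
    """One 3D box per aggregate value: the extent extruded over the value's lifetime.

    extent:  (x1, y1, x2, y2)
    changes: time-ordered list of (timestamp, new_aggregate)
    """
    x1, y1, x2, y2 = extent
    boxes = []
    for i, (start, value) in enumerate(changes):
        # A value stays valid until the timestamp before the next change.
        stop = changes[i + 1][0] - 1 if i + 1 < len(changes) else end_time
        boxes.append(((x1, y1, start), (x2, y2, stop), value))
    return boxes


r1_extent = (0, 0, 2, 2)  # hypothetical coordinates
r1_changes = [(1, 150), (3, 145), (4, 135), (5, 130)]  # assumed reading of the figure
boxes = boxes_for_region(r1_extent, r1_changes, end_time=5)

# The same four spatial coordinates are stored again in every box.
copies_of_extent = sum(
    1 for lo, hi, _ in boxes if (lo[0], lo[1], hi[0], hi[1]) == r1_extent
)
```

The region never moves, yet its extent is written once per aggregate change, which is the storage overhead the slide points out.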
8 Aggregate RB-tree
Spatial extents are stored only once.
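One way to picture this: each region keeps its extent once, together with its own time-ordered list of aggregate changes. In the sketch below a sorted list searched with `bisect` stands in for the per-region B-tree, and a linear scan stands in for the R-tree. It assumes that a value persists until the next recorded change and that the query sums the per-timestamp values over the interval; the class, the coordinates and the second region are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Region:
    extent: Rect       # stored once, no matter how often the aggregate changes
    times: List[int]   # sorted timestamps at which the aggregate changed
    values: List[int]  # values[i] holds from times[i] until the next change

    def value_at(self, t: int) -> int:
        i = bisect_right(self.times, t) - 1
        return self.values[i] if i >= 0 else 0

    def interval_sum(self, t1: int, t2: int) -> int:
        # Assumed semantics: add up the value of every timestamp in [t1, t2].
        return sum(self.value_at(t) for t in range(t1, t2 + 1))


def query(regions, window, t1, t2):
    """Aggregate sum over [t1, t2] of all regions intersecting the window."""
    return sum(r.interval_sum(t1, t2) for r in regions if intersects(r.extent, window))


r1 = Region((0, 0, 2, 2), [1, 3, 4, 5], [150, 145, 135, 130])
r2 = Region((3, 3, 5, 5), [1], [75])  # hypothetical second region

only_r1 = query([r1, r2], (0, 0, 1, 1), 1, 4)  # 150 + 150 + 145 + 135
both = query([r1, r2], (0, 0, 4, 4), 1, 4)
```

A real structure would also keep sums in the intermediate B-tree entries so that an interval is not walked timestamp by timestamp; that refinement is left out here.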
9 Example (Dynamic Objects)
[Figure: regions R1-R4 and query qs; situation during timestamps 1-4]
Query qs retrieves the aggregate sum (during the interval T1-T4) of all rectangles that intersect it.
10 Example (cont.)
[Figure: regions R1-R4 and query qs after a change of position at timestamp 5]
Query qs retrieves the aggregate sum (during the interval T1-T4) of all rectangles that intersect it.
11 Aggregate HRB-tree
- Integrates the previous idea (the aggregate RB-tree) with the HR-tree, a spatio-temporal access method.
[Figure: the tree at timestamps 1-4 and at timestamp 5]
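The HR-tree idea of keeping one logical tree per timestamp while sharing unchanged nodes between consecutive versions can be sketched with immutable nodes. Everything here is hypothetical (the `Node` class, the coordinates, the choice of R1 as the moving region), and both the bounding-rectangle maintenance and the per-region aggregate histories are omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    # Child Nodes for the root, or (name, rect) pairs for a leaf node.
    entries: Tuple


def move_region(root, leaf_idx, entry_idx, new_rect):
    """Return a new root in which one leaf entry has moved.

    Only the affected leaf and the root are copied; every other node
    is shared with the previous version.
    """
    old_leaf = root.entries[leaf_idx]
    name, _ = old_leaf.entries[entry_idx]
    new_leaf = Node(
        old_leaf.entries[:entry_idx]
        + ((name, new_rect),)
        + old_leaf.entries[entry_idx + 1:]
    )
    return Node(root.entries[:leaf_idx] + (new_leaf,) + root.entries[leaf_idx + 1:])


leaf_a = Node((("R1", (0, 0, 2, 2)), ("R2", (0, 3, 2, 5))))
leaf_b = Node((("R3", (3, 0, 5, 2)), ("R4", (3, 3, 5, 5))))
root_v1 = Node((leaf_a, leaf_b))

# Timestamps 1-4 all point to the same tree; timestamp 5 gets a new root.
roots = {t: root_v1 for t in range(1, 5)}
roots[5] = move_region(root_v1, 0, 0, (1, 0, 3, 2))

shared = roots[5].entries[1] is roots[4].entries[1]  # untouched leaf is reused
copied = roots[5].entries[0] is not roots[4].entries[0]  # changed leaf is new
```

A query for a given timestamp starts from that timestamp's root, so old versions stay searchable while storage grows only with the nodes that actually changed.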
12 Aggregate 3D RB-tree
- Creates a 3D box only when the spatial extent of
an object changes.
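A possible reading of this rule: a region's history is cut into versions at extent changes only, and aggregate changes are recorded inside the current version. The sketch below follows that reading; the event format, the function name and all values are assumptions, and carrying the last aggregate value into a new version is a design choice of the sketch.

```python
def build_versions(events):
    """Group one region's history into extent versions (one 3D box each).

    events: time-ordered tuples ('extent', t, rect) or ('agg', t, value).
    """
    versions = []
    for kind, t, payload in events:
        if kind == "extent":
            history = []
            if versions:
                versions[-1]["end"] = t - 1  # close the previous box
                if versions[-1]["aggs"]:
                    # Carry the current value into the new version.
                    history = [(t, versions[-1]["aggs"][-1][1])]
            versions.append({"rect": payload, "start": t, "end": None, "aggs": history})
        elif kind == "agg":
            if not versions:
                raise ValueError("aggregate change before any extent is known")
            aggs = versions[-1]["aggs"]
            if aggs and aggs[-1][0] == t:
                aggs[-1] = (t, payload)  # overwrite a value carried at the same time
            else:
                aggs.append((t, payload))
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return versions


# Hypothetical history: three aggregate changes, then the region moves.
events = [
    ("extent", 1, (0, 0, 2, 2)),
    ("agg", 1, 150),
    ("agg", 3, 145),
    ("agg", 4, 135),
    ("extent", 5, (1, 0, 3, 2)),
    ("agg", 5, 130),
]
versions = build_versions(events)
```

Four aggregate changes and two extents yield two boxes here, whereas one box per aggregate change would yield four.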
13 Managing Numerous B-trees
- If each B-tree is too small (i.e., the rates of spatial extent and aggregate changes are similar)
  - A block contains too few entries and much space is wasted.
  - Not suitable for caching.
- Our solution is to use a B-File, which packs numerous B-trees into a single file
  - Avoiding empty spaces in a disk page.
  - Maintaining the same query performance.
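The space argument can be illustrated with a simple first-fit packing of small trees into shared pages. This is not the authors' B-File layout, which the slides do not specify; the functions, tree sizes and page capacity are hypothetical, and the sketch only handles trees that fit in one page.

```python
def pages_one_tree_per_page(tree_sizes, page_capacity):
    """Pages needed when every tree starts on its own page (ceiling division)."""
    return sum(-(-n // page_capacity) for n in tree_sizes)


def pack_first_fit(tree_sizes, page_capacity):
    """Place each small tree into the first shared page with enough room.

    Returns (pages, directory): pages is a list of lists of (tree_id, size),
    directory maps tree_id to the page that holds it.
    """
    pages, free, directory = [], [], {}
    for tree_id, n in enumerate(tree_sizes):
        if n > page_capacity:
            raise ValueError("sketch only handles trees that fit in one page")
        for page_no, room in enumerate(free):
            if n <= room:
                pages[page_no].append((tree_id, n))
                free[page_no] -= n
                directory[tree_id] = page_no
                break
        else:
            # No existing page has room: open a new one.
            pages.append([(tree_id, n)])
            free.append(page_capacity - n)
            directory[tree_id] = len(pages) - 1
    return pages, directory


tree_sizes = [3, 5, 2, 4, 6, 1]  # entries per B-tree, hypothetical
separate = pages_one_tree_per_page(tree_sizes, page_capacity=10)
pages, directory = pack_first_fit(tree_sizes, page_capacity=10)
```

Each tree is still reached with a single page read through the directory, while the number of pages drops from six to three in this example.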
14 Performance
- Dataset settings
  - Number of spatial objects: 10,000
  - History length: 1,000 timestamps
  - Aggregate agility describes how fast the aggregate information changes (4, 8, 16, 32, 64)
  - Region agility describes how fast the spatial extents change
    - 0 for static objects
    - 0.01 for dynamic objects (capturing the fact that the spatial dimension changes much more slowly than the aggregate data)
  - Datasets include 500,000 to 6,500,000 records.
- Each query contains 2 parameters (spatial extent and interval length).
15 Results (Static Objects)
16 Results (Static Objects)
17 Results (Dynamic Objects)
18 Results (Dynamic Objects)
19 Conclusion
- We propose indexing techniques that replace the data cube in spatio-temporal data warehouses and answer ad-hoc group-by queries very efficiently.
- Both static and dynamic spatial dimensions are discussed.
- Extensions
  - Cost models that predict the performance of alternative structures.
  - Query optimization based on the cost models.
  - Complex query evaluation