Title: Efficient OLAP Operations in Spatial Data Warehouses
1Efficient OLAP Operations in Spatial Data
Warehouses
- Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao
- Department of Computer Science
- Hong Kong University of Science and Technology
- Clear Water Bay, Hong Kong
2Motivating Scenario
The spatial dimension at the finest granularity
consists of a set of regions (e.g., road segments
in traffic supervision systems, areas covered by
cells in mobile communication systems).
The raw data provide the set of objects that fall
in each region at every timestamp (e.g., cars in a
road segment, users serviced by a cell).
Queries ask for aggregate data over regions that
satisfy some spatio-temporal condition (e.g., find
the current traffic in all areas within a 1 km
range of each hospital).
Unlike traditional OLAP, there are no
pre-defined hierarchies.
3The aggregate R-tree
An R-tree with aggregate data for every entry.
The same idea can be applied to other access
methods (e.g., quadtrees).
Other functions may be used (e.g., avg, max).
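The idea of storing an aggregate with every entry can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Entry`, `Node`, `union` and `parent_entry` are hypothetical, rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples, and the aggregate function is assumed to be sum.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect                        # minimum bounding rectangle of the subtree
    aggr: float                      # aggregate (here: sum) of everything below
    child: Optional["Node"] = None   # None for leaf entries (finest regions)


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all given rectangles."""
    xmin, ymin, xmax, ymax = zip(*rects)
    return (min(xmin), min(ymin), max(xmax), max(ymax))


def parent_entry(node: Node) -> Entry:
    """Build the entry that points to `node`, aggregating its children."""
    return Entry(
        mbr=union([e.mbr for e in node.entries]),
        aggr=sum(e.aggr for e in node.entries),
        child=node,
    )


# Two leaf regions holding 3 and 5 cars respectively.
leaf = Node([Entry((0, 0, 1, 1), 3), Entry((2, 2, 3, 3), 5)])
root_entry = parent_entry(leaf)
```

For max, the `sum` call would be replaced by `max`; for avg, each entry would need to keep both a sum and a count so that averages can be combined correctly.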
4Why keep spatiotemporal aggregate information
For efficient query processing (e.g., the number
of objects inside an area can be found by a
window query instead of a spatial join).
Aggregate information is all that we need/know
for some applications (e.g., traffic systems
record the number of cars in an area, not their
ids).
Storing historical information about individual
objects may raise privacy issues (e.g., keeping
all past locations of mobile phone users may be
illegal).
Although the actual data may be highly volatile
and involve extreme space requirements, the
summarized data are less voluminous and may
remain rather constant for long intervals.
5aR-trees and OLAP operations
The aR-tree corresponds to a lattice.
There may be multiple dimensions.
6Query Processing - Single Window
"find the total number of cars on all road
segments inside a query window"
Start from the root of the aR-tree. For each
entry, one of the following three conditions
holds:
- The entry is disjoint from the query window:
the corresponding node cannot contain any cars
contributing to the answer and is not retrieved.
- The entry is inside the query window: all the
required aggregate information is stored with the
entry, so the corresponding node does not need to
be accessed.
- The entry partially overlaps the query window:
the corresponding node must be followed
recursively.
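The three cases above can be sketched as a short recursive function. This is an illustrative sketch under stated assumptions, not the authors' code: `Entry`, `Node`, `disjoint`, `contains` and `window_sum` are hypothetical names, rectangles are `(xmin, ymin, xmax, ymax)` tuples, the aggregate is a sum, and a leaf region that only partially overlaps the window is skipped because the query asks for segments inside the window.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    aggr: float
    child: Optional["Node"] = None  # None for leaf entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def window_sum(node: Node, window: Rect) -> float:
    total = 0
    for e in node.entries:
        if disjoint(e.mbr, window):
            continue                                 # case 1: prune the subtree
        if contains(window, e.mbr):
            total += e.aggr                          # case 2: use the stored aggregate
        elif e.child is not None:
            total += window_sum(e.child, window)     # case 3: descend
        # Assumption: a partially overlapping leaf region is not counted.
    return total


# Small hypothetical tree: two leaves, each with two regions.
leaf1 = Node([Entry((0, 0, 1, 1), 3), Entry((2, 2, 3, 3), 5)])
leaf2 = Node([Entry((10, 10, 11, 11), 7), Entry((12, 12, 13, 13), 2)])
root = Node([Entry((0, 0, 3, 3), 8, leaf1), Entry((10, 10, 13, 13), 9, leaf2)])
```

With this tree, a window that fully covers the first root entry is answered from the root alone, without visiting `leaf1`.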
7Query Processing - Multiple Windows
"Find the total number of cars on road segments
inside each city suburb"
Without aR-trees, the query can be processed as a
multiway spatial join (suburbs, cars, road
segments).
With aR-trees, it is processed as a pairwise join
(suburbs, aR-tree).
If the query windows (i.e., suburbs) fit in
memory, we propose an extension of the
single-window technique that considers all
windows in parallel.
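One way to consider all windows in parallel is to traverse the tree once while carrying, for each node, the list of windows that still partially overlap it. The sketch below is an assumption about how such an extension could look, not the algorithm from the paper; all names (`multi_window_sum`, `Entry`, `Node`, and the helpers) are hypothetical, and the same conventions as in the single-window case are assumed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    aggr: float
    child: Optional["Node"] = None  # None for leaf entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def multi_window_sum(node: Node, windows: List[Rect],
                     totals: Optional[List[float]] = None,
                     active: Optional[List[int]] = None) -> List[float]:
    """Return one total per window, visiting each needed node only once."""
    if totals is None:
        totals = [0] * len(windows)
        active = list(range(len(windows)))
    for e in node.entries:
        pending = []                      # windows that partially overlap e
        for i in active:
            w = windows[i]
            if disjoint(e.mbr, w):
                continue                  # this window ignores the subtree
            if contains(w, e.mbr):
                totals[i] += e.aggr       # answered from the stored aggregate
            elif e.child is not None:
                pending.append(i)
        if pending:
            # The child is read once and shared by all pending windows.
            multi_window_sum(e.child, windows, totals, pending)
    return totals


leaf1 = Node([Entry((0, 0, 1, 1), 3), Entry((2, 2, 3, 3), 5)])
leaf2 = Node([Entry((10, 10, 11, 11), 7), Entry((12, 12, 13, 13), 2)])
root = Node([Entry((0, 0, 3, 3), 8, leaf1), Entry((10, 10, 13, 13), 9, leaf2)])
suburbs = [(-1, -1, 4, 4), (1.5, 1.5, 11.5, 11.5), (100, 100, 101, 101)]
```

Compared with issuing one single-window query per suburb, a node that several windows partially overlap is fetched once instead of once per window.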
8Experimental Settings
Tiger Dataset (130,000 road segments)
We randomly selected 5,000 seed points located on
roads. For each seed point, we generated a
cluster of 250 points (i.e., car positions)
following a Gaussian distribution; the total
number of cars was therefore 1.25M.
The distribution of the queries follows the
distribution of the roads.
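The car-generation step can be reproduced roughly as follows. This is only a sketch: the function name `generate_cars`, the standard deviation `sigma` and the demo seed coordinates are assumptions, since the slides give neither the spread of the clusters nor the way seeds were sampled from the road network.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def generate_cars(seeds: List[Point], per_cluster: int = 250,
                  sigma: float = 0.01, rng_seed: int = 0) -> List[Point]:
    """Generate a Gaussian cluster of car positions around each seed point.

    `sigma` is a hypothetical value; the slides do not state the spread.
    """
    rng = random.Random(rng_seed)
    cars = []
    for sx, sy in seeds:
        for _ in range(per_cluster):
            cars.append((rng.gauss(sx, sigma), rng.gauss(sy, sigma)))
    return cars


# Two made-up seed points stand in for the 5,000 seeds picked on roads.
demo_seeds = [(0.2, 0.3), (0.7, 0.8)]
demo_cars = generate_cars(demo_seeds)

# Size of the full experiment: 5,000 seeds x 250 cars per seed.
total_cars = 5_000 * 250
```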
9Evaluation for Single-Window Queries
Raw data approach: join the cars and streets
datasets.
Fact table approach: an R-tree indexes the fact
table (i.e., similar to aR-trees, but with no
aggregate information in the intermediate nodes).
10Evaluation for Multiple-Window Queries
aR-tree (single queries): a set of single-window
queries processed using the single_aggregation
algorithm of aR-trees.
Fact table (join): a join between the R-tree index
of the fact table and the query windows, which fit
in memory.
Fact table (single): indexed nested loops using
the R-tree index of the fact table.
11Applications to spatio-temporal data
Query: "find the total number of objects in the
regions intersecting some window qs during a time
interval qt"
12The aggregate 3DR-tree (a3DR-tree)
Each entry has the form <r.MBR, r.pointer,
r.lifespan, r.aggr>; that is, for each region
it keeps the aggregate value and the time
interval during which this value is valid.
Whenever the aggregate information about a region
changes, a new entry is created.
Advantage: the a3DR-tree integrates spatial and
temporal dimensions in the same structure (and
is, therefore, expected to be more efficient than
column scanning for queries that involve both
conditions).
Disadvantage: it wastes space by storing the MBR
each time there is an aggregate change.
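The versioning behaviour described above can be illustrated with a flat list of entries instead of a real 3D R-tree. Everything here is a hypothetical sketch: `VersionedEntry`, `VersionLog`, `record` and `qualifying` are invented names, `region_id` is added only for bookkeeping, `r.pointer` is omitted, and lifespans are assumed to be closed intervals of integer timestamps with `end = None` meaning "still valid".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class VersionedEntry:
    region_id: int       # hypothetical bookkeeping field
    mbr: Rect            # repeated in every version (the space overhead)
    start: int           # first timestamp of the lifespan
    end: Optional[int]   # last timestamp, or None while still valid
    aggr: int


def intersects(a: Rect, b: Rect) -> bool:
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


class VersionLog:
    def __init__(self) -> None:
        self.entries: List[VersionedEntry] = []
        self._current: Dict[int, VersionedEntry] = {}

    def record(self, region_id: int, mbr: Rect, t: int, aggr: int) -> None:
        """Register the aggregate of a region at timestamp t."""
        cur = self._current.get(region_id)
        if cur is not None:
            if cur.aggr == aggr:
                return               # unchanged value: lifespan just extends
            cur.end = t - 1          # close the previous version
        new = VersionedEntry(region_id, mbr, t, None, aggr)
        self.entries.append(new)     # a change always creates a new entry
        self._current[region_id] = new

    def qualifying(self, qs: Rect, qt: Tuple[int, int]) -> List[VersionedEntry]:
        """Entries whose MBR intersects qs and whose lifespan intersects qt."""
        t0, t1 = qt
        return [e for e in self.entries
                if intersects(e.mbr, qs)
                and e.start <= t1 and (e.end is None or e.end >= t0)]


log = VersionLog()
log.record(1, (0, 0, 1, 1), 1, 150)
log.record(1, (0, 0, 1, 1), 2, 150)   # no change, no new entry
log.record(1, (0, 0, 1, 1), 3, 200)   # change: second entry, MBR stored again
```

After these three updates the log holds two entries for the same region, each carrying its own copy of the MBR, which is the space cost noted as the disadvantage.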
13The aggregate RB tree
14Query Example
Find all objects in some region overlapping the
query window qs during the time interval 1-3
15The aggregate 3DRB-tree
16Conclusions and directions for future work
Spatio-temporal OLAP: a very promising direction
of work
Incorporation of multi-version structures for
dynamic dimensions
Formalization: analysis of when aggregation
multi-trees are preferable