Title: Approximate Spatial Query Processing Using Raster Signatures
1Approximate Spatial Query Processing Using
Raster Signatures
- Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão, Jano Moreira de Souza
- Federal University of Rio de Janeiro
- Coppe Graduate School of Engineering
- Institute of Mathematics, Computer Science Department
2Common Spatial Queries
- Area of polygon
- Area of polygon within window
- Spatial Joins
- polygon × polygon, polygon × polyline, polyline × polyline
- Distance
- Buffer
- Perimeter
- Topological queries
3Common Spatial Queries
- Approximate Area of polygon
- Approximate Area of polygon within window
- Approximate Spatial Joins
- polygon × polygon, polygon × polyline, polyline × polyline
- Approximate Distance
- Approximate Buffer
- Approximate Perimeter
- Approximate Topological queries
4Approximate Answers to Spatial Queries
- What is an approximate answer?
- If the exact result is a number, the approximate result will be a number and a confidence interval
- If not, the graphical display of approximate answers is something like a fuzzy map
5Motivation
- The increase of storage capacity
- The decrease of hardware costs
- Disk access time is still high
- Complex queries
- Data stored in devices that are not on-line.
6Motivation
- Approximate answer may be enough
- Exact answers are themselves approximations
- Approximate answers can be computed quickly
- Spatial query processing
- Scale
- Quality
- Round-off errors
7Scenarios and Applications
- Decision Support System
- Increasing business competitiveness
- More use of accumulated data
- Data mining
- During a drill-down query sequence in ad-hoc data mining
- Earlier queries in a sequence can be used to find out the interesting queries.
- Data warehouse
- Performance and scalability when accessing very large volumes of data during the analysis process.
8Scenarios and Applications
- Query optimization
- To define the most efficient access plan for a given query
- Distributed data recording and warehousing environments
- Data may be remote, and may even be unavailable
- Old data can be discarded to make room for new data. It then becomes impossible to answer queries on the deleted information.
9Scenarios and Applications
- Mobile computing
- An approximate answer may be an alternative
- When the data is not available
- To save storage space
10A framework for approximate query processing
[Figure: data environment set-up for providing approximate answers. Components shown: New data, Database, Approx. Query Engine, Queries, Responses.]
11Four Color Raster Signature (4CRS)
- Raster approximation (VLDB98)
- Object representation on a grid of cells.
- Each cell stores relevant information using few bits.
- Grid resolution can be changed
- Precision vs. storage requirements
- 4 types of cells: empty, weak, strong and full
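A minimal sketch of how a cell could be classified and stored in few bits. It assumes the coverage thresholds this deck gives for weak cells (at most 50% covered) and strong cells (more than 50% covered). The names `CellType`, `classify_cell`, `pack_cells` and `unpack_cells`, the numeric codes, and the four-cells-per-byte layout are illustrative assumptions, not the authors' storage format.

```python
from __future__ import annotations

from enum import IntEnum


class CellType(IntEnum):
    """The four 4CRS cell types; the numeric codes are an assumption."""
    EMPTY = 0
    WEAK = 1
    STRONG = 2
    FULL = 3


def classify_cell(coverage: float) -> CellType:
    """Map the fraction of a cell covered by the polygon to a cell type."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    if coverage == 0.0:
        return CellType.EMPTY
    if coverage == 1.0:
        return CellType.FULL
    return CellType.WEAK if coverage <= 0.5 else CellType.STRONG


def pack_cells(cells: list[CellType]) -> bytes:
    """Pack cell types at 2 bits each, four cells per byte (hypothetical layout)."""
    out = bytearray((len(cells) + 3) // 4)
    for i, cell in enumerate(cells):
        out[i // 4] |= int(cell) << (2 * (i % 4))
    return bytes(out)


def unpack_cells(data: bytes, count: int) -> list[CellType]:
    """Inverse of pack_cells: read back `count` 2-bit cell codes."""
    return [CellType((data[i // 4] >> (2 * (i % 4))) & 0b11) for i in range(count)]
```

With four cell types, two bits per cell are enough, so a coarser grid trades precision for a smaller signature.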
12 4CRS Approximation: Construction of Signatures
[Figure: a polygon and its 4CRS signature.]
13Polygon approximate area
- The algorithm is based on the sum of the expected area of each grid cell
- Empty cells: 0%
- Full cells: 100%
- Weak and strong cells: assuming a uniform distribution
- Weak cells: (0, 0.5] interval → mean 0.25
- Strong cells: (0.5, 1) interval → mean 0.75
- Count the number of cells of each type in the polygon's 4CRS, and multiply these values by the presumed cell area.
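The counting step above can be sketched as follows. The function name `approximate_area`, the dictionary layout, and the choice of `cell_area=1.0` are assumptions made for illustration. The counts are taken from the worked example in this deck (55 empty, 27 weak, 26 strong, 79 full cells).

```python
# Expected fraction of a cell covered by the polygon, per cell type.
EXPECTED_COVERAGE = {"empty": 0.0, "weak": 0.25, "strong": 0.75, "full": 1.0}


def approximate_area(cell_counts: dict, cell_area: float) -> float:
    """Sum the expected covered area over all cells of a signature."""
    covered_cells = sum(
        EXPECTED_COVERAGE[cell_type] * count
        for cell_type, count in cell_counts.items()
    )
    return covered_cells * cell_area


# Counts from the deck's example; cell_area=1.0 is an assumption.
counts = {"empty": 55, "weak": 27, "strong": 26, "full": 79}
print(approximate_area(counts, cell_area=1.0))  # 105.25
```

Only the four counts are needed, so the estimate never reads the polygon's vertices.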
14Confidence interval
- A measure of answer accuracy
- The polygon area inside a weak or strong cell is assumed to be uniformly distributed.
- Weak cells: mean 0.25, variance 1/48 ≈ 0.020833
- Strong cells: mean 0.75, variance 1/48 ≈ 0.020833
- Using the Central Limit Theorem → confidence interval
- 95%
- 99%
15Confidence interval (example)
- Query results
- weak cells: 100
- strong cells: 120
- full cells: 400
- Confidence interval: 95%
- Weak cells
- Strong cells
- Full cells: 400 (full cells have the exact area!)
- Total
- Error between -1.15% and 1.15%
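The per-type formulas on these slides did not survive extraction, so the following is a reconstruction under stated assumptions, not the authors' exact computation. It assumes each weak or strong cell contributes a uniformly distributed fraction over an interval of width 0.5 (variance 0.5²/12 = 1/48), uses the standard normal quantiles 1.96 (95%) and 2.576 (99%), and adds the weak and strong half-widths together. That last choice is an assumption that happens to reproduce the ±1.15% figure above. Combining the variances instead would give a narrower interval. The names `half_width` and `area_interval` are hypothetical.

```python
import math

CELL_VARIANCE = 0.5 ** 2 / 12  # variance of a uniform distribution of width 0.5
Z_SCORES = {0.95: 1.96, 0.99: 2.576}  # two-sided standard normal quantiles


def half_width(n_cells: int, level: float = 0.95) -> float:
    """CLT half-width, in cell units, for the summed area of n_cells cells."""
    return Z_SCORES[level] * math.sqrt(n_cells * CELL_VARIANCE)


def area_interval(weak: int, strong: int, full: int, level: float = 0.95):
    """Return (estimate, margin) in cell units; full cells add no uncertainty."""
    estimate = 0.25 * weak + 0.75 * strong + full
    # Assumption: the weak and strong half-widths are simply added.
    margin = half_width(weak, level) + half_width(strong, level)
    return estimate, margin


estimate, margin = area_interval(weak=100, strong=120, full=400)
print(f"{estimate:.2f} +/- {margin:.2f} cells ({100 * margin / estimate:.2f}%)")
# 515.00 +/- 5.93 cells (1.15%)
```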
16Cell Area Distribution
[Figure: distribution of the polygon area inside weak and strong cells.]
- Comparable to a uniform distribution
- Variance 0.021369 (uniform: 0.020833)
- Mean 0.246453 (uniform: 0.25)
17Example
- empty cells: 55
- weak cells: 27
- strong cells: 26
- full cells: 79
- Approximate area = (Σ weak × 0.25 + Σ strong × 0.75 + Σ full) × cellArea
- Exact area: 106.40
- Appr. area: 105.25
- Error: 1.07%
18Approximate area of polygon × window intersection
- This algorithm is similar to the approximate polygon area algorithm
- There are two kinds of cell overlap
- The cell may be completely contained by the window
- The cell may be partially contained by the window
- The contribution is then proportional to the overlapping area
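A minimal sketch of the two overlap cases, assuming an axis-aligned grid of square cells and a rectangular window given as `(xmin, ymin, xmax, ymax)`. A fully contained cell gets overlap fraction 1, and a partially contained cell is weighted by its overlapping area. The function names, the grid layout (`grid[row][col]` holding a cell type name) and the plain scan over all cells are illustrative assumptions.

```python
# Expected fraction of a cell covered by the polygon, per cell type.
EXPECTED_COVERAGE = {"empty": 0.0, "weak": 0.25, "strong": 0.75, "full": 1.0}


def overlap_fraction(cell, window):
    """Fraction of a cell (xmin, ymin, xmax, ymax) that lies inside the window."""
    cx0, cy0, cx1, cy1 = cell
    wx0, wy0, wx1, wy1 = window
    width = max(0.0, min(cx1, wx1) - max(cx0, wx0))
    height = max(0.0, min(cy1, wy1) - max(cy0, wy0))
    return (width * height) / ((cx1 - cx0) * (cy1 - cy0))


def approximate_window_area(grid, origin, cell_size, window):
    """Expected polygon area inside the window; origin is the grid's lower-left corner."""
    ox, oy = origin
    total = 0.0
    for row, cells in enumerate(grid):
        for col, cell_type in enumerate(cells):
            x0 = ox + col * cell_size
            y0 = oy + row * cell_size
            cell = (x0, y0, x0 + cell_size, y0 + cell_size)
            # 1.0 for fully contained cells, proportional for partial overlap.
            fraction = overlap_fraction(cell, window)
            total += EXPECTED_COVERAGE[cell_type] * fraction * cell_size ** 2
    return total


grid = [["full", "weak"], ["strong", "empty"]]
print(approximate_window_area(grid, (0.0, 0.0), 1.0, (0.0, 0.0, 1.5, 1.0)))  # 1.125
```

In the usage line, the window covers the full cell entirely and half of the weak cell, giving 1.0 + 0.25 × 0.5 = 1.125.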
19Experimental tests
- Computer: PC Pentium IV 1.8 GHz, 512 MB RAM
- Page size: 2,048 bytes
- Goal: to evaluate the use of 4CRS for approximate query processing against exact query processing with respect to the following aspects
- Response time
- Storage requirements
- Accuracy
- The algorithms tested were
- Polygon approximate area
- Approximate area of polygon × window intersection
- 100 random windows for each data set (different sizes and positions)
20Experimental tests
- Use of R-trees in order to reduce the search
space.
21Experimental tests
- The polygon real data sets used in the
experiments consist of township boundaries,
census block-group, topography, geologic map and
hydrographic map from Iowa (USA), and Brazilian
municipalities.
22Approximate polygon area
23Approximate polygon area
24Approximate polygon × window area
25Approximate polygon × window area
26Conclusion
- The experimental results demonstrated the efficiency of using 4CRS for approximate query processing.
- Storage requirements
- 4CRS takes on average 3.75% of the real data set size
- Accuracy
- Approximate area: average error of 2.62%
- Window query approximate area: average error of 1%
- Response time
- Approximate area: average 28.41
- Window query approximate area: average 7.22
- Disk access
- Approximate area: average 1.90
- Window query approximate area: average 7.04
27Future works
- Algorithms for the other operations
- An algorithm for the approximate area of polygon × polygon intersection is being evaluated
- Use of approximations for mobile computing