Title: Chap8: Trends in DBMS
1Chap8 Trends in DBMS
8.1 Database support for Field Entities 8.2
Content-based retrieval 8.3 Introduction to
spatial data warehouses 8.4 Summary
2Learning Objectives
- Learning Objectives (LO)
- LO1 Learn about field data
- LO2 Learn about storage and retrieval of field
data - LO3 Learn about spatial data warehouses
- What are data warehouses? Why are they
interesting? - What are aggregate functions? Which ones are easy
to compute? - Mapping Sections to learning objectives
- LO1 - 8.1
- LO2 - 8.1.2, 8.2
- LO3 - 8.3
38.3 Why are Data Warehouses Interesting?
- Data Warehouse facilitate group decision making
- Consider a dataset
- 1 measure (i.e. Sales)
- 3 dimensions (e.g. Company, Year, Region)
- Analysis questions
- Q1. Rank Regions by total sales.
- Q2. Rank years by total sales.
- Q3. Where are sales consistently growing?
- Cross tabulates summaries reports used to
analyze the trends - Example
48.3 Generating cross-tabulation summaries
- Traditional Approach
- Use custom software pulling data out of a DBMS
- Limitations redundant of work, inefficient use
of resources - Data Warehouse approach
- Cross-tab. Can be generated using a set of
simple report - Each report is generated from a SQL Select ...
group by statement - Example Fig. 8.19 (pp. 244) and Table 8.3 (pp.
245) - Cross-tab example in last slide is a union of
- SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2
- Table 8.3 shows SQL queries to compute each part
- Advantage
- Rest of SQL is available for pre/post processing
of data - Performance gains by eliminating unnecessary
copying of data
5Example Data Warehouse (Fig. 8.19)
Fig 8.19
68.3.4 Cross-tabulation vs.report hierarchy
- Spreadsheet view of a report
- Views a report a N-dim. Spreadsheet
- N number of dimension attributes
- Each cell contains value of measure
- Cross-tabulation view of a Report hierarchy
- Example report hierarchy for
- SALES-L0-A, SALES-L2-A, SALES-L1-B, SALES-L2,
Fig. 8.19 (pp. 244)
78.3 What is a Data Warehouse?
- Data Warehouse is a special purpose database
- Primarily used for specialized data analysis
purposes - Facilitates generation and navigation of a
hierarchy of reports - Special purpose data-sets and queries
- Data consists of
- a few measure attributes
- a set of dimension attributes
- The measure attribute depends on dimension
attributes - Queries generate reports
- Report measure for selected values of dimensions
- Aggregate measure for given subset of dimensions
- What is a spatial data warehouse?
- Data warehouses with spatial measures or
dimensions - Example census data - census tract is a spatial
dimension - Example logistics data - route is a spatial
dimension
88.3.4 Data Warehouse Operations
- Operations on a data warehouse
- Roll-up, Drill-down
- Slice, Dice
- Pivot
- Roll-up
- Inputs A report R, A subset S of dimensions in
R - Output A sequence of reports summarizing R
- Example 1 R SALES-Base, S (Year, Region) in
Fig. 8.19 (pp. 244) - Output consists of reports SALES-L0-A,
SALES-L1-B, SALES-L2 - Example 2 R SALES-Base, S (Region, Year)
- Output consists of reports SALES-L0-A,
SALES-L1-A, SALES-L2 - Drill-down
- Inputs A report R, A dimension D not in R
- Output A reports detailing R on D
- Example R SALES-L1-B, D Region in Fig. 8.19
(pp. 244) - Output report SALES-L0-A
98.3.4 Data Warehouse Operations
- Slice, Dice
- Reduce dimensions in a table- (Fig. 8.7, pp
232). - Inputs A report R, A value V for a dimension D
in R - Output A subset of R where D V
- Example R SALES-L0-A, D Year, V 1994 in
Fig. 8.19 (pp. 244) - Output Table 8.5 (pp. 246)
- includes tuple (ALL, 1994, America, 35)
Fig 8.7
108.3.4 Data Warehouse Operations
- Pivot
- For a spreadsheet view of reports
- Transposes a spreadsheet
- Example
- Inputs A spreadsheet view of a report R
- Output A transposed spreadsheet
- Ex. R SALES-L0-A, Fig. 8.19 (pp. 244)
11Logical Data Model of a DWH
- Purpose of a logical data model
- Specify a framework to specify computational
structure - Allow extension of SQL to model new needs
- Cube operation
- Input A fact table
- Output A set of summary reports covering all
subsets of dimension columns - Equivalent to union of all tables and reports in
Fig. 8.19 (pp. 244) - Ex. Fig. 8.18, pp. 243
- SELECT Company, Year, Region, Sum(Sales) AS Sales
- FROM SALES
- GROUP BY CUBE Company, Year, Region
12Fig 8.18
13Physical Data Model of a DWH
- Purpose Computationally efficient
implementation - Ideas
- Pre-computation -
- pre-compute some of reports and use those to
compute other reports - New indexing methods, e.g. bit-map index
- Query Processing Strategies
- Strategies for aggregate functions
- New strategies for multi-table joins
- Let us look at strategies for aggregate functions
14DWH Physical Model Aggregate function strategies
- Aggregate Functions
- Compute summary statistics for a given set of
values - Examples sum, average, centroid (Table 8.1, pp.
238) - Strategies for efficient computation
- Characterize easy to compute aggregate functions
- 3 categories
- Distributive
- Algebraic
- Holistic
- First 2 categories can be computed easily in one
scan of the dataset
15Definitions of Aggregate Function Categories
- Notation
- F, G, G1, G2, Gn are aggregate functions where
n is small - S is a set of values, e.g. S (1, 2, 3, 4)
- P (S1, S2, , Sp) is a partition of S, e.g. P
(S1, S2), S1 (1, 2), S2 (3, 4) - Distributive( F ) if there exists a G such that
- F( S ) G ( F(S1), F(S2), , F(Sn) )
- Example sum is distributive
- Illustration sum(1, 2, 3, 4) sum ( sum(1, 2),
sum(3, 4) - Algebraic( F ) if there exists G1, , Gn, (where
n is small) and - F( S ) G ( G1(S1), , Gn(S1), G2(S1), ,
Gn(S2), , G1(Sp), , Gn(Sp) ) - Example average is distributive
- Illustration average(1, 2, 3, 4)
- count(1, 2) average(1, 2) count(3, 4)
average / count(1,2) count(3,4)
16Example Distributive Aggregate Function
- Examples in cross-tabulation scenario (Fig.
8.14, pp.238) - Example 1. Min is distributive
- Example 2. Count is distributive
Fig 8.14
17Examples Algebraic Aggregate Functions
- Examples in cross-tabulation scenario (Fig.
8.15, pp.239) - Average and Variance are algebraic
Fig 8.15
18Discussion - Spatial Data Warehouse
- Example
- Consider the example in Fig. 8.16, pp. 241
- A map interpretation may be attached to each
report - Each row has a spatial footprint, which can be
aggregated by geometric-union - The collection of maps may be called a mapcube
- Issues
- What is needed in OGIS standard to support
map-cube operation? - Hierarchical collection of maps in mapcube
- What is an appropriate cartography to convey the
relationship among maps?
19Spatial Data Warehouses and Mapcube
Fig 8.16
20Fig 8.17
21Summary
- Field data
- useful in many applications due to rich content
- Represented as raster or image
- Operations can be categorized into local, focal,
zonal, and global - Field data storage and retrieval
- Tiling is a preferred way to divide raster data
into disk blocks - Meta-data based query is often used for retrieval
- Content based retrieval may be used for
similarity searches - Data warehouses support analysis e.g.
cross-tabulation reports - SQL CUBE operator support generation of DWH
reports - Distributive and Algebraic aggregate functions
can be computed easily