Chap8: Trends in DBMS - PowerPoint PPT Presentation

1 / 19
About This Presentation
Title:

Chap8: Trends in DBMS

Description:

LO2 : Learn about storage and retrieval of field data. LO3: Learn about spatial data warehouses ... Operations can be categorized into local, focal, zonal, and global ... – PowerPoint PPT presentation

Number of Views:59
Avg rating:3.0/5.0
Slides: 20
Provided by: wwwuser
Category:
Tags: dbms | chap8 | trends | zonal

less

Transcript and Presenter's Notes

Title: Chap8: Trends in DBMS


1
Chap8 Trends in DBMS
8.1 Database support for Field Entities 8.2
Content-based retrieval 8.3 Introduction to
spatial data warehouses 8.4 Summary
2
Learning Objectives
  • Learning Objectives (LO)
  • LO1 Learn about field data
  • LO2 Learn about storage and retrieval of field
    data
  • LO3 Learn about spatial data warehouses
  • What are data warehouses? Why are they
    interesting?
  • What are aggregate functions? Which ones are easy
    to compute?
  • Mapping Sections to learning objectives
  • LO1 - 8.1
  • LO2 - 8.1.2, 8.2
  • LO3 - 8.3

3
8.3 Why are Data Warehouses Interesting?
  • Data Warehouse facilitate group decision making
  • Consider a dataset
  • 1 measure (i.e. Sales)
  • 3 dimensions (e.g. Company, Year, Region)
  • Analysis questions
  • Q1. Rank Regions by total sales.
  • Q2. Rank years by total sales.
  • Q3. Where are sales consistently growing?
  • Cross tabulates summaries reports used to
    analyze the trends
  • Example

4
8.3 Generating cross-tabulation summaries
  • Traditional Approach
  • Use custom software pulling data out of a DBMS
  • Limitations redundant of work, inefficient use
    of resources
  • Data Warehouse approach
  • Cross-tab. Can be generated using a set of
    simple report
  • Each report is generated from a SQL Select ...
    group by statement
  • Example Fig. 8.19 (pp. 244) and Table 8.3 (pp.
    245)
  • Cross-tab example in last slide is a union of
  • SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2
  • Table 8.3 shows SQL queries to compute each part
  • Advantage
  • Rest of SQL is available for pre/post processing
    of data
  • Performance gains by eliminating unnecessary
    copying of data

5
Example Data Warehouse (Fig. 8.19)
Fig 8.19
6
8.3.4 Cross-tabulation vs.report hierarchy
  • Spreadsheet view of a report
  • Views a report a N-dim. Spreadsheet
  • N number of dimension attributes
  • Each cell contains value of measure
  • Cross-tabulation view of a Report hierarchy
  • Example report hierarchy for
  • SALES-L0-A, SALES-L2-A, SALES-L1-B, SALES-L2,
    Fig. 8.19 (pp. 244)

7
8.3 What is a Data Warehouse?
  • Data Warehouse is a special purpose database
  • Primarily used for specialized data analysis
    purposes
  • Facilitates generation and navigation of a
    hierarchy of reports
  • Special purpose data-sets and queries
  • Data consists of
  • a few measure attributes
  • a set of dimension attributes
  • The measure attribute depends on dimension
    attributes
  • Queries generate reports
  • Report measure for selected values of dimensions
  • Aggregate measure for given subset of dimensions
  • What is a spatial data warehouse?
  • Data warehouses with spatial measures or
    dimensions
  • Example census data - census tract is a spatial
    dimension
  • Example logistics data - route is a spatial
    dimension

8
8.3.4 Data Warehouse Operations
  • Operations on a data warehouse
  • Roll-up, Drill-down
  • Slice, Dice
  • Pivot
  • Roll-up
  • Inputs A report R, A subset S of dimensions in
    R
  • Output A sequence of reports summarizing R
  • Example 1 R SALES-Base, S (Year, Region) in
    Fig. 8.19 (pp. 244)
  • Output consists of reports SALES-L0-A,
    SALES-L1-B, SALES-L2
  • Example 2 R SALES-Base, S (Region, Year)
  • Output consists of reports SALES-L0-A,
    SALES-L1-A, SALES-L2
  • Drill-down
  • Inputs A report R, A dimension D not in R
  • Output A reports detailing R on D
  • Example R SALES-L1-B, D Region in Fig. 8.19
    (pp. 244)
  • Output report SALES-L0-A

9
8.3.4 Data Warehouse Operations
  • Slice, Dice
  • Reduce dimensions in a table- (Fig. 8.7, pp
    232).
  • Inputs A report R, A value V for a dimension D
    in R
  • Output A subset of R where D V
  • Example R SALES-L0-A, D Year, V 1994 in
    Fig. 8.19 (pp. 244)
  • Output Table 8.5 (pp. 246)
  • includes tuple (ALL, 1994, America, 35)

Fig 8.7
10
8.3.4 Data Warehouse Operations
  • Pivot
  • For a spreadsheet view of reports
  • Transposes a spreadsheet
  • Example
  • Inputs A spreadsheet view of a report R
  • Output A transposed spreadsheet
  • Ex. R SALES-L0-A, Fig. 8.19 (pp. 244)

11
Logical Data Model of a DWH
  • Purpose of a logical data model
  • Specify a framework to specify computational
    structure
  • Allow extension of SQL to model new needs
  • Cube operation
  • Input A fact table
  • Output A set of summary reports covering all
    subsets of dimension columns
  • Equivalent to union of all tables and reports in
    Fig. 8.19 (pp. 244)
  • Ex. Fig. 8.18, pp. 243
  • SELECT Company, Year, Region, Sum(Sales) AS Sales
  • FROM SALES
  • GROUP BY CUBE Company, Year, Region

12
Fig 8.18
13
Physical Data Model of a DWH
  • Purpose Computationally efficient
    implementation
  • Ideas
  • Pre-computation -
  • pre-compute some of reports and use those to
    compute other reports
  • New indexing methods, e.g. bit-map index
  • Query Processing Strategies
  • Strategies for aggregate functions
  • New strategies for multi-table joins
  • Let us look at strategies for aggregate functions

14
DWH Physical Model Aggregate function strategies
  • Aggregate Functions
  • Compute summary statistics for a given set of
    values
  • Examples sum, average, centroid (Table 8.1, pp.
    238)
  • Strategies for efficient computation
  • Characterize easy to compute aggregate functions
  • 3 categories
  • Distributive
  • Algebraic
  • Holistic
  • First 2 categories can be computed easily in one
    scan of the dataset

15
Definitions of Aggregate Function Categories
  • Notation
  • F, G, G1, G2, Gn are aggregate functions where
    n is small
  • S is a set of values, e.g. S (1, 2, 3, 4)
  • P (S1, S2, , Sp) is a partition of S, e.g. P
    (S1, S2), S1 (1, 2), S2 (3, 4)
  • Distributive( F ) if there exists a G such that
  • F( S ) G ( F(S1), F(S2), , F(Sn) )
  • Example sum is distributive
  • Illustration sum(1, 2, 3, 4) sum ( sum(1, 2),
    sum(3, 4)
  • Algebraic( F ) if there exists G1, , Gn, (where
    n is small) and
  • F( S ) G ( G1(S1), , Gn(S1), G2(S1), ,
    Gn(S2), , G1(Sp), , Gn(Sp) )
  • Example average is distributive
  • Illustration average(1, 2, 3, 4)
  • count(1, 2) average(1, 2) count(3, 4)
    average / count(1,2) count(3,4)

16
Example Distributive Aggregate Function
  • Examples in cross-tabulation scenario (Fig.
    8.14, pp.238)
  • Example 1. Min is distributive
  • Example 2. Count is distributive

Fig 8.14
17
Examples Algebraic Aggregate Functions
  • Examples in cross-tabulation scenario (Fig.
    8.15, pp.239)
  • Average and Variance are algebraic

Fig 8.15
18
Discussion - Spatial Data Warehouse
  • Example
  • Consider the example in Fig. 8.16, pp. 241
  • A map interpretation may be attached to each
    report
  • Each row has a spatial footprint, which can be
    aggregated by geometric-union
  • The collection of maps may be called a mapcube
  • Issues
  • What is needed in OGIS standard to support
    map-cube operation?
  • Hierarchical collection of maps in mapcube
  • What is an appropriate cartography to convey the
    relationship among maps?

19
Spatial Data Warehouses and Mapcube
Fig 8.16
20
Fig 8.17
21
Summary
  • Field data
  • useful in many applications due to rich content
  • Represented as raster or image
  • Operations can be categorized into local, focal,
    zonal, and global
  • Field data storage and retrieval
  • Tiling is a preferred way to divide raster data
    into disk blocks
  • Meta-data based query is often used for retrieval
  • Content based retrieval may be used for
    similarity searches
  • Data warehouses support analysis e.g.
    cross-tabulation reports
  • SQL CUBE operator support generation of DWH
    reports
  • Distributive and Algebraic aggregate functions
    can be computed easily
Write a Comment
User Comments (0)
About PowerShow.com