NIC Exposure Level Training

1
NIC Exposure Level Training
  • Vijayendra Gururao
  • Business Intelligence Consultant

2
Agenda
  • Data warehousing Concepts - Day 1
  • Govt Case Study - Day 2
  • Defense HR Case study - Day 3
  • Manufacturing Case Study - Day 4
  • Data mining - Day 5

3
The Evolution of Business Intelligence
4
Data Warehousing
5
Introduction
  • Definitions
  • Legacy Systems
  • Dimensions
  • Data Dependencies Model
  • Dimensional Model

6
An ER Model
7
Why Data Warehouses?
  • To meet the long-sought goal of providing
    users with more flexible databases containing
    data that can be accessed in many different ways.

8
OLTP vs. OLAP
  • OLTP (online transaction processing) has been the
    standard focus of IS and DP for the last thirty
    years. Most legacy systems are quite good at
    capturing data but do not facilitate data access.
  • OLAP (online analytical processing) is a set of
    procedures for defining and using a dimensional
    framework for decision support.

9
The Goals for and Characteristics of a DW
  • Make organizational data accessible
  • Facilitate consistency
  • Adaptable and yet resilient to change
  • Secure and reliable
  • Designed with a focus on supporting decision
    making

10
The Goals for and Characteristics of a DW
  • Generate an environment in which data can be
    sliced and diced in multiple ways
  • It is more than data, it is a set of tools to
    query, analyze, and present information
  • The DW is the place where operational data is
    published (cleaned up, assembled, etc.)

11
Data Warehousing is Changing!
Application requirements, not just data
requirements, are now driving the need:
  • Customer Relationship Management
  • Campaign Management
  • ERP
  • Call Center
  • Knowledge Management
  • Target Marketing
  • Supply Chain
  • E-commerce
12
Organization of data in the presentation area of
the data warehouse
  • Data in the warehouse are dimensional, not
    normalized relations
  • However, data that are ultimately presented in
    the data warehouse will often be derived directly
    from relational DBs
  • Data should be atomic somewhere in the warehouse,
    even if the presentation is aggregated
  • Uses the bus architecture to support a
    decentralized set of data marts

13
Updates to a data warehouse
  • For many years, the dogma stated that data
    warehouses are never updated.
  • This is unrealistic since labels, titles, etc.
    change.
  • Some components will, therefore, be changed,
    albeit via a managed load (as opposed to
    transactional updates)

14
Basic elements of the data warehouse
[Diagram: data flows from the source systems through the staging area to the presentation area via extract, load, and access steps]
  • Operational Source Systems
  • Data Staging Area
  • Services: clean, combine, standardize, and conform dimensions; no user query services
  • Data store: flat files and relational tables
  • Processing: sorting and sequential processing
  • Data Presentation Area
  • DW bus: conformed facts and dimensions
  • Data Mart 1: dimensional; atomic and summary data; based on a single business process
  • Data Mart 2: similar design
  • Data Access Tools
  • Ad hoc query tools, report writers, analytical applications
  • Modeling, forecasting, scoring, data mining
15
Data Staging Area
  • Extract-Transformation-Load
  • Extract: reading the source data and copying the
    data to the staging area
  • Transformation
  • Cleaning
  • Combining
  • De-duplicating
  • Assigning keys
  • Load: present data to the bulk-loading facilities
    of the data mart (see the sketch below)

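A minimal sketch of this extract-transform-load flow in Python, using sqlite3 as a stand-in for the source, staging, and mart databases; all table and column names (src_orders, stg_orders, order_fact) are hypothetical, not from the case studies.

import sqlite3

source = sqlite3.connect("source.db")     # operational source system
staging = sqlite3.connect("staging.db")   # data staging area
mart = sqlite3.connect("mart.db")         # presentation-area data mart

# Extract: read the source data and copy it to the staging area
rows = source.execute(
    "SELECT order_id, cust_code, amount, order_date FROM src_orders").fetchall()
staging.execute(
    "CREATE TABLE IF NOT EXISTS stg_orders (order_id, cust_code, amount, order_date)")
staging.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", rows)

# Transform: clean and standardize values on the working table
staging.execute("UPDATE stg_orders SET cust_code = UPPER(TRIM(cust_code))")
staging.execute("DELETE FROM stg_orders WHERE amount IS NULL")  # crude cleaning rule
staging.commit()

# Load: present the cleaned data to the bulk-loading facility of the data mart
clean = staging.execute("SELECT * FROM stg_orders").fetchall()
mart.execute(
    "CREATE TABLE IF NOT EXISTS order_fact (order_id, cust_code, amount, order_date)")
mart.executemany("INSERT INTO order_fact VALUES (?, ?, ?, ?)", clean)
mart.commit()
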
16
Dimensional Modeling Terms and Concepts
  • Fact table
  • Dimension tables

17
Fact Tables
  • Fact table: a table in the data warehouse that
    contains
  • Numerical performance measures
  • Foreign keys that tie the fact table to the
    dimension tables

18
Fact Tables
  • Each row records a measurement describing a
    transaction
  • Where?
  • When?
  • Who?
  • How much?
  • How many?
  • The level of detail represented by this data is
    referred to as the grain of the data warehouse
  • Questions can only be asked down to a level
    corresponding with the grain of the data warehouse

19
Dimension tables
  • Tables containing textual descriptors of the
    business
  • Dimension tables are usually wide (e.g., 100
    columns)
  • Dimension tables are usually shallow (hundreds of
    thousands to a few million rows)
  • Values in the dimensions usually provide
  • Constraints on queries (e.g., view customer by
    region)
  • Report headings

20
Dimension tables
  • The quality of the dimensions determines the
    quality of the data warehouse; that is, the DW is
    only as good as its dimension attributes
  • Dimensions are often split into hierarchical
    branches (i.e., snowflakes) because of the
    hierarchical nature of organizations
  • Product part → Product → Brand
  • Dimensions are usually highly denormalized

21
Dimension tables
  • The dimension attributes define the constraints
    for the DW. Without good dimensions, it becomes
    difficult to narrow down to an answer when the
    DW is used for decision support

22
Bringing Together Facts and Dimensions: Building
the Dimensional Model
  • Start with the normalized ER Model
  • Group the ER diagram components into segments
    based on common business processes and model each
    as a unit
  • Find M:M relationships in the model with numeric
    and additive non-key facts and include them in a
    fact table
  • Denormalize the other tables as needed and
    designate one field as a primary key (a sketch of
    the resulting schema follows below)

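A minimal sketch of the star schema these steps produce, again using sqlite3; the table and column names (sales_fact, d_time, d_product, d_customer) are illustrative, not from the case studies.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE d_time     (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE d_product  (product_key INTEGER PRIMARY KEY, product_code TEXT, description TEXT, brand TEXT);
CREATE TABLE d_customer (customer_key INTEGER PRIMARY KEY, cust_code TEXT, name TEXT, region TEXT);

-- Fact table: numeric, additive measures plus foreign keys to each dimension
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES d_time(time_key),
    product_key  INTEGER REFERENCES d_product(product_key),
    customer_key INTEGER REFERENCES d_customer(customer_key),
    quantity     INTEGER,
    revenue      REAL
);
""")
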
23
A Dimensional Model
24
Kimball Methodology
  • Conformed Dimensions

25
Review: A Private Data Mart
  • A data mart containing one fact table and three
    dimension tables. We delivered all the tables by
    executing a fact build.
  • What if we want to add another fact table called
    F_Sales that will reference the three existing
    dimension tables?

26
Understand Conformed Dimensions
  • A conformed dimension is a dimension that is
    standardized across all data marts.

[Diagram: three fact tables (Sales Fact, Distribution Fact, Order Fact) sharing conformed dimensions: Location, Time, Customer, Distributor, Promotion, and Product]
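Continuing the earlier sqlite3 sketch, one hedged way to picture this: a second fact table for a different business process simply references the same dimension tables, which is what makes those dimensions conformed. The order_fact table and its columns are illustrative.

# A second fact table reuses the same (conformed) dimension tables created
# earlier, so queries can drill across both marts on shared dimensions.
db.executescript("""
CREATE TABLE order_fact (
    time_key       INTEGER REFERENCES d_time(time_key),
    product_key    INTEGER REFERENCES d_product(product_key),
    customer_key   INTEGER REFERENCES d_customer(customer_key),
    cost           REAL,
    number_ordered INTEGER
);
""")
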
27
Advantages of Conformed Dimensions
  • Deliver incremental data marts in a short period
    of time.
  • Independent data marts become part of a fully
    integrated data warehouse.
  • Deliver a consistent view across your business.

28
Conformed Dimensions Within Bus Architecture
  • Identifying and designing the conformed
    dimensions is a critical step in the architecture
    phase of data warehouse design.

29
Design of Conformed Dimensions
  • A commitment to using conformed dimensions is
    more than just a technical consideration. It must
    be a business mandate.
  • Lay out a broad dimensional map for the
    enterprise.
  • Define conformed dimensions at the most granular
    (atomic) level possible.
  • Conformed dimensions should always use surrogate
    keys.
  • Define standard definitions for dimension and
    fact attributes.

30
Granularity in Conformed Dimensions
  • Conformed dimensions should be defined at the
    most granular (atomic) level so that each record
    in these tables corresponds to a single record in
    the base-level fact table.

Order Fact: Day Id, Product Id, Customer Id, Cost,
NumberOrdered
D_Time (Day): Day Id, Day, Month, Year
D_Product: Product Id, Description, Product Type,
Type Description, Product Line, Line Description
D_Customer: Customer Id, Last Name, First Name,
Address
31
Flexibility of Conformed Dimensions
  • Conformed dimensions are usually designed within
    star schema data marts. For multiple granularity
    fact tables, higher level views of dimensions can
    be used (or a snowflake table).

[Diagram: multiple-granularity fact tables sharing the Time dimension through a view or snowflake table]
Order Fact: Day Id, Product Id, Customer Id, Cost, NumberOrdered
Sales Fact: Month Id, Product Id, Customer Id, AmountSold, Revenue
Time (Day): Day Id, Day, Month Id, Period
Time (Month) view: Month Id, Month, Period
Customer: Customer Id, Last Name, First Name, Address
Product: Product Id, Description, Product Type, Product Line
32
So, What is a DW?
  • "A data warehouse is a subject-oriented,
    integrated, non-volatile, and time-variant
    collection of data in support of management's
    decisions." - W.H. Inmon (the father of DW)

33
Subject Oriented
  • Data in a data warehouse are organized around the
    major subjects of the organization

34
Integrated
  • Data from multiple sources are standardized
    (scrubbed, cleansed, etc.) and brought into one
    environment

35
Non-Volatile
  • Once added to the DW, data are not changed
    (barring the existence of major errors)

36
Time Variant
  • The DW captures data at a specific moment; thus,
    it is a snapshot view of the organization at
    that moment in time. As these snapshots
    accumulate, the analyst is able to examine the
    organization over time (a time series!)
  • The snapshot is called a production data extract

37
Need for Data Warehousing
  • Integrated, company-wide view of high-quality
    information (from disparate databases)
  • Separation of operational and informational
    systems and data (for improved performance)

38
(No Transcript)
39
Data Warehouse Architectures
  • Generic Two-Level Architecture
  • Independent Data Mart
  • Dependent Data Mart and Operational Data Store
  • Logical Data Mart and @ctive Warehouse
  • Three-Layer architecture

All involve some form of extraction,
transformation and loading (ETL)
40
Generic two-level architecture
[Diagram: E, T, L steps feeding one, company-wide warehouse]
Periodic extraction → data is not completely
current in warehouse
41
Independent Data Mart
42
Dependent data mart with operational data store
43
Logical data mart and @ctive data warehouse
44
Three-layer architecture
45
DW Design
  • Mainly consists of
  • Logical Design
  • Physical Design

46
Logical Design of DW
  • Identification of
  • Entities
  • Relationships
  • Attributes
  • Unique identifiers

Conceptual and abstract; results in fact and
dimension tables. Created using pen and paper or
modeling tools.
47
Physical Design of DW
  • Conversion of data gathered in Logical design to
    Physical database structure
  • Mainly driven by query performance

48
Logical → Physical
49
Data Characteristics: Status vs. Event Data
Example of DBMS log entry
Event: a database action (create/update/delete)
that results from a transaction
50
Data Characteristics: Transient vs. Periodic Data
Figure 11-8 Transient operational data
Changes to existing records are written over
previous records, thus destroying the previous
data content
51
Data Characteristics: Transient vs. Periodic Data
Periodic warehouse data
Data are never physically altered or deleted once
they have been added to the store
52
Data Reconciliation
  • Typical operational data is
  • Transient, not historical
  • Not normalized (perhaps due to denormalization
    for performance)
  • Restricted in scope, not comprehensive
  • Sometimes of poor quality, with inconsistencies
    and errors
  • After ETL, data should be
  • Detailed, not summarized yet
  • Historical, periodic
  • Denormalized
  • Comprehensive: enterprise-wide perspective
  • Quality controlled: accurate with full integrity

53
Extract Transform Load
  • Extract data from the operational system, transform
    it, and load it into the data warehouse
  • Why ETL?
  • Will your warehouse produce correct information
    with the current data?
  • How can I ensure warehouse credibility?

54
Excuses for NOT Transforming Legacy Data
  • Old data works fine, so new data will work as well.
  • Data will be fixed at the point of entry through the GUI.
  • If needed, data will be cleaned after the new
    system is populated (after the proof-of-concept pilot).
  • Keys join the data most of the time.
  • Users will not agree to modifying or
    standardizing their data.

55
Levels of Migration Problem
  • Existing metadata is insufficient and unreliable
  • Metadata must hold for all occurrences
  • Metadata must represent business and technical
    attributes
  • Data values incorrectly typed and accessible
  • A value's form is extracted from storage
  • A value's meaning is inferred from its content
  • Entity keys unreliable or unavailable
  • Inferred from related values

56
Metadata Challenge
  • Metadata gets out of sync with the details it
    summarizes
  • Business grows faster than the systems designed to
    capture business info
  • Not at the right level of detail
  • Multiple values in a single field
  • Multiple meanings for a single field
  • No fixed format for values
  • Expressed in awkward or limited terms
  • Program/compiler view rather than business view

57
Character-level Challenge
  • Value instance level
  • Spelling, aliases
  • Abbreviations, truncations, transpositions
  • Inconsistent storage formats
  • Named type level
  • Multiple meanings, contextual meanings
  • Synonyms, homonyms
  • Entity level
  • No common keys or representation
  • No integrated view across records, files, systems

58
The ETL Process
  • Capture
  • Scrub or data cleansing
  • Transform
  • Load and Index

ETL = Extract, transform, and load
59
The ETL Process
[Diagram: Extract from the Source Systems, Transform in the Staging Area, Load into the Presentation System]
60
Source Data
  • Record the name, location, and data that exists in
    the TPS environment.
  • File names and location
  • Layout
  • Attribute meaning

61
Extraction
  • Copy specific data directly from the source
    tables into a working dataset in the staging area.

62
Transformation (Dimension Tables)
  • Generate surrogate keys in a primary-to-surrogate
    key table. Make this table permanent.
  • Insert the surrogate key into the working
    dimension tables.
  • Conduct any editing/cleaning operations you need
    (usually on the working table)
  • Generate any derived attributes you need.
  • Generate and retain process logs.

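A minimal sketch of these steps for one dimension, assuming a sqlite3 staging database; the key map and working table names (cust_key_map, wrk_d_customer) and the customer_key column are hypothetical.

import sqlite3

stg = sqlite3.connect("staging.db")

# Permanent business-key-to-surrogate-key map for the customer dimension
stg.execute("""
CREATE TABLE IF NOT EXISTS cust_key_map (
    surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
    cust_code     TEXT UNIQUE
)""")

# Assign a surrogate to every business key not yet in the map
stg.execute("""
INSERT OR IGNORE INTO cust_key_map (cust_code)
SELECT DISTINCT cust_code FROM wrk_d_customer
""")

# Insert the surrogate key into the working dimension table
stg.execute("""
UPDATE wrk_d_customer
SET customer_key = (SELECT surrogate_key FROM cust_key_map
                    WHERE cust_key_map.cust_code = wrk_d_customer.cust_code)
""")
stg.commit()
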
63
Transformation (Fact Tables)
  • Join all dimensions to the fact table (using
    original primary keys). Insert surrogate keys
  • Generate derived facts
  • Generate indicator flags

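A sketch of the matching fact-table step, continuing the staging example above: join the working fact rows to the key maps on the original business keys, carry the surrogate keys forward, and compute a derived fact. All names (wrk_fact, prod_key_map, load_sales_fact) are hypothetical.

stg.execute("""
CREATE TABLE load_sales_fact AS
SELECT c.surrogate_key           AS customer_key,
       p.surrogate_key           AS product_key,
       f.quantity,
       f.unit_price,
       f.quantity * f.unit_price AS revenue        -- derived fact
FROM   wrk_fact f
JOIN   cust_key_map c ON c.cust_code = f.cust_code
JOIN   prod_key_map p ON p.prod_code = f.prod_code
""")
stg.commit()
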
64
Target Data
  • Describe the presentation data structure.
  • Model
  • Metadata
  • Usage and constraints

65
Flow Documentation
  • DFD for the ETL process
  • ERD for Source, Staging and Target databases.
  • Metadata
  • Usage notes.

66
Steps in data reconciliation
Capture/extract: obtaining a snapshot of a
chosen subset of the source data for loading into
the data warehouse
Incremental extract: capturing changes that have
occurred since the last static extract
Static extract: capturing a snapshot of the
source data at a point in time
67
Steps in data reconciliation (continued)
Scrub/cleanse: uses pattern recognition and AI
techniques to upgrade data quality
Fixing errors: misspellings, erroneous dates,
incorrect field usage, mismatched addresses,
missing data, duplicate data, inconsistencies
Also decoding, reformatting, time stamping,
conversion, key generation, merging, error
detection/logging, locating missing data
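A small pandas sketch of the scrub/cleanse step, with simple rule-based stand-ins for the pattern-recognition techniques mentioned above; the file name, lookup values, and column names are invented.

import pandas as pd

customers = pd.read_csv("stg_customers.csv")

# Fix known misspellings via a lookup table
city_fixes = {"Ottowa": "Ottawa", "Torotno": "Toronto"}
customers["city"] = customers["city"].replace(city_fixes)

# Detect erroneous dates and log them rather than silently dropping them
customers["birth_date"] = pd.to_datetime(customers["birth_date"], errors="coerce")
bad_dates = customers[customers["birth_date"].isna()]   # error detection/logging

# Remove duplicate records on the business key
customers = customers.drop_duplicates(subset=["cust_code"])
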
68
Steps in data reconciliation (continued)
Transform: convert data from the format of the
operational system to the format of the data warehouse
Record-level: Selection (data partitioning), Joining
(data combining), Aggregation (data summarization)
Field-level: single-field (from one field to
one field); multi-field (from many fields to one,
or one field to many)
69
Steps in data reconciliation (continued)
Load/Index: place transformed data into the
warehouse and create indexes
Refresh mode: bulk rewriting of target data at
periodic intervals
Update mode: only changes in source data are
written to the data warehouse
70
Single-field transformation
In general, some transformation function
translates data from the old form to the new form
Algorithmic transformation: uses a formula or
logical expression
Table lookup: another approach
71
Multi-field transformation
M:1: from many source fields to one target field
1:M: from one source field to many target fields
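A short pandas sketch of the field-level transformations from the last two slides: an algorithmic single-field transform, a table lookup, an M:1 combine, and a 1:M split. The sample columns and values are invented.

import pandas as pd

df = pd.DataFrame({
    "price_cents":  [1999, 450],
    "country_code": ["CA", "US"],
    "first_name":   ["Mary", "Li"],
    "last_name":    ["Smith", "Wei"],
    "address":      ["10 Main St,Ottawa", "5 Oak Ave,Austin"],
})

# Single-field, algorithmic: a formula converts cents to dollars
df["price"] = df["price_cents"] / 100

# Single-field, table lookup: map a code to its descriptive value
df["country"] = df["country_code"].map({"CA": "Canada", "US": "United States"})

# Multi-field M:1: many source fields to one target field
df["full_name"] = df["first_name"] + " " + df["last_name"]

# Multi-field 1:M: one source field split into many target fields
df[["street", "city"]] = df["address"].str.split(",", expand=True)
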
72
Derived Data
  • Objectives
  • Ease of use for decision support applications
  • Fast response to predefined user queries
  • Customized data for particular target audiences
  • Ad-hoc query support
  • Data mining capabilities
  • Characteristics
  • Detailed (mostly periodic) data
  • Aggregate (for summary)
  • Distributed (to departmental servers)

Most common data model: star schema (also called
the dimensional model)
73
Components of a star schema
Fact tables contain factual or quantitative data
Dimension tables are denormalized to maximize
performance
1:N relationship between dimension tables and
fact tables
Dimension tables contain descriptions about the
subjects of the business
Excellent for ad-hoc queries, but bad for online
transaction processing
74
Star schema example
Fact table provides statistics for sales broken
down by product, period and store dimensions
75
Star schema with sample data
76
Advanced concepts
  • Slowly Changing dimensions
  • Ragged Hierarchies

77
What if Our Data is not Static?
  • Small occasional changes in dimension data are
    normal in business.
  • Examples of these changes include
  • addition of new members (a new product is
    launched)
  • changing of relationships within the dimension (a
    sales rep moves to another branch)
  • properties of members changed (a product is
    reformulated or renamed)
  • deletion of members (this is rare in data
    warehousing)

78
Understand Surrogates
  • 4-byte integer key (can hold more than 2 billion
    positive integers)
  • Internally assigned and meaningless
  • ensures uniqueness
  • always known
  • Used in conjunction with business keys
  • business key is often mnemonic (for example, OTA
    used for the Ottawa office)
  • surrogate key is numeric (for example, 000128)
  • Surrogate keys are never used in reports. They
    are used to link dimension tables to fact tables.

79
Understand Surrogate Keys Used In Operational
Systems
  • Operational databases also sometimes use
    surrogate keys (for example, Employee_No). These
    keys typically cannot be used as the data mart
    surrogate keys.
  • A single member in a data mart (for example, a
    particular employee) may have several data mart
    surrogate keys assigned over time to deal with
    slowly changing dimensions.
  • You may have to merge entities from separate
    operational systems, each with its own
    operational surrogate key (for example, customers
    from separate banking and insurance
    applications).
  • Operational surrogate keys are usually considered
    business keys in the data mart.

80
Understand Natural Keys Example
[Diagram: fact table rows carry the natural (business) keys of the Product and Customer dimensions]
Customer Dimension (Cust Code, Name):
  SA 1 11, Safeway
  LO 2 22, Loblaws
  SE 5 55, 7-11
Fact Table (Prod Code, Cust Code, measures):
  PR X 002 39, SA 1 11
  PR X 003 40, LO 2 22
  PR Y 003 40, SE 5 55
81
Understand Surrogate Keys Example
[Diagram: surrogate keys replace the natural keys in the fact table]
Product Dimension (Prod Sur, Prod Code, Name):
  1, PR X 002 39, Soup
  2, PR X 003 40, Beans
  3, PR Y 003 40, Peas
Customer Dimension (Cust Sur, Cust Code, Name):
  10, SA 1 11, Safeway
  20, LO 2 22, Loblaws
  30, SE 5 55, 7-11
Fact Table (Prod Sur, Cust Sur, measures):
  1, 10
  2, 20
  3, 30
82
Track Dimensional Changes Over Time
  • Operational systems tend to contain data about
    the current state of the business.
  • A data warehouse is expected to hold data for
    five to 10 years.
  • Users may need to query data as of any particular
    date (for example, at which office(s) did Mary
    Smith work between January/1999 and
    December/1999?).
  • If Mary Smith changes offices, to which office do
    her sales apply, the old one or the new one?

83
Understand Slowly Changing Dimensions (SCD)
  • Operational dimensional data may often be thought
    of as static. It may only need to reflect the
    current state.
  • Data warehouse dimensional data often must show
    how the dimensional data changes over time. It is
    not static.
  • The term Slowly Changing Dimension (SCD) refers
    to the tracking of changes to dimensional data
    over time.

84
Understand Issues With Slowly Changing Dimensions
  • Maintaining SCDs can be complex without
    surrogates.

85
Use Different Methods of Handling Dimensional
Changes
  • Two most commonly used types of SCDs (according
    to Kimball)
  • Type 1. Overwrite the old value with the new
    value (do not track the changes).
  • Type 2. Add a new dimension record with a new
    surrogate key (track changes over time).
  • A single row may have a combination of columns of
    different types.

86
Type 1 Overwrite the Original Value
  • The organization may not choose to track certain
    data changes because
  • the original data may have been incorrect
  • the change is not considered relevant for tracking

Sales Rep Dimension Table
Sales Fact Table
87
Type 2 Add a New Dimension Record
  • When a tracked change is detected, a new
    surrogate key is assigned and a new row is added
    to the dimension table.
  • Usually, an effective begin/end date is also
    updated on the new and old rows.
  • Multiple rows may have the same business key, but
    they will always have unique surrogate keys.

Sales Rep Dimension Table
Sales Fact Table
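A minimal sqlite3 sketch of the Type 2 pattern for a detected change; the dimension table d_sales_rep, its columns, and the sample values are hypothetical, and its surrogate key is assumed to be an INTEGER PRIMARY KEY that SQLite assigns automatically on insert.

import sqlite3
from datetime import date

db = sqlite3.connect("mart.db")
today = date.today().isoformat()
rep_code, new_branch = "R042", "Ottawa"    # business key and its changed attribute

# Close off the current row by stamping its effective end date
db.execute("""
UPDATE d_sales_rep
SET   effective_end = ?
WHERE rep_code = ? AND effective_end IS NULL
""", (today, rep_code))

# Add a new row: fresh surrogate key, same business key, updated attribute
db.execute("""
INSERT INTO d_sales_rep (rep_code, branch, effective_begin, effective_end)
VALUES (?, ?, ?, NULL)
""", (rep_code, new_branch, today))
db.commit()
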
88
Balanced and Ragged Hierarchies
  • Dimensional data are usually structured as
    hierarchies, either balanced or ragged
    (unbalanced).
  • Balanced hierarchies (those with a fixed number
    of levels) are most common and are the easiest to
    understand and analyze.
  • In ragged hierarchies, each branch does not break
    down into the same number of levels. They are
    harder to analyze and report against.
  • Also, PowerPlay requires that all leaf
    (lowest-level) nodes be at the same level to
    aggregate properly.

89
Parent-Child Relationships
  • Parent-child relationships are recursive
    relationships.
  • The levels of the hierarchy are determined by
    rows of the same table.

[Diagram: Employees table with a recursive Reports To relationship, related to the Orders table]
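A small Python sketch of walking a reports-to (parent-child) hierarchy to find each row's level, which is the kind of flattening the auto-level hierarchy step later performs; the employee names and the child-to-parent map are illustrative data.

# Child -> parent (reports-to) map; None marks the top of the hierarchy
employees = {
    "Andrew Fuller":   None,
    "Steven Buchanan": "Andrew Fuller",
    "Janet Leverling": "Andrew Fuller",
    "Michael Suyama":  "Steven Buchanan",
}

def level(name):
    # The top is level 1; each step up the reports-to chain adds one level
    parent = employees.get(name)
    return 1 if parent is None else 1 + level(parent)

for name in employees:
    print(f"{name}: level {level(name)}")
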
90
Ragged Hierarchies
  • Leaf nodes have no children.
  • DecisionStream fact builds only look for leaf
    nodes at the lowest level.

91
Resolve Ragged Hierarchies Step 1
  • Create an auto-level hierarchy to obtain the
    number of levels.
  • Create a dimension build to create a physical
    table that will identify for each row the level
    it belongs to.

92
Resolve Ragged Hierarchies Step 1 (cont'd): Use
Auto-Level Hierarchies
  • The purpose of auto-level hierarchies in
    DecisionStream is to determine the number of
    levels in a hierarchy.

Top Level (Level 1)
Report to Andrew Fuller (Level 2)
Report to Steven Buchanan (Level 3)
Report to Andrew Fuller (Level 2)
93
Issues Regarding Star Schema
  • Dimension table keys must be surrogate
    (non-intelligent and non-business related),
    because
  • Keys may change over time
  • Length/format consistency
  • Granularity of fact table: what level of detail
    do you want?
  • Transactional grain: finest level
  • Aggregated grain: more summarized
  • Finer grain → better market-basket analysis
    capability
  • Finer grain → more dimension tables, more rows in
    the fact table

94
The User Interface: Metadata (data catalog)
  • Identify subjects of the data mart
  • Identify dimensions and facts
  • Indicate how data is derived from enterprise data
    warehouses, including derivation rules
  • Indicate how data is derived from operational
    data store, including derivation rules
  • Identify available reports and predefined queries
  • Identify data analysis techniques (e.g.
    drill-down)
  • Identify responsible people

95
  • Q & A

96
Multi-dimensional data
97
On-Line Analytical Processing (OLAP)
  • The use of a set of graphical tools that provides
    users with multidimensional views of their data
    and allows them to analyze the data using simple
    windowing techniques
  • Relational OLAP (ROLAP)
  • Traditional relational representation
  • Multidimensional OLAP (MOLAP)
  • Cube structure
  • OLAP Operations
  • Cube slicing: come up with a 2-D view of the data
  • Drill-down: going from summary to more detailed
    views

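A pandas sketch of the two OLAP operations listed above, using an invented sales table: slicing fixes one dimension member to yield a 2-D view, and drill-down moves from a yearly summary to year-and-month detail.

import pandas as pd

sales = pd.DataFrame({
    "year":    [2004, 2004, 2004, 2005],
    "month":   ["Jan", "Jan", "Feb", "Jan"],
    "product": ["Soup", "Beans", "Soup", "Soup"],
    "store":   ["East", "West", "East", "West"],
    "units":   [120, 80, 95, 130],
})

# Cube slicing: fix the store dimension at one member for a 2-D product-by-month view
east_slice = sales[sales["store"] == "East"].pivot_table(
    values="units", index="product", columns="month", aggfunc="sum")

# Drill-down: from a yearly summary to more detailed year-and-month views
yearly  = sales.pivot_table(values="units", index="year", aggfunc="sum")
monthly = sales.pivot_table(values="units", index=["year", "month"], aggfunc="sum")
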
98
Overall Plan
  • We need fast answers to analytical questions
  • Relational model may not be the answer
  • We can restructure data specifically for analysis
  • First we need to find out how people analyse data

99
Overall Plan
  • Analysis reveals the importance of measures
    and dimensions
  • So we structure the data with that in mind
  • The star schema is the physical structure that
    emerges
  • We can implement this as ROLAP, MOLAP and HOLAP
  • We achieve our objective: rapid analytical
    processing

100
How do we make databases faster?
  • Indexing
  • Query design
  • Application design
  • Care with locking
  • Lots of ways
  • Data structuring

101
Relational
102
Data structuring: the relational model
  • Pros
  • Data integrity in the face of user updates
  • Small data volume
  • Good for transactional queries
  • Cons
  • Poor analytical query performance

103
Poor analytical query performance
  • Why are relational databases slow?
  • Joins
  • Functions
  • Aggregations

104
Poor analytical query performance
  • So, there is a tension between
  • Transactions
  • Analytical querying
  • Solution
  • Split them up
  • Take a copy of the transactional database and
    structure it in a totally different way that is
    optimised for analytical querying

105
Structure for analytical querying
  • Great idea, but first we need to find out how
    people analyse their data

106
(No Transcript)
107
(No Transcript)
108
(No Transcript)
109
How people analyse their data
  • People analyse their data in terms of
  • Graphs
  • Grids
  • Reports
  • Do these have anything in common?

110
How people analyse their data
  • Do these have anything in common?
  • Measures
  • Numerical values
  • Typically plotted on the Y axis
  • Dimensions
  • Discontinuous variables
  • Slice the measures into aggregated groups
  • Typically plotted on the X axis

111
(No Transcript)
112
(No Transcript)
113
(No Transcript)
114
How people analyse their data
  • Dimensions are often hierarchical
  • People want to analyse
  • Time by Year, Quarter, Month, Day
  • Product by Warehouse, Type, Product
  • Customer by Country, County, Person

115
How people analyse their data
  • So, we need to summarise all of this:
  • Measures
  • Dimensions
  • Hierarchies

116
Squashed Octopus
[Diagram: measures (Delay, Profit, Quantity) at the centre, surrounded by dimension hierarchies such as Time (Year, Quarter, Month, WeekDay), Customer (Country, Region, County, Name), Product (Warehouse, Class, Item), and Employee (Region)]
117
Squashed Octopus
  • The SO is a logical model
  • What about a physical model?
  • (Recap why the relational model is slow for
    analytical querying)
  • Joins
  • Functions
  • Aggregations

118
Relational
119
Star Schema
120
Star Schema
  • What is in the fact table?
  • Facts

121
(No Transcript)
122
Star Schema
  • What is in a dimension table?
  • Dimensional information
  • Hierarchical information

123
(No Transcript)
124
Star Schema
  • How do dimension and fact tables work together?

125
(No Transcript)
126
(No Transcript)
127
Star Schema
  • Is it faster?

128
Query                              Time (Relational)   Time (Star Schema)
Monthly totals                     70                  60
Sales in March by Product          18                  6
Sales in March 2004 by Product     12                  2
129
Star Schema
  • Is it faster?
  • Yes
  • How can we make it even faster?
  • Aggregation

130
(No Transcript)
131
Query                              Time (Relational)   Time (Star Schema)   Time (Aggregated Star Schema)
Monthly totals                     70                  60                   <1
Sales in March by Product          18                  6                    <1
Sales in March 2004 by Product     12                  2                    <1
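A short pandas sketch of the aggregation idea behind these timings, reusing the invented sales table from the earlier OLAP sketch: precompute a monthly summary once so repeated queries scan the small aggregate rather than the detailed fact rows.

# Precomputed aggregate (built once, e.g. during the nightly load)
monthly_agg = sales.groupby(["year", "month", "product"], as_index=False)["units"].sum()

# A "sales in January 2004 by product" style query now reads the aggregate
jan_2004 = monthly_agg[(monthly_agg["year"] == 2004) & (monthly_agg["month"] == "Jan")]
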
132
Star Schema
  • If we leave this as a set of tables then it is
    ROLAP (Relational OLAP)
  • (OLAP: On-Line Analytical Processing)
  • But it is a pain to manage
  • All those aggregation tables

133
Star Schema
  • So, the answer is MOLAP
  • (Multi-dimensional OLAP)

134
Star Schema
  • Finally HOLAP
  • (Hybrid OLAP)

135
On-Line Analytical Processing (OLAP)
  • OLAP Operations
  • Cube slicing: come up with a 2-D view of the data
  • Drill-down: going from summary to more detailed
    views

136
Figure 11-22 Slicing a data cube
137
Summary report
Example of drill-down
Drill-down with color added
138
Summary
  • We need fast answers to analytical questions
  • Relational model may not be the answer
  • We can restructure data specifically for analysis
  • First we need to find out how people analyse data

139
Summary
  • Analysis reveals the importance of measures
    and dimensions
  • So we structure the data with that in mind
  • The star schema is the physical structure that
    emerges
  • We can implement this as ROLAP, MOLAP and HOLAP
  • We achieve our objective: rapid analytical
    processing