Title: Data Warehouse
1Data Warehouse
IMS5024 presented by Eder Tsang
2Data Warehouse
- A data warehouse is a system consisting of
processes and databases used to provide the data
infrastructure for EIS and DSS - a subject-oriented, integrated, timevariant,
and non-volatile collection of data in support of
managements decisions - Inmon and Hackathorn (1994)
3Data warehouse - subject oriented
- The data warehouse is organised by data
subjects that are relevant to the organisation. - Customer, claim, shipment, product
- This may be contrasted with the process
orientation of many OLTP systems
4Data warehouse - integrated
- Data in the warehouse is structured based on a
corporate-wide model, spanning the functional
boundaries of legacy systems - This includes naming standards, units of
measurement and periodicity
5Data warehouse - time variant
- Data is the data warehouse is characterised by
the time-series nature of historical data - The data consists of a series of snapshots
which are time-stamped and record values at a
moment in time - This supports trend analysis of the data
6Data warehouse - non volatile
- The data warehouse is not continuously updated
(inserts, eletes and changes) like data in an
OLTP system - Data in a data warehouse is periodically
up-loaded at a scheduled time intervals (say
daily)
7Motivations for data warehousing
- Demands on OLTP data bases for query processing
would be too great - Data warehousing is designed for efficient
retrieval - Data in legacy systems is frequently
inconsistent, of poor quality and stored in
different formats - Reduce costs in providing data for decision
making
8Motivations for data warehousing
- Support for focus on complete business processes
(BPR) - Support for new initiatives
- CRM, Balanced Scorecard
- Industry sources quote ROIs averaging 401 over
3 years - Remain competitive
9Typical Data Warehouse Architecture
10An Actual Data Warehouse
11Data warehouse development
- Requirements identification
- Logical design, data modelling
- Data extract, transform and load (ETL)
- Warehouse architecture , technology and tools
- Physical database design
- Delivery systems
- Operational policies
12Designing a data warehouse data design
- There are two main approaches to data modelling
or data warehouse design - entity relationship modelling and
normalisation - dimensional modelling
13The design of databases using a traditional E-R
approach
- Entities and relationships
- Normalisation 3NF
14Entity relationship schema
15Why do we normalise data?
- Normalisation is a process for converting complex
data structures into simple, stable data
structures - Normalisation protects integrity of database by
avoiding anomalies (update, delete, create) - Normalised data models are
- robust and stable
- have minimum redundancy
16Dimensional Modeling (star schema)
17Dimensional Modeling (star schema)
- Components of dimensional model
- Fact Tables contain measurements of business
- eg. Sales, purchase order, shipment
- Dimension Tables store the descriptions of
the dimensions of the business - eg. Product, customer, vendor, store
18Dimensional Modeling (star schema)
- Each dimension table has a single primary key
that corresponds exactly to one of the components
of the multipart key in the fact table. - A fact table always expresses a many to many
relationship (the key is composed of foreign keys
- The most useful facts in a fact table are numeric
and additive - ( typically values are added up)
19Snowflake schema
- Snowflake schema all the tables are normalised
- Star schemas are preferable to snowflake fewer
joins for information retrieval
20Dimensional Modelling vs
E-R modelling
- the purpose of dimensional modelling structure
data for easy and efficient analysis - E-R modelling creates a single required to
support organisations Whereas - DM creates individual models for
business/decision interest - eg. model for sales info
- model for Inventory info
21Entity relationship schema (3NF)
22Corresponding Star schema
23Corresponding snowflake schema
24Dimensional Modelling vs
E-R modelling (Cont)
- OLTP and DW have different purpose operational
vs informational - Normalisation protects integrity of database by
avoiding anomalies (update, delete, create) - Data models for data warehouse do not have to be
normalised - In contrast, data in DW does not
change often - periodic additions of new data
25DM vs. E-R modeling debate (Kimballs view)
- OLTP systems are volatile high rates of update
transactions - In normalised models the goal is to reduce data
redundancy and prevent update anomalies - Data in a data warehouse does not need to be
normalised because it is periodically refreshed
not updated by user transactions