Data Warehouse - PowerPoint PPT Presentation

1 / 25
About This Presentation
Title:

Data Warehouse

Description:

Data Warehouse IMS5024 presented by Eder Tsang – PowerPoint PPT presentation

Number of Views:138
Avg rating:3.0/5.0
Slides: 26
Provided by: Jing1155
Category:
Tags: data | warehouse

less

Transcript and Presenter's Notes

Title: Data Warehouse


1
Data Warehouse
IMS5024 presented by Eder Tsang
2
Data Warehouse
  • A data warehouse is a system consisting of
    processes and databases used to provide the data
    infrastructure for EIS and DSS
  • a subject-oriented, integrated, timevariant,
    and non-volatile collection of data in support of
    managements decisions
  • Inmon and Hackathorn (1994)

3
Data warehouse - subject oriented
  • The data warehouse is organised by data
    subjects that are relevant to the organisation.
  • Customer, claim, shipment, product
  • This may be contrasted with the process
    orientation of many OLTP systems

4
Data warehouse - integrated
  • Data in the warehouse is structured based on a
    corporate-wide model, spanning the functional
    boundaries of legacy systems
  • This includes naming standards, units of
    measurement and periodicity

5
Data warehouse - time variant
  • Data is the data warehouse is characterised by
    the time-series nature of historical data
  • The data consists of a series of snapshots
    which are time-stamped and record values at a
    moment in time
  • This supports trend analysis of the data

6
Data warehouse - non volatile
  • The data warehouse is not continuously updated
    (inserts, eletes and changes) like data in an
    OLTP system
  • Data in a data warehouse is periodically
    up-loaded at a scheduled time intervals (say
    daily)

7
Motivations for data warehousing
  • Demands on OLTP data bases for query processing
    would be too great
  • Data warehousing is designed for efficient
    retrieval
  • Data in legacy systems is frequently
    inconsistent, of poor quality and stored in
    different formats
  • Reduce costs in providing data for decision
    making

8
Motivations for data warehousing
  • Support for focus on complete business processes
    (BPR)
  • Support for new initiatives
  • CRM, Balanced Scorecard
  • Industry sources quote ROIs averaging 401 over
    3 years
  • Remain competitive

9
Typical Data Warehouse Architecture
10
An Actual Data Warehouse
11
Data warehouse development
  • Requirements identification
  • Logical design, data modelling
  • Data extract, transform and load (ETL)
  • Warehouse architecture , technology and tools
  • Physical database design
  • Delivery systems
  • Operational policies

12
Designing a data warehouse data design
  • There are two main approaches to data modelling
    or data warehouse design
  • entity relationship modelling and
    normalisation
  • dimensional modelling

13
The design of databases using a traditional E-R
approach
  • Entities and relationships
  • Normalisation 3NF

14
Entity relationship schema
15
Why do we normalise data?
  • Normalisation is a process for converting complex
    data structures into simple, stable data
    structures
  • Normalisation protects integrity of database by
    avoiding anomalies (update, delete, create)
  • Normalised data models are
  • robust and stable
  • have minimum redundancy

16
Dimensional Modeling (star schema)
17
Dimensional Modeling (star schema)
  • Components of dimensional model
  • Fact Tables contain measurements of business
  • eg. Sales, purchase order, shipment
  • Dimension Tables store the descriptions of
    the dimensions of the business
  • eg. Product, customer, vendor, store

18
Dimensional Modeling (star schema)
  • Each dimension table has a single primary key
    that corresponds exactly to one of the components
    of the multipart key in the fact table.
  • A fact table always expresses a many to many
    relationship (the key is composed of foreign keys
  • The most useful facts in a fact table are numeric
    and additive
  • ( typically values are added up)

19
Snowflake schema
  • Snowflake schema all the tables are normalised
  • Star schemas are preferable to snowflake fewer
    joins for information retrieval

20
Dimensional Modelling vs
E-R modelling
  • the purpose of dimensional modelling structure
    data for easy and efficient analysis
  • E-R modelling creates a single required to
    support organisations Whereas
  • DM creates individual models for
    business/decision interest
  • eg. model for sales info
  • model for Inventory info

21
Entity relationship schema (3NF)
22
Corresponding Star schema
23
Corresponding snowflake schema
24
Dimensional Modelling vs
E-R modelling (Cont)
  • OLTP and DW have different purpose operational
    vs informational
  • Normalisation protects integrity of database by
    avoiding anomalies (update, delete, create)
  • Data models for data warehouse do not have to be
    normalised
  • In contrast, data in DW does not
    change often
  • periodic additions of new data

25
DM vs. E-R modeling debate (Kimballs view)
  • OLTP systems are volatile high rates of update
    transactions
  • In normalised models the goal is to reduce data
    redundancy and prevent update anomalies
  • Data in a data warehouse does not need to be
    normalised because it is periodically refreshed
    not updated by user transactions
Write a Comment
User Comments (0)
About PowerShow.com