Data Warehousing - PowerPoint PPT Presentation

1 / 26
About This Presentation
Title:

Data Warehousing

Description:

OLTP (Online transaction processing) has been the standard reason for IS and DP ... Once added to the DW, data are not changed (barring the existence of major errors) ... – PowerPoint PPT presentation

Number of Views:52
Avg rating:3.0/5.0
Slides: 27
Provided by: brianem4
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing


1
Data Warehousing
  • DSCI 4103
  • Dr. Mennecke

Introduction and Chapter 1
2
Introduction
  • Definitions
  • Legacy Systems
  • Dimensions
  • Data Dependencies Model
  • Dimensional Model

3
An ER Model
4
A Dimensional Model
Time
Market
Product
5
Why Data Warehouses?
  • To meet the long sought after goal of providing
    the user with more flexible data bases containing
    data that can be accessed every which way.

6
OLTP vs. OLAP
  • OLTP (Online transaction processing) has been the
    standard reason for IS and DP for the last thirty
    years. Most legacy systems are quite good at
    capturing data but do not facilitate data access.
  • OLAP (Online analytical processing) is a set of
    procedures for defining and using a dimension
    framework for decision support

7
The Goals for and Characteristics of a DW
  • Make organizational data accessible
  • Facilitate consistency
  • Adaptable and yet resilient to change
  • Secure and reliable
  • Designed with a focus on supporting decision
    making

8
The Goals for and Characteristics of a DW
  • Generate an environment in which data can be
    sliced and diced in multiple ways
  • It is more than data, it is a set of tools to
    query, analyze, and present information
  • The DW is the place where operational data is
    published (cleaned up, assembled, etc.)

9
Basic elements of the data warehouse
DataStaging Area
Operational Source Systems
DataPresentation Area
DataAccessTools
Services Clean, combine, and standardizeConform
DimensionsNo user query services Data
Store Flat files and relational
tables Processing Sorting and sequential
processing
Data Mart 1 DimensionalAtomic and summary
dataBased on a single business process
Ad hoc query tools Report Writers Analytical
Applications Modeling Forecasting Scoring Dat
a Mining
Extract
Load
Access
DW BusConformed facts and dimensions
Extract
Data Mart 2 Similar design
Extract
Load
Access
10
Data Staging Area
  • Extract-Transformation-Load
  • Extract Reading the source data and copying the
    data to the staging area
  • Transformation
  • Cleaning
  • Combining
  • Duplicating
  • Assigning keys
  • Load present data to the bulk loading facilities
    of the data mart

11
Organization of data in the presentation area of
the data warehouse
  • Data in the warehouse are dimensional, not
    normalized relations
  • However, data that are ultimately presented in
    the data warehouse will often be derived directly
    from relational DBs
  • Data should be atomic someplace in the warehouse
    even if the presentation is aggregate
  • Uses the bus architecture to support a
    decentralized set of data marts

12
Updates to a data warehouse
  • For many years, the dogma stated that data
    warehouses are never updated.
  • This is unrealistic since labels, titles, etc.
    change.
  • Some components will, therefore, be changed
    albeit, via a managed load (as opposed to
    transactional updates)

13
Dimensional Modeling Terms and Concepts
  • Fact table
  • Dimension tables

14
Fact Tables
  • Fact table a table in the data warehouse that
    contains
  • Numerical performance measures
  • Foreign keys that tie the fact table to the
    dimension tables

15
Fact Tables
  • Each row records a measurement describing a
    transaction
  • Where?
  • When?
  • Who?
  • How much?
  • How many?
  • The level of detail represented by this data is
    referred to as the grain of the data warehouse
  • Questions can only be asked down to a level
    corresponding with the grain of the data warehouse

16
Fact Tables
  • Fact tables contain numeric data that can be one
    of three types
  • Additive
  • Semi-additive
  • Non-additive
  • Fact tables contain foreign keys
  • A group of foreign keys will be used to create a
    concatenated primary key
  • Fact tables generally dont contain textual data

17
Dimension tables
  • Tables containing textual descriptors of the
    business
  • Dimension tables are usually wide (e.g., 100
    columns)
  • Dimension tables are usually shallow (100s of
    thousand or a few million rows)
  • Values in the dimensions usually provide
  • Constraints on queries (e.g., view customer by
    region)
  • Report headings

18
Dimension tables
  • The quality of the dimensions will determine the
    quality of the data warehouse that is, the DW is
    only as good as its dimension attributes
  • Dimensions are often split into hierarchical
    branches (i.e., snowflakes) because of the
    hierarchical nature of organizations
  • Product part ? Product ? Brand
  • Dimensions are usually highly denormalized

19
Dimension tables
  • The dimension attributes define the constraints
    for the DW. Without good dimensions, it becomes
    difficult to narrow down on a solution when the
    DW is used for decision support

20
Bringing together facts and dimensions Building
the dimensional Model
  • Start with the normalized ER Model
  • Group the ER diagram components into segments
    based on common business processes and model each
    as a unit
  • Find MM relationships in the model with numeric
    and additive non-key facts and include them in a
    fact table
  • Denormalize the other tables as needed and
    designate one field as a primary key

21
A Dimensional Model
22
So, What is a DW?
  • A data warehouse is a subject-oriented,
    integrated, non-volatile, and time-variant
    collection of data in support of managements
    decisions W.H. Inmon (the father of DW)

23
Subject Oriented
  • Data in a data warehouse are organized around the
    major subjects of the organization

24
Integrated
  • Data from multiple sources are standardized
    (scrubbed, cleansed, etc.) and brought into one
    environment

25
Non-Volatile
  • Once added to the DW, data are not changed
    (barring the existence of major errors)

26
Time Variant
  • The DW captures data at a specific moment, thus,
    it is a snap-shot view of the organization at
    that moment in time. As these snap-shots
    accumulate, the analyst is able to examine the
    organization over time (a time series!)
  • The snap-shot is called a production data extract
Write a Comment
User Comments (0)
About PowerShow.com