Chapter 3 part 2 The Data Warehouse and Design - PowerPoint PPT Presentation

1 / 35
About This Presentation
Title:

Chapter 3 part 2 The Data Warehouse and Design

Description:

... needs to know current bookings and historical bookings to know the optimal commission. Booking and flight history online takes too long, so it is done offline ... – PowerPoint PPT presentation

Number of Views:99
Avg rating:3.0/5.0
Slides: 36
Provided by: cltAs
Category:

less

Transcript and Presenter's Notes

Title: Chapter 3 part 2 The Data Warehouse and Design


1
Chapter 3 (part 2)The Data Warehouseand Design
  • Team 3B
  • Michael Watson
  • Steve Gibson
  • Alicia Stevenson

2
Meta Data
  • Structure of data as known to the programmer
  • Structure of data as known to the DSS analyst
  • Transformation of data as it passes into the data
    warehouse
  • Data model
  • Relationship between the data model and the data
    warehouse
  • History of extracts

3
Managing Reference Tables
4
Managing Reference Tables
5
Cyclicity of DataThe Wrinkle of Time
6
Cyclicity of DataThe Wrinkle of Time
7
Cyclicity of DataThe Wrinkle of Time
8
Cyclicity of DataThe Wrinkle of Time
9
Complexity of Transformationand Integration
  • The extraction of data from the operational
    environment to the data warehouse environment.
  • The selection of data from the operational
    environment may be very complex.
  • Operational input keys usually need to be
    reconstructed and converted before they are
    written out to the data warehouse.
  • Nonkey data is reformatted as it passes from the
    operational environment to the data warehouse
    environment.
  • Data is cleansed as it passes from the
    operational environment to the data warehouse
    environment.

10
Complexity of Transformationand Integration
  • Multiple input sources of data exist and must be
    merged as they pass into the data warehouse.
  • When there are multiple input files, key
    resolution must be done before the files can be
    merged.
  • With multiple input files, the sequence of the
    files may not be the same or even compatible.
  • Multiple outputs may result.
  • Default values must be applied.
  • The efficiency of selection of input data for
    extraction often becomes a real issue.
  • Summarization of data is often required.

11
Complexity of Transformationand Integration
  • Renaming of data elements as they are moved from
    the operation environment to the data warehouse
    must be tracked.
  • Input records that must be read have exotic or
    nonstandard formats.
  • Fixed-length records
  • Variable-length records
  • Occurs depending on
  • Occurs clause
  • Data relationships that have been built into old
    legacy program logic must be understood and
    unraveled before those files can be used as input.

12
Complexity of Transformationand Integration
  • Data format conversion must be done. EBCDIC to
    ASCII (or vice versa) must be spelled out.
  • Massive volumes of input must be accounted for.
  • The design of the data warehouse must conform to
    a corporate data model.
  • The data warehouse reflects the historical need
    for information, while the operational
    environment focuses on the immediate, current
    need for information.
  • The data warehouse addresses the informational
    needs of the corporation, while the operational
    environment addresses the up-to-the-second
    clerical needs of the corporation.
  • Transmission of the newly created output file
    that will go into the data warehouse must be
    accounted for.

13
Triggering theData Warehouse Record
  • Events
  • Components of the Snapshot

14
Profile Records
  • Data in warehouse does not meet criteria
  • Massive volumes of data
  • No need for historical detail of data
  • Data changes frequently
  • A Profile record groups many different, detailed
    occurrences of operational data into a single
    record. They represent a snapshot. It is created
    from the grouping of many detailed records

15
Profile Record
16
Managing Volume
  • Detail is lost
  • Two approaches to make sure important detail is
    not lost
  • Build the profile records iteratively.
  • Go slow to make sure important detail is not
    lost
  • 2) Create an alternative level of historical
    detail along with
  • the profile record. It is slow and difficult to
    get to, but there if you need it.

17
Creating Multiple Profile Records
  • Multiple profile records can be created from the
    same detail.
  • EX Phone Company customer profile record,
    district traffic profile record, line analysis
    profile record, ect.
  • These records can go into a data mart or data
    warehouse.
  • Data warehouse general purpose
  • Data mart customized for the department that
    uses it
  • Aggregation of the operational records into a
    profile record is always done on the operational
    server

18
Going from the Data Warehouse to the Operational
Environment
  • For a variety of reasons. The flow of data from
    the operational environment to the data warehouse
    environment is natural and normal.
  • Q Can it go the opposite way?
  • A Yes, data can go from the data warehouse to
    the operational environment, yes it is not very
    common.

19
Direct Access of Data Warehouse Data
  • A request is made within the operational
    environment for data in the data warehouse. The
    request is transferred to the warehouse, the data
    is located and transferred back to the
    operational.
  • For direct access, time must not be a factor, the
    request is for a small amount of data, formatting
    of data must be nonexistent, and the two
    technologies must be compatible.

20
Indirect Access of Data Warehouse DataVery Common
  • An example of indirect access to a data warehouse
    occurs when a travel agent buys a ticket
  • Reservation clerk needs to know current bookings
    and historical bookings to know the optimal
    commission.
  • Booking and flight history online takes too long,
    so it is done offline periodically and a table is
    created.

21
Indirect Access of Data Warehouse DataVery Common
22
EX Retail Personalization System
  • Telemarketer needs to personalize phone call
  • last time purchased last type of purchase
  • male/female children
  • ages sports
  • This personalization of the phone call and
    knowledge of what products the customer is
    interested in is achieved by the indirect use of
    a database.
  • An analysis program is constantly reading and
    analyzing customer records. Periodically, it
    spins off a file to the operational environment
    that contains relevant information. The
    information is ready when the call is made.

23
EX Retail Personalization System
24
Indirect Use of Data Warehouse Data
  • Considerations of the elements
  • The Analysis program
  • Many AI characteristics
  • Free rein to run on any data warehouse data that
    is available
  • Runs in the background
  • Runs in harmony with the rate at which data
    warehouse changes

25
Considerations (cont.)
  • The periodic refreshment
  • Occurs infrequently
  • Operates in a replacement mode
  • Moves the data from the technology supporting the
    data warehouse to the technology supporting the
    operational environment

26
Considerations (cont.)
  • The online pre-analyzed data file
  • Small amount of data per unit of data
  • May contain a large amount of data
  • Contains precisely what the online clerk needs
  • Is not updated, but refreshed
  • Part of online high-performance environment
  • Efficient to access
  • Geared for access of individual units of data

27
Star Joins
  • Normalization within a data warehouse
  • Produces flexibility
  • Fits well with very granular data
  • Not optimized for any given set of processing
    requirements
  • Fits very nicely with the data model
  • Multidimensional approach
  • Star joins, fact tables, and dimensions
  • Exclusively to data marts

28
Data Marts vs. Data Warehouses
  • Shaped by departmental requirements
  • Serve one community
  • Data mart can be shaped into an optimal star join
    structure
  • Shaped by corporate requirements
  • Serve a very large community
  • Star join end result is a data warehouse
    optimized for one community at the expense of all
    other communities

29
Multidimensional Approach
  • The appeal of the multidimensional approach to
    database design for data marts begins with the
    data model

CUSTOMER
VENDOR
PRODUCT
ORDER
SHIPMENT
30
Star Join
  • Design structure that is required to manage large
    amounts of data residing in an entity in a data
    mart
  • The benefit of creating star joins is to
    streamline data for DSS processing
  • The center is called the fact table
  • The surrounding entities are called dimensions

31
Star Join Example
  • Dimension Tables Fact Table Dimension
    Tables



ORDER
SHIPMENT
VENDOR
ORDER ID ORDER DATA ORDER DATA VENDOR ID Nonkey
data CUST ID Nonkey data ORDER ID Nonkey
data PRODUCT ID Nonkey data
VENDOR ID VENDOR DATA VENDOR DATA
ORDER ID ORDER DATA ORDER DATA
CUSTOMER
PRODUCT
CUST ID CUST DATA CUST DATA
PRODUCT ID PRODUCT DATA PRODUCT DATA
Numeric
Character
32
Supporting the ODS
  • Three classes of ODS, in general
  • Class I
  • Updates are synchronous
  • Class II
  • Updates occur within a 2-3 hour time frame
  • Class III
  • Synchronization of updates occurs overnight
  • Another type of ODS structure
  • Class IV
  • Updates are unscheduled

33
Mikes Question
  • What is the rule for the cyclicity of data or the
    length of time a change of data in the
    operational environment takes to be reflected in
    the data warehouse?

34
Steves Question
  • Q What are two methods you can use to make sure
    important detail is not lost when making profile
    records.

35
Alicias Question
  • What is a star join and what does a star join
    consist of?
Write a Comment
User Comments (0)
About PowerShow.com