Title: Chapter 3 part 2 The Data Warehouse and Design
1Chapter 3 (part 2)The Data Warehouseand Design
- Team 3B
- Michael Watson
- Steve Gibson
- Alicia Stevenson
2Meta Data
- Structure of data as known to the programmer
- Structure of data as known to the DSS analyst
- Transformation of data as it passes into the data
warehouse - Data model
- Relationship between the data model and the data
warehouse - History of extracts
3Managing Reference Tables
4Managing Reference Tables
5Cyclicity of DataThe Wrinkle of Time
6Cyclicity of DataThe Wrinkle of Time
7Cyclicity of DataThe Wrinkle of Time
8Cyclicity of DataThe Wrinkle of Time
9Complexity of Transformationand Integration
- The extraction of data from the operational
environment to the data warehouse environment. - The selection of data from the operational
environment may be very complex. - Operational input keys usually need to be
reconstructed and converted before they are
written out to the data warehouse. - Nonkey data is reformatted as it passes from the
operational environment to the data warehouse
environment. - Data is cleansed as it passes from the
operational environment to the data warehouse
environment.
10Complexity of Transformationand Integration
- Multiple input sources of data exist and must be
merged as they pass into the data warehouse. - When there are multiple input files, key
resolution must be done before the files can be
merged. - With multiple input files, the sequence of the
files may not be the same or even compatible. - Multiple outputs may result.
- Default values must be applied.
- The efficiency of selection of input data for
extraction often becomes a real issue. - Summarization of data is often required.
11Complexity of Transformationand Integration
- Renaming of data elements as they are moved from
the operation environment to the data warehouse
must be tracked. - Input records that must be read have exotic or
nonstandard formats. - Fixed-length records
- Variable-length records
- Occurs depending on
- Occurs clause
- Data relationships that have been built into old
legacy program logic must be understood and
unraveled before those files can be used as input.
12Complexity of Transformationand Integration
- Data format conversion must be done. EBCDIC to
ASCII (or vice versa) must be spelled out. - Massive volumes of input must be accounted for.
- The design of the data warehouse must conform to
a corporate data model. - The data warehouse reflects the historical need
for information, while the operational
environment focuses on the immediate, current
need for information. - The data warehouse addresses the informational
needs of the corporation, while the operational
environment addresses the up-to-the-second
clerical needs of the corporation. - Transmission of the newly created output file
that will go into the data warehouse must be
accounted for.
13Triggering theData Warehouse Record
- Events
- Components of the Snapshot
14Profile Records
- Data in warehouse does not meet criteria
- Massive volumes of data
- No need for historical detail of data
- Data changes frequently
- A Profile record groups many different, detailed
occurrences of operational data into a single
record. They represent a snapshot. It is created
from the grouping of many detailed records
15Profile Record
16Managing Volume
- Detail is lost
- Two approaches to make sure important detail is
not lost - Build the profile records iteratively.
- Go slow to make sure important detail is not
lost - 2) Create an alternative level of historical
detail along with - the profile record. It is slow and difficult to
get to, but there if you need it.
17Creating Multiple Profile Records
- Multiple profile records can be created from the
same detail. - EX Phone Company customer profile record,
district traffic profile record, line analysis
profile record, ect. - These records can go into a data mart or data
warehouse. - Data warehouse general purpose
- Data mart customized for the department that
uses it - Aggregation of the operational records into a
profile record is always done on the operational
server
18Going from the Data Warehouse to the Operational
Environment
- For a variety of reasons. The flow of data from
the operational environment to the data warehouse
environment is natural and normal. - Q Can it go the opposite way?
- A Yes, data can go from the data warehouse to
the operational environment, yes it is not very
common.
19Direct Access of Data Warehouse Data
- A request is made within the operational
environment for data in the data warehouse. The
request is transferred to the warehouse, the data
is located and transferred back to the
operational. - For direct access, time must not be a factor, the
request is for a small amount of data, formatting
of data must be nonexistent, and the two
technologies must be compatible.
20Indirect Access of Data Warehouse DataVery Common
- An example of indirect access to a data warehouse
occurs when a travel agent buys a ticket - Reservation clerk needs to know current bookings
and historical bookings to know the optimal
commission. - Booking and flight history online takes too long,
so it is done offline periodically and a table is
created.
21Indirect Access of Data Warehouse DataVery Common
22EX Retail Personalization System
- Telemarketer needs to personalize phone call
- last time purchased last type of purchase
- male/female children
- ages sports
- This personalization of the phone call and
knowledge of what products the customer is
interested in is achieved by the indirect use of
a database. - An analysis program is constantly reading and
analyzing customer records. Periodically, it
spins off a file to the operational environment
that contains relevant information. The
information is ready when the call is made.
23EX Retail Personalization System
24Indirect Use of Data Warehouse Data
- Considerations of the elements
- The Analysis program
- Many AI characteristics
- Free rein to run on any data warehouse data that
is available - Runs in the background
- Runs in harmony with the rate at which data
warehouse changes
25Considerations (cont.)
- The periodic refreshment
- Occurs infrequently
- Operates in a replacement mode
- Moves the data from the technology supporting the
data warehouse to the technology supporting the
operational environment
26Considerations (cont.)
- The online pre-analyzed data file
- Small amount of data per unit of data
- May contain a large amount of data
- Contains precisely what the online clerk needs
- Is not updated, but refreshed
- Part of online high-performance environment
- Efficient to access
- Geared for access of individual units of data
27Star Joins
- Normalization within a data warehouse
- Produces flexibility
- Fits well with very granular data
- Not optimized for any given set of processing
requirements - Fits very nicely with the data model
- Multidimensional approach
- Star joins, fact tables, and dimensions
- Exclusively to data marts
28Data Marts vs. Data Warehouses
- Shaped by departmental requirements
- Serve one community
- Data mart can be shaped into an optimal star join
structure
- Shaped by corporate requirements
- Serve a very large community
- Star join end result is a data warehouse
optimized for one community at the expense of all
other communities
29Multidimensional Approach
- The appeal of the multidimensional approach to
database design for data marts begins with the
data model
CUSTOMER
VENDOR
PRODUCT
ORDER
SHIPMENT
30Star Join
- Design structure that is required to manage large
amounts of data residing in an entity in a data
mart - The benefit of creating star joins is to
streamline data for DSS processing - The center is called the fact table
- The surrounding entities are called dimensions
31Star Join Example
- Dimension Tables Fact Table Dimension
Tables -
ORDER
SHIPMENT
VENDOR
ORDER ID ORDER DATA ORDER DATA VENDOR ID Nonkey
data CUST ID Nonkey data ORDER ID Nonkey
data PRODUCT ID Nonkey data
VENDOR ID VENDOR DATA VENDOR DATA
ORDER ID ORDER DATA ORDER DATA
CUSTOMER
PRODUCT
CUST ID CUST DATA CUST DATA
PRODUCT ID PRODUCT DATA PRODUCT DATA
Numeric
Character
32Supporting the ODS
- Three classes of ODS, in general
- Class I
- Updates are synchronous
- Class II
- Updates occur within a 2-3 hour time frame
- Class III
- Synchronization of updates occurs overnight
- Another type of ODS structure
- Class IV
- Updates are unscheduled
33Mikes Question
- What is the rule for the cyclicity of data or the
length of time a change of data in the
operational environment takes to be reflected in
the data warehouse?
34Steves Question
- Q What are two methods you can use to make sure
important detail is not lost when making profile
records.
35Alicias Question
- What is a star join and what does a star join
consist of?