Title: Chapter 5B: The Data Warehouse
1Chapter 5B The Data Warehouse Technology
- Erin Bader
- David Hancock
- Tamya Stallings
2Multidimensional DBMS and the Data Warehouse
3Multidimensional Database Management Systems
Processing
- Provide information system with the structure
that allows an organization to have very flexible
access to data - To slice dice data any number of ways
- To dynamically explore the relationship between
summary and detail data - Offer flexibility and control to the end user
- Fit well in a DSS environment
4Differences between Multidimensional DBMS data
warehouse
- Data warehouse holds massive amounts of data the
multidimensional DBMS holds at least an order of
magnitude less data - The data warehouse is geared for a limited amount
of flexible access the multidimensional DBMS is
geared for very heavy and unpredictable access
and analysis of data - The data warehouse contains data with a very
lengthy time horizon from 5-10 years the
multidimensional DBMS holds a much shorter time
horizon of data
5Differences between Multidimensional DBMS data
warehouse
- The data warehouse allows analysts to access its
data in a constrained fashion the
multidimensional DBMS allows unfettered access - Instead of the data warehouse being housed in a
multidimensional DBMS, the multidimensional DBMS
and the data warehouse enjoy a complementary
relationship
6One Interesting feature of Data Warehouse MDBMS
- Data warehouse can provide a basis for very
detailed data that is normally not found in the
multidimensional DBMS - Data warehouse very fine degree of detail
- MDBMS lightly summarized and houses all but the
most detailed level of data
7Relational foundation for multidimensional DBMS
data marts
- Strengths
- Can support a lot of data
- Can support dynamic joining of data
- Has proven technology
- Is capable of supporting general-purpose update
processing - Is there is no known pattern of usage of data,
then the relational structure is as good as any
other
8Relational foundation for multidimensional DBMS
data marts
- Weaknesses
- Has performance that is less than optimal
- Cannot be purely optimized for access processing
9Cube Foundation for MDBMS data marts
- Strengths
- Performance that is optimal for DSS processing
- Can be optimized for very fast access of data
- If pattern of access of data is known, then the
structure the data can be optimized - Can easily be sliced and diced
- Can be examined in many ways
10Cube Foundation for MDBMS data marts
- Weaknesses
- Cannot handle nearly as much data as a standard
relational format - Does not support general-purpose update
processing - May take a long time to load
- If access is desired on a path not supported by
the design of the data, the structure is not
flexible - Questionable support for dynamic joins of data
11Flow of data from legacy system
12Flow of data from current level detail
13Unmanaged Spider Web
14Tamyas Question
- What are the differences between the
multidimensional DBMS and the data warehouse?
15Data Warehousing across Multiple Storage Media
- Occurs when large amounts of data are present
- 2 processing environments
- DASD
- Online/interactive processing
- Tape or Mass Storage
16Data Warehousing across Multiple Storage Media
cont.
- Logically, the two environments combine to form a
single data warehouse - Physically, the two are very different
- The underlying technologies are not the same.
- Mixing technologies in the data warehouse
environment is normal and natural when done this
way.
17Data Warehousing across Multiple Storage Media
cont.
- Another way technology can be split, but is not
normal nor natural - The DASD portion is split over more than one
technology - One part resides on one vendors technology
- The other part resides on another vendors
database technology
18Caution
- When the split is deliberate and part of a larger
distributed data warehouse, the split is fine. - If the split occurs for political or historical
reasons, it is not advised
19Meta Data in the Data Warehouse Environment
- The role of meta data in the data warehouse
environment is very different from the role of
meta data in the operational environment. - In the operational environment
- Meta data is treated as documentation
- Meta data is optional to the IT professional
20Meta Data in the Data Warehouse Environment cont.
- In the Data Warehouse environment
- Meta data is the first thing the DSS analyst
looks at in planning how to perform
informational/analytical processing - Meta data is much more important here
- Meta data is mandatory to the DSS analyst
21Meta Data in the Data Warehouse Environment cont.
- Data warehouse meta data is also used to manage
the mapping between the operational environment
and the data warehouse environment. Without the
mapping, controlling the interface would be
extremely difficult.
22From the operational to data warehouse
environments
- Conversion
- Filtering
- Summarization
- Structural Changes
- All these activities are documented in the meta
data - This allows for the drill down process to occur
- Must keep track of the changing structure over
time
23Context and Content
- Although current information such as account
balance or inventory on hand is important,
keeping track of the information and allowing it
to be viewed over a period of time can offer new
insight
24Contextual Information
- The contents of data by themselves are quite
inexplicable, but when context is added, together
they become enlightening
25Types of Contextual Information
- Simple Contextual Information
- Complex Contextual Information
- External Contextual Information
26Simple Contextual Information
- Relates to the basic structure of data itself and
was managed in the past by dictionaries,
directories, etc - It includes
- The structure of data
- The encoding of data
- The naming conventions used for data
- The metrics describing data
- How much data is there
- How fast the data is growing
- What sectors of data are growing
- How the data is being used
27Complex Contextual Information
- Describes same data as simple contextual
information, but from a different perspective - Addresses aspects such as
- Product Definitions
- Pricing
- Packaging
- It is some of the most useful, but elusive
information there is to capture - It is elusive because it is taken for granted and
in the background
28External Contextual Information
- Information outside the corporation that plays an
important role in understanding information over
time. - Examples include
- Inflation
- Financial trends
- Taxation
- Economic growth
- External contextual information says nothing
about the company but everything about the
universe in which the company operates
29Erins Question
- What are the three types of contextual
information and explain each one?
30Capturing and Managing Contextual Information
- Difficulties with Contextual Information
- Contextual information changes to quickly
- Difficult to put into the system due to the fast
changes - Complex and external types are hard to capture
because they are so unstructured
31Capturing and Managing Contextual Information
- Past types of managing contextual information
- Dictionaries
- Repositories
- Directories
- Libraries
32Capturing and Managing Contextual Information
- Problems with past types
- Information management was focused on the
developer and not the end user - Attempts were passive
- Management was only limited to simple contextual
information
33Testing
- The testing environment is where programmers test
the new programs and change to existing programs
34Testing
- The test environment is different for every data
warehouse - Reasons are
- The data warehouse is to large and is hard to
justify just one, much less two - Programs are not run in a repetitive manner
35Davids Question
- Why did past attempts at managing contextual
information fail?