Title: Designing the data warehouse / data marts
1Designing the data warehouse/ data marts
- Methodologies and Techniques
2Basic principles
3Life cycle of the DW
First time load
Operational Databases
Refresh
Refresh
Purge or Archive
Refresh
4Oracle Warehouse Components
Any Data
Any Access
Any Source
Relational tools
Relational /Multidimensional
Operational data
OLAP tools
Text, image
Spatial
Audio,video
External data
Web
Applications/ Web
5Oracle Intelligence Tools
IS develops users Views
Business users
Analysts
Current
Tactical
Strategic
Oracle Reports
Oracle Discoverer
Oracle Express
6Oracle Data Mart Suite
Data Modeling Oracle Data Mart Designer
OLTP Databases
Data Mart Database
OLTP Engines
Ware- housing Engines
Oracle8
SQLPLUS
Data Access Analysis Discoverer Oracle Reports
Data Management Oracle Enterprise Manager
Data Extraction Oracle Data Mart Builder
7Big Bang ApproachAdvantages and Disadvantages
- Advantages
- warehouse built as part of major project (eg
BPR) - Having a big picture of the data warehouse
before starting the data warehousing project - Disadvantages
- Involves a high risk, takes a longer time
- Runs the risk of needing to change requirements
- Costly and harder to get support for from users
8Incremental Approach to Warehouse Development
- Multiple iterations
- Shorter implementations
- Validation of each phase
9Benefits of an Incremental Approach
- Delivers a strategic data warehouse solution
through incremental development efforts - Provides extensible, scalable architecture
- Quickly provides business benefits and ensures a
much earlier return of investment - Allows a data warehouse to be built based on a
subject or application area at a time - Allows the construction of an integrated data
mart environment
10Data Mart
- A subset of a data warehouse that supports the
requirements of a particular department or
business function. - Characteristics include
- Do not normally contain detailed operational data
unlike data warehouses. - May contain certain levels of aggregation
11Dependent Data Mart
Flat Files
Operational Systems
Sales
Data Warehouse
Finance
Data Marts
External Data
12Independent Data Mart
Operational Systems
Flat Files
Sales or Marketing
External Data
13Reasons for Creating a Data Mart
- To give users more flexible access to the data
they need to analyse most often. - To provide data in a form that matches the
collective view of a group of users - To improve end-user response time.
- Potential users of a data mart are clearly
defined and can be targeted for support
14Reasons for Creating a Data Mart
- To provide appropriately structured data as
dictated by the requirements of the end-user
access tools. - Building a data mart is simpler compared with
establishing a corporate data warehouse. - The cost of implementing data marts is far less
than that required to establish a data warehouse.
15Data Marts Issues
- Data mart functionality
- Data mart size
- Data mart load performance
- Users access to data in multiple data marts
- Data mart Internet / Intranet access
- Data mart administration
- Data mart installation
16Example of DW tool OLAP
- Rotate and drill down to successive levels of
detail. - Create and examine calculated data interactively
on large volumes of data. - Determine comparative or relative differences.
- Perform exception and trend analysis.
- Perform advanced analytical functions for example
forecasting, modeling, and regression analysis
17Original OLAP Rules
- 1. Multidimensional conceptual view
- 2. Transparency
- 3. Accessibility
- 4. Consistent reporting performance
- 5. Client-server architecture
18Original OLAP Rules
- 6. Multiuser support
- 7. Unrestricted cross-dimensional operations
- 8. Intuitive data manipulation
- 9. Flexible reporting
- 10. Unlimited dimensions and aggregation levels
19Relational Database Model
Attribute 1Name
Attribute 2Age
Attribute 3Gender
Attribute 4Emp No.
1001 1007 1010 1020
31 42 22 32
F M M F
Anderson Green Lee Ramos
Row 1 Row 2 Row 3 Row 4
The table above illustrates the employee relation.
20Multidimensional Database Model
Store
Customer
Store
Time
Time
FINANCE
SALES
Product
GL_Line
- The data is found at the intersection of
dimensions.
21Two dimensions
22Three dimensions
23Specialised Multidimensional tool
- Benefits
- Quick access to very large volumes of data
- Extensive and comprehensive libraries of complex
functions - analysis
- Strong modeling and forecasting capabilities
- Can access multidimensional and relational
database structures - Caters for calculated fields
- Disadvantages
- Difficulty of changing model
- Lack of support for very large volumes of data
- May require significant processing power
24MOLAP Server
- The application layer stores data in a
multidimensional structure - The presentation layer provides the
multidimensional view
DSS client
MOLAP Engine
- Efficient storage and processing
- Complexity hidden from the user
- Analysis using preaggregated summaries and
precalculated measures
Application layer
Warehouse
25ROLAP Server
- The warehouse stores atomic data.
- The application layer generates SQL for the
three- dimensional view. - The presentation layer provides the
multidimensional view.
DSS client
ROLAP engine
Application layer
Multiple SQL
Warehouseserver
26MOLAP
MDDB
Query
Periodic load
Data
Express Server
Express user
Warehouse
27ROLAP
Cache
Live fetch
Query
Data cache
Data
Express user
Express Server
Warehouse
Also Hybrid (HOLAP)
28Choosing a Reporting Architecture
- Business needs
- Potential for growth
- interface
- enterprise architecture
- Network architecture
- Speed of access
- Openness
Good
Query Performance
OK
Simple
Complex
Analysis
29Data Acquisition
- Identify, extract, transform, and transport
source data - Consider internal and external data
- Perform gap analysis between source data and
target database objects - Plan move of data between sources and target
- Define first-time load and refresh strategy
- Define tool requirements
- Build, test, and execute data acquisition modules
30Modeling
- Warehouses differ from operational structures
- Analytical requirements
- Subject orientation
- Data must map to subject oriented information
- Identify business subjects
- Define relationships between subjects
- Name the attributes of each subject
- Modeling is iterative
- Modeling tools are available
31Modeling the Data Warehouse
1
- Defining the business model
- Creating the dimensional model
- Modeling summaries
- 4. Creating the physical model
2, 3
4
32Identifying Business Rules
Product Type Monitor Status PC 15
inch New Server 17 inch Rebuilt 19
inch Custom None
Location Geographic proximity 0 - 1 miles 1 - 5
miles gt 5 miles
Store Store gt District gt Region
Time Month gt Quarter gt Year
33Creating the Dimensional Model
- Identify fact tables
- Translate business measures into fact tables
- Analyze source system information for additional
measures - Identify base and derived measures
- Document additivity of measures
- Identify dimension tables
- Link fact tables to the dimension tables
- Create views for users
34Dimension Tables
- Dimension tables have the following
characteristics - Contain textual information that represents the
attributes of the business - Contain relatively static data
- Are joined to a fact table through a foreign key
reference
35Fact Tables
- Fact tables have the following characteristics
- Contain numeric measures (metrics) of the
business - May contain summarized (aggregated) data
- May contain date-stamped data
- Are typically additive
- Have key value that is typically a concatenated
key composed of the primary keys of the
dimensions - Joined to dimension tables through foreign keys
that reference primary keys in the dimension
tables
36Dimensional Model (Star Schema)
Fact table
Dimension tables
37Star Schema Model
Product Table Product_id Product_desc
Store Table Store_id District_id ...
Sales Fact Table Product_id Store_id Item_id Day_i
d Sales_dollars Sales_units ...
- Central fact table
- Radiating dimensions
- Denormalized model
Time Table Day_id Month_id Period_id Year_id
Item Table Item_id Item_desc ...
38Star Schema Model
- Easy for users to understand
- Fast response to queries
- Simple metadata
- Supported by many front end tools
- Less robust to change
- Slower to build
- Does not support history
39Snowflake Schema Model
Product Table Product_id Product_desc
Store Table Store_id Store_desc District_id
District Table District_id District_desc
Sales Fact Table Item_id Store_id Sales_dollars Sa
les_units
Time Table Week_id Period_id Year_id
Dept Table Dept_id Dept_desc Mgr_id
Mgr Table Dept_id Mgr_id Mgr_name
Item Table Item_id Item_desc Dept_id
40Snowflake Schema Model
- Direct use by some tools
- More flexible to change
- Provides for speedier data loading
- May become large and unmanageable
- Degrades query performance
- More complex metadata
41Using Summary Data
Phase 3 Modeling summaries
- Provides fast access to precomputed data
- Reduces use of I/O, CPU, and memory
- Is distilled from source systems and
precalculated summaries - Usually exists in summary fact tables
42Designing Summary Tables
Units Sales() Store
Product A Total Product B Total Product C Total
43Summary Tables Example
SALES FACTS Sales Region Month 10,000 North Jan
99 12,000 South Feb 99 11,000 North Jan
99 15,000 West Mar 99 18,000 South Feb
99 20,000 North Jan 99 10,000 East Jan
99 2,000 West Mar 99
SALES BY MONTH/REGION Month Region Tot_Sales Jan
99 North 41,000 Jan 99 East 10,000 Feb
99 South 40,000 Mar 99 West 17,000
SALES BY MONTH Month Tot_Sales Jan 99 51,000 Feb
99 40,000 Mar 99 17,000
44Summary Management in Oracle8i
Salessummary
Sales
Region
State
City
Time
Product
45The Time Dimension
- Time is critical to the data warehouse.
- A consistent representation of time is required
for extensibility.
Timedimension
Sales fact
How and where should it be stored?