Title: Populating Data Warehouse Structures
1Populating Data Warehouse Structures
2Examining the Star Schema
Sales Star Schema
Fact Table
Dimension Tables
Dimension Table
3Implementing the Star Schema
- 1. Extract Data From Multiple Sources
- 2. Integrate, Transform, and Restructure Data
- 3. Load Data Into Dimension Tables and Fact Tables
4The Star Schema Data Load
Polaris Data Warehouse
Heterogeneous Data Sources
Northwind OLTP
Staging Area
Sales Star
DTS
DTS
Financial
Inventory Star
Extracting Data From Transforming
Loading the Heterogeneous Sources Data
Star Schema
DTS
DTS
5Verifying the Dimension Source Data
- Verifying Accuracy of Source Data
- Integrating data from multiple sources
- Applying business rules
- Checking structural requirements
- Managing Invalid Data
- Rejecting invalid data
- Saving invalid data to a log
- Correcting Invalid Data
- Transforming data
- Reassigning data values
6Dimension Data Load Examples
DTS
DTS
DTS
7Maintaining Integrity of the Dimension
- Assigning a Surrogate Key to Each Record
- Defines the dimensions primary key
- Relates to the foreign key fields of the fact
table - Loading One Record Per Application Key
- Maintains uniqueness in the dimension
- Depends on how you manage changing dimension data
- Maintains integrity of the fact table
8Managing Changing Dimension Data
- Dimensions with Changing Column Values
- Inserts of new data
- Updates of existing data
- Slowly-Changing Dimension Design Solutions
- Type 1 Overwrite the dimension record
- Type 2 Write another dimension record
- Type 3 Add attributes to the dimension record
9Type 1 Overwriting the Dimension Slide
10Type 2 Writing Another Dimension Record
After
Rice Puffs 12 Oz Bag Grocery Dry
Goods Snacks 10-15-1998 ...
731
12 oz.
10 oz.
Adds a new record
11Type 3 Adding Attributes in the Dimension Record
12Verifying the Fact Table Source Data
- Verifying Accuracy of Source Data
- Integrating data from multiple sources
- Applying business rules
- Checking structural requirements
- Managing Invalid Data
- Rejecting invalid data
- Saving invalid data to a log
- Correcting Invalid Data
- Transforming data
- Reassigning data values
13Assigning Foreign Keys
DimensionTables
Source Data
customer_dim
customer id
product id
order date
quantity_sales
amount_sales
201 ALFI Alfreds
ALFI
123
1/1/2000
400
10,789
123
1/1/2000
400
10,789
product_dim
25 123 Chai
Sales Fact Data
cust_key
prod_key
time_key
quantity_sales
amount_sales
123
1/1/2000
400
25
134
400
10,789
201
14Defining Measures
- Loading Measures from the Source System
- Calculating Additional Measures
Source System Data
Fact Table Data
15Maintaining Data Integrity
- Adhering to the Fact Table Grain
- A fact table can only have one grain
- You must load a fact table with data at the same
level of detail as defined by the grain - Enforcing Column Constraints
- NOT NULL constraints
- FOREIGN KEY constraints
16Implementing Staging Tables
- Centralize and Integrate Source Data
- Break Up Complex Data Transformations
- Facilitate Error Recovery
Staging Area
sales_stage
market_stage
inventory_stage
shipments_stage
17DTS Functionality
- Accessing Heterogeneous Data Sources
- Importing, Exporting, and Transforming Data
- Creating Reusable Transformations and Functions
- Automating Data Loads
- Managing Metadata
- Customizing and Extending Functionality
18Defining DTS Packages
- Identifies Data Sources and Destinations
- Defines Tasks or Actions
- Implements Transformation Logic
- Defines Order of Operations
19Identifying Package Components
- Connections Access Data Sources and Destinations
- Tasks Describe Data Transformations or Functions
- Steps Define the Order of Task Operations or
Workflow - Global Variables Store Data that Can Be Shared
Across Tasks
20Creating Packages
- Using the DTS Import / Export Wizard
- Perform ad-hoc table and data transfers
- Develop a prototype package
- Using DTS Package Designer
- Edit packages created with the DTS Import/Export
Wizard - Create packages with a wide range of
functionality - Programming DTS Applications
- Directly access the functionality of the DTS
Object Model - Requires Microsoft Visual Basic or Microsoft
Visual C
21Using DTS to Populate the Sales Star
- Populating the Sales Star Dimensions
- Populating the Sales Star Fact Table
22Populating the Sales Star Dimensions
product_dim
Product Tab Delimited Files
DTS
customer_dim
Northwind OLTP
DTS
time_dim
SQL Server Stored Procedure
DTS
23Populating the Sales Star Fact Table
Sales Data File
sales_stage
DTS
sales_fact
DTS
24Designing Modular Packages
- Creating Modular Packages
- Simplify complex workflows
- Create more readable packages
- Produce smaller packages that are easier to debug
- Using Outer Packages
- Execute multiple packages within a single package
- Combine modular packages into logical workflows
- Reuse modular packages in different workflows
- Execute packages in parallel
25Using DTS to Populate the Sales Star