Populating Data Warehouse Structures - PowerPoint PPT Presentation

1 / 25
About This Presentation
Title:

Populating Data Warehouse Structures

Description:

Bag. Grocery. Dry Goods. Snacks. 12 oz. Type 2: Writing ... Bag. Grocery. Dry Goods. Snacks. 10-15-1998. 731. Type 3: Adding Attributes in the Dimension Record ... – PowerPoint PPT presentation

Number of Views:58
Avg rating:3.0/5.0
Slides: 26
Provided by: v1gr
Category:

less

Transcript and Presenter's Notes

Title: Populating Data Warehouse Structures


1
Populating Data Warehouse Structures
2
Examining the Star Schema
Sales Star Schema
Fact Table
Dimension Tables
Dimension Table
3
Implementing the Star Schema
  • 1. Extract Data From Multiple Sources
  • 2. Integrate, Transform, and Restructure Data
  • 3. Load Data Into Dimension Tables and Fact Tables

4
The Star Schema Data Load
Polaris Data Warehouse
Heterogeneous Data Sources
Northwind OLTP
Staging Area
Sales Star
DTS
DTS
Financial
Inventory Star
Extracting Data From Transforming
Loading the Heterogeneous Sources Data
Star Schema
DTS
DTS
5
Verifying the Dimension Source Data
  • Verifying Accuracy of Source Data
  • Integrating data from multiple sources
  • Applying business rules
  • Checking structural requirements
  • Managing Invalid Data
  • Rejecting invalid data
  • Saving invalid data to a log
  • Correcting Invalid Data
  • Transforming data
  • Reassigning data values

6
Dimension Data Load Examples

DTS
DTS
DTS
7
Maintaining Integrity of the Dimension
  • Assigning a Surrogate Key to Each Record
  • Defines the dimensions primary key
  • Relates to the foreign key fields of the fact
    table
  • Loading One Record Per Application Key
  • Maintains uniqueness in the dimension
  • Depends on how you manage changing dimension data
  • Maintains integrity of the fact table

8
Managing Changing Dimension Data
  • Dimensions with Changing Column Values
  • Inserts of new data
  • Updates of existing data
  • Slowly-Changing Dimension Design Solutions
  • Type 1 Overwrite the dimension record
  • Type 2 Write another dimension record
  • Type 3 Add attributes to the dimension record

9
Type 1 Overwriting the Dimension Slide
10
Type 2 Writing Another Dimension Record
After
Rice Puffs 12 Oz Bag Grocery Dry
Goods Snacks 10-15-1998 ...
731
12 oz.
10 oz.
Adds a new record
11
Type 3 Adding Attributes in the Dimension Record
12
Verifying the Fact Table Source Data
  • Verifying Accuracy of Source Data
  • Integrating data from multiple sources
  • Applying business rules
  • Checking structural requirements
  • Managing Invalid Data
  • Rejecting invalid data
  • Saving invalid data to a log
  • Correcting Invalid Data
  • Transforming data
  • Reassigning data values

13
Assigning Foreign Keys
DimensionTables
Source Data
customer_dim
customer id
product id
order date
quantity_sales
amount_sales
201 ALFI Alfreds


ALFI
123
1/1/2000
400
10,789
123
1/1/2000
400
10,789
product_dim
25 123 Chai
Sales Fact Data


cust_key
prod_key
time_key
quantity_sales
amount_sales
123
1/1/2000
400
25
134
400
10,789
201
14
Defining Measures
  • Loading Measures from the Source System
  • Calculating Additional Measures

Source System Data
Fact Table Data
15
Maintaining Data Integrity
  • Adhering to the Fact Table Grain
  • A fact table can only have one grain
  • You must load a fact table with data at the same
    level of detail as defined by the grain
  • Enforcing Column Constraints
  • NOT NULL constraints
  • FOREIGN KEY constraints

16
Implementing Staging Tables
  • Centralize and Integrate Source Data
  • Break Up Complex Data Transformations
  • Facilitate Error Recovery

Staging Area
sales_stage
market_stage
inventory_stage
shipments_stage
17
DTS Functionality
  • Accessing Heterogeneous Data Sources
  • Importing, Exporting, and Transforming Data
  • Creating Reusable Transformations and Functions
  • Automating Data Loads
  • Managing Metadata
  • Customizing and Extending Functionality

18
Defining DTS Packages
  • Identifies Data Sources and Destinations
  • Defines Tasks or Actions
  • Implements Transformation Logic
  • Defines Order of Operations

19
Identifying Package Components
  • Connections Access Data Sources and Destinations
  • Tasks Describe Data Transformations or Functions
  • Steps Define the Order of Task Operations or
    Workflow
  • Global Variables Store Data that Can Be Shared
    Across Tasks

20
Creating Packages
  • Using the DTS Import / Export Wizard
  • Perform ad-hoc table and data transfers
  • Develop a prototype package
  • Using DTS Package Designer
  • Edit packages created with the DTS Import/Export
    Wizard
  • Create packages with a wide range of
    functionality
  • Programming DTS Applications
  • Directly access the functionality of the DTS
    Object Model
  • Requires Microsoft Visual Basic or Microsoft
    Visual C

21
Using DTS to Populate the Sales Star
  • Populating the Sales Star Dimensions
  • Populating the Sales Star Fact Table

22
Populating the Sales Star Dimensions
product_dim
Product Tab Delimited Files
DTS
customer_dim
Northwind OLTP
DTS
time_dim
SQL Server Stored Procedure
DTS
23
Populating the Sales Star Fact Table
Sales Data File
sales_stage
DTS
sales_fact
DTS
24
Designing Modular Packages
  • Creating Modular Packages
  • Simplify complex workflows
  • Create more readable packages
  • Produce smaller packages that are easier to debug
  • Using Outer Packages
  • Execute multiple packages within a single package
  • Combine modular packages into logical workflows
  • Reuse modular packages in different workflows
  • Execute packages in parallel

25
Using DTS to Populate the Sales Star
Write a Comment
User Comments (0)
About PowerShow.com