Title: ISQS 6339, Business Intelligence Creating Data Marts
1ISQS 6339, Business IntelligenceCreating Data
Marts
- Zhangxi Lin
- Texas Tech University
2Why need Data Mart?
- Data mart complements the centralized data
warehousing based on UDM model, for the
situations where UDM cannot be used - Legacy databases
- Data are from nondatabase sources
- No physical connection the centralized data
warehouse - Data are not clean
3Data Mart Structures
- Fact tables
- Measures
- Dimension tables
- Dimensions and Hierarchies
- Attributes (or columns)
- Dimensional modeling Stars and Snowflakes
4The Three Fact Table Types
- Transaction fact table
- Periodic snapshot fact table
- Cumulative performance over specific time
intervals valuable to combine data across
several business processes in the value chain. - Accumulating snapshot fact table
- Constantly updated over time.
4
5Measures
- A numeric quantity expressing some of the
organizations performance. The information
represented by this quantity is used to support
or evaluate the decision making and performance
of the organization. - A measure is also called a fact
- The table holding measure information is called
as a fact table
6Attributes
- An additional piece of information pertaining to
a dimension member that is not the unique
identifier or the description of the member. - Attributes can be used to more fully describe
dimension members
7Slowly Changing Dimensions
- The attribute values in a dimension may change
over time, which are critical to understand the
dynamics of the business. The ability to track
the changes of facts over time is critical to a
DW/BI system. - Employees changed their departments
- Home moving (16.8 American moved per year) zip
code changes possible. - The dimensions that have changeable attribute
values slowly changing dimensions (SCDs) - Type 1 SCD overwrites the existing attribute
value with a new value. You dont care about
keeping track of historical values - Type 2 SCD change tracking ETL process creates
a new row in the dimension table to capture the
new values of the changed item - Type 3 SCD Similar to Type 2 SCD but only track
current state and the original state two
additional attribute SCD Start Date, SCD Initial
Value - Ask business users how they will use the
information to determine which SCDs are of type 1
and which are of type 2.
7
8Surrogate Key
- A surrogate key has a unique value assigned to
each row in the dimension. It becomes primary key
of the dimension table and is used to join the
dimension to the associated foreign key field in
the fact table - Benefits of surrogate keys
- Protect the DW/BI system from changes in the
source system - Allow the DW/BI system to integrate data from
multiple source system - Enable developers to add rows to dimensions that
do not exist in the source system - Provide the means for tacking changes in
dimension - Are efficient in the relational database and
analysis services - Surrogate keys are created when doing data
warehousing. They are new from the keys in
original database - They are also called meaningless keys, substitute
keys, non-natural keys, artificial keys - Specifically, surrogate keys are used in slowly
changed dimensions (SCD) management
8
9Aggregate Dimensions
- Situation data at different levels of
granularities - Two resolutions
- Removing a dimension
- Rolling up a dimensions hierarchy and provide a
new, shrunken dimension at the aggregate level
9
10Many-to-many or Multivalued Dimensions
- Relationship between a dimension table and fact
table is called one-to-many one row in the
dimension table may join to many rows in the fact
table. - Many-to-many or Multivalued Dimensions are
referred to as there are more than one row in a
dimension table joining to multiple rows in a
fact table - Bridge table supports many-to-many relationship
- fact-dimension
- dimension-dimension.
-
10
11Hierarchies
- Meaningful, standard ways to group the data
within a dimension - Variable-depth hierarchies
- Frequently changing hierarchies
- Examples of hierarchy in a dimension
- Address street, city, state, country
- Organization section, division, branch, region
- Time year, quarter, month, date
11
12Heterogeneous Products
- Several products with differentiated attributes
- Problem sharing one dimension or use different
dimension? - Resolutions
- One family-oriented dimension with core fact and
product tables plus specific information for each
line of product
12
13Junk Dimensions
- Also called miscellaneous or mystery dimensions
- They are miscellaneous attributes that dont
belong to any existing dimension. - Typically flags or indictors that describe or
categorize the transaction in some way. - Contents are often important
- Four alternatives for dealing with them
- Leave them in the fact table
- Create a separate dimension for each attribute
- Omit them
- Group them into a single junk dimension
13
14Degenerate Dimensions
- No description of its own
- No joining to an actual dimension table
- No attributes
- Example transaction ID
14
15Exercise 1 Walk through data warehousing process
- Learning Objectives
- To gain a general impression how to use SQL
Server 2008 to implement a data mart - Tasks
- Create your database with SSMS, named as
ISQS6339_lastname - Import data from Commrex_2011.xls
- Use SSMS to create a ERD diagram
- Create a SSAS project using BIDS
- Define data source, data source view, and cube
- Deliverable
- One-page printout of the screenshot of the cube
diagram
15
16Maximum Miniatures Manufacturing Designing Data
Mart
- General business needs
- To analyze the statistics available from the
manufacturing automation systems. The VP would
like an interactive analysis tool, rather than
printed reports, for the analysis. - The manufacturing automation system controls all
the machines to create figurines - Filling a mold with the raw material
- Aiding the hardening of this materials
- Removal from the mod when hardening is complete
- Computerized painting of the figurines
- Curing the paint if necessary
17Maximum Miniatures Manufacturing Creating Data
Mart
- Specific Business Needs
- Analyzing the following numbers
- Dollar value of products sold
- Number of products sold
- Sale tax charged on products sold
- Shipping charged on products sold
- These numbers should be viewable by
- Store
- Sales Promotion
- Product
- Day, Month, Quarter, and Year
- Customer
- Sales Person
18Data Requirements
- Number of accepted products by batch by product
by machines by day - Number of rejected products by batch by product
by machines by day - Elapsed time for molding and hardening by product
by machine by day - Elapsed time for painting and curing by curing
type by product by machine by day - Product rolls up into product subtype, which
rolls up into product type - Machine rolls up into machine type, which rolls
up into country - Day rolls up into month, which rolls up into
quarter, which rolls up into year - The information should be able to be filtered by
machine manufacturer and purchase date of the
machine
19Business Need of Sales
- The VP of sales for Max Min, Inc. would like to
analyze sales information. This information is
collected by three OLTP systems the Order
Processing System, the Point of Sale (POS)
system, and the MaxMin.com Online system. - To analyze the following numbers
- Dollar value of products sold
- Number of products sold
- Sales tax charged on product sold
- Shipping charged on product sold
- These number should be viewable by store, sales
promotion, product, time, customer, sales person
20Snowflake Schema of the Data Mart
Manufacturingfact
DimBatch
DimMachine
DimProduct
DimMachineType
DimPlant
DimProductSubType
DimMaterial
DimCountry
DimProductType
21Exercise 2 Creating a data mart with SSMS
- Learning Objectives
- How to design a dimensional model
- How to create a data mart with SSMS
- How to create a cube for a data mart.
- Tasks
- Manually create the fact table and DimProduct
table using SSMS (see the detailed information
from file DW_MMM.PDF in the shared directory
under \Docs) - Import remaining tables from oredb.lin.mmm.empty
- Define the primary keys of tables and the
relationships among them - Create a cube
- The primary key of the fact table is composed of
three foreign keys plus one time dimension key
ProductCode, BatchNumber, MachineNumber, and
DateOfManufacture. - Deliverable
- The printout of the screenshot of the cube
structure and the success of the deployment
22Hints for Deploying the OLAP Cube
- Due to the security restrictions, you need to
- Double click the entry in Data Source
- Type in your eRaider login information in the
Impersonation Information panel - Change the server to OREDB
22
23The screenshot of impersonation information
24The properties of the project
After this step you can proceed to deploy the cube
25Exercise 3 Creating a data mart with SSAS
- Learning Objectives
- Learn an alternative way to create a data mart
- How to deploy a data mart
- Tasks
- Create data mart MaxMinSalesDM with BIDS from a
cube template - Deploy the data mart
- Deliverable
- The printout of the screenshot of the cube
structure and the success of the deployment
26(No Transcript)
27(No Transcript)
28Cube structure
29Data Mart Tables