Title: The Data Warehouse Toolkit
1The Data Warehouse Toolkit
- Chapter 4 Procurement
- ????? ???(Evan Lin)
- MIS Dept., Yuan Ze Univ., Taiwan
- http//evan.scuec.net
- s917725_at_mail.yzu.edu.tw
2Procurement
- Value Chain reinforcement
- Blended versus separate transaction
- Slowly changing dimension techniques both basic
and advance
3What is Procurement
- Procurement is a Critical business activity.
- Procurement involves a wide range of activities
from - negotiating contract to issuing purchase
requisition - Purchase Order(POs) to tracking receipts
- Authorizing payment
4- Observe the Procurement transactions
- Purchase requisition(???)
- Purchase orders
- Shipping notification
- Receipts Just Like Chapter 3
- So we builder the Fact Table like CH3 have
- Date
- Vendor
- Procurement type
- Product
- Contract Term dimension
5Multi- versus Single-Transaction Fact Tables
- Purchase order, shipping notices, warehousing
receipts ,and vendor payment are all viewed as
separate and unique process. - There is no single procurement to source all
procurement transactions.(???????)
6- Control Number for DD(Degenerate Dimension)
- Should we build a blended transaction fact table
with Transaction Type Dimension - As Dimension Modeler we must decision single or
multi dimension fact table
7Follow consideration help to sort thing out about
single or multi Dimension Fact Table(1)
- What are users analytic requirement?
- Reduce complexity to present data in most
effective form - Users analytic need multi-transaction together
or single blended fact table? - Are they really multiple unique process?
- Seems buying product(purchase order) is
distinctly difference from receiving
product(receipts)?Use difference control number
for each step is a clue that we dealing with
separate processes. So we use ?Separate Fact
Table?..
8Follow consideration help to sort thing out about
single or multi Dimension Fact Table(2)
- Are multiple source system involved?
- Our example we dealing with separate source
system purchasing, warehousing ,account payable. - It likely daunting that single-dimension fact
table from three source system - What is the dimensionality of the facts?
- We discover several dimension that applied to
some transaction types but not to others. - It would lead us to separate fact table.
9When we separate to four fact tables
- Because user view these activities as separate
and distinct business process,and the data comes
from difference source system. - Multiple fact tables allow us to provide richer,
more descriptive dimensions and attributes. - As we progress from purchase requisition all the
way to vendor payments, we inherit date dimension
and degenerate dimensions from previous steps. - The single fact table approach would have
required generalization of labeling for some
dimensions. - Purchase order date and receipt date likely would
have been generated to transaction date.
10- We understand that multiple fact tables may
require more time to manage and administer
because there are more tables to load ,index, and
aggregate. - Someone would say increase complexity of data
staging processes, but in fact it may simplify
the staging activities. - Operational data exist in separate source system,
we would need multiple staging processes in
either fact table scenario. - Loading the data into separate fact tables likely
will be less complex than attempting to integrate
data from the multiple sources into single fact
table.
11Complementary Procurement Snapshot
- Separate from the decision regarding procurement
transaction fact tables, we may find that we also
of snapshot fact table to fully address the needs
of the business. - In Ch3 an accumulating snapshot that crosses
processes would be extremely useful if business
is interested in monitoring product movement as
it proceeds through the procurement pipeline - We spend more time in Ch5.
12Slowly Changing Dimensions(1)
- The dimension have been attributes are relatively
static, they are not fixed forever. - Designer must engage business representatives
proactively to help determine the appropriate
change-handing strategy. - Although the representatives did not mention.
- We assuming that change tracking is unnecessary.
- When we need track change its unacceptable
- Put everything into the fact table
- Make every dimension time-dependent to deal with
these changes
13Slowly Changing Dimensions(2)
- We can preserve the independent dimensional
structure with only relatively minor adjustments
to contend with the changes - We call it Slowly Changing Dimension,term them
SCDs. - For each attribute in our dimension tables, we
must specify a strategy to handle change - When an attribute value changes in the
operational world, how will we respond to the
change our dimension model?
14Type 1Overwrite the Value
- Type 1 is overwrite the old attribute value in
the dimension row. - Type 1 response is the simplest approach to
dealing with dimension attribute changes. - The advances is fast and easy.
- Lets see the example.
15Type 1 Example
16Type 1 Problem
- We lose all history attribute changes.
- Were left solely with attribute as they exist
today.. - Type 1 may be appropriate if there is no value in
keeping the old description. - Too often project teams use type 1 as default
response for dealing slowly changing dimension
missing the mark if the business needs to track
historical changes accurately..
17 - Type 1 response is easy to implement , but it
does not maintain any history of prior attribute
values. - When we use type 1 on IntelliKidz ,any preexist
aggregations based on the department value will
need to rebuild.. - How to solve it? Use Type 2..
18Type 2 Add a dimension row
- Type 2 is the predominant technology for
supporting prior history correctly. - Each of the surrogate keys identifies a unique
product attribute profile that was true for a
span of time. - Type 2 responses perfectly partition or segment
history to account for the change - The type 2 response is the primary technique for
accurately tracking slowly changing dimension
attribute. It is extremely powerful because the
new dimension row automatically partition history
in the fact table.
19Type 2 responseExample
- Look form several place..
- Product description ? one IntelliKidz
- Count number of product ? Use SKU
- Correct data ? newest Product Key
20Effective Date and Expiration Date
- Effective date is the first date the descriptive
profile is valid. Expiration date would be one
day less than the row effective date for next
assignment, or date the product was retired from
catalog. - They are necessary in the staging area because
wed need to know which surrogate key is valid
when were loading historical fact records. - If you need use this extra stamp, just remember
that there is no need to constrain on the
effective date in the dimension table in order to
get the right answer.
21Effective Date and Expiration Date ..
- If we had an effective date ,we would not dare
constrain on this date to partition sales because
the date has no relevance.
22Type 2 responseAdvance
- Type 2 response is the workhouse technology to
support analysis using historical accurate
attribute. - Perfectly Segment fact table because prechange
fact row use prechange surrogate key. - We can gracefully track as many dimension changes
as required(not to change exist aggregation)
23Type 2 responseabout data staging
- When we staging dimension tables, we are handed a
complete copy of latest, greatest source data. - It would be more wonderful to find only change
since last extract. Staging application has to
find change dimension. - Filed-by-filed comparison of each dimension row
identify the change would be extremely laborious. - CRC(Cyclic redundancy checksum) algorithm helps
us quickly recognize that a wide messy row change
without looking each of it.
24Type 2 ResponseProblem
- It may be an inappropriate technique for
dimension tables that already exceed a million
rows. - Large row would be discussed on chapter 6..
25Type 3 ResponseAdd a dimension column
- For a few traditional months, there may be a
desire to track history in terms of the new
district names and conversely to track new data
in terms of old district names. - Type 2 not support, but type 3 can.
- Not to issue new dimension row but a new column
to capture the attribute change. - Type 3 is appropriate when theres a strong need
to support two views of the world simultaneously.
- Some designers call this is an alternate reality
26Type 3 ResponseExample
- The prior and current descriptions can be
regarded as true as the same time - Type 3 is used infrequently.
- The type 3 slowly changing dimension technique
allows us to see new and historical fact data by
either the new or prior attribute values
27Type 3 ResponseProblem
- Type 3 is inappropriate if you want to track
impact of numerous intermediate attribute values. - If there is need to track a myriad(??) of
unpredictable changes,type 2 response should be
use instead in most cases.
28Hybrid SCD Technique
- Predictable changes with multi version overlays
- Use frequently to deal with sale organization.
- Its like type 3 but something difference because
we add more than two columns
29Unpredictable change with single-version overlay
30Unpredictable change with single-version overlay..
31More Rapidly changing dimensions
- Maybe the change is fast ..
- We discuss in chapter 6..
32Summary
- Discuss several approaches handling procurement
data - Several techniques to deal with changes to our
dimension table attributes..