The Data Warehouse Toolkit - PowerPoint PPT Presentation

1 / 32
About This Presentation
Title:

The Data Warehouse Toolkit

Description:

... dimension have been attributes are relatively static, they are not fixed forever. ... If you need use this extra stamp, just remember that there is no need to ... – PowerPoint PPT presentation

Number of Views:153
Avg rating:3.0/5.0
Slides: 33
Provided by: Eva961
Category:

less

Transcript and Presenter's Notes

Title: The Data Warehouse Toolkit


1
The Data Warehouse Toolkit
  • Chapter 4 Procurement
  • ????? ???(Evan Lin)
  • MIS Dept., Yuan Ze Univ., Taiwan
  • http//evan.scuec.net
  • s917725_at_mail.yzu.edu.tw

2
Procurement
  • Value Chain reinforcement
  • Blended versus separate transaction
  • Slowly changing dimension techniques both basic
    and advance

3
What is Procurement
  • Procurement is a Critical business activity.
  • Procurement involves a wide range of activities
    from
  • negotiating contract to issuing purchase
    requisition
  • Purchase Order(POs) to tracking receipts
  • Authorizing payment

4
  • Observe the Procurement transactions
  • Purchase requisition(???)
  • Purchase orders
  • Shipping notification
  • Receipts Just Like Chapter 3
  • So we builder the Fact Table like CH3 have
  • Date
  • Vendor
  • Procurement type
  • Product
  • Contract Term dimension

5
Multi- versus Single-Transaction Fact Tables
  • Purchase order, shipping notices, warehousing
    receipts ,and vendor payment are all viewed as
    separate and unique process.
  • There is no single procurement to source all
    procurement transactions.(???????)

6
  • Control Number for DD(Degenerate Dimension)
  • Should we build a blended transaction fact table
    with Transaction Type Dimension
  • As Dimension Modeler we must decision single or
    multi dimension fact table

7
Follow consideration help to sort thing out about
single or multi Dimension Fact Table(1)
  • What are users analytic requirement?
  • Reduce complexity to present data in most
    effective form
  • Users analytic need multi-transaction together
    or single blended fact table?
  • Are they really multiple unique process?
  • Seems buying product(purchase order) is
    distinctly difference from receiving
    product(receipts)?Use difference control number
    for each step is a clue that we dealing with
    separate processes. So we use ?Separate Fact
    Table?..

8
Follow consideration help to sort thing out about
single or multi Dimension Fact Table(2)
  • Are multiple source system involved?
  • Our example we dealing with separate source
    system purchasing, warehousing ,account payable.
  • It likely daunting that single-dimension fact
    table from three source system
  • What is the dimensionality of the facts?
  • We discover several dimension that applied to
    some transaction types but not to others.
  • It would lead us to separate fact table.

9
When we separate to four fact tables
  • Because user view these activities as separate
    and distinct business process,and the data comes
    from difference source system.
  • Multiple fact tables allow us to provide richer,
    more descriptive dimensions and attributes.
  • As we progress from purchase requisition all the
    way to vendor payments, we inherit date dimension
    and degenerate dimensions from previous steps.
  • The single fact table approach would have
    required generalization of labeling for some
    dimensions.
  • Purchase order date and receipt date likely would
    have been generated to transaction date.

10
  • We understand that multiple fact tables may
    require more time to manage and administer
    because there are more tables to load ,index, and
    aggregate.
  • Someone would say increase complexity of data
    staging processes, but in fact it may simplify
    the staging activities.
  • Operational data exist in separate source system,
    we would need multiple staging processes in
    either fact table scenario.
  • Loading the data into separate fact tables likely
    will be less complex than attempting to integrate
    data from the multiple sources into single fact
    table.

11
Complementary Procurement Snapshot
  • Separate from the decision regarding procurement
    transaction fact tables, we may find that we also
    of snapshot fact table to fully address the needs
    of the business.
  • In Ch3 an accumulating snapshot that crosses
    processes would be extremely useful if business
    is interested in monitoring product movement as
    it proceeds through the procurement pipeline
  • We spend more time in Ch5.

12
Slowly Changing Dimensions(1)
  • The dimension have been attributes are relatively
    static, they are not fixed forever.
  • Designer must engage business representatives
    proactively to help determine the appropriate
    change-handing strategy.
  • Although the representatives did not mention.
  • We assuming that change tracking is unnecessary.
  • When we need track change its unacceptable
  • Put everything into the fact table
  • Make every dimension time-dependent to deal with
    these changes

13
Slowly Changing Dimensions(2)
  • We can preserve the independent dimensional
    structure with only relatively minor adjustments
    to contend with the changes
  • We call it Slowly Changing Dimension,term them
    SCDs.
  • For each attribute in our dimension tables, we
    must specify a strategy to handle change
  • When an attribute value changes in the
    operational world, how will we respond to the
    change our dimension model?

14
Type 1Overwrite the Value
  • Type 1 is overwrite the old attribute value in
    the dimension row.
  • Type 1 response is the simplest approach to
    dealing with dimension attribute changes.
  • The advances is fast and easy.
  • Lets see the example.

15
Type 1 Example
16
Type 1 Problem
  • We lose all history attribute changes.
  • Were left solely with attribute as they exist
    today..
  • Type 1 may be appropriate if there is no value in
    keeping the old description.
  • Too often project teams use type 1 as default
    response for dealing slowly changing dimension
    missing the mark if the business needs to track
    historical changes accurately..

17
  • Type 1 response is easy to implement , but it
    does not maintain any history of prior attribute
    values.
  • When we use type 1 on IntelliKidz ,any preexist
    aggregations based on the department value will
    need to rebuild..
  • How to solve it? Use Type 2..

18
Type 2 Add a dimension row
  • Type 2 is the predominant technology for
    supporting prior history correctly.
  • Each of the surrogate keys identifies a unique
    product attribute profile that was true for a
    span of time.
  • Type 2 responses perfectly partition or segment
    history to account for the change
  • The type 2 response is the primary technique for
    accurately tracking slowly changing dimension
    attribute. It is extremely powerful because the
    new dimension row automatically partition history
    in the fact table.

19
Type 2 responseExample
  • Look form several place..
  • Product description ? one IntelliKidz
  • Count number of product ? Use SKU
  • Correct data ? newest Product Key

20
Effective Date and Expiration Date
  • Effective date is the first date the descriptive
    profile is valid. Expiration date would be one
    day less than the row effective date for next
    assignment, or date the product was retired from
    catalog.
  • They are necessary in the staging area because
    wed need to know which surrogate key is valid
    when were loading historical fact records.
  • If you need use this extra stamp, just remember
    that there is no need to constrain on the
    effective date in the dimension table in order to
    get the right answer.

21
Effective Date and Expiration Date ..
  • If we had an effective date ,we would not dare
    constrain on this date to partition sales because
    the date has no relevance.

22
Type 2 responseAdvance
  • Type 2 response is the workhouse technology to
    support analysis using historical accurate
    attribute.
  • Perfectly Segment fact table because prechange
    fact row use prechange surrogate key.
  • We can gracefully track as many dimension changes
    as required(not to change exist aggregation)

23
Type 2 responseabout data staging
  • When we staging dimension tables, we are handed a
    complete copy of latest, greatest source data.
  • It would be more wonderful to find only change
    since last extract. Staging application has to
    find change dimension.
  • Filed-by-filed comparison of each dimension row
    identify the change would be extremely laborious.
  • CRC(Cyclic redundancy checksum) algorithm helps
    us quickly recognize that a wide messy row change
    without looking each of it.

24
Type 2 ResponseProblem
  • It may be an inappropriate technique for
    dimension tables that already exceed a million
    rows.
  • Large row would be discussed on chapter 6..

25
Type 3 ResponseAdd a dimension column
  • For a few traditional months, there may be a
    desire to track history in terms of the new
    district names and conversely to track new data
    in terms of old district names.
  • Type 2 not support, but type 3 can.
  • Not to issue new dimension row but a new column
    to capture the attribute change.
  • Type 3 is appropriate when theres a strong need
    to support two views of the world simultaneously.
  • Some designers call this is an alternate reality

26
Type 3 ResponseExample
  • The prior and current descriptions can be
    regarded as true as the same time
  • Type 3 is used infrequently.
  • The type 3 slowly changing dimension technique
    allows us to see new and historical fact data by
    either the new or prior attribute values

27
Type 3 ResponseProblem
  • Type 3 is inappropriate if you want to track
    impact of numerous intermediate attribute values.
  • If there is need to track a myriad(??) of
    unpredictable changes,type 2 response should be
    use instead in most cases.

28
Hybrid SCD Technique
  • Predictable changes with multi version overlays
  • Use frequently to deal with sale organization.
  • Its like type 3 but something difference because
    we add more than two columns

29
Unpredictable change with single-version overlay
30
Unpredictable change with single-version overlay..
31
More Rapidly changing dimensions
  • Maybe the change is fast ..
  • We discuss in chapter 6..

32
Summary
  • Discuss several approaches handling procurement
    data
  • Several techniques to deal with changes to our
    dimension table attributes..
Write a Comment
User Comments (0)
About PowerShow.com