Data Warehousing Principles - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

Data Warehousing Principles

Description:

Design, architecture and performance compared with transactional data, ... Apart from deciding on the big question, ie. whether or not ... Summarised Data Marts ... – PowerPoint PPT presentation

Number of Views:162
Avg rating:3.0/5.0
Slides: 37
Provided by: mgib1
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing Principles


1
Data Warehousing Principles
  • AAIR Data Warehouse Special Interest Group
  • Monday, 17th July 2006
  • 400-500pm
  • Michael Gibson
  • Data Warehouse Manager
  • Deakin University

2
Data Warehouse principles
  • Design, architecture and performance compared
    with transactional data, relational data and
    other common databases.

3
Context
  • Apart from deciding on the big question, ie.
    whether or not to employ a Data Warehouse and how
    it will be used, the importance of some of the
    other issues are often over-emphasised.
  • For example, discussions relating to DW
    architecture can often distract from the more
    important questions, there are.
  • More critical predictors of success
  • Strategy
  • Fit to orgs strategic and operational plans
  • Governance / stewardship / data management
  • Development methodology (iterative)
  • Etc.
  • Obtaining strong Executive support
  • Expectations management
  • Etc.

4
Context
  • Be sure to focus on the things that are really
    important at the appropriate time, and not get
    distracted by the less relevant issues. The
    items discussed today are important, however
  • this topic should be seen in context.
  • Having said that.

5
A Few Key Questions
  • Why would you build a Data Warehouse as part of
    your BI infrastructure?
  • when they take a long time and cost a lot of
    money
  • If you do decide to build a Data Warehouse
  • Whats the best architecture?
  • Here are some reasons why you should build a Data
    Warehouse, and some architectures to consider

6
Why Would You Build a DW?
  • Why dont you just build reporting on top of the
    Transaction system?
  • Has anyone ever tried this before?
  • Complex and time consuming to build
  • Slow to operate (poor performance)
  • Cumbersome to use (Ad hoc reporting is out of
    reach of the average user)
  • A nightmare to maintain
  • It can be an almost impossible task, and doesnt
    always work!

7
Why Would You Build a DW?
  • A Data Warehouse lets you do stuff you wouldnt
    normally be able to do
  • Speed
  • Ease of use
  • Ability to value add
  • Consolidate data
  • Consistency / single version of the truth
  • Manage data quality

8
Why Would You Build a DW?
  • Speed
  • A Data Warehouse is specifically designed and
    optimised for reporting
  • not to support the activities of a transaction
    system
  • OLTP systems are optimised for individual
    create, update or deletes
  • A Data Warehouse delivers many times greater
    performance
  • than a normalised transactional DB

9
Why Would You Build a DW?
  • Ease of use
  • DBs supporting OLTP systems are often extremely
    complex
  • A well architected Data Warehouse is much, much
    simpler to navigate

10
Why Would You Build a DW?
  • Value Add
  • A Data Warehouse can provide more
    data/information than is available from the OLTP
    system by incorporating additional business rules
    into the ETL processes or Semantic Layer.
  • For example, arbitrary classification of students
    performed differently to how they are represented
    in the source.
  • A well designed DW can provide much more
    information to users
  • than can be found in the source system

11
Why Would You Build a DW?
  • Consolidation
  • A Data Warehouse can provide data sourced from
    many disparate sources
  • Student management
  • Human resources
  • Finance
  • Space management
  • Marketing
  • Example
  • Student to staff ratios
  • A DW can provide a unique perspective on many
    aspects of the organisation

12
Why Would You Build a DW?
  • Consistency / single version of the truth
  • A Data Warehouse can provide a commonly defined
    view of data and should become the authoritative
    source. This helps to minimise the distribution
    of conflicting data, and aids in common
    understanding.
  • A DW can deliver well understood and consistent
    data

13
Why Would You Build a DW?
  • Data quality
  • Data Quality components can be built into a Data
    Warehouse to ensure integrity is higher than the
    source.
  • For example, data validations for DEST
    submissions
  • General principle should be to identify data
    issues via the DW, but to correct in the source.
  • A well designed DW can help ensure data quality
    meets
  • the organisations needs

14
What is Architecture?
  • Firstly, what is an architecture?
  • basically outlines how the DW components fit
    together
  • One always exists, but there are two broad types
  • Ad hoc, or
  • Planned
  • And you should always choose the planned approach
    as it is the best way to achieve your goals of a
    well functioning, extendable, robust and
    maintainable DW.

15
Whats The Best DW Architecture?
  • Start by looking at the reasons for choosing a DW
    architecture
  • To deliver the items mentioned in the previous
    section, in addition to
  • Robustness
  • Maintainability
  • Extendable / Scalable (to the enterprise)

16
Whats The Best DW Architecture?
  • Youll find that there are many that will serve
    most purposes, and not all will suit your needs
    or circumstances. Aspects that can influence
    your chosen architecture
  • Projected scope? Pilot, Small or large over
    time?
  • Will it consolidate data?
  • Will it perform operational reporting?
  • Will it support standard and ad hoc reporting?
  • How current does the data need to be? Real-time
    perhaps?
  • What are the available resources (funds and
    skills)?

17
Whats The Best DW Architecture?
  • The different types of DW architecture
  • The basic types are
  • Independent Data Marts
  • Dependent Data Marts / Hub and Spoke
  • Bus Architecture
  • Central Data Warehouse (no dependent data marts)
  • Federated
  • And more
  • With many variances on the themes. Therefore
    some of the architectures represented here might
    look slightly different to how others represent
    them.

18
Independent Data Marts
How Data Warehousing was often performed in the
early days Individual projects developing
solutions into functional silos No program /
enterprise perspective No conformed dimensions
19
Dependent Data Marts / Hub Spoke
Usually employing a Top-Down approach (Inmon)
An approach also used in the early days, but
refined over time Originally suggested extensive
effort in building the DW Now recommends building
DW incrementally
20
Data Mart Bus (conformed)
Usually employing a Bottom-Up approach (Kimball)
Note
An approach also used in the early days, but
refined over time Originally suggested building
silos Now recommends enterprise perspective
21
Central Data Warehouse
Usually employing a Hybrid approach
Seeks to overcome the limitations of previous
architectures Highly variable with many
individual approaches
22
Federated Data Warehouse
An attempt to consolidate legacy Data Marts
23
Development Approach
  • Bottom-up (Kimball) (ie. Data Mart Bus)
  • Deliver to individual business need via
    individual Data Marts
  • Utilise a Staging Area
  • Top-Down (Inmon) (ie. Hub and Spoke)
  • Use of a central Enterprise Data Warehouse
  • Summarised Data Marts
  • Actual approaches have changed over-time (youll
    find different descriptions of exactly what these
    mean) due to ineffectiveness of some aspects.
  • They are now very similar in nature (its often
    the semantics that differ), as approaches change
    and improve over time.
  • Discussing the differences between the two is
    often a distraction

24
Development Approach (Cont.)
  • Most people employ a Hybrid approach with
    elements of Top-Down and Bottom-Up
  • Again, practitioners dont always concentrate on
    these issues and use this terminology, and just
    focus on best-practice
  • That would include
  • Build incrementally according to a business
    function
  • Employ an enterprise perspective
  • Dimensionally model data
  • Utilise conformed dimensional models
  • Employ a Staging Area or Data Warehouse
  • Store atomic data

25
Whats The Best DW Architecture?
  • There are similarities between many architectures
    and research shows the most popular types are
  • Dependant Data Marts / Hub and Spoke
  • Data Mart Bus
  • Centralised Data Warehouse
  • All of which are similar in nature, which is why
  • Research shows each has a similar success rate!
  • Data Warehouse Architectures Factors in the
    Selection Decision and the Success of the
    Architectures
  • Hugh J. Watson - University of Georgia
  • Thilini Ariyachandra - University of Cincinnati

26
Which is Best?
  • Does it then matter what Architecture you choose?
  • Yes, as long as it meets the minimum requirements
    of an effective architecture
  • Allows you to build incrementally
  • Provides scalability ie. for the whole enterprise
  • Dimensionally modelled ie. supports Star-Schema
    models
  • Stores atomic data
  • Ability to conform dimensions
  • Utilises at least 2 tiers (eg. incorporating a
    staging area)
  • Thats why these types satisfy best practice
  • Dependant Data Marts / Hub and Spoke
  • Data Mart Bus
  • Centralised Data Warehouse

27
My Two Cents
  • It was my choice to implement a Centralised DW
    architecture
  • Why?
  • I saw the benefits of a Staging Area
  • Easier to conform dimensions
  • No need to create separate physical data marts
  • Allows us to perform the critical tasks (outlined
    in the previous slide)
  • This architecture suits our requirements which
    are not particularly out of the ordinary, but it
    may not suit yours.

28
Our Staging Area
  • What is a staging area?
  • A copy of operational data from the transaction
    system
  • Why would you build a staging area?
  • Less impact of loads on transaction system
  • Snapshot allows repeatable / re-startable DW
    ETL processes
  • Area to store 3rd party data
  • Area to store data using different load schedules
    to the DW
  • Employ data manipulation not appropriate for the
    DW
  • Foundation of a future Operational Data Store
    (ODS)
  • Why not?
  • Data redundancy
  • Additional development time

29
Our Data Warehouse
  • As second and final tier
  • Scalable to the whole enterprise
  • Contains dimensionally modelled data ie. star
    schemas
  • More easily enables conformed dimensions
  • Stores atomic data
  • Can be logically separated in the Semantic layer
    (for security)
  • Easy to create 3rd tier (ie. data marts)
    virtually
  • Disadvantages
  • No physical separation of data
  • Issues in granting users access to the DB

30
Our Data Warehouse (Cont.)
  • Dimensional data modelling and the Star Schema

31
The Semantic Layer
  • A well designed Semantic Layer can further
    deliver some of the benefits described earlier
  • Ease of use
  • Ability to value add
  • Consolidate data
  • Current BI technology is very good at
  • implementing business rules.

32
The Architecture May Change
  • Our Architecture has been designed to accommodate
    changes, for example
  • Physical data marts (3rd physical tier)
  • Operational Data Store

33
Conclusion
  • You should implement a Data Warehouse
  • You should use one of the common architectures

34
Other Types of Architecture
  • Technical
  • Hardware / servers
  • Software / tools
  • Extract Transform Load
  • Business Intelligence
  • Database
  • Modelling

35
What Next?
  • How Do You Obtain the Required Expertise?
  • Buy it if you can afford it
  • Hire someone with experience
  • Consultants
  • Contractors
  • If you can help it
  • Dont try it yourself
  • Dont leave it to those who have never attempted
    Data Warehousing before (not even your IT folks)

36
Questions
  • ?
Write a Comment
User Comments (0)
About PowerShow.com