Title: Data Warehousing Principles
1Data Warehousing Principles
- AAIR Data Warehouse Special Interest Group
- Monday, 17th July 2006
- 400-500pm
- Michael Gibson
- Data Warehouse Manager
- Deakin University
2Data Warehouse principles
- Design, architecture and performance compared
with transactional data, relational data and
other common databases.
3Context
- Apart from deciding on the big question, ie.
whether or not to employ a Data Warehouse and how
it will be used, the importance of some of the
other issues are often over-emphasised. - For example, discussions relating to DW
architecture can often distract from the more
important questions, there are. - More critical predictors of success
- Strategy
- Fit to orgs strategic and operational plans
- Governance / stewardship / data management
- Development methodology (iterative)
- Etc.
- Obtaining strong Executive support
- Expectations management
- Etc.
4Context
- Be sure to focus on the things that are really
important at the appropriate time, and not get
distracted by the less relevant issues. The
items discussed today are important, however - this topic should be seen in context.
- Having said that.
5A Few Key Questions
- Why would you build a Data Warehouse as part of
your BI infrastructure? - when they take a long time and cost a lot of
money - If you do decide to build a Data Warehouse
- Whats the best architecture?
- Here are some reasons why you should build a Data
Warehouse, and some architectures to consider
6Why Would You Build a DW?
- Why dont you just build reporting on top of the
Transaction system? - Has anyone ever tried this before?
- Complex and time consuming to build
- Slow to operate (poor performance)
- Cumbersome to use (Ad hoc reporting is out of
reach of the average user) - A nightmare to maintain
- It can be an almost impossible task, and doesnt
always work!
7Why Would You Build a DW?
- A Data Warehouse lets you do stuff you wouldnt
normally be able to do - Speed
- Ease of use
- Ability to value add
- Consolidate data
- Consistency / single version of the truth
- Manage data quality
8Why Would You Build a DW?
- Speed
- A Data Warehouse is specifically designed and
optimised for reporting - not to support the activities of a transaction
system - OLTP systems are optimised for individual
create, update or deletes - A Data Warehouse delivers many times greater
performance - than a normalised transactional DB
9Why Would You Build a DW?
- Ease of use
- DBs supporting OLTP systems are often extremely
complex - A well architected Data Warehouse is much, much
simpler to navigate
10Why Would You Build a DW?
- Value Add
- A Data Warehouse can provide more
data/information than is available from the OLTP
system by incorporating additional business rules
into the ETL processes or Semantic Layer. - For example, arbitrary classification of students
performed differently to how they are represented
in the source. - A well designed DW can provide much more
information to users - than can be found in the source system
11Why Would You Build a DW?
- Consolidation
- A Data Warehouse can provide data sourced from
many disparate sources - Student management
- Human resources
- Finance
- Space management
- Marketing
- Example
- Student to staff ratios
- A DW can provide a unique perspective on many
aspects of the organisation
12Why Would You Build a DW?
- Consistency / single version of the truth
- A Data Warehouse can provide a commonly defined
view of data and should become the authoritative
source. This helps to minimise the distribution
of conflicting data, and aids in common
understanding. - A DW can deliver well understood and consistent
data
13Why Would You Build a DW?
- Data quality
- Data Quality components can be built into a Data
Warehouse to ensure integrity is higher than the
source. - For example, data validations for DEST
submissions - General principle should be to identify data
issues via the DW, but to correct in the source. - A well designed DW can help ensure data quality
meets - the organisations needs
14What is Architecture?
- Firstly, what is an architecture?
- basically outlines how the DW components fit
together - One always exists, but there are two broad types
- Ad hoc, or
- Planned
- And you should always choose the planned approach
as it is the best way to achieve your goals of a
well functioning, extendable, robust and
maintainable DW.
15Whats The Best DW Architecture?
- Start by looking at the reasons for choosing a DW
architecture - To deliver the items mentioned in the previous
section, in addition to - Robustness
- Maintainability
- Extendable / Scalable (to the enterprise)
16Whats The Best DW Architecture?
- Youll find that there are many that will serve
most purposes, and not all will suit your needs
or circumstances. Aspects that can influence
your chosen architecture - Projected scope? Pilot, Small or large over
time? - Will it consolidate data?
- Will it perform operational reporting?
- Will it support standard and ad hoc reporting?
- How current does the data need to be? Real-time
perhaps? - What are the available resources (funds and
skills)?
17Whats The Best DW Architecture?
- The different types of DW architecture
- The basic types are
- Independent Data Marts
- Dependent Data Marts / Hub and Spoke
- Bus Architecture
- Central Data Warehouse (no dependent data marts)
- Federated
- And more
- With many variances on the themes. Therefore
some of the architectures represented here might
look slightly different to how others represent
them.
18Independent Data Marts
How Data Warehousing was often performed in the
early days Individual projects developing
solutions into functional silos No program /
enterprise perspective No conformed dimensions
19Dependent Data Marts / Hub Spoke
Usually employing a Top-Down approach (Inmon)
An approach also used in the early days, but
refined over time Originally suggested extensive
effort in building the DW Now recommends building
DW incrementally
20Data Mart Bus (conformed)
Usually employing a Bottom-Up approach (Kimball)
Note
An approach also used in the early days, but
refined over time Originally suggested building
silos Now recommends enterprise perspective
21Central Data Warehouse
Usually employing a Hybrid approach
Seeks to overcome the limitations of previous
architectures Highly variable with many
individual approaches
22Federated Data Warehouse
An attempt to consolidate legacy Data Marts
23Development Approach
- Bottom-up (Kimball) (ie. Data Mart Bus)
- Deliver to individual business need via
individual Data Marts - Utilise a Staging Area
- Top-Down (Inmon) (ie. Hub and Spoke)
- Use of a central Enterprise Data Warehouse
- Summarised Data Marts
- Actual approaches have changed over-time (youll
find different descriptions of exactly what these
mean) due to ineffectiveness of some aspects. - They are now very similar in nature (its often
the semantics that differ), as approaches change
and improve over time. - Discussing the differences between the two is
often a distraction
24Development Approach (Cont.)
- Most people employ a Hybrid approach with
elements of Top-Down and Bottom-Up - Again, practitioners dont always concentrate on
these issues and use this terminology, and just
focus on best-practice - That would include
- Build incrementally according to a business
function - Employ an enterprise perspective
- Dimensionally model data
- Utilise conformed dimensional models
- Employ a Staging Area or Data Warehouse
- Store atomic data
25Whats The Best DW Architecture?
- There are similarities between many architectures
and research shows the most popular types are - Dependant Data Marts / Hub and Spoke
- Data Mart Bus
- Centralised Data Warehouse
- All of which are similar in nature, which is why
- Research shows each has a similar success rate!
- Data Warehouse Architectures Factors in the
Selection Decision and the Success of the
Architectures - Hugh J. Watson - University of Georgia
- Thilini Ariyachandra - University of Cincinnati
26Which is Best?
- Does it then matter what Architecture you choose?
- Yes, as long as it meets the minimum requirements
of an effective architecture - Allows you to build incrementally
- Provides scalability ie. for the whole enterprise
- Dimensionally modelled ie. supports Star-Schema
models - Stores atomic data
- Ability to conform dimensions
- Utilises at least 2 tiers (eg. incorporating a
staging area) - Thats why these types satisfy best practice
- Dependant Data Marts / Hub and Spoke
- Data Mart Bus
- Centralised Data Warehouse
27My Two Cents
- It was my choice to implement a Centralised DW
architecture - Why?
- I saw the benefits of a Staging Area
- Easier to conform dimensions
- No need to create separate physical data marts
- Allows us to perform the critical tasks (outlined
in the previous slide) - This architecture suits our requirements which
are not particularly out of the ordinary, but it
may not suit yours.
28Our Staging Area
- What is a staging area?
- A copy of operational data from the transaction
system - Why would you build a staging area?
- Less impact of loads on transaction system
- Snapshot allows repeatable / re-startable DW
ETL processes - Area to store 3rd party data
- Area to store data using different load schedules
to the DW - Employ data manipulation not appropriate for the
DW - Foundation of a future Operational Data Store
(ODS) - Why not?
- Data redundancy
- Additional development time
29Our Data Warehouse
- As second and final tier
- Scalable to the whole enterprise
- Contains dimensionally modelled data ie. star
schemas - More easily enables conformed dimensions
- Stores atomic data
- Can be logically separated in the Semantic layer
(for security) - Easy to create 3rd tier (ie. data marts)
virtually - Disadvantages
- No physical separation of data
- Issues in granting users access to the DB
30Our Data Warehouse (Cont.)
- Dimensional data modelling and the Star Schema
31The Semantic Layer
- A well designed Semantic Layer can further
deliver some of the benefits described earlier - Ease of use
- Ability to value add
- Consolidate data
- Current BI technology is very good at
- implementing business rules.
32The Architecture May Change
- Our Architecture has been designed to accommodate
changes, for example - Physical data marts (3rd physical tier)
- Operational Data Store
33Conclusion
- You should implement a Data Warehouse
- You should use one of the common architectures
34Other Types of Architecture
- Technical
- Hardware / servers
- Software / tools
- Extract Transform Load
- Business Intelligence
- Database
- Modelling
35What Next?
- How Do You Obtain the Required Expertise?
- Buy it if you can afford it
- Hire someone with experience
- Consultants
- Contractors
- If you can help it
- Dont try it yourself
- Dont leave it to those who have never attempted
Data Warehousing before (not even your IT folks)
36Questions