615644 Data Warehousing - PowerPoint PPT Presentation

1 / 33
About This Presentation
Title:

615644 Data Warehousing

Description:

615-644 Week 1 Introduction to DW & BI. 3. Introduction to 615-644. Lecturer: Prof. Graeme Shanks. Availability: by appointment. Location: ICT Building, Room 3.52 ... – PowerPoint PPT presentation

Number of Views:103
Avg rating:3.0/5.0
Slides: 34
Provided by: Gra6162
Category:

less

Transcript and Presenter's Notes

Title: 615644 Data Warehousing


1
615-644 Data Warehousing
  • Week 1
  • Introduction to Data Warehousing Business
    Intelligence

2
615-644 Data Warehousing
  • Introduction to 615-644
  • Aims
  • Assessment
  • Lecture schedule
  • Introduction to Data Warehousing
  • Reading Watson (2002)
  • Introduction to Business Intelligence
  • Reading Negash (2004)

3
Introduction to 615-644
  • Lecturer Prof. Graeme Shanks
  • Availability by appointment
  • Location ICT Building, Room 3.52
  • Phone 8344 1577
  • Email gshanks_at_unimelb.edu.au

4
This Subject Aims to
  • Provide a familiarity with data warehousing and
    its relationship to decision-making
  • Provide an understanding of data warehouse design
    implementation, data quality and retrieval and
    analysis of data
  • Provide a familiarity with the role of data
    warehousing in customer relationship management
    systems.

5
Expectations of the Student
  • Attend one 3 hour session each week for lectures,
    tutorials and workshops
  • Read relevant material before lectures
  • Devote approximately 12 hours to the unit each
    week (3hrs in class, 9 hrs of reading,
    preparation and assignment work).

6
Assessment
  • Case Study Assignment 30
  • Data Warehousing Review Paper 30
  • Final examination 40

All work must be completed independently
Penalties apply to late submission
7
References
  • Prescribed
  • Shanks, G. (ed.) (2007) 615-644 Data Warehousing
    Selected Readings, Department of Information
    systems, The University of Melbourne.

Note the referencing style
8
Lecture Schedule
  • Introduction to data warehousing and business
    intelligence (1)
  • Definitions, purpose, terminology and history
  • Data Warehousing and decision-making
  • Data warehouse design (2-4)
  • Dimensional modelling
  • Data quality (5-6)
  • Understanding data quality
  • Measuring and improving data quality

9
Lecture Schedule
  • Identity and Privacy (7)
  • Identity management
  • Data Privacy issues
  • OLAP and Business Intelligence (8)
  • OLAP concepts
  • OLAP technology and use
  • Knowledge discovery and data mining (9-10)
  • Data mining techniques
  • Data mining process

10
Lecture Schedule
  • Customer Relationship Management (CRM) Systems
    (11)
  • CRM concepts
  • CRM benefits
  • Data Warehousing Case Study (12)

11
Introduction to Data Warehousing
  • A data warehouse is a collection of data designed
    to support decision-making applications
  • Data warehousing is the entire process of data
    access, transformation, and loading of data to
    the warehouse, and the access of the data by end
    users and applications

Watson (2001)
12
Data Warehousing Issues - Watson (2001)
  • Concepts and architecture
  • Data warehouse development
  • Data extraction, quality and loading
  • Metadata
  • Data storage
  • Relational and multidimensional
  • Data access
  • Data mining
  • OLAP
  • Enterprise portals

13
Data Warehouses and Transaction Processing Systems
14
Motivations for Data Warehousing
  • Demands on OLTP databases for query processing
    would be too great
  • Data warehouse is designed for efficient
    retrieval
  • Data in legacy systems is frequently redundant,
    inconsistent, of poor quality, and stored in
    different formats
  • Reduce costs in providing data for decision makers

15
Data Warehouse Characteristics
  • ... a subject-oriented, integrated,
    time-variant, validated and non-volatile
    collection of data in support of managements
    decisions Inmon and Hackathorn (1994)

16
Data WarehouseSubject-oriented
  • The data warehouse database is organised by data
    subjects that are relevant to the organisation
  • for example Sales, Claims, Shipments
  • This may be contrasted with the process
    orientation of many transaction processing systems

17
Data WarehouseIntegrated
  • Data in the data warehouse is structured based on
    a corporate-wide model, spanning the functional
    boundaries of legacy systems
  • This includes naming standards, units of
    measurement, and periodicity

18
Data WarehouseTime-variant
  • Data in the data warehouse is characterised by
    the time-series nature of historical data
  • The data consists of a series of snapshots
    which are time stamped and record values at a
    moment in time
  • This supports trend analysis of the data

19
Data WarehouseValidated
  • Data from various sources are validated before
    storing them in a data warehouse.
  • Data quality is crucial to the credibility of the
    warehouse

20
Data WarehouseNon-volatile
  • The data warehouse is not continuously updated
    (inserts, deletes, changes) like data in an OLTP
    system
  • Data in a data warehouse is periodically uploaded
    at scheduled time intervals (say daily)

21
Data Mart
  • A data mart stores data for a limited number of
    subject areas, such as marketing and sales data.
    It is used to support specific applications
  • An independent data mart is created directly from
    source systems
  • A dependent data mart is populated from a data
    warehouse

Watson (2001)
22
Operational Data store
  • An operational data store consolidates data from
    a number of source systems and provides a near
    real time, integrated view of volatile current
    data
  • Its purpose is to provide integrated data for
    operational purposes. It has add, change and
    delete functionality
  • It may be created to avoid a full-blown ERP
    implementation

Watson (2001)
23
Data Warehouse Architecture
Watson (2001)
24
Data Warehousing Process
  • Enterprise-wide warehouse, top down, the Inmon
    methodology
  • Data mart, bottom up, the Kimball methodology
  • When properly executed, both result in an
    enterprise-wide data warehouse

Watson (2001)
25
The Data Mart Strategy
  • The most common approach
  • Begins with a single mart and architected marts
    are added over time for more subject areas
  • Relatively inexpensive and easy to implement
  • Can be used as a proof of concept for data
    warehousing
  • Can perpetuate the silos of information problem
  • Requires an overall integration plan

26
The Enterprise-wide Strategy
  • A comprehensive warehouse is built initially
  • An initial dependent data mart is built using a
    subset of the data in the warehouse
  • Additional data marts are built using subsets of
    the data in the warehouse
  • Like all complex projects, it is expensive, time
    consuming, and prone to failure
  • When successful, it results in an integrated,
    scalable warehouse

27
Data Analysis Tools
  • Basic Query and Reporting
  • Basic question What happened?
  • Historical focus, limited flexibility
  • Data source OLTP database, ODS, data mart, data
    warehouse
  • On-Line Analytical Processing (OLAP)
  • Basic question What happened and why?
  • Historical focus, multi-dimensional - look at
    data from many points of view, with medium
    flexibility (slice dice, drill down)
  • Data Source Data Mart, Data Warehouse

28
Data Analysis Tools
  • Data Mining
  • Basic questions What is interesting? What might
    happen?
  • Future focus, high flexibility
  • Extract relationships, patterns and trends,
    predict future trends.
  • Data source Data Warehouse

29
OLAP Multi-dimensional View of Data
30
OLAP Example
31
OLAP Slice and Dice Example
32
OLAP Drill Down Example
33
Time for a Break!
Write a Comment
User Comments (0)
About PowerShow.com