Title: 615644 Data Warehousing
1615-644 Data Warehousing
- Week 1
- Introduction to Data Warehousing Business
Intelligence
2615-644 Data Warehousing
- Introduction to 615-644
- Aims
- Assessment
- Lecture schedule
- Introduction to Data Warehousing
- Reading Watson (2002)
- Introduction to Business Intelligence
- Reading Negash (2004)
3Introduction to 615-644
- Lecturer Prof. Graeme Shanks
- Availability by appointment
- Location ICT Building, Room 3.52
- Phone 8344 1577
- Email gshanks_at_unimelb.edu.au
4This Subject Aims to
- Provide a familiarity with data warehousing and
its relationship to decision-making - Provide an understanding of data warehouse design
implementation, data quality and retrieval and
analysis of data - Provide a familiarity with the role of data
warehousing in customer relationship management
systems.
5Expectations of the Student
- Attend one 3 hour session each week for lectures,
tutorials and workshops - Read relevant material before lectures
- Devote approximately 12 hours to the unit each
week (3hrs in class, 9 hrs of reading,
preparation and assignment work).
6Assessment
- Case Study Assignment 30
- Data Warehousing Review Paper 30
- Final examination 40
All work must be completed independently
Penalties apply to late submission
7References
- Prescribed
- Shanks, G. (ed.) (2007) 615-644 Data Warehousing
Selected Readings, Department of Information
systems, The University of Melbourne.
Note the referencing style
8Lecture Schedule
- Introduction to data warehousing and business
intelligence (1) - Definitions, purpose, terminology and history
- Data Warehousing and decision-making
- Data warehouse design (2-4)
- Dimensional modelling
- Data quality (5-6)
- Understanding data quality
- Measuring and improving data quality
9Lecture Schedule
- Identity and Privacy (7)
- Identity management
- Data Privacy issues
- OLAP and Business Intelligence (8)
- OLAP concepts
- OLAP technology and use
- Knowledge discovery and data mining (9-10)
- Data mining techniques
- Data mining process
10 Lecture Schedule
- Customer Relationship Management (CRM) Systems
(11) - CRM concepts
- CRM benefits
- Data Warehousing Case Study (12)
11Introduction to Data Warehousing
- A data warehouse is a collection of data designed
to support decision-making applications - Data warehousing is the entire process of data
access, transformation, and loading of data to
the warehouse, and the access of the data by end
users and applications
Watson (2001)
12Data Warehousing Issues - Watson (2001)
- Concepts and architecture
- Data warehouse development
- Data extraction, quality and loading
- Metadata
- Data storage
- Relational and multidimensional
- Data access
- Data mining
- OLAP
- Enterprise portals
13Data Warehouses and Transaction Processing Systems
14Motivations for Data Warehousing
- Demands on OLTP databases for query processing
would be too great - Data warehouse is designed for efficient
retrieval - Data in legacy systems is frequently redundant,
inconsistent, of poor quality, and stored in
different formats - Reduce costs in providing data for decision makers
15Data Warehouse Characteristics
- ... a subject-oriented, integrated,
time-variant, validated and non-volatile
collection of data in support of managements
decisions Inmon and Hackathorn (1994)
16Data WarehouseSubject-oriented
- The data warehouse database is organised by data
subjects that are relevant to the organisation - for example Sales, Claims, Shipments
- This may be contrasted with the process
orientation of many transaction processing systems
17Data WarehouseIntegrated
- Data in the data warehouse is structured based on
a corporate-wide model, spanning the functional
boundaries of legacy systems - This includes naming standards, units of
measurement, and periodicity
18Data WarehouseTime-variant
- Data in the data warehouse is characterised by
the time-series nature of historical data - The data consists of a series of snapshots
which are time stamped and record values at a
moment in time - This supports trend analysis of the data
19Data WarehouseValidated
- Data from various sources are validated before
storing them in a data warehouse. - Data quality is crucial to the credibility of the
warehouse
20Data WarehouseNon-volatile
- The data warehouse is not continuously updated
(inserts, deletes, changes) like data in an OLTP
system - Data in a data warehouse is periodically uploaded
at scheduled time intervals (say daily)
21Data Mart
- A data mart stores data for a limited number of
subject areas, such as marketing and sales data.
It is used to support specific applications - An independent data mart is created directly from
source systems - A dependent data mart is populated from a data
warehouse
Watson (2001)
22Operational Data store
- An operational data store consolidates data from
a number of source systems and provides a near
real time, integrated view of volatile current
data - Its purpose is to provide integrated data for
operational purposes. It has add, change and
delete functionality - It may be created to avoid a full-blown ERP
implementation
Watson (2001)
23Data Warehouse Architecture
Watson (2001)
24Data Warehousing Process
- Enterprise-wide warehouse, top down, the Inmon
methodology - Data mart, bottom up, the Kimball methodology
- When properly executed, both result in an
enterprise-wide data warehouse
Watson (2001)
25The Data Mart Strategy
- The most common approach
- Begins with a single mart and architected marts
are added over time for more subject areas - Relatively inexpensive and easy to implement
- Can be used as a proof of concept for data
warehousing - Can perpetuate the silos of information problem
- Requires an overall integration plan
26The Enterprise-wide Strategy
- A comprehensive warehouse is built initially
- An initial dependent data mart is built using a
subset of the data in the warehouse - Additional data marts are built using subsets of
the data in the warehouse - Like all complex projects, it is expensive, time
consuming, and prone to failure - When successful, it results in an integrated,
scalable warehouse
27Data Analysis Tools
- Basic Query and Reporting
- Basic question What happened?
- Historical focus, limited flexibility
- Data source OLTP database, ODS, data mart, data
warehouse - On-Line Analytical Processing (OLAP)
- Basic question What happened and why?
- Historical focus, multi-dimensional - look at
data from many points of view, with medium
flexibility (slice dice, drill down) - Data Source Data Mart, Data Warehouse
28Data Analysis Tools
- Data Mining
- Basic questions What is interesting? What might
happen? - Future focus, high flexibility
- Extract relationships, patterns and trends,
predict future trends. - Data source Data Warehouse
29OLAP Multi-dimensional View of Data
30OLAP Example
31OLAP Slice and Dice Example
32OLAP Drill Down Example
33Time for a Break!