Title: Data-Driven Business Intelligence Systems: Part I
1Data-Driven Business Intelligence Systems Part I
- Week 5
- Dr. Jocelyn San PedroSchool of Information
Management Systems - Monash University
2Lecture Outline
- Data-driven BIS
- Data warehouse
- Data warehouse architectures
- Entity-Relationship Modelling
- Multi-dimensional Modelling
- Star Schema
- An Example Retail Trading
3Learning Objectives
- At the end of this lecture, the students will
- Have better understanding of concepts, tools and
technology underlying data-driven business
intelligence systems - Have knowledge of multidimensional modelling and
star schema for data modelling for data
warehouses
4- Data-Driven Business
- Intelligence Systems
5Data-Driven BIS
- Data-driven BIS
- information systems that provide BI through
access and manipulation of large databases of
structured data - includes tools for
- drill down for more detailed information
- drill up for broader, more summarised view
- slice and dice for a change in data dimensions
6Data-Driven BIS
7Data-Driven BIS
8Data-Driven BIS
Slicing the cube
Product
Salesperson
Time
9Data-Driven BIS
Dicing the cube
225,764
201,196
162,504
182,500
123,033
72,528
116,963
75,048
68,792
Peacock
Leverling
Fuller
Davolio
Callahan
Suyaman
King
Dodsworth
Buchanan
Q1 Q2 Q3 Q4
10 11Data Warehouse
- A data warehouse is a subject-oriented,
integrated, time-variant, nonvolatile collection
of data in support of managements decision
making process Bill Inmon (1995) - Subject-oriented focus is on subjects related to
business or organisational activity like
customers, employees, suppliers (instead of
applications-oriented (finance, marketing,
production) - Integrated data from various databases is stored
in a consistent format through use of naming
conventions, domain constraints, physical
attributes and measurements - Time-variant associating data with specific
points in time - Nonvolatile data does not change once it is in
the data warehouse and stored in data warehouse
12Data Warehouse
- Data warehouse is a copy of transaction data
specifically structured for query and analysis
Ralph Kimball (1996) - Data warehouse is a specific database designed
and populated to provide decision support in an
organisation - Gray and Watson (1998)
13Data Warehouse
- Data warehousing emerged as result of
- improvements in database technology relational
data model and relational database management
systems (DBMS) - advances in computer hardware - emergence of
affordable mass storage and parallel computer
architectures - emergence of end-user computing, facilitated by
powerful, intuitive computer interfaces and tools - advances in middleware products that enable
enterprise database connectivity across
heterogeneous platforms
14Data Warehouse
- triggered by recognition of fundamental
differences between operational (or production)
systems and informational (or decision support)
systems - Operational system system that is used to run a
business in real time, based on current data
e.g. sales order processing, reservation systems,
patient registration - Informational systems designed to support
decision making based on stable point-in-time or
historical data for complex read-only queries or
data mining applications e.g. sales trend
analysis, customer segmentation, human resources
planning
15Data Warehouse
Comparison of Operational and informational
Systems McFadden, Hoffer and Prescott 1999
Characteristic Operational Systems Informational Systems
Primary purpose Run the business on a current basis Support managerial decision making
Type of data Current representation of state of the business Historical or point-in-time (snapshots)
Primary users Clerks, salespersons, administrators Managers, business analysts, customers
Scope of usage Narrow vs. simple updates and queries Broad vs. complex queries and analysis
Design goal performance Ease of access and use
16Data Warehouse Architectures
- Generic two-level architecture
17Data Warehouse Architectures
Three-level architecture
18Data Warehouse Architectures
- Data mart
- a data warehouse that is limited in scope
- contains selected and summarised data to support
specific decision support applications of
specific end-user group - e.g., marketing data mart, finance data mart
19Data Warehouse Architectures
- Three-layer data architecture
Enterprise data model
20Data Warehouse Architectures
- Enterprise data model
- Presents a total picture explaining the data
required by an organisation - Must be developed prior to designing a data
warehouse - Entity-Relationship Models traditional approach
in relational database design - Multidimensional Models are commonly used in
data warehouses and data marts for faster
retrieval for querying and analysis
21Data Warehouse Architectures
- Operational Data
- current or transient, not historical
- restricted in scope to a particular application
- poor quality
- not normalised (there are multi-valued attributes
or repeating groups, partial dependencies,
transitive dependencies in data relations)
22Data Warehouse Architectures
Sample Operational Data from Northwind database
23Data Warehouse Architectures
- Reconciled Data
- Detailed - rather than summarised
- Historical snapshots, periodic
- Comprehensive should reflect enterprise-wide
perspective conform to enterprise data model - Quality controlled
- Normalised 3NF or higher
- 3NF no multi-valued attributes, no partial
dependencies, no transitive dependencies
24Data Warehouse Architectures
Steps in Normalisation
25Data Warehouse Architectures
Sales relation with sample data
Relation in 3NF
26Data Warehouse Architectures
- Derived Data
- selected, formatted, aggregated
- provides ease of use for decision support
applications - provides fast response for user queries
- supports ad-hoc queries and data mining
applications - data model commonly used is star schema
27Data Warehouse Architectures
- Metadata
- data that describe the properties or
characteristics of other data - Operational metadata describe the data in
various operational systems (as well as external
data) that feed the EDW - EDW metadata describe the reconciled data layer
as well as the rules for transforming operational
data to reconciled data - Data mart metadata describe the data in derived
data layer and rules of transforming reconciled
data to derived data
28Data Warehouse Architectures
Sample data description
29Data Warehouse Architectures
Sample data description
30Data Warehouse Architectures
- Data Reconciliation Process
- Stage 1 Initial load, when EDW is first created
- Stage 2 Subsequent updates
- Steps in Data Reconciliation Process
- Capture extract relevant data from source/s
31Data Warehouse Architectures
- Scrub clean or upgrade the quality of raw data
before transformation and loading (using pattern
recognition, artificial intelligence techniques) - Track and correct errors misspelled names,
erroneous birthdates, missing data inconsistent
data formats
32Data Warehouse Architectures
- Transform - includes
- converting data format or representation from
source to target system - partitioning data according to predefined
criteria - aggregating data from detailed to summary level
33Data Warehouse Architectures
- Load and Index
- Refresh mode filling the EDW by bulk rewriting
of target data - Update mode only changes in source data are
written to the data warehouse at periodic
intervals, data warehouse is rewritten, replacing
previous contents without overwriting or deleting
previous contents - Create necessary indexes
34- Entity-Relationship Modelling
35Entity-Relationship Modelling
Customer
Region
Customer Type
groups
within
contains
makes
Sale
located at
Product
Store
Product Type
groups
in
within
Period
(based on Kimball (1996), p29, and Simsion-Bowles
(1996), p2)
36Entity-Relationship Modelling
- Entities, attributes and relationships
- Rules of normalisation
- 3NF is typical
- Protection of integrity of database by avoiding
anomalies - Every logical thing is represented only once
- Separate consideration of logical and physical
aspects
37Entity-Relationship Modelling
ER Model for the Northwind sample database
38Entity-Relationship Modelling
- Large numbers of tables
- Oracle Financials - 1,800 SAP 7 up to 8,000
- Commonly used
- Feels natural once you get used to it
- Research shows that they are not easily
understood by IT people - Especially concepts like abstraction,
generalisation, sub-types, etc.
39- Multi-dimensional Modelling
40Multi-dimensional Modelling
- It is possible to conceptualise data as
multi-dimensional - Difficult to design
- Easy to use resulting reports
- Advocated by Ralph Kimball (see his manifesto,
and a rebuttal, available on the web site). - A logical design technique that seeks to present
data in a standard framework that is intuitive
and allows for high-performance access.
41Multi-dimensional Modelling
- An approach to database design that provides an
easy to understand and navigate database - The aim is to encourage understanding,
exploration and learning - Each number in a database has a set of associated
attributes - What it measures, what point of time it was
created, what location its from, what product
its associated with, what promotion, etc. - This makes the number meaningful.
42Multi-dimensional Modelling
- Each attribute associated with each number
represents a dimension - Measure, time, location, product, location, etc.
- Resulting views are easy to navigate and move
around - Slice and dice
- Report template
43Multi-dimensional Modelling
Example Widget Sales (Million)
One Dimension (State)
43.6
53.4
31.4
27.5
28.3
14.7
Vic
NSW
QLD
WA
SA
TAS
State
48.2
53.4
31.4
28.4
25.1
15.4
2002
Two Dimensions (location x time)
50.1
57.2
33.6
28.1
22.5
16.3
2001
56.3
62.3
35.1
29.4
21.5
13.3
2000
Year
46.2
52.1
29.6
25.1
27.1
18.2
1999
43.6
53.4
31.4
27.5
28.3
14.7
1998
Vic
NSW
QLD
WA
SA
TAS
State
44Multi-dimensional Modelling
Three Dimensions (location x time x product)
48.2
53.4
31.4
28.4
25.1
15.4
2002
50.1
57.2
33.6
28.1
22.5
16.3
2001
56.3
62.3
35.1
29.4
21.5
13.3
2000
Year
46.2
52.1
29.6
25.1
27.1
18.2
1999
Flanges
43.6
53.4
31.4
27.5
28.3
14.7
Gaskets
1998
Sprockets
Vic
NSW
QLD
WA
SA
TAS
Product
Widgets
State
45Multi-dimensional Modelling
- Usually talk about information spaces as cubes,
or hyper-cubes, or n-cubes - Resulting views of databases are easy to navigate
and move around - Slicing and dicing
- Report Template
46Multi-dimensional Modelling
- Slicing and Dicing
- Select certain dimension values to examine a set
of data
47Multi-dimensional Modelling
- Report Templates
- One template is produced for a set of slices
- Data changes, layout doesnt
Location Drop Down Box
Year Drop Down Box
48Multi-dimensional Modelling
From Traditional Relational to Multi-dimensional
Typical relational data-base
From Pilot Software OLAP White Paper
Same data displayed in two-dimensions Easy!
(The key is to identify the continuous and
discrete variables in the flat file.)
49 50Star Schema
- Used to implement dimensional analysis using
standard relational database technology - Very common in data warehousing
- Many variations
- Two components
- Fact Table contains measurements of business,
eg. sales, purchase order, shipment - Dimension Tables stores the textual
descriptions of the dimensions of the business,
eg. product, customer, vendor, store.
51Star Schema
- Fact tables store the hard data
- Dimension tables store all the information about
our dimensions. - The fact table has a many-to-one relationship
with each dimension table - Each dimension table has a primary key that
appears as a foreign key in the fact table, whose
primary key is a concatenation of all of the
foreign keys.
52Star Schema
- Dimension tables in star schemas are denormalised
resulting in - Fewer tables
- Simpler for users to navigate
- Reduced number of complex multi-join tables.
53Star schema
Customer Customer key Name Customer type
Sale Time key Store key Customer key Product
key Dollar sales Unit sales
Product Product key Product type weight
Store Store key Address Region
Time Time key Day Month
Legend Primary Key Foreign key
54Snowflake schema
Customer Type
Do not snowflake your dimensions, even if very
large. If you do snowflake your dimensions,
prepare to live with poor performance Kimball
(1996)
Customer
Sale
Product
Store
Product Type
Region
Time
55Star Schema
- Dimensions can be shared amongst fact tables.
56Star Schema
- ER schemas are useful for data mapping to legacy
systems and for integration of the data warehouse - Star schemas are useful for the design of
warehouse databases as they are efficient and
easy to understand and use - Allow relational databases to support
multi-dimensional data cubes
57Star Schema
- Steps in the design process
- Choose a business process
- Choose the grain of the fact table
- Too fine gt Oversized database
- Too large gt Loss of meaningful information
- Choose the dimensions
- Choose the measured facts
- (usually numeric, additive quantities)
- Complete the dimension tables
- Kimball (1996)
58Extra steps in the design process
- Determine strategy for slowly changing dimensions
- Create aggregations and other physical storage
components - Determine the historical duration of the database
- Determine the urgency with which the data is to
be extracted and loaded into the data warehouse. - Kimball (1996)
59An Example Retail Trading
- A large grocery store with approx. 500 stores
- Each store has approx. 60,000 products on shelves
- Need to maximise profit and keep shelves stocked
- Important decisions concern pricing and promotion
- Promotion types are
- Temporary price reductions
- Newspaper advertisements
- Shelf and end-aisle displays
- Coupons
60An Example Retail Trading
- Choose a Business Process
- Daily Item Movement
- Choose the grain of the fact table
- Stock Keeping Unit (SKU) by store by promotion by
day - Choose the Dimensions
- Time, product, store and promotion
61An Example Retail Trading
Retail Trading Dimensions
Promotion Promotion key Other Promotion attributes
Sale Time key Product key Store Key Promotion
Key Facts to be detailed next
Product Product key Other product attributes
Store Store key Other Store Attributes
Time Time key Other Time Attributes
62An Example Retail Trading
- 4. Choose the measured facts
Promotion Promotion key Other Promotion attributes
Sale Time key Product key Store Key Promotion
Key Dollar Sales Unit Sales Dollar Costs Customer
Count
Product Product key Other product attributes
Store Store key Other Store Attributes
Time Time key Other Time Attributes
63An Example Retail Trading
- 5. Complete the dimension tables
Promotion Promotion key Other Promotion attributes
Product Product key SKU Description SKU
Number Package Size Brand Sub Category Department
Package Type Diet Type Weight Weight unit of
measure Units per retail case Units per ship
case Cases per pallet
Sale Time key Product key Store Key Promotion
Key Dollar Sales Unit Sales Dollar Costs Customer
Count
Store Store key Other Store Attributes
Time Time key Other Time Attributes
64References
- Inmon, W. H. (1996) Building the Data Warehouse
(2nd ed), Wiley, NY. - Kimball, R. (1996) The Data Warehouse Toolkit,
Wiley, NY. - McFadden, F., Hoffer, J. and Prescott, M. (1999)
Modern Database Management, Addison-Wesley.
65- Questions?
- Jocelyn.sanpedro_at_sims.monash.edu.au
- School of Information Management and Systems,
Monash University - T1.28, T Block, Caulfield Campus
- 9903 2735