Title: 615644 Data Warehousing
1. 615-644 Data Warehousing
Week 6: Measuring and Improving Data Quality
Larry English (1999)
2. Larry English's Approach to Data Quality
- Provides a definition of data quality
- Defines the TQdM (Total Quality data Management) approach for assessing and improving data quality
- Provides detailed guidelines for organisations, including process, product, tools, environment and accountability
- Widely used in practice
3. Overview of TQdM
- Is a mindset, belief system, value system and culture about information quality and customer satisfaction
- Is a continual information process improvement methodology
- Consists of 6 interconnected processes
4. TQdM Methodology Overview
- P1 Assess Data Definition and Information Architecture Quality (output: Data Definition Quality Assessment)
- P2 Assess Information Quality (output: Information Quality Assessment)
- P3 Measure Non-quality Information Costs (output: Information Value/Costs)
- P4 Re-engineer and Cleanse Data (output: Cleansed Data)
- P5 Improve Information Process Quality (output: Information Quality Improvements)
- P6 Establish the Information Quality Environment
English (1999) Fig 4.1 p70
5. TQdM P6 Establish the Information Quality Environment
[Figure: TQdM methodology overview, English (1999) Fig 4.1 p70]
6. TQdM P6 Establish the Information Quality Environment
- Represents the systemic, management and cultural requirements for a sustainable information quality improvement environment
- Understand information value chains
- Find out what information customers expect
- Education in information quality
- Change performance measures and incentives
7. TQdM P1 Assess Data Definition and Information Architecture Quality
[Figure: TQdM methodology overview, English (1999) Fig 4.1 p70]
8. TQdM P1 Assess Data Definition and Information Architecture Quality
TQdM Methodology Process P1: Assess Data Definition and Information Architecture Quality
Begin
- S1.1 Identify Data Definition Quality Measures
- S1.2 Identify Information Group to Assess (output: Assessment Information Group, to S2.1)
- S1.3 Identify Information Stakeholders
- S1.4 Assess Data Definition Technical Quality (output: Technical Data Definition Quality Assessment, to S3.1, S4.1, S5.1)
- S1.5 Assess Information Architecture and Database Design Quality (output: Information Architecture and Database Assessment, to S3.1, S4.1, S5.1)
- S1.6 Assess Customer Satisfaction with Data Definition (output: Data Definition Customer Satisfaction Assessment, to S3.1, S4.1, S5.1)
English (1999) Fig 4.2 p73
9. TQdM P1 Assess Data Definition and Information Architecture Quality
- The quality of a product (data) cannot be measured without knowing that the product specifications are themselves accurate
- Need both
  - Technical measures
  - Customer satisfaction measures
10. TQdM P1 Assess Data Definition and Information Architecture Quality
- Issues
  - How do we assess data definition technical quality?
  - Which stakeholders should be involved?
  - How do we assess customer satisfaction with data definitions?
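One possible starting point for the first issue is a completeness check on the data dictionary itself. The sketch below is an assumption, not English's own procedure: the required attributes (name, definition, data_type, valid_values) and the sample entries are hypothetical, and a real assessment would use whatever measures were chosen in S1.1.

```python
# Hypothetical attributes that every data definition is expected to carry.
REQUIRED_ATTRIBUTES = ("name", "definition", "data_type", "valid_values")

# Illustrative data dictionary entries; not taken from English (1999).
DATA_DICTIONARY = [
    {"name": "customer_id", "definition": "Unique customer key",
     "data_type": "integer", "valid_values": "positive integers"},
    {"name": "cust_stat", "definition": "",
     "data_type": "char(1)", "valid_values": ""},
]


def missing_attributes(entry):
    """Return the required attributes that are absent or blank in one entry."""
    return [attr for attr in REQUIRED_ATTRIBUTES
            if not str(entry.get(attr, "")).strip()]


def definition_completeness(entries):
    """Share of entries whose definition carries every required attribute."""
    if not entries:
        raise ValueError("no data definitions to assess")
    complete = sum(1 for entry in entries if not missing_attributes(entry))
    return complete / len(entries)


gaps = {entry["name"]: missing_attributes(entry) for entry in DATA_DICTIONARY}
score = definition_completeness(DATA_DICTIONARY)
print(gaps)
print(score)
```

A check like this covers only the technical side of the assessment. Customer satisfaction with the definitions (S1.6) still has to be gathered from the stakeholders identified in S1.3.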
11. TQdM P2 Assess Information Quality
[Figure: TQdM methodology overview, English (1999) Fig 4.1 p70]
12. TQdM P2 Assess Information Quality
TQdM Methodology Process P2: Assess Information Quality
Input from: S1.2
- S2.1 Identify Information Group for Assessment
- S2.2 Identify IQ Objectives and Measures
- S2.3 Identify Information Value and Cost Chain (output: Information Value and Cost Chain Diagram, to S3.1, S4.1, S5.1)
- S2.4 Determine Files or Processes to Assess
- S2.5 Identify Data Validation Sources
- S2.6 Extract Random Sample of Data
- S2.7 Measure Information Quality
- S2.8 Interpret and Report Information Quality (output: IQ Reports, to S3.1, S4.1, S5.1)
English (1999) Fig 4.3 p75
13. TQdM P2 Assess Information Quality
- Similar to measuring manufacturing product quality
- Needs both
  - Technical measures of conformance to specification
  - Measures of customer satisfaction against customer expectations
14. TQdM P2 Assess Information Quality
- Issues
  - Which data quality measures do we use?
    - Inherent and pragmatic measures
  - How do we identify all processes that impact data quality?
  - How do we interpret and report information quality?
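Steps S2.6 (extract a random sample) and S2.7 (measure information quality) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the customer records, the field names and the four-digit postcode rule are hypothetical, and completeness and validity stand in for whichever measures were agreed in S2.2.

```python
import random

# Hypothetical customer records; fields and values are illustrative only.
RECORDS = [
    {"id": 1, "name": "Ann Lee", "postcode": "3010"},
    {"id": 2, "name": "", "postcode": "3052"},
    {"id": 3, "name": "Raj Patel", "postcode": "30X1"},
    {"id": 4, "name": "Mei Chen", "postcode": "3000"},
]


def is_complete(record):
    """A record is complete when every field holds a non-blank value."""
    return all(str(value).strip() for value in record.values())


def is_valid(record):
    """Assumed business rule: a postcode is exactly four digits."""
    postcode = record["postcode"]
    return len(postcode) == 4 and postcode.isdigit()


def measure_quality(records, sample_size, seed=0):
    """Draw a random sample (S2.6) and measure the share passing each check (S2.7)."""
    if not records or sample_size <= 0:
        raise ValueError("need at least one record and a positive sample size")
    rng = random.Random(seed)  # fixed seed so the assessment is repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    n = len(sample)
    return {
        "sample_size": n,
        "completeness": sum(is_complete(r) for r in sample) / n,
        "validity": sum(is_valid(r) for r in sample) / n,
    }


report = measure_quality(RECORDS, sample_size=4)
print(report)
```

Checks of this kind are technical measures of conformance to a specification. The customer satisfaction measures called for above need a separate survey of information customers.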
15. TQdM P3 Measure Non-quality Information Costs
[Figure: TQdM methodology overview, English (1999) Fig 4.1 p70]
16. TQdM P3 Measure Non-quality Information Costs
TQdM Methodology Process P3: Measure Non-quality Information Costs
Inputs from: S1.4, S1.5, S1.6; S2.3, S2.8
- S3.1 Identify Business Performance Measures
- S3.2 Calculate Information Costs
- S3.3 Calculate Non-quality Information Costs
- S3.4 Identify Customer Segments
- S3.5 Calculate Customer Lifetime Value
- S3.6 Calculate Information Value
Data flows: Information Value and Cost Chain Diagram (S3.1, S4.1, S5.1)
English (1999) Fig 4.4 p76
17. TQdM P3 Measure Non-quality Information Costs
- The costs of poor quality information can be precisely measured
  - In terms of reduced profit and revenue
  - Against formal and informal business drivers
- This establishes the business case for information management and information quality improvement
18. TQdM P3 Measure Non-quality Information Costs
- Issues
  - How do we assign a value to information?
    - Value = Benefit - Cost (English (1999) p200)
  - How do we determine the business impact of data quality problems?
    - Benefits and costs
  - How do we decide that data is no longer required and may be disposed of?
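The relation Value = Benefit - Cost can be worked through with a small example. The sketch below is illustrative only: the cost model (rework per defective record plus lost revenue) and every figure in it are assumptions made for the example, not values or formulas from English (1999).

```python
def non_quality_cost(defect_count, rework_cost_per_defect, lost_revenue):
    """Assumed cost model: rework on each defective record plus lost revenue."""
    return defect_count * rework_cost_per_defect + lost_revenue


def information_value(benefit, cost):
    """Value = Benefit - Cost, as stated on the slide."""
    return benefit - cost


# Hypothetical annual figures in whole dollars.
benefit = 500_000               # revenue attributed to using the information
base_information_cost = 120_000  # cost of capturing and maintaining the data
poor_quality_cost = non_quality_cost(
    defect_count=2_000, rework_cost_per_defect=15, lost_revenue=45_000
)

value_with_defects = information_value(
    benefit, base_information_cost + poor_quality_cost
)
value_without_defects = information_value(benefit, base_information_cost)

print(poor_quality_cost)       # 75000
print(value_with_defects)      # 305000
print(value_without_defects)   # 380000
```

The gap between the two values is the non-quality cost, which is the figure a business case for quality improvement would rest on.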
19. TQdM P4 Re-engineer and Cleanse Data
[Figure: TQdM methodology overview, English (1999) Fig 4.1 p70]
20. TQdM P4 Re-engineer and Cleanse Data
TQdM Methodology Process P4: Re-engineer and Cleanse Data
Inputs from: S1.4, S1.5, S1.6; S2.3, S2.8; S3.6
- S4.1 Identify Data Sources
- S4.2 Extract and Analyse Source Data
- S4.3 Standardise Data
- S4.4 Correct and Complete Data
- S4.5 Match and Consolidate Data (output: Cleansed Data, to S4.7)
- S4.6 Identify IQ Objectives and Measures
Data flows: Source Data; Data Defect Type List (to S5.1)
English (1999) Fig 4.5 p78
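Steps S4.3 (standardise) and S4.5 (match and consolidate) might look roughly like this in practice. The sketch is a simplified assumption, not English's method: the record layout is hypothetical, matching is done on an exact email key only, and real cleansing tools use far richer matching rules.

```python
def standardise(record):
    """S4.3: collapse whitespace, normalise case, keep only digits in phone numbers."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
    }


def consolidate(records):
    """S4.5: match standardised records on email and merge them,
    filling blank fields from later duplicates."""
    merged = {}
    for record in map(standardise, records):
        current = merged.setdefault(record["email"], dict(record))
        for field, value in record.items():
            if not current[field] and value:
                current[field] = value
    return list(merged.values())


# Hypothetical source data containing one duplicate customer.
raw = [
    {"name": "  ann   LEE ", "email": "Ann.Lee@Example.com ", "phone": ""},
    {"name": "Ann Lee", "email": "ann.lee@example.com", "phone": "(03) 9555-0101"},
    {"name": "raj patel", "email": "raj@example.com", "phone": "03 9555 0102"},
]

cleansed = consolidate(raw)
for row in cleansed:
    print(row)
```

Standardising before matching is what lets the two Ann Lee records be recognised as the same customer. Records that automatic rules cannot resolve would go to human correction.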
21. TQdM P4 Re-engineer and Cleanse Data
TQdM Methodology Process P4 (Continued): Condition Data for Data Warehouse
Input from: S4.5
- S4.7 Transform and Enhance Data into Target
- S4.8 Calculate Derivations and Summary Data
- S4.9 Audit and Control Data Extract, Transformation and Loading
Data flows: Target Databases; Data Warehouse
English (1999) Fig 4.6 p79
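For S4.9, a common form of audit is to reconcile row counts and a control total between the source extract and the loaded target. The sketch below assumes that approach; the slide does not say which controls English prescribes, and the function name, the amount_cents field and the sample rows are hypothetical.

```python
def audit_load(source_rows, target_rows, amount_field="amount_cents"):
    """Compare row counts and a summed control total between source and target."""
    source_total = sum(row[amount_field] for row in source_rows)
    target_total = sum(row[amount_field] for row in target_rows)
    return {
        "source_rows": len(source_rows),
        "target_rows": len(target_rows),
        "row_count_matches": len(source_rows) == len(target_rows),
        "control_total_matches": source_total == target_total,
        "total_difference": source_total - target_total,
    }


# Hypothetical extract and load results; amounts are integer cents so the
# control totals compare exactly.
source = [
    {"order_id": 1, "amount_cents": 1999},
    {"order_id": 2, "amount_cents": 4500},
    {"order_id": 3, "amount_cents": 1250},
]
target = [
    {"order_id": 1, "amount_cents": 1999},
    {"order_id": 2, "amount_cents": 4500},
]

audit = audit_load(source, target)
print(audit)
```

Here the audit flags that one row worth 1250 cents was lost between extract and load, which is the kind of defect the control step is meant to catch before the data reaches the warehouse.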
22. TQdM P4 Re-engineer and Cleanse Data
- The process for information product improvement
- A process of information scrap and rework (similar to manufacturing scrap and rework)
- Brings the data up to an acceptable level of quality
  - Using both automatic and human data correction
23. TQdM P4 Re-engineer and Cleanse Data
- Issues
  - How much will it cost to fix data quality problems?
  - How do we prioritise data cleansing projects?
  - Do we also fix data quality problems in source databases or just in the data warehouse?
24. TQdM P5 Improve Information Process Quality
[Figure: TQdM methodology overview, English (1999) Fig 4.1 p70]
25. TQdM P5 Improve Information Process Quality
TQdM Methodology Process P5: Improve Information Process Quality
Inputs from: S1.4, S1.5, S1.6; S2.3, S2.8; S3.6; S4.6
- S5.1 Select Process for Information Quality Improvement
- S5.2 Develop Plan for Information Quality Improvement
- S5.3 Implement Information Quality Improvements
- S5.4 Check Impact of Information Quality Improvements
- S5.5 Act to Standardise Information Quality Improvements
Output: Information Quality Improvements
English (1999) Fig 4.7 p81
26. TQdM P5 Improve Information Process Quality
- Takes known data quality problems, analyses underlying causes, and plans and implements process improvements that prevent data defects
- May be
  - Formal or informal
  - For an individual or across a supply chain
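Checking the impact of a process improvement (S5.4) before standardising it (S5.5) can be as simple as comparing defect rates measured before and after the change. The sketch below is a hypothetical illustration: the defect counts, the 5% target rate and the decision rule are assumptions made for the example.

```python
def defect_rate(defective, total):
    """Share of sampled records that contain a defect."""
    if total <= 0:
        raise ValueError("total must be positive")
    return defective / total


def check_impact(before, after, target_rate):
    """Compare defect rates before and after an improvement (S5.4) and say
    whether the change looks ready to standardise (S5.5)."""
    rate_before = defect_rate(*before)
    rate_after = defect_rate(*after)
    return {
        "rate_before": rate_before,
        "rate_after": rate_after,
        "improved": rate_after < rate_before,
        "standardise": rate_after <= target_rate,
    }


# Hypothetical samples: (defective records, records sampled).
result = check_impact(before=(120, 1000), after=(30, 1000), target_rate=0.05)
print(result)
```

If the rate after the change is still above the target, the improvement plan from S5.2 would be revised rather than rolled out more widely.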
27. TQdM P5 Improve Information Process Quality
- How proactive can we be in eliminating data quality problems at their source?
- Data quality problems are much cheaper to fix at their source than to clean up after the problems have occurred
- What training is required to address data quality problems at their source?
28. Conclusion
- An overview of the TQdM (Total Quality data Management) approach for assessing and improving data quality
- Provides organisations with a process for assessing and improving data quality