Title: Understanding Data Quality Issues
1Understanding Data Quality Issues
- Finding Data Inaccuracies
Art DeMaio, VP Technical Sales Support, Evoke Software
2Agenda
- Why is Understanding Data Important?
- Methodology for Assessing Data
  - Defining
  - Weighting
  - Profiling
  - Revisiting
  - Finding
  - Addressing
  - Maintaining
- What is Profiling?
- Benefits of the Assessment
3What the Experts say
- "Information quality is not an esoteric notion; it
directly affects the effectiveness and efficiency
of business processes. Information quality also
plays a major role in customer satisfaction."
- Larry P. English
4What the Experts say
- "Poor data quality is costly. It lowers customer
satisfaction, adds expense, and makes it more
difficult to run a business and pursue tactical
improvements such as data warehouses and
re-engineering."
- Thomas C. Redman
5What's in Your DATA?
- "three-quarters (of participating companies)
reported significant problems as a result of
defective data, with a third failing to bill or
collect receivables as a result."
- In a PricewaterhouseCoopers survey of 600 CIOs,
IT directors or similar executives
6What is Data Quality?
- Accuracy of Content
- Structure
- Completeness
- Timeliness
- Presentation
7Assessing Your Data
(Diagram: assessment cycle around the Source Data)
1-Define Issues
2-Weight /Impact
3-Profile Data
4-Revisit Definitions, Weights
5-Findings
6-Address
7-Maintain
8Defining Issues
(Diagram: Source Data, cycle step 1-Define Issues)
- Standard list
- Key requirements
- Content
- Structure
- Completeness
- Update list by project or source
9Defining Issues-sample
(Diagram: Source Data, cycle step 1-Define Issues)
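The idea of a standard issue list that is updated by project or source can be sketched in Python as follows. This is only an illustration: the `Issue` class, the three sample entries and the `issues_for_project` helper are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    name: str      # short label for the issue
    category: str  # "content", "structure" or "completeness"


# Hypothetical standard list; the entries are illustrative only.
STANDARD_ISSUES = [
    Issue("blank required field", "completeness"),
    Issue("value outside documented range", "content"),
    Issue("child row without parent row", "structure"),
]


def issues_for_project(extra=(), dropped=()):
    """Start from the standard list, then add or drop issues for one project or source."""
    kept = [issue for issue in STANDARD_ISSUES if issue.name not in dropped]
    return kept + list(extra)


# Example: a project that ignores one standard issue and adds one of its own.
project_issues = issues_for_project(
    extra=[Issue("duplicate customer record", "content")],
    dropped=["child row without parent row"],
)
for issue in project_issues:
    print(f"{issue.category:13} {issue.name}")
```

Keeping the standard list separate from the per-project list means each project starts from the same baseline and records only its own additions and removals.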
10Weight Impact
- After the issues are initially identified
- Some issues are more critical than others
- Weights are not priorities
- Assign a weighting factor (1-5)
- Weighting factors SHOULD change by project
(Diagram: Source Data, cycle steps 1-Define Issues and 2-Weight /Impact)
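A minimal Python sketch of the weighting step, assuming each issue gets an integer factor from 1 to 5 and that unlisted issues fall back to a middle value. The `assign_weights` function, the default of 3 and the issue names are assumptions made for illustration, not part of the presentation.

```python
def assign_weights(issue_names, project_weights, default=3):
    """Return a weight (1-5) for every issue; unlisted issues get the assumed default."""
    weights = {}
    for name in issue_names:
        weight = project_weights.get(name, default)
        if weight not in range(1, 6):
            raise ValueError(f"weight for {name!r} must be an integer from 1 to 5")
        weights[name] = weight
    return weights


issues = [
    "blank SUPPLIER_ID",
    "UNIT_PRICE out of range",
    "ORDER_HEADER without ORDER_DETAIL",
]

# The same issues weighted for two hypothetical projects: weights describe
# impact on that project, not the order in which issues will be fixed.
warehouse = assign_weights(issues, {"blank SUPPLIER_ID": 5, "UNIT_PRICE out of range": 2})
billing = assign_weights(issues, {"UNIT_PRICE out of range": 5})

print(warehouse)
print(billing)
```

The two result dictionaries differ for the same issue list, which is the point of the bullet that weighting factors should change by project.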
11Profile Data
(Diagram: Source Data, cycle steps 1-Define Issues through 3-Profile Data)
- What does Data Profiling mean?
12What is Data Profiling?
The use of analytical techniques on data for the
purpose of developing a thorough knowledge of
its content, structure and quality. A process
of developing information about data instead of
information from data.
13What is Data Profiling?
Information About Data (Data Profiling)
- 30% of entries in SUPPLIER_ID are blank
- the range of values in UNIT_PRICE is 5.99 to 4599.99
- there are 14 ORDER_HEADER rows with no ORDER_DETAIL rows
Information FROM Data (not Data Profiling)
- Texas auto buyers buy more Cadillacs per capita than any other state
- The average mortgage amount increased last year by 6%
- 10% of last year's customers did not buy anything this year
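The three kinds of "information about data" on this slide (blank rate, value range, parent rows without child rows) can each be produced by a simple query. The sketch below uses Python's built-in sqlite3 module on a tiny in-memory database. The `product` table name, the column layout and the sample rows are assumptions made for the example; only the names SUPPLIER_ID, UNIT_PRICE, ORDER_HEADER and ORDER_DETAIL appear on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (supplier_id TEXT, unit_price REAL);
    CREATE TABLE order_header (order_id INTEGER);
    CREATE TABLE order_detail (order_id INTEGER, line_no INTEGER);
    INSERT INTO product VALUES ('S1', 5.99), ('', 19.50), (NULL, 4599.99), ('S2', 42.00);
    INSERT INTO order_header VALUES (1), (2), (3);
    INSERT INTO order_detail VALUES (1, 1), (1, 2);
""")

# Percentage of SUPPLIER_ID entries that are NULL or empty.
total, blank = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN supplier_id IS NULL OR TRIM(supplier_id) = '' THEN 1 ELSE 0 END)
    FROM product
""").fetchone()
blank_percent = 100.0 * blank / total

# Range of values in UNIT_PRICE.
price_range = conn.execute(
    "SELECT MIN(unit_price), MAX(unit_price) FROM product"
).fetchone()

# ORDER_HEADER rows that have no ORDER_DETAIL rows.
orphan_headers = conn.execute("""
    SELECT COUNT(*) FROM order_header h
    WHERE NOT EXISTS (SELECT 1 FROM order_detail d WHERE d.order_id = h.order_id)
""").fetchone()[0]

print(f"{blank_percent:.0f}% of entries in SUPPLIER_ID are blank")
print(f"the range of values in UNIT_PRICE is {price_range[0]} to {price_range[1]}")
print(f"there are {orphan_headers} ORDER_HEADER rows with no ORDER_DETAIL rows")
```

Each query describes the data itself (how full, how wide, how consistent) rather than answering a business question, which is the distinction the slide draws.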
14Profile Data
(Diagram: Source Data, cycle steps 1-Define Issues through 3-Profile Data)
- This is a multi-step process
- Collect documentation
- Review the DATA itself
- Compare data to documentation
- Identify and detail specific issues
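The "compare data to documentation" step can be sketched as checking each row against what the documentation claims about a column. The rule format, the `DOCUMENTED` values and the `compare_to_documentation` helper below are hypothetical and are meant only to show the shape of the comparison.

```python
# Hypothetical documented rules; real documentation would supply these.
DOCUMENTED = {
    "supplier_id": {"required": True},
    "unit_price": {"required": True, "min": 0.01, "max": 1000.00},
}


def compare_to_documentation(rows, documented):
    """Return (row number, column, problem) for every value that contradicts the documentation."""
    findings = []
    for row_no, row in enumerate(rows, start=1):
        for column, rule in documented.items():
            value = row.get(column)
            if value is None or value == "":
                if rule.get("required"):
                    findings.append((row_no, column, "blank but documented as required"))
                continue
            if "min" in rule and value < rule["min"]:
                findings.append((row_no, column, "below documented minimum"))
            if "max" in rule and value > rule["max"]:
                findings.append((row_no, column, "above documented maximum"))
    return findings


rows = [
    {"supplier_id": "S1", "unit_price": 5.99},
    {"supplier_id": "", "unit_price": 4599.99},
]
findings = compare_to_documentation(rows, DOCUMENTED)
for finding in findings:
    print(finding)
```

Recording the row number and column with each problem gives the detail needed for the last bullet, identifying and detailing specific issues.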
15Revisit
(Diagram: Source Data, cycle steps 1-Define Issues through 4-Revisit Definitions, Weights)
- Review the issues and weights
- Should there be more or fewer issues?
- What are they?
- Is the relative importance of each issue
different?
16Findings
(Diagram: Source Data, cycle steps 1-Define Issues through 5-Findings)
- Your findings tell others about the data
- Documented reports and/or charts
- Results database
- Quality Assessment Score
17Findings-Chart
18Findings-Chart
19Findings-Chart
20Findings-Chart
Weighted Issue Rate - 23.8%
Weighted Assessment Score - 76.2%
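The presentation does not give the formula behind these two figures. One plausible reading, shown here only as an assumption, is that the issue rate is a weight-averaged percentage of records affected by each issue, and that the assessment score is 100 minus that rate (the two figures on the slide do sum to 100). The sample weights and rates below are invented so that the result matches the slide; they are not the presenter's data.

```python
def weighted_issue_rate(issues):
    """issues: list of (weight, issue_rate_percent) pairs; returns the weight-averaged rate."""
    total_weight = sum(weight for weight, _ in issues)
    if total_weight == 0:
        raise ValueError("at least one issue needs a non-zero weight")
    return sum(weight * rate for weight, rate in issues) / total_weight


def assessment_score(issues):
    """Assumed definition: the score is whatever is left after the weighted issue rate."""
    return 100.0 - weighted_issue_rate(issues)


# Invented sample: (weight 1-5, percent of records showing the issue).
sample = [(5, 30.0), (3, 20.0), (2, 14.0)]

rate = weighted_issue_rate(sample)
score = assessment_score(sample)
print(f"Weighted Issue Rate: {rate:.1f}")
print(f"Weighted Assessment Score: {score:.1f}")
```

Under this assumed formula, a heavily weighted issue moves the score more than a lightly weighted one with the same raw rate, which is why the weights need to be set before profiling results are summarized.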
21Address the Issues
(Diagram: Source Data, cycle steps 1-Define Issues through 6-Address)
- Addressing your findings
- Actual vs. Potential
- Subject Matter Expertise
- Cleansing Requirements
22Maintain Vigilance
(Diagram: Source Data, full cycle, steps 1-Define Issues through 7-Maintain)
- Maintain
- Complete the cycle
- Periodic review
- Document score changes
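Documenting score changes across periodic reviews can be as simple as keeping dated scores and reporting the difference between consecutive reviews. The sketch below is illustrative: the `score_changes` helper, the review dates and the scores are all hypothetical.

```python
from datetime import date


def score_changes(history):
    """history: list of (review_date, score); returns (review_date, change since previous review)."""
    ordered = sorted(history)
    return [
        (later_day, round(later_score - earlier_score, 1))
        for (_, earlier_score), (later_day, later_score) in zip(ordered, ordered[1:])
    ]


# Hypothetical quarterly reviews of one source.
history = [
    (date(2003, 1, 15), 76.2),
    (date(2003, 4, 15), 81.0),
    (date(2003, 7, 15), 79.5),
]

changes = score_changes(history)
for review_day, change in changes:
    print(f"{review_day}: {change:+.1f}")
```

A falling score between reviews is a signal to go back through the cycle for that source rather than wait for the next project.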
23Why Do The Assessment?
- Quantify the quality issues
- Isolate true problems
- Proactive review
  - reduces the cost of resolving issues
  - reduces the risk of customer dissatisfaction
- Define the scope of issues
- Determine the resources required to address issues
24Why Do The Assessment?
(Chart labels: Project Costs; Cost to Address an Issue; Project Timeline; When you find an Issue)
25Why Should It Be Done?
Pay me now or pay me later
(Chart axis: TIME)
26When Should It Be Done?
- Every IT data project
  - Warehousing
  - CRM
  - ERP
  - EAI
  - M&A
- Ongoing based on
  - Criticality of the system
  - Current status (score)
  - Need to re-purpose data
28Bibliography
- Larry P. English, Improving Data Warehouse and Business Information Quality, John Wiley & Sons Inc., 1999
- Jack Olson, Data Quality: The Accuracy Dimension, Morgan Kaufmann, 2002
- Thomas C. Redman, Data Quality for the Information Age, Artech House, 1996
- PricewaterhouseCoopers, Global Data Management Survey, 2001