Understanding Data Quality Issues: Finding Data Inaccuracies

Slides: 29
Provided by: ADem4
Learn more at: https://dama-ncr.org
Transcript and Presenter's Notes

1
Understanding Data Quality Issues
  • Finding Data Inaccuracies

Art DeMaio, Evoke Software, VP Technical Sales Support
2
Agenda
  • Why is Understanding Data Important
  • Methodology for Assessing Data
    • Defining
    • Weighting
    • Profiling
    • Revisiting
    • Finding
    • Addressing
    • Maintaining
  • What is Profiling
  • Benefits of the Assessment

3
What the Experts say
  • Information quality is not an esoteric notion; it
    directly affects the effectiveness and efficiency
    of business processes. Information quality also
    plays a major role in customer satisfaction.
  • - Larry P. English

4
What the Experts say
  • Poor data quality is costly. It lowers customer
    satisfaction, adds expense, and makes it more
    difficult to run a business and pursue tactical
    improvements such as data warehouses and
    re-engineering.
  • - Thomas C. Redman

5
What's in Your DATA?
  • three-quarters (of participating companies)
    reported significant problems as a result of
    defective data, with a third failing to bill or
    collect receivables as a result.
  • - In a PricewaterhouseCoopers survey of 600 CIOs,
    IT directors or similar executives

6
What is Data Quality?
  • Accuracy of Content
  • Structure
  • Completeness
  • Timeliness
  • Presentation

7
Assessing Your Data
(Cycle diagram around Source Data)
  • 1-Define Issues
  • 2-Weight/Impact
  • 3-Profile Data
  • 4-Revisit Definitions, Weights
  • 5-Findings
  • 6-Address
  • 7-Maintain

8
Defining Issues
  • Standard list
  • Key requirements
  • Content
  • Structure
  • Completeness
  • Update list by project or source

9
Defining Issues-sample
10
Weight Impact
  • After the issues are initially identified
  • Some issues are more critical than others
  • Weights are not priorities
  • Assign a weighting factor (1-5)
  • Weighting factors SHOULD change by project

11
Profile Data
  • What does Data Profiling mean?

12
What is Data Profiling?
The use of analytical techniques on data for the
purpose of developing a thorough knowledge of
its content, structure and quality. A process
of developing information about data instead of
information from data.
13
What is Data Profiling?
Information About Data (Data Profiling)
  • 30% of entries in SUPPLIER_ID are blank
  • the range of values in UNIT_PRICE is 5.99 to 4599.99
  • there are 14 ORDER_HEADER rows with no
    ORDER_DETAIL rows
Information FROM Data (not Data Profiling)
  • Texas auto buyers buy more Cadillacs per capita
    than any other state
  • The average mortgage amount increased last year
    by 6%
  • 10% of last year's customers did not buy
    anything this year
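
The three "information about data" facts on this slide (blank rate, value range, parent rows without child rows) can each be computed with a few lines of code. The following is a minimal sketch in plain Python, not the method of any particular profiling tool. The column and table names come from the slide; the ORDER_ID key, the helper function names, and all sample rows are assumptions made up for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def is_blank(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def blank_rate(rows: List[Row], column: str) -> float:
    """Percentage of rows whose value in `column` is blank."""
    if not rows:
        return 0.0
    blanks = sum(1 for row in rows if is_blank(row.get(column)))
    return 100.0 * blanks / len(rows)


def value_range(rows: List[Row], column: str) -> Optional[Tuple[Any, Any]]:
    """(min, max) of the non-blank values in `column`, or None if there are none."""
    values = [row[column] for row in rows if not is_blank(row.get(column))]
    return (min(values), max(values)) if values else None


def parents_without_children(
    parents: List[Row], children: List[Row], key: str
) -> List[Any]:
    """Key values of parent rows that have no matching child row."""
    child_keys = {child[key] for child in children}
    return [parent[key] for parent in parents if parent[key] not in child_keys]


# Made-up sample rows, for illustration only (not data from the presentation).
products = [
    {"SUPPLIER_ID": "S1", "UNIT_PRICE": 5.99},
    {"SUPPLIER_ID": "", "UNIT_PRICE": 4599.99},
    {"SUPPLIER_ID": None, "UNIT_PRICE": 120.00},
    {"SUPPLIER_ID": "S2", "UNIT_PRICE": 42.50},
]
order_header = [{"ORDER_ID": 1}, {"ORDER_ID": 2}, {"ORDER_ID": 3}]
order_detail = [
    {"ORDER_ID": 1, "LINE": 1},
    {"ORDER_ID": 1, "LINE": 2},
    {"ORDER_ID": 3, "LINE": 1},
]

print("Blank SUPPLIER_ID (%):", blank_rate(products, "SUPPLIER_ID"))
print("UNIT_PRICE range:", value_range(products, "UNIT_PRICE"))
print(
    "ORDER_HEADER rows with no ORDER_DETAIL rows:",
    parents_without_children(order_header, order_detail, "ORDER_ID"),
)
```

Each function reports something about the data itself (how complete, how wide, how consistent), which is the distinction the slide draws from business facts derived from the data.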
14
Profile Data
  • This is a multi-step process
  • Collect documentation
  • Review the DATA itself
  • Compare data to documentation
  • Identify and detail specific issues
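
The "compare data to documentation" step can be sketched as a check of each row against the documented rules, producing a list of specific issues. This is only an illustration under stated assumptions: the `DOCUMENTED` rule format, the `find_issues` function, and the sample rows are hypothetical and do not come from the presentation or from any profiling product.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Issue = Tuple[int, str, str]  # (row index, column, description)

# Hypothetical documented rules: expected type, whether a value is required,
# and an optional documented minimum.
DOCUMENTED: Dict[str, Dict[str, Any]] = {
    "SUPPLIER_ID": {"type": str, "required": True},
    "UNIT_PRICE": {"type": float, "required": True, "min": 0.0},
}


def find_issues(rows: List[Row], documented: Dict[str, Dict[str, Any]]) -> List[Issue]:
    """Compare each row to the documented rules and list every mismatch."""
    issues: List[Issue] = []
    for index, row in enumerate(rows):
        for column, rule in documented.items():
            value = row.get(column)
            if value is None or value == "":
                if rule.get("required"):
                    issues.append((index, column, "blank but documented as required"))
                continue
            if not isinstance(value, rule["type"]):
                issues.append((
                    index,
                    column,
                    f"expected {rule['type'].__name__}, found {type(value).__name__}",
                ))
                continue
            if "min" in rule and value < rule["min"]:
                issues.append((index, column, f"below documented minimum {rule['min']}"))
    return issues


# Made-up sample rows, for illustration only.
sample_rows = [
    {"SUPPLIER_ID": "S1", "UNIT_PRICE": 5.99},
    {"SUPPLIER_ID": "", "UNIT_PRICE": "12.50"},
    {"SUPPLIER_ID": "S3", "UNIT_PRICE": -1.0},
]

for issue in find_issues(sample_rows, DOCUMENTED):
    print(issue)
```

Keeping the row index and column with each description is one way to "identify and detail" issues so they can be counted per issue type later.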

15
Revisit
  • Review the issues and weights
  • Should there be more or fewer issues?
  • What are they?
  • Is the relative importance of each issue
    different?

16
Findings
  • Your findings tell others about the data
  • Documented reports and/or charts
  • Results database
  • Quality Assessment Score

17
Findings-Chart
18
Findings-Chart
19
Findings-Chart
20
Findings-Chart
Weighted Issue Rate - 23.8
Weighted Assessment Score - 76.2
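
The slides do not state how the two figures are calculated; they only show that the issue rate and the assessment score add up to 100. One plausible reading, offered here as an assumption, is a weight-averaged issue rate using the 1-5 weighting factors, with the score taken as 100 minus that rate. The function names and the sample issues below are hypothetical and are not meant to reproduce the 23.8 and 76.2 on the chart.

```python
from typing import List, Tuple

# (weighting factor 1-5, percentage of records affected by the issue)
WeightedIssue = Tuple[int, float]


def weighted_issue_rate(issues: List[WeightedIssue]) -> float:
    """Weight-averaged issue rate in percent (assumed formula, see lead-in)."""
    if not issues:
        raise ValueError("at least one issue is required")
    for weight, rate in issues:
        if not 1 <= weight <= 5:
            raise ValueError(f"weight {weight} is outside the 1-5 range")
        if not 0.0 <= rate <= 100.0:
            raise ValueError(f"issue rate {rate} is not a percentage")
    total_weight = sum(weight for weight, _ in issues)
    return sum(weight * rate for weight, rate in issues) / total_weight


def weighted_assessment_score(issues: List[WeightedIssue]) -> float:
    """Assessment score as 100 minus the weighted issue rate."""
    return 100.0 - weighted_issue_rate(issues)


# Made-up issues, for illustration only.
sample_issues = [
    (5, 30.0),  # e.g. blank supplier identifiers, weighted as most critical
    (3, 10.0),  # e.g. prices outside the documented range
    (2, 20.0),  # e.g. order headers without detail rows
]

print("Weighted issue rate:", weighted_issue_rate(sample_issues))
print("Weighted assessment score:", weighted_assessment_score(sample_issues))
```

Because the weights only scale each issue's contribution, changing them by project changes the score without changing the underlying profiling results.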
21
Address the Issues
  • Addressing your findings
  • Actual vs. Potential
  • Subject Matter Expertise
  • Cleansing Requirements

22
Maintain Vigilance
  • Maintain
  • Complete the cycle
  • Periodic review
  • Document score changes

23
Why Do The Assessment?
  • Quantify the quality issues
  • Isolate true problems
  • Proactive review
    • reduces the cost of resolving issues
    • reduces the risk of customer dissatisfaction
  • Define the scope of issues
  • Determine the resources required to address issues

24
Why Do The Assessment?
(Chart labels: Project Costs, Cost to Address an Issue, Project Timeline, When You Find an Issue)
25
Why Should It Be Done?
Pay me now or pay me later
(Chart axis: Time)
26
When Should It Be Done?
  • Every IT data project
    • Warehousing
    • CRM
    • ERP
    • EAI
    • M&A
  • Ongoing based on
    • Criticality of the system
    • Current status (score)
    • Need to re-purpose data

27
(No Transcript)
28
Bibliography
  • Larry P. English, Improving Data Warehouse and
    Business Information Quality, John Wiley & Sons
    Inc., 1999
  • Jack Olson, Data Profiling: The Accuracy
    Dimension, Morgan Kaufmann, 2002
  • Thomas C. Redman, Data Quality for the
    Information Age, Artech House, 1996
  • PricewaterhouseCoopers, Global Data Management
    Survey, 2001