Architecting Data Quality: Establishing and Maintaining Data Integrity - PowerPoint PPT Presentation



1
Architecting Data Quality: Establishing and
Maintaining Data Integrity
  • Presented by
  • Ralph Mohr, Director
  • Covansys
  • Tuesday, July 10, 2002
  • 1:00 PM

2
AGENDA
  • Introduction: Need for data quality
  • Basic definitions
  • Enterprise data quality program
  • Data Stewardship
  • Data quality dimensions
  • Model repositories

3
Introduction
  • Increased Attention to Data Quality
    • Competition
    • Reduce loss of productivity
    • Reduce loss of resources
    • Increase credibility internally and externally
  • Need to Cut Costs/Reduce Waste
  • Pragmatic Approach to Data Quality
    • Cost Effective
    • Based on Real-Life Experience
    • Support Business Goals and Objectives

4
Traditional Threats To Data Quality
  • Metadata Drift
    • Data requirements change over time, and these
      changes are not always reflected in the
      transformation rules.
  • Business Rule Changes
  • Process Errors In Data Entry Systems
  • Process Errors In Batch Systems
  • Domain Changes
  • Change Management Issues
  • Presentation Issues
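Metadata drift can be caught early by comparing the column layout a transformation expects with the layout a source extract actually delivers. The following is a minimal sketch, assuming each feed can be described as a mapping of column names to type names; the function name and the customer-feed layouts are hypothetical.

```python
from typing import Dict, List


def detect_metadata_drift(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Compare the column->type layout a transformation expects with what a source delivers."""
    findings: List[str] = []
    # Columns the transformation rules rely on that the source no longer sends.
    for col in sorted(expected.keys() - observed.keys()):
        findings.append(f"missing column: {col}")
    # New columns that no transformation rule handles yet.
    for col in sorted(observed.keys() - expected.keys()):
        findings.append(f"unexpected column: {col}")
    # Columns on both sides whose declared type has changed.
    for col in sorted(expected.keys() & observed.keys()):
        if expected[col] != observed[col]:
            findings.append(f"type change: {col} {expected[col]} -> {observed[col]}")
    return findings


if __name__ == "__main__":
    # Hypothetical layouts for a customer feed.
    expected = {"customer_id": "int", "region": "str", "balance": "decimal"}
    observed = {"customer_id": "int", "region_code": "str", "balance": "float"}
    for finding in detect_metadata_drift(expected, observed):
        print(finding)
```

Run against the sample layouts, it reports the renamed `region` column as both missing and unexpected, plus the `balance` type change; each finding points to a transformation rule that may need review.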

5
New Threats To Data Quality
  • Pace of Mergers and Acquisitions
  • Pressure for Faster Time to Market
  • Global Nature of Business
  • New Data Entry Sources
  • Increased 3rd Party Data Sources
  • ERP Systems
  • EDI
  • Internet Sources
  • Smart Sources

6
Basic Definitions
  • Data Quality
    • Conformance to Requirements
  • Data Quality Architecture
    • Unifying/Coherent Structure Used to Maintain Data
      Quality in the Data Warehouse
  • Data Quality Success
    • Maintaining a High Level of Data Quality in the
      Enterprise Over an Extended Period of Time

7
Basic Principles of Data Quality
  • Data quality is everyone's responsibility
  • The cost of resolving data quality problems is far
    less than the cost of bad data
  • Effective data quality programs determine and
    resolve the root causes that create poor quality
    data
  • Data quality and standardization are the critical
    success factors for decision support and
    transactional IT systems

8
Enterprise Data Quality Program
  • Enables an organization to develop a
    quality-centric environment
  • Makes data quality a shared value
  • Represents a cultural change
  • Ensures data quality efforts are business driven

9
Requirements
  • Education
  • Tools
  • Processes
  • Accountability
  • Enterprise view of data

10
The Organizational Data Quality Plan
  • Define and map the organization's information
    value chain
  • Identify Data Quality stakeholders
  • Identify data consumers
  • Identify executive sponsor
  • Develop a Data Quality Education Plan

11
The Organizational Data Quality Plan (cont.)
  • Develop Data Quality Improvement Plan
  • Data Quality Improvement team (Technical and
    Business members)
  • Develop Data Stewardship Council.
  • Develop feedback loop and procedures

12
Identify/Define Data Quality Infrastructure
  • Define data quality measurements and metrics
  • Select specific data subject area for review
  • Survey data consumers for quality satisfaction
  • Assess data definitions
  • Assess data architecture
  • Assess database architecture
  • Assess current processes, performance metrics and
    rewards

13
Data Quality Assessment
  • Develop sampling algorithm
  • Prioritize data elements for assessment
  • Perform analysis
  • Identify data anomalies
  • Root cause analysis for each anomaly
  • Report data quality issues
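The slide calls for a sampling algorithm but does not name one. One common choice when a table is too large to load at once is reservoir sampling (Algorithm R), which draws a uniform random sample of k records in a single pass. The following is a minimal sketch, with an optional seed so an assessment can be reproduced; `reservoir_sample` is an illustrative name, not part of any particular tool.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k records from an iterable, in one pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # seeded generator so the same sample can be redrawn
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Draw 5 row numbers out of a hypothetical one-million-row table.
    print(reservoir_sample(range(1_000_000), k=5, seed=42))
```

If the data elements were prioritized first, a separate reservoir per priority level (a stratified sample) would keep high-priority elements from being under-represented.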

14
Data Quality Cost Analysis
  • Measure data anomalies against business drivers
  • Determine costs of the current information
    capture, storage, distribution and maintenance

15
Data Quality Cost Analysis (cont.)
  • Costs Due to Data Quality Anomalies
    • Lost productivity
    • Scrappage
    • Rework
    • Lost revenue
    • Correction time
    • Fines and penalties

16
Data Quality Cost Analysis (cont.)
  • Validate findings with data consumers
  • Determine financial benefits from correcting
    anomalies
  • Develop alternative resolutions for the anomaly
  • Benefit/Risk/Mitigation analysis
  • Identify recommended resolution
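Once each cost category has been estimated, the financial part of this analysis is simple arithmetic. The following is a minimal sketch, assuming yearly cost estimates per category and, for each alternative resolution, a one-time cost, an annual running cost, and the share of the anomaly cost it eliminates; the class, field names, and figures are hypothetical. Risk and mitigation are not modeled and would still need to be weighed separately.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resolution:
    """One alternative way of resolving an anomaly (all figures are estimates)."""
    name: str
    one_time_cost: float     # design, build, and test effort
    annual_cost: float       # ongoing cost of operating the fix
    share_eliminated: float  # fraction (0..1) of the anomaly cost the fix removes


def annual_anomaly_cost(cost_by_category: Dict[str, float]) -> float:
    """Total yearly cost of the anomaly across categories such as rework or fines."""
    return sum(cost_by_category.values())


def net_benefit(res: Resolution, anomaly_cost: float, years: int) -> float:
    """Anomaly cost avoided over the horizon, minus what the resolution itself costs."""
    avoided = anomaly_cost * res.share_eliminated * years
    return avoided - res.one_time_cost - res.annual_cost * years


def recommend(options: List[Resolution], anomaly_cost: float, years: int) -> Resolution:
    """Alternative with the highest net benefit over the chosen horizon."""
    return max(options, key=lambda r: net_benefit(r, anomaly_cost, years))
```

For example, with an annual anomaly cost of 80,000, a warehouse-only cleanse (10,000 one-time, 20,000 per year, removing half the cost) nets 50,000 over three years. Correcting the source (90,000 one-time, 5,000 per year, removing three quarters) nets 75,000. Over a single year, the ranking reverses, so the chosen horizon matters.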

17
Short-term Data Repair Processes
  • Identify data defects that can be corrected (data
    cleansing) for an interim period
  • Design cleansing process
  • Test cleansing process
  • Implement cleansing process
  • The data repair process is temporary and remains in
    place only until the root cause is resolved.
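An interim cleansing step is easier to retire when every correction it makes is logged. The log documents the repair and also counts the defects the long-term fix must eliminate. The following is a minimal sketch, assuming records arrive as dictionaries of strings; the `state` field, the `STATE_FIXES` table, and the two rules are hypothetical examples.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical lookup of known bad state-code variants -> standard two-letter code.
STATE_FIXES = {"CALIF": "CA", "CALIFORNIA": "CA", "N.Y.": "NY", "NEW YORK": "NY"}


def cleanse(record: Record) -> Tuple[Record, List[str]]:
    """Return a cleansed copy of a record plus a log of every change made to it."""
    fixed: Record = {}
    changes: List[str] = []
    # Rule 1: trim leading/trailing whitespace from every field.
    for field, value in record.items():
        stripped = value.strip()
        if stripped != value:
            changes.append(f"{field}: trimmed whitespace")
        fixed[field] = stripped
    # Rule 2: standardize the state code (upper case, then map known variants).
    if "state" in fixed:
        upper = fixed["state"].upper()
        standard = STATE_FIXES.get(upper, upper)
        if standard != fixed["state"]:
            changes.append(f"state: {fixed['state']!r} -> {standard!r}")
            fixed["state"] = standard
    return fixed, changes
```

The input record is never modified. Once the source system is corrected, the change log should stop growing, which signals that the temporary process can be removed.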

18
Long-term Process Re-engineering
  • Identify root cause for the data anomaly
  • Develop plan to resolve data anomaly
  • Design changes in automated and manual processes
  • Implement new processes
  • Monitor effectiveness of the new processes.

19
The Benefits Of Data Stewardship
  • Improves data accessibility and quality
  • Empowers the business community to set decision
    support priorities
  • Creates an integrated view of data
  • Facilitates flexibility and quick response to the
    ever-changing requirements of the organization

20
Data Stewardship
  • Management of the organization's data by the
    business community
  • Shared accountability for data quality
  • Prioritization driven by the business community
  • Data stewards act as guardians of this enterprise
    asset

21
Business Rules
  • Created and maintained by the business community
    (a stewardship role)
  • Ensure data conforms to the organization's business
    requirements
  • Supported by an automated system
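One way to keep rules owned by the business but enforced automatically is to store each rule as data: an identifier and steward-approved wording, paired with a machine-checkable predicate. The following is a minimal sketch of that idea. The `BusinessRule` class, rule IDs, and order fields are hypothetical and do not describe a specific product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class BusinessRule:
    rule_id: str
    description: str                 # wording owned by the data steward
    check: Callable[[Record], bool]  # True when a record conforms to the rule


def evaluate(records: List[Record], rules: List[BusinessRule]) -> Dict[str, int]:
    """Count how many records violate each rule."""
    violations = {rule.rule_id: 0 for rule in rules}
    for record in records:
        for rule in rules:
            if not rule.check(record):
                violations[rule.rule_id] += 1
    return violations


# Hypothetical rules for an order feed.
RULES = [
    BusinessRule("ORD-01", "Order amount must not be negative",
                 lambda r: r["amount"] >= 0),
    BusinessRule("ORD-02", "Ship date must not precede order date",
                 lambda r: r["ship_date"] >= r["order_date"]),
]

# Hypothetical sample batch: the second order breaks both rules.
SAMPLE = [
    {"amount": 120.0, "order_date": date(2002, 7, 1), "ship_date": date(2002, 7, 3)},
    {"amount": -15.0, "order_date": date(2002, 7, 1), "ship_date": date(2002, 6, 28)},
]

if __name__ == "__main__":
    print(evaluate(SAMPLE, RULES))  # {'ORD-01': 1, 'ORD-02': 1}
```

Because the wording and the check live together, a steward can review the rule list without reading the code that applies it.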

22
Six Dimensions of Data Quality
  • Accessible
  • Consistent
  • Timely
  • Accurate
  • Precise
  • Complete

23
Accuracy
  • Accuracy has two components
    • Correctness: represents the truth of the data
    • Clearly understood meaning: can one truly and
      clearly understand what the data means?
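Correctness cannot be judged from the data alone. A common approach is to compare a sample against an independently verified, authoritative source. The following is a minimal sketch, assuming both sides are keyed dictionaries of records; the function names and the `zip` field are hypothetical. The second component, clearly understood meaning, concerns definitions and is not measured by this check.

```python
from typing import Any, Dict, Hashable, List

Record = Dict[str, Any]


def mismatched_keys(warehouse: Dict[Hashable, Record],
                    verified: Dict[Hashable, Record],
                    field: str) -> List[Hashable]:
    """Keys present in both sources whose values for `field` disagree."""
    return [k for k in verified
            if k in warehouse and warehouse[k].get(field) != verified[k].get(field)]


def accuracy_rate(warehouse: Dict[Hashable, Record],
                  verified: Dict[Hashable, Record],
                  field: str) -> float:
    """Share of verified records whose warehouse value for `field` matches.

    Only keys present in both sources are compared; records missing from the
    warehouse belong to a completeness check instead.
    """
    compared = [k for k in verified if k in warehouse]
    if not compared:
        return 0.0
    wrong = len(mismatched_keys(warehouse, verified, field))
    return (len(compared) - wrong) / len(compared)
```

The mismatched keys are the starting point for tracing each error back to its source system.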

24
Precision
  • Precision
    • Having the appropriate level of detail in the
      data
    • For numeric data, having a sufficient number of
      decimal places
  • Typical types of data that require appropriate
    precision
    • Currency
    • Dates
    • Times
    • Locations
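Two precision checks are easy to automate. The first keeps currency in exact decimal arithmetic rather than binary floating point, where, for example, 0.1 + 0.2 does not equal 0.3. The second measures how many values were captured with fewer decimal places than a data element requires. The following is a minimal sketch using Python's standard `decimal` module. The function names and the rounding policy (banker's rounding to cents) are assumptions. Counting recorded decimal places is only a heuristic, because some systems drop trailing zeros.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Iterable

CENTS = Decimal("0.01")


def to_currency(amount: str) -> Decimal:
    """Parse an amount from text and round it to cents with banker's rounding (assumed policy)."""
    return Decimal(amount).quantize(CENTS, rounding=ROUND_HALF_EVEN)


def recorded_places(value: str) -> int:
    """Digits after the decimal point as recorded, e.g. '19.990' -> 3, '20' -> 0."""
    exponent = Decimal(value).as_tuple().exponent
    if not isinstance(exponent, int):  # 'n', 'N' or 'F' for NaN / infinity
        raise ValueError(f"not a finite number: {value!r}")
    return max(0, -exponent)


def share_below_precision(values: Iterable[str], required_places: int) -> float:
    """Fraction of values recorded with fewer decimal places than the element requires."""
    items = list(values)
    if not items:
        return 0.0
    short = sum(1 for v in items if recorded_places(v) < required_places)
    return short / len(items)
```

Amounts are parsed from text rather than from floats so that the value stored is exactly the value captured.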

25
Completeness
  • Completeness has two components
    • Existence
      • Existence relates to the presence or absence of
        data
      • Data might be missing for a variety of reasons
    • Comprehensiveness
      • Measures the data's ability to answer all
        possible business questions
      • Gaps arise when required metrics are not
        captured or tracked in the organization's
        information systems
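Both components of completeness can be measured. The following is a minimal sketch, assuming records are dictionaries and that absent keys, `None`, and blank strings all count as missing, a policy that the data stewards would need to confirm. The function names and the metric names in the usage are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def _is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing (an assumed policy)."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_rate(records: List[Record], fields: Iterable[str]) -> Dict[str, float]:
    """Existence: share of records in which each field is absent, None, or blank."""
    rates: Dict[str, float] = {}
    for field in fields:
        # r.get() returns None for an absent key, so absent fields count as missing.
        missing = sum(1 for r in records if _is_missing(r.get(field)))
        rates[field] = missing / len(records) if records else 0.0
    return rates


def uncaptured_metrics(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Comprehensiveness: metrics the business needs that no source field supplies."""
    return sorted(set(required) - set(available))
```

For example, if the business asks for `revenue`, `margin`, and `churn_rate` but the sources supply only `revenue`, `margin`, and `region`, `uncaptured_metrics` returns `['churn_rate']`.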

26
Consistency
  • Consistency has two components
  • Internal Consistency
  • Is the data consistent with a set of rules about
    itself?
  • External Consistency
  • One item or one set of data is consistent with
    another item or set of data
  • Entity Integrity
  • Referential Integrity
  • Time Integrity
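Entity integrity and referential integrity can be tested directly on extracted rows. The following is a minimal sketch, assuming rows are dictionaries; the function names and the `cust_id` and `order_id` keys are hypothetical. Following SQL foreign-key semantics, a null foreign key is not treated as an orphan.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def entity_integrity_violations(rows: List[Row], key: str) -> List[Any]:
    """Primary-key values that are null or appear on more than one row."""
    counts = Counter(r.get(key) for r in rows)
    return [k for k, n in counts.items() if k is None or n > 1]


def orphaned_rows(child: List[Row], fk: str, parent: List[Row], pk: str) -> List[Row]:
    """Referential integrity: child rows whose non-null foreign key matches no parent key.

    As with SQL foreign-key constraints, a null foreign key is not reported as an orphan.
    """
    parent_keys = {r.get(pk) for r in parent if r.get(pk) is not None}
    return [r for r in child if r.get(fk) is not None and r[fk] not in parent_keys]
```

Time integrity, such as requiring an effective date to fall on or before its expiry date, could be added as another row-level check in the same style.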

27
Timeliness
  • Can the information be provided to the end user
    quickly?
  • Two components
    • How soon the information created by a business
      event can be processed, cleansed, and made
      available to the system
    • System performance: how quickly data can be
      returned to a user upon query
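The first timeliness component can be tracked as latency: the interval between a business event and the moment its cleansed data becomes available, compared against an agreed target. The following is a minimal sketch, assuming each record carries both timestamps; the function names are hypothetical, as is the 24-hour target in the tests. Query response time, the second component, would be measured separately.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def latencies(events: List[Tuple[datetime, datetime]]) -> List[timedelta]:
    """Delay between each business event (occurred) and when its data became available."""
    return [available - occurred for occurred, available in events]


def share_within(delays: List[timedelta], target: timedelta) -> float:
    """Fraction of events whose data was available within the target window."""
    if not delays:
        return 0.0
    return sum(1 for d in delays if d <= target) / len(delays)
```

Tracking the worst delay alongside the share within target helps expose events that a nightly batch skipped.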

28
Accessibility
  • Accessibility has three components
    • Security
    • Appropriate communications and platform
      architectures are in place
    • Friendliness of the user interface (ease of
      access to data)

29
Implementation
  • Develop Organizational Data Quality Plan
  • Identify/Define Data Quality Infrastructure
  • Data Quality Assessment
  • Data Quality Cost Analysis
  • Short-term Data Repair Processes
  • Long-term Process Re-engineering

30
Resolving Data Quality Anomalies
  • Determining Root Causes
    • Identify the Cause
    • Trace the Problem Back from the Warehouse to the
      Source System
    • Engage the Help of the Operational Team
    • Engage the Help of the Business Subject Matter
      Experts
    • Review Recent Changes to the Source System
  • Identify Solutions
    • Short Term
    • Long Term (Correct the Source)
  • Identify Costs and Benefits

31
Sample Data Quality Architecture

32
Assessment vs. Maintenance
  • Preliminary Data Quality Assessment (DQA)
    • Analytical
    • Discovery (Identify Data Quality Issues and
      Priorities)
    • Develops Transform/Cleansing Rules
    • Works with Data Stewards
    • Focused on Analysis
    • Flexible Time Constraints
  • Continuous Data Quality Maintenance
    • Process Oriented
    • Addresses Data Quality Issues Discovered in DQA
    • Applies Transform/Cleansing Rules
    • Works with Data Stewards
    • Cyclic/Repeatable Process
    • Stringent Time Constraints

33
What We Monitor
  • Data Element Priority (H = High, M = Medium, L = Low)
    • Referential Data (H)
    • Dimensional Data (H)
    • Data Facts for Important Decisions (H)
    • Supportive Data (M)
    • Subsidiary Data: Nice to Have (L)
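A priority-weighted pass rate is one way to turn these priorities into a single monitoring figure, such as a figure for a certification report. The following is a minimal sketch. The numeric weights (3, 2, 1) and the element names are assumptions, because the slide ranks elements only as high, medium, or low.

```python
from typing import Dict, Tuple

# Hypothetical weights; the presentation ranks priorities but assigns no numbers.
WEIGHTS = {"H": 3, "M": 2, "L": 1}


def weighted_quality_score(results: Dict[str, Tuple[str, int, int]]) -> float:
    """Priority-weighted pass rate across monitored data elements.

    `results` maps a data element name to (priority, records_passed, records_checked).
    """
    weighted = 0.0
    total_weight = 0.0
    for priority, passed, checked in results.values():
        if checked == 0:
            continue  # nothing was measured for this element in the period
        weight = WEIGHTS[priority]
        weighted += weight * passed / checked
        total_weight += weight
    return weighted / total_weight if total_weight else 0.0
```

Weighting prevents a failure in a high-priority element from being diluted by passing low-priority ones. If one high-priority element fails entirely while one low-priority element passes, the score is 0.25 instead of the unweighted 0.5.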

34
Data Quality Certification
  • States the overall level of confidence in data
    quality across the organization
  • Metrics for the certification period
  • Documents improvements
  • Highlights concerns
  • Sets goals, objectives, targets and priorities
    for next certification period

35
Summary
  • Data quality improvement requires an iterative
    approach
  • Keys to successful implementation of a data quality
    improvement plan
    • Monitoring
    • Cost-benefit analysis
    • Prioritization based on business requirements

36
Final Thought
  • Knowing what to do is not the problem; the
    challenge is having the will to do it!

37
Questions and Answers