Data Quality - PowerPoint PPT Presentation

About This Presentation
Title:

Data Quality

Slides: 22
Provided by: davidl116
Learn more at: https://cs.nyu.edu

Transcript and Presenter's Notes



1
Data Quality
  • David Loshin

2
Course Structure
  • Overview of Data Quality
  • Data Ownership and Data Roles
  • Cost Analysis of Poor Data Quality
  • Dimensions of Data Quality
  • Data models, Data values, Presentation
  • Data Extraction and Transformation
  • ETL, Data transformation

3
Course Structure (2)
  • Data Quality Improvement
  • Metadata and Enterprise Reference Data
  • Domains and Mappings
  • Data Quality Rules
  • Definition of Rules
  • Discovery of Rules

4
Course Structure (3)
  • Using Data Quality Rules
  • Message Transformation and Routing
  • Data warehouse validation
  • GUI Generation
  • Data Warehouse Population

5
Course Structure (4)
  • Data Cleansing
  • Data Parsing
  • Standardization
  • Linkage
  • Duplicate Elimination
  • Approximate Searching
  • Scalability Issues
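The cleansing steps above (standardization, linkage, duplicate elimination, approximate searching) can be illustrated with a small sketch. It assumes a hypothetical abbreviation table and a similarity threshold of 0.85, and uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. It is one simple way to do approximate matching, not the technique the course necessarily teaches.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table used for standardization (an assumption).
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "inc": "incorporated"}


def standardize(value: str) -> str:
    """Lower-case, replace punctuation with spaces, and expand known abbreviations."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in cleaned.split())


def similarity(a: str, b: str) -> float:
    """Approximate similarity in [0, 1], computed on standardized values."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def find_duplicates(records: list[str], threshold: float = 0.85) -> list[tuple[int, int]]:
    """Return index pairs whose standardized forms are at least `threshold` similar.

    Every pair is compared, so this is O(n^2); larger data sets need
    blocking or indexing first (the scalability issue noted above).
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    addresses = ["123 Main St.", "123 main street", "45 Oak Ave", "99 Elm Rd"]
    print(find_duplicates(addresses))
```

Running it prints `[(0, 1)]`: after standardization the first two addresses are identical, so they are linked as duplicates, while the others stay below the threshold.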

6
Project
  • Build a data quality tool
    • rule definition
    • data parsing
    • data element standardization
    • record linkage
  • Apply the tool to characterize real-world data
    (I'll supply some, don't worry :-)
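For the project's parsing and standardization steps, here is a minimal sketch for one hypothetical data element, a North American phone number. The choice of field, the 10-digit rule, and the `NNN-NNN-NNNN` target format are all assumptions made for illustration.

```python
import re
from typing import Optional


def parse_phone(raw: str) -> Optional[tuple[str, str, str]]:
    """Parse a North American phone number into (area code, exchange, line).

    Returns None when the value cannot be parsed, so the caller can flag it.
    """
    digits = re.sub(r"\D", "", raw)        # drop punctuation and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                # drop a leading country code
    if len(digits) != 10:
        return None
    return digits[:3], digits[3:6], digits[6:]


def standardize_phone(raw: str) -> Optional[str]:
    """Render a parsed number in one canonical (hypothetical) format: NNN-NNN-NNNN."""
    parts = parse_phone(raw)
    return "-".join(parts) if parts is not None else None


if __name__ == "__main__":
    for raw in ["(212) 555-0123", "1.212.555.0123", "555-0123"]:
        print(f"{raw!r} -> {standardize_phone(raw)}")
```

The first two inputs both standardize to `212-555-0123`; the third has too few digits and yields `None`. Separating parsing from rendering keeps the "can this be understood?" question apart from the "what is the canonical form?" decision.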

7
Some Examples
  • Frequent Flyer Miles and Long-Distance Service
  • Corporate Credit Card
  • Direct Marketing Event
  • CD Club Scam

8
What is Data?
  • Working definitions
    • Data: arbitrary values (with their own
      representation)
    • Information: data within a context
    • Knowledge: understanding of information within
      its context
    • Metadata: data about data

9
Who Owns Data?
  • Important question, because the answers indicate
    where responsibility for data quality lies
  • Data quality can be difficult to effect because
    of complicating notions
  • Data processing as an information factory
  • Actors in the information factory and their roles

10
Actors and Their Roles
  • Supplier
  • Acquirer
  • Creator
  • Processor
  • Packager
  • Delivery Agent
  • Consumer
  • Middle Manager
  • Senior Manager
  • Decision-maker

11
Ownership Responsibilities
  • Definition of data
  • Authorization and Security
  • User support
  • Data packaging and delivery
  • Maintenance
  • Data quality
  • Management of business rules
  • Management of metadata
  • Standards management
  • Supplier management

12
Ownership Paradigms
  • Creator
  • Consumer
  • Compiler
  • Enterprise
  • Funder
  • Decoder
  • Packager
  • Reader
  • Subject
  • Purchaser
  • Everyone

13
Complicating Notions
  • Ownership is affected by the value of data
  • Privacy
  • Turf
  • Fear
  • Bureaucracy

14
The Data Ownership Policy
  • Order of enforcement
  • Identify stakeholders
  • Identify data sets
  • Allocation of ownership
  • Ownership roles and responsibilities
  • Dispute Resolution

15
The Data Ownership Policy (2)
  • Maintain a metadata database for data ownership
    • Parties table
    • Data set table
    • Roles and responsibilities
    • Policies (e.g., dispute resolution,
      communication)
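The ownership metadata database can be sketched as a relational schema. The table and column names below (`parties`, `data_sets`, `responsibilities`, `policies`) and the `holders_of_role` helper are hypothetical; Python's built-in `sqlite3` module is used only to keep the example self-contained, and the role names come from the Ownership Roles slide.

```python
import sqlite3

# Hypothetical schema for the data ownership metadata database:
# one table per item on the slide, with roles stored in a link table.
SCHEMA = """
CREATE TABLE parties (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    contact  TEXT
);
CREATE TABLE data_sets (
    data_set_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    description TEXT
);
-- Which party holds which ownership role for which data set.
CREATE TABLE responsibilities (
    party_id    INTEGER NOT NULL REFERENCES parties(party_id),
    data_set_id INTEGER NOT NULL REFERENCES data_sets(data_set_id),
    role        TEXT NOT NULL,  -- e.g. 'Steward', 'Custodian'
    PRIMARY KEY (party_id, data_set_id, role)
);
-- Free-text policies such as dispute resolution or communication.
CREATE TABLE policies (
    policy_id INTEGER PRIMARY KEY,
    topic     TEXT NOT NULL,
    body      TEXT NOT NULL
);
"""


def create_ownership_db() -> sqlite3.Connection:
    """Create an in-memory database with the ownership schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def holders_of_role(conn: sqlite3.Connection, data_set: str, role: str) -> list[str]:
    """Names of the parties holding `role` for the named data set."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM parties AS p
        JOIN responsibilities AS r ON r.party_id = p.party_id
        JOIN data_sets AS d ON d.data_set_id = r.data_set_id
        WHERE d.name = ? AND r.role = ?
        ORDER BY p.name
        """,
        (data_set, role),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = create_ownership_db()
    conn.execute("INSERT INTO parties (party_id, name) VALUES (1, 'Alice'), (2, 'Bob')")
    conn.execute("INSERT INTO data_sets (data_set_id, name) VALUES (10, 'customer')")
    conn.executemany(
        "INSERT INTO responsibilities VALUES (?, ?, ?)",
        [(1, 10, "Steward"), (2, 10, "Custodian")],
    )
    print(holders_of_role(conn, "customer", "Steward"))
```

Keeping roles in a separate link table lets one party hold several roles across several data sets, which makes "who is responsible for this data set?" a single query.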

16
Ownership Roles
  • CIO (Chief Information Officer)
  • CKO (Chief Knowledge Officer)
  • Trustee
  • Policy Manager
  • Registrar
  • Steward
  • Custodian
  • Data Administrator
  • Security Administrator
  • Information Flow
  • Information Processing
  • Application development
  • Data Provider
  • Data Consumer

17
The Information Factory
  • Information processing can be modeled as a
    graph
  • Each node in the graph is a data producer, data
    consumer, or both
  • The edges represent communication paths
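The information factory graph can be represented as a list of directed edges, with each node classified as a producer, a consumer, or both. The flow between actors in the example is hypothetical (it reuses names from the Actors and Their Roles slide), and nodes with no edges at all are simply not represented in this sketch.

```python
from collections import defaultdict


def classify_nodes(edges: list[tuple[str, str]]) -> dict[str, str]:
    """Label each node of an information-flow graph as 'producer', 'consumer', or 'both'.

    Each directed edge (src, dst) is a communication path along which
    data flows from src to dst.
    """
    out_degree: dict[str, int] = defaultdict(int)
    in_degree: dict[str, int] = defaultdict(int)
    for src, dst in edges:
        out_degree[src] += 1
        in_degree[dst] += 1

    roles = {}
    for node in set(out_degree) | set(in_degree):
        if out_degree[node] and in_degree[node]:
            roles[node] = "both"       # receives data and passes it on
        elif out_degree[node]:
            roles[node] = "producer"   # only sends data
        else:
            roles[node] = "consumer"   # only receives data
    return roles


if __name__ == "__main__":
    # Hypothetical flow through some of the actors listed earlier.
    flow = [
        ("Supplier", "Acquirer"),
        ("Acquirer", "Processor"),
        ("Processor", "Packager"),
        ("Packager", "Delivery Agent"),
        ("Delivery Agent", "Consumer"),
    ]
    for node, role in sorted(classify_nodes(flow).items()):
        print(f"{node}: {role}")
```

In this flow only the Supplier is a pure producer and only the Consumer a pure consumer; every intermediate actor both receives and passes on data.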

18
What is Data Quality?
  • Fitness for Use
  • Different rules for different data sets
  • Includes, but is more than:
    • Data cleansing
    • Standardization
    • Deduplication
    • Merge-purge
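"Fitness for use" with "different rules for different data sets" can be sketched as usage-specific rule sets applied to the same record. The two uses, the rule names, and the sample record are hypothetical; the point is only that one record can pass for one use and fail for another.

```python
from typing import Callable

Record = dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical usage-specific rule sets: the same record can be fit for
# one use (an e-mail campaign) and unfit for another (a direct mailing).
RULES: dict[str, dict[str, Rule]] = {
    "direct_mail": {
        "has_street": lambda r: bool(r.get("street", "").strip()),
        "zip_is_5_digits": lambda r: len(r.get("zip", "")) == 5 and r.get("zip", "").isdigit(),
    },
    "email_campaign": {
        "has_email": lambda r: "@" in r.get("email", ""),
    },
}


def violations(record: Record, use: str) -> list[str]:
    """Names of the rules for `use` that `record` fails; empty means fit for that use."""
    return [name for name, rule in RULES[use].items() if not rule(record)]


if __name__ == "__main__":
    customer = {"name": "J. Smith", "email": "jsmith@example.com", "street": "", "zip": "10012"}
    for use in RULES:
        print(use, violations(customer, use))
```

Here the record fails `has_street` for direct mail but has no violations for the e-mail campaign, so its quality depends on how it will be used.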

19
Lather, Rinse, Repeat
  • Data quality is a process
    1. Assess the current state of the quality of data
    2. Determine the area that needs the most improvement
    3. Determine success criteria
    4. Implement the improvement
    5. Measure against the success threshold
    6. If successful, go to step 2
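The six-step cycle can be sketched as a loop. The sketch assumes quality is summarized as one score per area (higher is better), that the success criterion from step 3 is a single fixed threshold, and that the loop stops when an improvement fails. The slide does not say what happens on failure, so that last choice is an assumption.

```python
from typing import Callable


def improvement_cycle(
    assess: Callable[[], dict[str, float]],
    implement: Callable[[str], None],
    threshold: float,
    max_rounds: int = 10,
) -> list[str]:
    """Run the assess / improve / measure loop and return the areas improved.

    `assess` returns a quality score per area (higher is better),
    `implement` applies an improvement to one area, and `threshold`
    is the success criterion (step 3), fixed here for simplicity.
    """
    improved: list[str] = []
    scores = assess()                                  # step 1: assess the current state
    for _ in range(max_rounds):
        area = min(scores, key=scores.__getitem__)     # step 2: area needing the most improvement
        if scores[area] >= threshold:
            break                                      # every area already meets the criterion
        implement(area)                                # step 4: implement the improvement
        scores = assess()                              # step 5: measure against the threshold
        if scores[area] < threshold:
            break                                      # assumption: stop on failure and revisit
        improved.append(area)                          # step 6: success, so go back to step 2
    return improved


if __name__ == "__main__":
    # Simulated, made-up scores for three hypothetical data areas.
    current = {"address": 0.60, "phone": 0.80, "email": 0.95}

    def fix(area: str) -> None:
        current[area] = 0.96  # pretend each improvement lifts the area to 0.96

    print(improvement_cycle(lambda: dict(current), fix, threshold=0.9))
```

With the simulated scores the loop improves `address`, then `phone`, and stops once every area meets the 0.9 threshold. The `max_rounds` cap is only a guard against an endless loop.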

20
Data Quality is Hard to Do
  • No one wants to admit mistakes
  • Denial of responsibility
  • Lack of understanding
  • Dirty work
  • Lack of recognition

21
Steps to Data Quality
  • Training
  • Data ownership policy
  • Economic model of data quality
  • Current state assessment and requirements
    analysis
  • Project selection and implementation