615644 Data Warehousing - PowerPoint PPT Presentation

Slides: 27
Provided by: Gra6162
Transcript and Presenter's Notes

1
615-644 Data Warehousing
Week 5: Understanding Data Quality
2
Data Quality
  • Data quality problems are widespread in practice
    and have significant economic impacts
  • Databases have significant error rates
    • Between 1 and 10 percent of data in critical
      organisational databases are estimated to be
      inaccurate (Klein, Goodhue and Davis, 1996)
    • About 500,000 dead people were found to be
      registered for Medicare benefits because of
      inaccurate data (The Australian, March 15,
      2005)

3
Data Quality: Spot the Errors
Customer Name | Suburb    | Postcode | Gender | Credit | Valid Until Date
Bill Smith    | Caulfield | 3145     | M      | 10,000 | 31/12/99
ACME Pty Ltd  | Malvern   | 3286     | D      | 1f,000 | 31/10/99
Mary Whiete   | Caulfield | 5000     | F      | 3145   | 30/12/99
              | Richmondd | 2467     | F      | 2,000  | 28/2/99
Bill Smyth    |           |          |        |        |
4
Understanding Data Quality
  • List of desirable dimensions
    • Accurate, complete, timely (e.g. Redman)
  • Expert opinion
    • "Trust me!" (e.g. English)
  • Empirical
    • Survey data practitioners (e.g. Wang and Strong)
  • Theoretical
    • Rigorous (e.g. Wand and Wang)

5
English (1999) Framework
  • Inherent
    • Definition conformance, validity or business rule
      conformance, completeness (of values), accuracy
      (to surrogate source), ...
  • Pragmatic
    • Timeliness, contextual clarity, derivation
      integrity, usability, ...

6
Wang and Strong (1996) Framework
  • Survey of data consumers' opinions on
    data quality factors
  • Four groups of data quality factors resulted
    • Intrinsic (accuracy, reputation, ...)
    • Contextual (timeliness, completeness, ...)
    • Representational (concise, understandable, ...)
    • Accessibility (accessible, security)

7
Kahn et al. (1997) Framework
  • Product Service Model

                | Conforms to Specifications | Meets or Exceeds Customer Expectations
Product Quality | Sound Information (complete, timely, believable, consistent) | Useful Information (objective, relevant, understandable, reputation)
Service Quality | Usable Information (secure, timely, concise, accessible, consistent) | Effective Information (value-added, appropriate amount)
8
Wand and Wang (1996) Framework
  • Theoretical: based on Bunge's ontology

[Diagram: mapping between the Real World and the Database, assessed as Complete, Unambiguous, Meaningful, Non-redundant, Correct]
9
Limitations of Existing Approaches
  • Quality dimensions are vaguely defined,
    overlapping, ambiguous
  • Limited rigor
  • Varying scope

10
Semiotic Information Quality Framework
  • Must be soundly based in theory (rigorous)
    • Semiotics, ontology, service quality
  • Must be usable and useful (relevant)
  • Must include different perspectives (scope)
    • Product view (of stored data)
    • Service view (of received information)
  • Must be clearly structured
    • Coherent categories and criteria

11
Steps in Developing the Framework
  • Use semiotic theory to
    • Define data quality categories
    • Determine criteria derivation method per category
      - Integrate theoretical and empirical research
        approaches
  • Integrate objective and subjective IQ views
  • IT practitioner and academic focus groups
    • To refine the framework

12
Semiotics (Peirce and Morris)
  • Three Components
    • Sign: actual representation (perceivable)
    • Referent: intended meaning, represented
      phenomenon
    • Interpretation: received meaning, use of sign

13
Defining Data Quality Categories
Theory (semiotic level)
Application (IS equivalent)
14
Data Quality - Syntactic level (rule conformance)
  • Conforming to metadata
    • Data must conform to data integrity rules
    • Data quality can be checked using computer-based
      tools
  • Use integrity theory (e.g. relational)
    • Domain integrity, entity integrity, referential
      integrity, application-specific integrity rules
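
The rule types on this slide can be checked mechanically. The following is a minimal sketch in Python, run against rows from the "Spot the Errors" slide. The specific rules (four-digit postcodes, gender limited to M or F, the `KNOWN_SUBURBS` lookup standing in for a referenced table, the customer name treated as the identifying attribute) are assumptions made for illustration; the slides name the rule categories but do not define the rules.

```python
import re
from datetime import datetime

# Assumed rules for illustration; the slides name the rule types, not these specifics.
VALID_GENDERS = {"M", "F"}
KNOWN_SUBURBS = {"Caulfield", "Malvern", "Richmond"}  # stand-in for a referenced table


def is_date(text):
    """True if text parses as day/month/two-digit-year."""
    try:
        datetime.strptime(text, "%d/%m/%y")
        return True
    except ValueError:
        return False


def check_record(rec, seen_names):
    """Return a list of integrity rule violations for one customer record."""
    problems = []
    # Entity integrity: the identifying attribute must be present and unique.
    if not rec["name"]:
        problems.append("entity: missing name")
    elif rec["name"] in seen_names:
        problems.append("entity: duplicate name")
    # Domain integrity: each value must come from its attribute's domain.
    if not re.fullmatch(r"\d{4}", rec["postcode"]):
        problems.append("domain: postcode")
    if rec["gender"] not in VALID_GENDERS:
        problems.append("domain: gender")
    if not rec["credit"].replace(",", "").isdigit():
        problems.append("domain: credit")
    if not is_date(rec["valid_until"]):
        problems.append("domain: valid_until")
    # Referential integrity: the suburb must exist in the referenced set.
    if rec["suburb"] not in KNOWN_SUBURBS:
        problems.append("referential: suburb")
    return problems


def check_all(records):
    """Check records in order, tracking names already seen for uniqueness."""
    seen, report = set(), []
    for rec in records:
        report.append(check_record(rec, seen))
        if rec["name"]:
            seen.add(rec["name"])
    return report


FIELDS = ("name", "suburb", "postcode", "gender", "credit", "valid_until")
ROWS = [
    ("Bill Smith", "Caulfield", "3145", "M", "10,000", "31/12/99"),
    ("ACME Pty Ltd", "Malvern", "3286", "D", "1f,000", "31/10/99"),
    ("Mary Whiete", "Caulfield", "5000", "F", "3145", "30/12/99"),
    ("", "Richmondd", "2467", "F", "2,000", "28/2/99"),
]
records = [dict(zip(FIELDS, row)) for row in ROWS]

for rec, problems in zip(records, check_all(records)):
    print(rec["name"] or "(no name)", "->", problems or "ok")
```

Under these assumed rules the "Mary Whiete" row passes every check, even though its postcode and credit values appear to be swapped. Errors of that kind are not rule violations and fall under the semantic level instead.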

15
Data Quality - Semantic level (external mapping)
  • Mapping from real world to database
    • Mapped completely
    • Mapped unambiguously
    • Mapped correctly (phenomena)
    • Mapped correctly (properties)
    • Mapped consistently
    • Mapped meaningfully
  • Use Bunge-Wand-Weber ontology theory

16
Data Quality - Semantic level (external mapping)
  • Mapped completely
    • Every external phenomenon is represented

[Diagram: an incomplete mapping between the real world (RW) and the database (DB)]
17
Data Quality - Semantic level (external mapping)
  • Mapped unambiguously
    • Each identifiable data unit represents at most
      one specific external phenomenon

[Diagram: an ambiguous mapping between the real world (RW) and the database (DB)]
18
Data Quality - Semantic level (external mapping)
  • Mapped correctly (phenomena)
    • Each identifiable data unit maps to the correct
      external phenomenon

[Diagram: an incorrect mapping between the real world (RW) and the database (DB)]
19
Data Quality - Semantic level (external mapping)
  • Mapped correctly (properties)
    • Non-key attribute values in an identifiable data
      unit match the property values for the
      represented external phenomenon

[Diagram: a correct mapping, in which the property values (xxxx, yyyy) in the database (DB) match those in the real world (RW)]
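
As a rough illustration of this criterion, the sketch below compares the non-key attributes of one stored record with independently verified real-world values. The function name, the `id` key and the sample values are hypothetical, and the check assumes a trusted source of observed values is available, which is often the hard part in practice.

```python
def property_mismatches(db_row, observed, key="id"):
    """Return {attribute: (stored, observed)} for non-key attributes that differ."""
    return {
        attr: (value, observed.get(attr))
        for attr, value in db_row.items()
        if attr != key and observed.get(attr) != value
    }


# Hypothetical stored record and hypothetical verified real-world values.
stored = {"id": 3, "suburb": "Caulfield", "postcode": "5000"}
verified = {"id": 3, "suburb": "Caulfield", "postcode": "3145"}

print(property_mismatches(stored, verified))
```

An empty result means every non-key value matched the verified values.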
20
Data Quality - Semantic level (external mapping)
  • Mapped consistently
    • Each external phenomenon is either represented by
      one identifiable data unit or by multiple but
      consistent identifiable units

[Diagram: a consistent mapping between the real world (RW) and the database (DB)]
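
One possible reading of "multiple but consistent" is that data units representing the same phenomenon never disagree on an attribute they share. The sketch below applies that reading; the phenomenon identifiers and attribute names are hypothetical.

```python
def inconsistent_phenomena(units):
    """units: iterable of (phenomenon_id, attributes) pairs.

    Return the ids of phenomena whose data units disagree on a shared attribute.
    """
    merged_by_phenomenon = {}
    conflicts = set()
    for phenomenon, attrs in units:
        merged = merged_by_phenomenon.setdefault(phenomenon, {})
        for attr, value in attrs.items():
            if attr in merged and merged[attr] != value:
                conflicts.add(phenomenon)
            merged.setdefault(attr, value)  # keep the first value seen
    return conflicts


# Hypothetical data units: c1 is stored twice with different suburbs.
units = [
    ("c1", {"suburb": "Caulfield"}),
    ("c1", {"suburb": "Malvern"}),
    ("c2", {"suburb": "Richmond"}),
    ("c2", {"suburb": "Richmond", "gender": "F"}),
]

print(inconsistent_phenomena(units))
```

Here `c2` counts as consistent because its two units agree on the one attribute they share.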
21
Data Quality - Semantic level (external mapping)
  • Mapped meaningfully
    • Each identifiable data unit represents at least
      one specific external phenomenon

[Diagram: a non-meaningful mapping between the real world (RW) and the database (DB)]
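
Three of the semantic criteria (mapped completely, unambiguously and meaningfully) are statements about how many phenomena each data unit represents and whether every phenomenon is covered. A minimal sketch, assuming the mapping is available as explicit (data unit, phenomenon) pairs, which real systems rarely record directly; all identifiers are hypothetical.

```python
from collections import defaultdict


def mapping_report(phenomena, data_units, represents):
    """phenomena, data_units: sets of ids.
    represents: iterable of (data_unit, phenomenon) pairs.
    """
    by_unit = defaultdict(set)
    for unit, phenomenon in represents:
        by_unit[unit].add(phenomenon)
    represented = set().union(*by_unit.values())
    return {
        # Violates "mapped completely": a phenomenon with no data unit.
        "unrepresented": phenomena - represented,
        # Violates "mapped unambiguously": a unit standing for several phenomena.
        "ambiguous": {u for u in data_units if len(by_unit.get(u, ())) > 1},
        # Violates "mapped meaningfully": a unit standing for nothing.
        "meaningless": {u for u in data_units if not by_unit.get(u)},
    }


report = mapping_report(
    phenomena={"p1", "p2", "p3"},
    data_units={"r1", "r2", "r3"},
    represents=[("r1", "p1"), ("r2", "p1"), ("r2", "p2")],
)
print(report)
```

In this example `p3` has no record, `r2` stands for two phenomena, and `r3` stands for none, so each of the three criteria is violated once.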
22
Data Quality - Pragmatic level (user perspective)
  • User perceptions of usefulness
  • Subjective set of quality criteria
    • Derive from previous work and focus groups
  • Use service quality theory
    • Compare expected with perceived actual quality
      assessments
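
The comparison of expected with perceived quality can be expressed as a gap score per criterion. The sketch below is one simple way to compute it, assuming users rate both expectation and perception on the same numeric scale; the scale, the averaging and the sample ratings are assumptions, not part of the framework as presented.

```python
from statistics import mean


def quality_gaps(expected, perceived):
    """Mean perceived rating minus mean expected rating, per criterion.

    A negative gap means perceived quality falls short of expectations.
    """
    return {
        criterion: mean(perceived[criterion]) - mean(expected[criterion])
        for criterion in expected
    }


# Hypothetical ratings from three users on a 1-7 scale.
expected = {"timely": [6, 7, 5], "accessible": [5, 5, 5]}
perceived = {"timely": [4, 5, 3], "accessible": [6, 5, 7]}

gaps = quality_gaps(expected, perceived)
print(gaps)
```

With these ratings, timeliness falls two points short of expectations while accessibility exceeds them by one.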

23
Data Quality - Pragmatic level (user perspective)
  • Accessible
    • Data is easy and quick to retrieve
  • Timely
    • The currency (age) of data is appropriate for its
      use
  • Understandable
    • Data is presented in an intelligible manner
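
Of the criteria above, "Timely" is the easiest to illustrate in code, because "appropriate for its use" can be approximated by a maximum acceptable age per use. The uses and thresholds below are hypothetical, and in the framework the judgement remains the user's own.

```python
from datetime import datetime, timedelta

# Hypothetical maximum acceptable age of data for each use.
MAX_AGE = {
    "fraud_screening": timedelta(hours=1),
    "monthly_report": timedelta(days=31),
}


def is_timely(last_updated, use, now):
    """True if the data's age at `now` is within the limit for the given use."""
    return now - last_updated <= MAX_AGE[use]


now = datetime(2005, 3, 15, 12, 0)
updated = datetime(2005, 3, 14, 12, 0)  # one day old

print(is_timely(updated, "fraud_screening", now))
print(is_timely(updated, "monthly_report", now))
```

The same day-old record is too stale for one use and acceptable for another, which is why currency is treated here as a pragmatic rather than a syntactic property.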

24
Data Quality - Pragmatic level (user perspective)
  • Secure
    • Data is appropriately protected from damage or
      abuse
  • Suitably presented
    • Data is presented in a manner appropriate for its
      use
  • Flexibly presented
    • Data can be easily manipulated and the
      presentation customised as needed

25
Data Quality - Pragmatic level (user perspective)
  • Allowing access to relevant metadata
    • Appropriate metadata is available to define,
      constrain and document data
  • Perceptions of syntactic and semantic criteria

26
Use of the Framework
  • Developing a data quality assessment instrument
  • Assessing data quality
  • Defining strategies for improving data quality