Title: Quality Data: An Improbable Dream?
1Quality Data: An Improbable Dream?
- Elizabeth Vannan
- Centre for Education Information
- Victoria, BC, Canada
2- "Information quality is a journey, not a
destination" - Larry P. English
3Agenda
- Data Definitions and Standards Project
- What is Quality Data?
- The Cost of Poor-Quality Data
- Improving Data Quality: Our Process
- Questions?
4BC Higher Education
- Canada's westernmost province
- Population: 4.023 million
- Land Area: 366,795 sq miles
- Publicly Funded Post-Secondary System
- 22 Colleges
- 6 Universities
5CEISS
- The Centre for Education Information is an
independent organization that provides research
and technology services to improve the
performance of the BC education system
6CEISS
- Implement and manage administrative systems
- Perform custom surveys, research and analysis
- Facilitate development and implementation of data
standards
- Negotiate and manage province-wide software
contracts (Oracle, SCT Banner, Datatel)
7DDEF Project
- The Problem
- Better data about the BC higher education sector
needed for decision-making
- No infrastructure in place to facilitate the
collection of data electronically
Data Definitions and Standards Project Initiated
in 1995
8DDEF Project
- The Solution
- Create data standards for all higher education
information (Student, HR, Finance)
- Develop a data warehouse based on standards for
reporting
- Implement a common technical infrastructure at
all higher education institutions
9DDEF Project
- Project Goals
- Improve the quantity and QUALITY of data
available
- Reduce the number of data and reporting requests
- Develop business information system to support
the management and evaluation of the BC
Post-Secondary system
10How Are We Doing?
- 16 institutions implemented/implementing
- Institutions using data warehouses for internal
reporting
- Data requests reduced
- Ministry using data
11Why Focus on Data Quality?
- Poor data quality in our data warehouse impacts
- Confidence
- Decision making
- Funding
12Quality Data Are
- The Four Attributes of Data Quality
13Quality Data Are
- Accurate
- Free from errors
- Representative
14Quality Data Are
- Complete
- All values are present
15Quality Data Are
- Timely
- Recorded immediately
- Available when required
16Quality Data Are
- Flexible
- Data definitions understood
- Can be used for multiple purposes
17Quality Data
- Don't have to be perfect
- Good enough to fill the business need at a price
you're willing to pay
Our Challenge: Defining Quality Criteria
for Higher Education Data
18Cost of Poor-Quality Data
- Incorrect Registrations
- Inaccurate Tuition Billings
- Payroll Errors
19Cost of Poor-Quality Data
- Re-collect Data
- Correct Errors
- Data Verification
20Cost of Poor-Quality Data
- Substandard Customer Service
- Poor Decision Making
- Loss of Reputation
21Improving Data Quality
Improved Data Quality
Business Process Review
22Business Process Review
- When, where, how is data collected?
- Where is data stored?
- Who creates data?
- Who uses data?
- What outputs are required?
- What quality checks already exist?
23Business Process Review
- Involve all stakeholders!
- For student data we involve
- Executive
- Registrar's office
- IT Department
- Institutional Research
24Business Process Review
- Results
- Understanding of business practices
- Identification of data creators, custodians,
users
- Preliminary quality metrics
- Problem business practices
25Data Quality Assessment
- Establish Metrics
- Apply metrics to data
- Review results
26Establish Metrics
- For each element determine quality criteria
- Acceptable range of values
- Acceptable syntax
- Comparison to known values
- Business rules
- Thresholds
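The criteria listed above can be expressed as small per-element checks, each paired with a threshold. The following is only a minimal sketch: the element names (`birth_year`, `postal_code`, `gender`, `start_date`, `end_date`), the ranges, the code list, and the threshold values are all hypothetical and are not taken from the project's actual metrics.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Metric:
    """One quality criterion for a single data element."""
    element: str
    criterion: str
    check: Callable[[dict], bool]  # returns True when a record passes
    threshold: float               # minimum share of records that must pass

# Hypothetical list of known values, for illustration only.
KNOWN_GENDER_CODES = {"F", "M", "U"}

# Hypothetical metrics, one per kind of criterion named on the slide.
METRICS = [
    Metric("birth_year", "acceptable range of values",
           lambda r: 1900 <= r["birth_year"] <= 2005, 0.95),
    Metric("postal_code", "acceptable syntax",
           lambda r: re.fullmatch(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]",
                                  r["postal_code"]) is not None, 0.90),
    Metric("gender", "comparison to known values",
           lambda r: r["gender"] in KNOWN_GENDER_CODES, 0.99),
    Metric("end_date", "business rule: end date not before start date",
           lambda r: r["end_date"] >= r["start_date"], 1.00),
]

def assess(records, metrics):
    """Return {element: (pass_rate, meets_threshold)} for each metric."""
    results = {}
    for m in metrics:
        passed = sum(1 for r in records if m.check(r))
        rate = passed / len(records) if records else 0.0
        results[m.element] = (rate, rate >= m.threshold)
    return results
```

Dates are assumed to be ISO-formatted strings (for example `"1999-09-01"`), so plain string comparison orders them correctly. Calling `assess(records, METRICS)` on a list of record dictionaries gives one pass rate and one pass/fail flag per element, which is the kind of result the later "Review results" step works from.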
27Quality Metrics
28Applying Metrics
- Collect known information for comparison
- Develop queries to test each of your validation
criteria
- We use Oracle Discoverer, but other tools exist
(MS Access, SQL)
29Applying Metrics
Test 1: PEN must be 9 digits long. No letters or
other characters, no shorter values acceptable
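The presentation ran this test as a query in Oracle Discoverer. As a rough equivalent, the syntax rule can be sketched in Python. The record layout (dictionaries with `student_id` and `pen` keys) is an assumption made for illustration.

```python
import re

# Exactly nine ASCII digits: no letters, no spaces, no shorter or longer values.
PEN_PATTERN = re.compile(r"[0-9]{9}")

def is_valid_pen(pen):
    """Return True when pen is a string of exactly nine digits."""
    return isinstance(pen, str) and PEN_PATTERN.fullmatch(pen) is not None

def invalid_pen_records(records):
    """Return the records whose PEN fails the syntax test."""
    return [r for r in records if not is_valid_pen(r.get("pen"))]
```

Returning the failing records, rather than only a count, makes it possible to identify the specific students whose data needs cleansing.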
30Test 1 Results
Two Student Records Contain Invalid PEN Numbers
31Test 1 Results
Invalid PENs: Data Entry Error?
Can identify specific students for data cleansing
32Applying Metrics
Test 2: At least 80% of student records must have
a valid PEN
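A threshold test of this kind reduces to computing the share of valid values and comparing it with the agreed minimum. The sketch below assumes the 9-digit rule from Test 1 as the definition of "valid" and takes the threshold as a fraction (0.80). The function names are hypothetical.

```python
def _is_valid_pen(pen):
    """Assumed validity rule: a string of exactly nine ASCII digits."""
    return (isinstance(pen, str) and len(pen) == 9
            and pen.isascii() and pen.isdigit())

def valid_pen_share(pens):
    """Return the fraction of PEN values that are valid (0.0 for no records)."""
    if not pens:
        return 0.0
    return sum(1 for p in pens if _is_valid_pen(p)) / len(pens)

def meets_quality_threshold(pens, threshold=0.80):
    """Return True when the share of valid PENs reaches the threshold."""
    return valid_pen_share(pens) >= threshold
```

Missing values (`None` or empty strings) count as invalid here, so the same number reflects both completeness and syntax. Whether that is appropriate depends on how the institution defines the metric.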
33Test 2 Results
This Institution Meets the Quality Threshold
34Applying Metrics
Test 3: No Duplicate PENs
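One way to sketch a duplicate check is to group student records by PEN and keep only the groups with more than one member. The record layout (`student_id` and `pen` keys) is assumed for illustration. Records with a missing PEN are skipped because they belong to the completeness test, not this one.

```python
from collections import defaultdict

def duplicate_pens(records):
    """Return {pen: [student_id, ...]} for every PEN used by more than one record."""
    by_pen = defaultdict(list)
    for rec in records:
        pen = rec.get("pen")
        if pen:  # skip missing or empty PENs
            by_pen[pen].append(rec["student_id"])
    return {pen: ids for pen, ids in by_pen.items() if len(ids) > 1}
```

Keeping the student identifiers alongside each duplicated PEN gives the additional detail needed to tell a data entry slip from a systematic loading problem.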
35Test 3 Results
This institution has a BIG problem! Can we see
more details?
36Test 3 Results
Additional information reveals data loading problems
37Reviewing Results
- Systematic approach needed
- Develop strategy for data cleaning
- Identify source of data problems
Deal with Disparate Data Shock!
38Reviewing Results
- Insert a quality review checklist
39Reviewing Results
40Data Cleansing
- Location
- Administrative System?
- Staging Area?
- Who
- Scope
41Typical Data Cleansing
- Correcting data entry errors
- Removing or correcting nonsensical dates
- Deleting garbage records
- Combining or deleting duplicates
- Updating and applying code sets
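Several of these cleansing steps can be combined in a single pass over the data. The sketch below is illustrative only: the field names (`student_id`, `birth_date`, `status`), the code mapping, and the plausible date window are hypothetical, and a real cleansing routine would be driven by the institution's own standards and business rules.

```python
from datetime import date

# Hypothetical mapping from legacy local codes to a standard code set.
CODE_MAP = {"FT": "FULL_TIME", "F": "FULL_TIME",
            "PT": "PART_TIME", "P": "PART_TIME"}

# Assumed lower bound for a plausible birth date.
EARLIEST_PLAUSIBLE = date(1900, 1, 1)

def cleanse(records, today):
    """Return cleaned copies of records; the input list is left unchanged."""
    cleaned, seen = [], set()
    for rec in records:
        # Garbage record: no student identifier at all, so drop it.
        if not rec.get("student_id"):
            continue
        rec = dict(rec)  # work on a copy
        # Nonsensical date: outside the plausible window, so mark it missing.
        birth = rec.get("birth_date")
        if birth is not None and not (EARLIEST_PLAUSIBLE <= birth <= today):
            rec["birth_date"] = None
        # Apply the code set; unknown codes are kept as-is for manual review.
        status = rec.get("status")
        rec["status"] = CODE_MAP.get(status, status)
        # Exact duplicates (after standardizing) are dropped.
        key = (rec["student_id"], rec.get("birth_date"), rec["status"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```

Nonsensical dates are set to missing, not guessed at, so that they surface in the completeness metrics and can be corrected at the source.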
42Business Practice Change
- Two components
- Implementing changes to improve data quality
- Adopting ongoing data quality review process
Changing Business Practices is a Challenge: Get
Stakeholder Support
43Business Practice Change
- Education
- Centralizing responsibility for codes
- Consolidating data collection
- Implementing validation routines
- Changing business processes
44Quality Review Process
- Review data regularly
- Make someone responsible
- Establish procedures for correcting data problems
- Communicate quality improvements
45Some Changes in BC
- Creation of Data Manager position, responsible
for code sets, data quality
- Regular education for registration clerks and
other data creators
- Established relationships between data creators
and users
- Re-engineered administrative systems
46Improvements to BC Data
- Improved data quality and quantity
- Nonsensical dates almost eliminated
- Completeness of key elements improved (from 50%
to 80-90%)
- Data now being collected for CE in standard format
47Final Thoughts
- Quality Data are Probable if you are willing to
- Take a critical look at your existing data
- Implement changes to how you collect and manage
data
- Invest the time to educate and communicate with
data users and creators
- Make data quality improvement an on-going process
48Recommended Reading
- Brackett, Michael H., Data Resource Quality:
Turning Bad Habits into Good Practices (New
York: Addison-Wesley, 2000)
- English, Larry P., Improving Data Warehouse and
Business Information Quality (New York: John
Wiley and Sons, 1999)
- Redman, Thomas C., Data Quality for the
Information Age (Boston: Artech House, Inc., 1996)
49Thank You!
Presentation Available At: www.ceiss.org or
evannan_at_ceiss.org