Title: Data Quality and Ensuring Usability of routinely collected PC data
1 Data Quality and Ensuring Usability of routinely collected PC data
- Presented to
- Integrating Clinical and Genetic Datasets: Nirvana or Pandora's Box
- Presented by
- Simon de Lusignan
- slusigna_at_sgul.ac.uk
2 About me
- GP in Guildford
- 11,500 patient practice
- 6.5 Whole time equivalent GPs
- Computerised since 1988
- Senior Lecturer, St. George's
- Primary Care Informatics (PCI) research group
- Using routinely collected data for quality improvement and research
- Electronic libraries
- Computer in the consultation
- Telemonitoring
- Chair PCI WG of EFMI
- Developing a BSc in BMI
3 Overview
- Introduction
- Benefits from linking clinical and genetic data
- Growing volumes of accessible primary care data increasingly used for quality improvement and research
- Objective
- Is it possible to define the features of a routinely collected dataset which can be integrated with genetic data?
- Method
- Literature review and 10 years of experiential learning working with data
- Features of quality data
- What is data quality?
- Unique identifiers and denominators
- What needs to be defined about data processing and storage
- Discussion
4 Introduction
- GIVEN: Benefits from linking clinical and genetic data
- Routinely collected clinical data is used increasingly for
- Quality improvement
- Clinical Audit
- Health Service Planning
- Research
References
1. de Lusignan S, van Weel C. The use of routinely collected computer data for research in primary care: opportunities and challenges. Fam Pract. 2006 Apr;23(2):253-63.
2. de Lusignan S, Hague N, van Vlymen J, Kumarapeli P. Routinely collected general practice data are complex but with systematic processing can be used for quality improvement and research. Accepted for publication, Informatics in Primary Care.
5 Objective
- To define the features of clinical data which make them fit for integration with genetic data
6 Features of quality data
- Defining data quality
- Unique identifiers
- Defined process of data extraction and storage
7 Defining data quality
- Evolving definitions
- Completeness and accuracy (Pringle et al., BJGP 1995)
- Currency (Williams, Methods 2003)
- Sensitivity and positive predictive value (Thiru et al., BMJ 2003)
- Data Quality Probe (Brown and Warmington, IPC 2003)
- Fit for purpose (PCI WG EFMI, 2005)
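Two of the measures listed above, sensitivity and positive predictive value, have standard definitions that can be computed once recorded diagnoses have been compared against a reference standard. The following is a minimal sketch; the function names and the counts are illustrative assumptions and are not taken from the cited papers.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of patients who truly have the condition and are coded as having it."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of patients coded as having the condition who truly have it."""
    return true_pos / (true_pos + false_pos)


# Illustrative counts only (hypothetical, not from any study):
# 80 correctly coded cases, 10 coded in error, 20 true cases with no code.
tp, fp, fn = 80, 10, 20
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.2f}")
```

A record can score well on one measure and poorly on the other, which is one reason a single "accuracy" figure is a weak description of data quality.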
8 Unique IDs
- Linkage of data
- Interoperability of systems
- Follow-up / traceability of individuals
- Population denominator and ghost patients
- England and Wales - NHS number
- Scotland - CHI number
- Our system
- MIQUEST unique ID for one practice
- compound with study number
- unique ID for practice
- Convert to case-insensitive ASCII format
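One way to read the scheme above is sketched below: a patient reference that is unique only within one practice is combined with a study number and a practice ID, and is encoded so that two references differing only in letter case cannot collide in a case-insensitive tool. This is an assumption-laden illustration, not the system described in the slides; `case_safe`, `compound_id`, the hex encoding, the `-` separator and the sample values are all hypothetical.

```python
def case_safe(text: str) -> str:
    """Encode each ASCII character as two hex digits, so case differences survive.

    'aB' and 'Ab' would be equal under a case-insensitive comparison;
    their encodings '6142' and '4162' are not.
    """
    return text.encode("ascii").hex().upper()


def compound_id(study_no: str, practice_id: str, patient_ref: str) -> str:
    """Build one identifier that is unique across practices and studies.

    patient_ref stands for the per-practice reference (assumed case-sensitive).
    """
    return "-".join([study_no, practice_id, case_safe(patient_ref)])


# Hypothetical values for illustration
print(compound_id("S01", "P123", "aB"))
print(compound_id("S01", "P123", "Ab"))
```

The encoding is reversible with `bytes.fromhex(...).decode("ascii")`, so the original per-practice reference can still be recovered where that is permitted.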
9 Processing data
- Appreciation of data entry issues and the contemporary perspective of system users
- Defined stages of data processing, the applications used at each stage, and quality controls
- Archive coding systems and the look-up tables used to infer meaning or rubrics
- The queries used to extract the data
- A metadata system to ensure traceability of each cell of data
- The ethical constraints that apply to the dataset
10 (1) Data entry issues and the contemporary perspective of users
- COPD and Bronchitis codes are easily confused
- Recoding half of the practice's asthmatics from a diagnosis code to a "history of" code
Ref: Faulconer ER, de Lusignan S. An eight-step method for assessing diagnostic data quality: COPD as an exemplar. Inform Prim Care. 2004;12(4):243-54.
11 (2) Defined stages of data processing
- We have defined eight discrete steps in data processing
- (1) Design of queries, piloting,
- (2) Data entry (already dealt with),
- (3) Extraction,
- (4) Migration (unique IDs essential),
- (5) Integration,
- (6) Cleaning,
- (7) Processing, and
- (8) Analysis
Ref: van Vlymen J, de Lusignan S, Hague N, Chan T, Dzregah B. Ensuring the Quality of Aggregated General Practice Data: Lessons from the Primary Care Data Quality Programme (PCDQ). Stud Health Technol Inform. 2005;116:1010-5.
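The eight steps above lend themselves to an explicit, ordered record of what was done at each stage. The sketch below is one hypothetical way to hold that record; the `Stage` and `ProcessingLog` names, the rule that stages may not be logged out of order, and the sample entries are assumptions for illustration rather than the PCDQ implementation.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The eight processing steps, numbered as in the list above."""
    DESIGN = 1
    DATA_ENTRY = 2
    EXTRACTION = 3
    MIGRATION = 4
    INTEGRATION = 5
    CLEANING = 6
    PROCESSING = 7
    ANALYSIS = 8


class ProcessingLog:
    """Records the application used and the quality check applied at each stage."""

    def __init__(self) -> None:
        self.entries: list[tuple[Stage, str, str]] = []

    def record(self, stage: Stage, application: str, check: str) -> None:
        # Assumed rule: a stage cannot be logged after a later one has been logged.
        if self.entries and stage <= self.entries[-1][0]:
            raise ValueError(f"{stage.name} logged after {self.entries[-1][0].name}")
        self.entries.append((stage, application, check))


# Hypothetical entries for illustration
log = ProcessingLog()
log.record(Stage.DESIGN, "query editor", "queries piloted in one practice")
log.record(Stage.EXTRACTION, "extraction tool", "row counts compared with practice list size")
for stage, application, check in log.entries:
    print(int(stage), stage.name, application, check, sep=" | ")
```

Keeping the application and the quality control next to each stage is what makes a later audit of the results possible.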
12 (3) Archive coding systems
- Coding systems are constantly evolving
- In general, coding systems are becoming larger and more complex
- You can go from many to few but not from few to many
- We archive the clinical codes and the look-up engine used
- e.g. NHS Triset Browser
- Each relevant version
- e.g. 4- and 5-Byte Read Codes, Drug Dictionary, proprietary codes
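The "many to few but not few to many" point can be shown with a small look-up table: mapping detailed codes onto a coarser scheme is a simple function, but the reverse has no single answer. The codes below are hypothetical placeholders, not real Read codes, and the function names are assumptions for illustration.

```python
# Hypothetical codes: three detailed codes map onto two coarse ones.
DETAILED_TO_COARSE = {
    "D001": "C01",
    "D002": "C01",
    "D003": "C02",
}


def to_coarse(detailed_code: str) -> str:
    """Many to few: every detailed code has exactly one coarse equivalent."""
    return DETAILED_TO_COARSE[detailed_code]


def candidates_for(coarse_code: str) -> list[str]:
    """Few to many: a coarse code can only be narrowed to a set of candidates."""
    return sorted(d for d, c in DETAILED_TO_COARSE.items() if c == coarse_code)


print(to_coarse("D002"))
print(candidates_for("C01"))  # more than one candidate, so the detail is lost
```

This is why the version of the coding system and look-up table in use at extraction time needs to be archived: the original meaning cannot be rebuilt afterwards from a coarser mapping.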
13 Example of look-up engine
14 (4) The query library
- Re-issued by date
- Query set for each clinical programme
- e.g. C1, C2, C3 Cardiac programme
- Query set for each extraction type
- e.g. E4, E5, G4, G5 (E for EMIS, G for Generic)
- Defined look-up tables and rubrics for queries
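The extraction-type labels above can be decoded mechanically. The sketch below assumes that the letter names the system (E for EMIS, G for Generic, as stated above) and that the digit is the Read code length in bytes (4 or 5); the second point is an inference from the "EMIS 5-Byte set" title and is not stated explicitly. `ExtractionType` and `parse_extraction_type` are hypothetical names.

```python
from dataclasses import dataclass

SYSTEMS = {"E": "EMIS", "G": "Generic"}


@dataclass(frozen=True)
class ExtractionType:
    system: str      # "EMIS" or "Generic"
    read_bytes: int  # assumed meaning of the digit: 4- or 5-Byte Read codes


def parse_extraction_type(label: str) -> ExtractionType:
    """Decode a two-character label such as 'E5' into its parts."""
    if len(label) != 2 or label[0] not in SYSTEMS or label[1] not in ("4", "5"):
        raise ValueError(f"unrecognised extraction type: {label!r}")
    return ExtractionType(system=SYSTEMS[label[0]], read_bytes=int(label[1]))


for label in ("E4", "E5", "G4", "G5"):
    print(label, parse_extraction_type(label))
```

Rejecting unknown labels outright, rather than guessing, keeps a mistyped query set from being silently run against the wrong coding scheme.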
15 The query library
16 The C2 queries
17 The C2 EMIS 5-Byte set
18 (5) Metadata system
- Follows data from query set to analysis
- Preserves original data
- Derived variables clearly identified
- Associated dates and numerics labelled
- Rules for units used
- Look-up table used to define variable names
Ref: van Vlymen J, de Lusignan S. A system of metadata to control the process of query, aggregating, cleaning and analysing large datasets of primary care data. Inform Prim Care. 2005;13(4):281-91.
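The properties listed above (original data preserved, derived variables flagged, units recorded, traceability back to the query) can be illustrated with a small metadata catalogue. This is a hypothetical sketch and not the metadata system described in the cited paper; the field names, query IDs and variables are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableMetadata:
    name: str
    query_id: Optional[str]          # query that extracted it; None if derived
    derived: bool = False
    unit: Optional[str] = None
    derived_from: tuple[str, ...] = ()


def source_variables(name: str, catalogue: dict[str, VariableMetadata]) -> list[str]:
    """Trace a variable back to the original extracted variables it depends on."""
    meta = catalogue[name]
    if not meta.derived:
        return [name]
    sources: list[str] = []
    for parent in meta.derived_from:
        for source in source_variables(parent, catalogue):
            if source not in sources:
                sources.append(source)
    return sources


# Hypothetical catalogue: two extracted variables and one derived from them.
catalogue = {
    "weight_kg": VariableMetadata("weight_kg", query_id="Q1", unit="kg"),
    "height_m": VariableMetadata("height_m", query_id="Q2", unit="m"),
    "bmi": VariableMetadata("bmi", query_id=None, derived=True, unit="kg/m2",
                            derived_from=("weight_kg", "height_m")),
}
print(source_variables("bmi", catalogue))
```

Because derived variables carry their lineage, any value in the analysis dataset can be followed back to the queries that produced its inputs.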
19 Source data and metadata structure
20 Linking elements: Query library, Query, Core Clinical Concept, Read code
21 Core clinical concept (CCC)
22 Automation
23 (6) Ethics
- The ethical constraints on any dataset are indexed in the query library
25 Summary
- Data quality is best defined in terms of
- Fitness for purpose
- What purpose, and when?
- Transparent methods of data processing allow audit of results
- Understanding data entry issues / context is essential
- Metadata can help control processing
- Careful curation of data may allow its use beyond the timescale of the original study
26 Thanks for listening
- Simon de Lusignan
- Tel 020 8725 5661
- Fax 020 8767 7697
- Email slusigna_at_sgul.ac.uk
- Web www.gpinformatics.org
- www.sgul.ac.uk/informatics/