Title: Data Collection and Management for Clinical Research
1Data Collection and Management for Clinical
Research
- Michael A. Kohn, MD, MPP
- 25 August 2008
2Clinical Research
- Choose the study design, and define the study
population, predictor variables, and outcome
variables - measure these variables and anticipate problems
with measurement - analyze the results
- Today, we discuss the nitty gritty of
collecting, storing, updating, and monitoring the
study measurements.
Private companies that make data management
systems for clinical research understand
clinical research to include only RCTs
preparatory to FDA drug or device approval, not
observational studies.
3Outline
- Paper data collection forms and data tables (rows
and columns) - Normalization of tables
- Online data collection forms
- Data entry options for coded responses
- Questionnaires
- Data management protocol
- Queries (?)
4Paper Forms
5One-Sentence Description of Study
- The cute acronym study is a design study of
the association between predictor and outcome
in study population.
6PHTSE
- The PHTSE (Pre-Hospital Treatment of Status
Epilepticus) Study was a randomized blinded trial
of lorazepam, diazepam, or placebo in the
treatment of pre-hospital status epilepticus.
The primary endpoint was termination of
convulsions by hospital arrival. To enroll
patients, paramedics contacted base hospital
physicians by radio.
Lowenstein DH, et al. Control Clin Trials. 2001
Jun22(3)290-309.
Alldredge BK, et al. N Engl J Med. 2001 Aug
30345(9)631-7.
7(No Transcript)
8(No Transcript)
9Display the data from these 2 data collection
forms in a 2-row data table.
10Create a 9-field data dictionary for the data
table.
11The paper data collection forms were completed by
busy base hospital physicians who were called
from the emergency department to a radio room.
What are the advantages and disadvantages of
using an on-screen computer form instead of a
paper form? If you designed the study, which
would you use?
12Advantages of an on-screen form
- No need for transcription from paper forms into
the computer data tables - Immediate feedback on invalid entries
- Programmed skip logic (if seizure stopped during
prehospital course, computer form prompts for
time seizure stopped, otherwise this field is
disabled and skipped) - Can be made available via a web-browser at
multiple sites simultaneously
13Disadvantages of an on-screen form
- Hardware requirement -- a computer workstation
- Some user training required
14Advantages of a paper form
- Ease and speed of use
- Portability
- Ability to enter unanticipated information or
unstructured data (notes in the margin, responses
that were not otherwise considered, etc.) - Hardware requirement a pen
- User training received by all data entry
personnel in elementary school
15Disadvantages of a paper form
- Requires subsequent transcription into the
computer database - No interactive feedback or automated skip logic
- Data viewing and entry limited to one person in
one place
16Which would you choose?
- Although data entry via on-screen data collection
forms has many advantages and we recommend it for
most research studies, in this study it is
impractical. The simplest, fastest, and most
user-friendly way to capture data on a
non-volatile medium is still to use a pen and
paper.
17Paper Data Collection Forms
- Paper forms filled out by an examiner or
interviewer - Paper forms filled out by the patient or subject
- How do you get the data off of the paper forms
and into the data table(s)? - Study data are stored in one or more data tables
in which
18Data Tables
- Study data are stored in one or more data tables
in which - Rows Records Entities
- Columns Fields Attributes
19Rows Records Entities
- Columns Fields Attributes
20Jaundice and Infant Feeding Study Cohort study to
determine the association between neonatal
jaundice or feeding disorders and 5-year
neuropsychological scores in children born at
Northern California Kaiser Hospitals.
Newman TB, et al. N Engl J Med. 2006 May
4354(18)1889-900.
21Infant Jaundice Study (Our fictional version of
JIFee)
Methods Design-Double Cohort study. Setting-Sing
le, urban medical center Subjects-Infants with
neonatal jaundice and randomly selected
non-jaundiced infants Predictor Variable-Presence
or absence of jaundice Outcome Variable- IQ score
(ranging from 55 to 145) at age 5 years Analysis-
?
22(No Transcript)
23Where is the Predictor Field?
- What is the predictor in the Infant Jaundice
Study? - Where is it on the data collection form?
24Demonstration Creating a Data Table
Label columns and enter rows of data in datasheet
view
25Demonstration Data Dictionary
- Table design view
- field (column) names,
- data types,
- definitions,
- validation rules
- (More on data types, free-text vs. coded
responses, later)
26(No Transcript)
27Demonstration Data Validation
Disallowed values
Duplicate primary keys
28Acceptable table showing one set of exam results
per participant. (BabyExamForFigure3)
29Demonstration Same Table in Excel, Stata
- Excel
- Stata
- Etc
- Rows Records Entities
- Columns Fields Attributes
Access and Stata have a special row at the top
for column headings (field names) Excel just
uses the first row.
30Table of Study Subjects
Table of Study Subjects Row Individual
Infant Columns ID, Name, DOB, Sex, Jaundice If
some infants have more than one exam, what do you
do?
31Undesirable table showing multiple exam results
per study participant. (BabyExamForFigure4)
32Demo
- Find highest IQ Score
- Find all exams done in April
33Common Error
- If you find yourself creating multiple columns
for the same measurement, e.g., Date1, Score1,
Date2, Score2, Date3, Score3, - Or if your table is more than about 30 columns
wide, - It is time to restructure your table.
34Undesirable table with participant-specific data
duplicated for each exam. (Note problem with
Helens DOB.) (ExamBabyForFigure5)
35Demo
- Find highest IQ Score
- Find all exams in a particular month
- What happened to Alejandro, Ryan, Zachary, and
Jackson?
36Normalization
If some infants have multiple exams, normalize
the records into two tables, one for subjects and
one for examinations.
37Data normalized into two tables one (Baby)
with rows comprising subject-specific
information the other (Exam) with rows
comprising exam-specific information. Note that
Helen can only have one birth date. Subjects
with no exams, e.g. Alejandro, still appear in
the database. SubjectID functions as the
primary key in the Baby table and as the
foreign key in the Exam table.
38Figure 7. Relationships diagram showing the
one-to-many relationship between the table of
subjects (Baby) and the table of measurements
(Exam).
39For more on normalization and the Relational
Database Model,
Relational Database Management Systems for
Clinical Research, UCSF EPI 218 Winter 2008 (1
unit)
- http//www.epibiostat.ucsf.edu/courses/schedule/da
ta_management.html
40Start with Data Tables or Data Collection Forms?
- It doesnt matter as long as the process is
iterative. - Can start with the tables and then develop the
forms, test the forms, find problems, and update
the tables. - Can start with a word-processed form, create the
tables, test, and update.
41Before seeking help with data management
- Search the internet and ask other researchers for
already developed data collection forms. - Draft your data collection form.
- Test your data collection form with dummy
subjects and, even better, with real
(de-identified) study subjects. - Enter your test data into a data table with rows
corresponding to subjects and columns
corresponding to data elements. (Use Excel,
Access, Stata, or even Word.) - Create or at least think about a data dictionary.
- Decide who will collect the data, and when/how
the data will be collected.
42On-Screen Data Collection Forms
- Will demonstrate using the race field from the
Infant Jaundice Study - Free text versus coded response
- Single response (mutually exclusive choices)
versus all that apply
43Free Text vs. Coded Responses
- Same as Open-Ended vs. Closed-Ended Questions
- Free text responses useful in developing coded
response options.
44Mutually Exclusive, Collectively Exhaustive
Response Options
- One field (column)
- Can always make responses exhaustive by including
an Other response - Drop down list (combo box) vs. pick list (field
list) vs. option group
45Drop-down List (Combo Box)
- Saves screen real estate
- Doesnt work on paper forms
(Master form)
46Combo Box
47Combo Box
48Pick List (Field List)
- Uses up screen real estate
- Useful on paper forms
(MasterRaceAsFieldList form)
49Field List
50Option Group
- Radio buttons (by convention)
- Uses up screen real estate
(MasterRaceAsOptionGroup form)
51Option Group
52Mutually Exclusive One Field
53All that apply
- Multiple fields ( columns)
- Use check boxes (by convention)
(MasterRaceAsAllThatApply form)
54All That Apply
55All that Apply Multiple Fields
56From Paper Data Forms to Data Table(s)
- Transcription directly into the table(s)
- Transcription via an online (screen) form
- Scanning using OMR software
Best option Dont use paper data collection
forms at all (more on this later).
57NHAMCS
58On-screen vs. paper forms
- Minimize the extent to which study measurements
are recorded on paper forms. -
- Enter data directly into the computer database or
move data from paper forms into the computer
database as close to the data collection time as
possible. - When you define a variable in a computer
database, you specify both its format and its
domain or range of allowed values. Using these
format and domain specifications, computer data
entry forms give immediate feedback about
improper formats and values that are out of
range. The best time to receive this feedback is
when the study subject is still on site. - You can always print out a paper copy of the
screen form or a report of the exam/interview
results once the data are collected. - Examples ATM Machines printed transaction
record, Gas Stations printed receipt (but
neither of these is going to be accessed or
updated by a research assistant so the printout
may be less important).
59Questionnaires
- Special type of data collection form
- Patient history (interview), not physical exam or
diagnostic test results - Often put a premium on branching logic.
Requires a subject who can talk. Cant
administer a questionnaire to infants or to
intubated/comatose patients.
See DCR 3rd Ed. Chapter 15
60Clinical research studies consistently under-plan
and under-budget for data management.
- One FTE research assistant, including benefits,
will cost your study 100k/year you should be
willing to spend one tenth as much for consulting
on your database system. - Dont leave data checking, validation, and
cleaning for the biostatistician - a) it will be too late
- b) biostatisticians do not view data cleaning
as part of their job.
61Data Management Protocol
- General description of database (including
relationships diagram) - Data collection and entry
- Error checking and data validation
- Analysis (e.g., export to Stata)
- Security/confidentiality
- Back up
62General Description of Database
- DBMS, e.g. MS Access XP
- of dynamic tables
- of static lookup tables
- of forms
- of reports
- An appendix should include the relationships
diagram, the table names and descriptions, and
the field names and descriptions (data
dictionary).
63Data Collection and Entry
- Import baseline data from existing systems
- Import lab results, scan results (e.g. DEXA),
holter monitor data, and other digital data. - For each form, who will collect the data?
- Collect onto paper forms and then transcribe?
Enter directly using screen forms? Scannable
forms?
64Error Checking and Validation
- Database automatically checks data against the
range of allowed values. - Periodic outlier detection. (Outliers still
within the range of allowed values.) - Calculation checks
- Double data entry?
65Analysis
- How will you get the data out of the database?
66Security/Confidentiality
- Keep identifying data (name, SSN, MRN) in a
separate table. - Link rest of DB to this table via a Subject ID
that has no meaning external to the DB. - Restrict access to identifying data.
- Password protect at both OS and application
levels. - Audit entries and updates.
67Back ups
- Ask your system person to restore a file
periodically. This tests both the back-up and
restore systems.
68Assignment?
Write a one-page data management section for your
research study protocol.
69Queries
70Select Queries
- Select queries (aka Views) organize, sort,
filter, and display data. - Queries use Standard Query Language (SQL), but
you dont have to learn it, because of graphical
query design tools. - A query can join data from two or more tables,
display only selected fields, and filter for
records that meet certain criteria.
71Demonstration
- Age in months and BMI at exam of subjects who
were examined in January and February of 2010.
72Select Queries Produce Table-Like Results
- Note that the result of a select query that joins
two tables, displays only certain fields, selects
rows based on special criteria, and calculates
age and BMI still looks like a table in datasheet
view. - But, remember that it is merely a view of data
from the underlying tables.
73Action Queries Change Data
- Update Query -- changes the values of specific
fields in existing records - Append Query -- adds new records (rows) to a
table - Delete Query -- deletes records from a table