Title: Collecting
1Collecting Managing Data
- General Clinical Research Center
- Informatics Core
- Richard Harris
- (HarrisRichardM_at_UAMS.edu)
- 501-257-5878
2Goal
- Introduction to Data Collection and Management
- Defining data
- Collecting data
- Storing data
- Retrieving data
3Data Definition
- What is data?
- Representation of facts or concepts in a manner
suitable for communication, interpretation, or
processing by humans or by automatic means - Something used as a basis for calculating or
measuring (Websters Dictionary)
4Representation of Facts
Representation of Facts
105 / 74 110 / 78 108 / 75 103 / 70
5Representation of Facts
Baghdad, Iraq
Average Temperature High / Low
105 / 74 110 / 78 108 / 75 103 / 70
Years Charted 9 Source International Station
Meteorological Climate Summary, Version 4.0
6Data to Value
Data produces information
Information
Information enhances knowledge
Knowledge drives action
Action produces outcomes
Outcomes deliver value
Dave Wells, TDWI
7Bench to Bedside
Drivers and Goals determine strategy
Strategy drives Protocols/Study design
Protocols produce Results
Positive Results produce Value
8Parallel Relationship
Dave Wells, TDWI
9Data Collection
-
- Begin with the End in Mind
- The 7 Habits of Highly Effective People
- by Stephen R. Covey
10General Considerations
- Determine
- What data you will collect
- Why you will collect it
- How you will collect it
- What you will do with it
11Data Collection
- Data Dictionary (Code Book, Metadata)
- What is the purpose of your protocol
- Can you explain your hypothesis in numerical
terms - List of fields/data points to collect
- Subject Name, demographic information
- Diagnosis, medication, lab results
- Relationship of data collected
12Data Dictionary Example
13Data Structure Considerations
- Use short, meaningful field names with no spaces
and no special characters (_at_!) - Store a 4 digit year, not 2
- Determine data type
- Will I want to derive averages, sums, StdDev?
(One, 1, 1) - Contemplate subject or sample ID numbering
scheme. (HIPAA) - Store identifying data in a different place but
linked. - Make users pick from a list when possible for
more uniform data entry - DO NOT put first and last name in same field!
- Height, Weight, Medication Dosage, etc.
14For note takers
Determine all relevant data to be
collected. Create a Data Dictionary
(Metadata). De-Identify data. Backup data
regularly.
15Storing Data
- A database is a collection of data organized in a
manner that allows access, retrieval, and use of
that data. - A stack of paper research subject data forms
could be considered a database. - Databases vary from the very simple to the very
complex
16Data Access
- Retrieving information, such as a list of
subjects with a specific diagnosis, from a stack
of hundred subject forms would be laborious.
Electronic databases make data management much
easier!
17Excel Example
18Excel as a Database
- An electronic spreadsheet can be considered a
database. - Most spreadsheets are not
Relational
19Relational Database
20Relational Database
Lab Table keySubjectID/Lab
Visit Table keySubjectID/Visit
More efficient compared to one spreadsheet!
21Table Example
Data Types SubjectID Numeric, Integer,
Text SubjectBirthYear Date (Date functions
built-in) Race / Ethnicity Make selection box
with NIH values
22Collection - Data Entry Form
Modeled to represent Collection Forms or
Flowsheets
23Collection - Data Entry Form
24Datasheet or Table View
25Importance of Relationships
- Subject Table each subject has one entry
- Visit Table - entry for each visit
- LabResults Table entry for each lab test
- Meds Table- entry for each medication
26Table Data
Table structures presented as Flat files
27Importing Data
- Importing supported from
- Access
- Excel
- Lotus
- Text
- Other file formats
28Relational Database vs Spreadsheet
- Unlike a spreadsheet a relational database can
help insure that - visit data not collected without subjects basic
(or root) data being present. - duplicate data not entered for the same subject.
- data not redundantly entered.
29Advantages of Relational Databases
- You define the relationships among tables.
- Each piece of information is stored in one place.
- Reduces error since changes to information occur
in only one place. - Easy to revise the structure of the data base.
30Database Management System (DBMS) Software
- Excel (?)
- Small Database Software
- Microsoft Access, FoxPro, mySQL
- Midsize Database Software
- Microsoft SQL Server, mySQL, PostgreSQL
- Large (user base/data set) Software
- Oracle, DB2, Sybase
31Query to Retrieve Data
QUERY question to the database to locate
specific information
Examples
- List all subject IDs with first visit after
April, 2004 - List the minimum, the maximum, and the average
pulse rate for subject group - List all subject IDs of all females receiving
DrugX - List all subjects and phone numbers in
alphabetical order by subject last name, then by
first name - List all subjects IDs whose blood pressure was
higher on their last visit than it was on the
their first visit and give the number of days
between those visits
32Structured Query Language (SQL)
- SELECT tblSubject.SubjectID, tblSubject.SubjectBir
thYear, tblSubject.SubjectGender,
tblSubject.SubjectEthnicity, tblVisit.VisitDate,
tblVisit.Systolic, tblVisit.Diastolic - FROM tblSubject
- INNER JOIN tblVisit ON
- tblSubject.SubjectID tblVisit.SubjectID
33Structured Query Language (SQL)
- SELECT tblVisit.SubjectID, tblVisit.VisitDate,
tblVisit.VisitReason - FROM tblVisit
- WHERE (((tblVisit.VisitDate) Between 4/15/2003
And 5/15/2003)) - ORDER BY tblVisit.VisitDate
34Query Result
35Common Query and SQL Shorthand
- or GT,
- or EQ, or NE
- IS NULL (-99)
- AND, OR, NOT
- BETWEEN
- and ? (with LIKE)
- Parentheses usage
36Structured Query Language (SQL)
- SELECT tblSubject.SubjectID, tblVisit.VisitDate,
tblSubject.SubjectGender, tblVisit.Systolic,
tblVisit.Diastolic, tblVisit.Pulse,
tblVisit.Respiration - FROM tblSubject INNER JOIN tblVisit ON
tblSubject.SubjectID tblVisit.SubjectID - WHERE ((tblSubject.SubjectGender"1") AND
(tblVisit.Systolic150) AND (tblVisit.Diastolic)) OR (tblVisit.Pulse90)
37Query Results
38Structured Query Language (SQL)
- SELECT tblSubject.SubjectID, tblVisit.VisitDate,
tblSubject.SubjectGender, tblVisit.Systolic,
tblVisit.Diastolic, tblVisit.Pulse,
tblVisit.Respiration - FROM tblSubject INNER JOIN tblVisit ON
tblSubject.SubjectID tblVisit.SubjectID - WHERE ((tblSubject.SubjectGender"1") AND
(tblVisit.Systolic150) AND (tblVisit.Diastolic)) OR (tblVisit.Pulse90)
39Query in Design View
40Analysis in Access
- Queries
- Sort
- Inclusion/Exclusion
- Boolean Logic
- Calculated values
- Cross tab results
- Pivot tables
- Reports
- Additional calculated values
- Visual Basic for complex calculations
41Reporting / Analysis
Queries produce a subset of records Reports
format and present the subsets Examples Query
the database for list of subjects WHERE vital
signs above or below expected values Format the
report in alphabetical order or date sequence or
descending order by specified field Format the
last and first name together with a comma
separator
42Reporting / Analysis
43Analysis - Report Design Example
44Common Reporting Needs
- Count number of records meeting criteria
- Calculate sums, averages, totals, standard
deviation and specialty specific equations - Accounting of study work and recruiting
- Cross sectional analysis or representation of
study findings - Statistical Analysis
45Regular Interim Reporting
- Reports should be generated early and often in
order to identify problems in data entry or
collection. - Simple summary statistics on variables helpful in
identifying outliers or nonsensical results. - A schedule should be followed.
46Exporting Data
Options Access Excel HTML Lotus Text RTF Others
47Why Export?
- Analyze data in a more robust program
- Analyze data in a program familiar to you
- Interface with a new or specialized program
- No re-keying increases data integrity
- No re-keying saves time!
48Export to Excel
- From Access, export to Excel using Save
As/Export - From Excel, open the file
- Data sorts
- Functions
- Pivot Table
- Data Analysis
- Charts
49Why Collect, Store, Retrieve Data?
- Quantifying the results of your protocol
- Explaining your hypothesis in numerical terms
- Statistical Analysis
- Progress Reports
- Publications, Publications, Publications!
- Even if findings do not support your hypothesis
- Data Sharing and Knowledgebase for collaborations
and future projects - And, Publications, Publications, Publications
50What can the GCRC do for you?
- Database Design Consultation
- Database Storage on Server
- Backup ?
- Data Translation Assistance
- Questions?