Title: Database Design Techniques for Clinical Research
1Database Design Techniques for Clinical Research
- Melissa K. Carroll, M.S.
- October 20, 2003
2Overview
- Relational database design
- Implementing a relational database in Microsoft
Access - Designing a database for a typical study by our
group - Resources
- Questions
3Relational Database Design
4What is a Database?
- Collection of data organized for efficient
operations - Everyone uses them multiple times a day, often
without realizing it - Examples
- Airline reservations
- Online shopping
- Underlying design principles are largely universal
5Problems with Flat Files
- Data Redundancy
- Leads to more work and inconsistencies
- Wreaks havoc on performing basic manipulations
such as searching and sorting - File Management
- Multiple files
- Concurrent users
- Security
- Intermediate results
- Ad hoc programming (reinventing the wheel)
6Data Modeling Entity-Relationship Model
- Models data as entities, with attributes, and
relationships between entities - Entity person, place, or thing
- Instance example of an entity
- Attribute feature of an entity
- Relationship describes association between
(usually two) entities
7E-R Notation
8Designing an E-R Diagram
- Issues to Consider
- What questions will the data be used to answer?
- What are the entities and how do they relate to
each other? - What attributes uniquely identify entities?
- What attributes need to be sub-divided?
- Goal Eliminate Redundancy
- Process is called normalizing data
9Sample E-R Diagram MP3 Files
10Relational Model
- Lower-level model used for actual database
implementation - Translating from E-R model
- Entities become tables
- Attributes become fields
- Many-to-many relationships become tables
- Unique identifiers from involved tables as fields
- Unique identifiers from one sides are added as
fields to corresponding many sides
11Relational Database Management Systems (RDBMSs)
- Database Management System (DBMS) software with
purpose of helping user design and use a database - Relational Database Management System (RDBMS)
DBMS for databases based on relational model - Most major commercial products (e.g. MS Access,
Oracle, MySQL, SQL Server)
12SQL
- Need language to tell the DBMS
- The design of the database
- Actual data to be entered
- What data to retrieve and in what format
- SQL standardized language used by almost all
major DBMSs - Standard language provides interoperability and
portability
13SQL Examples
- CREATE TABLE artist (artistID INT
AUTO_INCREMENT, artistName VARCHAR(75)) - INSERT INTO artist (artistName) VALUES (The
Beatles) - UPDATE album SET label EMI WHERE albumTitle
Abbey Road
14SQL Examples Continued
- SELECT songTitle, quality FROM song, recording
WHERE song.songID recording.songID - SELECT songTitle, quality FROM song INNER JOIN
recording ON song.songID recording.songID - SELECT albumTitle, albumAge AS releaseYear -
Date() FROM album
15SQL Examples Continued
- SELECT Count(artistID) from artist
- SELECT MAX(recording.quality) FROM artist,
recorded, recording WHERE artist.artistName
recorded.artistName and recorded.recordingID
recording.recordingID and artist.artistName
The Beatles
16Relational Database Implementation in Microsoft
Access
17Clinical Research Database Design
18Typical Simple Study
- Baseline and fixed number of follow-ups
- Subject reaches each time point only once
- Different time points have different scale
protocols - Considerable overlap in scales between time
points - Isolated from other studies
19Four Database Design Approaches
- Approach One entire assessment administration as
entity, e.g. all of baseline or all of 12 week - One table per time point, items as attributes
- Approach Two scale administrations within each
assessment as entity, e.g. 12 Week Hamilton - One table per scale per time point, items as
attributes - Approach Three scale administration as entity
- One table per scale, items as attributes
- Approach Four item as entity
- One table (theoretically)
20Evaluation of Approach One
- May seem appropriate because common format for
analysis is one record per subject - Problems
- Limited number of fields allowed in some DBMSs
- Will have many missing values
- General redundancy issues (shares with Approach
Two, to follow)
21Pros and Cons of Approach Two Versus Approach
Three
- Pros
- Horizontal format
- Flexibility for handling inter-time point scale
disparities - Cons (for simple studies)
- Data model complexity
- Table creation and modification time multiplied
- Space consumption
- More data locations (entry and retrieval
complexity) - Re-assigning to different time points
22Reassigning Scale Time Points Using Approaches
Two and Three
23Reassigning Scales Modified Approach Three
24Approach Two Cons for More Complex Studies
- Poor at handling an indefinite number of
follow-up time points - Modified Approach Three is better at handling
studies in which subjects are assessed at the
same time point multiple times - May happen due to progressing through the study
multiple times - May also happen due to e.g. being screened
multiple times
25Evaluation of Approach Four
- Pros
- Could potentially handle changes more elegantly
- Perhaps more normalized theoretically
- Cons
- Considerably harder to design entry interface
- Harder to obtain data in formats usually required
- Doesnt fix non-database problems with data
collection changes
26Databases and Datasets
- Database Collection of data organized for
efficient entry, updating, storage, and retrieval - Dataset Subset of data retrieved from database
in a format optimized for a specific reporting or
analysis purpose - Well-designed databases should facilitate
creation of datasets in any desired format - Datasets should be formatted for a particular
purpose and used only for that purpose
27Normalizing Data Within Scales Medication Data
28Normalizing Data Continued Comparison of
Medication Queries
29Multi-Study Issues To Separate or Not To Separate
- If same data will count for multiple studies
- Keeping design and data in sync
- E.g. updating all copies when data changed
- E.g. ensuring scale changes are reflected in all
tables and forms - If handling multiple, possibly isolated studies
- Keeping design in sync
- Can still use views so actual storage is
transparent to user
30Summary
- Careful planning must go into designing a
database - First step in design is to model the data
- E-R ? relational model is effective
- DBMSs, such as Access, offer tools for creating,
using, and maintaining databases - When designing clinical research databases, as
with any databases, priority should be
normalization, hence elimination of redundancy - Properly designed databases will supply data in
any format desired
31Resources
- Access Help (Help in top menu, Contents and
Index, Contents tab) - Access Database Wizard (in main menu upon
opening) - Oreilly Access Database Design Programming, 3rd
Edition - For database design theory online chapter at
http//www.oreilly.com/catalog/accessdata3/chapter
/ch04.html - Access (97/2000/etc.) Bible
- Available here not 100 accurate
- Database System Concepts Fourth Edition
(Silberschatz, Korth, Sudarshan)