Title: Collaborative Data Management for Longitudinal Studies
- Stephen Brehm
- Coauthors: L. Philip Schumm, Ronald A. Thisted
- University of Chicago
- (Supported by National Institute on Aging Grant P01 AG18911-01A1)
Agenda
1. Background on the Study
2. Problem: Data Management Deficiencies
3. Solution: Collaborative Data Management
4. Stata Programs: maketest and makedata
Background on the Study
- NIH-Funded Longitudinal Study
- Loneliness and Health
- Thousands of Measures
  - Loneliness
  - Depression
- 230 Subjects
- Repeated Yearly
Problem: Data Management Deficiencies
- Code Not Modular
  - Difficult to manage the data cleaning code
  - Limited code reuse from year to year
  - Difficult to collaborate among interns
- No Established Set of Data Cleaning Steps
  - Difficult for research assistants (turnover)
  - Inconsistent data cleaning techniques
  - Data cleaning code difficult to read
Problem: Data Management Deficiencies
[Figure: five Research Assistant boxes arranged around a central Core File Set]
Solution: Collaborative Data Management
- Process
  - Established Steps
  - File System Layout
  - Automated Tests
  - Collaboration
- Concepts
  - Module (e.g., loneliness)
  - Batch (e.g., yr1, yr2, yr3)
  - Data Certification
- Stata Programs
  - maketest
  - makedata
Solution: Collaborative Data Management
- Set of Files for Each Module
  - acquire-module.do
  - fix-module.do
  - test-module.do
  - derive-module.do
  - label-module.do
- Year-Specific: Acquire, Fix
- Shared Between Years: Derive, Test, Label (60% code reuse)
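To make the year-specific versus shared split concrete, the loop below lists the step files one module would use across three batches. The folder names (`yr1/`, `yr2/`, `yr3/`, `shared/`) and the module name `loneliness` are assumptions for illustration; the slides do not specify a directory layout.

```stata
* Hypothetical folder layout (an assumption; the slides give no paths):
*   <batch>/acquire-<module>.do and <batch>/fix-<module>.do change each year
*   shared/derive-, test- and label-<module>.do are reused in every batch
local module "loneliness"
local files ""
foreach batch in yr1 yr2 yr3 {
    foreach step in acquire fix {
        local files `files' `batch'/`step'-`module'.do
    }
    foreach step in derive test label {
        local files `files' shared/`step'-`module'.do
    }
}

* Shared steps are referenced once per batch but exist only once on disk
local uniq : list uniq files
display "`: word count `files'' references, `: word count `uniq'' distinct files"
foreach f of local uniq {
    display "`f'"
}
```

In this layout three of the five step files per batch (60%) come from `shared/`, so the 15 file references resolve to 9 distinct files.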
Stata Program: maketest
- Purpose
  - Auto-Generation of Data-Certifying Tests
- Functionality
  - Tests Variable Type
  - Checks Consistency of Value Labels
  - Verifies Existence of Variables
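The three kinds of check listed above map onto standard Stata commands. The block below shows hand-written checks of that kind against Stata's bundled auto data; it illustrates the idea only and is not the code maketest actually generates, which the slides do not show.

```stata
* Illustrative hand-written checks, NOT code generated by maketest.
* Uses Stata's bundled auto data so the example runs as-is.
sysuse auto, clear

* Verify existence of variables (confirm stops with an error if one is missing)
confirm variable make price foreign

* Test variable types
confirm string variable make
confirm numeric variable price foreign

* Check value-label consistency: foreign should carry the "origin" label,
* and every value observed in the data should have a label defined
local lbl : value label foreign
assert "`lbl'" == "origin"
levelsof foreign, local(levels)
foreach v of local levels {
    local vallab : label origin `v', strict
    assert `"`vallab'"' != ""
}
```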
Stata Program: maketest
- Syntax
  - maketest [varlist] using filename [, REQuire(varlist) append replace]
- Example
  - maketest using filename.do, replace
- Options
  - using: specifies the file to write
  - require(): requires the presence of the variables in the list
  - append: adds to an existing test .do file
  - replace: overwrites an existing .do file
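One way such a generator could work is to inspect the data in memory and write one `confirm` line per variable to a .do file. The program `sketch_maketest` below is a hypothetical, simplified stand-in, not the authors' maketest: it handles only variable existence and type, and omits require() and the value-label checks. The file name `test-auto.do` is likewise invented for the example.

```stata
* Hypothetical, simplified test generator; sketch_maketest is an invented
* name, NOT the authors' maketest. It records only existence and type.
capture program drop sketch_maketest
program define sketch_maketest
    syntax [varlist] using/ [, replace append]
    tempname fh
    file open `fh' using `"`using'"', write text `replace' `append'
    file write `fh' "* Auto-generated certification tests" _n
    foreach v of local varlist {
        * confirm ... variable checks existence and storage type at once
        capture confirm string variable `v'
        if _rc == 0 {
            file write `fh' "confirm string variable `v'" _n
        }
        else {
            file write `fh' "confirm numeric variable `v'" _n
        }
    }
    file close `fh'
end

* Usage: generate tests from data known to be correct, then run them
sysuse auto, clear
sketch_maketest make price foreign using "test-auto.do", replace
do "test-auto.do"
```

Dropping one of these variables, or changing its storage type, and then rerunning `test-auto.do` makes `confirm` exit with an error, which is what lets a generated test file certify later versions of the data.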
Stata Program: makedata
Bringing it all together
Stata Program: makedata
- Syntax
  - makedata namelist, Pattern(string) replace clear Noisily Batch(namelist) TESTonly
- Example
  - makedata ats, p("acquire-.do") b(yr1) clear replace
- Options
  - pattern(): file-naming convention
  - replace: overwrites an existing data file
  - clear: clears the current data in memory
  - noisily: shows full output (default is a summary)
  - batch(): batch, such as a year, wave, or center
  - testonly: runs only the tests step
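The options above suggest a driver that runs each module's step files for each batch and saves the result. The program `sketch_makedata` below is a hypothetical, simplified version, not the authors' makedata. The step order (acquire, fix, derive, label, test), the folder layout (`<batch>/` for year-specific files, `shared/` for reused ones), and the output file name are all assumptions, and pattern() is not modeled. It also assumes each acquire step loads its own data.

```stata
* Hypothetical, simplified driver; sketch_makedata is an invented name,
* NOT the authors' makedata. Step order and folder layout are assumptions.
capture program drop sketch_makedata
program define sketch_makedata
    syntax namelist, Batch(namelist) [clear replace Noisily TESTonly]
    if "`clear'" != "" clear
    * noisily shows each step's full output; otherwise steps run quietly
    local q = cond("`noisily'" != "", "", "quietly")
    foreach b of local batch {
        foreach module of local namelist {
            if "`testonly'" != "" {
                `q' do "shared/test-`module'.do"
                continue
            }
            * year-specific steps live in the batch folder
            foreach step in acquire fix {
                `q' do "`b'/`step'-`module'.do"
            }
            * shared steps are reused across batches
            foreach step in derive label test {
                `q' do "shared/`step'-`module'.do"
            }
            save "`b'/`module'.dta", `replace'
            display as text "`module' (`b'): all steps ran and tests passed"
        }
    }
end
```

With step files in place, `sketch_makedata ats, batch(yr1) clear replace` would mirror the example above. Because `do` stops at the first error, a failing `confirm` or `assert` in the test step prevents the dataset from being saved, which is one simple way to tie saving to certification.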
Other Applications
- Beyond Longitudinal Data
- Teaching Data Cleaning with Stata
Contact Information
- Stephen Brehm
  - sbrehm_at_uchicago.edu
- L. Philip Schumm
  - pschumm_at_uchicago.edu
- Ronald A. Thisted
  - thisted_at_health.bsd.uchicago.edu