Finding and Using Publicly Available Datasets for Secondary Data Analysis Research

1
Finding and Using Publicly Available Datasets for
Secondary Data Analysis Research
  • KL2 Seminar
  • February 2011

2
Disclosures and acknowledgements
  • Disclosures
  • None
  • Acknowledgements
  • Alex Smith, Michael McWilliams, Ann Nattinger,
    SGIM Research Committee

3
Two shout-outs
  • Comparative Effectiveness Research through CTSI

Smith AK et al, JGIM 2011
4
Learning objectives
  • Appreciate key conceptual and practical issues
    involved in secondary data analysis
  • Identify and use online tools for locating and
    learning about publicly available datasets
    relevant to your research
  • Focus on what is useful to you

5
(My) Definition of Secondary Data
  • Data that have been collected
  • but not for you

6
Types of Secondary Data
  • Survey (NHIS, NHANES, HRS, BRFSS)
  • Administrative (Medicare claims)
  • Discharge (HCUP SID and NIS)
  • Medical chart / EMR
  • Disease registries (SEER)
  • Aggregate (ARF, US Census)
  • Research databases (SOF)
  • Combinations and linkages

7
Key Conceptual Issues
  • Someone else's secondary data is your primary
    data
  • Treat data and research plan with the same rigor
    as you would for a primary data collection study
  • Research questions should be conceptually driven,
    interesting a priori
  • Some exceptions: Warren Browner rule
  • Know data as well as if you had collected it
    yourself
  • Who is in the cohort?
  • Strengths and limitations of data collection
    procedures, instruments

8
Selecting a Database
  • Compatibility with research question(s)
  • Availability and expense
  • Sample representativeness, power
  • Measures of interest present and valid
  • Messiness and missingness
  • Local expertise
  • Linkages
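
One way to make the "messiness and missingness" criterion concrete is to tabulate, for each measure of interest, the share of records in which it is missing. The sketch below is only an illustration on toy data: the variable names (age, adl_score) and the helper missing_fraction are hypothetical and do not come from any particular dataset.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def missing_fraction(rows: List[Row], variables: List[str]) -> Dict[str, float]:
    """Return, for each variable, the fraction of rows where it is None or absent."""
    if not rows:
        raise ValueError("rows must not be empty")
    result: Dict[str, float] = {}
    for var in variables:
        n_missing = sum(1 for row in rows if row.get(var) is None)
        result[var] = n_missing / len(rows)
    return result


# Hypothetical toy extract: three records, two candidate measures.
toy_rows: List[Row] = [
    {"age": 71, "adl_score": 2},
    {"age": 68, "adl_score": None},
    {"age": None, "adl_score": None},
]

report = missing_fraction(toy_rows, ["age", "adl_score"])
for name, frac in report.items():
    print(f"{name}: {frac:.0%} missing")
```

A real assessment would also consult the codebook, since many datasets code missing values with special numbers (refused, don't know) rather than blanks.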

9
Resources Needed
  • Your effort
  • Computer resources and security
  • Programmer and/or statistician effort
  • PhD statistical support for complex sampling or
    analyses
  • Coordinator if merging datasets
  • Realistic timeline / Gantt chart

10
Cases
  • Amita is a junior faculty member interested in
    doing a secondary data analysis project on
    association between race/ethnicity and the
    prevalence and outcomes of atrial fibrillation.
    No prior experience and limited direct
    mentorship.
  • Eric is a junior faculty member with past
    experience. Wants to find a new dataset around
    which to write a grant on the association between
    SES and ADL function in elders.

11
Amita: Getting Started
  • Amita
  • Get acquainted with basics
  • Find dataset and assess merit and feasibility
  • Find a mentor / get expert help
  • www.sgim.org/go/datasets

12
(No Transcript)
13
(No Transcript)
14
Get Acquainted with Basics
15
(No Transcript)
16
Find a Dataset, Assess Merit and Feasibility
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
CARDIA
21
CARDIA
22
Get Expert Help
23
Getting Expert Help
  • Request a consultation
  • 1 on 1 consultation
  • Clear, defined questions about dataset
  • e.g., strengths and weaknesses of using XYZ to
    study patterns of medication use for heart
    failure

24
Eric: Getting Down to Business
  • Identify datasets relevant to his research
    interests
  • Identify health statistics, validated
    instruments, funding sources
  • www.sgim.org/go/datasets

25
(No Transcript)
26
Finding Additional Resources
  • National Information Center on Health Services
    Research and Health Care Technology (NICHSR)
  • Inter-University Consortium for Political and
    Social Research (ICPSR)
  • Partners in Information Access for the Public
    Health Workforce
  • Roadmap K-12 Data Resource Center (UCSF)
  • List of datasets from the American Sociological
    Association
  • Canadian Research Data Centers Data Sets and
    Research Tools (Canada)
  • Directory of Health and Human Services Data
    Resources
  • Publicly Available Databases from National
    Institute on Aging (NIA)
  • Publicly Available Databases from National Heart,
    Lung, Blood Institute (NHLBI)
  • National Center for Health Statistics (NCHS) Data
    Warehouse
  • Medicare Research Data Assistance Center
    (RESDAC) and Centers for Medicare and Medicaid
    Services (CMS) Research, Statistics, Data
    Systems
  • Veterans Affairs (VA) data

27
CELDAC
  • Comparative Effectiveness Large Dataset Analysis
    Core
  • UCSF CTSI
  • Access to local and national datasets and
    expertise

http://ctsi.ucsf.edu/research/celdac
28
National Information Center on Health Services
Research and Health Care Technology (NICHSR)
  • Databases, data repositories, health statistics
  • Fellowship and funding opportunities
  • Glossaries, research and clinical guidelines
  • Evidence-based practice and health technology
    assessment
  • Specialized PubMed searches on healthcare quality
    and costs

http://www.nlm.nih.gov/hsrinfo/index.html
29
ISPOR
  • International Society for Pharmacoeconomics
    and Outcomes Research

http://www.ispor.org/DigestOfIntDB/CountryList.aspx
30
Inter-University Consortium for Political and
Social Research (ICPSR)
  • World's largest archive of social science data
  • Searchable
  • Many sub-archives relevant to HSR
  • Health and Medical Care Archive
  • National Archive of Computerized Data on Aging

http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp
31
Questions?
  • Specific high-value datasets
  • Causal inference / comparative effectiveness
  • Which comes first: RQ or dataset?
  • Evaluating and managing validity of measures
  • Analyzing complex survey data

32
EXTRA SLIDES
  • Additional brief information about specific
    high-value datasets
  • VA administrative data
  • NHANES
  • NAMCS
  • NIS

33
Administrative Data (VA)
  • VA has multiple high-value administrative
    databases
  • Outpatient visit information
  • Visit date, type of clinic, provider, ICD9
    diagnoses
  • Inpatient information
  • Admitting dx(s), discharge dx(s), CPT codes, bed
    section, meds administered
  • Lab data
  • >40 labs
  • Pharmacy data
  • All inpatient and outpatient fills
  • Academic affiliation
  • etc

34
Administrative Data (VA)
  • Huge bureaucracy and paperwork

35
Administrative Data (VA)
  • Messy data
  • Huge size
  • 2 TB server
  • Data analyst

36
Survey Data (NHANES)
  • National Health and Nutrition Examination Survey
    (NHANES)
  • Nationally representative sample of >10K patients
    every 2 years
  • Extensive interview data on clinical history
    (including diseases, behaviors, psychosocial
    parameters, etc.)
  • Physical exam information (e.g. VS)
  • Labs, biomarkers

37
Survey Data (NHANES)
  • Free and easy to download
  • (Relatively) easy to use
  • Although requires careful reading of
    documentation
  • Serial cross-sectional
  • Disease data self-report
  • Very limited information about providers and
    systems of care
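
The "careful reading of documentation" point applies especially to sample weights: in a complex survey, each respondent stands in for a different number of people, so an unweighted proportion can differ from the population estimate. The sketch below shows only the weighted point estimate on invented toy numbers. The function name and the (weight, has_condition) record layout are hypothetical, and a real analysis would also need design-based variance estimation (strata and sampling units) as described in the survey's own documentation.

```python
from typing import List, Tuple

# Each record is (sample_weight, has_condition); both fields are hypothetical.
Record = Tuple[float, bool]


def weighted_prevalence(records: List[Record]) -> float:
    """Weighted proportion of respondents with the condition (point estimate only)."""
    total_weight = sum(weight for weight, _ in records)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    weight_with_condition = sum(weight for weight, has in records if has)
    return weight_with_condition / total_weight


# Toy data: four respondents with unequal weights.
toy: List[Record] = [
    (1000.0, True),
    (3000.0, False),
    (1000.0, False),
    (5000.0, True),
]

unweighted = sum(1 for _, has in toy if has) / len(toy)
weighted = weighted_prevalence(toy)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this toy example the two estimates differ because the respondents with the condition carry more weight than those without it.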

38
Survey Data (NAMCS)
  • National Ambulatory Medical Care Survey (NAMCS)
    and National Hospital Ambulatory Medical Care
    Survey (NHAMCS)
  • Nationally representative sample of 70K
    outpatient and ED visits per year
  • Physician-completed form about office visit

39
(No Transcript)
40
Survey Data (NAMCS)
  • Data more from physician perspective (diagnoses,
    treatments Rxed, etc) and some info on providers
    (e.g., clinic organization, use of EMRs, etc)
  • Serial cross-sectional
  • Visit-focused
  • Not comprehensive, ? value for chronic diseases

41
Discharge Data (NIS)
  • National Inpatient Sample (NIS)
  • Database of inpatient hospital stays collected
    from ~20% of US community hospitals by AHRQ
  • Diagnoses and procedures, severity adjustment
    elements, payment source, hospital organizational
    characteristics
  • Hospital and county identifiers that allow
    linkage to the American Hospital Association
    Annual Survey and Area Resource File
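
Linkage by a shared identifier, as described above, amounts to looking up each discharge record's hospital in a second file and carrying over that hospital's characteristics. The sketch below illustrates the idea on toy records. The field names (hospital_id, teaching, dx) and the function link_discharges_to_hospitals are hypothetical and are not the actual NIS or AHA variable names.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def link_discharges_to_hospitals(
    discharges: List[Record],
    hospitals: List[Record],
    key: str = "hospital_id",
) -> Tuple[List[Record], List[Record]]:
    """Attach hospital-level fields to each discharge; also return unmatched discharges."""
    lookup = {hospital[key]: hospital for hospital in hospitals}
    linked: List[Record] = []
    unmatched: List[Record] = []
    for discharge in discharges:
        hospital = lookup.get(discharge[key])
        if hospital is None:
            unmatched.append(discharge)
        else:
            merged = dict(hospital)   # start from hospital-level fields
            merged.update(discharge)  # discharge-level fields take precedence
            linked.append(merged)
    return linked, unmatched


# Toy data with hypothetical identifiers.
toy_discharges: List[Record] = [
    {"hospital_id": "H1", "dx": "heart failure"},
    {"hospital_id": "H2", "dx": "pneumonia"},
    {"hospital_id": "H9", "dx": "hip fracture"},
]
toy_hospitals: List[Record] = [
    {"hospital_id": "H1", "teaching": True},
    {"hospital_id": "H2", "teaching": False},
]

linked, unmatched = link_discharges_to_hospitals(toy_discharges, toy_hospitals)
print(len(linked), "linked;", len(unmatched), "unmatched")
```

Counting the unmatched records is worth doing routinely, since an unexpectedly large number usually signals a problem with the identifier rather than with the data.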

42
Discharge Data (NIS)
  • Relatively easy to access (DUA, $200/yr)
  • Relatively easy to use
  • Though need close attention to documentation
  • Limited data elements
  • Huge data files