Some key developments in data analysis - PowerPoint PPT Presentation

Some key developments in data analysis Michael Babyak, PhD – PowerPoint PPT presentation

Provided by: MikeBa166
Learn more at: https://people.duke.edu
Transcript and Presenter's Notes

1
Some key developments in data analysis
  • Michael Babyak, PhD

2
(No Transcript)
3
(No Transcript)
4
Areas of development
  • Discarding flawed techniques
  • New types of models
  • Treatment of missing data
  • Simulation and empirical tests
  • Validation

5
Techniques largely discredited or highly suspect
  • Categorization of continuous variables without
    good reason
  • Automated variable selection without validation
  • Overfitted or cherry-picked models
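
The cost of categorizing a continuous variable can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the sample size, the slope of 0.4, the seed, and the function names are all assumptions chosen for illustration. It compares the correlation between an outcome and a continuous predictor with the correlation after the same predictor is split at its median.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def median_split_demo(n=5000, slope=0.4, seed=1):
    """Return (r with continuous x, r with median-split x) for simulated data.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]
    cut = statistics.median(x)
    # Dichotomize: every value above the median becomes 1, the rest 0.
    x_binary = [1.0 if xi > cut else 0.0 for xi in x]
    return pearson(x, y), pearson(x_binary, y)


r_continuous, r_split = median_split_demo()
print("continuous:", round(r_continuous, 3), "median split:", round(r_split, 3))
```

Under these assumptions the split version should show the weaker correlation, because the split discards the information about how far each observation lies from the cut point.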

6
New types of models
  • Regression family
  • Clustered data
  • Factor analysis family

7
Generalized Linear Model
  • Normal outcome: General Linear Model / Linear
    Regression, ANOVA / t-test, ANCOVA
  • Binary/Binomial outcome: Logistic Regression,
    Chi-square
  • Count, heavy skew, lots of zeros: Poisson, ZIP,
    Negbin, gamma, Transformed
  • Can be applied to clustered data (e.g., repeated
    measures)
8
Factor Analytic Family
  • Structural Equation Models
  • Partial Least Squares
  • Latent Variables (Common Factor Analysis)
  • Multiple regression
  • Principal Components
9
You Use Latent Variables Every Day
  • A Single Measurement is an indicator of an
    underlying phenomenon, e.g. mercury rising in a
    sphygmomanometer measures the underlying
    construct of blood pressure.
  • How do you improve the reliability of blood
    pressure measurement? Measure more than once,
    perhaps even in different settings (e.g.,
    ambulatory monitoring).
  • A Psychometric Scale is also a collection of
    indicators of an underlying process, attempting
    to triangulate on an underlying construct by
    multiple items (indicators).
  • A Latent Variable is a collection of indicators
    with the unshared/unreliable part of the
    indicators removed. What's the problem?
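
The point about measuring more than once can be made concrete with the Spearman-Brown formula from classical test theory. This is a sketch under an assumption the slides do not state: the repeated measurements are parallel, meaning each has the same reliability and their errors are uncorrelated. The function name and the example reliability of 0.5 are hypothetical.

```python
def spearman_brown(single_reliability, k):
    """Reliability of the mean of k parallel measurements.

    Assumes parallel measurements in the classical test theory sense:
    equal reliability for each measurement and uncorrelated errors.
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical example: a single reading with reliability 0.5,
# averaged over 1, 2, 4 and 8 readings.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.5, k), 3))
```

Each added measurement raises the reliability of the average, with diminishing returns as k grows.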

10
Missing Data
  • Imputation or related approaches are almost
    ALWAYS better than deleting incomplete cases
  • Multiple Imputation
  • Full Information Maximum Likelihood
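
For multiple imputation, the analysis is run once per imputed data set and the results are then combined. The sketch below shows only that pooling step, using Rubin's rules; it assumes the per-imputation estimates and their squared standard errors have already been obtained elsewhere. The function name and the numbers in the example are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m per-imputation results with Rubin's rules.

    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed data sets.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, round(total_variance ** 0.5, 3))
```

The between-imputation term is what carries the extra uncertainty due to the missing values; filling in each gap once and analyzing the result as if it were complete would leave that term out.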

11
Growing Out of Missing Data Work
  • Propensity Scoring: matches individuals on
    multiple dimensions to improve baseline balance
  • Complier Average Causal Effect (CACE): generates
    a guess at the effect of a treatment among all
    potential compliers, including those in the
    control arm
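
One simple way to compute a CACE estimate is the ratio of the intention-to-treat effect to the proportion of compliers. The sketch below is not necessarily the approach the presenter had in mind. It assumes randomized assignment, that people in the control arm cannot obtain the treatment, and that assignment affects the outcome only through treatment received. The function name and the data are hypothetical.

```python
import statistics


def cace_estimate(y_treatment_arm, y_control_arm, complied_in_treatment_arm):
    """Ratio estimate of the complier average causal effect.

    y_treatment_arm: outcomes for everyone randomized to treatment
    y_control_arm: outcomes for everyone randomized to control
    complied_in_treatment_arm: 1 if the person took the treatment, else 0
    """
    itt = statistics.fmean(y_treatment_arm) - statistics.fmean(y_control_arm)
    compliance = statistics.fmean(complied_in_treatment_arm)
    if compliance == 0:
        raise ValueError("no compliers observed; CACE is undefined")
    return itt / compliance


# Hypothetical trial: half of the treatment arm complied.
effect = cace_estimate(
    y_treatment_arm=[3.0, 3.0, 1.0, 1.0],
    y_control_arm=[1.0, 1.0, 1.0, 1.0],
    complied_in_treatment_arm=[1, 1, 0, 0],
)
print(effect)
```

Dividing by the compliance rate rescales the diluted intention-to-treat difference to the subgroup that would actually take the treatment when offered it.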

12
Simulation Example
Y = .4 X + error
bs1, bs2, bs3, bs4, ..., bsk-1, bsk
Evaluate
13
True Model: Y = .4 x1 + e
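
A simulation of this kind can be sketched as follows. The reading of bs1 through bsk as bootstrap resamples is an assumption, as are the sample size, the number of resamples, the unit error variance, and the function names. The code generates data from the true model Y = .4 x1 + e, refits the slope in each resample, and evaluates how the estimates spread around the true value.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope for a single predictor with an intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=200, true_slope=0.4, n_boot=500, seed=42):
    """Simulate from Y = true_slope * X + error, then bootstrap the slope.

    Returns (slope in the full sample, lower bound, upper bound), where the
    bounds are the 2.5th and 97.5th percentiles of the resampled slopes.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
    boot = []
    for _ in range(n_boot):
        # Resample cases with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    boot.sort()
    lower = boot[int(0.025 * n_boot)]
    upper = boot[int(0.975 * n_boot) - 1]
    return ols_slope(x, y), lower, upper


slope, lower, upper = simulate()
print(round(slope, 3), round(lower, 3), round(upper, 3))
```

Because the true slope is known here, the same loop can be repeated over many simulated data sets to check how often a given modeling strategy recovers it.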
14
Validation
  • Split-half better than nothing, but often too
    conservative
  • Bootstrap
  • Repeated splitting
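
Repeated splitting can be sketched as below: the data are split at random many times, the model is fit on one part and scored on the other, and the scores are averaged. The choice of R-squared as the score, the 50/50 split, the number of splits, the simulated data, and the function names are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(xs, ys, intercept, slope):
    """1 - SSE/SST for the given line, evaluated on (xs, ys)."""
    my = statistics.fmean(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def repeated_split_r2(x, y, n_splits=100, train_frac=0.5, seed=0):
    """Average held-out R-squared over many random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    cut = int(train_frac * len(x))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:cut], idx[cut:]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(r_squared([x[i] for i in test], [y[i] for i in test], a, b))
    return statistics.fmean(scores)


# Hypothetical data from Y = .4 X + error.
data_rng = random.Random(7)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [0.4 * xi + data_rng.gauss(0, 1) for xi in x]
apparent = r_squared(x, y, *fit_line(x, y))
validated = repeated_split_r2(x, y)
print(round(apparent, 3), round(validated, 3))
```

Averaging over many splits uses the data more fully than a single split-half, whose result depends heavily on which cases happen to land in each half.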

15
Some Premises
  • Statistics is a cumulative, evolving field
  • Newer is not necessarily better, but should be
    considered in light of the scientific question
    at hand
  • Keeping up is hard to do
  • There's no substitute for thinking about the
    problem

16
  • http://www.duke.edu/mababyak
  • michael.babyak _at_ duke.edu
  • http://symptomresearch.nih.gov/chapter_8/