Biostatistics Research Outside of Academia - PowerPoint PPT Presentation

Biostatistics Research Outside of Academia

Slides: 58
Provided by: karenli

Transcript and Presenter's Notes

1
Biostatistics Research Outside of Academia
  • The experience of one alum at the National
    Institutes of Health
  • Stuart G. Baker, Sc.D.

2
Background
  • Harvard Biostatistics (1980-84; post-doc 1985)
  • Special thanks to Steve Lagakos, Christine
    Waternaux, Milt Weinstein, Nan Laird, and Marvin
    Zelen
  • NIH-Cancer Prevention-Biometry (1985-present)
  • Special thanks to Peter Greenwald, David Byar,
    Laurence Freedman, and Philip Prorok

3
Unusual co-authors
  • My wife (Karen Lindeman, M.D., Dept of
    Anesthesiology, Johns Hopkins)
  • Members of my wife's department
  • My elementary school best friend, whom I had not
    seen in 20 years until the ENAR meeting in
    Memphis (Paul Pinsky, Ph.D., now at NCI)
  • The editor of the Journal of the National Cancer
    Institute (Barry Kramer, M.D.)
  • Three Nordic researcher pen pals

4
Research projects
  • Grouped by motivation

5
Supporting general initiatives
6
Figuring out the question
  • Need to make sense of all the anticipated data
    from early detection biomarkers
  • Generally agreed that lots of computing will be
    needed; NASA called in
  • "Computers are useless. They only give answers" -
    Pablo Picasso

7
Key question: Which biomarkers, if any, are
promising for further study as triggers for early
intervention?
  • Baker (1998); Baker, Srivastava, Kramer (2002)
  • Study design
      • cohort study with stored specimens
      • test for marker in all cases and some controls
      • sample size
  • Estimation
      • false and true positive rates (target values)
      • avoid overfitting

8
Harried Sorter and the Curse of Dimensionality
[Tables by levels of markers A and B: false positive rate (controls), true positive rate (cancers), and ratio of true to false positive rate]
What regions (of A and B) optimize the ROC curve?
No need to check 2^9 regions
Select regions by ratio of true to false positive
rate
9
Harried Sorter and the Curse of Dimensionality (continued)
Positive if A3, B3
10
Harried Sorter and the Curse of Dimensionality (continued)
Positive if A3, B3
or if A3, B2
11
Harried Sorter and the Curse of Dimensionality (continued)
Positive if A3, B3
or if A3, B2
or if A2, B2
12
Harried Sorter and the Curse of Dimensionality (continued)
Positive if A3, B3
or if A3, B2
or if A2, B2
or if A3, B2
or if A2, B3
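
A minimal sketch of the selection rule on these slides, assuming hypothetical counts for two markers A and B at three levels each (none of the numbers come from the presentation). Cells are added to the positive region in decreasing order of the ratio of true to false positive rate, so the 9 cells are ranked once instead of searching every candidate region.

```python
# Hypothetical counts by (A level, B level); not data from the presentation.
controls = {(1, 1): 52, (1, 2): 15, (1, 3): 5,
            (2, 1): 12, (2, 2): 4, (2, 3): 3,
            (3, 1): 5, (3, 2): 3, (3, 3): 1}
cancers = {(1, 1): 8, (1, 2): 8, (1, 3): 6,
           (2, 1): 8, (2, 2): 14, (2, 3): 9,
           (3, 1): 7, (3, 2): 15, (3, 3): 25}


def roc_by_ratio(controls, cancers):
    """Rank cells by TPR/FPR and accumulate them into ROC points."""
    n_controls = sum(controls.values())
    n_cancers = sum(cancers.values())
    cells = []
    for cell in controls:
        fpr = controls[cell] / n_controls  # this cell's share of controls
        tpr = cancers[cell] / n_cancers    # this cell's share of cancers
        ratio = tpr / fpr if fpr > 0 else float("inf")
        cells.append((ratio, cell, fpr, tpr))
    cells.sort(key=lambda c: c[0], reverse=True)

    order, roc = [], []
    cum_fpr = cum_tpr = 0.0
    for _ratio, cell, fpr, tpr in cells:
        cum_fpr += fpr
        cum_tpr += tpr
        order.append(cell)
        roc.append((cum_fpr, cum_tpr))
    return order, roc


order, roc = roc_by_ratio(controls, cancers)
for cell, (fpr, tpr) in zip(order, roc):
    print(f"add A{cell[0]},B{cell[1]}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Each printed line is one point on the ROC curve; the positive region at that point is the set of cells added so far.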
13
Clarifying the issues
  • Validation of surrogate endpoints
  • Muddled clinical view
      • "Validation is one of those words ... that is
        constantly used and seldom defined" - Alvan
        Feinstein
  • Confusing statistics literature
      • Day and Duffy / Begg and Leung paradoxes
      • Prentice criterion for estimation
  • Baker, Izmirlian, and Kipnis (submitted)

14
Again, need the question
  • Question: Is inference about treatment effect
    likely to be the same when using a potential
    surrogate endpoint as when using a true endpoint?
  • "A correlate does not a surrogate make" - Fleming
    and DeMets (1996)
  • "A perfect correlate does not a surrogate make" -
    Baker and Kramer (2003)
  • Graphical clarification

15
Perfect correlation (lines)
Prentice Criterion (lines coincide)
Yields valid hypothesis testing
[Figure: true endpoint against surrogate endpoint, with lines for the study group and the control group]
16
Perfect correlation, No Prentice Criterion
(lines differ)
Hypothesis testing not valid
Valid estimation (if lines are estimated from a
previous study)
[Figure: true endpoint against surrogate endpoint, with separate lines for the study group and the control group]
17
Cancer screening: not your usual randomized trial
Simple adjustment for dilution after screening
stopped (Baker et al. 2003)
[Figure: reduction in breast cancer mortality per 10,000 due to receipt of screening (axis 0 to 40), at 5-, 10-, and 15-year follow-up and with adaptive follow-up; data from the HIP trial]
18
Thinking out of the box
19
Missing, missing everywhere, with finesse to
spare: potential outcomes
  • Baker and Lindeman (1994)
  • paired availability design for combining data
    from multiple before-and-after studies
  • effect of epidural analgesia on probability of
    Cesarean section
  • before period: epidural less available
  • after period: epidural more available

20
Thought experiment
21
Thought experiment
22
Thought experiment
23
Thought experiment
  • Estimate effect of receipt of epidural in NE
  • Similar model later independently proposed by
    Angrist, Imbens, Rubin (1996) for randomized trials

24
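[Diagram: types NN, NE, EE by epidural receipt in each period]
  • before: NN and NE receive no epidural; EE receives epidural
  • after: NN receives no epidural; NE and EE receive epidural
  • the types have probabilities pr(NN), pr(NE), pr(EE)
pr(epidural | after) - pr(epidural | before) = pr(NE)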
25
[Diagram repeated: before, NN and NE receive no epidural and EE receives epidural; after, NN receives no epidural and NE and EE receive epidural; probabilities pr(NN), pr(NE), pr(EE)]
26
[Diagram: Cesarean section (CS) probabilities by type and period]
pr(CS | before) = pr(CS | NN, no epidural) x pr(NN)
                + pr(CS | NE, no epidural) x pr(NE)
                + pr(CS | EE, epidural) x pr(EE)
pr(CS | after)  = pr(CS | NN, no epidural) x pr(NN)
                + pr(CS | NE, epidural) x pr(NE)
                + pr(CS | EE, epidural) x pr(EE)
pr(CS | NE, epid) - pr(CS | NE, no epid)
    = [pr(CS | after) - pr(CS | before)] / [pr(epid | after) - pr(epid | before)]
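
The ratio on this slide (change in pr(CS) divided by change in pr(epidural)) can be sketched for a single before-and-after study as follows. The counts and the dictionary keys are hypothetical, chosen only for illustration, and the step of combining several studies is not shown.

```python
def paired_availability_estimate(before, after):
    """Effect of epidural on pr(CS) among NE, from one before-and-after study.

    Each argument is a dict with hypothetical keys:
    'n' (subjects), 'epidural' (received epidural), 'cs' (Cesarean sections).
    """
    change_cs = after["cs"] / after["n"] - before["cs"] / before["n"]
    change_epidural = (after["epidural"] / after["n"]
                       - before["epidural"] / before["n"])
    if change_epidural == 0:
        raise ValueError("availability did not change, so pr(NE) is zero")
    return change_cs / change_epidural


# Hypothetical counts, not data from the presentation.
before = {"n": 1000, "epidural": 200, "cs": 100}
after = {"n": 1000, "epidural": 600, "cs": 120}

estimate = paired_availability_estimate(before, after)
print(f"estimated change in pr(CS) due to epidural among NE: {estimate:.3f}")
```

With these made-up counts the change in pr(CS) is 0.02 and the change in pr(epidural) is 0.40, so the estimate is their ratio.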
27
Applications of potential outcomes
  • Paired availability design
      • results compared with randomized trials,
        multivariate adjustment for observational data
        (Baker and Lindeman 2000)
      • cancer screening (Baker et al., in press)
  • Non-compliance and survival
      • screening trial with refusers (Baker 1998)
  • Non-compliance and auxiliary variable
      • missing by design (no biopsy if low PSA) (Baker
        2000)

28
Computational Necessity
29
Regression models with missing categorical data
  • Computations to extend thesis to discrete
    survival, diagnostic testing, longitudinal data,
    case-control with haplotypes
  • User specifies matrices and link functions
  • Program generates EM, Newton-Raphson
  • "To paraphrase the maxim that amateurs discuss
    strategy and professionals discuss logistics,
    users discuss models and developers discuss
    computation" - from Baker (1994)

30
Generality leads to simplicity
  • How to maximize a multinomial likelihood with
    parameters that are ratios of functions to
    summations of functions?
  • Poisson likelihood with an extra parameter for
    each multinomial constraint
  • MP (Multinomial-Poisson) transformation
  • "Don't prove anything unless you know it is true" -
    Steve Piantadosi, Johns Hopkins
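
A small numerical check of the idea in the bullets above, assuming a hypothetical three-cell multinomial whose probabilities are f_i(theta) divided by the sum of the f_j(theta). The functions `f`, the counts, and the parameter names are illustrative only. The Poisson version uses means exp(phi) * f_i(theta), with phi as the extra parameter for the single multinomial constraint; once phi is maximized out, the Poisson log-likelihood differs from the multinomial one by a constant that does not depend on theta.

```python
import math


def f(theta):
    # Hypothetical unnormalized cell functions for a 3-cell multinomial.
    return [1.0, math.exp(theta), math.exp(2.0 * theta)]


def multinomial_loglik(theta, counts):
    fs = f(theta)
    total = sum(fs)
    return sum(y * math.log(fi / total) for y, fi in zip(counts, fs))


def poisson_loglik(theta, phi, counts):
    # Poisson kernel with means exp(phi) * f_i(theta); log(y!) terms dropped.
    fs = f(theta)
    return sum(y * (phi + math.log(fi)) - math.exp(phi) * fi
               for y, fi in zip(counts, fs))


def profile_phi(theta, counts):
    # Closed-form maximizer of the Poisson kernel over phi for fixed theta.
    return math.log(sum(counts) / sum(f(theta)))


counts = [12, 30, 18]  # hypothetical data
n = sum(counts)
offset = n * math.log(n) - n  # expected constant gap between the two kernels

gaps = []
for theta in (-0.5, 0.0, 0.3, 1.0):
    phi_hat = profile_phi(theta, counts)
    gap = poisson_loglik(theta, phi_hat, counts) - multinomial_loglik(theta, counts)
    gaps.append(gap)
    print(f"theta={theta:+.1f}  gap={gap:.6f}")
```

Because the gap is the same at every theta, both likelihoods are maximized at the same theta, and the Poisson form avoids carrying the normalizing sum inside each parameter.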

31
Perfect-fit paradigm
  • Models for missing, survival, non-compliance
    multinomial data
  • Goal: minimum assumptions
  • Perfect-fit closed-form solutions
  • Simple asymptotic variance
      • Delta method, M-P transformation
      • symbolically computed derivatives

32
Computational Serendipity
33
Models to evaluate cancer screening
  • Trying to develop a model with as few underlying
    parameters as possible
  • "It makes no sense to convey a beguiling sense of
    reality with irrelevant detail, when other
    equally important factors can only be guessed at" -
    Robert May
  • Weak links of cancer screening models
      • effect of screening on cancer mortality
      • self-selection bias

34
Simulation Surprise
  • Rate of cancer detection in absence of screening
    is identifiable using only data from subjects
    screened
  • Reduces self-selection bias (no birth cohort
    effect, progressive detection)
  • Baker and Chu (1990), simplified in
    Baker (1998) and Baker et al. (2003)
  • "Make everything as simple as possible and not
    simpler" - Albert Einstein

35
Estimating detection rate if no screening using
only data from subjects who were screened
[Figure: age timelines from 50 to 51 showing the pre-clinical phase, one for first screen at age 50 and one for first screen at age 51]
F50 = pr(detected on first screen at age 50)
I50 = pr(detected in interval at age 50)
S50 = pr(detected on second screen after age 50)
F51 = pr(detected on first screen at age 51)
N50 = pr(detected with no screen at age 50)
KEY EQUATION: N50 = F50 + I50 + S50 - F51
36
Analyzing data
37
Observer agreement with replicates
  • Baker, Freedman, Parmar (1993)
  • Goal: estimate within- and between-observer
    variation in pathology classification
  • Data: repeated classifications by pathologists
  • Method: novel latent class model

38
Double sampling survival data
  • Baker, Wax, and Patterson (1993)
  • Goal: estimate the effect of a drain on the
    probability of a wound infection
  • Data
      • Full follow-up after hospital discharge
      • Partial follow-up censored at discharge
  • Model: informative censoring

39
Non-ignorable missing survey data
  • Baker, Ko, and Graubard (2002)
  • Data: large health survey
  • Goal: estimate the effect of balance on the
    probability of depression
  • Method
      • Missingness in depression depends on depression
        and covariates
      • Matrix variance formula with complex survey
        design

40
Improving understanding
41
Seeing is understanding
  • Simpson's Paradox: "Good for men, good for women,
    bad for people" - Tom Louis
  • BK-Plot (Baker, Kramer 2002) but independently
    developed earlier
  • Stigler's Law of Eponymy: "no invention or
    discovery is ever named after the right person" -
    Howard Wainer (Chance magazine), who coined the
    name BK-Plot

42
Good for Boys, Good for Girls, Bad for Children
(for 3rd grade math class)
43
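BK-Plot
[Figure: fraction that grow a lot (vertical axis) against fraction that are girls (horizontal axis, 0 to 1), with lines for milk and no milk]
  • girls: 9/10 if milk, 8/10 if no milk
  • boys: 3/10 if milk, 2/10 if no milk
  • both: 5/10 if milk (1/3 are girls), 6/10 if no milk (2/3 are girls)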
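
The arithmetic behind the plot can be checked with exact fractions. This sketch uses the fractions shown on the slide and assumes that one third of the milk group and two thirds of the no-milk group are girls, which is the reading of the horizontal axis that reproduces the slide's combined values.

```python
from fractions import Fraction

# Fractions that grow a lot, as read from the BK-Plot slide.
grow = {
    ("girls", "milk"): Fraction(9, 10),
    ("girls", "no milk"): Fraction(8, 10),
    ("boys", "milk"): Fraction(3, 10),
    ("boys", "no milk"): Fraction(2, 10),
}
# Assumed reading of the horizontal axis: share of girls in each group.
share_girls = {"milk": Fraction(1, 3), "no milk": Fraction(2, 3)}


def overall(group):
    """Fraction of all children in a group that grow a lot."""
    g = share_girls[group]
    return g * grow[("girls", group)] + (1 - g) * grow[("boys", group)]


for group in ("milk", "no milk"):
    print(group, overall(group))
```

Milk is better for girls (9/10 against 8/10) and for boys (3/10 against 2/10), yet the combined fraction is 1/2 with milk and 3/5 without, because the no-milk group contains more girls.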
44
A casual remark on causality
  • A bests B in one randomized trial
  • B bests C in another randomized trial
  • Does A best C?
  • Apply BK-Plot (Baker and Kramer, 2003)
  • "It's odd that logical acuity often reveals
    hidden ambiguities" - John Allen Paulos,
    mathematician

45
[Figure: BK-Plot panels "Two Randomized Trials" and "One Randomized Trial", plotting benefit against percent with confounder for treatments A, B, and C; labels: A bests B, B bests C, C bests A; axis ticks at 10 and 30 percent]
46
Meta-analysis - binary outcomes
  • Compare
      • risk difference
      • relative risk
      • odds ratio
  • When the distribution of an unobserved binary
    variable (with no treatment interaction) differs
    across studies

47
Risk Difference
[Figure: two panels of probability (0 to 1) against fraction with unobserved binary variable (0 to 1); the risk difference is constant]
48
Relative Risk
[Figure: two panels of probability (0 to 1) against fraction with unobserved binary variable (0 to 1); the relative risk is constant]
49
Odds Ratio
[Figure: two panels of probability (0 to 1) against fraction with unobserved binary variable (0 to 1); the odds ratio is NOT constant]
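
The contrast across these three slides can be reproduced numerically. The sketch below assumes hypothetical control risks of 0.1 and 0.4 in the two levels of the unobserved binary variable, and a treatment effect with no interaction on each scale in turn (risk difference 0.2, relative risk 2, odds ratio 3). It then computes the overall measure as the fraction with the variable changes.

```python
def marginal(risk_x0, risk_x1, frac_x1):
    """Risk averaged over an unobserved binary variable X."""
    return (1 - frac_x1) * risk_x0 + frac_x1 * risk_x1


def odds(p):
    return p / (1 - p)


def risk_from_odds(o):
    return o / (1 + o)


control = (0.1, 0.4)  # hypothetical control risks for X = 0 and X = 1

# Treated risks when the effect is the same in both levels of X on each scale.
scenarios = {
    "risk difference": tuple(p + 0.2 for p in control),
    "relative risk": tuple(2 * p for p in control),
    "odds ratio": tuple(risk_from_odds(3 * odds(p)) for p in control),
}
measure = {
    "risk difference": lambda t, c: t - c,
    "relative risk": lambda t, c: t / c,
    "odds ratio": lambda t, c: odds(t) / odds(c),
}

fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
results = {}
for name, treated in scenarios.items():
    results[name] = [
        measure[name](marginal(*treated, frac), marginal(*control, frac))
        for frac in fractions
    ]
    print(name, [round(v, 3) for v in results[name]])
```

The overall risk difference and relative risk stay at their within-level values for every fraction, while the overall odds ratio drifts away from 3 when the population is a mixture of the two levels.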
50
Missing binary outcomes in a randomized trial
  • Baker and Freedman (2003)
  • Missing due to an unobserved binary covariate
    with no interaction with treatment
  • Sensitivity analysis
      • one parameter instead of the usual two
      • explicit use of randomization

51
Helping out
52
The FDA comes to NIH
  • Design a study to compare performance of digital
    versus analog mammography
  • Baker et al. (1998); Baker and Pinsky (2001)
  • Modified paired design
      • if analog negative, only a random sample get
        digital (reduces costs)
      • partial area under ROC curves

53
Inspiration from an unlikely source: protocol
review
  • Estimate effect of tamoxifen on breast cancer in
    gene carriers (randomized trial)
  • Protocol
      • case-only design (observational study)
      • estimate RR
  • Baker and Kramer (submitted)
      • nested case-control design
      • estimate risk difference by age

54
It's not what the investigators want--it's why
they can't get it
  • Request
      • Identify high-risk subjects for cancer prevention
        trial
      • increase power
  • But more complicated
      • Power increases if RR is the same (low and high risk)
      • Power decreases if risk difference is the same
  • Baker, Kramer, Corle (submitted)
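
The two cases in this list can be illustrated with a standard normal-approximation power calculation for comparing two proportions. This is a sketch under stated assumptions, not the calculation from the cited paper: the baseline risks (0.02 and 0.06), the relative risk of 0.7, and the 2000 subjects per arm are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_control - p_treated) / se - z_alpha)


n = 2000                           # hypothetical subjects per arm
low_risk, high_risk = 0.02, 0.06   # hypothetical control-arm risks
rr = 0.7                           # hypothetical relative risk
rd = low_risk * (1 - rr)           # risk difference implied in low-risk subjects

power_low = power_two_proportions(low_risk, low_risk * rr, n)
power_high_same_rr = power_two_proportions(high_risk, high_risk * rr, n)
power_high_same_rd = power_two_proportions(high_risk, high_risk - rd, n)

print(f"low risk:                  {power_low:.2f}")
print(f"high risk, same RR:        {power_high_same_rr:.2f}")
print(f"high risk, same risk diff: {power_high_same_rd:.2f}")
```

With the same RR, the absolute difference grows with baseline risk and power rises. With the same risk difference, only the variance grows, so power falls.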

55
A simple request that led to an international
collaboration
  • Report on twin study
      • Lichtenstein et al., NEJM, 2000
      • genetics are a minor component of cancer
  • Latent class model (based on genetics)
      • Age-specific data from Lichtenstein et al.
      • Results agreed with Lichtenstein et al.
  • Baker, Lichtenstein, Kaprio, Holm (in press)

56
Final quotes
57
Take-home messages
  • "Take chances, get messy, make mistakes" - The Magic
    School Bus, television show
  • "Mathematics is the language of precise thinking" -
    R. Hamming, mathematician
  • "In my experience, any really groundbreaking paper
    had difficulty being accepted, while the mundane
    sailed through" - Elias Zerhouni, NIH Director