Title: A Brief Introduction to Survival Analysis
1. A Brief Introduction to Survival Analysis
Page C. Moore, Ph.D.
Department of Biostatistics
GCRC Clinical Research Course
October 12, 2006
2. Outline and Course Objectives
- What are survival data?
- Why do we need special methods?
- Assumptions about censoring
- Estimating survival curves
- Comparing survival curves
- Incorporating covariates and prognostic factors
3. Methods are called survival analysis for historical reasons, but are useful for analyzing time to events other than death, e.g.,
- time to relapse (pediatric ALL studies)
- time to pregnancy (infertility studies)
- time to developmental milestones (infant studies related to size at birth)
- time to divorce (marital studies)
- time to drop-out (high school retention studies)
Why are standard methods of estimation (i.e.
sample mean/median) and analysis (t-tests,
chi-square, linear regression) inadequate for
these situations?
4. Censoring
5. Censoring
Censoring occurs when a subject is observed for
some period of time without the event of interest
(death, relapse, bone marrow engraftment, etc.)
occurring.
- Censoring may result from
- Loss to follow-up
- Follow-up ends before event occurs
- Competing risks (e.g., bone marrow transplant patient dies of opportunistic infection before engraftment; ALL patient dies in automobile accident before relapsing)
6. When the prolonged observation of an individual is not necessary to assess occurrence of the event
Example: Surgical mortality
Statistical analysis: 2x2 contingency chi-square analysis may be used to assess differences in survival between groups of subjects.
Chi-square = 0.04; degrees of freedom = (2-1)(2-1) = 1; p = 0.084
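A minimal Python sketch of the 2x2 chi-square analysis described above. The counts are hypothetical (the slide's own table is not available), the function name `chi_square_2x2` is illustrative, and the statistic is the Pearson chi-square without continuity correction, which is an assumption about the variant used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With (2-1)(2-1) = 1 degree of freedom, the upper-tail p-value
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are treatment groups, columns are (died, survived surgery).
chi2, p_value = chi_square_2x2(5, 45, 12, 38)
print(f"chi-square = {chi2:.2f}, df = 1, p = {p_value:.3f}")
```

This works only because every subject's outcome is known shortly after surgery; it does not handle censored follow-up.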
7. Assumption
- The censoring process is independent of the event (failure) process.
- Violations can be subtle, e.g., patients might drop out of a study because advanced disease makes them feel they are too weak or ill to continue.
8. How can we account for partial information provided by censored observations?
- Time measured (approximately) continuously (e.g., days or weeks)
  - Kaplan-Meier plots (a.k.a. actuarial curves, product limit curves, survival curves)
- Event times are grouped into larger time intervals (e.g., years or decades)
  - use special but similar methods
9. Basis - Survival Rates
Probability of surviving 2 days is probability of surviving day 2 given survival of day 1, multiplied by the probability of surviving day 1.
Probability of surviving 3 days is probability of surviving day 3 given survival of day 2, multiplied by the probability of surviving 2 days (see above).
Etc.
10. Survival Rates - In Notation
P(surviving t days) = P(surviving day t | survived day t-1)
  × P(surviving day t-1 | survived day t-2)
  × P(surviving day t-2 | survived day t-3)
  × ...
  × P(surviving day 3 | survived day 2)
  × P(surviving day 2 | survived day 1)
  × P(surviving day 1)
11. Example: Remission time of acute leukemia
- Purpose: evaluate drug's ability to maintain remissions
- Patients randomly assigned
- Study terminated after 1 year
- Different follow-up times due to sequential enrollment
- 6-MP
  - 6,6,6,7,10,22,23,6,9,10,11,17,19,20,25,32,32,34,35
- Placebo
  - 1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23
12. Example: Remission time of acute leukemia
- Statistic of interest: t-year survival rate (weeks)
  = (number of individuals relapse-free longer than t weeks) / (total number of individuals in data set)
- Without censoring - Placebo group
  - 10-wk remission duration rate = 8/21 × 100 = 38.1%
- What can we do about censoring?
  - Kaplan-Meier (product limit) method for estimating survival rates
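The placebo-group calculation above can be reproduced in a few lines of Python. This is a sketch only: the function name `crude_survival_rate` is illustrative, and the approach is valid only when every listed time is an observed relapse (no censoring).

```python
# Placebo-group remission times in weeks, as listed in the example.
placebo_weeks = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
                 11, 11, 12, 12, 15, 17, 22, 23]

def crude_survival_rate(times, t):
    """Fraction of subjects relapse-free longer than t time units.

    Only appropriate when no observation is censored.
    """
    return sum(1 for x in times if x > t) / len(times)

rate = crude_survival_rate(placebo_weeks, 10)
print(f"10-week remission duration rate: {100 * rate:.1f}%")  # 8/21 -> 38.1%
```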
13. How Can I Calculate a Survival Rate?
Table columns: Column 1 | Column 2 | Column 3 | Column 4 | Column 5
- Ranks 1 to n
- Ranks for uncensored observations only
- Time t survival rate: multiply values in Col. 4 up to and including t
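A minimal Python sketch of the product-limit calculation, assuming the standard form in which the running survival estimate is multiplied by (number at risk - events) / (number at risk) at each event time. The function name `kaplan_meier` and the six-subject data set are hypothetical and are not taken from the slides.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        n_leaving = sum(1 for x in times if x == t)  # events plus censored at t
        if n_events > 0:
            # Subjects censored at t still count as at risk at t.
            survival *= (at_risk - n_events) / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up times; events == 0 marks a censored observation.
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: estimated survival = {s:.3f}")
```

Censored subjects contribute to the number at risk up to their censoring time and then drop out of the denominator without causing a step.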
16. Comments and Observations
- The Kaplan-Meier curve is a step function (i.e., it does not change on days when no events occur).
- Step sizes are not all equal; they depend on changes in the denominator.
- Even with heavy censoring, the Kaplan-Meier curve is an unbiased estimate of the true (population) survival curve. Censoring affects the precision but not the accuracy (bias).
- Censoring must be independent of occurrence of endpoint for the estimate to be unbiased.
17. Comments and Observations
- If there is no censoring, the Kaplan-Meier estimate is the same as the simple observed proportion surviving.
  - For example, if there are 100 observations and no censoring, the curve will have the value 0.99 between the first and second failure times (assuming only one individual failed at time 1). If there are two failures at time 2 (3 failures now), the curve will be 0.97 between times 2 and 3.
- Don't overinterpret plateaus!
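The arithmetic in the 100-observation example can be checked directly. A short Python sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

n = 100
# One failure at the first failure time: 99 of the 100 at risk survive it.
s_after_first = Fraction(n - 1, n)
# Two failures at the second failure time: 97 of the 99 still at risk survive it.
s_after_second = s_after_first * Fraction(97, 99)

print(float(s_after_first))   # 0.99
print(float(s_after_second))  # 0.97, the same as 97 survivors out of 100
```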
20. Additional Comments
- There are estimators of variance of the Kaplan-Meier estimate at any time point t. These can be used to calculate a confidence interval for the proportion surviving at time t (e.g., a five-year survival rate for breast cancer patients).
- There are statistical tests for comparing the survival durations in two or more groups. The most frequently used are the log-rank test and the Gehan test. Both have test statistics that are compared to critical values of the chi-square distribution.
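A sketch of a two-group log-rank test in plain Python, assuming the usual construction: at each event time, compare the observed events in one group with the number expected if both groups shared the same survival, then refer the statistic to a chi-square distribution with 1 degree of freedom. The function name `log_rank_test` and both data sets are hypothetical.

```python
import math

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) on 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x, _, g in pooled if x >= t and g == 0)  # at risk, group A
        n_b = sum(1 for x, _, g in pooled if x >= t and g == 1)  # at risk, group B
        d_a = sum(1 for x, e, g in pooled if x == t and e == 1 and g == 0)
        d_b = sum(1 for x, e, g in pooled if x == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # the hypergeometric variance is undefined when n == 1
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical follow-up data; an event flag of 0 marks a censored observation.
chi2, p_value = log_rank_test([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 0],
                              [1, 2, 3, 4, 8, 11], [1, 1, 1, 1, 1, 0])
print(f"log-rank chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The Gehan test follows the same pattern but weights each event time by the number at risk, so it gives more weight to early differences.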
21. Covariates and Prognostic Factors
- Regression models for survival data allow us to
  - Evaluate more than one risk factor at a time
  - Evaluate relative treatment effects while controlling for potential confounding factors
  - Investigate interactive effects among factors
- They are not a panacea for flawed study designs!!
- The model most often used is the proportional hazards model developed by Cox in 1972, often referred to simply as the Cox model.
22. Covariates and Prognostic Factors
- The hazard (instantaneous failure rate) function is more conventional mathematically than the survivor function.
  - There is a one-to-one relationship between the functions, so identifying factors which affect the hazard identifies those factors which affect survival.
- A strong advantage of the proportional hazards model is that we do not need to make assumptions about the form of the failure time distribution for a given set of covariate values.
  - However, it is assumed that the covariate value has the same proportional effect on increasing or decreasing an individual's hazard relative to the baseline, regardless of time.
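A small numerical illustration of the two points above, in Python. It assumes a constant baseline hazard purely to keep the arithmetic simple (the Cox model itself does not require this), and the hazard value and coefficient are hypothetical.

```python
import math

def survival_from_hazard(hazard, t):
    """S(t) = exp(-hazard * t), the survivor function for a constant hazard."""
    return math.exp(-hazard * t)

baseline_hazard = 0.05            # hypothetical: events per week
beta = math.log(2.0)              # hypothetical regression coefficient
hazard_ratio = math.exp(beta)     # the covariate doubles the hazard at every time

for t in (1, 10, 26):
    s0 = survival_from_hazard(baseline_hazard, t)
    s1 = survival_from_hazard(baseline_hazard * hazard_ratio, t)
    # Proportional hazards implies S1(t) = S0(t) ** hazard_ratio at every t.
    assert math.isclose(s1, s0 ** hazard_ratio)
    print(f"t = {t:>2}  baseline S = {s0:.3f}  with covariate S = {s1:.3f}")
```

Because the hazard determines the survivor function, a factor that multiplies the hazard by a fixed ratio shifts the whole survival curve in a predictable way.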
23. Covariates and Prognostic Factors
- We estimate the regression parameters β using a principle called maximum likelihood.
- We use what we know about the asymptotic (infinite sample size) behavior of these estimates to make inferences about our finite samples.
  - Clearly, the smaller our sample size, the more questionable are our approximations.
- In practice, we usually don't do too badly.
24. Rule of Thumb: Sample Size
- Need 10 times as many observed events as factors in the model (e.g., 3 factors: 30 observed events, 10 events for each factor).
- The distribution across categories is important as well as the total sample size.
  - For example, if failure to thrive (FTT) is a factor you wish to control for but you only have two patients out of your sample of 100 who have FTT, the estimated effect of FTT will be unreliable.
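The rule of thumb above is simple arithmetic; a Python sketch with illustrative function names follows. Note that it counts observed events, not enrolled subjects.

```python
def events_needed(n_factors, events_per_factor=10):
    """Observed events suggested by the rule of thumb for a model with n_factors."""
    return n_factors * events_per_factor

def max_factors(observed_events, events_per_factor=10):
    """Largest number of factors the rule of thumb supports for a given event count."""
    return observed_events // events_per_factor

print(events_needed(3))  # 30
print(max_factors(45))   # 4
```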
25. Why Study Prognostic Factors?
1. To learn about natural history of disease
2. To adjust for imbalances in comparing treatments
3. To aid in designing future studies
4. To look for treatment-covariate interaction
5. To predict outcome for individual patients
6. To intervene in the course of disease
7. To explain variation and detect interaction
-- Byar (in Buyse, et al., 1988)
26. How Do We Identify Prognostic Factors?
- A. Initial screening
- B. Developing multivariate models
27. Developing Multivariate Models
- We draw conclusions about the importance of the factor in question by making inferences about the magnitude and sign (+/-) of the regression coefficient associated with that factor.
- Because of inter-relationships among the prognostic factors, the values of the corresponding regression coefficients (and hence, their statistical significance) will depend on what other factors are in the model.
- The purpose of the modeling determines the modeling strategy.
28. Additional Resources
- Text
  - Kleinbaum, D.G. and Klein, M., Survival Analysis: A Self-Learning Text, Springer, New York, 2005.
  - Klein, J.P. and Moeschberger, M.L., Survival Analysis, Springer, New York, 2005.
- Computer Software
  - SAS (http://www.sas.com/)
  - S-plus (http://www.splus.com/)
  - NCSS (http://www.ncss.com/download.html)