Title: Introduction to Biostatistics II
1. Introduction to Biostatistics II
2. Survival analysis
- The outcome is survival.
- More generally, the outcome is the time to an event, e.g.,
  - response
  - failure
  - death
  - pregnancy
  - infection
3. Survival curves for three population groups
4. USA life table 1979-1981
5. Survival curve for the US population, 1979-1981
6. Hemophiliac data example
7. Estimates of the survival curve
- Consider the probability that an individual younger than 40 years of age in the previous data set will die at time $t = 6$ months after initiation of observation.
- This individual must have survived up to the six-month point and then expired a short time after that, at time point $t + \delta$, where $\delta$ symbolizes a very small unit of time. How will the probability of a death at six months be calculated?
- Recall the definition of the conditional probability of an event $B$ given an event $A$: $P(B \mid A) = P(A \cap B) / P(A)$,
- which results in the multiplicative law of probability: $P(A \cap B) = P(A)\,P(B \mid A)$.
8. Estimates of the survival curve (contd)
- Consider two events: $A$ = "survived up to time $t$" and $B$ = "failed at time $t + \delta$" (i.e., shortly after $t$).
- The probability of the event "survived up to time $t + \delta$" is $S(t + \delta) = P(A)\,[1 - P(B \mid A)] = S(t)\,[1 - P(B \mid A)]$,
- i.e., the probability of surviving up to time $t + \delta$ is equal to the probability of surviving up to $t$ times the probability of not failing at $t + \delta$ given survival up to $t$.
9. The life table method
- The life-table method of estimation of the survival curve works as follows:
- Split the time scale into $J$ time intervals, each running from $t_{j-1}$ to $t_j$, for $j = 1, \dots, J$.
- The number of people dying in each interval is $d_j$.
- The number of people alive at the beginning of the interval (number at risk) is $r_j$.
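10. Derivation of the life table method
- To derive the life table estimate of the survival distribution we need to estimate the following quantities:
- The conditional probability of dying in interval $j$ given survival up to $j$, $P(B_j \mid A_j)$, is estimated by $\hat{q}_j = d_j / r_j$.
- Thus, the conditional probability of surviving interval $j$ given survival up to $j$ is estimated by $1 - \hat{q}_j = 1 - d_j / r_j$.
- The life-table estimate of the survival distribution is constructed as follows: $\hat{S}(t_j) = \prod_{k=1}^{j} (1 - \hat{q}_k)$.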
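The derivation above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure: the function name `life_table` and its arguments are made up for the example, the 12 failure times are read off the product-limit listing for the under-40 group later in these slides, and the last interval is assumed to be open-ended.

```python
def life_table(event_times, width, n_intervals):
    """Life-table survival estimate for uncensored failure times.

    Interval j starts at j * width; the last interval is open-ended.
    Returns one row per interval: (start, at_risk, deaths, q_hat, cum_surv).
    """
    rows = []
    at_risk = len(event_times)  # r_1: everyone is at risk at time 0
    cum_surv = 1.0
    for j in range(n_intervals):
        start = j * width
        last = j == n_intervals - 1
        # d_j: failures with start <= t < start + width (no upper bound if last)
        deaths = sum(1 for t in event_times
                     if t >= start and (last or t < start + width))
        q_hat = deaths / at_risk if at_risk else 0.0
        cum_surv *= 1.0 - q_hat  # running product of (1 - q_k)
        rows.append((start, at_risk, deaths, q_hat, cum_surv))
        at_risk -= deaths        # r_{j+1} = r_j - d_j when nothing is censored
    return rows


# Failure times in months for the under-40 group (all failures observed)
times = [2, 3, 6, 6, 7, 10, 15, 15, 16, 27, 30, 32]
table = life_table(times, width=5, n_intervals=7)
for start, r, d, q, s in table:
    print(f"{start:5.1f} {r:3d} {d:3d} {q:.4f} {s:.4f}")
```

With five-month intervals, the number at risk, the number of deaths and the cumulative survival in each row can be compared against the SPSS life table for the same data.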
11. Life table of under-40 hemophiliac data
This subfile contains 12 observations
Life Table
Survival Variable survival

        Number  Number  Number  Number                  Cumul
Intrvl  Entrng  Wdrawn  Exposd  of      Propn   Propn   Propn   Proba-
Start   this    During  to      Termnl  Termi-  Sur-    Surv    bility  Hazard
Time    Intrvl  Intrvl  Risk    Events  nating  viving  at End  Densty  Rate
------  ------  ------  ------  ------  ------  ------  ------  ------  ------
.0      12.0    .0      12.0    2.0     .1667   .8333   .8333   .0333   .0364
5.0     10.0    .0      10.0    3.0     .3000   .7000   .5833   .0500   .0706
10.0    7.0     .0      7.0     1.0     .1429   .8571   .5000   .0167   .0308
15.0    6.0     .0      6.0     3.0     .5000   .5000   .2500   .0500   .1333
20.0    3.0     .0      3.0     .0      .0000   1.0000  .2500   .0000   .0000
25.0    3.0     .0      3.0     1.0     .3333   .6667   .1667   .0167   .0800
30.0    2.0     .0      2.0     2.0     1.0000  .0000   .0000

These calculations for the last interval are meaningless.
The median survival time for these data is 15.00
12. Life table estimate of the survival distribution
13. The Kaplan-Meier method
- The K-M method differs from the life-table method in that it separates the time spectrum according to failure times (instead of fixed-width intervals).
- The first interval is (0, 2) (2 is the time of the first failure), when 1 of the 12 individuals failed (died), so 11/12 survived. The survival estimate at $t = 2$ is $S(2) = 11/12 = 0.9167$.
- The second interval is (2, 3) (the second failure happens at $t = 3$), when 1 of the remaining 11 individuals fails. The survival estimate at $t = 3$ is $S(3) = (11/12)(10/11) = 10/12 = 0.8333$, since to survive beyond $t = 3$ you must survive beyond $t = 2$ and (given that you survived beyond $t = 2$) then survive beyond $t = 3$.
- And so on.
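The "and so on" can be made concrete with a short sketch. It assumes every failure is observed (no censoring), uses exact fractions so the products match the hand calculation, and takes the 12 under-40 failure times from the product-limit listing in these slides; the function name `product_limit` is chosen only for this example.

```python
from collections import Counter
from fractions import Fraction


def product_limit(failure_times):
    """Product-limit survival estimate when every failure is observed."""
    deaths = Counter(failure_times)  # number of failures at each distinct time
    at_risk = len(failure_times)
    surv = Fraction(1)
    steps = []
    for t in sorted(deaths):
        # Multiply by the fraction of those at risk who survive beyond t
        surv *= Fraction(at_risk - deaths[t], at_risk)
        steps.append((t, surv))
        at_risk -= deaths[t]
    return steps


times = [2, 3, 6, 6, 7, 10, 15, 15, 16, 27, 30, 32]
km = product_limit(times)
for t, s in km:
    print(f"S({t}) = {s} = {float(s):.4f}")
```

The first two steps give 11/12 and 10/12 (printed in lowest terms as 5/6), matching the hand calculation; the estimate only changes at failure times.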
14. The product-limit method
- Nothing happens except at the time of failure.
Survival Analysis for survival

Time  Status    Cumulative  Standard  Cumulative  Number
                Survival    Error     Events      Remaining
2     Selected  .9167       .0798     1           11
3     Selected  .8333       .1076     2           10
6     Selected                        3           9
6     Selected  .6667       .1361     4           8
7     Selected  .5833       .1423     5           7
10    Selected  .5000       .1443     6           6
15    Selected                        7           5
15    Selected  .3333       .1361     8           4
16    Selected  .2500       .1250     9           3
27    Selected  .1667       .1076     10          2
30    Selected  .0833       .0798     11          1
32    Selected  .0000       .0000     12          0

Number of Cases: 12    Censored: 0 ( .00%)    Events: 12

        Survival Time  Standard Error  95% Confidence Interval
Mean    14             3               ( 8, 20 )
Median  10             5               ( 1, 19 )

Percentiles     25.00  50.00  75.00
Value           16.00  10.00  6.00
Standard Error  9.00   4.62   2.45
15. Kaplan-Meier estimate of the survival distribution
- This plot is the Kaplan-Meier estimate of the
hemophiliac-patient survival distribution
corresponding to the previous output.
16. Censoring
- When failure has not been observed, the only information from the data is that the failure time is no less than the time of the last available observation (e.g., clinical visit). This is easily incorporated into the estimation procedure.
- For example, consider the following data, where subjects 2 and 6 completed observation without failure at months 3 and 10 (censor = 0 indicates a censored observation).
17. Life table method in the presence of censoring
- To carry out the life-table estimate of the survival distribution when data include censored observations, we include the number of censored observations in interval $j$.
- $c_j$ is the number of censored observations in interval $j$.
- Since we do not know exactly when the censoring occurred, we have the following options for calculating the number of individuals at risk in interval $j$. We can assume that censoring occurs
  - at the beginning of the interval (so the number at risk in interval $j$ is $r'_j = r_j - c_j$)
  - at the end of the interval (so the number at risk is $r'_j = r_j$)
  - at the middle of the interval (assuming that censoring happens uniformly through the interval, so $r'_j = r_j - c_j/2$).
- The last case is called the actuarial estimator of survival.
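The three conventions differ only in how much of each censored subject is counted in the denominator. A small sketch, with a hypothetical helper `effective_at_risk` and counts matching the first interval of the censored example in these slides (12 entering, 1 censored, 1 failure):

```python
def effective_at_risk(r_j, c_j, censoring_at="middle"):
    """Effective number at risk r'_j in interval j under one of three conventions."""
    if censoring_at == "start":   # censored subjects are never at risk in the interval
        return r_j - c_j
    if censoring_at == "end":     # censored subjects are at risk for the whole interval
        return r_j
    if censoring_at == "middle":  # actuarial: at risk for half the interval on average
        return r_j - c_j / 2
    raise ValueError(f"unknown convention: {censoring_at!r}")


r, c, d = 12, 1, 1  # entering, censored, failures
for option in ("start", "end", "middle"):
    r_eff = effective_at_risk(r, c, option)
    print(f"{option:>6}: r' = {r_eff:4.1f}, q = {d / r_eff:.4f}")
```

The "middle" convention sits between the other two, which is why it is the usual default when censoring times within an interval are unknown.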
18. Derivation of the life-table method
- To calculate the life-table estimate for the first two intervals of our example (0 to 5 and 5 to 10 months) we proceed as follows:
- There is one failure and one censored observation in the first interval ($j = 1$, i.e., between 0 and 5 months). Assuming that the censoring happened at the midpoint of the interval (actuarial survival), the (effective) number at risk is $r'_1 = r_1 - c_1/2 = 12 - 1/2 = 11.5$.
- Thus, $\hat{q}_1 = 1/11.5 = 0.0870$, so the cumulative survival at the end of interval 1 is
- $S(1) = 1 - \hat{q}_1 = 0.9130$
- For the second interval ($j = 2$, time between 5 and 10 months) three failures occurred with no censoring; thus, after removing the first failure and the censored observation, $r'_2 = r_2 = 10$ and $\hat{q}_2 = 3/10 = 0.3000$, so
- $S(2) = S(1)(1 - \hat{q}_2) = 0.9130 \times 0.7000 = 0.6391$
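The same steps can be carried through all intervals. The sketch below assumes the actuarial (midpoint) convention and an open-ended last interval; `actuarial_life_table` is an illustrative name, and the status flags (1 = failure, 0 = censored) follow the example in which the observations at months 3 and 10 are censored.

```python
def actuarial_life_table(times, observed, width, n_intervals):
    """Actuarial life table for right-censored data.

    times: follow-up time per subject; observed: 1 = failure, 0 = censored.
    Returns rows (start, entering, censored, exposed, deaths, cum_surv).
    """
    rows = []
    entering = len(times)
    cum_surv = 1.0
    for j in range(n_intervals):
        start = j * width
        last = j == n_intervals - 1
        # Status flags of subjects whose follow-up ends in this interval
        in_interval = [e for t, e in zip(times, observed)
                       if t >= start and (last or t < start + width)]
        deaths = sum(in_interval)
        censored = len(in_interval) - deaths
        exposed = entering - censored / 2  # r'_j = r_j - c_j / 2
        q_hat = deaths / exposed if exposed > 0 else 0.0
        cum_surv *= 1.0 - q_hat
        rows.append((start, entering, censored, exposed, deaths, cum_surv))
        entering -= deaths + censored
    return rows


times = [2, 3, 6, 6, 7, 10, 15, 15, 16, 27, 30, 32]
observed = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
rows = actuarial_life_table(times, observed, width=5, n_intervals=7)
for start, r, c, r_eff, d, s in rows:
    print(f"{start:5.1f} {r:3d} {c:2d} {r_eff:5.1f} {d:2d} {s:.4f}")
```

The first two rows reproduce the hand calculation (11.5 exposed and 0.9130, then 10 exposed and 0.6391).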
19. Analysis via the life-table method
Life Table
Survival Variable survival

        Number  Number  Number  Number                  Cumul
Intrvl  Entrng  Wdrawn  Exposd  of      Propn   Propn   Propn   Proba-
Start   this    During  to      Termnl  Termi-  Sur-    Surv    bility  Hazard
Time    Intrvl  Intrvl  Risk    Events  nating  viving  at End  Densty  Rate
------  ------  ------  ------  ------  ------  ------  ------  ------  ------
.0      12.0    1.0     11.5    1.0     .0870   .9130   .9130   .0174   .0182
5.0     10.0    .0      10.0    3.0     .3000   .7000   .6391   .0548   .0706
10.0    7.0     1.0     6.5     .0      .0000   1.0000  .6391   .0000   .0000
15.0    6.0     .0      6.0     3.0     .5000   .5000   .3196   .0639   .1333
20.0    3.0     .0      3.0     .0      .0000   1.0000  .3196   .0000   .0000
25.0    3.0     .0      3.0     1.0     .3333   .6667   .2130   .0213   .0800
30.0    2.0     .0      2.0     2.0     1.0000  .0000   .0000

These calculations for the last interval are meaningless.
The median survival time for these data is 17.18
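The reported median is consistent with linear interpolation inside the interval where the cumulative survival crosses 0.5. Whether SPSS uses exactly this rule is an assumption here; the sketch below simply applies it to the "Cumul Propn Surv at End" column of the table above, with `interpolated_median` as an illustrative name.

```python
def interpolated_median(starts, cum_surv_at_end, width):
    """Median survival time by linear interpolation within the interval
    in which cumulative survival first drops to 0.5 or below."""
    s_prev = 1.0  # survival at the start of the first interval
    for start, s_end in zip(starts, cum_surv_at_end):
        if s_end <= 0.5:
            return start + width * (s_prev - 0.5) / (s_prev - s_end)
        s_prev = s_end
    return None  # survival never reaches 0.5


starts = [0, 5, 10, 15, 20, 25, 30]
cum_surv = [0.9130, 0.6391, 0.6391, 0.3196, 0.3196, 0.2130, 0.0000]
median = interpolated_median(starts, cum_surv, width=5)
print(f"{median:.2f}")
```

Survival falls from 0.6391 to 0.3196 over the interval starting at 15 months, so the interpolated crossing point lies a little over two months into that interval.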
20. Life table estimate of the survival distribution in the presence of censoring
21. K-M estimate in the presence of censoring
- Consider how censoring is handled in the K-M procedure:
- In the first interval (time 0-2) one out of 12 individuals fails at 2 months, so that the estimate of survival at $t = 2$ is $S(2) = 11/12 = 0.9167$.
- No one fails at $t = 3$ months (second interval).
- At $t = 6$ months a total of two subjects fail out of the remaining ten (since one subject was censored at 3 months and is no longer part of the at-risk sample at six months), so $\hat{q}_6 = 2/10 = 0.2000$ is the probability of failure at $t = 6$ months. The estimate of the survival distribution is
- $S(6) = S(2)(1 - \hat{q}_6) = 0.9167(1 - 0.2000) = 0.7333$
- So, censored observations are present up to the interval where they are censored and disappear after that.
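A compact sketch of this bookkeeping follows. It assumes the usual convention that a subject censored at time t is still counted as at risk for failures at t, and it uses the censored example data (status 1 = failure, 0 = censored); the function name `kaplan_meier` is illustrative rather than any particular library's API.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier estimate with right censoring.

    Returns [(t, S(t))] at each distinct failure time.
    """
    at_risk = len(times)
    surv = 1.0
    estimate = []
    for t in sorted(set(times)):
        deaths = sum(1 for u, e in zip(times, observed) if u == t and e == 1)
        censored = sum(1 for u, e in zip(times, observed) if u == t and e == 0)
        if deaths:
            surv *= 1.0 - deaths / at_risk  # the estimate changes only at failures
            estimate.append((t, surv))
        # Failures and censored subjects both leave the at-risk set after t
        at_risk -= deaths + censored
    return estimate


times = [2, 3, 6, 6, 7, 10, 15, 15, 16, 27, 30, 32]
observed = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
km_censored = kaplan_meier(times, observed)
for t, s in km_censored:
    print(f"S({t}) = {s:.4f}")
```

The censored subject at month 3 shrinks the at-risk set from 11 to 10 without changing the estimate, so the step at month 6 is 0.9167 × (1 − 2/10) = 0.7333.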
22. Kaplan-Meier estimate with censored observations
Survival Analysis for survival

Time  Status  Cumulative  Standard  Cumulative  Number
              Survival    Error     Events      Remaining
2     1.00    .9167       .0798     1           11
3     .00                           1           10
6     1.00                          2           9
6     1.00    .7333       .1324     3           8
7     1.00    .6417       .1441     4           7
10    .00                           4           6
15    1.00                          5           5
15    1.00    .4278       .1565     6           4
16    1.00    .3208       .1495     7           3
27    1.00    .2139       .1325     8           2
30    1.00    .1069       .1005     9           1
32    1.00    .0000       .0000     10          0
23. The K-M plot with censoring
- The K-M estimate of the survival distribution in
the presence of censoring is as shown in the
figure.
24. Testing
- Consider the survival curves of hemophiliacs who contracted AIDS after 40 years of age and of those who contracted it before 40 years of age.
25. Survival distribution of >40 year-olds
Survival Analysis for survival
Factor age = >40

Time  Status  Cumulative  Standard  Cumulative  Number
              Survival    Error     Events      Remaining
1     1.00                          1           8
1     1.00                          2           7
1     1.00                          3           6
1     1.00    .5556       .1656     4           5
2     1.00    .4444       .1656     5           4
3     1.00                          6           3
3     1.00    .2222       .1386     7           2
9     1.00    .1111       .1048     8           1
22    1.00    .0000       .0000     9           0

Number of Cases: 9    Censored: 0 ( .00%)    Events: 9
26. Comparing two survival distributions
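27. The log-rank test
- The log-rank test evaluates the null hypothesis
- $H_0: S_{<40}(t) = S_{>40}(t)$ versus the alternative
- $H_a: S_{<40}(t) \neq S_{>40}(t)$
- The test is based on a statistic that compares the observed and expected numbers of deaths in each group,
- where, for each failure time $j$ and group $i = 1, 2$, the expected number of deaths is
- $e_{ij} = d_j \, Y_i(t_j) / Y(t_j)$, where $d_j$ is the number of deaths, $Y(t)$ is the number at risk (alive) at time $t$, and $Y_1(t)$ and $Y_2(t)$ are the total numbers at risk in groups 1 and 2 respectively, and
- $E_i = \sum_j e_{ij}$ and $O_i = \sum_j d_{ij}$ (with $d_{ij}$ the deaths in group $i$ at failure time $j$) are the total numbers of expected and observed deaths.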
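To make the bookkeeping of observed and expected deaths concrete, here is a minimal sketch of one common form of the two-group log-rank statistic, $(O_1 - E_1)^2 / \mathrm{Var}(O_1 - E_1)$ with the hypergeometric variance. The choice of this form is an assumption (the slides do not show which variant is used), the function name `log_rank` is illustrative, and the six follow-up times are made up for the example; they are not the hemophiliac data.

```python
def log_rank(times1, events1, times2, events2):
    """Two-group log-rank chi-square (1 df), hypergeometric-variance form.

    events: 1 = failure, 0 = censored. Returns (O1, E1, chi_square).
    """
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    failure_times = sorted({t for t, e, _ in data if e == 1})
    o1 = e1 = var = 0.0
    for tj in failure_times:
        y1 = sum(1 for t, _, g in data if t >= tj and g == 1)  # Y1(tj)
        y2 = sum(1 for t, _, g in data if t >= tj and g == 2)  # Y2(tj)
        y = y1 + y2                                            # Y(tj)
        d = sum(1 for t, e, _ in data if t == tj and e == 1)   # d_j
        d1 = sum(1 for t, e, g in data if t == tj and e == 1 and g == 1)
        o1 += d1
        e1 += d * y1 / y  # expected deaths in group 1 at tj
        if y > 1:
            var += d * (y1 / y) * (y2 / y) * (y - d) / (y - 1)
    return o1, e1, (o1 - e1) ** 2 / var


# Hypothetical toy data, for illustration only
group1_times, group1_events = [3, 5, 8], [1, 1, 0]
group2_times, group2_events = [1, 2, 4], [1, 1, 1]
o1, e1, chi2 = log_rank(group1_times, group1_events,
                        group2_times, group2_events)
print(f"O1 = {o1:.0f}, E1 = {e1:.4f}, chi-square = {chi2:.4f}")
```

Group 1 has fewer deaths than expected under the null hypothesis (O1 below E1), which is the pattern produced when that group survives longer.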
28. The log-rank test with SPSS
- The SPSS output for the log-rank test is as follows:
Test Statistics for Equality of Survival Distributions for age

          Statistic  df  Significance
Log Rank  7.61       1   .0058

- Since $p = 0.006 < 0.05$, there is a statistically significant difference in survival between the two groups.
- Since the log-rank test is two-sided, we must check the median survival time to see the direction of the difference (here it is the younger, <40-year-old patients who survive longer).
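The significance level can be checked from the statistic alone. A minimal sketch, assuming the statistic is referred to a chi-square distribution with 1 degree of freedom (as the df column indicates); the helper name `chi2_1df_pvalue` is illustrative.

```python
import math


def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


p = chi2_1df_pvalue(7.61)
print(f"{p:.4f}")
```

For a statistic of 7.61 this gives a value that rounds to the .0058 in the SPSS output.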