Title: Advanced Statistics for Interventional Cardiologists
1Advanced Statistics for Interventional
Cardiologists
2What you will learn
- Introduction
- Basics of multivariable statistical modeling
- Advanced linear regression methods
- Logistic regression and generalized linear model
- Multifactor analysis of variance
- Cox proportional hazards analysis
- Propensity analysis
- Bayesian methods
- Resampling methods
- Meta-analysis
- Most popular statistical packages
- Conclusions and take home messages
3What you will learn
- Cox proportional hazards analysis
- Checking assumptions
- Variable selection methods
4Survival analysis
- A collection of statistical procedures for data analysis in which
- the outcome variable is the time until an event occurs
- the study design includes follow-up
- Event: dichotomous (e.g. death, TLR, MACE...)
- For combined endpoints (e.g. MACE), only 1 event counts, chosen in hierarchical (most severe first, e.g. in MACE: 1 death, 2 MI, 3 TVR) or temporal (first to happen) order
- Time (survival or failure time): days, weeks, years
5Survival analysis
- KEY PROBLEM
- Censored data
- We don't know their survival time exactly
- Who are the censored?
- The study ends and no event occurs
- The patient is lost to follow up
- The patient withdraws from the study
6Survival analysis
- Assumptions about censoring
- Non-informative (censoring carries no information about patient outcome)
- Censored and non-censored patients should have the same chance of failure
- Chance of censoring is independent of failure
- Censored patients should be representative of those at risk at the censoring time
- Censored patients are assumed to survive to the next time point
- Issue: patients lost to follow-up
7Survival analysis
8Survival analysis
A and F: events. B, C, D and E: censored
9Survival analysis
Survival function: $S(t) = P(T > t)$
- Probability that the survival time T exceeds time t
- $S(0) = 1$, $S(\infty) = 0$
- S(t) is non-increasing as t increases
- It is a probability, thus $0 \le S(t) \le 1$
- Theoretical S(t) is curvilinear
- In practice (Kaplan-Meier, Cox) S(t) is a step function
- We want to study how S(t) goes down
10Survival analysis
Hazard function h(t): instantaneous failure rate
- The event rate at time t, conditional on survival until time t or later
- Instantaneous potential for failure per unit of time, given survival up to time t
- It is a rate, thus $0 \le h(t) < \infty$
- If I am driving at 55 km/h, this does not mean that I will cover 55 km in the next hour, but I have the potential to do so. If I change my instantaneous speed, I also change the potential kilometers I can cover in a fixed time.
11Survival analysis
There is a mathematical relationship between the survival function S(t) and the hazard function h(t). For a constant hazard h, $S(t) = e^{-ht}$; in general, $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$. In practice, the higher the hazard rate, the lower the survival probability
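As a minimal sketch of this relationship, the snippet below assumes a hazard that is constant over time, for which S(t) = exp(-h t). The function name and the two hazard values are hypothetical and chosen only for illustration.

```python
import math

def survival_constant_hazard(h, t):
    """S(t) = exp(-h * t), valid only when the hazard h is constant over time."""
    return math.exp(-h * t)

# Hypothetical hazards (events per patient-week), for illustration only.
low_hazard, high_hazard = 0.025, 0.125

for week in (0, 10, 20, 30):
    s_low = survival_constant_hazard(low_hazard, week)
    s_high = survival_constant_hazard(high_hazard, week)
    print(f"week {week:2d}: S_low = {s_low:.3f}, S_high = {s_high:.3f}")
```

At every time point after zero, the group with the higher hazard has the lower survival probability.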
12Survival analysis
- Goals of survival analysis
- To estimate and interpret survival and/or hazard
functions - To compare survival functions
- To assess the relationship of explanatory
variables to survival time controlling for
covariates - This requires modeling, e.g. using the Cox
proportional hazards model
13Data layout for the computer
- MACE: event (1, 0)
- TimeMACE: time to event (days)
- Sex, Age, Typesten: explanatory variables
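One possible way to hold this layout in code is one record per patient. The sketch below is only an illustration: the class name, the value codings and the three rows are invented, not taken from the cited study.

```python
from dataclasses import dataclass

@dataclass
class PatientRecord:
    mace: int       # event indicator: 1 = MACE occurred, 0 = censored
    time_mace: int  # days to MACE, or to censoring if no event occurred
    sex: str
    age: int
    typesten: str   # stent type; the coding used here is an assumption

# Hypothetical rows, invented for illustration only.
records = [
    PatientRecord(mace=1, time_mace=120, sex="M", age=67, typesten="A"),
    PatientRecord(mace=0, time_mace=365, sex="F", age=59, typesten="B"),
    PatientRecord(mace=0, time_mace=200, sex="M", age=72, typesten="A"),
]

n_events = sum(r.mace for r in records)
n_censored = len(records) - n_events
print(f"events: {n_events}, censored: {n_censored}")
```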
Cosgrave et al, AJC 2005
14Data layout for theory
Columns: ordered failure times t(j), number of failures f(j), number of censored c(j), risk set R(t(j))
- t(0) = 0: f(0) = 0, c(0) = 0, R(t(0)) = all subjects
- t(1) = earliest failure time: f(1) = number of failures at t(1), c(1) = number of censored between t(0) and t(1), R(t(1)) = all subjects still at risk at t(1)
Risk set allows us to use all information up to
time of censorship
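The idea of a risk set can be sketched as follows: at each ordered failure time, the risk set contains every subject whose follow-up reaches at least that time, whether they later fail or are censored. The function name and the follow-up times are hypothetical.

```python
def risk_sets(subjects):
    """subjects maps a name to (time, event), with event 1 = failure, 0 = censored.
    Returns (failure_time, names_at_risk) pairs in increasing time order."""
    failure_times = sorted({t for t, e in subjects.values() if e == 1})
    return [
        (ft, sorted(name for name, (t, _) in subjects.items() if t >= ft))
        for ft in failure_times
    ]

# Hypothetical follow-up times (weeks); A and F fail, the others are censored.
subjects = {
    "A": (5, 1), "B": (12, 0), "C": (3, 0),
    "D": (9, 0), "E": (6, 0), "F": (8, 1),
}

for failure_time, at_risk in risk_sets(subjects):
    print(failure_time, at_risk)
```

Subject E, censored at week 6, still contributes to the risk set at week 5, which is how censored patients add information up to their censoring time.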
15Hazard ratio
Example (2 cohorts of patients with max follow-up of 35 weeks)
- Group 1 (n = 21): failures = 9 (censored 12), mean time to failure = 17 weeks
- Group 2 (n = 21): failures = 21 (censored 0), mean time to failure = 8 weeks
Hazard
- Group 1: rate of failures (9/21) / mean time of survival (17) = 0.025
- Group 2: rate of failures (21/21) / mean time of survival (8) = 0.125
- Hazard ratio = 0.125 / 0.025 = 5 (this is a cumulative ratio; we can also calculate instantaneous ratios)
Interpretation of the hazard ratio is similar to that of the odds ratio
- HR = 1: no relationship
- HR = 5: the hazard of the exposed is 5 times that of the unexposed
- HR = 0.5: the hazard of the exposed is half that of the unexposed
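The arithmetic of this example can be reproduced in a few lines. This is only a sketch of the crude calculation shown on the slide (failure proportion divided by mean time), and the function name is hypothetical.

```python
def crude_hazard(failures, n, mean_time):
    """Crude average hazard: proportion failing divided by mean time (weeks)."""
    return (failures / n) / mean_time

h_group1 = crude_hazard(failures=9, n=21, mean_time=17)
h_group2 = crude_hazard(failures=21, n=21, mean_time=8)
hazard_ratio = h_group2 / h_group1

print(f"hazard group 1 = {h_group1:.3f}")
print(f"hazard group 2 = {h_group2:.3f}")
print(f"hazard ratio   = {hazard_ratio:.2f}")
```

Without intermediate rounding the ratio is about 4.96, which the example rounds to 5.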
16Kaplan-Meier
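T = time (weeks), N = number at risk, M = number of events, Q = number censored
T   N   M  Q  Survival function
0   21  0  0  1
6   21  3  1  1 x 18/21 = 0.857
7   17  1  1  0.857 x 16/17 = 0.807
10  15  1  2  0.807 x 14/15 = 0.753
13  12  1  0  0.753 x 11/12 = 0.690
16  11  1  3  0.690 x 10/11 = 0.627
22  7   1  0  0.627 x 6/7 = 0.538
23  6   1  5  0.538 x 5/6 = 0.448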
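The product-limit calculation in this table can be sketched in code. The function name is hypothetical; the input rows are the event and censoring counts from the table, starting with 21 patients at risk.

```python
def kaplan_meier(n_start, rows):
    """rows: (time, events, censored) tuples in increasing time order.
    Returns (time, number_at_risk, survival) for each failure time."""
    at_risk = n_start
    survival = 1.0
    table = []
    for time, events, censored in rows:
        # Multiply by the conditional probability of surviving this time point.
        survival *= (at_risk - events) / at_risk
        table.append((time, at_risk, survival))
        # Events and censored patients both leave the risk set afterwards.
        at_risk -= events + censored
    return table

rows = [(6, 3, 1), (7, 1, 1), (10, 1, 2), (13, 1, 0),
        (16, 1, 3), (22, 1, 0), (23, 1, 5)]

km = kaplan_meier(21, rows)
for time, at_risk, survival in km:
    print(f"t = {time:2d}  at risk = {at_risk:2d}  S(t) = {survival:.3f}")
```

Run on these counts, the sketch gives survival estimates of 0.857, 0.807, 0.753, 0.690, 0.627, 0.538 and 0.448.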
17Kaplan-Meier
Univariate modeling
18Kaplan-Meier
19Impact of a few changes in events
- Any survival curve has a staircase shape, with many steps
- Each step occurs when an event occurs, and the height of the step depends on the number of events and of censored data at each specific time
20Impact of a few changes in events
21Kaplan-Meier and log-rank
Comparison between survival curves is usually performed with the non-parametric Mantel-Haenszel-Cox test (log-rank test)
TAPAS 1 year, Lancet 2008
22Log-rank test
- Are the K-M curves statistically equivalent?
- Chi-square test
- Overall comparison of KM curves
- Observed versus Expected counts
- Categories defined by ordered failure times
- Log-rank statistic = $(O - E)^2 / \mathrm{Var}(O - E)$
- Censoring affects the number of subjects at risk at every time point where O - E is computed (i.e. whenever an event occurs)
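A two-group log-rank test can be sketched with the standard library alone. The function name and the data are hypothetical. The variance uses the usual hypergeometric form, and the p-value relies on the fact that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math

def logrank(times1, events1, times2, events2):
    """Two-group log-rank test. events: 1 = failure, 0 = censored.
    Returns (chi-square statistic with 1 df, p-value)."""
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in data if g == 0 and u >= t)  # at risk, group 1
        n2 = sum(1 for u, _, g in data if g == 1 and u >= t)  # at risk, group 2
        d1 = sum(1 for u, e, g in data if g == 0 and u == t and e == 1)
        d2 = sum(1 for u, e, g in data if g == 1 and u == t and e == 1)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n          # observed minus expected, group 1
        if n > 1:
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical follow-up data (weeks), invented for illustration only.
group_a_times, group_a_events = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
group_b_times, group_b_events = [1, 2, 3, 4, 5, 8], [1, 1, 1, 1, 1, 1]

chi2, p_value = logrank(group_a_times, group_a_events, group_b_times, group_b_events)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Censored patients never add to the observed count, but they stay in the at-risk counts n1 and n2 until their censoring time.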
23Survival analysis with SPSS
24Survival analysis with SPSS
25Survival analysis with SPSS
Cosgrave et al, AJC 2005
26Hypothesis testing for survival
- K-M curves and log rank test are appropriate if the comparison comes from randomized allocation (univariate analysis)
- How do we deal with registry/observational data?
- It is possible to adjust for other relevant factors which may be heterogeneously distributed across groups
- We can create subgroups (strata) according to these factors
- Multivariable modeling
27Stratification
28Stratification
IVUS vs. non-IVUS Log Rank 0.18
29Stratification
Distal vs. Non-distal LM Log Rank 0.02
30Stratification
IVUS in 54 of non-distal LM
IVUS in 31 of distal LM
P = 0.08
Log Rank 0.69
31Stratification
32Hypothesis testing for survival
- K-M curves and log rank test allow for comparisons based on one grouping factor (predictor) at a time
- How can we account for multiple factors simultaneously for each subject in a time-to-event study?
- How can we estimate adjusted survival-predictor relationships in the presence of potential confounding?
33Hypothesis testing for survival
- K-M curves and log rank test are appropriate if the comparison comes from randomized allocation (univariate analysis)
- How do we deal with registry/observational data?
- It is possible to adjust for other relevant factors which may be heterogeneously distributed across groups
- We can use Cox Proportional Hazards (PH) analysis
34Cox PH analysis
Sir David Cox in 2006
35Cox PH analysis
- Problem
- Can't use ordinary linear regression, because how do we account for the censored data?
- Can't use logistic regression without ignoring the time component
- With a continuous outcome variable we use linear regression
- With a dichotomous (binary) outcome variable we use logistic regression
- Where the time to an event is the outcome of interest, Cox regression is the most popular regression technique
36Cox PH analysis
37Cox PH analysis
38Cox PH analysis
- Allows for prognostic factors
- Explores the relationship between survival and explanatory variables
- Multivariable modeling
- Models and compares the hazards and their magnitude for different groups/factors
- Important assumption
- Survival curves must have proportional hazards (i.e. risk of an event at different time points)
- It assumes that the ratio of time-specific outcome (event) risks (hazards) of two groups remains about the same over time
- This ratio is called the hazard ratio
39Cox PH analysis
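- $h(t, X) = h_0(t)\, e^{\sum_i \beta_i X_i}$
- Cox PH analysis models the effect of covariates on the hazard rate but leaves the baseline hazard rate unspecified
- Does NOT assume knowledge of absolute risk
- Estimates relative rather than absolute risk
- $HR = \dfrac{h_0(t)\, e^{\sum_i \beta_i X_i^*}}{h_0(t)\, e^{\sum_i \beta_i X_i}} = \exp\left[\sum_i \beta_i (X_i^* - X_i)\right]$, where $X^*$ and $X$ are the covariate values of the two groups being compared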
40Cox PH analysis
$h(t, X) = h_0(t)\, e^{\sum_i \beta_i X_i}$
- Baseline hazard $h_0(t)$: involves t but not X; not known
- Exponential $e^{\sum_i \beta_i X_i}$: involves X but not t; X are assumed to be time-independent
If we want the hazard ratio, $h_0(t)$ cancels out in the ratio, so we do not need to calculate it
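The cancellation of the baseline hazard can be checked numerically. In the sketch below the baseline hazard function, the coefficients and the covariate patterns are all hypothetical; the point is only that the ratio of two hazards is the same at every time and depends on the coefficients alone.

```python
import math

def hazard(t, x, betas, baseline):
    """Cox model hazard: h(t, X) = h0(t) * exp(sum(beta_i * X_i))."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(betas, x)))

def hazard_ratio(x_star, x, betas):
    """HR = exp(sum(beta_i * (X*_i - X_i))); the baseline hazard is not needed."""
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(betas, x_star, x)))

# Hypothetical baseline hazard and coefficients, for illustration only.
baseline = lambda t: 0.01 + 0.002 * t
betas = [0.7, -0.15]
exposed, unexposed = [1, 1], [0, 1]

for t in (1, 10, 50):
    ratio = hazard(t, exposed, betas, baseline) / hazard(t, unexposed, betas, baseline)
    print(f"t = {t:2d}: ratio of hazards = {ratio:.4f}")

print(f"HR from coefficients only = {hazard_ratio(exposed, unexposed, betas):.4f}")
```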
41Cox PH analysis
Cosgrave et al, AJC 2005
42Cox PH analysis
Cosgrave et al, AJC 2005
43Cox PH analysis
Cosgrave et al, AJC 2005
44Cox PH analysis
Cosgrave et al, AJC 2005
45Cox PH analysis
Adjusted Hazard Ratios
Unadjusted Hazard Ratios
Variable    B       SE     Wald    df  Sig.   Exp(B)  95.0% CI for Exp(B): Lower  Upper
Stent type  -0.157  0.198  0.633   1   0.426  0.855   0.580                       1.259
Diabetes    0.710   0.204  12.066  1   0.001  2.034   1.363                       3.036
Cosgrave et al, AJC 2005
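The Exp(B), confidence interval and Wald columns of this output follow from B and SE. The sketch below assumes the usual normal approximation (z = 1.96 for a 95% interval); the function names are hypothetical.

```python
import math

def hazard_ratio_ci(b, se, z=1.96):
    """Return (Exp(B), lower, upper) using exp(B +/- z * SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

def wald_statistic(b, se):
    """Wald chi-square with 1 degree of freedom: (B / SE) squared."""
    return (b / se) ** 2

# B and SE as reported in the output above.
coefficients = {"Stent type": (-0.157, 0.198), "Diabetes": (0.710, 0.204)}

for name, (b, se) in coefficients.items():
    hr, lower, upper = hazard_ratio_ci(b, se)
    print(f"{name}: HR = {hr:.3f} (95% CI {lower:.3f} to {upper:.3f}), "
          f"Wald = {wald_statistic(b, se):.2f}")
```

The results agree with the reported values to about two decimal places; the small differences come from B and SE being rounded to three decimals.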
46Cox PH analysis
Agostoni et al, AJC 2005
47Cox PH analysis
Agostoni et al, AJC 2005
48Cox PH analysis
Agostoni et al, AJC 2005
49Cox PH analysis
Agostoni et al, AJC 2005
50Cox PH analysis
Agostoni et al, AJC 2005
51Cox PH analysis
Agostoni et al, AJC 2005
52Cox PH analysis
- Around 260 deaths
- Around 300 MIs
- Around 500 deaths or MIs
Marroquin et al, NEJM 2008
53Cox PH analysis
Marroquin et al, NEJM 2008
54Cox PH analysis
55Stepwise regression removes and adds variables to the regression model for the purpose of identifying a useful subset of predictors
- Forward, or Step-Up, Selection
- This method is often used to provide an initial screening of the candidate variables when a large group of variables exists
- You begin with no candidate variables in the model
- Select the most significant variable
- At each step, select the next most significant candidate variable
- Stop adding variables when none of the remaining variables are significant
- Backward, or Step-Down, Selection
- This method begins with a model in which all variables have been included
- The user sets the significance level at which variables are removed from the model
- At each step, the variable that is the least significant is removed
- This process continues until no non-significant variables remain
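The two procedures can be sketched as loops around a model-fitting step. In the sketch below everything is hypothetical: `toy_pvalues` stands in for refitting a Cox model and returning one p-value per variable, and its fixed p-values are invented for illustration. In a real analysis the p-values change each time the model is refitted.

```python
def forward_selection(candidates, pvalues_fn, alpha=0.05):
    """Add the most significant remaining variable until none is significant."""
    selected, remaining = [], list(candidates)
    while remaining:
        # p-value of each candidate when added to the current model
        p = {v: pvalues_fn(selected + [v])[v] for v in remaining}
        best = min(p, key=p.get)
        if p[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

def backward_elimination(candidates, pvalues_fn, alpha=0.05):
    """Start with all variables; drop the least significant until all are significant."""
    selected = list(candidates)
    while selected:
        p = pvalues_fn(selected)
        worst = max(selected, key=lambda v: p[v])
        if p[worst] < alpha:
            break
        selected.remove(worst)
    return selected

# Toy stand-in for model fitting: fixed, invented p-values.
TOY_P = {"diabetes": 0.001, "age": 0.03, "stent_type": 0.43, "sex": 0.60}

def toy_pvalues(model_variables):
    return {v: TOY_P[v] for v in model_variables}

print(forward_selection(list(TOY_P), toy_pvalues))
print(backward_elimination(list(TOY_P), toy_pvalues))
```

With fixed p-values both directions keep the same subset; with a real model they can disagree, because each variable's significance depends on which other variables are in the model.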
56Cox PH analysis
- Age
- Sex
- Elective/Urgent
- Pre PCI
- Pre CABG
- CKD
- CHF
- DM
- 1/2/3 VD
- N. Lesions
- SA/UA/STEMI
- Therapy
- Off label DES
57Cox PH analysis
58Checking PH assumptions
- Graphical techniques
- Compare the log-log survival curves over different categories of variables: parallel curves imply that the PH assumption is ok
- Compare observed (KM curves) with predicted (using PH model) survival curves: if observed and predicted curves are close, the PH assumption is ok
59Log-Log curves
- Most commonly used, and relatively easy to perform
- They can involve subjectivity in the interpretation of the graphs (typically we look for strong indications of non-parallelism)
- Continuous variables can be a problem (it is not possible to create 2 lines as with dichotomous variables; however, we can create 2 groups by categorizing continuous variables, e.g. above and below the mean or the median)
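Why parallel log-log curves indicate proportional hazards can be shown with a small numerical sketch. Under PH the exposed group's survival is the baseline survival raised to the power HR, so the vertical gap between the two log(-log S) curves equals log(HR) at every time point. The survival values and the hazard ratio below are hypothetical.

```python
import math

def log_minus_log(survival):
    """log(-log S(t)), the transformation plotted in log-log survival curves."""
    return math.log(-math.log(survival))

# Hypothetical baseline survival at successive time points, and a hypothetical HR.
baseline_survival = [0.95, 0.90, 0.80, 0.65, 0.50]
hazard_ratio = 2.0

# Under proportional hazards: S_exposed(t) = S_baseline(t) ** HR
exposed_survival = [s ** hazard_ratio for s in baseline_survival]

gaps = [log_minus_log(s1) - log_minus_log(s0)
        for s0, s1 in zip(baseline_survival, exposed_survival)]

for s0, gap in zip(baseline_survival, gaps):
    print(f"S0 = {s0:.2f}: vertical gap = {gap:.4f}")
print(f"log(HR) = {math.log(hazard_ratio):.4f}")
```

With real data the curves come from Kaplan-Meier estimates, so the gap is only roughly constant; a gap that clearly widens, narrows or changes sign suggests that the PH assumption is violated.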
60Log-Log curves
Cosgrave et al, AJC 2005
61Log-Log curves
If the curves are parallel, the PH assumption is met
Cosgrave et al, AJC 2005
62Checking PH assumptions
- If PH assumption is not met for one variable
- Stratify by the variable that does not satisfy the PH assumption and run a Cox analysis within each stratum, adjusting for the other variables that meet the PH assumption
- If the variable can change over time, include a time-dependent variable in the model (extended Cox modeling)
63Extended Cox Model
- Time-dependent covariates
- Add interaction term involving time to the model
CALL THE STATISTICIAN !
64Questions?
65For further slides on these topics please feel free to visit the metcardio.org website: http://www.metcardio.org/slides.html