Title: Statistics 102
1 Statistics 102
2 Outline
- I. Sensitivity and Specificity/Likelihood Ratios
- II. Statistical significance for group data
- III. Statistical significance for correlational data
- IV. Non-inferiority trials
- V. Linear regression
- VI. Logistic regression
- VII. Stepwise multivariate regression
- VIII. Type I and II errors/Sample size estimates
3 I. Sensitivity and Specificity
- Sensitivity = true positives (proportion of individuals with the disease who test +) - ranges from 0 to 1, or from 0 to 100%
- 1-Sensitivity = false negatives (proportion of individuals with the disease who test -) - ranges from 0 to 1, or 0 to 100%
- If sensitivity = 0.8 (80%), 1-sensitivity = 0.2 (20%)
- Specificity = true negatives (proportion of individuals without the disease who test -) - ranges from 0 to 1, or from 0 to 100%
- 1-Specificity = false positives (proportion of individuals without the disease who test +) - ranges from 0 to 1, or 0 to 100%
- If specificity = 0.92 (92%), 1-specificity = 0.08 (8%)
4 Use of Sensitivity and 1-Specificity in Receiver Operating Characteristic curves (ROCs) and the Areas under the ROCs (the AUC)
- Plots sensitivity of the test (true positive rate, TPR) on the Y axis, from 0 to 1, vs. 1-specificity (false positive rate, FPR) on the X axis, from 0 to 1, at different test cutoffs
- Perfect classification: AUC = 1 (area of a square with sides = 1)
- Random guess: AUC = 0.5 (area of a triangle with base and height = 1)
- AUC between 0.5 and 1: the test is better than a random guess
- AUC between 0 and 0.5: the test is worse than a random guess
- The AUC has a 95% CI
  - e.g., 0.78 (0.69-0.87)
5 ROCs with AUCs better than a random guess (between 0.5 and 1.0)
[Figure: ROC curves; arrows mark the "sweet spot" cutoff, a trade-off between sensitivity and specificity, moving from a high cutoff to a low cutoff along each curve]
6 Additional terms that can be derived from sensitivity and specificity
- Likelihood ratios: does the test usefully change the probability (likelihood) of a disease or condition?
- Positive likelihood ratio (+LR) = true+/false+ = sensitivity/(1-specificity).
  - The higher the +likelihood ratio, the more confident we are that the patient has the condition if the test is +. +LR can approach infinity.
- Negative likelihood ratio (-LR) = false-/true- = (1-sensitivity)/specificity.
  - The lower the -likelihood ratio, the more confident we are that the patient does not have the condition if the test is -. -LR can approach 0.
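To make the two formulas concrete, here is a small Python sketch (the helper functions are my own; the sensitivity and specificity values reuse the figures from slide 3):

```python
# Likelihood ratios from sensitivity and specificity (hypothetical helper functions).
def positive_lr(sensitivity, specificity):
    # +LR = true-positive rate / false-positive rate
    return sensitivity / (1 - specificity)

def negative_lr(sensitivity, specificity):
    # -LR = false-negative rate / true-negative rate
    return (1 - sensitivity) / specificity

# Example: a test with sensitivity 0.80 and specificity 0.92 (as on slide 3)
print(positive_lr(0.80, 0.92))   # 10.0  -> a + result raises the odds of disease 10-fold
print(negative_lr(0.80, 0.92))   # ~0.22 -> a - result lowers the odds roughly 5-fold
```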
7 Example 1: Use of + and - likelihood ratios
- Your patient with COPD has an acute onset of worsening dyspnea. He had arthroscopic knee surgery 2 weeks ago. There is no leg swelling or leg pain, hemoptysis, personal or family history of PE or DVT, or malignancy. You clinically assess the odds of him having a PE as 50:50, i.e., it is equally likely that he had a PE as that he did not.
- If ordered and performed, how would the results of a CT angiogram (CTA) of the pulmonary arteries change your estimated likelihood of PE in this patient? In other words, how good is CTA in helping you diagnose or exclude a PE in this patient?
8 Example 1, continued
- Literature (Annals of Internal Medicine 2002; 136:286-287)
- CTA and pulmonary angiography (gold standard) were performed in 250 patients with possible PE.
- 50 (20%) of the patients had PE on pulmonary angiography; 200 had no PE on angiography.
- Results:

                                CTA+   CTA-
  PE on pulm angio (n=50)        35     15
  No PE on pulm angio (n=200)     2    198
9 Example 1, continued
- Likelihood ratio (LR) calculations
  - CTA sensitivity (true +) = 35/50 = 0.70, or 70%
  - 1-sensitivity (false -) = 15/50 = 0.30
  - CTA specificity (true -) = 198/200 = 0.99, or 99%
  - 1-specificity (false +) = 2/200 = 0.01
- +LR = sensitivity/(1-specificity) = true+/false+ = 0.70/0.01 = 70 (PE 70x as likely as before the test)
- -LR = (1-sensitivity)/specificity = false-/true- = 0.30/0.99 = 0.303 (PE 0.3x as likely as before the test)
- Annals of Internal Medicine 2002; 136:286-287
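The same arithmetic, plus the standard conversion from the 50:50 pre-test odds to a post-test probability (a step implied but not written out on the slide), can be sketched as:

```python
# Example 1 in code: CTA vs. pulmonary angiography (counts from the table above).
tp, fn = 35, 15          # PE on angio: CTA+ / CTA-
fp, tn = 2, 198          # No PE on angio: CTA+ / CTA-

sensitivity = tp / (tp + fn)               # 0.70
specificity = tn / (tn + fp)               # 0.99
pos_lr = sensitivity / (1 - specificity)   # 70
neg_lr = (1 - sensitivity) / specificity   # ~0.30

# Pre-test odds of 50:50 = 1.0; post-test odds = pre-test odds x LR
pretest_odds = 1.0
for name, lr in [("CTA positive", pos_lr), ("CTA negative", neg_lr)]:
    posttest_odds = pretest_odds * lr
    posttest_prob = posttest_odds / (1 + posttest_odds)
    print(f"{name}: post-test probability of PE = {posttest_prob:.2f}")
# CTA positive -> ~0.99 (PE nearly certain); CTA negative -> ~0.23
```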
10 II. Are measured group differences in variables or outcomes statistically significant? Which test(s) to use?
- If data are normally distributed
  - Use paired t (if each subject is his/her own control) (1)
  - Use unpaired t (group t) if there are two groups (2)
- If data are skewed (not normally distributed)
  - Is the variable a continuous one, such as age or PaO2?
    - Use Mann-Whitney U (unpaired groups) (3), or
    - Use Wilcoxon's signed-rank (paired data) (4)
  - Is the variable a categorical one, such as gender or age > 65?
    - Use Fisher's exact (5), or
    - Use the chi-square test (6)
- If there are >2 study groups
  - Use analysis of variance (ANOVA) (7)
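For reference, each numbered test above has a direct counterpart in scipy.stats; the arrays below are invented placeholders, shown only to pair each test with the matching function call:

```python
# Which scipy.stats call corresponds to which test (placeholder data for illustration only).
import numpy as np
from scipy import stats

before = np.array([7.30, 7.35, 7.28, 7.41, 7.33])
after  = np.array([7.38, 7.40, 7.31, 7.44, 7.39])
group_a = np.array([55, 61, 48, 70, 66])
group_b = np.array([52, 58, 63, 49, 71])
group_c = np.array([60, 64, 59, 72, 68])

stats.ttest_rel(before, after)        # 1. paired t (each subject is his/her own control)
stats.ttest_ind(group_a, group_b)     # 2. unpaired (group) t, two groups, normal data
stats.mannwhitneyu(group_a, group_b)  # 3. Mann-Whitney U, skewed continuous, unpaired
stats.wilcoxon(before, after)         # 4. Wilcoxon signed-rank, skewed continuous, paired
table = np.array([[12, 8], [5, 15]])  # 2x2 counts for a categorical variable
stats.fisher_exact(table)             # 5. Fisher's exact test
stats.chi2_contingency(table)         # 6. chi-square test
stats.f_oneway(group_a, group_b, group_c)  # 7. one-way ANOVA for >2 groups
```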
11 III. Are correlations between variables statistically significant? Which test(s) to use?
- If the variables are normally distributed
  - Use Pearson's test (8)
  - Pearson's r ranges from -1 to +1.
  - r ≈ 0 indicates no correlation.
  - The P value depends both on r and N.
- If the variables are skewed (not normally distributed)
  - Use Spearman's test (9)
  - Spearman's r ranges from -1 to +1.
  - r ≈ 0 indicates no correlation.
  - P values depend both on r and N. P < 0.05 is usually used.
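Both correlation tests are likewise available in scipy.stats (data invented for illustration); each call returns r (or rho) together with the P value, which depends on both r and N:

```python
# Pearson vs. Spearman correlation (illustrative data).
import numpy as np
from scipy import stats

age  = np.array([42, 51, 58, 63, 67, 71, 75, 80])
pao2 = np.array([92, 88, 85, 84, 80, 78, 75, 70])

r, p = stats.pearsonr(age, pao2)        # 8. normally distributed variables
rho, p_s = stats.spearmanr(age, pao2)   # 9. skewed variables (rank-based)
print(f"Pearson r = {r:.2f} (P = {p:.3g}); Spearman rho = {rho:.2f} (P = {p_s:.3g})")
```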
12 Example 2: METABOLIC ALKALOSIS
13 IV. Non-Inferiority Trials
- A New Treatment Can Truly Be
- Better (Superior)
- Essentially equal
- Worse (Inferior) than the usual treatment
- A Trial Can Test Whether New is
- Better (superior)
- Not better (non-superior)
- Not worse (non-inferior)
- Worse (Inferior)
- rarely done
14 Non-inferiority trials
- Non-inferiority trials are intended to show that the effect of a new treatment is not worse than that of an active control by more than a specified amount.
  - A little like a point spread in football.
- The non-inferiority margin (NIM) is chosen by the investigators before the study (a priori) and can be somewhat arbitrary.
- Study endpoints in non-inferiority trials can be efficacy or safety parameters, or a combination of the two.
- Study design may include 3 arms with a placebo group (preferred) or 2 arms with only the new and usual treatments (much less ideal, since there is no internal validation that the new treatment is better than placebo).
- Delta (d) is the measured difference (best estimate of the true difference) between the two active treatments. This d will have a 95% CI.
- Example 3: d = -4 (95% CI, -9 to +1)
15 Example 3: d = -4 (95% CI, -9 to +1), the control Rx being slightly better
- If the NIM had been chosen a priori by the investigators to be -10, then, using Example 3, the new drug would be shown to be non-inferior to the control, as -10, the NIM, lay below the entire 95% CI for d (-9 to +1).
- (If the NIM had been chosen a priori by the investigators to be -5, then, using Example 3, the new drug would not be shown to be non-inferior to the control, as -5, the NIM, fell within the 95% CI for d (-9 to +1). In this context, non-inferiority not shown is treated the same as being inferior.)
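The decision rule just described amounts to comparing the lower bound of the 95% CI for d with the prespecified NIM; a minimal sketch (the variable names are my own):

```python
# Non-inferiority check: the new drug is non-inferior if the entire 95% CI for d
# (new minus control) lies above the non-inferiority margin (NIM).
def non_inferior(ci_lower, nim):
    return ci_lower > nim

d_ci = (-9, 1)          # Example 3: d = -4 (95% CI, -9 to +1)
print(non_inferior(d_ci[0], nim=-10))  # True  -> non-inferiority shown (NIM lies below the whole CI)
print(non_inferior(d_ci[0], nim=-5))   # False -> NIM falls inside the CI; non-inferiority not shown
```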
16 V. Linear regression
- Simple regression: y = mx + b
  - one independent variable, x
  - one dependent variable, y
  - If x = 0, y = b, the intercept
  - b can be +, as shown, zero, or -
[Figure: a straight line y = mx + b, with slope m = Δy/Δx, intercept b on the y (dependent variable) axis, and x as the independent variable]
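A minimal sketch of fitting y = mx + b with scipy.stats.linregress, using invented data:

```python
# Simple linear regression y = m*x + b (illustrative data).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent variable

fit = stats.linregress(x, y)
print(f"slope m = {fit.slope:.2f}, intercept b = {fit.intercept:.2f}, r = {fit.rvalue:.3f}")
# When x = 0, the predicted y equals the intercept b, which can be +, zero, or -.
```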
17 METABOLIC ALKALOSIS
18 V. Linear regression
- Simple regression: y = mx + b
  - one independent variable, x
  - one dependent variable, y
  - If x = 0, y = b, the intercept
  - b can be +, as shown, zero, or -
- More complex regression: y = m1x1 + m2x2 + b
  - two independent variables
  - one dependent variable
  - If x1 = x2 = 0, y = b, the intercept
  - b can be +, zero, or -
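Extending to two independent variables, the least-squares fit of y = m1x1 + m2x2 + b can be sketched with numpy (again with invented data):

```python
# Multiple linear regression y = m1*x1 + m2*x2 + b via least squares (illustrative data).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
y  = np.array([2.0, 5.1, 6.0, 9.2, 11.1])

# Design matrix: one column per independent variable plus a column of ones for b.
X = np.column_stack([x1, x2, np.ones_like(x1)])
(m1, m2, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"y = {m1:.2f}*x1 + {m2:.2f}*x2 + {b:.2f}")
```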
19 VI. Logistic regression. A popular method
- A model predicting the probability of a dependent categorical outcome, such as death, using 2 or more patient-specific independent variables.
- The logit, z, is the total contribution of ALL the patient-specific independent variables used in the model to predict the outcome, f(z), the dependent variable.
- z = β0 + β1x1 + β2x2 + ... + βnxn
  - β0 = intercept
  - β1, β2, ... βn are regression coefficients for x1, x2, ... xn
- If x1, x2, ..., xn are all 0 (the pt has no risk factors), z = β0, which determines the risk of the dependent outcome (such as death) when no factors affecting risk are present.
- If βn > 0, then the variable n increases the risk of the outcome.
- If βn < 0, then the variable n reduces the risk of the outcome.
- A large βn means the variable n has a large influence on the outcome.
- A small βn means the variable n has a small influence on the outcome.
- f(z) = likelihood of the outcome, such as death = e^z/(e^z + 1) = 1/(1 + e^-z)
20 The logistic function is useful because it can take as input any z from -∞ to +∞, whereas the output, f(z), is confined to values between 0 and 1.
Note: if z = 0, f(z) = 0.5, because 1/(1 + e^0) = 1/(1 + 1) = 1/2 = 0.5
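A quick numeric check of the logistic function (mirroring the note above that f(0) = 0.5):

```python
# The logistic function f(z) = 1/(1 + e^(-z)) maps any z to a value between 0 and 1.
import math

def f(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10, -5, 0, 2.5, 10):
    print(f"f({z}) = {f(z):.3f}")
# f(0) = 0.500, and f(z) approaches 0 as z -> -infinity and 1 as z -> +infinity.
```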
21 Example 4. Logistic regression.
- Three independent variables, x1, x2, and x3, are studied to try to predict the 10-year death risk from heart disease. Using data obtained from a large study population, the following logistic regression model was derived to best fit the data: z = β0 + β1x1 + β2x2 + β3x3
  - x1 = age in years above 50 (age is a continuous variable); β1 = 2.0
  - x2 = sex, where 0 is male and 1 is female (gender is a categorical variable); β2 = -1.0
  - x3 = blood cholesterol in mmol/L above 5 mmol/L (194 mg/dL); β3 = 1.2
  - β0 = -5.0
- Risk of death = f(z) = 1/(1 + e^-z), where z (the logit) = -5.0 + 2.0·x1 - 1.0·x2 + 1.2·x3
- Thus, in a 50 y.o. man with a cholesterol of 5 mmol/L (all x's = 0), z = β0 = -5.0 (see previous figure)
22 Logistic regression, continued
- Three independent variables, x1, x2, and x3, are studied to try to predict the 10-year death risk from heart disease. Using data obtained from a large study population, the following logistic regression model was derived to best fit the data: z = β0 + β1x1 + β2x2 + β3x3
  - x1 = age in years above 50 (age is a continuous variable); β1 = 2.0
  - x2 = sex, where 0 is male and 1 is female (gender is a categorical variable); β2 = -1.0
  - x3 = blood cholesterol in mmol/L above 5 mmol/L (194 mg/dL); β3 = 1.2
  - β0 = -5.0
- Risk of death = f(z) = 1/(1 + e^-z), where z (the logit) = -5.0 + 2.0·x1 - 1.0·x2 + 1.2·x3
- Thus, in a 50 y.o. man with a cholesterol of 5 mmol/L (all x's = 0), z = β0 = -5.0
- Example 4: What is the risk of death in the next 10 years from heart disease in a 50-year-old man with a blood cholesterol of 7 mmol/L (272 mg/dL)?
  - z = -5.0 + 2(50-50) - 1(0) + 1.2(7-5). Thus, z = -5.0 + 0 + 0 + 2.4 = -2.6
  - Since z = -2.6 in this man, f(z) = his risk of 10-year death from heart disease = 1/(1 + e^-z) = 1/(1 + e^2.6) = 0.07, a 7% 10-yr risk.
- The 95% confidence intervals can also easily be calculated for f(z).
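The Example 4 arithmetic can be reproduced directly; the coefficients are those given on the slide, and the small helper function is my own:

```python
# Example 4: 10-year risk of death from heart disease in a 50-year-old man
# with cholesterol 7 mmol/L, using the logistic model from the slide.
import math

b0, b1, b2, b3 = -5.0, 2.0, -1.0, 1.2

def risk(age, sex, chol):
    # x1 = years above 50, x2 = sex (0 male, 1 female), x3 = mmol/L above 5
    z = b0 + b1 * (age - 50) + b2 * sex + b3 * (chol - 5)
    return 1.0 / (1.0 + math.exp(-z))

print(f"{risk(age=50, sex=0, chol=7):.2f}")   # z = -2.6, risk ~0.07 (a 7% 10-year risk)
```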
23 Ex. 4
24 VII. Stepwise multivariate regression
- If several variables INDIVIDUALLY help to predict an outcome by univariate analysis, but these variables could be closely related to each other, stepwise multivariate analysis helps sort out the independent contributions of the variables.
  - e.g., blood pressure, BMI, and type 2 DM EACH increase the risk of MI
- This procedure is used primarily in regression modeling. At each step, after a new variable is added, a test is made to see whether some variables can be deleted without appreciably increasing the discrepancy between the data and the regression model.
- The procedure terminates when the measure of fit is maximized, or when the available improvement (from adding more variables) falls below some critical, predetermined value.
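As a generic illustration of the idea only (not the exact likelihood-ratio stepwise procedure used in the study cited on the next slides), here is a forward-selection sketch with scikit-learn on synthetic data; all variable names and numbers are placeholders:

```python
# Forward stepwise selection sketch (synthetic data; illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(0)
n = 300
statin = rng.integers(0, 2, n)          # candidate predictor 1
older  = rng.integers(0, 2, n)          # candidate predictor 2
male   = rng.integers(0, 2, n)          # candidate predictor 3
logit = -1.0 + 1.9 * statin + 0.7 * older + 0.1 * male
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # e.g., normal scan (1) vs. not (0)

X = np.column_stack([statin, older, male])
selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X, outcome)
print(selector.get_support())   # which of [statin, older, male] survive forward selection
```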
25 Example 5. Stepwise multivariate regression.
- Cohort of ≈300 outpatients with low serum TSH undergoing radioiodine uptake and scan.
- Many, but not all, had thyroid disease (e.g., Graves'). Numerous variables were examined to see which correlated with a normal uptake and scan result.
- Three of the numerous variables examined predicted a normal uptake and scan:
  - If the patient was using a statin: OR 6.5 (95% CI, 2.9-14.6)
  - If the patient was a man: OR 2.5 (95% CI, 1.3-4.5)
  - If the patient was > 45 years of age: OR 2.0 (95% CI, 1.1-3.6)
- Which of these variables independently predicted a normal thyroid uptake and scan despite the low serum TSH?
- Is it statin use, being male, and/or being older than 45 that predicts normal thyroid function if a patient has a low serum TSH?
26 Example 5: Stepwise multivariate regression
- Step 1: STATIN USE, χ² = 21.8, P < 0.001
- Step 2: OLDER AGE, χ² = 8.5, P = 0.004
- Step 3: MALE GENDER, χ² = 3.9, not significant
(from Yandell et al. Thyroid 2008; 18:1039-42)
27 VIII. Type 1 (α) and Type 2 (β) Errors
Null hypothesis: there is no difference between the two treatments
- If the null hypothesis is rejected but no true difference exists: Type 1 (α) error (= P), which can be large or small
- If the null hypothesis is rejected and a true difference exists: correct decision (no error)
- If the null hypothesis is accepted and no true difference exists: correct decision (no error)
- If the null hypothesis is accepted but a true difference exists: Type 2 (β) error, which can be large or small
28 Choosing the size of α and β errors
- The type 1 error, or α (also called P), is conventionally set at 0.05 (5%)
  - i.e., the chance of a type 1 error if the null hypothesis is rejected is < 5%
  - Can state P < 0.05 or give the exact P value (e.g., P = 0.001, or P = 0.049)
- The type 2 error, or β, is often set at 2 to 4 times α, or 0.10-0.20 (10-20%)
  - i.e., the chance of making a type 2 error if the null hypothesis is accepted is 10-20%
- Power to detect a real difference (and thus to reject the null hypothesis) = 1 - β
  - smaller β (e.g., 0.1) → more power (0.9)
  - larger β (e.g., 0.2) → less power (0.8)
- If a study is highly powered and the null hypothesis is accepted, the chance of there being a true difference is quite small.
- If the study is under-powered and the null hypothesis is accepted, there can be little confidence that a true difference has been excluded.
29 Example 6: Use of α and β in sample size planning
A new antibiotic is developed for C. difficile. How many patients would need to be included in a phase 3 trial to be able to show that this new drug is superior to metronidazole? To answer this question, we need to know:
1. What is the expected success rate for metronidazole? (P1)
2. What would be a clinically important and expected improvement in success rate (based on phase 1/2 studies) with the new drug? (P2)
3. What should be the α (type 1 error) and the β (type 2 error) for the study? (Recall: Power = 1 - β.)
30 Sample size estimation, cont'd
- P1 = 0.75 (metronidazole, based on literature)
- P2 = 0.90 (new Rx, based on small phase 1/2 trials)
- α = 0.05 (1 in 20)
- β = 0.10 (1 in 10); Power = 0.90 (9 in 10)
- Needed N1 and N2 = 158 per group (from Fleiss tables), or 316 patients in total
- If a ≈10% dropout rate is expected, then 158 + 16 = 174 per group, or 348 patients in total, would need to be randomized.
- (This sample size may necessitate a multi-center study to enroll sufficient patients during the proposed time frame.)
- Analyze data by intent-to-treat and by evaluable patients.
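For comparison, an approximate version of this calculation can be run in Python with statsmodels; note that this sketch uses the uncorrected normal approximation (arcsine effect size), so it returns a somewhat smaller n per group than the continuity-corrected Fleiss tables cited above:

```python
# Sample-size sketch for comparing two proportions (P1 = 0.75 vs P2 = 0.90,
# two-sided alpha = 0.05, power = 0.90), via the uncorrected normal approximation.
# Continuity-corrected tables (e.g., Fleiss) give a larger n per group.
import math
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

p1, p2 = 0.75, 0.90                               # metronidazole vs. new drug success rates
h = abs(proportion_effectsize(p1, p2))            # Cohen's h effect size
n_per_group = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.90,
                                           ratio=1.0, alternative="two-sided")
n_per_group = math.ceil(n_per_group)
print(n_per_group, "per group before allowing for dropouts")
print(math.ceil(n_per_group * 1.1), "per group allowing for a ~10% dropout rate")
```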