Statistics for clinicians - PowerPoint PPT Presentation

Slides: 56
Provided by: Kip65
Learn more at: https://sites.pitt.edu
Transcript and Presenter's Notes

1
Statistics for clinicians
  • Biostatistics course by Kevin E. Kip, Ph.D., FAHA
  • Professor and Executive Director, Research Center
  • University of South Florida, College of Nursing
  • Professor, College of Public Health
  • Department of Epidemiology and Biostatistics
  • Associate Member, Byrd Alzheimer's Institute
  • Morsani College of Medicine
  • Tampa, FL, USA

2
SECTION 3.1 Module Overview and Introduction
Confidence intervals, estimation of parameters,
and hypothesis testing.
3
  • Module 3 Learning Objectives
  • Describe the concepts of parameter estimation and
    confidence intervals
  • Apply use of the z and t distribution for
    calculation of confidence intervals based on
    sample size
  • Select appropriate z and t values based on the
    width of a desired confidence interval
  • Calculate and interpret confidence intervals for
    means, proportions, and relative risk for one and
    two sample designs including matched design
  • Use SPSS to calculate confidence intervals
  • Distinguish the theoretical relationship between
    the risk ratio and odds ratio

4
  • Module 3 Learning Objectives
  • List the concept, guidelines, and primary steps
    involved in hypothesis testing
  • Differentiate between the null and
    alternative hypothesis.
  • Understand and interpret parameters used in
    hypothesis testing (level of significance,
    p-value).
  • Differentiate type I and type II error and
    factors that impact statistical power.
  • Calculate and interpret sample hypotheses
  • a) One-sample - continuous outcome
  • b) One-sample - dichotomous outcome
  • c) One-sample - categorical/ordinal outcome
  • d) Matched design - continuous outcome

5
Assigned Reading
Textbook: Essentials of Biostatistics in Public
Health, Chapters 6 and 7
6
Key terms
Estimation: Process of determining a likely value
for a population parameter (e.g. mean or
proportion) based on a sample.
Point Estimate: Single-valued estimate of a
population parameter, such as a mean or a
proportion.
Confidence Interval (CI): Range of likely values
for a population parameter with a level of
confidence attached (e.g. 95% confidence that the
interval contains the unknown parameter).
General form for a CI is: point estimate ± margin
of error.
Common confidence levels are 90%, 95%, and 99%
but, theoretically, any level between 0% and 100%
can be selected.
7
SECTION 3.2 Use of the z and t distributions for
calculation of confidence intervals
8
For the standard normal distribution, the
following is true: P(-1.96 < z < 1.96) = 0.95,
i.e. there is a 95% probability that a
standard normal variable, denoted z, will fall
between -1.96 and 1.96. Using the Central Limit
Theorem, and some algebra, the 95% confidence
interval (CI) for the population mean is
$\bar{X} \pm 1.96 \, \sigma / \sqrt{n}$
The general form for a CI can be written as
point estimate ± z x SE(point estimate)
where z is the value from the standard normal
distribution reflecting the desired confidence
level, and SE = standard error of the point
estimate.
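
The general form above can be sketched in Python. This is a minimal illustration, not part of the course materials: the function names `z_value` and `confidence_interval` are made up for the example, and the z value is taken from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_value(confidence_level: float) -> float:
    """Two-sided critical z for a confidence level given as a fraction (0.95 = 95%)."""
    tail = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)


def confidence_interval(point_estimate: float, se: float, confidence_level: float = 0.95):
    """General form: point estimate +/- z x SE(point estimate)."""
    margin = z_value(confidence_level) * se
    return point_estimate - margin, point_estimate + margin


# Critical values for the three most common confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_value(level):.3f}")
```

A higher confidence level gives a larger z and therefore a wider interval for the same standard error.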
9
For the formula for the mean (or any other
parameter), we often do not know the true value of
the population standard deviation (σ)
  • For large sample sizes (n > 30), σ can be
    estimated from the sample standard deviation (s)
    based on the Central Limit Theorem.
  • For small sample sizes (n < 30), the Central Limit
    Theorem does not apply, and instead, the t
    distribution is used (Table 2 of Appendix)
  • t values depend on n
  • small samples have larger t values (less
    precision)
  • values are indexed by degrees of freedom (df =
    n-1)

10
Listing of Selected t Values for Confidence
Intervals
Example: For a confidence interval of a mean with
n < 30, use t
      Confidence Level
df    80%     90%     95%     98%     99%
4     1.533   2.132   2.776   3.747   4.604
5     1.476   2.015   2.571   3.365   4.032
6     1.440   1.943   2.447   3.143   3.707
7     1.415   1.895   2.365   2.998   3.499
8     1.397   1.860   2.306   2.896   3.355
9     1.383   1.833   2.262   2.821   3.250
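
As a small sketch of how the table is read, the lookup below stores the listed t values and indexes them by df = n - 1. The names `T_VALUES` and `t_value` are illustrative only, and the dictionary covers just the rows shown on this slide.

```python
# Selected two-sided t values copied from the slide: T_VALUES[df][confidence %]
T_VALUES = {
    4: {80: 1.533, 90: 2.132, 95: 2.776, 98: 3.747, 99: 4.604},
    5: {80: 1.476, 90: 2.015, 95: 2.571, 98: 3.365, 99: 4.032},
    6: {80: 1.440, 90: 1.943, 95: 2.447, 98: 3.143, 99: 3.707},
    7: {80: 1.415, 90: 1.895, 95: 2.365, 98: 2.998, 99: 3.499},
    8: {80: 1.397, 90: 1.860, 95: 2.306, 98: 2.896, 99: 3.355},
    9: {80: 1.383, 90: 1.833, 95: 2.262, 98: 2.821, 99: 3.250},
}


def t_value(n: int, confidence_pct: int) -> float:
    """Look up t for a sample of size n; the table is indexed by df = n - 1."""
    df = n - 1
    if df not in T_VALUES:
        raise KeyError(f"df = {df} is not in the selected table (df 4 to 9 only)")
    return T_VALUES[df][confidence_pct]


# A sample of 10 has 9 degrees of freedom
print(t_value(10, 95))
```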
11
  • SECTION 3.3
  • Calculation and interpretation of confidence
    intervals
  • One Sample
  • Continuous outcome
  • Dichotomous outcome

12
CI for One Sample Continuous Outcome
Parameter: Mean Body Mass Index (BMI)
Sample N: 180 (n > 30, so use large sample z value)
Sample Mean: 28.2
Sample SD: 5.4
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for µ:
28.2 ± 1.96 x (5.4 / sqrt(180)) = 28.2 ± 0.79
= (27.4, 29.0)
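
The BMI calculation can be checked with a few lines of Python. This is a sketch under the slide's assumptions (large sample, z = 1.96); the function name `mean_ci` is chosen for the example.

```python
import math


def mean_ci(mean: float, sd: float, n: int, critical_value: float = 1.96):
    """CI for a mean: mean +/- critical value x (SD / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    margin = critical_value * standard_error
    return mean - margin, mean + margin


# BMI example: n = 180, mean = 28.2, SD = 5.4
lower, upper = mean_ci(28.2, 5.4, 180)
print(f"95% CI for mean BMI: ({lower:.1f}, {upper:.1f})")
```

The same function can be reused for any one-sample mean by passing a different critical value, for example 1.645 for a 90% interval.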
13
[Figure: 95% C.I. drawn around the point estimate
of 28.2, with lower limit 27.4 and upper limit 29.0]
From the sample, we estimate the mean BMI as
28.2, and are 95% confident that the true
population mean lies between 27.4 and 29.0.
14
CI for One Sample Continuous Outcome
(Practice)
Parameter: Mean diastolic blood pressure
Sample N: 503
Sample Mean: 80.69
Sample SD: 10.176
Confidence Level: 95%
Z value ___ or t value ____
95% Confidence Interval for µ:
15
CI for One Sample Continuous Outcome
(Practice)
Parameter: Mean diastolic blood pressure
Sample N: 503 (large sample, n > 30)
Sample Mean: 80.69
Sample SD: 10.176
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for µ:
80.69 ± 1.96 x 10.176 / sqrt(503) = 80.69 ± 0.889
= (79.8, 81.6)
16
CI for One Sample Continuous Outcome
(Practice)
80.69 ± 1.96 x 10.176 / sqrt(503) = 80.69 ± 0.889
= (79.8, 81.6)
SPSS: Analyze > Compare Means > One-Sample T
Test > Options > 95% confidence interval
17
CI for One Sample Continuous Outcome
(Practice)
Parameter: Mean resting pulse (beats per minute)
Sample N: 14
Sample Mean: 63.3
Sample SD: 9.5
Confidence Level: 95%
Z value ___ or t value ____
95% Confidence Interval for µ:
18
CI for One Sample Continuous Outcome
(Practice)
Parameter: Mean resting pulse (beats per minute)
Sample N: 14 (small sample, n < 30)
Sample Mean: 63.3
Sample SD: 9.5
Confidence Level: 95%
t value: 2.16 (df = n-1 = 13)
95% Confidence Interval for µ:
63.3 ± 2.16 x 9.5 / sqrt(14) = 63.3 ± 5.484
= (57.8, 68.8)
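
To illustrate why the t value matters for a small sample, the sketch below computes the pulse interval with t = 2.16 (the value quoted on the slide for df = 13) and, for comparison only, with z = 1.96. The function name `mean_ci_small_sample` is illustrative.

```python
import math


def mean_ci_small_sample(mean: float, sd: float, n: int, t_critical: float):
    """CI for a mean when n < 30; t_critical is looked up for df = n - 1."""
    margin = t_critical * sd / math.sqrt(n)
    return mean - margin, mean + margin


T_95_DF13 = 2.16  # t value for 95% confidence and df = 13, as given on the slide
Z_95 = 1.96       # z value, shown only for comparison

t_lower, t_upper = mean_ci_small_sample(63.3, 9.5, 14, T_95_DF13)
z_lower, z_upper = mean_ci_small_sample(63.3, 9.5, 14, Z_95)

print(f"Using t: ({t_lower:.1f}, {t_upper:.1f})")
print(f"Using z: ({z_lower:.1f}, {z_upper:.1f})")
```

The t-based interval is the wider of the two, which reflects the lower precision of a small sample.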
19
CI for One Sample Dichotomous Outcome
Parameter: Proportion of population treated for
hypertension
Sample N: 3,532 (large sample, so use z value)
Sample Proportion: 0.345 (i.e. 1,219 / 3,532)
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for p:
$\hat{p} \pm z \sqrt{\hat{p}(1-\hat{p})/n}$
0.345 ± 1.96 x sqrt(0.345 x 0.655 / 3532)
= 0.345 ± 0.016 = (0.329, 0.361)
From the sample, we estimate the proportion of
persons treated for hypertension to be 0.345, and
we are 95% confident that the true proportion
lies between 0.329 and 0.361.
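
A minimal Python sketch of the one-sample proportion interval, using the large-sample formula $\hat{p} \pm z \sqrt{\hat{p}(1-\hat{p})/n}$ with the hypertension counts from this slide. The function name `proportion_ci` is an assumption made for the example.

```python
import math


def proportion_ci(events: int, n: int, z: float = 1.96):
    """Large-sample CI for a proportion: p +/- z x sqrt(p(1 - p) / n)."""
    p_hat = events / n
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * standard_error
    return p_hat, p_hat - margin, p_hat + margin


# Hypertension example: 1,219 of 3,532 persons treated
p_hat, lower, upper = proportion_ci(1219, 3532)
print(f"p = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```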
20
CI for One Sample Dichotomous Outcome
(Practice)
Parameter: Proportion of population with diabetes
Sample N: 501
Sample Proportion: (91 / 501) = _______
Confidence Level: 95%
Z value: _______
95% Confidence Interval for p:
21
CI for One Sample Dichotomous Outcome
(Practice)
Parameter: Proportion of population with diabetes
Sample N: 501 (large sample, so use z value)
Sample Proportion: (91 / 501) = 0.1816
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for p:
0.1816 ± 1.96 x sqrt(0.1816 x 0.8184 / 501)
= 0.1816 ± 0.0338 = (0.148, 0.215)
From the sample, we estimate the proportion of
persons with diabetes to be 0.1816, and we are
95% confident that the true proportion lies
between 0.148 and 0.215.
22
  • SECTION 3.4
  • Calculation and interpretation of confidence
    intervals
  • Two Samples Matched
  • Continuous outcome

23
  • CI for Two Samples Matched Continuous Outcome
  • Often used for intervention studies with a pre-
    and post-measurement design (e.g. before and
    after treatment)
  • Goal is to compare the mean score before and
    after the intervention
  • Because the sample is matched (same persons
    completing pre- and post-measurements), cannot
    use aggregate means; use each person's
    difference score instead (see below)
  • Subject ID   Pre   Post   Difference
  • 1            158   132    -26
  • 2            148   138    -10
  • 3            152   158      6
  • 4            155   131    -24
  • Parameter of interest is the mean difference,
    denoted µd
  • The sample SD of the difference scores is
    denoted sd
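
The difference scores in the table can be reproduced with a short Python sketch. It uses only the four subjects listed above, so it illustrates how the mean difference and its SD are obtained rather than giving a full analysis; the variable names are illustrative.

```python
from statistics import mean, stdev

# Pre and post scores for the four subjects shown in the table
pre = [158, 148, 152, 155]
post = [132, 138, 158, 131]

# One difference score per person (post minus pre), not two separate group means
differences = [after - before for before, after in zip(pre, post)]

d_bar = mean(differences)   # sample mean difference
s_d = stdev(differences)    # sample SD of the difference scores (n - 1 denominator)

print("Differences:", differences)
print(f"Mean difference = {d_bar:.1f}, SD of differences = {s_d:.2f}")
```

The CI is then built from these two summary values and the number of persons, exactly as for a one-sample mean.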

24
CI for Two Samples Matched Continuous Outcome
Parameter: Mean difference in depressive symptom
scores after taking a new drug
Xd = -12.7
Sample N: 100 (number of persons, not measurements)
Sample SD: SD of difference scores, sd = 8.9
Confidence Level: 95%
Z value: 1.96
-12.7 ± 1.96 x (8.9 / sqrt(100)) = -12.7 ± 1.74
= (-14.4, -11.0)
25
CI for Two Samples Matched Continuous Outcome
(Practice)
Parameter: Mean difference in anxiety symptom
scores after psychotherapy
Xd = -14.8
Sample N: 52 (number of persons, not measurements)
Sample SD: SD of difference scores, sd = 9.6
Confidence Level: 90%
Z value: ______
26
CI for Two Samples Matched Continuous Outcome
(Practice)
Parameter: Mean difference in anxiety symptom
scores after psychotherapy
Xd = -14.8
Sample N: 52 (number of persons, not measurements)
Sample SD: SD of difference scores, sd = 9.6
Confidence Level: 90%
Z value: 1.645
-14.8 ± 1.645 x (9.6 / sqrt(52)) = -14.8 ± 2.19
= (-17.0, -12.6)
From the sample, we estimate a mean difference in
anxiety scores of -14.8 after undergoing
psychotherapy, and we are 90% confident that the
true mean difference lies between -17.0 and
-12.6.
27
  • SECTION 3.5
  • Calculation and interpretation of confidence
    intervals
  • Two Samples - Independent
  • Continuous mean difference
  • Dichotomous risk difference
  • Dichotomous risk ratio
  • Dichotomous odds ratio

28
  • CI for Two Samples Independent Continuous
    Outcome
  • Common parameter of interest is the difference in
    means between the two groups, estimated by
    X1 - X2 and denoted for the population as
    µ1 - µ2
  • Since there are 2 independent groups, we also
    have
  • n1 and n2 and s1 and s2
  • If the sample variances are approximately equal,
    then we can pool the standard deviations, s1
    and s2. A typical rule of thumb to pool is
  • 0.5 < s1^2 / s2^2 < 2.0
  • The pooled (common) standard deviation is a
    weighted average
  • $S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

29
CI for Two Samples Independent Continuous
Outcome
Parameter: Mean difference in systolic blood
pressure between a sample of men and a sample
of women
Xmen = 128.2, n1 = 1623, s1 = 17.5
Xwomen = 126.5, n2 = 1911, s2 = 20.1
Note: s1^2 / s2^2 = 0.76, so can use pooled
SD (Sp)
Confidence Level: 95%
Z value: 1.96
Sp = sqrt((1622 x 17.5^2 + 1910 x 20.1^2) / 3532)
= sqrt(359.12) = 19.0
Formula: $(\bar{X}_1 - \bar{X}_2) \pm z \, S_p \sqrt{1/n_1 + 1/n_2}$
30
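CI for Two Samples Independent Continuous
Outcome
Parameter: Mean difference in systolic blood
pressure between a sample of men and women
Xmen = 128.2, n1 = 1623, s1 = 17.5
Xwomen = 126.5, n2 = 1911, s2 = 20.1
Formula: $(\bar{X}_1 - \bar{X}_2) \pm z \, S_p \sqrt{1/n_1 + 1/n_2}$
(128.2 - 126.5) ± 1.96 x 19.0 x sqrt(1/1623 + 1/1911)
= 1.7 ± 1.26 = (0.44, 2.96)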
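
The pooled calculation can be sketched in Python as follows. The function names are illustrative, and the check on the variance ratio simply encodes the rule of thumb (between 0.5 and 2.0) stated earlier. Because the code keeps Sp unrounded (about 18.95 rather than 19.0), the limits can differ from the slide in the last digit.

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled SD: weighted average of the two sample variances, then square root."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def mean_difference_ci(x1, s1, n1, x2, s2, n2, z: float = 1.96):
    """CI for the difference in means of two independent groups (pooled SD)."""
    variance_ratio = s1**2 / s2**2
    if not 0.5 < variance_ratio < 2.0:
        raise ValueError("Variance ratio outside 0.5 to 2.0; pooling is not advised")
    sp = pooled_sd(s1, n1, s2, n2)
    margin = z * sp * math.sqrt(1.0 / n1 + 1.0 / n2)
    difference = x1 - x2
    return difference, difference - margin, difference + margin


# Systolic blood pressure: men (group 1) versus women (group 2)
diff, lower, upper = mean_difference_ci(128.2, 17.5, 1623, 126.5, 20.1, 1911)
print(f"Sp = {pooled_sd(17.5, 1623, 20.1, 1911):.2f}")
print(f"Difference = {diff:.1f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```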
31
CI for Two Samples Independent Continuous
Outcome (Practice)
Parameter: Mean difference in depression scores
between a sample of men and women
Xmen = 5.77, n1 = 163, s1 = 7.674
Xwomen = 6.86, n2 = 333, s2 = 8.714
Note: s1^2 / s2^2 = _________
Assume calculation of a 95% confidence interval
32
CI for Two Samples Independent Continuous
Outcome (Practice)
Parameter: Mean difference in depression scores
between a sample of men and women
Xmen = 5.77, n1 = 163, s1 = 7.674
Xwomen = 6.86, n2 = 333, s2 = 8.714
Note: s1^2 / s2^2 = 0.78, so can use pooled SD
(Sp)
Sp = sqrt((9540 + 25210) / 494) = 8.39
(5.77 - 6.86) ± 1.96 x 8.39 x sqrt(1/163 + 1/333)
Point estimate: -1.09
95% C.I.: (-2.66, 0.49)
33
CI for Two Samples Independent Continuous
Outcome (Practice)
Parameter: Mean difference in depression scores
between a sample of men and women
Xmen = 5.77, n1 = 163, s1 = 7.674
Xwomen = 6.86, n2 = 333, s2 = 8.714
From the sample, we estimate a mean difference in
depression scores between men and women of -1.09,
and we are 95% confident that the true mean
difference lies between -2.66 and 0.49.
SPSS: Analyze > Compare Means >
Independent-Samples T Test > Test Variable >
Grouping Variable > Options > CI percentage
34
  • CI for Two Samples Independent Risk Difference
  • Parameter of interest is the risk difference for
    the incidence proportions in the population,
    denoted as RD = p1 - p2
  • For a sample, the point estimate for the risk
    difference is denoted as
  • $\widehat{RD} = \hat{p}_1 - \hat{p}_2$

Formula: $\hat{p}_1 - \hat{p}_2 \pm z \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Example: Incidence of CVD in Smokers and
Non-Smokers
                 No CVD   CVD        Total   Incidence
Current smoker   663      81 (x1)    744     p1 = 81 / 744 = 0.1089
Non-smoker       2757     298 (x2)   3055    p2 = 298 / 3055 = 0.0975
Total            3420     379        3799
35
  • CI for Two Samples Independent Risk Difference
  • Example: Compare the incidence proportion of CHD
    among smokers (exposed) and non-smokers (not
    exposed)
  • Smokers: n1 = 744, w/CHD (x1) = 81, p1 = 0.1089
  • Non-smokers: n2 = 3055, w/CHD (x2) = 298, p2 =
    0.0975
  • Confidence Level: 95%, Z value: 1.96

(0.1089 - 0.0975) ± 1.96 x sqrt(0.1089(1 - 0.1089) / 744 + 0.0975(1 - 0.0975) / 3055)
= 0.0114 ± 0.0247 = (-0.0133, 0.0361)
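
The risk difference interval can be sketched in Python directly from the counts. The function name `risk_difference_ci` is illustrative; the code works from unrounded proportions, so the fourth decimal place may differ slightly from the hand calculation above.

```python
import math


def risk_difference_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """CI for p1 - p2 from event counts x1, x2 and group sizes n1, n2."""
    p1 = x1 / n1
    p2 = x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    rd = p1 - p2
    return rd, rd - margin, rd + margin


# Smokers: 81 events of 744; non-smokers: 298 events of 3055
rd, lower, upper = risk_difference_ci(81, 744, 298, 3055)
print(f"RD = {rd:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

An interval that includes 0, as this one does, is consistent with no difference in risk between the two groups.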
36
  • CI for Two Samples Independent Risk Difference
    (Practice)
  • Example: Compare the incidence proportion of
    sleep disorder among persons on statins (exposed)
    and not on statins (not exposed)
  • Confidence Level: 95%, Z value: _______

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502
37
  • CI for Two Samples Independent Risk Difference
    (Practice)
  • Example: Compare the incidence proportion of
    sleep disorder among persons on statins (exposed)
    and not on statins (not exposed)
  • Confidence Level: 95%, Z value: 1.96

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502
(0.1333 - 0.0705) ± 1.96 x sqrt(0.1333(1 - 0.1333) / 105 + 0.0705(1 - 0.0705) / 397)
= 0.063 ± 0.0697 = (-0.007, 0.133)
38
  • CI for Two Samples Independent Risk Difference
    (Practice)
  • Example: Compare the incidence proportion of
    sleep disorder among persons on statins (exposed)
    and not on statins (not exposed)
  • Confidence Level: 95%, Z value: 1.96

0.063 ± 0.0697 = (-0.007, 0.133)
From the sample, we estimate that the absolute risk
of sleep disorder is 0.063 higher in statin users
compared to non-users, and we are 95% confident
that the true risk difference lies between
-0.007 and 0.133.
39
  • CI for Two Samples Independent Risk Ratio
  • Parameter of interest is the ratio of the
    incidence proportions for the population, denoted
    as RR = p1 / p2
  • For a sample, the point estimate for the risk
    ratio (RR) is denoted as
  • $\widehat{RR} = \hat{p}_1 / \hat{p}_2$
  • Note that the RR does not follow a normal
    distribution, but the natural log (ln) of the RR
    is approximately normally distributed and is used
    to calculate the confidence interval; this
    entails 2 steps
  • --- Calculate CI for ln(RR)
  • --- Calculate CI for RR (i.e. transform)

CI for ln(RR): $\ln(\widehat{RR}) \pm z \sqrt{\frac{(n_1 - x_1)/x_1}{n_1} + \frac{(n_2 - x_2)/x_2}{n_2}}$
CI for RR: (exp(Lower limit), exp(Upper limit))
40
CI for Two Samples Independent Risk Ratio
RR = p1 / p2
CI for ln(RR): $\ln(\widehat{RR}) \pm z \sqrt{\frac{(n_1 - x_1)/x_1}{n_1} + \frac{(n_2 - x_2)/x_2}{n_2}}$
CI for RR: (exp(Lower limit), exp(Upper limit))
  • Example: Compare future risk of CHD among smokers
    (exposed) and non-smokers (not exposed)
  • Smokers: n1 = 744, w/CHD (x1) = 81, p1 = 0.1089
  • Non-smokers: n2 = 3055, w/CHD (x2) = 298, p2 =
    0.0975
  • Confidence Level: 95%, Z value: 1.96

RR = p1 / p2 = 0.1089 / 0.0975 = 1.12
CI for ln(RR):
ln(1.12) ± 1.96 x sqrt((663 / 81) / 744 + (2757 / 298) / 3055)
= 0.113 ± 0.232 = (-0.119, 0.345)
CI for RR: (exp(-0.119), exp(0.345)) = (0.89, 1.41)
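
The two-step procedure (interval on the log scale, then exponentiate) can be sketched in Python. The function name `risk_ratio_ci` is illustrative. The code uses the unrounded RR rather than 1.12, so the lower limit comes out near 0.88 to 0.89 instead of exactly matching the hand calculation.

```python
import math


def risk_ratio_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """CI for RR = p1 / p2, computed on the ln scale and transformed back."""
    rr = (x1 / n1) / (x2 / n2)
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(((n1 - x1) / x1) / n1 + ((n2 - x2) / x2) / n2)
    # Step 1: CI for ln(RR)
    log_lower = math.log(rr) - z * se_log_rr
    log_upper = math.log(rr) + z * se_log_rr
    # Step 2: exponentiate both limits to get the CI for RR
    return rr, math.exp(log_lower), math.exp(log_upper)


# Smokers: 81 events of 744; non-smokers: 298 events of 3055
rr, lower, upper = risk_ratio_ci(81, 744, 298, 3055)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that includes the null value of 1.0 is consistent with no difference in risk.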
41
CI for Two Samples Independent Risk Ratio
(Practice)
RR = p1 / p2
CI for ln(RR): $\ln(\widehat{RR}) \pm z \sqrt{\frac{(n_1 - x_1)/x_1}{n_1} + \frac{(n_2 - x_2)/x_2}{n_2}}$
CI for RR: (exp(Lower limit), exp(Upper limit))
  • Example: Compare the future risk of sleep
    disorder among statin users (exposed) versus
    non-statin users (not exposed)
  • Confidence Level: 95%, Z value: _______

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502
RR = p1 / p2 = _______
CI for ln(RR): _______
42
CI for Two Samples Independent Risk Ratio
(Practice)
RR = p1 / p2
CI for ln(RR): $\ln(\widehat{RR}) \pm z \sqrt{\frac{(n_1 - x_1)/x_1}{n_1} + \frac{(n_2 - x_2)/x_2}{n_2}}$
CI for RR: (exp(Lower limit), exp(Upper limit))
  • Example: Compare the future risk of sleep
    disorder among statin users (exposed) versus
    non-statin users (not exposed)
  • Confidence Level: 95%, Z value: 1.96

                  Sleep OK   Sleep Dx   Total   Incidence
Statin user       91         14 (x1)    105     p1 = 14 / 105 = 0.1333
Non-statin user   369        28 (x2)    397     p2 = 28 / 397 = 0.0705
Total             460        42         502
RR = p1 / p2 = 0.1333 / 0.0705 = 1.89
CI for ln(RR):
ln(1.89) ± 1.96 x sqrt((91 / 14) / 105 + (369 / 28) / 397)
= 0.6366 ± 0.6044 = (0.0322, 1.2410)
CI for RR: (exp(0.0322), exp(1.2410)) = (1.03, 3.46)
43
CI for Two Samples Independent Risk Ratio
(Practice)
  • Example: Compare the future risk of sleep
    disorder among statin users (exposed) versus
    non-statin users (not exposed)
  • Confidence Level: 95%, Z value: 1.96

RR = p1 / p2 = 0.1333 / 0.0705 = 1.89
CI for ln(RR): 0.6366 ± 0.6044 = (0.0322, 1.2410)
CI for RR: (exp(0.0322), exp(1.2410)) = (1.03, 3.46)
From the sample, we estimate that the risk of sleep
disorder is 1.89 times higher in statin users
compared to non-users, and we are 95% confident
that the true risk ratio lies between 1.03 and
3.46.
44
  • CI for Two Samples Independent Odds Ratio
  • Conceptually similar to risk ratio, yet the
    parameter of interest is the odds ratio (OR),
    defined as
  • Odds of exposure among cases / Odds of exposure
    among controls

Example: Prevalence of CVD in Smokers and
Non-Smokers (95% C.I.)
                     CVD (D)   No-CVD (D-)
Current smoker (E)   81        663
Non-smoker (E-)      298       2757

              Cases   Controls
Exposed       a       b
Not exposed   c       d
OR = (a / c) / (b / d) = (81 / 298) / (663 / 2757) = 1.13, Z = 1.96
CI for ln(OR): $\ln(\widehat{OR}) \pm z \sqrt{1/a + 1/b + 1/c + 1/d}$
ln(1.13) ± 1.96 x sqrt(1/81 + 1/663 + 1/298 + 1/2757)
= 0.122 ± 0.260 = (-0.138, 0.382)
CI for OR: (exp(-0.138), exp(0.382)) = (0.87, 1.47)
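
The odds ratio interval follows the same log-scale pattern and can be sketched in Python from the four cell counts. The function name `odds_ratio_ci` is illustrative, and the cell labels a, b, c, d follow the layout on this slide (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls).

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """CI for OR = (a / c) / (b / d), computed on the ln scale and transformed back."""
    odds_ratio = (a / c) / (b / d)
    # Standard error of ln(OR) uses the reciprocal of every cell count
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_lower = math.log(odds_ratio) - z * se_log_or
    log_upper = math.log(odds_ratio) + z * se_log_or
    return odds_ratio, math.exp(log_lower), math.exp(log_upper)


# CVD by smoking status: a = 81, b = 663, c = 298, d = 2757
or_hat, lower, upper = odds_ratio_ci(81, 663, 298, 2757)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```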
45
CI for Two Samples Independent Odds Ratio
(Practice)
OR = Odds of exposure among cases /
Odds of exposure among controls
Prevalence of Sleep Disorder Among Statin and
Non-Statin Users (95% C.I.)
              Cases   Controls
Exposed       a       b
Not exposed   c       d

                       Sleep Dx   Sleep OK
Statin user (E)        14         91
Non-statin user (E-)   28         369
OR = (a / c) / (b / d) = _________, Z =
___________
CI for ln(OR): $\ln(\widehat{OR}) \pm z \sqrt{1/a + 1/b + 1/c + 1/d}$
CI for OR: (exp(Lower limit), exp(Upper limit))
46
CI for Two Samples Independent Odds Ratio
(Practice)
OR = Odds of exposure among cases /
Odds of exposure among controls
Example: Prevalence of Sleep Disorder Among
Statin and Non-Statin Users
              Cases   Controls
Exposed       a       b
Not exposed   c       d

                       Sleep Dx   Sleep OK
Statin user (E)        14         91
Non-statin user (E-)   28         369
OR = (14 / 28) / (91 / 369) = 2.027, Z = 1.96
CI for ln(OR):
ln(2.027) ± 1.96 x sqrt(1/14 + 1/91 + 1/28 + 1/369)
= 0.7066 ± 0.6813 = (0.0253, 1.3879)
CI for OR: (exp(0.0253), exp(1.3879)) = (1.03, 4.01)
47
CI for Two Samples Independent Odds Ratio
(Practice)
OR = Odds of exposure among cases /
Odds of exposure among controls
Example: Prevalence of Sleep Disorder Among
Statin and Non-Statin Users
OR = (14 / 28) / (91 / 369) = 2.027
CI for ln(OR): 0.7066 ± 0.6813 = (0.0253, 1.3879)
CI for OR: (exp(0.0253), exp(1.3879)) = (1.03, 4.01)
From the sample, we estimate that the odds of
statin use among persons with sleep disorder are
2.03 times higher than the odds of statin use
among persons without sleep disorder, and we are
95% confident that the true odds ratio lies
between 1.03 and 4.01.
48
SECTION 3.6 Use of SPSS to calculate confidence
intervals
49
CI for Two Samples Independent Odds Ratio
(Practice)
Example: Prevalence of Sleep Disorder Among
Statin and Non-Statin Users
                       Sleep Dx   Sleep OK
Statin user (E)        14         91
Non-statin user (E-)   28         369
OR = (14 / 28) / (91 / 369) = 2.027
95% C.I. = (exp(0.0253), exp(1.3879)) = (1.03, 4.01)
SPSS: Analyze > Descriptive Statistics > Crosstabs
> Row and Column Variable > Statistics
(check Risk)
50
                       Sleep Dx   Sleep OK
Statin user (E)        14         91
Non-statin user (E-)   28         369
OR = 2.027, 95% C.I. (1.03, 4.01)
[Figure: number line from 0 (bounded at 0) to 10
(unbounded) marking the null value 1.0, the lower
limit 1.03, the OR of 2.03, and the upper limit 4.01]
  • Note
  • The confidence interval for a continuous variable
    such as a mean or difference in means is symmetric
    around the point estimate.
  • In contrast, for the risk ratio and odds ratio,
    the confidence interval is skewed to the right of
    the point estimate. This is because:
  • Values for RR and OR have a lower bound of 0 yet
    no upper bound
  • The C.I. formulas are based on an exponential
    function
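
The skew described in the note can be seen numerically. The sketch below takes the odds ratio and limits from the worked statin example (OR 2.027, limits 1.03 and 4.01) and compares the distances from the point estimate on the original scale and on the natural log scale; the variable names are illustrative.

```python
import math

odds_ratio, lower, upper = 2.027, 1.03, 4.01

# Original scale: the upper limit sits much farther from the OR than the lower limit
below = odds_ratio - lower
above = upper - odds_ratio
print(f"Original scale: {below:.2f} below, {above:.2f} above")

# Log scale: the two distances are nearly equal (they differ only by rounding)
log_below = math.log(odds_ratio) - math.log(lower)
log_above = math.log(upper) - math.log(odds_ratio)
print(f"Log scale: {log_below:.3f} below, {log_above:.3f} above")
```

The interval is symmetric where it is constructed (on the ln scale) and becomes lopsided only after the limits are exponentiated.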

51
SECTION 3.7 Relationship between the risk ratio
and the odds ratio
52
Odds Ratio Risk Ratio
  • Relationship between RR and OR
  • The odds ratio will provide a good estimate of
    the risk ratio when
  • 1. The outcome (disease) is rare
  • OR
  • 2. The effect size is small or modest

53
Odds Ratio Risk Ratio
  • The odds ratio will provide a good estimate of
    the risk ratio when
  • 1. The outcome (disease) is rare

     D   D-
E    a   b
E-   c   d

RR = [a / (a + b)] / [c / (c + d)]
If the disease is rare, then cells (a) and (c)
will be small, so that a + b ≈ b and c + d ≈ d
RR = [a / (a + b)] / [c / (c + d)] ≈ (a / b) / (c / d) = ad / bc = OR
OR = (a / c) / (b / d)
OR = (ad) / (bc)
54
Odds Ratio Risk Ratio
The odds ratio will provide a good estimate of
the risk ratio when 2. The effect size is small
or modest.
     D     D-
E    40    60
E-   120   180

OR = (40 / 120) / (60 / 180) = 0.333 / 0.333 = 1.0
RR = [40 / (40 + 60)] / [120 / (120 + 180)] = 0.40 / 0.40 = 1.0
55
Odds Ratio Risk Ratio
Finally, we expect the risk ratio to be closer to
the null value of 1.0 than the odds ratio.
Therefore, be especially cautious when
interpreting the odds ratio as a measure of
relative risk when the outcome is not rare and
the effect size is large.
     D    D-
E    20   30
E-   10   90

OR = (20 / 10) / (30 / 90) = 2.0 / 0.333 = 6.0
RR = (20 / 50) / (10 / 100) = 0.40 / 0.10 = 4.0
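
The comparison between the two measures can be sketched in Python for any 2 x 2 table laid out as above (a, b in the exposed row; c, d in the unexposed row). The function names are illustrative, and the three tables are the ones used in this module.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a / c) / (b / d), written as the cross-product ad / bc."""
    return (a * d) / (b * c)


tables = {
    "no effect (40, 60, 120, 180)": (40, 60, 120, 180),
    "large effect, common outcome (20, 30, 10, 90)": (20, 30, 10, 90),
    "statins and sleep disorder (14, 91, 28, 369)": (14, 91, 28, 369),
}

for label, cells in tables.items():
    print(f"{label}: RR = {risk_ratio(*cells):.2f}, OR = {odds_ratio(*cells):.2f}")
```

In the second table the outcome is common and the effect is large, so the OR (6.0) sits well above the RR (4.0); in the statin table the outcome is less common and the two values are closer together.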