Title: Statistical Inference
1. Statistical Inference
- Hypothesis Testing, Types of Errors, Probability,
Statistical Power
2. Statistical Inference
- A representative sample of the population is studied, then an attempt is made to extrapolate the conclusions to the population as a whole
- Test the hypothesis, looking for significant differences between the treatment and control groups
- Look for errors in sampling
3. Definitions
- Population - a collection of people who have at least one characteristic in common
- Sample - a portion of the population
- Parameter - a value that characterizes the population (e.g., standard deviation, mean)
- Statistics - tests used to estimate population parameters from a sample of the population
4. Example
- A controlled study was performed in 61 patients with severe alcoholic hepatitis to determine the effect of prednisone on their survival rate. The mean cumulative survival rate at 2 months was 88% in the prednisone group compared to 45% in patients receiving placebo.
- What is the statistic to be determined in the study?
5. Hypothesis Testing
- Null hypothesis - the hypothesis that is directly tested using statistical tests. Only when the null hypothesis is rejected can the alternative hypothesis be accepted.
- Alternative hypothesis - a statement indicating that there is a difference between the groups studied. May be one-tailed or two-tailed.
- Estimation - the process of using data from a sample to draw conclusions about population parameters.
6. Buspirone Example
- A study stated, "The objective was to determine if buspirone will reduce the withdrawal symptoms associated with smoking cessation."
- Is this a 1-tailed or 2-tailed hypothesis?
7. Hypothesis Testing: Errors
- Sources of error in a study
- inappropriate selection of sample
- measurement error
- improper assignment of patients to study groups
- random error due to unforeseen factors.
8. Types of Errors
- Type I error - rejecting the null hypothesis even though it is in fact true. (You falsely conclude that a significant difference exists between groups when one really doesn't exist.)
- Type II error - failing to reject (accepting) the null hypothesis when in fact it is false. (You falsely conclude that no significant difference exists when one really does exist.)
9. Type I and II Error Example
- A double-blind, placebo-controlled study was performed to determine the efficacy of captopril in delaying the progression of congestive heart failure. No significant difference was found between the two groups in terms of sudden death (13% in the captopril group and 11% in the placebo group).
- What is the null hypothesis?
- If there was an error in the findings from this study, which type of error might have been responsible for it?
10. Alpha
- Alpha - the probability of making a Type I error. Set at 0.05 as the cut-off for the amount of Type I error one is willing to accept in a study. Also called the level of significance.
- An alpha of 0.05 means one is willing to accept a Type I error of 5/100, or 1 time out of 20.
11. Beta
- Beta - the probability of making a Type II error. A beta of less than 0.2 is usually accepted as the maximum amount of Type II error one is willing to accept.
- Beta is important in calculating the statistical power of a study.
12. Reducing Type I and Type II Errors
- 1. Make the acceptable Type I error smaller by reducing the alpha level (level of significance) to less than 0.05.
- 2. Increase the sample size to reduce the possibility of Type II error.
- The larger the sample size, the more likely it is that the study sample represents the population and the less chance of sampling error.
13. Probability Values
- Usually reported as p < or p > some number, where "some number" is the alpha level, or probability of a Type I error.
- If p < 0.05, it means that there is less than a 1 out of 20 likelihood that the difference in measured parameters would have resulted from chance alone. The difference is then said to be statistically significant and due to treatment effect.
14. Probability Values (continued)
- p > 0.05 means that a difference as large as the observed difference might arise from chance, and the author will not accept this degree of chance.
- It is important to note that "not significant" means that a difference is not proved; it doesn't mean that there is absolutely no difference.
15. Probability Example
- A study was performed which compared the efficacy of piroxicam and diflunisal for the treatment of osteoarthritis. One group received piroxicam and the other received diflunisal. After 1 month of therapy, the patients who received piroxicam had significantly less pain (p < 0.002) than the diflunisal group.
16. Probability Values (continued)
- If p = 0.04, what does this mean?
- If statistical tests were performed on 25 different variables, then through chance alone you would expect at least 1 of the tests to produce a positive result through a Type I error.
- 0.04 = 4/100 = 1/25.
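The 1-in-25 arithmetic on this slide can be checked directly: with 25 independent tests each run at alpha = 0.05, the chance of at least one false positive (the family-wise error rate) is far above 5%. A minimal Python sketch, using only the numbers stated above:

```python
# Family-wise error rate: probability of at least one Type I error
# across m independent tests, each at significance level alpha.
alpha = 0.05
m = 25

# P(no false positive in one test) = 1 - alpha; raise to the m-th
# power for m independent tests, then take the complement.
fwer = 1 - (1 - alpha) ** m
print(f"Chance of at least one false positive in {m} tests: {fwer:.2%}")
```

So even though each individual test risks only a 1-in-25 false positive, running 25 of them makes a spurious "significant" result more likely than not.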
17. Probability
- Probability values and CIs complement each other. Example: The cure rate was 86% (95% CI: 76%-96%) for the treatment group as compared to the placebo group (p < 0.05).
- The 95% CI for the mean of 86% represents the likelihood that the true population cure rate falls within this range (76%-96%). The p-value shows the chance of a Type I error is very small, so the results are significant and not due to chance.
18. Type I, Type II Error Example
- A distribution of the means from repeated samples from a group of patients (group A) given propranolol to reduce BP showed a mean DBP value of 73 mmHg (p < 0.05). The reduction in group B patients given atenolol showed a mean value of 75 mmHg (p < 0.2).
- 73 mmHg (95% CI: 69-77)
- 75 mmHg (95% CI: 65-95)
19. Statistical Power
- Power is the ability of a statistical test to detect a significant difference between samples if the difference actually exists.
- Defined as 1 - beta. Since beta is the probability of making a Type II error (accepting the null hypothesis as true when it is really false), 1 - beta (power) is the probability of NOT making a Type II error.
20. Statistical Power
- Beta is usually 0.2, so power = 1 - 0.2 = 0.8 (i.e., 80% or greater).
- Power is determined prior to the study.
- Degree of power depends on:
- Sample size - the larger the sample size, the greater the power of statistical tests
- Alpha level - the larger the alpha level, the smaller the beta
- Effect size - the difference between populations which one would detect as significant
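The dependence of power on sample size can be illustrated with a rough normal-approximation calculation (a simplification of exact t-based power; the effect sizes and sample sizes below are hypothetical, and 1.96 is the familiar two-sided critical value for alpha = 0.05):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def approx_power(effect_size, n_per_group):
    """Approximate power of a two-sided, two-sample test at alpha = 0.05.

    effect_size is the standardized difference between group means
    (difference / SD). Uses the normal approximation with critical
    value 1.96, so this is a sketch, not an exact t-based result.
    """
    z_effect = effect_size * sqrt(n_per_group / 2)
    return normal_cdf(z_effect - 1.96)

# Larger samples give more power for the same effect size and alpha:
print(approx_power(0.5, 30))   # roughly 0.49 -- underpowered
print(approx_power(0.5, 64))   # roughly 0.81 -- meets the 0.8 target
```

This mirrors the slide: for a fixed effect size and alpha, only increasing the sample size raises power to the conventional 0.8 threshold.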
21. Effect Size
- Suppose you are measuring the effects of lovastatin and pravastatin on cholesterol levels. You would like to detect a cholesterol level drop of 20 points as significant. The 20 points would be the effect size. The alpha level is set at 0.01.
22. Statistical Power
- Hypothesis testing with probability values provides a yes-or-no answer concerning whether differences exist between the populations studied. The p-value itself doesn't say anything about the size of the difference between means.
23. Probability
- Example: The authors state "the drop in cholesterol levels was highly significant (p < 0.001)."
- "Highly significant" is not appropriate here. The p-value only tells you that it is very unlikely that the drop in cholesterol levels was due to chance; it does NOT tell you the magnitude of the difference.
24. Example
- A study demonstrated a correlation between alcohol consumption and elevated blood pressure. The amount of alcohol consumption was compared with blood pressure readings. A difference in average BP was found between patients who drank small amounts of alcohol and those with high intake. Those who drank more had higher blood pressure. This finding was reported as statistically significant with a p-value of 0.0001.
- What does this value mean to us?
- It means that chance is an unlikely explanation for the results. The extremely small p-value is due to the size of the sample (2100).
- Does this statistic alone mean that alcohol consumption is a cause of hypertension?
25. Methadone Example
- Buprenorphine treatment was significantly better than methadone 20 mg/day (p < 0.001), and methadone 60 mg/day was better than methadone 20 mg/day (p < 0.04).
- Would you conclude that the difference between buprenorphine and methadone 20 mg/day was more highly significant than the difference between the two methadone treatments?
26. Statistical Testing
- P values do not describe the size of differences.
- Statistical significance does not imply clinical
significance - Consider both statistical significance and
clinical significance when evaluating the true
usefulness of the study results and drug.
27. Questions to Answer Before a Statistical Test Is Chosen
- 1. What is the main study hypothesis? Is it one-tailed or two-tailed?
- 2. Are the data independent or paired/matched?
- 3. What type of data are being measured?
- Scale (level of measurement)
28. One-Tailed vs. Two-Tailed Tests
[Figure: two-tailed normal distribution with 0.025 in each tail (alpha = 0.05)]
29. One-Tailed vs. Two-Tailed Tests
- The use of a 1-tailed test is appropriate only when the authors have clearly stated a 1-tailed hypothesis direction.
- Example: A study of the efficacy of zinc lozenges in treating the common cold stated, "We determined the duration of cold symptoms in zinc- and placebo-treated patients. A 2-sided p value of 0.05 was used in the hypothesis testing."
30. Parametric Tests
- 3 types of research questions asked:
- 1. Are there differences between or among groups?
- 2. What are the associations among groups?
- 3. What conclusions and predictions can be made as a result of the data collected?
- Statistical tests are used to answer these questions (parametric and non-parametric)
31. Parametric Tests
- To use them, the data are assumed to follow a normal (or near-normal) distribution
- Used for continuous level data measurements (ratio or interval data)
- More powerful than non-parametric tests; they are better able to detect existing differences between or among groups
32. Non-Parametric Tests
- Only used when parametric tests can't be used.
- Used for non-continuous data (nominal or ordinal level data)
- Used when data are not normally distributed or deviate from a normal distribution substantially
33. Parametric Tests: t-test (Student's t-test)
- Sample data are normally (or near normally) distributed.
- The variances of the sample populations are nearly equal.
- The measurements within the sample population are independent of each other.
- Data from only 2 groups are being compared.
34. Types of t-tests
- Non-paired (Student's) t-test - used when different control and experimental groups are used.
- Paired t-test - used when measurements are taken in the same group of patients (before and after).
- For either type: the alpha level is predetermined, a number called a t-value is calculated, and a table of t-values is used to determine the p-value.
35. t-tests
- One-tailed t-test: used if a direction of difference is postulated
- Two-tailed t-test: used if no direction of difference is postulated
36. t-test Example
- A study was performed to determine if pravastatin is more effective than placebo in the lowering of triglycerides. 50 patients with moderately elevated triglycerides were randomly selected to receive pravastatin and 50 patients were given placebo. A t-test was performed using the mean triglyceride level in both groups at the end of 12 weeks. The difference in triglyceride levels from baseline was statistically significant (p < 0.05).
37. Example
- A study to see the effect of amiodarone on serum digoxin concentrations was done. 30 patients with heart failure who were taking digoxin and had an indication for amiodarone were enrolled if their serum digoxin concentration was not more than 1.5 ng/ml. Each subject had a serum digoxin concentration measured before starting amiodarone and after 3 months.
- What type of t-test?
- Paired or non-paired?
- One-tailed or two-tailed?
38. Calculating the t-value
- Suppose a new drug is being tested to see if it
will decrease arterial pressure in people with
hypertension. - 2 sample groups, alpha level pre-set, 1-tail.
- Data collected, descriptive statistics applied
- t-value computed from equations
- Table of critical t-values consulted
39. t-test Calculation

t = (X̄T - X̄C) / sqrt(varT/nT + varC/nC)

where X̄ = mean, T = treatment, C = control, var = variance, n = sample number
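The formula above can be checked with a short pure-Python sketch. The two groups below are hypothetical arterial-pressure changes, not study data:

```python
from math import sqrt

def t_value(treatment, control):
    """Unpaired t statistic:
    t = (mean_T - mean_C) / sqrt(var_T/n_T + var_C/n_C)."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):  # sample variance (n - 1 denominator)
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    n_t, n_c = len(treatment), len(control)
    return (mean(treatment) - mean(control)) / sqrt(
        var(treatment) / n_t + var(control) / n_c)

# Hypothetical change in pressure (mmHg) for two groups of 5 patients:
drug = [1, 2, 3, 4, 5]      # mean 3, variance 2.5
placebo = [2, 3, 4, 5, 6]   # mean 4, variance 2.5
print(t_value(drug, placebo))  # -1.0
```

The resulting t-value would then be compared, at the chosen alpha level and degrees of freedom, against a table of critical t-values to obtain the p-value, as the slides describe.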
40. Multiple t-tests
- Multiple t-tests are NOT appropriate when comparing more than 2 groups.
- The probability of making a Type I error increases as the number of tests performed increases.
41. Analysis of Variance (ANOVA)
- Used when comparing differences between 3 or more groups.
- More powerful than using multiple t-tests.
- Type I error (alpha level) stays constant regardless of the number of groups in the design.
- Examines the variability both between and within the study groups.
42. Analysis of Variance (ANOVA)
- Assumptions to use ANOVA:
- Data are continuous level (interval or ratio) and near normally distributed.
- The variances of the populations from which the samples were drawn are nearly equal.
- The observations or measurements within a population or sample are independent of each other.
43. F-Ratio
- ANOVA involves calculation of an F-ratio, which is compared with a critical F-ratio in a table.
- F-ratio = between-groups variance / within-groups variance
- Used to answer the question: Is the variability between groups large enough, in comparison to the variability within groups, to say that the groups differ?
44. F-Ratio
- If the between-groups variance and the within-groups variance are about equal, then the value of the F-ratio will be small and the null hypothesis will be assumed to be true.
- The larger the F-ratio (the between-groups effect is greater than any effect within groups), the more likely it is that real differences in drug treatments will be found.
- F-ratios are reported as p < 0.05, etc.
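The between/within comparison can be made concrete with a small one-way ANOVA sketch in pure Python (the three groups below are hypothetical values, not study data):

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio: between-groups mean square divided by
    within-groups mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)

    # Between-groups sum of squares: group sizes times squared
    # deviations of group means from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ms_between = ss_between / (k - 1)

    # Within-groups sum of squares: squared deviations of each value
    # from its own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    ms_within = ss_within / (len(all_values) - k)

    return ms_between / ms_within

# Three hypothetical treatment groups:
print(f_ratio([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # 3.0
```

As the slide says, the resulting F would then be compared against a critical F-ratio from a table for the given degrees of freedom.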
45. Analysis of Variance
- One-way ANOVA: an independent variable in an ANOVA is called a factor. Each factor may have several levels, such as a treatment factor with several doses. A study of ONE factor is called a one-way ANOVA.
- A study of two different doses of a drug at 3 different times has two factors (drug/time). Use a two-way ANOVA.
46. Analysis of Variance
- Repeated-measures ANOVA: used when measurements are repeated in the study groups over time.
- Example: The antihypertensive effect of a new drug (Drug X) is being compared to clonidine. BP measurements for both drug groups are taken after 1 week, 2 weeks, and 1 month.
47. ANOVA Example
- A study looked at the efficacy of desipramine, amitriptyline, and placebo in pain relief for peripheral neuropathy patients. The investigators also studied whether depressed patients responded any differently compared to non-depressed patients using desipramine and amitriptyline.
- What type of ANOVA would be indicated?
48. Analysis of Variance
- After ANOVA, the researcher states either that no difference exists among groups or that a difference exists somewhere, but this test does not indicate which of the groups differs from the others. Other tests are then applied to find the specific group responsible for the differences (Least Significant Difference (LSD) test, Newman-Keuls test, Dunnett test, and the Tukey, Dunn, Scheffé, and Bonferroni procedures).
49. Other Tests: Multiple Comparison Tests
- Dunnett's test - used only when comparing several treatment group means with one control group mean.
- Tukey, Scheffé - used when a large number of comparisons are made.
- Bonferroni - a correction to the use of multiple t-tests that takes into account the number of comparisons being made.
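The Bonferroni idea is simple enough to show in two lines: divide the overall alpha by the number of comparisons, so the family-wise Type I error stays near the intended level. The numbers reuse the 25-test scenario from the probability slides:

```python
# Bonferroni correction: each individual test is run at alpha / m,
# where m is the number of comparisons being made.
alpha = 0.05
n_comparisons = 25
adjusted_alpha = alpha / n_comparisons

# Each individual test must now reach p < 0.002 to be declared
# significant, instead of p < 0.05.
print(adjusted_alpha)
```

The trade-off is the one the slides describe for error control in general: lowering the per-test Type I error risk raises the Type II error risk.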
50. Non-Parametric Statistical Tests
- Used when:
- the data distribution diverges from expected normal characteristics
- the scale level of measurement of the data is not continuous (i.e., nominal or ordinal)
- the data are continuous but do not meet the criteria for parametric tests
- Non-parametric tests are less powerful than parametric tests.
51. Nominal Data Tests
- Chi-square test (χ²): used for frequencies, proportions, and distributions.
- A matrix is made with rows and columns forming cells. Observed values are compared with expected values. The greater the difference between treatments, the larger χ² will be. From a statistical table it is determined whether the value is large enough for statistical significance. Reported as p < 0.05.
52. Chi-Square 2 x 2 Table

Outcome    Cimetidine   Ranitidine   Total
GI bleed   22 (18.5)    15 (18.5)    37
No bleed   18 (21.5)    25 (21.5)    43
Total      40           40           80

( ) = expected frequencies if there were no differences between groups.
E11 = (row 1 total x column 1 total) / N (total in all cells)
E11 = (37 x 40) / 80 = 18.5
53. Calculating χ²
- 1. Subtract each expected number from each observed number in each cell: (O - E)
- 2. Square the difference: (O - E)²
- 3. Divide the square obtained for each cell of the table by the expected number for that cell: (O - E)² / E
- 4. χ² = the sum of these numbers over all cells
- 5. Use χ² and df to get the p-value from a table.
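These steps translate almost line for line into code. A sketch applied to the cimetidine/ranitidine table from the earlier slide:

```python
def chi_square(table):
    """Chi-square statistic for an r x c contingency table, following
    the steps above: expected = (row total x column total) / N, then
    sum (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# GI-bleed data: rows = (bleed, no bleed), columns = (cimetidine, ranitidine)
table = [[22, 15],
         [18, 25]]
print(round(chi_square(table), 2))  # 2.46
```

With 1 degree of freedom, 2.46 falls below the 3.841 critical value for alpha = 0.05 in the table on the next slide, so the difference between the two drugs would not be declared significant.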
54. Chi-Square Table

Distribution of Probability
d.f.   0.50    0.10    0.05    0.02    0.01    0.001
1      0.455   2.706   3.841   5.412   6.635   10.827
2      1.386   4.605   5.991   7.824   9.210   13.815
3      2.366   6.251   7.815   9.837   11.345  16.268
4      3.357   7.779   9.488   11.668  13.277  18.465
5      4.351   9.236   11.070  13.388  15.086  20.517

degrees of freedom = (columns - 1) x (rows - 1)
55. Chi-Square Example
Example: A study was done to determine the efficacy of cimetidine, ranitidine, and famotidine for preventing GI bleeding in critically ill patients. Each drug was given to 40 pts. The numbers of pts experiencing a GI bleed were 22, 15, and 16 respectively.

Outcome    Cimetidine   Ranitidine   Famotidine
GI bleed   22           15           16
No bleed   18           25           24

How many total cells would this table contain?
56. Chi-Square Test (χ²) for Nominal Data
- Assumptions for use of the chi-square test:
- The total number of observations must be greater than 20.
- No more than 20% of cells have an expected frequency of less than 5.
- For a 2 x 2 table, no cell has an expected frequency < 5.
- For a 2 x 3 table, 1 cell can have an expected frequency < 5.
- Samples must be independent of each other.
57. Yates Correction Factor
- Applied to 2 x 2 chi-square tables with relatively small sample sizes (< 40).
- Also called the continuity correction.
- Advantage: reduces the risk of a Type I error. But when the Type I error risk is reduced, the Type II error risk is increased; if the Type II error risk increases, the power of the test decreases.
58. Non-Parametric Tests for Nominal Data
- Fisher's Exact Test
- Only used for 2 x 2 tables.
- Used in place of chi-square when the total number of observations is < 20 but each cell still has an expected frequency of not less than 5, OR the total number of observations is > 20 but one of the cells has an expected frequency of less than 5.
59. Fisher's Exact Test Example
Example: The efficacy of heparin and low-dose warfarin for the prevention of deep vein thrombosis (DVT) was determined. 20 patients received heparin and 25 received warfarin. DVT occurred in 3 patients in the heparin group and 5 patients in the warfarin group.

Outcome   Heparin   Warfarin   Total
DVT       3 (3.6)   5 (4.4)    8
No DVT    17        20         37
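Fisher's exact test computes an exact p-value from the hypergeometric distribution rather than a chi-square approximation. A sketch for 2 x 2 tables using the common "sum of small probabilities" two-sided definition (other software may define the two-sided p slightly differently), applied to the DVT data above:

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins whose
    probability under the hypergeometric distribution is no larger
    than that of the observed table."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x):  # P(top-left cell = x) with the margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (table_prob(x) for x in range(lo, hi + 1))
               if p <= p_observed + 1e-12)

# DVT data from the slide: 3/20 on heparin, 5/25 on warfarin
p = fisher_exact(3, 5, 17, 20)
print(p)  # well above 0.05 -- no significant difference
```

Observed counts (3 and 5) sit very close to the expected frequencies (3.6 and 4.4), so the exact p-value is large and the null hypothesis is not rejected.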
60. Mantel-Haenszel Procedure
- A correction test used when you want to adjust for extraneous independent variables that could influence the outcome of a test.
- When a Fisher exact test hasn't found statistical significance, sometimes combining it with the Mantel-Haenszel procedure can yield statistical significance.
61. Non-Parametric Tests for Nominal Data: McNemar's Test
- McNemar's Test
- Level of data is nominal.
- Used when study samples are not independent.
- Can also be applied to prospective cohort studies or case-control studies in which each member is paired with a control.
62. McNemar Test Example
The efficacy of two anti-nausea drugs was determined in 20 patients receiving chemotherapy every month. During the first month of chemotherapy, the patients received Drug 1 and the presence or absence of nausea was determined. The next month, the patients received Drug 2 and the presence or absence of nausea was again assessed.
- The data are nominal (presence or absence).
- The same group of patients were evaluated (paired sample - not independent).
63. Non-Parametric Tests for Ordinal Data
- Mann-Whitney U test (single comparison) - comparison between only 2 independent groups (Drug A group vs. Drug B group).
- Similar to the non-paired t-test in parametric tests.
- Used for ordinal data, or for continuous data only when it is not normally distributed or when assumptions for the t-test are violated.
- Can identify where significant differences exist between pairs.
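The U statistic itself is simple: over all pairs of one observation from each group, count how often the first group's value wins (ties count half). A sketch with hypothetical ordinal pain scores (in practice the smaller of the two U values is then compared against a table of critical values):

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for two independent samples.

    U_A counts the (a, b) pairs where a > b, with ties counting 0.5;
    U_B is the complement. Returns the smaller of the two."""
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)

# Hypothetical ordinal pain scores (0-10) for two drug groups:
drug_a = [2, 3, 3, 5]
drug_b = [6, 7, 8, 9]
print(mann_whitney_u(drug_a, drug_b))  # 0.0 -- complete separation
```

Because the test uses only the ordering of the values, it needs no normality assumption, which is why it serves as the non-parametric counterpart to the non-paired t-test.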
64. Non-Parametric Tests for Ordinal Data
- Wilcoxon signed-rank test
- Used for ordinal data, or interval level data when other t-test assumptions aren't met.
- Used when the 2 sample groups are NOT independent (when the same patients receive the different treatments and serve as their own controls).
- Counterpart to the paired t-test.
65. Non-Parametric Tests for Ordinal Data (Multiple Comparison Test)
- Kruskal-Wallis test (counterpart to one-way ANOVA). Used for:
- comparison of 3 or more samples
- ordinal level data, or continuous data when there might be violations of the ANOVA assumptions
- analysis of one independent variable
- Does not tell where the significant difference is.
- Use the Mann-Whitney U test to locate differences.
66. Non-Parametric Tests for Ordinal Data (Multiple Comparison Tests)
- Friedman test - for comparison among 3 or more related groups.
- Counterpart to the repeated-measures ANOVA.
- Used for ordinal level data, or continuous level data when other assumptions for ANOVA are violated.
67. Correlation Analysis
- The statistical method used to describe the strength and direction of the association of 2 or more variables.
- Example: Is there a relationship between cigarette smoking and atherosclerotic heart disease?
[Scatter plot: AHD vs. cigarettes/day]
68. If Two Variables Are Correlated, Are They Causally Related?
- Do not confuse correlation with causation.
- Correlation shows that the two variables are associated. There may be a third variable, or a confounding variable, that is related to both of them.
- Example: Monthly deaths by drowning and monthly sales of ice cream are positively correlated, but does one cause the other?
69. Correlation Between 2 Variables
- In an efficacy study, you may want to know the relationship between drug dosage and the degree of response.
- If the 2 variables are correlated, the values for one variable will vary depending upon the values of the other.
- The number calculated from the analysis is called the correlation coefficient, r.
70. Correlation Coefficient r
- r only ranges in value from -1 to +1.
- If r = 0, there is NO linear association between the variables (data points scattered).
- As r approaches +1, the association becomes stronger (more linear) in a positive direction: as one variable increases, the other variable increases also.
- As r approaches -1, the association becomes stronger in a negative direction.
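The coefficient is just the covariance of the two variables scaled by their spreads. A pure-Python sketch with hypothetical dose/response data (a later slide's point about r², the shared-variability estimate, follows directly from squaring the result):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient:
    r = cov(x, y) / (sd(x) * sd(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfectly linear positive relationship gives r of +1:
doses = [10, 20, 30, 40]        # hypothetical doses
responses = [5, 10, 15, 20]     # hypothetical responses
r = pearson_r(doses, responses)
print(r)          # ~1.0
print(r ** 2)     # r squared: fraction of shared variability
```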
71. Correlation Coefficient r
[Scatter plots illustrating r = 0.86, r = 0.1, and r = -0.92]
72. Correlation Analysis Tests
- Pearson (Pearson product-moment) r
- Used to describe the strength and direction of the relationship between 2 variables that:
- are continuous level data (interval/ratio)
- follow a normal distribution pattern
- have a linear relationship
73. Correlation Analysis Tests
- Spearman (Spearman rank-order) r
- Used to describe the strength and direction of the relationship between 2 variables:
- in which at least 1 variable is ordinal level data
- or which are continuous but do not follow normal distribution patterns
74. Correlation Coefficient r
- An r from 0 to +/- 0.25 indicates no or only a slight linear relationship between the variables.
- An r from +/- 0.25 to +/- 0.75 indicates a linear relationship that is weak to fairly strong.
- An r from +/- 0.75 to +/- 1 indicates a strong to very strong linear relationship between the variables.
75. Correlation Coefficient r
- For a given r value, you can predict how much of the variability in one measurement can be accounted for by the presence of the other.
- If you square r (r²), the resulting number is used as a percentage estimate of this variability.
76. Example
- The correlation of the blood pressure-lowering effect of atenolol with patients' renal function status was reported as r = 0.4.
- What percent of the variability in the patients' BP response to atenolol can be explained by differences in renal function?
- r² = 0.4² = 0.16; 16% can be explained by differences in renal function.
77. Correlation Coefficient (r)
- There is no direct proportional relationship between different r values (0.4 is not half the strength of 0.8).
- Outlying data points will affect the strength of the linear relationship and will lower or raise the r value, depending on where the outliers are.
78. Statistical Significance of Correlation
- p-values will be reported along with correlation coefficients so you can determine if the r value is statistically significant. But remember:
- The larger the number of data points in a correlation analysis, the more likely it is that even a small r value will be statistically significant.
- A statistically significant r value doesn't mean clinical significance.
79. Correlation Example
- 1. A study examines the relationship between a patient's height and the blood pressure effect of clonidine. 500 patients were included and a correlation coefficient of r = 0.17 was determined (p = 0.026). Can you conclude a substantial relationship exists between height and clonidine's BP effect?
- 2. How about with an r = 0.85 and a p = 0.063?
- 3. Would an r = 0.85 represent a stronger degree of correlation between 2 variables than an r = -0.85?
80. Example Problem
- You collect data on the ages and weights of 100 persons (both children and adults) who were attending a picnic. If you calculate the correlation coefficient between the ages and the weights of these 100 persons, which of the following would most likely be the correlation you would find?
- A. -0.90  B. 1.00  C. 0.02  D. 0.50
81. Regression
- Linear regression (simple and multivariate)
- Association between one variable as a function of the other variable
- Pearson product-moment test: normal distribution, continuous data
- Spearman rank test: ordinal data
- Logistic regression
- Stepwise multiple regression that involves manipulating multiple variables simultaneously to determine which best predicts the outcome.
- Uses nominal data and is used to compute odds ratios.
82. Regression: When Two Variables Are Related
- Correlation describes the strength of association between two variables and is completely symmetrical.
- That two variables are related means that when one changes by a certain amount, the other changes by a certain amount also.
- The description of how they are related is called the regression line, and it is based on the mathematical equation of a straight line:
- y = a + bx (remember y = 2x + b)
83. Regression Line
[Figure: scatter plot with fitted regression line]
84. Multiple Regression Analysis (Multivariate Analysis)
- Testing multiple variables
- Testing whether confounding variables influence outcomes
- Example: the relationship between age and blood pressure, with sodium intake as a possible confounding variable.
85. Analysis of Covariance (ANCOVA)
- A combination of ANOVA (discrete independent variables) and regression (quantitative independent variables).
- The discrete variable is the treatment or class variable.
- The quantitative independent variable is the covariate (a continuous independent variable).
- This test will show the interactions and influences of the covariate on the class variable.
86. Survival Analysis
- A collection of statistical procedures for the analysis of data in which the variable of interest is time until an event occurs.
- Often seen as a hazard ratio: when the event of interest is death, the hazard associated with a particular moment in time is the probability of death at that moment, given survival until that moment.
87. Survival Analysis (continued)
- Kaplan-Meier curves show the probability of being alive at any specified future time.
- Kaplan-Meier curves for 2 or more groups can be evaluated to determine whether the curves are statistically different by using the log-rank test (which is a large-sample chi-square test).
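The Kaplan-Meier (product-limit) estimate behind those curves multiplies, at each event time, the fraction of at-risk subjects who survive that time; censored subjects count as at risk until they drop out. A minimal sketch with four hypothetical subjects:

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  - follow-up time for each subject
    events - True if the subject had the event (death), False if censored
    Returns (time, survival probability) steps: at each event time,
    S is multiplied by (1 - deaths / number still at risk)."""
    s = 1.0
    curve = []
    for t in sorted(set(t for t, e in zip(times, events) if e)):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        s *= 1 - deaths / at_risk
        curve.append((t, s))
    return curve

# Four hypothetical subjects: deaths at months 1, 1 and 2,
# one subject censored (still alive) at month 3.
print(kaplan_meier([1, 1, 2, 3], [True, True, True, False]))
# [(1, 0.5), (2, 0.25)]
```

This is only the single-group estimator; comparing two such curves is what the log-rank test mentioned above is for.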
88. Survival Analysis (continued)
- Cox proportional hazards models
- A semi-parametric method that provides an estimate of a hazard function. The ratio of the hazards for 2 individuals, one with and one without a risk factor, given equal covariates, can be estimated.