Title: Medical Statistics: Hypothesis Testing
1Medical Statistics Hypothesis Testing
Nimrod Lavi, MD Adhir Shroff, MD, MPH
2Agenda
- Types of variables
- Descriptive statistics
- What is a hypothesis
- Definition of a p-value
- Sample vs. universe
- Comparative statistics
- T-tests
- Chi-square
4Continuous variable
- One in which research participants differ in degree or amount.
- "Susceptible to infinite gradations" (p. 176, Pedhazur & Schmelkin, 1991)
- Examples: height, weight, age
5Categorical variable
- Participants belong to, or are assigned to, mutually exclusive groups
- Nominal
- Used to group subjects
- Numbers are arbitrary
- Examples: sex, race, dead/alive, marital status
- Ordinal (rank)
- Given a numerical value in accordance with their rank on the variable
- Numerical values assigned to participants tell nothing of the distance between them
- Examples: class rank, finishers in a race
6Independent vs Dependent Variable
- Independent
- predictor variable
- Usually on the x axis
- Dependent
- outcome variable
- Usually on the y axis
- The independent variable (a treatment) leads to the dependent variable (outcome)
- Ultimately, we are interested in differences between dependent variables
[Figure: graph with the dependent variable on the y axis and the independent variable on the x axis]
8Descriptive Statistics
- These are measures or variables that summarize a data set
- 2 main questions
- Index of central tendency (e.g. mean)
- Index of dispersion (e.g. std deviation)
9Descriptive Statistics
- Data set for ICD complications in 2005
- 14 patients
- Sex: F, F, M, M, F, F, F, M, F, M, M, F, F, F
- Make: G, S, G, G, G, M, S, S, G, G, M, S
- Central tendency is summarized by proportion or frequency
- Sex
- M: 5/14 = .36 or 36%
- F: 9/14 = .64 or 64%
- Make
- G: 6/12 = .5 or 50%
- S: 4/12 = .33 or 33%
- M: 2/12 = .17 or 17%
- Dispersion is not really used in categorical data
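The frequencies and proportions on this slide can be reproduced with a short tally. This is only a sketch: the lists are transcribed from the slide, and the helper name `proportions` is made up for illustration.

```python
from collections import Counter

# Data transcribed from the slide (14 sex values, 12 device makes).
sex = ["F", "F", "M", "M", "F", "F", "F", "M", "F", "M", "M", "F", "F", "F"]
make = ["G", "S", "G", "G", "G", "M", "S", "S", "G", "G", "M", "S"]


def proportions(values):
    """Return {category: (count, proportion)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}


sex_summary = proportions(sex)
make_summary = proportions(make)

for name, summary in (("Sex", sex_summary), ("Make", make_summary)):
    for category, (count, share) in sorted(summary.items()):
        print(f"{name} {category}: {count} ({share:.0%})")
```

Each proportion uses the number of recorded values for that variable as its denominator, which is why Sex is out of 14 and Make is out of 12.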
10Descriptive Statistics
- Data set: SBP among a group of CHF pts in VA clinic
- 13 patients
- 100, 95, 98, 172, 74, 103, 97, 106, 100, 110, 118, 91, 108
- Central Tendency
- Mean
- mathematical average of all the values
- $\bar{x} = (x_1 + x_2 + \dots + x_n)/n$
- Median
- value that occupies the middle rank, when values are ordered from least to greatest
- Mode
- Most commonly observed value(s)
11Descriptive Statistics
- Data set: SBP among a group of CHF pts in VA clinic
- 13 patients
- 100, 95, 98, 172, 74, 103, 97, 106, 100, 110, 118, 91, 108
- Central Tendency
- Mean
- mathematical average of all the values
- $\bar{x} = (x_1 + x_2 + \dots + x_n)/n$
- $(100 + 95 + 98 + 172 + 74 + 103 + 97 + 106 + 100 + 110 + 118 + 91 + 108)/13 = 105.5$
12Descriptive Statistics
- Data set: SBP among a group of CHF pts in VA clinic
- 13 patients
- 100, 95, 98, 172, 74, 103, 97, 106, 100, 110, 118, 91, 108
- Central Tendency
- Median
- value that occupies the middle rank, when values are ordered from least to greatest
- 74, 91, 95, 97, 98, 100, 100, 103, 106, 108, 110, 118, 172
- The middle (7th of 13) value is 100
- Useful if data are skewed or there are outliers
13Descriptive Statistics
- Data set: SBP among a group of CHF pts in VA clinic
- 100, 95, 98, 172, 74, 103, 97, 106, 100, 110, 118, 91, 108
- Index of dispersion
- Standard deviation
- measure of spread around the mean
- Calculated by measuring the distance of each value from the mean, squaring these results (to account for negative values), adding them up, dividing by n - 1, and taking the square root
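The central tendency and dispersion measures for the SBP data can be checked with Python's standard `statistics` module. This is a minimal sketch; it assumes the sample standard deviation (sum of squared deviations divided by n - 1), which is what `statistics.stdev` computes.

```python
import statistics
from math import sqrt

# SBP values transcribed from the slides (13 CHF patients).
sbp = [100, 95, 98, 172, 74, 103, 97, 106, 100, 110, 118, 91, 108]

mean_sbp = statistics.mean(sbp)      # mathematical average
median_sbp = statistics.median(sbp)  # middle rank after sorting
mode_sbp = statistics.mode(sbp)      # most commonly observed value
sd_sbp = statistics.stdev(sbp)       # sample SD (divides by n - 1)

# The same SD, step by step: distance from the mean, squared, summed,
# divided by n - 1, then the square root.
squared_distances = [(x - mean_sbp) ** 2 for x in sbp]
sd_by_hand = sqrt(sum(squared_distances) / (len(sbp) - 1))

print(f"mean={mean_sbp:.1f} median={median_sbp} mode={mode_sbp} sd={sd_sbp:.1f}")
```

The outlier of 172 pulls the mean (105.5) above the median (100), which is the situation where the median is the more useful summary.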
14Descriptive Statistics Normal
15Descriptive Statistics Confidence Intervals
- Range of values that we can be confident includes the true value
- Defines the inner zone about the central index (mean, proportion or ratio)
- Describes variability in the sample from the mean or center
- CIs are also used to describe the difference between means or proportions when comparing groups
Altman DG. Practical Statistics for Medical
Research 1999
16Descriptive Statistics Confidence Intervals
- For example, a 95% CI indicates that we are 95% confident that the population mean will fall within the range described
- Can be used similarly to a p-value to determine significant differences
- CI is similar to a measure of spread, like SD
- As sample size increases or variability in the measurement decreases, the CI becomes narrower
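The effect of sample size and variability on CI width can be sketched as follows. The helper `mean_ci` is hypothetical, and it assumes a normal approximation (z of about 1.96 for 95%); for small samples a t-based interval would be somewhat wider. The mean and SD are illustrative values close to the SBP example, not results from the slides.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Approximate CI for a mean using the normal (z) critical value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Illustrative inputs (assumed): mean 105.5, SD 22.6.
widths = {}
for n in (13, 52, 208):
    low, high = mean_ci(105.5, 22.6, n)
    widths[n] = high - low
    print(f"n={n:>3}: 95% CI {low:.1f} to {high:.1f} (width {widths[n]:.1f})")

# Halving the SD at a fixed n also halves the width.
low_sd, high_sd = mean_ci(105.5, 11.3, 13)
narrow_width = high_sd - low_sd
```

Quadrupling the sample size halves the width, because the standard error shrinks with the square root of n.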
17Descriptive Statistics Confidence Intervals
Lancet 1999;354:708-15
- Prospective, randomized, multicenter trial of different management strategies for ACS
- 2500 pts enrolled in Europe with 6-month follow-up
- Primary endpoint: composite of death and myocardial infarction after 6 months
18Descriptive Statistics Confidence Intervals
Lancet 1999;354:708-15
19Descriptive Statistics Confidence Intervals
Lancet 1999;354:708-15
$\text{Risk ratio} = \text{Risk}_{\text{invasive}} / \text{Risk}_{\text{noninvasive}}$
When the CI crosses 1 (or whatever value designates equivalence), the p-value will not be significant.
20Descriptive Statistics Confidence Intervals
Lancet 1999;354:708-15
- Review
- Calculate
- RRR, ARR, NNT
ARR = 12.1% - 9.4% = 2.7%
NNT = 100 / ARR = 100 / 2.7 ≈ 37
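The three quantities named on this slide can be worked through in a few lines. This sketch assumes that 12.1% is the event rate in the noninvasive arm and 9.4% the rate in the invasive arm; the variable names are illustrative.

```python
# Event rates for the composite endpoint (assumed arm assignment, see above).
risk_noninvasive = 0.121
risk_invasive = 0.094

risk_ratio = risk_invasive / risk_noninvasive  # RR
arr = risk_noninvasive - risk_invasive         # absolute risk reduction
rrr = arr / risk_noninvasive                   # relative risk reduction = 1 - RR
nnt = 1 / arr                                  # number needed to treat

print(f"RR  = {risk_ratio:.2f}")
print(f"ARR = {arr:.1%}")
print(f"RRR = {rrr:.1%}")
print(f"NNT = {nnt:.0f}")
```

The slide rounds the NNT to the nearest whole patient; some authors round up to the next whole number instead.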
22Hypothesis
- Statement about a population, where a certain parameter takes a particular numerical value or falls in a certain range of values.
- Examples
- A director of an HMO hypothesizes that length of stay (LOS) after AMI is longer than for CHF exacerbation
- An investigator states that a new therapy is 10% better than the current therapy
- Bivalirudin is non-inferior to heparin/eptifibatide for coronary PCI
23Null Hypothesis (Ho)
- Innocent until proven guilty
- Null hypothesis (Ho) usually states that no difference between test groups really exists
- A fundamental concept in research is that of either rejecting or conceding the Ho
- State the Ho
- A director of an HMO hypothesizes that length of stay (LOS) after AMI is longer than for CHF exacerbation
- An investigator states that a new therapy is 10% better than the current therapy
- Bivalirudin is non-inferior to heparin/eptifibatide for PCI
24Null Hypothesis (Ho) Courtroom Analogy
- The null hypothesis is that the defendant is innocent.
- The alternative is that the defendant is guilty.
- If the jury acquits the defendant, this does not mean that it accepts the defendant's claim of innocence.
- It merely means that innocence is plausible because guilt has not been established beyond a reasonable doubt.
Graduate Workshop in Statistics Session 4.
Hamidieh K. 2006 Univ of Michigan
26Extrapolation of Research Findings
- Sample population vs. the world
- If your study shows that treatment A is better than treatment B
- You cannot conclude that treatment A is ALWAYS better than treatment B
- You only sampled a small portion of the entire population, so there is always a chance that your observation was a chance event
27Extrapolation of Research Findings
- At what point are we comfortable concluding that there is a difference between the groups in our sample?
- In other words, what is the false-positive rate that we are willing to accept?
- What is this called in statistical terms?
29Definition of p-value
- With any research study, there is a possibility that the observed differences were a chance event
- To know with certainty that a difference is really present, the entire population would need to be studied
- The research community and statisticians had to pick a level of uncertainty with which they could live
30Definition of p-value
- This level of uncertainty is called type 1 error
or a false-positive rate
31Two Types of Errors
[Table: the two types of errors, by whether Trt has no effect or Trt has an effect]
Graduate Workshop in Statistics Session 4.
Hamidieh K. 2006 Univ of Michigan
32Definition of p-value
- This level of uncertainty is called type 1 error or a false-positive rate (α)
- Results are more commonly reported as a p-value, which is compared with α
- Statistical significance will be recognized if p < 0.05 (the threshold can be set lower if one wishes)
33Trade-Off in Probability for Two Errors
There is an inverse relationship between the probabilities of the two types of errors.
Increase in probability of a type I error → decrease in probability of a type II error
[Figure: panels labeled .05 and .01]
Graduate Workshop in Statistics Session 4.
Hamidieh K. 2006 Univ of Michigan
34Definition of p-value
- This level of uncertainty is called type 1 error or a false-positive rate (α)
- Results are more commonly reported as a p-value, which is compared with α
- In general, p < 0.05 is the agreed-upon level
- In other words, if there were truly no difference, the probability of seeing a difference at least as large as the one observed in our sample by chance is less than 5%
- Therefore we can reject the Ho
35Definition of p-value
Stating the Conclusions of our Results
- When the p-value is small, we reject the null hypothesis or, equivalently, we accept the alternative hypothesis.
- Small is defined as a p-value ≤ α, where α = acceptable false (+) rate (usually 0.05).
- When the p-value is not small, we conclude that we cannot reject the null hypothesis or, equivalently, there is not enough evidence to reject the null hypothesis.
- Not small is defined as a p-value > α, where α = acceptable false (+) rate (usually 0.05).
Graduate Workshop in Statistics Session 4.
Hamidieh K. 2006 Univ of Michigan
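The link between α and the false-positive rate can be illustrated with a small simulation. This is a sketch under stated assumptions: both groups are drawn from the same population (so Ho is true by construction), the test is a two-sided z-test with a known SD of 15, and the names `two_sided_p`, `decide` and `false_positive_rate` are invented for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

ALPHA = 0.05  # acceptable false-positive rate


def two_sided_p(sample_a, sample_b, sigma):
    """Two-sided z-test p-value for a difference in means (sigma known)."""
    se = sigma * sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=ALPHA):
    """Apply the decision rule: small p-value means reject Ho."""
    return "reject Ho" if p_value <= alpha else "cannot reject Ho"


def false_positive_rate(n_trials=2000, n_per_group=30, seed=0):
    """Share of no-difference experiments that are declared significant."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        a = [rng.gauss(100, 15) for _ in range(n_per_group)]
        b = [rng.gauss(100, 15) for _ in range(n_per_group)]
        if decide(two_sided_p(a, b, sigma=15)) == "reject Ho":
            rejections += 1
    return rejections / n_trials


rate = false_positive_rate()
print(f"Share of null experiments declared significant: {rate:.3f}")
```

Because there is no real difference in any simulated experiment, every rejection is a type 1 error, and the share of rejections should land near α.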
38Two Sample Tests Continuous Variable
- t-test
- Comparing two groups, statistical significance is determined by
- Magnitude of the observed difference
- Bigger differences are more likely to be significant
- Spread, or variability, of the data
- Larger spread makes the difference less likely to be significant
39Two Sample Tests Continuous Variable
40Two Sample Tests Continuous Variable
- t-test
- Comparing two groups, statistical significance is determined by
- Magnitude of the observed difference
- Bigger differences are more likely to be significant
- Spread, or variability, of the data
- Larger spread makes the difference less likely to be significant
- Key is to compare the difference between groups with the variability within each group
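The comparison of between-group difference with within-group variability is what the t statistic measures. The sketch below uses the pooled-variance (Student) form; the four data sets are hypothetical and chosen so that both comparisons have the same 10-point difference in means but different spread.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical data: same difference in means (120 vs 110), different spread.
tight_a = [118, 120, 122, 119, 121]
tight_b = [108, 110, 112, 109, 111]
wide_a = [100, 120, 140, 110, 130]
wide_b = [90, 110, 130, 100, 120]

t_tight = student_t(tight_a, tight_b)
t_wide = student_t(wide_a, wide_b)
print(f"small spread: t = {t_tight:.1f}")
print(f"large spread: t = {t_wide:.1f}")
```

Turning t into a p-value requires the t distribution with n_a + n_b - 2 degrees of freedom, which a table or a statistical package supplies.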
41Two Sample Tests Continuous Variable
- Types of t-tests
- Student t-test or two sample t-test
- Used if the two samples are unpaired (independent)
- Example
- A randomized trial of high-dose statin versus placebo post AMI
- Paired t-test
- Used if the observations are paired
- Each person is measured twice under different conditions
- Similar individuals are paired prior to an experiment
- Each receives a different trt, same response is measured
- Example
- A study of ejection fraction in patients before and after Bi-V pacing
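For paired data, the test works on the within-person differences rather than on the two groups separately. A minimal sketch follows; the ejection fraction values are hypothetical and are not taken from any study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic: mean difference over its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical ejection fractions (%) for six patients, before and after pacing.
ef_before = [25, 30, 20, 35, 28, 22]
ef_after = [30, 33, 26, 38, 35, 25]

t_paired = paired_t(ef_before, ef_after)
degrees_of_freedom = len(ef_before) - 1
print(f"paired t = {t_paired:.2f} on {degrees_of_freedom} degrees of freedom")
```

Pairing removes the patient-to-patient variability from the comparison, so a consistent within-person change can be detected even when baseline values differ widely.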
42Two Sample Tests Continuous Variable
- t-test
- Tails
- Two-tailed
- Most commonly used in clinical research studies
- Means that the treatment group can be better or worse than the control group
- One-tailed
- Used only if the groups can only differ in one direction
43Example t-test
- What type of test should be run?
- How are the data related or are they?
- Data entered into a statistical program
- p-value = 0.2329, not significant
45Two Sample Tests Categorical Variables
- Chi square (χ²) analysis
- Data are organized into frequencies, which generate proportions
- Based on comparing the values expected under the null hypothesis with what is actually observed
- The greater the difference between the observed and expected, the more likely the result will be significant
46Chi square (χ²) analysis
[Table: 2x2 table of Therapy by Outcome with cells a, b, c, d and Totals]
- Null hypothesis states that the outcomes of therapy A and B are equally successful
- This is how the expected outcomes are determined
47Chi square (χ²) analysis
[Table: 2x2 table of Therapy by Outcome with cells a, b, c, d and Totals]
- Next, the actual observed values are recorded
- With this information the χ² value can be calculated and a p-value will be generated
48Example χ² analysis
- Arrange data into a 2x2 table
- Treatment groups along the vertical axis, outcomes along the horizontal axis
49Example χ² analysis
- Data entered into a statistical program
- p-value = 0.6392
- Not a significant difference
50Example Ear Infections and Xylitol
Experiment: n = 533 children randomized to 3 groups
Group 1: Placebo Gum
Group 2: Xylitol Gum
Group 3: Xylitol Lozenge
Response: Did child have an ear infection?
Row  Group    Infection  Count
1    placebo  Y          49
2    gum      N          150
3    lozenge  Y          39
4    placebo  N          129
5    gum      Y          29
6    lozenge  N          137
Graduate Workshop in Statistics Session 5.
Hamidieh K. 2006 Univ of Michigan
51Two Sample Tests Categorical Variables
[Table: Therapy by Outcome]
52Example Ear Infections and Xylitol
Compute expected count for each cell
Expected count = (Row total) × (Column total) / Total n
Example: 39.1 = (178 × 117) / 533
Or intuitively, calculate the overall infection rate = total number infected / total number = 117/533 = .2195
Now, assuming no difference between treatments, the infection rate will be the same in each group: .2195 × total for each group
.2195 × 178 = 39.1
Graduate Workshop in Statistics Session 5.
Hamidieh K. 2006 Univ of Michigan
53Example Ear Infections and Xylitol
From a table, p = 0.035
Graduate Workshop in Statistics Session 5.
Hamidieh K. 2006 Univ of Michigan
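The expected counts and the resulting p-value for the xylitol example can be reproduced from the six observed counts. This is a sketch, not the original analysis: it uses the fact that a 3x2 table has 2 degrees of freedom, for which the χ² upper-tail probability has the closed form exp(-χ²/2). That shortcut is valid only for 2 degrees of freedom; other table sizes need a χ² table or a statistical package.

```python
from math import exp

# Observed counts from the ear infection example (Y = infection, N = none).
observed = {
    "placebo": {"Y": 49, "N": 129},
    "gum": {"Y": 29, "N": 150},
    "lozenge": {"Y": 39, "N": 137},
}
outcomes = ("Y", "N")

row_totals = {group: sum(cells.values()) for group, cells in observed.items()}
col_totals = {o: sum(observed[g][o] for g in observed) for o in outcomes}
n = sum(row_totals.values())

# Expected count = (row total) x (column total) / total n
expected = {
    g: {o: row_totals[g] * col_totals[o] / n for o in outcomes} for g in observed
}

# Chi-square sums (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (observed[g][o] - expected[g][o]) ** 2 / expected[g][o]
    for g in observed
    for o in outcomes
)

df = (len(observed) - 1) * (len(outcomes) - 1)
p_value = exp(-chi_square / 2)  # closed form, valid for df == 2 only

print(f"expected placebo infections: {expected['placebo']['Y']:.1f}")
print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.3f}")
```

The largest contributions come from the placebo and gum groups, whose observed infection counts sit furthest from the roughly 39 expected in each group.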
54Conclusion
- There are many ways to describe one's data
- The significance level (α), against which p-values are compared, is the maximum acceptable false-positive rate
- Remember the Courtroom Analogy when it comes to the Null hypothesis
- Choice of statistical test depends on type of variable and number of comparison groups
55References
- Neely JG, et al.
- Laryngoscope, 112:1249-1255, 2002
- Laryngoscope, 113:1534-1540, 2003
- Laryngoscope, 113:1719-1724, 2003
- Guyatt G, et al. Basic Statistics for Clinicians. CMAJ. 1/1/95
- http//www-personal.umich.edu/khamidie/?MA
- Altman, DG. Practical Statistics for Medical Research. 1999.