Title: Revision
1Revision
- A hypothesis test is a formal way to determine whether
a difference/change is due to an underlying
difference/change in the population or due to
chance.
- Compare means from two independent
groups: independent t-test
- Compare whether the mean change (before vs after)
is equal to zero: paired t-test
2Hypothesis Test (1)
Paired samples t-test: compare the mean of the
differences from paired data (to zero).
$H_0: \mu_d = 0$ vs $H_1: \mu_d \neq 0$
Test is based on data collected from a paired
data set. The statistics of interest are the
mean difference $\bar{d}$ and the SD of the differences $s_d$.
Test statistic:
$t_{n-1} = \dfrac{\bar{d} - 0}{SE(\bar{d})}$
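As a minimal sketch of the paired test statistic, the following Python divides the mean difference by its standard error, using only the standard library. The before/after values are made-up illustrative numbers, not data from the course, and the function name `paired_t` is hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)   # sample SD of the differences
    se_d = sd_d / math.sqrt(n)       # standard error of the mean difference
    return (mean_d - 0) / se_d, n - 1

# Hypothetical measurements, for illustration only
before = [72, 80, 65, 90, 77]
after = [70, 78, 66, 85, 75]
t_stat, df = paired_t(before, after)
```

The resulting statistic would then be compared with a t-distribution on `df` degrees of freedom, which is what SPSS reports as the P-value.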
3Hypothesis Test (2)
Independent samples t-test: compare the
population means in two independent groups.
$H_0: \mu_A = \mu_B$ vs $H_1: \mu_A \neq \mu_B$
$H_0: \mu_A - \mu_B = 0$ vs $H_1: \mu_A - \mu_B \neq 0$
Test is based on data collected from two
independent samples of sizes $n_A$ and $n_B$. The
statistics of interest are the
means $\bar{x}_A$ and $\bar{x}_B$ and the SDs $s_A$
and $s_B$.
4Test statistic
$t_{n-2} = \dfrac{(\bar{x}_A - \bar{x}_B) - 0}{SE(\bar{x}_A - \bar{x}_B)}$
with $n = n_A + n_B < 30$
NB: For both tests, if $n > 30$ the test statistic
with t-distribution can be assumed to be Normally
distributed with mean 0 and standard deviation 1,
$N(0,1)$.
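A rough sketch of the independent samples statistic is given below. It assumes the standard error is built from a pooled variance, which matches the equal-variance assumption of this test but is not spelled out on the slide. The data and the function name `independent_t` are hypothetical.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for H0: mean A - mean B = 0."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance: assumes both populations share the same variance
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return ((mean_a - mean_b) - 0) / se_diff, n_a + n_b - 2

# Hypothetical samples, for illustration only
group_a = [5, 7, 9]
group_b = [1, 3, 5, 7]
t_stat, df = independent_t(group_a, group_b)
```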
5Assumptions
The assumption for the paired t-test is that the
differences have an approximately Normal
distribution.
The assumptions for the
independent groups t-test are that the samples
come from populations having Normal distributions
with the same variance.
6Six main tests
7Hypothesis testing part 2
8Comparison of means
- What if there are more than two groups?
- Compare each pair of groups? NO
- The appropriate statistical test for when there
are more than two groups is Analysis of Variance
(ANOVA)
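One way to see why comparing each pair of groups is discouraged: every extra test adds another chance of a false positive. The sketch below counts the pairwise tests and estimates the chance of at least one type I error, under the simplifying assumption that the tests are independent (pairwise tests on shared data are not, so treat the figure as approximate). The function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Return (number of pairwise tests, approximate chance of >= 1 false positive)."""
    n_tests = comb(n_groups, 2)
    # Assumes independent tests, each carried out at significance level alpha
    return n_tests, 1 - (1 - alpha) ** n_tests

n_tests, error_rate = familywise_error(3)
```

With three groups there are three pairwise tests, and the approximate chance of at least one false positive is noticeably above the nominal 5%.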
9Research question
- Is there a relationship between smoking and body
weight?
- Subjects' smoking status is classified as
- smoker
- given up smoking
- never smoked
- We want to compare body weight between each of
the three smoking status categories
10Null hypothesis
- The null hypothesis is that there is no
difference in population means between the groups
- The alternative hypothesis is that there is a
difference in population means between the groups
11Assumptions of ANOVA
- The sample data should come from populations
which follow a normal distribution
- The variances of these populations should be the
same (homoscedasticity)
12Comparison of means after ANOVA
- ANOVA will only indicate whether the means differ
or not; it will not inform you of where the
differences can be found (e.g. non-smokers'
weight differs from smokers' but non-smokers'
weight does not differ from ex-smokers')
- Multiple comparison procedures can be used to
test where the differences lie
- e.g. Scheffé's test
13Descriptive statistics
Three means being compared
14Levene's test
Check whether the data in the three groups
are equally spread out.
15ANOVA table
Formal hypothesis test to see whether the three
means are all equal or significantly different to
each other.
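The F statistic in an ANOVA table compares the variation between group means with the variation within groups. A minimal sketch for the one-way case follows. The body weights are invented for illustration and the function name `one_way_anova` is hypothetical; SPSS additionally reports the P-value from the F distribution.

```python
import statistics

def one_way_anova(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical body weights (kg) for smokers, ex-smokers and never-smokers
weights = [[68, 70, 72], [74, 76, 78], [70, 72, 74]]
f_stat, df_between, df_within = one_way_anova(weights)
```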
16Multiple comparison test
17- SPSS practicals page 91
- Task 2c requires ANOVA
- Instructions on pages 92-93
18Recap
- What is hypothesis testing?
- Theory
- In practice
- Comparison of two independent means: Independent t-test
- Comparison of two paired means: Paired t-test
- Comparison of more than 2 independent means: ANOVA
19IMPORTANT
- Open Door Sessions (1.005)
- WEDNESDAYS FROM 2pm TO 4pm
- Can make other appointments
- Student Clinics (1.026)
- DAILY FROM 1pm TO 2pm
20Tomorrow's Practical
- SPSS exercise on page 91 of the handbook
- SPSS data file as always in
- F\data\Public Health\
- Tasks 3 and 4 to be completed tomorrow.
21HOW TO: print handouts of lectures, add
comments in SPSS output, export SPSS output to
Word, read important values from tables.
24Insert > New Text
25NB: Delete ALL Notes BEFORE Exporting Output to
Word
26File > Export
Choose where to save your Word document and give
it a name (e.g. output.doc).
27NB: To this new Word file you can add more text,
delete and cut existing text, as well as shrink
and expand the graphs.
28Assignment
- Data file assignment2006.sav
- and a Word file icu assignment2006.doc with
another copy of the instructions are in
- F\data\Public Health
29Assignment
- Consider how to describe the data
- Formulate the research questions you would like
the data to answer
- If a statistical test is required, decide what is
the most appropriate test and why
- Then go to the data and find the information
required and carry out the statistical tests
- Extract important information and include it in
the write-up
- Do NOT include large amounts of unedited SPSS
output
30Deadline
- The deadline for the assignment is
- 12 NOON, MONDAY 20th NOVEMBER
- You can hand it in earlier if you wish
- You are welcome to discuss this work with other
people
- Please ensure that your report is your own work
and is different to the reports of other students
- Copied or plagiarised reports will be referred to
the Head of Department
32Comparing Proportions
- Used for summarising qualitative data
- Calculated for different categories
- The proportions being compared usually come from
a crosstabulation of two categorical variables
- Sample proportions may be used to draw inferences
about population proportions
- These inferences may be expressed as confidence
intervals or used in hypothesis testing
33Comparison of proportions from two independent
groups
- Is there a difference between the population
proportion of men that are current smokers and
the population proportion of women that are current smokers?
- Is there an association between gender and
smoking status?
34Null and alternative hypothesis
- The null hypothesis is that there is no
difference in the population proportions of males
who smoke and females who smoke
- (or no association exists between gender and
smoking status in the population)
- The alternative hypothesis is that there is a
difference in the population proportions
- (an association exists between gender and smoking
status in the population)
35Chi-squared test for association
- Continuity Correction
- (tables with 2 rows and 2 columns)
- Fisher's Exact Test
- (tables with 2 rows and 2 columns)
- Pearson Chi-squared Test
- (tables that have more than 2 rows or columns)
- Chi square test for trend
- (when at least one of the variables is ordinal)
36Contingency table
(Table showing the row totals, the column totals and the grand total)
37Probability of two independent events
- Recall
- P(A and B) = P(A) x P(B)
- If two events are independent, the result of one
event is not dependent on the result of the other
event
38Probability of an event
- If independent then
- P(male and smoker)
- = P(smoker) x P(male)
- = (82/272) x (120/272)
- = 0.133
- Therefore if smoking and gender are independent
of one another the probability that a person in
the population will be a male and a smoker is
0.133,
- so in a sample of 272 people we would expect to
find
- 0.133 x 272 = 36.2 male smokers
39Expected frequencies
- If two events are independent of one another,
the expected frequency of events will be equal
to
- Expected count
- = (row total x column total) / grand total
- = 82 x 120 / 272
- = 36.2
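The expected-count rule can be applied to every cell of the table at once. The sketch below uses the totals quoted in the lecture (82 smokers, 120 males, 272 people). The remaining totals (152 females, 190 non-smokers) are obtained by subtraction, and the function name `expected_counts` is hypothetical.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total x column total / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add up to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

row_totals = [120, 152]   # males, females (152 = 272 - 120)
col_totals = [82, 190]    # smokers, non-smokers (190 = 272 - 82)
expected = expected_counts(row_totals, col_totals)
male_smokers_expected = round(expected[0][0], 1)
```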
40Observed frequencies (expected frequencies)
41Chi-squared test
- The test of the null hypothesis is based on the
difference between the observed and expected
frequencies
- Under the null hypothesis this test statistic
follows the Chi-squared distribution
- The value of the test statistic is then compared
with the appropriate Chi-squared distribution
(first proposed by Pearson)
42Chi-Squared
- Another important distribution related to the
normal is the Chi-squared distribution.
- It is used when investigating categorical data.
- Large positive values occur with very low
probability.
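The link to the normal distribution can be stated compactly. This is the standard textbook definition rather than something taken from the slides: the sum of squares of $k$ independent standard normal variables follows a Chi-squared distribution with $k$ degrees of freedom.

```latex
Z_1, \dots, Z_k \ \text{independent}, \quad Z_i \sim N(0,1)
\quad\Longrightarrow\quad
\sum_{i=1}^{k} Z_i^{2} \sim \chi^{2}_{k}
```

Because it is a sum of squares, the statistic is never negative, and large values fall in the upper tail.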
43Chi-squared test
- The greater the differences between the observed
and expected frequencies, the larger the
Chi-squared statistic is, and the more evidence that
the two variables are associated.
- Comparison of the observed Chi-squared statistic
with tabulated critical values will determine
whether the evidence of association is
significant at a given significance level.
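A minimal sketch of the Pearson Chi-squared statistic, which sums (observed - expected)^2 / expected over all cells, is shown below. The cell counts are an assumption: they are reconstructed from the totals (82 smokers, 120 males, 272 people) and the percentages (33.3% of males, 27.6% of females smoke) quoted in this lecture, not read from the SPSS table. No continuity correction is applied, so the result may differ from the corrected value SPSS prints for 2 x 2 tables.

```python
import math

def pearson_chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def p_value_1df(chi2):
    """Upper-tail probability for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi2 / 2))

# Reconstructed counts (assumption): rows = males, females; columns = smoker, non-smoker
observed = [[40, 80], [42, 110]]
chi2, df = pearson_chi_squared(observed)
p_value = p_value_1df(chi2)
```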
44Assumptions of Chi-squared test
- When sample sizes are small, the expected
frequencies may be small. For the Chi-squared
test to be valid, no more than 20% of the cells
should have an expected frequency of less than 5
and no cells should have an expected frequency of
less than one.
- If this does not hold the alternative test is
- Fisher's Exact test
- (note: usually only for 2 x 2 tables)
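The validity rule above (no more than 20% of cells with an expected frequency below 5, and none below 1) is easy to check mechanically. The following is a sketch only: the function name `chi_squared_assumptions_ok` is hypothetical and the second table is invented to show a failing case.

```python
def chi_squared_assumptions_ok(expected):
    """True if at most 20% of expected counts are below 5 and none is below 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and min(cells) >= 1

# Expected counts for the gender by smoking table (rounded)
ok_large = chi_squared_assumptions_ok([[36.2, 83.8], [45.8, 106.2]])
# Invented small-sample table: 1 of 4 cells (25%) is below 5
ok_small = chi_squared_assumptions_ok([[3.0, 7.0], [6.0, 14.0]])
```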
45SPSS output
Note that 33.3% of males and 27.6% of females
smoke
46SPSS output
P-value
47Hypothesis
48Confidence intervals
- The difference in proportions is approximately 6%
(remember: 33.3% - 27.6%)
- A 95% CI for this difference can be obtained and is
6% (-5% to 17%)
- Note that the null hypothesised value 0 is
included in the 95% CI
- This indicates that the null hypothesis can NOT
be rejected at the 5% significance level
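The interval quoted above can be reproduced approximately with the usual normal-approximation formula for a difference between two independent proportions. Two assumptions are made here: the counts (40 of 120 males and 42 of 152 females smoke) are reconstructed from the percentages and totals in this lecture, and the slide does not say which CI method was used. The function name `diff_in_proportions_ci` is hypothetical.

```python
import math

def diff_in_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Return (difference, lower, upper) for p1 - p2 using a normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Reconstructed counts (assumption): male smokers / males, female smokers / females
diff, lower, upper = diff_in_proportions_ci(40, 120, 42, 152)
as_percent = [round(100 * value) for value in (diff, lower, upper)]
```

Since the interval spans zero, it leads to the same conclusion as the hypothesis test.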
49Larger contingency tables
- We have seen that the Chi-squared test can be used
to test for a difference between two proportions
- The Chi-squared test can also be applied to
larger contingency tables
- Example
- Association between smoking (smoker, ex-smoker,
never smoked) and gender
50Counts (expected frequencies)
51SPSS output
P-value
52Chi-squared test for association
- Continuity Correction
- for 2 x 2 tables
- Fisher's Exact Test
- if more than 20% of expected values are less
than 5 (calculated for 2 x 2 tables only)
- Pearson
- for tables that have more than 2 rows or columns.
- Mantel-Haenszel Test for trend (Chi square test
for trend)
- When one of the variables is ordinal
53Conclusion
- Recall
- H0: No association exists between gender and
smoking status (i.e. the variables are
independent)
- H1: There is an association between gender and
smoking status
- There is no evidence to suggest that there is an
association between gender and smoking status
54Comparison of proportions from two related groups
- A matched case-control study was conducted to
investigate risk factors for diarrhoea in
children.
- Does endometrial ablation (a conservative
alternative to hysterectomy) have an effect on
the presence of pain symptoms in women?
Discomfort was assessed before and 6 months after
surgery.
55McNemar test
- When the proportions are related (paired) the Chi
square test is no longer valid since the
observations in the contingency table are not
independent of one another.
- The appropriate statistical test to apply is the
McNemar test
56McNemar test
57SPSS output
P-value
58Confidence intervals
- The difference in proportions and the 95% CI is
- 17% (5% to 28%)
- Note that the null hypothesised value of 0 is not
included in the 95% CI
- Therefore the null hypothesis can be rejected at
the 5% significance level
59McNemar test
- 24 women had the symptom both pre-op and post-op
- 10 women had no symptom pre-op and still had no
symptom post-op
- 26 women who had the symptom pre-op no longer had the
symptom post-op, compared to 18 women who did not
have the symptom pre-op but did post-op
60McNemar test
- 78 women had assessment pre and post operatively
- P = 0.291
- No evidence to reject the null hypothesis
- Surgical treatment has had no statistically
significant impact on whether the patient has the symptom
P-value
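The McNemar test uses only the discordant pairs: the 26 women whose symptom disappeared and the 18 in whom it appeared. The sketch below assumes the continuity-corrected Chi-squared form of the test with 1 degree of freedom. SPSS may instead report an exact binomial version, so treat the choice of variant as an assumption. The function name `mcnemar` is hypothetical.

```python
import math

def mcnemar(b, c, continuity=True):
    """Return (chi-squared statistic, P-value) from the two discordant counts."""
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)   # continuity correction
    chi2 = diff ** 2 / (b + c)
    # Upper-tail probability of a Chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Discordant pairs from the endometrial ablation example
chi2, p_value = mcnemar(26, 18)
```

Under these assumptions the P-value comes out close to the 0.291 shown in the SPSS output.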
61Hypothesis testing or estimation?
- Quantification of the results by simple estimates
is an essential part of the analysis of data
- A single number (P value) cannot convey all the
necessary information; appropriate estimates and
confidence intervals are required as well
62Example
- In a study of patients admitted to an
otolaryngology ward, 140 with nose bleeds were
compared to 113 controls with other conditions.
Patients were interviewed about their alcohol
consumption (McGarry, 1994).
63Data
64Discuss the statement made by the authors
- The proportion of non-drinkers in the patients
with nose bleeds was similar to that in the
controls (34% vs 35%), but the proportion of
regular drinkers was significantly higher (45% vs
30%, P < 0.025, χ² test of proportions)
65Discussion points
- The authors appear to have tested each line of
the 3 x 2 cross tabulation
- 3 hypothesis tests using the same data
- Increased chance of type I error
- They should have done one single Chi-squared test
on all the data
- Since the categories of drinking are ordered, an
alternative would have been to do a Chi-squared
test for trend
- SPSS does both of these automatically when you
ask for a Chi-squared test