Title: Chapter 1 Looking at Data
1 Desipramine is an antidepressant affecting the
brain chemicals that may become unbalanced and
cause depression. It was tested for recovery from
cocaine addiction. Treatment with desipramine was
compared to a standard treatment (lithium, with
strong anti-manic effects) and a placebo. Is
desipramine effective in preventing relapses?
2 Hypothesis: no association
Two-way tables sort the data according to two
categorical variables. We use the chi-square (χ²)
test to assess the null hypothesis (H0) that there
is no relationship between these two categorical
variables.
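3 Expected Cell Counts
- To test this hypothesis, we compare actual counts
from the sample data with expected counts, given
the null hypothesis of no relationship.
- The expected count in any cell of a two-way table
when H0 is true is
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{n}$$
where n is the total number of observations in the table.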
4 Cocaine addiction
Observed proportion not relapsing:
Desipramine   15/25 = 0.60
Lithium        7/26 = 0.27
Placebo        4/23 = 0.17
Overall non-relapse: 26/74 ≈ 35%

Expected relapse counts
              No                      Yes
Desipramine   25 × 26/74 ≈ 8.78       25 × 0.65 ≈ 16.22
              (= 25 × 0.35)
Lithium       26 × 0.35 ≈ 9.14        26 × 0.65 ≈ 16.86
Placebo       23 × 0.35 ≈ 8.08        23 × 0.65 ≈ 14.92
NOTE: 26/74 ≈ 0.35. The counts shown use the unrounded proportions 26/74 and 48/74.
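The expected counts on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name `expected_counts` and the dictionary layout are illustrative choices, and the relapse counts (10, 19, 19) are derived here from the group sizes minus the non-relapse counts (25 − 15, 26 − 7, 23 − 4).

```python
# Observed counts per treatment as (no relapse, relapse).
# Relapse counts are derived: group size minus the non-relapse count.
observed = {
    "Desipramine": (15, 10),
    "Lithium": (7, 19),
    "Placebo": (4, 19),
}


def expected_counts(table):
    """Expected count per cell under H0: row total * column total / n."""
    row_totals = {row: sum(cells) for row, cells in table.items()}
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(cells[j] for cells in table.values()) for j in range(n_cols)]
    n = sum(row_totals.values())
    return {
        row: tuple(row_totals[row] * col_totals[j] / n for j in range(n_cols))
        for row in table
    }


expected = expected_counts(observed)
for treatment, cells in expected.items():
    print(treatment, [round(x, 2) for x in cells])
```

Each printed pair is (expected no relapse, expected relapse) for that treatment, rounded to two decimals.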
5 The Chi-Square Statistic
To see if the data give convincing evidence
against the null hypothesis, we compare the
observed counts from our sample with the expected
counts assuming H0 is true. The test statistic
that makes the comparison is the chi-square
statistic.
The chi-square statistic is a measure of how far
the observed counts are from the expected counts.
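The formula for the statistic is
$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
where "observed count" represents an observed cell
count, "expected count" represents the expected
count for the same cell, and the sum is over all
r × c cells in the table.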
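As a rough illustration of the formula, the statistic for the cocaine study can be computed directly from the observed counts. The function name `chi_square_statistic` is an illustrative choice, and the rows are assumed to be ordered desipramine, lithium, placebo with columns (no relapse, relapse).

```python
def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under H0 for cell (i, j).
            exp = row_totals[i] * col_totals[j] / n
            total += (obs - exp) ** 2 / exp
    return total


# Rows: desipramine, lithium, placebo; columns: (no relapse, relapse).
observed = [[15, 10], [7, 19], [4, 19]]
chi2 = chi_square_statistic(observed)
print(round(chi2, 2))
```

The result is about 10.73, in line with the Pearson value reported by the software for this study.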
6 The Chi-Square Distributions
If the observed counts are very different from the
expected counts, a large value of χ² will result,
providing evidence against the null hypothesis.
The P-value for a χ² test comes from comparing the
value of the χ² statistic with critical values for
a chi-square distribution.
7 When is it safe to use a χ² test?
We can safely use the chi-square test when:
- The samples are simple random samples (SRS).
- All individual expected counts are 1 or more (≥ 1).
- No more than 20% of expected counts are less than 5 (< 5).
→ For a 2×2 table, this implies that all four expected counts should be 5 or more.
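The two conditions on expected counts are easy to check mechanically. The sketch below assumes the expected counts are already available as a list of rows; the function name `expected_counts_ok` is hypothetical. Whether the samples are simple random samples cannot be checked from the counts and has to be judged from the study design.

```python
def expected_counts_ok(expected):
    """Check the two expected-count guidelines for the chi-square test."""
    cells = [x for row in expected for x in row]
    # Guideline 1: every expected count is at least 1.
    all_at_least_one = all(x >= 1 for x in cells)
    # Guideline 2: no more than 20% of expected counts are below 5.
    share_below_five = sum(x < 5 for x in cells) / len(cells)
    return all_at_least_one and share_below_five <= 0.20


# Expected counts from the cocaine study (rounded).
cocaine_expected = [[8.78, 16.22], [9.14, 16.86], [8.08, 14.92]]
print(expected_counts_ok(cocaine_expected))

# A made-up 2x2 table: one cell below 5 is already 25% of the cells.
small_2x2 = [[4.5, 10.0], [6.0, 12.0]]
print(expected_counts_ok(small_2x2))
```

The second call shows why a 2×2 table needs all four expected counts to be 5 or more: a single small cell already exceeds the 20% limit.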
8 Cocaine addiction
Try using software. Statistical software output
for the cocaine study (look at the last lines of
output):

Tests
N     DF    -LogLike    RSquare (U)
74    2     5.375719    0.1121

Test                ChiSquare    Prob>ChiSq
Likelihood Ratio    10.751       0.0046
Pearson             10.729       0.0047

The P-value is < 0.005, or half a percent. This is
highly significant. We reject the null hypothesis
of no association and conclude that there is a
significant relationship between treatment
(desipramine, lithium, placebo) and outcome
(relapse or not).
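The output also reports a likelihood-ratio statistic, which the slides do not define. Assuming the software uses the usual definition, G² = 2 Σ observed × ln(observed / expected), and that the -LogLike column is half of that value, the figures can be reproduced as in the sketch below. The function name is illustrative and both assumptions are about the software, not statements from the slides.

```python
import math


def likelihood_ratio_statistic(observed):
    """G^2 = 2 * sum of observed * ln(observed / expected) over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            if obs > 0:  # an empty cell contributes nothing to the sum
                exp = row_totals[i] * col_totals[j] / n
                g2 += 2 * obs * math.log(obs / exp)
    return g2


# Rows: desipramine, lithium, placebo; columns: (no relapse, relapse).
observed = [[15, 10], [7, 19], [4, 19]]
g2 = likelihood_ratio_statistic(observed)
print(round(g2, 2), round(g2 / 2, 2))
```

Under these assumptions the two printed values should sit close to the Likelihood Ratio ChiSquare (10.751) and the -LogLike entry (5.375719) in the output.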
HW: Read Section 9.1; do 9.1, 9.2, 9.7, 9.15,
9.16, 9.19, 9.20, 9.28, 9.36(a)
9 Summary: Analyzing two-way tables
When analyzing relationships between two
categorical variables, follow this procedure:
1. Calculate descriptive statistics that convey
the important information in the table, usually
column or row percents. Always take percents with
respect to the explanatory variable's totals!
2. Find the expected counts and use them to
compute the X² statistic.
3. Compare your X² statistic to the chi-square
critical values from Table F to find the
approximate P-value for your test, or use
software to do all the computations, including
the expected values!
4. Draw a conclusion about the association
between the row and column variables. Don't
forget the context!
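Step 1 of the procedure can be sketched for the cocaine study as follows. Treatment is the explanatory variable, so the percents are taken within each treatment group; the variable names are illustrative.

```python
# (number not relapsing, group size) for each treatment in the cocaine study.
groups = {
    "Desipramine": (15, 25),
    "Lithium": (7, 26),
    "Placebo": (4, 23),
}

# Proportion not relapsing within each treatment group.
no_relapse_rate = {
    treatment: no_relapse / size
    for treatment, (no_relapse, size) in groups.items()
}

for treatment, rate in no_relapse_rate.items():
    print(f"{treatment}: {rate:.0%} did not relapse")

# Overall proportion not relapsing, ignoring treatment.
overall = sum(k for k, _ in groups.values()) / sum(n for _, n in groups.values())
print(f"Overall: {overall:.0%} did not relapse")
```

Comparing the per-treatment percents with the overall percent gives a first impression of the association before any test is run.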
10 Finding the P-value with Table F
The χ² distributions are a family of
distributions that can take only positive values,
are skewed to the right, and are described by a
specific degrees of freedom.
Table F gives upper critical values for many χ²
distributions.
11 Table F
df = (r − 1)(c − 1)
In a 3×2 table, df = 2 × 1 = 2.
If χ² = 10.73, the P-value is between 0.0025 and
0.005. From software we have P = 0.0047.
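The Table F lookup can be cross-checked numerically. The sketch below relies on a special case: for df = 2 only, the upper-tail probability of a chi-square distribution has the closed form exp(−x/2), so no statistics library is needed. The function names are illustrative, and other degrees of freedom would need a table or software.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for an r x c two-way table."""
    return (n_rows - 1) * (n_cols - 1)


def upper_tail_df2(x):
    """P(chi-square with 2 df exceeds x). Valid for df = 2 only."""
    return math.exp(-x / 2)


def critical_value_df2(p):
    """Value with upper-tail probability p. Valid for df = 2 only."""
    return -2 * math.log(p)


df = degrees_of_freedom(3, 2)
p_value = upper_tail_df2(10.73)
# Critical values bracketing the statistic, as read from a table row for df = 2.
lower = critical_value_df2(0.005)
upper = critical_value_df2(0.0025)

print(df)
print(round(lower, 2), round(upper, 2))
print(round(p_value, 4))
```

The statistic 10.73 falls between the two critical values, which is why the table brackets the P-value between 0.0025 and 0.005, consistent with the software value of 0.0047.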