Title: Please turn off cell phones, pagers, etc.
Please turn off cell phones, pagers, etc. The
lecture will begin shortly.
Lecture 20
This lecture will introduce topics from Chapter
12.
1. 2×2 frequency tables (Section 12.1)
2. Probability and odds (Section 12.2)
3. Measures of association in 2×2 tables
(Section 12.2)
1. 2×2 frequency tables
In Chapters 10-11, we learned how to describe
relationships among continuous variables.
Now we begin to examine relationships between
categorical variables.
More specifically, we'll consider relationships
between variables that are binary.
What is a binary variable?
A binary variable is a measurement that has
only two possible outcomes.
These are also known as dichotomous variables.
Examples:
- treatment in a two-armed experiment
  (e.g. aspirin or placebo)
- whether a subject has a trait or condition
  (e.g. cancer or no cancer)
- survival after a specified period of time
  (alive or dead)
Frequency table for a binary variable
Suppose we take a sample of n subjects and
record a binary variable for each subject.
For example: ask 16 students, "Are you
registered to vote?"
Y Y N Y Y N Y Y N N Y Y Y N N Y
A frequency table (or contingency table) records
the number of subjects in each category.
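The tally for this example can be sketched in a few lines of Python. This is only an illustration, not part of the lecture; the variable names `responses` and `freq` are assumptions chosen for the sketch.

```python
from collections import Counter

# Responses from the 16 students, copied from the slide
# (Y = registered, N = not registered).
responses = "Y Y N Y Y N Y Y N N Y Y Y N N Y".split()

# Count how many subjects fall in each category.
freq = Counter(responses)

print("Registered:  ", freq["Y"])
print("Unregistered:", freq["N"])
print("Total (n):   ", len(responses))
```

The two counts and the sample size are all that the frequency table for one binary variable contains.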
Proportions and percentages
Once you have the frequency table, you can
compute the proportions and percentages in each
category by dividing the frequencies by the
sample size.
Registered
Proportion = 10/16 = 0.625
Percentage = 0.625 × 100 = 62.5%
Unregistered
Proportion = 6/16 = 0.375
Percentage = 0.375 × 100 = 37.5%
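As a rough sketch, the same arithmetic in Python, assuming the frequencies are held in a dictionary (the names `freq`, `proportions`, and `percentages` are illustrative only):

```python
# Frequencies from the voter-registration example.
freq = {"Registered": 10, "Unregistered": 6}
n = sum(freq.values())  # sample size

# Proportion = frequency / n; percentage = proportion * 100.
proportions = {k: v / n for k, v in freq.items()}
percentages = {k: 100 * p for k, p in proportions.items()}

for k in freq:
    print(f"{k}: proportion = {proportions[k]:.3f}, "
          f"percentage = {percentages[k]:.1f}%")
```

Because the two categories cover every subject, the proportions add up to 1 and the percentages to 100.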
Two binary variables
Suppose that you now have two binary variables
for each subject.
The 2×2 frequency table (also known as a 2×2
contingency table) records the number of subjects
in each of the four possible categories.
Rows and columns
When creating a 2×2 table, it's customary to make
the
- rows correspond to the explanatory variable
- columns correspond to the response variable
[Figure: two example table layouts, one labeled Wrong and one labeled Right]
Margins
We often add an extra row and column to hold
the row and column totals. These are called the
margins.
Row totals: 11,037 and 11,034
Column totals: 293 and 21,778
Grand total (sample size n): 11,037 + 11,034 = 22,071,
or 293 + 21,778 = 22,071
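The margins can be reproduced with a short Python sketch. The first-column counts (104 and 189) are the ones used in the row-proportion example of this lecture; the second-column counts are inferred here as row total minus first column, and the names `table`, `row_totals`, and `col_totals` are assumptions made for illustration.

```python
# Cell counts [first column, second column] for each row.
# 104 and 189 appear in the lecture; the second-column counts are
# inferred as (row total - first column), which is an assumption.
table = {
    "Aspirin": [104, 11037 - 104],
    "Placebo": [189, 11034 - 189],
}

# Row margins: add across each row.
row_totals = {row: sum(cells) for row, cells in table.items()}

# Column margins: add down each column.
col_totals = [sum(cells[j] for cells in table.values()) for j in range(2)]

# Grand total: the sample size n.
n = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(n)
```

Adding the row totals and adding the column totals must give the same grand total, which is a quick check that the table was entered correctly.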
Row proportions
To uncover the relationship between the
explanatory (row) variable and response (column)
variable, compute the row proportions and
percentages:
- choose one of the first two columns
- divide by the third column
Aspirin: 104 / 11,037 = .0094
Placebo: 189 / 11,034 = .0171
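A minimal Python sketch of the row-proportion calculation, using the counts from the slide (the dictionary layout and the name `row_props` are assumptions for illustration):

```python
# (count in the chosen column, row total) for each row, from the slide.
rows = {"Aspirin": (104, 11037), "Placebo": (189, 11034)}

# Row proportion = count in the chosen column / row total.
row_props = {name: count / total for name, (count, total) in rows.items()}

for name, p in row_props.items():
    print(f"{name}: {p:.4f}")
```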
Percentages and rates
When the row proportions are small, it is
customary to express them as percentages, rates
per 1,000, per 10,000, per 100,000, etc.
- proportion × 1,000 = rate per 1,000
- proportion × 10,000 = rate per 10,000
Aspirin: 0.94%, or 9.4 per 1,000
Placebo: 1.71%, or 17.1 per 1,000
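The conversion is a single multiplication, sketched below in Python. The helper name `rate_per` is hypothetical, and the two proportions are the rounded row proportions from the aspirin example.

```python
def rate_per(proportion, base):
    """Express a proportion as a rate per `base` subjects.

    base=100 gives a percentage, base=1000 a rate per 1,000, and so on.
    """
    return proportion * base

for name, p in {"Aspirin": 0.0094, "Placebo": 0.0171}.items():
    print(f"{name}: {rate_per(p, 100):.2f}%  or  "
          f"{rate_per(p, 1000):.1f} per 1,000")
```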
2. Probability and odds
Probability is a number between 0 and 1 that
indicates how likely it is that an event will
occur.
[Figure: scale running from unlikely (0) to likely (1)]
- probability = 0 means that the event will never
  occur
- probability = 1 means that the event will always
  occur
- probability = 0.5 means that the event is just as
  likely to occur as not
Values close to zero indicate that the event is
unlikely; values close to one indicate that it is
likely.
Odds
Another measure of how likely an event is to
occur is odds.
Odds range from 0 to ∞.
[Figure: scale running from unlikely (0) to likely (∞)]
- odds = 0 means that the event will never occur
- odds = ∞ means that the event will always occur
- odds = 1 (often written as 1:1, which is the
  same as 1/1) means that the event is just as
  likely to occur as not
- odds = 2 (often written as 2:1, which is the
  same as 2/1) means that the event is twice as
  likely to occur as not
Odds as ratios
Gamblers sometimes express odds as a ratio a:b
where b is something other than 1.
For example, they may say the odds are 3:2.
Note that odds of 3:2 are the same as
3/2 = 1.5
So if you ever see odds expressed as a:b, you
should divide a by b to re-express the odds as a
number between 0 and ∞.
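As an illustration, the re-expression can be written as a small Python helper. It assumes the odds are written with a colon (for example "3:2"); the function name `odds_from_ratio` is hypothetical.

```python
def odds_from_ratio(text):
    """Convert odds written as 'a:b' into the single number a / b."""
    a, b = text.split(":")
    return float(a) / float(b)

print(odds_from_ratio("3:2"))  # 3 divided by 2
print(odds_from_ratio("2:1"))  # 2 divided by 1
```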
Odds and probability
Odds and probability are not the same!
Converting probability to odds
Given a probability, you can find the odds by the
formula
odds = prob / (1 − prob)
Examples
Prob = .5 corresponds to odds = .5 / .5 = 1
Prob = .7 corresponds to odds = .7 / .3 = 2.33
Prob = .9 corresponds to odds = .9 / .1 = 9
Prob = .99 corresponds to odds = .99 / .01 = 99
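The examples all divide the probability by one minus the probability. A minimal Python sketch of that conversion follows; the function name `prob_to_odds` and the input check are assumptions added for illustration.

```python
def prob_to_odds(p):
    """Odds corresponding to probability p, i.e. p / (1 - p)."""
    if not 0 <= p < 1:
        # At p = 1 the odds are infinite, so the division is undefined.
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

for p in (0.5, 0.7, 0.9, 0.99):
    print(f"prob = {p}: odds = {prob_to_odds(p):.2f}")
```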
Converting odds to probability
Given the odds, you can find the probability by the
formula
prob = odds / (1 + odds)
Examples
odds = .5 corresponds to prob = .5/1.5 = 0.333
odds = 3 corresponds to prob = 3/4 = 0.75
odds = 10 corresponds to prob = 10/11 = 0.909
odds = 25 corresponds to prob = 25/26 = 0.962
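Each example divides the odds by one plus the odds (.5/1.5, 3/4, 10/11, 25/26). A minimal Python sketch of that conversion, with the hypothetical function name `odds_to_prob`:

```python
def odds_to_prob(odds):
    """Probability corresponding to the given odds, i.e. odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

for odds in (0.5, 3, 10, 25):
    print(f"odds = {odds}: prob = {odds_to_prob(odds):.3f}")
```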
Rare events
For rare events (probabilities close to zero),
odds and probabilities are nearly the same.
Examples
Prob = .001 corresponds to odds = .001001
Prob = .01 corresponds to odds = .0101
Prob = .02 corresponds to odds = .0204
Prob = .03 corresponds to odds = .0309
When discussing rare events, the distinction
between odds and probability is often unimportant.
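One way to see why the two nearly coincide is to expand the odds as a geometric series in the probability p. This short derivation is added as a supplement and is not taken from the slides; it assumes the standard definition of odds as p / (1 − p).

```latex
\[
\text{odds} \;=\; \frac{p}{1-p}
\;=\; p\,\bigl(1 + p + p^{2} + \cdots\bigr)
\;=\; p + p^{2} + p^{3} + \cdots
\;\approx\; p + p^{2}
\qquad \text{for small } p .
\]
```

For p = .01 this gives .01 + .0001 = .0101, and for p = .03 it gives .03 + .0009 = .0309, matching the examples. The gap between odds and probability is roughly p squared, which is negligible when p is close to zero.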
Estimating probabilities from frequency tables
The sample proportion
(number of subjects in the sample with the trait) / n
is an estimate of the probability that a subject
chosen at random from the population has the
trait.
Example
"Are you registered to vote?"
The proportion registered is 10/16 = .625
The proportion not registered is 6/16 = .375
Estimating odds from frequency tables
The sample odds
(number with the trait) / (number without the trait)
is an estimate of the odds that a subject chosen
at random from the population has the trait.
Example
The estimated odds of being registered are 10/6
= 1.67
The estimated odds of not being registered are
6/10 = 0.6
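A short Python sketch of both estimates for the voter-registration example. The variable names are illustrative, and the formulas (count with the trait over n for the proportion, count with over count without for the odds) are inferred from the worked numbers 10/16 and 10/6.

```python
with_trait, without_trait = 10, 6   # registered, not registered
n = with_trait + without_trait      # sample size

sample_proportion = with_trait / n          # estimates the probability
sample_odds = with_trait / without_trait    # estimates the odds

print(f"proportion registered = {sample_proportion:.3f}")
print(f"odds of being registered = {sample_odds:.2f}")

# The sample odds agree with proportion / (1 - proportion).
print(f"{sample_proportion / (1 - sample_proportion):.2f}")
```

The odds of not being registered are the reciprocal, 6/10 = 0.6.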
3. Measures of association in 2×2 tables
Recall that with two continuous variables, a
useful measure of association is the correlation
coefficient.
For two binary variables, the most common
measures of association are the relative risk and
the odds ratio.
The relative risk is a ratio of proportions.
The odds ratio is a ratio of odds.
Estimating the relative risk
- Compute the proportions for each row
- Divide one proportion by the other
Example
The estimated relative risk is .0094 / .0171
= 0.55
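The two steps can be sketched in Python using the aspirin and placebo counts from the row-proportion example. The function name `relative_risk` and its argument order are assumptions made for this illustration.

```python
def relative_risk(count1, total1, count2, total2):
    """Ratio of two row proportions: (count1 / total1) / (count2 / total2)."""
    return (count1 / total1) / (count2 / total2)

# Aspirin row: 104 of 11,037; placebo row: 189 of 11,034.
rr = relative_risk(104, 11037, 189, 11034)
print(f"estimated relative risk = {rr:.2f}")
```

Dividing in the other order (placebo over aspirin) gives the reciprocal, so it matters to state which row is in the numerator.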
Computing the odds ratio
- Compute the odds for each row
- Divide one odds by the other
Example
The estimated odds ratio is .0095 / .0174 = 0.55
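A Python sketch of the same two steps. It assumes the odds for a row are the count with the outcome divided by the count without it, where the count without is the row total minus the count with; the helper name `row_odds` is hypothetical.

```python
def row_odds(count, total):
    """Odds for one row: subjects with the outcome / subjects without it."""
    return count / (total - count)

odds_aspirin = row_odds(104, 11037)   # 104 / 10,933
odds_placebo = row_odds(189, 11034)   # 189 / 10,845
odds_ratio = odds_aspirin / odds_placebo

print(f"aspirin odds = {odds_aspirin:.4f}")
print(f"placebo odds = {odds_placebo:.4f}")
print(f"estimated odds ratio = {odds_ratio:.2f}")
```

Because the outcome is rare in both rows, the odds are close to the row proportions and the odds ratio is close to the relative risk.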
Easier way to estimate the odds ratio
If the frequencies in the 2×2 table are
a  b
c  d
then the estimated odds ratio is (ad) / (bc).
Example
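As a hedged illustration, the cross-product shortcut can be applied to the aspirin data. The shortcut follows from (a/b) / (c/d) = (a × d) / (b × c). The counts b and d are inferred here as row total minus first column, and the function name `odds_ratio_cross` is hypothetical.

```python
def odds_ratio_cross(a, b, c, d):
    """Cross-product odds ratio (a * d) / (b * c) for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Aspirin row: a with the outcome, b without (inferred from the row total).
a, b = 104, 11037 - 104
# Placebo row: c with the outcome, d without (inferred from the row total).
c, d = 189, 11034 - 189

or_cross = odds_ratio_cross(a, b, c, d)
print(f"estimated odds ratio = {or_cross:.2f}")
```

This gives the same value as dividing the two row odds, without computing either row's odds separately.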