Title: Two-Way Table
1. Two-Way Table
2. To study the relationship between two categorical variables, we use two-way tables.
Example: data from the U.S. Census Bureau for the year 2000 on the level of education reached by Americans of different ages (page 134).
3. The number of observations falling into each combination of categories is entered into the corresponding cell of the table.
For example, in the age group 35-54, 23,160,000 people completed college.
4. Totaling the values in each row of the table gives the marginal distribution of the row variable (whose values are listed in the left column): Education.
Totaling the values in each column of the table gives the marginal distribution of the column variable (whose values are listed in the top row): Age group.
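The row and column totals described above can be sketched in a few lines of Python. This is only an illustration: the counts below are made up for the example and are NOT the Census figures from the table, and the reduced set of education levels is likewise an assumption.

```python
# Hypothetical counts (NOT the Census figures from the slides).
# Rows: education level; columns: age group.
table = {
    "Not HS grad":  {"25-34": 10, "35-54": 20, "55+": 30},
    "HS grad":      {"25-34": 30, "35-54": 60, "55+": 40},
    "College grad": {"25-34": 20, "35-54": 40, "55+": 10},
}

# Row totals: marginal distribution of the row variable (education).
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: marginal distribution of the column variable (age group).
col_totals = {}
for cells in table.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

# Both sets of totals add up to the same grand total.
grand_total = sum(row_totals.values())

# Marginal distributions expressed as percents of the grand total.
row_percents = {r: 100 * n / grand_total for r, n in row_totals.items()}
col_percents = {c: 100 * n / grand_total for c, n in col_totals.items()}
```

Each marginal distribution describes one variable alone; it says nothing yet about how the two variables are related.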
5. The distribution of education level is a marginal distribution.
6. Marginal distributions
7. Page 137, Exercise 6.3
8. Relationship in percents
- Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table.
- Using percents prevents misleading comparisons due to unequal sizes for different groups.
- What percent of people aged 25 to 34 have completed 4 years of college? For other age groups?
- Page 138, Exercise 6.5
9. Conditional Distributions
If we look only at people in the age group 25-34 and ask about their education level, we get the distribution of education level within that group.
This answers the question: under the condition that a person is in the age group 25-34, how is education level distributed? This is the conditional distribution of education level given that the age group is 25-34.
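A conditional distribution restricts attention to one column (here, one age group) and divides each cell by that column's total. The sketch below assumes a small table of made-up counts, not the Census data; the helper name `conditional_given_column` is hypothetical.

```python
# Hypothetical counts (NOT the Census figures from the slides).
table = {
    "Not HS grad":  {"25-34": 10, "35-54": 20, "55+": 30},
    "HS grad":      {"25-34": 30, "35-54": 60, "55+": 40},
    "College grad": {"25-34": 20, "35-54": 40, "55+": 10},
}

def conditional_given_column(table, col):
    """Percent distribution of the row variable among observations in `col`."""
    col_total = sum(cells[col] for cells in table.values())
    return {row: 100 * cells[col] / col_total for row, cells in table.items()}

# Conditional distribution of education level given age group 25-34.
dist_25_34 = conditional_given_column(table, "25-34")
```

The percents in `dist_25_34` add up to 100 because they describe only the people in that one age group.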
10. How common is college education (4 or more years)?
Overall: 44,828/175,230 = 25.58%. But if we divide by age group, we get 29.3% for the age group 25-34, 28.4% for 35-54, and 18.9% for 55 and over.
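As a quick check of the overall figure, the arithmetic can be reproduced directly. The two counts are the ones quoted on this slide; the variable names are only illustrative.

```python
# Counts quoted on the slide.
college_4plus = 44_828   # people with 4 or more years of college
total = 175_230          # all people in the table

overall_percent = 100 * college_4plus / total
print(f"{overall_percent:.2f}%")  # prints 25.58%
```

The per-age-group percents are computed the same way, but each uses its own age group's total as the denominator instead of the grand total.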
11. Conditional Distributions
Comparing conditional distributions using bar graphs
12. Conditional Distributions
- If the conditional distributions of the second variable are nearly the same for each category of the first variable, then we say that there is no association between the two variables.
- If there are significant differences in the conditional distributions for each category, then we say that there is an association between the two variables.
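One informal way to compare conditional distributions side by side is to compute, for each category of the second variable, the gap between its largest and smallest percent across the groups. The sketch below does this for made-up counts (not the Census data). The "spread" measure is an assumption chosen for illustration, not a formal test of association.

```python
# Hypothetical counts (NOT the Census figures from the slides).
table = {
    "Not HS grad":  {"25-34": 10, "35-54": 20, "55+": 30},
    "HS grad":      {"25-34": 30, "35-54": 60, "55+": 40},
    "College grad": {"25-34": 20, "35-54": 40, "55+": 10},
}
columns = ["25-34", "35-54", "55+"]

def conditional_given_column(col):
    """Percent distribution of education level within one age group."""
    col_total = sum(cells[col] for cells in table.values())
    return {row: 100 * cells[col] / col_total for row, cells in table.items()}

# One conditional distribution per age group.
conditionals = {col: conditional_given_column(col) for col in columns}

# For each education level: largest minus smallest percent across age groups.
spread = {
    row: max(conditionals[c][row] for c in columns)
    - min(conditionals[c][row] for c in columns)
    for row in table
}
max_spread = max(spread.values())
```

A `max_spread` near zero would mean the conditional distributions are nearly identical (no association); a large value, as in this made-up table, points to an association.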
13. Conditional Distributions
- Now let's find the conditional distribution of age group given the education level.
- If the education level is not HS grad, how is the age group distributed?
- If the education level is HS grad, how is the age group distributed?
- If the education level is some college, how is the age group distributed?
- If the education level is college grad, how is the age group distributed?
- Present your results in a bar graph. Are the two categorical variables associated?
14. Using Excel Pivot Table
- Open Ch6 ex06-19.xls
- Select a cell in the range of the data → Select Data → PivotTable and PivotChart Report → PivotChart Report → Data Range A2:B73 → Next
- Choose Layout → Drag Relapse to ROW and Placebo to COLUMN → Drag Placebo to DATA → Finish
(Relapse and Placebo are the entries in the first row of the data table; they label the treatment results and the drugs used, respectively.)
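What the pivot table does here is count how many raw records fall into each (row value, column value) combination. A minimal Python sketch of the same idea follows. The records are invented for illustration and are not the contents of ex06-19.xls; "Drug A" is a hypothetical label.

```python
from collections import Counter

# Hypothetical raw data: one (treatment_result, drug) pair per subject.
records = [
    ("Yes", "Placebo"), ("Yes", "Placebo"), ("No", "Placebo"),
    ("No", "Drug A"), ("No", "Drug A"), ("Yes", "Drug A"),
    ("No", "Drug A"),
]

# Cell counts keyed by (row value, column value).
counts = Counter(records)

# Distinct categories of each variable, in sorted order.
rows = sorted({result for result, _ in records})
cols = sorted({drug for _, drug in records})

# Two-way table as nested dicts: table[row][col] -> count.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
```

A `Counter` returns 0 for combinations that never occur, so empty cells are filled in automatically.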
15. Using Excel Pivot Table
- For the conditional distribution of treatment results (row variable), given the type of drug used: select any cell in the generated table → right click → Field Settings → Options → Show data as: % of column → OK (each drug's column then sums to 100%)
- For the conditional distribution of drugs used (column variable), given the treatment result: select any cell in the generated table → right click → Field Settings → Options → Show data as: % of row → OK (each treatment result's row then sums to 100%)
16. Using Excel Pivot Table
- Converting to bar graphs
- Select the marginal distribution table → Choose Edit on the menu bar → Copy
- Select an empty cell away from the table → Choose Edit on the menu bar → Paste Special → Under Paste, select Values
- Select the new table, excluding the Grand Totals and the "Count of Placebo" and "Placebo" labels → Insert → Chart → Column → Next → Finish
17. Simpson's Paradox
- When studying the relationship between two variables, there may be a lurking variable that reverses the direction of the relationship: the direction seen when the lurking variable is ignored is the opposite of the direction seen when it is taken into account.
- The lurking variable creates subgroups, and failure to take these subgroups into consideration can lead to misleading conclusions regarding the association between the two variables.
18. Discrimination? (Simpson's Paradox)
- Consider the acceptance rates for the following group of men and women who applied to college.
A higher percentage of men were accepted. Discrimination?
19. Discrimination? (Simpson's Paradox)
- Lurking variable: applications were split between the Business School (240) and the Art School (320).
BUSINESS SCHOOL
A higher percentage of women were accepted in Business.
20. Discrimination? (Simpson's Paradox)
- Lurking variable: applications were split between the Business School (240) and the Art School (320).
ART SCHOOL
A higher percentage of women were also accepted in Art.
21. Discrimination? (Simpson's Paradox)
- So within each school a higher percentage of women were accepted than men. There is no discrimination against women!
- This is an example of Simpson's Paradox. When the lurking variable (school applied to: Business or Art) is ignored, the data seem to suggest discrimination against women. However, when the school is considered, the association is reversed and suggests discrimination against men.
- Read Example 6.5, page 141
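The reversal can be reproduced numerically. The counts in the sketch below are illustrative only: the acceptance tables from the slides are not reproduced in this text, so the numbers were chosen merely to match the stated school totals (240 Business, 320 Art) and to exhibit the paradox.

```python
# Illustrative counts only, chosen to match the stated totals
# (240 Business applicants, 320 Art applicants).
# applications[school][sex] = (accepted, applied)
applications = {
    "Business": {"Men": (18, 120), "Women": (24, 120)},
    "Art":      {"Men": (180, 240), "Women": (64, 80)},
}

def rate(accepted, applied):
    """Acceptance rate in percent."""
    return 100 * accepted / applied

# Acceptance rates within each school (lurking variable considered).
by_school = {
    school: {sex: rate(*cell) for sex, cell in sexes.items()}
    for school, sexes in applications.items()
}

# Acceptance rates with the schools pooled (lurking variable ignored).
overall = {}
for sex in ("Men", "Women"):
    accepted = sum(applications[s][sex][0] for s in applications)
    applied = sum(applications[s][sex][1] for s in applications)
    overall[sex] = rate(accepted, applied)
```

With these counts, women have the higher rate in each school (20% vs 15% in Business, 80% vs 75% in Art), yet men have the higher pooled rate (55% vs 44%). The reversal arises because most of the men in this example applied to Art, the school with the higher acceptance rate.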