Title: Displaying and Describing
1Chapter 3
- Displaying and Describing
- Categorical Data
2The Three Rules of Data Analysis
- The three rules of data analysis won't be
difficult to remember:
- Make a picture: things may be revealed that are
not obvious in the raw data. These will be things
to think about.
- Make a picture: important features of and patterns
in the data will show up. You may also see things
that you did not expect.
- Make a picture: the best way to tell others about
your data is with a well-chosen picture.
3Case Study - Titanic
- At 11:40 on the night of April 14, 1912,
Frederick Fleet's cry of "Iceberg, right ahead!"
signaled the beginning of a nightmare that has
become legend.
- By 2:15 a.m. the Titanic, thought by many to be
unsinkable, had sunk, leaving more than 1,500
passengers and crew members on board to meet
their icy fate.
4Data: passengers and crew aboard the Titanic
Survival  Age    Sex     Class
Dead      Adult  Male    Third
Dead      Adult  Male    Crew
Dead      Adult  Male    Second
Dead      Adult  Male    Crew
Dead      Adult  Male    Crew
Dead      Adult  Female  Second
Alive     Adult  Female  First
Dead      Child  Male    First
5Titanic (cont.)
- Each case (row) of the data table represents a
person on board the ship.
- The variables are Survival, Age, Sex, and Class.
6Titanic Variables (cont.)
- Survival
- Age
- Sex
- Ticket Class
7Where from here?
- We will study categorical variables.
- They become interesting when we look at how
categorical variables work together.
- Example questions:
- - What percent of people were in first class?
What percent in second class?
- - Was the percent of survivors higher in first
class than in second class?
8What are the Ws?
- Who
- What
- When
- Where
- hoW
- Why
9Frequency Tables: Making Piles
- We can pile the data by counting the number of
data values in each category of interest.
- We can organize these counts into a frequency
table, which records the totals and the category
names.
10Frequency Tables: Making Piles (cont.)
- A relative frequency table is similar, but gives
the percentages (instead of counts) for each
category.
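A minimal sketch of both tables, assuming we only have the ticket classes of the eight example rows shown on the earlier data slide (an illustrative subset, not the full passenger list). The function names are hypothetical helpers, not part of the original slides.

```python
from collections import Counter

# Ticket class for the eight example rows on the data slide
# (illustrative subset only, not the full Titanic data).
classes = ["Third", "Crew", "Second", "Crew", "Crew", "Second", "First", "First"]


def frequency_table(values):
    """Count how many data values fall in each category."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Express each category count as a percentage of all cases."""
    counts = frequency_table(values)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


freq = frequency_table(classes)
rel_freq = relative_frequency_table(classes)

# Print category, count, and percent side by side.
for category in freq:
    print(f"{category:<7}{freq[category]:>3}{rel_freq[category]:>8.1f}%")
```

The percentages in a relative frequency table should add up to 100 (apart from rounding), which is a quick check on the counting.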
11Answer to first question on slide 3-7
- What percent of people were in first class? What
in second class?
- There were ____ people in first class.
- There were ____ people in second class.
- There were ____ people on board.
- So, there were
- ____ people in first class and
- ____
12What's Wrong With This Picture?
- The length of each ship represents the count in
each class.
- When we look at each ship, we see the area taken
up by the ship, instead of the length of the
ship.
- The ship display makes it look like most of the
people on the Titanic were crew members, with a
few passengers along for the ride.
13The Area Principle
- The ship display violates the area principle:
- The area occupied by a part of the graph should
correspond to the magnitude of the value it
represents.
14Bar Charts
- A bar chart displays the distribution of a
categorical variable, showing the counts for each
category next to each other for easy comparison.
- A bar chart stays true to the area principle.
- Thus, a better display for the ship data is:
15Bar Charts (cont.)
- A relative frequency bar chart displays the
relative proportion of counts for each category.
- A relative frequency bar chart also stays true to
the area principle.
- Replacing counts with percentages in the ship
data:
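As a rough stand-in for a bar chart (not the figure from the original slides), the sketch below draws text bars whose lengths equal the counts. Because every bar has the same height, the area of each bar is proportional to its count, which is what the area principle asks for. The counts come from the eight example rows on the data slide; `text_bar_chart` is a hypothetical helper.

```python
def text_bar_chart(counts, symbol="#"):
    """Return one text line per category; bar length equals the count."""
    width = max(len(name) for name in counts)
    return [
        f"{name:<{width}} | {symbol * count} ({count})"
        for name, count in counts.items()
    ]


# Class counts for the eight example rows (illustrative subset only).
counts = {"Third": 1, "Crew": 3, "Second": 2, "First": 2}

chart = text_bar_chart(counts)
print("\n".join(chart))
```

Passing rounded percentages instead of counts gives the relative frequency version: the bars change scale but keep the same relative lengths.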
16Pie Charts
- When you are interested in parts of the whole, a
pie chart might be your display of choice.
- Pie charts show the whole group of cases as a
circle.
- They slice the circle into pieces whose size is
proportional to the fraction of the whole in
each category.
17Contingency Tables
- A contingency table allows us to look at two
categorical variables together.
- It shows how individuals are distributed along
each variable, contingent on the value of the
other variable.
- Example: we can examine the class of ticket and
whether a person survived the Titanic:
18Contingency Tables (cont.)
- The margins of the table, both on the right and
on the bottom, give totals and the frequency
distributions for each of the variables.
- Each frequency distribution is called a marginal
distribution of its respective variable.
- Ex. The marginal distribution of Survival is:
19Contingency Tables (cont.)
- Each cell of the table gives the count for a
combination of values of the two variables.
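To make cells and margins concrete, here is a small sketch that builds a contingency table of Survival by Class. It assumes only the eight example rows from the data slide, so the counts are far smaller than in the full table; the variable names are illustrative.

```python
from collections import Counter

# (Survival, Class) for the eight example rows on the data slide.
rows = [
    ("Dead", "Third"), ("Dead", "Crew"), ("Dead", "Second"), ("Dead", "Crew"),
    ("Dead", "Crew"), ("Dead", "Second"), ("Alive", "First"), ("Dead", "First"),
]

cells = Counter(rows)                                # one count per combination
survival_margin = Counter(s for s, _ in rows)        # marginal distribution of Survival
class_margin = Counter(c for _, c in rows)           # marginal distribution of Class

class_order = ["First", "Second", "Third", "Crew"]

# Header row, one row per survival status, then the bottom margin.
print(f"{'':<7}" + "".join(f"{c:>8}" for c in class_order) + f"{'Total':>8}")
for s in ["Alive", "Dead"]:
    line = f"{s:<7}" + "".join(f"{cells[(s, c)]:>8}" for c in class_order)
    print(line + f"{survival_margin[s]:>8}")
print(f"{'Total':<7}" + "".join(f"{class_margin[c]:>8}" for c in class_order)
      + f"{len(rows):>8}")
```

The right-hand column and the bottom row are the margins; both add up to the same grand total.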
20Contingency Table (cont.)
- How many crew members died when the Titanic sank?
-
- What percentage of crew members died when the
Titanic sank?
-
- How many first class passengers survived?
-
21Conditional Distributions
- A conditional distribution shows the distribution
of one variable for just the individuals who
satisfy some condition on another variable.
- The conditional distribution of ticket Class,
conditional on having survived:
22Conditional Distributions (cont.)
- The following is the conditional distribution of
ticket Class, conditional on having perished:
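A conditional distribution can be sketched as "filter first, then compute percentages". The example below assumes only the eight example rows from the data slide, so the resulting percentages are illustrative and do not match the full Titanic table; `conditional_distribution` is a hypothetical helper.

```python
from collections import Counter

# (Survival, Class) for the eight example rows on the data slide.
rows = [
    ("Dead", "Third"), ("Dead", "Crew"), ("Dead", "Second"), ("Dead", "Crew"),
    ("Dead", "Crew"), ("Dead", "Second"), ("Alive", "First"), ("Dead", "First"),
]


def conditional_distribution(rows, condition):
    """Percent in each Class among only the rows whose Survival equals `condition`."""
    selected = [cls for survival, cls in rows if survival == condition]
    if not selected:
        raise ValueError(f"no rows satisfy the condition {condition!r}")
    counts = Counter(selected)
    return {cls: 100 * n / len(selected) for cls, n in counts.items()}


class_given_dead = conditional_distribution(rows, "Dead")
class_given_alive = conditional_distribution(rows, "Alive")

print("Class | perished:", class_given_dead)
print("Class | survived:", class_given_alive)
```

Each conditional distribution uses its own total (the number who perished, or the number who survived) as the denominator, not the grand total.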
23Conditional Distributions (cont.)
- The conditional distributions tell us that there
is a difference in class for those who survived
and those who perished.
- This can be shown with pie charts of the two
distributions:
24Conditional Distributions (cont.)
- Better yet, use side-by-side bar charts:
25Conditional Distributions (cont.)
- The variables would be considered independent
when the distribution of one variable in a
contingency table is the same for all categories
of the other variable.
- We see that the distribution of Class for the
survivors is different from that of the
non-survivors.
- This leads us to believe that Class and Survival
are associated, that they are not independent.
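The independence idea above can be sketched as a direct comparison of conditional distributions. The two tables below are hypothetical counts invented purely for illustration; the helper names and the tolerance are assumptions as well.

```python
def conditional_percents(row):
    """Turn one row of counts into percentages of that row's total."""
    total = sum(row.values())
    return {category: 100 * count / total for category, count in row.items()}


def looks_independent(table, tolerance=1e-9):
    """True when every row of the table has the same conditional distribution."""
    dists = [conditional_percents(row) for row in table.values()]
    first = dists[0]
    return all(abs(d[k] - first[k]) <= tolerance for d in dists for k in first)


# Hypothetical counts, invented for illustration only.
same_shape = {
    "Group A": {"Yes": 10, "No": 30},   # 25% / 75%
    "Group B": {"Yes": 20, "No": 60},   # 25% / 75%
}
different_shape = {
    "Group A": {"Yes": 10, "No": 30},   # 25% / 75%
    "Group B": {"Yes": 30, "No": 30},   # 50% / 50%
}

print(looks_independent(same_shape))
print(looks_independent(different_shape))
```

Real data almost never give exactly identical percentages, so in practice the question is whether the conditional distributions differ by enough to matter, not whether they match to the last decimal.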
26Segmented Bar Charts
- A segmented bar chart displays the same
information as a pie chart, but in the form of
bars instead of circles.
- Here is the segmented bar chart for ticket Class
by Survival status:
27Recall: Conditional distributions
28Answer to second question on slide 3-7
- Was the percent of survivors higher in first
class than in second class?
- The Who here is restricted
- There were ____ survivors in first class.
- There were ____ survivors in second class.
- There were a total of ____ survivors on
the ship.
- So, the percent of survivors in first class is
-
-
- Thus, the percent of survivors in first class
was ____
29Recall: Contingency table
30Change question
- What percent of first and second class passengers
survived?
- There were ____ survivors in first class.
- There were ____ first class passengers.
- So, ____ of passengers
in first class survived.
- There were ____ survivors in second class.
- There were ____ second class passengers.
- So, ____ of
passengers in second class survived.
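The two versions of the question use the same cell count but different denominators. The sketch below shows the difference, assuming only the eight example rows from the data slide (so the numbers are illustrative, not the answers for the full table); the variable names are hypothetical.

```python
# (Survival, Class) for the eight example rows on the data slide.
rows = [
    ("Dead", "Third"), ("Dead", "Crew"), ("Dead", "Second"), ("Dead", "Crew"),
    ("Dead", "Crew"), ("Dead", "Second"), ("Alive", "First"), ("Dead", "First"),
]


def percent(part, whole):
    """Part as a percentage of whole."""
    return 100 * part / whole


# Survival status of everyone in first class (the Who is first class).
first_class = [survival for survival, cls in rows if cls == "First"]
# Ticket class of everyone who survived (the Who is the survivors).
survivors = [cls for survival, cls in rows if survival == "Alive"]

# "What percent of first class passengers survived?"
pct_first_who_survived = percent(first_class.count("Alive"), len(first_class))
# "What percent of survivors were in first class?"
pct_survivors_in_first = percent(survivors.count("First"), len(survivors))

print(pct_first_who_survived)   # 1 of 2 first class rows survived -> 50.0
print(pct_survivors_in_first)   # 1 of 1 surviving rows was first class -> 100.0
```

The numerator is the same cell in both cases; only the group being described (the denominator) changes.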
31What Can Go Wrong?
- Don't violate the area principle.
- While some people might like the pie chart on the
left better, it is harder to compare fractions of
the whole, which a well-done pie chart does.
32What Can Go Wrong? (cont.)
- This plot of the percentage of high-school
students who engage in specified dangerous
behaviors has a problem. Can you see it?
33What Can Go Wrong? (cont.)
- Don't confuse similar-sounding percentages. Pay
particular attention to the wording of the
context.
- Don't forget to look at the variables separately
too. Examine the marginal distributions, since it
is important to know how many cases are in each
category.
- Be sure to use enough individuals! Do not make a
report like:
- "We found that 66.67% of the rats improved their
performance with training. The other rat died."
34What Can Go Wrong? (cont.)
- Don't overstate your case: don't claim something
you can't.
35What have we learned?
- We can summarize categorical data by counting the
number of cases in each category (expressing
these as counts or percents).
- We can display the distribution in a bar chart or
pie chart.
- And, we can examine two-way tables called
contingency tables, examining marginal and/or
conditional distributions of the variables.
- If conditional distributions of one variable are
the same for every category of the other, the
variables are independent.
36Exercise 3.25 - Seniors
- Prior to graduation, a high school class was
surveyed about its plans. The following table
displays the results for white and minority
students (including African American, Asian,
Hispanic and Native American):
37Exercise 3.25 (cont.)
- What percent of the graduates are white?
- What percent of the graduates are planning to
attend a 2-year college?
- What percent of the graduates are white and
planning to attend a 2-year college?
- What percent of the white graduates are planning
to attend a 2-year college?
- What percent of the graduates planning to attend
a 2-year college are white?
38Answers Exercise 3.25
- First, get a table with marginal totals
39Answer Exercise 3.25 (cont.)
- What percent of the graduates are white?
-
40Answer Exercise 3.25 (cont.)
- What percent of the graduates are planning to
attend a 2-year college?
-
41Answer Exercise 3.25 (cont.)
- What percent of the graduates are white and
planning to attend a 2-year college?
-
42Answer Exercise 3.25 (cont.)
- What percent of the white graduates are planning
to attend a 2-year college?
-
43Answer Exercise 3.25 (cont.)
- What percent of the graduates planning to attend
a 2-year college are white?
-
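The five questions above differ only in which count goes on top and which total goes underneath. The sketch below shows that pattern on a hypothetical two-way table with generic labels (groups A and B, plans X and Y); the counts are invented for illustration and are NOT the table from Exercise 3.25.

```python
# Hypothetical counts, invented for illustration only
# (rows are groups, columns are plans).
table = {
    "A": {"X": 30, "Y": 10},
    "B": {"X": 40, "Y": 20},
}

grand_total = sum(sum(row.values()) for row in table.values())   # everyone
row_total_a = sum(table["A"].values())                           # all of group A
col_total_y = sum(row["Y"] for row in table.values())            # all with plan Y
cell_a_y = table["A"]["Y"]                                       # group A and plan Y

pct_a = 100 * row_total_a / grand_total        # marginal: percent in group A
pct_y = 100 * col_total_y / grand_total        # marginal: percent with plan Y
pct_a_and_y = 100 * cell_a_y / grand_total     # joint: percent in A and with plan Y
pct_y_given_a = 100 * cell_a_y / row_total_a   # conditional: percent of A with plan Y
pct_a_given_y = 100 * cell_a_y / col_total_y   # conditional: percent of plan Y in A

for label, value in [
    ("in A", pct_a), ("with Y", pct_y), ("A and Y", pct_a_and_y),
    ("Y among A", pct_y_given_a), ("A among Y", pct_a_given_y),
]:
    print(f"{label:<10}{value:6.1f}%")
```

The last three percentages all use the same cell count; they differ only in whether the denominator is the grand total, the row total, or the column total.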
44Exercise 3.27 Seniors (cont.)
- Find the conditional distributions (percentages)
of plans for the white students.
- Find the conditional distributions (percentages)
of plans for the minority students.
- Create a graph comparing the plans of white and
minority students.
- Do you see any important differences in the
post-graduation plans of white and minority
students? Write a brief summary of what these
data show, including comparisons of conditional
distributions.
45Answer Exercise 3.27 Seniors
- Find the conditional distributions (percentages)
of plans for the white students.
46Answer Exercise 3.27 Seniors (cont.)
- Find the conditional distributions (percentages)
of plans for the minority students.
47Answer Exercise 3.27 Seniors (cont.)
- Create a graph comparing the plans of white and
minority students.
48Answer Exercise 3.27 Seniors (cont.)
49- Alternatively, side-by-side bar charts
50Answer Exercise 3.27 Seniors (cont.)
- Do you see any important differences in the
post-graduation plans of white and minority
students? Write a brief summary of what these
data show, including comparisons of conditional
distributions.
- -
- - Caution should be used with the percentages for
Minority graduates, because the total is so small:
each graduate is almost 2%.
- -
-