Displaying and Describing - PowerPoint PPT Presentation

Title:

Displaying and Describing

Description:

Chapter 3: Displaying and Describing Categorical Data

Slides: 51
Provided by: Addi108

Transcript and Presenter's Notes


1
Chapter 3
  • Displaying and Describing
  • Categorical Data

2
The Three Rules of Data Analysis
  • The three rules of data analysis won't be
    difficult to remember:
  • Make a picture. Things may be revealed that are
    not obvious in the raw data. These will be things
    to think about.
  • Make a picture. Important features of and patterns
    in the data will show up. You may also see things
    that you did not expect.
  • Make a picture. The best way to tell others about
    your data is with a well-chosen picture.

3
Case Study - Titanic
  • At 11:40 on the night of April 14, 1912,
    Frederick Fleet's cry of "Iceberg, right ahead"
    signaled the beginning of a nightmare that has
    become legend.
  • By 2:15 am the Titanic, thought by many to be
    unsinkable, had sunk, leaving more than 1,500
    passengers and crew members on board to meet
    their icy fate.

4
Data: passengers and crew aboard the Titanic
Survival  Age    Sex     Class
Dead      Adult  Male    Third
Dead      Adult  Male    Crew
Dead      Adult  Male    Second
Dead      Adult  Male    Crew
Dead      Adult  Male    Crew
Dead      Adult  Female  Second
Alive     Adult  Female  First
Dead      Child  Male    First

5
Titanic (cont.)
  • Each case (row) of the data table represents a
    person on board the ship.
  • The variables are Survival, Age, Sex, and ticket
    Class.

6
Titanic Variables (cont.)
  • Survival
  • Age
  • Sex
  • Ticket Class

7
Where from here?
  • We will study categorical variables.
  • Things get interesting when we look at how
    categorical variables work together.
  • Example questions:
  • - What percent of people were in first class?
    What percent were in second class?
  • - Was the percent of survivors higher in first
    class than in second class?

8
What are the Ws?
  • Who
  • What
  • When
  • Where
  • hoW
  • Why

9
Frequency Tables: Making Piles
  • We can pile the data by counting the number of
    data values in each category of interest.
  • We can organize these counts into a frequency
    table, which records the totals and the category
    names.

10
Frequency Tables: Making Piles (cont.)
  • A relative frequency table is similar, but gives
    the percentages (instead of counts) for each
    category.
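
The piling described on these two slides can be sketched in a few lines of Python. This is only an illustration: it uses the eight sample rows from the data slide, not the full passenger list, and the variable names are made up for the example.

```python
from collections import Counter

# The eight sample rows shown on the data slide (Survival, Age, Sex, Class).
rows = [
    ("Dead", "Adult", "Male", "Third"),
    ("Dead", "Adult", "Male", "Crew"),
    ("Dead", "Adult", "Male", "Second"),
    ("Dead", "Adult", "Male", "Crew"),
    ("Dead", "Adult", "Male", "Crew"),
    ("Dead", "Adult", "Female", "Second"),
    ("Alive", "Adult", "Female", "First"),
    ("Dead", "Child", "Male", "First"),
]

# Frequency table: pile the rows by Class and count each pile.
class_counts = Counter(row[3] for row in rows)

# Relative frequency table: each count as a percentage of all rows.
total = sum(class_counts.values())
class_percents = {name: 100 * n / total for name, n in class_counts.items()}

for name, n in class_counts.items():
    print(f"{name:<7}{n:>3}{class_percents[name]:>7.1f}%")
```

With only eight rows the percentages are coarse (each person is 12.5% of the sample), so the sketch shows the mechanics rather than anything about the ship.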

11
Answer to first question on slide 3-7
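  • What percent of people were in first class? What
    percent were in second class?
  • There were [number missing] people in first class.
  • There were [number missing] people in second class.
  • There were [number missing] people on board.
  • So, [percent missing] of the people were in first
    class and [percent missing] were in second class.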
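
The figures on this slide did not survive in the transcript. As a rough sketch of the calculation, the code below assumes the class counts from the widely circulated 2,201-person Titanic dataset; those counts are an assumption and may not match the numbers the slide originally showed.

```python
# ASSUMPTION: class counts from the widely circulated 2,201-person Titanic
# dataset; the slide's own numbers are not in the transcript.
class_counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

# The "Who" is everyone on board, so the denominator is the grand total.
on_board = sum(class_counts.values())

def percent_in(ticket_class):
    """Percent of everyone on board who was in the given class."""
    return 100 * class_counts[ticket_class] / on_board

print(f"First:  {percent_in('First'):.1f}%")
print(f"Second: {percent_in('Second'):.1f}%")
```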

12
What's Wrong With This Picture?
  • The length of each ship represents the count in
    each class.
  • When we look at each ship, we see the area taken
    up by the ship, instead of the length of the
    ship.
  • The ship display makes it look like most of the
    people on the Titanic were crew members, with a
    few passengers along for the ride.

13
The Area Principle
  • The ship display violates the area principle:
  • The area occupied by a part of the graph should
    correspond to the magnitude of the value it
    represents.
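
A quick calculation suggests why the ship picture misleads. The sketch assumes each ship image keeps its proportions when it is stretched, so height grows along with length, and it uses crew and first-class counts from the widely circulated 2,201-person Titanic dataset. Both are assumptions, since the original figure is not in the transcript.

```python
# ASSUMPTION: counts from the widely circulated 2,201-person Titanic dataset.
crew, first = 885, 325

# ASSUMPTION: the ship image is scaled uniformly, so length AND height both
# grow with the count and the area the eye sees grows with the square.
length_ratio = crew / first
area_ratio = length_ratio ** 2

print(f"count ratio (what the data say): {length_ratio:.2f}")
print(f"area ratio (what the eye sees):  {area_ratio:.2f}")
```

Under these assumptions the crew ship would look several times larger, relative to first class, than the counts justify.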

14
Bar Charts
  • A bar chart displays the distribution of a
    categorical variable, showing the counts for each
    category next to each other for easy comparison.
  • A bar chart stays true to the area principle.
  • Thus, a better display for the ship data is a bar
    chart.
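
The chart image for this slide is not in the transcript. A minimal text-only sketch of the idea follows: every bar has the same thickness, so length alone carries the count. The counts are assumed from the widely circulated 2,201-person Titanic dataset, and the scale of one mark per 25 people is an arbitrary choice for the example.

```python
# ASSUMPTION: counts from the widely circulated 2,201-person Titanic dataset.
class_counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

PEOPLE_PER_MARK = 25  # hypothetical scale: one '#' per 25 people

def bar(count):
    """Every bar is one text row high, so its length (and therefore its
    area) stays proportional to the count."""
    return "#" * round(count / PEOPLE_PER_MARK)

for name, count in class_counts.items():
    print(f"{name:<7}|{bar(count)} {count}")
```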

15
Bar Charts (cont.)
  • A relative frequency bar chart displays the
    relative proportion of counts for each category.
  • A relative frequency bar chart also stays true to
    the area principle.
  • Replacing counts with percentages in the ship
    data:

16
Pie Charts
  • When you are interested in parts of the whole, a
    pie chart might be your display of choice.
  • Pie charts show the whole group of cases as a
    circle.
  • They slice the circle into pieces whose size is
    proportional to the fraction of the whole in
    each category.

17
Contingency Tables
  • A contingency table allows us to look at two
    categorical variables together.
  • It shows how individuals are distributed along
    each variable, contingent on the value of the
    other variable.
  • Example: we can examine the class of ticket and
    whether a person survived the Titanic.

18
Contingency Tables (cont.)
  • The margins of the table, both on the right and
    on the bottom, give totals and the frequency
    distributions for each of the variables.
  • Each frequency distribution is called a marginal
    distribution of its respective variable.
  • Example: the marginal distribution of Survival is:
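
The table for this slide is missing from the transcript. The sketch below shows how the margins could be computed from a two-way table of counts. The counts are assumed from the widely circulated 2,201-person Titanic dataset and may differ from the slide's own table.

```python
# ASSUMPTION: two-way counts from the widely circulated 2,201-person
# Titanic dataset; the slide's own table is not in the transcript.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}
classes = ["First", "Second", "Third", "Crew"]

# Right margin: one total per Survival value.
row_totals = {s: sum(cells.values()) for s, cells in table.items()}
# Bottom margin: one total per Class.
col_totals = {c: sum(table[s][c] for s in table) for c in classes}
grand_total = sum(row_totals.values())

# Marginal distribution of Survival, as counts and as percents.
survival_marginal = {s: 100 * n / grand_total for s, n in row_totals.items()}
for s, n in row_totals.items():
    print(f"{s:<6}{n:>6}{survival_marginal[s]:>7.1f}%")
```

The column totals give the marginal distribution of Class in the same way.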

19
Contingency Tables (cont.)
  • Each cell of the table gives the count for a
    combination of values of the two variables.

20
Contingency Table (cont.)
  • How many crew members died when the Titanic sank?
  • What percentage of crew members died when the
    Titanic sank?
  • How many first class passengers survived?
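
These three questions can be answered by reading cells and one column total. The sketch assumes the two-way counts from the widely circulated 2,201-person Titanic dataset, since the slide's table is not in the transcript.

```python
# ASSUMPTION: two-way counts from the widely circulated 2,201-person
# Titanic dataset; the slide's own table is not in the transcript.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

# A count question is answered by a single cell.
crew_dead = table["Dead"]["Crew"]
first_alive = table["Alive"]["First"]

# A percentage question also needs the right denominator: all crew members.
crew_total = table["Dead"]["Crew"] + table["Alive"]["Crew"]
crew_dead_percent = 100 * crew_dead / crew_total

print(f"Crew members who died: {crew_dead} ({crew_dead_percent:.1f}% of crew)")
print(f"First class passengers who survived: {first_alive}")
```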

21
Conditional Distributions
  • A conditional distribution shows the distribution
    of one variable for just the individuals who
    satisfy some condition on another variable.
  • The conditional distribution of ticket Class,
    conditional on having survived:

22
Conditional Distributions (cont.)
  • The following is the conditional distribution of
    ticket Class, conditional on having perished:
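
Both conditional distributions come from the same recipe: keep only one row of the contingency table and divide each cell by that row's total. The tables on these slides are not in the transcript, so the sketch assumes the two-way counts from the widely circulated 2,201-person Titanic dataset.

```python
# ASSUMPTION: two-way counts from the widely circulated 2,201-person
# Titanic dataset; the slides' own tables are not in the transcript.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def class_given(survival):
    """Distribution of Class among only the people with this Survival value."""
    row = table[survival]
    row_total = sum(row.values())
    return {c: 100 * n / row_total for c, n in row.items()}

for survival in table:
    dist = class_given(survival)
    print(survival, {c: f"{p:.1f}%" for c, p in dist.items()})
```

Each conditional distribution sums to 100% on its own, because the condition changes who is being counted.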

23
Conditional Distributions (cont.)
  • The conditional distributions tell us that there
    is a difference in class for those who survived
    and those who perished.
  • This can be shown with pie charts of the two
    distributions

24
Conditional Distributions (cont.)
  • Better yet, use side-by-side bar charts.

25
Conditional Distributions (cont.)
  • The variables would be considered independent
    when the distribution of one variable in a
    contingency table is the same for all categories
    of the other variable.
  • We see that the distribution of Class for the
    survivors is different from that of the
    non-survivors.
  • This leads us to believe that Class and Survival
    are associated, that they are not independent.
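
The comparison on this slide can be sketched as a simple check: compute the conditional distribution for each row and see whether they agree. This is a descriptive comparison, not a formal statistical test. The tolerance is a hypothetical cutoff chosen for the example, and the Titanic counts are assumed from the widely circulated 2,201-person dataset.

```python
# ASSUMPTION: two-way counts from the widely circulated 2,201-person
# Titanic dataset; the slides' own tables are not in the transcript.
titanic = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def conditional(row):
    """Turn one row of counts into proportions of that row's total."""
    total = sum(row.values())
    return {c: n / total for c, n in row.items()}

def looks_independent(table, tolerance=0.01):
    """True when every row's conditional distribution matches the first
    row's to within `tolerance` (a hypothetical cutoff for illustration)."""
    dists = [conditional(row) for row in table.values()]
    first = dists[0]
    return all(abs(d[c] - first[c]) <= tolerance for d in dists for c in first)

# A made-up table whose rows have identical proportions, for contrast.
made_up = {"A": {"x": 10, "y": 30}, "B": {"x": 20, "y": 60}}

print(looks_independent(titanic))
print(looks_independent(made_up))
```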

26
Segmented Bar Charts
  • A segmented bar chart displays the same
    information as a pie chart, but in the form of
    bars instead of circles.
  • Here is the segmented bar chart for ticket Class
    by Survival status

27
Recall: Conditional distributions

28
Answer to second question on slide 3-7
  • Was the percent of survivors higher in first
    class than in second class?
  • The Who here is restricted to the survivors.
  • There were [number missing] survivors in first
    class.
  • There were [number missing] survivors in second
    class.
  • There were a total of [number missing] survivors
    on the ship.
  • So, the percent of survivors in first class is
    [percent missing].
  • Thus, the percent of survivors in first class
    was [conclusion missing].

29
Recall: Contingency table

30
Change the question
  • What percent of first and second class passengers
    survived?
  • There were [number missing] survivors in first
    class.
  • There were [number missing] first class
    passengers.
  • So, [percent missing] of passengers in first
    class survived.
  • There were [number missing] survivors in second
    class.
  • There were [number missing] second class
    passengers.
  • So, [percent missing] of passengers in second
    class survived.
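
The two versions of the question differ only in the denominator, which the sketch below makes explicit. The slide's numbers are missing from the transcript, so the counts are assumed from the widely circulated 2,201-person Titanic dataset, and the function names are invented for the example.

```python
# ASSUMPTION: two-way counts from the widely circulated 2,201-person
# Titanic dataset; the slide's own numbers are not in the transcript.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def percent_of_class_that_survived(ticket_class):
    """Who = everyone in the class; denominator is the class total."""
    alive = table["Alive"][ticket_class]
    return 100 * alive / (alive + table["Dead"][ticket_class])

def percent_of_survivors_in_class(ticket_class):
    """Who = all survivors; denominator is the survivor total."""
    return 100 * table["Alive"][ticket_class] / sum(table["Alive"].values())

for c in ("First", "Second"):
    print(f"{c}: {percent_of_class_that_survived(c):.1f}% of the class survived; "
          f"{percent_of_survivors_in_class(c):.1f}% of survivors were in this class")
```

The numerators are identical in both functions; only the group being described changes.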

31
What Can Go Wrong?
  • Don't violate the area principle.
  • While some people might like the pie chart on the
    left better, it makes it harder to compare
    fractions of the whole, which is what a well-done
    pie chart lets you do.

32
What Can Go Wrong? (cont.)
  • This plot of the percentage of high-school
    students who engage in specified dangerous
    behaviors has a problem. Can you see it?

33
What Can Go Wrong? (cont.)
  • Don't confuse similar-sounding percentages. Pay
    particular attention to the wording of the
    context.
  • Don't forget to look at the variables separately
    too. Examine the marginal distributions, since it
    is important to know how many cases are in each
    category.
  • Be sure to use enough individuals! Do not make a
    report like this one:
  • "We found that 66.67% of the rats improved their
    performance with training. The other rat died."

34
What Can Go Wrong? (cont.)
  • Don't overstate your case. Don't claim something
    you can't.

35
What have we learned?
  • We can summarize categorical data by counting the
    number of cases in each category (expressing
    these as counts or percents).
  • We can display the distribution in a bar chart or
    pie chart.
  • And, we can examine two-way tables called
    contingency tables, examining marginal and/or
    conditional distributions of the variables.
  • If conditional distributions of one variable are
    the same for every category of the other, the
    variables are independent.

36
Exercise 3.25 - Seniors
  • Prior to graduation, a high school class was
    surveyed about its plans. The following table
    displays the results for white and minority
    students (the minority group included African
    American, Asian, Hispanic, and Native American
    students):

37
Exercise 3.25 (cont.)
  • What percent of the graduates are white?
  • What percent of the graduates are planning to
    attend a 2-year college?
  • What percent of the graduates are white and
    planning to attend a 2-year college?
  • What percent of the white graduates are planning
    to attend a 2-year college?
  • What percent of the graduates planning to attend
    a 2-year college are white?

38
Answer Exercise 3.25
  • First, get a table with marginal totals:

39
Answer Exercise 3.25 (cont.)
  • What percent of the graduates are white?

40
Answer Exercise 3.25 (cont.)
  • What percent of the graduates are planning to
    attend a 2-year college?

41
Answer Exercise 3.25 (cont.)
  • What percent of the graduates are white and
    planning to attend a 2-year college?

42
Answer Exercise 3.25 (cont.)
  • What percent of the white graduates are planning
    to attend a 2-year college?

43
Answer Exercise 3.25 (cont.)
  • What percent of the graduates planning to attend
    a 2-year college are white?

44
Exercise 3.27 Seniors (cont.)
  • Find the conditional distributions (percentages)
    of plans for the white students.
  • Find the conditional distributions (percentages)
    of plans for the minority students.
  • Create a graph comparing the plans of white and
    minority students.
  • Do you see any important differences in the
    post-graduation plans of white and minority
    students? Write a brief summary of what these
    data show, including comparisons of conditional
    distributions.

45
Answer Exercise 3.27 Seniors
  • Find the conditional distributions (percentages)
    of plans for the white students.

46
Answer Exercise 3.27 Seniors (cont.)
  • Find the conditional distributions (percentages)
    of plans for the minority students.

47
Answer Exercise 3.27 Seniors (cont.)
  • Create a graph comparing the plans of white and
    minority students.

48
Answer Exercise 3.27 Seniors (cont.)
  • Segmented bar chart

49
  • Alternatively, side-by-side bar charts

50
Answer Exercise 3.27 Seniors (cont.)
  • Do you see any important differences in the
    post-graduation plans of white and minority
    students? Write a brief summary of what these
    data show, including comparisons of conditional
    distributions.
  • - Caution should be used with the percentages for
    minority graduates, because the total is so small
    that each graduate is almost 2%.