Chapter 3: Displaying and Describing Categorical Data - PowerPoint PPT Presentation

Provided by: Addison6

1
Chapter 3: Displaying and Describing Categorical
Data
  • The three rules of data analysis won't be
    difficult to remember:
  • Make a picture: things may be revealed that are
    not obvious in the raw data. These will be things
    to think about.
  • Make a picture: important features of and patterns
    in the data will show up. You may also see things
    that you did not expect.
  • Make a picture: the best way to tell others about
    your data is with a well-chosen picture.

2
Frequency Tables: Making Piles
  • We can pile the data by counting the number of
    data values in each category of interest.
  • We can organize these counts into a frequency
    table, which records the totals and the category
    names.
  • People on Titanic by Ticket Class

3
Frequency Tables: Making Piles (cont.)
  • A relative frequency table is similar, but gives
    the percentages (instead of counts) for each
    category.

Relative Frequency of People on Titanic by Ticket
Class
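
The two tables described above can be sketched in a few lines of Python. The counts are an assumption: they come from the widely distributed Titanic dataset, because the slide's own table did not survive in this transcript.

```python
# Counts per ticket class.
# ASSUMPTION: figures from the widely distributed Titanic dataset,
# not recovered from the slide itself.
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}


def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


percents = relative_frequencies(counts)

# Frequency table and relative frequency table side by side.
print(f"{'Class':<8}{'Count':>7}{'Percent':>9}")
for category, n in counts.items():
    print(f"{category:<8}{n:>7}{percents[category]:>8.1f}%")
print(f"{'Total':<8}{sum(counts.values()):>7}{sum(percents.values()):>8.1f}%")
```

The percentages are just the counts divided by the grand total, so they must add up to 100%.
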
4
Displaying Data: What's Wrong With This Picture?
  • You might think that a good way to show the
    Titanic data is with this display.
  • There are 2 things wrong.

5
The Area Principle
  • The ship display makes it look like most of the
    people on the Titanic were crew members, with a
    few passengers along for the ride.
  • When we look at each ship, we see the area taken
    up by the ship, instead of the length of the
    ship.
  • The ship display violates the area principle:
  • The area occupied by a part of the graph should
    correspond to the magnitude of the value it
    represents.

6
More on Displaying Data (not in Text)
  • Your table or graphical display should always
    have a title or caption, so that casual readers
    can understand what is being presented at a
    glance
  • An informative display often leads people to read
    your paper!
  • If you are writing a paper, it is imperative that
    you attribute the source of your data.
  • Keep the display easy to read.
  • Use simple fonts and colors/patterns.
  • Make sure that your reader can distinguish between
    different categories.
  • Common problem: graphs that look good on your
    color monitor may be hard to read when printed
    with a B/W printer.

7
Bar Charts
  • A bar chart displays the distribution of a
    categorical variable, showing the counts for each
    category next to each other for easy comparison.
  • A bar chart stays true to the area principle.
  • Thus, it is a better display for this data.
  • Don't forget a title (or at least a caption)!

People on Titanic by Ticket Class
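
As a rough sketch of why a bar chart respects the area principle, the text chart below gives every bar the same height (one line), so the area of each bar grows only with its length. The counts are assumed from the widely distributed Titanic dataset, and `people_per_mark` is a hypothetical scale parameter chosen for readability.

```python
# ASSUMPTION: counts from the widely distributed Titanic dataset.
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}


def text_bar_chart(counts, people_per_mark=25):
    """Build one line per category; bar length is proportional to the count."""
    lines = []
    for category, n in counts.items():
        bar = "#" * round(n / people_per_mark)
        lines.append(f"{category:<7}|{bar} {n}")
    return lines


chart = text_bar_chart(counts)

print("People on Titanic by Ticket Class")  # title, as the slide recommends
for line in chart:
    print(line)
```
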
8
Bar Charts (cont.)
  • A relative frequency bar chart displays the
    relative proportion of counts for each category.
  • A relative frequency bar chart also stays true to
    the area principle.
  • Replacing counts with percentages in the ship
    data:
  • Don't forget a title/caption.

Percentage of Titanic Passengers in each Ticket
Class
9
Pie Charts
  • When you are interested in parts of the whole, a
    pie chart might be your display of choice.
  • Pie charts show the whole group of cases as a
    circle.
  • They slice the circle into pieces whose size is
    proportional to the fraction of the whole in each
    category.
  • Don't forget a title/caption.

Number of Titanic Passengers by Class
10
Contingency Tables
  • A contingency table allows us to look at two
    categorical variables together.
  • It shows how individuals are distributed along
    each variable, contingent on the value of the
    other variable.
  • Example of a contingency table of ticket class
    and survival.

Survival and Class of Titanic Passengers
11
Contingency Tables (cont.)
  • Each cell of the contingency table gives the
    count for a combination of values of the two
    variables.
  • For example, the second cell in the crew column
    tells us that 673 crew members died when the
    Titanic sank.

12
Contingency Tables
  • The margins of the table, both on the right and
    on the bottom, give totals and the frequency
    distributions for each of the variables.
  • Each frequency distribution is called a marginal
    distribution of its respective variable.
  • The marginal distribution of Survival is:
  • (Can also be phrased as: What % of those aboard
    the Titanic survived?)
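
A minimal sketch of a contingency table and its margins follows. The cell counts are an assumption taken from the widely distributed Titanic dataset. Only the 673 crew deaths are stated in these slides, and the assumed table agrees with that figure.

```python
classes = ["First", "Second", "Third", "Crew"]

# Rows: Survival. Columns: ticket Class.
# ASSUMPTION: cell counts from the widely distributed Titanic dataset.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

# Right-hand margin: row totals give the marginal distribution of Survival.
survival_totals = {status: sum(row.values()) for status, row in table.items()}

# Bottom margin: column totals give the marginal distribution of Class.
class_totals = {c: sum(row[c] for row in table.values()) for c in classes}

grand_total = sum(survival_totals.values())

for status, n in survival_totals.items():
    print(f"{status}: {n} ({100 * n / grand_total:.1f}%)")
```

Both margins must add up to the same grand total, which is a quick check that the table was entered correctly.
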

13
Conditional Distributions
  • A conditional distribution shows the distribution
    of one variable for just the individuals who
    satisfy some condition on another variable.
  • The following is the conditional distribution of
    ticket Class, conditional on having survived

For example, 28.6% of those who survived were
from First Class. How were these percentages calculated?

What might be another way to phrase this?
14
Conditional Distributions (cont.)
  • The following is the conditional distribution of
    ticket Class. It is conditional on having
    perished

How were these percentages calculated?
What might be another way to phrase this?
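
One way to answer "how were these percentages calculated?" is to divide each cell by its own row total rather than by the grand total. The sketch below assumes the cell counts from the widely distributed Titanic dataset. They reproduce the 28.6% quoted in the slides for First Class among survivors.

```python
# ASSUMPTION: cell counts from the widely distributed Titanic dataset.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def conditional_distribution(row):
    """Percentages within one row: each cell divided by that row's total."""
    total = sum(row.values())
    return {category: 100 * n / total for category, n in row.items()}


given_alive = conditional_distribution(table["Alive"])  # Class | survived
given_dead = conditional_distribution(table["Dead"])    # Class | perished

for label, dist in (("Survived", given_alive), ("Perished", given_dead)):
    shares = ", ".join(f"{c} {p:.1f}%" for c, p in dist.items())
    print(f"{label}: {shares}")
```
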
15
Conditional Distributions (cont.)
  • The conditional distributions tell us that there
    was a difference in class for those who survived
    and those who perished.
  • Rather than a table of numbers, this is better
    shown with pie charts of the two distributions.

Titanic Survivors and Non-survivors, by Class
What would these pie charts look like if class
had no influence on survival?

16
Conditional Distributions (cont.)
  • We see that the distribution of Class for the
    survivors is different from that of the
    non-survivors.
  • This leads us to believe that Class and Survival
    are associated, and are not independent.
  • The variables would be considered independent
    when the distribution of one variable in a
    contingency table is the same for all categories
    of the other variable.
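
The independence definition above suggests a simple check: if Class and Survival were independent, the distribution of Class among survivors and among non-survivors would both match the marginal distribution of Class. The sketch below compares the three. The cell counts are assumed from the widely distributed Titanic dataset, and it is a descriptive comparison only, not a formal test.

```python
# ASSUMPTION: cell counts from the widely distributed Titanic dataset.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def percent_distribution(row):
    """Turn a row of counts into percentages of that row's total."""
    total = sum(row.values())
    return {category: 100 * n / total for category, n in row.items()}


given_alive = percent_distribution(table["Alive"])
given_dead = percent_distribution(table["Dead"])

# Marginal distribution of Class: everyone aboard, regardless of survival.
everyone = {c: table["Alive"][c] + table["Dead"][c] for c in table["Alive"]}
marginal = percent_distribution(everyone)

# Under independence every gap would be (close to) zero percentage points.
gaps = {c: abs(given_alive[c] - given_dead[c]) for c in given_alive}
largest_gap_class = max(gaps, key=gaps.get)

for c in marginal:
    print(f"{c:<7} all {marginal[c]:5.1f}%  survived {given_alive[c]:5.1f}%"
          f"  perished {given_dead[c]:5.1f}%  gap {gaps[c]:4.1f}")
```
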

17
Segmented Bar Charts
Titanic Survivors and Non-survivors, by Class
  • A segmented bar chart displays the same
    information as a pie chart, but in the form of
    bars instead of circles.
  • Here is the segmented bar chart for ticket Class
    by Survival status

18
What Can Go Wrong?
  • Don't violate the area principle.
  • While some people might like the pie chart on the
    left better, it makes it harder to compare
    fractions of the whole, which is what a well-done
    pie chart lets you do.

19
What Can Go Wrong? (cont.)
  • Keep it honest: make sure your display shows what
    it says it shows.
  • This plot of the percentage of high-school
    students who engage in specified dangerous
    behaviors has a problem. Can you see it?

20
What Can Go Wrong? (cont.)
  • Don't confuse similar-sounding percentages: pay
    particular attention to the wording of the
    context.
  • Don't forget to look at the variables separately
    too: examine the marginal distributions, since it
    is important to know how many cases are in each
    category.
  • Be sure to use enough individuals!
  • Do not make a report like "We found that 66.67% of
    the rats improved their performance with
    training. The other rat died."

21
What Can Go Wrong? (cont.)
  • Don't overstate your case: don't claim something
    you can't.
  • Don't use unfair or silly averages: this could
    lead to Simpson's Paradox, so be careful when you
    average one variable across different levels of a
    second variable.

22
Simpson's Paradox: Example
  • Chris has a 3.20 SFSU GPA
  • Sean has a 3.34 SFSU GPA
  • Who seems to do better in SFSU classes, Chris or
    Sean?
  • Who do you think is likely to get a better grade
    in DS412 (Operations Management)?
  • What else might be useful to know?

23
Simpson's Paradox, Continued
  • Is comparing Chris's and Sean's GPAs a fair
    assessment of their abilities to get good grades
    in particular classes?
  • What data is shared in common?
  • Who do you think is likely to get a better grade
    in DS412 (Operations Management) now?
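
To see how the paradox can arise, here is a purely hypothetical breakdown. The course types, grade averages, and unit counts below are invented for illustration and are not from the slides. They were chosen only so that the overall GPAs come out to the 3.20 and 3.34 quoted earlier.

```python
# HYPOTHETICAL transcripts (invented for illustration only):
# course type -> (average grade points in that type, units taken)
transcripts = {
    "Chris": {"quantitative": (3.0, 24), "other": (4.0, 6)},
    "Sean": {"quantitative": (2.8, 12), "other": (3.7, 18)},
}


def overall_gpa(record):
    """Unit-weighted average across all course types."""
    total_points = sum(gpa * units for gpa, units in record.values())
    total_units = sum(units for _, units in record.values())
    return total_points / total_units


for student, record in transcripts.items():
    by_type = ", ".join(f"{kind} {gpa:.1f} over {units} units"
                        for kind, (gpa, units) in record.items())
    print(f"{student}: overall {overall_gpa(record):.2f} ({by_type})")
```

In this made-up case Chris has the higher average within each course type, yet the lower overall GPA, because most of Chris's units are in the type where everyone's grades run lower. The overall average mixes two different levels of a second variable (course type).
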

24
What have we learned?
  • We can summarize categorical data by counting the
    number of cases in each category (expressing
    these as counts or percents).
  • We can display the distribution in a bar chart or
    pie chart.
  • And, we can examine two-way tables called
    contingency tables, examining marginal and/or
    conditional distributions of the variables.

25
Additional Examples
  • Pr 9: A May 2001 Gallup Poll found that many
    Americans believe in supernatural phenomena.
    The poll was based on telephone responses from
    1012 randomly selected adults.
  • Is it reasonable to conclude 66% of those polled
    believe in either Ghosts or Astrology?
  • Can you tell what % of people did not believe in
    any of these phenomena? Explain.
  • What is an appropriate graph?

26
Chart for Example 9
27
Additional Examples
  • Pr 22: A survey of autos in the student lot and
    staff lot at SFSU classified cars by country of
    origin.
  • What % of cars surveyed were foreign?
  • What % of the American cars were student owned?
  • What % of students own American cars?
  • What is the marginal distribution of origin?
  • What is the conditional distribution of origin by
    owner type?
  • Do you think that car origin is independent of
    owner type? Use a graph to explain your argument.

28
Example 22 continued
  • First create a contingency table: Origin of
    Students' and Staffs' Cars
  • Answers:
  • a) (45 + 102)/359 ≈ 41%
  • b) We have 212 American cars, of which 107 were
    student driven: 50.5%
  • c) We have 195 students, of whom 107 drove
    American cars: 55%
  • (Note that b and c look similar but are not the
    same question, nor do they have the same answer!)
  • Marginal distribution of car origin:
    American 212/359 ≈ 59%, European 13%, Asian
    28%
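
The arithmetic in these answers can be checked with a short sketch that uses only the totals quoted above. The variable names are illustrative.

```python
# Totals quoted in the answers above.
total_cars = 359
origin_totals = {"American": 212, "European": 45, "Asian": 102}
student_cars = 195
student_american = 107


def pct(part, whole):
    """Percentage of `whole` made up by `part`."""
    return 100 * part / whole


# a) Foreign cars as a share of all cars surveyed.
foreign = pct(origin_totals["European"] + origin_totals["Asian"], total_cars)

# b) Of the American cars, the share driven by students.
students_among_american = pct(student_american, origin_totals["American"])

# c) Of the students, the share driving American cars.
american_among_students = pct(student_american, student_cars)

# Marginal distribution of car origin.
marginal = {origin: pct(n, total_cars) for origin, n in origin_totals.items()}

print(f"a) {foreign:.0f}%  b) {students_among_american:.1f}%  "
      f"c) {american_among_students:.0f}%")
print(", ".join(f"{origin} {share:.0f}%" for origin, share in marginal.items()))
```

Answers b and c share the numerator 107 but divide by different totals (212 versus 195), which is why they differ.
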

29
Example 22 continued
  • f) Now find Conditional Distributions for each
    driver type
  • Does Car Origin Seem independent of type of
    driver?
  • What is an even better way to show this?
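
The full contingency table is missing from this transcript, so the sketch below derives only what the quoted totals allow: the staff counts by subtraction, and an American versus foreign split for each driver type. The European/Asian breakdown by driver type cannot be recovered from the figures given here.

```python
# Totals quoted in the Example 22 answers.
total_cars, total_american = 359, 212
student_cars, student_american = 195, 107

# Staff figures follow by subtraction from the totals.
staff_cars = total_cars - student_cars
staff_american = total_american - student_american


def origin_split(american, cars):
    """Conditional distribution of origin (American vs. foreign) for one group."""
    american_share = 100 * american / cars
    return {"American": american_share, "Foreign": 100 - american_share}


students = origin_split(student_american, student_cars)
staff = origin_split(staff_american, staff_cars)

for label, dist in (("Students", students), ("Staff", staff)):
    print(f"{label}: American {dist['American']:.1f}%, "
          f"Foreign {dist['Foreign']:.1f}%")
```

If origin were independent of driver type, the two rows would show roughly the same percentages.
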

30
Example 22- Extension
  • Pick an appropriate graph to show the difference
    between the two distributions

31
Example: Old Test Problem
  • The Bookstore will run a contest where the prize
    is a pair of concert tickets and must decide
    between the upcoming Rolling Stones and the Black
    Eyed Peas concerts. They want to figure out
    which will generate more excitement, so they paid
    a student to run a survey of students and staff
    (profs, administrators, and other non-students)
    on preferences. The student produced the
    following table
  • A) What percent of people surveyed preferred
    tickets to the Rolling Stones?
  • B) What percent of people surveyed were Staff?
  • C) What percent of the students preferred tickets
    to the Rolling Stones?
  • D) What percent of those who preferred tickets to
    the Rolling Stones were students?
  • E) What percent of respondents were students and
    preferred tickets to the Rolling Stones?
  • F) Does it appear that preference for concert
    tickets is independent of a respondent's being
    staff or a student? Show why using calculations
    or graphics.


32
Example: Old Test Problem
  • First, it is probably useful to fill out the
    contingency table with the marginal
    distributions
  • A) What percent of people surveyed preferred
    tickets to the Rolling Stones?
  • B) What percent of people surveyed were Staff?
  • C) What percent of the students preferred tickets
    to the Rolling Stones?
  • D) What percent of those who preferred tickets to
    the Rolling Stones were students?
  • E) What percent of respondents were students and
    preferred tickets to the Rolling Stones?
  • F) Does it appear that preference for concert
    tickets is independent of a respondent's being
    staff or a student? Show why using calculations
    or graphics.
