Title: Displaying and Describing Categorical Data
1Chapter 3
- Displaying and Describing Categorical Data
2Before we begin
- Finish Chapter 2
- ActivStats
- DataDesk
- Copy DD to computer
3M&Ms
- Open your bag of M&Ms.
- DO NOT EAT THEM YET!!
- Construct a question that needs a categorical
variable.
- Think about your Who when you do this.
- Construct a question that uses a quantitative
variable.
- Gather that data from your bag.
- Share with class.
4More M&Ms
- Class-
- Choose a color
- Count the number of each color in your bag.
- Count the total number of M&Ms in your bag.
- Eat your M&Ms.
5Representing Data
- How do you organize things?
- One typical method- Make piles!
More like this.
Not like this, though!
6Make a picture!
- Without some way to organize our data and
summarize it, it is hard to see what is going on.
- Making a picture is one very important way to
help us.
- Helps us to think about patterns, relationships
- Shows important features
- Helps us to tell others about our data.
7Organizing M&Ms
- In the class activity, you collected information
on the colors of the M&Ms in your bag.
- If you wanted to convey that information to
someone else, how would you?
- Ideas?
8Options Frequency Table
- Frequency Table: A data table showing category
names and pile counts for each category.
- Note: We are counting the number of cases falling
into each category of a categorical variable.
- Category labels are the What
- Individuals counted are the Who
- For your M&M packet?
- Categorical variable: Color (what was measured?)
- Individuals measured: each M&M in the packet.
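A frequency table like the one described above can be sketched in a few lines of Python. This is only an illustration: the bag contents are made up, and the name `frequency_table` is chosen here, not taken from the class materials.

```python
from collections import Counter

# Hypothetical bag of candies: one entry per candy (the "Who"),
# recording its color (the "What"). These counts are made up.
bag = ["red"] * 4 + ["blue"] * 6 + ["green"] * 3 + ["yellow"] * 5 + ["orange"] * 2

# Count how many cases fall into each category.
frequency_table = Counter(bag)

# Print category names and pile counts, then the total.
for color, count in sorted(frequency_table.items()):
    print(f"{color:<8}{count:>3}")
print(f"{'Total':<8}{sum(frequency_table.values()):>3}")
```

Only colors that actually appear in the bag get a row; a color with no candies simply has a count of zero.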
9Frequency Table: M&Ms
Do we need to list black, white, chartreuse?
10Relative Frequency
- Counts are useful, but what if Jenny has 25
candies, and Jimmy has only 18? Is a straight
comparison of the number of reds the best method?
- Relative frequencies, or looking at the pile
percentages, is sometimes a better option.
- Percentage, proportion, and relative frequency all
refer to the proportion of the whole that a
particular category makes up.
- If asked for a proportion, give a decimal. If asked
for a percent, give a %!
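The Jenny and Jimmy comparison can be worked through as a short sketch. The bag sizes (25 and 18) come from the slide; the red counts are hypothetical, and `relative_frequency` is a helper name made up for this example.

```python
def relative_frequency(count, total):
    """Proportion of the whole that one category makes up (a decimal)."""
    return count / total

# Bag sizes come from the slide; the red counts are hypothetical.
jenny_reds, jenny_total = 5, 25
jimmy_reds, jimmy_total = 4, 18

jenny_prop = relative_frequency(jenny_reds, jenny_total)
jimmy_prop = relative_frequency(jimmy_reds, jimmy_total)

# Proportion is the decimal; percent is the proportion times 100.
print(f"Jenny: proportion {jenny_prop:.3f}, percent {jenny_prop * 100:.1f}%")
print(f"Jimmy: proportion {jimmy_prop:.3f}, percent {jimmy_prop * 100:.1f}%")
```

With these made-up counts Jenny has more reds, but reds make up a larger share of Jimmy's bag, which is why a straight comparison of counts can mislead.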
11Relative Frequency: M&Ms
Both of these tables tell us the distribution of
a categorical variable, i.e., all the possible
categories and how frequently each occurs.
12Other ways to represent Cat. Vars
- Graphs!
- Bar Charts
- Give either a count for each category, or the percentage (%).
- Important to read axis labels!
- Easy comparison
- Pie Charts
- Show whole group of cases as circle
- Each category shown as a slice of the pie, size
proportional to the fraction of the whole in each
category.
- Note: These graphs are only for Categorical
Variables! Won't work for Quantitative Vars.
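Since each pie slice is proportional to its category's fraction of the whole, the slice angles follow directly from the counts. A minimal sketch, using hypothetical color counts:

```python
# Hypothetical color counts; not data from the class activity.
counts = {"red": 4, "blue": 6, "green": 3, "yellow": 5, "orange": 2}
total = sum(counts.values())

# Each slice gets the same share of the 360-degree circle
# that its category has of the whole group.
slice_degrees = {color: n / total * 360 for color, n in counts.items()}

for color, degrees in slice_degrees.items():
    print(f"{color:<8}{degrees:6.1f} degrees")
```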
13Bar and Pie Charts DD
- Usually you'll do bar and pie charts by hand,
unless you have the original data.
- Remember, frequency tables are summaries of
categorical data.
- Must observe the Area Principle
- Equal amount of area for each increase in count.
- Let's look at some distributions of Cat. Vars. in
DataDesk.
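One way to see the Area Principle without any plotting software is a text bar chart in which every unit of count gets exactly one character of the same size. This is a sketch with hypothetical counts; `text_bar_chart` is a name invented for the example and has nothing to do with DataDesk.

```python
# Hypothetical color counts; not data from the class activity.
counts = {"red": 4, "blue": 6, "green": 3}

def text_bar_chart(counts):
    """Return one line per category; each unit of count is one '#'."""
    width = max(len(name) for name in counts)
    return [f"{name:<{width}} | {'#' * n} {n}" for name, n in counts.items()]

chart = text_bar_chart(counts)
print("\n".join(chart))
```

Because every '#' is the same size, a bar that is twice as long always represents twice the count.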
14Contingency Tables
- What if we want to look at 2 Cat. Vars together?
- Use a Contingency Table
- Shows counts or percentages of individuals falling
into categories on two (or more) variables.
- Can reveal patterns in one variable that may
depend on the category of the other.
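To make the idea concrete, here is a rough sketch of building a contingency table from individual cases. The nine cases, the category labels (including the "65+" age group), and the variable names are all hypothetical.

```python
from collections import Counter

# Hypothetical cases: (age group, marital status) for each individual.
cases = [
    ("18-24", "Never married"), ("18-24", "Never married"), ("18-24", "Married"),
    ("25-64", "Married"), ("25-64", "Married"), ("25-64", "Widowed"),
    ("65+", "Married"), ("65+", "Widowed"), ("65+", "Widowed"),
]

# Each cell counts the cases with one combination of the two variables.
cells = Counter(cases)

ages = ["18-24", "25-64", "65+"]
statuses = ["Never married", "Married", "Widowed"]

# Rows are marital status, columns are age group.
print(f"{'':<15}" + "".join(f"{a:>7}" for a in ages))
for s in statuses:
    print(f"{s:<15}" + "".join(f"{cells[(a, s)]:>7}" for a in ages))
```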
15Contingency Table Example
Cells in the table give the count of cases for
every combination of the 2 Cat. Vars
From 1990 Census Data: Women age 18 and older, by age
and current marital status (in thousands)
16Contingency Tables
How many women are included, total, in the
population?
99,585 thousand
How many women were Widowed AND aged 25-64?
2,425 thousand
How many women were Widowed?
11,080 thousand
How many women were aged 25-64?
68,709 thousand
17Marginal Distributions
Marginal Distribution of Marital Status
Marginal Distribution of Age
What if we are only interested in one of the
categorical variables?
Want the Marginal Distribution: the frequency
distribution of only one variable in a contingency
table.
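The marginal distributions are just the row totals and column totals of the contingency table. A minimal sketch, assuming a small hypothetical table (these are not the census counts):

```python
# Hypothetical counts: rows are marital status, columns are age group.
table = {
    "Never married": {"18-24": 20, "25-64": 15, "65+": 5},
    "Married":       {"18-24": 10, "25-64": 60, "65+": 30},
    "Widowed":       {"18-24": 0,  "25-64": 5,  "65+": 25},
}

# Marginal distribution of marital status: the row totals.
status_margin = {status: sum(row.values()) for status, row in table.items()}

# Marginal distribution of age: the column totals.
age_margin = {}
for row in table.values():
    for age, count in row.items():
        age_margin[age] = age_margin.get(age, 0) + count

# Both margins add up to the same grand total.
grand_total = sum(status_margin.values())
print(status_margin)
print(age_margin)
print(grand_total)
```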
18Percents and Contingency Tables
- Often, we'll want to see percentages rather than
counts.
- What number to divide each count by depends on
the question you need answered.
19Contingency Tables
If we want to know the percentage of the whole
that each combination of Categories represents,
simply divide by the size of the sample.
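As a worked example, the two counts quoted on the earlier slide (99,585 thousand women in total, 2,425 thousand widowed and aged 25-64) give the percent of the whole for that one cell. The variable names are made up for this sketch.

```python
# Counts (in thousands) quoted on the earlier contingency-table slide.
total_women = 99_585
widowed_25_64 = 2_425

# Percent of the whole: cell count divided by the grand total.
percent_of_whole = widowed_25_64 / total_women * 100
print(f"{percent_of_whole:.2f}% of all women were widowed and aged 25-64")
```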
20Conditional distributions
- Often, we'll want to restrict our interest to
only one group (widows, perhaps), and want to
look at the breakdown of the second variable
(age) for that group.
- This is known as a conditional distribution
- The distribution of one variable for only those
cases satisfying some condition on the other
variable.
21Conditional Distributions
- What is the distribution of Age for Widowed?
- Divide each count in the Widowed row by the total
number of Widowed (the group of interest).
- Ex: What percent of Widowed women are 18-24?
- 19 / 11,080 ≈ 0.0017, or 0.17% (count meeting
criteria / number in the group of interest)
Look for clues in the wording to determine the
group of interest (widowed women) and the
criteria (aged 18-24). Typical wording: "What
proportion OF (group) ARE (criteria)?"
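The same division can be sketched in code using only the Widowed counts quoted on the slides (in thousands). The helper name `conditional_proportion` is invented for this example.

```python
def conditional_proportion(count_meeting_criteria, group_total):
    """Share of the group of interest that meets the criteria."""
    return count_meeting_criteria / group_total

# Counts (in thousands) quoted on the slides.
widowed_total = 11_080      # group of interest: widowed women
widowed_18_24 = 19          # criteria: aged 18-24
widowed_25_64 = 2_425       # criteria: aged 25-64

p_18_24 = conditional_proportion(widowed_18_24, widowed_total)
p_25_64 = conditional_proportion(widowed_25_64, widowed_total)

print(f"18-24: proportion {p_18_24:.4f}, percent {p_18_24 * 100:.2f}%")
print(f"25-64: proportion {p_25_64:.4f}, percent {p_25_64 * 100:.2f}%")
```

Note that the denominator is the Widowed total, not the grand total, because the question is about widowed women only.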
22Conditional Distributions
- What is the distribution of marital status for
18-24 yr olds?
- Divide each count in the 18-24 yr old column by
the total number of 18-24 yr olds (the
group of interest).
- Ex: What percent of 18-24 yr olds are married?
23Independence
- Does your marital status depend on your age?
- Common sense may tell you one thing, but don't
trust it: verify with numbers!!
- So how do we do that?
- Look at the conditional distributions.
- If the distribution of one variable is the same
for all categories of another, then the variables
are independent (not associated).
- So let's look at the conditional distributions of
marital status for each age group.
24Independence
Are the conditional distributions of marital status
the same for each age group?
No, so we conclude that Age and Marital Status
ARE related, so they are NOT independent.
We could have also looked at the conditional
distribution of age for each marital status.
Patterns??
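The comparison of conditional distributions can be sketched as below. The table is hypothetical (not the census data), and the function names and the 0.01 tolerance are assumptions made for this illustration; this is an informal check, not a formal statistical test.

```python
# Hypothetical counts: rows are age groups, columns are marital status.
table = {
    "18-24": {"Never married": 20, "Married": 10, "Widowed": 0},
    "25-64": {"Never married": 15, "Married": 60, "Widowed": 5},
    "65+":   {"Never married": 5,  "Married": 30, "Widowed": 25},
}

def conditional_distribution(row):
    """Proportion of each category within one group (one row)."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

def looks_independent(table, tolerance=0.01):
    """True if every group's conditional distribution is (nearly) the same."""
    dists = [conditional_distribution(row) for row in table.values()]
    first = dists[0]
    return all(
        abs(d[category] - first[category]) <= tolerance
        for d in dists
        for category in first
    )

# Show the conditional distribution of marital status for each age group.
for age, row in table.items():
    print(age, {k: round(v, 2) for k, v in conditional_distribution(row).items()})
print("Independent?", looks_independent(table))
```

In this made-up table the distributions differ clearly between age groups, so the check reports that the variables do not look independent.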
25Example 24
- What percent of grads joined the military?
- What percent of 1970 grads joined the military?
- Of the students who joined the military, what
percent graduated in 1970?
- Is what people did after HS independent of the
year?