Categorical Data - PowerPoint PPT Presentation

About This Presentation
Title:

Categorical Data

Slides: 46
Provided by: EddM1

Transcript and Presenter's Notes

1
Categorical Data
2
Lesson Objective
  • Understand basic rules of probability.
  • Calculate marginal and conditional
    probabilities.
  • Determine if two categorical variables are
    independent.

3
Recall: Rule of Thumb
Quantitative variables: averages or
differences have meaning.
  • Ex.: weight, height, income, age

4
Recall: Rule of Thumb
Categorical variables: classify people or
things.
  • Ex.: gender, race, occupation, political
    affiliation, country of origin

5
Note: Sometimes quantitative variables are
expressed as categorical.
Income (Family Economic Income)
Class   Definition
1       Less than 30,000
2       30,000 but less than 100,000
3       100,000 or more
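The income classes above can be sketched in code. This is a minimal illustration, assuming the class boundaries exactly as listed on the slide; the function name `income_class` and the sample incomes are made up for the example.

```python
def income_class(income):
    """Map a family income (quantitative) to the slide's class 1, 2, or 3."""
    if income < 30_000:
        return 1          # less than 30,000
    if income < 100_000:
        return 2          # 30,000 but less than 100,000
    return 3              # 100,000 or more


# Hypothetical incomes, chosen to hit each boundary.
incomes = [12_500, 30_000, 99_999, 100_000]
classes = [income_class(x) for x in incomes]
print(classes)
```

Once recoded this way, averaging the class labels no longer has meaning, which is why the variable is treated as categorical.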
6
Relationships between variables
7
Relationship between two quantitative
variables?
Is the relationship linear (scatterplot)?
  • Yes: Use correlation and least squares
    regression.
  • No: Try data transformations.

8
Recall: Boxplots
  • Best graphical tool for examining the
    relationship between a quantitative variable and
    a categorical variable (i.e., comparing
    distributions).

Example: Weight vs. Country of Origin
A boxplot can be used to answer:
Do the distributions of weights vary for
different countries of origin?
9
Relationship between two categorical
variables?
Use two-way frequency tables
  • Look at marginal probabilities and
    conditional probabilities.

10
STATISTICS
  • is the science of
  • transforming data
  • into information to make decisions in the face of
    uncertainty.

11
Probability
How do we measure "uncertainty"?
  • A numerical measure of the likelihood that an
    outcome or an event occurs.
  • P(A) = probability of event A

12
Three Methods for Assessing Probability
  • Classical
  • Relative Frequency
  • Subjective

13
Probability requirements for discrete variables
  1. 0 ≤ P(A) ≤ 1
  2. The sum of the probabilities of all possible
     outcomes must equal 1. (Binomial, Poisson)
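The two requirements can be checked numerically. The sketch below is an illustration only: the helper name `is_valid_distribution`, the tolerance, and the binomial parameters n = 4, p = 0.3 are assumptions chosen for the example, not values from the slides.

```python
from math import comb, isclose


def is_valid_distribution(probs, tol=1e-9):
    """Check both requirements: each P is in [0, 1] and the P's sum to 1."""
    each_in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one


# Binomial probabilities for k = 0..n successes (n and p are illustrative).
n, p = 4, 0.3
binomial = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

print(is_valid_distribution(binomial))     # a proper discrete distribution
print(is_valid_distribution([0.5, 0.6]))   # fails: the sum exceeds 1
```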
14
Conditional probability: The chance one event
happens, given that another event occurs.
P(A | B)
15
  • Problem: Credit Card Manager
  • New credit test to determine creditworthiness.
  • Credit test checked against 500 previous
    customers.

16
Credit Test A
Credit History   Failed (F)   Passed (P)   Total
Good (G)                 50          350     400
Default (D)              80           20     100
Total                   130          370     500
17
What is the probability of a customer defaulting?
P(Defaults)
What is the probability of a customer defaulting
given that he fails test A?
P(Defaults given failed test A)
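A rough sketch of how the two questions can be answered from the Credit Test A counts. It rests on an assumption that the transcript does not settle: that the Failed column is the one totaling 130 (50 good, 80 default) and the Passed column is the one totaling 370 (350 good, 20 default). The variable names are illustrative.

```python
# Credit Test A counts keyed by (credit history, test result).
# ASSUMPTION: Failed is the column totaling 130, Passed the column totaling 370.
counts_a = {
    ("Good", "Failed"): 50,
    ("Good", "Passed"): 350,
    ("Default", "Failed"): 80,
    ("Default", "Passed"): 20,
}

total = sum(counts_a.values())
defaults = sum(n for (history, _), n in counts_a.items() if history == "Default")
failed = sum(n for (_, result), n in counts_a.items() if result == "Failed")

# Marginal probability: defaults out of all customers.
p_default = defaults / total
# Conditional probability: restrict to customers who failed, then count defaults.
p_default_given_failed = counts_a[("Default", "Failed")] / failed

print(f"P(D)     = {p_default:.3f}")
print(f"P(D | F) = {p_default_given_failed:.3f}")
```

Under that assumption the conditional probability is much larger than the marginal one, which is the comparison the following slides build on.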
18
General Rules
  • P(A and B) = P(A) P(B | A)
  •            = P(B) P(A | B)
  • P(A or B) = P(A) + P(B) - P(A and B)

19
P(Fails AND Defaults)
= P(F) P(D | F)
20
P(Fails OR Defaults)
= P(F) + P(D) - P(D AND F)
Note: The overlap group would be counted twice
if no subtraction.
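The AND and OR rules can be worked through with the Credit Test A counts. This sketch again assumes that 130 customers failed the test and that 80 of them defaulted (the column assignment is not certain from the transcript); exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction

total = 500
# ASSUMPTION: 130 failed Test A, and 80 of those defaulted.
failed, defaults, failed_and_default = 130, 100, 80

p_f = Fraction(failed, total)
p_d = Fraction(defaults, total)
p_d_given_f = Fraction(failed_and_default, failed)

# Multiplication rule: P(F and D) = P(F) * P(D | F)
p_f_and_d = p_f * p_d_given_f
# Addition rule: subtract the overlap so it is not counted twice.
p_f_or_d = p_f + p_d - p_f_and_d

print("P(F and D) =", p_f_and_d, "=", float(p_f_and_d))
print("P(F or D)  =", p_f_or_d, "=", float(p_f_or_d))
```

As a cross-check, counting cells directly gives the same OR result: 50 + 80 + 20 = 150 customers failed or defaulted, out of 500.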
21
Does knowledge of the test A result help you make a
better decision?

P(D)
Do you want to know the test A results before
you give the loan?
Credit test A results and defaulting are
____________ on each other.
22
A Newer Credit Test. Is it even better?
A different sample of 500 credit records
23
Credit Test B
Credit History   Failed (F)   Passed (P)   Total
Good (G)                 60          340     400
Default (D)              15           85     100
Total                    75          425     500
24
What is the probability of a customer defaulting?
P(Defaults)
What is the probability of a customer defaulting
given that he fails test B?
P(Defaults given failed test B)
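The same two questions for Credit Test B, as a hedged sketch. It assumes the Failed column is the one totaling 75 and the Passed column the one totaling 425; for this table the default rate comes out the same in both columns, so the result does not depend on that assumption.

```python
from fractions import Fraction

# Credit Test B counts by test result.
# ASSUMPTION: Failed is the column totaling 75, Passed the column totaling 425.
good = {"Failed": 60, "Passed": 340}
default = {"Failed": 15, "Passed": 85}

total = sum(good.values()) + sum(default.values())
p_default = Fraction(sum(default.values()), total)

# Default rate within each test result (conditional probabilities).
p_default_given = {
    result: Fraction(default[result], good[result] + default[result])
    for result in ("Failed", "Passed")
}

print("P(D) =", float(p_default))
for result, prob in p_default_given.items():
    print(f"P(D | {result}) =", float(prob))
```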
25
Does knowledge of the test B result help you make a
better decision?

P(D)
Test B tells me ____________.
Credit test B results and defaulting are
____________ of each other.
26
Independence
27
Two events are independent if the occurrence, or
non-occurrence, of one does not affect the
chances of the other occurring, or not occurring.
  • Otherwise, we say the events are dependent.

28
If A and B are independent, then
  • P(A and B) = P(A) P(B)
  • P(A or B) = P(A) + P(B) - P(A) P(B)
  • P(A | B) = P(A)
  • P(B | A) = P(B)

Note: The condition does NOT change the
probability.
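A small sketch of the product check P(A and B) = P(A) P(B), applied to the two credit tests. The helper name `events_independent` is made up for illustration, and the Failed counts assume the column assignment in which 130 customers failed Test A and 75 failed Test B. With the opposite assignment the two verdicts come out the same.

```python
from fractions import Fraction


def events_independent(n_a_and_b, n_a, n_b, n_total):
    """True when P(A and B) equals P(A) * P(B) exactly, computed from counts."""
    p_a_and_b = Fraction(n_a_and_b, n_total)
    p_a = Fraction(n_a, n_total)
    p_b = Fraction(n_b, n_total)
    return p_a_and_b == p_a * p_b


# A = fails the test, B = defaults; counts out of 500 customers.
# ASSUMPTION: 130 failed Test A (80 of them defaulted),
#             75 failed Test B (15 of them defaulted).
test_a_independent = events_independent(80, 130, 100, 500)
test_b_independent = events_independent(15, 75, 100, 500)

print("Test A independent of defaulting?", test_a_independent)
print("Test B independent of defaulting?", test_b_independent)
```

Exact equality is a strict standard chosen to keep the sketch simple. With sample data the probabilities rarely match exactly, so in practice one asks whether they are close.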
29
Survey of randomly selected people (voters) in Jan.
2001
Q1: Did you vote in the 2000 election?
Q2: Do you favor an amendment to require a
    balanced budget?
Q3: To which political party do you belong?
30
Do you favor an amendment for a balanced budget?
Political Party    Yes    No   Total
Republican          90    82     172
Democrat            44   104     148
Other               48    32      80
Total              182   218     400
31
Sample size
Marginal totals for Party.
Marginal totals for opinion.
32
What proportion favor the amendment?
What proportion claim to be Republican?
What proportion favor the amendment and are Other?
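The three questions above are two marginal proportions and one joint proportion. A minimal sketch, using the survey counts by party as (Yes, No) pairs; the variable names are illustrative.

```python
# Survey counts: party -> (Yes, No) on the balanced budget amendment.
table = {"Republican": (90, 82), "Democrat": (44, 104), "Other": (48, 32)}

n = sum(yes + no for yes, no in table.values())        # sample size

# Marginal: everyone who favors the amendment, regardless of party.
p_yes = sum(yes for yes, _ in table.values()) / n
# Marginal: everyone who claims to be Republican, regardless of opinion.
p_rep = sum(table["Republican"]) / n
# Joint: favors the amendment AND is Other, out of the whole sample.
p_yes_and_other = table["Other"][0] / n

print(f"P(Yes)           = {p_yes:.3f}")
print(f"P(Rep.)          = {p_rep:.3f}")
print(f"P(Yes and Other) = {p_yes_and_other:.3f}")
```

All three use the full sample size as the denominator, which is what separates them from the conditional questions on the next slide.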
33
What proportion favor the amendment, given those that
claim to be Republican?
Considering only those opposed, what
proportion are not Republican?
Of those that claim to be Democrat, what
proportion favor the amendment?
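Each of these three questions restricts the sample to a subgroup first, so the denominator is the subgroup total rather than the sample size. A sketch with the same survey counts; the variable names are illustrative.

```python
# Survey counts: party -> (Yes, No) on the balanced budget amendment.
table = {"Republican": (90, 82), "Democrat": (44, 104), "Other": (48, 32)}

# Given Republican: restrict to the Republican row.
yes_rep, no_rep = table["Republican"]
p_yes_given_rep = yes_rep / (yes_rep + no_rep)

# Given opposed: restrict to the No column, then count non-Republicans.
opposed = sum(no for _, no in table.values())
p_not_rep_given_no = (opposed - no_rep) / opposed

# Given Democrat: restrict to the Democrat row.
yes_dem, no_dem = table["Democrat"]
p_yes_given_dem = yes_dem / (yes_dem + no_dem)

print(f"P(Yes | Rep.)     = {p_yes_given_rep:.3f}")
print(f"P(not Rep. | No)  = {p_not_rep_given_no:.3f}")
print(f"P(Yes | Demo.)    = {p_yes_given_dem:.3f}")
```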
34
Conditional Distribution
Restrict subjects to only those that meet a
condition. Within this restricted group, what is
the distribution of some other variable?
Distribution of opinion given those that claim
to be Republican:
P(Yes | Rep.) = .523, P(No | Rep.) = .477
35
Is there a relationship between the party and
the opinion on the amendment?
  • What would you expect to happen if no
    relationship existed?

36
Three Conditional Distributions
Marginal Distribution: P(Yes) = .455,
P(No) = .545
Is there a relationship? Why or why not?
37
If there is NO relationship (i.e.,
independence) between the party and the opinion,
then
the three conditional probabilities should be
close to each other and close to the
marginal probability.
38
Three Conditional Probabilities
P(Yes | Rep.)  = .523
P(Yes | Demo.) = .297
P(Yes | Other) = .600
Are these close to each other?
Marginal Probability: P(Yes) = .455
AND close to the marginal?
Not close; therefore, party and the opinion
are ____________.
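The comparison on this slide can be reproduced from the survey counts. The sketch below computes P(Yes | party) for each party and its gap from the marginal P(Yes). It deliberately sets no cutoff for "close", since the slides leave that to judgment; the variable names are illustrative.

```python
# Survey counts: party -> (Yes, No) on the balanced budget amendment.
table = {"Republican": (90, 82), "Democrat": (44, 104), "Other": (48, 32)}

n = sum(yes + no for yes, no in table.values())
p_yes_marginal = sum(yes for yes, _ in table.values()) / n

# Conditional probability of Yes within each party.
p_yes_given_party = {
    party: yes / (yes + no) for party, (yes, no) in table.items()
}
# How far each conditional probability sits from the marginal.
gaps = {party: p - p_yes_marginal for party, p in p_yes_given_party.items()}

print(f"Marginal P(Yes) = {p_yes_marginal:.3f}")
for party, p in p_yes_given_party.items():
    print(f"P(Yes | {party}) = {p:.3f}  (gap {gaps[party]:+.3f})")
```

If party and opinion were unrelated, every gap would be near zero.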
39
Create with Pivot Tables in Excel.
40
Barchart - Clustered
[Chart. Labels: Yes; Rep., Demo., Other. Axis: Frequency]
41
Barchart - Stacked
[Chart. Labels: Yes; Rep., Demo., Other. Axis: Frequency]
42
Barchart - Percents
[Chart. Labels: Yes; Rep., Demo., Other. Axis: Percent]
43
Summary: For two categorical variables
  • Must use conditional probabilities to
    determine if a relationship exists.
  • Cannot use correlation.
  • Visual display: Stacked percentage
    bar charts
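A stacked percentage bar chart plots, for each category, the conditional distribution of the other variable. As a rough sketch, the percentages behind such a chart for the survey data could be computed as follows; the name `row_percents` is illustrative, and the actual charts in the deck were made in Excel.

```python
# Survey counts: party -> (Yes, No) on the balanced budget amendment.
table = {"Republican": (90, 82), "Democrat": (44, 104), "Other": (48, 32)}

# Percent Yes and percent No within each party; each pair sums to 100,
# so every stacked bar has the same height.
row_percents = {}
for party, (yes, no) in table.items():
    party_total = yes + no
    row_percents[party] = (100 * yes / party_total, 100 * no / party_total)

for party, (pct_yes, pct_no) in row_percents.items():
    print(f"{party:<10}  Yes {pct_yes:5.1f}%   No {pct_no:5.1f}%")
```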

44
Associations between TWO Variables
Variables           Numerical                        Graphical
Quant. vs. Quant.   LS regression line, r, r-sq,     Scatterplot,
                    std error                        residual plots
Quant. vs. Cat.     X-bar and s for each category    Side-by-side box plots
Cat. vs. Cat.       Two-way table, conditional and   Bar chart: stacked,
                    marginal distributions           percent
45
The End