Categorical Data Analysis Week 1 April 18

1
Categorical Data Analysis, Week 1: April 18 and April 20
  • Dingcai Cao
  • d-cao@uchicago.edu

2
Categorical Data Analysis
  • Textbook: Introduction to Categorical Data
    Analysis, by Alan Agresti
  • Recommended reading: Categorical Data Analysis
    Using the SAS System, by Maura E. Stokes, Charles
    S. Davis, and Gary G. Koch
  • Office hours: By appointment or after class

3
Scales of Measurement
  • Nominal: identity
  • Ordinal: identity + magnitude
  • Interval: identity + magnitude + equal
    distance
  • Ratio: identity + magnitude + equal
    distance + absolute/true zero

4
Categorical data analysis strategies
  • Hypothesis testing: Is there an association?
  • Chi-square test, Fisher's exact test, etc.
  • Chapters 1, 2, 3
  • Modeling: What is the nature of the association?
  • Logistic regression, log-linear modeling
  • Chapters 4, 5, 6

5
Two-way contingency Tables
  • Contingency table: A table with cells containing
    frequency counts of combinations of different
    levels of two or more categorical variables.
  • Two-way table: A contingency table that
    cross-classifies two variables.

6
The 2x2 Table
  • Research question: Is one sex more likely than
    the other to believe in an afterlife?
  • Statistical question: Is belief in an afterlife
    independent of gender?

7
Probabilities for contingency tables
Joint probability: $\pi_{ij} = P(X = i, Y = j)$, the
probability that (X, Y) falls in the cell in row i
and column j.
Sample joint probability: $p_{ij} = n_{ij}/n$
$n = 435 + 147 + 375 + 134 = 1091$
8
Probabilities for contingency tables
Marginal probability: $\pi_{i+}$ or $\pi_{+j}$, the row or column
totals of the joint probabilities.
Sample marginal probability: $p_{i+} = n_{i+}/n$ or $p_{+j} = n_{+j}/n$
$n = 435 + 147 + 375 + 134 = 1091$
10
Probabilities for contingency tables
Conditional probability: Probability of Y at each
level of X, or probability of X at each level of
Y.
$P(\text{Gender} = \text{Female} \mid Y = \text{Yes}) = 435/(435 + 375) = 0.54$
(column probability)
$P(Y = \text{Yes} \mid \text{Gender} = \text{Female}) = 435/(435 + 147) = 0.75$
(row probability)
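The joint, marginal, and conditional probabilities above can be checked with a few lines of code. The sketch below is in Python rather than the SAS used on the slides, and the variable names are illustrative assumptions; only the four counts come from the slides.

```python
# Counts from the belief-in-afterlife table (rows: gender, columns: belief).
counts = {
    ("Female", "Yes"): 435, ("Female", "No"): 147,
    ("Male", "Yes"): 375, ("Male", "No"): 134,
}
n = sum(counts.values())

# Sample joint probabilities p_ij = n_ij / n.
joint = {cell: c / n for cell, c in counts.items()}

# Sample marginal probabilities: sum the joint probabilities
# over the other variable.
row_marginal = {}
col_marginal = {}
for (gender, belief), p in joint.items():
    row_marginal[gender] = row_marginal.get(gender, 0.0) + p
    col_marginal[belief] = col_marginal.get(belief, 0.0) + p

# Conditional probability = joint probability / marginal probability.
p_yes_given_female = joint[("Female", "Yes")] / row_marginal["Female"]  # row probability
p_female_given_yes = joint[("Female", "Yes")] / col_marginal["Yes"]     # column probability

print(n, round(p_yes_given_female, 2), round(p_female_given_yes, 2))
# prints: 1091 0.75 0.54
```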
11
Playing with SAS
DATA BELIEF;
  INPUT GENDER $ BELIEF $ COUNT;
  DATALINES;
FEMALE YES 435
FEMALE NO 147
MALE YES 375
MALE NO 134
;
PROC FREQ DATA=BELIEF;
  WEIGHT COUNT;
  TABLES GENDER*BELIEF;
RUN;
12
COMPARING PROPORTIONS
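Difference of proportions: $\pi_1 - \pi_2$
Sample difference of proportions: $p_1 - p_2$
Standard error: $SE(p_1 - p_2) = \sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$
Confidence interval: $(p_1 - p_2) \pm z_{\alpha/2}\, SE(p_1 - p_2)$
Better to use when the proportions are not close to 0 or 1.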
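As a worked example, the difference of proportions for the afterlife data (435 of 582 females and 375 of 509 males answering yes) can be computed as below. This is a Python sketch, not the SAS used on the slides; it assumes the usual large-sample (Wald) standard error $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$, and the function name `diff_of_proportions` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_proportions(x1, n1, x2, n2, level=0.95):
    """Return (difference, standard error, (lower, upper)) for p1 - p2.

    Uses the large-sample (Wald) standard error, which is an assumption
    of this sketch.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)

# Females: 435 yes out of 582; males: 375 yes out of 509.
diff, se, (lower, upper) = diff_of_proportions(435, 582, 375, 509)
print(f"difference = {diff:.4f}, SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The interval contains 0, which is consistent with little or no difference between the two groups.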
13
COMPARING PROPORTIONS
Relative risk: $\pi_1/\pi_2$
Sample relative risk: $p_1/p_2$
Standard error: too complicated to talk about.
Confidence interval: too complicated. Rely on SAS to do the
computation.
Better to use when the proportions are near 0 or 1.
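The slides leave the relative-risk interval to SAS. For readers who want to see one common way it is obtained, here is a Python sketch that assumes the standard large-sample interval built on the log scale, with $SE(\log(p_1/p_2)) = \sqrt{(1-p_1)/(n_1 p_1) + (1-p_2)/(n_2 p_2)}$. The function name is hypothetical, and this is not necessarily the exact method behind the SAS output shown in class.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk(x1, n1, x2, n2, level=0.95):
    """Return (relative risk, (lower, upper)) using a log-scale Wald interval.

    The log-scale interval is an assumption of this sketch.
    """
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    se_log = sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, (exp(log(rr) - z * se_log), exp(log(rr) + z * se_log))

# Females: 435 yes out of 582; males: 375 yes out of 509.
rr, (lower, upper) = relative_risk(435, 582, 375, 509)
print(f"relative risk = {rr:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Working on the log scale keeps the interval bounds positive, which a symmetric interval around the ratio itself would not guarantee.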
14
COMPARING PROPORTIONS
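Odds ratio
$\mathrm{odds}_1 = \pi_1/(1 - \pi_1)$, $\mathrm{odds}_2 = \pi_2/(1 - \pi_2)$
Odds ratio: $\theta = \mathrm{odds}_1/\mathrm{odds}_2 = [\pi_1/(1 - \pi_1)] / [\pi_2/(1 - \pi_2)]$
Sample odds ratio: $\hat{\theta} = n_{11} n_{22} / (n_{12} n_{21})$
Confidence interval: $\exp[\log\hat{\theta} \pm z_{\alpha/2} \sqrt{1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}}]$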
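A short Python sketch of the sample odds ratio for the afterlife table follows. It assumes the cross-product form $n_{11} n_{22} / (n_{12} n_{21})$ and the usual large-sample interval on the log scale with standard error $\sqrt{1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}}$; the function name is hypothetical and the code is a cross-check, not the SAS used on the slides.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio(n11, n12, n21, n22, level=0.95):
    """Return (sample odds ratio, (lower, upper)) for a 2x2 table.

    The interval is the large-sample (Wald) interval on the log scale,
    an assumption of this sketch. All four counts must be positive.
    """
    theta = (n11 * n22) / (n12 * n21)
    se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return theta, (exp(log(theta) - z * se_log), exp(log(theta) + z * se_log))

# Rows: female, male; columns: yes, no.
theta, (lower, upper) = odds_ratio(435, 147, 375, 134)
print(f"odds ratio = {theta:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

An odds ratio of 1 corresponds to independence, so an interval that contains 1 gives no evidence of association.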
15
Playing with SAS
DATA BELIEF;
  INPUT GENDER $ BELIEF $ COUNT;
  DATALINES;
FEMALE YES 435
FEMALE NO 147
MALE YES 375
MALE NO 134
;
PROC FREQ DATA=BELIEF;
  WEIGHT COUNT;
  TABLES GENDER*BELIEF / RISKDIFF MEASURES;
RUN;
18
Playing with SAS
Relative risk
19
TESTS OF INDEPENDENCE
Independence: Two variables are said to be
statistically independent if the conditional
distributions of Y are identical at each level of
X. Equivalently, statistical independence means that
all joint probabilities equal the product of
their marginal probabilities, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for
i = 1, 2, ..., I and j = 1, 2, ..., J.
  • Test of independence
  • H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all i = 1, 2, ..., I and j = 1, 2, ..., J
  • H1: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$ for at least one pair (i, j)
  • Pearson chi-square test
  • Likelihood-ratio test

20
TESTS OF INDEPENDENCE: Pearson Chi-Square test
                    Y
             Level 1   Level 2
X  Level 1    n11       n12
   Level 2    n21       n22
Total: n
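The formula on this slide did not survive in the transcript. The Python sketch below assumes the standard Pearson statistic $X^2 = \sum_{i,j} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ with estimated expected counts $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$ and $df = (I-1)(J-1)$, applied to the afterlife table. The function name is hypothetical, and the p-value line uses a closed form that holds only for df = 1.

```python
from math import erfc, sqrt

def pearson_chi_square(table):
    """Return (X2, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: n_i+ * n_+j / n.
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df

# Rows: female, male; columns: yes, no.
x2, df = pearson_chi_square([[435, 147], [375, 134]])

# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(x2 / 2))
print(f"X2 = {x2:.3f}, df = {df}, p = {p_value:.3f}")
```

For these data the statistic is small (about 0.16 on 1 degree of freedom), so there is no evidence against independence.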
21
TESTS OF INDEPENDENCE: Likelihood ratio test
                    Y
             Level 1   Level 2
X  Level 1    n11       n12
   Level 2    n21       n22
Total: n
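The likelihood-ratio formula is also missing from the transcript. This Python sketch assumes the standard statistic $G^2 = 2 \sum_{i,j} n_{ij} \log(n_{ij} / \hat{\mu}_{ij})$ with $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$, which has the same degrees of freedom as the Pearson test. The function name is hypothetical.

```python
from math import log

def likelihood_ratio_g2(table):
    """Return (G2, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # a zero cell contributes 0 (limit of n * log n)
            expected = row_totals[i] * col_totals[j] / n
            g2 += 2 * observed * log(observed / expected)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return g2, df

# Rows: female, male; columns: yes, no.
g2, df = likelihood_ratio_g2([[435, 147], [375, 134]])
print(f"G2 = {g2:.3f}, df = {df}")
```

With large samples the two statistics are usually close; for the afterlife table both come out near 0.16.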
22
Playing with SAS
DATA BELIEF;
  INPUT GENDER $ BELIEF $ COUNT;
  DATALINES;
FEMALE YES 435
FEMALE NO 147
MALE YES 375
MALE NO 134
;
PROC FREQ DATA=BELIEF;
  WEIGHT COUNT;
  TABLES GENDER*BELIEF / CHISQ NOCOL NOROW NOPCT;
RUN;
24
TESTS OF INDEPENDENCE: Fisher's exact test
What if the conditions for the chi-squared tests are not satisfied?
Use Fisher's exact test.
25
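TESTS OF INDEPENDENCE: Fisher's exact test
Fisher's exact test relies on the hypergeometric
distribution. For a 2x2 table with an odds ratio of
1 (the independence null hypothesis), the probability of a
particular value of $n_{11}$, given the row and column totals, is
$P(n_{11}) = \binom{n_{1+}}{n_{11}} \binom{n_{2+}}{n_{+1} - n_{11}} / \binom{n}{n_{+1}}$
DATA TEA;
  INPUT POUR $ GUESS $ COUNT;
  DATALINES;
MILK MILK 3
MILK TEA 1
TEA MILK 1
TEA TEA 3
;
PROC FREQ DATA=TEA;
  WEIGHT COUNT;
  TABLES POUR*GUESS / CHISQ NOCOL NOROW NOPCT;
RUN;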
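To make the hypergeometric calculation concrete, the Python sketch below enumerates every 2x2 table with the same margins as the tea-tasting data and sums the relevant probabilities. It assumes the usual conventions: the one-sided p-value sums tables with $n_{11}$ at least as large as observed, and the two-sided p-value sums tables no more probable than the observed one. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb

def hypergeom_prob(k, row1, row2, col1):
    """P(n11 = k) given row totals row1, row2 and first column total col1."""
    return Fraction(comb(row1, k) * comb(row2, col1 - k), comb(row1 + row2, col1))

def fisher_exact_2x2(n11, n12, n21, n22):
    """Return (P(observed table), one-sided p-value, two-sided p-value)."""
    row1, row2, col1 = n11 + n12, n21 + n22, n11 + n21
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    probs = {k: hypergeom_prob(k, row1, row2, col1) for k in range(k_min, k_max + 1)}
    p_obs = probs[n11]
    # One-sided: tables with n11 at least as large as observed.
    p_right = sum(p for k, p in probs.items() if k >= n11)
    # Two-sided: tables no more probable than the observed one.
    p_two = sum(p for p in probs.values() if p <= p_obs)
    return p_obs, p_right, p_two

# Tea-tasting data. Rows: poured first (milk, tea); columns: guess (milk, tea).
p_obs, p_right, p_two = fisher_exact_2x2(3, 1, 1, 3)
print(p_obs, p_right, p_two)
# prints: 8/35 17/70 17/35
```

With only 8 cups, even 3 correct guesses out of 4 gives a one-sided p-value of 17/70, about 0.24.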
26
Playing with SAS
Fisher Tea Taster Data
DATA TEA;
  INPUT POUR $ GUESS $ COUNT;
  DATALINES;
MILK MILK 3
MILK TEA 1
TEA MILK 1
TEA TEA 3
;
PROC FREQ DATA=TEA;
  WEIGHT COUNT;
  TABLES POUR*GUESS / CHISQ NOCOL NOROW NOPCT;
RUN;
28
TESTING INDEPENDENCE FOR ORDINAL DATA
The $X^2$ and $G^2$ tests treat both classifications as
nominal. What if the rows or the columns are
ordinal?
LINEAR TREND OR CORRELATION TEST
The idea is to calculate the Pearson correlation,
r, based on the scores assigned to row and
column categories. A statistic for testing the
null hypothesis of independence against the
two-sided alternative hypothesis of nonzero true
correlation is given by $M^2 = (n - 1) r^2$.
$M^2$ has approximately a chi-squared distribution with df = 1.
29
TESTING INDEPENDENCE FOR ORDINAL DATA
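LINEAR TREND OR CORRELATION TEST
Choice of scores: midranks. The midrank of a category is the
average rank of all subjects in that category.
For variable X (row totals $n_{1+}, \dots, n_{I+}$):
Midrank of Level 1: $(1 + n_{1+})/2$
Midrank of Level 2: $n_{1+} + (1 + n_{2+})/2$
Midrank of Level I: $n_{1+} + \dots + n_{I-1,+} + (1 + n_{I+})/2$
For variable Y (column totals $n_{+1}, \dots, n_{+J}$):
Midrank of Level 1: $(1 + n_{+1})/2$
Midrank of Level 2: $n_{+1} + (1 + n_{+2})/2$
Midrank of Level J: $n_{+1} + \dots + n_{+,J-1} + (1 + n_{+J})/2$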
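A midrank is the number of subjects in all lower categories plus the average rank within the category. The Python sketch below applies that rule to the alcohol-consumption column totals taken from the infant malformation data in the SAS example that follows; the function name `midranks` is hypothetical.

```python
def midranks(totals):
    """Return the midrank score of each ordered category.

    totals: marginal counts of the categories, lowest category first.
    A category with `count` subjects, preceded by `below` subjects,
    occupies ranks below + 1 through below + count, whose average is
    below + (count + 1) / 2.
    """
    scores = []
    below = 0
    for count in totals:
        scores.append(below + (count + 1) / 2)
        below += count
    return scores

# Column totals for alcohol levels 0, 0.5, 1.5, 4.0, 7.0
# (malformation absent + malformation present).
alcohol_totals = [17066 + 48, 14464 + 38, 788 + 5, 126 + 1, 37 + 1]
print(midranks(alcohol_totals))
# prints: [8557.5, 24365.5, 32013.0, 32473.0, 32555.5]
```

Because almost all subjects fall in the two lowest categories, the midranks of the three highest categories are nearly equal, so midrank scoring treats those levels as almost the same.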
30
Playing with SAS
DATA INFANTS;
  INPUT MALFORM ALCOHOL COUNT @@;
  DATALINES;
1 0 17066 1 0.5 14464 1 1.5 788 1 4.0 126 1 7.0 37
2 0 48 2 0.5 38 2 1.5 5 2 4.0 1 2 7.0 1
;
PROC FREQ DATA=INFANTS;
  TITLE "LINEAR TREND TEST EQ. 2.5.1";
  WEIGHT COUNT;
  TABLES MALFORM*ALCOHOL / CHISQ CMH1 ALL;
RUN;
PROC FREQ DATA=INFANTS;
  TITLE "LINEAR TREND TEST MIDRANK SCORE";
  WEIGHT COUNT;
  TABLES MALFORM*ALCOHOL / CMH1 SCORES=RIDIT;
RUN;
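As a cross-check on the two PROC FREQ runs, the Python sketch below computes the weighted Pearson correlation r and $M^2 = (n - 1) r^2$ for the infant malformation data under two scorings of alcohol consumption: the recorded values (0, 0.5, 1.5, 4.0, 7.0) and midranks. The function names are hypothetical, the row scores 1 and 2 simply reuse the MALFORM codes, and the p-value uses a closed form valid only for df = 1. The midrank result is expected to agree with ridit scoring because ridits are a linear rescaling of midranks and r is unchanged by linear rescaling.

```python
from math import erfc, sqrt

def midranks(totals):
    """Midrank score of each ordered category, lowest category first."""
    scores, below = [], 0
    for count in totals:
        scores.append(below + (count + 1) / 2)
        below += count
    return scores

def linear_trend_m2(table, row_scores, col_scores):
    """Return (r, M2, p-value) for the linear trend test with df = 1."""
    cells = [(u, v, c)
             for u, row in zip(row_scores, table)
             for v, c in zip(col_scores, row)]
    n = sum(c for _, _, c in cells)
    mean_u = sum(u * c for u, _, c in cells) / n
    mean_v = sum(v * c for _, v, c in cells) / n
    # Count-weighted sums of squares and cross-products.
    suv = sum((u - mean_u) * (v - mean_v) * c for u, v, c in cells)
    suu = sum((u - mean_u) ** 2 * c for u, _, c in cells)
    svv = sum((v - mean_v) ** 2 * c for _, v, c in cells)
    r = suv / sqrt(suu * svv)
    m2 = (n - 1) * r ** 2
    return r, m2, erfc(sqrt(m2 / 2))  # chi-squared upper tail, df = 1

# Rows: malformation absent (1), present (2); columns: alcohol level.
infants = [[17066, 14464, 788, 126, 37],
           [48, 38, 5, 1, 1]]
col_totals = [sum(col) for col in zip(*infants)]

r_raw, m2_raw, p_raw = linear_trend_m2(infants, [1, 2], [0, 0.5, 1.5, 4.0, 7.0])
r_mid, m2_mid, p_mid = linear_trend_m2(infants, [1, 2], midranks(col_totals))
print(f"recorded scores: r = {r_raw:.4f}, M2 = {m2_raw:.2f}, p = {p_raw:.3f}")
print(f"midrank scores:  r = {r_mid:.4f}, M2 = {m2_mid:.2f}, p = {p_mid:.3f}")
```

The two scorings give quite different answers here (M2 of about 6.6 versus about 0.35), which illustrates that the choice of scores can matter when the data are very unbalanced across categories.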