Introduction to Categorical Data Analysis July 22, 2004 - PowerPoint PPT Presentation

About This Presentation
Description:

Title: Disordered Eating, Menstrual Irregularity, and Bone Mineral Density in Young Female Runners Author: John Last modified by: kristinc Created Date – PowerPoint PPT presentation

Slides: 49
Provided by: John4423
Learn more at: http://web.stanford.edu
Transcript and Presenter's Notes


1
Introduction to Categorical Data Analysis
July 22, 2004
2
Categorical data
  • The t-test, ANOVA, and linear regression all
    assumed outcome variables that were continuous
    (normally distributed).
  • Even their non-parametric equivalents assumed at
    least many levels of the outcome (discrete
    quantitative or ordinal).
  • We haven't discussed the case where the outcome
    variable is categorical.

3
Types of Variables: a taxonomy
  • Categorical
    binary (2 categories)
    nominal (more categories)
    ordinal (order matters)
  • Quantitative
    discrete (numerical)
    continuous (uninterrupted)
4
Overview of statistical tests
  • Independent variable = predictor
  • Dependent variable = outcome
  • e.g., BMD = pounds + age + amenorrheic (1/0)

7
Difference in proportions
  • Example: You poll 50 people from random
    districts in Florida as they exit the polls on
    election day 2004. You also poll 50 people from
    random districts in Massachusetts. 49% of
    pollees in Florida say that they voted for Kerry,
    and 53% of pollees in Massachusetts say they
    voted for Kerry. Is there enough evidence to
    reject the null hypothesis that the states voted
    for Kerry in equal proportions?

8
Null distribution of a difference in proportions
9
Null distribution of a difference in proportions
10
Answer to Example
  • We saw a difference of 4% between Florida and
    Massachusetts.
  • The null distribution predicts chance variation
    between the two states of 10% (the standard error
    of the difference).
  • P(our data/null distribution) = P(Z > .04/.10 = .4) > .05
  • Not enough evidence to reject the null.
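
The arithmetic behind this answer can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual pooled two-proportion z-test; the function name `two_proportion_z` is hypothetical, and the slides that showed the null distribution itself did not survive in this transcript.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled z-test for H0: both groups share one population proportion."""
    # Pooled proportion under the null hypothesis
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference in sample proportions under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Upper-tail and two-sided p-values from the standard normal
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se, z, one_sided, two_sided

# Florida: 49% of 50 pollees; Massachusetts: 53% of 50 pollees
se, z, p_one, p_two = two_proportion_z(0.49, 50, 0.53, 50)
print(f"SE = {se:.3f}, z = {z:.2f}, one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

The standard error comes out at roughly .10 and z at roughly .4, so either p-value is well above .05.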

11
Chi-square test for comparing proportions (of a
categorical variable) between groups
I. Chi-Square Test of Independence: When both
your predictor and outcome variables are
categorical, they may be cross-classified in a
contingency table and compared using a chi-square
test of independence. A contingency table
with R rows and C columns is an R x C contingency
table.
12
Example
  • Asch, S.E. (1955). Opinions and social pressure.
    Scientific American, 193, 31-35.

13
The Experiment
  • A Subject volunteers to participate in a visual
    perception study.
  • Everyone else in the room is actually a
    conspirator in the study (unbeknownst to the
    Subject).
  • The experimenter reveals a pair of cards

14
The Task Cards
Standard line
Comparison lines A, B, and C
15
The Experiment
  • Everyone goes around the room and says which
    comparison line (A, B, or C) is correct; the true
    Subject always answers last, after hearing all
    the others' answers.
  • The first few times, the 7 conspirators give
    the correct answer.
  • Then, they start purposely giving the (obviously)
    wrong answer.
  • 75% of Subjects tested went along with the
    group's consensus at least once.

16
Further Results
  • In a further experiment, group size (number of
    conspirators) was altered from 2-10.
  • Does the group size alter the proportion of
    subjects who conform?

17
The Chi-Square test

 
 
 
Apparently, conformity is less likely when there
are fewer or more group members.
18
  • 20 + 50 + 75 + 60 + 30 = 235 conformed
  • out of 500 experiments.
  • Overall likelihood of conforming = 235/500 = .47

19
Expected frequencies if no association between
group size and conformity

 
 
 
 
20

 
  • Do observed and expected differ more than
    expected due to chance?

 
 
 
21
Chi-Square test
Rule of thumb: if the chi-square statistic is
much greater than its degrees of freedom, this
indicates statistical significance. Here 85 >> 4.
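
As a rough sketch of how such a statistic is computed, the Python below applies the usual sum of (observed - expected)^2 / expected to the conformity counts. The conformed counts (20, 50, 75, 60, 30) come from the slides; the per-group totals of 100 are an assumption (500 experiments split evenly over 5 group sizes), because the original table did not survive in this transcript. Under that assumption the statistic comes out near 79.5 rather than the 85 quoted above; either value is far larger than the 4 degrees of freedom. The function name `chi_square_2xk` is hypothetical.

```python
conformed = [20, 50, 75, 60, 30]   # from the slides
group_totals = [100] * 5           # ASSUMPTION: 500 experiments split evenly

def chi_square_2xk(successes, totals):
    """Chi-square test of independence for a 2 x k table of counts."""
    overall = sum(successes) / sum(totals)   # pooled proportion (.47 here)
    stat = 0.0
    for s, n in zip(successes, totals):
        # Each group contributes two cells: conformed and did not conform
        cells = ((s, n * overall), (n - s, n * (1 - overall)))
        for observed, expected in cells:
            stat += (observed - expected) ** 2 / expected
    df = (2 - 1) * (len(totals) - 1)
    return stat, df

chi2, df = chi_square_2xk(conformed, group_totals)
print(f"chi-square = {chi2:.1f} on {df} degrees of freedom")
```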
22
The chi-square distribution is a sum of squared
normal deviates
The expected value and variance of a
chi-square: E(x) = df, Var(x) = 2(df)
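
A quick simulation can make these two properties concrete. The sketch below draws sums of squared standard normal deviates and checks that the sample mean and variance land near df and 2(df); the function name, the seed, and the number of draws are arbitrary choices made for illustration.

```python
import random
import statistics

def simulate_chi_square(df, n_draws, seed=0):
    """Draw chi-square variates as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

draws = simulate_chi_square(df=4, n_draws=20000)
mean = statistics.fmean(draws)      # should be close to df = 4
var = statistics.variance(draws)    # should be close to 2 * df = 8
print(f"mean = {mean:.2f}, variance = {var:.2f}")
```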
24
Caveat
  • When the sample size is very small in any cell
    (<5), Fisher's exact test is used as an
    alternative to the chi-square test.

25
Example of Fisher's Exact Test
26
Fisher's Tea-tasting experiment
Claim: Fisher's colleague (call her Cathy)
claimed that, when drinking tea, she could
distinguish whether milk or tea was added to the
cup first. To test her claim, Fisher designed
an experiment in which she tasted 8 cups of tea
(4 cups had milk poured first, 4 had tea poured
first).
Null hypothesis: Cathy's guessing abilities are no
better than chance.
Alternative hypotheses:
  • Right-tail: She guesses right more than expected
    by chance.
  • Left-tail: She guesses wrong more than expected
    by chance.
27
Fisher's Tea-tasting experiment
Experimental Results
28
Fisher's Exact Test
Step 1: Identify tables that are as extreme or
more extreme than what actually happened. Here
she identified 3 out of 4 of the
milk-poured-first teas correctly. Is that good
luck or real talent? The only way she could have
done better is if she identified 4 of 4 correct.
29
Fisher's Exact Test
Step 2: Calculate the probability of the tables
(assuming fixed marginals)
30
Step 3: To get the left-tail and right-tail
p-values, consider the probability mass function
of X, where X = the number of correct
identifications of the cups with milk poured
first.
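
The three steps can be sketched in Python. With fixed marginals (4 milk-first cups, 4 tea-first cups, and 4 cups guessed as milk-first), X follows a hypergeometric distribution; the function and variable names below are illustrative, not part of the original slides.

```python
from math import comb

def hypergeom_pmf(k, n_milk=4, n_tea=4, n_guessed_milk=4):
    """P(X = k): k of the cups guessed 'milk first' really were milk first."""
    ways = comb(n_milk, k) * comb(n_tea, n_guessed_milk - k)
    return ways / comb(n_milk + n_tea, n_guessed_milk)

pmf = {k: hypergeom_pmf(k) for k in range(5)}
observed = 3   # she identified 3 of the 4 milk-first cups

# Right tail: tables at least as favorable to her claim as the observed one
right_tail = sum(p for k, p in pmf.items() if k >= observed)
# Left tail: tables at most as favorable as the observed one
left_tail = sum(p for k, p in pmf.items() if k <= observed)
# Two-sided: every table no more probable than the observed one
two_sided = sum(p for p in pmf.values() if p <= pmf[observed] + 1e-12)

print(f"P(X=3) = {pmf[observed]:.4f}")
print(f"right = {right_tail:.4f}, left = {left_tail:.4f}, two-sided = {two_sided:.4f}")
```

These values should agree with the table probability and the left-sided, right-sided, and two-sided p-values that SAS reports for the same table.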
31
SAS code and output for generating Fisher's Exact
statistics for a 2x2 table
32
data tea;
  input MilkFirst GuessedMilk Freq;
  datalines;
1 1 3
1 0 1
0 1 1
0 0 3
;
run;
data tea;  /* Fix quirky reversal of SAS 2x2 tables */
  set tea; MilkFirst = 1 - MilkFirst; GuessedMilk = 1 - GuessedMilk;
run;
proc freq data=tea;
  tables MilkFirst*GuessedMilk / exact; weight Freq;
run;
33
SAS output
Statistics for Table of MilkFirst by GuessedMilk

Statistic                      DF     Value     Prob
----------------------------------------------------
Chi-Square                      1    2.0000   0.1573
Likelihood Ratio Chi-Square     1    2.0930   0.1480
Continuity Adj. Chi-Square      1    0.5000   0.4795
Mantel-Haenszel Chi-Square      1    1.7500   0.1859
Phi Coefficient                      0.5000
Contingency Coefficient              0.4472
Cramer's V                           0.5000

WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)         3
Left-sided Pr <= F          0.9857
Right-sided Pr >= F         0.2429

Table Probability (P)       0.2286
Two-sided Pr <= P           0.4857

Sample Size = 8
34
Introduction to the 2x2 Table
35
Introduction to the 2x2 Table
36
Cohort Studies
(Diagram: the target population is followed over TIME; each
group ends up either with Disease or Disease-free.)
37
The Risk Ratio, or Relative Risk (RR)
38
Hypothetical Data

39
Case-Control Studies
  • Sample on disease status and ask retrospectively
    about exposures (for rare diseases)
  • Marginal probabilities of exposure for cases and
    controls are valid.
  • Doesn't require knowledge of the absolute risks
    of disease
  • For rare diseases, can approximate relative risk
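
The last point can be illustrated numerically. The sketch below uses the standard 2x2 definitions of the risk ratio and odds ratio; all counts are hypothetical and chosen only to contrast a rare disease with a common one, and the function names are illustrative.

```python
def risk_ratio(a, b, c, d):
    """a, b = exposed with/without disease; c, d = unexposed with/without disease."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)

# HYPOTHETICAL counts, not taken from the slides
rare = (20, 9980, 10, 9990)     # risks of 0.2% and 0.1%
common = (400, 600, 200, 800)   # risks of 40% and 20%

rr_rare, or_rare = risk_ratio(*rare), odds_ratio(*rare)
rr_common, or_common = risk_ratio(*common), odds_ratio(*common)
print(f"rare disease:   RR = {rr_rare:.3f}, OR = {or_rare:.3f}")
print(f"common disease: RR = {rr_common:.3f}, OR = {or_common:.3f}")
```

Both tables have a risk ratio of 2, but only in the rare-disease table does the odds ratio stay close to it.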

40
Case-Control Studies
Target population
  • Disease (Cases)
    Exposed in past
    Not exposed
  • No Disease (Controls)
    Exposed
    Not Exposed
41
The Odds Ratio (OR)
42
The Odds Ratio
43
Properties of the OR (simulation)
44
Properties of the lnOR
Standard deviation
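
The formula on this slide did not survive in the transcript. As a hedged sketch, the Python below uses the standard large-sample approximation for the standard error of ln(OR), the square root of the sum of the reciprocal cell counts, to build a 95% confidence interval. The counts are hypothetical and the function name is illustrative.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR with a confidence interval built on the ln(OR) scale."""
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of ln(OR)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_ln_or)
    upper = math.exp(math.log(odds_ratio) + z * se_ln_or)
    return odds_ratio, se_ln_or, lower, upper

# HYPOTHETICAL counts: exposed cases, exposed controls,
# unexposed cases, unexposed controls
or_hat, se, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_hat:.2f}, SE(ln OR) = {se:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Working on the log scale keeps the interval above zero and makes it symmetric around ln(OR) rather than around the OR itself.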
45
Hypothetical Data
46
Example: Cell phones and brain tumors
(cross-sectional data)
47
Same data, but use Chi-square test or Fisher's
exact
48
Same data, but use Odds Ratio