Stats 330: Lecture 27 - PowerPoint PPT Presentation

About This Presentation
Title:

Stats 330: Lecture 27

Description:

In today's lecture we apply Poisson regression to the analysis of one and two ... Contingency tables arise when we classify a number of individuals into ... – PowerPoint PPT presentation

Number of Views:52
Avg rating:3.0/5.0
Slides: 57
Provided by: T10877
Category:
Tags: aboutus | lecture | stats

less

Transcript and Presenter's Notes

Title: Stats 330: Lecture 27


1
Stats 330 Lecture 27
Contingency tables
2
Plan of the day
  • In todays lecture we apply Poisson regression to
    the analysis of one and two-dimensional
    contingency tables.
  • Topics
  • Contingency tables
  • Sampling models
  • Equivalence of Poisson and multinomial
  • Correspondence between interactions and
    independence

3
Contingency tables
  • Contingency tables arise when we classify a
    number of individuals into categories using one
    or more criteria.
  • Result is a cross-tabulation or contingency table
  • 1 criterion gives a 1-dimensional table
  • 2 criteria give a 2 dimensional table
  • and so on.

4
Example one dimension
Income distribution of New Zealanders 15 2006
census
5
Example 2 2 dimensions
New Zealanders 15 by annual income and sex
2006 census
6
Censuses and samples
  • Sometimes tables come from a census
  • Sometimes the table comes from a random sample
    from a population
  • In this case we often want to draw inferences
    about the population on the basis of the sample
    e.g.is income independent of sex?

7
Example death by falling
  • The following is a random sample of 16,976
    persons who met their deaths in fatal falls,
    selected from a larger population of deaths from
    this cause. The falls classified by month are

8
The question
  • Question if we regard this as a random sample
    from several years, are the months in which death
    occurs equally likely?
  • Or is one month more likely than the others?
  • To answer this, we use the idea of maximum
    likelihood. First, we must detour into sampling
    theory in order to work out the likelihood

9
Sampling models
  • There are two common models used for contingency
    tables
  • The multinomial sampling model assumes that a
    fixed number of individuals from a random sample
    are classified with fixed probabilities of being
    assigned to the different cells.
  • The Poisson sampling model assumes that the table
    entries have independent Poisson distributions

10
Multinomial sampling
11
Multinomial model
  • Suppose a table has M cells
  • n individuals classified independently
  • (A.k.a. sampling with replacement)
  • Each individual has probability pi of being in
    cell i
  • Every individual is classified into exactly 1
    cell, so p1 p2 . . . pM 1
  • Let Yi be the number in the ith cell, so
  • Y1 Y2 . . . YM n

12
Multinomial model (cont)
  • Y1, . . . ,Ym have a multinomial distribution

This is the maximal model, as in logistic
regression, making no assumptions about the
probabilities.
13
Log-likelihood
14
MLEs
  • If there are no restrictions on the ps (except
    that they add to one) the log-likelihood is
    maximised when pi yi/n
  • These are the MLEs for the maximal model
  • Substitute these into the log-likelihood to get
    the maximum value of the maximal log-likelihood.
    Call this Log Lmax

15
Deviance
  • Suppose we have some model for the probabilities
    this will specify the form of the probabilities,
    perhaps as functions of other parameters. In our
    example, the model says that each probability is
    1/12.
  • Let log Lmod be the maximum value of the
    log-likelihood, when the probabilities are given
    by the model.
  • As for logistic regression, we define the
    deviance as D 2log Lmax - 2 log Lmod

16
DevianceTesting adequacy of the model
  • If n is large (all cells more than 5) and M is
    small, and if the model is true, the deviance
    will have (approximately) a chi-square
    distribution with M-k-1 df, where k is the number
    of unknown parameters in the model.
  • Thus, we accept the model if the deviance p-value
    is more than 0.05.
  • This is almost the same as the situation in
    logistic regression with strongly grouped data.
  • These tests are an alternative form of the
    chi-square tests used in stage 2.

17
Example death by falling
  • Are deaths equally likely in all months? In
    statistical terms, is pi1/12, i1,2,,12?
  • Log Lmax is, up to a constant,
  • Calculated in R by
  • gtylt-c(1688,1407,1370,1309,1341,1388,
  • 1406,1446,1322,1363, 1410,1526)
  • gt sum(ylog(y/sum(y)))
  • 1 -42143.23

18
Example death by falling
  • Our model is pi1/12, i1,2,,12. (all months
    equally likely). This completely specifies the
    probabilities, so k0, M-k-111. Log Lmod is, up
    to a constant,

gt sum(ylog(1/12)) 1 -42183.78 now calculate
deviance gt Dlt-2sum(ylog(y/sum(y)))-2sum(ylog(1
/12)) gt D 1 81.09515 gt 1-pchisq(D,11) 1
9.05831e-13 Model is not plausible!
19
gt Monthslt-c("Jan","Feb","Mar","Apr","May","Jun", "
Jul","Aug","Sep","Oct","Nov","Dec") gt
barplot(y/(sum(y),names.argMonths)
20
Poisson model
  • Assume that each cell is Poisson with a different
    mean this is just a Poisson regression model,
    with a single explanatory variable month
  • This is the maximal model
  • The null model is the model with all means equal
  • Null model deviance will give us a test that all
    months have the same mean

21
R-code for Poisson regression
  • gt monthslt-c("Jan","Feb","Mar","Apr","May",
  • "Jun","Jul","Aug","Sep","Oct","Nov","Dec")
  • gt months.faclt-factor(months,levelsmonths)
  • gt falls.glmlt-glm(ymonths.fac,familypoisson)
  • gt summary(falls.glm)

22
Coefficients Estimate Std. Error z
value Pr(gtz) (Intercept) 7.43130
0.02434 305.317 lt 2e-16 months.facFeb
-0.18208 0.03610 -5.044 4.56e-07
months.facMar -0.20873 0.03636 -5.740
9.46e-09 months.facApr -0.25428 0.03683
-6.904 5.04e-12 months.facMay -0.23013
0.03658 -6.291 3.15e-10 months.facJun
-0.19568 0.03623 -5.401 6.64e-08
months.facJul -0.18280 0.03611 -5.063
4.13e-07 months.facAug -0.15474 0.03583
-4.318 1.57e-05 months.facSep -0.24440
0.03673 -6.655 2.84e-11 months.facOct
-0.21386 0.03642 -5.873 4.29e-09
months.facNov -0.17995 0.03608 -4.988
6.10e-07 months.facDec -0.10089 0.03532
-2.856 0.00429 Null deviance 8.1095e01
on 11 degrees of freedom Residual deviance
-7.5051e-14 on 0 degrees of freedom gt D 1
81.09515
Same as before! Why????
23
General principle
  • To every Poisson model for the cell counts, there
    corresponds a multinomial model, obtained by
    conditioning on the table total.
  • Suppose the Poisson means are m1, mM .... The
    cell probabilities in the multinomial sampling
    model are related to the means in the Poisson
    sampling model by the relationship
  • pi mi/(m1 mM)
  • We can estimate the parameters in the multinomial
    model by fitting the Poisson model and using the
    relationship above.
  • We can test hypotheses about the multinomial
    model by testing the equivalent hypothesis in the
    Poisson regression model.

24
Example death by falling
  • Poisson means are

25
Relationship between Poisson means and
multinomial probs
26
Relationship between Poisson parameters and
multinomial probs alternative form
27
Testing all months equal
  • Clearly, all months are equal if and only if all
    the deltas are zero.
  • Thus, testing for equal months in the multinomial
    model is the same as testing for deltas all zero
    in the Poisson model.
  • This is done using the null model deviance, or,
    equivalently, the anova table.
  • Recall The null deviance was 8.1095e01 on 11
    degrees of freedom, with a p-value of
    9.05831e-13, so months not equally likely

28
2x2 tables
  • Relationship between Snoring and Nightmares
  • Data from a
  • random sample,
  • classified according
  • to these 2 factors

29
Parametrising the 2x2 table of Poisson means
Main effect of Nightmare
Main effect of Snoring
Snoring/nightmare interaction
30
Parameterising the 2x2 table of probabilities
Marginal distn of Snorer
Marginal distn of Nightmare
31
Relationship between multinomial probabilities
and Poisson parameters
32
Independence
  • Events A and B are independent if the
    conditional probability that A occurs, given B
    occurs, is the same as the unconditional
    probability of A occuring. i.e. P(AB)P(A).
  • Since P(AB)P(A and B)/P(B), this is equivalent
    to P(A and B) P(A) P(B)

33
Independence in a 2 x2 table
  • Independence of nightmares and snoring means
  • P(Snoring and frequent nightmares)
  • P(Snoring) x P(frequent nightmares)
  • (plus similar equations for the other 3 cells)
  • Or, p1 (p1 p2) (p1 p3)
  • In fact, this equation implies the other 3 for a
    2 x 2 table

34
Independence in a 2x2 table (cont)
  • Three equivalent conditions for independence in
    the 2 x 2 table
  • p1 (p1 p2) (p1 p3)
  • p1 p4 p2 p3
  • g 0 (ie zero interaction in Poisson model)
  • Thus, we can test independence in the
    multinomial model by testing for zero interaction
    in the Poisson regression.

35
Math stuff
36
More math stuff
37
Odds ratio
  • Prob of not being a snorer for frequent
    nightmares population is p1/(p1 p3)
  • Prob of being a snorer for frequent nightmares
    population is p3/(p1 p3)
  • Odds of not being a snorer for frequent
    nightmares population is
  • (p1/(p1 p3)) /(p3/(p1 p3)) p1/p3
  • Similarly, the odds of not being a snorer for
    occasional nightmares population is p2/p4

38
Odds ratio (2)
  • Odds ratio (OR) is the ratio of these 2 odds.
    Thus
  • OR (p1/p3) / (p2/p4) (p1p4)/(p2p3)
  • OR exp(g), log(OR) g
  • OR 1 (i.e. log (OR) 0 ) if and only if snoring
    and nightmares are independent
  • Acts like a correlation coefficient between
    factors, with 1 corresponding to no relationship
  • Interchanging rows (or columns) changes OR to
    1/OR

39
Odds ratio (3)
  • Get a CI for the OR by
  • Getting a CI for g
  • Exponentiating
  • CI for g is estimate /- 1.96 standard error
  • Get estimate, standard error from Poisson
    regression summary

40
Example snoring
gt ylt-c(11,82,12,74) gt nightmareslt-factor(rep(c("F"
,"O"),2)) gt snorelt-factor(rep(c("N","Y"),c(2,2)))
gt snoring.glmlt-glm(ynightmaressnore,familypoiss
on) gt anova(snoring.glm, test"Chisq") Analysis
of Deviance Table Model poisson, link
log Response y Df Deviance Resid. Df Resid. Dev
P(gtChi) NULL
3 111.304 nightmares 1
110.850 2 0.454 lt2e-16 snore
1 0.274 1 0.180
0.6008 nightmaressnore 1 0.180 0
0.000 0.6713
No evidence of association between nightmares and
snoring.
41
Odds ratio
  • Coefficients
  • Estimate Std. Error z value
    Pr(gtz)
  • (Intercept) 2.39790 0.30151 7.953
    1.82e-15
  • nightmaresO 2.00882 0.32110 6.256
    3.95e-10
  • snoreY 0.08701 0.41742 0.208
    0.835
  • nightmaresOsnoreY -0.18967 0.44716 -0.424
    0.671
  • Estimate of g is -0.18967, standard error is
    0.44716, so CI for g is -0.18967 /- 1.96
    0.44716, or
  • (-1.066 , 0.6868,)
  • CI for OR exp(g) is (exp(-1.066), exp(0.6868) )
  • (0.3443, 1.987)

42
The general I x J table
  • Example In the 1996 general Social Survey, the
    National center for Opinion Research collected
    data on the following variables from 2726
    respondents
  • Education Less than high school, High school,
    Bachelors or graduate
  • Religious belief Fundamentalist, moderate or
    liberal.
  • Cross-classification is

43
Education and Religious belief
44
Testing independence in the general 2-dimensional
table
  • We have a two-dimensional table, with the rows
    corresponding to Education, having I3 levels,
    and the columns corresponding to Religious
    Belief, having J3 levels.
  • Under the Poisson sampling model, let mij be the
    mean count for the i,j cell
  • Split log mij into overall level, main effects,
    interactions as usual

45
Testing independence in the general 2-dimensional
table (ii)
  • Under the corresponding multinomial sampling
    model, let pij be the probability that a randomly
    chosen individual has level i of Education and
    level j of Religious belief, and so is classified
    into the i,j cell of the table.
  • Then, as before

46
Testing independence (iii)
  • Let pi pi1 piJ be the marginal
    probabilities the probability that Educationi.
  • Let pj be the same thing for Religious belief
  • The factors are independent if
  • pij pi pj
  • This is equivalent to (ab)ij 0 for all i,j.

47
Testing independence (iv)
  • Now we can state the principle
  • We can test independence in the multinomial
    sampling model by fitting a Poisson regression
    with interacting factors, and testing for zero
    interaction.

48
Doing it in R
counts c(178, 570, 138, 138, 648, 252, 108,442,
252) example.df data.frame(y counts,
expand.grid(education c("LessThanHighSchool",
"HighSchool", "Bachelor"), religion c("Fund",
"Mod", "Lib"))) levels(example.dfeducation)
c("LessThanHighSchool", "HighSchool",
"Bachelor") levels(example.dfreligion)
c("Fund", "Mod", "Lib") example.glm
glm(yeducationreligion, familypoisson,
dataexample.df) anova(example.glm, test"Chisq")
49
The data
gt example.df y education religion 1
178 LessThanHighSchool Fund 2 570
HighSchool Fund 3 138 Bachelor
Fund 4 138 LessThanHighSchool Mod 5 648
HighSchool Mod 6 252 Bachelor
Mod 7 108 LessThanHighSchool Lib 8 442
HighSchool Lib 9 252 Bachelor
Lib
50
The analysis
Degrees of freedom
gt example.glm glm(yeducationreligion,
familypoisson, dataexample.df) gt
anova(example.glm, test"Chisq") Analysis of
Deviance Table Model poisson, link
log Response y Terms added sequentially (first
to last) Df Deviance Resid.
Df Resid. Dev P(gtChi) NULL
8 1009.20
education 2 908.18 6
101.02 6.179e-198 religion 2 31.20
4 69.81 1.675e-07 educationreligio
n 4 69.81 0 -2.047e-13 2.488e-14
Chisq
P-value
Small p-value is evidence against independence
51
Odds ratios
  • In a general I x J table, the relationship
    between the factors is described by a set of
    (I-1) x (J-1) odds ratios, defined by

52
Interpretation
  • Consider sub-table of all individuals having
    levels 1 and i of education, and levels 1 and j
    of religious belief

Column j
Row i
53
Interpretation (cont)
  • Odds of being Education 1 at level 1 of Religious
    belief are p11/pi1
  • Odds of being Education 1 at level j of Religious
    belief are p1j/pij
  • Odds ratio is (p11/pi1)/(p1j/pij)
    (pijp11)/(pi1p1j)

54
Odds Ratio facts
  • Main facts
  • A and B are independent iff all the odds ratios
    equal to 1
  • Log (ORij) (ab)ij
  • Estimate ORij by exponentiating the
    corresponding interaction in the Poisson summary
  • Get CI by exponentiating CI for interaction

55
Numerical example odds ratios
Coefficients
Estimate Std. Error educationHighSchoolreligionMo
d 0.38278 0.12713 educationBachelorreligionMo
d 0.85671 0.15517 educationHighSchoolreligi
onLib 0.24533 0.13746 educationBachelorreligi
onLib 1.10183 0.16153
56
Confidence interval
  • Estimate of log(OR Bach/Lib) is 1.10183, standard
    error is 0.16153, so CI is 1.10183 /- 1.96
    0.16153, or (0.7852312 1.4184288)
  • CI for OR is exp(0.7852312 ,1.4184288)
    2.192914, 4.130625
  • Odds for having a bachelors degree versus no
    high school are about 3 times higher for Liberals
    than fundamentalists
Write a Comment
User Comments (0)
About PowerShow.com