Categorical dependent variables

Slides: 17
Provided by: brucek64



1
Categorical dependent variables
  • May 25 2006

2
Categorical dependent variables
  • Firm joins Energy Star or not
  • Parcel of land developed as urban, agriculture,
    or open space
  • Species goes extinct or not
  • Opinion is Strongly Opposed, Opposed, Neutral,
    Favorable, or Strongly Favorable
  • Residuals will clearly not be normal!
  • Relevant probability distributions are binomial
    and multinomial
  • Instead of thinking about means, we think about
    probabilities of a given result
  • Can still do hypothesis testing and fit linear
    models

3
Hypothesis testing
  • Types of hypotheses
    • Single sample: compare to a predicted
      probability
      • Are ESM students a sample from a population
        that is 50% female?
      • Are UCSB students a random sample of the
        California population with regard to race?
    • Two or more samples: are they samples from the
      same population?
      • Do two states have the same proportion of
        invasive species?
      • Do land parcels in Ventura and Santa Barbara
        Counties have the same distribution among
        developed, agricultural, and open space?
  • Types of tests
    • Exact tests
      • Give the precise probability of seeing the
        data, given the null hypothesis
      • Computer intensive
    • Chi-squared test
      • Oldest, easiest to compute by hand
      • Most commonly used
      • Called Pearson in JMP
    • G-test
      • Called Likelihood Ratio in JMP
    • Both chi-squared and G-test are approximations
      to the exact test
      • Make different assumptions
      • Require an expected count of at least 5 in
        each cell for reliable results
    • Exact tests are only implemented for certain
      cases in JMP
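
For reference, the two approximate test statistics named above are usually written as follows. The symbols are standard textbook notation rather than anything taken from the slides: O_i is the observed count in cell i and E_i is the count expected under the null hypothesis. Both statistics are compared against a chi-squared distribution.

```latex
X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},
\qquad
G = 2 \sum_i O_i \ln\!\left(\frac{O_i}{E_i}\right)
```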

4
Single sample tests
  • In a group of 35 students, 24 are female. Is
    this a random sample from a population with a
    50:50 sex ratio?
  • Null hypothesis: p = 0.5
  • Expected frequencies: 17.5 female, 17.5 male
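
A minimal sketch of how this example could be checked by hand, using only the Python standard library. It computes the Pearson chi-squared and G statistics from the expected frequencies above, plus an exact two-sided binomial p-value. The helper names are illustrative, and the "sum all outcomes no more likely than the observed one" rule is one common convention for a two-sided exact test, not necessarily what JMP does.

```python
import math

n, females = 35, 24
p0 = 0.5  # null-hypothesis probability of "female"

observed = [females, n - females]
expected = [n * p0, n * (1 - p0)]  # 17.5 and 17.5

# Pearson chi-squared and G (likelihood ratio) statistics
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def exact_binomial_two_sided(k, n, p):
    """Sum the probabilities of all outcomes no more likely than k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))


p_chi2 = chi2_sf_1df(chi2)
p_g = chi2_sf_1df(g)
p_exact = exact_binomial_two_sided(females, n, p0)

print(f"Pearson chi2 = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"G            = {g:.3f}, p = {p_g:.4f}")
print(f"Exact binomial two-sided p = {p_exact:.4f}")
```

The three p-values differ somewhat, which illustrates that the chi-squared and G tests are approximations to the exact test.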

5
Single sample tests
  • In a group of 52 students, 30 are white, 5 are
    Latino, 7 are black, and 10 are Asian. Is this a
    random sample from a population that is 45%
    white, 30% Latino, 15% black, and 10% Asian?
  • Null hypothesis: probabilities of each race are
    as given
  • Expected frequencies: 23.4 white, 15.6 Latino,
    7.8 black, 5.2 Asian
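
The goodness-of-fit calculation for this example can be sketched as below. The 5% critical value of 7.815 for 3 degrees of freedom is the standard tabulated value; the variable names are illustrative. The sketch also checks the "expected count of at least 5" rule of thumb.

```python
observed = {"white": 30, "Latino": 5, "black": 7, "Asian": 10}
null_probs = {"white": 0.45, "Latino": 0.30, "black": 0.15, "Asian": 0.10}

n = sum(observed.values())  # 52 students
expected = {race: n * p for race, p in null_probs.items()}

# Per-category contribution to the Pearson chi-squared statistic
contributions = {
    race: (observed[race] - expected[race]) ** 2 / expected[race]
    for race in observed
}
chi2 = sum(contributions.values())
df = len(observed) - 1  # 4 categories -> 3 degrees of freedom

CRITICAL_05_DF3 = 7.815  # tabulated 5% critical value for 3 df
reject_null = chi2 > CRITICAL_05_DF3

# Rule of thumb: every expected count should be at least 5
small_cells = [race for race, e in expected.items() if e < 5]

for race in observed:
    print(f"{race:7s} observed {observed[race]:2d}  expected {expected[race]:5.1f}"
          f"  contribution {contributions[race]:.3f}")
print(f"chi2 = {chi2:.3f} on {df} df; reject at 5% level: {reject_null}")
print(f"cells with expected count below 5: {small_cells}")
```

Printing the per-category contributions shows which groups drive the result; here the Latino and Asian categories account for most of the statistic.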

6
Two sample tests
  • Alabama has 30 native tree species and 40
    exotics; Kansas has 25 natives and 40 exotics.
    Are these drawn from the same statistical
    population?
  • Null hypothesis: p(native) = 55/135 = 0.41
  • Expected frequencies in Alabama: 28.5 native,
    41.5 exotic
  • Expected frequencies in Kansas: 26.5 native, 38.5
    exotic
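
As a rough sketch, the expected frequencies for this 2 x 2 table come from the row and column totals, and the Pearson statistic follows directly. The 1-df p-value uses the identity between a chi-squared variable with 1 df and a squared standard normal; names are illustrative.

```python
import math

table = {
    "Alabama": {"native": 30, "exotic": 40},
    "Kansas": {"native": 25, "exotic": 40},
}
kinds = ("native", "exotic")

row_totals = {state: sum(cells.values()) for state, cells in table.items()}
col_totals = {kind: sum(table[state][kind] for state in table) for kind in kinds}
grand_total = sum(row_totals.values())  # 135 species

# Expected count = row total * column total / grand total
expected = {
    state: {kind: row_totals[state] * col_totals[kind] / grand_total
            for kind in kinds}
    for state in table
}

chi2 = sum(
    (table[state][kind] - expected[state][kind]) ** 2 / expected[state][kind]
    for state in table
    for kind in kinds
)
df = (len(table) - 1) * (len(kinds) - 1)  # 1 degree of freedom

# Upper-tail chi-squared probability, valid for 1 df only
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"pooled p(native) = {col_totals['native'] / grand_total:.2f}")
print(f"chi2 = {chi2:.3f} on {df} df, p = {p_value:.3f}")
```

The statistic is small relative to its single degree of freedom, so these data give no evidence that the two states differ.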

7
Multiple sample tests
  • A random sample of 100 land parcels in each of
    Ventura, SB, and San Luis Obispo counties. In
    Ventura, 30 developed, 40 agricultural, and 30
    open space; in SB, 45, 30, and 25; in SLO, 10,
    25, and 65. Do the counties have the same land
    use pattern?
  • Null hypothesis: 85/300 = 0.28 developed, 95/300
    = 0.32 agriculture, 120/300 = 0.40 open space
  • Expected frequencies in each county: about 28
    developed, 32 agriculture, 40 open space
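
The same approach extends to the 3 x 3 table. The sketch below computes both the Pearson and G statistics; for 4 degrees of freedom the chi-squared upper tail has the closed form exp(-x/2)(1 + x/2), which avoids needing a statistics library. Names are illustrative.

```python
import math

land_uses = ["developed", "agriculture", "open space"]
counts = {
    "Ventura": [30, 40, 30],
    "SB": [45, 30, 25],
    "SLO": [10, 25, 65],
}

col_totals = [sum(row[j] for row in counts.values()) for j in range(len(land_uses))]
grand_total = sum(col_totals)  # 300 parcels

chi2 = 0.0
g = 0.0
for county, row in counts.items():
    row_total = sum(row)
    for o, col_total in zip(row, col_totals):
        e = row_total * col_total / grand_total  # expected count under the null
        chi2 += (o - e) ** 2 / e
        g += 2 * o * math.log(o / e)

df = (len(counts) - 1) * (len(land_uses) - 1)  # 4 degrees of freedom


def chi2_sf_4df(x):
    """Upper-tail probability of a chi-squared variable with 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)


print(f"column totals = {col_totals}")
print(f"Pearson chi2 = {chi2:.2f}, p = {chi2_sf_4df(chi2):.2e}")
print(f"G            = {g:.2f}, p = {chi2_sf_4df(g):.2e}")
```

Both statistics are far above the tabulated 5% critical value of 9.488 for 4 df, so the hypothesis of a common land use pattern is rejected.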

8
Tumors and ETU
  • Treated foods contain ETU (ethylenethiourea),
    which may be harmful to health.
  • Big question: how does exposure affect the chance
    of contracting disease?
  • Some rats exposed to ETU contracted tumors.
  • How does probability of tumor depend on dose?
  • What dose is associated with a 10% tumor rate (to
    advise on regulation)?
  • 6 dose groups (0, 5, 25, 125, 250, 500)
  • 70 rats per group.

9
How about OLS regression?
Call:
lm(formula = Tumor ~ Dose)

Residuals:
     Min       1Q   Median       3Q      Max
-0.78572 -0.15976  0.04055  0.04889  1.04889

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.889e-02  1.657e-02  -2.951  0.00335 **
Dose         1.669e-03  7.181e-05  23.244  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2653 on 430 degrees of freedom
Multiple R-Squared: 0.5568, Adjusted R-squared: 0.5558
F-statistic: 540.3 on 1 and 430 DF, p-value: < 2.2e-16
10
(No Transcript)
11
(No Transcript)
12
Problems with the OLS regression
  • Statistical
    • Residuals not normally distributed
    • Maybe some nonlinearity; hard to tell
  • Logical
    • Predicted value represents probability of
      getting a tumor
    • How do we interpret values less than zero or
      greater than one?
  • Solution: GLM with binomially distributed
    residuals
    • Logistic regression
      • Uses the appropriate error model
      • Fits a model that is constrained to be
        between zero and one
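
To make the logical problem concrete, the sketch below plugs the six dose levels into the fitted OLS line, using the intercept and slope reported in the lm output earlier in the deck, and contrasts it with the logistic function, which maps any linear predictor into the interval (0, 1). The linear predictor values fed to the logistic function are arbitrary illustrations, not fitted values.

```python
import math

# Coefficients reported in the OLS output earlier in the deck
INTERCEPT = -4.889e-02
SLOPE = 1.669e-03


def ols_prediction(dose):
    """Fitted 'probability' of a tumor from the straight-line model."""
    return INTERCEPT + SLOPE * dose


def logistic(eta):
    """Inverse logit: maps any real linear predictor into (0, 1)."""
    return 1 / (1 + math.exp(-eta))


doses = [0, 5, 25, 125, 250, 500]
ols_fitted = {dose: ols_prediction(dose) for dose in doses}

# Dose beyond which the straight line predicts a probability above one
dose_where_ols_hits_one = (1 - INTERCEPT) / SLOPE

# Arbitrary linear predictor values, for illustration only
squashed = [logistic(eta) for eta in (-10, -1, 0, 1, 10)]

for dose, fitted in ols_fitted.items():
    flag = "  <-- not a valid probability" if not 0 <= fitted <= 1 else ""
    print(f"dose {dose:3d}: OLS fitted value {fitted:+.4f}{flag}")
print(f"OLS line exceeds 1 for doses above {dose_where_ols_hits_one:.1f}")
print("logistic outputs:", [round(s, 5) for s in squashed])
```

The straight line already gives a negative fitted value for the control group, while the logistic outputs stay strictly between zero and one.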

13
Logistic Regression
14
(No Transcript)
15
Dose at 10% Tumor Chance
  • What dose gives a 10% chance of contracting a
    tumor?
  • Solve the fitted logistic model for the dose D
  • After a bunch of math, D = 170.24
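
The "bunch of math" is an inversion of the logistic model: if logit(p) = b0 + b1 * D, then D = (ln(p / (1 - p)) - b0) / b1. A minimal sketch follows. The fitted coefficients are not preserved in this transcript, so the values of B0 and B1 below are hypothetical placeholders chosen only to show the mechanics; they will not reproduce D = 170.24.

```python
import math


def dose_for_probability(p, b0, b1):
    """Invert logit(p) = b0 + b1 * dose to find the dose giving probability p."""
    return (math.log(p / (1 - p)) - b0) / b1


def tumor_probability(dose, b0, b1):
    """Logistic model: probability of a tumor at a given dose."""
    return 1 / (1 + math.exp(-(b0 + b1 * dose)))


# Hypothetical coefficients for illustration only (NOT the fitted values
# from the presentation, which are not available in this transcript)
B0, B1 = -4.0, 0.01

d10 = dose_for_probability(0.10, B0, B1)
check = tumor_probability(d10, B0, B1)  # should come back as 0.10

print(f"dose for a 10% tumor chance (hypothetical model): {d10:.2f}")
print(f"predicted probability at that dose: {check:.4f}")
```

Plugging the recovered dose back into the model is a cheap sanity check on the algebra.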

16
More complex logistic regression and other GLM models
  • Can add more variables, interactions, etc.
  • Within the logistic function, model needs to be
    linear in parameters
  • With multiple logistic regression, use Effect
    Tests to test significance of individual terms
  • When comparing models, look at AIC (smaller is
    better)
    • AIC = -2 LogLikelihood + 2 df
  • Can use other probability models for residuals
    • gaussian, Gamma, inverse.gaussian, poisson
  • Can also specify the link: the function that
    transforms the mean of the dependent variable to
    get linearity
    • identity, log, sqrt, logit, probit, inverse,
      1/mu^2 (inverse squared)
  • JMP doesn't do any of these
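
As a small illustration of the last two ideas, the sketch below writes out the AIC formula and each listed link function together with its inverse, using only the Python standard library. The log-likelihood values passed to `aic` are made-up numbers for demonstration, and the `LINKS` table is an illustrative construction, not the internals of any particular GLM package.

```python
import math
from statistics import NormalDist


def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * (number of fitted parameters)."""
    return -2 * log_likelihood + 2 * n_params


_std_normal = NormalDist()

# Each entry is (link, inverse link); the link maps the mean mu to the
# linear predictor eta, and the inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "sqrt": (math.sqrt, lambda eta: eta**2),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "1/mu^2": (lambda mu: 1 / mu**2, lambda eta: 1 / math.sqrt(eta)),
}

mu = 0.3  # an example mean that is valid for every link above
round_trips = {name: inverse(link(mu)) for name, (link, inverse) in LINKS.items()}

for name, (link, _) in LINKS.items():
    print(f"{name:8s} eta = {link(mu):+.4f}  back-transformed mu = {round_trips[name]:.4f}")

# Made-up log-likelihoods for two hypothetical models
aic_small = aic(-100.0, 2)
aic_large = aic(-98.5, 3)
print(f"AIC (2 parameters) = {aic_small:.1f}, AIC (3 parameters) = {aic_large:.1f}")
```

In this made-up comparison the extra parameter improves the log-likelihood by enough to lower the AIC slightly, which is the trade-off the criterion is meant to capture.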