Provided by: Information321

Learn more at: https://webpages.math.luc.edu

1
Logistic Regression and Generalized Linear Models

Blood Screening,
Women's Role in Society,
and Colonic Polyps

2
Blood Screening

ESR measurements (erythrocyte sedimentation rate)
The study checks how the ESR rises when the blood proteins fibrinogen and globulin increase
We look for an association between the probability of an ESR reading > 20 mm/hr and the levels of the two plasma proteins.
An ESR below 20 mm/hr indicates a healthy individual

3
Multiple Regression Doesn't Work

The response is not Normally distributed
The response variable is binary.
In fact, the distribution is Binomial. This can be seen by looking at how the error term relates to the probability: if y = 1, then the error term is 1 - P(y = 1). We will assume for our random variable Y that
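The slide's formula is missing at this point. One reading consistent with the text above, assuming the standard setup for a binary response with success probability p = P(y = 1) (the symbols p and epsilon are notation chosen here, not taken from the slide), is the following sketch:

```latex
\[
Y \sim \mathrm{Bernoulli}(p), \qquad
\varepsilon = Y - p =
\begin{cases}
1 - p & \text{with probability } p,\\
-p    & \text{with probability } 1 - p,
\end{cases}
\]
\[
E[\varepsilon] = p(1-p) + (1-p)(-p) = 0, \qquad
\operatorname{Var}(\varepsilon) = p(1-p)^2 + (1-p)\,p^2 = p(1-p).
\]
```

Because the error variance changes with p, the constant-variance Normal error model behind multiple regression cannot hold for a binary response.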

4
Logistic Regression, a Generalized Linear Model

Modelling the expected value of the response requires a transformation
This chapter is about the logit transformation of a model, which is of the form logit(p) = log(p / (1 - p)) = B0 + B1 x1 + ... + Bq xq, where p = P(y = 1)
The logit of the probability that the response is 1 is the log odds of that response.
Fixing all explanatory variables but xj, exp(Bj) is the factor by which the odds that the response variable is 1 are multiplied when xj increases by 1.
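To see how exp(Bj) relates to the odds, a short derivation may help. It assumes the usual logistic model with a linear predictor; the symbols eta, beta_j and p(x) are notation chosen here rather than taken from the slide.

```latex
\[
\log\frac{p(x)}{1-p(x)} = \eta(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q
\quad\Longrightarrow\quad
\mathrm{odds}(x) = \frac{p(x)}{1-p(x)} = e^{\eta(x)} .
\]
\[
\frac{\mathrm{odds}(x_1,\dots,x_j + 1,\dots,x_q)}
     {\mathrm{odds}(x_1,\dots,x_j,\dots,x_q)}
= \frac{e^{\eta(x) + \beta_j}}{e^{\eta(x)}}
= e^{\beta_j}.
\]
```

So exp(Bj) is an odds ratio: a one-unit increase in xj multiplies the odds by exp(Bj), whatever the values of the other variables.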

5
R commands

plasma_glm_1 <- glm(ESR ~ fibrinogen, data = plasma, family = binomial())
Fits the model. The logit link does not need to be specified because it is the default for the binomial() family.
layout(matrix(1:2, ncol = 2))
Lets you choose the layout of your graph screen.
cdplot(ESR ~ fibrinogen, data = plasma)
cdplot(ESR ~ globulin, data = plasma)
cdplot plots conditional densities describing how the conditional distribution of a categorical variable y changes over a numerical variable x (see help(cdplot))

6
Interpretation

The area of the dark region is the probability that ESR < 20 mm/hr, and this decreases as the protein levels increase
There is not much shape to the globulin density function.

7
R Commands

confint(plasma_glm_1, parm = "fibrinogen")
Gives a 95% confidence interval for the fibrinogen coefficient
Output
Waiting for profiling to be done...
    2.5 %    97.5 %
0.3389465 3.9988602
exp(coef(plasma_glm_1)["fibrinogen"])
Exponentiating reverses the log transformation, turning the coefficient on the log-odds scale into an odds ratio
Output
fibrinogen
  6.215715
8
R Commands

summary(plasma_glm_1)
Output
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9298  -0.5399  -0.4382  -0.3356   2.4794
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.8451     2.7703  -2.471   0.0135 *
fibrinogen    1.8271     0.9009   2.028   0.0425 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 24.840 on 30 degrees of freedom
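The z value and p-value in the coefficient table can be reproduced from the estimate and its standard error. This is a sketch using the rounded numbers printed above; the fibrinogen level of 4 used for the fitted probability is a hypothetical value picked for illustration.

```r
# Rounded values copied from the coefficient table above.
intercept <- -6.8451
estimate  <- 1.8271   # fibrinogen coefficient (log-odds scale)
std_error <- 0.9009

# Wald test: estimate divided by its standard error, compared with N(0, 1).
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))
print(c(z = z_value, p = p_value))

# Fitted probability of ESR > 20 mm/hr at a hypothetical fibrinogen level.
fibrinogen_level <- 4
prob_high_esr <- plogis(intercept + estimate * fibrinogen_level)
print(prob_high_esr)
```

plogis() is the inverse of the logit, so it maps the linear predictor back to a probability.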

9
R Commands

exp(confint(plasma_glm_1, parm = "fibrinogen"))
Output
    2.5 %    97.5 %
 1.403468 54.535954
plasma_glm_2 <- glm(ESR ~ fibrinogen + globulin, data = plasma, family = binomial())
Fits a logistic regression on both variables, fibrinogen and globulin
summary(plasma_glm_2)

10
R Commands

summary(plasma_glm_2)
Output
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9683  -0.6122  -0.3458  -0.2116   2.2636
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -12.7921     5.7963  -2.207   0.0273 *
fibrinogen    1.9104     0.9710   1.967   0.0491 *
globulin      0.1558     0.1195   1.303   0.1925
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 22.971 on 29 degrees of freedom

11
Interpretation

The confidence interval is wide because there are few observations overall, and even fewer with ESR > 20 mm/hr
The globulin coefficient is not significantly different from zero

12
R Commands

anova(plasma_glm_1, plasma_glm_2, test = "Chisq")
Compares the two models: one with just fibrinogen, the other with both fibrinogen and globulin
Why a chi-squared test?
ANOVA assumes a normal distribution for each data set. Why is this OK?
Output
Analysis of Deviance Table
Model 1: ESR ~ fibrinogen
Model 2: ESR ~ fibrinogen + globulin
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        30    24.8404
2        29    22.9711  1   1.8692    0.1716

Comparing the residual deviance between the two models, we see that the difference is not significant. We conclude that there is no evidence of an association between globulin and ESR level once fibrinogen is in the model.
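The p-value in the table comes from referring the drop in deviance to a chi-squared distribution. The sketch below recomputes it from the printed deviances only; the variable names are chosen here for illustration.

```r
# Residual deviances and degrees of freedom copied from the table above.
dev_fibrinogen <- 24.8404; df_fibrinogen <- 30   # Model 1
dev_both       <- 22.9711; df_both       <- 29   # Model 2

# Likelihood ratio statistic: the drop in deviance from adding globulin.
lr_stat <- dev_fibrinogen - dev_both
df_diff <- df_fibrinogen - df_both

# Upper-tail chi-squared probability.
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(statistic = lr_stat, df = df_diff, p = p_value))
```

The statistic differs from the printed 1.8692 in the last digit only because the deviances shown in the table are already rounded.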

13
R Commands

prob <- predict(plasma_glm_1, type = "response")
The fibrinogen-only model is useful for creating a neat-looking bubble plot. First we take the predicted values from the first model and use them to determine the size of the bubbles
plot(globulin ~ fibrinogen, data = plasma, xlim = c(2, 6), ylim = c(25, 50), pch = ".")
Plots globulin against fibrinogen
symbols(plasma$fibrinogen, plasma$globulin, circles = prob, add = TRUE)
Uses the predicted values from the first model to create different bubble sizes.

14
Bubble Size Is Probability

The plot shows how the probability of having an ESR > 20 mm/hr increases as fibrinogen increases

15
Generalized Linear Model

Unifies logistic regression, Analysis of Variance and multiple regression techniques.
GLMs have three essential parts:
An error distribution: the distribution of the response variable.
Normal for Analysis of Variance and Multiple Regression
Binomial for Logistic Regression

16
Main parts of GLM (continued)

A link function, which links the explanatory variables to the expected value of the response.
Logit function for logistic regression
Identity function for ANOVA and multiple regression
A variance function, which shows how the variability of the response variable depends on the mean
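In R these three parts are bundled in a family object, which is what the family argument of glm() receives. The sketch below inspects the binomial and gaussian families from base R; the variable names and the mean value 0.25 are chosen here for illustration.

```r
# Family objects carry the error distribution, link and variance function.
fam_logistic <- binomial()   # logistic regression
fam_normal   <- gaussian()   # ANOVA and multiple regression

# Default link of each family.
print(c(binomial = fam_logistic$link, gaussian = fam_normal$link))

mu <- 0.25                                   # an illustrative mean value
logit_mu     <- fam_logistic$linkfun(mu)     # log(mu / (1 - mu))
var_binomial <- fam_logistic$variance(mu)    # mu * (1 - mu), depends on the mean
var_gaussian <- fam_normal$variance(mu)      # constant 1, does not depend on the mean
print(c(logit = logit_mu, var_binomial = var_binomial, var_gaussian = var_gaussian))
```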

17
Measure of Fit

The deviance shows how well the model fits the data
Comparing two models' deviances:
Use a likelihood ratio test
Compare using the Chi-square distribution
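Written out, the comparison looks as follows. This is a sketch in standard notation that is not on the slide: the log-likelihood symbols, the deviances D_1 and D_2 and the residual degrees of freedom df_1 and df_2 are assumptions introduced here, with model 1 nested in model 2.

```latex
\[
D = 2\,\bigl(\ell_{\text{saturated}} - \ell_{\text{model}}\bigr),
\]
\[
D_1 - D_2 = 2\,\bigl(\ell_2 - \ell_1\bigr)
\;\overset{\text{approx.}}{\sim}\; \chi^2_{\,df_1 - df_2}
\quad \text{if the smaller model 1 is adequate.}
\]
```

A large drop in deviance relative to this chi-squared distribution is evidence that the extra terms in model 2 improve the fit.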

18
Women's Role in Society

Response to the statement "Women should take care of running their homes and leave running the country up to men."
Factors: Education, Sex
Response: Agree or Disagree
Data are presented as categories with counts for each education, sex combination

19
R Commands

womensrole_glm_1 <- glm(cbind(agree, disagree) ~ sex + education, data = womensrole, family = binomial())
cbind combines the two count columns into a single response: a two-column matrix of (agree, disagree) counts.
summary(womensrole_glm_1)
We can see that education is a significant predictor of the response.
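A small self-contained sketch of the grouped-counts form may make the cbind response clearer. The data frame toy below is invented purely for illustration and is not the womensrole data; it also shows the equivalent formulation with proportions and weights.

```r
# Hypothetical grouped counts, invented for illustration only.
toy <- data.frame(
  education = c(0, 4, 8, 12, 16),
  agree     = c(18, 15, 10, 5, 2),
  disagree  = c(2, 5, 10, 15, 18)
)

# Form 1: two-column matrix of (successes, failures) as the response.
fit_counts <- glm(cbind(agree, disagree) ~ education,
                  data = toy, family = binomial())

# Form 2: observed proportion as the response, group size as the weight.
toy$total <- toy$agree + toy$disagree
toy$prop  <- toy$agree / toy$total
fit_props <- glm(prop ~ education, weights = total,
                 data = toy, family = binomial())

# Both forms describe the same binomial likelihood.
print(coef(fit_counts))
print(coef(fit_props))
```

The matrix form is usually the more convenient one when the data already arrive as counts per category, as they do here.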

20
Summary Output

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.72544  -0.86302  -0.06525   0.84340   3.13315
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.50937    0.18389  13.646   <2e-16 ***
sexFemale   -0.01145    0.08415  -0.136    0.892
education   -0.27062    0.01541 -17.560   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 451.722 on 40 degrees of freedom
Residual deviance: 64.007 on 38 degrees of freedom
Why is there a P-value for the intercept?

21
Declaring a function in R

myplot <- function(role.fitted) { ... }
Declares the function myplot, which takes an object role.fitted; the statements on this and the next two slides form its body
f <- womensrole$sex == "Female"
Stores in f a logical vector that is TRUE for the rows where sex is Female
plot(womensrole$education, role.fitted, type = "n", ylab = "probability of agreeing", xlab = "Education", ylim = c(0, 1))
Sets up the axes for education against the role.fitted object (type = "n" draws no points). role.fitted will hold predicted values from our GLM, defined as
role.fitted1 <- predict(womensrole_glm_1, type = "response")

22
Myplot function (cont)

lines(womensrole$education[!f], role.fitted[!f], lty = 1)
lines(womensrole$education[f], role.fitted[f], lty = 2)
These draw the fitted lines, one for males and one for females. lty sets the line type (in this case solid or dashed)
lgtxt <- c("Fitted (Males)", "Fitted (Females)")
legend("topright", lgtxt, lty = 1:2, bty = "n")
These add a legend for the two lines.

23
Myplot Function (cont)

y <- womensrole$agree / (womensrole$agree + womensrole$disagree)
A basic calculation of the observed proportion of respondents who agree in each sex and education group
text(womensrole$education, y, ifelse(f, "\\VE", "\\MA"), family = "HersheySerif", cex = 1.25)
Plots the observed proportions y, marking female groups with the Venus symbol and male groups with the Mars symbol

24
Interpretation

The two fitted lines nearly coincide, indicating that sex has little effect on how the probability of agreeing changes with education
The symbols for the observed proportions, however, may indicate an interaction between sex and education

25
MyPlot Sex*Education

By running the same analysis with an interaction term, e.g.
womensrole_glm_2 <- glm(cbind(agree, disagree) ~ sex * education, data = womensrole, family = binomial())
summary(womensrole_glm_2)
This summary shows a significant p-value for the sex:education interaction
role.fitted2 <- predict(womensrole_glm_2, type = "response")
myplot(role.fitted2)
The plot shows that less education is associated with agreement that women belong in the home.

26
The Deviance Residual

One of the many methods for checking the adequacy of the model fit
The deviance residual is the signed square root of each observation's contribution to the deviance
res <- residuals(womensrole_glm_2, type = "deviance")
residuals() works for many other kinds of models as well
plot(predict(womensrole_glm_2), res, xlab = "Fitted values", ylab = "Residuals", ylim = max(abs(res)) * c(-1, 1))
No visible pattern, so the fit appears OK
abline(h = 0, lty = 2)
Adds a dashed line at height 0
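In symbols, the definition can be sketched as follows. The notation is chosen here and is not on the slide: d_i is the contribution of observation i to the deviance D, and for the binomial case y_i is the number agreeing out of n_i respondents with fitted probability \hat{p}_i (terms with a zero count are taken as zero).

```latex
\[
D = \sum_i d_i, \qquad
r_i = \operatorname{sign}\bigl(y_i - n_i\hat{p}_i\bigr)\,\sqrt{d_i},
\qquad \sum_i r_i^2 = D,
\]
\[
d_i = 2\left[\, y_i \log\frac{y_i}{n_i\hat{p}_i}
      + (n_i - y_i)\log\frac{n_i - y_i}{n_i - n_i\hat{p}_i} \right].
\]
```

The sign makes residuals positive where the model under-predicts and negative where it over-predicts, which is why the plot is centred on the line at zero.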

27
Drug Treatment Testing: Familial Adenomatous Polyposis (FAP)

Counts of colonic polyps after 12 months of treatment
Don't want to be the guy that did that
Placebo controlled (binary factor)
Age

28
GLM Analysis

Multiple regression won't work.
Count data are never negative
Normality is unlikely
Poisson regression: a GLM with a log link function
Assumes a Poisson distribution for the counts
Ensures positive fitted values
R command
polyps_glm_1 <- glm(number ~ treat + age, data = polyps, family = poisson())

29
Model Summary

Call:
glm(formula = number ~ treat + age, family = poisson(), data = polyps)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.2212  -3.0536  -0.1802   1.4459   5.8301
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.529024   0.146872   30.84  < 2e-16 ***
treatdrug   -1.359083   0.117643  -11.55  < 2e-16 ***
age         -0.038830   0.005955   -6.52 7.02e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 378.66 on 19 degrees of freedom
Residual deviance: 179.54 on 17 degrees of freedom
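With a log link, exponentiated coefficients are multiplicative effects on the expected count. The sketch below uses only the coefficients printed above; the patient age of 30 is a hypothetical value picked for illustration.

```r
# Coefficients copied from the summary above (log scale).
intercept      <- 4.529024
coef_treatdrug <- -1.359083
coef_age       <- -0.038830

# Rate ratios: multiplicative change in the expected number of polyps.
rate_ratio_drug <- exp(coef_treatdrug)   # drug versus placebo, same age
rate_ratio_age  <- exp(coef_age)         # per additional year of age
print(c(drug = rate_ratio_drug, age = rate_ratio_age))

# Expected counts for a hypothetical 30-year-old under each treatment.
age <- 30
mu_placebo <- exp(intercept + coef_age * age)
mu_drug    <- exp(intercept + coef_treatdrug + coef_age * age)
print(c(placebo = mu_placebo, drug = mu_drug))
```

The ratio of the two expected counts equals the drug rate ratio whatever age is plugged in, because the model has no treatment by age interaction.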

30
Model Problem: Overdispersion

In the previous models the variance depends solely on the mean
E.g. Binomial, Poisson
In practice, this doesn't always hold.
Sometimes the raw data points are not independent: there is some correlation, in this case possibly clustering
To check for overdispersion, compare the residual deviance with its degrees of freedom
These should be roughly equal
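Applied to the Poisson fit of the polyps data, the check is a single division. The sketch below uses the residual deviance and degrees of freedom from the model summary; the variable names are chosen here.

```r
# Values copied from the Poisson model summary.
residual_deviance <- 179.54
residual_df       <- 17

# A ratio near 1 is consistent with the Poisson variance assumption;
# a ratio well above 1 points to overdispersion.
dispersion_ratio <- residual_deviance / residual_df
print(dispersion_ratio)
```

A ratio this far above 1 suggests the Poisson standard errors are too small.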

31
Model Solution: Quasi-Likelihood

This procedure estimates a dispersion parameter that scales the variance to account for the extra variability
R command
polyps_glm_2 <- glm(number ~ treat + age, data = polyps, family = quasipoisson())
summary(polyps_glm_2)
The coefficients are still significant, but less so
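Why the coefficients stay the same while their significance drops can be seen on simulated data. The sketch below does not use the polyps data: it generates hypothetical overdispersed counts from a negative binomial distribution and compares the two fits.

```r
set.seed(1)

# Simulated overdispersed counts; hypothetical data, not the polyps study.
n  <- 200
x  <- runif(n)
mu <- exp(1 + 0.5 * x)
y  <- rnbinom(n, mu = mu, size = 1)   # variance is mu + mu^2, larger than mu

fit_pois  <- glm(y ~ x, family = poisson())
fit_quasi <- glm(y ~ x, family = quasipoisson())

# The estimated dispersion parameter from the quasi-Poisson fit.
phi_hat <- summary(fit_quasi)$dispersion

se_pois  <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]

print(rbind(poisson = coef(fit_pois), quasipoisson = coef(fit_quasi)))
print(rbind(se_pois = se_pois, se_quasi = se_quasi))
print(phi_hat)
```

The point estimates are identical; the quasi-Poisson standard errors are the Poisson ones multiplied by the square root of the estimated dispersion, so test statistics shrink and p-values grow.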

32
Homework

Please run through the commands for the myplot
function (pg 100) and send me the command script.
Please send me what you thought of the
presentation, give me a grade and add any
constructive criticism.
zweihanderdawg_at_gmail.com
