Regression Analysis Qualitative Dependent Variable - PowerPoint PPT Presentation

Slides: 73
Provided by: muhammadqa

Transcript and Presenter's Notes

1
Regression Analysis: Qualitative Dependent Variable
  • Muhammad Qaiser Shahbaz
  • Department of Statistics
  • GC University, Lahore

2
The Regression Analysis
  • Regression Analysis deals with prediction of the
    Mean value of the Dependent variable by using
    information on the Independent variables.
  • The nature of the dependent variable plays a very
    important role in regression analysis.
  • The major types of dependent variable encountered
    in regression analysis are Quantitative and
    Qualitative.
  • The estimation framework differs between the two
    types of dependent variable.

3
The Generalized Linear Models
  • A Class of Models that contains several types of
    regression models as special cases.
  • The Generalized Linear Model set up is given as
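
The equation for this slide is missing from the transcript. As a hedged reference, the usual textbook form of a Generalized Linear Model is sketched below. The symbols ($g$, $\mu_i$, $\eta_i$, $\beta_j$, $x_{ij}$) are assumed notation and may differ from the presenter's.

```latex
% Random component: Y_i follows an exponential-family distribution with mean \mu_i.
% Systematic component: the linear predictor \eta_i.
% Link function: g connects the mean to the linear predictor.
\[
  g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik},
  \qquad \mu_i = E(Y_i \mid x_{i1}, \dots, x_{ik}).
\]
```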

4
Types of Qualitative Dependent Variable
  • The following major types of Qualitative Variables
    are met in practice:
  • Binary Variable
  • Categorical without Order
  • Categorical with Order

5
Regression with Qualitative Dependent Variable
6
Generalized Linear Models
  • Model Estimation
      • Iteratively Reweighted Least Squares Estimation
      • Maximum Likelihood Estimation
  • Model Diagnostics
      • Outliers
      • Residual Analysis
      • Autocorrelations
      • Heteroscedasticity
      • Multicollinearity
      • Leverage Values
      • Influential Observations
  • Model Validation

7
Regression with Binary Dependent Variable
  • The dependent variable is Binary.
  • Distribution of the dependent variable is
    Binomial.
  • Several models are available depending upon the
    Link Function; the two most popular are:
  • The Binary Logistic Regression
  • The Probit Regression
  • The Models are used to predict the probability of
    falling in the success category given the
    information of explanatory variables.
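
To make the role of the Link Function concrete, the sketch below compares the two inverse links named above: the logistic function (logit link) and the standard normal CDF (probit link). The function names are illustrative choices, not taken from the slides.

```python
import math

def logistic_cdf(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_cdf(eta):
    """Inverse probit link: standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Both links give 0.5 at eta = 0 and are symmetric around it;
# the logistic curve has heavier tails than the normal curve.
for eta in (-2.0, 0.0, 2.0):
    print(f"eta={eta:+.1f}  logit: {logistic_cdf(eta):.4f}  probit: {probit_cdf(eta):.4f}")
```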

8
The Binary Logistic Regression
  • The Dependent variable is Binary, for example,
    recovery from a disease (Yes, No) or qualifying
    an entry test (Yes, No).
  • The Yes category is generally referred to as
    the success category.
  • Used to model the probability of falling in the
    success category given the information of
    independent variables.
  • Can also be used to predict the Logit of the
    success category.
  • Commonly used in Medical sciences.

9
The Model for Binary Variable
  • The Logistic Regression model used to predict
    the probability that the dependent variable falls
    in the success category, given the information of
    explanatory variables, is given as
  • The Logit model used to predict the Logit of the
    dependent variable is given as
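
The two equations are missing from the transcript. A hedged reconstruction in the standard textbook form follows, where $\pi(x)$ denotes the probability of the success category and $\beta_0, \dots, \beta_k$ are the regression coefficients (assumed notation).

```latex
% Logistic Regression model: probability of the success category
\[
  \pi(x) = P(Y = 1 \mid x)
         = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}
                {1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}
\]
% Logit model: log-odds of the success category
\[
  \mathrm{logit}\,\pi(x) = \ln\frac{\pi(x)}{1 - \pi(x)}
         = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\]
```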

10
Interpreting the Coefficients
  • The Coefficients in the Logistic Regression are
    interpreted in terms of Logit and Odds Ratio.
  • The Coefficient $\beta_0$ is the Logit of the
    dependent variable when all the independent
    variables have zero value. The quantity
    $e^{\beta_0}$ is the Odds of success when all
    independent variables are zero.
  • The coefficient $\beta_j$ is the partial effect of
    the jth Independent variable on the Logit of the
    dependent variable. The quantity $e^{\beta_j}$ is
    the Odds Ratio for a unit increase in the jth
    Independent variable, holding the others fixed.
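
The interpretation above can be checked numerically. In this small sketch the coefficient values are hypothetical, chosen only for illustration. Exponentiating the intercept gives the odds at zero covariates, and raising one variable by one unit multiplies the odds by the exponential of its coefficient.

```python
import math

def odds(beta0, betas, x):
    """Odds of success, exp(logit), for covariate values x."""
    logit = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return math.exp(logit)

# Hypothetical coefficients, not estimates from any slide.
beta0, betas = -1.0, [0.5, -0.2]

baseline_odds = odds(beta0, betas, [0.0, 0.0])   # equals exp(beta0)
odds_before = odds(beta0, betas, [3.0, 4.0])
odds_after = odds(beta0, betas, [4.0, 4.0])      # first variable raised by one unit
odds_ratio = odds_after / odds_before            # equals exp(betas[0])
print(baseline_odds, odds_ratio)
```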

11
Some Important Measures
  • The Model Chi-Square measures the difference
    between the -2 Log-Likelihood of the intercept-only
    model and that of the fitted model.
  • The Pseudo R2 is used to assess the proportion
    of variation of the dependent variable explained
    by the model. Two types of R2 are available.
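
A minimal sketch of these measures, computed from two log-likelihood values. The slide does not say which two R2 types it means. Cox and Snell's and Nagelkerke's are assumed here because statistical packages commonly report them for logistic regression. The numeric inputs are made up.

```python
import math

def model_chi_square(ll_null, ll_model):
    """Likelihood-ratio statistic: -2 * (LL of intercept-only model - LL of fitted model)."""
    return -2.0 * (ll_null - ll_model)

def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell pseudo R2 for a sample of size n."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell R2 rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Made-up log-likelihoods for a sample of 30 observations.
ll_null, ll_model, n = -20.0, -12.0, 30
chi_sq = model_chi_square(ll_null, ll_model)
print(chi_sq, cox_snell_r2(ll_null, ll_model, n), nagelkerke_r2(ll_null, ll_model, n))
```

The Model Chi-Square is referred to a Chi-Square distribution whose degrees of freedom equal the number of independent variables in the fitted model.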

12
Testing Adequacy and Significance of the Model
  • Adequacy of the Logistic Regression can be tested
    by using the Deviance statistic, which measures
    the difference between the saturated model and
    the fitted model. An insignificant result
    indicates that the model is adequate.
  • Significance of the model is tested by using the
    Model Chi-Square. This tests the null hypothesis
    that all of the regression coefficients (other
    than the intercept) are zero. A significant
    result indicates that at least one coefficient
    differs from zero.
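
As a sketch of the Deviance statistic for grouped binary data, the function below compares fitted probabilities with observed counts. It uses the standard binomial deviance formula. The helper names and the example counts are assumptions made for illustration.

```python
import math

def _xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def grouped_deviance(successes, trials, fitted_probs):
    """Deviance of a binomial model measured against the saturated model."""
    total = 0.0
    for y, n, p in zip(successes, trials, fitted_probs):
        total += _xlogy(y, y / (n * p))
        total += _xlogy(n - y, (n - y) / (n * (1.0 - p)))
    return 2.0 * total

# Made-up data: three groups of 10 trials each.
successes, trials = [2, 5, 9], [10, 10, 10]
print(grouped_deviance(successes, trials, [0.25, 0.50, 0.85]))
```

The deviance is zero when the fitted probabilities reproduce the observed proportions exactly, and it grows as the fit worsens. For grouped data it is usually compared with a Chi-Square distribution whose degrees of freedom are the number of groups minus the number of estimated coefficients.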

13
Measures for Model Diagnostics
  • Individual Deviance
  • Leverage Values
  • Standardized Residuals
  • Cook's Distance

14
Data Format for Logistic Regression
  • Two formats of data are available for Logistic
    Regression.
  • The Raw Format: the data are entered as
    collected, one row per observation.
  • The Covariate Class Format: the data are entered
    in grouped form, one row per distinct combination
    of covariate values.
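
The relation between the two formats can be sketched as a grouping step. The function name and the tiny data set below are made up for illustration. Each covariate class records how many successes occurred out of how many trials.

```python
from collections import defaultdict

def to_covariate_classes(rows):
    """Group raw rows (x1, ..., xk, y), with y in {0, 1}, into covariate classes.

    Returns a dict mapping each distinct covariate tuple to [successes, trials].
    """
    classes = defaultdict(lambda: [0, 0])
    for *covariates, y in rows:
        cell = classes[tuple(covariates)]
        cell[0] += y   # count of successes in this class
        cell[1] += 1   # count of observations in this class
    return dict(classes)

# Made-up raw data: (dose group, sex, outcome).
raw = [(1, "F", 1), (1, "F", 0), (1, "F", 1), (2, "M", 0), (2, "M", 0)]
grouped = to_covariate_classes(raw)
print(grouped)
```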

15
The Raw Format Illustrated
16
The Covariate Class Illustrated
17
Example 1
  • A study was performed to investigate new
    automobile purchases. Data on monthly income (in
    thousand US$), Age of Old Car and purchase of a
    new car (1 = Yes) were collected and are given
    below

18
Running the Regression
19
Running the Regression
20
The Output
21
The Logistic Logit Model
22
Calculation of Estimated Probability
  • The estimated Logistic Regression Model is
  • The probability of purchasing a new Car for a
    family with Income of US$ 65,000 and an 8-year-old
    car is
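
The estimated coefficients are not preserved in the transcript, so the calculation can only be sketched with placeholder values. In the code below the three coefficients are hypothetical and are not the estimates from the presenter's output. Only the covariate values (income of 65 in thousand US$, car age of 8 years) come from the slide.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Probability of success from a fitted logistic model."""
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Placeholder coefficients: NOT the estimates from the slide's output.
b0, b_income, b_age = -4.0, 0.05, 0.4

# Income is entered in thousands, matching the units of Example 1.
p = predicted_probability(b0, [b_income, b_age], [65.0, 8.0])
print(round(p, 4))
```

Replacing the placeholders with the coefficients from the actual output reproduces the slide's calculation.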

23
The Diagnostics
24
The Diagnostics
25
Models for Categorical Data
  • Two types of Models are available depending upon
    the nature of the Categorical Variable.
  • Unordered Categorical Variable
      • The Multinomial Logistic Model
      • The Discriminant Analysis
  • Ordered Categorical Variable
      • The Ordinal Logistic Model

26
The Multinomial Logistic Model
  • Dependent variable is Unordered Categorical, for
    example preference of a TV brand.
  • Distribution of the dependent variable is
    Multinomial.
  • A Base category is used in the model.
  • Used to predict the Probability of a specific
    category given the information of independent
    variables

27
The Multinomial Logistic Model
  • The Multinomial Logistic model is a collection of
    several models.
  • If there are G categories in the dependent
    variable then there are G - 1 logistic models,
    one for each category other than the base.
  • Each model can be used to predict the probability
    or Logit of a given category on the basis of
    information of explanatory variables.
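
A minimal sketch of how G - 1 baseline-category logit models combine into G category probabilities. The function name, the ordering of the output and the coefficient values are assumptions for illustration only.

```python
import math

def multinomial_probabilities(coefs, x):
    """Category probabilities from G - 1 baseline-category logit models.

    coefs: one (intercept, [slopes]) pair per non-base category.
    Returns probabilities ordered as [non-base categories..., base category].
    """
    logits = [b0 + sum(b * v for b, v in zip(bs, x)) for b0, bs in coefs]
    exp_logits = [math.exp(value) for value in logits]
    denom = 1.0 + sum(exp_logits)   # the 1 is exp(0), the base category's term
    return [e / denom for e in exp_logits] + [1.0 / denom]

# Hypothetical coefficients for G = 3 categories and one predictor.
coefs = [(1.5, [-0.1]), (5.0, [-2.0])]
probs = multinomial_probabilities(coefs, [2.0])
print(probs, sum(probs))
```

The log of each non-base probability divided by the base probability recovers the corresponding Logit, which is why the base category needs no coefficients of its own.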

28
The Multinomial Logistic Model
29
Interpreting the Regression Coefficients
  • The model to predict the Category Logit is
  • The coefficient $\beta_{gj}$ is the partial change
    in the Logit of the gth category for a unit change
    in the jth independent variable.
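
The Category Logit equation is missing from the transcript. In the usual baseline-category form it reads as follows, taking category $G$ as the base (an assumption, since the slide does not say which category is the base).

```latex
\[
  \ln\frac{P(Y = g \mid x)}{P(Y = G \mid x)}
    = \beta_{g0} + \beta_{g1} x_1 + \cdots + \beta_{gk} x_k,
  \qquad g = 1, \dots, G - 1 .
\]
```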

30
Model Adequacy and Significance
  • As in the Binary Logistic model, tests of
    significance and adequacy can also be conducted
    in the Multinomial Logistic Model.
  • Adequacy of the model can be tested by using the
    Deviance Statistic. An insignificant result
    indicates that the model is adequate.
  • Significance of the model can be tested by using
    the Chi-Square Statistic. A significant result
    indicates that the model is significant.

31
The R2 Measure
  • The R2 for the Multinomial Logistic model is
    calculated by using the statistic
  • Lp is the Log-Likelihood of the model with
    independent variables.
  • L0 is the Log-Likelihood of the intercept-only
    model.
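
The statistic itself is missing from the transcript. One widely used measure built from exactly these two quantities is McFadden's R2, shown below. Whether this is the statistic on the original slide is an assumption.

```latex
\[
  R^2 = 1 - \frac{L_p}{L_0}
\]
% L_p and L_0 are log-likelihoods (both negative), so 0 <= R^2 < 1;
% R^2 = 0 when the independent variables do not improve on the
% intercept-only model.
```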

32
Diagnostics
33
Example 2
  • A study was conducted to see the effect of
    length of alligators on their primary food
    choice. Data is given below

34
Running the Regression
35
Running the Regression: Statistics
36
Running the Regression: Save
37
The Output
38
The Output
39
Models to Predict Logit of a Category
40
Models to Predict Category Probability
41
Calculation of Probability
  • The Probabilities that an alligator of length
    3.5 m prefers each of the food types are

42
The Discriminant Analysis
  • Used as an alternative to the Multinomial Logistic
    Model.
  • Its basic use is to develop the Linear
    Discriminant Functions that can be used to
    predict group membership.
  • The role of Discriminant Analysis is opposite to
    that of One Way MANOVA. The Fixed Factor in One
    Way MANOVA becomes the dependent variable in
    Discriminant Analysis. The dependent variables in
    MANOVA become the independent variables in
    Discriminant Analysis.

43
The Discriminant Analysis
  • Discriminant Analysis requires continuous
    independent variables. If the independent
    variables are categorical then the technique is
    not appropriate.
  • Multivariate Normality of the independent
    variables is also required to conduct Tests of
    Significance.
  • Homogeneity of Covariance Matrices is also
    required for efficient use.

44
The Linear Discriminant Function
  • The function is used to predict the group
    membership.
  • The Standardized Canonical Coefficients are

45
Interpreting the Coefficients of Linear
Discriminant Function
  • The Coefficients are like the coefficients of the
    Regression Function.
  • The coefficients can be used to look at the role
    of a particular variable in discrimination.
  • The larger the coefficient of a variable (in
    absolute value), the greater its role in
    discrimination for that particular group.
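
A minimal sketch of how linear functions are used to predict group membership, assuming one linear classification function per group and assignment to the group with the highest score. The function names, group labels and coefficients are hypothetical and are not taken from the slides.

```python
def discriminant_scores(functions, x):
    """Evaluate one linear function per group: constant + sum(coef * value)."""
    return {
        group: const + sum(c * v for c, v in zip(coefs, x))
        for group, (const, coefs) in functions.items()
    }

def predict_group(functions, x):
    """Assign the observation to the group with the largest score."""
    scores = discriminant_scores(functions, x)
    return max(scores, key=scores.get)

# Hypothetical functions for two groups and two variables.
functions = {
    "A": (-10.0, [2.0, 1.0]),
    "B": (-25.0, [1.0, 5.0]),
}
print(predict_group(functions, [5.0, 1.0]))
print(predict_group(functions, [2.0, 5.0]))
```

In this made-up setting the second variable has the larger coefficient for group B, so large values of it push an observation toward B.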

46
Some Tests of Significance
  • Box's M test is used for testing Equality of
    Covariance Matrices across the groups. An
    insignificant result indicates that the pooled
    within-group covariance matrix can be used to
    form the discriminant function. A significant
    result indicates that separate covariance
    matrices should be used.
  • Wilks' Lambda statistic is used for testing
    Equality of Mean Vectors across the groups.
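
For reference, Wilks' Lambda is usually defined as a ratio of determinants. The symbols $W$ and $B$ are assumed notation for the within-group and between-group sums of squares and cross-products matrices.

```latex
\[
  \Lambda = \frac{|W|}{|W + B|}, \qquad 0 \le \Lambda \le 1 .
\]
% Values of \Lambda near 0 indicate well separated group mean vectors;
% values near 1 indicate that the mean vectors are similar.
```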

47
Example 3
  • Data on Sepal Length, Sepal Width, Petal Length
    and Petal Width is collected for various types of
    Irises. The complete data has 150 observations. A
    part of the data is given below

48
Running the Analysis
49
Running the Analysis
50
Running the Analysis: Statistics
51
Running the Analysis: Classification
52
Running the Analysis: Save
53
The Output
54
The Output
55
The Output
56
The Output
57
The Output
58
The Output
59
The Linear Discriminant Functions
60
The Output
61
The Ordinal Logistic Model
  • Dependent variable is Ordered Categorical.
  • Distribution of the dependent variable is
    Multinomial.
  • If there are k categories then a total of
    k - 1 models are estimated.
  • The models are used to predict the Cumulative
    Probability of a specific category given the
    information of explanatory variables.

62
The Ordinal Logistic Model
  • Two models are widely used and are given

63
The Ordinal Logistic Model
  • The models to predict Cumulative Probability of
    a category are
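
The equations are missing from the transcript. As a hedged illustration, the sketch below uses one common parameterisation of the cumulative logit model, in which each of the k - 1 models has its own threshold and all share the same slopes. The parameterisation, names and numbers are assumptions, not the presenter's.

```python
import math

def cumulative_probabilities(thresholds, slopes, x):
    """P(Y <= j | x) for j = 1..k-1 under a cumulative logit model.

    Assumes logit P(Y <= j) = threshold_j - sum(slope * x), with thresholds
    in increasing order and one shared set of slopes.
    """
    shift = sum(b * v for b, v in zip(slopes, x))
    return [1.0 / (1.0 + math.exp(-(t - shift))) for t in thresholds]

def category_probabilities(thresholds, slopes, x):
    """Individual category probabilities obtained by differencing."""
    cum = cumulative_probabilities(thresholds, slopes, x) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical model: k = 3 ordered categories, one predictor.
thresholds, slopes = [-1.0, 1.0], [0.5]
probs = category_probabilities(thresholds, slopes, [2.0])
print(probs, sum(probs))
```

Sharing the slopes across the k - 1 models is what the Parallel Regressions assumption refers to.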

64
Tests of Significance in Ordinal Logistic
  • Certain tests of significance can be carried out
    in Ordinal Logistic Regression.
  • Significance of the Model is tested by using the
    Chi-Square Statistic.
  • The assumption of Parallel Regressions can be
    tested by using the Chi-Square Statistic.

65
Example 4
  • Data on Credit Card Status have been collected
    from 250 credit card holders. Information on
    Card Status (chist), Age (age), Duration of Card
    (dura) and Card Amount (camt) is collected. A
    part of the data is shown here

66
Running the Analysis
67
Running the Analysis: Options
68
Running the Analysis: Output
69
The Output
70
The Output
71
The Regression Models
72
  • Thank You