1
Multiple Regression 2
2
Solving for β and b
  • The weight for predictor xj will be a function
    of:
  • The correlation between xj and y.
  • The extent to which xj's relationship with y is
    redundant with the other predictors' relationships
    with y (collinearity).
  • The correlations between y and all other
    predictors.
  • The correlations between xj and all other
    predictors.

3
Solving for β and b: the two-variable case
  • β1 = the slope for X1 controlling for the other
    independent variable, X2.
  • β2 is computed the same way, with the X1 and X2
    terms swapped.
  • Compare to the bivariate slope.
  • What happens to b1 if X1 and X2 are totally
    uncorrelated? (See the sketch below.)
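In the two-predictor case the standardized slopes have a closed form: β1 = (r_y1 − r_y2·r_12) / (1 − r_12²), with b1 = β1(s_y/s_1). A minimal sketch with made-up correlations; the function name and the numeric values are illustrative, not from the slides:

```python
import numpy as np

def std_slopes_two_predictors(r_y1, r_y2, r_12):
    """Standardized partial slopes for the two-predictor case:
    beta1 = (r_y1 - r_y2*r_12) / (1 - r_12**2); beta2 swaps the roles."""
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    return beta1, beta2

# When X1 and X2 are uncorrelated (r_12 = 0), each beta reduces to the
# bivariate correlation, so b1 equals the bivariate slope.
print(std_slopes_two_predictors(0.50, 0.30, 0.00))   # (0.5, 0.3)
print(std_slopes_two_predictors(0.50, 0.30, 0.40))   # (~0.452, ~0.119)
```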

4
Solving for β and b: the two-variable case
  • Solving for β and b is relatively simple with two
    variables but becomes increasingly complex with
    more variables and requires differential calculus
    to derive formulas. Matrix algebra can be used
    to simplify the process.

5
Matrix Equations
  • R² = Σ(r_yj · β_j)
  • where each r_yj = the correlation between the DV
    and the jth IV
  • and each β_j = the standardized regression
    coefficient for the jth IV.
  • In matrix form, R² = R_yj B_j
  • where R_yj = the row matrix of correlations between
    the DV and the k IVs
  • and B_j = the column matrix of standardized
    regression coefficients for the IVs.
  • B_j = R_jj⁻¹ R_jy
  • In other words, the matrix of standardized
    regression coefficients is the matrix of
    correlations between the DV and the IVs
    premultiplied by the inverse of the matrix of
    correlations among the IVs (loosely, "divided by"
    the predictor intercorrelations).
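A numpy sketch of these matrix equations, using made-up correlations purely for illustration:

```python
import numpy as np

# Illustrative correlations for k = 2 IVs: Rjj holds the correlations
# among the IVs, Rjy the correlations between each IV and the DV.
Rjj = np.array([[1.0, 0.4],
                [0.4, 1.0]])
Rjy = np.array([0.5, 0.3])

# B_j = Rjj^-1 Rjy : the standardized regression coefficients
Bj = np.linalg.solve(Rjj, Rjy)      # ~[0.452, 0.119]

# R^2 = R_yj B_j : the sum over predictors of r_yj * beta_j
R2 = Rjy @ Bj                       # ~0.262
print(Bj, R2)
```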

6
Tests of Regression Coefficients
Null hypothesis: the Xj predictor is not related to Y
when the other predictors are held constant (H0: βj = 0).
The test statistic is t = bj / SE(bj), with df = N − k − 1.
7
Further Interpretation of Regression Coefficients
  • Regression coefficients in multiple regression
    (unstandardized and standardized) are considered
    partial regression coefficients because each
    coefficient is calculated after controlling for
    the other predictors in the model.
  • Tests of regression coefficients represent a test
    of the unique contribution of that variable in
    predicting y over and above all other predictor
    variables in the model.

8
Assumptions
  • Predictors are linearly related to the criterion.
  • Normality of errors -- residuals are normally
    distributed around zero.
  • Multivariate normal distribution -- the multivariate
    extension of bivariate normality; implies
    homoscedasticity.
  • Regression diagnostics provide a check on these
    assumptions.

9
Regression Diagnostics
  • Detecting multivariate outliers
  • Distance, leverage, and influence
  • Evaluating Collinearity

10
Regression Diagnostics
  • Methods for identifying problems in your multiple
    regression analysis -- a good idea for any
    multiple regression analysis.
  • Can help identify:
  • violations of assumptions
  • outliers and overly influential cases -- cases you
    might want to delete or transform
  • important variables you've omitted from the
    analysis

11
Three Classes of MR Diagnostic Statistics
  • 1. Distance -- detects outliers on the dependent
    variable and assumption violations -- the primary
    measure is the residual (Y − Ŷ), the standardized
    residual (i.e., put in terms of z scores), or the
    studentized residual (i.e., put in terms of
    t scores).
  • 2. Leverage -- identifies potential outliers on
    the independent variables -- the primary measure is
    the leverage statistic or hat diagnostic.

12
Three Classes of MR Diagnostic Statistics (cont.)
  • 3. Influence -- combines distance and leverage to
    identify unusually influential observations
    (i.e., observations or cases that have a big
    influence on the MR equation) -- the measure we
    will use is Cook's D.

13
Distance
  • Analyze residuals.
  • Pay attention to standardized or studentized
    residuals > 2.5; these shouldn't be more than
    about 5% of cases.
  • Tells you which cases are not predicted well by
    the regression analysis -- you can learn from this
    in itself.
  • Necessary to test MR assumptions:
  • homoscedasticity
  • normality of errors

14
Distance
  • Unstandardized Residuals
  • The difference between an observed value and the
    value predicted by the model. The mean is 0.
  • Standardized Residuals
  • The residual divided by an estimate of its
    standard error. Standardized residuals have a
    mean of 0 and a standard deviation of 1.
  • Studentized Residuals
  • The residual divided by an estimate of its
    standard deviation that varies from case to case,
    depending on the leverage of each case's
    predictor values in determining model fit. They
    have a mean of 0 and a standard deviation
    slightly larger than 1.

15
Distance
  • Deleted Residuals
  • The residual for a case that is excluded from the
    calculation of the regression coefficients. It is
    the difference between the value of the dependent
    variable and the adjusted predicted value.
  • Studentized Deleted Residuals
  • It is a studentized residual with the effect of
    the observation deleted from the standard error.
    The residual can be large due to distance,
    leverage, or influence. The mean is 0 and the
    variance is slightly greater than 1.
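A numpy sketch of these residual types, following the SPSS-style definitions above; the function and variable names are illustrative:

```python
import numpy as np

def residual_types(X, y):
    """Compute the residual variants defined above for OLS.
    X must include a column of ones for the constant."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
    h = np.diag(H)                               # leverage values
    e = y - H @ y                                # unstandardized residuals
    mse = e @ e / (n - p)
    zresid = e / np.sqrt(mse)                    # standardized: fixed SE
    sresid = e / np.sqrt(mse * (1 - h))          # studentized: SE varies by case
    dresid = e / (1 - h)                         # deleted (PRESS) residuals
    # Leave-one-out MSE, then the studentized deleted residual
    mse_i = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)
    sdresid = e / np.sqrt(mse_i * (1 - h))
    return e, zresid, sresid, dresid, sdresid
```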

16
Distance-example
  • Open mregression1/example2c.sav.
  • Regress problems on peak, week, and index.
  • Under Statistics, select Estimates, Covariance
    matrix, and Model fit.
  • Save unstandardized predicted values and all
    residuals (unstandardized, standardized,
    studentized, deleted, and studentized deleted).
  • Click OK.

17
Distance-example output
  • Interpret the b's and betas. Compare the betas
    with the correlations.
  • Zero-order correlations
  • Validity coefficients
  • Why is the standard error of estimate different
    from the standard deviation of the unstandardized
    residuals?
  • Note the casewise diagnostics compared to the saved
    values.

18
Leverage (hat diagnostic, "hat diag")
  • Identifies outliers on the X variables.
  • Note that this can detect so-called multivariate
    outliers, that is, cases that are not outliers on
    any one X variable but are outliers on
    combinations of X variables. Example: someone
    who is 60 inches tall and weighs 190 pounds.
  • Guideline: pay attention to cases with centered
    leverage that stands out or is greater than 2k/n
    for large samples or 3k/n for small samples (.04
    in this case). (SPSS prints out the centered
    leverage for each case and the average centered
    leverage across cases.)

19
Leverage (hat diagnostic, "hat diag")
  • Possible leverage values range from a low of
    1/N to a high of 1.0.
  • The mean of the leverage values is k/N.
  • Rerun the regression but save leverage values
    (only).
  • Examine leverage values > .04 (see the sketch
    below).
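A numpy sketch of the centered leverage computation and the 2k/n screen; apart from the cutoffs given on the slides, the names are illustrative:

```python
import numpy as np

def flag_high_leverage(X, small_sample=False):
    """Centered leverage for each case (what SPSS reports), flagged
    against 2k/n for large samples or 3k/n for small samples.
    X holds the k predictors only, without the constant."""
    n, k = X.shape
    Xc = X - X.mean(axis=0)                              # center predictors
    h_centered = np.diag(Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T))
    cutoff = (3 if small_sample else 2) * k / n
    return h_centered, h_centered > cutoff
```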

20
Influence Statistics
  • Cook's D
  • A measure of how much MSResidual would change if
    a particular case were excluded from the
    calculations of the regression coefficients. A
    large Cook's D indicates that excluding a case
    from computation of the regression statistics
    changes the coefficients substantially.
  • Dfbeta(s)
  • The difference in beta value is the change in the
    regression coefficient that results from the
    exclusion of a particular case. A value is
    computed for each term in the model, including
    the constant.
  • Standardized Dfbeta(s)
  • The standardized difference in the beta value: the
    change in the regression coefficient that results
    from the exclusion of a particular case. You may
    want to examine cases with absolute values
    greater than 2/sqrt(N), where N is the number of
    cases.

21
Influence
  • There is no general rule for what constitutes a
    large value of Cook's D.
  • Cook's D > 1 is unusual.
  • Look for cases that have a large Cook's D
    relative to other cases.
  • Rerun the analyses and save Cook's D, dfBetas, and
    standardized dfBetas (only). (A sketch follows.)
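A sketch of these influence statistics using statsmodels; the simulated data stand in for the example dataset and are purely illustrative:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

# Simulated stand-in data: 100 cases, 3 predictors plus a constant
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 3)))
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=100)

infl = OLSInfluence(sm.OLS(y, X).fit())
cooks_d = infl.cooks_distance[0]     # Cook's D for each case
dfbetas = infl.dfbetas               # standardized dfbetas, one per term

# The 2/sqrt(N) screen for standardized dfbetas
flags = np.abs(dfbetas) > 2 / np.sqrt(len(y))
print(cooks_d.max(), flags.any(axis=1).sum())
```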

22
Suggestions for Handling Outliers
  • Check for recording errors.
  • Determine if they are legitimate cases for the
    population you intended to sample.
  • Transform variables? You sacrifice
    interpretation here, and it doesn't help much for
    floor or ceiling effects.
  • Trimming (delete extreme cases)?
  • Winsorizing (assign extreme cases the highest
    value considered reasonable; e.g., if someone
    reports 99 drinks/week and the next heaviest
    drinker is at 50, change the 99 to 50).
  • Run analyses with and without outliers. If the
    conclusions don't change, leave them in. If they
    do change and you take them out, provide
    justification for removal and note how they
    affected the results.
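A minimal sketch of the winsorizing idea; the cap and data come from the slide's drinks/week example, the function name is ours:

```python
import numpy as np

def winsorize_upper(x, cap):
    """One-sided winsorizing: recode values above `cap` down to `cap`."""
    return np.minimum(x, cap)

drinks = np.array([2, 5, 12, 50, 99])
print(winsorize_upper(drinks, 50))   # [ 2  5 12 50 50]
```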

23
Collinearity Among the Predictors
  • Identifying the Source(s) of Collinearity
  • Tolerance
  • Variance Inflation Factor
  • Condition Indices and Variance Proportions
  • Handling Collinearity

24
Collinearity
  • Collinearity
  • We want the predictors to be highly correlated
    with the dependent variable.
  • We do not want the predictors to be highly
    correlated with each other.
  • Collinearity occurs when a predictor is too
    highly correlated with one or more of the other
    predictors.
  • Impact of Collinearity
  • The regression coefficients are very sensitive to
    minor changes in the data.
  • The regression coefficients have large standard
    errors, which lead to low power for the
    predictors.
  • In the extreme case, singularity, you cannot
    calculate the regression equation.

25
Tolerance
  • Tolerance tells us
  • The amount of overlap between the predictor and
    all other remaining predictors.
  • The degree of instability in the regression
    coefficients.
  • Tolerance values less than 0.10 are often
    considered to be an indication of collinearity.

26
Variance Inflation Factor
  • The VIF tells us:
  • The degree to which the standard error of the
    predictor is increased due to the predictor's
    correlation with the other predictors in the
    model.
  • VIF values greater than 10 (or tolerance values
    less than 0.10) are often considered to be an
    indication of collinearity.
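A sketch of computing tolerance and VIF with statsmodels; the simulated predictors, with collinearity deliberately built in, are illustrative only:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated predictors; the third is built to be collinear with the first
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=200)
exog = sm.add_constant(X)            # include the constant, as SPSS does

for j in range(1, exog.shape[1]):    # skip the constant itself
    vif = variance_inflation_factor(exog, j)
    print(f"x{j}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```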

27
Condition Indices and Variance Proportions
  • Taken together, they provide information about
  • whether collinearity is a concern
  • if collinearity is a concern, which predictors
    are too highly correlated
  • Weak Dependencies have condition indices around
    5-10 and two or more variance proportions greater
    than 0.50.
  • Strong Dependencies have condition indices
    around 30 or higher and two or more variance
    proportions greater than 0.50.

28
Collinearity diagnostics
  • Eigenvalue -- the amount of total variation that
    can be explained by one dimension among the
    variables -- when several are close to 0, this
    indicates high multicollinearity. Ideal
    situation: all eigenvalues near 1.
  • Condition index -- the square root of the ratio of
    the largest eigenvalue to each successive
    eigenvalue; > 15 indicates a possible problem and
    > 30 indicates a serious problem with
    multicollinearity.
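A numpy sketch of the condition-index computation just described, following the usual Belsley approach of scaling each column (constant included) to unit length; the function name is illustrative:

```python
import numpy as np

def condition_indices(X):
    """Condition indices for a design matrix X that includes the
    constant column: scale columns to unit length, take the
    eigenvalues of X'X, and compare each to the largest."""
    Xs = X / np.linalg.norm(X, axis=0)               # unit-length columns
    eigvals = np.sort(np.linalg.eigvalsh(Xs.T @ Xs))[::-1]
    return eigvals, np.sqrt(eigvals[0] / eigvals)
```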

29
Collinearity diagnostics (cont.)
  • Variance proportions -- the proportion of each
    variable's variance explained by a given
    dimension -- multicollinearity can be a problem
    when a dimension explains a high proportion of
    variance in more than one variable. The proportions
    of variance for each variable show the damage
    multicollinearity does to the estimation of the
    regression coefficient for each.

30
Example
  • Regress problems on week, peak, and index.
  • Under Statistics, select Collinearity diagnostics.
  • Click OK.
  • Examine tolerance, VIF, and the collinearity
    diagnostics.

31
Collinearity in practice
  • When is collinearity a problem?
  • When you have predictors that are VERY highly
    correlated (> .7).
  • Mention centering, product terms, and
    interactions → interpretation of coefficients.

32
Handling Collinearity
  • Combine the information contained in your
    predictors: linear combinations (e.g., the mean of
    z-scored predictors; see the sketch below), factor
    analysis, SEM.
  • Delete some of the predictors that are too highly
    correlated.
  • Collect additional data, in the hope that the
    additional data will reduce the collinearity.
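A minimal sketch of the "mean of z-scored predictors" option; the function name is ours:

```python
import numpy as np

def zscore_composite(*predictors):
    """Combine collinear predictors into one composite:
    z-score each, then average across predictors case by case."""
    z = [(p - p.mean()) / p.std(ddof=1) for p in predictors]
    return np.mean(z, axis=0)
```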