Title: B AD 6243: Applied Univariate Statistics
1. B AD 6243: Applied Univariate Statistics
- Multiple Regression
- Professor Laku Chidambaram
- Price College of Business
- University of Oklahoma
2. Basics of Multiple Regression
- Multiple regression examines the relationship between one interval/ratio-level variable and two or more interval/ratio (or dichotomous) variables
- As in simple regression, the dependent (or criterion) variable is Y and the other variables are the independent (or predictor) variables Xi
- The intent of the regression model is to find a linear combination of the Xs that best correlates with Y
- The model is expressed as $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$
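A minimal sketch of fitting such a model in Python with statsmodels; the data and the two predictors here are simulated stand-ins, not the course data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: two interval-level predictors and one criterion Y
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))                  # columns play the roles of X1, X2
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=100)

X_design = sm.add_constant(X)                  # adds the beta_0 intercept column
model = sm.OLS(y, X_design).fit()              # estimates beta_0, beta_1, beta_2
print(model.params)                            # fitted coefficients (b-values)
print(model.summary())                         # full regression output
```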
3. A Graphical Representation
Objective: To graphically represent the equation $Y = \beta_0 + \beta_1\,\mathrm{Exp}\ (X_1) + \beta_2\,\mathrm{RlExp}\ (X_2) + \varepsilon$
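With two predictors, the fitted equation is a plane in 3-D. A sketch of one way to draw it with matplotlib; the Exp and RlExp values are simulated, since the data behind the slide's figure are not included.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated stand-ins for the slide's Exp (X1) and RlExp (X2) predictors
rng = np.random.default_rng(0)
exp = rng.uniform(0, 20, 80)
rlexp = rng.uniform(0, 15, 80)
y = 30 + 2.0 * exp + 1.5 * rlexp + rng.normal(scale=5, size=80)

fit = sm.OLS(y, sm.add_constant(np.column_stack([exp, rlexp]))).fit()
b0, b1, b2 = fit.params

# Evaluate the fitted regression plane over a grid of predictor values
gx, gy = np.meshgrid(np.linspace(0, 20, 10), np.linspace(0, 15, 10))
plane = b0 + b1 * gx + b2 * gy

ax = plt.figure().add_subplot(projection="3d")
ax.scatter(exp, rlexp, y)                      # observed points
ax.plot_surface(gx, gy, plane, alpha=0.3)      # fitted plane
ax.set_xlabel("Exp (X1)"); ax.set_ylabel("RlExp (X2)"); ax.set_zlabel("Y")
plt.show()
```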
4. Selecting Predictors
- Rely on theory to inform selection
- Examine the correlation matrix to determine the strength of relationships with Y (see the sketch after this list)
- Use variables based on your knowledge
- Let the computer decide based on the data set
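One quick way to do the correlation-matrix check with pandas; the DataFrame and column names here are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with a criterion Y and candidate predictors
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Y":  rng.normal(size=50),
    "X1": rng.normal(size=50),
    "X2": rng.normal(size=50),
    "X3": rng.normal(size=50),
})

# Pearson correlations; scan the Y column for promising predictors
print(df.corr().round(2))
print(df.corr()["Y"].drop("Y").sort_values(key=abs, ascending=False))
```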
5. Selecting Method of Inclusion
- Enter
- Enter Block
- Stepwise methods:
  - Forward selection
  - Backward elimination
  - Stepwise
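A minimal forward-selection loop, adding at each step the predictor that most improves adjusted R-squared. This is a hand-rolled sketch of the idea, not SPSS's exact entry criterion, and the data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["X1", "X2", "X3", "X4"])
df["Y"] = 2 + 1.5 * df["X1"] - 0.8 * df["X3"] + rng.normal(scale=0.5, size=100)

def adj_r2(predictors):
    """Adjusted R-squared of Y regressed on the given predictor list."""
    X = sm.add_constant(df[predictors])
    return sm.OLS(df["Y"], X).fit().rsquared_adj

selected, remaining = [], ["X1", "X2", "X3", "X4"]
best = -np.inf
while remaining:
    # Try adding each remaining predictor; keep the best improvement
    scores = {p: adj_r2(selected + [p]) for p in remaining}
    candidate = max(scores, key=scores.get)
    if scores[candidate] <= best:
        break                                  # no predictor improves the model
    best = scores[candidate]
    selected.append(candidate)
    remaining.remove(candidate)

print("Selected predictors:", selected, "adjusted R^2:", round(best, 3))
```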
6. What to Look For?
- b-values vs. standardized beta weights (β)
- R represents the correlation between observed and predicted values of Y
- R-squared represents the amount of variance shared between Y and all the predictors combined
- Adjusted R-squared corrects R-squared for the number of predictors relative to the sample size
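These quantities can be read directly off a fitted statsmodels model; a sketch with hypothetical data (the z-scoring shortcut for betas is one common way to get standardized weights, not the only one):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 3))
y = 1 + X @ np.array([0.6, 0.0, -0.4]) + rng.normal(scale=0.7, size=80)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print("R        :", round(np.sqrt(fit.rsquared), 3))  # correlation of observed vs. predicted Y
print("R-squared:", round(fit.rsquared, 3))           # shared variance
print("Adjusted :", round(fit.rsquared_adj, 3))       # penalized for number of predictors

# Standardized beta weights: refit on z-scored variables
Xz = (X - X.mean(0)) / X.std(0)
yz = (y - y.mean()) / y.std()
print("Betas    :", sm.OLS(yz, Xz).fit().params.round(3))
```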
7. First Order Assumptions
- Continuous variables (also see next slide)
- Linear relationships between Y and Xs
- Sufficient variance in values of predictors
- Predictors uncorrelated with external variables
8. Including Categorical Variables
- Dichotomous variables, e.g., Gender
- Coded as 0 or 1
- Dummy variables
- e.g., Political affiliation
- Create d - 1 dummy variables, where d is the number of categories
- So, with four categories, you need three dummy variables (see the sketch after the table)

Category      D1  D2  D3
Democrat       1   0   0
Republican     0   1   0
Libertarian    0   0   1
Other          0   0   0
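pandas can generate the d - 1 coding shown in the table; a sketch with hypothetical respondents. Listing "Other" first makes `drop_first=True` omit it, so Other becomes the all-zeros baseline exactly as in the table above.

```python
import pandas as pd

# Hypothetical respondents with a four-category affiliation variable
df = pd.DataFrame({"party": ["Democrat", "Republican", "Libertarian",
                             "Other", "Democrat", "Other"]})

# List "Other" first so drop_first removes it: Other = all-zeros baseline
df["party"] = pd.Categorical(
    df["party"], categories=["Other", "Democrat", "Republican", "Libertarian"])

dummies = pd.get_dummies(df["party"], prefix="D", drop_first=True)
print(dummies.astype(int))   # d - 1 = 3 columns: D_Democrat, D_Republican, D_Libertarian
```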
9. Second Order Assumptions
- Independence of independent variables
- Equality of variance
- Normal distribution of error terms
- Independence of observations
10. Violations of Assumptions

PROBLEM             DEFINITION                                    DETECTION
Multicollinearity   Predictor variables are highly correlated     High inter-correlations; examine VIFs and tolerances
Heteroskedasticity  Error terms do not have a constant variance   Scatter plot of residuals; split file to examine variances
Outliers            Error terms not normally distributed          Cook's distance; Mahalanobis distance; residual plots
Autocorrelation     Residuals are correlated                      Durbin-Watson statistic near 2 (if < 2, positive correlation; if > 2, negative correlation)
11. Multicollinearity
- High correlations among predictors
- Can result in
- Lower value of R
- Difficulty in judging the relative importance of predictors
- Increased instability of the model
- Possible solutions
- Examine correlation matrices, VIFs and tolerances to judge if predictor(s) need to be dropped (see the sketch below)
- Rely on computer-assisted means
- Other options
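The VIF check can be done with statsmodels; tolerance is simply 1/VIF. The data here are hypothetical, with a third predictor built from the first two to force collinearity.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
x3 = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=100)   # nearly collinear
X = sm.add_constant(np.column_stack([X, x3]))

for i in range(1, X.shape[1]):                 # skip the constant column
    vif = variance_inflation_factor(X, i)
    print(f"X{i}: VIF = {vif:6.1f}  tolerance = {1 / vif:.3f}")
# Common rule of thumb: VIF > 10 (tolerance < 0.1) signals a problem
```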
12. Heteroskedasticity
- Systematic increase or decrease in variance
- Can result in
- Confidence intervals being too wide or narrow
- Unstable estimates
- Possible solutions
- Transform data
- Other options
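A residual scatter plot is the check named on slide 10; the Breusch-Pagan test in statsmodels is a formal version of the same idea. A sketch on simulated data whose error variance grows with X:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 200)
y = 2 + 0.5 * x + rng.normal(scale=0.1 * x)    # error variance grows with x

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Breusch-Pagan: a small p-value rejects constant error variance
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4f}")

# One possible remedy from the slide: transform the data, then re-test
fit_log = sm.OLS(np.log(y), X).fit()
print(f"After log(Y) transform: {het_breuschpagan(fit_log.resid, X)[1]:.4f}")
```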
13. Outliers
- Undue influence of extreme values
- Can result in
- Incorrect estimates and inaccurate confidence intervals
- Possible solutions
- Identify and eliminate value(s), but
- Transform data
- Other options
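Cook's distance (from the detection table on slide 10) can flag influential cases before deciding whether to eliminate or transform; a sketch with one planted outlier and the common 4/n cutoff:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=60)
y = 1 + 2 * x + rng.normal(scale=0.5, size=60)
x[0], y[0] = 4.0, -10.0                        # plant an influential outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance

# Flag cases above the common 4/n rule-of-thumb threshold
threshold = 4 / len(x)
flagged = np.where(cooks_d > threshold)[0]
print("Influential cases:", flagged, "max D =", round(cooks_d.max(), 3))
```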
14. Autocorrelation
- Observations are not independent (typically, observations over time)
- Can result in
- Lower standard error of estimate
- Lower standardized beta values
- Possible solutions
- Search for key missing variables
- Cochrane-Orcutt Procedure
- Other options
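The Durbin-Watson check from the detection table is one line in statsmodels; statsmodels also ships GLSAR, whose iterative fit re-estimates rho and transforms the data in the spirit of the Cochrane-Orcutt procedure (not an identical implementation). A sketch on simulated AR(1) errors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
n = 200
x = np.arange(n) / n
e = np.zeros(n)
for t in range(1, n):                          # AR(1) errors: residuals correlate over time
    e[t] = 0.7 * e[t - 1] + rng.normal(scale=0.3)
y = 1 + 2 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
print("Durbin-Watson:", round(durbin_watson(ols.resid), 2))   # well below 2 here

# Iterative GLSAR fit, in the spirit of Cochrane-Orcutt
gls = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
print("Estimated rho:", round(float(gls.model.rho[0]), 2))
```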
15. Results of Analysis
16. Results of Analysis (contd.)
17. A Graphical Representation