Title: USTPADPDD601
1UST/PAD/PDD601 Applied Quantitative Reasoning
Lecture 8. Multiple Regression Analysis
Sugie Lee, Ph.D., Assistant Professor
Urban Planning, Design and Development Program
Levin College of Urban Affairs
Cleveland State University
2Multiple Regression
- Extends the concept of simple regression
- Includes one criterion (dependent) variable and
several predictor (independent) variables
- Needs an interval-ratio dependent variable and two
or more independent variables (independent
variables can be either dichotomous or interval
level)
- Evaluates the direct effect of a single independent
variable, controlling for the other independent
variables
- Reduces our errors of prediction by using many
predictor variables instead of just one
3Multiple Regression Equation
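Population: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
Estimate: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$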
- Examples
- Salary = f(education, gender, experience, etc.)
- College grade point average = f(high school grade
point average, aptitude test scores, household
income, entrance test scores, etc.)
- Housing sales price ($) = f(square feet, lot size, # of
bedrooms, # of bathrooms, year built, etc.; school
quality, transportation accessibility, etc.; tax
policy, public services, etc.)
- Commuting time = f( ___, ___, ___, ___, ___ )
- Poverty rate = f( ___, ___, ___, ___, ___, ... )
4Multiple Regression Equation (cont.)
- $\hat{Y}$ is the expected value of Y
- $X_1, X_2, \ldots, X_k$ are the independent variables
- $b_0$ is the y-intercept (the expected value of Y when all
independent variables are zero)
- $b_1, b_2, \ldots, b_k$ are the regression coefficients (the
expected change in Y from a one-unit increase in
$X_i$, holding all the other Xs constant)
5Multiple Regression Equation (Cont.)
6Multiple Regression Analysis (Example)
SPSS data: opm91.sav
7Multiple Regression Analysis (Example)
- The first regression coefficient suggests that,
on average, male employees earn $7,983 more than
female employees, holding education constant
- The second regression coefficient suggests that,
on average, employees earn $3,076 more than
those one education grade below them, holding gender
constant
- The y-intercept is the expected salary of a female
employee with zero education
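The reading of the two slopes can be checked on a small synthetic data set. The sketch below does not use opm91.sav: it builds noise-free salaries from the two slopes quoted on the slide (7,983 for male, 3,076 per unit of education) plus an assumed intercept of 10,000, which is a placeholder because the slide text does not report the intercept. It then recovers the coefficients by ordinary least squares with NumPy.

```python
import numpy as np

# Synthetic illustration only: NOT the opm91.sav data.
# The two slopes come from the slide; the intercept is an assumed placeholder.
ASSUMED_INTERCEPT = 10_000.0
B_MALE = 7_983.0
B_EDUC = 3_076.0

# Every combination of gender (0 = female, 1 = male) and education 8..20.
male = np.repeat([0.0, 1.0], 13)
educ = np.tile(np.arange(8.0, 21.0), 2)
salary = ASSUMED_INTERCEPT + B_MALE * male + B_EDUC * educ

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones_like(male), male, educ])
coefs, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, b_male, b_educ = coefs

# Holding education fixed at 12, the male/female gap equals the male coefficient.
gap_at_12 = (b0 + b_male + b_educ * 12) - (b0 + b_educ * 12)
print(round(b_male), round(b_educ), round(gap_at_12))
```

Because education appears on both sides of the subtraction, the gap is the same at every education level. That is what "holding education constant" means in this model.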
8Multiple Regression Analysis (Example)
- The coefficients of the standardized predictor
variables are referred to as beta coefficients or
beta weights
- The second standardized coefficient (.458, education)
contributes more to the dependent
variable than the gender variable does
- The beta coefficients can inform us
only of the relative importance of the various
predictor variables, not their absolute
contributions
- The beta coefficients will
change if we add other independent variables
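A beta weight can be obtained from an unstandardized slope by rescaling: beta_j = b_j × sd(x_j) / sd(y). The sketch below illustrates this with hypothetical data (not the course data) in which one predictor is measured on a much larger scale than the other, so the raw slopes and the beta weights rank the predictors differently. The function name beta_weights is made up for this example.

```python
import numpy as np

def beta_weights(X, y):
    """Return (unstandardized slopes, standardized beta weights) from OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes = coefs[1:]
    # beta_j = b_j * sd(x_j) / sd(y)
    betas = slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return slopes, betas

# Hypothetical data: x2 is measured on a much larger scale than x1.
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 500)
x2 = rng.normal(0, 100, 500)
y = 2.0 * x1 + 0.05 * x2 + rng.normal(0, 1, 500)

slopes, betas = beta_weights(np.column_stack([x1, x2]), y)
print("slopes:", slopes)
print("betas: ", betas)
```

The raw slope on x1 is larger only because of its units. Once both predictors are put on a standard-deviation scale, x2 has the larger beta weight.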
9Multiple Regression Analysis (Example)
- R-square = .373 indicates that 37.3% of the
variance of salary is predictable from the two
independent variables: gender and education.
- If the independent variables are uncorrelated with
each other, the multiple R-square is the sum
of the individual r-squares of the independent variables.
- If the independent variables are correlated with
each other, the sum of the individual r-squares will
generally be greater than the multiple R-square, since the
independent variables duplicate part of the
predictive power contained in one another
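The two cases can be compared numerically. The sketch below uses made-up data: x1 and x2 are constructed to be exactly uncorrelated in the sample, while x3 = x1 + x2 overlaps with x1. The helper r_squared is a hypothetical name for this illustration.

```python
import numpy as np

def r_squared(X, y):
    """R-square of an OLS regression of y on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    return 1.0 - residuals.var() / y.var()

rng = np.random.default_rng(1)
n = 200
# x1 and x2 are built to be exactly uncorrelated in the sample.
x1 = np.tile([-1.0, -1.0, 1.0, 1.0], n // 4)
x2 = np.tile([-1.0, 1.0, -1.0, 1.0], n // 4)
noise = rng.normal(0, 0.5, n)

# Case 1: uncorrelated predictors.
y_a = x1 + 2.0 * x2 + noise
sum_individual_a = r_squared(x1, y_a) + r_squared(x2, y_a)
multiple_a = r_squared(np.column_stack([x1, x2]), y_a)

# Case 2: x3 overlaps with x1 (sample correlation about .71).
x3 = x1 + x2
y_b = x1 + x3 + noise
sum_individual_b = r_squared(x1, y_b) + r_squared(x3, y_b)
multiple_b = r_squared(np.column_stack([x1, x3]), y_b)

print(sum_individual_a, multiple_a)  # equal up to rounding
print(sum_individual_b, multiple_b)  # the sum exceeds the multiple R-square
```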
10Multiple Regression Analysis (Example)
- The adjusted R-square corrects for an upward bias in
R-square when the model has a small sample size
and many predictors (independent variables).
As the sample size increases, the adjustment becomes
smaller
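The slide text does not show the formula, but the usual definition is adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - k - 1), with n cases and k predictors. The sketch below applies it to the R-square of .373 with two predictors from the earlier example. The sample sizes are assumed values chosen only to show how the adjustment shrinks as n grows.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-square for n cases and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# R-square of .373 with two predictors, at several assumed sample sizes.
for n in (10, 50, 1000):
    print(n, round(adjusted_r_squared(0.373, n, 2), 3))
```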
11Multiple Regression Analysis (Example)
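- SSR: regression (explained) sum of squares
- SSE: error (residual) sum of squares
- SST: total sum of squares (SST = SSR + SSE)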
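The three sums of squares can be computed directly from a fitted model. The sketch below uses hypothetical data to check that the total splits into an explained part and a residual part, and that R-square is the explained share.

```python
import numpy as np

# Hypothetical data for illustration only.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=100)

design = np.column_stack([np.ones(100), x1, x2])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coefs

sst = np.sum((y - y.mean()) ** 2)      # total variation in y
ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the regression
sse = np.sum((y - y_hat) ** 2)         # variation left in the residuals
r2 = ssr / sst

print(sst, ssr + sse, r2)
```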
12Importance of the Predictor Variables
- The multiple correlation coefficient R tells
us the correlation between the weighted sum of
the predictor (independent) variables and the
criterion (dependent) variable
- The squared multiple correlation coefficient (R-square)
tells us what proportion of the variance of
the criterion variable is accounted for by all
the predictor variables combined. For example, a
multiple R-square of .7 indicates that 70% of the
variance of the dependent variable is accounted
for by a given set of independent variables
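The "weighted sum of the predictors" is the set of fitted values, with the regression coefficients as weights. A minimal sketch on hypothetical data shows that the correlation between the fitted values and the criterion, once squared, matches the proportion of variance accounted for.

```python
import numpy as np

# Hypothetical data for illustration only.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.5, -0.5, 0.8]) + rng.normal(size=150)

design = np.column_stack([np.ones(150), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
weighted_sum = design @ coefs  # fitted values = weighted sum of the predictors

multiple_r = np.corrcoef(weighted_sum, y)[0, 1]
r2_from_variance = 1.0 - np.var(y - weighted_sum) / np.var(y)

print(multiple_r, multiple_r ** 2, r2_from_variance)
```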
13Multiple Regression Dummy and Interval Variables
- Reference group: females with no education
- The y-intercept (-21,690) is the expected income of
a female with no education
- The coefficient (13,550) on male is the expected
difference in income between males and females,
holding education constant. That is, the expected
income of a male is $13,550 higher than that of
a female with the same education
- The coefficient (3,350) on educ is the increase
in expected income with a one-year increase
in education, holding gender constant. That is,
each additional year of education raises
income by $3,350 for both men and women.
14Multiple Regression Dummy and Interval Variables
(cont.)
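If male = 1 (male): Income = -21,690 + 13,550 + 3,350 × educ = -8,140 + 3,350 × educ
If male = 0 (female): Income = -21,690 + 3,350 × educ
[Figure: income (vertical axis) against education (horizontal axis), with separate lines for male and female]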
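As a worked example, the coefficients from the previous slide can be plugged into a small function. The education levels (12 and 16 years) are assumed values picked for illustration, and the function name is hypothetical.

```python
# Coefficients quoted on the previous slide.
B0, B_MALE, B_EDUC = -21_690, 13_550, 3_350

def expected_income(male, educ):
    """Expected income for male (1) or female (0) with `educ` years of education."""
    return B0 + B_MALE * male + B_EDUC * educ

# Assumed example values: 12 and 16 years of education.
female_12 = expected_income(0, 12)
male_12 = expected_income(1, 12)
male_16 = expected_income(1, 16)

print(female_12, male_12, male_12 - female_12)
print(male_16 - male_12)  # four more years of education
```

The male-female gap is the same at every education level, so the two lines are parallel. Only the intercept differs.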
15Multiple Regression Dummy Variables
- Reference group: White population
- The y-intercept (41.91) is the expected working hours
of Whites
- The coefficient (.33) on Asian is the expected
difference in working hours between Asians and
Whites. That is, Asians work .33 hours more
than Whites
- The coefficient (-.94) on oth_minority is the
expected difference in working hours between
other minorities and Whites. That is, other
minorities work .94 hours less than
Whites.
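When a model contains only dummy variables for one categorical factor, the intercept is the reference group's mean and each coefficient is that group's mean minus the reference mean. The sketch below checks this on a tiny set of hypothetical hours (not the course data). The dummy names follow the slide, and White is the omitted reference group.

```python
import numpy as np

# Hypothetical weekly hours for three groups (not the course data).
hours = {
    "white": np.array([40.0, 42.0, 44.0, 41.0]),
    "asian": np.array([43.0, 41.0, 42.0]),
    "oth_minority": np.array([39.0, 42.0, 41.0]),
}

y = np.concatenate(list(hours.values()))
asian = np.concatenate([np.zeros(4), np.ones(3), np.zeros(3)])
oth_minority = np.concatenate([np.zeros(4), np.zeros(3), np.ones(3)])

# White is the reference group, so it gets no dummy of its own.
design = np.column_stack([np.ones(len(y)), asian, oth_minority])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b_asian, b_oth = coefs

print(b0, hours["white"].mean())
print(b_asian, hours["asian"].mean() - hours["white"].mean())
print(b_oth, hours["oth_minority"].mean() - hours["white"].mean())
```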
16Multiple Regression Interaction Variable
- Reference group: females with no education
- The y-intercept (-12,050) is the expected income of
a female with no education
- The coefficient (-4,410) on male is the expected
difference in income between males and females
when education is zero (no education). That is,
the expected income of a male is 4,410 lower than
that of a female when education is zero
- The coefficient (2,640) on educ is the increase
in expected income with a one-year increase
in education when male is zero (female). In other
words, an additional year of education for a female
raises income by 2,640
- The coefficient (1,310) on maleeduc
(interaction variable, male × educ) is the additional
income gain for males from each additional year of
education, over and above the gain for females.
17Multiple Regression Interaction Variable (cont.)
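If male = 1 (male): Income = (-12,050 - 4,410) + (2,640 + 1,310) × educ = -16,460 + 3,950 × educ
If male = 0 (female): Income = -12,050 + 2,640 × educ
[Figure: income (vertical axis) against education (horizontal axis), with separate lines for male and female]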
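With an interaction term, the male-female gap depends on education. The sketch below plugs in the four coefficients from the previous slide. The function names and the education level of 12 years are assumptions for illustration.

```python
# Coefficients quoted on the previous slide.
B0, B_MALE, B_EDUC, B_MALE_EDUC = -12_050, -4_410, 2_640, 1_310

def expected_income(male, educ):
    """Expected income from the model with a male x educ interaction."""
    return B0 + B_MALE * male + B_EDUC * educ + B_MALE_EDUC * male * educ

def male_gap(educ):
    """Male minus female expected income at a given education level."""
    return expected_income(1, educ) - expected_income(0, educ)

male_slope = expected_income(1, 1) - expected_income(1, 0)
female_slope = expected_income(0, 1) - expected_income(0, 0)

# Education level at which the two lines cross (the gap is zero).
break_even_educ = -B_MALE / B_MALE_EDUC

print(male_slope, female_slope)
print(male_gap(0), male_gap(12))
print(round(break_even_educ, 2))
```

The two lines are no longer parallel. The male line starts lower but rises faster, so the gap changes sign at a low level of education.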
18Multiple Regression Polynomial Model
- The relationship between the dependent and
independent variables is curvilinear rather than
linear. Adding a squared term is the most
common way to model a curvilinear relationship.
- The y-intercept (-29,660) is the expected income of
someone of age zero
- We cannot interpret the coefficient on age as the
effect of an increase in age while holding age
squared constant, because age squared changes
whenever age changes.
19Multiple Regression Polynomial Model (cont.)
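Because age and age squared move together, the effect of one more year of age is read from both coefficients at once: b1 + 2 × b2 × age. The sketch below uses hypothetical coefficients on age and age squared, since only the intercept (-29,660) appears in the slide text. It fits the quadratic to noise-free synthetic data and finds the age at which expected income peaks.

```python
import numpy as np

# Hypothetical coefficients for illustration; only the intercept (-29,660)
# appears on the slide.
B0, B_AGE, B_AGE_SQ = -29_660.0, 3_000.0, -30.0

age = np.arange(18.0, 66.0)
income = B0 + B_AGE * age + B_AGE_SQ * age ** 2

# Fit income = b0 + b1*age + b2*age^2 by including age squared as a predictor.
design = np.column_stack([np.ones_like(age), age, age ** 2])
coefs, *_ = np.linalg.lstsq(design, income, rcond=None)
b0, b1, b2 = coefs

def marginal_effect(a):
    """Change in expected income per year of age, evaluated at age a."""
    return b1 + 2.0 * b2 * a

# Age at which the fitted curve turns from rising to falling.
peak_age = -b1 / (2.0 * b2)

print(marginal_effect(30), marginal_effect(60), peak_age)
```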
20Multicollinearity
- Multicollinearity means co-dependence among
independent variables
- When multicollinearity is severe, it leads to
unreasonable coefficient estimates and large
standard errors
- The common method for detecting multicollinearity
is the variance inflation factor (VIF) or
tolerance (1/VIF); a rule of thumb for
multicollinearity is VIF > 10.
- When you detect multicollinearity among your
independent variables, you may need to drop some
variables
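The VIF for a predictor is 1/(1 - R-square_j), where R-square_j comes from regressing that predictor on all the other predictors. The sketch below is a NumPy illustration on made-up data in which x2 is nearly a copy of x1. The function name vif is hypothetical, and statistical packages report the same quantity directly.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        # Regress predictor j on the remaining predictors (with intercept).
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coefs, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        resid = X[:, j] - design @ coefs
        r2_j = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

# Made-up predictors for illustration.
rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # nearly a copy of x1
x3 = rng.normal(size=300)                  # unrelated to the others

vifs = vif(np.column_stack([x1, x2, x3]))
tolerance = 1.0 / vifs
print(vifs)
print(tolerance)
```

Here x1 and x2 exceed the VIF > 10 rule of thumb while x3 stays near 1.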
21Autoregression
- Predicts values of the dependent variable based
on values of the same dependent variable obtained
earlier in time
- Example: predict the price of Stock A on a given
day based on its price the previous day
- Useful in identifying dependencies among data
collected sequentially, which we may wish to
extract before submitting the data to further
analysis
- Useful for projecting time series data such as
crime rates, fertility rates, etc.
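In its simplest form, autoregression is an ordinary regression of a series on its own one-period lag. The sketch below simulates a hypothetical series (the values 2, 0.8 and the noise level are assumptions), fits the lag regression, and projects one step ahead.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical series: today's value = 2 + 0.8 * yesterday's value + noise.
n = 500
series = np.empty(n)
series[0] = 10.0
for t in range(1, n):
    series[t] = 2.0 + 0.8 * series[t - 1] + rng.normal(scale=0.5)

# Regress the series on its own one-period lag.
y_today = series[1:]
y_yesterday = series[:-1]
design = np.column_stack([np.ones(n - 1), y_yesterday])
coefs, *_ = np.linalg.lstsq(design, y_today, rcond=None)
intercept, lag_coef = coefs

# One-step-ahead projection from the last observed value.
next_value = intercept + lag_coef * series[-1]
print(intercept, lag_coef, next_value)
```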
22Multiple Regression Modeling Process in Planning
and Policy Analysis
- Research Background
- Hypothesis
- Data Preparation and Variable Selection
- Descriptive Analysis
- Multiple Regression Analysis
- Interpretation
- Limitations
- Policy Implications
23Multiple Regression Modeling Process Example
- Research Background
- The purpose of this research is to investigate
the economic impact of publicly owned street
trees on residential property values through
case studies of Cleveland's neighborhoods.
Quantifying the positive economic impact of trees
on property values should encourage planners
and decision makers to provide adequate program
funding to maintain a healthy and vigorous tree
resource.
- Hypothesis
- The city's street trees have a positive economic
impact on single-family property values at the
neighborhood level, controlling for neighborhood
characteristics and housing characteristics
24Multiple Regression Modeling Process Example
(cont.)
- Data Preparation and Variable Selection
Figure 1. Example of Case Study Blocks, Belden
Avenue, City of Cleveland: With Street Trees
(A) and Without Street Trees (B)
Figure 2. Sold Houses (1994-2005)
Multiple Regression Model: y (property sales
value) = β0 + βx (housing characteristics) + βy
(neighborhood impacts) + βz (street tree)
Housing characteristics: sq. ft. of living space,
# of bedrooms, # of bathrooms, year built, and year
sold
Neighborhood characteristics
Street trees: dummy variable for with tree and without
tree
25Multiple Regression Modeling Process Example
(cont.)
- Analysis (Descriptive Analysis and Multiple
Regression)
Data: tree_regression.sav on the class
website
Dependent variable: convamt (sale
price)
Independent variables: livatot, bedrooms,
baths, ryrbuilt, transyr, tree
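The model specification can be sketched outside SPSS as well. The example below does not read tree_regression.sav. It generates synthetic stand-ins for the variables named above, with assumed effect sizes (including an assumed tree effect of 5,000) chosen only so that the regression has something to recover. Every number in it is hypothetical and says nothing about the actual Cleveland results.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
# Synthetic stand-ins for the variables named on the slide.
livatot = rng.uniform(800, 2500, n)                    # living space, sq. ft.
bedrooms = rng.integers(2, 6, n).astype(float)
baths = rng.integers(1, 4, n).astype(float)
ryrbuilt = rng.integers(1900, 1960, n).astype(float)   # year built
transyr = rng.integers(1994, 2006, n).astype(float)    # year sold
tree = rng.integers(0, 2, n).astype(float)             # 1 = with street trees

# Assumed "true" effects, chosen only so the example has something to recover.
ASSUMED_TREE_EFFECT = 5_000.0
convamt = (20_000 + 30 * livatot + 2_000 * bedrooms + 3_000 * baths
           + 100 * (ryrbuilt - 1900) + 1_500 * (transyr - 1994)
           + ASSUMED_TREE_EFFECT * tree + rng.normal(0, 4_000, n))

# Regress sale price on the housing characteristics and the tree dummy.
predictors = np.column_stack([livatot, bedrooms, baths, ryrbuilt, transyr, tree])
design = np.column_stack([np.ones(n), predictors])
coefs, *_ = np.linalg.lstsq(design, convamt, rcond=None)

names = ["intercept", "livatot", "bedrooms", "baths", "ryrbuilt", "transyr", "tree"]
estimates = dict(zip(names, coefs))
for name in names[1:]:
    print(f"{name:>9}: {estimates[name]:10.1f}")
```

The coefficient on tree is read like the male dummy earlier: the expected difference in sale price between houses with and without street trees, holding the housing characteristics constant.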
26Multiple Regression Modeling Process Example
(cont.)
- Output, Interpretation, Limitations, and Policy
Implications