Title: Regression Models with Nonlinear Transformations
1. Regression Models with Nonlinear Transformations
Topics
- Motivational Example
- Modeling Curvature with Square Term
- Log Transformations
- Interpreting Models with Log Transforms
- Implementing Transforms in StatTools
2. Problem Scenario
- A restaurant chain wants to investigate the
relationship between shop revenue (000s) and the
median household income (000s) in the
neighborhood
- Is there a better fit to the data than a linear
model?
3-5. Exploring the Best Model Fit to Data with Excel
6. Modeling Curvature with Quadratic (square) Term
- The quadratic fit is the best, with the highest
R-squared of the models considered (81.3%)
- The main downside to a quadratic regression
equation is that there is no easy interpretation
of the coefficients of the linear and squared
terms for median income
7. Modeling Curvature with Quadratic (square) Term
- We can say the terms in the equation combine to
explain the nonlinear relationship between
revenue and median income
8. Modeling Curvature with Quadratic (square) Term
- Note that the coefficient of Income Sqr is negative,
which models the downward bend of the parabola.
This produces decreasing marginal revenue,
where every extra unit of median household income
is associated with a smaller increase in revenue
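
The decreasing marginal revenue can be illustrated with a small sketch. The coefficients below are hypothetical placeholders, not the values fitted in the slides; the only point is that a negative squared-term coefficient makes the slope of the curve shrink as income grows.

```python
def marginal_revenue(income, b_income, b_income_sqr):
    """Slope of a + b_income*income + b_income_sqr*income**2 at a given income."""
    return b_income + 2.0 * b_income_sqr * income


# Hypothetical coefficients for illustration only (not taken from the slides).
B_INCOME = 60.0
B_INCOME_SQR = -0.4  # negative: the parabola bends downward

incomes = [20, 40, 60]  # median household income (000s)
slopes = [marginal_revenue(x, B_INCOME, B_INCOME_SQR) for x in incomes]

# Income level at which the fitted parabola peaks (slope reaches zero).
turning_point = -B_INCOME / (2.0 * B_INCOME_SQR)

for x, s in zip(incomes, slopes):
    print(f"income={x}: one more unit of income adds about {s:.1f} to revenue")
```

Beyond the turning point a quadratic fit would predict falling revenue, so such a model is usually read only within the range of the observed data.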
9. Modeling Curvature: Logarithmic X Variable
- The log transform model is the next best fit to
the data, with an R-squared of 76.5%, although the
simple linear model is not much worse, with an
R-squared of 73.1%
- The log model is easier to interpret than the
quadratic model.
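
A comparison of this kind can be sketched outside Excel as follows. The data are synthetic (not the restaurant data), and the helper names `fit_line` and `r_squared` are illustrative; because both models predict the same untransformed Y, their R-squared values can be compared directly.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(x, y):
    """R-squared of the least-squares line of y on x."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic data (NOT the restaurant data): revenue follows a log curve plus noise.
income = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
noise = [15, -20, 10, 25, -15, 5, -10, 20, -25, -5]
revenue = [500 * math.log(x) - 600 + e for x, e in zip(income, noise)]

r2_linear = r_squared(income, revenue)
r2_log_x = r_squared([math.log(x) for x in income], revenue)

print(f"linear X: R-squared = {r2_linear:.3f}")
print(f"log X:    R-squared = {r2_log_x:.3f}")
```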
10. Interpretation of Log X Slope Coefficient
- In general, if the log transformed equation is
Y = a + b Log(X), where Log is the natural logarithm:
- For a 1% increase in X, Y is expected to increase
by approximately b/100 units
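
The b/100 rule can be checked with a short derivation, assuming Log denotes the natural logarithm (the constant would differ for base 10). Raising X by 1% means replacing X with 1.01 X, and the intercept cancels in the difference:

```latex
\begin{aligned}
\Delta Y &= \bigl(a + b\ln(1.01\,X)\bigr) - \bigl(a + b\ln X\bigr) \\
         &= b\,\ln(1.01) \approx 0.00995\,b \approx \frac{b}{100}
\end{aligned}
```

The result does not depend on X, which is why the same rule applies at every income level.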
11. Interpretation of Log X Regression Slope
Coefficient
- Fitted model: Revenue regressed on Log(Income),
with slope 558.17 and a constant term of magnitude
630.52
- When median income increases by 1%, the restaurant
can expect an increase in revenue of about 5.5817
(000s), or 5,582
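
The arithmetic behind this figure can be reproduced in a few lines. The slope 558.17 is taken from the slide; the comparison with the exact value assumes the model uses the natural logarithm.

```python
import math

b = 558.17  # slope on Log(Income) from the slide

# Rule of thumb: a 1% increase in X changes Y by about b/100.
approx_change = b / 100.0

# Exact change implied by the model for a 1% increase (natural log assumed).
exact_change = b * math.log(1.01)

print(f"rule of thumb: {approx_change:.4f} (000s)")
print(f"exact:         {exact_change:.4f} (000s)")
```

The rule of thumb slightly overstates the exact change, but the two agree to within about half a percent for a 1% step.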
12. Interpretation of Log X Regression Slope
Coefficient
- Note that for larger values of median income, a
1% increase represents a larger absolute
increase. But each such 1% increase entails the
same 5,582 increase in revenue. This is another
way of describing the decreasing marginal revenue
property observed in the plot of the data.
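
A short sketch makes this property concrete. The slope comes from the slide; the income levels are hypothetical, the function name `revenue_change` is illustrative, and the natural logarithm is assumed.

```python
import math

B_LOG_INCOME = 558.17  # slope on Log(Income) from the slide


def revenue_change(income, pct_increase=1.0, b=B_LOG_INCOME):
    """Change in predicted revenue (000s) when income rises by pct_increase percent.

    The constant term cancels in the difference, so it is not needed here.
    """
    new_income = income * (1.0 + pct_increase / 100.0)
    return b * (math.log(new_income) - math.log(income))


incomes = [30.0, 60.0, 90.0]  # hypothetical income levels (000s)
absolute_steps = [x * 0.01 for x in incomes]  # size of a 1% step in income
changes = [revenue_change(x) for x in incomes]

for x, step, change in zip(incomes, absolute_steps, changes):
    print(f"income={x:.0f}: +{step:.2f} income -> +{change:.3f} revenue (000s)")
```

The income step grows with the income level while the revenue change stays the same, so each additional unit of income is worth less at higher incomes.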
13. Other Log Transformations in Regression Models
- Whenever the response (Y) variable in a regression
is highly skewed to the right, a log transform
often helps to normalize the distribution.
- This is often done for salary data
- E.g. regression of CEO salary against company
profit
14. Interpretation of X Variable Coefficient in
Regression Models with Log Y
- In general, if the log transformed equation is
Log(Y) = a + b X, then, with b expressed as a
percent (100 × b):
- For a 1 unit increase in X, Y is expected to
increase by approximately b%
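
How good the approximation is depends on the size of b. The sketch below compares the rule of thumb with the exact percent change, assuming Log is the natural logarithm; the slopes are hypothetical and the function names are illustrative.

```python
import math


def approx_pct_change(b):
    """Rule of thumb: a 1 unit increase in X changes Y by about 100*b percent."""
    return 100.0 * b


def exact_pct_change(b):
    """Exact percent change in Y implied by Log(Y) = a + b*X (natural log)."""
    return 100.0 * (math.exp(b) - 1.0)


# Hypothetical slopes for illustration; none of these come from the slides.
for b in (0.02, 0.05, 0.30):
    print(f"b={b}: approx {approx_pct_change(b):.1f}%, exact {exact_pct_change(b):.2f}%")
```

For small slopes the two figures are nearly identical; for larger slopes the exact figure is noticeably higher, so the rule of thumb is best reserved for small b.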
15. Comparing R-squared and Se for Regression Models
with Log Y
- Since the Y variable in a log transformed model
of the form Log(Y) = a + b X is converted to a
log scale, R-squared and Se (standard error of
estimate) also measure variation on a log scale
and cannot be compared with R-squared and Se for
a model of the untransformed Y
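
One possible way to put a log Y model on the same footing as a regular Y model is to back-transform its fitted values and recompute the fit measures in original units. The sketch below does this on synthetic data; it is not a procedure described in the slides, the helper names are illustrative, and exponentiating a fitted log value is a simplification (it estimates a typical value of Y rather than its mean).

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(y, fitted):
    """1 - SSE/SST, measured in whatever units y and fitted share."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def std_error(y, fitted, n_params=2):
    """Standard error of estimate: sqrt(SSE / (n - number of coefficients))."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return math.sqrt(sse / (len(y) - n_params))


# Synthetic data (not from the slides): Log(Y) is linear in X plus small noise.
x = list(range(1, 11))
noise = [0.05, -0.08, 0.03, 0.10, -0.06, 0.02, -0.04, 0.07, -0.09, 0.01]
y = [math.exp(1.0 + 0.3 * xi + e) for xi, e in zip(x, noise)]
log_y = [math.log(v) for v in y]

a, b = fit_line(x, log_y)
fitted_log = [a + b * xi for xi in x]
fitted_orig = [math.exp(f) for f in fitted_log]  # back-transformed fitted values

se_log_scale = std_error(log_y, fitted_log)   # in log units
se_orig_scale = std_error(y, fitted_orig)     # in units of Y
r2_log_scale = r_squared(log_y, fitted_log)
r2_orig_scale = r_squared(y, fitted_orig)

print(f"log scale:      R-squared={r2_log_scale:.4f}, Se={se_log_scale:.4f}")
print(f"original scale: R-squared={r2_orig_scale:.4f}, Se={se_orig_scale:.4f}")
```

The two Se values differ by orders of magnitude simply because they are in different units, which is the reason the log-scale figures cannot be set against those of a regular Y model.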
16. Creating Log Variables in StatTools
- Name the data set in the usual way
- Place the cursor anywhere in the spreadsheet and
click on the Data Utilities icon (3rd from left)
- Select Transform and, by clicking, place a check
in the box next to the variable(s) to be
transformed
17. Creating Log Variables in StatTools
- Accept the default log function in the
transformation box, as well as the other defaults,
then click OK
- Click Yes when StatTools asks whether you wish to
continue and insert a new column
- StatTools will insert the new column with the log
transformed variable next to your data
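
For readers working outside StatTools, the same transformation can be sketched in plain Python. The rows below are made-up values, the function `add_log_column` is illustrative, and the new column name `Log(Income)` is an assumption rather than the name StatTools itself generates.

```python
import math


def add_log_column(rows, column):
    """Add a natural-log version of `column` to every row; return the new name."""
    new_name = f"Log({column})"  # illustrative naming, not StatTools output
    for row in rows:
        value = row[column]
        if value <= 0:
            raise ValueError(f"cannot take the log of non-positive value {value}")
        row[new_name] = math.log(value)
    return new_name


# Made-up rows for illustration (income and revenue in 000s).
data = [
    {"Income": 35.0, "Revenue": 1350.0},
    {"Income": 52.0, "Revenue": 1580.0},
    {"Income": 78.0, "Revenue": 1800.0},
]

log_name = add_log_column(data, "Income")
for row in data:
    print(row["Income"], round(row[log_name], 4))
```

The check for non-positive values matters because the logarithm is undefined at zero and below, so such rows need separate handling before the transform.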