Title: Regression
1. Regression

2. The Formula for a Straight Line
- Only one possible straight line can be drawn once the slope and Y intercept are specified
- The formula for a straight line is Y = bX + a
- Y = the calculated value for the variable on the vertical axis
- a = the intercept
- b = the slope of the line
- X = a value for the variable on the horizontal axis
- Once this line is specified, we can calculate the corresponding value of Y for any value of X entered
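A minimal sketch of this calculation in Python; the intercept and slope values are made up purely for illustration:

```python
# Straight-line prediction: Y = bX + a
def predict(x, a, b):
    """Return the predicted Y for a given X, intercept a, and slope b."""
    return b * x + a

# Illustrative values (not from the slides): intercept 2, slope 0.5
a, b = 2.0, 0.5
for x in (0, 10, 20):
    print(x, predict(x, a, b))   # e.g. X = 10 gives Y = 0.5*10 + 2 = 7.0
```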
3. The Line of Best Fit
- Real data do not conform perfectly to a straight line
- The best-fit straight line is the one that minimizes the amount of variation of the data points from the line
- Note that this is a key idea: you get to choose how you want to minimize some estimate of variability about a regression line
- The typical approach is the least squares method
- The equation for this line can be used to predict or estimate an individual's score on Y on the basis of his or her score on X
4. Least Squares Modeling
- When the relation between variables is expressed in this manner, we call the relevant equation(s) mathematical models
- The intercept and weight values are called the parameters of the model
- We'll assume that our models are causal models, such that the variable on the left-hand side of the equation is being caused by the variable(s) on the right side
5. Terminology
- The values of Y in these models are often called predicted values, sometimes abbreviated as Y-hat or Ŷ
- They are the values of Y that are implied or predicted by the specific parameters of the model
6. Parameter Estimation
- In estimating the parameters of our model, we are trying to find a set of parameters that minimizes the error variance. In other words, we want the sum of the squared residuals to be as small as it possibly can be (see the sketch below)
- The process of finding this minimum value is called least-squares estimation
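A small Python sketch of the idea, using made-up data: the closed-form least-squares estimates give a smaller sum of squared residuals than any other candidate pair of parameters.

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(a, b):
    """Sum of squared residuals for intercept a and slope b."""
    residuals = y - (a + b * x)
    return np.sum(residuals ** 2)

# Least-squares estimates via the usual closed-form solution
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

# Any other (a, b) pair gives a larger sum of squared residuals
print(sse(a_hat, b_hat), sse(a_hat + 0.5, b_hat), sse(a_hat, b_hat + 0.1))
```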
7. Least-squares estimation
8. Estimates of a and b
- Estimating the slope (the regression coefficient)
- Estimating the Y intercept
- These calculations ensure that the regression line passes through the point on the scatterplot defined by the means of X and Y
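The estimation formulas themselves did not survive extraction; for reference, the standard least-squares expressions this slide appears to describe are:

```latex
b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```

Because a = Ȳ − bX̄, the fitted line necessarily passes through the point (X̄, Ȳ).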
9. Relationship to r
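The body of this slide was not extracted; the standard relationship between the slope and the correlation, which the next slide relies on, is:

```latex
b = r \,\frac{s_Y}{s_X}
```

So when X and Y are both standardized (s_X = s_Y = 1), the slope equals r.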
10. Standardized regression coefficient
- The standardized slope is often given in output, and will have added usefulness within multiple regression
- When normally distributed scores are changed into Z scores, the mean is 0 and the standard deviation is 1
- Referring to our previous formula, r would then be equal to the slope, and interpreted as: 1 SD unit of change in X leads to b SD units of change in Y
11. What can the model explain?
- Total variability in the dependent variable (about its observed mean) comes from two sources
- Variability predicted by the model, i.e. what variability in the dependent variable is due to the independent variable
  - How far off our predicted values are from the mean of Y
- Error or residual variability, i.e. variability not explained by the independent variable
  - The difference between the predicted values and the observed values
s²(Y) = s²(Ŷ) + s²(Yᵢ − Ŷᵢ)
Total variance = predicted variance + error variance
12. R-squared: the coefficient of determination
- The square of the correlation, r², is the fraction of the variation in the values of Y that is explained by the regression of Y on X
- Conceptually:
  R² = variance of predicted values of Y / variance of observed values of Y
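A short numeric check of the variance decomposition and of R² as the ratio of predicted to observed variance, using made-up data:

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2.0, 4.1, 5.8, 8.3, 9.6, 12.2])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x                      # predicted values

total_var     = np.var(y)              # total variance of Y
predicted_var = np.var(y_hat)          # variance of the predicted values
error_var     = np.var(y - y_hat)      # variance of the residuals

# Total variance = predicted variance + error variance
print(np.isclose(total_var, predicted_var + error_var))    # True

r_squared = predicted_var / total_var
print(r_squared, np.corrcoef(x, y)[0, 1] ** 2)              # the two agree
```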
13. R²
A Venn diagram showing r² as the proportion of variability shared by two variables (X and Y)
- The shaded portion shared by the two circles represents the proportion of shared variance; the larger the area of overlap, the greater the strength of the association between the two variables
14. Predicted variance and r²
15. Interpreting regression summary
- Intercept
  - Value of Y if X is 0
  - Often not meaningful, particularly if it's practically impossible to have an X of 0 (e.g. weight)
- Slope
  - Amount of change in Y seen with a 1 unit change in X
- Standardized regression coefficient
  - Amount of change in Y, in standard deviation units, seen with a 1 standard deviation unit change in X
  - In simple regression it is equivalent to the r for the two variables
- Standard error of estimate
  - Essentially the standard deviation of the residuals
  - The difference is that it involves dividing by the residual df for the model rather than by n − 1 (as for the SD)
  - As R² goes up, the standard error of estimate goes down
- Statistical significance of the model
- R²
  - Proportion of variance explained by the model
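A sketch computing the quantities discussed above; the formulas are the standard ones, and the data are purely illustrative:

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([2., 4., 5., 7., 8., 10., 11., 13.])
y = np.array([5., 9., 9., 13., 14., 18., 20., 24.])
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

r = np.corrcoef(x, y)[0, 1]
beta = b * x.std(ddof=1) / y.std(ddof=1)      # standardized slope; equals r here
r_squared = r ** 2                            # proportion of variance explained
see = np.sqrt(np.sum(resid ** 2) / (n - 2))   # standard error of estimate (df = n - 2)

print(f"intercept={a:.3f} slope={b:.3f} beta={beta:.3f} "
      f"R2={r_squared:.3f} SEE={see:.3f}")
```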
16. The Caution of Causality
- Correlation does not prove causality, but
- One can't establish causality without correlation
- One thing to remember is that just because things look good for your model, other models may be as viable or even better
17. Assumptions in regression
- For starters:
- Linear relationship between the independent and dependent variable
- Residuals are normally distributed
- Residuals are independent
18. Heteroscedasticity
- We also assume residuals have the same variance about the regression line
  - Homoscedasticity
- Example of heteroscedasticity (a simulated example follows below)
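A simulated illustration (the coefficients and noise levels are arbitrary): in the heteroscedastic case the residual spread grows with X, while in the homoscedastic case it stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 100, 200)

# Homoscedastic: residual spread is constant across X
y_homo = 2 + 0.5 * x + rng.normal(0, 5, size=x.size)

# Heteroscedastic: residual spread grows with X
y_hetero = 2 + 0.5 * x + rng.normal(0, 0.1 * x, size=x.size)

# Compare residual spread in the lower and upper halves of X
for label, y in [("homoscedastic", y_homo), ("heteroscedastic", y_hetero)]:
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)
    low, high = resid[x <= 50].std(), resid[x > 50].std()
    print(f"{label}: sd(resid | low X)={low:.1f}, sd(resid | high X)={high:.1f}")
```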
19. Interval measures and measurement without error
- Ordinal variables are not to be used, as the differences among levels are not constant
- But we like our Likerts!
  - Most suggest at least 5 levels to lessen the impact of ordinal differences (7 or more is better)
- Measurement without error
  - Must have reliable measures involved
  - More random error will lead to larger error variance
  - Less reliable measures, smaller R² (illustrated below)
20. Violating assumptions
- Usual situation
  - Slight problems may not result in much change in type I error
  - However, type II error will be a major concern with even modest violations
  - With multiple violations, type I error may also suffer
- Additional assumptions will be made for multiple independent variables
21. Outliers
- As outliers can greatly influence r, they will naturally influence any analysis using it
- Detecting and dealing with outliers is a part of the process of regression analysis
- One issue is distinguishing univariate vs. multivariate outliers
  - While a data point might be an outlier on a single variable, it may not be one as far as the model is concerned
  - Conversely, what might be an outlier for the model might not have its individual variable values noted as outliers
22. Robust Regression
- A single unusual point can greatly distort the picture regarding the relationship among variables
- Heteroscedasticity, even in otherwise normal situations, inflates the standard error of estimate and decreases our estimate of R²
- Nonnormality can hamper our ability to come up with useful interval estimates for slopes
23. Robust Regression
- While least squares regression performs well in general when we are conducting hypothesis testing regarding independence, it is poor at detecting associations in less than ideal circumstances
- What we would like are methods that perform well in a variety of circumstances, and compete well with least-squares regression under ideal conditions
- To be discussed:
  - Theil-Sen estimator
  - Regression via robust correlation
  - L1 regression
  - Least trimmed squares
  - Least trimmed absolute value
  - Least median of squares
  - M-estimators
  - Deepest regression line
24. Theil-Sen Estimator
- For any pair of data points regarding a relationship between two variables, we can plot those 2 points, produce a line connecting them, and note its slope
- E.g. if we had 4 data points we could calculate 6 slopes
  - X = 1, 2, 3, 4
  - Y = 5, 7, 11, 15
- If each of those slopes is weighted by the squared difference in X values for the appropriate points, the weighted average of all the slopes created would be the LS slope for the model
  - E.g. create a line for the points (1, 5) and (2, 7)
  - Slope = 2
  - Weight by (1 − 2)² = 1
- What if, instead of a weighted average, the median of those slopes is chosen as our model slope estimate?
- That, in essence, is the Theil-Sen estimator (a sketch follows below)
25. Theil-Sen Estimator
- Advantages
  - Competes with LS regression in ideal conditions
  - More resistant
  - Reduced standard error in problematic situations, e.g. heteroscedasticity
- We can, using the percentile bootstrap method, calculate CIs as well (sketched below)
- It has been shown that the median approach here performs better than trimming less heavily
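A minimal percentile-bootstrap sketch for the Theil-Sen slope, assuming resampling of (X, Y) pairs with replacement; the data and the number of bootstrap samples are arbitrary.

```python
import numpy as np
from itertools import combinations

def theil_sen_slope(x, y):
    """Median of all pairwise slopes (pairs with equal X values are skipped)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return np.median(slopes)

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 30)
y = 1.0 + 0.7 * x + rng.normal(0, 2, 30)       # hypothetical data

boot = []
for _ in range(1000):                          # resample (X, Y) pairs with replacement
    idx = rng.integers(0, len(x), len(x))
    boot.append(theil_sen_slope(x[idx], y[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])      # 95% percentile bootstrap CI
print(f"slope = {theil_sen_slope(x, y):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```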
26. Regression via robust correlation
- We could simply replace our regular r with a more robust estimate
- This is possible, but more work needs to be done to figure out which approaches might be more viable, and it appears bias might be a problem in some cases with this approach (e.g. heteroscedastic situations using a Winsorized r)
27. Least Absolute Value
- Instead of minimizing the sum of the squared residuals, we could choose a method that attempts to minimize the sum of the absolute residuals
  - L1 regression
- Problem: while it protects against outliers on Y, it does not protect against outlying values on the predictor (see the example below)
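A rough sketch of an L1 fit, minimizing the sum of absolute residuals with a general-purpose optimizer (scipy's Nelder-Mead here; a dedicated routine would normally be used). The data are made up, with one gross outlier in Y.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data with one outlier in Y
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([2., 3., 4., 5., 6., 7., 8., 30.])   # last Y value is an outlier

def sum_abs_resid(params):
    a, b = params
    return np.sum(np.abs(y - (a + b * x)))

# L1 (least absolute value) fit: minimize the sum of absolute residuals
l1 = minimize(sum_abs_resid, x0=[0.0, 1.0], method="Nelder-Mead")
a_l1, b_l1 = l1.x

# Ordinary least squares for comparison
b_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_ls = y.mean() - b_ls * x.mean()

print(f"L1:  a={a_l1:.2f}, b={b_l1:.2f}")   # stays close to the bulk of the data
print(f"LS:  a={a_ls:.2f}, b={b_ls:.2f}")   # pulled toward the outlier
```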
28. Least Trimmed Squares
- The least trimmed squares approach involves trimming the smallest and largest residuals
- So if h is the number of values left after trimming, then the goal would be to minimize the sum of the squared residuals of the remaining data
- Note again that the optimal trimming amount is about .2
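A crude sketch of the LTS idea with roughly 20% trimming. Real implementations use smarter search strategies; here, candidate lines through random pairs of points are scored by the sum of the h smallest squared residuals, and the best candidate is kept. All data values are made up.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data with a few contaminated points
x = rng.uniform(0, 10, 40)
y = 2 + 0.5 * x + rng.normal(0, 1, 40)
y[:4] += 15                                     # four gross outliers

n = len(x)
h = int(np.ceil(0.8 * n))                       # keep 80%, i.e. trim about 20%

def trimmed_ss(a, b):
    """Sum of the h smallest squared residuals."""
    r2 = np.sort((y - (a + b * x)) ** 2)
    return r2[:h].sum()

# Crude approximation to LTS: fit lines through many random pairs of
# points and keep the candidate with the smallest trimmed criterion
best = (np.inf, 0.0, 0.0)
for _ in range(2000):
    i, j = rng.choice(n, size=2, replace=False)
    if x[i] == x[j]:
        continue
    b = (y[j] - y[i]) / (x[j] - x[i])
    a = y[i] - b * x[i]
    crit = trimmed_ss(a, b)
    if crit < best[0]:
        best = (crit, a, b)

print(f"LTS-style fit: a={best[1]:.2f}, b={best[2]:.2f}")   # roughly a=2, b=0.5
```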
29. S-plus menu example
- The first two show the standard menu availability of least trimmed squares regression
- The last uses the robust library
30. Least Trimmed Absolute Value
- Same approach, but rather than minimizing the trimmed squared residuals, we minimize the sum of the absolute residuals remaining after trimming
- This may be preferable to LTS in heteroscedastic situations
31. Least Median of Squares
- Find the slope and intercept that minimize the median of the squared residuals
- Doesn't seem to perform as well generally as other robust approaches
32. M-estimators
- In general, regression using M-estimators minimizes the sum of some function of the residuals, Σ ρ(rᵢ)
  - Where ρ is a function used to guard against outliers and heteroscedasticity
  - E.g. ρ(rᵢ) = rᵢ² would give us our regular LS result
- Although there are many M-estimator approaches one might be able to choose from, given the newness of the approach in general and our relative lack of research regarding it, Wilcox suggests the adjusted M-estimator seems to work well in practical situations
  - It first checks for bad leverage points and may ignore them in estimating the slope and intercept
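For concreteness, a sketch of a basic Huber-type M-estimator fit via iteratively reweighted least squares. Note this is the generic version, not Wilcox's adjusted M-estimator: it downweights large residuals but does not screen for bad leverage points.

```python
import numpy as np

def huber_weights(resid, k=1.345):
    """Huber weights: 1 inside the cutoff, downweighted outside."""
    s = np.median(np.abs(resid - np.median(resid))) / 0.6745   # robust scale (MAD)
    s = s if s > 0 else 1.0
    u = np.abs(resid) / s
    w = np.ones_like(u)
    w[u > k] = k / u[u > k]
    return w

def m_estimate(x, y, iterations=50):
    """Huber M-estimator for a single predictor via iteratively reweighted LS."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]                # start from LS
    for _ in range(iterations):
        resid = y - X @ coef
        W = np.diag(huber_weights(resid))
        coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)       # weighted LS step
    return coef                                                # (intercept, slope)

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 50)
y = 1 + 2 * x + rng.normal(0, 1, 50)
y[:3] += 25                                                    # a few outliers in Y
print(m_estimate(x, y))                                        # roughly (1, 2)
```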
33. Leverage points
- Leverage is one aspect of "outlierness" that we'll mention here but come back to later
- It is primarily concerned with outliers among predictors
  - E.g. Mahalanobis distance
- Good leverage points may be extreme with regard to the predictors but are not outliers with regard to the model
  - In LS, they can decrease the standard error
- Bad leverage points are extreme and would not lie close to a line that would fit most of the data well, and have a profound effect on your estimate of the slope
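One common way to quantify leverage, sketched here, is the diagonal of the hat matrix; the X values are made up, with one extreme point.

```python
import numpy as np

# Hypothetical predictor values; the last one is extreme relative to the rest
x = np.array([1., 2., 3., 4., 5., 6., 7., 30.])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X'X)^-1 X'; its diagonal gives each point's leverage
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

print(np.round(leverage, 3))   # the extreme X value has by far the largest leverage
print(leverage.sum())          # sums to the number of parameters (2 here)
```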
34. Leverage points
35. Deepest regression line
- One of the more recent developments, and may be of practical use as it is researched further
- It is really more about linear fit (i.e. matching parameters to data) as opposed to a focus on the observations/residuals themselves
- Depth is the number of observations that would need to be removed to make the line no longer fit the data
- Appears to have a breakdown point of about 1/3 regardless of the number of predictors
36. Summary
- In single-predictor situations, alternatives are available that perform well in ideal situations, and much better than the LS approach in others
  - Theil-Sen in particular
- While we have kept to the single predictor, this will typically not be our research situation in using regression analysis
- These methods can also be generalized to the multiple-predictor setting, but their breakdown point (i.e. resistance advantage) decreases as more predictors enter into the equation
37. Summary
- Again we call on the Tukey suggestion:
  - "Just which robust/resistant methods you use is not important; what is important is that you use some. It is perfectly proper to use both classical and robust/resistant methods routinely, and only worry when they differ enough to matter. But when they differ, you should think hard."
- A general approach
  - Check for linearity
    - Perhaps using a smoother
  - If OK there, then use an estimator with a breakdown point of about .2 to .3, and compare with the LS output
  - If notable differences between LS and robust exist, figure out why and determine which is more appropriate
  - If assumptions are tenable and little difference between LS and robust exists, feel comfortable going with the LS output