1
Regression
  • Review and Extension

2
The Formula for a Straight Line
  • Only one possible straight line can be drawn once
    the slope and Y intercept are specified
  • The formula for a straight line is
  • Y = bX + a
  • Y = the calculated value for the variable on the
    vertical axis
  • a = the intercept
  • b = the slope of the line
  • X = a value for the variable on the horizontal
    axis
  • Once this line is specified, we can calculate the
    corresponding value of Y for any value of X
    entered
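
A minimal sketch in Python (not from the slides; the slope
and intercept values here are made up) of computing Y for
any X:

  # Hypothetical slope and intercept for Y = bX + a
  def predict(x, b=2.0, a=1.0):
      return b * x + a

  print(predict(0))  # 1.0 -- the intercept: Y when X = 0
  print(predict(3))  # 7.0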

3
The Line of Best Fit
  • Real data do not conform perfectly to a straight
    line
  • The best fit straight line is that which
    minimizes the amount of variation in data points
    from the line
  • Note that this is a key idea: you get to choose
    which estimate of variability about the regression
    line is minimized
  • The typical approach is the least squares method
  • The equation for this line can be used to predict
    or estimate an individual's score on Y on the
    basis of his or her score on X

4
Least Squares Modeling
  • When relations between variables are expressed
    in this manner, we call the relevant equation(s)
    mathematical models
  • The intercept and weight values are called the
    parameters of the model.
  • We'll assume that our models are causal models,
    such that the variable on the left-hand side of
    the equation is being caused by the variable(s)
    on the right side.

5
Terminology
  • The values of Y in these models are often called
    predicted values, sometimes abbreviated as Y-hat
    or Ŷ
  • They are the values of Y that are implied or
    predicted by the specific parameters of the
    model.

6
Parameter Estimation
  • In estimating the parameters of our model, we are
    trying to find a set of parameters that minimizes
    the error variance. In other words, we want the
    sum of the squared residuals to be as small as it
    possibly can be.
  • The process of finding this minimum value is
    called least-squares estimation.

7
Least-squares estimation
  • The relevant equations
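
For reference, the standard least-squares criterion is to
choose a and b so as to minimize the sum of the squared
residuals:

  minimize over (a, b):  Σᵢ (Yᵢ - Ŷᵢ)² = Σᵢ (Yᵢ - (bXᵢ + a))²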

8
Estimates of a and b
  • Estimating the Slope (the regression coefficient)
  • Estimating the Y intercept
  • These calculations ensure that the regression
    line passes through the point on the scatterplot
    defined by the means of X and Y
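
The standard least-squares estimates are:

  b = Σᵢ (Xᵢ - X̄)(Yᵢ - Ȳ) / Σᵢ (Xᵢ - X̄)²
  a = Ȳ - bX̄

Substituting X̄ for X gives Ŷ = bX̄ + (Ȳ - bX̄) = Ȳ, which
is why the fitted line passes through (X̄, Ȳ).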

9
Relationship to r
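
The standard relationship between the slope and the
correlation is:

  b = r (sY / sX)

so the slope is r rescaled by the ratio of the two standard
deviations; with standardized variables (sY = sX = 1), the
slope is r itself.
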
10
Standardized regression coefficient
  • The standardized slope is often given in output,
    and will have added usefulness within multiple
    regression
  • When scores are changed into Z scores, the mean is
    0 and the standard deviation is 1. Referring to
    our previous formula, b = r(sY/sX) then reduces to
    b = r
  • So r equals the standardized slope, interpreted
    as: a 1 SD change in X leads to a change of b SD
    units in Y
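
A quick check in Python (the data here are hypothetical):
after z-scoring both variables, the LS slope equals
Pearson's r.

  import numpy as np

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

  # z-score both variables (mean 0, sd 1)
  zx = (x - x.mean()) / x.std(ddof=1)
  zy = (y - y.mean()) / y.std(ddof=1)

  b_std = np.sum(zx * zy) / np.sum(zx ** 2)  # LS slope on z-scores
  r = np.corrcoef(x, y)[0, 1]                # Pearson's r

  print(b_std, r)  # the two values match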

11
What can the model explain?
  • Total variability in the dependent variable
    (around its observed mean) comes from two sources
  • Variability predicted by the model, i.e. the
    variability in the dependent variable that is due
    to the independent variable
  • How far our predicted values are from the
    mean of Y
  • Error or residual variability, i.e. variability
    not explained by the independent variable
  • The difference between the predicted values and
    the observed values

s²(Y) = s²(Ŷ) + s²(Y - Ŷ)
Total variance = predicted variance + error variance
12
R-squared - the coefficient of determination
  • The square of the correlation, r², is the
    fraction of the variation in the values of y
    that is explained by the regression of y on x
  • Conceptually:
    R² = (variance of predicted values ŷ) /
         (variance of observed values y)
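
A sketch in Python (hypothetical data) confirming that the
two views of R² agree in simple LS regression:

  import numpy as np

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

  # Fit the LS line and form the predicted values
  b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  a = y.mean() - b * x.mean()
  y_hat = a + b * x

  r2_ratio = np.var(y_hat) / np.var(y)    # var(predicted) / var(observed)
  r2_corr = np.corrcoef(x, y)[0, 1] ** 2  # squared correlation

  print(r2_ratio, r2_corr)  # the two values match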

13
R2
A Venn Diagram Showing r2 as the Proportion of
Variability Shared by Two Variables (X and Y)
  • The shaded portion shared by the two circles
    represents the proportion of shared variance; the
    larger the area of overlap, the greater the
    strength of the association between the two
    variables

14
Predicted variance and r2
15
Interpreting regression summary
  • Intercept
  • Value of Y if X is 0
  • Often not meaningful, particularly if it's
    practically impossible to have an X of 0 (e.g.
    weight)
  • Slope
  • Amount of change in Y seen with 1 unit change in
    X
  • Standardized regression coefficient
  • Amount of change in Y seen in standard deviation
    units with 1 standard deviation unit change in X
  • In simple regression it is equivalent to the r
    for the two variables
  • Standard error of estimate
  • Essentially the standard deviation of the
    residuals
  • The difference is that it divides by the residual
    df for the model, rather than by n-1 as the sd
    does
  • As R² goes up, the standard error of estimate
    goes down
  • Statistical significance of the model
  • R2
  • Proportion of variance explained by the model
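
As a sketch of where these quantities appear in software
output, using scipy's linregress with hypothetical data:

  import numpy as np
  from scipy import stats

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

  res = stats.linregress(x, y)
  print(res.intercept)    # value of Y when X = 0
  print(res.slope)        # change in Y per 1-unit change in X
  print(res.rvalue)       # r; equals the standardized slope here
  print(res.rvalue ** 2)  # R²: proportion of variance explained
  print(res.pvalue)       # statistical significance of the model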

16
The Caution of Causality
  • Correlation does not prove causality, but
  • One can't establish causality without correlation
  • Remember that even when things look good for your
    model, other models may be as viable or even
    better

17
Assumptions in regression
  • For starters
  • Linear relationship between the independent and
    dependent variable
  • Residuals are normally distributed
  • Residuals are independent

18
Heteroscedasticity
  • We also assume residuals have the same variance
    about the regression line
  • Homoscedasticity
  • Example of heteroscedasticity

19
Interval measures and measurement without error
  • Ordinal variables are not to be used, as the
    differences among levels are not constant
  • But we like our Likerts!
  • Most suggest at least 5 scale points to lessen the
    impact of ordinal differences (7 or more is
    better)
  • Measurement without error
  • Must have reliable measures involved
  • More random error will lead to larger error
    variance
  • Less reliable measures, smaller R²

20
Violating assumptions
  • The usual situation:
  • Slight problems may not result in much change in
    Type I error
  • However, Type II error will be a major concern
    with even modest violations
  • With multiple violations, Type I error may also
    suffer
  • Additional assumptions will be made for multiple
    independent variables

21
Outliers
  • As outliers can greatly influence r, they will
    naturally influence any analysis using it
  • Detecting and dealing with outliers is a part of
    the process of regression analysis
  • One issue is distinguishing univariate vs.
    multivariate outliers
  • While a data point might be an outlier on a
    single variable, it may not be an outlier as far
    as the model is concerned
  • Conversely, what is an outlier for the model
    might not have its individual variable values
    flagged as outliers

22
Robust Regression
  • A single unusual point can greatly distort the
    picture regarding the relationship among
    variables
  • Heteroscedasticity, even in otherwise normal
    situations, inflates the standard error of
    estimate and decreases our estimate of R²
  • Nonnormality can hamper our ability to come up
    with useful interval measures for slopes

23
Robust Regression
  • While least squares regression performs well in
    general for hypothesis tests of independence, it
    is poor at detecting associations in less than
    ideal circumstances
  • What we would like are methods that perform well
    in a variety of circumstances, and compete well
    with least-squares regression under ideal
    conditions
  • To be discussed
  • Theil-Sen Estimator
  • Regression via robust correlation
  • L1 (least absolute value) regression
  • Least trimmed squares
  • Least trimmed absolute value
  • Least median of squares
  • M-estimators
  • Deepest regression line

24
Theil-Sen Estimator
  • For any pair of data points regarding a
    relationship between two variables, we can plot
    those 2 points, produce a line connecting them,
    and note its slope
  • E.g. if we had 4 data points we could calculate 6
    slopes
  • X = 1, 2, 3, 4
  • Y = 5, 7, 11, 15
  • If each of those slopes is weighted by the
    squared difference in X values for the
    appropriate points, the weighted average of all
    the slopes created is the LS slope for the
    model
  • E.g. create a line for the points (1,5) and
    (2,7)
  • Slope = 2
  • Weight by (1-2)² = 1
  • What if, instead of a weighted average, the
    median of those slopes is chosen as our model
    slope estimate?
  • That, in essence, is the Theil-Sen estimator
    (see the sketch below)
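
A sketch in Python of the Theil-Sen idea on the slide's
four points (scipy.stats.theilslopes provides a ready-made
version):

  import itertools
  import numpy as np

  x = np.array([1.0, 2.0, 3.0, 4.0])
  y = np.array([5.0, 7.0, 11.0, 15.0])

  # All pairwise slopes: 6 of them for 4 points
  slopes = [(y[j] - y[i]) / (x[j] - x[i])
            for i, j in itertools.combinations(range(len(x)), 2)]

  b_ts = np.median(slopes)        # Theil-Sen slope
  a_ts = np.median(y - b_ts * x)  # one common intercept choice

  print(sorted(slopes))  # [2.0, 3.0, 3.33..., 4.0, 4.0, 4.0]
  print(b_ts)            # 3.67, vs. the weighted-average (LS) slope of 3.4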

25
Theil-Sen Estimator
  • Advantages
  • Competes with LS regression in ideal conditions
  • More resistant
  • Reduced standard error in problematic situations,
    e.g. heteroscedasticity
  • We can, using the percentile bootstrap method,
    calculate CIs as well

It has been shown that the median approach used here
performs better than approaches that trim less.
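
A sketch of a percentile bootstrap CI for the Theil-Sen
slope (the data and the number of bootstrap samples here
are made up):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
  y = np.array([5.0, 7.0, 11.0, 15.0, 14.0, 19.0, 20.0, 26.0])

  B = 2000
  boot_slopes = np.empty(B)
  for i in range(B):
      idx = rng.integers(0, len(x), size=len(x))  # resample (x, y) pairs
      boot_slopes[i] = stats.theilslopes(y[idx], x[idx])[0]

  lo, hi = np.percentile(boot_slopes, [2.5, 97.5])
  print(lo, hi)  # 95% percentile bootstrap CI for the slope
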
26
Regression via robust correlation
  • We could simply replace our regular r with a more
    robust estimate
  • This is possible, but more work needs to be done
    to figure out which approaches might be more
    viable, and it appears bias might be a problem in
    some cases with this approach (e.g.
    heteroscedastic situations using a winsorized r)

27
Least Absolute Value
  • Instead of minimizing the sum of the squared
    residuals, we could choose a method that attempts
    to minimize the sum of the absolute residuals
  • L1 regression
  • Problem: while protecting against outliers on Y,
    it does not protect against outliers on the
    predictor
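
A minimal sketch of L1 regression by direct minimization
(hypothetical data; dedicated quantile-regression routines
would be used in practice):

  import numpy as np
  from scipy.optimize import minimize

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
  y = np.array([2.0, 4.0, 5.0, 4.0, 6.0, 20.0])  # one Y outlier

  def sum_abs_resid(params):
      a, b = params
      return np.sum(np.abs(y - (a + b * x)))

  fit = minimize(sum_abs_resid, x0=[0.0, 1.0], method="Nelder-Mead")
  print(fit.x)  # (a, b); less pulled toward the Y outlier than LS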

28
Least Trimmed Squares
  • The least trimmed squares approach involves
    trimming the smallest and largest residuals
  • So if h is the number of values left after
    trimming, the goal is to minimize the sum of the
    squared residuals of the remaining data
  • Note again that the optimal trimming amount is
    about .2
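
A toy sketch of the LTS objective (hypothetical data;
production LTS software uses specialized search such as
FAST-LTS, since a generic optimizer can stall in a local
minimum):

  import numpy as np
  from scipy.optimize import minimize

  rng = np.random.default_rng(1)
  x = np.arange(1.0, 11.0)
  y = 1.0 + 2.0 * x + rng.normal(0, 0.3, size=10)
  y[-2:] += np.array([9.0, -8.0])  # two gross outliers

  h = int(np.ceil(0.8 * len(x)))   # keep 80% (trim about .2)

  def lts_loss(params):
      a, b = params
      sq = np.sort((y - (a + b * x)) ** 2)
      return np.sum(sq[:h])        # sum of the h smallest squared residuals

  fit = minimize(lts_loss, x0=[0.0, 2.0], method="Nelder-Mead")
  print(fit.x)  # should land near the true intercept 1 and slope 2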

29
S-plus menu example
  • The first two screenshots show the standard menu
    availability of least trimmed squares regression
  • The last uses the robust library

30
Least Trimmed Absolute Value
  • Same approach, but rather than minimize the
    trimmed squared residuals, we minimize the sum of
    the absolute residuals remaining after trimming
  • This may be preferable to LTS in heteroscedastic
    situations

31
Least Median of Squares
  • Find the slope and intercept that minimizes the
    median of the squared residuals
  • Doesn't seem to perform as well generally as
    other robust approaches

32
M-estimators
  • In general, regression using M-estimators
    minimizes the sum of some function ρ of the
    residuals, Σᵢ ρ(rᵢ)
  • Where ρ is a function used to guard against
    outliers and heteroscedasticity
  • E.g. ρ(rᵢ) = rᵢ² would give us our regular LS
    result
  • Although there are many M-estimator approaches
    one might choose from, given the newness of the
    approach in general and our relative lack of
    research regarding it, Wilcox suggests the
    adjusted M-estimator seems to work well in
    practical situations
  • It first checks for bad leverage points and may
    ignore them when estimating the slope and
    intercept
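
As a sketch of the general M-estimation idea (the familiar
Huber ρ is used here rather than Wilcox's adjusted
M-estimator, and the residual-scale estimate a full
treatment includes is ignored):

  import numpy as np
  from scipy.optimize import minimize

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
  y = np.array([2.0, 4.0, 5.0, 4.0, 6.0, 20.0])  # one Y outlier
  k = 1.345  # a common Huber tuning constant

  def huber_rho(r):
      # quadratic for small residuals, linear for large ones
      return np.where(np.abs(r) <= k,
                      0.5 * r ** 2,
                      k * np.abs(r) - 0.5 * k ** 2)

  def loss(params):
      a, b = params
      return np.sum(huber_rho(y - (a + b * x)))

  fit = minimize(loss, x0=[0.0, 1.0], method="Nelder-Mead")
  print(fit.x)  # intercept and slope, less outlier-driven than LS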

33
Leverage points
  • Leverage is one aspect of "outlierness" that
    we'll mention here but come back to later
  • It is primarily concerned with outliers among
    predictors
  • E.g. Mahalanobis distance (see the sketch below)
  • Good leverage points may be extreme with regard
    to the predictors but are not outliers with
    regard to the model
  • In LS, they can decrease the standard error
  • Bad leverage points are extreme and would not lie
    close to a line that fits most of the data well;
    they have a profound effect on your estimate of
    the slope
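
A sketch of flagging leverage with Mahalanobis distance on
the predictors (two hypothetical predictors):

  import numpy as np

  # Rows are cases, columns are two predictors; the last
  # case is extreme on the predictors (potential leverage)
  X = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.0],
                [4.0, 3.5], [5.0, 4.0], [12.0, 1.0]])

  diff = X - X.mean(axis=0)
  cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
  d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

  print(np.sqrt(d2))  # the last case's distance stands out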

34
Leverage points
35
Deepest regression line
  • One of the more recent developments, and may be
    of practical use as it is researched further
  • It is really more about linear fit (i.e. matching
    parameters to data) as opposed to focusing on the
    observations/residuals themselves
  • Depth is the number of observations that would
    need to be removed to make the fitted line a
    "nonfit"
  • Appears to have a breakdown point of about 1/3
    regardless of the number of predictors

36
Summary
  • In single predictor situations, alternatives are
    available that perform well in ideal situations,
    and much better than the LS approach in others
  • Theil-Sen in particular
  • While we have kept to the single predictor, that
    will rarely be our research situation when using
    regression analysis
  • These methods can also be generalized to the
    multiple predictor setting, but their breakdown
    point (i.e. resistance advantage) decreases as
    more predictors enter into the equation

37
Summary
  • Again we call on the Tukey suggestion:
  • "...just which robust/resistant methods you use is
    not important; what is important is that you use
    some. It is perfectly proper to use both
    classical and robust/resistant methods routinely,
    and only worry when they differ enough to matter.
    But when they differ, you should think hard."
  • A general approach
  • Check for linearity
  • Perhaps using a smoother
  • If OK there, then use an estimator with a
    breakdown point of about .2-.3, and compare with
    the LS output
  • If notable differences between LS and robust
    exist, figure out why and determine which is more
    appropriate
  • If assumptions are tenable and little difference
    between LS and robust exists, feel comfortable
    going with the LS output