Title: Statistical Regression and Correlation
1 Statistical Regression and Correlation
2 Introduction
We are given a list of observations and asked to draw some type of conclusion
from these observations. How are we going to do that?
Let's see what the data looks like by plotting it with a scatter plot first.
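3 Scatter Diagram
4 The Problem: Our Conclusions
[Figure: scatter plot with two different candidate lines drawn through the data, labeled "My Line" and "Your Line"]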
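5 Scatter Diagram and Best Line
[Figure: scatter plot with the best line passing through the centroid (mx, my); labels mark a data point (xi, yi), its residual ei, and the point (x1, y1)]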
6 Linear Assumption
Now let's assume the relationship is linear. In this case we already know one
point on the line, the mean:
$$(m_x, m_y), \qquad m_x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad m_y = \frac{1}{n}\sum_{i=1}^{n} y_i$$
The trick: can we find a general a and b for the line? The answer is YES (in
some sense). In fact, we are looking for estimators of a and b (how do we
estimate a and b from the given data?). The problem then is to find the
relation
$$y = a + b x$$
7 Method of Least Squares
But if there is error in the relationship, then we have
$$y_i = a + b x_i + e_i$$
where $e_i$ is the residual error at each observation. Now we actually have n
such observations, so that
$$e_i = y_i - a - b x_i, \qquad i = 1, 2, \ldots, n$$
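8 Method of Least Squares (2)
The power in the residuals is
$$P = \sum_{i=1}^{n} e_i^2$$
But P is also
$$P = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
We want to minimize P (the residual error power) with respect to a and b; the
minimizing values are the estimators $\hat a$ and $\hat b$.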
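To make the residual power concrete, here is a minimal MATLAB sketch that evaluates P for two candidate lines. The data points and both (a, b) pairs are made up for illustration, and we assume the line is written y = a + b x with P taken as the plain sum of squared residuals.

```matlab
% Illustrative data (not from the slides); it lies exactly on y = 1 + 2x
x = [0 1 2 3 4];
y = [1 3 5 7 9];

% Residual power P(a,b): sum of squared residuals e_i = y_i - a - b*x_i
P = @(a, b) sum((y - a - b .* x).^2);

P_mine  = P(1, 2);      % "my line":   y = 1 + 2x
P_yours = P(0, 2.5);    % "your line": y = 2.5x

fprintf('P (my line)   = %g\n', P_mine);
fprintf('P (your line) = %g\n', P_yours);
```

The line with the smaller P is the better of the two candidates; least squares looks for the (a, b) pair that makes P as small as possible.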
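9 Least Squares Error Minimization
Minimization of P occurs when
$$\frac{\partial P}{\partial a} = 0, \qquad \frac{\partial P}{\partial b} = 0$$
And differentiating, we find the conditions are
$$\sum_{i=1}^{n} (y_i - \hat a - \hat b x_i) = 0 \qquad (1)$$
$$\sum_{i=1}^{n} x_i (y_i - \hat a - \hat b x_i) = 0 \qquad (2)$$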
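The differentiation step can be written out as follows. This is a sketch of the standard derivation, assuming the line is y = a + b x and P is the sum of squared residuals; the notation is ours.

```latex
\begin{align*}
P(a,b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial P}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) \\
\frac{\partial P}{\partial b} &= -2 \sum_{i=1}^{n} x_i \,(y_i - a - b x_i)
\end{align*}
```

Setting both derivatives to zero at the estimators and dropping the common factor of -2 gives one condition on the sum of the residuals and one on the sum of the residuals weighted by x.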
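10 Least Squares Error Minimization (2)
Rewrite (1) as
$$\sum y_i - \sum \hat a - \hat b \sum x_i = 0$$
And since â is constant, $\sum \hat a = n \hat a$, and we find
$$\sum y_i = n \hat a + \hat b \sum x_i \qquad (3)$$
Or, dividing by n,
$$m_y = \hat a + \hat b\, m_x$$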
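11 Least Squares Error Minimization (3)
This result is exactly what we conjectured before: the line passes through the
mean. Now we need another point to make the line. From (2) we have
$$\sum x_i y_i = \hat a \sum x_i + \hat b \sum x_i^2 \qquad (4)$$
Equations (3) and (4) are called the normal equations. They must be solved
simultaneously to find $\hat a$ and $\hat b$. Solving, we find
$$\hat a = m_y - \hat b\, m_x$$
and
$$\hat b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \qquad (5)$$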
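As a rough illustration of solving the normal equations simultaneously, the MATLAB sketch below sets them up as a 2-by-2 linear system and solves it with the backslash operator. The data are made up, and we assume the line y = a + b x with a the intercept and b the slope.

```matlab
% Illustrative data (not from the slides)
x = [0 1 2 3 4];
y = [1.1 2.9 5.2 6.8 9.0];
n = numel(x);

% Normal equations in matrix form:
%   sum(y)    = a*n      + b*sum(x)
%   sum(x.*y) = a*sum(x) + b*sum(x.^2)
A   = [n,      sum(x);
       sum(x), sum(x.^2)];
rhs = [sum(y); sum(x .* y)];

est   = A \ rhs;     % solve the 2x2 system
a_hat = est(1);      % intercept estimate
b_hat = est(2);      % slope estimate

% Closed-form slope from the same raw sums, for comparison
b_closed = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);
a_closed = mean(y) - b_closed*mean(x);

fprintf('a = %.4f, b = %.4f\n', a_hat, b_hat);
```

Both routes should agree to rounding error; the closed form simply avoids forming the matrix.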
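12 Least Squares Error Minimization (4)
And noting that
$$\sum x_i = n\, m_x, \qquad \sum y_i = n\, m_y$$
since
$$m_x = \frac{1}{n}\sum x_i, \qquad m_y = \frac{1}{n}\sum y_i$$
We find
$$\hat b = \frac{\sum x_i y_i - n\, m_x m_y}{\sum x_i^2 - n\, m_x^2} \qquad (6)$$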
13 Least Squares Error Minimization (5)
Equations (5) and (6) are the most general but computationally cumbersome. By
recognizing that (mx, my) is the point in the middle, we can perform an axis
transformation that places this centroid at the origin. The new coordinates are
$$X_i = x_i - m_x, \qquad Y_i = y_i - m_y$$
Also we note that now
$$\sum X_i = 0, \qquad \sum Y_i = 0$$
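14 Least Squares Error Minimization (6)
So from (5) and (6) we have
$$\hat b = \frac{\sum X_i Y_i}{\sum X_i^2} \qquad (7)$$
Which is equivalent to writing
$$\hat b = \frac{\sum (x_i - m_x)(y_i - m_y)}{\sum (x_i - m_x)^2} \qquad (8)$$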
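The effect of centering can be checked numerically. The MATLAB sketch below shifts made-up data so the centroid sits at the origin, computes the slope from the centered sums, and compares it with the slope from the raw sums; the variable names are ours.

```matlab
% Illustrative data (not from the slides)
x = [0 1 2 3 4];
y = [1.1 2.9 5.2 6.8 9.0];
n = numel(x);

mx = mean(x);
my = mean(y);

% Centered coordinates: the centroid (mx, my) moves to the origin
X = x - mx;
Y = y - my;

% Slope from centered sums
b_centered = sum(X .* Y) / sum(X.^2);

% Slope from raw sums, for comparison
b_raw = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);

% The intercept follows from the line passing through the centroid
a_hat = my - b_centered*mx;

fprintf('centered slope = %.4f, raw slope = %.4f\n', b_centered, b_raw);
```

The centered form needs only two sums once the means are known, which is why it is the less cumbersome of the two.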
15 Example
We will use the method of least squares in MATLAB to fit the equation of a
line to data with random errors. Given a vector of observations:
    x = 0:1:10;
    y = [-2.1129 4.8303 2.3898 7.0575 8.4386 ...
         8.1562 7.6587 13.8816 13.9787 19.2289 21.0155];
    plot(x, y, 'o')           % scatter plot of the observations
    hold on
    plot(x, 2*x, 'r')         % this was the ideal curve y = 2x
    ls = polyfit(x, y, 1);    % degree-1 polynomial fit with least squares
    b = ls(1);                % slope
    a = ls(2);                % intercept
    plot(x, b.*x + a, 'b')    % least squares estimate
16 Example (continued)
[Figure: the observations plotted with the true line y = 2x (red) and the least squares estimate (blue)]
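As a cross-check on the example, the estimates can also be computed directly from centered sums and compared with polyfit. This is a sketch using the observations from the example; the variable names are ours, and a degree-1 polyfit is assumed.

```matlab
% Observations from the example
x = 0:1:10;
y = [-2.1129 4.8303 2.3898 7.0575 8.4386 ...
      8.1562 7.6587 13.8816 13.9787 19.2289 21.0155];

% Closed-form least squares estimates from centered sums
X = x - mean(x);
Y = y - mean(y);
b_hat = sum(X .* Y) / sum(X.^2);       % slope
a_hat = mean(y) - b_hat*mean(x);       % intercept

% Same fit via polyfit; for degree 1 it returns [slope, intercept]
ls = polyfit(x, y, 1);

fprintf('closed form: a = %.4f, b = %.4f\n', a_hat, b_hat);
fprintf('polyfit:     a = %.4f, b = %.4f\n', ls(2), ls(1));
```

For these observations the slope comes out near 2.008 and the intercept near -0.538, close to the ideal line y = 2x.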
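17 Correlation
Correlation measures the degree of relationship between the independent and
dependent variables.
[Figure: a data point (xi, yi) shown against the regression line on x-y axes, with mx and xi marked on the x axis and my and yi on the y axis. The total deviation (yi - my) is split into an explained deviation and an unexplained deviation.]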
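18 Coefficient of Determination
Write the total deviation as
$$(y_i - m_y) = (y_i - \hat y_i) + (\hat y_i - m_y) \qquad (9)$$
where $\hat y_i = \hat a + \hat b x_i$ is the value on the regression line.
Square and sum both sides:
$$\sum (y_i - m_y)^2 = \sum (y_i - \hat y_i)^2 + \sum (\hat y_i - m_y)^2 \qquad (10)$$
Note, the cross term vanishes (from the result of the normal equations):
$$2 \sum (y_i - \hat y_i)(\hat y_i - m_y) = 0$$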
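A short sketch of why the cross term drops out, assuming the fitted value is $\hat y_i = \hat a + \hat b x_i$ and writing $e_i = y_i - \hat y_i$ for the residual (notation ours):

```latex
\begin{align*}
\sum_{i=1}^{n} e_i \,(\hat y_i - m_y)
  &= \sum_{i=1}^{n} e_i \,(\hat a + \hat b x_i - m_y) \\
  &= (\hat a - m_y) \sum_{i=1}^{n} e_i + \hat b \sum_{i=1}^{n} x_i e_i \\
  &= 0
\end{align*}
```

The first sum is zero by condition (1) and the second by condition (2), so the squared total deviation splits cleanly into an unexplained part and an explained part.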
19 Coefficient of Determination (2)
The ratio of explained variation to total variation, which describes how well
the regression line fits the observed data, is now written as
$$r^2 = \frac{\sum (\hat y_i - m_y)^2}{\sum (y_i - m_y)^2} \qquad (11)$$
where $\hat y_i$ is the value on the regression line at $x_i$. Note, this
quantity lies between 0 and 1. If $r^2 = 1$, then all points lie exactly on the
regression line. If $r^2 = 0$, then the regression line does not explain the
data at all (there is no linear relationship that can be drawn in this case).
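The ratio can be computed directly from a fitted line. The MATLAB sketch below uses made-up data, fits the line from centered sums, and splits the total variation into explained and unexplained parts; all names are ours.

```matlab
% Illustrative data (not from the slides)
x = [0 1 2 3 4];
y = [1.1 2.9 5.2 6.8 9.0];

% Least squares line from centered sums
X = x - mean(x);
Y = y - mean(y);
b_hat = sum(X .* Y) / sum(X.^2);
a_hat = mean(y) - b_hat*mean(x);
y_fit = a_hat + b_hat*x;                   % values on the regression line

% Variation split
total       = sum((y - mean(y)).^2);       % total variation
explained   = sum((y_fit - mean(y)).^2);   % explained by the line
unexplained = sum((y - y_fit).^2);         % residual power

r2 = explained / total;                    % coefficient of determination

fprintf('r^2 = %.5f\n', r2);
```

Because the data here sit close to a straight line, the unexplained part is small and r^2 is close to 1.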
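20 Correlation Coefficient
Recall the slope of the least squares best fit line is
$$\hat b = \frac{\sum x_i y_i - n\, m_x m_y}{\sum x_i^2 - n\, m_x^2} \qquad (6)$$
If this quantity is 0, then the line is horizontal. By swapping y for x in the
denominator, the slope measures regression of x on y. If x is not a function
of y, then the quantity
$$\hat b' = \frac{\sum x_i y_i - n\, m_x m_y}{\sum y_i^2 - n\, m_y^2} \qquad (12)$$
is zero.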
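21 Correlation Coefficient (2)
But the slope of the regression of x on y, measured with y vertical and x
horizontal, is the reciprocal of (12):
$$\frac{1}{\hat b'} = \frac{\sum y_i^2 - n\, m_y^2}{\sum x_i y_i - n\, m_x m_y} \qquad (13)$$
If no correlation exists between the two variables being studied, the product
of (6) and (12) is zero:
$$\hat b\, \hat b' = 0 \qquad (14)$$
where $\hat b$ is the slope (6) of y on x and $\hat b'$ is the slope (12) of x
on y.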
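22 Correlation Coefficient (3)
For perfect correlation, the regression on x and the regression on y line up:
the two lines are equal, so that
$$\hat b = \frac{1}{\hat b'} \qquad (15)$$
where $\hat b$ is the slope (6) of y on x and $\hat b'$ is the slope (12) of x
on y. Or we may equally write (for perfect correlation)
$$\hat b\, \hat b' = 1 \qquad (16)$$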
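23 Correlation Coefficient (4)
The correlation coefficient is the square root of the product in (16), taken
with the sign of the slope:
$$r = \frac{\sum x_i y_i - n\, m_x m_y}{\sqrt{\left(\sum x_i^2 - n\, m_x^2\right)\left(\sum y_i^2 - n\, m_y^2\right)}} \qquad (17)$$
Which can also be written in terms of centered points $X_i = x_i - m_x$,
$Y_i = y_i - m_y$ as
$$r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} \qquad (18)$$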
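The two routes to the correlation coefficient can be compared numerically. The MATLAB sketch below uses made-up, decreasing data so that the sign matters: it computes r from centered sums and again as the signed square root of the product of the two regression slopes. The names b_yx and b_xy are ours.

```matlab
% Illustrative data with a negative trend (not from the slides)
x = [0 1 2 3 4];
y = [9.0 6.8 5.2 2.9 1.1];

% Centered coordinates
X = x - mean(x);
Y = y - mean(y);

% The two regression slopes
b_yx = sum(X .* Y) / sum(X.^2);    % regression of y on x
b_xy = sum(X .* Y) / sum(Y.^2);    % regression of x on y

% Correlation coefficient from centered sums
r = sum(X .* Y) / sqrt(sum(X.^2) * sum(Y.^2));

% Same value as the signed square root of the slope product
r_from_slopes = sign(b_yx) * sqrt(b_yx * b_xy);

fprintf('r = %.5f (from slopes: %.5f)\n', r, r_from_slopes);
```

The square root alone only gives the magnitude; the sign has to be carried over from the slope, which is why a falling trend yields a negative r.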
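24 Correlation Coefficient (5)
The correlation coefficient can also be derived from the coefficient of
determination as
$$r = \pm\sqrt{\frac{\sum (\hat y_i - m_y)^2}{\sum (y_i - m_y)^2}} \qquad (19)$$
where $\hat y_i$ is the value on the regression line and the sign is that of
the slope. Note, the correlation coefficient must lie in the range
$$-1 \le r \le 1$$
But in practice, realistically, $0 < |r| < 1$.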