Transcript and Presenter's Notes

Title: Simple Linear Regression


1
Principles of Biostatistics: Simple Linear Regression
PPT based on Dr. Chuanhua Yu and Wikipedia
2
Terminology
  • Moments, skewness, kurtosis
  • Analysis of variance (ANOVA)
  • Response (dependent) variable
  • Explanatory (independent) variable
  • Linear regression model
  • Method of least squares
  • Normal equations
  • Sum of squares, error (SSE)
  • Sum of squares, regression (SSR)
  • Sum of squares, total (SST)
  • Coefficient of determination (R²)
  • F-value, P-value; t-test, F-test
  • Homoscedasticity, heteroscedasticity

3
Contents
  • 18.0 Normal distribution and terms
  • 18.1 An Example
  • 18.2 The Simple Linear Regression Model
  • 18.3 Estimation: The Method of Least Squares
  • 18.4 Error Variance and the Standard Errors of
    Regression Estimators
  • 18.5 Confidence Intervals for the Regression
    Parameters
  • 18.6 Hypothesis Tests about the Regression
    Relationship
  • 18.7 How Good is the Regression?
  • 18.8 Analysis of Variance Table and an F Test
    of the Regression Model
  • 18.9 Residual Analysis
  • 18.10 Prediction Interval and Confidence Interval

4
Normal Distribution
The continuous probability density function of the
normal distribution is the Gaussian function

f(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²)) = (1/σ) φ((x − µ)/σ),

where σ > 0 is the standard deviation, the real
parameter µ is the expected value, and φ is the
density function of the "standard" normal
distribution, i.e., the normal distribution with
µ = 0 and σ = 1.
5
Normal Distribution
6
Moment About The Mean
The kth moment about the mean (or kth central moment)
of a real-valued random variable X is the quantity
µₖ = E[(X − E[X])ᵏ], where E is the expectation
operator. For a continuous univariate probability
distribution with probability density function f(x),
the kth moment about the mean µ is

µₖ = ∫ (x − µ)ᵏ f(x) dx.

The first moment about zero, if it exists, is the
expectation of X, i.e. the mean of the probability
distribution of X, designated µ. In higher orders, the
central moments are more interesting than the moments
about zero. µ₁ is 0. µ₂ is the variance, the positive
square root of which is the standard deviation σ.
µ₃/σ³ is the skewness, often written γ. µ₄/σ⁴ − 3 is
the (excess) kurtosis.
7
Skewness
  • Consider the distribution in the figure. The bars
    on the right side of the distribution taper
    differently than the bars on the left side. These
    tapering sides are called tails, and they provide
    a visual means for determining which of the two
    kinds of skewness a distribution has:
  • Negative skew: the left tail is longer; the mass
    of the distribution is concentrated on the right
    of the figure. The distribution is said to be
    left-skewed.
  • Positive skew: the right tail is longer; the mass
    of the distribution is concentrated on the left
    of the figure. The distribution is said to be
    right-skewed.

8
Skewness
Skewness, the third standardized moment, is written as
γ₁ and defined as

γ₁ = µ₃ / σ³,

where µ₃ is the third moment about the mean and σ is
the standard deviation. For a sample of n values the
sample skewness is

g₁ = m₃ / m₂^(3/2),  where mₖ = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)ᵏ.
9
Kurtosis
Kurtosis is the degree of peakedness of a
distribution. A normal distribution is a
mesokurtic distribution. A pure leptokurtic
distribution has a higher peak than the normal
distribution and has heavier tails. A pure
platykurtic distribution has a lower peak than a
normal distribution and lighter tails.
10
Kurtosis
The fourth standardized moment is defined as

β₂ = µ₄ / σ⁴,

where µ₄ is the fourth moment about the mean and σ is
the standard deviation. For a sample of n values the
sample (excess) kurtosis is

g₂ = m₄ / m₂² − 3,  where mₖ = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)ᵏ.
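As a concrete check of these sample formulas, here is a minimal Python sketch (the function names are mine, not from the slides) computing g₁ and g₂ from the biased central-moment estimators mₖ above:

```python
import numpy as np

def central_moment(x, k):
    """m_k = (1/n) * sum((x_i - xbar)^k)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** k)

def sample_skewness(x):
    """g1 = m3 / m2^(3/2)."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def sample_kurtosis(x):
    """Excess kurtosis g2 = m4 / m2^2 - 3 (0 for a normal sample)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0

z = np.random.default_rng(0).normal(size=100_000)
print(sample_skewness(z), sample_kurtosis(z))  # both close to 0
```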
11
18.1 An Example
Table 18.1 IL-6 levels in brain and serum (pg/ml) of
10 patients with subarachnoid hemorrhage

Patient i   Serum IL-6 (pg/ml) x   Brain IL-6 (pg/ml) y
1           22.4                   134.0
2           51.6                   167.0
3           58.1                   132.3
4           25.1                   80.2
5           65.9                   100.0
6           79.7                   139.1
7           75.3                   187.2
8           32.4                   97.2
9           96.4                   192.3
10          85.7                   199.4
12
Scatterplot
This scatterplot locates pairs of observations, with
serum IL-6 on the x-axis and brain IL-6 on the y-axis.
We notice that larger (smaller) values of brain IL-6
tend to be associated with larger (smaller) values of
serum IL-6, and that the scatter of points tends to be
distributed around a positively sloped straight line.
The pairs of values of serum IL-6 and brain IL-6 are
not located exactly on a straight line: the
scatterplot reveals a more or less strong tendency
rather than a precise linear relationship. The line
represents the nature of the relationship on average.
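As an illustration, the scatterplot can be reproduced in Python with matplotlib (a tooling assumption; the slides do not prescribe software), using the Table 18.1 data:

```python
import matplotlib.pyplot as plt

# Table 18.1: serum IL-6 (x-axis) and brain IL-6 (y-axis), pg/ml
serum = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
brain = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

plt.scatter(serum, brain)
plt.xlabel("Serum IL-6 (pg/ml)")
plt.ylabel("Brain IL-6 (pg/ml)")
plt.title("IL-6 in serum vs. brain (Table 18.1)")
plt.show()
```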
13
Examples of Other Scatterplots
14
Model Building
15
18.2 The Simple Linear Regression Model
The population simple linear regression model:

y = α + β x + ε,   or equivalently   µ_{y|x} = α + β x,

where α + β x is the nonrandom (systematic) component
and ε is the random component.

Here y is the dependent (response) variable, the
variable we wish to explain or predict; x is the
independent (explanatory) variable, also called the
predictor variable; and ε is the error term, the only
random component in the model, and thus the only
source of randomness in y.

µ_{y|x} is the mean of y when x is specified, also
called the conditional mean of Y. α is the intercept
of the systematic component of the regression
relationship. β is the slope of the systematic
component.
16
Picturing the Simple Linear Regression Model
The simple linear regression model posits an exact
linear relationship between the expected or average
value of the dependent variable Y and the independent
or predictor variable X:

µ_{y|x} = α + β x.

Actual observed values of Y differ from the expected
value µ_{y|x} by an unexplained or random error ε:

y = µ_{y|x} + ε = α + β x + ε.

[Regression plot: the line µ_{y|x} = α + β x, with
intercept α, slope β, and the error ε separating an
observed y from the line.]
17
Assumptions of the Simple Linear Regression Model
  • The relationship between X and Y is a
    straight-line (linear) relationship.
  • The values of the independent variable X are
    assumed fixed (not random); the only randomness
    in the values of Y comes from the error term ε.
  • The errors ε are uncorrelated (i.e. independent)
    in successive observations, and are normally
    distributed with mean 0 and variance σ² (equal
    variance). That is, ε ~ N(0, σ²).

LINE assumptions of the Simple Linear Regression Model

[Figure: identical normal distributions of errors, all
centered on the regression line µ_{y|x} = α + β x; at
each x, Y ~ N(µ_{y|x}, σ²_{y|x}).]
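A minimal simulation sketch of these assumptions (parameter values are illustrative, not from the slides): X is held fixed, the errors are independent N(0, σ²), and all randomness in Y enters through ε.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, sigma = 10.0, 1.5, 8.0        # illustrative parameters

x = np.linspace(20, 100, 50)               # fixed (nonrandom) X values
eps = rng.normal(0.0, sigma, size=x.size)  # independent N(0, sigma^2) errors
y = alpha + beta * x + eps                 # the only randomness in Y is eps
```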
18
18.3 Estimation: The Method of Least Squares
Estimation of a simple linear regression relationship
involves finding estimated or predicted values of the
intercept and slope of the linear regression line. The
estimated regression equation is

y = a + b x + e,

where a estimates the intercept α of the population
regression line; b estimates its slope β; and e stands
for the observed errors, the residuals from fitting
the estimated regression line a + b x to a set of n
points.
The estimated regression line:

ŷ = a + b x,

where ŷ (y-hat) is the value of Y lying on the fitted
regression line for a given value of X.
19
Fitting a Regression Line
[Figures: a data scatter; three errors from a fitted
line; three errors from the least squares regression
line. Errors e from the least squares regression line
are minimized.]
20
Errors in Regression
[Figure: the error eᵢ = yᵢ − ŷᵢ at an observed point
(xᵢ, yᵢ).]
21
Least Squares Regression
The sum of squared errors in regression is

SSE = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²,   SSE = sum of squared errors.

The least squares regression line is that which
minimizes the SSE with respect to the estimates a
and b.

[Figure: as a function of each estimate, SSE is a
parabola; the least squares a and b are the values at
its minimum.]
22
Normal Equation
S is minimized when its gradient with respect to each
parameter is equal to zero. The elements of the
gradient vector are the partial derivatives of S with
respect to the parameters. Since

S = Σᵢ rᵢ²,   rᵢ = yᵢ − (a + b xᵢ),

the derivatives are

∂S/∂a = −2 Σᵢ rᵢ = 0,   ∂S/∂b = −2 Σᵢ xᵢ rᵢ = 0.

Substitution of the expressions for the residuals and
the derivatives into the gradient equations gives,
upon rearrangement, the normal equations:

n a + (Σᵢ xᵢ) b = Σᵢ yᵢ
(Σᵢ xᵢ) a + (Σᵢ xᵢ²) b = Σᵢ xᵢ yᵢ

The normal equations are written in matrix notation as
(XᵀX) β = Xᵀ y. The solution of the normal equations
yields the vector of the optimal parameter values.
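A sketch of solving the normal equations (XᵀX)β = Xᵀy directly for the Table 18.1 data (in practice a dedicated least-squares routine such as np.linalg.lstsq is numerically preferable):

```python
import numpy as np

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])

X = np.column_stack([np.ones_like(x), x])  # design matrix with columns [1, x]
beta = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations (X'X) beta = X'y
a, b = beta                                # intercept and slope estimates
```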
23
Normal Equation
24
Sums of Squares, Cross Products, and Least
Squares Estimators
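The formulas on this slide are an image; as a reconstruction from the standard definitions (a sketch of the usual notation, not a verbatim copy of the slide):

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
b = \frac{S_{xy}}{S_{xx}}, \qquad
a = \bar{y} - b\,\bar{x}
```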
25
Example 18-1
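The slide's worked computation is an image; a sketch of the same least-squares fit for the Table 18.1 data in Python (the coefficients are computed at run time, not asserted here):

```python
import numpy as np

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])

Sxx = np.sum((x - x.mean()) ** 2)              # sum of squares of x
Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # cross products
b = Sxy / Sxx                                  # slope estimate
a = y.mean() - b * x.mean()                    # intercept estimate
print(f"y-hat = {a:.2f} + {b:.3f} x")
```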
26
New Normal Distributions
  • Since each coefficient estimator is a linear
    combination of Y (normal random variables), each
    bᵢ (i = 0, 1, ..., k) is normally distributed.
  • Notation: bⱼ ~ N(βⱼ, Var(bⱼ)).
  • In the 2D (simple regression) special case this
    gives the sampling distribution of the slope b;
    when j = 0, it gives the sampling distribution of
    the intercept a.

27
Total Variance and Error Variance
28
18.4 Error Variance and the Standard Errors of
Regression Estimators
[Figure: square and sum all regression errors to find
SSE.]
29
Standard Errors of Estimates in Regression
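The formulas on this slide are an image; under the textbook definitions s² = SSE/(n − 2), s(b) = s/√Sxx, and s(a) = s√(1/n + x̄²/Sxx) (standard results, not read off the slide), a self-contained sketch is:

```python
import numpy as np

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()

SSE = np.sum((y - (a + b * x)) ** 2)
s = np.sqrt(SSE / (n - 2))                       # residual standard error
se_b = s / np.sqrt(Sxx)                          # standard error of the slope
se_a = s * np.sqrt(1 / n + x.mean() ** 2 / Sxx)  # standard error of the intercept
```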
30
T distribution
Student's t-distribution arises when the population
standard deviation is unknown and has to be estimated
from the data.
31
18.5 Confidence Intervals for the Regression
Parameters
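Confidence intervals for the parameters take the form estimate ± t·(standard error) with n − 2 degrees of freedom. A self-contained sketch for Table 18.1 (scipy is an assumed dependency):

```python
import numpy as np
from scipy import stats

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
se_b, se_a = s / np.sqrt(Sxx), s * np.sqrt(1 / n + x.mean() ** 2 / Sxx)

t_crit = stats.t.ppf(0.975, df=n - 2)  # 95% two-sided critical value
print("beta: ", (b - t_crit * se_b, b + t_crit * se_b))
print("alpha:", (a - t_crit * se_a, a + t_crit * se_a))
```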
32
18.6 Hypothesis Tests about the Regression
Relationship
[Figures: three patterns consistent with H₀: β = 0, namely
a constant Y, unsystematic variation, and a nonlinear
relationship.]
A hypothesis test for the existence of a linear
relationship between X and Y:

H₀: β = 0
H₁: β ≠ 0

Test statistic for the existence of a linear
relationship between X and Y:

t = b / s(b),

where b is the least-squares estimate of the
regression slope and s(b) is the standard error of b.
When the null hypothesis is true, the statistic has a
t distribution with n − 2 degrees of freedom.
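This slope test is available directly in scipy (an assumed dependency); a sketch for the Table 18.1 data, where the reported p-value is the two-sided p for H₀: β = 0 with n − 2 degrees of freedom:

```python
import numpy as np
from scipy import stats

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])

res = stats.linregress(x, y)     # least-squares fit with slope inference
t_stat = res.slope / res.stderr  # t = b / s(b)
print(t_stat, res.pvalue)        # compare |t| with the t(n - 2) critical value
```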
33
T-test
A t-test tests the null hypothesis that the means of
two normally distributed populations are equal. Given
two data sets, each characterized by its mean,
standard deviation, and number of data points, we can
use some kind of t-test to determine whether the means
are distinct, provided that the underlying
distributions can be assumed to be normal. All such
tests are usually called Student's t-tests.
34
T-test
35
T test Table
36
18.7 How Good is the Regression?
The coefficient of determination, R², is a descriptive
measure of the strength of the regression
relationship, a measure of how well the regression
line fits the data. At each point the total deviation
splits into an explained and an unexplained part:

total deviation = explained deviation + unexplained deviation,

and R² = SSR/SST = 1 − SSE/SST is the percentage of
the total variation explained by the regression.

[Figure: at a point, the total deviation y − ȳ splits
into the explained deviation ŷ − ȳ and the unexplained
deviation y − ŷ.]
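A sketch computing R² for the Table 18.1 fit from the SST = SSR + SSE decomposition described above:

```python
import numpy as np

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

SST = np.sum((y - y.mean()) ** 2)     # total sum of squares
SSE = np.sum((y - (a + b * x)) ** 2)  # unexplained (error) sum of squares
SSR = SST - SSE                       # explained (regression) sum of squares
print("R2 =", SSR / SST)              # = 1 - SSE/SST
```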
37
The Coefficient of Determination
[Figures: three datasets with the decomposition
SST = SSR + SSE, illustrating R² = 0, R² = 0.90, and
R² = 0.50.]
38
Another Test
  • Earlier in this section you saw how to perform a
    t-test to compare a sample mean to an accepted
    value, or to compare two sample means. In this
    section, you will see how to use the F-test to
    compare two variances or standard deviations.
  • When using the F-test, you again require a
    hypothesis, but this time, it is to compare
    standard deviations. That is, you will test the
    null hypothesis H₀: σ₁² = σ₂² against an
    appropriate alternate hypothesis.

39
F-test
A t-test is used for each single parameter. If there
are many dimensions, all parameters are tested
independently. To verify a combination of all the
parameters, we can use an F-test. The formula for an
F-test in multiple-comparison ANOVA problems is

F = (between-group variability) / (within-group variability).
40
F test table
41
18.8 Analysis of Variance Table and an F Test of
the Regression Model
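The slide's ANOVA table is an image; a sketch of the regression F test it describes, using the standard table for simple regression (regression: 1 df; error: n − 2 df; scipy assumed for the p-value):

```python
import numpy as np
from scipy import stats

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - (a + b * x)) ** 2)
SSR = SST - SSE

F = (SSR / 1) / (SSE / (n - 2))  # F = MSR / MSE
p = stats.f.sf(F, 1, n - 2)      # upper-tail p-value
print(F, p)                      # in simple regression, F = t^2
```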
42
F-test, t-test and R²
  • 1. In the 2D case, the F-test and the t-test are
    the same; it can be proved that F = t². So in the
    2D case either the F or the t test is enough. This
    is not true for more variables.
  • 2. The F-test and R² have the same purpose: to
    measure the whole regression. They are related (in
    the 2D case) as F = (n − 2) R² / (1 − R²).
  • 3. The F-test is better than R² because its
    statistic has a known distribution for hypothesis
    testing.
  • Approach:
  • First the F-test; if it passes, continue.
  • Then a t-test for every parameter. If some
    parameter cannot pass, we can delete it and
    re-evaluate the regression.
  • Note: we can delete only one parameter (the one
    with the least effect on the regression) at a
    time, until all remaining parameters have a strong
    effect.

43
18.9 Residual Analysis
44
Example 18-1: Using a Computer (Excel)
Residual Analysis. The plot shows a curved
relationship between the residuals and the X-values
(serum IL-6).
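A sketch of this residual plot in Python (matplotlib rather than Excel, a substitution on my part):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

plt.scatter(x, y - (a + b * x))  # residuals vs. X
plt.axhline(0.0, linestyle="--")
plt.xlabel("Serum IL-6 (pg/ml)")
plt.ylabel("Residual")
plt.show()
```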
45
Prediction Interval
  • Suppose we have a sample from a normally
    distributed population.
  • The mean and standard deviation of the population
    are unknown except insofar as they can be
    estimated based on the sample. It is desired to
    predict the next observation.
  • Let n be the sample size; let µ and σ be
    respectively the unobservable mean and standard
    deviation of the population. Let X₁, ..., Xₙ be
    the sample; let X_{n+1} be the future observation
    to be predicted. Let
  • X̄ₙ = (X₁ + ... + Xₙ)/n   and
    Sₙ² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)².

46
Prediction Interval
  • Then it is fairly routine to show that
    (X_{n+1} − X̄ₙ) / (Sₙ √(1 + 1/n))
    has a Student's t-distribution with n − 1 degrees
    of freedom. Consequently we have
  • P( X̄ₙ − Tₐ Sₙ √(1 + 1/n) ≤ X_{n+1} ≤
    X̄ₙ + Tₐ Sₙ √(1 + 1/n) ) = p,
  • where Tₐ is the 100((1 + p)/2)th percentile of
    Student's t-distribution with n − 1 degrees of
    freedom. Therefore the numbers
  • X̄ₙ ± Tₐ Sₙ √(1 + 1/n)
  • are the endpoints of a 100p% prediction interval
    for X_{n+1}.
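A sketch of this construction, using the brain IL-6 values from Table 18.1 purely as an illustrative sample (the choice of data is mine):

```python
import numpy as np
from scipy import stats

X = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])
n = len(X)
xbar, S = X.mean(), X.std(ddof=1)        # sample mean and sd (n - 1 denominator)

p = 0.95
Ta = stats.t.ppf((1 + p) / 2, df=n - 1)  # 100((1+p)/2)th percentile
half = Ta * S * np.sqrt(1 + 1 / n)
print(xbar - half, xbar + half)          # 100p% prediction interval for X_{n+1}
```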

47
18.10 Prediction Interval and Confidence Interval
  • Point prediction: a single-valued estimate of Y
    for a given value of X, obtained by inserting the
    value of X in the estimated regression equation.
  • Prediction interval (for a value of Y given a
    value of X) accounts for:
  •   Variation in the regression line estimate
  •   Variation of points around the regression line
  • Confidence interval (for an average value of Y
    given a value of X) accounts for:
  •   Variation in the regression line estimate

48
Confidence Interval for an Average Value of Y Given a
Value of X
49
Confidence Interval for the Average Value of Y
50
Prediction Interval For a value of Y given a
value of X
51
Prediction Interval for a Value of Y
52
Confidence Interval for the Average Value of Y
and Prediction Interval for the Individual Value
of Y
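Putting slides 48–52 together, a sketch computing both intervals at an illustrative serum value x₀ = 60 (my choice), using the standard formulas ŷ₀ ± t·s·√h for the mean of Y and ŷ₀ ± t·s·√(1 + h) for an individual Y, with h = 1/n + (x₀ − x̄)²/Sxx:

```python
import numpy as np
from scipy import stats

x = np.array([22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7])
y = np.array([134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

x0 = 60.0                               # illustrative serum IL-6 value
y0 = a + b * x0                         # point prediction
t_crit = stats.t.ppf(0.975, df=n - 2)
h = 1 / n + (x0 - x.mean()) ** 2 / Sxx  # distance-from-mean term

ci = (y0 - t_crit * s * np.sqrt(h), y0 + t_crit * s * np.sqrt(h))
pi = (y0 - t_crit * s * np.sqrt(1 + h), y0 + t_crit * s * np.sqrt(1 + h))
print("CI for mean Y:", ci)
print("PI for individual Y:", pi)
```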
53
Summary
1. Regression analysis is applied for prediction while
controlling the effect of the independent variable X.
2. The principle of least squares in the solution of
regression parameters is to minimize the residual sum
of squares.
3. The coefficient of determination, R², is a
descriptive measure of the strength of the regression
relationship.
4. There are two confidence bands: one for mean
predictions and the other for individual prediction
values.
5. Residual analysis is used to check the goodness of
fit of models.