Title: Empirical Modeling : Linear Regression
1Empirical Modeling Linear Regression
2Fitting a continuous function to noisy data
- Can we find a relationship between these two variables?
- What about a linear relationship?
- Intuitively, we say that two variables are linearly dependent if, as one increases, the other tends to increase (or decrease) at a steady rate.
3Linear dependence
- But how do we know if there is a strong linear relationship (dependence) between the variables?
- The correlation coefficient, r, between the two variables will tell you whether the variables are linearly dependent.
Mathematica
Excel
4Linear dependence
- If r = 1 or r = -1, then the values lie on a straight line (with positive slope if r = 1 and negative slope if r = -1).
- If r = 0, there is no linear dependence between the variables.
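The three cases above can be checked numerically. The following is a minimal sketch, assuming the usual sample formula for r (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data are made up for illustration and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data, not taken from the slides.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [3, 5, 7, 9, 11])    # points on a line with positive slope
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # points on a line with negative slope
r_none = pearson_r(x, [2, 1, 0, 1, 2])   # symmetric V shape: related, but not linearly

print(f"{r_up:.3f} {r_down:.3f} {abs(r_none):.3f}")
```

The V-shaped case is a reminder that r near 0 rules out a linear relationship only; the two variables can still be related in some other way.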
5CORRELATION DOES NOT IMPLY CAUSALITY
- When we say that x and y are highly correlated, this means that x and y vary together.
- There is no implication that changes in x cause changes in y.
6CORRELATION DOES NOT IMPLY CAUSALITY
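- The following table provides information on life expectancies for a sample of 22 countries. It also lists the number of people per television set in each country.
- The correlation coefficient is -0.758.
- Life expectancy tends to be lower where there are more people per television set, but this does not mean that televisions lengthen lives.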
7Linear Regression
- Objectives
- Assess the significance of the predictor variables in explaining the variability or behavior of the response variable.
- Predict the values of the response variable given the values of the predictor variables.
8The Simple Linear Regression Baseline Model
- In the baseline model, there is no association between the response variable and the predictor variable.
- Knowing only the mean of the response variable is just as good for predicting values of the response variable as knowing the values of the predictor variable as well.
- To find the mean of the response variable, compute $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$, where there are n data points.
9The Simple Linear Regression
- The relationship between the response variable and the predictor variable can be characterized by $Y = b_0 + b_1 X + e$
- Where
- Y = response variable
- X = predictor variable
- b0 = intercept parameter
- b1 = slope parameter
- e = error term representing deviations of Y from $b_0 + b_1 X$
10The Simple Linear Regression
- In order to determine whether a simple linear regression model is better than the baseline model, you must compare the explained variability to the unexplained variability.
- The explained variability is related to the difference between the regression line and the mean of the response variable.
- The unexplained variability is related to the difference between the observed data values and the regression line.
[Figure labels: Total, Explained, and Unexplained variability]
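The comparison of explained and unexplained variability can be made concrete with sums of squares. The sketch below assumes the line is fitted by least squares, in which case the total sum of squares about the mean splits exactly into an explained part and an unexplained part. The function name `variability_decomposition` and the data are hypothetical.

```python
def variability_decomposition(xs, ys):
    """Return (total, explained, unexplained) sums of squares for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope of the least-squares line
    b0 = mean_y - b1 * mean_x    # intercept of the least-squares line
    fitted = [b0 + b1 * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)                  # data vs. mean
    explained = sum((f - mean_y) ** 2 for f in fitted)          # line vs. mean
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted)) # data vs. line
    return total, explained, unexplained

# Made-up illustrative data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
total, explained, unexplained = variability_decomposition(xs, ys)
print(f"total={total:.2f} explained={explained:.2f} unexplained={unexplained:.2f}")
```

For this data the total comes out at about 6, of which about 3.6 is explained by the line and about 2.4 is left unexplained.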
11Assumptions for Linear Regression
- The mean of the Ys is accurately modeled by a linear function of the Xs.
- The errors are independent.
12The Simple Linear Regression Model Hypothesis Test
- Null Hypothesis
- The simple linear regression model does not fit the data better than the baseline model: b1 = 0
- Alternative Hypothesis
- The simple linear regression model does fit the data better than the baseline model: b1 ≠ 0
- The test will help decide if the linear term, b1x, is significant (important) to the model.
13The Simple Linear Regression Model Hypothesis Test
- H0: b1 = 0 vs Ha: b1 ≠ 0
- A p-value in the hypothesis test will tell you if there is enough evidence to reject the null hypothesis.
- You can reject the null hypothesis with 95% confidence if the one-tail p-value satisfies p < 0.025 (equivalently, a two-sided p-value below 0.05).
- You can reject the null hypothesis with 99% confidence if the one-tail p-value satisfies p < 0.005 (equivalently, a two-sided p-value below 0.01).
- If there is not enough evidence to reject the null hypothesis, then the linear term is not significant to the model.
14Fitting a Line to the data
- Assume that the data points are (xi, yi).
- We would like to fit a function to the data.
- Once we determine that the data appears to be linear, we decide to fit the data with $\hat{y} = b_0 + b_1 x$
- Therefore, when we plug xi into the equation of the line, we get the estimate $\hat{y}_i = b_0 + b_1 x_i$ for each data point.
15Fitting a Line to the data Using Least Squares
Criterion
- We would like to pick the line in such a way that the error, or residual, between the actual yi and the estimated $\hat{y}_i$ is minimized.
- Least squares determines b0 and b1 by minimizing $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
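For a straight line, the least squares criterion has a well-known closed-form solution: b1 is the sum of cross-products of deviations divided by the sum of squared x deviations, and b0 makes the line pass through the point of means. The sketch below uses that solution and then checks that a slightly different line gives a larger sum of squared residuals. The function names and the data are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # the fitted line passes through (mean_x, mean_y)
    return b0, b1

def sum_squared_residuals(b0, b1, xs, ys):
    """Sum over all points of (actual y - estimated y) squared."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data, not taken from the slides.
xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]
b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(b0, b1, xs, ys)
nudged = sum_squared_residuals(b0 + 0.1, b1 - 0.05, xs, ys)  # a nearby, different line

print(f"y-hat = {b0:.2f} + {b1:.2f} x, SSE = {best:.2f}, nudged SSE = {nudged:.2f}")
```

To predict a new value, plug the new x into `b0 + b1 * x`.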
16Fitting a Line to the data Using Least Squares
Criterion
- An important question is how well a model fits your observed data. The R² statistic measures the proportion of the variation in the data that is explained by a model.
- Excel
- The data on the right was fitted with a straight line (the equation of the line is above the graph).
- The R² value for this model is approximately 0.985.
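As a rough sketch of how such a value is obtained, R² can be computed as 1 minus the ratio of the unexplained sum of squares to the total sum of squares. For a simple linear regression fitted by least squares, this equals the square of the correlation coefficient r. The function name `r_squared` and the data are hypothetical, and the data is not the data behind the 0.985 figure.

```python
def r_squared(xs, ys):
    """Return (R^2 of the least-squares line, squared correlation coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variation about the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return 1 - sse / syy, sxy ** 2 / (sxx * syy)

# Made-up illustrative data, not taken from the slides.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 2, 5, 4]
r2, r_squared_from_correlation = r_squared(xs, ys)
print(f"R^2 = {r2:.2f}, r^2 = {r_squared_from_correlation:.2f}")
```

Here about 64% of the variation in y is explained by the line, so this made-up fit is much looser than one with R² near 0.985.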
17Does Increased Study Time Increase GPA?
- You are curious about how much you need to study per week to keep your GPA up, so you collect information about fellow students' average amount of study time per week as well as their GPA.
- Goal: To create a model which will predict the GPA of a student given the amount of time they are willing to study per week.
Excel
18Multiple Linear Regression with 2 predictor
variables
- Consider the two-variable model $Y = b_0 + b_1 X_1 + b_2 X_2 + e$
- where
- Y = response (dependent) variable
- X1 and X2 = predictor (independent) variables
- e = error term
- b0, b1, and b2 = unknown parameters
19Multiple Linear Regression with 2 predictor
variables
- If there is no relationship among Y and X1 and X2, the model looks like a horizontal plane (Y = b0, with b1 = 0 and b2 = 0).
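To see how b0, b1, and b2 can be estimated, here is a minimal sketch that assumes the same least squares criterion as in the one-predictor case. With two predictors the normal equations reduce to a 2-by-2 system in the centered sums of squares and cross-products, which is solved directly below. The function name `fit_two_predictors` is hypothetical, and the data is generated from Y = 1 + 2 X1 + 3 X2 with no error term, so the fit should recover those values.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 + e."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if X1 and X2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Made-up data lying exactly on the plane Y = 1 + 2*X1 + 3*X2.
x1 = [0, 1, 2, 3]
x2 = [0, 1, 0, 2]
y = [1, 6, 5, 13]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

If both slope estimates came out near 0, the fitted surface would be close to the horizontal plane Y = b0.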
21The Multiple Linear Regression Model Hypothesis
Test
- Null Hypothesis
- The multiple linear regression model does not fit the data better than the baseline model: b1 = b2 = 0
- Alternative Hypothesis
- The multiple linear regression model does fit the data better than the baseline model: b1 and b2 are not both 0.
- The test will help decide if the linear terms, b1x1 and b2x2, are significant (important) to the model.
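This overall test is commonly carried out with an F statistic, which compares the explained variability per predictor with the unexplained variability per error degree of freedom. The slides do not name the statistic, so treat the following as a sketch under that assumption. It fits the model by solving the normal equations with Gaussian elimination and then computes F = (SSR / k) / (SSE / (n - k - 1)). All function names and the data are made up for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk]; rows holds one predictor tuple per point."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # normal equations (X^T X) b = X^T y

def overall_f_statistic(rows, y):
    """F = (explained SS / k) / (unexplained SS / (n - k - 1))."""
    coeffs = fit_multiple(rows, y)
    n, k = len(y), len(rows[0])
    fitted = [coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], r)) for r in rows]
    mean_y = sum(y) / n
    ssr = sum((f - mean_y) ** 2 for f in fitted)
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return (ssr / k) / (sse / (n - k - 1))

# Made-up illustrative data: (X1, X2) pairs and noisy responses.
rows = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
y = [1.5, 2.0, 5.5, 3.5, 7.0, 7.5]
f_stat = overall_f_statistic(rows, y)
print(f"F = {f_stat:.2f} on 2 and 3 degrees of freedom")
```

A large F is evidence against the null hypothesis. For this data F is about 14.75, which exceeds the tabulated 5% critical value of roughly 9.55 for 2 and 3 degrees of freedom.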
22What Affects GPA?
- You are curious about what variables might affect your GPA, so you collect information about fellow students' average amount of study time per week, time spent in the teacher's office per week, number of alcoholic drinks per week, and GPA.
- Goal: To create a model which will predict the GPA of a student.
Excel
23Multiple Linear Regression In General
- With k predictor variables, the model is $Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + e$
24The Multiple Linear Regression Model Hypothesis
Test
- Null Hypothesis
- The multiple linear regression model does not fit the data better than the baseline model: b1 = b2 = ... = bk = 0
- Alternative Hypothesis
- The multiple linear regression model does fit the data better than the baseline model: not all bi are 0.