Title: Module 5 Regression, Chi Square
1 Module 5: Regression, Chi Square
2 Simple Linear Regression
Y-intercept: the value of y when x = 0.
Slope: the change in y for every unit change in x.
The y variable is plotted on the vertical axis.
The x variable is plotted on the horizontal axis.
4 Linear Regression
The first step is to do a correlation analysis between x and y to determine the
strength of the relationship.
5 Review - Linear Regression
Correlation tells the strength of the linear relationship between x and y.
Correlation, or R, can be any value from -1 to 1. A weak correlation does not
rule out a relationship; the relationship may not be linear.
6 Correlation or R
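The formula on this slide did not survive in the transcript. As a minimal sketch, the usual Pearson correlation can be computed as below; the function name `pearson_r` and both data sets are hypothetical, not from the slides. The second data set shows a perfect curved relationship whose R is 0, which is the "may not be linear" caveat.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a roughly linear trend.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Hypothetical data with an exact but non-linear relationship (y = x squared).
x_sym = [-2, -1, 0, 1, 2]
y_curve = [v * v for v in x_sym]
r_curve = pearson_r(x_sym, y_curve)

print("R, linear-ish data:", round(r, 4))
print("R, curved data:", round(r_curve, 4))
```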
7 Review - Linear Regression
If the correlation is significant, then create a regression analysis.
A regression creates a model of the relationship between x and y. It fits a
line to the scatter plot by minimizing the sum of squared vertical distances
between the observed y values and the line (the sum of squared errors, SSE).
8 Review - Linear Regression
Slope formula: tells you the change in the dependent variable for every unit
change in the independent variable.
Alternate calculation of the slope
Intercept formula
9 INTERCEPT
SLOPE
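The slope and intercept formulas themselves are missing from the transcript. A minimal sketch using the standard least-squares definitions follows; the function name `fit_line` and the data are hypothetical. The alternate slope shown (R times the ratio of standard deviations) is one common alternate form and is an assumption about what the slide intended.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    # Alternate form (assumed): slope = R * (s_y / s_x).
    r = sxy / math.sqrt(sxx * syy)
    slope_alt = r * math.sqrt(syy / sxx)
    return intercept, slope, slope_alt

# Hypothetical data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope, slope_alt = fit_line(x, y)
print("intercept:", round(intercept, 4), "slope:", round(slope, 4))
```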
10 Review - Linear Regression
The coefficient of determination, or R-square, measures the variation explained
by the best-fit line as a percent of the total variation.
Alternate calculation
Average y
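A minimal sketch of R-square as explained variation over total variation, written as 1 - SSE/SST, where SST is measured around the average y. The data are hypothetical, and the intercept 2.2 and slope 0.6 are the least-squares values for that data.

```python
def r_squared(xs, ys, intercept, slope):
    """Return (SSE, SST, R-square) for a fitted line."""
    mean_y = sum(ys) / len(ys)
    # SSE: variation left unexplained by the line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # SST: total variation of y around its average.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return sse, sst, 1 - sse / sst

# Hypothetical data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sse, sst, r2 = r_squared(x, y, intercept=2.2, slope=0.6)
print("SSE:", round(sse, 4), "SST:", round(sst, 4), "R-square:", round(r2, 4))
```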
11 R Square
SSE
12 Behind the numbers
13 Excel Output
14 R Square
SSE
15 Chapter 11 - Linear Regression
- Can you use a linear regression model?
- Assumptions must be met:
- Mean of errors about the line is 0.
- Variance of errors is constant for all x.
- Distribution of errors is normal.
- Errors are independent.
- If these don't hold, then you cannot use regression.
- You should test the data for these assumptions.
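As a rough starting point for testing these assumptions, the residuals (errors about the line) can be computed and summarized. This is only a sketch with hypothetical data and a hypothetical helper `fit_line`. Note that a least-squares line with an intercept forces the mean residual to 0, so that check is a sanity check; the split-halves variance comparison is a crude look at constant variance, and normality and independence would still need plots or formal tests.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data, not from the slides.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.8, 16.1]

intercept, slope = fit_line(x, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Assumption: mean of errors about the line is 0.
mean_resid = statistics.fmean(resid)

# Crude look at constant variance: compare spread for low x and high x.
half = len(resid) // 2
var_low = statistics.pvariance(resid[:half])
var_high = statistics.pvariance(resid[half:])

print("mean residual:", round(mean_resid, 6))
print("variance (low x):", round(var_low, 4), "variance (high x):", round(var_high, 4))
```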
16 Chapter 11 - Linear Regression
- Is the model valid?
- For simple linear regressions:
- Test to see if there is a slope, that is, the slope does not equal 0.
- Test to see if there is a y-intercept, that is, the y-intercept does not equal 0.
- Use hypothesis testing where the null hypothesis is slope = 0 and y-intercept = 0.
- When you reject the null, the model has a slope or a y-intercept.
- The conclusion is counterintuitive, so check your logic.
17 Chapter 11 - Linear Regression
Testing the model.
Correlation: the strength of the linear relationship between x and y.
Coefficient of determination: the amount of variation explained by the
regression. Is the model a reliable predictor of y?
Y-intercept: Test the p-value against a null hypothesis of intercept = 0. In
this example, assume alpha = .01. The p-value is very large, therefore DO NOT
reject the null. The intercept is not significantly different from 0. Note the
line is still valid.
Slope: Test the p-value against a null hypothesis of slope = 0. If the slope = 0,
y does not change with x, so there is no useful line. In this example, assume
alpha = .01. The p-value is very small, therefore reject the null. There is a
slope other than 0. The regression is valid.
18 Excel Demo of Prediction Interval
19 Using the model to estimate or predict
Confidence interval for the mean value of y at x
Prediction interval for an individual new value of y at x
Degrees of freedom = n - 2
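The interval formulas were shown in Excel and are not in the transcript. A minimal sketch using the standard simple-regression formulas follows; the function name `intervals` and the data are hypothetical, and the t critical value 3.182 (95%, two-sided, 3 degrees of freedom) is taken from a standard t table. The prediction interval is always wider because it adds the variation of an individual value to the uncertainty in the mean.

```python
import math

def intervals(xs, ys, x0, t_crit):
    """Point estimate, confidence interval, and prediction interval at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    y_hat = intercept + slope * x0
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the estimate, with n - 2 degrees of freedom.
    s = math.sqrt(sse / (n - 2))
    leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)        # for the mean value of y at x0
    se_pred = s * math.sqrt(1 + leverage)    # for an individual new y at x0
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

# Hypothetical data, not from the slides: n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT_95_DF3 = 3.182  # from a t table
y_hat, ci, pi = intervals(x, y, x0=3, t_crit=T_CRIT_95_DF3)
print("point estimate:", round(y_hat, 3))
print("confidence interval:", tuple(round(v, 3) for v in ci))
print("prediction interval:", tuple(round(v, 3) for v in pi))
```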
20 Excel Prediction Interval
21 Chapter 11 - Linear Regression
You can use the model to predict a point estimate of what y will be for a given x.
Qualifications:
- Assumptions of error must hold.
- Don't assume that x causes y.
- Don't predict outside of the range of data.
- Other qualifications hold for multiple regression.
22 Chapter 13 - Chi Square
Observed frequency
Expected frequency = (Row total x Column total) / Grand total
Null hypothesis: There is no relationship between the categories.
Alternate hypothesis: There is a relationship.
Use alpha and degrees of freedom to look up the critical statistic. Calculate
the test statistic and compare it to the critical statistic. Reject or do not
reject the null.
Degrees of freedom = (Number of rows - 1)(Number of columns - 1), or
Number of columns - 1 when there is only one row.
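The steps above can be sketched for a table with two or more rows. The function name `chi_square` and the observed counts are hypothetical; the one-row case, where expected counts come from elsewhere, is not covered here.

```python
def chi_square(observed):
    """Expected frequencies, chi-square statistic, and degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency = (row total x column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, stat, df

# Hypothetical 2 x 2 table of observed counts, not from the slides.
observed = [[10, 20],
            [30, 40]]
expected, stat, df = chi_square(observed)
print("expected:", expected)
print("chi square:", round(stat, 4), "df:", df)
```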
23 Chapter 13 - Chi Square
Does the type of car driven have any relationship with whether the driver will
run a stop sign?
Observed values.
Expected values if there were no relationship between type of car and behavior:
(Row total x Column total) / Grand total
24 Chapter 13 - Chi Square
- Chi square measures the deviation of observed from expected. The bigger the
deviation, the more likely there is a relationship.
- Test chi square = 12.431, alpha = .05
- Table p. 898
- Critical chi square = 9.488
- Since 12.431 is greater than 9.488, reject the null: there is a relationship.
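The comparison can be sketched as a simple lookup. The critical values below are the standard upper-tail table values at alpha = .05; 9.488 is the entry for 4 degrees of freedom, which is an inference, since the slide does not state the table's dimensions. The function name `reject_null` is hypothetical.

```python
# Upper-tail chi-square critical values at alpha = .05, from a standard table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def reject_null(test_stat, df, table=CRITICAL_05):
    """Reject the null when the test statistic exceeds the critical value."""
    return test_stat > table[df]

# Values from the slide; df = 4 is assumed from the critical value 9.488.
decision = reject_null(12.431, df=4)
print("reject null:", decision)
```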
25 Chapter 12
Courtesy Winnie Li
26 Objectives
- Review Simple Linear Regression Model
- Multiple Linear Regression Model
- Model Selection Criteria
- Model Significance
- Parameters Significance
- Adjusted R-Square
- Final Model Discussion and Possible Improvements
- Residual Analysis and Assumption Check
- Outlier Assessment
- Data Transformation
27 What factors affect the price of a car?
- Location: Seattle
- Year: 2004
- Total of 428 new vehicle models
- Retail price vs. horsepower, city MPG, and weight
28 Review Simple Linear Regression: Price vs. Horsepower
Parameter      Coefficient   SE         t-test    P-value          Significant?
y-intercept    -16639        1750.386   -9.506    1.7 x 10^-19     Yes
Slope (HP)     229.752       7.751      29.641    5.4 x 10^-104    Yes
29 Review Simple Linear Regression: Price vs. City MPG
Parameter          Coefficient   SE         t-test     P-value          Significant?
y-intercept        67129.04      3371.483   19.911     3.32 x 10^-62    Yes
Slope (City mpg)   -1711.69      162.359    -10.543    3.72 x 10^-23    Yes
30 Review Simple Linear Regression: Price vs. Weight
Parameter      Coefficient   SE         t-test    P-value          Significant?
y-intercept    -9572.44      4263.331   -2.245    0.025            Yes
Slope (Wt)     11.857        1.171      10.126    1.16 x 10^-21    Yes
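Each t-test value in the three tables above is the coefficient divided by its standard error. A short check, using only the numbers from the tables (the function and variable names are hypothetical):

```python
def t_statistic(coefficient, standard_error):
    """t-test value for the null hypothesis that the parameter equals 0."""
    return coefficient / standard_error

# (label, coefficient, SE, reported t-test) copied from the three tables.
rows = [
    ("Price vs. HP, y-intercept", -16639, 1750.386, -9.506),
    ("Price vs. HP, slope", 229.752, 7.751, 29.641),
    ("Price vs. City MPG, y-intercept", 67129.04, 3371.483, 19.911),
    ("Price vs. City MPG, slope", -1711.69, 162.359, -10.543),
    ("Price vs. Weight, y-intercept", -9572.44, 4263.331, -2.245),
    ("Price vs. Weight, slope", 11.857, 1.171, 10.126),
]

computed = [(label, t_statistic(c, se), reported) for label, c, se, reported in rows]
# Largest disagreement between computed and reported values (rounding only).
max_gap = max(abs(t - reported) for _, t, reported in computed)

for label, t, reported in computed:
    print(f"{label}: computed {t:.3f}, reported {reported}")
```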
31 Simple vs. Multiple
- R-square -> adjusted R-square
- Adjusts for the number of explanatory terms in a model
- Increases only if the new predictor improves the model
- The adjusted R2 can be negative, and will always be less than or equal to R2
(adjusted R2 <= R2)
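A minimal sketch of the usual adjusted R-square formula, 1 - (1 - R2)(n - 1)/(n - k - 1), with n observations and k predictors. The function name `adjusted_r2` is hypothetical. The first example uses the horsepower R-square of 0.5818 and assumes all 428 vehicle models were used in the fit, which the slides do not confirm; the second uses made-up numbers to show a negative result.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R-square from the horsepower model; n = 428 is an assumption.
adj_hp = adjusted_r2(0.5818, n=428, k=1)

# Hypothetical weak model with few observations: the result is negative.
adj_weak = adjusted_r2(0.01, n=10, k=3)

print("adjusted R2 (horsepower):", round(adj_hp, 4))
print("adjusted R2 (weak model):", round(adj_weak, 4))
```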
32 Possible Models
Model Selection Criteria
- Model Significance (F-test)
- Parameters Significance (t-test)
- Adjusted R-square
33 Outputs 1: Model Significance
- p-values of F-tests for each Model.
- The Model is Significant if p-value < 0.05
- The Model is NOT Significant if p-value > 0.05
34 Outputs 2: Parameter Significance
- p-values of t-tests for each parameter
- The Parameter is Significant if p-value < 0.05
- The Parameter is NOT Significant if p-value > 0.05
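The decision rule can be sketched as a one-line comparison. The p-values below are the ones reported for the Price vs. Weight model; the function name `is_significant` is hypothetical. The intercept is a useful reminder that the verdict depends on alpha: it passes at 0.05 but would not at 0.01.

```python
ALPHA = 0.05

def is_significant(p_value, alpha=ALPHA):
    """Significant when the p-value is below the chosen alpha."""
    return p_value < alpha

# p-values reported for the Price vs. Weight model.
p_intercept = 0.025
p_slope = 1.16e-21

print("intercept significant at 0.05:", is_significant(p_intercept))
print("intercept significant at 0.01:", is_significant(p_intercept, alpha=0.01))
print("slope significant at 0.05:", is_significant(p_slope))
```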
35 Outputs 3: Adjusted R-Square
- R-square and Adjusted R-square for each Model.
- Let's recall R-square for the Simple Linear Regression Models:
- With Variable Horsepower: 0.5818
- With Variable City MPG: 0.2133
- With Variable Weight: 0.2001
36 Which one is the BEST Model?
37 Discussion 1: Residual Analysis
Ideal residual plot: evenly spread around 0, with no trend and no pattern.
Plot labels: Evenly Around 0; NO Trend; NO Pattern; Linear Line.
38 Discussion 2: Outlier Assessment
Identification rules:
- |Standardized residual| > 3: extreme outlier
- |Standardized residual| > 2: normal outlier
After outlier removal, adjusted R-square improved nearly 11.5%, from 0.695
(original model) to 0.785 (model after outlier removal).
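The identification rules can be sketched as below. Standardized residuals have more than one definition; this sketch assumes the simplest one, each residual divided by the standard error of the estimate, which may differ slightly from what Excel reports. The residuals and the function name `classify` are hypothetical.

```python
import math

def classify(residuals, n_params=2):
    """Label each residual using the 2 and 3 standardized-residual rules."""
    n = len(residuals)
    # Standard error of the estimate (n_params = 2 for slope and intercept).
    s = math.sqrt(sum(e * e for e in residuals) / (n - n_params))
    labels = []
    for e in residuals:
        z = abs(e / s)
        if z > 3:
            labels.append("extreme outlier")
        elif z > 2:
            labels.append("normal outlier")
        else:
            labels.append("not an outlier")
    return labels

# Hypothetical residuals (they sum to 0), not from the slides.
residuals = [0.5] * 8 + [-0.5] * 13 + [6.0, -3.5]
labels = classify(residuals)
print("extreme outliers:", labels.count("extreme outlier"))
print("normal outliers:", labels.count("normal outlier"))
```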
39 Discussion 3: Data Transformation
Optional
Check for normality
Natural Log Transformation
- After applying a natural log transformation to the variable Price, adjusted
R-square improved another 7.6%, from 0.785 (model after outlier removal) to
0.845 (model after data transformation).
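A minimal sketch of the transformation step itself, with hypothetical prices (the slide's data are not in the transcript). The model is then fit to log(price) instead of price, and any prediction on the log scale is converted back with the exponential. The function name `to_price` is hypothetical.

```python
import math

# Hypothetical retail prices with a long right tail, not from the slides.
prices = [15000, 22000, 31000, 48000, 90000, 190000]

# Natural log transformation of the dependent variable.
log_prices = [math.log(p) for p in prices]

def to_price(log_prediction):
    """Convert a prediction made on the log scale back to dollars."""
    return math.exp(log_prediction)

# The log scale compresses the spread between cheap and expensive cars.
raw_ratio = max(prices) / min(prices)
log_range = max(log_prices) - min(log_prices)
print("max/min on raw scale:", round(raw_ratio, 2))
print("max - min on log scale:", round(log_range, 2))
```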
40 NEW Residual Plot
All four checks: Satisfied
41 Recap
- Multiple Regression Model
- Choose a BETTER model:
- Model is Significant
- ALL Parameters are Significant
- Larger Adjusted R-Square
- Validate the Model and More Improvement:
- Assumption Check: Residual Analysis
- Outlier Assessment: Delete Outliers
- Ensure Normality: Possible Data Transformation