Module 5 Regression, Chi Square

Slides: 42
Provided by: llum9
Transcript and Presenter's Notes

1
Module 5: Regression, Chi Square
  • Chapters 11, 12, 13

2
Simple Linear Regression
Y-intercept: the value of y when x = 0.
Slope: the change in y for every unit change in x.
The y variable is plotted on the vertical axis.
The x variable is plotted on the horizontal axis.
3
(No Transcript)
4
Linear Regression
The first step is to do a correlation analysis between x and y to determine the
strength of the relationship.
5
Review - Linear Regression
Correlation tells the strength of the linear relationship
between x and y. The correlation coefficient, R, can be any
value from -1 to 1. The relationship may not be linear,
in which case R understates how strongly y depends on x.
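As a rough illustration of the point above, the following sketch computes R from its textbook definition and applies it to two made-up data sets. The function name `pearson_r` and the data are assumptions for illustration; they are not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one exactly linear relationship, one exactly quadratic.
x = [-2, -1, 0, 1, 2]
r_linear = pearson_r(x, [2 * v + 1 for v in x])   # y is a straight line in x
r_quadratic = pearson_r(x, [v * v for v in x])    # y depends on x, but not linearly
print(r_linear, r_quadratic)
```

In the second case y is completely determined by x, yet R comes out at zero, which is why a scatter plot is worth checking alongside the correlation.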
6
Correlation or R
7
Review - Linear Regression
If the correlation is significant, then run a
regression analysis.
A regression creates a model of the relationship
between x and y. It fits a line to the scatter
plot by minimizing the distance between y and the
line, measured as the sum of squared errors (SSE).
8
Review - Linear Regression
Slope formula
The slope tells you the change in the dependent variable
for every unit change in the independent
variable.
Alternate calculation
Intercept formula
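The formulas on this slide did not survive in the transcript. The sketch below uses the standard least-squares expressions, which may or may not be written in the same form as on the slide. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

def slope_alternate(xs, ys):
    """Algebraically equivalent form of the slope that uses raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)
print(slope, intercept, slope_alternate(x, y))
```

Both slope functions give the same number; the raw-sum version is the kind of shortcut that is convenient when working by hand or in a spreadsheet.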
9
INTERCEPT
SLOPE
10
Review - Linear Regression
The coefficient of determination, or R-square,
measures the variation explained by the best-fit
line as a percent of the total variation.
Alternate calculation
Average y
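One common way to compute R-square is 1 - SSE/SST, where SSE is the squared error about the fitted line and SST is the squared variation about average y. The sketch below assumes that definition; the function name and data are hypothetical.

```python
def r_squared(xs, ys):
    """Return (R-square, SSE, SST) for a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]
    # SSE: variation left unexplained by the line.
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # SST: total variation of y about its average.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst, sse, sst

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, sse, sst = r_squared(x, y)
print(r2, sse, sst)
```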
11
R Square
SSE
12
Behind the numbers
13
Excel Output
14
R Square
SSE
15
Chapter 11 - Linear Regression
  • Can you use a linear regression model?
  • Assumptions must be met
  • Mean of errors about the line is 0.
  • Variance of errors is constant for all x.
  • Distribution of errors is normal.
  • Errors are independent.
  • If these don't hold, then you cannot rely on the regression.
  • You should test the data for these assumptions.
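The slides do not say which tests to run. As one possible starting point, the sketch below computes the residuals, their mean, and the Durbin-Watson statistic, a standard check for independence that the deck does not mention and that is introduced here as an assumption. It is only meaningful when the observations have a natural order. The function name and data are hypothetical.

```python
def residual_diagnostics(xs, ys):
    """Fit a least-squares line; return (mean residual, Durbin-Watson statistic)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mean_residual = sum(residuals) / n
    # Durbin-Watson: ranges from 0 to 4; values near 2 suggest
    # no first-order correlation between consecutive residuals.
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
    denominator = sum(e * e for e in residuals)
    return mean_residual, numerator / denominator

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mean_residual, dw = residual_diagnostics(x, y)
print(mean_residual, dw)
```

For a least-squares line with an intercept, the mean residual is zero by construction, so constant variance and normality still have to be judged from residual plots.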

16
Chapter 11 - Linear Regression
  • Is the model valid?
  • For simple linear regressions
  • Test to see if there is a slope, that is, the
    slope does not equal 0
  • Test to see if there is a y-intercept, that is,
    the y-intercept does not equal 0
  • Use hypothesis testing where the null hypothesis
    is slope = 0 or y-intercept = 0
  • When you reject the null, the model has a slope
    or a y-intercept
  • Rejecting the null is the result that supports the
    model, which can feel counterintuitive, so check your
    logic.

17
Chapter 11 - Linear Regression
Testing the model.
Correlation: the strength of the linear relationship
between x and y.
Coefficient of determination: the amount of
variation explained by the regression. Is the
model a reliable predictor of y?
Y-intercept: Test the p-value against a null hypothesis of
intercept = 0. In this example, assume alpha =
.01. The p-value is very large, therefore DO NOT
reject the null. There is no evidence that the intercept
differs from 0. Note that the line is still valid.
Slope: Test the p-value against a null hypothesis of
slope = 0. If the slope = 0, the line is flat and x
tells you nothing about y. In this example, assume alpha = .01.
The p-value is very small, therefore reject the
null. There is a slope other than 0. The
regression is valid.
18
Excel Demo of Prediction Interval
19
Using the model to estimate or predict
Confidence interval for the mean value of y at x
Prediction interval for an individual new value
of y at x
Degrees of freedom = n - 2
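The interval formulas are not in the transcript. The sketch below uses the usual textbook forms for simple regression and takes the t critical value (for n - 2 degrees of freedom) as an input looked up from a table. The function name, data, and chosen x value are hypothetical.

```python
import math

def intervals_at(xs, ys, x0, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at x0.

    t_crit is the t critical value for n - 2 degrees of freedom,
    looked up by the caller for the desired confidence level.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))          # standard error of the estimate
    y_hat = intercept + slope * x0        # point estimate at x0
    spread = 1 / n + (x0 - mean_x) ** 2 / sxx
    ci = t_crit * s * math.sqrt(spread)       # mean value of y at x0
    pi = t_crit * s * math.sqrt(1 + spread)   # one new individual y at x0
    return y_hat, ci, pi

# Hypothetical data; 3.182 is the two-sided 95% t value for 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, ci, pi = intervals_at(x, y, 3, 3.182)
print(y_hat, ci, pi)
```

The prediction interval is always wider than the confidence interval because it also carries the scatter of an individual observation around the line.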
20
Excel Prediction Interval
21
Chapter 11 - Linear Regression
You can use the model to predict a point estimate of
what y will be for a given x.
Qualifications
  • The assumptions about the errors must hold.
  • Don't assume that x causes y.
  • Don't predict outside of the range of the data.
  • Other qualifications hold for multiple regression.
22
Chapter 13 - Chi Square
Observed frequency
Expected frequency = (Row total x Column total) / Grand total
Null hypothesis: There is no relationship between
the categories.
Alternate hypothesis: There is a relationship.
Use alpha and degrees of freedom to
look up the critical statistic. Calculate the test
statistic and compare it to the critical
statistic. Reject or do not reject the null.
Degrees of freedom = (Number of rows - 1)(Number of
columns - 1), or Number of columns - 1 when there
is only one row
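The procedure above can be sketched as follows for a table with at least two rows. The counts are invented for illustration and are not the car data from the next slide; the function name is also hypothetical.

```python
def chi_square(observed):
    """observed: list of rows of counts (contingency table, 2+ rows).

    Returns (expected table, chi-square statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count = row total x column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, stat, df

# Hypothetical 2 x 2 table of counts.
observed = [[30, 10], [20, 40]]
expected, stat, df = chi_square(observed)

# 3.841 is the tabled critical value for alpha = .05 with 1 degree of freedom.
reject_null = stat > 3.841
print(expected, stat, df, reject_null)
```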
23
Chapter 13 - Chi Square
Does the type of car driven have any relationship
with whether the driver will run a stop sign?
Observed values.
Expected values if there were no relationship
between type of car and behavior: (Row total x
Column total) / Grand total
24
Chapter 13 - Chi Square
  • Chi square measures the deviation of observed
    from expected. The bigger the deviation, the stronger
    the evidence of a relationship.
  • Test chi square = 12.431, alpha = .05
  • Table p. 898
  • Critical chi square = 9.488
  • Because 12.431 > 9.488, reject the null: there is a
    relationship.

25
Chapter 12
  • Multiple Regression

Courtesy Winnie Li
26
Objectives
  • Review Simple Linear Regression Model
  • Multiple Linear Regression Model
  • Model Selection Criteria
  • Model Significance
  • Parameters Significance
  • Adjusted R-Square
  • Final Model Discussion and Possible Improvements
  • Residual Analysis and Assumption Check
  • Outlier Assessment
  • Data Transformation

27
What factors affect the price of a car?
  • Location: Seattle
  • Year: 2004
  • Total of 428 new vehicle models
  • Retail price vs.
    • Horsepower
    • City MPG
    • Weight

28
Review: Simple Linear Regression, Price vs. Horsepower

Parameter      Coefficient   SE         t-test    P-value         Significant?
------------------------------------------------------------------------------
y-intercept    -16639        1750.386   -9.506    1.7 x 10^-19    Yes
Slope (HP)     229.752       7.751      29.641    5.4 x 10^-104   Yes

29
Review: Simple Linear Regression, Price vs. City MPG

Parameter          Coefficient   SE         t-test     P-value         Significant?
-----------------------------------------------------------------------------------
y-intercept        67129.04      3371.483   19.911     3.32 x 10^-62   Yes
Slope (City MPG)   -1711.69      162.359    -10.543    3.72 x 10^-23   Yes
30
Review: Simple Linear Regression, Price vs. Weight

Parameter      Coefficient   SE         t-test    P-value         Significant?
------------------------------------------------------------------------------
y-intercept    -9572.44      4263.331   -2.245    0.025           Yes
Slope (Wt)     11.857        1.171      10.126    1.16 x 10^-21   Yes
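The t-test column in the three preceding slides is the coefficient divided by its standard error. The short check below recomputes it from the reported numbers; the dictionary keys are labels chosen here for illustration.

```python
# (coefficient, standard error, t value reported on the slide)
reported = {
    "HP intercept":       (-16639, 1750.386, -9.506),
    "HP slope":           (229.752, 7.751, 29.641),
    "City MPG intercept": (67129.04, 3371.483, 19.911),
    "City MPG slope":     (-1711.69, 162.359, -10.543),
    "Weight intercept":   (-9572.44, 4263.331, -2.245),
    "Weight slope":       (11.857, 1.171, 10.126),
}

# t statistic = coefficient / standard error
recomputed = {name: coef / se for name, (coef, se, _) in reported.items()}

for name, (_, _, t_slide) in reported.items():
    print(f"{name:20s} recomputed t = {recomputed[name]:9.3f}   slide t = {t_slide:9.3f}")
```

A t value far from 0 in either direction corresponds to a small p-value, which is why every row is marked significant.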
31
Simple Vs. Multiple
  • Model
  • R-square vs. adjusted R-square
  • Adjusted R-square adjusts for the number of explanatory
    terms in a model
  • It increases only if the new predictor improves the
    model
  • The adjusted R² can be negative, and will always
    be less than or equal to R² (Adjusted R² ≤ R²)
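The adjustment is usually written as 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k predictors. The sketch below assumes that standard formula; the function name and the input values are hypothetical and are not results from the car data.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-square for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical inputs: 11 observations, 2 predictors.
moderate_fit = adjusted_r_squared(0.5, 11, 2)    # penalized below 0.5
weak_fit = adjusted_r_squared(0.05, 11, 2)       # drops below zero
print(moderate_fit, weak_fit)
```

The penalty grows with k, so adding a predictor that barely raises R-square can lower the adjusted value.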

32
Possible Models
Model Selection Criteria
  • Model Significance (F-test)
  • Parameters Significance (t-test)
  • Adjusted R-square

33
Outputs 1: Model Significance
  • p-values of F-tests for each model.
  • The model is significant if p-value < 0.05
  • The model is NOT significant if p-value > 0.05

34
Outputs 2: Parameter Significance
  • p-values of t-tests for each parameter
  • The parameter is significant if p-value < 0.05
  • The parameter is NOT significant if p-value > 0.05

35
Outputs 3: Adjusted R-Square
  • R-square and adjusted R-square for each model.
  • Let's recall R-square for the simple linear
    regression models
  • With variable Horsepower: 0.5818
  • With variable City MPG: 0.2133
  • With variable Weight: 0.2001

36
Which one is the BEST Model?
37
Discussion 1: Residual Analysis
  • Mean of Zero: evenly around 0
  • Variance Constancy: NO trend
  • Independence: NO pattern
  • Normality: linear line

Ideal Residual Plot: Evenly Spread Around 0.
38
Discussion 2: Outlier Assessment
Identification rules
  • |Standardized residual| > 3: extreme outlier
  • |Standardized residual| > 2: normal outlier
After outlier removal, adjusted R-square improved
nearly 11.5%, from 0.695 (original model) to
0.785 (model after outlier removal).
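A minimal sketch of the identification rules, assuming the simplest standardization (each residual divided by the residual standard error). Statistical packages may also adjust for leverage, so their values can differ slightly. The function name, labels, and residuals are hypothetical.

```python
import math

def classify_outliers(residuals, n_params=2):
    """Label each residual with the 2 / 3 standardized-residual rules.

    n_params is the number of fitted coefficients (2 for a simple
    regression: slope and intercept).
    """
    n = len(residuals)
    # Residual standard error.
    s = math.sqrt(sum(e * e for e in residuals) / (n - n_params))
    labels = []
    for e in residuals:
        z = abs(e / s)
        if z > 3:
            labels.append("extreme outlier")
        elif z > 2:
            labels.append("normal outlier")
        else:
            labels.append("ok")
    return labels

# Hypothetical residuals: 40 small ones plus two large ones.
residuals = [1.0, -1.0] * 20 + [3.5, 5.0]
labels = classify_outliers(residuals)
print(labels[-2], labels[-1])
```

Flagged points deserve a look before deletion; an outlier can be a data-entry error or a genuinely unusual vehicle.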
39
Discussion 3: Data Transformation
Optional
Check for normality
Natural Log Transformation
  • After applying a natural log transformation to the
    variable Price, adjusted R-square improved
    another 7.6%, from 0.785 (model after outlier
    removal) to 0.845 (model after data
    transformation)
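The transformation step can be sketched as follows: take the natural log of price, fit the line to the logged values, and exponentiate predictions to get back to dollars. The data are made up so that price grows by a constant percentage per unit of horsepower; they are not the 2004 vehicle data, and the helper name is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: price rises about 0.4% per extra horsepower.
horsepower = [100, 150, 200, 250, 300]
price = [15000 * math.exp(0.004 * (hp - 100)) for hp in horsepower]

# Regress ln(price) on horsepower instead of price itself.
log_price = [math.log(p) for p in price]
slope, intercept = fit_line(horsepower, log_price)

# Predictions come out in log units; exp() converts them back to dollars.
predicted_price_220 = math.exp(intercept + slope * 220)
print(slope, predicted_price_220)
```

After the transformation the slope reads as an approximate proportional change in price per unit of x, rather than a dollar change.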

40
NEW Residual Plot
  • Mean of Zero: Satisfied
  • Variance Constancy: Satisfied
  • Independence: Satisfied
  • Normality: Satisfied

41
Recap
  • Multiple Regression Model
  • Choose a BETTER model
    • Model is significant
    • ALL parameters are significant
    • Larger adjusted R-square
  • Validate the model and make further improvements
    • Assumption check: residual analysis
    • Outlier assessment: delete outliers
    • Ensure normality: possible data transformation