Correlation and Regression Analysis - PowerPoint PPT Presentation

1
Systems Engineering Program
Department of Engineering Management, Information
and Systems
EMIS 7370/5370 STAT 5340 PROBABILITY AND
STATISTICS FOR SCIENTISTS AND ENGINEERS
Correlation and Regression Analysis An
Application
Dr. Jerrell T. Stracener, SAE Fellow
Leadership in Engineering
2
Montgomery, Peck, and Vining (2001) present data
concerning the performance of the 28 National
Football League teams in 1976. It is suspected
that the number of games won (y) is related to the
number of yards gained rushing by an opponent (x).
The data are shown in the following table.
3
Team              Games Won (y)   Yards Rushing by Opponent (x)
Washington        10              2205
Minnesota         11              2096
New England       11              1847
Oakland           13              1903
Pittsburgh        10              1457
Baltimore         11              1848
Los Angeles       10              1564
Dallas            11              1821
Atlanta           4               2577
Buffalo           2               2476
Chicago           7               1984
Cincinnati        10              1917
Cleveland         9               1761
Denver            9               1709
Detroit           6               1901
Green Bay         5               2288
Houston           5               2072
Kansas City       5               2861
Miami             6               2411
New Orleans       4               2289
New York Giants   3               2203
New York Jets     3               2592
Philadelphia      4               2053
St. Louis         10              1979
San Diego         6               2048
San Francisco     8               1786
Seattle           2               2876
Tampa Bay         0               2560
4
Correlation Analysis
  • Statistical analysis used to obtain a quantitative
    measure of the strength of the relationship between
    a dependent variable and one or more independent
    variables

5
Scatter Plot
6
Sample correlation coefficient
Notes: -1 ≤ r ≤ 1; R = r² × 100% is the
coefficient of determination
7
R = r² × 100% = 0.5447 × 100% = 54.47%
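The sample correlation coefficient and the coefficient of determination can be reproduced from the table data. The sketch below assumes the computational formula r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²)), which is consistent with the intermediate values in the Excel sheet; the function name `sample_correlation` is illustrative only.

```python
import math

# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]


def sample_correlation(xs, ys):
    """Pearson r from raw sums (hypothetical helper, not from the slides)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = sample_correlation(x, y)
R = r ** 2 * 100  # coefficient of determination, in percent
print(f"r = {r:.6f}, R = {R:.2f}%")
```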
8
Correlation
To test for no linear association between x and
y, calculate t = r sqrt(n - 2) / sqrt(1 - r²), where r is the sample
correlation coefficient and n is the sample size.

9
Correlation
Conclude no linear association if -t(α/2, n-2) ≤ t ≤ t(α/2, n-2); then
treat y1, y2, ..., yn as a random sample
10
Correlation
Take α = 0.05. From the t-table, we get t(0.025, 26) = 2.0555.
Since t = -5.5766 < -2.0555, we conclude that
there is a linear association between x and y and
proceed with regression analysis
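The test can be sketched as follows, assuming the usual statistic t = r sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The critical value 2.0555 is copied from the slide rather than computed, and `T_CRIT` is an illustrative name. Rounding may make the last digits differ slightly from the slide's t value.

```python
import math

# Values taken from the slides: sample correlation, sample size, and the
# two-sided critical value from the t-table at alpha = 0.05 with 26 df.
r = -0.738027
n = 28
T_CRIT = 2.0555

# Test statistic for H0: no linear association between x and y.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Reject H0 when t falls outside [-T_CRIT, T_CRIT].
reject_h0 = abs(t) > T_CRIT

print(f"t = {t:.4f}, reject H0: {reject_h0}")
```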
11
Linear Regression Model
Simple linear regression model
Y = β0 + β1 x + ε
where Y is the response (or dependent)
variable, β0 and β1 are the unknown parameters, ε ~
N(0, σ), and the data are (x1, y1), (x2, y2), ..., (xn,
yn)
12
Least squares estimates of β0 and β1
13
Estimates of β1
14
Estimates of β0
15
Least squares regression equation
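A minimal sketch of the least squares fit for this data set, assuming the standard estimates b1 = Sxy / Sxx and b0 = ȳ - b1 x̄. The helper name `least_squares` is hypothetical.

```python
# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]


def least_squares(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.4f} + ({b1:.6f}) x")
```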
16
Regression Fitted Line Plot
17
Point estimate of σ²
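The point estimate of σ² can be sketched as the residual sum of squares divided by n - 2, which is how the Excel sheet's S² value (SSE / 26) appears to be obtained. Variable names here are illustrative.

```python
import math

# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual (error) sum of squares around the fitted line.
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

s2 = sse / (n - 2)   # point estimate of sigma squared
s = math.sqrt(s2)    # standard error of the regression

print(f"SSE = {sse:.3f}, s^2 = {s2:.6f}, s = {s:.6f}")
```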
18
Interval Estimates for y intercept (β0)
19
Interval Estimates for y intercept (β0)
Take α = 0.05; then the 95% confidence interval for β0
is
20
Interval Estimates for y intercept (β0)
Apply to the equation and we get the lower
and upper bounds for β0
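A sketch of the intercept interval, assuming the textbook form b0 ± t(α/2, n-2) · s_b0 with s_b0 = s · sqrt(1/n + x̄²/Sxx). The critical value 2.0555 is taken from the earlier slide, and all names are illustrative. Because the Excel calculation sheet seems to use a slightly rounded t value, the bounds may differ from it in the third decimal place.

```python
import math

# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]
T_CRIT = 2.0555  # t(0.025, 26), copied from the slide's t-table lookup

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2))

# Standard error of the intercept estimate.
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

b0_lower = b0 - T_CRIT * se_b0
b0_upper = b0 + T_CRIT * se_b0
print(f"se(b0) = {se_b0:.6f}, 95% CI for beta0: ({b0_lower:.4f}, {b0_upper:.4f})")
```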
21
Interval Estimates for slope (β1)
(1 - α) × 100% confidence interval for β1 is
b1 ± t(α/2, n-2) · s_b1, where s_b1 = s / sqrt(Sxx) and where Sxx = Σ(xi - x̄)²
22
Interval Estimates for slope (β1)
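The slope interval can be sketched the same way, assuming the textbook form b1 ± t(α/2, n-2) · s / sqrt(Sxx). The critical value is again the slide's 2.0555, and the variable names are illustrative.

```python
import math

# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]
T_CRIT = 2.0555  # t(0.025, 26), copied from the slide's t-table lookup

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2))

# Standard error of the slope estimate.
se_b1 = s / math.sqrt(s_xx)

b1_lower = b1 - T_CRIT * se_b1
b1_upper = b1 + T_CRIT * se_b1
print(f"se(b1) = {se_b1:.6f}, 95% CI for beta1: ({b1_lower:.6f}, {b1_upper:.6f})")
```

Since the whole interval lies below zero, it agrees with the earlier conclusion that a linear association exists.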
23
Confidence interval for conditional mean of Y,
given x = 2205
Given x equal to 2205, we can calculate the
confidence interval of the conditional mean of Y
24
Confidence interval for conditional mean of Y,
given x = 2205
and
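A sketch of the interval for the conditional mean at x = 2205, assuming the textbook formula ŷ ± t(α/2, n-2) · s · sqrt(1/n + (x0 - x̄)²/Sxx). The function name `mean_ci` is hypothetical, and the critical value is the slide's 2.0555. The bounds produced this way need not agree with the mu-l and mu-u cells of the spreadsheet transcript, whose exact formula is not shown.

```python
import math

# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]
T_CRIT = 2.0555  # t(0.025, 26), copied from the slide's t-table lookup


def mean_ci(xs, ys, x0, t_crit):
    """Return (fit, half_width) for the mean response at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys)) / (n - 2))
    fit = b0 + b1 * x0
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return fit, half_width


fit, hw = mean_ci(x, y, 2205, T_CRIT)
print(f"y-hat(2205) = {fit:.4f}, 95% CI for the mean: ({fit - hw:.4f}, {fit + hw:.4f})")
```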
25
(No Transcript)
26
Prediction interval for a single future value of
Y, given x
and
27
Prediction interval for a single future value of
Y, given x = 2000
Given x = 2000,
28
Prediction interval for a single future value of
Y, given x = 2000
and
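A sketch of the prediction interval at x = 2000, assuming the textbook formula ŷ ± t(α/2, n-2) · s · sqrt(1 + 1/n + (x0 - x̄)²/Sxx). The extra 1 under the square root accounts for the variability of a single future observation. The function name `prediction_interval` is hypothetical, and the result need not match the y-l and y-u cells in the spreadsheet transcript, whose exact formula is not shown.

```python
import math

# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]
T_CRIT = 2.0555  # t(0.025, 26), copied from the slide's t-table lookup


def prediction_interval(xs, ys, x0, t_crit):
    """Return (fit, half_width) for one future observation at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys)) / (n - 2))
    fit = b0 + b1 * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return fit, half_width


fit, hw = prediction_interval(x, y, 2000, T_CRIT)
print(f"y-hat(2000) = {fit:.4f}, 95% PI: ({fit - hw:.4f}, {fit + hw:.4f})")
```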
29
(No Transcript)
30
Excel Calculation
X Y XY X² Y² Ŷ (Y-Ŷ)² (x-x̄)²
2205 10 22050 4862025 100 6.297905 13.70551 8997.878
2096 11 23056 4393216 121 7.063641 15.49492 200.0204
1847 11 20317 3411409 121 8.812891 4.783447 69244.16
1903 13 24739 3621409 169 8.419485 20.98112 42908.16
1457 10 14570 2122849 100 11.55268 2.410815 426595.6
1848 11 20328 3415104 121 8.805866 4.814226 68718.88
1564 10 15640 2446096 100 10.80099 0.641591 298272
1821 11 20031 3316041 121 8.995543 4.017847 83603.59
2577 4 10308 6640929 16 3.684567 0.099498 217955.6
2476 2 4952 6130576 4 4.394103 5.731727 133851.4
1984 7 13888 3936256 49 7.850452 0.723268 15912.02
1917 10 19170 3674889 100 8.321134 2.818592 37304.16
1761 9 15849 3101121 81 9.417049 0.17393 121900.7
1709 9 15381 2920681 81 9.782355 0.612079 160915.6
1901 6 11406 3613801 36 8.433535 5.922094 43740.73
2288 5 11440 5234944 25 5.714821 0.51097 31633.16
2072 5 10360 4293184 25 7.232243 4.982909 1454.878
2861 5 14305 8185321 25 1.689439 10.95981 563786.4
2411 6 14466 5812921 36 4.850734 1.320812 90515.02
2289 4 9156 5239521 16 5.707796 2.916568 31989.88
2203 3 6609 4853209 9 6.311955 10.96905 8622.449
2592 3 7776 6718464 9 3.579191 0.335462 232186.3
2053 4 8212 4214809 16 7.36572 11.32807 3265.306
1979 10 19790 3916441 100 7.885577 4.470783 17198.45
2048 6 12288 4194304 36 7.400846 1.962368 3861.735
1786 8 14288 3189796 64 9.241422 1.541128 105068.6
2876 2 5752 8271376 4 1.584062 0.173004 586537.2
2560 0 0 6553600 0 3.803994 14.47037 202371.4
SUM 59084 195 386127 128284292 1685 195 148.872 3608611
x-bar 2110.1429
-709824 34.54949
101041120 9155 961785.6 -0.738027304 <- r Sb0 14.0723
2.696233
b1 -0.007025 5.725845085 <- S² b0l 16.2448
b0 21.788251 2.392873813 <- S b0u 27.33171

Sb1 0.00126 0.00126
Sb1l -0.00961 -0.00961
Y(2205) -> 6.2979048 Sb1u -0.00444 -0.00444
mu-l 1.291074258
mu-u 11.30473529
Y(2000) -> 7.7380503 y-l 0.718628866
y-u 14.7574718
31
Excel Regression Analysis Output
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.738027
R Square 0.544684
Adjusted R Square 0.527172
Standard Error 2.392874
Observations 28

ANOVA
  df SS MS F Significance F
Regression 1 178.0923 178.0923 31.10324 7.381E-06
Residual 26 148.872 5.725845
Total 27 326.9643      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 21.78825 2.696233 8.080996 1.46E-08 16.246064 27.3304377 16.2460641 27.33044
X Variable 1 -0.00703 0.00126 -5.57703 7.38E-06 -0.009614 -0.0044359 -0.0096143 -0.00444
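The ANOVA rows can be checked with a short sketch, assuming the usual decomposition SST = SSR + SSE with 1 and n - 2 degrees of freedom. Variable names are illustrative.

```python
# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sst = sum((b - y_bar) ** 2 for b in y)                       # total SS, df = n - 1
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))    # residual SS, df = n - 2
ssr = sst - sse                                              # regression SS, df = 1

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse   # for simple regression this equals the slope's t Stat squared

print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}, F = {f_stat:.5f}")
```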



RESIDUAL OUTPUT

Observation Predicted Y Residuals
1 6.297905 3.702095
2 7.063641 3.936359
3 8.812891 2.187109
4 8.419485 4.580515
5 11.55268 -1.55268
6 8.805866 2.194134
7 10.80099 -0.80099
8 8.995543 2.004457
9 3.684567 0.315433
10 4.394103 -2.3941
11 7.850452 -0.85045
12 8.321134 1.678866
13 9.417049 -0.41705
14 9.782355 -0.78235
15 8.433535 -2.43354
16 5.714821 -0.71482
17 7.232243 -2.23224
18 1.689439 3.310561
19 4.850734 1.149266
20 5.707796 -1.7078
21 6.311955 -3.31195
22 3.579191 -0.57919
23 7.36572 -3.36572
24 7.885577 2.114423
25 7.400846 -1.40085
26 9.241422 -1.24142
27 1.584062 0.415938
28 3.803994 -3.80399
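The residual output can be regenerated with the sketch below, which assumes residual = observed y minus predicted y, in the same team order as the data table. List names are illustrative.

```python
# Opponent rushing yards (x) and games won (y) for the 28 teams in the table.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984, 1917,
     1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289, 2203, 2592, 2053, 1979,
     2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7, 10, 9, 9,
     6, 5, 5, 5, 6, 4, 3, 3, 4, 10, 6, 8, 2, 0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Predicted games won and residuals, one entry per observation.
predicted = [b0 + b1 * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]

for i, (p, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:2d} {p:10.6f} {e:10.6f}")

# With an intercept in the model, least squares residuals sum to zero
# (up to floating-point error).
print(f"sum of residuals = {sum(residuals):.2e}")
```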