Title: Probability Distribution of Random Error
Regression Modeling Steps
- 1. Hypothesize Deterministic Component
- 2. Estimate Unknown Model Parameters
- 3. Specify Probability Distribution of Random Error Term
  - Estimate Standard Deviation of Error
- 4. Evaluate Model
- 5. Use Model for Prediction & Estimation
Linear Regression Assumptions
- Assumptions about the errors ε1, ..., εn
  - Gauss-Markov conditions
    - Independent errors
    - Mean of the probability distribution of the errors is 0
    - Errors have constant variance σ², for which an estimator is S²
  - Probability distribution of the errors is normal
- Potential violation of the G-M conditions
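For reference, the estimator S² of σ² is usually written in terms of the residuals, with divisor n - 2 because two parameters (β0 and β1) are estimated from the data. This is the standard simple-regression form, stated here as a sketch rather than copied from a lost slide; for the Obstetrics data used later it gives S = 0.60553, the Root MSE in the SAS output.

```latex
S^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2},
\qquad S = \sqrt{S^2}
```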
Error Probability Distribution
Random Error Variation
- 1. Variation of Actual Y from Predicted Y
- 2. Measured by Standard Error of Regression Model
  - Sample Standard Deviation of ε, S
- 3. Affects Several Factors
  - Parameter Significance
  - Prediction Accuracy
Evaluating the Model
Test of Slope Coefficient
- 1. Shows If There Is a Linear Relationship Between X & Y
- 2. Involves Population Slope β1
- 3. Hypotheses
  - H0: β1 = 0 (No Linear Relationship)
  - Ha: β1 ≠ 0 (Linear Relationship)
- 4. The theoretical basis of the test statistic is the sampling distribution of the slope
Sampling Distribution of Sample Slopes
- All Possible Sample Slopes
  - Sample 1: 2.5
  - Sample 2: 1.6
  - Sample 3: 1.8
  - Sample 4: 2.1
  - Very large number of sample slopes
(Figure: sampling distribution of the sample slopes β̂1, labeled with β1 and the standard error S_β̂1.)
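The idea of "all possible sample slopes" can be illustrated by simulation. This minimal sketch assumes a hypothetical true model (β0 = -0.1, β1 = 0.7, σ = 0.6, values chosen only to resemble the Obstetrics fit) with X fixed at 1 to 5, draws many samples, and fits a least-squares slope to each. The mean of the slopes should land near β1 and their standard deviation near σ/√SSxx.

```python
import random
import statistics

# Hypothetical "true" model for illustration only (an assumption, not data):
# Y = BETA0 + BETA1 * X + eps, with independent eps ~ N(0, SIGMA^2).
BETA0, BETA1, SIGMA = -0.1, 0.7, 0.6
X = [1, 2, 3, 4, 5]                      # fixed (non-random) design
X_BAR = statistics.fmean(X)
SS_XX = sum((x - X_BAR) ** 2 for x in X)


def sample_slope(rng: random.Random) -> float:
    """Draw one sample of Y values and return its least-squares slope."""
    y = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in X]
    y_bar = statistics.fmean(y)
    ss_xy = sum((x - X_BAR) * (yi - y_bar) for x, yi in zip(X, y))
    return ss_xy / SS_XX


rng = random.Random(2024)                # fixed seed for reproducibility
slopes = [sample_slope(rng) for _ in range(20_000)]

mean_slope = statistics.fmean(slopes)
sd_slope = statistics.stdev(slopes)
theory_sd = SIGMA / SS_XX ** 0.5         # sigma / sqrt(SSxx)

print(f"mean of sample slopes: {mean_slope:.3f} (true beta1 = {BETA1})")
print(f"sd of sample slopes:   {sd_slope:.3f} (theory: {theory_sd:.3f})")
```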
Slope Coefficient Test Statistic
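The formula on this slide did not survive extraction. A standard form, consistent with the callout t = β̂k / S_β̂k in the computer output, is sketched below; the degrees of freedom n - 2 match the "df = 5 - 2 = 3" used in the example.

```latex
t = \frac{\hat{\beta}_1 - 0}{S_{\hat{\beta}_1}},
\qquad
S_{\hat{\beta}_1} = \frac{S}{\sqrt{SS_{xx}}},
\qquad
SS_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2,
\qquad
t \sim t(n-2) \text{ under } H_0
```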
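Test of Slope Coefficient: Rejection Rule
- Reject H0 in favor of Ha if t falls in the colored (rejection) area
- Equivalently, reject H0 in favor of Ha if the P-value P(|T| > |t|) < α
(Figure: density of T ~ t(n-2), centered at 0, with "Reject H0" regions of area α/2 in each tail, beyond -t_{1-α/2, n-2} and t_{1-α/2, n-2}.)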
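The two-sided rule can be sketched with Python's standard library alone. Because no statistics package is assumed, this sketch integrates the t(df) density numerically with Simpson's rule and finds the critical value by bisection; that numerical approach is a swapped-in technique, not how printed tables or SAS obtain their values, and all function names are hypothetical.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t) for T ~ t(df): symmetry plus Simpson's rule on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps                          # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = area * h / 3                    # P(0 <= T <= |t|)
    return 0.5 + half if t > 0 else 0.5 - half


def two_sided_p(t: float, df: int) -> float:
    """P-value P(|T| > |t|) for T ~ t(df)."""
    return 2 * (1 - t_cdf(abs(t), df))


def t_critical(alpha: float, df: int) -> float:
    """Upper critical value t_{1-alpha/2, df}, found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject_h0(t: float, df: int, alpha: float = 0.05) -> bool:
    """Two-sided rejection rule: reject H0 when |t| > t_{1-alpha/2, df}."""
    return abs(t) > t_critical(alpha, df)
```

For df = 3, `t_critical(0.05, 3)` comes out at about 3.182, the tabled value used in the example that follows.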
Test of Slope Coefficient Example
- Reconsider the Obstetrics example with the following data:

      Estriol (mg/24h)    B.w. (g/1000)
             1                  1
             2                  1
             3                  2
             4                  2
             5                  4

- Is the linear relationship between Estriol & Birthweight significant at the .05 level?
Solution Table for β̂s
Solution Table for SSE
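The solution tables themselves were lost in extraction. This sketch computes the usual column sums, the estimates β̂0 and β̂1, and SSE for the Obstetrics data; the exact columns on the original slides are an assumption.

```python
# Obstetrics data: Estriol (mg/24h) and birthweight (g/1000).
estriol = [1, 2, 3, 4, 5]
birthw = [1, 1, 2, 2, 4]
n = len(estriol)

# Column sums for the table used to compute the beta estimates.
sum_x = sum(estriol)
sum_y = sum(birthw)
sum_x2 = sum(x * x for x in estriol)
sum_xy = sum(x * y for x, y in zip(estriol, birthw))

ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
b1 = ss_xy / ss_xx                     # slope estimate
b0 = sum_y / n - b1 * sum_x / n        # intercept estimate

# Columns of the table used to compute SSE.
fitted = [b0 + b1 * x for x in estriol]
residuals = [y - f for y, f in zip(birthw, fitted)]
sse = sum(e * e for e in residuals)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {sse:.2f}")
```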
Test of Slope Parameter Solution
- H0: β1 = 0
- Ha: β1 ≠ 0
- α = .05
- df = 5 - 2 = 3
- Critical Value(s): ±t_{.975, 3} = ±3.182
Test Statistic
Test Statistic Solution
t = β̂1 / S_β̂1 = 0.70 / 0.1915 = 3.66, with β̂1 and S_β̂1 taken from the solution tables
Test of Slope Parameter
- H0: β1 = 0
- Ha: β1 ≠ 0
- α = .05
- df = 5 - 2 = 3
- Critical Value(s): ±3.182
Test Statistic: t = 3.66
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a linear relationship
Test of Slope Parameter: Computer Output

    Parameter Estimates

                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1     -0.10000     0.63509      -0.16      0.8849
    Estriol      1      0.70000     0.19149       3.66      0.0354

Callouts: t = β̂k / S_β̂k, where β̂k is the Parameter Estimate, S_β̂k is its Standard Error, and Pr > |t| is the P-value.
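Every entry in this table can be reproduced by hand. The sketch below uses the textbook standard errors S/√SSxx for the slope and S√(1/n + X̄²/SSxx) for the intercept. For the P-values it relies on the fact that the t(3) CDF has a closed form, F(t) = 1/2 + (θ + sin θ cos θ)/π with θ = arctan(t/√3), so the helper is valid only because n - 2 = 3 here.

```python
import math

# Obstetrics data: Estriol (mg/24h) and birthweight (g/1000).
estriol = [1, 2, 3, 4, 5]
birthw = [1, 1, 2, 2, 4]
n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(birthw) / n
ss_xx = sum((x - x_bar) ** 2 for x in estriol)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, birthw))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(estriol, birthw))
s = math.sqrt(sse / (n - 2))                     # Root MSE

se_b1 = s / math.sqrt(ss_xx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)


def p_value_df3(t: float) -> float:
    """Two-sided P(|T| > |t|) for T ~ t(3), via the closed-form t(3) CDF."""
    u = abs(t) / math.sqrt(3)
    cdf = 0.5 + (math.atan(u) + u / (1 + u * u)) / math.pi
    return 2 * (1 - cdf)


for name, est, se in [("Intercept", b0, se_b0), ("Estriol", b1, se_b1)]:
    t = est / se
    print(f"{name:<9} {est:9.5f} {se:8.5f} {t:6.2f} {p_value_df3(t):7.4f}")
```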
Measures of Variation in Regression
- 1. Total Sum of Squares (SSyy)
  - Measures Variation of Observed Yi Around the Mean Ȳ
- 2. Explained Variation (SSR)
  - Variation Due to Relationship Between X & Y
- 3. Unexplained Variation (SSE)
  - Variation Due to Other Factors
Variation Measures
(Figure: deviations of an observation Yi from the fitted value Ŷi and from the mean Ȳ.)
- Unexplained sum of squares: Σ(Yi - Ŷi)²
- Total sum of squares: Σ(Yi - Ȳ)²
- Explained sum of squares: Σ(Ŷi - Ȳ)²
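For a least-squares line with an intercept, the three sums of squares satisfy SSyy = SSR + SSE. This short sketch checks the identity on the Obstetrics data.

```python
# Obstetrics data: Estriol (mg/24h) and birthweight (g/1000).
estriol = [1, 2, 3, 4, 5]
birthw = [1, 1, 2, 2, 4]
n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(birthw) / n

# Least-squares fit.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, birthw))
      / sum((x - x_bar) ** 2 for x in estriol))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in estriol]

ss_yy = sum((y - y_bar) ** 2 for y in birthw)               # total
ssr = sum((f - y_bar) ** 2 for f in fitted)                 # explained
sse = sum((y - f) ** 2 for y, f in zip(birthw, fitted))     # unexplained

print(f"SSyy = {ss_yy:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```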
Coefficient of Determination
- 1. Proportion of Variation Explained by Relationship Between X & Y
  0 ≤ r² ≤ 1
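The defining formula was lost from this slide. Written with the sums of squares above, the usual definition is sketched below; for the Obstetrics data (SSR = 4.9, SSE = 1.1, SSyy = 6) it gives the 0.8167 reported in the SAS output.

```latex
r^2 = \frac{SSR}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}},
\qquad
r^2_{\text{Obstetrics}} = \frac{4.9}{6} \approx 0.8167
```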
Coefficient of Determination Examples
(Figure: example scatterplots with r² = 1, r² = 1, r² = .8, and r² = 0.)
Coefficient of Determination Example
- Reconsider the Obstetrics example. Interpret a coefficient of determination of 0.8167.
- Answer: About 82% of the total variation of birthweight is explained by the mother's Estriol level.
r² Computer Output

    Root MSE            0.60553    R-Square    0.8167
    Dependent Mean      2.00000    Adj R-Sq    0.7556
    Coeff Var          30.27650

Callouts: Root MSE is S; R-Square is r²; Adj R-Sq is r² adjusted for the number of explanatory variables & sample size.
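All four fit statistics in this output follow from SSE and SSyy. This sketch uses the common adjustment 1 - (1 - r²)(n - 1)/(n - k - 1) with k = 1 explanatory variable, and Coeff Var as 100·S/Ȳ; with these definitions the printed values match the output above.

```python
import math

# Obstetrics data: Estriol (mg/24h) and birthweight (g/1000).
estriol = [1, 2, 3, 4, 5]
birthw = [1, 1, 2, 2, 4]
n = len(estriol)
k = 1                                      # number of explanatory variables

x_bar = sum(estriol) / n
y_bar = sum(birthw) / n
ss_xx = sum((x - x_bar) ** 2 for x in estriol)
ss_yy = sum((y - y_bar) ** 2 for y in birthw)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, birthw))

b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy                   # unexplained variation

root_mse = math.sqrt(sse / (n - k - 1))               # S
r_square = 1 - sse / ss_yy
adj_r_sq = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
coeff_var = 100 * root_mse / y_bar                    # S as % of the mean

print(f"Root MSE {root_mse:.5f}   R-Square {r_square:.4f}")
print(f"Dependent Mean {y_bar:.5f}   Adj R-Sq {adj_r_sq:.4f}")
print(f"Coeff Var {coeff_var:.5f}")
```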
Using the Model for Prediction & Estimation
Prediction With Regression Models
- What Is Predicted?
  - Population Mean Response E(Y) for Given X
    - Point on Population Regression Line
  - Individual Response (Yi) for Given X
What Is Predicted?
Confidence Interval Estimate of Mean Y
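The interval formula did not survive extraction. The standard form is sketched below; its pieces correspond to the four factors listed next (confidence level through the t quantile, dispersion through S, sample size through n, and distance through (Xp - X̄)²).

```latex
\hat{Y} \pm t_{1-\alpha/2,\,n-2}\, S_{\hat{Y}},
\qquad
S_{\hat{Y}} = S \sqrt{\frac{1}{n} + \frac{(X_p - \bar{X})^2}{SS_{xx}}}
```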
Factors Affecting Interval Width
- 1. Level of Confidence (1 - α)
  - Width Increases as Confidence Increases
- 2. Data Dispersion (S)
  - Width Increases as Variation Increases
- 3. Sample Size
  - Width Decreases as Sample Size Increases
- 4. Distance of Xp from Mean X̄
  - Width Increases as Distance Increases
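Factor 4 can be seen directly in the standard error of the estimated mean, S√(1/n + (Xp - X̄)²/SSxx). This sketch plugs in S = 0.60553 (the Root MSE reported for the Obstetrics fit) and evaluates the standard error at each observed Estriol level; it is smallest at X̄ = 3 and grows symmetrically toward the ends of the data.

```python
import math

estriol = [1, 2, 3, 4, 5]
n = len(estriol)
x_bar = sum(estriol) / n
ss_xx = sum((x - x_bar) ** 2 for x in estriol)
s = 0.60553                      # Root MSE from the Obstetrics SAS output


def se_mean(xp: float) -> float:
    """Standard error of the estimated mean response at X = xp."""
    return s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ss_xx)


for xp in estriol:
    print(f"Xp = {xp}: S_Yhat = {se_mean(xp):.4f}")
```

These values agree with the "Std Error Mean Predict" column of the SAS output later in this section.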
Why Distance from Mean?
(Figure: estimates at an X value farther from X̄ show greater dispersion than at X1.)
Confidence Interval Estimate Example
- Reconsider the Obstetrics example with the following data:

      Estriol (mg/24h)    B.w. (g/1000)
             1                  1
             2                  1
             3                  2
             4                  2
             5                  4

- Estimate the mean BW and a subject's BW response when the Estriol level is 4, at the .05 level.
Solution Table
Confidence Interval Estimate Solution: Mean BW
(X to be predicted: Xp = 4)
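The solution worked on this slide can be sketched numerically: fit the line, evaluate it at Xp = 4, and apply Ŷ ± t·S_Ŷ. The critical value 3.1824 is t_{.975, 3} taken from a t table (an input here, not computed); the result matches the 95% CL Mean row for observation 4 in the SAS output.

```python
import math

# Obstetrics data: Estriol (mg/24h) and birthweight (g/1000).
estriol = [1, 2, 3, 4, 5]
birthw = [1, 1, 2, 2, 4]
n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(birthw) / n
ss_xx = sum((x - x_bar) ** 2 for x in estriol)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, birthw)) / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(estriol, birthw)) / (n - 2))

T_CRIT = 3.1824                  # t_{.975, 3} from a t table (n - 2 = 3 df)
xp = 4                           # X to be predicted
y_hat = b0 + b1 * xp
se_mean = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ss_xx)
ci = (y_hat - T_CRIT * se_mean, y_hat + T_CRIT * se_mean)

print(f"Y-hat = {y_hat:.2f}, 95% CI for mean BW = ({ci[0]:.4f}, {ci[1]:.4f})")
```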
Prediction Interval of Individual Response
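The interval formula on this slide was lost. The standard prediction interval for an individual response is sketched below; it differs from the confidence interval for the mean only by the extra 1 under the square root.

```latex
\hat{Y} \pm t_{1-\alpha/2,\,n-2}\, S \sqrt{1 + \frac{1}{n} + \frac{(X_p - \bar{X})^2}{SS_{xx}}}
```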
Why the Extra S?
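One way to see the extra S: an individual Y deviates from the true mean line by its own random error (variance σ², estimated by S²), in addition to the uncertainty in the estimated mean (variance S_Ŷ²). This sketch combines the two for Estriol = 4, using numbers read from the SAS output (S = 0.60553, S_Ŷ = 0.3317, Ŷ = 2.7) and the tabled t_{.975, 3} = 3.1824.

```python
import math

s = 0.60553          # Root MSE (estimate of sigma) from the SAS output
se_mean = 0.3317     # Std Error Mean Predict at Estriol = 4
y_hat = 2.7          # predicted BW at Estriol = 4
T_CRIT = 3.1824      # t_{.975, 3} from a t table

# Individual response: error of the new observation plus error in the mean.
se_pred = math.sqrt(s ** 2 + se_mean ** 2)

ci_half = T_CRIT * se_mean
pi_half = T_CRIT * se_pred
ci = (y_hat - ci_half, y_hat + ci_half)
pi = (y_hat - pi_half, y_hat + pi_half)

print(f"95% CI for mean BW:      ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"95% PI for a subject BW: ({pi[0]:.4f}, {pi[1]:.4f})")
```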
SAS Code for Computing Mean and Prediction Intervals

    data bw;                       /* reading data in SAS */
      input estriol birthw;
      cards;
    1 1
    2 1
    3 2
    4 2
    5 4
    ;
    run;

    proc reg data=bw;              /* fitting a linear regression model */
      /* CLM: limits for the mean response; CLI: limits for an individual prediction */
      model birthw = estriol / clm cli alpha=0.05;
    run;
    quit;
Interval Estimate from SAS: Output

    The REG Procedure
    Dependent Variable: y
    Output Statistics

           Dep Var   Predicted     Std Error
    Obs          y       Value  Mean Predict      95% CL Mean        95% CL Predict     Residual
      1     1.0000      0.6000        0.4690   -0.8927   2.0927   -1.8376   3.0376       0.4000
      2     1.0000      1.3000        0.3317    0.2445   2.3555   -0.8972   3.4972      -0.3000
      3     2.0000      2.0000        0.2708    1.1382   2.8618   -0.1110   4.1110            0
      4     2.0000      2.7000        0.3317    1.6445   3.7555    0.5028   4.8972      -0.7000
      5     4.0000      3.4000        0.4690    1.9073   4.8927    0.9624   5.8376       0.6000

Callouts: Predicted Y when X = 3; Confidence Interval; Prediction Interval; S_Ŷ (Std Error Mean Predict).
Hyperbolic Interval Bands
Correlation Models
Types of Probabilistic Models
Correlation vs. Regression
- Both variables are treated the same in correlation; in regression there is a predictor and a response
- In regression the x variable is assumed to be non-random or measured without error
- Correlation is used in looking for relationships, regression for prediction
Correlation Models
- 1. Answers: How Strong Is the Linear Relationship Between 2 Variables?
- 2. Coefficient of Correlation Used
  - Population Correlation Coefficient Denoted ρ (Rho)
  - Values Range from -1 to +1
  - Measures Degree of Association
- 3. Used Mainly for Understanding
Sample Coefficient of Correlation
- 1. Pearson Product Moment Coefficient of Correlation between x and y
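The sample coefficient's formula was lost from this slide; the Pearson coefficient is r = SSxy / √(SSxx·SSyy). This sketch computes it for the Obstetrics data. Note that r² reproduces the coefficient of determination 0.8167, as it should in simple linear regression.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product moment correlation: SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])   # Estriol vs. birthweight
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```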
Coefficient of Correlation Values
(Figure: number line from -1.0 to 1.0 with marks at -.5, 0, and .5. -1.0 is Perfect Negative Correlation, 0 is No Correlation, and 1.0 is Perfect Positive Correlation. Moving from 0 toward -1.0 is an increasing degree of negative correlation; moving from 0 toward 1.0 is an increasing degree of positive correlation.)
Coefficient of Correlation Examples
(Figure: example scatterplots with r = 1, r = -1, r = .89, and r = 0.)
Test of Coefficient of Correlation
- 1. Shows If There Is a Linear Relationship Between 2 Numerical Variables
- 2. Same Conclusion as Testing Population Slope β1
- 3. Hypotheses
  - H0: ρ = 0 (No Correlation)
  - Ha: ρ ≠ 0 (Correlation)
One-Sample t-Test on Correlation Coefficient
- Hypotheses
  - H0: ρ = 0 (No Correlation)
  - Ha: ρ ≠ 0 (Correlation)
- Test statistic under H0: t = r √(n - 2) / √(1 - r²) ~ t(n - 2)
- Reject H0 if |t| > t_{1-α/2, n-2}
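The claim that this test reaches the same conclusion as the slope test can be checked numerically: in simple linear regression the two t statistics are algebraically identical. This sketch computes both for the Obstetrics data, using SSE = SSyy - β̂1·SSxy.

```python
import math

# Obstetrics data: Estriol (mg/24h) and birthweight (g/1000).
estriol = [1, 2, 3, 4, 5]
birthw = [1, 1, 2, 2, 4]
n = len(estriol)
x_bar, y_bar = sum(estriol) / n, sum(birthw) / n
ss_xx = sum((x - x_bar) ** 2 for x in estriol)
ss_yy = sum((y - y_bar) ** 2 for y in birthw)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, birthw))

# t-test on the correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# t-test on the slope, for comparison.
b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy
s = math.sqrt(sse / (n - 2))
t_slope = b1 / (s / math.sqrt(ss_xx))

T_CRIT = 3.1824                  # t_{.975, 3} from a t table
print(f"t (correlation) = {t_corr:.4f}, t (slope) = {t_slope:.4f}")
print("Reject H0" if abs(t_corr) > T_CRIT else "Do not reject H0")
```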
One-Sample Z-Test on Correlation Coefficient
- Hypotheses (Fisher)
  - H0: ρ = ρ0
  - Ha: ρ ≠ ρ0
- Test statistic under H0: z = √(n - 3) · ½[ln((1 + r)/(1 - r)) - ln((1 + ρ0)/(1 - ρ0))], approximately N(0, 1)
- Reject H0 if |z| > z_{1-α/2}
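Fisher's transformation ½ ln((1 + r)/(1 - r)) is exactly the inverse hyperbolic tangent, so the test can be sketched with `math.atanh`. The Obstetrics values below (r = 7/√60 from SSxy = 7, SSxx = 10, SSyy = 6, with n = 5) are for illustration only: with n - 3 = 2 the normal approximation is rough, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_test(r: float, rho0: float, n: int) -> float:
    """Fisher z statistic for H0: rho = rho0 (approximately N(0, 1) under H0)."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)


r = 7 / math.sqrt(60)                      # Obstetrics sample correlation
z = fisher_z_test(r, 0.0, 5)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

print(f"z = {z:.3f}, critical value = {z_crit:.3f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```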
Conclusion
- Describe the Linear Regression Model
- State the Regression Modeling Steps
- Explain Ordinary Least Squares
- Compute Regression Coefficients
- Understand and check model assumptions
- Predict Response Variable
- Comment on SAS Output
- Correlation Models
- Test of Coefficient of Correlation