Title: Regression Analysis: Fitting Equations to Data
1 Regression Analysis: Fitting Equations to Data
2 Introduction: Aim
Find an equation (mathematical model) describing the relationship between
- A response variable (or dependent variable)
- One or more explanatory variables (predictors or independent variables)
3 Reasons for modeling
- Understand form of relationship
- What controllable variables affect process output?
- Is process yield affected by changing temperature?
- Insight into physical mechanism
- Predict
- What is surface finish if lathe speed is 120 rpm?
- Instrument calibration curves
- Optimise
- What mixture of ingredients gives the best taste-test result?
4 Types of model
- Mechanistic model
- Ohm's law
- Statistical model
- Take account of randomness
- Recorded data often don't fit the model exactly (rounding error, measurement error)
- Form of model?
- Does theory suggest form?
- Try linear model first?
- Nonlinear model?
5 Types of data
- Observational data
- No control over values of variables
- Observe what is there
- Experimental data
- Adjust values of some explanatory variables (factors)
- Try to keep everything else constant
Experiment needed to infer cause-and-effect relationship
6 Linear and nonlinear models
- Linear models
- linear in parameters
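To illustrate what "linear in parameters" means, here are some standard textbook forms (these particular examples are illustrative assumptions, not taken from the slide). The first two count as linear models even though the second is curved in x; the third is nonlinear because $\beta_1$ sits inside the exponential.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \varepsilon
  && \text{(linear in } \beta_0, \beta_1\text{)} \\
y &= \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon
  && \text{(curved in } x \text{, still linear in the parameters)} \\
y &= \beta_0 e^{\beta_1 x} + \varepsilon
  && \text{(nonlinear in } \beta_1\text{)}
\end{align*}
```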
7 Why linear models?
- Easier to estimate parameters
- Easier to interpret meaning of parameters
- First approximation if true form of relationship is unknown
- Approximation to reality
- Many relationships nonlinear ...
- ... but approximately linear over restricted range of x
8 Nonlinear models?
- Can you linearise relationship?
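One common way to linearise is a log transform. As a minimal sketch, assume the relationship has the exponential form y = a * exp(b * x) with y > 0 (this form, the function names and the data are all made up for illustration). Taking logs gives log(y) = log(a) + b * x, which is a straight line in x and can be fitted by ordinary least squares.

```python
import math

def fit_line(x, y):
    """Least squares intercept and slope for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by regressing log(y) on x (requires y > 0)."""
    log_a, b = fit_line(x, [math.log(yi) for yi in y])
    return math.exp(log_a), b

# Made-up, noise-free data generated with a = 2.0 and b = 0.5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

a_hat, b_hat = fit_exponential(x, y)
print(a_hat, b_hat)  # recovers a and b up to floating-point error
```

Note that fitting on the log scale treats the errors as multiplicative on the original scale, so the result can differ from a direct nonlinear fit when the data are noisy.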
9 New type of particleboard
Relationship between density and stiffness? 30 sheets manufactured and measured.
Which variable on which axis?
- Response, y: vertical axis
- Explanatory, x: horizontal axis
10 New type of particleboard
Relationship between density and stiffness?
- No known physical law
- Try
- ... or perhaps
Don't use for extrapolation
11 Revision: Simple Linear Model
$y = \beta_0 + \beta_1 x + \varepsilon$
12 Meaning of parameters
- Loss of vitamin A from baked bread
- y = vitamin A (mg/100g)
- x = time (days)
$\beta_0$: Expected vitamin A at time 0 (immediately after baking)
$\beta_1$: Expected change in vitamin A per day
$\sigma$: Standard deviation between different loaves at the same time after baking
13 Expected response
Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Expected response: $E[y] = \beta_0 + \beta_1 x$
14 Model errors
Error for ith observation: $\varepsilon_i = y_i - (\beta_0 + \beta_1 x_i)$
All errors: $\varepsilon_i \sim \text{normal}(0, \sigma)$
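15 Fitted values and residuals
- In practice, $\beta_0$ and $\beta_1$ are unknown
- Using estimates, $b_0$ and $b_1$,
Fitted values: $\hat{y}_i = b_0 + b_1 x_i$
Residuals: $e_i = y_i - \hat{y}_i$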
16 Least squares
Good estimates $b_0$ and $b_1$ make residuals small
Find $b_0$ and $b_1$ to minimise SSResidual $= \sum (y_i - b_0 - b_1 x_i)^2$
Least squares estimates
17 Least squares estimates
To minimise, solve
Normal equations
18 Least squares estimates
Normal equations
Solution
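For reference, the standard derivation can be sketched as follows, with $SS$ standing for SSResidual $= \sum (y_i - b_0 - b_1 x_i)^2$ (the shorthand $S_{xx}$ and $S_{xy}$ is introduced here for the sketch). Setting both partial derivatives to zero gives the normal equations, and solving that pair of linear equations gives the estimates.

```latex
\begin{align*}
\frac{\partial SS}{\partial b_0} &= -2 \sum (y_i - b_0 - b_1 x_i) = 0, &
\frac{\partial SS}{\partial b_1} &= -2 \sum x_i (y_i - b_0 - b_1 x_i) = 0 \\[4pt]
n b_0 + b_1 \sum x_i &= \sum y_i, &
b_0 \sum x_i + b_1 \sum x_i^2 &= \sum x_i y_i \\[4pt]
b_1 &= \frac{S_{xy}}{S_{xx}}
     = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, &
b_0 &= \bar{y} - b_1 \bar{x}
\end{align*}
```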
Don't try to remember formulae
- Minitab evaluates LS estimates
- More general and simpler matrix formulae
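Outside Minitab, the least squares estimates for a single explanatory variable are short enough to evaluate directly. The sketch below uses plain Python and a small made-up data set (not from the slides); the function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) minimising the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
print(b0, b1)
```

For this data the estimates come out at approximately b0 = 0.05 and b1 = 1.99.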
19 Estimating $\sigma$
Best estimate: $s = \sqrt{\text{SSResidual} / (n - 2)}$
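As a minimal sketch of this calculation, the function below computes the residuals from given estimates and divides the residual sum of squares by n - 2 before taking the square root. The data are made up, and b0 = 0.05, b1 = 1.99 are the least squares estimates for that made-up data; `residual_sd` is a hypothetical name.

```python
import math

def residual_sd(x, y, b0, b1):
    """Estimate the error standard deviation from the residuals of a fitted line."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_residual = sum(e ** 2 for e in residuals)
    # Two parameters (b0 and b1) were estimated, hence n - 2 degrees of freedom
    return math.sqrt(ss_residual / (len(x) - 2))

# Made-up illustration data and its least squares estimates
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

s = residual_sd(x, y, b0=0.05, b1=1.99)
print(s)
```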
20 Example
Response: Monthly steam consumption in chemical plant
Explanatory: Average operating temperature
Linear model?
Least squares estimates
21 Interpretation
Least squares estimates: $b_0 = 13.623$, $b_1 = -0.0798$, $s = 0.8901$
Interpretation?
Intercept: Expect 13.623 lb steam used / month at 0°F. But avoid extrapolation!
Slope: Expect decrease of 0.0798 lb steam used / month for each extra 1°F. But only between 30°F and 80°F!
Error sd: At any temperature, sd of steam used / month is about 0.8901 lb
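The warning about extrapolation can be built into a prediction helper. This sketch uses the intercept, slope and 30°F to 80°F range quoted above; the function name and the choice to raise an error outside the range are assumptions made for illustration.

```python
B0 = 13.623                 # intercept quoted above (lb steam / month at 0°F)
B1 = -0.0798                # slope quoted above (change per extra 1°F)
TEMP_RANGE = (30.0, 80.0)   # temperature range quoted above

def expected_steam(temp_f):
    """Expected monthly steam use at temp_f, refusing to extrapolate."""
    low, high = TEMP_RANGE
    if not low <= temp_f <= high:
        raise ValueError(
            f"{temp_f}°F is outside {low}-{high}°F; the fitted line may not hold there"
        )
    return B0 + B1 * temp_f

print(expected_steam(50.0))  # about 9.633
```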
22 Properties of LS estimates
- Unbiased
- Formula for standard errors
- Normal distributions
Don't try to remember formulae: Minitab does calculations
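For reference only, the standard errors in simple linear regression are usually written as below, where $s$ is the estimate of $\sigma$ and $S_{xx} = \sum (x_i - \bar{x})^2$ (this notation is assumed here).

```latex
\begin{align*}
\operatorname{se}(b_1) &= \frac{s}{\sqrt{S_{xx}}}, &
\operatorname{se}(b_0) &= s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}
\end{align*}
```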
23 Inference
Confidence intervals
Hypothesis tests
To test whether $\beta_0 = k$ or $\beta_1 = k$
$\beta_1 = 0$ means y does not depend on x
Compare with t(n-2) distribution
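In their usual textbook form, the interval and test statistic for the slope look like this; the same pattern applies to the intercept with $b_0$ and $\operatorname{se}(b_0)$. Here $t^{*}_{n-2}$ denotes the upper 2.5% point of the t distribution with $n - 2$ degrees of freedom (notation assumed for this sketch).

```latex
\begin{align*}
\text{95\% CI for } \beta_1 &: \quad b_1 \pm t^{*}_{n-2} \, \operatorname{se}(b_1) \\
\text{Test of } H_0\colon \beta_1 = k &: \quad t = \frac{b_1 - k}{\operatorname{se}(b_1)}
\end{align*}
```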
24 Concrete hardness (in Minitab)
Forty batches were mixed with varying amounts of cement; the hardness of each batch was measured after 7 days.
Scatterplot
25 Concrete hardness (in Minitab)
Linear relationship seems reasonable approximation
Fit linear model

Regression Analysis: Hardness versus Cement

The regression equation is
Hardness = - 24.1 + 0.186 Cement

Predictor      Coef   SE Coef       T      P
Constant    -24.067     2.298  -10.47  0.000
Cement     0.186471  0.007974   23.38  0.000

S = 3.10604   R-Sq = 93.5%   R-Sq(adj) = 93.3%
...
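The T column in this output is each coefficient divided by its standard error, which tests whether the parameter is zero. The sketch below reproduces it from the printed Coef and SE Coef values and adds a 95% confidence interval for the slope. The critical value 2.024 is an assumption: it is the approximate upper 2.5% point of the t distribution with n - 2 = 38 degrees of freedom for the forty batches.

```python
# Values copied from the Minitab output
coef = {"Constant": -24.067, "Cement": 0.186471}
se_coef = {"Constant": 2.298, "Cement": 0.007974}

# Assumed critical value: upper 2.5% point of t with 38 degrees of freedom
T_CRIT = 2.024

# t statistic for testing each parameter against zero
t_stats = {name: coef[name] / se_coef[name] for name in coef}

# 95% confidence interval for the slope
half_width = T_CRIT * se_coef["Cement"]
slope_ci = (coef["Cement"] - half_width, coef["Cement"] + half_width)

print(t_stats)
print(slope_ci)
```

The interval for the slope lies well above zero, which agrees with the very small P value reported for Cement.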
26 Concrete hardness (in Minitab)
Equation for predictions
Hardness = - 24.1 + 0.186 Cement
27 Prediction and estimation
...
Predicted Values for New Observations

New Obs     Fit  SE Fit          95% CI            95% PI
      1  41.198   0.735  (39.711, 42.685)  (34.737, 47.660)
...
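The two intervals answer different questions: the CI is for the mean hardness at that cement level, while the PI is for a single new batch and so also includes batch-to-batch variation. As a sketch, both can be approximately reproduced from the printed Fit and SE Fit together with S = 3.10604 from the regression output. The critical value 2.0244 is an assumption (upper 2.5% point of t with 38 degrees of freedom), and small differences from Minitab come from rounding in the printed values.

```python
import math

FIT = 41.198     # predicted hardness from the output
SE_FIT = 0.735   # standard error of the fitted mean from the output
S = 3.10604      # residual standard deviation from the regression output
T_CRIT = 2.0244  # assumed: upper 2.5% point of t with 38 degrees of freedom

# Confidence interval for the mean response
conf_int = (FIT - T_CRIT * SE_FIT, FIT + T_CRIT * SE_FIT)

# Prediction interval for one new observation: add the error variance
se_pred = math.sqrt(S ** 2 + SE_FIT ** 2)
pred_int = (FIT - T_CRIT * se_pred, FIT + T_CRIT * se_pred)

print(conf_int)
print(pred_int)
```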