Title: Statistics
1 Simple Linear Regression
2 Regression Model
Motivation: describe the relationship between two variables, X and Y
  X: independent or predictor variable (controlled by the experimenter)
  Y: dependent variable (outcome of interest)
Regression approach:
  (1) Ascertain the probable form of the relationship
  (2) Prediction and/or estimation (population parameters based on sample statistics)
The regression model is an approximation to the real situation
It separates the variation into components:
  (1) Systematic (overall linear relationship)
  (2) Random (variation around the line)
3 Assumptions
1. Values of the independent variable X are fixed
2. X is measured without error, or with negligible error
3. For each value of X there is a subpopulation of Y values
   NOTE: for inference to be valid, Y ~ Normal (each subpopulation of Y is normally distributed)
4. Variances of the subpopulations of Y are all equal
5. Means of the subpopulations of Y all lie on the same straight line (assumption of LINEARITY):
   $\mu_{y|x} = \beta_0 + \beta_1 x$
   Geometrically, the population regression coefficients are $\beta_0$, the y-intercept, and $\beta_1$, the slope (of the line on which the means lie)
6. Y values are statistically independent
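4 LINE
Linear, Independent, Normal, Equal variances
[Figure: Y plotted against X with the regression line $E(Y) = \beta_0 + \beta_1 X$; identical normal distributions of errors, all centered on the regression line]
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$\varepsilon_i = y_i - \beta_0 - \beta_1 x_i = y_i - \mu_{y|x_i}$
Graph source: Aczel, A. (1998), Complete Business Statistics, McGraw-Hill/Irwin, Mass., 4th ed. (CD-ROM)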
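The model on this slide can be simulated to see how the systematic part (the line) and the random part (the errors) combine. The sketch below is only an illustration: the parameter values `beta0`, `beta1`, and `sigma`, the x values, and the function name `simulate_line_model` are assumptions chosen for the example, not values from the slides.

```python
import random

def simulate_line_model(x_values, beta0, beta1, sigma, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + e_i with independent e_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Same sigma at every x: the "Equal variances" part of LINE
    errors = [rng.gauss(0.0, sigma) for _ in x_values]
    y_values = [beta0 + beta1 * x + e for x, e in zip(x_values, errors)]
    return y_values, errors

# Hypothetical settings, chosen only for illustration
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
beta0, beta1, sigma = 2.0, 0.5, 1.0

y_values, errors = simulate_line_model(x_values, beta0, beta1, sigma)

# Each error is the deviation of y_i from the mean line at x_i
recovered = [y - (beta0 + beta1 * x) for x, y in zip(x_values, y_values)]
```

Here `recovered` matches `errors` up to floating-point rounding, which is the relation between the error term and the mean line stated on the slide.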
5 Step 1: Verify Linear Relationship
SCATTERPLOTS
[Figure: scatterplots, horizontal axis X]
Graphs source: Aczel, A. (1998), Complete Business Statistics, McGraw-Hill/Irwin, Mass., 4th ed. (CD-ROM)
6 Step 2: Estimation via Least-Squares
Objective: find the line that BEST fits the data
Graph source: Aczel, A. (1998), Complete Business Statistics, McGraw-Hill/Irwin, Mass., 4th ed. (CD-ROM)
7 Step 2: Estimation via Least-Squares
Objective function
Least-squares: minimize SSE with respect to $b_0$ and $b_1$
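The objective function itself did not survive on this slide. Under the usual definition of the error sum of squares, and writing $b_0$ and $b_1$ for the candidate intercept and slope (an assumption about the slide's notation), the minimization can be sketched as:

```latex
\begin{align*}
SSE(b_0, b_1) &= \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\
\frac{\partial SSE}{\partial b_0} &= -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0 \\
\frac{\partial SSE}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0
\end{align*}
```

Solving these two normal equations simultaneously gives the least-squares intercept and slope.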
8 Step 2: Estimation via Least-Squares
Least-squares regression estimators:
  slope
  intercept
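The estimator formulas are missing from the extracted slide. A minimal sketch, assuming the standard closed-form least-squares solution (slope $b_1 = S_{xy}/S_{xx}$, intercept $b_0 = \bar{y} - b_1\bar{x}$), is given below. The data and the function names `least_squares` and `sse` are hypothetical and serve only to illustrate the computation.

```python
def least_squares(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

def sse(x, y, b0, b1):
    """Error sum of squares for the line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
best_sse = sse(x, y, b0, b1)
# Any other line gives a larger SSE, for example one with a slightly different slope
other_sse = sse(x, y, b0, b1 + 0.1)
```

Comparing `best_sse` with `other_sse` shows the sense in which the fitted line is the "best" one: it has the smallest SSE among all candidate lines.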
9 Step 3: Evaluating the Regression Equation
[Figure: three panels labeled Constant Y, Unsystematic Variation, and Nonlinear Relationship]
Hypothesis testing regarding the existence of a linear relationship
$H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$
Test statistic
Graphs source: Aczel, A. (1998), Complete Business Statistics, McGraw-Hill/Irwin, Mass., 4th ed. (CD-ROM)
10 Coefficient of Determination
$r^2$: descriptive measure of the strength of the regression relationship; the percentage of total variation explained by the regression
Total deviation = Unexplained deviation + Explained deviation
SST = SSE + SSR
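The decomposition above can be checked numerically. The following sketch assumes the usual definitions (SST from deviations of y about its mean, SSE from residuals, SSR from fitted values about the mean) and takes $r^2 = SSR/SST$. The data and the function name `variation_components` are hypothetical.

```python
def variation_components(x, y):
    """Return (sst, sse, ssr) for the least-squares line fitted to x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    return sst, sse, ssr

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sst, sse, ssr = variation_components(x, y)
r_squared = ssr / sst  # share of total variation explained by the regression
```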
11 Coefficient of Determination
Graphs source: Aczel, A. (1998), Complete Business Statistics, McGraw-Hill/Irwin, Mass., 4th ed. (CD-ROM)
12 ANOVA for Regression
$V.R. = MS_{Reg} / MS_{Res}$
$V.R. \sim F_{k-1,\,N-k}$
N: total number of observations
k: number of regression parameters
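As an illustration of the variance ratio, the sketch below builds the ANOVA quantities for a simple linear regression, where k = 2 (intercept and slope). It assumes the usual definitions $MS_{Reg} = SSR/(k-1)$ and $MS_{Res} = SSE/(N-k)$; the data and the function name `regression_anova` are hypothetical.

```python
def regression_anova(x, y, k=2):
    """Return the ANOVA quantities for a simple linear regression (k = 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ms_reg = ssr / (k - 1)  # regression mean square
    ms_res = sse / (n - k)  # residual mean square
    return {
        "df_reg": k - 1,
        "df_res": n - k,
        "ms_reg": ms_reg,
        "ms_res": ms_res,
        "vr": ms_reg / ms_res,
    }

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

anova = regression_anova(x, y)
```

The value `anova["vr"]` would then be compared with the F distribution on `anova["df_reg"]` and `anova["df_res"]` degrees of freedom, using a table or a statistics package.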
13 Residual Analysis and Model Inadequacies
[Figure: four residual plots, each showing residuals scattered around a horizontal line at 0; the last is plotted against time]
Homoscedasticity: residuals appear completely random. No indication of model inadequacy.
Heteroscedasticity: variance of residuals changes when x changes.
Curved pattern in residuals resulting from an underlying nonlinear relationship.
Residuals exhibit a linear trend with time.
Source: Aczel, A. (1998), Complete Business Statistics, McGraw-Hill/Irwin, Mass., 4th ed. (CD-ROM)
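The plots described above are all drawn from the residuals of the fitted line. The sketch below computes them for hypothetical data and pairs each residual with its x value, ready for plotting. Plotting is left out to keep the example free of dependencies, and the function name `residuals` is an assumption.

```python
def residuals(x, y):
    """Return a list of (x_i, e_i) pairs, where e_i = y_i - (b0 + b1 * x_i)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return [(xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

pairs = residuals(x, y)

# Two properties that hold for any least-squares fit with an intercept:
total = sum(e for _, e in pairs)             # residuals sum to zero
cross = sum(xi * e for xi, e in pairs)       # residuals are uncorrelated with x
```

Because these two sums are zero by construction, model inadequacy shows up in the pattern of the residuals (fanning, curvature, trend over time) rather than in their average.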
14 Correlation Analysis
15 Correlation Model
Motivation: describe the relationship between two variables, X and Y
  Both X and Y are random
  No distinction between them as dependent or independent
Bivariate normal distribution: X and Y vary together in a JOINT DISTRIBUTION
Assumptions:
  For each X there is a normally distributed subpopulation of Ys
  For each Y there is a normally distributed subpopulation of Xs
  The joint distribution of X and Y -> BIVARIATE NORMAL
  Subpopulations of Y values all have the same variance
  Subpopulations of X values all have the same variance
16 Bivariate Normal Distribution
Probability Distribution Function
correlation coefficient
Graph source: Electronic Textbook StatSoft, http://www.statsoftinc.com/textbook/glosb.html, accessed 11/17/2003
17 Correlation Coefficient
Correlation: degree of linear association between the two random variables
Population parameter: $\rho$
Graphs source: Aczel, A. (1998), Complete Business Statistics, McGraw-Hill/Irwin, Mass., 4th ed. (CD-ROM)
18 Correlation Coefficient
The joint variation of two random variables: COVARIANCE
$\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$
$\rho$ = square root of the coefficient of determination (with the sign of the slope)
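A sample version of these quantities can be computed directly. The sketch below assumes the usual sample covariance (divisor n - 1) and the sample correlation coefficient r as the covariance divided by the product of the two standard deviations. The data and function names are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance of x and y, with divisor n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Sample correlation: covariance scaled by both standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # Cov(X, X) is the variance of X
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r = sample_correlation(x, y)
r_squared = r ** 2  # equals the coefficient of determination of the fitted line
```

For these data the fitted least-squares line has SSR/SST = 0.6, which `r_squared` reproduces up to rounding; the sign of `r` matches the sign of the slope.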
19 Hypothesis Testing
20 Example
Daniel's exercise 9.7.4: 21 children with Down's syndrome. Is there a linear association between mean length of utterance (MLU) and number of one-word utterances (OWU)?
$H_0: \rho = 0$ vs. $H_a: \rho \neq 0$
$t = -8.67$ with 19 d.f.; critical value $-t_{.975,19} = -2.093$ -> reject $H_0$
p-value < 0.01
95% Confidence Interval
95% CI for $\rho$
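The interval values did not survive in the slide text, and the exercise data are not reproduced here. As a hedged sketch, the sample correlation can be back-calculated from the reported t statistic using $t = r\sqrt{n-2}/\sqrt{1-r^2}$, and an approximate 95% interval obtained with Fisher's z transformation. Fisher's method is an assumption: it is one common approach and may not be the one used on the slide. The function names are hypothetical.

```python
import math

n = 21               # children in the example
t_reported = -8.67   # test statistic reported on the slide

# Solve t = r * sqrt(n - 2) / sqrt(1 - r^2) for r
r = t_reported / math.sqrt(t_reported ** 2 + (n - 2))

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def fisher_z_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for rho via Fisher's z transformation."""
    z = math.atanh(r)            # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = fisher_z_interval(r, n)
```

The resulting interval lies entirely below zero, which agrees with the rejection of the null hypothesis reported in the example.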
21 Using the Regression Equation
Goals: prediction vs. estimation
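The slide gives no formulas for the two goals. Under the standard results for simple linear regression, assumed here, estimating the mean of Y at a value $x_0$ and predicting a single new Y at $x_0$ use the same point value $\hat{y}_0 = b_0 + b_1 x_0$ but different standard errors: the prediction error carries an extra "1 +" term for the variation of an individual observation. The data, the choice $x_0 = 4$, and the function name `standard_errors_at` are hypothetical.

```python
import math

def standard_errors_at(x, y, x0):
    """Return (y_hat0, se_mean, se_pred) at x0 for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))              # residual standard deviation
    leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
    y_hat0 = b0 + b1 * x0
    se_mean = s * math.sqrt(leverage)         # estimating the mean of Y at x0
    se_pred = s * math.sqrt(1 + leverage)     # predicting one new Y at x0
    return y_hat0, se_mean, se_pred

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

y_hat0, se_mean, se_pred = standard_errors_at(x, y, x0=4)
t_crit = 3.182  # two-sided 5% critical value for n - 2 = 3 d.f., from a t table
mean_interval = (y_hat0 - t_crit * se_mean, y_hat0 + t_crit * se_mean)
pred_interval = (y_hat0 - t_crit * se_pred, y_hat0 + t_crit * se_pred)
```

Both intervals are centered on the same fitted value, and the prediction interval is always the wider of the two.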