Title: Advanced Quantitative Methods - PS 401 Notes
1Advanced Quantitative Methods - PS 401 Notes
Version as of 9/21/2000
- Robert D. Duval
- WVU Dept of Political Science
- Class: 306E Woodburn, TTh 1130-1245
- Office: 301A Woodburn, T 200-300, Th 100-300
- Phone: 293-3811 x5299, 293-4372 x13050
- e-mail bduval_at_wvu.edu
2Introduction
- This course is about regression analysis.
- The principal method in the social sciences.
- Three basic parts to the course
- An introduction to the general Model
- The formal assumptions and what they mean.
- Selected special topics
3Syllabus
- Required texts
- Additional readings
- Computer exercises
- Course requirements
- Midterm - in class, open book (30)
- Final - in class, open book (30)
- Research paper (30)
- Participation (10)
- http://www.polsci.wvu.edu/duval/ps401/401syl.html
4Introduction The General Linear Model
- The General Linear Model is a phrase used to indicate a class of statistical models that includes simple linear regression analysis.
- Regression is the predominant statistical tool used in the social sciences due to its simplicity and versatility.
- Also called Linear Regression Analysis.
5Simple Linear Regression The Basic Mathematical
Model
- Regression is based on the concept of the simple proportional relationship - also known as the straight line.
- We can express this idea mathematically!
- Theoretical aside: All theoretical statements of relationship imply a mathematical theoretical structure.
- Just because it isn't explicitly stated doesn't mean that the math isn't implicit in the language itself!
6Alternate Mathematical Notation for the Line
- Alternate Mathematical Notation for the straight line - don't ask why!
- 10th Grade Geometry: y = mx + b
- Statistics Literature: Yi = a + bXi + ei
- Econometrics Literature: Yi = B0 + B1Xi + ei
7Alternate Mathematical Notation for the Line
cont.
- These are all equivalent. We simply have to live with this inconsistency.
- We won't use the geometric tradition, and so you just need to remember that B0 and a are both the same thing.
8Linear Regression the Linguistic Interpretation
- In general terms, the linear model states that the dependent variable is directly proportional to the value of the independent variable.
- Thus if we state that some variable Y increases in direct proportion to some increase in X, we are stating a specific mathematical model of behavior - the linear model.
- Hence, if we say that the crime rate goes up as unemployment goes up, we are stating a simple linear model.
9Linear Regression: A Graphic Interpretation
10The linear model is represented by a simple
picture
11The Mathematical Interpretation The Meaning of
the Regression Parameters
- a: the intercept
- the point where the line crosses the Y-axis.
- (the value of the dependent variable when all of the independent variables = 0)
- b: the slope
- the increase in the dependent variable per unit change in the independent variable (also known as the 'rise over the run')
12The Error Term
- Such models do not predict behavior perfectly.
- So we must add a component to adjust or compensate for the errors in prediction.
- Having fully described the linear model, the rest of the semester (as well as several more) will be spent on the error.
13The Nature of Least Squares Estimation
- There is 1 essential goal and there are 4
important concerns with any OLS Model
14The 'Goal' of Ordinary Least Squares
- Ordinary Least Squares (OLS) is a method of finding the linear model which minimizes the sum of the squared errors.
- Such a model provides the best explanation/prediction of the data.
15Why Least Squared error?
- Why not simply minimum error?
- The errors about the line sum to 0.0!
- Minimum absolute deviation (error) models now exist, but they are mathematically cumbersome.
- Try algebra with Absolute Value signs!
16Other models are possible...
- Best parabola...?
- (i.e. nonlinear or curvilinear relationships)
- Best maximum likelihood model ... ?
- Best expert system...?
- Complex Systems?
- Chaos/Non-linear systems models
- Catastrophe models
- others
17The Simple Linear Virtue
- I think we overemphasize the linear model.
- It does, however, embody this rather important notion that Y is proportional to X.
- As noted, we can state such relationships in simple English.
- As unemployment increases, so does the crime rate.
- As domestic conflict increases, national leaders will seek to distract their populations by initiating foreign disputes.
18The Notion of Linear Change
- The linear aspect means that the same amount of
increase in unemployment will have the same
effect on crime at both low and high
unemployment. - A nonlinear change would mean that as
unemployment increases, its impact upon the crime
rate might increase at higher unemployment levels.
19Why squared error?
- Because
- (1) the sum of the errors expressed as deviations
would be zero as it is with standard deviations,
and - (2) some feel that big errors should be more
influential than small errors. - Therefore, we wish to find the values of a and b
that produce the smallest sum of squared errors.
20Minimizing the Sum of Squared Errors
- Who put the 'Least' in OLS?
- In mathematical jargon we seek to minimize the Unexplained Sum of Squares (USS), where USS = Σei² = Σ(Yi - Ŷi)².
21The Parameter estimates
- In order to do this, we must find parameter estimates which accomplish this minimization.
- In calculus, if you wish to know when a function is at its minimum, you take the first derivative.
- In this case we must take partial derivatives since we have two parameters (a and b) to worry about.
- We will look closer at this, and it's not a pretty sight!
22Why squared error?
- Because
- (1) the sum of the errors expressed as
deviations would be zero as it is with standard
deviations, and - (2) some feel that big errors should be more
influential than small errors. - Therefore, we wish to find the values of a and b
that produce the smallest sum of squared errors.
23Decomposition of the error in LS
24Goodness of Fit
- Since we are interested in how well the model
performs at reducing error, we need to develop a
means of assessing that error reduction. Since
the mean of the dependent variable represents a
good benchmark for comparing predictions, we
calculate the improvement in the prediction of Yi
relative to the mean of Y (the best guess of Y
with no other information).
25Sum of Squares Terminology
- In mathematical jargon we seek to minimize the Unexplained Sum of Squares (USS), where USS = Σei² = Σ(Yi - Ŷi)².
26Sums of Squares
- This gives us the following 'sum-of-squares' measures:
- Total Variation = Explained Variation + Unexplained Variation
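- A minimal Python sketch of this decomposition (numpy only; the data values are hypothetical, and the estimator formulas used here are the ones derived later in these notes):

    import numpy as np

    # Hypothetical data: X = unemployment rate, Y = crime rate
    x = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
    y = np.array([30.0, 34.0, 36.0, 41.0, 44.0, 49.0])

    # Bivariate OLS fit (normal-equation formulas derived later in the notes)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    y_hat = a + b * x

    tss = np.sum((y - y.mean()) ** 2)      # total variation
    ess = np.sum((y_hat - y.mean()) ** 2)  # explained variation
    uss = np.sum((y - y_hat) ** 2)         # unexplained variation

    print(np.isclose(tss, ess + uss))      # True: total = explained + unexplained
    print(ess / tss)                       # r-squared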
27Sums of Squares Confusion
- Note Occasionally you will run across ESS and
RSS which generate confusion since they can be
used interchangeably. ESS can be error
sums-of-squares or estimated or explained SSQ.
Likewise RSS can be residual SSQ or regression
SSQ. Hence the use of USS for Unexplained SSQ in
this treatment.
28The Parameter estimates
- In order to do this, we must find parameter estimates which accomplish this minimization.
- In calculus, if you wish to know when a function is at its minimum, you take the first derivative.
- In this case we must take partial derivatives since we have two parameters to worry about.
29Deriving the Parameter Estimates
- Since USS = Σ(Yi - a - bXi)²,
- We can take the partial derivative with respect to a and b.
30Deriving the Parameter Estimates (cont.)
- Which simplifies to ∂USS/∂a = -2Σ(Yi - a - bXi) and ∂USS/∂b = -2ΣXi(Yi - a - bXi).
- We also set these derivatives to 0 to indicate that we are at a minimum.
31Deriving the Parameter Estimates (cont.)
- We now add a hat to the parameters (a-hat and b-hat) to indicate that the results are estimators.
- We also set these derivatives equal to zero.
32Deriving the Parameter Estimates (cont.)
- Dividing through by -2 and rearranging terms, we get the normal equations: ΣYi = na + bΣXi and ΣXiYi = aΣXi + bΣXi².
33Deriving the Parameter Estimates (cont.)
- We can solve these equations simultaneously to
get our estimators.
34Deriving the Parameter Estimates (cont.)
- The estimator for a is a = Ybar - b·Xbar, which shows that the regression line always goes through the point (Xbar, Ybar) - the intersection of the two means.
- The estimator for b is b = Σ(Xi - Xbar)(Yi - Ybar) / Σ(Xi - Xbar)².
- This formula is quite manageable for bivariate regression. If there are two or more independent variables, the formula for b2, etc. becomes unmanageable!
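- A minimal numpy sketch of these two estimators (hypothetical data; np.polyfit is used only as a cross-check):

    import numpy as np

    x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])    # hypothetical independent variable
    y = np.array([5.0, 9.0, 10.0, 15.0, 18.0])  # hypothetical dependent variable

    b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a_hat = y.mean() - b_hat * x.mean()   # the line passes through (x-bar, y-bar)

    # Cross-check against numpy's own least-squares fit
    b_chk, a_chk = np.polyfit(x, y, 1)
    print(a_hat, b_hat)                   # intercept and slope from the formulas above
    print(np.isclose(a_hat, a_chk), np.isclose(b_hat, b_chk))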
35Tests of Inference
- t-tests for coefficients
- F-test for entire model
36T-Tests
- Since we wish to make probability statements about our model, we must do tests of inference.
- Fortunately, the ratio of an estimated coefficient to its standard error, t = b / se(b), follows a t distribution with n - 2 degrees of freedom.
37This gives us the F test
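- A minimal sketch of both tests for the bivariate case (assumes scipy for the t and F reference distributions; the data are hypothetical):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1, 14.2, 15.8])
    n = len(y)

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    e = y - (a + b * x)

    s2 = np.sum(e ** 2) / (n - 2)                  # estimate of the error variance
    se_b = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))

    t_b = b / se_b                                 # t-test for the slope
    p_t = 2 * (1 - stats.t.cdf(abs(t_b), df=n - 2))

    ess = np.sum(((a + b * x) - y.mean()) ** 2)
    uss = np.sum(e ** 2)
    F = (ess / 1) / (uss / (n - 2))                # F-test for the whole model
    p_F = 1 - stats.f.cdf(F, 1, n - 2)

    print(t_b, p_t, F, p_F)   # in the bivariate case, F equals the slope's t squared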
38Measures of Goodness of fit
- The Correlation coefficient
- r-squared
39The correlation coefficient
- A measure of how close the residuals are to the
regression line - It ranges between -1.0 and 1.0
- It is closely related to the slope.
40R2 (r-square)
- The r² (or R²) is also called the coefficient of determination: r² = 1 - USS/TSS, the proportion of the total variation in Y explained by the model.
41Tests of Inference
- t-tests for coefficients
- F-test for entire model
- Since we are interested in how well the model
performs at reducing error, we need to develop a
means of assessing that error reduction. Since
the mean of the dependent variable represents a
good benchmark for comparing predictions, we
calculate the improvement in the prediction of Yi
relative to the mean of Y (the best guess of Y
with no other information).
42Goodness of fit
- The correlation coefficient
- A measure of how close the residuals are to the regression line. It ranges between -1.0 and 1.0.
- r² (r-square)
- The r-square (or R-square) is also called the coefficient of determination.
43The assumptions of the model
- We will spend the next 4 weeks on this!
44The Multiple Regression Model The Scalar Version
- The basic multiple regression model is a simple extension of the bivariate equation.
- By adding extra independent variables, we are creating a multiple-dimensioned space, where the model fit is a surface in that space.
- For instance, if there are two independent variables, we are fitting the points to a plane in space.
45The Scalar Equation
- Yi = a + b1X1i + b2X2i + ... + bkXki + ei
46The Matrix Model
- The multiple regression model may be easily represented in matrix terms: Y = XB + e,
- where Y, X, B and e are all matrices of data, coefficients, or residuals.
47The Matrix Model (cont.)
- The matrices in Y = XB + e are: Y (n x 1), X (n x (k+1), with a leading column of 1s for the intercept), B ((k+1) x 1), and e (n x 1).
- Note that we postmultiply X by B since this order makes them conformable.
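- A minimal numpy sketch of the matrix form (hypothetical data; the first column of X is 1s for the intercept):

    import numpy as np

    # Hypothetical data with two independent variables
    X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
    Y  = np.array([4.1, 5.9, 9.2, 10.1, 14.3])

    X = np.column_stack([np.ones_like(X1), X1, X2])  # n x (k+1) design matrix

    # Postmultiply X by B: Y = XB + e, so B = (X'X)^-1 X'Y
    B = np.linalg.inv(X.T @ X) @ X.T @ Y
    e = Y - X @ B
    print(B)        # intercept and slopes
    print(e.sum())  # residuals sum to ~0 when an intercept is included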
48Assumptions of the modelScalar Version
- The OLS model has seven fundamental assumptions.
These assumptions form the foundation for all
regression analysis. Failure of a model to
conform to these assumptions frequently presents
severe problems for estimation and inference.
49The Assumptions of the ModelScalar Version
(cont.)
- 1. The ei's are normally distributed.
- 2. E(ei) = 0
- 3. E(ei²) = σ²
- 4. E(eiej) = 0 (i ≠ j)
- 5. X's are nonstochastic with values fixed in repeated samples, and Σ(Xik - Xbark)²/n is a finite nonzero number.
- 6. The number of observations is greater than the number of coefficients estimated.
- 7. No exact linear relationship exists between any of the explanatory variables.
50The Assumptions of the ModelThe English Version
- The errors have a normal distribution.
- The residuals are homoskedastic (constant variance).
- There is no serial correlation.
- There is no multicollinearity.
- The X's are fixed (non-stochastic).
- There are more data points than unknowns.
- The model is linear.
- OK, so it's not really English.
51The Assumptions of the Model The Matrix Version
- These same assumptions expressed in matrix format
are - 1. e ? N(0,?)
- 2. ? ?2I
- 3. The elements of X are fixed in repeated
samples and (1/ n)X'X is nonsingular and its
elements are finite
52Extra Material on OLS The Adjusted R2
- Since R² always increases with the addition of a new variable, the adjusted R² compensates for added explanatory variables:
- Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of explanatory variables.
53Extra Material on OLS The F-test
- In addition, the F test for the entire model must be adjusted to compensate for the changed degrees of freedom: F = (R²/k) / ((1 - R²)/(n - k - 1)).
- Note that F increases as n or R² increases and decreases as k increases. Adding a variable will always increase R², but not necessarily adjusted R² or F. In addition, values of adjusted R² below 0.0 are possible.
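- A minimal sketch of both adjustments, using the formulas given above (the numbers are hypothetical and chosen only to show that a higher R² with more variables can still lower adjusted R² and F):

    import numpy as np

    def adjusted_r2_and_F(r2, n, k):
        """Adjusted R-squared and overall F for a model with k regressors
        (excluding the constant) fit to n observations."""
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        F = (r2 / k) / ((1 - r2) / (n - k - 1))
        return adj_r2, F

    # Adding a variable raises R2 slightly but can lower adjusted R2 and F
    print(adjusted_r2_and_F(0.50, 30, 2))
    print(adjusted_r2_and_F(0.51, 30, 3))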
54Derivation of B's in matrix notation
- Skip this material in PS 401
- Given the matrix algebra model Y = XB + e,
- we can replicate the least squares normal equations in matrix format.
- We need to minimize e'e, which is the sum of squared errors: e'e = (Y - XB)'(Y - XB).
- Setting the derivative with respect to B equal to 0, we get X'XB = X'Y, and therefore B = (X'X)⁻¹X'Y.
- Note that X'X is called the sums-of-squares and cross-products matrix.
55Properties of Estimators
- Since we are concerned with error, we will be concerned with those properties of estimators which have to do with the errors produced by the estimates - the e's.
56Types of estimator error
- Estimators are seldom exactly correct due to any
number of reasons, most notably sampling error
and biased selection. There are several important
concepts that we need to understand in examining
how well estimators do their job.
57Sampling error
- Sampling error is simply the difference between
the true value of a parameter and its estimate in
any given sample. - This sampling error means that an estimator will
vary from sample to sample and therefore
estimators have variance.
58Bias
- The bias of an estimate is the difference between
its expected value and its true value. - If the estimator is always low (or high) then the
estimator is biased.
- An estimator is unbiased if E(β-hat) = β,
- and the bias, E(β-hat) - β, equals zero.
59Mean Squared Error
- The mean square error (MSE) is different from the estimator's variance in that the variance measures dispersion about the estimated parameter, while mean squared error measures the dispersion about the true parameter.
- If the estimator is unbiased, then the variance and MSE are the same.
60Mean Squared Error (cont.)
- The MSE is important for time series and forecasting since it allows for both bias and efficiency.
- For instance: MSE(β-hat) = Var(β-hat) + [Bias(β-hat)]².
- These concepts lead us to look at the properties of estimators. Estimators may behave differently in large and small samples, so we look at both the small and large (asymptotic) sample properties.
61Small Sample Properties
- These are the ideal properties. We desire these
to hold. - Bias
- Efficiency
- Best Linear Unbiased Estimator
62Bias
- A parameter estimate is unbiased if E(β-hat) = β.
- In other words, the average value of the estimator in repeated sampling equals the true parameter.
- Note that whether an estimator is biased or not implies nothing about its dispersion.
63Efficiency
- An estimator is efficient if it is unbiased and its variance is less than that of any other unbiased estimator of the parameter:
- β-hat is unbiased, and
- Var(β-hat) ≤ Var(β-tilde), where β-tilde is any other unbiased estimator of β.
- There might be instances in which we might choose a biased estimator, if it has a smaller variance.
64BLUE (Best Linear Unbiased Estimate)
- An estimator β-hat is described as a BLUE estimator if it
- is a linear function,
- is unbiased, and
- satisfies Var(β-hat) ≤ Var(β-tilde), where β-tilde is any other linear unbiased estimator of β.
65What is a linear estimator?
- Note that the sample mean is an example of a linear estimator: it is a weighted sum of the observations, Ybar = Σ(1/n)Yi.
66Asymptotic (Large Sample) Properties
- Asymptotically unbiased
- Consistency
- Asymptotic efficiency
67Asymptotic bias
- An estimator is asymptotically unbiased if its bias goes to zero as n goes to infinity: lim E(β-hat) = β as n grows.
68Consistency
- The point at which a distribution collapses is called the probability limit (plim).
- If the bias and variance both decrease as n gets larger, the estimator is consistent.
69Asymptotic efficiency
- An estimator is asymptotically efficient if
- it has an asymptotic distribution with finite mean and variance,
- it is consistent, and
- no other consistent estimator has smaller asymptotic variance.
70Rifle and Target Analogy
- Small sample properties
- Bias: The shots cluster around some spot other than the bulls-eye.
- Efficiency: When one rifle's cluster is smaller than another's.
- BLUE: Smallest scatter for rifles of a particular type of simple construction.
71Rifle and Target Analogy (cont.)
- Asymptotic properties
- Think of increased sample size as getting closer to the target.
- When all of the assumptions of the OLS model hold, its estimators are
- unbiased,
- minimum variance, and
- BLUE.
72Assumption Violations How we will approach the
question.
- Definition
- Implications
- Causes
- Tests
- Remedies
73Non-zero Mean for the residuals (Definition)
- Definition
- The residuals have a mean other than 0.0.
- Note that this refers to the true residuals. The estimated residuals from OLS will still have a mean of 0.0, even though the true residuals are non-zero.
74Non-zero Mean for the residuals (Implications)
- The true regression line is Y = (a + μ) + bX + (e - μ), where μ is the non-zero mean of the errors.
- Therefore the intercept is biased.
- The slope, b, is unbiased. There is also no way of separating out a and μ.
75Non-zero Mean for the residuals (Causes, Tests,
Remedies)
- Causes Non-zero means result from some form of
specification error. Something has been omitted
from the model which accounts for that mean in
the estimation. - We will discuss Tests and Remedies when we look
closely at Specification errors.
76Non-normally distributed errors Definition
- The residuals are not NID(0, σ²).
- Example output from a normality test of residuals:

    Normality Tests Section
    Assumption   Value     Probability   Decision (5%)
    Skewness     5.1766    0.000000      Rejected
    Kurtosis     4.6390    0.000004      Rejected
    Omnibus      48.3172   0.000000      Rejected
77Non-normally distributed errors Implications
- The existence of residuals which are not normally
distributed has several implications. - First is that it implies that the model is to
some degree misspecified.
- A collection of truly stochastic disturbances should have a normal distribution. The central limit theorem states that as the number of random variables being summed increases, the distribution of their sum tends toward a normal distribution.
78Non-normally distributed errors Implications
(cont.)
- If the residuals are not normally distributed,
then the estimators of a and b are also not
normally distributed. - Estimates are, however, still BLUE.
- Estimates are unbiased and have minimum variance.
- They are no longer efficient, even though they
are asymptotically unbiased and consistent. - It is only our hypothesis tests which are suspect.
79Non-normally distributed errors Causes
- Generally caused by a misspecification error.
- Usually an omitted variable.
- Can also result from
- Outliers in data.
- Wrong functional form.
80Non-normally distributed errors Tests for
non-normality
- Chi-Square goodness of fit
- Since the departures of observed from expected frequencies under the normal distribution follow a chi-square distribution, we can test for the normality of the error terms using a standard chi-square statistic.
- We take our residuals, group them, and count how many occur in each group, along with how many we would expect in each group.
81Non-normally distributed errors Tests for
non-normality (cont.)
- We then calculate the simple χ² statistic: χ² = Σ (Observed - Expected)² / Expected.
- This statistic has (N - 1) degrees of freedom, where N is the number of classes.
82Non-normally distributed errors Tests for
non-normality (cont.)
- Jarque-Bera test
- This test examines both the skewness and kurtosis of a distribution to test for normality: JB = n[S²/6 + (K - 3)²/24],
- where S is the skewness and K is the kurtosis of the residuals.
- JB has a χ² distribution with 2 df.
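- A minimal scipy/numpy sketch of the Jarque-Bera calculation above (the "residuals" here are simulated rather than taken from a fitted model):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    e = rng.standard_normal(200)          # stand-in for estimated residuals

    n = len(e)
    S = stats.skew(e)                     # sample skewness
    K = stats.kurtosis(e, fisher=False)   # sample kurtosis (normal = 3)

    JB = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)
    p_value = 1 - stats.chi2.cdf(JB, df=2)
    print(JB, p_value)   # large p: no evidence against normally distributed residuals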
83Non-normally distributed errors Remedies
- Try to modify your theory. Omitted variable?
Outlier needing specification? - Modify your functional form by taking some
variance transforming step such as square root,
exponentiation, logs, etc. - Be mindful that you are changing the nature of
the model. - Bootstrap it!
84Multicollinearity Definition
- Multicollinearity is the condition where the independent variables are related to each other. Causation is not implied by multicollinearity.
- As any two (or more) variables become more and more closely correlated, the condition worsens and approaches singularity.
- Since the X's are supposed to be fixed, this is a sample problem.
- Since multicollinearity is almost always present, it is a problem of degree, not merely existence.
85Multicollinearity Implications
- Consider the following cases:
- A) No multicollinearity
- The regression would appear to be identical to separate bivariate regressions. This produces variances which are biased upward (too large), making t-tests too small. For multiple regression this satisfies the assumption.
- B) Perfect Multicollinearity
- Some variable Xi is a perfect linear combination of one or more other variables Xj; therefore X'X is singular, and |X'X| = 0.
- A model cannot be estimated under such circumstances. The computer dies.
- C) A high degree of Multicollinearity
- When the independent variables are highly correlated, the variances and covariances of the Bi's are inflated (t-ratios are lower) and R² tends to be high as well.
- The B's are unbiased (but perhaps useless due to their imprecise measurement as a result of their variances being too large). In fact they are still BLUE.
- OLS estimates tend to be sensitive to small changes in the data.
- Relevant variables may be discarded.
86Multicollinearity Implications
- Consider the following cases
- A) No multicollinearity
- The regression would appear to be identical to
separate bivariate regressions - This produces variances which are biased upward
(too large) making t-tests too small. - For multiple regression this satisfies the
assumption.
87Multicollinearity Implications (cont.)
- B) Perfect Multicollinearity
- Some variable Xi is a perfect linear combination of one or more other variables Xj; therefore X'X is singular, and |X'X| = 0.
- This is matrix algebra notation. It means that one variable is a perfect linear function of another (e.g. X2 = X1 + 3.2).
- A model cannot be estimated under such circumstances. The computer dies.
88Multicollinearity Implications (cont.)
- C. A high degree of Multicollinearity
- When the independent variables are highly correlated, the variances and covariances of the Bi's are inflated (t-ratios are lower) and R² tends to be high as well.
- The B's are unbiased (but perhaps useless due to their imprecise measurement as a result of their variances being too large). In fact they are still BLUE.
- OLS estimates tend to be sensitive to small changes in the data.
- Relevant variables may be discarded.
89Multicollinearity Causes
- Sampling mechanism. Poorly constructed design, measurement scheme, or limited range.
- Statistical model specification: adding polynomial terms or trend indicators.
- Too many variables in the model - the model is overdetermined.
- Theoretical specification is wrong. Inappropriate construction of theory or even measurement.
90Multicollinearity Tests/Indicators
- |X'X| approaches 0.
- Since the determinant is a function of variable scale, this measure doesn't help a whole lot. We could, however, use the determinant of the correlation matrix and therefore bound the range from 0.0 to 1.0.
91Multicollinearity Tests/Indicators (cont.)
- Tolerance: TOLj = 1 - Rj², where Rj² comes from regressing Xj on the other independent variables.
- If the tolerance equals 1, the variables are unrelated. If TOLj = 0, then they are perfectly correlated.
- Variance Inflation Factors (VIFs): VIFj = 1/TOLj = 1/(1 - Rj²).
92Interpreting VIFs
- No multicollinearity produces VIFs = 1.0.
- If the VIF is greater than 10.0, then multicollinearity is probably severe: 90% of the variance of Xj is explained by the other X's.
- In small samples, a VIF of about 5.0 may indicate problems.
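- A minimal numpy sketch of tolerance/VIF by auxiliary regression (the data are simulated so that x1 and x2 are nearly collinear while x3 is unrelated):

    import numpy as np

    def vif(X):
        """Variance inflation factor for each column of X (no constant column):
        regress each X_j on the remaining X's and report 1 / (1 - R_j^2).
        Tolerance is simply 1 / VIF."""
        n, k = X.shape
        vifs = []
        for j in range(k):
            y = X[:, j]
            Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            b = np.linalg.lstsq(Z, y, rcond=None)[0]
            resid = y - Z @ b
            r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
            vifs.append(1.0 / (1.0 - r2))
        return np.array(vifs)

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=100)
    x2 = x1 + 0.1 * rng.normal(size=100)   # nearly collinear with x1
    x3 = rng.normal(size=100)              # unrelated
    print(vif(np.column_stack([x1, x2, x3])))  # large VIFs for x1, x2; near 1 for x3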
93Multicollinearity Tests/Indicators (cont.)
- R² deletes - try all possible models of X's, including/excluding each variable (taken 1 at a time), and look for small changes in R² with its inclusion/omission.
- Multicollinearity is of concern when either:
- F is significant, but no t value is.
- Adjusted R² declines with a new variable.
94Multicollinearity Tests/Indicators (cont.)
- I would avoid these rules of thumb:
- Betas are > 1.0 or < -1.0
- Sign changes occur with the introduction of a new variable
- The R² is high, but few t-ratios are.
- Eigenvalues and Condition Index - If this topic is beyond Gujarati, it's beyond me.
95Multicollinearity Remedies
- Increase sample size
- Omit Variables
- Scale Construction/Transformation
- Factor Analysis
- Constrain the estimation. Such as the case where
you can set the value of one coefficient relative
to another.
96Multicollinearity Remedies (cont.)
- Change design (LISREL, maybe, or pooled cross-sectional time series)
- Ridge Regression
- This technique introduces a small amount of bias into the coefficients to reduce their variance.
- Ignore it - report adjusted R² and claim it warrants retention in the model.
97Heteroskedasticity Definition
- Heteroskedasticity is a problem where the error
terms do not have a constant variance. - That is, they may have a larger variance when
values of some Xi (or the Yis themselves) are
large (or small).
98Heteroskedasticity Definition
- This often gives the plots of the residuals by
the dependent variable or appropriate independent
variables a characteristic fan or funnel shape.
99Heteroskedasticity Implications
- The regression B's are unbiased.
- But they are no longer the best estimator. They
are not BLUE (not minimum variance - hence not
efficient). - They are, however, consistent.
100Heteroskedasticity Implications (cont.)
- The estimator variances are not asymptotically efficient, and they are biased.
- So confidence intervals are invalid.
- What do we know about the bias of the variance?
- If Yi is positively correlated with ei, the bias is negative (hence t values will be too large).
- With positive bias, many t's are too small.
101Heteroskedasticity Implications (cont.)
- Types of Heteroskedasticity
- There are a number of types of heteroskedasticity.
- Additive
- Multiplicative
- ARCH (Autoregressive conditional heteroskedastic)
- a time series problem.
102Heteroskedasticity Causes
- It may be caused by
- Model misspecification - omitted variable or
improper functional form. - Learning behaviors across time
- Changes in data collection or definitions.
- Outliers or breakdown in model.
- Frequently observed in cross sectional data sets
where demographics are involved (population, GNP,
etc).
103Heteroskedasticity Tests
- Informal Methods
- Graph the data and look for patterns!
104Heteroskedasticity Tests (cont.)
- Park test
- As an exploratory test, log the squared residuals and regress them on the logged values of the suspected independent variable.
- If the B is significant, then heteroskedasticity may be a problem.
105Heteroskedasticity Tests (cont.)
- Glejser Test
- This test is quite similar to the Park test, except that it uses the absolute values of the residuals, and a variety of transformed X's.
- A significant B2 indicates heteroskedasticity.
- Easy test, but has problems.
106Heteroskedasticity Tests (cont.)
- Goldfeld-Quandt test
- Order the n cases by the X that you think is correlated with ei².
- Drop a section of c cases out of the middle (one-fifth is a reasonable number).
- Run separate regressions on both the upper and lower samples.
107Heteroskedasticity Tests (cont.)
- Goldfeld-Quandt test (cont.)
- Do an F-test for the difference in error variances: F = USS2/USS1, the ratio of the residual sums of squares from the two sub-regressions.
- F has (n - c - 2k)/2 degrees of freedom for each.
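- A minimal sketch of the Goldfeld-Quandt steps above (simulated data whose error spread grows with X; scipy supplies the F distribution):

    import numpy as np
    from scipy import stats

    def goldfeld_quandt(x, y, drop_frac=0.2):
        """Order cases by x, drop the middle fifth, fit OLS to the lower and
        upper groups, and compare their error variances with an F ratio."""
        order = np.argsort(x)
        x, y = x[order], y[order]
        n = len(y)
        c = int(n * drop_frac)
        lo = slice(0, (n - c) // 2)
        hi = slice(n - (n - c) // 2, n)

        def uss(xs, ys):
            b = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
            a = ys.mean() - b * xs.mean()
            return np.sum((ys - a - b * xs) ** 2), len(ys)

        uss1, n1 = uss(x[lo], y[lo])
        uss2, n2 = uss(x[hi], y[hi])
        k = 2                                        # parameters per sub-regression (a and b)
        F = (uss2 / (n2 - k)) / (uss1 / (n1 - k))    # df = (n - c - 2k)/2 in each group
        p = 1 - stats.f.cdf(F, n2 - k, n1 - k)
        return F, p

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(1, 10, 80))
    y = 2 + 3 * x + rng.normal(scale=x, size=80)     # error variance grows with x
    print(goldfeld_quandt(x, y))                     # large F, small p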
108Heteroskedasticity Tests (cont.)
- Breusch-Pagan-Godfrey Test (Lagrangian Multiplier test)
- Estimate the model with OLS.
- Obtain the residuals ei and the variance estimate σ² = Σei²/n.
- Construct the variables pi = ei²/σ².
109Heteroskedasticity Tests (cont.)
- Breusch-Pagan-Godfrey Test (cont.)
- Regress pi on the X (and other?!) variables.
- Calculate Θ = ESS/2 from that regression.
- Note that Θ is asymptotically distributed as χ² with (m - 1) degrees of freedom, where m is the number of variables in the auxiliary regression.
110Heteroskedasticity Tests (cont.)
- White's Generalized Heteroskedasticity test
- Estimate the model with OLS and obtain the residuals.
- Run the following auxiliary regression: ei² = α0 + α1X1i + α2X2i + α3X1i² + α4X2i² + α5X1iX2i + vi.
- Higher powers may also be used, along with more X's.
111Heteroskedasticity Tests (cont.)
- White's Generalized Heteroskedasticity test (cont.)
- Note that n·R² from the auxiliary regression is asymptotically distributed as χ².
- The degrees of freedom is the number of coefficients estimated above.
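- A minimal sketch of White's test as described above (simulated heteroskedastic data; here the degrees of freedom count the auxiliary regressors excluding the constant):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 120
    x1 = rng.uniform(1, 10, n)
    x2 = rng.uniform(1, 10, n)
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(scale=x1, size=n)  # heteroskedastic errors

    # Step 1: OLS on the original model, keep the residuals
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b

    # Step 2: auxiliary regression of e^2 on the X's, their squares, and cross product
    Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
    g = np.linalg.lstsq(Z, e**2, rcond=None)[0]
    fit = Z @ g
    r2_aux = 1 - np.sum((e**2 - fit) ** 2) / np.sum((e**2 - (e**2).mean()) ** 2)

    # Step 3: n * R^2 is asymptotically chi-square
    W = n * r2_aux
    df = Z.shape[1] - 1
    print(W, 1 - stats.chi2.cdf(W, df))   # small p suggests heteroskedasticity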
112Heteroskedasticity Remedies
- GLS
- We will cover this after autocorrelation.
- Weighted Least Squares
- si² is a consistent estimator of σi².
- Use the same formula (BLUE) to get a and β.
113- Iteratively weighted least squares (IWLS)
- Uses BLUE.
- The variance of ei equals σi².
- Obtain estimates of a and β using OLS.
- Use these to get "1st round" estimates of si².
- Using the formula above, replace wi with 1/si² and obtain new estimates for a and β.
- Use these to re-estimate si².
- Repeat Step 2 until a and β converge.
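- A minimal weighted least squares sketch (to keep the example short, the true error spreads are treated as known; IWLS would instead estimate them iteratively as described above):

    import numpy as np

    def wls(x, y, w):
        """Weighted least squares for a bivariate model: each case is weighted
        by w_i (here 1 / sigma_i^2) in the normal equations."""
        W = np.diag(w)
        X = np.column_stack([np.ones_like(x), x])
        B = np.linalg.inv(X.T @ W @ X) @ X.T @ W @ y
        return B   # [intercept, slope]

    rng = np.random.default_rng(4)
    x = np.linspace(1, 10, 60)
    sigma = 0.5 * x                          # error spread grows with x
    y = 2 + 3 * x + rng.normal(scale=sigma, size=60)

    print(wls(x, y, np.ones_like(x)))        # ordinary OLS (all weights equal)
    print(wls(x, y, 1 / sigma**2))           # WLS with w_i = 1 / sigma_i^2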
114Heteroskedasticity Remedies (cont.)
- White's corrected standard errors
- Discussion beyond this course.
- Some software will calculate these (SHAZAM, TSP).
115Autocorrelation Definition
- Autocorrelation is simply the presence of standard correlation between adjacent residuals.
- If a residual is negative (positive), then its neighbors tend to also be negative (positive).
- Most often autocorrelation is between adjacent observations; however, lagged or seasonal patterns can also occur.
- Autocorrelation is also usually a function of ordering by time, but it can occur for other orderings as well.
116Autocorrelation Definition (cont.)
- The assumption violated is E(eiej) = 0 (i ≠ j).
- Meaning that the Pearson's r between the residuals from OLS and the same residuals lagged one period is non-zero.
117Autocorrelation Definition (cont.)
- Most autocorrelation is what we call 1st order autocorrelation, meaning that the residuals are related to their contiguous values.
- For instance: et = ρ·et-1 + ut.
118Autocorrelation Definition (cont.)
- Types of Autocorrelation
- Autoregressive processes
- Moving Averages
119Autocorrelation Definition (cont.)
- Autoregressive processes AR(p)
- The residuals are related to their preceding values: et = ρ1·et-1 + ut.
- This is classic 1st order autocorrelation, AR(1).
120Autocorrelation Definition (cont.)
- Autoregressive processes (cont.)
- In 2nd order autocorrelation the residuals are related to their t-2 values as well: et = ρ1·et-1 + ρ2·et-2 + ut.
- Larger order processes may occur as well.
121Autocorrelation Definition (cont.)
- Moving Average Processes MA(q)
- The error term is a function of some random error plus a portion of the previous random error: et = ut + θ·ut-1.
122Autocorrelation Definition (cont.)
- Moving Average Processes (cont.)
- Higher order processes for MA(q) also exist.
- There, the error term is a function of some random error plus portions of the previous q random errors.
123Autocorrelation Definition (cont.)
- Mixed processes ARMA(p,q)
- The error term is a complex function of both
autoregressive and moving average processes.
124Autocorrelation Definition (cont.)
- There are substantive interpretations that can be placed on these processes.
- AR processes represent shocks to systems that have long-term memory.
- MA processes are quick shocks to systems that handle the process, but have only short-term memory.
125Autocorrelation Implications
- Coefficient estimates are unbiased, but the
estimates are not BLUE - The variances are often greatly underestimated
(biased small) - Hence hypothesis tests are exceptionally suspect.
126Autocorrelation Causes
- Specification error
- Omitted variable (e.g. inflation)
- Wrong functional form
- Lagged effects
- Data Transformations
- Interpolation of missing data
- Differencing
127Autocorrelation Tests
- Observation of residuals
- Graph/plot them!
- Runs of signs
- Geary test
128Autocorrelation Tests (cont.)
- Durbin-Watson d = Σ(et - et-1)² / Σet²
- Criteria for the hypothesis of autocorrelation:
- Reject if d < dL
- Do not reject if d > dU
- Test is inconclusive if dL ≤ d ≤ dU.
129Autocorrelation Tests (cont.)
- Durbin-Watson d (cont.)
- Note that d is symmetric about 2.0, so that negative autocorrelation will be indicated by a d > 2.0.
- Use the same distances above 2.0 as upper and lower bounds.
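- A minimal numpy sketch of the d statistic (the residuals here are simulated, once without and once with AR(1) autocorrelation):

    import numpy as np

    def durbin_watson(e):
        """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
        d is near 2 with no autocorrelation, near 0 with strong positive
        autocorrelation, and near 4 with strong negative autocorrelation."""
        return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

    rng = np.random.default_rng(5)
    u = rng.standard_normal(200)

    e_white = u                                   # no autocorrelation
    e_ar = np.zeros(200)                          # AR(1) residuals, rho = 0.8
    for t in range(1, 200):
        e_ar[t] = 0.8 * e_ar[t - 1] + u[t]

    print(durbin_watson(e_white))  # approximately 2
    print(durbin_watson(e_ar))     # well below 2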
130Autocorrelation Tests (cont.)
- Durbin's h
- Cannot use the DW d if there is a lagged endogenous variable in the model; instead use h = ρ·sqrt(n / (1 - n·sc²)), where ρ is approximately 1 - d/2.
- sc² is the estimated variance of the coefficient on the Yt-1 term.
- h has a standard normal distribution.
131Autocorrelation Tests (cont.)
- Tests for higher order autocorrelation
- Ljung-Box Q (χ² statistic)
- Portmanteau test
- Breusch-Godfrey
132Autocorrelation Remedies
- Generalized Least Squares
- Later!
- First difference method
- Take 1st differences of your X's and Y.
- Regress ΔY on ΔX.
- Assumes that ρ = 1!
- Generalized differences
- Requires that ρ be known.
133Autocorrelation Remedies
- Cochrane-Orcutt method
- (1) Estimate the model using OLS and obtain the residuals, ut.
- (2) Using the residuals, run the following regression: ut = ρ·ut-1 + vt.
134Autocorrelation Remedies (cont.)
- Cochrane-Orcutt method (cont.)
- (3) Using the estimated ρ, perform the regression on the generalized differences Yt - ρ·Yt-1 and Xt - ρ·Xt-1.
- (4) Substitute the values of B1 and B2 into the original regression to obtain new estimates of the residuals.
- (5) Return to step 2 and repeat until the estimated ρ no longer changes.
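- A minimal sketch of the Cochrane-Orcutt iteration described above (simulated bivariate data with AR(1) errors; variable names are illustrative):

    import numpy as np

    def cochrane_orcutt(x, y, tol=1e-6, max_iter=50):
        """Alternate between estimating rho from the OLS residuals and re-running
        OLS on the generalized differences Y_t - rho*Y_{t-1}, X_t - rho*X_{t-1},
        until rho converges."""
        def ols(xs, ys):
            b = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
            return ys.mean() - b * xs.mean(), b

        a, b = ols(x, y)
        rho = 0.0
        for _ in range(max_iter):
            u = y - a - b * x                              # current residuals
            rho_new = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
            y_star = y[1:] - rho_new * y[:-1]              # generalized differences
            x_star = x[1:] - rho_new * x[:-1]
            a_star, b = ols(x_star, y_star)
            a = a_star / (1 - rho_new)                     # recover the original intercept
            converged = abs(rho_new - rho) < tol
            rho = rho_new
            if converged:
                break
        return a, b, rho

    rng = np.random.default_rng(6)
    n = 200
    x = np.linspace(0, 10, n)
    e = np.zeros(n)
    u = rng.standard_normal(n)
    for t in range(1, n):
        e[t] = 0.7 * e[t - 1] + u[t]                       # AR(1) errors
    y = 2 + 3 * x + e

    print(cochrane_orcutt(x, y))   # estimates roughly near (2, 3, 0.7)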
135Model Specification Definition
- The analyst should understand one fundamental
truth about statistical models. They are all
misspecified. - We exist in a world of incomplete information at
best. Hence model misspecification is an
ever-present danger. We do, however, need to come
to terms with the problems associated with
misspecification so we can develop a feeling for
the quality of information, description, and
prediction produced by our models.
136Model Specification Definition (cont.)
- There are basically 4 types of misspecification
we need to examine - functional form
- inclusion of an irrelevant variable
- exclusion of a relevant variable
- measurement error and misspecified error term
137Model Specification Implications
- If an omitted variable is correlated with the
included variables, the estimates are biased as
well as inconsistent. - In addition, the error variance is incorrect, and
usually overestimated.
- If the omitted variable is uncorrelated with the included variables, the errors are still biased, even though the B's are not.
138Model Specification Implications
- Incorrect functional form can result in
autocorrelation or heteroskedasticity. - See these sections for the implications of each
problem.
139Model Specification Causes
- This one is easy - theoretical design.
- something is omitted, irrelevantly included,
mismeasured or non-linear. - This problem is explicitly theoretical.
140Model Specification Tests
- Actual Specification Tests
- No test can reveal poor theoretical construction
per se.
- The best indicator that your model is misspecified is the discovery that the model has some undesirable statistical property; e.g. a misspecified functional form will often be indicated by a significant test for autocorrelation.
- Sometimes time-series models will have negative autocorrelation as a result of poor design.
141Model Specification Tests
- Specification Criteria for lagged designs
- Most useful for comparing time series models with
same set of variables, but differing number of
parameters
142Model Specification Tests (cont)
- Schwarz Criterion: SC = n^(m/n) · σ²,
- where σ² equals RSS/n, m is the number of lags (variables), and n is the number of observations.
- Note that this is designed for time series.
143Model Specification Tests (cont)
- AIC (Akaike Information Criterion): AIC = e^(2m/n) · σ².
- Both of these criteria (AIC and Schwarz) are to be minimized for improved model specification. Note that they both have a lower bound which is a function of sample size and number of parameters.
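- A minimal sketch comparing two hypothetical time-series models with these criteria (assuming the Gujarati-style forms given above; the RSS values are made up):

    import numpy as np

    def aic_sic(rss, n, m):
        """Information criteria with sigma2 = RSS/n and m parameters (lags/variables).
        Both are to be minimized; the Schwarz criterion penalizes m more heavily."""
        sigma2 = rss / n
        aic = np.exp(2 * m / n) * sigma2
        sic = n ** (m / n) * sigma2
        return aic, sic

    # Two hypothetical models fit to the same n = 100 observations:
    print(aic_sic(rss=520.0, n=100, m=3))   # smaller model
    print(aic_sic(rss=505.0, n=100, m=6))   # larger model: lower RSS, bigger penalty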
144Model Specification Remedies
- Model Building
- A. "Theory Trimming" (Pedhazur, 616)
- B. Hendry and the LSE school of top-down
modeling. - C. Nested Models
- D. Stepwise Regression.
- Stepwise regression is a process of including the
variables in the model one step at a time.
This is a highly controversial technique.
145Model Specification Remedies (cont.) Stepwise
Regression
- Twelve things someone else says are wrong with
stepwise - Philosophical Problems
- 1. Completely atheoretical
- 2. Subject to spurious correlation
- 3. Information tossed out - insignificant
variables may be useful - 4. Computer replacing the scientist
- 5. Utterly mechanistic
146Model Specification Remedies (cont.) Stepwise
Regression
- Statistical
- 6. Population model from sample data
- 7. Large N - statistical significance can be an
artifact - 8. Inflates the alpha level
- 9. The scientist becomes beholden to the significance tests
- 10. Overestimates the effect of the variables
added early, and underestimates the variables
added later - 11. Prevents data exploration
- 12. Not even least squares for stagewise
147Model Specification Remedies (cont.) Stepwise
Regression
- Twelve Responses
- Selection of the data for the procedure implies some minimal level of theorization
- All analysis is subject to spurious correlation. If you think it might be spurious, omit it.
- True - but this can happen anytime
- All the better
- If it "works", is this bad? We use statistical decision rules in a mechanistic manner
148Model Specification Remedies (cont.) Stepwise
Regression
- this is true of regular regression as well
- This is true of regular regression as well
- No
- No more than OLS
- Not true
- Also not true - this is a data exploration
technique - Huh? Antiquated view of stepwise...probably not
accurate in last 20 years
149Measurement Error
- Not much to say yet.
- If the measurement error is random, estimates are unbiased, but results are weaker.
- If measurement is biased, results are biased.