Title: Lecture 4: Estimating the Covariance Matrix
Slide 1: Lecture 4: Estimating the Covariance Matrix
Slide 2: What We Will Learn in This Lecture
- Fundamental methods of deriving the covariance matrix from datasets
- Dealing with non-stationary datasets
- Extensions to the basic estimation model: factor models
Slide 3: Estimating the Covariance Matrix
- Up until now we have treated the covariance matrix as something we just happen to know
- When we build a stochastic model we frequently have to estimate the covariance matrix from sampled data
- The basic methods of estimation are straightforward, but before we introduce them we need to make sure that the covariance matrix has meaning when applied to a dataset
Slide 4: What We Are Trying to Measure
- We are using the covariance matrix and the expected return vector to describe the behaviour of random variables
- But does it make sense to apply them to any random variable?
- As we will shortly see, the answer is no: for some random variables they have little or no meaning
- Therefore we cannot use the covariance matrix to describe some datasets
Slide 5: Observation 1: We Need a Random Variable With a Central Tendency
- The basis of our covariance matrix is the movement about the expected value
- Both variance and covariance are based on the idea of movement about a central point
- If the random variable does not have a central tendency then our method of measuring movement about a centre is meaningless
- It is also possible to adjust series for any predictable trends
Slide 6: Observation 2: We Need a Random Variable That Has Pattern to Its Behaviour
- There is little point in trying to describe the behaviour of something that does not have any behaviour!
- Are the random variables drawn from the same underlying world of causality?
- Is there a constant set of causal factors influencing the observations we are seeing in the dataset, or are the rules changing?
- If the rules are changing, are they changing gradually?
Slide 7: The Concept of Covariance Stationarity
- A set of random variables is said to be Covariance Stationary if their mean (or central tendency), variance and covariance are static
- A set of random variables is said to be Strictly Stationary if their joint distribution is stationary
- Covariance Stationarity and Strict Stationarity are the same when we are dealing with the multivariate normal distribution
Slide 8: Our Imaginary Hypothesis
- We know that the probability of observing a set of random variables (A, B, ..., Z) is described by a multivariate normal distribution
- We have a finite set of observations for these random variables: (A1, B1, ..., Z1), (A2, B2, ..., Z2), ...
- We want to know which multivariate normal distribution, out of all the possible distributions, would most likely produce the dataset we observed
- This approach is called the maximum likelihood estimator
Slide 9: Maximum Likelihood
Which normal distribution will most likely produce our data set?
[Diagram: a data sample with the maximum likelihood distribution fitted to it]
Slide 10: The Maximum Likelihood Estimates
- For the multivariate normal distribution the maximum likelihood estimates of its parameters can be calculated by averaging the observations:
- Est. E(A) = (1/N) Σ Ai
- Est. Var(A) = (1/(N-1)) Σ (Ai - Est. E(A))²
- Est. Cov(A,B) = (1/(N-1)) Σ (Ai - Est. E(A)).(Bi - Est. E(B))
- where Ai and Bi are the ith observations
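A minimal NumPy sketch of the estimators above; the sample values for the two series are made up for illustration:

```python
import numpy as np

# Illustrative observations of two series, A and B
A = np.array([0.01, 0.03, -0.02, 0.04, 0.00])
B = np.array([0.02, 0.01, -0.01, 0.03, 0.01])
N = len(A)

est_mean_A = A.sum() / N                             # Est. E(A) = (1/N) * sum(Ai)
est_mean_B = B.sum() / N                             # Est. E(B)
est_var_A = ((A - est_mean_A) ** 2).sum() / (N - 1)  # Est. Var(A), N-1 divisor
est_cov_AB = ((A - est_mean_A) * (B - est_mean_B)).sum() / (N - 1)  # Est. Cov(A,B)
```

NumPy's built-in `np.var(..., ddof=1)` and `np.cov` use the same N-1 divisor, so they can be used to cross-check the hand-rolled sums.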
Slide 11: Filling in the Covariance Matrix

Est. Var(A)     Est. Cov(B,A)   Est. Cov(C,A)
Est. Cov(A,B)   Est. Var(B)     Est. Cov(C,B)
Est. Cov(A,C)   Est. Cov(B,C)   Est. Var(C)
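The whole matrix can be filled in one call; the three series below are illustrative values only:

```python
import numpy as np

# Each row is one series (A, B, C); each column is one observation
data = np.array([
    [0.01, 0.03, -0.02, 0.04],   # series A
    [0.02, 0.01, -0.01, 0.03],   # series B
    [0.00, 0.02,  0.01, -0.01],  # series C
])

# np.cov fills in the whole matrix at once: estimated variances on the
# diagonal, estimated covariances off the diagonal (N-1 divisor by default)
cov_matrix = np.cov(data)
```

Since Cov(A,B) = Cov(B,A), the resulting matrix is symmetric.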
Slide 12: Excel's Support for the Maximum Likelihood Estimate
- Excel has support for calculating the maximum likelihood covariance matrix under Tools -> Data Analysis -> Covariance
- The Covariance dialog is as follows:
[Screenshot of the Covariance dialog: input range for the data sample, a "Use Data Labels" option, and an output range for the covariance matrix]
Slide 13: What to Do If the Data Does Not Have a Central Tendency
- When the dataset we are dealing with does not have a central tendency it is normally possible to transform it to a dataset that does
- This is one of the reasons why we use returns rather than prices
- The value of the FTSE 100 Index does not have a central tendency, but we could say that the proportional rate of growth in the index does
- The absolute level might not have a central tendency but the rate of growth in that level might
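The FTSE example above amounts to a simple transform from levels to proportional returns; the index values below are made up for illustration:

```python
import numpy as np

# Index levels trend upwards, so they have no central tendency ...
prices = np.array([6500.0, 6550.0, 6490.0, 6600.0, 6655.0])

# ... but the proportional growth rate between observations does
returns = prices[1:] / prices[:-1] - 1.0
```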
Slide 14: What If the Behaviour of the Variables Changes Across Time?
- This is a more fundamental problem
- The nature of the thing we are measuring is changing across time
- However it might be reasonable to assume that it is changing slowly
- Mean, variance and covariance might change slowly
- Intuitively we can deal with this type of non-stationarity by giving more recent observations more weight
- We do not weight all observations equally
- More recent observations are more valuable
Slide 15: Variable Weight Equations
- Let oij be the ith observation of the jth series and wi be the weight attached to the ith observation; then we make our estimates as weighted averages, with each observation's contribution scaled by its weight
[The weighted estimation formulas appeared on the original slide]
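The slide's formulas are not reproduced in the text, so the sketch below assumes the standard weighted forms: a weighted mean normalised by the sum of the weights, and a weighted covariance about the weighted means with normalised weights.

```python
import numpy as np

def weighted_mean(obs, w):
    """Weighted mean estimate: sum(wi * oi) / sum(wi)."""
    w = np.asarray(w, dtype=float)
    return np.sum(w * np.asarray(obs)) / np.sum(w)

def weighted_cov(obs_a, obs_b, w):
    """Weighted covariance about the weighted means (normalised weights)."""
    w = np.asarray(w, dtype=float)
    w = w / np.sum(w)
    dev_a = np.asarray(obs_a) - np.sum(w * obs_a)
    dev_b = np.asarray(obs_b) - np.sum(w * obs_b)
    return np.sum(w * dev_a * dev_b)
```

With equal weights these reduce to the ordinary mean and (biased) covariance, which is a useful sanity check.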
Slide 16: Choosing the Weights
- The selection of the weights is subjective but should reflect the basic idea that more recent observations are given greater importance
- The window over which we take our estimates is also subjective
- A formula used by some investment banks produces the decaying series 0.5, 0.33, 0.25, ...
Slide 17: Decaying Weights
The further in the past the observation, the smaller its weight in our calculation.
[Chart: observation weight against time offset, with weights decaying as observations age]
Slide 18: Problems with the Maximum Likelihood Method
- The maximum likelihood method requires us to estimate a large number of covariances: (N² - N) / 2
- Even if we have a large dataset, spurious correlations can enter into our matrix by chance
- Methods such as efficient frontier calculations, which seek to exploit idiosyncratic behaviour, will compound these spurious results!
- Direct estimation of the covariances does not explain the cause of the link or covariance
Slide 19: Factor Models
- One of the main techniques used to overcome the problems experienced with the maximum likelihood method is factor modelling
- Factor models impose structure on the relationships between the various elements of the covariance matrix
- This structure allows us to greatly reduce the number of parameters we need to estimate
- It also provides us with an explanation and breakdown of the covariances, not just their magnitude
Slide 20: Factor Model Formula
- The factor model seeks to describe an observed outcome in terms of some underlying parameters
- We will be dealing with linear factor models of the form
- oi = bi1.f1 + bi2.f2 + ... + biN.fN + ei
- where oi is the ith observed outcome
- biN is the sensitivity of the ith observed outcome to the Nth factor
- fN is the Nth factor
- ei is the unexplained random component of the ith observation
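The linear factor model above can be simulated directly; the sensitivities, factor realisations and noise scale below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_factors = 250, 3

b = np.array([0.8, -0.2, 0.5])            # sensitivities bi1 .. biN
f = rng.normal(size=(T, n_factors))       # factor realisations f1 .. fN
e = rng.normal(scale=0.1, size=T)         # unexplained noise ei

# oi = bi1*f1 + bi2*f2 + ... + biN*fN + ei, for each of T observations
o = f @ b + e
```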
Slide 21: Factor Model Diagram
[Diagram: input factors f1, f2, f3 feed the factor model, which explains the modelled observation MA = bA1.f1 + bA2.f2 + bA3.f3; the actual observation OA differs from MA by the unexplained noise ei]
Slide 22: Assumptions Made by Factor Models
- The correlations between the error term (ei) and the factors (f1, f2, etc.) are 0 (this is guaranteed if we use regression to estimate the factor model)
- The error terms of any two observations i and j explained by the factor model are uncorrelated (Cov(ei,ej) = 0)
- This uncorrelated-error assumption is vital if we are to use factor models to estimate the covariance matrix accurately, since it states that all the covariance is described by the factors
- Uncorrelated errors are only guaranteed if we do not leave important factors out of our model!
Slide 23: Estimating Factor Models
- Once the factors have been selected, standard linear regression techniques can be used to estimate the factor sensitivities
- The problem can be reduced to finding the best-fit line or plane between the observations and the factors; the slope of the line or plane gives the sensitivities (b1, b2, ...)
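A sketch of this regression step on simulated data; the true sensitivities are chosen arbitrarily so the fit can be checked against them:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500

f = rng.normal(size=(T, 2))                        # two observed factors
true_b = np.array([1.2, -0.7])                     # sensitivities to recover
o = f @ true_b + rng.normal(scale=0.05, size=T)    # observations with noise

# Ordinary least squares: choose b to minimise ||f @ b - o||^2.
# The fitted slopes are the estimated sensitivities b1, b2.
b_hat, *_ = np.linalg.lstsq(f, o, rcond=None)
```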
Slide 24: Explaining Covariance with Factor Models
- The relationships between different observations can be explained in terms of their relationships to the underlying factors
- If observation A and observation B are correlated then their correlation can be explained in terms of their underlying factors
- By quantifying relationships in terms of a factor model we greatly reduce the number of parameters we need to estimate
Slide 25: Diagram of Factor Model Method vs Maximum Likelihood
[Diagram comparing the two approaches. Factor model: all observations are indirectly related through an underlying factor (or factors). Maximum likelihood: all observations are directly related to each other.]
Slide 26: Deriving the Observation Covariances from the Factor Model
- Once we have defined the factor model for a series of observations we can generate the implied variances and covariances for those observations
- For a 1-factor model:
- Var(O1) = b11².Var(f1) + Var(e1)
- Cov(O1,O2) = b11.b21.Var(f1)
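These two equations assemble into the full implied covariance matrix; the sensitivities and variances below are illustrative numbers:

```python
import numpy as np

b = np.array([0.9, 1.3])         # sensitivities b11, b21
var_f = 0.04                     # Var(f1)
var_e = np.array([0.02, 0.01])   # Var(e1), Var(e2)

# Off-diagonal terms: Cov(Oi,Oj) = bi1 * bj1 * Var(f1)
# Diagonal terms:     Var(Oi)    = bi1^2 * Var(f1) + Var(ei)
implied_cov = np.outer(b, b) * var_f + np.diag(var_e)
```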
Slide 27: Proof for the Variance of the Observation from the Factors
- The 1-factor model:
- O1 = b11.f1 + e1
- The expectation of the observation:
- E(O1) = E(b11.f1 + e1)
- = b11.E(f1) + E(e1) = b11.E(f1)   (taking E(e1) = 0)
- The variance of the observation:
- Var(O1) = E((b11.f1 + e1 - b11.E(f1))²)
- = b11².Var(f1) + Var(e1)
- In general we do not use the factor model to estimate the variance of the observation; we estimate it directly from the dataset. The factor model does not simplify the task of estimating variances in isolation.
Slide 28: Proof for the Covariance of the Observations from the Factors
- The 1-factor model:
- O1 = b11.f1 + e1
- O2 = b21.f1 + e2
- Cov(O1,O2) = Cov(b11.f1 + e1, b21.f1 + e2)
- = E((b11.f1 - b11.E(f1) + e1).(b21.f1 - b21.E(f1) + e2))
- = E(b11.b21.(f1 - E(f1))²) + E(b11.(f1 - E(f1)).e2) + E(b21.(f1 - E(f1)).e1) + E(e1.e2)
- = b11.b21.Var(f1) + 0 + 0 + 0
- = b11.b21.Var(f1)
Slide 29: Intuitive Explanation of the Covariance Equation
- The link between observations O1 and O2 is via factor f1
- The stronger the links between O1 and O2 and the underlying factor by which they are related (i.e. the larger b11 and b21), the stronger their covariance
- The larger the variance of the underlying factor, the more it moves the related observations O1 and O2, and in turn the larger their covariance
- If O1 and O2 are strongly related to a factor with a very low variance they will have a low covariance! (The factor never moves, so the observations never move in unison as a result.)
Slide 30: Diagram of the Derived Covariance
[Diagram: the factor drives Observation 1 via b1 and Observation 2 via b2. The larger the variance in the factor, the more it will move the observations; covariance measures the relationship in movement.]
Slide 31: Advantages and Disadvantages of Factor Models
- The main advantage of a factor model is that it greatly reduces the number of parameters we need to estimate:

Assets (N)   2-Factor Model   Max Likelihood
5            10               10
20           40               190
100          200              4950

- The disadvantage is that we need to have some insight into the mechanics of the observations to produce the factor model
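The counts in the table follow from two small formulas: a 2-factor model needs one sensitivity per asset per factor (2N), while direct estimation needs (N² - N)/2 distinct covariances.

```python
def factor_model_params(n_assets, n_factors=2):
    """One sensitivity per asset per factor."""
    return n_assets * n_factors

def max_likelihood_params(n_assets):
    """Distinct off-diagonal covariances to estimate: (N^2 - N) / 2."""
    return (n_assets ** 2 - n_assets) // 2
```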
Slide 32: Sharpe's Single Factor Model
- Sharpe's single factor model (also called the market model) seeks to explain variations in the returns on the ith asset in terms of variations in the market
- The market is represented by the appropriate stock market index (e.g. the FTSE 100)
- It is closely related to the CAPM model widely used in finance
Slide 33: Formulation of the Sharpe Model
- The Sharpe factor model states that returns are only deterministically related to the market index and all other movement is noise unique to that stock:
- Rit = ai + bi.RMt + eit
- Rit is the return on the ith asset at time t
- RMt is the return on the market index at time t
- eit is the noise on the ith asset at time t
- ai is the difference between the expected return on the market index and the expected return on the ith asset
- bi is the sensitivity of the return on the ith asset to changes in the return on the market index
Slide 34: Calculating the Sharpe Factor Model Parameters
- The Sharpe factor model is calculated by using linear regression between the returns on the ith asset and the returns on the market portfolio
- Once we have parameterised the Sharpe model we can use it to obtain an estimate for the covariance matrix
- We can estimate bi by taking:
- bi = Cov(Rit, RMt) / Var(RMt)
- This is the same as the OLS estimate of the linear relationship between the return on the asset and the return on the market
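A sketch of the beta estimate on simulated returns; the true beta of 1.1, the drift terms and the noise scales are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 250

r_m = rng.normal(0.0005, 0.01, size=T)                    # market returns RMt
r_i = 0.0001 + 1.1 * r_m + rng.normal(0, 0.005, size=T)   # asset returns Rit

# bi = Cov(Rit, RMt) / Var(RMt); np.cov uses the N-1 divisor by default,
# so ddof=1 keeps the variance estimate consistent with it
beta_i = np.cov(r_i, r_m)[0, 1] / np.var(r_m, ddof=1)
```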
Slide 35: Generating a 2 by 2 Covariance Matrix
- Estimate b1 and b2 from the dataset
- Estimate the variance of the market portfolio returns, Var(RM), from the dataset
- Estimate the variances of returns on assets 1 and 2, Var(R1) and Var(R2), from the dataset

Var(R1)          b2.b1.Var(RM)
b1.b2.Var(RM)    Var(R2)
Slide 36: 3 by 3 Covariance from the Factor Model

Var(R1)          b2.b1.Var(RM)    b3.b1.Var(RM)
b1.b2.Var(RM)    Var(R2)          b3.b2.Var(RM)
b1.b3.Var(RM)    b2.b3.Var(RM)    Var(R3)
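The matrix above can be assembled in a few lines; the betas, market variance and directly-estimated asset variances below are illustrative numbers:

```python
import numpy as np

beta = np.array([0.8, 1.1, 1.4])        # b1, b2, b3
var_m = 0.04                            # Var(RM)
var_r = np.array([0.05, 0.09, 0.12])    # Var(R1..R3), estimated directly

# Off-diagonals come from the factor model: bi * bj * Var(RM);
# the diagonal uses the directly estimated variance of each asset's returns
cov = np.outer(beta, beta) * var_m
np.fill_diagonal(cov, var_r)
```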