Title: Portfolio Selection, Multivariate Regression, and Complex Systems
Portfolio Selection, Multivariate Regression, and Complex Systems
- Imre Kondor
- Collegium Budapest and Eötvös University, Budapest
- IUPAP STATPHYS23 Conference, Genova, Italy, July 9-13, 2007
Coworkers
- Szilárd Pafka (Paycom.net, California)
- Gábor Nagy (CIB Bank, Budapest)
- Nándor Gulyás (Collegium Budapest)
- István Varga-Haszonits (Morgan Stanley Fixed Income, Budapest)
- Andrea Ciliberti (Science et Finance, Paris)
- Marc Mézard (Orsay University)
- Stefan Thurner (Vienna University)
Summary
- The subject of the talk lies at the crossroads of finance, statistical physics, and statistics.
- The main message:
- - portfolio selection is highly unstable: the estimation error diverges for a critical value of the ratio of the portfolio size N and the length of the time series T,
- - this divergence is an algorithmic phase transition that is characterized by universal scaling laws,
- - multivariate regression is equivalent to quadratic optimization, so concepts, methods, and results can be taken over to the regression problem,
- - when applied to complex phenomena, the classical problems with regression (hidden variables, correlations, non-Gaussian noise) are supplemented by the high number of explanatory variables and the scarcity of data,
- - so modelling is often attempted in the vicinity of, or even below, the critical point.
Rational portfolio selection seeks a tradeoff between risk and reward
- In this talk I will focus on equity portfolios.
- Financial reward can be measured in terms of the return (relative gain) or the logarithmic return.
- The characterization of risk is more controversial.
The most obvious choice for a risk measure: the variance
- Its use as a risk measure assumes that the probability distribution of returns is sufficiently concentrated around the average, that there are no large fluctuations.
- This is true in several instances, but we often encounter fat tails: huge deviations with a non-negligible probability, which necessitates the use of alternative risk measures.
Portfolios
- A portfolio is a linear combination (a weighted average) of assets: the portfolio return is r_P = Σ_i w_i r_i,
- with a set of weights w_i that add up to unity (the budget constraint): Σ_i w_i = 1.
- The weights are not necessarily positive (short selling).
- The fact that the weights can be arbitrary means that the region over which we are trying to determine the optimal portfolio is not bounded (a small numerical illustration follows).
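As a minimal illustration (a sketch with hypothetical numbers, not taken from the talk), the snippet below builds a small portfolio whose weights add up to one, allows a negative weight (a short position), and evaluates the portfolio return and variance:

```python
import numpy as np

# Hypothetical example: 4 assets, expected returns and a toy covariance matrix
mu = np.array([0.05, 0.03, 0.07, 0.02])      # expected returns
sigma = np.diag([0.04, 0.02, 0.09, 0.01])    # covariance matrix (diagonal for simplicity)

# Weights may be negative (short selling); they must satisfy the budget constraint
w = np.array([0.6, 0.5, 0.2, -0.3])
assert np.isclose(w.sum(), 1.0), "budget constraint: weights must add up to unity"

portfolio_return = w @ mu                    # r_P = sum_i w_i r_i
portfolio_variance = w @ sigma @ w           # sigma_P^2 = w' Sigma w
print(portfolio_return, portfolio_variance)
```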
Markowitz portfolio selection theory
- The tradeoff between risk and reward is realized by minimizing the variance over the weights, given the expected return, the budget constraint, and possibly other constraints.
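A minimal sketch of the variance-minimizing step, assuming only the budget constraint is imposed (the global minimum variance portfolio); with an additional expected-return constraint the Lagrangian solution involves two multipliers instead of one:

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum variance portfolio: minimize w' C w subject to sum(w) = 1.
    The Lagrangian solution is w* = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)   # C^{-1} 1
    return x / (ones @ x)

# Toy covariance matrix (hypothetical numbers)
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = min_variance_weights(cov)
print(w, w.sum())                    # the weights add up to 1
```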
How do we know the returns and the covariances?
- In principle, from observations on the market.
- If the portfolio contains N assets, we need O(N²) data.
- The input data come from T observations for N assets.
- The estimation error is negligible as long as NT >> N², i.e. N << T.
- This condition is often violated in practice.
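A sketch of the estimation step this refers to: the sample covariance matrix is built from a T x N return matrix, and when T < N it is rank deficient, so it cannot even be inverted and the optimization is no longer well posed. (The particular estimator used here is my assumption; the slides do not spell it out.)

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_covariance(returns):
    """Sample covariance of a T x N matrix of observed returns."""
    return np.cov(returns, rowvar=False)

N = 100
for T in (500, 50):                        # T >> N versus T < N
    returns = rng.standard_normal((T, N))  # iid normal returns, unit true covariance
    C = sample_covariance(returns)
    print(f"T={T}: rank of the N x N sample covariance = {np.linalg.matrix_rank(C)}")
# For T < N the sample covariance matrix is singular.
```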
Information deficit
- Thus the Markowitz problem suffers from the curse of dimensions, or from information deficit.
- The estimates will contain error and the resulting portfolios will be suboptimal.
Fighting the curse of dimensions
- Economists have been struggling with this problem for ages. Since the root of the problem is lack of sufficient information, the remedy is to inject external information into the estimate. This means imposing some structure on the covariance matrix. This introduces bias, but the beneficial effect of noise reduction may compensate for this.
- Examples:
- - single-factor models (βs)
- - multi-factor models
- - grouping by sectors
- - principal component analysis
- - Bayesian shrinkage estimators, etc.
- - random matrix theory
- All these help to various degrees. Most studies are based on empirical data. (A shrinkage example is sketched after this list.)
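As one hedged illustration of these ideas, a simple linear shrinkage estimator pulls the noisy sample covariance towards a highly structured target (here the identity scaled by the average variance); the shrinkage intensity `alpha` is a hypothetical fixed number in this sketch, whereas in practice it would be chosen by a data-driven rule such as Ledoit-Wolf:

```python
import numpy as np

def shrink_covariance(sample_cov, alpha=0.5):
    """Linear shrinkage: C_shrunk = (1 - alpha) * C_sample + alpha * target,
    where the target is the identity scaled by the mean variance."""
    n = sample_cov.shape[0]
    target = np.trace(sample_cov) / n * np.eye(n)
    return (1.0 - alpha) * sample_cov + alpha * target

rng = np.random.default_rng(1)
returns = rng.standard_normal((50, 100))   # T = 50 < N = 100: singular sample covariance
C = np.cov(returns, rowvar=False)
C_shrunk = shrink_covariance(C, alpha=0.5)
print(np.linalg.matrix_rank(C), np.linalg.matrix_rank(C_shrunk))  # shrinkage restores full rank
```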
Our approach
- Analytical: applying the methods of statistical physics (random matrix theory, phase transition theory, replicas, etc.).
- Numerical: to test the noise sensitivity of various risk measures we use simulated data.
- The rationale is that in order to be able to compare the sensitivity of various risk measures to noise, we had better get rid of other sources of uncertainty, like non-stationarity. This can be achieved by using artificial data where we have total control over the underlying stochastic process.
- For simplicity, we mostly use iid normal variables in the following.
- For such simple underlying processes the exact risk measure can be calculated.
- To construct the empirical risk measure, we generate long time series and cut out segments of length T from them, as if making observations on the market.
- From these observations we construct the empirical risk measure and optimize our portfolio under it.
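A minimal version of this simulation protocol, assuming the risk measure is the variance and the underlying process is iid normal with unit covariance (so the exact optimal portfolio is the equal-weight one); q0 compares the true risk of the empirically optimized portfolio with the true risk of the exact optimum:

```python
import numpy as np

rng = np.random.default_rng(42)

def min_variance_weights(cov):
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / (ones @ x)

N, T = 100, 500
true_cov = np.eye(N)                   # iid normal variables
w_exact = np.ones(N) / N               # exact optimum for the identity covariance

returns = rng.standard_normal((T, N))  # "observations on the market"
w_emp = min_variance_weights(np.cov(returns, rowvar=False))

# q0: true risk of the empirically optimal portfolio / true risk of the exact optimum
q0 = np.sqrt((w_emp @ true_cov @ w_emp) / (w_exact @ true_cov @ w_exact))
print(q0)                              # fluctuates around roughly 1/sqrt(1 - N/T)
```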
The ratio q0 of the empirical and the exact risk measure is a measure of the estimation error due to noise.
- The relative error of the optimal portfolio is a random variable, fluctuating from sample to sample.
- The weights of the optimal portfolio also fluctuate.
The distribution of q0 over the samples
Critical behaviour for N, T large, with N/T fixed
- The average of q0 as a function of N/T can be calculated from random matrix theory; it diverges at the critical point N/T = 1.
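For the variance as risk measure and iid variables, the result being referred to is, to the best of my knowledge (the slide itself only states the divergence), the Pafka-Kondor form:

```latex
\langle q_0 \rangle = \frac{1}{\sqrt{1 - N/T}}\,,
\qquad
\langle q_0 \rangle \to \infty \quad \text{as} \quad N/T \to 1^{-}
```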
The standard deviation of the estimation error diverges even more strongly than the average.
Instability of the weights
- The weights of a portfolio of N = 100 iid normal variables for a given sample, T = 500.
The distribution of weights in a given sample
- The optimization hardly determines the weights even far from the critical point!
- The standard deviation of the weights relative to their exact average value also diverges at the critical point.
If short selling is banned
- If the weights are constrained to be positive, the instability will manifest itself by more and more weights becoming zero: the portfolio spontaneously reduces its size! (See the numerical sketch after this list.)
- Explanation: the solution would like to run away, the constraints prevent it from doing so, therefore it will stick to the walls.
- Similar effects are observed if we impose any other linear constraints, like limits on sectors, etc.
- It is clear that in these cases the solution is determined more by the constraints (and the experts who impose them) than by the objective function.
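A hedged sketch of this effect (my own toy reproduction, not the code behind the slides): minimizing the sample variance with non-negative weights typically drives a large fraction of the weights to zero when N/T is not small.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N, T = 50, 100                           # N/T = 0.5, well inside the noisy regime
returns = rng.standard_normal((T, N))
C = np.cov(returns, rowvar=False)

# Minimize w' C w subject to sum(w) = 1 and w >= 0 (short selling banned)
res = minimize(lambda w: w @ C @ w,
               x0=np.ones(N) / N,
               method="SLSQP",
               bounds=[(0.0, None)] * N,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])

n_zero = np.sum(res.x < 1e-6)
print(f"{n_zero} of the {N} weights are (numerically) zero")
```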
If the variables are not iid
- Experimenting with various market models (one-factor, market plus sectors, positive and negative covariances, etc.) shows that the main conclusion does not change: a manifestation of universality. (A one-factor data generator is sketched after this list.)
- Overwhelmingly positive correlations tend to enhance the instability, negative ones decrease it, but they do not change the power of the divergence, only its prefactor.
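For example, a one-factor ("market") model of the kind mentioned here can be simulated as follows; the factor loadings and noise level are hypothetical choices, used only to produce correlated, non-iid returns for the same experiment:

```python
import numpy as np

rng = np.random.default_rng(7)

def one_factor_returns(T, N, betas, idio_std=1.0):
    """r_it = beta_i * f_t + eps_it: a common market factor plus idiosyncratic noise."""
    f = rng.standard_normal(T)                    # market factor
    eps = idio_std * rng.standard_normal((T, N))  # idiosyncratic terms
    return f[:, None] * betas[None, :] + eps

N, T = 100, 500
betas = rng.uniform(0.5, 1.5, size=N)             # hypothetical factor loadings
returns = one_factor_returns(T, N, betas)
C = np.cov(returns, rowvar=False)
print(C.shape, C[0, 1])                           # mostly positive off-diagonal covariances
```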
After filtering the noise is much reduced, and we can even penetrate into the region below the critical point, T < N. BUT the weights remain extremely unstable even after filtering.
Similar studies under alternative risk measures: mean absolute deviation, expected shortfall and maximal loss
- These lead to similar conclusions, except that the effect of estimation error is even more serious.
- In addition, no convincing filtering methods exist for these measures.
- In the case of coherent measures the existence of a solution becomes a probabilistic issue, depending on the sample.
- Calculation of this probability leads to some intriguing problems in random geometry that can be solved by the replica method.
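For reference, a minimal historical estimator of one of these alternative measures, expected shortfall (the average loss beyond the value-at-risk quantile); the 95% level and the sign convention are my assumptions, not specified in the slides:

```python
import numpy as np

def expected_shortfall(portfolio_returns, level=0.95):
    """Historical expected shortfall: mean loss in the worst (1 - level) fraction of outcomes."""
    losses = -np.asarray(portfolio_returns)  # losses are negative returns
    var = np.quantile(losses, level)         # value at risk at the given level
    return losses[losses >= var].mean()

rng = np.random.default_rng(11)
returns = rng.standard_normal(10_000) * 0.02  # hypothetical daily portfolio returns
print(expected_shortfall(returns, level=0.95))
```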
A wider context
- The critical phenomena we observe in portfolio selection are analogous to the phase transitions discovered recently in some hard computational problems; they represent a new, random Gaussian universality class within this family, where a number of modes go soft in rapid succession as one approaches the critical point.
- Filtering corresponds to discarding these soft modes.
- The appearance of powerful tools borrowed from statistical physics (random matrices, phase transition concepts, scaling, universality, replicas) is an important development that enriches finance theory.
More generally
- The sampling error catastrophe, due to lack of sufficient information, appears in a much wider set of problems than just the problem of investment decisions (multivariate regression, stochastic linear programming and all their applications).
- Whenever a phenomenon is influenced by a large number of factors, but we have a limited amount of information about this dependence, we have to expect that the estimation error will diverge and fluctuations over the samples will be huge.
Optimization and statistical mechanics
- Any convex optimization problem can be transformed into a problem in statistical mechanics, by promoting the cost (objective, target) function into a Hamiltonian and introducing a fictitious temperature. At the end we can recover the original problem in the limit of zero temperature.
- Averaging over the time series segments (samples) is similar to what is called quenched averaging in the statistical physics of random systems: one has to average the logarithm of the partition function (i.e. the cumulant generating function).
- Averaging can then be performed by the replica trick (spelled out below).
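In standard notation (my own reconstruction, not the slide's exact formulas), the correspondence reads:

```latex
% Cost function H(w) promoted to a Hamiltonian at fictitious inverse temperature beta
Z(\beta) = \int \! dw \; e^{-\beta H(w)},
\qquad
\min_w H(w) = -\lim_{\beta \to \infty} \frac{1}{\beta} \ln Z(\beta)

% Quenched average over samples, performed with the replica trick
\overline{\ln Z} = \lim_{n \to 0} \frac{\overline{Z^{\,n}} - 1}{n}
```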
Portfolio optimization and linear regression
Linear regression
Equivalence of the two
Translation
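The slides at this point showed the formulas; as a hedged reconstruction of the dictionary being referred to (standard, but not copied from the slides): in multivariate regression one minimizes the residual variance over the coefficients, which has the same quadratic structure as minimizing the portfolio variance over the weights. Residuals correspond to portfolio fluctuations, regression coefficients to portfolio weights, explanatory variables to asset returns, and the sample of length T to the time series of length T.

```latex
% Multivariate linear regression: minimize the residual variance over the coefficients b
\min_b \; \mathbb{E}\big[(y - \textstyle\sum_i b_i x_i)^2\big]
\;\;\longleftrightarrow\;\;
% Portfolio selection: minimize the portfolio variance over the weights w
\min_w \; \sum_{ij} w_i \,\sigma_{ij}\, w_j
\quad \text{s.t.} \quad \sum_i w_i = 1
```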
Minimizing the residual error for an infinitely large sample
Minimizing the residual error for a sample of length T
The relative error
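As a numerical counterpart (again my own sketch, assuming iid Gaussian explanatory variables and noise): fitting a linear regression with N coefficients on T samples and comparing the prediction error of the fitted coefficients to the irreducible noise shows the same blow-up as the ratio N/T approaches one.

```python
import numpy as np

rng = np.random.default_rng(5)

def relative_regression_error(N, T, n_trials=200, noise=1.0):
    """Average out-of-sample prediction error of OLS, relative to the irreducible noise level."""
    ratios = []
    for _ in range(n_trials):
        b_true = rng.standard_normal(N)
        X = rng.standard_normal((T, N))
        y = X @ b_true + noise * rng.standard_normal(T)
        b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        # For fresh Gaussian inputs the expected squared error is noise^2 + |b_hat - b_true|^2
        ratios.append((noise**2 + np.sum((b_hat - b_true) ** 2)) / noise**2)
    return np.mean(ratios)

T = 100
for N in (10, 50, 90, 99):
    print(N / T, relative_regression_error(N, T))  # grows rapidly as N/T approaches 1
```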
Summary
- If we do not have sufficient information we cannot make an intelligent decision, nor can we build a good model; so far this is a triviality.
- The important message here is that there is a critical point in both the optimization problem and the regression problem where the error diverges, and its behaviour is subject to universal scaling laws.
A few remarks on modeling complex systems
- Normally, one is supposed to work in the N << T limit, i.e. with low dimensional problems and plenty of data.
- Modern portfolio management (e.g. in hedge funds) forces us to consider very large portfolios, but the amount of input information is always limited. So we have N ~ T, or even N > T.
- Complex systems are very high dimensional and irreducible (incompressible); they require a large number of explanatory variables for their faithful representation.
- The dimensionality of the minimal model providing an acceptable representation of a system can be regarded as a measure of the complexity of the system. (Cf. the Kolmogorov-Chaitin measure of the complexity of a string; also Jorge Luis Borges' map.)
- Therefore, we have to face the unconventional situation also in the regression problem that N ~ T, or N > T, and then the error in the regression coefficients will be large.
- If the number of explanatory variables is very large and they are all of the same order of magnitude, then there is no structure in the system, it is just noise (like a completely random string). So we have to assume that some of the variables have a larger weight than others, but we do not have a natural cutoff beyond which it would be safe to forget about the higher order variables. This leads us to the assumption that the regression coefficients must have a scale free, power law like distribution for complex systems.
- The regression coefficients are proportional to the covariances of the dependent and independent variables. A power law like distribution of the regression coefficients implies the same for the covariances.
- In a physical system this translates into a power law like distribution of the correlations.
- The usual behaviour of correlations in simple systems is not like this: correlations typically fall off exponentially.
- Exceptions: systems at a critical point, or systems with a broken continuous symmetry. Both of these are very special cases, however.
- Correlations in a spin glass decay like a power, without any continuous symmetry!
- The power law like behaviour of correlations is typical in the spin glass phase, not only on average, but for each sample.
- A related phenomenon is what is called chaos in spin glasses.
- The long range correlations and the multiplicity of ground states explain the extreme sensitivity of the ground states: the system reacts to any slight external disturbance, but the statistical properties of the new ground state are the same as before; this is a kind of adaptation or learning process.
- Other complex systems? Adaptation, learning, evolution, self-reflexivity cannot be expected to appear in systems with a translationally invariant and all-ferromagnetic coupling. Some of the characteristic features of spin glasses (competition and cooperation, the existence of many metastable equilibria, sensitivity, long range correlations) seem to be necessary minimal properties of any complex system.
- This also means that we will always face the information deficit catastrophe when we try to build a model for a complex system.
- How can we understand that people (in the social sciences, medical sciences, etc.) are getting away with lousy statistics, even with N > T?
- They are projecting external information into their statistical assessments. (I can draw a well-determined straight line across even a single point, if I know that it must be parallel to another line.)
- Humans do not optimize, but use quick and dirty heuristics. This has an evolutionary meaning: if something looks vaguely like a leopard, one jumps, rather than trying to find the optimal fit of the observed fragments of the picture to a leopard.
- Prior knowledge, the larger picture, deliberate or unconscious bias, etc. are essential features of model building.
- When we have a chance to check this prior knowledge millions of times in carefully designed laboratory experiments, this is a well-justified procedure.
- In several applications (macroeconomics, medical sciences, epidemiology, etc.) there is no way to perform these laboratory checks, and errors may build up as one uncertain piece of knowledge serves as a prior for another uncertain statistical model. This is how we construct myths, ideologies and social theories.
- It is conceivable that theory building, in the sense of constructing a low dimensional model, for social phenomena will prove to be impossible, and the best we will be able to do is to build a life-size computer model of the system, a kind of gigantic SimCity.
- It remains to be seen what we will mean by understanding under those circumstances.