Title: Heavy Tails and Financial Time Series Models
1. Heavy Tails and Financial Time Series Models
- Richard A. Davis
- Columbia University
- www.stat.columbia.edu/rdavis
- Thomas Mikosch
- University of Copenhagen
2. Outline
- Financial time series modeling
- General comments
- Characteristics of financial time series
- Classical extreme value theory
- Extremal types
- Extension to stationary time series
- Extremal index
- Regular variation
- Multivariate case
- Point processes
- Applications
- GARCH and stochastic volatility processes
- Limit behavior of sample correlations
- Wrap-up
3. Financial Time Series Modeling
The 2005 Neyman Lecture, "Dynamic Indeterminism in Science" by Brillinger, contains the following quote from Neyman:
"The essence of dynamic indeterminism in science consists in an effort to invent a hypothetical chance mechanism, called a stochastic model, operating on various clearly defined hypothetical entities, such that the resulting frequencies of various possible outcomes correspond approximately to those actually observed."
Neyman (1960), JASA
4. Financial Time Series Modeling (cont)
- Two strategies for thinking about modeling extremes in time series:
  - Fit a model to the entire data set (e.g., GARCH and SV for financial time series) and study the extreme value behavior associated with the fitted model as truth.
  - Construct and fit models only to the extremes (e.g., observations exceeding a large threshold).
- Do fitted models actually capture the desired characteristics of the data?
- How do we compare fitted (expected) with observed behavior?
- Need a mechanism for measuring extremal dependence.
- Goal of this talk: Focus on strategy 1 and contrast some of the features of GARCH and SV models as they relate to extremes, including
  - Regular variation of finite-dimensional distributions
  - Extreme value behavior
  - Sample ACF behavior
5. Financial Time Series Modeling
One possible goal: Develop models that capture essential features of financial data.
Strategy: Formulate families of models that at least exhibit these key characteristics (e.g., GARCH and SV).
Linkage with goal: Do fitted models actually capture the desired characteristics of the real data?
Answer with respect to GARCH and SV models: Yes and no. The answer may depend on the features.
- Goal of this talk: compare and contrast some of the features of GARCH and SV models, especially as they relate to extremes, i.e.,
  - Regular variation of finite-dimensional distributions
  - Extreme value behavior
  - Sample ACF behavior
6. Financial Time Series Modeling (cont)
Bonus quote from Brillinger's paper:
"It seems to me that the proper way of approaching economic problems mathematically is by equations of the above type, in finite or infinitesimal differences, with coefficients that are not constants, but random variables; or what is called random or stochastic equations. . . . The theory of random differential and other equations, and the theory of random curves are just starting."
Neyman (1938), JASA
7. Characteristics of financial time series
- Define $X_t = \ln(P_t) - \ln(P_{t-1})$ (log returns).
- Heavy tailed: $P(|X_1| > x)$ is $RV(-\alpha)$, $0 < \alpha < 4$.
- Uncorrelated: the sample autocorrelations $\hat\rho_X(h)$ are near 0 for all lags $h > 0$.
- $|X_t|$ and $X_t^2$ have slowly decaying autocorrelations: $\hat\rho_{|X|}(h)$ and $\hat\rho_{X^2}(h)$ converge to 0 slowly as $h$ increases.
- The process exhibits volatility clustering.
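These stylized facts can be checked on any price series by computing log returns and then the sample ACF of the returns, their absolute values, and their squares. The following is a minimal sketch; the `prices` values are made up for illustration, and `log_returns` and `sample_acf` are hypothetical helper names, not part of the talk.

```python
import math

def log_returns(prices):
    """X_t = ln(P_t) - ln(P_{t-1}) for a sequence of positive prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(h) for h = 0, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for h in range(max_lag + 1):
        ch = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / n
        acf.append(ch / c0)
    return acf

# Hypothetical prices, for illustration only.
prices = [100.0, 101.0, 99.5, 102.0, 101.2, 103.4]
x = log_returns(prices)
acf_x = sample_acf(x, 2)                       # returns
acf_abs = sample_acf([abs(v) for v in x], 2)   # absolute returns
acf_sq = sample_acf([v * v for v in x], 2)     # squared returns
```

On real return data one would compare `acf_x` (expected to be near 0 at positive lags) with `acf_abs` and `acf_sq` (expected to decay slowly).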
8. Example: Pound-Dollar Exchange Rates (Oct 1, 1981 to Jun 28, 1985; Koopman website)
9. Example: Pound-Dollar Exchange Rates, Hill's estimate of $\alpha$ (Hill Horror plots, Resnick)
10. Example: Amazon returns (May 16, 1997 to June 16, 2004)
11. Example: Amazon returns, Hill's estimate of $\alpha$ (Hill Horror plots, Resnick)
12. Simulated Realizations for the Amazon Data
15 realizations from the GARCH model fitted to the Amazon returns data. Which one is the real data?
13. ACF Plots for Amazon
ACF of the squares from the 15 realizations from the GARCH model on the previous slide.
14. Two models for log(returns) (cont)
$X_t = \sigma_t Z_t$ (observation equation in state-space formulation)
(i) GARCH(1,1) (Generalized AutoRegressive Conditional Heteroscedastic; observation-driven specification)
(ii) Stochastic Volatility (parameter-driven specification)
Main question: What intrinsic features in the data (if any) can be used to discriminate between these two models?
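To make the two specifications concrete, here is a minimal simulation sketch. The GARCH(1,1) recursion is the usual one, $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$. The SV recursion below, an AR(1) for $\log \sigma_t^2$, is an assumed form chosen for illustration, not necessarily the one used in the talk. All parameter values and function names are hypothetical, and Gaussian noise is used only for simplicity.

```python
import math
import random

def simulate_garch11(n, a0, a1, b1, rng, burn_in=500):
    """X_t = sigma_t * Z_t with sigma_t^2 = a0 + a1*X_{t-1}^2 + b1*sigma_{t-1}^2."""
    if a1 + b1 >= 1.0:
        raise ValueError("this sketch assumes a1 + b1 < 1")
    sigma2 = a0 / (1.0 - a1 - b1)   # start at the stationary variance
    x_prev = 0.0
    out = []
    for t in range(n + burn_in):
        # Observation-driven: volatility is a function of past observations.
        sigma2 = a0 + a1 * x_prev ** 2 + b1 * sigma2
        x_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x_prev)
    return out

def simulate_sv(n, phi, tau, rng, burn_in=500):
    """X_t = sigma_t * Z_t with h_t = log(sigma_t^2) an AR(1) (assumed form)."""
    h = 0.0
    out = []
    for t in range(n + burn_in):
        # Parameter-driven: volatility has its own noise source.
        h = phi * h + tau * rng.gauss(0.0, 1.0)
        x = math.exp(0.5 * h) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out

rng = random.Random(42)
garch = simulate_garch11(1000, a0=1e-5, a1=0.1, b1=0.8, rng=rng)
sv = simulate_sv(1000, phi=0.95, tau=0.2, rng=rng)
```

The structural difference is visible in the loops: the GARCH volatility is updated from the previous observation, while the SV volatility evolves independently of the observations.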
15. Classical EVT: Extremal Types Theorem
- Setup
  - $X_t \sim \mathrm{IID}(F)$
  - $M_n = \max\{X_1, \ldots, X_n\}$
- Convergence of types: taking $u_n = a_n x + b_n$, $a_n > 0$,
  $$P(a_n^{-1}(M_n - b_n) \le x) = F^n(a_n x + b_n) \to G(x)$$
  if and only if
  $$n(1 - F(a_n x + b_n)) \to -\log G(x).$$
- Theorem. If $G$ is a nondegenerate distribution, then $G$ has to be one of the three types:
  - $G(x) = \exp(-e^{-x})$ (Gumbel)
  - $G(x) = \exp(-x^{-\alpha})$, $x \ge 0$ (Fréchet)
  - $G(x) = \exp(-(-x)^{\alpha})$, $x \le 0$ (Weibull)
16. Classical EVT: Domains of Attraction
Domains of attraction: There are necessary and sufficient conditions for $F \in D(G)$ for each of the three extreme value distributions. The heavy-tailed Fréchet, which is perhaps the most commonly used extreme value distribution, has the easiest necessary and sufficient condition to state (and check!). In this case, $F \in D(\exp(-x^{-\alpha}))$ if and only if $\bar F = 1 - F$ is $RV(-\alpha)$ for some $\alpha > 0$.
Regular variation: $\bar F$ is $RV(-\alpha)$ if and only if $\bar F(tx)/\bar F(t) \to x^{-\alpha}$ as $t \to \infty$ for every $x > 0$.
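For a concrete member of the Fréchet domain of attraction, take the exact Pareto tail $P(Z > z) = z^{-\alpha}$, $z \ge 1$ (an illustrative choice). With $a_n = n^{1/\alpha}$ and $b_n = 0$, the distribution of the normalized maximum is available in closed form, $F^n(a_n x) = (1 - x^{-\alpha}/n)^n$, so the convergence to $\exp(-x^{-\alpha})$ can be checked numerically. The function names below are hypothetical.

```python
import math

def pareto_max_cdf(x, n, alpha):
    """Exact P(n^{-1/alpha} * M_n <= x) for iid Pareto with P(Z > z) = z^{-alpha}, z >= 1."""
    u = x * n ** (1.0 / alpha)
    if u < 1.0:
        return 0.0
    return (1.0 - u ** (-alpha)) ** n

def frechet_cdf(x, alpha):
    """Frechet limit G(x) = exp(-x^{-alpha}) for x > 0."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

alpha, x = 3.0, 1.5
for n in (10, 100, 10000):
    # The first value approaches the second as n grows.
    print(n, pareto_max_cdf(x, n, alpha), frechet_cdf(x, alpha))
```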
17. Extension to Stationary Time Series
Let $(X_t)$ be a strictly stationary sequence with common df $F \in D(G)$, i.e., $F^n(a_n x + b_n) \to G(x)$.
Theorem: If $(X_t)$ satisfies a mixing condition (like strong mixing) and $P(a_n^{-1}(M_n - b_n) \le x) \to H(x)$, $H$ nondegenerate, then there exists a $\theta \in (0, 1]$ such that $H(x) = G^{\theta}(x)$.
The parameter $\theta$ is called the extremal index and is a measure of extremal clustering.
18. Extension to Stationary Time Series: Extremal Index
- $F^n(a_n x + b_n) \to G(x)$ and $P(a_n^{-1}(M_n - b_n) \le x) \to G^{\theta}(x)$.
- Properties
  - $\theta < 1$ implies clustering of exceedances.
  - $1/\theta$ is the mean cluster size of exceedances.
  - In a certain sense, one can view $\theta$ as a measure of statistical efficiency relative to the iid case. That is, one needs $1/\theta$ times as many observations to match the behavior of the iid case. Specifically,
    - $P(M_{n/\theta} \le x) \approx F^n(x)$
    - Suppose $c$ is a threshold such that $F^n(c) = .95$ and $\theta = .5$. Then $P(M_n \le c) \approx .95^{1/2} = .975$.
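19. Extension to Stationary Time Series: Example
Example (max-moving average): Let $(Z_t)$ be iid with a Pareto distribution, i.e., $P(Z_1 > x) = x^{-\alpha}$ for $x \ge 1$, and set
$$X_t = \max(Z_t, \phi Z_{t-1}), \quad \phi \in [0, 1].$$
Then $nP(X_1 > x n^{1/\alpha}) \to (1 + \phi^{\alpha}) x^{-\alpha}$ and $F^n(a_n x) \to \exp(-(1 + \phi^{\alpha}) x^{-\alpha})$, with $a_n = n^{1/\alpha}$. On the other hand,
$$P(n^{-1/\alpha} M_n \le x) \approx P(n^{-1/\alpha} \max(Z_0, \ldots, Z_n) \le x) \to \exp(-x^{-\alpha}).$$
Thus $\theta = 1/(1 + \phi^{\alpha})$.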
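The max-moving average example can be checked by Monte Carlo: the distribution of the normalized maximum should be close to $\exp(-x^{-\alpha})$ rather than to the iid-style limit $\exp(-(1+\phi^{\alpha})x^{-\alpha})$. The sketch below uses illustrative choices that are not from the talk ($n = 500$, 1000 replications, $\alpha = 3$, $\phi = 1$, $x = 1$, a fixed seed) and hypothetical function names.

```python
import math
import random

def simulate_max_ma(n, alpha, phi, rng):
    """X_t = max(Z_t, phi * Z_{t-1}), iid Pareto Z_t with P(Z > z) = z^{-alpha}, z >= 1."""
    # Inverse-CDF sampling: U^{-1/alpha} is Pareto(alpha) for U uniform on (0, 1].
    z = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n + 1)]
    return [max(z[t], phi * z[t - 1]) for t in range(1, n + 1)]

def prob_max_below(x, n, alpha, phi, reps, rng):
    """Monte Carlo estimate of P(n^{-1/alpha} * M_n <= x)."""
    level = x * n ** (1.0 / alpha)
    hits = sum(max(simulate_max_ma(n, alpha, phi, rng)) <= level for _ in range(reps))
    return hits / reps

rng = random.Random(0)
alpha, phi, x = 3.0, 1.0, 1.0
p_hat = prob_max_below(x, 500, alpha, phi, 1000, rng)

theta = 1.0 / (1.0 + phi ** alpha)                           # extremal index
limit_dependent = math.exp(-x ** (-alpha))                   # G^theta(x)
limit_iid = math.exp(-(1.0 + phi ** alpha) * x ** (-alpha))  # G(x)
```

Here `p_hat` should be near `limit_dependent` and clearly above `limit_iid`, and `limit_iid ** theta` reproduces `limit_dependent`.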
20. Extension to Stationary Time Series: Example
Figure panels: iid (Pareto, $\alpha = 3$), $\theta = 1$; max-moving average ($\phi = 1$), $\theta = 1/2$.
Note that cluster size is exactly 2 in the max-moving average case.
21. Extension to Stationary Time Series: Mixing Conditions
- Strong mixing: $(X_t)$ is strongly mixing with rate function $(\alpha_k)$ if
  $$\sup_{A \in \sigma(X_s,\, s \le 0),\; B \in \sigma(X_s,\, s \ge k)} |P(A \cap B) - P(A)P(B)| = \alpha_k \to 0 \quad \text{as } k \to \infty.$$
- Remarks
  - Since mixing is defined via $\sigma$-fields, measurable functions of $(X_t)$ inherit the same mixing property. For example, if the stationary sequence $(X_t)$ is strongly mixing, so are $(|X_t|)$ and $(X_t^2)$, with a rate function of similar order.
  - If $(\alpha_k)$ decays to zero at an exponential rate, $(X_t)$ is strongly mixing with geometric rate, i.e., the memory between past and future dies out exponentially fast.
  - Strong mixing is much stronger than Leadbetter's dependence condition $D(u_n)$.
22. Extension to Stationary Time Series: $D'$
- Anti-clustering condition $D'(u_n)$ (think of $u_n$ as $a_n x + b_n$):
  $$\limsup_{n \to \infty}\, n \sum_{j=2}^{[n/k]} P(X_1 > u_n, X_j > u_n) \to 0 \quad \text{as } k \to \infty.$$
- Theorem: If $(X_t)$ satisfies $D$ and $D'$, and $F \in D(G)$, then $\theta = 1$ (i.e., no clustering).
- Remarks
  - If $(X_t)$ is iid, then the lim sup above is $\limsup_n (n^2/k) P^2(X_1 > u_n) = O(1/k)$.
  - If $(X_t)$ is a stationary Gaussian process with ACF $\rho(h) = o(1/\log h)$, then $D$ and $D'$ hold and there is no clustering.
23. Extension to Stationary Time Series: Example
Figure panels: IID $N(0, 1/(1 - .9^2))$; AR(1) $X_t = .9 X_{t-1} + Z_t$, $(Z_t) \sim$ IID $N(0,1)$.
- Even though $\theta = 1$, there appears to be some clustering for small $n$.
- Hsing, Hüsler, Reiss (1996) overcome this problem for Gaussian processes by considering a triangular array of rvs.
24. Point Process Example: baby steps
In particular, for one-dependent sequences, $P(X_2 > x \mid X_1 > x) \to 1 - \theta = \phi^{\alpha}/(1 + \phi^{\alpha})$.
Point process convergence (max-moving average): With $a_n = n^{1/\alpha}$, $nP(Z_1 > a_n x) \to x^{-\alpha}$ and $nP(X_1 > a_n x) \to (1 + \phi^{\alpha}) x^{-\alpha}$.
Define the sequence of point processes by [...]. From the convergence [...] one can show [...].
25. Point Process Example: baby steps
Applying the continuous mapping theorem (need to be careful), we have [...].
Figure legend: Red: $\Gamma_k^{-1/\alpha}$, $k = 1, \ldots, 5$; Blue: $.75\,\Gamma_k^{-1/\alpha}$, $k = 1, \ldots, 5$.
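26. Regular Variation: univariate case
Def: The random variable $X$ is regularly varying with index $\alpha$ if
$$P(|X| > tx)/P(|X| > t) \to x^{-\alpha} \quad \text{and} \quad P(X > t)/P(|X| > t) \to p,$$
or, equivalently, if
$$P(X > tx)/P(|X| > t) \to p\,x^{-\alpha} \quad \text{and} \quad P(X < -tx)/P(|X| > t) \to q\,x^{-\alpha},$$
where $0 \le p \le 1$ and $p + q = 1$.
Equivalence: $X$ is $RV(-\alpha)$ if and only if $P(X \in t\,\cdot)/P(|X| > t) \to_v \mu(\cdot)$ ($\to_v$: vague convergence of measures on $R \setminus \{0\}$). In this case,
$$\mu(dx) = \left(p\alpha\, x^{-\alpha-1} I(x > 0) + q\alpha\, (-x)^{-\alpha-1} I(x < 0)\right) dx.$$
Note: $\mu(tA) = t^{-\alpha} \mu(A)$ for every $t$ and $A$ bounded away from 0.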
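27. Regular Variation: univariate case
Another formulation (polar coordinates): Define the $\pm 1$ valued rv $\theta$ with $P(\theta = 1) = p$, $P(\theta = -1) = 1 - p = q$. Then $X$ is $RV(-\alpha)$ if and only if
$$\frac{P(|X| > tx,\ X/|X| \in S)}{P(|X| > t)} \to x^{-\alpha} P(\theta \in S) \quad \text{for } S \subseteq \{-1, 1\},$$
or
$$\frac{P(|X| > tx,\ X/|X| \in \cdot)}{P(|X| > t)} \to_v x^{-\alpha} P(\theta \in \cdot)$$
($\to_v$: vague convergence of measures on $S^0 = \{-1, 1\}$).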
28. Regular Variation: multivariate case
- Multivariate regular variation of $X = (X_1, \ldots, X_m)$: There exists a random vector $\theta \in S^{m-1}$ such that
  $$P(|X| > tx,\ X/|X| \in \cdot)/P(|X| > t) \to_v x^{-\alpha} P(\theta \in \cdot)$$
  ($\to_v$: vague convergence on $S^{m-1}$, the unit sphere in $R^m$).
  - $P(\theta \in \cdot)$ is called the spectral measure.
  - $\alpha$ is the index of $X$.
Equivalence: $P(X \in t\,\cdot)/P(|X| > t) \to_v \mu(\cdot)$, where $\mu$ is a measure on $R^m$ which satisfies, for $x > 0$ and $A$ bounded away from 0, $\mu(xA) = x^{-\alpha} \mu(A)$.
29. Regular Variation: multivariate case
Examples
1. If $X_1$ and $X_2$ are iid $RV(-\alpha)$, then $X = (X_1, X_2)$ is multivariate regularly varying with index $\alpha$ and spectral distribution (assuming symmetry) $P(\theta = \pi k/2) = 1/4$, $k = 1, 2, 3, 4$ (mass on axes).
Interpretation: It is unlikely that $X_1$ and $X_2$ are very large at the same time.
Figure: plot of $(X_{t1}, X_{t2})$ for realization of 10,000.
30.
2. If $X_1 = X_2 > 0$, then $X = (X_1, X_2)$ is multivariate regularly varying with index $\alpha$ and spectral distribution $P(\theta = \pi/4) = 1$.
3. AR(1): $X_t = .9 X_{t-1} + Z_t$, $Z_t \sim$ IID $t(3)$: $P(\theta = \pm\arctan(.9)) = .9898$, $P(\theta = \pm\pi/2) = .0102$.
31. Figure: plot of $(X_t, X_{t+1})$ for realization of 10,000. $X_t = .9 X_{t-1} + Z_t$
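32. Estimation of $\alpha$ and $\theta$
The marginal distribution $F$ for heavy-tailed data is often modeled using Pareto-like tails,
$$1 - F(x) = x^{-\alpha} L(x)$$
for $x$ large, where $L(x)$ is a slowly varying function ($L(xt)/L(x) \to 1$ as $x \to \infty$). Now if $X \sim F$, then $P(\log X > x) = P(X > \exp(x)) \approx \exp(-\alpha x)$, and hence $\log X$ has an approximate exponential distribution for large $x$. The spacings $\log X_{(n-j)} - \log X_{(n-j-1)}$, $j = 0, 1, 2, \ldots, m$, of the upper order statistics behave like those from a sample of size $n$ from an exponential distribution: they are approximately independent and $\mathrm{Exp}(\alpha(j+1))$ distributed. This suggests estimating $\alpha^{-1}$ by
$$\hat\alpha^{-1} = \frac{1}{m} \sum_{j=0}^{m-1} \left(\log X_{(n-j)} - \log X_{(n-j-1)}\right)(j+1) = \frac{1}{m} \sum_{j=0}^{m-1} \left(\log X_{(n-j)} - \log X_{(n-m)}\right).$$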
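33. Hill's estimate of $\alpha$
Def: The Hill estimate of $\alpha$ for heavy-tailed data with distribution given by $1 - F(x) = x^{-\alpha} L(x)$ is
$$\hat\alpha = \left( \frac{1}{m} \sum_{j=0}^{m-1} \left(\log X_{(n-j)} - \log X_{(n-m)}\right) \right)^{-1},$$
where $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics. The asymptotic variance of this estimate for $\alpha$ is $\alpha^2/m$ and is estimated by $\hat\alpha^2/m$. (See also GPD = generalized Pareto distribution.)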
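The Hill estimate is short to compute: sort the data, take the $m$ largest values, and average their log-excesses over the $(m+1)$-th largest. The following sketch uses a hypothetical function name and an illustrative exact Pareto sample with $\alpha = 3$; the choice $m = 500$ is arbitrary, and in practice one plots the estimate against $m$ (a Hill plot).

```python
import math
import random

def hill_estimate(data, m):
    """Hill estimate of alpha from the m largest positive observations.

    Returns (alpha_hat, std_err), where std_err = alpha_hat / sqrt(m)
    comes from the asymptotic variance alpha^2 / m.
    """
    x = sorted(v for v in data if v > 0)
    n = len(x)
    if not 0 < m < n:
        raise ValueError("need 0 < m < number of positive observations")
    log_threshold = math.log(x[n - m - 1])   # log X_(n-m)
    inv_alpha = sum(math.log(x[n - 1 - j]) - log_threshold for j in range(m)) / m
    alpha_hat = 1.0 / inv_alpha
    return alpha_hat, alpha_hat / math.sqrt(m)

# Illustrative check on simulated Pareto data with true alpha = 3.
rng = random.Random(1)
sample = [(1.0 - rng.random()) ** (-1.0 / 3.0) for _ in range(5000)]
alpha_hat, std_err = hill_estimate(sample, m=500)
```

For a bivariate series, the same function can be applied to the Euclidean norms of the pairs.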
34. Hill's estimate of $\alpha$
For a bivariate series, we estimate $\alpha$ from the univariate series given by the Euclidean norm of the two components.
35. Hill's estimate of $\alpha$
36. Estimation of the spectral distribution of $\theta$
Based on the relation
$$P(|X| > tx,\ X/|X| \in \cdot)/P(|X| > t) \to_v x^{-\alpha} P(\theta \in \cdot),$$
a naïve estimate of the distribution of $\theta$ is based on the angular components $X_t/|X_t|$ in the sample. One simply uses the empirical distribution of these angular pieces for which the modulus $|X_t|$ exceeds some large threshold. In the examples given below, we use a kernel density estimate of these angular components for those observations whose moduli exceed some large threshold. Here we only consider two components, i.e., $\theta$ is one dimensional.
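A minimal sketch of the naïve estimate for bivariate data: keep the pairs whose Euclidean norm exceeds a high empirical quantile and record their angles. It stops at the raw angles and does not compute the kernel density estimate used in the talk. The function names, the quantile level, the seed, and the simulated data (iid symmetric Pareto components with $\alpha = 2$, for which the spectral mass should sit near the axes) are illustrative assumptions.

```python
import math
import random

def extreme_angles(pairs, quantile=0.95):
    """Angles atan2(y, x) of the pairs whose norm exceeds an empirical quantile of the norms."""
    norms = sorted(math.hypot(x, y) for x, y in pairs)
    threshold = norms[int(quantile * (len(norms) - 1))]
    return [math.atan2(y, x) for x, y in pairs if math.hypot(x, y) > threshold]

def sym_pareto(alpha, rng):
    """Symmetric Pareto draw: random sign times U^{-1/alpha}."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * (1.0 - rng.random()) ** (-1.0 / alpha)

def distance_to_axis(angle):
    """Angular distance to the nearest coordinate axis (multiples of pi/2)."""
    d = abs(angle) % (math.pi / 2.0)
    return min(d, math.pi / 2.0 - d)

rng = random.Random(7)
pairs = [(sym_pareto(2.0, rng), sym_pareto(2.0, rng)) for _ in range(10000)]
angles = extreme_angles(pairs, quantile=0.98)

# Share of extreme points within 0.3 radians of an axis.
share_near_axes = sum(distance_to_axis(a) < 0.3 for a in angles) / len(angles)
```

A histogram or kernel density estimate of `angles` would then be compared with the theoretical spectral distribution.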
37. Estimation of the spectral distribution of $\theta$
38. Estimation of $\theta$
Vertical lines on right are at $\arctan(.9)$ and $\arctan(.9) - \pi$.
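39. Examples of Processes that are Regularly Varying
GARCH(1,1): $X_t = (\alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2)^{1/2} Z_t$, $Z_t \sim$ IID. The index $\alpha$ is found by solving $E|\alpha_1 Z^2 + \beta_1|^{\alpha/2} = 1$.
ARCH(1) case:
$\alpha_1$:  .312  .577  1.00  1.57
$\alpha$:    8.00  4.00  2.00  1.00
Distribution of $\theta$: $P(\theta \in \cdot) = E\left(\|(B,Z)\|^{\alpha} I(\arg((B,Z)) \in \cdot)\right) / E\|(B,Z)\|^{\alpha}$, where $P(B = 1) = P(B = -1) = .5$.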
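In the ARCH(1) case ($\beta_1 = 0$) the tail-index equation reduces to $\alpha_1^{\alpha/2} E|Z|^{\alpha} = 1$. The sketch below assumes standard normal $Z$, for which $E|Z|^{p} = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$; the tabulated values are consistent with that assumption, although the noise distribution is not stated alongside the table. The root is found by bisection on the log of the moment; function names and the bracket are hypothetical choices.

```python
import math

def log_abs_moment_normal(p):
    """log E|Z|^p for Z ~ N(0,1), from E|Z|^p = 2^{p/2} * Gamma((p+1)/2) / sqrt(pi)."""
    return 0.5 * p * math.log(2.0) + math.lgamma(0.5 * (p + 1.0)) - 0.5 * math.log(math.pi)

def arch1_tail_index(a1, lo=1e-3, hi=50.0, tol=1e-10):
    """Solve E|a1 * Z^2|^{alpha/2} = 1 for alpha > 0 by bisection (Gaussian Z assumed)."""
    def g(alpha):
        # log of a1^{alpha/2} * E|Z|^alpha; negative below the root, positive above it
        return 0.5 * alpha * math.log(a1) + log_abs_moment_normal(alpha)
    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("no sign change on [lo, hi]; bracket does not contain the root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for a1 in (0.312, 0.577, 1.00, 1.57):
    # Values come out close to the tabulated alpha = 8, 4, 2, 1.
    print(a1, round(arch1_tail_index(a1), 2))
```

The exact coefficients behind the rounded table entries are $105^{-1/4}$, $3^{-1/2}$, $1$ and $\pi/2$, since $E Z^8 = 105$, $E Z^4 = 3$, $E Z^2 = 1$ and $E|Z| = \sqrt{2/\pi}$.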
40. Examples of Processes that are Regularly Varying
Example of ARCH(1): $\alpha_0 = 1$, $\alpha_1 = 1$, $\alpha = 2$, $X_t = (\alpha_0 + \alpha_1 X_{t-1}^2)^{1/2} Z_t$, $Z_t \sim$ IID.
Figures: plots of $(X_t, X_{t+1})$ and estimated distribution of $\theta$ for realization of 10,000.
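41. Examples of Processes that are Regularly Varying
Example: SV model $X_t = \sigma_t Z_t$
Suppose $Z_t \sim RV(-\alpha)$ and [...]. Then $\mathbf{Z}_n = (Z_1, \ldots, Z_n)$ is regularly varying with index $\alpha$, and so is $\mathbf{X}_n = (X_1, \ldots, X_n) = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)\,\mathbf{Z}_n$, with spectral distribution concentrated on $(\pm 1, 0)$, $(0, \pm 1)$.
Figure: plot of $(X_t, X_{t+1})$ for realization of 10,000.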
42. Point process convergence
Theorem (Davis & Hsing '95, Davis & Mikosch '97). Let $(X_t)$ be a stationary sequence of random $m$-vectors. Suppose
(i) finite dimensional distributions are jointly regularly varying (let $(\theta_{-k}, \ldots, \theta_k)$ be the vector in $S^{(2k+1)m-1}$ in the definition).
(ii) mixing condition $A(a_n)$ or strong mixing.
(iii) [...]
Then [...] (extremal index) exists. If $\theta > 0$, then [...]
43. Point process convergence (cont)
- $(P_i)$ are points of a Poisson process on $(0, \infty)$ with intensity function $\nu(dy) = \theta\alpha\, y^{-\alpha-1}\,dy$.
- [...], $i \ge 1$, are iid point processes with distribution $Q$, and $Q$ is the weak limit of [...].
Remarks:
1. GARCH and SV processes satisfy the conditions of the theorem.
2. Limit distribution for sample extremes and sample ACF follows from this theorem.
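44. Extremes for GARCH and SV processes
- Setup
  - $X_t = \sigma_t Z_t$, $(Z_t) \sim \mathrm{IID}(0,1)$
  - $X_t$ is $RV(-\alpha)$
  - Choose $(a_n)$ s.t. $nP(X_t > a_n) \to 1$
- Then $F^n(a_n x) \to \exp(-x^{-\alpha})$.
Then, with $M_n = \max\{X_1, \ldots, X_n\}$,
(i) GARCH: $P(a_n^{-1} M_n \le x) \to \exp(-\gamma x^{-\alpha})$, where $\gamma$ is the extremal index ($0 < \gamma < 1$).
(ii) SV model: $P(a_n^{-1} M_n \le x) \to \exp(-x^{-\alpha})$, so the extremal index is $\gamma = 1$: no clustering.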
45. Extremes for GARCH and SV processes (cont)
Absolute values of ARCH
46. Extremes for GARCH and SV processes (cont)
Absolute values of SV process
47. Summary of results for ACF of GARCH(p,q) and SV models
GARCH(p,q): cases $\alpha \in (0,2)$, $\alpha \in (2,4)$, $\alpha \in (4,\infty)$.
Remark: Similar results hold for the sample ACF based on $|X_t|$ and $X_t^2$.
48. Summary of results for ACF of GARCH(p,q) and SV models (cont)
SV model: cases $\alpha \in (0,2)$, $\alpha \in (2,\infty)$.
49. Sample ACF for GARCH and SV Models (1000 reps)
50. Sample ACF for Squares of GARCH (1000 reps)
(a) GARCH(1,1) Model, n = 10000
51. Sample ACF for Squares of SV (1000 reps)
52. Example: Amazon returns (May 16, 1997 to June 16, 2004)
53. Amazon returns (GARCH model)
GARCH(1,1) model fit to Amazon returns: $\alpha_0 = .00002493$, $\alpha_1 = .0385$, $\beta_1 = .957$, $X_t = (\alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2)^{1/2} Z_t$, $Z_t \sim$ IID $t(3.672)$.
Simulation from GARCH(1,1) model
54. Amazon returns (SV model)
Stochastic volatility model fit to Amazon returns; simulation based on fitted model.
55. Application to Crystal River
River flow rate for Crystal River, located in the mountains of western Colorado (see Cooley et al. (2007)). After deseasonalizing the data, we obtain 728 weekly observations from Oct 1, 1990 to Oct 1, 2005.
56. Application to Crystal River
Estimates of $\alpha$ and the distribution of $\theta$ for bivariate pairs $(X_{t-1}, X_t)$
Vertical lines at $\pi/4$ and $\pi/4 - \pi$
57. The Extremogram
The extremogram of a stationary time series $(X_t)$ can be viewed as the analogue of the correlogram for measuring dependence in extremes (see Davis and Mikosch (2008)).
Definition: For two sets $A$ and $B$ bounded away from 0, the extremogram is defined as
$$\rho_{A,B}(h) = \lim_{n \to \infty} P(a_n^{-1}X_0 \in A,\ a_n^{-1}X_h \in B) / P(a_n^{-1}X_0 \in A).$$
In many examples, this can be computed explicitly. If one takes $A = B = (1, \infty)$, then
$$\rho_{A,B}(h) = \lim_{x \to \infty} P(X_h > x \mid X_0 > x) = \lambda(X_0, X_h),$$
often called the extremal dependence coefficient ($\lambda = 0$ means independence or asymptotic independence).
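58. The Extremogram
The extremogram is estimated via the empirical extremogram defined by
$$\hat\rho_{A,B}(h) = \frac{\frac{m}{n}\sum_{t=1}^{n-h} I(a_m^{-1}X_t \in A,\ a_m^{-1}X_{t+h} \in B)}{\frac{m}{n}\sum_{t=1}^{n} I(a_m^{-1}X_t \in A)},$$
where $m \to \infty$ with $m/n \to 0$. Note that the limit of the expectation of the numerator is $mP(a_m^{-1}X_0 \in A,\ a_m^{-1}X_h \in B) \to \mu(A \times B)$, where $\mu$ is the measure defined in the statement of regular variation. Hence the empirical estimate is asymptotically unbiased. Under suitable mixing conditions, a CLT for the empirical estimate is established in Davis and Mikosch (2008).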
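For $A = B = (1, \infty)$ the empirical extremogram reduces to a ratio of counts: the number of times both $X_t$ and $X_{t+h}$ exceed a high level, divided by the number of times $X_t$ does. In the sketch below the normalizing level $a_m$ is replaced by an empirical quantile of the data, which is an assumption made for illustration; the function name and the toy series are likewise hypothetical.

```python
def empirical_extremogram(x, max_lag, quantile=0.95):
    """Empirical extremogram for A = B = (1, inf), lags 0..max_lag.

    The level a_m is taken to be an empirical quantile of x (an assumed choice).
    """
    n = len(x)
    a_m = sorted(x)[int(quantile * (n - 1))]
    exceed = [v > a_m for v in x]
    denom = sum(exceed)
    if denom == 0:
        raise ValueError("no exceedances of the chosen level")
    return [
        sum(exceed[t] and exceed[t + h] for t in range(n - h)) / denom
        for h in range(max_lag + 1)
    ]

# Toy series: exceedances occur at t = 2, 3 (a cluster of size 2) and t = 7.
toy = [0, 0, 5, 6, 0, 0, 0, 7, 0, 0]
rho = empirical_extremogram(toy, max_lag=2, quantile=0.7)
```

In the toy series, one of the three exceedances is followed by another exceedance at lag 1, so `rho[1]` equals 1/3, while `rho[0]` is 1 by construction.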
59. Application to Crystal River
Extremogram for Crystal River, $A = B = (1, \infty)$
60. Application to Crystal River
Fit an AR(6) model to the data (this removes all appreciable autocorrelation in the data). Now we estimate the distribution of $\theta$ and the extremogram based on the residuals.
Vertical lines at $-\pi/2$, 0, and $\pi/2$
61. Application to Crystal River
There is still a touch of autocorrelation in the absolute values and squares of the residuals. We remove this by fitting a GARCH model to these residuals. The estimated degrees of freedom for the noise was 3.43.
62. Wrap-up
- Regular variation is a flexible tool for modeling both dependence and tail heaviness.
- Useful for establishing point process convergence of heavy-tailed time series.
- Extremal index $\gamma < 1$ for GARCH and $\gamma = 1$ for SV.
- Sample ACF has faster convergence for SV.