Title: Lszl Mrkus and Pter Elek
1 Nonlinear time series for modelling river flows
and their extremes
- László Márkus and Péter Elek
- Dept. Probability Theory and Statistics
- Eötvös Loránd University
- Budapest, Hungary
2River Tisza and its aquifer
3Tisza produces devastating floods
4(No Transcript)
5Water discharge at Vásárosnamény(We have 5 more
monitoring sites)from1901-2000
6Basic properties of the series
- Series exhibit
- slight upward trend in mean
- strong seasonal component both in mean and
variance - Seasonal components can be removed by a local
polinomial fitting (LOESS) procedure - The distribution of the standardized series are
still periodic. - The distribution is skewed.
7Empirical and smoothed seasonal components
8Bi-monthly distributions
9Autocorrelation function is slowly decaying
10How long does the river remember...?
- Long memory - known ever since Hursts original
work on the Aswan dam (1952) of the Nile River - Short and long memory - autocorrelations die out
at exponential versus hyperbolic rate - Equivalent for long memory - spectral density has
pole at zero, meaning it tends to infinity at
zero at polynomial speed.
11Long memory processes
- Short and long memory - autocorrelations die out
at exponential versus hyperbolic rate - Equivalent for long memory - spectral density has
pole at zero, meaning it tends to infinity at
zero at polynomial speed.
12Indicators of long memory
- Nonparametric statistics
- Rescaled adjusted range or R/S
- Classical
- Los (test)
- Taqqus graphical (robust)
- Variance plot
- Log-periodogram (Geweke-Porter Hudak)
13- Classic R/S biased, lack of distribution theory,
no test or confidence intervals, sensitive for
short mem component if present. - Los modification - convergence to Brownian
Bridge. Test of long memory often accepts short
memory when long is present, but reliably rejects
long mem when only short is present. - Taqqus graphical R/S get out the most of the
data drop blocks from the beginnig, reestimate
R/S from the shorter data, log-log plot all
together and fit a straight line (regression)
14(No Transcript)
15Variance plot
- The variance of the mean tends to zero as n2H-2
- Estimate the variance of means from the
aggregated processes - Fit a straight line on the log-log scale
- Determine the slope from a linear regression
16The log - spectrum
- The order of the pole of the spectral density is
1-2H - On the log-log scaleplot of the periodogram this
means a straight line of steepness 1-2H - The celebrated Geweke - Porter Hudak statistics
uses log(sin2(?/2)) instead of the log of the
frequencies.
17(No Transcript)
18Linear long-memory model fractional
ARIMA-process (Montanari et al., Lago Maggiore,
1997)
- Fractional ARIMA-model
-
- Fitting is done by Whittle-estimator
- based on the empirical and theoretical
periodogram - quite robust consistent and asymptotically
normal for linear processes driven by innovatons
with finite forth moments (Giraitis and
Surgailis, 1990)
19Results of fractional ARIMA fit
- H0.846 (standard error 0.014)
- p-value 0.558 (indicates goodness of fit)
- Innovations can be reconstructed using a linear
filter (the inverse of the filter above)
20Reconstruct the innovation from the fitted model
21The density of the innovations
22Reconstructed innovations are uncorrelated...
23Simulations using i.i.d. innovations
- If we assume that innovations are i.i.d, we can
generate synthetic series - Use resampling to generate synthetic innovations
- Apply then the linear filter
- Add the sesonal components to get a synthetic
streamflow series - But these series do not approximate well the
high quantiles of the original series
24But they fail to catch the densities and
underestimate the high quantiles of the original
series
25Logarithmic linear model
- It is quite common to take logarithm in order to
get closer to the normal distribution - It is indeed the case here as well
- Even the simulated quantiles from a fitted linear
model seem to be almost acceptable
26- But the backtransformed quantiles are clearly
unrealistic
27Lets have a closer look at innovations
- Innovations can be regarded as shocks to the
linear system - Few properties
- Squared and absolute values are autocorrelated
- Skewed and peaked marginal distribution
- There are periods of high and low variance
- All these point to a GARCH-type model
- The classical GARCH is far too heavy tailed to
our purposes
28Simulation from the GARCH-process
- Simulations
- Generate i.i.d. series from the estimated
GARCH-residuals - Then simulate the GARCH(1,1) process using these
residuals - Apply the linear filter and add the seasonalities
- The simulated series are much heavier-tailed than
the original series
29General form of GARCH-models
- Need a model between
- i.i.d.-driven FARIMA-series and
- GARCH(1,1)-driven FARIMA-series
- General form of GARCH-models
30A smooth transition GARCH-model
31ACF of GARCH-residuals
32Results of simulationsat Vásárosnamény
33Back to the original GARCH philosophy
- The above described GARCH model is somewhat
artificial, and hard to find heuristic
explanations for it - why does the conditional variance depend on the
innovations of the linear filter? - in the original GARCH-context the variance is
dependent on the lagged values of the process
itself. - A possible solution condition the variance on
the lagged discharge process instead ! - The fractional integration does not seem to be
necessary - almost the same innovations as from an ARMA(3,1)
- In extreme value theory long memory in linear
models does not make a difference
34Estimated variance of innovations plotted against
the lagged discharge
- Spectacularly linear relationship
- This approves the new modelling attempt
- Distorted at sites with damming along Tisza River
35 The variance is not conditional on the lagged
innovation but it is conditional on the lagged
water discharge.
36- Theoretical problems arise in the new model
- Existence of a stationary solution
- Finiteness of all moments
- Consistence and asymptotic normality of the
quasi max-likelihood estimators - Heuristically clearer explanation can be given
- The discharge is indicative of the saturation of
the watershed - A saturated watershed results in more
straightforward reach for precipitation to the
river, hence an increase in the water supply. - A saturated watershed gives away water quicker.
- The possible changes are greater and so is the
uncertainty for the next discharge value.
37Existence of the stationary solution
- We assume that ct constant
- The model has a unique stationary solution if the
corresponding linear model is stationary (i.e.
all roots of the characteristic equation lie
within the unit circle). - This is in contrast to the usual case with
quadratic ARCH-type innovations. - Moreover, if the m-th moment of Zt is finite then
the same holds for the stationary distribution of
Xt, too.
38Sketch of the proof I.
- The process can be embedded into a
(pq)-dimensional Markov-chain YtAYt-1Et - where Yt(Xt, Xt-1, ...Xt-p1, et, et-1,...,
et-q1) and Et(et, 0, 0, ...). - Yt is aperiodic and irreducible (under some
technical conditions) - The general condition for geometric ergodicity
(Meyn and Tweedie, 1993) there exists a V?1
function with 0lt?lt1, blt? and C compact set, - E( V(Y1) Y0y ) ? (1-?) V(y) b IC(y)
- In other words V is bounded on a compact set and
is a contraction outside it. - Moreover, E?(V(Yt)) is finite, where ? denotes
the stationary distribution.
39Sketch of the proof II.
- In the given case if E(Ztm) is finite
- V(y) 1 QPymm will suffice,
- where BPAP-1 is the real valued block
Jordan-decomposition of A and Q is an
appropriately chosen diagonal matrix with
positive elements. - This also implies the finiteness of the mth
moment of Xt.
40Estimation
- Estimation of the ARMA-filter can be carried out
by least squares. (Essentially only the
uncorrelatedness of innovations is needed for its
consistency. Additionally, finite fourth moment
is needed for asymptotic normality, Francq and
Zakoian, 1998.) - The ARCH-equation is estimated by quasi maximum
likelihood, using the innovations estimated from
the ARMA-filter.
41Estimation of the ARCH-equation in the case of
known innovations I.(along the lines of
Kristensen and Rahbek, 2005)
- Maximising the Gaussian log-likelihood
- we obtain the QML-estimator of ?.
42Consistency of the estimator
- By the ergodic theorem
- It is easy to check that L(a)ltL(a0)where a0
denotes the true parameter value. - All conditions of the usual consistency result
(e.g. Pfanzagl, 1969) are satisfied hence our
estimator is consistent.
43Asymptotic normality I.
- Using the notations
- for the derivatives of the log-likelihood
- for the information matrix
- and for the expected Hessian
44Asymptotic normality II.
- Standard Taylor-expansion implies
- Finiteness of the fourth moment with a martingale
convergence argument yields
45Estimation when ?t is not known
- Now the estimated innovation appears in the
log-likelihood - If the ARMA-parameters are estimated
consistently, the difference - tends to zero. Thus the QML estimator is
consistent.
46Asymptotic normality for estimated innovations
- When the estimated ARMA-parameters are
asymptotically normal, the difference of squared
innovations tends to zero even if normalised by
n1/2 instead of n - This way the differences of the first and second
derivatives both converge to zero - As a result, all the arguments for asymptotic
normality given above remain valid.
47Estimation results
48How to simulate the residuals of the new
GARCH-type model
- Residuals are highly skewed and peaked.
- Simulation
- Use resampling to simulate from the central
quantiles of the distribution - Use Generalized Pareto distribution to simulate
from upper and lower quantiles - Use periodic monthly densities
49The simulation process
resampling and GPD
Zt
smoothed GARCH
?t
FARIMA filter
Xt
Seasonal filter
50Evaluating the model fit
- Independence of residual series
- ACF, extremal clustering
- Fit of probability density and high quantiles
- Variance lagged discharge relationship
- Extremal index
- Consistence of parameter estimates
51ACF of original and squared innovation series
residual series
52Results of new simulationsat Vásárosnamény
53Densities and quantiles at all 6 locations
54Reestimated (from the fitted model)
discharge-variance relationship
55Seasonalities of extremes
- The seasonal appearance of the highest values
(upper 1) of the simulated processes follows
closely the same for the observed one.
56Extremal index to measure the clustering of high
values
- Estimated for the observed and simulated
processes containing all seasonal components - Estimation by block method
- Value of block length changes from 0.1 to 1.
- Value of threshold ranges from 95 to 99.9.
57Estimated extremal indices displayed
58Extremal indices in the threshold GARCH model
59Extremal indices in the discharge dependent GARCH
model
60Level exceedance times
- The distribution of the level exceedance times
may serve as a further goodness of fit measure. - It has an enormous importance as it represents
the time until the dams should stand high water
pressure.
61Flood volume distribution
- The match of the empirical and simulated flood
volume distributions also approve the good fit.
62Multivariate modelling
- Final aim to model the runoff processes
simultaneously - Nonlinear interdependence and non-Gaussianity
should be addressed here, too - First, the probabilistic model for the joint
distribution of discharges of rivers entering
Hungary should be elaborated - Differential equation-oriented models of
conventional hydrology may then be used to
describe downstream evolution of runoffs - Now we concentrate on joint modelling of two
rivers Tisza (at Tivadar) and Szamos (at
Csenger)
63Issues of joint modelling
- Measures of linear interdependences (the
cross-correlations) are likely to be
insufficient. - High runoffs appear to be more synchronized on
the two rivers than small ones - The reason may be the common generating weather
patterns for high flows - This requires a non-conventional analysis of the
dependence structure of the observed series
64Basic statistics of Tivadar (Tisza) and Csenger
(Szamos)
- The model described previously was applied to
both rivers - Correlations between the series of raw values,
innovations and residuals are highest when either
series at Tivadar are lagged by one day - Correlations
- Raw discharges 0.79
- Deseasonalized data 0.77
- Innovations 0.40
- Residuals 0.48
- Conditional variances 0.84
65Displaying the nature of interdependence
- The joint plot may not be informative because of
the highly non-Gaussian distributions - Transform the marginals into uniform
distributions (produce the so-called copula), - then the scatterplot is more informative on the
joint behaviour - The strange behaviour of the copula of the
innovations is characterized by the concentration
of points - 1. at the main diagonal, and especially at the
upper right corner (tail dependence) - 2. at the upper left (and the lower right)
corner(s) - Linearly dependent variables do not display this
type of copula - The GARCH-residuals lack the second type of
irregularity
2
1
66A possible explanation of this type of
interdependence
- The variance process is essentially common for
the two rivers - This is justified by the high correlation (0.84)
- Generate two linearly interdependent residual
series (correlation0.48) - Multiply by the common standard deviation
distributed as Gamma - Observe the obtained copula
- This justifies the hypothesis that the common
variance causes the nonlinear interdependence of
the given type
67Tail dependence
68Conclusion
- Long range dependency does not have much impact
on extremes of river discharges - Nonlinearities are influental and have to be
accounted for - The proposed model has a stationary solution
- Its estimation is consistent and asymptotically
normal - The suggested model catches densities and
quantiles well - The model fitting procedure does not optimise on
quantiles and maxima clustering, therefore their
fit really shows model adequacy - Other fit-independent measures include level
exceedence and flood volume distributions - Something different has to be done with dammed
water
69Possible refinements of the model
- Simulated process is still heavier tailed than
the original one. - Conditional distribution of the residuals depends
on the lagged water discharge - very high residual values occur at low discharge
- Possible solutions
- innovation as a mixture of a GARCH-part and of a
discharge-independent part - Markov-switching model with discharge-dependent
transition probabilities
70Thank you for your attention!