Jery R. Stedinger - PowerPoint PPT Presentation

1
Regionalization of Statistics Describing the
Distribution of Hydrologic Extremes
SAMSI Workshop 23 January 2008
  • Jery R. Stedinger
  • Cornell University
  • Research with G. Tasker, E. Martins, D. Reis, A.
    Gruber, V. Griffis, D.I. Jeong and Y.O. Kim

2
Extreme Value Theory and Hydrology
  • Annual maximum flood may be the daily maximum
    or the instantaneous maximum.
  • Annual maximum 24-hour rainfall may be the daily
    maximum or the maximum of 1440-minute values.
  • Annual maximums are not the maximum of an i.i.d. series:
  • years have definite wet and dry seasons;
  • daily values are correlated;
  • because of El Niño and atmospheric patterns,
    some years are extreme-event prone and others are not.
  • Peaks-over-threshold (PDS) series are another alternative.

3
Outline
  • Summarizing Data: Moments and L-moments
  • Parameter estimation for GEV
  • Use of a prior on $\kappa$
  • PDS versus AMS with GMLEs
  • Bayesian GLS Regression for regionalization
  • Concluding observations

4
Outline
  • Summarizing Data: Moments and L-moments
  • Parameter estimation for GEV
  • Use of a prior on $\kappa$
  • PDS versus AMS with GMLEs
  • Bayesian GLS Regression for regionalization
  • Concluding observations

5
Definitions: Product-Moments
  • Mean, measure of location
  • $\mu_X = E[X]$
  • Variance, measure of spread
  • $\sigma_X^2 = E[(X - \mu_X)^2]$
  • Coef. of Skewness, asymmetry
  • $\gamma_X = E[(X - \mu_X)^3] / \sigma_X^3$

6
Conventional Moment Ratios
  • Conventional descriptions of shape are
  • Coefficient of Variation, $CV = \sigma / \mu$
  • Coefficient of skewness, $\gamma = E[(X - \mu)^3] / \sigma^3$
  • Coefficient of kurtosis, $\kappa = E[(X - \mu)^4] / \sigma^4$
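
A minimal Python sketch of sample versions of these ratios. It assumes the simple plug-in estimators that divide by n (no small-sample bias corrections), and the function name `product_moment_ratios` is illustrative rather than taken from the slides.

```python
import math

def product_moment_ratios(x):
    """Return (mean, CV, skewness, kurtosis) from plug-in sample moments.

    Assumption: central moments are divided by n, so the estimates are
    biased in small samples.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    sd = math.sqrt(m2)
    return mean, sd / mean, m3 / sd ** 3, m4 / sd ** 4

mean, cv, skew, kurt = product_moment_ratios([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the data are squared, cubed, and raised to the fourth power, a single large flood can dominate the skewness and kurtosis estimates.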

7
Samples drawn from a Gumbel distribution.
8
L-Moments
  • An alternative to product moments
  • now widely used in hydrology.

9
L-Moments: an alternative
  • L-moments can summarize data as do conventional
    moments using linear combinations of the ordered
    observations.
  • Because L-moments avoid squaring and cubing the
    data, their ratios do not suffer from the severe
    bias problems encountered with product moments.
  • Estimate using order statistics

10
L-Moments: an alternative
  • Let $X_{(i|n)}$ be the ith smallest obs. in a sample of size
    n.
  • Measure of Scale
  • half the expected difference between the largest and smallest
    observations in a sample of 2
  • $\lambda_2 = \frac{1}{2} E[X_{(2|2)} - X_{(1|2)}]$
  • Measure of Asymmetry
  • $\lambda_3 = \frac{1}{3} E[X_{(3|3)} - 2 X_{(2|3)} + X_{(1|3)}]$
  • where $\lambda_3 > 0$ for positively skewed distributions

11
L-Moments: an alternative
  • Measure of Kurtosis
  • $\lambda_4 = \frac{1}{4} E[X_{(4|4)} - 3 X_{(3|4)} + 3 X_{(2|4)} - X_{(1|4)}]$
  • For highly kurtotic distributions, $\lambda_4$ is large.
  • For the uniform distribution, $\lambda_4 = 0$.

12
Dimensionless L-moment ratios
  • L-moment Coefficient of variation (L-CV)
  • $\tau_2 = \lambda_2 / \lambda_1 = \lambda_2 / \mu$
  • L-moment coef. of skew (L-Skewness)
  • $\tau_3 = \lambda_3 / \lambda_2$
  • L-moment coef. of kurtosis (L-Kurtosis)
  • $\tau_4 = \lambda_4 / \lambda_2$
  • (Note: Hosking calls L-CV $\tau$ instead of $\tau_2$.)
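
The slides do not say how the L-moments are estimated from data. One common route, assumed here, is through unbiased probability-weighted moments (PWMs) of the ordered sample; the function name `sample_lmoments` is illustrative. A minimal Python sketch:

```python
def sample_lmoments(x):
    """Return (l1, l2, tau2, tau3, tau4) from unbiased PWM estimators.

    Assumption: the sample has at least 4 observations; the data are
    sorted in ascending order before the weights are applied.
    """
    xs = sorted(x)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(xs):          # i = 0 for the smallest value
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l2 / l1, l3 / l2, l4 / l2

l1, l2, tau2, tau3, tau4 = sample_lmoments([5.0, 1.0, 4.0, 2.0, 3.0])
```

Every estimate is a linear combination of the ordered observations, which is why L-moment ratios avoid the bias that comes from squaring and cubing the data.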

13
Samples drawn from a Gumbel distribution.
14
Samples drawn from a Gumbel distribution.
15
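Generalized Extreme Value (GEV) distribution
  • Gumbel's Type I, II and III Extreme Value distributions
  • $F(x) = \exp\{-[1 - (\kappa/\alpha)(x - \xi)]^{1/\kappa}\}$ for $\kappa \neq 0$
  • $\kappa$ = shape, $\alpha$ = scale, $\xi$ = location.
  • Mostly $-0.3 < \kappa \le 0$
  • Others use the opposite sign convention for the shape parameter ($-\kappa$).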

16
GEV Prob. Density Function
17
GEV Prob. Density Function large x
18
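Simple GEV L-Moment Estimators
  • Using L-moments (Hosking, Wallis and Wood, 1985):
  • $c = 2/(\tau_3 + 3) - \ln(2)/\ln(3)$, with $\tau_3 = \lambda_3 / \lambda_2$
  • then
  • $\kappa = 7.8590\,c + 2.9554\,c^2$, for $-0.5 \le \tau_3 \le 0.5$
  • $\alpha = \kappa \lambda_2 / [\Gamma(1+\kappa)(1 - 2^{-\kappa})]$
  • $\xi = \lambda_1 + \alpha[\Gamma(1+\kappa) - 1] / \kappa$
  • Quantiles
  • $x_p = \xi + (\alpha/\kappa)\{1 - [-\ln(p)]^{\kappa}\}$
  • The method of L-moments is simple and attractive.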
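
The estimators on this slide translate almost line for line into code. The Python sketch below uses the slide's sign convention for the shape parameter; the function names and the Gumbel fallback for a shape of essentially zero are additions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gev_lmoment_fit(l1, l2, t3):
    """Return (xi, alpha, kappa) from lambda_1, lambda_2 and tau_3."""
    c = 2.0 / (t3 + 3.0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c
    if abs(kappa) < 1e-6:
        # Assumed fallback: Gumbel limit of the GEV as kappa -> 0.
        alpha = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * alpha, alpha, 0.0
    g = math.gamma(1.0 + kappa)
    alpha = kappa * l2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = l1 + alpha * (g - 1.0) / kappa
    return xi, alpha, kappa

def gev_quantile(p, xi, alpha, kappa):
    """Quantile x_p with non-exceedance probability p."""
    if abs(kappa) < 1e-6:
        return xi - alpha * math.log(-math.log(p))
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)

# Quantile for xi = 0, alpha = 1, kappa = -0.20 at p = 0.999.
x999 = gev_quantile(0.999, 0.0, 1.0, -0.20)
```

With location 0, scale 1 and shape -0.20, the 0.999 quantile evaluates to about 14.9.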

19
Index Flood Methodology
  • Research has demonstrated potential advantages
    of index flood procedures for combining
    regional and at-site data to improve the
    estimators at individual sites.

20
Hosking and Wallis (1997)
Development of L-moments for regional flood frequency
analysis. Research done in the 1980-1995 period.
J.R.M. Hosking and J.R. Wallis,
Regional Frequency Analysis: An Approach Based
on L-moments, Cambridge University Press, 1997.
21
Compute the regional average L-CV and L-CS, which
yield the regional dimensionless quantile $y_p$
22
Index Flood Methodology
  • Use data from hydrologically "similar" basins to
    estimate a dimensionless flood distribution which
    is scaled by at-site sample mean.
  • "Substitutes Space for Time" by using regional
    information to compensate for relatively short
    records at each site.
  • Most of these studies have used the GEV
    distribution and L-moments or equivalent.

23
Outline
  • Summarizing Data: Moments and L-moments
  • Parameter estimation for GEV
  • Use of a prior on $\kappa$
  • PDS versus AMS with GMLEs
  • Bayesian GLS Regression for regionalization
  • Concluding observations

24
Trouble with MLEs for GEV
Case: N = 15, X ~ GEV($\xi = 0$, $\alpha = 1$, $\kappa = -0.20$)
MLE Solution
  • $x_{0.999}$
  • 14.9 (true)
  • 6,000,000 (est.)

25
Parameter Estimators for 3-parameter GEV
distribution
  • 1. Maximum Likelihood (ML)
  • 2. Method of Moments (MOM)
  • 3. Method of L-moments (LM)
  • 4. Generalized Maximum Likelihood (GML)
  • GML introduces a prior distribution for $\kappa$ that
    keeps the estimator
  • within (-0.5, 0.5), and encourages values
    within (-0.3, 0.1)
  • Martins, E.S., and J.R. Stedinger, Generalized
    Maximum Likelihood GEV quantile estimators for
    hydrologic data, Water Resour. Res., 36(3),
    737-744, 2000.
  • Or can use a penalty to enforce the constraint that $\kappa > -1$
  • Coles, S.G., and M.J. Dixon, Likelihood-Based
    Inference for Extreme Value Models, Extremes, 2(1),
    5-23, 1999.
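
To make the GML idea concrete, here is a minimal Python sketch of the objective that a GML estimator would maximize: the GEV log-likelihood plus the log of a prior on the shape parameter. The slides only state that the prior confines the shape to (-0.5, 0.5); the specific choice of a beta density with p = 6 and q = 9 on that interval is an assumption recalled from Martins and Stedinger (2000) and should be checked against the paper. Function names are illustrative.

```python
import math

def gev_log_likelihood(x, xi, alpha, kappa):
    """GEV log-likelihood with F(x) = exp{-[1 - kappa (x - xi)/alpha]^(1/kappa)}."""
    if alpha <= 0.0:
        return -math.inf
    total = 0.0
    for v in x:
        if abs(kappa) < 1e-8:                  # Gumbel limit as kappa -> 0
            z = (v - xi) / alpha
            total += -math.log(alpha) - z - math.exp(-z)
        else:
            y = 1.0 - kappa * (v - xi) / alpha
            if y <= 0.0:                       # observation outside the support
                return -math.inf
            total += (-math.log(alpha) + (1.0 / kappa - 1.0) * math.log(y)
                      - y ** (1.0 / kappa))
    return total

def log_beta_prior(kappa, p=6.0, q=9.0):
    """Log density of a beta(p, q) prior shifted to (-0.5, 0.5). p, q are assumed."""
    u = kappa + 0.5
    if not 0.0 < u < 1.0:
        return -math.inf
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return log_norm + (p - 1.0) * math.log(u) + (q - 1.0) * math.log(1.0 - u)

def gml_log_posterior(x, xi, alpha, kappa):
    """Objective to maximize over (xi, alpha, kappa) for a GML estimate."""
    prior = log_beta_prior(kappa)
    if prior == -math.inf:
        return -math.inf
    return gev_log_likelihood(x, xi, alpha, kappa) + prior
```

Any general-purpose optimizer can then be applied to the negative of `gml_log_posterior`. With these assumed values the prior has mean -0.10, so it pulls the shape estimate toward the range the slide describes.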

26
Prior distribution on GEV k
27
Performance of Alternative Estimators of $x_{0.99}$ for
GEV distribution, n = 25

28
Performance of Alternative Estimators of $x_{0.99}$ for
GEV distribution, n = 100

29
GEV Estimators
  • In 1985, when Hosking, Wallis and Wood introduced
    L-moment (PWM) estimators for the GEV, they were much
    better than MLEs and quantile estimators.
  • In 1998 Madsen and Rosbjerg demonstrated that MOM estimators
    were not so bad, perhaps better than L-moments.
  • Finally, in 2000 Martins and Stedinger demonstrated
    that adding realistic control of the GEV shape
    parameter $\kappa$ yielded estimators that dominated the
    competition. This amounts to fitting the distribution with a
    modest-accuracy regional description of the shape
    parameter.

30
Outline
  • Summarizing Data: Moments and L-moments
  • Parameter estimation for GEV
  • Use of a prior on $\kappa$
  • PDS versus AMS with GMLEs
  • Bayesian GLS Regression for regionalization
  • Concluding observations

31
Partial Duration or Annual Maximum Series.
  • by seeing more little floods,
  • do we know more about big floods ?

32
Partial Duration Series (PDS) / Peaks over
threshold (POT)
33
Poisson/Pareto model for PDS
  • $\lambda$ = arrival rate for floods $> x_0$,
  • which follow a Poisson process
  • $G(x) = \Pr[X \le x]$ for peaks over threshold, $x > x_0$,
  • is a Generalized Pareto distribution:
  • $G(x) = 1 - [1 - \kappa(x - x_0)/\alpha]^{1/\kappa}$
  • Then annual maximums have a
  • Generalized Extreme Value distribution:
  • $F(x) = \exp\{-[1 - \kappa(x - \xi)/\alpha^{*}]^{1/\kappa}\}$
  • $\xi = x_0 + \alpha(1 - \lambda^{-\kappa})/\kappa$
  • $\alpha^{*} = \alpha \lambda^{-\kappa}$
  • same $\kappa$
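
The Poisson/Pareto to GEV mapping can be checked numerically. The Python sketch below converts the PDS parameters (arrival rate, threshold, GP scale and shape) into GEV parameters and compares the two ways of computing the annual-maximum CDF. Function names and the numerical values are illustrative assumptions.

```python
import math

def gp_cdf(x, x0, alpha, kappa):
    """Generalized Pareto CDF for peaks over the threshold x0 (kappa != 0)."""
    return 1.0 - (1.0 - kappa * (x - x0) / alpha) ** (1.0 / kappa)

def gev_cdf(x, xi, alpha, kappa):
    """GEV CDF in the sign convention used on these slides (kappa != 0)."""
    return math.exp(-(1.0 - kappa * (x - xi) / alpha) ** (1.0 / kappa))

def pds_to_gev(rate, x0, alpha, kappa):
    """Map Poisson rate and GP parameters to GEV (xi, alpha_star, kappa)."""
    alpha_star = alpha * rate ** (-kappa)
    xi = x0 + alpha * (1.0 - rate ** (-kappa)) / kappa
    return xi, alpha_star, kappa

# Hypothetical values: 5 peaks per year above a threshold of 10.
rate, x0, alpha, kappa = 5.0, 10.0, 2.0, -0.2
xi, alpha_star, _ = pds_to_gev(rate, x0, alpha, kappa)

x = 20.0
# Annual maximum CDF from Poisson arrivals: exp(-rate * Pr[peak > x]).
f_from_pds = math.exp(-rate * (1.0 - gp_cdf(x, x0, alpha, kappa)))
f_from_gev = gev_cdf(x, xi, alpha_star, kappa)
```

The two values agree to rounding error, which confirms that the shape parameter carries over unchanged while location and scale absorb the arrival rate.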

34
Which is more precise: AMS or PDS?
Consider the case where only 2 parameters are estimated. Fix $\kappa = 0$,
corresponding to Poisson arrivals with
exponential exceedances: the Shane and Lynn (1964)
model for flood risk.
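
For reference, the zero-shape case can be written out explicitly. This is the standard result for Poisson arrivals at rate $\lambda$ above a threshold $x_0$ with exponentially distributed exceedances of scale $\alpha$; it is sketched here as a worked step rather than copied from the slides.

```latex
% Exponential exceedances above the threshold x_0
G(x) = 1 - \exp\left[-\frac{x - x_0}{\alpha}\right], \qquad x > x_0

% Annual maximum under Poisson arrivals with rate \lambda: a Gumbel distribution
F(x) = \exp\{-\lambda\,[1 - G(x)]\}
     = \exp\left\{-\exp\left[-\frac{x - \xi}{\alpha}\right]\right\},
\qquad \xi = x_0 + \alpha \ln \lambda
```

The scale is unchanged and only the location shifts by $\alpha \ln \lambda$, so both the PDS and the AMS descriptions have two parameters to estimate.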
35
Poisson Arrivals with Exponential Exceedances
($\kappa = 0$)
36
Which is more precise: AMS-GEV or PDS-GP?
RMSE-ratio
Now estimate 3 parameters using PDS data,
employing XXX = MOM, L-moments (LM), or
GML with the Generalized Pareto distribution, and
compare the RMSE of PDS-XXX to the RMSE of the AMS-GMLE GEV
estimator.
37
RMSE of 3 PDS estimators vs AMS-GML, $\lambda = 5$
events/year
[Figure: RMSE ratio PDS/AMS-GMLE versus shape parameter $\kappa$, from -0.3 to 0.3]
38
RMSE 3 PDS estimators vs AMS-GML k 0.30
[Figure: RMSE ratio PDS/AMS-GMLE versus $\lambda$, events per year]
39
Conclusions: PDS versus AMS
  • For $\kappa < 0$, with PDS data, GML quantile
    estimators are again generally better than MOM, LM and
    ML.
  • Precision of GML quantile estimators is
    insensitive to $\lambda$.
  • A year of PDS data is generally
    worth a year of AMS data for estimating the 100-year
    flood when employing the GMLE estimators of GP
    and GEV parameters: more little floods do not
    tell us about the distribution of large floods.
40
Outline
  • Summarizing Data: Moments and L-moments
  • Parameter estimation for GEV
  • Use of a prior on $\kappa$
  • PDS versus AMS with GMLEs
  • Bayesian GLS Regression for regionalization
  • Concluding observations

41
GLS Regression for Regional Analyses
  • GOAL
  • Obtain efficient estimators of the mean, standard
    deviation, T-yr flood, or GEV parameters
  • as a function of physiographic basin
    characteristics
  • and provide the precision of that estimator.
  • MODEL
  • log[Statistic-of-interest] =
  • $\alpha + \beta_1 \log(\text{Area}) + \beta_2 \log(\text{Slope}) + \ldots + \text{Error}$

42
GLS Analysis: Complications
  • With available records, only obtain sample
    estimates of Statistic-of-Interest, denoted $\hat{y}_i$
  • Total error $\varepsilon_i$ is a combination of
  • time-sampling error $\eta_i$ in sample estimators $\hat{y}_i$,
    which are often cross-correlated, and
  • underlying model error $\delta_i$ (true lack of fit).
  • Variance of those errors about prediction $X\beta$
    depends on statistics-of-interest at each site.




43
GLS for Regionalization
  • Use available
  • record lengths $n_i$,
  • concurrent record lengths $m_{ij}$,
  • regional estimates of standard deviations $\sigma_i$, or of $\tau_{2i}$
    and $\tau_{3i}$, and
  • cross-correlations $\rho_{ij}$ of floods to estimate the
    variances and
  • cross-correlations of the errors $\eta$ in the
    estimators $\hat{y}_i$.
  • With the true model error variance $\sigma_\delta^2$, determine the
    covariance
  • matrix $\Lambda(\sigma_\delta^2)$ of the residual errors:
  • $\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
  • where $\Sigma(\hat{y})$ is the covariance matrix of the estimator $\hat{y}$

44
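GLS Analysis: Solution
  • GLS regression model (Stedinger and Tasker, 1985,
    1989)
  • $\hat{y} = X\beta + \varepsilon$
  • with parameter estimator b for $\beta$:
  • $[X^T \Lambda(\sigma_\delta^2)^{-1} X]\, b = X^T \Lambda(\sigma_\delta^2)^{-1} \hat{y}$
  • Can estimate model-error variance $\sigma_\delta^2$ using moments:
  • $(\hat{y} - Xb)^T \Lambda(\sigma_\delta^2)^{-1} (\hat{y} - Xb) = n - k$
  • $\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
  • n = dimension of $\hat{y}$; k = dimension of b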
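
A compact numerical sketch of this GLS solution, written with NumPy. It solves the normal equations for b at a given model error variance and then finds the method-of-moments value of that variance by bisection, returning zero when no positive root exists. The function names, the bisection scheme, and the toy data are assumptions made for illustration; the sampling covariance matrix is taken as known.

```python
import numpy as np

def gls_fit(X, y, Sigma, s2):
    """Solve [X' L^-1 X] b = X' L^-1 y with L = s2 * I + Sigma."""
    Lam = s2 * np.eye(len(y)) + Sigma
    A = X.T @ np.linalg.solve(Lam, X)
    b = np.linalg.solve(A, X.T @ np.linalg.solve(Lam, y))
    return b, np.linalg.inv(A)          # estimate and its covariance

def _quad_form(X, y, Sigma, s2):
    """(y - Xb)' L^-1 (y - Xb) evaluated at the GLS estimate for s2."""
    b, _ = gls_fit(X, y, Sigma, s2)
    r = y - X @ b
    Lam = s2 * np.eye(len(y)) + Sigma
    return float(r @ np.linalg.solve(Lam, r))

def mom_model_error_variance(X, y, Sigma, tol=1e-10):
    """Method-of-moments s2: root of quad_form(s2) = n - k, or 0 if none."""
    n, k = X.shape
    target = n - k
    if _quad_form(X, y, Sigma, 0.0) <= target:
        return 0.0                       # sampling error already explains residuals
    lo, hi = 0.0, 1.0
    while _quad_form(X, y, Sigma, hi) > target:
        hi *= 2.0                        # quad form decreases as s2 grows
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _quad_form(X, y, Sigma, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy example: 4 sites, intercept plus one explanatory variable.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
Sigma = 0.01 * np.eye(4)                 # assumed sampling covariance
s2 = mom_model_error_variance(X, y, Sigma)
b, cov_b = gls_fit(X, y, Sigma, s2)
```

The early return of zero mirrors the behaviour noted for the Tibagi example: when sampling error alone accounts for the residual variation, the moment estimator of model error variance collapses to zero.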

45
Likelihood function for model error variance $\sigma_\delta^2$ (Tibagi
River, Brazil, n = 17)
Maximum of the likelihood may be at zero, but
larger values are very probable. Zero is clearly
not in the middle of the likely range of values. Method
of moments has the same problem: a zero estimate.
46
Advantages of Bayesian Analysis
  • Provides posterior distribution of
  • parameters $\beta$,
  • model error variance $\sigma_\delta^2$, and
  • predictive distribution for dependent variable

The Bayesian approach is a natural solution to the
problem.
47
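Bayesian GLS Model
  • Prior distribution $\xi(\beta, \sigma_\delta^2)$
  • Parameters $\beta$ are multivariate normal (Q)
  • Model error variance $\sigma_\delta^2$
  • Exponential dist. ($\lambda$), with $E[\sigma_\delta^2]$ set through $\lambda$ (24 on the slide)
  • Likelihood function
  • Assume data are multivariate $N[X\beta, \Lambda]$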
48
Quasi-Analytic Bayesian GLS
  • Joint posterior distribution
  • Marginal posterior of $\sigma_\delta^2$

where the normal likelihood and prior are integrated analytically
to determine f in closed form.
49
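Example of a posterior of $\sigma_\delta^2$ (Model 1, Tibagi,
Brazil, n = 17)
  • MM-GLS for $\sigma_\delta^2$: 0.000
  • MLE-GLS for $\sigma_\delta^2$: 0.000
  • Bayesian GLS for $\sigma_\delta^2$: 0.046
[Figure: posterior density versus model error variance $\sigma_\delta^2$]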
50
Quasi-Analytic Result
From the joint posterior distribution
can compute the marginal posterior of $\beta$
and its moments by 1-dimensional numerical integrations
51
Bayesian GLS for Regionalization of Flood
Characteristics in Korea
  • Dae Il Jeong
  • Post-doctoral Researcher, Cornell University
  • Jery R. Stedinger
  • Professor, Cornell University
  • Young-Oh Kim
  • Associate Professor, Seoul National University
  • Jang Hyun Sung
  • Graduate Student, Seoul National University

52
Korean River basins
Han River Basin
  • Land Area 120,000 km2
  • Major river basins
  • Han, Nakdong, Geum
  • Total Annual Precipitation (TAP) 1283 mm
  • Two thirds of TAP occurs
  • during 3-month flood season (Jul-Sep)
  • Available sites 31
  • Average length 22 years

Nakdong River Basin
Geum River Basin
53
Korean Application
  • Regional estimators of L-CV $\tau_2$ and L-CS $\tau_3$ for
  • flood frequency analysis using GEV distribution
  • 6 Explanatory Variables
  • 2 indicators (Han-Nakdong-Geum basins)
  • logs of drainage area
  • logs of channel slope
  • mean precipitation
  • SD of annual maximum precipitation

54
Cross-correlation concurrent maxima
55
Monte Carlo results for cross-correlation of L-CS
estimators, GEV with $\kappa = -0.3$ and $\tau_2 = 0.3$
[Figure: cross-correlation of L-CS estimators versus $\rho_{xy}$, the cross-correlation of annual maxima]
56
Regression Results: L-CV
Standard error in parentheses ( ); p-value in
brackets [ ].
57
Performance Measures
  • Average Variance of Prediction (AVP)
  • How well the model estimates the true value of the quantity
    of interest, on average across sites
  • Pseudo R2: improvement of GLS(k) versus GLS(0)
  • Effective Record Length (ERL)
  • Relative uncertainty of regional estimate
    compared to an at-site estimator

58
Regression Results: L-CS $\tau_3$
Standard error in parentheses ( ); p-value in
brackets [ ].
59
Model Diagnostic Measures
  • Pseudo ANOVA table
  • Variation explained by regional model
  • Residual variation due to model errors
  • Residual variation due to sampling errors
  • Represents partition of TOTAL variation

60
Pseudo ANOVA Table for L-CV and L-CS
We need GLS regression analysis
ERL (years): 21 (L-CV), 51 (L-CS)
61
Conclusion: Value in Korea
  • Regional estimator for L-Coefficient of Variation
    should be combined with its at-site estimator
  • ERL($\tau_2$) = 21 years, close to the average record length (22
    yrs)
  • Regional estimator for L-skewness was more
    precise than at-site estimators
  • ERL($\tau_3$) = 51 years > average record length (22
    yrs)
  • Clearly advantageous to use
  • BOTH regional and at-site information
  • in analysis of annual maxima.

62
Diagnostic Statistics
  • Statistics for evaluating data concerns,
    precision of predicted values, sources of
    variation, and model adequacy
  • Leverage and Influence
  • Measures of Prediction Precision
  • Pseudo R2 and ANOVA
  • Modeling Diagnostics: EVR and MBV
  • Bayesian Plausibility Level

63
Bayesian Hierarchical Model: Solve whole problem
at once?
  • Assume values for each site i, for i = 1, ..., K
  • $X_{it} \sim \text{GEV}(\xi_i, \alpha_i, \kappa_i)$, t = 1, ..., $n_i$
  • where for parameters we have
  • $\xi_i \sim N(\mu_\xi, \sigma_\xi^2)$
  • $\alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2)$, where perhaps $\alpha_i$ is replaced by $\alpha_i / \xi_i$
    or a coef. of variation
  • $\kappa_i \sim N(\mu_\kappa, \sigma_\kappa^2)$
  • with priors on ($\mu_\xi$, $\sigma_\xi^2$), ($\mu_\alpha$, $\sigma_\alpha^2$), ($\mu_\kappa$, $\sigma_\kappa^2$),
  • whose values for each site i may depend on
    at-site physiographic characteristics of that
    site.
  • Ignores cross-correlations: need multivariate
    model for K variates?
  • Beware of special cases and lack of fit.

64
Outline
  • Summarizing Data: Moments and L-moments
  • Parameter estimation for GEV
  • Use of a prior on $\kappa$
  • PDS versus AMS with GMLEs
  • Bayesian GLS Regression for regionalization
  • Concluding observations

65
Concluding Remarks
  • GEV distribution used by many water agencies and
    countries to describe the distribution of
    extremes.
  • L-moments provide simple estimators, but not
    efficient.
  • Generalized Maximum Likelihood Estimators (GMLEs)
  • (modest prior on $\kappa$) solve problems with MLEs and
    were the most precise.
  • PDS (GPD-Poisson) no better than AMS (GEV) when
    estimating three parameters with GMLE.

66
Final Comments
  • Regional regression procedures should account for
    precision of at-site estimators and their
    cross-correlations, as can be done with
  • Generalized Least Squares regression
  • Otherwise estimates of model accuracy and of
    precision of parameter estimates will be in
    error.
  • When model error variance is small relative to
    errors in estimated hydrologic statistics,
  • Bayesian model error variance
  • estimator is particularly attractive.

67
Hosking and Wallis (1997)
We can do better than simple index flood
procedures that everywhere use regional
average L-CV $\tau_2$ and L-CS $\tau_3$ values.
68
Conclusion Applicability of GLS
  • Developed Bayesian Generalized Least Squares
    modeling framework to analyze regional
    information addressing distribution parameters
    recognizing
  • Sampling error in at-site estimators as function
    of record length, cross-correlation of concurrent
    events, and concurrent record lengths, and
  • regional model error (true precision of regional
    model)
  • Developed regression models for L-CV and L-CS for
    Korean annual maximum flood using B-GLS analysis

69
Flood Frequency References
Background Reading
  • Stedinger, J.R., Flood Frequency Analysis and
    Statistical Estimation of Flood Risk, Chapter 12,
    Inland Flood Hazards: Human, Riparian and Aquatic
    Communities, E.E. Wohl (ed.), Cambridge University
    Press, 2000.
References
  • Hosking, J.R.M., L-Moments: Analysis and Estimation
    of Distributions Using Linear Combinations of Order
    Statistics, J. of Royal Statistical Society, B,
    52(2), 105-124, 1990.
  • Hosking, J.R.M., and J.R. Wallis, Regional Frequency
    Analysis: An Approach Based on L-moments, Cambridge
    University Press, 1997.
  • Martins, E.S., and J.R. Stedinger, Generalized
    Maximum Likelihood GEV quantile estimators for
    hydrologic data, Water Resources Research, 36(3),
    737-744, 2000.
  • Martins, E.S., and J.R. Stedinger, Generalized
    Maximum Likelihood Pareto-Poisson Flood Risk
    Analysis for Partial Duration Series, Water
    Resources Research, 37(10), 2559-2567, 2001.
  • Stedinger, J.R., and L. Lu, Appraisal of Regional
    and Index Flood Quantile Estimators, Stochastic
    Hydrology and Hydraulics, 9(1), 49-75, 1995.
70
GLS References
  • Griffis, V. W., and J. R. Stedinger, The Use of
    GLS Regression in Regional Hydrologic Analyses,
    J. of Hydrology, 344(1-2), 82-95, 2007,
    doi:10.1016/j.jhydrol.2007.06.023.
  • Gruber, Andrea M., Dirceu S. Reis Jr., and Jery
    R. Stedinger, Models of Regional Skew Based on
    Bayesian GLS Regression, Paper 40927-3285, World
    Environ. Water Resour. Conf. - Restoring our
    Natural Habitat, K.C. Kabbes editor, Tampa, FL,
    May 15-18, 2007.
  • Jeong, Dae Il, Jery R. Stedinger, Young-Oh Kim,
    and Jang Hyun Sung, Bayesian GLS for
    Regionalization of Flood Characteristics in
    Korea, Paper 40927-2736, World Environ. Water
    Resour. Conf. - Restoring our Natural Habitat,
    Tampa, FL, May 15-18, 2007.
  • Martins, E.S., and J.R. Stedinger,
    Cross-correlation among estimators of shape,
    Water Resources Research, 38(11),
    doi:10.1029/2002WR001589, 26 November 2002.
  • Reis, D. S., Jr., J. R. Stedinger, and E. S.
    Martins, Bayesian generalized least squares
    regression with application to log Pearson type 3
    regional skew estimation, Water Resour. Res., 41,
    W10419, doi:10.1029/2004WR003445, 2005.
  • Stedinger, J.R., and G.D. Tasker, Regional
    Hydrologic Analysis, 1. Ordinary, Weighted and
    Generalized Least Squares Compared, Water Resour.
    Res., 21(9), 1421-1432, 1985.
  • Tasker, G.D., and J.R. Stedinger, Estimating
    Generalized Skew With Weighted Least Squares
    Regression, J. of Water Resources Planning and
    Management, 112(2), 225-237, 1986.
  • Tasker, G.D., and J.R. Stedinger, An Operational
    GLS Model for Hydrologic Regression, J. of
    Hydrology, 111(1-4), 361-375, 1989.

71
Pseudo R2 for GLS
Consider the GLS model
  • Not interested in total error $\varepsilon$, which includes
    sampling error $\eta$ that the model cannot explain.
  • Traditional adjusted R2
  • How much of the critical model error $\delta$ can we
    explain, where Var[$\delta$] = $\sigma_\delta^2(k)$ for a model with k
    parameters?
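
The formula for the pseudo R2 did not survive in this transcript. Taking the earlier description at face value (improvement of GLS(k) over GLS(0)), a natural reading is one minus the ratio of the model error variance with k explanatory variables to the model error variance of a constant-only model. The Python sketch below assumes that definition; the function name and the example values are hypothetical.

```python
def pseudo_r2(s2_model_k, s2_model_0):
    """Fraction of the constant-only model error variance explained.

    Assumed definition: 1 - sigma_delta^2(k) / sigma_delta^2(0).
    """
    if s2_model_0 <= 0.0:
        raise ValueError("constant-only model error variance must be positive")
    return 1.0 - s2_model_k / s2_model_0

# Hypothetical values: model error variance drops from 0.04 to 0.01.
r2 = pseudo_r2(0.01, 0.04)
```

Unlike the traditional R2, this measure ignores sampling error, which no regional model could explain.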

72
Pseudo ANOVA Table
  • Source: Degrees of Freedom
  • Model: k
  • Model Error $\delta$: n - k - 1
  • Sampling Error $\eta$: n
  • Total: 2n - 1

73
Modeling Diagnostics
Do we need WLS or GLS to correctly analyze these
data?
  • To evaluate whether OLS might be sufficient,
  • consider the Error Variance Ratio (EVR).
  • If EVR > 20%, then sampling errors $\eta$ in the
    estimators of y
  • are potentially an important fraction of the
    observed total error $\varepsilon = \eta + \delta$.
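
The slide does not show how EVR is computed. The Python sketch below assumes the definition used in the GLS literature listed in the references (Griffis and Stedinger, 2007, as best recalled): the average sampling variance of the at-site estimators divided by the model error variance. Treat both the definition and the 0.20 threshold reading of "20" as assumptions; function names and numbers are hypothetical.

```python
def error_variance_ratio(sampling_variances, model_error_variance):
    """Assumed EVR: mean sampling variance / model error variance."""
    if model_error_variance <= 0.0:
        raise ValueError("model error variance must be positive")
    avg_sampling = sum(sampling_variances) / len(sampling_variances)
    return avg_sampling / model_error_variance

def suggests_wls_or_gls(evr, threshold=0.20):
    """True when sampling error is a non-negligible share of total error."""
    return evr > threshold

# Hypothetical at-site sampling variances and model error variance.
evr = error_variance_ratio([0.02, 0.04, 0.06], 0.10)
needs_weighting = suggests_wls_or_gls(evr)
```

The sampling variances are the diagonal of the covariance matrix of the at-site estimators, so this diagnostic says nothing about cross-correlation between sites.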

74
Modeling Diagnostics
  • EVR > 20% suggests a need for WLS or GLS.
  • But when is cross-correlation so large
  • that a GLS analysis is needed?
  • Misrepresentation of Beta Variance (MBV)
  • Describes error made by WLS in its evaluation
  • of precision of estimator b0 of the constant
    term.

75
OLS, WLS and GLS for L-CS
Standard error in parentheses ( ).