Title: PROBABILISTIC FORECASTS AND THEIR VERIFICATION
1. PROBABILISTIC FORECASTS AND THEIR VERIFICATION
- Zoltan Toth
- Environmental Modeling Center
- NOAA/NWS/NCEP
- Acknowledgements: Yuejian Zhu and Olivier Talagrand (1)
- (1) Ecole Normale Superieure and LMD, Paris, France
- http://wwwt.emc.ncep.noaa.gov/gmb/ens/index.html
2. OUTLINE / SUMMARY
- SCIENCE OF FORECASTING
  - GOAL OF SCIENCE: Forecasting
  - VERIFICATION: Model development, user feedback
- GENERATION OF PROBABILISTIC FORECASTS
  - SINGLE FORECASTS: Statistical rendition of pdf
  - ENSEMBLE FORECASTS: NWP-based, case-dependent pdf
- ATTRIBUTES OF FORECAST SYSTEMS
  - RELIABILITY: Forecasts look like nature statistically
  - RESOLUTION: Forecasts indicate actual future developments
- VERIFICATION OF PROBABILISTIC ENSEMBLE FORECASTS
  - UNIFIED PROBABILISTIC MEASURES: Dimensionless
  - ENSEMBLE MEASURES: Evaluate finite sample
- STATISTICAL POSTPROCESSING OF FORECASTS
  - STATISTICAL RELIABILITY: Make it perfect
  - STATISTICAL RESOLUTION: Keep it unchanged
3. SCIENCE OF FORECASTING
- Ultimate goal of science: Forecasting
- Meteorology is in the forefront
  - Weather forecasting is constantly in the public's eye
- Approach
  - Observe what is relevant and available
  - Analyze data
  - Build general knowledge about nature based on analysis
    - Generalization / abstraction: Laws, relationships
  - Build model of reality based on general knowledge
    - Conceptual
    - Quantitative/numerical, including various physical etc. processes
    - Analog
  - Predict what's not observable in
    - Space (e.g., data assimilation)
    - Time (e.g., future weather)
    - Variables / processes
  - Verify (i.e., compare with observations)
4. PREDICTIONS IN TIME
- Method
  - Use model of nature for projection in time
  - Start model with estimate of state of nature at initial time
- Sources of errors
  - Discrepancy between model and nature
    - Added at every time step
  - Discrepancy between estimated and actual state of nature
    - Initial error
- Chaotic systems
  - Common type of dynamical system
  - Characterized by at least one perturbation pattern that amplifies
    - All errors project onto amplifying directions
    - Any initial and/or model error
  - Predictability is limited
    - Ed Lorenz's legacy
- Verification quantifies the situation
5. MOTIVATION FOR ENSEMBLE FORECASTING
- FORECASTS ARE NOT PERFECT - IMPLICATIONS FOR:
- USERS
  - Need to know how often / by how much forecasts fail
  - Economically optimal behavior depends on
    - Forecast error characteristics
    - User-specific application
      - Cost of weather-related adaptive action
      - Expected loss if no action taken
  - EXAMPLE: Protect or not your crop against possible frost
    - Cost 10k, Potential Loss 100k => Will protect if P(frost) > Cost/Loss = 0.1
  - NEED FOR PROBABILISTIC FORECAST INFORMATION
- DEVELOPERS
  - Need to improve performance - Reduce error in estimate of first moment
    - Traditional NWP activities (i.e., model, data assimilation development)
  - Need to account for uncertainty - Estimate higher moments
    - New aspect: How to do this?
  - Forecast is incomplete without information on forecast uncertainty
  - NEED TO USE PROBABILISTIC FORECAST FORMAT
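The cost/loss decision rule from the frost example above can be sketched in a few lines (a minimal illustration with the slide's hypothetical numbers, not an operational tool):

```python
# Cost/loss decision rule: take protective action whenever the forecast
# probability of the adverse event exceeds the cost/loss ratio.
def should_protect(p_frost: float, cost: float, loss: float) -> bool:
    """Act iff expected loss without action (p_frost * loss) exceeds cost of acting."""
    return p_frost > cost / loss

# Cost 10k to protect, potential loss 100k -> threshold = 0.1
print(should_protect(0.25, 10_000, 100_000))  # True: 25% frost risk warrants protection
print(should_protect(0.05, 10_000, 100_000))  # False: below the 0.1 threshold
```

Note that the optimal threshold depends only on the user's cost/loss ratio, which is why different users need the full probability rather than a single yes/no forecast.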
6. GENERATION OF PROBABILISTIC FORECASTS
- How to determine forecast probability?
  - Fully statistical methods are losing relevance
  - Numerical modeling
    - Liouville Equations provide pdfs
      - Not practical (computationally intractable)
    - Finite sample of pdf
      - Single or multiple (ensemble) integrations
      - Increasingly finer-resolution estimate of probabilities
- How to make (probabilistic) forecasts reliable?
  - Construct pdf
  - Assess reliability
    - Construct frequency distribution of observations following forecast classes
  - Replace form of forecast with associated frequency distribution of observations
  - Production and verification of forecasts connected in operations
7. FORECASTING IN A CHAOTIC ENVIRONMENT
PROBABILISTIC FORECASTING BASED ON A SINGLE FORECAST: One integration with an NWP model, combined with past verification statistics
DETERMINISTIC APPROACH - PROBABILISTIC FORMAT
- Does not contain all forecast information
- Not best estimate for future evolution of system
- UNCERTAINTY CAPTURED IN TIME-AVERAGE SENSE
- NO ESTIMATE OF CASE-DEPENDENT VARIATIONS IN FORECAST UNCERTAINTY
8. SCIENTIFIC BACKGROUND: WEATHER FORECASTS ARE UNCERTAIN
Buizza 2002
9. FORECASTING IN A CHAOTIC ENVIRONMENT - 2
DETERMINISTIC APPROACH - PROBABILISTIC FORMAT
- PROBABILISTIC FORECASTING
  - Based on Liouville Equations
    - Continuity equation for probabilities, given dynamical eqs. of motion
    - Initialize with probability distribution function (pdf) at analysis time
    - Dynamical forecast of pdf based on conservation of probability values
  - Prohibitively expensive
    - Very high dimensional problem (state space x probability space)
    - Separate integration for each lead time
    - Closure problems when simplified solution sought
10. FORECASTING IN A CHAOTIC ENVIRONMENT - 3
DETERMINISTIC APPROACH - PROBABILISTIC FORMAT
- MONTE CARLO APPROACH: ENSEMBLE FORECASTING
- IDEA: Sample sources of forecast error
  - Generate initial ensemble perturbations
  - Represent model-related uncertainty
- PRACTICE: Run multiple NWP model integrations
  - Advantage of perfect parallelization
  - Use lower spatial resolution if short on resources
- USAGE: Construct forecast pdf based on finite sample
  - Ready to be used in real-world applications
  - Verification of forecasts
  - Statistical post-processing (remove bias in 1st, 2nd, higher moments)
- CAPTURES FLOW-DEPENDENT VARIATIONS IN FORECAST UNCERTAINTY
11. 6 hours ET / breeding cycle
[Schematic of the breeding cycle: ensemble perturbations are re-scaled every 6 hrs; forecasts from each cycle (T00Z 80m, T06Z 80m, T12Z 80m, T18Z 80m) are run up to 16 days, continuing into the next T00Z cycle.]
12. USER REQUIREMENTS: PROBABILISTIC FORECAST INFORMATION IS CRITICAL
13. HOW TO DEAL WITH FORECAST UNCERTAINTY?
- No matter what / how sophisticated forecast methods we use
  - Forecast skill is limited
  - Skill varies from case to case
- Forecast uncertainty must be assessed by meteorologists
THE PROBABILISTIC APPROACH
14. SOCIO-ECONOMIC BENEFITS OF SEAMLESS WEATHER/CLIMATE FORECAST SUITE
[Diagram: applications (commerce, energy, ecosystem, health, hydropower, agriculture, reservoir control, recreation, transportation, fire weather, flood mitigation, navigation, protection of life/property) arranged along lead times from minutes and hours through days, weeks, months, seasons, and years; short leads are initial-condition sensitive, long leads boundary-condition sensitive.]
15. ENSEMBLE FORECASTS
- Definition
  - Finite sample to estimate full probability distribution
  - Full solution (Liouville Eqs.) computationally intractable
- Interpretation (assignment of probabilities)
  - Crude
    - Step-wise increase in cumulative forecast probability distribution
    - Performance dependent on size of ensemble
  - Enhanced
    - Inter-/extrapolation (dressing)
    - Performance improvement depends on quality of inter-/extrapolation
    - Based on assumptions
      - Linear interpolation (each member equally likely)
      - Based on verification statistics
        - Kernel or other methods (inclusion of some statistical bias-correction)
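The "crude" interpretation above can be sketched as follows (a minimal illustration, not the operational NCEP code; the temperature values are hypothetical): the forecast probability of an event is simply the fraction of equally likely members in which the event occurs.

```python
# Crude ensemble interpretation: probability of an event is the fraction of
# (equally likely) ensemble members in which the event occurs.
def event_probability(members, threshold):
    """P(event) = fraction of ensemble members at or above the threshold."""
    hits = sum(1 for m in members if m >= threshold)
    return hits / len(members)

# Hypothetical 10-member 2 m temperature ensemble (deg C); event: T >= 0
ens = [-1.2, 0.3, 1.1, -0.4, 2.0, 0.8, -0.1, 1.5, 0.2, 0.9]
print(event_probability(ens, 0.0))  # 0.7
```

The step size of the resulting probability estimate is 1/N, which is why performance depends on ensemble size; dressing/interpolation methods refine this.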
16. (No transcript - figure slide)
17. (No transcript - figure slide)
18. 144 hr forecast
[Figure: poorly predictable large-scale wave over the Eastern Pacific / Western US; highly predictable small-scale wave over the Eastern US; verification shown.]
19. (No transcript - figure slide)
20. (No transcript - figure slide)
21. (No transcript - figure slide)
22. FORECAST EVALUATION
- Statistical approach
  - Evaluates a set of forecasts, not a single forecast
    - Interest in comparing forecast systems
    - Forecasts generated by the same procedure
  - Sample size affects how fine a stratification is possible
    - Level of detail is limited
    - Size of sample limited by available obs. record (even for hindcasts)
    - Statistical significance in comparative verification
  - Error in proxy for truth
    - Observations or numerical analysis
- Types
  - Forecast statistics
    - Depends only on forecast properties
  - Verification statistics
    - Comparison of forecast and proxy for truth in statistical sense
    - Depends on both natural and forecast systems
      - Nature represented by proxy
        - Observations (including observational error)
23. FORECAST VERIFICATION
- Types
  - Measures of quality
    - Environmental science issues
    - Main focus here
  - Measures of utility
    - Multidisciplinary
    - Social/economic issues, beyond environmental sciences
    - Socio-economic value of forecasts is the ultimate measure
      - Approximate measures can be constructed
- Quality vs. utility
  - Improved quality
    - Generally permits enhanced utility (assumption)
  - How to improve utility if quality is fixed?
    - Providers communicate all available information
      - E.g., offer probabilistic or other information on forecast uncertainty
      - Engage in education, training
    - Users identify forecast aspects important to them
      - Can providers selectively improve certain aspects of forecasts?
24. EVALUATING QUALITY OF FORECAST SYSTEMS
- Goal
  - Infer comparative information about forecast systems
    - Value added by
      - New methods
      - Subsequent steps in end-to-end forecast process (e.g., manual changes)
  - Critical for monitoring and improving operational forecast systems
- Attributes of forecast systems
  - Traditionally, forecast attributes defined separately for each forecast format
  - General definition needed
    - Need to compare forecasts
      - From any system
      - Of any type / format
        - Single, ensemble, categorical, probabilistic, etc.
    - Supports systematic evaluation of
      - End-to-end (provider-user) forecast process
      - Statistical post-processing as integral part of system
25. FORECAST SYSTEM ATTRIBUTES
- Abstract concept (like length)
  - Reliability and Resolution
  - Both can be measured through different statistics
- Statistical property
  - Interpreted for large set of forecasts
  - Describes behavior of forecast system, not a single forecast
- For their definition, assume that
  - Forecasts
    - Can be of any format
      - Single value, ensemble, categorical, probabilistic, etc.
    - Take a finite number of different classes Fa
  - Observations
    - Can also be grouped into a finite number of classes like Oa
26. STATISTICAL RELIABILITY (TEMPORAL AGGREGATE): STATISTICAL CONSISTENCY OF FORECASTS WITH OBSERVATIONS
- BACKGROUND
  - Consider a particular forecast class Fa
  - Consider the frequency distribution of observations that follow forecasts Fa - fdoa
- DEFINITION
  - If forecast Fa has the exact same form as fdoa, for all forecast classes,
  - the forecast system is statistically consistent with observations =>
  - The forecast system is perfectly reliable
- MEASURES OF RELIABILITY
  - Based on different ways of comparing Fa and fdoa
27. STATISTICAL RESOLUTION (TEMPORAL EVOLUTION): ABILITY TO DISTINGUISH, AHEAD OF TIME, AMONG DIFFERENT OUTCOMES
- BACKGROUND
  - Assume observed events are classified into a finite number of classes, like Oa
- DEFINITION
  - If all observed classes (Oa, Ob, ...) are preceded by
    - Distinctly different forecasts (Fa, Fb, ...)
  - The forecast system resolves the problem =>
  - The forecast system has perfect resolution
- MEASURES OF RESOLUTION
  - Based on degree of separation of fdos that follow various forecast classes
  - Measured by difference between fdos and climate distribution
  - Measures differ by how differences between distributions are quantified
[Figure: schematic examples of forecasts and the observations that follow them.]
28. CHARACTERISTICS OF RELIABILITY & RESOLUTION
- Reliability
  - Related to form of forecast, not forecast content
  - Fidelity of forecast
    - Reproduces nature: when resolution is perfect, forecast looks like nature
  - Not related to time sequence of forecast/observed systems
  - How to improve?
    - Make model more realistic
      - Also expected to improve resolution
    - Statistical bias correction: can be statistically imposed at one time level
      - If both natural and forecast systems are stationary in time
      - If there is a large enough set of observed-forecast pairs
  - Link with verification
    - Replace forecast with corresponding fdo
- Resolution
  - Related to inherent predictive value of forecast system
  - Not related to form of forecasts
    - Statistical consistency at one time level (reliability) is irrelevant
  - How to improve?
29. CHARACTERISTICS OF FORECAST SYSTEM ATTRIBUTES
- RELIABILITY AND RESOLUTION ARE
  - General forecast attributes
    - Valid for any forecast format (single, categorical, probabilistic, etc.)
  - Independent attributes
    - For example
      - Climate pdf forecast is perfectly reliable, yet has no resolution
      - Reversed rain / no-rain forecast can have perfect resolution and no reliability
    - To separate them, they must be measured according to the general definition
      - If measured according to traditional, narrower definitions, reliability and resolution can be mixed
  - Functions of forecast quality
    - There is no other relevant forecast attribute
    - Perfect reliability and perfect resolution = perfect forecast system
      - Deterministic forecast system that is always correct
    - Both needed for utility of forecast systems
30. FORMAT OF FORECASTS: PROBABILISTIC FORMAT
- Do we have a choice?
  - When forecasts are imperfect
    - Only the probabilistic format can be reliable/consistent with nature
- Abstract concept
  - Related to forecast system attributes
  - Space of probability: dimensionless pdf or similar format
    - For environmental variables (not those variables themselves)
- Definition
  - Define event
    - Function of concrete variables, features, etc.
      - E.g., temperature above freezing; thunderstorm
  - Determine probability of event occurring in the future
    - Based on knowledge of initial state and evolution of system
31. OPERATIONAL PROB/ENSEMBLE FORECAST VERIFICATION
- Requirements
  - Use same general dimensionless probabilistic measures for verifying
    - Any event
    - Against either
      - Observations or
      - Numerical analysis
- Measures used at NCEP
  - Probabilistic forecast measures (ensemble interpreted probabilistically)
    - Reliability
      - Component of BSS, RPSS, CRPSS
      - Attributes & Talagrand diagrams
    - Resolution
      - Component of BSS, RPSS, CRPSS
      - ROC, attributes diagram, potential economic value
  - Special ensemble verification procedures
    - Designed to assess performance of finite set of forecasts
    - Most likely member statistics, PECA
32. FORECAST PERFORMANCE MEASURES
COMMON CHARACTERISTIC: Function of both forecast and observed values
MEASURES OF RELIABILITY
- DESCRIPTION: Statistically compares any sample of forecasts with the sample of corresponding observations
- GOAL: To assess similarity of samples (e.g., whether 1st and 2nd moments match)
- EXAMPLES: Reliability component of Brier Score & Ranked Probability Score; Analysis Rank Histogram; Spread vs. Ens. Mean error; etc.
MEASURES OF RESOLUTION
- DESCRIPTION: Compares the distribution of observations that follow different classes of forecasts with the climate distribution (as reference)
- GOAL: To assess how well the observations are separated when grouped by different classes of preceding forecasts
- EXAMPLES: Resolution component of Brier Score & Ranked Probability Score; Information content; Relative Operating Characteristics; Relative Economic Value; etc.
COMBINED (REL & RES) MEASURES: Brier Score, Continuous Ranked Probability Score, rmse, PAC, ...
33. EXAMPLE: PROBABILISTIC FORECASTS
RELIABILITY: Forecast probabilities for a given event match observed frequencies of that event (with given prob. fcst)
RESOLUTION: Many forecasts fall into classes corresponding to high or low observed frequency of the given event (occurrence and non-occurrence of the event is well resolved by the fcst system)
34. (No transcript - figure slide)
35. PROBABILISTIC FORECAST PERFORMANCE MEASURES
TO ASSESS THE TWO MAIN ATTRIBUTES OF PROBABILISTIC FORECASTS: RELIABILITY AND RESOLUTION
- Univariate measures: Statistics accumulated point by point in space
- Multivariate measures: Spatial covariance is considered
EXAMPLE: BRIER SKILL SCORE (BSS) - COMBINED MEASURE OF RELIABILITY AND RESOLUTION
36. BRIER SKILL SCORE (BSS)
COMBINED MEASURE OF RELIABILITY AND RESOLUTION
- METHOD
  - Compares pdf against analysis
    - Resolution (random error)
    - Reliability (systematic error)
- EVALUATION
  - BSS: Higher is better
  - Resolution: Higher is better
  - Reliability: Lower is better
- RESULTS
  - Resolution dominates initially
  - Reliability becomes important later
  - ECMWF best throughout
    - Good analysis/model?
  - NCEP good days 1-2
    - Good initial perturbations?
    - No model perturbations hurts later?
  - CANADIAN good days 8-10
May-June-July 2002 average Brier skill score for the EC-EPS (grey lines with full circles), the MSC-EPS (black lines with open circles) and the NCEP-EPS (black lines with crosses). Bottom: resolution (dotted) and reliability (solid) contributions to the Brier skill score. Values refer to the 500 hPa geopotential height over the northern hemisphere latitudinal band 20º-80ºN, and have been computed considering 10 equally-climatologically-likely intervals (from Buizza, Houtekamer, Toth et al., 2004).
37. BRIER SKILL SCORE
COMBINED MEASURE OF RELIABILITY AND RESOLUTION
38. RANKED PROBABILITY SCORE
COMBINED MEASURE OF RELIABILITY AND RESOLUTION
39. CONTINUOUS RANKED PROBABILITY SCORE
[Figure: ensemble CDF built from 10 ordered members (p01, p02, ..., p10), rising step-wise from 0 to 100% probability, compared against the Heaviside function H stepping at the observed value Xo (obs/truth); the CRP Skill Score is computed from this comparison.]
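The step-CDF-vs-Heaviside comparison in the figure can be computed directly for a finite ensemble. The sketch below (an assumed minimal implementation, not the NCEP code) uses the standard identity CRPS = E|X - xo| - 0.5 E|X - X'|, where X and X' are independent draws from the ensemble, which equals the integrated squared difference between the ensemble step CDF and the Heaviside function:

```python
# CRPS for a finite ensemble via the energy-score identity:
#   CRPS = E|X - xo| - 0.5 * E|X - X'|
# with each member treated as equally likely.
def crps_ensemble(members, obs):
    n = len(members)
    term1 = sum(abs(m - obs) for m in members) / n                    # E|X - xo|
    term2 = sum(abs(a - b) for a in members for b in members) / n**2  # E|X - X'|
    return term1 - 0.5 * term2

# A one-member "ensemble" reduces CRPS to the absolute error:
print(crps_ensemble([2.0], 3.0))       # 1.0
# Two members straddling the observation:
print(crps_ensemble([0.0, 1.0], 0.5))  # 0.25
```

The O(n^2) pairwise term is fine for typical ensemble sizes (tens of members); sorting-based O(n log n) forms exist for large n.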
40. ANALYSIS RANK HISTOGRAM (TALAGRAND DIAGRAM)
MEASURE OF RELIABILITY
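The rank histogram can be sketched as follows (a minimal illustration with hypothetical data; ties and multi-dimensional fields are not handled): for each case, count how many members fall below the verifying value, and histogram those ranks. A flat histogram indicates the analysis is statistically indistinguishable from the members; a U-shape indicates under-dispersion.

```python
# Analysis rank histogram: where does the verifying value fall among the
# sorted ensemble members? N members define N+1 rank bins.
def rank_of_obs(members, obs):
    """Rank 0 .. len(members): number of members below the observation."""
    return sum(1 for m in members if m < obs)

def rank_histogram(cases):
    """cases: list of (members, obs) pairs -> counts per rank bin."""
    n_bins = len(cases[0][0]) + 1
    counts = [0] * n_bins
    for members, obs in cases:
        counts[rank_of_obs(members, obs)] += 1
    return counts

# Three hypothetical cases with a 3-member ensemble (4 rank bins):
cases = [([1.0, 2.0, 3.0], 2.5), ([0.5, 1.5, 2.5], 0.1), ([2.0, 4.0, 6.0], 7.0)]
print(rank_histogram(cases))  # [1, 0, 1, 1]
```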
41. ENSEMBLE MEAN ERROR VS. ENSEMBLE SPREAD
MEASURE OF RELIABILITY
Statistical consistency between the ensemble and the verifying analysis means that the verifying analysis should be statistically indistinguishable from the ensemble members
=> Ensemble mean error (distance between ens. mean and analysis) should be equal to ensemble spread (distance between ensemble mean and ensemble members)
In case of a statistically consistent ensemble, ens. spread = ens. mean error, and they are both a MEASURE OF RESOLUTION. In the presence of bias, both rms error and PAC will be a combined measure of reliability and resolution.
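The spread-vs-error comparison above can be sketched as follows (hypothetical data; one common convention among several, comparing RMS error of the ensemble mean with RMS spread about the ensemble mean):

```python
# Compare ensemble-mean RMS error with RMS ensemble spread over a set of
# cases; for a statistically consistent ensemble the two should match on average.
import math

def ens_mean(members):
    return sum(members) / len(members)

def rms_error_and_spread(cases):
    """cases: list of (members, analysis). Returns (rmse of ens mean, rms spread)."""
    sq_err, sq_spread = 0.0, 0.0
    for members, analysis in cases:
        m = ens_mean(members)
        sq_err += (m - analysis) ** 2
        sq_spread += sum((x - m) ** 2 for x in members) / len(members)
    n = len(cases)
    return math.sqrt(sq_err / n), math.sqrt(sq_spread / n)

# Two hypothetical cases: here spread (1.0) exceeds error (0.5), i.e. the
# toy ensemble is over-dispersive.
cases = [([1.0, 3.0], 2.5), ([0.0, 2.0], 0.5)]
rmse, spread = rms_error_and_spread(cases)
print(round(rmse, 3), round(spread, 3))  # 0.5 1.0
```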
42. INFORMATION CONTENT
MEASURE OF RESOLUTION
43. RELATIVE OPERATING CHARACTERISTICS (ROC)
MEASURE OF RESOLUTION
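The ROC is built from hit rates and false alarm rates obtained by thresholding the probabilistic forecast. A minimal sketch (hypothetical forecast/observation pairs; not the NCEP implementation):

```python
# For one probability threshold, convert the probabilistic forecast to yes/no
# and compute hit rate (H) and false alarm rate (F); sweeping the threshold
# traces out the ROC curve.
def hit_and_false_alarm_rate(probs, events, threshold):
    """probs: forecast probabilities; events: 1 if the event occurred, else 0."""
    hits = sum(1 for p, e in zip(probs, events) if p >= threshold and e == 1)
    misses = sum(1 for p, e in zip(probs, events) if p < threshold and e == 1)
    false_alarms = sum(1 for p, e in zip(probs, events) if p >= threshold and e == 0)
    corr_negatives = sum(1 for p, e in zip(probs, events) if p < threshold and e == 0)
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + corr_negatives)
    return hit_rate, fa_rate

probs = [0.9, 0.7, 0.4, 0.2, 0.1, 0.8]
events = [1, 1, 0, 0, 1, 0]
hr, far = hit_and_false_alarm_rate(probs, events, 0.5)
print(round(hr, 3), round(far, 3))  # 0.667 0.333
```

A forecast system with resolution keeps the curve above the H = F diagonal; the area under the curve is a common summary score.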
44. ECONOMIC VALUE OF FORECASTS
MEASURE OF RESOLUTION
45. PERTURBATION VS. ERROR CORRELATION ANALYSIS (PECA)
MULTIVARIATE COMBINED MEASURE OF RELIABILITY & RESOLUTION
- METHOD: Compute correlation between ensemble perturbations and error in control fcst for
  - Individual members
  - Optimal combination of members
  - Each ensemble
  - Various areas, all lead times
- EVALUATION: Large correlation indicates ensemble captures error in control forecast
  - Caveat: errors defined by analysis
- RESULTS
  - Canadian best on large scales
    - Benefit of model diversity?
  - ECMWF gains most from combinations
    - Benefit of orthogonalization?
  - NCEP best on small scales, short term
    - Benefit of breeding (best estimate of initial error)?
  - PECA increases with lead time
    - Lyapunov convergence
    - Nonlinear saturation
    - Higher values on small scales
46. WHAT DO WE NEED FOR POSTPROCESSING TO WORK?
- LARGE SET OF FCST & OBS PAIRS
  - Consistency is defined over a large sample; need the same for post-processing
  - The larger the sample, the more detailed corrections can be made
- BOTH FCST AND REAL SYSTEMS MUST BE STATIONARY IN TIME
  - Otherwise can make things worse
  - Subjective forecasts are difficult to calibrate
HOW DO WE MEASURE STATISTICAL INCONSISTENCY?
- MEASURES OF STATISTICAL RELIABILITY
  - Time mean error
  - Analysis rank histogram (Talagrand diagram)
  - Reliability component of Brier etc. scores
  - Reliability diagram
47. SOURCES OF STATISTICAL INCONSISTENCY
- TOO FEW FORECAST MEMBERS
  - Single forecast inconsistent by definition, unless perfect
    - MOS fcst hedged toward climatology as fcst skill is lost
  - Small ensemble: sampling error due to limited ensemble size
    - (Houtekamer 1994?)
- MODEL ERROR (BIAS)
  - Deficiencies due to various problems in NWP models
  - Effect is exacerbated with increasing lead time
- SYSTEMATIC ERRORS (BIAS) IN ANALYSIS
  - Induced by observations
    - Effect dies out with increasing lead time
  - Model related
    - Bias manifests itself even in initial conditions
- ENSEMBLE FORMATION (IMPROPER SPREAD)
  - Inappropriate initial spread
  - Lack of representation of model-related uncertainty in ensemble
    - I.e., use of simplified model that is not able to account for model-related uncertainty
48. HOW TO IMPROVE STATISTICAL CONSISTENCY?
- MITIGATE SOURCES OF INCONSISTENCY
  - TOO FEW MEMBERS
    - Run large ensemble
  - MODEL ERRORS
    - Make models more realistic
  - INSUFFICIENT ENSEMBLE SPREAD
    - Enhance models so they can represent model-related forecast uncertainty
- OTHERWISE =>
  - STATISTICALLY ADJUST FCST TO REDUCE INCONSISTENCY
    - Not the preferred way of doing it
    - What we learn can feed back into development to mitigate problems at their sources
    - Can have LARGE impact on (inexperienced) users
  - Two separate issues
    - Bias correct against NWP analysis
      - Reduce lead-time-dependent model behavior
    - Downscale NWP analysis
      - Connect with observed variables that are unresolved by NWP models
49. (No transcript - figure slide)
50. OUTLINE / SUMMARY
- SCIENCE OF FORECASTING
  - GOAL OF SCIENCE: Forecasting
  - VERIFICATION: Model development, user feedback
- GENERATION OF PROBABILISTIC FORECASTS
  - SINGLE FORECASTS: Statistical rendition of pdf
  - ENSEMBLE FORECASTS: NWP-based, case-dependent pdf
- ATTRIBUTES OF FORECAST SYSTEMS
  - RELIABILITY: Forecasts look like nature statistically
  - RESOLUTION: Forecasts indicate actual future developments
- VERIFICATION OF PROBABILISTIC ENSEMBLE FORECASTS
  - UNIFIED PROBABILISTIC MEASURES: Dimensionless
  - ENSEMBLE MEASURES: Evaluate finite sample
- STATISTICAL POSTPROCESSING OF FORECASTS
  - STATISTICAL RELIABILITY: Make it perfect
  - STATISTICAL RESOLUTION: Keep it unchanged
51. http://wwwt.emc.ncep.noaa.gov/gmb/ens/ens_info.html
Toth, Z., O. Talagrand, and Y. Zhu, 2005: The Attributes of Forecast Systems: A Framework for the Evaluation and Calibration of Weather Forecasts. In: Predictability Seminars, 9-13 September 2002, Ed. T. Palmer, ECMWF, pp. 584-595.
Toth, Z., O. Talagrand, G. Candille, and Y. Zhu, 2003: Probability and ensemble forecasts. In: Environmental Forecast Verification: A practitioner's guide in atmospheric science. Ed. I. T. Jolliffe and D. B. Stephenson. Wiley, pp. 137-164.
52. BACKGROUND
53. NOTES FOR NEXT YEAR
- Define predictand
  - Exhaustive set of events, e.g.
    - Continuous temperature
    - Precipitation type (categorical)
54. SUMMARY
- WHY DO WE NEED PROBABILISTIC FORECASTS?
  - Isn't the atmosphere deterministic? YES, but it's also CHAOTIC
  - FORECASTER'S PERSPECTIVE & USER'S PERSPECTIVE
    - Ensemble techniques & probabilistic description
- WHAT ARE THE MAIN ATTRIBUTES OF FORECAST SYSTEMS?
  - RELIABILITY: Statistical consistency with distribution of corresponding observations
  - RESOLUTION: Different events are preceded by different forecasts
- WHAT ARE THE MAIN TYPES OF FORECAST METHODS?
  - EMPIRICAL: Good reliability, limited resolution (problems in new situations)
  - THEORETICAL: Potentially high resolution, prone to inconsistency
  - ENSEMBLE METHODS
    - Only practical way of capturing fluctuations in forecast uncertainty due to
      - Case-dependent dynamics acting on errors in
        - Initial conditions
        - Forecast methods
- HOW CAN PROBABILISTIC FORECAST PERFORMANCE BE MEASURED?
55. OUTLINE
- STATISTICAL EVALUATION OF FORECAST SYSTEMS
  - ATTRIBUTES OF FORECAST SYSTEMS
- FORECAST METHODS
  - EMPIRICALLY BASED
  - THEORETICALLY BASED
- LIMITS OF PREDICTABILITY
  - LIMITING FACTORS
  - ASSESSING PREDICTABILITY
    - Ensemble forecasting
- VERIFICATION MEASURES
  - MEASURING FORECAST SYSTEM ATTRIBUTES
- STATISTICAL POST-PROCESSING OF FORECASTS
  - IMPROVING STATISTICAL RELIABILITY
56. CRPS DECOMPOSITION
- Yuejian Zhu
- Environmental Modeling Center
- NOAA/NWS/NCEP
- Acknowledgements
  - Zoltan Toth, EMC
57. CONTINUOUS RANKED PROBABILITY SCORE
[Figure (as on slide 39): ensemble CDF from 10 ordered members (p01, p02, ..., p10) vs. the Heaviside function H stepping at the observation Xo (obs/truth); basis for the CRP Skill Score.]
58. CRPS DECOMPOSITION
[Figure: general example - ensemble CDF (P, probability on a 0-100% axis) built from the 10 ordered members, with the observed value Xo (obs/truth) falling inside the ensemble range.]
59. CRPS DECOMPOSITION
[Figure: example of a right outlier - the observation Xo falls above all 10 ordered ensemble members.]
60. CRPS DECOMPOSITION
[Figure: example of a left outlier - the observation Xo falls below all 10 ordered ensemble members.]
61. CRPS DECOMPOSITION
[Equation slide: definitions of the CRPS decomposition terms - equations not transcribed.]
62. CRPS DECOMPOSITION
[Figure: time/space-averaged general example - the CDF of the forecasts vs. the CDF of the observation frequency, plotted over the 10 ordered ensemble members (P, probability on a 0-100% axis).]
63. CRPS DECOMPOSITION - RELIABILITY DIAGRAM
[Figure: reliability diagram - observed relative frequency (%) vs. forecast probability (%), both axes 0-100.]
64. CRPS DECOMPOSITION - RELIABILITY DIAGRAM
[Figure: reliability diagram annotating a left outlier (top of diagram), a right outlier (bottom), and the 100% unreliable case.]
65. CRPS DECOMPOSITION - RELIABILITY DIAGRAM
[Figure: reliability diagram for the 100% reliable case - points lie on the diagonal.]
66. CRPS DECOMPOSITION
- CRPS ranges from 0 to 1.0
- RELI (reliability) ranges from 0 to 0.5
- RESO (resolution) ranges from 0 to 1.0
- UNCE (uncertainty) ranges from 0 to 1.0
67. RANKED PROBABILITY SCORE
The Ranked (ordered) Probability Score (RPS) verifies multi-category probability forecasts, measuring both reliability and resolution, based on climatologically equally likely bins.
Example with 10 climatologically equally likely bins and 10 ensemble members; the verifying analysis falls in bin i5:

  bin k:         i1   i2   i3   i4   i5   i6   i7   i8   i9   i10
  OBS On:         0    0    0    0    1    0    0    0    0    0
  FCST PROB Pn:   0    0   20   10    0   10   30   20    0   10   (%)

k = number of categories.
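The RPS compares the cumulative forecast and observed distributions over the ordered categories. A minimal sketch using the slide's example (assuming the common normalization by K-1; conventions vary, and this is not the NCEP code):

```python
# Ranked Probability Score: sum of squared differences between cumulative
# forecast and cumulative observed (0/1) distributions over ordered categories.
def rps(fcst_probs, obs_category):
    """fcst_probs: per-category probabilities summing to 1;
    obs_category: 0-based index of the bin the observation fell in."""
    k = len(fcst_probs)
    score, cum_f, cum_o = 0.0, 0.0, 0.0
    for i in range(k):
        cum_f += fcst_probs[i]
        cum_o += 1.0 if i == obs_category else 0.0
        score += (cum_f - cum_o) ** 2
    return score / (k - 1)

# The slide's example: 10 bins, observation in bin i5, probabilities in percent
probs = [p / 100 for p in [0, 0, 20, 10, 0, 10, 30, 20, 0, 10]]
print(round(rps(probs, 4), 4))  # 0.1211
```

Because the categories are ordered, probability placed in bins far from the observed bin is penalized more than probability placed in nearby bins.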
68. RMSE AND SPREAD
[Figure: mean and absolute errors, and CRPSS, for the 10 meter wind (u-component). The forecast is less biased, so there is less room to improve the skill by bias-correction only.]
69. 24h IMPROVEMENT BY NAEFS
[Figure: RPSS vs. CRPSS and ROC score, winter 2006-2007, NH 2 m temperature, for the NCEP raw forecast (black), NCEP bias-corrected forecast (red), and NAEFS forecast (pink).]
70. BRIER SCORE (AND DECOMPOSITION)
See "Statistical Methods in the Atmospheric Sciences" by D. S. Wilks, Chapter 7: Forecast Verification.
1. BS (Brier Score)

  BS = (1/n) * sum_{k=1..n} (y_k - o_k)^2

where y_k is a forecast probability and o_k is an observation (probability), and the index k runs over the n forecast/observation pairs. y and o are limited to the range 0 to 1 in the probability sense. BS = 0 is a perfect forecast; BS = 1 means missing everything.
2. BSS (Brier Skill Score)

  BSS = (BS_ref - BS) / (BS_ref - BS_perf) = 1 - BS / BS_ref

where ref is the reference, which is mostly climatology, and BS_perf = 0 for a perfect forecast. BSS = 1 for a perfect forecast and BSS = 0 for a forecast no better than the reference (negative values indicate a forecast worse than the reference). The score decomposes into Resolution, Reliability, and Uncertainty terms.
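The two formulas above can be sketched directly (hypothetical data; the climatological reference forecast is simply the sample base rate):

```python
# Brier score: mean squared difference between forecast probability and the
# 0/1 outcome. Brier skill score: improvement over a reference forecast.
def brier_score(forecast_probs, outcomes):
    n = len(forecast_probs)
    return sum((y - o) ** 2 for y, o in zip(forecast_probs, outcomes)) / n

def brier_skill_score(bs, bs_ref):
    """BSS = 1 - BS/BS_ref; 1 is perfect, 0 matches the reference."""
    return 1.0 - bs / bs_ref

y = [0.9, 0.1, 0.7, 0.3]   # hypothetical forecast probabilities
o = [1, 0, 1, 1]           # event occurred (1) or not (0)
bs = brier_score(y, o)
print(round(bs, 3))  # 0.15

bs_ref = brier_score([0.75] * 4, o)  # climatological forecast (base rate 0.75)
print(round(brier_skill_score(bs, bs_ref), 3))  # 0.2
```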
71. BRIER SCORE (AND DECOMPOSITION)
3. Algebraic Decomposition of the Brier Score
After some algebra, the Brier Score can be expressed as three separate terms:

  BS = Reliability - Resolution + Uncertainty
     = (1/n) * sum_i N_i (y_i - obar_i)^2 - (1/n) * sum_i N_i (obar_i - obar)^2 + obar (1 - obar)

where the sum runs over the distinct forecast probability values y_i, N_i is the number of times forecast y_i was issued, obar_i is the conditional probability of the observed event given forecast y_i, and obar is the sample climatology.
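The decomposition can be checked numerically. A sketch (hypothetical data, forecasts restricted to a finite set of probability values as the decomposition requires):

```python
# Algebraic Brier score decomposition: BS = Reliability - Resolution + Uncertainty,
# computed by grouping outcomes by distinct forecast probability value.
from collections import defaultdict

def brier_decomposition(forecast_probs, outcomes):
    n = len(forecast_probs)
    obar = sum(outcomes) / n              # sample climatology
    groups = defaultdict(list)            # outcomes grouped by forecast value y_i
    for y, o in zip(forecast_probs, outcomes):
        groups[y].append(o)
    reli = sum(len(os) * (y - sum(os) / len(os)) ** 2 for y, os in groups.items()) / n
    reso = sum(len(os) * (sum(os) / len(os) - obar) ** 2 for os in groups.values()) / n
    unc = obar * (1 - obar)
    return reli, reso, unc

y = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
o = [0, 0, 0, 1, 1, 1]
reli, reso, unc = brier_decomposition(y, o)
bs = sum((yi - oi) ** 2 for yi, oi in zip(y, o)) / len(y)
print(round(reli - reso + unc, 6) == round(bs, 6))  # True: decomposition matches BS
```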
72. BRIER SCORE (AND DECOMPOSITION)
4. Example of a BS calculation
By considering three equally likely bins: Cb < 22, 22 < Cn < 26, and Ca > 26.
The average Brier Score is 0.133 for this case: BS = 0.133 (range from 0 to 1).
73. BRIER SCORE (AND DECOMPOSITION)
5. Example of a BS decomposition calculation
Rel = 0.0056, Res = 0.0889, Unc = 0.2222; BS = Rel - Res + Unc = 0.1389
74. PROB. EVALUATION (MULTI-CATEGORIES)
4. Reliability and possible calibration (remove bias)
- For period precipitation evaluation
[Figure: reliability diagram - observed frequency (%) vs. forecast probability - showing the calibrated forecast, the raw forecast, the skill line, and the resolution line at the climatological probability (0.16).]