PROBABILISTIC FORECASTS AND THEIR VERIFICATION - PowerPoint Presentation Transcript
1
PROBABILISTIC FORECASTS AND THEIR VERIFICATION
  • Zoltan Toth
  • Environmental Modeling Center
  • NOAA/NWS/NCEP
  • Acknowledgments: Yuejian Zhu and Olivier Talagrand (1)
  • (1) École Normale Supérieure and LMD, Paris, France
  • http://wwwt.emc.ncep.noaa.gov/gmb/ens/index.html

2
OUTLINE / SUMMARY
  • SCIENCE OF FORECASTING
  • GOAL OF SCIENCE: Forecasting
  • VERIFICATION: Model development, user feedback
  • GENERATION OF PROBABILISTIC FORECASTS
  • SINGLE FORECASTS: Statistical rendition of pdf
  • ENSEMBLE FORECASTS: NWP-based, case-dependent pdf
  • ATTRIBUTES OF FORECAST SYSTEMS
  • RELIABILITY: Forecasts look like nature statistically
  • RESOLUTION: Forecasts indicate actual future developments
  • VERIFICATION OF PROBABILISTIC ENSEMBLE FORECASTS
  • UNIFIED PROBABILISTIC MEASURES: Dimensionless
  • ENSEMBLE MEASURES: Evaluate finite sample
  • STATISTICAL POSTPROCESSING OF FORECASTS
  • STATISTICAL RELIABILITY: Make it perfect
  • STATISTICAL RESOLUTION: Keep it unchanged

3
SCIENCE OF FORECASTING
  • Ultimate goal of science:
  • Forecasting
  • Meteorology is at the forefront
  • Weather forecasting constantly in the public's eye
  • Approach:
  • Observe what is relevant and available
  • Analyze data
  • Build general knowledge about nature based on analysis
  • Generalization / abstraction: laws, relationships
  • Build model of reality based on general knowledge
  • Conceptual
  • Quantitative/numerical, including various physical etc. processes
  • Analog
  • Predict what's not observable in:
  • Space (e.g., data assimilation)
  • Time (e.g., future weather)
  • Variables / processes
  • Verify (i.e., compare with observations)

4
PREDICTIONS IN TIME
  • Method:
  • Use model of nature for projection in time
  • Start model with estimate of state of nature at initial time
  • Sources of errors:
  • Discrepancy between model and nature
  • Added at every time step
  • Discrepancy between estimated and actual state of nature
  • Initial error
  • Chaotic systems:
  • Common type of dynamical system
  • Characterized by at least one perturbation pattern that amplifies
  • All errors project onto amplifying directions
  • Any initial and/or model error
  • Predictability limited
  • Ed Lorenz's legacy
  • Verification quantifies the situation

5
MOTIVATION FOR ENSEMBLE FORECASTING
  • FORECASTS ARE NOT PERFECT - IMPLICATIONS FOR:
  • USERS:
  • Need to know how often / by how much forecasts fail
  • Economically optimal behavior depends on:
  • Forecast error characteristics
  • User-specific application
  • Cost of weather-related adaptive action
  • Expected loss if no action taken
  • EXAMPLE: To protect or not your crop against possible frost
  • Cost = 10k, potential loss = 100k => will protect if P(frost) > Cost/Loss = 0.1 (see the sketch below)
  • NEED FOR PROBABILISTIC FORECAST INFORMATION
  • DEVELOPERS:
  • Need to improve performance - reduce error in estimate of first moment
  • Traditional NWP activities (i.e., model and data assimilation development)
  • Need to account for uncertainty - estimate higher moments
  • New aspect: How to do this?
  • Forecast is incomplete without information on forecast uncertainty
  • NEED TO USE PROBABILISTIC FORECAST FORMAT
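A minimal sketch of this cost/loss decision rule (Python; the numbers mirror the frost example above, while the function and variable names are illustrative):

```python
# Cost/loss decision sketch for the frost example above (illustrative names).
cost = 10_000       # cost of protective action ("10k")
loss = 100_000      # loss if frost occurs and no action is taken ("100k")
threshold = cost / loss  # = 0.1; protect when P(frost) exceeds this ratio

def expected_expense(p_frost: float) -> float:
    """Expected expense of the economically optimal action given P(frost)."""
    protect = cost              # pay the cost regardless of outcome
    no_action = p_frost * loss  # risk the loss with probability p_frost
    return min(protect, no_action)

for p in (0.05, 0.10, 0.25):
    action = "protect" if p > threshold else "do nothing"
    print(f"P(frost)={p:.2f}: {action}, expected expense ${expected_expense(p):,.0f}")
```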

6
GENERATION OF PROBABILISTIC FORECASTS
  • How to determine forecast probability?
  • Fully statistical methods are losing relevance
  • Numerical modeling:
  • Liouville equations provide pdfs
  • Not practical (computationally intractable)
  • Finite sample of pdf:
  • Single or multiple (ensemble) integrations
  • Increasingly finer-resolution estimates of probabilities
  • How to make (probabilistic) forecasts reliable?
  • Construct pdf
  • Assess reliability
  • Construct frequency distribution of observations following forecast classes
  • Replace form of forecast with associated frequency distribution of observations
  • Production and verification of forecasts connected in operations

7
FORECASTING IN A CHAOTIC ENVIRONMENT
PROBABILISTIC FORECASTING BASED ON A SINGLE
FORECAST: One integration with an NWP model,
combined with past verification statistics
DETERMINISTIC APPROACH - PROBABILISTIC FORMAT
  • Does not contain all forecast information
  • Not the best estimate for the future evolution of the system
  • UNCERTAINTY CAPTURED IN A TIME-AVERAGE SENSE -
  • NO ESTIMATE OF CASE-DEPENDENT VARIATIONS IN FCST UNCERTAINTY

8
SCIENTIFIC BACKGROUND: WEATHER FORECASTS ARE
UNCERTAIN (Buizza 2002)
9
  • FORECASTING IN A CHAOTIC ENVIRONMENT - 2
  • DETERMINISTIC APPROACH - PROBABILISTIC FORMAT
  • PROBABILISTIC FORECASTING -
  • Based on Liouville equations
  • Continuity equation for probabilities, given dynamical eqs. of motion
  • Initialize with probability distribution function (pdf) at analysis time
  • Dynamical forecast of pdf based on conservation of probability values
  • Prohibitively expensive -
  • Very high-dimensional problem (state space x probability space)
  • Separate integration for each lead time
  • Closure problems when simplified solution is sought

10
FORECASTING IN A CHAOTIC ENVIRONMENT - 3
DETERMINISTIC APPROACH - PROBABILISTIC FORMAT
  • MONTE CARLO APPROACH: ENSEMBLE FORECASTING
  • IDEA: Sample sources of forecast error
  • Generate initial ensemble perturbations
  • Represent model-related uncertainty
  • PRACTICE: Run multiple NWP model integrations
  • Advantage of perfect parallelization
  • Use lower spatial resolution if short on resources
  • USAGE: Construct forecast pdf based on finite sample
  • Ready to be used in real-world applications
  • Verification of forecasts
  • Statistical post-processing (remove bias in 1st, 2nd, higher moments)
  • CAPTURES FLOW-DEPENDENT VARIATIONS
  • IN FORECAST UNCERTAINTY

11
[Schematic: 6-hour ET / breeding cycle - ensemble perturbations rescaled every 6 hours (cycles at T00Z, T06Z, T12Z, T18Z), with each cycle integrated up to 16 days before the next T00Z cycle]
12
USER REQUIREMENTS: PROBABILISTIC FORECAST
INFORMATION IS CRITICAL

13
HOW TO DEAL WITH FORECAST UNCERTAINTY?
  • No matter what (or how sophisticated) forecast methods we use:
  • Forecast skill is limited
  • Skill varies from case to case
  • Forecast uncertainty must be assessed by meteorologists

THE PROBABILISTIC APPROACH
14
SOCIO-ECONOMIC BENEFITS OF A SEAMLESS
WEATHER/CLIMATE FORECAST SUITE
[Diagram: applications - commerce, energy, ecosystem, health, hydropower, agriculture, reservoir control, recreation, transportation, fire weather, flood mitigation, navigation, protection of life/property - arranged by lead time from minutes through hours, days, weeks, months, and seasons to years; initial-condition sensitivity dominates at short ranges, boundary-condition sensitivity at long ranges]
15
ENSEMBLE FORECASTS
  • Definition:
  • Finite sample to estimate full probability distribution
  • Full solution (Liouville eqs.) computationally intractable
  • Interpretation (assignment of probabilities):
  • Crude (see the sketch below):
  • Step-wise increase in cumulative forecast probability distribution
  • Performance dependent on size of ensemble
  • Enhanced:
  • Inter-/extrapolation (dressing)
  • Performance improvement depends on quality of inter-/extrapolation
  • Based on assumptions:
  • Linear interpolation (each member equally likely)
  • Based on verification statistics:
  • Kernel or other methods (inclusion of some statistical bias correction)
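As a concrete illustration of the crude interpretation, here is a minimal sketch (Python with NumPy; the ensemble values are made up):

```python
import numpy as np

def event_probability(members, threshold):
    """Crude interpretation: each member is equally likely, so the forecast
    probability of an event is the fraction of members in which it occurs."""
    return float(np.mean(np.asarray(members) > threshold))

# Hypothetical 10-member 2-m temperature ensemble (K); event: T > freezing.
ens = [271.2, 272.8, 273.4, 273.9, 274.1, 274.6, 275.0, 275.3, 276.1, 277.5]
print(event_probability(ens, 273.15))  # -> 0.8
```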

16-17
(Image slides - no transcript)
18
[144-hr forecast example: poorly predictable large-scale wave over the Eastern Pacific / Western US; highly predictable small-scale wave over the Eastern US; verification shown]
19-21
(Image slides - no transcript)
22
FORECAST EVALUATION
  • Statistical approach:
  • Evaluates a set of forecasts, not a single forecast
  • Interest is in comparing forecast systems:
  • Forecasts generated by the same procedure
  • Sample size affects how fine a stratification is possible:
  • Level of detail is limited
  • Size of sample limited by available obs. record (even for hindcasts)
  • Statistical significance in comparative verification
  • Error in proxy for truth:
  • Observations or numerical analysis
  • Types:
  • Forecast statistics:
  • Depend only on forecast properties
  • Verification statistics:
  • Comparison of forecast and proxy for truth in a statistical sense
  • Depend on both natural and forecast systems
  • Nature represented by proxy:
  • Observations (including observational error)

23
FORECAST VERIFICATION
  • Types:
  • Measures of quality:
  • Environmental science issues
  • Main focus here
  • Measures of utility:
  • Multidisciplinary
  • Social / economic issues, beyond environmental sciences
  • Socio-economic value of forecasts is the ultimate measure
  • Approximate measures can be constructed
  • Quality vs. utility:
  • Improved quality:
  • Generally permits enhanced utility (assumption)
  • How to improve utility if quality is fixed?
  • Providers communicate all available information
  • E.g., offer probabilistic or other information on forecast uncertainty
  • Engage in education, training
  • Users identify forecast aspects important to them
  • Can providers selectively improve certain aspects of forecasts?

24
EVALUATING QUALITY OF FORECAST SYSTEMS
  • Goal:
  • Infer comparative information about forecast systems
  • Value added by:
  • New methods
  • Subsequent steps in end-to-end forecast process (e.g., manual changes)
  • Critical for monitoring and improving operational forecast systems
  • Attributes of forecast systems:
  • Traditionally, forecast attributes defined separately for each fcst format
  • General definition needed:
  • Need to compare forecasts
  • From any system
  • Of any type / format
  • Single, ensemble, categorical, probabilistic, etc.
  • Supports systematic evaluation of:
  • End-to-end (provider-user) forecast process
  • Statistical post-processing as integral part of system

25
FORECAST SYSTEM ATTRIBUTES
  • Abstract concept (like length):
  • Reliability and Resolution
  • Both can be measured through different statistics
  • Statistical property:
  • Interpreted for a large set of forecasts
  • Describe behavior of forecast system, not a single forecast
  • For their definition, assume that:
  • Forecasts:
  • Can be of any format
  • Single value, ensemble, categorical, probabilistic, etc.
  • Take a finite number of different classes Fa
  • Observations:
  • Can also be grouped into a finite number of classes, like Oa

26
STATISTICAL RELIABILITY (TEMPORAL AGGREGATE):
STATISTICAL CONSISTENCY OF FORECASTS WITH
OBSERVATIONS
  • BACKGROUND:
  • Consider a particular forecast class Fa
  • Consider the frequency distribution of observations that follow forecasts Fa - fdo_a
  • DEFINITION:
  • If forecast Fa has the exact same form as fdo_a, for all forecast classes,
  • the forecast system is statistically consistent with observations =>
  • The forecast system is perfectly reliable
  • MEASURES OF RELIABILITY:
  • Based on different ways of comparing Fa and fdo_a (see the sketch below)
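A minimal sketch of how fdo_a can be estimated in practice for probability forecasts of a binary event (Python with NumPy; the bin count and names are illustrative):

```python
import numpy as np

def observed_frequencies(fcst_prob, obs_event, bins=10):
    """Group forecasts into probability classes F_a and compute the frequency
    of the observed event following each class (an estimate of fdo_a).
    Perfect reliability: observed frequency matches the class probability."""
    fcst_prob = np.asarray(fcst_prob, float)
    obs_event = np.asarray(obs_event, float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(fcst_prob, edges) - 1, 0, bins - 1)
    freq = np.full(bins, np.nan)   # NaN where a class was never forecast
    for a in range(bins):
        mask = idx == a
        if mask.any():
            freq[a] = obs_event[mask].mean()
    return edges, freq
```

Plotting freq against the class centers gives the reliability diagram used later in this presentation.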

27
STATISTICAL RESOLUTION (TEMPORAL EVOLUTION):
ABILITY TO DISTINGUISH, AHEAD OF TIME, AMONG
DIFFERENT OUTCOMES
  • BACKGROUND:
  • Assume observed events are classified into a finite number of classes, like Oa
  • DEFINITION:
  • If all observed classes (Oa, Ob, ...) are preceded by
  • Distinctly different forecasts (Fa, Fb, ...) =>
  • The forecast system resolves the problem =>
  • The forecast system has perfect resolution
  • MEASURES OF RESOLUTION:
  • Based on degree of separation of the fdo's that follow the various forecast classes
  • Measured by difference between the fdo's and the climate distribution
  • Measures differ by how differences between distributions are quantified

[Schematic: examples of forecast and observation class distributions]
28
CHARACTERISTICS OF RELIABILITY RESOLUTION
  • Reliability:
  • Related to form of forecast, not forecast content
  • Fidelity of forecast:
  • Reproduces nature - when resolution is perfect, the forecast looks like nature
  • Not related to time sequence of forecast/observed systems
  • How to improve?
  • Make model more realistic
  • Also expected to improve resolution
  • Statistical bias correction: can be statistically imposed at one time level
  • If both natural and forecast systems are stationary in time
  • If there is a large enough set of observed-forecast pairs
  • Link with verification:
  • Replace forecast with corresponding fdo
  • Resolution:
  • Related to inherent predictive value of forecast system
  • Not related to form of forecasts:
  • Statistical consistency at one time level (reliability) is irrelevant
  • How to improve?

29
CHARACTERISTICS OF FORECAST SYSTEM ATTRIBUTES
  • RELIABILITY AND RESOLUTION ARE:
  • General forecast attributes:
  • Valid for any forecast format (single, categorical, probabilistic, etc.)
  • Independent attributes:
  • For example:
  • A climate pdf forecast is perfectly reliable, yet has no resolution
  • A reversed rain / no-rain forecast can have perfect resolution and no reliability
  • To separate them, they must be measured according to the general definition:
  • If measured according to traditional, narrower definitions:
  • Reliability and resolution can be mixed
  • Function of forecast quality:
  • There is no other relevant forecast attribute
  • Perfect reliability and perfect resolution = perfect forecast system:
  • Deterministic forecast system that is always correct
  • Both needed for utility of forecast systems

30
FORMAT OF FORECASTS: PROBABILISTIC FORMAT
  • Do we have a choice?
  • When forecasts are imperfect:
  • Only the probabilistic format can be reliable/consistent with nature
  • Abstract concept:
  • Related to forecast system attributes
  • Space of probability: dimensionless pdf or similar format
  • For environmental variables (not those variables themselves)
  • Definition:
  • Define event:
  • Function of concrete variables, features, etc.
  • E.g., temperature above freezing; thunderstorm
  • Determine probability of event occurring in the future:
  • Based on knowledge of initial state and evolution of system

31
OPERATIONAL PROB/ENSEMBLE FORECAST VERIFICATION
  • Requirements:
  • Use the same general, dimensionless probabilistic measures for verifying:
  • Any event
  • Against either:
  • Observations or
  • Numerical analysis
  • Measures used at NCEP:
  • Probabilistic forecast measures (ensemble interpreted probabilistically):
  • Reliability:
  • Component of BSS, RPSS, CRPSS
  • Attributes and Talagrand diagrams
  • Resolution:
  • Component of BSS, RPSS, CRPSS
  • ROC, attributes diagram, potential economic value
  • Special ensemble verification procedures:
  • Designed to assess performance of a finite set of forecasts
  • Most likely member statistics, PECA

32
FORECAST PERFORMANCE MEASURES
COMMON CHARACTERISTIC: Function of both forecast
and observed values
MEASURES OF RELIABILITY - DESCRIPTION: Statistically
compares any sample of forecasts with the sample of
corresponding observations. GOAL: To assess the
similarity of the samples (e.g., whether 1st and 2nd
moments match). EXAMPLES: Reliability component of
Brier Score / Ranked Probability Score, Analysis Rank
Histogram, Spread vs. Ens. Mean error, etc.
MEASURES OF RESOLUTION - DESCRIPTION: Compares the
distributions of observations that follow different
classes of forecasts with the climate distribution
(as reference). GOAL: To assess how well the
observations are separated when grouped by different
classes of preceding fcsts. EXAMPLES: Resolution
component of Brier Score / Ranked Probability Score,
Information content, Relative Operating
Characteristics, Relative Economic Value, etc.
COMBINED (REL + RES) MEASURES: Brier and Cont. Ranked
Prob. Scores, rmse, PAC, ...
33
EXAMPLE: PROBABILISTIC FORECASTS
RELIABILITY: Forecast probabilities for a given
event match observed frequencies of that event
(given the prob. fcst). RESOLUTION: Many forecasts
fall into classes corresponding to high or low
observed frequency of the given event (occurrence
and non-occurrence of the event are well resolved
by the fcst system)
34
(Image slide - no transcript)
35
PROBABILISTIC FORECAST PERFORMANCE MEASURES
TO ASSESS THE TWO MAIN ATTRIBUTES OF PROBABILISTIC
FORECASTS, RELIABILITY AND RESOLUTION:
Univariate measures - statistics accumulated point
by point in space. Multivariate measures - spatial
covariance is considered.
EXAMPLE: BRIER SKILL SCORE (BSS) -
COMBINED MEASURE OF RELIABILITY AND RESOLUTION
36
BRIER SKILL SCORE (BSS)
COMBINED MEASURE OF RELIABILITY AND RESOLUTION
  • METHOD:
  • Compares pdf against analysis
  • Resolution (random error)
  • Reliability (systematic error)
  • EVALUATION:
  • BSS: higher is better
  • Resolution: higher is better
  • Reliability: lower is better
  • RESULTS:
  • Resolution dominates initially
  • Reliability becomes important later
  • ECMWF best throughout
  • Good analysis/model?
  • NCEP good at days 1-2
  • Good initial perturbations?
  • No model perturbations hurts later?
  • CANADIAN good at days 8-10

May-June-July 2002 average Brier skill score for
the EC-EPS (grey lines with full circles), the
MSC-EPS (black lines with open circles) and the
NCEP-EPS (black lines with crosses). Bottom:
resolution (dotted) and reliability (solid)
contributions to the Brier skill score. Values
refer to the 500 hPa geopotential height over the
northern hemisphere latitudinal band 20º-80ºN,
and have been computed considering 10
equally-climatologically-likely intervals (from
Buizza, Houtekamer, Toth et al., 2004)
37
BRIER SKILL SCORE
COMBINED MEASURE OF RELIABILITY AND RESOLUTION
38
RANKED PROBABILITY SCORE
COMBINED MEASURE OF RELIABILITY AND RESOLUTION
39
Continuous Ranked Probability Score

CRPS = integral of [P_fcst(x) - H(x - x_o)]² dx, where P_fcst is the forecast cumulative distribution and H is the Heaviside function stepping from 0 to 100% probability at the observation (truth) x_o. The CRP Skill Score is CRPSS = 1 - CRPS / CRPS_ref.

[Figure: step-wise forecast CDF built from the 10 ordered ensemble members (p01, p02, ..., p10), compared against the Heaviside step at the observed value Xo]
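A minimal sketch of the CRPS integral for a finite ensemble (Python with NumPy; this is exact for the step-wise CDF, since the integrand is piecewise constant between break points):

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS = integral of [P_fcst(x) - H(x - obs)]^2 dx, with P_fcst the
    step-wise CDF of the sorted members (each member equally likely)."""
    x = np.sort(np.asarray(members, float))
    n = x.size
    pts = np.sort(np.append(x, obs))  # integrand changes only at these points
    total = 0.0
    for lo, hi in zip(pts[:-1], pts[1:]):
        mid = 0.5 * (lo + hi)
        cdf = np.searchsorted(x, mid, side="right") / n  # forecast CDF
        heaviside = 1.0 if mid >= obs else 0.0           # observed "CDF"
        total += (cdf - heaviside) ** 2 * (hi - lo)
    return total

print(crps_ensemble([0.0, 1.0], 0.5))  # -> 0.25
```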
40
ANALYSIS RANK HISTOGRAM (TALAGRAND DIAGRAM)
MEASURE OF RELIABILITY
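A minimal sketch of the rank histogram computation (Python with NumPy; ties between the observation and ensemble members are ignored for simplicity, and the synthetic data are illustrative):

```python
import numpy as np

def rank_histogram(ens, obs):
    """Talagrand diagram: count where the verifying value ranks within each
    N-member ensemble (N+1 possible bins). A flat histogram indicates
    statistical consistency; a U-shape indicates under-dispersion."""
    ens, obs = np.asarray(ens), np.asarray(obs)
    n_members = ens.shape[1]
    ranks = np.sum(ens < obs[:, None], axis=1)  # members below the obs
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(0)
ens = rng.normal(size=(5000, 10))   # synthetic, statistically consistent case
obs = rng.normal(size=5000)         # truth drawn from the same pdf
print(rank_histogram(ens, obs))     # roughly flat counts across 11 bins
```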
41
ENSEMBLE MEAN ERROR VS. ENSEMBLE SPREAD
MEASURE OF RELIABILITY
Statistical consistency between the ensemble and
the verifying analysis means that the verifying
analysis should be statistically
indistinguishable from the ensemble members
=> the ensemble mean error (distance between ens.
mean and analysis) should be equal to the ensemble
spread (distance between ensemble mean and
ensemble members)
In case of a statistically consistent ensemble,
ens. spread = ens. mean error, and they are both
a MEASURE OF RESOLUTION. In the presence of bias,
both rms error and PAC will be a combined measure
of reliability and resolution
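A minimal sketch of the spread vs. ensemble-mean-error comparison (Python with NumPy; the array shapes are assumptions):

```python
import numpy as np

def spread_vs_error(ens, analysis):
    """ens: (n_cases, n_members); analysis: (n_cases,).
    For a statistically consistent ensemble, the RMS ensemble-mean error
    should match the ensemble spread (up to a finite-ensemble-size factor)."""
    ens_mean = ens.mean(axis=1)
    rms_error = np.sqrt(np.mean((ens_mean - analysis) ** 2))
    spread = np.sqrt(np.mean(ens.var(axis=1, ddof=1)))
    return spread, rms_error
```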
42
INFORMATION CONTENT
MEASURE OF RESOLUTION
43
RELATIVE OPERATING CHARACTERISTICS
MEASURE OF RESOLUTION
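A minimal sketch of how ROC points are obtained from probabilistic forecasts of a binary event (Python with NumPy; the threshold set is illustrative):

```python
import numpy as np

def roc_points(fcst_prob, obs_event, thresholds=np.linspace(0, 1, 11)):
    """Hit rate vs. false alarm rate as the probability threshold for
    issuing a 'yes' forecast is varied; the area under the resulting
    curve summarizes resolution."""
    p = np.asarray(fcst_prob, float)
    o = np.asarray(obs_event, bool)
    points = []
    for t in thresholds:
        yes = p >= t
        hits = np.sum(yes & o)
        misses = np.sum(~yes & o)
        false_alarms = np.sum(yes & ~o)
        corr_negatives = np.sum(~yes & ~o)
        hit_rate = hits / max(hits + misses, 1)
        fa_rate = false_alarms / max(false_alarms + corr_negatives, 1)
        points.append((fa_rate, hit_rate))
    return points
```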
44
ECONOMIC VALUE OF FORECASTS
MEASURE OF RESOLUTION
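A minimal sketch of the potential-economic-value calculation for a user with cost/loss ratio a, following the standard static cost/loss model (an assumption here; the slide itself shows only the resulting value curves):

```python
def relative_value(hit_rate, false_alarm_rate, base_rate, cost_loss):
    """V = (E_clim - E_fcst) / (E_clim - E_perfect), expenses in units of
    the loss L. cost_loss = C/L; base_rate s = climatological frequency."""
    a, s = cost_loss, base_rate
    e_clim = min(a, s)            # best of always / never protecting
    e_perfect = a * s             # protect exactly when the event occurs
    e_fcst = (a * (hit_rate * s + false_alarm_rate * (1 - s))
              + (1 - hit_rate) * s)   # misses incur the full loss
    return (e_clim - e_fcst) / (e_clim - e_perfect)
```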
45
PERTURBATION VS. ERROR CORRELATION ANALYSIS (PECA)
MULTIVARIATE COMBINED MEASURE OF RELIABILITY AND
RESOLUTION
  • METHOD: Compute correlation between ens. perturbations and error in the control fcst for:
  • Individual members
  • Optimal combination of members
  • Each ensemble
  • Various areas, all lead times
  • EVALUATION: Large correlation indicates the ens. captures the error in the control forecast
  • Caveat: errors defined by analysis
  • RESULTS:
  • Canadian best on large scales
  • Benefit of model diversity?
  • ECMWF gains most from combinations
  • Benefit of orthogonalization?
  • NCEP best on small scales, short term
  • Benefit of breeding (best estimate of initial error)?
  • PECA increases with lead time:
  • Lyapunov convergence
  • Nonlinear saturation
  • Higher values on small scales
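A minimal sketch of the core PECA computation for a single case and domain (Python with NumPy; the array shapes and names are assumptions):

```python
import numpy as np

def peca(perturbations, control_error):
    """Correlate each ensemble perturbation (member minus control) with the
    control forecast error (analysis minus control) over a domain.
    perturbations: (n_members, n_gridpoints); control_error: (n_gridpoints,).
    Large correlations => the ensemble captures the control's error."""
    e = control_error - control_error.mean()
    corrs = []
    for p in perturbations:
        p = p - p.mean()
        corrs.append(np.dot(p, e) / np.sqrt(np.dot(p, p) * np.dot(e, e)))
    return np.array(corrs)
```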

46
WHAT DO WE NEED FOR POSTPROCESSING TO WORK?
  • LARGE SET OF FCST - OBS PAIRS:
  • Consistency is defined over a large sample - the same is needed for post-processing
  • The larger the sample, the more detailed the corrections that can be made
  • BOTH FCST AND REAL SYSTEMS MUST BE STATIONARY IN TIME:
  • Otherwise post-processing can make things worse
  • Subjective forecasts are difficult to calibrate

HOW DO WE MEASURE STATISTICAL INCONSISTENCY?
  • MEASURES OF STATISTICAL RELIABILITY:
  • Time mean error
  • Analysis rank histogram (Talagrand diagram)
  • Reliability component of Brier etc. scores
  • Reliability diagram

47
SOURCES OF STATISTICAL INCONSISTENCY
  • TOO FEW FORECAST MEMBERS:
  • Single forecast inconsistent by definition, unless perfect
  • MOS fcst hedged toward climatology as fcst skill is lost
  • Small ensemble: sampling error due to limited ensemble size (Houtekamer 1994?)
  • MODEL ERROR (BIAS):
  • Deficiencies due to various problems in NWP models
  • Effect is exacerbated with increasing lead time
  • SYSTEMATIC ERRORS (BIAS) IN ANALYSIS:
  • Induced by observations:
  • Effect dies out with increasing lead time
  • Model related:
  • Bias manifests itself even in initial conditions
  • ENSEMBLE FORMATION (IMPROPER SPREAD):
  • Inappropriate initial spread
  • Lack of representation of model-related uncertainty in ensemble
  • i.e., use of a simplified model that is not able to account for model-related uncertainty

48
HOW TO IMPROVE STATISTICAL CONSISTENCY?
  • MITIGATE SOURCES OF INCONSISTENCY:
  • TOO FEW MEMBERS:
  • Run a larger ensemble
  • MODEL ERRORS:
  • Make models more realistic
  • INSUFFICIENT ENSEMBLE SPREAD:
  • Enhance models so they can represent model-related forecast uncertainty
  • OTHERWISE =>
  • STATISTICALLY ADJUST FCST TO REDUCE INCONSISTENCY:
  • Not the preferred way of doing it:
  • What we learn can feed back into development to mitigate problems at their sources
  • Can have a LARGE impact on (inexperienced) users
  • Two separate issues:
  • Bias-correct against NWP analysis:
  • Reduce lead-time-dependent model behavior
  • Downscale NWP analysis:
  • Connect with observed variables that are unresolved by NWP models

49
(Image slide - no transcript)
50
OUTLINE / SUMMARY
  • SCIENCE OF FORECASTING
  • GOAL OF SCIENCE: Forecasting
  • VERIFICATION: Model development, user feedback
  • GENERATION OF PROBABILISTIC FORECASTS
  • SINGLE FORECASTS: Statistical rendition of pdf
  • ENSEMBLE FORECASTS: NWP-based, case-dependent pdf
  • ATTRIBUTES OF FORECAST SYSTEMS
  • RELIABILITY: Forecasts look like nature statistically
  • RESOLUTION: Forecasts indicate actual future developments
  • VERIFICATION OF PROBABILISTIC ENSEMBLE FORECASTS
  • UNIFIED PROBABILISTIC MEASURES: Dimensionless
  • ENSEMBLE MEASURES: Evaluate finite sample
  • STATISTICAL POSTPROCESSING OF FORECASTS
  • STATISTICAL RELIABILITY: Make it perfect
  • STATISTICAL RESOLUTION: Keep it unchanged

51
http://wwwt.emc.ncep.noaa.gov/gmb/ens/ens_info.html

Toth, Z., O. Talagrand, and Y. Zhu, 2005: The
Attributes of Forecast Systems: A Framework for
the Evaluation and Calibration of Weather
Forecasts. In: Predictability Seminars, 9-13
September 2002, Ed. T. Palmer, ECMWF, pp. 584-595.

Toth, Z., O. Talagrand, G. Candille, and Y. Zhu,
2003: Probability and ensemble forecasts. In:
Forecast Verification: A Practitioner's Guide in
Atmospheric Science. Eds. I. T. Jolliffe and D. B.
Stephenson. Wiley, pp. 137-164.
52
BACKGROUND
53
NOTES FOR NEXT YEAR
  • Define predictand:
  • Exhaustive set of events, e.g.:
  • Continuous temperature
  • Precipitation type (categorical)

54
SUMMARY
  • WHY DO WE NEED PROBABILISTIC FORECASTS?
  • Isn't the atmosphere deterministic? YES, but it's also CHAOTIC
  • FORECASTER'S PERSPECTIVE / USER'S PERSPECTIVE:
  • Ensemble techniques: probabilistic description
  • WHAT ARE THE MAIN ATTRIBUTES OF FORECAST SYSTEMS?
  • RELIABILITY: Statistical consistency with distribution of corresponding observations
  • RESOLUTION: Different events are preceded by different forecasts
  • WHAT ARE THE MAIN TYPES OF FORECAST METHODS?
  • EMPIRICAL: Good reliability, limited resolution (problems in new situations)
  • THEORETICAL: Potentially high resolution, prone to inconsistency
  • ENSEMBLE METHODS:
  • Only practical way of capturing fluctuations in forecast uncertainty due to:
  • Case-dependent dynamics acting on errors in:
  • Initial conditions
  • Forecast methods
  • HOW CAN PROBABILISTIC FORECAST PERFORMANCE BE MEASURED?

55
OUTLINE
  • STATISTICAL EVALUATION OF FORECAST SYSTEMS
  • ATTRIBUTES OF FORECAST SYSTEMS
  • FORECAST METHODS
  • EMPIRICALLY BASED
  • THEORETICALLY BASED
  • LIMITS OF PREDICTABILITY
  • LIMITING FACTORS
  • ASSESSING PREDICTABILITY
  • Ensemble forecasting
  • VERIFICATION MEASURES
  • MEASURING FORECAST SYSTEM ATTRIBUTES
  • STATISTICAL POST-PROCESSING OF FORECASTS
  • IMPROVING STATISTICAL RELIABILITY

56
CRPS Decomposition
  • Yuejian Zhu
  • Environmental Modeling Center
  • NOAA/NWS/NCEP
  • Acknowledgments:
  • Zoltan Toth, EMC

57
Continuous Ranked Probability Score

CRPS = integral of [P_fcst(x) - H(x - x_o)]² dx; the CRP Skill Score is CRPSS = 1 - CRPS / CRPS_ref.

[Figure: step-wise forecast CDF from the 10 ordered ensemble members (p01, p02, ..., p10) vs. the Heaviside step H at the observation (truth) Xo]
58
CRPS Decomposition

[Figure: general example - step-wise forecast CDF (P-probability, 0-100%) from the 10 ordered ensemble members (p01, p02, ..., p10) and the observation (truth) Xo]
59
CRPS Decomposition

Example of outlier (right):
[Figure: all 10 ensemble members fall below the observation (truth) Xo, which lies to the right of the entire forecast CDF]
60
CRPS Decomposition

Example of outlier (left):
[Figure: all 10 ensemble members fall above the observation (truth) Xo, which lies to the left of the entire forecast CDF]
61
CRPS Decomposition

[Decomposition equations shown as images in the original slide]
62
CRPS Decomposition
Time and space average:
[Figure: general example - CDF of the forecast vs. CDF of the observation frequency (P-probability, 0-100%) over the 10 ordered ensemble members (p01, p02, ..., p10)]
63
CRPS Decomposition
Reliability diagram:
[Figure: observed relative frequency (%) vs. forecast probability (%), both axes 0-100]
64
CRPS Decomposition
Reliability diagram:
[Figure: left and right outliers plot at the 100% unreliable extremes of the diagram - observed relative frequency (%) vs. forecast probability (%)]
65
CRPS Decomposition
Reliability diagram:
[Figure: points on the diagonal are 100% reliable - observed relative frequency (%) vs. forecast probability (%)]
66
CRPS Decomposition
Ranges of the decomposition terms:
CRPS: 0 to 1.0; RELI: 0 to 0.5; RESO: 0 to 1.0; UNCE: 0 to 1.0
67
Ranked Probability Score
The Ranked (ordered) Probability Score (RPS)
verifies multi-category probability forecasts,
measuring both reliability and resolution, based
on climatologically equally likely bins:

RPS = sum over m = 1..K of [ sum over k = 1..m of (P_k - O_k) ]²

where P_k and O_k are the forecast and observed
probabilities of category k, and K is the number
of categories.

Example with K = 10 climatologically equally likely
bins and a 10-member ensemble; the verifying
analysis falls in bin i5:

Bin:                i1  i2  i3  i4  i5  i6  i7  i8  i9  i10
OBS O_n:             0   0   0   0   1   0   0   0   0   0
FCST PROB P_n (%):   0   0  20  10   0  10  30  20   0  10
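A minimal sketch of the RPS for the example above (Python with NumPy):

```python
import numpy as np

def rps(fcst_prob, obs_category):
    """Ranked Probability Score over K ordered categories: sum of squared
    differences between the cumulative forecast and observed distributions."""
    p = np.asarray(fcst_prob, float)
    o = np.zeros_like(p)
    o[obs_category] = 1.0
    return float(np.sum((np.cumsum(p) - np.cumsum(o)) ** 2))

# The slide's example: 10 equally likely bins, analysis falling in bin i5.
p = np.array([0, 0, 20, 10, 0, 10, 30, 20, 0, 10]) / 100.0
print(rps(p, 4))  # -> 1.09
```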
68
RMSE and Spread
[Figures: mean and absolute errors, and CRPSS, for 10-meter wind (u-component) - a less biased variable, so there is less room to improve the skill by bias correction alone]
69
[Figures: 24-h improvement by NAEFS; RPSS vs. CRPSS; ROC score - Winter 2006-2007 NH 2-m temperature for the NCEP raw forecast (black), NCEP bias-corrected forecast (red), and NAEFS forecast (pink)]
70
Brier Score (and decomposition)
See <<Statistical Methods in the Atmospheric
Sciences>> by D. S. Wilks, Chapter 7: Forecast
Verification
1. BS (Brier Score)

BS = (1/n) * sum over k = 1..n of (y_k - o_k)²

where y is a forecast probability and o is an
observation (probability); index k runs over the n
forecast/event pairs. y and o are limited to the
range 0 to 1 in the probability sense.
BS = 0 is a perfect forecast; BS = 1 misses
everything.
2. BSS (Brier Skill Score)

BSS = (BS - BS_ref) / (BS_perf - BS_ref) = 1 - BS / BS_ref

decomposable into Resolution, Reliability, and
Uncertainty terms; ref is the reference, which is
usually climatology; BS_perf = 0 for a perfect
forecast; BSS ranges from 0 to 1.
71
Brier Score (and decomposition)
3. Algebraic Decomposition of the Brier Score
After some algebra, the Brier Score can be
expressed as three separate terms:

BS = (1/n) * sum over i = 1..I of N_i (y_i - obar_i)²    [Reliability]
   - (1/n) * sum over i = 1..I of N_i (obar_i - obar)²   [Resolution]
   + obar (1 - obar)                                     [Uncertainty]

where obar_i is the conditional probability of the
observation given the N_i forecasts in class i (with
forecast value y_i), and obar is the sample
climatology.
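A minimal sketch of this decomposition (Python with NumPy; forecasts are grouped into discrete probability classes by binning, which approximates the algebraic identity when forecast values are continuous):

```python
import numpy as np

def brier_decomposition(y, o, bins=10):
    """BS = Reliability - Resolution + Uncertainty (cf. Wilks, Ch. 7).
    y: forecast probabilities in [0, 1]; o: binary outcomes (0 or 1)."""
    y, o = np.asarray(y, float), np.asarray(o, float)
    n, obar = y.size, o.mean()
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(y, edges) - 1, 0, bins - 1)
    rel = res = 0.0
    for i in range(bins):
        m = idx == i
        if m.any():
            n_i, y_i, obar_i = m.sum(), y[m].mean(), o[m].mean()
            rel += n_i * (y_i - obar_i) ** 2   # reliability term
            res += n_i * (obar_i - obar) ** 2  # resolution term
    unc = obar * (1.0 - obar)                  # uncertainty term
    return rel / n, res / n, unc
```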
72
Brier Score (and decomposition)
4. Example of BS calculation
Consider three equally likely bins: C_b < 22,
22 <= C_n <= 26, and C_a > 26.
The average Brier Score is 0.133 for this case:
BS = 0.133 (range from 0 to 1).
73
Brier Score (and decomposition)
5. Example of BS decomposition calculation
Rel = 0.0056, Res = 0.0889, Unc = 0.2222;
BS = Rel - Res + Unc = 0.1389
74
Prob. Evaluation (multi-categories)
  • 4. Reliability and possible calibration (remove bias)
  • For period precipitation evaluation

[Figure: reliability diagram - raw vs. calibrated forecast curves, skill line, and resolution line at the climatological probability (0.16); observed frequency (%) vs. forecast probability]