Title: New developments and issues in forecast verification
Slide 1: New developments and issues in forecast verification
- Barbara Brown
- bgb_at_ucar.edu
- Co-authors and contributors: Randy Bullock, John Halley Gotway, Chris Davis, David Ahijevych, Eric Gilleland, Lacey Holland
- NCAR
- Boulder, Colorado
- October 2007
Slide 2: Issues
- Uncertainty in verification statistics
- Diagnostic and user relevant verification
- Verification of high-resolution forecasts
- Spatial forecast verification
- Incorporation of observational uncertainty
- Verification of probabilistic and ensemble forecasts
- Verification of extremes
- Properties of verification measures
- Propriety, Equitability
Slide 3: Issues and new developments
(Same list of issues as Slide 2.)
Slide 4: Uncertainty in verification measures
Model precipitation example: Equitable Threat Score (ETS)
Confidence intervals take into account various sources of error, including sampling and observational error. Computation of confidence intervals for verification statistics is not always straightforward.
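As a concrete illustration, the sketch below computes ETS from the standard 2x2 contingency-table counts and attaches a percentile-bootstrap confidence interval by resampling matched forecast/observation event pairs. It is a minimal example rather than the method behind the results on this slide: it captures sampling uncertainty only, and the naive resampling assumes independent pairs, an assumption gridded precipitation usually violates (one reason these computations are not straightforward).

```python
import numpy as np

def ets(hits, misses, false_alarms, correct_negatives):
    """Equitable Threat Score (Gilbert skill score) from 2x2 contingency counts."""
    n = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom != 0 else np.nan

def ets_bootstrap_ci(fcst_event, obs_event, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for ETS from matched binary forecast/obs pairs."""
    rng = np.random.default_rng(seed)
    f = np.asarray(fcst_event, bool).ravel()
    o = np.asarray(obs_event, bool).ravel()
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, f.size, f.size)        # resample pairs with replacement
        fb, ob = f[idx], o[idx]
        scores.append(ets((fb & ob).sum(), (~fb & ob).sum(),
                          (fb & ~ob).sum(), (~fb & ~ob).sum()))
    lo, hi = np.nanpercentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```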
Slide 5: User-relevant verification
Good forecast or bad forecast?
Slide 6: User-relevant verification
Good forecast or bad forecast?
If I'm a water manager for this watershed, it's a pretty bad forecast.
Slide 7: User-relevant verification
Good forecast or bad forecast?
If I'm an aviation traffic strategic planner, it might be a pretty good forecast.
Different users have different ideas about what makes a good forecast.
Slide 8: Diagnostic and user-relevant forecast evaluation approaches
- Provide the link between weather forecasting and forecast value
- Identify and evaluate attributes of the forecasts that are meaningful for particular users
  - Users could be managers, forecast developers, forecasters, decision makers
- Answer questions about forecast performance in the context of users' decisions
  - Example questions: How do model changes impact user-relevant variables? What is the typical location error of a thunderstorm? Size of a temperature error? Timing error? Lead time?
Slide 9: Diagnostic and user-relevant forecast evaluation approaches (cont.)
- Provide more detailed information about forecast quality
  - What went wrong? What went right?
  - How can the forecast be improved?
  - How do two forecasts differ from each other, and in what ways is one better than the other?
Slide 10: High vs. low resolution
- Which rain forecast is better?
Smooth forecasts generally win according to traditional verification approaches.
(From E. Ebert)
Slide 11: Traditional measures-based approaches
Consider forecasts and observations of some dichotomous field on a grid.
Some problems with this approach:
(1) Non-diagnostic: doesn't tell us what was wrong with the forecast or what was right.
(2) Ultra-sensitive to small errors in the simulation of localized phenomena.
CSI = 0 for the first 4 examples; CSI > 0 for the 5th.
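For reference, the sketch below computes the usual scores from gridded yes/no fields and, using a made-up pair of fields, reproduces the sensitivity described above: a forecast rain object displaced so that it no longer overlaps the observed object earns POD = 0, FAR = 1, and CSI = 0, the same as forecasting nothing at all.

```python
import numpy as np

def categorical_scores(fcst, obs):
    """POD, FAR, CSI from gridded binary (event / no-event) fields."""
    f, o = np.asarray(fcst, bool), np.asarray(obs, bool)
    hits = (f & o).sum()
    misses = (~f & o).sum()
    false_alarms = (f & ~o).sum()
    pod = hits / (hits + misses) if hits + misses else np.nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else np.nan
    csi = hits / (hits + misses + false_alarms) if (hits + misses + false_alarms) else np.nan
    return pod, far, csi

# Hypothetical example: the forecast object is displaced so that it misses the
# observed object entirely -- POD = 0, FAR = 1, CSI = 0 despite a "close" forecast.
obs = np.zeros((20, 20), bool); obs[5:10, 2:8] = True
fcst = np.zeros((20, 20), bool); fcst[5:10, 10:16] = True
print(categorical_scores(fcst, obs))
```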
Slide 12: Spatial forecasts
Weather variables defined over spatial domains have coherent structure and features.
- Spatial verification techniques aim to:
  - account for uncertainties in timing and location
  - account for field spatial structure
  - provide information on error in physical terms
  - provide information that is diagnostic and meaningful to forecast users
Slide 13: Recent research on spatial verification methods
- Neighborhood verification methods: give credit to "close" forecasts
- Scale decomposition methods: measure scale-dependent error
- Object- and feature-based methods: evaluate attributes of identifiable features
- Field verification approaches: measure distortion and displacement (phase error) for the whole field
Slide 14: Neighborhood verification
- Also called fuzzy verification
- Upscaling (see the sketch below):
  - put observations and/or forecast on a coarser grid
  - calculate traditional metrics
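A minimal upscaling sketch, assuming 2-D NumPy fields whose dimensions are divisible by the coarsening factor; traditional scores are then computed on the coarse forecast and coarse observations exactly as at full resolution.

```python
import numpy as np

def upscale(field, factor):
    """Block-average a 2-D field onto a grid `factor` times coarser.
    Assumes both grid dimensions are divisible by `factor`."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

# Traditional metrics (POD, CSI, ETS, ...) are then applied to events defined
# on the upscaled forecast and observed fields, one coarsening factor at a time.
```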
Slide 15: Neighborhood verification
- Treatment of forecast data within a window:
  - Mean value (upscaling)
  - Occurrence of event in window
  - Frequency of event in window → probability
  - Distribution of values within window
- Fractions Skill Score (Roberts 2005; Roberts and Lean 2007), sketched below
[Figure: observed and forecast fields]
Ebert (2007, Meteorological Applications) provides a review and synthesis of these approaches.
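A minimal Fractions Skill Score sketch, assuming gridded forecast and observed precipitation as NumPy arrays and using SciPy's uniform filter to obtain the event fraction in each square neighborhood; the window size and intensity threshold are user choices, and FSS approaches 1 as the neighborhood widens.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, window):
    """Fractions Skill Score (Roberts and Lean 2007) for one intensity
    threshold and one square neighborhood of `window` x `window` grid points."""
    fb = (np.asarray(fcst) >= threshold).astype(float)
    ob = (np.asarray(obs) >= threshold).astype(float)
    pf = uniform_filter(fb, size=window, mode="constant")   # forecast event fractions
    po = uniform_filter(ob, size=window, mode="constant")   # observed event fractions
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```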
Slide 16: Scale decomposition
- Wavelet component analysis
  - Briggs and Levine, 1997
  - Casati et al., 2004
- Removes noise
- Examine how different scales contribute to traditional scores
- Does the forecast power spectrum match the observed power spectrum?
Slide 17: Scale decomposition
- Casati et al. (2004) intensity-scale approach
  - Wavelets applied to a binary image
  - Traditional score as a function of intensity threshold and scale
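The sketch below conveys the idea in simplified form: threshold both fields, form the binary error field, and split its mean squared error into contributions from dyadic spatial scales using Haar-type block averaging. It assumes a square domain whose side is a power of two and is only a skeleton of the published method, which applies a full 2-D Haar wavelet transform and converts the scale components into a skill score.

```python
import numpy as np

def mse_by_scale(fcst, obs, threshold):
    """Split the MSE of the binary error field (forecast minus observed
    exceedance of `threshold`) into dyadic scale contributions, finest first.
    Assumes a square 2^L x 2^L domain; returns L + 1 values that sum to the
    total MSE (the last entry is the domain-mean bias term)."""
    err = (np.asarray(fcst) >= threshold).astype(float) \
        - (np.asarray(obs) >= threshold).astype(float)
    levels = int(np.log2(err.shape[0]))
    components, coarse = [], err
    for _ in range(levels):
        ny, nx = coarse.shape
        block_mean = coarse.reshape(ny // 2, 2, nx // 2, 2).mean(axis=(1, 3))
        detail = coarse - np.kron(block_mean, np.ones((2, 2)))  # within-block deviations
        components.append(float(np.mean(detail ** 2)))
        coarse = block_mean
    components.append(float(coarse.ravel()[0] ** 2))
    return components
```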
Slide 18: Feature-based verification
- Composite approach (Nachamkin)
- Contiguous rain area approach (CRA; Ebert and McBride 2000; Gallus and others)
  - Error components: displacement, volume, pattern
Slide 19: Feature- or object-based verification
- Baldwin object-based approach
- Cluster analysis (Marzban and Sandgathe)
- SAL approach for watersheds
- Method for Object-based Diagnostic Evaluation (MODE)
- Others
Slide 20: MODE object definition
- Two parameters are used to identify objects (sketched below):
  - Convolution radius
  - Precipitation threshold
[Figure: raw field and the derived objects]
Raw values are restored to the objects, to allow evaluation of precipitation-amount distributions and other characteristics.
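A sketch of this two-parameter object definition, assuming a 2-D NumPy precipitation field (floats) and using SciPy for the convolution and the connected-component labeling; it mimics the idea rather than reproducing the MODE implementation in MET.

```python
import numpy as np
from scipy import ndimage

def identify_objects(raw, radius, threshold):
    """Smooth the raw field with a circular (disk) averaging kernel of the given
    convolution radius, threshold the smoothed field to get object masks, and
    restore the raw values inside the objects."""
    raw = np.asarray(raw, float)
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    disk /= disk.sum()                                   # averaging (convolution) kernel
    smoothed = ndimage.convolve(raw, disk, mode="constant")
    mask = smoothed >= threshold
    labels, n_objects = ndimage.label(mask)              # connected components = objects
    objects = np.where(mask, raw, 0.0)                   # raw values restored to objects
    return labels, n_objects, objects

# Per-object attributes then come from the labels, e.g. for object k:
#   area_k = (labels == k).sum()
#   p90_k  = np.percentile(raw[labels == k], 90)         # 0.90 intensity quantile
```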
Slide 21: Object merging and matching
- Definitions
  - Merging: associating objects in the same field
  - Matching: associating objects between fields
- Fuzzy logic approach (illustrated below)
  - Attributes used for matching, merging, and evaluation
- Example single attributes: location, size (area), orientation angle, intensity (0.10, 0.25, 0.50, 0.75, 0.90 quantiles)
- Example paired attributes: centroid/boundary distance, size ratio, angle difference, intensity differences
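The sketch below illustrates the fuzzy-logic combination step with hypothetical attribute names, interest maps, and weights chosen purely for illustration (they are not MODE's configured defaults): each paired attribute is mapped to an interest value in [0, 1], and the weighted average gives a total interest used to decide merging and matching.

```python
def total_interest(pair, weights=None):
    """Hypothetical fuzzy-logic combination of paired-object attributes into a
    single interest value in [0, 1]; names, interest maps, and weights are
    illustrative only."""
    if weights is None:
        weights = {"centroid_dist": 2.0, "area_ratio": 1.0, "angle_diff": 1.0}
    # Simple piecewise-linear "interest" maps: 1 = perfect agreement, 0 = no support.
    interest = {
        "centroid_dist": max(0.0, 1.0 - pair["centroid_dist"] / 200.0),  # grid units
        "area_ratio":    min(pair["area_ratio"], 1.0 / pair["area_ratio"]),
        "angle_diff":    max(0.0, 1.0 - abs(pair["angle_diff"]) / 90.0), # degrees
    }
    num = sum(weights[k] * interest[k] for k in interest)
    return num / sum(weights[k] for k in interest)

# Pairs whose total interest exceeds a chosen cutoff (e.g. 0.7) are merged
# within a field or matched between the forecast and observed fields.
print(total_interest({"centroid_dist": 40.0, "area_ratio": 1.3, "angle_diff": 15.0}))
```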
Slide 22: Object-based example, 1 June 2005
[Figure: WRF ARW (24-h) and Stage II objects]
Convolution radius: 15 grid squares; threshold: 0.05
Slide 23: Object-based example, 1 June 2006
- Area ratios:
  - (1) 1.3
  - (2) 1.2
  - (3) 1.1
  ⇒ All forecast areas were somewhat too large
- Location errors:
  - (1) too far west
  - (2) too far south
  - (3) too far north
[Figure: WRF ARW-2 objects with Stage II objects overlaid]
Slide 24: Object-based example, 1 June 2006
- Ratio of median intensities in objects:
  - (1) 1.3
  - (2) 0.7
  - (3) 1.9
- Ratio of 0.90 quantiles of intensities in objects:
  - (1) 1.8
  - (2) 2.9
  - (3) 1.1
  ⇒ All WRF 0.90-quantile intensities were too large; 2 of 3 median intensity values were too large
[Figure: WRF ARW-2 objects with Stage II objects overlaid]
Slide 25: Object-based example, 1 June 2006
- MODE provides information about areas, displacement, intensity, etc.
- In contrast, the traditional scores are:
  - POD = 0.40
  - FAR = 0.56
  - CSI = 0.27
[Figure: WRF ARW-2 objects with Stage II objects overlaid]
Slide 26: Applications of MODE
- Climatological summaries of object characteristics
- Evaluation of individual forecasting systems
  - Systematic errors
  - Matching capabilities (overall skill measure)
  - Model diagnostics
  - User-relevant information
  - Performance as a function of scale
- Comparison of forecasting systems
  - As above
Slide 27: Example summary statistics
22-km WRF forecasts from 2001-2002
Slide 28: Example summary statistics
Slide 29: Example summary statistics
- MODE rose plots
- Displacement of matched forecast objects
Slide 30: Verification quilts
- Forecast performance attributes as a function of spatial scale
- Can be created for almost any attribute or statistic (sketched below)
- Provides a summary of performance
- Guides selection of parameters
Verification quilt showing a measure of matching capability; warm colors indicate stronger matches. Based on 9 cases.
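A minimal sketch of how such a quilt can be assembled, assuming gridded forecast and observed fields and any statistic that depends on two tunable parameters; here the callable is assumed to follow the signature of the fss() sketch shown earlier (threshold, then neighborhood size), and the resulting table is what would be displayed as a color-coded quilt.

```python
import numpy as np

def verification_quilt(fcst, obs, windows, thresholds, metric):
    """Table of a verification statistic over a grid of (spatial scale,
    intensity threshold) choices; rows = windows, columns = thresholds."""
    quilt = np.empty((len(windows), len(thresholds)))
    for i, w in enumerate(windows):
        for j, t in enumerate(thresholds):
            quilt[i, j] = metric(fcst, obs, t, w)
    return quilt   # visualize as a heat map, e.g. warm colors for stronger matches
```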
Slide 31: MODE availability
Available as part of the Model Evaluation Tools (MET):
- http://www.dtcenter.org/met/users/
Slide 32: How can we (rationally) decide which method(s) to use?
- MODE is just one of many new approaches
- What methods should be recommended to operational centers and others doing verification?
- What are the differences between the various approaches?
- What different forecast attributes can each approach measure?
- What can they tell us about forecast performance?
- How can they be used to
  - improve forecasts?
  - help decision makers?
- Which methods are most useful for specific types of applications?
Slide 33: Spatial verification method intercomparison project
- Methods applied to the same datasets
  - WRF forecast and gridded observed precipitation in the central U.S.
  - NIMROD, MAP D-PHASE/COPS, MeteoSwiss cases
  - Perturbed cases
  - Idealized cases
- Subjective forecast evaluations
Slide 34: Intercomparison web page
- References
- Background
- Data and cases
- Software
http://www.ral.ucar.edu/projects/icp/
Slide 35: Subjective evaluation
- Model performance rated on a scale from 1 to 5 (5 was best)
- N = 22
Slide 36: Subjective evaluation
[Figure panels: Model A, Model B, Model C, and OBS]
Slide 37: Conclusion
- Many new spatial verification methods are becoming available: a new world of verification
- The intercomparison project will help lead to a better understanding of the new methods
- Many other issues remain:
  - Ensemble and probability forecasts
  - Extreme and high-impact weather
  - Observational uncertainty
  - Understanding the fundamentals of new methods and measures (e.g., equitability, propriety)