Title: Verifying high-resolution forecasts
1. Verifying high-resolution forecasts
- Advanced Forecasting Techniques
- Forecast Evaluation and Decision Analysis, METR 5803
- Guest Lecture: Adam J. Clark
- 3 March 2011
2. Outline
- Background
- Clark, A. J., W. A. Gallus, and M. L. Weisman, 2010: Neighborhood-based verification of precipitation forecasts from convection-allowing NCAR-WRF model simulations and the operational NAM.
- Clark, A. J., and Coauthors, 2011: Probabilistic precipitation forecast skill as a function of ensemble size and spatial scale in a convection-allowing ensemble.
3. Why is verifying high-resolution forecasts challenging?
- Scale/predictability issues
- As finer scales are resolved, time lengths for predictability shorten. This was first shown mathematically by Ed Lorenz.
- Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21, 289-307.
4. Challenges (cont.)
- Observations: many fields are not directly observed at high resolution, so models and data assimilation systems are needed to create high-resolution analysis datasets. Of course, this introduces uncertainties that should be accounted for when doing model evaluation.
- At fine scales, forecasts with small errors can still be extremely useful, even when objective metrics score them poorly. For example:
5. Challenges (cont.)
[Figure: observed precipitation (OBS) alongside forecasts 1 and 2; credit: Baldwin et al. 2001]
Most subjective evaluations would say forecast 2 was better, although all objective metrics indicate forecast 1 was better. How can we develop metrics consistent with human subjective impressions?
6. Non-traditional approaches
- Purely subjective: good/fair/poor/ugly, scale of 1 to 10, etc.
- Combination of subjective and objective
- MCS example: manually categorize possible forecast outcomes into 2x2 contingency tables and then compute objective metrics
- Objective methods (Casati et al. 2008 has a nice review):
- Feature-based (Ebert and McBride 2000; Davis et al. 2006): involves defining attributes of objects
- Scale decomposition (e.g., Casati et al. 2004): evaluates skill as a function of the amplitude and spatial scale of the errors
- Neighborhood-based approaches (e.g., Ebert 2009): consider values at grid-points within a specified radius (i.e., neighborhood) of an observation
7. References so far
- Casati, B., G. Ross, and D. B. Stephenson, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141-154.
- Casati, B., and Coauthors, 2008: Forecast verification: Current status and future directions. Meteor. Appl., 15, 3-18.
- Baldwin, M. E., S. Lakshmivarahan, and J. S. Kain, 2001: Verification of mesoscale features in NWP models. Preprints, Ninth Conf. on Mesoscale Processes, Fort Lauderdale, FL, Amer. Meteor. Soc., 255-258.
- Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179-202.
- Ebert, E. E., 2009: Neighborhood verification: A strategy for rewarding close forecasts. Wea. Forecasting, 24, 1498-1510.
- Davis, C. A., B. Brown, and R. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785-1795.
8. Neighborhood ETS applied to CAMs
- What was done and why
- Examined a large set of convection-allowing forecasts run by NCAR during 2004-2008 and compared them to the operational NAM.
- Earlier work (e.g., Done et al. 2004) showed that skill scores for coarser models (grid spacing of 10 km or more) and the convection-allowing NCAR-WRF were almost identical.
- This contradicted other findings.
[Figure: Done et al. 2004, Fig. 5]
9. Neighborhood ETS (cont.)
- Perhaps the lack of differences was simply a result of inadequate traditional metrics.
- When forecasts contain fine-scale, high-amplitude features, slight displacement errors cause double penalties: observed-but-not-forecast and forecast-but-not-observed.
- Thus, many recent studies have developed alternative metrics:
- Feature-based (Ebert and McBride 2000; Davis et al. 2006)
- Scale-decomposition (Casati et al. 2004)
- Neighborhood-based (Roberts and Lean 2008; Ebert 2009)
10. Neighborhood ETS (cont.)
- To compare the NCAR-WRF and NAM forecasts, a neighborhood-based Equitable Threat Score (ETS) was developed.
- Traditional ETS is formulated in terms of the contingency table elements (a = hits, b = false alarms, c = misses, d = correct negatives):

  ETS = (a - a_r) / (a + b + c - a_r), where a_r = (a + b)(a + c) / N

  and is interpreted as the fraction of correctly predicted observed events, adjusted for hits associated with random chance (a short code sketch follows).
- Neighborhood ETS is computed in terms of specified radii; this study uses 20 to 250 km.
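For concreteness, a minimal sketch of the traditional score (the standard formula above, not code from the paper):

```python
def ets(a, b, c, d):
    """Equitable Threat Score from 2x2 contingency counts:
    a = hits, b = false alarms, c = misses, d = correct negatives."""
    n = a + b + c + d              # total number of grid-points
    a_r = (a + b) * (a + c) / n    # hits expected by random chance
    return (a - a_r) / (a + b + c - a_r)
```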
11. Example: r = 100 km, 81 total grid-points
[Figure legend: open circles = observed events; crosses = forecast events; grey shading = hits]
12. Neighborhood ETS (cont.)
- To compute neighborhood ETS, simply redefine the contingency table elements in terms of r (a code sketch follows this list):
- Hits (traditional): correct forecast of an event at a grid-point
- Hits (neighborhood): an event is forecast at a grid-point and observed at any grid-point within r, or an event is observed at a grid-point and forecast at any grid-point within r
- Misses (traditional): an event is observed at a grid-point but not forecast
- Misses (neighborhood): an event is observed at a grid-point but not forecast at any grid-point within r
- False alarms (traditional): an event is forecast at a grid-point but not observed
- False alarms (neighborhood): an event is forecast at a grid-point but not observed at any grid-point within r
- Correct negatives (traditional and neighborhood): an event is neither forecast nor observed at a grid-point
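A minimal sketch of how these neighborhood elements might be counted on 2D boolean event grids. It uses a square (2r+1)-point window as a stand-in for the circular radius; the paper's exact bookkeeping may differ:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def neighborhood_contingency(fcst_event, obs_event, r):
    """Neighborhood contingency elements on 2D boolean event grids,
    with r given in grid-points."""
    size = 2 * r + 1
    # True wherever an event occurs anywhere within the window
    obs_near = maximum_filter(obs_event.astype(np.uint8), size=size).astype(bool)
    fcst_near = maximum_filter(fcst_event.astype(np.uint8), size=size).astype(bool)

    # Hits: forecast point with an observed event within r, or observed
    # point with a forecast event within r
    hits = np.sum((fcst_event & obs_near) | (obs_event & fcst_near))
    misses = np.sum(obs_event & ~fcst_near)        # observed, none forecast within r
    false_alarms = np.sum(fcst_event & ~obs_near)  # forecast, none observed within r
    correct_neg = np.sum(~fcst_event & ~obs_event)
    return hits, misses, false_alarms, correct_neg
```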
13. Neighborhood ETS (cont.)
14. Neighborhood ETS (cont.)
15. Neighborhood ETS (cont.)
16. Neighborhood ETS (cont.)
17. Neighborhood ETS (cont.)
- Time series at constant radii
18. An aside: computing statistical significance for categorical metrics
- First, what are the ways to compute averages of categorical metrics over a set of cases? (Both are illustrated below.)
- 1) Compute the metric for each case and then average over all cases.
- 2) Sum the contingency table elements over all cases, and then compute the metric from the summed elements.
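Contrasting the two approaches with the ets() helper sketched earlier (the counts here are dummy values for illustration):

```python
# Two ways to average ETS over cases; `tables` holds dummy (a, b, c, d)
# contingency counts, one tuple per case.
tables = [(40, 25, 30, 905), (55, 35, 20, 890), (25, 15, 45, 915)]

# 1) Compute the metric per case, then average the metrics
mean_of_ets = sum(ets(*t) for t in tables) / len(tables)

# 2) Sum the contingency elements over cases, then compute one metric
summed = [sum(col) for col in zip(*tables)]
ets_of_sums = ets(*summed)

print(mean_of_ets, ets_of_sums)  # the two averages generally differ
```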
19. Hamill, T. M., 1999: Hypothesis tests for evaluating numerical precipitation forecasts. Wea. Forecasting, 14, 155-167.
- You have to be very careful computing statistical significance for precipitation forecasts! Key quote: precipitation forecast errors are often non-normally distributed and have spatially and/or temporally correlated errors; the effective number of independent samples is much less than the total number of grid-points.
- What does this mean??
- Hamill outlines several potential approaches for computing significance for threat scores. He finds a rather involved resampling approach is most appropriate for non-probabilistic forecasts.
20. Resampling
- Each case contributes a set of contingency table elements for the two forecasts being compared:

           Forecast 1    Forecast 2
  Case 1   a, b, c, d    a, b, c, d
  Case 2   a, b, c, d    a, b, c, d
  Case 3   a, b, c, d    a, b, c, d
  Case 4   a, b, c, d    a, b, c, d
  Case 5   a, b, c, d    a, b, c, d
  Case 6   a, b, c, d    a, b, c, d
  Case 7   a, b, c, d    a, b, c, d
           ETS           ETS

- The difference between the unshuffled ETSs is the test statistic.
- Randomly shuffle the sets of contingency table elements between the two forecasts for each case and recompute the ETS difference. Repeat 1000 times. Construct a distribution of ETS differences and test whether the test statistic falls within the distribution. Don't forget about bias!! (A code sketch follows.)
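A sketch of the shuffling loop, assuming the ETS is computed from elements summed over cases; Hamill's full procedure also handles the bias caveat noted above, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

def ets_from_sums(tables):
    """ETS from contingency elements summed over cases.
    tables: array of shape (n_cases, 4) holding (a, b, c, d) per case."""
    a, b, c, d = tables.sum(axis=0)
    n = a + b + c + d
    a_r = (a + b) * (a + c) / n
    return (a - a_r) / (a + b + c - a_r)

def resampling_test(tables1, tables2, n_shuffles=1000):
    """Permutation test for the ETS difference between two forecasts."""
    test_stat = ets_from_sums(tables1) - ets_from_sums(tables2)
    diffs = np.empty(n_shuffles)
    for i in range(n_shuffles):
        # For each case, randomly swap which forecast contributes its elements
        swap = rng.random(len(tables1)) < 0.5
        t1 = np.where(swap[:, None], tables2, tables1)
        t2 = np.where(swap[:, None], tables1, tables2)
        diffs[i] = ets_from_sums(t1) - ets_from_sums(t2)
    # Is the observed difference extreme relative to the shuffled distribution?
    p_value = np.mean(np.abs(diffs) >= abs(test_stat))
    return test_stat, p_value
```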
21. Neighborhood ETS: Compositing
- OF_i is the observed frequency for each grid-point within a conditional distribution of observed events.
- N is the # of grid-points in the domain
- n is the # of grid-points in the radius-of-influence
22. Neighborhood ETS (cont.): Compositing
- Composite observations relative to forecasts (one possible sketch follows).
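One plausible reading of this compositing, sketched under assumptions: center a window of half-width r grid-points on every forecast-event point and accumulate the observed event frequency by offset. The paper's exact definition may differ:

```python
import numpy as np

def composite_obs_around_forecasts(fcst_event, obs_event, r):
    """Mean observed-event frequency in a (2r+1) x (2r+1) window centered
    on each forecast-event grid-point (hypothetical reconstruction)."""
    ny, nx = obs_event.shape
    size = 2 * r + 1
    composite = np.zeros((size, size))
    count = 0
    # Use interior forecast-event points so the window stays in-bounds
    for j, i in zip(*np.where(fcst_event)):
        if r <= j < ny - r and r <= i < nx - r:
            composite += obs_event[j - r:j + r + 1, i - r:i + r + 1]
            count += 1
    return composite / max(count, 1)  # frequency by offset from forecast point
```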
23. Neighborhood ETS (cont.): Compositing
24. Neighborhood ETS (cont.)
- Composite animations from SE2010 SSEF members with different microphysics
25. Neighborhood ETS (cont.)
- Conclusions
- Computing ETS on the raw grids gave small differences between the models. When the criterion for hits was relaxed using the neighborhood approach, more dramatic differences were seen that better reflected overall subjective impressions of the forecasts.
26. Probabilistic precipitation forecast skill as a function of ensemble size and spatial scale in a convection-allowing ensemble
- Adam J. Clark, NOAA/National Severe Storms Laboratory
- John S. Kain, David J. Stensrud, Ming Xue, Fanyou Kong, Michael C. Coniglio, Kevin W. Thomas, Yunheng Wang, Keith Brewster, Jidong Gao, Steven J. Weiss, and Jun Du
- 13 October 2010
- 25th Conference on Severe Local Storms, Denver, CO
27. Introduction/motivation
- Basic idea
- Convection-allowing ensembles (grid-spacing of about 4 km or less) provide added value relative to operational systems that parameterize convection:
- precipitation forecasts are improved,
- the diurnal rainfall cycle is better depicted,
- ensemble dispersion is more representative of forecast uncertainty, and
- explicitly simulated storm attributes can provide useful information on potential convection-related hazards.
- However, convection-allowing models are very computationally expensive relative to current operational mesoscale ensembles.
- EXAMPLE: a back-of-the-envelope calculation of the expense of making the SREF convection-allowing (see the sketch after this list). Going from 32-km to 4-km grid-spacing decreases the grid-spacing by a factor of 8. To account for the increase in # of grid-points and the time-step reduction, take 8^3 = 512, i.e., an increase in computational expense by roughly a factor of 500 per member. Then 500 x 21 members = 10,500.
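The same arithmetic as a quick check:

```python
# Back-of-the-envelope scaling from the slide: grid-point count and
# time-step each scale with the 32-km -> 4-km refinement factor.
refinement = 32 // 4          # = 8
per_member = refinement ** 3  # = 512, i.e. roughly a factor of 500
print(per_member, 500 * 21)   # ~500 per member, ~10,500 for 21 members
```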
28. Introduction/motivation (cont.)
- Given the computational expense, it would be useful to explore whether a point of diminishing returns is reached with increasing ensemble size, n.
- The 2009 Storm-Scale Ensemble Forecast (SSEF) system provides a good opportunity to examine the issue because it is relatively large:
- 20 members (17 members used herein)
- decent number of warm-season cases (25).
29. Data and Methodology
- 2009 SSEF system configuration:
- 10 WRF-ARW members
- 8 WRF-NMM members
- 2 ARPS members
30. Data and Methodology (cont.)
- Domain and cases examined
[Figure: SSEF domain and analysis domain]
31. Data and Methodology (cont.)
- 6-hr accumulated rainfall examined for forecast hours 6-30.
- Stage IV rainfall estimates used for rainfall observations.
- Forecast probabilities for rainfall exceeding 0.10, 0.25, 0.50, and 1.00 in. were computed by finding the location of the verification threshold within the distribution of ensemble members.
- Area under the relative operating characteristic curve (ROC area) used for objective evaluation: 1.0 is perfect, 0.5 is no skill, and below 0.5 is negative skill.
- ROC areas were computed for 100 unique combinations of randomly selected ensemble members for n = 2 to 15. For n = 1, 16, and 17, all possible combinations of members were used (a code sketch follows this list).
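A rough sketch of the subsampling and scoring. The fraction of members exceeding the threshold stands in for the paper's within-distribution probability method, and the ROC area uses the Mann-Whitney form:

```python
import numpy as np

rng = np.random.default_rng(0)

def roc_area(prob, event):
    """ROC area via the Mann-Whitney statistic: the probability that an
    observed-event point received a higher forecast probability than a
    non-event point (ties count half)."""
    pos, neg = prob[event], prob[~event]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

def subset_roc_areas(members, obs, thresh, n, n_combos=100):
    """ROC areas for n_combos random n-member subsets.
    members: (n_members, n_points) forecasts; obs: (n_points,) analysis."""
    event = obs >= thresh
    areas = []
    for _ in range(n_combos):
        idx = rng.choice(members.shape[0], size=n, replace=False)
        # Stand-in probability: fraction of subset members >= threshold
        prob = (members[idx] >= thresh).mean(axis=0)
        areas.append(roc_area(prob, event))
    return np.array(areas)
```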
32. Data and Methodology (cont.)
- Examination of different spatial scales: observed and forecast precipitation averaged over increasingly large spatial scales. Why do this? Not all end users require accuracy at 4-km scales.
- For the verification, constant quantiles were used to compare across different scales. Constant thresholds cannot be compared because the precipitation distribution changes with scale (a sketch follows).
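A minimal sketch of the upscaling and constant-quantile thresholds, with block-averaging as a stand-in for the paper's exact averaging:

```python
import numpy as np

def upscale(field, factor):
    """Block-average a 2D field over factor x factor boxes (trailing rows
    and columns that don't fill a box are dropped)."""
    ny, nx = field.shape
    ny, nx = ny - ny % factor, nx - nx % factor
    blocks = field[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

# Constant-quantile verification: re-derive the physical threshold at each
# scale from that scale's own precipitation distribution.
def quantile_threshold(obs_field, q):
    return np.quantile(obs_field, q)
```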
33. Results
- Panels show ROC areas as a function of n.
- Different colors of shading correspond to different scales. Areas within each color show the inter-quartile range of the distribution of ROC areas (recall, for each n, ROC areas were computed for 100 unique combinations of members).
- For each color, the darker shading denotes ROC areas that are less than those of the full 17-member ensemble, with differences that are statistically significant.
- Main result: more members are required to reach ROC areas statistically indistinguishable from those of the full ensemble as forecast lead time increases and spatial scale decreases (and the ROC areas aren't bad).
34. Results (cont.)
- Interpretation
- The rise in ROC area reflects the gain in skill as the forecast PDF is better sampled.
- More members are required to effectively sample a wider forecast PDF, so the n at which skill flattens should increase with a broadening PDF. Two variables in the analysis are associated with a widening of the PDF:
- 1) increasing forecast lead time (because model/analysis errors grow)
- 2) decreasing spatial scale (because errors grow faster at smaller scales)
- Caveats
- 1) Averages are presented. Specific cases with lower-than-average predictability (higher spread) should require more members to reach the point of diminishing returns. Low-probability events require more members to achieve reliable forecasts (e.g., Richardson 2001).
35. Results (cont.)
- More caveats
- 2) The ensemble is under-dispersive. A reliable ensemble with more spread would require more members to effectively sample the forecast PDF.
36. Conclusions
- The spatial scale and forecast lead time needed by end users should be carefully considered in future convection-allowing ensemble designs.
- Future work is needed to improve the reliability of convection-allowing ensembles, and further evaluations are needed for weather regimes with varying degrees of predictability.
- QUESTIONS??