1
Verifying high-resolution forecasts
  • Advanced Forecasting Techniques
  • Forecast Evaluation and Decision Analysis METR
    5803
  • Guest Lecture: Adam J. Clark
  • 3 March 2011

2
Outline
  • Background
  • Clark, A. J., W. A. Gallus, and M. L. Weisman, 2010:
    Neighborhood-based verification of precipitation
    forecasts from convection-allowing NCAR-WRF model
    simulations and the operational NAM.
  • Clark, A. J., and Coauthors, 2011: Probabilistic
    precipitation forecast skill as a function of
    ensemble size and spatial scale in a
    convection-allowing ensemble.

3
Why is verifying high-resolution forecasts
challenging?
  • Scale/predictability issues
  • As finer scales are resolved, predictability time
    scales shorten. This was first shown
    mathematically by Ed Lorenz.
  • Lorenz, E. N., 1969: The predictability of a flow
    which possesses many scales of motion. Tellus,
    21, 289-307.

4
Challenges (cont.)
  • Observations: many fields are not directly
    observed at high resolution. Thus, models and
    data assimilation systems are needed to create
    high-resolution analysis datasets. Of course,
    this introduces uncertainties that should be
    accounted for when doing model evaluation.
  • At fine scales, forecasts with small errors can
    still be extremely useful. For example:

5
Challenges (cont.)
[Figure: observations (OBS) compared with two
forecasts, 1 and 2. Credit: Baldwin et al. 2001]
Most subjective evaluations would say forecast 2
was better, although all objective metrics
indicate forecast 1 was better. How can we
develop metrics consistent with human subjective
impressions?
6
Non-traditional approaches
  • Purely subjective: good, fair, poor, ugly; scale
    of 1 to 10; etc.
  • Combination of subjective and objective
  • MCS example: manually categorize possible
    forecast outcomes into 2x2 contingency tables and
    then compute objective metrics
  • Objective methods: Casati et al. (2008) has a
    nice review.
  • Feature-based (Ebert and McBride 2000; Davis et
    al. 2006): involves defining attributes of
    objects.
  • Scale decomposition (e.g., Casati et al. 2004):
    evaluates skill as a function of amplitude and
    spatial scale of the errors.
  • Neighborhood-based approaches (e.g., Ebert 2009):
    consider values at grid-points within a
    specified radius (i.e., neighborhood) of an
    observation.

7
References so far
  • Casati, B., G. Ross, and D. B. Stephenson, 2004:
    A new intensity-scale approach for the
    verification of spatial precipitation forecasts.
    Meteor. Appl., 11, 141-154.
  • Casati, B., and Coauthors, 2008: Forecast
    verification: Current status and future
    directions. Meteor. Appl., 15, 3-18.
  • Baldwin, M. E., S. Lakshmivarahan, and J. S.
    Kain, 2001: Verification of mesoscale features in
    NWP models. Preprints, Ninth Conf. on Mesoscale
    Processes, Fort Lauderdale, FL, Amer. Meteor.
    Soc., 255-258.
  • Ebert, E. E., and J. L. McBride, 2000:
    Verification of precipitation in weather systems:
    Determination of systematic errors. J. Hydrol.,
    239, 179-202.
  • Ebert, E. E., 2009: Neighborhood verification: A
    strategy for rewarding close forecasts. Wea.
    Forecasting, 24, 1498-1510.
  • Davis, C. A., B. Brown, and R. Bullock, 2006:
    Object-based verification of precipitation
    forecasts. Part II: Application to convective
    rain systems. Mon. Wea. Rev., 134, 1785-1795.

8
Neighborhood ETS applied to CAMs
  • What was done and why:
  • Examined a large set of convection-allowing
    forecasts run by NCAR (2004-2008) and compared
    them to the operational NAM.
  • Earlier work (e.g., Done et al. 2004) showed
    skill scores between coarser models (~10 km) and
    the convection-allowing NCAR-WRF were almost
    identical.
  • This contradicted other findings.

[Figure: Done et al. 2004, Fig. 5]
9
Neighborhood ETS (cont.)
  • Perhaps the lack of differences was simply a
    result of inadequate traditional metrics.
  • When forecasts contain fine-scale,
    high-amplitude features, slight displacement
    errors cause double penalties:
    observed-but-not-forecast and
    forecast-but-not-observed.
  • Thus, many recent studies have developed
    alternative metrics:
  • Feature-based (Ebert and McBride 2000; Davis et
    al. 2006)
  • Scale decomposition (Casati et al. 2004)
  • Neighborhood-based (Roberts and Lean 2008; Ebert
    2009)

10
Neighborhood ETS (cont.)
  • To compare the NCAR-WRF and NAM forecasts, a
    neighborhood-based Equitable Threat Score (ETS)
    was developed.
  • Traditional ETS is formulated in terms of
    contingency table elements (hits a, false alarms
    b, misses c, over N total grid-points):

    ETS = (a - a_r) / (a + b + c - a_r),
    where a_r = (a + b)(a + c) / N,

and is interpreted as the fraction of correctly
predicted observed events, adjusted for hits
associated with random chance.
  • Neighborhood ETS is computed in terms of
    specified radii; this study uses 20 to 250 km.

11
[Figure: neighborhood example with r = 100 km, 81
total grid-points. Open circles: observed events;
crosses: forecast events; grey shading: hits]
12
Neighborhood ETS (cont.)
  • To compute neighborhood ETS, simply redefine the
    contingency table elements in terms of r (a code
    sketch follows this list):
  • Hits (traditional): correct forecast of an event
    at a grid-point.
  • Hits (neighborhood): an event is forecast at a
    grid-point and observed at any grid-point within
    r, or an event is observed at a grid-point and
    forecast at any grid-point within r.
  • Misses (traditional): event is observed at a
    grid-point but not forecast.
  • Misses (neighborhood): event is observed at a
    grid-point but not forecast at any grid-point
    within r.
  • False alarms (traditional): event is forecast at
    a grid-point but not observed.
  • False alarms (neighborhood): event is forecast
    at a grid-point but not observed at any
    grid-point within r.
  • Correct negatives (traditional and neighborhood):
    event is neither forecast nor observed at a
    grid-point.
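A minimal sketch of these definitions in Python (not the
authors' code). It assumes binary event grids on a regular
grid, measures r in grid-points, and uses a circular binary
dilation to test "any grid-point within r":

```python
import numpy as np
from scipy.ndimage import binary_dilation

def disk(r):
    """Circular footprint of radius r (in grid-points)."""
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

def neighborhood_ets(fcst, obs, r):
    """fcst, obs: 2-D binary event grids; r: radius in grid-points."""
    fcst = np.asarray(fcst, dtype=bool)
    obs = np.asarray(obs, dtype=bool)
    fcst_near = binary_dilation(fcst, structure=disk(r))  # forecast event within r
    obs_near = binary_dilation(obs, structure=disk(r))    # observed event within r
    hits = np.sum((fcst & obs_near) | (obs & fcst_near))  # either hit condition
    misses = np.sum(obs & ~fcst_near)        # observed, nothing forecast within r
    false_alarms = np.sum(fcst & ~obs_near)  # forecast, nothing observed within r
    total = fcst.size
    # Random-chance correction applied to the neighborhood counts in the
    # same way as the traditional ETS formula.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)
```

With r = 0 (a single-point footprint) this reduces to the
traditional grid-point ETS.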

13
Neighborhood ETS (cont.)
  • Domain and cases

14
Neighborhood ETS (cont.)
  • Model configurations

15
Neighborhood ETS (cont.)
  • Results: 2004-2005

16
Neighborhood ETS (cont.)
  • Results: 2007-2008

17
Neighborhood ETS (cont.)
  • Time series at constant radii

18
An aside: computing statistical significance for
categorical metrics
  • First, what are the ways to compute averages of
    categorical metrics over a set of cases? (The
    sketch below contrasts the two.)
  • 1) Compute the metric for each case and then
    average over all cases.
  • 2) Sum contingency table elements over all cases,
    and then compute the metric from the summed
    elements.
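A small Python illustration of the two conventions, using
made-up contingency elements (a = hits, b = false alarms,
c = misses, d = correct negatives); the two answers
generally differ:

```python
def ets(a, b, c, d):
    """ETS from one set of contingency elements."""
    total = a + b + c + d
    a_random = (a + b) * (a + c) / total  # hits expected by chance
    return (a - a_random) / (a + b + c - a_random)

# Hypothetical (a, b, c, d) for three cases -- made-up numbers.
cases = [(40, 20, 30, 910), (5, 50, 10, 935), (80, 10, 15, 895)]

# 1) Compute the metric per case, then average over cases.
average_of_metrics = sum(ets(*case) for case in cases) / len(cases)

# 2) Sum the elements over all cases, then compute the metric once.
summed = [sum(col) for col in zip(*cases)]
metric_of_sums = ets(*summed)

print(average_of_metrics, metric_of_sums)  # generally not equal
```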

19
Hamill, T. M., 1999: Hypothesis tests for
evaluating numerical precipitation forecasts.
Wea. Forecasting, 14, 155-167.
  • You have to be very careful computing statistical
    significance for precipitation forecasts! Key
    point: precipitation forecast errors are often
    non-normally distributed and have spatially
    and/or temporally correlated errors; the
    effective number of independent samples is much
    less than the total number of grid-points.
  • What does this mean?
  • Hamill outlines several potential approaches for
    computing significance for threat scores. He
    finds a rather involved resampling approach is
    most appropriate for non-probabilistic forecasts.

20
Resampling
            Forecast 1     Forecast 2
Case 1      a, b, c, d     a, b, c, d
Case 2      a, b, c, d     a, b, c, d
...         ...            ...
Case 7      a, b, c, d     a, b, c, d
            ETS            ETS

The difference between the unshuffled ETSs is the
test statistic. Randomly shuffle the sets of
contingency table elements for each case between
the two forecasts and recompute the ETS
difference. Repeat 1000 times. Construct a
distribution of ETS differences and test whether
the test statistic falls within the distribution
(a sketch follows). Don't forget about bias!
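A minimal sketch of this resampling test in Python,
assuming the per-case contingency elements (a, b, c, d)
have already been computed for both forecasts, and that
ETS is computed from elements summed over cases:

```python
import numpy as np

rng = np.random.default_rng(0)

def ets_from_cases(cases):
    """Sum (a, b, c, d) over cases, then compute one ETS."""
    a, b, c, d = np.asarray(cases).sum(axis=0)
    total = a + b + c + d
    a_random = (a + b) * (a + c) / total
    return (a - a_random) / (a + b + c - a_random)

def resampling_test(cases1, cases2, n_shuffles=1000):
    """Permutation test for the ETS difference between two forecasts."""
    observed = ets_from_cases(cases1) - ets_from_cases(cases2)  # test statistic
    null = []
    for _ in range(n_shuffles):
        swap = rng.random(len(cases1)) < 0.5  # coin flip per case
        s1 = [c2 if s else c1 for c1, c2, s in zip(cases1, cases2, swap)]
        s2 = [c1 if s else c2 for c1, c2, s in zip(cases1, cases2, swap)]
        null.append(ets_from_cases(s1) - ets_from_cases(s2))
    # Two-sided p-value: where the unshuffled difference falls in the
    # distribution of shuffled differences.
    p_value = np.mean(np.abs(null) >= abs(observed))
    return observed, p_value
```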
21
Neighborhood ETS: Compositing
  • OF_i is the observed frequency for each
    grid-point within a conditional distribution of
    observed events.
  • N is the number of grid-points in the domain.
  • n is the number of grid-points in the
    radius-of-influence.

22
Neighborhood ETS (cont.): Compositing
  • Composite observations relative to forecasts (a
    hedged sketch of the mechanics follows).
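One plausible reading of the compositing step, sketched in
Python. The mechanics are assumed, not taken from the
paper: for every grid-point where an event is forecast, cut
out the surrounding window of observed events and average
the cutouts into an observed-frequency composite.

```python
import numpy as np

def composite_obs(fcst, obs, half_width):
    """fcst, obs: 2-D binary event grids. Returns the mean
    observed-event frequency in a (2*half_width+1)^2 window
    centered on forecast events (assumed mechanics)."""
    k = half_width
    acc = np.zeros((2 * k + 1, 2 * k + 1))
    count = 0
    for i, j in zip(*np.where(np.asarray(fcst, dtype=bool))):
        # Skip forecast events too close to the domain edge.
        if k <= i < fcst.shape[0] - k and k <= j < fcst.shape[1] - k:
            acc += obs[i - k:i + k + 1, j - k:j + k + 1]
            count += 1
    return acc / max(count, 1)
```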

23
Neighborhood ETS (cont.): Compositing
24
Neighborhood ETS (cont.)
  • Composite animations from SE2010 SSEF members
    with different microphysics

25
Neighborhood ETS (cont.)
  • Conclusions:
  • Computing ETS on raw grids gave small
    differences. When the criteria for hits were
    relaxed using the neighborhood approach, more
    dramatic differences were seen that better
    reflected overall subjective impressions of the
    forecasts.

26
Probabilistic Precipitation Forecast Skill as a
function of ensemble size and spatial scale in a
convection-allowing ensemble
  • Adam J. Clark, NOAA/National Severe Storms
    Laboratory
  • John S. Kain, David J. Stensrud, Ming Xue, Fanyou
    Kong, Michael C. Coniglio, Kevin W. Thomas,
  • Yunheng Wang, Keith Brewster, Jidong Gao, Steven
    J. Weiss, and Jun Du
  • 13 October 2010
  • 25th Conference on Severe Local Storms, Denver, CO

27
Introduction/motivation
  • Basic idea:
  • Convection-allowing ensembles (~4-km
    grid-spacing) provide added value relative to
    operational systems that parameterize convection:
  • precipitation forecasts are improved,
  • the diurnal rainfall cycle is better depicted,
  • ensemble dispersion is more representative of
    forecast uncertainty, and
  • explicitly simulated storm attributes can provide
    useful information on potential
    convection-related hazards.
  • However, convection-allowing models are very
    computationally expensive relative to current
    operational mesoscale ensembles.
  • EXAMPLE: back-of-the-envelope calculation of the
    expense of making SREF convection-allowing
    (written out below):
  • 32 km to 4 km is a decrease in grid-spacing by a
    factor of 8. To account for the 3D increase in
    the number of grid-points and the time-step
    reduction, take 8^3, which gives an increase in
    computational expense by a factor of roughly 500.
    Then take 500 times 21 members: about 10,500.
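Written out as arithmetic (the slide's own numbers; note
that 8^3 = 512, which the slide rounds to 500):

```latex
% The slide's back-of-the-envelope cost scaling, written out.
\[
\left(\frac{32\ \mathrm{km}}{4\ \mathrm{km}}\right)^{3}
  = 8^{3} = 512 \approx 500,
\qquad
500 \times 21\ \text{members} \approx 10{,}500.
\]
```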

28
Introduction/motivation (cont.)
  • Given the computational expense, it would be
    useful to explore whether a point of diminishing
    returns is reached with increasing ensemble size,
    n.
  • The 2009 Storm-Scale Ensemble Forecast (SSEF)
    system provides a good opportunity to examine the
    issue because it is relatively large:
  • 20 members (17 members used herein)
  • Decent number of warm-season cases (25).

29
Data and Methodology
  • 2009 SSEF system configuration:
  • 10 WRF-ARW members
  • 8 WRF-NMM members
  • 2 ARPS members
30
Data and Methodology (cont.)
  • Domain and cases examined
[Figure: SSEF domain and analysis domain]
31
Data and Methodology (cont.)
  • 6-hr accumulated rainfall examined for forecast
    hours 6-30.
  • Stage IV rainfall estimates used for rainfall
    observations.
  • Forecast probabilities for rainfall exceeding
    0.10, 0.25, 0.50, and 1.00 in. were computed by
    finding the location of the verification
    threshold within the distribution of ensemble
    members.
  • Area under the relative operating characteristic
    curve (ROC area) used for objective evaluation:
  • 1.0 is perfect, 0.5 is no skill, and below 0.5
    is negative skill.
  • ROC areas were computed for 100 unique
    combinations of randomly selected ensemble
    members for n = 2 to 15. For n = 1, 16, and 17,
    all possible combinations of members were used
    (a sketch of this subsampling follows).
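A minimal sketch of the member-subsampling experiment (not
the SSEF verification code). It assumes flattened arrays of
member and observed rainfall; probabilities here are simple
exceedance frequencies, a stand-in for the
threshold-location method above, and ROC area is computed
via the equivalent Mann-Whitney rank statistic:

```python
import numpy as np

rng = np.random.default_rng(0)

def roc_area(prob, event):
    """ROC area = P(prob at event point > prob at non-event
    point), with ties counted half."""
    pos, neg = prob[event], prob[~event]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

def subsampled_roc_areas(members, obs, threshold, n, n_combos=100):
    """members: (n_members, n_points) rainfall; obs: (n_points,).
    Returns ROC areas for n_combos random n-member subsets."""
    event = obs > threshold
    areas = []
    for _ in range(n_combos):
        pick = rng.choice(members.shape[0], size=n, replace=False)
        prob = (members[pick] > threshold).mean(axis=0)  # exceedance frequency
        areas.append(roc_area(prob, event))
    return np.array(areas)
```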

32
Data and Methodology (cont.)
  • Examination of different spatial scales:
  • Observed and forecast precipitation averaged over
    increasingly large spatial scales. Why do this?
    Not all end users require accuracy at 4-km
    scales.
  • For the verification, constant quantiles were
    used to compare across the different scales.
    Constant thresholds cannot be compared because
    the precipitation distribution changes with scale
    (a sketch follows).
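A sketch of the upscaling-plus-quantile idea. Assumed
details: simple non-overlapping block averaging stands in
for whatever upscaling the study used, and events are
defined by each field's own quantile so that event
frequency stays comparable across scales:

```python
import numpy as np

def block_average(field, k):
    """Average a 2-D field over non-overlapping k x k blocks,
    trimming any ragged edge."""
    ny = field.shape[0] // k * k
    nx = field.shape[1] // k * k
    blocks = field[:ny, :nx].reshape(ny // k, k, nx // k, k)
    return blocks.mean(axis=(1, 3))

def quantile_events(field, q):
    """Binary events where the field exceeds its own q-th
    quantile, so event frequency matches across scales."""
    return field > np.quantile(field, q)
```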
33
Results
  • Panels show ROC areas as a function of n.
  • Different colors of shading correspond to
    different scales. Areas within each color show
    the interquartile range of the distribution of
    ROC areas (recall that, for each n, ROC areas
    were computed for 100 unique combinations of
    members).
  • For each color, the darker shading denotes ROC
    areas that are less than those of the full
    17-member ensemble, with differences that are
    statistically significant.
  • Main result: more members are required to reach
    statistically indistinguishable ROC areas
    relative to the full ensemble as forecast lead
    time increases and spatial scale decreases (and
    ROC areas aren't bad).

34
RESULTS (cont.)
  • Interpretation:
  • The rise in ROC area reflects a gain in skill as
    the forecast PDF is better sampled.
  • More members are required to effectively sample a
    wider forecast PDF, so the n at which skill
    flattens should increase with a broadening PDF.
    Two variables in the analysis are associated with
    a widening of the PDF:
  • 1) increasing forecast lead time (because
    model/analysis errors grow), and
  • 2) decreasing spatial scale (because errors grow
    faster at smaller scales).
  • Caveats:
  • 1) Averages are presented. Specific cases with
    lower-than-average predictability (higher spread)
    should require more members to reach the point of
    diminishing returns. Low-probability events
    require more members to achieve reliable
    forecasts (e.g., Richardson 2001).

35
RESULTS (cont.)
  • More caveats:
  • The ensemble is under-dispersive.
  • A reliable ensemble with more spread would
    require more members to effectively sample the
    forecast PDF.

36
CONCLUSIONS
  • The spatial scale and forecast lead time needed
    by end users should be carefully considered in
    future convection-allowing ensemble designs.
  • Future work is needed to improve the reliability
    of convection-allowing ensembles, and further
    evaluations are needed for weather regimes with
    varying degrees of predictability.
  • QUESTIONS??