Title: Ensemble Forecasting, Forecast Calibration, and Evaluation
1. Ensemble Forecasting, Forecast Calibration, and Evaluation
- Tom Hamill
- tom.hamill_at_noaa.gov
- www.cdc.noaa.gov/people/tom.hamill
2. Topics
- Current state of the art in ensemble forecasting (EF), and where are we headed?
- Brief review of EF evaluation techniques
- Deficiencies in current approaches for estimation/calibration/evaluation of EFs
- New EF calibration techniques
- Methods for summarizing probabilistic information for end users and forecasters (time permitting)
3. What does it take to get a perfect probabilistic forecast from an ensemble?
- IF you start with an ensemble of initial conditions that samples the distribution of analysis uncertainty, and
- IF your forecast model is perfect (error growth due only to chaos), and
- IF your ensemble is infinite in size,
- THEN the probabilistic forecast is perfect.
Big IFs!
4. Examining the IFs (1): Do we sample the distribution of analysis uncertainty?
NCEP SREF rank histogram, 39-h forecast
There is not enough spread in the forecasts. Consequently, the focus in first-generation EF systems has been on designing initial perturbations that grow apart from each other quickly, not on sampling the analysis-uncertainty distribution.
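As a rough illustration of the diagnostic behind this slide, here is a minimal sketch of how a rank histogram is computed; the arrays `fcst` and `obs` and the synthetic under-dispersive example are assumptions for illustration, not the SREF data.

```python
# Minimal rank-histogram sketch: for each case, find the rank of the
# observation within the sorted ensemble; a U-shaped histogram signals
# too little spread (under-dispersion).
import numpy as np

def rank_histogram(fcst, obs):
    """fcst: (ncases, nmembers) ensemble forecasts; obs: (ncases,) verifications."""
    ncases, nmem = fcst.shape
    counts = np.zeros(nmem + 1, dtype=int)
    for f, o in zip(fcst, obs):
        rank = np.searchsorted(np.sort(f), o)   # 0..nmem
        counts[rank] += 1
    return counts / ncases

# Synthetic example: ensemble spread (0.3) much smaller than forecast error (1.0)
rng = np.random.default_rng(0)
truth = rng.normal(size=5000)
ens_mean = truth + rng.normal(size=5000)
ens = ens_mean[:, None] + 0.3 * rng.normal(size=(5000, 20))
print(np.round(rank_histogram(ens, truth), 3))   # large end bins -> U shape
```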
5. The Breeding Technique (NCEP)
Breeding takes a pair of forecast perturbations, periodically rescales them down, and adds them to the new analysis state. The perturbations only grossly reflect analysis errors.
Wang and Bishop showed that a larger ensemble formed from independent pairs of bred members will be composed of pairs that have almost identical perturbation structures.
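A hedged sketch of the rescale-and-cycle idea behind breeding follows; the generic `model` function, the toy two-variable example, and the rescaling amplitude are illustrative assumptions, not NCEP's operational configuration.

```python
# One breeding cycle: run control and perturbed forecasts, take the grown
# difference, and rescale it back to a fixed amplitude for the next cycle.
import numpy as np

def breed_cycle(analysis, perturbation, model, amp, nsteps):
    control = model(analysis, nsteps)
    perturbed = model(analysis + perturbation, nsteps)
    grown = perturbed - control                       # bred vector after growth
    return grown * (amp / np.linalg.norm(grown))      # rescaled, added next cycle

# Toy usage: a "model" that amplifies one direction faster than the other.
model = lambda x, n: x * np.array([1.5, 0.9]) ** n
bv = np.array([0.01, 0.01])
for _ in range(10):
    bv = breed_cycle(np.zeros(2), bv, model, amp=0.01, nsteps=1)
print(bv)   # the bred vector aligns with the fastest-growing direction
```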
6. Singular Vectors (ECMWF)
Singular vectors (SVs) here indicate the fields of perturbations that are expected to grow the most rapidly in a short-range forecast. The SV structure depends upon the choice of norm. Here, the leading three SVs have magnitude in only one area over the globe.
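In essence, SVs are the leading singular vectors of the tangent-linear forecast propagator under a chosen norm. The sketch below uses a random matrix as a stand-in for that propagator (an assumption purely for illustration; real SV computations use iterative methods on the full tangent-linear and adjoint models).

```python
# Toy singular-vector calculation under the Euclidean (identity) norm.
# With a non-identity norm matrix N, one would instead take the SVD of
# N^(1/2) M N^(-1/2).
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(50, 50))     # hypothetical tangent-linear propagator

U, s, Vt = np.linalg.svd(M)
leading_svs = Vt[:3]              # initial-time structures that grow fastest
growth_factors = s[:3]            # ||M v|| / ||v|| for each leading SV
evolved_svs = U[:, :3]            # the corresponding final-time structures
print(growth_factors)
```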
7. Singular vectors, continued
[Figure: initial-time and final-time SV structures for Case 1 (9 Jan 1993) and Case 2 (8 Feb 1993)]
SVs tend to have their initial perturbations in the mid-troposphere, with little amplitude near the surface or tropopause. If they are meant to sample analysis errors, is the analysis really near-perfect at those levels? As SVs evolve, they grow to have amplitude aloft and near the surface. Expect SV ensemble-forecast spread to be unrepresentative in the early hours of the forecast (e.g., the spread of EFs too small near the surface).
8. A better way to construct initial conditions? Ensemble-based data assimilation
[Schematic: observations and error statistics are combined with first guesses 1..N by ensemble data assimilation to produce analyses 1..N]
An ensemble of forecasts is used to define the error statistics of the first-guess forecast, and an ensemble of analyses is produced. If designed correctly, these analyses sample the analysis uncertainty, can be used to initialize EFs, and are lower in error.
9. Why might initial conditions from ensemble data assimilation be more accurate?
Flow-dependent error statistics for the first guess improve the blending of observations and forecasts.
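As a minimal sketch of this blending, the code below performs a perturbed-observation ensemble Kalman filter update for a single observation, with the first-guess error covariance estimated from the ensemble itself; the array names, sizes, and observation values are illustrative assumptions, and operational schemes (e.g., the serial EnSRF used on the next slide) differ in detail.

```python
# Flow-dependent blending of a first-guess ensemble with one observation.
import numpy as np

def enkf_update_single_ob(xb, h, y, r, rng):
    """xb: (nmem, nstate) first-guess ensemble; h: (nstate,) linear obs operator;
    y: observation; r: observation-error variance."""
    nmem = xb.shape[0]
    hx = xb @ h                                   # ensemble in observation space
    Xp = xb - xb.mean(0)                          # state perturbations
    hxp = hx - hx.mean()                          # obs-space perturbations
    pbh = Xp.T @ hxp / (nmem - 1)                 # ensemble cov(state, Hx)
    hph = hxp @ hxp / (nmem - 1)                  # ensemble var(Hx)
    K = pbh / (hph + r)                           # Kalman gain
    y_pert = y + rng.normal(0.0, np.sqrt(r), nmem)  # perturbed obs keep spread right
    return xb + np.outer(y_pert - hx, K)

rng = np.random.default_rng(2)
xb = rng.normal(size=(20, 5))                     # 20 members, 5 state variables
h = np.eye(5)[0]                                  # observe variable 0
xa = enkf_update_single_ob(xb, h, y=1.0, r=0.25, rng=rng)
print(xa.mean(0))                                 # analysis-ensemble mean
```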
10. Example: 500 hPa height analyses assimilating only surface-pressure observations
- Full NCEP-NCAR Reanalysis (3D-Var, 120,000 obs); black dots show pressure-observation locations
- Ensemble filter (EnSRF), 214 surface-pressure obs: RMS error 39.8 m
- Optimal interpolation, 214 surface-pressure obs: RMS error 82.4 m (3D-Var worse)
11. Perturb the land surface in EFs?
12. Examining the IFs (2): Is the forecast model anywhere close to perfect?
- Model errors
  - due to limited resolution (truncation)
  - due to parameterizations
  - due to choices of numerical methods
  - etc.
- Manifestations
  - biases, especially near the surface and in precipitation
  - slow growth of forecast differences among ensemble members, because coarse grid spacing permits less scale interaction
13. Dealing with model errors
(1) Better models (e.g., the 4-km, 60-h WRF forecast for Katrina)
(2) Introduce a stochastic element into the model (a generic sketch follows this list)
(3) Multi-model ensembles
(4) Calibration
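One common way to introduce a stochastic element, sketched below, is to multiply the parameterized (physics) tendencies by random factors near one, in the spirit of stochastically perturbed physics tendencies; the functions `dynamics_tendency` and `physics_tendency` and the noise amplitude are illustrative assumptions, not the specific scheme pictured on the slide.

```python
# One model step with multiplicative noise on the parameterized tendency.
import numpy as np

def stochastic_step(state, dt, dynamics_tendency, physics_tendency, rng, amp=0.5):
    """Advance one step, scaling the physics tendency by a factor drawn
    uniformly from [1 - amp, 1 + amp]."""
    noise = 1.0 + amp * (2.0 * rng.random(state.shape) - 1.0)
    return state + dt * (dynamics_tendency(state) + noise * physics_tendency(state))

# Each ensemble member carries its own random stream, so members diverge
# even when initial conditions and model formulation are otherwise identical.
rng = np.random.default_rng(5)
x = stochastic_step(np.ones(100), dt=0.1,
                    dynamics_tendency=lambda s: -0.1 * s,
                    physics_tendency=lambda s: 0.05 * np.ones_like(s),
                    rng=rng)
```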
14. Examining the IFs (3): Can we run a nearly infinite ensemble?
- Clearly not: CPU availability is finite.
- A large ensemble forces low resolution (small sampling error, larger model error).
- A small ensemble permits higher resolution (large sampling error, smaller model error).
- The optimal size/resolution tradeoff may differ between variables (a large ensemble for 500 hPa anomaly correlation, a smaller ensemble for fine-scale precipitation events); a rough sampling-error estimate is sketched below.
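To make the sampling-error side of the tradeoff concrete: if an event probability is estimated as a simple member count, its sampling standard error scales roughly as sqrt(p(1-p)/N). The numbers below are purely illustrative.

```python
# Approximate sampling error of an ensemble-relative-frequency probability.
import numpy as np

p = 0.2                                   # assumed true event probability
for n in (10, 20, 50, 100):
    print(n, np.sqrt(p * (1 - p) / n))    # ~0.13 for N=10, ~0.04 for N=100
```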
15. Probabilistic Forecast Verification
Relative Economic Value
Many well-established methods now exist and won't be reviewed here. Often, problem-specific diagnostics are needed to understand specific errors in ensemble forecasts.
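As one concrete example among those well-established measures, here is a minimal sketch of the Brier score and a binned reliability table; the input arrays `prob` and `occur` are hypothetical, and other measures named on the slide (ROC, relative economic value) are not shown.

```python
# Brier score and reliability table for probabilistic forecasts of a binary event.
import numpy as np

def brier_score(prob, occur):
    """prob: forecast probabilities in [0,1]; occur: 0/1 event outcomes."""
    return np.mean((prob - occur) ** 2)

def reliability_table(prob, occur, nbins=10):
    """Mean forecast probability vs. observed frequency in each probability bin."""
    edges = np.linspace(0.0, 1.0, nbins + 1)
    idx = np.clip(np.digitize(prob, edges) - 1, 0, nbins - 1)
    rows = []
    for b in range(nbins):
        mask = idx == b
        if mask.any():
            rows.append((prob[mask].mean(), occur[mask].mean(), int(mask.sum())))
    return rows

# Toy check with perfectly reliable synthetic forecasts (Brier score ~ 1/6).
rng = np.random.default_rng(3)
prob = rng.random(10000)
occur = (rng.random(10000) < prob).astype(int)
print(brier_score(prob, occur))
```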
16. A tool for exploring calibration issues: CDC's reforecast data set
- Definition: a data set of retrospective numerical forecasts made with the same model used to generate real-time forecasts.
- Model: T62L28 NCEP global forecast model, circa 1998 (see http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/refcst for details).
- Initial states: NCEP-NCAR reanalysis plus 7 +/- bred modes (Toth and Kalnay 1993).
- Duration: 15-day runs, every day at 00Z, from 1978/11/01 to now (http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/refcst/week2).
- Data: selected fields (winds, heights, and temperature on 5 pressure levels; precip, t2m, u10m, v10m, pwat, prmsl, rh700, heating). NCEP/NCAR reanalysis verifying fields are included (web form to download at http://www.cdc.noaa.gov/reforecast).
17. Why reforecast? Bias structure can be difficult to evaluate with small forecast samples.
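A long reforecast archive makes even simple corrections possible. As a hedged illustration, the sketch below subtracts the mean forecast-minus-verification difference, computed over many past cases at the same lead time, from today's forecast; the array names are assumptions, not the slides' actual data or method.

```python
# Simple reforecast-based bias correction at one fixed lead time.
import numpy as np

def bias_corrected(todays_fcst, reforecasts, verif):
    """reforecasts, verif: (ncases, ...) past forecasts and verifying analyses;
    ideally stratified by season and location as well as lead time."""
    bias = (reforecasts - verif).mean(axis=0)
    return todays_fcst - bias
```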
18. New calibration technique: reforecasting with analogs
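A hedged sketch of the basic analog idea (not the exact algorithm behind the results on the next slide): find the historical reforecasts whose forecast pattern most resembles today's forecast, then form probabilities from the verifying observations on those analog dates. All names, array shapes, the distance measure, and the number of analogs are illustrative assumptions.

```python
# Analog-based probability forecast from a reforecast archive.
import numpy as np

def analog_probability(todays_mean, reforecast_means, obs_archive,
                       threshold, nanalogs=25):
    """todays_mean: (npts,) today's ensemble-mean forecast pattern;
    reforecast_means: (ncases, npts) archived forecast patterns;
    obs_archive: (ncases, npts) verifying observations/analyses."""
    # rank past forecasts by RMS difference from today's forecast pattern
    dist = np.sqrt(((reforecast_means - todays_mean) ** 2).mean(axis=1))
    best = np.argsort(dist)[:nanalogs]
    # probability of exceeding the threshold, from the analog observations
    return (obs_archive[best] > threshold).mean(axis=0)
```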
19. Analog calibration results
20. What are appropriate methods for summarizing probabilistic information for end users? Forecasters?
- Box shows the 25th-75th percentile range
- Whiskers show the full range (or the 95% range after calibration)
- Central bar shows the median
[Example shown: a confident cold spell]
21. Methods for summarizing probabilistic information, continued
Probability maps
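A minimal sketch of how such a map is built from an ensemble: at each grid point the probability is simply the fraction of members exceeding a chosen threshold (calibration, as discussed above, would adjust these raw frequencies). The array shape, the toy precipitation values, and the 25 mm threshold are assumptions.

```python
# Exceedance-probability map from an ensemble of gridded forecasts.
import numpy as np

def exceedance_probability(ens, threshold):
    """ens: (nmem, ny, nx) ensemble of gridded forecasts."""
    return (ens > threshold).mean(axis=0)     # (ny, nx) map of probabilities

rng = np.random.default_rng(4)
ens = rng.gamma(shape=1.5, scale=8.0, size=(20, 50, 60))   # toy precip ensemble
prob_map = exceedance_probability(ens, threshold=25.0)
```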
22. What isn't very helpful to the end forecaster
Spaghetti diagram
Does widely spread spaghetti indicate unpredictability or a slack gradient?
[Figure: two spaghetti diagrams of the 552, 558, 564, and 570 dm contours, each annotated with a 6 dm spread]
23. Where EF is headed
- Use of ensembles in data assimilation / better methods for initializing EFs
- Sharing of data across countries (TIGGE) to do multi-model ensembling
- Higher-resolution ensembles with model-error parameterizations
- Improved calibration using reforecasts
- Increased use by sophisticated users (e.g., coupling into hydrologic models)
24. References and notes
- Page 3: Getting a perfect probabilistic forecast from an ensemble
  - Ehrendorfer, M., 1994: The Liouville equation and its potential usefulness for the prediction of forecast skill. Part I: Theory. Mon. Wea. Rev., 122, 703-713.
- Page 4: Do we sample the distribution of analysis uncertainty?
  - Rank histograms from NCEP's SREF web page, http://wwwt.emc.ncep.noaa.gov/mmb/SREF/SREF.html
- Page 5: The breeding technique
  - Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297-3319.
- Pages 6-7: Singular vectors
  - Buizza, R., and T. N. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 1434-1456.
  - Barkmeijer, J., M. Van Gijzen, and F. Bouttier, 1998: Singular vectors and estimates of the analysis-error covariance metric. Quart. J. Royal Meteor. Soc., 124, 1697-1713.
25. References and notes, continued
- Pages 8-10: Ensemble-based data assimilation
  - Hamill, T. M., 2005: Ensemble-based atmospheric data assimilation. To appear in Predictability of Weather and Climate, Cambridge Press, T. N. Palmer and R. Hagedorn, eds. Available at http://www.cdc.noaa.gov/people/tom.hamill/efda_review5.pdf
  - Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data assimilation. Mon. Wea. Rev., 132, 1190-1200.
- Page 11: Perturbing the land surface
  - Sutton, C., T. M. Hamill, and T. T. Warner, 2005: Will perturbing soil moisture improve warm-season ensemble forecasts? A proof of concept. Mon. Wea. Rev., in review. Available from http://www.cdc.noaa.gov/people/tom.hamill/land_sfc_perts.pdf
- Pages 12-13: Model errors
  - WRF forecast from http://wrf-model.org/plots/realtime_hurricane.php (try 2005-08-27, rain mixing ratio, 60-h forecast)
  - Stochastic-element picture from Judith Berner's presentation at the ECMWF workshop on the representation of sub-grid processes using stochastic-dynamic models, http://www.ecmwf.int/newsevents/meetings/workshops/2005/Sub-grid_Processes/Presentations.html
  - Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting. I. Basic concept. Tellus, 57A, 219-233. (multi-model ensemble picture)
  - Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2005: Reforecasts, an important data set for improving weather predictions. Bull. Amer. Meteor. Soc., in press. Available at http://www.cdc.noaa.gov/people/tom.hamill/reforecast_bams4.pdf
26. References and notes, continued
- Page 15: Probabilistic forecast verification
  - Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550-560.
  - Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Cambridge Press, 467 pp. (section 7.4)
  - Mason, I. B., 1982: A model for the assessment of weather forecasts. Aust. Meteor. Mag., 30, 291-303. (for the relative operating characteristic)
  - Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Royal Meteor. Soc., 126, 649-667.
- Pages 16-19: Calibration using reforecasts
  - Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2005: Reforecasts, an important data set for improving weather predictions. Bull. Amer. Meteor. Soc., in press. Available at http://www.cdc.noaa.gov/people/tom.hamill/reforecast_bams4.pdf
- Page 20: Summarizing probabilistic information for end users
  - From Ken Mylne, UK Met Office; also ECMWF Newsletter 92, available from www.ecmwf.int
- Page 21: Summarizing, continued; probability maps