Title: Forecast verification: deterministic aspects
1. Forecast verification: deterministic aspects
Anna Ghelli, ECMWF
2. SUMMARY
- What is forecast verification?
- Why do we need to verify?
- Verification against analysis
- Verification against observations
- Conclusions
3. What is forecast verification?
- FORECAST: a prediction of the future state of the weather, stock market prices, etc.
- FORECAST VERIFICATION: the process of assessing the quality of a forecast.
- The forecast is verified against a corresponding observation (or a good estimate of the true outcome) that describes the state of the atmosphere at the forecast verification time.
- The verification can be:
  - Qualitative: it answers questions like "does my forecast look right?"
  - Quantitative: it answers questions like "how accurate was my forecast?"
4. Why verify?
- A forecasting system should include verification procedures:
  - to monitor forecast quality: forecast accuracy and improvements;
  - to improve forecast quality: understand system failures and correct them;
  - to compare the quality of different forecast systems: compare the current system to an experimental setting or to other centres' forecasting systems in order to improve the current model.
5. [Figure slide, no transcript]
6. Forecast quality and forecast value
- A forecast has high QUALITY if it predicts the observed conditions well according to some objective or subjective criteria.
- A forecast has VALUE if it helps the user to make a better decision.
- An example of a poor-quality but valuable forecast: the model predicts the development of isolated thunderstorms in a particular region. Thunderstorms are indeed observed in the region, but not in the particular spots suggested by the model. This forecast might still be very valuable to the forecaster in issuing a public weather warning for the area.
- An example of a forecast with high quality but little value is a forecast of clear skies over the Sahara Desert during the dry season.
7. Do observations represent the "truth"?
- Rain gauge measurements, temperature observations, satellite-derived cloud cover and analyses provide a description of the atmosphere that can be used for our verifications.
- Are those observations/analyses telling us the exact truth? Not quite, because of:
  - random and bias errors in the measurements themselves,
  - sampling error,
  - analysis error, when the observational data are analysed or otherwise altered to match the scale of the forecast.
- If the errors in the truth data are much smaller than the expected error in the forecast (high signal-to-noise ratio), they can be ignored.
- Even skewed or under-sampled verification data can give some insight into forecast quality when different forecast methods are compared.
- Knowing the pitfalls of our observing system enables us to make optimal use of the information that we obtain from our verifications.
8. Pooling versus stratifying results
- The larger the number of forecast/observation pairs (samples), the more reliable the verification results. To achieve this, samples may be pooled over time and/or space.
- Pooling samples can mask differences in forecast performance when the data are not homogeneous.
- Results can be biased towards the most commonly sampled regime (i.e. days with no severe weather).
- Stratifying the samples into quasi-homogeneous subsets (by season, by geographical region, by intensity of the observations, etc.) helps to highlight forecast behaviour during particular regimes (e.g. the rainy season in Europe, the monsoon period in India).
- Subsets must contain enough samples to give trustworthy verification results!
9. Verification, forecast vs analysis: continuous variables
- Root Mean Square Error (RMSE) measures the average error, weighted according to the square of the error, so it puts greater emphasis on large errors. Range: 0 to infinity; perfect score: 0.
- Anomaly Correlation Coefficient (ACC) measures the correspondence or phase difference between forecast and analysis, after subtracting the climatology (c) at each point. Range: -100% to 100%; perfect score: 100%.
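For reference, the standard definitions (not spelled out on the slide; here f_i is the forecast, a_i the verifying analysis, c_i the climatology, and N the number of verification points) are:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - a_i\right)^2}
\qquad
\mathrm{ACC} = \frac{\sum_{i}(f_i - c_i)(a_i - c_i)}
                    {\sqrt{\sum_{i}(f_i - c_i)^2 \,\sum_{i}(a_i - c_i)^2}} \times 100\%
```

Operational versions of the ACC often also remove the mean anomaly; the simplified form above conveys the idea.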
10. Verification, forecast vs observations: continuous variables
- BIAS measures the average difference between forecast and observation. Range: minus infinity to plus infinity; perfect score: 0.
- Mean Absolute Error (MAE) measures the average magnitude of the difference between forecast and observation. Range: 0 to infinity; perfect score: 0.
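In the same notation (o_i now denoting the observation), the standard formulas, again added here for reference rather than taken from the slide, are:

```latex
\mathrm{BIAS} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|f_i - o_i\right|
```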
11. Verification: dichotomous forecasts
- Equitable Threat Score (ETS) measures the fraction of observed and/or forecast events that are correctly predicted, adjusted for hits associated with random chance. Range: -1/3 to 1; perfect score: 1; no skill: 0.
- Frequency Bias Index (FBI) measures the ratio of the frequency of forecast events to the frequency of observed events. Range: 0 to infinity; perfect score: 1.
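Both scores are built from the standard 2x2 contingency table (not shown on the slide): a = hits, b = false alarms, c = misses, and n = total number of cases. The usual definitions are:

```latex
\mathrm{FBI} = \frac{a+b}{a+c}
\qquad
\mathrm{ETS} = \frac{a - a_r}{a + b + c - a_r},
\quad a_r = \frac{(a+b)(a+c)}{n}
```

where a_r is the number of hits expected by chance.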
12. [Charts: D+3 MSL forecast, VT 12 March 2002, alongside the MSL analysis for 12 March 2002]
13. Timeseries of Anomaly Correlation Coefficient (ACC)
ACC is usually expressed as a percentage, with 100% being perfect. Evaluations of forecast charts have shown that 60% is about the lower limit for a forecast to provide useful guidance. Here we see a timeseries of ACC for the t+72 forecast for different models. The ECMWF forecast (red curve) shows an ACC around 90-95%. For the t+72 forecast started on 9 March 2002, the ACC drops to 70%: the forecast has displaced the low in the Norwegian Sea.
[Chart: Mean Sea Level Pressure, D+3 forecast range]
14. [Chart: Anomaly Correlation Coefficient of geopotential height Z at 500 hPa, forecast range D+10, showing the daily variability of the ACC for the summer seasons]
15. Anomaly Correlation Coefficient vs forecast day
The ACC plotted against the forecast range gives some indication of the skill of the forecasting system. The intersection between the ACC curve and the 60% line indicates the range up to which the forecast gives useful guidance. The diagram shows the ACC value at each forecast day for the last winter period (Dec. 2002 to Feb. 2003). The ACC is calculated for the Northern Hemisphere, and the variable under consideration is the geopotential at 500 hPa.
16. Anomaly Correlation Coefficient
The ACC curve will cross the 60%, 65%, 70%, 75%, 80%, 85% and 90% lines at different forecast days. The forecast system improves if, for the same forecast day, the crossing happens at higher and higher values of ACC.
The example shows the gain in predictability of the ECMWF model: back in the 80s the ACC=90% line was crossed at D+2.5, while in 2002 the same line was crossed at D+3.5.
[Chart: crossing times of the ACC=60% and ACC=90% lines]
17. RMSE vs forecast day: verification against analysis
RMSE measures the average error, weighted according to the square of the error, so it puts more weight on large errors. It ranges from 0 to infinity and the perfect score is 0. The diagram shows the RMSE as a function of the forecast day for different forecasting systems. As the forecast range increases, the RMSE gets larger.
18. Time series of RMSE for z500: verification against analysis
RMSE for different models and three forecast ranges (48, 96 and 144 hours). RMSE is larger during the winter months, while it reaches its lowest values in the summers. RMSE is larger for longer model integration times.
19. Verification against observations
- MAIN ISSUES:
  - Spatial scales of models and observations
  - Interpolation methods
  - Error associated with observations
  - Sampling
20. [Diagram linking theories, parameterisation and verification]
21. EUROPE: 2-metre temperature
The bias gives an indication of the average difference between forecast and observations. The forecast is (bi-linearly) interpolated to the station location. The top panel shows the timesteps verifying at 00 UTC, while the bottom panel relates to the timesteps verifying at 12 UTC. In general the bias is positive from spring to autumn and negative in the winter months.
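As a sketch of the bilinear interpolation step mentioned above (a minimal illustration assuming a regular lat/lon grid with ascending coordinates; the function name and array layout are my own, not ECMWF code):

```python
import numpy as np

def bilinear_to_station(field, lats, lons, st_lat, st_lon):
    """Bilinearly interpolate a gridded field to a station location.

    field: 2-D array indexed [lat, lon]; lats, lons: 1-D ascending
    coordinate vectors. Stations outside the grid are not handled here.
    """
    i = np.searchsorted(lats, st_lat) - 1  # grid row just south of the station
    j = np.searchsorted(lons, st_lon) - 1  # grid column just west of the station
    wy = (st_lat - lats[i]) / (lats[i + 1] - lats[i])  # fractional position
    wx = (st_lon - lons[j]) / (lons[j + 1] - lons[j])  # inside the grid box
    return ((1 - wy) * (1 - wx) * field[i, j]
            + (1 - wy) * wx * field[i, j + 1]
            + wy * (1 - wx) * field[i + 1, j]
            + wy * wx * field[i + 1, j + 1])
```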
22. EUROPE: 2 m temperature
The skill score measures the improvement of the forecasting system over a reference forecast. The reference system here is persistence: the previous day's observed temperature. Its range is from minus infinity to 100%; a perfect forecast scores 100%, while 0 indicates no improvement in skill over the reference forecast. The top panel shows the RMSE skill for timesteps verifying at 00 UTC, while the bottom panel is for timesteps verifying at 12 UTC. Autumn-to-winter temperature forecasts are more skilful than summer ones.
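The slide does not spell the skill score out; a standard RMSE-based form, consistent with the quoted range (minus infinity to 100%), would be:

```latex
\mathrm{SS} = \frac{\mathrm{RMSE}_{\mathrm{persistence}} - \mathrm{RMSE}_{\mathrm{forecast}}}
                   {\mathrm{RMSE}_{\mathrm{persistence}}} \times 100\%
```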
23. N. America: 2 m temperature
Timeseries of BIAS (top) can be filtered to reveal any trend; the filter used in this specific case is a 12-month filter. One needs to be careful with the bias, as one can get a perfect score with a bad forecast, as long as there are compensating errors. The timeseries of RMSE (bottom) are filtered in the same way (12 months); particular emphasis is placed on large errors. The plots show a slow increase of the BIAS while the RMSE decreases by about a degree.
24. EUROPE: 6-h accumulated precipitation
The timeseries of forecast error show a negative bias during the summer months (under-forecast amounts of rain) for the timesteps verifying at 00 UTC. Over-forecasting of the precipitation amount is particularly evident in the summer months for forecasts verifying at 12 UTC. A significant impact was produced by the introduction of 60 levels in the vertical (increased vertical resolution) and a change in the cloud scheme (October 1999).
[Chart annotations: revised convection scheme; L60 and change in cloud scheme]
25. SOUTH-EAST ASIA: 6-h accumulated precipitation
The Mean Absolute Error measures the average magnitude of the difference between the forecast and the observation. The timeseries of 6-h precipitation mean absolute error are filtered (12 months) to indicate the possible presence of trends in the skill of the forecasting system. Improvements are small, but evident in the t+66 and t+72 forecasts.
26. Verification for dichotomous forecasts
Scores for the 24-hour accumulated precipitation over Europe. The scores have been calculated for different forecast ranges and thresholds. The observed precipitation is from the SYNOP data available on the GTS.
[Map: stations used in the calculation of the threat scores]
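A minimal sketch of how such threshold-based scores can be computed from matched forecast/observation pairs (function and variable names are illustrative, not the operational code):

```python
import numpy as np

def dichotomous_scores(forecast, observed, threshold):
    """ETS and FBI for a yes/no event defined by exceeding a threshold.

    forecast, observed: 1-D arrays of matched forecast/observation values
    (e.g. 24-h accumulated precipitation in mm).
    """
    fc = forecast >= threshold          # forecast says "event"
    ob = observed >= threshold          # event was observed
    hits = np.sum(fc & ob)
    false_alarms = np.sum(fc & ~ob)
    misses = np.sum(~fc & ob)
    n = fc.size
    hits_random = (hits + false_alarms) * (hits + misses) / n  # chance hits
    ets = (hits - hits_random) / (hits + false_alarms + misses - hits_random)
    fbi = (hits + false_alarms) / (hits + misses)
    return ets, fbi
```

For example, dichotomous_scores(fc, obs, 5.0) would give the ETS and FBI for the 5 mm/24h threshold.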
27. EUROPE: 24-h accumulated precipitation
A forecast of a continuous variable can be reduced to a dichotomous forecast (yes/no event) if thresholds are chosen; the ETS timeseries is an example. Two different thresholds are shown: 1 mm/24h and 5 mm/24h. The ETS is as defined on slide 11. It is evident that over the ten-year period the ETS shows a gain of one day in predictability.
28. EUROPE: 24-h accumulated precipitation
Timeseries of FBI for two different thresholds. The FBI, as defined on slide 11, measures the ratio between the frequency of the forecast events and the frequency of the observed events. If the forecast under-estimates the number of events, then FBI < 1; if the forecast over-estimates, then FBI > 1. The overestimation of the number of events decreases in the late part of 1999 (more evident for the 5 mm/24h threshold).
29. A different perspective
- The model will not produce exact results for scales smaller than its own spatial scale.
- Until now, comparisons were carried out by interpolating the model forecast value for a given grid-point to a station location.
- Precipitation shows large variability, and the precipitation amount measured at a specific location may not be representative of an area (under-sampling).
- Precipitation forecasts should be interpreted as areal values rather than point values.
- High-resolution station networks can be used to produce mean values of precipitation to be attributed to each grid-point; such values are then compared to the model forecast. This up-scaling of the information contained in the observations makes comparisons fairer to the model.
30. GTS-SYNOP
- The up-scaling technique:
  - There are many methods available to up-scale observations to the model resolution.
  - We have used a simple averaging procedure over all the observations contained in a model gridbox (see the sketch after this slide).
- [Maps: Alps SYNOP coverage, high-density observations and up-scaled observed values for 20 September 1999]
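A minimal sketch of the averaging procedure, assuming a regular grid described by its box edges (names and layout are illustrative):

```python
import numpy as np

def upscale_obs(st_lat, st_lon, st_val, lat_edges, lon_edges):
    """Average all station observations falling within each model grid box.

    Returns a 2-D array of grid-box means; boxes without stations are NaN.
    """
    nlat, nlon = len(lat_edges) - 1, len(lon_edges) - 1
    total = np.zeros((nlat, nlon))
    count = np.zeros((nlat, nlon))
    iy = np.digitize(st_lat, lat_edges) - 1  # grid-box row of each station
    ix = np.digitize(st_lon, lon_edges) - 1  # grid-box column of each station
    for y, x, v in zip(iy, ix, st_val):
        if 0 <= y < nlat and 0 <= x < nlon:  # ignore stations outside the grid
            total[y, x] += v
            count[y, x] += 1
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)
```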
31. UP-SCALING: PROS and CONS
Pros:
- Can be used to produce areal quantities
- Model independent
- Can be gridded
Cons:
- Up-scaling methods can smooth out maxima of precipitation
- The high-density observations may be difficult to obtain
- There may be inconsistencies in the observation networks over the years
[Map: the forecast is shaded (as per legend) and the observations are the small numbers]
32. Verification using the up-scaled observations
Timeseries of FBI. The FBI is calculated over a three-month period (standard seasons) and averaged over the French territory. Météo-France provided the high-resolution observations. FBI > 1 indicates over-forecasting of the number of events; FBI < 1 indicates under-forecasting of the number of events.
33. Precipitation forecast verification using the up-scaled observations
The precipitation forecast is verified for a river catchment area (Douro/Duero). The model horizontal resolution is 40 km. The catchment area contains 400 high-resolution stations and only 11 GTS SYNOP stations; there are 72 grid-points in the area. The Spanish and Portuguese national meteorological offices provided the high-resolution observations.
34. Does my forecast look right?
Weather event: a few days of consecutive rain at the end of February/beginning of March 2001 over the river catchment area led to floods. The observed precipitation (red curve) is accumulated from the beginning to the end of the forecast period and compared to the forecast values (black curve). The forecast started on 25 February 2001 follows the observed curve; the forecast started on 26 February 2001 shows a clear delay and weaker precipitation intensity.
35. What kind of error can I expect?
The daily absolute errors for the whole period (January 2001 to March 2002) are considered. Each day the absolute error is averaged over the catchment area, with accumulations over periods of 1 day and 5 days. A box-and-whiskers diagram gives an idea of the spread of the absolute error for different precipitation classes. The spread is small for lower amounts of precipitation and short accumulation periods. The results could also be stratified by season.
36. What kind of error can I expect?
- MAE grows with precipitation class and accumulation period.
- The mean bias is below 2 mm; the model in general overestimates the amounts of precipitation.
- Sample-size problems appear from the 10 mm class and above.
Legend: 1 = 1-day accumulation, 3 = 3-day accumulation, 5 = 5-day accumulation.
37. Conclusions
- Verification is a way to establish the strengths and weaknesses of the forecasting system, assess the quality of forecasts and improve the forecast model.
- There are different verification techniques; a specific type of forecast may require a specific verification method.
- One can verify against observations or against analyses.
- When verifying against observations there are some issues to take into consideration:
  - spatial and temporal scales
  - interpolation
  - error associated with the observations
- Up-scaling of observations (using a high-resolution network of meteorological stations) makes comparisons fairer to the model.