1
Forecast verification: deterministic aspects
Anna Ghelli, ECMWF
2
SUMMARY
  • What is forecast verification?
  • Why do we need to verify?
  • Verification against analysis
  • Verification against observations
  • Conclusions

3
What is forecast verification?
  • FORECAST: a prediction of the future state of the
    weather, stock market prices, etc.
  • FORECAST VERIFICATION: the process of assessing the
    quality of a forecast.
  • The forecast is verified against a corresponding
    observation (or a good estimate of the true
    outcome) that describes the state of the
    atmosphere at the forecast verification time.
  • The verification can be:
  • Qualitative: it answers questions like
    "does my forecast look right?"
  • Quantitative: it answers questions like
    "how accurate was my forecast?"

4
Why verify?
  • A forecasting system should include
    verification procedures:
  • to monitor forecast quality: track forecast
    accuracy and improvements
  • to improve forecast quality: understand system
    failures and correct them
  • to compare the quality of different forecast
    systems: compare the current system to an
    experimental version or to other centres'
    forecasting systems, in order to improve the
    current model.

5
(No transcript: image-only slide.)
6
Forecast quality and forecast value
  • A forecast has high QUALITY if it predicts the
    observed conditions well according to some
    objective or subjective criteria.
  • A forecast has VALUE if it helps the user to make
    a better decision.
  • An example of a poor-quality but valuable forecast:
    the model predicts the development of isolated
    thunderstorms in a particular region. Thunderstorms
    are indeed observed in the region, but not at the
    particular spots suggested by the model. This
    forecast might still be very valuable to the
    forecaster in issuing a public weather warning for
    the area.
  • An example of a forecast with high quality but
    little value is a forecast of clear skies over
    the Sahara Desert during the dry season.

7
Do observations represent the "truth"?
  • Rain gauge measurements, temperature
    observations, satellite-derived cloud cover and
    analyses provide a description of the atmosphere
    that can be used for our verifications.
  • Are those observations/analyses telling us the
    exact truth? Not quite; possible error sources are:
  • random and bias errors in the measurements
    themselves
  • sampling error
  • analysis error, when the observational data are
    analysed or otherwise altered to match the scale
    of the forecast.
  • If the errors in the truth data are much smaller
    than the expected error in the forecast (high
    signal-to-noise ratio), they can be ignored.
  • Even skewed or under-sampled verification data can
    give some insight into forecast quality when
    different forecast methods are compared.
  • Knowing the pitfalls of our observing system can
    enable us to optimally use the information that
    we obtain from our verifications.

8
Pooling versus stratifying results
  • The larger the number of forecast/observations
    pairs (samples), the more reliable the
    verification results. To accomplish this samples
    may be pooled over time and/or space.
  • Pooling samples can mask differences in forecast
    performance when the data are not homogeneous.
  • Results can be biased toward the most commonly
    sampled regime (e.g. days with no severe
    weather).
  • Stratifying the samples into quasi-homogeneous
    subsets (by season, by geographical region, by
    intensity of the observations, etc.) helps to
    highlight forecast behaviour during particular
    regimes (e.g. the rainy season in Europe, the
    monsoon period in India).
  • Subsets must contain enough samples to give
    trustworthy verification results!

9
Verification forecast vs analysis: continuous
variables
  • Root Mean Square Error: measures the average
    error, weighted according to the square of the
    error. It puts greater emphasis on large errors.
    Range: 0 to infinity; perfect score: 0.
  • Anomaly Correlation Coefficient: measures the
    correspondence or phase difference between
    forecast and analysis, taking away the
    climatology (c) at each point. Range: -100% to
    100%; perfect score: 100%.
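The score formulas appear as images on the original slide; for
reference, the standard definitions, with f the forecast, a the
verifying analysis and c the climatology at each of the N grid
points (operational versions may add refinements such as area
weighting), are:

  \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(f_i - a_i)^2}

  \mathrm{ACC} = \frac{\sum_{i}(f_i - c_i)(a_i - c_i)}
                      {\sqrt{\sum_{i}(f_i - c_i)^2 \sum_{i}(a_i - c_i)^2}}
                 \times 100\%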

10
Verification forecast vs observations: continuous
variables
  • BIAS: measures the average difference between
    forecast and observation. Range: minus infinity
    to plus infinity; perfect score: 0.
  • Mean Absolute Error: measures the average
    magnitude of the difference between forecast and
    observation. Range: 0 to infinity; perfect
    score: 0.
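Again the formulas are shown as images on the slide; the standard
definitions, with f_i the forecast and o_i the observation for each
of the N forecast/observation pairs, are:

  \mathrm{BIAS} = \frac{1}{N}\sum_{i=1}^{N}(f_i - o_i)
  \qquad
  \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert f_i - o_i \rvert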

11
Verification: dichotomous forecasts
  • Equitable Threat Score: measures the fraction of
    observed and/or forecast events that are
    correctly predicted, adjusted for hits associated
    with random chance. Range: -1/3 to 1; perfect
    score: 1; no skill: 0.
  • Frequency Bias Index: measures the ratio of the
    frequency of forecast events to the frequency of
    observed events. Range: 0 to infinity; perfect
    score: 1.
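The formulas are again images on the slide; in terms of the standard
2x2 contingency table with hits a, false alarms b, misses c, correct
negatives d and total n = a + b + c + d, the usual definitions are:

  \mathrm{ETS} = \frac{a - a_r}{a + b + c - a_r},
  \qquad
  a_r = \frac{(a+b)(a+c)}{n}

  \mathrm{FBI} = \frac{a+b}{a+c}

where a_r is the number of hits expected by random chance.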



12
D3 MSL forecast, VT 12 March 2002
-- MSL analysis, 12 March 2002
13
Timeseries of Anomaly Correlation Coefficient
(ACC)
ACC is usually expressed as a percentage, with
100% being perfect. Evaluation of forecast
charts has shown that 60% is about the lower
limit for a forecast to provide useful
guidance. Here we see a timeseries of ACC for
the t+72 forecast for different models. The ECMWF
forecast (red curve) shows an ACC around 90-95%.
For the t+72 forecast started on 9 March
2002, the ACC drops to 70%. That forecast
displaced the low in the Norwegian Sea.
Mean sea level pressure, D3 forecast range
14
Anomaly Correlation Coefficient: geopotential
height Z at 500 hPa, forecast range D10.
Daily variability of the ACC for the summer
seasons.
15
Anomaly Correlation Coefficient vs forecast day
The ACC plotted against the forecast range gives
an indication of the skill of the forecasting
system. The intersection between the ACC curve
and the 60% line indicates the range up to which
the forecast gives useful guidance. The
diagram shows the ACC value at each forecast day
for the last winter period (Dec. 2002 to Feb.
2003). The ACC is calculated for the Northern
Hemisphere and the variable under consideration
is the geopotential at 500 hPa.
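To make that reading of the intersection concrete, here is a minimal
Python sketch (function and variable names are illustrative
assumptions, not ECMWF code) that linearly interpolates the forecast
day at which the ACC curve drops to a given limit:

import numpy as np

def useful_range(days, acc, limit=60.0):
    """Forecast day at which the ACC curve drops to `limit` (percent).

    A sketch assuming one ACC value per forecast day; linear
    interpolation is used between consecutive days.
    """
    days = np.asarray(days, dtype=float)
    acc = np.asarray(acc, dtype=float)
    for k in range(len(days) - 1):
        if acc[k] >= limit > acc[k + 1]:
            # Fraction of the interval at which the curve hits `limit`.
            frac = (acc[k] - limit) / (acc[k] - acc[k + 1])
            return days[k] + frac * (days[k + 1] - days[k])
    return None  # the curve never drops below the limit in this range

# Example with a hypothetical winter-mean ACC curve for days 1 to 10:
# useful_range(range(1, 11), [98, 95, 91, 86, 80, 73, 66, 59, 52, 46])
# returns about 7.9, i.e. useful guidance up to roughly day 8.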
16
Anomaly Correlation Coefficient
The ACC curve will cross the 60%, 65%, 70%, 75%,
80%, 85% and 90% lines at different forecast
days. The forecast system improves if, for the
same forecast day, the crossing happens at higher
and higher values of ACC.
The example shows the gain in predictability of
the ECMWF model. Back in the 1980s the line of
ACC = 90% was crossed at D2.5. In 2002 the same
line was crossed at D3.5.
(Chart reference lines: ACC = 60% and ACC = 90%.)
17
RMSE vs forecast day: verification against
analysis
RMSE measures the average error, weighted
according to the square of the error; it
puts greater emphasis on large errors. It ranges
from 0 to infinity and the perfect score is
0. The diagram shows the RMSE as a function of
the forecast day for different forecasting
systems. As the forecast range increases, the
RMSE gets larger.
18
Time series of RMSE for z500, verification
against analysis
RMSE for different models and three forecast
ranges (48, 96 and 144 hours). RMSE is larger
during the winter months, while it reaches its
lowest values in the summers. RMSE is larger for
longer model integration times.
19
Verification against observations
  • MAIN ISSUES
  • Spatial scales of models and observations
  • Interpolation methods
  • Errors associated with observations
  • Sampling

20
(Diagram: parameterisation, verification, theories.)
21
EUROPE -- 2 metre temperature
The bias gives an indication of the average
difference between forecast and observations. The
forecast is (bi-linearly) interpolated to the
station location (a sketch of such an interpolation
follows below). The top panel shows the
timesteps verifying at 00UTC, while the bottom
panel refers to the timesteps verifying at
12UTC. In general the bias is positive from
spring to autumn and negative in the winter
months.
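A minimal sketch of such a bi-linear interpolation (the regular
ascending lat/lon grid and all names are assumptions; this is not the
operational ECMWF code):

import numpy as np

def bilinear_to_station(field, grid_lats, grid_lons, st_lat, st_lon):
    """Bi-linearly interpolate a gridded field to a station location.

    Assumes `grid_lats` and `grid_lons` are ascending 1-D arrays and
    that the station lies strictly inside the grid.
    """
    i = np.searchsorted(grid_lats, st_lat) - 1  # lower lat neighbour
    j = np.searchsorted(grid_lons, st_lon) - 1  # lower lon neighbour
    # Fractional position of the station inside the enclosing gridbox.
    ty = (st_lat - grid_lats[i]) / (grid_lats[i + 1] - grid_lats[i])
    tx = (st_lon - grid_lons[j]) / (grid_lons[j + 1] - grid_lons[j])
    return ((1 - ty) * (1 - tx) * field[i, j]
            + (1 - ty) * tx * field[i, j + 1]
            + ty * (1 - tx) * field[i + 1, j]
            + ty * tx * field[i + 1, j + 1])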
22
EUROPE 2m Temperature
The skill score measures the improvement of the
forecasting system over a reference
forecast. The reference system here is
persistence: the previous day's observed
temperature. Its range is from minus infinity to
100. A perfect forecast scores 100, while 0
indicates no improvement in skill over the
reference forecast. The top panel shows the RMSE
skill for timesteps verifying at 00UTC, while
the bottom panel is for timesteps verifying at
12UTC. Autumn-to-winter temperature forecasts are
more skilful than summer ones.
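The slide quotes the range and the reference forecast but not the
formula; the standard RMSE-based skill score against a reference
(here persistence), consistent with the quoted range of minus
infinity to 100, is:

  \mathrm{SS} = \frac{\mathrm{RMSE}_{\mathrm{ref}} -
                      \mathrm{RMSE}_{\mathrm{fc}}}
                     {\mathrm{RMSE}_{\mathrm{ref}}} \times 100

since the RMSE of a perfect forecast is 0.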
23
N. America 2m Temperature
Timeseries of BIAS (top) can be filtered to reveal
any trend. The filter used in this specific case
is a 12-month filter. One needs to be careful
with the bias, as one can get a perfect score
from a bad forecast, as long as there are
compensating errors. The timeseries of RMSE
(bottom) are also filtered (12 months); this score
puts particular emphasis on large errors. There is
a slow increase of the BIAS, while the RMSE
decreases by about a degree.
24
EUROPE -- 6h accumulated precipitation
The timeseries of forecast error show a negative
bias during the summer months (under-forecast
amounts of rain) for the timesteps verifying at
00UTC. Over-forecasting of the precipitation
amount is particularly evident in the summer
months for forecasts verifying at 12UTC. A
significant impact was produced by the
introduction of 60 levels in the vertical
(increased vertical resolution) and a change in
the cloud scheme (October 1999).
(Chart annotations: revised convection scheme;
L60 and change in cloud scheme.)
25
SOUTH-EAST ASIA 6-h accumulated precipitation
The Mean Absolute Error measures the average
magnitude of the difference between the forecast
and the observation. The timeseries of 6h
precipitation mean absolute error are filtered
(12 months) to indicate the possible presence of
trends in the skill of the forecasting
system. Improvements are small, but evident in
the t+66 and t+72 forecasts.
26
Verification for dichotomous forecasts
Scores for the 24-hour accumulated precipitation
over Europe. The scores have been calculated for
different forecast ranges and thresholds. The
observed precipitation is from the SYNOP data
available on the GTS.
Stations used in the calculation of the threat
scores.
27
EUROPE 24h accumulated precipitation
A forecast of a continuous variable can be
reduced to a dichotomous forecast (yes/no event)
if thresholds are chosen; the ETS timeseries is
an example. Two different thresholds are shown:
1mm/24h and 5mm/24h (the ETS formula is given
on slide 11). It is evident that over the ten-year
period the ETS shows a gain of 1 day in
predictability.
28
EUROPE 24h accumulated precipitation
Timeseries of FBI for two different
thresholds (the FBI formula is given on slide
11). The FBI measures the ratio between the
frequency of the forecast events and the
frequency of the observed events. If the forecast
under-estimates the number of events, then
FBI < 1; if the forecast over-estimates, then
FBI > 1. The overestimation of the number of
events decreases in the late part of 1999 (more
evident for the 5mm/24h threshold). A sketch of
the computation follows below.
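A minimal Python sketch of this reduction to yes/no events and of the
resulting scores (array and function names are illustrative
assumptions), using the ETS and FBI definitions from slide 11:

import numpy as np

def dichotomous_scores(fcst, obs, threshold):
    """Threshold matched precipitation pairs and score the events.

    A sketch: `fcst` and `obs` are matched arrays of 24h accumulated
    precipitation; an event is "precipitation >= threshold".
    """
    f = np.asarray(fcst) >= threshold
    o = np.asarray(obs) >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    n = f.size
    # Hits expected by random chance, as in the ETS definition.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    ets = (hits - hits_random) / (hits + false_alarms + misses - hits_random)
    fbi = (hits + false_alarms) / (hits + misses)
    return ets, fbi

# Example with the 1mm/24h threshold used on the slide:
# dichotomous_scores([0.2, 3.1, 7.5, 0.0], [0.0, 1.2, 9.8, 0.4], 1.0)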
29
A different perspective
  • The model cannot produce exact results for
    scales smaller than its own spatial scale.
  • Until now, comparisons were carried out by
    interpolating the model forecast value for a
    given grid-point to a station location.
  • Precipitation shows large variability, and the
    precipitation amount measured at a specific
    location may not be representative of an area
    (under-sampling).
  • Precipitation forecasts should be interpreted as
    areal values rather than point values.
  • High-resolution station networks can be used to
    produce mean values of precipitation attributed to
    each grid-point. Such values are then compared to
    the model forecast: up-scaling the information
    contained in the observations makes the comparison
    fairer to the model.
30
GTS-SYNOP
  • The up-scaling technique
  • There are many methods available to up-scale
    observations to the model resolution.
  • We have used a simple averaging procedure over all
    the observations contained in a model gridbox
    (see the sketch below).
  • Alps: SYNOP coverage, high-density observations
    and up-scaled observed values for Sept. 20, 1999.

(Figure panels: high-density obs; up-scaled obs.)
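A minimal sketch of that averaging (the regular lat/lon grid and all
names are assumptions; the operational procedure may differ):

import numpy as np

def upscale_obs(st_lats, st_lons, st_values, grid_lats, grid_lons):
    """Average all station observations falling in each model gridbox.

    Each station is assigned to its nearest grid point and the gridbox
    value is the mean over its stations; gridboxes without
    observations are left as NaN.
    """
    dlat = grid_lats[1] - grid_lats[0]
    dlon = grid_lons[1] - grid_lons[0]
    sums = np.zeros((len(grid_lats), len(grid_lons)))
    counts = np.zeros_like(sums)
    for lat, lon, val in zip(st_lats, st_lons, st_values):
        i = int(round((lat - grid_lats[0]) / dlat))
        j = int(round((lon - grid_lons[0]) / dlon))
        if 0 <= i < len(grid_lats) and 0 <= j < len(grid_lons):
            sums[i, j] += val
            counts[i, j] += 1
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)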
31
UP-SCALING: PROS and CONS
  • Pros:
  • Can be used to produce areal quantities
  • Model independent
  • Can be gridded
  • Cons:
  • Up-scaling methods can smooth out maxima of
    precipitation
  • High-density observations may be difficult to
    obtain
  • There may be inconsistencies in the observation
    networks over the years

The forecast is shaded (as per legend) and the
observations are the small numbers.
32
Verification using the up-scaled observations
Time series of FBI. The FBI is calculated over a
three-month period (standard seasons) and
averaged over the French territory; Météo-France
provided the high-resolution observations. FBI >
1 indicates over-forecasting of the event number;
FBI < 1 indicates under-forecasting of the
event number.
33
Precipitation forecast verification using the
up-scaled observations
The precipitation forecast is verified for a
river catchment area (Douro/Duero). The model
horizontal resolution is 40 km. The catchment
area contains 400 high-resolution stations and
only 11 GTS SYNOP stations. There are 72
grid-points in the area. The Spanish and
Portuguese national meteorological offices
provided the high-resolution observations.
34
Does my forecast look right?
Weather event: a few days of consecutive rain at
the end of February / beginning of March 2001 over
the river catchment area led to floods. The
observed precipitation (red curve) is accumulated
from the beginning to the end of the forecast
period and compared to the forecast values (black
curve). The forecast started on February 25, 2001
follows the observed curve. The forecast started
on February 26, 2001 shows a clear delay and
weaker precipitation intensity.
35
What kind of error can I expect?
The daily absolute errors of the whole period
(January 2001 to March 2002) are considered. Each
day the absolute error is averaged over the
catchment area, with accumulation over periods of
1 day and 5 days. A box-and-whisker diagram gives
an idea of the spread of the absolute error for
different precipitation classes. The spread is
small for lower amounts of precipitation and short
accumulation periods. The results could also be
stratified by season.
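A minimal sketch of the stratification behind such a box-and-whisker
plot (names and class boundaries are illustrative assumptions):

import numpy as np

def abs_errors_by_class(daily_fcst, daily_obs, class_edges=(1, 5, 10, 20)):
    """Group daily catchment-mean absolute errors by observed class.

    `daily_fcst` and `daily_obs` hold catchment-mean accumulated
    precipitation per day (mm); `class_edges` are class boundaries.
    Returns one array of absolute errors per precipitation class,
    ready to feed to a box-and-whisker plot.
    """
    fcst = np.asarray(daily_fcst, dtype=float)
    obs = np.asarray(daily_obs, dtype=float)
    abs_err = np.abs(fcst - obs)
    which = np.digitize(obs, class_edges)
    return [abs_err[which == k] for k in range(len(class_edges) + 1)]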
36
What kind of error can I expect?
  • MAE grows with class and accumulation period
  • Mean bias is below 2 mm; the model in general
    overestimates the amounts of precipitation
  • Sample size problems appear from the 10 mm class
    and above

Legend: 1 = 1-day accumulation; 3 = 3-day
accumulation; 5 = 5-day accumulation.
37
Conclusion
  • Verification is a way to establish the strengths
    and weaknesses of a forecasting system, assess the
    quality of forecasts and improve the forecast
    model.
  • There are different verification techniques. A
    specific type of forecast may require a specific
    verification method.
  • One can verify against observations or against an
    analysis.
  • When verifying against observations there are
    some issues to take into consideration:
  • Spatial and temporal scales
  • Interpolation
  • Errors associated with the observations
  • Up-scaling of obs (using a high-resolution network
    of meteorological stations) makes comparisons
    fairer to the model.