Verification Introduction
1
Verification Introduction
Holly C. Hartmann
Department of Hydrology and Water Resources, University of Arizona
hollyoregon@juno.com
RFC Verification Workshop, 08/14/2007
2
Goals
  • General concepts of verification
  • Think about how to apply to your operations
  • Be able to respond to and influence NWS
    verification program
  • Be prepared as new tools become available
  • Be able to do some of your own verification
  • Be able to work with researchers on verification
    projects
  • Contribute to development of verification tools
    (e.g., look at various options)
  • Avoid some typical mistakes

3
Agenda
  • 1. Introduction to Verification
    - Applications, Rationale, Basic Concepts
    - Data Visualization and Exploration
    - Deterministic Scalar Measures
  • 2. Categorical Measures (Kevin Werner)
    - Deterministic Forecasts
    - Ensemble Forecasts
  • 3. Diagnostic Verification
    - Reliability
    - Discrimination
    - Conditioning/Structuring Analyses
  • 4. Lab Session/Group Exercise
    - Developing Verification Strategies
    - Connecting to Forecast Operations and Users

4
Why Do Verification? It depends.
  • Administrative: logistics, selected quantitative criteria
  • Operations: inputs, model states, outputs - quickly!
  • Research: sources of error, targeting research
  • Users: making decisions, exploiting skill, avoiding mistakes
Concerns about verification?
5
Need for Verification Measures
  • Verification statistics identify
    - accuracy of forecasts
    - sources of skill in forecasts
    - sources of uncertainty in forecasts
    - conditions where and when forecasts are skillful or not skillful, and why
  • Verification statistics can then inform improvements in forecast skill and in decision making with alternate forecast sources (e.g., climatology, persistence, new forecast systems)

Adapted from Regonda, Demargne, and Seo, 2006
6
Skill versus Value
Assess the quality of the forecast system, i.e., determine the skill and value of the forecast.
Credit: Hagedorn (2006) and Julie Demargne
7
Stakeholder Use of HydroClimate Info and Forecasts
Common across all groups:
  • Uninformed or mistaken about forecast interpretation
  • Use of forecasts limited by lack of demonstrated forecast skill
  • Have difficulty specifying required accuracy
Common across many, but not all, stakeholders:
  • Have difficulty distinguishing between good and bad products
  • Have difficulty placing forecasts in historical context
Unique among stakeholders:
  • Relevant forecast variables, regions (location and scale), seasons, lead times, performance characteristics
  • Technical sophistication: base probabilities, distributions, math
  • Role of forecasts in decision making
8
What is a Perfect Forecast?
Forecast evaluation concepts
"All happy families are alike; each unhappy family is unhappy in its own way." -- Leo Tolstoy (1876)
"All perfect forecasts are alike; each imperfect forecast is imperfect in its own way." -- Holly Hartmann (2002)
9
Different Forecasts, Information, Evaluation
Deterministic, Categorical, Probabilistic
"Today's high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain."
10
Different Forecasts, Information, Evaluation
Deterministic, Categorical, Probabilistic
"Today's high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain."
  • Deterministic: 76 degrees
  • Probabilistic: 30% chance of rain
  • Categorical: rain / no rain
How would you evaluate each of these?
11
Different Forecasts, Information, Evaluation
Deterministic, Categorical, Probabilistic
"Today's high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain."
Deterministic element: 76 degrees
12
ESP Forecasts: User preferences influence verification
From California-Nevada River Forecast Center
13
ESP Forecasts: User preferences influence verification
From California-Nevada River Forecast Center
14
ESP Forecasts: User preferences influence verification
From California-Nevada River Forecast Center
15
ESP Forecasts: User preferences influence verification
From California-Nevada River Forecast Center
16
So Many Evaluation Criteria!
  • Categorical: Hit Rate, Surprise Rate, Threat Score, Gerrity Score, Success Ratio, Post-agreement, Percent Correct, Pierce Skill Score, Gilbert Skill Score, Heidke Skill Score, Critical Success Index, Percent N-class Errors, Modified Heidke Skill Score, Hanssen and Kuipers Score, Gandin and Murphy Skill Scores
  • Deterministic: Bias, Correlation, RMSE, Standardized RMSE, Nash-Sutcliffe, Linear Error in Probability Space
  • Probabilistic: Brier Score, Ranked Probability Score
  • Distributions-oriented Measures: Reliability, Discrimination, Sharpness

17
RFC Verification System Metrics
1. Categorical (predefined threshold, range of values)
   Deterministic metrics: Probability of Detection (POD), False Alarm Ratio (FAR), Probability of False Detection (POFD), Lead Time of Detection (LTD), Critical Success Index (CSI), Pierce Skill Score (PSS), Gerrity Score (GS)
   Probabilistic metrics: Brier Score (BS), Ranked Probability Score (RPS) (see the sketch below)
2. Error (accuracy)
   Deterministic metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Error (ME), Bias (%), Linear Error in Probability Space (LEPS)
   Probabilistic metrics: Continuous RPS
3. Correlation
   Deterministic metrics: Pearson correlation coefficient, rank correlation coefficient, scatter plots
4. Distribution Properties
   Deterministic metrics: mean, variance, higher moments for observations and forecasts
   Probabilistic metrics: Wilcoxon rank sum test, variance of forecasts, variance of observations, ensemble spread, Talagrand Diagram (or Rank Histogram)
Source: Verification Group, courtesy J. Demargne
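The probabilistic metrics above include the Brier Score (BS). As a rough illustration of how it is computed from probability forecasts and 0/1 observations, here is a minimal sketch in plain Python/NumPy (hypothetical data, not the RFC verification system's code):

```python
import numpy as np

def brier_score(prob_fcst, obs_event):
    """Mean squared difference between the forecast probability and the
    0/1 observation of the event (0 = perfect, larger = worse)."""
    prob_fcst = np.asarray(prob_fcst, dtype=float)
    obs_event = np.asarray(obs_event, dtype=float)   # 1 if the event occurred, else 0
    return np.mean((prob_fcst - obs_event) ** 2)

# Hypothetical probability-of-precipitation forecasts and observed outcomes
print(brier_score([0.3, 0.9, 0.1, 0.7], [0, 1, 0, 1]))   # 0.05
```

The Ranked Probability Score generalizes the same squared-error idea to multiple ordered categories by comparing cumulative probabilities.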
18
RFC Verification System Metrics
5. Skill Scores (relative accuracy over a reference forecast)
   Deterministic metrics: Root Mean Squared Error Skill Score (SS-RMSE) (with reference to persistence, climatology, lagged persistence), Wilson Score (WS), Linear Error in Probability Space Skill Score (SS-LEPS)
   Probabilistic metrics: Ranked Probability Skill Score, Brier Skill Score (with reference to persistence, climatology, lagged persistence)
6. Conditional Statistics (based on occurrence of specific events)
   Deterministic metrics: Relative Operating Characteristic (ROC), reliability measures, discrimination diagram, other discrimination measures
   Probabilistic metrics: ROC and ROC area, other resolution measures, reliability diagram, discrimination diagram, other discrimination measures
7. Confidence (metric uncertainty)
   Deterministic metrics: sample size, Confidence Interval (CI)
   Probabilistic metrics: ensemble size, sample size, Confidence Interval (CI) (see the sketch below)
Source: Verification Group, courtesy J. Demargne
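Row 7 is a reminder that every metric should carry a statement of its uncertainty. One simple way to attach a confidence interval to any paired verification metric, shown here only as a sketch (the RFC system may use a different method), is a percentile bootstrap over forecast-observation pairs:

```python
import numpy as np

def bootstrap_ci(fcst, obs, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a paired verification metric."""
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    rng = np.random.default_rng(seed)
    n = len(fcst)
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample pairs with replacement
        samples.append(metric(fcst[idx], obs[idx]))
    lo, hi = np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

rmse = lambda f, o: np.sqrt(np.mean((f - o) ** 2))
# Hypothetical forecast/observation pairs
f = np.array([10., 12., 9., 15., 11., 8., 13., 14.])
o = np.array([11., 14., 9., 13., 12., 7., 15., 13.])
print(rmse(f, o), bootstrap_ci(f, o, rmse))
```

With small samples the interval is wide, which is exactly the point of reporting it.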
19
Possible Performance Criteria
Accuracy - overall correspondence between forecasts and observations
Bias - difference between the average forecast and the average observation
Consistency - forecasts don't waffle around
(Figure: example of good consistency)
20
Possible Performance Criteria
Accuracy - overall correspondence between forecasts and observations
Bias - difference between the average forecast and the average observation
Consistency - forecasts don't waffle around
Sharpness/Refinement - ability to make bullish forecast statements
(Figure: sharp vs. not sharp forecast distributions)
21
What makes a forecast good?
Accuracy - forecasts should agree with observations, with few large errors
Bias - the forecast mean should agree with the observed mean
Association - linear relationship between forecasts and observations
Skill - the forecast should be more accurate than low-skill reference forecasts (e.g., random chance, persistence, or climatology)
Adapted from Ebert (2003)
22
What makes a forecast good?
Reliability - binned forecast values should agree with binned observations (agreement between categories); see the sketch after this list
Resolution - the forecast can discriminate between events and non-events
Sharpness - the forecast can predict with strong probabilities (i.e., 100% for an event, 0% for a non-event)
Spread (Variability) - the forecast represents the associated uncertainty
Adapted from Ebert (2003)
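A minimal sketch of the binning behind a reliability check (hypothetical data): in a reliable forecast system, the mean forecast probability in each bin matches the observed relative frequency of the event in that bin.

```python
import numpy as np

def reliability_table(prob_fcst, obs_event, n_bins=10):
    """For each forecast-probability bin, compare the mean forecast probability
    with the observed relative frequency of the event."""
    prob_fcst = np.asarray(prob_fcst, float)
    obs_event = np.asarray(obs_event, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index for each forecast; clip so a probability of exactly 1.0 falls in the last bin
    which = np.clip(np.digitize(prob_fcst, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            rows.append((prob_fcst[mask].mean(),   # mean forecast probability in bin
                         obs_event[mask].mean(),   # observed frequency in bin
                         int(mask.sum())))         # sample size in bin
    return rows

# Hypothetical probability forecasts and 0/1 observations, reliable by construction
rng = np.random.default_rng(1)
p = rng.uniform(0, 1, 500)
y = (rng.uniform(0, 1, 500) < p).astype(int)
for mean_p, obs_freq, n in reliability_table(p, y):
    print(f"{mean_p:.2f}  {obs_freq:.2f}  n={n}")
```

Plotting mean forecast probability against observed frequency for each bin gives the familiar reliability diagram.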
23
Forecasting Tradeoffs
Forecast performance is multi-faceted.
False Alarms (warning without an event) vs. Surprises (an event without a warning)
False Alarm Ratio vs. Probability of Detection
A forecaster's fundamental challenge is balancing these two. Which is more important? It depends on the specific decision context. (A sketch of both measures follows.)
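Both measures come from the same 2x2 contingency table of warnings and events. A minimal sketch with hypothetical counts, illustrating the trade-off (issuing more warnings raises the Probability of Detection but also the False Alarm Ratio):

```python
def categorical_scores(hits, false_alarms, misses, correct_negatives):
    """Scores from a 2x2 contingency table (forecast event vs. observed event)."""
    pod  = hits / (hits + misses)                               # Probability of Detection
    far  = false_alarms / (hits + false_alarms)                 # False Alarm Ratio
    pofd = false_alarms / (false_alarms + correct_negatives)    # Probability of False Detection
    csi  = hits / (hits + misses + false_alarms)                # Critical Success Index
    return pod, far, pofd, csi

# Hypothetical counts for a cautious and an aggressive warning strategy
print(categorical_scores(hits=20, false_alarms=5,  misses=15, correct_negatives=160))
print(categorical_scores(hits=30, false_alarms=25, misses=5,  correct_negatives=140))
```

In this example the aggressive strategy roughly doubles FAR while raising POD; CSI barely changes, which is why the "right" balance depends on the user's decision context.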
24
How Good? Compared to What?
Skill Score = (S_forecast - S_baseline) / (S_perfect - S_baseline)
For error-type scores whose perfect value is 0, this reduces to 1 - S_forecast / S_baseline.
Example: Skill Score = (0.50 - 0.54) / (1.00 - 0.54) ≈ -8.7%, i.e., worse than guessing.
What is the appropriate baseline?
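The same calculation as a minimal sketch (here the baseline score 0.54 stands in for "guessing" and 1.00 is a perfect score):

```python
def skill_score(s_forecast, s_baseline, s_perfect=1.0):
    """Skill relative to a baseline: 0 = no better than baseline, 1 = perfect."""
    return (s_forecast - s_baseline) / (s_perfect - s_baseline)

# Example from the slide: the forecast scores 0.50, the baseline scores 0.54
print(skill_score(0.50, 0.54))   # about -0.087, i.e. worse than the baseline
```

The choice of baseline (climatology, persistence, random chance) changes the answer, which is the point of the question above.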
25
Graphical Forecast Evaluation
26
Basic Data Display
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
27
Scatter plots
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
28
Histograms
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
29
IVP Scatterplot Example
Source: H. Herr
30
Cumulative Distribution Function (CDF) - IVP
Category 1: no observed precipitation; Category 2: observed precipitation (> 0.001)
Empirical distribution of forecast probabilities for the different observation categories
Goal: widely separated CDFs
Source: H. Herr, IVP Charting Examples, 2007
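A minimal sketch of building those two empirical CDFs from forecast probabilities split by observation category (hypothetical data, not IVP output); widely separated curves indicate good discrimination:

```python
import numpy as np

def empirical_cdf(values):
    """Sorted values and their cumulative fractions (empirical CDF)."""
    x = np.sort(np.asarray(values, float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Hypothetical probability-of-precipitation forecasts and what was observed
fcst_prob  = np.array([0.05, 0.10, 0.60, 0.20, 0.80, 0.30, 0.90, 0.15])
obs_precip = np.array([0,    0,    1,    0,    1,    0,    1,    1   ])  # 1 if precip observed

cdf_dry = empirical_cdf(fcst_prob[obs_precip == 0])   # Category 1: no observed precipitation
cdf_wet = empirical_cdf(fcst_prob[obs_precip == 1])   # Category 2: observed precipitation
print(cdf_dry)
print(cdf_wet)
```

Plotting the two (x, y) pairs as step curves reproduces the kind of chart shown on this slide.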
31
Probability Density Function (PDF) - IVP
Category 1: no observed precipitation; Category 2: observed precipitation (> 0.001)
Empirical distribution using 10 bins in the IVP GUI
Goal: widely separated PDFs
Source: H. Herr, IVP Charting Examples, 2007
32
Box-plots: Quantiles and Extremes
Based on summarizing the CDF computation and plot
Goal: widely separated box-plots
Category 1: no observed precipitation; Category 2: observed precipitation (> 0.001)
Source: H. Herr, IVP Charting Examples, 2007
33
Scalar Forecast Evaluation
34
Standard Scalar Measures
  • Bias: mean forecast - mean observed
  • Correlation coefficient: variance shared between forecast and observed (r²); says nothing about bias or about whether the forecast variance matches the observed variance; the Pearson correlation coefficient assumes a normal distribution (rank correlation does not require normality)
  • Root Mean Squared Error: distance between forecast and observation values; better than correlation alone, but poor when errors are heteroscedastic; emphasizes performance for high flows
  • Alternative: Mean Absolute Error (MAE)
(Figure: forecast vs. observed scatterplot)
A sketch of these measures follows.
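A minimal sketch of bias, Pearson correlation, RMSE, and MAE for paired forecast and observed values (hypothetical numbers, not IVP output):

```python
import numpy as np

def scalar_measures(fcst, obs):
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    bias = fcst.mean() - obs.mean()                  # mean forecast - mean observed
    r    = np.corrcoef(fcst, obs)[0, 1]              # Pearson correlation
    rmse = np.sqrt(np.mean((fcst - obs) ** 2))       # emphasizes large (high-flow) errors
    mae  = np.mean(np.abs(fcst - obs))               # less sensitive to large errors
    return {"bias": bias, "r": r, "r2": r ** 2, "rmse": rmse, "mae": mae}

# Hypothetical seasonal water-supply forecasts vs. observations (1000s ac-ft)
print(scalar_measures([500, 750, 620, 410, 900], [480, 800, 600, 450, 1000]))
```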
35
Standard Scalar Measures (with Scatterplot)
1943-99 April 1 forecasts of Apr-Sept streamflow, Stehekin R at Stehekin, WA: Bias 22, Corr 0.92, RMSE 74.4
1954-97 January 1 forecasts of Jan-May streamflow, Verde R blw Tangle Crk, AZ: Bias -87.5, Corr 0.58, RMSE 228.3
(Scatterplots of forecast vs. observed, in 1000s of ac-ft)
36
IVP Deterministic Scalar Measures
  • ME: smallest, because positive and negative errors cancel
  • MAE vs. RMSE: RMSE is more influenced by large errors on large events
  • MAXERR: largest
  • Sample size: small samples have large uncertainty
A small numerical illustration follows.
Source: H. Herr, IVP Charting Examples, 2007
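Hypothetical errors showing why ME comes out smallest: positive and negative errors cancel in ME, but not in MAE, RMSE, or MAXERR.

```python
import numpy as np

errors = np.array([+50.0, -48.0, +45.0, -47.0])   # hypothetical forecast - observed errors

me     = errors.mean()                     # 0.0: + and - errors cancel
mae    = np.abs(errors).mean()             # 47.5: typical error magnitude
rmse   = np.sqrt((errors ** 2).mean())     # ~47.5, pulled up by the largest errors
maxerr = np.abs(errors).max()              # 50.0: single worst error

print(me, mae, rmse, maxerr)
```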
37
IVP RMSE Skill Scores
Skill compared to a persistence forecast (see the sketch below)
Source: H. Herr, IVP Charting Examples, 2007
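A minimal sketch of an RMSE skill score against a persistence baseline (the previous observation carried forward as the forecast); hypothetical series, not IVP's implementation:

```python
import numpy as np

def rmse(fcst, obs):
    return np.sqrt(np.mean((np.asarray(fcst, float) - np.asarray(obs, float)) ** 2))

obs  = np.array([10., 14., 13., 18., 16., 20.])    # observed flows
fcst = np.array([ 9., 13., 15., 17., 15., 21.])    # forecasts valid at the same times
persistence = np.roll(obs, 1)                      # previous observation as the forecast

# Drop the first time step, where persistence has no prior observation
ss_rmse = 1.0 - rmse(fcst[1:], obs[1:]) / rmse(persistence[1:], obs[1:])
print(ss_rmse)   # > 0 means the forecast beats persistence
```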