Title: Replacing Missing Data for Ensemble Systems
1Replacing Missing Data for Ensemble Systems
- Tyler McCandless
- Dr. Sue Ellen Haupt
- Dr. George Young
- The Pennsylvania State University Department of
Meteorology
2Motivation
- What is the problem with missing ensemble
forecast data?
- - Limits spread and dispersion
- Who does it affect?
- - Operational Meteorologists
- - Research Scientists
- What is being done?
- - Case deletion or ignoring the missing data
Graphic courtesy of University of Washington
http://www.atmos.washington.edu/ens/view_uwme.cgi
3Methodology
- What is unique to replacing missing ensemble
forecasts?
- - Preserve ensemble dispersion
- - Preserve ensemble spread
- - Produce similar accuracy in post-processing
4Problem
- One year of 48-hour 2-m temperature forecasts
- Eight-member University of Washington Mesoscale
Ensemble
- 260 of the 2920 (8.9%) ensemble
temperature forecasts are missing
- Of the 365 days, 151 (41.4%) are missing at
least one ensemble member.
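The two percentages above follow directly from the counts. A minimal sketch, assuming the forecasts are held as one list of member values per day with NaN marking a missing forecast (the function name and data layout are illustrative, not taken from the slides):

```python
import math

def missing_summary(forecasts):
    """Return (fraction of forecasts missing, fraction of days affected).

    forecasts: list of days, each a list of member forecasts; NaN = missing.
    """
    total = sum(len(day) for day in forecasts)
    missing = sum(1 for day in forecasts for f in day if math.isnan(f))
    days_affected = sum(1 for day in forecasts
                        if any(math.isnan(f) for f in day))
    return missing / total, days_affected / len(forecasts)

# Check against the counts quoted on the slide.
print(f"{260 / 2920:.1%}")  # 8.9%
print(f"{151 / 365:.1%}")   # 41.4%
```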
5Process Layout
- The missing data are replaced before performing
bias-correction and the post-processing schemes.
- Two post-processing methods: 10-day
performance-weighted window and K-Means Regime
clustering.
6Methods to Replace Missing Data
- Persistence
- Use the previous day's temperature forecast
- Ensemble Member Mean
- Use the mean forecast for the entire year
- Polynomial Imputation
- Use a fifth-degree polynomial fit
- Polynomial Imputation with 3 Iterations
- Use a fifth-degree polynomial fit with
3 iterations
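The two simplest replacements listed above, persistence and the ensemble member mean, can be sketched as follows. This is an illustration only: it assumes each ensemble member is stored as a one-year list of daily forecasts with NaN marking missing days, and the function names are hypothetical.

```python
import math

def fill_persistence(series):
    """Replace each missing value with the previous day's forecast.

    A run of missing days carries the last available value forward.
    A missing first day has no predecessor and is left as NaN.
    """
    filled = list(series)
    for i in range(1, len(filled)):
        if math.isnan(filled[i]):
            filled[i] = filled[i - 1]
    return filled

def fill_member_mean(series):
    """Replace each missing value with the member's mean over the
    available days of the whole series (here, the entire year)."""
    available = [x for x in series if not math.isnan(x)]
    mean = sum(available) / len(available)
    return [mean if math.isnan(x) else x for x in series]
```

Both functions operate on one ensemble member at a time, so a full ensemble would be filled by calling them once per member.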
7Methods to Replace Missing Data
- Fourier fit
- Fit a Fourier series to each ensemble member and
replace the missing data.
Fourier fit for ensemble member 3
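One way to realize a Fourier fit with gaps is a least-squares fit of a truncated Fourier series to the available days, evaluated on the missing days. The sketch below makes several assumptions the slides do not state: an annual period of 365 days, two harmonics, and a plain normal-equations solve. All function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fourier_basis(day, period, harmonics):
    """Basis values [1, cos, sin, cos, sin, ...] for one day."""
    row = [1.0]
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * day / period
        row += [math.cos(angle), math.sin(angle)]
    return row

def fill_fourier(series, period=365.0, harmonics=2):
    """Fit the series on available days only; fill NaN days from the fit."""
    rows = [(fourier_basis(d, period, harmonics), y)
            for d, y in enumerate(series) if not math.isnan(y)]
    p = 2 * harmonics + 1
    # Normal equations (A^T A) c = A^T y over the available days.
    ata = [[sum(r[i] * r[j] for r, _ in rows) for j in range(p)]
           for i in range(p)]
    aty = [sum(r[i] * y for r, y in rows) for i in range(p)]
    coef = solve(ata, aty)
    return [sum(c * b for c, b in zip(coef, fourier_basis(d, period, harmonics)))
            if math.isnan(y) else y
            for d, y in enumerate(series)]
```

Available forecasts are left untouched; only the missing days take the fitted value.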
8Methods to Replace Missing Data
- Ensemble Member Mean Deviation
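The slide does not spell out the calculation, so the following is only one plausible reading, labeled as an assumption: the missing member is set to the mean of the members available that day, plus that member's average deviation from the other members over the preceding few days. The window length (3 days here), the use of the other members' mean on both past and current days, and the function name are all assumptions for illustration.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def fill_mean_deviation(forecasts, window=3):
    """forecasts[t][m]: forecast of member m on day t, NaN if missing.

    Assumed scheme: replacement = mean of today's available members
    + mean over the previous `window` days of
      (member m's forecast - mean of the other available members).
    """
    filled = [list(day) for day in forecasts]
    for t, day in enumerate(forecasts):
        available = [x for x in day if not math.isnan(x)]
        if not available:
            continue  # no members at all today; leave the day as NaN
        today_mean = _mean(available)
        for m, x in enumerate(day):
            if not math.isnan(x):
                continue
            deviations = []
            for s in range(max(0, t - window), t):
                past = forecasts[s]  # original data, not earlier fills
                others = [v for k, v in enumerate(past)
                          if k != m and not math.isnan(v)]
                if math.isnan(past[m]) or not others:
                    continue
                deviations.append(past[m] - _mean(others))
            # With no usable history, fall back to the plain ensemble mean.
            offset = _mean(deviations) if deviations else 0.0
            filled[t][m] = today_mean + offset
    return filled
```

Because the replacement keeps the member's recent offset from the rest of the ensemble, it does not collapse the member onto the ensemble mean, which is the property needed to avoid shrinking the spread.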
9Optimal Length of Mean Deviation for 10-Day
Window Post-Processing
10Metrics
- Have similar accuracy to the case-deletion method
for both the K-Means and 10-day performance-weighted
window post-processing methods.
- - Use Mean Absolute Error (MAE) as the accuracy
metric.
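MAE is the average magnitude of the forecast errors. A minimal sketch, assuming paired lists of post-processed forecasts and verifying observations (the function name is illustrative):

```python
def mean_absolute_error(forecasts, observations):
    """Average of |forecast - observation| over all paired cases."""
    errors = [abs(f - o) for f, o in zip(forecasts, observations)]
    return sum(errors) / len(errors)

print(mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))  # 1.0
```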
11Accuracy Results
12Metrics
- Produce ensemble dispersion and spread similar to
that of the case-deletion method.
- - Use verification rank histograms and ensemble
spread.
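The slides do not define how spread is computed. A common choice, assumed here, is the standard deviation of the member forecasts on each day, averaged over all days. The function name is hypothetical.

```python
import statistics

def mean_ensemble_spread(forecasts):
    """forecasts: list of days, each a list of member forecasts.

    Returns the sample standard deviation across members,
    averaged over days (an assumed definition of spread).
    """
    return statistics.fmean(statistics.stdev(day) for day in forecasts)
```

Comparing this value between the imputed data set and the case-deletion data set shows whether a replacement method has narrowed the ensemble.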
13Verification Rank Histograms
- Sort ensemble member forecasts from lowest
(coldest) to highest (warmest)
- Tally the rank of the verification relative to
the sorted forecasts
- Used to assess reliability (calibration): the
relationship between the forecast and the
observation
Wilks, D.S., 2006: Statistical Methods in the
Atmospheric Sciences, 2nd ed., Academic Press, p.
317.
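The tallying procedure described above can be sketched as follows. The rank is taken as the number of members colder than the verification, so sorting is implicit. Ties between a forecast and the observation are not treated specially, which is a simplification. The function name and data layout are illustrative.

```python
def rank_histogram(forecasts, observations):
    """forecasts: list of days, each a list of member forecasts.
    observations: verifying observation for each day.

    Returns counts for ranks 0..n_members, where rank 0 means the
    observation was colder than every member and rank n_members
    means it was warmer than every member.
    """
    n_members = len(forecasts[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(forecasts, observations):
        rank = sum(1 for f in members if f < obs)
        counts[rank] += 1
    return counts
```

For an eight-member ensemble this gives nine bins; a roughly flat histogram indicates a calibrated ensemble, while a U shape indicates too little spread.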
14Verification Rank Histograms Portland
15Verification Rank Histograms Astoria
16Verification Rank Histograms
17Ensemble Spread
18Conclusions
- The three-day mean deviation method for replacing
the missing data both preserves the ensemble
calibration and produces similar accuracy to case
deletion.
- The three-day mean deviation method can be used to
develop training datasets and can also be used in
a real-time forecasting environment.
19Future Directions
- A longer time series dataset
- More locations
- Experiment with other advanced systematic
statistical methods (e.g., multiple imputation)
- Perform statistical testing to determine the
significance of results
20Thanks!
- The authors would like to thank Steven Greybush
for the use of his code, detailed documentation,
and knowledgeable discussions.
- The authors additionally thank Pennsylvania State
Climatologist, Paul Knight, for his insightful
idea for the mean deviation method of replacing
missing data.
- The authors also wish to thank Richard Grumm and
Dr. Harry Glahn of the National Weather Service
for providing valuable information for this
project.
- Thanks are also due to the University of
Washington for enabling public access to its
ensemble data. The research was funded in part by
the PSU Applied Research Laboratory Honors
Program.