Title: The Effects of Missing Data on Mean Hourly Values
Don Herzog, NOAA/NGDC/CIRES
XIIIth IAGA Workshop, Golden, Colorado, June 2008
Outline
- Purpose
- Study Design
- Results
- MHV Computational Methods
- Conclusions
- Future Considerations
Purpose
- Provide some quantitative measure of the accuracy of Mean Hourly Values (MHVs) computed when data are missing during the hour.
- Further the discussion of MHV quality when data are missing, to help answer the question: When might an MHV not be computed if some data are missing during the hour?
Study Design
- Used 3 USGS stations from the INTERMAGNET CD-ROMs.
- Used the X-component instead of H or D.
- Selection criteria
  - 3 latitudes: High (College), Mid (Boulder), Low (San Juan)
  - 3 magnetic activity levels (based on K-Index): Active (K = 8), Moderate (K = 5), Quiet (K = 0)
- Constructed non-missing data sets of 24-hour days using 3-hour intervals with the same K-Index.
- Generated sets of random numbers between 1 and 60 for 5-minute, 10-minute, up to 40-minute deletions.
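The deletion-set step can be sketched as follows. This is a minimal illustration, not the author's code: the function name `make_deletion_sets`, the fixed seed, and the choice of distinct (non-repeating, not necessarily contiguous) minute numbers are all assumptions, since the slides only say that random numbers between 1 and 60 were generated for deletions of 5, 10, up to 40 minutes.

```python
import random


def make_deletion_sets(durations=range(5, 45, 5), seed=None):
    """Return {n_minutes: sorted list of n distinct minute numbers in 1..60}.

    Assumption: each deletion set holds distinct minutes drawn without
    replacement; the slides do not say whether the deleted minutes were
    scattered or contiguous.
    """
    rng = random.Random(seed)
    return {n: sorted(rng.sample(range(1, 61), n)) for n in durations}


# Eight sets: 5, 10, 15, ..., 40 deleted minutes (seed is arbitrary).
deletion_sets = make_deletion_sets(seed=2008)
for n_minutes, minutes in deletion_sets.items():
    print(n_minutes, minutes)
```

Drawing without replacement guarantees that an "n-minute" set really removes n minutes from the hour.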
Study Design, p. 2
- Applied those 8 deletion sets to the complete data sets and compared the MHVs of the hours with deletions against the MHVs of the complete data.
- Computed the Root Mean Square (RMS) of the MHV differences over the 24 hours for each deletion set.
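The comparison described above might look like the sketch below. It is hedged throughout: the helper names are hypothetical, the data are synthetic stand-ins rather than observatory values, and applying the same deleted minutes to every hour of the day is an assumption that the slides do not confirm.

```python
import math
import random


def mean_hourly_value(minute_values, deleted_minutes=()):
    """Mean of the one-minute values, skipping deleted minutes (1-based)."""
    deleted = set(deleted_minutes)
    kept = [v for m, v in enumerate(minute_values, start=1) if m not in deleted]
    return sum(kept) / len(kept)


def rms_of_mhv_differences(day, deleted_minutes):
    """RMS, over the hours of a day, of (MHV with deletions - complete MHV)."""
    diffs = [
        mean_hourly_value(hour, deleted_minutes) - mean_hourly_value(hour)
        for hour in day
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Synthetic 24 x 60 array of one-minute X values in nT (not real data).
rng = random.Random(0)
day = [[20000.0 + rng.gauss(0.0, 5.0) for _ in range(60)] for _ in range(24)]

# One 10-minute deletion set, applied to every hour (an assumption).
deleted = sorted(rng.sample(range(1, 61), 10))
print(rms_of_mhv_differences(day, deleted))
```

With no minutes deleted the RMS is exactly zero, which gives a quick sanity check before running real data through the same functions.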
Lower Limit K-Index Values (nT) by K
Composite Days Used in Study - College
Composite Days Used in Study - Boulder
Composite Days Used in Study - San Juan
Intervals Used for Composite Days
Deletion Minutes Used for Deletion Sets
Sample Deletion Sets for College Active (Hour 11)
Root Mean Square (RMS)
- RMS provides a measure of the typical size of a set of +/- numbers.
- RMS is the same as or larger than the average of the unsigned values.
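A short sketch of the two RMS properties stated above, using made-up signed differences rather than values from the study. The function names `rms` and `mean_abs` are illustrative only.

```python
import math


def rms(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def mean_abs(values):
    """Average of the unsigned (absolute) values."""
    return sum(abs(v) for v in values) / len(values)


mixed = [3.0, -4.0]       # made-up signed differences of unequal size
equal = [2.0, -2.0, 2.0]  # made-up differences of equal magnitude

print(rms(mixed), mean_abs(mixed))  # RMS (about 3.54) exceeds the mean of 3.5
print(rms(equal), mean_abs(equal))  # both are 2.0
```

The two measures coincide only when every value has the same magnitude; otherwise RMS gives extra weight to the larger differences.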
Expected Results
Actual Results
Results
(vs. 350)
Normalized RMS
Conclusions
- Need further study.
  - Run ensemble of statistical deletion sets?
- No one-size-fits-all answer to the original question.
- For quiet times (K = 0), almost any data seems to work.
- RMS relative to K-Index is nearly constant for 5 to 30 missing minutes.
- No large difference between mid- and low-latitudes at any activity level.
Future Considerations
- Why similar results for all data sets (latitude, activity)?
- Why is the 35-minute deletion set so different?
- What happens for 1-5 minute deletions?
- What happens for other K-Index levels?
- Change selection criteria
  - Hourly Range vs. 3-hour K-Index.
Questions?
Thank You!
Photo by Frank McDonald
Photo by Jane Loughney