Title: Sei-Young Park
1Quality Control and the application of Cross
Validation in the Real-Time Mesoscale
Analysis (RTMA) system
Sei-Young Park
KMA/NWPD, NCEP/EMC
Manuel Pondeca, Jim Purser, David Parrish, Geoff
DiMego, John Derber, Xiujuan Su, Wan-Shu Wu, Geoff
Manikin
NCEP/EMC
sypark_at_kma.go.kr
2Contents
- Introduction of RTMA
- Quality control in RTMA
- Gross error check
- Variational QC
- Use list vs. Reject list
- Cross Validation
- Hilbert curves
- Summary and conclusion
3Real-Time Mesoscale Analysis (RTMA)
- The RTMA is a fast-track, proof-of-concept effort intended to
  - leverage and enhance existing analysis capabilities in order to generate experimental CONUS-scale hourly NDFD-matching analyses
  - establish a real-time process that delivers a sub-set of fields to allow preliminary comparisons to NDFD forecast grids
  - also provide estimates of analysis uncertainty
  - establish a benchmark for future AOR (Analysis Of Record) efforts
  - build constituency for subsequent AOR development activities
4Real-Time Mesoscale Analysis (RTMA)
- Procedure
  - Temperature and dew point at 2 m, wind at 10 m
    - RUC forecast/analysis (13 km) is downscaled by GSD to the 5 km NDFD grid
    - Downscaled RUC is used as first guess in NCEP's 2DVar analysis of ALL surface observations
    - Estimate of analysis error/uncertainty
  - Precipitation: NCEP Stage II analysis
  - Sky cover: NESDIS GOES sounder effective cloud amount
- Logistics
  - Hourly, within 30 minutes
  - 5 km NDFD grid in GRIB2
  - Operational at NCEP Q3 FY2006
  - Distribution of analyses and estimate of analysis error/uncertainty via AWIPS SBN as part of the OB7.2 upgrade, end of CY2006
  - Archived at NCDC
5Quality control in RTMA
QC is essential when using new, high-density,
unverified data. Their unverified quality is one
of the reasons why the Mesonets have not been
used, despite their high data density. Applying
QC with reasonable methods is therefore the first
step toward using these data in the analysis system.
- 1. Gross error check decided by the observation increment (residual)
- 2. Variational QC
- 3. Use list vs. Reject list of Mesonet wind
- The analysis here concentrates on the Mesonet wind.
Figure panels: temperature, wind
61. Gross Error Check
Figure panels: Obs vs. Anal, Obs vs. Guess; limits on (o-a)/R of 10 and of 5
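The gross error check can be sketched as follows. This is a minimal illustration, assuming the test is that the absolute observation increment divided by the observation error R must not exceed the limit (10 or 5 on the slide). The function name, the argument names and the sample values are hypothetical, and this is not the GSI code.

```python
def passes_gross_check(obs, reference, obs_error, limit):
    """Return True when |obs - reference| / obs_error does not exceed the limit.

    reference is the analysis or first-guess value at the observation site.
    """
    if obs_error <= 0.0:
        raise ValueError("obs_error must be positive")
    ratio = abs(obs - reference) / obs_error
    return ratio <= limit


# Hypothetical wind observations (m/s) against a reference of 3 m/s with R = 1.5 m/s.
observations = [2.5, 9.0, 14.0, 25.0]
kept_limit_10 = [o for o in observations if passes_gross_check(o, 3.0, 1.5, limit=10.0)]
kept_limit_5 = [o for o in observations if passes_gross_check(o, 3.0, 1.5, limit=5.0)]
print(kept_limit_10)
print(kept_limit_5)
```

Tightening the limit from 10 to 5 removes the observation whose normalized increment lies between the two limits, which is the effect the two rows of histograms compare.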
72. Variational QC
The distributions of departures (y - Hx_b)
often reveal a more frequent occurrence of large
departures than expected from the corresponding
Gaussian (normal) distribution with the same mean
and standard deviation, showing up as wide tails.
By Erik Andersson, 1999, 2006
8Variational QC
By Erik Andersson, 1999,2006
9Variational QC
By Erik Andersson, 1999,2006
10Var QC weight function vs. IV (A = 0.08 for
Metar, Synoptic sea and land)
Curves: A = 0.08 (288), A = 0.1 (288), A = 0.06 (288)
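How the weight depends on A can be sketched with the Gaussian-plus-flat error model used in the variational QC formulation of Andersson and Järvinen, which the preceding slides credit. This is an illustration under assumptions: A is read as the prior probability of gross error, the half-width d of the flat distribution (in units of the observation error) is a hypothetical value, and the settings actually used in the RTMA may differ.

```python
import math


def varqc_weight(normalized_departure, a, d=5.0):
    """VarQC weight for a departure expressed in units of the observation error.

    a: assumed prior probability of gross error (the A on the slide).
    d: assumed half-width of the flat gross-error distribution, in units of
       the observation error standard deviation (hypothetical value).
    """
    gamma = a * math.sqrt(2.0 * math.pi) / ((1.0 - a) * 2.0 * d)
    gaussian_term = math.exp(-0.5 * normalized_departure ** 2)
    return 1.0 - gamma / (gamma + gaussian_term)


# Tabulate the weight for the three values of A shown on the slide.
for a in (0.06, 0.08, 0.1):
    row = [round(varqc_weight(z, a), 3) for z in (0.0, 1.0, 2.0, 3.0, 4.0)]
    print(f"A = {a}: {row}")
```

The weight stays close to one for small departures and falls toward zero in the tails; a larger A makes the transition start at smaller departures.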
11Distribution of the innovation (VarQC)
Figure panels: Obs vs. Anal, Obs vs. Guess; limits on (o-a)/R of 10 and of 5
123. Uselist of Mesonet wind
- Mesonets comprise the majority of obs, but they are not as good as other conventional sfc obs sources
- 5/6 of all Mesonet data are from AWS, which includes most school sites, and APRSWXNET (citizens' network)
- No mesonet winds are used in the current RUC (or NAM) due to a slow wind bias
- GSD has constructed a Uselist of acceptable networks based on overall siting strategies etc.
  - It depends on the Mesonet provider name.
- The GSD Uselist was applied in the RTMA and has been running on the parallel system
- Continuing need for scrutiny of mesonet quality
Provider name       Description
OK-Meso             Oklahoma Mesonet
WT-Meso             West Texas Mesonet
APG                 U.S. Army Aberdeen Proving Grounds
CODOT               Colorado Department of Transportation
FLDOT               Florida Department of Transportation
INDOT               Indiana Department of Transportation
MNDOT               Minnesota Department of Transportation
DCNet               DCNet
GoMOOS              Gulf of Maine Ocean Observing System
GPSMET              ESRL/GSD Ground-Based GPS
NOS-PORT            National Ocean Service Physical Oceanographic Real-Time System
RAWS                Remote Automated Weather Stations
MesoWest AGRIMET    U.S. Bureau of Reclamation
MesoWest AQ         NOAA Air Resources Laboratory Special Operations and Resource Division
MesoWest ARL FRD    NOAA Air Resources Laboratory Field Research Division
MesoWest ARL SORD   NOAA Air Resources Laboratory Special Operations and Resource Division
MesoWest DOERD      Department of Energy Office of Repository Development
MesoWest DUGWAY     U.S. Army Dugway Proving Grounds
MesoWest ITD        Idaho Transportation Department
MesoWest MT DOT     Montana Department of Transportation
MesoWest TOOELE     U.S. Army Desert Chemical Depot, Tooele County
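Because the uselist is keyed on the provider name, applying it amounts to a membership test per report. The sketch below assumes each report carries a provider field; the record layout, the station identifiers and the wind values are hypothetical, and only part of the provider list is copied in.

```python
# Subset of the provider names listed above; extend with the rest as needed.
USELIST = {
    "OK-Meso", "WT-Meso", "APG", "CODOT", "FLDOT", "INDOT",
    "MNDOT", "DCNet", "GoMOOS", "GPSMET", "NOS-PORT", "RAWS",
}


def filter_by_uselist(reports, uselist=USELIST):
    """Keep only mesonet wind reports whose provider is on the uselist."""
    return [report for report in reports if report["provider"] in uselist]


# Hypothetical reports: AWS is not among the providers listed above.
reports = [
    {"station": "STN01", "provider": "OK-Meso", "u": 2.1, "v": -0.4},
    {"station": "STN02", "provider": "AWS", "u": 0.3, "v": 0.1},
    {"station": "STN03", "provider": "RAWS", "u": 4.0, "v": 1.2},
]
accepted = filter_by_uselist(reports)
print([report["station"] for report in accepted])
```

A provider-level list is coarse: it admits or drops a whole network at once, which is why the slides pair it with a station-level reject list.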
13Number distribution of wind data (U)
For Var QC: Var_pg = 0.05, wgtlim = 0.25, Gross = 10 m/s
Figure panels: with uselist, without uselist
14Number distribution of wind data (V)
For Var QC: Var_pg = 0.05, wgtlim = 0.25, Gross = 10 m/s
Figure panels: with uselist, without uselist
15Verification of the Uselist
- Period: 2006.5.23.00 to 2006.6.14.23 (23 days, hourly)

        without uselist   with uselist
BIAS    -0.687            0.046
RMSE    2.046             1.737
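The two scores in the table can be computed as below. This is a sketch that assumes BIAS is the mean difference and RMSE the root-mean-square difference between analysis and observation; the sign convention and the sample differences are assumptions, not values from the experiment.

```python
import math


def bias_and_rmse(differences):
    """Return (mean, root-mean-square) of a list of differences."""
    n = len(differences)
    if n == 0:
        raise ValueError("need at least one difference")
    bias = sum(differences) / n
    rmse = math.sqrt(sum(d * d for d in differences) / n)
    return bias, rmse


# Hypothetical analysis-minus-observation wind differences (m/s).
diffs = [-1.0, 0.5, 2.0, -1.5]
bias, rmse = bias_and_rmse(diffs)
print(f"BIAS {bias:.3f}  RMSE {rmse:.3f}")
```

A bias near zero with a sizeable RMSE, as in this sample, means the errors cancel on average but are individually large, so both numbers are needed to judge the uselist.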
16Uselist of Mesonet wind
VarQC, CASE 1: 2006.3.14.15 UTC
Figure panels: with uselist, without uselist (all obs data)
17Uselist of Mesonet wind
VarQC, CASE 1: 2006.3.14.15 UTC
Figure panels: with uselist, without uselist
18Uselist of Mesonet wind
VarQC, CASE 2: 2006.11.25.12 UTC
Figure panels: with uselist, without uselist (all obs data)
19Uselist of Mesonet wind
VarQC, CASE 2: 2006.11.25.12 UTC
Figure panels: with uselist, without uselist
204. Reject list of Mesonet wind
- Reject list constructed from the data rejected in the gross error check and VarQC
  - made and updated hourly
  - It depends on the station name.

No.  station name  x  lat     lon
 1   MLGC1         x  32.880  243.570
 2   FHCC1         x  32.990  243.930
 3   LTHC1            34.020  243.810
 4   BPNC1         x  34.380  242.310
 5   PIVC1         x  35.450  241.720
 6   INTC1         x  36.120  242.910
 7   QTWA3         x  36.580  246.270
 8   TS037         x  36.620  241.790
 9   QBRA3         x  36.790  246.240
10   BADU1         x  37.150  246.050
11   HP001            32.890  243.580
12   GDSN2         x  35.810  244.530
13   A36           x  36.540  244.460
14   AR221         x  34.190  243.290
15   AR745            34.500  242.680
16   C6728            34.840  240.920
17   H0099         x  34.380  242.400
18   PHELN            34.450  242.370
19   HSPRA            34.450  242.680
20   APPLE            34.510  242.820
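An hourly update of a station-keyed reject list might look like the sketch below. It assumes a station is added when its wind fails the gross error check or when its VarQC weight falls below wgtlim; reading wgtlim = 0.25 as that weight threshold is an interpretation of the settings quoted earlier. The function name, the tuple layout and the station identifiers are hypothetical.

```python
def update_reject_list(reject_list, hourly_results, gross_limit=10.0, wgtlim=0.25):
    """Return the reject list extended with stations rejected this hour.

    hourly_results holds (station, normalized_increment, varqc_weight) tuples.
    """
    updated = set(reject_list)
    for station, normalized_increment, varqc_weight in hourly_results:
        failed_gross = abs(normalized_increment) > gross_limit
        failed_varqc = varqc_weight < wgtlim
        if failed_gross or failed_varqc:
            updated.add(station)
    return updated


# Hypothetical results for one analysis hour.
hour_results = [
    ("STN_A", 0.8, 0.97),   # passes both checks
    ("STN_B", 12.4, 0.00),  # fails the gross error check
    ("STN_C", 3.9, 0.10),   # passes the gross check, low VarQC weight
]
reject_list = update_reject_list(set(), hour_results)
print(sorted(reject_list))
```

Keying on the station name lets a single poorly sited instrument be excluded without discarding the rest of its network.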
21Distribution of rejected data
Period: 2006.6.8 to 2006.6.20 (13 days)
Legend (bin, with share in parentheses):
0-25 (81.4)
25-50 (16.4)
50-75 (1.8)
75-100 (0.6)
22Distribution of rejected data
2006.11.21.12.
2006.11.21.19.
23Verification of the reject list
24Verification of the reject list
Period: 2006.11.10 to 2006.11.17 (8 days)
25Verification of the reject list
Period: 2006.11.10 to 2006.11.17 (8 days)
26Estimates of RTMA Analysis Accuracy
Cross-Validation (CV)
- NWP data assimilation gauges the quality of initial conditions via model forecast skill.
- Cross-validation is really the only way to verify an analysis for its own sake
- Withhold a small percentage of observations from the analysis (10%)
- Validate the analysis at those withheld obs
- Measure the ability of the analysis system to reproduce their values
- Now built into GSI
- Can withhold and internally compare analysis
- Baseline CV is also computed internally, based on a simple single-pass Cressman analysis scheme
- Future performance metrics will be based on improvement over this Baseline
Ordering of each type: ps, t, q, uv, spd
10% withheld
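The withhold-and-validate procedure, with a single-pass Cressman scheme standing in as the analysis, can be sketched as below. This is a simplified illustration: it analyses observed values directly rather than increments on a first guess, the observations are synthetic, and the function names and the influence radius are hypothetical. It is not the GSI implementation.

```python
import math
import random


def cressman_value(x, y, obs, radius):
    """Single-pass Cressman estimate at (x, y) from (xo, yo, value) tuples."""
    r2max = radius * radius
    weight_sum = 0.0
    value_sum = 0.0
    for xo, yo, value in obs:
        r2 = (x - xo) ** 2 + (y - yo) ** 2
        if r2 < r2max:
            weight = (r2max - r2) / (r2max + r2)
            weight_sum += weight
            value_sum += weight * value
    return value_sum / weight_sum if weight_sum > 0.0 else None


def cross_validate(obs, radius, n_sets=10, test_set=0):
    """Withhold one observation in n_sets, analyse the rest, score at withheld sites."""
    test = obs[test_set::n_sets]
    train = [o for i, o in enumerate(obs) if i % n_sets != test_set]
    errors = []
    for x, y, value in test:
        estimate = cressman_value(x, y, train, radius)
        if estimate is not None:
            errors.append(estimate - value)
    if not errors:
        raise ValueError("no withheld observation had training data in range")
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse, len(test), len(train)


# Synthetic observations of a smooth field on the unit square (illustration only).
rng = random.Random(0)
obs = []
for _ in range(200):
    x, y = rng.random(), rng.random()
    obs.append((x, y, math.sin(2 * math.pi * x) + math.cos(2 * math.pi * y)))

bias, rmse, n_test, n_train = cross_validate(obs, radius=0.2)
print(f"withheld {n_test}, used {n_train}, bias {bias:.3f}, rmse {rmse:.3f}")
```

Repeating the call with test_set = 0 through 9 gives ten disjoint test sets, matching the 10% withholding described above.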
27Surface Obs Stations
28A. CTRL
Number of training set 10 (10) 1st set
Number of test set 10 (10) 1st set
29What is a Hilbert curve?
- It is an example of a "space-filling" curve, described by David Hilbert in 1891.
- It literally covers every point in a square. Like all good fractals, it is generated in iterations.
Figure panels: iteration 0, iteration 1, iteration 2
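The iterations can be generated with the standard distance-to-coordinate mapping for the discrete Hilbert curve. In this sketch, order 1 is the first refinement, on a 2 by 2 grid; how that numbering lines up with the iteration labels in the figure is an assumption, and the function names are illustrative.

```python
def hilbert_point(order, d):
    """Map distance d along the curve to (x, y) on a 2**order by 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_curve(order):
    """Return every grid point in the order the curve visits it."""
    return [hilbert_point(order, d) for d in range(4 ** order)]


for order in (1, 2, 3):
    print(order, hilbert_curve(order)[:8])
```

Each refinement visits every cell of a grid four times denser, and consecutive cells are always neighbours, which is why points that are close along the curve are also close in space.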
30Why use Hilbert curves for Cross
Validation?
- "A space-filling Hilbert curve provides an
efficient and convenient tool for arranging
randomly located data in a serial ordering from
which it is then possible to draw multiple
non-overlapping subsets of data, each subset
tending to be more evenly distributed in space
than the complete dataset. Each such subset can
be used as independent validating data for a
corresponding analysis that uses only the
complement of that subset. In this way, a
cross-validation of the parameters defining the
covariance models can be carried out and the
parameters optimized"
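One way to draw such subsets is to sort the stations by their position along a Hilbert curve and then take every tenth station, starting from ten different offsets. The sketch below assumes that striding scheme, which is one reading of the quoted description and not necessarily the procedure used in the RTMA; the grid order, the function names and the random station positions are hypothetical.

```python
import random


def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_subsets(stations, n_subsets=10, order=8):
    """Split (lon, lat) stations into disjoint subsets spread along the curve."""
    lons = [lon for lon, _ in stations]
    lats = [lat for _, lat in stations]
    lon0, lat0 = min(lons), min(lats)
    lon_span = (max(lons) - lon0) or 1.0
    lat_span = (max(lats) - lat0) or 1.0
    top = (1 << order) - 1

    def key(station):
        ix = int((station[0] - lon0) / lon_span * top)
        iy = int((station[1] - lat0) / lat_span * top)
        return hilbert_index(order, ix, iy)

    ordered = sorted(stations, key=key)
    return [ordered[k::n_subsets] for k in range(n_subsets)]


# Hypothetical station positions (degrees east, degrees north).
rng = random.Random(1)
stations = [(rng.uniform(235.0, 295.0), rng.uniform(25.0, 50.0)) for _ in range(200)]
subsets = hilbert_subsets(stations)
print([len(subset) for subset in subsets])
```

Because neighbours along the curve are neighbours in space, striding through the ordering puts roughly one station from each small area into each subset, so each subset tends to cover the domain more evenly than a random draw.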
31Random and tanhx distribution
By Jim Purser
32Number of test set 10 (10) 1st set
Number of training set 10 (10) 1st set
A. CTRL
33Comparison of the three test sets
A. CTRL
B. Hilbert_curve
34Var QC vs. No Var QC for CV
35Reject list vs. no-list ( No CV)
Period: 2006.11.20 to 2006.11.26 (7 days)
        Ctrl (without reject list)   Exp (with reject list)
BIAS    -0.275                       -0.266
RMSE    2.6973                       2.7005
36Reject list with CV
Period: 2006.11.20 to 2006.11.26 (7 days)
        Ctrl (without reject list)   Exp (with reject list)
BIAS    -0.099                       -0.093
RMSE    1.8941                       2.0049
37Anisotropic background error parameter
Rltop: function correlation length. In the
anisotropic background error covariance model,
the pattern is very sensitive to this parameter.
Figure panels: Rltop = 500, Rltop = 250, Rltop = 900, isotropic
38Anisotropic background error parameter tests
- Considered parameters
  - Scale length
  - Function correlation length
Figure panels: BIAS distribution for 10 test sets (WIND), RMSE distribution for 10 test sets (WIND)
39Anisotropic background error parameter tests
Mean bias of 10 test sets in experiments
 1  isotropic
 2  w1.0_t1.0_w900_t100
 3  w1.3_t1.0_w900_t100
 4  w1.6_t1.0_w900_t100
 5  w1.0_t1.0_w500_t100
 6  w1.3_t1.0_w500_t100
 7  w1.6_t1.0_w500_t100
 8  w1.0_t1.0_w500_t500
 9  w1.3_t1.0_w500_t500
10  w1.6_t1.0_w500_t500
Figure labels: iso, exp9, exp7
40Analysis Increment (U-wind)
Figure panels: anl-ges (iso), anl-ges (exp9), anl(iso)-anl(exp9)
Shaded: smoothed terrain. Solid: analysis increment.
41Analysis Increment (U-wind)
Figure panels: anl-ges (iso), anl-ges (exp9), anl-ges (exp7), anl(iso)-anl(exp9), anl(iso)-anl(exp7)
Shaded: smoothed terrain. Solid: analysis increment.
42Summary and conclusion
- RTMA: Phase I of AOR
  - leverage and enhance existing analysis capabilities in order to generate experimental CONUS-scale hourly NDFD-matching analyses
  - establish a real-time process that delivers a sub-set of fields to allow preliminary comparisons to NDFD forecast grids
- QC in RTMA: With the gross error check, VarQC and the reject list, efficient QC could be done for Mesonet wind data.
- Cross Validation: As an accurate validation method, cross-validation is built into GSI and was tested in RTMA.
- Hilbert curves: To obtain homogeneous test sets, Hilbert curves were introduced and successfully implemented in the RTMA system.
- Anisotropic background error: It was shown that CV can be used to determine proper parameters.
43Thank you !!