Title: PM2.5 Model Performance: Lessons Learned and Recommendations
1. PM2.5 Model Performance: Lessons Learned and Recommendations
- Naresh Kumar
- Eladio Knipping
- EPRI
- February 11, 2004
2. Acknowledgements
- Atmospheric Environmental Research, Inc. (AER)
  - Betty Pun, Krish Vijayaraghavan and Christian Seigneur
- Tennessee Valley Authority (TVA)
  - Elizabeth Bailey, Larry Gautney, Qi Mao and others
- University of California, Riverside
  - Zion Wang, Chao-Jung Chien and Gail Tonnesen
3. Overview
- Model Performance Issues
- Need for Performance Guidelines/Benchmarking
- Review of Statistics
- Summary
4. Model Performance Issues
- Evaluation of Modeling Systems
- Local vs. Regional Evaluation
- Daily/Episodic/Seasonal/Annual Averaging
- Thresholds and Outliers
- What Species to Evaluate?
- Sampling/Network Issues
5. Examples from Model Applications
- Two applications of CMAQ-MADRID
  - Southeastern U.S. (SOS 1999 Episode)
  - Big Bend National Park, Texas (BRAVO) Four-Month Study
- Statistical performance for SO4²⁻, EC, OM, PM2.5
6. Application in Southeastern U.S.
- Southern Oxidant Study (SOS 1999)
  - June 29 to July 10, 1999
- Meteorology processed from MM5 simulations using MCIP2.2
- Emissions files courtesy of TVA
- Simulation
  - Continental U.S. domain
  - 32-km horizontal resolution without nesting
7. Application to Big Bend National Park
- REMSAD
  - The Georgia Tech/Goddard Global Ozone Chemistry Aerosol Radiation Transport (GOCART) model prescribed boundary conditions for SO2 and SO4²⁻ to the REMSAD domain.
- CMAQ-MADRID
  - Preliminary Base Case simulation used boundary conditions as prescribed from a simulation of the larger outer domain by REMSAD.
  - SO2 and SO4²⁻ concentrations were scaled at the CMAQ-MADRID boundary according to CASTNet and IMPROVE Network observations.
8. BRAVO Monitoring Network
9. Local vs. Regional (SOS 1999)
10. Local vs. Regional (BRAVO)
11. Daily SO4²⁻ P-O Pairs with Different Averaging
12. Daily SO4²⁻ P-O Pairs for Each Month
13. Effect of Threshold
14. Mean-Normalized/Fractional Statistics
15. Need for Model Performance Guidelines
- If no guidelines exist:
  - Conduct model simulation with best estimate of emissions and meteorology
  - Perform model evaluation using favorite statistics
    - Difficult to compare across models
  - State that model performance is "quite good" or "adequate" or "reasonable" or "not bad" or "as good as it gets"
  - Use relative reduction factors
- With guidelines for ozone modeling:
  - If model didn't perform within specified guidelines:
    - Extensive diagnostics performed to understand poor performance
    - Improved appropriate elements of modeling system
    - Enhanced model performance
16. Issues with Defining Performance Guidelines for PM2.5 Models
- What is "reasonable", "acceptable" or "good" model performance?
- Past experience: How well have current models done?
- What statistical measures should be used to evaluate the models?
17. Criteria to Select Statistical Measures I
- Simple yet Meaningful
  - Easy to Interpret
  - Relevant to Air Quality Modeling Community
- Properties of Statistics
  - Normalized vs. Absolute
  - Paired vs. Unpaired
  - Non-Fractional vs. Fractional
  - Symmetry
    - Underestimates and overestimates must be perceived equally
    - Underestimates and overestimates must be weighted equally
  - Scalable: biases scale appropriately in statistics
18. Criteria to Select Statistical Measures II
- Statistics that can attest to:
  - Bias
  - Error
  - Ability to capture variability
  - Peak accuracy (to some extent)
- Normalizes daily predictions paired with corresponding daily observations
- Inherently minimizes effect of outliers
- Some statistics/figures may be preferable for EVALUATION, whereas others may be preferred for DIAGNOSTICS
19. Problems with Thresholds and Outliers
- Issues with addressing low-end comparisons via threshold
  - Instrumental uncertainty: detection limit, signal-to-noise
  - Operational uncertainty
  - Additional considerations: network, background concentration, geography, demographics
- Inspection for outliers
  - Outlier vs. valid high observation
  - Definition of outlier must be objective and unambiguous
- Clear guidance necessary for performance analysis
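As a sketch of how a low-end threshold interacts with normalized statistics, the snippet below drops prediction-observation pairs whose observation falls below a cutoff before computing the Mean Normalized Bias. The data and the 2.0 µg/m³ cutoff are hypothetical illustrations, not values from the slides:

```python
# Hypothetical sketch: effect of an observation threshold on a
# mean-normalized statistic. The 2.0 ug/m3 cutoff is an assumption
# for illustration only.

def mean_normalized_bias(pairs):
    """MNB = (1/N) * sum((P - O) / O) over (prediction, observation) pairs."""
    return sum((p - o) / o for p, o in pairs) / len(pairs)

pairs = [(0.4, 0.1), (2.5, 3.0), (6.0, 5.0), (9.0, 10.0)]  # (P, O) in ug/m3

all_mnb = mean_normalized_bias(pairs)
filtered = [(p, o) for p, o in pairs if o >= 2.0]  # drop sub-threshold obs
filtered_mnb = mean_normalized_bias(filtered)

# The single low-end pair (0.4, 0.1) dominates MNB; the threshold removes it.
print(f"MNB, all pairs:      {all_mnb:+.2%}")
print(f"MNB, with threshold: {filtered_mnb:+.2%}")
```

This illustrates the slide's concern: the statistic, not the model, can change dramatically depending on where the threshold is drawn.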
20. Review of Statistics
- Ratio of Means (Bias of Means) or Quantile-Quantile Comparisons
  - Defeats purpose of daily observations: completely unpaired
  - Hides any measure of true model performance
- Normalized Mean Statistics (not to be confused with Mean Normalized)
  - Defeats purpose of daily observations: equally weighs all errors regardless of magnitude of individual daily observations
  - Masks results in bias (e.g., numerator zero effect)
- Based on Linear Regressions
  - Slope of Least Squares Regression; Root (Normalized) Mean Square Error
  - Slope of Least Median of Squares Regression (Rousseeuw regression)
  - Can be skewed; neglects magnitude of observations; good for cross-comparisons
- Fractional Statistics
  - Taints integrity of statistics by placing predictions in denominator; not scalable
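The "numerator zero effect" noted above can be made concrete with a small sketch contrasting the sum-based Normalized Mean Bias with the pair-based Mean Normalized Bias, using hypothetical (P, O) pairs:

```python
# Hypothetical sketch: compensating errors cancel in the sum-based
# NMB but not in the pair-based MNB.

def nmb(pairs):
    """Normalized Mean Bias: sum(P - O) / sum(O)."""
    return sum(p - o for p, o in pairs) / sum(o for _, o in pairs)

def mnb(pairs):
    """Mean Normalized Bias: (1/N) * sum((P - O) / O)."""
    return sum((p - o) / o for p, o in pairs) / len(pairs)

# One overprediction offsets one underprediction exactly in absolute terms.
pairs = [(2.0, 1.0), (9.0, 10.0)]  # (prediction, observation)

print(f"NMB: {nmb(pairs):+.0%}")  # numerator sums to zero: bias is masked
print(f"MNB: {mnb(pairs):+.0%}")  # (+100% - 10%) / 2 = +45%
```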
21. Bias Statistics
- Mean Normalized Bias / Arithmetic Bias Factor
  - Same statistic: ABF is the style for symmetric perception
  - ABF = 2:1 for +100% MNB; ABF = 1:2 for -50% MNB
  - MNB in % can be useful during diagnostics due to simple and meaningful comparison to MNE, but the comparison is flawed
  - The statistics give less weight to underpredictions than to overpredictions
- Logarithmic Bias Factor / Logarithmic-Mean Normalized Bias
  - Wholly symmetric representation of bias that satisfies all criteria
  - Can be written in "factor" form or in "percentage" form
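A minimal sketch of the asymmetry claim, assuming the usual definitions (MNB averages (P − O)/O; LMNB averages ln(P/O); LBF = exp(LMNB)) and hypothetical data:

```python
import math

# Hypothetical sketch: a factor-2 overprediction paired with a
# factor-2 underprediction. The arithmetic statistic reports a net
# bias; the logarithmic statistic treats the two symmetrically.

def mnb(pairs):
    """Mean Normalized Bias: (1/N) * sum((P - O) / O)."""
    return sum((p - o) / o for p, o in pairs) / len(pairs)

def lmnb(pairs):
    """Logarithmic-Mean Normalized Bias: (1/N) * sum(ln(P / O))."""
    return sum(math.log(p / o) for p, o in pairs) / len(pairs)

pairs = [(2.0, 1.0), (1.0, 2.0)]  # (prediction, observation)

print(f"MNB:  {mnb(pairs):+.1%}")            # (+100% - 50%) / 2 = +25.0%
print(f"LMNB: {lmnb(pairs):+.1%}")           # ln 2 - ln 2 = +0.0%
print(f"LBF:  {math.exp(lmnb(pairs)):.2f}")  # 1.00, i.e. no net bias
```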
22. Error Statistics
- Mean Normalized Error
  - Each data point normalized with paired observation
  - Similar flaw as Arithmetic Mean Normalized Bias: the statistic gives less weight to underpredictions than to overpredictions
- Logarithmic Error Factor / Logarithmic-Mean Normalized Error
  - Satisfies all criteria
  - Comparisons between logarithmic-based statistics (bias and error) are visibly meaningful when expressed in factor form
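The same contrast for the error statistics, assuming the usual forms (MNE averages |P − O|/O; LEF = exp of the mean |ln(P/O)|) and hypothetical data:

```python
import math

# Hypothetical sketch: factor-2 over- and underprediction. MNE scores
# the overprediction (100%) harder than the underprediction (50%);
# LEF reports both as the same factor-2 discrepancy.

def mne(pairs):
    """Mean Normalized Error: (1/N) * sum(|P - O| / O)."""
    return sum(abs(p - o) / o for p, o in pairs) / len(pairs)

def lef(pairs):
    """Logarithmic Error Factor: exp((1/N) * sum(|ln(P / O)|))."""
    return math.exp(sum(abs(math.log(p / o)) for p, o in pairs) / len(pairs))

pairs = [(2.0, 1.0), (1.0, 2.0)]  # (prediction, observation)

print(f"MNE: {mne(pairs):.1%}")   # (100% + 50%) / 2 = 75.0%
print(f"LEF: {lef(pairs):.2f}")   # exp(ln 2) = 2.00
```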
23. Comparing Bias and Error Statistics Based on Arithmetic and Logarithmic Means
24. Mean Normalized/Fractional Statistics
25. Logarithmic/Arithmetic Statistics
26. Logarithmic/Arithmetic Statistics
Note: MNB/ABF and MNE use the 95% data interval. FB, FE, LMNB/LBF and LMNE/LEF use 100% of the data.
27. Relating Criteria for LBF/LMNB and LEF/LMNE
- Criterion for Logarithmic EF/MNE can be established from criterion for Logarithmic BF/MNB
  - For example: Error twice the amplitude of Bias
- Logarithmic Bias Factor / Logarithmic-Mean Normalized Bias
  - LBF = 1.25:1 to 1:1.25; LMNB = +25% to -20%
- Logarithmic Error Factor / Logarithmic-Mean Normalized Error
  - LEF = 1.56; LMNE = +56%
28. Relating Criteria for LBF/LMNB and LEF/LMNE
- Criterion for Logarithmic EF/MNE can be established from criterion for Logarithmic BF/MNB
  - For example: Error twice the amplitude of Bias
- Logarithmic Bias Factor / Logarithmic-Mean Normalized Bias
  - LBF = 1.50:1 to 1:1.50; LMNB = +50% to -33%
- Logarithmic Error Factor / Logarithmic-Mean Normalized Error
  - LEF = 2.25; LMNE = +125%
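Reading "error twice the amplitude of bias" as doubling the bias amplitude in log space, i.e. ln(LEF) = 2·ln(LBF) so LEF = LBF², reproduces both examples. A short verification sketch:

```python
# Sketch verifying the LEF = LBF**2 reading of "error twice the
# amplitude of bias" for the two bias criteria on these slides.

for lbf in (1.25, 1.50):
    lef = lbf ** 2                # ln(LEF) = 2 * ln(LBF)
    lmnb_over = lbf - 1.0         # factor form -> percentage (overprediction)
    lmnb_under = 1.0 / lbf - 1.0  # underprediction side of the criterion
    lmne = lef - 1.0
    print(f"LBF {lbf:.2f}:1 -> LMNB {lmnb_over:+.0%} to {lmnb_under:+.0%}; "
          f"LEF {lef:.2f} -> LMNE {lmne:+.0%}")
```

For LBF = 1.25 this yields LEF = 1.5625 ≈ 1.56 (LMNE +56%), and for LBF = 1.50 it yields LEF = 2.25 (LMNE +125%), matching the criteria above.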
29. Variability Statistics
- Coefficient of Determination (R²)
  - Should not be used in absence of previous statistics
- Coefficient of Determination of Linear Regressions
  - Least Squares Regression through Origin (Ro²)
    - Used by some in global model community as a measure of performance and ability to capture variability
  - Least Median of Squares Regression
    - More robust; inherently minimizes effects of outliers
- Comparison of Coefficients of Variation
  - Comparison of Standard Deviation/Mean of predictions and observations
- Other statistical metrics?
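A pure-Python sketch of these variability measures, computed on hypothetical prediction/observation series (the through-origin Ro² here uses one common uncentered definition, which may differ from the slides' exact convention):

```python
# Hypothetical sketch of the variability statistics on this slide.

def mean(xs):
    return sum(xs) / len(xs)

def r_squared(obs, pred):
    """Coefficient of determination (squared Pearson correlation)."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)

def ro_squared(obs, pred):
    """R^2 of least squares regression forced through the origin
    (uncentered total sum of squares; conventions vary)."""
    slope = sum(o * p for o, p in zip(obs, pred)) / sum(o * o for o in obs)
    ss_res = sum((p - slope * o) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum(p * p for p in pred)
    return 1.0 - ss_res / ss_tot

def coeff_of_variation(xs):
    """Standard deviation divided by the mean."""
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5 / m

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.8, 3.5, 3.9, 5.4]

print(f"R^2      = {r_squared(obs, pred):.3f}")
print(f"Ro^2     = {ro_squared(obs, pred):.3f}")
print(f"CV(obs)  = {coeff_of_variation(obs):.3f}")
print(f"CV(pred) = {coeff_of_variation(pred):.3f}")
```

Comparing CV(pred) against CV(obs) is the spread-of-distribution check the last bullet describes: a model can score a high R² while still under- or over-dispersing relative to the observations.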
30. Summary: Items for Discussion
- What spatial scales to use for model performance?
  - Single Site / Local / Region of Interest
  - Large Domain / Continental
- What statistics should be used?
- What are the guidelines/benchmarks for performance evaluation?
- Should the same guidelines be used for all components?
  - Sulfate, Nitrate, Carbonaceous, PM2.5
  - Ammonium, Organic Mass, EC, Fine Soil, Major Metal Oxides
- How are network considerations taken into account in guidelines?
- Should models meet performance guidelines for an entire year and/or other time scales (monthly, seasonal)?
- Should there be separate guidelines for different time scales?
  - Statistics based on daily P-O pairs
  - Average daily results to create weekly, monthly, seasonal or annual statistics
31. More Examples
- More examples of Comparison of Statistics
- Fractional
- Arithmetic-Mean Normalized
- Logarithmic-Mean Normalized
32. Mean Normalized/Fractional Statistics
33. Logarithmic/Arithmetic Statistics
34. Mean Normalized/Fractional Statistics
35. Logarithmic/Arithmetic Statistics
36. Mean Normalized/Fractional Statistics
37. Logarithmic/Arithmetic Statistics