PM2.5 Model Performance: Lessons Learned and Recommendations

1
PM2.5 Model Performance: Lessons Learned and
Recommendations
  • Naresh Kumar
  • Eladio Knipping
  • EPRI
  • February 11, 2004

2
Acknowledgements
  • Atmospheric and Environmental Research, Inc. (AER)
  • Betty Pun, Krish Vijayaraghavan and Christian
    Seigneur
  • Tennessee Valley Authority (TVA)
  • Elizabeth Bailey, Larry Gautney, Qi Mao and
    others
  • University of California, Riverside
  • Zion Wang, Chao-Jung Chien and Gail Tonnesen

3
Overview
  • Model Performance Issues
  • Need for Performance Guidelines/Benchmarking
  • Review of Statistics
  • Summary

4
Model Performance Issues
  • Evaluation of Modeling Systems
  • Local vs. Regional Evaluation
  • Daily/Episodic/Seasonal/Annual Averaging
  • Threshold and Outliers
  • What Species to Evaluate?
  • Sampling/Network Issues

5
Examples from Model Applications
  • Two applications of CMAQ-MADRID
  • Southeastern U.S. (SOS 1999 Episode)
  • Big Bend National Park, Texas (BRAVO) Four-Month
    Study
  • Statistical performance for SO4²⁻, EC, OM, PM2.5

6
Application in Southeastern U.S.
  • Southern Oxidant Study (SOS 1999)
  • June 29 to July 10, 1999
  • Meteorology processed from MM5 simulations using
    MCIP2.2
  • Emissions files courtesy of TVA
  • Simulation
  • Continental U.S. Domain
  • 32-km horizontal resolution without nesting

7
Application to Big Bend National Park
REMSAD
CMAQ-MADRID
  • The Georgia Tech/Goddard Global Ozone Chemistry
    Aerosol Radiation Transport (GOCART) model
    prescribed boundary conditions for SO2 and SO4²⁻
    to the REMSAD domain.
  • Preliminary Base Case simulation used boundary
    conditions as prescribed from a simulation of the
    larger outer domain by REMSAD.
  • SO2 and SO4²⁻ concentrations were scaled at
    CMAQ-MADRID boundary according to CASTNet and
    IMPROVE Network observations.

8
BRAVO Monitoring Network
9
Local vs. Regional (SOS 1999)
10
Local vs. Regional (BRAVO)
11
Daily SO4²⁻ P-O Pairs with Different Averaging
12
Daily SO4²⁻ P-O Pairs for Each Month
13
Effect of Threshold
14
Mean-Normalized/Fractional Statistics
15
Need for Model Performance Guidelines
  • If no guidelines exist
  • Conduct model simulation with best estimate of
    emissions and meteorology
  • Perform model evaluation using favorite
    statistics
  • Difficult to compare across models
  • State that model performance is "quite good" or
    "adequate" or "reasonable" or "not bad" or "as
    good as it gets"
  • Use relative reduction factors
  • With guidelines for ozone modeling
  • If model didn't perform within specified
    guidelines
  • Extensive diagnostics performed to understand
    poor performance
  • Improved appropriate elements of modeling system
  • Enhanced model performance

16
Issues with Defining Performance Guidelines for
PM2.5 Models
  • What is "reasonable", "acceptable" or "good"
    model performance?
  • Past experience: How well have current models
    done?
  • What statistical measures should be used to
    evaluate the models?

17
Criteria to Select Statistical Measures I
  • Simple yet Meaningful
  • Easy to Interpret
  • Relevant to Air Quality Modeling Community
  • Properties of Statistics
  • Normalized vs. Absolute
  • Paired vs. Unpaired
  • Non-Fractional vs. Fractional
  • Symmetry
  • Underestimates and overestimates must be
    perceived equally
  • Underestimates and overestimates must be weighted
    equally
  • Scalable: biases scale appropriately in statistics

18
Criteria to Select Statistical Measures II
  • Statistics that can attest to
  • Bias
  • Error
  • Ability to capture variability
  • Peak accuracy (to some extent)
  • Normalizes daily predictions paired with
    corresponding daily observations
  • Inherently minimizes effect of outliers
  • Some statistics/figures may be preferable for
    EVALUATION, whereas others may be preferred for
    DIAGNOSTICS

19
Problems with Thresholds and Outliers
  • Issues with addressing low-end comparisons via a
    threshold (see the sketch after this slide)
  • Instrumental uncertainty: detection limit,
    signal-to-noise
  • Operational uncertainty
  • Additional considerations: network, background
    concentration, geography, demographics
  • Inspection for outliers
  • Outlier vs. valid high observation
  • Definition of outlier must be objective and
    unambiguous
  • Clear guidance necessary for performance
    analysis.

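A minimal sketch of the thresholding idea above, assuming paired daily predictions and observations held in NumPy arrays; the cutoff value and the choice to filter on the observation alone are illustrative assumptions, not the presenters' procedure.

import numpy as np

def apply_threshold(pred, obs, cutoff):
    """Keep only prediction-observation pairs whose observation is at or
    above the cutoff (e.g., an instrument detection limit)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    keep = obs >= cutoff   # assumed: filter on observations only
    return pred[keep], obs[keep]

# Example: drop pairs below a hypothetical 1.0 ug/m3 detection limit
pred = np.array([0.4, 2.0, 6.5, 12.0])
obs = np.array([0.3, 1.8, 5.0, 15.0])
pred_f, obs_f = apply_threshold(pred, obs, cutoff=1.0)
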
20
Review of Statistics
  • Ratio of Means (Bias of Means) or
    Quantile-Quantile Comparisons
  • Defeats purpose of daily observations: completely
    unpaired
  • Hides any measure of true model performance
  • Normalized Mean Statistics (not to be confused
    with Mean-Normalized)
  • Defeats purpose of daily observations: weighs all
    errors equally regardless of the magnitude of
    individual daily observations
  • Can mask bias (e.g., over- and underpredictions
    cancel in the numerator)
  • Based on Linear Regressions
  • Slope of Least Squares Regression; Root
    (Normalized) Mean Square Error
  • Slope of Least Median of Squares Regression
    (Rousseeuw regression)
  • Can be skewed; neglects magnitude of
    observations; good for cross-comparisons
  • Fractional Statistics
  • Taints integrity of statistics by placing
    predictions in the denominator; not scalable
    (see the sketch after this slide)

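For reference, a sketch of two of the statistics reviewed above, assuming their commonly used definitions (the slide itself gives no formulas): Normalized Mean Bias normalizes the summed error by the summed observations, and Fractional Bias places the prediction-observation average in the denominator.

import numpy as np

def normalized_mean_bias(pred, obs):
    """Normalized Mean Bias: sum(P - O) / sum(O). Every error carries the
    same weight regardless of its own observation, and over- and
    underpredictions can cancel in the numerator."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.sum(pred - obs) / np.sum(obs)

def fractional_bias(pred, obs):
    """Fractional Bias: mean of 2(P - O)/(P + O). Bounded in [-2, 2]
    because the prediction appears in the denominator, so it does not
    scale like a multiplicative factor."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.mean(2.0 * (pred - obs) / (pred + obs))
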
21
Bias Statistics
  • Mean Normalized Bias/Arithmetic Bias Factor
  • Same statistic; ABF is the style for symmetric
    perception
  • ABF = 2:1 for +100% MNB; ABF = 1:2 for -50% MNB
  • MNB in % can be useful during diagnostics due to
    its simple and meaningful comparison to MNE, but
    the comparison is flawed.
  • These statistics give less weight to
    underpredictions than to overpredictions.
  • Logarithmic Bias Factor/Logarithmic-Mean
    Normalized Bias
  • Wholly symmetric representation of bias that
    satisfies all criteria
  • Can be written in factor form or in percentage
    form (see the sketch after this slide)

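A sketch of the bias statistics on this slide, assuming the usual paired-daily definitions MNB = mean[(P - O)/O] and LMNB = exp(mean[ln(P/O)]) - 1, with LBF the same quantity expressed as a factor; the exact formulas are not shown in the transcript.

import numpy as np

def mean_normalized_bias(pred, obs):
    """Arithmetic Mean Normalized Bias: mean of (P - O) / O."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.mean((pred - obs) / obs)

def log_mean_normalized_bias(pred, obs):
    """Logarithmic-Mean Normalized Bias: exp(mean ln(P/O)) - 1."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.exp(np.mean(np.log(pred / obs))) - 1.0

def log_bias_factor(pred, obs):
    """Logarithmic Bias Factor: exp(|mean ln(P/O)|), reported as a factor."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.exp(abs(np.mean(np.log(pred / obs))))

# Symmetry check: factor-of-2 over- and underpredictions give MNB of
# +100% and -50%, but the same logarithmic bias factor of 2 either way.
obs = np.array([5.0, 10.0, 20.0])
print(mean_normalized_bias(2 * obs, obs), mean_normalized_bias(0.5 * obs, obs))  # 1.0  -0.5
print(log_bias_factor(2 * obs, obs), log_bias_factor(0.5 * obs, obs))            # 2.0   2.0
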
22
Error Statistics
  • Mean Normalized Error
  • Each data point normalized with paired
    observation
  • Similar flaw as the Arithmetic Mean Normalized
    Bias: the statistic gives less weight to
    underpredictions than to overpredictions.
  • Logarithmic Error Factor/Logarithmic-Mean
    Normalized Error
  • Satisfies all criteria
  • Comparisons between logarithmic-based statistics
    (bias and error) are visibly meaningful when
    expressed in factor form (see the sketch after
    this slide)

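A companion sketch for the error statistics, under the same assumptions (MNE = mean[|P - O|/O], LEF = exp(mean[|ln(P/O)|]) and LMNE = LEF - 1 are assumed definitions, since the slide does not spell them out).

import numpy as np

def mean_normalized_error(pred, obs):
    """Arithmetic Mean Normalized Error: mean of |P - O| / O."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.mean(np.abs(pred - obs) / obs)

def log_error_factor(pred, obs):
    """Logarithmic Error Factor: exp(mean |ln(P/O)|); LMNE = LEF - 1."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.exp(np.mean(np.abs(np.log(pred / obs))))

# Every point below is off by a factor of 2, yet MNE reports ~0.83
# (underpredictions weighted less) while LEF reports exactly 2.0.
obs = np.array([5.0, 10.0, 20.0])
pred = np.array([10.0, 5.0, 40.0])
print(mean_normalized_error(pred, obs), log_error_factor(pred, obs))
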
23
Comparing Bias and Error Statistics Based on
Arithmetic and Logarithmic Means
24
Mean Normalized/Fractional Statistics
25
Logarithmic/Arithmetic Statistics
26
Logarithmic/Arithmetic Statistics
Note: MNB/ABF and MNE use a 95% data interval. FB,
FE, LMNB/LBF and LMNE/LEF use 100% of the data.
27
Relating Criteria for LBF/LMNB and LEF/LMNE
  • Criterion for Logarithmic EF/MNE can be
    Established from Criterion for Logarithmic BF/MNB
  • For example: Error twice the amplitude of Bias
  • Logarithmic Bias Factor/Logarithmic-Mean
    Normalized Bias
  • LBF = 1.25:1 to 1:1.25; LMNB = +25% to -20%
  • Logarithmic Error Factor/Logarithmic-Mean
    Normalized Error
  • LEF = 1.56; LMNE = +56%

28
Relating Criteria for LBF/LMNB and LEF/LMNE
  • Criterion for Logarithmic EF/MNE can be
    Established from Criterion for Logarithmic BF/MNB
  • For example: Error twice the amplitude of Bias
  • Logarithmic Bias Factor/Logarithmic-Mean
    Normalized Bias
  • LBF = 1.50:1 to 1:1.50; LMNB = +50% to -33%
  • Logarithmic Error Factor/Logarithmic-Mean
    Normalized Error
  • LEF = 2.25; LMNE = +125% (see the conversion
    sketch after this slide)

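The factor and percentage forms on these two slides follow from one arithmetic relation. A sketch of the conversion, assuming (as the "error twice the amplitude of bias" example implies) that the error criterion doubles the bias criterion in log space, i.e. LEF = LBF².

def criteria_from_bias_factor(lbf):
    """Convert a bias-factor criterion into the percentage forms and the
    matching error criterion (assumes LEF = LBF**2, i.e. the error
    amplitude is twice the bias amplitude in log space)."""
    lmnb_over = lbf - 1.0          # e.g. 1.25 -> +25%
    lmnb_under = 1.0 / lbf - 1.0   # e.g. 1.25 -> -20%
    lef = lbf ** 2                 # e.g. 1.25 -> 1.5625 (~1.56)
    lmne = lef - 1.0               # e.g. 1.5625 -> +56%
    return lmnb_over, lmnb_under, lef, lmne

print(criteria_from_bias_factor(1.25))  # (0.25, -0.2, 1.5625, 0.5625)
print(criteria_from_bias_factor(1.50))  # (0.5, -0.333..., 2.25, 1.25)
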
29
Variability Statistics
  • Coefficient of Determination (R²)
  • Should not be used in absence of previous
    statistics
  • Coefficient of Determination of Linear
    Regressions
  • Least Squares Regression through the Origin (Ro²)
  • Used by some in global model community as a
    measure of performance and ability to capture
    variability
  • Least Median of Squares Regression
  • More robust, inherently minimizes effects of
    outliers
  • Comparison of Coefficients of Variation
  • Comparison of Standard Deviation/Mean of
    predictions and observations (see the sketch
    after this slide)
  • Other statistical metrics?

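A sketch of the variability measures listed above, assuming standard definitions; the least-median-of-squares (Rousseeuw) regression is omitted because it needs a robust-fitting routine not shown here.

import numpy as np

def coefficient_of_determination(pred, obs):
    """R² of an ordinary least-squares fit (squared Pearson correlation)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.corrcoef(obs, pred)[0, 1] ** 2

def regression_through_origin(pred, obs):
    """Slope and Ro² of a least-squares line forced through the origin
    (uncentered R² convention assumed)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    slope = np.sum(obs * pred) / np.sum(obs ** 2)
    ss_res = np.sum((pred - slope * obs) ** 2)
    return slope, 1.0 - ss_res / np.sum(pred ** 2)

def coefficient_of_variation(x):
    """Standard deviation divided by the mean."""
    x = np.asarray(x, float)
    return np.std(x) / np.mean(x)

obs = np.array([4.0, 8.0, 12.0, 20.0])
pred = np.array([5.0, 7.0, 15.0, 18.0])
print(coefficient_of_determination(pred, obs))
print(regression_through_origin(pred, obs))
print(coefficient_of_variation(pred), coefficient_of_variation(obs))
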
30
Summary: Items for Discussion
  • What spatial scales to use for model performance?
  • Single Site; Local/Region of Interest
  • Large Domain/Continental
  • What statistics should be used?
  • What are the guidelines/benchmarks for
    performance evaluation?
  • Should the same guidelines be used for all
    components?
  • Sulfate, Nitrate, Carbonaceous, PM2.5
  • Ammonium, Organic Mass, EC, Fine Soil, Major
    Metal Oxides
  • How are network considerations taken into account
    in guidelines?
  • Should models meet performance guidelines for an
    entire year and/or other time scales (monthly,
    seasonal)?
  • Should there be separate guidelines for different
    time scales?
  • Statistics based on daily P-O (prediction-observation) pairs
  • Average daily results to create weekly, monthly,
    seasonal or annual statistics

31
More Examples
  • More examples of Comparison of Statistics
  • Fractional
  • Arithmetic-Mean Normalized
  • Logarithmic-Mean Normalized

32
Mean Normalized/Fractional Statistics
33
Logarithmic/Arithmetic Statistics
34
Mean Normalized/Fractional Statistics
35
Logarithmic/Arithmetic Statistics
36
Mean Normalized/Fractional Statistics
37
Logarithmic/Arithmetic Statistics