Title: Evaluating CPB Forecasts: A Comparison to VAR Models
1. Evaluating CPB Forecasts: A Comparison to VAR Models
- Adam Elbourne, Henk Kranendonk, Rob Luginbuhl, Bert Smid, Martin Vromans
2. Evaluating CPB Forecasts: A Comparison to VAR Models
- Introduction
- Literature
- Competitors
- Variable Choice
- Results
- Model Selection?
- Conclusion
3. Introduction
- Who is CPB? (CPB Netherlands Bureau for Economic Policy Analysis)
- Publish forecasts 4 times per year
- Real forecasts in March and September
- Updates in June and December
- Forecasts are used as baseline scenarios for policy decisions
- Use large macro model SAFFIER
- Often evaluate our forecasts on various metrics
4. Literature
- 1970s: Large macro-models performed worse than simple time series models
- 1980s: Time series properties of large macro models improved
- Macro-models as good as simple time series, although Bayesian VARs were promising
- Late 1990s: Pooling forecasts emerges as a promising new field
- Causes of forecast failure also heavily researched
5. Literature
- 4 key conclusions
- Simple methods do best
- The accuracy measure matters
- Pooling helps
- The evaluation horizon matters
- Also...
- Structural breaks are endemic
- Search for robust models
6. Competitors
- SAFFIER
- Yearly VAR and dVAR
- Quarterly VAR and dVAR
- Quarterly VECM
- Bayesian variants
- As above but estimated using Bayesian methods
- Minnesota prior
- E(A1) = I, E(Aj) = 0 for j > 1
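A minimal sketch (not from the slides) of what this prior mean looks like when the coefficient matrices A1, ..., Ap are stacked side by side; the helper name and the dimensions in the example are illustrative assumptions:

    import numpy as np

    def minnesota_prior_mean(n_vars, n_lags):
        """Prior mean for the stacked VAR coefficients [A1, ..., Ap].

        The Minnesota prior centres each equation of a levels VAR on a
        random walk: E(A1) = I and E(Aj) = 0 for j > 1.  For a dVAR the
        prior mean on A1 would typically be zero as well (white noise in
        differences); that variant is not shown here.
        """
        prior = np.zeros((n_vars, n_vars * n_lags))
        prior[:, :n_vars] = np.eye(n_vars)  # E(A1) = I, remaining lags stay 0
        return prior

    # Example: 5 variables, 4 lags -> a 5 x 20 matrix of prior means
    print(minnesota_prior_mean(5, 4))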
7. Key differences with respect to the recent literature
- Recent literature says structural breaks are endemic
- Relationships between levels of variables are unstable
- VECMs place great emphasis on estimating the long-run relationships between the levels
- VAR in levels estimation also converges asymptotically to the correct long-run relationship between the levels
- dVAR removes levels information
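For reference, in standard notation (not shown on the slides) the three model classes can be written as:

    VAR in levels:  y_t  = c + A_1 y_{t-1} + ... + A_p y_{t-p} + e_t
    dVAR:           Δy_t = c + B_1 Δy_{t-1} + ... + B_q Δy_{t-q} + e_t
    VECM:           Δy_t = c + α β' y_{t-1} + Γ_1 Δy_{t-1} + ... + Γ_{p-1} Δy_{t-p+1} + e_t

The α β' y_{t-1} term is where the VECM carries the estimated long-run relationships between the levels; first-differencing in the dVAR drops exactly this information, while the levels VAR keeps it implicitly.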
8. Variable Choice
- We chose 9 variables to include in addition to GDP from real-time data sets
- Chosen on the basis of cross-correlations with GDP growth over the period 1977-1992
- Yearly data: from 1974, with forecasts evaluated over 1993-2006
- Quarterly data: from 1977, with forecasts evaluated over 2001-2006
- We estimated all model combinations of 1 to 4 lags and up to 5 variables
- 1 x 4 lags univariate models
- 9 x 4 lags bivariate models
- 36 x 4 lags trivariate models
- 84 x 4 lags 4-variable models
- 126 x 4 lags 5-variable models
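A small sketch reconstructing these counts (reading each entry above as "number of variable combinations x number of lag lengths"): with 9 candidate variables besides GDP, up to 4 of them enter at a time, and each specification is estimated with 1 to 4 lags:

    from math import comb

    extra_vars, max_extra, max_lags = 9, 4, 4

    total = 0
    for k in range(max_extra + 1):        # 0..4 variables in addition to GDP
        n_combos = comb(extra_vars, k)    # 1, 9, 36, 84, 126
        print(f"{k + 1}-variable models: {n_combos} x {max_lags} lag lengths")
        total += n_combos * max_lags

    print("specifications per model class:", total)  # 1024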
9. Variables
- GDP
- Consumption
- Total Worker Compensation
- CPI
- World Trade
- 3 Month Interest Rates
- Business Climate Survey
- Consumer Confidence
- Bankruptcies
- Ifo Survey
10. Results: March Forecasts
- Comparison also made for September, but focus on March today
- SAFFIER
- Since we are estimating approx. 5,000 VAR models, some are bound to beat SAFFIER
- This is pure model mining, which is unfair
- Compare to averages: how well does a class of models do on average?
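A sketch, with made-up data, of the two summary statistics used in the comparisons that follow: the average accuracy of a class of models versus the accuracy of their pooled (equal-weight average) forecast. The 50-model class, the noise levels and the resulting numbers are illustrative assumptions, not the paper's results:

    import numpy as np

    rng = np.random.default_rng(0)
    actual = rng.normal(2.0, 1.0, size=14)                   # e.g. growth outcomes 1993-2006 (illustrative)
    forecasts = actual + rng.normal(0, 1.5, size=(50, 14))   # 50 hypothetical forecasts from one model class

    def mae(f, y):
        return np.mean(np.abs(f - y))

    def rmse(f, y):
        return np.sqrt(np.mean((f - y) ** 2))

    # Class average: how well does a model in this class do on average?
    avg_mae = np.mean([mae(f, actual) for f in forecasts])
    avg_rmse = np.mean([rmse(f, actual) for f in forecasts])

    # Pooled forecast: average the forecasts first, then score once
    pooled = forecasts.mean(axis=0)

    print(f"class average   MAE {avg_mae:.2f}  RMSE {avg_rmse:.2f}")
    print(f"pooled forecast MAE {mae(pooled, actual):.2f}  RMSE {rmse(pooled, actual):.2f}")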
11. Results - March Averages 1993-2006
12. Results - March Averages 2001-2006
13. Results - March Pooled 1993-2006
14. Results - March Pooled 2001-2006
15. Do simple models perform better?
- Effect of increasing number of variables
- Yearly dVARs
- Current Year
16. Do simple models perform better?
- Effect of increasing lag length
- Yearly dVARs
- Next Year
17. Relation to the Literature
- Do simple methods do best?
- VECMs do well
- Increasing lag length can help
- Increasing model dimension helps
- Does the accuracy measure matter?
- There is some difference between mean error and the other accuracy measures
- MAE and RMSE tell basically the same story (a small numerical illustration follows this list)
- Does pooling help?
- Yes, more so for classical than Bayesian
- Does the evaluation horizon matter?
- Yearly models consistently do well for next year
- Quarterly models do better for the current year
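The numerical illustration mentioned above (made-up errors, not from the paper): positive and negative errors cancel in the mean error but not in MAE or RMSE, which is why the mean error can tell a different story while MAE and RMSE move together:

    import numpy as np

    errors = np.array([1.0, -1.0, 0.5, -0.5])   # forecast minus outcome, illustrative

    me = errors.mean()                     # 0.00 - signs cancel, looks unbiased
    mae = np.abs(errors).mean()            # 0.75
    rmse = np.sqrt((errors ** 2).mean())   # ~0.79

    print(me, mae, rmse)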
18. Could we pick a subset of models?
- On what basis?
- Must be based on factors known before the forecast is made
- In-sample fit
- Previous MAE or RMSE
19. In-Sample Fit
20. Previous Accuracy
- Correlation between accuracy in previous periods and accuracy in subsequent periods
- Looks promising for yearly models
- However, these were less accurate than quarterly models over the same period, especially for the current year
- Only a small improvement is possible
- Over 2001-06, picking the top 50 previous performers reduced average MAE from 1.57 to 1.49
- Still worse than quarterly models
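A sketch of the selection rule examined here, on made-up data: rank models by MAE in an earlier window, keep the best 50, and check whether their average MAE in the later window improves on the full set. The model count, noise levels and the persistent "skill" component are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(1)
    n_models = 1024

    # Hypothetical per-model MAE in an earlier and a later evaluation window
    skill = rng.normal(0, 0.2, size=n_models)                  # persistent component
    prev_mae = 1.6 + skill + rng.normal(0, 0.3, size=n_models)
    next_mae = 1.6 + skill + rng.normal(0, 0.3, size=n_models)

    top50 = np.argsort(prev_mae)[:50]      # best 50 on previous accuracy

    print("correlation prev vs next MAE :", round(np.corrcoef(prev_mae, next_mae)[0, 1], 2))
    print("later-window MAE, all models :", round(next_mae.mean(), 2))
    print("later-window MAE, top 50     :", round(next_mae[top50].mean(), 2))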
21. Can we pick good models this way?
- We looked at correlations between these two measures of previous performance and subsequent forecast accuracy
- Correlations with in-sample fit had the wrong sign
- Quarterly models: no relationship
- Yearly models: previous accuracy had the correct sign but didn't improve performance enough to match the quarterly models
- Not possible to pick good models like this
22. Conclusion
- A randomly picked VAR-based model is not likely to outperform SAFFIER
- The pooled forecasts are always comparable to SAFFIER or better
- Models utilising levels information do better
- Pooling works better for classical estimation than for Bayesian estimation
- Pooled classical is now marginally better than pooled Bayesian