Title: Predicting the Present
1. Predicting the Present
Hyunyoung Choi, Hal Varian
June 2009
2. Problem statement
- Government agencies and other organizations produce monthly reports on economic activity
  - Retail Sales
  - House Sales
  - Automotive Sales
  - Unemployment
- Problems with reports
  - Compilation delay of several weeks
  - Subsequent revisions
  - Sample size may be small
  - Not available at all geographic levels
- Google Trends releases daily and weekly indexes of search queries by industry vertical
  - Real-time data
  - No revisions (but some sampling variation)
  - Large samples
  - Available by country, state and city
- Can Google Trends data help predict current economic activity?
  - Before release of preliminary statistics
  - Before release of final revision
3. Categories in Google Trends by Query Shares
Note: Queries from 2009-01-01 to 2009-04-30
Growth comparison with the same time window
4. Real Estate
5. [Figure labels: Geography; Time window; Category]
6. Subcategories under Real Estate by Query Shares
7. Search on Real Estate Agencies
8. Searches on Rental Listings & Referrals
9. Depicting trends
- Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth
- Often useful to look at year-on-year changes to eliminate seasonality
- Illustrate correlations and covariates
- Improving predictions
  - Forecast a time series using its own lagged values and add Trends data as a predictor
  - Statistical significance?
  - Improved fit?
  - Improved forecasts?
  - Identify turning points?
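The year-on-year comparison mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `year_on_year_change` and the sample numbers are hypothetical, and the change is assumed to be measured as a fraction of the value one year earlier.

```python
def year_on_year_change(values, period=12):
    """Return fractional year-on-year changes for a regularly spaced series.

    values: list of positive numbers, oldest first (e.g. monthly index values).
    period: observations per year (12 for monthly, 52 for weekly data).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return [
        (values[t] - values[t - period]) / values[t - period]
        for t in range(period, len(values))
    ]


# Hypothetical two-year monthly series: same seasonal shape, 10% higher in year two.
year_one = [100, 90, 95, 110, 130, 150, 160, 155, 130, 110, 95, 105]
year_two = [v * 1.10 for v in year_one]
yoy = year_on_year_change(year_one + year_two)
print([round(g, 3) for g in yoy])
```

Because both years share the same seasonal pattern, every year-on-year change in this example is about 0.10: the seasonal swings cancel and only the underlying growth remains.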
10. 15-yr Mortgage Rate vs. Home Financing
11. Forecasting primer
- Basic forecasting models
  - Autoregressive: value at time t depends on
    - Value at time t-1
  - Seasonal adjustment: value at time t depends on
    - Value at time t-12
    - For monthly data
  - Transfer function: value at time t depends on
    - Other contemporaneous or lagging variables
  - Seasonal autoregressive transfer model: value at time t depends on
    - Value at time t-12 (seasonality)
    - Value at time t-1 (recent behavior)
    - Other lagging or contemporaneous variables (such as Google Trends data)
- Typical question of interest
  - How much more accurate are forecasts with additional variables, over and above the accuracy obtained from the history of the time series itself?
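The model families listed above can be written compactly. The notation is assumed here rather than taken from the slides: $y_t$ is the series being forecast, $x_t$ is an additional predictor such as a Google Trends index, $\varepsilon_t$ is an error term, and $\beta_0$, $\beta_1$, $\beta_{12}$ and $\gamma$ are coefficients to be estimated. Linear forms are used for simplicity.

```latex
\begin{align*}
\text{Autoregressive:} \quad & y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t \\
\text{Seasonal (monthly data):} \quad & y_t = \beta_0 + \beta_{12} y_{t-12} + \varepsilon_t \\
\text{Transfer function:} \quad & y_t = \beta_0 + \gamma x_t + \varepsilon_t \\
\text{Seasonal autoregressive transfer:} \quad & y_t = \beta_0 + \beta_1 y_{t-1} + \beta_{12} y_{t-12} + \gamma x_t + \varepsilon_t
\end{align*}
```

In this notation, the typical question of interest is how much the forecast error of the last model grows when the $\gamma x_t$ term is dropped.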
12. Model
New Home Sales depends on:
- Time Series
  - Recent trend: New Home Sales at t-1
  - Seasonality: New Home Sales at t-12
- Google Trends
  - Recent search activity on
    - Real Estate Agencies
    - Rental Listings & Referrals
    - Home Inspections & Appraisal
    - Property Management
    - Home Insurance
    - Home Financing
- Exogenous Variables
  - Housing affordability: Average/Median Home Price
13. Predicting the present
New Residential Sales from US Census
- Monthly release 24 to 28 days after the month
- Seasonally adjusted
- National and regional aggregates
Google Trends Real Estate by Category
- Home Inspections & Appraisal
- Home Insurance
- Home Financing
- Property Management
- Rental Listings & Referrals
- Real Estate Agencies
14. New House Sales vs. Real Estate Google Trends
15. Analysis and Forecasting
Model:
$$Y_t = 446.1 + 0.864\,Y_{t-1} - 4.340\,\mathrm{us378.1}_t + 4.198\,\mathrm{us96.2}_t - 0.001\,\mathrm{AvgP}_{t-1}$$
- $Y_t$: new houses sold in month t
- $\mathrm{AvgP}_{t-1}$: average sales price of new one-family houses sold in month t-1
- $\mathrm{us378.1}_t$: Google Trends index for vertical id 378 (Rental Listings & Referrals) in the 1st week of month t
- $\mathrm{us96.2}_t$: Google Trends index for vertical id 96 (Real Estate Agencies) in the 2nd week of month t
Results:
- July 2008: actual 515K, predicted 442.98K, z-score 2.53
- August 2008: prediction 417.52K
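As a quick check on the July 2008 figures, the reported numbers can be used to back out the size of the forecast error. The slides do not define the z-score, so this sketch assumes it is the forecast error divided by a standard error of prediction; under that assumption the implied standard error is roughly 28K houses.

```python
actual = 515.0        # July 2008 actual new house sales, in thousands (from the slide)
predicted = 442.98    # July 2008 model prediction, in thousands (from the slide)
z_score = 2.53        # reported z-score (from the slide)

# Assumption: z = (actual - predicted) / standard_error, so the standard
# error implied by the reported numbers can be backed out directly.
forecast_error = actual - predicted
implied_standard_error = forecast_error / z_score

print(f"forecast error: {forecast_error:.2f}K")
print(f"implied standard error: {implied_standard_error:.2f}K")
```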
16. Analysis and Forecasting
- Observations
  - Since 2005 new house sales have been decreasing, with little seasonality
  - Google Trends captures seasonality and recent trends
  - Positive association with Real Estate Agencies (96)
  - Negative association with Rental Listings & Referrals (378) and Average Price
17. Travel
18. Subcategories under Travel by Query Shares
19. Travel to Hong Kong
Visitors Arrival Statistics from Hong Kong Tourism Board
- Monthly summaries released with a 1-month lag
- Reports country/territory of residence of visitors
- Data available 2004-2008
Google Trends Travel by Category
- Hotels & Accommodations
- Air Travel
- Car Rental & Taxi Services
- Cruises & Charters
- Attractions & Activities
- Vacation Destinations
  - Australia
  - Caribbean Islands
  - Hawaii
  - Hong Kong
  - Las Vegas
  - Mexico
  - New York City
  - Orlando
- Adventure Travel
- Bus & Rail
20. Visitors Arrival Statistics vs. Google Trends
21. Analysis and Forecasting
Model:
$$\log(Y_{i,t}) = 0.664 + 0.113\,\log(Y_{i,t-1}) + 0.828\,\log(Y_{i,t-12}) + 0.001\,X_{i,t,2} + 0.001\,X_{i,t,3} + 0.005\,\mathrm{FXrate}_{i,t} + u_i + e_{i,t}$$
$$e_{i,t} \sim N(0,\ 0.0938^2), \qquad u_i \sim N(0,\ 0.0228^2)$$
- $Y_{i,t}$: arrivals to Hong Kong in month t from the i-th country
- $X_{i,t,1}$: Google Trends search index in the 1st week of month t from the i-th country
- $X_{i,t,2}$: Google Trends search index in the 2nd week of month t from the i-th country
- $X_{i,t,3}$: Google Trends search index in the 3rd week of month t from the i-th country
- $\mathrm{FXrate}_{i,t}$: Hong Kong Dollars per one unit of the i-th country's local currency in month t. The average FX rate over the first week of each month is used as a proxy for that month's FX rate.
- $u_i$: effect specific to the i-th country
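To see what the coefficients imply, the fitted equation can be evaluated directly. The sketch below is illustrative only: the function name `predict_arrivals` and all input values are hypothetical, the signs of the coefficients are taken as positive (consistent with the positive associations reported for this model), and exponentiating the predicted log is a naive back-transformation that ignores the error terms.

```python
import math

# Coefficients as reported on the slide (signs assumed positive).
INTERCEPT = 0.664
B_LAG1 = 0.113
B_LAG12 = 0.828
B_TRENDS_WEEK2 = 0.001
B_TRENDS_WEEK3 = 0.001
B_FX = 0.005


def predict_arrivals(y_lag1, y_lag12, x_week2, x_week3, fx_rate, country_effect=0.0):
    """Point prediction of monthly arrivals from the log-linear model."""
    log_y = (
        INTERCEPT
        + B_LAG1 * math.log(y_lag1)
        + B_LAG12 * math.log(y_lag12)
        + B_TRENDS_WEEK2 * x_week2
        + B_TRENDS_WEEK3 * x_week3
        + B_FX * fx_rate
        + country_effect
    )
    return math.exp(log_y)


# Hypothetical inputs, chosen only to illustrate the calculation.
base = predict_arrivals(50_000, 55_000, x_week2=10.0, x_week3=10.0, fx_rate=7.8)
bumped = predict_arrivals(50_000, 55_000, x_week2=11.0, x_week3=11.0, fx_rate=7.8)
print(f"relative change from +1 in both weekly indexes: {bumped / base - 1:.4%}")
```

Because the search indexes enter the equation in levels while arrivals enter in logs, a one-unit rise in both weekly indexes scales predicted arrivals by exp(0.002), about 0.2%, regardless of the other inputs.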
22. Visitor Arrival Statistics: Actual & Fitted
23. Analysis and Forecasting
- Conclusion
  - Arrival at time t is positively associated with arrival at time t-1 and arrival at time t-12.
    - It shows strong seasonality and autocorrelation.
  - Arrival at time t is positively associated with searches on Hong Kong.
  - Arrival at time t is positively associated with FX rates.
    - When the local currency appreciates relative to the Hong Kong Dollar, visitors to Hong Kong increase.
24. Automobiles
25. US Auto Sales by Make
US Auto Sales by Make
- Monthly summaries released 1 week after end of month
- Data available by Car Sales, Truck Sales and Total Sales for each make
- Data available from 2003-2008
- Source: Automotive News Data Center
Google Trends under Vehicle Brands Category
- Google Trends subcategory: Vehicle Brands
- Weekly search query index
- Total of 31 verticals in this subcategory
- 27 verticals matched to available monthly sales
26. Google Categories under Vehicle Brands
Note: Area represents query volume in the first half of 2008; color represents the yearly growth rate of queries.
27. Auto Sales by Make (Top 9 Makes by Sales): Monthly Sales vs. Google Trends in the Second Week of Each Month
28. Analysis and Forecasting
Fixed effects model:
$$\log(Y_{i,t}) = 2.4276 + 0.2552\,\log(Y_{i,t-1}) + 0.4930\,\log(Y_{i,t-12}) + 0.0005\,X_{i,t,1} + 0.0014\,X_{i,t,2} + a_i\,\mathrm{Make}_i + e_{i,t}$$
$$e_{i,t} \sim N(0,\ 0.1347^2), \qquad \text{Adjusted } R^2 = 0.9829$$
- $Y_{i,t}$: auto sales of the i-th make in month t
- $X_{i,t,1}$: Google Trends search index for the i-th make in the 1st week of month t
- $X_{i,t,2}$: Google Trends search index for the i-th make in the 2nd week of month t
- $\mathrm{Make}_i$: dummy variable for auto make
- $a_i$: coefficient capturing the mean level of auto sales by make
ANOVA table:
Term               Df     Sum Sq    Mean Sq      F value   Pr(>F)
trends1             1      12.89      12.89     710.3542   < 2e-16
trends2             1       0.05       0.05       2.7987   0.09455 .
log(s1)             1    1532.95    1532.95   84452.7530   < 2e-16
log(s12)            1      24.07      24.07    1325.9741   < 2e-16
as.factor(brand)   26       3.34       0.13       7.0696   < 2e-16
Residuals        1480      26.86       0.02
29. Actual vs. Fitted Sales (Top 9 Makes by Sales)
30. Analysis and Forecasting
- Conclusion
  - Sales at time t are positively associated with sales at time t-1 and sales at time t-12.
    - Sales show strong seasonality and autocorrelation.
  - Monthly sales are positively correlated with the first- and second-week search volume of each month.
    - If the search volume increases by 1%, the sales volume increases by an average of 0.19%.
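One reading of the 0.19 figure, offered as an assumption since the slides do not show the calculation, is that it combines the two search coefficients of the fixed effects model (0.0005 for the first week and 0.0014 for the second). If the search index is measured in percentage points, a one-point rise in both weeks shifts log sales by the sum of the two coefficients:

```python
import math

# Coefficients on the week-1 and week-2 Google Trends indexes,
# taken from the fixed effects model for auto sales.
B_TRENDS_WEEK1 = 0.0005
B_TRENDS_WEEK2 = 0.0014

# A one-unit rise in both weekly indexes shifts log(sales) by the sum.
log_shift = B_TRENDS_WEEK1 + B_TRENDS_WEEK2

# Convert the shift in log(sales) to a percentage change in sales.
percent_change = (math.exp(log_shift) - 1.0) * 100.0
print(f"{percent_change:.2f}%")
```

For shifts this small the exact conversion and the approximation 100 × log_shift agree to two decimal places.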
31. Unemployment
32. YoY Growth in Initial Claims and Google Search
According to the NBER, the current recession started in December 2007. The national unemployment rate passed 5% in mid-2008, and search queries on Welfare and Unemployment increased at the same time.
33. Initial claims is an important leading indicator
34. Google Trends data: Search Insights screenshot
35. Initial Claims and Google Trends
36. Strong Autocorrelation in Initial Claims
[Figure panels: Time Series; Autocorrelation Function]
37. Initial Claims Before/After Recession Started
[Figure panels: California; New York]
38. Time Window for Analysis
[Figure labels: Recession Starts; Window for Long Term Model; Window for Short Term Model]
39. Model
- Reference model: ARIMA(0,1,1) x (1,0,0)_12
- Model with Google Trends: ARIMA(0,1,1) x (1,0,0)_12 with Google Trends
- Model fit improved significantly: smaller standard deviation, higher log likelihood and smaller AIC
- Initial claims are positively correlated with searches on Jobs and Welfare.
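For readers less familiar with the notation, the reference specification can be spelled out with the backshift operator $B$ (where $B y_t = y_{t-1}$). The symbols are assumptions of this sketch: $y_t$ is the initial claims series, $\theta$ the moving-average coefficient, $\Phi$ the seasonal autoregressive coefficient, and $\varepsilon_t$ white noise. The slides do not say how the Google Trends variables enter, so the second form assumes the common "regression with ARIMA errors" setup, with $x_t$ the vector of Trends indexes and $\beta$ its coefficients.

```latex
\begin{align*}
\text{Reference:} \quad & (1 - \Phi B^{12})(1 - B)\, y_t = (1 + \theta B)\, \varepsilon_t \\
\text{With Google Trends:} \quad & y_t = \beta^{\top} x_t + n_t, \qquad
(1 - \Phi B^{12})(1 - B)\, n_t = (1 + \theta B)\, \varepsilon_t
\end{align*}
```

The moving-average term is written with a plus sign here; some software uses the opposite sign convention for $\theta$.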
40. Long Term Model: Prediction Comparison with MAE
- With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.
- Prediction with rolling window from 1/11/2009 to 4/12/2009
- Prediction Error at t
- Mean Absolute Error
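The comparison can be illustrated with a small calculation. The slide's own formulas for the prediction error and the MAE did not survive, so this sketch assumes the error at t is the actual value minus the one-step-ahead prediction and the MAE is the average of the absolute errors over the evaluation window. The function names and all numbers below are hypothetical.

```python
def mean_absolute_error(actuals, predictions):
    """Average of |actual - prediction| over paired observations."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def percent_decrease(baseline_mae, candidate_mae):
    """Percentage by which candidate_mae is lower than baseline_mae."""
    return (baseline_mae - candidate_mae) / baseline_mae * 100.0


# Hypothetical weekly initial claims (thousands) and one-step-ahead predictions.
actual = [600.0, 620.0, 640.0, 630.0]
baseline_pred = [580.0, 600.0, 650.0, 650.0]   # reference model
trends_pred = [590.0, 610.0, 645.0, 640.0]     # model with Google Trends

mae_baseline = mean_absolute_error(actual, baseline_pred)
mae_trends = mean_absolute_error(actual, trends_pred)
print(mae_baseline, mae_trends, percent_decrease(mae_baseline, mae_trends))
```

In a rolling-window evaluation, each prediction would come from a model refit on the data available up to the previous period; the lists above stand in for those predictions.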
41. Short Term Model: Prediction Comparison with MAE
- With Google Trends, the out-of-sample prediction MAE decreases by 19.23%.
- Prediction errors are within the same range as for the LT Model.
- Fit improvement is better with the ST Model.
42. Summary
- Google Trends significantly improves out-of-sample prediction of state unemployment, up to 18 days in advance of data release.
- Mean absolute error for out-of-sample predictions declines by 16.84% for the LT Model and 19.23% for the ST Model.
- Further work
  - Can examine metro level data
  - Other local data (real estate)
  - Combine with other predictors
  - Detect turning points?