Title: Experience with the Gravity Model
1. Experience with the Gravity Model
2. Introduction
- There is demand for air travel between every city-pair in the world (can be very small)
- We have imperfect data on the actual travel (for many of the larger demands)
- The Gravity Model is the long-standing traditional formulation
- Demand is bigger, the bigger the origin city
- Demand is bigger, the bigger the destination
- Demand is smaller, the longer the distance
- Other things may also matter
3. Prologue
- We have some experience and prejudices
- Doubling the origin city size should double the demand
- Doubling the destination size should double the demand
- Cost may be a better metric than distance
- The zero demands should not be left out
- Air demand under 200 miles competes with ground transport
- A common language helps demand
- A common alphabet helps demand
- Different incomes hurt demand
- Gravity works better at distributing total outbound demand than at estimating the size of total travel
- Leisure destinations are origin-specific and arbitrary
4. Act 1: We Go Exploring
- Guidelines (Pirate's Code)
- Take the easiest first
- Use places you know about
- Examine the results in detail
- US domestic data
- Best reporting quality
- One country, one language, one income
- Lots of points
- Use Seattle (SEA), Boston (BOS), Chicago (CHI)
- Disparate types of cities
- I've lived there
5. SEA, BOS, CHI
- Passenger data from US ticket sample
- Origin-Destination reporting
- Some breakage of interline trips
- Domestic points only: similar fares, taxes, hassle
- Origin gravity weight
- Not population or income or .....
- Use outbound departing passengers
- Focus results on distributing to destinations
- Destination gravity weight
- Arriving passengers to destination
- O-D data as used is not directional
- Original source has home city for trips
- All further data (outside US) will not
- So we will use US data in non-directional form
6. Starter Formulation
- Calibrate gravity model
- Pax = WO × WD / Dist^a
- Where Pax = Origin-Destination demand
- WO = Origin weight (size)
- WD = Destination weight (size)
- Dist = Intercity distance
- a = Distance exponent (calibrated variable)
- Use log formulation, linear least squares fit
- Examine forecast against actual passengers/demand
- Allow origin size WO to be a calibrated variable
- One each for SEA, BOS, CHI
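The log-form calibration on this slide can be sketched roughly as follows. This is a minimal illustration, not the author's actual procedure: it assumes NumPy, uses synthetic noise-free data rather than the US ticket sample, and the function name `fit_starter_gravity` is hypothetical. It also assumes the exponent a is applied directly to distance (Pax proportional to Dist^a), so that a negative a means demand falls with distance, which matches the negative values reported later in the deck. Taking logs gives log(Pax / WD) = log(WO) + a × log(Dist), which is linear in one intercept per origin and the distance exponent.

```python
import numpy as np

def fit_starter_gravity(origin_idx, dest_weight, dist, pax, n_origins):
    """Fit log(Pax / WD) = log(WO[origin]) + a * log(Dist) by least squares."""
    origin_idx = np.asarray(origin_idx)
    y = np.log(np.asarray(pax, dtype=float) / np.asarray(dest_weight, dtype=float))
    X = np.zeros((y.size, n_origins + 1))
    X[np.arange(y.size), origin_idx] = 1.0            # one dummy column per origin
    X[:, -1] = np.log(np.asarray(dist, dtype=float))  # log-distance column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Calibrated origin weights (one per origin) and the distance exponent a.
    return np.exp(coef[:n_origins]), coef[-1]

# Synthetic, noise-free data for illustration only (NOT the US ticket sample).
rng = np.random.default_rng(0)
true_wo = np.array([2.0, 3.0, 5.0])   # hypothetical weights for three origins
true_a = -0.66                        # exponent used to build the fake demands
origin_idx = np.repeat(np.arange(3), 40)
dest_weight = rng.uniform(1e3, 1e5, size=origin_idx.size)
dist = rng.uniform(200.0, 2500.0, size=origin_idx.size)
pax = true_wo[origin_idx] * dest_weight * dist ** true_a

origin_weights, a = fit_starter_gravity(origin_idx, dest_weight, dist, pax, 3)
print(origin_weights, a)
```

Because the synthetic demands contain no noise, the fit simply recovers the weights and exponent used to build them. Real data would leave residuals to examine market by market.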
7. Early Observations: Some Wild Outliers
8. Futzing with Data
- Most points with Dist < 200 had low actuals
- Demand diverted to surface modes not in data
- SEA high actuals were
- Points in Alaska
- Had trips interlining in SEA, with broken data
- Were dedicated Seattle points, like college towns
- BOS high actuals were
- Leisure destinations, for Boston
- Characteristics of high actuals
- Destinations had small number of origin cities
- Destinations had one large demand to origin
- Some were secondary airports in a city
- End of Act 1
9. Act 2: Our First Regressions
- We eliminate all points under 200 miles
- Due to ground competition
- We eliminate all points with fewer than 12 origins
- Tend to be captive-to-single-origin points
- We did a big side-study on share-of-largest origin
- We generate zeros by destination (16)
- When 1 or 2 of SEA, BOS, CHI lack demand
- Due to log form, zeros don't work
- We try .3, .1, .01, and .001 for zeros
- With smaller substitutes for zero, a rises significantly
- We include only 5 zeros, but get the same reactions
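The sensitivity to the value substituted for zeros can be reproduced on a toy example. The sketch below is an illustration under stated assumptions, not the study's data: it uses a single hypothetical origin, six made-up markets built from an exponent of -0.7, and sets the two longest-haul markets to zero. It treats a as the power applied directly to distance, so "rising" here means growing in magnitude (more negative).

```python
import math

def log_slope(dist, pax):
    """Ordinary least-squares slope of log(pax) on log(dist)."""
    x = [math.log(d) for d in dist]
    y = [math.log(p) for p in pax]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Made-up markets: demand follows Dist^-0.7, but the two longest have no traffic.
dist = [300.0, 500.0, 800.0, 1200.0, 2000.0, 2500.0]
observed = [1e4 * d ** -0.7 for d in dist[:4]] + [0.0, 0.0]

slopes = {}
for substitute in (0.3, 0.1, 0.01, 0.001):
    # Replace each zero with the small substitute so the log is defined.
    pax = [p if p > 0 else substitute for p in observed]
    slopes[substitute] = log_slope(dist, pax)

for substitute, slope in slopes.items():
    print(substitute, round(slope, 2))
```

In this toy case the zeros sit at the longest distances, so each smaller substitute pulls the fitted line down at the far end and steepens the exponent. The result depends on the arbitrary substitute rather than on the data.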
10. Small Demands Are a Real Problem
- Regression results driven by zero points
- Least squares in log form gives equal weight to each demand point
- Log form emphasizes percentage error
- Actual needs are different
- Forecast big demands with smaller errors
- Forecast the small demands merely as small
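A short worked example shows why the log form weights small and large markets equally. The two markets below are hypothetical; the function names are illustrative only.

```python
import math

def log_squared_error(actual, forecast):
    """Squared log error, the quantity minimized by least squares in log form."""
    return math.log(forecast / actual) ** 2

def absolute_error(actual, forecast):
    """Size of the miss, in passengers."""
    return abs(forecast - actual)

# Hypothetical (actual, forecast) pairs: both forecasts are double the actual.
markets = {"small": (10.0, 20.0), "large": (10_000.0, 20_000.0)}

for name, (actual, forecast) in markets.items():
    print(name,
          round(log_squared_error(actual, forecast), 3),
          absolute_error(actual, forecast))
```

Both markets contribute the same squared log error (about 0.48), although one miss is 10 passengers and the other is 10,000. A log-form fit therefore works as hard on the small market as on the large one, which is the opposite of what the forecaster needs.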
11. Compromise
- Ignore small demands and zeros
- Require Pax > 10 for all three cities
- Or drop the destination
- Merge multiple airports in a city to a single city destination
- (We had been using airports rather than cities)
- We now get the same answers, with or without remaining outliers (errors below ½ or above 2)
- Errors on large demands more reasonable
- Most small markets forecast as small
- Exceptions are large for one origin
- Could be large for other two, but no online services
12. Early Observations
- Define "draw" as the ratio actual / forecast
13. Lessons Learned So Far
- Distance exponent a = -0.66
- NOT the same as domestic fares (Fare ~ Dist^0.2)
- Do not include zero demand points
- Destinations with few origins tend to be captive
- Do not use them in generic calibrations
- To improve errors in forecasts of large demands, use only points with large demands
- Result will forecast small demands small, mostly
- Use Cities, not airports
14. More Lessons
- City WO fairly consistent with city size
- More about this on next slide
- Ran against Pax data adjusted to standard fares
- Many under-forecasts were in discount markets
- Ran international destinations
- True O&D, not from US ticketing source
- Distance exponent a of -1.5 (much different)
- Demands are 1/5th of forecasts from the domestic calibration
- Suggests language or other barriers count
- Research on goods found borders act like 3000 mi.
15. Play within the Play
- Observed different ratios to total outbound travel for SEA, BOS, CHI (WO)
- But not very different
- Ran all US domestic pairs (Pax > 10)
- Using just a single variable (WO × WD), with exponent ß
- Results
- Distance exponent a = -0.55 (had been -0.66)
- City-size exponent ß = 0.85
- Suggests larger cities have smaller demands
- Maybe because a higher % of demands are > 1 and therefore are captured by the database (bigger W → bigger demands)
- Also, small cities show more short-haul, which was removed
- Otherwise, large cities have more direct services and lower fares!
- Interpretation allows ß = 1.0 to be reasonable
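The all-pairs run with a single combined weight variable can be sketched as a three-column least-squares fit in log form. This is an assumption-laden illustration: it relies on NumPy, the data are synthetic and built from the exponents reported on this slide, the function name `fit_two_exponents` is hypothetical, and a is again treated as the power applied directly to distance.

```python
import numpy as np

def fit_two_exponents(weights, dist, pax):
    """Fit log(Pax) = c + beta * log(WO*WD) + a * log(Dist); return (beta, a, scale)."""
    X = np.column_stack([np.ones(len(pax)), np.log(weights), np.log(dist)])
    coef, *_ = np.linalg.lstsq(X, np.log(pax), rcond=None)
    return coef[1], coef[2], np.exp(coef[0])

# Synthetic, noise-free markets for illustration only.
rng = np.random.default_rng(1)
weights = rng.uniform(1e6, 1e10, size=200)     # hypothetical WO*WD products
dist = rng.uniform(200.0, 2500.0, size=200)
pax = 1e-4 * weights ** 0.85 * dist ** -0.55   # built with the reported exponents

beta, a, scale = fit_two_exponents(weights, dist, pax)
print(round(beta, 3), round(a, 3))
```

With noise-free inputs the fit returns the exponents used to generate the data. It says nothing about why real data yield ß below 1, which is the question the slide's interpretation addresses.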
16. Act 3: European Regional
- New set of data points
- London, Copenhagen, Istanbul
- 200 mi < Distance < 2800 mi
- All 3 (LON, CPH, IST) have Pax > 10
- 219 points
- Regression Results
- Distance exponent a near -0.80
- Origin-Specific adjustments not significant
- Removing outliers has small effect on answers
- Some really big errors in really big markets
- Tends to confirm US data experiences
17. Europe: All Points
- Distance > 200 mi
- Pax > 10
- Least squares regression
- Distance exponent a goes to -1.2
- Weights (WO × WD) exponent ß = 0.4
- Gives almost all demands near 40
- Results Not Satisfactory
- Distance exponent seems wrong (beyond -1)
- City size (weights) exponent ß too far from 1.0
- Unsatisfactory forecasts by inspection
- Most big markets forecast too small
- Most smaller markets forecast too big
18. Go Back to Detailed Look
- All markets with Pax > 200
- Drop 12 high-side outliers
- Redefine Error
- Not percentage-error-squared (log least sq.)
- Not Diff (Passengers - Forecast)
- Compromise: Diff^0.75
- Compromise is halfway between size and
- Iterate
19. Iterative Procedure
- Start with distance and weight exponents = 1
- Adjust scaling so median Forecast/Pax = 1.00
- Adjust weight exponent ß to reduce Error
- Readjust scaling on each try
- Adjust distance exponent a to reduce Error
- Readjust scaling on each try
- Iterate to find min Σ Diff^0.75 (min Error)
- An ugly, unofficial, but practical, process
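One way to sketch the procedure on this slide is a simple coordinate search. This is a guess at the mechanics, not the author's actual tool: the step size, the stopping rule, the function names and the synthetic markets are all assumptions. The error is taken as the sum of |Pax - Forecast|^0.75, and the starting distance exponent is written as -1 because the sketch applies a directly to distance (equivalent to dividing by distance).

```python
import random
import statistics

def scaled_forecast(weights, dist, pax, beta, a):
    """Gravity forecast rescaled so that median(Forecast / Pax) is 1.00."""
    raw = [w ** beta * d ** a for w, d in zip(weights, dist)]
    median_ratio = statistics.median(f / p for f, p in zip(raw, pax))
    return [f / median_ratio for f in raw]

def total_error(weights, dist, pax, beta, a, power=0.75):
    """Sum over markets of |Pax - Forecast| ** power, after rescaling."""
    fc = scaled_forecast(weights, dist, pax, beta, a)
    return sum(abs(p - f) ** power for p, f in zip(pax, fc))

def calibrate(weights, dist, pax, step=0.05, max_sweeps=100):
    """Nudge beta, then a, keeping any change that lowers the error."""
    beta, a = 1.0, -1.0   # unit exponents; a = -1 means dividing by distance
    best = total_error(weights, dist, pax, beta, a)
    for _ in range(max_sweeps):
        improved = False
        for d_beta, d_a in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = total_error(weights, dist, pax, beta + d_beta, a + d_a)
            if trial < best:   # scaling is readjusted inside total_error
                beta, a, best = beta + d_beta, a + d_a, trial
                improved = True
        if not improved:
            break              # no single nudge helps any more
    return beta, a, best

# Synthetic markets for illustration only (hypothetical weights and noise).
rng = random.Random(0)
weights = [rng.uniform(1e6, 1e9) for _ in range(80)]
dist = [rng.uniform(200.0, 2800.0) for _ in range(80)]
pax = [1e-3 * w ** 1.1 * d ** -0.9 * rng.uniform(0.5, 2.0)
       for w, d in zip(weights, dist)]

start_error = total_error(weights, dist, pax, 1.0, -1.0)
beta, a, final_error = calibrate(weights, dist, pax)
print(round(beta, 2), round(a, 2), final_error <= start_error)
```

A search like this only guarantees that the error never gets worse than at the starting point. It can stall in a local minimum, which fits the slide's own description of the process as ugly but practical.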
20. Results from Procedure
- Distance Exponent a likes to be -1.05
- Could be cultural distance
- City Weights Exponent ß likes to be 1.25
- Why???
- Two effects are independent
- Many too-big forecasts for small demands
21. Poor Fit of Forecast to Data
22. One Last Regression
- All Europe, classic gravity formula
- Pax > 10, Dist > 200
- Distance exponent fixed at 1.00
- City weight exponent fixed at 1.00
- Allowed factor for same country
- Was about 5x, as for US vs. international
- Nice scatter
- Fewer unreasonable forecasts
- Huge errors everywhere
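With both exponents fixed, the only things left to calibrate are the overall scale and the same-country factor. In log form that is a regression on an intercept plus a dummy, which reduces to a difference of group means. The sketch below assumes the classic form Pax = scale × WO × WD / Dist, uses four made-up markets constructed with a 5x domestic factor, and the function name `same_country_factor` is hypothetical.

```python
import math

def same_country_factor(weights, dist, pax, same_country):
    """Return (scale, factor) for Pax = scale * factor**same * (WO*WD) / Dist."""
    # Log ratio of actual demand to the unit-exponent gravity term.
    log_ratio = [math.log(p) - math.log(w / d)
                 for w, d, p in zip(weights, dist, pax)]
    home = [r for r, s in zip(log_ratio, same_country) if s]
    away = [r for r, s in zip(log_ratio, same_country) if not s]
    log_scale = sum(away) / len(away)              # cross-border baseline
    log_factor = sum(home) / len(home) - log_scale
    return math.exp(log_scale), math.exp(log_factor)

# Made-up markets for illustration only: domestic pairs built at 5x the baseline.
weights = [4e6, 9e6, 2e6, 6e6]          # hypothetical WO*WD products
dist = [400.0, 900.0, 300.0, 1200.0]
same_country = [True, False, True, False]
pax = [1e-3 * w / d * (5.0 if s else 1.0)
       for w, d, s in zip(weights, dist, same_country)]

scale, factor = same_country_factor(weights, dist, pax, same_country)
print(round(scale, 6), round(factor, 3))
```

On real data the two group means would each carry large scatter, so a factor near 5x can coexist with the huge market-level errors noted on this slide.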
23. Gravity Forecast is Very Poor
24. Obituary on Gravity Model
- Forecasts are really bad
- Outliers have large effect on answer
- Need to be removed
- Zeros have large effect on answer
- Forecasts more sensible when not included
- Results will be misleading
- Small markets will be forecast as medium
25. Overall Conclusion
- Air travel between cities is
- Strongly influenced by city-pair specific factors
- Not amenable to gravity model approach
- If you have to have a forecast
- Calibrate from existing larger culturally similar cities to same destinations
- Recognize the same country effect is large (maybe 5x)
26. More Gravity: Long Haul
- All world markets
- Distance > 3100 mi (5000 km)
- Passengers > 20
- No existing nonstop service
- Least Squares Regressions
- Four Equations (log calibrations)
- Model 1: Traditional, calibrate ratio to gravity term
- Model 2: Distance exponent a only (-1.37)
- Model 3: Whole gravity term exponent only (0.19)
- Model 4: Separate city-size ß and distance a exponents
- (ß = 0.18 and a = -0.03)
27. Best Fit Was Not Useful, Measured by Either % or Value Errors
- Models 3 & 4 fit best
- Fit achieved by low variance
- No forecasts at large values
- No forecasts at small values
- Most forecasts near 40
- This is a pretty worthless forecast
- Model 2 had much worse misses than Model 1
- Traditional Gravity form had least harmful answers
28. Traditional Gravity Was Best, But Not Good
29. Median Forecasts are Weakly Correlated with Actuals