Title: Mining Airfare Data to Minimize Ticket Purchase Price
1Mining Airfare Data to Minimize Ticket Purchase
Price
- Oren Etzioni
- Craig Knoblock
- Alex Yates
- Rattapoom Tuchinda
2Outline
- Introduction and Motivation
- Data Mining Methods for Airfare Data
- Experiments
- Conclusion
3Introduction
- Corporations often use complex polices to vary
product prices over time. - Airline industry often use dynamic pricing
strategies to maximize its revenue. (based on
seasons, seats available, pricing of other
airlines) - In 1988, Bell Laboratories patented a
mathematical formula that could perform rapid
calculations on fare problems with thousands of
variables.
4Introduction Cont.
- Airlines have learned to more effectively price
their product, instead of attacking each other to
gain market share. - Today, airlines use sophisticated software to
track their competitors pricing history and
propose adjustments that optimize their overall
revenue.
5Pricing Strategies
- Flights depart around holidays or weekends appear
to change more. - 7 and 14 day advance purchase (airlines
usually increase ticket price two weeks before
departure date and ticket prices are at a maximum
on departure dates.) - Weekend Boosts
6Example
- Price change over time for American Airlines
flight 192223, LAX-BOS, departing on Jan. 2. - (This example shows the rapid price change in
the days priori to the New Year)
7Categories of airlines
- Big players (United Airlines, American Airlines)
- Smaller airlines which concentrate on selling
low-price tickets ( Air Trans, Southwest) - Different types of tickets (economic, business,
first class, restricted or unrestricted)
8Is Airfare Prediction Possible?
- Airlines have tons of historical data.
- Airlines have many fare classes for seats on the
same flight, use different sales channels (e.g.,
travel agents, priceline.com) - Airfare prices information become increasingly
available on the Web. - But some information is not available (e.g., of
unsold seats in a particular flight).
9Motivation
- The goal of this paper is to learn whether to buy
a ticket or wait at a particular time point, for
a particular flight, given the current price
history recorded.
10Advisor Model
- Consumer wants to buy a ticket.
- Hamlet buy (this is a good price).
- Or wait (a better price will emerge).
- Notify consumer when price drops.
11Will Flights sell out?
- Watch the number of empty seats.
- Upgrade to business class.
- Place on another flight
- In our experiment upgrades were sufficient
12Flight Data
- Airlines typically user the same flight number
(e.g., NW17) to refer to multiple flights with
the same route that depart at the same time on
different dates. - In this paper, a particular flight is referred to
a combination of its flight number and date. - In the training dataset, same class is the set of
states with the same flight number and the same
hours but different department dates.
13Data Set
- Used Fetch.coms data collection infrastructure.
- Collected over 12,000 price observations
- 41 day period, every 3 hours.
- Lowest available fare for a one-week roundtrip.
- LAX-BOS and SEA-IAD.
- 6 airlines including American, United, etc.
14Learning Task Formulation
- Input price observation data (built a flight
data collection agent that runs at a scheduled
internal, extracts the pricing data, and stores
the results in a database. - Algorithm label observations run learner--
Hamlet - Output for each purchase point ?
- buy VS wait
15Formulation cont.
- Want to use the latest information.
- Run learner daily to produce new model.
- Learner is trained on data gathered to date.
- Learned policy is a collection of these models.
- Our experiments evaluate the savings on the
flights.
16Candidate Approaches
- By hand an expert looks at the data.
- Time series
- Not effective at price jumps!
- Reinforcement learning Q-learning.
- Used in computational finance.
- Rule learning Ripper,
17RIPPER
- RIPPER (Repeated Incremental Pruning to Produce
Error Reduction) is an efficient rule learning
algorithm that can process noisy datasets
containing thousands of examples. - First, the algorithm partition examples into a
growing set and a pruning set. Next, a rule is
grown by adding and pruning features. Repeat the
process until the rule set output by RIPPER. - RIPPER is suitable to handle two-class learning
problem.
18RIPPER Cont.
- In this study, features include price, airline,
route, hours-before-takeoff, etc. - Learned 20-30 rules
19Simple Time Series
- Predict price using a fixed window of k price
observations weighted by a. - We used a linearly increasing function for a
- The time series model makes its decision based on
a one-step prediction of the ticket price change - IF Pt1gtPt THEN buy, ELSE wait.
20Reinforcement Learning
- Reinforcement learning is learning what to
do---how to map situations to actions---so as to
maximize a numerical reward signal. - Supervised learning is learning from examples
provided by some knowledgeable external
supervisor.
21Q-learning
- Q learning is a method addressing the task of
Reinforcement Learning. The main idea is to
approximate the function assigning each
state-action pair the highest possible payoff. - Q denote a transition function, mapping each
state-action pair to a successor state - S denote a finite set of states
- A denote a finite set of actions
- R denote a reward function
22Q-learning Cont.
- Standard Q-Learning formula
- s is the state resulting from taking action a in
state s. - r is the discount factor for future rewards, in
this paper, r1
23Hand-Crafted Rule
- A fairly simple policy consulted with travel
agents, using it to compare with other data
mining algorithms - IF ExpPrice(s0,t0)ltCurPrice AND s0gt7 days
- THEN wait ELSE buy
- ExpPrice(s0,t0) denotes the average over all
MinPrice(s,t) for flights in the training set
with that flight number, where MinPrice(s,t) is
the minimum price of that flight over the
interval starting from s days before departure up
until time t
24Hamlet
- Stacking with three base learners
- Ripper (e.g., Rwait)
- Time series
- Q-learning (e.g., Qbuy)
- Using multiple data mining methods to combine the
outputs - Output classifies each purchase point as buy
or wait.
25A Sample Rule Generated by Hamlet
- IF hours-before-takeoffgt480
- AND airlineUnited
- AND pricegt360
- AND TSbuy
- AND QLwait
- THEN wait
- TS is the output of the Time Series algorithm,
QL is the output of Q-Learning.
26Ticket Purchasing Simulation
- Real price data (collected from the Web)
- Simulated passengers (a passenger is a person
wanting to buy a ticket on a particular flight at
a particular date and time) - Hamlet run once per day (training data all data
gathered in the past)
27Saving Experiments
- Savings for a simulated passenger is the
difference between the price of a ticket at the
earliest purchase point and the price of the
ticket at the point when the predictive model
recommends buying. - Net savings is savings net of both losses and
upgrade costs.
28Effectiveness
- HAMLETs savings that were 61.8 of optimal!
- Savings buy immediately VS Hamlet.
- Optimal buy at the best possible time (knowing
the future price information)
29Savings by Method
- Savings over buy now.
- Penalty for sell out upgrade cost.
- Total ticket cost is 4,579,600.
30Savings by Method Cont.
- Table below shows the savings, losses, upgrade
costs, and net savings achieved in the simulation
by each predictive model
31Sensitivity Analysis
- Varying two key parameters to test the robustness
of the results to changes in simulation - Change the distribution of passengers requesting
flight tickets (e.g., uniform, linear
decrease/increase) to check the performance on
multiple flights in three hour interval. - Allow a passenger to purchase a ticket at any
time during a three hour internal (e.g.,
specifies fly in the morning, afternoon or
evening)
32Sensitivity Analysis Cont.
Legend
Time Series Q-Learning By Hand Ripper Hamlet Optim
al
33Upgrade Penalty
- Most algorithms avoided the costly upgrades
almost all the time. ( Upgrades as a fraction
of the number of test passengers 4488 of them)
34Discussion
- 76 of the time --- no savings possible.
- (Prices never dropped from the earliest
purchase point until the flight departed) - Uniform distribution over 21 days.
- 33 of the passengers arrived in the last week.
- No passengers arrived 28 days before.
- Simulation understates possible savings!
35Savings on Feasible Flights
- Comparison of Net Savings (as a percent of total
ticket price) on Feasible Flights (price saving
is possible)
36Related Work
- Trading agent competition.
- Auction strategies
- Temporal data mining.
- Time Series.
- Computational finance.
37Future Work
- More tests are necessary international,
multi-routes, hotels, etc. - Cost sensitive learning
- Additional base learners
- Bagging/boosting
- Refined predictions
- Commercialization patent, license.
38Conclusions
- Dynamic pricing is prevalent.
- Price mining a-la-Hamlet is feasible.
- Price drops can be surprisingly predictable.
- Need additional studies and algorithms.
- Great potential to help consumers!?
39But
- Airlines may introduce noise into their pricing
patterns to fool a price miner. - Demand and supply of seats are uncertain. (Good
prediction based on good assumptions) - Who can earn most benefit in the end?
- (Airlines, consumers, price miner?)
40John Nash said
Its a GAME!
41References
- Fast Effective Rule Induction, In A. Prieditis
and S. Russell, editors, Proc of the 12th ICML,
1995. - A General Method for making classifiers
cost-sensitive, In Proc. of Fifth ACM SIGKDD,
1999. - Reinforcement Learning An Introduction. MIT
Press, Cambridge, MA, 1998. - The Analysis of Time Series An Introduction.
Chapman and Hall, London, UK, 1989. - Airlines Rely on Technology To Manipulate Fare
Structure By Scott Mccartney, Wall Street
Jounal, November 3, 1997.
42End