Handling Advertisements of Unknown Quality in Search Advertising

Provided by: csC76
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes


1
Handling Advertisements of Unknown Quality in
Search Advertising
  • Sandeep Pandey
  • Christopher Olston
  • (CMU and Yahoo! Research)

2
Sponsored Search
  • How does it work?
  • Search engine displays ads next to search results
  • Advertisers pay search engine per click
  • Who benefits from it?
  • Main source of funding for search engines
  • Information flow from advertisers to users

3
Sponsored Search
[Figure: search results page, with the search query results shown alongside the sponsored search results]
  • Click-through rate (CTR): given an ad and a
    query, CTR = probability that the ad receives a
    click
  • Optimal policy to maximize the search engine's
    revenue: display the ads with the highest (CTR × bid) value
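
As a small illustration of the ranking rule above, the following sketch sorts ads by CTR × bid when the CTRs are known. The `Ad` type, the `top_ads` helper, and the numbers are all made up for the example and do not come from the talk.

```python
from typing import List, NamedTuple


class Ad(NamedTuple):
    name: str
    ctr: float  # probability that the ad is clicked for this query
    bid: float  # amount the advertiser pays per click


def top_ads(ads: List[Ad], slots: int) -> List[Ad]:
    """Return the `slots` ads with the highest expected revenue (CTR x bid)."""
    return sorted(ads, key=lambda ad: ad.ctr * ad.bid, reverse=True)[:slots]


# Illustrative values only.
ads = [Ad("a", 0.10, 1.00), Ad("b", 0.02, 8.00), Ad("c", 0.05, 1.50)]
best = top_ads(ads, slots=2)
print([ad.name for ad in best])
```

Note that ad "b" ranks first despite having the lowest CTR, because its bid is high enough to give it the largest expected revenue per impression.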

4
Challenges in Sponsored Search
  • Problem: CTRs are initially unknown
  • estimating CTRs requires going around the circle
  • Exploration/Exploitation Tradeoff
  • explore ads to estimate CTRs
  • exploit known high-CTR ads to maximize revenue

5
The Advertisement Problem
  • Problem
  • Advertiser $A_i$ submits ad $a_{i,j}$ for query phrase $Q_j$
  • User clicks on $a_{i,j}$ -> $A_i$ pays $b_{i,j}$ (the bid
    value)
  • Queries arrive one after another
  • Select ads to show for each query, in an online
    fashion
  • Constraints
  • Show at most C ads per query
  • Advertisers have daily budgets: $A_i$ pays at most $d_i$
  • Goal: maximize the search engine's revenue

6
Our Approach
  • Unbudgeted Advertisement Problem
  • Isomorphic to multi-armed bandit problem
  • Budgeted Advertisement Problem
  • Similar to bandit problem, but with additional
    budget constraints that span arms
  • Introduce Budgeted Multi-armed Multi-bandit
    problem (BMMP)

7
Unbudgeted Advertisement Problem as Multi-armed
Bandit Problem
  • Bandit: classical example of online learning
    under the explore/exploit tradeoff
  • K arms. Arm i has an associated reward $r_i$ and
    unknown payoff probability $p_i$
  • Pull C arms at each time instant to maximize the
    reward accrued over time
  • Isomorphism: query phrase = bandit instance;
    ads = arms; CTR = payoff probability; bid =
    reward

8
Policy for Unbudgeted Problem
  • Policy MIX (adapted from Auer et al., ML02)
  • When query phrase $Q_j$ arrives
  • Compute the priority $p_{i,j}$ of each ad $a_{i,j}$, where
    $p_{i,j} = \left(e_{i,j} + \sqrt{2 \ln n_j / n_{i,j}}\right) \cdot b_{i,j}$
  • $e_{i,j}$ is the MLE of the CTR value of $a_{i,j}$
  • $b_{i,j}$ is the price or bid value of ad $a_{i,j}$
  • $n_{i,j}$ is the number of times ad $a_{i,j}$ has been shown in the past
  • $n_j$ is the number of times query $Q_j$ has been answered
  • Display the C highest-priority ads
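
The MIX steps above can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function names and the `stats` layout are hypothetical, the "+" between the estimate and the confidence term is assumed from the UCB-style rule the slide cites, and giving never-shown ads infinite priority is an assumption of this sketch.

```python
import math


def mix_priority(clicks, shown, query_count, bid):
    """Priority (e + sqrt(2 ln n_j / n_ij)) * b_ij for a single ad."""
    if shown == 0:
        # Assumption of this sketch: an ad with no impressions is tried first.
        return math.inf
    estimate = clicks / shown  # MLE of the CTR
    bonus = math.sqrt(2.0 * math.log(query_count) / shown)
    return (estimate + bonus) * bid


def select_ads(stats, query_count, slots):
    """Pick the `slots` highest-priority ads.

    `stats` maps an ad id to a (clicks, shown, bid) tuple for one query phrase.
    """
    def priority(ad):
        clicks, shown, bid = stats[ad]
        return mix_priority(clicks, shown, query_count, bid)

    return sorted(stats, key=priority, reverse=True)[:slots]


# Illustrative values only.
stats = {"x": (5, 100, 1.0), "y": (0, 0, 0.5), "z": (1, 10, 1.0)}
chosen = select_ads(stats, query_count=110, slots=2)
print(chosen)
```

In this toy data, ad "z" outranks ad "x" because it has been shown far fewer times and therefore carries a much larger exploration bonus.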

9
Budgeted Multi-armed Multi-Bandit problem (BMMP)
  • Finite set of bandit instances; each instance has
    a finite number of arms
  • Each arm has an associated type
  • Each type $T_i$ has a budget $d_i$
  • The budget is an upper limit on the total amount of
    reward that can be generated by the arms of type $T_i$
  • An external actor invokes a bandit instance at
    each time instant
  • The policy must choose C arms of the invoked
    instance

10
Meta Policy for BMMP
  • Input: BMMP instance and policy POL for the
    conventional multi-armed bandit problem
  • Output: the following policy BPOL
  • Run POL in parallel for each bandit instance $B_i$
    (call each copy $POL_i$)
  • Whenever $B_i$ is invoked
  • Discard arm(s) with depleted budget
  • If one or more arms were discarded, restart $POL_i$
  • Let $POL_i$ decide which of the remaining arms to
    activate
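
One way to read the meta policy is as a thin wrapper around per-instance copies of a base policy. The sketch below follows that reading under several assumptions: the class names, the `select`/`update` interface expected of the base policy, treating a budget as depleted once it reaches zero, and capping the final payment at the remaining budget are all choices of this sketch, not details from the talk. `FirstArmsPolicy` is only a placeholder base policy used to exercise the wrapper.

```python
class BudgetedMetaPolicy:
    """Sketch of BPOL: one copy of a base bandit policy per bandit instance."""

    def __init__(self, make_policy, arms_by_instance, arm_type, budgets):
        self.make_policy = make_policy      # callable: list of arms -> policy
        self.arm_type = arm_type            # arm -> type (e.g. advertiser)
        self.remaining = dict(budgets)      # type -> remaining budget
        self.live = {b: list(arms) for b, arms in arms_by_instance.items()}
        self.policies = {b: make_policy(arms) for b, arms in self.live.items()}

    def invoke(self, instance, count):
        """Choose up to `count` arms of the invoked instance."""
        live = [a for a in self.live[instance]
                if self.remaining[self.arm_type[a]] > 0]
        if len(live) < len(self.live[instance]):
            # At least one arm was discarded: restart this instance's policy.
            self.live[instance] = live
            self.policies[instance] = self.make_policy(live)
        return self.policies[instance].select(min(count, len(live)))

    def record(self, instance, arm, reward):
        """Charge the arm's type and pass the observed reward to the policy."""
        arm_type = self.arm_type[arm]
        paid = min(reward, self.remaining[arm_type])  # assumption: cap at budget
        self.remaining[arm_type] -= paid
        self.policies[instance].update(arm, paid)
        return paid


class FirstArmsPolicy:
    """Placeholder base policy: always picks the first arms in its list."""

    def __init__(self, arms):
        self.arms = list(arms)

    def select(self, count):
        return self.arms[:count]

    def update(self, arm, reward):
        pass  # a real bandit policy would update its estimates here


# Two query phrases; arms a1 and a3 share type "A", so they share one budget.
bpol = BudgetedMetaPolicy(
    FirstArmsPolicy,
    {"q1": ["a1", "a2"], "q2": ["a3"]},
    {"a1": "A", "a2": "B", "a3": "A"},
    {"A": 1.0, "B": 5.0},
)
first = bpol.invoke("q1", 1)
paid = bpol.record("q1", "a1", 1.0)
second = bpol.invoke("q1", 1)
other = bpol.invoke("q2", 1)
print(first, paid, second, other)
```

The usage lines show the budget constraint spanning arms: once type "A" is exhausted through arm a1 of instance q1, arm a3 of instance q2 is discarded as well.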

11
Performance Guarantee of BPOL
  • OPT: algorithm that knows in advance
  • Full sequence of bandit invocations
  • Payoff probabilities
  • Claim: $\mathrm{bpol}(N) \ge \mathrm{opt}(N)/2 - O(f(N))$
  • $\mathrm{bpol}(N)$: total expected reward of the BPOL policy
    after N bandit invocations
  • $\mathrm{opt}(N)$: total expected reward of OPT
  • $f(N)$: regret of POL after N invocations of the
    regular bandit problem

12
Proof of Performance Guarantee
  • Divide the time instants into 3 categories
  • 1: BPOL chooses an arm of higher expected reward
    than OPT
  • $\mathrm{opt}_1(N) \le \mathrm{bpol}_1(N)$
  • 2: BPOL chooses an arm of lower expected reward
    because OPT's arm has run out of budget
  • $\mathrm{opt}_2(N) \le \mathrm{bpol}_2(N) + (\#\text{types} \cdot \text{max reward})$
  • 3: otherwise
  • $\mathrm{opt}_3(N) = O(f(N))$
  • Claim (follows from the above bounds)
  • $\mathrm{opt}(N) \le \mathrm{bpol}(N) + \mathrm{bpol}(N) + O(1) + O(f(N))$
  • $\mathrm{bpol}(N) \ge \mathrm{opt}(N)/2 - O(f(N))$
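
The step from the three per-category bounds to the claim can be written out as follows. This is a sketch under stated assumptions: the missing operators are read as $\le$, $+$ and $=$; $\mathrm{opt}_k(N)$ and $\mathrm{bpol}_k(N)$ are taken to be the rewards collected during category-$k$ time instants; rewards are assumed nonnegative, so each $\mathrm{bpol}_k(N)$ is at most $\mathrm{bpol}(N)$; and the number of types and the maximum reward are treated as constants.

```latex
\begin{align*}
\mathrm{opt}(N)
  &= \mathrm{opt}_1(N) + \mathrm{opt}_2(N) + \mathrm{opt}_3(N) \\
  &\le \mathrm{bpol}_1(N)
     + \bigl(\mathrm{bpol}_2(N) + \#\text{types} \cdot \text{max reward}\bigr)
     + O(f(N)) \\
  &\le \mathrm{bpol}(N) + \mathrm{bpol}(N) + O(1) + O(f(N)) \\
\intertext{Dividing by 2 and absorbing the constant term into $O(f(N))$:}
\mathrm{bpol}(N) &\ge \frac{\mathrm{opt}(N)}{2} - O(f(N)).
\end{align*}
```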

13
Advertisement Policies
  • BMIX: output of our generic BPOL policy when
    given MIX as input
  • BMIX-E: replace $\sqrt{2 \ln n_j / n_{i,j}}$ in priority
    $p_{i,j}$ by $\sqrt{\min(0.25,\, V(n_{i,j}, n_j)) \cdot \ln n_j / n_{i,j}}$,
    where $V(n_{i,j}, n_j) = e_{i,j} \cdot (1 - e_{i,j}) + \sqrt{2 \ln n_j / n_{i,j}}$
  • Suggested in Auer et al., ML02
  • Purpose: aggressive exploitation
  • BMIX-T: replace $b_{i,j}$ in priority $p_{i,j}$ by
    $b_{i,j} \cdot \mathrm{throttle}(d'_i)$, with $\mathrm{throttle}(d'_i) = 1 - e^{-d'_i/d_i}$,
    where $d'_i$ is the remaining budget of advertiser
    $A_i$
  • Suggested in Mehta et al., FOCS05
  • Purpose: delay the depletion of advertisers'
    budgets
  • BMIX-ET: with both E and T modifications
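
The E and T modifications can be sketched as two small functions that plug into the priority computation. Function and parameter names are hypothetical. Two readings of the slide text are assumed here: the variance term and the square-root term in V are added (as in the tuned UCB rule the slide attributes to Auer et al.), and the throttle exponent is the remaining budget divided by the daily budget. The sketch also assumes the ad has been shown at least once.

```python
import math


def tuned_bonus(estimate, shown, query_count):
    """Exploration term of BMIX-E; requires shown >= 1 and query_count >= 1."""
    log_n = math.log(query_count)
    # Assumed reading: V = e(1 - e) + sqrt(2 ln n_j / n_ij).
    variance_bound = estimate * (1.0 - estimate) + math.sqrt(2.0 * log_n / shown)
    return math.sqrt(min(0.25, variance_bound) * log_n / shown)


def throttle(remaining_budget, daily_budget):
    """BMIX-T factor: shrinks toward 0 as the advertiser's budget runs out."""
    return 1.0 - math.exp(-remaining_budget / daily_budget)


def bmix_et_priority(estimate, shown, query_count, bid,
                     remaining_budget, daily_budget):
    """Priority with both the E and the T modification applied."""
    effective_bid = bid * throttle(remaining_budget, daily_budget)
    return (estimate + tuned_bonus(estimate, shown, query_count)) * effective_bid


# Illustrative values only: an ad shown 100 times out of 110 queries.
full = bmix_et_priority(0.05, 100, 110, 1.0, remaining_budget=10.0, daily_budget=10.0)
half = bmix_et_priority(0.05, 100, 110, 1.0, remaining_budget=5.0, daily_budget=10.0)
print(full > half)
```

Because the tuned term is capped by the 0.25 factor, it is always smaller than the untuned bonus, which is what makes BMIX-E exploit more aggressively.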

14
Experiments
  • Simulations over real data
  • Data
  • 85,000 query phrases from Yahoo! query log
  • Yahoo! ads with daily budget constraints
  • CTRs drawn from Yahoo!'s CTR distribution
  • Simulated user clicks using the CTR values
  • Time horizon: multiple days
  • Policies carried over the CTR estimates from one
    day to the next
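
Simulating clicks from CTR values can be done with one Bernoulli draw per impression. The sketch below shows that idea only; the function names, the fixed seed, and the CTR value are assumptions for illustration, and nothing here reflects the actual experimental setup or data.

```python
import random


def simulate_click(ctr, rng):
    """Return True with probability `ctr` (one Bernoulli draw)."""
    return rng.random() < ctr


def empirical_ctr(ctr, impressions, seed=0):
    """Show an ad `impressions` times and return the observed click rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    clicks = sum(simulate_click(ctr, rng) for _ in range(impressions))
    return clicks / impressions


# With many impressions the observed rate approaches the true CTR.
print(empirical_ctr(0.1, 100_000))
```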

15
Results
  • GREEDY: select ads with the highest current reward
    estimate ($e_{i,j} \cdot b_{i,j}$)
  • Does not explore. Only exploits.
  • Revenue values scaled for confidentiality reasons

16
Conclusion
  • Search advertisement problem
  • Exploration/exploitation tradeoff
  • Model as multi-armed bandit
  • Introduced new Bandit variant
  • Budgeted multi-armed multi-bandit problem (BMMP)
  • New policy for BMMP with performance guarantee
  • In the paper
  • Variable set of ads (ads come and go)
  • Prior CTR estimates