Catching the Drift: Learning Broad Matches from Clickthrough Data
1
Catching the Drift: Learning Broad Matches from Clickthrough Data
  • Sonal Gupta, Mikhail Bilenko, Matthew Richardson
  • University of Texas at Austin, Microsoft Research

2
Introduction
  • Keyword-based online advertising: bidded keywords
    are extracted from context
  • Context: query (search ads) or page (content ads)
  • Broad matching: expanding keywords via a
    keyword-to-keywords mapping
  • Example: electric cars → tesla, hybrids, toyota
    prius, golf carts
  • Broad matching benefits advertisers (increased
    reach, less campaign tuning), users (more
    relevant ads), and the ad platform (higher
    monetization)

(Slide figure: "Selected Ads")
3
Identifying Broad Matches
  • Good keyword mappings retrieve relevant ads that
    users click
  • How to measure what is relevant and likely to be
    clicked?
  • Human judgments: expensive, hard to scale
  • Past user clicks provide click data for kw → kw'
    when the user was shown ad(kw') in the context of kw
  • Highly available, but less trustworthy
  • What similarity functions may indicate relevance
    of kw → kw'?
  • Syntactic (edit distance, TF-IDF cosine, string
    kernels, …) (see the sketch below)
  • Co-occurrence (in documents, query sessions, bid
    campaigns, …)
  • Expanded representation (search result snippets,
    category bags, …)
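A minimal sketch of two of the syntactic similarity functions named above: a normalized edit-distance similarity and a cosine over bag-of-words term-frequency vectors (a plain-TF stand-in for the TF-IDF cosine); these are illustrative, not the paper's exact features.

```python
from collections import Counter
import math

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized into a [0, 1] similarity."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine over bag-of-words term-frequency vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(edit_similarity("electric cars", "electric car"))   # ~0.92
print(cosine_similarity("electric cars", "hybrid cars"))  # 0.5
```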

4
Approach
  • Task: train a learner to estimate p(click | kw → kw')
    for any kw → kw'
  • Data
  • ⟨kw, ad(kw'), click⟩ triples from clickthrough
    logs, where kw → kw' was suggested by previous
    broad match mappings
  • Features
  • Convert each pair to a feature vector capturing
    similarities etc.
  • (kw → kw') → Φ(kw, kw') = (φ1(kw, kw'), …, φn(kw, kw'))
  • For each triple ⟨kw, ad(kw'), click⟩, create an
    instance (Φ(kw, kw'), click)
  • Learner: max-margin averaged perceptron (strong
    theory, very efficient; see the sketch below)

where φi(kw, kw') can be any function of kw, kw', or both
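A compact sketch of the learner named above: a perceptron with a margin-based update rule that predicts with the average of its intermediate hypotheses. The class name, margin value, and learning-rate choices are illustrative, not the authors' exact configuration.

```python
import numpy as np

class AveragedPerceptron:
    """Perceptron that updates on margin violations and predicts
    with the average of all intermediate weight vectors."""

    def __init__(self, n_features: int, margin: float = 1.0):
        self.w = np.zeros(n_features)       # current hypothesis
        self.w_sum = np.zeros(n_features)   # running sum for averaging
        self.t = 0                          # number of examples seen
        self.margin = margin

    def update(self, x: np.ndarray, click: int) -> None:
        y = 1 if click else -1
        if y * self.w.dot(x) <= self.margin:   # mistake or margin violation
            self.w += y * x
        self.w_sum += self.w
        self.t += 1

    def score(self, x: np.ndarray) -> float:
        """Score with the averaged hypothesis; higher = more likely click."""
        w_avg = self.w_sum / max(self.t, 1)
        return float(w_avg.dot(x))

# Instances as on the example slide: (feature vector, click label).
model = AveragedPerceptron(n_features=3)
model.update(np.array([0.78, 0.001, 0.9]), click=1)
model.update(np.array([0.05, 0.02, 0.2]), click=0)
print(model.score(np.array([0.6, 0.01, 0.8])))
```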
5
Example: Creating an Instance
  • Historical broad match clickthrough data: kw →
    kw' → ad(kw') → click event
  • digital slr → canon rebel → "Canon Rebel Kit
    for $499" → click
  • seattle baseball → mariners tickets →
    "Mariners season tickets" → no click
  • Feature functions
  • Instances (see the assembly sketch below)
  • ([0.78, 0.001, 0.9], 1)
  • ([0.05, 0.02, 0.2], 0)

6
Experiments
  • Data
  • 2 months of previous broad match ads from
    Microsoft Content Ads logs
  • 1 month for training, 1 month for testing
  • 68 features (syntactic, co-occurrence based,
    etc.), with greedy feature selection
  • Metrics
  • LogLoss (see the sketch below)
  • LogLoss Lift: difference between the obtained
    LogLoss and that of an oracle with access to the
    empirical p(click | kw → kw') in the test set
  • CTR and revenue results in a live test with users
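For reference, a minimal sketch of LogLoss as conventionally defined (mean negative log-likelihood of the observed clicks), with the lift computed against an oracle that is assumed to predict the empirical CTR of each mapping; the data below is toy data.

```python
import math
from collections import defaultdict

def log_loss(y_true, y_pred, eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the observed click labels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def oracle_log_loss(mappings, y_true) -> float:
    """Oracle that predicts the empirical p(click | kw -> kw') per mapping."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for m, y in zip(mappings, y_true):
        clicks[m] += y
        shown[m] += 1
    oracle_pred = [clicks[m] / shown[m] for m in mappings]
    return log_loss(y_true, oracle_pred)

y_true   = [1, 0, 0, 1]
y_pred   = [0.6, 0.5, 0.5, 0.4]
mappings = ["kw1->kw1a", "kw1->kw1a", "kw2->kw2a", "kw2->kw2a"]
lift = log_loss(y_true, y_pred) - oracle_log_loss(mappings, y_true)
print(lift)
```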

7
Results
8
Live Test Results
  • Use CTR prediction to maximize expected revenue
  • Re-rank mappings to incorporate revenue (see the
    sketch below)
  • +18% revenue, -2% CTR
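A sketch of one way such a re-ranking could work, assuming expected revenue per mapping is approximated as predicted CTR times the advertiser's bid; the slide does not specify the platform's actual pricing or ranking rules, so this is only an assumed setup.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Mapping:
    kw: str          # original keyword
    kw_prime: str    # broad-matched keyword
    p_click: float   # predicted CTR for kw -> kw'
    bid: float       # advertiser bid on kw'

def rerank_by_expected_revenue(mappings: List[Mapping]) -> List[Mapping]:
    """Order candidate broad matches by predicted CTR x bid."""
    return sorted(mappings, key=lambda m: m.p_click * m.bid, reverse=True)

candidates = [
    Mapping("electric cars", "tesla", p_click=0.03, bid=1.50),
    Mapping("electric cars", "golf carts", p_click=0.05, bid=0.40),
]
for m in rerank_by_expected_revenue(candidates):
    print(m.kw_prime, round(m.p_click * m.bid, 3))
```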

9
Online Learning with Amnesia
  • Advertisers, campaigns, bidded keywords, and
    delivery contexts change very rapidly: high
    concept drift
  • Recent data is more informative
  • Goal: utilize older data while capturing changes
    in the distributions
  • The Averaged Perceptron doesn't capture drift
  • Solution: Amnesiac Averaged Perceptron (see the
    sketch below)
  • Exponential weight decay when averaging
    hypotheses
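A sketch of the idea on this slide: the hypothesis average decays exponentially, so older examples fade out. The exact decay schedule and update rule are not shown on the slide; this version uses an exponentially weighted moving average of the perceptron's hypotheses, with an illustrative decay constant.

```python
import numpy as np

class AmnesiacAveragedPerceptron:
    """Averaged perceptron whose hypothesis average decays exponentially,
    so older examples gradually fade out of the prediction weights."""

    def __init__(self, n_features: int, margin: float = 1.0,
                 decay: float = 0.999):
        self.w = np.zeros(n_features)      # current hypothesis
        self.w_avg = np.zeros(n_features)  # exponentially weighted average
        self.margin = margin
        self.decay = decay                 # closer to 1 = longer memory

    def update(self, x: np.ndarray, click: int) -> None:
        y = 1 if click else -1
        if y * self.w.dot(x) <= self.margin:   # margin violation
            self.w += y * x
        # Exponentially weighted average: recent hypotheses dominate.
        self.w_avg = self.decay * self.w_avg + (1.0 - self.decay) * self.w

    def score(self, x: np.ndarray) -> float:
        return float(self.w_avg.dot(x))

model = AmnesiacAveragedPerceptron(n_features=3, decay=0.99)
model.update(np.array([0.78, 0.001, 0.9]), click=1)
model.update(np.array([0.05, 0.02, 0.2]), click=0)
print(model.score(np.array([0.6, 0.01, 0.8])))
```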

10
Results
11
Contributions and Conclusions
  • Learning broad matches from implicit feedback
  • Combining arbitrary similarity measures/features
  • Using clickthrough logs as implicit feedback
  • Amnesiac Averaged Perceptron
  • Exponentially weighted averaging: distant
    examples fade out
  • Online learning adapts to market dynamics

12
Thank You!
13
Features and Feature Selection
  • Co-occurrence feature examples
  • User search sessions: keywords searched within
    10 mins
  • Advertiser campaigns: keywords co-bidded by the
    same advertiser
  • Past clickthrough rates of the original and
    broad-matched keywords
  • Various syntactic similarities
  • Various existing broad matching lists
  • … and so on
  • Feature Selection
  • A total of 68 features
  • Greedy feature selection (see the sketch below)
14
Additional Information
  • Estimation of the expected value of a click over
    all the ads shown for a broad match mapping:
    E[p(click(ad(kw')) | q)]
  • Query Expansion vs. Broad Matching
  • Our broad matching algorithm can be extended for
    query expansion
  • But broad matching is restricted to a fixed set
    of bidded keywords
  • Forgetron vs. Amnesiac Averaged Perceptron
  • Forgetron: maintains a budget of support vectors;
    stores examples explicitly and does not take all
    the data into account
  • AAP: weighted average over all the examples; no
    need to store examples explicitly

15
Results
16
Amnesiac Averaged Perceptron