Catching the Drift: Learning Broad Matches from Clickthrough Data
1
Catching the Drift: Learning Broad Matches from Clickthrough Data
  • Sonal Gupta, Mikhail Bilenko, Matthew Richardson
  • University of Texas at Austin, Microsoft Research

2
Introduction
  • Keyword-based online advertising: bidded keywords
    are extracted from context
  • Context: query (search ads) or page (content ads)
  • Broad matching: expanding keywords via a
    keyword-to-keywords mapping
  • Example: electric cars → tesla, hybrids, toyota
    prius, golf carts
  • Broad matching benefits advertisers (increased
    reach, less campaign tuning), users (more
    relevant ads), and the ad platform (higher
    monetization)

(Slide figure: "Selected Ads")
3
Identifying Broad Matches
  • Good keyword mappings retrieve relevant ads that
    users click
  • How to measure what is relevant and likely to be
    clicked?
  • Human judgments: expensive, hard to scale
  • Past user clicks provide click data for kw → kw'
    when the user was shown ad(kw') in the context of kw
  • Highly available, but less trustworthy
  • What similarity functions may indicate relevance
    of kw → kw'?
  • Syntactic (edit distance, TF-IDF cosine, string
    kernels, …) (see the sketch below)
  • Co-occurrence (in documents, query sessions, bid
    campaigns, …)
  • Expanded representation (search result snippets,
    category bags, …)
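A minimal sketch of two of the syntactic similarity functions named above: a normalized edit-distance similarity and a cosine over bag-of-words term-frequency vectors (a plain-TF stand-in for the TF-IDF cosine); these are illustrative, not the paper's exact features.

```python
from collections import Counter
import math

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized into a [0, 1] similarity."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine over bag-of-words term-frequency vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(edit_similarity("electric cars", "electric car"))   # ~0.92
print(cosine_similarity("electric cars", "hybrid cars"))  # 0.5
```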

4
Approach
  • Task: train a learner to estimate p(click | kw → kw')
    for any kw → kw'
  • Data
  • ⟨kw, ad(kw'), click⟩ triples from clickthrough
    logs, where kw → kw' was suggested by previous
    broad match mappings
  • Features
  • Convert each pair to a feature vector capturing
    similarities etc.
  • (kw → kw') → Φ(kw, kw') = (φ1(kw, kw'), …, φn(kw, kw'))
  • For each triple ⟨kw, ad(kw'), click⟩, create an
    instance (Φ(kw, kw'), click)
  • Learner: max-margin averaged perceptron (strong
    theory, very efficient; see the sketch below)

where φi(kw, kw') can be any function of kw, kw', or both
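A compact sketch of the learner named above: a perceptron with a margin-based update rule that predicts with the average of its intermediate hypotheses. The class name, margin value, and learning-rate choices are illustrative, not the authors' exact configuration.

```python
import numpy as np

class AveragedPerceptron:
    """Perceptron that updates on margin violations and predicts
    with the average of all intermediate weight vectors."""

    def __init__(self, n_features: int, margin: float = 1.0):
        self.w = np.zeros(n_features)       # current hypothesis
        self.w_sum = np.zeros(n_features)   # running sum for averaging
        self.t = 0                          # number of examples seen
        self.margin = margin

    def update(self, x: np.ndarray, click: int) -> None:
        y = 1 if click else -1
        if y * self.w.dot(x) <= self.margin:   # mistake or margin violation
            self.w += y * x
        self.w_sum += self.w
        self.t += 1

    def score(self, x: np.ndarray) -> float:
        """Score with the averaged hypothesis; higher = more likely click."""
        w_avg = self.w_sum / max(self.t, 1)
        return float(w_avg.dot(x))

# Instances as on the example slide: (feature vector, click label).
model = AveragedPerceptron(n_features=3)
model.update(np.array([0.78, 0.001, 0.9]), click=1)
model.update(np.array([0.05, 0.02, 0.2]), click=0)
print(model.score(np.array([0.6, 0.01, 0.8])))
```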
5
Example: Creating an Instance
  • Historical broad match clickthrough data: kw →
    kw' → ad(kw') → click event
  • digital slr → canon rebel → "Canon Rebel Kit
    for $499" → click
  • seattle baseball → mariners tickets →
    "Mariners season tickets" → no click
  • Feature functions
  • Instances (see the assembly sketch below)
  • ([0.78, 0.001, 0.9], 1)
  • ([0.05, 0.02, 0.2], 0)

6
Experiments
  • Data
  • 2 months of previous broad match ads from
    Microsoft Content Ads logs
  • 1 month for training, 1 month for testing
  • 68 features (syntactic, co-occurrence based,
    etc.), with greedy feature selection
  • Metrics
  • LogLoss (see the sketch below)
  • LogLoss Lift: difference between the obtained
    LogLoss and that of an oracle with access to the
    empirical p(click | kw → kw') in the test set
  • CTR and revenue results in a live test with users
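For reference, a minimal sketch of LogLoss as conventionally defined (mean negative log-likelihood of the observed clicks), with the lift computed against an oracle that is assumed to predict the empirical CTR of each mapping; the data below is toy data.

```python
import math
from collections import defaultdict

def log_loss(y_true, y_pred, eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the observed click labels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def oracle_log_loss(mappings, y_true) -> float:
    """Oracle that predicts the empirical p(click | kw -> kw') per mapping."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for m, y in zip(mappings, y_true):
        clicks[m] += y
        shown[m] += 1
    oracle_pred = [clicks[m] / shown[m] for m in mappings]
    return log_loss(y_true, oracle_pred)

y_true   = [1, 0, 0, 1]
y_pred   = [0.6, 0.5, 0.5, 0.4]
mappings = ["kw1->kw1a", "kw1->kw1a", "kw2->kw2a", "kw2->kw2a"]
lift = log_loss(y_true, y_pred) - oracle_log_loss(mappings, y_true)
print(lift)
```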

7
Results
8
Live Test Results
  • Use CTR prediction to maximize expected revenue
  • Re-rank mappings to incorporate revenue (see the
    sketch below)
  • +18% revenue, -2% CTR
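A sketch of one way such a re-ranking could work, assuming expected revenue per mapping is approximated as predicted CTR times the advertiser's bid; the slide does not specify the platform's actual pricing or ranking rules, so this is only an assumed setup.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Mapping:
    kw: str          # original keyword
    kw_prime: str    # broad-matched keyword
    p_click: float   # predicted CTR for kw -> kw'
    bid: float       # advertiser bid on kw'

def rerank_by_expected_revenue(mappings: List[Mapping]) -> List[Mapping]:
    """Order candidate broad matches by predicted CTR x bid."""
    return sorted(mappings, key=lambda m: m.p_click * m.bid, reverse=True)

candidates = [
    Mapping("electric cars", "tesla", p_click=0.03, bid=1.50),
    Mapping("electric cars", "golf carts", p_click=0.05, bid=0.40),
]
for m in rerank_by_expected_revenue(candidates):
    print(m.kw_prime, round(m.p_click * m.bid, 3))
```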

9
Online Learning with Amnesia
  • Advertisers, campaigns, bidded keywords, and
    delivery contexts change very rapidly: high
    concept drift
  • Recent data is more informative
  • Goal: utilize older data while capturing changes
    in the distributions
  • The Averaged Perceptron doesn't capture drift
  • Solution: Amnesiac Averaged Perceptron (see the
    sketch below)
  • Exponential weight decay when averaging
    hypotheses
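A sketch of the idea on this slide: the hypothesis average decays exponentially, so older examples fade out. The exact decay schedule and update rule are not shown on the slide; this version uses an exponentially weighted moving average of the perceptron's hypotheses, with an illustrative decay constant.

```python
import numpy as np

class AmnesiacAveragedPerceptron:
    """Averaged perceptron whose hypothesis average decays exponentially,
    so older examples gradually fade out of the prediction weights."""

    def __init__(self, n_features: int, margin: float = 1.0,
                 decay: float = 0.999):
        self.w = np.zeros(n_features)      # current hypothesis
        self.w_avg = np.zeros(n_features)  # exponentially weighted average
        self.margin = margin
        self.decay = decay                 # closer to 1 = longer memory

    def update(self, x: np.ndarray, click: int) -> None:
        y = 1 if click else -1
        if y * self.w.dot(x) <= self.margin:   # margin violation
            self.w += y * x
        # Exponentially weighted average: recent hypotheses dominate.
        self.w_avg = self.decay * self.w_avg + (1.0 - self.decay) * self.w

    def score(self, x: np.ndarray) -> float:
        return float(self.w_avg.dot(x))

model = AmnesiacAveragedPerceptron(n_features=3, decay=0.99)
model.update(np.array([0.78, 0.001, 0.9]), click=1)
model.update(np.array([0.05, 0.02, 0.2]), click=0)
print(model.score(np.array([0.6, 0.01, 0.8])))
```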

10
Results
11
Contributions and Conclusions
  • Learning broad matches from implicit feedback
  • Combining arbitrary similarity measures/features
  • Using clickthrough logs as implicit feedback
  • Amnesiac Averaged Perceptron
  • Exponentially weighted averaging: distant
    examples fade out
  • Online learning adapts to market dynamics

12
Thank You!
13
Features and Feature Selection
  • Co-occurrence feature examples
  • User search sessions: keywords searched within
    10 mins
  • Advertiser campaigns: keywords co-bidded by the
    same advertiser
  • Past clickthrough rates of the original and
    broad-matched keywords
  • Various syntactic similarities
  • Various existing broad matching lists
  • … and so on
  • Feature Selection
  • A total of 68 features
  • Greedy feature selection (see the sketch below)
14
Additional Information
  • Estimation of the expected value of a click over
    all the ads shown for a broad match mapping:
    E[p(click(ad(kw')) | q)]
  • Query Expansion vs. Broad Matching
  • Our broad matching algorithm can be extended for
    query expansion
  • But broad matching is restricted to a fixed set
    of bidded keywords
  • Forgetron vs. Amnesiac Averaged Perceptron
  • Forgetron: maintains a budget of support vectors;
    stores examples explicitly and does not take all
    the data into account
  • AAP: weighted average over all the examples; no
    need to store examples explicitly

15
Results
16
Amnesiac Averaged Perceptron