Estimating Rates of Rare Events at Multiple Resolutions

Slides: 27
Provided by: DC270

1
Estimating Rates of Rare Events at Multiple
Resolutions
  • Deepak Agarwal, Andrei Broder, Deepayan
    Chakrabarti, Dejan Diklic, Vanja Josifovski,
    Mayssam Sayyadian

2
Estimation in the tail
  • Contextual Advertising
  • Show an ad on a webpage (impression)
  • Revenue is generated if a user clicks
  • Problem: estimate the click-through rate (CTR) of
    an ad on a page
  • Most (ad, page) pairs have very few impressions,
    if any, and even fewer clicks
  • Severe data sparsity

3
Estimation in the tail
  • Use an existing, well-understood hierarchy
  • Categorize ads and webpages to leaves of the
    hierarchy
  • CTR estimates of siblings are correlated
  • The hierarchy allows us to aggregate data
  • Coarser resolutions provide reliable estimates
    for rare events, which then influence estimation
    at finer resolutions

4
System overview
Retrospective data: (URL, ad, isClicked)
Crawl a sample of URLs
Classify pages and ads
Rare event estimation using hierarchy
Impute impressions, fix sampling bias
5
Sampling of webpages
  • Naïve strategy: sample at random from the set of
    URLs
  • This gives sampling errors in both impression
    volume and click volume
  • Instead, we propose crawling:
  • all URLs with at least one click, and
  • a sample of the remaining URLs
  • Variability is then only in impression volume
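
The proposed crawl selection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the `click_counts` mapping, and the `sample_fraction` parameter are all assumptions made for the example.

```python
import random

def select_urls_to_crawl(click_counts, sample_fraction, seed=0):
    """Split URLs into a clicked pool (crawled in full) and a sampled non-clicked pool.

    click_counts: hypothetical dict mapping URL -> number of clicks in the log.
    sample_fraction: assumed fraction of the zero-click URLs to crawl.
    """
    rng = random.Random(seed)
    # Every URL with at least one click is crawled, so click volume has no sampling error.
    clicked = [url for url, clicks in click_counts.items() if clicks > 0]
    # Only the zero-click URLs are subsampled; this affects impression volume only.
    unclicked = [url for url, clicks in click_counts.items() if clicks == 0]
    k = round(sample_fraction * len(unclicked))
    sampled = rng.sample(unclicked, k)
    return clicked, sampled
```

Because the clicked pool is never subsampled, only the impression counts of the non-clicked pool need to be scaled up afterwards.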

6
Imputation of impression volume
Impressions in region ij: $n_{ij} + m_{ij} + x_{ij}$
Row constraint: sums to $\sum n_{ij} + K \cdot \sum m_{ij}$
Sums to total impressions (known)
Column constraint: sums to impressions on ads of this ad class
7
Imputation of impression volume
  • Region = (page node, ad node)
  • Region hierarchy: a cross-product of the page
    hierarchy and the ad hierarchy

[Figure: the page hierarchy and the ad hierarchy, from Level 0 down to Level i; at Level i, each region is a cell in the grid of page classes × ad classes.]
8
Imputation of impression volume
[Figure: the regions of a block at Level i+1 sum to their parent region at Level i (block constraint).]
9
Imputing xij
  • Iterative Proportional Fitting [Darroch/1972]
  • Initialize $x_{ij} = n_{ij} + m_{ij}$
  • Iteratively scale the $x_{ij}$ values to match the
    row/column/block constraints
  • Ordering of constraints: top-down, then
    bottom-up, and repeat

[Figure: a block of regions at Level i+1 under its parent region at Level i; the axes are page classes and ad classes.]
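
As a rough illustration of the row/column part of Iterative Proportional Fitting, the sketch below rescales a small table until its row and column sums match given targets. It handles a single level only and leaves out the block constraints; the function name and arguments are assumptions for the example, not taken from the slides.

```python
def ipf(x, row_targets, col_targets, iters=100, tol=1e-9):
    """Scale a non-negative table so its row and column sums match the targets.

    x: list of rows (initial values); row_targets and col_targets must have
    the same grand total for the constraints to be jointly satisfiable.
    """
    x = [row[:] for row in x]  # work on a copy
    for _ in range(iters):
        # Row step: rescale each row to its target sum.
        for i, row in enumerate(x):
            s = sum(row)
            if s > 0:
                x[i] = [v * row_targets[i] / s for v in row]
        # Column step: rescale each column to its target sum.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in x)
            if s > 0:
                for row in x:
                    row[j] *= target / s
        # Columns are exact after the column step, so check the rows.
        if all(abs(sum(row) - t) <= tol for row, t in zip(x, row_targets)):
            break
    return x
```

Each pass preserves the relative structure of the initial table while pulling the margins toward the constraints, which is why the initialization matters.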
10
Imputation Summary
  • Given:
  • $n_{ij}$ (impressions in clicked pool)
  • $m_{ij}$ (impressions in sampled non-clicked pool)
  • impressions on ads of each ad class in the ad
    hierarchy
  • We get:
  • Estimated impression volume
    $\tilde{N}_{ij} = n_{ij} + m_{ij} + x_{ij}$
    in each region ij of every level

11
System overview
Retrospective data: (page, ad, isClicked)
Crawl a sample of pages
Classify pages and ads
Rare event estimation using hierarchy
Impute impressions, fix sampling bias
12
Rare rate modeling
  • Freeman-Tukey transform
  • $y_{ij}$ = F-T(clicks and impressions at ij)
    ≈ transformed CTR
  • Variance-stabilizing transformation: Var(y) is
    independent of E[y] → needed in further modeling
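
The exact formula did not survive in this transcript, so the sketch below uses one common form of the Freeman-Tukey transform for rates as an assumption: the average of sqrt(c/N) and sqrt((c+1)/N) for c clicks and N impressions. The function name is hypothetical.

```python
import math

def freeman_tukey(clicks, impressions):
    """Freeman-Tukey style transform of a rate (assumed form, see lead-in).

    Returns 0.5 * (sqrt(c / N) + sqrt((c + 1) / N)).
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 0.5 * (math.sqrt(clicks / impressions)
                  + math.sqrt((clicks + 1) / impressions))
```

With this form, two regions with zero clicks still get different values when their impression counts differ: the region with more impressions is pushed closer to zero.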

13
Rare rate modeling
  1. Generative Model (Tree-structured Markov Model)

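[Figure: tree-structured model. Each region ij has an unobserved state $S_{ij}$, linked to the parent state $S_{parent(ij)}$ with variance $W_{ij}$; the observation $y_{ij}$ depends on $S_{ij}$ and on covariates with coefficients $\beta_{ij}$, with variance $V_{ij}$. The parent region has the same structure ($W_{parent(ij)}$, $\beta_{parent(ij)}$, $V_{parent(ij)}$, $y_{parent(ij)}$).]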
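
One plausible way to write the model in the figure as equations is shown below. The Gaussian noise terms and the additive form are assumptions (they are what a Kalman filter normally requires), and $u_{ij}$ is a hypothetical name for the covariate vector of region ij; the slides give only the symbols $S$, $\beta$, $V$, $W$, and $y$.

```latex
\begin{align*}
y_{ij} &= u_{ij}^{\top}\beta_{ij} + S_{ij} + \epsilon_{ij},
  & \epsilon_{ij} &\sim \mathcal{N}(0, V_{ij}) \\
S_{ij} &= S_{\mathrm{parent}(ij)} + w_{ij},
  & w_{ij} &\sim \mathcal{N}(0, W_{ij})
\end{align*}
```

Under this reading, $W_{ij}$ controls how far a region's state may drift from its parent's state, and $V_{ij}$ is the observation noise of the transformed CTR.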
14
Rare rate modeling
  • Model fitting with a 2-pass Kalman filter
  • Filtering: leaf to root
  • Smoothing: root to leaf
  • Linear in the number of regions
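
The two passes can be sketched with Gaussian message passing on a tree, which plays the same role as the filter and smoother here. This is a simplified illustration under stated assumptions, not the authors' algorithm: covariates are omitted, a single shared variance `W` is used on every edge, the root state has a zero-mean prior with variance `prior_var`, and all names are hypothetical.

```python
def smooth_tree(children, root, y, V, W, prior_var):
    """Two-pass smoothing for S_child = S_parent + N(0, W), y_r = S_r + N(0, V[r]).

    children: dict node -> list of child nodes; y, V: dicts keyed by node.
    Returns dict node -> (posterior mean, posterior variance) of S_r given all y.
    """
    # Breadth-first order so that parents always come before their children.
    order = [root]
    for node in order:
        order.extend(children.get(node, []))

    # Filtering pass (leaves to root): information about S_r from r's subtree.
    prec, eta = {}, {}
    for r in reversed(order):
        prec[r] = 1.0 / V[r]
        eta[r] = y[r] / V[r]
        for c in children.get(r, []):
            up_var = 1.0 / prec[c] + W  # uncertainty grows by W across an edge
            prec[r] += 1.0 / up_var
            eta[r] += (eta[c] / prec[c]) / up_var

    # Smoothing pass (root to leaves): add what the rest of the tree says.
    post_prec = {root: prec[root] + 1.0 / prior_var}
    post_eta = {root: eta[root]}
    for r in order:
        for c in children.get(r, []):
            up_var = 1.0 / prec[c] + W
            # Parent's belief with this child's own contribution removed.
            cav_prec = post_prec[r] - 1.0 / up_var
            cav_eta = post_eta[r] - (eta[c] / prec[c]) / up_var
            down_var = 1.0 / cav_prec + W
            post_prec[c] = prec[c] + 1.0 / down_var
            post_eta[c] = eta[c] + (cav_eta / cav_prec) / down_var

    return {r: (post_eta[r] / post_prec[r], 1.0 / post_prec[r]) for r in order}
```

Each node is visited once per pass and does work proportional to its number of children, so the cost is linear in the number of regions, matching the claim on the slide.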

15
Experiments
  • 503M impressions
  • 7-level hierarchy of which the top 3 levels were
    used
  • Zero clicks in:
  • 76 regions in level 2
  • 95 regions in level 3
  • Full dataset $D_{FULL}$, and a 2/3 sample $D_{SAMPLE}$

16
Experiments
  • Estimate CTRs for all regions R in level 3 with
    zero clicks in $D_{SAMPLE}$
  • Some of these regions, $R_{>0}$, get clicks in
    $D_{FULL}$
  • A good model should predict higher CTRs for
    $R_{>0}$ than for the other regions in R
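
The slides do not say which metric was used for this comparison. As one possible way to score "higher predicted CTRs for the regions that later get clicks", the sketch below computes the fraction of (clicked, non-clicked) region pairs that a model orders correctly, which is an AUC-style score. The function name and inputs are assumptions.

```python
def pairwise_ranking_score(pred_ctr, got_clicks):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5.

    pred_ctr: predicted CTR per region (all regions had zero clicks in the sample).
    got_clicks: parallel list of booleans, True if the region has clicks in the
    full dataset.
    """
    pos = [p for p, g in zip(pred_ctr, got_clicks) if g]
    neg = [p for p, g in zip(pred_ctr, got_clicks) if not g]
    if not pos or not neg:
        raise ValueError("need at least one region of each kind")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A score of 0.5 corresponds to random ordering, and 1.0 means every region in the clicked group was predicted above every region in the other group.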

17
Experiments
  • We compared 4 models:
  • TS: our tree-structured model
  • LM (level-mean): each level smoothed
    independently
  • NS (no smoothing): CTR proportional to $1/\tilde{N}$
  • Random: assuming $|R_{>0}|$ is given, randomly
    predict the membership of $R_{>0}$ out of R

18
Experiments
[Figure: comparison of TS against Random, LM, and NS.]
19
Experiments
Few impressions → estimates depend more on siblings
Enough impressions → little borrowing from siblings
20
Related Work
  • Multi-resolution modeling
  • studied in time series modeling and spatial
    statistics [Openshaw/79, Cressie/90, Chou/94]
  • Imputation
  • studied in statistics [Darroch/1972]
  • Application of such models to the estimation of
    such rare events (rates of $10^{-3}$) is novel

21
Conclusions
  • We presented a method to estimate
  • rates of extremely rare events
  • at multiple resolutions
  • under severe sparsity constraints
  • Our method has two parts
  • Imputation → incorporates hierarchy, fixes
    sampling bias
  • Tree-structured generative model → extremely fast
    parameter fitting

22
Rare rate modeling
  • Freeman-Tukey transform
  • Distinguishes between regions with zero clicks
    based on the number of impressions
  • Variance-stabilizing transformation: Var(y) is
    independent of E[y] → needed in further modeling

[Formula annotations: number of clicks in region r; number of impressions in region r.]
23
Rare rate modeling
  • Generative Model
  • $S_{ij}$ values can be quickly estimated using a
    Kalman filtering algorithm
  • Kalman filter requires knowledge of $\beta$, V, and W
  • EM wrapped around the Kalman filter

[Figure: filtering and smoothing passes.]
24
Rare rate modeling
  • Fitting using a Kalman filtering algorithm
  • Filtering: recursively aggregate data from leaves
    to root
  • Smoothing: propagate information from root to
    leaves
  • Complexity: linear in the number of regions, for
    both time and space

[Figure: filtering and smoothing passes.]
25
Rare rate modeling
  • Fitting using a Kalman filtering algorithm
  • Filtering: recursively aggregate data from leaves
    to root
  • Smoothing: propagate information from root to
    leaves
  • Kalman filter requires knowledge of $\beta$, V, and W
  • EM wrapped around the Kalman filter

[Figure: filtering and smoothing passes.]
26
Imputing xij
  • Iterative Proportional Fitting [Darroch/1972]
  • Initialize $x_{ij} = n_{ij} + m_{ij}$
  • Top-down:
  • Scale all $x_{ij}$ in every block in Z(i+1) to sum
    to its parent in Z(i)
  • Scale all $x_{ij}$ in Z(i+1) to sum to the row totals
  • Scale all $x_{ij}$ in Z(i+1) to sum to the column
    totals
  • Repeat for every level Z(i)
  • Bottom-up: similar

[Figure: a block of regions in Z(i+1) under its parent region in Z(i); the axes are page classes and ad classes.]
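
The first top-down step (scaling each block to its parent) might look like the sketch below. It covers only that one step for one pair of adjacent levels; the dictionaries and the function name are hypothetical, and the row and column scaling steps are not shown.

```python
def scale_blocks_to_parents(parent_x, child_x, parent_of):
    """Rescale child regions so that each block sums to its parent's value.

    parent_x: dict parent region -> value at level Z(i).
    child_x: dict child region -> value at level Z(i+1).
    parent_of: dict child region -> its parent region (defines the blocks).
    """
    # Current sum of each block, keyed by the parent region.
    block_sum = {}
    for child, value in child_x.items():
        parent = parent_of[child]
        block_sum[parent] = block_sum.get(parent, 0.0) + value
    # Multiply every child by (parent value / block sum); empty blocks are left as-is.
    scaled = {}
    for child, value in child_x.items():
        parent = parent_of[child]
        s = block_sum[parent]
        scaled[child] = value * parent_x[parent] / s if s > 0 else value
    return scaled
```

For example, a block holding values 1 and 3 under a parent with value 10 is rescaled to 2.5 and 7.5, keeping the 1:3 ratio inside the block.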