Detecting Genre Shift - PowerPoint PPT Presentation

Mark Dredze, Tim Oates, Christine Piatko
Paper to appear at EMNLP-10

1
Detecting Genre Shift
  • Mark Dredze, Tim Oates, Christine Piatko
  • Paper to appear at EMNLP-10

2
Natural Language Processing and Machine Learning
  • Extracting findings from scientific papers
  • Genetic epidemiology (development domain)
  • PubMed search produces thousands of papers
  • Manually reviewed to extract findings
  • Findings determine relevant papers/studies
  • Automate this process with ML/NLP methods
  • Create searchable database of findings
  • Allow machine inference over findings
  • Suggest new scientific hypotheses

3
Genre Shift in Statistical NLP
[Figure: a Named Entity Recognition system trained on newswire text such as "...told that John Paul Stevens is retiring this summer..."]
4
Supervised Machine Learning for Named Entity
Recognition
Windowed Text                       Label
Today the Atlantic Ocean is         B
the Atlantic Ocean is in            I
Atlantic Ocean is in an             O
Ocean is in an uproar               O
is in an uproar and                 O
in an uproar and North              O
an uproar and North Carolina        O
uproar and North Carolina remains   B
and North Carolina remains in       I
North Carolina remains in a         O

Sentence: "Today the Atlantic Ocean is in an uproar and North Carolina remains in a state of anxiety."
5
Supervised Machine Learning for Named Entity
Recognition
Windowed Text                 Label
Today the Atlantic Ocean is   B
the Atlantic Ocean is in      I
Atlantic Ocean is in an       O

Feature Vector                                   Label
today, the, atlantic, ocean, is, U, L, U, U, L   B
the, atlantic, ocean, is, in, L, U, U, L, L      I
atlantic, ocean, is, in, an, U, U, L, L, L       O
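The windowed-feature construction in the table above can be sketched in a few lines of Python (a minimal illustration; `window_features` is a hypothetical helper, and the U/L features here encode only initial capitalization):

```python
def window_features(sentence, size=5):
    """Slide a size-word window over the sentence; each window yields
    its lowercase tokens plus one U/L case-shape feature per word."""
    words = sentence.split()
    rows = []
    for i in range(len(words) - size + 1):
        window = words[i:i + size]
        tokens = [w.lower() for w in window]
        shapes = ["U" if w[0].isupper() else "L" for w in window]
        rows.append(tokens + shapes)
    return rows

rows = window_features("Today the Atlantic Ocean is in an uproar")
print(rows[0])
# ['today', 'the', 'atlantic', 'ocean', 'is', 'U', 'L', 'U', 'U', 'L']
```

Each row matches one line of the feature-vector table; the B/I/O labels come from annotated training data.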
6
Genre Shift in Statistical NLP
[Figure: the newswire-trained NER system ("...told that John Paul Stevens is retiring this summer...") is now fed all-caps text ("PRESIDENT BARACK OBAMA IS URGING MEMBERS TO...") — performance labeled "???"]
7
This is a Pervasive Problem
  • Extracting regulatory pathways from online
    bioinformatics journals using a parser trained on
    the WSJ
  • Finding faces in images of disaster victims using
    a model trained on mug shot images
  • Identifying RNA sequences that regulate gene
    expression in a lab in Baltimore using a model
    trained on data gathered in a lab in Germany

When things change in a way that's harmful, we'd
like to know!
8
Data Streams Change Over Time
Sentiment classification from movie reviews
  • Natural drift
  • Users unaware of system limitations

9
Detecting Genre Shift
Genre shift hurts system performance (accuracy)
  • Two problems:
  • Detect changes in a stream of numbers (the A-distance)
  • Convert the document stream to a stream of informative
    numbers (margins)

10
Detecting Genre Shift
Genre shift hurts system performance (accuracy)
  • Measure accuracy directly
  • Requires labeled examples!
  • Look for changes in feature distributions
  • Words become more/less common
  • New words appear

11
Measuring Changes in Streams: The A-Distance
A nonparametric, distribution-independent measure
of changes in univariate, real-valued data
streams (Kifer, Ben-David, and Gehrke, 2004)
12
Measuring Changes in Streams: The A-Distance
[Figure: two sliding windows over the number stream; a change is flagged when the A-distance between their empirical distributions exceeds ε]
13
Measuring Changes in Streams: The A-Distance
[Figure: the same detector after the windows have advanced along the stream]
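The detection rule can be sketched as an empirical A-distance between two windows over a fixed tiling (the tiling and the example windows below are illustrative, not values from the paper):

```python
def a_distance(window1, window2, tiles):
    """Empirical A-distance: the largest difference, over all tiles,
    in the fraction of each window's points falling in that tile."""
    def frac(points, lo, hi):
        return sum(lo <= x < hi for x in points) / len(points)
    return max(abs(frac(window1, lo, hi) - frac(window2, lo, hi))
               for lo, hi in tiles)

# Ten equal-width tiles on [0, 1).
tiles = [(i / 10, (i + 1) / 10) for i in range(10)]
ref = [0.10, 0.15, 0.20, 0.25, 0.30]   # reference window
cur = [0.70, 0.75, 0.80, 0.85, 0.90]   # current window
eps = 0.3
print(a_distance(ref, cur, tiles) > eps)  # True -> change detected
```

In the streaming setting, `ref` and `cur` would be two sliding windows of recent margins, and the comparison against ε is repeated as the stream advances.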
14
Changes in Document Streams
[Figure: an incoming document, "President Barack Obama is urging members to..."]
15
Changes in Document Streams
[Figure: the document "President Barack Obama is urging members to..." converted to a feature-count vector, e.g. Obama: 4, embassy: 1]
16
Changes in Document Streams
[Figure: feature-count vector X (Obama: 4, embassy: 1) paired with a learned weight vector W (Obama: 1.6, embassy: 0.1)]
17
Changes in Document Streams
[Figure: the same vectors X (Obama: 4, embassy: 1) and W (Obama: 1.6, embassy: 0.1) for the document "President Barack Obama is urging members to..."]
  • W·X = margin
  • sign of W·X is the class label (+/-)
  • magnitude of W·X is the certainty in that label
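The margin computation these bullets describe amounts to a sparse dot product; a minimal sketch using the slide's example values (the weights are illustrative):

```python
# Sparse dot product W·X: learned weights times feature counts.
W = {"obama": 1.6, "embassy": 0.1}   # learned weights (example values)
X = {"obama": 4, "embassy": 1}       # feature counts in the document

margin = sum(W.get(f, 0.0) * count for f, count in X.items())
label = "+" if margin >= 0 else "-"
print(round(margin, 2), label)  # 6.5 +
```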

18
Why Margins?
  • We have an easy way of producing them from
    unlabeled examples!
  • We want to track feature changes
  • Margins are linear combinations of feature values
  • Removing important features yields smaller
    margins
  • Only track features that matter; features with
    zero (small) weight don't affect the margin (much)
  • Spoiler alert! Tracking margins works really
    well for unsupervised detection of genre shifts.

19
Accuracy vs. Margins
DVD to Electronics
20
Accuracy vs. Margins
DVD to Electronics
Average in block
Average over last 100 instances
21
Accuracy vs. Margins
DVD to Electronics
22
Confidence Weighted Margins
  • Margins can be viewed as a measure of confidence
  • We detect when confidence in classifications
    drops
  • Confidence Weighted (CW) learning refines this
    idea
  • Gaussian distribution over weight vectors
  • Mean weight vector µ ∈ R^N
  • Diagonal covariance matrix Σ ∈ R^(N×N)
  • Low variance → high confidence
  • Normalized margin: µ·x / (xᵀΣx)^0.5
  • Called VARIANCE in the slides that follow

[Figure: weight means with per-feature variances, e.g. µ = 1.6 with σ² = 0.02 (confident) and µ = 0.1 with σ² = 1.74 (uncertain)]
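A sketch of the normalized margin µ·x / (xᵀΣx)^0.5 for a diagonal covariance, using the slide's example values (illustrative, not learned weights):

```python
import math

def normalized_margin(mu, sigma_diag, x):
    """Mean margin divided by the margin's standard deviation under
    the Gaussian over weight vectors (diagonal covariance)."""
    mean_margin = sum(m * v for m, v in zip(mu, x))
    variance = sum(s * v * v for s, v in zip(sigma_diag, x))
    return mean_margin / math.sqrt(variance)

mu = [1.6, 0.1]        # weight means (example values from the slide)
sigma = [0.02, 1.74]   # per-feature variances
x = [4, 1]             # feature counts

print(round(normalized_margin(mu, sigma, x), 2))  # 4.53
```

Dividing by the standard deviation discounts margin mass that comes from high-variance (low-confidence) weights, which is what makes VARIANCE more sensitive to shift than the raw margin.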
23
Experiments
  • Datasets
  • Sentiment classification between domains (Blitzer
    et al., 2007)
  • DVDs, electronics, books, kitchen appliances
  • Spam classification between users (Jiang and
    Zhai, 2007)
  • Named entity classification between genres (ACE
    2005)
  • News articles, broadcast news, telephone, blogs,
    etc.
  • Algorithms
  • Baselines: SVM, MIRA, CW
  • Our method: VARIANCE

24
Experiments
  • Simulated domain shifts between each pair of
    genres
  • 38 pairs, 10 trials each with different random
    instance orderings
  • 500 source examples
  • 1500 target examples
  • False change
  • 11 datasets with no shift, 10 trials with
    different random instance orderings
  • If no shift found then detection recorded as end
    of target examples when computing averages

25
Comparing Algorithms
26
SVM vs. VARIANCE
27
SVM vs. VARIANCE
28
Summary of Results Thus Far
  • VARIANCE detected shifts faster than:
  • SVM: 34 times out of 38
  • MIRA: 26 times out of 38
  • CW: 27 times out of 38

29
Gradual Shifts
30
What if you have labels?
  • STEPD: a Statistical Test of Equal Proportions to
    Detect concept drift (Nishida and Yamauchi, 2007)
  • Monitors the accuracy of a classifier from a stream of
    labeled examples
  • Parameters: window size W and threshold α
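The core of STEPD is a test of equal proportions between recent-window accuracy and overall accuracy; a simplified sketch (omitting the continuity correction the original test uses, with illustrative counts):

```python
import math

def stepd_statistic(correct_overall, n_overall, correct_recent, n_recent):
    """Test statistic for equal accuracy in the overall stream and the
    most recent window (simplified two-proportion z-test)."""
    p_hat = (correct_overall + correct_recent) / (n_overall + n_recent)
    diff = correct_overall / n_overall - correct_recent / n_recent
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n_overall + 1 / n_recent))
    return diff / se

# 90% accuracy over 1000 examples, but only 70% over the last 100:
z = stepd_statistic(900, 1000, 70, 100)
print(z > 2.0)  # True: the accuracy drop is flagged
```

Unlike VARIANCE, this requires the true labels of streaming examples to measure accuracy at all.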

31
Comparison to STEPD
32
What about false positives?
33
The A-Distance: Choosing Parameters
[Figure: a tiling of the real line; the probability mass P of each tile is compared across the two windows against the threshold ε]
34
The A-Distance: Choosing Parameters
[Figure: the same comparison under a different choice of tiling]
35
The A-Distance: Choosing Parameters
  • The A-distance paper gives bounds on FPs and FNs
  • Bounds depend on n and ε
  • Bounds do not depend on the tiling!
  • So loose as to be meaningless
  • No guidance on how to choose the tiling
  • What if tiles lie outside the support of the data?

36
Better Bounds
  • P_A: true probability of a point falling in tile
    A
  • h: number of points that actually fell in A
  • p_A = h/n: ML estimate of P_A
  • Define P'_A, h', and p'_A for the second window
  • Suppose P_A = P'_A; then any change detected is a
    false positive

What is the probability that |p_A − p'_A| > ε/2?
37
Posterior Over PA
  • B(a, b) is the Beta function over a + b Bernoulli
    trials
  • a trials have one outcome (the point lands in tile A)
  • b trials have the other (the point lands in some
    other tile)
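The false-positive question can also be checked numerically: draw both windows from the same true tile probability and count how often the empirical fractions differ by more than ε/2 (a Monte Carlo sketch with illustrative parameters, not the paper's analytic bound):

```python
import random

def fp_probability(p, n, eps, trials=5000):
    """Estimate the chance that two windows of n points, drawn with the
    same true tile probability p, give |p_A - p'_A| > eps/2."""
    hits = 0
    for _ in range(trials):
        pa = sum(random.random() < p for _ in range(n)) / n
        pa2 = sum(random.random() < p for _ in range(n)) / n
        hits += abs(pa - pa2) > eps / 2
    return hits / trials

random.seed(0)
print(fp_probability(p=0.1, n=200, eps=0.2) < 0.05)  # True: FPs are rare here
```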

38
False Positives Two Cases
39
Don't worry, I'm not going to explain this (much)
40
Probability of a FP (n = 200)
41
Probability of FN
42
Minimizing Expected Loss
43
Moving Forward
44
Genre Shift Fix
[Figure: NER trained on newswire ("...told that John Paul Stevens is retiring this summer...") must handle all-caps input ("PRESIDENT BARACK OBAMA IS URGING MEMBERS TO...")]
45
Genre Shift Fix
[Figure: the all-caps input is normalized to "President Barack Obama is urging members to..." before running Named Entity Recognition]
46
Conclusion
  • Changes in margins convey useful information
    about changes in classification accuracy
  • No need for labeled examples!
  • The A-distance applied to margin streams finds
    genre shifts with few false positives/negatives
  • Confidence weighted margins normalized by
    variance detect shifts faster than SVM, MIRA, or
    (non-normalized) CW margins
  • Our approach even works with gradual shifts and
    compares favorably to shift detectors that use
    labeled examples

47
Thank you!