Title: Detecting Genre Shift
1Detecting Genre Shift
- Mark Dredze, Tim Oates, Christine Piatko
- Paper to appear at EMNLP-10
2Natural Language Processing and Machine Learning
- Extracting findings from scientific papers
- Genetic epidemiology (development domain)
- PubMed search produces thousands of papers
- Manually reviewed to extract findings
- Findings determine relevant papers/studies
- Automate this process with ML/NLP methods
- Create searchable database of findings
- Allow machine inference over findings
- Suggest new scientific hypotheses
3Genre Shift in Statistical NLP
told that John Paul Stevens is retiring this
summer
Named Entity Recognition
4Supervised Machine Learning for Named Entity
Recognition
Windowed Text Label
Today the Atlantic Ocean is B
the Atlantic Ocean is in I
Atlantic Ocean is in an O
Ocean is in an uproar O
is in an uproar and O
in an uproar and North O
an uproar and North Carolina O
uproar and North Carolina remains B
and North Carolina remains in I
North Carolina remains in a O
Today the Atlantic Ocean is in an uproar and
North Carolina remains in a state of anxiety.
5Supervised Machine Learning for Named Entity
Recognition
Windowed Text Label
Today the Atlantic Ocean is B
the Atlantic Ocean is in I
Atlantic Ocean is in an O
Feature Vector Label
today, the, atlantic, ocean, is, U, L, U, U, L B
the, atlantic, ocean, is, in, L, U, U, L, L I
atlantic, ocean, is, in, an, U, U, L, L, L O
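The mapping from windowed text to feature vectors shown above can be sketched in a few lines. This is only an illustration that assumes a five-token window, lowercased words, and one U/L capitalization flag per token, as the three example rows suggest; the function name `window_features` is hypothetical and not taken from the paper.

```python
from typing import List

def window_features(tokens: List[str], size: int = 5) -> List[List[str]]:
    """Slide a fixed-size window over the tokens.

    Each row holds the lowercased words in the window followed by one
    case flag per word: U if the token starts uppercase, L otherwise.
    """
    rows = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        words = [t.lower() for t in window]
        case = ["U" if t[:1].isupper() else "L" for t in window]
        rows.append(words + case)
    return rows

tokens = "Today the Atlantic Ocean is in an".split()
rows = window_features(tokens)
for row in rows:
    print(", ".join(row))
```

In the slide's table, each row's B/I/O label appears to belong to the token at the center of the window ("Atlantic" is B, "Ocean" is I, "is" is O).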
6Genre Shift in Statistical NLP
told that John Paul Stevens is retiring this
summer
PRESIDENT BARACK OBAMA IS URGING MEMBERS TO
Named Entity Recognition
???
7This is a Pervasive Problem
- Extracting regulatory pathways from online
bioinformatics journals using a parser trained on
the WSJ
- Finding faces in images of disaster victims using
a model trained on mug shot images
- Identifying RNA sequences that regulate gene
expression in a lab in Baltimore using a model
trained on data gathered in a lab in Germany
When things change in a way that's harmful, we'd
like to know!
8Data Streams Change Over Time
Sentiment classification from movie reviews
- Natural drift
- Users unaware of system limitations
9Detecting Genre Shift
Genre shift hurts system performance (accuracy)
- Two problems
- Detect changes in stream of numbers (A-distance)
- Convert document stream to stream of informative
numbers (margin)
10Detecting Genre Shift
Genre shift hurts system performance (accuracy)
- Measure accuracy directly
- Requires labeled examples!
- Look for changes in feature distributions
- Words become more/less common
- New words appear
11Measuring Changes in Streams: The A-Distance
A nonparametric, distribution-independent measure
of changes in univariate, real-valued data
streams (Kifer, Ben-David, and Gehrke, 2004)
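A rough sketch of how an A-distance check between two windows of a stream might look. It assumes the usual definition (twice the largest difference, over a set of intervals or "tiles", between the fractions of each window falling in a tile) and a detection threshold ε. The tiles, the sample values, and the threshold below are made up for illustration.

```python
from typing import Sequence, Tuple

def a_distance(win1: Sequence[float], win2: Sequence[float],
               tiles: Sequence[Tuple[float, float]]) -> float:
    """Empirical A-distance between two windows over half-open tiles [lo, hi)."""
    def frac(window: Sequence[float], lo: float, hi: float) -> float:
        # Fraction of the window's points that land in the tile
        return sum(lo <= v < hi for v in window) / len(window)
    return 2.0 * max(abs(frac(win1, lo, hi) - frac(win2, lo, hi))
                     for lo, hi in tiles)

tiles = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]          # hypothetical tiling
reference = [0.5, 0.6, 1.2, 1.5, 2.1, 2.4, 0.1, 1.9]  # made-up values
same = list(reference)
shifted = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 0.6, 0.7]    # made-up values
epsilon = 0.5                                          # hypothetical threshold

print(a_distance(reference, same, tiles))
print(a_distance(reference, shifted, tiles) > epsilon)
```

A change is reported when the distance between a reference window and the most recent window exceeds ε.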
12Measuring Changes in Streams: The A-Distance
$> \epsilon$
13Measuring Changes in Streams: The A-Distance
$> \epsilon$
14Changes in Document Streams
President Barack Obama is urging members to
15Changes in Document Streams
Feature counts: Obama 4, embassy 1
President Barack Obama is urging members to
16Changes in Document Streams
Feature   X   W
Obama     4   1.6
embassy   1   0.1
President Barack Obama is urging members to
17Changes in Document Streams
Feature   X   W
Obama     4   1.6
embassy   1   0.1
President Barack Obama is urging members to
- $W \cdot X$ = margin
- sign of $W \cdot X$ is class label (+/-)
- magnitude of $W \cdot X$ is certainty in label
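As a small worked example, the margin for the slide's two features can be computed directly. This sketch reads the figure as feature values X (Obama 4, embassy 1) and weights W (Obama 1.6, embassy 0.1); the dictionary layout and the `margin` helper are illustrative assumptions.

```python
from typing import Dict

def margin(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Dot product W.X; features missing from the weight vector contribute zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

weights = {"Obama": 1.6, "embassy": 0.1}   # W, as read from the slide
features = {"Obama": 4, "embassy": 1}      # X, as read from the slide

m = margin(weights, features)
label = "+" if m >= 0 else "-"   # sign gives the class label
certainty = abs(m)               # magnitude gives the certainty
print(m, label, certainty)
```

No label is needed to compute any of these quantities, which is what makes margins usable on an unlabeled stream.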
18Why Margins?
- We have an easy way of producing them from
unlabeled examples!
- We want to track feature changes
- Margins are linear combinations of feature values
- Removing important features yields smaller
margins
- Only track features that matter; features with
zero (small) weight don't affect margin (much)
- Spoiler alert! Tracking margins works really
well for unsupervised detection of genre shifts.
19Accuracy vs. Margins
DVD to Electronics
20Accuracy vs. Margins
DVD to Electronics
Average in block
Average over last 100 instances
21Accuracy vs. Margins
DVD to Electronics
22Confidence Weighted Margins
- Margins can be viewed as a measure of confidence
- We detect when confidence in classifications
drops
- Confidence Weighted (CW) learning refines this
idea
- Gaussian distribution over weight vectors
- Mean of weight vector $\mu \in \mathbb{R}^N$
- Diagonal covariance matrix $\sigma \in \mathbb{R}^{N \times N}$
- Low variance → high confidence
- Normalized margin: $\mu \cdot x / (x^{\top} \sigma x)^{0.5}$
- Called VARIANCE in slides that follow
Example weights: $\mu = 1.6$ with $\sigma = 0.02$; $\mu = 0.1$ with $\sigma = 1.74$
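The normalized margin can be illustrated with the numbers on this slide. The sketch assumes that 0.02 and 1.74 are the diagonal entries of the covariance matrix for the weights 1.6 and 0.1, and it reuses the feature values 4 and 1 from the earlier margin example; both readings are assumptions about a figure that did not survive extraction.

```python
import math
from typing import Sequence

def normalized_margin(mu: Sequence[float], sigma_diag: Sequence[float],
                      x: Sequence[float]) -> float:
    """Mean margin mu.x divided by sqrt(x^T sigma x) for a diagonal sigma."""
    mean_margin = sum(m * v for m, v in zip(mu, x))
    variance = sum(s * v * v for s, v in zip(sigma_diag, x))
    if variance == 0.0:
        return 0.0  # no active features, so no evidence either way
    return mean_margin / math.sqrt(variance)

mu = [1.6, 0.1]            # mean weights (Obama, embassy)
sigma_diag = [0.02, 1.74]  # assumed diagonal covariance entries
x = [4.0, 1.0]             # assumed feature values

score = normalized_margin(mu, sigma_diag, x)
print(round(score, 3))
```

A feature whose weight has high variance inflates the denominator, so the same raw margin counts for less when it rests on weights the learner is unsure about.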
23Experiments
- Datasets
- Sentiment classification between domains (Blitzer
et al., 2007)
- DVDs, electronics, books, kitchen appliances
- Spam classification between users (Jiang and
Zhai, 2007)
- Named entity classification between genres (ACE
2005)
- News articles, broadcast news, telephone, blogs,
etc.
- Algorithms
- Baselines: SVM, MIRA, CW
- Our method: VARIANCE
24Experiments
- Simulated domain shifts between each pair of
genres
- 38 pairs, 10 trials each with different random
instance orderings
- 500 source examples
- 1500 target examples
- False change
- 11 datasets with no shift, 10 trials with
different random instance orderings
- If no shift found then detection recorded as end
of target examples when computing averages
25Comparing Algorithms
26SVM vs. VARIANCE
27SVM vs. VARIANCE
28Summary of Results Thus Far
- VARIANCE detected shifts faster than:
- SVM: 34 times out of 38
- MIRA: 26 times out of 38
- CW: 27 times out of 38
29Gradual Shifts
30What if you have labels?
- STEPD: a Statistical Test of Equal Proportions to
Detect concept drift (Nishida and Yamauchi, 2007)
- Monitors accuracy of classifier from stream of
labeled examples
- Parameters: window size W and threshold $\alpha$
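To make the idea concrete, here is a minimal sketch of a test of equal proportions on classifier accuracy. It compares accuracy on the most recent W labeled examples with accuracy on the older examples, using a two-proportion z-test with continuity correction. This is an illustration in the spirit of STEPD, not a reproduction of the published algorithm; the function name, the counts, and the threshold 0.05 are assumptions.

```python
import math

def equal_proportions_p_value(correct_old: int, n_old: int,
                              correct_recent: int, n_recent: int) -> float:
    """Two-sided p-value for 'older and recent accuracy are equal'."""
    p_hat = (correct_old + correct_recent) / (n_old + n_recent)
    spread = p_hat * (1 - p_hat) * (1 / n_old + 1 / n_recent)
    if spread == 0.0:
        return 1.0  # all correct or all wrong in both windows: no evidence of drift
    diff = abs(correct_old / n_old - correct_recent / n_recent)
    correction = 0.5 * (1 / n_old + 1 / n_recent)
    z = max(diff - correction, 0.0) / math.sqrt(spread)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

alpha = 0.05  # hypothetical threshold
# Hypothetical counts: 180/200 correct earlier, then 15/30 or 27/30 in the last W = 30
p_drift = equal_proportions_p_value(180, 200, 15, 30)
p_stable = equal_proportions_p_value(180, 200, 27, 30)
print(p_drift < alpha, p_stable < alpha)
```

Unlike the margin-based detector, this check cannot run without labels for the stream.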
31Comparison to STEPD
32What about false positives?
33The A-Distance: Choosing Parameters
P
$> \epsilon$
34The A-Distance: Choosing Parameters
P
$> \epsilon$
35The A-Distance: Choosing Parameters
- A-distance paper gives bounds on FPs and FNs
- Bounds depend on $n$ and $\epsilon$
- Bounds do not depend on tiling!
- So loose as to be meaningless
- No guidance on how to choose tiling
- What if tiles lie outside support of data?
36Better Bounds
- $P_A$: true probability of a point falling in tile
A
- $h$: number of points that actually fell in A
- $p_A = h/n$: ML estimate of $P_A$
- Define $P'_A$, $h'$, and $p'_A$ for second window
- Suppose $P_A = P'_A$, then any change detected is a
false positive
What is the probability that $|p_A - p'_A| > \epsilon/2$?
$> \epsilon$
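The question on this slide can be explored by simulation. The sketch below draws two windows of n points from the same tile probability and counts how often their estimates differ by more than ε/2, which would be a false positive. It is a Monte Carlo illustration, not the analytic bound from the talk; the tile probability 0.3, the two ε values, and the trial count are made up.

```python
import random

def false_positive_rate(p_tile: float, n: int, epsilon: float,
                        trials: int = 2000, seed: int = 0) -> float:
    """Estimate P(|p_A - p'_A| > epsilon / 2) when both windows share P_A."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h1 = sum(rng.random() < p_tile for _ in range(n))  # points in A, window 1
        h2 = sum(rng.random() < p_tile for _ in range(n))  # points in A, window 2
        if abs(h1 / n - h2 / n) > epsilon / 2:
            hits += 1
    return hits / trials

rate_loose = false_positive_rate(p_tile=0.3, n=200, epsilon=0.05)
rate_strict = false_positive_rate(p_tile=0.3, n=200, epsilon=0.3)
print(rate_loose, rate_strict)
```

Raising ε lowers the false positive rate at the cost of slower or missed detections, which is the trade-off the parameter choice has to balance.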
37Posterior Over PA
- B(a, b) is the Beta function over a + b Bernoulli
trials
- a trials have one outcome (point lands in tile A)
- b trials have the other (point lands in some
other tile)
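The posterior formula itself did not survive on this slide. As a hedged illustration, assuming a uniform prior over $P_A$, observing a points inside tile A and b points outside gives a Beta(a + 1, b + 1) posterior. The function name and the counts below are hypothetical.

```python
import math

def beta_posterior_pdf(p: float, a: int, b: int) -> float:
    """Density of Beta(a + 1, b + 1) at p (uniform prior, a hits, b misses)."""
    if not 0.0 < p < 1.0:
        return 0.0
    # log of 1 / B(a + 1, b + 1), computed with log-gamma for numerical stability
    log_norm = math.lgamma(a + b + 2) - math.lgamma(a + 1) - math.lgamma(b + 1)
    return math.exp(log_norm + a * math.log(p) + b * math.log(1.0 - p))

a, b = 60, 140  # hypothetical counts: 60 of 200 points land in tile A
steps = 10000
# Midpoint-rule check that the density integrates to 1 over (0, 1)
area = sum(beta_posterior_pdf((i + 0.5) / steps, a, b) for i in range(steps)) / steps
print(round(area, 6))
```

The density peaks at the ML estimate a / (a + b) and narrows as the window size grows.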
38False Positives: Two Cases
39Don't worry, I'm not going to explain this (much)
40Probability of a FP (n = 200)
41Probability of FN
42Minimizing Expected Loss
43Moving Forward
44Genre Shift Fix
told that John Paul Stevens is retiring this
summer
PRESIDENT BARACK OBAMA IS URGING MEMBERS TO
Named Entity Recognition
45Genre Shift Fix
told that John Paul Stevens is retiring this
summer
PRESIDENT BARACK OBAMA IS URGING MEMBERS TO
President Barack Obama is urging members to
Named Entity Recognition
46Conclusion
- Changes in margins convey useful information
about changes in classification accuracy
- No need for labeled examples!
- The A-distance applied to margin streams finds
genre shifts with few false positives/negatives
- Confidence weighted margins normalized by
variance detect shifts faster than SVM, MIRA, or
(non-normalized) CW margins
- Our approach even works with gradual shifts and
compares favorably to shift detectors that use
labeled examples
47Thank you!