Title: Domain Adaptation in Natural Language Processing
1 Domain Adaptation in Natural Language Processing
- Jing Jiang
- Department of Computer Science
- University of Illinois at Urbana-Champaign
2 Textual Data in the Information Age
- Contains much useful information
  - E.g., >85% of corporate data is stored as text
- Hard to handle
  - Large amount: by 2002, 2.5 billion documents on the surface Web, growing by 7.3 million per day
  - Diversity: emails, news, digital libraries, Web logs, etc.
  - Unstructured, vs. relational databases
How to manage textual data?
3
- Information retrieval: ranks documents based on relevance to keyword queries
- Not always satisfactory
- More sophisticated services desired
4 Automatic Text Summarization
5 Question Answering
6 Information Extraction
7 Beyond Information Retrieval
- Automatic text summarization
- Question answering
- Information extraction
- Sentiment analysis
- Machine translation
- Etc.
All rely on Natural Language Processing (NLP) techniques to deeply understand and analyze text.
8 Typical NLP Tasks
- Example sentence: Larry Page was Google's founding CEO
- Part-of-speech tagging
  - Larry/noun Page/noun was/verb Google/noun 's/possessive-ending founding/adjective CEO/noun
- Chunking
  - [NP Larry Page] [V was] [NP Google's founding CEO]
- Named entity recognition
  - [person Larry Page] was [organization Google]'s founding CEO
- Relation extraction
  - Founder(Larry Page, Google)
- Word sense disambiguation
  - Larry Page vs. page 81
State-of-the-art solution: supervised machine learning
9 Supervised Learning for NLP
- Representative corpus: WSJ articles
- Human annotation → POS-tagged WSJ articles
  - Larry/NNP Page/NNP was/VBD Google/NNP 's/POS founding/ADJ CEO/NN
- Standard supervised learning algorithm (training) → trained POS tagger
- Target task: part-of-speech tagging on news articles
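The pipeline above (representative corpus → human annotation → training → trained tagger) can be sketched with a deliberately tiny supervised baseline: a most-frequent-tag tagger learned from a toy annotated sample. The mini-corpus below is an illustrative stand-in for the POS-tagged WSJ data, not part of the talk.

```python
from collections import Counter, defaultdict

# Toy "annotated corpus" standing in for POS-tagged WSJ articles.
tagged = [("Larry", "NNP"), ("Page", "NNP"), ("was", "VBD"),
          ("Google", "NNP"), ("'s", "POS"), ("founding", "JJ"), ("CEO", "NN"),
          ("the", "DT"), ("company", "NN"), ("was", "VBD"), ("growing", "VBG")]

def train(corpus):
    """Learn the most frequent tag per word, plus a global fallback tag."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word][tag] += 1
    model = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    fallback = Counter(t for _, t in corpus).most_common(1)[0][0]
    return model, fallback

def tag(model, fallback, words):
    """Tag each word with its most frequent training tag, else the fallback."""
    return [(w, model.get(w, fallback)) for w in words]

model, fallback = train(tagged)
print(tag(model, fallback, ["Google", "was", "growing"]))
# → [('Google', 'NNP'), ('was', 'VBD'), ('growing', 'VBG')]
```

Real taggers replace the lookup with a feature-based classifier, but the supervised recipe — annotated corpus in, trained model out — is the same.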
10 In Reality
- Human annotation is expensive (✗)
- Representative corpus: MEDLINE articles
- Desired: POS-tagged MEDLINE articles, e.g.
  - We/PRP analyzed/VBD the/DT mutations/NNS of/IN the/DT H-ras/NN genes/NNS
- Available: POS-tagged WSJ articles
- Standard supervised learning algorithm (training) → trained POS tagger
- Target task: part-of-speech tagging on biomedical articles
11 Many Other Examples
- Named entity recognition
  - News articles → personal blogs
  - Organism A → organism B
- Spam filtering
  - Public email collection → personal inboxes
- Sentiment analysis of product reviews (positive vs. negative)
  - Movies → books
  - Cell phones → digital cameras
What is the problem with this non-standard setting with domain difference?
12 Domain Difference → Performance Degradation
- Ideal setting: POS tagger trained on MEDLINE, tested on MEDLINE: 96
- Realistic setting: POS tagger trained on WSJ, tested on MEDLINE: 86
13 Another Example
- Ideal setting: gene name recognizer: 54.1
- Realistic setting: gene name recognizer: 28.1
14 Domain Adaptation
- Source domain: labeled
- Target domain: labeled and/or unlabeled
- Goal: to design learning algorithms that are aware of domain difference and exploit all available data to adapt to the target domain
- → Domain adaptive learning algorithm
15 With Domain Adaptation Techniques
- Standard learning: gene name recognizer trained on Yeast and Fly, tested on Mouse: 63.3
- Domain adaptive learning: gene name recognizer trained on Yeast and Fly, tested on Mouse: 75.9
16 Roadmap
- What is domain adaptation in NLP?
- Our work
- Overview
- Instance weighting
- Feature selection
- Summary and future work
17 Overview
(slides 17–23 illustrate the following settings with source-domain vs. target-domain diagrams)
18 Ideal Goal
19 Standard Supervised Learning
20 Standard Semi-Supervised Learning
21 Idea 1: Generalization
22 Idea 2: Adaptation
23 How to formally formulate the ideas?
24 Instance Weighting
- Instance space (each point represents an observed instance); source domain vs. target domain
- Goal: to find appropriate weights for different instances
25 Feature Selection
- Feature space (each point represents a useful feature); source domain vs. target domain
- Goal: to separate generalizable features from domain-specific features
26 Roadmap
- What is domain adaptation in NLP?
- Our work
- Overview
- Instance weighting
- Feature selection
- Summary and future work
27 Observation
(source domain vs. target domain diagram)
28 Observation
(source domain vs. target domain diagram)
29 Analysis of Domain Difference
- x: observed instance; y: class label (to be predicted)
- p(x, y) = p(x) p(y|x)
- Labeling difference: p_s(y|x) ≠ p_t(y|x) → labeling adaptation
- Instance difference: p_s(x) ≠ p_t(x) → instance adaptation
30–31 Labeling Adaptation
- Source domain vs. target domain
- Where p_t(y|x) ≠ p_s(y|x): remove/demote those source instances
32–33 Instance Adaptation (p_t(x) < p_s(x))
- Source domain vs. target domain
- Where p_t(x) < p_s(x): remove/demote those source instances
34–36 Instance Adaptation (p_t(x) > p_s(x))
- Source domain vs. target domain
- Where p_t(x) > p_s(x): promote those instances
- Target domain instances are useful
37 Empirical Risk Minimization with Three Sets of Instances
- Three sets of instances: D_s (labeled source), D_{t,l} (labeled target), D_{t,u} (unlabeled target)
- The optimal classification model minimizes the expected loss (under a loss function) on the target distribution
- Use empirical loss to replace expected loss
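Written out, the objective sketched on this slide takes the following shape (my reconstruction from the surrounding definitions; the exact notation of the original slides is not recoverable from this text):

```latex
% Optimal model: minimize expected loss under the target distribution p_t
\theta^* = \arg\min_{\theta} \sum_{y} \int_{\mathcal{X}} p_t(x, y)\,
           \mathrm{loss}(x, y, \theta)\, dx
% In practice, the expected loss is replaced by a weighted empirical loss
% over the three available instance sets D_s, D_{t,l}, D_{t,u}.
```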
38 Using D_s
- Empirical loss summed over x ∈ D_s
- Must account for the instance difference (hard for high-dimensional data)
- and for the labeling difference (needs labeled target data)
39 Using D_{t,l}
- Empirical loss summed over x ∈ D_{t,l}
- Small sample size → estimation not accurate
40 Using D_{t,u}
- Empirical loss summed over x ∈ D_{t,u}
- Use predicted labels (bootstrapping)
41 Combined Framework
- A flexible setup covering both standard methods and new domain adaptive methods
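The combined framework can be sketched as one weighted empirical risk over the three instance sets. Everything numeric below — the data, the per-set weights (the `alphas`), and the per-instance source weights — is invented for illustration; the talk's heuristics for setting these weights come on the next slides.

```python
import numpy as np

def log_loss(w, X, y):
    """Per-instance logistic loss for labels y in {-1, +1}."""
    return np.log1p(np.exp(-y * (X @ w)))

def combined_risk(w, Ds, Dtl, Dtu_X, Dtu_ypred, alphas, inst_w_s):
    """Weighted empirical risk over source data (Ds), labeled target data
    (Dtl), and unlabeled target data with predicted labels (Dtu)."""
    Xs, ys = Ds
    Xt, yt = Dtl
    a_s, a_tl, a_tu = alphas
    risk = a_s * np.mean(inst_w_s * log_loss(w, Xs, ys))   # weighted source set
    risk += a_tl * np.mean(log_loss(w, Xt, yt))            # small labeled target set
    risk += a_tu * np.mean(log_loss(w, Dtu_X, Dtu_ypred))  # bootstrapped target set
    return risk

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(20, 3)), rng.choice([-1, 1], 20)
Xt, yt = rng.normal(size=(4, 3)), rng.choice([-1, 1], 4)
Xu, yu = rng.normal(size=(10, 3)), rng.choice([-1, 1], 10)  # yu: predicted labels
w = np.zeros(3)
r = combined_risk(w, (Xs, ys), (Xt, yt), Xu, yu,
                  alphas=(1.0, 3.0, 1.0), inst_w_s=np.ones(20))
print(round(r, 4))  # with w = 0 every per-instance loss is log(2): 5*log(2) ≈ 3.4657
```

Setting individual `alphas` or `inst_w_s` entries to zero recovers the standard special cases (supervised learning on D_s alone, bootstrapping, etc.).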
42 Experiments
- NLP tasks
  - POS tagging: WSJ (Penn TreeBank) → Oncology (biomedical) text (Penn BioIE)
  - NE type classification: newswire → conversational telephone speech (CTS) and Web logs (WL) (ACE 2005)
  - Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
- Three heuristics to partially explore the parameter settings
43 Instance Pruning: removing misleading instances from D_s
(result charts for POS, NE Type, and Spam)
44 D_{t,l} with Larger Weights
- D_{t,l} is very useful; promoting D_{t,l} is even more useful
(result charts for POS, NE Type, and Spam)
45 Bootstrapping with Larger Weights (until D_s and D_{t,u} are balanced)
- Promoting target instances is useful, even with predicted labels
(result charts for POS, NE Type, and Spam)
46 Roadmap
- What is domain adaptation in NLP?
- Our work
- Overview
- Instance weighting
- Feature selection
- Summary and future work
47–48 Observation 1: Domain-Specific Features
- wingless, daughterless, eyeless, apexless
  - describe phenotype in fly gene nomenclature
  - the suffix feature "-less" is useful for this organism
- CD38, PABPC5
  - Is the feature still useful for other organisms? No!
49–50 Observation 2: Generalizable Features
- Feature "X be expressed"
51 Assume Multiple Source Domains
- Source domains: labeled; target domain: unlabeled
- → Domain adaptive learning algorithm
52 Detour: Logistic Regression Classifiers
- p binary features, e.g. "-less" and "X be expressed" (as in "... and wingless are expressed in ...")
- Instance x = (0, 1, 0, 0, 1, 0, 1, 0)
- Class weight vector w_y = (0.2, 4.5, 5, -0.3, 3.0, 2.1, -0.9, 0.4)
- Classification score: w_y^T x
53 Learning a Logistic Regression Classifier
- Objective: log likelihood of the training data (via the scores w_y^T x)
- minus a regularization term: penalize large weights, control model complexity
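The objective on this slide — training-data log likelihood minus a term penalizing large weights — can be sketched as a small gradient-ascent loop. This is the binary case, and the toy data below is invented for illustration.

```python
import numpy as np

def train_logreg(X, y, lam=0.1, lr=0.5, steps=500):
    """Maximize log-likelihood minus (lam/2)*||w||^2 for binary labels y in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # P(y = 1 | x) under the current weights
        grad = X.T @ (y - p) - lam * w      # gradient of the regularized log likelihood
        w += lr * grad / len(y)             # ascent step
    return w

# Toy data: feature 0 predicts the label, feature 1 is noise.
X = np.array([[1.0, 0.3], [0.9, -0.2], [-1.0, 0.1], [-0.8, -0.4]])
y = np.array([1, 1, 0, 0])
w = train_logreg(X, y)
print(w[0] > 0)  # the predictive feature receives a positive weight
```

Larger `lam` shrinks all weights toward zero; the feature-selection frameworks on the following slides work by applying different regularization strengths to different groups of weights.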
54 Generalizable Features in Weight Vectors
- K source domains D_1, D_2, ..., D_K with learned weight vectors:
  - w_1 = (0.2, 4.5, 5, -0.3, 3.0, 2.1, -0.9, 0.4)
  - w_2 = (3.2, 0.5, 4.5, -0.1, 3.5, 0.1, -1.0, -0.2)
  - w_K = (0.1, 0.7, 4.2, 0.1, 3.2, 1.7, 0.1, 0.3)
- Features weighted consistently highly across domains are generalizable; the rest are domain-specific
55 Decomposition of w_k for Each Source Domain
- w_k = A^T v + u_k
  - v: weights of the generalizable features, shared by all domains
  - u_k: domain-specific weights
  - A: a matrix that selects generalizable features
(numeric example: w_k = (0.2, 4.5, 5, -0.3, 3.0, 2.1, -0.9, 0.4) splits into a shared part on the generalizable coordinates and a domain-specific remainder)
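A toy illustration of the decomposition: A selects which coordinates are generalizable, v holds their shared weights, and u_k holds the domain-specific remainder. The dimensions and numbers below are invented; in the actual framework A, v, and the u_k are learned from data.

```python
import numpy as np

p, g = 5, 2                     # p features, g of them generalizable
A = np.zeros((g, p))            # selection matrix: one row per generalizable feature
A[0, 2] = 1.0                   # feature 2 is generalizable
A[1, 4] = 1.0                   # feature 4 is generalizable
v = np.array([4.6, 3.2])        # shared weights for the selected features
u_k = np.array([0.2, 4.5, 0.0, -0.3, 0.0])  # domain-specific part (zero on shared dims)

w_k = A.T @ v + u_k             # the decomposition w_k = A^T v + u_k
print(w_k)                      # → [ 0.2  4.5  4.6 -0.3  3.2]
```

Because A^T v only touches the selected coordinates, regularizing the u_k strongly (λ_s ≫ 1, slide 56) forces those coordinates to be explained by the shared v.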
56 Framework for Generalization
- Fix A, optimize w_k
- Objective: log likelihood of labeled data from the K source domains, minus a regularization term
- λ_s ≫ 1 to penalize domain-specific features
57 Framework for Adaptation
- Fix A, optimize
- Objective: log likelihood of target domain examples with predicted labels
- λ_t ≪ λ_s (λ_t ≈ 1) to pick up domain-specific features in the target domain
58 How to Find A? (1)
59 How to Find A? (2)
- Domain cross validation
  - Idea: train on (K − 1) source domains and validate on the held-out source domain
- Approximation
  - w_f^k: weight for feature f learned from domain k
  - w̄_f^k: weight for feature f learned from the other domains
  - Rank features by the agreement between w_f^k and w̄_f^k
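The approximation can be sketched as follows: for every feature, compare the weight learned inside each held-out domain with the weight learned from the other domains, and score by their product (the scoring shown on the next slide). The per-domain weights and the exact averaging used here are invented for illustration.

```python
# weights[k][f]: weight of feature f learned from domain k alone (invented numbers)
weights = {
    "yeast": {"expressed": 1.4, "-less": 0.02},
    "mouse": {"expressed": 1.6, "-less": 0.08},
    "fly":   {"expressed": 1.2, "-less": 2.0},
}

def rank_by_domain_cv(weights):
    """Score each feature by averaging, over held-out domains k, the product of
    its weight inside k and its mean weight in the remaining domains."""
    feats = next(iter(weights.values())).keys()
    scores = {}
    for f in feats:
        prods = []
        for k in weights:
            inside = weights[k][f]
            others = [weights[j][f] for j in weights if j != k]
            prods.append(inside * sum(others) / len(others))
        scores[f] = sum(prods) / len(prods)
    return sorted(scores, key=scores.get, reverse=True)

print(rank_by_domain_cv(weights))  # "expressed" ranks above "-less"
```

Features like "-less" that carry weight in only one domain score low, so they are excluded from the generalizable set that A selects.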
60 Intuition for Domain Cross Validation
- Domains D_1, D_2, ..., D_{k-1} held in; D_k (fly) held out
- w_1 (learned from D_1 ... D_{k-1}): expressed = 1.5, -less = 0.05
- w_2 (learned from D_k): -less = 2.0, expressed = 1.2
- Product of w_1 and w_2: expressed = 1.8, -less = 0.1
- "expressed" generalizes across domains; "-less" does not
61 Experiments
- Data set
- BioCreative Challenge Task 1B
- Gene/protein recognition
- 3 organisms/domains: fly, mouse, and yeast
- Experimental setup
- 2 organisms for training, 1 for testing
- F1 as performance measure
62 Experiments: Generalization
- F = fly, M = mouse, Y = yeast
- Using generalizable features is effective
- Domain cross validation is more effective than joint optimization
63 Experiments: Adaptation
- F = fly, M = mouse, Y = yeast
- Domain-adaptive bootstrapping is more effective than regular bootstrapping
64 Related Work
- The problem is relatively new to the NLP and ML communities
- Most related work was developed concurrently with ours
65 Roadmap
- What is domain adaptation in NLP?
- Our work
- Overview
- Instance weighting
- Feature selection
- Summary and future work
66 Summary
- Domain adaptation is a critical novel problem in natural language processing and machine learning
- Contributions
  - First systematic formal analysis of domain adaptation
  - Two novel general frameworks, both shown to be effective
  - Potentially applicable to classification problems outside of NLP
- Future work
  - A measure of domain difference
  - Unifying the two frameworks
  - Incorporating domain knowledge into the adaptation process
  - Leveraging domain adaptation to perform large-scale information extraction on scientific literature and on the Web
67 Information Extraction System
- Core components: entity recognition and relation extraction
- Intelligent learning
  - Domain adaptive learning (labeled data from related domains)
  - Knowledge resources exploitation (existing knowledge bases)
  - Interactive expert supervision (domain expert)
68 Applications
- Input: biomedical literature (MEDLINE abstracts, full-text articles, etc.)
- Information extraction system
  - Entity recognition: "DWnt-2 is expressed in somatic cells of the gonad throughout development."
  - Relation extraction: expression relations
- Extracted facts feed an inference engine for hypothesis generation
- Applications: pathway construction, knowledge base curation
69 Applications (cont.)
- Similar ideas apply to Web text mining
- Product reviews
  - Existing annotated reviews are limited (certain products from certain sources)
  - Large amounts of semi-structured reviews on review websites
  - Unstructured reviews in personal blogs
70 Selected Publications
- This talk:
  - J. Jiang & C. Zhai. A two-stage approach to domain adaptation for statistical classifiers. In CIKM'07.
  - J. Jiang & C. Zhai. Instance weighting for domain adaptation in NLP. In ACL'07.
  - J. Jiang & C. Zhai. Exploiting domain structure for named entity recognition. In HLT-NAACL'06.
- Feature exploration for relation extraction:
  - J. Jiang & C. Zhai. A systematic exploration of the feature space for relation extraction. In NAACL-HLT'07.
- Information retrieval:
  - J. Jiang & C. Zhai. Extraction of coherent relevant passages using hidden Markov models. ACM Transactions on Information Systems (TOIS), Jul 2006.
  - J. Jiang & C. Zhai. An empirical study of tokenization strategies for biomedical information retrieval. Information Retrieval, Oct 2007.
- Gene summarization:
  - X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai & B. Schatz. Generating semi-structured gene summaries from biomedical literature. Information Processing & Management, Nov 2007.
  - X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai & B. Schatz. Automatically generating gene summaries from biomedical literature. In PSB'06.