Domain Adaptation in Natural Language Processing - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Domain Adaptation in Natural Language Processing
  • Jing Jiang
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign

2
Textual Data in the Information Age
  • Contains much useful information
  • E.g. >85% of corporate data is stored as text
  • Hard to handle
  • Large amounts: e.g., by 2002, 2.5 billion documents
    on the surface Web, growing by 7.3 million per day
  • Diversity: emails, news, digital libraries, Web
    logs, etc.
  • Unstructured, vs. relational databases

How to manage textual data?
3
  • Information retrieval to rank documents based on
    relevance to keyword queries
  • Not always satisfactory
  • More sophisticated services desired

4
Automatic Text Summarization
5
Question Answering
6
Information Extraction
7
Beyond Information Retrieval
  • Automatic text summarization
  • Question answering
  • Information extraction
  • Sentiment analysis
  • Machine translation
  • Etc.

All of these rely on Natural Language Processing
(NLP) techniques to deeply understand and analyze
text
8
Typical NLP Tasks
  • Larry Page was Google's founding CEO
  • Part-of-speech tagging
  • Larry/noun Page/noun was/verb Google/noun
    's/possessive-ending founding/adjective CEO/noun
  • Chunking
  • [NP Larry Page] [V was] [NP Google's founding
    CEO]
  • Named entity recognition
  • [person Larry Page] was [organization Google]'s
    founding CEO
  • Relation extraction
  • Founder(Larry Page, Google)
  • Word sense disambiguation
  • Larry Page vs. Page 81 (person name vs. page number)

State-of-the-art solution: supervised machine
learning
9
Supervised Learning for NLP
representative corpus (WSJ articles)
→ human annotation
→ POS-tagged WSJ articles:
Larry/NNP Page/NNP was/VBD Google/NNP 's/POS
founding/JJ CEO/NN
→ training with a Standard Supervised Learning
Algorithm
→ trained POS tagger for part-of-speech tagging
on news articles
10
In Reality
✗ human annotation is expensive
representative corpus (MEDLINE articles)
→ human annotation (too expensive)
→ POS-tagged MEDLINE articles:
We/PRP analyzed/VBD the/DT mutations/NNS of/IN
the/DT H-ras/NN genes/NNS
Instead, only POS-tagged WSJ articles are
available for training the Standard Supervised
Learning Algorithm, yielding a POS tagger that
must do part-of-speech tagging on biomedical
articles
11
Many Other Examples
  • Named entity recognition
  • News articles → personal blogs
  • Organism A → organism B
  • Spam filtering
  • Public email collection → personal inboxes
  • Sentiment analysis of product reviews (positive
    vs. negative)
  • Movies → books
  • Cell phones → digital cameras

What goes wrong in this non-standard setting with
a domain difference?
12
Domain Difference → Performance Degradation
ideal setting: POS tagger trained and tested on
MEDLINE: 96 (accuracy)
realistic setting: POS tagger trained on WSJ,
tested on MEDLINE: 86
13
Another Example
ideal setting: gene name recognizer trained and
tested on the same domain: 54.1
realistic setting: gene name recognizer trained
and tested on different domains: 28.1
14
Domain Adaptation
source domain: labeled data
target domain: labeled and/or unlabeled data
Goal: to design learning algorithms that are
aware of the domain difference and exploit all
available data to adapt to the target domain
→ Domain Adaptive Learning Algorithm
15
With Domain Adaptation Techniques
standard learning: gene name recognizer trained
on fly and mouse data, tested on yeast: 63.3
domain adaptive learning: same setup: 75.9
16
Roadmap
  • What is domain adaptation in NLP?
  • Our work
  • Overview
  • Instance weighting
  • Feature selection
  • Summary and future work

17
Overview
Source Domain
Target Domain
18
Ideal Goal
Source Domain
Target Domain
19
Standard Supervised Learning
Source Domain
Target Domain
20
Standard Semi-Supervised Learning
Source Domain
Target Domain
21
Idea 1: Generalization
Source Domain
Target Domain
22
Idea 2: Adaptation
Source Domain
Target Domain
23
Source Domain
Target Domain
How can we formally formulate these ideas?
24
Instance Weighting
instance space (each point represents an observed
instance)
Source Domain
Target Domain
to find appropriate weights for different
instances
25
Feature Selection
feature space (each point represents a useful
feature)
Source Domain
Target Domain
to separate generalizable features from
domain-specific features
26
Roadmap
  • What is domain adaptation in NLP?
  • Our work
  • Overview
  • Instance weighting
  • Feature selection
  • Summary and future work

27
Observation
source domain
target domain
28
Observation
source domain
target domain
29
Analysis of Domain Difference
x: observed instance; y: class label (to be
predicted)
p(x, y) = p(x) p(y|x)
labeling difference: p_s(y|x) ≠ p_t(y|x)
→ labeling adaptation
instance difference: p_s(x) ≠ p_t(x)
→ instance adaptation
30
Labeling Adaptation
source domain
target domain
p_t(y|x) ≠ p_s(y|x)
remove/demote instances
32
Instance Adaptation (p_t(x) < p_s(x))
source domain
target domain
p_t(x) < p_s(x)
remove/demote instances
34
Instance Adaptation (p_t(x) > p_s(x))
source domain
target domain
p_t(x) > p_s(x)
promote instances
36
Instance Adaptation (p_t(x) > p_s(x))
source domain
target domain
p_t(x) > p_s(x)
  • Target domain instances are useful

37
Empirical Risk Minimization with Three Sets of
Instances
Three sets: Ds (labeled source), Dt,l (labeled
target), Dt,u (unlabeled target)
Choose a loss function; since the expected loss
cannot be computed, use the empirical loss over
the available instances in its place to find the
optimal classification model
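The combination of the three instance sets can be sketched in code. This is an illustrative sketch, not the talk's exact formulation: the per-set weights `alpha_s`, `alpha_tl`, `alpha_tu` and the logistic loss are assumptions made for the example.

```python
import math

# A minimal sketch of empirical risk minimization over three instance
# sets: labeled source data Ds, labeled target data Dtl, and unlabeled
# target data Dtu whose labels are predicted. Each (x, y) pair has a
# feature vector x and a label y in {0, 1}; alpha_* are assumed weights.

def log_loss(w, x, y):
    """Logistic loss of a single instance under weight vector w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip for numerical safety
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def weighted_empirical_risk(w, Ds, Dtl, Dtu,
                            alpha_s=1.0, alpha_tl=1.0, alpha_tu=0.5):
    """Weighted sum of the average empirical losses over the three sets."""
    risk = 0.0
    for data, alpha in ((Ds, alpha_s), (Dtl, alpha_tl), (Dtu, alpha_tu)):
        if data:
            risk += alpha * sum(log_loss(w, x, y) for x, y in data) / len(data)
    return risk
```

Setting the three alphas recovers the standard methods as special cases: alpha_tl = alpha_tu = 0 is supervised learning on the source only, while raising alpha_tl or alpha_tu promotes target instances.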
38
Using Ds
For each x ∈ Ds, reweight by the instance
difference (hard to estimate for high-dimensional
data) and the labeling difference (needs labeled
target data)
39
Using Dt,l
For each x ∈ Dt,l, the estimation is not accurate
due to the small sample size
40
Using Dt,u
For each x ∈ Dt,u, use predicted labels
(bootstrapping)
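The bootstrapping step can be sketched as a small self-training loop. This is a generic illustration, not the talk's exact algorithm: the centroid classifier and the single-shot promotion of all unlabeled instances are simplifying assumptions.

```python
# A minimal self-training (bootstrapping) sketch with a toy nearest-
# centroid classifier. Ds and Dtl are lists of (x, y) pairs; Dtu is a
# list of unlabeled feature vectors whose labels get predicted.

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train(labeled):
    """Return one centroid per class from (x, y) pairs."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_class.items()}

def predict(model, x):
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda y: dist2(model[y], x))

def bootstrap(Ds, Dtl, Dtu, rounds=3):
    """Iteratively add unlabeled target instances with predicted labels."""
    labeled = list(Ds) + list(Dtl)
    unlabeled = list(Dtu)
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        labeled += [(x, predict(model, x)) for x in unlabeled]
        unlabeled = []
        model = train(labeled)
    return model
```

A more faithful variant would promote only the most confident predictions each round; the loop structure stays the same.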
41
Combined Framework
a flexible setup covering both standard methods
and new domain adaptive methods
42
Experiments
  • NLP tasks
  • POS tagging: WSJ (Penn TreeBank) → Oncology
    (biomedical) text (Penn BioIE)
  • NE type classification: newswire → conversational
    telephone speech (CTS) and web-log (WL) (ACE
    2005)
  • Spam filtering: public email collection →
    personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
  • Three heuristics to partially explore the
    parameter settings

43
Instance Pruning: removing misleading instances
from Ds
POS
NE Type
Spam
44
Dt,l with Larger Weights
POS
NE Type
Dt,l is very useful; promoting Dt,l is even more useful
Spam
45
Bootstrapping with Larger Weights (until Ds and
Dt,u are balanced)
POS
NE Type
promoting target instances is useful, even with
predicted labels
Spam
46
Roadmap
  • What is domain adaptation in NLP?
  • Our work
  • Overview
  • Instance weighting
  • Feature selection
  • Summary and future work

47
Observation 1: Domain-specific features
wingless daughterless eyeless apexless
48
Observation 1: Domain-specific features
wingless daughterless eyeless apexless
  • describing phenotypes in fly gene nomenclature
  • the feature "-less" is useful for this organism

CD38 PABPC5
Is the feature still useful for other organisms?
No!
49
Observation 2: Generalizable features
50
Observation 2: Generalizable features
feature: "X be expressed"
51
Assume Multiple Source Domains
source domains
target domain
Labeled
Unlabeled
Domain Adaptive Learning Algorithm
52
Detour: Logistic Regression Classifiers
p binary features, e.g. "-less", "X be
expressed", "and wingless are expressed in"
feature vector x = (0 1 0 0 1 0 1 0)
weight vector wy = (0.2 4.5 5 -0.3 3.0 2.1 -0.9 0.4)
classification score: wyT x
53
Learning a Logistic Regression Classifier
feature vector x = (0 1 0 0 1 0 1 0)
weight vector wy = (0.2 4.5 5 -0.3 3.0 2.1 -0.9 0.4),
score wyT x
Objective: maximize the log likelihood of the
training data minus a regularization term that
penalizes large weights (controls model
complexity)
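The regularized objective can be written out in a few lines of code. This is a minimal binary-case sketch under standard assumptions (L2 penalty, logistic likelihood); the regularization strength `lam` is an assumed hyperparameter.

```python
import math

# Penalized log likelihood for binary logistic regression: the sum of
# per-instance log likelihoods minus an L2 penalty that keeps weights
# small and so controls model complexity.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def objective(w, data, lam=0.1):
    """Regularized log likelihood to be maximized; data is (x, y) pairs."""
    ll = 0.0
    for x, y in data:  # y in {0, 1}, x a feature vector
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip for numerical safety
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll - lam * sum(wi * wi for wi in w)
```

In practice this objective is maximized with gradient ascent or a quasi-Newton method; only the objective itself matters for the decomposition on the following slides.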
54
Generalizable Features in Weight Vectors
K source domains D1, D2, …, DK with weight
vectors w1, w2, …, wK:
w1 = (0.2 4.5 5   -0.3 3.0 2.1 -0.9  0.4)
w2 = (3.2 0.5 4.5 -0.1 3.5 0.1 -1.0 -0.2)
wK = (0.1 0.7 4.2  0.1 3.2 1.7  0.1  0.3)
Generalizable features receive consistently large
weights across all domains (e.g. the 3rd and 5th
components above); domain-specific features do
not.
55
Decomposition of wk for Each Source Domain
wk = AT v + uk
v: weights shared by all domains, covering the
generalizable features, e.g. (4.6 3.2 3.6)
uk: domain-specific weights, e.g.
(0.2 4.5 0.4 -0.3 -0.2 2.1 -0.9 0.4)
A: a 0/1 matrix that selects the generalizable
features out of the full feature space
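The decomposition wk = AT v + uk can be made concrete with a tiny example. The selected feature indices and numbers below are illustrative (chosen to echo the slide's vectors), not taken from the talk's actual model.

```python
# A minimal sketch of the decomposition w_k = A^T v + u_k, where A is a
# 0/1 selection matrix mapping the few generalizable features into the
# full p-dimensional feature space.

def select_matrix(generalizable, p):
    """Rows of A: one-hot indicators of the generalizable feature indices."""
    return [[1 if j == i else 0 for j in range(p)] for i in generalizable]

def compose(A, v, u):
    """Reconstruct w = A^T v + u."""
    w = list(u)
    for row, vi in zip(A, v):
        for j in range(len(u)):
            w[j] += row[j] * vi
    return w

A = select_matrix([2, 4], 8)   # assume features 3 and 5 (0-based 2, 4) generalize
v = [4.6, 3.2]                 # shared weights for the generalizable features
u = [0.2, 4.5, 0.4, -0.3, -0.2, 2.1, -0.9, 0.4]  # domain-specific weights
w = compose(A, v, u)           # full weight vector, large in positions 3 and 5
```

The key point is that v is estimated once from all domains, while each uk is free to fit its own domain.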
56
Framework for Generalization
Fix A; optimize wk: maximize the log likelihood
of labeled data from the K source domains, with a
regularization term where λs >> 1 penalizes
domain-specific features
57
Framework for Adaptation
Fix A; optimize: maximize the log likelihood of
target-domain examples with predicted labels,
with λt = 1 << λs to pick up domain-specific
features in the target domain
58
How to Find A? (1)
  • Joint optimization

59
How to Find A? (2)
  • Domain cross validation
  • Idea: train on (K - 1) source domains and
    validate on the held-out source domain
  • Approximation
  • wf,k: weight for feature f learned from domain k
  • wf,-k: weight for feature f learned from the
    other domains
  • rank features by the product of wf,k and wf,-k

60
Intuition for Domain Cross Validation
Hold out one domain, e.g. Dk (fly), from the
domains D1, D2, …, Dk; let w1 be a feature's
weight learned on the held-out domain and w2 its
weight learned on the other domains.
A generalizable feature such as "expressed" gets
a large weight in both cases, so the product of
w1 and w2 is large; a domain-specific feature
such as "-less" gets a large weight only in the
fly domain, so the product is small.
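The domain-cross-validation heuristic can be sketched as a small ranking function. This is an illustrative implementation under the assumption that per-domain feature weights are already available as dictionaries; averaging the held-out weights is a simplification.

```python
# Rank features by summing, over held-out domains, the product of the
# weight learned on the held-out domain and the (average) weight learned
# on the remaining domains. Generalizable features score high.

def rank_features(weights_per_domain):
    """weights_per_domain: list of {feature: weight} dicts, one per domain."""
    features = set()
    for w in weights_per_domain:
        features.update(w)
    K = len(weights_per_domain)
    scores = {}
    for f in features:
        score = 0.0
        for k in range(K):
            held_out = weights_per_domain[k].get(f, 0.0)
            # average weight of f over the other K-1 domains
            others = sum(weights_per_domain[j].get(f, 0.0)
                         for j in range(K) if j != k) / (K - 1)
            score += held_out * others
        scores[f] = score
    # highest-scoring (most generalizable) features first
    return sorted(features, key=lambda f: scores[f], reverse=True)
```

The top of the ranking then defines the selection matrix A: the highest-scoring features are treated as generalizable.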
61
Experiments
  • Data set
  • BioCreative Challenge Task 1B
  • Gene/protein recognition
  • 3 organisms/domains: fly, mouse, and yeast
  • Experimental setup
  • 2 organisms for training, 1 for testing
  • F1 as the performance measure

62
Experiments Generalization
using generalizable features is effective
F = fly, M = mouse, Y = yeast
domain cross validation is more effective than
joint optimization
63
Experiments Adaptation
F = fly, M = mouse, Y = yeast
domain-adaptive bootstrapping is more effective
than regular bootstrapping
64
Related Work
  • Problem relatively new to NLP and ML communities
  • Most related work developed concurrently with our
    work

65
Roadmap
  • What is domain adaptation in NLP?
  • Our work
  • Overview
  • Instance weighting
  • Feature selection
  • Summary and future work

66
Summary
  • Domain adaptation is a critical, relatively new
    problem in natural language processing and
    machine learning
  • Contributions
  • First systematic formal analysis of domain
    adaptation
  • Two novel general frameworks, both shown to be
    effective
  • Potentially applicable to other classification
    problems outside of NLP
  • Future work
  • Domain difference measure
  • Unify two frameworks
  • Incorporate domain knowledge into adaptation
    process
  • Leverage domain adaptation to perform large-scale
    information extraction on scientific literature
    and on the Web

67
Information Extraction System
Entity Recognition
Relation Extraction
Domain Adaptive Learning
Intelligent Learning
Knowledge Resources Exploitation
Interactive Expert Supervision
Existing Knowledge Bases
Labeled Data from Related Domains
Domain Expert
68
Hypothesis Generation
Inference Engine
Applications
Pathway Construction
Knowledge Base Curation
Biomedical Literature (MEDLINE abstracts,
full-text articles, etc.)

Entity Recognition
DWnt-2 is expressed in somatic cells of the gonad
throughout development.
Relation Extraction
expression relations
Information Extraction System
Extracted Facts
69
Applications (cont.)
  • Similar ideas for Web text mining
  • Product reviews
  • Existing annotated reviews are limited (certain
    products from certain sources)
  • Large amount of semi-structured reviews from
    review websites
  • Unstructured reviews from personal blogs

70
Selected Publications
this talk
  • J. Jiang & C. Zhai. A two-stage approach to
    domain adaptation for statistical classifiers.
    In CIKM '07.
  • J. Jiang & C. Zhai. Instance weighting for
    domain adaptation in NLP. In ACL '07.
  • J. Jiang & C. Zhai. Exploiting domain structure
    for named entity recognition. In HLT-NAACL '06.
  • J. Jiang & C. Zhai. A systematic exploration of
    the feature space for relation extraction. In
    NAACL-HLT '07.
  • J. Jiang & C. Zhai. Extraction of coherent
    relevant passages using hidden Markov models.
    ACM Transactions on Information Systems (TOIS),
    Jul 2006.
  • J. Jiang & C. Zhai. An empirical study of
    tokenization strategies for biomedical
    information retrieval. Information Retrieval,
    Oct 2007.
  • X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai & B.
    Schatz. Generating semi-structured gene
    summaries from biomedical literature.
    Information Processing & Management, Nov 2007.
  • X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai & B.
    Schatz. Automatically generating gene summaries
    from biomedical literature. In PSB '06.

feature exploration for relation extraction
information retrieval
gene summarization