The Necessity of Combining Adaptation Methods
Cognitive Computation Group, University of Illinois

Take home message
It is necessary to combine labeled and unlabeled adaptation frameworks! Most works focus on only one aspect. We argue this is not enough because:
1. Mutual benefit: we analyze these two types of frameworks and find that they address different adaptation issues.
2. Complex interaction: these two types of frameworks are not independent.

Contributions
  • Propose a theoretical analysis of the "Frustratingly Easy" (FE) framework (Daumé III, 2007)
  • Demonstrate the complex interaction between unlabeled and labeled approaches (via artificial experiments)
  • Simple Source+Target (ST) training plus cluster-like features is often the best approach! (More details later)
  • State-of-the-art adaptation performance!

Domain Adaptation
While recent advances in statistical modeling for natural language processing are exciting, domain adaptation remains a major challenge. It is widely known that a classifier trained on one domain (e.g., news) usually performs poorly on a different domain (e.g., medical text). The inability of current statistical models to handle multiple domains is one of the key obstacles hindering the progress of NLP.
Experimental Results
Named Entity Recognition
The goal of this adaptation experiment is to maximize performance on the MUC7 test data, using the CoNLL training data and (some) MUC7 labeled data. As an unlabeled adaptation method to address feature sparsity, we add cluster-like features, based on the gazetteers and word-clustering resources used by Ratinov and Roth (2009), to bridge the source and target domains.
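A minimal sketch of what token-level cluster-like features could look like, assuming a word-cluster resource that maps each lowercase word to a hierarchical bit string, plus named gazetteer word lists. The prefix lengths, the feature-name format, and the function `cluster_features` are hypothetical illustrations, not the exact feature set of Ratinov and Roth (2009). The point is that the features are identical in both domains, so weights learned on source words can fire on unseen target words.

```python
from typing import Dict, List, Set

# Hypothetical prefix lengths; shorter prefixes correspond to coarser clusters.
PREFIX_LENGTHS = (4, 6, 10, 20)


def cluster_features(word: str,
                     clusters: Dict[str, str],
                     gazetteers: Dict[str, Set[str]]) -> List[str]:
    """Return domain-independent features for a single token.

    clusters:   lowercase word -> cluster bit string (assumed resource format)
    gazetteers: list name -> set of lowercase entries (assumed format)
    """
    feats: List[str] = []
    key = word.lower()
    path = clusters.get(key)
    if path is not None:
        # Words in the same coarse cluster share these prefix features,
        # even if one of them never occurs in the source training data.
        for k in PREFIX_LENGTHS:
            feats.append(f"clus{k}={path[:k]}")
    # One indicator feature per gazetteer that contains the word.
    for name in sorted(gazetteers):
        if key in gazetteers[name]:
            feats.append(f"gaz={name}")
    return feats


if __name__ == "__main__":
    clusters = {"boston": "1011001110", "chicago": "1011001101"}
    gazetteers = {"location": {"boston", "chicago"}}
    print(cluster_features("Boston", clusters, gazetteers))
    print(cluster_features("Chicago", clusters, gazetteers))
```

In this toy example "Boston" and "Chicago" share the `clus4`, `clus6`, and `gaz=location` features, so a weight learned for one city can transfer to the other.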
Current Approaches
Focuses on P(X) (unlabeled): This type of adaptation algorithm attempts to resolve the difference between the feature-space statistics of the two domains. While many different techniques have been proposed, the common goal of these algorithms is to find (or append) a better shared representation that brings the source and target domains closer. These algorithms often do not use labeled examples in the target domain. The works of Blitzer et al. (2006) and Huang and Yates (2009) belong to this category.
Focuses on P(Y|X) (labeled): These adaptation algorithms assume that a small amount of labeled data exists for the target domain. Instead of training two weight vectors independently (one for the source domain and one for the target domain), these algorithms try to relate the source and target weight vectors, often through a specially designed regularization term. The works of Chelba and Acero (2004), Daumé III (2007), and Finkel and Manning (2009) belong to this category.
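To make "relating the source and target weight vectors" concrete, here is the standard derivation for FE (Daumé III, 2007) with an L2-regularized linear model. The symbols $\Phi_s, \Phi_t, w_0, w_s, w_t, v_s, v_t, u_0, \lambda, c$ are our own notation, not the poster's, and this is a sketch rather than the poster's theoretical analysis. The last line shows that scaling the shared copy of the features by $c$ (in both domains) shrinks the penalty on the shared component by a factor of $c^2$.

```latex
\begin{align*}
  % FE augmentation: (shared, source-only, target-only) blocks
  \Phi_s(x) &= (x,\ x,\ 0), \qquad \Phi_t(x) = (x,\ 0,\ x), \qquad w = (w_0,\ w_s,\ w_t) \\
  % effective weight vector seen by each domain
  w^\top \Phi_s(x) &= (w_0 + w_s)^\top x = v_s^\top x, \qquad
  w^\top \Phi_t(x) = (w_0 + w_t)^\top x = v_t^\top x \\
  % the ordinary L2 penalty ties v_s and v_t to a shared centre w_0
  \lambda\bigl(\|w_0\|^2 + \|w_s\|^2 + \|w_t\|^2\bigr)
    &= \lambda\bigl(\|w_0\|^2 + \|v_s - w_0\|^2 + \|v_t - w_0\|^2\bigr) \\
  % scaled shared copy: \Phi_s(x) = (c x, x, 0), \Phi_t(x) = (c x, 0, x); let u_0 = c w_0
  \lambda\bigl(\|w_0\|^2 + \|w_s\|^2 + \|w_t\|^2\bigr)
    &= \lambda\bigl(c^{-2}\|u_0\|^2 + \|v_s - u_0\|^2 + \|v_t - u_0\|^2\bigr)
\end{align*}
```

With $c = 10$ the shared component is penalized 100 times less, which pushes the learner toward weights shared across domains.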
TGT: uses only the target labeled training data.
FE: uses both labeled datasets (the "Frustratingly Easy" framework).
FE+: a modification of FE, equivalent to multiplying the shared part of the FE feature vector by 10 (Finkel and Manning, 2009).
ST: uses both the source and target labeled datasets to train a single model directly on all labeled data.
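The four settings differ only in how the labeled data is assembled before training. A minimal sketch, assuming dense feature vectors stored as Python lists and labels in {+1, -1}. `augment` and `build_training_sets` are hypothetical helpers, and "FE+" here means the FE variant that multiplies the shared block by 10. Any linear learner could then be trained on each returned set.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, label)


def augment(x: Sequence[float], domain: str, shared_scale: float = 1.0) -> List[float]:
    """FE augmentation into (shared, source-only, target-only) blocks.

    shared_scale=1.0 gives plain FE; shared_scale=10.0 gives the variant that
    multiplies the shared part of the feature vector by 10.
    """
    zeros = [0.0] * len(x)
    shared = [shared_scale * v for v in x]
    if domain == "source":
        return shared + list(x) + zeros
    if domain == "target":
        return shared + zeros + list(x)
    raise ValueError(f"unknown domain: {domain!r}")


def build_training_sets(src: Sequence[Example],
                        tgt: Sequence[Example]) -> Dict[str, List[Example]]:
    """Assemble the training data used by TGT, FE, FE+ and ST."""
    return {
        # TGT: target data only, original features.
        "TGT": [(list(x), y) for x, y in tgt],
        # FE: both domains, augmented features.
        "FE": [(augment(x, "source"), y) for x, y in src]
              + [(augment(x, "target"), y) for x, y in tgt],
        # FE+: like FE, but the shared block is scaled by 10.
        "FE+": [(augment(x, "source", 10.0), y) for x, y in src]
               + [(augment(x, "target", 10.0), y) for x, y in tgt],
        # ST: both domains pooled, original features.
        "ST": [(list(x), y) for x, y in list(src) + list(tgt)],
    }
```

Keeping the data construction separate from the learner makes it easy to compare the frameworks with the same classifier.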

Artificial Adaptation Experiments
To demonstrate some of the complexities and benefits of combining adaptation approaches, we ran experiments on artificial data, measuring the performance of three adaptation frameworks as the similarity between the two domains was controlled.

NER Experiments (all scores are token F1)
TGT labeled data      | TGT  | FE   | FE+  | ST
SRC labeled data?     | No   | Yes  | Yes  | Yes
MUC7 Dev              | 58.6 | 70.5 | 74.3 | 73.1
MUC7 Dev + cluster    | 77.5 | 82.5 | 83.3 | 83.3
MUC7 Train            | 73.0 | 78.2 | 80.1 | 78.7
MUC7 Train + cluster  | 85.4 | 86.4 | 86.2 | 86.5

TGT: train on target only. FE: Frustratingly Easy. ST: train on source and target labeled data together as one. In both experiments, training and test data for the two domains were generated according to random hyperplanes whose difference (cosine) was controlled.
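One way such data could be generated, assuming standard Gaussian inputs, unit-norm hyperplanes through the origin, and labels given by the side of the hyperplane; the poster does not specify these details, and `hyperplane_pair` and `sample` are hypothetical names. The target normal is built by rotating the source normal toward an orthogonal direction, so their cosine equals the requested value exactly.

```python
import math
import random
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def hyperplane_pair(dim: int, cosine: float,
                    rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return unit normals (u, w) with dot(u, w) == cosine; requires dim >= 2."""
    u = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Gram-Schmidt step: remove the component of r along u, so v is orthogonal to u.
    proj = dot(r, u)
    v = unit([ri - proj * ui for ri, ui in zip(r, u)])
    sine = math.sqrt(max(0.0, 1.0 - cosine * cosine))
    # cos * u + sin * v is a unit vector at the requested angle to u.
    w = [cosine * ui + sine * vi for ui, vi in zip(u, v)]
    return u, w


def sample(normal: Sequence[float], n: int,
           rng: random.Random) -> List[Tuple[List[float], int]]:
    """Draw n Gaussian points labelled +1/-1 by the side of the hyperplane."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in normal]
        data.append((x, 1 if dot(normal, x) >= 0 else -1))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    src_normal, tgt_normal = hyperplane_pair(dim=50, cosine=0.8, rng=rng)
    source_data = sample(src_normal, 1000, rng)
    target_data = sample(tgt_normal, 100, rng)
    print(round(dot(src_normal, tgt_normal), 6))
```

Sweeping `cosine` from 0 toward 1 then traces how each framework behaves as the domains become more similar.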
Importantly, adding cluster-like features changes the behavior of the labeled adaptation algorithms. Without the cluster-like features, FE+ is in general the best labeled adaptation framework. After the cluster-like features are added, however, the simple ST approach becomes very competitive with both FE and FE+. Resolving feature sparsity changes the behavior of labeled adaptation frameworks.
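The shift can be quantified from the NER Experiments table itself. This small sketch recomputes the ST minus FE+ gap in token F1, where FE+ is the FE variant that scales the shared features by 10; the numbers are copied from the table, and `st_gap` is a hypothetical helper name.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]

# Token F1 from the NER Experiments table: (FE+, ST) for each target-data setting.
SCORES: Dict[Key, Tuple[float, float]] = {
    ("MUC7 Dev", "no clusters"): (74.3, 73.1),
    ("MUC7 Dev", "clusters"): (83.3, 83.3),
    ("MUC7 Train", "no clusters"): (80.1, 78.7),
    ("MUC7 Train", "clusters"): (86.2, 86.5),
}


def st_gap(scores: Dict[Key, Tuple[float, float]]) -> Dict[Key, float]:
    """ST minus FE+ token F1, rounded to one decimal place."""
    return {key: round(st - fe_plus, 1) for key, (fe_plus, st) in scores.items()}


if __name__ == "__main__":
    for (data, features), gap in st_gap(SCORES).items():
        print(f"{data:<10} {features:<11} ST - FE+ = {gap:+.1f}")
```

Without clusters ST trails FE+ by 1.2 (Dev) and 1.4 (Train) points; with clusters the gap becomes 0.0 and +0.3.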
[Figure panels: Adaptation Without Clusters | Adaptation With Clusters]
In the first experiment (without clusters), the tasks need to be similar for FE to work; once they are nearly identical, the simpler ST is better. In the second experiment, a set of identical shared features (clusters) is added to both hyperplanes, so both adaptation algorithms improve: the cluster adaptation has effectively moved the two tasks closer, enlarging the region where ST improves over FE. Adding clusters allows a simpler algorithm to succeed.
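Why shared features move the tasks closer can be checked with a toy computation, assuming the cluster features enter both hyperplanes as the same appended weights; the vectors below are made up for illustration. For unit-norm task vectors with cosine c and appended shared weights of squared norm k, the new cosine is (c + k) / (1 + k), which exceeds c whenever c < 1 and k > 0.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_with_shared(c: float, k: float) -> float:
    """Closed form for unit task vectors with cosine c plus shared weights of squared norm k."""
    return (c + k) / (1.0 + k)


if __name__ == "__main__":
    src = [1.0, 0.0]
    tgt = [0.6, 0.8]      # unit vector with cosine 0.6 to src
    shared = [1.0, 1.0]   # hypothetical identical cluster weights, squared norm k = 2
    print(round(cosine(src, tgt), 3))                    # 0.6
    print(round(cosine(src + shared, tgt + shared), 3))  # 0.867
    print(round(cosine_with_shared(0.6, 2.0), 3))        # 0.867
```

The larger the shared block relative to the domain-specific part, the closer the two tasks look, which is the regime where pooling the data (ST) pays off.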
Adaptation Frameworks
Framework | Labeled data       | Unlabeled data          | Approach
Unlabeled | Source             | Cover source and target | Generate features that span domains
Labeled   | Source plus target | None                    | Train classifier using both source and target data
NER Comparison
System      | Unlabeled? | Labeled? | P.F1  | T.F1
FM09        | No         | Yes      | 79.98 | N/A
RR09        | Yes        | No       | N/A   | 83.2
RR09 global | Yes        | No       | N/A   | 86.2
Our NER     | Yes        | Yes      | 84.1  | 86.5

Selected References
John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In EMNLP.
Ciprian Chelba and Alex Acero. 2004. Adaptation of maximum entropy capitalizer: Little data can help a lot. In EMNLP.
Hal Daumé III. 2007. Frustratingly easy domain adaptation. In ACL.
Jenny Rose Finkel and Christopher D. Manning. 2009. Hierarchical Bayesian domain adaptation. In NAACL.
Fei Huang and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence-labeling. In ACL.
Lev Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In CoNLL.
Ming-Wei Chang, Michael Connor and Dan Roth