Title: Knowledge Transfer via Multiple Model Local Structure Mapping
Jing Gao, Wei Fan, Jing Jiang, Jiawei Han
Observations: Each base model may be effective on
a subset of the test domain. It is hard to
select the optimal model since class labels in
the test domain are unknown.
A Synthetic Example
Goal
Transfer from Multiple Domains
Transfer Learning
[Diagram: ideal setting — a classifier is trained on labeled data and tested on unlabeled data from the same domain (New York Times → New York Times, accuracy 85.5). Realistic setting — training (labeled, e.g. Reuters) and test (completely unlabeled target domain, e.g. New York Times) come from different domains with conflicting concepts, and performance degrades.]
To unify the knowledge that is consistent with the
test domain from multiple, partially overlapping
source domains.
[Diagram: a classifier trained on another source domain (Newsgroup) achieves only 64.1 on the New York Times test data.]
Approximate Optimal Weights
Motivation
Solution
[Diagram: multiple source domains, e.g. New York Times and Reuters.]
Goal: To design learning methods that are aware of
the training and test domain difference.
Examples:
Spam filtering: public email collection → personal inboxes
Intrusion detection: existing types of intrusions → unknown types of intrusions
Sentiment analysis: expert review articles → blog review articles
Related work:
Sample selection bias correction: reweight training examples or transform the representation
Transfer learning: adapt the classifier to the new domain
Multi-task learning: share learning among different tasks
New Problems: Learn from multiple source
domains and transfer the knowledge to a target
domain. Importantly, the target domain does not have
any labeled examples (different from some
previously proposed methods).
Graph-based Heuristic
Map the structures of a model onto the structures
of the test domain. Weight each model locally
according to its consistency with the
neighborhood structure around the test example.
Assumptions
Test examples that are closer in the feature
space are more likely to share the same class
label.
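The locality-weighting idea above can be sketched in code: for each test point, take its k nearest neighbors, connect the points that a model predicts into the same class and the points that an unsupervised clustering of the test set puts into the same cluster, and score the model by the overlap of the two neighborhood graphs. A minimal numpy sketch — the function names and the Jaccard-style overlap are illustrative assumptions, not the paper's exact formula:

```python
import numpy as np

def knn_indices(X, i, k):
    """Indices of the k nearest neighbors of point i (excluding i itself)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    order = np.argsort(dists)
    return order[order != i][:k]

def local_weight(pred_labels, cluster_labels, X, i, k=2):
    """Score one model at test point i: overlap between the neighborhood
    graph induced by the model's predictions (edge iff same predicted class)
    and the graph induced by a clustering of the test set (edge iff same
    cluster). Returned as a Jaccard-style ratio in [0, 1]."""
    nbrs = knn_indices(X, i, k)
    same_pred = pred_labels[nbrs] == pred_labels[i]        # model's edges
    same_clus = cluster_labels[nbrs] == cluster_labels[i]  # clustering's edges
    union = np.logical_or(same_pred, same_clus).sum()
    if union == 0:
        return 1.0  # both graphs agree: no local edges at all
    return np.logical_and(same_pred, same_clus).sum() / union
```

A model whose decision boundary respects the local cluster structure gets a weight near 1; a model that cuts through a cluster near the test point gets a low weight.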
Framework
Locally Weighted Ensemble (LWE)
[Diagram: training sets 1 through k produce classifiers C1, ..., Ck; at each test example x, the models that fit the local structure receive a higher weight.]
Determine Weights
The optimal solution can be obtained from a
regression problem if the true labels are known.
But the ground truth f is unknown!
Example: the weight of a model is proportional to the
similarity between its neighborhood graph and the
clustering structure around x.
Local Structure Based Adjustment
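The regression view of the weights can be made concrete: if the true label vector were known, combination weights could be fit by least squares over the models' outputs. A minimal numpy sketch with made-up toy numbers:

```python
import numpy as np

# Each column holds one model's predicted probability of the positive
# class on three test points; y is the (in practice unknown) ground truth.
F = np.array([[0.9, 0.1],
              [0.8, 0.7],
              [0.2, 0.9]])          # shape (n_points, k_models)
y = np.array([1.0, 1.0, 0.0])

# Optimal weights minimize the squared error ||F w - y||^2.
w, *_ = np.linalg.lstsq(F, y, rcond=None)

# Least-squares weights can do no worse than simple model averaging,
# which corresponds to the fixed choice w = (0.5, 0.5).
residual_lwe = np.linalg.norm(F @ w - y)
residual_avg = np.linalg.norm(F.mean(axis=1) - y)
```

Since y is unavailable at test time, the framework instead approximates per-example weights from the clustering structure around each test point.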
What if no models are similar to the clustering
structure at x? This simply means that the training
information conflicts with the true target
distribution at x. Solution: ignore the training
information and propagate the labels of neighbors
in the test set to x.
[Example diagram: class-probability estimates at test example x — C1: 0.9/0.1, C2: 0.4/0.6, ground truth: 0.8/0.2.]
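The fallback described above can be sketched as: when the models' average local weight at x drops below a threshold, discard their outputs and average the (predicted) label distributions of x's test-set neighbors instead. A minimal sketch — the function name and the 0.5 default threshold are illustrative assumptions:

```python
import numpy as np

def adjusted_posterior(ensemble_post, neighbor_posts, avg_weight, delta=0.5):
    """Local structure based adjustment: if no model fits the clustering
    structure at x (avg_weight < delta), ignore the training information
    and propagate the neighbors' label distributions to x."""
    if avg_weight >= delta:
        return np.asarray(ensemble_post, float)
    return np.asarray(neighbor_posts, float).mean(axis=0)
```

When the models agree with the local structure the ensemble output passes through unchanged; otherwise x inherits the average label distribution of its neighborhood in the test set.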
Experiments
Experiments on Synthetic Data
Experiments on Text Data
Parameter Sensitivity
Data Sets
Synthetic data sets
Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters)
Intrusion detection: two types of intrusions → a different type of intrusions (KDD Cup99 data)
Baseline Methods
Single models: Winnow (WNN), Logistic Regression (LRR), Support Vector Machine (SVM)
Simple model averaging ensemble (SMA)
Semi-supervised learning models: Transductive SVM (TSVM)
Experiments on Intrusion Data
Take-away Messages
The locally weighted ensemble framework transfers
useful knowledge from multiple source domains, and
the graph-based heuristic makes the framework
practical and effective.
LWE beats the baselines in terms of prediction
accuracy!
Code and datasets available at
http://ews.uiuc.edu/jinggao3/kdd08transfer.htm
Notes