Title: Domain Adaptation with Multiple Sources
1. Domain Adaptation with Multiple Sources
- Yishay Mansour, Tel Aviv Univ. & Google
- Mehryar Mohri, NYU & Google
- Afshin Rostamizadeh, NYU
2. Adaptation
3. Adaptation motivation
- High level
- The ability to generalize from one domain to another
- Significance
- Basic human property
- Essential in most learning environments
- Implicit in many applications.
4. Adaptation - examples
- Sentiment analysis
- Users leave reviews
- products, sellers, movies, ...
- Goal: score reviews as positive or negative.
- Adaptation example
- Learn for restaurants and airlines
- Generalize to hotels
5. Adaptation - examples
- Speech recognition
- Adaptation
- Learn a few accents
- Generalize to new accents
- think foreign accents.
6. Adaptation and generalization
- Machine Learning prediction
- Learn from examples drawn from distribution D
- predict the label of unseen examples
- drawn from the same distribution D
- generalization within a distribution
- Adaptation
- predict the label of unseen examples
- drawn from a different distribution D'
- Generalization across distributions
7. Adaptation - Related Work
- Learn from D and test on D'
- relating the increase in error to dist(D, D')
- Ben-David et al. (2006), Blitzer et al. (2007), ...
- Single distribution, varying label quality
- Crammer et al. (2005, 2006)
8. Our Model
9. Our Model - input
Typical loss function: L(a, b) = |a - b| and L(D, h, f) = E_{x~D} |f(x) - h(x)|
10. Our Model - target distribution
[Figure: the target distribution Dλ is a mixture of the source distributions D1, ..., Dk with mixture weights λ1, ..., λk]
11. Our model - Combination Rule
- Combine h1, ..., hk into a hypothesis h
- Low expected loss
- hopefully at most ε
- Combining rules (both are sketched in code below)
- let z: Σi zi = 1 and zi ≥ 0
- linear: h(x) = Σi zi hi(x)
- distribution weighted: hz(x) = Σi [zi Di(x) / Σj zj Dj(x)] hi(x)
[Figure: hypotheses h1, ..., hk feeding into a combining rule]
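As a concrete illustration of the two rules, here is a minimal pointwise sketch (my own code, not from the talk; the function names are illustrative):

```python
import numpy as np

def linear_rule(z, h_vals):
    """Linear combining rule: h(x) = sum_i z_i * h_i(x).
    z: weights on the simplex; h_vals: the values h_1(x), ..., h_k(x)."""
    return float(np.dot(z, h_vals))

def distribution_weighted_rule(z, d_vals, h_vals):
    """Distribution weighted rule:
    h_z(x) = sum_i [z_i * D_i(x) / sum_j z_j * D_j(x)] * h_i(x).
    d_vals: the densities D_1(x), ..., D_k(x) at the same point x.
    Undefined when sum_j z_j * D_j(x) = 0; that is the continuity
    problem the talk addresses later with an eta-smoothed version."""
    w = np.asarray(z, dtype=float) * np.asarray(d_vals, dtype=float)
    return float(np.dot(w, h_vals) / w.sum())
```

On the two-point example used later in the talk, the linear rule is stuck at loss ½ while the distribution weighted rule recovers the target exactly.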
12. Combining Rules - Pros
- Alternative: build a data set for the mixture
- Learning the mixture parameters is non-trivial
- The combined data set might be huge
- Domain-dependent data may be unavailable
- Sometimes only the classifiers are given/exist
- privacy
- MOST IMPORTANT
- FUNDAMENTAL THEORY QUESTION
13. Our Results
- Linear combining rule
- Seems like the first thing to try
- Can be very bad
- There are simple settings where any linear combining rule performs badly.
14. Our Results
- Distribution weighted combining rules
- Given the mixture parameter λ,
- there is a good distribution weighted combining rule
- expected loss at most ε
- For any target function f,
- there is a good distribution weighted combining rule hz
- expected loss at most ε
- Extension to multiple consistent target functions
- expected loss at most 3ε
- OUTCOME: this is the right hypothesis class
15. Known Distribution
16. Linear combining rules
A bad example on two points, X = {a, b}:

x | h0(x) | h1(x) | f(x)
a |   0   |   1   |   1
b |   0   |   1   |   0

x | Da(x) | Db(x) | D(x) = ½Da(x) + ½Db(x)
a |   1   |   0   |   ½
b |   0   |   1   |   ½

Original loss ε = 0 !!! (h1 is perfect on Da, h0 is perfect on Db)
Any linear combining rule h has expected absolute loss ½ on D, as worked out below.
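Spelling the claim out with the table values: any linear rule h = z·h0 + (1-z)·h1 is the constant function 1-z, so its loss on the mixture D is

$$
L(D, h, f) = \tfrac{1}{2}\bigl|f(a)-(1-z)\bigr| + \tfrac{1}{2}\bigl|f(b)-(1-z)\bigr|
= \tfrac{1}{2}\,z + \tfrac{1}{2}\,(1-z) = \tfrac{1}{2}.
$$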
17. Distribution weighted combining rule
- Target distribution: a mixture
- Dλ(x) = Σi λi Di(x)
- Set z = λ
- Claim: L(Dλ, hλ, f) ≤ ε
18. Distribution weighted combining rule
PROOF
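The proof itself is not in the transcript; the standard argument for the claim on the previous slide (distribution weighted rule with z = λ) runs as follows, using the triangle inequality over the convex weights λi Di(x)/Dλ(x):

$$
L(D_\lambda, h_\lambda, f)
= \sum_x D_\lambda(x)\,\Bigl|\sum_i \frac{\lambda_i D_i(x)}{D_\lambda(x)}\,h_i(x) - f(x)\Bigr|
\;\le\; \sum_x \sum_i \lambda_i D_i(x)\,\bigl|h_i(x) - f(x)\bigr|
= \sum_i \lambda_i\, L(D_i, h_i, f)
\;\le\; \varepsilon.
$$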
19. Back to the bad example

x | h0(x) | h1(x) | f(x)
a |   0   |   1   |   1
b |   0   |   1   |   0

x | Da(x) | Db(x) | D(x)
a |   1   |   0   |   ½
b |   0   |   1   |   ½

Original loss ε = 0 !!!
The distribution weighted rule with z = (½, ½) gets both points right:
for x = a, h(x) = h1(x) = 1; for x = b, h(x) = h0(x) = 0.
20. Unknown Distribution
21. Unknown mixture distribution
- Zero-sum game
- NATURE selects a distribution Di
- LEARNER selects a z
- hypothesis hz
- Payoff: L(Di, hz, f)
- Restating the previous result
- For any mixed action λ of NATURE
- LEARNER has a pure action z = λ
- such that the expected loss is at most ε
22. Unknown mixture distribution
- Consequence
- LEARNER has a mixed action (over z's)
- for any mixed action λ of NATURE
- a mixture distribution Dλ
- the loss is at most ε
- Challenge
- show a specific hypothesis hz
- a pure, not mixed, action
23. Searching for a good hypothesis
- Uniformly good hypothesis hz
- for any Di we have L(Di, hz, f) ≤ ε
- Assume all the hi are identical
- an extremely lucky and unlikely case
- If we have a uniformly good hypothesis we are done!
- L(Dλ, hz, f) = Σi λi L(Di, hz, f) ≤ Σi λi ε = ε
- We need to show that, in general, a good hz exists!
24. Proof Outline
- Balancing the losses
- Show that some hz has identical loss on every Di
- uses Brouwer's Fixed Point Theorem
- holds very generally
- Bounding the losses
- Show this hz has low loss for some mixture
- specifically Dz
25. Brouwer Fixed Point Theorem
For any convex and compact set A and any continuous mapping f: A → A, there exists a point x in A such that f(x) = x.
[Figure: a compact, convex set A and a continuous mapping f: A → A]
26. Balancing Losses
Problem 1: we need the mapping f to be continuous
27. Balancing Losses
Fixed point: z = f(z)
Problem 2: requires that zi > 0
28. Bounding the losses
- We can guarantee balanced losses even for the linear combining rule!

x | h0(x) | h1(x) | f(x)
a |   0   |   1   |   1
b |   0   |   1   |   0

x | Da(x) | Db(x) | D(x)
a |   1   |   0   |   ½
b |   0   |   1   |   ½

For z = (½, ½) we have L(Da, hz, f) = ½ and L(Db, hz, f) = ½
- balanced, but bad: balance alone is not enough, the common loss must also be bounded.
29. Bounding Losses
- Consider the previous z
- from Brouwer's fixed point theorem
- Consider the mixture Dz
- Expected loss is at most ε
- Also L(Dz, hz, f) = Σj zj L(Dj, hz, f) = γ, the common balanced loss
- Conclusion
- For any mixture, expected loss at most γ ≤ ε
30. Solving the problems
- Redefine the distribution weighted rule
- hz,η(x) = Σi [(zi Di(x) + η U(x)/k) / (Σj zj Dj(x) + η U(x))] hi(x), with U the uniform distribution
- Claim: for any distribution D, L(D, hz,η, f) is continuous in z (see the sketch below)
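A minimal pointwise sketch of the redefined rule (variable names and the explicit uniform density U are my own notation, following the η-smoothed form above):

```python
import numpy as np

def smoothed_dw_rule(z, d_vals, h_vals, eta, u_val):
    """eta-smoothed distribution weighted rule at a point x:
    h_{z,eta}(x) = sum_i [(z_i D_i(x) + eta U(x)/k) /
                          (sum_j z_j D_j(x) + eta U(x))] * h_i(x).
    u_val is U(x) for a fixed uniform density U.  The eta term keeps the
    denominator strictly positive, so h_{z,eta}(x), and hence
    L(D, h_{z,eta}, f), is continuous in z even when some z_i = 0."""
    k = len(z)
    w = np.asarray(z, dtype=float) * np.asarray(d_vals, dtype=float) + eta * u_val / k
    return float(np.dot(w, h_vals) / w.sum())
```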
31. Main Theorem
- For any target function f and any δ > 0,
- there exist η > 0 and z such that
- for any λ we have L(Dλ, hz,η, f) ≤ ε + δ
32. Balancing Losses
- The set A: Σi zi = 1 and zi ≥ 0
- the simplex
- The mapping f with parameters η and η′
- f(z)i = (zi Li,z + η′/k) / (Σj zj Lj,z + η′)
- where Li,z = L(Di, hz,η, f)
- For some z in A we have f(z) = z (Brouwer)
- zi = (zi Li,z + η′/k) / (Σj zj Lj,z + η′) > 0
- Li,z = (Σj zj Lj,z) + η′ - η′/(zi k) < (Σj zj Lj,z) + η′
33. Bounding Losses
- Consider the previous z
- from Brouwer's fixed point theorem
- Consider the mixture Dz
- Expected loss is at most ε + η
- By definition Σj zj Lj,z = L(Dz, hz,η, f)
- Conclusion: Σj zj Lj,z ≤ ε + η
34. Putting it together
- There exists (z, η) such that
- Expected loss of hz,η is approximately balanced
- L(Di, hz,η, f) ≤ γ + η′ for every i, where γ = Σj zj Lj,z
- Bounding γ using Dz
- γ = L(Dz, hz,η, f) ≤ ε + η
- For any mixture Dλ
- L(Dλ, hz,η, f) ≤ ε + η + η′ (the full chain is written out below)
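Written out, the chain of bounds for an arbitrary mixture λ is

$$
L(D_\lambda, h_{z,\eta}, f) = \sum_i \lambda_i\, L(D_i, h_{z,\eta}, f)
\;\le\; \gamma + \eta' = L(D_z, h_{z,\eta}, f) + \eta'
\;\le\; \varepsilon + \eta + \eta',
$$

which is at most ε + δ once η and η′ are chosen small enough, giving the Main Theorem.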
35. A more general model
- So far NATURE first fixes the target function f
- consistent target functions f:
- the expected loss w.r.t. Di is at most ε
- for each of the k distributions
- Function class F = { f : f is consistent }
- New Model
- LEARNER picks a hypothesis h
- NATURE picks f in F and a mixture Dλ
- Loss L(Dλ, h, f)
- RESULT: L(Dλ, h, f) ≤ 3ε
36. Simple Algorithms
37. Uniform Algorithm
- Hypothesis: set z = (1/k, ..., 1/k)
- Performance
- For any mixture, expected error at most kε (a derivation is sketched below)
- There exists a mixture with expected error Ω(kε)
- For k = 2, there exists a mixture with error 2ε - ε²
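One way to see the kε upper bound (this derivation is my own, assuming the uniform algorithm is the distribution weighted rule with z = (1/k, ..., 1/k)): for any mixture λ, since Dλ(x) ≤ Σj Dj(x) at every point x,

$$
L(D_\lambda, h_z, f)
\;\le\; \sum_x D_\lambda(x) \sum_i \frac{D_i(x)}{\sum_j D_j(x)}\,\bigl|h_i(x)-f(x)\bigr|
\;\le\; \sum_i \sum_x D_i(x)\,\bigl|h_i(x)-f(x)\bigr|
= \sum_i L(D_i, h_i, f) \;\le\; k\varepsilon.
$$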
38. Open Problem
- Find a uniformly good hypothesis
- efficiently !!!
- algorithmic issues
- Search over the z's (a heuristic sketch follows below)
- Multiple local minima.
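As a purely heuristic illustration of that search (not an algorithm from the talk or the paper), one could iterate the fixed-point mapping from the balancing step numerically, estimating each per-domain loss on held-out data; all names below are hypothetical:

```python
import numpy as np

def heuristic_balance(per_domain_loss, k, eta_prime=1e-3, iters=200):
    """Iterate  z_i <- (z_i L_i(z) + eta'/k) / (sum_j z_j L_j(z) + eta'),
    the mapping used in the existence proof, hoping to reach approximately
    balanced per-domain losses.

    per_domain_loss(z): returns estimates of
    (L(D_1, h_z, f), ..., L(D_k, h_z, f)), e.g. measured on held-out
    samples from each source domain.  There is no convergence or
    optimality guarantee: as noted above, the search over z may have
    multiple local minima."""
    z = np.full(k, 1.0 / k)
    for _ in range(iters):
        losses = np.asarray(per_domain_loss(z), dtype=float)
        z = (z * losses + eta_prime / k) / (z @ losses + eta_prime)
    return z
```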
39. Empirical Results
40. Empirical Results
- Data set: sentiment analysis (raw review snippets)
- good product takes a little time to start operating very good for the price a little trouble using it inside ca
- it rocks man this is the rockinest think i've ever seen or buyed dudes check it ou
- does not retract agree with the prior reviewers i can not get it to retract any longer and that was only after 3 uses
- dont buy not worth a cent got it at walmart can't even remove a scuff i give it 100 good thing i could return it
- flash drive excelent hard drive good price and good time for seller thanks
41. Empirical analysis
- Multiple domains
- dvd, books, electronics, kitchen appliances
- Language model
- build a model for each domain
- unlike the theory, this is an additional source of error
- Tested on mixture distributions
- known mixture parameters
- Target: score (1-5)
- Error measure: mean squared error (MSE); see the sketch below
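A rough sketch of the kind of evaluation described above (all function and variable names are hypothetical; in the actual experiments the per-domain language models provide the density estimates, and the mixture weights are known):

```python
import numpy as np

def dw_score(lam, densities, scores):
    """Distribution weighted prediction for one review:
    densities[i] ~ D_i(x), domain i's language-model probability of the text,
    scores[i]    = h_i(x), the score (1-5) predicted by domain i's model,
    lam          = the known mixture weights over the k domains."""
    w = np.asarray(lam, dtype=float) * np.asarray(densities, dtype=float)
    return float(np.dot(w, scores) / w.sum())

def mse(predictions, targets):
    """Mean squared error between predicted and gold scores."""
    p = np.asarray(predictions, dtype=float)
    t = np.asarray(targets, dtype=float)
    return float(np.mean((p - t) ** 2))
```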
42. Distribution weighted
[Plot: MSE of the distribution weighted and linear combining rules on mixtures of the dvd, books, electronics, and kitchen domains]
43-44. (No transcript)
45. Summary
46. Summary
- Adaptation model
- combining rules
- linear
- distribution weighted
- Theoretical analysis
- mixture distribution
- Future research
- algorithms for combining rules
- beyond mixtures
47. Thank You!
48. Adaptation - Our Model
- Input
- target function f
- k distributions D1, ..., Dk
- k hypotheses h1, ..., hk
- For every i: L(Di, hi, f) ≤ ε
- where L(D, h, f) is the expected loss
- think L(D, h, f) = E_{x~D} |f(x) - h(x)|