1
Domain Adaptation with Multiple Sources
  • Yishay Mansour, Tel Aviv Univ. & Google
  • Mehryar Mohri, NYU & Google
  • Afshin Rostamizadeh, NYU

2
Adaptation
3
Adaptation - motivation
  • High level:
  • the ability to generalize from one domain to
    another
  • Significance:
  • a basic human property
  • essential in most learning environments
  • implicit in many applications

4
Adaptation - examples
  • Sentiment analysis
  • users leave reviews
  • products, sellers, movies, …
  • Goal: score reviews as positive or negative
  • Adaptation example:
  • learn for restaurants and airlines
  • generalize to hotels

5
Adaptation - examples
  • Speech recognition
  • Adaptation
  • Learn a few accents
  • Generalize to new accents
  • think foreign accents.

6
Adaptation and generalization
  • Machine learning prediction:
  • learn from examples drawn from a distribution D
  • predict the label of unseen examples
  • drawn from the same distribution D
  • generalization within a distribution
  • Adaptation:
  • predict the label of unseen examples
  • drawn from a different distribution D′
  • generalization across distributions

7
Adaptation Related Work
  • Learn from D and test on D′
  • relating the increase in error to dist(D, D′)
  • Ben-David et al. (2006), Blitzer et al. (2007), …
  • Single distribution, varying label quality
  • Crammer et al. (2005, 2006)

8
Our Model
9
Our Model - input
Typical loss function: L(a, b) = |a − b| and
L(D, h, f) = E_{x∼D}[|f(x) − h(x)|]
10
Our Model - target distribution
  • the target distribution D_λ is a mixture of the
    sources D_1, …, D_k with weights λ_1, …, λ_k:
  • D_λ(x) = Σ_i λ_i D_i(x)
11
Our model - combining rules
  • Combine h_1, …, h_k into a hypothesis h
  • Low expected loss
  • hopefully at most ε
  • Combining rules:
  • let z: Σ_i z_i = 1 and z_i ≥ 0
  • linear: h(x) = Σ_i z_i h_i(x)
  • distribution weighted:
    h_z(x) = Σ_i [z_i D_i(x) / Σ_j z_j D_j(x)] h_i(x)
  • (both rules are sketched in code after the
    diagram below)
[Diagram: hypotheses h_1, …, h_k are fed into a combining rule that outputs h]
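To make the two rules concrete, a minimal sketch in Python (our
illustration, not code from the talk), assuming the source densities
D_i and hypotheses h_i are available as plain callables:

# Minimal sketch of the two combining rules (illustrative only).

def linear_rule(z, hs):
    """Linear rule: h(x) = sum_i z_i * h_i(x), with z on the simplex."""
    def h(x):
        return sum(z_i * h_i(x) for z_i, h_i in zip(z, hs))
    return h

def distribution_weighted_rule(z, Ds, hs):
    """Distribution weighted rule:
    h(x) = sum_i [z_i * D_i(x) / sum_j z_j * D_j(x)] * h_i(x).
    Undefined where the denominator vanishes -- exactly Problem 2
    on the later slides.
    """
    def h(x):
        total = sum(z_j * D_j(x) for z_j, D_j in zip(z, Ds))
        return sum(z_i * D_i(x) / total * h_i(x)
                   for z_i, D_i, h_i in zip(z, Ds, hs))
    return h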
12
Combining rules - pros
  • Alternative: build a data set for the mixture
    and train on it
  • learning the mixture parameters is non-trivial
  • the combined data set might be huge
  • domain-dependent data may be unavailable
  • sometimes only classifiers are given/exist
  • privacy
  • Most important: this is a fundamental theory
    question

13
Our Results
  • Linear combining rules
  • seem like the first thing to try
  • but can be very bad:
  • there are simple settings where any linear
    combining rule performs badly

14
Our Results
  • Distribution weighted combining rules
  • Given the mixture parameter λ:
  • there is a good distribution weighted combining
    rule
  • expected loss at most ε
  • For any target function f:
  • there is a good distribution weighted combining
    rule h_z
  • expected loss at most ε
  • Extension to multiple consistent target
    functions:
  • expected loss at most 3ε
  • Outcome: this is the right hypothesis class

15
Known Distribution
16
Linear combining rules
    x   h_0(x)   h_1(x)   f(x)
    a     0        1       1
    b     0        1       0

    x   D_a(x)   D_b(x)   D(x)
    a     1        0       ½
    b     0        1       ½

Original loss ε = 0: h_1 is perfect on D_a and h_0 is perfect on D_b.
Yet any linear combining rule h has expected absolute loss ½ on the
uniform target D (see the calculation below).
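Spelling out the claim: any linear rule h = z_0 h_0 + z_1 h_1 with
z_0 + z_1 = 1 outputs h(a) = h(b) = z_1, so under the uniform target D:

$$
L(D, h, f)
  = \tfrac{1}{2}\,\bigl|f(a) - h(a)\bigr| + \tfrac{1}{2}\,\bigl|f(b) - h(b)\bigr|
  = \tfrac{1}{2}\,(1 - z_1) + \tfrac{1}{2}\,z_1
  = \tfrac{1}{2}.
$$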
17
Distribution weighted combining rule
  • Target distribution: a mixture
  • D_λ(x) = Σ_i λ_i D_i(x)
  • Set z = λ
  • Claim: L(D_λ, h_λ, f) ≤ ε

18
Distribution weighted combining rule
PROOF
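A reconstruction of the proof from the definitions above (sums over a
discrete X for simplicity; the weights λ_i D_i(x)/D_λ(x) of h_λ are
nonnegative and sum to 1, so the absolute loss is bounded by their
weighted average):

$$
L(D_\lambda, h_\lambda, f)
  = \sum_x D_\lambda(x)\,\bigl|h_\lambda(x) - f(x)\bigr|
  \le \sum_x D_\lambda(x) \sum_i \frac{\lambda_i D_i(x)}{D_\lambda(x)}\,
      \bigl|h_i(x) - f(x)\bigr|
  = \sum_i \lambda_i\, L(D_i, h_i, f)
  \le \sum_i \lambda_i\, \varepsilon = \varepsilon.
$$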
19
Back to the bad example
    x   h_0(x)   h_1(x)   f(x)
    a     0        1       1
    b     0        1       0

    x   D_a(x)   D_b(x)   D(x)
    a     1        0       ½
    b     0        1       ½

Original loss ε = 0, as before. The distribution weighted rule with
z = (½, ½) recovers f exactly:
  • x = a: D_b(a) = 0, so all the weight is on h_1 and h(a) = h_1(a) = 1
  • x = b: D_a(b) = 0, so all the weight is on h_0 and h(b) = h_0(b) = 0
20
Unknown Distribution
21
Unknown mixture distribution
  • Zero-sum game:
  • NATURE selects a distribution D_i
  • LEARNER selects a z
  • i.e., a hypothesis h_z
  • Payoff: L(D_i, h_z, f)
  • Restating the previous result:
  • for any mixed action λ of NATURE
  • LEARNER has a pure action z = λ
  • such that the expected loss is at most ε

22
Unknown mixture distribution
  • Consequence:
  • LEARNER has a mixed action (over z's)
  • such that for any mixed action λ of NATURE,
  • i.e., any mixture distribution D_λ,
  • the loss is at most ε
  • Challenge:
  • show a specific hypothesis h_z
  • a pure, not mixed, action

23
Searching for a good hypothesis
  • Uniformly good hypothesis h_z:
  • for any D_i we have L(D_i, h_z, f) ≤ ε
  • Assume all the h_i are identical
  • an extremely lucky and unlikely case
  • If we have a uniformly good hypothesis we are done:
  • L(D_λ, h_z, f) = Σ_i λ_i L(D_i, h_z, f) ≤ Σ_i λ_i ε = ε
  • We need to show a good h_z exists in general!

24
Proof Outline
  • Balancing the losses:
  • show that some h_z has identical loss on every D_i
  • uses Brouwer's Fixed Point Theorem
  • holds very generally
  • Bounding the losses:
  • show this h_z has low loss for some mixture
  • specifically D_z

25
Brouwer Fixed Point Theorem: for any convex and
compact set A and any continuous mapping f: A → A,
there exists a point x in A such that f(x) = x.
26
Balancing Losses
Problem 1: need the mapping to be continuous (h_z
need not be continuous in z where Σ_j z_j D_j(x)
vanishes)
27
Balancing Losses
Fixed point: z = f(z)
Problem 2: needs that z_i ≠ 0
28
Bounding the losses
  • We can guarantee balanced losses even for the
    linear combining rule!

    x   h_0(x)   h_1(x)   f(x)
    a     0        1       1
    b     0        1       0

    x   D_a(x)   D_b(x)   D(x)
    a     1        0       ½
    b     0        1       ½

For z = (½, ½) we have L(D_a, h_z, f) = ½ = L(D_b, h_z, f):
balanced, but not low. Balancing alone is not enough.
29
Bounding Losses
  • Consider the previous z
  • from Brouwer's fixed point theorem:
  • L(D_i, h_z, f) = γ for every i (balanced losses)
  • Consider the mixture D_z
  • its expected loss is at most ε
  • Also L(D_z, h_z, f) = Σ_j z_j L(D_j, h_z, f) = γ
  • Conclusion:
  • for any mixture, expected loss at most γ ≤ ε

30
Solving the problems
  • Redefine the distribution weighted rule with a
    smoothing term η > 0:
    h_{z,η}(x) = Σ_i [(z_i D_i(x) + η/k) / (Σ_j z_j D_j(x) + η)] h_i(x)
  • Claim: for any distribution D, L(D, h_{z,η}, f)
    is continuous in z
  • (a code sketch follows)
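A matching sketch of the smoothed rule in the same style as before (the
η/k and η terms follow the structure used on the next slides; treat the
exact constants as our assumption):

# Sketch of the smoothed distribution weighted rule h_{z,eta}.

def smoothed_rule(z, eta, Ds, hs):
    """h_{z,eta}(x) = sum_i [(z_i D_i(x) + eta/k) /
                             (sum_j z_j D_j(x) + eta)] * h_i(x)

    The eta/k terms sum to eta, so the weights still sum to 1, are
    strictly positive, and depend continuously on z.
    """
    k = len(hs)
    def h(x):
        total = sum(z_j * D_j(x) for z_j, D_j in zip(z, Ds)) + eta
        return sum((z_i * D_i(x) + eta / k) / total * h_i(x)
                   for z_i, D_i, h_i in zip(z, Ds, hs))
    return h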

31
Main Theorem
  • For any target function f and any δ > 0,
  • there exist η > 0 and z such that
  • for any mixture λ we have
  • L(D_λ, h_{z,η}, f) ≤ ε + δ
32
Balancing Losses
  • The set A = {z : Σ_i z_i = 1 and z_i ≥ 0}
  • the simplex
  • The mapping f with parameters η and η′:
  • f(z)_i = (z_i L_{i,z} + η′/k) / (Σ_j z_j L_{j,z} + η′)
  • where L_{i,z} = L(D_i, h_{z,η}, f)
  • For some z in A we have f(z) = z, hence
  • z_i = (z_i L_{i,z} + η′/k) / (Σ_j z_j L_{j,z} + η′) > 0
  • L_{i,z} = Σ_j z_j L_{j,z} + η′ − η′/(z_i k) < Σ_j z_j L_{j,z} + η′
  • (the algebra is spelled out below)
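The algebra behind the last two bullets: z_i > 0 because the numerator
contains η′/k > 0; multiplying the fixed point identity through by the
denominator and solving for L_{i,z} gives

$$
z_i \Bigl(\sum_j z_j L_{j,z} + \eta'\Bigr) = z_i L_{i,z} + \frac{\eta'}{k}
\;\Longrightarrow\;
L_{i,z} = \sum_j z_j L_{j,z} + \eta' - \frac{\eta'}{z_i k}
        < \sum_j z_j L_{j,z} + \eta'.
$$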

33
Bounding Losses
  • Consider the previous z
  • from Brouwer's fixed point theorem
  • Consider the mixture D_z
  • its expected loss is at most ε + η
  • By definition Σ_j z_j L_{j,z} = L(D_z, h_{z,η}, f)
  • Conclusion: Σ_j z_j L_{j,z} ≤ ε + η

34
Putting it together
  • There exists (z, η) such that
  • the expected losses of h_{z,η} are approximately
    balanced:
  • L(D_i, h_{z,η}, f) < γ + η′, where γ = Σ_j z_j L_{j,z}
  • Bounding γ using D_z:
  • γ = L(D_z, h_{z,η}, f) ≤ ε + η
  • For any mixture D_λ:
  • L(D_λ, h_{z,η}, f) ≤ ε + η + η′ ≤ ε + δ
  • (the chain is written out below)
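Chaining the two bounds, for any mixture λ, with η and η′ chosen so
that η + η′ ≤ δ:

$$
L(D_\lambda, h_{z,\eta}, f)
  = \sum_i \lambda_i L_{i,z}
  < \sum_j z_j L_{j,z} + \eta'
  = L(D_z, h_{z,\eta}, f) + \eta'
  \le \varepsilon + \eta + \eta'
  \le \varepsilon + \delta.
$$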

35
A more general model
  • So far NATURE first fixes the target function f
  • Consistent target functions f:
  • the expected loss of h_i w.r.t. D_i is at most ε
  • for each of the k distributions
  • Function class F = {f : f is consistent}
  • New model:
  • LEARNER picks a hypothesis h
  • NATURE picks f in F and a mixture D_λ
  • Loss: L(D_λ, h, f)
  • RESULT: L(D_λ, h, f) ≤ 3ε

36
Simple Algorithms
37
Uniform Algorithm
  • Hypothesis: set z = (1/k, …, 1/k)
  • Performance:
  • for any mixture, expected error ≤ kε (see the
    derivation below)
  • there exists a mixture with expected error Ω(kε)
  • for k = 2, there exists a mixture with error 2ε − ε²
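One way to see the kε upper bound (a reconstruction, not the talk's
derivation): by convexity of the absolute loss, and since
D_i(x) ≤ Σ_l D_l(x),

$$
L(D_i, h_u, f)
  \le \sum_x D_i(x) \sum_j \frac{D_j(x)}{\sum_l D_l(x)}\,\bigl|h_j(x)-f(x)\bigr|
  \le \sum_j \sum_x D_j(x)\,\bigl|h_j(x)-f(x)\bigr|
  = \sum_j L(D_j, h_j, f) \le k\,\varepsilon,
$$

and averaging over i with any mixture weights λ_i preserves the bound.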

38
Open Problem
  • Find a uniformly good hypothesis
  • efficiently!
  • Algorithmic issues:
  • search over the z's
  • multiple local minima

39
Empirical Results
40
Empirical Results
  • Dataset for sentiment analysis (sample reviews,
    verbatim):
  • good product takes a little time to start
    operating very good for the price a little
    trouble using it inside ca
  • it rocks man this is the rockinest think i've
    ever seen or buyed dudes check it ou
  • does not retract agree with the prior reviewers i
    can not get it to retract any longer and that was
    only after 3 uses
  • dont buy not worth a cent got it at walmart can't
    even remove a scuff i give it 100 good thing i
    could return it
  • flash drive excelent hard drive good price and
    good time for seller thanks

41
Empirical analysis
  • Multiple domains:
  • dvd, books, electronics, kitchen appliances
  • Language model:
  • build a model for each domain
  • unlike the theory, this is an additional error
    source
  • Tested on mixture distributions
  • known mixture parameters
  • Target: review score (1-5)
  • Error: mean squared error (MSE)
  • (a sketch of the protocol follows)
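A sketch of the evaluation protocol in the same style as the earlier
snippets (the unigram models and per-domain predictors are our
stand-ins for the talk's language models; only the combining rule and
the MSE computation follow the slides):

from collections import Counter

# A unigram language model per domain plays D_i, a per-domain score
# predictor plays h_i, and the distribution weighted rule with
# z = lambda combines them; error is mean squared error.

def train_unigram(texts):
    """Add-one smoothed unigram model; returns text -> probability."""
    counts = Counter(w for t in texts for w in t.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    def prob(text):
        p = 1.0
        for w in text.split():
            p *= (counts[w] + 1) / (total + vocab)
        return p
    return prob

def combined_score(lam, lms, predictors, text):
    """Distribution weighted rule with z = lam over the domain models."""
    weights = [l * lm(text) for l, lm in zip(lam, lms)]
    total = sum(weights)
    return sum(w / total * h(text) for w, h in zip(weights, predictors))

def mse(lam, lms, predictors, test_set):
    """MSE on (review, score) pairs drawn from the lam-mixture."""
    return sum((combined_score(lam, lms, predictors, t) - y) ** 2
               for t, y in test_set) / len(test_set)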

42
[Plot: test MSE of the distribution weighted rule vs. the linear rule
on mixtures of the dvd, books, electronics, and kitchen domains]
43
(No Transcript)
44
(No Transcript)
45
Summary
46
Summary
  • Adaptation model
  • combining rules
  • linear
  • distribution weighted
  • Theoretical analysis
  • mixture distribution
  • Future research
  • algorithms for combining rules
  • beyond mixtures

47
Thank You!
48
Adaptation - Our Model
  • Input:
  • target function f
  • k distributions D_1, …, D_k
  • k hypotheses h_1, …, h_k
  • For every i: L(D_i, h_i, f) ≤ ε
  • where L(D, h, f) denotes the expected loss
  • think L(D, h, f) = E_{x∼D}[|f(x) − h(x)|]