1
Transfer Learning by Mapping and Revising
Relational Knowledge
  • Raymond J. Mooney
  • University of Texas at Austin
  • with acknowledgements to
  • Lily Mihalkova, Tuyen Huynh

2
Transfer Learning
  • Most machine learning methods learn each new task
    from scratch, failing to utilize previously
    learned knowledge.
  • Transfer learning concerns using knowledge
    acquired in a previous source task to facilitate
    learning in a related target task.
  • Usually assume significant training data was
    available in the source domain but limited
    training data is available in the target domain.
  • By exploiting knowledge from the source, learning
    in the target can be:
  • More accurate: Learned knowledge makes better
    predictions.
  • Faster: Training time is reduced.

3
Transfer Learning Curves
  • Transfer learning increases accuracy in the
    target domain.

[Figure: learning curves plotting predictive accuracy
against the amount of training data in the target
domain, with and without transfer]
4
Recent Work on Transfer Learning
  • A recent DARPA program on Transfer Learning has
    led to significant research in the area.
  • Some work focuses on feature-vector
    classification.
  • Hierarchical Bayes (Yu et al., 2005; Lawrence &
    Platt, 2004)
  • Informative Bayesian Priors (Raina et al., 2005)
  • Boosting for transfer learning (Dai et al., 2007)
  • Structural Correspondence Learning (Blitzer et
    al., 2007)
  • Some work focuses on Reinforcement Learning.
  • Value-function transfer (Taylor & Stone, 2005;
    2007)
  • Advice-based policy transfer (Torrey et al.,
    2005; 2007)

5
Similar Research Problems
  • Multi-Task Learning (Caruana, 1997)
  • Learn multiple tasks simultaneously, each one
    helped by the others.
  • Life-Long Learning (Thrun, 1996)
  • Transfer learning from a number of prior source
    problems, picking the correct source problems to
    use.

6
Logical Paradigm
  • Represents knowledge and data in binary symbolic
    logic such as first-order predicate calculus.
  • Strength: Rich representation that handles
    arbitrary sets of objects, with properties,
    relations, quantifiers, etc.
  • Weakness: Unable to handle uncertain knowledge
    and probabilistic reasoning.

7
Probabilistic Paradigm
  • Represents knowledge and data as a fixed set of
    random variables with a joint probability
    distribution.
  • Strength: Handles uncertain knowledge and
    probabilistic reasoning.
  • Weakness: Unable to handle arbitrary sets of
    objects, with properties, relations, quantifiers,
    etc.

8
Statistical Relational Learning (SRL)
  • Most machine learning methods assume i.i.d.
    examples represented as fixed-length feature
    vectors.
  • Many domains require learning and making
    inferences about unbounded sets of entities that
    are richly relationally connected.
  • SRL methods attempt to integrate methods from
    predicate logic and probabilistic graphical
    models to handle such structured,
    multi-relational data.

9
Statistical Relational Learning
[Diagram: multi-relational data (Actor, Movie,
Director, WorkedFor relations) is fed to a learning
algorithm, which produces a probabilistic graphical
model]
10
Multi-Relational Data Challenges
  • Examples cannot be effectively represented as
    feature vectors.
  • Predictions for connected facts are not
    independent, e.g. WorkedFor(brando, coppola) and
    Movie(godFather, brando).
  • Data is not i.i.d.
  • Requires collective inference (classification)
    (Taskar et al., 2001).
  • A single independent example (mega-example) often
    contains information about a large number of
    interconnected entities and can vary in length.
  • Leave-one-university-out testing (Craven et al.,
    1998)

11
TL and SRL and I.I.D.
  • Standard Machine Learning assumes examples are
  • Independent and Identically Distributed

TL breaks the assumption that test examples
are drawn from the same distribution as the
training instances
SRL breaks the assumption that examples are
independent
12
Multi-Relational Domains
  • Domains about people
  • Academic departments (UW-CSE)
  • Movies (IMDB)
  • Biochemical domains
  • Mutagenesis
  • Alzheimer drug design
  • Linked text domains
  • WebKB
  • Cora

13
Relational Learning Methods
  • Inductive Logic Programming (ILP)
  • Produces sets of first-order rules.
  • Not appropriate for probabilistic reasoning.
  • "If a student wrote a paper with a professor, then
    the professor is the student's advisor."
  • SRL models and learning algorithms
  • SLPs (Muggleton, 1996)
  • PRMs (Koller, 1999)
  • BLPs (Kersting & De Raedt, 2001)
  • RMNs (Taskar et al., 2002)
  • MLNs (Richardson & Domingos, 2006)

14
MLN Transfer (Mihalkova, Huynh, & Mooney, 2007)
  • Given two multi-relational domains, such as an
    academic domain and a movie domain,
  • Transfer a Markov logic network learned in the
    source to the target by
  • Mapping the source predicates to the target.
  • Revising the mapped knowledge.

15
First-Order Logic Basics
  • Literal: A predicate (or its negation) applied to
    constants and/or variables.
  • Gliteral: Ground literal, e.g. WorkedFor(brando,
    coppola).
  • Vliteral: Variablized literal, e.g. WorkedFor(A, B).
  • We assume predicates have typed arguments.
  • For example: Movie(godFather, coppola).

16
First-Order Clauses
  • Clause: A disjunction of literals.
  • Can be rewritten as a set of rules, as in the
    example below.
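For example, using the movie-domain clause that appears later in the talk:

```latex
\lnot \mathit{Movie}(T,A) \lor \lnot \mathit{WorkedFor}(A,B) \lor \mathit{Movie}(T,B)
\;\equiv\;
\mathit{Movie}(T,A) \land \mathit{WorkedFor}(A,B) \Rightarrow \mathit{Movie}(T,B)
```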

17
Representing the Data
  • Makes a closed world assumption:
  • The gliterals listed are true; the rest are false.
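A minimal sketch of this representation in Python, using gliterals from the talk's IMDB example:

```python
# Minimal sketch of the data representation, using the talk's IMDB example.
# Under the closed world assumption, only the listed gliterals are true.
true_gliterals = {
    ("Actor", ("brando",)),
    ("Director", ("coppola",)),
    ("Movie", ("godFather", "brando")),
    ("Movie", ("godFather", "coppola")),
    ("WorkedFor", ("brando", "coppola")),
}

def holds(predicate, args):
    """Any gliteral not listed in the data is taken to be false."""
    return (predicate, args) in true_gliterals
```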

18
Markov Logic Networks (Richardson & Domingos, 2006)
  • Set of first-order clauses, each assigned a
    weight.
  • Larger weight indicates stronger belief that the
    clause should hold.
  • The clauses are called the structure of the MLN.

19
Markov Networks (Pearl, 1988)
  • A concise representation of the joint probability
    distribution of a set of random variables using
    an undirected graph.

[Example: an undirected graph over variables such as
"Reputation of Author" and "Quality of Paper",
annotated with their joint distribution]
The same probability distribution can be represented
as the product of a set of functions defined over
the cliques of the graph.
20
Markov Network Equations
  • General form
  • Log-linear models
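The standard equations (following Richardson & Domingos, 2006), where the phi_k are potentials over the cliques and the f_j are binary features with weights w_j:

```latex
% General (clique-potential) form:
P(X = x) \;=\; \frac{1}{Z} \prod_{k} \phi_k\big(x_{\{k\}}\big), \qquad
Z \;=\; \sum_{x'} \prod_{k} \phi_k\big(x'_{\{k\}}\big)

% Equivalent log-linear form:
P(X = x) \;=\; \frac{1}{Z} \exp\Big(\sum_{j} w_j f_j(x)\Big)
```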

21
Ground Markov Network for an MLN
  • MLNs are templates for constructing Markov
    networks for a given set of constants
  • Include a node for each type-consistent grounding
    (a gliteral) of each predicate in the MLN.
  • Two nodes are connected by an edge if their
    corresponding gliterals appear together in any
    grounding of any clause in the MLN.
  • Include a feature for each grounding of each
    clause in the MLN with weight equal to the weight
    of the clause.
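A minimal sketch of this grounding step, assuming simple Python data structures for the predicate signatures and constants (the helper names are illustrative, not Alchemy's API):

```python
from itertools import product

# Minimal sketch of the grounding step described above, using the constants
# from the example that follows (coppola, brando, godFather).
predicates = {
    "Actor": ("person",),
    "Director": ("person",),
    "WorkedFor": ("person", "person"),
    "Movie": ("title", "person"),
}
constants = {
    "person": ["brando", "coppola"],
    "title": ["godFather"],
}

def ground_nodes():
    """One network node per type-consistent grounding (gliteral) of each predicate."""
    for pred, arg_types in predicates.items():
        for args in product(*(constants[t] for t in arg_types)):
            yield (pred, args)

print(len(list(ground_nodes())))   # 10 nodes, matching the figure that follows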

22
  • Constants: coppola, brando, godFather
[Figure: the ground Markov network over these
constants, with nodes Actor(brando), Actor(coppola),
Director(brando), Director(coppola),
WorkedFor(brando, brando), WorkedFor(brando, coppola),
WorkedFor(coppola, brando), WorkedFor(coppola, coppola),
Movie(godFather, brando), Movie(godFather, coppola),
and clause weights 1.3, 1.2, 0.5]
23
MLN Equations
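The defining equation (Richardson & Domingos, 2006), where n_i(x) is the number of true groundings of clause i in world x, w_i is its weight, and Z normalizes over all possible worlds:

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\Big(\sum_{i} w_i\, n_i(x)\Big), \qquad
Z \;=\; \sum_{x'} \exp\Big(\sum_{i} w_i\, n_i(x')\Big)
```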
24
MLN Equation Intuition
  • A possible world (a truth assignment to all
    gliterals) becomes exponentially less likely as
    the total weight of all the grounded clauses it
    violates increases.
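For example, taking the 1.3-weight clause from the grounding example above: all else being equal, a world that violates one grounding of that clause is less likely than one that satisfies it by a factor of

```latex
e^{-1.3} \approx 0.27
```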

25
MLN Inference
  • Given truth assignments for a given set of
    evidence gliterals, infer the probability that
    each member of a set of unknown query gliterals
    is true.

26
[Figure: the ground network over the same gliterals as
before, with some nodes fixed as evidence and the
remaining query nodes inferred]
27
MLN Inference Algorithms
  • Gibbs Sampling (Richardson & Domingos, 2006)
  • MC-SAT (Poon & Domingos, 2006)
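A hedged sketch of Gibbs sampling over the query gliterals (not Alchemy's implementation; markov_blanket_energy is an assumed helper returning the total weight of satisfied ground clauses that contain the given gliteral):

```python
import math
import random

# Hedged sketch of Gibbs sampling for MLN inference (not Alchemy's code).
# markov_blanket_energy(x, g) is an assumed helper that returns the sum of
# weights of ground clauses containing gliteral g that are satisfied under x.
def gibbs_marginals(query, evidence, markov_blanket_energy, num_samples=1000):
    x = dict(evidence)                                   # evidence gliterals stay fixed
    x.update({g: random.random() < 0.5 for g in query})  # random initial state
    counts = {g: 0 for g in query}
    for _ in range(num_samples):                         # burn-in omitted for brevity
        for g in query:
            x[g] = True
            e_true = markov_blanket_energy(x, g)
            x[g] = False
            e_false = markov_blanket_energy(x, g)
            p_true = 1.0 / (1.0 + math.exp(e_false - e_true))
            x[g] = random.random() < p_true              # resample g given its Markov blanket
            counts[g] += x[g]
    return {g: counts[g] / num_samples for g in query}   # estimated P(g = true | evidence)
```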

28
MLN Learning
  • Weight-learning (Richardson & Domingos, 2006;
    Lowd & Domingos, 2007)
  • Performed using optimization methods.
  • Structure-learning (Kok & Domingos, 2005)
  • Proceeds in iterations of beam search, adding the
    best-performing clause after each iteration to
    the MLN.
  • Clauses are evaluated using the WPLL score.

29
WPLL (Kok & Domingos, 2005)
  • Weighted pseudo log-likelihood
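As defined by Kok & Domingos (2005), summing over the set of predicates R and the g_r groundings of each predicate r, with MB denoting the Markov blanket and c_r a per-predicate weight (typically 1/g_r, so every predicate contributes equally):

```latex
\mathrm{WPLL}(w, x) \;=\; \sum_{r \in R} c_r \sum_{k=1}^{g_r}
  \log P_w\big(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k})\big)
```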

30
Alchemy
  • Open-source package of MLN software provided by
    UW that includes
  • Inference algorithms
  • Weight learning algorithms
  • Structure learning algorithm
  • Sample data sets
  • All our software uses and extends Alchemy.

31
TAMAR (Transfer via Automatic Mapping And Revision)
[System diagram: the source MLN is mapped to the target
vocabulary (M-TAMAR) and then revised (R-TAMAR) using
the target (IMDB) data]
32
Predicate Mapping
  • Each clause is mapped independently of the
    others.
  • The algorithm considers all possible ways to map
    a clause such that
  • Each predicate in the source clause is mapped to
    some target predicate.
  • Each argument type in the source is mapped to
    exactly one argument type in the target.
  • Each mapped clause is evaluated by measuring its
    WPLL on the target data, and the most accurate
    mapping is kept (see the sketch below).
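A hedged sketch of the per-clause mapping search, assuming dictionaries of predicate signatures and an assumed wpll_of_mapping scoring stub (it would translate the clause and measure WPLL on the target data):

```python
from itertools import product

# Hedged sketch of the per-clause mapping search; names are illustrative.
def legal_mappings(clause_preds, source_sig, target_sig):
    """Yield (predicate_map, type_map) pairs that are consistent for one clause.

    clause_preds: distinct predicates appearing in the source clause.
    source_sig / target_sig: predicate name -> tuple of argument types.
    """
    for choice in product(target_sig, repeat=len(clause_preds)):
        pred_map, type_map, ok = {}, {}, True
        for src, tgt in zip(clause_preds, choice):
            if len(source_sig[src]) != len(target_sig[tgt]):
                ok = False                     # arities must match
                break
            pred_map[src] = tgt
            for s_type, t_type in zip(source_sig[src], target_sig[tgt]):
                if type_map.setdefault(s_type, t_type) != t_type:
                    ok = False                 # source type already mapped elsewhere
                    break
            if not ok:
                break
        if ok:
            yield pred_map, type_map

def best_mapping(clause_preds, source_sig, target_sig, wpll_of_mapping):
    """Keep the mapping whose translated clause scores highest on the target data."""
    return max(legal_mappings(clause_preds, source_sig, target_sig),
               key=wpll_of_mapping, default=None)
```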

33
Predicate Mapping Example
Consistent type mapping: title → name, person → person
34
Predicate Mapping Example 2
Consistent type mapping: title → person, person → gend
35
TAMAR (Transfer via Automatic Mapping And Revision)
[System diagram as above, now showing the revision step
applied to the mapped MLN using the target (IMDB) data]
36
Transfer Learning as Revision
  • Regard mapped source MLN as an approximate model
    for the target task that needs to be accurately
    and efficiently revised.
  • Thus our general approach is similar to that
    taken by theory revision systems (Richards &
    Mooney, 1995).
  • Revisions are proposed in a bottom-up fashion.

37
R-TAMAR
[Flow diagram: relational data feeds new clause
discovery, producing new candidate clauses that are
scored by their change in WPLL (e.g. 0.1, -0.2, 0.5,
1.7, 1.3)]
38
R-TAMAR Self-Diagnosis
  • Use the mapped source MLN to make inferences in
    the target domain and observe the behavior of
    each clause.
  • Consider each predicate P in the domain in turn.
  • Use Gibbs sampling to infer truth values for the
    gliterals of P, using the remaining gliterals as
    evidence.
  • Bin the clauses containing gliterals of P based
    on whether they behave as desired.
  • Revisions are focused only on clauses in the
    "Bad" bins.

39
Self-Diagnosis Clause Bins
Evidence: Actor(brando), Director(coppola),
Movie(godFather, brando), Movie(godFather, coppola),
Movie(rainMaker, coppola), WorkedFor(brando, coppola)
Current gliteral: Actor(brando)
  • Each clause containing the current gliteral is
    placed in one of four bins:
  • Relevant / Good
  • Relevant / Bad
  • Irrelevant / Good
  • Irrelevant / Bad
43
Structure Revisions
  • Using directed beam search
  • Literal deletions attempted only from clauses
    marked for shortening.
  • Literal additions attempted only for clauses
    marked for lengthening.
  • Training is much faster since search space is
    constrained by
  • Limiting the clauses considered for updates.
  • Restricting the type of updates allowed.

44
New Clause Discovery
  • Uses Relational Pathfinding (Richards & Mooney,
    1992), as sketched below.

[Figure: a hypergraph over the constants brando,
coppola, godFather, and rainMaker, whose edges are the
true gliterals Actor(brando), Director(coppola),
Movie(godFather, brando), Movie(godFather, coppola),
Movie(rainMaker, coppola), and WorkedFor(brando,
coppola); pathfinding links brando and coppola directly
via WorkedFor and indirectly through godFather via two
Movie edges]
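A hedged sketch of relational pathfinding over the data above (helper names are illustrative):

```python
from collections import defaultdict, deque

# Hedged sketch of relational pathfinding: treat the data as a hypergraph whose
# nodes are constants and whose edges are true gliterals, then search for short
# paths linking two constants of interest.
def build_graph(true_gliterals):
    graph = defaultdict(list)                  # constant -> gliterals mentioning it
    for pred, args in true_gliterals:
        for c in args:
            graph[c].append((pred, args))
    return graph

def find_paths(graph, start, goal, max_len=4):
    """Breadth-first search for gliteral paths connecting start to goal."""
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            yield path                         # a candidate path; variablize to get a clause
            continue
        if len(path) >= max_len:
            continue
        for pred, args in graph[node]:
            if (pred, args) in path:
                continue                       # do not reuse an edge
            for nxt in args:
                if nxt != node:
                    queue.append((nxt, path + [(pred, args)]))
```

On the figure's data, find_paths(build_graph(gliterals), "brando", "coppola") returns the direct WorkedFor edge and the two-Movie-edge path through godFather; variablizing the gliterals along a path produces a candidate clause.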
45
Weight Revision
Source clause:
Publication(T,A) ∧ AdvisedBy(A,B) ⇒ Publication(T,B)
Target (IMDB) data is used to revise the weights of the
mapped clauses:
Movie(T,A) ∧ WorkedFor(A,B) ⇒ Movie(T,B)
Movie(T,A) ∧ WorkedFor(A,B) ∧ Relative(A,B) ⇒ Movie(T,B)
46
Experiments: Domains
  • UW-CSE
  • Data about members of the UW CSE department
  • Predicates include Professor, Student, AdvisedBy,
    TaughtBy, Publication, etc.
  • IMDB
  • Data about 20 movies
  • Predicates include Actor, Director, Movie,
    WorkedFor, Genre, etc.
  • WebKB
  • Entity relations from the original WebKB domain
    (Craven et al. 1998)
  • Predicates include Faculty, Student, Project,
    CourseTA, etc.

47
Dataset Statistics
Data is organized as mega-examples
  • Each mega-example contains information about a
    group of related entities.
  • Mega-examples are independent and disconnected
    from each other.

48
Manually Developed Source KB
  • UW-KB is a hand-built knowledge base (set of
    clauses) for the UW-CSE domain.
  • When used as a source domain, transfer learning
    is a form of theory refinement that also includes
    mapping to a new domain with a different
    representation.

49
Systems Compared
  • TAMAR: Complete transfer system.
  • ScrKD: Algorithm of Kok & Domingos (2005),
    learning from scratch.
  • TrKD: Algorithm of Kok & Domingos (2005)
    performing transfer, using M-TAMAR to produce a
    mapping.

50
Methodology: Training and Testing
  • Generated learning curves using leave-one-out CV:
  • Each run keeps one mega-example for testing and
    trains on the remaining ones, provided one by
    one.
  • Curves are averages over all runs.
  • Evaluated learned MLN by performing inference for
    all gliterals of each predicate in turn,
    providing the rest as evidence, and averaging the
    results.
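A hedged sketch of this protocol, with train_and_eval standing in for MLN training and evaluation on a held-out mega-example:

```python
# Hedged sketch of the leave-one-out protocol over mega-examples described
# above; train_and_eval is an assumed callback that trains an MLN on the given
# mega-examples and returns its scores on the held-out test mega-example.
def learning_curve_runs(mega_examples, train_and_eval):
    results = []
    for i, test_ex in enumerate(mega_examples):
        train_pool = mega_examples[:i] + mega_examples[i + 1:]
        # training mega-examples are provided one by one to trace a learning curve
        for n in range(1, len(train_pool) + 1):
            results.append((i, n, train_and_eval(train_pool[:n], test_ex)))
    return results   # averaged over the runs (i) to produce the reported curves
```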

51
Methodology: Metrics (Kok & Domingos, 2005)
  • CLL: Conditional Log-Likelihood
  • The log of the probability predicted by the model
    that a gliteral has the correct truth value given
    in the data.
  • Averaged over all test gliterals.
  • AUC: Area under the precision-recall (PR) curve
  • Produce a PR curve by varying the probability
    threshold.
  • Compute the area under this curve.
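A hedged sketch of the two metrics in Python (assuming scikit-learn for the PR curve; this is not the evaluation code used in the talk):

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

# y_true holds 0/1 truth values of the test gliterals; p_true holds the model's
# predicted probability that each gliteral is true.
def cll(y_true, p_true, eps=1e-6):
    """Average conditional log-likelihood of the correct truth values."""
    y = np.asarray(y_true)
    p = np.clip(np.asarray(p_true), eps, 1 - eps)
    return float(np.mean(np.where(y == 1, np.log(p), np.log(1 - p))))

def pr_auc(y_true, p_true):
    """Area under the precision-recall curve obtained by varying the threshold."""
    precision, recall, _ = precision_recall_curve(y_true, p_true)
    return auc(recall, precision)
```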

52
Metrics to Summarize Curves
  • Transfer Ratio (Cohen et al., 2007)
  • Gives an overall idea of the improvement achieved
    over learning from scratch.
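One way to write it (our reading of Cohen et al., 2007; a value above 1.0 indicates a benefit over learning from scratch):

```latex
\text{Transfer Ratio} \;=\;
  \frac{\text{area under the learning curve with transfer}}
       {\text{area under the learning curve when learning from scratch}}
```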

53
Transfer Scenarios
  • Source/target pairs tested:
  • WebKB → IMDB
  • UW-CSE → IMDB
  • UW-KB → IMDB
  • WebKB → UW-CSE
  • IMDB → UW-CSE
  • WebKB is not used as a target since one
    mega-example is sufficient to learn an accurate
    theory for its limited predicate set.

54
(No Transcript)
55
(No Transcript)
56
Sample Learning Curve
[Figure: learning curves comparing ScrKD, TrKD with
hand mapping, TAMAR with hand mapping, TrKD, and TAMAR]
57
(No Transcript)
58
Future Research Issues
  • More realistic application domains.
  • Application to other SRL models (e.g. SLPs,
    BLPs).
  • More flexible predicate mapping:
  • Allow argument ordering or arity to change.
  • Map one predicate to a conjunction of more than
    one predicate, e.g.
  • AdvisedBy(X,Y) ↔ Movie(M,X) ∧ Director(M,Y)

59
Multiple Source Transfer
  • Transfer from multiple source problems to a given
    target problem.
  • Determine which clauses to map and revise from
    different source MLNs.

60
Source Selection
  • Select useful source domains from a large number
    of previously learned tasks.
  • Ideally, picking source domain(s) is sub-linear
    in the number of previously learned tasks.

61
Conclusions
  • Presented TAMAR, a complete transfer system for
    SRL that
  • Maps relational knowledge in the source to the
    target domain.
  • Revises the mapped knowledge to further improve
    accuracy.
  • Showed experimentally that TAMAR improves speed
    and accuracy over existing methods.

62
Questions?
  • Related papers at
  • http://www.cs.utexas.edu/users/ml/publication/transfer.html

63
Why MLNs?
  • Inherit the expressivity of first-order logic
  • Can apply insights from ILP
  • Inherit the flexibility of probabilistic
    graphical models
  • Can deal with noisy, uncertain environments
  • Undirected models
  • Do not need to learn causal directions
  • Subsume all other SRL models that are special
    cases of first-order logic or probabilistic
    graphical models (Richardson & Domingos, 2004)
  • Publicly available software package: Alchemy

64
Predicate Mapping Comments
  • A particular source predicate can be mapped to
    different target predicates in different clauses.
  • This makes our approach context-sensitive.
  • It is also more scalable:
  • In the worst case, the number of mappings is
    exponential in the number of predicates.
  • The number of predicates in a clause is generally
    much smaller than the total number of predicates
    in a domain.

65
Relationship to the Structure Mapping Engine
(Falkenhainer et al., 1989)
  • A system for mapping relations using analogy,
    based on a psychological theory.
  • Mappings are evaluated based only on the
    structural relational similarity between the two
    domains.
  • Does not consider the accuracy of the mapped
    knowledge in the target when determining the
    preferred mapping.
  • Determines a single global mapping for a given
    source and target.

66
Summary of Methodology
  • Learn MLNs for each point on the learning curve.
  • Perform inference over the learned models.
  • Summarize inference results using two metrics,
    CLL and AUC, thus producing two learning curves.
  • Summarize each learning curve using the transfer
    ratio and the percentage improvement from one
    mega-example.

67
(No Transcript)
68
(No Transcript)