Machine Learning and ILP for Multi-Agent Systems

Daniel Kudenko & Dimitar Kazakov, Department of Computer Science, University of York, UK


1
Machine Learning and ILP for Multi-Agent Systems
  • Daniel Kudenko & Dimitar Kazakov
  • Department of Computer Science
  • University of York, UK

2
Why Learning Agents?
  • Agent designers are not able to foresee all
    situations that the agent will encounter.
  • To display full autonomy, agents need to learn
    from and adapt to novel environments.
  • Learning is a crucial part of intelligence.

3
A Brief History
[Diagram. Machine Learning strand: Disembodied ML, Single-Agent Learning, Multiple Single-Agent Learners, Social Multi-Agent Learners. Agents strand: Single-Agent System, Multiple Single-Agent System, Social Multi-Agent System.]

4
Outline
  • Principles of Machine Learning (ML)
  • ML for Single Agents
  • ML for Multi-Agent Systems
  • Inductive Logic Programming for Agents

5
What is Machine Learning?
  • Definition: A computer program is said to learn
    from experience E with respect to some class of
    tasks T and performance measure P, if its
    performance at tasks in T, as measured by P,
    improves with experience E. [Mitchell 97]
  • Example: T = play tennis, E = playing
    matches, P = score

6
Types of Learning
  • Inductive Learning (Supervised Learning)
  • Reinforcement Learning
  • Discovery (Unsupervised Learning)

7
Inductive Learning
  • An inductive learning system aims at
    determining a description of a given concept from
    a set of concept examples provided by the teacher
    and from background knowledge. [Michalski et al.
    98]

8
Inductive Learning
[Diagram: Examples of Category C1, Examples of Category C2, ..., Examples of Category Cn → Inductive Learning System → Hypothesis (Procedure to Classify New Examples)]

9
Inductive Learning Example
Ammo   Monster   Light    Category
low    near      good     shoot
low    far       medium   shoot
high   far       good     shoot

→ Inductive Learning System →

If (Ammo = high) and (Light ∈ {medium, good})
then shoot ...

10
Performance Measure
  • Classification accuracy on unseen test set.
  • Alternatively, a measure that incorporates the cost of
    false positives and false negatives (e.g.,
    recall/precision).
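
The two kinds of measure on this slide can be sketched as follows. This is a minimal illustration: the labels "shoot" and "hold" and the test-set data are hypothetical and not taken from the slides.

```python
def precision_recall(predicted, actual, positive="shoot"):
    """Return (precision, recall) for one positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)  # true positives
    fp = sum(p == positive and a != positive for p, a in pairs)  # false positives
    fn = sum(p != positive and a == positive for p, a in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical unseen test set, for illustration only.
actual    = ["shoot", "shoot", "hold", "hold",  "shoot"]
predicted = ["shoot", "hold",  "hold", "shoot", "shoot"]

accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
precision, recall = precision_recall(predicted, actual)
```

Accuracy treats both error types alike, while precision and recall separate false positives from false negatives, which matters when their costs differ.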

11
Where's the Knowledge?
  • Example (or Object) language
  • Hypothesis (or Concept) language
  • Learning bias
  • Background knowledge

12
Example Language
  • Feature-value vectors, logic programs.
  • Which features are used to represent examples
    (e.g., ammunition left)?
  • For agents: which features of the environment are
    fed to the agent (or the learning module)?
  • Constructive induction: automatic feature
    selection, construction, and generation.

13
Hypothesis Language
  • Decision trees, neural networks, logic programs, ...
  • Further restrictions may be imposed, e.g., depth
    of decision trees, form of clauses.
  • Choice of hypothesis language influences choice
    of learning methods and vice versa.

14
Learning bias
  • Preference relation between legal hypotheses.
  • Accuracy on training set.
  • Hypothesis with zero error on training data is
    not necessarily the best (noise!).
  • Occam's razor: the simpler hypothesis is the
    better one.

15
Inductive Learning
  • No real learning without language or learning
    bias.
  • IL is search through space of hypotheses guided
    by bias.
  • Quality of hypothesis depends on proper
    distribution of training examples.

16
Inductive Learning for Agents
  • What is the target concept (i.e., categories)?
  • Example: do(a), ¬do(a) for a specific action a.
  • Real-valued categories/actions can be
    discretized.
  • Where does the training data come from and what
    form does it take?

17
Batch vs Incremental Learning
  • Batch learning: collect a set of training
    examples and compute a hypothesis.
  • Incremental learning: update the hypothesis with each
    new training example.
  • Incremental learning is more suited to agents.

18
Batch Learning for Agents
  • When should (re-)computation of hypothesis take
    place?
  • Example: after the experienced accuracy of the hypothesis
    drops below a threshold.
  • Which training examples should be used?
  • Example: sequences of actions that led to
    success.

19
Eager vs. Lazy learning
  • Eager learning: commit to the hypothesis computed
    after training.
  • Lazy learning: store all encountered examples and
    perform classification based on this database
    (e.g., nearest neighbour).
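
A minimal sketch of a lazy learner, assuming a 1-nearest-neighbour rule with Hamming distance over symbolic features. The class name, the feature tuples and the categories are hypothetical.

```python
class NearestNeighbour:
    """Lazy learner: stores examples and classifies by the closest stored one."""

    def __init__(self):
        self.examples = []  # list of (feature_tuple, category)

    def observe(self, features, category):
        # "Learning" is only storage; no hypothesis is computed here.
        self.examples.append((tuple(features), category))

    def classify(self, features):
        def distance(stored):
            # Hamming distance: number of differing feature values.
            return sum(a != b for a, b in zip(stored[0], features))
        return min(self.examples, key=distance)[1]

learner = NearestNeighbour()
learner.observe(("low", "near", "good"), "shoot")
learner.observe(("high", "far", "dark"), "hide")
prediction = learner.classify(("low", "near", "medium"))
```

All the work happens in `classify`, which is why lazy learning is cheap to train but may be slow at query time.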

20
Active Learning
  • Learner decides which training data to receive
    (i.e. generates training examples and uses oracle
    to classify them).
  • Closed-loop ML: the learner suggests a hypothesis and
    verifies it experimentally. If the hypothesis is
    rejected, the collected data gives rise to a new
    hypothesis.

21
Black-Box vs. White-Box
  • Black-box learning: interpretation of the
    learning result is unclear to a user.
  • White-box learning: creates (symbolic) structures
    that are comprehensible.

22
Reinforcement Learning
  • Agent learns from environmental feedback
    indicating the benefit of states.
  • No explicit teacher required.
  • Learning target: optimal policy (i.e.,
    state-action mapping).
  • Optimality measure: e.g., cumulative discounted
    reward.

23
Q Learning
Value of a state: discounted cumulative reward
$V^{\pi}(s_t) = \sum_{i \ge 0} \gamma^{i} \, r(s_{t+i}, a_{t+i})$
$0 \le \gamma < 1$ is a discount factor ($\gamma = 0$ means that only immediate reward is considered).
$r(s_{t+i}, a_{t+i})$ is the reward determined by performing actions specified by policy $\pi$.
$Q(s,a) = r(s,a) + \gamma V^{*}(\delta(s,a))$
Optimal policy: $\pi^{*}(s) = \arg\max_{a} Q(s,a)$
24
Q Learning
Initialize all Q(s,a) to 0.
In some state s, choose some action a. Let s' be the resulting state.
Update Q: $Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$
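
The procedure on this slide can be sketched as below, assuming a deterministic toy world: a hypothetical four-state corridor with a reward of 1 for reaching the last state. The environment, the constants and the purely random action choice are illustrative assumptions, not part of the original slides.

```python
import random

GAMMA = 0.9                 # discount factor
GOAL = 3                    # hypothetical corridor of states 0, 1, 2, 3 (goal)
STATES = range(GOAL + 1)
ACTIONS = ("left", "right")

def step(state, action):
    """Deterministic transition and reward for the toy corridor."""
    nxt = max(state - 1, 0) if action == "left" else min(state + 1, GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # initialise all Q(s,a) to 0
    for _ in range(episodes):
        s = rng.randrange(GOAL)                 # start in a non-goal state
        while s != GOAL:
            a = rng.choice(ACTIONS)             # choose some action (random exploration)
            s2, r = step(s, a)                  # s2 is the resulting state
            q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            s = s2
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

In a stochastic environment the update would normally blend old and new estimates with a learning rate; the plain assignment used here matches the deterministic rule on the slide.
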
25
Q Learning
  • Guaranteed convergence towards optimum
    (state-action pairs have to be visited infinitely
    often).
  • Exploration strategy can speed up convergence.
  • Basic Q learning does not generalize: replace the
    state-action table with function approximation
    (e.g., a neural net) in order to handle unseen
    states.

26
Pros and Cons of RL
  • Clearly suited to agents acting and exploring an
    environment.
  • Simple.
  • Engineering of suitable reward function may be
    tricky.
  • May take a long time to converge.
  • Learning result may not be transparent (depending
    on the representation of the Q function).

27
Combination of IL and RL
  • Relational reinforcement learning [Džeroski et
    al. 98] leads to a more general Q function
    representation that may still be applicable even
    if the goals or environment change.
  • Explanation-based learning and RL [Dietterich and
    Flann, 95].
  • More on ILP and RL: see later.

28
Unsupervised Learning
  • Acquisition of useful or interesting patterns
    in input data.
  • Usefulness and interestingness are based on
    the agent's internal bias.
  • Agent does not receive any external feedback.
  • Discovered concepts are expected to improve agent
    performance on future tasks.

29
Learning and Verification
  • Need to guarantee agent safety.
  • Pre-deployment verification for non-learning
    agents.
  • What to do with learning agents?

30
Learning and Verification [Gordon 00]
  • Verification after each self-modification step.
  • Problem: time-consuming.
  • Solution 1: use property-preserving learning
    operators.
  • Solution 2: use learning operators which permit
    quick (partial) re-verification.

31
Learning and Verification
  • What to do if verification fails?
  • Repair (multi)-agent plan.
  • Choose different learning operator.

32
Learning in Multi-Agent Systems
  • Classification
  • Social Awareness.
  • Communication
  • Role Learning.
  • Distributed Learning.

33
Types of Multi-Agent Learning [Weiss & Dillenbourg 99]
  • Multiplied learning: no interference in the
    learning process by other agents (except for
    exchange of training data or outputs).
  • Divided learning: division of the learning task on
    a functional level.
  • Interacting learning: cooperation beyond the pure
    exchange of data.

34
Social Awareness
  • Awareness of existence of other agents and
    (eventually) knowledge about their behavior.
  • Not necessary to achieve near-optimal MAS
    behavior: rock sample collection [Steels 89].
  • Can it degrade performance?

35
Levels of Social Awareness [Vidal & Durfee 97]
  • 0-level agent: no knowledge about the existence of
    other agents.
  • 1-level agent: recognizes that other agents
    exist; models other agents as 0-level agents.
  • 2-level agent: has some knowledge about the behavior
    of other agents; models other
    agents as 1-level agents.
  • k-level agent: models other agents as (k-1)-level agents.

36
Social Awareness and Q Learning
  • 0-level agents already learn implicitly about
    other agents.
  • [Mundhe and Sen, 00]: study of two Q learning
    agents up to level 2.
  • Two 1-level agents display slowest and least
    effective learning (worse than two 0-level
    agents).

37
Agent models and Q Learning
  • $Q: S \times A^{n} \to \mathbb{R}$, where n is the number of agents.
  • If other agents' actions are not observable, an
    assumption about the actions of other agents is needed.
  • Pessimistic assumption: given an agent's action
    choice, other agents will minimize reward.
  • Optimistic assumption: other agents will maximize
    reward.
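
The two assumptions can be sketched for a single state and two agents. The joint Q values and the action names below are hypothetical, chosen only so that the two assumptions lead to different choices.

```python
# Hypothetical joint-action Q values for one state: (own_action, other_action) -> value.
joint_q = {
    ("push", "push"): 10.0, ("push", "wait"): -5.0,
    ("wait", "push"): 2.0,  ("wait", "wait"): 1.0,
}
OWN_ACTIONS = ("push", "wait")
OTHER_ACTIONS = ("push", "wait")

def choose(joint_q, assume):
    """Score each own action by assume() over the other agent's actions, pick the best."""
    score = {a: assume(joint_q[(a, b)] for b in OTHER_ACTIONS) for a in OWN_ACTIONS}
    return max(score, key=score.get)

pessimistic_choice = choose(joint_q, min)  # the other agent is assumed to minimise reward
optimistic_choice = choose(joint_q, max)   # the other agent is assumed to maximise reward
```

With these numbers the pessimistic agent picks the safe action and the optimistic agent picks the action with the highest attainable joint value.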

38
Agent Models and Q Learning
  • Pessimistic Assumption leads to overly cautious
    behavior.
  • Optimistic assumption guarantees convergence
    towards the optimum [Lauer & Riedmiller 00].
  • If knowledge of other agents' behavior is available,
    the Q value update can be based on probabilistic
    computation [Claus and Boutilier 98]. But there is no
    guarantee of optimality.

39
Q Learning and Communication [Tan 93]
  • Types of communication:
  • Sharing sensation
  • Sharing or merging policies
  • Sharing episodes
  • Results:
  • Communication generally helps
  • Extra sensory information may hurt

40
Role Learning
  • Often useful for agents to specialize in specific
    roles for joint tasks.
  • Pre-defined roles reduce flexibility, often not
    easy to define optimal distribution, may be
    expensive.
  • How to learn roles?
  • [Prasad et al. 96]: learn the optimal distribution of
    pre-defined roles.

41
Q Learning of roles
  • [Crites & Barto 98]: elevator domain, regular Q
    learning; no specialization achieved (but highly
    efficient behavior).
  • [Ono & Fukumoto 96]: hunter-prey domain;
    specialization achieved with the greatest mass
    merging strategy.

42
Q Learning of Roles [Balch 99]
  • Three types of reward function: local
    performance-based, local shaped, global.
  • Global reward supports specialization.
  • Local reward supports emergence of homogeneous
    behaviors.
  • Some domains benefit from learning team
    heterogeneity (e.g., robotic soccer), others do
    not (e.g., multi-robot foraging).
  • Heterogeneity measure: social entropy.
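
The slide gives no formula for social entropy. The sketch below assumes the simple form, Shannon entropy over the proportions of agents in each behavioural group; the role labels are hypothetical.

```python
import math
from collections import Counter

def social_entropy(roles):
    """Shannon entropy (in bits) of the distribution of agents over behavioural groups."""
    total = len(roles)
    return -sum((n / total) * math.log2(n / total) for n in Counter(roles).values())

homogeneous_team = ["forager"] * 4                          # every agent behaves alike
mixed_team = ["attacker", "attacker", "defender", "defender"]

h_homogeneous = social_entropy(homogeneous_team)
h_mixed = social_entropy(mixed_team)
```

Under this assumption a fully homogeneous team scores 0, and the score grows as agents spread evenly over more distinct behaviours.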

43
Distributed Learning
  • Motivation: agents learning a global hypothesis
    from local observations.
  • Application of MAS techniques to (inductive)
    learning.
  • Applications: Distributed Data Mining [Provost &
    Kolluri 99], Robotic Soccer.

44
Distributed Data Mining
  • [Provost & Hennessy 96]: individual learners see
    only a subset of all training examples and compute
    a set of local rules based on these.
  • Local rules are evaluated by other learners based
    on their data.
  • Only rules with good evaluation are carried over
    to the global hypothesis.
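
The scheme described above can be sketched as follows. This is not the published algorithm: the accuracy-based evaluation, the threshold, the learner names and the numeric toy data are all assumptions made for illustration.

```python
def accuracy(rule, data):
    """Fraction of (x, label) pairs on which the rule predicts the label."""
    return sum(rule(x) == label for x, label in data) / len(data)

def global_hypothesis(local_rules, partitions, threshold=0.8):
    """Keep a local rule only if every other learner rates it well on its own data."""
    kept = []
    for owner, rules in local_rules.items():
        others = [d for name, d in partitions.items() if name != owner]
        for rule_name, rule in rules:
            if all(accuracy(rule, d) >= threshold for d in others):
                kept.append(rule_name)
    return kept

# Hypothetical data: each learner sees only its own subset of the examples.
partitions = {
    "A": [(1, False), (7, True), (9, True)],
    "B": [(2, False), (6, True), (4, False)],
}
# Hypothetical local rules proposed by each learner.
local_rules = {
    "A": [("x > 5", lambda x: x > 5), ("x > 6", lambda x: x > 6)],
    "B": [("x > 3", lambda x: x > 3)],
}
kept = global_hypothesis(local_rules, partitions)
```

Here the rule "x > 6" fits learner A's data but is rejected because it misclassifies one of learner B's examples.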

45
B R E A K
46
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

48
From Machine Learning to Learning Agents
[Diagram: Classic Machine Learning, Active Learning, and Closed Loop Machine Learning placed on a scale between "Machine Learning: learning as the only goal" and "Learning Agent(s): learning as one of many goals".]

49
Integrating Machine Learning into the Agent
Architecture
  • Time constraints on learning
  • Synchronisation between agents' actions
  • Learning and Recall

50
Time Constraints on Learning
  • Machine Learning alone:
  • predictive accuracy matters, time doesn't (just a
    price to pay)
  • ML in Agents:
  • Soft deadlines: resources must be shared with
    other activities (perception, planning, control)
  • Hard deadlines: imposed by the environment. "Make up
    your mind now!" (or they'll eat you)

51
Doing Eager vs. Lazy Learning under Time Pressure
  • Eager Learning
  • Theories typically more compact
  • and faster to use
  • Takes more time to learn: do it when the agent
    is idle
  • Lazy Learning
  • Knowledge acquired at (almost) no cost
  • May be much slower when a test example comes

52
Clear-cut vs. Any-time Learning
  • Consider two types of algorithms:
  • Clear-cut algorithms: running a prescribed number of
    steps guarantees finding a solution
  • can use worst-case complexity analysis to find an
    upper bound on the execution time
  • Any-time algorithms:
  • a longer run may result in a better solution
  • don't know an optimal solution when they see one
  • example: Genetic Algorithms
  • policies: halt learning to meet hard deadlines or
    when cost outweighs expected improvements of
    accuracy
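
A minimal sketch of the any-time idea, assuming a simple randomised hill climber in place of a genetic algorithm and a step budget in place of a wall-clock deadline. The objective function and the neighbourhood move are hypothetical.

```python
import random

def anytime_search(score, start, neighbour, max_steps, seed=0):
    """Improve a solution until the step budget (the 'deadline') runs out."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    history = [best_score]              # best score available after each step
    for _ in range(max_steps):
        candidate = neighbour(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history

def score(x):
    return -(x - 3.0) ** 2              # hypothetical objective, maximal at x = 3

def neighbour(x, rng):
    return x + rng.uniform(-1.0, 1.0)   # small random move

short_best, short_history = anytime_search(score, 10.0, neighbour, max_steps=5)
long_best, long_history = anytime_search(score, 10.0, neighbour, max_steps=500)
```

The search can be stopped after any step and still return its best solution so far, and a longer run never returns a worse one. It has no way of telling whether that solution is optimal.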

53
Time Constraints on Learning in Simulated
Environments
  • Consider various cases:
  • Unlimited time for learning
  • Upper bound on time for learning
  • Learning in real time
  • Gradually tightening the constraints makes
    integration easier
  • Not limited to simulations: real-world problems
    have a similar setting
  • e.g., various types of auctions

54
Synchronisation and Time Constraints
[Table: rows are synchronisation modes; columns are time constraints on learning (unlimited time, upper bound, real time).]
  • 1-move-per-round, batch update: Logic-based MAS for conflict simulations (Kudenko, Alonso)
  • 1-move-per-round, immediate update: The York MA Environment (Kazakov et al.)
  • Asynchronous: Multi-agent Progol (Muggleton)

55
Learning and Recall
  • Agent must strike a balance between
  • Learning, which updates the model of the world
  • Recall, which applies existing model of the world
    to other tasks

56
Learning and Recall (2)
[Diagram steps: Recall current model of world to choose and carry out an action; Observe the action outcome; Update sensory information; Learn new model of the world.]

57
Learning and Recall (3)

[Diagram steps: Update sensory information; Recall current model of world to choose and carry out an action; Learn new model of the world.]
  • In theory, the two can run in parallel
  • In practice, must share limited resources

58
Learning and Recall (4)
  • Possible strategies
  • Parallel learning and recall at all times
  • Mutually exclusive learning and recall
  • After incremental, eager learning, examples are
    discarded
  • or kept if batch or lazy learning used
  • Cheap on-the-fly learning (preprocessing),
    off-line computationally expensive learning
  • reduce raw information, change object language
  • analogy with human learning and the role of sleep

59
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

60
Machine Learning Revisited
  • ML can be seen as the task of
  • taking a set of observations represented in a
    given object/data language and
  • representing (the information in) that set in
    another language called concept/hypothesis
    language.
  • A side effect of this step: the ability to deal
    with unseen observations.

61
Object and Concept Language
  • Object language: (x, y, +/-).
  • Concept language: any ellipse (5 param.)
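
A sketch of what one hypothesis in this concept language could look like, assuming the five parameters are the centre, the two semi-axes and a rotation angle, and that points inside the ellipse are labelled "+". The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class EllipseHypothesis:
    cx: float      # centre, x coordinate
    cy: float      # centre, y coordinate
    a: float       # semi-axis along the rotated x direction
    b: float       # semi-axis along the rotated y direction
    theta: float   # rotation angle in radians

    def classify(self, x, y):
        """Label a point '+' if it lies inside or on the ellipse, '-' otherwise."""
        dx, dy = x - self.cx, y - self.cy
        # Rotate the point into the ellipse's own coordinate frame.
        u = dx * math.cos(self.theta) + dy * math.sin(self.theta)
        v = -dx * math.sin(self.theta) + dy * math.cos(self.theta)
        return "+" if (u / self.a) ** 2 + (v / self.b) ** 2 <= 1.0 else "-"

h = EllipseHypothesis(cx=0.0, cy=0.0, a=2.0, b=1.0, theta=0.0)
labels = [h.classify(1.5, 0.0), h.classify(0.0, 1.5)]
```

Observations are stored as (x, y, label) triples, while the hypothesis is five numbers that can also label points never seen before.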

62
Machine Learning Biases
  • The concept/hypothesis language specifies the
    language bias, which limits the set of all
    concepts/hypotheses that can be
    expressed/considered/learned.
  • The preference bias allows us to decide between
    two hypotheses if they both classify the training
    data equally.
  • The search bias defines the order in which
    hypotheses will be considered.
  • Important if one does not search the whole
    hypothesis space.

63
Preference Bias, Search Bias & Version Space
  • Version space: the subset of hypotheses that have
    zero training error.


[Figure: the version space, bounded by the most general concept and the most specific concept.]

64
Inductive Logic Programming
  • Based on three pillars
  • Logic Programming (LP) to represent data and
    concepts (i.e., object and concept language)
  • Background Knowledge to extend the concept
    language
  • Induction as learning method

65
LP as ILP Object Language
  • A subset of First Order Predicate Logic (FOPL)
    called Logic Programming.
  • Often limited to ground facts, i.e.,
    propositional logic (cf. ID3 etc.).
  • In the latter case, data can be represented as a
    single table.

66
ILP Object Language Example
Good bargain cars                        ILP representation
model      mileage   price   y/n
BMW Z3     50,000    5000    +     gbc(z3,50000,5000).
Audi V8    30,000    4000    +     gbc(v8,30000,4000).
Fiat Uno   90,000    3000    -     :- gbc(uno,90000,3000).

67
LP as ILP Concept Language
  • The concept language of ILP is relations
    expressed as Horn clauses, e.g.:
  • equal(X,X).  greater(X,Y) :- X > Y.
  • Cf. propositional logic representation: (arg1=1 &
    arg2=1) or (arg1=2 & arg2=2) ...
  • Tedious for finite domains and impossible
    otherwise.
  • Most often there is one target predicate
    (concept) only.
  • Exceptions exist, e.g., Progol 5.

68
Modes in ILP
  • Used to distinguish between
  • input attributes (mode +)
  • output attributes (mode -) of the predicate
    learned.
  • Mode # is used to describe attributes that must
    contain a constant in the predicate definition.
  • E.g., use mode car_type(+,+,#) to
    learn car_type(Doors,Roof,sports_car) :- Doors
    =< 2, Roof = convertible.

71
Modes in ILP
  • Used to distinguish between
  • input attributes (mode +)
  • output attributes (mode -) of the predicate
    learned.
  • Mode # is used to describe attributes that must
    contain a constant in the predicate definition.
  • E.g., use mode car_type(-,-,#) to
    learn car_type(Doors,Roof,sports_car) :- (Doors
    = 1 ; Doors = 2), Roof = convertible.

72
Types in ILP
  • Specify the range for each argument
  • User-defined types are represented as unary
    predicates: colour(blue). colour(red).
    colour(black).
  • Built-in types are also provided: nat/1, real/1,
    any/1 in Progol.
  • These definitions may or may not be generative:
    colour(X) instantiates X, nat(X) does not.

73
ILP Types and Modes Example
Good bargain cars                        ILP representation (Progol)
model      mileage   price   y/n   modeh(1,gbc(+model,+mileage,+price))?
BMW Z3     50,000    5000    +     gbc(z3,50000,5000).
Audi V8    30,000    4000    +     gbc(v8,30000,4000).
Fiat Uno   90,000    3000    -     :- gbc(uno,90000,3000).

74
Positive Only Learning
  • A way of dealing with domains where no negative
    examples are available.
  • Learn the concept of non-self-destructive
    actions.
  • The trivial definition "Anything belongs to the
    target concept" looks all right!
  • Trick: generate random examples and treat them as
    negative.
  • Requires generative type definitions.

75
Background Knowledge
  • Only very simple math. relations, such as
    identity and greater than, have been used so
    far: equal(X,X).  greater(X,Y) :- X > Y.
  • These can also be easily hard-wired in the
    concept language of propositional learners.
  • ILP's big advantage: one can extend the concept
    language with user-defined concepts or background
    knowledge.

76
Background Knowledge (2)
  • The use of certain BK predicates may be a
    necessary condition for learning the right
    hypothesis.
  • Redundant or irrelevant BK slows down the
    learning.
  • Example:
  • BK: prod(Miles,Price,Threshold) :-
    Miles * Price < Threshold.
  • Modes: modeh(1,gbc(+model,+miles,+price))?
    modeb(1,prod(+miles,+price,#threshold))?
  • Th: gbc(z3,Miles,Price) :-
    prod(Miles,Price,250000001).
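
The learned theory can be checked against the three cars from the earlier table with a small sketch. It assumes the background predicate compares the product of mileage and price with the threshold using a strict "less than", and it ignores the model argument of gbc; both are simplifying assumptions.

```python
def prod(miles, price, threshold):
    """Background knowledge: true if miles * price is below the threshold."""
    return miles * price < threshold

def gbc(miles, price):
    """Learned theory: a car is a good bargain if prod holds for 250000001."""
    return prod(miles, price, 250_000_001)

# (mileage, price, labelled as good bargain?) taken from the example table.
cars = {
    "z3":  (50_000, 5_000, True),
    "v8":  (30_000, 4_000, True),
    "uno": (90_000, 3_000, False),
}
covered = {model: gbc(miles, price) for model, (miles, price, _) in cars.items()}
consistent = all(covered[m] == label for m, (_, _, label) in cars.items())
```

The threshold is one above the Z3's product of 250,000,000, so the theory covers both positive examples and excludes the negative one.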

77
Choice of Background Knowledge
  • In an ideal world one should start from a
    complete model of the background knowledge of the
    target population. In practice, even with the
    most intensive anthropological studies, such a
    model is impossible to achieve. We do not even
    know what it is that we know ourselves. The best
    that can be achieved is a study of the directly
    relevant background knowledge, though it is only
    when a solution is identified that one can know
    what is or is not relevant.
  • The Critical Villager, Eric Dudley

78
ILP Preference Bias
  • Typically a trade-off between generality and
    complexity
  • cover as many positive examples (and as few
    negative ones) as you can
  • with as simple a theory as possible
  • Some ILP learners allow the users to specify
    their own preference bias.

79
Induction in ILP
  • Bottom-up (least general generalisation)
  • Map a term into a variable
  • Drop a literal from the clause body
  • Top-down (refinement operator)
  • Instantiate a variable
  • Add a literal to the clause body
  • Mixed techniques (e.g., Progol)

80
Example of Induction
BK: q(b). q(c).
Training examples: p(b,a). p(f,g). :- p(i,j).
Candidate clauses:
p(X,Y).
p(X,a).
p(X,Y) :- q(X).
p(b,a) :- q(b).
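
To see how the candidate clauses differ, the sketch below counts the positive and negative training examples each one covers. Encoding clauses as Python predicates is an illustrative shortcut, not how an ILP system represents them.

```python
BK_Q = {"b", "c"}                      # background knowledge: q(b). q(c).
POSITIVES = [("b", "a"), ("f", "g")]   # p(b,a). p(f,g).
NEGATIVES = [("i", "j")]               # :- p(i,j).

# Each candidate clause as a test on the arguments (x, y) of p/2.
CANDIDATES = {
    "p(X,Y).":         lambda x, y: True,
    "p(X,a).":         lambda x, y: y == "a",
    "p(X,Y) :- q(X).": lambda x, y: x in BK_Q,
    "p(b,a) :- q(b).": lambda x, y: (x, y) == ("b", "a") and "b" in BK_Q,
}

def coverage(clause):
    """Return (positives covered, negatives covered) for one clause."""
    pos = sum(clause(x, y) for x, y in POSITIVES)
    neg = sum(clause(x, y) for x, y in NEGATIVES)
    return pos, neg

table = {text: coverage(clause) for text, clause in CANDIDATES.items()}
```

Only the most general clause covers the negative example; each of the more specific clauses covers one positive example and no negatives.
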
81
Induction in Progol
  • For each training example:
  • Find the most general theory (clause) ⊤
  • Find the most specific theory (clause) ⊥
  • Search the space in between in a top-down fashion

⊤ = p(X,Y).   ⊥ = p(X,a) :- q(X).
In between: p(X,a).   p(X,Y) :- q(X).
82
Summary of ILP Basics
  • Symbolic
  • Eager
  • Knowledge-oriented (white-box) learner
  • Complex, flexible hypothesis space
  • Based on Induction

83
Learning Pure Logic Programs vs. Decision Lists
  • Pure logic programs: the order of clauses is
    irrelevant, and they must not contradict each
    other.
  • Decision lists: the concept language includes the
    predicate cut (!).
  • The use of decision lists can make for simpler
    (more concise) theories.

84
Decision List Example
  • action(Cat,ObservedAnimal,Action).
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal),
    owner(Owner,Cat), !.
  • action(Cat,Animal,run) :- dog(Animal), !.
  • action(Cat,Animal,stay).

85
Updating Decision Lists with Exceptions
  • action(Cat,caesar,run) :- !.
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal),
    owner(Owner,Cat), !.
  • action(Cat,Animal,run) :- dog(Animal), !.
  • action(Cat,Animal,stay).

86
Updating Decision Lists with Exceptions
  • Could be very beneficial in agents when immediate
    updating of the agent's knowledge is important:
    just add the exception at the top of the list.
  • Computationally inexpensive: does not need to
    modify the rest of the list.
  • Exceptions could be compiled into rules when the
    agent is inactive.
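
The update-by-exception idea can be sketched with an ordered rule list in which the first matching rule decides, mimicking the cut. The facts about dogs and owners, and the cat named tom, are hypothetical stand-ins for dog/1 and owner/2.

```python
# Hypothetical facts standing in for dog/1 and owner/2.
DOGS = {"caesar", "blackie"}
OWNER = {"caesar": "richard", "blackie": "daniel", "tom": "richard"}

def same_owner_dog(cat, animal):
    return animal in DOGS and animal in OWNER and OWNER[animal] == OWNER.get(cat)

def any_dog(cat, animal):
    return animal in DOGS

def default(cat, animal):
    return True

# Ordered (condition, action) rules; the first match wins, like a clause ending in a cut.
decision_list = [(same_owner_dog, "stay"), (any_dog, "run"), (default, "stay")]

def action(rules, cat, animal):
    for condition, result in rules:
        if condition(cat, animal):
            return result

def add_exception(rules, animal_name, result):
    """Prepend an exception; the rest of the list is left untouched."""
    rules.insert(0, (lambda cat, animal: animal == animal_name, result))

before = action(decision_list, "tom", "caesar")
add_exception(decision_list, "caesar", "run")
after = action(decision_list, "tom", "caesar")
```

Adding the exception is a single insertion at the head of the list, which is why it suits agents that must update their knowledge immediately.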

87
Replacing Exceptions with Rules Before
  • action(Cat,caesar,run) :- !.
  • action(Cat,rex,run) :- !.
  • action(Cat,rusty,run) :- !.
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal),
    owner(Owner,Cat), !.

88
Replacing Exceptions with Rules After
  • action(Cat,Animal,run) :- dog(Animal),
    owner(richard,Animal), !.
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal),
    owner(Owner,Cat), !.

89
Eager ILP vs. Analogical Prediction
  • Eager learning: learn theory, dispose of
    observations.
  • Lazy Learning
  • keep all observations
  • compare new with old ones to classify
  • no explanation provided.
  • Analogical Prediction (Muggleton, Bain 98)
  • Combines the often higher accuracy of lazy
    learning with an intelligible, explicit
    hypothesis typical for ILP
  • Constructs a local theory for each new
    observation that is consistent with the largest
    number of training examples.

90
Analogical Prediction Example
  • owner(richard,caesar).
  • action(Cat,caesar,run).
  • owner(richard,rex).
  • action(Cat,rex,run).
  • owner(daniel,blackie).
  • action(Cat,blackie,stay).
  • owner(richard,rusty).
  • action(Cat,rusty,?).

91
Analogical Prediction Example
  • owner(richard,caesar).
  • action(Cat,caesar,run).
  • owner(richard,rex).
  • action(Cat,rex,run).
  • owner(daniel,blackie).
  • action(Cat,blackie,stay).
  • owner(richard,rusty).
  • action(Cat,Dog,run) :- owner(richard,Dog).

92
Timing Analysis of Theories Learned with ILP
  • The more training examples, the more accurate the
    theory
  • but how long does it take to produce an answer?
  • No theoretical work on the subject so far
  • Experiment shows nontrivial behaviour (reminiscent
    of the phase transitions observed in SAT
    learning).

93
Timing Analysis of ILP Theories Example
  • Kazakov, PhD Thesis
  • left: simple theory with low coverage, succeeds
    or quickly fails → high speed
  • middle: medium coverage, fragmentary theory,
    lots of backtracking → low speed
  • right: general theory with high coverage, less
    backtracking → high speed

94
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

95
Agent Applications of ILP
  • Relational Reinforcement Learning (Džeroski, De
    Raedt, Driessens)
  • combines reinforcement learning with ILP
  • generalises over previous experience and goals
    (Q-table) to produce logical decision trees
  • results can be used to address new situations
  • Don't miss the next talk (11:40 - 13:10h)!

96
Agent Applications of ILP
  • ILP for Verification and Validation of MAS
    (Jacob, Driessens, De Raedt)
  • Also uses FOPL decision trees
  • Observes agents' behaviour and represents it as a
    logical decision tree
  • The rules in the decision tree can be compared
    with the designer's intentions
  • Test domain: RoboCup

97
Agent Applications of ILP
  • Reid & Ryan 2000
  • ILP used to help hierarchical reinforcement
    learning
  • ILP constructs high-level features that help
    discriminate between (state,action) transitions
    with non-deterministic behaviour

98
Agent Applications of ILP
  • Matsui et al. 2000
  • Proposed an ILP agent that avoids actions which
    will probably fail to achieve the goal.
  • Application domain: RoboCup
  • Alonso & Kudenko 99
  • ILP and EBL for conflict simulations.

99
The York MA Environment
  • Species of 2D agents competing for renewable,
    limited resources.
  • Agents have simple hard-coded behaviour based on
    the notion of drives.
  • Each agent can optionally have an ILP (Progol)
    mind: a separate process receiving observations
    and suggesting actions.
  • Allows the values of inherited features to be
    selected through natural selection.

100
The York MA Environment
101
The York MA Environment
  • ILP hasn't been used in experiments yet (to come
    soon).
  • A number of experiments using inheritance studied
    Kinship-driven Altruism among Agents.
  • The start-up project was sponsored by Microsoft.
  • Undergraduate students involved so far: Lee
    Mallabone, Steve Routledge, John Barton.

102
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

103
Learning and Natural Selection
  • In learning, search is trivial, choosing the
    right bias is hard.
  • But the choice of learning bias is always
    external to the learner!
  • To find the best-suited bias one could combine
    arbitrary choices of bias with evolution and
    natural selection of the fittest individuals.

104
Darwinian vs. Lamarckian Evolution
  • Darwinian evolution: nothing learned by the
    individual is encoded in the genes and passed on
    to the offspring.
  • The Baldwin effect: learning abilities (good
    biases) are selected in evolution because they
    give the individual a better chance in a dynamic
    environment.
  • What is passed on to the offspring is useful, but
    very general.

105
Darwinian vs. Lamarckian Evolution (2)
  • Lamarckian evolution: individual experience
    acquired in life can be inherited.
  • Not the case in nature.
  • Doesn't mean we can't use it.
  • The inherited concepts may be too specific and
    not of general importance.

106
Learning and Language
  • Language uses concepts which are
  • specific enough to be useful to most/all speakers
    of that language
  • general enough to correspond to shared experience
    (otherwise, how would one know what the other is
    talking about!)
  • The concepts of a language serve as a learning
    bias which is inherited not in genes but
    through education.

107
Communication and Learning
  • Language
  • helps one learn (in addition to inherited biases)
  • allows one to communicate knowledge.
  • Distinguish between
  • Knowledge: things that one can explain by
    means of a language to another.
  • Skills: the rest; they require individual learning and
    cannot be communicated.
  • "If watching was enough to learn, the dog would
    have become a butcher." Bulgarian proverb.

108
Communication and Learning (2)
  • In NLP, forgetting examples may be harmful (van
    den Bosch et al.)
  • "An expert is someone who does not think anymore;
    he knows." Frank Lloyd Wright.
  • It may be difficult to communicate what one has
    learned because of
  • Limited bandwidth (for lazy learning)
  • The absence of appropriate concepts in the
    language (for black-box learning)

109
Communication and Learning (3)
  • In a society of communicating agents, less
    accurate white-box learning may be better than
    more accurate but expensive learning that cannot
    be communicated since the reduced performance
    could be outweighed by the much lower cost of
    learning.

110
Our Current Research
  • Inductive Bias Selection (Shane Greenaway)
  • Role Learning (Spiros Kapetanakis)
  • Inductive Learning for Games (Alex Champandard)
  • Machine Learning of Natural Language in MAS (Mark
    Bartlett)

111
The End