Title: Machine Learning and ILP for Multi-Agent Systems
1Machine Learning and ILP for Multi-Agent Systems
- Daniel Kudenko, Dimitar Kazakov
- Department of Computer Science
- University of York, UK
ACAI-01, Prague, July 2001
2Why Learning Agents?
- Agent designers are not able to foresee all situations that the agent will encounter.
- To display full autonomy, agents need to learn from and adapt to novel environments.
- Learning is a crucial part of intelligence.
3A Brief History
Diagram (two parallel tracks):
Machine Learning: disembodied ML -> single-agent learning -> multiple single-agent learners -> social multi-agent learners
Agents: single-agent system -> multiple single-agent system -> social multi-agent system
4Outline
- Principles of Machine Learning (ML)
- ML for Single Agents
- ML for Multi-Agent Systems
- Inductive Logic Programming for Agents
5What is Machine Learning?
- Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Mitchell 97)
- Example: T = play tennis, E = playing matches, P = score
6Types of Learning
- Inductive Learning (Supervised Learning)
- Reinforcement Learning
- Discovery (Unsupervised Learning)
7Inductive Learning
- An inductive learning system aims at
determining a description of a given concept from
a set of concept examples provided by the teacher
and from background knowledge. Michalski et al.
98
8Inductive Learning
Examples of Category C1
Examples of Category C2
Examples of Category Cn
Inductive Learning System
Hypothesis (Procedure to Classify New Examples)
9Inductive Learning Example
Ammo = low,  Monster = near, Light = good    ->  Category: shoot
Ammo = low,  Monster = far,  Light = medium  ->  Category: shoot
Ammo = high, Monster = far,  Light = good    ->  Category: shoot
Inductive Learning System
If (Ammo = high) and (Light ∈ {medium, good}) then shoot ...
10Performance Measure
- Classification accuracy on an unseen test set.
- Alternatively: a measure that incorporates the cost of false positives and false negatives (e.g., recall/precision); the standard definitions are recalled below.
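With TP, FP, TN and FN denoting true/false positives/negatives, the standard definitions (recalled here for reference; not on the original slide) are:
  \mathrm{accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad \mathrm{precision} = \frac{TP}{TP+FP}, \qquad \mathrm{recall} = \frac{TP}{TP+FN}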
11Where's the knowledge?
- Example (or Object) language
- Hypothesis (or Concept) language
- Learning bias
- Background knowledge
12Example Language
- Feature-value vectors, logic programs.
- Which features are used to represent examples
(e.g., ammunition left)? - For agents: which features of the environment are fed to the agent (or the learning module)?
- Constructive induction: automatic feature selection, construction, and generation.
13Hypothesis Language
- Decision trees, neural networks, logic programs,
- Further restrictions may be imposed, e.g., depth
of decision trees, form of clauses. - Choice of hypothesis language influences choice
of learning methods and vice versa.
14Learning bias
- Preference relation between legal hypotheses.
- Accuracy on training set.
- Hypothesis with zero error on training data is
not necessarily the best (noise!). - Occam's razor: the simpler hypothesis is the better one.
15Inductive Learning
- No real learning without language or learning
bias. - IL is search through space of hypotheses guided
by bias. - Quality of hypothesis depends on proper
distribution of training examples.
16Inductive Learning for Agents
- What is the target concept (i.e., categories)?
- Example: do(a) vs. ¬do(a) for a specific action a.
- Real-valued categories/actions can be
discretized. - Where does the training data come from and what
form does it take?
17Batch vs Incremental Learning
- Batch learning: collect a set of training examples and compute a hypothesis.
- Incremental learning: update the hypothesis with each new training example.
- Incremental learning is better suited to agents.
18Batch Learning for Agents
- When should (re-)computation of the hypothesis take place?
- Example: after the observed accuracy of the hypothesis drops below a threshold.
- Which training examples should be used?
- Example: sequences of actions that led to success.
19Eager vs. Lazy learning
- Eager learning: commit to the hypothesis computed after training.
- Lazy learning: store all encountered examples and perform classification based on this database (e.g., nearest neighbour).
20Active Learning
- The learner decides which training data to receive (i.e., it generates training examples and uses an oracle to classify them).
- Closed-loop ML: the learner suggests a hypothesis and verifies it experimentally. If the hypothesis is rejected, the collected data gives rise to a new hypothesis (sketched below).
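A minimal Prolog sketch of the closed-loop idea; every predicate used here (learn/2, design_experiment/2, run_experiment/2, consistent/2) is a hypothetical placeholder, not part of any particular system:

  % Induce a hypothesis, test it experimentally, and re-learn from the
  % collected data whenever the experiment contradicts the hypothesis.
  closed_loop(Data, Hypothesis) :-
      learn(Data, Candidate),
      design_experiment(Candidate, Experiment),
      run_experiment(Experiment, Outcome),
      (   consistent(Candidate, Outcome)
      ->  Hypothesis = Candidate
      ;   closed_loop([Outcome|Data], Hypothesis)
      ).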
21Black-Box vs. White-Box
- Black-box learning: the interpretation of the learning result is unclear to the user.
- White-box learning: creates (symbolic) structures that are comprehensible.
22Reinforcement Learning
- Agent learns from environmental feedback
indicating the benefit of states. - No explicit teacher required.
- Learning target: an optimal policy (i.e., a state-action mapping).
- Optimality measure: e.g., cumulative discounted reward.
23Q Learning
Value of a state: the discounted cumulative reward
  V^\pi(s_t) = \sum_{i \ge 0} \gamma^i \, r(s_{t+i}, a_{t+i})
where the discount factor \gamma satisfies 0 \le \gamma < 1 (\gamma = 0 means that only the immediate reward is considered), and r(s_{t+i}, a_{t+i}) is the reward obtained by performing the actions specified by policy \pi.
  Q(s,a) = r(s,a) + \gamma \, V^*(\delta(s,a))
Optimal policy:
  \pi^*(s) = \arg\max_a Q(s,a)
24Q Learning
Initialize all Q(s,a) to 0.
In some state s choose some action a. Let s' be the resulting state and r the reward received.
Update Q:
  Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')
(sketched in Prolog below)
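A minimal sketch of this tabular update in SWI-Prolog; the environment interface is left abstract, and the table is stored in a dynamic predicate q/3 (all names are chosen here for illustration):

  :- use_module(library(lists)).           % member/2, max_list/2
  :- dynamic q/3.                          % q(State, Action, Value)

  q_value(S, A, Q) :- q(S, A, Q), !.       % stored value, if any
  q_value(_, _, 0.0).                      % unseen pairs default to 0

  max_q(S, Actions, Max) :-                % max over a (non-empty) action set
      findall(Q, (member(A, Actions), q_value(S, A, Q)), Qs),
      max_list(Qs, Max).

  % One update: after doing A in S and observing reward R and successor S1,
  % set Q(S,A) <- R + Gamma * max_a' Q(S1,a').
  q_update(S, A, R, S1, Actions, Gamma) :-
      max_q(S1, Actions, MaxQ),
      NewQ is R + Gamma * MaxQ,
      retractall(q(S, A, _)),
      assertz(q(S, A, NewQ)).

For example, q_update(s1, a2, 1, s2, [a1,a2], 0.9) performs one table update for the transition from s1 to s2 under action a2 with reward 1.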
25Q Learning
- Guaranteed convergence towards the optimum (state-action pairs have to be visited infinitely often).
- The exploration strategy can speed up convergence.
- Basic Q-learning does not generalize: replace the state-action table with function approximation (e.g., a neural net) in order to handle unseen states.
26Pros and Cons of RL
- Clearly suited to agents acting and exploring an
environment. - Simple.
- Engineering of suitable reward function may be
tricky. - May take a long time to converge.
- The learning result may not be transparent (depending on the representation of the Q function).
27Combination of IL and RL
- Relational reinforcement learning (Dzeroski et al. 98) leads to a more general Q-function representation that may still be applicable even if the goals or the environment change.
- Explanation-based learning and RL (Dietterich and Flann 95).
- More on ILP and RL: see later.
28Unsupervised Learning
- Acquisition of useful or interesting patterns
in input data. - Usefulness and interestingness are based on
agents internal bias. - Agent does not receive any external feedback.
- Discovered concepts are expected to improve agent
performance on future tasks.
29Learning and Verification
- Need to guarantee agent safety.
- Pre-deployment verification for non-learning
agents. - What to do with learning agents?
30Learning and Verification (Gordon 00)
- Verification after each self-modification step.
- Problem: time-consuming.
- Solution 1: use property-preserving learning operators.
- Solution 2: use learning operators which permit quick (partial) re-verification.
31Learning and Verification
- What to do if verification fails?
- Repair (multi)-agent plan.
- Choose different learning operator.
32Learning in Multi-Agent Systems
- Classification
- Social Awareness.
- Communication
- Role Learning.
- Distributed Learning.
33Types of Multi-Agent Learning (Weiss & Dillenbourg 99)
- Multiplied learning: no interference in the learning process by other agents (except for the exchange of training data or outputs).
- Divided learning: division of the learning task on a functional level.
- Interacting learning: cooperation beyond the pure exchange of data.
34Social Awareness
- Awareness of the existence of other agents and (possibly) knowledge about their behavior.
- Not necessary to achieve near-optimal MAS behavior: rock sample collection (Steels 89).
- Can it degrade performance?
35Levels of Social Awareness (Vidal & Durfee 97)
- 0-level agent: no knowledge about the existence of other agents.
- 1-level agent: recognizes that other agents exist; models other agents as 0-level.
- 2-level agent: has some knowledge about the behavior of other agents; models other agents as 1-level agents.
- k-level agent: models other agents as (k-1)-level.
36Social Awareness and Q Learning
- 0-level agents already learn implicitly about other agents.
- Mundhe and Sen 00: a study of two Q-learning agents up to level 2.
- Two 1-level agents display the slowest and least effective learning (worse than two 0-level agents).
37Agent models and Q Learning
- Q: S × A^n → R, where n is the number of agents.
- If the other agents' actions are not observable, an assumption about their actions is needed.
- Pessimistic assumption: given an agent's action choice, the other agents will minimize the reward.
- Optimistic assumption: the other agents will maximize the reward.
38Agent Models and Q Learning
- The pessimistic assumption leads to overly cautious behavior.
- The optimistic assumption guarantees convergence towards the optimum (Lauer & Riedmiller 00); a sketch of the optimistic update follows.
- If knowledge of the other agents' behavior is available, the Q-value update can be based on a probabilistic computation (Claus and Boutilier 98), but with no guarantee of optimality.
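A hedged sketch of the optimistic (max-based) update in the spirit of Lauer & Riedmiller 00, reusing the q/3 table and the q_value/3 and max_q/3 helpers from the Q-learning sketch above; this illustrates the idea only, not their exact algorithm:

  % Never decrease the local estimate, implicitly assuming the other agents
  % choose the joint action that maximises the reward.
  optimistic_update(S, A, R, S1, Actions, Gamma) :-
      max_q(S1, Actions, MaxQ),
      Candidate is R + Gamma * MaxQ,
      q_value(S, A, Old),
      New is max(Old, Candidate),
      retractall(q(S, A, _)),
      assertz(q(S, A, New)).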
39Q Learning and Communication (Tan 93)
- Types of communication
- Sharing sensation
- Sharing or merging policies
- Sharing episodes
- Results
- Communication generally helps
- Extra sensory information may hurt
40Role Learning
- It is often useful for agents to specialize in specific roles for joint tasks.
- Pre-defined roles reduce flexibility; an optimal distribution is often not easy to define and may be expensive.
- How to learn roles?
- Prasad et al. 96: learn the optimal distribution of pre-defined roles.
41Q Learning of roles
- Crites & Barto 98, elevator domain: regular Q-learning; no specialization achieved (but highly efficient behavior).
- Ono & Fukumoto 96, hunter-prey domain: specialization achieved with the greatest-mass merging strategy.
42Q Learning of Roles (Balch 99)
- Three types of reward function: local performance-based, local shaped, global.
- Global reward supports specialization.
- Local reward supports the emergence of homogeneous behaviors.
- Some domains benefit from learning team heterogeneity (e.g., robotic soccer), others do not (e.g., multi-robot foraging).
- Heterogeneity measure: social entropy.
43Distributed Learning
- Motivation: agents learning a global hypothesis from local observations.
- Application of MAS techniques to (inductive) learning.
- Applications: distributed data mining (Provost & Kolluri 99), robotic soccer.
44Distributed Data Mining
- Provost & Hennessy 96: individual learners see only a subset of all training examples and compute a set of local rules based on these.
- Local rules are evaluated by other learners based on their data.
- Only rules with a good evaluation are carried over to the global hypothesis (see the sketch below).
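A toy Prolog sketch of the rule-exchange idea (not the actual Provost & Hennessy algorithm); the rule(Class, Conditions) and example(Class, Features) terms are a hypothetical representation chosen for the sketch:

  :- use_module(library(lists)).           % subset/2
  :- use_module(library(apply)).           % include/3

  covers(rule(_, Conds), example(_, Feats)) :- subset(Conds, Feats).
  correct(rule(Class, Conds), example(Class, Feats)) :- subset(Conds, Feats).

  % A peer accepts a locally learned rule only if its accuracy on the
  % peer's own examples reaches Threshold; accepted rules are carried
  % over to the global hypothesis.
  accept_rule(Rule, Examples, Threshold) :-
      include(covers(Rule), Examples, Covered),
      length(Covered, NCov), NCov > 0,
      include(correct(Rule), Examples, Correct),
      length(Correct, NCorr),
      Acc is NCorr / NCov,
      Acc >= Threshold.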
45Bibliography
Mitchell 97: T. Mitchell. Machine Learning. McGraw Hill, 1997.
Michalski et al. 98: R.S. Michalski, I. Bratko, M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.
Dietterich & Flann 95: T. Dietterich and N. Flann. Explanation-based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
Dzeroski et al. 98: S. Dzeroski, L. De Raedt, and H. Blockeel. Relational Reinforcement Learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming (ILP-98). Springer, 1998.
Gordon 00: D. Gordon. Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.
Weiss & Dillenbourg 99: G. Weiss and P. Dillenbourg. What is 'Multi' in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning: Cognitive and Computational Approaches. Pergamon Press, 1999.
Vidal & Durfee 97: J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 Workshop on Multiagent Learning, 1997.
Mundhe & Sen 00: M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. In Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.
Claus & Boutilier 98: C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI 98.
Lauer & Riedmiller 00: M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.
46Bibliography
Tan 93: M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning, 1993.
Prasad et al. 96: M.V.N. Prasad, S.E. Lander and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.
Ono & Fukumoto 96: N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. In Proceedings of the First International Conference on Multi-Agent Systems, 1996.
Crites & Barto 98: R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.
Balch 99: T. Balch. Reward and Diversity in Multi-Robot Foraging. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With Other Agents, 1999.
Provost & Kolluri 99: F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3, 1999.
Provost & Hennessy 96: F. Provost and D. Hennessy. Scaling Up: Distributed Machine Learning with Cooperation. AAAI 96, 1996.
47B R E A K
48Machine Learning and ILP for MAS Part II
- Integration of ML and Agents
- ILP and its potential for MAS
- Agent Applications of ILP
- Learning, Natural Selection and Language
50From Machine Learning to Learning Agents
- A spectrum from learning as the only goal (classic Machine Learning) to learning as one of many goals (Learning Agents):
  Classic Machine Learning -> Active Learning -> Closed Loop Machine Learning -> Learning Agent(s)
51Integrating Machine Learning into the Agent
Architecture
- Time constraints on learning
- Synchronisation between agents' actions
- Learning and Recall
52Time Constraints on Learning
- Machine Learning alone
- predictive accuracy matters, time doesn't (it is just a price to pay)
- ML in agents
- Soft deadlines: resources must be shared with other activities (perception, planning, control)
- Hard deadlines imposed by the environment: 'Make up your mind now!' (or they'll eat you)
53Doing Eager vs. Lazy Learning under Time Pressure
- Eager Learning
- Theories typically more compact
- and faster to use
- Takes more time to learn: do it when the agent is idle
- Lazy Learning
- Knowledge acquired at (almost) no cost
- May be much slower when a test example comes
54 Clear-cut vs. Any-time Learning
- Consider two types of algorithms
- Running a prescribed number of steps guarantees finding a solution
- can use worst-case complexity analysis to find an upper bound on the execution time
- Any-time algorithms
- a longer run may result in a better solution
- don't know an optimal solution when they see one
- example: Genetic Algorithms
- policies: halt learning to meet hard deadlines or when the cost outweighs the expected improvement in accuracy
55Time Constraints on Learning in Simulated
Environments
- Consider various cases
- Unlimited time for learning
- Upper bound on time for learning
- Learning in real time
- Gradually tightening the constraints makes integration easier
- Not limited to simulations: real-world problems have a similar setting
- e.g., various types of auctions
56Synchronisation × Time Constraints
Time constraints considered: unlimited time, upper bound, real time.
- 1-move-per-round, batch update: logic-based MAS for conflict simulations (Kudenko, Alonso)
- 1-move-per-round, immediate update: The York MA Environment (Kazakov et al.)
- Asynchronous: Multi-agent Progol (Muggleton)
57Learning and Recall
- Agent must strike a balance between
- Learning, which updates the model of the world
- Recall, which applies the existing model of the world to other tasks
58Learning and Recall (2)
Recall current model of world to choose and carry
out an action
Observe the action outcome
Update sensory information
Learn new model of the world
59Learning and Recall (3)
Update sensory information
Recall current model of world to choose and carry
out an action
Learn new model of the world
- In theory, the two can run in parallel
- In practice, must share limited resources
60Learning and Recall (4)
- Possible strategies
- Parallel learning and recall at all times
- Mutually exclusive learning and recall
- After incremental, eager learning, examples are discarded
- or kept if batch or lazy learning is used
- Cheap on-the-fly learning (preprocessing), computationally expensive learning off-line
- reduce raw information, change the object language
- analogy with human learning and the role of sleep
61Machine Learning and ILP for MAS Part II
- Integration of ML and Agents
- ILP and its potential for MAS
- Agent Applications of ILP
- Learning, Natural Selection and Language
62Machine Learning Revisited
- ML can be seen as the task of
- taking a set of observations represented in a
given object/data language and - representing (the information in) that set in
another language called concept/hypothesis
language. - A side effect of this step: the ability to deal with unseen observations.
63Object and Concept Language
- Object language: points (x, y, +/-).
- Concept language: any ellipse (5 parameters); one way to write it is given below.
[Figure: positive (+) and negative (-) points in the plane, separated by an ellipse]
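One possible way to spell out the five parameters (my notation: centre (x_0, y_0), semi-axes a and b, orientation \theta); a point (x, y) is classified as positive iff
  \frac{\left((x-x_0)\cos\theta + (y-y_0)\sin\theta\right)^2}{a^2} + \frac{\left(-(x-x_0)\sin\theta + (y-y_0)\cos\theta\right)^2}{b^2} \le 1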
64Machine Learning Biases
- The concept/hypothesis language specifies the
language bias, which limits the set of all
concepts/hypotheses that can be
expressed/considered/learned. - The preference bias allows us to decide between two hypotheses if they both classify the training data equally well. - The search bias defines the order in which
hypotheses will be considered. - Important if one does not search the whole
hypothesis space.
65Preference Bias, Search Bias, Version Space
- Version space: the subset of hypotheses that have zero training error.
[Figure: the hypotheses between the most general and the most specific concept consistent with the examples]
66Inductive Logic Programming
- Based on three pillars
- Logic Programming (LP) to represent data and
concepts (i.e., object and concept language) - Background Knowledge to extend the concept
language - Induction as learning method
67LP as ILP Object Language
- A subset of First Order Predicate Logic (FOPL)
called Logic Programming. - Often limited to ground facts, i.e.,
propositional logic (cf. ID3 etc.). - In the latter case, data can be represented as a
single table.
68ILP Object Language Example
Good bargain cars (model, mileage, price, y/n) and their ILP representation:
  BMW Z3,   50,000, 5000, +   ->  gbc(z3,50000,5000).
  Audi V8,  30,000, 4000, +   ->  gbc(v8,30000,4000).
  Fiat Uno, 90,000, 3000, -   ->  :- gbc(uno,90000,3000).
69LP as ILP Concept Language
- The concept language of ILP is relations expressed as Horn clauses, e.g.
  equal(X,X).   greater(X,Y) :- X > Y.
- Cf. the propositional-logic representation (arg1 = 1 and arg2 = 1) or (arg1 = 2 and arg2 = 2) or ...
- Tedious for finite domains and impossible otherwise.
- Most often there is only one target predicate (concept).
- exceptions exist, e.g., Progol 5.
70Modes in ILP
- Used to distinguish between
- input attributes (mode +)
- output attributes (mode -) of the predicate learned.
- Mode # is used to describe attributes that must contain a constant in the predicate definition.
- E.g., use mode car_type(+,+,#) to learn
  car_type(Doors,Roof,sports_car) :- Doors =< 2, Roof = convertible.
73Modes in ILP
- Used to distinguish between
- input attributes (mode +)
- output attributes (mode -) of the predicate learned.
- Mode # is used to describe attributes that must contain a constant in the predicate definition.
- E.g., use mode car_type(-,-,#) to learn
  car_type(Doors,Roof,sports_car) :- (Doors = 1 ; Doors = 2), Roof = convertible.
74Types in ILP
- Specify the range for each argument.
- User-defined types are represented as unary predicates: colour(blue). colour(red). colour(black).
- Built-in types are also provided: nat/1, real/1, any/1 in Progol.
- These definitions may or may not be generative: colour(X) instantiates X, nat(X) does not.
75ILP Types and Modes Example
Good bargain cars (model, mileage, price, y/n) and their ILP (Progol) representation:
  modeh(1,gbc(+model,+mileage,+price))?
  BMW Z3,   50,000, 5000, +   ->  gbc(z3,50000,5000).
  Audi V8,  30,000, 4000, +   ->  gbc(v8,30000,4000).
  Fiat Uno, 90,000, 3000, -   ->  :- gbc(uno,90000,3000).
76Positive Only Learning
- A way of dealing with domains where no negative examples are available.
- Example: learn the concept of non-self-destructive actions.
- The trivial definition 'anything belongs to the target concept' looks all right!
- Trick: generate random examples and treat them as negative (sketched below).
- Requires generative type definitions.
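A small Prolog sketch of this trick for the car example, assuming generative type predicates model/1, mileage/1 and price/1 have been defined (they are hypothetical here); each call yields one random pseudo-negative:

  :- use_module(library(random)).          % random_member/2

  % Draw each argument from its generative type definition and treat the
  % resulting tuple as a negative example.
  random_negative(gbc(Model, Miles, Price)) :-
      findall(M, model(M), Models),     random_member(Model, Models),
      findall(X, mileage(X), Mileages), random_member(Miles, Mileages),
      findall(P, price(P), Prices),     random_member(Price, Prices).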
77Background Knowledge
- Only very simple mathematical relations, such as identity and greater-than, have been used so far:
  equal(X,X).   greater(X,Y) :- X > Y.
- These can also easily be hard-wired into the concept language of propositional learners.
- ILP's big advantage: one can extend the concept language with user-defined concepts, or background knowledge.
78Background Knowledge (2)
- The use of certain BK predicates may be a
necessary condition for learning the right
hypothesis. - Redundant or irrelevant BK slows down the
learning. - Example
- BK: prod(Miles,Price,Threshold) :- Miles * Price < Threshold.
- Modes: modeh(1,gbc(+model,+miles,+price))?  modeb(1,prod(+miles,+price,#threshold))?
- Th: gbc(z3,Miles,Price) :- prod(Miles,Price,250000001).
79 Choice of Background Knowledge
- In an ideal world one should start from a
complete model of the background knowledge of the
target population. In practice, even with the
most intensive anthropological studies, such a
model is impossible to achieve. We do not even
know what it is that we know ourselves. The best
that can be achieved is a study of the directly
relevant background knowledge, though it is only
when a solution is identified that one can know
what is or is not relevant. - The Critical Villager, Eric Dudley
80ILP Preference Bias
- Typically a trade-off between generality and
complexity - cover as many positive examples (and as few
negative ones) as you can - with as simple a theory as possible
- Some ILP learners allow the users to specify
their own preference bias.
81Induction in ILP
- Bottom-up (least general generalisation)
- Map a term into a variable
- Drop a literal from the clause body
- Top-down (refinement operator)
- Instantiate a variable
- Add a literal to the clause body
- Mixed techniques (e.g., Progol)
82Example of Induction
BK: q(b). q(c).
Training examples: p(b,a). p(f,g). Negative example: p(i,j).
Candidate clauses (figure):
  p(X,Y).
  p(X,a).
  p(X,Y) :- q(X).
  p(b,a) :- q(b).
83Induction in Progol
- For each training example
- Find the most general theory (clause) T
- Find the most specific theory (clause) ⊥
- Search the space in between in a top-down fashion
  T: p(X,Y).
  intermediate clauses: p(X,a).   p(X,Y) :- q(X).
  ⊥: p(X,a) :- q(X).
84Summary of ILP Basics
- Symbolic
- Eager
- Knowledge-oriented (white-box) learner
- Complex, flexible hypothesis space
- Based on Induction
85Learning Pure Logic Programs vs. Decision Lists
- Pure logic programs: the order of clauses is irrelevant, and they must not contradict each other.
- Decision lists: the concept language includes the predicate cut (!).
- The use of decision lists can make for simpler (more concise) theories.
86Decision List Example
- action(Cat,ObservedAnimal,Action).
- action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
- action(Cat,Animal,run) :- dog(Animal), !.
- action(Cat,Animal,stay).
87Updating Decision Lists with Exceptions
- action(Cat,caesar,run) :- !.
- action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
- action(Cat,Animal,run) :- dog(Animal), !.
- action(Cat,Animal,stay).
88Updating Decision Lists with Exceptions
- Could be very beneficial in agents when immediate updating of the agent's knowledge is important: just add the exception at the top of the list (see the sketch below).
- Computationally inexpensive: there is no need to modify the rest of the list.
- Exceptions could be compiled into rules when the agent is inactive.
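If the decision list is stored as a dynamic predicate, adding such an exception is a single asserta/1 call; a minimal sketch (add_exception/2 is a name introduced here, not from any particular system):

  :- dynamic action/3.

  % Put a new exception clause at the very top of the decision list, e.g.
  % add_exception(caesar, run) asserts:  action(_, caesar, run) :- !.
  add_exception(Animal, Response) :-
      asserta((action(_Cat, Animal, Response) :- !)).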
89Replacing Exceptions with Rules Before
- action(Cat,caesar,run) :- !.
- action(Cat,rex,run) :- !.
- action(Cat,rusty,run) :- !.
- action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
- ...
90Replacing Exceptions with Rules After
- action(Cat,Animal,run) :-
-     dog(Animal),
-     owner(richard,Animal), !.
- action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
- ...
91Eager ILP vs. Analogical Prediction
- Eager learning: learn a theory, dispose of the observations.
- Lazy learning:
- keep all observations
- compare new observations with old ones to classify
- no explanation provided.
- Analogical Prediction (Muggleton, Bain 98)
- Combines the often higher accuracy of lazy learning with an intelligible, explicit hypothesis typical of ILP
- Constructs a local theory for each new observation that is consistent with the largest number of training examples.
92Analogical Prediction Example
- owner(richard,caesar).
- action(Cat,caesar,run).
- owner(richard,rex).
- action(Cat,rex,run).
- owner(daniel,blackie).
- action(Cat,blackie,stay).
- owner(richard,rusty).
- action(Cat,rusty,?).
93Analogical Prediction Example
- owner(richard,caesar).
- action(Cat,caesar,run).
- owner(richard,rex).
- action(Cat,rex,run).
- owner(daniel,blackie).
- action(Cat,blackie,stay).
- owner(richard,rusty).
- action(Cat,Dog,run) :- owner(richard,Dog).
94Timing Analysis of Theories Learned with ILP
- The more training examples, the more accurate the theory
- but how long does it take to produce an answer?
- No theoretical work on the subject so far
- Experiments show nontrivial behaviour (reminiscent of the phase transitions observed in SAT).
95Timing Analysis of ILP Theories Example
- left: a simple theory with low coverage succeeds or quickly fails -> high speed
- middle: medium coverage, fragmentary theory, lots of backtracking -> low speed
- right: a general theory with high coverage, less backtracking -> high speed
96Machine Learning and ILP for MAS Part II
- Integration of ML and Agents
- ILP and its potential for MAS
- Agent Applications of ILP
- Learning, Natural Selection and Language
97Agent Applications of ILP
- Relational Reinforcement Learning (Džeroski, De
Raedt, Driessens) - combines reinforcement learning with ILP
- generalises over previous experience and goals
(Q-table) to produce logical decision trees - results can be used to address new situations
- Don't miss the next talk (11:40 - 13:10h)!
98Agent Applications of ILP
- ILP for Verification and Validation of MAS
(Jacob, Driessens, De Raedt) - Also uses FOPL decision trees
- Observes the agents' behaviour and represents it as a logical decision tree
- The rules in the decision tree can be compared with the designer's intentions
- Test domain: RoboCup
99Agent Applications of ILP
- Reid & Ryan 2000
- ILP used to help hierarchical reinforcement
learning - ILP constructs high-level features that help
discriminate between (state,action) transitions
with non-deterministic behaviour
100Agent Applications of ILP
- Matsui et al. 2000
- Proposed an ILP agent that avoids actions which
will probably fail to achieve the goal. - Application domain: RoboCup
- Alonso & Kudenko 99
- ILP and EBL for conflict simulations.
101The York MA Environment
- Species of 2D agents competing for renewable, limited resources.
- Agents have simple hard-coded behaviour based on the notion of drives.
- Each agent can optionally have an ILP (Progol) mind: a separate process receiving observations and suggesting actions.
- Allows the values of inherited features to be selected through natural selection.
102The York MA Environment
103The York MA Environment
- ILP hasn't been used in experiments yet (to come soon).
- A number of experiments using inheritance studied kinship-driven altruism among agents.
- The start-up project was sponsored by Microsoft.
- Undergraduate students involved so far: Lee Mallabone, Steve Routledge, John Barton.
104Machine Learning and ILP for MAS Part II
- Integration of ML and Agents
- ILP and its potential for MAS
- Agent Applications of ILP
- Learning, Natural Selection and Language
105Learning and Natural Selection
- In learning, search is trivial; choosing the right bias is hard.
- But the choice of learning bias is always external to the learner!
- To find the best-suited bias, one could combine arbitrary choices of bias with evolution and natural selection of the fittest individuals.
106Darwinian vs. Lamarckian Evolution
- Darwinian evolution: nothing learned by the individual is encoded in the genes and passed on to the offspring.
- The Baldwin effect: learning abilities (good biases) are selected in evolution because they give the individual a better chance in a dynamic environment.
- What is passed on to the offspring is useful, but very general.
107Darwinian vs. Lamarckian Evolution (2)
- Lamarckian evolution: individual experience acquired in life can be inherited.
- Not the case in nature.
- Doesn't mean we can't use it.
- The inherited concepts may be too specific and not of general importance.
108Learning and Language
- Language uses concepts which are
- specific enough to be useful to most/all speakers
of that language - general enough to correspond to shared experience
(otherwise, how would one know what the other is
talking about !) - The concepts of a language serve as a learning
bias which is inherited not in genes but
through education.
109Communication and Learning
- Language
- helps one learn (in addition to inherited biases)
- allows one to communicate knowledge.
- Distinguish between
- Knowledge: things that one can explain to another by means of a language.
- Skills: the rest; they require individual learning and cannot be communicated.
- 'If watching were enough to learn, the dog would have become a butcher.' (Bulgarian proverb)
110Communication and Learning (2)
- In NLP, forgetting examples may be harmful (van
den Bosch et al.) - An expert is someone who does not think anymore
he knows. (Frank Lloyd Wright) - It may be difficult to communicate what one has
learned because of - Limited bandwidth (for lazy learning)
- The absence of appropriate concepts in the
language (for black-box learning)
111Communication and Learning (3)
- In a society of communicating agents, less
accurate white-box learning may be better than
more accurate but expensive learning that cannot
be communicated since the reduced performance
could be outweighed by the much lower cost of
learning.
112Our Current Research
- Inductive Bias Selection (Shane Greenaway)
- Role Learning (Spiros Kapetanakis)
- Inductive Learning for Games (Alex Champandard)
- Machine Learning of Natural Language in MAS (Mark
Bartlett)
113The End