Machine Learning for Agents and MultiAgent Systems

1
Machine Learning for Agents and Multi-Agent
Systems
  • Daniel Kudenko and Dimitar Kazakov
  • Department of Computer Science
  • University of York, UK

ECAI-02, Lyon, July 2002
2
Outline
  • Principles of Machine Learning (ML)
  • ML for Single Agents
  • ML for Multi-Agent Systems
  • Specialisation and Role Learning
  • Focus Topic 1: Learning of Co-ordination
  • Evolution, Individual Learning and Language
  • Focus Topic 2: Evolution of Kinship-Driven
    Altruism

3
Why Learning Agents?
  • Designers cannot foresee all situations that the
    agent will encounter.
  • To display full autonomy, agents need to learn
    from and adapt to novel environments.
  • Learning is a crucial part of intelligence.

4
Evolution and Individual Learning in MAS
5
What is Machine Learning?
  • Definition: A computer program is said to learn
    from experience E with respect to some class of
    tasks T and performance measure P, if its
    performance at tasks in T, as measured by P,
    improves with experience E. [Mitchell 97]
  • Example: T = play tennis, E = playing matches,
    P = score

6
ML: Another View
  • ML can be seen as the task of
  • taking a set of observations represented in a
    given object/data language and
  • representing (the information in) that set in
    another language called concept/hypothesis
    language.
  • A side effect of this step: the ability to deal
    with unseen observations.

7
Object and Concept Language
  • Object Language: labelled points (x, y, +/-).
  • Concept Language: any ellipse (5 parameters: x1,
    y1, x2, y2, l1 + l2).

[Figure: an ellipse with foci (x1, y1) and (x2, y2) and focal distances l1, l2; positive examples (+) lie inside the ellipse, negative examples (-) outside.]
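
A minimal Python sketch (an illustration, not part of the original slides) of the two languages in this example: observations are labelled (x, y) points in the object language, and a hypothesis in the concept language is an ellipse given by its two foci and the sum of focal distances l1 + l2. Classification is then a membership test; all concrete numbers are made up.

import math

def in_ellipse(x, y, x1, y1, x2, y2, l):
    """Concept-language hypothesis: an ellipse with foci (x1, y1) and (x2, y2)
    and sum of focal distances l = l1 + l2. A point is classified positive
    if its summed distance to the two foci does not exceed l."""
    d1 = math.hypot(x - x1, y - y1)
    d2 = math.hypot(x - x2, y - y2)
    return d1 + d2 <= l

# Object language: labelled observations (x, y, +/-)
observations = [(0.0, 0.0, '+'), (1.0, 0.5, '+'), (4.0, 3.0, '-')]

# One candidate hypothesis (the 5 parameters of the ellipse)
hypothesis = dict(x1=-1.0, y1=0.0, x2=1.0, y2=0.0, l=4.0)

for x, y, label in observations:
    predicted = '+' if in_ellipse(x, y, **hypothesis) else '-'
    print((x, y), 'true:', label, 'predicted:', predicted)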
8
Machine Learning Biases
  • The concept/hypothesis language specifies the
    language bias, which limits the set of all
    concepts/hypotheses that can be
    expressed/considered/learned.
  • The preference bias allows us to decide between
    two hypotheses (even if they both classify the
    training data equally).
  • The search bias defines the order in which
    hypotheses will be considered.
  • Important if one does not search the whole
    hypothesis space.

9
Concept Language and Eager vs. Lazy Learning
  • Eager learning: commit to the hypothesis computed
    after training.
  • Lazy learning: store all encountered examples and
    perform classification based on this database
    (e.g. nearest neighbour).

10
Concept Language and Black- vs. White-Box
Learning
  • Black-Box Learning: the interpretation of the
    learning result is unclear to a user.
  • White-Box Learning: creates (symbolic) structures
    that are comprehensible.

11
Concept Language and Background Knowledge
  • Examples of concept languages:
  • A set of real or idealised examples expressed in
    the object language that represent each of the
    concepts learned (Nearest Neighbour)
  • attribute-value pairs (propositional logic)
  • relational concepts (first order logic)
  • One can extend the concept language with
    user-defined concepts or background knowledge.

12
Background Knowledge (2)
  • Characteristic of Inductive Logic Programming
    (ILP).
  • The use of certain BK predicates may be a
    necessary condition for learning the right
    hypothesis.
  • Redundant or irrelevant BK slows down the
    learning.

13
Choice of Background Knowledge (the
anthropologist's view)
  • In an ideal world one should start from a
    complete model of the background knowledge of the
    target population. In practice, even with the
    most intensive anthropological studies, such a
    model is impossible to achieve. We do not even
    know what it is that we know ourselves. The best
    that can be achieved is a study of the directly
    relevant background knowledge, though it is only
    when a solution is identified that one can know
    what is or is not relevant.
  • The Critical Villager, Eric Dudley

14
Preference Bias, Search Bias: Version Space
  • Version space: the subset of hypotheses that have
    zero training error.


[Figure: the version space, bounded above by the most general concept and below by the most specific concept consistent with the positive (+) and negative (-) training examples.]
15
More Preference Biases
  • Consider the new representation of your data as
    made of a theory T and a description D needed to
    reconstruct the original data from T.
  • Ockham's razor: "Don't multiply the number of
    entities without a reason."
  • In ML, this means: the simpler the theory, the
    better.
  • Minimal Description Length [Rissanen 89]:
  • choose the T for which the binary representation
    of T and D combined is the shortest possible.
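
A minimal numeric sketch of the MDL idea (an illustration, not from the slides), under the assumption that a theory costs a fixed number of bits and every training example it fails to cover must be listed explicitly as an exception:

import math

def mdl_cost(theory_bits, n_exceptions, n_examples):
    """Hypothetical MDL score: bits for the theory T plus bits for the
    description D (here, the indices of the examples T gets wrong)."""
    bits_per_exception = math.log2(n_examples) if n_examples > 1 else 1.0
    return theory_bits + n_exceptions * bits_per_exception

n = 1000
simple = mdl_cost(theory_bits=40, n_exceptions=5, n_examples=n)     # short theory, a few errors
complex_ = mdl_cost(theory_bits=600, n_exceptions=0, n_examples=n)  # long theory, no errors
print('simple:', simple, 'complex:', complex_)  # MDL prefers the smaller total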

16
Positive Only Learning
  • A way of dealing with domains where no negative
    examples are available.
  • Learn the concept of non-self-destructive
    actions.
  • The trivial definition "anything belongs to the
    target concept" looks all right!
  • Trick: generate random examples and treat them as
    negative.

17
Active Learning
  • The learner decides which training data to
    receive (i.e. generates training examples and uses
    an oracle to classify them). [Thompson et al. 1999]
  • Closed Loop ML: the learner suggests a hypothesis
    and verifies it experimentally. If the hypothesis
    is rejected, the collected data gives rise to a
    new hypothesis. [Bryant and Muggleton 2000]

18
Machine Learning vs. Learning Agents
  • Machine Learning: learning as the only goal
  • Learning Agent(s): learning as one of many goals

[Diagram: a spectrum from Classic Machine Learning through Active Learning and Closed Loop Machine Learning towards Learning Agent(s).]
19
Integrating Machine Learning into the Agent
Architecture
  • Time constraints on learning
  • Synchronisation between agents' actions
  • Learning and recall
  • Timing analysis of theories learned

20
Time Constraints on Learning
  • Machine Learning alone:
  • predictive accuracy matters, time doesn't (it is
    just a price to pay)
  • ML in Agents:
  • Soft deadlines: resources must be shared with
    other activities (perception, planning, control)
  • Hard deadlines imposed by the environment: "Make
    up your mind now!"

21
Doing Eager vs. Lazy Learning under Time Pressure
  • Eager Learning
  • Theories typically more compact
  • and faster to use
  • Takes more time to learn: do it when the agent
    is idle
  • Lazy Learning
  • Knowledge acquired at (almost) no cost
  • May be much slower when a test example comes

22
Any-Time Learning
  • Consider two types of algorithms:
  • Running a prescribed number of steps guarantees
    finding a solution
  • can use worst-case complexity analysis to find an
    upper bound on the execution time
  • Any-time algorithms
  • a longer run may result in a better solution
  • don't know an optimal solution when they see one
  • example: Genetic Algorithms
  • policies: halt learning to meet hard deadlines or
    when the cost outweighs the expected improvement
    in accuracy

23
Time Constraints on Learning in Simulated
Environments
  • Consider various cases
  • Unlimited time for learning
  • Upper bound on time for learning
  • Learning in real time
  • Gradually tightening the constraints makes
    integration easier
  • Not limited to simulations: real-world problems
    have a similar setting
  • e.g., various types of auctions

24
Learning and Recall
  • The agent must strike a balance between:
  • Learning, which updates the model of the world
  • Recall, which applies the existing model of the
    world to other tasks

25
Learning and Recall (2)

[Diagram of the agent's cycle: update sensory information; recall the current model of the world to choose and carry out an action; learn a new model of the world.]
  • In theory, the two can run in parallel
  • In practice, must share limited resources

26
Learning and Recall (3)
  • Possible strategies
  • Parallel learning and recall at all times
  • Mutually exclusive learning and recall
  • After incremental, eager learning, examples are
    discarded
  • or kept if batch or lazy learning used
  • Cheap on-the-fly learning (preprocessing),
    off-line computationally expensive learning
  • reduce raw information, change object language
  • analogy with human learning and the role of sleep

27
Timing Analysis of Theories Learned: Example
  • (Kazakov, PhD Thesis)
  • Beware of phase transition-like behaviour
  • left: a simple theory with low coverage succeeds
    or quickly fails → high speed
  • middle: medium coverage, a fragmentary theory,
    lots of backtracking → low speed
  • right: a general theory with high coverage, less
    backtracking → high speed

28
Types of Learning Task
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

29
Reinforcement Learning
  • Agent learns from environmental feedback
    indicating the benefit of states.
  • No explicit teacher required.
  • Learning target: an optimal policy (i.e., a
    state-action mapping).
  • Optimality measure: e.g., cumulative discounted
    reward.

30
Q Learning
  • Reinforcement Learning Algorithm.
  • Most popular agent learning technique.
  • Values (discounted cumulative reward) of
    state-action pairs are stored in a Q-table.
  • Optimal policy is easily derived from Q-table.

31
Q Learning
Value of a state: discounted cumulative reward
V^π(s_t) = Σ_{i≥0} γ^i r(s_{t+i}, a_{t+i})
0 ≤ γ < 1 is a discount factor (γ = 0 means that
only immediate reward is considered).
r(s_{t+i}, a_{t+i}) is the reward determined by
performing the actions specified by policy π.
Q(s,a) = r(s,a) + γ V(δ(s,a))
Optimal policy: π*(s) = argmax_a Q(s,a)
32
Example
[Diagram: two paths from the initial state s0: a short path with immediate reward 100, and a long path with rewards 50, 0 and 100.]
33
Example (cont.)
  • V_short(s0) = 100
  • V_long(s0) = 50 + γ·0 + γ²·100
  • γ = 0.8 → V_long(s0) = 114 → choose long
  • γ = 0.5 → V_long(s0) = 75 → choose short
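
The figures above can be checked with a few lines of Python (an illustrative addition, not part of the slides):

def discounted_return(rewards, gamma):
    """Sum_i gamma^i * r_i: the discounted cumulative reward of a path."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

short_path = [100]          # immediate reward 100
long_path = [50, 0, 100]    # reward 50 now, 100 two steps later

for gamma in (0.8, 0.5):
    v_short = discounted_return(short_path, gamma)
    v_long = discounted_return(long_path, gamma)
    best = 'long' if v_long > v_short else 'short'
    print(f'gamma={gamma}: V_short={v_short}, V_long={v_long} -> {best}')
# gamma=0.8: V_long = 50 + 0 + 0.64*100 = 114 -> long
# gamma=0.5: V_long = 50 + 0 + 0.25*100 = 75  -> short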

34
Q Learning
Initialize all Q(s,a) to 0. In some state s, choose
some action a. Let s' be the resulting state and r
the reward received. Update Q:
Q(s,a) ← r + γ max_{a'} Q(s',a')
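
A minimal tabular sketch of this update rule (an illustration, not from the slides); the environment is abstracted into a hypothetical step(s, a) function returning the next state and the reward, and exploration is purely random for simplicity:

import random
from collections import defaultdict

def q_learning(step, states, actions, n_steps=1000, gamma=0.9):
    """Tabular Q-learning as on the slide: Q(s,a) <- r + gamma * max_a' Q(s',a')."""
    Q = defaultdict(float)                # all Q(s, a) initialised to 0
    s = random.choice(states)
    for _ in range(n_steps):
        a = random.choice(actions)        # purely random exploration
        s_next, r = step(s, a)            # hypothetical environment call
        Q[(s, a)] = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
        s = s_next
    return Q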
35
Q Learning
  • Guaranteed convergence towards optimum
    (state-action pairs have to be visited infinitely
    often).
  • Exploration strategy can speed up convergence
    (more on this later).
  • Basic Q Learning does not generalize: replace the
    state-action table with function approximation
    (e.g. a neural net) in order to handle unseen
    states.

36
Learning in Multi-Agent Systems: Important Issues
  • Classification
  • Social Awareness
  • Communication
  • Role Learning
  • Distributed Learning
  • Focus: Learning of Coordination

37
A Brief History
[Diagram: two parallel progressions. Machine Learning: Disembodied ML → Single-Agent Learning → Multiple Single-Agent Learners → Social Multi-Agent Learners. Agents: Single-Agent System → Multiple Single-Agent System → Social Multi-Agent System.]
38
Types of Multi-Agent Learning [Weiss &
Dillenbourg 99]
  • Multiplied Learning: no interference in the
    learning process by other agents (except for the
    exchange of training data or outputs).
  • Divided Learning: division of the learning task
    on a functional level.
  • Interacting Learning: cooperation beyond the pure
    exchange of data.

39
Social Awareness
  • Awareness of existence of other agents and
    (eventually) knowledge about their behavior.
  • Not necessary to achieve near-optimal MAS
    behavior: rock sample collection [Steels 89].
  • Can it degrade performance?

40
Levels of Social Awareness [Vidal & Durfee 97]
  • 0-level agent: no knowledge about the existence
    of other agents.
  • 1-level agent: recognizes that other agents
    exist; models other agents as 0-level.
  • 2-level agent: has some knowledge about the
    behavior of other agents; models other agents as
    1-level agents.
  • k-level agent: models other agents as (k-1)-level.

41
Social Awareness and Q Learning
  • 0-level agents already learn implicitly about
    other agents.
  • [Mundhe and Sen 00]: a study of two Q-learning
    agents up to level 2.
  • Two 1-level agents display slowest and least
    effective learning (worse than two 0-level
    agents).

42
Agent Models and Q Learning
  • Q: S × A^n → R, where n is the number of agents.
  • If the other agents' actions are not observable,
    an assumption about their actions is needed.
  • Pessimistic assumption: given an agent's action
    choice, the other agents will minimize the reward.
  • Optimistic assumption: the other agents will
    maximize the reward.

43
Agent Models and Q Learning
  • The pessimistic assumption leads to overly
    cautious behavior.
  • The optimistic assumption guarantees convergence
    towards the optimum [Lauer & Riedmiller 00].
  • If knowledge of the other agents' behavior is
    available, the Q-value update can be based on a
    probabilistic computation [Claus & Boutilier 98].
    But there is no guarantee of optimality.
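
A small illustrative sketch (not from the slides) of the two assumptions for a 2-agent case, where agent 1 stores joint-action Q-values indexed by (own action, other agent's action); the numbers are made up:

# Joint-action Q-values for agent 1: Q[(own_action, other_action)]
Q = {('a', 'a'): 11, ('a', 'b'): -30,
     ('b', 'a'): -30, ('b', 'b'): 7}

own_actions = ['a', 'b']
other_actions = ['a', 'b']

def pessimistic_value(own):
    """Assume the other agent will minimize our reward."""
    return min(Q[(own, other)] for other in other_actions)

def optimistic_value(own):
    """Assume the other agent will maximize our reward."""
    return max(Q[(own, other)] for other in other_actions)

print({a: pessimistic_value(a) for a in own_actions})  # {'a': -30, 'b': -30}
print({a: optimistic_value(a) for a in own_actions})   # {'a': 11, 'b': 7}
# The pessimistic agent is overly cautious; the optimistic agent prefers 'a'.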

44
Q Learning and Communication [Tan 93]
  • Types of communication
  • Sharing sensation
  • Sharing or merging policies
  • Sharing episodes
  • Results
  • Communication generally helps
  • Extra sensory information may hurt

45
Role Learning
  • Often useful for agents to specialize in specific
    roles for joint tasks.
  • Pre-defined roles reduce flexibility; the optimal
    distribution is often not easy to define and may
    be expensive.
  • How to learn roles?
  • [Prasad et al. 96]: learn the optimal
    distribution of pre-defined roles.

46
Q Learning of Roles
  • [Crites & Barto 98]: elevator domain, regular
    Q-learning; no specialization achieved (but highly
    efficient behavior).
  • [Ono & Fukumoto 96]: hunter-prey domain;
    specialization achieved with the "greatest mass"
    merging strategy.

47
Q Learning of Roles [Balch 99]
  • Two main types of reward function: local and
    global.
  • Global reward supports specialization.
  • Local reward supports emergence of homogeneous
    behaviors.
  • Some domains benefit from learning team
    heterogeneity (e.g., robotic soccer), others do
    not (e.g., multi-robot foraging).
  • Heterogeneity measure: social entropy.

48
Distributed Learning
  • Motivation: agents learning a global hypothesis
    from local observations.
  • Application of MAS techniques to (inductive)
    learning.
  • Applications: Distributed Data Mining [Provost &
    Kolluri 99], robotic soccer.

49
Distributed Data Mining
  • [Provost & Hennessy 96]: individual learners see
    only a subset of all training examples and compute
    a set of local rules based on these.
  • Local rules are evaluated by other learners based
    on their data.
  • Only rules with good evaluation are carried over
    to the global hypothesis.
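
A schematic sketch of the rule-filtering idea (an assumption about the mechanics, not the authors' exact algorithm): each rule is a predicate over examples, and a locally induced rule enters the global hypothesis only if it also scores well on the other learners' data.

def accuracy(rule, data):
    """Fraction of local examples (x, label) on which the rule's prediction matches."""
    return sum(rule(x) == label for x, label in data) / len(data)

def merge_local_rules(local_rule_sets, local_data_sets, threshold=0.9):
    """Keep learner i's rule in the global hypothesis only if every other
    learner's data also supports it (a simplified acceptance criterion)."""
    global_rules = []
    for i, rules in enumerate(local_rule_sets):
        for rule in rules:
            other_data = [d for j, d in enumerate(local_data_sets) if j != i]
            if all(accuracy(rule, d) >= threshold for d in other_data):
                global_rules.append(rule)
    return global_rules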

50
Learning to Coordinate
  • Good coordination is crucial for good MAS
    performance.
  • Example: a soccer team.
  • Pre-defined coordination protocols are often
    difficult to define in advance.
  • Needed: learning of coordination.
  • Focus: Q-learning of coordination.

51
Soccer Formation
52
Soccer Formation Control
  • Formation control is a coordination problem.
  • Good formations and set-plays seem to be a strong
    factor in winning teams.
  • To date: pre-defined.
  • Can (near-)optimal formations be (reinforcement)
    learned?

53
A Sub-Problem
  • Given n agents at random positions, and a
    formation having n positions.
  • Wanted: a set of n policies that transforms the
    initial state into the desired formation.
  • Specifically: Q-learning of these policies.

54
A Further Simplification
  • MAS policy: a decision procedure for who takes
    which position.
  • No two agents should choose the same formation
    position.
  • Problem reduces to reinforcement learning of
    coordination in cooperative games.

55
Cooperative Games
  • Players perform actions simultaneously.
  • Afterwards, all players receive the same reward
    based on the joint action.

                Player 2
                 A1   A2
Player 1   A1     5    3
           A2     2    0
56
Cooperative Games and Formations
  • Consider a 2-player formation with 2 positions:
    left, right.
  • Corresponding cooperative game

                  Player 2
                 left  right
Player 1  left     0     5
          right    5     0
57
Learning in Cooperative Games
  • To date: focus on Q-learning.
  • Is communication/observation amongst agents
    necessary?
  • Does this requirement change with increasing
    difficulty of the cooperative game?

58
Convergence
  • Single-agent Q-learning: guaranteed convergence
    (to the optimum).
  • Multi-agent Q-learning: more assumptions needed.
  • Crucial in MAS: the action selection strategy.

59
Q Learning Revisited
  • Modified Q update function:
    Q(a) ← Q(a) + α (r - Q(a))
  • Boltzmann action selection strategy:
    P(a) = e^{EV(a)/T} / Σ_{a'} e^{EV(a')/T}
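
A minimal sketch (not from the slides) of Boltzmann action selection over the Q-values of a stateless game, with EV(a) = Q(a):

import math
import random

def boltzmann_choice(q_values, temperature):
    """Sample an action with probability proportional to exp(EV(a) / T)."""
    actions = list(q_values)
    weights = [math.exp(q_values[a] / temperature) for a in actions]
    total = sum(weights)
    return random.choices(actions, weights=[w / total for w in weights])[0]

q = {'A1': 5.0, 'A2': 3.0}
print(boltzmann_choice(q, temperature=10.0))  # high T: choice is close to uniform
print(boltzmann_choice(q, temperature=0.1))   # low T: almost always 'A1'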

60
Boltzmann Exploration
  • Usually EV(a) = Q(a).
  • Trade-off between exploration and exploitation.
  • Higher temperature T results in more emphasis on
    exploration.
  • Temperature T should be high at first, and
    lowered with time (e.g., T(t) = e^(-s·t)).

61
Q Learning of Coordination
  • [Singh et al. 2000]: convergence to some joint
    action can be ensured with specific temperature
    properties.
  • Convergence to the optimal joint action for
    simple cases:

                Player 2
                 A1   A2
Player 1   A1     5    3
           A2     2    0
62
Difficult Cooperative Games
  • Climbing Game [Claus & Boutilier 98]

                Player 2
                 a    b    c
Player 1   a    11  -30    0
           b   -30    7    6
           c     0    0    5
63
Climbing Game
  • Multiplied Q learning with Boltzmann exploration
    converges to suboptimal (c,c).
  • [Claus & Boutilier 98]: joint action learners
    (JAL).
  • Agents observe each other's actions and build a
    probabilistic model, according to which the next
    action is chosen.
  • Agents get to (b,b) but are stuck there.

64
Climbing Game (cont.)
  • Optimistic assumption [Lauer & Riedmiller 00]:
    never reduce Q-values due to penalties.
  • Converges quickly to the optimal (a,a).
  • However, it does not converge on the stochastic
    version of the climbing game.
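
A minimal sketch (not from the slides) of two independent learners using the optimistic update (never reduce a value because of a penalty) on the deterministic climbing game above; action selection here is uniformly random, which is enough for the deterministic case:

import random

# Deterministic climbing game payoffs, indexed by (action of player 1, action of player 2)
REWARD = {('a', 'a'): 11, ('a', 'b'): -30, ('a', 'c'): 0,
          ('b', 'a'): -30, ('b', 'b'): 7,  ('b', 'c'): 6,
          ('c', 'a'): 0,   ('c', 'b'): 0,  ('c', 'c'): 5}
ACTIONS = ['a', 'b', 'c']

# One value table per agent, over its own actions only (the other agent is not observed)
q1 = {a: float('-inf') for a in ACTIONS}
q2 = {a: float('-inf') for a in ACTIONS}

for _ in range(2000):
    a1, a2 = random.choice(ACTIONS), random.choice(ACTIONS)
    r = REWARD[(a1, a2)]
    q1[a1] = max(q1[a1], r)   # optimistic: keep only the best reward seen so far
    q2[a2] = max(q2[a2], r)

greedy = (max(q1, key=q1.get), max(q2, key=q2.get))
print(q1, q2, greedy)         # should converge to the optimal joint action ('a', 'a')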

65
Stochastic Climbing Game
                Player 2
                  a       b       c
Player 1   a   12/10   0/-60   0/-60
           b   0/-60   14/0     8/4
           c    5/-5    5/-5    7/3
66
FMQ Heuristic
  • [Kapetanakis & Kudenko 02]
  • EV(a) = Q(a) + c · freq(maxR(a)) · maxR(a)
  • EV(a) carries information on how frequently an
    action produces its maximum corresponding reward.
  • Converges to the optimal (a,a) for the climbing
    game and the partially stochastic climbing game.
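
A sketch of how the FMQ evaluation could be computed from an agent's own action history (the class structure, learning rate and weight c are illustrative assumptions):

from collections import defaultdict

class FMQLearner:
    """Tracks, per action: a Q-value, the maximum reward seen (maxR) and how
    often that maximum occurred, combined as
    EV(a) = Q(a) + c * freq(maxR(a)) * maxR(a)."""

    def __init__(self, actions, c=10.0, alpha=0.1):
        self.c, self.alpha = c, alpha
        self.q = {a: 0.0 for a in actions}
        self.max_r = {a: float('-inf') for a in actions}
        self.count = defaultdict(int)       # times action a was played
        self.max_count = defaultdict(int)   # times a yielded its maximum reward

    def update(self, a, r):
        self.q[a] += self.alpha * (r - self.q[a])
        self.count[a] += 1
        if r > self.max_r[a]:
            self.max_r[a], self.max_count[a] = r, 1
        elif r == self.max_r[a]:
            self.max_count[a] += 1

    def ev(self, a):
        if not self.count[a]:
            return self.q[a]
        freq = self.max_count[a] / self.count[a]
        return self.q[a] + self.c * freq * self.max_r[a]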

67
Partially Stochastic Climbing Game
                Player 2
                  a       b       c
Player 1   a    11      0/-60   0/-60
           b   0/-60   14/0     8/4
           c    5/-5    5/-5    7/3
68
Difficult Cooperative Games
  • Penalty Game [Claus & Boutilier 98]

                Player 2
                 a    b    c
Player 1   a    10    0    k
           b     0    2    0
           c     k    0   10
69
Penalty Game
  • JAL: convergence to the optimal (a,a) or (c,c)
    only for small penalties k (k > -20).
  • Both the optimistic assumption and FMQ converge
    to either optimum also for large penalties (up to
    100).

70
Learning of Coordination More Questions
  • Scaling-up of Q learning approaches?
  • Agents with state [Boutilier 99].
  • Large numbers of actions/agents?
  • Learning of formations from non-explicit rewards?

71
Learning of Coordination Conclusions
  • Idealized and simple cases have been studied and
    solved.
  • Mutual communication/observation may not be
    needed.
  • Beyond Q-learning: evolutionary approaches
    [Quinn 01].

72
Learning and Natural Selection
  • In learning, search is trivial, choosing the
    right bias is hard.
  • But the choice of learning bias is always
    external to the learner!
  • To find the best suited bias one could combine
    arbitrary choices of bias with evolution and
    natural selection of the fittest individuals.

73
Darwinian vs. Lamarckian Evolution
  • Darwinian evolution: nothing learned by the
    individual is encoded in the genes and passed on
    to the offspring.
  • The Baldwin effect: learning abilities (good
    biases) are selected because they give the
    individual a better chance in a dynamic
    environment.
  • What is passed on to the offspring is useful, but
    very general.

74
Darwinian vs. Lamarckian Evolution (2)
  • Lamarckian evolution: individual experience
    acquired in life can be inherited.
  • Not the case in nature.
  • Doesn't mean we can't use it.
  • The inherited concepts may be too specific and
    not of general importance.

75
Learning and Language
  • Language uses concepts which are:
  • specific enough to be useful to most/all speakers
    of that language
  • general enough to correspond to shared experience
    (otherwise, how would one know what the other is
    talking about!)
  • The concepts of a language serve as a learning
    bias which is inherited not in the genes but
    through education.

76
Communication and Learning
  • Language:
  • helps one learn (in addition to inherited biases)
  • allows one to communicate knowledge.
  • Distinguish between:
  • Knowledge: things that one can explain to another
    by means of a language.
  • Skills: the rest; they require individual
    learning and cannot be communicated.

77
Communication and Learning
  • In language learning, "forgetting" examples may
    be harmful (van den Bosch et al.).
  • "An expert is someone who does not think anymore:
    he knows." (Frank Lloyd Wright)
  • It may be difficult to communicate what one has
    learned because of:
  • Limited bandwidth (for lazy learning)
  • The absence of appropriate concepts in the
    language (for black-box learning)

78
Communication and Learning
  • In a society of communicating agents, less
    accurate white-box learning may be better than
    more accurate but expensive learning that cannot
    be communicated, since the reduced performance
    could be outweighed by the much lower cost of
    learning.

79
Stochastic Simulation of Inherited Kinship-Driven
Altruism
Heather Turner and Dimitar Kazakov
  • Assess the rôle of a hypothetical inherited
    feature (gene) promoting altruism between
    relatives as a factor for survival, in the
    context of a simulated MAS employing natural
    selection.
  • Studies the link between evolution and
    co-operation.
80
Altruism
  • Definition: a selfless behaviour/action that will
    provide benefit to another at no gain, or to the
    detriment, of the actor.
  • Kinship-driven altruism: altruistic behaviour
    directed at relatives.

81
Natural Selection of Inherited Behaviour
  • Classical Darwinism:
    - survival of the fittest individuals
    - fitness is the ability to reproduce
    - actions hindering fitness disappear
  • Neo-Darwinism (Dawkins, Hamilton):
    - genes rather than individuals are selected
    - inclusive fitness of all copies of a gene
    - actions increasing inclusive fitness are
      promoted

82
Altruism and Natural Selection
  • Classical Darwinism:
    - altruism hinders one's fitness, hence it should
      be demoted by natural selection
  • Neo-Darwinism:
    - altruistic acts can positively influence the
      inclusive fitness of a gene in the population
    - e.g. kinship-driven altruism

83
Kinship-Driven Altruism: Example

[Figure: two cases, Case A and Case B, illustrating kinship-driven altruism.]
84
Kinship-Driven Altruism: Example
  • A self-sacrifice is justified as it removes one
    rather than 1.5 copies of each gene.
  • Hamilton (1964) presents a detailed analytical
    model and supports it with evidence from nature.

85
Kinship-Driven Altruism
  • Extent of help = f(degree of kinship)
  • Altruistic behaviour based on a well-chosen f
    could increase the inclusive fitness of all genes
    of its carriers, i.e., would help propagate
    them.
  • If altruism were an inherited feature (gene), it
    could itself be propagated for the same reason.

86
One Day in the Life of an Agent
  • Age and maybe die.
  • Mate
  • Hunt
  • Help a relative.
  • Death and finding a mate, food or a relative are
    modelled as stochastic processes.

87
Altruistic Behaviour
  • If you meet someone poorer than you:
  • Use your sharing function to decide how much you
    would give an identical twin
  • Then reduce the amount according to the perceived
    degree of kinship (expected average percentage of
    shared genes), e.g. by half for a child

88
Experiments: Degrees of Freedom
  • Type of sharing function
  • Models of the degree of kinship
  • Initial ratio between selfish and altruistic
    individuals
  • (Hunting and mating gambling policies always
    subject to evolution.)

89
Type of Sharing Function
  • Communism
  • Progressive taxation with a non-taxable allowance
  • Poll tax: pay the same amount pt, even if that
    kills you
  • The parameters q, a and pt are inherited and
    subject to natural selection
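
The slides do not give formulas for the three sharing functions; the sketch below is one plausible parameterisation (purely an assumption), reading q as a taxation rate, a as the non-taxable allowance and pt as the poll-tax amount. The returned amount is what would be given to an identical twin, before scaling by the degree of kinship as on the "Altruistic Behaviour" slide.

def communism(my_energy, their_energy):
    """Equalise resources: give half the difference (one possible reading)."""
    return max((my_energy - their_energy) / 2, 0.0)

def progressive_taxation(my_energy, their_energy, q=0.3, a=10.0):
    """Give a fraction q of whatever exceeds the non-taxable allowance a."""
    return q * max(my_energy - a, 0.0)

def poll_tax(my_energy, their_energy, pt=5.0):
    """Give the fixed amount pt, even if that kills you."""
    return pt

def donation(sharing_fn, my_energy, their_energy, kinship):
    """Scale the twin-level amount by the perceived degree of kinship
    (e.g. 0.5 for a child); only help someone poorer than you."""
    if their_energy >= my_energy:
        return 0.0
    return sharing_fn(my_energy, their_energy) * kinship

print(donation(progressive_taxation, my_energy=50.0, their_energy=20.0, kinship=0.5))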

90
Modelling the Degree of Kinship
[Figure: family tree with the expected percentage of shared genes: grandparents 25, parents 50, self 100, full siblings 50, half siblings 25.]
  • Royalty: the entire family tree is known (in
    practice, just two generations back and forth).
  • Prediction: based on the similarity of visible
    inherited features.
  • Unknown: indiscriminate, optimistic (share with
    everyone as if a child).
91
Results: Population Size

[Figure: population size over time for each combination of kinship model (Royalty, Prediction, Unknown) and sharing function (Communism, Progressive Taxation, Poll Tax).]
92
Results: Percentage of Altruists

[Figure: percentage of altruists over time for each combination of kinship model (Royalty, Prediction, Unknown) and sharing function (Communism, Progressive Taxation, Poll Tax).]
93
Results: Initial Percentage of Altruists
  • Royalty model, progressive taxation, initial
    levels of altruists: 0, 25, 50, 75 and 100.
  • All converge to the same ratio of altruists in
    the population.

94
Conclusions
  • Perfect knowledge of the degree of kinship, or a
    sharing function based on progressive taxation,
    promotes altruism.
  • Progressive taxation supports a more altruistic
    population than communism (in the limit) when
    knowledge of kinship is uncertain.

95
Contribution
  • Replicate the natural phenomenon of kinship
    altruism in a simulated MAS.
  • Implement a model of natural selection different
    from the one commonly used in GA and MAS and
    closer to nature.

96
Bibliography
Alonso, E., Kudenko, D. and Kazakov, D. (eds.) (2002). Proc. of the Second Symposium on Adaptive Agents and Multi-Agent Systems, Imperial College, London. ISBN 1902956280.
Alonso, E. and Kudenko, D. (eds.) (2001). Proc. of the Symposium on Adaptive Agents and Multi-Agent Systems, University of York, UK.
Baldwin, J.M. (1896). A new factor in evolution. The American Naturalist 30.
Bryant, C.H. and Muggleton, S. (2000). Closed loop machine learning. Technical Report YCS 330, University of York, Department of Computer Science, Heslington, York, UK.
[Boutilier 99] C. Boutilier. Sequential Optimality and Coordination in Multiagent Systems. IJCAI 99.
[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI 98.
[Hamilton 64] W.D. Hamilton. The genetical evolution of social behaviour (I and II). Journal of Theoretical Biology, 1964.
[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proc. of the 17th International Conference on Machine Learning, 2000.
[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.
[Quinn 01] M. Quinn. Evolving communication without dedicated communication channels. ECAL '01, Springer LNCS 2159.
Rissanen, J. (1989). Stochastic Complexity in Statistical Enquiry. World Scientific Publishing Co, Singapore.
Thompson, C., Califf, M.E. and Mooney, R. (1999). Active learning for natural language parsing and information extraction. Proceedings of the Sixteenth International Conference on Machine Learning.
[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 Workshop on Multiagent Learning, 1997.
[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is "Multi" in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning: Cognitive and Computational Approaches. Pergamon Press, 1999.