Title: Computational Neuroscience of Reinforcement Learning
1. Computational Neuroscience of Reinforcement Learning
- François Rivest
- September 25, 2007
- McGill University
- Affiliation: Département d'informatique et de recherche opérationnelle, Groupe de recherche sur le système nerveux central, Université de Montréal
2. Computational Neuroscience of Reinforcement Learning
3. What is Reinforcement Learning?
- The reinforcement learning framework was originally developed to create a neural model of animal learning behaviors (by Andrew Barto in the 1980s).
- The framework is the following (a minimal sketch in code follows this slide):
- The agent is in an environment.
- It sees a state and takes an action.
- It receives a reward and ends up in a new state.
- The agent's goal is to maximize its reward.
- This led to the development of the temporal-difference (TD) learning algorithm.
The reward is the reinforcement signal
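A minimal sketch of this interaction loop (the toy environment and all names below are illustrative, not from the slides):

```python
import random

class TwoStateEnv:
    """Toy environment: action 1 taken in state 1 pays a reward of 1."""
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        self.state = random.choice([0, 1])   # the agent ends up in a new state
        return self.state, reward

env = TwoStateEnv()
state = env.reset()
total = 0.0
for t in range(100):
    action = random.choice([0, 1])       # a learning agent would instead pick actions to maximize reward
    state, reward = env.step(action)     # see a state, take an action, receive a reward
    total += reward
print("total reward over 100 steps:", total)
```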
4. Temporal Difference Learning
- The key idea in TD is to learn a function of the state that estimates the sum of future rewards to be expected from that state.
- The TD trick for learning this: the current estimate should equal the reward received plus the estimate from the next state.
Long-term estimate of reward
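In standard notation, with γ the usual discount factor, the quantity being estimated and the recursion behind the TD trick are:

```latex
V(s_t) \approx \mathbb{E}\left[\, r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \right]
       = \mathbb{E}\left[\, r_{t+1} + \gamma\, V(s_{t+1}) \right]
```

The second equality is what licenses bootstrapping: the discounted sum from t+1 onward is itself V(s_{t+1}).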
5. Temporal Difference Learning
- Thus we would like: V(s_t) = r_{t+1} + γ V(s_{t+1}).
- The error in the current state estimate is therefore the temporal difference in estimates (the difference between the current and next time step):
δ_t = r_{t+1} + γ V(s_{t+1}) − V(s_t)
Called the effective reinforcement signal!
6. Example: The expected time in traffic
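The slide's figure is not reproduced here, but a hedged TD(0) sketch of this kind of prediction task follows (the states, segment times, and learning rate are invented for illustration; since we are predicting elapsed time rather than discounted reward, no discounting is used):

```python
import random

# V(s): estimated minutes remaining until home from each stage of a commute.
states = ["office", "car", "highway", "exit", "home_street", "home"]
V = {s: 0.0 for s in states}
alpha = 0.1   # learning rate

def segment_time(i):
    """Minutes spent between consecutive states (noisy, made-up numbers)."""
    base = [5.0, 15.0, 10.0, 10.0, 3.0][i]
    return max(0.0, random.gauss(base, 2.0))

for episode in range(1000):
    for i in range(len(states) - 1):
        s, s_next = states[i], states[i + 1]
        elapsed = segment_time(i)
        # TD error: observed segment time plus next-state estimate,
        # minus the current estimate.
        delta = elapsed + V[s_next] - V[s]
        V[s] += alpha * delta

for s in states:
    print(f"{s:12s} expected minutes remaining: {V[s]:5.1f}")
```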
7. Linear Model for Neuron V
- Assume a linear neuron V fed by some other neurons x_k: V_t = Σ_k w_k x_{k,t}.
- To minimize the error δ_t, we take the gradient of the squared error and find the learning rule Δw_k = α δ_t x_{k,t} (derivation below).
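Spelling out the derivation the slide alludes to, treating the bootstrapped target r_{t+1} + γV_{t+1} as a constant (the usual "semi-gradient" step):

```latex
V_t = \sum_k w_k x_{k,t}, \qquad
\delta_t = r_{t+1} + \gamma V_{t+1} - V_t
```

```latex
-\frac{\partial}{\partial w_k}\,\tfrac{1}{2}\delta_t^2
  \;\approx\; \delta_t \, x_{k,t}
\quad\Longrightarrow\quad
\Delta w_k = \alpha \, \delta_t \, x_{k,t}
```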
8. Demo
- Conditioning
- Masking
- Secondary conditioning
- Note that this TD actor-critic (A-C) model covers far more conditioning cases than the Rescorla-Wagner model (secondary conditioning, masking, etc.) (1)
9. The Actor-Critic Model
This is a policy
10. The Actor-Critic Model
- (Stochastic) softmax rule
- Similar to hardmax or winner-take-all, but the probability that a given actor unit will be the winner (the active neuron) is proportional to its activation.
- Intuitive learning rule (for the winner unit a_j), sketched in code below:
- More reward than expected means a good alternative: strengthen the state/action association.
- Less reward than expected means a bad alternative: weaken the state/action association.
Hebbian?
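A sketch of this stochastic winner-take-all actor and its δ-modulated update (the sizes, constants, and the exponential form of the selection rule are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_actions = 4, 3
W = rng.normal(0.0, 0.1, size=(n_actions, n_inputs))  # state/action associations
alpha = 0.05                                           # actor learning rate

def select_action(x, temperature=1.0):
    """Winner is drawn with probability increasing in each unit's activation."""
    a = W @ x                      # actor unit activations for this state
    p = np.exp(a / temperature)
    p /= p.sum()
    return rng.choice(n_actions, p=p)

def actor_update(x, j, delta):
    """delta > 0 (more reward than expected): strengthen the winner's
    association with this state; delta < 0: weaken it."""
    W[j] += alpha * delta * x

x = rng.normal(size=n_inputs)      # some state representation
j = select_action(x)
actor_update(x, j, delta=+0.5)     # e.g., a positive TD error
```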
11. Example: Finding the traffic shortcut
12. Demo
13. How is this Related to Neurophysiology?
[Diagram: dopamine projections from the SNc (substantia nigra pars compacta) and VTA (ventral tegmental area) to the striatum and frontal cortex]
14. The Actor-Critic Model
15. TD Error Signal and Dopaminergic Neurons
16. Incentive Salience
17. Incentive Salience
- When DA is blocked, animals trained to run a maze for reward tend to stay still. But if placed manually near the reward, they take as much of it as they normally would.
- Question:
- How can δ (the DA signal) also play a role in real-time action-selection modulation?
18. Incentive Salience
- Assume that, to trigger an action, the winning actor must reach a minimal activation threshold.
- How could we use the DA signal to modulate activity? (One possible scheme is sketched below.)
- Now, what happens when we block DA in this modified model?
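One possible scheme, purely as a sketch (the multiplicative gain and the specific threshold are assumptions, not something the slides commit to): let DA scale actor activations; blocking DA then drops every unit below the firing threshold, so no action is triggered even though the learned associations remain intact.

```python
import numpy as np

# Hypothetical scheme: dopamine acts as a multiplicative gain on actor
# activations; an action fires only if the winning unit clears a threshold.
THRESHOLD = 0.5

def triggered_action(activations, da_gain):
    """Return the winning action index, or None if nothing clears threshold."""
    a = da_gain * np.asarray(activations)
    j = int(np.argmax(a))
    return j if a[j] >= THRESHOLD else None

acts = [0.2, 0.7, 0.4]                        # learned actor activations for some state
print(triggered_action(acts, da_gain=1.0))    # normal DA: action 1 fires
print(triggered_action(acts, da_gain=0.3))    # DA blocked: None, the animal stays still
```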
19. Drug Addiction
20. Relations to Drug Addiction
- Some addictive drugs increase the dopamine level.
- Questions (without incentive salience):
- What does it mean in terms of the reward estimate?
- What does it mean in terms of action learning (conditioning)?
- Is there hope for unlearning?
- What would be the effect of incentive salience?
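For the reward-estimate and unlearning questions, one well-known formalization (Redish, 2004) treats the drug as putting a floor under δ that cannot be predicted away; a sketch with invented numbers:

```python
# Sketch of Redish's (2004) account: the drug-induced dopamine surge puts a
# floor under the TD error, so it can never be predicted away. The value of
# the drug-taking state then grows without bound, and ordinary extinction
# (delta < 0 once reward stops) cannot occur. Numbers are illustrative.
alpha, gamma, D = 0.1, 0.9, 1.0     # learning rate, discount, drug dopamine surge
V = 0.0                             # value estimate of the drug-taking state
for trial in range(100):
    delta_td = 0.0 + gamma * 0.0 - V        # no natural reward, terminal next state
    delta = max(delta_td + D, D)            # drug: effective delta is floored at D
    V += alpha * delta
print(f"V after 100 trials: {V:.1f}")       # keeps growing ~ alpha * D per trial
```

Since δ never goes negative at the drug state, the value keeps inflating and unlearning never gets a chance to occur.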
21. The Actor-Critic Model
22. Relation to Representation Learning
23. Relation to Representation Learning
- Assume the cortex learns by experimenting.
- The basal ganglia A-C TD model uses cortical inputs as part of its representation.
- Assume that novelty (something we can't yet represent is novel) is rewarding (1).
- How could this be helpful in learning new representations? (A sketch follows this slide.)
- Can this be related to play?
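A minimal sketch of the novelty-as-reward idea, assuming (purely for illustration) that novelty is measured by the prediction error of a simple linear representation model:

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.zeros((4, 4))          # toy linear "representation" predicting x from x
beta, eta = 0.5, 0.1          # novelty bonus scale, representation learning rate

def novelty_bonus(x):
    """Prediction error of the representation model: high for novel inputs."""
    err = x - W @ x
    return beta * float(err @ err)

def observe(x, external_reward):
    global W
    r = external_reward + novelty_bonus(x)    # augmented reward fed to the critic
    W += eta * np.outer(x - W @ x, x)         # representation improves with exposure
    return r

x = rng.normal(size=4)
for step in range(5):
    print(f"step {step}: augmented reward = {observe(x, 0.0):.3f}")  # bonus decays
```

As the representation improves, the bonus for that input decays to zero, so the model is drawn toward whatever it cannot yet represent, which is one way to read the connection to play.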
24. Relation to Representation Learning
[Diagram: actor-critic architecture showing motor output, actor units, the value unit V, reward, DA, and a novelty signal connected to frontal, higher, and primary cortex]
25. Relation to Cognition, Frontal Cortex, and Gating
26. DA and Frontal Cortex
[Diagram: DA projections from the SNc and VTA to the striatum and frontal cortex]
27. Working Memory (and Gating)
- Frontal cortex is often considered to implement working memory.
- Working memory must contain some form of gating (a sketch follows).
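A minimal sketch of what "some form of gating" might mean computationally (the cell and gate signal are illustrative; nothing here is specific to the frontal cortex):

```python
# Gated working-memory cell: stored content changes only when a gate signal
# (here, imagined as a phasic DA event) opens the gate; otherwise the memory
# is maintained against new input. Purely illustrative.
class GatedMemoryCell:
    def __init__(self):
        self.content = None
    def step(self, new_input, gate_open):
        if gate_open:                  # e.g., a salient / reward-predicting event
            self.content = new_input   # load: overwrite memory with current input
        return self.content            # otherwise: maintain previous content

wm = GatedMemoryCell()
print(wm.step("cue A", gate_open=True))        # cue A      (loaded)
print(wm.step("distractor", gate_open=False))  # cue A      (maintained)
print(wm.step("cue B", gate_open=True))        # cue B      (updated)
```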
28. Working Memory (and Gating)
- Frontal cortex differs from the rest of cortex by its DA input.
29. How could DA act on gates (directly)?
- By indicating when to shift memory content or memory context (WCS example)? (1)
- By indicating salient events?
  - Novelty?
  - Prediction error?
  - Reward-related stimulus?
- By directing attention?
  - Incentive salience in goals?
30. How could DA act on learning (indirectly)?
- By guiding learning:
  - What to learn (structural abstraction)?
  - When to learn (temporal abstraction)?
  - Positive or negative corrections?
  - Large or small corrections?
31. DA and Frontal Cortex
[Diagram: DA projections from the SNc and VTA to the striatum and frontal cortex]
32. Questions?
Email: francois.rivest_at_mail.mcgill.ca
Blog: www-etud.iro.umontreal.ca/rivestfr/wordpress