1
Computational Neuroscience of Reinforcement Learning
  • François Rivest
  • September 25, 2007
  • McGill University

Affiliation: Département d'informatique et de recherche opérationnelle, Groupe de recherche sur le système nerveux central, Université de Montréal
2
Computational Neuroscience of Reinforcement Learning
3
What is Reinforcement Learning?
  • The reinforcement learning framework was originally developed (by Andrew Barto in the 1980s) to create a neural model of animal learning behavior.
  • The framework is as follows (see the code sketch below):
  • The agent is in an environment
  • It sees a state and takes an action
  • It receives a reward and ends up in a new state
  • The agent's goal is to maximize its reward
  • This led to the development of the temporal-difference learning algorithm.

The reward is the reinforcement signal
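
A minimal Python sketch of this interaction loop (the agent and environment interfaces here are hypothetical placeholders, not from the presentation):

    def run_episode(agent, env, max_steps=100):
        state = env.reset()                    # the agent is in an environment
        total_reward = 0.0
        for _ in range(max_steps):
            action = agent.act(state)          # it sees a state, takes an action
            next_state, reward, done = env.step(action)  # a reward and a new state
            agent.learn(state, action, reward, next_state)
            total_reward += reward             # the goal: maximize reward
            state = next_state
            if done:
                break
        return total_reward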
4
Temporal Difference Learning
  • The key idea in TD is to learn a function of the state that estimates the sum of future rewards to be expected from that state.
  • The TD trick for learning it: the current estimate should equal the reward received plus the next state's estimate.
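
The slide's equation image is not part of the transcript; in standard notation (discount factor \gamma \in [0,1), rewards r_t), the quantity being estimated is

    V(s_t) \approx \mathbb{E}\left[\, r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots \right]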

Long-term estimate of reward
5
Temporal Difference Learning
  • Thus we would like the current estimate to equal the reward received plus the next state's estimate.
  • The error in the current state estimate is therefore the temporal difference in the estimate (the difference between the current and next time steps).
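
The equation images are missing from the transcript; a standard reconstruction from the surrounding text (same notation as above) is

    V(s_t) \approx r_t + \gamma V(s_{t+1})

so the error in the current estimate is the temporal difference

    \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)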

Called the effective reinforcement signal!
6
Example: The expected time in traffic
7
Linear Model for Neuron V
  • Assume a linear neuron V fed by some other neurons x_k.
  • To minimize the error δ_t, we take the gradient of the squared error and find the learning rule.
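
The slide's equations are not in the transcript; a standard reconstruction, with weights w_k and a learning rate \alpha (notation assumed), is

    V_t = \sum_k w_k x_{k,t}, \qquad E_t = \tfrac{1}{2}\,\delta_t^2

Holding the next-state estimate fixed (the TD trick), \partial E_t / \partial w_k = -\delta_t\, x_{k,t}, so gradient descent yields the learning rule

    \Delta w_k = \alpha\, \delta_t\, x_{k,t}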

8
Demo
  • Conditioning
  • Masking
  • Secondary conditioning
  • Note that this TD A-C model covers far more conditioning cases than the Rescorla-Wagner model (secondary conditioning, masking, etc.) (1)

9
The Actor-Critic Model
This is a policy
10
The Actor-Critic Model
  • (Stochastic) softmax rule (sketched below)
  • Similar to hardmax or winner-take-all, but the probability that a given actor unit will be the winner (the active neuron) is proportional to the exponential of its activation.
  • Intuitive learning rule (for the winner unit a_j):
  • More reward than expected means a good alternative: strengthen the state/action association.
  • Less reward than expected means a bad alternative: weaken the state/action association.

Hebbian?
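
A minimal Python sketch of this stochastic softmax actor (the function and variable names and the learning rate alpha are illustrative assumptions, not from the slides):

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())            # shift by max for numerical stability
        return e / e.sum()

    def choose_action(W, x, rng):
        """W: (n_actions, n_features) actor weights; x: state features."""
        p = softmax(W @ x)                 # winner probability grows with activation
        return rng.choice(len(p), p=p)     # stochastic winner-take-all

    def update_actor(W, j, x, delta, alpha=0.1):
        # delta > 0 (more reward than expected): strengthen the state/action link;
        # delta < 0 (less than expected): weaken it. Three-factor and Hebbian-like:
        # presynaptic x, winning (postsynaptic) unit j, global signal delta.
        W[j] += alpha * delta * x

Here rng would be, e.g., np.random.default_rng(0); delta is the TD error from the critic.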
11
Example: Finding the traffic shortcut
12
Demo
  • Sequence Learning (1)

13
How is this Related to Neurophysiology?
[Diagram: Striatum, Frontal Cortex, SNc, VTA]
14
The Actor-Critic Model
15
TD Error Signal and Dopaminergic Neurons
16
Incentive Saliency
17
Incentive Saliency
  • When DA is blocked, animals accustomed to running a maze to get a reward tend to stay still. But if placed manually near the reward, they take as much of it as they normally would.
  • Question:
  • How can δ (DA) also play a role in real-time action-selection modulation?

18
Incentive Saliency
  • Assume that, to trigger an action, the winning actor must reach a minimal activation threshold.
  • How could we use the DA signal to modulate activity? (A sketch follows this list.)
  • Now, what happens when we block DA in this modified model?
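
A minimal sketch of this modified model, assuming DA multiplicatively scales the actors' activations (the gain form and threshold value are assumptions):

    import numpy as np

    def triggered_action(W, x, da_level, threshold=0.5):
        a = da_level * (W @ x)       # DA level scales all actor activations
        j = int(np.argmax(a))        # winning actor
        if a[j] >= threshold:        # must reach the minimal activation to act
            return j
        return None                  # with DA blocked (da_level near 0) no action
                                     # triggers, though the learned weights remain

This reproduces the observation above: the DA-blocked animal stays still because no actor crosses the threshold, while its learned values and associations remain intact.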

19
Drug Addiction
20
Relations to Drug Addiction
  • Some addictive drugs increase the dopamine level.
  • Questions (setting incentive saliency aside):
  • What does it mean in terms of reward estimate?
  • What does it mean in terms of action learning
    (conditioning)?
  • Is there hope for unlearning?
  • What would be the effect of incentive saliency?

21
The Actor-Critic Model
22
Relation to Representation Learning
23
Relation to Representation Learning
  • Assume the cortex learns by experimenting.
  • The basal ganglia A-C TD model uses cortical inputs as part of its representation.
  • Assume that novelty (something we cannot yet represent is novel) is rewarding (1) (sketched below).
  • How could this be helpful in learning new representations?
  • Can this be related to play?
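
One way to make novelty rewarding, sketched under the assumption that novelty is measured by how poorly the current representation reconstructs the input (the model interface and bonus weight are hypothetical):

    def effective_reward(external_reward, x, model, bonus_weight=0.1):
        # Inputs the cortex cannot yet represent well yield a large error,
        # hence a large novelty bonus added to the TD reward signal.
        novelty = model.reconstruction_error(x)
        return external_reward + bonus_weight * novelty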

24
Relation to Representation Learning
[Diagram: Motor Output, Actors, V, Reward, DA, Novelty Signal, Frontal Cortex, Higher Cortex, Primary Cortex]
25
Relation to Cognition, Frontal Cortex, and Gating
26
DA and Frontal cortex
[Diagram: Striatum, Frontal Cortex, SNc, VTA]
27
Working Memory (and Gating)
  • The frontal cortex is often considered to implement working memory.
  • Working memory must contain some form of gating (a sketch of a gated memory cell follows).
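
A minimal sketch of a gated working-memory cell (the interpolated gate form is an assumption; where the gate signal comes from is exactly the question of the next slides):

    def update_memory(memory, x, gate):
        """gate in [0, 1]: 1 loads the new input x, 0 holds the current content."""
        return gate * x + (1.0 - gate) * memory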

28
Working Memory (and Gating)
  • The frontal cortex differs from the rest of the cortex by its DA input.

29
How could DA act on gates (directly)?
  • By indicating when to shift memory content or
    memory context (WCS example)? (1)
  • By indicating salient events?
  • Novelty?
  • Prediction error?
  • Reward-related stimuli?
  • By directing attention?
  • Incentive saliency in goals?

30
How could DA act on learning (indirectly)?
  • By guiding learning:
  • What to learn (structural abstraction)?
  • When to learn (temporal abstraction)?
  • Positive or negative corrections?
  • Large or small corrections?

31
DA and Frontal cortex
[Diagram: Striatum, Frontal Cortex, SNc, VTA]
32
Questions?
Email: francois.rivest@mail.mcgill.ca
Blog: www-etud.iro.umontreal.ca/rivestfr/wordpress