ACQ and the Basal Ganglia - PowerPoint PPT Presentation

Slides: 16
Provided by: jamesbo2

1
ACQ and the Basal Ganglia
  • Jimmy Bonaiuto
  • USC Brain Project
  • 6/26/2007

2
Actor-Critic Learning
  • Actor learns action policy
  • Critic learns value functions
  • Different actor-critic architectures have been
    proposed for learning different value functions:
    • V(s): state values (most common)
    • V(a): action values
    • Q(s,a): state-action pair values
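
A minimal tabular sketch of the three critic types, assuming one-step TD(0)-style updates with a learning rate ALPHA and a discount GAMMA (both hypothetical values). The point is the input each critic needs: V(s) is indexed by state, V(a) only by the chosen action, and Q(s,a) by both. The V(a) and Q(s,a) updates bootstrap from the next action's value (a SARSA-style choice), which is an assumption rather than the slides' own formulation.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (hypothetical value)
GAMMA = 0.9  # discount factor (hypothetical value)


class StateCritic:
    """V(s): one value per state; the critic must receive state information."""

    def __init__(self) -> None:
        self.v = defaultdict(float)

    def update(self, s, r, s_next) -> float:
        delta = r + GAMMA * self.v[s_next] - self.v[s]  # TD error
        self.v[s] += ALPHA * delta
        return delta


class ActionCritic:
    """V(a): one value per action; only the identity of the chosen action is needed."""

    def __init__(self) -> None:
        self.v = defaultdict(float)

    def update(self, a, r, a_next) -> float:
        delta = r + GAMMA * self.v[a_next] - self.v[a]
        self.v[a] += ALPHA * delta
        return delta


class StateActionCritic:
    """Q(s,a): one value per state-action pair; needs both state and action."""

    def __init__(self) -> None:
        self.q = defaultdict(float)

    def update(self, s, a, r, s_next, a_next) -> float:
        delta = r + GAMMA * self.q[(s_next, a_next)] - self.q[(s, a)]
        self.q[(s, a)] += ALPHA * delta
        return delta


# Usage: the same rewarded transition, seen by each critic through its own inputs.
v_s, v_a, q_sa = StateCritic(), ActionCritic(), StateActionCritic()
for _ in range(200):
    v_s.update("cue", 1.0, "end")
    v_a.update("press", 1.0, "none")
    q_sa.update("cue", "press", 1.0, "end", "none")
```

All three converge to the same value for this transition; they differ only in what the critic has to be told, which is the connectivity question raised on the next slides.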

3
Actor-Critic Architecture
  • Core data: recordings of midbrain dopaminergic
    neurons in appetitive learning tasks (Schultz,
    1992; Schultz, 1998)

[Figure: actor-critic architecture (from Barto, 1995)]

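These recordings are commonly read as a reward prediction (TD) error. The sketch below illustrates that reading, assuming a tabular TD(0) learner over the time steps between cue and reward, with the pre-cue value fixed at 0 because cue onset is unpredictable; the step count, learning rate, and reward size are hypothetical.

```python
def run_trials(n_trials, n_steps=5, alpha=0.2, gamma=1.0, reward=1.0):
    """Tabular TD(0) over the time steps between cue onset and reward.

    Returns the learned values and, per trial, the TD errors at cue onset
    and at reward delivery.
    """
    v = [0.0] * n_steps  # V(t) for t = 0 (cue onset) .. n_steps - 1
    history = []
    for _ in range(n_trials):
        # Transition into the cue state from an unpredictable pre-cue baseline (value 0).
        delta_cue = gamma * v[0] - 0.0
        # Unrewarded transitions between successive time steps after the cue.
        for t in range(n_steps - 1):
            delta = gamma * v[t + 1] - v[t]
            v[t] += alpha * delta
        # Final transition delivers the reward; the trial then ends (terminal value 0).
        delta_reward = reward - v[-1]
        v[-1] += alpha * delta_reward
        history.append((delta_cue, delta_reward))
    return v, history


values, errors = run_trials(500)
print("first trial: cue %.2f, reward %.2f" % errors[0])
print("last trial:  cue %.2f, reward %.2f" % errors[-1])
```

Early in training the error occurs at reward delivery; after training it has moved to cue onset and the fully predicted reward produces no error, which is the qualitative pattern the TD account is meant to capture.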
4
Critic V(s), V(a), or Q(s,a)?
  • How do dopamine cells know about reward value?
  • Largest striatum input is from cortex (Haber and
    Gdowski, 2004)
  • V(s) and Q(s,a) learning may require the ventral
    striatum, SNc, and/or VTA to receive a copy of
    the same cortical projections that the dorsal
    striatum receives (state information)
  • V(a) may only require a projection from the
    dorsal striatum or globus pallidus (actor) to the
    ventral striatum, SNc and/or VTA (critic)
  • Largest forebrain input to dopamine neurons is
    striatum (Haber and Gdowski, 2004)
  • V(a) may be more biologically plausible in terms
    of connectivity

5
Actor-Critic in the Basal Ganglia
  • Dopamine targets (striatum) are the site of value
    and policy learning (Suri and Schultz, 2001)
  • The striatum is split into dorsal and ventral
    divisions (some say dorsolateral and
    ventromedial) (Voorn et al., 2004)
  • Ventral striatum receives inputs from limbic
    structures (critic?)
  • Dorsal striatum is connected with motor and
    associative cortices (actor?)

6
Role of Dopamine
  • Dopamine neurons are located in the ventral
    tegmental area (VTA) and the substantia nigra
    pars compacta (SNc) (Joel and Weiner, 2000)
  • VTA projects to the ventral striatum (learning
    state values)
  • SNc projects to the dorsal striatum (policy
    learning)
  • Little difference between VTA and SNc firing
    (Schultz et al., 1993)
  • This is predicted by the TD learning equations,
    since the policy and the values are both updated
    using the same TD error
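
A one-state sketch of the shared-error scheme, assuming a hypothetical two-action task in which only action 1 is rewarded: a single TD error updates both the critic's value (the VTA/ventral striatum role on this slide) and the actor's action preferences (the SNc/dorsal striatum role). The softmax actor, the learning rates, and the task itself are assumptions, not taken from the slides.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(n_trials=2000, alpha_v=0.1, alpha_p=0.1, seed=0):
    rng = random.Random(seed)
    rewards = [0.0, 1.0]  # hypothetical task: only action 1 is rewarded
    value = 0.0           # critic: V(s) for the single state (ventral striatum role)
    prefs = [0.0, 0.0]    # actor: action preferences (dorsal striatum role)
    for _ in range(n_trials):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        r = rewards[a]
        delta = r - value            # TD error; each trial ends after one step
        value += alpha_v * delta     # critic update
        prefs[a] += alpha_p * delta  # actor update driven by the same delta
    return value, softmax(prefs)


value, probs = actor_critic()
print(f"V = {value:.2f}, P(action 1) = {probs[1]:.3f}")
```

Because both stores are driven by one error term, a single broadcast signal suffices, which is consistent with the observation that VTA and SNc firing differ little.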

7
ACQ
  • Reinforcement learning should maximize total
    utility, not necessarily total reward.
    Motivations map outcomes to utilities (Niv et
    al., 2006)
  • Multiple critics: one for each dimension of
    interoception (hunger, thirst, etc.)
    • Q(s_i, a): s_i = internal state i, a = action
  • Actor
    • Composite policy
      • Desirability based on internal state
      • Executability based on environmental state
    • Eligibility trace from mirror and canonical motor
      signals

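A sketch of the composite policy under stated assumptions: each drive-specific critic i is a table Q_i[(s_i, a)] read out at its own internal state s_i, desirability sums those read-outs, executability is a hypothetical 0/1 stand-in for whether the environment affords the action, and the actor picks the action with the largest product of desirability and executability. The critic values, action names, and the product rule are illustrative choices, not values or equations from the slides.

```python
from typing import Dict, Tuple

# Hypothetical critic tables Q_i[(s_i, a)], one per interoceptive dimension i.
CriticTables = Dict[str, Dict[Tuple[str, str], float]]

CRITICS: CriticTables = {
    "hunger": {("high", "eat"): 0.9, ("low", "eat"): 0.1},
    "thirst": {("high", "drink"): 0.8, ("low", "drink"): 0.1},
}
ACTIONS = ["eat", "drink", "rest"]


def desirability(critics: CriticTables, internal: Dict[str, str], action: str) -> float:
    # Each critic i is read out only at its own internal state s_i; contributions are summed.
    return sum(table.get((internal[drive], action), 0.0) for drive, table in critics.items())


def executability(env: Dict[str, bool], action: str) -> float:
    # Hypothetical stand-in: 1.0 if the environment affords the action, else 0.0.
    requirements = {"eat": "food", "drink": "water"}
    needed = requirements.get(action)
    return 1.0 if needed is None or env.get(needed, False) else 0.0


def select_action(critics: CriticTables, internal: Dict[str, str],
                  env: Dict[str, bool], actions=ACTIONS) -> str:
    # Assumed composite policy: priority = desirability x executability; pick the maximum.
    priorities = {a: desirability(critics, internal, a) * executability(env, a) for a in actions}
    return max(priorities, key=priorities.get)


print(select_action(CRITICS, {"hunger": "high", "thirst": "low"}, {"food": True, "water": True}))
```

The same learned tables yield different choices as the internal state changes (desirability) or as the environment changes (executability), which is the separation the slide describes.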
8
ACQ Actor/Multiple Critics
[Figure: ACQ actor and multiple critics, with action signals x for the executed action and the recognized action]

9
ACQ - Eligibility Trace
  • executed action (from efference copy)
  • recognized action (from mirror system)

Idealized situations (perfect recognition). A
realistic implementation would have confidence
values between 0.0 and 1.0 for the executed- and
recognized-action signals x, but the pattern of
values for the eligibility trace e would be the
same.
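
The slides do not state the combination rule in text form, so the function below uses one hypothetical rule purely for illustration: recognizing an action gives it positive eligibility, and an executed action that is not recognized (for example, a failed grasp) gets negative eligibility. With the signals x taken as confidences in [0, 1], the same rule yields graded values.

```python
from typing import Dict


def eligibility(x_exec: Dict[str, float], x_recog: Dict[str, float]) -> Dict[str, float]:
    """Eligibility trace e per action from efference copy (x_exec) and mirror recognition (x_recog)."""
    e = {}
    for a in sorted(set(x_exec) | set(x_recog)):
        ex = x_exec.get(a, 0.0)  # confidence that action a was executed
        rc = x_recog.get(a, 0.0)  # confidence that action a was recognized
        # Hypothetical rule: recognition earns positive eligibility; execution
        # that is not recognized earns negative eligibility.
        e[a] = rc - ex * (1.0 - rc)
    return e


# A grasp that is executed but recognized by the mirror system as a push.
print(eligibility({"grasp": 1.0}, {"push": 1.0}))
```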

10
ACQ - Weight Modification
  • Desirability and Executability updated using same
    eligibility and reinforcement signals
  • Requires different weight change rules
  • Desirability
  • Executability

Don't update the value of the last action unless
some action is currently recognized. Step function
of eligibility trace: makes the sign of the weight
change depend on r(t).

Tonic dopamine level, d, added to TD error: makes
the sign of the weight change depend on e(t).

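A sketch of the two update rules as the notes describe them, assuming the first note belongs to the desirability rule and the second to the executability rule, and assuming simple products of learning rate, TD error delta, and eligibility e; the function names, alpha, and the default tonic level d are hypothetical. Because the step function discards the sign of e, the desirability change takes the sign of delta (which carries r(t)); because d is added to delta, the executability change takes the sign of e whenever d > |delta|.

```python
def step(x: float) -> float:
    """Heaviside step function: 1 for positive eligibility, else 0."""
    return 1.0 if x > 0 else 0.0


def update_desirability(w: float, delta: float, e: float,
                        any_recognized: bool, alpha: float = 0.1) -> float:
    """Desirability weight of the last action (hypothetical form)."""
    if not any_recognized:
        # Gate from the notes: no update unless some action is currently recognized.
        return w
    # step(e) removes the sign of e, so the change takes the sign of delta.
    return w + alpha * delta * step(e)


def update_executability(w: float, delta: float, e: float,
                         d: float = 1.0, alpha: float = 0.1) -> float:
    """Executability weight (hypothetical form) with tonic dopamine d added to delta."""
    # With d > |delta|, (delta + d) > 0 and the change takes the sign of e.
    return w + alpha * (delta + d) * e


# Executability rises after a recognized attempt and falls after an
# unrecognized one, even though the TD error is negative in both cases.
print(update_executability(0.5, delta=-0.5, e=1.0))
print(update_executability(0.5, delta=-0.5, e=-1.0))
```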
11
Multiple Critics Q(s_i, a)
  • Is there evidence for multiple critics gated by
    interoceptive information?
  • The lateral hypothalamus does project to the SNc,
    VTA, and ventral striatum (Saper et al., 1979;
    Fadel and Deutch, 2002; Brog et al., 1993)
  • The accumbens shell of the ventral striatum is
    reciprocally connected with the lateral
    hypothalamus and has been called a "sensory
    sentinel" or "visceral striatum" (Kelley, 1999,
    2004)
  • Motivational state, such as food deprivation, can
    influence the magnitude of dopamine release in
    the ventral striatum (Wilson et al., 1995; Ahn and
    Phillips, 1999)
  • Sexual satiety is associated with serotonin
    release in the lateral hypothalamus, which reduces
    dopamine levels in the ventral striatum (Lorrain
    et al., 1999)

12
Internal State-Dependent Policy
  • Is there evidence for internal state-dependent
    policies? (Kelley et al., 2005)
  • Information from the lateral hypothalamus reaches
    the dorsal striatum through the paraventricular
    nucleus of the thalamus
  • Hypothalamic-midline thalamic-striatal
    projections carry internal state information to
    cholinergic interneurons of the dorsal striatum
  • These are thought to modulate dorsal striatal
    output neurons

13
Eligibility Trace from the Mirror System
  • What is the evidence for an eligibility signal
    from mirror neurons?
  • People can implicitly learn sequences through
    action observation (Bird et al., 2005)
  • The striatum is consistently implicated in
    implicit sequence learning and the magnitude of
    activation is correlated with reaction time
    improvement (Rauch et al., 1997, 1998)
  • The basal ganglia are active during action
    observation (Frey and Gerry, 2006)
  • There is a projection from ventral premotor
    cortex (including the arcuate sulcus) to the
    dorsal and ventral striatum in the macaque
    (McFarland and Haber, 2000)

14
References
  • Ahn S, Phillips AG (1999) Dopaminergic correlates
    of sensory-specific satiety in the medial
    prefrontal cortex and nucleus accumbens of the
    rat. The Journal of Neuroscience, 19: RC29 (1-6).
  • Bird G, Osman M, Saggerson A, Heyes C (2005)
    Sequence learning by action, observation and
    action observation. British Journal of
    Psychology, 96: 371-388.
  • Brog JS, Salyapongse A, Deutch AY, Zahm DS (1993)
    The patterns of afferent innervation of the core
    and shell in the accumbens part of the rat
    ventral striatum: immunohistochemical detection
    of retrogradely transported fluoro-gold. The
    Journal of Comparative Neurology, 338(2):
    255-278.
  • Fadel J, Deutch AY (2002) Anatomical substrates
    of orexin-dopamine interactions: lateral
    hypothalamic projections to the ventral tegmental
    area. Neuroscience, 111(2): 379-387.
  • Frey SH, Gerry VE (2006) Modulation of neural
    activity during observational learning of actions
    and their sequential orders. The Journal of
    Neuroscience, 26(51): 13194-13201.
  • Haber SN, Gdowski MJ (2004) The basal ganglia.
    In: The human nervous system (Paxinos G, Mai JK,
    eds), 2nd ed., pp. 676-738. New York: Elsevier
    Academic.
  • Joel D, Weiner I (2000) The connections of the
    dopaminergic system with the striatum in rats and
    primates: an analysis with respect to the
    functional and compartmental organization of the
    striatum. Neuroscience, 96: 451-474.
  • Kelley AE (1999) Functional specificity of
    ventral striatal compartments in appetitive
    behaviors. Annals of the New York Academy of
    Sciences.
  • Kelley AE (2004) Ventral striatal control of
    appetitive motivation: role in ingestive behavior
    and reward-related learning. Neuroscience and
    Biobehavioral Reviews, 27: 765-776.
  • Kelley AE, Baldo BA, Pratt WE (2005) A proposed
    hypothalamic-thalamic-striatal axis for the
    integration of energy balance, arousal, and food
    reward. The Journal of Comparative Neurology,
    493(1): 72-85.

15
References
  • Lorrain DS, Riolo JV, Matuszewich L, Hull EM
    (1999) Lateral hypothalamic serotonin inhibits
    nucleus accumbens dopamine: implications for
    sexual satiety. The Journal of Neuroscience,
    19(17): 7648-7652.
  • McFarland NR, Haber SN (2000) Convergent inputs
    from thalamic motor nuclei and frontal cortical
    areas to the dorsal striatum in the primate. The
    Journal of Neuroscience, 20(10): 3798-3813.
  • Niv Y, Joel D, Dayan P (2006) A normative
    perspective on motivation. Trends in Cognitive
    Sciences, 10(8): 375-381.
  • Rauch SL, Whalen PJ, Savage CR, Curran T,
    Kendrick A, Brown HD, Bush G, Breiter HC, Rosen
    BR (1997) Striatal recruitment during an implicit
    sequence learning task as measured by functional
    magnetic resonance imaging. Human Brain Mapping,
    5: 124-132.
  • Rauch SL, Whalen PJ, Curran T, McInerney S,
    Heckers S, Savage CR (1998) Thalamic deactivation
    during early implicit sequence learning: a
    functional MRI study. NeuroReport, 9: 865-870.
  • Saper CB, Swanson LW, Cowan WM (1979) An
    autoradiographic study of the efferent
    connections of the lateral hypothalamic area in
    the rat. The Journal of Comparative Neurology,
    183(4): 689-706.
  • Schultz W (1992) Activity of dopamine neurons in
    the behaving primate. Seminars in the
    Neurosciences, 4: 129-138.
  • Schultz W (1998) Predictive reward signal of
    dopamine neurons. Journal of Neurophysiology,
    80: 1-27.
  • Schultz W, Apicella P, Ljungberg T (1993)
    Responses of monkey dopamine neurons to reward
    and conditioned stimuli during successive steps
    of learning a delayed response task. The Journal
    of Neuroscience, 13: 900-913.
  • Suri RE, Schultz W (2001) Temporal difference
    model reproduces predictive neural activity.
    Neural Computation, 13: 841-862.
  • Voorn P, Vanderschuren LJ, Groenewegen HJ,
    Robbins TW, Pennartz CM (2004) Putting a spin on
    the dorsal-ventral divide of the striatum. Trends
    in Neurosciences, 27: 468-474.
  • Wilson C, Nomikos GG, Collu M, Fibiger HC (1995)
    Dopaminergic correlates of motivated behavior:
    importance of drive. The Journal of Neuroscience,
    15: 5169-5178.