Title: ACQ and the Basal Ganglia
1. ACQ and the Basal Ganglia
- Jimmy Bonaiuto
- USC Brain Project
- 6/26/2007
2. Actor-Critic Learning
- Actor learns action policy
- Critic learns value functions
- Different actor-critic architectures have been proposed for learning different value functions
  - V(s): state values (most common)
  - V(a): action values
  - Q(s,a): state-action pair values
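As a rough illustration of the three options, the sketch below stores each value function as a NumPy array for a toy problem. The sizes `N_STATES` and `N_ACTIONS` and the helper `critic_value` are hypothetical and appear nowhere on the slides; they only show what information each kind of critic needs before it can return a value.

```python
import numpy as np

N_STATES, N_ACTIONS = 4, 3  # hypothetical toy sizes

# One table per candidate critic; all values start at zero.
V_s = np.zeros(N_STATES)                # V(s): indexed by state only
V_a = np.zeros(N_ACTIONS)               # V(a): indexed by action only
Q_sa = np.zeros((N_STATES, N_ACTIONS))  # Q(s,a): indexed by state-action pair


def critic_value(kind, s=None, a=None):
    """Return the stored value for the requested critic type."""
    if kind == "V(s)":
        return V_s[s]
    if kind == "V(a)":
        return V_a[a]
    if kind == "Q(s,a)":
        return Q_sa[s, a]
    raise ValueError(f"unknown critic type: {kind}")
```

Note that only the V(a) lookup works without any state information, which is the property the later connectivity argument relies on.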
3. Actor-Critic Architecture
- Core data: recordings of midbrain dopaminergic neurons in appetitive learning tasks (Schultz, 1992; Schultz, 1998)
(Figure from Barto, 1995)
4. Critic: V(s), V(a), or Q(s,a)?
- How do dopamine cells know about reward value?
- Largest striatum input is from cortex (Haber and Gdowski, 2004)
  - V(s) and Q(s,a) learning may require the ventral striatum, SNc, and/or VTA to receive a copy of the same cortical projections that the dorsal striatum receives (state information)
  - V(a) may only require a projection from the dorsal striatum or globus pallidus (actor) to the ventral striatum, SNc, and/or VTA (critic)
- Largest forebrain input to dopamine neurons is from the striatum (Haber and Gdowski, 2004)
  - V(a) may be more biologically plausible in terms of connectivity
5. Actor-Critic in the Basal Ganglia
- Dopamine targets (striatum) are the site of value and policy learning (Suri & Schultz, 2001)
- The striatum is split into dorsal and ventral divisions (some say dorsolateral and ventromedial) (Voorn et al., 2004)
  - Ventral striatum: inputs from limbic structures (critic?)
  - Dorsal striatum: connected with motor and associative cortices (actor?)
6. Role of Dopamine
- (Joel & Weiner, 2000) Dopamine neurons in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc)
  - VTA projects to ventral striatum: learning state values
  - SNc projects to dorsal striatum: policy learning
- Little difference in VTA and SNc firing (Schultz et al., 1993)
  - Predicted by the TD learning equation, since the policy and values are both updated using the TD error
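The point that a single TD error drives both updates can be sketched with the textbook tabular actor-critic step. This is a generic form, not necessarily the model behind these slides; the function name, the learning rates `alpha` and `beta`, and the discount `gamma` are illustrative assumptions.

```python
import numpy as np


def td_actor_critic_step(V, prefs, s, a, r, s_next,
                         alpha=0.1, beta=0.1, gamma=0.9):
    """Apply one TD(0) actor-critic update in place and return the TD error."""
    delta = r + gamma * V[s_next] - V[s]  # TD error shared by critic and actor
    V[s] += alpha * delta                 # critic: state-value update
    prefs[s, a] += beta * delta           # actor: action-preference update
    return delta


# Toy usage: 3 states, 2 actions, one rewarded transition.
V = np.zeros(3)
prefs = np.zeros((3, 2))
delta = td_actor_critic_step(V, prefs, s=0, a=1, r=1.0, s_next=1)
```

Because the same `delta` enters both update lines, signals broadcast to a value-learning target and a policy-learning target would look alike, which is consistent with the similar VTA and SNc firing noted above.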
7. ACQ
- Reinforcement learning should maximize total utility, not necessarily total reward. Motivations map outcomes to utilities (Niv et al., 2006)
  - Multiple critics: one for each dimension of interoception (hunger, thirst, etc.)
  - Q(s_i, a), where s_i = internal state, a = action
- Actor
  - Composite policy
    - Desirability based on internal state
    - Executability based on environmental state
  - Eligibility trace from mirror and canonical motor signals
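The composite policy on this slide can be sketched as follows. The slide does not say how desirability and executability are combined, so the multiplicative rule here is an assumption, as are the drive-weighted sum over critics, the function name `select_action`, and all the numbers.

```python
import numpy as np


def select_action(drives, Q, executability):
    """Pick the action with the highest desirability * executability.

    drives:        shape (n_critics,), current level of each internal drive
    Q:             shape (n_critics, n_actions), one row per critic
    executability: shape (n_actions,), feasibility in the current environment
    """
    desirability = drives @ Q                 # drive-weighted sum over critics
    priority = desirability * executability   # assumed multiplicative combination
    return int(np.argmax(priority)), priority


drives = np.array([0.8, 0.2])                 # hypothetical hunger and thirst levels
Q = np.array([[1.0, 0.0, 0.5],                # hunger critic: eat, drink, explore
              [0.0, 1.0, 0.5]])               # thirst critic: eat, drink, explore
executability = np.array([0.1, 1.0, 1.0])     # food is currently out of reach
action, priority = select_action(drives, Q, executability)
```

In this toy case eating is the most desirable action but is barely executable, so the third action (explore) ends up with the highest priority.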
8. ACQ Actor/Multiple Critics
[Figure: ACQ actor with multiple critics. The legend distinguishes two x signals: executed action and recognized action.]
9. ACQ - Eligibility Trace
- executed action (from efference copy)
- recognized action (from mirror system)
Idealized situation (perfect recognition). A realistic implementation would have confidence values between 0.0 and 1.0 for both x signals (executed and recognized action), but the pattern of values for e would be the same.
10. ACQ - Weight Modification
- Desirability and Executability are updated using the same eligibility and reinforcement signals
  - Requires different weight change rules
- Desirability
- Executability
Don't update the value of the last action unless some action is currently recognized.
Step function of eligibility trace: makes the sign of the weight change depend on r(t).
Tonic dopamine level, d, added to TD error: makes the sign of the weight change depend on e(t).
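One way to read the annotations above, assuming simple multiplicative rules: the step-function rule is taken to be the desirability update and the tonic-dopamine rule the executability update. That pairing, the function names, the learning rate `alpha`, the default `d`, and the treatment of r(t) and the TD error as the same signal `r` are all assumptions. The rule about not updating the last action unless some action is recognized is left out.

```python
def desirability_delta_w(r, e, alpha=0.1):
    """Gate the update by a step function of e, so the sign follows r."""
    gate = 1.0 if e > 0 else 0.0   # step function of the eligibility trace
    return alpha * r * gate


def executability_delta_w(r, e, d=1.0, alpha=0.1):
    """Add tonic level d to r; while d + r > 0 the sign follows e."""
    return alpha * (d + r) * e


# Negative reinforcement with a positive trace lowers desirability...
dw_des = desirability_delta_w(r=-0.5, e=1.0)
# ...but executability still rises, because d + r stays positive.
dw_exe_pos = executability_delta_w(r=-0.5, e=1.0)
# A negative trace lowers executability whatever the sign of r.
dw_exe_neg = executability_delta_w(r=0.5, e=-1.0)
```

The sketch only holds while `d` is larger than the most negative reinforcement value; otherwise `d + r` can change sign and the executability update no longer tracks e alone.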
11. Multiple Critics Q(s_i, a)
- Is there evidence for multiple critics gated by interoceptive information?
  - The lateral hypothalamus does project to the SNc, VTA, and the ventral striatum (Saper et al., 1979; Fadel & Deutch, 2002; Brog et al., 1993)
  - The accumbens shell of the ventral striatum is reciprocally connected with the lateral hypothalamus and has been called a "sensory sentinel" or "visceral striatum" (Kelley, 1999, 2004)
  - Motivational state, such as food deprivation, can influence the magnitude of dopamine release in the ventral striatum (Wilson et al., 1995; Ahn & Phillips, 1999)
  - Sexual satiety is signaled by serotonin from the lateral hypothalamus to the ventral striatum, which reduces dopamine levels (Lorrain et al., 1999)
12. Internal State-Dependent Policy
- Is there evidence for internal state-dependent policies? (Kelley et al., 2005)
  - Information from the lateral hypothalamus reaches the dorsal striatum through the paraventricular nucleus of the thalamus
  - Hypothalamic-midline thalamic-striatal projections carry internal state information to cholinergic interneurons of the dorsal striatum
  - These are thought to modulate dorsal striatal output neurons
13. Eligibility Trace from the Mirror System
- What is the evidence for an eligibility signal from mirror neurons?
  - People can implicitly learn sequences through action observation (Bird et al., 2005)
  - The striatum is consistently implicated in implicit sequence learning, and the magnitude of activation is correlated with reaction time improvement (Rauch et al., 1997, 1998)
  - The basal ganglia are active during action observation (Frey & Gerry, 2006)
  - Projection from ventral premotor cortex (including the arcuate sulcus) to dorsal and ventral striatum in the macaque (McFarland & Haber, 2000)
14. References
- Ahn S, Phillips AG (1999) Dopaminergic Correlates of Sensory-Specific Satiety in the Medial Prefrontal Cortex and Nucleus Accumbens of the Rat. The Journal of Neuroscience, 19: RC29, 1-6.
- Bird G, Osman M, Saggerson A, Heyes C (2005) Sequence learning by action, observation and action observation. British Journal of Psychology, 96: 371-388.
- Brog JS, Salyapongse A, Deutch AY, Zahm DS (1993) The patterns of afferent innervation of the core and shell in the "Accumbens" part of the rat ventral striatum: Immunohistochemical detection of retrogradely transported fluoro-gold. The Journal of Comparative Neurology, 338(2): 255-278.
- Fadel J, Deutch AY (2002) Anatomical Substrates of Orexin-Dopamine Interactions: Lateral hypothalamic projections to the ventral tegmental area. Neuroscience, 111(2): 379-387.
- Frey SH, Gerry VE (2006) Modulation of Neural Activity during Observational Learning of Actions and Their Sequential Orders. The Journal of Neuroscience, 26(51): 13194-13201.
- Haber SN, Gdowski MJ (2004) The basal ganglia. In: The human nervous system (Paxinos G, Mai JK, eds), Ed 2, pp. 676-738. New York: Elsevier Academic.
- Joel D, Weiner I (2000) The connections of the dopaminergic system with the striatum in rats and primates: An analysis with respect to the functional and compartmental organization of the striatum. Neuroscience, 96: 451-474.
- Kelley AE (1999) Functional Specificity of Ventral Striatal Compartments in Appetitive Behaviors. Annals of the New York Academy of Sciences.
- Kelley AE (2004) Ventral striatal control of appetitive motivation: role in ingestive behavior and reward-related learning. Neurosci Biobehav Rev, 27: 765-776.
- Kelley AE, Baldo BA, Pratt WE (2005) A proposed hypothalamic-thalamic-striatal axis for the integration of energy balance, arousal, and food reward. J Comp Neurol, 493(1): 72-85.
15. References
- Lorrain DS, Riolo JV, Matuszewich L, Hull EM (1999) Lateral Hypothalamic Serotonin Inhibits Nucleus Accumbens Dopamine: Implications for Sexual Satiety. The Journal of Neuroscience, 19(17): 7648-7652.
- McFarland NR, Haber SN (2000) Convergent Inputs from Thalamic Motor Nuclei and Frontal Cortical Areas to the Dorsal Striatum in the Primate. The Journal of Neuroscience, 20(10): 3798-3813.
- Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends in Cognitive Sciences, 10(8): 375-381.
- Rauch SL, Whalen PJ, Savage CR, Curran T, Kendrick A, Brown HD, Bush G, Breiter HC, Rosen BR (1997) Striatal Recruitment During an Implicit Sequence Learning Task as Measured by Functional Magnetic Resonance Imaging. Human Brain Mapping, 5: 124-132.
- Rauch SL, Whalen PJ, Curran T, McInerney S, Heckers S, Savage CR (1998) Thalamic deactivation during early implicit sequence learning: a functional MRI study. NeuroReport, 9: 865-870.
- Saper CB, Swanson LW, Cowan WM (1979) An autoradiographic study of the efferent connections of the lateral hypothalamic area in the rat. J Comp Neurol, 183(4): 689-706.
- Schultz W (1992) Activity of dopamine neurons in the behaving primate. Seminars in the Neurosciences, 4: 129-138.
- Schultz W (1998) Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80: 1-27.
- Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. Journal of Neuroscience, 13: 900-913.
- Suri RE, Schultz W (2001) Temporal difference model reproduces anticipatory neural activity. Neural Computation, 13: 841-862.
- Voorn P, Vanderschuren LJ, Groenewegen HJ, Robbins TW, Pennartz CM (2004) Putting a spin on the dorsal-ventral divide of the striatum. Trends in Neurosciences, 27: 468-474.
- Wilson C, Nomikos GG, Collu M, Fibiger HC (1995) Dopaminergic correlates of motivated behavior: importance of drive. Journal of Neuroscience, 15: 5169-5178.