Title: Learning and Metalearning
1 Learning and Meta-learning
- computation
- making predictions
- choosing actions
- acquiring episodes
- statistics
- algorithm
- gradient ascent (e.g. of the likelihood)
- correlation
- Kalman filtering
- implementation
- Hebbian synaptic plasticity
- neuromodulation
2 Conditioning
prediction: of important events
control: in the light of those predictions
- Ethology
- optimality
- appropriateness
- Psychology
- classical/operant conditioning
- Computation
- dynamic programming
- Kalman filtering
- Algorithm
- TD/delta rules
- simple weights
neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
3 Computational Neuromodulation
- general excitability, signal/noise ratios
- specific prediction errors, uncertainty signals
4 Data
5 Data
6 Learning and Inference
- Learning: predict; control
- Δ weight ∝ (learning rate) × (error) × (stimulus)
- dopamine
- phasic prediction error for future reward
- serotonin
- phasic prediction error for future punishment
- acetylcholine
- expected uncertainty boosts learning
- norepinephrine
- unexpected uncertainty boosts learning
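The idea that uncertainty boosts learning can be illustrated with a scalar Kalman filter, where the learning rate (the Kalman gain) grows with the uncertainty about the weight. This is a minimal sketch, assuming a single weight and made-up variance values; the function name and parameters are hypothetical and it is not the model from the talk.

```python
def kalman_update(w, var_w, x, r, obs_noise_var):
    """One scalar Kalman-filter update of a weight w that predicts reward r from stimulus x."""
    error = r - w * x                                    # prediction error
    gain = var_w * x / (var_w * x * x + obs_noise_var)   # learning rate; already includes the stimulus term
    w_new = w + gain * error                             # weight change = gain * error
    var_new = (1.0 - gain * x) * var_w                   # uncertainty shrinks after the observation
    return w_new, var_new, gain

# Same prediction error, two (hypothetical) levels of prior uncertainty about the weight
_, _, gain_low = kalman_update(w=0.0, var_w=0.1, x=1.0, r=1.0, obs_noise_var=1.0)
_, _, gain_high = kalman_update(w=0.0, var_w=2.0, x=1.0, r=1.0, obs_noise_var=1.0)
print(gain_low, gain_high)
```

With everything else fixed, the larger prior variance gives the larger gain, so the same error produces a bigger weight change.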
7 Learning and Inference
context
expected uncertainty
unexpected uncertainty
top-down processing
NE
ACh
cortical processing
prediction, learning, ...
bottom-up processing
sensory inputs
8 Plan
- introduction
- prediction learning: RW, TD
- action learning
- generalization, opponency
- intra-trial uncertainty
- (inter-trial uncertainty)
9 Rescorla-Wagner Rule
Pavlovian conditioning, blocking, inhibitory conditioning, etc
minimize
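The objective on this slide did not survive extraction. The sketch below assumes the usual reading of the Rescorla-Wagner rule as a delta rule on the squared prediction error, and uses it to reproduce blocking. The learning rate and trial counts are arbitrary choices.

```python
def rescorla_wagner(trials, n_stimuli, learning_rate=0.2):
    """trials: list of (stimulus_vector, reward) pairs. Returns the learned weights."""
    w = [0.0] * n_stimuli
    for x, r in trials:
        v = sum(wi * xi for wi, xi in zip(w, x))   # summed prediction from all present stimuli
        delta = r - v                              # prediction error
        w = [wi + learning_rate * delta * xi for wi, xi in zip(w, x)]
    return w

# Blocking: pretrain stimulus A alone, then train the compound A+B.
pretraining = [([1, 0], 1.0)] * 50
compound = [([1, 1], 1.0)] * 50
w_blocked = rescorla_wagner(pretraining + compound, n_stimuli=2)
w_control = rescorla_wagner(compound, n_stimuli=2)   # no pretraining
print(w_blocked, w_control)
```

After pretraining, A already predicts the reward, so the error on compound trials is close to zero and B acquires almost no weight. Without pretraining, A and B share the prediction.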
10 BUT: secondary conditioning
11 Sutton's TD
12 Rewards rather than Punishments
[Figure: TD error and V(t) traces (labels R, L) for three cases: no prediction; prediction, reward; prediction, no reward]
dopamine cells in VTA/SNc (Schultz et al)
13 More Completely
(if you can count time since the stimulus was turned on)
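Counting time since stimulus onset can be sketched as TD(0) with one weight per time step after the stimulus. The trial length, event times and learning rate below are hypothetical, and there is no discounting. The error is written delta(t) = r(t) + V(t+1) - V(t), so the response to stimulus onset appears at the step just before onset.

```python
def run_trial(w, stim_time, reward_time, trial_len, reward, learning_rate):
    """One trial of TD(0); w[k] is the value k steps after stimulus onset.
    Returns the list of TD errors delta(t) = r(t) + V(t+1) - V(t)."""
    def value(t):
        # Values exist only while time since onset can be counted, within the trial.
        return w[t - stim_time] if stim_time <= t < trial_len else 0.0

    deltas = []
    for t in range(trial_len):
        r = reward if t == reward_time else 0.0
        delta = r + value(t + 1) - value(t)
        if t >= stim_time:                       # nothing to update before the stimulus
            w[t - stim_time] += learning_rate * delta
        deltas.append(delta)
    return deltas

STIM, REWARD, LENGTH = 2, 7, 10
w = [0.0] * (LENGTH - STIM)
first = run_trial(w, STIM, REWARD, LENGTH, reward=1.0, learning_rate=0.3)    # no prediction yet
for _ in range(500):
    trained = run_trial(w, STIM, REWARD, LENGTH, reward=1.0, learning_rate=0.3)
omitted = run_trial(w, STIM, REWARD, LENGTH, reward=0.0, learning_rate=0.0)  # prediction, no reward
print(first[REWARD], trained[STIM - 1], trained[REWARD], omitted[REWARD])
```

The three cases match the pattern usually described for dopamine cells: an error at the reward before learning, an error at stimulus onset but not at the reward after learning, and a negative error when a predicted reward is omitted.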
14 Dopamine
15 Temporal Difference Prediction Error
[Figure: cue-transition diagram ending in High Pain or Low Pain, with transition probabilities 0.8, 0.2 and 1.0]
predict sum future pain
TD error
16 Temporal Difference Prediction Error
[Figure: TD error (prediction error) and value traces over the cue-transition diagram ending in High Pain or Low Pain, with transition probabilities 0.8, 0.2 and 1.0]
17 Temporal Difference Prediction Error
experimental sequence:
A B HIGH C D LOW C B HIGH A B HIGH A D LOW C D LOW A B HIGH A B HIGH C D LOW C B HIGH
[Diagram labels: MR scanner, TD model, Brain responses, ?]
Ben Seymour, John O'Doherty
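As an illustration of what the TD model does with such a sequence, the sketch below runs TD(0) over the cue/outcome sequence listed on this slide. The pain magnitudes (HIGH = 1, LOW = 0), the learning rate, the zero initial values and the absence of discounting are all assumptions made for the example, not parameters from the study.

```python
SEQUENCE = ("A B HIGH C D LOW C B HIGH A B HIGH A D LOW C D LOW "
            "A B HIGH A B HIGH C D LOW C B HIGH").split()
PAIN = {"HIGH": 1.0, "LOW": 0.0}   # hypothetical pain magnitudes

def td_on_sequence(sequence, learning_rate=0.5):
    """Run TD(0) over (first cue, second cue, outcome) triples.
    Returns the learned cue values and the per-trial TD errors."""
    values = {}
    errors = []
    for i in range(0, len(sequence), 3):
        cue1, cue2, outcome = sequence[i:i + 3]
        v1 = values.get(cue1, 0.0)
        v2 = values.get(cue2, 0.0)
        delta_cue2 = v2 - v1                  # error when the second cue appears
        delta_outcome = PAIN[outcome] - v2    # error when the pain is delivered
        values[cue1] = v1 + learning_rate * delta_cue2
        values[cue2] = v2 + learning_rate * delta_outcome
        errors.append((cue1, cue2, outcome, delta_cue2, delta_outcome))
    return values, errors

values, errors = td_on_sequence(SEQUENCE)
for row in errors:
    print(row)
```

Under these assumptions, B comes to predict more pain than D, an unexpected D after A produces a negative error, and an unexpected B after C produces a positive one. Per-trial errors of this kind are what would be compared against brain responses.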
18 TD prediction error: ventral striatum
[Figure: brain slice labelled Z-4, R]
19 Temporal Difference Values
dorsal raphe?
right anterior insula
20 Unexpected - Expected
(Montague et al)
(ventral striatum; we also find OFC and some DLPFC)
21 Plan
- introduction
- prediction learning: RW, TD
- action learning
- generalization, opponency
- intra-trial uncertainty
- inter-trial uncertainty
22 Action Selection
23 Indirect Actor
24 Direct Actor
25 Action Selection
start with policy
evaluate it
[Figure labels: A, B, C]
improve it
[Figure labels: 1, -1]
thus choose L more frequently than R
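The evaluate-then-improve loop can be sketched as a one-state actor-critic. The critic tracks the value of the current policy, and the actor shifts its action preferences by the same error. The rewards (+1 for L, -1 for R), the learning rates and the softmax policy are assumptions for illustration only.

```python
import math
import random

def softmax(prefs, beta=1.0):
    """Turn action preferences into choice probabilities."""
    m = max(prefs.values())
    exps = {a: math.exp(beta * (p - m)) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def actor_critic(rewards, n_trials=500, critic_rate=0.1, actor_rate=0.1, seed=0):
    rng = random.Random(seed)
    value = 0.0                               # critic: value of the single state
    prefs = {a: 0.0 for a in rewards}         # actor: action preferences
    for _ in range(n_trials):
        probs = softmax(prefs)
        action = rng.choices(list(probs), weights=list(probs.values()))[0]
        delta = rewards[action] - value       # error; the trial ends after one action
        value += critic_rate * delta          # evaluate the policy
        prefs[action] += actor_rate * delta   # improve the policy
    return value, softmax(prefs)

value, policy = actor_critic({"L": 1.0, "R": -1.0})
print(value, policy)
```

Actions that turn out better than the current value have their preference raised and those that turn out worse have it lowered, so L ends up chosen more often than R.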
26 Policy
- value is too pessimistic
- action is better than average
27 Advantages
- indirect actor in maze is Q(state, action)
- learn using standard TD
- softmax
- advantages
- learned using (state, action)
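The equations on this slide are not recoverable, so the following is only a sketch of the general idea. It learns Q(state, action) with a TD backup (Q-learning is used here as one standard variant, which is an assumption), chooses actions by softmax, and computes advantages as Q minus the policy-averaged value. The maze layout and rewards are hypothetical.

```python
import math
import random

# Hypothetical two-level maze: from A, L leads to B and R leads to C;
# choices at B and C end the trial with made-up rewards.
NEXT_STATE = {("A", "L"): "B", ("A", "R"): "C"}
REWARD = {("B", "L"): 0.0, ("B", "R"): 5.0, ("C", "L"): 2.0, ("C", "R"): 0.0}
ACTIONS = ("L", "R")

def softmax_policy(q, state, beta=1.0):
    prefs = [beta * q[(state, a)] for a in ACTIONS]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def learn_q(n_trials=2000, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in "ABC" for a in ACTIONS}
    for _ in range(n_trials):
        state = "A"
        while state is not None:
            action = rng.choices(ACTIONS, weights=softmax_policy(q, state))[0]
            next_state = NEXT_STATE.get((state, action))   # None once the trial ends
            reward = REWARD.get((state, action), 0.0)
            future = max(q[(next_state, a)] for a in ACTIONS) if next_state else 0.0
            q[(state, action)] += learning_rate * (reward + future - q[(state, action)])
            state = next_state
    return q

def advantages(q, state):
    """Advantage of each action: Q(state, action) minus the policy-averaged value."""
    probs = softmax_policy(q, state)
    v = sum(p * q[(state, a)] for p, a in zip(probs, ACTIONS))
    return {a: q[(state, a)] - v for a in ACTIONS}

q = learn_q()
print(q[("A", "L")], q[("A", "R")], advantages(q, "A"))
```

An action with a positive advantage is better than the policy's average at that state, and one with a negative advantage is worse.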
28 fMRI for Actor-Critic
29 fMRI for Actor-Critic
[Figure panels: value (Pavlovian, Operant, Conjunction), labelled ventral putamen and nucleus accumbens; advantage, labelled dorsal striatum]
30 Plan
- introduction
- prediction learning: RW, TD
- action learning
- generalization, opponency
- intra-trial uncertainty
- inter-trial uncertainty
31 Generalization
32 Generalization
33 Aversion
34 Opponency
35 Solomon & Corbit
36 TD Prediction Errors
- computation: dynamic programming and optimal control
- algorithm: ongoing error in predictions of the future
- implementation
- dopamine: phasic prediction error for reward; tonic punishment
- serotonin: phasic prediction error for punishment; tonic reward
- evident in VTA, striatum, raphe?
- next: bonuses, Pavlovian actions, motivation, addiction rates, psychiatry
37 Plan
- introduction
- prediction learning: RW, TD
- action learning
- generalization, opponency
- intra-trial uncertainty
- inter-trial uncertainty
38 Uncertainty
Computational functions of uncertainty
- weaken top-down influence over sensory processing
- promote learning about the relevant representations
39 Computational Neuromodulation
- general excitability, signal/noise ratios
- specific prediction errors, uncertainty signals
40 Experimental Data
- ACh & NE have similar physiological effects
- suppress recurrent feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
- enhance thalamocortical transmission (e.g. Gil et al, 1997)
- boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)
- ACh & NE have distinct behavioral effects
- ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
- NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
41 ACh in Hippocampus; ACh in Conditioning
ACh in Hippocampus (Hasselmo, 1995)
- Given unfamiliarity, ACh
- boosts bottom-up, suppresses recurrent processing
- boosts recurrent plasticity
[Figure labels: (DG), (CA3), (CA1), (MS)]
ACh in Conditioning (Bucci, Holland, & Gallagher, 1998)
- Given uncertainty, ACh
- boosts learning to stimuli of uncertain consequences
42 Cholinergic Modulation in the Cortex
Electrophysiology Data (Gil, Connors, & Amitai, 1997)
- ACh agonists
- facilitate TC transmission
- enhance stimulus-specific activity
Examples of Hallucinations Induced by Anticholinergic Chemicals (Perry & Perry, 1995)
- ACh antagonists
- induce hallucinations
- interfere with stimulus processing
- effects enhanced by eye closure
43 Norepinephrine
Something similar may be true for NE (Kasamatsu et al, 1981)
[Figure: axis label "Days after task shift"] (Devauges & Sara, 1990)
(Hasselmo et al, 1997)
NE specially involved in novelty; confusing association with attention, vigilance
44 Model Schematics
Context
Expected Uncertainty
Unexpected Uncertainty
Top-down Processing
NE
ACh
Cortical Processing
Prediction, learning, ...
Bottom-up Processing
Sensory Inputs
45 Attention
Attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint
Example 1: Posner's Task
[Figure: trial timelines for high-validity and low-validity cues, showing cue, stimulus location and sensory input; timing labels 0.1s, 0.2-0.5s, 0.15s]
Uncertainty-driven bias in cortical processing
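One way to read the uncertainty-driven bias is as Bayesian weighting of the cue against the sensory input. The sketch below is a two-location simplification, assuming that cue validity acts as the prior probability of the cued location. The likelihood values are made up and this is not the model simulated in the talk.

```python
def posterior_cued(validity, likelihood_cued, likelihood_uncued):
    """Posterior probability that the stimulus is at the cued location (two locations).

    validity: prior probability that the cue points at the stimulus.
    likelihood_*: probability of the sensory input given each location.
    """
    numerator = validity * likelihood_cued
    return numerator / (numerator + (1.0 - validity) * likelihood_uncued)

# The same weak sensory evidence, mildly favouring the uncued location
high_validity = posterior_cued(0.9, likelihood_cued=0.4, likelihood_uncued=0.6)
low_validity = posterior_cued(0.5, likelihood_cued=0.4, likelihood_uncued=0.6)
print(high_validity, low_validity)
```

With a highly valid cue the top-down prior dominates the weak input. With an uninformative cue the bottom-up input decides, which is the sense in which higher expected uncertainty weakens top-down influence.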
46 Attention
Attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint
Example 2: Attentional Shift
[Figure: two phases, each with cue 1, cue 2 and reward; cue 1 relevant and cue 2 irrelevant in the first, cue 1 irrelevant and cue 2 relevant in the second]
Uncertainty-driven bias in cortical processing
47 A Common Framework
ACh: Variability in quality of relevant cue
NE: Variability in identity of relevant cue
Cues: vestibular, visual, ...
Target: stimulus location, exit direction, ...
avoid representing full uncertainty
48 Simulation Results: Posner's Task
Vary cue validity → Vary ACh
Fix relevant cue → low NE
49 Simulation Results: Maze Navigation
Fix cue validity → no explicit manipulation of ACh
Change relevant cue → NE
50 Simulation Results: Full Model
51 Simulated Psychopharmacology
50% NE
ACh compensation
50% ACh/NE
NE can nearly catch up
52 Simulation Results: Psychopharmacology
NE depletion can alleviate ACh depletion, revealing underlying opponency (implication for neurological diseases such as Alzheimer's)
[Figure: mean error rate against % of normal NE level; curve label 0.001 ACh]
high expected uncertainty makes a high bar for unexpected uncertainty
53 Summary
- Single framework for understanding ACh, NE and some aspects of attention
- ACh/NE as expected/unexpected uncertainty signals
- Experimental psychopharmacological data replicated by model simulations
- Implications from complex interactions between ACh & NE
- Predictions at the cellular, systems, and behavioral levels
- Consider loss functions
- Activity vs weight vs neuromodulatory vs population representations of uncertainty
54 Plan
- introduction
- prediction learning: RW, TD
- action learning
- generalization, opponency
- intra-trial uncertainty
- inter-trial uncertainty
55 Aston-Jones: Target Detection
detect and react to a rare target amongst common distractors
- elevated tonic activity for reversal
- activated by rare target (and reverses)
- not reward/stimulus related? more response related?
- no reason to persist as unexpected uncertainty
56 Two Cohenesque Theories
- Qualitative (AJ): exploration v exploitation
- high tonic mode involves labile attention
- search for better options
- important if short term reward rate is below par
- implemented by changed brittleness?
- Quantitative (EB): gain change in decision nets
- NE controls balance of recurrence/bottom-up
- implements changed S/N ratio with target
- detect to detect
- barely any benefit
- why only for targets?
57 Vigilance Task
- variable time in start
- ? controls confusability
- one single run
- cumulative is clearer
- exact inference
- effect of 80% prior
58 Phasic NE
- NE reports uncertainty about current state
- state in the model, not state of the model
- divisively related to prior probability of that state
- NE measured relative to default state sequence
- start → distractor
- temporal aspect: start → distractor
- structural aspect: target versus distractor
59 Phasic NE
- onset response from timing uncertainty (SET)
- growth as P(target)/0.2 rises
- act when P(target) > 0.95
- stop if P(target) < 0.01
- arbitrarily set NE = 0 after 5 timesteps
(small prob of reflexive action)
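The growth of the signal and the two thresholds can be sketched with sequential Bayesian updating over two hypotheses, target versus distractor. This is a heavy simplification of the talk's model. It assumes a prior P(target) = 0.2 and binary observations that match the true state with probability 0.675 (a hypothetical reliability parameter), and it ignores the start state, timing uncertainty and the reset of NE after the response.

```python
def phasic_ne_trace(observations, prior_target=0.2, reliability=0.675,
                    act_threshold=0.95, stop_threshold=0.01):
    """Update P(target) after each observation ("T" or "D").
    Returns the trace of P(target) / prior_target and the final decision."""
    p = prior_target
    trace = []
    decision = None
    for obs in observations:
        like_target = reliability if obs == "T" else 1.0 - reliability
        like_distractor = 1.0 - like_target
        p = p * like_target / (p * like_target + (1.0 - p) * like_distractor)
        trace.append(p / prior_target)   # signal divisively related to the prior
        if p > act_threshold:
            decision = "act"
            break
        if p < stop_threshold:
            decision = "stop"
            break
    return trace, decision

target_trace, target_decision = phasic_ne_trace(["T"] * 10)
distractor_trace, distractor_decision = phasic_ne_trace(["D"] * 10)
print(target_trace, target_decision)
print(distractor_trace, distractor_decision)
```

Because the target is improbable a priori, consistent target evidence drives the ratio well above 1 before the action threshold is crossed, whereas distractor evidence only confirms the default and the ratio falls towards 0.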
60 Four Types of Trial
[Figure: four trial types, labelled 19, 1.5, 1 and 77]
fall is rather arbitrary
61 Response Locking
slightly flatters the model since no further response variability
62 Task Difficulty
- set ?0.65 rather than 0.675
- information accumulates over a longer period
- hits more affected than correct rejections (CRs)
- timing not quite right
63 Interrupts/Resets (SB)
PFC/ACC
LC
(Sara & Bouret, 2005)
64 Intra-trial Uncertainty
- phasic NE as unexpected state change within a model
- relative to prior probability against default
- interrupts (resets) ongoing processing
- tie to ADHD?
- close to alerting (AJ) but not necessarily tied to behavioral output (onset rise)
- close to behavioural switching (PR) but not DA
- farther from optimal inference (EB)
- phasic ACh: aspects of known variability within a state?
65 Learning and Meta-learning
- general excitability, signal/noise ratios
- specific prediction errors, uncertainty signals
66 Learning and Meta-learning
Δ weight ∝ (learning rate) × (error) × (stimulus)
- precise, falsifiable, roles for DA/5HT; NE/ACh
- only part of the story
- 5HT: median raphe
- ACh: TANs, septum, etc
- huge diversity of receptors; regional specificity
- psychological disagreement about many facets
- attention: over-extended
- reward: reinforcement, liking, wanting, etc
- interesting role for imaging
- it didn't have to be that simple!