Learning and Metalearning

Slides: 67

Provided by: ANGEL133


Transcript and Presenter's Notes


1
Learning and Meta-learning

computation
making predictions
choosing actions
acquiring episodes
statistics
algorithm
gradient ascent (e.g. of the likelihood)
correlation
Kalman filtering
implementation
Hebbian synaptic plasticity
neuromodulation

2
Conditioning
prediction: of important events
control: in the light of those predictions

Ethology
optimality
appropriateness
Psychology
classical/operant
conditioning

Computation
dynamic programming
Kalman filtering
Algorithm
TD/delta rules
simple weights

Neurobiology

neuromodulators, amygdala, OFC, nucleus accumbens, dorsal striatum
3
Computational Neuromodulation

general: excitability, signal/noise ratios
specific: prediction errors, uncertainty signals

4
Data
5
Data
6
Learning and Inference

Learning: predict, control
Δ weight ∝ (learning rate) × (error) × (stimulus)
dopamine
phasic prediction error for future reward
serotonin
phasic prediction error for future punishment

acetylcholine
expected uncertainty boosts learning
norepinephrine
unexpected uncertainty boosts learning
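The claim that uncertainty boosts learning can be illustrated with a scalar Kalman filter, which the earlier slides name as one of the relevant computations. This is a minimal sketch, not the model behind these slides: the function name `kalman_step` and the parameters `obs_noise` and `drift` are assumptions made for illustration. The point is only that the effective learning rate (the Kalman gain) grows with the current uncertainty about the weight.

```python
def kalman_step(mean, var, stimulus, reward, obs_noise=1.0, drift=0.01):
    """One scalar Kalman filter update of a weight estimate.

    mean, var: current estimate of the weight and its variance (uncertainty).
    obs_noise, drift: assumed observation noise and weight drift variances.
    """
    error = reward - mean * stimulus                  # prediction error
    gain = var * stimulus / (stimulus ** 2 * var + obs_noise)
    new_mean = mean + gain * error                    # (learning rate) x (error)
    new_var = (1.0 - gain * stimulus) * var + drift   # uncertainty after the update
    return new_mean, new_var, gain


# Same stimulus and reward, different prior uncertainty about the weight.
_, _, gain_certain = kalman_step(mean=0.0, var=0.1, stimulus=1.0, reward=1.0)
_, _, gain_uncertain = kalman_step(mean=0.0, var=1.0, stimulus=1.0, reward=1.0)
print(gain_certain, gain_uncertain)
```

With higher variance the same prediction error moves the estimate further, which is the sense in which uncertainty acts as a learning rate.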

7
Learning and Inference
context
expected uncertainty
unexpected uncertainty
top-down processing
NE
ACh
cortical processing
prediction, learning, ...
bottom-up processing
sensory inputs
8
Plan

introduction
prediction learning RW, TD
action learning
generalization, opponency
intra-trial uncertainty
(inter-trial uncertainty)

9
Rescorla-Wagner Rule
Pavlovian conditioning, blocking, inhibitory
conditioning, etc
minimize
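As a minimal sketch, assuming the standard form of the Rescorla-Wagner rule (prediction v = w·u, update Δw = ε (r − v) u, which reduces squared prediction error), blocking can be reproduced in a few lines. The function name, the learning rate and the trial counts are illustrative assumptions.

```python
def rescorla_wagner(trials, n_stimuli, learning_rate=0.1):
    """Apply the delta rule over a list of (stimulus vector, reward) trials."""
    weights = [0.0] * n_stimuli
    for stimulus, reward in trials:
        prediction = sum(w * u for w, u in zip(weights, stimulus))
        error = reward - prediction
        weights = [w + learning_rate * error * u
                   for w, u in zip(weights, stimulus)]
    return weights


# Blocking: stimulus A alone is paired with reward, then A+B with reward.
phase1 = [((1, 0), 1.0)] * 200
phase2 = [((1, 1), 1.0)] * 200
blocked = rescorla_wagner(phase1 + phase2, n_stimuli=2)

# Control: A+B paired with reward without pretraining on A.
control = rescorla_wagner(phase2, n_stimuli=2)
print(blocked, control)
```

In the blocked case A already predicts the reward, so the error in phase 2 is near zero and B acquires almost no weight. In the control case A and B share the prediction.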
10
BUT: secondary conditioning
11
Sutton's TD
12
Rewards rather than Punishments
[Figure: TD error and V(t) for three cases: no prediction; prediction, reward; prediction, no reward]
dopamine cells in VTA/SNc (Schultz et al)
13
More Completely
(if you can count time since the stimulus was
turned on)
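A minimal sketch of temporal difference learning when time since stimulus onset can be counted: each time step after onset gets its own weight, and the TD error is δ(t) = r(t) + V(t+1) − V(t). The trial timing, the learning rate, the lack of discounting and the function name `run_trial` are assumptions for illustration.

```python
def run_trial(weights, stim_time, reward_time, n_steps, learning_rate,
              rewarded=True):
    """Run one trial, update weights in place, return the TD error per step."""
    def value(t):
        # Value is zero before stimulus onset and after the trial ends.
        if stim_time <= t < n_steps:
            return weights[t - stim_time]
        return 0.0

    deltas = []
    for t in range(n_steps):
        reward = 1.0 if (rewarded and t == reward_time) else 0.0
        delta = reward + value(t + 1) - value(t)
        if t >= stim_time:
            weights[t - stim_time] += learning_rate * delta
        deltas.append(delta)
    return deltas


STIM_TIME, REWARD_TIME, N_STEPS = 5, 15, 20
weights = [0.0] * (N_STEPS - STIM_TIME)

before = run_trial(weights, STIM_TIME, REWARD_TIME, N_STEPS, 0.3)
for _ in range(500):
    run_trial(weights, STIM_TIME, REWARD_TIME, N_STEPS, 0.3)
after = run_trial(weights, STIM_TIME, REWARD_TIME, N_STEPS, 0.3)
omitted = run_trial(weights, STIM_TIME, REWARD_TIME, N_STEPS, 0.3,
                    rewarded=False)

# Index STIM_TIME - 1 is the transition into the stimulus.
print(before[REWARD_TIME], after[STIM_TIME - 1], after[REWARD_TIME],
      omitted[REWARD_TIME])
```

Before learning the error sits at the time of reward. After learning it has moved to stimulus onset, and omitting the reward gives a negative error at the expected reward time, the three cases of the dopamine slide.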
14
Dopamine
15
Temporal Difference Prediction Error
[Figure: cue sequences leading to High Pain or Low Pain, with transition probabilities 0.8, 0.2 and 1.0]
predict sum future pain
TD error
16
Temporal Difference Prediction Error
[Figure: TD error (prediction error) and value for cue sequences leading to High Pain or Low Pain, with transition probabilities 0.8, 0.2 and 1.0]
17
Temporal Difference Prediction Error
experimental sequence:
A B HIGH; C D LOW; C B HIGH; A B HIGH; A D LOW; C D LOW; A B HIGH; A B HIGH; C D LOW; C B HIGH ...
MR scanner
TD model
Brain responses
?
Ben Seymour, John O'Doherty
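To make the "TD model" step concrete, here is a minimal sketch that runs TD(0) over the ten trials listed on this slide and records the prediction error at the second cue and at the outcome. The pain magnitudes (HIGH = 1, LOW = 0), the learning rate of 0.5 and all names are assumptions for illustration, not values from the study.

```python
PAIN = {"HIGH": 1.0, "LOW": 0.0}  # assumed magnitudes, not from the slides

# The ten trials listed on the slide: (first cue, second cue, outcome).
TRIALS = [
    ("A", "B", "HIGH"), ("C", "D", "LOW"), ("C", "B", "HIGH"),
    ("A", "B", "HIGH"), ("A", "D", "LOW"), ("C", "D", "LOW"),
    ("A", "B", "HIGH"), ("A", "B", "HIGH"), ("C", "D", "LOW"),
    ("C", "B", "HIGH"),
]


def td_errors(trials, learning_rate=0.5):
    """Return per-trial (error at second cue, error at outcome) and values."""
    value = {cue: 0.0 for cue in "ABCD"}
    errors = []
    for first, second, outcome in trials:
        # Error when the second cue appears: change in predicted pain.
        delta_cue = value[second] - value[first]
        value[first] += learning_rate * delta_cue
        # Error when the outcome arrives: delivered minus predicted pain.
        delta_outcome = PAIN[outcome] - value[second]
        value[second] += learning_rate * delta_outcome
        errors.append((delta_cue, delta_outcome))
    return errors, value


errors, values = td_errors(TRIALS)
for trial, (delta_cue, delta_outcome) in zip(TRIALS, errors):
    print(trial, round(delta_cue, 3), round(delta_outcome, 3))
```

A trial-by-trial error series of this kind is what would be compared against brain responses.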
18

TD prediction error ventral striatum
z = -4
R
19
Temporal Difference Values
dorsal raphe?
right anterior insula
20
Unexpected - Expected
(Montague et al)
(ventral striatum; we also find OFC and some DLPFC)
21
Plan

introduction
prediction learning RW, TD
action learning
generalization, opponency
intra-trial uncertainty
inter-trial uncertainty

22
Action Selection
23
Indirect Actor
24
Direct Actor
25
Action Selection
start with policy
evaluate it
A
B
C
improve it
1
-1
thus choose L more frequently than R
26
Policy

value is too pessimistic
action is better than average

27
Advantages

indirect actor in maze is
Q(state,action)
learn using standard TD
softmax
advantages
learned using

state,action
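A minimal sketch of an indirect actor: action values Q are learned from prediction errors and actions are drawn from a softmax over them. It uses a single state with two actions (L and R) instead of a maze, and the rewards of +1 and −1, the inverse temperature `beta` and the function names are assumptions for illustration.

```python
import math
import random


def softmax(values, beta=1.0):
    """Convert action values into choice probabilities."""
    highest = max(values)
    exps = [math.exp(beta * (v - highest)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def learn_q(rewards, n_trials=500, learning_rate=0.1, beta=1.0, seed=0):
    """Learn one Q value per action from sampled choices in a single state."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(n_trials):
        probs = softmax(q, beta)
        action = rng.choices(range(len(q)), weights=probs)[0]
        # With one state and no successor, the TD error is reward minus Q.
        q[action] += learning_rate * (rewards[action] - q[action])
    return q


q_values = learn_q(rewards=[1.0, -1.0])  # index 0 is L, index 1 is R
policy = softmax(q_values)
print(q_values, policy)
```

Because Q(L) ends up above Q(R), the softmax chooses L more frequently than R without ever choosing it exclusively.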
28
fMRI for Actor-Critic
60
30
29
fMRI for Actor-Critic
value
Pavlovian
Operant
Conjunction
ventral putamen
nucleus accumbens
advantage
dorsal striatum
30
Plan

introduction
prediction learning RW, TD
action learning
generalization, opponency
intra-trial uncertainty
inter-trial uncertainty

31
Generalization
32
Generalization
33
Aversion
34
Opponency
35
Solomon & Corbit
36
TD Prediction Errors

computation: dynamic programming and optimal control
algorithm: ongoing error in predictions of the future
implementation:
dopamine: phasic prediction error for reward; tonic punishment
serotonin: phasic prediction error for punishment; tonic reward
evident in VTA, striatum, raphe?
next: bonuses, Pavlovian actions, motivation, addiction, rates, psychiatry

37
Plan

introduction
prediction learning RW, TD
action learning
generalization, opponency
intra-trial uncertainty
inter-trial uncertainty

38
Uncertainty
Computational functions of uncertainty

weaken top-down influence over sensory
processing
promote learning about the relevant
representations

39
Computational Neuromodulation

general: excitability, signal/noise ratios
specific: prediction errors, uncertainty signals

40
Experimental Data

ACh & NE have similar physiological effects
suppress recurrent feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
enhance thalamocortical transmission (e.g. Gil et al, 1997)
boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)

ACh & NE have distinct behavioral effects
ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
41
ACh in Hippocampus
Given unfamiliarity, ACh
boosts bottom-up, suppresses recurrent processing
boosts recurrent plasticity
(DG) (CA3) (CA1) (MS)
(Hasselmo, 1995)

ACh in Conditioning
Given uncertainty, ACh
boosts learning to stimuli of uncertain consequences
(Bucci, Holland, & Gallagher, 1998)
42
Cholinergic Modulation in the Cortex
Electrophysiology Data (Gil, Connors, & Amitai, 1997)
ACh agonists
facilitate TC transmission
enhance stimulus-specific activity

Examples of Hallucinations Induced by Anticholinergic Chemicals (Perry & Perry, 1995)
ACh antagonists
induce hallucinations
interfere with stimulus processing
effects enhanced by eye closure

43
Norepinephrine
Something similar may be true for NE (Kasamatsu
et al, 1981)
Days after task shift
(Devauges & Sara, 1990)
(Hasselmo et al, 1997)
NE specially involved in novelty, confusing
association with attention, vigilance
44
Model Schematics
Context
Expected Uncertainty
Unexpected Uncertainty
Top-down Processing
NE
ACh
Cortical Processing
Prediction, learning, ...
Bottom-up Processing
Sensory Inputs
45
Attention
Attentional selection for (statistically) optimal
processing, above and beyond the traditional view
of resource constraint
Example 1: Posner's Task
[Figure: trial timeline for high-validity and low-validity cues, showing cue, stimulus location and sensory input; timings shown: 0.1s, 0.2-0.5s, 0.15s]
Uncertainty-driven bias in cortical processing
46
Attention
Attentional selection for (statistically) optimal
processing, above and beyond the traditional view
of resource constraint
Example 2: Attentional Shift
before the shift: cue 1 relevant, cue 2 irrelevant, reward
after the shift: cue 1 irrelevant, cue 2 relevant, reward
Uncertainty-driven bias in cortical processing
47
A Common Framework
ACh: variability in quality of relevant cue
NE: variability in identity of relevant cue
Cues: vestibular, visual, ...
Target: stimulus location, exit direction...
avoid representing full uncertainty
48
Simulation Results: Posner's Task
Vary cue validity → Vary ACh
Fix relevant cue → low NE
49
Simulation Results: Maze Navigation
Fix cue validity → no explicit manipulation of ACh
Change relevant cue → NE
50
Simulation Results: Full Model
51
Simulated Psychopharmacology
50% NE
ACh compensation
50% ACh/NE
NE can nearly catch up
52
Simulation Results: Psychopharmacology
NE depletion can alleviate ACh depletion, revealing underlying opponency (implication for neurological diseases such as Alzheimer's)
Mean error rate
0.001 ACh
high expected uncertainty makes a high bar
for unexpected uncertainty
% of Normal NE Level
53
Summary

Single framework for understanding ACh, NE and some aspects of attention
ACh/NE as expected/unexpected uncertainty signals
Experimental psychopharmacological data replicated by model simulations
Implications from complex interactions between ACh & NE
Predictions at the cellular, systems, and behavioral levels
Consider loss functions
Activity vs weight vs neuromodulatory vs population representations of uncertainty

54
Plan

introduction
prediction learning RW, TD
action learning
generalization, opponency
intra-trial uncertainty
inter-trial uncertainty

55
Aston-Jones Target Detection
detect and react to a rare target amongst common
distractors

elevated tonic activity for reversal
activated by rare target (and reverses)
not reward/stimulus related? more response
related?

no reason to persist as unexpected uncertainty

56
Two Cohenesque Theories

Qualitative (AJ): exploration v exploitation
high tonic mode involves labile attention
search for better options
important if short term reward rate is below par
implemented by changed brittleness?
Quantitative (EB): gain change in decision nets
NE controls balance of recurrence/bottom-up
implements changed S/N ratio with target
detect to detect
barely any benefit
why only for targets?
why only for targets?

57
Vigilance Task

variable time in start
? controls confusability

one single run
cumulative is clearer

exact inference
effect of 80% prior

58
Phasic NE

NE reports uncertainty about current state
state in the model, not state of the model
divisively related to prior probability of that
state
NE measured relative to default state sequence
start → distractor
temporal aspect: start → distractor
structural aspect: target versus distractor

59
Phasic NE

onset response from timing uncertainty (SET)
growth as P(target)/0.2 rises
act when P(target) > 0.95
stop if P(target) < 0.01
arbitrarily set NE = 0 after 5 timesteps

(small prob of reflexive action)
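A minimal sketch of the decision rule on this slide, reading the two thresholds as P(target) > 0.95 (act) and P(target) < 0.01 (stop) and the 0.2 as the prior probability of a target. The Gaussian observation model, its means and the function names are assumptions for illustration; the model behind the slides uses a fuller state sequence (start, distractor, target).

```python
import math

PRIOR_TARGET = 0.2  # the 0.2 in P(target)/0.2, read here as the prior


def gaussian_likelihood(x, mean, sd=1.0):
    """Unnormalised Gaussian likelihood; the constant cancels in Bayes' rule."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def accumulate(observations, target_mean=1.0, distractor_mean=-1.0):
    """Update P(target) per observation; return the decision and NE-like trace."""
    p_target = PRIOR_TARGET
    trace = []
    for x in observations:
        like_t = gaussian_likelihood(x, target_mean)
        like_d = gaussian_likelihood(x, distractor_mean)
        p_target = (like_t * p_target
                    / (like_t * p_target + like_d * (1.0 - p_target)))
        trace.append(p_target / PRIOR_TARGET)  # posterior relative to prior
        if p_target > 0.95:
            return "act", trace
        if p_target < 0.01:
            return "stop", trace
    return "undecided", trace


target_decision, target_trace = accumulate([1.0] * 5)
distractor_decision, distractor_trace = accumulate([-1.0] * 5)
print(target_decision, target_trace)
print(distractor_decision, distractor_trace)
```

The signal grows only when evidence favours the state with low prior probability, which is the divisive relation to the prior described on the previous slide.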
60
Four Types of Trial
19
1.5
1
77
fall is rather arbitrary
61
Response Locking
slightly flatters the model since no
further response variability
62
Task Difficulty

set ?0.65 rather than 0.675
information accumulates over a longer period
hits more affected than crs
timing not quite right

63
Interrupts/Resets (SB)
PFC/ACC
LC
Sara & Bouret, 2005
64
Intra-trial Uncertainty

phasic NE as unexpected state change within a model
relative to prior probability against default
interrupts (resets) ongoing processing
tie to ADHD?
close to alerting (AJ) but not necessarily tied to behavioral output (onset rise)
close to behavioural switching (PR) but not DA
farther from optimal inference (EB)
phasic ACh: aspects of known variability within a state?

65
Learning and Meta-learning

general: excitability, signal/noise ratios
specific: prediction errors, uncertainty signals

66
Learning and Meta-learning
Δ weight ∝ (learning rate) × (error) × (stimulus)

precise, falsifiable roles for DA/5HT, NE/ACh
only part of the story
5HT: median raphe
ACh: TANs, septum, etc
huge diversity of receptors, regional specificity
psychological disagreement about many facets
attention: over-extended
reward: reinforcement, liking, wanting, etc
interesting role for imaging
it didn't have to be that simple!
