Title: Instrumental Conditioning: Motivational Mechanisms
1. Instrumental Conditioning: Motivational Mechanisms
2. Contingency-Shaped Behaviour
- Uses three-term contingency
- Reinforcement schedule (e.g., FR10) imposes contingency
- Seen in non-humans and humans
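To make the idea of a schedule imposing a contingency concrete, here is a minimal Python sketch of a fixed-ratio schedule. It assumes FR10 means that every tenth response is reinforced; the function name `fixed_ratio_outcomes` is hypothetical and used only for illustration.

```python
def fixed_ratio_outcomes(n_responses, ratio=10):
    """Return one boolean per response: True where the reinforcer is delivered."""
    # On a fixed-ratio schedule, every `ratio`-th response earns the reinforcer.
    return [(i % ratio) == 0 for i in range(1, n_responses + 1)]


outcomes = fixed_ratio_outcomes(30, ratio=10)
reinforced_at = [i for i, r in enumerate(outcomes, start=1) if r]
print(reinforced_at)  # responses 10, 20 and 30 are reinforced
```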
3. Rule-Governed Behaviour
- Particularly in humans
- Behaviour can be varied and unpredictable
- Invent rules or use (in)appropriate rules across conditions (e.g., language)
- Age-dependent, primary vs. secondary reinforcers, experience
4. Role of Response in Operant Conditioning
- Thorndike
- Performance of response necessary
- Tolman
- Formation of expectation
- McNamara, Long & Wike (1956)
- Maze
- Running rats or riding rats (cart)
- Association is what is needed, not performance of the response
5. Role of the Reinforcer
- Is reinforcement necessary for operant conditioning?
- Tolman & Honzik (1930)
- Latent learning
- Not necessary for learning
- Necessary for performance
6. Results
[Figure: average errors (y-axis) across days (x-axis) for three groups: no food, no food until day 11, and food]
7. Associative Structure in Instrumental Conditioning
- Basic forms of association
- S = stimulus, R = response, O = outcome
- S-R
- Thorndike, Law of Effect
- Role of reinforcer: stamps in S-R association
- No R-O association acquired
8. Hull and Spence
- Law of Effect, plus a classical conditioning process
- Stimulus evokes response via Thorndike's S-R association
- Also, S-O association creates expectancy of reward
- Two-process approach
- Classical and instrumental are different
9. One Process or Two Processes?
- Are instrumental and classical the same (one process) or different (two processes)?
- Omission control procedure
- US presentation depends on non-occurrence of CR
- No CR, then CS --> US
- CR, then CS --> no US
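The omission contingency can be written as a simple trial rule. The sketch below is illustrative only: the function name `us_delivered`, the group labels, and the CR record are assumptions, and the "classical" branch is included purely as a comparison in which the US follows every CS.

```python
def us_delivered(group, cr_occurred):
    """Return True if the US follows the CS on this trial."""
    if group == "classical":
        # Comparison procedure: US on every trial, regardless of responding.
        return True
    if group == "omission":
        # Omission procedure: a CR cancels the US; no CR means CS --> US.
        return not cr_occurred
    raise ValueError(f"unknown group: {group!r}")


# Hypothetical record of whether a CR occurred on five trials.
cr_record = [False, False, True, True, False]
for group in ("classical", "omission"):
    us_record = [us_delivered(group, cr) for cr in cr_record]
    print(group, us_record)
```

Under the omission rule, every trial with a CR becomes a CS-alone trial, so the more the animal responds, the fewer CS-US pairings it receives.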
10. Omission Control
11. Gormezano & Coleman (1973)
- Eyeblink with rabbits
- US = shock, CS = tone
- Classical group: 5 mA shock each trial, regardless of response
- Omission group: making eyeblink CR to CS prevents delivery of US
12.
- One-process prediction
- CR acquisition faster and stronger for Omission group
- Reinforcement for CR is shock avoidance
- In Classical group CR will be present because it somehow reduces shock aversiveness
- BUT
- CR acquisition slower in Omission group
- Classical conditioning extinction (not all CSs followed by US)
- Supports two-process theory
13. Classical in Instrumental
- Classical conditioning process provides motivation
- Stimulus substitution
- S acquires properties of O
- rg = fractional anticipatory goal response
- Response leads to feedback
- sg = sensory feedback
- rg-sg constitutes expectancy of reward
14. Timecourse
[Diagram: timecourse of S, R, and O]
Through stimulus substitution, S elicits rg-sg, giving motivational expectation of reward
15. Prediction
- According to rg-sg, CR should occur before operant response, but doesn't always
- Dog lever pressing on FR33 --> PRP (post-reinforcement pause)
- Low lever presses early, then higher, but salivation only later
[Figure: magnitude of lever pressing and salivation (y-axis) against time from start of trial (x-axis)]
16. Modern Two-Process Theory
- Classical conditioning in instrumental
- Neutral stimulus --> elicits motivation
- Central Emotional State (CES)
- CES is a characteristic of the nervous system (mood)
- CES won't produce only one response
- Makes the exact effect awkward to predict
17. Prediction
- Rate of operant response modified by presentation of CS
- CES develops to motivate operant response
- CS from classical conditioning also elicits CES
- Therefore, giving CS during instrumental conditioning should alter CES that motivates instrumental response
18. Explicit Predictions
19. Aversive US

Instrumental schedule      CS+ (fear)   CS- (relief)
Positive reinforcement     decrease     increase
Negative reinforcement     increase     decrease
20. R-O and S(R-O)
- Earlier interpretations had no response-reinforcement associations
- Intuitive explanation, though
- Perform response to get reinforcer
21. Colwill & Rescorla (1986)
- R-O association
- Devalue reinforcer post-conditioning
- Does operant response decrease?
- Bar push right or left for different reinforcers
- Food or sucrose
[Figure: Testing of Reinforcers. Mean responses/min (y-axis) across blocks of extinction trials (x-axis), for the normal reinforcer and the devalued reinforcer]
22. Interpretation
- Can't be S-R
- No reinforcer in this model
- Can't be S-O
- Two responses, same stimuli (the bar), but only one response affected
- Conclusion
- Each response associated with its own reinforcer
- R-O association
23. Hierarchical S-(R-O)
- R-O model lacks stimulus component
- Stimulus required to activate association
- Really, Skinner's (1938) three-term contingency
- Old idea, recent empirical testing
24. Colwill & Delamater (1995)
- Rats trained on pairs of S
- Biconditional discrimination problem
- Two stimuli
- Two responses
- One reinforcer
- Match the correct response to the stimuli to be reinforced
- Training, reinforcer devaluation, testing
25.
- Training
- Tone: lever --> food; chain --> nothing
- Noise: chain --> food; lever --> nothing
- Light: poke --> sucrose; handle --> nothing
- Flash: handle --> sucrose; poke --> nothing
- Aversion conditioning
- Testing: marked reduction in previously reinforced response
- Tone: lever press vs. chain
- Noise: chain vs. lever
- Light: poke vs. handle
- Flash: handle vs. poke
26. Analysis
- Can't be S-O
- Each stimulus associated with same reinforcer
- Can't be R-O
- Each response reinforced with same outcome
- Can't be S-R
- Due to devaluation of outcome
- Each S activates a corresponding R-O association
27. Reinforcer Prediction, A Priori
- Simple definition
- A stimulus that increases the future probability of a behaviour
- Circular explanation
- Would be nice if we could predict beforehand
28. Need Reduction Approach
- Primary reinforcers reduce biological needs
- Biological needs, e.g., food, water
- Not biological needs, e.g., sex, saccharin
- Undetectable biological needs, e.g., trace elements, vitamins
29. Drive Reduction
- Clark Hull
- Homeostasis
- Drive systems
- Strong stimuli aversive
- Reduction in stimulation is reinforcer
- Drive is reduced
- Problems
- Objective measurement of stimulus intensity
- Cases where stimulation doesn't change, or even increases!
30. Trans-situationality
- A stimulus that is a reinforcer in one situation will be a reinforcer in others
- Subsets of behaviour
- Reinforcing behaviours
- Reinforceable behaviours
- Often works with primary reinforcers
- Problems with other stimuli
31. Primary and Incentive Motivation
- Where does motivation to respond come from?
- Primary biological drive state
- Incentive from reinforcer itself
32. But Consider
- What if we treat a reinforcer not as a stimulus or an event, but as a behaviour in and of itself?
- Fred Sheffield (1950s)
- Consummatory-response theory
- E.g., not the food, but the eating of food that is the reinforcer
- E.g., saccharin has no nutritional value, can't reduce drive, but is reinforcing due to its consumability
33. Premack's Principle
- Reinforcing responses occur more than the responses they reinforce
- H = high-probability behaviour
- L = low-probability behaviour
- If L --> H, then H reinforces L
- But, if H --> L, L does not reinforce H
- Differential probability principle
- No fundamental distinction between reinforcers and operant responses
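The differential probability principle reduces to a comparison of baseline probabilities. The following is a minimal sketch under that reading; the function name `premack_predicts_reinforcement` and the baseline values are hypothetical, chosen only to illustrate the rule.

```python
def premack_predicts_reinforcement(p_operant, p_contingent):
    """True if the contingent activity is predicted to reinforce the operant.

    Both arguments are baseline probabilities measured with free access.
    """
    # The contingent activity must be the more probable of the two.
    return p_contingent > p_operant


# Hypothetical baseline probabilities for a high (H) and a low (L) behaviour.
baseline = {"H": 0.7, "L": 0.3}

# L --> H: performing L gives access to H, so H is the candidate reinforcer.
print(premack_predicts_reinforcement(baseline["L"], baseline["H"]))  # True
# H --> L: performing H gives access to L, so L is the candidate reinforcer.
print(premack_predicts_reinforcement(baseline["H"], baseline["L"]))  # False
```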
34. Premack (1965)
- Two alternatives
- Eat candy, play pinball
- Phase I: determine individual behaviour probability (baseline)
- Gr1: pinball (operant) to eat (reinforcer)
- Gr2: eating candy (operant) to play pinball (reinforcer)
- Phase II (testing)
- T1: play pinball (operant) to eat (reinforcer)
- Only Gr1 kids increased operant
- T2: eat (operant) to play pinball (reinforcer)
- Only Gr2 kids increased operant
35. Premack in Brief
Any activity could be a reinforcer if it is more likely to be preferred than the operant response.
36. Response Deprivation Hypothesis
- Restriction of access to reinforcer response
- Theory
- Impose response deprivation
- Now, low-probability responses can reinforce high-probability responses
- Instrumental procedures withhold reinforcer until response made; in essence, deprived of access to reinforcer
- Reinforcer produced by operant contingency itself
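One common way of formalizing response deprivation, offered here as an assumption rather than as the lecture's own statement, is to compare the schedule's ratio of instrumental to contingent responding with the same ratio at baseline: the schedule deprives the contingent response when I/C > Oi/Oc. The sketch below uses hypothetical names and numbers, and cross-multiplies to avoid dividing by zero.

```python
def is_response_deprived(instr_required, contingent_allowed,
                         baseline_instr, baseline_contingent):
    """True if the schedule restricts the contingent response below baseline.

    Tests I/C > Oi/Oc, written as I * Oc > C * Oi to avoid division.
    """
    return instr_required * baseline_contingent > contingent_allowed * baseline_instr


# Hypothetical baseline: 30 min of a high-probability activity (instrumental)
# and 10 min of a low-probability activity (contingent).
print(is_response_deprived(5, 1, 30, 10))  # True: 5/1 exceeds 30/10
print(is_response_deprived(1, 1, 30, 10))  # False: 1/1 does not exceed 30/10
```

In the first case the low-probability activity is restricted below its baseline level, so on this account it can reinforce the high-probability activity.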
37. Behavioural Regulation
- Physiological homeostasis
- Analogous process in behavioural regulation
- Preferred/optimal distribution of activities
- Stressors move organism away from optimum behavioural state
- Respond in ways to return to ideal state
38. Behavioural Bliss Point
- Unconstrained condition: distribute activities in a way that is preferred
- Behavioural bliss point (BBP)
- Relative frequency of all behaviours in unconstrained condition
- Across conditions
- BBP shifts
- Within condition
- BBP stable across time
39. Imposing a Contingency
- Puts pressure on BBP
- Act to defend against challenges to BBP
- But requirements of contingency (may) make achieving BBP impossible
- Compromise required
- Redistribute responses so as to get as close to BBP as possible
40. Minimum Deviation Model
- Behavioural regulation
- Due to imposed contingency
- Redistribute behaviour
- Minimize deviation of responses from BBP
- Get as close as you can
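"Get as close as you can" can be illustrated geometrically. The sketch below assumes two activities, a schedule that yields `ratio` units of the contingent activity per unit of the instrumental activity (the line y = ratio * x), and equal weighting of both activities so that "closest" means smallest Euclidean distance. These are simplifying assumptions, and the function name and numbers are hypothetical.

```python
import math


def minimum_deviation_point(bliss_x, bliss_y, ratio):
    """Point on the schedule line y = ratio * x closest to the bliss point."""
    # Orthogonal projection of (bliss_x, bliss_y) onto the line through
    # the origin with direction (1, ratio).
    x = (bliss_x + ratio * bliss_y) / (1 + ratio ** 2)
    return x, ratio * x


# Hypothetical bliss point: 10 units of activity x, 30 units of activity y.
# A 1:1 schedule forces equal amounts of both activities.
x, y = minimum_deviation_point(10, 30, ratio=1)
print(x, y)                          # 20.0 20.0
print(math.dist((x, y), (10, 30)))   # remaining deviation from the bliss point
```

In this example the organism does more of x and less of y than it would prefer, which is the redistribution of behaviour the model describes.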
41.
[Figure: time drinking (y-axis, 10 to 40) against time running (x-axis, 10 to 40), showing the restricted running and restricted drinking conditions]
42. Strengths of BBP Theory
- Reinforcers not special stimuli or responses
- No difference between operant and reinforcer
- Explains new allocation of behaviour
- Fits with findings on cognition for cost-benefit optimization