Title: Chapter 5: Schedules of Reinforcement
1. Chapter 5: Schedules of Reinforcement
2. Schedules of Reinforcement
- A schedule of reinforcement is a prescription that states how and when discriminative stimuli and behavioral consequences will be presented.
- Schedules of reinforcement are common in everyday life. A rat pressing a lever for food and a person turning on a light seem to have little in common. Humans are very complex organisms: they build cities, write books, go to college, go to war, conduct experiments, and do many other things that rats cannot do. Nonetheless, performance on schedules of reinforcement has been found to be remarkably similar across different organisms, many types of behavior, and a variety of reinforcers.
3. Importance of Schedules of Reinforcement
- A rule for the presentation of stimuli that precede operants and the consequences that follow them
- Schedules are the fundamental determinants of behavior:
  - Rate and temporal pattern of responding
  - Probability of responding
4. Terminology
- Partial reinforcement effect
- Discrimination hypothesis
- Generalization decrement hypothesis
- Contingency of reinforcement: features defined by the schedule
- Steady-state performance
- Strained performance
5. Partial Reinforcement Effect
- Resistance to extinction is greater after partial reinforcement (PRF) than after continuous reinforcement (CRF).
- Discrimination hypothesis: it is harder to detect the change to extinction (EXT) after PRF than after CRF; the subject cannot discriminate that its behavior is on EXT.
- Generalization decrement hypothesis: as EXT continues, the situation becomes more and more different from when reinforcement was in effect. The degree of generalization is much greater from PRF to EXT than from CRF to EXT, because the reinforcement contingencies in PRF and EXT are more similar than those in CRF and EXT.
6. Understanding Cumulative Records
- Plots responses as they occur, moment to moment
- A pen records time horizontally; each response moves the pen vertically
- Reinforcers are marked
- Slope of the record indicates response rate:
  - Steep = high rate
  - Flat = low rate
7. [Figure: cumulative record. Time moves the pen horizontally; each response moves it vertically. A steep slope shows a high rate, a shallow slope a low rate, and a flat line no responding.]
8. Schedules and Patterns of Response
- Patterns of response develop on schedules of reinforcement. These patterns come about after an animal has experience with the contingency of reinforcement defined by a particular schedule.
- Subjects are exposed to a schedule of reinforcement and, following an acquisition period, behavior typically settles into a consistent or steady-state performance.
- It may take many experimental sessions before a particular pattern emerges, but once it does, the orderliness of behavior is remarkable.
9. Schedules and Patterns of Response
- When organisms are reinforced for a fixed number of responses, a pause-and-run pattern of behavior develops.
- Responses required by the schedule are made rapidly and result in reinforcement.
- Following each reinforcement there is a pause in responding, then another quick burst of responses.
- This pattern repeats over and over and occurs even when the size of the schedule is changed. A pause-and-run pattern has been found for horses, chickens, vultures, and children.
10. Schedules of Reinforcement
- CRF: continuous reinforcement (FR 1)
- Fixed ratio (FR)
  - Postreinforcement pause
  - Run of responses
- Variable ratio (VR)
- Fixed interval (FI): scalloping
  - Long term: break and run
  - Humans
- Variable interval (VI)
11. Reinforcement Schedules
- Response-based schedules:
  - Fixed ratio (FR)
  - Variable ratio (VR)
  - Progressive ratio (PR)
  - Random ratio (RR)
- Time-based schedules:
  - Fixed interval (FI)
  - Variable interval (VI)
  - Fixed and variable time (FT/VT)
12. Other Schedules
- Differential reinforcement of low rates (DRL): IRT > t
- Differential reinforcement of high rates (DRH): IRT < t
13. Behavior and Schedule Effects
- Schedules of reinforcement affect operant behavior because the schedule may interact with other independent variables.
- When punished after every response on an FR schedule, the pause after reinforcement increased in length.
- Once the animal emitted the first response, however, the rate to finish the run was unaffected.
- Punishment reduces the tendency to begin responding, but once responding has started, the behavior is not suppressed.
14. Behavior and Schedule Effects
- Punishment has different effects when behavior is maintained on different schedules of reinforcement.
- When on a variable-interval (VI) schedule and punished after each operant, the pattern of behavior stays the same but the rate of response declines.
15. Schedules of Positive Reinforcement
- Continuous reinforcement, or CRF, is the simplest schedule of reinforcement. On this schedule, every operant required by the contingency is reinforced.
- CRF and resistance to extinction: continuous reinforcement generates little resistance to extinction. Resistance to extinction is a measure of persistence when reinforcement is discontinued.
16. Reinforcement and Behavioral Momentum
- The concept of momentum derives from a combination of response rate, as generated by schedules of reinforcement, and the behavioral dynamic of resistance to change; both are important dimensions of operant behavior and are analogous to velocity and mass in physics.
- Behavioral momentum refers to behavior persisting in the presence of a particular stimulus despite disruptive factors.
17. Schedules of Positive Reinforcement
- Compared to intermittent schedules, continuous reinforcement produces less resistance to extinction. This is called the partial, or intermittent, reinforcement effect.
18. Response Stereotypy on CRF
- On CRF, the topography of the response becomes very predictable and shows very little variability.
- Rats were conditioned to poke their noses anywhere along a 50-cm slot on a CRF schedule. The rats responded at the same point on the slot while on CRF.
- When placed on extinction, the variability of nose-poke placement along the slot increased.
19. Response Stereotypy on CRF
- As continuous reinforcement persists, less and less variation occurs in the operant class.
- The variability of response may be inversely related to the rate of reinforcement.
- Responses were stereotyped on CRF and became more variable on intermittent or extinction schedules.
- The general principle appears to be: when things no longer work, try new ways of behaving.
- Resurgence, an increase in topographic variability during extinction, can contribute to the development of creative or original behavior.
20. Ratio and Interval Schedules of Reinforcement
- On intermittent schedules of reinforcement, some rather than all responses are reinforced.
- Ratio schedules are response based; these schedules are set to deliver reinforcement following a number of responses.
- Interval schedules pay off when one response is made after some amount of time has passed.
- Ratio and interval schedules may each be fixed or variable. Fixed schedules set up reinforcement after a fixed number of responses or a constant amount of time. On variable schedules, the response and time requirements vary from one reinforcer to the next.
21. Ratio Schedules
- A fixed-ratio (FR) schedule is programmed to deliver reinforcement after a fixed number of responses is made.
- Continuous reinforcement is FR 1.
- FR schedules produce a rapid run of responses, followed by reinforcement, then a pause in responding.
- A cumulative record of behavior on fixed ratio looks somewhat like a set of stairs: a steep period of responding (the run), followed by reinforcement, and finally a flat portion. The flat part of the cumulative record is called the postreinforcement pause, or PRP.
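The FR contingency described above is simple enough to sketch in code. This is a minimal illustration under the chapter's definition, not laboratory software; the class and method names are hypothetical.

```python
# Minimal sketch of a fixed-ratio (FR) contingency: every `ratio`-th
# response produces reinforcement. FR 1 is continuous reinforcement (CRF).
class FixedRatio:
    def __init__(self, ratio):
        self.ratio = ratio   # responses required per reinforcer
        self.count = 0       # responses emitted since the last reinforcer

    def record_response(self):
        """Return True if this response is reinforced."""
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0   # ratio completed; the next run starts over
            return True
        return False

# On FR 5, every fifth response pays off.
fr5 = FixedRatio(5)
outcomes = [fr5.record_response() for _ in range(10)]
# outcomes: reinforcement on responses 5 and 10 only
```

The pause-and-run pattern is the animal's behavior, not the schedule's rule; the code only captures the contingency that generates it.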
22. Schedules of Reinforcement
- Response based: only x responses are required.
- Fixed ratio: the reinforcer is delivered contingent on a set number of responses (e.g., FR 5 vs. FR 15).
- Direct contingency: x responses must be emitted.
- The higher the FR, the higher the response rate (a side effect) and the longer the pause.
- Side effect: break-and-run pattern.
- Example: piece-rate pay.
23. FR schedules produce pause-and-run responding. Pauses occur just after reinforcement; then responding becomes steady. Pause length is a function of the ratio. In what way?
[Figure: cumulative responses vs. time]
24. Analysis of Reinforcement Schedules
- FR postreinforcement pause theories:
- Fatigue: a larger ratio produces a longer PRP as the subject "catches its breath." But why is there no PRP on VRs with large ratios?
- Satiation: right after consuming reinforcement, the subject is in a state of relative satiation. But PRPs occur after non-consumable reinforcers.
- Remaining responses: on a multiple schedule, two or more schedules alternate, each with its own SD and its own reinforcement. If a large ratio alternates with a shorter ratio, the PRP will be longest after the shorter ratio. A PRP might better be called a pre-ratio pause.
25. PRP in a Multiple Schedule
On a multiple schedule, two or more basic schedules alternate, each with its own SD and primary reinforcement: for example, blue key light, FR 50, SR; then red key light, FR 5, SR. On a MULT schedule such as this, the PRP is typically longer after the FR 5 than after the FR 50. A PRP is more a function of the upcoming ratio than of the ratio just completed.
26. Ratio Schedules
- Variable-ratio (VR) schedules are similar to FRs except that the number of responses required for reinforcement changes after each reinforcer is presented.
- The average number of responses is used to define the schedule.
- Ratio schedules produce a high rate of response.
- When VR and FR schedules are compared, responding is typically faster on variable ratio. One reason is that pausing after reinforcement (PRP) is reduced or eliminated when the ratio contingency is changed from fixed to variable. This provides evidence that the PRP does not occur because the animal is consuming the reinforcer.
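The fixed/variable difference can be made concrete with a small sketch. Drawing the next requirement uniformly around the mean is just one simple way to program a VR; real experiments use various ratio distributions, and the class and method names here are hypothetical.

```python
import random

# Sketch of a variable-ratio (VR) contingency: the response requirement
# changes after each reinforcer; the schedule is named by the mean (e.g., VR 30).
class VariableRatio:
    def __init__(self, mean_ratio, rng=None):
        self.mean = mean_ratio
        self.rng = rng if rng is not None else random.Random()
        self.required = self._next_requirement()
        self.count = 0

    def _next_requirement(self):
        # Uniform draw on 1..(2*mean - 1), so the average requirement is `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def record_response(self):
        """Return True if this response is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()  # new, unpredictable ratio
            return True
        return False

# Over many responses on VR 30, about one reinforcer per 30 responses is
# earned, but the moment of payoff is unpredictable, so pausing is not favored.
vr30 = VariableRatio(30, rng=random.Random(0))
earned = sum(vr30.record_response() for _ in range(100_000))
```

The only change from the FR sketch is that the requirement is redrawn after each reinforcer, which is exactly the feature the slide credits with eliminating the PRP.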
27. Ratio Schedules
- Many contingencies set by games of chance are similar to variable-ratio schedules.
- Gambling is often called addictive, but from a behavioral perspective it may be understood as persistent high-rate behavior generated by ratio contingencies of reinforcement.
- A bird on a standard VR schedule may make thousands of responses for a few brief presentations of grain.
28. Variable Ratio
- Variable ratio: a reinforcer is delivered after a varied and unpredictable number of responses.
- VR 30 names the average number of responses.
- Examples: casting a fishing line; (supposedly) gambling; random ratio.
- Direct contingency: a mean number of responses must occur.
- Side effect: constant, high rate of responding.
29. Ratio Schedules
- It is possible to set the average ratio requirement so high that an animal will spend all of its time working for a small amount of food.
- The animal will show a net energy loss, where effort expended exceeds caloric intake, similar to the self-defeating responding sometimes seen in gambling.
- The seemingly irrational behavior of gamblers is generated by an unfavorable probabilistic schedule of reinforcement.
30. Interval Schedules
- On fixed-interval (FI) schedules, an operant is reinforced after a fixed amount of time has passed.
- For example, on a fixed-interval 90-second schedule, one bar press after 90 seconds results in reinforcement.
- When organisms are exposed to interval contingencies, they typically produce many more responses than the schedule requires.
- Fixed-interval schedules produce a characteristic pattern of responding: a pause after reinforcement (PRP), then a few probe responses, followed by more and more rapid responding as the interval times out. This pattern of response is called scalloping.
31. Response-Plus-Time-Based Schedules
- The arrangement of the reinforcer specifies both time and responses.
- Fixed interval: the reinforcer is delivered for the first response that occurs after a specific period of time.
- FI 3 means the first response after three minutes elapse gets reinforced.
- Contingency: one response after 3 minutes.
- Side effect: scallop-shaped pattern, with little responding at the beginning of the interval and lots of responding at the end.
- Example: checking your watch while waiting for a bus.
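A minimal sketch of the FI contingency (hypothetical names, times in seconds): responses during the interval do nothing, and the first response after the interval elapses is reinforced and restarts the clock.

```python
# Sketch of a fixed-interval (FI) contingency.
class FixedInterval:
    def __init__(self, interval):
        self.interval = interval  # seconds that must elapse
        self.start = 0.0          # when the current interval began

    def record_response(self, now):
        """Return True if a response at time `now` (seconds) is reinforced."""
        if now - self.start >= self.interval:
            self.start = now      # reinforcer delivered; interval restarts
            return True
        return False

# FI 90 s: a press at 10 s does nothing; the first press past 90 s pays off.
fi90 = FixedInterval(90)
early = fi90.record_response(10.0)   # too early, not reinforced
late = fi90.record_response(95.0)    # first response after 90 s, reinforced
```

Note that the scallop is not in the rule: the schedule only requires one well-timed response, and the accelerating responding is what organisms actually do on it.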
32. FI schedules produce faster responding at the end of the interval. Scallops appear in between. Why?
[Figure: cumulative responses vs. time]
33. Interval Schedules
- Following considerable experience with FI 5 minutes, you may get very good at judging the time period.
- In this case, you would wait out the interval and then emit a burst of responses. Perhaps you decide to pace back and forth during the session, and you find that after 250 steps the interval has almost elapsed. This kind of mediating behavior may develop after experience with FI schedules.
- Other animals behave in a similar way and occasionally produce a break-and-run pattern of responding.
34. Generality of Schedule Effects
- Behavior analysts assume that research on schedule effects with animals also applies to humans.
- The assumption of generality implies that the effects of reinforcement extend across species, reinforcers, and behaviors.
- Humans show performances similar to those of rats when placed on FI schedules.
35. Generality of Schedule Effects
- The influence of language may explain why humans do not show the characteristic scalloping on FI schedules.
- Humans produce either a high rate of response or a low rate of response.
- People construct a verbal rule and behave according to the rule rather than the experimental contingencies.
- Humans who have not developed language skills respond more like rats, showing the characteristic effects of the schedule.
36. Interval Schedules
- On a variable-interval (VI) schedule, responses are reinforced after a variable amount of time has passed.
- For example, on a VI 30-second schedule, the time to each reinforcement changes, but the average time is 30 seconds.
- On this schedule the rate of response is steady and moderate. The pause after reinforcement that occurs on FI usually does not appear in the variable-interval record. Because the rate of response is moderate, VI performance is often used as a baseline for evaluating other independent variables.
- VI contingencies are common in everyday life.
37. Schedules of Reinforcement
- Generally, ratio schedules produce shorter IRTs and consequently higher rates of response than interval schedules.
- It is well established that the postreinforcement pause is a function of the interreinforcement interval (IRI).
38. Postreinforcement Pause
- PRP as a function of IRI:
- FI: about one-half the interval value.
- FR: the pause increases as the FR is increased; the rate of response also increases.
39. VR and VI Response Rates
- VR yoked to VI: rates are higher on VR schedules even when the rate of reinforcement is the same. Why?
41. VR and VI maintain steady rates of responding; FR and FI produce predictable postreinforcement pauses. Ratio schedules produce higher response rates than interval schedules. Why?
- Feedback function: ratio schedules selectively reinforce high response rates, because an increased response rate means more reinforcement. The faster the subject responds, the sooner the next reinforcement is obtained.
- Interval schedules preferentially reinforce long interresponse times (IRTs), because the longer you pause, the more likely the first response after the pause will be reinforced. Rate of responding is irrelevant to obtaining the next reinforcement any sooner.
42. Variable schedules produce more consistent responding (no pauses or scallops) than fixed schedules. Why? VR schedules produce faster response rates than VI. Why?
43. Rates of Response
44. Schedules and IRTs
FRs and VRs differentially reinforce short IRTs: the shorter the time between consecutive responses, the more frequently and rapidly reinforcement is obtained. Ratio schedules reinforce bursts of responding, like a machine gun. Interval schedules, whether fixed or variable, tend to reinforce longer IRTs, such as respond-wait-respond-wait, since responding faster will not produce more frequent or rapid reinforcement. On such schedules, only a single response is required, as long as that response occurs after the interval has elapsed. As a result, a paced response rate with delays between consecutive responses is likely to produce reinforcement. These are molecular accounts of response rate.
45. Rate of Response on Schedules
- Dynamic interactions between:
- Molecular aspects: moment-to-moment relationships
- Molar aspects: the length of the session
46. Molecular Account of Rate of Response
- The time between any two responses, called the interresponse time (IRT), may be treated as an operant.
- Generally, ratio schedules produce shorter IRTs and consequently higher rates of response than interval schedules.
- To understand this, consider the definition of an operant class: a class of behavior that may either increase or decrease in frequency on the basis of contingencies of reinforcement. In other words, if it could be shown that the time between responses changes as a function of selective reinforcement, then the IRT is by definition an operant in its own right.
47. Molecular Account of Rate of Response
- Several experiments have shown that the distribution of IRTs may in fact be changed by selectively reinforcing interresponse times of a particular duration.
- Compared to ratio schedules, interval contingencies generate longer IRTs and consequently a lower rate of response.
- Interval schedules pay off after some amount of time has passed and a response is made. As IRTs become longer, more and more of the time requirement on the schedule elapses. This means that the probability of reinforcement for a response increases with longer IRTs.
48. Molar Accounts of Rate Differences
- There are several problems with the IRT account of rate differences on ratio and interval schedules.
- A logical objection is that showing that the reinforcement of IRTs can change behavior does not mean that this is what happens on other schedules. In other words, demonstrating that IRTs can be selectively reinforced does not prove that this occurs on either interval or ratio schedules.
49. Molar Accounts of Rate Differences
- Molar explanations of rate differences are concerned with the global relationship between responses and reinforcement.
- In general terms, the correlation between responses and reinforcement produces the difference in rate on interval and ratio schedules.
- If a high rate of response is associated with a higher frequency of reinforcement, then subjects will respond rapidly. When an increased rate of response does not affect the rate of reinforcement, organisms do not respond faster.
50. Molar Accounts of Response Rate
Consider a subject responding on a VR 100 schedule for a 50-minute session. If the subject responds leisurely at 0.8 responses per second (48 responses per minute), each VR 100 takes about 2 minutes, and about 24 reinforcers are obtained in the 50 minutes. If, on the same VR 100, the subject responds at 2 responses per second (120 responses per minute, or about 50 seconds per VR 100), many more reinforcers, about 60, are obtained in the same 50 minutes. Here, increases in response rate are correlated with more frequent reinforcement. On interval schedules, increases in response rate are not correlated with increases in reinforcement rate. On this account, ratio schedules produce higher response rates not because of timing on the order of IRT length but because of timing over entire sessions.
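The session arithmetic above reduces to one line: reinforcers earned on a ratio schedule are total responses divided by the mean requirement. This sketch ignores time spent consuming reinforcers, and the function name is hypothetical.

```python
# Reinforcers earned in a session on a ratio schedule (simplified:
# consumption time is ignored).
def reinforcers_per_session(resp_per_sec, mean_ratio, session_min):
    responses = resp_per_sec * 60 * session_min  # total responses emitted
    return responses / mean_ratio                # one reinforcer per mean_ratio

slow = reinforcers_per_session(0.8, 100, 50)  # leisurely: about 24 reinforcers
fast = reinforcers_per_session(2.0, 100, 50)  # rapid: about 60 reinforcers
```

On an interval schedule, the same increase in response rate would leave the reinforcer count essentially unchanged, which is the molar point.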
51. Molar Accounts of Rate Differences
- According to supporters of the molar view, this correlation between an increasing rate of response and an increased frequency of reinforcement is responsible for rapid responding on ratio schedules.
- A different correlation between the rate of response and the frequency of reinforcement is set up on interval schedules.
52. The VR-VI Difference
- IRT reinforcement: IRTs are conditionable.
- Synthetic schedules: merely by making reinforcement contingent upon IRTs of specific lengths, a VI-like performance can be obtained.
- Response-reinforcer correlation.
53. Analysis of Reinforcement Schedules
- FR postreinforcement pause theories:
- Fatigue: a larger ratio produces a longer PRP as the subject "catches its breath." But why is there no PRP on VRs with large ratios?
- Satiation: right after consuming reinforcement, the subject is in a state of relative satiation. But PRPs occur after non-consumable reinforcers.
- Remaining responses: on a multiple schedule, two or more schedules alternate, each with its own SD and its own reinforcement. If a large ratio alternates with a shorter ratio, the PRP will be longest after the shorter ratio. A PRP might better be called a pre-ratio pause.
54. Postreinforcement Pause on Fixed Schedules
- Molecular accounts of pausing are concerned with the moment-to-moment relationships that immediately precede reinforcement. Such accounts address the relationship between the number of bar presses that produce reinforcement and the subsequent postreinforcement pause.
- In contrast, molar accounts of pausing focus on the overall rate of reinforcement for a session and the average pause length.
55. Postreinforcement Pause on Fixed Schedules
- It is well established that the postreinforcement pause is a function of the interreinforcement interval (IRI).
- As the time between reinforcements becomes longer, the PRP increases.
56. Postreinforcement Pause on Fixed Schedules
- On fixed-interval schedules, in which the time between reinforcements is controlled by the experimenter, the postreinforcement pause is approximately one-half the interreinforcement interval.
- For example, on an FI 300-s schedule (in which the time between reinforcements is 300 s), the average PRP will be about 150 s.
- On fixed ratio, the evidence suggests similar control by the IRI: as the ratio requirement increases, the PRP becomes longer.
57. Postreinforcement Pause on Fixed Schedules
- There is, however, a difficulty with analyzing the postreinforcement pause on FR schedules.
- On ratio schedules, the time between reinforcements is partly determined by what the animal does. That is, the animal's rate of pressing the lever affects the time between reinforcements.
- Another problem with ratio schedules, for an analysis of pausing, is that the rate of response goes up as the size of the ratio is increased. Unless the rate of response exactly coincides with changes in the size of the ratio, adjustments in ratio size alter the interreinforcement interval.
- Thus, changes in the postreinforcement pause as ratio size is increased may be caused by the ratio size, the interreinforcement interval, or both.
58. A Molar Interpretation of Pausing
- We have noted that the average PRP is one-half of the interreinforcement interval.
- Another finding is that postreinforcement pauses are normally distributed over the time between reinforcements.
- An animal sensitive to the overall rate of reinforcement (maximization) should come to emit pauses that average one-half of the FI interval, assuming a normal distribution. Thus, maximization of reinforcement provides a molar account of the postreinforcement pause.
59. Molecular Interpretations of Pausing
- Two molecular interpretations of pausing on fixed schedules have some research support.
- One account is based on the observation that animals often emit other behavior during the postreinforcement pause.
- For example, rats may engage in grooming, sniffing, scratching, and stretching after the presentation of a food pellet.
- Because this behavior reliably follows reinforcement, it is said to be induced by the schedule. Schedule-induced behaviors may be viewed as operants that automatically produce reinforcement.
60. Molecular Interpretations of Pausing
- One interpretation is that pausing occurs because the animal is maximizing local rates of reinforcement.
- That is, the rat gets food for bar pressing as well as the automatic reinforcement from the induced activities.
- The average pause should therefore reflect the allocation of time to induced behavior and to the operant that produces scheduled reinforcement (e.g., food).
- At present, experiments have neither ruled out nor clearly demonstrated the induced-behavior interpretation of pausing.
61. Molecular Interpretations of Pausing
- A second molecular account of pausing is based on the run of responses, or the amount of work, that precedes reinforcement.
- This work-time interpretation holds that the previously experienced run of responses regulates the length of the postreinforcement pause.
- Work time affects the PRP by altering the value of the next scheduled reinforcement.
- In other words, the more effort or time expended for the previous reinforcer, the lower the value of the next reinforcer and the longer it takes for the animal to initiate responding (i.e., the longer the pause).
62. Factors Affecting Performance on a Schedule
- Quality of reinforcer
- Rate of reinforcement
- Delay of reinforcement
- Amount of reinforcement
- Level of motivation
63. Schedule Performance in Transition
- Steady state occurs when behavior shows little change from day to day (e.g., break-and-run behavior).
- Transition states are the periods between steady states (e.g., when an organism is initially placed on any reinforcement schedule).
- Most learning takes place when behavior is in transition.
64. Schedule Performance in Transition
- After steady-state performance is established on CRF, you are faced with the problem of how to program the steps from CRF to FR 100.
- Notice that there is a large shift in the rate of reinforcement for bar pressing.
- If you simply move from CRF to a large ratio value, the animal will show ratio strain, in the sense that it produces longer and longer pauses after reinforcement.
65. Schedule Performance in Transition
- Large and sudden increases in schedule requirements may produce extinction, which is why a slow progression to a higher schedule is implemented.
- Transitions in schedules occur in major life events, such as divorce.
- Following a divorce, a shift in the contingencies of reinforcement takes place.
- Feelings of depression and loneliness may be produced by ratio strain and extinction.
66. Schedules Used to Condition Response Rate
1) Differential reinforcement of low rates of responding (DRL): reinforcers follow only responses that occur after a minimum amount of time has elapsed between two consecutive responses. Example, DRL 10 s: Response -> 10-s delay -> Response -> reinforcement. If a response occurs during the delay, the reinforcer is not given. DRLs reinforce very long IRTs.
67. 2) Differential reinforcement of high rates of responding (DRH): reinforcers follow only responses that occur before a set amount of time has passed. Example, DRH 10 s: Response -> 10-s delay -> Response -> no reinforcement. Two consecutive responses must occur within 10 s to be reinforced; the IRT between two consecutive responses must be less than some specified value. A DRH reinforces very short IRTs.
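Both rate-shaping contingencies reduce to comparing the current IRT against a criterion. A minimal sketch, with hypothetical function names and times in seconds:

```python
# DRL t: reinforce a response only if its IRT exceeds t (selects low rates).
def drl_reinforced(irt, t):
    return irt > t

# DRH t: reinforce a response only if its IRT is shorter than t (selects high rates).
def drh_reinforced(irt, t):
    return irt < t

# DRL 10 s: a response 12 s after the previous one is reinforced; 5 s is not.
# DRH 10 s: the opposite holds.
```

The symmetry makes the point of the two slides: the same IRT that pays off on one schedule is precisely the one that fails on the other.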
68. Sanford (IQ 65), a prisoner in the GA prison system. The incentive value of the reinforcer (points) was higher if Sanford learned faster: 1 grade level in 90 days (DRH 90) -> 120 pts; (DRH 4) -> 900 pts; (DRH 1) -> 4,700 pts. With points exchangeable for tangible privileges and goods, studying became so reinforcing that Sanford started skipping recreation time to study. Sanford completed 5 years of high school in 5 months! He was being differentially reinforced for learning fast (a high rate).
69. Intelligence and IQ
Some students may question whether the subject Sanford actually had an IQ of 65. What is the IQ test measuring: intelligence, or the subject's ability to do well on such tests? Cognitive psychologists would of course argue that some partially innate intellectual or information-processing capability is being assessed; hence the notorious bell-curve data showing different races with different IQs. But what if the IQ test is just measuring the ability to take such tests, which could be affected by variables such as motivation to work hard? African American students who took IQ tests and were given affirmative feedback for each correct answer produced IQ scores 10 to 15 points higher than African Americans tested without this feedback. The feedback had no real effect on white students taking IQ tests.
70. Applying Schedules to Smoking
- The use of drugs is operant behavior maintained by the reinforcing effects of the drug.
- A population of smokers (N = 60) was assigned to one of three groups: progressive reinforcement (n = 20), fixed-rate reinforcement (n = 20), and control (n = 20). Carbon monoxide (CO) testing detected abstinence from smoking.
- Money was the reinforcer in the experiment, used to implement response cost.
71. Applying Schedules to Smoking
- The progressive reinforcement group was given $3.00 for passing the initial carbon monoxide test and an additional $0.50 for the second and third consecutive passes. On the third consecutive pass, a bonus of $10 was given. This schedule was then repeated. Failing a CO test was not reinforced, and the payment was reset to $3.00.
72. Applying Schedules to Smoking
- The fixed reinforcement group was paid $9.80 for passing each test, with no bonuses and no resets for failing.
- The control group was paid the same as the average payment to the progressive reinforcement group, regardless of their CO levels.
73. Applying Schedules to Smoking
- Smokers in both experimental groups passed 80% of the CO tests, while the control group passed 40% of the tests.
- 22% of the progressive group resumed smoking, compared with 60% of the fixed group and 82% of the control group.
- The progressive reinforcement schedule appears effective for short-run abstinence from smoking. Further research is needed to determine whether this schedule is effective for long-run abstinence.