Title: Dynamics of Reward Bias Effects in Perceptual Decision Making
Slide 1: Dynamics of Reward Bias Effects in Perceptual Decision Making
- Jay McClelland and Juan Gao
- Building on work by Newsome and Rorie, Holmes and Feng, and Usher and McClelland
Slide 2: Our Questions
- Can we trace the effect of reward bias on decision making over time?
- Can we determine what the optimal reward effect would be?
- Can we determine how well participants do at achieving optimality?
- Can we uncover the processing mechanisms that lead to the observed patterns of behavior?
Slide 3: Overview
- Experiment
- Results
- Optimality analysis
- Abstract (one-d) dynamical model
- Mechanistic (two-d) dynamical model
Slide 4: Human Experiment Examining Reward Bias Effect at Different Time Points after Target Onset
[Figure: trial timeline - reward cue (750 msec before stimulus), stimulus, response signal, and response window (0-2000 msec)]
- Reward cue occurs 750 msec before stimulus.
- Small arrowhead visible for 250 msec.
- Only biased reward conditions (2 vs. 1 and 1 vs. 2) are considered.
- Stimuli are rectangles shifted 1, 3, or 5 pixels left or right of fixation.
- Response signal (tone) occurs at 10 different lags: 0, 75, 150, 225, 300, 450, 600, 900, 1200, or 2000 msec.
- Participant receives the reward if the response occurs within 250 msec of the response signal and is correct.
- Participants were run for 15-25 sessions to provide stable data.
- Data are from later sessions, in which the effect of reward appeared to be fairly stable.
Slide 5: Slide from Zhang 2007
Slide 6: A Participant with Very Little Reward Bias
- Top panel shows the probability of the response giving the larger reward as a function of actual response time, for combinations of stimulus shift (1, 3, or 5 pixels) and reward-stimulus compatibility.
- Lower panel shows the data transformed to z scores, and corresponds to the theoretical construct [mean(x1(t) - x2(t)) + bias(t)] / sd(x1(t) - x2(t)), where x1 represents the state of the accumulator associated with the greater reward, x2 the same for the lesser reward, and the participant is thought to choose the larger reward if x1(t) - x2(t) + bias(t) > 0.
Slide 7: Participants Showing Reward Bias
Slide 8: (no transcript; image-only slide)
Slide 9: Summary
- Initial bias is high, and tapers off over time to a fixed low level.
- Questions:
  - Is this reasonable?
  - How close to optimal is it?
  - Are some subjects more optimal than others?
Slide 10: Abstract Optimality Analysis
Slide 11: Assumptions (from Signal Detection Theory)
[Figure: two overlapping Gaussian distributions of the decision variable, with means -m and +m]
- At a given time, the decision variable x comes from one of two distributions with means -m and +m and the same SD s; the -m distribution is consistent with the high-reward alternative.
- Choose the high-reward alternative (H) if x < Xc; otherwise choose the low-reward alternative (L).
- For three difficulty levels, the means are mi (i = 1, 2, 3), with shared s and the same choice policy.
Slide 12: Optimal Bias
Premises:
- Expected Reward_c = Likelihood_c × Reward_c
- Choose the alternative with the larger Expected Reward_c.
- (This policy maximizes expected reward overall.)
Result:
- X_opt / s = log(Reward_H / Reward_L) / d'
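The result above can be checked numerically. Below is a minimal Python sketch, not from the talk, with illustrative parameter values: placing the high-reward distribution at mean -m, using the rule "choose H if x < Xc" and d' = 2m/s, a grid search over criteria recovers X_opt/s = log(Reward_H/Reward_L)/d'.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_reward(c, m, s, r_h, r_l):
    """Expected reward for criterion c, with equal priors:
    H trials have mean -m, L trials mean +m; choose H if x < c."""
    p_h = phi((c + m) / s)        # P(correct | H) = P(x < c | H)
    p_l = 1.0 - phi((c - m) / s)  # P(correct | L) = P(x > c | L)
    return 0.5 * (p_h * r_h + p_l * r_l)

# Illustrative values: unit SD, unit half-separation, 2:1 reward ratio
m, s, r_h, r_l = 1.0, 1.0, 2.0, 1.0
d_prime = 2.0 * m / s

# Closed form from the slide: X_opt / s = log(R_H / R_L) / d'
x_opt = s * math.log(r_h / r_l) / d_prime

# Numerical check: maximize expected reward over a grid of criteria
grid = [i * 0.001 - 3.0 for i in range(6001)]
best = max(grid, key=lambda c: expected_reward(c, m, s, r_h, r_l))
```

The grid maximizer agrees with the closed form to within the grid spacing, confirming that this criterion maximizes expected reward under the stated assumptions.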
Slide 13: As d' Increases, X_opt Decreases
[Figure: optimal criterion X_opt as a function of d', starting from d' = 0; X_opt falls as d' increases]
Slide 14: Estimating Normalized m and Xc Values from Data at Each Signal Lag, One Difficulty Level
- Values are expressed as standard normal deviates (s = 1).
Slide 15: Estimating Normalized m and Xc Values from Data at Each Signal Lag with Multiple Difficulty Levels
Slide 16: Optimal Bias with Multiple Difficulties
[Figure: optimal criterion vs. actual criterion]
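With multiple difficulty levels sharing a single criterion, the optimal bias can be found numerically. A minimal Python sketch under assumed values (the per-difficulty means, unit SD, and 2:1 reward ratio are illustrative, not the study's fitted values):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_reward(c, means, r_h, r_l):
    """Expected reward for one shared criterion c across difficulty
    levels (equal-probability mixture; choose H if x < c; sigma = 1)."""
    total = 0.0
    for m in means:
        total += 0.5 * (phi(c + m) * r_h + (1.0 - phi(c - m)) * r_l)
    return total / len(means)

means = [0.5, 1.5, 2.5]   # assumed per-difficulty half-separations
r_h, r_l = 2.0, 1.0

# Per-difficulty optima from the single-difficulty result: c_i = log(R_H/R_L) / d'_i
per_level = [math.log(r_h / r_l) / (2.0 * m) for m in means]

# Shared optimum, found by grid search over the mixture's expected reward
grid = [i * 0.001 - 2.0 for i in range(4001)]
c_shared = max(grid, key=lambda c: expected_reward(c, means, r_h, r_l))
```

The shared optimum lies between the per-difficulty optima: a single criterion must compromise between the large bias appropriate for hard trials and the small bias appropriate for easy ones.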
Slide 17: Empirical Characterization of the Time-course of Change in Sensitivity (d')
- The subject's sensitivity d' is defined as in the theory of signal detectability.
- d' is estimated at each response-signal delay.
- For each subject, the time-course of d' is fit with a function.
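The fitting function itself is not preserved in this transcript; a common choice in response-signal studies, assumed here, is an exponential approach to asymptote, d'(t) = A(1 - exp(-(t - t0)/tau)) for t > t0. A sketch of such a fit on synthetic data, using a coarse grid search (a real fit would use a proper optimizer such as scipy.optimize.curve_fit):

```python
import math

def d_prime(t, A, tau, t0):
    """Assumed exponential-approach form for sensitivity over time."""
    return A * (1.0 - math.exp(-(t - t0) / tau)) if t > t0 else 0.0

# Synthetic d' values at a subset of the experiment's lags (seconds)
lags = [0.075, 0.15, 0.225, 0.3, 0.45, 0.6, 0.9, 1.2, 2.0]
true = (2.5, 0.3, 0.05)
data = [d_prime(t, *true) for t in lags]

# Coarse grid-search least squares over (A, tau, t0)
best, best_err = None, float("inf")
for A in [1.5 + 0.1 * i for i in range(21)]:
    for tau in [0.1 + 0.05 * j for j in range(11)]:
        for t0 in [0.0, 0.05, 0.1]:
            err = sum((d_prime(t, A, tau, t0) - y) ** 2
                      for t, y in zip(lags, data))
            if err < best_err:
                best, best_err = (A, tau, t0), err
```

On noiseless synthetic data the grid search recovers the generating parameters; with real data, the fitted asymptote A and time constant tau summarize each subject's sensitivity growth.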
Slide 18: Subject Sensitivity
Slide 19: (no transcript; image-only slide)
Slide 20: Actual vs. Optimal Bias for Three Subjects
- Except for participant sl, all participants start with a high bias, then level off, conforming approximately to the optimal.
- All participants are under-biased at short lags.
- At longer lags, some are under-biased, some are over-biased, and some are optimal.
Slide 21: Our Questions
- Can we trace the effect of reward bias on decision making over time?
- Can we determine what the optimal reward effect would be?
- Can we determine how well participants do at achieving optimality?
- Can we uncover the processing mechanisms that lead to the observed patterns of behavior?
Slide 22: Two Paths
- Qualitative analysis with a one-dimensional decision variable (following Holmes and Feng), asking how reward bias should be represented. Possible answers:
  - An offset in the initial conditions?
  - An additional term in the input to the decision variable?
  - A time-varying offset that optimizes reward?
  - A fixed offset in the value of the decision variable?
- The Inverse-Micro-Speed-Accuracy Tradeoff (discovered by Juan)
- Steps toward a Leaky Competing Accumulator model that addresses this and other aspects of the data.
Slide 23: Qualitative Dynamical Analysis
- Based on a one-dimensional leaky integrator model.
- Input I = a*C, where C is chosen from {-5, -3, -1, 1, 3, 5}.
- Initial condition: x = 0.
- Choose left if x > 0 when the response signal is detected; otherwise choose right.
- Accuracy approximates an exponential approach to asymptote because of the leakage.
- How is the reward implemented?
  - An offset in the initial conditions?
  - An additional term in the input to the decision variable?
  - A time-varying offset that optimizes reward?
  - A fixed offset in the value of the decision variable?
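The one-dimensional leaky integrator described above is easy to simulate directly. A minimal Euler-Maruyama sketch (the talk does not give numerical values, so a, the leak, the noise level, and the signal time here are illustrative):

```python
import math
import random

def leaky_trial(C, a=1.0, leak=5.0, noise=1.0, t_signal=0.5, dt=0.001):
    """One trial of the leaky integrator dx = (-leak*x + I) dt + noise*dW,
    with I = a*C and x(0) = 0. Returns True if the model chooses 'left'
    (x > 0) when the response signal arrives."""
    x = 0.0
    I = a * C
    for _ in range(int(t_signal / dt)):
        x += (-leak * x + I) * dt + noise * math.sqrt(dt) * random.gauss(0.0, 1.0)
    return x > 0

random.seed(0)
n = 2000
# Accuracy for positive-C (leftward) stimuli at the three difficulty levels
acc = {C: sum(leaky_trial(C) for _ in range(n)) / n for C in (1, 3, 5)}
```

Because the leak pulls x toward the asymptote I/leak with a fixed time constant, accuracy at the signal approaches its asymptotic level exponentially, and easier stimuli (larger |C|) yield higher accuracy.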
Slide 24: Offset in Initial Conditions
- Note: the effect of the bias decays away as t increases.
Slide 25: Reward as a Term in the Input
- The reward signal arrives 750 msec before the stimulus; processing starts at that time.
- For t < 0, input = b.
- For t > 0, input = b + a*C.
- Notes:
  - The effect of the bias persists.
  - But the bias is sub-optimal initially, and there is no dip.
  - An initially high bias and a dip occur if s starts low and increases at stimulus onset.
Slide 26: Time-Varying Term that Optimizes Rewards (No Free Parameter for Reward Bias)
- The expression for b(t) is for a single difficulty level.
- The bias is equivalent to a time-varying criterion -b(t).
- There is a dip at a predictable point in time.
- No analytic expression is available for multiple difficulty levels, but numerical simulation is possible.
[Figure: probability of choice toward the larger reward vs. time (s), for RSC = 1 and RSC = 0 at stimulus differences 5, 3, and 1]
Slide 27: Reward as a Constant Offset in the Decision Variable
- Notes:
  - Equivalent to setting a fixed criterion.
  - The bias effect persists for leak λ < 0.
  - With a single C level, there is a dip at a predictable point in time.
  - Prediction and test: a higher C level leads to an earlier dip.
  - Variability in the starting point or the magnitude of the offset can pull the initial bias off ceiling.
Slide 28: Preliminary Conclusion
- A qualitative fit seems possible with the one-d model if:
  - we treat reward as a constant offset in the decision variable, and
  - the value of the constant varies from trial to trial, or
  - there is added starting-point variability.
- Next step: actually try to fit individual subject data.
Slide 29: A New Phenomenon Discovered by Juan: the Inverse-Micro-SAT! (also occurs in monkey data)
Slide 30: Consistent with Other Models?
- Ratcliff and colleagues, and also Shadlen and colleagues, argue for integration to a bound, even in response-signal tasks like this one.
- Once the bound is reached, the participant enters a discrete decision state.
- Our data suggest that the decision variable remains continuous even to the end of the trial.
- Time to respond reflects this continuous state.
Slide 31: High-Threshold Leaky Competing Accumulator Model
- The decision variable remains continuous until the response signal occurs.
- The signal provides additional input to the accumulators, driving them to a high threshold, which triggers the response.
[Figure: accumulators x1 and x2 rising over time; the response signal drives the leader to threshold, triggering the response]
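The mechanism on this slide can be illustrated with a small simulation. This is a sketch under assumed parameters (the inputs rho1 and rho2, leak, inhibition beta, noise, signal boost, and threshold are illustrative, not fitted to the data): the accumulators evolve continuously until the response signal, whose extra input drives the leading accumulator to a high threshold.

```python
import math
import random

def lca_trial(rho1=1.2, rho2=1.0, leak=1.0, beta=1.0, noise=0.3,
              t_signal=1.0, boost=8.0, theta=3.0, dt=0.001, t_max=3.0):
    """Sketch of a high-threshold leaky competing accumulator trial.
    x1 and x2 evolve under leak, mutual inhibition (beta), and stimulus
    input; the response signal at t_signal adds a common boost that
    drives the leader to the high threshold theta.
    Returns (choice, response_time); choice 0 means no response."""
    x1 = x2 = 0.0
    t = 0.0
    while t < t_max:
        extra = boost if t >= t_signal else 0.0
        dx1 = (rho1 + extra - leak * x1 - beta * x2) * dt \
            + noise * math.sqrt(dt) * random.gauss(0.0, 1.0)
        dx2 = (rho2 + extra - leak * x2 - beta * x1) * dt \
            + noise * math.sqrt(dt) * random.gauss(0.0, 1.0)
        x1 = max(0.0, x1 + dx1)   # activations stay non-negative, as in the LCA
        x2 = max(0.0, x2 + dx2)
        t += dt
        if x1 >= theta or x2 >= theta:
            return (1 if x1 >= x2 else 2), t
    return 0, t_max

random.seed(1)
results = [lca_trial() for _ in range(500)]
wins1 = sum(1 for c, _ in results if c == 1)
wins2 = sum(1 for c, _ in results if c == 2)
```

Because the decision variable is still continuous when the signal arrives, both the choice and the time to reach threshold reflect the accumulators' state at that moment, consistent with the point on slide 30.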
Slide 32: Preliminary Simulations
Slide 33: Reward, Stimulus, and Response Cue All Contribute Input to the Accumulators
Slide 34: Three Possible Architectures
[Figure: diagrams of the three candidate architectures, labeled 1, 2, and 3]