Title: Dynamics of Reward Bias Effects in Perceptual Decision Making
Slide 1: Dynamics of Reward Bias Effects in Perceptual Decision Making
- Jay McClelland and Juan Gao
- Building on work by Newsome and Rorie, Holmes and Feng, and Usher and McClelland
Slide 2: Our Questions
- Can we trace the effect of reward bias on decision making over time?
- Can we determine what the optimal reward effect would be?
- Can we determine how well participants do at achieving optimality?
- Can we uncover the processing mechanisms that lead to the observed patterns of behavior?
Slide 3: Overview
- Experiment
- Results
- Optimality analysis
- Abstract (one-d) dynamical model
- Mechanistic (two-d) dynamical model
Slide 4: Human Experiment Examining Reward Bias Effect at Different Time Points after Target Onset
[Figure: trial timeline - reward cue (750 msec before stimulus), stimulus, response signal, and response window (0-2000 msec)]
- Reward cue occurs 750 msec before stimulus.
- Small arrowhead visible for 250 msec.
- Only biased reward conditions (2 vs. 1 and 1 vs. 2) are considered.
- Stimuli are rectangles shifted 1, 3, or 5 pixels left or right of fixation.
- Response signal (tone) occurs at 10 different lags: 0, 75, 150, 225, 300, 450, 600, 900, 1200, or 2000 msec.
- Participant receives the reward if the response occurs within 250 msec of the response signal and is correct.
- Participants were run for 15-25 sessions to provide stable data.
- Data are from later sessions, in which the effect of reward appeared to be fairly stable.
Slide 5: Slide from Zhang 2007
Slide 6: A Participant with Very Little Reward Bias
- Top panel shows the probability of the response giving the larger reward as a function of actual response time, for combinations of stimulus shift (1, 3, or 5 pixels) and reward-stimulus compatibility.
- Lower panel shows the data transformed to z scores, and corresponds to the theoretical construct [mean(x1(t) - x2(t)) + bias(t)] / sd(x1(t) - x2(t)), where x1 represents the state of the accumulator associated with the greater reward, x2 the same for the lesser reward, and the participant is thought to choose the larger reward if x1(t) - x2(t) + bias(t) > 0.
Slide 7: Participants Showing Reward Bias
Slide 8: (no transcript; image-only slide)
Slide 9: Summary
- Initial bias is high, and tapers off over time to a fixed low level.
- Questions:
  - Is this reasonable?
  - How close to optimal is it?
  - Are some subjects more optimal than others?
Slide 10: Abstract Optimality Analysis
Slide 11: Assumptions (from Signal Detection Theory)
[Figure: two overlapping Gaussian distributions of the decision variable, with means -m and +m]
- At a given time, the decision variable x comes from one of two distributions with means -m and +m and the same SD s; the -m distribution is consistent with the high-reward alternative.
- Choose the high-reward alternative (H) if x < Xc; otherwise choose the low-reward alternative (L).
- For three difficulty levels, the means are mi (i = 1, 2, 3), with shared s and the same choice policy.
Slide 12: Optimal Bias
Premises:
- Expected Reward_c = Likelihood_c × Reward_c
- Choose the alternative with the larger Expected Reward_c.
- (This policy maximizes expected reward overall.)
Result:
- X_opt / s = log(Reward_H / Reward_L) / d'
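The result above can be checked numerically. Below is a minimal Python sketch, not from the talk, with illustrative parameter values: placing the high-reward distribution at mean -m, using the rule "choose H if x < Xc" and d' = 2m/s, a grid search over criteria recovers X_opt/s = log(Reward_H/Reward_L)/d'.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_reward(c, m, s, r_h, r_l):
    """Expected reward for criterion c, with equal priors:
    H trials have mean -m, L trials mean +m; choose H if x < c."""
    p_h = phi((c + m) / s)        # P(correct | H) = P(x < c | H)
    p_l = 1.0 - phi((c - m) / s)  # P(correct | L) = P(x > c | L)
    return 0.5 * (p_h * r_h + p_l * r_l)

# Illustrative values: unit SD, unit half-separation, 2:1 reward ratio
m, s, r_h, r_l = 1.0, 1.0, 2.0, 1.0
d_prime = 2.0 * m / s

# Closed form from the slide: X_opt / s = log(R_H / R_L) / d'
x_opt = s * math.log(r_h / r_l) / d_prime

# Numerical check: maximize expected reward over a grid of criteria
grid = [i * 0.001 - 3.0 for i in range(6001)]
best = max(grid, key=lambda c: expected_reward(c, m, s, r_h, r_l))
```

The grid maximizer agrees with the closed form to within the grid spacing, confirming that this criterion maximizes expected reward under the stated assumptions.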
Slide 13: As d' Increases, X_opt Decreases
[Figure: optimal criterion X_opt as a function of d', starting from d' = 0; X_opt falls as d' increases]
Slide 14: Estimating Normalized m and Xc Values from Data at Each Signal Lag, One Difficulty Level
- Values are expressed as standard normal deviates (s = 1).
Slide 15: Estimating Normalized m and Xc Values from Data at Each Signal Lag with Multiple Difficulty Levels
Slide 16: Optimal Bias with Multiple Difficulties
[Figure: optimal criterion vs. actual criterion]
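With multiple difficulty levels sharing a single criterion, the optimal bias can be found numerically. A minimal Python sketch under assumed values (the per-difficulty means, unit SD, and 2:1 reward ratio are illustrative, not the study's fitted values):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_reward(c, means, r_h, r_l):
    """Expected reward for one shared criterion c across difficulty
    levels (equal-probability mixture; choose H if x < c; sigma = 1)."""
    total = 0.0
    for m in means:
        total += 0.5 * (phi(c + m) * r_h + (1.0 - phi(c - m)) * r_l)
    return total / len(means)

means = [0.5, 1.5, 2.5]   # assumed per-difficulty half-separations
r_h, r_l = 2.0, 1.0

# Per-difficulty optima from the single-difficulty result: c_i = log(R_H/R_L) / d'_i
per_level = [math.log(r_h / r_l) / (2.0 * m) for m in means]

# Shared optimum, found by grid search over the mixture's expected reward
grid = [i * 0.001 - 2.0 for i in range(4001)]
c_shared = max(grid, key=lambda c: expected_reward(c, means, r_h, r_l))
```

The shared optimum lies between the per-difficulty optima: a single criterion must compromise between the large bias appropriate for hard trials and the small bias appropriate for easy ones.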
Slide 17: Empirical Characterization of the Time-course of Change in Sensitivity (d')
- The subject's sensitivity d' is defined as in the theory of signal detectability.
- d' is estimated at each response-signal delay.
- For each subject, the time-course of d' is fit with a function.
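The fitting function itself is not preserved in this transcript; a common choice in response-signal studies, assumed here, is an exponential approach to asymptote, d'(t) = A(1 - exp(-(t - t0)/tau)) for t > t0. A sketch of such a fit on synthetic data, using a coarse grid search (a real fit would use a proper optimizer such as scipy.optimize.curve_fit):

```python
import math

def d_prime(t, A, tau, t0):
    """Assumed exponential-approach form for sensitivity over time."""
    return A * (1.0 - math.exp(-(t - t0) / tau)) if t > t0 else 0.0

# Synthetic d' values at a subset of the experiment's lags (seconds)
lags = [0.075, 0.15, 0.225, 0.3, 0.45, 0.6, 0.9, 1.2, 2.0]
true = (2.5, 0.3, 0.05)
data = [d_prime(t, *true) for t in lags]

# Coarse grid-search least squares over (A, tau, t0)
best, best_err = None, float("inf")
for A in [1.5 + 0.1 * i for i in range(21)]:
    for tau in [0.1 + 0.05 * j for j in range(11)]:
        for t0 in [0.0, 0.05, 0.1]:
            err = sum((d_prime(t, A, tau, t0) - y) ** 2
                      for t, y in zip(lags, data))
            if err < best_err:
                best, best_err = (A, tau, t0), err
```

On noiseless synthetic data the grid search recovers the generating parameters; with real data, the fitted asymptote A and time constant tau summarize each subject's sensitivity growth.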
Slide 18: Subject Sensitivity
Slide 19: (no transcript; image-only slide)
Slide 20: Actual vs. Optimal Bias for Three Subjects
- Except for participant sl, all participants start with a high bias, then level off, conforming approximately to the optimal.
- All participants are under-biased at short lags.
- At longer lags, some are under-biased, some are over-biased, and some are optimal.
Slide 21: Our Questions
- Can we trace the effect of reward bias on decision making over time?
- Can we determine what the optimal reward effect would be?
- Can we determine how well participants do at achieving optimality?
- Can we uncover the processing mechanisms that lead to the observed patterns of behavior?
Slide 22: Two Paths
- Qualitative analysis with a one-dimensional decision variable (following Holmes and Feng), asking how reward bias should be represented. Possible answers:
  - An offset in the initial conditions?
  - An additional term in the input to the decision variable?
  - A time-varying offset that optimizes reward?
  - A fixed offset in the value of the decision variable?
- The Inverse-Micro-Speed-Accuracy Tradeoff (discovered by Juan)
- Steps toward a Leaky Competing Accumulator model that addresses this and other aspects of the data.
Slide 23: Qualitative Dynamical Analysis
- Based on a one-dimensional leaky integrator model.
- Input I = a*C, where C is chosen from {-5, -3, -1, 1, 3, 5}.
- Initial condition: x = 0.
- Choose left if x > 0 when the response signal is detected; otherwise choose right.
- Accuracy approximates an exponential approach to asymptote because of the leakage.
- How is the reward implemented?
  - An offset in the initial conditions?
  - An additional term in the input to the decision variable?
  - A time-varying offset that optimizes reward?
  - A fixed offset in the value of the decision variable?
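The one-dimensional leaky integrator described above is easy to simulate directly. A minimal Euler-Maruyama sketch (the talk does not give numerical values, so a, the leak, the noise level, and the signal time here are illustrative):

```python
import math
import random

def leaky_trial(C, a=1.0, leak=5.0, noise=1.0, t_signal=0.5, dt=0.001):
    """One trial of the leaky integrator dx = (-leak*x + I) dt + noise*dW,
    with I = a*C and x(0) = 0. Returns True if the model chooses 'left'
    (x > 0) when the response signal arrives."""
    x = 0.0
    I = a * C
    for _ in range(int(t_signal / dt)):
        x += (-leak * x + I) * dt + noise * math.sqrt(dt) * random.gauss(0.0, 1.0)
    return x > 0

random.seed(0)
n = 2000
# Accuracy for positive-C (leftward) stimuli at the three difficulty levels
acc = {C: sum(leaky_trial(C) for _ in range(n)) / n for C in (1, 3, 5)}
```

Because the leak pulls x toward the asymptote I/leak with a fixed time constant, accuracy at the signal approaches its asymptotic level exponentially, and easier stimuli (larger |C|) yield higher accuracy.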
Slide 24: Offset in Initial Conditions
- Note: the effect of the bias decays away as t increases.
Slide 25: Reward as a Term in the Input
- The reward signal arrives 750 msec before the stimulus; processing starts at that time.
- For t < 0, input = b.
- For t > 0, input = b + a*C.
- Notes:
  - The effect of the bias persists.
  - But the bias is sub-optimal initially, and there is no dip.
  - An initially high bias and a dip occur if s starts low and increases at stimulus onset.
Slide 26: Time-Varying Term that Optimizes Rewards (No Free Parameter for Reward Bias)
- The expression for b(t) is for a single difficulty level.
- The bias is equivalent to a time-varying criterion -b(t).
- There is a dip at a predictable point in time.
- No analytic expression is available for multiple difficulty levels, but numerical simulation is possible.
[Figure: probability of choice toward the larger reward vs. time (s), for RSC = 1 and RSC = 0 at stimulus differences 5, 3, and 1]
Slide 27: Reward as a Constant Offset in the Decision Variable
- Notes:
  - Equivalent to setting a fixed criterion.
  - The bias effect persists for leak λ < 0.
  - With a single C level, there is a dip at a predictable point in time.
  - Prediction and test: a higher C level leads to an earlier dip.
  - Variability in the starting point or the magnitude of the offset can pull the initial bias off ceiling.
Slide 28: Preliminary Conclusion
- A qualitative fit seems possible with the one-d model if:
  - we treat reward as a constant offset in the decision variable, and
  - the value of the constant varies from trial to trial, or
  - there is added starting-point variability.
- Next step: actually try to fit individual subject data.
Slide 29: A New Phenomenon Discovered by Juan: the Inverse-Micro-SAT! (also occurs in monkey data)
Slide 30: Consistent with Other Models?
- Ratcliff and colleagues, and also Shadlen and colleagues, argue for integration to a bound, even in response-signal tasks like this one.
- Once the bound is reached, the participant enters a discrete decision state.
- Our data suggest that the decision variable remains continuous even to the end of the trial.
- Time to respond reflects this continuous state.
Slide 31: High-Threshold Leaky Competing Accumulator Model
- The decision variable remains continuous until the response signal occurs.
- The signal provides additional input to the accumulators, driving them to a high threshold, which triggers the response.
[Figure: accumulators x1 and x2 rising over time; the response signal drives the leader to threshold, triggering the response]
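The mechanism on this slide can be illustrated with a small simulation. This is a sketch under assumed parameters (the inputs rho1 and rho2, leak, inhibition beta, noise, signal boost, and threshold are illustrative, not fitted to the data): the accumulators evolve continuously until the response signal, whose extra input drives the leading accumulator to a high threshold.

```python
import math
import random

def lca_trial(rho1=1.2, rho2=1.0, leak=1.0, beta=1.0, noise=0.3,
              t_signal=1.0, boost=8.0, theta=3.0, dt=0.001, t_max=3.0):
    """Sketch of a high-threshold leaky competing accumulator trial.
    x1 and x2 evolve under leak, mutual inhibition (beta), and stimulus
    input; the response signal at t_signal adds a common boost that
    drives the leader to the high threshold theta.
    Returns (choice, response_time); choice 0 means no response."""
    x1 = x2 = 0.0
    t = 0.0
    while t < t_max:
        extra = boost if t >= t_signal else 0.0
        dx1 = (rho1 + extra - leak * x1 - beta * x2) * dt \
            + noise * math.sqrt(dt) * random.gauss(0.0, 1.0)
        dx2 = (rho2 + extra - leak * x2 - beta * x1) * dt \
            + noise * math.sqrt(dt) * random.gauss(0.0, 1.0)
        x1 = max(0.0, x1 + dx1)   # activations stay non-negative, as in the LCA
        x2 = max(0.0, x2 + dx2)
        t += dt
        if x1 >= theta or x2 >= theta:
            return (1 if x1 >= x2 else 2), t
    return 0, t_max

random.seed(1)
results = [lca_trial() for _ in range(500)]
wins1 = sum(1 for c, _ in results if c == 1)
wins2 = sum(1 for c, _ in results if c == 2)
```

Because the decision variable is still continuous when the signal arrives, both the choice and the time to reach threshold reflect the accumulators' state at that moment, consistent with the point on slide 30.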
Slide 32: Preliminary Simulations
Slide 33: Reward, Stimulus, and Response Cue All Contribute Input to the Accumulators
Slide 34: Three Possible Architectures
[Figure: diagrams of the three candidate architectures, labeled 1, 2, and 3]