Dynamics of Reward Bias Effects in Perceptual Decision Making - PowerPoint PPT Presentation

Provided by: psychSta
1
Dynamics of Reward Bias Effects in Perceptual
Decision Making
  • Jay McClelland and Juan Gao
  • Building on Newsome and Rorie; Holmes and Feng;
    Usher and McClelland

2
Our Questions
  • Can we trace the effect of reward bias on
    decision making over time?
  • Can we determine what would be the optimal reward
    effect?
  • Can we determine how well participants do at
    achieving optimality?
  • Can we uncover the processing mechanisms that
    lead to the observed patterns of behavior?

3
Overview
  • Experiment
  • Results
  • Optimality analysis
  • Abstract (one-d) dynamical model
  • Mechanistic (two-d) dynamical model

4
Human Experiment Examining Reward Bias Effect at
Different Time Points after Target Onset
(Figure: trial timeline showing Reward Cue, Stimulus 750 msec later, Response Signal at a lag of 0-2000 msec, Response Window, and Response)
  • Reward cue occurs 750 msec before stimulus.
  • Small arrow head visible for 250 msec.
  • Only biased reward conditions (2 vs 1 and 1 vs 2)
    are considered.
  • Stimuli are rectangles shifted 1, 3, or 5 pixels
    left or right of fixation.
  • Response signal (tone) occurs at 10 different
    lags: 0, 75, 150, 225, 300, 450, 600, 900, 1200,
    and 2000 msec.
  • Participant receives reward if response occurs
    within 250 msec of response signal and is
    correct.
  • Participants were run for 15-25 sessions to
    provide stable data.
  • Data are from later sessions in which the effect
    of reward appeared to be fairly stable.

5
Slide from Zhang 2007
6
A participant with very little reward bias
  • Top panel shows probability of the response giving
    the larger reward as a function of actual response
    time for combinations of
  • Stimulus shift (1, 3, or 5 pixels)
  • Reward-stimulus compatibility
  • Lower panel shows data transformed to z scores,
    and corresponds to the theoretical construct
    [mean(x1(t) - x2(t)) + bias(t)] / sd(x1(t) - x2(t))
  • where x1 represents the state of the accumulator
    associated with greater reward, x2 the same for
    lesser reward, and S is thought to choose the larger
    reward if x1(t) - x2(t) + bias(t) > 0.

7
Participants Showing Reward Bias
8
(No Transcript)
9
Summary
  • Initial bias is high, and tapers off over time,
    to a fixed low level.
  • Questions
  • Is this reasonable?
  • How close to optimal is it?
  • Are some subjects more optimal than others?

10
Abstract optimality analysis
11
Assumptions (from Signal Detection Theory)
(Figure: two distributions of the decision variable with means -m and m; horizontal axis from -10 to 10, vertical axis from 0 to 0.6)
  • At a given time, decision variable comes from one
    of two distributions, means -m, m, same STD s = 1.
    The negative side (-m) is consistent with high reward.
  • Choose high reward alternative (H) if x < Xc,
    else choose low reward alternative (L).
  • For three difficulty levels, means mi (i = 1, 2, 3),
    with shared s = 1, same choice policy.

12
Optimal Bias
Premises
Expected Reward_c = Likelihood_c × Reward_c
Choose the alternative c with the larger Expected Reward_c
(This policy maximizes expected reward overall.)
Result
Xopt/s = log(Reward_H / Reward_L) / d'
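
The result above can be checked numerically. The following is a minimal sketch, not the presenters' code: it assumes equal prior probability for the two stimulus directions, takes d' = 2m/s, and uses hypothetical reward and mean values. The function names are illustrative.

```python
import math

def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_criterion(reward_high, reward_low, m, s=1.0):
    """Xopt from Xopt/s = log(Reward_H / Reward_L) / d', with d' = 2m/s."""
    d_prime = 2.0 * m / s
    return s * math.log(reward_high / reward_low) / d_prime

def expected_reward(xc, reward_high, reward_low, m, s=1.0):
    """Expected reward per trial when H is chosen if x < xc.

    Assumption: both stimulus directions are equally likely.
    """
    p_correct_h = phi_cdf((xc + m) / s)        # x ~ N(-m, s) falls below xc
    p_correct_l = 1.0 - phi_cdf((xc - m) / s)  # x ~ N(+m, s) falls above xc
    return 0.5 * (reward_high * p_correct_h + reward_low * p_correct_l)

# Hypothetical values: rewards 2 vs 1 (as in the experiment), m = 0.5, s = 1.
x_opt = optimal_criterion(2.0, 1.0, m=0.5)
print("Xopt:", x_opt)
print("reward at Xopt:", expected_reward(x_opt, 2.0, 1.0, 0.5))
print("reward at 0   :", expected_reward(0.0, 2.0, 1.0, 0.5))
```

Comparing the expected reward at Xopt with nearby criteria is a quick way to confirm that the closed form sits at the maximum under these assumptions.
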
13
As d' increases, Xopt decreases
(Figure labels: d' = 0; d' increases)
14
Estimating normalized m and Xc values from
data at each signal lag, one difficulty level
Std normal deviates (s = 1)
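
One way to carry out this estimate is sketched below. It is an illustration under the signal detection assumptions stated earlier (means -m and m, s = 1, choose H if x < Xc), not the presenters' procedure; the function name and the example probabilities are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def estimate_m_and_xc(p_high_given_high, p_high_given_low):
    """Recover normalized m and Xc from two choice probabilities at one lag.

    p_high_given_high: P(choose H) when the stimulus favors H, equals Phi(Xc + m)
    p_high_given_low:  P(choose H) when the stimulus favors L, equals Phi(Xc - m)
    Both probabilities must lie strictly between 0 and 1.
    """
    z_hh = _STD_NORMAL.inv_cdf(p_high_given_high)  # Xc + m
    z_hl = _STD_NORMAL.inv_cdf(p_high_given_low)   # Xc - m
    m = (z_hh - z_hl) / 2.0
    xc = (z_hh + z_hl) / 2.0
    return m, xc

# Hypothetical response proportions at a single signal lag.
m_est, xc_est = estimate_m_and_xc(0.90, 0.40)
print("m:", m_est, "Xc:", xc_est)
```
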
15
Estimating normalized m and Xc values from
data at each signal lag with multiple difficulty
levels
16
Optimal Bias with Multiple Difficulties
Optimal Criterion
Actual Criterion
17
Empirical Characterization of Time-course of
Change in Sensitivity (d')
Subject's sensitivity, as defined in the theory of
signal detectability
When response signal delay varies
For each subject, fit with function
18
Subject Sensitivity
19
(No Transcript)
20
Actual vs. Optimal bias for three Ss
  • Except for sl, all participants start with a high
    bias, then level off, conforming approximately to
    optimal.
  • All participants are under-biased for short lags.
  • At longer lags, some are under, some are over, and
    some are optimal.
21
Our Questions
  • Can we trace the effect of reward bias on
    decision making over time?
  • Can we determine what would be the optimal reward
    effect?
  • Can we determine how well participants do at
    achieving optimality?
  • Can we uncover the processing mechanisms that
    lead to the observed patterns of behavior?

22
Two Paths
  • Qualitative analysis with a one-dimensional
    decision variable (following Holmes and Feng)
    asking
  • How should reward bias be represented?
  • Possible answers
  • Offset in initial conditions?
  • An additional term in the input to the decision
    variable?
  • A time-varying offset that optimizes reward?
  • A fixed offset in the value of the decision
    variable?
  • Inverse-Micro-Speed-Accuracy Tradeoff (discovered
    by Juan)
  • Steps toward a Leaky Competing Accumulator model
    that addresses this and other aspects of the data.

23
Qualitative Dynamical Analysis
  • Based on one dimensional leaky integrator model.
  • Input I = aC; C is chosen from -5, -3, -1, 1, 3, 5.
  • Initial condition x = 0
  • Choose left if x > 0 when the response signal is
    detected; otherwise choose right.
  • Accuracy approximates exponential approach to
    asymptote because of leakage.
  • How is the reward implemented?
  • Offset in initial conditions?
  • An additional term in the input to the decision
    variable?
  • A time-varying offset that optimizes reward?
  • A fixed offset in the value of the decision
    variable?
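
The leaky integrator described above can be sketched in closed form. This is an illustration only: it assumes the dynamics dx = (aC - leak·x) dt + noise·dW with x(0) = 0, treats processing time as equal to the signal lag, and uses hypothetical values for a, leak, and noise, none of which are given in the slides.

```python
import math

def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_x(t, a, c_level, leak):
    """Mean of x at time t when x(0) = 0 and the input is a * C."""
    return (a * c_level / leak) * (1.0 - math.exp(-leak * t))

def sd_x(t, leak, noise):
    """Standard deviation of x at time t (zero at t = 0)."""
    return noise * math.sqrt((1.0 - math.exp(-2.0 * leak * t)) / (2.0 * leak))

def p_correct(t, a, c_level, leak, noise):
    """Probability that x has the same sign as C at time t > 0."""
    return phi_cdf(abs(mean_x(t, a, c_level, leak)) / sd_x(t, leak, noise))

# Hypothetical parameters; lags in seconds (the nonzero lags of the experiment).
A, LEAK, NOISE = 0.5, 3.0, 1.0
LAGS = [0.075, 0.15, 0.225, 0.3, 0.45, 0.6, 0.9, 1.2, 2.0]

accuracy_by_level = {
    c: [p_correct(t, A, c, LEAK, NOISE) for t in LAGS] for c in (1, 3, 5)
}
for c, accuracies in accuracy_by_level.items():
    print(c, [round(p, 3) for p in accuracies])
```

Under these assumptions accuracy rises with lag and levels off, because leakage bounds both the mean and the variance of x.
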

24
Offset in Initial Conditions
  • Note
  • Effect of bias decays away as t increases.
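
The decay noted above can be illustrated with the same kind of leaky integrator. The sketch assumes dx = (input - leak·x) dt + noise·dW with a starting offset x0; the parameter values and the function name are hypothetical.

```python
import math

def normalized_initial_offset_bias(t, x0, leak, noise):
    """Contribution of a starting offset x0 to mean(x) / sd(x) at time t > 0."""
    mean_shift = x0 * math.exp(-leak * t)  # the offset leaks away
    sd = noise * math.sqrt((1.0 - math.exp(-2.0 * leak * t)) / (2.0 * leak))
    return mean_shift / sd

# Hypothetical parameters; lags in seconds.
LAGS = [0.075, 0.15, 0.225, 0.3, 0.45, 0.6, 0.9, 1.2, 2.0]
bias_over_time = [normalized_initial_offset_bias(t, x0=0.5, leak=3.0, noise=1.0)
                  for t in LAGS]
print([round(b, 4) for b in bias_over_time])
```
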

25
Reward as a term in the input
  • Reward signal comes before stimulus onset (t < 0);
    processing starts at that time
  • For t < 0, input = b
  • For t > 0, input = b + aC
  • Notes
  • Effect of the bias persists.
  • But bias is sub-optimal initially, and there is
    no dip.
  • An initially high bias and a dip occur if s starts
    low and increases at stimulus onset.
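
A sketch of the first two notes, under assumptions: the reward term b enters a leaky integrator 0.75 s before stimulus onset (the cue lead from the experiment), the noise level is constant from that moment on, and the values of b, leak, and noise are hypothetical.

```python
import math

def normalized_input_bias(t, b, leak, noise, cue_lead=0.75):
    """Contribution of a constant input b to mean(x) / sd(x).

    t is time since stimulus onset; b has been on for t + cue_lead seconds.
    """
    elapsed = t + cue_lead
    mean_shift = (b / leak) * (1.0 - math.exp(-leak * elapsed))
    sd = noise * math.sqrt((1.0 - math.exp(-2.0 * leak * elapsed)) / (2.0 * leak))
    return mean_shift / sd

def asymptotic_input_bias(b, leak, noise):
    """Long-run value of the normalized bias."""
    return (b / leak) / (noise / math.sqrt(2.0 * leak))

# Hypothetical parameters; lags in seconds.
LAGS = [0.0, 0.075, 0.15, 0.225, 0.3, 0.45, 0.6, 0.9, 1.2, 2.0]
input_bias = [normalized_input_bias(t, b=0.3, leak=3.0, noise=1.0) for t in LAGS]
print([round(v, 4) for v in input_bias])
print("asymptote:", asymptotic_input_bias(0.3, 3.0, 1.0))
```

With constant noise the normalized bias never falls, which matches the notes that the effect persists and that there is no dip.
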

26
Time-varying term that optimizes rewards (No
free parameter for reward bias)
  • Expression for b(t) is for a single difficulty
    level.
  • Bias is equivalent to a time-varying criterion
    -b(t).
  • There is a dip at
  • No analytic expression is available for multiple
    difficulty levels, but numerical simulation is
    possible.

(Figure: P of choice toward larger reward, from 0 to 1, against Time (s), from 0 to 2.5; curves for RSC 1 and RSC 0 at diff 5, 3, and 1)
27
Reward as a constant offset in the decision
variable
  • Notes
  • Equivalent to setting criterion at m0
  • Bias effect persists for l < 0.
  • With a single C level, a dip at
  • Prediction and test: higher C level → earlier dip
  • Variability in starting point or magnitude of
    offset can pull initial bias off ceiling.
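
The prediction that a higher C level gives an earlier dip can be explored numerically. The sketch below assumes a leaky integrator dx = (aC - leak·x) dt + noise·dW with x(0) = 0 and a fixed offset added to x at decision time; it looks at stimuli that favor the larger reward, and every parameter value is hypothetical.

```python
import math

def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_larger_reward(t, a, c_level, leak, noise, offset):
    """z score for choosing the larger reward at time t > 0.

    c_level > 0 means the stimulus favors the larger reward.
    """
    decay = math.exp(-leak * t)
    mean = (a * c_level / leak) * (1.0 - decay)
    sd = noise * math.sqrt((1.0 - decay * decay) / (2.0 * leak))
    return (mean + offset) / sd

def p_larger_reward(t, a, c_level, leak, noise, offset):
    """Probability of choosing the larger reward at time t > 0."""
    return phi_cdf(z_larger_reward(t, a, c_level, leak, noise, offset))

def dip_time(a, c_level, leak, noise, offset, t_max=2.5, step=0.001):
    """Time on a grid at which the z score is lowest."""
    times = [step * i for i in range(1, int(round(t_max / step)) + 1)]
    return min(times, key=lambda t: z_larger_reward(
        t, a, c_level, leak, noise, offset))

# Hypothetical parameters.
A, LEAK, NOISE, OFFSET = 0.5, 3.0, 1.0, 0.2
dips = {c: dip_time(A, c, LEAK, NOISE, OFFSET) for c in (1, 3, 5)}
print(dips)
```

Because sd starts at zero, the z score is very large at the earliest times, which corresponds to the initial bias sitting at ceiling unless extra variability is added.
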

28
Preliminary Conclusion
  • Qualitative fit seems possible with one-d model
  • If
  • We treat reward as a constant offset in the
    decision variable
  • And
  • The value of the constant varies from trial to
    trial
  • Or
  • There is added starting point variability
  • Next step
  • Actually try to fit individual subject data

29
A New Phenomenon Discovered by Juan
Inverse-Micro-SAT! (also occurs in Monkey Data)
30
Consistent with other models?
  • Ratcliff and colleagues, and also Shadlen and
    colleagues, argue for integration to a bound,
    even in response-signal tasks like this one.
  • Once bound is reached, the participant enters a
    discrete decision state.
  • Our data suggest that the decision variable
    remains continuous even to the end of the trial.
  • Time to respond reflects this continuous state.

31
High-Threshold Leaky Competing Accumulator Model
  • Decision variable remains continuous until signal
    occurs.
  • Signal provides additional input to the
    accumulators, driving them to a high threshold.
(Figure labels: Response Triggered; x1; x2; Response Signal)
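
The idea can be sketched as a small simulation. This is a generic two-unit leaky competing accumulator in the style of Usher and McClelland, written for illustration; it is not the presenters' model, and all parameter values, the truncation at zero, and the function name are assumptions.

```python
import random

def simulate_lca_trial(stim_inputs, signal_time, rng, leak=1.0, inhibition=1.0,
                       noise=0.3, signal_input=6.0, threshold=2.0,
                       dt=0.001, max_wait=3.0):
    """Simulate one trial; return (choice, delay after the response signal).

    stim_inputs: stimulus (plus reward) input to accumulators x1 and x2.
    After signal_time both accumulators receive signal_input, which drives
    them toward the high threshold. Returns (None, None) if no crossing.
    """
    x = [0.0, 0.0]
    steps = int(round((signal_time + max_wait) / dt))
    for step in range(1, steps + 1):
        t = step * dt
        extra = signal_input if t > signal_time else 0.0
        updated = []
        for i in (0, 1):
            drift = stim_inputs[i] + extra - leak * x[i] - inhibition * x[1 - i]
            change = drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            updated.append(max(0.0, x[i] + change))  # activations stay >= 0
        x = updated
        if max(x) >= threshold:
            choice = 0 if x[0] >= x[1] else 1
            return choice, t - signal_time
    return None, None

rng = random.Random(0)  # fixed seed so the run is repeatable
choice, response_delay = simulate_lca_trial((0.6, 0.4), signal_time=0.5, rng=rng)
print("choice:", choice, "delay after signal (s):", response_delay)
```

In a sketch like this the delay between signal and threshold crossing depends on where the accumulators stand when the signal arrives, which is one way response time could reflect a continuous decision state.
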
32
Preliminary Simulations
33
Reward, Stimulus and Response Cue All Contribute
Input to Accumulators
34
Three Possible Architectures
(Figure: three architecture diagrams, labeled 1, 2, and 3)