Title: Sequential Hypothesis Testing under Stochastic Deadlines
1Sequential Hypothesis Testing under Stochastic
Deadlines
- Peter Frazier, Angela Yu
- Princeton University
5Summary
- We consider the sequential hypothesis testing
problem and generalize the sequential probability
ratio test (SPRT) to the case with stochastic
deadlines.
- Introducing the deadline causes reaction times
for correct responses to be faster than for
errors, as seen in behavioral studies.
6- Both decreasing the deadline's mean and
increasing its variance cause more response
urgency.
- Results extend to the general case with convex
continuation cost.
71. Sequential Probability Ratio Test
8Sequential Hypothesis Testing
At each time, the subject decides whether to act
(A or B), or collect more information. This
requires balancing speed vs. accuracy.
9- We observe a sequence of i.i.d. samples
x_1, x_2, ... from some density.
- The underlying density is unknown, but is known
to equal either f0 or f1.
- We begin with a prior belief about whether f0 or
f1 is the true density, which we update through
time based on the samples.
- We want to maximize accuracy.
10- Let θ ∈ {0, 1} be the index of the true
distribution.
- Let p_0 be the initial belief, P(θ = 1).
- Let p_t = P(θ = 1 | x_1, ..., x_t).
- Let c be a cost paid per sample.
- Let d be a cost paid to violate the deadline
(used later).
- Let τ be the time index of the last sample
collected.
- Let δ be the guessed hypothesis.
11- Posterior probabilities may be calculated via
Bayes' rule:
p_{t+1} = p_t f1(x_{t+1}) / [p_t f1(x_{t+1}) + (1 - p_t) f0(x_{t+1})]
[Figure: posterior probability p_t versus time t]
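As a rough illustration of this update, the following Python sketch applies Bayes' rule one sample at a time. The Bernoulli densities f0 and f1, the prior of 0.5, the sample sequence, and the function name update_posterior are hypothetical choices made for this example; none of them come from the slides.

```python
def update_posterior(p, x, f0, f1):
    """One Bayes-rule step: p is the current belief P(theta = 1), x the new sample."""
    numerator = p * f1(x)
    return numerator / (numerator + (1.0 - p) * f0(x))


# Hypothetical densities: x = 1 occurs with probability 0.4 under f0, 0.6 under f1.
def f0(x):
    return 0.4 if x == 1 else 0.6


def f1(x):
    return 0.6 if x == 1 else 0.4


p = 0.5  # assumed prior belief p_0
for x in [1, 1, 0, 1]:
    p = update_posterior(p, x, f0, f1)
    print(f"sample={x}  posterior={p:.4f}")
```

Because the samples are i.i.d., the final belief depends only on the net count of 1s over 0s, not on the order in which they arrived.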
12Objective Function
The objective function is
P(δ ≠ θ) + c E[τ],
where P(δ ≠ θ) is the probability of error and
c E[τ] is the time delay penalty. We require that
the decisions τ and δ are non-anticipative, that
is, whether τ ≤ t is entirely determined by the
samples x_1, ..., x_t, and δ is entirely
determined by the samples x_1, ..., x_τ.
13Optimal Policy (SPRT)
Wald & Wolfowitz (1948) showed that the optimal
policy is to stop as soon as p_t exits an interval
[A, B], and to choose the hypothesis that appears
more likely at this time.
[Figure: sample path of the probability p_t versus
time t, with the thresholds A and B and the
stopping time τ marked]
This policy is called the Sequential Probability
Ratio Test or SPRT.
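The stopping rule can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the authors' implementation: the Bernoulli densities, the thresholds A = 0.1 and B = 0.9, the prior of 0.5, and the names sprt and draw_from_f1 are all hypothetical, and the max_steps cap is only a safety guard.

```python
import random


def sprt(draw, f0, f1, p0=0.5, A=0.1, B=0.9, max_steps=100_000):
    """Run one trial; return (tau, delta) = (samples used, guessed hypothesis)."""
    p, tau = p0, 0
    while A <= p <= B and tau < max_steps:  # keep sampling while p is inside [A, B]
        x = draw()
        numerator = p * f1(x)
        p = numerator / (numerator + (1.0 - p) * f0(x))  # Bayes update
        tau += 1
    delta = 1 if p > 0.5 else 0  # choose the more likely hypothesis
    return tau, delta


Q = 0.6  # hypothetical: P(x = 1) is Q under f1 and 1 - Q under f0


def f0(x):
    return 1.0 - Q if x == 1 else Q


def f1(x):
    return Q if x == 1 else 1.0 - Q


rng = random.Random(0)


def draw_from_f1():
    """Sample from f1, i.e. simulate trials on which theta = 1."""
    return 1 if rng.random() < Q else 0


trials = [sprt(draw_from_f1, f0, f1) for _ in range(2000)]
accuracy = sum(delta == 1 for _, delta in trials) / len(trials)
mean_tau = sum(tau for tau, _ in trials) / len(trials)
print(f"accuracy={accuracy:.3f}  mean samples={mean_tau:.1f}")
```

Moving A and B closer to 0 and 1 trades speed for accuracy: more samples are needed before the belief exits the interval, but the guess is then more reliable.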
142. Models for Behavior
15- A classic sequential hypothesis testing task is
detecting coherent motion in random dots.
- One hypothesis is that monkeys and people behave
optimally and according to the SPRT.
16Broadly speaking, the model based on the classic
SPRT fits experimental behavior well.
[Figures: Accuracy vs. Coherence; Reaction Time
vs. Coherence (Roitman & Shadlen, 2002)]
There is one caveat, however.
17- SPRT fails to predict the difference in response
time distributions between correct and error
responses.
- Correct responses are more rapid in experiments.
- SPRT predicts they should be identically
distributed.
[Figures: Accuracy; Mean RT (data from Roitman &
Shadlen, 2002; analysis from Ditterich, 2006)]
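The SPRT prediction of identically distributed response times can be checked exactly in a simple special case. The sketch below assumes Bernoulli samples, a prior of 0.5, and symmetric thresholds, so that the SPRT reduces to a random walk on the net count of samples that stops at +k or -k; the values q = 0.6, k = 6, and the horizon of 500 steps are hypothetical.

```python
def exit_time_pmfs(q, k, horizon):
    """Exact first-exit distributions of the net-count walk when theta = 1.

    Returns (correct, error), where correct[t] is the probability of stopping
    at +k (correct guess) at time t and error[t] that of stopping at -k.
    """
    walk = {0: 1.0}  # probability mass at each net count, not yet stopped
    correct = [0.0] * (horizon + 1)
    error = [0.0] * (horizon + 1)
    for t in range(1, horizon + 1):
        nxt = {}
        for n, mass in walk.items():
            for step, prob in ((+1, q), (-1, 1.0 - q)):
                m = n + step
                if m >= k:
                    correct[t] += mass * prob
                elif m <= -k:
                    error[t] += mass * prob
                else:
                    nxt[m] = nxt.get(m, 0.0) + mass * prob
        walk = nxt
    return correct, error


def conditional_mean(pmf):
    """Mean stopping time given that this outcome occurred."""
    return sum(t * w for t, w in enumerate(pmf)) / sum(pmf)


correct, error = exit_time_pmfs(q=0.6, k=6, horizon=500)
mean_correct = conditional_mean(correct)
mean_error = conditional_mean(error)
print(f"mean RT | correct = {mean_correct:.6f}")
print(f"mean RT | error   = {mean_error:.6f}")
```

In this setting every path that ends at -k is the mirror image of a path that ends at +k at the same time, and the two path probabilities differ by a constant factor, which is why the conditional distributions coincide.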
183. Generalizing to Stochastic Deadlines
19Monkeys occasionally abort trials without
responding, but it is always better to guess than
to abort under the assumed objective function.
(Data from Roitman & Shadlen, 2002; analysis
from Ditterich, 2006)
To explain the discrepancy, we hypothesize a
limit on the length of time that monkeys can
fixate the target.
20Objective Function
Hypothesizing a decision deadline D leads to a
new objective function
E[ 1{δ ≠ θ} 1{τ < D} + c min(τ, D) + d 1{τ ≥ D} ],
whose three terms are the error penalty, the time
penalty, and the deadline penalty, respectively.
We will assume that D has a non-decreasing
failure rate, i.e. P(D = t+1 | D > t) is
non-decreasing in t. This assumption is met by
deterministic, normal, gamma, and exponential
deadlines, and others.
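To make the failure-rate condition concrete, the following Python sketch computes P(D = t+1 | D > t) for a deadline on a finite integer grid and checks whether it is non-decreasing. The helper names and the three example distributions are hypothetical and chosen only for illustration.

```python
def failure_rate(pmf):
    """pmf[k] = P(D = k) for k = 0, ..., K.

    Returns the list of P(D = t+1 | D > t) for every t with P(D > t) > 0.
    """
    rates = []
    tail = sum(pmf[1:])  # P(D > 0)
    for t in range(len(pmf) - 1):
        if tail <= 1e-12:
            break
        rates.append(pmf[t + 1] / tail)
        tail -= pmf[t + 1]  # now tail = P(D > t + 1)
    return rates


def is_nondecreasing(values, tol=1e-12):
    return all(b >= a - tol for a, b in zip(values, values[1:]))


deterministic = [0, 0, 0, 0, 0, 1]           # D = 5 with certainty
uniform = [0, 0, 0, 0.25, 0.25, 0.25, 0.25]  # D uniform on {3, 4, 5, 6}
bimodal = [0, 0.5, 0, 0.5]                   # D is 1 or 3, never 2

for name, pmf in (("deterministic", deterministic),
                  ("uniform", uniform),
                  ("bimodal", bimodal)):
    rates = failure_rate(pmf)
    print(name, rates, "non-decreasing:", is_nondecreasing(rates))
```

The bimodal example violates the assumption: once time 1 has passed without the deadline arriving, the deadline cannot arrive at time 2, so the failure rate drops before rising again.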
21Optimal Policy
The resulting optimal policy is to stop as soon
as p_t exits a region that narrows with time.
[Figure: probability p_t versus time t, comparing
the stopping boundaries of the classic SPRT and
the generalized SPRT, with the deadline marked]
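A narrowing region of this kind can be computed by backward induction. The Python sketch below is one possible formulation, not the authors' code. It assumes Bernoulli samples, a prior of 0.5, a deadline with finite support, and a loss that charges c per sample, d for missing the deadline, and 1 for a wrong guess made before the deadline. All names and parameter values are hypothetical.

```python
def continuation_regions(q, c, d, hazard, N):
    """Backward induction over the beliefs reachable from p_0 = 0.5.

    hazard[t] = P(D = t+1 | D > t); hazard[-1] must equal 1 (finite deadline).
    Beliefs are indexed by the net count n of samples favouring hypothesis 1.
    Returns, for each t, the beliefs with |n| <= N at which continuing costs
    no more than stopping.
    """
    T = len(hazard)
    r = q / (1.0 - q)  # likelihood ratio of a single sample

    def belief(n):
        return 1.0 / (1.0 + r ** (-n))

    # Values beyond the last step are never used, because hazard[-1] == 1.
    V_next = {n: 0.0 for n in range(-(N + T), N + T + 1)}
    regions = []
    for t in reversed(range(T)):
        V, region = {}, []
        for n in range(-(N + t), N + t + 1):
            p = belief(n)
            stop_cost = min(p, 1.0 - p)           # error probability if we guess now
            p_up = p * q + (1.0 - p) * (1.0 - q)  # predictive P(next sample = 1)
            future = p_up * V_next[n + 1] + (1.0 - p_up) * V_next[n - 1]
            cont_cost = c + hazard[t] * d + (1.0 - hazard[t]) * future
            V[n] = min(stop_cost, cont_cost)
            if cont_cost <= stop_cost and abs(n) <= N:
                region.append(p)
        regions.append(region)
        V_next = V
    regions.reverse()
    return regions


deterministic = [0.0] * 11 + [1.0]                   # D = 12 with certainty
uniform = [0.0] * 7 + [1 / 5, 1 / 4, 1 / 3, 1 / 2, 1.0]  # D uniform on {8, ..., 12}
for name, hazard in (("deterministic", deterministic), ("uniform", uniform)):
    regions = continuation_regions(q=0.6, c=0.01, d=1.0, hazard=hazard, N=15)
    print(name, [len(region) for region in regions])
```

Each printed list gives the number of lattice beliefs inside the continuation region at each time step, so a shrinking count corresponds to a narrowing region. Working on the lattice of reachable beliefs avoids interpolating the value function over a continuous grid of p.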
22Response Times
Under this policy, correct responses are
generally faster than error responses.
[Figure: histograms of reaction time (frequency
of occurrence versus reaction time) for correct
responses and error responses]
23Influence of the Parameters
[Figure panels: Deadline Uncertainty, Deadline
Mean, Deadline Penalty, Time Penalty]
Plots of the continuation region C_t (blue), and
the probability of a correct response
P(δ = θ | τ = t) (red). D was gamma distributed,
and the default settings were c = 0.001, d = 2,
mean(D) = 40, std(D) = 1. In each plot we varied
one parameter while keeping the others fixed.
24Theorem: The continuation region at time t for
the optimal policy, C_t, is either empty or a
closed interval, and it shrinks with time
(C_{t+1} ⊆ C_t).
Proposition: If P(D < ∞) = 1, then there exists a
T < ∞ such that C_T = ∅. That is, the optimal
reaction time is bounded above by T.
25Proof Sketch
Define Q(t, p_t) to be the conditional loss, given
p_t, of continuing once from time t and then
behaving optimally.
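One way to write this definition as a recursion is sketched below. It assumes a loss that charges c per sample, d for missing the deadline, and 1 for a wrong guess made before the deadline. The symbols h_t (the failure rate of D) and V (the optimal expected remaining loss) are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
h_t     &= P(D = t+1 \mid D > t), \\
V(t, p) &= \min\{\, \min(p, 1-p),\; Q(t, p) \,\}, \\
Q(t, p) &= c + d\, h_t + (1 - h_t)\, E\big[ V(t+1, p_{t+1}) \mid p_t = p \big], \\
C_t     &= \{\, p \in [0, 1] : Q(t, p) \le \min(p, 1-p) \,\}.
\end{align*}
```

Here min(p, 1-p) is the expected error from stopping and guessing the more likely hypothesis, so C_t collects the beliefs at which continuing is at least as good as stopping.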
Lemma 1: The continuation cost of the optimal
policy, Q(t, p), is concave as a function of p.
Lemmas 2 and 3: Wasting a time period incurs an
opportunity cost in addition to its immediate
cost c.
Lemma 4: If we are certain which hypothesis is
correct (p = 0 or p = 1), then the optimal policy
is to stop as soon as possible. Its value is
26Proof Sketch
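[Figure: expected loss versus p on [0, 1], showing
the curves Q(t+1, p) - c, Q(t, p), and
min(p, 1-p), with the continuation regions C_{t+1}
and C_t marked on the p axis]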
27References
- Anderson, T W (1960). Ann. Math. Statist. 31:
165-97.
- Bogacz, R et al. (2006). Psychol. Rev. 113:
700-65.
- Ditterich, J (2006). Neural Netw. 19(8):
981-1012.
- Luce, R D (1986). Response Times: Their Role in
Inferring Elementary Mental Org. Oxford Univ.
Press.
- Mozer et al. (2004). Proc. Twenty Sixth Annual
Conference of the Cognitive Science Society.
981-86.
- Poor, H V (1994). An Introduction to Signal
Detection and Estimation. Springer-Verlag.
- Ratcliff, R & Rouder, J N (1998). Psychol. Sci.
9: 347-56.
- Roitman, J D & Shadlen, M N (2002). J. Neurosci.
22: 9475-9489.
- Siegmund, D (1985). Sequential Analysis.
Springer.
- Wald, A & Wolfowitz, J (1948). Ann. Math.
Statist. 19: 326-39.