Title: Optimal recall length in survey design
1 Optimal recall length in survey design
Philip M. Clarke, Denzil G. Fiebig and Ulf-G. Gerdtham
"If the data were perfect, collected from well designed randomised experiments, there would be hardly room for a separate field of econometrics." Griliches (1986, p. 1466)
2 Life is short
- Q1 Have you drunk alcohol in the last 5 minutes?
- Q2 Have you drunk alcohol in the last three days?
- Q3 Have you had a pap smear in the last month?
- Why choose a short recall period?
3 Background
- Types of error
  - Omission: not remembering an event
  - Telescoping: recalling events outside the recall window
- If the object of the exercise is to estimate average consumption over a year, one extreme is to approach a sample of households on January 1 and ask each to recall expenditures for the last year. The other extreme is to divide the sample over the days of the year, and to ask each to report consumption for the previous day. The first method would yield a good picture of each household's consumption, but runs the risk of measurement error; the second method will give a good estimate of the mean consumption over all households but will yield estimates of individual expenditure that are only weakly related to normal expenditures.
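The contrast between the two extreme designs can be illustrated with a small simulation. This is only a sketch: the spending distribution, the 10% multiplicative reporting noise for annual recall, and the names `simulate` and `pearson` are all assumptions made for illustration, not part of the original argument.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(households=1000, days=365, recall_noise=0.1, seed=0):
    """Compare annual recall with previous-day recall on simulated households.

    All distributions are hypothetical; daily spending is normal around a
    household-specific level, so it is not constrained to be positive.
    """
    rng = random.Random(seed)
    truth, annual_recall, day_recall = [], [], []
    for _ in range(households):
        normal_level = rng.gauss(10.0, 2.0)  # household's usual daily spend
        daily = [normal_level + rng.gauss(0.0, 5.0) for _ in range(days)]
        total = sum(daily)
        truth.append(total)
        # Design 1: recall the whole year, with multiplicative reporting noise.
        annual_recall.append(total * rng.gauss(1.0, recall_noise))
        # Design 2: report one randomly assigned day exactly, scaled to a year.
        day_recall.append(days * rng.choice(daily))
    return {
        "true_mean": sum(truth) / households,
        "annual_mean": sum(annual_recall) / households,
        "day_mean": sum(day_recall) / households,
        "annual_corr": pearson(truth, annual_recall),
        "day_corr": pearson(truth, day_recall),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings, the one-day design should recover the mean across households while its household-level figures correlate only weakly with true annual spending, which is the pattern the passage describes.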
4 What if survey designers have some control over errors?
- Suppose we are interested in estimating yearly incidence
  - Did you visit a GP in the last 12 months?
  - Answers subject to recall error due to long survey window
- What is the optimal survey window (w) over which to ask utilization questions?
  - Changing w has implications for recall error
  - But w < 12 months introduces a potentially problematic prediction/imputation process
- Did you visit a GP in the last 2 weeks?
  - More accurate recall but now considerably less information
5 Illustration of the problem
[Figure: timeline running from 12 months ago (S) to the time of survey, marking correctly recalled and incorrectly recalled events]
6 No accepted recall period
- National health surveys use very different recall periods
  - MEPS: emergency room visits are asked about over the past 12 months, while detailed health care use questions cover 3-6 months
  - Australia uses 2 weeks for GP and specialist visits
  - European panel uses a year
7 Continuous vs discrete case
- Continuous case is relevant to a person who is constantly using health care (e.g. insulin use by a person with Type 1 diabetes)
- Discrete case is simply whether someone has used a type of health care (e.g. has the person been screened in the last 12 months)
- Many types of health care are both discrete and continuous, which makes the statistics difficult.
8 Continuous case
[Figure and equations: actual use vs. recall response; W = recall window, S = target period of interest; mean use over S; variance (over S)]
9 Nature of the error
(Bias)
(Expanded variance)
10 Types of error I: Bias
[Figure: short recall window vs. long recall window]
11 Types of error II: Increased variance
[Figure: short recall window vs. long recall window]
12 Perfect recall
13 Cases
Table 1: Relative RMSE of alternative estimators of μ. Case 2 with N = 1000, S = 12, μ = σ = 1
Table entries represent
14 Optimal recall for various
A: hmax = 0, gmax = 1; B: hmax = 0.1, gmax = 0.2; C: hmax = 0.5, gmax = 0.2; D: hmax = 1, gmax = 0
15 Discrete case
- Have analyzed both continuous and discrete cases
- Suppose variable of interest is binary (0/1)
  - Visit GP or not, screen or not, public or private patient
- Collect noisy measure X of truth Y
- Source of errors
  - Incomplete or inaccurate recall
  - False or strategic responses to sensitive questions
- Misclassification structure
  - p01 = Pr(X = 0 | Y = 1) (false negative)
  - p10 = Pr(X = 1 | Y = 0) (false positive)
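The misclassification structure fixes how reported use relates to actual use. A minimal sketch, where the function name `reported_prevalence` and the example values are illustrative assumptions:

```python
def reported_prevalence(p: float, p01: float, p10: float) -> float:
    """Pr(X = 1) given true prevalence p = Pr(Y = 1) and the two error rates.

    p01 = Pr(X = 0 | Y = 1) is the false-negative rate,
    p10 = Pr(X = 1 | Y = 0) is the false-positive rate.
    """
    for name, value in (("p", p), ("p01", p01), ("p10", p10)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # True users who report use, plus non-users who falsely report use.
    return (1.0 - p01) * p + p10 * (1.0 - p)


if __name__ == "__main__":
    # Illustrative values only (not taken from any survey).
    print(reported_prevalence(p=0.6, p01=0.2, p10=0.2))
```

With no errors the reported and true prevalence coincide; false negatives pull the reported figure down and false positives push it up, so the two can partly offset each other.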
16 A simple (but useful) model
- "All models are wrong but some are useful", George Box
- Make p01 and p10 functions of w = 1, ..., S
- Assume
  - p01(1) = p10(1) = 0
  - dpjk(w)/dw > 0
  - p01(w) + p10(w) < 1
- Last assumption is the monotonicity condition (MC) of Hausman et al. (1998)
- Violating MC implies Pr(X = 1) (reported use) is not monotonically increasing in Pr(Y = 1) (actual use)
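One simple family of error-rate functions satisfying these assumptions is linear growth in w. The linear shape, and the names `error_rate`, `satisfies_monotonicity` and `reported_slope`, are assumptions made for this sketch; the slides only require the rates to be zero at w = 1 and increasing in w.

```python
def error_rate(w: int, S: int, rate_at_S: float) -> float:
    """Misclassification probability for recall window w (assumed linear in w).

    Equals 0 at w = 1 and rate_at_S at w = S, and increases with w
    whenever rate_at_S > 0.
    """
    if not 1 <= w <= S:
        raise ValueError("w must satisfy 1 <= w <= S")
    if S == 1:
        return 0.0
    return rate_at_S * (w - 1) / (S - 1)


def satisfies_monotonicity(p01: float, p10: float) -> bool:
    """Monotonicity condition: p01 + p10 < 1."""
    return p01 + p10 < 1.0


def reported_slope(p01: float, p10: float) -> float:
    """Slope of Pr(X = 1) as a function of Pr(Y = 1).

    Pr(X = 1) = p10 + (1 - p01 - p10) * Pr(Y = 1), so the slope is
    positive exactly when the monotonicity condition holds.
    """
    return 1.0 - p01 - p10


if __name__ == "__main__":
    S = 12  # hypothetical target period
    for w in (1, 6, 12):
        p01 = error_rate(w, S, rate_at_S=0.2)
        p10 = error_rate(w, S, rate_at_S=0.2)
        print(w, p01, p10, satisfies_monotonicity(p01, p10))
```

The slope function makes the last bullet concrete: once p01 + p10 exceeds 1, higher actual use translates into lower reported use.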
17 A simple (but useful) model
- Pr(Y_S) = p
  - S is target period
- Pr(Y_w) = p_w = f(p)
  - p_1 > 0
- Problems when w < S
  - Less information (missing data) implies less precise estimate of p
  - Need to know imputation process p = f^-1(p_w)
  - Latter problem ignored
- Suppose f(p) = (w/S)p
- Natural estimator of p is the suitably scaled sample proportion estimated from N observations on X_w
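The scaled sample proportion can be sketched as follows, assuming the imputation rule f(p) = (w/S)p so that the scaling factor is S/w. The function name `scaled_proportion` and the example data are hypothetical.

```python
from typing import Sequence


def scaled_proportion(responses: Sequence[int], w: int, S: int) -> float:
    """Estimate p for target period S from 0/1 reports over recall window w.

    Inverts the assumed imputation rule f(p) = (w/S) * p, so the sample
    proportion is multiplied by S / w.
    """
    if len(responses) == 0:
        raise ValueError("need at least one response")
    if not 1 <= w <= S:
        raise ValueError("w must satisfy 1 <= w <= S")
    if any(x not in (0, 1) for x in responses):
        raise ValueError("responses must be coded 0 or 1")
    sample_proportion = sum(responses) / len(responses)
    return (S / w) * sample_proportion


if __name__ == "__main__":
    # Hypothetical sample: 5 of 100 respondents report use in a 1-week window.
    reports = [1] * 5 + [0] * 95
    print(scaled_proportion(reports, w=1, S=12))
```

Because the scaling is linear, the estimate is not constrained to lie in [0, 1] when w < S; that is one symptom of the imputation problem the slide sets aside.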
18 A basis for choosing recall window
- Need criterion to capture recall error/information tradeoff in estimation of p
  - Recall errors increase with w
  - Prediction error decreases with w
- Use Mean Squared Error of alternative estimators
19 MSE discrete case
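A sketch of how the MSE can be written under the discrete model, assuming the imputation rule f(p) = (w/S)p, error rates p01(w) and p10(w), and N independent responses (independence is an added assumption). The symbols q_w and \hat{p}_w are introduced here for illustration; this is a reconstruction under those assumptions and not necessarily the expression shown on the original slide.

```latex
\begin{align*}
p_w &= \tfrac{w}{S}\,p, \qquad
q_w = \Pr(X_w = 1) = \bigl(1 - p_{01}(w)\bigr)\,p_w + p_{10}(w)\,(1 - p_w), \\
\hat{p}_w &= \tfrac{S}{w}\,\bar{X}_w, \qquad
\bar{X}_w = \tfrac{1}{N}\sum_{i=1}^{N} X_{w,i}, \\
\operatorname{Bias}(\hat{p}_w) &= \tfrac{S}{w}\,q_w - p
  = \tfrac{S}{w}\,p_{10}(w) - \bigl(p_{01}(w) + p_{10}(w)\bigr)\,p, \\
\operatorname{Var}(\hat{p}_w) &= \Bigl(\tfrac{S}{w}\Bigr)^{2}\,\frac{q_w\,(1 - q_w)}{N}, \\
\operatorname{MSE}(\hat{p}_w) &= \operatorname{Var}(\hat{p}_w) + \operatorname{Bias}(\hat{p}_w)^{2}.
\end{align*}
```

At w = 1 the bias vanishes because p01(1) = p10(1) = 0. At w = S the scaling factor is 1 and the bias reduces to p10(S) - (p01(S) + p10(S))p.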
20 w = 1 compared with w = S (discrete case)
21 RMSE comparisons of estimators of probability
p = 0.6, p01 = p10 = 0.2
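An RMSE curve of this kind can be traced numerically. The sketch below makes several assumptions not stated on the slide: the error rates grow linearly from 0 at w = 1 to the quoted 0.2 at w = S, the imputation rule is f(p) = (w/S)p, responses are independent, and N = 1000 and S = 12 are chosen purely for illustration. The function name `rmse` is hypothetical.

```python
import math


def rmse(w: int, S: int, N: int, p: float,
         p01_at_S: float, p10_at_S: float) -> float:
    """RMSE of the scaled sample proportion (S/w) * mean(X_w) as an estimator of p."""
    if not 1 <= w <= S:
        raise ValueError("w must satisfy 1 <= w <= S")
    # Assumed shape: error rates rise linearly from 0 at w = 1 to their value at w = S.
    scale = (w - 1) / (S - 1) if S > 1 else 0.0
    p01 = p01_at_S * scale
    p10 = p10_at_S * scale
    p_w = (w / S) * p                          # true probability of use within window w
    q_w = (1 - p01) * p_w + p10 * (1 - p_w)    # probability of *reporting* use
    bias = (S / w) * q_w - p
    variance = (S / w) ** 2 * q_w * (1 - q_w) / N
    return math.sqrt(variance + bias ** 2)


if __name__ == "__main__":
    S, N = 12, 1000  # illustrative values, not taken from the slide
    for w in range(1, S + 1):
        print(w, round(rmse(w, S, N, p=0.6, p01_at_S=0.2, p10_at_S=0.2), 4))
```

Setting both error rates to zero gives the pure information effect, where RMSE falls steadily as w grows; switching the errors on adds a bias term that need not move in one direction.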
22 What's happening?
- As w increases there is more information being gathered for each respondent
  - Implies estimator variance decreases
- Bias is not monotonic in w
  - Thus RMSE curves relatively unrestricted in shape
  - For case of continuous variable might expect monotonicity and thus convex RMSE curves
- Optimal window can be anywhere
  - Depends on unknown parameters
  - Depends on MSE criteria
  - Other possible criteria?
23 Example from Swedish data
- Data from Statistics Sweden Survey of Living Conditions
- Length of stay in hospital from 1996-7 surveys
- Target period of 12 weeks
- Assume no recall errors for w = 1 week
- 11,948 respondents to survey
- Calibrate our model to the Patient registry gold standard
  - p01(12) = 0.13, p10(12) = 0.01
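With the calibrated error rates, the RMSE-minimising window can be searched over w = 1, ..., 12. This is a sketch only: the linear growth of the error rates between w = 1 and w = 12, the imputation rule f(p) = (w/S)p, independence of responses, and the trial values of p are all assumptions (the true hospitalization probability is not given here), and the function names are hypothetical.

```python
import math


def rmse(w: int, S: int, N: int, p: float,
         p01_at_S: float, p10_at_S: float) -> float:
    """RMSE of the scaled sample proportion under linearly growing error rates."""
    scale = (w - 1) / (S - 1) if S > 1 else 0.0
    p01, p10 = p01_at_S * scale, p10_at_S * scale
    p_w = (w / S) * p
    q_w = (1 - p01) * p_w + p10 * (1 - p_w)
    bias = (S / w) * q_w - p
    variance = (S / w) ** 2 * q_w * (1 - q_w) / N
    return math.sqrt(variance + bias ** 2)


def optimal_window(S: int, N: int, p: float,
                   p01_at_S: float, p10_at_S: float) -> int:
    """Recall window in 1..S with the smallest RMSE."""
    return min(range(1, S + 1),
               key=lambda w: rmse(w, S, N, p, p01_at_S, p10_at_S))


if __name__ == "__main__":
    S, N = 12, 11948                 # target period (weeks) and respondents
    p01_at_S, p10_at_S = 0.13, 0.01  # calibrated error rates at w = 12
    for p in (0.02, 0.05, 0.10):     # hypothetical true probabilities
        print(p, optimal_window(S, N, p, p01_at_S, p10_at_S))
```

Varying p and the assumed shape of the error rates is a quick way to see how sensitive the chosen window is to parameters that are unknown in practice.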
24 Whether hospitalized in last three months
25 Recall error summary
- Optimal window can be anywhere
  - Will depend on optimality criterion, objective of analysis, unknown parameters
  - With some prior knowledge may be possible to decide between short or long recall window
- Recall error can be eliminated by choosing w = 1
  - Ignores presence of target window different from w = 1
  - Have stressed tradeoff with information loss
  - Have noted but not incorporated imputation problem
  - Elimination of recall error may come at considerable cost
- Analysis hasn't considered accommodating the problem
  - Reinforces: why choose short w?
26 Moving forward
- Need to understand more about the nature of recall error over time for different types of health care use.
- So far studies have been largely restricted to one-off validations of particular recall periods
- A trial of different recall periods is needed to really understand the nature of the error and to move the field forward. Watch this space
27 Subjects
Short recall group (130 patients)
- Question 1: How many times in past two weeks have you visited a doctor (i.e. General practitioner)?
- Question 2: How many times in past year have you visited a doctor (i.e. General practitioner)?
- Question 3: During the past two weeks have you had a blood test for any reason?
- Question 4: During the past two weeks have you been to a chemist to obtain Statins, which are prescription drugs to lower your cholesterol?
Medium recall group (130 patients)
- Question 1: How many times in past six months have you visited a doctor (i.e. General practitioner)?
- Question 2: How many times in past year have you visited a doctor (i.e. General practitioner)?
- Question 3: During the past six months have you had a blood test for any reason?
- Question 4: During the past six months have you been to a chemist to obtain Statins, which are prescription drugs to lower your cholesterol?
Long recall group (130 patients)
- Question 1: How many times in past year have you visited a doctor (i.e. General practitioner)?
- Question 2: During the past year have you had a blood test for any reason?
- Question 3: During the past year have you been to a chemist to obtain Statins, which are prescription drugs to lower your cholesterol?