Title: Do We Still Need Probability Sampling in Surveys
Slide 1: Do We Still Need Probability Sampling in Surveys?
- Robert M. Groves
- University of Michigan and
- Joint Program in Survey Methodology, USA
Slide 2: Outline
- The total survey error paradigm in scientific surveys
- The decline in survey participation
- The rise of internet panels
- The second era of internet panels
- So... do we need probability sampling?
Slide 4: The Ingredients of Scientific Surveys
- A target population
- A sampling frame
- A sample design and selection
- A set of target constructs
- A measurement process
- Statistical estimation
Slide 5: Deming (1944) On Errors in Surveys
- American Sociological Review!
- First listing of sources of problems, beyond sampling, facing surveys
Slide 7: Comments on Deming (1944)
- Includes nonresponse, sampling, interviewer effects, mode effects, various other measurement errors, and processing errors
- Includes nonstatistical notions (auspices)
- Includes estimation step errors (wrong weighting)
- Omits coverage errors
- "Total survey error" not used as a term
Slide 8: Sampling Text Treatment of Total Survey Error
- Kish, Survey Sampling, 1965
- 65 of 643 pages on various errors, with specified relationship among errors
- Graphic on biases
Slide 9: Graphic on biases
- Sampling Biases
  - Frame biases
  - Consistent Sampling Bias
  - Constant Statistical Bias
- Nonsampling Biases
  - Nonobservation
    - Noncoverage
    - Nonresponse
  - Observation
    - Field data collection
    - Office processing
Slide 10: Total Survey Error (1979), Anderson, Kasper, Frankel, and Associates
- Empirical studies on nonresponse, measurement, and processing errors for health survey data
- Initial total survey error framework in more elaborated nested structure
Slide 11: Total Error
- Variable Error
  - Sampling
  - Nonsampling
    - Field
    - Processing
- Bias
  - Sampling
    - Frame
    - Consistent
  - Nonsampling
    - Nonobservation
      - Noncoverage
      - Nonresponse
    - Observation
      - Field
      - Processing
Slide 12: Survey Errors and Survey Costs (1989), Groves
- Attempts conceptual linkages between total survey error framework and
  - psychometric true score theories
  - econometric measurement error and selection bias notions
- Ignores processing error
- Highest conceptual break on variance vs. bias
- Second conceptual break on errors of nonobservation vs. errors of observation
Slide 13: Mean Square Error
- Variance
  - Errors of Nonobservation: Coverage, Nonresponse, Sampling
  - Observational Errors: Interviewer, Respondent, Instrument, Mode
- Bias
  - Errors of Nonobservation: Coverage, Nonresponse, Sampling
  - Observational Errors: Interviewer, Respondent, Instrument, Mode
- Annotations on the graphic: construct validity, theoretical validity, empirical validity, reliability; criterion validity (predictive validity, concurrent validity)
Slide 14: Nonsampling Error in Surveys (1992), Lessler and Kalsbeek
- Evokes total survey design more than total survey error
- Omits processing error
Slide 16: Introduction to Survey Quality (2003), Biemer and Lyberg
- Major division of sampling and nonsampling error
- Adds specification error (a la construct validity)
- Formally discusses process quality
- Discusses fitness for use as quality definition
Slide 18: Survey Methodology (2004), Groves, Fowler, Couper, Lepkowski, Singer, Tourangeau
- Notes twin inferential processes in surveys
  - from a datum reported to the given construct of a sampled unit
  - from estimate based on respondents to the target population parameter
- Links inferential steps to error sources
Slide 19: The Total Survey Error Paradigm
(error sources shown in brackets between the steps they link)
- Measurement: Construct → [Validity] → Measurement → [Measurement Error] → Response → [Processing Error] → Edited Data → Survey Statistic
- Representation: Inferential Population → Target Population → [Coverage Error] → Sampling Frame → [Sampling Error] → Sample → [Nonresponse Error] → Respondents → Survey Statistic
Slide 20: Summary of the Evolution of Total Survey Error
- Roots in cautioning against sole attention to sampling error
- Framework contains statistical and nonstatistical notions
- Most statistical attention on variance components, most on measurement error variance
- Late 1970s attention to total survey design
- 1980s-1990s attempt to import psychometric notions
- Key omissions in research
Slide 21: 5 Myths of Survey Practice that TSE Debunks
- Nonresponse rates are everything
- Nonresponse rates don't matter
- Give as many cases to the good interviewers as they can work
- Postsurvey adjustments eliminate nonresponse error
- Usual standard errors reflect all sources of instability in estimates (measurement error variance, interviewer variance, etc.)
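A note on the interviewer-workload and standard-error myths: one common way to express interviewer variance is Kish's design effect, deff = 1 + rho_int * (m - 1), where m is the average interviewer workload and rho_int is the intraclass correlation of answers within interviewers. The sketch below is only an illustration; the function name and the value rho_int = 0.01 are assumptions, not figures from the talk.

```python
def interviewer_deff(rho_int: float, workload: float) -> float:
    """Variance inflation due to interviewer effects: 1 + rho_int * (m - 1)."""
    if workload < 1:
        raise ValueError("workload must be at least 1 interview per interviewer")
    return 1.0 + rho_int * (workload - 1.0)


# Hypothetical values: the same small rho_int with growing workloads.
for m in (10, 50, 100):
    print(m, round(interviewer_deff(0.01, m), 2))
```

Even a small intraclass correlation inflates variance noticeably as workloads grow, which is why concentrating cases on a few interviewers is not free, and why standard errors that ignore this term understate instability.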
Slide 22: Outline
- The total survey error paradigm in scientific surveys
- The decline in survey participation
- The rise of internet panels
- The second era of internet panels
- So... do we need probability sampling?
Slide 23: Response Rates
- In most rich countries, response rates on household and organizational surveys are declining
- de Leeuw and de Heer (2002) model a 2 percentage point decline per year
- Probability sampling inference is free of nonresponse bias with a 100% response rate
Slide 24
- Recent studies challenge a simple link between response rates and nonresponse error
- Reading Keeter et al. (2000), Curtin et al. (2000), Merkle and Edelman (2002) suggests response rates don't matter
- Standard practice urges maximizing response rates
- What's a practitioner to do?
Slide 25: Mismatches between Statistical Expressions for Nonresponse Error and Practice
Slide 26: What does the Stochastic View of Response Propensity Imply?
- Key issue is whether the influences on survey participation are shared with the influences on the survey variables
- Increased nonresponse rates do not necessarily imply increased nonresponse error
- Hence, investigations are necessary to discover whether the estimates of interest might be subject to nonresponse errors
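To make the stochastic view concrete: if each person i has a response propensity p_i, the bias of the unadjusted respondent mean is approximately Cov(p, y) / mean(p). That approximation is a standard result assumed here, not an expression recovered from the slides, and the simulated population and propensity models below are invented for illustration.

```python
import random


def respondent_mean_bias(y, p):
    """Approximate bias of the unadjusted respondent mean: cov(p, y) / mean(p)."""
    n = len(y)
    p_bar = sum(p) / n
    y_bar = sum(y) / n
    cov = sum((pi - p_bar) * (yi - y_bar) for pi, yi in zip(p, y)) / n
    return cov / p_bar


random.seed(0)
# Hypothetical population of 10,000 values of a survey variable y.
y = [random.gauss(50, 10) for _ in range(10_000)]

# Case 1: low response rate (20%), propensity unrelated to y.
p_unrelated = [0.2] * len(y)
# Case 2: higher response rate (about 60%), propensity rises with y.
p_related = [min(0.95, max(0.05, 0.6 + 0.02 * (yi - 50))) for yi in y]

print("low rate, unrelated propensity:", respondent_mean_bias(y, p_unrelated))
print("higher rate, related propensity:", respondent_mean_bias(y, p_related))
```

In this sketch the survey with the lower response rate has essentially no bias, while the one with the higher response rate is biased, because only in the second case do the influences on participation overlap with the survey variable.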
Slide 27: Assembly of Prior Studies of Nonresponse Bias
- Search of peer-reviewed and other publications
- 47 articles reporting 59 studies
- About 959 separate estimates (566 percentages)
- Mean nonresponse rate is 36%
- Mean bias is 8% of the full sample estimate
- We treat this as 959 observations, weighted by sample sizes, multiply-imputed for item missing data, with standard errors reflecting clustering into 59 studies and imputation variance
Slide 28: Percentage Absolute Relative Bias
Slide 29: Percentage Absolute Relative Nonresponse Bias by Nonresponse Rate for 959 Estimates from 59 Studies
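For readers without the chart, the plotted quantity can be sketched as |100 * (respondent estimate - full-sample estimate) / full-sample estimate|. This definition is an assumption inferred from the slide titles and from the phrase "bias as a percentage of the full sample estimate"; the function name and the example numbers are hypothetical.

```python
def pct_abs_relative_bias(respondent_est: float, full_sample_est: float) -> float:
    """Absolute nonresponse bias as a percentage of the full-sample estimate."""
    if full_sample_est == 0:
        raise ValueError("full-sample estimate must be nonzero")
    return abs(100.0 * (respondent_est - full_sample_est) / full_sample_est)


# Hypothetical example: respondents report 54, the full sample value is 50.
print(pct_abs_relative_bias(54.0, 50.0))  # 8.0
```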
Slide 30: 1. Nonresponse Bias Happens
Slide 31: 2. Large Variation in Nonresponse Bias Across Estimates Within the Same Survey, or
Slide 32: 3. The Nonresponse Rate of a Survey is a Poor Predictor of the Bias of its Various Estimates (Naïve OLS, R² = .04)
Slide 33: Conclusions
- It's not that nonresponse error doesn't exist
- It's that nonresponse rates aren't good predictors of nonresponse error
- We need auxiliary variables to help us gauge nonresponse error
Slide 34: A Practical Question
- What attraction does a probability sample have
for representing a target population if its
nonresponse rate is very high and its respondent
count is lower than equally-costly nonprobability
surveys?
Slide 35: Outline
- The total survey error paradigm in scientific surveys
- The decline in survey participation
- The rise of internet panels
- The second era of internet panels
- So... do we need probability sampling?
Slide 36: A Solution to Response Rate Woes
- Web surveys offer a very different cost structure than telephone and face-to-face surveys
- Almost all fixed costs
- Very fast data collection
- But there is no sampling frame
- Often probability sampling from large volunteer groups
- Internet access varies across and within countries
Slide 37: Access/Volunteer Internet Panels
- Massive change in US commercial survey practice, moving from telephone and mail paper questionnaires to web surveys
- Survey Sampling, a major supplier of telephone samples over the past two decades, now reports that 80% of their business is web panel samples
- Some businesses do only web survey measurement
Slide 38: The Method
- Recruitment of email IDs from internet users
- At survey organization's web site
- Through pop-ups or banners on others' sites
- Through third party vendors
- A June 15, 2008, Google search of "make money doing surveys" yields 19,300 hits
- "make $10 in 5 minutes" www.SurveyMonster.com
Slide 40
- There is a new industry
- Greenfield Online
- Survey Sampling
- e-Rewards
- Lightspeed
- ePocrates
- Knowledge Networks
- Private company panels
- Proprietary panels
Baker, 2008
Inside Research, 2007
Slide 41: Reward Systems Vary
- Payment per survey
- Points per survey, yielding eligibility for rewards
- Points for sweepstakes
Slide 42: Adjustment in Estimation
- Estimation usually involves adjustment to some population totals
- Some firms have propensity model-based adjustments
- Proprietary estimation systems abound
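As an illustration of "adjustment to some population totals", the sketch below applies simple post-stratification: each respondent in a cell gets the weight (population total for the cell) / (respondents in the cell). The cell labels, totals, and responses are invented, and this is not a description of any firm's proprietary system.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_totals):
    """Weight per respondent = population total of its cell / respondents in that cell."""
    counts = Counter(sample_cells)
    missing = set(population_totals) - set(counts)
    if missing:
        raise ValueError(f"no respondents in cells: {sorted(missing)}")
    return [population_totals[c] / counts[c] for c in sample_cells]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical panel in which young respondents are over-represented.
cells = ["18-34"] * 6 + ["35+"] * 4
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
totals = {"18-34": 300, "35+": 700}

w = poststratification_weights(cells, totals)
print("unweighted mean:", sum(y) / len(y))
print("post-stratified mean:", weighted_mean(y, w))
```

The adjustment removes bias only to the extent that, within cells, panel members resemble the rest of the population on the survey variable, which is the assumption the talk goes on to question.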
Slide 43: Outline
- The total survey error paradigm in scientific surveys
- The decline in survey participation
- The rise of internet panels
- The second era of internet panels
- So... do we need probability sampling?
Slide 44: September 2007 Respondent Quality Summit
- Head of Procter and Gamble market research
- Cites Comscore: 0.25% of internet users are responsible for 30% of responses to internet panels
- Cites an average of 5-8 panel memberships per respondent
- Presents examples of failure to predict behaviors
Slide 45: The number of surveys taken matters.
Slide 46: The Practical Indicators of Quality
- Cheating on qualifying questions
- Internal inconsistencies
- Overly fast completion
- Straightlining in grids
- Gibberish or duplicated open end responses
- Failure of verification items in grids
- Selection of bogus or low-probability answers
- Non-comparability of results with non-panel
sample
Baker, 2008
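Two of these indicators, overly fast completion and straightlining in grids, are easy to screen for mechanically. The sketch below is a minimal illustration; the function name, the rule "faster than 30% of the median completion time", and the example data are all assumptions rather than any industry standard.

```python
def quality_flags(grid_answers, seconds, median_seconds, speed_ratio=0.3):
    """Return simple quality flags for one respondent.

    straightlining: every item in the grid received the same answer.
    speeding: completion time below speed_ratio * median completion time.
    """
    straightlined = len(grid_answers) > 1 and len(set(grid_answers)) == 1
    too_fast = seconds < speed_ratio * median_seconds
    return {"straightlining": straightlined, "speeding": too_fast}


# Hypothetical respondents: one suspicious, one unremarkable.
print(quality_flags([3, 3, 3, 3, 3], seconds=95, median_seconds=600))
print(quality_flags([1, 4, 2, 5, 3], seconds=540, median_seconds=600))
```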
Slide 47: Panel response rates are in decline as panelists do more surveys.
MSI, 2005 in Baker, 2008
Slide 48: Where are we now?
- An industry in turmoil
- Active study of correlates of low quality conducted by sophisticated clients
- Professional associations attempting to define quality indicators
Slide 49: Outline
- The total survey error paradigm in scientific surveys
- The decline in survey participation
- The rise of internet panels
- The second era of internet panels
- So... do we need probability sampling?
Slide 50: Access Panels and Inference
- Access panels have conjoined frame development and sample selection
- Without documentation of the frame development, assessment of coverage properties is not tractable
- Many use probability sampling from the volunteer set, but ignore this in estimation
Slide 51: A Better Question
- Not "do we still need probability sampling?" but "can we develop good sampling frames with rich auxiliary variables?"
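Slide 52
[Diagram: two parallel inference chains, each running Target Population → Sampling Frame → Sample → Respondents. Link labels: "Model-assisted" between Target Population and Sampling Frame; "Randomization theory" in one chain and "?" in the other between Sampling Frame and Sample; "Model-assisted" between Sample and Respondents.]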
Slide 53: The Value of Probability Sampling From Well-defined Frames
- Randomization theory is the powerful linking tool between the sample and the frame
- Models of nonresponse adjustment are enhanced by auxiliary variables measured on respondents and nonrespondents
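One simple example of a nonresponse adjustment that uses auxiliary variables known for both respondents and nonrespondents is a weighting-class adjustment: within each class of a frame variable, respondents' base weights are inflated by (total weight of sampled cases) / (total weight of respondents). The sketch below assumes a hypothetical frame variable ("urban"/"rural") and equal base weights; it is one textbook option, not the specific models the talk has in mind.

```python
from collections import defaultdict


def weighting_class_adjustment(sample):
    """sample: list of (aux_class, responded, base_weight) tuples.

    Returns one adjusted weight per sampled case: nonrespondents get 0.0,
    respondents get base_weight * (class sample weight / class respondent weight).
    """
    total = defaultdict(float)
    resp = defaultdict(float)
    for aux, responded, w in sample:
        total[aux] += w
        if responded:
            resp[aux] += w
    adjusted = []
    for aux, responded, w in sample:
        adjusted.append(w * total[aux] / resp[aux] if responded else 0.0)
    return adjusted


# Hypothetical sample: the urban class has 50% response, the rural class 100%.
sample = [
    ("urban", True, 10.0),
    ("urban", False, 10.0),
    ("rural", True, 10.0),
    ("rural", True, 10.0),
]
print(weighting_class_adjustment(sample))  # [20.0, 0.0, 10.0, 10.0]
```

The adjusted weights still sum to the total base weight of the sample, and the adjustment is only possible because the class is observed for nonrespondents too, which is the point about frames with rich auxiliary data.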
Slide 54: The Role of Probability Sampling in this Context
- Probability sampling has low marginal costs within a defined sampling frame
- Probability sampling offers stratification benefits
- A sampling frame with rich auxiliary variables can improve stratification effects
- Access panels should strive for well-defined frame development
Slide 55: Speculation
- As adjustment for nonresponse becomes more important,
  - Richness of auxiliary variables is primary
  - Coverage of population becomes relatively less important
- Hence, frame data and field observations on nonrespondents and respondents are valued
Slide 56: Outline
- The total survey error paradigm in scientific surveys
- The decline in survey participation
- The rise of internet panels
- The second era of internet panels
- So... do we need probability sampling?