Title: Non-response and what to do about it
1. Non-response and what to do about it
- Gillian Raab
- Professor of Applied Statistics
- Napier University
2. What do we mean by non-response?
- Unit non-response
- Item non-response
- Start with the first of these
- A unit non-respondent is someone selected for the survey whom we tried to reach but from whom we obtained no response
- We may or may not know anything about them, or even whether they exist
3. What is an acceptable response rate?
- It depends on who you are
- It depends on why the response is poor
- It depends on whether non-responders are like responders
4. An example
- A postal survey on attitudes to racial discrimination got a 45% response rate
- Explanation 1: half of the letters were lost by the post office, but most of the others replied
- Explanation 2: no letters were lost, but a qualitative study after the survey revealed that many people in the study did not reply because they were hostile to immigrant groups
5. Types of missingness
- In the first case there is no reason to think the missing people differ from the others: this is Missing Completely at Random (MCAR)
- In the second case the missing people are likely to have quite different views: this is Missing Not at Random (MNAR)
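To make the two cases concrete, here is a small simulation sketch. The numbers are assumptions chosen for illustration and are not data from the survey. It takes 30% of the population to be hostile and sets the response probabilities so that both mechanisms give roughly a 45% response rate. Under MCAR the respondents' share of hostile people should be close to the true share. Under MNAR it should be far too low.

```python
import random

def simulate_response(n=100_000, p_hostile=0.3, seed=1):
    """Respondent-only estimate of the share hostile to immigrant groups
    under two hypothetical response mechanisms."""
    rng = random.Random(seed)
    hostile = [rng.random() < p_hostile for _ in range(n)]
    true_share = sum(hostile) / n

    # MCAR: everyone replies with probability 0.45, whatever their views
    # (like letters lost in the post at random).
    mcar = [h for h in hostile if rng.random() < 0.45]

    # MNAR: hostile people reply with probability 0.15, others with 0.58,
    # so response depends on the very thing being measured.
    mnar = [h for h in hostile if rng.random() < (0.15 if h else 0.58)]

    return true_share, sum(mcar) / len(mcar), sum(mnar) / len(mnar)

true_share, mcar_est, mnar_est = simulate_response()
print(f"true {true_share:.3f}  MCAR {mcar_est:.3f}  MNAR {mnar_est:.3f}")
```

Both mechanisms produce about the same response rate. Only the MNAR one misleads, and the response rate alone cannot tell them apart.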
6. An intermediate position
- Missing at Random (MAR)
- Assumes that, within groups we can identify in the survey, the missing people are just like the ones who reply
- The weighting and imputation methods that survey researchers use all make this assumption
- But you need good information about those who don't respond
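Here is a sketch of the MAR case, again with made-up numbers. Response depends only on an observed group, here a hypothetical young/old split whose population sizes are assumed known. It does not depend on the outcome itself. The raw respondent mean is biased. Combining the within-group respondent means using the population shares removes the bias, which is the idea behind the weighting methods on the following slides.

```python
import random

def simulate_mar(n=100_000, seed=2):
    """Response depends only on an observed group, not on the outcome (MAR)."""
    rng = random.Random(seed)
    # Hypothetical values: (population share, P(outcome), P(reply))
    groups = {"young": (0.4, 0.50, 0.25), "old": (0.6, 0.20, 0.65)}
    pop_count = {g: 0 for g in groups}
    replies = {g: [] for g in groups}
    outcome_total = 0
    for _ in range(n):
        g = "young" if rng.random() < groups["young"][0] else "old"
        _, p_outcome, p_reply = groups[g]
        y = 1 if rng.random() < p_outcome else 0
        pop_count[g] += 1
        outcome_total += y
        if rng.random() < p_reply:  # depends on g only, not on y
            replies[g].append(y)

    true_mean = outcome_total / n
    all_replies = replies["young"] + replies["old"]
    naive = sum(all_replies) / len(all_replies)
    # Within each group, respondents stand in for non-respondents, so weight
    # each group's respondent mean by the group's known population share.
    adjusted = sum(pop_count[g] / n * sum(r) / len(r) for g, r in replies.items())
    return true_mean, naive, adjusted

true_mean, naive, adjusted = simulate_mar()
print(f"true {true_mean:.3f}  naive {naive:.3f}  adjusted {adjusted:.3f}")
```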
7. Survey non-response is a world-wide problem
- Figure: refusal rates in major US surveys
8. Atrostic et al., Journal of Official Statistics: non-contact rates
9. So doing something about it has become important
- The most commonly used method for unit non-response is weighting
- Non-response weights can be calculated
  - From data available on the sampling frame
  - From another source of data for the population
- If it is the latter, it is often called POST-STRATIFICATION
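Variables on the sampling frame are known for non-respondents as well as respondents. When the weighting cells are defined by such variables, a simple non-response weight is the inverse of each cell's response rate. This is a minimal sketch that assumes an equal-probability sample; otherwise you would multiply by the design weight. The function name and the urban/rural cells are hypothetical.

```python
from collections import Counter

def frame_nonresponse_weights(sampled):
    """sampled: (cell, responded) pairs for everyone selected from the frame.
    Returns {cell: weight}, the inverse of the cell's response rate."""
    n_sampled = Counter(cell for cell, _ in sampled)
    n_responded = Counter(cell for cell, responded in sampled if responded)
    weights = {}
    for cell, n in n_sampled.items():
        if n_responded[cell] == 0:
            # No respondents to carry the weight: merge this cell with a neighbour.
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = n / n_responded[cell]
    return weights

# Hypothetical sample: 100 urban addresses (40 replied), 100 rural (70 replied).
sample = ([("urban", True)] * 40 + [("urban", False)] * 60
          + [("rural", True)] * 70 + [("rural", False)] * 30)
weights = frame_nonresponse_weights(sample)
print(weights)
```

The weighted respondents add back up to the 200 addresses sampled.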
10. An example: Ayr and Arran Health Survey
- Postal survey using the CHI (Community Health Index) as the sampling frame
- Response rate about 50%
- Can't be sure of the response rate, because "dead wood" (people on the list who have died or moved away) was not properly accounted for
- Population data available for data zones, by 5-year age group and sex
11. How to do it: simple case
- Age/sex groups only
- Make a table by age group and sex for both the Census data and the survey
- Use reasonably sized groups (more than 50 in the sample)
- Calculate the ratio of sample numbers to population numbers in each group (overall 1.5%, i.e. 0.015)
- The inverse of this ratio becomes the grossing-up weight
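Here is a sketch of the grossing-up calculation for post-stratification. The age/sex cells and counts are hypothetical, chosen so that the overall sampling ratio is 1.5% as on the slide. Each respondent's weight is the population count for their cell divided by the sample count, which is the inverse of the ratio.

```python
def grossing_up_weights(population, sample, min_cell=50):
    """population, sample: {(age_group, sex): count}.
    Returns {cell: weight}, where weight = N_cell / n_cell."""
    weights = {}
    for cell, n in sample.items():
        if n <= min_cell:
            raise ValueError(f"cell {cell} has only {n} cases; merge it with a neighbour")
        # The ratio of sample to population is n / N; the weight is its inverse.
        weights[cell] = population[cell] / n
    return weights

population = {("16-44", "F"): 20000, ("16-44", "M"): 20000,
              ("45+", "F"): 15000, ("45+", "M"): 15000}
sample = {("16-44", "F"): 300, ("16-44", "M"): 200,
          ("45+", "F"): 300, ("45+", "M"): 250}

weights = grossing_up_weights(population, sample)
overall_ratio = sum(sample.values()) / sum(population.values())  # about 0.015
print(weights, 1 / overall_ratio)
```

The weighted respondents reproduce the census total in every cell. The overall weight of about 67 (1/0.015) shows how large these weights become at such low ratios.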
12. Why such extreme weights here?
- The CHI is not a perfect sampling frame
- It has dead people on it, and people who have moved away
- We think that non-contacts were replaced
- We did have some data on all addresses used
14. Item non-response
- Ignore cases with missing data
  - Becomes problematic in regression models
- Use imputation to replace the missing values
  - Informed imputation
  - Hot deck imputation
  - Model-based imputation (can be multiple)
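Ignoring cases becomes problematic in regression because a case is dropped if any variable in the model is missing, so the losses compound. This quick simulation sketch assumes, hypothetically, six variables that are each missing independently 10% of the time. The expected share of complete cases is 0.9^6, about 53%.

```python
import random

def complete_case_share(n_cases=20_000, n_vars=6, p_missing=0.10, seed=3):
    """Share of cases with no missing values when each of n_vars variables
    is independently missing with probability p_missing (an assumption)."""
    rng = random.Random(seed)
    complete = sum(
        all(rng.random() >= p_missing for _ in range(n_vars))
        for _ in range(n_cases)
    )
    return complete / n_cases

share = complete_case_share()
print(f"simulated {share:.3f}, expected {0.9 ** 6:.3f}")
```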
15. Informed imputation
- Mainly used for sub-items when a total is needed
  - E.g. income, housing costs
- Often requires detailed examination of cases
  - E.g. finding benefit entitlement
  - Costs of a particular repair
- Survey-specific
16. Hot deck imputation
- Often used in census data
- Can be used for both unit and item non-response
- For unit non-response, a missing case is replaced with another one that matches it on whatever data are available
- For item non-response, a donor case is selected that is similar to the case with the missing item on the other things that are measured
- Can get very messy and lead to impossible records, such as pregnant men
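Here is a minimal hot deck sketch for item non-response. Donors are grouped into imputation classes by the variables given in `match_on`. Each missing value is then copied from a randomly chosen donor in the same class. The function and the tiny data set are hypothetical. Matching on sex is what stops a man receiving a "pregnant" value from a female donor.

```python
import random

def hot_deck(records, item, match_on, seed=0):
    """Fill None values of `item` by copying the value from a randomly chosen
    donor with identical values on every variable in `match_on`.
    Returns new records; the input list is not modified."""
    rng = random.Random(seed)
    donors = {}  # imputation class -> observed values of `item`
    for r in records:
        if r[item] is not None:
            donors.setdefault(tuple(r[v] for v in match_on), []).append(r[item])
    filled = []
    for r in records:
        r = dict(r)
        if r[item] is None:
            pool = donors.get(tuple(r[v] for v in match_on))
            if pool:  # with no donor in the class, leave the value missing
                r[item] = rng.choice(pool)
        filled.append(r)
    return filled

records = [
    {"sex": "F", "age": "16-44", "pregnant": "yes"},
    {"sex": "F", "age": "16-44", "pregnant": "no"},
    {"sex": "M", "age": "16-44", "pregnant": "no"},
    {"sex": "M", "age": "16-44", "pregnant": None},
    {"sex": "F", "age": "16-44", "pregnant": None},
]
filled = hot_deck(records, "pregnant", match_on=["sex", "age"])
```

With many matching variables the classes get small or empty. That is where real hot deck systems get messy and start relaxing the matching.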
17. Model-based imputation
- Assumes some statistical model for the data
  - For example, a multivariate normal distribution
- Start by replacing missing values by their means
- Fit the model, then replace the missing values with a draw from their predictive distribution given the data
- Do this repeatedly until the pattern stabilises
- You then have a complete data set to work with
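Here is a simplified sketch of the iterative scheme for one partly missing variable `y` and a fully observed `x`. It assumes a normal linear model for `y` given `x` and follows the steps on the slide: fill with the mean, fit, redraw the missing values from the fitted predictive distribution, and repeat. A full data augmentation algorithm also draws the model parameters at each step. This version plugs in point estimates instead, so it understates uncertainty somewhat. Treat it as an illustration rather than a substitute for PROC MI or similar. All names and data are hypothetical.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (len(xs) - 2))

def impute_normal(x, y, n_iter=20, seed=0):
    """x: fully observed values; y: values with None where missing.
    Returns one completed copy of y."""
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(y) if v is None]
    mean_obs = statistics.fmean([v for v in y if v is not None])
    y_cur = [mean_obs if v is None else v for v in y]  # start: mean fill
    for _ in range(n_iter):                            # repeat until stable
        a, b, s = fit_line(x, y_cur)                   # fit model to completed data
        for i in missing:                              # draw from predictive distribution
            y_cur[i] = rng.gauss(a + b * x[i], s)
    return y_cur

# Hypothetical data: y = 2 + 3x + noise, with every third y value deleted.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(2000)]
y = [None if i % 3 == 0 else 2 + 3 * xi + rng.gauss(0, 1) for i, xi in enumerate(x)]
completed = [impute_normal(x, y, seed=k) for k in range(5)]  # 5 imputations
```

Running with different seeds gives the multiple imputed data sets used on the next slide.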
18. It works surprisingly well
- Even when the data are categorical
- Just analysing a single imputed data set as if it were complete would give misleadingly high precision
- But there is an easy adjustment: run more than one imputation (usually 5) and add in a component for the variation between them
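This adjustment is usually called Rubin's rules. With m imputed data sets, the pooled estimate is the average of the m estimates. Its variance is T = W + (1 + 1/m)B, where W is the average within-imputation variance and B is the between-imputation variance of the estimates. Here is a minimal sketch; the five estimates and their variances are invented for illustration.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine m estimates and their variances (squared SEs), one from each
    imputed data set. Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance W
    b = statistics.variance(estimates)    # between-imputation variance B (divisor m - 1)
    return q_bar, w + (1 + 1 / m) * b     # total variance T

est, total_var = pool_rubin([10.0, 10.4, 9.8, 10.2, 10.1], [0.25] * 5)
print(est, total_var ** 0.5)
```

Here a single data set would report a standard error of 0.5. The pooled standard error is the square root of 0.31, about 0.56, which reflects the extra uncertainty due to imputation.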
19. It is accessible
- Theory and practice have been developed by Don Rubin and Joe Schafer
- Implemented in several programs
  - Including SAS PROC MI
- Once you have the multiple data sets, they can be analysed with PROC MIANALYZE
20. Summary
- Unit non-response
  - Weighting
  - Hot deck imputation
- Item non-response
  - Use available cases
  - Use imputation
  - Only time for a sketch of the latter