Title: Questionnaire Development, Survey Methods, Sampling Fundamentals
1Questionnaire Development, Survey Methods, Sampling Fundamentals
- AE B37 - Week 2 15 January 2002 MM
2Questionnaire Development and Survey Methods
- Further readings
- Churchill, Iacobucci Chapters 7, 8
- Malhotra Chapter 6, 10
- Aaker et al. Chapter 10, 12
3Sampling methods
- Further readings
- Churchill, Iacobucci Chapters 10-11
- Malhotra Chapter 11-12
- Aaker et al. Chapter 14-15
4Questionnaire design
- A questionnaire is a formalised set of questions for obtaining information from respondents
- Questions must translate the needed information
- Questionnaire must encourage cooperation
- Questionnaire should minimise response error
- Response errors: inaccurate / misrecorded / misanalysed answers (due to the researcher, to the interviewer or to the respondents)
5Questionnaire Design Process
- Specify the information needed
- Specify the type of interview method
- Determine the content of individual questions
- Design the questions to overcome any respondent inability/unwillingness to answer
- Decide on question structure
- Determine the question wording
- Arrange the questions in proper order
- Identify the form and layout
- Reproduce the questionnaire
- Eliminate bugs by pretesting
6 1. Specify the information needed
- This relates to the formulated MR problem
- Before designing the actual questionnaire, it can be helpful to define a blank table where the desired data will be stored (e.g. an Excel spreadsheet)
- It is important to specify the needed information with a clear idea of the target population (different types of respondents)
7 2. Type of interview (survey methods)
- The questionnaire depends strictly on the survey method
Source: Malhotra (1999)
8Telephone interviews
- Traditional interviewing (a phone, a pencil and a questionnaire)
- Computer Assisted Telephone Interviewing (CATI): computerised questionnaire administered to respondents over the phone
- It is expensive (at least 700-1000 interviews to justify costs)
- Not very suitable for open questions
- Interviews should be short (10-15 minutes)
- Use of stimuli is not possible
- Software checks for consistency and completeness
- Reduces interviewer errors
- May control sampling procedures
- Measures quality parameters (e.g. duration)
- Data are ready to use
9Personal interviews
- In-home
- Mall-Intercept
- Computer-Assisted (CAPI) with interviewer
- Personal contact with interviewer
- Highly Expensive
- Interviewer influence/bias
- Wariness of respondents
- Cheaper
- Easy use of stimuli
- High response rate
- Difficulties in obtaining sensitive information (no anonymity)
- High social desirability
- Increased involvement of respondent
- On-screen and off-screen stimuli
- Limited sampling control
- Slower (but time perception varies)
10Mail surveys
- Mail interviews (Fax just for businesses)
- Mail panels
- Cheap
- Optimal for sensitive questions/anonymity/social desirability
- No interviewer bias
- Very low response rate
- Selection bias / low sample control
- Very slow
- Allow for longitudinal (time comparison) design
- Higher response rate
- Higher sample control
- More expensive
- Low control of data collection environment
11Electronic interviews
- E-mail (ASCII/text message)
- Web-based (HTML/Java)
- Very cheap
- Quick
- No interviewer bias
- Require data entry before analysis
- Currently it is impossible to use logic checks/randomisation (clients)
- Low quality of data
- Low sample control
- Low response rate (and decreasing)
- Allow for stimuli
- Logic/consistency checks (CAWI)
- Higher sample control
- Anonymity/Sensitive questions (?)
- (Very) low sample control
- Selection bias
- Problems in compiling lists
- Even lower response rate
12Response and costs
13 3. Determine the content of individual questions
- Is the question necessary?
- Unnecessary questions should be eliminated, unless they serve other purposes (involvement, disguising the purpose or sponsorship, etc.)
- Is a single question sufficient?
- Do you think that organic products are healthier and animal-friendlier?
- What does a no-answer mean?
- Why do you eat Sainsbury pizza?
- Potentially different interpretations: because the cheese is better, or because Sainsbury is closer to my place (attributes or knowledge of it)?
- What do you like about Sainsbury pizza as compared to other pizzas?
- Why did you first buy Sainsbury pizza?
14 4. Overcoming problems in answering
- It is necessary to consider any factor that might lead to an unanswered question or an inaccurate answer
- Lack of information (do they answer anyway?)
- What did you eat as a dessert for Easter?
- Lack of memory (avoid omission, telescoping or creation effects)
- Incapacity to articulate certain responses
- For certain vague questions, multiple choice is preferable
- An unanswered question due to incapacity may lead the respondent to abandon the questionnaire
- Unwillingness to answer (sensitive information, too much effort, the question/context is perceived as inappropriate)
15Techniques to get sensitive questions answered
- Hide the question among a group of innocent questions
- State that the behaviour of interest is common, or stress the usefulness of an answer
- Use the third-person technique
- Provide categories instead of asking for figures
- Use randomised techniques (but you lose any linkage with other questions)
16Randomised techniques
- Please flip a coin.
- If you get a head, please answer question A; if you get a tail, please answer question B.
- A. Are you enjoying this lecture?
- B. Are you female?
- YES / NO
17Interpretation of randomised questions
- We got the following results for the question
- YES 20% NO 80%
- We know that 38% of our respondents are female and 62% are male
- We know that the probability of getting a head or a tail is 50%
18Results
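The content of this slide did not survive extraction. As a hedged sketch, the figures on the previous slide can be combined as follows, assuming a fair coin, truthful answers, and that question A is the lecture question and question B the gender question (an assumption based on the order in which they are listed). The function and variable names are illustrative and not part of the original.

```python
def randomised_response_estimate(p_yes_overall, p_yes_known, p_ask_a=0.5):
    """Estimate the share answering YES to question A (the question of interest).

    Overall YES rate = p_ask_a * P(yes | A) + (1 - p_ask_a) * P(yes | B),
    where P(yes | B) is known in advance (here: the share of females).
    Solving for P(yes | A) gives the estimate returned below.
    """
    p_ask_b = 1.0 - p_ask_a
    return (p_yes_overall - p_ask_b * p_yes_known) / p_ask_a


# Figures from the slide: 20% YES overall, 38% female, fair coin (50%).
estimate = randomised_response_estimate(0.20, 0.38, 0.5)
print(f"Estimated share enjoying the lecture: {estimate:.2%}")
```

With the slide's figures, 0.20 = 0.5 × P(yes | A) + 0.5 × 0.38, so the estimated share answering YES to question A is about 2%.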
19 5. Choosing question structure
- Unstructured questions (open-ended, free response)
- Good as first questions on a topic
- Less biasing influence (but interviewer bias)
- Coding of responses is costly and time-consuming
- Structured questions
- Multiple choice (A, B or C?): order bias
- Dichotomous (Yes or No, Don't know): question wording bias
- Scales (from 1 to 10)
20Primary scales
- Nominal (Are you employed/not employed/a student?)
- Ordinal (Order the following brands according to your preferences)
- Interval (What is the temperature today?)
- Differences can be compared
- The 0 point is arbitrary
- Ratio (What were last year's sales?)
- The 0 point is not arbitrary
21Secondary scales
Source: Malhotra (1999)
22Itemised rating scales
- Likert scale: This cheese is soft
- 1. Strongly disagree 2. Disagree 3. Neither 4. Agree 5. Strongly agree
- Semantic differential: This cheese is
- Soft versus Hard (bipolar endpoints)
- Stapel scale: This cheese is soft
- -5 -4 -3 -2 -1 +1 +2 +3 +4 +5
- Easy, suitable for any survey method
- Slower, read all statements
- Allow respondents to express intensity
- Relevance of positive, negative or neutral phrasing
23 6. Wording
- Define the issue
- Use ordinary words
- Avoid ambiguous words (e.g. usually, a bit)
- Avoid leading questions (suggesting the answer)
- Avoid implicit alternatives (Do you like to drive?)
- Avoid implicit assumptions (Are you in favour of multiple choice tests? If this reduces the likelihood of top marks?)
- Avoid generalisations and estimates (How much do you spend on food every year?)
- Use positive and negative statements (it is advisable to use dual statements for different respondents, e.g. Is this cheese soft? Is this cheese hard?)
24 7. Order of questions
- Use good opening questions
- Ask for basic information (target variables) first
- Ask classification and identification questions at the end
- Place difficult and sensitive questions towards the end
- General questions should precede specific questions
- Follow a logical order (flow chart)
25 8. Form and Layout
- Check position of questions on the page
- No use of different colours (little effect, more complicated)
- Divide questionnaire into parts
- Number questions
- Number questionnaires (but risk of loss of anonymity)
26 9. Reproduction of the questionnaire
- Quality of paper
- Professional appearance
- Avoid splitting questions across pages
27 10. Pretesting
- Test the questionnaire in advance on a small number of respondents, considering all previous issues. Any questionnaire can be improved.
- Better by personal interview (regardless of the actual survey method; a second pretest may be carried out for some methods)
- Use a variety of interviewers for personal interviews
- Respondent is asked to think aloud
- Debriefing (go through the questionnaire with the respondent after they have finished completing it)
28Sampling
- A sample is a subgroup of the population selected for the study
- Sample statistics allow inference about the population parameters, through estimation and hypothesis testing
29The sampling design process
- Define the target population, its elements and the sampling units
- Determine the sampling frame (list)
- Select a sampling technique
- Sampling with/without replacement
- Probability/Nonprobability sampling
- Determine the sample size
- Precision versus costs
- The marginal value, in terms of precision, of additional sampling units is decreasing
- Execute the sampling process
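The decreasing marginal value of additional sampling units can be illustrated with the usual standard error of the mean under simple random sampling, s divided by the square root of n. This is a minimal sketch: the standard deviation of 4 and the sample sizes are hypothetical values chosen for illustration, and the finite population correction is ignored.

```python
import math


def standard_error(s, n):
    """Standard error of the mean under SRS, ignoring the finite population correction."""
    return s / math.sqrt(n)


s = 4.0                       # hypothetical standard deviation
sizes = [100, 200, 300, 400]  # hypothetical sample sizes
errors = [standard_error(s, n) for n in sizes]

# Reduction in standard error obtained from each additional block of 100 units
gains = [errors[i] - errors[i + 1] for i in range(len(errors) - 1)]

for n, se in zip(sizes, errors):
    print(f"n = {n:4d}  standard error = {se:.3f}")
print("Reduction per extra 100 units:", [round(g, 3) for g in gains])
```

Each extra block of 100 units costs roughly the same but reduces the standard error by less than the previous block did, which is the precision versus costs trade-off.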
30The sampling techniques
- Nonprobabilistic samples
- Convenience sampling
- Judgmental sampling
- Quota sampling
- Snowball sampling
- Probabilistic samples
- Simple random sampling
- Systematic sampling
- Stratified sampling
- Cluster sampling
- Other sampling techniques
31Representativeness
- A sample can be considered as representative
when it is expected to exhibit the average
properties of the population
32Selection bias
- Improper selection of sample units (ignoring a relevant control variable, which generates bias), so that the values observed in the sample are biased and the sample is not representative.
- Example
- A survey is conducted to measure goat milk consumption, but the interviewers select only people in urban areas, who on average drink less goat milk.
33Convenience sampling
- Only convenient elements enter the sample
- Cheapest method
- Quickest method
- Selection bias
- Non representativeness
- Inference is not possible
34Judgmental sampling
- Selection based on the judgment of the researcher
- Non representativeness
- Inference is not possible
- Subjective
35Quota sampling
- Define control categories (quotas) for the population elements, such as sex, age
- Apply a restricted judgmental sampling, so that quotas in the sample are the same as those in the population
- Cheapest method
- Quickest method
- There is no guarantee that the sample is representative (relevance of the control characteristics chosen)
- Many sources of selection bias
- No assessment of sampling error
36Snowball sampling
- A first small sample is selected randomly
- Respondents are asked to identify others who belong to the population of interest
- The referrals will have demographic and psychographic characteristics similar to the referrers
- Lower costs
- Low variability
- Useful for rare populations
- Inference is not possible
37Simple random sampling
- Each element of the population has a known and equal probability of selection
- Every element is selected independently of other elements
- The probability of selecting a given sample of n elements is computable (known)
- Statistical inference is possible
- It is easily understood
- Representative samples are large and expensive
- Standard errors are larger than in other probabilistic sampling techniques
- Sometimes it is difficult to execute a truly random sampling
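A minimal sketch of simple random sampling, assuming the sampling frame is available as a list of unit identifiers and that sampling is done without replacement. The frame, the seed and the function names are hypothetical and only serve the illustration.

```python
import math
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    return rng.sample(frame, n)


def probability_of_given_sample(N, n):
    """Probability of one specific set of n units out of N (without replacement)."""
    return 1 / math.comb(N, n)


frame = list(range(1, 1001))  # hypothetical frame: unit IDs 1..1000
sample = simple_random_sample(frame, 50, seed=42)
print(sorted(sample)[:5])
print(probability_of_given_sample(10, 3))  # 1 / 120
```

The second function shows why the probability of a given sample is computable: every set of n units out of N is equally likely, so each has probability one over the number of possible sets.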
38Systematic sampling
- A list of N elements in the population is compiled, ordered according to a specified variable
- Unrelated to the target variable (similar to SRS)
- Related to the target variable (increased representativeness)
- A sample size n is chosen
- A systematic step of k = N/n is set
- A random number s between 1 and k is extracted and represents the first element to be included
- Then the other elements selected are s+k, s+2k, s+3k, ...
- Cheaper and easier than SRS
- More representative if order is related to the interest variable (monotone)
- Sampling frame not always necessary
- Less representative (biased) if the order is cyclical
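The selection steps for systematic sampling can be sketched as follows. This is an illustration under stated assumptions: the frame is an ordered list, the step is taken as N/n rounded down, and the random start is drawn between 1 and k so that every selected position stays inside the list. The function name and the frame are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every k-th unit from an ordered frame after a random start."""
    N = len(frame)
    k = N // n                 # sampling step, N/n rounded down
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    s = rng.randint(1, k)      # random start in 1..k (1-based position)
    # 1-based positions s, s+k, s+2k, ... converted to 0-based list indices
    return [frame[s - 1 + j * k] for j in range(n)]


frame = list(range(1, 101))    # hypothetical ordered frame with N = 100 units
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

Only one random number is needed, which is why the method is cheaper than SRS. It is also why a cyclical ordering with a period related to k can bias the sample.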
39Stratified sampling
- Population is partitioned into strata through control variables (stratification variables), closely related to the target variable, so that there is homogeneity within each stratum and heterogeneity between strata
- Simple random sampling is applied within each stratum of the population
- Proportionate sampling: the size of the sample from each stratum is proportional to the relative size of the stratum in the total population
- Disproportionate sampling: the size is also proportional to the standard deviation of the target variable in each stratum
- Gains in precision
- Includes all relevant subpopulations, even if small
- Stratification variables may not be easily identifiable
- Stratification can be expensive
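The two allocation rules for stratified sampling can be sketched as below. The strata, their sizes and their standard deviations are hypothetical. The disproportionate rule is written here as allocation proportional to stratum size times stratum standard deviation, which is one common reading of the slide and is an assumption. Results are left unrounded.

```python
def proportionate_allocation(n, sizes):
    """Allocate n units to strata in proportion to stratum size."""
    total = sum(sizes.values())
    return {h: n * N_h / total for h, N_h in sizes.items()}


def disproportionate_allocation(n, sizes, sds):
    """Allocate n units in proportion to stratum size times stratum standard deviation."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


# Hypothetical strata: sizes and standard deviations of the target variable
sizes = {"undergraduates": 8000, "postgraduates": 2000}
sds = {"undergraduates": 2.0, "postgraduates": 6.0}

prop = proportionate_allocation(100, sizes)
disp = disproportionate_allocation(100, sizes, sds)
print(prop)
print({h: round(v, 1) for h, v in disp.items()})
```

In this made-up case the smaller but more variable stratum receives a larger share under the disproportionate rule than under the proportionate one.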
40Cluster sampling
- The population is partitioned into clusters
- Elements within the cluster should be as heterogeneous as possible with respect to the variable of interest (e.g. area sampling)
- 1. A random sample of clusters is extracted through SRS (with probability proportional to the cluster size)
- 2a. All the elements of the cluster are selected (one-stage)
- 2b. A probabilistic sample is extracted from the cluster (two-stage cluster sampling)
- Reduced costs
- Higher feasibility
- Less precision
- Inference can be difficult
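A minimal sketch of one-stage and two-stage cluster sampling. For simplicity the clusters are drawn here with equal probability, not with probability proportional to size as the slide suggests, so this is a simplified variant. The cluster names, unit labels and function name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Draw clusters by SRS, then take all units (one-stage) or an SRS of units (two-stage)."""
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        units = clusters[name]
        if n_per_cluster is None:
            result[name] = list(units)  # one-stage: every unit in the cluster
        else:
            size = min(n_per_cluster, len(units))
            result[name] = rng.sample(units, size)  # two-stage: SRS within the cluster
    return result


# Hypothetical clusters (e.g. areas) with their units
clusters = {
    "area_A": ["a1", "a2", "a3", "a4"],
    "area_B": ["b1", "b2", "b3", "b4", "b5"],
    "area_C": ["c1", "c2", "c3"],
    "area_D": ["d1", "d2", "d3", "d4"],
}

one_stage = cluster_sample(clusters, 2, seed=7)
two_stage = cluster_sample(clusters, 2, n_per_cluster=2, seed=7)
print(one_stage)
print(two_stage)
```

Only the selected clusters need to be listed and visited, which is where the reduced costs come from.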
41Basic SRS sample statistics (unknown pop. variance)
[Table, formulas not preserved: columns for the mean case and the proportion case (p); rows for the standard deviation of X and the standard error of the mean/proportion; annotation: accuracy]
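The formulas on this slide were not preserved. The standard textbook estimators for the two cases are given below as a hedged reconstruction; the symbols $s_X$, $s_{\bar{x}}$ and $s_p$ are assumed notation, and some texts divide by $n-1$ in the proportion case.

```latex
\begin{align*}
\text{Mean case:} \quad
  & \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
    s_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
    s_{\bar{x}} = \frac{s_X}{\sqrt{n}} \\
\text{Proportion case:} \quad
  & s_X = \sqrt{p(1-p)}, \qquad
    s_p = \sqrt{\frac{p(1-p)}{n}}
\end{align*}
```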
42Finite population correction factor
- For large samples (more than 10% of N), the uncorrected formula tends to overestimate the standard error of the mean (proportion)
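The correction itself is not shown on the slide. A common form of the finite population correction factor is the square root of (N - n)/(N - 1), which multiplies the usual standard error. The sketch below assumes that form; the sample size of 2,000 and the standard deviation of 4 are hypothetical values chosen so that n exceeds 10% of N.

```python
import math


def fpc(N, n):
    """Finite population correction factor, assuming the form sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def corrected_standard_error(s, n, N):
    """Standard error of the mean under SRS with the finite population correction."""
    return s / math.sqrt(n) * fpc(N, n)


N = 13151  # population size used in the example later in the deck
n = 2000   # hypothetical sample size, above 10% of N
s = 4.0    # hypothetical standard deviation

uncorrected = s / math.sqrt(n)
corrected = corrected_standard_error(s, n, N)
print(f"fpc = {fpc(N, n):.4f}")
print(f"uncorrected = {uncorrected:.4f}, corrected = {corrected:.4f}")
```

The factor is always below 1 when n is greater than 1, so the corrected standard error is smaller, and the difference becomes noticeable once the sample is a sizeable share of the population.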
43Level of confidence $\alpha$ and z parameter
The level of confidence $\alpha$ refers to the likelihood that the true population mean falls in the identified confidence interval
For the normal distribution (which applies to SRS with a good sample size), given a value of $\alpha$, the corresponding $z_{\alpha/2}$ value is tabulated
[Figure: normal distribution of x with two areas labelled $\alpha/2$]
$\alpha = 0.95$, $z_{\alpha/2} = 1.96$
Confidence interval for x at a level of confidence $\alpha$
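The interval formula was not preserved on the slide. A standard form is the sample mean plus or minus $z_{\alpha/2}$ times the standard error, and the sketch below assumes it. It follows the deck's convention that $\alpha$ denotes the level of confidence (0.95), and it obtains the z value from Python's standard library instead of a table. The sample mean of 5.0 and the standard error of 0.25 are hypothetical.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided z value for a given level of confidence (e.g. 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(mean, standard_error, confidence=0.95):
    """Interval mean +/- z * standard_error, assuming a normal sampling distribution."""
    z = z_value(confidence)
    return mean - z * standard_error, mean + z * standard_error


z = z_value(0.95)
low, high = confidence_interval(5.0, 0.25)  # hypothetical mean and standard error
print(f"z = {z:.2f}")
print(f"interval = ({low:.2f}, {high:.2f})")
```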
44Determining sample size
- Factors influencing sample size (n)
- Size of the population (N)
- Variability of the population ($s_X$)
- Desired level of accuracy (q)
- Level of confidence ($\alpha$)
- Budget constraint
45An example
- Our aim is to estimate the average weekly consumption of beer in pints per student (x) in the University Student Union
- We don't know the population variability, but we may roughly assume a large population standard deviation (s) of 4 pints
- We want to estimate the value with an accuracy (q) of 0.5 pints
- The target population (students at the University of Reading) has 13,151 units (N)
- We want to determine the sample size for a Simple Random Sampling, choosing a level of confidence of $\alpha = 0.95$ ($z_{\alpha/2} = 1.96$)
46Sample size
Our example
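The sample size formula and the worked result on this slide were not preserved. As a hedged sketch, the calculation can be reproduced by assuming that the accuracy q is the half-width of the confidence interval, so that q equals z times the standard error, with the finite population correction taken as the square root of (N - n)/(N - 1). The function name is illustrative and the original slide may have used a different variant.

```python
import math


def srs_sample_size(s, q, z, N=None):
    """Sample size for estimating a mean under SRS with half-width q.

    Without N: n0 = (z * s / q) ** 2.
    With N:    solves q = z * (s / sqrt(n)) * sqrt((N - n) / (N - 1)) for n,
               which gives n = n0 * N / (N + n0 - 1).
    """
    n0 = (z * s / q) ** 2
    if N is None:
        return math.ceil(n0)
    return math.ceil(n0 * N / (N + n0 - 1))


# Figures from the example: s = 4 pints, q = 0.5 pints, z = 1.96, N = 13,151
n_infinite = srs_sample_size(4, 0.5, 1.96)
n_finite = srs_sample_size(4, 0.5, 1.96, N=13151)
print(n_infinite, n_finite)
```

Under these assumptions the example needs about 246 students without the correction and 242 with it, so the population size makes little difference here.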
47Have a look at the assignments