Title: Survey Research Methods
2. Reliability, Validity and Scale Construction
Steve Fisher, Robert Andersen and Anthony Heath
Department of Sociology
University of Oxford
The Logic of Sampling Measurement
General Problem of Measurement: Reliability and Validity
- Reliability
  - Refers to the replicability of the measurement procedure: repeating it should yield consistent results
- Validity
  - Refers to the extent to which the measurement procedure actually measures the concept that it is intended to measure
Different aspects of reliability: test-retest, inter-observer, inter-item
- Test-retest reliability
  - Consistency of repeated measurements on the same subjects
  - Used for concepts that are believed to be stable
  - N.B. subjects may genuinely change between measurements, and the first measurement may condition answers to the second
- Inter-observer reliability
  - Repeated measures by different observers on the same subject
  - Especially important in coding open-ended questions
- Inter-item reliability
  - Do the items in a composite measure correlate highly?
  - Measured with Cronbach's alpha
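Test-retest and inter-observer reliability can both be summarised with simple statistics. The sketch below assumes test-retest reliability is summarised by the Pearson correlation between two waves, and uses Cohen's kappa (a common choice, not one named in these slides) for agreement between two coders of open-ended answers. All data and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for agreement expected by chance."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each category's marginal counts, summed, over n^2.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical test-retest data: the same five respondents asked twice.
wave1 = [3, 4, 2, 5, 4]
wave2 = [3, 5, 2, 4, 4]
print(round(pearson_r(wave1, wave2), 2))  # 0.81

# Hypothetical codes given by two coders to six open-ended answers.
coder_a = ["economy", "health", "economy", "crime", "health", "economy"]
coder_b = ["economy", "health", "crime", "crime", "health", "economy"]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.75
```

Raw percentage agreement here is 5/6, but kappa is lower (0.75) because some agreement would occur by chance alone.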
Types of Validity 1: Face and Content
- Face validity
  - Basically a subjective judgement of validity
  - Does it seem that we are measuring what we claim to measure?
- Content validity
  - Does the content of the measuring instrument cover the full domain of the concept?
  - e.g., measures of left/right ideology require items tapping different but related things such as redistribution, privatisation and government intervention
Types of Validity 2: Criterion-related Validity
- Criterion-related validity
  - Correlation with other measures known to be valid (e.g., questionnaire measures of turnout validated against electoral registers)
  - We must know that the criterion itself has been measured well
  - Appropriate criteria do not always exist
- Predictive validity
  - Does our measure predict expected outcomes?
  - e.g., attitudes to taxes can be validated by their ability to predict electoral support for a tax-cutting party. But other factors also influence voting behaviour, so this test is not clear-cut and is more theory-dependent
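Criterion-related validity is often summarised as a correlation between the measure and the criterion. A minimal sketch, assuming both survey-reported and register-validated turnout are coded 0/1, in which case the Pearson correlation reduces to the phi coefficient of the 2x2 table. The data and function names are hypothetical. The same correlation logic could be applied to predictive validity (e.g. an attitudes-to-taxes score against later vote choice), subject to the caveats above.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Pearson correlation for two 0/1 variables, computed from the 2x2 table."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))  # reported voting, register says voted
    b = pairs.count((1, 0))  # reported voting, register says did not vote
    c = pairs.count((0, 1))  # reported not voting, register says voted
    d = pairs.count((0, 0))  # reported not voting, register says did not vote
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def over_report_rate(reported, validated):
    """Share of register-validated non-voters who nonetheless reported voting."""
    claims = [r for r, v in zip(reported, validated) if v == 0]
    return sum(claims) / len(claims)


# Hypothetical data for ten respondents (1 = voted, 0 = did not vote).
reported = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]   # survey answers
validated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # checked against the register
print(round(phi_coefficient(reported, validated), 2))  # 0.65
print(over_report_rate(reported, validated))  # 0.4
```

In this made-up sample the survey overstates turnout (70% versus 50% on the register), the kind of discrepancy a criterion check is meant to reveal.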
Types of Validity 3: Construct Validity
- Based on a theoretical prediction about the relationship between the concept and other items
- Does the measured concept relate empirically to other measured variables in the ways that theory expects (i.e., does the measure yield the expected correlations)?
- Theory-laden and rather weak, but at least it can always be attempted
- N.B. the absence of an expected correlation may reflect a bad theory, or that another measure involved was badly measured
- If your measure is being used to test a theory, you cannot use the same theory to test the construct validity of the measure
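A construct-validity check amounts to comparing observed correlations with the directions a theory predicts. The sketch below is illustrative only: the scale scores, the other variables and the expected signs are all hypothetical, and checking only the sign of each correlation is a simplifying assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def check_expected_signs(score, other_vars, expected_signs):
    """Compare the sign of each observed correlation with the theoretical prediction."""
    results = {}
    for name, values in other_vars.items():
        r = pearson_r(score, values)
        matches = (r > 0) == (expected_signs[name] > 0)
        results[name] = (round(r, 2), matches)
    return results


# Hypothetical attitude-scale scores for six respondents.
scale_score = [2.0, 3.5, 4.0, 1.5, 3.0, 4.5]
# Hypothetical other measured variables for the same respondents.
other_vars = {
    "turnout": [0, 1, 1, 0, 1, 1],
    "political_interest": [2, 3, 4, 1, 3, 5],
    "cynicism": [4, 2, 2, 5, 3, 1],
}
# Theory-based predictions: +1 = positive correlation expected, -1 = negative.
expected_signs = {"turnout": +1, "political_interest": +1, "cynicism": -1}

results = check_expected_signs(scale_score, other_vars, expected_signs)
for name, (r, ok) in results.items():
    print(f"{name}: r = {r}, as expected: {ok}")
```

A mismatch would not by itself show the measure is invalid: the theory, or the other measure, may be at fault.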
Reliability and validity: some conclusions
- Reliability is relatively straightforward to demonstrate; validity is much more difficult and often theory-laden
- You don't necessarily have to demonstrate reliability and validity every time a measurement procedure is used
- Sometimes there will be existing measures which previous researchers have shown to be valid and reliable, and you can simply borrow these
  - There are remarkably few such measures in sociology and political science
  - See Heath, A. and J. Martin (1997) 'Why Are There so Few Formal Measuring Instruments in Social and Political Research?' in Lyberg et al. (eds.) Survey Measurement and Process Quality. New York: Wiley.
Advantages of Composite Measures
- Based on the idea of triangulation: several reliable measures improve validity
- Error scores tend to cancel out when we sum over items
- Gives greater variability in respondents' scores
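The claim that errors cancel out when items are summed can be illustrated with a small simulation. It assumes a classical test theory model (each item equals a person's true score plus an independent random error of the same size); the model, seed and sample sizes are illustrative assumptions, not part of the slides.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


random.seed(42)
n_people, n_items = 2000, 6

# Assumed model: every item = true score + independent error (both with SD 1).
true_scores = [random.gauss(0, 1) for _ in range(n_people)]
items = [[t + random.gauss(0, 1) for t in true_scores] for _ in range(n_items)]

# Composite score: the sum of the six items for each person.
composite = [sum(person) for person in zip(*items)]

r_single = pearson_r(items[0], true_scores)
r_composite = pearson_r(composite, true_scores)
print(f"single item r = {r_single:.2f}, six-item composite r = {r_composite:.2f}")
```

Under these assumptions the expected correlations with the true score are about 0.71 for one item and about 0.93 for the six-item sum, because the true-score part grows sixfold while independent errors only partly add up.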
Measuring abstract concepts: indices and scales
- Composite measures of an abstract concept
  - Add together scores assigned to several different measures of a construct
  - Assume an underlying continuum (i.e., there is structure in the data); scores represent specific points along the continuum
- Indices: individual measures can be distinct, but together they represent a larger abstract concept (e.g., the Consumer Price Index, the United Nations index of the best countries to live in)
- Scales: all indicators measure a single-dimensional concept (e.g., Likert scales measuring attitudes)
Measuring attitudinal intensity: Likert scales
- Evenly balanced response choices
- Mix the direction of questions to avoid response-set bias
- Standard response formats typically have five categories for each item, e.g.:
  - Strongly agree, agree, undecided, disagree, strongly disagree
  - Definitely like to definitely dislike
  - Very important to very unimportant
  - Definitely true to definitely false
- Assign scores to the response categories (1-5)
- Sum the items together (or average them if desired)
Using scales to measure political attitudes
- For each statement, please circle the category that best reflects your opinion:
  - I believe that I can help change the minds of public officials
  - Sometimes politics and government seem so complicated that a person like me can't really understand what is going on
  - People should vote in elections because each individual vote can make a difference
  - Generally, those elected to parliament soon lose touch with people
  - The government cares about what people like me think
  - I doubt that individual people like me could influence the platforms of political parties
- Each of these items is a Likert item with response categories ranging from strongly agree to strongly disagree
- Adapted from Gray, G. and N. Guppy (1999). Successful Surveys: Research Methods and Practice. Toronto: Harcourt Brace, p. 72.
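In the six statements above, agreeing with the second, fourth and sixth suggests a respondent feels less able to influence politics, while agreeing with the others suggests the opposite. That reading of the item directions is our assumption, not something the slide states. A minimal sketch of reverse-coding those items, assuming responses are coded 1 (strongly disagree) to 5 (strongly agree), so that a high scale score consistently means a stronger sense of influence:

```python
# Item numbers (1-based, in the order listed above) assumed to be negatively worded.
NEGATIVE_ITEMS = frozenset({2, 4, 6})


def reverse_code(value, low=1, high=5):
    """Flip a response so that 1 <-> 5, 2 <-> 4 and 3 stays 3."""
    return low + high - value


def attitude_scale(responses):
    """Average of the six items after reverse-coding the negatively worded ones."""
    recoded = [
        reverse_code(value) if item in NEGATIVE_ITEMS else value
        for item, value in enumerate(responses, start=1)
    ]
    return sum(recoded) / len(recoded)


# Hypothetical respondent: agrees with items 1, 3 and 5, disagrees with 2, 4 and 6.
print(round(attitude_scale([4, 2, 5, 1, 4, 2]), 2))  # 4.33
```

Without the recoding, this consistent respondent would get a middling average of 3.0, which is exactly the distortion that mixing question directions makes visible.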
Some other types of scales to consider
- Feeling thermometers
  - Scale of 0 (very cold) to 100 (very warm)
  - Often used to determine how people feel about certain groups (used in AES for
- Bogardus Social Distance Scale
  - Measures the distance separating ethnic or other groups from each other
- Semantic Differential Scales
  - Measures subjective feelings using polar-opposite adjectives (e.g., light/dark, deep/shallow, modern/traditional, bad/good)
An example from the BSA: measuring income satisfaction
- It is sometimes useful to give more detailed response categories
  - e.g., "Which of the phrases on this card would you say comes closest to your feelings about your household's income these days?"
    - Living comfortably on present income
    - Coping on present income
    - Finding it difficult on present income
    - Finding it very difficult on present income
    - Other answer (WRITE IN)
    - (Don't know)
- Feel free to be creative
Constructing Scales
- Choose items
- Check for dimensionality using factor analysis
- Compute a scale (i.e., sum the scores on the items, recoding if necessary)
- Check the reliability of the scale using Cronbach's alpha and bivariate correlations
- Make adjustments and test reliability again if necessary
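The reliability-checking step can be sketched with the usual formula for Cronbach's alpha, k/(k - 1) x (1 - sum of item variances / variance of the total score). Assumptions: items have already been chosen and recoded to run in the same direction; "bivariate correlations" is interpreted here as corrected item-total correlations (each item against the sum of the other items); the data and function names are hypothetical. The factor-analysis step is not shown.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def cronbach_alpha(rows):
    """rows: one list of (already recoded) item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_item_total(rows):
    """Correlation of each item with the sum of the other items."""
    results = []
    for i, col in enumerate(zip(*rows)):
        rest = [sum(row) - row[i] for row in rows]
        results.append(pearson_r(col, rest))
    return results


# Hypothetical responses: six respondents x four items, scored 1-5.
data = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]
print(round(cronbach_alpha(data), 2))  # 0.95
print([round(r, 2) for r in corrected_item_total(data)])
```

An item with a low or negative item-total correlation is a candidate for recoding or removal; after any such adjustment, alpha would be recomputed.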
How big an alpha?
- Early stages of research: > 0.7
- Individual-level comparisons: > 0.9
- Even with 0.9, the standard error of measurement is almost a third of the standard deviation of test scores (Nunnally and Bernstein, 1994)
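The "almost a third" figure follows from the classical test theory formula for the standard error of measurement, SEM = SD x sqrt(1 - reliability), treating alpha as the reliability estimate. A small sketch (the function name is ours):

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# SEM expressed as a fraction of the test-score standard deviation (sd = 1).
for alpha in (0.7, 0.8, 0.9):
    print(alpha, round(standard_error_of_measurement(1.0, alpha), 2))
# 0.7 0.55
# 0.8 0.45
# 0.9 0.32
```

So even at alpha = 0.9 an individual's observed score carries an error of about 0.32 standard deviations, which is why higher thresholds are suggested for comparing individuals.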
Group work
- Spend the next session working in groups in Seminar Room A or B
- The aim is to design a set of questions that can be used to build a valid and reliable scale for your concept
- Bring a draft questionnaire with you next week