Title: Scale Development
Scale Development
- Chapter 5: Steps in Scale Development
Step 1: Determine What You Want to Measure
- Theory is key for clarity
- Ground the content of the scale in substantive theories on the construct of interest
- Limit the bounds of the construct so that it does not drift into unintended domains
- Specify a theoretical model to guide the scale's development
- Can be as simple as a well-formulated definition of the construct being measured
- Can be as involved as a description of how the new construct will relate to existing constructs
Step 1: Determine What You Want to Measure (continued)
- Specificity is key to clarity
- Constructs relate better to each other when they match in levels of specificity
- Do you want your measure to assess very specific behaviors or be a more global measure of the construct?
- Actively decide the level of specificity that is appropriate based on the intended use of the scale
- Areas to consider when deciding your scale's specificity: content domain, setting, population
Step 1: Determine What You Want to Measure (continued)
- Be clear about what to include
- Is your construct distinct from others?
- Does the measure match my goals for its use?
- Avoid using items that might cross over into a related construct
- Be cautious of similar items that may assess very different phenomena
- Know the frame of reference for and intended purpose of your scale
Step 2: Generate an Item Pool
- Create and select items with the specific measurement goal in mind
- Use your description of the scale's purpose to guide this process
- Each item is a test of the strength of the latent variable
- Make sure the thing items have in common is a construct, not merely a category
- Think creatively about the construct of interest
Step 2: Generate an Item Pool (continued)
- Be overinclusive and redundant
- Theoretical models that guide scale development are based on redundancy
- Content that is common across many items will aggregate, canceling out their irrelevant and idiosyncratic aspects
- Redundancy allows you to compare items and prefer one over another
- While redundancy is most prevalent in the initial item pool, some redundancy in the final item pool is desirable
Step 2: Generate an Item Pool (continued)
- How many items do you need?
- More than you plan to include in the final scale
- A large pool of items increases your chances of good internal consistency
- The initial pool can be three to four times larger than the final pool
Step 2: Generate an Item Pool (continued)
- Starting the writing process
- Focus less on quality and more on expressing relevant ideas
- Identify a variety of ways to state the central concept the scale is intended to measure
- Paraphrase the construct of interest
- Create additional statements that get at the same idea somewhat differently
- Seek alternative ways to express important ideas
- Write quickly and uncritically
- Be critical only after you have three to four times as many items as you need
Step 2: Generate an Item Pool (continued)
- Bad items
- Exceptionally lengthy
- Unnecessarily wordy
- Multiple negatives
- Double-barreled items
- Ambiguous pronoun references
- Misplaced modifiers
- Use adjective forms instead of noun forms
- Good items
- Unambiguous
- Target the appropriate reading level for the intended sample
Step 2: Generate an Item Pool (continued)
- Positively and negatively worded items
- Positively worded: items indicating high levels of the latent variable when endorsed
- Negatively worded: items indicating low levels of the latent variable when endorsed
- The purpose of including both in a scale is to avoid an acquiescence, affirmation, or agreement bias
- Can be confusing to respondents
- Reverse-worded items can perform poorly
Step 3: Determine a Response Format
- This step should occur at the same time you are generating items so the two are compatible
- Example response formats
- Thurstone scaling
- Items are developed to correspond to varying intensities of the attribute, spaced to represent equal intervals, and formatted using agree/disagree options
- Difficult to find items that consistently correspond to the intensities desired
- The practical problems with this method outweigh its advantages
Step 3: Determine a Response Format (continued)
- Example response formats (continued)
- Guttman scaling
- Items measure progressively higher levels of an attribute
- The individual endorses a block of contiguous items and then reaches a point where the level of the attribute measured by the items exceeds the level of the attribute possessed by the individual
- The highest item endorsed is the individual's level of the attribute
- Works well for objective information, where an affirmative response to one item implies endorsement of all lower items
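As a minimal sketch (not from the chapter), Guttman scoring can be expressed in a few lines of Python. The function name and the assumption of a perfect, error-free response pattern are both illustrative; real Guttman data contain scaling errors that this sketch does not handle.

```python
def guttman_score(responses):
    """Score a perfect Guttman pattern.

    responses: booleans ordered from the least to the most extreme item.
    Returns the highest contiguously endorsed item (1-based),
    or 0 if the first item is not endorsed.
    """
    score = 0
    for position, endorsed in enumerate(responses, start=1):
        if not endorsed:
            break  # the attribute level has been exceeded
        score = position
    return score

# A respondent who endorses the first two items but not the third
# is placed at level 2; later endorsements are treated as errors.
level = guttman_score([True, True, False, True])
```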
Step 3: Determine a Response Format (continued)
- Equally weighted items
- All items in the scale are viewed as equivalent detectors of the construct of interest
- They are imperfect indicators but can be aggregated into an acceptably reliable scale
- Allows for a variety of response options
- Provides the scale developer with latitude in creating a measure that is best suited for a particular purpose
Step 3: Determine a Response Format (continued)
- Optimum number of response categories
- Variability is important
- Have lots of items
- Have lots of response options within items
- Respondents must be able to meaningfully discriminate between options
- The ability to discriminate between options may depend on the specific wording or physical placement of the response options
- Consider the investigator's ability and willingness to record a large number of values for each item
Step 3: Determine a Response Format (continued)
- Optimum number of response categories (continued)
- Choosing an odd or even number of response options depends on the investigator's purpose
- Odd implies a central neutral point
- Even forces commitment in one direction
- Neither is superior to the other
Step 3: Determine a Response Format (continued)
- Types of response formats
- Likert scale
- The item is presented as a declarative statement, followed by response options
- Response options are worded so they represent roughly equal intervals of agreement
- Used most frequently to measure opinions, attitudes, and beliefs
- Must consider how strongly to word items in the initial item pool
Step 3: Determine a Response Format (continued)
- Types of response formats (continued)
- Semantic differential
- Used in reference to one or more stimuli, which are followed by a list of adjective pairs representing opposite ends of a continuum
- Adjectives can be bipolar or unipolar (depending on the intended purpose of the scale)
- Likert and semantic differential scales are compatible with the theoretical models explained in the book
Step 3: Determine a Response Format (continued)
- Types of response formats (continued)
- Visual analog scale
- A continuous line between a pair of descriptors representing opposite ends of a continuum
- The respondent marks a point on the line that represents what is being measured
- The investigator assigns scores to each point selected
- Disadvantage: marks at the same point may not mean the same thing to different individuals
- Advantages: very sensitive; useful for measuring a construct before and after some intervening event; prevents response bias with repeated measurements
Step 3: Determine a Response Format (continued)
- Types of response formats (continued)
- Binary options
- Responses to items sharing a common latent variable can be aggregated into a single score for that construct
- Disadvantage: each item has minimal variability, so you will need more items to obtain adequate scale variance
- Advantage: respondents are willing to complete more items
Step 3: Determine a Response Format (continued)
- Numerical response formats and neural processes
- Research has found that certain response options may correspond to how the brain processes numerical information
- In Likert scales, numbers arrayed in a sequence express quantity not only in their numerical value but also in their location
Step 3: Determine a Response Format (continued)
- Item time frames
- When formatting items, consider what time frame will be specified or implied by your scale
- Making no reference to a time frame implies a universal time perspective
- Choose the time frame actively rather than passively
- Use theory to guide your decision
Step 4: Have Experts Review the Item Pool
- Ask people who are knowledgeable in the content area to review your initial item pool
- Maximizes your content validity
- Confirms or invalidates your definition of the phenomenon
- Have them rate how relevant they think each item is to what you intend to measure
- Especially important if you are creating a measure that will consist of separate scales measuring multiple constructs
Step 4: Have Experts Review the Item Pool (continued)
- This step parallels hypothesis testing
- Hypothesis: your thoughts about what each item measures
- Data (confirming or disconfirming): your experts' responses
- How to do it
- Give them a working definition of the construct
- Ask them to rate the relevance of each item to the construct as you have defined it
- Ask for comments on individual items (e.g., clarity, conciseness, alternative wordings)
Step 4: Have Experts Review the Item Pool (continued)
- Experts can also offer alternative ways to measure the construct of interest
- The final decision to include or exclude items is your responsibility
- Experts may not understand principles of scale construction
- Attend to their suggestions, but make your own informed decisions about how to use their advice
Step 4: Have Experts Review the Item Pool (continued)
- Consider running a focus group
- Meet with a small group of individuals (5-10 people) to get detailed feedback on their opinions
- Gives you feedback from a sample that is similar to the sample you will eventually give the scale to
- Especially important if working with special samples (e.g., children, detainees, the elderly)
Step 4: Have Experts Review the Item Pool (continued)
- Things you might do in a focus group
- Identify difficult-to-read items and ask whether they are confusing or hard to read (checks reading level)
- Identify items you are unsure about and ask participants what the items mean to them (checks construct validity)
- Ask "How would you answer this statement?" and "Why would you answer it that way?"
- For each item, ask "Is this something (the sample you will eventually use) would say?"
Step 5: Inclusion of Validation Items
- Sometimes you may want to include items that will help determine the validity of the final scale
- Items that might detect flaws or problems
- Items that might detect social desirability
- You may also consider including separate, established validity measures rather than creating your own validity items
Step 6: Administer Items
- Administer your initial pool of items along with construct-related and validity items
- How many participants should you collect?
- Depends on the length of the scale
- Fewer items require fewer participants
- When the ratio of participants to items is low, correlations among items can be substantially influenced by chance factors
- Depends on how representative the development sample is
Step 6: Administer Items (continued)
- Possible nonrepresentativeness of the development sample
- The sample's level of the attribute may differ from that of the population for which the scale is intended
- The sample may be qualitatively different from the target population
- The underlying structure that emerges may be a quirk of the sample used in development
Step 7: Evaluate the Items
- The ultimate mark of quality is a high correlation with the true score of the latent variable
- We can make inferences about this relation by examining the correlations among items
- Higher correlations among items → higher individual item reliabilities
- More reliable individual items → a more reliable scale
- We therefore want the items to be highly intercorrelated in a correlation matrix
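As a minimal sketch of how an inter-item correlation matrix can be built (the data and function names here are illustrative, not from the chapter), Pearson correlations can be computed in plain Python:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def correlation_matrix(items):
    """items: one list of scores per item, respondents in the same order."""
    return [[pearson(a, b) for b in items] for a in items]

# Hypothetical responses from four people on three items; the third item
# is negatively worded and has not yet been reverse scored, so it shows
# a strong negative correlation with the others.
items = [[1, 2, 4, 5],
         [2, 2, 4, 4],
         [5, 4, 2, 1]]
matrix = correlation_matrix(items)
```

Scanning the matrix for items that correlate weakly (or negatively) with the rest is the first pass at spotting problem items.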
Step 7: Evaluate the Items (continued)
- Reverse-scored items: three ways to handle them
- Keep the verbal descriptors for the response options in the same order but reverse the numbers associated with the options
- Keep both the verbal descriptors and the numbers in the same order but enter different values at the time of data entry (error prone and tedious)
- Reverse-score the items electronically (easiest and least error prone)
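Electronic reverse scoring reduces, for a low..high response scale, to the formula (low + high) − raw. A minimal sketch (function name illustrative):

```python
def reverse_score(raw, low=1, high=5):
    """Electronically reverse-score a response on a low..high scale:
    the highest option maps to the lowest and vice versa."""
    return (low + high) - raw

# On a 1-5 Likert item, "strongly agree" (5) with a negatively worded
# statement becomes 1 after reversal, 4 becomes 2, and so on.
original = [5, 4, 3, 2, 1]
reversed_ = [reverse_score(r) for r in original]  # [1, 2, 3, 4, 5]
```

The same formula works for any range, e.g. `reverse_score(7, low=1, high=7)` returns 1 on a 7-point scale.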
Step 7: Evaluate the Items (continued)
- Item-scale correlations
- We want highly intercorrelated items, so each individual item should correlate substantially with the collection of remaining items
- Two types of item-scale correlations
- Corrected: correlates the item being evaluated with all other scale items, excluding itself
- Uncorrected: correlates the item being evaluated with all scale items, including itself
- Tells how representative the item is of the whole scale
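The corrected/uncorrected distinction can be made concrete in a short sketch (data and names are illustrative, not from the chapter). The uncorrected version is inflated because the item correlates with its own contribution to the total:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def item_total(items, index, corrected=True):
    """Correlate item `index` with the total of the scale items.

    Corrected: the item is excluded from the total.
    Uncorrected: the item is included in the total.
    """
    pool = [it for i, it in enumerate(items) if not corrected or i != index]
    totals = [sum(vals) for vals in zip(*pool)]
    return pearson(items[index], totals)

# Hypothetical data: item 2 fits poorly, so its corrected item-total
# correlation is much lower than its flattering uncorrected one.
items = [[1, 2, 3, 4],
         [1, 2, 3, 4],
         [4, 3, 1, 2]]
```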
Step 7: Evaluate the Items (continued)
- Item variances
- We want scale items with relatively high variances
- A development sample that is diverse with respect to the attribute of interest will provide a range of scores for any given item (i.e., good variance)
- Item means
- We want means close to the center of the range of possible scores
- Items with means too near the extremes will have low variances → poor correlations with other items
Step 7: Evaluate the Items (continued)
- Factor analysis
- Allows you to determine the nature of the latent variables underlying your items
- You need enough participants in the development sample to run a factor analysis
- Coefficient alpha
- An indicator of the scale's reliability, i.e., how successful you have been
- Ranges from 0.0 to 1.0; .70 is an acceptable lower bound
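Coefficient alpha is α = k/(k−1) · (1 − Σ item variances / variance of totals), where k is the number of items. A minimal, pure-Python sketch (function name illustrative):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Coefficient (Cronbach's) alpha.

    items: one list of scores per item, respondents in the same order.
    """
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]  # total score per respondent
    item_variances = sum(pvariance(vals) for vals in items)
    return k / (k - 1) * (1 - item_variances / pvariance(totals))

# Three perfectly parallel items yield alpha = 1.0;
# real item sets land somewhere below that.
alpha = cronbach_alpha([[1, 2, 3, 4]] * 3)
```

In practice this is what a reliability analysis in SPSS or similar software reports; the sketch just makes the formula explicit.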
Step 8: Optimize Scale Length
- Scale length affects reliability
- Alpha is influenced by the degree of covariation among items and by the number of items in the scale
- Holding the average interitem correlation constant, adding items increases alpha and removing items decreases it
- Shorter scales are less burdensome to participants
- Seek an optimal trade-off between brevity and reliability (shorten only when you have reliability to spare)
Step 8: Optimize Scale Length (continued)
- Dropping bad items
- Dropping items with sufficiently lower-than-average item correlations will raise alpha
- Retaining items with only slightly below-average correlations can actually keep alpha higher, because their contribution to scale length outweighs their weaker correlations
- Adjusting scale length
- Use reliability analyses (e.g., in SPSS) to decide
- Items whose omission has the least negative or most positive effect on alpha should be dropped first
- Items with the lowest item-scale correlations should be dropped first
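The "alpha if item deleted" statistic that reliability analyses report can be sketched directly (data and names illustrative, not from the chapter): recompute alpha with each item left out in turn, and treat items whose removal raises alpha as drop candidates.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Coefficient alpha for a list of per-item score lists."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]
    return k / (k - 1) * (1 - sum(pvariance(v) for v in items) / pvariance(totals))

def alpha_if_deleted(items):
    """Alpha recomputed with each item omitted in turn.
    Requires at least three items so k >= 2 after deletion."""
    return [cronbach_alpha([it for j, it in enumerate(items) if j != i])
            for i in range(len(items))]

# Two parallel items plus one unrelated item: deleting the unrelated
# item (index 2) restores alpha to 1.0.
items = [[1, 2, 3, 4],
         [1, 2, 3, 4],
         [4, 1, 3, 2]]
results = alpha_if_deleted(items)
```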
Step 8: Optimize Scale Length (continued)
- Adjusting scale length (continued)
- Communality (i.e., squared multiple correlation): the extent to which an item shares variance with the other items
- Items with low communality estimates should be dropped
- You should see convergence across these methods
- Also consider that the accuracy of alpha as an estimate of reliability increases with the number of items
Step 8: Optimize Scale Length (continued)
- Split samples
- A large development sample may be split into two subsamples
- The first sample is used to compute alpha, evaluate items, and adjust length, arriving at your final item set
- The second sample is used to replicate the findings
- Consistency across the two samples gives you confidence in your estimates
- Problems with this approach
- The samples are not separated by time
- Special conditions may have applied to data collection
- The longer scale was given to the first sample
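A random half-split of the development sample can be sketched as follows (function name and sample size illustrative); fixing the random seed keeps the split reproducible across analyses:

```python
import random

def split_sample(respondent_ids, seed=42):
    """Randomly split a development sample in half: derive the scale on
    one half, then replicate alpha and item statistics on the other."""
    rng = random.Random(seed)  # fixed seed => reproducible split
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical development sample of 300 respondents
derivation, replication = split_sample(range(300))
```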