Title: designing a questionnaire
1designing a questionnaire
2Objectives in designing questionnaires
- The main objectives are
- To maximise the proportion of subjects answering our questionnaire - that is, the response rate.
- To obtain accurate relevant information for our survey.
- In order to obtain accurate relevant information, we have to give some thought to what questions we ask, how we ask them, the order we ask them in, and the general layout of the questionnaire.
- To maximise our response rate, we have to consider carefully how we administer the questionnaire, establish rapport, explain the purpose of the survey, and remind those who have not responded. The length of the questionnaire should be appropriate.
3Deciding what to ask
- There are three potential types of information
- (1) Information we are primarily interested in - that is, dependent variables.
- (2) Information which might explain the dependent variables - that is, independent variables.
- (3) Other factors related to both dependent and independent factors which may distort the results and have to be adjusted for - that is, confounding variables.
4Qualities of a Good Question
- Evokes the truth.
- Questions must be non-threatening.
- When a respondent is concerned about the
consequences of answering a question in a
particular manner, there is a good possibility
that the answer will not be truthful. - Anonymous questionnaires that contain no
identifying information are more likely to
produce honest responses than those identifying
the respondent. - If your questionnaire does contain sensitive
items, be sure to clearly state your policy on
confidentiality.
5- Asks for an answer on only one dimension or only
one piece of information at a time (NO DOUBLE
BARREL QUESTIONS) - The purpose of a survey is to find out
information. A question that asks for a response
on more than one dimension will not provide the
information you are seeking. - For example, a researcher investigating a new
food snack asks - "Do you like the texture and flavor of the
snack?" - If a respondent answers "no", then the
researcher will not know if the respondent
dislikes the texture or the flavor, or both.
6- Another question asks, "Were you satisfied with
the quality of our food and service?" - Again, if the respondent answers "no", there is
no way to know whether the quality of the food,
service, or both were unsatisfactory. - A good question asks for only one "bit" of
information. - Another example, "Please rate the lecture in
terms of its content and presentation" asks for
two pieces of information at the same time. It
should be divided into two parts - "Please rate the lecture in terms of (a) its
content, (b) its presentation."
7- Can accommodate all possible answers.
- Multiple choice items are the most popular type
of survey questions because they are generally
the easiest for a respondent to answer and the
easiest to analyze. - Asking a question that does not accommodate all
possible responses can confuse and frustrate the
respondent. For example, consider the question
8- What brand of computer do you own? __
- A. IBM PC
- B. Apple
- Clearly, there are many problems with this
question. What if the respondent doesn't own a
microcomputer? What if he owns a different brand
of computer? What if he owns both an IBM PC and
an Apple? There are two ways to correct this kind
of problem.
9- The first way is to make each response a separate
dichotomous item on the questionnaire. For
example - Do you own an IBM PC? (circle Yes or No)
- Do you own an Apple computer? (circle Yes or No)
10- Another way to correct the problem is to add the
necessary response categories and allow multiple
responses. This is the preferable method because
it provides more information than the previous
method.
- What brand of computer do you own? (Check all that apply)
- __ Do not own a computer
- __ IBM PC
- __ Apple
- __ Other
11- Has mutually exclusive options.
- A good question leaves no ambiguity in the mind
of the respondent. There should be only one
correct or appropriate choice for the respondent
to make. An obvious example is
- Where did you grow up? __
- A. country
- B. farm
- C. city
- A person who grew up on a farm in the country
would not know whether to select choice A or B.
This question would not provide meaningful
information. Worse than that, it could frustrate
the respondent and the questionnaire might find
its way to the trash.
12- Produces variability of responses.
- When a question produces no variability in
responses, we are left with considerable
uncertainty about why we asked the question and
what we learned from the information. - If a question does not produce variability in
responses, it will not be possible to perform any
statistical analyses on the item. For example
- What do you think about this report? __
- A. It's the worst report I've read
- B. It's somewhere between the worst and best
- C. It's the best report I've read
13- Since almost all responses would be choice B, very little information is learned.
- Design your questions so they are sensitive to
differences between respondents. As another
example - Are you against drug abuse? (circle Yes or No)
- Again, there would be very little variability in
responses and we'd be left wondering why we asked
the question in the first place.
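A pilot run can flag items like these before the main survey. The following is a minimal Python sketch, assuming answers are stored as a list with `None` for blanks; the function names and the 0.9 cut-off are illustrative assumptions, not an established standard.

```python
from collections import Counter


def response_shares(responses):
    """Return each answer option's share of the non-missing responses."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return {}
    counts = Counter(answered)
    return {option: count / len(answered) for option, count in counts.items()}


def lacks_variability(responses, threshold=0.9):
    """Flag an item when one option attracts at least `threshold` of answers.

    The default threshold of 0.9 is an assumption chosen for illustration.
    """
    shares = response_shares(responses)
    if not shares:
        return False  # nothing answered, so nothing to judge
    return max(shares.values()) >= threshold


# Hypothetical pilot answers to "What do you think about this report?"
report_item = ["B"] * 19 + ["A"]
print(response_shares(report_item))
print(lacks_variability(report_item))
```

An item that is flagged is a candidate for rewording or removal, not automatically a bad question.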
14- Follows comfortably from the previous question.
- Writing a questionnaire is similar to writing
anything else. Transitions between questions
should be smooth. Grouping questions that are
similar will make the questionnaire easier to
complete, and the respondent will feel more
comfortable. Questionnaires that jump from one
unrelated topic to another feel disjointed and
are not likely to produce high response rates.
15- Does not presuppose a certain state of affairs.
- Among the most subtle mistakes in questionnaire
design are questions that make an unwarranted
assumption. An example of this type of mistake
is - Are you satisfied with your current auto
insurance? (Yes or No) - This question will present a problem for someone
who does not currently have auto insurance. Write
your questions so they apply to everyone. This
often means simply adding an additional response
category.
- Are you satisfied with your current auto insurance?
- ___ Yes ___ No ___ Don't have auto insurance
16- One of the most common mistaken assumptions is
that the respondent knows the correct answer to
the question. Industry surveys often contain very
specific questions that the respondent may not
know the answer to. For example - What percent of your budget do you spend on
direct mail advertising? ____ - Very few people would know the answer to this
question without looking it up, and very few
respondents will take the time and effort to look
it up. If you ask a question similar to this, it
is important to understand that the responses are
rough estimates and there is a strong likelihood
of error.
17- Does not imply a desired answer.
- The wording of a question is extremely important.
We are striving for objectivity in our surveys
and, therefore, must be careful not to lead the
respondent into giving the answer we would like
to receive. - Leading questions are usually easily spotted
because they use negative phraseology. As
examples - Wouldn't you like to receive our free brochure?
- Don't you think the government is spending too
much money?
18- Does not use emotionally loaded or vaguely
defined words. This is one of the areas
overlooked by both beginners and experienced
researchers. - Quantifying adjectives (e.g., most, least,
majority) are frequently used in questions. - It is important to understand that these
adjectives mean different things to different
people.
19- Does not use unfamiliar words or abbreviations.
Remember who your audience is and write your
questionnaire for them. - Do not use uncommon words or compound sentences.
Write short sentences. Abbreviations are okay if
you are absolutely certain that every single
respondent will understand their meanings. - If there is any doubt at all, do not use the
abbreviation. The following question might be
okay if all the respondents are educated people,
but it would not be a good question for the
general public. - What was your SES status? ______
20- Is not dependent on responses to previous
questions. Branching in written questionnaires
should be avoided. - While branching can be used as an effective
probing technique in telephone and face-to-face
interviews, it should not be used in written
questionnaires because it sometimes confuses
respondents. An example of branching is
- 1. Do you currently have a life insurance policy? (Yes or No) If no, go to question 3
- 2. How much is your annual life insurance premium? _________
21- Does not ask respondent to order or rank a series
of more than five items. - Questions asking respondents to rank items by
importance should be avoided. - This becomes increasingly difficult as the number
of items increases, and the answers become less
reliable. - This becomes especially problematic when asking
respondents to assign a percentage to a series of
items. - In order to successfully complete this task, the
respondent must mentally continue to re-adjust
his answers until they total one hundred percent.
- Limiting the number of items to five will make it
easier for the respondent to answer.
22The Order of the Questions
- Items on a questionnaire should be grouped into
logically coherent sections. - Grouping questions that are similar will make the
questionnaire easier to complete, and the
respondent will feel more comfortable. - Questions that use the same response formats, or
those that cover a specific topic, should appear
together
23- Each question should follow comfortably from the
previous question. - Writing a questionnaire is similar to writing
anything else. - Transitions between questions should be smooth.
- Questionnaires that jump from one unrelated topic
to another feel disjointed and are not likely to
produce high response rates.
24- Arranging the questions
- The order of the questions is also important.
Some general rules are
- Go from general to particular.
- Go from easy to difficult.
- Go from factual to abstract.
- Start with closed format questions.
- Start with questions relevant to the main subject.
- Do not start with demographic and personal questions.
25- It is useful to use a variety of question formats to maintain the respondents' interest.
- When a series of semantic differential scales are used, it may be a good idea to mix positive-to-negative scales (for example, interesting to dull) with negative-to-positive scales (for example, useless to useful).
- This might make the respondents think more and avoid the tendency to tick the same response for every question.
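When scale directions are mixed, the answers have to be brought back to a common direction before analysis. The sketch below shows one way this might be done in Python, assuming a 1-5 scale and a known set of reversed items; the item names and helper functions are hypothetical.

```python
def reverse_score(score, low=1, high=5):
    """Flip a rating so that the top of the scale becomes the bottom."""
    return high + low - score


def align_scores(answers, reversed_items, low=1, high=5):
    """Return the answers with every reversed item flipped to a common direction."""
    return {
        item: reverse_score(score, low, high) if item in reversed_items else score
        for item, score in answers.items()
    }


def is_straight_lined(answers):
    """True when the respondent ticked the same raw value for every item."""
    return len(set(answers.values())) == 1


# Hypothetical respondent: the first item runs positive to negative on paper.
raw = {"interesting_dull": 2, "useless_useful": 4}
print(align_scores(raw, reversed_items={"interesting_dull"}))
print(is_straight_lined(raw))
```

A respondent who ticks the same raw value on every mixed-direction item gives contradictory aligned scores, which is one sign of inattentive answering.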
26Question Wording
- The wording of a question is extremely important.
Researchers strive for objectivity in surveys
and, therefore, must be careful not to lead the
respondent into giving a desired answer.
Unfortunately, the effects of question wording
are one of the least understood areas of
questionnaire research.
27- Many investigators have confirmed that slight
changes in the way questions are worded can have
a significant impact on how people respond. - Several authors have reported that minor changes
in question wording can produce more than a 25
percent difference in people's opinions
28- Several investigators have looked at the effects
of modifying adjectives and adverbs. Words like
usually, often, sometimes, occasionally, seldom,
and rarely are commonly used in questionnaires,
although it is clear that they do not mean the
same thing to all people.
- Some adjectives have high variability and others
have low variability. The following adjectives
have highly variable meanings and should be
avoided in surveys: a clear mandate, most,
numerous, a substantial majority, a minority of,
a large proportion of, a significant number of,
many, a considerable number of, and several.
Other adjectives produce less variability and
generally have more shared meaning. These are:
lots, almost all, virtually all, nearly all, a
majority of, a consensus of, a small number of,
not very many of, almost none, hardly any, a
couple, and a few.
29- Use short and simple sentences
- Short, simple sentences are generally less
confusing and ambiguous than long, complex ones.
As a rule of thumb, most sentences should contain
one or two clauses. Sentences with more than
three clauses should be rephrased.
30- Avoid negatives if possible
- Negatives should be used only sparingly. For
example, instead of asking students whether they
agree with the statement, "Small group teaching
should not be abolished," the statement should be
rephrased as, "Small group teaching should
continue." Double negatives should always be
avoided.
31- Ask precise questions
- Questions may be ambiguous because a word or term
may have a different meaning. - For example, if we ask students to rate their
interest in "medicine," this term might mean
"general medicine" (as opposed to general
surgery) to some, but inclusive of all clinical
specialties (as opposed to professions outside
medicine) to others.
32- Another source of ambiguity is a failure to
specify a frame of reference. - For example, in the question, "How often did you
borrow books from your library?" the time
reference is missing. It might be rephrased as,
"How many books have you borrowed from the
library within the past six months altogether?"
33- Ensure those you ask have the necessary knowledge
- For example, in a survey of university lecturers
on recent changes in higher education, the
question, - "Do you agree with the recommendations in the
Report on Higher Education" is unsatisfactory for
several reasons. - Not only does it ask for several pieces of
information at the same time as there are several
recommendations in the report, the question also
assumes that all lecturers know about the
relevant recommendations.
34- Level of details
- It is important to ask for the exact level of
details required. - On the one hand, you might not be able to fulfil
the purposes of the survey if you omit to ask
essential details. - On the other hand, it is important to avoid
unnecessary details. - People are less inclined to complete long
questionnaires. - This is particularly important for confidential
sensitive information, such as personal financial
matters or marital relationship issues.
35- Handling Sensitive Issues
- It is often difficult to obtain truthful answers
to sensitive questions. Clearly, the question,
"Have you ever copied other students' answers in
a degree exam?" is likely to produce either no
response or negative responses. Less direct
approaches have been suggested.
- Firstly, the casual approach: "By the way, do you happen to have copied other students' answers in a degree exam?" may be used as the last part of another decoy question.
- Secondly, the numbered card approach: "Please tick one or more of the following items which correspond to how you have answered degree examination questions in the past." In the list of items, include "copy from other students" as one of many items.
36- Thirdly, the everybody approach: "As we all know, most university students have copied other students' answers in degree exams. Do you happen to be one of them?"
- Fourthly, the other people approach. This approach was used in the recent medical student survey. In this survey, students were given the scenario, "Jalil copies answers in a degree exam from Jamal." They were then asked, "Do you feel Jalil is wrong, what penalty should be imposed for Jalil, and have you done or would you consider doing the above?"
37- Length of questionnaire
- There is no universal agreement about the
optimal length of questionnaires. - It probably depends on the type of respondents.
- However, short simple questionnaires usually
attract higher response rates than long complex
ones. - In a survey of stroke survivors both the
response rate and the proportion of completed
forms were higher for a shorter questionnaire
(six questions with a visual analogue scale)
compared with a longer and more complex
questionnaire (with 34 questions).
38- Write in everyday terms. Avoid internal jargon. Many corporations have abbreviations or acronyms for products and services which are not familiar to customers.
- Follow good business writing practices. Write short, simple questions. Be clear and to the point. Avoid errors in spelling, grammar and usage.
39- Use consistent scales.
- All rating scales should mimic the first one
used. - It can confuse respondents if you change from,
for example, a five point to a seven point scale.
- Keep the scales going the same way. In other
words, if '5' is high on the first scale, don't
make '1' high on the next. - Use similar wording for the anchors.
- Finally, group like questions under the same
scale. If you do need to change scales, wait
until you reach a new section of the
questionnaire.
40- Use consistent wording.
- The use of similar phrases for the text of the
survey can unify your questionnaire. - For example, questions can be set up with a lead
phrase which is a phrase that can be used to lead
off each question. For example - How satisfied are you that our staff is
- Responsive to your service requests .......
- Knowledgeable about products .............
- Knowledgeable about your business........
41- Avoid asking more than one question at a time.
- This is known as asking a 'double barreled' question. A typical double barreled question is "Sales reps are polite and responsive." While the sales reps may be polite, they may not be responsive, or vice versa. The respondent will be forced to rate one attribute differently from their true feelings. Consequently, data interpretation will be questionable.
42- Provide directions.
- It is important to let the respondent know what to do on any particular question; however, it is just as important to avoid complicated directions.
- Make the survey as easy as possible for your respondents by using phrases such as 'Mark all that apply,' and 'Mark only one.'
- Avoid asking them to calculate anything, such as percentages, and try to avoid the use of skip patterns.
43- Analysis of the responses and the interviewers' comments are used to improve the questionnaire.
- Ideally, there should be sufficient variation in responses among respondents; each question should measure different qualities - that is, the responses between any two items should not be very strongly correlated - and the non-response rate should be low.
- In the third phase the pilot test is polished to improve the question order, filter questions, and layout.
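Two of the pilot checks described above, the non-response rate and the correlation between pairs of items, are easy to compute. This is a minimal Python sketch, assuming each item's pilot answers are held in a list with `None` for blanks; the function names and data are hypothetical.

```python
from math import sqrt


def non_response_rate(responses):
    """Share of respondents who left the item blank (recorded as None)."""
    return sum(r is None for r in responses) / len(responses)


def pearson(xs, ys):
    """Pearson correlation over respondents who answered both items."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Hypothetical pilot answers to two items from six respondents.
item_a = [1, 2, 3, 4, 5, None]
item_b = [1, 2, 3, 4, 5, 3]
print(non_response_rate(item_a))
print(pearson(item_a, item_b))
```

A correlation close to 1 between two items suggests they measure the same quality, so one of them may be redundant.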
44Format of responses
- The responses can be in open or closed formats.
In an open ended question, the respondents can
formulate their own answers. In closed format,
respondents are forced to choose between several
given options. What are the advantages of each of
these formats? - It is possible to use a mixture of the two
formats- for example, give a list of options,
with the final option of "other" followed by a
space for respondents to fill in other
alternatives. - There are several forced choice formats. Out of
these formats, ranking is probably least
frequently used, as the responses are relatively
difficult to record and analyse.
46Closed-that is, forced choice-format
- Easy and quick to fill in
- Minimise discrimination against the less literate
(in self administered questionnaire) or the less
articulate (in interview questionnaire) - Easy to code, record, and analyse results
quantitatively - Easy to report results
47- Example "How satisfied are you with your job?" (Circle the number that represents your response)
- Very dissatisfied   Dissatisfied   Neutral   Satisfied   Very satisfied
-         1                2            3          4             5
48- Example "What is your marital status?" (Check the box that applies)
- Single, never married
- Married
- Divorced
- Separated
- Widowed
- Other_____
49Please circle your response
Questions                                             Very dissatisfied  Dissatisfied  Satisfied  Very satisfied
How satisfied are you with your working conditions?           1               2            3            4
How satisfied are you with your pay?                          1               2            3            4
How satisfied are you with your supervisor?                   1               2            3            4
50Open format
- Advantages
- Allows exploration of the range of possible
themes arising from an issue - Can be used even if a comprehensive range of
alternative choices cannot be compiled
51Choosing the Right Scale
- Choosing a scale for your survey instrument is an
important decision that will shape the
information you collect. Each scale has
variations, some more reliable than others. - Even vs. Odd
- Number of Points
- Defining Your Scale
52- Even vs. Odd
- Even numbered scales can more effectively
discriminate between satisfied or unsatisfied
customers because there is not a neutral option. - However, this clear division may cause hesitation
for respondents who are neutral in regard to a
survey item. - Without a midpoint option, respondents often
choose a positive response, creating positively
skewed data. - Carefully consider whether a clear division
between positive and negative responses is
necessary, or whether a midpoint will be more
appropriate for your information needs.
53- Number of Points
- In survey research, scales commonly range from 2
to 10 points. - The number of points for your scale should be
determined by how you intend to use the data. - Although seven to ten point scales may seem to
gather more discriminating information, there is
debate whether respondents actually discriminate
carefully enough when filling out a questionnaire
to make these scales valuable. - Also, these scales are often collapsed into
three or five point scales for reporting
purposes.
- Four and five point scales are more highly recommended.
- Two and three point scales offer little discriminative value and are rarely recommended.
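Collapsing a longer scale for reporting can be done mechanically. The Python sketch below shows one possible rule for odd-numbered scales, assuming that only the exact midpoint counts as the middle category; that cut-off rule and the function name are assumptions, and other groupings are equally defensible.

```python
def collapse_to_three(score, points=7):
    """Map a rating on a 1..points scale to 1 (low), 2 (middle) or 3 (high).

    Assumes an odd number of points, with only the exact midpoint
    treated as the middle category.
    """
    if points % 2 == 0:
        raise ValueError("this sketch only handles odd-numbered scales")
    if not 1 <= score <= points:
        raise ValueError("score is outside the scale")
    midpoint = (points + 1) // 2
    if score < midpoint:
        return 1
    if score > midpoint:
        return 3
    return 2


# Every point of a seven point scale and its collapsed category.
for rating in range(1, 8):
    print(rating, collapse_to_three(rating))
```

Because the detail is lost once the scale is collapsed, it is worth deciding on the reporting categories before choosing the number of points.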
54Defining Your Scale
- Once the number of points on a scale has been
decided, it is important to determine the labels
for each scale point, or in some cases, whether
or not you will use any labels.
- For example, an agreement scale could be set up like this
- Strongly Agree                          Strongly Disagree
-       5            4       3       2            1
55- Though this may be true, it is also important
that each respondent understand the meaning of
each scale point. - By labeling each scale point, all respondents
attach the same word to a numerical value. - This helps avoid respondent misinterpretation of
scale definitions. - Additionally, verbally defining each scale point
allows reports to be written in more concrete
terms such as "x percentage were satisfied."
56Four-point Requirements Scale
- Receives high marks for discrimination and reliability. A leading sentence might be, "Please indicate how well Company Z met your requirements."
- Exceeded   Met   Nearly Met   Missed
-     4       3        2          1
- The option of "Nearly Met" serves well to capture
data from respondents who are somewhat
unsatisfied but prefer to choose positive
responses.
57Five-point Expectations Scale
- Receives high marks for discrimination and reliability.
- A leading sentence might be, "In terms of your expectations, please rate the performance of Company Z."
- Significantly Above   Above   Met   Below   Significantly Below
-          5              4      3      2              1
58- While these scales have been shown to be
effective in collecting accurate data, a good
scale cannot compensate for poorly worded items. - Accurate, reliable data depends on a combination
of the proper scale and correctly written items,
as well as proper survey administration.
59- Open-ended Questions: How much information or detail do you need in open-ended questions? In general, more detailed information can be gathered when an interviewer probes and clarifies responses than when respondents are asked to write in their own response.
- Visual Aids: Will the respondent need to see graphs or figures? If so, chances are a phone survey will not work.
- Skip Patterns: Are there complex skip patterns requiring respondents to skip to other questions based on answers to previous questions? If so, trained interviewers or properly programmed computer methods will ensure that respondents answer the correct questions.
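In a computer-administered survey the skip rules can be stored with the questions, so the respondent never sees the branching. The following is a minimal Python sketch, assuming each question carries a rule that names the next question; the `QUESTIONS` structure and `run_survey` are hypothetical, and question 3 is a placeholder because the original example does not give its wording.

```python
# Each question has its text and a rule that picks the next question
# from the answer just given. A rule returning None ends the survey.
QUESTIONS = {
    "q1": {
        "text": "Do you currently have a life insurance policy? (Yes or No)",
        "next": lambda answer: "q2" if answer == "yes" else "q3",
    },
    "q2": {
        "text": "How much is your annual life insurance premium?",
        "next": lambda answer: "q3",
    },
    "q3": {
        "text": "(placeholder for the next question)",
        "next": lambda answer: None,
    },
}


def run_survey(answers, start="q1"):
    """Return the ids of the questions a respondent would be shown, in order."""
    asked = []
    current = start
    while current is not None:
        asked.append(current)
        answer = answers.get(current)
        current = QUESTIONS[current]["next"](answer)
    return asked


print(run_survey({"q1": "no"}))
print(run_survey({"q1": "yes", "q2": "500"}))
```

Keeping the rules in data rather than in printed instructions means a respondent without a policy is never asked about the premium.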
60- There are several ways of administering questionnaires.
- They may be self administered or read out by interviewers.
- Self administered questionnaires may be sent by post or email, or completed electronically online.
- Interview administered questionnaires may be by telephone or face to face.
- Advantages of self administered questionnaires include
- Cheap and easy to administer.
- Preserve confidentiality.
- Can be completed at respondent's convenience.
- Can be administered in a standard manner.
61- Nonsense or error data
- SD D X A SA: X should not be there because it is not on the scale.
- Eliminate the question. If several respondents give similar nonsense data, your instrument is probably in error.
- Other categories
- Not Applicable (NA): you may have too many respondents giving NA. When more than 15-20 give NA, the validity of the question is in doubt. When an item has more than 20 NA responses, eliminate the item from the analysis, or keep it and eliminate the individual. One way to avoid the problem is to be sure that all items apply to your respondents and not to use NA.
- NA means "it does not apply to me"; Neutral means "it applies to me but I am neutral in my opinion".
62Validity of questionnaires
63What is validity?
- According to the American Psychological
Association, validity "...refers to the
appropriateness, meaningfulness, and usefulness
of the specific inferences made from test
scores." (Standards for Psychological and
Educational Testing, 1985, p. 9). - In other words, if your findings need to be
appropriate, meaningful and useful, they need to
be valid. - Validity refers to whether the questionnaire or
survey measures what it intends to measure.
64- An instrument that is a valid measure of third graders' math skills probably is not a valid measure of high school students' math skills.
- An instrument that is a valid predictor of how well students might do in school may not be a valid measure of how well they will do once they complete school.
- So we never say that an instrument is valid or not valid; we say it is valid for a specific purpose with a specific group of people.
65- The validity of a questionnaire relies first and
foremost on reliability. If the questionnaire
cannot be shown to be reliable, there is no
discussion of validity. But there is good
news. Demonstrating validity is easy, compared
to reliability. If you have reached this point
and have a reliable instrument for measuring the
issues or phenomena you are after, demonstrating
its validity will not be difficult.
66Types of Validity
- Everyone agrees that validity is important, but
what type of validity are we talking about?
- There are three main types of validity
- content,
- criterion-related, and
- construct validity.
67Content Validity
- Content validity determines if the survey items are representative of the topic being measured.
- You need to
- Define: you must clearly state what you are interested in measuring, for example 'Quality.'
- Choose the specific aspects which require feedback, for example, 'Error Rate.'
- Judge whether your items relate to the definitions you developed and adequately cover all aspects, that is, whether the items are representative of the topic.
68- Example
- Specialists in the content measured by the
instrument are asked to judge the appropriateness
of the items on the instrument.
- Do they cover the breadth of the content area
(does the instrument contain a representative
sample of the content being assessed)? - Are they in a format that is appropriate for
those using the instrument? - A test that is intended to measure the quality of
science instruction in fifth grade, should cover
material covered in the fifth grade science
course in a manner appropriate for fifth graders.
- A national science test might not be a valid
measure of local science instruction, although it
might be a valid measure of national science
standards.
69- Researchers aim to study mathematical learning
and create a survey to test for mathematical
skill. If these researchers only tested for
multiplication and then drew conclusions from
that survey, their study would not show content
validity because it excludes other mathematical
functions. - A researcher needing to measure an attitude like
self-esteem must decide what constitutes a
relevant domain of content for that attitude. - You must define your content domain
70Criterion-Related Validity
- Criterion-related validation relies on
statistical analyses rather than judgments as in
content validation.
- Criterion-related validation involves calculating a 'validity coefficient' by correlating the survey items with another measure (a criterion) already known to be related to other aspects of the attribute.
- For example, if satisfaction with the service
department relates to the number of friends one
refers to the service department, then we could
correlate scores on a measure of satisfaction to
an index of referrals.
71Construct Validity
- Determine the construct to be measured, for
example, 'Quality.' - Determine relationship between the construct and
other constructs, for example, 'Satisfaction.' - Examine pattern of relationships
73Reliability of questionnaires
74- Why Does Reliability Matter?
- A questionnaire will always produce numerical results, even if they're meaningless.
- You could be making business decisions based on survey results that don't mean anything.
- Only a test of reliability can tell you if you should trust the results.
75- What Is Reliability?
- The most common definitions include descriptions
such as stability, repeatability, and accuracy. - In the context of survey design, reliability is
essentially the extent to which a survey will
provide the same results with repeated
measurement. - An example will make this statement clear.
76- Non-technically speaking, a reliable
questionnaire is one that would give the
same results if you used it repeatedly with the
same group. - That may sound funny because most organizations
don't administer a questionnaire to the same
group twice. - But if they did, they would learn how reliable
their questionnaire is, because a reliable survey
will give the same results on Tuesday as it did
the previous Monday.
77- Reliability is a property of the measuring
instrument. - If you are like many people, you probably get on
your bathroom scale in the morning, look at the
weight displayed, then step off, and do it
again. - You have learned that what is displayed by a
bathroom scale the first time is not always
exactly the same as the second, but it is usually
very close.
78- What if one morning you weighed yourself, then a
second time, and the second weight displayed was
5 lbs. heavier than the first? - You would probably step off, then weigh yourself
a third time. What if it was now 4 lbs. lighter
than the first? - Would you still be concerned about your weight?
Or would you be more concerned about finding out
what's "wrong" with the scale? - What's wrong is that your scale has become
unreliable. You can see unreliability by
repeatedly measuring the same thing. - And when you know the scale is unreliable, you
don't even try to measure your weight, you
concentrate on fixing the scale first.
79- If your questionnaire is unreliable, it's like
trying to measure the length of something with a
rubber tape measure. - You could make your marks at precise intervals,
but the flexibility of the material would destroy
its reliability. - Most questionnaires that use rating scales to
record people's opinions are like rubber tape
measures.
80- Reliability
- The ability of an instrument to measure consistently with relative absence of error. The higher the reliability coefficient, the more confidence you can have in the score.
- .90 and up: Excellent
- .80-.89: Good
- .70-.79: Adequate
- Below .70: May have limited applicability
- Source: Testing and Assessment: An Employer's Guide to Good Practices, U.S. Dept. of Labor, 1999
81- To understand reliability coefficients, a brief
discussion of the components of a score will be
helpful. An observed or obtained score on an
instrument can be divided into two parts.
Observed Score = True Score + Error - An instrument can be said to be reliable if it
accurately reflects true scores. Or in other
words, an instrument can be said to be reliable to
the extent that it minimizes the error component.
So, the reliability coefficient is the proportion
of true variability to the total obtained
variability. - Therefore, if you get a reliability coefficient
of .85, this means that 85 percent of the
variability in obtained scores could be said to
represent true individual differences and 15
percent of the variability is due to random
error.
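The variance ratio described above can be illustrated with a small sketch. The numbers are made up for illustration, and the errors are deliberately chosen to be uncorrelated with the true scores. In a real survey the true scores cannot be observed directly, so this only shows what the coefficient means, not how it is estimated in practice.

```python
from statistics import pvariance

# Hypothetical data: true scores and random errors for four respondents.
# The errors are chosen so they are uncorrelated with the true scores.
true_scores = [10, 20, 30, 40]
errors = [2, -2, -2, 2]

# Observed score = true score + error
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = pvariance(true_scores)    # variability due to real differences
observed_var = pvariance(observed)   # total obtained variability

# Reliability = proportion of obtained variability that is true variability
reliability = true_var / observed_var
print(f"reliability = {reliability:.3f}")
```

Larger errors would inflate the observed variance and push the ratio down, which is the sense in which an unreliable instrument "adds noise" to the scores.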
82- Stability (produces the same results with
repeated testing) - Test-retest
- Parallel forms
- Alternate forms
83- Internal-Consistency Measures of Reliability
- Split-half reliability
- Cronbach's alpha
- Split-Half Reliability
- One test is split into two halves and the
correlation between the two halves is calculated.
(Both halves of the test must be equal in
content and difficulty.) Since the number of
items is split in half, the Spearman-Brown
formula must be employed to estimate reliability
for the entire test. -
- Kuder-Richardson 20
- A test of homogeneity (inter-item consistency),
the K-R 20 compares the proportion of correct and
incorrect responses to each of the items on the
test. The K-R 20 is appropriate for tests in
which items are either scored right or wrong. -
- Kuder-Richardson 21
- This simpler formula is based upon the assumption
that all items are of equal difficulty (rarely
the case!)
84- Equivalence (instrument produces the same results
when an equivalent instrument is used or there is
consistency among researchers using the same
instrument). Two equivalent forms of the test are
administered to the same group of people. (It
can be very difficult to develop two truly
equivalent forms of a test.) - Parallel items on alternate forms
- Inter-rater reliability
85- If you measure an object using two rulers, one
made of steel and one made of a rubber band, you
would expect the steel ruler to provide
relatively consistent or stable measurements
(assuming the object was stable). - The rubber band ruler, on the other hand, would
probably provide a variable set of measurements.
86- How Do You Measure Reliability?
- As is the case with validity, there are a number
of different ways to assess the reliability of a
survey. The method you choose will depend upon
what you are trying to accomplish. Several ways
we measure the reliability of an instrument
include test-retest, split-halves, and internal
consistency. All of these methods will result in
a number between 0.00 and 1.00, with scores
increasing as the survey becomes more reliable. - Basically, there are two kinds of reliability testing
procedures: one-administration and two-administration
- Two-administration is the less desirable procedure
87Stability
- Test - Retest
- As the name implies, test-retest reliability
involves administering a survey to a group of
individuals at one time and then re-administering
the survey to the same individuals some later
time. - The survey responses are then correlated and the
resulting correlation is interpreted as the
reliability of the instrument. This method
clearly illustrates the notion of reliability as
measurement consistency. - Unfortunately, there are some downsides to this
approach. - Look for a correlation of at least .70 (it tends
to be higher for short-term retests of about 2 weeks and
lower for long-term retests of more than 1-2 months)
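As a minimal sketch of the test-retest calculation, the snippet below correlates two sets of total scores from the same respondents. The scores and the names `time_1`, `time_2`, and `pearson_r` are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical total scores for the same six respondents, two weeks apart.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time_1, time_2)
print(f"test-retest reliability = {r:.2f}")
print("meets the .70 guideline" if r >= 0.70 else "below the .70 guideline")
```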
88- Weaknesses
- First, it is often very difficult to administer
the same survey to the same person twice. Second,
the act of measuring someone's attitudes (e.g.,
satisfaction) can affect their attitude. - Specifically, asking people to report their
satisfaction at time 1 can sensitize them to the
issues and result in a change in scores at time 2
(a phenomenon called reactivity), resulting in a
low reliability estimate. - Finally, people remember their first response and
respond in a way to maximize their consistency,
not necessarily to reflect their attitude. This
will result in inflated estimates of reliability.
89- Parallel items on Alternate forms
- Same population completing similar forms of the
instruments before and after a short time period
or one right after another - If items are truly parallel, they have identical
true scores and identical error variances.
Responses to parallel items will differ only with
respect to random fluctuations. - Uses questions (items) that are comparable to
each other and parallel. However, it is very
difficult to prepare two forms of a test that
display the properties of parallel measures. - However, there are two forms of certain tests
whose items are intended to measure the same
thing and do not differ from each other in any
systematic way. - The two sets of scores are correlated to produce
a correlation coefficient
90Homogeneity - Internal Consistency
- Split Halves
- An alternate approach to reliability requires us
to split a survey in half and then correlate the
two halves. - For example, if we had a twenty item survey
assessing customer satisfaction with a sales
associate, we could administer the survey to our
sample, split the survey in half, and then
correlate the two halves of the survey. - A reliable survey would result in strong
correlations between the two halves. - The major problem associated with this approach
is splitting the survey. - Every approach will probably result in a slightly
different result, providing some confusion as to
the actual reliability of the survey.
91Formula for split half
- $r = \dfrac{2\,r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}}$
- Where $r_{\frac{1}{2}\frac{1}{2}}$ is the Pearson correlation between
the two halves
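The split-half procedure with the Spearman-Brown correction can be sketched as follows. The odd/even split is only one of many possible ways to halve a survey, and the ratings and function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Estimate full-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(responses):
    """responses: one list of item scores per respondent."""
    # Odd-even split (an assumption; other splits give other estimates).
    half_a = [sum(row[0::2]) for row in responses]
    half_b = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson_r(half_a, half_b))

# Hypothetical ratings (1-5) from five respondents on a six-item survey.
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 3, 2, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 1, 2, 1, 1],
]
estimate = split_half_reliability(responses)
print(f"split-half reliability = {estimate:.2f}")
```

Changing the slices in `split_half_reliability` (for example, first half versus second half) will usually change the estimate slightly, which is the ambiguity noted on the previous slide.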
92- Internal Consistency
- A potential solution to the problems with the
split-half approach is to use a measure of
internal consistency. - Internal consistency considers the average
correlation between all of the survey items and
the number of survey items to provide an estimate
of reliability. - A common measure of internal consistency is
coefficient alpha (Cronbach's alpha), KR-20, or
KR-21. - The downside to alpha is that it is more difficult
to calculate than the other methods. - Luckily, however, many statistical programs will
calculate this for you. - Item-to-total correlation: measures the
correlation of each of the items to the total
scale. Items with a low correlation can be
deleted.
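For readers without a statistics package, Cronbach's alpha can be computed by hand. The sketch below uses the common variance form, alpha = K/(K-1) × (1 − sum of item variances / variance of total scores), which is not written out on the slide. The data and the function name are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list of item scores per respondent."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical ratings (1-5) from five respondents on a four-item scale.
responses = [
    [4, 5, 4, 4],
    [2, 1, 2, 3],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```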
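93KR-20 for dichotomous items, KR-21 for MC items
- Where,
- $r$ is the reliability estimate
- $K$ is the number of items on the test
- $p$ is the proportion of the sample who got the item correct
- $q$ is the proportion of the sample who got the item wrong ($1 - p$)
- $s_x^2$ is the variance of total test scores in the sample
- $\bar{X}$ is the mean total score on the test
- Formula for KR-20
- $r = \dfrac{K}{K - 1} \cdot \dfrac{s_x^2 - \sum pq}{s_x^2}$
- Formula for KR-21
- $r = \dfrac{K s_x^2 - \bar{X}(K - \bar{X})}{s_x^2 (K - 1)}$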
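A minimal sketch of both Kuder-Richardson formulas on right/wrong data follows. It assumes the population variance of total scores (dividing by n); using the sample variance instead would change the result slightly. The answer matrix and function names are hypothetical.

```python
from statistics import mean, pvariance

def kr20(responses):
    """responses: one list of 0/1 item scores per respondent."""
    k = len(responses[0])
    n = len(responses)
    var_x = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for item in zip(*responses):
        p = sum(item) / n          # proportion answering the item correctly
        sum_pq += p * (1 - p)      # q = 1 - p
    return (k / (k - 1)) * (var_x - sum_pq) / var_x

def kr21(responses):
    """Simplified formula that treats all items as equally difficult."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_x = pvariance(totals)
    mean_x = mean(totals)
    return (k * var_x - mean_x * (k - mean_x)) / (var_x * (k - 1))

# Hypothetical answers (1 = right, 0 = wrong): five test takers, four items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(answers):.2f}")
print(f"KR-21 = {kr21(answers):.2f}")
```

In this example the items differ in difficulty, so KR-21 comes out lower than KR-20, which illustrates why the equal-difficulty assumption matters.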
94Factors influencing reliability
- Test related factors
- Length, test content, homogeneity of items,
difficulty of items (too easy or too difficult
will reduce reliability) - Test taker
- Heterogeneity (a more heterogeneous group gives more spread in scores)
- Attitude
- Aptitude
- Administration factors
- - time limit
- - opportunity to cheat
95Raising reliability
- Lengthen the instrument
- Check the test items (clarity, reading level,
format) - Make it median difficulty if it is an achievement
test - Increase the time limit
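The effect of lengthening an instrument can be sketched with the general Spearman-Brown formula, r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened. This general form is not given on the slides and is brought in here as an outside assumption. It also assumes the added items are parallel to the existing ones. The starting reliability of .60 is hypothetical.

```python
def lengthened_reliability(r, n):
    """Predicted reliability when a test is made n times as long.

    Assumes the new items are parallel to the existing ones.
    """
    return n * r / (1 + (n - 1) * r)

# Hypothetical starting point: a short scale with reliability .60.
current = 0.60
for factor in (1, 2, 3):
    predicted = lengthened_reliability(current, factor)
    print(f"{factor}x length: predicted reliability = {predicted:.2f}")
```

With n = 2 this reduces to the split-half correction, since a full test is twice as long as either half.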
96- Types of instrument and the reliability procedure for each
- Dichotomous (T/F): KR20
- Multiple choice: KR21/alpha
- Check all that apply: Test/retest
- Rank the items: Test/retest
- Rate the items that will be summed together: Test/retest
- Likert scale: Cronbach's alpha/KR21
- Demographic: None, or percent of agreement
- One instrument with several domains which are
measured by multiple items: KR21/Cronbach's alpha
- Long instrument: Split half