Title: Question wording and data analysis
1Question wording and data analysis
- PHC 6716
- June 15, 2011
- Chris McCarty
2Validity and Reliability
- Most of what we have dealt with so far has to do
with reliability
- Reliability is the extent to which you will get
the same result when you repeat a measure several
times
- Validity is the extent to which you are measuring
what you think you are measuring
- For example, using frequency of jogging as a
measure of exercise is not valid because there
are many other forms of exercise
- Much of question wording is about validity
3Common mistakes in designing questions
4Not mutually exclusive
- What is your income?
- 0-20,000
- 20,000-40,000
- 40,000-60,000
- 60,000-80,000
- 80,000-100,000
- 100,000
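One way to catch this mistake before fielding a survey is to check the brackets mechanically. The sketch below is illustrative only: the function name `find_overlaps` and the repaired brackets are assumptions, not part of the original slides. The first list reproduces the closed brackets shown above, treating both bounds as inclusive and leaving out the final open-ended category.

```python
def find_overlaps(brackets):
    """Return pairs of neighbouring (low, high) brackets that share a value."""
    ordered = sorted(brackets)
    overlaps = []
    for first, second in zip(ordered, ordered[1:]):
        # With inclusive bounds, a bracket that starts at or below the
        # previous bracket's upper bound lets one income fit both categories.
        if second[0] <= first[1]:
            overlaps.append((first, second))
    return overlaps


# Brackets as written on the slide (bounds treated as inclusive).
slide_brackets = [(0, 20000), (20000, 40000), (40000, 60000),
                  (60000, 80000), (80000, 100000)]

# One possible repair (hypothetical): end each bracket one dollar earlier.
fixed_brackets = [(0, 19999), (20000, 39999), (40000, 59999),
                  (60000, 79999), (80000, 99999)]

print(len(find_overlaps(slide_brackets)))  # number of overlapping pairs
print(find_overlaps(fixed_brackets))       # empty list when exclusive
```

A respondent earning exactly 20,000 fits two categories in the slide version and exactly one in the repaired version.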
5Not exhaustive
- Where do you get most of your medical advice?
- My doctor
- TV
- Friends
- Family members
6Too long and wordy
- The next questions ask about YOUR OWN health
care. Please DO NOT include care you got when
you stayed overnight in a hospital or the times
you went for dental care visits. For the purposes
of this survey a PERSONAL DOCTOR OR NURSE is
the health provider who knows you best. This can
be a general doctor, a specialist doctor, a nurse
practitioner, or a physician assistant. When you
were enrolled in this program or at any time
since then, did you get a NEW personal doctor or
nurse?
- Yes
- No
7Double-barreled
- Please rate your satisfaction with the amount and
kind of care you received while you were in the
hospital.
- Very satisfied
- Satisfied
- Neither satisfied or dissatisfied
- Dissatisfied
- Very dissatisfied
8Leading
- Most doctors believe that exercise is good for
you. Do you
- Strongly agree
- Agree
- Neither agree or disagree
- Disagree
- Strongly disagree
9Unreasonable
- How many times in the past year have you eaten
out?
- ________
10Too many categories to choose from (will often
choose first or last)
- Please describe the first page of the web site.
- QuitPlan
- QuitNet
- Quote from member
- We're helping Minnesotans learn to quit
- Create your own QuitPlan
- Ask Questions of Expert Counselors
- Get support from the QuitNet community
- Learn from science-based Quitting Guides
- How much lifetime and money has the Nicodemon
stolen from you!
- On an average day, how many cigarettes do you
(or did you) smoke?
- How soon after you wake do you smoke your first
cigarette?
- QUITPLAN has the tools to help you learn to quit
- Other, specify______________________________
11Smoking question: Unreasonable for Interviewer
- Can you describe what happens in this
advertisement?
- INT DO NOT READ CHOICES
- 1 They start naming high school clubs and teams
that can be joined
- 2 Boy names the varsity team
- 3 Girl names the drama club
- 4 Boy names student government
- 5 Girl says, but there is only one with the
potential to save over 400,000 lives every year
- 6 Girl says, SWAT
- 7 Music starts in background, girls says
students working against tobacco
- 8 Boy says, we're athletes
- 9 Girl says, we're artists
- 10 Boy says we're leaders and we are committed to
giving Florida's youth a voice in the fight
against tobacco
- 11 Girl says, together we can help to stop the
tobacco industry and to save the over 400,000
people who die from tobacco use each year
- 12 Girl says, but SWAT needs your help
- 13 Boy says, whoever you are
- 14 Girl says, whatever you are into.
- 15 Boy says, wherever you go to school ask about
SWAT and how you can do your part in the fight
against tobacco
- 16 Girl says, whatever you do today, can save a
life tomorrow
- 17 Boy and girls talk about how students have to
join to fight against tobacco
- 18 SWAT can fight big tobacco.
- 19 Anyone can join SWAT and fight tobacco
companies
- 20 Tobacco kills people every year.
- 21 Don't smoke
- 22 Other (Please specify)
12Miscellaneous points
- When repeating surveys, be careful of making
changes to response categories such that response
numbers mean different things in different
versions
- Some questionnaire authoring packages allow you
to randomize the order of questions and response
categories (Stewart et al.)
- Alternate questions that are phrased positively
and those phrased negatively
- Sensitive and controversial questions should be
phrased so that the respondent feels OK about
selecting a negative response
- You should typically offer a Don't Know and Not
Available category (Krosnick et al.)
13Scales
- A scale is a set of questions designed to measure
a concept that cannot be adequately represented
with a single question
- There are many existing and tested scales for
health care (e.g. Beck depression)
14How to create a scale
- Begin by getting a group of respondents to
free-list questions related to a concept until
there are very few new questions
- Create a questionnaire using those items
- Give the questionnaire to a sample of respondents
- Analyze results and remove questions that are
overwhelmingly neutral
- Test the scale again on a new sample of
respondents
- High and low values should represent the spectrum
of your concept
15Indices
- An index, like a scale, is a measure derived from a
set of questions
- The value of an index is in comparing values
across time
- The consumer confidence index is compared to values
from the previous month and to the same time a year
before
- Even though some questions may not make sense, it is
often better to leave an index unchanged for the
purposes of comparability
16Four levels of measurement
- Nominal (categorical, qualitative)
- Ordinal (rank)
- Interval
- Ratio
17Nominal Data - Defined
- Data represented by numbers or letters
- Data are placeholders for response items; the
numbers have no numerical meaning
- Response items should be mutually exclusive and
exhaustive
- Typically analyzed with frequencies,
crosstabulations and significance tests for
crosstabulations such as chi-square
18Nominal - Example
- What kind of place do you go to most often when
you are sick or need advice about your health?
- 1 Clinic or health center
- 2 Doctor's office
- 3 Hospital emergency room
- 4 Hospital outpatient department
- 5 Some other place (Specify)
- -7 Don't go to one place most often
- -8 Don't know
- -9 Refused
19Ordinal Data - Defined
- Includes the properties of nominal data
- Has the additional property that numbers have rank
order
- Often analyzed like nominal data using
frequencies and crosstabulations
- There are crosstab significance tests for ranked
data (Tau B, Gamma), but I rarely see them
- Very often they are treated as interval data
- They do not have the attributes to be treated as
interval data
- Some people feel that if they work to predict,
that is justification for using them as interval
data
20Ordinal Data - Example
- In the last 6 months, not counting times you
needed health care right away, how often did you
get an appointment for health care as soon as you
wanted?
- 1 Never
- 2 Sometimes
- 3 Usually
- 4 Always
21Interval Data - Defined
- Has all the properties of nominal and ordinal
(place-holding, mutually exclusive and
exhaustive, rank order)
- Has the additional quality that the distance
between numbers is equal
- This allows for the calculation of mean and
standard deviation
- Most of the field of statistics is oriented
towards data of at least interval level (e.g.
ANOVA, regression, t-test, cluster analysis,
etc.)
- This makes it extremely tempting to treat ordinal
data as interval
- There are not a lot of examples of interval data
in social science
22Interval Data - Example
- What is the temperature outside in Fahrenheit?
- _______
23Ratio Data - Defined
- Has all the properties of nominal, ordinal and
interval (place-holding, mutually exclusive and
exhaustive, rank order, equal distance)
- Has the additional quality of an absolute zero
- There are not many statistics that take advantage
of ratio data
24Ratio Data - Example
- What is your age in years?
- _____
25Interval versus ordinal
- Interval data can inadvertently be made ordinal
by using bad ranges
- You can use the midpoint of each range to make the
data interval again
- 5 to 9 becomes 7
- 10 or more would typically become 10
- In the last 6 months (not counting times you went
to an emergency room), how many times did you go
to a doctor's office or clinic to get care for
yourself?
- 0 None
- 1 1
- 2 2
- 3 3
- 4 4
- 5 5 to 9
- 6 10 or more
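The midpoint recode described on this slide can be sketched in a few lines. The mapping follows the slide (code 5 becomes 7, code 6 becomes 10); the names `VISITS_BY_CODE` and `recode_visits` and the sample responses are hypothetical and serve only as an illustration.

```python
from statistics import mean

# Response code -> number of visits. Codes 0-4 are exact counts; code 5
# ("5 to 9") uses the midpoint 7 and code 6 ("10 or more") uses 10,
# as suggested on the slide.
VISITS_BY_CODE = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 7, 6: 10}


def recode_visits(codes):
    """Translate response codes into approximate visit counts."""
    return [VISITS_BY_CODE[code] for code in codes]


# Hypothetical responses, invented for illustration only.
sample_codes = [0, 2, 5, 6, 1, 5]
visits = recode_visits(sample_codes)
print(visits)        # [0, 2, 7, 10, 1, 7]
print(mean(visits))  # 4.5
```

Taking the mean of the raw codes instead would treat "5 to 9" as 5 visits and "10 or more" as 6, which understates the heavier users.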
26Open Ended Questions
- Typically used when you are unsure what the
response categories should be
- Sometimes used to provide text examples to
illustrate points
- Other-Specify is often included as the last of a
set of response items to cover unanticipated
responses
27Open Ended Question Example 1
- Does your child have any special health care
needs?
- 1 Yes
- 2 No
- -8 Don't know
- -9 Refused
- If Yes
- What is the diagnosis?
- ____________________________
28Open Ended Question Example 2
- What kind of place do you go to most often when
you are sick or need advice about your health?
- 1 Clinic or health center
- 2 Doctor's office
- 3 Hospital emergency room
- 4 Hospital outpatient department
- 5 Some other place (Specify)
- -7 Don't go to one place most often
- -8 Don't know
- -9 Refused
29Analysis of Open-Ended Questions
- Typically the researcher reads through all open-ended
responses and decides if new response categories
seem to come up, then recodes open-ended
responses to the new categories
- Some may use text analysis software (e.g.
Atlas.ti, MAXQDA, NVivo)
30Wordle of open ended responses to alternative
race on ten years of CCI (Brener, et al)
31Item non-response
38Question placement of breakoffs
39Examples
40Question Banks
- Pew Research Center
- http://people-press.org/question-search/
- Roper Center
- http://webapps.ropercenter.uconn.edu/CFIDE/cf/action/catalog/
- Inter-University Consortium for Political and
Social Research (ICPSR)
- http://www.icpsr.umich.edu/icpsrweb/ICPSR/
- Odum Institute
- http://arc.irss.unc.edu/dvn/
41Analysis
42Frequency table of nominal variable
Respondent's sex

SEX        Frequency   Percent   Cumulative   Cumulative
                                  Frequency      Percent
--------------------------------------------------------
1,MALE          1106     42.64         1106        42.64
2,FEMALE        1488     57.36         2594       100.00
43Frequency table of ordinal variable
Current financial condition

CURFIN          Frequency   Percent   Cumulative   Cumulative
                                       Frequency      Percent
-------------------------------------------------------------
-9,NA                   9      0.35            9         0.35
-8,DK                  12      0.46           21         0.81
1,BETTER NOW         1053     40.59         1074        41.40
2,SAME                819     31.57         1893        72.98
3,WORSE NOW           701     27.02         2594       100.00
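The percent and cumulative columns of a frequency table follow directly from the counts. The sketch below recomputes them for the CURFIN counts shown on this slide; the function name `frequency_table` is an assumption made for illustration, and this is not the procedure that produced the original output.

```python
def frequency_table(counts):
    """Build (label, frequency, percent, cum. frequency, cum. percent) rows."""
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for label, count in counts:
        running += count  # cumulative frequency up to and including this row
        rows.append((label, count, round(100 * count / total, 2),
                     running, round(100 * running / total, 2)))
    return rows


# Counts copied from the CURFIN frequency table on this slide.
curfin_counts = [("-9,NA", 9), ("-8,DK", 12), ("1,BETTER NOW", 1053),
                 ("2,SAME", 819), ("3,WORSE NOW", 701)]

for row in frequency_table(curfin_counts):
    print("{:<14}{:>6}{:>8.2f}{:>7}{:>8.2f}".format(*row))
```

Note that the NA and DK rows are included in the denominator here, as they are in the table above; dropping them would change every percentage.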
44Crosstabulation
EMPLOY(Are you employed now)
SEX(Respondent's sex)

Frequency
Percent
Row Pct
Col Pct     1,MALE   2,FEMALE      Total
----------------------------------------
-9,NA            5          5         10
              0.19       0.19       0.39
             50.00      50.00
              0.45       0.34
----------------------------------------
-8,DK            6          2          8
              0.23       0.08       0.31
             75.00      25.00
              0.54       0.13
----------------------------------------
1,YES          640        712       1352
             24.67      27.45      52.12
             47.34      52.66
             57.87      47.85
----------------------------------------
2,NO           455        769       1224
             17.54      29.65      47.19
             37.17      62.83
             41.14      51.68
----------------------------------------
Total         1106       1488       2594
             42.64      57.36     100.00
45Significance test for a table
- A significance test tells you the probability that
a relationship like the one you see in the table
would arise by chance alone
- A significance test does NOT tell you whether the
relationship is meaningful
- Chi-square is a commonly used significance test
for a table
- It is very sensitive to the number of cells
46Modified crosstabulation
EMPLOY(Are you employed now)
SEX(Respondent's sex)

Frequency
Percent
Row Pct
Col Pct     1,MALE   2,FEMALE      Total
----------------------------------------
1,YES          640        712       1352
             24.84      27.64      52.48
             47.34      52.66
             58.45      48.08
----------------------------------------
2,NO           455        769       1224
             17.66      29.85      47.52
             37.17      62.83
             41.55      51.92
----------------------------------------
Total         1095       1481       2576
             42.51      57.49     100.00

Frequency Missing = 18

Statistic                      DF      Value     Prob
-----------------------------------------------------
Chi-Square                      1    27.1563   <.0001
Likelihood Ratio Chi-Square     1    27.2376   <.0001
Continuity Adj. Chi-Square      1    26.7420   <.0001
Mantel-Haenszel Chi-Square      1    27.1458   <.0001
Phi Coefficient                       0.1027
Contingency Coefficient               0.1021
Cramer's V                            0.1027
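The Chi-Square and Phi values in this output can be recomputed from the four cell counts. The following is a minimal sketch using only the Python standard library; the function name `chi_square_2x2` is an assumption, and the closed-form p-value applies only to tables with one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square, phi (magnitude) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    phi = math.sqrt(chi2 / n)  # magnitude only; the sign is not recovered
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, phi, p_value


# Cell counts from the modified crosstabulation (rows: YES, NO; columns: MALE, FEMALE).
chi2, phi, p_value = chi_square_2x2([[640, 712], [455, 769]])
print(round(chi2, 4), round(phi, 4), p_value < 0.0001)
```

The phi coefficient near 0.10 illustrates the point made on the significance-test slide: with 2,576 respondents a weak association can still be highly significant.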
47Measuring differences between two groups: T-test
with insignificant difference

Statistics
                            Lower CL            Upper CL   Lower CL             Upper CL
Variable  BLDRO          N      Mean     Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err
PCOUNT    1,OWN       1996    2.4964   2.5556     2.6148     1.3088    1.3494     1.3926    0.0302
PCOUNT    2,RENT       432    2.4348    2.588     2.7411     1.5184    1.6197     1.7355    0.0779
PCOUNT    Diff (1-2)          -0.178   -0.032     0.1135     1.3629    1.4013     1.4418    0.0744

T-Tests
Variable  Method          Variances     DF   t Value   Pr > |t|
PCOUNT    Pooled          Equal       2426     -0.44     0.6635
PCOUNT    Satterthwaite   Unequal      567     -0.39     0.6988

Equality of Variances
Variable  Method     Num DF   Den DF   F Value   Pr > F
PCOUNT    Folded F      431     1995      1.44   <.0001
48T-test with significant difference
Statistics
                            Lower CL            Upper CL   Lower CL             Upper CL
Variable  SEX            N      Mean     Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err
indexus   1,MALE      1106    92.903   95.242     97.582     38.062    39.648     41.373    1.1922
indexus   2,FEMALE    1488    82.522   84.396      86.27     35.575    36.853     38.227    0.9554
indexus   Diff (1-2)          7.8824   10.846      13.81     37.061     38.07     39.135    1.5114

T-Tests
Variable  Method          Variances     DF   t Value   Pr > |t|
indexus   Pooled          Equal       2592      7.18     <.0001
indexus   Satterthwaite   Unequal     2281      7.10     <.0001

Equality of Variances
Variable  Method     Num DF   Den DF   F Value   Pr > F
indexus   Folded F     1105     1487      1.16   0.0090
49T-test with significant difference
Statistics
                            Lower CL            Upper CL   Lower CL             Upper CL
Variable  BLDRO          N      Mean     Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err
indexus   1,OWN       2007    88.335   90.038     91.741     37.734    38.902     40.144    0.8684
indexus   2,RENT       439    81.377   84.912     88.447     35.348    37.687     40.359    1.7987
indexus   Diff (1-2)          1.1291   5.1262     9.1233     37.632    38.687     39.803    2.0384

T-Tests
Variable  Method          Variances     DF   t Value   Pr > |t|
indexus   Pooled          Equal       2444      2.51     0.0120
indexus   Satterthwaite   Unequal      658      2.57     0.0105

Equality of Variances
Variable  Method     Num DF   Den DF   F Value   Pr > F
indexus   Folded F     2006      438      1.07   0.4071
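The "Pooled" row of this output can be reproduced from the group sizes, means and standard deviations alone. The sketch below is one way to do that with the standard library; the function name `pooled_t_from_summary` is an assumption, and it covers only the equal-variance test, not the Satterthwaite version.

```python
import math


def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_value = (mean1 - mean2) / std_err
    return t_value, df, std_err


# OWN versus RENT on indexus, values copied from the table on this slide.
t_value, df, std_err = pooled_t_from_summary(2007, 90.038, 38.902,
                                             439, 84.912, 37.687)
print(round(t_value, 2), df)  # 2.51 2444
```

The standard error this returns should agree with the Std Err reported for the Diff (1-2) row, up to rounding of the inputs.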
50Means of Persons per household by age group
Analysis Variable: PCOUNT Person Count, FL usual residence

Broader age
group of
respondent   N Obs     N        Mean     Std Dev     Minimum      Maximum
-------------------------------------------------------------------------
18-24          161   159   3.2955975   1.5733278   1.0000000   12.0000000
25-34          276   272   3.1985294   1.5620965   1.0000000   16.0000000
35-44          392   388   3.3479381   1.4924689   1.0000000   12.0000000
45-54          511   507   2.7159763   1.2877506   1.0000000    9.0000000
55-64          479   472   2.1440678   1.0033949   1.0000000    7.0000000
>65            722   715   1.8293706   1.2040915   1.0000000   20.0000000
-------------------------------------------------------------------------
51ANOVA Testing differences between more than two
groups
Dependent Variable: PCOUNT Person Count, FL usual residence

                              Sum of
Source              DF       Squares   Mean Square   F Value   Pr > F
Model                6    913.010024    152.168337     89.99   <.0001
Error             2557   4323.607761      1.690891
Corrected Total   2563   5236.617785

R-Square   Coeff Var   Root MSE   PCOUNT Mean
0.174351    51.16756   1.300343      2.541342

Source   DF      Anova SS   Mean Square   F Value   Pr > F
AGE1      6   913.0100235   152.1683373     89.99   <.0001
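The F value, R-Square and Root MSE in this output are simple functions of the sums of squares and degrees of freedom. The sketch below recomputes them from the numbers in the table; the function name `anova_summary` is an assumption made for illustration.

```python
import math


def anova_summary(ss_model, df_model, ss_error, df_error):
    """Derive one-way ANOVA summary figures from sums of squares."""
    ms_model = ss_model / df_model  # between-group mean square
    ms_error = ss_error / df_error  # within-group mean square
    return {
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_value": ms_model / ms_error,
        # Share of total variation explained by group membership.
        "r_square": ss_model / (ss_model + ss_error),
        "root_mse": math.sqrt(ms_error),
    }


# Sums of squares and degrees of freedom copied from the table above.
result = anova_summary(913.010024, 6, 4323.607761, 2557)
print(round(result["f_value"], 2))   # 89.99
print(round(result["r_square"], 4))  # 0.1744
```

An R-Square of about 0.17 means age group accounts for roughly 17 percent of the variation in persons per household, even though the F test is highly significant.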