Title: Coding closed questions
1 Coding closed questions
GAP Toolkit 5
Training in basic drug abuse data
management and analysis
2 Objectives
- To establish a set of practical coding rules for
closed questions
- To explain the importance of assigning numbers to
characteristics
- To construct a framework for recording missing
values
- To introduce identification numbers as a method
of ensuring the anonymity of respondents, while
maintaining a link between files and
questionnaires
3 Components of a data file
- Cases or observations
- Variables
- Values
4 Coding
- The identification of the possible values of a
variable and the assignment of numbers to those values
- The numbers, representing the values, are stored
in a data file
5 Closed questions/categorical variables
- A limited number of values
- The values are mutually exclusive
- The values are collectively exhaustive
- Code by assigning a number to each value
6 Example
- Coding gender
- Possible values: male, female
- Coding scheme: 1 = Male, 2 = Female
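The gender coding scheme can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the toolkit: the names GENDER_CODES and code_gender are hypothetical, and the lookup assumes responses differ only in case or surrounding spaces.

```python
# Hypothetical codebook for the gender example: code -> value label.
GENDER_CODES = {1: "Male", 2: "Female"}

# Reverse lookup used at data entry: lower-cased label -> code.
GENDER_LABELS = {label.lower(): code for code, label in GENDER_CODES.items()}


def code_gender(response: str) -> int:
    """Return the numeric code for a questionnaire response."""
    return GENDER_LABELS[response.strip().lower()]


responses = ["male", "Female", "female "]
coded = [code_gender(r) for r in responses]
print(coded)  # [1, 2, 2]
```

Only the numbers are stored in the data file; the labels live in the codebook and are looked up when results are reported.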
7 Why numbers?
- Efficient use of computers
- Quicker to enter
- Not subject to spelling mistakes
8 Why numbers?
- Some statisticians define measurement as
necessarily resulting in numbers
- To measure a property means to assign numbers to
units as a way of representing that property.
- (D. S. Moore, Statistics: Concepts and
Controversies, 2nd ed. (New York, W. H. Freeman
Press, 1985)).
9 Pre-code
- Coding takes place before the questionnaire is
delivered
- The possible responses to a question are
anticipated
- The coding appears on the questionnaire
10 Coding rules
- Codes must be
- Mutually exclusive
- Collectively exhaustive
- Consistent across variables
- (J. Fielding, Coding and managing data,
Researching Social Life, N. Gilbert, ed. (London,
Sage Publications, 1993) and D. De Vaus, Surveys
in Social Research (London, Routledge, 2002)).
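The three coding rules can be checked mechanically. The following is a minimal sketch under stated assumptions: a codebook is represented as a Python dictionary from code to label, and the function names, the variable names ever_used and injected, and their Yes/No codes are hypothetical illustrations rather than anything defined in the toolkit.

```python
def check_codebook(codebook, observed):
    """Return a list of rule violations for one variable's coding scheme."""
    problems = []
    labels = list(codebook.values())
    # Mutually exclusive: no category may appear under two different codes.
    duplicates = sorted({label for label in labels if labels.count(label) > 1})
    if duplicates:
        problems.append(f"not mutually exclusive: {duplicates}")
    # Collectively exhaustive: every observed response must have a code.
    uncoded = sorted(set(observed) - set(labels))
    if uncoded:
        problems.append(f"not collectively exhaustive: {uncoded}")
    return problems


def inconsistent_labels(codebooks):
    """Return labels that are given different codes in different variables."""
    codes_by_label = {}
    for codebook in codebooks.values():
        for code, label in codebook.items():
            codes_by_label.setdefault(label, set()).add(code)
    return sorted(label for label, codes in codes_by_label.items() if len(codes) > 1)


DRUG = {1: "Heroin", 2: "Alcohol", 3: "Hashish", 4: "Bhang"}
print(check_codebook(DRUG, ["Heroin", "Mandrax"]))
# ["not collectively exhaustive: ['Mandrax']"]

# Hypothetical yes/no variables coded in opposite directions.
print(inconsistent_labels({
    "ever_used": {1: "Yes", 2: "No"},
    "injected": {1: "No", 2: "Yes"},
}))
# ['No', 'Yes']
```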
11 Continuous variables
- Do not generally require coding as
- They are already numerical
- There is a potentially infinite number of
categories
12 Coding in SPSS
- The Values column in Variable View is used to
implement coding in SPSS
- Numbers are allocated to each of the categories
of a variable
13 Example: coding Drug
- In data file Ex1.sav, a variable called Drug was
defined as a string variable and a number of
drugs were entered
Case summaries (a)
    Drug
1   Heroin
2   Alcohol
3   Hashish
4   Bhang
5   Heroin
6   Hashish
Total N   6
a. Limited to first 100 cases.
14 Coding Drug
- Decide on a set of numeric labels for the
different categories, in this case drugs
- 1 Heroin
- 2 Alcohol
- 3 Hashish
- 4 Bhang
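Outside SPSS, the same recoding can be sketched in Python. The example applies the numeric labels listed above to the six string values from the case summary; the names DRUG_CODES, drug and drug2 are illustrative choices, not names taken from Ex1.sav.

```python
# Coding scheme from the slide: string category -> numeric code.
DRUG_CODES = {"Heroin": 1, "Alcohol": 2, "Hashish": 3, "Bhang": 4}

# The six string values listed in the case summary.
drug = ["Heroin", "Alcohol", "Hashish", "Bhang", "Heroin", "Hashish"]

# The coded variable holds numbers only.
drug2 = [DRUG_CODES[d] for d in drug]
print(drug2)  # [1, 2, 3, 4, 1, 3]

# Value labels map the numbers back to readable categories for reporting.
VALUE_LABELS = {code: label for label, code in DRUG_CODES.items()}
print([VALUE_LABELS[c] for c in drug2] == drug)  # True
```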
15 Coding Drug
- Create a new variable Drug2: type numeric,
width 2, decimals 0, label Drug Coded
- Click on the Values column and then on the three
dots that appear to the right of the Values box
to generate the following dialogue box
16 Click to register code
18 Frequency count for Drug Coded
Drug Coded
                 Frequency   Percentage   Valid percentage   Cumulative percentage
Valid  Heroin        2          33.3           33.3                 33.3
       Alcohol       2          33.3           33.3                 66.7
       Hashish       1          16.7           16.7                 83.3
       Bhang         1          16.7           16.7                100.0
       Total         6         100.0          100.0
19 Note
- Coding data does not change the level of
measurement
- The level of measurement is a guide to the
selection of appropriate statistics
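A short Python sketch illustrates the point, assuming the six coded values from the case summary (1 Heroin, 2 Alcohol, 3 Hashish, 4 Bhang). The codes are numbers, so a mean can be computed, but Drug is still a nominal variable and the mean has no interpretation; the mode does.

```python
from statistics import mean, multimode

VALUE_LABELS = {1: "Heroin", 2: "Alcohol", 3: "Hashish", 4: "Bhang"}
drug2 = [1, 2, 3, 4, 1, 3]  # coded values of the six cases

# Computable, but meaningless: the codes are labels, not quantities.
average_code = mean(drug2)
print(average_code)

# Appropriate for a nominal variable: the most frequent categories.
most_common = [VALUE_LABELS[c] for c in multimode(drug2)]
print(most_common)  # ['Heroin', 'Hashish']
```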
20 SPSS
- Value labels can be assigned to numeric variables
and string variables of eight or fewer characters
- By default, SPSS sets all numeric variables to
Scale variables
21 Exercise: coding
22 Frequency count of Drug
Drug
                 Frequency   Percentage   Valid percentage   Cumulative percentage
Valid  Alcohol       3          25.0           25.0                 25.0
       Bhang         1           8.3            8.3                 33.3
       Hashish       3          25.0           25.0                 58.3
       Heroin        2          16.7           16.7                 75.0
       Mandrax       3          25.0           25.0                100.0
       Total        12         100.0          100.0
23 Frequency count of Condition
Condition Coded
                 Frequency   Percentage   Valid percentage   Cumulative percentage
Valid  Recovered     5          41.7           41.7                 41.7
       Relapsed      7          58.3           58.3                100.0
       Total        12         100.0          100.0
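The percentage and cumulative percentage columns follow directly from the frequencies. The sketch below rebuilds them in Python from the counts in the two tables above; frequency_table is a hypothetical helper, and the individual cases are reconstructed from the counts because the twelve rows themselves are not listed here.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, frequency, percentage, cumulative percentage), sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / total, 1),
            round(100 * running / total, 1),
        ))
    return rows


# Cases reconstructed from the frequency counts (12 respondents).
drug = ["Alcohol"] * 3 + ["Bhang"] + ["Hashish"] * 3 + ["Heroin"] * 2 + ["Mandrax"] * 3
condition = ["Recovered"] * 5 + ["Relapsed"] * 7

drug_rows = frequency_table(drug)
condition_rows = frequency_table(condition)
for row in drug_rows + condition_rows:
    print(row)
```

With no missing values, the valid percentage equals the percentage, which is why the two columns agree in both tables.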
24 Missing values
25 Missing values: causes
- The question is not applicable
- The respondent does not know
- The respondent refuses to answer
- No response is marked on the questionnaire (i.e.,
truly missing and there is no clue why)
- (De Vaus, 2002)
26 Coding missing values
- Use codes outside of the range of common values
- e.g., 9, 99, -99, 999
- If possible, retain the same codes for the
various missing options for all variables
- The default missing value in SPSS is a full stop (.)
and is called the system-missing value
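As an illustration only, the sketch below assigns one hypothetical code to each cause of missing data and keeps those codes apart from the valid drug codes. The specific numbers 96 to 99 are an assumption made for this example, and None stands in for the SPSS system-missing value.

```python
# Valid codes for the Drug variable (from the coding scheme).
DRUG_CODES = {1: "Heroin", 2: "Alcohol", 3: "Hashish", 4: "Bhang"}

# Hypothetical missing-value codes, reused unchanged for every variable.
MISSING_CODES = {
    96: "Not applicable",
    97: "Does not know",
    98: "Refused",
    99: "No response",
}

# Missing codes must lie outside the range of valid values.
overlap = set(DRUG_CODES) & set(MISSING_CODES)


def is_missing(code):
    """True for a user-defined missing code or None (stand-in for system-missing)."""
    return code is None or code in MISSING_CODES


drug2 = [1, 2, 99, 3, 97, None]
valid = [c for c in drug2 if not is_missing(c)]
print(valid)  # [1, 2, 3]
```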
27 SPSS: missing values
- Part of the variable definition
- Variable View: Missing column
- Click on the Missing cell in the row defining the
variable
- Click on the three dots that appear to the
right of the Missing cell and the following
dialogue box will appear
28 (Screenshot of the Missing Values dialogue box; no transcript)
29 Exercise
- Three additional observations are obtained for
Ex1.sav:
- DAP1-0013 Alcohol 39 ------------
- DAP1-0014 Hashish -- Recovered
- DAP1-0015 --------- 16 Relapsed
- Code necessary missing values for the variables
- Run a frequency count on Drug and Condition,
comparing percentage and valid percentage
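The difference between percentage and valid percentage can be sketched in Python for the Drug variable. The example assumes the twelve earlier cases match the frequency count of Drug shown before (Alcohol 3, Bhang 1, Hashish 3, Heroin 2, Mandrax 3) and adds the three new observations; the missing code 99 and the helper name are hypothetical.

```python
from collections import Counter

MISSING = 99  # hypothetical user-defined missing code


def percent_and_valid_percent(values, missing=MISSING):
    """Map each value to (frequency, percentage, valid percentage)."""
    counts = Counter(values)
    n_total = len(values)
    n_valid = n_total - counts[missing]
    table = {}
    for value, n in counts.items():
        percentage = round(100 * n / n_total, 1)
        # Missing cases are excluded from the valid-percentage base.
        valid_percentage = None if value == missing else round(100 * n / n_valid, 1)
        table[value] = (n, percentage, valid_percentage)
    return table


# Twelve earlier cases plus DAP1-0013 (Alcohol), DAP1-0014 (Hashish)
# and DAP1-0015 (Drug not recorded).
drug = (["Alcohol"] * 4 + ["Bhang"] + ["Hashish"] * 4
        + ["Heroin"] * 2 + ["Mandrax"] * 3 + [MISSING])

table = percent_and_valid_percent(drug)
print(table["Alcohol"])  # (4, 26.7, 28.6)
print(table[MISSING])    # (1, 6.7, None)
```

Percentage divides by all 15 cases, while valid percentage divides by the 14 cases with a recorded drug, so the two columns diverge as soon as a missing value is present.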
30 Identification numbers
31 ID numbers: purpose
- An ID number
- Ensures anonymity
- Links a row in the data file to a physical
questionnaire
32 ID numbers: characteristics
- A unique identifier
- Sometimes contains information in a compound form
33 Example
- DAP1-001, DAP1-002, ...
- DAP is short for Drug Assessment Programme
- 001, 002 are consecutive numbers that uniquely
identify each questionnaire or respondent
- There must be at most 999 respondents, as space
has only been made available for 999 unique ID
numbers
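A compound ID of this form can be generated and taken apart with a few lines of Python. This is a sketch under assumptions: make_id, parse_id and the regular expression are hypothetical helpers, and the prefix format (capital letters followed by optional digits) is inferred from the DAP1 example only.

```python
import re


def make_id(prefix: str, number: int, width: int = 3) -> str:
    """Build an ID such as DAP1-001; width fixes how many IDs are available."""
    if not 1 <= number <= 10 ** width - 1:
        raise ValueError(f"number must be between 1 and {10 ** width - 1}")
    return f"{prefix}-{number:0{width}d}"


# Assumed format: capital letters, optional digits, a hyphen, then the serial number.
ID_PATTERN = re.compile(r"^([A-Z]+\d*)-(\d+)$")


def parse_id(id_number: str):
    """Split a compound ID into its prefix and its serial number."""
    match = ID_PATTERN.match(id_number)
    if match is None:
        raise ValueError(f"not a valid ID: {id_number!r}")
    return match.group(1), int(match.group(2))


ids = [make_id("DAP1", n) for n in (1, 2, 999)]
print(ids)                   # ['DAP1-001', 'DAP1-002', 'DAP1-999']
print(parse_id("DAP1-002"))  # ('DAP1', 2)

# Every ID must be unique within the data file.
all_unique = len(set(ids)) == len(ids)
```

With three digits reserved, make_id("DAP1", 1000) raises an error, which mirrors the limit of 999 respondents.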
34 Summary
- Coding closed questions
- Value labels
- Frequency counts
- Missing values
- ID numbers