Concepts to retain

Slides: 43

Provided by: janemc1

1
Concepts to retain

Level of measurement (nominal, ordinal,
interval, ratio)
Single- and multi-item measures (index, scale)
Response options (categorical vs. continuous)
Common response scales (Likert, visual analogue,
semantic differential)
Reliability (inter-rater, intra-rater,
test-retest)
Measuring reliability (percent agreement, kappa
coefficient, ICC)
Internal consistency (Cronbach's alpha)
Validity (content/face; criterion (concurrent
and predictive); construct (discriminant and
convergent))
Biases (response bias, recall bias, acquiescence
bias, social desirability)
Responsiveness

2
True/False
3
(No Transcript)
4
Traditions of measurement theory

Clinimetric - clinical, epidemiological (focus on
screening and diagnostic tests)
Psychometric - psychology (focus on scales)

5
What do we measure in epidemiology?

Health outcomes
Exposures, determinants, risk factors
Confounders
Effect modifiers
Objective is to maximize the validity of the
study results

6
Sources of data

Primary
Clinical observations
Questionnaires and interviews

Secondary
Reportable diseases, registries
Administrative databases (hospital
discharges, medication prescriptions)
Vital statistics

7
Measurement

Researcher must have a very clear idea of the
- concept that needs to be measured
- the type and amount of information needed
for analysis
- operational definition
A measure comprises
- question(s)/item(s) (i.e., single vs.
multi-item index/scale)
- response options (open- vs. closed-ended;
categorical vs. continuous)
- many options and many decisions need to be
made

8
Criteria to select measure

Appropriate to purpose (describe health, evaluate
intervention, compare groups, predict outcome)
Feasible
Respondent burden
Method of administration
- self-administered (in-person, mail)
- interviewer (face-to-face, telephone)
- informant or proxy

Cost
Acceptable
Simplicity
Parsimonious
Meaningful
Reliability
Validity
Responsiveness (sensitivity to change)

9
Single vs. multi-item measures

Single item measures
- used when underlying concept is simple and
easy to measure
Multi-item measures (index)
- sets of items measuring a latent construct
- items interrelated with each other more than
with items representing other latent variables
- items summed, averaged, weighted
- sub-scales
- Cronbach's alpha is a common test of whether
items are sufficiently interrelated to justify
their combination in an index
- scale - ordinal index

10
Single-item measure of nicotine dependence

On a scale of 1 to 10, how addicted are you to
cigarettes?

11
Fagerstrom Test for Nicotine Dependence

1. How soon after you wake up do you smoke your
first cigarette?
- After 60 minutes (0)
- 31-60 minutes (1)
- 6-30 minutes (2)
- Within 5 minutes (3)
2. Do you find it difficult to refrain from
smoking in places where it is forbidden?
- No (0)
- Yes (1)
3. Which cigarette would you hate most to give
up?
- The first in the morning (1)
- Any other (0)

4. How many cigarettes per day do you smoke?
- 10 or less (0)
- 11-20 (1)
- 21-30 (2)
- 31 or more (3)
5. Do you smoke more frequently during the first
hours after awakening than during the rest of the
day?
- No (0)
- Yes (1)
6. Do you smoke even if you are so ill that you
are in bed most of the day?
- No (0)
- Yes (1)

12
Choice of response options

Open-ended: What do you like most about the
epidemiology program at McGill? _________________
- useful in exploratory research
- used to develop more structured
questions
- analysis is time-consuming and requires
qualitative methods
Closed-ended: What I like most about McGill is
the (choose one response)
(i) the teachers in 611
(ii) the walk up the hill to Purvis in
the winter
(iii) the fascinating Monday seminars
(iv) other
- used more frequently
- easier to analyze

13
Choice of response options - categorical (discrete)

Dichotomous, binary
- two response categories
- Are you able to climb stairs? (yes,
no)
Polychotomous - multiple response categories
- nominal - What is your marital status?
(single, married, divorced)
- ordinal - categorical data where there is a
logical ordering in the categories (Do you have
difficulty walking? 0 - no, 1 - some problems,
2 - confined to bed)
- can be analyzed as continuous
(pseudocontinuous)
Disadvantages
- need to make judgments
- loss of information (precision)

14
Choice of response options - continuous
(quantitative)

Interval scale
- measures quantitative differences between
values of a variable
- equal distances between values
- scores can be added and subtracted but not
multiplied or divided
- no true zero value (or it is hard to define)
- e.g., intelligence, temperature in degrees
Celsius (weight, by contrast, has a true zero and
is a ratio measure)
Ratio scales
- a numerical interval scale with a true
zero point
- a given size interval has the same
interpretation for the entire scale
- e.g., no. cigarettes/day, no. nights spent in
a hospital
Continuous measures can be categorized

15
Visual analog scale

A bipolar scale (absence vs. highest degree)
used to determine the degree of stimuli
experienced, commonly used as a visual
measurement of pain or stimuli.
To help people say how good or bad their
health is, let's say the best state you can
imagine is 100, and the worst is 0. In your
opinion, how good or bad is your health today?
Please mark an X on the line below.
0 ____________________________________________ 100
How severe has your arthritic pain been
today?
Pain as bad as can be ________________________ No pain

16
Likert scale

Ordinal scales commonly used in attitudinal
measurements
Please circle the response that corresponds
best to your opinion. I am able to get up early
enough in the morning to exercise before work.
1. Totally agree
2. Agree
3. No opinion
4. Disagree
5. Totally disagree

17
Semantic differential scale

A technique for obtaining a value for subjective
response in which the subject is asked to denote
the intensity of a stimulus by choosing a
subdivision between two extremes
My illness is
Painful ________________________ Painless
Serious ________________________ Mild
Boring  ________________________ Interesting

18
Fagerstrom Test for Nicotine Dependence

1. How soon after you wake up do you smoke your
first cigarette?
- After 60 minutes (0)
- 31-60 minutes (1)
- 6-30 minutes (2)
- Within 5 minutes (3)
2. Do you find it difficult to refrain from
smoking in places where it is forbidden?
- No (0)
- Yes (1)
3. Which cigarette would you hate most to give
up?
- The first in the morning (1)
- Any other (0)

4. How many cigarettes per day do you smoke?
- 10 or less (0)
- 11-20 (1)
- 21-30 (2)
- 31 or more (3)
5. Do you smoke more frequently during the first
hours after awakening than during the rest of the
day?
- No (0)
- Yes (1)
6. Do you smoke even if you are so ill that you
are in bed most of the day?
- No (0)
- Yes (1)

19
Level of dependence on nicotine

0-2 Very low dependence
3-4 Low dependence
5 Medium dependence
6-7 High dependence
8-10 Very high dependence
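
The scoring on the two preceding slides can be sketched in a few lines of Python. This is only an illustration: the function names `ftnd_total` and `dependence_level` are made up here, and the item maxima and cut-points are copied from the response options and dependence levels listed above.

```python
# Maximum score for FTND items 1-6, taken from the response options above.
FTND_MAX_PER_ITEM = [3, 1, 1, 3, 1, 1]


def ftnd_total(item_scores):
    """Sum the six item scores after checking each is in its allowed range."""
    if len(item_scores) != len(FTND_MAX_PER_ITEM):
        raise ValueError("the FTND has exactly 6 items")
    for score, maximum in zip(item_scores, FTND_MAX_PER_ITEM):
        if not 0 <= score <= maximum:
            raise ValueError(f"item score {score} outside 0-{maximum}")
    return sum(item_scores)


def dependence_level(total):
    """Map a total score (0-10) to the categories on the slide."""
    if not 0 <= total <= 10:
        raise ValueError("total must be between 0 and 10")
    if total <= 2:
        return "Very low dependence"
    if total <= 4:
        return "Low dependence"
    if total == 5:
        return "Medium dependence"
    if total <= 7:
        return "High dependence"
    return "Very high dependence"


# Hypothetical respondent: smokes within 5 minutes of waking (3), finds it
# hard to refrain (1), would hate to give up the first cigarette (1),
# smokes 21-30 per day (2), smokes more in the morning (1), not when ill (0).
answers = [3, 1, 1, 2, 1, 0]
total = ftnd_total(answers)
print(total, dependence_level(total))  # 8 Very high dependence
```

Note that the multi-item total is an ordinal index: the categories are ordered, but a score of 8 is not "twice as dependent" as a score of 4.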

20
Reliability

Refers to the degree to which the results
obtained by a measurement procedure can be
replicated
Measures with low reliability will vary across
interviewers, time, method of administration
Internal consistency
Reproducibility (stability)
Test-retest reliability
Inter-rater and intra-rater reliability

21
Internal consistency

Concept that is relevant to multi-item index
Inter-correlation between items of a scale that
are meant to measure different dimensions of the
same construct
Based on a single administration of an index
Scales with more items tend to have higher
internal consistency
Cronbach's alpha (psychometric property)
- assesses the extent to which a set of items
can be treated as measuring a single latent
variable

22
Measure of internal consistency

Split-half reliability - correlation between
scores on arbitrary half of measure with scores
on other half
Cronbach's alpha estimates the average split-half
correlation over all possible ways of dividing
the scale
May be used to reduce the number of items in a
scale
Ranges between 0.0-1.0
Widely-accepted cut-off is that alpha should be
.70 or higher, some use .75 or .80 while others
are as lenient as .60
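
As a rough illustration of how alpha is obtained from a single administration, the sketch below uses the usual variance form, alpha = k/(k-1) * (1 - sum of item variances / variance of total score). The function name `cronbach_alpha` and the small data set are hypothetical; they are not taken from the slides.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: one list of item scores per respondent, all of the same length.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical two-item scale answered by four respondents.
responses = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(responses), 2))  # 0.75
```

With the cut-offs quoted above, this toy scale would pass the .70 threshold but not the stricter .80 one.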

23
Use of the Fagerstrom Tolerance Questionnaire for
measuring nicotine dependence among adolescent
smokers in China: a pilot test.

Chen X, Zheng H, Steve S, Gong J, Stacy A, Xia J,
Gallaher P, Dent C, Azen S, Shan J, Unger JB,
Johnson CA.
Institute for Health Promotion and Disease
Prevention Research, University of Southern
California, USA. jim_chen_at_abtassoc.com

The validity of the Prokhorov adolescent version
of the Fagerstrom Tolerance Questionnaire (FTQ)
has not been demonstrated in assessing nicotine
dependence among Chinese adolescents in China.
Data for 48 tenth-grader 30-day smokers in Wuhan,
China (ages 16-17 years), were analyzed. Two
different item scoring protocols were used, and
self-reports of smoking were validated with
saliva cotinine. When items were scored using
Protocol A, Cronbach's alphas were .42 and .63
for the 7-item and the 4-item scales,
respectively, while using Protocol B, the alphas
were .67 and .79 for the 7-item and 4-item
scales, respectively. The total FTQ scores were
significantly associated with self-reported
smoking and saliva cotinine levels. These results
support the reliability and validity of the
Prokhorov FTQ.

24
To measure reproducibility

Need at least two administrations
Intra-rater - repeated measurements by the same
rater
Inter-rater - two or more raters assess the same
measure
Test-retest - measure is taken two or more times
under identical conditions
- for constructs that fluctuate, an interval of
2 weeks is often used to reduce effects of memory
and true change
- some constructs should not fluctuate
(personality traits)

25
Measures of reliability for categorical data

Percent agreement
- limitation: value is affected by prevalence
- higher if very low or very high prevalence
Kappa statistic
- takes chance agreement into account
- defines fraction of observed agreement not due
to chance
- Kappa = (p(obs) - p(exp)) / (1 - p(exp))
p(obs) = proportion of observed agreement
p(exp) = proportion of agreement expected
by chance
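
The two measures above can be compared on the same ratings with a short sketch. It assumes two raters who each classify the same subjects once (Cohen's kappa); the function names and the ratings are invented for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """p(obs): share of subjects on which the two raters agree."""
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Kappa = (p(obs) - p(exp)) / (1 - p(exp)) for two raters."""
    n = len(rater_a)
    p_obs = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # p(exp): chance agreement from each rater's marginal proportions.
    categories = set(counts_a) | set(counts_b)
    p_exp = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    # Undefined (division by zero) if both raters use a single category.
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ratings of 10 subjects by two raters.
rater_a = ["yes"] * 5 + ["no"] * 5
rater_b = ["yes"] * 4 + ["no"] * 5 + ["yes"]
print(percent_agreement(rater_a, rater_b))       # 0.8
print(round(cohen_kappa(rater_a, rater_b), 2))   # 0.6
```

Here the raters agree on 80% of subjects, but half of that agreement would be expected by chance, so kappa is lower than percent agreement.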

26
(No Transcript)
27
Interpretation of Kappa

Range 0.0-1.0 (negative values are possible when
agreement is worse than chance)
Excellent: > 0.75
Fair to good: 0.40 - 0.75
Poor: < 0.40

28
Measures of reliability for continuous data

Correlation coefficients measure pair-wise
comparison
Pearson's r
- assesses linear association between 2 sets of
observations
- sensitive to range of values, especially
outliers
Spearman's r
- ordinal or rank order correlation
- less influenced by outliers
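
The difference in outlier sensitivity can be seen in a small sketch. The helper names (`pearson_r`, `ranks`, `spearman_r`) and the two sets of scores are hypothetical; Spearman's r is computed here simply as Pearson's r on the ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            result[order[position]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical test and retest scores; the last subject is an outlier.
test = [1, 2, 3, 4, 20]
retest = [2, 1, 4, 3, 25]
print(round(pearson_r(test, retest), 2))   # dominated by the outlier
print(round(spearman_r(test, retest), 2))  # depends only on rank order
```

In this made-up example the single extreme pair pulls Pearson's r close to 1, while Spearman's r reflects the rank disagreements among the other four subjects.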

29
Intra-class correlation coefficient (ICC)

Equivalent to kappa and same range of values
(0.0-1.0)
Reflects true agreement, including systematic
differences
Assesses reliability by comparing the variability
of different ratings of the same subject to the
total variation across all ratings and all
subjects.
Estimates proportion of total measurement
variability due to between-individuals (vs error
variance)
Interpretation: an ICC of 0.88 means that 88% of
the variation in the scores relates to true
variance between subjects
Affected by range of values - if less variation
between individuals, ICC will be lower
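
A minimal sketch of the variance comparison described above follows. The slides do not say which form of the ICC is intended, so the one-way random-effects version (between-subject vs. within-subject mean squares) is an assumption here, as are the function name `icc_oneway` and the ratings.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    ratings: one list per subject, each holding the same number k of ratings.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject mean square: spread of subject means.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square: disagreement among ratings of one subject.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: the second rater always scores 2 points higher.
shifted = [[1, 3], [2, 4], [3, 5]]
identical = [[1, 1], [2, 2], [3, 3]]
print(icc_oneway(identical))  # 1.0
print(icc_oneway(shifted))    # 0.0
```

The two columns of `shifted` are perfectly correlated (Pearson's r would be 1), yet the ICC is 0 because the systematic 2-point difference counts as disagreement. This is the sense in which the ICC reflects agreement rather than mere association.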

30
The Fagerström Test for Nicotine Dependence in a
Dutch sample of daily smokers and ex-smokers

J M. Vink, G Willemsen, A Beem, D Boomsma

Abstract: We explored the performance of the
Fagerström Test for Nicotine Dependence (FTND) in
a sample of 1378 daily smokers and 1058
ex-smokers who participated in a survey study of
the Netherlands Twin Register. FTND scores were
higher for smokers than for ex-smokers. Nicotine
dependence level was not associated with age.
FTND score was highly correlated with the maximum
number of cigarettes smoked (even after excluding
the item "number of cigarettes per day" from the
FTND), but the FTND score showed a low
correlation with age of first cigarette and total
number of years smoked. In a subsample of smokers
(n = 143) and ex-smokers (n = 181) the test-retest
correlations for the FTND were high. In general,
the performance of the FTND in ex-smokers was
comparable with that in smokers. These findings
suggest the FTND to be a valuable tool for
studies of nicotine dependence in large
epidemiological samples.

In the test-retest sample, the mean FTND score of
the first measurement was not significantly
different from the mean FTND score at the second
measurement occasion. The test-retest
correlations (Pearson-Lawly correction) were .70
for male smokers, .83 for female smokers, .91 for
male ex-smokers, and .83 for female ex-smokers.
These correlations did not differ much from the
regular Pearson Product-Moment Correlations (.72
for male smokers, .85 for female smokers, .92 for
male ex-smokers, and .86 for female ex-smokers).

31
To improve reliability

Increase the number of items in a scale
Increase the number of response choices for each
item
Reduce inter-observer variation through training
of interviewers, use of standardized protocols
Reduce ambiguity in questions

32
Validity

An expression of the degree to which a
measurement measures what it purports to measure.
Does it measure what it is intended to?
Types
- Face, content
- Criterion (concurrent (convergent),
predictive)
- Construct (discriminant, convergent)
- Responsiveness
Depends on purpose
- Develop new scale: content validity
- Screening: discriminant construct validity
- Outcome of treatment: responsiveness,
sensitivity to change
- Prognosis: predictive validity

33
Content and face validity

Judgment of experts and/or members of target
population
Face validity - extent to which, on the face of
it, the measurement appears to be measuring the
desired qualities (eyeball test)
Content validity - extent to which the
measurement incorporates all the relevant content
or domains of the construct under study
Content can be developed through lit reviews,
interviews with target population, focus groups,
review of existing instruments

34
Criterion validity

Extent to which a measure correlates with an
external criterion (gold standard)
Convergent (concurrent) criterion validity -
correlation between the measurement of interest
and another measure known to measure the same
concept. Both measures are taken at the same time
- correlations typically 0.4-0.8
- screening test vs. diagnostic test
Predictive criterion validity - ability of the
measure to predict the criterion
- cancer staging test vs 5-year survival

35
Construct validity

Is the theoretical construct underlying the
measure valid?
Development and testing of hypotheses
Requires multiple data sources and investigations
- convergent validity - measure is correlated
with other measures of similar constructs (e.g.,
food frequency questionnaire and food records;
Fagerstrom correlates with saliva cotinine)
- discriminant validity - measure is not
correlated with measures of different constructs
(e.g., Fagerstrom not correlated with depression)

36
Table 1. Correlation among FTND scores of daily
smokers and ex-smokers and other smoking
variables
All correlations are
significant at the P < .05 level.
37
Response bias

Tendency to respond in a particular way or style
to items on a scale that yields systematic error
Recall bias - systematic error due to the
differences in accuracy or completeness of recall
to memory of past events or experiences
Acquiescence bias - tendency to agree with
statements of opinion
Social desirability - tendency to respond in a
way that is perceived to be more socially
desirable than true response

38
Factors affecting response

Question wording/response scale
Characteristics of subjects (age, sex,
education)
Method of data collection (questionnaire,
interview, telephone vs. face-to-face)
Training of interviewers

39
Responsiveness

Ability of measure to detect clinically
important change over time or differences between
treatments
Sensitivity to change
Important when testing the effectiveness of an
intervention

40
Translation

Not a simple matter
Double back translation
Need to retest validity and reliability in target
population

41
Ask yourself.

How will you measure the outcome? Exposures?
Confounders?
Are your measures reliable? In the population you
will target? How was reliability established?
Is there any evidence that your measures are
valid? In the population you will target? How was
validity established?

42
True/False
