Title: Measurement Issues in Health Disparities Research
1 Measurement Issues in Health Disparities
Research
- Anita L. Stewart, Ph.D.
- University of California, San Francisco
- Health Disparities Research Methods
- EPI 222, Spring
- April 14, 2011
2 Overview of Class
- Background: culture-specific versus generic measures
- Conceptual and psychometric adequacy and equivalence
  - Adequacy in one group
  - Equivalence across groups
- Modifying measures
3 Background
- U.S. population becoming more diverse
- Minority groups are being included in research due to
  - NIH mandate (1993: women and minorities)
  - Health disparities initiatives
4 Types of Diverse Groups
- Health disparities research focuses on differences in health between
  - Minority vs. non-minority
  - Lower income vs. others
  - Lower education vs. others
  - Limited English Proficiency (LEP) vs. others
  - ... and many others
5 Measurement Implications of Research in Diverse Groups
- Most self-reported measures were developed and tested in mainstream, well-educated groups
- Little information is available on appropriateness, reliability, validity, and responsiveness in diverse groups
  - Although this is changing rapidly
6 Measurement Adequacy vs. Measurement Equivalence
- Adequacy: within a diverse group
  - concepts are appropriate and relevant
  - psychometric properties meet minimal criteria
    - Good variability
    - Reliable and valid
    - Sensitive to change over time
- Equivalence: between diverse groups
  - conceptual and psychometric properties are comparable
7 Why Not Use Culture-Specific Measures?
- Measurement goal is to identify measures that can be used across all groups in one study, yet maintain sensitivity to diversity and have minimal bias
- Most health disparities studies compare mean scores across diverse groups
8 Generic/Universal vs. Group-Specific (Etic versus Emic)
- Concepts unlikely to be defined exactly the same way across diverse ethnic groups
- Generic/universal (etic)
  - features of a concept that are appropriate across groups
- Group-specific (emic)
  - idiosyncratic or culture-specific portions of a concept
9 Etic versus Emic (cont.)
- Goal in health disparities research with more than one group
  - identify the generic/universal portions of a concept that are applicable across all groups
- For within-group studies
  - the culture-specific portion is also relevant
10 Overview of Class
- Background: culture-specific versus generic measures
- Conceptual and psychometric adequacy and equivalence
  - Adequacy in one group
  - Equivalence across groups
11 Conceptual and Psychometric Adequacy and Equivalence
               Adequacy in 1 Group                     Equivalence Across Groups
Conceptual     Concept meaningful within one group     Concept equivalent across groups
Psychometric   Psychometric properties meet minimal    Psychometric properties invariant
               standards within one group              (equivalent) across groups
12 Left Side of Matrix: Adequacy in a Single Group
- Conceptual: concept meaningful within one group
- Psychometric: psychometric properties meet minimal standards within one group
13 Right Side of Matrix: Equivalence in More Than One Group
- Conceptual: concept equivalent across groups
- Psychometric: psychometric properties invariant (equivalent) across groups
14 Overview of Class
- Background: culture-specific versus generic measures
- Conceptual and psychometric adequacy and equivalence
  - Adequacy in one group
  - Equivalence across groups
- Modifying measures
15 Approaches to Explore Conceptual Adequacy in Diverse Groups
- Literature reviews of concepts and measures
- In-depth interviews and focus groups
  - discuss concepts, obtain their views
- Expert consultation from diverse groups
  - review concept definitions
  - rate relevance of items
16 Example: Review of Measures of Dietary Intake in Minority Populations
- Reviewed food frequency questionnaires for use in minority populations
- Performed well in some groups and poorly in others
- Group differences that could affect scores
  - Portion sizes differ
  - Missing ethnic foods
  - Could underestimate total intake and nutrients
RJ Coates et al. Am J Clin Nutr 1997;65(suppl):1108S-15S.
17 A Structured Method for Examining Conceptual Relevance
- Compiled set of 33 typical HRQL items
- Administered to older African Americans
- After each question, asked "How relevant is this question to the way you think about your health?"
- 0-10 scale with 0 = not at all relevant, 10 = extremely relevant
Cunningham WE et al., Qual Life Res, 1999;8:749-768.
18 HRQL Relevance Results
- Most relevant items
  - Spirituality, weight-related health, hopefulness
- Least relevant items
  - Physical functioning, role limitations due to emotional problems
19 Qualitative Research: Expert Panel Reviewed Spanish FACT-G
- Functional Assessment of Cancer Therapy - General (FACT-G)
- Bilingual/bicultural panel reviewed items for conceptual relevance to Hispanics
- One item had low relevance ("I worry about dying")
  - Added new item "I worry my condition will get worse"
- One domain missing: spirituality
  - Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts
D Cella et al. Med Care 1998;36:1407.
20 Example of Inadequate Concept
- Patient satisfaction typically conceptualized in terms of, e.g.,
  - access, technical care, communication, continuity, coordination, interpersonal style
- In minority and low-income groups, additional relevant domains
  - discrimination by health professionals
  - sensitivity to language barriers
MN Fongwa et al., Ethnicity Dis, 2006;16(3):948-955.
21 Measuring Park/Recreation Environments in Low-Income Communities
- New focus on how environments promote physical activity
- Many good new measures of environments
- Reviewed adequacy for lower-income, minority communities
22 Measuring Park/Recreation Environments in Low-Income Communities (cont.)
- Recommendations: in low-income communities of color
  - Identify and address most salient environmental needs
  - Incorporate research on preferred recreational activities
  - Ensure representation of perceptions of residents
MF Floyd et al. Am J Prev Med, 2009;36:S156-S160.
23 Psychometric Adequacy in any Group
- Minimal standards
  - Sufficient variability
  - Minimal missing data
  - Adequate reliability/reproducibility
  - Evidence of construct validity
  - Evidence of sensitivity to change
24 Example: Adequacy of Reliability of Spanish SF-36 in Argentinean Sample
SF-36 scale                      Coefficient alpha
Physical functioning             .85
Role limitations - physical      .84
Bodily pain                      .80
General health perceptions       .69
Vitality                         .82
Social functioning               .76
Role limitations - emotional     .75
Mental health                    .84
F Augustovski et al, J Clin Epid, 2008;61:1279-84.
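Coefficient alpha values like those in the table are computed from item-level responses. The sketch below is a minimal illustration in Python with NumPy, assuming complete data arranged as respondents by items. The function name `cronbach_alpha` and the response matrix are hypothetical and are not SF-36 data.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]                                # number of items
    item_var_sum = x.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)         # variance of the scale total
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Hypothetical responses: rows are respondents, columns are items.
scores = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
])

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

Running the same function separately within each group gives the within-group reliabilities that can be checked against a minimal criterion.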
25 Overview of Class
- Background: culture-specific versus generic measures
- Conceptual and psychometric adequacy and equivalence
  - Adequacy in one group
  - Equivalence across groups
- Modifying measures
26 Conceptual Equivalence Across Groups
- Conceptual: concept equivalent across groups
27 Conceptual Equivalence
- Is the concept relevant, familiar, acceptable to all diverse groups being studied?
- Is the concept defined the same way in all groups?
  - all relevant domains included (none missing)
  - interpreted similarly
28 Example: Developing Concept of Interpersonal Processes of Care
- IPC Version I framework in Milbank Quarterly
- 19 focus groups: African American, Spanish- and English-speaking Latino, and White adults
- Literature review of quality of care in diverse groups
- IPC II conceptual framework
29 IPC-II Conceptual Framework Reflects Concerns of All 4 Groups
I. COMMUNICATION
- General clarity
- Elicitation/responsiveness
- Explanations of processes, condition, self-care, meds
II. DECISION MAKING
- Responsive to patient preferences
- Consider ability to comply
III. INTERPERSONAL STYLE
- Respectfulness
- Courteousness
- Perceived discrimination
- Emotional support
- Cultural sensitivity
30 IPC-II Conceptual Framework (cont.)
IV. OFFICE STAFF
- Respectfulness
- Discrimination
V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS
- MDs' and office staff's sensitivity to language
31 Conceptual Equivalence: Spanish- and English-speaking Inpatients
- Administered Hospital Quality of Care Survey (H-CAHPS), asked 2 open-ended questions to detect experiences missed by survey
  - What they liked most about care
  - What aspects of care they would change
- Analyzed responses in relation to existing survey items or new topics
MP Hurtado et al. Health Serv Res, 2005;40(6, Part II):2140-2161.
32 Psychometric Equivalence
- Psychometric: psychometric properties invariant (equivalent) across groups
33 Psychometric or Measurement Equivalence
- When comparing groups (as in health disparities research)
  - Measures should have similar or equivalent measurement properties in all diverse groups of interest in your study
  - e.g., English and Spanish, African Americans and Caucasians
34 Psychometric Equivalence Across Groups
- Psychometric characteristics should be equivalent across all groups
  - Sufficient variability
  - Minimal missing data
  - Reliability/reproducibility
  - Construct validity
  - Sensitivity to change
35 Bias (Systematic Error) - A Special Concern
- Observed group mean differences in a measure can be due to
  - Culturally- or group-mediated differences in true score (true differences) -- OR --
  - Bias: systematic differences between observed scores not attributable to true scores
36 Random versus Systematic Error
Observed item score = true score + error
- Error has two components: random error and systematic error
- Random error: relevant to reliability
- Systematic error: relevant to validity (bias)
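A small simulation may help make the distinction concrete. The sketch below assumes the decomposition observed score = true score + random error + systematic error, gives two hypothetical groups the same true-score distribution, and adds a constant systematic error to one group only. All numbers (means, standard deviations, the size of the bias) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Both groups are drawn from the same true-score distribution.
true_a = rng.normal(50, 10, n)
true_b = rng.normal(50, 10, n)

random_sd = 5.0   # random error: the same in both groups
bias_b = 3.0      # hypothetical systematic error affecting group B only

obs_a = true_a + rng.normal(0, random_sd, n)
obs_b = true_b + rng.normal(0, random_sd, n) + bias_b

true_gap = true_b.mean() - true_a.mean()   # close to 0: no true difference
mean_gap = obs_b.mean() - obs_a.mean()     # close to bias_b: a spurious difference

# Random error lowers reliability (share of observed variance that is true variance).
reliability_a = np.corrcoef(true_a, obs_a)[0, 1] ** 2

print(f"true gap = {true_gap:.2f}, observed gap = {mean_gap:.2f}")
print(f"reliability in group A = {reliability_a:.2f}")
```

In this setup the random error reduces reliability in both groups, while only the systematic error produces an observed mean difference where no true difference exists.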
37 Bias (Systematic Error)
- Systematic measurement error may make group comparisons invalid
- Systematic differences in scores can be due to group differences in
  - the meaning of concepts or items
  - the extent to which measures represent a concept
  - cognitive processes of responding
  - use of response scales
38 Bias or Systematic Difference?
- Bias: deviation from true score
  - Cannot speak of a bias in one group compared to another w/o knowing true score
- Preferred term: differential item functioning (DIF)
  - Item (or measure) that has a different meaning in one group than another
39 Item Equivalence
- No Differential Item Functioning (DIF)
  - Items are similarly related to the underlying trait
  - Meaning of response categories is similar across groups
  - Distance between response categories is similar across groups
40 Methods for Identifying Differential Item Functioning (DIF)
- Item Response Theory (IRT)
  - Examines each item in relation to underlying latent trait
  - Tests if responses to one item predict the underlying latent score similarly in two groups
  - if not, items have differential item functioning
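The idea of checking whether an item behaves the same way in two groups at the same trait level can be sketched without a full IRT model. The example below is not IRT: it swaps in the simpler Mantel-Haenszel approach, matching respondents on the sum of the other items and comparing the odds of endorsing the studied item across groups. The data are simulated, and the item difficulties, the size of the DIF, and the function name `mh_odds_ratio` are all hypothetical.

```python
import numpy as np

def mh_odds_ratio(item, group, matching):
    """Mantel-Haenszel common odds ratio of endorsing a 0/1 item
    for group 1 versus group 0, stratified by the matching score."""
    num = den = 0.0
    for s in np.unique(matching):
        m = matching == s
        a = np.sum(m & (group == 1) & (item == 1))
        b = np.sum(m & (group == 1) & (item == 0))
        c = np.sum(m & (group == 0) & (item == 1))
        d = np.sum(m & (group == 0) & (item == 0))
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

rng = np.random.default_rng(1)
n = 20_000
group = np.repeat([0, 1], n)
theta = rng.normal(0, 1, 2 * n)            # same trait distribution in both groups
difficulty = np.array([0.0, -0.5, 0.0, 0.5, 1.0])
logit = theta[:, None] - difficulty[None, :]
logit[:, 0] += 1.0 * group                 # hypothetical DIF: item 0 is easier to endorse in group 1
prob = 1.0 / (1.0 + np.exp(-logit))
responses = (rng.random(logit.shape) < prob).astype(int)

ors = {}
for j in (0, 1):
    rest = responses.sum(axis=1) - responses[:, j]   # match on the other four items
    ors[j] = mh_odds_ratio(responses[:, j], group, rest)
    print(f"item {j}: MH odds ratio = {ors[j]:.2f}")
```

An odds ratio well above 1 for item 0 and near 1 for item 1 is the pattern one would expect when only item 0 functions differently. An IRT-based analysis would instead compare estimated item parameters across groups.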
41 Example of Effect of DIF
- 5 CES-D items administered to Black and White men
- 1 item subject to differential item functioning (bias)
- 5-item scale including item suggested that Black men had more somatic symptoms than White men (p < .01)
- 4-item scale excluding biased item showed no differences
S Gregorich, Med Care, 2006;44:S78-S94.
42 Equivalence of Reliability?? No!
- Difficult to compare reliability because it depends on the distribution of the construct in a sample
- Thus lower reliability in one group may simply reflect poorer variability
- More important is the adequacy of the reliability in both groups
  - Reliability meets minimal criteria within each group
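The dependence of reliability on sample variability follows from the classical test theory definition, reliability = true-score variance / (true-score variance + error variance). The sketch below assumes that definition and uses made-up standard deviations: the measurement error is identical in both hypothetical groups, and only the spread of true scores differs.

```python
def reliability(true_sd, error_sd):
    """Classical test theory reliability: true variance / observed variance."""
    return true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)

error_sd = 5.0                                     # same measurement error in both groups
wide_group = reliability(10.0, error_sd)           # heterogeneous group
narrow_group = reliability(5.0, error_sd)          # homogeneous group, less variability

print(f"wide group:   {wide_group:.2f}")
print(f"narrow group: {narrow_group:.2f}")
```

The measure is equally precise in both groups here, yet the reliability coefficient is lower in the less variable group, which is why the within-group adequacy criterion is more informative than a direct comparison of coefficients.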
43 Equivalence of Criterion Validity
- Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.
  - a measure predicts utilization in both groups
  - a cutpoint on a screening measure has the same specificity and sensitivity in identifying a condition in both groups
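The cutpoint check can be sketched as follows: apply one cutpoint to each group and compare sensitivity and specificity. The scores, condition status, cutpoint, and function name below are hypothetical and chosen only to show a case where the same cutpoint behaves differently in two groups.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Sensitivity and specificity of the rule `score >= cutpoint`."""
    tp = fn = tn = fp = 0
    for score, condition in zip(scores, has_condition):
        screened_positive = score >= cutpoint
        if condition and screened_positive:
            tp += 1
        elif condition:
            fn += 1
        elif screened_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical screening scores and true condition status (1 = has condition).
group_a = ([18, 20, 12, 9, 16, 7], [1, 1, 1, 0, 0, 0])
group_b = ([22, 17, 19, 18, 16, 8], [1, 1, 1, 0, 0, 0])

cutpoint = 16  # illustrative cutpoint, not taken from any published scale
sens_a, spec_a = sensitivity_specificity(*group_a, cutpoint)
sens_b, spec_b = sensitivity_specificity(*group_b, cutpoint)

print(f"Group A: sensitivity={sens_a:.2f}, specificity={spec_a:.2f}")
print(f"Group B: sensitivity={sens_b:.2f}, specificity={spec_b:.2f}")
```

With real data the comparison would also need confidence intervals, since small samples can produce apparent differences by chance.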
44 Equivalence of Construct Validity
- Are hypothesized patterns of associations confirmed in both groups?
  - Example: Scores on the Spanish version of the FACT-G had similar relationships with other health measures as scores on the English version
- Primarily tested through subjectively examining pattern of correlations
- Can also test using confirmatory factor analysis (CFA)
45 Equivalence of Construct Validity of Spanish SF-36 in Argentinean Sample
- Compared Spanish SF-36 construct validity test results to U.S. English SF-36 results
- Tested several previously tested hypotheses (which were confirmed)
  - PCS decreases with age and number of diseases
  - Relationship of PCS and MCS with utilization
  - Known groups validity (scores lower for those with various diseases)
F Augustovski et al, J Clin Epid, 2008;61:1279-84.
46 Equivalence of Factor Structure
- Factor structure similar in new group to structure in original study
  - measurement model is the same across groups
- Methods
  - Specify number of factors
  - Determine if hypothesized model fits the data
47 Factor Structure of CES-D
- Original study found 4 factors
  - Somatic symptoms
  - Depressive affect
  - Interpersonal behavior
  - Positive affect
- In a new population group do you find 4 factors?
LS Radloff, Applied Psychol Measurement, 1977;1:385-401.
48 How Evidence for Equivalence of Factor Structure is Obtained
- Subjectively
  - visually compare factor loadings across group-specific exploratory factor analyses
- Empirically
  - confirmatory factor analysis of data that includes multiple groups
  - studies of psychometric invariance
49 Empirical Examination of Equivalence of Factor Structure
- Psychometric invariance (equivalence)
  - Important properties of theoretically-based factor structure (measurement model) do not vary across groups (are invariant)
  - measurement model is the same across groups
- Empirical comparison across groups using confirmatory factor analysis
  - Not simply by examination
50 Hierarchical Tests of Psychometric Equivalence
- Across all groups: a sequential process
  - Same number of factors or dimensions
  - Same items on same factors
  - Same factor loadings
  - No bias on any item across groups
  - Same residuals on items
  - No item or scale bias AND same residuals
51 Criteria for Evaluating Invariance Across Groups
Technical term                           Criterion
Dimensional invariance                   Same number of factors
Configural invariance                    Same items load on same factors
Metric or factor pattern invariance      Items have same loadings on same factors
Scalar or strong factorial invariance    Observed scores are unbiased
Residual invariance                      Observed item and factor variances are unbiased
Strict factorial invariance              Both scalar and residual criteria are met
52 Factor Structure of CES-D
- Original study found 4 factors
  - Somatic symptoms
  - Depressive affect
  - Interpersonal behavior
  - Positive affect
- In a new population group do you find 4 factors?
LS Radloff, Applied Psychol Measurement, 1977;1:385-401.
53 Test for Evidence of Dimensional Invariance
- Two studies of Latinos
  - 2 factors in both studies
  - Depression and well-being
- American Indian adolescents
  - 3 factors
  - Depressed affect
  - Somatic symptoms and reduced activity
  - Positive affect
TQ Miller et al., J Gerontol Soc Sci 1997;52:S259
SM Manson et al., Psychol Assessment 1990;2:231-237
54 Configural Invariance
Technical term                           Criterion
Dimensional invariance                   Same number of factors
Configural invariance                    Same items load on same factors
Metric or factor pattern invariance      Items have same loadings on same factors
Strong factorial or scalar invariance    Observed scores are unbiased
Residual invariance                      Observed item and factor variances can be compared across groups
Strict factorial invariance              Both scalar invariance and residual invariance criteria are met
55 Configural Invariance
- Assumes dimensional invariance is found (same number of factors)
- Definition: Item-factor patterns are the same; same items load on same factors in both groups
- CES-D example
  - 4 factors found in Anglos, Blacks, and Chicanos
  - Same items loaded on each factor in all groups
RE Roberts et al., Psychiatry Research, 1980;2:125-134
56 Metric Invariance
- Metric or factor pattern invariance: items have same loadings on same factors
57 Metric Invariance or Factor Pattern Invariance
- Assumes dimensional and configural invariance are found
- Definition: Item loadings are the same across groups
  - i.e., the correlation of each item with its factor is the same in all groups
58 Metric Invariance Example from Interpersonal Processes of Care
- Out of 91 items, factor structure of 29 items met criteria of invariance across 4 groups
  - Spanish-speaking Latinos, English-speaking Latinos, African Americans, Whites
- Dimensional
  - Similar factor structure across all 4 groups
- Configural
  - Same items loaded on each factor in all 4 groups
- Metric
  - Same item loadings in all 4 groups
Stewart et al., Health Services Research, 2007;42(3, Part I):1235-56.
59 Seven Metric Invariant Scales: Same Item Loadings Across Groups
I. COMMUNICATION
- Hurried communication
- Elicited concerns, responded
- Explained results, medications
II. DECISION MAKING
- Patient-centered decision-making
III. INTERPERSONAL STYLE
- Compassionate, respectful
- Discriminated
- Disrespectful office staff
60 Strong Factorial Invariance
- Strong factorial or scalar invariance: observed scores are unbiased
61 Strong Factorial Invariance or Scalar Invariance
- Assumes dimensional, configural, and metric invariance are found
- Definition: Observed scores are unbiased, i.e., means can be compared across groups
- Requires test of equivalence of mean scores across groups using confirmatory factor analysis
62 Seven Scalar Invariant (Unbiased) IPC Scales (18 items)
I. COMMUNICATION
- Hurried communication: lack of clarity
- Elicited concerns, responded
- Explained results, medications: explained results
II. DECISION MAKING
- Patient-centered decision-making: decided together
III. INTERPERSONAL STYLE
- Compassionate, respectful (subset): compassionate, respectful
- Discriminated: discriminated due to race/ethnicity
- Disrespectful office staff
63 Equivalence of Spanish and English Hospital Quality of Care Survey (H-CAHPS)
- Tested 7 subscales (e.g., nurse communication, pain control, discharge information)
- Compared Spanish and English groups
  - Item-scale correlations, internal consistency reliability, factor structure, and construct validity
- Concluded these were equivalent
MP Hurtado et al. Health Serv Res, 2005;40(6, Part II):2140-2161.
64 Overview of Class
- Background: culture-specific versus generic measures
- Conceptual and psychometric adequacy and equivalence
  - Adequacy in one group
  - Equivalence across groups
- Modifying measures
65 What if Measures Need Modifying or Adapting?
- Why would we modify a measure?
- What information is used to modify?
- What are the types of modifications?
- How should we test modified measures?
66 When Problems are Found Through Pretesting: Investigators Face a Choice
- Use the existing measure as is to preserve integrity of measure
- OR
- Try to modify the measure to address problems in diverse group
67 Argument in Favor of Using Measure As Is
- Modifications can change the measure's validity and reliability
- Allows comparison of findings to other research using the measure
68 Argument Against Using Measure As Is ...
- ... when problems are found
- If reliability and validity are poor
  - Results pertaining to the measure could be erroneous
  - Limited internal validity
69 Reasons for Considering Modifying an Existing Measure
- In health disparities research
  - Sample/population differs from that in which original measure was developed
- More broadly
  - Measure developed a while ago
  - Poor format/presentation
  - Study context issues
70 Key Reason: Population Group Differences from Original
- Research in diverse population groups
  - Different culture, race/ethnic group
  - Lower level of socioeconomic status (SES)
  - Limited English proficiency, lower literacy
- Mainstream research
  - Different disease, health problem, patient group, age group
71 Why Might a Measure Not be Suitable for New Population Group?
- Concept or dimension is missing
- Meaning of concepts differs from mainstream
- New group may not interpret items as intended
- Process of answering questions may differ
72 Poor Format/Presentation: High Respondent Burden
- Instructions unnecessarily wordy, unclear
- Way of responding is complicated
- Difficult to navigate the questionnaire
- Crowded on the page
- Hard to track across the page
- Hard to read
- Poor contrast, small font
73 Example: Complex Instructions
- Instructions: There are 12 statements on this form. They are statements about families. You are to decide which of these statements are true of your family and which are false. If you think the statement is TRUE or MOSTLY TRUE of your family, please mark the box in the T (TRUE) column. If you think the statement is FALSE or MOSTLY FALSE of your family, please mark the box in the F (FALSE) column.
- You may feel that some of the statements are true for some family members and false for others. Mark the box in the T column if the statement is TRUE for most members. Mark the box in the F column if the statement is FALSE for most members. If the members are evenly divided, decide what is the stronger overall impression and answer accordingly.
- Remember, we would like to know what your family seems like to you. So do not try to figure out how other members see your family, but do give us your general impression of your family for each statement. Do not skip any item. Please begin with the first item.
74 Example: Burdensome Way of Responding
- For each question, choose from the following alternatives
  - 0 = Never
  - 1 = Almost Never
  - 2 = Sometimes
  - 3 = Fairly Often
  - 4 = Very Often
1. In the last month, how often have you felt nervous and stressed? ... 0 1 2 3 4
2. In the last month, how often have you felt that things were going your way? ... 0 1 2 3 4
S Cohen et al. J Health Soc Beh, 1983;24(4):385-396.
75 What Information is Used to Decide How to Modify a Measure?
- Same data identifying conceptual differences in diverse population
  - often includes information for making revisions
76 Published Review - Physical Activity Measures for Minority Women
- WHI convened experts to identify issues in measuring PA in minority and older women
- Some conclusions
  - Assess culturally sensitive activities (e.g., walking for transportation and errands)
  - Measure intermittent activities
  - Phrases "leisure time," "free time," "spare time" (used to denote non-occupational activities) not understood
- Review can help select appropriate measures and adapt as needed
LC Masse et al., J Womens Health, 1998;7:57-67.
77 Types of Modifications
- Format or presentation
- Content
  - Dimensions
  - Item stems
  - Response options
78 Format/Presentation Modifications
- Goal: reduce respondent burden
- Improve appearance or way of responding
  - Simplify instructions
  - Modify format for responding
  - Create more space, reduce crowded items
  - Improve contrast, increase font size
79 Types of Modifications
- Format or presentation
- Content
  - Dimensions
  - Item stems
  - Response options
80 Content Modification Example: Add Dimension
- Study of older Korean/Chinese immigrants
- Added language support to existing social support measure
- Based on focus group data
  - Help with translation at medical appointments
  - Help to ask questions in English when on the phone
  - Help to learn English
S Wong et al. Int J Health Human Dev, 2005;61:105-121.
81 Content Modification Example: Add Dimension (cont.)
- New items were embedded in existing social support measure using same format
82 Minor to Major Modifications?
- Each type of modification can hypothetically be rated on a continuum from having minor to major impact on reliability and validity of original measure
- Minor: slight changes in format/presentation
- Major: numerous changes in dimensions, items, and response choices
83 Need to Test Psychometric Properties of Modified Measures
- All modifications, no matter how small, can affect reliability and validity of original measure
- Burden is on investigator to test modified measure
84 Recommendations for Testing Modified Measures
- Pretest modified measure extensively before fielding in new study
- Build in ability to do psychometric testing when measure is fielded
  - Add validity variables (e.g., similar to original measure to test comparability)
  - Add follow-up to assess test-retest reliability
85 Analyze Psychometric Adequacy of Modified Measure in New Study
- Modified measure should meet minimal criteria
  - Item-scale correlations
  - Internal-consistency reliability
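One common way to examine item-scale correlations is to correlate each item with the sum of the remaining items, so that an item is not correlated with itself. The sketch below assumes that "corrected" definition and complete data. The function name and the response matrix are hypothetical.

```python
import numpy as np

def corrected_item_scale_correlations(items):
    """Correlation of each item with the sum of the other items in the scale."""
    x = np.asarray(items, dtype=float)
    total = x.sum(axis=1)
    return np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1]   # remove item j from the total
        for j in range(x.shape[1])
    ])

# Hypothetical responses to a modified 3-item scale (rows = respondents).
modified_scale = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
])

item_scale_r = corrected_item_scale_correlations(modified_scale)
for j, r in enumerate(item_scale_r, start=1):
    print(f"item {j}: r = {r:.2f}")
```

Items with low corrected correlations are candidates for a closer look, particularly if they were among the modified items.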
86 Analyzing Modified Measure: Comparability to Original Measure
- Compare measurement results of modified measure to original measure
  - Reliability (sample dependent)
  - Factor structure
  - Construct validity
  - Sensitivity to change
87 Overall Conclusions
- Measurement in health disparities research is a relatively new field
- We encourage reporting on adequacy and equivalence of measures tested in any diverse population
- As evidence grows, it will be easier to find measures that work better across diverse groups
88 Resource: Reviews of Measures for Diverse Populations
- Multicultural measurement in older populations, JH Skinner et al (eds), Springer Publishing Co, NY, 2002
- ALSO published as
  - Measurement in older ethnically diverse populations, J Mental Health Aging, Vol 7, Spring 2001
Reviews measures that have been used cross-culturally in acculturation, socioeconomic status, social support, cognition, health, depression, and religiosity.
89 Resource: Special Journal Issue
- Measurement in a multi-ethnic society
  - Med Care, Vol 44, November 2006
  - Qualitative and quantitative methods in addressing measurement in diverse populations
90 Guidelines for Translating Measures
- Handout: annotated bibliography of articles in which optimal methods of translation are used
- Compiled by CADC Measurement and Methods Core
91 Homework for Class 3
- Complete rows 12-17 in matrix
- Use form posted on the website
- Include your name in the filename
- Smith_HW_epi222_class3
- Email by Monday April 18 to
- Anita.Stewart_at_ucsf.edu