Title: Measurement Issues in Health Disparities Research
1 Measurement Issues in Health Disparities Research
- Anita L. Stewart, Ph.D.
- University of California, San Francisco
- Clinical Research with Diverse Communities
- EPI 222, Spring
- April 17, 2008
2 Background
- U.S. population is becoming more diverse
- More minority groups are being included in research due to
  - NIH mandate
  - Recent health disparities initiatives
3 Types of Diverse Groups
- Health disparities research focuses on differences in health between the following groups:
  - Minority vs. non-minority
  - Low income vs. others
  - Low education vs. others
  - Limited English skills vs. others
  - ... and others
4 Health Disparities Research
- Describe health disparities
  - Health differences across various diverse groups
- Identify mechanisms by which health disparities occur
  - Individual level
  - Environmental level
- Intervene to reduce health disparities
5 Health Care Disparities
- Differential access to and quality of health care is well known
  - Thus, health care disparities are a plausible mechanism for health disparities
- Understanding determinants of health care disparities is also of interest
6 Types of Self-Report Measures Needed
- Measures of health, and of various mechanisms for disparities
  - Class 4 will present numerous mechanisms
  - Examples from this class: sense of control, self-efficacy for managing disease, health-related quality of life for various health conditions
7 Measurement Implications of Research in Diverse Groups
- Most self-report measures were developed and tested in mainstream, well-educated groups
- Subgroup analysis of measures has been rare
- Thus, little information is available on their appropriateness, reliability, validity, and responsiveness in minority and other diverse groups
8 The Measurement Goal: Identify Measures That Can Be Used
- To compare diverse groups
- To study mechanisms within any particular diverse group
9 Group Comparisons Are the Most Problematic
- Disparities research involves comparing mean levels of health or its determinants across groups
  - Requires equivalent concepts and measures
- Without equivalent measures:
  - Potential true differences may be obscured
  - Observed group differences may be inaccurate
10 Alternative Explanations for Observed Group Differences
- Observed group mean differences in a measure can be due to
  - culturally mediated or group-mediated differences in true score (true differences), OR
  - bias: systematic differences between groups' observed scores that are not attributable to true scores
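The two explanations can be written as a simple additive decomposition. This is an illustrative sketch with notation chosen here, not taken from the slides: assume each person's observed score X is a true score T, plus a systematic bias B shared by everyone in group g, plus random error E with mean zero in every group. The expected difference between group means A and B then splits into a true difference and a bias difference:

```latex
\begin{aligned}
X_{ig} &= T_{ig} + B_g + E_{ig}, \qquad \operatorname{E}[E_{ig}] = 0 \\
\operatorname{E}\!\left[\bar{X}_A - \bar{X}_B\right]
  &= \underbrace{(\mu_{T,A} - \mu_{T,B})}_{\text{true difference}}
   + \underbrace{(B_A - B_B)}_{\text{bias}}
\end{aligned}
```

Only when B_A = B_B does the observed difference estimate the true difference; otherwise bias can mask a real difference or create an artifactual one.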
11 Bias: A Special Concern
- Measurement bias in any one group may make group comparisons invalid
- Bias can be due to group differences in
  - the meaning of concepts or items
  - the extent to which a measure represents a concept
  - cognitive processes of responding
  - use of response scales
  - appropriateness of data collection methods
12 Example of Effect of Biased Items
- 5 CES-D items were administered to black and white men
- 1 item showed differential item functioning (bias)
- The 5-item scale including this item suggested that black men had higher levels of somatic symptoms than white men (p < .01)
- The 4-item scale excluding the biased item showed no difference between black and white men
S Gregorich, Med Care, 2006;44:S78-S94.
13 Bias or Systematic Difference?
- Bias refers to deviation from the true score
- Cannot speak of a measure being biased in one group compared to another without knowing the true score
- Preferred term: differential item functioning (DIF)
  - An item (or measure) that has a different meaning in one group than in another
14 Typical Sequence of Developing New Self-Report Measures
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
16 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Obtain perspectives of diverse groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
17 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Obtain perspectives of diverse groups
Develop concept
Create item pool
... to reflect these perspectives
Pretest/revise
Field survey
Psychometric analyses
Final measures
18 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Obtain perspectives of diverse groups
Develop concept
Create item pool
... to reflect these perspectives
... in all diverse groups
Pretest/revise
Field survey
Psychometric analyses
Final measures
19 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Obtain perspectives of diverse groups
Develop concept
Create item pool
... to reflect these perspectives
... in all diverse groups
Pretest/revise
Field survey
... in all diverse groups
Psychometric analyses
Final measures
20 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Obtain perspectives of diverse groups
Develop concept
Create item pool
... to reflect these perspectives
... in all diverse groups
Pretest/revise
Field survey
... in all diverse groups
Measurement studies across groups
Psychometric analyses
Final measures
21 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Obtain perspectives of diverse groups
Develop concept
Create item pool
... to reflect these perspectives
... in all diverse groups
Pretest/revise
Field survey
... in all diverse groups
If results are non-equivalent
Psychometric analyses
Final measures
22 Measurement Adequacy vs. Measurement Equivalence
- Making group comparisons requires conceptual and psychometric adequacy and equivalence
- Adequacy: within a diverse group
  - concepts are appropriate
  - psychometric properties meet minimal criteria
- Equivalence: between diverse groups
  - conceptual and psychometric properties are comparable
23 Why Not Use Culture-Specific Measures?
- Measurement goal: identify measures that can be used across all groups, yet maintain sensitivity to diversity and have minimal bias
- Most health disparities studies require comparing mean scores across diverse groups, so comparable measures are needed
24 Conceptual and Psychometric Adequacy and Equivalence
- Adequacy in 1 group
  - Conceptual: concept meaningful within one group
  - Psychometric: psychometric properties meet minimal standards within one group
- Equivalence across groups
  - Conceptual: concept equivalent across groups
  - Psychometric: psychometric properties invariant (equivalent) across groups
25 Left Side of Matrix: Issues in a Single Group
- Adequacy in 1 group
  - Conceptual: concept meaningful within one group
  - Psychometric: psychometric properties meet minimal standards within one group
26 Right Side of Matrix: Issues in More Than One Group
- Equivalence across groups
  - Conceptual: concept equivalent across groups
  - Psychometric: psychometric properties invariant (equivalent) across groups
27 Conceptual Adequacy in One Group
- Conceptual adequacy in 1 group: concept meaningful within one group
28 Conceptual Adequacy in One Group
- Is the concept relevant, meaningful, and acceptable to a diverse group?
- Traditional research
  - Conceptual adequacy: simply defining a concept
  - Mainstream population assumed
- Minority and health disparities research
  - Mainstream concepts may be inadequate
  - Concept should correspond to how a particular group thinks about it
29 Qualitative Approaches to Explore Conceptual Adequacy in Diverse Groups
- Literature reviews
  - ethnographic and anthropological
- In-depth interviews and focus groups
  - discuss concepts, obtain their views
- Expert consultation from diverse groups
  - review concept definitions
  - rate relevance of items
30 Example of Inadequate Concept
- Patient satisfaction is typically conceptualized in mainstream populations in terms of, e.g.,
  - access, technical care, communication, continuity, interpersonal style
- In minority and low-income groups, additional relevant domains include, e.g.,
  - discrimination by health professionals
  - sensitivity to language barriers
31 Method for Examining Conceptual Relevance
- Compiled a set of 33 HRQL items spanning many concepts
- Assessed relevance to older African Americans
  - After answering each question, respondents were asked: "How relevant is this question to the way you think about your health?"
  - Response scale: 0-10 with labeled endpoints
  - Labels: 0 = not at all relevant, 10 = extremely relevant
Cunningham WE et al., Qual Life Res, 1999;8:749-768.
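A minimal sketch of how such ratings might be summarized, assuming each item's relevance is represented by its mean 0-10 rating across respondents and items are then ranked. The item names and ratings below are hypothetical, and the original study's analysis may have differed.

```python
from statistics import mean


def rank_by_relevance(ratings: dict[str, list[int]]) -> list[tuple[str, float]]:
    """Rank items from most to least relevant by mean 0-10 relevance rating."""
    for item, scores in ratings.items():
        # 0 = not at all relevant, 10 = extremely relevant
        if not scores or any(not 0 <= s <= 10 for s in scores):
            raise ValueError(f"{item}: need at least one rating on the 0-10 scale")
    means = {item: mean(scores) for item, scores in ratings.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical ratings from three respondents (not study data)
ratings = {
    "spirituality_importance": [9, 10, 8],
    "physical_functioning": [3, 5, 4],
    "hopefulness": [8, 7, 9],
}
for item, avg in rank_by_relevance(ratings):
    print(f"{item}: {avg:.1f}")
```

A ranking like this is what supports statements such as "ranked in the lower 2/3" on the results slides.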
32 Results: Conceptual Relevance
- Most relevant items
  - Spirituality (3 items)
  - Weight-related health (2 items)
  - Hopefulness (1 item)
- Spirituality items
  - importance of spirituality to well-being, level of spirituality, whether being sick affected spirituality
33 Results: Conceptual Relevance
- Least relevant items
  - Physical functioning
  - Role limitations due to emotional problems
- All standard MOS measures ranked in the lower 2/3, including all SF-12 items
34 Conceptual Relevance of Spanish FACT-G
- A bilingual/bicultural expert panel reviewed all 28 items for relevance
  - One item had low cultural relevance to quality of life
  - One concept was missing: spirituality
- Developed a new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts
  - Sample item: "I worry about dying"
Cella D et al., Med Care, 1998;36:1407.
35 Psychometric Adequacy in One Group
- Psychometric adequacy in 1 group: psychometric properties meet minimal standards within one group
36 Psychometric Adequacy in Any Group
- Minimal standards
  - Sufficient variability
  - Minimal missing data
  - Adequate reliability/reproducibility
  - Evidence of construct validity
  - Evidence of responsiveness to change
- Basic classical test theory approach
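Three of these minimal standards can be checked with classical test theory in a few lines. This is a sketch, assuming a single multi-item scale stored as one list of scores per item, with None marking a missing response; alpha is computed on complete cases only, and the .70 cutoff is an assumed conventional minimum. The data are hypothetical.

```python
from statistics import pstdev, pvariance
from typing import Optional


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[j][i] is respondent i's score on item j (no missing values)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(row) for row in zip(*items)]  # each respondent's scale total
    sum_item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def adequacy_summary(items: list[list[Optional[float]]]) -> dict[str, float]:
    """Missing-data rate, scale-score SD (variability), and alpha on complete cases."""
    cells = [x for scores in items for x in scores]
    missing_rate = sum(x is None for x in cells) / len(cells)
    complete = [row for row in zip(*items) if all(x is not None for x in row)]
    complete_items = [list(col) for col in zip(*complete)]
    return {
        "missing_rate": missing_rate,
        "scale_sd": pstdev([sum(row) for row in complete]),
        "alpha": cronbach_alpha(complete_items),
    }


# Hypothetical 3-item scale, 5 respondents; None marks a skipped item
items = [
    [1, 2, 3, 4, None],
    [2, 2, 3, 5, 4],
    [1, 3, 3, 4, 5],
]
summary = adequacy_summary(items)
if summary["alpha"] < 0.70:  # assumed minimum for internal consistency
    print("internal consistency below .70")
```

Running the same summary separately within each diverse group is what reveals inadequacy in one group, as in the SF-36 example on the next slide.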
37 Evidence of Psychometric Inadequacy of a Measure in Various Diverse Groups
- SF-36 social functioning scale: internal consistency reliability < .70 in three different samples
  - Chinese language, adults aged 55-96 years
  - Japanese language, Japanese elders
  - English, Pima Indians
Stewart AL, Nápoles-Springer A, Med Care, 2000;38(9 Suppl):II-102.
38 Conceptual Equivalence Across Groups
- Conceptual equivalence across groups: concept equivalent across groups
39 Conceptual Equivalence
- Is the concept relevant, familiar, and acceptable to all diverse groups being studied?
- Is the concept defined the same way in all groups?
  - all relevant domains included (none missing)
  - interpreted similarly
- Is the concept appropriate for all diverse groups?
40 Generic/Universal vs. Group-Specific (Etic vs. Emic)
- Concepts are unlikely to be defined exactly the same way across diverse ethnic groups
- Generic/universal (etic)
  - features of a concept that are appropriate across groups
- Group-specific (emic)
  - idiosyncratic or culture-specific portions of a concept
41 Etic versus Emic (cont.)
- Goal in health disparities research on more than one group
  - identify the generic/universal portion of a concept (could be the entire concept) that can be applied across all groups
- For within-group analyses or studies
  - the culture-specific portion is also relevant
  - same as examining conceptual adequacy within one group
42 Approaches Similar to Those for Conceptual Adequacy
- Main difference: need to ensure the concept is equivalent across groups
  - Additional criterion
- What do we mean by "equivalent conceptually"?
  - Methods are poorly developed
43 Obtain Perspectives of All Diverse Groups on Concept
Obtain perspectives of diverse groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
44 Example: Developing the Concept of Interpersonal Processes of Care
- Began with a conceptual framework from the literature and psychometric studies of a preliminary survey
- IPC Version I Conceptual Framework
  - Three major multi-dimensional categories
    - Communication
    - Decision-making
    - Interpersonal Style
45 IPC Version I Subdomains
- Communication
  - Elicitation of concerns, explanations, general clarity
- Decision-making
  - Involving patients in decisions
- Interpersonal Style
  - Respectfulness, emotional support, non-discrimination, cultural sensitivity
Stewart et al., Milbank Quarterly, 1999;77:305.
46 Limitations of First IPC Framework
- Tested on a small sample of 600 patients from San Francisco General Hospital
- Several hypothesized concepts were not confirmed
  - e.g., cultural sensitivity
- Needed further development and validation on a larger sample
47 Developed Revised IPC Concept
IPC Version I framework (in Milbank Quarterly)
Draft IPC II conceptual framework
19 focus groups: African American, Latino, and White adults
Literature review of quality of care in diverse groups
48 IPC-II Conceptual Framework
I. COMMUNICATION
- General clarity
- Elicitation/responsiveness
- Explanations of processes, condition, self-care, meds
- Empowerment
II. DECISION MAKING
- Responsive to patient preferences
- Consider ability to comply
III. INTERPERSONAL STYLE
- Respectfulness
- Courteousness
- Perceived discrimination
- Emotional support
- Cultural sensitivity
49 IPC-II Conceptual Framework (cont.)
IV. OFFICE STAFF
- Respectfulness
- Discrimination
V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS
- MDs' and office staff's sensitivity to language
50 Psychometric Equivalence
- Psychometric equivalence across groups: psychometric properties invariant (equivalent) across groups
51 Psychometric Equivalence
- Measures have similar measurement properties in all diverse groups of interest in your study
  - e.g., English and Spanish language, African Americans and Caucasians
- Measures have similar measurement properties in one diverse group as in the original (mainstream) groups on which the measures were developed
52 Equivalence of Reliability? No!
- It is difficult to compare reliability because it depends on the distribution of the construct in a sample
  - Thus, lower reliability in one group may simply reflect less variability
- More important is the adequacy of the reliability in both groups
  - Reliability meets minimal criteria within each group
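The dependence on variability follows from the classical test theory definition of reliability as the share of observed-score variance that is true-score variance. A small arithmetic sketch with hypothetical variances: the measure has identical error variance in both groups, yet its reliability differs only because the construct is spread more narrowly in one group.

```python
def reliability(true_var: float, error_var: float) -> float:
    """Classical test theory reliability: true-score variance / observed-score variance."""
    return true_var / (true_var + error_var)


ERROR_VAR = 4.0                         # hypothetical: same measurement error in both groups
group_a = reliability(16.0, ERROR_VAR)  # wider spread of the construct
group_b = reliability(4.0, ERROR_VAR)   # narrower spread of the construct
print(f"group A: {group_a:.2f}, group B: {group_b:.2f}")  # group A: 0.80, group B: 0.50
```

The item-level precision is the same in both groups, so the lower coefficient in group B says nothing about bias; what matters is whether 0.50 is adequate for the intended use in that group.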
53 Equivalence of Criterion Validity
- Determine whether hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.,
  - a measure predicts utilization in both groups
  - a cutpoint on a screening measure has the same specificity and sensitivity in both groups
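The cutpoint check can be sketched by computing sensitivity and specificity separately within each group against a criterion diagnosis. This assumes each record is (group, screening score, has_condition) and that a score at or above the cutpoint counts as screen-positive; the cutpoint of 16 and all records are hypothetical.

```python
from collections import defaultdict


def sens_spec_by_group(records, cutpoint):
    """records: iterable of (group, score, has_condition); positive means score >= cutpoint."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, score, has_condition in records:
        positive = score >= cutpoint
        c = counts[group]
        if has_condition:       # true cases: detected or missed
            c["tp" if positive else "fn"] += 1
        else:                   # non-cases: false alarm or correct rejection
            c["fp" if positive else "tn"] += 1
    return {
        group: {
            "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
            "specificity": c["tn"] / (c["tn"] + c["fp"]),
        }
        for group, c in counts.items()
    }


# Hypothetical records: (group, screening score, criterion diagnosis)
records = [
    ("A", 20, True), ("A", 10, True), ("A", 5, False), ("A", 18, False),
    ("B", 22, True), ("B", 17, True), ("B", 3, False), ("B", 8, False),
]
results = sens_spec_by_group(records, cutpoint=16)
```

A group with no true cases (or no non-cases) would raise ZeroDivisionError; real comparisons also need confidence intervals, since small groups give unstable estimates.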
54 Equivalence of Construct Validity
- Are hypothesized patterns of associations confirmed in both groups?
  - Example: scores on the Spanish version of the FACT had similar relationships with other health measures as scores on the English version
- Primarily tested by subjectively examining the pattern of correlations
- Can test differences using confirmatory factor analysis (e.g., through structural equation modeling)
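Between subjective inspection and full CFA, a single pair of correlations can be compared formally. The Fisher r-to-z test below is a swapped-in technique, not one named on the slide; it assumes the two groups are independent samples and that each correlation is a Pearson r between the measure and the same validating variable. The correlations and sample sizes are hypothetical.

```python
import math


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Fisher r-to-z test for two correlations from independent groups; returns (z, two-sided p)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided p from the standard normal
    return z, p


# Hypothetical: measure-validator correlation of .50 in one group, .30 in another
z, p = compare_correlations(0.50, 103, 0.30, 103)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A non-significant difference is not proof of equivalence, especially in small groups, so this complements rather than replaces the pattern-level and CFA approaches.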
55 Item Equivalence
- Differential item functioning (DIF)
  - Items are non-equivalent if they are differentially related to the underlying trait
  - Equivalence is indicated by no DIF
- Meaning of response categories is similar across groups
- Distance between response categories is similar across groups
56 Equivalence of Response Choices: Spanish and English Self-rated Health
- Excellent / Excelente
- Very good / Muy buena
- Good / Buena
- Fair / Regular
- Poor / Mala
"Regular" in Spanish may be closer to "good" in English, and thus is not comparable in meaning to "fair"
57 Spanish and English Self-rated Health Responses
- Excellent / Excelente
- Very good / Muy buena
- Good / Buena
- Fair / Regular (Pasable?)
- Poor / Mala
Another choice, "pasable," may be closer in meaning to "fair"
58 Methods for Identifying Differential Item Functioning (DIF)
- Item Response Theory (IRT)
  - Examines each item in relation to the underlying latent trait
  - Tests whether responses to one item predict the underlying latent score similarly in two groups
  - If not, the item has differential item functioning
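IRT-based DIF analysis requires dedicated software. As a simpler, swapped-in illustration of the same idea, the Mantel-Haenszel procedure (a non-IRT DIF method) matches respondents on total score and asks whether, at the same total, one group has higher odds of endorsing the studied item. This sketch assumes dichotomous (0/1) items and group labels "ref" and "focal"; the responses are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses, item):
    """
    responses: list of (group, answers); group is "ref" or "focal", answers is a
    list of 0/1 item scores. Returns the MH common odds ratio for answers[item],
    matching respondents on total score. Values near 1.0 suggest little DIF.
    """
    strata = defaultdict(lambda: {"a": 0, "b": 0, "c": 0, "d": 0})
    for group, answers in responses:
        if group not in ("ref", "focal"):
            raise ValueError(f"unknown group {group!r}")
        cell = strata[sum(answers)]          # stratum = total score (matching variable)
        endorsed = answers[item] == 1
        if group == "ref":
            cell["a" if endorsed else "b"] += 1
        else:
            cell["c" if endorsed else "d"] += 1
    num = den = 0.0
    for cell in strata.values():
        n = cell["a"] + cell["b"] + cell["c"] + cell["d"]
        num += cell["a"] * cell["d"] / n
        den += cell["b"] * cell["c"] / n
    return num / den  # ZeroDivisionError if no stratum has both b and c > 0


# Hypothetical: at the same total score, the reference group endorses item 0 more often
responses = [
    ("ref", [1, 0, 0]), ("ref", [1, 0, 0]), ("ref", [0, 1, 0]),
    ("focal", [1, 0, 0]), ("focal", [0, 1, 0]), ("focal", [0, 0, 1]),
]
print(mantel_haenszel_odds_ratio(responses, item=0))
```

Here the reference group's odds of endorsing item 0 are about four times the focal group's among respondents with the same total, which would flag the item for DIF review.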
59 Equivalence of Factor Structure
- Factor structure in the new group is similar to the structure in the original groups in which the measure was tested
  - In other words, the measurement model is the same across groups
- Methods
  - Specify the number of factors you are looking for
  - Determine whether the hypothesized model fits the data
60 Confirmatory Factor Analysis (CFA)
- Can specify a hypothesized structure a priori
- Can test mean and covariance structures
  - to estimate bias
61 Equivalence of Factor Structure: Testing Psychometric Invariance
- Psychometric invariance (equivalence)
  - Important properties of the theoretically based factor structure (measurement model) do not vary across groups (are invariant)
  - The measurement model is the same across groups
- Empirical comparison across groups
  - Not simply by examination
62 Criteria for Psychometric Invariance
- Across all groups, a sequential process:
  - Same number of factors or dimensions
  - Same items on same factors
  - Same factor loadings
  - No bias on any item across groups
  - Same residuals on items
  - No item or scale bias AND same residuals
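Because the criteria are tested in order, the result of a study is the highest level reached before the first failure. The sketch below encodes only that bookkeeping; it assumes the pass/fail decision for each step has already been made from CFA model comparisons, which it does not perform. The step labels are hypothetical names for the criteria listed above.

```python
from typing import Optional

# Criteria in the order they are tested; each step assumes the earlier ones hold.
INVARIANCE_STEPS = [
    "dimensional",  # same number of factors or dimensions
    "configural",   # same items on same factors
    "metric",       # same factor loadings
    "scalar",       # no bias on any item (in CFA terms, equal item intercepts)
    "residual",     # same residuals on items
]


def highest_invariance(passed: dict[str, bool]) -> Optional[str]:
    """Return the last criterion met before the first failure (None if the first fails)."""
    achieved = None
    for step in INVARIANCE_STEPS:
        if not passed.get(step, False):
            break
        achieved = step
    # Residuals are tested only once scalar invariance holds, so passing them
    # means both scalar and residual criteria are met: strict factorial invariance.
    return "strict" if achieved == "residual" else achieved


print(highest_invariance({"dimensional": True, "configural": True,
                          "metric": True, "scalar": False}))
```

Stopping at the first failure matters because a later constraint (for example, equal intercepts) is only interpretable if the earlier ones (equal loadings) hold.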
63 Criteria for Evaluating Invariance Across Groups: Technical Terms
- Dimensional invariance: same number of factors
- Configural invariance: same items load on same factors
- Metric or factor pattern invariance: items have same loadings on same factors
- Scalar or strong factorial invariance: observed scores are unbiased
- Residual invariance: observed item and factor variances are unbiased
- Strict factorial invariance: both scalar and residual criteria are met
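In multiple-group CFA, these terms correspond to equality constraints on each group's mean and covariance structure. The formulation below is a standard one, with symbols chosen here rather than taken from the slides: for group g, observed items x_g, item intercepts τ_g, loadings Λ_g, factors ξ_g with means κ_g and covariances Φ_g, and residuals δ_g with covariance matrix Θ_g.

```latex
\begin{aligned}
\mathbf{x}_g &= \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\xi}_g + \boldsymbol{\delta}_g \\
\boldsymbol{\mu}_g &= \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\kappa}_g \\
\boldsymbol{\Sigma}_g &= \boldsymbol{\Lambda}_g \boldsymbol{\Phi}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Theta}_g \\[4pt]
\text{metric:}\quad & \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda} \ \ \forall g \\
\text{scalar:}\quad & \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda}, \ \boldsymbol{\tau}_g = \boldsymbol{\tau} \ \ \forall g \\
\text{residual:}\quad & \boldsymbol{\Theta}_g = \boldsymbol{\Theta} \ \ \forall g \\
\text{strict:}\quad & \text{scalar and residual constraints together}
\end{aligned}
```

Dimensional and configural invariance constrain only the number of factors and the pattern of fixed and free loadings in Λ_g, not their values. Under scalar invariance, μ_A − μ_B = Λ(κ_A − κ_B), so differences in observed item means reflect differences in factor means rather than item bias.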
64 Interpersonal Processes of Care (IPC)
- Social-psychological aspects of the patient-physician interaction
  - communication, respectfulness, patient-centered decision-making, and sensitivity to patients' needs
- Developed a survey of 92 items based on the principles outlined above
65 Conducted Survey
- From over 16,000 primary care patients, randomly sampled those who
  - made at least one visit in the prior 12 months
  - had records indicating they were African American, Latino, or White (Caucasian)
- Sampled within race/ethnic group
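Sampling within race/ethnic group is stratified random sampling: apply the eligibility rules, then draw a simple random sample separately in each stratum. A sketch under stated assumptions; the field names ("id", "visits_12mo", "race_ethnicity"), group labels, and per-group targets are hypothetical, since the slide does not give the sampling frame's structure.

```python
import random
from collections import defaultdict


def stratified_sample(patients: list[dict], per_group: dict[str, int], seed: int = 0) -> dict[str, list]:
    """
    Keep patients with at least one visit in the prior 12 months, then draw a simple
    random sample (without replacement) of per_group[g] patients within each group g.
    Field names "id", "visits_12mo", and "race_ethnicity" are hypothetical.
    """
    rng = random.Random(seed)  # seeded for a reproducible draw
    eligible = defaultdict(list)
    for p in patients:
        if p["visits_12mo"] >= 1 and p["race_ethnicity"] in per_group:
            eligible[p["race_ethnicity"]].append(p["id"])
    # random.sample raises ValueError if a group has fewer eligible patients than requested
    return {g: rng.sample(eligible[g], n) for g, n in per_group.items()}


# Hypothetical sampling frame of 60 patients
groups = ["African American", "Latino", "White"]
patients = [{"id": i, "visits_12mo": i % 4, "race_ethnicity": groups[i % 3]} for i in range(60)]
sample = stratified_sample(patients, {g: 5 for g in groups}, seed=1)
```

Sampling within strata guarantees enough respondents in each group for group-specific psychometric analyses, which a single simple random sample of the whole frame would not.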
66 Sample Size (N = 1,664)
- 383 Spanish speaking Latino
- 435 African American
- 428 English speaking Latino
- 418 White
67 Results
- Of the 92 items, 29 had similar factor structure across all 4 groups
  - achieved metric invariance
68 Results: Metric Invariance Across 4 Groups for 29 Items
- Dimensional invariance: same number of factors
- Configural invariance: same items load on same factors
- Metric or factor pattern invariance: items have same loadings on same factors
- Strong factorial or scalar invariance: observed scores are unbiased
- Residual invariance: observed item and factor variances can be compared across groups
- Strict factorial invariance: both scalar invariance and residual invariance criteria are met
69 Seven Metric Invariant Scales (29 items)
I. COMMUNICATION
- Hurried communication
- Elicited concerns, responded
- Explained results, medications
II. DECISION MAKING
- Patient-centered decision-making
III. INTERPERSONAL STYLE
- Compassionate, respectful
- Discriminated
- Disrespectful office staff
70 Continued Exploration of Invariance: Item Bias
- Tested invariance of model parameter estimates across groups for scalar invariance
  - Bias in items
71 Obtained Partial Scalar Invariance Across 4 Groups for 18 Items
72 Seven Scalar Invariant (Unbiased) Scales (18 items)
I. COMMUNICATION
- Hurried communication: lack of clarity
- Elicited concerns, responded
- Explained results, medications: explained results
II. DECISION MAKING
- Patient-centered decision-making: decided together
III. INTERPERSONAL STYLE
- Compassionate, respectful: compassionate, respectful (subset)
- Discriminated: discriminated due to race/ethnicity
- Disrespectful office staff
73 What to Do if Measures Are Not Equivalent in a Specific Study Comparing Groups
- Need guidelines for how to handle data when substantial non-comparability is found in a study
  - Drop poorly performing or biased items from scores
  - Compare results with and without biased items
  - Analyze study by stratifying diverse groups
- The current challenge for measurement in minority health studies
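Comparing results with and without biased items can be as simple as computing the group difference twice, as in the CES-D example earlier in the lecture. A sketch, assuming summed scale scores and a set of items already flagged for DIF; the item names, groups, and responses are hypothetical.

```python
from statistics import mean


def group_mean_difference(data, items, group_a, group_b):
    """Difference in mean summed scale scores (group_a minus group_b) over the listed items."""
    def scale_scores(group):
        return [sum(resp[i] for i in items) for resp in data[group]]
    return mean(scale_scores(group_a)) - mean(scale_scores(group_b))


def compare_with_without(data, items, flagged, group_a, group_b):
    """Group difference using all items vs. after dropping items flagged for DIF."""
    kept = [i for i in items if i not in flagged]
    return {
        "all_items": group_mean_difference(data, items, group_a, group_b),
        "without_flagged": group_mean_difference(data, kept, group_a, group_b),
    }


# Hypothetical responses; q3 stands in for an item flagged for DIF
data = {
    "A": [{"q1": 1, "q2": 2, "q3": 3}, {"q1": 2, "q2": 1, "q3": 3}],
    "B": [{"q1": 1, "q2": 2, "q3": 0}, {"q1": 2, "q2": 1, "q3": 1}],
}
result = compare_with_without(data, ["q1", "q2", "q3"], {"q3"}, "A", "B")
print(result)
```

If the two differences diverge, as they do here, the substantive conclusion depends on the flagged items and both results should be reported.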
74 Example: 20-item Spanish CES-D in Older Latinos
- 2 items had very low item-scale correlations and high rates of missing data in two studies
  - "I felt hopeful about the future"
  - "I felt I was just as good as other people"
- 20-item version
  - Item-scale correlations: Study 1, -.20 to .73; Study 2, .05 to .78
  - Cronbach's alpha
- 18-item version
  - Item-scale correlations: Study 1, .45 to .76; Study 2, .33 to .79
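Item-scale correlations like these are usually "corrected": each item is correlated with the sum of the remaining items, so an item is not correlated with itself. A sketch of that computation, assuming a 0.30 flagging threshold (a common rule of thumb, not a value from the slide) and hypothetical data; low or negative values, such as the -.20 above, are flagged.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_scale(items: dict[str, list[float]]) -> dict[str, float]:
    """Correlate each item with the sum of the other items (item removed from its own scale)."""
    result = {}
    for name, scores in items.items():
        others = [s for other, s in items.items() if other != name]
        rest_totals = [sum(vals) for vals in zip(*others)]
        result[name] = pearson(scores, rest_totals)
    return result


def flag_low_items(items: dict[str, list[float]], threshold: float = 0.30) -> list[str]:
    """Items whose corrected item-scale correlation falls below the (assumed) threshold."""
    return [name for name, r in corrected_item_scale(items).items() if r < threshold]


# Hypothetical 3-item scale, 5 respondents (not the CES-D data)
items = {
    "item1": [1, 2, 3, 4, 5],
    "item2": [1, 2, 3, 4, 5],
    "item3": [2, 1, 2, 1, 2],
}
print(flag_low_items(items))  # ['item3']
```

Recomputing the correlations after dropping flagged items, separately in each study or group, reproduces the 20-item versus 18-item comparison shown above.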
75 Example: Measure Can Be Modified
- GHAA Consumer Satisfaction Survey
  - Adapted to be appropriate for African American patients
- Focus groups conducted to obtain perspectives of African Americans
- New domains added (e.g., discrimination/stereotyping)
- New items added to existing domains
Fongwa M et al., Ethnicity and Disease, 2006;16:948-955.
76 Approaches to Conducting Studies When You Are Not Sure
- Use a combination of universal and group-specific items
  - use universal items to compare across groups
  - use specific items (added onto universal items) when conducting analyses within one group
    - e.g., to find a variable that correlates with a health measure within one group
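The combined design implies two scores per respondent: one from universal items only, used for cross-group comparisons, and one extended with group-specific items, used only within that group. A sketch, assuming a mean-of-items scoring rule; the item names, the "Group X" label, and the responses are hypothetical.

```python
from statistics import mean
from typing import Optional


def scale_score(responses: dict[str, float], universal: list[str],
                specific: Optional[list[str]] = None) -> float:
    """Mean item score over the universal items, optionally extended with group-specific items."""
    item_names = universal + (specific or [])
    return mean(responses[name] for name in item_names)


# Hypothetical item sets
UNIVERSAL = ["clarity", "respect"]
SPECIFIC = {"Group X": ["language_sensitivity"]}

r = {"clarity": 4, "respect": 2, "language_sensitivity": 5}
cross_group = scale_score(r, UNIVERSAL)                         # comparable across groups
within_group = scale_score(r, UNIVERSAL, SPECIFIC["Group X"])   # within Group X only
```

Keeping the extended score out of between-group comparisons preserves comparability while still letting within-group analyses use the culture-specific content.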
77 Conclusions
- Measurement in health disparities and minority health research is a relatively new field, with few guidelines
- Encourage first steps: test and report adequacy and equivalence
- As evidence grows, concepts and measures that work better across diverse groups will be identified
78 Two Special Journal Issues on Measurement in Diverse Populations
- Measurement in older ethnically diverse populations
  - J Mental Health Aging, Vol 7, Spring 2001
- Measurement in a multi-ethnic society
  - Med Care, Vol 44, November 2006
79 Homework for Class 3
- Using the same template and measure you reviewed for class 2, complete sections 14-21
- Use the same file and submit the entire document by email to anita.stewart_at_ucsf.edu