Title: Assessing Intervention Fidelity in RCTs: Concepts and Methods
1. Assessing Intervention Fidelity in RCTs: Concepts and Methods
- Panelists
- David S. Cordray, PhD
- Chris Hulleman, PhD
- Joy Lesnick, PhD
- Vanderbilt University
- Presentation for the IES Research Conference
- Washington, DC
- June 12, 2008
2. Overview
- Session planned as an integrated set of presentations
- We'll begin with
  - Definitions and distinctions
  - A conceptual foundation for assessing fidelity in RCTs, a special case
- Two examples of assessing implementation fidelity
  - Chris Hulleman will illustrate an assessment for an intervention with a single core component
  - Joy Lesnick illustrates additional considerations when fidelity assessment is applied to intervention models with multiple program components
- Issues for the future
- Questions and discussion
3. Definitions and Distinctions
4. Dimensions of Intervention Fidelity
- There is little consensus on what is meant by the term "intervention fidelity."
- But Dane & Schneider (1998) identify five aspects:
  - Adherence/compliance: program components are delivered/used/received as prescribed
  - Exposure: amount of program content delivered to/received by participants
  - Quality of delivery: theory-based ideal in terms of processes and content
  - Participant responsiveness: engagement of the participants, and
  - Program differentiation: unique features of the intervention are distinguishable from other programs (including the counterfactual)
5. Distinguishing Implementation Assessment from Implementation Fidelity Assessment
- Two models of intervention implementation:
  - A purely descriptive model
    - Answers the question: what transpired as the intervention was put in place (implemented)?
  - An a priori intervention model, with explicit expectations about implementation of core program components
    - Fidelity is the extent to which the realized intervention (tTx) is faithful to the pre-stated intervention model (TTx)
    - Fidelity = TTx - tTx
- We emphasize this second model
6. What to Measure?
- Adherence to the intervention model:
  1. Essential or core components (activities, processes)
  2. Necessary, but not unique to the theory/model: activities, processes, and structures supporting the essential components of T
  3. Ordinary features of the setting, shared with the counterfactual group (C)
- Essential/core and necessary components are the priority parts of fidelity assessment.
7. An Example of Core Components: Bransford's HPL Model of Learning and Instruction
- John Bransford et al. (1999) postulate that a strong learning environment entails a combination of:
  - Knowledge-centered
  - Learner-centered
  - Assessment-centered, and
  - Community-centered components.
- Alene Harris developed an observation system (the VOS) that registered novel (the components above) and traditional pedagogy in classes.
- The next slide focuses on the prevalence of Bransford's recommended pedagogy.
8. Challenge-based Instruction in Treatment and Control Courses: The VaNTH Observation System (VOS)
[Figure: percentage of course time using challenge-based instructional strategies. Adapted from Cox & Cordray, in press.]
9. Implications
- Fidelity can be assessed even when there is no known benchmark (e.g., the 10 Commandments)
- In practice, interventions can be a mixture of components with strong, weak, or no benchmarks
- Control conditions can include core intervention components due to:
  - Contamination
  - Business as usual (BAU) containing shared components at different levels
  - Similar theories or models of action
- But to index fidelity, we need to measure components within the control condition
10. Linking Intervention Fidelity Assessment to Contemporary Models of Causality
- Rubin's Causal Model
  - The true causal effect of X is (YiTx - YiC)
  - RCT methodology is the best approximation to the true effect
- Fidelity assessment within RCT-based causal analysis entails examining the difference between causal components in the intervention and counterfactual conditions.
- Differencing causal conditions can be characterized as the achieved relative strength of the contrast:
  - Achieved Relative Strength (ARS) = tTx - tC
- ARS is a default index of fidelity
11. [Figure: Achieved Relative Strength = .15 vs. Expected Relative Strength = .25]
12. In Practice
- Identify core components in both groups
  - e.g., via a Model of Change
- Establish benchmarks for TTx and TC
- Measure core components to derive tTx and tC
  - e.g., via a Logic Model based on the Model of Change
- With multiple components and multiple methods of assessment, achieved relative strength needs to be:
  - Standardized, and
  - Combined across:
    - Multiple indicators
    - Multiple components
    - Multiple levels (HLM-wise)
- We turn to our examples.
13. Assessing Implementation Fidelity in the Lab and in Classrooms: The Case of a Motivation Intervention
- Chris S. Hulleman
- Vanderbilt University
14. The Theory of Change
[Figure: path model linking Manipulated Relevance, Perceived Utility Value, Interest, and Performance.]
Adapted from Hulleman (2008); Hulleman, Godes, Hendricks, & Harackiewicz (2008); Hulleman & Harackiewicz (2008); Hulleman, Hendricks, & Harackiewicz (2007); Eccles et al. (1983); Wigfield & Eccles (2002)
15. Methods

| | Laboratory | Classroom |
|---|---|---|
| Sample | N = 107 undergraduates | N = 182 ninth-graders; 13 classes, 8 teachers, 3 high schools |
| Task | Mental multiplication technique | Biology, Physical Science, Physics |
| Treatment manipulation | Write about how the mental math technique is relevant to your life. | Pick a topic from science class and write about how it relates to your life. |
| Control manipulation | Write a description of a picture from the learning notebook. | Pick a topic from science class and write a summary of what you have learned. |
| Number of manipulations | 1 | 2-8 |
| Length of study | 1 hour | 1 semester |
| Dependent variable | Perceived utility value | Perceived utility value |
16. Motivational Outcome
[Figure: treatment vs. control comparison on the motivational outcome; g = 0.05 (p = .67)]
17. Fidelity Measurement and Achieved Relative Strength
- Simple intervention: one core component
- Intervention fidelity:
  - Defined as quality of participant responsiveness
  - Rated on a scale from 0 (none) to 3 (high)
  - 2 independent raters, 88% agreement
18. Quality of Responsiveness

| Quality of Responsiveness | Lab C: N | % | Lab Tx: N | % | Class C: N | % | Class Tx: N | % |
|---|---|---|---|---|---|---|---|---|
| 0 | 47 | 100 | 7 | 11 | 86 | 96 | 38 | 41 |
| 1 | 0 | 0 | 15 | 24 | 4 | 4 | 40 | 43 |
| 2 | 0 | 0 | 29 | 46 | 0 | 0 | 14 | 15 |
| 3 | 0 | 0 | 12 | 19 | 0 | 0 | 0 | 0 |
| Total | 47 | 100 | 63 | 100 | 90 | 100 | 92 | 100 |
| Mean | 0.00 | | 1.73 | | 0.04 | | 0.74 | |
| SD | 0.00 | | 0.90 | | 0.21 | | 0.71 | |
19. Indexing Fidelity
- Absolute
  - Compare observed fidelity (tTx) to the absolute or maximum level of fidelity (TTx)
- Average
  - Mean level of observed fidelity (tTx)
- Binary
  - Yes/no treatment receipt based on fidelity scores
  - Requires selection of a cut-off value
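As a sketch, all three indices can be computed directly from per-participant ratings. The counts below reproduce the classroom data from the Quality of Responsiveness table; the cut-off of 2 for the binary index is our assumption (it is consistent with the binary values reported on the later ARS slide).

```python
# Illustrative sketch: absolute, average, and binary fidelity indices from
# quality-of-responsiveness ratings (0 = none .. 3 = high). The counts below
# reproduce the classroom data shown earlier; the binary cut-off of 2 is an
# assumption on our part.

def fidelity_indices(ratings, max_score=3, cutoff=2):
    n = len(ratings)
    average = sum(ratings) / n                      # mean observed fidelity
    absolute = average / max_score                  # observed vs. maximum possible
    binary = sum(r >= cutoff for r in ratings) / n  # proportion of "compliers"
    return average, absolute, binary

tx = [0] * 38 + [1] * 40 + [2] * 14   # classroom treatment ratings
c = [0] * 86 + [1] * 4                # classroom control ratings

print([round(x, 2) for x in fidelity_indices(tx)])  # [0.74, 0.25, 0.15]
print([round(x, 2) for x in fidelity_indices(c)])   # [0.04, 0.01, 0.0]
```

The treatment mean of 0.74 and control mean of 0.04 match the table above; the absolute and binary values match the indices reported later for the classroom sample.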
20. Fidelity Indices

| Conceptual | | Laboratory | Classroom |
|---|---|---|---|
| Absolute | Tx | | |
| | C | | |
| Average | Tx | 1.73 | 0.74 |
| | C | 0.00 | 0.04 |
| Binary | Tx | | |
| | C | | |
21. Indexing Fidelity as Achieved Relative Strength
- Intervention strength = Treatment - Control
- Achieved Relative Strength (ARS) Index:
  - Standardized difference in the fidelity index across Tx and C
  - Based on Hedges' g (Hedges, 2007)
  - Corrected for clustering in the classroom (ICCs from .01 to .08)
22. Average ARS Index

ARS = [(ȳTx - ȳC) / ST] × [1 - 3/(4N - 9)] × √(1 - 2(n - 1)ρ/(N - 2))

(group difference) × (sample size adjustment) × (clustering adjustment)

- Where:
  - ȳTx = mean for group 1 (tTx)
  - ȳC = mean for group 2 (tC)
  - ST = pooled within-groups standard deviation
  - nTx = treatment sample size
  - nC = control sample size
  - n = average cluster size
  - ρ = intra-class correlation (ICC)
  - N = total sample size
23. Absolute and Binary ARS Indices

ARS = [(pTx - pC) / ST] × [1 - 3/(4N - 9)] × √(1 - 2(n - 1)ρ/(N - 2))

(group difference) × (sample size adjustment) × (clustering adjustment)

- Where:
  - pTx = proportion for the treatment group (tTx)
  - pC = proportion for the control group (tC)
  - nTx = treatment sample size
  - nC = control sample size
  - n = average cluster size
  - ρ = intra-class correlation (ICC)
  - N = total sample size
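A sketch of how the Average ARS index can be computed, following the three labeled pieces on these slides (group difference, sample size adjustment, clustering adjustment). The exact algebra is our reconstruction of Hedges (2007), and the ICC in the example is assumed from the reported .01-.08 range.

```python
import math

def achieved_relative_strength(mean_tx, mean_c, sd_pooled,
                               n_tx, n_c, cluster_size, icc):
    """Cluster-corrected standardized mean difference (after Hedges, 2007)."""
    N = n_tx + n_c
    group_difference = (mean_tx - mean_c) / sd_pooled
    sample_size_adj = 1 - 3 / (4 * N - 9)  # Hedges' small-sample correction
    clustering_adj = math.sqrt(1 - 2 * (cluster_size - 1) * icc / (N - 2))
    return group_difference * sample_size_adj * clustering_adj

# Classroom Average index: means and SDs from the earlier table;
# ICC assumed at .04 (the reported range was .01-.08).
n_tx, n_c = 92, 90
sd_pooled = math.sqrt(((n_tx - 1) * 0.71**2 + (n_c - 1) * 0.21**2)
                      / (n_tx + n_c - 2))
ars = achieved_relative_strength(0.74, 0.04, sd_pooled,
                                 n_tx, n_c, cluster_size=14, icc=0.04)
print(round(ars, 2))  # ~1.32, in line with the classroom Average ARS reported later
```

With these inputs the result is approximately 1.32, consistent with the classroom Average ARS in the deck's later table.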
24. Average ARS Index
[Figure: treatment-strength scale (ratings 0-3; 0-100%) showing the benchmark TTx above the observed tTx, and the observed tC above the benchmark TC, with the gaps labeled "infidelity." Classroom average difference: (0.74) - (0.04) = 0.70.]
25. Achieved Relative Strength Indices

| Index | Group | Observed Fidelity: Lab | Observed Fidelity: Class | Lab vs. Class Contrast (Lab - Class) |
|---|---|---|---|---|
| Absolute | Tx | 0.58 | 0.25 | |
| | C | 0.00 | 0.01 | |
| | g | 1.72 | 0.80 | 0.92 |
| Average | Tx | 1.73 | 0.74 | |
| | C | 0.00 | 0.04 | |
| | g | 2.52 | 1.32 | 1.20 |
| Binary | Tx | 0.65 | 0.15 | |
| | C | 0.00 | 0.00 | |
| | g | 1.88 | 0.80 | 1.08 |
26. Linking Achieved Relative Strength to Outcomes
27. Sources of Infidelity in the Classroom
- Student behaviors were nested within teacher behaviors:
  - Teacher dosage
  - Frequency of responsiveness
- Student and teacher behaviors were used to predict treatment fidelity (i.e., quality of responsiveness).
28. Sources of Infidelity: Multi-level Analyses
- Part I: Baseline Analyses
  - Identified the amount of residual variability in fidelity due to students and teachers.
  - Due to missing data, we estimated a 2-level model (153 students, 6 teachers)
  - Student: Yij = β0j + β1j(TREATMENT)ij + rij
  - Teacher: β0j = γ00 + u0j
  - Teacher: β1j = γ10 + u1j
29. Sources of Infidelity: Multi-level Analyses
- Part II: Explanatory Analyses
  - Predicted residual variability in fidelity (quality of responsiveness) with frequency of responsiveness and teacher dosage
  - Student: Yij = β0j + β1j(TREATMENT)ij + β2j(RESPONSE FREQUENCY)ij + rij
  - Teacher: β0j = γ00 + u0j
  - Teacher: β1j = γ10 + γ11(TEACHER DOSAGE)j + u1j
  - Teacher: β2j = γ20 + γ21(TEACHER DOSAGE)j + u2j
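A model of this general shape can be fit, for example, with statsmodels' MixedLM. This is an illustrative sketch only: the data below are simulated, all variable names are our own, and the original analysis used HLM, not this code.

```python
# Illustrative sketch: a two-level model of the same general shape as above,
# fit with statsmodels on simulated data (not the study's data or software).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_teachers, per = 6, 25
teacher = np.repeat(np.arange(n_teachers), per)
dosage = np.repeat(rng.uniform(0, 1, n_teachers), per)   # teacher-level predictor
treatment = rng.integers(0, 2, teacher.size)             # student-level indicator
resp_freq = rng.poisson(2, teacher.size)                 # student-level predictor
u0 = np.repeat(rng.normal(0, 0.3, n_teachers), per)      # teacher random intercepts
quality = (0.2 + u0 + treatment * (0.4 + 0.5 * dosage)
           + 0.05 * resp_freq + rng.normal(0, 0.4, teacher.size))

df = pd.DataFrame(dict(teacher=teacher, dosage=dosage, treatment=treatment,
                       resp_freq=resp_freq, quality=quality))

# Treatment and response-frequency slopes moderated by teacher dosage,
# with a random intercept per teacher (random slopes omitted for stability).
model = smf.mixedlm("quality ~ treatment + resp_freq"
                    " + treatment:dosage + resp_freq:dosage",
                    data=df, groups="teacher")
result = model.fit()
print(result.summary())
```

The cross-level interactions (`treatment:dosage`, `resp_freq:dosage`) play the role of γ11 and γ21 in the slide's equations.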
30. Sources of Infidelity: Multi-level Analyses

| Variance Component | Baseline: Residual Variance | Baseline: % of Total | Explanatory: Residual Variance | Explanatory: % Reduction |
|---|---|---|---|---|
| Level 1 (Student) | 0.15437 | 52% | 0.15346 | < 1% |
| Level 2 (Teacher) | 0.13971 | 48% | 0.04924 | 65% |
| Total | 0.29408 | | 0.20270 | |

p < .001.
31. Case Summary
- The motivational intervention was more effective in the lab (g = 0.45) than in the field (g = 0.05).
- Using 3 indices of fidelity and, in turn, achieved relative treatment strength revealed that:
  - Classroom fidelity < lab fidelity
  - Achieved relative strength was about 1 SD less in the classroom than in the laboratory
  - Differences in achieved relative strength paralleled differences in the motivational outcome, especially in the lab
  - Sources of infidelity were teacher (not student) factors
32. Assessing Fidelity of Interventions with Multiple Components: A Case of Assessing Preschool Interventions
33. What Do We Mean by Multiple Components in Preschool Literacy Programs?
- How do you define preschool instruction?
  - Academic content, materials, student-teacher interactions, student-student interactions, physical development, schedules and routines, assessment, family involvement, etc.
- How would you measure implementation?
- Preschool interventions:
  - Are made up of components (e.g., sets of activities and processes) that can be thought of as constructs
  - These constructs vary in meaning across actors (e.g., developers, implementers, researchers)
  - They are of varying levels of importance within the intervention, and
  - These constructs are made up of smaller parts that need to be assessed.
- Multiple components make assessing fidelity more challenging
34. Overview
- Four areas of consideration when assessing fidelity of programs with multiple components:
  1. Specifying multiple components
  2. Major variations in program components
  3. The ABCs of item and scale construction
  4. Aggregating indices
- One caveat: very unusual circumstances
- Goal of this work:
  - To build on the extensive evaluation work that had already been completed, and to use the case study to provide a framework for future efforts to measure fidelity of implementation.
35. 1. Specifying Multiple Components
- Our process:
  - Extensive review of program materials
  - Potentially hundreds of components
- How many indicators do we need to assess fidelity?
36. 1. Specifying Multiple Components
Constructs → Sub-Constructs → Facets → Elements → Indicators
37. Grain Size Is Important
- Conceptual differences between programs may occur at micro levels
- Empirical differences between program implementations may occur at more macro levels
- Theoretically expected differences vs. empirically observed differences
- Conceptual differences between programs must be identified at the smallest grain size at the outset, although empirical differences may only be detectable at higher, more macro levels once implemented
38. 2. Major Variations in Program Components
- One program often has some combination of these different types of components:
  - Scripted (highly structured) activities
  - Unscripted (unstructured) activities
  - Nesting of activities:
    - Micro-level (discrete) activities
    - Macro-level (extended) activities
- What you're trying to measure will influence how to measure it -- and how often it needs to be measured.
39. 2. Major Variations in Program Components

| Type of Program Component | Example from the Case Study | Implications | Abs | Avg | Bin | ARS |
|---|---|---|---|---|---|---|
| Scripted (highly structured) activities | In the first treatment condition, four scripted literacy circles are required. | There are known criteria for assessing fidelity: fidelity is the difference between the expected and observed values (TTx - tTx). | Yes | Yes | ? | Yes |
| Unscripted (unstructured) activities | In the second treatment condition, literacy circles are required, but the specific content of those group meetings is not specified. | There are no known criteria for assessing fidelity; we can only record what was done (tTx), or compare it to the control. | No? | Yes? | ? | Yes |

- Abs = Absolute fidelity index: what happened compared to what should have happened (the highest standard).
- Avg = Magnitude or exposure level: indicates what happened, but it's not very meaningful -- how do we know if a level is good or bad?
- Bin = Binary complier: can we set a benchmark to determine whether or not a program component was successfully implemented? >30%, for example? Is that realistic? Meaningful?
- ARS = Difference in magnitude between Tx and C (relative strength): is there enough difference to warrant a treatment effect?
40. [Image: dots under a microscope -- what is it?]
41. [Image: Starry Night, Vincent van Gogh, 1889]
42. We Must Measure the Trees and Also the Forest
- Micro-level (discrete) activities:
  - Depending on the condition, daily activities (i.e., whole group time, small group time, center activities) may be scripted or unscripted, and take place within the larger structure of the theme under study.
- Macro-level (extended) activities:
  - The month-long thematic unit (structured in the treatment condition, unstructured in the control) is the underlying extended structure within which scripted or unscripted micro activities take place.
- In multi-component programs, many activities are nested within larger activity structures. This nesting has implications for fidelity analysis: what to measure and how to measure it.
43. 3. The ABCs of Item and Scale Construction
- Aim for one-to-one correspondence of indicators to the component of interest
- Balance items across components
- Coverage and quality are more important than the quantity of items
44. Aim for One-to-One Correspondence

[Figure: difference between T and C (Oral Language): T = 1.80 (0.32) vs. C = 1.36 (0.32), ARS ES = 1.38; T = 3.45 (0.87) vs. C = 2.26 (0.57), ARS ES = 1.62]

- Example of more than one component being assessed in one item:
  - Does the teacher talk with children throughout the day, modeling correct grammar, teaching new vocabulary, and asking questions to encourage children to express their ideas in words? (Yes/No)
- Example of one component being measured in each item:
  - Teacher provides an environment wherein students can talk about what they are doing.
  - Teacher listens attentively to students' discussions and responses.
  - Teacher models and/or encourages students to ask questions during class discussions.

Data for the case study come from an evaluation conducted by Dale Farran, Mark Lipsey, Carol Bilbrey, et al.
45. Balance Items Across Components

| Literacy Content | Items | α |
|---|---|---|
| Oral language | 20 | 0.95 |
| Language, comprehension, and response to text | 7 | 0.70 |
| Book and print awareness | 2 | 0.80 |
| Phonemic awareness | 3 | 0.68 |
| Letter and word recognition | 7 | 0.76 |
| Writing | 6 | 0.67 |
| Literacy Processes | | |
| Thematic studies | 4 | 0.62 |
| Structured literacy circles | 2 | 0.62 |

- How many items are needed for each scale?
- Oral language is over-represented
- Scales with α < 0.80 are not reliable
46. Coverage and Quality More Important Than Quantity

| Literacy Content | Items | α |
|---|---|---|
| Oral language | 20 | 0.95 |
| Language, comprehension, and response to text | 7 | 0.70 |
| Book and print awareness | 2 | 0.80 |
| Phonemic awareness | 3 | 0.68 |
| Letter and word recognition | 7 | 0.76 |
| Writing | 6 | 0.67 |
| Literacy Processes | | |
| Thematic studies | 4 | 0.62 |
| Structured literacy circles | 2 | 0.62 |

- Two scales each have 2 items, but very different levels of reliability
- How many items are needed for each scale?
  - Oral language: 20 items. We randomly selected items and recalculated alpha:
    - 10 items: α = 0.92
    - 8 items: α = 0.90
    - 6 items: α = 0.88
    - 5 items: α = 0.82
    - 4 items: α = 0.73
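Cronbach's alpha, used in the tables above, can be computed directly from an item-response matrix. The simulated data below (not the study's) illustrates the slide's point: dropping items from a coherent scale lowers alpha.

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = observations, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Simulated 20-item scale with one common underlying factor (illustrative only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
items = latent + rng.normal(size=(500, 20))

print(round(cronbach_alpha(items), 2))         # full 20-item scale: high alpha
print(round(cronbach_alpha(items[:, :5]), 2))  # a 5-item subset: lower alpha
```

With these simulation settings the full scale comes out around .95 and the 5-item subset noticeably lower, mirroring the pattern in the slide.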
47. 4. Aggregating Indices
- To weight or not to weight? How do we decide?
- Possibilities:
  - Theory
  - Consensus
  - Time spent
- Case study example: 2 levels of aggregation, within and between:
  - Unit-weight within facet: Instruction → Content → Literacy
  - Hypothetical weight across sub-construct: Instruction → Content
48. [Diagram: aggregation roadmap ("You are here") -- unit weight within facets, theory weight across sub-constructs; how to weight?]
49. Aggregating Indices
- Unit-weight within facet: Instruction → Content → Literacy

| Literacy Content | Average Fidelity Index: Tx | Average Fidelity Index: C | Absolute Fidelity Index: Tx (%) | Absolute Fidelity Index: C (%) | ARS Fidelity Index (Average) | ARS Fidelity Index (Absolute) |
|---|---|---|---|---|---|---|
| Oral language | 1.82 | 1.40 | 91 | 70 | 1.36 | 0.53 |
| Language, comprehension, and response to text | 1.74 | 1.37 | 87 | 69 | 1.45 | 0.44 |
| Book and print awareness | 1.91 | 1.39 | 96 | 70 | 1.38 | 0.73 |
| Phonemic awareness | 1.73 | 1.48 | 87 | 74 | 0.74 | 0.32 |
| Letter and word recognition | 1.75 | 1.36 | 88 | 68 | 1.91 | 0.50 |
| Writing | 1.68 | 1.37 | 84 | 69 | 1.22 | 0.34 |
| Average (unit weighting) | 1.77 | 1.38 | 89 | 75 | 1.34 | 0.48 |

Note: clustering is ignored.
50. Aggregating Indices
- Theory-weight across sub-construct (hypothetical): Instruction → Content

| Content | Treatment | Control | Hypothetical Weight |
|---|---|---|---|
| Literacy | 1.77 | 1.38 | 40% |
| Math | 1.51 | 1.80 | 5% |
| Social and Personal Development | 1.79 | 1.58 | 35% |
| Scientific Thinking | 1.57 | 1.71 | 5% |
| Social Studies | 1.84 | 1.41 | 5% |
| Creative Arts | 1.66 | 1.32 | 5% |
| Physical Development | 1.45 | 1.50 | 3% |
| Technology | 1.45 | 1.57 | 2% |
| Total | | | 100% |
| Unweighted average | 1.63 | 1.53 | |
| Weighted average | 1.74 | 1.49 | |
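The unweighted and theory-weighted averages in the table can be reproduced with a small sketch; the weights are the slide's hypothetical percentages.

```python
# Sketch: unit- vs. theory-weighted aggregation across sub-constructs,
# using the hypothetical weights from the table above.
content = {
    # sub-construct: (treatment mean, control mean, weight in %)
    "Literacy":                        (1.77, 1.38, 40),
    "Math":                            (1.51, 1.80, 5),
    "Social and Personal Development": (1.79, 1.58, 35),
    "Scientific Thinking":             (1.57, 1.71, 5),
    "Social Studies":                  (1.84, 1.41, 5),
    "Creative Arts":                   (1.66, 1.32, 5),
    "Physical Development":            (1.45, 1.50, 3),
    "Technology":                      (1.45, 1.57, 2),
}

def unweighted(values):
    return sum(values) / len(values)

def weighted(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

tx = [t for t, _, _ in content.values()]
c = [cv for _, cv, _ in content.values()]
w = [wt for _, _, wt in content.values()]

print(round(unweighted(tx), 2), round(unweighted(c), 2))    # 1.63 1.53
print(round(weighted(tx, w), 2), round(weighted(c, w), 2))  # 1.74 1.49
```

The results match the table: weighting Literacy and Social/Personal Development heavily pulls the treatment mean up and the control mean down.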
51. [Diagram: aggregation roadmap revisited ("You are here") -- unit weight, theory weight; how to weight?]
52. Key Points and Future Issues
- Identification and measurement, at a minimum, should include model-based core and necessary components
- Collaboration among researchers, developers, and implementers is essential for specifying:
  - Intervention models
  - Core and essential components
  - Benchmarks for TTx (e.g., an educationally meaningful dose: what level of X is needed to instigate change?), and
  - Tolerable adaptation
53. Points and Issues
- Fidelity assessment serves two roles:
  - Establishing the average causal difference between conditions, and
  - Using fidelity measures to assess the effects of variation in implementation on outcomes.
- We should minimize infidelity and weak ARS:
  - Pre-experimental assessment of TTx in the counterfactual condition: is TTx > TC?
  - Build operational models with positive implementation drivers
  - Post-experimental (re)specification of the intervention, for example:
    - MAP: ARS = .3 (planned professional development) + .6 (planned use of data for differentiated instruction)
54. Points and Issues
- What does an ARS of 1.20 mean?
- We need experience and a normative framework
  - Cohen defined a small effect on outcomes as 0.20, medium as 0.50, and large as 0.80
  - Over time, similar norms may emerge for ARS