Title: Computerized Adaptive Testing
1Reducing the duration and cost of assessment with
the GAIN: Computer Adaptive Testing
2Evidence-Based Practice
- Requires accurate diagnosis, treatment placement,
and outcomes monitoring
- Assessment over a wide range of domains
- The cost of evidence-based assessment is
- Time
- Respondent Burden
- Increased staff resources (including training)
3Improving Efficiency
- The use of screeners and short-form instruments
has significantly improved the efficiency of the
assessment process
- Can help determine whether a full assessment is
warranted
- But not a substitute for a full assessment
- Lack of precision
- Floor and ceiling effects
- Limited content validity
4Computerized Adaptive Testing
- Selects items from a large bank of items based on
the responses made to previous items.
- Continues to select and administer items until
sufficient measurement precision is obtained.
- Combines the precision and comprehensiveness of a
full assessment with the efficiency of a screener.
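The select-administer-stop loop described above can be sketched as follows. This is a minimal illustration, not the GAIN CAT itself: it assumes a dichotomous Rasch model, picks the unused item whose difficulty is closest to the current estimate (which is the maximum-information item under that model), and uses an expected a posteriori (EAP) estimate with a standard normal prior only because it stays finite for any response pattern. The names `run_cat`, `estimate_theta`, and `se_target`, and the 0.5 threshold, are hypothetical.

```python
import math

# Grid of candidate trait levels (logits) used for the EAP estimate.
GRID = [-4 + 0.1 * k for k in range(81)]

def p_endorse(theta, b):
    """Rasch probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses):
    """EAP estimate and posterior SD from a list of (difficulty, 0/1) pairs.

    Assumes a standard normal prior (an illustrative choice).
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.5, start_theta=0.0):
    """Administer items until the standard error reaches se_target.

    bank:   dict mapping item id -> difficulty (logits)
    answer: callable mapping item id -> 0 or 1
    """
    remaining = dict(bank)
    responses, administered = [], []
    theta, se = start_theta, float("inf")
    while remaining and se > se_target:
        # Rasch information p(1-p) peaks where difficulty is nearest theta.
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        administered.append(item)
        theta, se = estimate_theta(responses)
    return theta, se, administered
```

With a bank of evenly spaced difficulties and a respondent who endorses every item at or below a fixed severity, the loop stops well before the bank is exhausted, which is the efficiency gain the slide refers to.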
5CAT Process
- Score is calculated and the next best item is
selected based on item difficulty
[Figure: Typical Pattern of Responses. Item difficulty (Increased, Middle, Decreased) with a +/- 1 Std. Error band; responses marked Correct or Incorrect.]
6CAT in Clinical Assessment
7CAT in Clinical Assessment: Issues
- Triage of individuals to support clinical
decision making
- Measurement of multiple clinical dimensions and
subdimensions
- Persons with atypical presentation of symptoms
- Generalizability of assessment to various groups
8Clinical Decision Making
- How severe are the symptoms?
- What type of treatment is most appropriate?
- Can CAT be used to answer these questions more
efficiently?
9Strategy
- Use CAT to place persons into low, moderate, and
high levels of substance abuse and dependency.
- Starting Rules
- Using screener measures to set the initial
measure and select the first item
- Variable Stop Rules
- Tight precision around cut points
- Less precision away from cut points
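The starting and variable stop rules above might look like the sketch below. Everything numeric here is an assumption made for illustration: the cut points, the `tight` and `loose` standard-error targets, the `band` width around each cut point, and the linear mapping from a screener raw score to a starting measure are hypothetical and are not values from the GAIN study.

```python
def se_target(theta, cut_points, tight=0.3, loose=0.6, band=0.5):
    """Standard error the CAT must reach before stopping at this theta.

    Near a cut point (within `band` logits) the tighter target applies;
    farther away, the looser target is enough to classify the person.
    """
    nearest = min(abs(theta - c) for c in cut_points)
    return tight if nearest <= band else loose

def should_stop(theta, se, cut_points, **targets):
    """Variable stop rule: stop once the current SE meets the local target."""
    return se <= se_target(theta, cut_points, **targets)

def classify(theta, cut_points, labels=("low", "moderate", "high")):
    """Place a measure into a severity group defined by two cut points."""
    return labels[sum(theta >= c for c in sorted(cut_points))]

def start_theta(screener_score, screener_max, lo=-2.0, hi=2.0):
    """Starting rule: map a screener raw score linearly onto the logit scale.

    The linear mapping and the [-2, 2] range are illustrative assumptions.
    """
    return lo + (hi - lo) * screener_score / screener_max
```

For example, with hypothetical cut points at -0.5 and 0.5, a person estimated at 2.0 with a standard error of 0.55 can stop, while a person estimated at 0.4 with the same standard error keeps receiving items because the estimate sits next to a cut point.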
10CAT Standard Error
11Results
- CAT to full-measure correlations ranged from .87
to .99
- Agreement between CAT and full-measure
classification of persons into treatment groups
(kappa coefficients) ranged from .66 to .71.
- Screener starting rule improved CAT efficiency by
7 percent
- Variable stop rules improved efficiency by 15-38
percent
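The kappa coefficients reported above measure chance-corrected agreement between two classifications of the same persons. A minimal sketch of Cohen's kappa follows; the function name `cohens_kappa` and the six example classifications are made up for illustration and are not data from the study.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of category labels."""
    n = len(a)
    # Proportion of persons placed in the same group by both methods.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from the marginal group frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Illustrative only: group placements from a CAT and from the full measure.
cat_groups = ["low", "low", "moderate", "moderate", "high", "high"]
full_groups = ["low", "moderate", "moderate", "moderate", "high", "high"]
kappa = cohens_kappa(cat_groups, full_groups)
```

Kappa is 1.0 when the two classifications agree for every person and 0 when agreement is no better than chance.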
12Measuring Multiple Dimensions
13Assessment on Multiple Dimensions
- Instruments often measure multiple domains
- In CAT, treating a multi-domain measure as
measuring one domain is problematic
- Some subdimensions may not be adequately measured
14Strategy: Content Balancing
- Set an item quota for each subscale
- Maximum number of subscale items to administer
during the CAT
- An item is selected if
- Its subscale quota has not been met
- It provides maximum information
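The quota rule above can be sketched as a filter applied before maximum-information selection. This is an illustrative sketch assuming a dichotomous Rasch model; the function `select_item`, the bank layout, and the subscale names in the usage example are hypothetical.

```python
import math

def item_information(theta, b):
    """Rasch item information p(1-p) at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_item(theta, bank, administered, quotas):
    """Pick the most informative unused item whose subscale quota is open.

    bank:         dict item id -> (subscale, difficulty)
    administered: list of item ids already given
    quotas:       dict subscale -> maximum number of items to administer
    Returns None when every subscale quota has been met.
    """
    used = {}
    for item in administered:
        subscale = bank[item][0]
        used[subscale] = used.get(subscale, 0) + 1
    eligible = [
        item for item, (subscale, _) in bank.items()
        if item not in administered and used.get(subscale, 0) < quotas[subscale]
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda i: item_information(theta, bank[i][1]))

# Hypothetical three-item bank with two subscales.
bank = {"a1": ("anxiety", 0.0), "a2": ("anxiety", 0.1), "d1": ("depression", 1.0)}
quotas = {"anxiety": 1, "depression": 1}
first = select_item(0.0, bank, [], quotas)
second = select_item(0.0, bank, [first], quotas)
```

In this example the second pick comes from the other subscale even though a more informative item remains in the first one, because that subscale's quota is already filled.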
15Content Balancing Procedures
16Percentage of Items Administered by Subscale
17Content Balancing CAT to Full IMDS Correlations
18Identifying Persons with Atypical Presentation of
Symptoms
19Overview
- Implications: Clients sometimes endorse severe
clinical symptoms that are not reflected by
overall scores on standard assessments.
- Statistics that can detect atypical presentation
of symptoms have important clinical implications.
- Strategy: Identify fit statistics sensitive to
atypical presentation in a CAT context
20Rasch Fit Statistics
- Fit statistics are used to test particular
hypotheses.
- Atypicalness: Used to detect unexpected outlying,
off-target responses. Outlier sensitive.
- Example: A person with a high level on the
measured trait misses an easy item.
- Randomness: Used to detect unexpected inlying,
targeted responses.
- Both infit and outfit are chi-square-based
mean-square statistics. An infit or outfit value
of 1.0 is the expected value when the data fit
the Rasch model.
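To make the two statistics concrete, the sketch below computes person-level infit and outfit mean squares in their standard Rasch form for dichotomous items: outfit is the unweighted mean of squared standardized residuals (outlier sensitive), and infit weights the squared residuals by their model variance (sensitive to responses on well-targeted items). The function name `person_fit` and the example responses are hypothetical.

```python
import math

def person_fit(theta, responses):
    """Return (infit, outfit) mean squares for one person.

    theta:     the person's measure in logits
    responses: list of (item difficulty, 0/1 response) pairs
    """
    sum_sq_resid = 0.0   # sum of squared raw residuals
    sum_var = 0.0        # sum of model variances p(1-p)
    sum_z_sq = 0.0       # sum of squared standardized residuals
    for b, x in responses:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        var = p * (1.0 - p)
        sum_sq_resid += (x - p) ** 2
        sum_var += var
        sum_z_sq += (x - p) ** 2 / var
    infit = sum_sq_resid / sum_var
    outfit = sum_z_sq / len(responses)
    return infit, outfit

# Illustrative only: a high-severity person (theta = 2.0) who fails to
# endorse a very easy item (difficulty -3.0) but responds as expected
# to items near their own level.
infit, outfit = person_fit(2.0, [(-3.0, 0), (1.5, 1), (2.0, 1), (2.5, 0)])
```

In this example the single off-target surprise inflates outfit far more than infit, which matches the slide's example of a person with a high trait level missing an easy item.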
21Problems with Fit
22Clinical Implications of Misfit
- Our analyses indicate that there are subgroups
who endorse severe symptoms without endorsement
of milder symptoms.
- Examples
- Atypical suicide
- Substance use withdrawal without dependence
23Atypicalness by Number of Items
24Content Balancing and Atypicalness
25Future Research
- Identify alternative fit statistics that are more
sensitive to atypical presentation of symptoms
- Determine when it is likely that someone may
present with atypical symptoms, and if so, select
items to confirm atypicalness.
26Generalizability of CAT to Various Groups
27Overview
- Persons at the same severity level may differ in
their endorsement of specific items.
- This is called differential item functioning
(DIF)
- On the GAIN, DIF has been detected by
- Age (adolescent vs. adult)
- Gender
- Ethnicity/Race
- Drug of choice
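One common way to screen a single item for DIF between two groups is the Mantel-Haenszel common odds ratio, computed across strata of persons matched on overall severity. The slides do not say which DIF method was used on the GAIN, so the sketch below is an illustrative technique only; the function name and the counts in the usage example are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    strata: list of (ref_yes, ref_no, focal_yes, focal_no) counts, one
            tuple per stratum of persons matched on overall severity.
    A value near 1.0 suggests the item behaves the same way in both
    groups once severity is held constant.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_yes, ref_no, focal_yes, focal_no in strata:
        n = ref_yes + ref_no + focal_yes + focal_no
        if n == 0:
            continue  # skip empty strata
        numerator += ref_yes * focal_no / n
        denominator += ref_no * focal_yes / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant counts")
    return numerator / denominator

# Illustrative counts: equal endorsement odds in both groups, then
# higher odds in the reference group within one severity stratum.
no_dif = mantel_haenszel_odds_ratio([(10, 10, 5, 5)])
some_dif = mantel_haenszel_odds_ratio([(20, 10, 10, 10)])
```

Each grouping factor listed above (for example, adolescent vs. adult) would be checked separately, with one group treated as the reference and the other as the focal group.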
28DIF By GAIN Scale
29DIF and CAT
- The presence of DIF can limit our ability to
generalize measurement findings across different
groups.
- Controlling for DIF becomes complicated as the
number of DIF items and groups/factors increases.
- Currently exploring a number of methods for
controlling DIF in CAT.
30Potential of CAT in Clinical Practice
- Reduce respondent burden
- Reduce staff resources
- Reduce data fragmentation
- Streamline complex assessment procedures
- Assist in clinical decision making
- Identify persons with atypical profiles
- Improve measurement generalizability
31Future Research
- How do we put it all together?
- Much of the research in the area of CAT has used
computer simulation. There is a need to test
working CAT systems in clinical practice.
32Contact Information
- A copy of this presentation will be at
www.chestnut.org/li/posters
- For more information, please contact Barth Riley
at bbriley_at_chestnut.org