1 - Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective
André A. Rupp, EDMS Department, University of Maryland
2 - Toward a Definition of Diagnostic Assessment Systems
3 - Proposed Panel Definition
- The term "diagnostic" comes from a combination of dia, "to split apart," and gnosis, "to learn" or "knowledge." We use diagnostic assessment (system) to refer to assessment processes based on an explicit cognitive model, itself supported by empirical study, of proficient reasoning in a particular domain.
- The cognitive model must support delineation of students' and/or teachers' strengths and weaknesses that can be traced as they move from less to more proficient reasoning in the domain. The principled assessment design process should specify how observed behaviors are used to make inferences about what students or teachers know as they progress. We believe that diagnostic assessment has the potential to inform and assess the outcomes of instruction.
4 - Conceptualization of Problem Space
from Stevens, Beal, & Sprang (2009)
5 - Toward an Understanding of Frameworks & Models
6 - The Evidence-centered Design Framework
- adapted from Mislevy, Steinberg, Almond, & Lukas (2006)
7 - Frameworks vs. Models
- A principled assessment design framework for diagnostic assessment, such as evidence-centered design, is NOT a model. It does NOT prescribe a particular statistical modeling approach.
- A statistical / psychometric model is a mathematical tool that plays a supporting role in generating evidence-based narratives about students' and/or teachers' strengths and weaknesses. Its parameters do NOT have inherent meanings.
- A cognitive model for diagnostic assessment is a theory- and data-driven description of how emergent understandings and misconceptions in a domain develop and how these can be traced back to unobservable cognitive underpinnings. It does NOT prescribe a singular assessment approach.
8 - Evidence-based Reasoning for Traditional Assessments
9 - Traditional Construct Operationalization
[Figure: "Construct" boxes mapped between the theoretical realm and the empirical realm]
10 - Feedback Utility (Part I: Scoring Card)
11 - Feedback Utility (Part II: Simple Progress Mapping)
12 - Evidence-based Reasoning for Modern Assessments
13 - Complex Assessment Tasks for Diagnosis (Part I)
from Seeratan & Mislevy (2008)
14 - Complex Assessment Tasks for Diagnosis (Example II)
from Behrens et al. (2009)
15 - Evidence Identification, Aggregation, & Synthesis
from Stevens, Beal, & Sprang (2009)
16 - Proficiency Pathways
from Stevens, Beal, & Sprang (2009)
17 - Interventional Pathways
from Stevens, Beal, & Sprang (2009)
18 - Selected Statistical Tools for Evidence-based Reasoning
19 - Selected Modeling Approaches for Diagnostic Assessments
- Approaches Resulting in Continuous Proficiency Scales
  1. Unidimensional explanatory IRT or FA models (e.g., de Boeck & Wilson, 2004)
  2. Multidimensional CTT sumscores (e.g., Henson, Templin, & Douglas, 2007)
  3. Multidimensional explanatory IRT or FA models (e.g., Reckase, 2009)
  4. Structural equation models (e.g., Kline, 2010)
- Approaches Resulting in Classifications of Respondents Based on Discrete Scales
  1. Bayesian inference networks (e.g., Almond, Williamson, Mislevy, & Yan, in press)
  2. Parametric diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010; see the sketch after this list)
  3. Non-/semi-parametric classification approaches (e.g., Tatsuoka, 2009)
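To make the classification-based approaches concrete, here is a minimal sketch in Python of how a parametric diagnostic classification model such as DINA (see Rupp, Templin, & Henson, 2010, for the formal treatment) classifies a respondent: every binary attribute profile receives a posterior probability given the respondent's item scores, a Q-matrix, and item slip/guess parameters. The Q-matrix, parameter values, and responses below are hypothetical toy numbers invented for illustration, not taken from any cited source.

```python
import itertools
import numpy as np

# Toy DINA example: 4 items, 2 attributes (all values hypothetical).
Q = np.array([[1, 0],          # Q-matrix: attributes required by each item
              [0, 1],
              [1, 1],
              [1, 0]])
slip = np.array([0.10, 0.15, 0.20, 0.10])   # P(wrong | all required attributes mastered)
guess = np.array([0.20, 0.25, 0.15, 0.20])  # P(right | some required attribute missing)
responses = np.array([1, 0, 1, 1])          # one respondent's scored item responses

# Enumerate all 2^K binary attribute profiles.
profiles = np.array(list(itertools.product([0, 1], repeat=Q.shape[1])))

def p_correct(profile):
    """DINA success probabilities on all items for one attribute profile."""
    eta = np.all(profile >= Q, axis=1)      # 1 if profile masters every required attribute
    return np.where(eta, 1 - slip, guess)

# Likelihood of the observed responses under each profile (local independence).
pc = np.array([p_correct(a) for a in profiles])
likelihood = np.prod(np.where(responses == 1, pc, 1 - pc), axis=1)

# Posterior over profiles, assuming a uniform prior for simplicity.
posterior = likelihood / likelihood.sum()
for a, p in zip(profiles, posterior):
    print(f"profile {a}: posterior {p:.3f}")
```

The respondent can then be reported as the maximum a posteriori profile, or each attribute can be reported via its marginal posterior probability of mastery; this is the discrete, profile-level evidence that the second group of approaches above produces.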
20 - Psychometric Tools for Diagnostic Assessments
- New frontiers of educational measurement
  1. Educational data mining for simulation-/games-based assessment (e.g., Rupp et al., 2010; Soller & Stevens, 2007; West et al., 2009)
  2. Diagnostic multiple-choice items / selected-response items (e.g., Briggs et al., 2006; de la Torre, 2009)
  3. Computerized diagnostic adaptive assessment (e.g., Cheng, 2009; McGlohen & Chang, 2008; see the sketch after this list)
- Useful ideas from large-scale assessment
  1. Modeling dependencies in nested response data (e.g., Jiao, von Davier, & Wang, 2010; Wainer, Bradlow, & Wang, 2007)
  2. Item families / task variants and automatic test / form assembly (e.g., Embretson & Daniel, 2008; Geerlings, Glas, & van der Linden, in press)
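As a hedged illustration of the computerized diagnostic adaptive assessment frontier above: CD-CAT systems select each next item to be maximally informative about the examinee's attribute profile. The sketch below uses one common heuristic, minimizing the expected Shannon entropy of the profile posterior; it is a generic sketch, not the specific procedures of Cheng (2009) or McGlohen and Chang (2008), and it reuses np, posterior, and pc from the toy DINA example on the previous slide.

```python
def expected_posterior_entropy(item, posterior, pc):
    """Expected entropy of the profile posterior after administering `item`.

    posterior: current probabilities over attribute profiles
    pc[g, j]:  P(correct on item j | profile g) under the toy DINA model
    """
    entropy = 0.0
    for x in (0, 1):                           # possible responses to the item
        p_x = pc[:, item] if x == 1 else 1 - pc[:, item]
        joint = posterior * p_x                # P(profile, response = x)
        marginal = joint.sum()                 # P(response = x)
        if marginal > 0:
            post_x = joint / marginal          # posterior updated for response x
            entropy -= marginal * np.sum(post_x * np.log(post_x + 1e-12))
    return entropy

# Greedy CD-CAT step: administer the item with the smallest expected entropy.
remaining = [0, 1, 2, 3]                       # unadministered item indices (toy pool)
next_item = min(remaining, key=lambda j: expected_posterior_entropy(j, posterior, pc))
print("next item to administer:", next_item)
```

After the chosen item is administered, the posterior is updated with the observed response and the selection step repeats until a stopping rule (e.g., sufficiently low posterior entropy) is met.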
21 - Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective
André A. Rupp, EDMS Department, University of Maryland
1230-A Benjamin Building, College Park, MD 20742
Phone: (301) 405-3623
E-mail: ruppandr@umd.edu
22 - References (Part I)
- Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational assessment. New York: Springer.
- Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17, 191-204.
- Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85-118). Cambridge, UK: Cambridge University Press.
- Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33-63.
- Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.
- de Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
- de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163-183.
- Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem-solving items. Psychology Science Quarterly, 50, 328-344.
- Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large-scale assessments of student achievement. Educational Measurement: Issues and Practice, 28(3), 39-53.
- Geerlings, H., Glas, C. A. W., & van der Linden, W. (in press). Modeling rule-based item generation. Psychometrika.
23 - References (Part II)
- Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24, 417-444.
- Haberman, S., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 209-227.
- Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79-95.
- Jiao, H., von Davier, M., & Wang, S. (2010, April). Polytomous mixture Rasch testlet model. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
- Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Portsmouth, NH: Greenwood.
- Kline, R. (2010). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
- Leighton, J., & Gierl, M. (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge, UK: Cambridge University Press.
- McGlohen, M., & Chang, H.-H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavior Research Methods, 40, 808-821.
- Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
- Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15-48). Mahwah, NJ: Erlbaum.
24 - References (Part III)
- Nugent, R., Dean, N., & Ayers, B. (2010, July). Skill set profile clustering: The empty K-means algorithm with automatic specification of starting cluster centers. Paper presented at the International Educational Data Mining Conference, Pittsburgh, PA.
- Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
- Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.
- Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. Journal of Technology, Learning, and Assessment, 8(4). Available online at http://escholarship.bc.edu/jtla/vol8/4/
- Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39, 142-151.
- Stevens, R., Beal, C., & Sprang, M. (2009, August). Developing versatile automated assessments of scientific problem-solving. Paper presented at the NSF conference on games- and simulation-based assessment, Washington, DC.
- Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY: Routledge.
- Templin, J., & Henson, R. (2009, April). Practical issues in using diagnostic estimates: Measuring the reliability and validity of diagnostic estimates. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
- Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
- West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., et al. (2009, June). A Bayes net approach to modeling learning progressions and task performances. Paper presented at the Learning Progressions in Science conference, Iowa City, IA.