Empirical Methods to Evaluate the Instructional Sensitivity of Accountability Tests

1
Empirical Methods to Evaluate the Instructional
Sensitivity of Accountability Tests
  • Stephen C. Court
  • Director of Testing
  • Wichita Public Schools
  • Wichita, Kansas (USA)
  • Presented at
  • Association for Educational Assessment - Europe
  • 10th Annual Conference
  • Innovation in Assessment to meet changing needs
  • 5 - 7 November 2009
  • Valletta, Malta

2
Definition
  • Instructional sensitivity
  • The degree to which students' performances on a test accurately reflect the quality of instruction specifically provided to promote students' mastery of the knowledge and skills being assessed.
  • (Popham, 2007)

3
Instructional Insensitivity
  • Due to
  • socio-economic status
  • inherited academic aptitude
  • prior knowledge
  • misalignment of tested and taught content
  • flaws in test design or construction

4
Two Approaches
  • Judgmental strategies
  • Empirical studies

5
The basic question here today
  • What low-tech empirical methods can be employed
    to evaluate
  • the instructional sensitivity
  • of accountability tests?
  • low-tech = easy to compute, interpret, report, and explain

6
Design-wise
  • Generally speaking
  • Dependent variables
  • Item-level p-values
  • Item-level pass rates
  • Subscale or indicator mean scores
  • Subscale or indicator proficiency rates

7
Design-wise
  • Generally speaking
  • Independent variables
  • Different groups, same occasion
  • Same group, different occasions

8
Model 1: Different Groups
  • (2x2 classification table defining cells A, B, C, and D)
9
Index 1
  • (A + D) / N
  • the proportion of correct classification
  • Values range from 0 to 1
  • (from Completely Insensitive to Totally Sensitive)
  • In practice
  • .50 = chance
  • Values < .50 are worse than guessing (see the sketch below)
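A minimal computational sketch of Index 1, assuming the 2x2 table has already been reduced to four cell counts; the function name and example counts are illustrative, not from the presentation.

```python
def index1(a: int, b: int, c: int, d: int) -> float:
    """Index 1 = (A + D) / N, the proportion of correct classification.

    A and D are the cells treated as correct classifications in the
    presenter's 2x2 table; N is the total number of students.
    """
    n = a + b + c + d
    return (a + d) / n

# Illustrative counts only: e.g., 45 less-effectively instructed students
# who missed the item (A) and 40 effectively instructed students who
# answered it correctly (D), out of 100 students in total.
print(index1(a=45, b=5, c=10, d=40))  # 0.85 -> well above the .50 chance level
```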

10
Index 1 Equivalents
  • Index 1 is conceptually equivalent to
  • Mann-Whitney U
  • Wilcoxon statistic
  • Area Under the Curve (AUC) in Receiver Operating Characteristic (ROC) curve analysis (a numerical check follows below)
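A quick numerical check of the equivalence, using scipy's Mann-Whitney U on hypothetical item scores for two instruction groups (the data and group labels are made up for illustration); dividing U by n1 x n2 gives the AUC estimate.

```python
from scipy.stats import mannwhitneyu

# Hypothetical item scores (1 = correct, 0 = incorrect) for two groups.
best_taught = [1] * 18 + [0] * 2    # effectively instructed students
others = [1] * 5 + [0] * 15         # less-effectively instructed students

# The Mann-Whitney U statistic divided by n1 * n2 estimates the AUC,
# which is why U, the Wilcoxon statistic, and AUC appear together above.
# (Recent SciPy returns U for the first sample passed in.)
u, p = mannwhitneyu(best_taught, others, alternative="two-sided")
auc = u / (len(best_taught) * len(others))
print(f"AUC = {auc:.3f}, p = {p:.4f}")  # about 0.825 for these made-up scores
```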

11
AUC Totally Sensitive
  • (50 + 50)/100 = 1.0

12
AUC Totally Sensitive
  • (90 + 10)/100 = 1.0

13
AUC Totally Insensitive
  • (0 + 0)/100 = 0.0

14
AUC Useless
  • (25 + 25)/100 = 0.50 (these four AUC examples are reproduced in the sketch below)
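The four values on the preceding slides follow directly from the index1 sketch above; the B and C counts here are chosen only so that each table sums to N = 100.

```python
# Reproducing the slide arithmetic with the index1 sketch defined earlier.
print(index1(a=50, b=0,  c=0,  d=50))   # 1.0  -> totally sensitive
print(index1(a=90, b=0,  c=0,  d=10))   # 1.0  -> totally sensitive
print(index1(a=0,  b=50, c=50, d=0))    # 0.0  -> totally insensitive
print(index1(a=25, b=25, c=25, d=25))   # 0.50 -> useless (pure chance)
```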

15
AUC Advantages
  • Decomposable into sensitivity and specificity
  • Sensitivity = D / (B + D)
  • Specificity = C / (A + C)
  • Easily graphed as
  • (Sensitivity) versus (1 - Specificity) (plotting sketch below)
  • Readily expandable to polytomous situations
  • Multiple test items in a subscale
  • Multiple subscales in a test
  • Multiple groups being tested
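A minimal matplotlib sketch of the graph described above, assuming (purely for illustration, since the transcript does not show the 2x2 table) that sensitivity is the proportion of best-taught students answering the item correctly and specificity is the proportion of other students answering it incorrectly.

```python
import matplotlib.pyplot as plt

# Hypothetical proportions for a single test item (illustrative only).
sensitivity = 0.90   # best-taught students who answered correctly
specificity = 0.75   # other students who answered incorrectly

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], linestyle="--", label="chance (AUC = .50)")
ax.plot([0, 1 - specificity, 1], [0, sensitivity, 1], marker="o",
        label="item ROC (one operating point)")
ax.set_xlabel("1 - Specificity")
ax.set_ylabel("Sensitivity")
ax.legend()
plt.show()
```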

16
AUC Interpretation - Informal
  • Easily interpreted
  • .90 - 1.0 = excellent (A)
  • .80 - .90 = good (B)
  • .70 - .80 = fair (C)
  • .60 - .70 = poor (D)
  • .50 - .60 = fail (F)
  • Note: These are general guidelines; exact interpretation may vary with context (a small grading helper is sketched below)
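A small helper that applies the letter-grade scale above; boundary handling (e.g., whether .80 counts as good or fair) is not specified on the slide, so inclusive lower bounds are an assumption here.

```python
def auc_grade(auc: float) -> str:
    """Map an AUC value to the informal letter-grade scale from the slide."""
    if auc >= 0.90:
        return "A (excellent)"
    if auc >= 0.80:
        return "B (good)"
    if auc >= 0.70:
        return "C (fair)"
    if auc >= 0.60:
        return "D (poor)"
    return "F (fail)"

print(auc_grade(0.825))  # B (good)
```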

17
AUC Interpretation - Formal
  • Hypothesis testing
  • Most statistical software packages - e.g., SAS, SPSS - include a ROC procedure.
  • The area under the curve table displays
  • estimates of the area,
  • standard error of the area,
  • confidence limits for the area,
  • and the p-value of a hypothesis test (a rough Python analogue is sketched below).
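For readers without SAS or SPSS, the same four quantities can be approximated in Python; the sketch below uses the Hanley-McNeil (1982) standard-error formula with a normal approximation against the null value of .50, which is one common approach rather than the exact procedure any particular package implements.

```python
import math
from scipy.stats import mannwhitneyu, norm

def roc_auc_table(group1, group2):
    """Rough analogue of a software ROC table: AUC, SE, 95% CI, and the
    p-value for H0: true AUC = .50 (Hanley-McNeil approximation)."""
    n1, n2 = len(group1), len(group2)
    # Recent SciPy returns U for the first sample, so u / (n1 * n2) is the
    # AUC with group1 treated as the higher-scoring (best-taught) group.
    u, _ = mannwhitneyu(group1, group2, alternative="two-sided")
    auc = u / (n1 * n2)
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    se = math.sqrt((auc * (1 - auc) + (n1 - 1) * (q1 - auc**2)
                    + (n2 - 1) * (q2 - auc**2)) / (n1 * n2))
    z = (auc - 0.5) / se
    p = 2 * norm.sf(abs(z))
    return auc, se, (auc - 1.96 * se, auc + 1.96 * se), p

# Illustrative scores for best-taught and other students (made-up data).
auc, se, ci, p = roc_auc_table([1] * 18 + [0] * 2, [1] * 5 + [0] * 15)
print(f"AUC = {auc:.3f}, SE = {se:.3f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.4f}")
```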

18
Area Under Curve (AUC) - Graphed
  • Curve 1 = .50 → pure chance, no better than a random guess
  • Curve 3 is better than Curve 2
  • Curve 4 = 1.0 → Totally Sensitive → completely accurate classification of effectively and less-effectively instructed students

19
ROC Curve Interpretation
  • Greater AUC values indicate greater separation
    between distributions
  • e.g., Most effective versus less effective
  • 1.0 = complete separation, that is, total sensitivity

20
ROC Curve Interpretation
  • AUC values close to .50 indicate no separation
    between distributions.
  • AUC = .50 indicates
  • Complete overlap
  • No difference
  • Might as well guess

21
ROC Hypothesis Test
  • The null hypothesis: true AUC = .50 → Is the item or indicator more sensitive than a random guess?
  • The ROC Curve Analysis supports formal empirical
    inquiry into instructional sensitivity.
  • The A, B, C, D, F interpretation can be
    understood by even the most statistics-phobic
    teacher, policy-maker, or news reporter.

22
Model 2: Same Group, Different Occasions
  • When two sets of outcomes are available for a
    group of students, the 2x2 table supports a
    different conceptualization

23
Model 2: Essential Statistic
  • Essential statistic: B/(A + B), the ratio of true learners to potential learners (see the sketch below).
  • In the context of evaluating instructional sensitivity, the lower row (Cell C and Cell D) is best ignored as less than a clear-cut example of true and potential learning.
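A minimal sketch of the Model 2 ratio; because the transcript does not show the 2x2 table, it assumes the top row holds students who had not yet passed on the first occasion, with Cell A failing again and Cell B passing on the second occasion.

```python
def true_to_potential_ratio(a: int, b: int) -> float:
    """Model 2 essential statistic: B / (A + B).

    A + B = potential learners (not passing at pretest); B = true learners
    (passing at posttest). Cells C and D are ignored, as the slide advises.
    """
    return b / (a + b)

# Illustrative counts only: of 60 students who had not passed before
# instruction, 42 passed afterward.
print(true_to_potential_ratio(a=18, b=42))  # 0.70
```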

24
Model 2 → Model 1
  • Use only the top row of both groups, and the
    model reverts to the original Model 1 (Different
    Groups) configuration

25
Model 2 → Model 1
  • except, the Ns are smaller.
  • Fortunately, the strength of the
    true-to-potential learning ratio helps to offset
    the loss of power.

26
Model 2 → Model 1
  • Index 1 is again applicable: (A + D)/N (see the continuation sketch below)
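Continuing the earlier sketches, and again assuming a particular cell layout for illustration only: keep each group's top-row students (those who had not passed at pretest), classify them by posttest outcome, and apply index1 to the smaller 2x2 table.

```python
# Top-row (not-passing-at-pretest) students from each group; the counts
# and the assignment of cells A-D are illustrative assumptions.
best_taught_row = {"pass_post": 42, "not_pass_post": 18}
other_row = {"pass_post": 20, "not_pass_post": 40}

a = other_row["not_pass_post"]        # others who still did not pass
b = other_row["pass_post"]
c = best_taught_row["not_pass_post"]
d = best_taught_row["pass_post"]      # best-taught who now pass

print(index1(a, b, c, d))  # (40 + 42) / 120 ≈ 0.68, on a smaller N than Model 1
```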

27
Variations
  • The Different Groups Repeated Measures Model would support not only an index and ROC Curve analysis of instructional sensitivity,
  • but also more specific, drill-down comparisons.

28
Drill Down
  • For example, a variety of questions could be answered in conjunction with data involving school, teacher, and student characteristics, e.g.,
  • How do differences in teaching experience affect instructional sensitivity?
  • How do differences in instructional technique affect instructional sensitivity?
  • How do instructional sensitivity estimates vary across different races or socio-economic levels?
  • Similarly, drilling down might address psychometric issues, such as
  • To what degree does (A + D)/N contribute unique information beyond other indicators of quality (e.g., difficulty and discrimination)?
  • To what degree do estimates of instructional
    sensitivity remain stable over time?

29
Procedural Review
  • Step 1: Cross-tabulate not-pass/pass status with teacher identification of not-best-taught/best-taught indicators
  • Step 2: When possible, use pre- and post-outcomes to compare the ratio of true-to-potential learners to more purely evaluate sensitivity to instruction
  • Step 3 (Optional): Use logistic regression and propensity score matching to create randomly-equivalent groups - or as close as you can get (a sketch follows below)
  • Step 4: Use (A + D)/N or formal ROC Curve Analysis to evaluate instructional sensitivity at the smallest grain-size possible - at
  • individual items
  • indicators or subscales
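A minimal sketch of Step 3 under assumed column names; the data file, covariates, and one-to-one nearest-neighbor matching rule are illustrative choices, not the presenter's prescribed procedure.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Hypothetical data frame with a best_taught flag (1/0) and background
# covariates used to model selection into the best-taught group.
students = pd.read_csv("students.csv")                        # assumed file
covariates = ["prior_score", "ses_index", "attendance_rate"]  # assumed names

# Propensity score: estimated probability of being in the best-taught group.
model = LogisticRegression(max_iter=1000).fit(
    students[covariates], students["best_taught"])
students["pscore"] = model.predict_proba(students[covariates])[:, 1]

treated = students[students["best_taught"] == 1]
control = students[students["best_taught"] == 0]

# One-to-one nearest-neighbor matching on the propensity score
# (with replacement), approximating randomly-equivalent groups.
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

# The matched groups then feed Step 4: (A + D)/N or a formal ROC analysis.
print(len(treated), len(matched_control))
```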

30
In Closing
  • The assumption that accountability tests are sensitive to instruction rarely holds.
  • Inferences drawn from test scores about school quality and teaching effectiveness must therefore be validated.
  • The empirical approaches presented here may help in
  • determining if the assumption of instructional sensitivity indeed is warranted.
  • constructing accountability tests that are more sensitive to instruction.

31
Presenter's email address: scourt_at_usd259.net
  • Questions, comments, or suggestions are welcome