1
Annotating Students' Understanding of Science Concepts
  • Rodney D. Nielsen, Wayne Ward, James Martin, and
    Martha Palmer
  • Center for Computational Language and Education
    Research
  • University of Colorado, Boulder

2
Annotating Fine-Grained Entailments
  • Question: Kate said, "An object has to move to produce sound." Do you agree with her? Why or why not?
  • Reference answer: Agree. Vibrations are movements and vibrations produce sound.
  • Learner answer: I do not agree because a radio does not move to make sound.
  • The student agrees → Contradicted
  • Vibrations are movement → Unaddressed
  • Vibrations produce something → Different Argument
  • Something produces sound → Expressed

3
Recognizing Textual Entailment
  • Hypothesis: Agree. Vibrations are movements and vibrations produce sound.
  • Text: I do not agree because a radio does not move to make sound.
  • The student agrees → False
  • Vibrations are movement → Unknown
  • Vibrations produce something → Unknown
  • Something produces sound → True

4
Prior Work
  • Automated Tutors
  • Aleven et al., 2001; Graesser et al., 2001; Jordan et al., 2004; Koedinger et al., 1997; Makatchev et al., 2004; Peters et al., 2004; Pon-Barry et al., 2004; Roll et al., 2005; Rosé et al., 2003; VanLehn et al., 2005
  • Constructed Response Scoring
  • Callear et al., 2001; Leacock and Chodorow, 2003; Mitchell et al., 2002 & 2003; Pulman, 2005; Sukkarieh, 2003 & 2005
  • PASCAL RTE (Dagan, Glickman and Magnini, 2005)
  • Differences / Weaknesses
  • Coarse-grained entailment: yes/no, or a grade of 0-2 points
  • Question-specific systems
  • Hand-crafted dialog control, parsers, knowledge-based ontologies, logic representations, and/or rules
  • Require 100-500 example responses per question

5
Necessity of Finer-Grained Analysis
  • Imagine a tutor that knows only that there is some unspecified part of the reference answer that we are not sure the student understands
  • Reference Answer: A long string produces a low pitch.
  • Break the reference answer down into low-level facets derived from a dependency parse and thematic roles (sketched in code below)
  • NMod(string, long): The string is long.
  • Agent(produces, string): A string is producing something.
  • Product(produces, pitch): A pitch is being produced.
  • NMod(pitch, low): The pitch is low.
  • Assess whether an understanding of each facet is implied by the student's response
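
Since each facet is just a labeled relation between two words plus a gloss, it can be captured in a small data structure. The following is a minimal illustrative sketch, not the authors' code; the Facet type and its field names are assumptions.

```python
from typing import NamedTuple

class Facet(NamedTuple):
    """One low-level facet of a reference answer: a labeled
    dependency / thematic-role relation between two words."""
    relation: str   # e.g. "NMod", "Agent", "Product"
    governor: str   # head word
    dependent: str  # modifier or argument word
    gloss: str      # human-readable paraphrase

# Facets for "A long string produces a low pitch."
REFERENCE_FACETS = [
    Facet("NMod", "string", "long", "The string is long."),
    Facet("Agent", "produces", "string", "A string is producing something."),
    Facet("Product", "produces", "pitch", "A pitch is being produced."),
    Facet("NMod", "pitch", "low", "The pitch is low."),
]

for f in REFERENCE_FACETS:
    print(f"{f.relation}({f.governor}, {f.dependent}): {f.gloss}")
```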

6
Representing Fine-Grained Semantics
  • Assess the relationship between the student's answer and the reference answer facets at a finer grain
  • Reference Answer: A long string produces a low pitch.
  • NMod(string, long)
  • Agent(produces, string)
  • Product(produces, pitch)
  • NMod(pitch, low)

Per-facet labels for three learner answers, in the facet order above:
  • "A long string produces a pitch." → Expressed, Expressed, Expressed, Unaddressed (binary yes/no: Yes, Yes, Yes, No)
  • "It produces a loud pitch." → Assumed, Expressed, Expressed, Different Argument
  • "It produces a high pitch." → Assumed, Expressed, Expressed, Contra-Expressed
7
The Focus of This Effort
  • Low-level facets of the reference answer
  • A finer-grained relationship of the learner answer to those facets

8
The Corpus
Grade  Life Science                      Physical Science and Technology              Earth and Space Science  Scientific Reasoning and Technology
3-4    Human Body; Structure of Life     Magnetism and Electricity; Physics of Sound  Water; Earth Materials   Ideas and Inventions; Measurement
5-6    Food and Nutrition; Environments  Levers and Pulleys; Mixtures and Solutions   Solar Energy; Landforms  Models and Designs; Variables
  • Assessing Science Knowledge (ASK), a national assessment project for the Full Option Science System (FOSS)
  • Lawrence Hall of Science, UC Berkeley (NSF-funded)
  • 16 science teaching and learning modules, Grades 3-6
  • 287 constructed-response questions
  • 15,400 total student responses
  • 146,000 facet entailment annotations

9
Annotation Process
  • Step 1: FOSS/ASK reference answers were manually decomposed into constituent facets
  • Ref Answer: The string is tighter, so the pitch is higher.
  • Be(string, tighter): The string is tighter.
  • Be(pitch, higher): The pitch is higher.
  • Cause(X, Y): X is caused by Y.
  • Step 2: Learner answers are annotated to indicate whether and how each facet was addressed (see the sketch after this list)
  • Learner Answer: The string is tighter, so there is less tension so the pitch gets higher.
  • Be(string, tighter): The string is tighter. → Self-Contra
  • Be(pitch, higher): The pitch is higher. → Expressed
  • Cause(X, Y): X is caused by Y. → Expressed
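
To make the two-step process concrete, here is a minimal sketch of one annotation record, pairing each reference-answer facet with the label assigned for this learner answer. The in-memory structure is hypothetical, not the corpus's actual file format.

```python
# Hypothetical in-memory form of one annotated learner answer
# (Step 2); the corpus's actual storage format may differ.
reference_facets = [
    "Be(string, tighter)",  # The string is tighter.
    "Be(pitch, higher)",    # The pitch is higher.
    "Cause(X, Y)",          # X is caused by Y.
]

learner_answer = ("The string is tighter, so there is less tension "
                  "so the pitch gets higher.")

# One annotation label per reference-answer facet.
facet_labels = ["Self-Contra", "Expressed", "Expressed"]

for facet, label in zip(reference_facets, facet_labels):
    print(f"{facet:22s} -> {label}")
```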

10
Reference Answer Decomposition
  • Begin with a manual dependency parse of the
    reference answer

[Figure: manual dependency parse of "The brass ring would not stick to the nail because the ring is not iron." with arcs labeled vc, vmod, sbar, prd, nmod, sub, and pmod]
  • Then raise main verbs, remove unimportant dependencies, incorporate copulas, prepositions, and negation into the dependency labels, and use thematic role labels (a small sketch of this relabeling follows)

[Figure: the resulting facet graph over the same sentence, with arcs labeled nmod, theme_not, destination_to_not, be_not, and cause_because]
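
The relabeling step can be pictured as simple string manipulation on dependency labels. Below is a minimal sketch under stated assumptions: fold_modifiers is a hypothetical helper, not the authors' pipeline.

```python
from typing import Optional

def fold_modifiers(label: str, prep: Optional[str] = None,
                   negated: bool = False) -> str:
    """Fold a governing preposition and negation into a dependency
    label, e.g. ("destination", prep="to", negated=True)
    -> "destination_to_not"."""
    if prep:
        label = f"{label}_{prep}"
    if negated:
        label = f"{label}_not"
    return label

# Labels from the example sentence above:
print(fold_modifiers("theme", negated=True))                   # theme_not
print(fold_modifiers("destination", prep="to", negated=True))  # destination_to_not
print(fold_modifiers("be", negated=True))                      # be_not
print(fold_modifiers("cause", prep="because"))                 # cause_because
```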
11
Reference Answer Markup
  • Final facets for Ref Answer: The brass ring would not stick to the nail because the ring is not iron.
  • NMod(ring, brass): The ring is brass.
  • Theme_not(stick, ring): The ring does not stick.
  • Destination_to_not(stick, nail): Something does not stick to the nail.
  • Be_not(ring, iron): The ring is not iron.
  • Cause_because(stick, is): X is caused by Y (the ring's not sticking is caused by its not being iron).

12
Answer Annotation Labels
  • Assumed: facets that are assumed to be understood a priori based on the question
  • Expressed: any facet directly expressed or inferred by simple reasoning
  • Inferred: facets inferred by pragmatics or nontrivial logical reasoning
  • Contra-Expr: facets directly contradicted by negation, antonymous expressions, and their paraphrases
  • Contra-Infr: facets contradicted by pragmatics or complex reasoning
  • Self-Contra: facets that are both contradicted and implied (self-contradictions)
  • Diff-Arg: the core relation is expressed, but it has a different modifier or argument
  • Unaddressed: facets that are not addressed at all by the student's answer
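
As a compact reference, the eight labels can be written as an enumeration. This is a representational sketch only; the name FacetLabel is an assumption, not the corpus's terminology.

```python
from enum import Enum

class FacetLabel(Enum):
    """The eight fine-grained annotation labels defined above."""
    ASSUMED = "Assumed"          # understood a priori from the question
    EXPRESSED = "Expressed"      # directly expressed / simple inference
    INFERRED = "Inferred"        # pragmatics or nontrivial reasoning
    CONTRA_EXPR = "Contra-Expr"  # directly contradicted
    CONTRA_INFR = "Contra-Infr"  # contradicted via pragmatics / complex reasoning
    SELF_CONTRA = "Self-Contra"  # both contradicted and implied
    DIFF_ARG = "Diff-Arg"        # core relation with a different modifier/argument
    UNADDRESSED = "Unaddressed"  # not addressed at all
```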

13
Annotation: Expressed and Inferred
  • Question: Kate said, "An object has to move to produce sound." Do you agree with her? Why or why not?
  • Reference Answer: Agree. Vibrations are movements and vibrations produce sound.
  • Root(root, agree): student agrees → Expressed
  • Be(vibration, movement): vibration is movement → Inferred
  • Agent(produce, vibrations): vibrations produce something → Expressed
  • Patient(produce, sound): something produces sound → Expressed
  • Student Answer: "Yes because it has to vibrate to make sounds."

14
Annotation: Contradictions
  • Question: Darla tied one end of a string around a doorknob and held the other end in her hand. When she plucked the string (pulled it and let go quickly) she heard a sound. How would the pitch change if Darla pulled the string tighter?
  • Reference Answer: When the string is tighter, the pitch will be higher.
  • Be(string, tighter): The string is tighter. → Assumed
  • Be(pitch, higher): The pitch is higher. → Contra-Expr
  • Cause(X, Y): X is caused by Y. → Assumed
  • Student Answer: "it will be low the pitch change" [sic]

15
Annotation: Unaddressed
  • Question: Write a note to David to tell him why the pitch gets higher rather than lower.
  • Ref Ans: The string is tighter, so the pitch is higher. The string between the cup and table is not longer.
  • Be_not(string, longer): The string is not longer. → Unaddressed
  • Student Answer: "David pitch is not happening tension is happening okay so calm down." [sic]

16
Labels
  • (The eight annotation label definitions from slide 12 were repeated here.)

17
Inter-annotator Agreement
Label set   Fine-Grained   Tutor   Y/N
ITA (%)     78.4           86.2    88.0
Kappa       0.704          0.728   0.752
  • In most disagreements (57%), one annotator chose Unaddressed
  • 49% of disagreements were between Unaddressed and Understood
  • 35% of disagreements were between the labels implying understanding
  • Only 2.3% of disagreements were between Understood and Contradicted

Fine-Grained: all labels kept separate. Tutor: Expressed, Inferred, and Assumed combined; Contra-Expr and Contra-Infr combined; others kept separate. Y/N: Expressed, Inferred, and Assumed combined vs. everything else.
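
The kappa values above correct raw agreement for chance. Below is a minimal sketch assuming the standard two-annotator Cohen's kappa; the function is illustrative, not the authors' evaluation code.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two
    annotators who labeled the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[lbl] * counts_b[lbl] for lbl in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: two annotators, five facets.
a = ["Expressed", "Unaddressed", "Expressed", "Contra-Expr", "Assumed"]
b = ["Expressed", "Expressed",   "Expressed", "Contra-Expr", "Assumed"]
print(round(cohens_kappa(a, b), 3))  # 0.706
```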
18
Assessment Technology Overview
  • Start with hand-generated reference answer facets
  • Automatically parse the reference and learner answers and automatically extract their representations
  • Generate machine learning feature vectors indicative of the student's understanding of each facet
  • Features are drawn from the answers, their parses, the relations between these, and corpus co-occurrence statistics
  • Train a machine learning classifier on the training-set feature vectors
  • Use the classifier to assess the test-set answers, assigning one of five Tutor-Labels to each reference answer facet (a pipeline sketch follows this list)
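
A minimal end-to-end sketch of this pipeline is below, under stated assumptions: scikit-learn's DecisionTreeClassifier stands in for the C4.5 learner reported in the results, and extract_features is a toy lexical placeholder, not the paper's actual feature set.

```python
from sklearn.tree import DecisionTreeClassifier

def extract_features(learner_answer: str, facet: str) -> list:
    """Toy features for one (learner answer, reference facet) pair;
    real features would include parse relations and corpus
    co-occurrence statistics."""
    answer_toks = set(learner_answer.lower().split())
    facet_toks = set(facet.lower().replace("(", " ").replace(")", " ")
                     .replace(",", " ").split())
    return [len(answer_toks & facet_toks), len(answer_toks), len(facet_toks)]

# Tiny hypothetical training set: (answer, facet, Tutor-Label) triples.
train = [
    ("yes because it has to vibrate to make sounds",
     "Agent(produce, vibrations)", "Understood"),
    ("it will be low the pitch change", "Be(pitch, higher)", "Contradicted"),
    ("okay so calm down", "Be_not(string, longer)", "Unaddressed"),
]
X = [extract_features(ans, facet) for ans, facet, _ in train]
y = [label for _, _, label in train]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Predict a Tutor-Label for a new (answer, facet) pair.
print(clf.predict([extract_features("the string vibrates",
                                    "Agent(produce, vibrations)")]))
```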

19
Results (C4.5 decision tree)
Test set                Non-Assumed facets   Majority class   Lexical baseline   All features   Reduced training
Training set (10x CV)   54,967               54.6             59.7               77.1           -
Unseen answers          30,514               51.1             56.1               75.5           -
Unseen questions        6,699                58.4             63.4               61.7           66.5
Unseen modules          3,159                53.4             62.9               61.4           68.8
(Accuracy in %.)
  • Results on Tutor-Labels (unseen answers, questions, and modules, respectively) are:
  • 24.4, 8.1, and 15.4 points over the most-frequent-class baseline
  • 19.4, 3.1, and 5.9 points over the lexical baseline
  • (All Unseen Modules facets were adjudicated; about half of the other modules' facets were adjudicated)

20
Conclusions
  • A new assessment paradigm to enable more effective tutoring dialog management
  • The facet breakdown enables the tutor to provide feedback targeted specifically at the appropriate part of the reference answer
  • The additional labels facilitate understanding the type of mismatch between the reference answer/hypothesis and the student's answer/text

21
Conclusions
  • Corpus of annotated answers
  • Substantial agreement: 86.2% on Tutor-Labels, 0.728 Kappa
  • About 146K facet annotations
  • The only corpus of fine-grained inference information
  • Freely available
  • Will support alternative approaches to the Recognizing Textual Entailment task

22
Conclusions
  • Answer Assessment System
  • Evaluated according to the new paradigm
  • Within-domain performance: 24.4 points over the majority-class baseline
  • Out-of-domain performance: 15.4 points over the majority-class baseline
  • First system to address out-of-domain assessment
  • First successful assessment of Grade 3-6 constructed responses

23
Thanks!
  • This work was partially funded by Award Numbers NSF 0551723, IES R305B070434, and NSF DRL-0733323.