Inferring Conceptual Knowledge from Unstructured Student Writing

1
Inferring Conceptual Knowledge from Unstructured
Student Writing
Workshop: Personalizing Education with Machine Learning
Neural Information Processing Systems (NIPS) Conference
Lake Tahoe, CA, 8 December 2012
  • Norma C. Ming
  • Vivienne L. Ming
2
The role of assessment in instruction
  • Reveals what students already know and what they
    need to learn
  • Provides feedback to students and teachers on
    success of learning and instruction
  • Timely and specific feedback can guide continued
    instruction (formative assessment)

Graphic from http://www.cmu.edu/teaching/assessment/basics/alignment.html
3
Challenges with assessment
  • Large-scale assessment
    • Heavy on summative assessment
      • Standardized tests, academic analytics systems
    • Emphasize performance, not conceptual
      understanding
    • Delayed, coarse-grained feedback
    • Intrusive
      • Interrupt class to administer test
      • Modify instruction to adopt others' materials
  • Alternatives
    • Teachers may lack training in designing and
      interpreting other kinds of assessment
    • Difficult to aggregate, calibrate

Printable sign available at http://www.pickens.k12.ga.us/assessment.html
4
Our goals
  • Use continuous, passive assessment to elucidate
    conceptual knowledge.
    • Wealth of unstructured data
    • Informal
    • Build on teachers' existing instruction
  • Align with formal assessment, e.g.,
    • course grades
    • standardized tests
    • instructor qualitative assessment

5
Research questions
  1. Can topic models of unstructured student writing
    predict course outcomes?
  2. How does the accuracy of these predictions change
    over time as more student work is analyzed?
  3. What does learning the topic hierarchy add beyond
    conventional topic modeling in improving these
    predictions?

6
Dataset and Methods
  • Online discussion forums
  • 5- or 6-week courses
  • 2 mandatory discussion questions per week
  • Introductory courses at a large, for-profit
    university

                                        Biology (undergraduate)   Economics (MBA)
Course length (wks)                     5                         6
Discussion question threads per class   10                        12
Classes                                 17                        45
Students (after filtering)              230                       970
Posts by students                       9118                      44345
7
Analytical approach
  • Outcome of interest: student conceptual
    understanding
  • Proxy outcome: student course grade
  • Compare possible data features
    • Baseline
      • Mean course grade
    • Individual student posting characteristics
      • Word count
    • Conventional semantic modeling
      • Probabilistic Latent Semantic Analysis (pLSA)
    • Feature of interest
      • Hierarchical Latent Dirichlet Allocation (hLDA)
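
For readers unfamiliar with pLSA, the conventional topic model named in the list above, the following is a minimal sketch of its standard EM updates on a document-by-word count matrix. It is an illustration only, not the presenters' implementation: the function name `plsa`, its parameters, and the toy counts are all assumptions made for this example.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. counts[d][w] is the count of word w in document d.

    Returns (p_z_d, p_w_z): per-document topic mixtures P(z|d) and
    per-topic word distributions P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total <= 0:  # guard: fall back to a uniform distribution
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialization of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                if total == 0:
                    continue
                for z in range(n_topics):
                    expected = n * post[z] / total
                    # M-step accumulators: expected counts per topic.
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Hypothetical toy corpus: 4 posts over a 4-word vocabulary.
toy_counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 6],
]
doc_topics, topic_words = plsa(toy_counts, n_topics=2)
```

Each row of `doc_topics` is one post's topic mixture; vectors of this kind are what the slides call topic coefficients. EM can settle in local optima, so a real analysis would normally compare several random seeds.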

8
Algorithms
  • Proof of concept
    • Logistic regression (LR) on the accumulated topic
      coefficients from each week
    • Other supervised algorithms (e.g., SVM) would likely
      perform better
    • LR chosen to focus on contribution from hLDA
  • Current work uses
    • HCRF (Hidden-state Conditional Random Fields)
      • Improved weekly predictions
      • Allows forward prediction in course time
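
The proof-of-concept step (logistic regression on topic coefficients accumulated week by week) might be sketched as follows. The slides do not say how coefficients were accumulated or how grades were encoded, so this sketch assumes a running sum of weekly topic vectors and a binary outcome; the function names, toy students, and labels are hypothetical.

```python
import math


def accumulate(weekly_topics, upto_week):
    """Sum one student's weekly topic vectors for weeks 0..upto_week inclusive.

    Assumption: "accumulated" is read here as a running sum.
    """
    n_topics = len(weekly_topics[0])
    return [sum(week[k] for week in weekly_topics[: upto_week + 1])
            for k in range(n_topics)]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Binary logistic regression fitted by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical data: 4 students, 3 weeks, 2 topic coefficients per week.
toy_students = [
    [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1]],
    [[0.7, 0.3], [0.8, 0.2], [0.6, 0.4]],
    [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]],
    [[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]],
]
toy_labels = [1, 1, 0, 0]  # assumed encoding: 1 = higher grade, 0 = lower

# Refit after each week to see how predictions change as data accumulates.
predictions_by_week = []
for week in range(3):
    X = [accumulate(s, week) for s in toy_students]
    w, b = fit_logistic(X, toy_labels)
    predictions_by_week.append([predict(w, b, x) for x in X])
```

Refitting after each week mirrors the second research question; on real data the predictions would be scored on held-out students rather than on the training set used here.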

9
Results: Biology course
  • Prediction accuracy
    • Word count > mean (for 3 wks)
    • pLSA >> word count
    • hLDA > pLSA
  • With more data collected over time
    • All predictions improve.

10
Results: Economics course
  • Prediction accuracy
    • Word count > mean (for 2 wks)
    • pLSA > word count
    • hLDA >> pLSA
  • With more data collected over time
    • All predictions improve.

11
Topic modeling can distinguish topics discussed
by final grades.
  • Each point represents posts by one student
  • Posts projected in 100-D pLSA concept space
  • Used locally linear embedding (LLE) to reduce to
    2-D

[Figure labels: "Cs and Ds neglect these topics"; "Increasing final grades"]
12
Comments by higher grade-earners reveal more
structure.
  • Each point represents one post, color-coded by
    grade
  • Ds and below cluster in the center
  • Higher grades move in specific directions toward
    periphery
  • Directions may correspond to course structure or
    instructors' guidance
  • Not just depth or specificity, but particular
    concepts

13
Structure corresponds to course topics.
  • Same points, color-coded by week
  • Different weeks on different branches
  • Low grades stay in center even when discussion
    topics invite more specific comments.

14
What does hierarchical modeling add?
  • Not all language is equal.
  • Conventional topic modeling treats all topics as
    equal (and independent).
  • Hierarchy implies ranking
    • Shallower: more frequent and generic language
    • Deeper: more infrequent and technical language

15
Examining hLDA results (Econ)
  • Posts from students earning higher grades
    correlated with
    • Higher mean depth in hLDA
      • C grades: most language at shallowest level
      • A, B grades: more language at deeper levels
    • More technically proficient language use
      • General language: more anecdotal comments
      • Specific language: greater conceptual depth
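
The "mean depth" comparison above could be computed along these lines, assuming each word token in a post has already been assigned a level in the hLDA tree (0 = root, the most generic level). That input format, the function names, and the toy posts are assumptions for illustration; the slides do not describe how depth was extracted.

```python
from collections import defaultdict
from statistics import mean


def mean_depth(levels):
    """Mean tree level of the word tokens in one post (0 = most generic)."""
    return mean(levels)


def mean_depth_by_grade(posts):
    """posts: iterable of (grade, levels) pairs.

    Returns {grade: mean of the per-post mean depths}.
    """
    by_grade = defaultdict(list)
    for grade, levels in posts:
        if levels:  # skip posts with no assigned tokens
            by_grade[grade].append(mean_depth(levels))
    return {grade: mean(depths) for grade, depths in by_grade.items()}


# Hypothetical per-token level assignments for four posts.
toy_posts = [
    ("A", [0, 1, 2, 2]),
    ("A", [0, 2, 2, 1]),
    ("C", [0, 0, 0, 1]),
    ("C", [0, 0, 1, 0]),
]
depth_by_grade = mean_depth_by_grade(toy_posts)
```

Averaging per post before averaging per grade keeps long posts from dominating; pooling all tokens by grade would be an equally defensible choice.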

16
Summary of results
  • Can topic models of unstructured student writing
    predict course outcomes?
    • Yes: pLSA and hLDA both do better than chance (and
      better than post length).
  • How does the accuracy of these predictions change
    over time as more student work is analyzed?
    • Extra weeks of data improve predictions.
    • By end of course, pLSA predictions are within one
      letter grade.
  • What does learning the topic hierarchy add beyond
    conventional topic modeling in improving these
    predictions?
    • hLDA > pLSA
    • Higher grades associated with discussion of
      deeper topics in hLDA.
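
One way to make "within one letter grade" concrete is to count the share of predictions that land at most one step from the actual grade. The slides do not define the measure, so the sketch below assumes plain letter grades (no plus or minus) on an F to A scale; the grade ordering, function name, and example values are hypothetical.

```python
GRADE_ORDER = ["F", "D", "C", "B", "A"]  # assumed scale, lowest to highest


def within_one_letter(predicted, actual):
    """Fraction of predictions at most one letter grade from the actual grade."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal length")
    rank = {grade: i for i, grade in enumerate(GRADE_ORDER)}
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(rank[p] - rank[a]) <= 1)
    return hits / len(actual)


# Hypothetical predictions versus actual final grades.
share = within_one_letter(["A", "B", "C", "F"], ["B", "B", "A", "D"])
```

Here three of the four toy predictions fall within one letter grade, so `share` is 0.75.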

17
Conclusions and Future Work
  • There is some collection of topics associated
    with higher grades (and some other collection of
    topics associated with lower grades).
  • Deeper topics associated with low/high grades
    could potentially differ; analysis yet to be
    done.
    • e.g., deep misconceptions such as inheriting
      acquired traits (Lamarckian evolution)
  • Next steps
    • Create topic map
      • Hierarchical relationships
      • Normative sources (e.g., textbook, exemplary
        student work)
      • Labeled, non-normative sources (common
        misconceptions)

18
Implications
  • Extensions to other text data
    • Essays, short-answer test questions
    • Online tutoring
    • Informal learning environments (e.g., Quora,
      Evernote)
    • Annotations on e-texts
    • Wiki contributions
  • Language mediates learning; text is everywhere.
    Learn from it, improve it.