Title: AutoTutor: An Intelligent Tutoring System with Mixed Initiative Dialog
1. AutoTutor: An Intelligent Tutoring System with Mixed Initiative Dialog
- Art Graesser
- University of Memphis
- Department of Psychology
- Institute for Intelligent Systems
- Supported by grants from NSF, ONR, ARI, IDA, IES, the US Census Bureau, and CHI Systems
2. Interdisciplinary Approach
- Psychology
- Education
- Computational Linguistics
- Computer Science
3. Overview
- Brief comments on my research on question asking and answering
- Primary focus is on AutoTutor, a collaborative reasoning and question answering system
4. Overview of my Research on Questions
- Psychological models
  - Question asking (PREG; ONR, NSF, ARI)
  - Question answering (QUEST; ONR)
- Computer artifacts
  - Tutors (AutoTutor, Why/AutoTutor, Think Like a Commander; NSF, ONR, ARI, CHI Systems)
  - Survey question critiquer (QUAID; US Census, NSF)
  - Point Query software (PQ; ONR)
  - Query-based information retrieval (HURA Advisor; IDA)
5. AutoTutor
- Collaborative reasoning and question answering in tutorial dialog
6. (No transcript)
7. (No transcript)
8. Think Like a Commander Vignettes
- 1. Trouble in McLouth
- 2. Save the Shrine
- 3. The Recon Fight
- 4. A Shift in Forces
- 5. The Attack Begins
- 6. The Bigger Picture
- 7. Looking Deep
- 8. Before the Attack
- 9. Meanwhile Back at the Ranch
9. Themes
- Keep Focus on Mission and Higher's Intent
- Model a Thinking Enemy
- Consider Effects of Terrain
- Use All Assets Available
- Consider Timing
- See the Bigger Picture
- Visualize the Battlefield
  - Accurately: Realistic Space-Time Forecast
  - Dynamically: Entities Change Over Time
  - Proactively: What Can I Make the Enemy Do
- Consider Contingencies and Remain Flexible
10. What does AutoTutor do?
- Asks questions and presents problems
  - Why? How? What-if? What is the difference?
- Evaluates the meaning and correctness of the learner's answers (LSA and computational linguistics)
- Gives feedback on answers
  - Face displays emotions and some gestures
- Hints
- Prompts for specific information
- Adds information that is missed
- Corrects some bugs and misconceptions
- Answers student questions
- Holds mixed-initiative dialog in natural language
11. Pedagogical Design Goals
- Simulate normal human tutors and ideal tutors
- Active construction of student knowledge rather than an information delivery system
- Collaborative answering of deep reasoning questions
- Approximate evaluation of student knowledge rather than detailed student modeling
- A discourse prosthesis
12. Feasibility of Natural Language Dialog in Tutoring
- Learners are forgiving when the tutor's dialog acts are imperfect.
- They are even more forgiving when the bar is set low during instructions.
- There are learning gains.
- Learning is not correlated with liking.
13. (No transcript)
14. Human Tutors
- Analyses of hundreds of hours of human tutoring
  - Research methods in college students
  - Basic algebra in 7th grade
  - Typical unskilled cross-age tutors
- Studies from the Memphis labs
  - Graesser and Person studies
- Studies from other labs
  - Chi, Evens, McArthur
15. Characteristics of students that we wish were better
- Student question asking
- Comprehension calibration
- Self-regulated learning, monitoring, and error correction
- Precise, symbolic articulation of knowledge
- Global integration of knowledge
- Distant anaphoric reference
- Analogical reasoning
- Application of principles to a practical problem
16. Pedagogical strategies not used by unskilled tutors
- Socratic method (Collins, Stevens)
- Modeling-scaffolding-fading (Rogoff)
- Reciprocal teaching (Brown, Palincsar)
- Anchored learning (Bransford, Vye, CTGV)
- Error diagnosis and repair (Anderson, VanLehn, Lesgold)
- Building on prerequisites (Gagne)
- Cascade techniques (VanLehn, Schank)
- Sophisticated motivational techniques (Lepper)
17. What can AutoTutor (and most human tutors) handle?
18. AutoTutor
- Language extraction
- Problem selection
- Speech act classifier
- Dialog management
- Latent Semantic Analysis
- Curriculum script
- Talking head with gestures
19. (No transcript)
20. Managing One AutoTutor Turn
- Short feedback on the student's previous turn
- Advance the dialog by one or more dialog moves that are connected by discourse markers
- End the turn with a signal that transfers the floor to the student
  - Question
  - Prompting hand gesture
  - Head/gaze signal
(A minimal sketch of this turn structure follows.)
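The routine below is only an illustration of the turn structure listed above, not AutoTutor's actual code: the feedback phrases, discourse markers, and floor-transfer signals are hypothetical placeholders standing in for material that the real system takes from its curriculum script.

```python
import random

# Hypothetical canned phrases; the real system draws these from its curriculum script.
FEEDBACK = {"positive": "Right!", "neutral": "Okay.", "negative": "Not quite."}
DISCOURSE_MARKERS = ["Also,", "Moreover,", "And"]
FLOOR_TRANSFER = {"question": "What do you think?",
                  "gesture": "<prompting hand gesture>",
                  "gaze": "<head/gaze signal>"}

def build_turn(feedback_quality, dialog_moves, transfer="question"):
    """Assemble one tutor turn: short feedback on the student's previous turn,
    one or more dialog moves linked by discourse markers, then a floor-transfer signal."""
    parts = [FEEDBACK[feedback_quality]]
    for i, move in enumerate(dialog_moves):
        parts.append(move if i == 0 else f"{random.choice(DISCOURSE_MARKERS)} {move}")
    parts.append(FLOOR_TRANSFER[transfer])
    return " ".join(parts)

print(build_turn("positive",
                 ["the pumpkin keeps the runner's horizontal velocity.",
                  "what happens to that velocity after release?"]))
```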
21. Dialog Moves
- Positive immediate feedback: "Yeah!" "Right!"
- Neutral immediate feedback: "Okay" "Uh huh"
- Negative immediate feedback: "No" "Not quite"
- Pump for more information: "What else?"
- Hint: "How does tossing the pumpkin affect horizontal velocity?"
- Prompt for specific information: "Vertical acceleration does not affect horizontal _______."
- Assert: "Vertical acceleration does not affect horizontal velocity."
- Correct: "Air resistance is negligible."
- Repeat: "So, once again, how does tossing the pumpkin affect horizontal velocity?"
- Summarize: "So, to recap, ..." (succinct summary)
22. Discourse Marker Relations
23. Sample dialog
- Tutor: This might help. If the runner speeds up, will the pumpkin land behind the runner, next to the runner, or in front of the runner?
- Student: The pumpkin will land behind the runner.
- Tutor: Right. What is the horizontal force applied to the pumpkin after it leaves the runner's hands?
- Student: There is no horizontal force applied to the pumpkin after it leaves the runner's hands.
- Tutor: Right.
- expectation(1): 0.836
- expectation(2): 0.851
- expectation(3): 0.584
- misconception(1): 0.371
- misconception(2): 0.605
24. Expectation-1: The pumpkin has the same horizontal velocity as the runner.
Expectation-2: Zero force is needed to keep an object going with constant velocity.
Expectation-3: Vertical forces on the pumpkin do not affect its horizontal velocity.
25. (No transcript)
26. Expectation-1: The pumpkin has the same horizontal velocity as the runner.
Expectation-2: Zero force is needed to keep an object going with constant velocity.
Expectation-3: Vertical forces on the pumpkin do not affect its horizontal velocity.
27. How does Why/AutoTutor select the next expectation?
- Don't select expectations that the student has already covered
  - cosine(student answers, expectation) > threshold
- Frontier learning, zone of proximal development
  - Select the highest sub-threshold expectation
- Coherence
  - Select the next expectation that has the highest overlap with a previously covered expectation
- Pivotal expectations
(A minimal sketch of this selection policy follows.)
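A minimal sketch of the frontier-learning variant of this policy, assuming the expectations and the student's accumulated answers are already represented as LSA vectors (plain NumPy arrays here) and a single coverage threshold. This is not the system's actual implementation, only an illustration of the cosine-and-threshold idea on this slide.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two LSA vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_next_expectation(expectation_vecs, student_answer_vec, threshold=0.70):
    """Return the index of the next expectation to cover, or None if all are covered.

    Expectations whose cosine with the student's answers exceeds the threshold
    count as covered; among the rest, the highest sub-threshold expectation is
    picked (frontier learning / zone of proximal development)."""
    scores = [cosine(e, student_answer_vec) for e in expectation_vecs]
    uncovered = [(score, i) for i, score in enumerate(scores) if score <= threshold]
    if not uncovered:
        return None          # everything is covered; move to the next problem
    _best_score, best_index = max(uncovered)
    return best_index
```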
28. How does AutoTutor know which dialog move to deliver?
- Dialog Advancer Network (DAN) for mixed-initiative dialog
- 15 fuzzy production rules, conditioned on:
  - Quality of the student's assertion(s) in the preceding turn
  - Student ability level
  - Topic coverage
  - Student verbosity (initiative)
- Hint-Prompt-Assertion cycles for expected good answers
(A hedged sketch of what one such rule might look like appears below.)
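The 15 fuzzy production rules themselves are not given in this deck, so the sketch below only shows the general flavor: invented triangular membership functions over the four inputs named above, and a few hypothetical rules that score candidate dialog moves. None of the specific rules or cutoffs should be read as AutoTutor's actual rule base.

```python
def fuzzy_low(x):
    """Triangular membership for 'low' on a [0, 1] scale (illustrative only)."""
    return max(0.0, min(1.0, (0.5 - x) / 0.5))

def fuzzy_high(x):
    """Triangular membership for 'high' on a [0, 1] scale (illustrative only)."""
    return max(0.0, min(1.0, (x - 0.5) / 0.5))

def choose_dialog_move(answer_quality, ability, coverage, verbosity):
    """Score candidate dialog moves with hand-made fuzzy rules and return the winner.
    Inputs are in [0, 1]; these rules are hypothetical, not AutoTutor's actual 15."""
    scores = {
        # Low-quality answer and low topic coverage -> give a hint
        "hint":   min(fuzzy_low(answer_quality), fuzzy_low(coverage)),
        # Low verbosity or low ability -> prompt for a specific piece of information
        "prompt": max(fuzzy_low(verbosity), fuzzy_low(ability)),
        # High-quality answer -> pump for more ("What else?")
        "pump":   fuzzy_high(answer_quality),
        # High coverage -> assert whatever is still missing and wrap up
        "assert": fuzzy_high(coverage),
    }
    return max(scores, key=scores.get)

print(choose_dialog_move(answer_quality=0.2, ability=0.6, coverage=0.3, verbosity=0.8))  # -> hint
```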
29. Dialog Advancer Network
30. Hint-Prompt-Assertion Cycles to Cover Good Expectations
- The cycle fleshes out one expectation at a time: Hint, Prompt, Assertion; Hint, Prompt, Assertion; ...
- Exit the cycle when cos(S, E) > T, where
  - S = student input
  - E = expectation
  - T = threshold
(A minimal sketch of this cycle follows.)
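A minimal sketch of one Hint-Prompt-Assertion cycle for a single expectation, under stated assumptions: `embed` maps text to an LSA vector, `get_student_input` collects the student's reply, and `moves` is the ordered list of hint, prompt, and assertion texts for this expectation. The exit test is the cos(S, E) > T criterion on this slide.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cover_expectation(expectation_vec, moves, embed, get_student_input, threshold=0.70):
    """Run one Hint-Prompt-Assertion cycle for a single expectation.

    `moves` is an ordered list of (move_type, text) pairs, e.g.
    [("hint", ...), ("prompt", ...), ("assert", ...)].  Exit as soon as the
    accumulated student input S satisfies cos(S, E) > threshold; if the cycle
    reaches the assertion, the tutor supplies the expectation itself."""
    accumulated = ""
    for move_type, text in moves:
        print(f"TUTOR ({move_type}): {text}")
        if move_type == "assert":
            return True                                   # covered by the tutor
        accumulated += " " + get_student_input()
        if cosine(embed(accumulated), expectation_vec) > threshold:
            return True                                   # covered by the student
    return False
```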
31. Who is delivering the answer?
- STUDENT PROVIDES INFORMATION
- Pump
- Hint
- Prompt
- Assertion
- TUTOR PROVIDES INFORMATION
32. Correlations between dialog moves and student ability
33. Question Taxonomy
- QUESTION CATEGORY: GENERIC QUESTION FRAMES AND EXAMPLES
- 1. Verification: Is X true or false? Did an event occur? Does a state exist?
- 2. Disjunctive: Is X, Y, or Z the case?
- 3. Concept completion: Who? What? When? Where?
- 4. Feature specification: What qualitative properties does entity X have?
- 5. Quantification: What is the value of a quantitative variable? How much? How many?
- 6. Definition questions: What does X mean?
- 7. Example questions: What is an example or instance of a category?
- 8. Comparison: How is X similar to Y? How is X different from Y?
- 9. Interpretation: What concept/claim can be inferred from a static or active data pattern?
- 10. Causal antecedent: What state or event causally led to an event or state? Why did an event occur? Why does a state exist? How did an event occur? How did a state come to exist?
- 11. Causal consequence: What are the consequences of an event or state? What if X occurred? What if X did not occur?
- 12. Goal orientation: What are the motives or goals behind an agent's action? Why did an agent do some action?
- 13. Instrumental/procedural: What plan or instrument allows an agent to accomplish a goal?
34. Speech Act Classifier
- Assertions
- Questions (16 categories)
- Directives
- Metacognitive expressions ("I'm lost")
- Metacommunicative expressions ("Could you say that again?")
- Short responses
- 95% accuracy on tutee contributions
(A toy surface-cue sketch of such a classifier appears below.)
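The real classifier combines a syntactic parser, lexicons, surface cues, and frozen expressions (see the next slide); the sketch below is only a toy surface-cue version, with hypothetical cue lists, to illustrate the coarse categories named above.

```python
import re

# Hypothetical cue lists; the actual classifier uses a parser, lexicons, and frozen expressions.
METACOGNITIVE = {"i'm lost", "i don't understand", "i am confused"}
METACOMMUNICATIVE = {"could you say that again?", "what did you say?", "please repeat that"}
SHORT_RESPONSES = {"yes", "no", "ok", "okay", "right", "uh huh"}
WH_WORDS = ("who", "what", "when", "where", "why", "how", "which")

def classify_speech_act(utterance: str) -> str:
    """Classify a tutee contribution into one of the coarse speech-act classes."""
    u = utterance.strip().lower()
    if u in METACOGNITIVE:
        return "metacognitive expression"
    if u in METACOMMUNICATIVE:
        return "metacommunicative expression"
    if u in SHORT_RESPONSES:
        return "short response"
    if u.endswith("?") or u.startswith(WH_WORDS):
        return "question"      # a fuller system would then pick one of the 16 categories
    if re.match(r"^(please\s|show me|tell me|give me)", u):
        return "directive"
    return "assertion"

print(classify_speech_act("Why does the pumpkin keep moving forward?"))  # -> question
```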
35. A New Query-Based Information Retrieval System (Louwerse, Olney, Mathews, Marineau, Hite-Mitchell, & Graesser, 2003)
- Input speech act
- Syntactic parser, lexicons, surface cues, frozen expressions
- Classify the speech act (QUEST's 16 question categories, assertion, directive, other)
- Word particles of the question category
- Augment retrieval cues
- Input context (text and screen)
- Search documents via LSA
- Select the highest-matching document
(A minimal sketch of this pipeline follows.)
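The skeleton below only shows the ordering of the pipeline stages listed above; `embed`, `classify`, and `category_particles` stand in for the assumed LSA embedder, speech-act classifier, and category-to-cue-word table, none of which are specified in this deck.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, context, documents, embed, classify, category_particles):
    """Sketch of the query-based retrieval pipeline: classify the speech act,
    add word particles for the question category, fold in the on-screen context,
    score documents with LSA cosines, and return the best match."""
    category = classify(query)                              # e.g. "causal antecedent"
    cues = query + " " + " ".join(category_particles.get(category, []))
    cues += " " + context                                   # augment with text/screen context
    query_vec = embed(cues)
    scores = [cosine(query_vec, embed(doc)) for doc in documents]
    return documents[int(np.argmax(scores))]
```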
36. (No transcript)
37. Learning Gains (Effect Sizes)
- .42: Unskilled human tutors (Cohen, Kulik, & Kulik, 1982)
- .75: AutoTutor, 7 experiments (Graesser, Hu, Person)
- 1.00: Intelligent tutoring systems
  - PACT (Anderson, Corbett, Koedinger)
  - Andes, Atlas (VanLehn)
- 2.00 (?): Skilled human tutors
38. Learning Gains (Effect Sizes)
39. Spring 2002 Evaluations: Conceptual Physics (VanLehn & Graesser, 2002)
- Four conditions
  - Human tutors
  - Why/Atlas
  - Why/AutoTutor
  - Read control
- 86 college students
40. Measures in Spring Evaluation
- Multiple-choice test
  - Pretest and posttest (40 multiple-choice questions in each)
- Essays graded by 6 physics experts
  - 4 pretest and 4 posttest essays
  - Expectations versus misconceptions
  - Holistic grades
  - Generic principles and misconceptions (fine-grained)
- Learner perceptions
- Time on tasks
41. Effect Sizes on Learning Gains (pretest to posttest; no differences among tutoring conditions)
42. Fall 2002 Evaluations: Conceptual Physics (Graesser, Moreno, et al., 2003)
- Three tutoring conditions
  - Why/AutoTutor
  - Read textbook control
  - Read nothing
- 63 subjects
43. Multiple Choice Scores
44. 2002-3 Evaluations: Computer Literacy (Graesser, Hu, et al., 2003)
- 2 tutoring conditions
  - AutoTutor
  - Read nothing
- 4 media conditions
  - Print
  - Speech
  - Speech + Head
  - Speech + Head + Print
- 96 subjects
45. Deep Reasoning Questions
46. (No transcript)
47. Signal Detection Analyses
48. Recall, Precision, and F-measure
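These measures follow their standard definitions, with TP, FP, and FN denoting true positives, false positives, and false negatives:

```latex
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```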
49. What Expectations are LSA-worthy?
- Compute the correlation between
  - Experts' ratings of whether essay answers contain expectation E
  - The maximum LSA cosine between E and all possible combinations of sentences in the essay
- A high correlation means the expectation is LSA-worthy
(A minimal sketch of this computation follows.)
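A minimal sketch of the computation described above, assuming an `embed` function that maps text to an LSA vector; the sentence combinations are enumerated exhaustively (practical only for short essays) and the agreement with expert ratings is taken as a Pearson correlation.

```python
from itertools import combinations
import numpy as np
from scipy.stats import pearsonr

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_cosine(expectation_vec, essay_sentences, embed):
    """Maximum LSA cosine between expectation E and any combination of essay sentences."""
    best = 0.0
    for r in range(1, len(essay_sentences) + 1):
        for combo in combinations(essay_sentences, r):
            best = max(best, cosine(embed(" ".join(combo)), expectation_vec))
    return best

def lsa_worthiness(expert_ratings, essays, expectation_vec, embed):
    """Correlate expert ratings (did the essay contain E?) with the max LSA cosine.
    A high r means expectation E is 'LSA-worthy'."""
    lsa_scores = [max_cosine(expectation_vec, sentences, embed) for sentences in essays]
    r, _p = pearsonr(expert_ratings, lsa_scores)
    return r
```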
50. Expectations and Correlations (expert ratings, LSA)
- After the release, the only force on the balls is the force of the moon's gravity (r = .71)
- A larger object will experience a smaller acceleration for the same force (r = .12)
- Force equals mass times acceleration (r = .67)
- The boxes are in free fall (r = .21)
51. OTHER EMPIRICAL EVALUATIONS
52. Assessment of Dialogue Management
- Bystander Turing test
  - Participants rate whether particular dialog moves in conversations were generated by AutoTutor or by skilled human tutors.
53. ASL Model 501 Eye Tracker
54. Percentage of Time Allocated to Interface Components
- Question: 4%
- Talking head: 40%
- Display: 29%
- Answer: 7%
- Off (mainly keyboard): 20%
55. What Conversational Agents Facilitate Learning?
56. Correlation matrix for DVs
57. AutoTutor Collaborations
- University of Pittsburgh (VanLehn)
  - ONR; physics intelligent tutoring systems, Why2
- University of Illinois, Chicago (Wiley, Goldman)
  - NSF/ROLE; plate tectonics, eye tracking, critical stance
- Old Dominion and Northern Illinois University (McNamara, Magliano, Millis, Wiemer-Hastings)
  - IERI; science text comprehension
- MIT Media Lab
  - NSF/ROLE; Learning Companion, emotion sensors (Picard, Reilly)
  - BEAT: gesture, emotion, and speech generator (Cassell, Bickmore)
- CHI Systems (Zachary, Ryder)
  - Army SBIR; Think Like a Commander
- Institute for Defense Analyses (Fletcher, Toth, Foster)
  - ONR/OSD; Human Use Regulatory Affairs Advisor, research ethics, web site with agent
58. Collaboration with MIT Media Lab
- Affective Computing Lab
  - Frustration
  - Anger
  - Confusion
  - Eureka highs
  - Contemplation, flow experience
- Inferring emotions from sensors
  - Blue Eyes
  - Mouse-glove pressure and sweat
  - Butt sensor
- Dialog moves sensitive to emotions
59. Forthcoming AutoTutor developments
- Language and discourse enhancements
  - Weave in deeper semantic processing components
  - Natural language generation for prompts
- Improved animated conversational agent
- 3-D simulation for enhancing the articulation of explanations
- Improved authoring tools
- Evolution of the content of curriculum scripts through tutoring experience
60. (No transcript)
61. (No transcript)
62. The Long-term Vision
- Future human-computer interfaces will be conversational, just like people talking face to face.
- Avatars will tutor and mentor learners on the web: students, soldiers, citizens, customers, the elderly, special populations, low and high literacy, low and high motivation.
- Learning modules will be accessed and available throughout the globe: a 24-by-7 virtual university.
- Learning will be tailored to learners' abilities, talents, interests, motivation, and unique histories.
63. Proposed NSF Project will Augment AutoTutor
- AutoTutor enhanced with more problem solving and complex decision making, using Franklin's Intelligent Distribution Agent
- Courseware from MIT, Carnegie Mellon, Pittsburgh, Wisconsin, Illinois, the military, and corporations (Merlot, Concord Consortium)
- SCORM learning software standards established in the military
- University of Colorado speech recognition
- FedEx Institute as an ADL Co-Lab with DoD and the Bureau of Labor, for expansion to business and industry
64. (No transcript)
65. "AutoTutor, Atlas, and Why2 are perhaps the most sophisticated tutorial dialogue projects in the intelligent tutoring systems community" (James Lester, AI Magazine, 2001)
66. Are there properties of the expectations that correlate with LSA-worthiness?
- Number of words: .07
- Number of content words: .01
- Vector length of expectation: .05
- Number of glossary terms: -.03
- Number of infrequent words: .23
- Number of negations: -.29
- Number of relative terms, symbols, quantifiers, deictic expressions: .06
67. Challenges in the use of LSA in AutoTutor
- Widely acknowledged limitations of LSA
  - Negation
  - Word order
  - Structural composition
  - Size of description
- "I thought I said that already!" (coverage imperfection)
- Need for a larger corpus of misconceptions
  - Expectation d' of LSA with experts: .79
  - Misconception d' of LSA with experts: .57
- Coordinating LSA with symbolic systems
(A hedged sketch of a d' computation follows.)
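Assuming the d' values above are signal-detection d-prime scores (consistent with the signal detection analyses on slide 47), the sketch below shows the standard way such a score is computed from hit and false-alarm rates; the example rates are purely illustrative, not data from this deck.

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection d': z(hit rate) - z(false-alarm rate).
    Rates must lie strictly inside (0, 1) to avoid infinite z-scores."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# Illustrative example: an LSA threshold that detects 80% of expert-identified
# expectations while falsely flagging 25% of absent ones.
print(round(d_prime(0.80, 0.25), 2))   # ~1.52
```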