End-User Debugging of Machine Learning Systems

Provided by: OSU68

Learn more at: https://web.engr.oregonstate.edu

1
End-User Debugging of Machine Learning Systems

Weng-Keen Wong
Oregon State University
School of Electrical Engineering and Computer Science
http://www.eecs.oregonstate.edu/wong

2
Collaborators

Faculty
Margaret Burnett
Simone Stumpf
Tom Dietterich
Jon Herlocker

Grad Students
Erin Fitzhenry
Lida Li
Ian Oberst
Vidya Rajaram

Undergrads
Russell Drummond
Erin Sullivan

3
Papers

Stumpf, S., Rajaram, V., Li, L., Burnett, M.,
Dietterich, T., Sullivan, E., Drummond, R.,
Herlocker, J. (2007). Toward Harnessing User
Feedback For Machine Learning. In Proceedings of
IUI 2007.
Stumpf, S., Rajaram, V., Li, L., Wong, W.-K.,
Burnett, M., Dietterich, T., Sullivan, E.,
Herlocker, J. (2008). Interacting Meaningfully
with Machine Learning Systems: Three Experiments.
(Submitted to IJHCS)
Stumpf, S., Sullivan, E., Fitzhenry, E., Oberst,
I., Wong, W.-K., Burnett, M. (2008). Integrating
Rich User Feedback into Intelligent User
Interfaces. In Proceedings of IUI 2008.

4
Motivation
Date: Mon, 28 Apr 2008 23:59:00 (PST)
From: John Doe <john.doe@onid.orst.edu>
To: Weng-Keen Wong <wong@eecs.oregonstate.edu>
Subject: CS 162 Assignment

I can't get my Java assignment to work! It just won't compile and it prints out lots of error messages! Please help!

public class MyFrame extends JFrame {
    private AsciiFrameManager reader;
    private JPanel displayPanel;
    public MyFrame(String filename) throws Exception {
        reader = new AsciiFrameManager(filename);
        displayPanel = new JPanel();
        ...

Which folder: CS 162, John Doe, or Trash?

Machine learning tool adapts to end user
Similar situation in recommender systems, smart
desktops, etc.

5
Motivation
Date: Mon, 28 Apr 2008 23:51:00 (PST)
From: Bella Bose <bose@eecs.oregonstate.edu>
To: Weng-Keen Wong <wong@eecs.oregonstate.edu>
Subject: Teaching Assignments

I've compiled the teaching preferences for all the faculty. Here are the teaching assignments for next year:

Fall Quarter
CS 160 (Computer Science Orientation): Paul Paulson
CS 161 (Introduction to Programming I): Chris Wallace
CS 162 (Introduction to Programming II): Weng-Keen Wong
...

Trash

Machine learning systems are great when they work
correctly, aggravating when they don't
The end user is the only person at the computer
Can we let end users correct machine learning
systems?

6
Motivation

Learn to correct behavior quickly
Sparse data on start
Concept drift
Rich end-user knowledge
Effects of user feedback on accuracy?
Effects on users?

7
Overview
[Diagram: end user and machine learning algorithm, connected by end-user feedback and explanation]

8
Related Work

End user interaction
Active Learning (Cohn et al. 96, many others)
Constraints (Altendorf et al. 05, Huang and
Mitchell 06)
Ranks (Radlinski and Joachims 05)
Feature Selection (Raghavan et al. 06)
Crayons (Fails and Olsen 03)
Programming by Demonstration (Cypher 93, Lau and
Weld 99, Lieberman 01)

Explanation
Expert Systems (Swartout 83, Wick and Thompson
92)
TREPAN (Craven and Shavlik 95)
Description Logics (McGuinness 96)
Bayesian networks (LaCave and Diez 00)
Additive classifiers (Poulin et al. 06)
Others (Crawford et al. 02, Herlocker et al. 00)

9
Outline

What types of explanations do end users
understand? What types of corrective feedback
could end users provide? (IUI 2007)
How do we incorporate this feedback into an ML
algorithm? (IJHCS 2008)
What happens when we put this together? (IUI 2008)

10
What Types of Explanations do End Users
Understand?

Think-aloud study with 13 participants
Classify Enron emails
Explanation systems: rule-based, keyword-based,
similarity-based
Findings
Rule-based understood best, but not a clear winner
Evidence indicates multiple explanation paradigms
are needed

11
What types of corrective feedback could end users
provide?

Suggested corrective feedback in response to
explanations
Adjust importance of word
Add/remove word from consideration
Parse / extract text in a different way
Word combinations
Relationships between messages/people

12
Outline

What types of explanations do end users
understand? What types of corrective feedback
could end users provide? (IUI 2007)
How do we incorporate this feedback into an ML
algorithm? (IJHCS 2008)
What happens when we put this together? (IUI 2008)

13
Incorporating Feedback into ML Algorithms

Two approaches
Constraint-based
User co-training

14
Constraint-based approach

Constraints
If weight on word reduced or word removed, remove
the word as a feature
If weight of word increased, word assumed to be
important for that folder
If weight of word increased, word is a better
predictor for that folder than other words

Estimate parameters for Naive Bayes using MLE
with these constraints
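
A minimal sketch of the simplest of these constraints, assuming a word-count Naive Bayes model: words the user removed (or down-weighted) are dropped as features before the per-folder word probabilities are estimated by plain relative frequency. The function name `estimate_word_probs` and the toy data are hypothetical, and the two "weight increased" constraints are not modeled here.

```python
from collections import Counter


def estimate_word_probs(messages, folders, removed=frozenset()):
    """Relative-frequency estimate of P(word | folder), skipping removed words.

    messages: list of word lists; folders: the folder label of each message.
    removed:  words the user removed or down-weighted (hypothetical input).
    """
    counts = {}
    for words, folder in zip(messages, folders):
        kept = [w for w in words if w not in removed]
        counts.setdefault(folder, Counter()).update(kept)

    # Shared vocabulary after the removed words have been dropped.
    vocab = sorted({w for c in counts.values() for w in c})

    probs = {}
    for folder, c in counts.items():
        total = sum(c.values())
        probs[folder] = {w: (c[w] / total if total else 0.0) for w in vocab}
    return probs


if __name__ == "__main__":
    msgs = [["java", "the", "assignment"], ["sale", "the"]]
    labels = ["CS 162", "Trash"]
    print(estimate_word_probs(msgs, labels, removed={"the"}))
```

A real system would likely smooth these estimates; the unsmoothed form is used here only because the slide names MLE.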
15
Standard Co-training

Create classifiers C1 and C2 based on the two
independent feature sets.
Repeat i times
Add most confidently classified messages by any
classifier to training data
Rebuild C1 and C2 with the new training data
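
The loop above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: `TinyNB`, `build`, and `co_train` are hypothetical names, each message is assumed to arrive as a pair of word lists (one per feature set), and the classifier is a small add-one-smoothed Naive Bayes chosen only to keep the example self-contained.

```python
import math
from collections import Counter


class TinyNB:
    """Small multinomial Naive Bayes over word lists (add-one smoothing)."""

    def fit(self, docs, labels):
        self.prior = Counter(labels)
        self.counts = {y: Counter() for y in self.prior}
        for words, y in zip(docs, labels):
            self.counts[y].update(words)
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, words):
        """Return (most likely label, its posterior probability)."""
        n = sum(self.prior.values())
        logp = {}
        for y, c in self.counts.items():
            total = sum(c.values()) + len(self.vocab)
            logp[y] = math.log(self.prior[y] / n) + sum(
                math.log((c[w] + 1) / total) for w in words if w in self.vocab
            )
        top = max(logp.values())
        z = sum(math.exp(v - top) for v in logp.values())
        best = max(logp, key=logp.get)
        return best, math.exp(logp[best] - top) / z


def build(labeled):
    """Train one classifier per feature set from ((view1, view2), label) pairs."""
    labels = [y for _, y in labeled]
    c1 = TinyNB().fit([x[0] for x, _ in labeled], labels)
    c2 = TinyNB().fit([x[1] for x, _ in labeled], labels)
    return c1, c2


def co_train(labeled, unlabeled, iterations=2, per_round=1):
    labeled, pool = list(labeled), list(unlabeled)
    c1, c2 = build(labeled)
    for _ in range(iterations):
        # Every (classifier, message) prediction competes on confidence.
        guesses = []
        for x in pool:
            for clf, view in ((c1, x[0]), (c2, x[1])):
                label, conf = clf.predict(view)
                guesses.append((conf, label, x))
        guesses.sort(key=lambda g: g[0], reverse=True)
        added = 0
        for conf, label, x in guesses:
            if added == per_round:
                break
            if x in pool:  # skip a message already taken this round
                pool.remove(x)
                labeled.append((x, label))
                added += 1
        c1, c2 = build(labeled)  # rebuild both with the grown training data
    return c1, c2, labeled
```

Calling `co_train(seed, pool)` with a few labeled pairs and an unlabeled pool returns the two rebuilt classifiers and the enlarged training set.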

16
User Co-training

C_USER: Classifier based on user feedback
C_ML: Machine learning algorithm
For each session of user feedback
Add most confidently classified messages by
CUSER to training data
Rebuild CML with the new training data

17
User Co-training

C_USER: Classifier based on user feedback
C_ML: Machine learning algorithm
For each session of user feedback
Add most confidently classified messages by
C_USER to training data
Rebuild C_ML with the new training data

We'll expand the inner loop on the next slide
18
User Co-training

For each folder f, let vector v_f = words with
weights increased by the user
For each message m in the unlabeled set
  For each folder f,
    Compute Prob_f from the machine learning
    classifier
    Score_f = (# of words in v_f appearing in the
    message) × Prob_f
  Score_m = Score_fmax - Score_other
Sort Score_m for all messages in decreasing order
Select the top k messages to add to the training
set along with their folder label f_max
Rebuild C_ML with the new training data
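
A hedged sketch of this selection step follows. It assumes the per-folder score is the count of user-boosted words present in the message multiplied by the ML classifier's folder probability, and the message score is the margin between the top folder and the next-best folder. That reading of "Score_other" is an assumption, as are the names `select_confident`, `user_words`, and `ml_prob`.

```python
def select_confident(unlabeled, user_words, ml_prob, k):
    """Pick the k unlabeled messages the user classifier is most sure about.

    unlabeled:  list of messages, each a list of words
    user_words: dict folder -> set of words whose weight the user increased
    ml_prob:    callable(message) -> dict folder -> probability from the
                machine learning classifier
    """
    ranked = []
    for message in unlabeled:
        words = set(message)
        prob = ml_prob(message)
        # Per-folder score: boosted words present, times the ML probability.
        score = {
            f: len(v_f & words) * prob.get(f, 0.0)
            for f, v_f in user_words.items()
        }
        f_max = max(score, key=score.get)
        # Assumption: Score_other is the best score among the other folders.
        others = [s for f, s in score.items() if f != f_max]
        score_m = score[f_max] - (max(others) if others else 0.0)
        ranked.append((score_m, f_max, message))

    ranked.sort(key=lambda r: r[0], reverse=True)
    return [(message, f_max) for _, f_max, message in ranked[:k]]


if __name__ == "__main__":
    boosted = {"CS 162": {"assignment", "java"}, "Trash": {"sale"}}
    uniform = lambda message: {"CS 162": 0.5, "Trash": 0.5}
    inbox = [["java", "assignment", "help"], ["sale", "java"], ["big", "sale"]]
    for message, folder in select_confident(inbox, boosted, uniform, k=2):
        print(folder, message)
```

The returned pairs would then be appended to the training set before the machine learning classifier is rebuilt.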

19
Constraint-based vs User co-training

Constraint-based
Difficult to set hardness of constraint
Constraints often already satisfied
End-user can over-constrain the learning
algorithm
Slow
User co-training
Requires unlabeled emails in inbox
Better accuracy than constraint-based

20
Results
[Charts: feedback from keyword-based paradigm; feedback from similarity-based paradigm]
21
Outline

What types of explanations work for end users?
What types of corrective feedback could end users
provide? (IUI 2007)
How do we incorporate this feedback into an ML
algorithm? (IJHCS 2008)
What happens when we put this together? (IUI 2008)

22
Experiment: Email program
23
Experiment: Procedure

Intelligent email system to classify emails into
folders
43 English-speaking, non-CS students
Background questionnaire
Tutorial (email program and folders)
Experiment task on feedback set
Correct folders. Add, remove, change weight on
keywords.
30 interaction logs
Post-session questionnaire

24
Experiment: Data

Enron data set
9 folders
50 training messages
10 each for 5 folders with folder labels
50 feedback messages
For use in experiment
Same for each participant
1051 test messages
For evaluation after experiment

25
Experiment: Classification algorithm

User co-training
Two classifiers: User, Naïve Bayes
Slight modification on user classifier
Score_f = sum of weights in v_f appearing in the
message
Weights can be modified interactively by user
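
The modified user score can be illustrated with a short sketch, assuming each folder keeps a word-to-weight table that the user edits. The name `user_scores` and the example weights are hypothetical.

```python
def user_scores(message, folder_weights):
    """Per-folder score: the sum of the weights of that folder's keywords
    that appear in the message.

    message:        list of words
    folder_weights: dict folder -> dict word -> weight (edited by the user)
    """
    words = set(message)
    return {
        folder: sum(weight for word, weight in v_f.items() if word in words)
        for folder, v_f in folder_weights.items()
    }


if __name__ == "__main__":
    weights = {
        "CS 162": {"java": 2.0, "assignment": 1.5},
        "Trash": {"sale": 1.0},
    }
    email = ["my", "java", "assignment"]
    print(user_scores(email, weights))

    # An interactive change by the user is just an update to the table.
    weights["CS 162"]["java"] = 4.0
    print(user_scores(email, weights))
```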

26
Results: Accuracy improvements of rich feedback
[Chart: accuracy Δ over folder feedback, by subject]

Rich feedback: participant folder labels and
keyword changes
Folder feedback: participant folder labels

27
Results: Accuracy improvements of rich feedback
[Chart: accuracy Δ over baseline, by subject]

Rich feedback: participant folder labels and
keyword changes
Baseline: original Enron labels

28
Results: Accuracy improvements of rich feedback
[Chart: accuracy Δ, by subject]

Rich feedback: participant folder labels and
keyword changes
Baseline: original Enron labels
Folder feedback: participant folder labels

29
Results: Accuracy summary

60% of participants saw accuracy improvements,
some very substantial
Some dramatic decreases
More time between filing emails or more folder
assignments → higher accuracy

30
Interesting bits

Need to communicate the effects of the user's
corrective feedback
Unstable classifier period
With sparse training data, a single new training
example can dramatically change the classifier's
decision boundaries
Wild fluctuations in the classifier's predictions
frustrate end users
Causes a "wall of red"

31
Interesting bits: Unstable classifier period
Moved test emails into training set to look for
effect on accuracy (Baseline, participant 101)
32
Interesting bits

Unlearning important, especially to correct
undesirable changes
Gender differences
Females took longer to complete
Females added twice as many keywords
Comment more on unlearning

33
Interesting directions for HCI

Gender differences
More directed debugging
Other forms of feedback
Communicating effects of corrective feedback
Users need to detect the system is listening to
their feedback
Explanations
Form
Fidelity

34
Interesting directions for Machine Learning

Algorithms for learning from corrective feedback
Modeling reliability of user feedback
Explanations
Incorporating new features

35
Future work

ML Whyline (with Andy Ko)

36
For more information

wong@eecs.oregonstate.edu
www.eecs.oregonstate.edu/wong
