Transcript and Presenter's Notes

Title: End-User Debugging of Machine Learning Systems


1
End-User Debugging of Machine Learning Systems
  • Weng-Keen Wong
  • Oregon State University
  • School of Electrical Engineering and Computer
    Science
  • http://www.eecs.oregonstate.edu/wong

2
Collaborators
Faculty
  • Margaret Burnett
  • Simone Stumpf
  • Tom Dietterich
  • Jon Herlocker
Grad Students
  • Erin Fitzhenry
  • Lida Li
  • Ian Oberst
  • Vidya Rajaram
Undergrads
  • Russell Drummond
  • Erin Sullivan

3
Papers
  • Stumpf, S., Rajaram, V., Li, L., Burnett, M., Dietterich, T., Sullivan, E., Drummond, R., Herlocker, J. (2007). Toward Harnessing User Feedback for Machine Learning. In Proceedings of IUI 2007.
  • Stumpf, S., Rajaram, V., Li, L., Wong, W.-K., Burnett, M., Dietterich, T., Sullivan, E., Herlocker, J. (2008). Interacting Meaningfully with Machine Learning Systems: Three Experiments. (Submitted to IJHCS)
  • Stumpf, S., Sullivan, E., Fitzhenry, E., Oberst, I., Wong, W.-K., Burnett, M. (2008). Integrating Rich User Feedback into Intelligent User Interfaces. In Proceedings of IUI 2008.

4
Motivation
(Screenshot: an incoming email, and the question of which folder the system should file it into.)

Date: Mon, 28 Apr 2008 23:59:00 (PST)
From: John Doe <john.doe@onid.orst.edu>
To: Weng-Keen Wong <wong@eecs.oregonstate.edu>
Subject: CS 162 Assignment

I can't get my Java assignment to work! It just won't compile and it prints out lots of error messages! Please help!

    public class MyFrame extends JFrame {
        private AsciiFrameManager reader;
        private JPanel displayPanel;

        public MyFrame(String filename) throws Exception {
            reader = new AsciiFrameManager(filename);
            displayPanel = new JPanel();
            ...

(Folder choices shown: CS 162, John Doe, Trash.)
  • Machine learning tool adapts to end user
  • Similar situation in recommender systems, smart
    desktops, etc.

5
Motivation
Date: Mon, 28 Apr 2008 23:51:00 (PST)
From: Bella Bose <bose@eecs.oregonstate.edu>
To: Weng-Keen Wong <wong@eecs.oregonstate.edu>
Subject: Teaching Assignments

I've compiled the teaching preferences for all the faculty. Here are the teaching assignments for next year:

Fall Quarter
  CS 160 (Computer Science Orientation): Paul Paulson
  CS 161 (Introduction to Programming I): Chris Wallace
  CS 162 (Introduction to Programming II): Weng-Keen Wong
  ...

(The classifier has filed this legitimate message into Trash.)
  • Machine learning systems are great when they work correctly, aggravating when they don't
  • The end user is the only person at the computer
  • Can we let end users correct machine learning
    systems?

6
Motivation
  • Learn to correct behavior quickly
  • Sparse data at the start
  • Concept drift
  • Rich end-user knowledge
  • Effects of user feedback on accuracy?
  • Effects on users?

7
Overview
(Diagram: the machine learning algorithm presents an explanation to the end user; the end user returns feedback to the algorithm.)
8
Related Work
  • End user interaction
  • Active Learning (Cohn et al. 96, many others)
  • Constraints (Altendorf et al. 05, Huang and
    Mitchell 06)
  • Ranks (Radlinski and Joachims 05)
  • Feature Selection (Raghavan et al. 06)
  • Crayons (Fails and Olsen 03)
  • Programming by Demonstration (Cypher 93, Lau and
    Weld 99, Lieberman 01)
  • Explanation
  • Expert Systems (Swartout 83, Wick and Thompson
    92)
  • TREPAN (Craven and Shavlik 95)
  • Description Logics (McGuinness 96)
  • Bayesian networks (Lacave and Díez 00)
  • Additive classifiers (Poulin et al. 06)
  • Others (Crawford et al. 02, Herlocker et al. 00)

9
Outline
  1. What types of explanations do end users
    understand? What types of corrective feedback
    could end users provide? (IUI 2007)
  2. How do we incorporate this feedback into a ML
    algorithm? (IJHCS 2008)
  3. What happens when we put this together? (IUI 2008)

10
What Types of Explanations do End Users
Understand?
  • Think-aloud study with 13 participants
  • Classify Enron emails
  • Explanation systems: rule-based, keyword-based, similarity-based
  • Findings
  • Rule-based best but not a clear winner
  • Evidence indicates multiple explanation paradigms
    needed

11
What types of corrective feedback could end users
provide?
  • Suggested corrective feedback in response to
    explanations
  • Adjust importance of word
  • Add/remove word from consideration
  • Parse / extract text in a different way
  • Word combinations
  • Relationships between messages/people

12
Outline
  1. What types of explanations do end users
    understand? What types of corrective feedback
    could end users provide? (IUI 2007)
  2. How do we incorporate this feedback into a ML
    algorithm? (IJHCS 2008)
  3. What happens when we put this together? (IUI 2008)

13
Incorporating Feedback into ML Algorithms
  • Two approaches
  • Constraint-based
  • User co-training

14
Constraint-based approach
  • Constraints:
  • If weight on word reduced or word removed, remove
    the word as a feature
  • If weight of word increased, word assumed to be
    important for that folder
  • If weight of word increased, word is a better
    predictor for that folder than other words

Estimate the Naive Bayes parameters by MLE subject to these constraints (see the sketch below).
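Below is a minimal Python sketch of one way such constraints might be folded into Naive Bayes training. It is an illustrative simplification, not the authors' exact constrained-MLE formulation: removed words are dropped as features, and a boosted word's conditional probability is floored at the folder's strongest word. All names (train_constrained_nb, removed, boosted) are hypothetical.

    from collections import Counter

    def train_constrained_nb(messages, labels, removed, boosted, alpha=1.0):
        # messages: list of token lists; labels: one folder label per message
        # removed: set of words the user removed from consideration
        # boosted: {folder: set of words whose weight the user increased}
        folders = sorted(set(labels))
        prior = Counter(labels)
        counts = {f: Counter() for f in folders}
        for tokens, f in zip(messages, labels):
            # Constraint 1: a removed word is no longer a feature.
            counts[f].update(t for t in tokens if t not in removed)
        vocab = sorted({t for c in counts.values() for t in c})
        cond = {}
        for f in folders:
            total = sum(counts[f].values())
            # Laplace-smoothed maximum likelihood estimates.
            cond[f] = {t: (counts[f][t] + alpha) / (total + alpha * len(vocab))
                       for t in vocab}
            # Constraints 2-3 (simplified): a boosted word is made at least
            # as predictive of this folder as its strongest existing word.
            floor = max(cond[f].values(), default=0.0)
            for w in boosted.get(f, ()):
                cond[f][w] = max(cond[f].get(w, 0.0), floor)
        return prior, cond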
15
Standard Co-training
  • Create classifiers C1 and C2 based on the two
    independent feature sets.
  • Repeat i times:
  • Add the messages most confidently classified by either classifier to the training data
  • Rebuild C1 and C2 with the new training data (see the sketch below)
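A minimal sketch of this loop, assuming scikit-learn-style classifiers (fit / predict_proba) over two feature views of the same messages; the function and parameter names are hypothetical.

    import numpy as np

    def cotrain(c1, c2, X1, X2, y, U1, U2, rounds=10, k=5):
        # c1, c2: scikit-learn-style classifiers over two feature views.
        # X1, X2: labeled features per view; y: their labels.
        # U1, U2: the two views of the unlabeled messages.
        X1, X2, y = list(X1), list(X2), list(y)
        U1, U2 = list(U1), list(U2)
        for _ in range(rounds):
            if not U1:
                break
            c1.fit(np.array(X1), y)
            c2.fit(np.array(X2), y)
            p1 = c1.predict_proba(np.array(U1))
            p2 = c2.predict_proba(np.array(U2))
            # Confidence of the more confident classifier on each message.
            conf = np.maximum(p1.max(axis=1), p2.max(axis=1))
            for i in sorted(np.argsort(conf)[-k:], reverse=True):
                # The more confident classifier supplies the label.
                probs = p1[i] if p1[i].max() >= p2[i].max() else p2[i]
                label = c1.classes_[int(np.argmax(probs))]
                X1.append(U1.pop(i)); X2.append(U2.pop(i)); y.append(label)
        return c1, c2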

16
User Co-training
  • C_USER = classifier built from user feedback
  • C_ML = machine learning classifier
  • For each session of user feedback:
  • Add the messages most confidently classified by C_USER to the training data
  • Rebuild C_ML with the new training data

17
User Co-training
  • C_USER = classifier built from user feedback
  • C_ML = machine learning classifier
  • For each session of user feedback:
  • Add the messages most confidently classified by C_USER to the training data
  • Rebuild C_ML with the new training data

We'll expand the inner loop on the next slide.
18
User Co-training
  • For each folder f, let v_f = the set of words whose weights the user increased
  • For each message m in the unlabeled set:
  • For each folder f:
  • Compute Prob_f from the machine learning classifier
  • Score_f = (number of words from v_f appearing in the message) × Prob_f
  • Score_m = Score_fmax − Score_other, where fmax is the highest-scoring folder and other is the runner-up
  • Sort Score_m for all messages in decreasing order
  • Select the top k messages to add to the training set along with their folder label fmax
  • Rebuild C_ML with the new training data (see the sketch below)
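A minimal Python sketch of this selection step. The prob callback stands in for the machine-learned classifier's posterior, and reading Score_m as the margin over the runner-up folder is an assumption from the slide's notation; all names are hypothetical.

    def select_for_training(messages, folders, prob, v, k=10):
        # messages: {message id: set of words in the message}
        # prob(m, f): the ML classifier's probability of folder f for m
        # v: {folder: words whose weights the user increased}
        scored = []
        for m, words in messages.items():
            # Score_f = (# of v_f words appearing in the message) * Prob_f
            scores = {f: len(words & set(v.get(f, ()))) * prob(m, f)
                      for f in folders}
            f_max = max(scores, key=scores.get)
            others = [s for f, s in scores.items() if f != f_max]
            # Score_m = margin between the best folder and the runner-up.
            score_m = scores[f_max] - (max(others) if others else 0.0)
            scored.append((score_m, m, f_max))
        scored.sort(reverse=True)
        # The top k messages join the training set with label f_max.
        return [(m, f) for _, m, f in scored[:k]]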

19
Constraint-based vs. user co-training
  • Constraint-based
  • Difficult to set hardness of constraint
  • Constraints often already satisfied
  • End-user can over-constrain the learning
    algorithm
  • Slow
  • User co-training
  • Requires unlabeled emails in inbox
  • Better accuracy than constraint-based

20
Results
(Charts: classification accuracy using feedback from the keyword-based and the similarity-based explanation paradigms.)
21
Outline
  1. What types of explanations work for end users?
    What types of corrective feedback could end users
    provide? (IUI 2007)
  2. How do we incorporate this feedback into a ML
    algorithm? (IJHCS 2008)
  3. What happens when we put this together? (IUI 2008)

22
Experiment: Email program
(Screenshot of the email program used in the experiment.)
23
Experiment: Procedure
  • Intelligent email system to classify emails into
    folders
  • 43 English-speaking, non-CS students
  • Background questionnaire
  • Tutorial (email program and folders)
  • Experiment task on feedback set
  • Correct folders. Add, remove, change weight on
    keywords.
  • 30 interaction logs
  • Post-session questionnaire

24
Experiment: Data
  • Enron data set
  • 9 folders
  • 50 training messages
  • 10 each for 5 folders with folder labels
  • 50 feedback messages
  • For use in experiment
  • Same for each participant
  • 1051 test messages
  • For evaluation after experiment

25
Experiment: Classification algorithm
  • User co-training
  • Two classifiers: user and Naïve Bayes
  • Slight modification to the user classifier
  • Score_f = sum of the weights of v_f words appearing in the message
  • Weights can be modified interactively by the user (see the sketch below)
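A minimal sketch of this modified user-classifier score, with hypothetical names and example weights; the weights dictionary stands in for the keyword weights a participant edits interactively.

    def user_score(message_words, weights):
        # weights: {folder: {word: user-set weight}}, edited interactively.
        # Score_f = sum of the weights of v_f words present in the message.
        return {f: sum(w for word, w in wf.items() if word in message_words)
                for f, wf in weights.items()}

    # Example (hypothetical weights a participant might have set):
    weights = {"FolderA": {"assignment": 2.0, "java": 1.0},
               "FolderB": {"teaching": 1.5}}
    print(user_score({"java", "assignment", "help"}, weights))
    # -> {'FolderA': 3.0, 'FolderB': 0}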

26
Results: Accuracy improvements of rich feedback
(Chart: per-subject accuracy Δ of rich feedback over folder feedback.)
  • Rich feedback = participant folder labels and keyword changes
  • Folder feedback = participant folder labels

27
Results: Accuracy improvements of rich feedback
(Chart: per-subject accuracy Δ of rich feedback over the baseline.)
  • Rich feedback = participant folder labels and keyword changes
  • Baseline = original Enron labels

28
Results: Accuracy improvements of rich feedback
(Chart: per-subject accuracy Δ of rich feedback over both the baseline and folder feedback.)
  • Rich feedback = participant folder labels and keyword changes
  • Baseline = original Enron labels
  • Folder feedback = participant folder labels

29
Results: Accuracy summary
  • 60% of participants saw accuracy improvements, some very substantial
  • Some dramatic decreases
  • More time between filing emails or more folder assignments → higher accuracy

30
Interesting bits
  • Need to communicate the effects of the user's corrective feedback
  • Unstable classifier period
  • With sparse training data, a single new training example can dramatically change the classifier's decision boundaries
  • Wild fluctuations in the classifier's predictions frustrate end users
  • Causes a "wall of red"

31
Interesting bits: Unstable classifier period
(Chart: moved test emails into the training set to look for the effect on accuracy; baseline, participant 101.)
32
Interesting bits
  • Unlearning is important, especially for correcting undesirable changes
  • Gender differences
  • Females took longer to complete the task
  • Females added twice as many keywords
  • Commented more on unlearning

33
Interesting directions for HCI
  • Gender differences
  • More directed debugging
  • Other forms of feedback
  • Communicating effects of corrective feedback
  • Users need to detect that the system is listening to their feedback
  • Explanations
  • Form
  • Fidelity

34
Interesting directions for Machine Learning
  1. Algorithms for learning from corrective feedback
  2. Modeling reliability of user feedback
  3. Explanations
  4. Incorporating new features

35
Future work
  • ML Whyline (with Andy Ko)

36
For more information
  • wong@eecs.oregonstate.edu
  • www.eecs.oregonstate.edu/wong