Lessons Learned from Applications of Machine Learning (PowerPoint Presentation Transcript)

1
Lessons Learned from Applications of Machine Learning
  • Robert C. Holte
  • University of Alberta

2
Source Material
  • Personal involvement in a commercial project to
    use ML to detect oil spills in satellite images
  • Other people's papers on specific applications
  • Other people's lessons learned
  • e.g. discussions with Foster Provost

3
Lesson 1: ML works
  • Numerous examples of machine learning being
    successfully applied in science and industry
  • saving time or money
  • doing something that would not have been possible
    otherwise
  • sometimes superior to human performance

Corollary: it would be beneficial to have an
on-line repository of success stories
4
Example: D. Michie
  • American Express (UK)
  • Loan applications automatically categorized by a
    statistical method
  • definitely accept
  • definitely reject
  • refer to a human expert
  • Human experts: 50% accurate at predicting loan
    defaults
  • Learned rules: 70% accurate

5
Oil Spill project: the task
  • In a continuous stream of satellite radar images,
  • identify the images that are likely to contain
    one or more oil slicks,
  • highlight the suspected region(s), and
  • forward the selected, annotated images to a human
    expert for a final decision and action.
  • Macdonald Dettwiler Associates

6
[image: oil slick]
7
Oil Spill project: the team
  • MDA - satellite image processing experts
  • Canada Centre for Remote Sensing - human expert
    in recognizing oil slicks in radar images
  • attempts to build a classifier by hand failed
  • me, Stan Matwin, Miroslav Kubat
  • 1995-97
  • see Machine Learning, vol.30, February 1998

8
Lesson 2: Research Spinoffs
  • Many new, general research issues arose during
    the oil spill project, but could not be properly
    investigated within the scope of the project.
  • A great deal of follow-on research is needed.

Corollary: when you write up an application, look
for general techniques, issues, and phenomena
9
Research Issues (1)
  • hand-picked data (purchased)
  • not a representative sample
  • small data sets (9 images, 937 dark regions)
  • risk of overtuning
  • task formulation
  • classifying images, regions, or pixels?
  • subcategories of non-slicks?

10
Research Issues (2)
  • imbalanced data sets (41 oil slicks, 896
    non-slicks)
  • accuracy is an inappropriate performance measure
  • standard learners optimize accuracy and tend to
    classify everything as not an oil slick
  • the data comes in distinct batches
  • 'leave one batch out' (LOBO) testing method
    (sketched after this list)
  • how to learn from batched data?
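
Below is a minimal sketch of what leave-one-batch-out evaluation looks like in code, assuming scikit-learn's LeaveOneGroupOut and synthetic stand-ins for the region features and batch labels; the decision-tree learner and the balanced class weighting are illustrative choices, not the project's actual setup.

# Minimal LOBO sketch: each satellite image is one batch, and the
# classifier is always tested on a batch it never saw during training.
# The data here is synthetic; the real project used features extracted
# from dark regions in the images.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 937                                       # dark regions, as on the slide
X = rng.normal(size=(n, 5))                   # illustrative features
y = (rng.random(n) < 41 / 937).astype(int)    # roughly 41 slicks vs. 896 non-slicks
batch = rng.integers(0, 9, size=n)            # which of the 9 images each region came from

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=batch):
    clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    tn, fp, fn, tp = confusion_matrix(y[test_idx], clf.predict(X[test_idx]),
                                      labels=[0, 1]).ravel()
    # Report per-class counts rather than raw accuracy, which is misleading
    # when over 95% of the regions are non-slicks.
    print(f"batch {batch[test_idx][0]}: {tp}/{tp + fn} slicks detected, "
          f"{fp} false alarms among {tn + fp} non-slicks")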

11
Research Issues (3)
  • feature engineering
  • image processing parameter settings affect
    learning in two ways
  • which regions are extracted from the image
  • which features of each region are calculated
    and then fed into the learning algorithm
  • best settings for one were not best for the other

12
Good Classification, Poor Region
[image: oil slick]
13
Lesson 3: Need Version Control
  • Over the course of the project we had a vast
    variety of data sets
  • images from three different types of satellite
  • a growing set of images for each type
  • a different data set for every different setting
    of the image processing parameters
  • and many variations on the learning algorithms,
    experimental method, etc. (a lightweight tagging
    scheme is sketched after this list)
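
As one lightweight way to keep the variants under control (a sketch only, not what the project actually used), each derived data set can be tagged with a fingerprint of the parameters that produced it; all parameter names below are invented for illustration.

# Sketch: tag every derived data set with a short hash of the image
# processing parameters that produced it, so any result can be traced
# back to an exact configuration. The parameter names are hypothetical.
import hashlib
import json
from pathlib import Path

def dataset_tag(params):
    """Short, stable fingerprint of a parameter configuration."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha1(canonical.encode()).hexdigest()[:10]

params = {"satellite": "ERS-1",               # hypothetical settings
          "dark_region_threshold_db": -2.5,
          "min_region_pixels": 30}
out_dir = Path("datasets") / dataset_tag(params)
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "params.json").write_text(json.dumps(params, indent=2))
print("regions extracted with this configuration go under", out_dir)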

14
Lesson 4: Understand the Deployment Context
  • What is the task? Classification, filtering,
    control, diagnosis?
  • non-uniform misclassification costs
  • costs vary with user and time, and may not be
    known during learning
  • some tasks require explanations in addition to
    classifications, or classifiers that can be
    understood by domain experts

Corollary: your experiments and performance
measure should reflect how the system will be
used and judged after deployment (a cost-based
scoring sketch follows)
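
To make the corollary concrete, here is a small sketch of scoring a classifier by expected misclassification cost rather than accuracy; the cost figures are invented, and in a real deployment they would vary with the user and over time.

# Sketch: score predictions by average misclassification cost instead of
# accuracy. A false negative (missed slick) is assumed to cost far more
# than a false positive (needless referral to the human expert); the
# numbers are invented for illustration.
import numpy as np

def expected_cost(y_true, y_pred, cost_fp, cost_fn):
    """Average cost per example for given false-alarm and miss costs."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return (cost_fp * fp + cost_fn * fn) / len(y_true)

y_true     = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
never_flag = [0] * 10                          # 80% accurate, misses every slick
detector   = [0, 1, 0, 0, 0, 0, 0, 0, 1, 1]    # one false alarm, no misses
for name, y_pred in [("never flag", never_flag), ("detector", detector)]:
    print(name, expected_cost(y_true, y_pred, cost_fp=1.0, cost_fn=20.0))

The classifier with the higher accuracy (never flagging anything) comes out far worse once the cost of missed slicks is counted.
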
15
Example: Evans and Fisher
  • Printing press banding problem
  • ML built a decision tree to predict whether
    banding would occur. Some features were exogenous
    (e.g. humidity), others were controllable (e.g.
    ink viscosity).
  • In practice it was used to set the controllable
    variables given the values of the exogenous ones
  • But different variables were under the control of
    different craftsmen who would not necessarily
    co-operate with each other

16
Lesson 5: Expect Skepticism
  • It will be very hard to convince a decision-maker
    to actually deploy something new.
  • It will help if the learned system is in a form
    that the decision-maker is familiar with or can
    easily comprehend, and is consistent with all
    available background knowledge.

17
Counterexample: Evans and Fisher
  • One of the learned rules flatly contradicted the
    advice of an expert consultant, and the latter
    was more intuitive.
  • Upon further analysis by the local engineers, the
    learned rule was adopted.

18
Lesson 6: Exploit Human Experts
  • Capture as much expertise as you can
  • Involve the expert in the induction process
  • e.g. interactive induction (Evans and Fisher,
    PROTOS)
  • e.g. Structured Induction (Alen Shapiro)

19
Lesson 7: Start Simple
  • 1R, Naïve Bayes, Perceptron, 1-NN
  • often work surprisingly well
  • provide a performance baseline
  • successes and failures inform you about your data
    (a baseline run is sketched after this list)
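
A minimal sketch of establishing such baselines with scikit-learn on a synthetic, imbalanced data set (1R is omitted because scikit-learn has no built-in implementation of it):

# Sketch: run the simple learners from the slide as baselines before
# trying anything elaborate. The data set is synthetic and imbalanced.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
baselines = {"majority class": DummyClassifier(strategy="most_frequent"),
             "naive Bayes": GaussianNB(),
             "perceptron": Perceptron(),
             "1-NN": KNeighborsClassifier(n_neighbors=1)}
for name, clf in baselines.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
    print(f"{name:15s} balanced accuracy = {scores.mean():.2f}")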

20
Lesson 8: Visualize
  • Visualize your data
  • e.g. project onto 1 or 2 dimensions
  • Visualize your classifier's performance
  • e.g. with ROC or cost curves (an ROC sketch
    follows this list)
  • e.g. in instance space (which examples are
    problematic?)
  • e.g. systematic error
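
A minimal sketch of one of these visualizations, an ROC curve, assuming matplotlib and scikit-learn and using synthetic data in place of the real features:

# Sketch: plot an ROC curve for a classifier's scores instead of quoting
# a single accuracy figure. Data and model are synthetic placeholders.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, scores)
plt.plot(fpr, tpr, label=f"learned classifier (AUC = {roc_auc_score(y_te, scores):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.legend()
plt.show()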

21
Lessons
  1. ML works
  2. Applications spin off research issues
  3. Need version control for experiments
  4. Understand the deployment context
  5. Expect skepticism from decision-makers
  6. Exploit human experts
  7. Start simple
  8. Visualize your data and your classifier