Pfizer HTS Machine Learning Algorithms: November 2002
1
Pfizer HTS Machine Learning Algorithms: November 2002
  • Paul Hsiung (hsiung@cs.cmu.edu)
  • Paul Komarek (komarek@cs.cmu.edu)
  • Ting Liu (tingliu@cs.cmu.edu)
  • Andrew W. Moore (awm@cs.cmu.edu)
  • Auton Lab, Carnegie Mellon University
  • School of Computer Science
  • www.autonlab.org

2
Datasets
3
Projections
4
Previous Algorithms
5
New Algorithms
6
Explicit False Positive Model
7
Explicit False Positive Model
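The EFP slides themselves are figures only. As a hedged reading of the name, an explicit false-positive model can be sketched as logistic regression whose observed positive labels may be contaminated at some rate rho; the formulation and all names below are assumptions for illustration, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def efp_nll(w, rho, X, y):
    """Negative log-likelihood of logistic regression with an explicit
    false-positive rate rho: an observed positive is either a true hit
    (probability sigmoid(x.w)) or a spurious one (probability rho)."""
    p = (1.0 - rho) * sigmoid(X @ w) + rho  # P(observed label = 1 | x)
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
```

With rho = 0 this collapses to the ordinary logistic-regression likelihood used in the "regular logistic regression" slides; fitting would minimize efp_nll over w (and optionally rho).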
8
Example in 2 dimensions: Decision Boundary
9
Example in 2 dimensions: 100 true positives
10
100 true positives and 100 true negatives
11
100 TP, 100 TN, 10 FP
12
Using regular logistic regression
13
Using EFP Model
14
Example: 10000 true positives
15
10000 true positives, 10000 true negatives
16
10000 TP, 10000 TN, 1000 FP
17
Using regular logistic regression
18
Using EFP Model
19
EFP Model: Real Data Results
K-fold
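The real-data results are reported under k-fold evaluation; a minimal sketch of the procedure, where the `fit` and `score` callables are placeholders rather than anything from the deck:

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle n row indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def k_fold_eval(fit, score, X, y, k=10):
    """Mean held-out score over k folds: each fold is held out once
    while a model is fit on the remaining k-1 folds."""
    n = len(y)
    scores = []
    for fold in k_fold_indices(n, k):
        train = np.setdiff1d(np.arange(n), fold)
        model = fit(X[train], y[train])
        scores.append(score(model, X[fold], y[fold]))
    return float(np.mean(scores))
```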
20
EFP Effect
Very impressive on Train1 / Test1
21
Log X-axis
22
EFP Effect
Unimpressive on jun31 / jun32
23
Super Model
  • Divide Training Set into Compartment A and
    Compartment B
  • Learn each of N models on Compartment A
  • Predict each of N models on Compartment B
  • Learn best weighting of opinions with Logistic
    Regression of Predictions on Compartment B
  • Apply the models and their weights to Test Data
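The recipe above is what is now usually called stacking. A minimal sketch with scikit-learn, where the base models and toy data are illustrative (the learner names just mirror methods mentioned elsewhere in the deck), not the pipeline actually used:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def super_model(X_train, y_train, X_test, base_models, seed=0):
    """Divide the training set into compartments A and B, learn each
    base model on A, predict on B, learn a logistic-regression weighting
    of those predictions on B, then apply both stages to the test data."""
    idx = np.random.default_rng(seed).permutation(len(y_train))
    a, b = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    # Learn each of N models on compartment A
    fitted = [m.fit(X_train[a], y_train[a]) for m in base_models]
    # Predict each of N models on compartment B; stack opinions column-wise
    opinions_b = np.column_stack(
        [m.predict_proba(X_train[b])[:, 1] for m in fitted])
    # Learn the best weighting of opinions with logistic regression
    blender = LogisticRegression().fit(opinions_b, y_train[b])
    # Apply the models and their weights to the test data
    opinions_t = np.column_stack(
        [m.predict_proba(X_test)[:, 1] for m in fitted])
    return blender.predict_proba(opinions_t)[:, 1]
```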

24
Comparison
25
Log X-Axis Scale
26
Comparison on 100-dims
27
Log X-axis
28
Comparison on 10 dims
29
Log X-axis
30
NewKNN summary of results and timings
31–44
(No Transcript)
45
PLS summary of results
  • PLS projections did not perform well.
  • However, PLS as a predictor performed well,
    especially under train100/test100.
  • PLS is fast: runtimes range from 1 to 10
    minutes.
  • But PLS takes large amounts of memory, and is
    impossible to use with a sparse representation.
    (This is due to the update on each iteration.)

46–71
(No Transcript)
72
Summary of results
  • SVM is best early on in Train1; LR is better in
    the long haul.
  • Projecting to 10-d is always a disaster.
  • Projecting to 100-d is often indistinguishable
    from behavior on the original data (and much
    cheaper).
  • Naïve Gaussian Bayes Classifier is best on
    JUN-3-1 (k-NN is better in the long haul).
  • Naïve Gaussian Bayes Classifier is best on the
    combined data.
  • Non-linear SVM never seems distinguishable from
    linear SVM.
  • All methods have won in at least one context,
    except Dtree.

73
Some AUC Results
Not statistically significantly different
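The AUC tables themselves were not transcribed. As a reminder of the metric, AUC is the probability that a randomly chosen positive compound outscores a randomly chosen negative one; a minimal rank-based sketch (assuming no tied scores):

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney)
    statistic. Assumes no tied scores; labels are 0/1."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```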
74
Some AUC Results