Title: Pfizer HTS Machine Learning Algorithms: November 2002
Slide 1: Pfizer HTS Machine Learning Algorithms, November 2002
- Paul Hsiung (hsiung_at_cs.cmu.edu)
- Paul Komarek (komarek_at_cs.cmu.edu)
- Ting Liu (tingliu_at_cs.cmu.edu)
- Andrew W. Moore (awm_at_cs.cmu.edu)
- Auton Lab, Carnegie Mellon University
- School of Computer Science
- www.autonlab.org
Slide 2: Datasets
Slide 3: Projections
Slide 4: Previous Algorithms
Slide 5: New Algorithms
Slide 6: Explicit False Positive Model
Slide 7: Explicit False Positive Model
Slide 8: Example in 2 dimensions: Decision Boundary
Slide 9: Example in 2 dimensions: 100 true positives
Slide 10: 100 true positives and 100 true negatives
Slide 11: 100 TP, 100 TN, 10 FP
Slide 12: Using regular logistic regression
Slide 13: Using EFP Model
Slide 14: Example: 10,000 true positives
Slide 15: 10,000 true positives, 10,000 true negatives
Slide 16: 10,000 TP, 10,000 TN, 1,000 FP
Slide 17: Using regular logistic regression
Slide 18: Using EFP Model
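The 2-D example above can be sketched in code. This is a hypothetical reconstruction (the slides themselves carry only plots): 100 true positives and 100 true negatives in well-separated Gaussian clouds, plus 10 false positives drawn from the negative cloud but labeled positive, fit with ordinary logistic regression. All data, cluster centers, and scales here are assumptions for illustration.

```python
# Hypothetical reconstruction of the slides' 2-D example: 100 TP,
# 100 TN, and 10 FP (negative-cloud points carrying a positive label).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
tp = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(100, 2))    # true positives
tn = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(100, 2))  # true negatives
fp = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(10, 2))   # false positives

X = np.vstack([tp, tn, fp])
y = np.array([1] * 100 + [0] * 100 + [1] * 10)  # FPs carry the wrong label

# Regular logistic regression treats the FPs as genuine positives,
# which pulls the decision boundary toward the negative cloud.
clf = LogisticRegression().fit(X, y)
acc_on_truth = clf.score(np.vstack([tp, tn]), [1] * 100 + [0] * 100)
print(acc_on_truth)
```

The EFP model, by contrast, explicitly accounts for a false-positive rate when fitting, so mislabeled points do not distort the boundary in the same way.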
Slide 19: EFP Model: Real Data Results (K-fold)
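The real-data results on this slide are evaluated with K-fold cross-validation. A minimal sketch of that evaluation protocol, using synthetic data in place of the Pfizer HTS sets (which are not available here) and scikit-learn in place of the lab's own code:

```python
# K-fold evaluation sketch; the dataset is synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 5-fold cross-validated AUC for a plain logistic regression baseline.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc")
print(scores.mean())
```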
Slide 20: EFP Effect
Very impressive on Train1 / Test1
Slide 21: Log X-axis
Slide 22: EFP Effect
Unimpressive on jun31 / jun32
Slide 23: Super Model
- Divide the training set into Compartment A and Compartment B
- Learn each of N models on Compartment A
- Predict with each of the N models on Compartment B
- Learn the best weighting of their opinions with logistic regression on the Compartment B predictions
- Apply the models and their weights to the test data
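The steps above describe a stacked-blending procedure. A hedged sketch, with scikit-learn models standing in for the lab's own implementations and synthetic data in place of the HTS sets:

```python
# Sketch of the "Super Model" blending procedure on stand-in data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Divide the training set into Compartment A and Compartment B.
X_a, X_b, y_a, y_b = train_test_split(X_train, y_train, test_size=0.5,
                                      random_state=0)

# Learn each of N models on Compartment A.
models = [LogisticRegression(max_iter=1000), GaussianNB(),
          KNeighborsClassifier()]
for m in models:
    m.fit(X_a, y_a)

# Predict with each model on Compartment B, then learn the best
# weighting of their opinions with logistic regression.
preds_b = np.column_stack([m.predict_proba(X_b)[:, 1] for m in models])
blender = LogisticRegression().fit(preds_b, y_b)

# Apply the models and their weights to the test data.
preds_test = np.column_stack([m.predict_proba(X_test)[:, 1]
                              for m in models])
test_acc = blender.score(preds_test, y_test)
print(test_acc)
```

Learning the blend weights on Compartment B rather than Compartment A matters: weights fit on the same data the base models were trained on would reward overfit models.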
Slide 24: Comparison
Slide 25: Log X-Axis Scale
Slide 26: Comparison on 100 dims
Slide 27: Log X-axis
Slide 28: Comparison on 10 dims
Slide 29: Log X-axis
Slide 30: NewKNN: summary of results and timings
Slides 31-44: (no transcript)
Slide 45: PLS summary of results
- PLS projections did not do well.
- However, PLS as a predictor performed well, especially under train100/test100.
- PLS is fast: runtimes range from 1 to 10 minutes.
- However, PLS takes large amounts of memory and is impossible to use with a sparse representation (this is due to the update on each iteration).
Slides 46-71: (no transcript)
Slide 72: Summary of results
- SVM is best early on in Train1; LR is better over the long haul.
- Projecting to 10 dimensions is always a disaster.
- Projecting to 100 dimensions is often indistinguishable from behavior on the original data (and much cheaper).
- The Naïve Gaussian Bayes classifier is best on JUN-3-1 (k-NN is better over the long haul).
- The Naïve Gaussian Bayes classifier is best on the combined data.
- Non-linear SVM never seems distinguishable from linear SVM.
- All methods have won in at least one context, except Dtree.
Slide 73: Some AUC Results
Not statistically significantly different
Slide 74: Some AUC Results
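The AUC figures on these slides come from the area under the ROC curve. A sketch of computing that metric, again on synthetic stand-in data rather than the Pfizer sets:

```python
# AUC (area under the ROC curve) for a held-out test split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# AUC is computed from ranking scores, not hard labels.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why overlapping confidence intervals on these plots read as "not statistically significantly different."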