Title: AVRP: Initial Analysis of the Non-Human Primate Study
1 AVRP: Initial Analysis of the Non-Human Primate Study
David Madigan, Rutgers University
stat.rutgers.edu/madigan
2 Goal of the Analysis
- Are measurable aspects of the state of the immune system predictive of survival?
- Problem: hundreds of different assays but fewer than one hundred macaques
- Initial descriptive analysis
- Regularized predictive modeling
14 Logistic Regression Model
- Linear model for log odds of category membership:
$$\log \frac{p(y = +1 \mid x_i)}{p(y = -1 \mid x_i)} = \sum_j b_j x_{ij} = b x_i$$
- Conditional probability model
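As a minimal sketch of the model on this slide, the log odds are a linear function of the features, and inverting the log odds gives the conditional probability p(y = +1 | x_i). The function names and the numbers in the usage lines are illustrative assumptions, not values from the study.

```python
import math

def log_odds(b, x):
    """Linear predictor: sum_j b_j * x_ij (no separate intercept term)."""
    return sum(bj * xj for bj, xj in zip(b, x))

def prob_positive(b, x):
    """p(y = +1 | x), obtained by inverting the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(b, x)))

# Hypothetical coefficients and feature values, for illustration only.
b = [0.5, -1.0]
x = [2.0, 1.0]
eta = log_odds(b, x)        # 0.5*2.0 + (-1.0)*1.0 = 0.0
p = prob_positive(b, x)     # log odds of 0 correspond to probability 0.5
```

A log odds of zero corresponds to even odds, so the two classes are equally likely for this hypothetical observation.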
15 Maximum Likelihood Training
- Choose parameters (b_j's) that maximize the probability (likelihood) of the class labels (y_i's) given the documents (x_i's)
- Tends to overfit
- Not defined if d > n (more features than observations)
- Feature selection
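The "not defined if d > n" point can be illustrated with a small simulation. This is a sketch on synthetic data (the sizes, seed, and labels are assumptions): when there are more features than observations, some coefficient vector classifies every training case correctly, and scaling it up keeps increasing the likelihood, so no finite maximizer exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 10                       # fewer observations than features
X = rng.normal(size=(n, d))        # synthetic feature matrix
y = np.array([1, -1, 1, -1, 1])    # arbitrary +1/-1 class labels

# With d > n and X of full row rank, X b = y has an exact solution,
# so every observation lies on the correct side of the boundary.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

def log_likelihood(coef):
    # sum_i log p(y_i | x_i) = -sum_i log(1 + exp(-y_i * coef.x_i))
    return -np.sum(np.logaddexp(0.0, -y * (X @ coef)))

# Scaling b by a larger constant always raises the log-likelihood toward 0.
lls = [log_likelihood(c * b) for c in (1, 10, 100)]
```

Because the log-likelihood keeps rising as the coefficients grow, an unregularized optimizer would push them toward infinity; this is one motivation for feature selection or shrinkage.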
16 Shrinkage Methods
- Feature selection is a discrete process: individual variables are either in or out. Combinatorial nightmare.
- This method can have high variance: a different dataset from the same source can result in a totally different model
- Shrinkage methods allow a variable to be partly included in the model. That is, the variable is included but with a shrunken coefficient
- Elegant way to tackle over-fitting
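17 Ridge Regression
$$\hat{b}^{\text{ridge}} = \arg\min_{b} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{d} x_{ij} b_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{d} b_j^2 \le s$$
Equivalently
$$\hat{b}^{\text{ridge}} = \arg\min_{b} \Big\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{d} x_{ij} b_j \Big)^2 + \lambda \sum_{j=1}^{d} b_j^2 \Big\}$$
This leads to
$$\hat{b}^{\text{ridge}} = (X^T X + \lambda I)^{-1} X^T y$$
- Choose \lambda by cross-validation.
- Works even when X^T X is singular.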
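A minimal numerical sketch of the closed-form ridge solution (X^T X + lambda I)^{-1} X^T y, assuming a least-squares loss with no intercept and synthetic data. It uses n < d so that X^T X is singular, which is the case the slide highlights; the function name and penalty values are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) b = X^T y for the ridge coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))   # n = 5 < d = 10, so X^T X has rank at most 5
y = rng.normal(size=5)

rank_xtx = np.linalg.matrix_rank(X.T @ X)   # less than d: X^T X is singular
b_small = ridge_fit(X, y, lam=0.1)          # light shrinkage
b_large = ridge_fit(X, y, lam=10.0)         # heavier shrinkage, smaller norm
```

Adding lam to the diagonal makes the system solvable for any lam > 0, and larger penalties shrink the coefficient vector further. In practice lam would be chosen by cross-validation, as the slide says.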
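19 Least Absolute Shrinkage and Selection Operator (LASSO)
Tibshirani
$$\hat{b}^{\text{lasso}} = \arg\min_{b} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{d} x_{ij} b_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{d} |b_j| \le s$$
- Quadratic programming algorithm needed to solve for the parameter estimates
- Modified Gauss-Seidel: highly tuned C implementation
- http://stat.rutgers.edu/madigan/BBR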
21 Same as putting a double exponential or Laplace prior on each b_j
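The equivalence can be sketched in a few lines, assuming independent Laplace priors with a common rate \lambda (the symbol \lambda and the log-likelihood notation \ell(b) are introduced here for illustration). Maximizing the posterior is the same as maximizing the log-likelihood minus an absolute-value penalty:

```latex
p(b_j) = \frac{\lambda}{2} \exp\big(-\lambda |b_j|\big), \qquad j = 1, \dots, d

\log p(b \mid \text{data}) = \ell(b) + \sum_{j=1}^{d} \log p(b_j) + \text{const}
                           = \ell(b) - \lambda \sum_{j=1}^{d} |b_j| + \text{const}

\hat{b}^{\text{MAP}} = \arg\max_{b} \Big\{ \ell(b) - \lambda \sum_{j=1}^{d} |b_j| \Big\}
```

The posterior mode is therefore the penalized form of the LASSO, with the prior rate \lambda playing the role of the penalty weight.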
23 Data Sets
- ModApte subset of Reuters-21578
  - 90 categories, 9,603 training docs, 18,978 features
- Reuters RCV1-v2
  - 103 categories, 23,149 training docs, 47,152 features
- OHSUMED heart disease categories
  - 77 categories, 83,944 training docs, 122,076 features
- Cosine normalized TFxIDF weights
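For readers unfamiliar with the feature weighting, the following is a sketch of one common TFxIDF variant followed by cosine normalization. The slide does not say which variant was used, so the formula here (raw term count times log(N / document frequency)) is an assumption, and the toy documents are made up.

```python
import math
from collections import Counter

def tfidf_cosine(docs):
    """Return one {term: weight} dict per document, scaled to unit length."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: raw term frequency times log inverse doc frequency.
        w = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        norm = math.sqrt(sum(v * v for v in w.values()))
        # Cosine normalization: divide by the Euclidean norm when nonzero.
        vectors.append({t: v / norm for t, v in w.items()} if norm > 0 else w)
    return vectors

# Toy tokenized documents, for illustration only.
docs = [["oil", "price", "oil"], ["wheat", "price"], ["oil", "wheat", "export"]]
vecs = tfidf_cosine(docs)
```

After normalization every document vector has unit Euclidean length, so long and short documents are put on a comparable scale before model fitting.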
24 Dense vs. Sparse Models (Macroaveraged F1)
27 Groups 1-3: TNA at week 38, IFNm at week 4, TNFe at week 4

             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -14.9800      9.6559   -1.551    0.1208
tna38         -0.4594      0.5611   -0.819    0.4129
ifnm4          1.8591      1.4046    1.324    0.1856
tnfe4         16.2882      8.7637    1.859    0.0631 .

Groups 4-8: IgG at week 46, TNA at week 8, SI at week 38, IL6m at week 38

             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   -2.7190      1.6131   -1.686   0.09186 .
dose          31.5690     19.6857    1.604   0.10879
igg46         -0.9257      0.6544   -1.415   0.15718
tna8          -0.1901      0.2356   -0.807   0.41971
si38           1.1912      0.8243    1.445    0.1345
il6m38        -0.9989      0.5405   -1.848   0.06457
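As a reading aid for the coefficient tables, the z value in each row is the estimate divided by its standard error, and the last column is the two-sided tail probability under a standard normal. The sketch below recomputes these for the tnfe4 row; the function name is an assumption, and only the standard library is used.

```python
import math

def wald_z_and_p(estimate, std_error):
    """Wald z statistic and two-sided p-value under a standard normal."""
    z = estimate / std_error
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary
    # error function: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Estimate and standard error taken from the tnfe4 row for Groups 1-3.
z, p = wald_z_and_p(16.2882, 8.7637)
```

The recomputed values agree with the tabulated z value and p-value for tnfe4 up to rounding. With fewer than one hundred animals, these normal-approximation p-values are best read as rough guides.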
28 Groups 1-3
29 Groups 4-8