2Rational HIV vaccine design
- Nebojsa Jojic and David Heckerman
- Machine Learning and Applied Statistics
- Microsoft Research
3Collaborators
- Vladimir Jojic, Microsoft/U Toronto
- Carl Kadie, Microsoft
- Jennifer Listgarten, Microsoft/U Toronto
- Chris Meek, Microsoft
- Brendan Frey, Microsoft/U Toronto
- Bette Korber, Los Alamos National Laboratory
- Christian Brander, Harvard/MGH
- Nicole Frahm, Harvard/MGH
- Simon Mallal, Royal Perth Hospital
- Jim Mullins, University of Washington
4Epitome as a model of diversity in natural signals
A set of image patches
Input image
Epitome
5Compact representation
6Compact representation
7Using the epitome for recognition
The smiling point
Epitome of 295 face images
Images with the highest total posterior at the smiling point
Images with the lowest total posterior at the smiling point
8Epitomes may also allow some variability
Epitome e
Mean ?
Variances ?
9Epitomes can be computed for ordered datasets (e.g., 1-D arrays or 2-D, 3-D, or n-D matrices) with arbitrary measurement types
- Intensities
- R, G, B values
- Gradient values
- Wavelet coefficients
- Spectral energies
- Nucleotide or amino acid content
-
- We even played with text and MIDI files
10AIDS 101
- AIDS (acquired immune deficiency syndrome) was first described in the early 1980s
- HIV (human immunodeficiency virus) causes AIDS; it was isolated in 1983; 40 million people are now infected
- HIV is an RNA virus: protein coat + copying proteins + regulatory proteins + RNA
- Copying proteins + RNA enter the cell
- RNA is reverse transcribed to DNA
- DNA inserts into the cell's DNA and is transcribed and translated to more HIV protein
- Infected cell assembles more copies of HIV
- Cell bursts, releasing many new copies of HIV
11The map of HIV
From http://www.mcld.co.uk/hiv (a simplified version of the LANL detailed map)
12HIV diversity (LANL database)
HIV is encoded in an RNA sequence of about 10,000 nucleotides, divided into several genes. NEF is one of the shorter and moderately variable ones.
The NEF length in the strain
The 73 nucleotides of the NEF gene
Note the insertions, deletions and mutations. A triplet of nucleotides encodes one amino acid. A change in a single amino acid may lower the cellular immunity to the virus in one patient and increase it in another.
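As a small illustration of the triplet code mentioned above, the sketch below translates an RNA fragment codon by codon and shows how one nucleotide substitution changes one amino acid. The codon table is deliberately partial (only entries from the standard genetic code that this example needs), and both sequences are made up for illustration; they are not taken from NEF.

```python
# Partial standard genetic code (RNA codons); only the entries used below.
CODON_TABLE = {
    "AUG": "M", "GGU": "G", "GGC": "G", "AAG": "K",
    "UGG": "W", "UCA": "S", "AAA": "K", "AGG": "R",
}

def translate(rna):
    """Translate an RNA string codon by codon; unknown codons become 'X'."""
    usable = len(rna) - len(rna) % 3          # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

# Hypothetical fragment and a single-nucleotide mutant (A -> G at index 10).
wild_type = "AUGGGUGGCAAGUGGUCA"
mutant = wild_type[:10] + "G" + wild_type[11:]

print(translate(wild_type))  # MGGKWS
print(translate(mutant))     # MGGRWS
```

The single substitution turns the codon AAG (K) into AGG (R), which is the kind of one-residue change that can alter an epitope.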
14Immune system response
16Known epitopes in a part of HIV's Gag protein
17Epitopes in variable regions
Colors signify different human immune types
18Immunology 101
- Train and kill mechanism
- Immune system sees a virus and trains killer cells (T cells) to kill any cell showing a pattern from the virus
- Patterns are short peptides (8-11 amino acids long) called epitopes
3D structure of an epitope as presented by an infected cell to the killer cells
SLYNTVATL
Amino-acid pattern (peptide)
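Since epitopes are short peptides of 8-11 amino acids, every contiguous window of that length in a viral protein is a candidate. A minimal sketch of that enumeration follows; the function name `candidate_peptides` is hypothetical, and the fragment is the sequence string shown on the later "Sequence data" slide, used here only as sample input.

```python
def candidate_peptides(protein, min_len=8, max_len=11):
    """Return every contiguous substring with length in [min_len, max_len]."""
    peptides = []
    for k in range(min_len, max_len + 1):
        for start in range(len(protein) - k + 1):
            peptides.append(protein[start:start + k])
    return peptides

fragment = "VLSGGKLDKWEKIRLRPGGKKKYKLKHIVWASRELERF"
peps = candidate_peptides(fragment)
print(len(peps), peps[:3])
```

Only a small fraction of these windows are real epitopes, which is why the later slides turn to classifiers and lab assays.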
19But, HIV is variable
- The train-and-kill mechanism doesn't work as well for HIV: the virus adapts through rapid mutation. As soon as the killer cells get the upper hand, the epitopes start changing.
- Possible solution
- Find epitopes that occur frequently across a population of HIV viruses
- Compact these epitopes into a small vaccine (small is good: long vaccines are hard to deliver, and less likely to be effective)
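One simple way to make "occur frequently across a population" concrete is to count, for each short block, the fraction of strains that contain it. The sketch below does this for fixed-length blocks; the function name is hypothetical, the block length of 5 is chosen only to keep the toy example small, and the three inputs are the short fragments shown on the "Sequence data" slide.

```python
from collections import Counter

def kmer_population_frequency(strains, k=9):
    """Fraction of strains that contain each length-k block at least once."""
    counts = Counter()
    for seq in strains:
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)                   # count each block once per strain
    return {kmer: n / len(strains) for kmer, n in counts.items()}

strains = ["KKKYKLKHIVW", "KKKYQLKHIVW", "KKKYRLKHIVW"]
freq = kmer_population_frequency(strains, k=5)
conserved = sorted(kmer for kmer, f in freq.items() if f == 1.0)
print(conserved)
```

Blocks that overlap the variable position appear in only one strain, while blocks downstream of it are shared by all three.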
20The epitome of a virus
21Colors: different patients
Sequence data
VLSGGKLDKWEKIRLRPGGKKKYKLKHIVWASRELERF LSGGKLDRWEK
IRLR KKKYQLKHIVW KKKYRLKHIVW
Epitome
22Machine Learning Approach to Vaccine Design
- Use sample HIV strains from multiple patients
- Build models that compactly encode as many epitopes (or likely epitopes) as possible
- Learning techniques
- Myopic
- Split and merge
- Expectation Maximization
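The slides do not spell out the myopic technique, so the following is only a sketch of one greedy (myopic) strategy under simplifying assumptions: the compact model is built from fixed-length chunks cut from the input strains, and at each step the chunk covering the most not-yet-covered blocks (weighted by how many strains contain them) is added. The names `greedy_epitome`, `chunk_len`, and `max_chunks` are hypothetical, and blocks spanning chunk boundaries are ignored.

```python
from collections import Counter

def kmers(seq, k):
    """Set of all length-k blocks in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def greedy_epitome(strains, k=5, chunk_len=8, max_chunks=2):
    """Myopically add the chunk that covers the most uncovered block weight."""
    weight = Counter()
    for s in strains:
        weight.update(kmers(s, k))            # weight = number of strains with the block
    candidates = sorted({s[i:i + chunk_len]
                         for s in strains
                         for i in range(len(s) - chunk_len + 1)})
    covered, chosen = set(), []
    for _ in range(max_chunks):
        gains = {c: sum(weight[m] for m in kmers(c, k) - covered)
                 for c in candidates}
        best = max(candidates, key=gains.get)
        if gains[best] == 0:                  # nothing left to gain
            break
        chosen.append(best)
        covered |= kmers(best, k)
    coverage = sum(weight[m] for m in covered) / sum(weight.values())
    return chosen, coverage

strains = ["KKKYKLKHIVW", "KKKYQLKHIVW", "KKKYRLKHIVW"]
chunks, cov = greedy_epitome(strains, k=5, chunk_len=8, max_chunks=2)
print(chunks, round(cov, 2))
```

A greedy choice is cheap but can miss better combinations, which is presumably why split-and-merge and EM appear alongside it in the list.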
24Coverage of all 10aa blocks from 245 Gag proteins (Perth data)
25A Vaccine for HIV/AIDS
- Typical vaccines are near copies of the virus that is being vaccinated against
- HIV mutates at a high rate: can't use traditional techniques
- Machine learning allows us to build compact forms of pseudo-virus that cover the diversity of the HIV virus (or rather a pseudo-protein that covers the diversity of a particular HIV protein)
- This pseudo-protein, which we call the epitome, is much shorter than the concatenation of all strains
26Expected (weighted) coverage optimization
We have algorithms to predict this!
p(T), p(S): cleavage, MHC binding, transport
P(XSET): T-cell cross-reactivity
We have some idea about this, too.
27Finding Epitopes and their MHC-I counterparts
28Important to find both epitopes and the MHC-I types that can present them
- Each patient has six MHC-I types (2 As, 2 Bs, 2 Cs)
- Most epitopes can be presented by only a few MHC-I molecules
- Different populations (China, India, South Africa, etc.) have different MHC-I frequencies
29Finding Epitopes and their MHC-I counterparts
- Existing methods
- Trial and error in the wet lab
- Machine learning
- Our methods
- More machine learning
- Machine learning + physics
- Machine learning + wet lab
30Machine Learning
Examples of "peptide is epitope for MHC-I type"
Examples of "peptide is NOT epitope for MHC-I type"
- Classifier
- Logistic regression
- SVM
- Neural net
- Etc
31Issues (from experience)
- Amount of data
- Feature extraction
- Algorithm choice
32Simple feature extraction
SLYNTVATL, A02
- Amino acid at position 1 = S
- Amino acid at position 2 = L
- Amino acid at position 3 = Y
- ...
- Amino acid at position 9 = L
- MHC-I type = A02
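The feature list above amounts to a sparse indicator encoding of a (peptide, MHC-I type) pair. A minimal sketch, with hypothetical feature-name strings such as `pos1=S`:

```python
def simple_features(peptide, mhc_type):
    """One indicator per (position, amino acid) plus one for the MHC-I type."""
    feats = {f"pos{i}={aa}": 1 for i, aa in enumerate(peptide, start=1)}
    feats[f"mhc={mhc_type}"] = 1
    return feats

feats = simple_features("SLYNTVATL", "A02")
print(sorted(feats))
```

A dictionary of active features like this can be passed to any linear classifier, such as logistic regression, that accepts sparse inputs.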
33Simple feature extraction (logistic regression)
34Better feature extraction
SLYNTVATL, A02
- Previously mentioned features
- Amino acid at position 1 = S and MHC-I = A02
- Amino acid at position 2 = L and MHC-I = A02
- ...
- Amino acid at position 9 = L and MHC-I = A02
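The added features are conjunctions of a position-specific amino acid with the MHC-I type. A sketch of that encoding follows, again with hypothetical feature-name strings. The motivation, as far as the slides suggest, is that a linear model with only the simple features cannot express that a residue matters for one MHC-I type but not another.

```python
def better_features(peptide, mhc_type):
    """Simple indicators plus (position, amino acid, MHC-I type) conjunctions."""
    feats = {f"mhc={mhc_type}": 1}
    for i, aa in enumerate(peptide, start=1):
        feats[f"pos{i}={aa}"] = 1                      # previously mentioned feature
        feats[f"pos{i}={aa}&mhc={mhc_type}"] = 1       # conjunction feature
    return feats

feats = better_features("SLYNTVATL", "A02")
print(len(feats))  # 19
```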
35Better feature extraction
36Machine learning + physics, with David Baker and Ora Furman, UW
37Machine learning + physics, with David Baker and Ora Furman, UW
38Machine learning + wet lab, with Christian Brander and Nicole Frahm, Harvard, and Jennifer Listgarten, U. Toronto
peptide, e.g., NYTSLIYTLIEESQNQQEK
Patients: Pt1, Pt2, Pt3, Pt4, ..., PtN
- If a patient's blood reacts with a peptide, then it is very likely that some subsequence of the peptide is an epitope for at least one of the patient's six MHC-I types
- From observations for many patients, tease out the responsible MHC-I type(s)
- Find the subsequence in the lab
39What makes a good solution for a peptide?
- The fewer the responsible MHC-I types the better
- An MHC-I type gets points for appearing in
reacting patients and loses points for
appearing in non-reacting patients
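The point system described above can be sketched as a simple tally. The +1/-1 values are an assumption made for illustration (the slides do not give the actual scoring), and the patient records below are hypothetical.

```python
from collections import Counter

def score_mhc_types(patients):
    """patients: list of (set of MHC-I types, reacted?) pairs for one peptide."""
    scores = Counter()
    for mhc_types, reacted in patients:
        for t in mhc_types:
            scores[t] += 1 if reacted else -1  # reward reactors, penalize non-reactors
    return scores

# Hypothetical observations for a single peptide.
patients = [
    ({"A02", "B07"}, True),
    ({"A02", "B08"}, True),
    ({"A01", "B07"}, False),
]
scores = score_mhc_types(patients)
print(scores.most_common())
```

A tally like this ignores noise, leaks, and explaining away, which is what motivates a probabilistic model.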
40Not easy
- Lots of noise: p(react | is epitope) = 0.25
- Leaks: may see a reaction even when the peptide is not an epitope for any MHC-I type of the patient
- Explaining away: when a patient has two MHC-I types that can be responsible for a reaction, those two get less credit
- Don't actually know
- p(react | is epitope)
- Leak probabilities
- Example solution
(Figure: MHC-I types A, B, C shown against reacting patients and non-reacting patients)
41Graphical model for a peptide
(Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03; intermediate nodes A01c, A02c, A03c, B01c, B02c, B03c, C01c, C02c, C03c; OR nodes with leak p0 leading to "pt1 reacts" and "pt2 reacts")
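One reading of this model is a noisy-OR: each responsible MHC-I type that a patient carries independently triggers a reaction with probability p, and a leak triggers one with probability p0 even when no responsible type is present. The sketch below computes the reaction probability and the data log-likelihood under that assumption; the function names and the patient records are hypothetical, and p and p0 must lie strictly between 0 and 1 for the logarithms to be defined.

```python
import math

def p_react(patient_types, responsible, p, p0):
    """Noisy-OR: the reaction fails only if the leak and every active link fail."""
    n_active = len(set(patient_types) & set(responsible))
    return 1.0 - (1.0 - p0) * (1.0 - p) ** n_active

def log_likelihood(patients, responsible, p, p0):
    """Sum of log-probabilities of the observed reactions."""
    total = 0.0
    for types, reacted in patients:
        pr = p_react(types, responsible, p, p0)
        total += math.log(pr if reacted else 1.0 - pr)
    return total

# Hypothetical observations for one peptide.
patients = [({"A02", "B07"}, True), ({"A01", "B07"}, False)]
print(log_likelihood(patients, {"A02"}, p=0.25, p0=0.05))
print(log_likelihood(patients, {"B07"}, p=0.25, p0=0.05))
```

With two responsible types in one patient, the reaction probability rises only from p to 1 - (1 - p)^2, which is how the model gives each of them less credit.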
42(Directed Acyclic) Graphical Models
$p(F,B,T,G,S) = p(F)\,p(B \mid F)\,p(T \mid F,B)\,p(G \mid F,B,T)\,p(S \mid F,B,T,G)$
$= \prod_{\text{vars}} p(\text{var} \mid \text{parents})$
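The product form can be sketched in code for binary variables: the joint probability of a full assignment is the product of each variable's conditional probability given its parents. The three-node network and all numbers below are made up for illustration and are not the network from the slide.

```python
from itertools import product

def joint_probability(assignment, parents, cpt):
    """Product over variables of p(var | parents) for binary variables."""
    prob = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[var])
        p_true = cpt[var][parent_values]      # p(var = True | parent values)
        prob *= p_true if value else 1.0 - p_true
    return prob

# Hypothetical DAG: A -> C <- B, with made-up conditional probabilities.
parents = {"A": (), "B": (), "C": ("A", "B")}
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(True, True): 0.9, (True, False): 0.5,
          (False, True): 0.4, (False, False): 0.1},
}

total = sum(joint_probability(dict(zip("ABC", values)), parents, cpt)
            for values in product([True, False], repeat=3))
print(joint_probability({"A": True, "B": True, "C": True}, parents, cpt), total)
```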
43Graphical model for a peptide
(Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03)
44Graphical model for a peptide
(Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03; a link labeled p; nodes A02c, A03c, B01c, B02c, C01c, C03c)
45Graphical model for a peptide
(Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03; links labeled p to nodes A02c, A03c, B01c, B02c, C01c, C03c)
46Graphical model for a peptide
(Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03; nodes A02c, A03c, B01c, B02c, C01c, C03c feeding an OR node with leak p0, leading to "pt1 reacts")
47Graphical model for a peptide
(Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03; intermediate nodes A01c, A02c, A03c, B01c, B02c, B03c, C01c, C02c, C03c; OR nodes with leak p0 leading to "pt1 reacts" and "pt2 reacts")
48Solving the model
- Principle: find the p, p0 and MHC-I assignments that maximize the likelihood of the data
- Algorithm
- Guess p, p0
- Iterate
- Use relaxation method to find max likelihood MHC-I assignments
- Use gradient descent to find values of p, p0 that maximize the likelihood
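The alternating scheme above can be sketched as follows, under several assumptions that go beyond the slides: the likelihood is a noisy-OR with link probability p and leak p0, the relaxation step is replaced by a simple greedy bit-flip search over MHC-I assignments, and the parameter update is numerical gradient ascent on the mean log-likelihood (equivalently, descent on its negative) with p and p0 clipped to [0.01, 0.99]. All names and the patient data are hypothetical.

```python
import math

def log_lik(patients, responsible, p, p0):
    """Noisy-OR log-likelihood; patients is a list of (MHC-I type set, reacted?)."""
    total = 0.0
    for types, reacted in patients:
        pr = 1.0 - (1.0 - p0) * (1.0 - p) ** len(types & responsible)
        total += math.log(pr if reacted else 1.0 - pr)
    return total

def best_assignment(patients, all_types, p, p0):
    """Greedy stand-in for relaxation: flip one type at a time while it helps."""
    responsible, improved = set(), True
    while improved:
        improved = False
        for t in sorted(all_types):
            trial = responsible ^ {t}
            if log_lik(patients, trial, p, p0) > log_lik(patients, responsible, p, p0) + 1e-12:
                responsible, improved = trial, True
    return responsible

def clip(x, lo=0.01, hi=0.99):
    return min(hi, max(lo, x))

def fit(patients, p=0.25, p0=0.05, rounds=20, inner=200, step=0.01, eps=1e-4):
    """Alternate between assignment search and gradient updates of p and p0."""
    all_types = set().union(*(types for types, _ in patients))
    n = len(patients)
    responsible = set()
    for _ in range(rounds):
        responsible = best_assignment(patients, all_types, p, p0)
        for _ in range(inner):
            # Central-difference gradients of the mean log-likelihood.
            gp = (log_lik(patients, responsible, clip(p + eps), p0)
                  - log_lik(patients, responsible, clip(p - eps), p0)) / (2 * eps * n)
            g0 = (log_lik(patients, responsible, p, clip(p0 + eps))
                  - log_lik(patients, responsible, p, clip(p0 - eps))) / (2 * eps * n)
            p, p0 = clip(p + step * gp), clip(p0 + step * g0)
    return responsible, p, p0

# Hypothetical data: every carrier of A02 reacts, nobody else does.
patients = [
    ({"A02", "B07"}, True), ({"A02", "B08"}, True), ({"A02", "B15"}, True),
    ({"B07", "B08"}, False), ({"B08", "B15"}, False), ({"B07", "B15"}, False),
]
responsible, p_hat, p0_hat = fit(patients)
print(responsible, round(p_hat, 2), round(p0_hat, 2))
```

On this clean toy data the search settles on A02 alone and pushes p toward its upper clip and p0 toward its lower clip; real assay data would be far noisier.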
49Status
- Most likely assignments have been confirmed
50Summary
- HIV vaccine design is a data-intensive problem
- Data is in the form of discrete sequences, making it ideal for computer-science/machine-learning analysis
- Machine learning approaches are instrumental in finding epitopes and vaccine compression
- Work in progress: our vaccine designs are scheduled to be tested at Mass General in vitro this summer
52What if there are fewer epitopes?
Fewer epitopes
53What if there are more epitopes?
More epitopes
If uncertain, should err in favor of more epitopes (overlap provides some robustness)
54Rational Design of HIV/AIDS Vaccines
- Many collaborators
- Microsoft: Nebojsa Jojic, David Heckerman, Vladimir Jojic, Chris Meek, Brendan Frey, Carl Kadie, Jennifer Listgarten
- Royal Perth Hospital: Simon Mallal
- University of Washington: Jim Mullins
- Harvard/Mass General: Bruce Walker, Christian Brander
- Los Alamos National Lab: Bette Korber
55AIDS 101
- AIDS (acquired immune deficiency syndrome) was first described in the early 1980s
- HIV (human immunodeficiency virus) causes AIDS; it was isolated in 1983; 40 million people are now infected
- HIV is an RNA virus: protein coat + copying proteins + RNA
- Copying proteins + RNA enter the cell
- RNA is reverse transcribed to DNA
- DNA inserts into the cell's DNA and is transcribed and translated to more HIV protein
- Infected cell assembles more copies of HIV
- Cell bursts, releasing many new copies of HIV
57Immunology 101
- Immune system fights viruses through train and kill mechanism
- Immune system sees a virus and trains killer cells (T cells) to kill any cell showing a pattern from the virus
- Patterns are short peptides (8-11 amino acids long) called epitopes
SLYNTVATL
Amino-acid pattern (peptide)
3D structure of an epitope as presented by an infected cell to the killer cells
59HIV is different
- The train-and-kill mechanism doesn't work for HIV: the virus adapts through rapid mutation. As soon as the killer cells get the upper hand, the epitopes start changing.
- Possible solution
- Find epitopes that occur commonly across a population of HIV viruses
- Compact these epitopes into a small vaccine (small is good: long vaccines are hard to deliver, and less likely to be effective)
60Important to find both epitopes and the MHC-I types that can present them
- Each patient has six MHC-I types (2 As, 2 Bs, 2 Cs)
- Most epitopes can be presented by only a few MHC-I molecules
- Different populations (China, India, South Africa, etc.) have different MHC-I frequencies
61Machine learning, HIV, and SPAM
62- Use machine learning to find patterns of words and phrases that indicate spam
- Free!
- Money
- Click here
- Vi@gr@
- Use machine learning to find epitopes that stimulate the immune system
SLYNTVATL