Labeling Semantic Relations Between Proteins
- Barbara Rosario, Marti Hearst, Janice Hamer
Protein-Protein interactions
- One of the most important challenges in modern genomics, with many applications throughout biology
- There are several protein-protein interaction databases (BIND, MINT, ...), all manually curated
HIV-1 Protein Interaction Database
- Documents interactions between HIV-1 proteins and
  - host cell proteins
  - other HIV-1 proteins
  - diseases associated with HIV/AIDS
- 2224 pairs of interacting proteins, 65 interaction types
http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions
HIV-1 Protein Interaction Database
Protein 1 | Protein 2 | Paper ID | Interaction Type
Tat, p14 | AKT3 | 11156964, 11994280, ... | activates
AIP1 | Gag, Pr55 | 14519844, ... | binds
Tat, p14 | CDK2 | 9223324 | induces
Tat, p14 | CDK2 | 7716549 | enhances
Tat, p14 | CDK2 | 9525916 | downregulates
...
Most common interactions
Protein-Protein interactions
- Idea: use this database to label data
Protein 1 | Protein 2 | Interaction | Paper ID
Tat, p14 | AKT3 | activates | 11156964
- Extract from the paper (Paper ID) all the sentences that contain both Protein 1 and Protein 2
- Label them with the interaction given in the database (here, activates)
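The labeling step on this slide can be sketched in Python. This is a minimal illustration, not the authors' code: the `DbEntry` record and its field names, treating "Tat, p14" as two synonyms of one protein, and whole-word, case-sensitive name matching are all assumptions made here.

```python
import re
from typing import List, NamedTuple, Tuple


class DbEntry(NamedTuple):
    """One database row (hypothetical field names)."""
    protein1: List[str]  # assumed synonym list, e.g. ["Tat", "p14"]
    protein2: List[str]
    interaction: str
    paper_id: str


def mentions(sentence: str, names: List[str]) -> bool:
    """True if any synonym occurs in the sentence as a whole word (case-sensitive)."""
    return any(re.search(rf"\b{re.escape(n)}\b", sentence) for n in names)


def label_sentences(entry: DbEntry, sentences: List[str]) -> List[Tuple[str, str]]:
    """Keep sentences naming both proteins and label each with the database interaction."""
    return [
        (s, entry.interaction)
        for s in sentences
        if mentions(s, entry.protein1) and mentions(s, entry.protein2)
    ]


# Made-up sentences standing in for the text of paper 11156964.
entry = DbEntry(["Tat", "p14"], ["AKT3"], "activates", "11156964")
sentences = [
    "Tat increased AKT3 phosphorylation.",
    "AKT3 is a serine/threonine kinase.",
    "p14 and AKT3 co-localize in the cytoplasm.",
]
labeled = label_sentences(entry, sentences)
```

Both kept sentences receive the label "activates" because the label comes from the database entry, not from the sentence text. This is what makes the labeling automatic, and also why some labels can be noisy.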
Protein-Protein interactions
- Use citances (citation sentences)
- Find all the papers that cite the papers in the database
Protein 1 | Protein 2 | Interaction | Paper ID
Tat, p14 | AKT3 | activates | 11156964
ID 9918876
ID 9971769
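Finding the citing papers amounts to inverting a citation graph. A minimal sketch, assuming reference lists are already available as a "paper -> papers it cites" mapping (the slides do not say where the citation data comes from); the paper IDs below are placeholders, not real identifiers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_cited_by(references: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert 'paper -> papers it cites' into 'paper -> papers that cite it'."""
    cited_by: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return dict(cited_by)


def citing_papers(db_paper_ids: Iterable[str],
                  cited_by: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each paper referenced in the database, the set of papers citing it."""
    return {pid: cited_by.get(pid, set()) for pid in db_paper_ids}


# Placeholder IDs: paper "A" cites "X" and "Y", "B" cites "X", "C" cites "Z".
references = {"A": ["X", "Y"], "B": ["X"], "C": ["Z"]}
result = citing_papers(["X", "Q"], build_cited_by(references))
```

A database paper that nobody cites (here "Q") simply gets an empty set, so it contributes no citation sentences.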
Protein-Protein interactions
- From the citing papers, extract the citation sentences
- From these, extract the sentences that contain both Protein 1 and Protein 2
- Label them with the interaction given in the database
Protein 1 | Protein 2 | Interaction | Paper ID
Tat, p14 | AKT3 | activates | 11156964
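Filtering citation sentences can be sketched the same way. Assumptions made here: the citation to the database paper has already been replaced by the marker `TARGET_CITATION` (the form that appears in the citation example in this deck), and protein names are matched as whole words ignoring case, so that "GAG" matches "Gag". The function names are hypothetical.

```python
import re
from typing import List, Tuple

MARKER = "TARGET_CITATION"  # assumed placeholder for the citation to the database paper


def names_protein(sentence: str, names: List[str]) -> bool:
    """True if any synonym appears as a whole word, ignoring case."""
    return any(re.search(rf"\b{re.escape(n)}\b", sentence, flags=re.IGNORECASE)
               for n in names)


def label_citances(sentences: List[str], protein1: List[str], protein2: List[str],
                   interaction: str) -> List[Tuple[str, str]]:
    """Keep sentences that cite the target paper and name both proteins; label them."""
    return [
        (s, interaction)
        for s in sentences
        if MARKER in s and names_protein(s, protein1) and names_protein(s, protein2)
    ]


citing_sentences = [
    # Abridged from the citation example in this deck.
    "They also demonstrate that the GAG protein from membrane-containing viruses, "
    "such as HIV, binds to Alix/AIP1 (TARGET_CITATION CITATION).",
    # Made-up sentence: names only one protein and does not cite the target paper.
    "AIP1 expression was measured by Western blot.",
]
labeled = label_citances(citing_sentences, ["AIP1"], ["Gag", "Pr55"], "binds")
```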
Examples of sentences
- Papers
  - The interpretation of these results was slightly complicated by the fact that AIP-1/ALIX depletion by using siRNA likely had deleterious effects on cell viability, because a Western blot analysis showed slightly reduced Gag expression at later time points (fig. 5C).
- Citations
  - They also demonstrate that the GAG protein from membrane-containing viruses, such as HIV, binds to Alix/AIP1, thereby recruiting the ESCRT machinery to allow budding of the virus from the cell surface (TARGET_CITATION CITATION).
10 Interaction types
Protein-Protein interactions
- Tasks, given sentences from Paper ID and/or citation sentences to Paper ID:
  - Predict the interaction type given in the HIV database for Paper ID (a 10-way classification problem)
  - Extract the proteins involved
Protein-Protein interactions
- Models
  - Dynamic graphical model
  - Naïve Bayes
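For reference, a generic multinomial Naïve Bayes classifier over bag-of-words features with add-one smoothing is sketched below. The slides do not specify the features or the smoothing, so both are assumptions, and the dynamic graphical model is not shown; the training tokens are toy data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def log_score(self, tokens: List[str], label: str) -> float:
        """log P(label) plus the sum of log P(word | label) over known words."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, tokens: List[str]) -> str:
        """Return the interaction label with the highest log score."""
        return max(self.class_counts, key=lambda y: self.log_score(tokens, y))


model = NaiveBayes().fit(
    [["tat", "activates", "akt3"], ["gag", "binds", "aip1"],
     ["activates", "kinase"], ["binds", "protein"]],
    ["activates", "binds", "activates", "binds"],
)
```

Working in log space avoids numeric underflow when a sentence has many words.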
Graphical Models
Evaluation
- Evaluation at the document level, on three sets of sentences:
  - All (sentences from papers and citations)
  - Papers (only sentences from papers)
  - Citations (only citation sentences)
- Trigger word approach (baseline)
  - List of keywords for each interaction (e.g., for inhibits: inhibitor, inhibition, inhibit, etc.)
  - If a keyword is present, assign the corresponding interaction
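The trigger word baseline can be sketched as a keyword lookup. Only the "inhibits" keywords come from the slide; the other keyword lists, the lowercase substring matching, and the rule that the first matching interaction wins when several keywords occur are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Only the "inhibits" list comes from the slide; the others are hypothetical.
TRIGGERS: Dict[str, List[str]] = {
    "inhibits": ["inhibitor", "inhibition", "inhibit"],
    "activates": ["activate", "activation", "activator"],
    "binds": ["bind", "binding"],
}


def trigger_word_label(sentence: str,
                       triggers: Dict[str, List[str]] = TRIGGERS,
                       default: Optional[str] = None) -> Optional[str]:
    """Return the first interaction whose keyword occurs in the sentence, else default."""
    text = sentence.lower()
    for interaction, keywords in triggers.items():
        if any(keyword in text for keyword in keywords):
            return interaction
    return default
```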
Results
- Accuracies (%) on interaction classification
Model | All | Papers | Citations
Markov Model | 60.5 | 57.8 | 53.4
Naïve Bayes | 58.1 | 57.8 | 55.7
Baselines
Most frequent interaction | 21.8 | 11.1 | 26.1
TriggerW | 20.1 | 24.4 | 20.4
TriggerW BO | 25.8 | 40.0 | 26.1
(Roles hidden)
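The most-frequent-interaction baseline and document-level accuracy can be sketched as follows. How sentence-level predictions are combined into one prediction per document is not stated on the slides; the majority vote below is an assumption, and all labels are toy values.

```python
from collections import Counter
from typing import Callable, List


def most_frequent_baseline(train_labels: List[str]) -> Callable[[object], str]:
    """Return a classifier that always predicts the most common training interaction."""
    label = Counter(train_labels).most_common(1)[0][0]
    return lambda _document: label


def document_prediction(sentence_labels: List[str]) -> str:
    """Assumed aggregation: majority vote over one document's sentence predictions."""
    return Counter(sentence_labels).most_common(1)[0][0]


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Percentage of documents whose predicted interaction equals the database label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally long, non-empty label lists")
    return 100.0 * sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```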
Results: confusion matrix
For All (sentences from papers and citations); overall accuracy 60.5%
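As a reminder of how a confusion matrix relates to overall accuracy, a small sketch with toy labels: rows are true interactions, columns are predicted ones, and accuracy is the share of documents on the diagonal. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def confusion_matrix(gold: List[str], predicted: List[str],
                     labels: List[str]) -> List[List[int]]:
    """Cell [i][j] counts documents with true label labels[i] predicted as labels[j]."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in labels] for g in labels]


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Percentage of documents on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


matrix = confusion_matrix(["a", "a", "b", "c"], ["a", "b", "b", "c"], ["a", "b", "c"])
```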
Hiding the protein names
- Replaced protein names with the token PROT_NAME
- Example: "Selective CXCR4 antagonism by Tat" becomes "Selective PROT_NAME antagonism by PROT_NAME"
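Masking can be sketched with a single regular expression, assuming the protein names to hide are already known (for instance from the database entry); the slides do not say how the names were located, and the function name is hypothetical.

```python
import re
from typing import List


def hide_protein_names(sentence: str, protein_names: List[str],
                       token: str = "PROT_NAME") -> str:
    """Replace whole-word occurrences of the given protein names with a placeholder token."""
    if not protein_names:
        return sentence
    # Longest names first so that a longer name wins over a shorter name it contains.
    names = sorted(protein_names, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.sub(pattern, token, sentence)
```

Sorting by length matters because regex alternation takes the first alternative that matches: with "Gag" tried first, "Gag-Pol" would become "PROT_NAME-Pol".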
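Results with protein names hidden
Accuracies (%); in parentheses, the relative change (%) with respect to the accuracies obtained with protein names
Model | Papers | Citations
Markov Model | 44.4 (-23.1) | 52.3 (-2.0)
Naïve Bayes | 46.7 (-19.2) | 53.4 (-4.1)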
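The parenthesized numbers can be checked against the earlier results table: each is the relative change from the accuracy with protein names to the accuracy without them, e.g. (44.4 - 57.8) / 57.8 is about -23.2%. Recomputing from the rounded accuracies reproduces the slide within 0.1 points; the small differences presumably come from rounding of the underlying accuracies.

```python
# Accuracies (%) copied from the two results tables.
with_names = {
    ("Markov Model", "Papers"): 57.8, ("Markov Model", "Citations"): 53.4,
    ("Naive Bayes", "Papers"): 57.8, ("Naive Bayes", "Citations"): 55.7,
}
without_names = {
    ("Markov Model", "Papers"): 44.4, ("Markov Model", "Citations"): 52.3,
    ("Naive Bayes", "Papers"): 46.7, ("Naive Bayes", "Citations"): 53.4,
}
# Relative changes as printed on the slide.
reported = {
    ("Markov Model", "Papers"): -23.1, ("Markov Model", "Citations"): -2.0,
    ("Naive Bayes", "Papers"): -19.2, ("Naive Bayes", "Citations"): -4.1,
}


def relative_change(new: float, old: float) -> float:
    """Change from old to new, as a percentage of old."""
    return 100.0 * (new - old) / old


recomputed = {k: round(relative_change(without_names[k], with_names[k]), 1)
              for k in with_names}
```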
Protein extraction
- (Protein name tagging, role extraction)
- The identification of all the proteins present in the sentence that are involved in the interaction
- Examples:
  - These results suggest that Tat-induced phosphorylation of serine 5 by CDK9 might be important after transcription has reached the 36 position, at which time CDK7 has been released from the complex.
  - Tat might regulate the phosphorylation of the RNA polymerase II carboxyl-terminal domain in pre-initiation complexes by activating CDK7.
Protein extraction results
Sentences | Recall | Precision | F-measure
All | 0.74 | 0.85 | 0.79
Papers | 0.56 | 0.83 | 0.67
Citations | 0.75 | 0.84 | 0.79
No dictionary of protein names was used.
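The F-measure column is the harmonic mean of precision P and recall R, F = 2PR / (P + R). The sketch below scores one sentence's extracted protein names against a gold set and checks that each table row is internally consistent. Set-based exact matching of names is an assumption, since the slides do not describe the scoring procedure, and the protein sets in the test are hypothetical.

```python
from typing import Set, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of the proteins extracted from one sentence."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Table rows as (recall, precision, F-measure).
table = {"All": (0.74, 0.85, 0.79), "Papers": (0.56, 0.83, 0.67),
         "Citations": (0.75, 0.84, 0.79)}
recomputed = {row: round(f_measure(p, r), 2) for row, (r, p, _f) in table.items()}
```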
Conclusions of protein-protein interaction project
- Encouraging results for the automatic classification of protein-protein interactions
- Use of an existing database for gathering labeled data
- Use of citations