1
Labeling Semantic Relations Between Proteins
  • Barbara Rosario, Marti Hearst, Janice Hamer

2
Protein-Protein interactions
  • One of the most important challenges in modern
    genomics, with many applications throughout
    biology
  • There are several protein-protein interaction
    databases (BIND, MINT, ...), all manually curated

3
HIV-1 Protein Interaction Database
  • Documents interactions between HIV-1 proteins and
      • host cell proteins
      • other HIV-1 proteins
      • proteins from disease organisms associated with HIV/AIDS
  • 2224 pairs of interacting proteins, 65 interaction types

http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions
4
HIV-1 Protein Interaction Database
Protein 1   Protein 2   Paper ID                Interaction Type
Tat, p14    AKT3        11156964, 11994280..    activates
AIP1        Gag, Pr55   14519844,               binds
Tat, p14    CDK2        9223324                 induces
Tat, p14    CDK2        7716549                 enhances
Tat, p14    CDK2        9525916                 downregulates
...
5
Most common interactions
6
Protein-Protein interactions
  • Idea: use this to label data

Protein 1   Protein 2   Interaction   Paper ID
Tat, p14    AKT3        activates     11156964
7
Protein-Protein interactions
  • Idea: use this to label data

Protein 1   Protein 2   Interaction   Paper ID
Tat, p14    AKT3        activates     11156964

  • Extract from the paper all the sentences that mention
    both Protein 1 and Protein 2
  • Label them with the interaction given in the
    database (here, activates)
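
The labeling idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sentence splitter is deliberately naive, the alias lists and the function names (`split_sentences`, `mentions`, `label_sentences`) are hypothetical, and the paper text is invented. Only the database row (Tat, p14 / AKT3 / activates) comes from the slides.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def mentions(sentence, aliases):
    """True if any alias occurs in the sentence as a whole word (case-insensitive)."""
    return any(
        re.search(r"(?<!\w)" + re.escape(alias) + r"(?!\w)", sentence, re.IGNORECASE)
        for alias in aliases
    )

def label_sentences(text, protein1_aliases, protein2_aliases, interaction):
    """Keep sentences naming both proteins and attach the database interaction."""
    return [
        (sentence, interaction)
        for sentence in split_sentences(text)
        if mentions(sentence, protein1_aliases) and mentions(sentence, protein2_aliases)
    ]

# Database row from the slide: Tat, p14 | AKT3 | activates | 11156964.
# The paper text below is invented for illustration only.
paper_text = (
    "Tat activates AKT3 in infected cells. "
    "AKT3 levels were measured by Western blot. "
    "We also observed that p14 binds near AKT3."
)

labeled = label_sentences(paper_text, ["Tat", "p14"], ["AKT3"], "activates")
for sentence, label in labeled:
    print(label, "|", sentence)
```

Note that the third invented sentence is labeled "activates" even though its verb is "binds": every sentence that mentions both proteins inherits the database label, so labels obtained this way are noisy.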
8
Protein-Protein interactions
  • Use citances (citation sentences)
  • Find all the papers that cite the papers in the
    database

Protein 1   Protein 2   Interaction   Paper ID
Tat, p14    AKT3        activates     11156964
ID 9918876
ID 9971769
9
Protein-Protein interactions
  • From the citing papers, extract the citation
    sentences
  • From these, extract the sentences with Protein 1
    and Protein 2
  • Label them

Protein 1   Protein 2   Interaction   Paper ID
Tat, p14    AKT3        activates     11156964
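
A minimal sketch of the citance extraction step, assuming the citing papers use numeric citation marks such as "[7]". The function name `extract_citances`, the reference numbers, and the example sentences are hypothetical; the `TARGET_CITATION` and `CITATION` tokens follow the example citation sentence shown in these slides.

```python
import re

# Assumed citation style: numeric marks in square brackets, e.g. "[12]".
CITATION_MARK = re.compile(r"\[(\d+)\]")

def extract_citances(sentences, target_ref):
    """Return the sentences that cite target_ref, with citation marks normalized."""
    citances = []
    for sentence in sentences:
        refs = {int(number) for number in CITATION_MARK.findall(sentence)}
        if target_ref not in refs:
            continue  # this sentence does not cite the database paper
        normalized = CITATION_MARK.sub(
            lambda m: "TARGET_CITATION" if int(m.group(1)) == target_ref else "CITATION",
            sentence,
        )
        citances.append(normalized)
    return citances

# Invented sentences from a hypothetical citing paper; reference 7 stands in
# for the paper listed in the database.
citing_paper = [
    "Gag binds to AIP1 during budding [7] [9].",
    "Budding requires the ESCRT machinery [9].",
    "Cells were cultured as described.",
]

citances = extract_citances(citing_paper, target_ref=7)
print(citances)
```

The extracted citances would then be filtered for mentions of both proteins and labeled with the database interaction, as the bullets above describe.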
10
Examples of sentences
  • Papers
      • The interpretation of these results was slightly
        complicated by the fact that AIP-1/ALIX depletion
        by using siRNA likely had deleterious effects on
        cell viability , because a Western blot analysis
        showed slightly reduced Gag expression at later
        time points (fig. 5C ).
  • Citations
      • They also demonstrate that the GAG protein from
        membrane - containing viruses , such as HIV ,
        binds to Alix / AIP1 , thereby recruiting the
        ESCRT machinery to allow budding of the virus
        from the cell surface (TARGET_CITATION CITATION
        ) .

11
10 Interaction types
12
Protein-Protein interactions
  • Tasks
      • Given sentences from Paper ID and/or citation
        sentences to Paper ID
      • Predict the interaction type given in the HIV
        database for Paper ID
      • Extract the proteins involved
  • Interaction prediction is a 10-way classification
    problem

13
Protein-Protein interactions
  • Models
      • Dynamic graphical model
      • Naïve Bayes
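
The slides do not give the details of either model. As a rough illustration of the Naïve Bayes idea only, here is a generic bag-of-words multinomial classifier with add-one smoothing. The features, the smoothing, the class name `NaiveBayes`, and the toy training sentences are all assumptions, not the authors' model.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Generic multinomial Naive Bayes over lowercase whitespace tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for illustration (protein names already abstracted).
train_docs = [
    "PROT_A binds PROT_B",
    "PROT_A binding to PROT_B was observed",
    "PROT_A activates PROT_B",
    "activation of PROT_B by PROT_A",
]
train_labels = ["binds", "binds", "activates", "activates"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("PROT_A strongly activates PROT_B"))
```

In the setting of these slides the label set would be the 10 interaction types rather than the two used here.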

14
Graphical Models
15
Evaluation
  • Evaluation at document level
      • All (sentences from papers + citations)
      • Papers (only sentences from papers)
      • Citations (only citation sentences)
  • Trigger word approach
      • List of keywords (e.g., for inhibits: inhibitor,
        inhibition, inhibit, etc.)
      • If a keyword is present, assign the corresponding
        interaction
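
The trigger word baseline can be sketched as a keyword lookup. Only the keyword list for "inhibits" comes from the slide; the other lists, the substring matching, the `default` fallback parameter, and the function name `trigger_word_label` are assumptions made for illustration.

```python
# Keyword lists per interaction. Only the "inhibits" list is from the slide;
# the other two are hypothetical examples.
TRIGGERS = {
    "inhibits": ["inhibitor", "inhibition", "inhibit"],
    "binds": ["bind", "binding"],
    "activates": ["activate", "activation"],
}

def trigger_word_label(sentence, triggers=TRIGGERS, default=None):
    """Return the first interaction whose keyword occurs in the sentence.

    Matching is by lowercase substring, so "inhibit" also matches "inhibits"
    and "inhibited". `default` is returned when no keyword is found.
    """
    text = sentence.lower()
    for interaction, keywords in triggers.items():
        if any(keyword in text for keyword in keywords):
            return interaction
    return default

print(trigger_word_label("Tat inhibition of CDK2 was observed"))  # inhibits
print(trigger_word_label("Gag binds to AIP1"))                    # binds
print(trigger_word_label("Cells were cultured as described"))     # None
```

When several interactions have a keyword in the same sentence, this sketch simply returns the first match in dictionary order; the slides do not say how such ties were resolved.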

16
Results
  • Accuracies (%) on interaction classification

Model               All     Papers   Citations
Markov Model        60.5    57.8     53.4
Naïve Bayes         58.1    57.8     55.7
Baselines
Most freq. inter.   21.8    11.1     26.1
TriggerW            20.1    24.4     20.4
TriggerW BO         25.8    40.0     26.1
(Roles hidden)
17
Results: confusion matrix
For the All setting. Overall accuracy: 60.5
18
Hiding the protein names
  • Replaced protein names with the token PROT_NAME
      • Before: Selective CXCR4 antagonism by Tat
      • After: Selective PROT_NAME antagonism by PROT_NAME
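
A minimal sketch of this replacement step, assuming the protein names in each sentence are already known (for example from the database entry). The function name `hide_protein_names` and the whole-word, case-sensitive matching are illustrative choices.

```python
import re

def hide_protein_names(sentence, protein_names, token="PROT_NAME"):
    """Replace each whole-word occurrence of a known protein name with `token`."""
    if not protein_names:
        return sentence
    # Try longer names first so a short name never pre-empts a longer one.
    names = sorted(protein_names, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(name) for name in names) + r")(?!\w)"
    )
    return pattern.sub(token, sentence)

masked = hide_protein_names("Selective CXCR4 antagonism by Tat", ["CXCR4", "Tat"])
print(masked)  # Selective PROT_NAME antagonism by PROT_NAME
```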

19
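Results with no protein names
Model          Papers          Citations
Markov Model   44.4 (-23.1)    52.3 (-2.0)
Naïve Bayes    46.7 (-19.2)    53.4 (-4.1)
(Values in parentheses appear to be the relative change, in %, from the accuracies obtained with protein names)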
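
The figures in parentheses are consistent with a relative change from the earlier Papers and Citations accuracies, although the slides do not say so explicitly. The check below works under that assumption; the helper name `relative_change` is hypothetical and all accuracies are copied from the two results tables.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Accuracies with protein names (earlier results table).
with_names = {
    "Markov Model": {"Papers": 57.8, "Citations": 53.4},
    "Naïve Bayes": {"Papers": 57.8, "Citations": 55.7},
}
# Accuracies with protein names hidden (this table).
without_names = {
    "Markov Model": {"Papers": 44.4, "Citations": 52.3},
    "Naïve Bayes": {"Papers": 46.7, "Citations": 53.4},
}

for model, settings in with_names.items():
    for setting, before in settings.items():
        after = without_names[model][setting]
        print(f"{model:12s} {setting:9s} {relative_change(before, after):6.2f}%")
```

The computed values agree with the figures on the slide to within 0.1. Under this reading, hiding the names costs the Papers setting about a fifth of its accuracy, while the Citations setting loses only a few percent.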
20
Protein extraction
  • (Protein name tagging, role extraction)
  • The identification of all the proteins present in
    the sentence that are involved in the interaction
      • These results suggest that Tat - induced
        phosphorylation of serine 5 by CDK9 might be
        important after transcription has reached the +36
        position, at which time CDK7 has been released
        from the complex.
      • Tat might regulate the phosphorylation of the RNA
        polymerase II carboxyl - terminal domain in pre -
        initiation complexes by activating CDK7.

21
Protein extraction results
            Recall   Precision   F-measure
All         0.74     0.85        0.79
Papers      0.56     0.83        0.67
Citations   0.75     0.84        0.79
No dictionary used
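
The F-measure column is consistent with the balanced F-measure (the harmonic mean of precision and recall). That the slides use this particular variant is an assumption; the check below recomputes it from the recall and precision values in the table, using a hypothetical helper `f_measure`.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (recall, precision) pairs copied from the table above.
results = {
    "All": (0.74, 0.85),
    "Papers": (0.56, 0.83),
    "Citations": (0.75, 0.84),
}

f_scores = {
    name: round(f_measure(precision, recall), 2)
    for name, (recall, precision) in results.items()
}
print(f_scores)  # {'All': 0.79, 'Papers': 0.67, 'Citations': 0.79}
```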
22
Conclusions of protein-protein interaction project
  • Encouraging results for the automatic
    classification of protein-protein interactions
  • Use of an existing database for gathering labeled
    data
  • Use of citations