Interactions and Ontologies

1
Interactions and Ontologies

CBW Bioinformatics Workshop, February 23rd, 2006,
Toronto
Christopher Hogue, The Blueprint Initiative
2
About this talk

Interoperability, Standards and Systems - A
Historic Perspective
Understanding Biomolecular Function
A BIND Interaction Record
Interaction and Reaction Data Models
Interaction Experiments
Yeast Two Hybrid, Affinity Purification and False
Positives
Spoke and Matrix models for complexes of Unknown
Topology
Ontologies

3
Interaction Databases

Aminoacyl-tRNA Synthetases Database
ASEdb - Alanine Scanning Energetics database
BBID - Biological Biochemical Image Database
BIND - Biomolecular Interaction Network Database
BindingDB - The Binding Database
Biocarta
Biocatalysis/Biodegradation Database
BioPathways Consortium
BRENDA
BRITE - Biomolecular Relations in Information
Transmission and Expression
COMPEL (Composite Regulatory Elements)
COPE - Cytokines Online Pathfinder Encyclopaedia
CSNDB - Cell Signaling Networks Database / CSNDB
Paper
Curagen Pathcalling
DIP - Database of Interacting Proteins
DPInteract - DNA-protein interactions
DRC - Database of Ribosomal Crosslinks
Ecocyc and Metacyc
Dynamic Signaling Maps

JenPep Immunology MHC-peptide database
KEGG - Kyoto Encyclopedia of Genes and Genomes
Kohn Molecular Interaction Maps
MDB - Metalloprotein Database and Browser
MHCPEP - A database of MHC binding peptides
MINT - a database of Molecular INTeractions
MIPS Yeast Genome Database
MMDB - Molecular Modeling Database
NetBiochem Welcome Page
ooTFD - (object-oriented Transcription Factors
Database)
ORDB - Olfactory Receptor Database
PATIKA - Pathway Analysis Tool for Integration
and Knowledge Acquisition
PFBP - Protein Function and Biochemical Pathways
Project
PhosphoBase - A database of phosphorylation sites
PIM (Protein Interaction Map)
PIMdb - Drosophila Protein Interaction Map
database
PKR - Protein Kinase Resource
ProChart Database (at AxCell Biosciences)
ProNet Online - Protein Interactions on the Web
(Myriad)

4
Over 50? Why So Many?

Easy to build a simple Interaction Database.
A Simple Abstraction. Many Projects cutting
their teeth in Bioinformatics
Conceptually this list includes Biochemical
Pathways (reactions interactions)
Also includes transcription factors, tRNA
synthetases, etc, all of which can fall into a
general biomolecular binding description.
Many Niches to Fill
Kinetics
Organism centric
Protein-protein centric
Most are not funded for a large-scale service

5
How do we make things interoperate? What is in a
Standard? A Historical Perspective

Standards emerge from successful implementations
of complete systems.
Which one is the standard: the light bulb
or the electrical grid?
Lamps were the original killer app
(bye-bye candles, gas lamps, oil lamps).
Other apps: motors, heaters, toasters
Unexpected apps: radio, TV, transformers,
computers, rechargeables
Entire systems become standards via ad hoc and
popular use: a snowball effect.

6
Emergence and evolution of technological systems

Systems emerge across broad frontiers
Lots of small inventions are responsible for
emerging technologies.
Portions of the frontier that are held back
become the focus of intense innovation
Called a reverse salient by students of
technology
An inadequately functioning or accessible
component in a complex system of components
Opportunities for invention and replacement

7
Reverse Salient AC/DC Example

1882: Edison's DC standard lit up Wall Street
High-level buy-in for DC.
AC was too complicated, could kill a person!
Edison's DC system only worked over short range.
This flaw is the reverse salient.

Westinghouse/Stanley/Tesla saw the flaw in this
standard
AC technology raced to fill the gap.
Light bulbs work with either AC or DC.
Motors required re-invention
E.S. Rogers' batteryless radio (1925)
8
Reverse Salient AC/DC Example

Result: cars and battery-based devices emerged
with DC.

Result: the electrical grid emerged with AC.

NOT A WINNER-TAKE-ALL (zero-sum game) RESULT!
9
A few reverse salients in Bioinformatics

Inadequately Functioning
Integration of Structure and Sequence
Integration of chemoinformatics with
bioinformatics
Mapping of microarray data to pathways
Integration of interactions and pathways
Inaccessible
Carbohydrate representation and analysis tools
Advanced, ad-hoc text mining tools

10
Reverse Salient Attitudes

What holds us back?
Oversights (didn't think of that!).
Shortsightedness (won't ever need that!).
Inability (can't do it!)
Stubbornness (won't do it!)
Prescriptivism (do it like this!)
Nationalism, Continentalism, Colonialism
(because that's the way we do it here!)
110 vs 220

11
Understanding Biomolecular Function

"I yam what I yam and that's all that I yam."
- Popeye the Sailor Man, the world's first comic
book superhero

12
Biomolecular function
E + S -> E + P

This is a generalization of how a biochemist
might represent the function of enzymes.

13
Biomolecular function
E + S -> E + P
kinase-ATP complex + inactive enzyme -> kinase + ADP + active enzyme
[Figure labels: K, P, ATP, ADP]

Here is an example of the generalization
represented two different ways.

14
Biomolecular function
Kinase-ATP complex
inactive enzyme
Active enzyme
ADP

This is another representation.

15
Biomolecular function
A
B
C
D
E
F

This is a generalization of the representation.

16
Biomolecular function
A
B
C
D
E
F

A biomolecule's function can be defined by the
things that it interacts with and the new (or
altered) molecules that result from that
interaction.

17
Biomolecular function
A
B
C
D
E
n

This representation makes it easy to focus on the
interaction part.

18
Biomolecular function
A
B
C
D
E
n

This also happens to represent the BIND data
model.

19
A simple BIND record
A
B
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference

The minimal BIND record has 9 pieces of
information.

20
A curated BIND record
A
B
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference

The curated BIND record may have many more pieces
of information.

21
An example BIND record
A
B
1. INAD
2. TRP
3. Protein
4. Protein
5. GenBank GI 3641615
6. GenBank GI 7301861
7. GenBank Taxonomy ID 7227
8. GenBank Taxonomy ID 7227
9. PubMed ID 8630257

You can view this record in BIND
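
The nine pieces of information can be pictured as a small data structure. The sketch below is only an illustration: the class and field names are made up for this example and are not BIND's actual schema. It holds the INAD/TRP record from this slide.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Molecule:
    """One interaction partner (field names are illustrative, not BIND's)."""
    short_label: str
    molecule_type: str
    database_reference: str
    source_taxonomy_id: int  # "where the molecule comes from"


@dataclass(frozen=True)
class MinimalInteractionRecord:
    """Two partners plus the publication that reported the observation."""
    a: Molecule
    b: Molecule
    pubmed_id: int

    def pieces_of_information(self) -> int:
        # four fields for each partner plus the publication reference
        return 2 * len(fields(Molecule)) + 1


record = MinimalInteractionRecord(
    a=Molecule("INAD", "Protein", "GenBank GI 3641615", 7227),
    b=Molecule("TRP", "Protein", "GenBank GI 7301861", 7227),
    pubmed_id=8630257,
)
print(record.a.short_label, "-", record.b.short_label,
      record.pieces_of_information())
```

A curated record would add optional fields (for example the experimental method) on top of this minimal core.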

22
BIND stores molecular interaction data
24
http://bind.ca

Enter 188 (the BIND record number) in the
Identifier search box

26
BIND records are observations
A
B
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference

All BIND records will have a publication
reference and most will specifically list a
method(s) used to demonstrate the interaction.

28
Methods used to detect interactions.

A great deal of interaction data in BIND
originates from high-throughput experiments
designed to detect interactions between
proteins.
The most common methods are
Two-hybrid assay
Affinity purification

29
Interaction Experimental Evidence in BIND
30
Two-hybrid assay
[Figure: two-hybrid assay schematic in four numbered panels, with proteins A and B]
34
Two-hybrid assay
[Figure: two-hybrid assay schematic, panels 1-4]
A: SNF1
B: SNF4
GAL4-DBD
Transcription activation domain
UASG
GAL1: allows growth on galactose
Fields S, Song O. Nature. 1989 Jul 20;340(6230):245-6. PMID: 2547163
35
Some Two-hybrid caveats
Does the DBD-fusion have activity by itself?
36
Some Two-hybrid caveats
Is the interaction bi-directional?
37
Some Two-hybrid caveats
Is the interaction between A and B mediated by
some other protein (C)?
38
Some Two-hybrid questions
Are the proteins expressed?
Are they over-expressed?
Are they in-frame?
Are the interacting domains defined?
Was the observation reproducible?
Was the strength of interaction significant?
Was another method used to back up the conclusion?
Are the two proteins from the same compartment?
39
Two-hybrid assay
Negative results don't mean a lot.
40
Affinity purification
A
this molecule will bind the tag.
tag modification (e.g. HA/GST/His)
Protein of interest
41
Affinity purification
the cell
A
42
Affinity purification
lots of other untagged proteins
the cell
A
B
naturally binding protein
43
Affinity purification
Ruptured membranes
A
B
cell extract
44
Affinity purification
A
B
untagged proteins go through fastest
(flow-through)
45
Affinity purification
A
B
tagged complexes are slower and come out later
(eluate)
46
Some affinity purification questions
Is the bait protein expressed and in frame?
Is the bait protein observed?
Is the bait protein over-expressed?
Are the interacting domains defined?
Was the observation reproducible?
Was the interactor found in the background?
Was the strength of interaction significant?
Was the interaction saturable?
Was the interactor stoichiometric with the bait protein?
Was another method used to back up the conclusion?
Was tandem-affinity purification (TAP) used?
Was the interaction shown using an extract or a purified protein?
Is the inverse interaction observable?
Are the two proteins from the same compartment?
Are the two proteins known to be involved in the same process?
Is the interactor likely to be physiologically significant?
47
Some affinity purification caveats
First and most importantly, this is only a
representation of the observation. You can only
tell what proteins are in the eluate; you can't
tell how they are connected to one another. If
there is only one other protein present (B), then
it's likely that A and B are directly
interacting. But what if I told you that
two other proteins (B and C) were present along
with A?
48
Complexes with unknown topology
[Figure: three candidate arrangements of A, B and C]
Which of these models is correct? The complex
described by this experimental result is said to
have an Unknown Topology.
49
Complexes with unknown stoichiometry
A
A
B
C
Here's another possibility. The complex described
by this experimental result is also said to have
Unknown Stoichiometry.
50
High throughput data in BIND

Affinity purification: Systematic identification
of protein complexes in Saccharomyces cerevisiae
by mass spectrometry (2002). PMID 11805837
Two-hybrid: A protein interaction map of
Drosophila melanogaster (2003). PMID 14605208
Two-hybrid and affinity purification: A map of
the interactome network of the metazoan C.
elegans (2004). PMID 14704431
Data from these examples can be retrieved from
BIND using a PMID search.

51
How complex data are stored in BIND.
[Figure: A, B and C, with ? marks]
Three interaction records.
52
How complex data are stored in BIND.
[Figure: A, B and C, with ? marks]
A complex record in BIND is simply a collection
of interaction records.
54
Alternate representations.
[Figure: A, B and C, with ? marks]
The matrix model (a clique).
55
Alternate representations.
[Figure: A, B and C, with ? marks]
The spoke model. Which model to use?
56
Spoke and Matrix Models

Vrp1 (bait), Las17, Rad51, Sla1, Tfp1, Ypt7

Possible actual topology
Spoke: simple model. Intuitive, more accurate, but
can misrepresent.
Matrix: theoretical max. number of interactions,
but many FPs (false positives).
Bader & Hogue, Nature Biotech. 2002 Oct;20(10):991-7
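
A small sketch of how the two models expand one purification result into pairwise interaction records, using the Vrp1 example from this slide. The function names are illustrative only.

```python
from itertools import combinations


def spoke(bait, preys):
    """Spoke model: the bait is paired with every co-purified protein."""
    return [(bait, prey) for prey in preys]


def matrix(bait, preys):
    """Matrix model: every protein in the eluate is paired with every other."""
    return list(combinations([bait, *preys], 2))


bait = "Vrp1"
preys = ["Las17", "Rad51", "Sla1", "Tfp1", "Ypt7"]

spoke_pairs = spoke(bait, preys)
matrix_pairs = matrix(bait, preys)
print(len(spoke_pairs), len(matrix_pairs))  # prints: 5 15
```

For n proteins in the eluate, spoke gives n - 1 pairs and matrix gives n(n - 1)/2, which is why the matrix model accumulates false positives so quickly on large complexes.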
57
A view on real data: matrix model (seems hopeless)
6 redox enzymes
7 redox enzymes
Old yellow enzyme: function?
58
OYE has little small molecule specificity,
unlike all other redox enzymes
The crystal structure shows a large surface near
its reactive site, unlike other similar
proteins. So is its substrate a protein? Other
redox enzymes? Solution: go do an experiment!
59
Predicting Interaction Information

Very often the best result of a Bioinformatics
investigation is the suggestion of a specific
experiment that wasn't previously considered.
Often very hard to get a scientist to try an
experiment.
Negative results aren't publishable: a risk to the
experimentalist that they are wasting their
time/resources!
Narrowing down the vast space of possible
interactions is important
Approx. 36,000,000 pairs of testable
protein-protein interactions in yeast.
Important to use all the information at hand and
to demonstrate to the experimentalist that you
have reduced (not increased or left-unchanged)
their risk of failure.
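
The slide does not say how the 36,000,000 figure was derived. One reading, assuming roughly 6,000 yeast proteins (a number not given here) and counting bait/prey orientation as distinct, is sketched below.

```python
N_YEAST_PROTEINS = 6000  # assumption: approximate size of the yeast proteome


def ordered_pairs(n):
    """Bait/prey pairs where orientation matters and self-pairs are allowed."""
    return n * n


def unordered_pairs(n):
    """Distinct pairs of different proteins, ignoring orientation."""
    return n * (n - 1) // 2


print(ordered_pairs(N_YEAST_PROTEINS))    # prints: 36000000
print(unordered_pairs(N_YEAST_PROTEINS))  # prints: 17997000
```

Either way the search space is in the tens of millions, so any prediction that narrows it to a handful of candidates lowers the experimentalist's risk.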

60
1. How do we predict/validate interactions? 2.
How do we locate specific binding sites?

Functional annotation (imprecise for 2)
Matching sequence features to patterns
PSSMs
Domain-small Molecule Interactions (SMID-BLAST)
Domain-motif interactions
3D Docking
slow
need 3D models
Energy scoring functions are imprecise

61
Motif-Domain Interactions

Protein interactions play a crucial role in
driving many important cellular processes such as
intra-cellular signaling, transcription
regulation, cell cycle regulation, and metabolic
activities.
Many of the interactions are mediated by
conserved domains binding to short sequence
motifs that form peptide recognition modules.
Only a small number of domains have known binding
motifs.

62
SH3 domain and Pro-rich Motif
63
High-throughput protein complex identification
Ho et al Nature 415, 180 - 183 (10 Jan
2002) HMS-PCI dataset
Gavin et al Nature 415, 141 - 147 (10 Jan 2002)
TAP dataset
64
Rho family GTPase Interactions
Extract motifs from 3D structures. Criteria:
non-domain polypeptides
65
Gibbs Sampling

Gibbs sampling is a stochastic Markov Chain Monte
Carlo algorithm
Used for motif discovery in proteins
Widely used for the identification of transcription
factor binding sites (Lawrence et al., 1993;
Neuwald et al., 1995).
Gibbs sampling allows for the incorporation of
prior knowledge about the motif composition.
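
A minimal site-sampler sketch of the idea, not the implementation used in this work. It assumes one motif occurrence per sequence, a uniform background, and sequences made only of the 20 standard residues. The prior_sites argument is a simple, hypothetical way to inject prior knowledge: known motif instances are always counted in the model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probs(sites, width, pseudocount=1.0):
    """Per-column residue probabilities from aligned sites, with pseudocounts."""
    probs = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for site in sites:
            counts[site[col]] += 1.0
        total = sum(counts.values())
        probs.append({aa: c / total for aa, c in counts.items()})
    return probs


def gibbs_motif_search(seqs, width, prior_sites=(), iterations=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    # start from a random window in every sequence
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # model built from the prior sites and all sequences but the held-out one
            others = list(prior_sites) + [
                s[p:p + width]
                for j, (s, p) in enumerate(zip(seqs, starts)) if j != i
            ]
            probs = column_probs(others, width)
            # weight each window of the held-out sequence by its likelihood ratio
            weights = []
            for pos in range(len(seq) - width + 1):
                ratio = 1.0
                for col, aa in enumerate(seq[pos:pos + width]):
                    ratio *= probs[col][aa] / background
                weights.append(ratio)
            # sample the new start position in proportion to the weights
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Toy sequences, invented for illustration only.
sequences = ["MKTAYVPLVPGSE", "GGYVPAVPKKLLD", "ACDEYVPTVPMNQ"]
starts = gibbs_motif_search(sequences, width=6, prior_sites=["YVPAVP"])
print([s[p:p + 6] for s, p in zip(sequences, starts)])
```

Because the sampler is stochastic, different seeds can settle on different windows; real use would run several chains and compare scores.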

66
Seed and Focus Procedure

Gibbs sampling is sensitive to database size.
On a sufficiently large database, almost any
motif could be found.
Most motifs found with this approach were found
before databases got big from genomics
SEED the Gibbs sampler with the 3D structure
motif
FOCUS the Gibbs sampler on groups of interacting
sequences found in complexes with the domain
(a smaller database)
If the motif is real it should be enriched;
otherwise it should disappear

67
Focused sequences from yeast complexes
containing RhoGAP.
Input to Gibbs sampler: motifs from 3D structure
(SEED); database of all proteins from HTP
complexes in yeast that have RhoGAP domains
68
4 Motif descriptions 4 PSSMs
QEDYXR
YVPXVP
QEDYXRLXXL
YXPXXF
69
Use PSSMs to Identify Motifs

Constrain to the HTP complexes (next slide).
Good enough to get the attention of an
experimentalist!
Try on all yeast genes
18,459 raw PSSM-based predictions (scores vary)
No compartmentalization or other information
considered
Match 623 literature validated predictions.
Probability of predicting by random chance is
1.6e-53.
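
A sketch of what a PSSM-based prediction looks like in practice: build a log-odds matrix from aligned motif instances, then slide it along a sequence and keep windows above a threshold. The aligned sites, the test sequence, the uniform background and the threshold are all invented for illustration; they are not the matrices or cut-offs used in this study.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites, pseudocount=1.0):
    """Log-odds PSSM (uniform background assumed) from equal-length sites."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(sites[0])):
        column = [site[col] for site in sites]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(pssm, sequence, threshold):
    """Return (position, window, score) for every window scoring >= threshold."""
    width = len(pssm)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        if score >= threshold:
            hits.append((pos, window, score))
    return hits


# Hypothetical instances consistent with the YVPXVP description.
sites = ["YVPAVP", "YVPLVP", "YVPTVP", "YVPSVP"]
pssm = build_pssm(sites)
hits = scan(pssm, "MKTAYVPGVPGSE", threshold=5.0)
for pos, window, score in hits:
    print(pos, window, round(score, 2))
```

Raw hits like these are only candidates; constraining them to proteins that co-occur in HTP complexes is what makes them worth an experimentalist's attention.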

70
Predicted RhoGAP interactions
M. Tyers did the validation, using a standard
FLAG pull-down, then a more sensitive myc
double-tag pull-down. 11 validated interactions
(colored) match the 4 motifs.
71
High-throughput protein complex identification
Ho et al Nature 415, 180 - 183 (10 Jan
2002) HMS-PCI dataset
Gavin et al Nature 415, 141 - 147 (10 Jan 2002)
TAP dataset
72
Domain-Motif TAP network hits.
73
Domain-motif HMS-PCI network hits.
Significantly more Domain-Motif hits than in the
TAP dataset. Over-expressed proteins used in
this approach may be more sensitive to transient
or low-copy number domain-motif interactions. Or
the baits selected contain more domain-motif
interactions in their respective networks.
74
A tea cup in a rainstorm

2000 elemental observations (facts) about
molecular assembly published in the literature
every month
2600 High Throughput Interactions published
every month with high rates of false positives.
200,000 facts sitting in the literature on
library shelves, not validated.

75
Ontologies for Pathways, Interactions and
Signaling

An emerging consensus that may help you
(someday)

76
The domain: Biological pathways
Main categories
Metabolic Pathways
Molecular Interaction Networks
Signaling Pathways
77
Ontology

<philosophy> A systematic account of Existence.
<artificial intelligence> (From philosophy) An
explicit formal specification of how to represent
the objects, concepts and other entities that are
assumed to exist in some area of interest and the
relationships that hold among them.
<information science> The hierarchical
structuring of knowledge about things by
subcategorising them according to their essential
(or at least relevant and/or cognitive)
qualities. This is an extension of the previous
senses of "ontology" (above) which has become
common in discussions about the difficulty of
maintaining subject indices. The philosophy of
indexing everything in existence?

78
Ontology redux

An ontology is a choice of a system of data
grammar together with specific controlled
vocabularies and an organizational framework to
contain data.
Ontologies are used in practice to describe how
to exchange data faithfully between computers,
not how to compute with them!
An Ontology may be used to Archive information or
to make information available to applications
(API).

79
Parsing - Summary

Parsing flatfiles is instructive to understand
how biological data is stored and used.
Most bioinformaticians in small academic groups
write their own parsers and work with small
batches of computations.
Data Grammars and automatically generated parsers
are efficient and often error free.
Most database organizations and software
developers with large audiences use data grammar
approaches.
Semantic approaches (OWL) are beginning to emerge.

80
BioPAX

BioPAX: Biological PAthway eXchange
A data exchange ontology and format for semantic
integration, aggregation and inference of
biological pathway data
Open source community effort: the community
agreed upon and built this!
www.biopax.org

81
BioPAX Ontology Overview
Level 1 v1.0 (July 7th, 2004)
82
The domain: Biological pathways
Main categories
Metabolic Pathways
Molecular Interaction Networks
Signaling Pathways
83
Aggregation, Integration, Inference

Multiple kinds of pathway databases
metabolic
molecular interactions
signal transduction
gene regulatory
Constructs designed for integration
DB References
XRefs (Publication, Unification, Relationship)
Synonyms
Provenance (not yet implemented)
OWL DL to enable reasoning

84
BioPAX uses other ontologies

Conceptual framework based upon existing DB
schemas
aMAZE, BIND, EcoCyc, WIT, KEGG, Reactome, etc.
Allows wide range of detail, multiple levels of
abstraction
Uses pointers to existing ontologies to provide
supplemental annotation where appropriate
Cellular location -> GO Component
Cell type -> Cell.obo
Organism -> NCBI taxon DB
Incorporate other standards where appropriate
Chemical structure -> SMILES, CML, InChI
Interoperate with existing standards (RDF/OWL,
LSID, SBML, PSI, CellML Metadata Standard)

85
Case study: BioPAX in SBML facilitates SBML
integration

Addresses SBML's nasty data integration issues
Different data types, same representation
Same data, different representations
External references
Synonyms
Provenance

86
BioPAX Ontology Overview
species
reaction
modifier
Level 1 v1.0 (July 7th, 2004)
87
Different data types, same representation

Protein-Protein Interaction
<reaction id="pyruvate_dehydrogenase_cplx">
  <listOfReactants>
    <speciesRef species="PdhA"/>
    <speciesRef species="PdhB"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="Pyruvate_dehydrogenase_E1"/>
  </listOfProducts>
</reaction>

Biochemical Reaction
<reaction id="pyruvate_dehydrogenase_rxn">
  <listOfReactants>
    <speciesRef species="NADP"/>
    <speciesRef species="CoA"/>
    <speciesRef species="pyruvate"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="NADPH"/>
    <speciesRef species="acetyl-CoA"/>
    <speciesRef species="CO2"/>
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesRef species="pyruvate_dehydrogenase_E1"/>
  </listOfModifiers>
</reaction>
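
To see why this is a problem for software, the sketch below parses both reactions with Python's standard library. The wrapping listOfReactions element and the shortened speciesRef element names follow this slide's simplified markup, not the full SBML schema. Both entries come back as the same kind of object, with nothing to say which one is a complex assembly and which is a biochemical reaction.

```python
import xml.etree.ElementTree as ET

# Simplified SBML-like fragment, following the markup on this slide.
SBML_FRAGMENT = """
<listOfReactions>
  <reaction id="pyruvate_dehydrogenase_cplx">
    <listOfReactants>
      <speciesRef species="PdhA"/>
      <speciesRef species="PdhB"/>
    </listOfReactants>
    <listOfProducts>
      <speciesRef species="Pyruvate_dehydrogenase_E1"/>
    </listOfProducts>
  </reaction>
  <reaction id="pyruvate_dehydrogenase_rxn">
    <listOfReactants>
      <speciesRef species="NADP"/>
      <speciesRef species="CoA"/>
      <speciesRef species="pyruvate"/>
    </listOfReactants>
    <listOfProducts>
      <speciesRef species="NADPH"/>
      <speciesRef species="acetyl-CoA"/>
      <speciesRef species="CO2"/>
    </listOfProducts>
    <listOfModifiers>
      <modifierSpeciesRef species="pyruvate_dehydrogenase_E1"/>
    </listOfModifiers>
  </reaction>
</listOfReactions>
"""


def summarize(reaction):
    """Collect the species named in each list of a reaction element."""
    def refs(tag):
        parent = reaction.find(tag)
        return [] if parent is None else [r.get("species") for r in parent]
    return {
        "id": reaction.get("id"),
        "reactants": refs("listOfReactants"),
        "products": refs("listOfProducts"),
        "modifiers": refs("listOfModifiers"),
    }


root = ET.fromstring(SBML_FRAGMENT.strip())
summaries = [summarize(r) for r in root.iter("reaction")]
for summary in summaries:
    print(summary)

# Every entry has the same element type; the data type is not recorded.
print({child.tag for child in root})
```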
88
BioPAX solution metadata

<sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation>
        <bp:protein rdf:ID="PdhA"/>
      </annotation>
    </species>
    <species id="NADP" metaid="NADP">
      <annotation>
        <bp:smallMolecule rdf:ID="NADP"/>
      </annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx">
      <annotation>
        <bp:complexAssembly rdf:ID="pyruvate_dehydrogenase_cplx"/>
      </annotation>

89
BioPAX External References

<species id="pyruvate" metaid="pyruvate">
  <annotation
      xmlns:bp="http://biopax.org/release1/biopax-release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>
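
A sketch of how a consuming tool might read such an external reference with Python's standard library. The embedded fragment repeats the markup above and adds an rdf namespace declaration, which in a full file would sit on the enclosing sbml element; the helper name is made up for this example.

```python
import xml.etree.ElementTree as ET

BP = "http://biopax.org/release1/biopax-release1.owl"  # as written on this slide
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SPECIES_XML = f"""
<species id="pyruvate" metaid="pyruvate" xmlns:rdf="{RDF}">
  <annotation xmlns:bp="{BP}">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>
"""


def unification_xrefs(species):
    """Return (database, identifier) for each unificationXref in a species."""
    ns = {"bp": BP}
    return [
        (xref.findtext("bp:DB", namespaces=ns),
         xref.findtext("bp:ID", namespaces=ns))
        for xref in species.iterfind(".//bp:unificationXref", ns)
    ]


species = ET.fromstring(SPECIES_XML.strip())
xrefs = unification_xrefs(species)
print(species.get("id"), xrefs)  # prints: pyruvate [('LIGAND', 'c00022')]
```

Two models that name the same small molecule differently can then be matched on the shared (database, identifier) pair rather than on the label.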

90
BioPAX Synonyms

<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://biopax.org/release1/biopax_release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:SYNONYMS>pyroracemic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>alpha-ketopropionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoic acid</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>
