Title: Interactions and Ontologies


1
Interactions and Ontologies

CBW Bioinformatics Workshop, February 24th 2005,
Vancouver
Christopher Hogue, The Blueprint Initiative
2
The Blueprint Initiative
  • Develop, curate and maintain the Biomolecular
    Interaction Network Database (BIND) and Related
    Tools
  • Headquarters in Toronto
  • Blueprint Asia Ltd. Pte. in Singapore
  • Public good research program funded by the Canadian
    and Singaporean governments
  • Data and software are freely available,
    like GenBank

3
blueprint.org
4
About this talk
  • The OECD Ministerial Declaration of 2004
  • Interoperability, Standards and Systems - A
    Historic Perspective
  • Understanding Biomolecular Function
  • A BIND Interaction Record
  • Interaction and Reaction Data Models
  • Interaction Experiments
  • Yeast Two Hybrid, Affinity Purification and False
    Positives
  • Spoke and Matrix models for complexes of Unknown
    Topology
  • Ontologies

5
Organisation for Economic Co-operation and
Development (oecd.org)
  • DECLARATION ON ACCESS TO RESEARCH DATA FROM
    PUBLIC FUNDING
  • Adopted on 30 January 2004 in Paris
  • The governments of Australia, Austria, Belgium,
    Canada, China, the Czech Republic, Denmark,
    Finland, France, Germany, Greece, Hungary,
    Iceland, Ireland, Israel, Italy, Japan, Korea,
    Luxembourg, Mexico, the Netherlands, New Zealand,
    Norway, Poland, Portugal, the Russian Federation,
    the Slovak Republic, the Republic of South
    Africa, Spain, Sweden, Switzerland, Turkey, the
    United Kingdom, and the United States

6
Declare their commitment to
  • Work towards the establishment of access regimes
    for digital research data from public funding in
    accordance with the following objectives and
    principles
  • Openness
  • Transparency
  • Legal Conformity
  • Formal Responsibility
  • Professionalism
  • IP Protection
  • Interoperability
  • Quality and Security
  • Efficiency
  • Accountability

7
Bioinformatics Research is Orthogonal to Biology
Research
  • All scientists are rewarded by publishing papers.
  • 300 year old tradition
  • Bioinformatics papers focus on algorithm
    development and improvement
  • the last 2% improvement in sensitivity/specificity
  • Discovery research needs implementations of
    algorithms that work together
  • interoperability
  • Interoperability is only now becoming
    publishable

8
Interaction Databases
  • Aminoacyl-tRNA Synthetases Database
  • ASEdb - Alanine Scanning Energetics database
  • BBID - Biological Biochemical Image Database
  • BIND - Biomolecular Interaction Network Database
  • BindingDB - The Binding Database
  • Biocarta
  • Biocatalysis/Biodegradation Database
  • BioPathways Consortium
  • BRENDA
  • BRITE - Biomolecular Relations in Information
    Transmission and Expression
  • COMPEL (Composite Regulatory Elements)
  • COPE - Cytokines Online Pathfinder Encyclopaedia
  • CSNDB - Cell Signaling Networks Database / CSNDB
    Paper
  • Curagen Pathcalling
  • DIP - Database of Interacting Proteins
  • DPInteract - DNA-protein interactions
  • DRC - Database of Ribosomal Crosslinks
  • Ecocyc and Metacyc
  • Dynamic Signaling Maps
  • JenPep Immunology MHC-peptide database
  • KEGG - Kyoto Encyclopedia of Genes and Genomes
  • Kohn Molecular Interaction Maps
  • MDB - Metalloprotein Database and Browser
  • MHCPEP - A database of MHC binding peptides
  • MINT - a database of Molecular INTeractions
  • MIPS Yeast Genome Database
  • MMDB - Molecular Modeling Database
  • NetBiochem Welcome Page
  • ooTFD - object-oriented Transcription Factors
    Database
  • ORDB - Olfactory Receptor Database
  • PATIKA - Pathway Analysis Tool for Integration
    and Knowledge Acquisition
  • PFBP - Protein Function and Biochemical Pathways
    Project
  • PhosphoBase - A database of phosphorylation sites
  • PIM (Protein Interaction Map)
  • PIMdb - Drosophila Protein Interaction Map
    database
  • PKR - Protein Kinase Resource
  • ProChart Database (at AxCell Biosciences)
  • ProNet Online - Protein Interactions on the Web
    (Myriad)

9
Over 50? Why So Many?
  • Easy to build a simple Interaction Database.
  • A Simple Abstraction. Many Projects cutting
    their teeth in Bioinformatics
  • Conceptually this list includes Biochemical
    Pathways (reactions interactions)
  • Also includes transcription factors, tRNA
    synthetases, etc, all of which can fall into a
    general biomolecular binding description.
  • Many Niches to Fill
  • Kinetics
  • Organism centric
  • Protein-protein centric
  • Most are not funded for a large-scale service

10
How do we make things interoperate? What is in a
Standard? A Historical Perspective
  • Standards emerge from successful implementations
    of complete systems.
  • Which one is the standard: the light bulb
    or the electrical grid?
  • Lamps were the original killer app.
  • (bye-bye candles, gas lamps, oil lamps)
  • Other apps: motors, heaters, toasters
  • Unexpected apps: radio, TV, transformers,
    computers, rechargeables
  • Entire systems become standards via ad-hoc and
    popular use: a snowball effect.

11
Emergence and evolution of technological systems
  • Systems emerge across broad frontiers
  • Lots of small inventions are responsible for
    emerging technologies.
  • Portions of the frontier that are held back
    become the focus of intense innovation
  • Called a reverse salient by students of
    technology
  • An inadequately functioning or accessible
    component in a complex system of components
  • Opportunities for invention and replacement

12
Reverse Salient: AC/DC Example
  • 1882: Edison's DC standard lit up Wall Street
  • High-level buy-in for DC.
  • AC was too complicated, could kill a person!
  • Edison's DC system only worked over short range.
  • This flaw is the reverse salient.
  • Westinghouse/Stanley/Tesla saw the flaw in this
    standard
  • AC technology raced to fill the gap.
  • Light bulbs work with either AC or DC.
  • Motors required re-invention
  • E.S. Rogers' batteryless radio (1925)

13
Reverse Salient: AC/DC Example
  • Result: cars and battery-based devices emerged
    with DC.
  • Result: the electrical grid emerged with AC.

NOT A WINNER-TAKE-ALL (zero-sum game) RESULT!
14
A few reverse salients in Bioinformatics
  • Inadequately Functioning
  • Integration of Structure and Sequence
  • Integration of chemoinformatics with
    bioinformatics
  • Mapping of microarray data to pathways
  • Integration of interactions and pathways
  • Inaccessible
  • Carbohydrate representation and analysis tools
  • Advanced, ad-hoc text mining tools

15
Reverse Salient Attitudes
  • What holds us back?
  • Oversights (didn't think of that!).
  • Shortsightedness (won't ever need that!).
  • Inability (can't do it!)
  • Stubbornness (won't do it!)
  • Prescriptivism (do it like this!)
  • Nationalism, Continentalism, Colonialism
  • (because that's the way we do it here!)
  • 110 vs 220

16
Understanding Biomolecular Function
  • "I yam what I yam and that's all that I yam."
  • - Popeye the sailor man, the world's first comic
    book superhero

17
Biomolecular function
E + S -> E + P
  • This is a generalization of how a biochemist
    might represent the function of enzymes.

18
Biomolecular function
E + S -> E + P
kinase-ATP complex + inactive enzyme -> kinase + ADP + active enzyme
K
P
ATP
ADP
  • Here is an example of the generalization
    represented two different ways.

19
Biomolecular function
Kinase-ATP complex
inactive enzyme
Active enzyme
ADP
  • This is another representation.

20
Biomolecular function
A
B
C
D
E
F
  • This is a generalization of the representation.

21
Biomolecular function
A
B
C
D
E
F
  • A biomolecule's function can be defined by the
    things that it interacts with and the new (or
    altered) molecules that result from that
    interaction.

22
Biomolecular function
A
B
C
D
E
n
  • This representation makes it easy to focus on the
    interaction part.

23
Biomolecular function
A
B
C
D
E
n
  • This also happens to represent the BIND data
    model.

24
A simple BIND record
A
B
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference
  • The minimal BIND record has 9 pieces of
    information.

25
A curated BIND record
A
B
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference
  • The curated BIND record may have many more pieces
    of information.

26
An example BIND record
A
B
1. INAD
2. TRP
3. Protein
4. Protein
5. GenBank GI 3641615
6. GenBank GI 7301861
7. GenBank Taxonomy ID 7227
8. GenBank Taxonomy ID 7227
9. PubMed ID 8630257
  • You can view this record in BIND
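
The nine-field minimal record can be sketched as a small data structure. This is only an illustration: the class and attribute names below are hypothetical and are not BIND's actual schema; the values are the INAD/TRP example from this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """One interaction partner (hypothetical layout, not BIND's schema)."""
    label: str      # short label, e.g. "INAD"
    mol_type: str   # molecule type, e.g. "protein"
    gi: int         # database reference (GenBank GI)
    taxon_id: int   # where the molecule comes from (GenBank Taxonomy ID)


@dataclass(frozen=True)
class InteractionRecord:
    """A minimal A-B interaction observation backed by one publication."""
    a: Molecule
    b: Molecule
    pmid: int       # publication reference (PubMed ID)

    def fields(self):
        """Return the nine pieces of information in the slide's order."""
        return [
            self.a.label, self.b.label,
            self.a.mol_type, self.b.mol_type,
            self.a.gi, self.b.gi,
            self.a.taxon_id, self.b.taxon_id,
            self.pmid,
        ]


record = InteractionRecord(
    a=Molecule("INAD", "protein", 3641615, 7227),
    b=Molecule("TRP", "protein", 7301861, 7227),
    pmid=8630257,
)
```

A curated record would carry additional optional fields (for example the experimental method), which this sketch leaves out.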

27
BIND stores molecular interaction data
29
http://bind.ca
  • Enter 188 (the BIND record number) in the
    Identifier search box

31
BIND records are observations
A
B
1. Short label for A
2. Short label for B
3. Molecule type for A
4. Molecule type for B
5. Database reference for A
6. Database reference for B
7. Where A comes from
8. Where B comes from
9. Publication reference
  • All BIND records will have a publication
    reference and most will specifically list a
    method(s) used to demonstrate the interaction.

33
Methods used to detect interactions.
  • A great deal of interaction data in BIND
    originates from high-throughput experiments
    designed to detect interactions between
    proteins.
  • The most common methods are
  • Two-hybrid assay
  • Affinity purification

34
Interaction Experimental Evidence in BIND
35
Two-hybrid assay
1.
3.
2.
4.
36
Two-hybrid assay
1.
3.
2.
4.
37
Two-hybrid assay
1.
B
3.
A
2.
4.
38
Two-hybrid assay
1.
B
3.
A
2.
4.
39
Two-hybrid assay
1.
SNF4
B
SNF1
3.
A
2.
GAL4-DBD
Transcription activation domain
UASG
4.
Fields S, Song O. Nature. 1989 Jul
20;340(6230):245-6. PMID: 2547163
GAL1
Allows growth on galactose
40
Some Two-hybrid caveats
1.
3.
A
2.
4.
Does the DBD-fusion have activity by itself?
41
Some Two-hybrid caveats
1.
A
3.
B
2.
4.
Is the interaction bi-directional?
42
Some Two-hybrid caveats
1.
B
C
3.
A
2.
4.
Is the interaction mediated by some other
protein?
43
Some Two-hybrid questions
1.
B
3.
A
2.
Are the proteins expressed?
Are they over-expressed?
Are they in-frame?
Are the interacting domains defined?
Was the observation reproducible?
Was the strength of interaction significant?
Was another method used to back up the conclusion?
Are the two proteins from the same compartment?
4.
44
Two-hybrid assay
1.
A
3.
B
2.
4.
Negative results don't mean a lot.
45
Affinity purification
A
this molecule will bind the tag.
tag modification (e.g. HA/GST/His)
Protein of interest
46
Affinity purification
the cell
A
47
Affinity purification
lots of other untagged proteins
the cell
A
B
naturally binding protein
48
Affinity purification
Ruptured membranes
A
B
cell extract
49
Affinity purification
A
B
untagged proteins go through fastest (flow-through)
50
Affinity purification
A
B
tagged complexes are slower and come out later
(eluate)
51
Some affinity purification questions
Is the bait protein expressed and in frame?
Is the bait protein observed?
Is the bait protein over-expressed?
Are the interacting domains defined?
Was the observation reproducible?
Was the interactor found in the background?
Was the strength of interaction significant?
Was the interaction saturable?
Was the interactor stoichiometric with the bait protein?
Was another method used to back up the conclusion?
Was tandem-affinity purification (TAP) used?
Was the interaction shown using an extract or a purified protein?
Is the inverse interaction observable?
Are the two proteins from the same compartment?
Are the two proteins known to be involved in the same process?
Is the interactor likely to be physiologically significant?
A
B
52
Some affinity purification caveats
First and most importantly, this is only a
representation of the observation. You can only
tell what proteins are in the eluate; you can't
tell how they are connected to one another. If
there is only one other protein present (B), then
it's likely that A and B are directly
interacting. But what if I told you that
two other proteins (B and C) were present along
with A?
A
B
A
C
B
53
Complexes with unknown topology
A
A
A
B
C
B
C
B
C
Which of these models is correct? The complex
described by this experimental result is said to
have an Unknown Topology.
54
Complexes with unknown stoichiometry
A
A
B
C
Here's another possibility. The complex described
by this experimental result is also said to have
Unknown Stoichiometry.
55
High throughput data in BIND
  • Affinity purification: Systematic identification
    of protein complexes in Saccharomyces cerevisiae
    by mass spectrometry (2002). PMID 11805837
  • Two-hybrid: A protein interaction map of
    Drosophila melanogaster (2003). PMID 14605208
  • Two-hybrid and affinity purification: A map of
    the interactome network of the metazoan C.
    elegans (2004). PMID 14704431
  • Data from these examples can be retrieved from
    BIND using a PMID search.

56
How complex data are stored in BIND.
A
?
B
?
Three interaction records.
C
?
57
How complex data are stored in BIND.
A
?
A complex record in BIND is simply a collection
of interaction records.
B
?
C
?
58
How complex data are stored in BIND.
A
?
A complex record in BIND is simply a collection
of interaction records.
B
?
C
?
59
Alternate representations.
A
?
A
B
B
C
?
The matrix model (a clique).
C
?
60
Alternate representations.
A
?
A
B
B
C
?
The spoke model. Which model to use?
C
?
61
Spoke and Matrix Models
  • Vrp1 (bait), Las17, Rad51, Sla1, Tfp1, Ypt7

Possible actual topology
Spoke: simple model. Intuitive, more accurate, but
can misrepresent.
Matrix: theoretical max. number of interactions, but
many FPs.
Bader & Hogue, Nature Biotech. 2002 Oct;20(10):991-7
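
The difference between the two models can be sketched by expanding the Vrp1 purification into pairwise interactions. This is a minimal sketch; the function names are hypothetical, and only the protein names come from the slide.

```python
from itertools import combinations


def spoke_model(bait, preys):
    """Spoke: the bait is paired with every co-purified protein."""
    return [(bait, prey) for prey in preys]


def matrix_model(bait, preys):
    """Matrix: every protein in the purification is paired with every other."""
    return list(combinations([bait, *preys], 2))


BAIT = "Vrp1"
PREYS = ["Las17", "Rad51", "Sla1", "Tfp1", "Ypt7"]

spoke_edges = spoke_model(BAIT, PREYS)
matrix_edges = matrix_model(BAIT, PREYS)

print(len(spoke_edges), "spoke interactions")
print(len(matrix_edges), "matrix interactions")
```

For one bait and five preys the spoke model yields 5 pairwise records and the matrix model yields 15, which is where the matrix model's extra false positives come from.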
62
A view on real data: matrix model (seems hopeless)
6 redox enzymes
7 redox enzymes
Old yellow enzyme: function?
63
OYE has little small-molecule specificity,
unlike all other redox enzymes.
The crystal structure shows a large surface near
its reactive site, unlike other similar
proteins. So is its substrate a protein? Other
redox enzymes?
64
A tea cup in a rainstorm
  • 2000 elemental observations (facts) about
    molecular assembly published in the literature
    every month
  • 2600 High Throughput Interactions published
    every month with high rates of false positives.
  • 200,000 facts sitting in the literature on
    library shelves, not validated.

65
Ontology
  • <philosophy> A systematic account of Existence.
  • <artificial intelligence> (From philosophy) An
    explicit formal specification of how to represent
    the objects, concepts and other entities that are
    assumed to exist in some area of interest and the
    relationships that hold among them.
  • <information science> The hierarchical
    structuring of knowledge about things by
    subcategorising them according to their essential
    (or at least relevant and/or cognitive)
    qualities. This is an extension of the previous
    senses of "ontology" (above) which has become
    common in discussions about the difficulty of
    maintaining subject indices. The philosophy of
    indexing everything in existence?

66
Ontology redux
  • An ontology is a choice of a system of data
    grammar together with specific controlled
    vocabularies and an organizational framework to
    contain data.
  • Ontologies are used in practice to describe how
    to exchange data faithfully between computers,
    not how to compute with them!
  • An Ontology may be used to Archive information or
    to make information available to applications
    (API).

67
Parsing - Summary
  • Parsing flatfiles is instructive to understand
    how biological data is stored and used.
  • Most bioinformaticians in small academic groups
    write their own parsers and work with small
    batches of computations.
  • Data Grammars and automatically generated parsers
    are efficient and often error free.
  • Most database organizations and software
    developers with large audiences use data grammar
    approaches.
  • Semantic approaches (OWL) are beginning to emerge.

68
Matching and Finding Strings
  • Biologists use language in a very expansive
    manner, consider the lowly calcium ion
  • Calcium, Ca2+, Ca+2, Ca++, calcium (II), Ca(II)
  • Given a database with descriptive text how do
    you find each and every record in a database that
    has calcium?
  • Search for each form of the string Ca AND
    calcium
  • Use a regular expression? CA
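
A regular expression that covers several written forms of calcium might look like the sketch below. The exact spellings it accepts (Ca2+, Ca+2, Ca++ and so on) are assumptions, since the charge symbols did not survive in this transcript, and the pattern is illustrative rather than exhaustive.

```python
import re

# Matches "calcium" / "Calcium" (optionally followed by "(II)") or the symbol
# "Ca" with an optional charge suffix. The lookarounds stop it from firing
# inside longer words such as "Cancer" or formulas such as "CaCl2".
CALCIUM_RE = re.compile(
    r"(?<![A-Za-z])"
    r"(?:[Cc]alcium(?:\s*\(II\))?"
    r"|Ca(?:2\+|\+2|\+\+|\(II\))?)"
    r"(?![A-Za-z])"
)


def mentions_calcium(text):
    """Return True if the free text mentions calcium in any listed form."""
    return CALCIUM_RE.search(text) is not None


descriptions = [
    "Binds calcium tightly",
    "Requires Ca2+ for activity",
    "Ca+2 dependent",
    "Ca++ channel",
    "calcium (II) ion",
    "Ca(II) site",
    "Cancer marker",
    "catalytic domain",
]
hits = [d for d in descriptions if mentions_calcium(d)]
```

Even this small pattern needs guards against false matches, which is part of the argument for controlled vocabularies.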

69
Controlled Vocabularies
  • Fix the database so that only one form of the
    string is used.
  • Control the use of vocabulary in descriptive text:
    only use calcium, prohibit the other forms
  • Called a synonym constrained controlled
    vocabulary
  • Requires that people doing data entry use
    selected words
  • Requires the person making a query know what form
    of the word is used in the database.

70
Controlled Vocabularies
  • Add a field to the database and store the atomic
    number of any elements described
  • Atomic number is a unique identifier for calcium.
  • This enables searching the database by element.
  • Periodic table defines the unique identifying
    number
  • Called a numerically controlled vocabulary
  • Requires that numbers representing words be added
  • Allows searching by code number
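
A numerically controlled vocabulary can be sketched as an extra field holding atomic numbers. The record layout and the example records here are hypothetical; only the atomic numbers themselves are facts from the periodic table.

```python
# Atomic numbers act as the unique identifiers for the elements.
ATOMIC_NUMBER = {"calcium": 20, "magnesium": 12, "zinc": 30}

# Hypothetical records: free text plus an "elements" field of atomic numbers.
records = [
    {"id": 1, "text": "Binds Ca2+ in the EF-hand loop", "elements": [20]},
    {"id": 2, "text": "Zinc finger; also binds calcium (II)", "elements": [30, 20]},
    {"id": 3, "text": "Mg-dependent kinase", "elements": [12]},
]


def find_by_element(records, element_name):
    """Return ids of records annotated with the element's atomic number."""
    number = ATOMIC_NUMBER[element_name.lower()]
    return [rec["id"] for rec in records if number in rec["elements"]]


calcium_ids = find_by_element(records, "calcium")
```

The search no longer depends on how calcium was spelled in the descriptive text, only on the code stored beside it.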

71
Selected Unique Identifiers in Biological Data
73
Unique Identifier Use
  • Use a single unique identifier to get a specific
    data entry from a database
  • Use a list of unique identifiers to manage a
    collection of data from the database
  • Use a list of identifiers to keep track of GO
    terms you are interested in.
  • Search databases using Unique Identifiers!

74
List of Protein GIs for TrpRS protein hits
20178136 20178135 20178132 20178128 20178126
20178125 20139932 17367600 6226200 135188
20482401 14754335 21362967 17864462 417846 135189
18144292 18309615 15829216 6226201 1174553
1174552 21362962 20178139 20178138 20178133
20178131 20178130 20178129 20178127 20178124
20178123 20178122 17367824 16974813 16974812
16974811 7994694 135191 13878796 7994695 7994693
7994692 6226203 2501073 13431912 11134974 8039807
8039806 6226202 6094418 3915079 3122910 3122904
2501071 2501070 2501069 1711656 1351182 135187
14090160 7674347 2851538 2501074 1174551 417845
1754770 1754768
This describes a complete collection of sequences
75
BioPAX
  • BioPAX = Biological PAthway eXchange
  • A data exchange ontology and format for semantic
    integration, aggregation and inference of
    biological pathway data
  • Open source community effort: the community
    agreed upon and built this!
  • www.biopax.org

76
BioPAX Ontology Overview
Level 1 v1.0 (July 7th, 2004)
77
The domain: Biological pathways
Main categories
Metabolic Pathways
Molecular Interaction Networks
Signaling Pathways
78
Aggregation, Integration, Inference
  • Multiple kinds of pathway databases
  • metabolic
  • molecular interactions
  • signal transduction
  • gene regulatory
  • Constructs designed for integration
  • DB References
  • XRefs (Publication, Unification, Relationship)
  • Synonyms
  • Provenance (not yet implemented)
  • OWL DL to enable reasoning

79
BioPAX uses other ontologies
  • Conceptual framework based upon existing DB
    schemas
  • aMAZE, BIND, EcoCyc, WIT, KEGG, Reactome, etc.
  • Allows wide range of detail, multiple levels of
    abstraction
  • Uses pointers to existing ontologies to provide
    supplemental annotation where appropriate
  • Cellular location -> GO Component
  • Cell type -> Cell.obo
  • Organism -> NCBI taxon DB
  • Incorporate other standards where appropriate
  • Chemical structure -> SMILES, CML, InChI
  • Interoperate with existing standards (RDF/OWL,
    LSID, SBML, PSI, CellML Metadata Standard)

80
Case study: BioPAX in SBML facilitates SBML
integration
  • Addresses SBML's nasty data integration issues
  • Different data types, same representation
  • Same data, different representations
  • External references
  • Synonyms
  • Provenance

81
BioPAX Ontology Overview
species
reaction
modifier
Level 1 v1.0 (July 7th, 2004)
82
Different data types, same representation
Protein-Protein Interaction:
<reaction id="pyruvate_dehydrogenase_cplx">
  <listOfReactants>
    <speciesRef species="PdhA"/>
    <speciesRef species="PdhB"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="Pyruvate_dehydrogenase_E1"/>
  </listOfProducts>
</reaction>

Biochemical Reaction:
<reaction id="pyruvate_dehydrogenase_rxn">
  <listOfReactants>
    <speciesRef species="NADP"/>
    <speciesRef species="CoA"/>
    <speciesRef species="pyruvate"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="NADPH"/>
    <speciesRef species="acetyl-CoA"/>
    <speciesRef species="CO2"/>
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesRef species="pyruvate_dehydrogenase_E1"/>
  </listOfModifiers>
</reaction>
83
BioPAX solution: metadata
<sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation>
        <bp:protein rdf:ID="PdhA"/>
      </annotation>
    </species>
    <species id="NADP" metaid="NADP">
      <annotation>
        <bp:smallMolecule rdf:ID="NADP"/>
      </annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx">
      <annotation>
        <bp:complexAssembly rdf:ID="pyruvate_dehydrogenase_cplx"/>
      </annotation>

84
BioPAX External References
<species id="pyruvate" metaid="pyruvate">
  <annotation
      xmlns:bp="http://biopax.org/release1/biopax-release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>
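
Reading such an external reference back out of an annotated species element can be sketched with Python's standard XML parser. The snippet mirrors the pyruvate example on this slide; moving both namespace declarations onto the root element so the fragment parses on its own, and the helper name, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: bp as written on the slide, rdf is the standard RDF namespace.
BP = "http://biopax.org/release1/biopax-release1.owl"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SPECIES_XML = f"""
<species id="pyruvate" metaid="pyruvate" xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <annotation>
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>
""".strip()


def unification_xrefs(xml_text):
    """Return (database, identifier) pairs for every unificationXref."""
    root = ET.fromstring(xml_text)
    ns = {"bp": BP}
    return [
        (xref.findtext("bp:DB", namespaces=ns),
         xref.findtext("bp:ID", namespaces=ns))
        for xref in root.iter(f"{{{BP}}}unificationXref")
    ]


xrefs = unification_xrefs(SPECIES_XML)
```

The (database, identifier) pair is what lets two tools agree that their "pyruvate" species refer to the same LIGAND entry, regardless of the local name each tool uses.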

85
BioPAX Synonyms
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://biopax.org/release1/biopax_release1.owl">
    <bp:smallMolecule rdf:ID="pyruvate">
      <bp:SYNONYMS>pyroracemic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>alpha-ketopropionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoic acid</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>

86
A comprehensive list of BioPAX supporting
Applications (Feb 2005)