Title: Bioinformatics in the 90s
1. Bioinformatics in the 90s
- Origins: data storage needs related to the sequencing effort...
- ... but storage was hardly enough: additional needs
  - Assembly, comparison and annotation of sequences
  - Prediction of genes
  - Reconstruction of evolutionary trees
  - Modeling and prediction of 3D structures
  - ...
- IT: on-line databases and software tools
- Science: modeling, computational representations, algorithms
2. The post-genomic phase transition
- Availability of complete genome sequences
- High-throughput experimental techniques yield new types of results
  - SNPs (Single Nucleotide Polymorphisms)
  - mRNA expression levels (DNA chips)
  - Systematic determination of 3D structures
  - Protein expression levels
  - Protein-protein interactions
  - Systematic mutagenesis
  - ...
- New needs & opportunities
  - Processing and analysis of each type of data
  - Integration of heterogeneous data
  - Reconstruction and simulation of cellular mechanisms
3. Corporate Information
- Founded: December 1997
- Headquarters and laboratories: central Paris
- Employees: 60 as of end 2001
- Intellectual property: 57 patents on technology, interactions and targets
- Equity raised: c. €30 million
- Ownership: Advent (B), Alafi (US), Apax (F), Auriga (F), IMH (D), Health Cap (S), Lombard-Odier (CH), Medicis (D), Rendex (B)
4. Hybrigenics business strategy
- Own drug discovery programs
  - in the fields of infectious diseases, cancer and metabolic disorders
  - the resulting novel validated targets being exploited for the Company's own product pipeline
- Collaboration and licensing agreements with biopharmaceutical companies
  - in any disease field
  - for out-licensing
5. Hybrigenics discovery programs
- Cancer
  - Proteins involved in basic cellular functions
  - Proteins involved in apoptosis
  - Proteins involved in cell cycle regulation
- Metabolic disorders / Obesity
  - Proteins involved in adipogenesis
- Anti-infectious diseases
  - Antibacterial: essential proteins of the pathogens
  - HIV, HCV: protein-protein interactions between the host cell and the pathogen
6. The Helicobacter pylori Genome
From Tomb et al. (1997), Nature 388:539-547
- Less than 20% with assigned biological functions (500 with no database match; 250 with structural homology but totally unknown function)
- 1,667,867 base pairs; 1,590 predicted ORFs
7. The protein-protein interaction map of Helicobacter pylori
- 285 baits (261 proteins)
- 2 million prey fragments
- 20 million interactions/bait
- PBS filtering (identification of false positives)
- Over 1,200 interactions; over 1,500 SIDs (Selected Interacting Domains)
Nature (2001) 409:211-215.
- Connectivity: 46.6% of proteome; 3.36 interactions/bait; reproducibility: >95%
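Connectivity (the share of the proteome involved in at least one interaction) and interactions per bait are simple ratios over an interaction list. A minimal Python sketch, assuming a hypothetical list of (bait, prey) identifier pairs and a known proteome size; the toy data are invented for illustration and are not the H. pylori dataset, and the exact counting conventions behind the published figures are not stated on the slide.

```python
def pim_statistics(interactions, proteome_size):
    """Return (connectivity, interactions_per_bait) for an interaction map.

    interactions: iterable of (bait, prey) identifier pairs (hypothetical format);
                  (A, B) and (B, A) are counted as distinct ordered pairs.
    proteome_size: number of predicted proteins (ORFs) in the organism.
    """
    pairs = set(interactions)                        # drop repeated observations
    baits = {bait for bait, _ in pairs}
    connected = {p for pair in pairs for p in pair}  # proteins with >= 1 interaction
    connectivity = len(connected) / proteome_size
    per_bait = len(pairs) / len(baits) if baits else 0.0
    return connectivity, per_bait


# Toy data (invented): 3 baits, 6 distinct interactions, 10-protein proteome.
toy = [("A", "B"), ("A", "C"), ("D", "E"), ("D", "A"),
       ("F", "G"), ("F", "H"), ("A", "B")]
conn, ipb = pim_statistics(toy, proteome_size=10)
print(f"connectivity = {conn:.1%}, interactions/bait = {ipb:.2f}")
# prints "connectivity = 80.0%, interactions/bait = 2.00"
```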
8. Target Identification: Hybrigenics' PIM Technology Platform
- New generation of reliable high-throughput 2-hybrid in yeast and E. coli
- PIMBuilder: in-house production management system
- PBS scoring technology
- VirtualPIM: prediction
- PIMRider platform
9. Hybrigenics Target Discovery Process
Target Identification
Target Pre-Validation
Target Validation
Selected Pathology and Mechanism of Action
10. In-silico Target Validation Platform
- Goals
  - Validate protein interactions and SIDs
  - Evaluate target potential and druggability
  - Provide functional context for target candidates
  - Prioritize "promising" candidates for biological validation
- Means
  - Integrate PIMs with functional clues of different origins
  - Predict novel biological information
  - Computer-aided decision process
    - Provide a comprehensive, decision-oriented view of functional clues
    - Automated filtering
- Output
  - Prevalidated targets + functional context
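The goals above (combine functional clues, filter automatically, prioritize candidates) can be illustrated by a weighted score with a threshold. This is a hypothetical sketch, not the platform's actual decision process: the clue names, their normalization to [0, 1], the weights and the 0.5 cut-off are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    # Hypothetical functional clues, each assumed normalised to [0, 1].
    clues: dict = field(default_factory=dict)


# Hypothetical weights: how much each type of evidence counts.
WEIGHTS = {"interaction_confidence": 0.4, "coexpression": 0.3, "druggability": 0.3}


def score(c: Candidate) -> float:
    """Weighted sum of the available clues; a missing clue contributes 0."""
    return sum(w * c.clues.get(k, 0.0) for k, w in WEIGHTS.items())


def prioritize(candidates, min_score=0.5):
    """Automated filtering (score threshold), then ranking, best first."""
    kept = [c for c in candidates if score(c) >= min_score]
    return sorted(kept, key=score, reverse=True)


cands = [
    Candidate("P1", {"interaction_confidence": 0.9, "coexpression": 0.8, "druggability": 0.7}),
    Candidate("P2", {"interaction_confidence": 0.2, "coexpression": 0.1}),
    Candidate("P3", {"interaction_confidence": 0.6, "druggability": 0.9}),
]
print([c.name for c in prioritize(cands)])  # prints ['P1', 'P3']
```

Treating a missing clue as zero penalizes poorly documented proteins; other conventions (e.g. renormalizing over available clues) are equally plausible.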
11. The Genostar platform: a modular software platform for exploratory genomics
The Geno Consortium: Pasteur Institute (Paris), National Institute for Research in Computer Science (INRIA, Grenoble), Genome Express (Grenoble), Hybrigenics
- Genostar technology
  - Rich object-based knowledge representation system (objects, relations, tasks and strategies)
  - Modular architecture
  - Domain-specific biological modeling
12. Genolink: viewing biological data as a graph of relations
Genolink composite graph
- Vertices: biological entities
- Edges: similarity, interaction or association links
Link types
- Sequence similarity links
- Profile similarity links
- Domain inclusion links
- Tissue expression links
- Subcellular location links
- Protein interaction links
Preprocessed input data
- Genomic data
- mRNA expression data
- Interaction data
- Sub-cellular location
- Domain data
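A composite graph of this kind can be sketched as an adjacency structure whose edges carry a link type, so that a query can follow only selected kinds of evidence. The class and link-type names below are hypothetical, and storing every link as undirected is a simplifying assumption.

```python
from collections import defaultdict


class CompositeGraph:
    """Vertices are biological entities; every edge carries a link type."""

    def __init__(self):
        self._adj = defaultdict(set)  # vertex -> {(neighbour, link_type), ...}

    def add_link(self, a, b, link_type):
        # Simplification: similarity, interaction and association links
        # are all stored as undirected.
        self._adj[a].add((b, link_type))
        self._adj[b].add((a, link_type))

    def neighbours(self, vertex, link_types=None):
        """Sorted neighbours of `vertex`, optionally restricted to some link types."""
        return sorted({n for n, t in self._adj.get(vertex, ())
                       if link_types is None or t in link_types})


g = CompositeGraph()
g.add_link("ProtA", "ProtB", "protein_interaction")
g.add_link("ProtA", "ProtC", "sequence_similarity")
g.add_link("ProtA", "nucleus", "subcell_location")
g.add_link("ProtB", "nucleus", "subcell_location")

print(g.neighbours("ProtA"))  # prints ['ProtB', 'ProtC', 'nucleus']
print(g.neighbours("ProtA", {"protein_interaction", "sequence_similarity"}))
```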
13. From PIMs to Pathways
Combine PIMs and external data to reconstruct biological pathways
[Figure: PIM annotation and pathway expansion. Labels: PIM (network of interaction links), context-dependent homology, common data model, functional classifications, pathway databases, PIMs]
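One simple way to picture pathway expansion is to start from the proteins of a known pathway (seeds) and collect their PIM partners within a few interaction links, as candidates for annotation. A breadth-first sketch, assuming interactions are given as undirected pairs; the function name and the hop limit are illustrative assumptions, not the method used in the original work.

```python
from collections import deque


def expand_pathway(seeds, interactions, max_hops=1):
    """Return {protein: hops} for proteins within `max_hops` links of the seeds."""
    adj = {}
    for a, b in interactions:               # undirected interaction links
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {s: 0 for s in seeds}            # seeds are at distance 0
    queue = deque(seeds)
    while queue:
        p = queue.popleft()
        if dist[p] == max_hops:             # do not expand beyond the limit
            continue
        for q in adj.get(p, ()):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


pim = [("S1", "X"), ("X", "Y"), ("S2", "Z")]
print(expand_pathway(["S1", "S2"], pim, max_hops=1))
```

Each added protein is only a hypothesis: in the approach outlined above, it would still need support from homology, functional classifications or pathway databases.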
14. The BioPathways Consortium
- Mission
  - Foster development of pathways informatics & systems biology
- Goals
  - Scientific community buildup, standards recommendation, public outreach, industry-academia collaboration support, coordination with other groups
- Means
  - Forum open to interested participants (academics, pharmas, biotechs, software vendors)
- Achievements
  - Launched June 2000 by 3rd Millennium (Boston) and Hybrigenics (Paris)
  - 1st meeting at ISMB 2000 -> work groups
  - 2nd meeting at PSB 2001 -> first results on evaluation of pathways representations
  - 3rd meeting: satellite meeting of ISMB 2001, Copenhagen -> focus on ontologies and pathways reconstruction (>150 attendees), new workgroups
  - Several sponsors (pharmas, biotechs, IT companies)
  - Over 200 participants from academia & industry
15. Functional annotation
- Objective: assign one or more functions to a gene or a protein of known sequence
- Traditional methods
  - Experimental results
  - Variations on the theme of propagating annotations of experimental origin via sequence similarity
- Function?
  - Local and precise (e.g. protein P is an enzyme catalyzing reaction R)
  - Global and vague: membership in a high-level biological process (e.g. P is involved in glucose degradation)
- What is propagated: keywords, a node of a functional classification tree
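In its simplest form, propagation via sequence similarity copies the annotations of the best annotated hit above a similarity cut-off. A minimal sketch, assuming the similarity search has already been run and its hits are given as (query, subject, percent identity) tuples; the 40% threshold and the best-hit rule are illustrative choices, and real pipelines are more cautious because propagation also spreads errors.

```python
def propagate_annotations(hits, annotations, min_identity=40.0):
    """Transfer annotations from each query's best annotated hit.

    hits: (query, subject, percent_identity) tuples (hypothetical format).
    annotations: subject -> set of terms of experimental origin
                 (keywords or nodes of a functional classification).
    """
    best = {}  # query -> (subject, identity) of the best qualifying hit
    for query, subject, identity in hits:
        if identity < min_identity or subject not in annotations:
            continue
        if query not in best or identity > best[query][1]:
            best[query] = (subject, identity)
    return {q: set(annotations[s]) for q, (s, _) in best.items()}


hits = [("q1", "s1", 85.0), ("q1", "s2", 60.0), ("q2", "s3", 25.0)]
known = {"s1": {"glycolysis"}, "s2": {"DNA repair"}, "s3": {"kinase"}}
print(propagate_annotations(hits, known))  # prints {'q1': {'glycolysis'}}
```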
16. An effort toward consensus: Gene Ontology
Fig. 1 Examples of Gene Ontology. Three examples illustrate the structure and style used by GO to represent the gene ontologies and to associate genes with nodes within an ontology. The ontologies are built from a structured, controlled vocabulary. The illustrations are the products of work in progress and are subject to change when new evidence becomes available. For simplicity, not all known gene annotations have been included in the figures. a, Biological process ontology. This section illustrates a portion of the biological process ontology describing DNA metabolism. Note that a node may have more than one parent: for example, DNA ligation has three parents, 'DNA-dependent DNA replication', 'DNA repair' and 'DNA recombination'. b, Molecular function ontology. The ontology is not intended to represent a reaction pathway, but instead reflects conceptual categories of gene-product function. A gene product can be associated with more than one node within an ontology, as illustrated by the MCM proteins. These proteins have been shown to bind chromatin and to possess ATP-dependent DNA helicase activity, and are annotated to both nodes. c, Cellular component ontology. The ontologies are designed for a generic eukaryotic cell, and are flexible enough to represent the known differences between diverse organisms.
The Gene Ontology Consortium (2000) Nature Genet. 25:25-29
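Because a GO node may have several parents, the ontology is a directed acyclic graph rather than a tree, and a gene annotated to a term is implicitly annotated to all of that term's ancestors. A small sketch using the DNA ligation example from the caption; the single "DNA metabolism" root is a simplification (intermediate terms omitted), and the dictionary format is an assumption, not GO's file format.

```python
# Simplified fragment of the biological process ontology: term -> parents.
PARENTS = {
    "DNA ligation": ["DNA-dependent DNA replication", "DNA repair", "DNA recombination"],
    "DNA-dependent DNA replication": ["DNA metabolism"],
    "DNA repair": ["DNA metabolism"],
    "DNA recombination": ["DNA metabolism"],
    "DNA metabolism": [],
}


def ancestors(term, parents=PARENTS):
    """All terms reachable by following parent links (depth-first, no repeats)."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


print(sorted(ancestors("DNA ligation")))
```

The `seen` set matters in a DAG: "DNA metabolism" is reached through three different parents but is reported once.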
17. The dogma
Sequence
Structure
Function
18. ... and the experiments
[Diagram labels: cellular context; perturbation technologies; observation technologies; sequence, structure, function and phenotype, linked by open questions (?)]
Perturbation-observation pair: false positives, false negatives, statistical treatment, formalization of the conclusion
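The statistical side of a perturbation-observation pair can be illustrated by comparing observed results with a reference set. A minimal sketch, assuming a hypothetical reference set of known true interactions; false-positive counts are only meaningful to the extent that this reference is complete for the pairs that were tested.

```python
def error_rates(observed, reference):
    """Counts of true/false positives and false negatives, plus precision and sensitivity."""
    observed, reference = set(observed), set(reference)
    tp = len(observed & reference)   # observed and confirmed
    fp = len(observed - reference)   # observed, absent from the reference
    fn = len(reference - observed)   # in the reference, missed
    precision = tp / (tp + fp) if observed else 0.0
    sensitivity = tp / (tp + fn) if reference else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "sensitivity": sensitivity}


obs = {("A", "B"), ("A", "C"), ("B", "D")}
ref = {("A", "B"), ("B", "D"), ("C", "D"), ("D", "E")}
print(error_rates(obs, ref))
```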
19. Integration of heterogeneous data
- Joint use of functional clues from a variety of experimental approaches to
  - validate the biological relevance of interactions
  - determine the function of proteins
  - validate targets in silico
- Examples
  - Interaction + expression
  - Interaction + 3D structure
  - Location + expression
  - Phylogenetic profiles + domain fusion
- Recent problem; a bottleneck for drug discovery efforts
- Frontier for the bioinformatics community
  - Technology: normalization, formats, ontologies
  - Science: automate (some) biological reasoning?
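For the "Interaction + expression" example, one common clue is whether two interacting proteins have correlated mRNA expression profiles. A sketch using Pearson correlation over hypothetical expression vectors measured under the same conditions; the 0.7 cut-off is an assumption, and low correlation does not by itself rule an interaction out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def support_by_coexpression(interactions, expression, min_r=0.7):
    """Keep interactions whose partners' expression profiles correlate above min_r."""
    supported = []
    for a, b in interactions:
        if a in expression and b in expression:
            r = pearson(expression[a], expression[b])
            if r >= min_r:
                supported.append((a, b, round(r, 3)))
    return supported


expr = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.1, 5.9, 8.0], "C": [4.0, 1.0, 3.0, 2.0]}
print(support_by_coexpression([("A", "B"), ("A", "C")], expr))
```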
20. Evaluating pathways representations
- Vincent Schächter, Hybrigenics, Paris
- Aviv Regev, Tel-Aviv University
- BioPathways Formalisms Workgroup
21. Evaluation scope: untangling the web...
- Large body of literature, focusing on different biological phenomena and different theoretical issues
- A typical article on pathways may include one or more of the following:
  - A data-model, describing (a fraction of) the pathway universe of discourse
  - A formalism, used to describe the data-model and to express algorithms / functions
  - Description of algorithms based on characteristics of both the formalism and the data-model
  - Description of implementations of data-storage functionalities and/or of some of the above algorithms
22. Excerpt from target evaluation list: non-DE (non-differential-equation) formalisms
- Petri nets (basic, hybrid, self-modifying, time-dependent, hierarchical, mobile)
- Process algebra (basic and stochastic pi-calculus)
- Markup languages (CellML and SBML)
- Biocalculus
- Regulatory grammars (Collado-Vides)
- Semiotes (Kazic)
- Statecharts (Kam, Holcombe)
- Boolean networks (basic, multi-level)
- Hierarchical networks (Bodnar)
- Neural networks (Mjolsness)
- Molecular graph reaction networks (McCaskill)
- Molecular interaction maps (Kohn)
- Electrical circuits (Keane)
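To make the first entry of the list concrete, here is a basic Petri net: places hold tokens, and a transition fires only when each of its input places holds enough tokens. A minimal sketch of the token game on a hypothetical enzyme reaction (S + E -> ES -> P + E); it covers only basic place/transition nets, none of the hybrid, timed or other extensions listed.

```python
class PetriNet:
    """Basic place/transition net with weighted arcs."""

    def __init__(self, marking, transitions):
        self.marking = dict(marking)     # place -> token count
        self.transitions = transitions   # name -> (inputs, outputs), each {place: weight}

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking.get(p, 0) >= w for p, w in inputs.items())

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        for p, w in inputs.items():      # consume input tokens
            self.marking[p] -= w
        for p, w in outputs.items():     # produce output tokens
            self.marking[p] = self.marking.get(p, 0) + w


net = PetriNet(
    {"S": 2, "E": 1, "ES": 0, "P": 0},
    {"bind": ({"S": 1, "E": 1}, {"ES": 1}),
     "convert": ({"ES": 1}, {"P": 1, "E": 1})},
)
for t in ["bind", "convert", "bind", "convert"]:
    net.fire(t)
print(net.marking)  # prints {'S': 0, 'E': 1, 'ES': 0, 'P': 2}
```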
23. Some examples of discrete representations
- Object-oriented models
  - Queries on all types of networks
  - Reconstruction, but problem of incomplete information
- Boolean networks
  - Qualitative simulation, reconstruction from expression data
  - Applied to regulatory networks
- Petri nets
  - Finer qualitative simulation, formal analysis of behavior
  - Applied to regulatory networks
  - Possible application to metabolic and signaling networks with extensions (self-modifying PN, hybrid PN)
- Process algebras
  - Simulation, formal analysis, reconstruction
  - Applied to signaling and regulatory networks (metabolism with stochastic extensions)
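Qualitative simulation with a Boolean network can be sketched as repeated synchronous updates until a state repeats, which reveals the attractor (a fixed point or a cycle). The three-gene negative feedback loop below is a hypothetical example, not a model from the cited work.

```python
GENES = ("A", "B", "C")
RULES = {
    "A": lambda s: not s["C"],  # C inhibits A
    "B": lambda s: s["A"],      # A activates B
    "C": lambda s: s["B"],      # B activates C
}


def step(state):
    """Synchronous update: every gene is recomputed from the same previous state."""
    s = dict(zip(GENES, state))
    return tuple(bool(RULES[g](s)) for g in GENES)


def find_attractor(state, max_steps=1000):
    """Iterate until a state repeats; return the cycle (attractor) reached."""
    seen, trajectory = {}, []
    while state not in seen and len(trajectory) < max_steps:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    if state not in seen:
        raise RuntimeError("no repeated state within max_steps")
    return trajectory[seen[state]:]


cycle = find_attractor((True, False, False))
print(len(cycle))  # prints 6: the negative loop oscillates
```

Since a network of n genes has at most 2^n states, a deterministic trajectory must repeat; `max_steps` is only a guard.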
24. The position of formalisms in the context of pathways informatics
- Pathway construction (pathway generation, pathway selection): supported by a construction-oriented formalism & data-model
- Dynamics (simulation, analysis): supported by a dynamics-oriented formalism & data-model
- Data storage & retrieval (query language): supported by a database-oriented formalism & data-model
- Each formalism & data-model expresses the core representation / ontology
  - Biological scope
  - Formal expressiveness
25. Evaluate and compare: a modular approach
- Evaluate expressiveness / ease of use of a representation relative to specific goals / functionalities
- Compare representations in the categories for which they were designed
- Reduce each category to a set of evaluation items that can be rated and compared as objectively as possible
26. Core representation / Ontology
- Conceptual structure of the universe of discourse (abstract and concrete entities, relations, hierarchies...)
- Constrains the scope of phenomena that can be described, and thus queried, analyzed and reconstructed
- Often implicit in a given pathway representation: needs to be extracted...
- Possible evaluation schemes
  1. Compare features of ontologies
  2. Expressiveness: benchmark set of biological situations
  3. Translation of data models into common formalisms, then comparison
How do you represent "gene A inhibits gene B" in your data model?
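One possible answer to this question, sketched in Python: store the statement as a typed relation, then translate it into another formalism (here a Boolean update rule), in the spirit of evaluation scheme 3. The classes, the enum and the translation convention (the target is on if any activator is on and no inhibitor is on) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ACTIVATES = "activates"
    INHIBITS = "inhibits"


@dataclass(frozen=True)
class Regulation:
    source: str
    target: str
    effect: Effect


def to_boolean_rule(regs, target):
    """Translate the regulations acting on `target` into a Boolean update rule."""
    acts = sorted(r.source for r in regs if r.target == target and r.effect is Effect.ACTIVATES)
    inhs = sorted(r.source for r in regs if r.target == target and r.effect is Effect.INHIBITS)
    terms = []
    if acts:
        terms.append("(" + " OR ".join(acts) + ")")
    terms += [f"NOT {i}" for i in inhs]
    # With no regulators, the gene simply keeps its value.
    return f"{target}' = " + (" AND ".join(terms) if terms else target)


model = [Regulation("geneA", "geneB", Effect.INHIBITS)]
print(to_boolean_rule(model, "geneB"))  # prints geneB' = NOT geneA
```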
27. Conceptual Model: Biological Scope Evaluation Items
28. Core Representation: Formal Expressiveness Evaluation Items
29. Data Storage and Retrieval
- Storage and retrieval of data: database-related functionalities
  - Extremes: relational or OO models vs., e.g., most simulation-oriented formalisms...
  - A data-retrieval-oriented formalism can be used below other formalisms
- Query language
  - Retrieve information within a structured, homogeneous, compositional framework
  - Shifting boundary with analysis and reconstruction algorithms
- Evaluation items / sub-categories
  - Robust database: implementation issue
  - Query language ease of use
  - Query language expressiveness
    - Limited by formalism and ontology expressiveness
30. Pathway reconstruction
- Construction / prediction of pathways in a given biological environment (organism, tissue, condition, location) from a combination of
  - experimental data,
  - fully instantiated pathway information,
  - partially instantiated (or incomplete) pathway data, such as interaction data
- Special cases: reverse engineering, pathway inference
- Evaluation items / sub-categories
  - Input data types
  - Pathway generation algorithm
  - Pathway selection algorithm
  - Pathway fitness function
  - Pathway similarity / homology measure
  - Interactive validation?
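Pathway generation and selection can be illustrated on an interaction graph: enumerate candidate paths between two proteins, then rank them with a fitness function. A minimal sketch, assuming a hypothetical directed graph with confidence scores on its edges and a fitness equal to the product of edge confidences; both choices are illustrative, not a recommended scoring scheme.

```python
def generate_paths(adj, source, target, max_len=4):
    """Pathway generation: simple paths source -> target with at most max_len links."""
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) > max_len:          # path already uses max_len links
            continue
        for nxt in adj.get(node, {}):
            if nxt not in path:          # keep paths simple (no cycles)
                stack.append((nxt, path + [nxt]))
    return paths


def fitness(adj, path):
    """Hypothetical fitness: product of edge confidences along the path."""
    score = 1.0
    for a, b in zip(path, path[1:]):
        score *= adj[a][b]
    return score


def select_best(adj, source, target, k=1, max_len=4):
    """Pathway selection: keep the k highest-fitness candidates."""
    ranked = sorted(generate_paths(adj, source, target, max_len),
                    key=lambda p: fitness(adj, p), reverse=True)
    return ranked[:k]


adj = {
    "receptor": {"adaptor": 0.9, "kinase2": 0.3},
    "adaptor": {"kinase1": 0.8},
    "kinase1": {"tf": 0.9},
    "kinase2": {"tf": 0.9},
}
print(select_best(adj, "receptor", "tf"))  # prints [['receptor', 'adaptor', 'kinase1', 'tf']]
```

A product of confidences favors short paths; a similarity or homology measure against known pathways could serve as an alternative fitness.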
31. Dynamics
- Study of network dynamics (regulatory networks, signal transduction, metabolic pathways)
  - Simulation runs
  - Analysis of dynamic behavior
- Evaluation items / sub-categories
  - States: nature, expressiveness, level of detail vs. available data
  - Evolution rules / reaction model: rule, implementation
  - Time: continuous / discrete, synchronous / asynchronous updates
  - Space: continuous / discrete, topology, resolution
  - Analysis
    - Scope: state reachability, liveness of transitions, substance flow...
    - Formal methods available
    - Comparative power
    - Limited to steady state?
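State reachability, and the difference between synchronous and asynchronous updates, can be shown on a hypothetical two-gene mutual-inhibition switch. This sketch assumes the asynchronous convention in which one gene changes at a time; states with no successor are steady states. With synchronous updates, the start state (False, False) would instead oscillate with (True, True), so the choice of update scheme changes the answer.

```python
from collections import deque

GENES = ("A", "B")
RULES = {
    "A": lambda s: not s["B"],  # B inhibits A
    "B": lambda s: not s["A"],  # A inhibits B
}


def async_successors(state):
    """Asynchronous update: change one gene at a time, only if its rule flips it."""
    s = dict(zip(GENES, state))
    succ = []
    for i, g in enumerate(GENES):
        new = bool(RULES[g](s))
        if new != state[i]:
            succ.append(state[:i] + (new,) + state[i + 1:])
    return succ


def reachable(start):
    """Breadth-first exploration of the asynchronous state-transition graph."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in async_successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


states = reachable((False, False))
steady = {s for s in states if not async_successors(s)}
print(sorted(states), sorted(steady))
```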
32. Methodology: what do we evaluate?
- Evaluation targets: queries, reconstruction, simulation
- Formalism: supports the evaluation targets and describes the data-model
- Data-model -> translation into a common ontology description language? -> Ontology