1 The use of the concepts of evolutionary
biology in genome annotation.
2- Comparative genomics, concepts of orthology and paralogy.
- What is phylogenomics?
- Structural and functional annotation.
- Structural annotation (deciphering gene structure).
- Functional annotation (especially the use of phylogeny to decipher protein function).
- Figenix
- Genome evolution
- CASSIOPE
3 Metazoan phylogeny (from Adoutte et al. 2000)
4 URBILATERIA: the hypothetical metazoan ancestor (Geoffroy de St Hilaire, 19th century)
- The URBILATERIA genome evolved by the fixation of:
  - Gene mutation
  - Gene loss
  - Duplication:
    - Gene duplication
    - Genome region duplication
    - Whole genome duplication
800 million years ago
5 Large-scale gene duplication in the vertebrate lineage
[Figure: dated phylogenetic tree of Deuterostomata with Protostomata as outgroup. Taxa shown: Amniota (human), Lissamphibia, Actinopterygii (zebrafish), Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Myxini (hagfish), Pikaia, Cephalochordata (amphioxus), Urochordata (Ciona), Echinodermata, insects (Drosophila), nematodes (C. elegans). Node labels: 360, 450, 528, 564, 751, >751, <833-993, 833-993. Other annotations: Vertebrates, T1, T2, AIS (adaptive immune system), 20 000 genes.]
6 Orthologs under purifying selection
7 Ortholog functional switch
8 Co-ortholog subfunctionalization
9 Co-ortholog neofunctionalization
10 Orthologs and paralogs
11 Orthology / Paralogy
Orthologs: two genes in different species that descend from a common ancestral gene and were separated by a speciation event.
Paralogs: two genes resulting from a duplication event within a genome.
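The two definitions above can be read directly off a gene tree whose internal nodes are labelled with the event they represent. The following is a minimal sketch, assuming such a labelled tree is already available; the `Node` class, the function names and the toy gene family are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node of a gene tree; internal nodes carry an event label."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> set:
    """Return the names of all leaves below a node."""
    if not node.children:
        return {node.name}
    found = set()
    for child in node.children:
        found |= leaves(child)
    return found


def last_common_ancestor(node: Node, a: str, b: str) -> Node:
    """Descend to the deepest node whose subtree contains both leaves."""
    for child in node.children:
        below = leaves(child)
        if a in below and b in below:
            return last_common_ancestor(child, a, b)
    return node


def relationship(root: Node, a: str, b: str) -> str:
    """Orthologs diverge at a speciation node, paralogs at a duplication node."""
    event = last_common_ancestor(root, a, b).event
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical family: one duplication that predates the human/mouse split.
tree = Node("root", "duplication", [
    Node("A", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("B", "speciation", [Node("human_B"), Node("mouse_B")]),
])

print(relationship(tree, "human_A", "mouse_A"))  # orthologs
print(relationship(tree, "human_A", "human_B"))  # paralogs
print(relationship(tree, "human_A", "mouse_B"))  # paralogs
```

The last call shows that two genes from different species are still paralogs when the node that separates them is a duplication.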
12 How can an orthologous relationship be established?
Many scientists use the best BLAST hit to look for orthologous relationships.
BUT!
- Many co-orthologs can be present.
- The approach breaks down with genomes that are not fully sequenced, or when gene loss has occurred.
AND
- Even with phylogenetic analysis:
- Biases must be corrected.
- Different methods must be used to reconstruct phylogenetic trees.
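The co-ortholog problem with best-hit approaches can be made concrete with a small sketch. It assumes similarity scores are already available as nested dictionaries; the gene names and scores are invented for illustration, and the reciprocal-best-hit rule shown here is a common variant of the best-hit idea, not necessarily the one meant on the slide.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]


def best_hit(query: str, scores: Scores) -> str:
    """Return the subject with the highest score for one query."""
    hits = scores[query]
    return max(hits, key=hits.get)


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where the best hit of a is b and the best hit of b is a."""
    pairs = []
    for a in a_vs_b:
        b = best_hit(a, a_vs_b)
        if b in b_vs_a and best_hit(b, b_vs_a) == a:
            pairs.append((a, b))
    return pairs


# Hypothetical scores. Gene a1 of species A has two co-orthologs in
# species B (b1 and b2), produced by a duplication in the B lineage.
a_vs_b = {"a1": {"b1": 410.0, "b2": 395.0}}
b_vs_a = {"b1": {"a1": 410.0}, "b2": {"a1": 395.0}}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Only one pair is reported, so the second co-ortholog b2 is silently missed, which is the situation the slide warns about.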
13 Co-ortholog neofunctionalization
15 Replacement of constitutive proteasome β-subunits after interferon-γ stimulation
Paralogous duplicated genes
16 Large scale gene duplication in vertebrate
lineage
18 PHYLOGENOMICS: the study of gene and genome history.
=> Helps to find evidence for gene function.
19- Comparative genomics, concepts of orthology and paralogy.
- What is phylogenomics?
- Structural and functional annotation.
- Structural annotation (deciphering gene structure).
- Functional annotation (especially the use of phylogeny to decipher protein function).
- Figenix
- Genome evolution
- CASSIOPE
20- A correct structural prediction for a correct phylogenetic analysis.
21 Structural annotation
- Genome nucleotide-level annotation
- Mapping
- Finding genomic landmarks
- Gene finding and protein prediction
- Non-coding RNAs and regulatory regions
- Identifying repetitive elements
- Mapping segmental duplications
- Mapping variations (SNPs, microsatellites, etc.)
22 Available tools
Structural annotation: state of the art
- Ab initio
  - Genscan
  - Fgenesh
  - Genie
  - Etc.
  - Based on statistical signals within the DNA:
    - Coding propensity (hexamer signals).
    - Splice site signals.
  - Strengths:
    - Easy and quick to run.
    - Only need DNA as input.
  - Weakness: high false positive rate.
- Similarity assisted
  - GenomeScan
  - Twinscan
  - Extension of ab initio programs.
  - Use sequence similarities to guide the predictions.
  - Strengths: should be better than pure ab initio.
  - Weakness: high false positive rate.
- Similarity based
  - Genewise
  - Sim4
  - Est2genome
  - Alignment programs that know about gene structure.
  - Very accurate with strong sequence similarities.
  - Strengths: accurate.
  - Weakness: need strong similarities, slow to run.
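The "coding propensity (hexamer signals)" used by ab initio programs can be illustrated with a log-odds score over hexamer frequencies. This is a minimal sketch under simplifying assumptions: it ignores the reading frame, uses toy training strings invented for illustration, and does not reproduce the model of Genscan, Fgenesh or any other tool named above. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def hexamer_counts(seq: str) -> Counter:
    """Count every overlapping 6-mer in a sequence."""
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))


def log_odds_table(coding: str, noncoding: str, pseudo: float = 1.0) -> Dict[str, float]:
    """Log ratio of hexamer frequencies, coding versus non-coding, with pseudocounts."""
    c, n = hexamer_counts(coding), hexamer_counts(noncoding)
    keys = set(c) | set(n)
    c_total = sum(c.values()) + pseudo * len(keys)
    n_total = sum(n.values()) + pseudo * len(keys)
    return {
        k: math.log(((c[k] + pseudo) / c_total) / ((n[k] + pseudo) / n_total))
        for k in keys
    }


def coding_score(seq: str, table: Dict[str, float]) -> float:
    """Sum of log-odds over the hexamers of seq; unseen hexamers contribute 0."""
    return sum(table.get(seq[i:i + 6], 0.0) for i in range(len(seq) - 5))


# Toy training strings, invented for illustration (far too short for real use).
coding_train = "ATGGCCGCCGCCGCCGAGGAGGAGGCCGCC"
noncoding_train = "ATATATATTTTATATAAATATATTTATATA"
table = log_odds_table(coding_train, noncoding_train)

print(coding_score("GCCGCCGAGGAG", table) > 0)  # True
print(coding_score("TATATATTTATA", table) < 0)  # True
```

A positive score means the window looks more like the coding training set than the non-coding one. Because the score depends only on the DNA, it is quick to compute, but a purely statistical signal of this kind is also what makes false positives likely.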
23 Structural annotation
FIGENIX SOFTWARE PLATFORM: annotation method
- Structural annotation
- Combines a statistical approach with a homology-based approach (similarity to known proteins). Automating the process resulted in an expert system based on biological inference rules that use gene history and ab initio programs.
26 Validation of structural annotation
[Figure: workflow from genome sequence to predicted protein, compared with the experimentally determined protein. Values shown: Result 100, Genscan 31, HMMGene 38, Figenix 87.]
The platform's performance was validated on a standard dataset (HMR195); see Guigó et al., 2000 and Rogic et al., 2001.
27 Structural annotation
Accuracy versus exon type and prediction
The mouse and rat sequences from the HMR195 dataset were run against the human division of Swiss-Prot.
28 Functional annotation
- Biochemical and biological process
- Experimental approach
  - RNA interference
  - Tandem affinity purification and mass spectrometry
- In silico
  - Similarity
29 Functional annotation
- Functional annotation
- Based on phylogeny. It is inferred exclusively from experimentally annotated genes.
30 A small fraction corresponds to known, well-characterized proteins. If the function is unknown: phylogenetic analysis
- Case 1: an ortholog with an experimentally known function is found. The function of the gene to annotate can be deduced from it.
- Case 2: no ortholog with an experimentally known function is found. The function of the gene to annotate is deduced from the function of the closest paralog.
- In both cases, protein molecular function prediction by Bayesian phylogenomics can be used (Engelhardt et al., PLoS Computational Biology, 2005).
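The two cases above amount to a simple decision rule, sketched below. The `Homolog` record, its fields and the example gene names are hypothetical, and picking the nearest annotated ortholog by tree distance is an assumption; the sketch is not the FIGENIX implementation and does not cover the Bayesian approach cited.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Homolog:
    name: str
    relation: str             # "ortholog" or "paralog", as read from the gene tree
    distance: float           # tree distance to the gene being annotated
    function: Optional[str]   # experimentally determined function, or None


def infer_function(homologs: List[Homolog]) -> Tuple[Optional[str], str]:
    """Return (function, evidence) following Case 1, then Case 2."""
    known = [h for h in homologs if h.function is not None]
    orthologs = [h for h in known if h.relation == "ortholog"]
    if orthologs:  # Case 1: an experimentally annotated ortholog exists
        best = min(orthologs, key=lambda h: h.distance)
        return best.function, f"ortholog {best.name}"
    paralogs = [h for h in known if h.relation == "paralog"]
    if paralogs:  # Case 2: fall back on the closest annotated paralog
        best = min(paralogs, key=lambda h: h.distance)
        return best.function, f"closest paralog {best.name}"
    return None, "no experimentally annotated homolog"


# Invented example data.
homologs = [
    Homolog("geneX_mouse", "ortholog", 0.12, None),
    Homolog("geneX_fly", "ortholog", 0.80, "kinase activity"),
    Homolog("geneY_human", "paralog", 0.30, "phosphatase activity"),
]
print(infer_function(homologs))      # ('kinase activity', 'ortholog geneX_fly')
print(infer_function(homologs[::2]))  # ('phosphatase activity', 'closest paralog geneY_human')
```

The annotated ortholog wins even though the paralog is closer in the tree, which reflects the priority the slide gives to orthology.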
31 Functional annotation
Orthologs and paralogs with experimentally known function: how the information can be found.
Gene Ontology
SwissProt
GenBank
MedLine
Textual Information Analysis
G.O. Standard
32 Functional annotation
Gene Ontology classification
- Functional classification: three GO categories
- Biological process: the biological process to which the gene or gene product contributes.
  - e.g. cell growth and maintenance, pyrimidine metabolism
- Molecular function: biochemical activity, including specific binding to ligands or structures, of a gene product.
  - e.g. enzyme, transporter, Toll receptor ligand
- Cellular component: the place in the cell where a gene product is active.
  - e.g. cytoplasm, ribosome
33 Tumor necrosis factor family phylogenetic tree: ortholog identification
Atherosclerotic plaque formation
ALPS - LPR/GLD lymphoproliferative syndrome
Trends in Immunology (July 2003)
34 Human TNF family phylogenetic tree: search for the closest paralog
Functional annotation (molecular function, biological process)
[Figure: TNF superfamily ligands paired with their receptors and annotated with biological processes. Source: Trends in Immunology (July 2003).
Ligands: TNFSF1, TNFSF2, TNFSF3, TNFSF4, TNFSF5, TNFSF6, TNFSF7, TNFSF8, TNFSF9, TNFSF10, TNFSF11, TNFSF12?, TNFSF13, TNFSF13B, TNFSF14, TNFSF15, TNFSF18, EDA-A1, EDA-A2.
Receptors: TNFRSF1A, TNFRSF1B, TNFRSF3, TNFRSF4, TNFRSF5, TNFRSF6, TNFRSF6B, TNFRSF7, TNFRSF8, TNFRSF9, TNFRSF11A, TNFRSF11B, TNFRSF12, TNFRSF14, TNFRSF17, TNFRSF18, TNFRSF19, TNFRSF21, BR3, TACI, EDAR, XEDAR, RELT.
Annotated processes:
- LN, PP, GC, tumoricidal activity
- PP, GC, T cell homeostasis (death)
- T cell homeostasis (death)
- T cell costimulation, negative selection?
- T cell homeostasis (survival?), CTL activation, peripheral tolerance?
- T cell homeostasis (death), CTL function, peripheral tolerance, T cell costimulation, chemotaxis
- T cell transmigration and homeostasis (survival)?
- T cell homeostasis (survival), peripheral tolerance
- GC, B cell function, peripheral tolerance, T cell priming
- Tumoricidal activity, T cell function?
- LN, bone homeostasis, mammary gland development
- B cell homeostasis
- T cell activation?
- T cell activation and survival, CTL activity, tumoricidal activity?
- Negative selection, autoimmunity
- Tooth, hair, sweat gland formation
- Tooth, hair, skin formation?
- ? (function unknown for several receptors)]
35- COMPUTERIZING THE CONCEPTS
36 FIGENIX
FIGENIX is a multi-user software platform dedicated to structural and functional annotation tasks:
- Gene prediction for large DNA sequences
- Construction of robust phylogenetic trees
- Automatic detection of orthologs and paralogs
- Automatic retrieval of functional data on genes from Web databases
- Filtering and construction of protein databases (EST contig assembly)
- Chained processes (e.g. gene prediction followed by a phylogenetic study of each predicted gene)
37 STEPS OF THE phylogeny PIPELINE (1)
Protein sequence encoded by a putative gene
Ensembl, NR
BLAST, filtering
CLUSTAL W, purification, bias correction
Multiple alignment
PFAM
Domain search with HmmPFAM
Keep monophyletic repeats
Domain enumeration
Tree of Life construction
Alignment of merged repeats
Repeats present?
Yes
No
Reference tree
Composition test with TREE-PUZZLE to eliminate overly divergent sequences
FIGENIX domain creation (correctDomains)
Keep the complete alignment
38 STEPS OF THE phylogeny PIPELINE (2)
Detection of paralogy groups, elimination of sites that evolve too fast (Gu test)
Elimination of sequences with >30 gaps
Tree of Life construction
Elimination of non-congruent domains detected by HomPart in PAUP
Reference tree
Saturation test
NJ
Parsimony
Maximum likelihood
tree
tree
tree
Comparison of topologies with Templeton-Hasegawa tests
Congruent topologies?
Consensus tree
NJ tree
Yes
No
Ortholog detection / function search
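One of the filtering steps listed above removes sequences with too many gaps from the multiple alignment. The sketch below assumes the threshold is a percentage of alignment columns (30 %), which the slide does not state explicitly; the function names and toy sequences are hypothetical and this is not the FIGENIX code.

```python
from typing import Dict


def gap_fraction(aligned: str, gap_chars: str = "-.") -> float:
    """Fraction of alignment columns in which this sequence has a gap."""
    if not aligned:
        return 0.0
    return sum(ch in gap_chars for ch in aligned) / len(aligned)


def drop_gappy_sequences(alignment: Dict[str, str],
                         max_gap_fraction: float = 0.30) -> Dict[str, str]:
    """Keep only sequences whose gap fraction does not exceed the threshold."""
    return {
        name: seq
        for name, seq in alignment.items()
        if gap_fraction(seq) <= max_gap_fraction
    }


# Toy alignment with invented sequences, 10 columns each.
alignment = {
    "seq1": "MKTAYIAKQR",  # 0 gaps
    "seq2": "MKT-YIAKQR",  # 1 gap, 10 % of columns
    "seq3": "M----IAKQR",  # 4 gaps, 40 % of columns, removed
}
print(sorted(drop_gappy_sequences(alignment)))  # ['seq1', 'seq2']
```

Keeping the threshold as a parameter makes it easy to adjust if the intended cut-off is an absolute number of gaps instead.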
40 FIGENIX architecture
EGEE
Genomic Data
- Intranet/Extranet platform
- Three-tier architecture (web interface / business-logic servers / database)
41 Results (1)
EGEE
42 Results (2)
EGEE
43 Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, Danchin EG. FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics. 2005 Aug 5;6:198.
Balandraud N, Gouret P, Danchin EGJ, Blanc M, Zinn D, Roudier J, Pontarotti P. A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example. BMC Genomics 2005, 6:153. doi:10.1186/1471-2164-6-153
44 Analysis using Figenix
- Vienne et al. Evolution of the proto-MHC ancestral region: more evidence for the plesiomorphic organisation of human chromosome 9q34 region. Immunogenetics. 2003; 55(7):429-36.
- Danchin E, et al. The Major Histocompatibility Complex origin. Immunological Reviews. 2004 April; 198(1):216-232.
- Danchin EGJ, Gouret P, Pontarotti P. Universally conserved genes lost in mammals and vertebrates. BMC Evolutionary Biology (accepted).
- C Yu, et al. Roles of co-option in the emergence of vertebrate adaptive immune system, insights from amphioxus (submitted).
- Online users
  - INSERM U624, TAGC, UPRESA CNRS 6032, Marseille, INRA Nancy, Institute Mol. Genet., Acad. Sci. Czech Republic, Sun Yat-sen University China, Uppsala University, Department of Neuroscience Sweden.
- Draft papers
45- Comparative genomics, concepts of orthology and paralogy.
- What is phylogenomics?
- Structural and functional annotation.
- Structural annotation (deciphering gene structure).
- Functional annotation (especially the use of phylogeny to decipher protein function).
- Figenix
- Genome evolution
- CASSIOPE
46 C.A.S.S.I.O.P.E
- C.A.S.S.I.O.P.E: Clever Agent System for Synteny Inheritance and Other Phenomena in Evolution
- Finds conserved regions between genomes
- C.A.S.S.I.O.P.E reduces the working time 50-fold
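To give an idea of what "conserved regions between genomes" can mean computationally, the sketch below looks for runs of neighbouring genes in one genome whose orthologs are also neighbours in the other. It is a deliberately simplified illustration under stated assumptions (one-to-one orthologs, strict adjacency, no gaps tolerated); it is not the C.A.S.S.I.O.P.E algorithm, and all names and data are hypothetical.

```python
from typing import Dict, List


def conserved_blocks(order_a: List[str], order_b: List[str],
                     orthologs: Dict[str, str], min_size: int = 2) -> List[List[str]]:
    """Runs of neighbouring genes in A whose orthologs are also neighbours in B."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        pos = position_b.get(orthologs.get(gene))
        if pos is None:  # no ortholog placed in B: close the current run
            if len(current) >= min_size:
                blocks.append([g for g, _ in current])
            current = []
            continue
        if current and abs(pos - current[-1][1]) == 1:
            current.append((gene, pos))  # extends the run, in either orientation
        else:
            if len(current) >= min_size:
                blocks.append([g for g, _ in current])
            current = [(gene, pos)]
    if len(current) >= min_size:
        blocks.append([g for g, _ in current])
    return blocks


# Invented gene orders: a1-a3 are conserved but inverted in B, a4 has no ortholog.
order_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
order_b = ["b3", "b2", "b1", "bx", "b5", "b6"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a5": "b5", "a6": "b6"}

print(conserved_blocks(order_a, order_b, orthologs))
# [['a1', 'a2', 'a3'], ['a5', 'a6']]
```

Real synteny detection has to tolerate gene loss, insertions and many-to-many orthology, which is where most of the manual working time goes.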
47C.A.S.S.I.O.P.E.
48- Towards the reconstruction of ancestral genomes
50- Etienne Danchin (AFMB)
- Philippe Gouret
- Vérane Vitiello
- Nathalie Balandraud
- Alexandre Vienne
- Virginie Lopez
- Magali Lienart
- Pierre Pontarotti
Collaboration
- Etienne Pardoux
- Simona Grusea