Title: Gipsi Lima-Mendez
1. The ACLAME database of Mobile Genetic Elements
and associated tools for in silico analysis of
(pro)phages
Service de Bioinformatique des Génomes et des
Réseaux (BiGRe) Université Libre de Bruxelles,
Bvd du Triomphe, 1050 Bruxelles
Gipsi Lima-Mendez gipsi_at_scmbb.ulb.ac.be
Raphaël Leplae raphael_at_scmbb.ulb.ac.be
Ariane Toussaint ariane_at_scmbb.ulb.ac.be
2. ACLAME project (1) (A CLAssification of Mobile
Elements)
The ACLAME project aims at offering a repository
for the collection and analysis of prokaryotic
Mobile Genetic Elements (MGEs) i.e. phages,
plasmids and all MGEs that reside integrated in
their host genome, such as IS sequences (see also
the IS-Finder database, http://www-is.biotoul.fr).
The general schema of
the ACLAME database rests on the mosaic structure
of MGEs. Elements in different traditional
classes perform similar functions using similar
proteins. Hence, ACLAME aims at describing MGEs as
composites of functional modules.
3. ACLAME project (2)
The next slide provides a few examples:
- To move from one position to another on their
host genome, most IS sequences and the mutator
phages use a DDE transposase, which recognizes,
binds and cleaves the ends of those sequences.
The transposase gene and the target sites (called
IR in IS and att in phages) form a module (in
yellow).
- Similarly, integrons, conjugative transposons and
phages (the latter not shown here) integrate and
excise by means of an integrase of the
tyrosine-based families acting on att sites (in
green).
- Conjugative transposons and conjugative
plasmids share conjugation machinery (including
the type IV secretion related mating pair
formation apparatus, in red).
4. Modular structure of MGEs
[Figure: gene maps of several MGEs. Panels: IS (IRL, tnpA, IRR); plasmid carrying an integron, without and with cassette (int, attI, Pant, ant, orfA, orfB, tnpA); mutator phage Mu (attL, attR, c, ner, A, B, late genes, terminal repeats L1 L2 L3 and R3 R2 R1); conjugative transposon or ICE Tn4371 (attL, attR, int, 13 bph genes for biphenyl degradation, tnpA, repA, parA, parB, traF, traG, traR, type IV secretion system genes trbB, trbC, trbE, trbF, trbG, trbI, trbJ, trbL, and ORFs labeled RO00002 to RO00055).]
5. ACLAME project (3)
The problem of classifying MGEs thus moves from
the daunting task of deciding fixed categories
for combinatorial elements, to that of
identifying and classifying their constituent
modules. The next slide illustrates a vision of
what such modules could be, in terms of individual
MGE proteins, organized into families of related
proteins that act within a complex/functional
module (phage heads and tails, conjugation/secretion
apparatus etc.), various MGEs being
assortments of those modules.
6. Basic ACLAME concept (Merlin et al. 2000)
Reconstruction of various bacterial MGEs
Proteins
7. The next slides illustrate how MGE protein
families are assembled and analyzed within the
ACLAME schema.
1- MGE (so far phage and plasmid) protein sequences
are extracted from the NCBI database and they are:
- Compared all vs. all using Blastp, which provides
a matrix of pair-wise similarity scores.
- All compared to the protein sequences in the
NRDB-NCBI, Swiss-Prot and SCOP databases using
PSI-BLAST.
8. Generating protein families
Protein clustering
All-vs-all
MGE proteins
9.
2- The similarity matrix is used for clustering
with the TRIBE-MCL clustering algorithm (Enright
et al. 2002), with E-value threshold and
inflation values that were shown to best
reproduce the SCOP protein families (clusters,
Leplae et al. 2004) and the IS sequence families
(IS-Finder database, Siguier et al. 2006).
3- Multiple sequence alignments (MSA) are generated
for all protein families of 3 or more members.
4- The MSA is used to generate an HMM profile for
each family.
5- The HMM profiles are compared with
protein sequences in NRDB, Swiss-Prot and SCOP.
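The clustering in step 2 can be illustrated with a minimal sketch of the Markov clustering idea (alternating expansion and inflation of a column-normalized similarity matrix). This is not the TRIBE-MCL program itself: the function names, the toy score matrix, the self-loop convention and the default inflation value of 2.0 are assumptions made for illustration only.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, inflation):
    """Inflation step: raise entries to a power, then re-normalize columns."""
    return normalize_columns([[x ** inflation for x in row] for row in matrix])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster items from a symmetric similarity matrix (list of lists)."""
    n = len(similarity)
    # Assumption: add self-loops before normalizing, a common MCL convention.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        change = max(abs(new[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = new
        if change < tol:
            break
    # Each non-empty row of the converged matrix lists one cluster's members.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical pair-wise scores for five proteins forming two groups.
toy_scores = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(toy_scores))  # [[0, 1, 2], [3, 4]]
```

A higher inflation value tends to split the graph into more, smaller families, which is why the inflation value has to be tuned against reference families as described above.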
10. Generating protein families
Protein clustering
All-vs-all
MGE proteins
HMM Profile
MSA
11.
6- All that information and additional
experimental evidence available in the literature
are used to assign a function to the families.
For the purpose of this functional annotation,
a list of functions has been assembled and is
progressively implemented into a structured
ontology based on the Gene Ontology (GO,
http://www.geneontology.org) format (see more
about the PhiGO ontology below).
12. Generating protein families
Protein clustering
All-vs-all
MGE proteins
HMM Profile
MSA
ACLAME Classification
Functional annotation
13. ACLAME is a relational database. It contains a
number of tables that are linked. Each table can
be browsed and it is possible to navigate between
the tables.
- MGE genomes linked to NCBI
- MGE hosts
- MGE proteins
- Families with a functional annotation
- ACLAME list of functions
14. View of protein families in ACLAME version 0.2.
Hits of HMM in databases
Click to view family
15. View of one phage protein family in ACLAME
version 0.2.
View MSA of the family
ACLAME function
Link to GO ontology
View Hits in databases
Click to View protein
View hits of HMM in databases
16. View of one phage protein in ACLAME version 0.2.
ACLAME function
View secondary structure prediction
Link to NCBI
Back to family view
17. View of genomes list in ACLAME version 0.2
go to genome view
18. One genome view
go to protein view
go to family view
go to NCBI
19. Blast over the ACLAME content
Access to Blastp of ACLAME content
20. View of the ACLAME Blastp output
Go to best hit protein view
Go to Family of best hit protein view
Query has no significant similarity with ACLAME
content
21. ACLAME TOOLS (1)
- PhiGO ontology for annotation of phage proteins.
22. The PhiGO Phage Ontology.
- Structured list of terms that should capture
everything that's known about phage gene products
in terms of:
- Molecular functions
- Biological processes
- Components
To fit the Gene Ontology (Harris et al. 2004),
PhiGO is in the OBO format and formalized as a
Directed Acyclic Graph (DAG) where nodes are
terms and edges the type of relationship (is_a
or part_of) that relates them.
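As a minimal sketch of such a DAG, the class below stores typed edges (is_a or part_of) from child terms to parent terms, refuses edges that would create a cycle, and collects all ancestors of a term. The class name, method names and the small edge set are assumptions for illustration; the edges are a simplified example and not the actual GO or PhiGO graph.

```python
from collections import defaultdict


class Ontology:
    """Toy DAG of terms linked by typed edges (hypothetical helper class)."""

    RELATIONS = ("is_a", "part_of")

    def __init__(self):
        # term -> list of (relation, parent term)
        self.parents = defaultdict(list)

    def ancestors(self, term):
        """Return every term reachable from `term` by following edges upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for _relation, parent in self.parents.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_edge(self, child, relation, parent):
        """Add child -> parent, rejecting unknown relations and cycles."""
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        # A cycle would appear if the child is already above the parent.
        if child == parent or child in self.ancestors(parent):
            raise ValueError("edge would create a cycle")
        self.parents[child].append((relation, parent))


# Simplified, assumed edges between term names taken from the GO example.
toy = Ontology()
toy.add_edge("reproduction", "is_a", "biological process")
toy.add_edge("viral reproduction", "is_a", "reproduction")
toy.add_edge("viral infectious cycle", "part_of", "viral reproduction")
toy.add_edge("viral genome replication", "part_of", "viral infectious cycle")

print(sorted(toy.ancestors("viral genome replication")))
```

Storing only child-to-parent edges keeps the ancestor lookup simple; a real OBO file would additionally carry term identifiers and definitions.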
23. Acyclic graph of the term "viral genome
replication" as it presently stands in GO
[Figure: DAG linking the following terms by is_a and part_of edges: biological process (GO:0008150), reproduction (GO:0000003), reproductive process (GO:0022414), viral reproduction (GO:0016032), viral reproduction process (GO:0022415), viral infectious cycle (GO:0019058), viral genome replication (GO:0019079).]
The term labeling a node refers to this node and
all of its children.
24. View of one term and its definition with AmiGO
viewer
http://aclame.ulb.ac.be/Classification/phage_functions.html
25. ACLAME TOOLS (2)
- Prophinder: prediction of prophages in complete
bacterial genome sequences.
26. Prophinder general outline (1)
- Download all translated CDS (protein) sequences
of bacterial genomes
- Compare to phage proteins in ACLAME
- Encode hits on bacterial genome sequence
- Walk along that genome with a window of
adjustable size
- Use binomial formula to calculate probability to
observe at least n hits in a window of size w
- Calculate significance score:
- Nb tests = Nb CDS - (w size - 1)
- Eval = Pval x Nb tests
- Sig = -log(Eval)
For a window of a given size at a given position
along the genome, search for segments with best
sig values.
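The scoring steps above can be sketched as follows. This is an illustrative reading of the outline, not the Prophinder source code: the function names and the toy genome are hypothetical, the background hit probability is assumed to be the genome-wide fraction of CDS with a phage hit, and the logarithm is assumed to be base 10.

```python
from math import comb, log10


def binomial_tail(n_hits, window, p):
    """P(X >= n_hits) for X ~ Binomial(window, p)."""
    return sum(
        comb(window, k) * p ** k * (1 - p) ** (window - k)
        for k in range(n_hits, window + 1)
    )


def window_scores(hits, window):
    """Score every window position along a genome.

    hits: one 0/1 flag per CDS, in genome order (1 = hit to a phage protein).
    Returns a list of (start_index, n_hits, sig) tuples.
    """
    n_cds = len(hits)
    # Assumption: background hit probability = genome-wide fraction of hits.
    p = sum(hits) / n_cds
    n_tests = n_cds - (window - 1)      # Nb tests = Nb CDS - (w size - 1)
    scores = []
    for start in range(n_tests):
        n_hits = sum(hits[start:start + window])
        p_value = binomial_tail(n_hits, window, p)
        e_value = p_value * n_tests     # Eval = Pval x Nb tests
        # Assumption: base-10 logarithm; guard against log(0).
        sig = -log10(e_value) if e_value > 0 else float("inf")
        scores.append((start, n_hits, sig))
    return scores


# Hypothetical toy genome: 50 CDS with 5 consecutive phage hits.
toy_hits = [0] * 20 + [1] * 5 + [0] * 25
best = max(window_scores(toy_hits, window=5), key=lambda s: s[2])
print(best[0], best[1], round(best[2], 2))  # 20 5 3.34
```

Repeating the scan over several window sizes and keeping the best-scoring segments would correspond to the "window of adjustable size" in the outline.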
27. Prophinder general outline (2)
- Implementation of biological criteria:
- Presence of an integrase gene at one extremity
- Detection of short direct flanking repeats
- No repeat of typical phage genes (e.g. head and
tail major proteins)
28. Sliding window
Significance Matrix
int
Direct repeat
29. Access to the list of genomes analyzed with
Prophinder
View predictions for that genome
30. View of prediction on the host genome map
31. View of ORFs in prediction
Hit in ACLAME
No hit in ACLAME
Link to best hit in ACLAME
32. Heatmap view of ACLAME hits
33.
http://aclame.ulb.ac.be - ACLAME database
http://aclame.ulb.ac.be/prophinder - ACLAME Prophinder viewer
http://aclame.ulb.ac.be//functions - ACLAME list of functions
http://aclame.ulb.ac.be/Classification/phage_functions.html - PhiGO viewer and download of PhiGO flat files with definitions
http://www.godatabase.org/cgi-bin/amigo/go.cgi/ - GO database
http://www.godatabase.org/dev/java/oboedit/docs/index.html - download OBO-edit
34. References
- Enright, A.J., S. Van Dongen, and C.A. Ouzounis.
2002. An efficient algorithm for large-scale
detection of protein families. Nucleic Acids Res
30: 1575-84.
- Harris, M.A. et al. 2004. The Gene Ontology (GO)
database and informatics resource. Nucleic Acids
Res 32: D258-261.
- Leplae, R., A. Hebrant, S.J. Wodak, and A.
Toussaint. 2004. ACLAME: a CLAssification of
Mobile genetic Elements. Nucleic Acids Res 32
Database issue: D45-49.
- Merlin, C., J. Mahillon, J. Nesvera, and A.
Toussaint. 2000. Gene recruiters and
transporters: the modular structure of bacterial
mobile elements. In The horizontal gene pool:
bacterial plasmids and gene spread (ed. C.M.
Thomas), pp. 363-409. Harwood Academic
Publishers, Amsterdam.
- Siguier, P., J. Perochon, L. Lestrade, J.
Mahillon, and M. Chandler. 2006. ISfinder: the
reference centre for bacterial insertion
sequences. Nucleic Acids Res 34: D32-36.