Title: Bioinformatics Werkbespreking (work meeting) 2006-11-07
1. Bioinformatics Werkbespreking 2006-11-07
- 1. PhyloPat: phylogenetic pattern analysis of eukaryotic genes (20 slides)
- 2. Chicken-human immunogenomics project (9 slides)
2. PhyloPat: phylogenetic pattern analysis of eukaryotic genes
- Tim Hulsen
- 2006-10-17
- BeNeLux BioInformatics Conference 2006
3. Introduction (1)
- Phylogenetic patterns show presence/absence of genes over a certain set of species
  - e.g. for 10 species: 0011101011
- Very useful for all kinds of evolutionary analyses
  - Origin of certain genes
  - Deletion of certain genes
  - Clustering of genes with similar patterns: likely to have similar function / be in same pathway
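A pattern like the 10-species example can be derived mechanically from a species ordering and the set of species in which a gene occurs. The following is a minimal Python sketch; the species labels sp1..sp10 and the function name are placeholders chosen for illustration and are not part of PhyloPat.

```python
def phylogenetic_pattern(species_order, present_in):
    """Return a 0/1 string: '1' where the gene is present, '0' where absent."""
    present = set(present_in)
    return "".join("1" if sp in present else "0" for sp in species_order)


# Ten placeholder species labels (hypothetical, for illustration only).
species = [f"sp{i}" for i in range(1, 11)]
# The gene is found in species 3, 4, 5, 7, 9 and 10.
pattern = phylogenetic_pattern(species, ["sp3", "sp4", "sp5", "sp7", "sp9", "sp10"])
print(pattern)  # 0011101011
```

The order of the species list fixes the meaning of each position, so two patterns are only comparable when built from the same ordering.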
4. Introduction (2)
- Earlier phylogenetic pattern initiatives
  - Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
  - Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
  - Incorporated into OrthoMCL-DB (Chen et al., 2006)
- All applied to proteins, not to genes!
- -> PhyloPat: phylogenetic pattern analysis of eukaryotic genes
5. Method
- Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms), less redundant
- Basis: Ensembl (EnsMart) database, 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), S. cer. to H. sap.
- Make use of the accurate Ensembl orthology pipeline (combination of BLAST, SW (Smith-Waterman), MUSCLE and PHYML)
- Single linkage cluster algorithm: create orthologous groups containing ALL genes in Ensembl
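Single linkage clustering of pairwise orthologies amounts to finding connected components: two genes share a group whenever a chain of orthology pairs links them. The sketch below uses a union-find structure in Python. It is an illustration under assumptions, not the PhyloPat implementation, and the gene IDs are made up.

```python
def single_linkage_groups(genes, orthologies):
    """Group genes so that any two genes joined by a chain of orthology
    pairs end up in the same group (single linkage)."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in orthologies:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical gene IDs and orthology pairs, for illustration only.
genes = ["hsa1", "mmu1", "gga1", "hsa2", "mmu2", "dme1"]
pairs = [("hsa1", "mmu1"), ("mmu1", "gga1"), ("hsa2", "mmu2")]
groups = single_linkage_groups(genes, pairs)
print(groups)
```

Genes without any orthology (here dme1) form singleton groups, which is how every gene ends up in some group.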
6. Results
- 446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species
- Species ordered from low to high, i.e. by approximate distance to human
- Can be queried in several ways
- Output in HTML, Excel or plain text format
7. Web interface
http://www.cmbi.ru.nl/phylopat
8. Pattern/ID Search
- Binary string
  - 0 = absent, 1 = present, wildcard = absent/present
  - e.g. 0000011111111
  - -> must be absent in non-chordata, must be present in all mammals
- MySQL regular expression
  - e.g. 01100
  - -> gives all genes that occur only in ten consecutive species
- Input list of Ensembl/EMBL IDs (PhyloPat contains EMBL to Ensembl mapping)
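Both query styles can be mimicked outside the database with Python's re module. This is a sketch under assumptions: the wildcard character `*`, the expression `^0*1{3}0*$` and the five-species patterns are illustrative choices, not taken from the slide, and MySQL's regular expression dialect differs from Python's in some details. The analogous expression for a run of ten species would use `1{10}`.

```python
import re


def wildcard_to_regex(query, wildcard="*"):
    """Translate a 0/1/wildcard query into an anchored regular expression.

    The wildcard character is an assumption made for this sketch.
    """
    body = "".join("[01]" if ch == wildcard else re.escape(ch) for ch in query)
    return re.compile(f"^{body}$")


# Hypothetical group IDs and patterns for five species.
patterns = {"PP1": "00111", "PP2": "01011", "PP3": "00011", "PP4": "11111"}

# Absent in the first two species, present in the last two, middle one free.
query = wildcard_to_regex("00*11")
hits = sorted(pid for pid, p in patterns.items() if query.match(p))
print(hits)

# One uninterrupted run of exactly three present species, absent elsewhere.
run_of_three = re.compile(r"^0*1{3}0*$")
runs = sorted(pid for pid, p in patterns.items() if run_of_three.match(p))
print(runs)
```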
9. Output
10. Phylogenetic Tree
11. Oligo-/Polypresent Genes
- Oligopresent: present in only one/two species (oligo = few)
  - e.g. 000000010000000000100
  - These two species should be highly related
  - C. sav & C. int: 1737, div. 100 Mya (Boffelli et al., 2004)
  - T. nig & T. rub: 1572, div. 85 Mya (Yamanoue et al., 2006)
  - A. gam & A. aeg: 1058, div. 140 Mya (Service, 1993)
  - P. tro & H. sap: 887, div. 6 Mya (Glazko & Nei, 2003)
  - R. nor & M. mus: 713, div. 20 Mya (Springer et al., 2003)
- Polypresent: present in all species, except for one/two (poly = many)
  - e.g. 111110111110111111111
  - These two species should be related too: similar analysis possible
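A per-pair tally like the one listed for the oligopresent genes can be sketched as follows, assuming the numbers count patterns that are present in exactly those two species. The species labels and patterns below are hypothetical.

```python
from collections import Counter


def pair_counts(patterns, species_order):
    """Count, per species pair, the patterns present in exactly those two species."""
    counts = Counter()
    for pattern in patterns:
        present = [sp for sp, bit in zip(species_order, pattern) if bit == "1"]
        if len(present) == 2:
            counts[tuple(present)] += 1
    return counts


# Hypothetical five-species ordering and patterns, for illustration only.
species = ["A", "B", "C", "D", "E"]
patterns = ["11000", "11000", "00011", "10001", "11100", "00100"]
counts = pair_counts(patterns, species)
print(counts.most_common())
```

Pairs with high counts are candidates for closely related species. The same tally on positions holding '0' would give the polypresent equivalent.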
12. Omnipresent genes
- Omnipresent: present in all 21 species (omni = all): 111111111111111111111
- Currently 1001 omnipresent groups
- Tend to have very general/important functions, mostly involved in transcription/translation
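The three classes can be told apart by counting the '1' positions in a pattern. A minimal Python sketch, assuming the definitions as stated on the slides (one or two species, all but one or two, all) and patterns long enough that the classes do not overlap; the function name is a placeholder:

```python
def classify(pattern):
    """Label a 0/1 pattern using the slide definitions; None if no label applies."""
    n, present = len(pattern), pattern.count("1")
    if present == n:
        return "omnipresent"
    if present >= n - 2:
        return "polypresent"
    if 1 <= present <= 2:
        return "oligopresent"
    return None


print(classify("000000010000000000100"))  # oligopresent
print(classify("111110111110111111111"))  # polypresent
print(classify("1" * 21))                 # omnipresent
```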
13. FatiGO analysis
- FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)
- Analysis of all human genes in output by just one mouse click
  - e.g. omnipresent genes
14. Other possibilities
- Anti-correlating patterns
  - e.g. 001111100011000000000
  - and 110000011100111111111
  - -> could be completely different, or very similar (analogous)!
- Easy homology-inferred functional annotation (using information from other genes in the same lineage)
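The two example patterns are exact complements of each other, which a short check makes explicit. This is a minimal Python sketch with placeholder function names:

```python
def complement(pattern):
    """Flip every position: present becomes absent and vice versa."""
    return pattern.translate(str.maketrans("01", "10"))


def anti_correlating(a, b):
    """True if the two patterns have equal length and disagree at every position."""
    return len(a) == len(b) and all(x != y for x, y in zip(a, b))


a = "001111100011000000000"
b = "110000011100111111111"
print(complement(a) == b, anti_correlating(a, b))  # True True
```

Searching for the complement of a pattern of interest is therefore enough to retrieve its strictly anti-correlating groups. A looser search would tolerate a few agreeing positions.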
15. Case study: Hox genes (1)
- Hox genes determine where limbs and other body segments will grow in a developing embryo
- Should exist mostly in vertebrates
- Expansion in teleost fish species (positions 8-11 in the species order)
  - seven Hox clusters instead of the mammalian four
- Search Ensembl database for human genes with term "hox" in annotation
- 44 genes found -> enter in PhyloPat -> 32 groups found (PP)
16. Case study: Hox genes (2)
PPID      genes per species      phylogenetic pattern   gene name(s)
PP022041  011111136562233233222  011111111111111111111  MSX1, MSX2
PP024984  001000011111001111111  001000011111001111111  HOXC4
PP027791  001110023343233333333  001110011111111111111  TLX1, TLX2, TLX3
PP049478  000000221153112322223  000000111111111111111  HOXB8, HOXC8, HOXD8
PP053824  000000011120010101011  000000011110010101011  HOXD11
PP053827  000000022211111111111  000000011111111111111  HOXA10
PP053828  000000021111212122222  000000011111111111111  HOXC13, HOXD13
PP053829  000000063341122222222  000000011111111111111  HOXA1, HOXB1
PP053830  000000011110010111111  000000011110010111111  HOXB4
PP053832  000000021111011111111  000000011111011111111  HOXA5
PP053833  000000021110111111011  000000011110111111011  HOXB2
PP053834  000000031101011111111  000000011101011111111  HOXD3
PP053835  000000021110111111101  000000011110111111101  HOXA9
PP053836  000000021111111111111  000000011111111111111  HOXA3
PP053838  000000021110101111111  000000011110101111111  HOXC12
PP053839  000000011111111110111  000000011111111110111  HOXD4
PP053840  000000021111201011101  000000011111101011101  HOXC11
PP053842  000000043221111111111  000000011111111111111  HOXA13
PP053844  000000032231011111111  000000011111011111111  HOXB5
PP053845  000000021111111111011  000000011111111111011  HOXB3
PP053846  000000021121111111111  000000011111111111111  HOXD10
PP053847  000000022211111111111  000000011111111111111  HOXA2
PP053849  000000034151132333323  000000011111111111111  HOXA6, HOXB6, HOXC6
PP053853  000000011101111111011  000000011101111111011  HOXA4
PP053854  000000032252223133213  000000011111111111111  HOXB9, HOXC9, HOXD9
PP053858  000000011120011111111  000000011110011111111  HOXA11
PP070659  000000000121212222222  000000000111111111111  HOXA7, HOXB7
PP075622  000000000010001111111  000000000010001111111  HOXC5
PP084287  000000000001101111111  000000000001101111111  HOXC10
PP085049  000000000001011011111  000000000001011011111  HOXD1
PP087941  000000000000111011111  000000000000111011111  HOXD12
PP089685  000000000000111111111  000000000000111111111  HOXB13
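The two 21-digit columns of the Hox table are related in a simple way: the pattern is the per-species gene count collapsed to presence/absence, and counts above 1 mark possible lineage-specific expansions. A minimal Python sketch, assuming every count is a single digit as in the table; the function names are placeholders:

```python
def counts_to_pattern(counts):
    """Collapse per-species gene counts (one digit per species) to presence/absence."""
    return "".join("1" if c != "0" else "0" for c in counts)


def expanded_positions(counts, threshold=2):
    """1-based species positions holding at least `threshold` genes of the group."""
    return [i for i, c in enumerate(counts, start=1) if int(c) >= threshold]


msx_counts = "011111136562233233222"  # PP022041 (MSX1, MSX2)
print(counts_to_pattern(msx_counts))                  # 011111111111111111111
print(expanded_positions(msx_counts, threshold=5))    # [9, 10, 11]
```

For this group the highest counts fall on positions 9 to 11, inside the teleost range named in part (1) of the case study.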
17. Case study: Hox genes (3)
PPID(s)                        name   cl.A    cl.B    cl.C    cl.D    first sp.   position
PP053829,085049                HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  anterior
PP053847,053833                HOX2   HOXA2   HOXB2   -       -       T. nigrov.  anterior
PP053836,053845,053834         HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  PG3
PP053832,053844,075622         HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  central
PP053849                       HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  central
PP053835,053854                HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  posterior
PP053827,084287,053846         HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  posterior
PP053858,053840,053824         HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  posterior
PP053838,087941                HOX12  -       -       HOXC12  HOXD12  T. nigrov.  posterior
PP053842,089685,053828         HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  posterior
PP053853,053830,024984,053839  HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    central
PP027791                       TLX    TLX1    TLX2    TLX3    -       A. gamb.    -
PP070659                       HOX7   HOXA7   HOXB7   -       -       G. acul.    central
PP049478                       HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  central
PP022041                       MSX    MSX1    MSX2    -       -       C. eleg.    -
First species: T. nigrov. = first vertebrate; A. gamb., C. intest. and C. eleg. = non-vertebrate; G. acul. = vertebrate
18. Conclusions
- PhyloPat: quick and easy tool for phylogenetic pattern search on complete Ensembl database
- Also usable for study of lineage-specific expansions of genes
- Just updated to Ensembl v41 (released last Thursday)
  - 5 new species: D.nov, E.tel, L.afr, O.cun, O.lat
  - extra option: gene neighborhood
19. Gene neighborhood
Conservation of gene order -> functionally related
Equal color = belonging to same orthologous group
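One simple way to look for conserved gene order is to compare which orthologous groups sit next to each other in two species. The sketch below is an illustration under assumptions, not the PhyloPat gene neighborhood option itself: the group IDs and gene orders are hypothetical, and strand and distance are ignored.

```python
def conserved_neighbor_pairs(order_a, order_b):
    """Return pairs of orthologous groups that are adjacent in both gene orders.

    Each input is a list of group IDs in chromosomal order; orientation is ignored.
    """
    def adjacent(order):
        return {frozenset(p) for p in zip(order, order[1:]) if p[0] != p[1]}

    shared = adjacent(order_a) & adjacent(order_b)
    return sorted(tuple(sorted(pair)) for pair in shared)


# Hypothetical group IDs along one chromosome segment in two species.
species_a = ["G1", "G2", "G3", "G4", "G5"]
species_b = ["G9", "G3", "G2", "G1", "G5", "G4"]
neighbors = conserved_neighbor_pairs(species_a, species_b)
print(neighbors)
```

Here the segment G1-G2-G3 is inverted in the second species but its adjacencies survive, so the pairs are still reported.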
20. Acknowledgements
- Supervision
  - Peter Groenen (supervisor)
  - Jacob de Vlieg (head of group)
- Fruitful discussions
  - Wilco Fleuren (suggestions)
  - Erik Franck (suggestions)
  - Nanning de Jong (suggestions)
  - Arnold Kuzniar (suggestions)
21. Where to find
- Web interface
  - http://www.cmbi.ru.nl/phylopat
  - (accessible through www.cmbi.ru.nl and www.nbic.nl)
- Publication
  - Hulsen T., Groenen P.M.A., de Vlieg J.
  - BMC Bioinformatics 2006, 7:398
  - http://www.biomedcentral.com/1471-2105/7/398
- Powered by Ensembl
  - http://www.ensembl.org/info/about/ensembl_powered.html
22. Bioinformatics Werkbespreking 2006-11-07
- 1. PhyloPat: phylogenetic pattern analysis of eukaryotic genes (20 slides)
- 2. Chicken-human immunogenomics project (9 slides)
23. Chicken-human immunogenomics project (part of Biorange SP3.2.2)
In collaboration with Martien Groenen, Hinri Kerstens (Animal Sciences Group, Wageningen UR)
- Goals
  - study evolution of genes/proteins involved in immune system, from chicken to human
  - check for expansions and deletions in families
  - zoom in to interesting families
24. Proteins -> Genes
- Earlier initiatives based on proteins (Protein World, IPI, ParAlign, MCL)
- Disadvantages
  - large-scale computations needed for orthology determination
  - Difficult to study lineage-specific expansions because of alternative transcripts, isoforms
  - Difficult to connect to WUR synteny data
- -> Genes: connect to PhyloPat tool
25. PhyloPat
- PhyloPat queries the orthologies of all complete genomes within Ensembl database using phylogenetic patterns
- Advantages
  - Usage of accurate orthology determination of Ensembl (BLAST/SW, MUSCLE, PHYML), single linkage clustering by ourselves
  - No alternative transcripts, isoforms
  - Easy to connect to WUR synteny data
  - 26 species, from S.cer. to H.sap.
- Disadvantage
  - Genome information sometimes incomplete (but Pre-versions and low-coverage genomes are not included)
26. Immunophyle
- Application to immune system: parse through PhyloPat set using IRIS database
- Take all HUGO IDs from IRIS database, input in PhyloPat -> 585 immunologic lineages containing 18,933 genes from 26 species
- Divided into 22 immunologic categories from IRIS database (adaptive immunity, innate immunity, inflammation, chemotaxis, etc.)
- Connected to GO, InterPro, KEGG, etc. by FatiGO
27. Immunophyle
- http://www.cmbi.ru.nl/immunophyle
28. Categories
29. Example: Toll-like receptors
GeneGo MetaCore, canonical pathway
30. Example: Toll-like receptors
Check ImmunoPhyle for each gene involved in the TLR pathway
Green: first occurrence
Red: deletion
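The green/red marking can be approximated directly from a phylogenetic pattern: the first '1' is the first occurrence, and every '0' after it is a candidate deletion. This is a minimal Python sketch with a placeholder function name. It treats the species order as linear, and a '0' may also reflect incomplete genome information rather than a true loss.

```python
def annotate(pattern):
    """Return (first, deletions) as 1-based positions in the species order.

    first: position of the first '1', or None if the gene is absent everywhere.
    deletions: positions holding '0' after the first occurrence.
    """
    first = pattern.find("1")
    if first == -1:
        return None, []
    deletions = [i + 1 for i in range(first + 1, len(pattern)) if pattern[i] == "0"]
    return first + 1, deletions


# Pattern of PP053824 (HOXD11) from the Hox case study.
result = annotate("000000011110010101011")
print(result)  # (8, [12, 13, 15, 17, 19])
```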
31. Current/future directions
- Connect to literature (CoPub?)
- Connect to expression data, protein interaction data
- Zoom in to families: immunology expertise needed!