Nothing in (computational) biology makes

Author: Michael Fetchko
Last modified by: Michael Fetchko
Created: 11/20/2002 3:37:49 PM
Presentation format: On-screen Show

Transcript and Presenter's Notes

1
Comparative genomics, genome context and genome
annotation
Nothing in (computational) biology makes sense
except in the light of evolution
after Theodosius Dobzhansky (1973)
2
Genome context analysis and genome annotation
Using information other than homologous
relationships between individual genes/proteins
for functional prediction (guilt by association)
Types of context analysis
  • phyletic patterns
  • domain fusion (Rosetta Stone proteins)
  • gene order conservation
  • co-expression
  • ...
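The domain fusion (Rosetta Stone) idea from the list above can be illustrated with a small sketch. It assumes a hypothetical input format in which each genome is a dictionary mapping a protein name to the list of domains it contains; the function name and the data are illustrative and do not come from the slides. Two separately encoded proteins are linked when their domains are found fused in a single protein of another genome.

```python
from itertools import combinations

def rosetta_stone_links(fused_genome, split_genome):
    """Link proteins of split_genome whose domains are fused in fused_genome.

    Both arguments map protein name -> list of domain names (hypothetical format).
    Returns a list of (protein_a, protein_b, fusion_protein) tuples.
    """
    # Domain pairs that co-occur inside one protein of the first genome.
    fused_pairs = {}
    for protein, domains in fused_genome.items():
        for a, b in combinations(sorted(set(domains)), 2):
            fused_pairs.setdefault((a, b), []).append(protein)

    # Which proteins of the second genome carry each domain.
    carriers = {}
    for protein, domains in split_genome.items():
        for domain in set(domains):
            carriers.setdefault(domain, set()).add(protein)

    links = []
    for (a, b), fusions in sorted(fused_pairs.items()):
        for pa in sorted(carriers.get(a, ())):
            for pb in sorted(carriers.get(b, ())):
                if pa != pb:  # the two domains sit in different proteins
                    links.append((pa, pb, fusions[0]))
    return links

# Illustrative data only.
fused = {"fusAB": ["D1", "D2"]}
split = {"x1": ["D1"], "x2": ["D2"], "x3": ["D3"]}
print(rosetta_stone_links(fused, split))  # [('x1', 'x2', 'fusAB')]
```

A real analysis would also have to filter out promiscuous domains, which link many unrelated proteins; that step is omitted here.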

3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
Goals
  • Using gene sets from complete genomes,
    delineate families of orthologs and paralogs -
    Clusters of Orthologous Groups (of genes) (COGs)
  • Using COGs, develop an engine
    for functional annotation of new
    genomes
  • Apply COGs for analysis of phylogenetic
    patterns

7
COG - group of homologous proteins such that
all proteins from different species are orthologs
(all proteins from the same species in a COG are
paralogs)
8
CONSTRUCTION OF COGs FOR 8 COMPLETE GENOMES
Input: complete set of proteins from the analyzed
genomes
  1. Full self-comparison (BLASTPGP, no cut-off)
  2. Collapse obvious paralogs
  3. Detect all interspecies Best Hits (BeTs) between
     individual proteins or groups of paralogs
  4. Detect all triangles of consistent BeTs
  5. Merge triangles with common edges
  6. Detect groups with multidomain proteins and
     isolate domains
REPEAT STEPS 3-5
Output: COGs
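The triangle detection and merging steps of this procedure can be sketched as follows. This is a minimal illustration, not the actual COG pipeline: it assumes the best hits are already available as a set of (query, hit) pairs, treats two proteins as linked when either one is the other's BeT, and ignores paralog collapsing and domain splitting. The function and variable names are hypothetical.

```python
from itertools import combinations

def build_cogs(best_hits, genome_of):
    """Group proteins into COG-like clusters from interspecies best hits.

    best_hits: set of (query, hit) pairs; genome_of: protein -> genome name.
    """
    # Undirected edge if either protein is the other's BeT (assumption).
    edges = {frozenset(pair) for pair in best_hits
             if genome_of[pair[0]] != genome_of[pair[1]]}
    neighbors = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # A triangle: three mutually linked proteins from three different genomes.
    triangles = set()
    for edge in edges:
        a, b = tuple(edge)
        for c in neighbors[a] & neighbors[b]:
            if len({genome_of[a], genome_of[b], genome_of[c]}) == 3:
                triangles.add(frozenset((a, b, c)))

    # Merge triangles that share an edge (two common vertices) with union-find.
    triangles = sorted(triangles, key=sorted)
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_with_edge = {}
    for i, tri in enumerate(triangles):
        for pair in combinations(sorted(tri), 2):
            j = first_with_edge.setdefault(pair, i)
            parent[find(i)] = find(j)

    cogs = {}
    for i, tri in enumerate(triangles):
        cogs.setdefault(find(i), set()).update(tri)
    return sorted(sorted(members) for members in cogs.values())

# Illustrative data: two triangles sharing the edge a1-b1 merge into one group.
genome_of = {"a1": "A", "b1": "B", "c1": "C", "d1": "D"}
hits = {("a1", "b1"), ("b1", "c1"), ("c1", "a1"), ("d1", "a1"), ("d1", "b1")}
print(build_cogs(hits, genome_of))  # [['a1', 'b1', 'c1', 'd1']]
```

Requiring a shared edge rather than a shared vertex keeps two groups apart when they are connected only through a single protein, for example a multidomain one.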
9
A TRIANGLE OF BeTs IS A MINIMAL, ELEMENTARY COG
10
A RELATIVELY SIMPLE COG PRODUCED BY MERGING
ADJACENT TRIANGLES
11
A COMPLEX COG WITH MULTIPLE PARALOGS
12
Current status of the COGs
Prokaryotes
  11 Archaea, 1 unicellular eukaryote, 46 bacteria: 58 complete genomes
  149,321 proteins
  105,861 proteins in 4075 COGs (71%)
Eukaryotes
  4 animals, 1 plant, 2 fungi, 1 microsporidium: 8 complete genomes
  142,498 proteins
  74,093 proteins in 4822 COGs (52%)
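The bracketed figures appear to be the percentage of each protein set that falls into COGs. A quick check with the counts given on the slide supports that reading; the helper name is hypothetical.

```python
def cog_coverage(proteins_in_cogs, total_proteins):
    """Percentage of proteins assigned to COGs, rounded to a whole percent."""
    return round(100 * proteins_in_cogs / total_proteins)

print(cog_coverage(105_861, 149_321))  # 71 (prokaryotic set)
print(cog_coverage(74_093, 142_498))   # 52 (eukaryotic set)
```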
13
COGnitor...
14
IN ACTION
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
The Universal COGs
19
Search for genomic determinants of
hyperthermophily
20
(No Transcript)
21
Search for unique archaeo-eukaryotic genes
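A phyletic pattern search of this kind can be sketched as a simple filter. The sketch assumes each COG is described by the set of genomes in which it is present, and it interprets "unique archaeo-eukaryotic" as present in every archaeal and eukaryotic genome and absent from every bacterial one; the slides do not state the exact criterion. The COG identifiers and genome codes are made up for illustration.

```python
def select_by_pattern(patterns, required, forbidden):
    """Return COG ids present in all `required` genomes and in no `forbidden` one.

    patterns: COG id -> set of genome codes where the COG is present.
    """
    required, forbidden = set(required), set(forbidden)
    return sorted(
        cog for cog, present in patterns.items()
        if required <= set(present) and not (forbidden & set(present))
    )

# Illustrative data only: two archaea, one eukaryote, two bacteria.
patterns = {
    "COG_A": {"arc1", "arc2", "euk1"},
    "COG_B": {"arc1", "arc2", "euk1", "bac1"},
    "COG_C": {"arc1", "euk1"},
}
print(select_by_pattern(patterns,
                        required=["arc1", "arc2", "euk1"],
                        forbidden=["bac1", "bac2"]))  # ['COG_A']
```

Swapping the two genome lists gives the mirror-image query for genes found only in bacteria.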
22
(No Transcript)
23
A complementary pattern search for unique
bacterial genes
24
(No Transcript)
25
Essential function but holes in the
phyletic pattern
Strict complementary pattern
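One way to read "strict complementary pattern" is that every genome contains exactly one of the two COGs, so that together they fill each other's holes. The sketch below tests that condition under this assumed definition; the function names, COG identifiers and genome codes are hypothetical.

```python
def strictly_complementary(pattern_a, pattern_b, genomes):
    """True if each genome contains exactly one of the two COGs."""
    return all((g in pattern_a) != (g in pattern_b) for g in genomes)

def find_complements(query, patterns, genomes):
    """Return ids of COGs whose pattern strictly complements the query COG."""
    return sorted(
        cog for cog, pattern in patterns.items()
        if cog != query and strictly_complementary(patterns[query], pattern, genomes)
    )

# Illustrative data only.
genomes = ["g1", "g2", "g3", "g4"]
patterns = {
    "COG_X": {"g1", "g2"},          # essential function with holes in g3, g4
    "COG_Y": {"g3", "g4"},          # fills exactly those holes
    "COG_Z": {"g2", "g3", "g4"},    # overlaps COG_X in g2
}
print(find_complements("COG_X", patterns, genomes))  # ['COG_Y']
```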
26
(No Transcript)
27
Relaxed complementary pattern
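A relaxed search can be sketched by tolerating a limited number of genomes that break strict complementarity. The slides do not say how the relaxation is defined, so the two thresholds below (genomes with both COGs, genomes with neither) are assumptions chosen for illustration, as are all names.

```python
def complementarity_violations(pattern_a, pattern_b, genomes):
    """Count genomes that contain both COGs and genomes that contain neither."""
    both = sum(1 for g in genomes if g in pattern_a and g in pattern_b)
    neither = sum(1 for g in genomes if g not in pattern_a and g not in pattern_b)
    return both, neither

def relaxed_complementary(pattern_a, pattern_b, genomes, max_both=1, max_neither=1):
    """True if the patterns are complementary up to the allowed exceptions."""
    both, neither = complementarity_violations(pattern_a, pattern_b, genomes)
    return both <= max_both and neither <= max_neither

# Illustrative data only.
genomes = ["g1", "g2", "g3", "g4", "g5"]
a = {"g1", "g2", "g3"}
b = {"g3", "g4"}          # overlaps in g3, leaves g5 uncovered
print(complementarity_violations(a, b, genomes))              # (1, 1)
print(relaxed_complementary(a, b, genomes))                   # True
print(relaxed_complementary(a, b, genomes, max_both=0))       # False
```

Setting both thresholds to zero reduces this to the strict test.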
28
(No Transcript)
29
Relaxed complementary pattern with extra
restrictions
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
Conservation of gene order in bacterial species
of the same genus
M. genitalium vs M. pneumoniae
35
Conservation of gene order in closely related
bacterial genera
C. trachomatis vs C. pneumoniae
36
Lack of gene order conservation - even in
closely related bacteria of the same
Proteobacterial subdivision
P. aeruginosa vs E. coli
37
Genome Alignments - Method
Protein sets from complete genomes
BLAST cross-comparison
Table of Hits
Pairwise Genome Alignment
  Local alignment algorithm Lamarck (gap opening
  penalty, gap extension penalty)
  Statistics with Monte Carlo simulations
Template-Anchored Genome Alignment
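The idea of aligning genomes at the level of gene order can be illustrated with a much simpler stand-in for the method named above. The sketch below finds ungapped, same-direction runs of genes whose hits are collinear in both genomes. It is not the Lamarck algorithm: it uses no gap penalties, ignores inverted strings, and does no Monte Carlo statistics. The input format and all names are assumptions.

```python
def conserved_gene_strings(genome1, genome2, hits, min_len=2):
    """Find maximal collinear runs of hits between two gene orders.

    genome1, genome2: lists of gene names in chromosomal order.
    hits: set of (gene_in_genome1, gene_in_genome2) pairs from the table of hits.
    """
    run = {}  # (i, j) -> length of the collinear run ending at genes i and j
    for i, g1 in enumerate(genome1):
        for j, g2 in enumerate(genome2):
            if (g1, g2) in hits:
                run[(i, j)] = run.get((i - 1, j - 1), 0) + 1

    strings = []
    for (i, j), length in sorted(run.items()):
        # Report a run only at its last position, where it cannot be extended.
        if length >= min_len and (i + 1, j + 1) not in run:
            strings.append((genome1[i - length + 1:i + 1],
                            genome2[j - length + 1:j + 1]))
    return strings

# Illustrative data only: b-c-d is conserved, a has moved elsewhere.
genome1 = ["a", "b", "c", "d", "e"]
genome2 = ["x", "b'", "c'", "d'", "y", "a'"]
hits = {("a", "a'"), ("b", "b'"), ("c", "c'"), ("d", "d'")}
print(conserved_gene_strings(genome1, genome2, hits))
# [(['b', 'c', 'd'], ["b'", "c'", "d'"])]
```

Allowing gaps, as a local alignment with gap opening and extension penalties does, would let a string survive the insertion or loss of a gene in one genome.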
38
(No Transcript)
39
Genome Alignments - Statistics
Distribution of conserved gene string lengths
40
Genome Alignments - Statistics
Pairwise alignments   No. strings   No. genes   % in Gen1   % in Gen2
all homologs
  ecoli-hinf              138          566         13          33
  ecoli-bsub               89          322          8           8
  ecoli-mjan               10           30          1           2
probable orthologs
  ecoli-hinf              105          482         11          28
  ecoli-bsub               34          168          4           4
  ecoli-mjan               12           33          1           2
41
Genome Alignments - Statistics
Breakdown of genes in the genome
42
Genome Alignments - Statistics
Fraction of the genome in conserved gene strings
- from template-anchored alignments
Minimum
  Synechocystis sp.         5%
  Aquifex aeolicus         10%
  Archaeoglobus fulgidus   13%
  Escherichia coli         14%
  Treponema pallidum       17%
Maximum
  Thermotoga maritima      23%
  Mycoplasma genitalium    24%
43
Context-Based Prediction of Protein Functions
A Novel Translation Factor (COG0536)
L21
L27
GTPase?
GTP-binding translation factor
44
Context-Based Prediction of Protein Functions
A Novel Translation Factor (COG0012)
TGS domain containing GTPase?
Peptidyl-tRNA hydrolase
GTP-binding translation factor
45
(No Transcript)
46
(No Transcript)