New Tools for Visualizing Genome Evolution

Provided by: aisrp

1
New Tools for Visualizing Genome Evolution
  • Lutz Hamel
  • Dept. of Computer Science and Statistics
  • University of Rhode Island
  • J. Peter Gogarten
  • Dept. of Molecular and Cell Biology
  • University of Connecticut

2
Motivation
  • Early life on Earth has left a variety of traces
    that can be utilized to reconstruct the history
    of life
  • the fossil and geological records
  • information retained in living organisms
  • Our research focuses on how information can be
    gained from the molecular record
  • information about the history of life that is
    retained in the structure and sequences of
    macromolecules found in extant organisms
  • The analysis of the mosaic nature of genomes
    using phylogenetics will be a key ingredient in
    unraveling life's early history.

3
Relevance to NASA
  • The analyses are relevant in the context of
    NASA's Origin theme
  • Understand the origin and evolution of life on
    Earth.
  • We address questions that are central to NASA's
    Astrobiology program
  • Understand how past life on Earth interacted with
    its changing planetary and Solar System
    environment.
  • Understand the evolutionary mechanisms and
    environmental limits of life.

4
Phylogenetics
  • Phylogenetics (Greek: phylon = race, genetic =
    birth) is the taxonomic classification of
    organisms based on how closely they are related
    in terms of evolutionary differences.

5
Phylogenetic progression as envisioned by Darwin
  • From C. Darwin
  • Origin of Species

6
SSU-rRNA Tree of Life
[Figure: SSU-rRNA tree of life rooted at ORIGIN, with the EUCARYA branch labeled and taxa ranging from Aquifex and Thermotoga to Homo and Zea. Fig. modified from Norman Pace.]
7
Phylogenetics: Classic View
  • All genes are inherited from an ancestor.
  • Branching reflects speciation events.
  • Evolutionary tree follows very closely the
    SSU-rRNA tree.

8
However
Aquifex is assigned to different branches of the
tree.
Science 280, p. 672ff (1998)
9
  • Horizontal Gene Transfer (HGT) leads to Mosaic
    Genomes, where different parts of the genome have
    different histories.

(a) concordant genes, (b) according to 16S (and
other conserved genes), (c) according to
phylogenetically discordant genes
Gophna, U., Doolittle, W.F. & Charlebois, R.L.
Weighted genome trees: refinements and
applications. J. Bacteriol. (in press)
10
A Revised Tree of Life
11
Evolutionary Processes Analogous to the Ones
Proposed to Occur in the Microbial World
12
Visualizing Phylogenies
  • Visualize the relation of four organisms at a
    time
  • three unrooted trees
  • plot the support of various genes for each of the
    tree topologies in an equilateral triangle

Orthologous Gene Families
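The three unrooted trees for four organisms can be listed explicitly. The following is a minimal sketch: the function name `quartet_topologies` and the placeholder labels A to D are illustrative assumptions, not part of the original slides.

```python
def quartet_topologies(taxa):
    """Return the three unrooted topologies for four taxa.

    Each topology is written as a pair of pairs: the two taxa found on
    either side of the single internal branch.
    """
    if len(taxa) != 4:
        raise ValueError("expected exactly four taxa")
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


# Placeholder labels; any four organism names could be used instead.
for left, right in quartet_topologies(["A", "B", "C", "D"]):
    print(f"({left[0]},{left[1]}) | ({right[0]},{right[1]})")
```

Each gene family then contributes one support value per topology, which is what gets plotted in the triangle.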
13
Visualizing Phylogenies
  • Synechocystis sp. (cyanobact.)
  • Chlorobium tepidum (GSB)
  • Rhodobacter capsulatus (α-prot)
  • Rhodopseudomonas palustris (α-prot)

14
Constructing the Visualization
  • Download four genomes (genome quartet) as amino
    acid sequences
  • BLAST every genome against every other genome
  • Select top hit of every BLAST search
  • Detect quartets of orthologs
  • Align quartets of orthologs using ClustalW
  • Calculate maximum-likelihood values and
    posterior probabilities for all three tree
    topologies
  • Convert probabilities (barycentric coordinates)
    into Cartesian coordinates
  • Plot all points onto equilateral triangle
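The conversion from probabilities to plot coordinates can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the posterior probabilities are obtained by normalizing likelihoods under equal priors for the three topologies, the triangle has unit side length with an arbitrary choice of corners, and the log-likelihood values in the usage lines are made up.

```python
import math


def posterior_from_loglik(logliks):
    """Normalize log-likelihoods into probabilities (assumes equal priors)."""
    top = max(logliks)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(value - top) for value in logliks]
    total = sum(weights)
    return [w / total for w in weights]


# Corners of an equilateral triangle with unit side length (assumed layout).
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def barycentric_to_cartesian(probs):
    """Map three probabilities summing to 1 onto a point in the triangle."""
    x = sum(p * cx for p, (cx, _) in zip(probs, CORNERS))
    y = sum(p * cy for p, (_, cy) in zip(probs, CORNERS))
    return x, y


# Made-up log-likelihoods for the three topologies of one gene family.
probs = posterior_from_loglik([-1234.5, -1236.0, -1240.2])
print(barycentric_to_cartesian(probs))
```

A gene that strongly supports one topology lands near the matching corner, while a gene with no preference lands near the center of the triangle.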
15
Visualizing Five Genomes
  • Five genomes → fifteen unrooted trees
  • Rather than a triangle: a dekapentagon

A = Archaeoglobus, S = Sulfolobus, Y = Yeast, R =
Rhodobacter, B = Bacillus
Zhaxybayeva O, Hamel L, Raymond J, Gogarten JP:
Visualization of Phylogenetic Content of Five
Genomes with Dekapentagonal Maps. Genome Biology
2004, 5:R20
16
Visualizing Multiple Genomes
  • Given this explosion, plotting all possible
    relationships as unrooted trees is impossible.

For comparison: the universe contains only about
10^89 protons and has an age of about 5×10^17
seconds or 5×10^29 picoseconds.
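The growth in the number of trees can be checked with the standard count of unrooted bifurcating trees for n labelled taxa, (2n-5)!! = 3 × 5 × ... × (2n-5). The sketch below uses a hypothetical helper name, `num_unrooted_trees`, and reproduces the counts quoted in these slides for four, five, and thirteen genomes.

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating tree topologies for n labelled taxa."""
    if n < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (the double factorial).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 13):
    print(n, num_unrooted_trees(n))
```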
17
Visualizing Multiple Genomes: SOMs
  • SOM = Self-Organizing Map
  • An artificial neural network approach to
    clustering
  • we are looking for clusters of genes which favor
    certain tree topologies
  • Advantages over other clustering approaches
  • No a priori knowledge of how many clusters to
    expect
  • Explicit summary of commonalities and differences
    between clusters
  • Cluster membership is not exclusive: a gene can
    indicate membership in multiple clusters at the
    same time
  • Visually appealing representation

T. Kohonen, Self-Organizing Maps, 3rd ed. Berlin;
New York: Springer, 2001.
18
Visualization with SOMs
19
Training a SOM
[Figure: SOM training diagram. Panels: Data Set; SOM Neural Elements (labels k, x, y, m_i); SOM Regression Equations, with parameters for the learning rate and the neighborhood distance; SOM Visualization.]
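A generic SOM training loop can be sketched as below. This is a simplified illustration of the technique, not the authors' implementation and not the exact regression equations from the slide: it assumes a small rectangular grid, a fixed learning rate `eta`, a fixed square neighborhood of size `radius`, and made-up support vectors with three columns instead of fifteen. All function and parameter names are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid cell whose weight vector is closest to x (Euclidean distance)."""
    return min(weights, key=lambda cell: math.dist(weights[cell], x))


def train_som(data, rows=4, cols=4, iterations=2000, eta=0.3, radius=1, seed=0):
    """Train a small SOM; eta and radius stay constant (a simplification)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialized weight vector (neural element) per grid cell.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for _ in range(iterations):
        x = rng.choice(data)
        # Competitive step: find the element that best matches the sample.
        win_r, win_c = best_matching_unit(weights, x)
        # Update step: pull the winner and its grid neighbors toward x.
        for (r, c), m in weights.items():
            if max(abs(r - win_r), abs(c - win_c)) <= radius:
                for k in range(dim):
                    m[k] += eta * (x[k] - m[k])
    return weights


# Made-up support vectors: one row per gene, one column per tree topology.
genes = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1],
         [0.1, 0.85, 0.05], [0.05, 0.9, 0.05]]
som = train_som(genes)
for g in genes:
    print(g, "->", best_matching_unit(som, g))
```

Genes that favor the same topology tend to map to the same or neighboring grid cells, which is the clustering effect the map is meant to show. Practical implementations usually shrink the learning rate and the neighborhood during training.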
20
Anatomy of a Trained SOM
In our case k = 15 for the fifteen tree
topologies.
21
Visualizing a Larger Number of Genomes
  • 13 gamma-proteobacterial genomes (258 putative
    orthologs)
  • E. coli
  • Buchnera
  • Haemophilus
  • Pasteurella
  • Salmonella
  • Yersinia pestis (2 strains)
  • Vibrio
  • Xanthomonas (2 sp.)
  • Pseudomonas
  • Wigglesworthia
  • Vibrio cholerae

There are 13,749,310,575 possible unrooted tree
topologies for 13 genomes
→ switch to bipartitions
22
Bipartition of a Phylogenetic Tree
Bipartition: a division of a phylogenetic tree
into two parts that are connected by a single
branch. It divides a dataset into two groups,
but it does not consider the relationships within
each of the two groups.
Here 95 represents the bootstrap support for
the internal branch. The number of bipartitions
for N genomes is equal to 2^(N-1) - N - 1.
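The bipartition formula can be verified by enumeration. The sketch below assumes that the formula counts splits with at least two genomes on each side, which is the count it matches; the function names and the single-letter labels are illustrative only.

```python
from itertools import combinations


def num_bipartitions(n):
    """Number of bipartitions of n genomes: 2^(n-1) - n - 1."""
    return 2 ** (n - 1) - n - 1


def nontrivial_bipartitions(taxa):
    """List every split of taxa into two groups of at least two members."""
    taxa = list(taxa)
    first, rest = taxa[0], taxa[1:]
    splits = []
    # Keep the first taxon on the left side so each split is listed once.
    for k in range(1, len(taxa) - 2):
        for chosen in combinations(rest, k):
            left = (first,) + chosen
            right = tuple(t for t in rest if t not in chosen)
            splits.append((left, right))
    return splits


for left, right in nontrivial_bipartitions("ABCD"):
    print("".join(left), "|", "".join(right))
print(num_bipartitions(13))
```

For 13 genomes the formula gives 4,082 bipartitions, a far smaller space than the 13,749,310,575 unrooted tree topologies.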
23
Bipartitions Lento Plot SOM
Strongly supported bipartitions
Strongly supported bipartitions in SOM
24
Conclusions & Future Work
  • Self-Organizing Maps seem to be an effective way
    to visualize mosaic genome evolution.
  • Corroborate findings with other methodologies
  • Scalable
  • In the Future
  • Larger data sets
  • Locally Linear Embedding (LLE)

25
Acknowledgements
  • Olga A. Zhaxybayeva, Dept. of Biochemistry and
    Molecular Biology, Dalhousie University
  • Maria Poptsova, Dept. of Molecular and Cell
    Biology, University of Connecticut