Paramvir S Dehal, Jeffrey L. Boore


1
A phylogenomic gene cluster resource: the
Phylogenetically Inferred Groups (PhIGs) database

Paramvir S Dehal, Jeffrey L. Boore

Presented By Jared Carter
2
Principal Objectives

Define and add context to the growing need for
better methodology for high-throughput sorting of
genes into orthologous families.
Outline the steps involved in the computational
framework for the identification of these sets.
Examine the model output data.
Analyze overall model effectiveness.

3
Introduction

Increasingly large whole-genome projects require
further methodologies to increase the overall
contextual value of the data set.
Delving into the evolutionary history of each
gene leads to a more robust understanding of its
function.
Gives definition to the evolution of the genome
and species.
Answers questions regarding the functional and
biochemical processes of the genome.
Previously, homologs were identified by pairwise
comparisons, an approach with many drawbacks:
Incorrect assignments caused by gene-duplication
events
Accelerated rates of amino acid (AA) substitution
Domain shuffling

4
Methodology Contextual Background

Uses known evolutionary relationships to
construct gene clusters of queried genomes.
Analyzes each cluster for evolutionary
relationships between the queried genes.
Goal is to reconstruct the evolutionary history
of each family.
Allows for whole genome analysis with emphasis on
the evolutionary background of each gene.
Can discriminate and classify numerous additional
types of evolutionary events (gene duplications,
AA substitution rates, and gene loss)

5
Methodology Overview

Creates and populates a relational database with
all known annotations for each taxon.
All sequence data and annotations available are
included
Data such as previous sequence alignments and
trees are also included
The general process involves 5 steps:
Step 1: All-against-all BLASTp
Step 2: Global alignment and distance calculation
Step 3: Hierarchical clustering
Step 4: Multiple Sequence Alignment (MSA)
Step 5: Gene tree construction
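
The relational store described on this slide could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table names, column names, and toy rows are all hypothetical and are not taken from the PhIGs schema.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption
# made for illustration, not the actual PhIGs database layout.
SCHEMA = """
CREATE TABLE taxon (
    taxon_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    taxon_id    INTEGER NOT NULL REFERENCES taxon(taxon_id),
    protein_seq TEXT NOT NULL,
    annotation  TEXT
);
CREATE TABLE pair_distance (
    gene_a         INTEGER NOT NULL REFERENCES gene(gene_id),
    gene_b         INTEGER NOT NULL REFERENCES gene(gene_id),
    distance       REAL NOT NULL,
    gapfree_length INTEGER NOT NULL,
    PRIMARY KEY (gene_a, gene_b)
);
"""


def build_demo_db():
    """Create an in-memory database and load a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO taxon VALUES (?, ?)",
                     [(1, "taxon_A"), (2, "taxon_B")])
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", [
        (10, 1, "MKTAYIAK", "toy annotation 1"),
        (20, 2, "MKTAYLAK", "toy annotation 2"),
    ])
    conn.execute("INSERT INTO pair_distance VALUES (?, ?, ?, ?)",
                 (10, 20, 0.13, 8))
    conn.commit()
    return conn


def genes_for_taxon(conn, name):
    """Return the gene ids stored for one taxon, in ascending order."""
    rows = conn.execute(
        "SELECT g.gene_id FROM gene g JOIN taxon t USING (taxon_id) "
        "WHERE t.name = ? ORDER BY g.gene_id", (name,))
    return [row[0] for row in rows]
```

Keeping pairwise distances and gap-free alignment lengths in their own table means the later clustering step can read them back without recomputing alignments.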

6
Methodology Workflow Diagram
Figure 1: Workflow diagram illustrating the analysis
pipeline for processing selected gene models. [1]
7
BLASTp and Global Alignment

All-against-all BLASTp generates local alignments,
which must then be processed into global
alignments.
ClustalW is used to align each protein pair.
The resulting alignments are stored in the PhIGs
database.
Distances are calculated using the JTT matrix as
implemented in the Protdist program (PHYLIP).
These distances are used later in the clustering
process, in conjunction with the gap-free
alignment lengths.

Figure 2: BLASTp query output excerpt. [2]
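
To make the idea of a distance paired with a gap-free alignment length concrete, here is a minimal Python sketch. It deliberately swaps in the uncorrected p-distance (the fraction of mismatched residues), not the JTT-based distance named on the slide, and the function name is hypothetical.

```python
def gapfree_p_distance(aligned_a, aligned_b):
    """Return (p_distance, gapfree_length) for two aligned protein sequences.

    Simplification: this is the uncorrected p-distance, used as a stand-in
    for a JTT-corrected distance. Gaps are written as '-'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    mismatches = sum(1 for x, y in columns if x != y)
    return mismatches / len(columns), len(columns)
```

For example, `gapfree_p_distance("MKT-AYI", "MRTQA-I")` compares the five gap-free columns, finds one mismatch, and returns `(0.2, 5)`.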
8
Gene Cluster Analysis

The process takes as input the known evolutionary
relationships of the selected organisms and all
pairwise protein distances.
It works iteratively, starting at the base of the
best known evolutionary tree, at the common
ancestral gene, and extending upwards.
For each bifurcating node, two clades, A and B,
are defined, with the remaining taxa assigned to
an out-group.
Genes from clade A are more similar to each other
than to genes in clade B.
Similarly, genes from A and B are more closely
related to each other than to those in the
out-group.
This is accomplished by comparing seeds (pairs of
sequences) and assigning a match quality.
Clusters grow by determining the shortest
distance from A and B.
If a protein has a shorter distance than A-B, it
is added to the cluster.
This is repeated until all proteins have been
clustered.
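
One possible reading of the seed-and-grow procedure for a single node is sketched below in Python. It rests on assumptions that go beyond the slide: a seed is taken to be an A-B pair that is closer to each other than either member is to any out-group protein, a protein joins a cluster when its shortest distance to a current member is below the seed's A-B distance, and leftover proteins become singleton clusters. The function names and data layout are hypothetical.

```python
from itertools import product


def dist(d, x, y):
    """Look up a symmetric distance stored under a frozenset key."""
    return d[frozenset((x, y))]


def cluster_node(clade_a, clade_b, outgroup, d):
    """Cluster the proteins of clades A and B at one bifurcating node.

    Assumed rules (an interpretation, not the published algorithm):
      * seed: an unclustered A-B pair closer to each other than either
        member is to any out-group protein; the closest seed goes first;
      * growth: add a protein whose shortest distance to a cluster member
        is below the seed's A-B distance;
      * leftovers become singleton clusters.
    """
    unassigned = set(clade_a) | set(clade_b)
    clusters = []
    while True:
        seeds = [
            (dist(d, a, b), a, b)
            for a, b in product(sorted(clade_a), sorted(clade_b))
            if a in unassigned and b in unassigned
            and all(dist(d, a, b) < min(dist(d, a, o), dist(d, b, o))
                    for o in outgroup)
        ]
        if not seeds:
            break
        seed_dist, a, b = min(seeds)
        cluster = {a, b}
        unassigned -= cluster
        grew = True
        while grew:  # keep sweeping until no protein can be added
            grew = False
            for p in sorted(unassigned):
                if min(dist(d, p, m) for m in cluster) < seed_dist:
                    cluster.add(p)
                    unassigned.discard(p)
                    grew = True
        clusters.append(cluster)
    clusters.extend({p} for p in sorted(unassigned))
    return clusters
```

In the full pipeline, a step like this would be applied at each node of the species tree in turn, working upwards from the base.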

9
Protein Distance Map and Cluster Tree
Figure 3: Protein distance map of a pairwise
alignment. [1]
Figure 4: Species tree generated by cluster
analysis. [1]
10
MSA and Phylogenetic Tree Extrapolation

An MSA is performed on each cluster using
ClustalW.
Alignments are trimmed to remove gaps, and are
discarded if fewer than 100 AA align successfully.
Trees are generated using a precise method: the
quartet puzzling maximum likelihood method, as
implemented in TREE-PUZZLE using the JTT matrix.
The generated tree is informed by the known
evolutionary relationships, further defining the
phylogeny.
MSAs are also used to build Hidden Markov Models,
which give better structural representation to
sparsely sampled genomes.
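
The trimming rule on this slide can be illustrated with a short Python sketch. It assumes that "trimmed to remove gaps" means dropping every alignment column that contains a gap in any sequence, and that the 100 AA threshold applies to the columns that remain; the function and parameter names are hypothetical.

```python
def trim_alignment(aligned_seqs, min_length=100):
    """Drop gap-containing columns from an MSA.

    aligned_seqs maps a sequence name to its aligned string ('-' = gap).
    Returns the trimmed alignment, or None when fewer than min_length
    columns survive (the alignment is then discarded).
    Assumption: a column is removed if any sequence has a gap in it.
    """
    lengths = {len(seq) for seq in aligned_seqs.values()}
    if len(lengths) != 1:
        raise ValueError("need a non-empty alignment of equal-length rows")
    names = list(aligned_seqs)
    # zip(*rows) iterates over the alignment column by column.
    columns = zip(*(aligned_seqs[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    if len(kept) < min_length:
        return None
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}
```

With a toy alignment such as `{"g1": "MK-TAY", "g2": "MKQTA-", "g3": "MRQTAY"}` and `min_length=3`, the two gapped columns are dropped and four columns remain; with the default threshold of 100 the same alignment would be discarded.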

11
Phylogenetic Tree
Figure 5: Phylogenetic tree incorporating the
cluster tree and known phylogenetic data.
Illustrates numerous duplication events as well
as the proportionality of AA substitutions
(represented by varying branch lengths). [1]
12
Gene View, Annotation and Synteny Maps

Gene View
All known annotations for a gene can be displayed,
as well as its chromosomal location in all
selected species.
Querying
The database can be searched for exact and
related sequences, or for text matches in search
fields.
Several specialized search tools exist, such as
HMMER, which searches directly against Hidden
Markov Models.
Synteny Maps
Produce a set of one-to-one orthologs plotted in
their relative locations across multiple genomes.
Performed by selecting a reference sequence span
in one species and the species to be queried.

13
Synteny Map
Figure 6: Synteny map of the selected sequence, with
rectangles representing genes and orthologs shown
to the left and right. Note: black lines indicate
the same transcriptional orientation; red lines
indicate inversion. [1]
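
The line-colouring convention in the synteny map can be sketched in Python as follows. The Gene record, the ortholog mapping, and the function name are hypothetical stand-ins for whatever the database actually stores; only the black/red rule comes from the figure caption.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int    # start coordinate on its chromosome or scaffold
    strand: str   # "+" or "-"


def synteny_links(reference, query, orthologs):
    """Pair reference genes with their one-to-one orthologs.

    orthologs maps a reference gene name to a query gene name.
    Returns (reference_name, query_name, colour) tuples in reference order:
    "black" when both genes share a transcriptional orientation,
    "red" when the ortholog is inverted.
    """
    query_by_name = {gene.name: gene for gene in query}
    links = []
    for ref in sorted(reference, key=lambda gene: gene.start):
        partner = query_by_name.get(orthologs.get(ref.name))
        if partner is None:
            continue  # no ortholog in the queried species
        colour = "black" if ref.strand == partner.strand else "red"
        links.append((ref.name, partner.name, colour))
    return links
```

Reference genes without an ortholog in the queried species are simply skipped, so the output contains one entry per line that would be drawn.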
14
Model Effectiveness Overview

Very efficient at analyzing evolutionary patterns
within whole genomes.
Assists in transferring annotations
Identifies sources of evolutionary pressure on
genes
Utilizes the applicable known evolutionary
history of the genomes studied
Creates clusters as descendants from one
ancestral gene
Combines phylogenetic data with functional
annotation, gene structure and genomic position.
Multiple, varied applications:
Organismal phylogeny construction
Genome evolution via gene duplications
Gene structure evolution via gain/loss of exons,
introns and domains
Identification of gene family expansions and
losses
Genome evolution as a whole

15
Conclusions

With continuing development, this highly modular
tool, which is well founded in current analytical
molecular methodology, will prove quite
useful in adding historical context to genome
data.
Understanding the history of a gene allows for a
better understanding of that gene's function in the
organism by relating it to other, similar
organisms that have also conserved its function.

16
Resources

[1] Dehal, Paramvir S., Boore, Jeffrey L., A
phylogenomic gene cluster resource: the
Phylogenetically Inferred Groups (PhIGs)
database, BMC Bioinformatics, 2006, 7:201
[2] NCBI, BLASTp protein-protein query,
http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi
