Paramvir S Dehal, Jeffrey L. Boore - PowerPoint PPT Presentation

1
A phylogenomic gene cluster resource: the
Phylogenetically Inferred Groups (PhIGs) database
  • Paramvir S Dehal, Jeffrey L. Boore

Presented By Jared Carter
2
Principal Objectives
  • Define and add context to the growing need for
    better methodology for high-throughput sorting of
    genes into orthologous families.
  • Outline the steps in the computational
    framework for identifying these families.
  • Examine the model's output data.
  • Analyze the overall effectiveness of the model.

3
Introduction
  • Increasingly large whole genome projects require
    further methodologies to increase the overall
    contextual value of the data set.
  • Delving into the evolutionary history of each
    gene leads to a more robust understanding of its
    function.
  • Gives definition to the evolution of the genome
    and species.
  • Answers questions regarding the functional and
    biochemical processes of the genome.
  • Previously, homologs were identified by pairwise
    comparisons, an approach with several drawbacks:
  • Incorrect assignments caused by gene-duplication
    events
  • Accelerated rates of amino acid (AA) substitution
  • Domain shuffling

4
Methodology: Contextual Background
  • Uses known evolutionary relationships to
    construct gene clusters of queried genomes.
  • Analyzes each cluster for evolutionary
    relationships between the queried genes.
  • Goal is to reconstruct the evolutionary history
    of each family.
  • Allows for whole genome analysis with emphasis on
    the evolutionary background of each gene.
  • Can discriminate and classify numerous additional
    types of evolutionary events (gene duplications,
    AA substitution rates, and gene loss)

5
Methodology: Overview
  • Creates and populates a relational database with
    all known annotations for each taxon.
  • All sequence data and annotations available are
    included
  • Data such as previous sequence alignments and
    trees are also included
  • The general process involves five steps:
  • Step 1: All-against-all BLASTp
  • Step 2: Global alignment and distance calculation
  • Step 3: Hierarchical clustering
  • Step 4: Multiple sequence alignment (MSA)
  • Step 5: Gene tree construction
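
The hand-off from Step 1 to Step 2 can be sketched as follows: BLASTp hits are reduced to a set of unordered protein pairs that then go on to global alignment. This is only an illustration. It assumes the hits are available in the 12-column tabular BLAST layout, and the function name `candidate_pairs` and the e-value cutoff are hypothetical choices, not details taken from the presentation.

```python
import csv
import io

def candidate_pairs(blast_tsv, max_evalue=1e-5):
    """Collect unordered protein pairs from tabular BLASTp hits.

    Assumes the 12-column tabular layout: query id in column 1,
    subject id in column 2, e-value in column 11. The cutoff is an
    illustrative assumption, not a value from the slides.
    """
    pairs = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if query == subject or evalue > max_evalue:
            continue  # skip self-hits and weak hits
        # Sorting makes (pA, pB) and (pB, pA) the same pair.
        pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)
```

Each pair returned here would be aligned once in Step 2, regardless of which protein was the query in the BLASTp run.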

6
Methodology: Workflow Diagram
Figure 1: Workflow diagram illustrating the analysis
pipeline for processing select gene models. [1]
7
BLASTp and Global Alignment
  • All-against-all BLASTp generates local alignments,
    which must then be processed into global
    alignments.
  • ClustalW is used to align each protein pair.
  • The resulting alignments are stored in the PhIGs
    database.
  • Distances are calculated using the JTT matrix
    as implemented in the Protdist program (PHYLIP).
  • These distances are used later in the clustering
    process, in conjunction with the gap-free
    alignment lengths.
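
To make the two quantities above concrete, the sketch below takes one globally aligned protein pair and reports a distance together with the gap-free alignment length. Note the substitution: it computes a simple p-distance (the fraction of mismatched gap-free columns) as a stand-in, not the JTT-based distance that the pipeline obtains from PHYLIP. The function names are hypothetical.

```python
def gap_free_columns(seq_a, seq_b, gap="-"):
    """Return the aligned column pairs in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]

def p_distance(seq_a, seq_b):
    """Return (distance, gap_free_length) for one aligned pair.

    The distance is the fraction of gap-free columns that differ.
    This is a simplification used for illustration only; the pipeline
    described above uses the JTT matrix instead.
    """
    columns = gap_free_columns(seq_a, seq_b)
    if not columns:
        return None, 0  # nothing aligned, so no distance is defined
    mismatches = sum(1 for a, b in columns if a != b)
    return mismatches / len(columns), len(columns)
```

Returning the gap-free length alongside the distance mirrors the text: a short distance supported by only a few aligned residues can be treated differently from one supported by a long alignment.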

Figure 2: BLASTp query output excerpt. [2]
8
Gene Cluster Analysis
  • The process takes as input the known evolutionary
    relationships of the selected organisms and
    all pairwise protein distances.
  • It uses an iterative approach that starts at the
    base of the best known evolutionary tree, at the
    common ancestral gene, and then extends upwards.
  • For each bifurcating node, two new clades, A and
    B, are created, with the remaining taxa being
    added to an out-group.
  • Genes within clade A are more similar to each
    other than to genes in clade B.
  • Similarly, genes from A and B are more closely
    related to each other than to those in the
    out-group.
  • This is accomplished by comparing seeds (pairs of
    sequences) and assigning a match quality.
  • Clusters grow by determining the shortest
    distance from A and B.
  • If a protein has a shorter distance than A-B, it
    is added to the cluster.
  • This is repeated until all proteins have been
    clustered.
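
The growth rule in the last three bullets can be sketched as below. This is one simplified reading of the slide, not the published algorithm: a seed pair (one protein from clade A, one from clade B) sets a threshold equal to its A-B distance, and any protein closer than that threshold to a current cluster member is added. The function name, the `dist` dictionary keyed by `frozenset` pairs, and the treatment of missing distances as infinite are all assumptions made for the example.

```python
def grow_cluster(seed, proteins, dist):
    """Grow one cluster outward from a seed pair.

    seed     -- tuple (a, b): one protein from clade A, one from clade B
    proteins -- iterable of all candidate protein ids
    dist     -- dict mapping frozenset({p, q}) to a pairwise distance;
                pairs without an entry are treated as infinitely far
    """
    threshold = dist[frozenset(seed)]  # the A-B seed distance
    cluster = set(seed)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for p in proteins:
            if p in cluster:
                continue
            nearest = min(
                dist.get(frozenset((p, member)), float("inf"))
                for member in cluster
            )
            if nearest < threshold:
                cluster.add(p)
                changed = True
    return cluster
```

In a full run this would be repeated for each seed at each node of the species tree until every protein has been assigned to a cluster.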

9
Protein Distance Map and Cluster Tree
Figure 3: Protein distance map of a pairwise
alignment. [1]
Figure 4: Species tree generated by cluster
analysis. [1]
10
MSA and Phylogenetic Tree Extrapolation
  • An MSA is performed on each cluster using
    ClustalW.
  • Alignments are trimmed to remove gaps, and are
    discarded if fewer than 100 AA are aligned
    successfully.
  • Trees are generated using a precise method:
  • The quartet puzzling maximum likelihood method,
    implemented by TREE-PUZZLE using the JTT matrix
  • The generated tree is assisted by the known
    evolutionary relationships, further defining the
    phylogeny.
  • MSAs are also used to build hidden Markov models
    (HMMs), which give better structural
    representation to sparsely sampled genomes.
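
The trimming and length filter described above might look like the following sketch. It assumes that "trimmed to remove gaps" means dropping every alignment column that contains a gap in any sequence, which is one plausible reading of the slide. The function names are hypothetical; the 100-residue minimum comes from the slide.

```python
def trim_gap_columns(alignment, gap="-"):
    """Drop every column in which at least one sequence has a gap.

    alignment -- dict mapping gene name to its aligned sequence;
                 all sequences must have the same length.
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(len(seqs[0])) if all(s[i] != gap for s in seqs)]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }

def trimmed_or_none(alignment, min_length=100):
    """Return the trimmed alignment, or None if it is too short to keep."""
    if not alignment:
        return None
    trimmed = trim_gap_columns(alignment)
    length = len(next(iter(trimmed.values())))
    return trimmed if length >= min_length else None
```

Only alignments that survive this filter would be passed on to tree construction.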

11
Phylogenetic Tree
Figure 5: Phylogenetic tree incorporating the
cluster tree and known phylogenetic data.
Illustrates numerous duplication events as well
as the proportionality of AA substitutions
(represented by varying branch lengths). [1]
12
Gene View, Annotation and Synteny Maps
  • Gene View
  • All known annotations for a gene can be displayed,
    as well as its location on the chromosome in all
    selected species.
  • Querying
  • The database can be searched for exact and
    related sequences, or for text matches in search
    fields.
  • Several specialized search tools exist, such as
    HMMER, which searches directly against Hidden
    Markov Models.
  • Synteny Maps
  • Produce a set of one-to-one orthologs plotted in
    their relative locations across multiple genomes.
  • Performed by selecting a reference sequence span
    in one species, and the species to be queried.
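
As a rough illustration of the data behind a synteny map, the sketch below links each gene in a reference span to its one-to-one ortholog in a queried genome and records whether the two share a transcriptional orientation or are inverted. The `Gene` record, the function name, and the input layout are hypothetical; the database's own data model is not described in the presentation.

```python
from collections import namedtuple

# Hypothetical minimal gene record: name, start coordinate, strand ("+" or "-").
Gene = namedtuple("Gene", "name start strand")

def synteny_links(ref_genes, query_genes, orthologs):
    """List ortholog links in reference-genome order.

    ref_genes   -- genes in the selected reference span
    query_genes -- genes in the queried species
    orthologs   -- dict mapping a reference gene name to the name of
                   its one-to-one ortholog in the queried species
    Returns tuples (ref_name, query_name, orientation), where
    orientation is "same" or "inverted".
    """
    query_by_name = {g.name: g for g in query_genes}
    links = []
    for ref in sorted(ref_genes, key=lambda g: g.start):
        partner = query_by_name.get(orthologs.get(ref.name))
        if partner is None:
            continue  # no one-to-one ortholog in the queried genome
        orientation = "same" if ref.strand == partner.strand else "inverted"
        links.append((ref.name, partner.name, orientation))
    return links
```

A plotting layer could then draw each link as a line between the two gene positions, colored by the orientation field.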

13
Synteny Map
Figure 6: Synteny map of the selected sequence, with
rectangles representing genes and orthologs shown
to the left and right. Note: black lines indicate
similar transcriptional orientation; red lines
indicate inversion. [1]
14
Model Effectiveness Overview
  • Very efficient at analyzing evolutionary patterns
    within whole genomes.
  • Assists in transferring annotations
  • Identifies sources of evolutionary pressure on
    genes
  • Utilizes the applicable known evolutionary
    history of the genomes studied
  • Creates clusters as descendants from one
    ancestral gene
  • Combines phylogenetic data with functional
    annotation, gene structure and genomic position.
  • Multiple, varied applications:
  • Organismal phylogeny construction
  • Genome evolution via gene duplications
  • Gene structure evolution via gain/loss of exons,
    introns and domains
  • Identification of gene family expansions and
    losses
  • Genome evolution as a whole

15
Conclusions
  • With continuing development, this highly modular
    tool, which is well founded in current analytical
    molecular methodology, will prove useful in
    adding historical context to genome data.
  • Understanding the history of a gene allows for a
    better understanding of that gene's function in
    the organism by relating it to other, similar
    organisms that have also conserved its function.

16
Resources
  • [1] Dehal, Paramvir S., Boore, Jeffrey L., A
    phylogenomic gene cluster resource: the
    Phylogenetically Inferred Groups (PhIGs)
    database, BMC Bioinformatics, 2006, 7:201
  • [2] NCBI, BLASTp protein-protein query,
    http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi