Stuart M. Brown - PowerPoint PPT Presentation

1
Molecular Phylogenetics: Computing Evolution
presented by
  • Stuart M. Brown
  • New York University School of Medicine

2
Topics
  • A. Molecular Evolution
  • B. Calculating Distances
  • C. Clustering Algorithms
  • D. Computer Software

Portions of this lecture have been inspired by
web pages created by Dr. Brian Golding,
Department of Biology, McMaster University,
Hamilton, Ontario, Canada, L8S 4K1
3
Evolution
  • The theory of evolution is the foundation upon
    which all of modern biology is built.
  • From anatomy to behavior to genomics, the
    scientific method requires an appreciation of
    changes in organisms over time.
  • It is impossible to evaluate relationships among
    gene sequences without taking into consideration
    the way these sequences have been modified over
    time.

4
Relationships
  • Similarity searches and multiple alignments
    of sequences naturally lead to the question:
  • How are these sequences related?
  • And, more generally:
  • How are the organisms from which these
    sequences come related?

5
Taxonomy
  • The study of the relationships between groups of
    organisms is called taxonomy, an ancient and
    venerable branch of classical biology.
  • Taxonomy is the art of classifying things into
    groups, a quintessential human behavior. It was
    established as a mainstream scientific field by
    Carolus Linnaeus (1707-1778).

6
(No Transcript)
7
Phylogenetics
  • Evolutionary theory states that groups of
    similar organisms are descended from a common
    ancestor.
  • Phylogenetic systematics (cladistics) is a method
    of classifying organisms taxonomically based on
    their evolutionary history.
  • It was developed by Willi Hennig, a German
    entomologist, in 1950.

8
Molecular Evolution
  • Phylogenetics often makes use of numerical data
    (numerical taxonomy). The data can be scores for
    character states, such as the size of a visible
    structure, or they can be DNA sequences.
  • Similarities and differences between organisms
    can be coded as a set of characters, each with
    two or more alternative character states.
  • In an alignment of DNA sequences, each position
    is a separate character, with four possible
    character states, the four nucleotides.
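The idea that each alignment position is a separate character can be sketched in a few lines of Python. This is only an illustration: the three-sequence alignment and the names `taxon_A`, `taxon_B`, `taxon_C`, `alignment_columns`, and `variable_sites` are hypothetical and do not come from the lecture.

```python
# Hypothetical toy alignment (not from the lecture): three aligned DNA sequences.
alignment = {
    "taxon_A": "AATTCG",
    "taxon_B": "AATTCA",
    "taxon_C": "AACTCA",
}


def alignment_columns(aligned):
    """Return one tuple of character states per alignment position."""
    sequences = list(aligned.values())
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*sequences) turns rows (sequences) into columns (characters).
    return list(zip(*sequences))


def variable_sites(aligned):
    """Return 0-based positions where more than one character state occurs."""
    return [
        position
        for position, column in enumerate(alignment_columns(aligned))
        if len(set(column)) > 1
    ]


print(alignment_columns(alignment)[5])  # states of the last character
print(variable_sites(alignment))        # positions that carry a difference
```

Only the variable positions carry information about relationships; positions where every sequence shows the same state do not distinguish the taxa.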

9
DNA is a good tool for taxonomy
  • DNA sequences have many advantages over
    classical types of taxonomic characters:
  • Character states can be scored unambiguously
  • Large numbers of characters can be scored for
    each individual
  • Information on both the extent and the nature of
    divergence between sequences is available
    (nucleotide substitutions, insertion/deletions,
    or genome rearrangements)

10
A   aat tcg ctt cta gga atc tgc cta atc ctg
B   ... ..a ..g ..a .t. ... ... t.. ... ..a
C   ... ..a ..c ..c ... ..t ... ... ... t.a
D   ... ..a ..a ..g ..g ..t ... t.t ..t t..

Each nucleotide difference is a character
11
Sequences Reflect Relationships
  • After working with sequences for a while, one
    develops an intuitive understanding that for a
    given gene, closely related organisms have
    similar sequences and more distantly related
    organisms have more dissimilar sequences. These
    differences can be quantified.
  • Given a set of gene sequences, it should be
    possible to reconstruct the evolutionary
    relationships among genes and among organisms.

12
(No Transcript)
13
Protein Evolution
  • Protein sequences can be used to study more
    distant evolutionary relationships
  • Related proteins have "conserved" substitutions
    (amino acids with similar biochemical properties).

14
What Sequences to Study?
  • Different sequences accumulate changes at
    different rates - choose a level of variation
    that is appropriate to the group of organisms
    being studied.
  • Proteins (or protein coding DNAs) are constrained
    by natural selection.
  • Some sequences are highly variable (rRNA spacer
    regions, immunoglobulin genes), while others are
    highly conserved (actin, rRNA coding regions)
  • Different regions within a single gene can evolve
    at different rates (conserved vs. variable
    domains)

15
Orthologs vs. Paralogs
  • When comparing gene sequences, it is important to
    distinguish between equivalent vs. merely similar
    genes in different organisms.
  • Orthologs are homologous genes in different
    species that diverged through speciation; they
    usually have analogous functions.
  • Paralogs are homologous genes that are the result
    of a gene duplication.
  • A species phylogeny that mixes orthologs and
    paralogs is likely to be incorrect.
  • Sometimes phylogenetic analysis is the best way
    to determine if a new gene is an ortholog or
    paralog to other known genes.

16
Biodiversity and Conservation
  • Phylogenetics also overlaps significantly with
    other branches of evolutionary biology.
  • Check out the Tree of Life project for an
    introduction to phylogenetics and its
    relationship to biodiversity.
  • http://phylogeny.arizona.edu/tree/phylogeny.html
  • Measurements of DNA sequence differences are now
    being used to implement plans for the
    conservation of genetic resources.

17
Genes vs. Species
  • Relationships calculated from sequence data
    represent the relationships between genes; these
    are not necessarily the same as the relationships
    between species.
  • Your sequence data may not have the same
    phylogenetic history as the species from which
    they were isolated.
  • Different genes evolve at different speeds, and
    there is always the possibility of horizontal
    gene transfer (hybridization, vector mediated DNA
    movement, or direct uptake of DNA).

18
Distance Measurements
  • It is often useful to measure the genetic
    distance between two species, between two
    populations, or even between two individuals.
  • The entire concept of numerical taxonomy is based
    on computing phylogenies from a table of
    distances.
  • In the case of sequence data, pairwise distances
    must be calculated between all sequences that
    will be used to build the tree - thus creating a
    distance matrix.
  • Distance methods give a single measurement of the
    amount of evolutionary change between two
    sequences since divergence from a common
    ancestor.
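A distance matrix of the kind described above can be sketched as follows. This is a minimal illustration that uses a raw count of mismatched positions as the distance; the toy sequences and the function names `count_differences` and `distance_matrix` are assumptions made for the example, not part of the lecture or of any particular package.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_matrix(names, sequences):
    """Build a symmetric matrix of pairwise distances, in the order of names."""
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    # Every unordered pair is computed once and mirrored across the diagonal.
    for i, j in combinations(range(n), 2):
        d = float(count_differences(sequences[names[i]], sequences[names[j]]))
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical aligned sequences, used only to exercise the functions.
toy = {"seq1": "AATTCG", "seq2": "AATTCA", "seq3": "AACTCA"}
names = sorted(toy)
matrix = distance_matrix(names, toy)
for name, row in zip(names, matrix):
    print(name, row)
```

For N sequences this requires N(N-1)/2 pairwise comparisons, which is one reason distance calculations slow down quickly as sample sets grow.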

19
Computing a Distance Matrix
Reading sequences...
    gtr1_human    548 total, 548 read
    gtr2_human    548 total, 548 read
    gtr3_human    548 total, 548 read
    gtr4_human    548 total, 548 read
    gtr5_human    548 total, 548 read

Computing distances using Kimura method...
    1 x 2     48.61
    1 x 3     45.50
    1 x 4     65.74
    1 x 5    107.70
    2 x 3     61.53
    2 x 4     74.57
    2 x 5    113.82
    3 x 4     68.93
    3 x 5    104.43
    4 x 5    110.86

Matrix 1
             1        2        3        4        5
    ______________________________________________ ..
    1     0.00    48.61    45.50    65.74   107.70
    2              0.00    61.53    74.57   113.82
    3                       0.00    68.93   104.43
    4                                0.00   110.86
    5                                         0.00
20
DNA Distances
  • Distances between pairs of DNA sequences are
    relatively simple to compute as the sum of all
    base pair differences between the two sequences.
  • This type of algorithm can only work for pairs of
    sequences that are similar enough to be aligned
  • Generally all base changes are considered equal
  • Insertion/deletions are generally given a larger
    weight than replacements (gap penalties).
  • It is also possible to correct for multiple
    substitutions at a single site, which is common
    in distant relationships and for rapidly evolving
    sites.
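The simple difference count and the correction for multiple substitutions can be sketched as below. The correction shown is the Jukes-Cantor formula, d = -3/4 ln(1 - 4p/3), chosen here as the simplest well-known example; it is not necessarily the correction used in the lecture's own examples. Skipping gapped columns is a simplifying assumption of this sketch (the lecture notes that gaps are usually weighted rather than ignored), and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned DNA sequences.

    Columns containing a gap in either sequence are skipped (an assumption
    of this sketch, not a weighting scheme from the lecture).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    # The formula is undefined once p reaches 0.75 (sequences are saturated).
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


observed = p_distance("AACC", "AACT")
print(observed, jukes_cantor(observed))
```

The corrected distance is always at least as large as the observed proportion, and the gap between the two widens as sequences become more divergent, which matches the point that multiple substitutions matter most for distant relationships and rapidly evolving sites.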

21
(No Transcript)
22
Clustering Algorithms
  • Clustering algorithms use distances to calculate
    phylogenetic trees. These trees are based solely
    on the relative numbers of similarities and
    differences between a set of sequences.
  • Start with a list of related genes
  • Make a multiple alignment
  • Compute a matrix of pairwise distances
  • Clustering methods construct a tree by linking
    the least distant pairs, followed by successively
    more distant pairs.
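The "link the least distant pair first" procedure can be sketched with average-linkage clustering (UPGMA), one common clustering method; the lecture does not name a specific algorithm at this point, so that choice is an assumption. The distances are the five glucose-transporter values from the Computing a Distance Matrix slide, and the function name `upgma` is hypothetical.

```python
def upgma(labels, matrix):
    """Cluster by repeatedly joining the closest pair (average linkage).

    Returns the tree as nested tuples of labels.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # least distant pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted mean keeps this the average over all leaf pairs.
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        dist.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


labels = ["gtr1_human", "gtr2_human", "gtr3_human", "gtr4_human", "gtr5_human"]
matrix = [
    [0.00, 48.61, 45.50, 65.74, 107.70],
    [48.61, 0.00, 61.53, 74.57, 113.82],
    [45.50, 61.53, 0.00, 68.93, 104.43],
    [65.74, 74.57, 68.93, 0.00, 110.86],
    [107.70, 113.82, 104.43, 110.86, 0.00],
]
print(upgma(labels, matrix))
```

With these distances the first join is gtr1_human with gtr3_human (45.50, the smallest entry), followed by gtr2_human, gtr4_human, and finally gtr5_human. The sketch returns only the branching order; branch lengths are omitted to keep it short.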

23
Neighbor Joining
  • The Neighbor Joining method is the most popular
    way to build trees from distance measurements
  • (Saitou and Nei 1987, Mol. Biol. Evol. 4:406)
  • Neighbor Joining corrects the distance method
    for its (frequently invalid) assumption that the
    same rate of evolution applies to each branch of
    a tree.
  • The distance matrix is adjusted for differences
    in the rate of evolution of each taxon (branch).
  • Neighbor Joining has given the best results in
    simulation studies and it is the most
    computationally efficient of the distance
    algorithms (N. Saitou and T. Imanishi, Mol.
    Biol. Evol. 6:514 (1989)).
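The rate adjustment can be illustrated with the first step of Neighbor Joining in its commonly used Q-matrix formulation, Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), where r(i) is the sum of all distances from taxon i. The sketch below covers only the choice of the first pair to join, not the full algorithm; the function names are hypothetical, and the distances are those from the Computing a Distance Matrix slide.

```python
def nj_q_matrix(matrix):
    """Return the Neighbor Joining Q matrix for a symmetric distance matrix."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]  # r(i): total distance from taxon i
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Subtracting the row sums discounts taxa that are far from
                # everything, i.e. taxa on long (fast-evolving) branches.
                q[i][j] = (n - 2) * matrix[i][j] - row_sums[i] - row_sums[j]
    return q


def first_pair_to_join(matrix):
    """Return the 0-based index pair with the smallest Q value."""
    q = nj_q_matrix(matrix)
    n = len(matrix)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda pair: q[pair[0]][pair[1]])


matrix = [
    [0.00, 48.61, 45.50, 65.74, 107.70],
    [48.61, 0.00, 61.53, 74.57, 113.82],
    [45.50, 61.53, 0.00, 68.93, 104.43],
    [65.74, 74.57, 68.93, 0.00, 110.86],
    [107.70, 113.82, 104.43, 110.86, 0.00],
]
print(first_pair_to_join(matrix))
```

For these data the smallest Q value belongs to sequences 4 and 5 (indices 3 and 4), rather than to the pair with the smallest raw distance (sequences 1 and 3, 45.50). That difference shows the adjustment at work: sequence 5 is distant from every other sequence, so its large distances are treated as evidence of a long branch and not only of a distant relationship.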

24
Computer Software for Phylogenetics
  • Due to the lack of consensus among evolutionary
    biologists about basic principles for
    phylogenetic analysis, it is not surprising that
    there is a wide array of computer software
    available for this purpose.
  • PHYLIP is a free package that includes 30
    programs that implement various phylogenetic
    algorithms for different kinds of data.
  • The GCG package (available at most research
    institutions) contains a full set of programs for
    phylogenetic analysis including simple
    distance-based clustering and the complex
    cladistic analysis program PAUP (Phylogenetic
    Analysis Using Parsimony).
  • CLUSTALX is a multiple alignment program that
    includes the ability to create trees based on
    Neighbor Joining.

25
Phylogenetics on the Web
  • There are several phylogenetics servers available
    on the Web
  • Some of these will change or disappear in the
    near future
  • These programs can be very slow, so keep your
    sample sets small
  • The Institut Pasteur, Paris has a PHYLIP server
    at
  • http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html
  • Louxin Zhang at the Natl. University of
    Singapore has a WebPhylip server
  • http://sdmc.krdl.org.sg:8080/lxzhang/phylip/
  • The Belozersky Institute at Moscow State
    University has their own "GeneBee" phylogenetics
    server
  • http://www.genebee.msu.su/services/phtree_reduced.html
  • The Phylodendron website is a tree drawing
    program with a nice user interface and a lot of
    options; however, the output is limited to GIFs
    at 72 dpi, which is not publication quality.
  • http://iubio.bio.indiana.edu/treeapp/treeprint-form.html

26
Other Web Resources
  • Joseph Felsenstein (author of PHYLIP) maintains a
    comprehensive list of Phylogeny programs at
  • http://evolution.genetics.washington.edu/phylip/software.html
  • Introduction to Phylogenetic Systematics,
  • Peter H. Weston and Michael D. Crisp, Society of
    Australian Systematic Biologists
  • http://www.science.uts.edu.au/sasb/WestonCrisp.html
  • University of California, Berkeley Museum of
    Paleontology (UCMP)
  • http://www.ucmp.berkeley.edu/clad/clad4.html

27
Software Hazards
  • There are a variety of programs for Macs and PCs,
    but you can easily tie up your machine for many
    hours with even moderately sized data sets (e.g.,
    ten 300 bp sequences)
  • Moving sequences into different programs can be a
    major hassle due to incompatible file formats.
  • Just because a program can perform a given
    computation on a set of data does not mean that
    that is the appropriate algorithm for that type
    of data.
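As one illustration of the file-format hassle mentioned above, the sketch below converts aligned sequences from FASTA to the strict sequential PHYLIP layout (a header line with the number of sequences and sites, then one line per sequence with the name padded to 10 characters). The helper names `parse_fasta` and `to_phylip` are hypothetical, and real data may need more careful handling than this minimal version provides.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the sequence name.
            name = (line[1:].split() or ["unnamed"])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Format aligned (name, sequence) pairs as strict sequential PHYLIP."""
    lengths = {len(sequence) for _name, sequence in records}
    if len(lengths) != 1:
        raise ValueError("PHYLIP requires aligned sequences of equal length")
    lines = [f" {len(records)} {lengths.pop()}"]
    for name, sequence in records:
        # Names are truncated or padded to exactly 10 characters.
        lines.append(f"{name[:10]:<10}{sequence}")
    return "\n".join(lines) + "\n"


fasta = ">seqA example\nAATT\nCG\n>seqB\nAATTCA\n"
print(to_phylip(parse_fasta(fasta)), end="")
```

Truncating names to 10 characters can make two different sequences share a name, which is a typical example of how a conversion that runs without errors can still produce misleading input for the next program.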