Introduction to Bioinformatics - PowerPoint PPT Presentation

Provided by: chiche
Transcript and Presenter's Notes

Title: Introduction to Bioinformatics


1
Introduction to Bioinformatics
  • Phylogenetics
  • Part II
  • Distance-Based Methods

2
Distance Matrix
  • (Evolutionary) Distance
  • Many possible measures
  • Fraction of sites that differ between two
    sequences
  • Number of changes needed to convert one sequence to
    another (count mismatches, substitution models,
    ...)
  • Distance Matrix
  • Matrix of pairwise distances between all
    sequences
  • Used to generate tree
  • Varies with construction method, distance metric
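The first measure on this slide (fraction of differing sites) can be sketched as below. This is a minimal illustration, assuming the sequences are already aligned to equal length; the names `p_distance` and `distance_matrix` and the toy sequences are hypothetical, not taken from the slides.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)

def distance_matrix(seqs):
    """Symmetric table of pairwise p-distances keyed by (name, name)."""
    d = {(name, name): 0.0 for name in seqs}
    for m, n in combinations(seqs, 2):
        d[m, n] = d[n, m] = p_distance(seqs[m], seqs[n])
    return d

# Hypothetical toy alignment, made up for illustration only.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAACCA"}
D = distance_matrix(toy)
print(D["s1", "s2"], D["s1", "s3"], D["s2", "s3"])  # 0.125 0.375 0.25
```

A substitution model would replace `p_distance` with a corrected estimate; the matrix-building step stays the same.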

3
Distance in Phylogenetic Tree
  • Distances are ultrametric if
  • Same rate of change on all branches in tree (rare
    in practice)
  • All leaves equidistant from root
  • Also known as a molecular clock
  • Distance matrix must satisfy the following
    3-point condition
  • For any three leaves i, j, k, with distances dij, dik,
    djk:
  • two of the three distances are equal and greater than
    or equal to the third
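The 3-point condition can be checked mechanically. The sketch below assumes distances are stored in a dict keyed by `frozenset` pairs of leaf names (a representation chosen here for convenience). The `slide` values are the A, B, C, D distances used in this deck's transformed-distance example; the `clock` values are hypothetical.

```python
from itertools import combinations

def is_ultrametric(d, leaves, tol=1e-9):
    """Return True if every triple of leaves passes the 3-point condition."""
    for i, j, k in combinations(leaves, 3):
        # Sort the three pairwise distances of the triple.
        small, mid, large = sorted(
            (d[frozenset((i, j))], d[frozenset((i, k))], d[frozenset((j, k))])
        )
        # The two largest must tie; the smallest is <= both by sorting.
        if abs(large - mid) > tol:
            return False
    return True

# Distances from the deck's transformed-distance example.
slide = {frozenset("AB"): 9, frozenset("AC"): 8, frozenset("BC"): 11,
         frozenset("AD"): 12, frozenset("BD"): 15, frozenset("CD"): 10}
# Hypothetical clock-like distances, invented for contrast.
clock = {frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6}

slide_ok = is_ultrametric(slide, "ABCD")  # False: 8, 9, 11 has no tie at the top
clock_ok = is_ultrametric(clock, "ABC")   # True: 6 and 6 tie, 2 is smaller
```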

4
Distance in Phylogenetic Tree
  • Distances are additive if
  • Distance between any two leaves i and j on the tree =
    sum of lengths of edges connecting i and j
  • Distance matrix must satisfy the following
    4-point condition
  • For any four leaves i, j, k, m, two of the
    sums dij + dkm, dik + djm, dim + djk are equal and
    greater than or equal to the third
  • In fact, the difference is 2 x the length of the
    bridge edge(s)

5
UPGMA
  • UPGMA (Unweighted Pair Group Method using
    Arithmetic Averages) (Sokal & Michener, 1958)
  • Algorithm
  • 1. Find pair of sequences (or clusters) A, B with
    smallest distance dAB
  • 2. Insert join for A, B at tree height ½ dAB.
    A and B thus form a new cluster.
  • 3. Update distance of any other sequence/cluster
    X to new cluster as ½ (dAX + dBX)
  • 4. Repeat until all sequences / clusters joined
  • 5. Produces rooted tree
  • Assumptions
  • Distances for tree are ultrametric
  • Branch lengths for 2 leaves same after join
  • Distances for tree are additive

(Slide callout, apparently attached to step 3: similar algorithms vary at this step)
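The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it uses the simple average ½ (dAX + dBX) exactly as written on the slide, whereas weighting the average by cluster size is one of the ways similar algorithms vary at the update step. The dict-of-`frozenset` representation and the function name are choices made here, and the input is the set of A, B, C, D distances from this deck's transformed-distance example.

```python
def upgma(dist, names):
    """Cluster per the slide's steps; returns (tree, height of each join)."""
    d = dict(dist)                      # copy: pairs of cluster labels -> distance
    height = {n: 0.0 for n in names}    # leaves sit at height 0
    clusters = list(names)
    while len(clusters) > 1:
        # Step 1: find the closest pair of clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda p: d[frozenset(p)],
        )
        new = (a, b)
        # Step 2: insert the join at half the distance between them.
        height[new] = d[frozenset((a, b))] / 2
        # Step 3: distance from every other cluster to the new one.
        rest = [c for c in clusters if c not in (a, b)]
        for x in rest:
            d[frozenset((new, x))] = (
                d[frozenset((a, x))] + d[frozenset((b, x))]
            ) / 2
        # Step 4: repeat with the merged cluster in place of a and b.
        clusters = rest + [new]
    return clusters[0], height

slide = {frozenset("AB"): 9, frozenset("AC"): 8, frozenset("BC"): 11,
         frozenset("AD"): 12, frozenset("BD"): 15, frozenset("CD"): 10}
tree, height = upgma(slide, "ABCD")
print(tree)          # ('D', ('B', ('A', 'C')))
print(height[tree])  # 6.5
```

On these distances A and C are joined first (at height 4), then B, then D, giving a rooted tree as step 5 states.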
6
UPGMA Example
  • Given sequences
  • Build distance matrix

7
UPGMA Example
  • Form clusters
  • Next step?

8
Transformed Distance Method
  • Weakness of UPGMA
  • Assumes a constant evolutionary rate across lineages
  • Example: Consider sequences A, B, C, and D in
    Figure 4.5. UPGMA clusters A and C first.
  • Transformed Distance Method (J. Farris, 1977)
  • Takes advantage of the power of an outgroup
  • Similar to UPGMA except for the distance matrix
  • Algorithm
  • Select an outgroup D
  • Transformed distance between i and j
  • d'ij = (dij - diD - djD)/2 + (Σ dkD)/n, with the sum
    taken over all ingroup sequences k
  • where n is the number of ingroups
  • Run UPGMA with matrix of d'ij

9
Transformed Distance Method
  • Example
  • Select D as the outgroup
  • Calculate transformed distance
  • (Σ dkD)/n = (dAD + dBD + dCD)/3
  • = (12 + 15 + 10)/3 = 37/3
  • d'AB = (dAB - dAD - dBD)/2 + 37/3
  • = (9 - 12 - 15)/2 + 37/3 = 10/3
  • d'AC = (dAC - dAD - dCD)/2 + 37/3
  • = (8 - 12 - 10)/2 + 37/3 = 16/3
  • d'BC = (dBC - dBD - dCD)/2 + 37/3
  • = (11 - 15 - 10)/2 + 37/3 = 16/3
  • Construct new distance matrix
  • Run UPGMA
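The arithmetic in this example can be reproduced with exact fractions. The sketch below assumes integer input distances stored in a dict keyed by `frozenset` pairs (a representation chosen here); the function name is hypothetical.

```python
from fractions import Fraction

def transformed_distances(d, ingroup, outgroup):
    """Outgroup-corrected distance for every ingroup pair."""
    n = len(ingroup)
    # Mean distance from the ingroup sequences to the outgroup.
    mean_out = Fraction(sum(d[frozenset((k, outgroup))] for k in ingroup), n)
    out = {}
    for idx, i in enumerate(ingroup):
        for j in ingroup[idx + 1:]:
            raw = (d[frozenset((i, j))]
                   - d[frozenset((i, outgroup))]
                   - d[frozenset((j, outgroup))])
            out[frozenset((i, j))] = Fraction(raw, 2) + mean_out
    return out

slide = {frozenset("AB"): 9, frozenset("AC"): 8, frozenset("BC"): 11,
         frozenset("AD"): 12, frozenset("BD"): 15, frozenset("CD"): 10}
t = transformed_distances(slide, ["A", "B", "C"], "D")
print(t[frozenset("AB")], t[frozenset("AC")], t[frozenset("BC")])  # 10/3 16/3 16/3
```

After the transformation A and B are the closest pair, so UPGMA on the new matrix joins A and B first rather than A and C.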

10
Transformed Distance Method
  • Example (contd)
  • How do you compute the length of a lineage?

11
Neighbor-Joining Method
  • Goal
  • Join closest neighbors (nodes w/ same parent) in
    tree
  • Avoids problem with UPGMA when rates of change
    differ
  • Example
  • Closest leaves not neighbors in correct tree, but
    joined first by UPGMA (see previous example)
  • Assumptions
  • Rate of change can differ
  • Branch lengths may differ after join
  • Branch lengths for tree are additive

12
Neighbor-Joining Method
  • Approach
  • To find closest pair of neighbors
  • Reduce branch length for a node by
    (approximately) the average distance of the node
    from all other nodes
  • Find smallest distance between nodes (after
    reduction)
  • Definitions
  • For all pairs of nodes A, B in set of all nodes
    L, let
  • dA,B = distance between A and B
  • RX = Σ dX,N where N ∈ L (total distance from X to
    all N)
  • rX = RX / (n - 2), where n = number of nodes
  • (normalized divergence from X to all other nodes)
  • QA,B = (n - 2) dA,B - (RA + RB) (rate-corrected
    distance)
  • Key property - for additive distances, the 2 nodes
    w/ minimum Q are always neighbors!
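To make the definitions concrete, the sketch below computes R and Q for the A, B, C, D distances from this deck's transformed-distance example. The dict-of-`frozenset` representation and the function name are choices made here.

```python
def q_matrix(d, nodes):
    """Return (Q, R): rate-corrected distances and total distances per node."""
    n = len(nodes)
    # R[x]: total distance from x to every other node.
    R = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
    Q = {}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            Q[frozenset((a, b))] = (n - 2) * d[frozenset((a, b))] - (R[a] + R[b])
    return Q, R

slide = {frozenset("AB"): 9, frozenset("AC"): 8, frozenset("BC"): 11,
         frozenset("AD"): 12, frozenset("BD"): 15, frozenset("CD"): 10}
Q, R = q_matrix(slide, ["A", "B", "C", "D"])
print(R)                    # {'A': 29, 'B': 35, 'C': 29, 'D': 37}
print(Q[frozenset("AB")])   # -46
print(Q[frozenset("AC")])   # -42
```

The minimum Q (-46) is shared by the pairs (A, B) and (C, D), even though A and C have the smallest raw distance.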

13
Neighbor-Joining Method
  • Algorithm (Saitou & Nei 1987; Studier & Keppler
    1988)
  • 1. Begin with star tree; all sequences are nodes
    in L
  • 2. Find pair of nodes A, B ∈ L with minimum QA,B
  • 3. Create and insert new join (node K) w/ branch
    lengths
  • dA,K = ½ (dA,B + rA - rB)
  • dB,K = ½ (dA,B + rB - rA)
  • 4. For remaining nodes C ∈ L, update distance to
    K as
  • dK,C = ½ (dA,C + dB,C - dA,B)
  • 5. Insert K and remove A, B from L
  • 6. Repeat steps 2-5 until only two nodes left
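The six steps can be sketched as below. This is a minimal illustration under stated assumptions. Distances live in a dict keyed by `frozenset` pairs. Each new node K is labelled by the tuple of the two nodes it joins. Ties in Q are broken by taking the first pair found. The last two nodes are connected by an edge of their remaining distance, a closing step the slide does not spell out.

```python
def neighbor_joining(dist, names):
    """Return a list of edges (child, parent, branch length)."""
    d = dict(dist)
    nodes = list(names)          # step 1: star tree, all sequences in L
    edges = []
    while len(nodes) > 2:        # step 6: stop when two nodes remain
        n = len(nodes)
        R = {x: sum(d[frozenset((x, y))] for y in nodes if y != x)
             for x in nodes}
        r = {x: R[x] / (n - 2) for x in nodes}
        # Step 2: pair with minimum Q = (n - 2) d - (R_a + R_b).
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * d[frozenset(p)] - (R[p[0]] + R[p[1]]),
        )
        dab = d[frozenset((a, b))]
        k = (a, b)
        # Step 3: branch lengths from a and b to the new node k.
        edges.append((a, k, (dab + r[a] - r[b]) / 2))
        edges.append((b, k, (dab + r[b] - r[a]) / 2))
        # Step 4: distances from k to every remaining node.
        rest = [c for c in nodes if c not in (a, b)]
        for c in rest:
            d[frozenset((k, c))] = (
                d[frozenset((a, c))] + d[frozenset((b, c))] - dab
            ) / 2
        # Step 5: insert k, remove a and b.
        nodes = rest + [k]
    x, y = nodes
    edges.append((x, y, d[frozenset((x, y))]))  # assumed closing edge
    return edges

slide = {frozenset("AB"): 9, frozenset("AC"): 8, frozenset("BC"): 11,
         frozenset("AD"): 12, frozenset("BD"): 15, frozenset("CD"): 10}
edges = neighbor_joining(slide, "ABCD")
for child, parent, length in edges:
    print(child, "->", parent, length)
```

On the A, B, C, D distances from this deck's transformed-distance example, this yields branch lengths A = 3, B = 6, C = 3, D = 7 and an internal edge of length 2, which reproduces all six input distances.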

[Figure: nodes A and B joined at new node K]
14
Neighbor-Joining Method
  • Example