A Parallel Implementation of UPGMA

1
A Parallel Implementation of UPGMA
  • Feng Yue
  • Department of Computer Science
  • University of South Carolina
  • yue2@cse.sc.edu

2
Phylogeny
  • A phylogeny is a reconstruction of the
    evolutionary history of a collection of
    organisms.
  • It usually takes the form of a tree.
  • Modern organisms are placed at the leaves.
  • Edges denote evolutionary relationships.

3
Phylogeny
From the Tree of Life website, University of Arizona.
(Diagram: a primate phylogeny with leaves Orangutan, Gorilla, Chimpanzee, and Human.)
4
12 Species of Campanulaceae
5
Phylogenetic Data
  • All kinds of data have been used: behavioral, morphological, metabolic, etc.
  • The predominant choice is molecular data.
  • Two main kinds of molecular data:
  • Sequence data (the DNA sequence of genes)
  • Gene-order data (the order of genes on chromosomes)

6
Sequence Data
  • Typically the DNA sequence of a few genes.
  • Characters are individual positions in the string and can assume four states.
  • Evolves through point mutations, insertions (incl. duplications), and deletions.

7
DNA Sequence Evolution
8
Phylogeny Problem
(Diagram: species U, V, W, X, Y with sequences TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT at the leaves; the problem is to infer the tree relating them.)
9
Gene-Order Data
  • The ordered sequence of genes on one or more
    chromosomes.
  • Entire gene-order is a single character, which
    can assume a huge number of states.
  • Evolves through inversions, insertions (incl. duplications), and deletions; also transpositions (in mitochondria) and translocations (between chromosomes).

10
Gene Order of the Guillardia Chloroplast
(Diagram)

12
Gene-Order Data Attributes
  • Advantages
  • Low error rate (depends on recognizing homologies).
  • No gene tree / species tree problem.
  • Evolutionary events are rare and unlikely to cause "silent" changes, so analyses can reach back hundreds of millions of years.
  • Problems
  • Mathematics much more complex than for sequence
    data.
  • Models of evolution not well characterized.
  • Very limited data (mostly organelles).
  • Possibly insufficient discrimination among
    recently evolved organisms.

13
Phylogenetic Reconstruction
  • Three categories of methods
  • Distance-based methods, such as neighbor-joining and UPGMA.
  • Parsimony-based methods (such as implemented in PAUP, Phylip, MEGA, TNT, etc.)
  • Likelihood-based methods, including Bayesian methods (such as implemented in PAUP, Phylip, fastDNAml, MrBayes, GAML, etc.)

14
Neighbor Joining Method Example
0. Distance Matrix
(The initial pairwise distance matrix is shown as an image.)
15
1. First Step
The PAM distance 3.3 (Human - Monkey) is the minimum, so we join Human and Monkey into a new node Mon-Hum and calculate the new distances.
(Tree diagram: Mon-Hum joins Monkey and Human; Spinach, Mosquito, and Rice remain separate.)
16
2. Calculation of New Distances
After we have joined two species in a subtree, we have to compute the distances from every other node to the new subtree. We do this with a simple average of distances:
Dist(Spinach, Mon-Hum) = (Dist(Spinach, Monkey) + Dist(Spinach, Human)) / 2 = (90.8 + 86.3) / 2 = 88.55
(Tree diagram: the Mon-Hum subtree with leaves Monkey and Human; Spinach outside.)
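The averaging step above can be sketched in a few lines of Python (an illustrative sketch; the function name is ours, and only the two distances used on this slide are included):

```python
def dist_to_join(d, pair, other):
    """Distance from `other` to the node joining `pair`:
    the simple average of the two original distances."""
    a, b = pair
    return (d[(other, a)] + d[(other, b)]) / 2

# The two PAM distances used on the slide.
d = {("Spinach", "Monkey"): 90.8, ("Spinach", "Human"): 86.3}
print(round(dist_to_join(d, ("Monkey", "Human"), "Spinach"), 2))  # 88.55
```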
17
3. Next Cycle
(Tree diagram: Mosquito joins Mon-Hum, forming Mos-(Mon-Hum); Spinach and Rice remain separate.)
18
4. Penultimate Cycle
(Tree diagram: Spinach and Rice join into Spin-Rice; Mos-(Mon-Hum) is unchanged.)
19
5. Last Joining
(Tree diagram: the final join of Spin-Rice with Mos-(Mon-Hum), giving (Spin-Rice)-(Mos-(Mon-Hum)).)
20
Unrooted Neighbor Joining tree
(Diagram: the unrooted neighbor-joining tree over Human, Monkey, Mosquito, Spinach, and Rice.)
21
UPGMA
  • UPGMA (Unweighted Pair Group Method with Arithmetic mean)
  • Cluster analysis
  • Start from a set of nodes in a graph G and a matrix D of pairwise distances between nodes.
  • The goal is to construct a tree in which
  • arc lengths represent distances,
  • leaves are the original nodes of graph G, and
  • internal nodes are created as clusters of child nodes.

22
UPGMA Example
       1   2   3   4   5   6   7   8   9
  1    0   2   8  10  12
  2    2   0  14  16  24
  3    8  14   0   4   6
  4   10  16   4   0   6
  5   12  24   6   6   0
  6
  7
  8
  9
 Ht
  • Original matrix: 5 x 5; final matrix: 9 x 9.
  • Nodes 1, 2 → node 6
  • Nodes 3, 4 → node 7
  • Nodes 5, 7 → node 8
  • Nodes 6, 8 → node 9 (the root)
  • Bold = active; asterisks = not active.

23
Combine nodes 1 and 2 to produce node 6
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3    8  14   0   4   6  11
  4   10  16   4   0   6  13
  5   12  24   6   6   0  18
  6           11  13  18   0
  7
  8
  9
 Ht    0   0   0   0   0   1
  • Minimum = 2, at (1, 2).
  • Generate node 6, with height 2/2 = 1.
  • Set nodes 1, 2 not active.
  • Calculate the distances, e.g. d(3,6) = (d(3,1) + d(3,2))/2 = (8 + 14)/2 = 11.
  • Update the distance matrix.
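The update rule used here can be written as a small helper. UPGMA's arithmetic mean weights each merged cluster by its size; with singleton clusters, as in this step, it reduces to the plain average (a sketch; the naming is ours):

```python
def merged_distance(d, size, i, j, m):
    """Distance from node m to the cluster formed by merging i and j:
    the size-weighted average of d(i, m) and d(j, m)."""
    return (size[i] * d[i][m] + size[j] * d[j][m]) / (size[i] + size[j])

# Matrix entries needed for node 6 = merge of nodes 1 and 2.
d = {1: {3: 8, 4: 10, 5: 12}, 2: {3: 14, 4: 16, 5: 24}}
size = {1: 1, 2: 1}
print(merged_distance(d, size, 1, 2, 3))  # 11.0 = (8 + 14) / 2
```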

24
Combine nodes 3 and 4 to produce node 7
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3*   8  14   0   4   6  11
  4*  10  16   4   0   6  13
  5   12  24   6   6   0  18   6
  6           11  13  18   0  12
  7                    6  12   0
  8
  9
 Ht    0   0   0   0   0   1   2
  • Minimum = 4, at (3, 4).
  • Generate node 7; its height is 4/2 = 2.
  • Set nodes 3, 4 not active.
  • Calculate the distances, e.g. d(5,7) = (d(5,3) + d(5,4))/2 = (6 + 6)/2 = 6.
  • Update the distance matrix.

25
Combine nodes 5 and 7 to produce node 8
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3*   8  14   0   4   6  11
  4*  10  16   4   0   6  13
  5*  12  24   6   6   0  18   6
  6           11  13  18   0  12  14
  7*                   6  12   0
  8                       14
  9
 Ht    0   0   0   0   0   1   2   3
  • Minimum = 6, at (5, 7).
  • Generate node 8; its height is 6/2 = 3.
  • Set nodes 5, 7 not active.
  • Calculate the distance: d(6,8) = 14, the size-weighted average of d(6,5) = 18 and d(6,7) = 12.
  • Update the distance matrix.

26
Combine nodes 6 and 8 to produce node 9
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3*   8  14   0   4   6  11
  4*  10  16   4   0   6  13
  5*  12  24   6   6   0  18   6
  6*          11  13  18   0  12  14
  7*                   6  12   0
  8*                      14
  9
 Ht    0   0   0   0   0   1   2   3   7
  • Minimum = 14, at (6, 8).
  • Generate node 9, the root; its height is 14/2 = 7.

27
Distance Tree
28
UPGMA Algorithm Summarized
  • 1. Initialization
  • (a) Assign each sequence i to its own cluster Ci.
  • (b) Define one leaf of T for each sequence, and place it at height zero.
  • 2. Iteration
  • (a) Find the minimal distance Dij in the matrix; this determines the two clusters i, j.
  • (b) Define a new cluster k by Ck = Ci ∪ Cj.
  • (c) Define a node k with children i and j, and place it at height Dij/2.
  • (d) Add k to the current clusters and remove i and j.
  • 3. Termination
  • When only two clusters i, j remain, place the root at height Dij/2.
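The steps above can be sketched as a short sequential program (an illustrative sketch, not the thesis code; it uses integer node labels and the size-weighted arithmetic mean, and reproduces the worked example from the earlier slides):

```python
def upgma(labels, d):
    """UPGMA on a symmetric dict-of-dicts distance matrix.
    Returns (merges, heights): which clusters merged, and at what height."""
    dist = {a: dict(row) for a, row in d.items()}
    size = {c: 1 for c in labels}
    active = set(labels)
    merges, heights = [], {}
    nxt = max(labels) + 1
    while len(active) > 1:
        # 2(a): minimal distance among active cluster pairs.
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: dist[p[0]][p[1]])
        k, nxt = nxt, nxt + 1
        heights[k] = dist[i][j] / 2          # 2(c): node height = Dij/2
        dist[k] = {}
        for m in active - {i, j}:            # size-weighted arithmetic mean
            dkm = (size[i] * dist[i][m] + size[j] * dist[j][m]) \
                  / (size[i] + size[j])
            dist[k][m] = dist[m][k] = dkm
        size[k] = size[i] + size[j]
        active -= {i, j}
        active.add(k)
        merges.append((i, j, k))
    return merges, heights

# The 5-taxon example from the preceding slides.
d = {1: {2: 2, 3: 8, 4: 10, 5: 12},
     2: {1: 2, 3: 14, 4: 16, 5: 24},
     3: {1: 8, 2: 14, 4: 4, 5: 6},
     4: {1: 10, 2: 16, 3: 4, 5: 6},
     5: {1: 12, 2: 24, 3: 6, 4: 6}}
merges, heights = upgma([1, 2, 3, 4, 5], d)
print(merges)   # [(1, 2, 6), (3, 4, 7), (5, 7, 8), (6, 8, 9)]
print(heights)  # {6: 1.0, 7: 2.0, 8: 3.0, 9: 7.0}
```

The merges and heights match slides 23-26: nodes 6, 7, 8, 9 at heights 1, 2, 3, and 7.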

29
Three Stages of MPI Execution
  • When the user gives the command to execute the
    program, a copy of the program is sent to each
    processor.
  • Each processor executes its own copy of the
    executable program.
  • Different processors may execute different statements by branching within the program based on their process rank.
  • Use MPI_Barrier() to synchronize.

30
Parallel Implementation
  • Process 0 is the master; the rest of the processes are the workers.
  • Process 0 reads data from an external file and distributes the data evenly to the workers.
  • Each worker finds its local minimum and sends it back to the master process.
  • The master finds the global minimum, updates the matrix, and sends the new row to the proper worker.
  • The procedure repeats n - 1 times until we get the final matrix.
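One iteration of this master/worker scheme can be sketched sequentially (no real MPI here; ordinary loops stand in for worker processes, and the partitioning and function names are ours):

```python
def partition_rows(n, workers):
    """Deal the matrix rows to the workers round-robin,
    as in the Loop 0 slides: P1 gets rows 0, 2, 4; P2 gets 1, 3."""
    return {w: list(range(w - 1, n, workers)) for w in range(1, workers + 1)}

def local_min(d, rows, active):
    """Each worker scans only its own rows of the lower triangle."""
    best = (float("inf"), None)
    for r in rows:
        if r not in active:
            continue
        for c in range(r):
            if c in active and d[r][c] < best[0]:
                best = (d[r][c], (r, c))
    return best

# Lower triangle of the slides' 5 x 5 matrix (0-indexed).
d = {1: {0: 2}, 2: {0: 8, 1: 14}, 3: {0: 10, 1: 16, 2: 4},
     4: {0: 12, 1: 24, 2: 6, 3: 6}}
active = {0, 1, 2, 3, 4}
parts = partition_rows(5, 2)                           # {1: [0, 2, 4], 2: [1, 3]}
locals_ = [local_min(d, parts[w], active) for w in (1, 2)]
print(min(locals_))   # (2, (1, 0)): the master's global minimum, as on the slides
```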

31
Loop 0, P0
       0   1   2   3   4   5   6   7   8
  0
  1    2
  2    8  14
  3   10  16   4
  4   12  24   6   6
  5
  6
  7
  8
  • Original matrix: 5 x 5; complete matrix: 9 x 9.
  • 3 processors: process 0 is the master, processes 1 and 2 are the workers.
  • Asterisks indicate rows and columns that are not active; bold means active.
  • Only the lower triangle of the matrix is used.

32
Loop 0, P1
  • This is the matrix processor 1 receives from the master processor. It gets rows 0, 2, and 4. The local minimum is 6, at (4, 2).

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6
  7
  8
33
Loop 0, P2
  • This is the matrix processor 2 receives from the master processor. It gets rows 1 and 3. The local minimum is 2, at (1, 0).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5
  6
  7
  8
34
Loop 1, P0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2    8  14
  3   10  16   4
  4   12  24   6   6
  5           11  13  18
  6
  7
  8
 Ht.   0   0   0   0   0   1
  • Collect the local minima: 2 and 6.
  • Global minimum = 2, at (0, 1).
  • Combine nodes 1 and 0 to get node 5.
  • Update the matrix.
  • Update the non-active list to 1, 0.
  • Send row 5 to P2.

35
Loop 1, p1
  • The data do not change, but the non-active list has changed.
  • Rows 0, 1 and columns 0, 1 are no longer active.
  • The local minimum is 6, at (4, 2).

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6
  7
  8
36
Loop 1, p2
  • P2 receives a new row 5 from the master processor.
  • Rows 0, 1 and columns 0, 1 are no longer active. The local minimum is 4, at (3, 2).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5           11  13  18
  6
  7
  8
37
Loop 2, P0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2*   8  14
  3*  10  16   4
  4   12  24   6   6
  5           11  13  18
  6                    6  12
  7
  8
 Ht.   0   0   0   0   0   1   2
  • Collected local minima: 4 and 6.
  • Global minimum = 4, at (3, 2).
  • Nodes 3 and 2 merge into node 6.
  • Update the non-active list to 1, 0, 3, 2.
  • Update the matrix.
  • Send row 6 to P1.
38
Loop 2, p1
  • P1 receives a new row 6 from the master processor.
  • Now the non-active list is 0, 1, 2, 3.
  • The local minimum is 6, at (6, 4).

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6                    6  12
  7
  8
39
Loop 2, p2
  • The data in P2 do not change.
  • It only needs to work on row 5 (rows 1 and 3 are not active).
  • The local minimum is 18, at (5, 4).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5           11  13  18
  6
  7
  8
40
Loop 3, P0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2*   8  14
  3*  10  16   4
  4*  12  24   6   6
  5           11  13  18
  6*                   6  12
  7                       14
  8
 Ht.   0   0   0   0   0   1   2   3
  • Collected local minima: 6 and 18.
  • Global minimum = 6, at (6, 4).
  • Update the non-active list to 1, 0, 3, 2, 6, 4.
  • Combine nodes 6 and 4 to get node 7.
  • Send the new row 7 to P2.

41
Loop 3, p1
  • The data in P1 do not change.
  • It should work on rows 0, 2, 4, and 6, but none of them is active.
  • Therefore the local minimum is INFINITY.

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6                    6  12
  7
  8
42
Loop 3, p2
  • P2 gets a new row 7 from the master processor.
  • The non-active list is 0, 1, 2, 3, 4, 6.
  • The local minimum is 14, at (7, 5).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5           11  13  18
  6
  7                       14
  8
43
Matrix on process 0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2*   8  14
  3*  10  16   4
  4*  12  24   6   6
  5*          11  13  18
  6*                   6  12
  7*                      14
  8
 Ht.   0   0   0   0   0   1   2   3   7
  • Local minima: 14 and infinity.
  • Global minimum = 14.
  • Combine nodes 7 and 5 to get node 8, the root.
  • Update the non-active list to 1, 0, 3, 2, 6, 4, 7, 5.
  • Record the height as 14/2 = 7.0.

44
Distance Tree
45
Performance Speedup
  • Work only on the lower-triangular part of the matrix.
  • How should the triangle be broken up among processors?

46
Performance Speedup
  • Checking active elements with a triple loop:
  • for i = 1 to 1000
  •   for j = 1 to 1000
  •     for each entry in the active list
  •       check whether the element is active
  • Checking active elements with linked lists instead:
  • linkList a stores the active columns (the same on every processor)
  • linkList b stores the active rows (different on each processor)
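The linked-list idea can be sketched with plain lists of active indices (a sketch; the naming is ours): the minimum search then touches only live entries instead of re-checking every (i, j) pair against an activity list.

```python
def min_active(d, active_rows, active_cols):
    """Scan only the active entries of the lower triangle."""
    best = (float("inf"), None)
    for r in active_rows:
        for c in active_cols:
            if c < r and d[r][c] < best[0]:
                best = (d[r][c], (r, c))
    return best

# Lower triangle of the example matrix (0-indexed).
d = {1: {0: 2}, 2: {0: 8, 1: 14}, 3: {0: 10, 1: 16, 2: 4},
     4: {0: 12, 1: 24, 2: 6, 3: 6}}
active = [2, 3, 4]                     # after nodes 0 and 1 have been merged
print(min_active(d, active, active))   # (4, (3, 2))
```

Because merged rows are removed from the lists outright, the cost per iteration shrinks as clustering proceeds, rather than staying at a full scan of all pairs.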

47
Performance Analysis
800 x 800 matrix
Worker procs   Clock time   Total user time   Total system time   Speedup
     1            9.69           7.11               0.64
     2            6.74           7.26               0.86             1.43
     4            6.58           7.44               1.26             1.47
     8            7.725          8.46               3.38             1.25
    16           10.7            9.55               7.47             N/A
    24           14.83          12.38              12.04             N/A
48
Performance Analysis
1600 x 1600 matrix
Worker procs   Clock time   Total user time   Total system time   Speedup
     1           62.10          52.12               2.45
     2           39.34          52.73               3.53             1.58
     4           28.79          55.03               4.96             2.16
     8           24.43          57.98               9.98             2.54
    16           26.27          64.29              19.07             2.36
    24           26.32          68.85              26.95             2.36
49
Performance Analysis
3200 x 3200 matrix
Worker procs   Clock time   Total user time   Total system time   Speedup
     1          497.91         464.24               7.72
     2          268.96         451.77               9.34             1.85
     4          163.46         467.83              13.32             3.04
     8          114.52         489.94              26.11             4.35
    16           87.73         499.34              47.27             5.67
    24          105.22         527.97              70.6              4.73
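The speedup column is simply the one-worker clock time divided by the n-worker clock time; for the 3200 x 3200 run:

```python
# Speedup = T(1) / T(n), computed from the 3200 x 3200 clock times above.
t = {1: 497.91, 2: 268.96, 4: 163.46, 8: 114.52, 16: 87.73, 24: 105.22}
speedup = {n: t[1] / t[n] for n in t}
print(round(speedup[2], 2))   # 1.85
print(round(speedup[24], 2))  # 4.73
```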
51
Clock time Vs. Number of Processors
Reduction in clock time (%) as the number of worker processors increases:

            1→2   2→4   4→8   8→16   16→24
800x800      30     2   Neg    Neg    Neg
1600x1600    37    27    15      3    Neg
3200x3200    48    39    30     23    Neg
(Neg = negative, i.e. the clock time increased.)
  • The bigger the problem size, the better the performance.
  • The losses at higher processor counts come from communication time.

52
Speedup
  • Speedup = T(1) / T(n)
  • Speedup is gained only from the part of a program that can be parallelized.
  • The maximum speedup is limited by the serial part of the program (Amdahl's law).
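The limit can be made concrete with Amdahl's law, where s is the serial fraction (illustrative numbers, not measurements from this study):

```python
# Amdahl's law: with serial fraction s, the speedup on n processors is
# 1 / (s + (1 - s) / n), which approaches 1/s as n grows.
def amdahl(s, n):
    return 1.0 / (s + (1.0 - s) / n)

print(round(amdahl(0.1, 8), 2))      # 4.71: 90%-parallel code on 8 processors
print(round(amdahl(0.1, 10**6), 2))  # 10.0: the serial 10% caps speedup near 1/s
```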

54
Conclusion
  • The parallelization strategy appears to be highly effective.
  • Performance is closely related to the size of the problem.
  • Different problem sizes have different optimal numbers of processors.

55
Reference
  • The bioinformatics background comes from:
  • http://www.compbio.unm.edu/poincare.pdf
  • http://www.cs.utexas.edu/users/tandy
  • http://sansan.phy.ncu.edu.tw/hclee/lec/Lecture3_Phylogeny.ppt
  • The parallel implementation of UPGMA is abstracted from my master's thesis under Dr. Buell's supervision.