A Parallel Implementation of UPGMA

1
A Parallel Implementation of UPGMA
  • Feng Yue
  • Department of Computer Science
  • University of South Carolina
  • yue2@cse.sc.edu

2
Phylogeny
  • A phylogeny is a reconstruction of the
    evolutionary history of a collection of
    organisms.
  • It usually takes the form of a tree.
  • Modern organisms are placed at the leaves.
  • Edges denote evolutionary relationships.

3
Phylogeny
From the Tree of Life website, University of Arizona.
(Diagram: a primate phylogeny with leaves Orangutan, Gorilla, Chimpanzee, and Human.)
4
12 Species of Campanulaceae
5
Phylogenetic Data
  • All kinds of data have been used: behavioral, morphological, metabolic, etc.
  • The predominant choice is molecular data.
  • Two main kinds of molecular data:
  • Sequence data (the DNA sequence of genes)
  • Gene-order data (the order of genes on chromosomes)

6
Sequence Data
  • Typically the DNA sequence of a few genes.
  • Characters are individual positions in the string and can assume four states.
  • Evolves through point mutations, insertions (incl. duplications), and deletions.

7
DNA Sequence Evolution
8
Phylogeny Problem
(Diagram: species U, V, W, X, Y with sequences TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT at the leaves; the problem is to infer the tree relating them.)
9
Gene-Order Data
  • The ordered sequence of genes on one or more
    chromosomes.
  • Entire gene-order is a single character, which
    can assume a huge number of states.
  • Evolves through inversions, insertions (incl. duplications), and deletions; also transpositions (in mitochondria) and translocations (between chromosomes).

10
Gene Order of the Guillardia Chloroplast
(Diagram)

12
Gene-Order Data Attributes
  • Advantages
  • Low error rate (depends on recognizing homologies).
  • No gene tree / species tree problem.
  • Evolutionary events are rare and unlikely to cause "silent" changes, so analyses can reach back hundreds of millions of years.
  • Problems
  • Mathematics much more complex than for sequence
    data.
  • Models of evolution not well characterized.
  • Very limited data (mostly organelles).
  • Possibly insufficient discrimination among
    recently evolved organisms.

13
Phylogenetic Reconstruction
  • Three categories of methods
  • Distance-based methods, such as neighbor-joining and UPGMA.
  • Parsimony-based methods (such as implemented in PAUP, Phylip, MEGA, TNT, etc.)
  • Likelihood-based methods, including Bayesian methods (such as implemented in PAUP, Phylip, fastDNAml, MrBayes, GAML, etc.)

14
Neighbor Joining Method Example
0. Distance Matrix
(The initial pairwise distance matrix is shown as an image.)
15
1. First Step
The PAM distance 3.3 (Human - Monkey) is the minimum, so we join Human and Monkey into a new node Mon-Hum and calculate the new distances.
(Tree diagram: Mon-Hum joins Monkey and Human; Spinach, Mosquito, and Rice remain separate.)
16
2. Calculation of New Distances
After we have joined two species in a subtree, we have to compute the distances from every other node to the new subtree. We do this with a simple average of distances:
Dist(Spinach, Mon-Hum) = (Dist(Spinach, Monkey) + Dist(Spinach, Human)) / 2 = (90.8 + 86.3) / 2 = 88.55
(Tree diagram: the Mon-Hum subtree with leaves Monkey and Human; Spinach outside.)
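The averaging step above can be sketched in a few lines of Python (an illustrative sketch; the function name is ours, and only the two distances used on this slide are included):

```python
def dist_to_join(d, pair, other):
    """Distance from `other` to the node joining `pair`:
    the simple average of the two original distances."""
    a, b = pair
    return (d[(other, a)] + d[(other, b)]) / 2

# The two PAM distances used on the slide.
d = {("Spinach", "Monkey"): 90.8, ("Spinach", "Human"): 86.3}
print(round(dist_to_join(d, ("Monkey", "Human"), "Spinach"), 2))  # 88.55
```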
17
3. Next Cycle
(Tree diagram: Mosquito joins Mon-Hum, forming Mos-(Mon-Hum); Spinach and Rice remain separate.)
18
4. Penultimate Cycle
(Tree diagram: Spinach and Rice join into Spin-Rice; Mos-(Mon-Hum) is unchanged.)
19
5. Last Joining
(Tree diagram: the final join of Spin-Rice with Mos-(Mon-Hum), giving (Spin-Rice)-(Mos-(Mon-Hum)).)
20
Unrooted Neighbor Joining tree
(Diagram: the unrooted neighbor-joining tree over Human, Monkey, Mosquito, Spinach, and Rice.)
21
UPGMA
  • UPGMA (Unweighted Pair Group Method with Arithmetic mean)
  • Cluster analysis
  • Start from a set of nodes in a graph G and a matrix D of pairwise distances between nodes.
  • The goal is to construct a tree in which
  • arc lengths represent distances,
  • leaves are the original nodes of graph G, and
  • internal nodes are created as clusters of child nodes.

22
UPGMA Example
       1   2   3   4   5   6   7   8   9
  1    0   2   8  10  12
  2    2   0  14  16  24
  3    8  14   0   4   6
  4   10  16   4   0   6
  5   12  24   6   6   0
  6
  7
  8
  9
 Ht
  • Original matrix: 5 x 5; final matrix: 9 x 9.
  • Nodes 1, 2 → node 6
  • Nodes 3, 4 → node 7
  • Nodes 5, 7 → node 8
  • Nodes 6, 8 → node 9 (the root)
  • Bold = active; asterisks = not active.

23
Combine nodes 1 and 2 to produce node 6
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3    8  14   0   4   6  11
  4   10  16   4   0   6  13
  5   12  24   6   6   0  18
  6           11  13  18   0
  7
  8
  9
 Ht    0   0   0   0   0   1
  • Minimum = 2, at (1, 2).
  • Generate node 6, with height 2/2 = 1.
  • Set nodes 1, 2 not active.
  • Calculate the distances, e.g. d(3,6) = (d(3,1) + d(3,2))/2 = (8 + 14)/2 = 11.
  • Update the distance matrix.
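The update rule used here can be written as a small helper. UPGMA's arithmetic mean weights each merged cluster by its size; with singleton clusters, as in this step, it reduces to the plain average (a sketch; the naming is ours):

```python
def merged_distance(d, size, i, j, m):
    """Distance from node m to the cluster formed by merging i and j:
    the size-weighted average of d(i, m) and d(j, m)."""
    return (size[i] * d[i][m] + size[j] * d[j][m]) / (size[i] + size[j])

# Matrix entries needed for node 6 = merge of nodes 1 and 2.
d = {1: {3: 8, 4: 10, 5: 12}, 2: {3: 14, 4: 16, 5: 24}}
size = {1: 1, 2: 1}
print(merged_distance(d, size, 1, 2, 3))  # 11.0 = (8 + 14) / 2
```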

24
Combine nodes 3 and 4 to produce node 7
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3*   8  14   0   4   6  11
  4*  10  16   4   0   6  13
  5   12  24   6   6   0  18   6
  6           11  13  18   0  12
  7                    6  12   0
  8
  9
 Ht    0   0   0   0   0   1   2
  • Minimum = 4, at (3, 4).
  • Generate node 7; its height is 4/2 = 2.
  • Set nodes 3, 4 not active.
  • Calculate the distances, e.g. d(5,7) = (d(5,3) + d(5,4))/2 = (6 + 6)/2 = 6.
  • Update the distance matrix.

25
Combine nodes 5 and 7 to produce node 8
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3*   8  14   0   4   6  11
  4*  10  16   4   0   6  13
  5*  12  24   6   6   0  18   6
  6           11  13  18   0  12  14
  7*                   6  12   0
  8                       14
  9
 Ht    0   0   0   0   0   1   2   3
  • Minimum = 6, at (5, 7).
  • Generate node 8; its height is 6/2 = 3.
  • Set nodes 5, 7 not active.
  • Calculate the distance: d(6,8) = 14, the size-weighted average of d(6,5) = 18 and d(6,7) = 12.
  • Update the distance matrix.

26
Combine nodes 6 and 8 to produce node 9
       1   2   3   4   5   6   7   8   9
  1*   0   2   8  10  12
  2*   2   0  14  16  24
  3*   8  14   0   4   6  11
  4*  10  16   4   0   6  13
  5*  12  24   6   6   0  18   6
  6*          11  13  18   0  12  14
  7*                   6  12   0
  8*                      14
  9
 Ht    0   0   0   0   0   1   2   3   7
  • Minimum = 14, at (6, 8).
  • Generate node 9, the root; its height is 14/2 = 7.

27
Distance Tree
28
UPGMA Algorithm Summarized
  • 1. Initialization
  • (a) Assign each sequence i to its own cluster Ci.
  • (b) Define one leaf of T for each sequence, and place it at height zero.
  • 2. Iteration
  • (a) Find the minimal distance Dij in the matrix; this determines the two clusters i, j.
  • (b) Define a new cluster k by Ck = Ci ∪ Cj.
  • (c) Define a node k with children i and j, and place it at height Dij/2.
  • (d) Add k to the current clusters and remove i and j.
  • 3. Termination
  • When only two clusters i, j remain, place the root at height Dij/2.
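The steps above can be sketched as a short sequential program (an illustrative sketch, not the thesis code; it uses integer node labels and the size-weighted arithmetic mean, and reproduces the worked example from the earlier slides):

```python
def upgma(labels, d):
    """UPGMA on a symmetric dict-of-dicts distance matrix.
    Returns (merges, heights): which clusters merged, and at what height."""
    dist = {a: dict(row) for a, row in d.items()}
    size = {c: 1 for c in labels}
    active = set(labels)
    merges, heights = [], {}
    nxt = max(labels) + 1
    while len(active) > 1:
        # 2(a): minimal distance among active cluster pairs.
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: dist[p[0]][p[1]])
        k, nxt = nxt, nxt + 1
        heights[k] = dist[i][j] / 2          # 2(c): node height = Dij/2
        dist[k] = {}
        for m in active - {i, j}:            # size-weighted arithmetic mean
            dkm = (size[i] * dist[i][m] + size[j] * dist[j][m]) \
                  / (size[i] + size[j])
            dist[k][m] = dist[m][k] = dkm
        size[k] = size[i] + size[j]
        active -= {i, j}
        active.add(k)
        merges.append((i, j, k))
    return merges, heights

# The 5-taxon example from the preceding slides.
d = {1: {2: 2, 3: 8, 4: 10, 5: 12},
     2: {1: 2, 3: 14, 4: 16, 5: 24},
     3: {1: 8, 2: 14, 4: 4, 5: 6},
     4: {1: 10, 2: 16, 3: 4, 5: 6},
     5: {1: 12, 2: 24, 3: 6, 4: 6}}
merges, heights = upgma([1, 2, 3, 4, 5], d)
print(merges)   # [(1, 2, 6), (3, 4, 7), (5, 7, 8), (6, 8, 9)]
print(heights)  # {6: 1.0, 7: 2.0, 8: 3.0, 9: 7.0}
```

The merges and heights match slides 23-26: nodes 6, 7, 8, 9 at heights 1, 2, 3, and 7.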

29
Three Stages of MPI Execution
  • When the user gives the command to execute the
    program, a copy of the program is sent to each
    processor.
  • Each processor executes its own copy of the
    executable program.
  • Different processors may execute different statements by branching within the program based on their process rank.
  • Use MPI_Barrier() to synchronize.

30
Parallel Implementation
  • Process 0 is the master; the rest of the processes are the workers.
  • Process 0 reads data from an external file and distributes the data evenly to the workers.
  • Each worker finds its local minimum and sends it back to the master process.
  • The master finds the global minimum, updates the matrix, and sends the new row to the proper worker.
  • The procedure repeats n - 1 times until we get the final matrix.
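One iteration of this master/worker scheme can be sketched sequentially (no real MPI here; ordinary loops stand in for worker processes, and the partitioning and function names are ours):

```python
def partition_rows(n, workers):
    """Deal the matrix rows to the workers round-robin,
    as in the Loop 0 slides: P1 gets rows 0, 2, 4; P2 gets 1, 3."""
    return {w: list(range(w - 1, n, workers)) for w in range(1, workers + 1)}

def local_min(d, rows, active):
    """Each worker scans only its own rows of the lower triangle."""
    best = (float("inf"), None)
    for r in rows:
        if r not in active:
            continue
        for c in range(r):
            if c in active and d[r][c] < best[0]:
                best = (d[r][c], (r, c))
    return best

# Lower triangle of the slides' 5 x 5 matrix (0-indexed).
d = {1: {0: 2}, 2: {0: 8, 1: 14}, 3: {0: 10, 1: 16, 2: 4},
     4: {0: 12, 1: 24, 2: 6, 3: 6}}
active = {0, 1, 2, 3, 4}
parts = partition_rows(5, 2)                           # {1: [0, 2, 4], 2: [1, 3]}
locals_ = [local_min(d, parts[w], active) for w in (1, 2)]
print(min(locals_))   # (2, (1, 0)): the master's global minimum, as on the slides
```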

31
Loop 0, P0
       0   1   2   3   4   5   6   7   8
  0
  1    2
  2    8  14
  3   10  16   4
  4   12  24   6   6
  5
  6
  7
  8
  • Original matrix: 5 x 5; complete matrix: 9 x 9.
  • 3 processors: process 0 is the master, processes 1 and 2 are the workers.
  • Asterisks indicate rows and columns that are not active; bold means active.
  • Only the lower triangle of the matrix is used.

32
Loop 0, P1
  • This is the matrix processor 1 receives from the master processor. It gets rows 0, 2, and 4. The local minimum is 6, at (4, 2).

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6
  7
  8
33
Loop 0, P2
  • This is the matrix processor 2 receives from the master processor. It gets rows 1 and 3. The local minimum is 2, at (1, 0).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5
  6
  7
  8
34
Loop 1, P0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2    8  14
  3   10  16   4
  4   12  24   6   6
  5           11  13  18
  6
  7
  8
 Ht.   0   0   0   0   0   1
  • Collect the local minima: 2 and 6.
  • Global minimum = 2, at (0, 1).
  • Combine nodes 1 and 0 to get node 5.
  • Update the matrix.
  • Update the non-active list to 1, 0.
  • Send row 5 to P2.

35
Loop 1, p1
  • The data do not change, but the non-active list has changed.
  • Rows 0, 1 and columns 0, 1 are no longer active.
  • The local minimum is 6, at (4, 2).

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6
  7
  8
36
Loop 1, p2
  • P2 receives a new row 5 from the master processor.
  • Rows 0, 1 and columns 0, 1 are no longer active. The local minimum is 4, at (3, 2).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5           11  13  18
  6
  7
  8
37
Loop 2, P0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2*   8  14
  3*  10  16   4
  4   12  24   6   6
  5           11  13  18
  6                    6  12
  7
  8
 Ht.   0   0   0   0   0   1   2
  • Collected local minima: 4 and 6.
  • Global minimum = 4, at (3, 2).
  • Nodes 3 and 2 merge into node 6.
  • Update the non-active list to 1, 0, 3, 2.
  • Update the matrix.
  • Send row 6 to P1.
38
Loop 2, p1
  • P1 receives a new row 6 from the master processor.
  • Now the non-active list is 0, 1, 2, 3.
  • The local minimum is 6, at (6, 4).

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6                    6  12
  7
  8
39
Loop 2, p2
  • The data in P2 do not change.
  • It only needs to work on row 5 (rows 1 and 3 are not active).
  • The local minimum is 18, at (5, 4).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5           11  13  18
  6
  7
  8
40
Loop 3, P0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2*   8  14
  3*  10  16   4
  4*  12  24   6   6
  5           11  13  18
  6*                   6  12
  7                       14
  8
 Ht.   0   0   0   0   0   1   2   3
  • Collected local minima: 6 and 18.
  • Global minimum = 6, at (6, 4).
  • Update the non-active list to 1, 0, 3, 2, 6, 4.
  • Combine nodes 6 and 4 to get node 7.
  • Send the new row 7 to P2.

41
Loop 3, p1
  • The data in P1 do not change.
  • It should work on rows 0, 2, 4, and 6, but none of them is active.
  • Therefore the local minimum is INFINITY.

       0   1   2   3   4   5   6   7   8
  0
  1
  2    8  14
  3
  4   12  24   6   6
  5
  6                    6  12
  7
  8
42
Loop 3, p2
  • P2 gets a new row 7 from the master processor.
  • The non-active list is 0, 1, 2, 3, 4, 6.
  • The local minimum is 14, at (7, 5).

       0   1   2   3   4   5   6   7   8
  0
  1    2
  2
  3   10  16   4
  4
  5           11  13  18
  6
  7                       14
  8
43
Matrix on process 0
       0   1   2   3   4   5   6   7   8
  0*
  1*   2
  2*   8  14
  3*  10  16   4
  4*  12  24   6   6
  5*          11  13  18
  6*                   6  12
  7*                      14
  8
 Ht.   0   0   0   0   0   1   2   3   7
  • Local minima: 14 and infinity.
  • Global minimum = 14.
  • Combine nodes 7 and 5 to get node 8, the root.
  • Update the non-active list to 1, 0, 3, 2, 6, 4, 7, 5.
  • Record the height as 14/2 = 7.0.

44
Distance Tree
45
Performance Speedup
  • Work only on the lower-triangular part of the matrix.
  • How should the triangle be broken up among processors?

46
Performance Speedup
  • Checking active elements with a triple loop:
  • for i = 1 to 1000
  •   for j = 1 to 1000
  •     for each entry in the active list
  •       check whether the element is active
  • Checking active elements with linked lists instead:
  • linkList a stores the active columns (the same on every processor)
  • linkList b stores the active rows (different on each processor)
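The linked-list idea can be sketched with plain lists of active indices (a sketch; the naming is ours): the minimum search then touches only live entries instead of re-checking every (i, j) pair against an activity list.

```python
def min_active(d, active_rows, active_cols):
    """Scan only the active entries of the lower triangle."""
    best = (float("inf"), None)
    for r in active_rows:
        for c in active_cols:
            if c < r and d[r][c] < best[0]:
                best = (d[r][c], (r, c))
    return best

# Lower triangle of the example matrix (0-indexed).
d = {1: {0: 2}, 2: {0: 8, 1: 14}, 3: {0: 10, 1: 16, 2: 4},
     4: {0: 12, 1: 24, 2: 6, 3: 6}}
active = [2, 3, 4]                     # after nodes 0 and 1 have been merged
print(min_active(d, active, active))   # (4, (3, 2))
```

Because merged rows are removed from the lists outright, the cost per iteration shrinks as clustering proceeds, rather than staying at a full scan of all pairs.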

47
Performance Analysis
800 x 800 matrix
Worker procs   Clock time   Total user time   Total system time   Speedup
     1            9.69           7.11               0.64
     2            6.74           7.26               0.86             1.43
     4            6.58           7.44               1.26             1.47
     8            7.725          8.46               3.38             1.25
    16           10.7            9.55               7.47             N/A
    24           14.83          12.38              12.04             N/A
48
Performance Analysis
1600 x 1600 matrix
Worker procs   Clock time   Total user time   Total system time   Speedup
     1           62.10          52.12               2.45
     2           39.34          52.73               3.53             1.58
     4           28.79          55.03               4.96             2.16
     8           24.43          57.98               9.98             2.54
    16           26.27          64.29              19.07             2.36
    24           26.32          68.85              26.95             2.36
49
Performance Analysis
3200 x 3200 matrix
Worker procs   Clock time   Total user time   Total system time   Speedup
     1          497.91         464.24               7.72
     2          268.96         451.77               9.34             1.85
     4          163.46         467.83              13.32             3.04
     8          114.52         489.94              26.11             4.35
    16           87.73         499.34              47.27             5.67
    24          105.22         527.97              70.6              4.73
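The speedup column is simply the one-worker clock time divided by the n-worker clock time; for the 3200 x 3200 run:

```python
# Speedup = T(1) / T(n), computed from the 3200 x 3200 clock times above.
t = {1: 497.91, 2: 268.96, 4: 163.46, 8: 114.52, 16: 87.73, 24: 105.22}
speedup = {n: t[1] / t[n] for n in t}
print(round(speedup[2], 2))   # 1.85
print(round(speedup[24], 2))  # 4.73
```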
51
Clock time Vs. Number of Processors
Reduction in clock time (%) as the number of worker processors increases:

            1→2   2→4   4→8   8→16   16→24
800x800      30     2   Neg    Neg    Neg
1600x1600    37    27    15      3    Neg
3200x3200    48    39    30     23    Neg
(Neg = negative, i.e. the clock time increased.)
  • The bigger the problem size, the better the performance.
  • The losses at higher processor counts come from communication time.

52
Speedup
  • Speedup = T(1) / T(n)
  • Speedup is gained only from the part of a program that can be parallelized.
  • The maximum speedup is limited by the serial part of the program (Amdahl's law).
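The limit can be made concrete with Amdahl's law, where s is the serial fraction (illustrative numbers, not measurements from this study):

```python
# Amdahl's law: with serial fraction s, the speedup on n processors is
# 1 / (s + (1 - s) / n), which approaches 1/s as n grows.
def amdahl(s, n):
    return 1.0 / (s + (1.0 - s) / n)

print(round(amdahl(0.1, 8), 2))      # 4.71: 90%-parallel code on 8 processors
print(round(amdahl(0.1, 10**6), 2))  # 10.0: the serial 10% caps speedup near 1/s
```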

54
Conclusion
  • The parallelization strategy appears to be highly effective.
  • Performance is closely related to the size of the problem.
  • Different problem sizes have different optimal numbers of processors.

55
Reference
  • The bioinformatics background comes from:
  • http://www.compbio.unm.edu/poincare.pdf
  • http://www.cs.utexas.edu/users/tandy
  • http://sansan.phy.ncu.edu.tw/hclee/lec/Lecture3_Phylogeny.ppt
  • The parallel implementation of UPGMA is abstracted from my master's thesis under Dr. Buell's supervision.