Slide 1: Domain decomposition in parallel computing
- Ashok Srinivasan
- www.cs.fsu.edu/asriniva
- Florida State University
Slide 2: Outline
- Background
- Geometric partitioning
- Graph partitioning
- Static
- Dynamic
Slide 3: Background
- Tasks in a parallel computation need access to certain data
- The same datum may be needed by multiple tasks
- Example: In matrix-vector multiplication c = Ab, b_2 is needed to compute every c_i, 1 <= i <= n
- If a process does not own a datum needed by its task, then it has to get it from a process that has it
- This communication is expensive
- Aims of domain decomposition
- Distribute the data so that the required communication is minimized
- Ensure that the computational loads on processes are balanced
Slide 4: Domain decomposition example
- Finite difference computation
- The new value of a node depends on the old values of its neighbors
- We want to divide the nodes amongst the processes so that
- Communication is minimized
- This is a measure of partition quality
- The computational load is evenly balanced
Slide 5: Geometric partitioning
- Partition a set of points
- Uses only coordinate information
- Balances the load
- The heuristic tries to ensure that communication costs are low
- Algorithms are typically fast, but the partition is not of high quality
- Examples
- Orthogonal recursive bisection
- Inertial
- Space filling curves
Slide 6: Orthogonal recursive bisection
- Recursively bisect orthogonal to the longest dimension
- Assume communication is proportional to the surface area of the domain
- Recursive bisection
- Divide into two pieces, keeping the load balanced
- Apply recursively until the desired number of partitions is obtained
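The recursion above can be sketched in a few lines. This is an illustrative sketch, not any library's routine: it assumes unit-weight points and a power-of-two number of partitions, and splits at the median along the longest axis.

```python
import numpy as np

def orb(points, n_parts):
    """Orthogonal recursive bisection of a point set (illustrative sketch).

    Assumes unit weight per point and n_parts a power of two."""
    if n_parts == 1:
        return [points]
    extents = points.max(axis=0) - points.min(axis=0)
    axis = int(np.argmax(extents))        # bisect orthogonal to the longest dimension
    order = np.argsort(points[:, axis])   # sort along that axis
    half = len(points) // 2               # median split keeps the load balanced
    return (orb(points[order[:half]], n_parts // 2) +
            orb(points[order[half:]], n_parts // 2))

parts = orb(np.random.rand(1000, 2), 4)   # four equal-sized partitions
```

Because each bisection splits at the median, all leaf partitions end up with the same number of points.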
Slide 7: Inertial
- ORB may not be effective if cuts along the x, y, or z directions are not good ones
- Inertial
- Recursively bisect orthogonal to the inertial axis
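One common way to realize this (an illustrative sketch, assuming unit-weight points): take the inertial axis as the dominant eigenvector of the covariance matrix of the coordinates, and split at the median of the projections onto it.

```python
import numpy as np

def inertial_bisect(points):
    """Bisect a point set orthogonal to its inertial (principal) axis.

    Sketch: the axis is the eigenvector of the covariance matrix with the
    largest eigenvalue; points split at the median of their projections."""
    centered = points - points.mean(axis=0)
    _, vecs = np.linalg.eigh(centered.T @ centered)  # eigenvalues ascending
    axis = vecs[:, -1]                               # inertial axis
    order = np.argsort(centered @ axis)              # sort by projection
    half = len(points) // 2
    return order[:half], order[half:]                # index sets of the two parts

# Points along a diagonal: ORB's axis-aligned cuts are poor here,
# but the inertial axis lines up with the data
pts = np.array([[i, i] for i in range(10)], dtype=float)
lo, hi = inertial_bisect(pts)
```

On this diagonal example the cut is orthogonal to the line y = x, exactly where an axis-aligned ORB cut would be weakest.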
Slide 8: Space filling curves
- Space filling curves
- A continuous curve that fills the space
- Order the points based on their relative position on the curve
- Choose a curve that preserves proximity
- Points that are close in space should be close in the ordering too
- Example
- Hilbert curve
Slide 9: Hilbert curve
- Sources
- http://www.dcs.napier.ac.uk/andrew/hilbert.html
- http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html
Slide 10: Domain decomposition with a space filling curve
- Order points based on their position on the curve
- Divide into P parts
- P is the number of processes
- Space filling curves can be used in adaptive computations too
- They can be extended to higher dimensions too
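In 2-D this can be sketched with the classical bitwise xy-to-index conversion for the Hilbert curve (an illustrative helper; `order` here means a 2^order x 2^order grid of cells):

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve covering a
    2**order x 2**order grid (classical bitwise conversion)."""
    d, s = 0, 2 ** (order - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                     # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def sfc_partition(cells, order, n_parts):
    """Order cells along the curve, then cut the ordering into n_parts
    contiguous, nearly equal chunks (one per process)."""
    ranked = sorted(cells, key=lambda c: hilbert_index(order, *c))
    size = -(-len(ranked) // n_parts)   # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

parts = sfc_partition([(x, y) for x in range(4) for y in range(4)], 2, 4)
```

Because the Hilbert ordering preserves proximity, each contiguous chunk tends to be a spatially compact block of cells, which keeps communication low. Adaptivity is cheap: when cells are added or removed, only the cut points along the ordering move.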
Slide 11: Graph partitioning
- Model as graph partitioning
- Graph G = (V, E)
- Each task is represented by a vertex
- A weight can be used to represent the computational effort
- An edge exists between tasks if one needs data owned by the other
- Weights can be associated with edges too
- Goal
- Partition the vertices into P parts such that each partition has equal vertex weight
- Minimize the weight of the edges cut
- The problem is NP-complete
- Edge cut metric
- Judge the quality of the partitioning by the number of edges cut
Slide 12: Static graph partitioning
- Combinatorial
- Levelized nested dissection
- Kernighan-Lin/Fiduccia-Mattheyses
- Spectral partitioning
- Multilevel methods
Slide 13: Combinatorial partitioning
- Uses only connectivity information
- Examples
- Levelized nested dissection
- Kernighan-Lin/Fiduccia-Mattheyses
Slide 14: Levelized nested dissection (LND)
- The idea is similar to the geometric methods
- But it cannot use coordinate information
- Instead of projecting vertices along the longest axis, order them based on distance from a vertex that may be at one extreme of the longest dimension of the graph
- Pseudo-peripheral vertex
- Perform a breadth-first search, starting from an arbitrary vertex
- The vertex that is encountered last may be a good approximation to a peripheral vertex
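The heuristic can be sketched as below. The repeat-until-no-deeper loop is a common refinement of the single-BFS version described above; the adjacency-dict representation is illustrative.

```python
from collections import deque

def pseudo_peripheral(adj, start):
    """Heuristic for a pseudo-peripheral vertex: BFS from start, jump to
    the last vertex reached, and repeat while the BFS depth keeps growing."""
    def bfs(src):
        dist, q, last = {src: 0}, deque([src]), src
        while q:
            u = q.popleft()
            last = u                      # last vertex popped = encountered last
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return last, dist[last]

    v, depth = bfs(start)
    while True:
        u, d = bfs(v)
        if d <= depth:                    # no deeper level structure: stop
            return v
        v, depth = u, d

# On a path 0-1-2-3-4, starting in the middle finds an endpoint
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On the path graph, starting from the middle vertex 2, the first BFS reaches an endpoint last; iterating from there settles on a true peripheral vertex (an endpoint of the path).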
Slide 15: LND example: Finding a pseudoperipheral vertex
(Figure: BFS from an initial vertex labels the vertices with levels 1-4; the vertex encountered last, at level 4, is taken as the pseudoperipheral vertex.)
Slide 16: LND example: Partitioning
(Figure: vertices are numbered 1-6 by BFS level from the initial vertex; the first half of this ordering forms one partition and the rest the other.)
- Recursively bisect the subgraphs
Slide 17: Kernighan-Lin/Fiduccia-Mattheyses
- Refines an existing partition
- Kernighan-Lin
- Consider pairs of vertices from different partitions
- Choose a pair whose swapping will result in the best improvement in partition quality
- The best improvement may actually be a worsening
- Perform several passes
- Choose the best partition among those encountered
- Fiduccia-Mattheyses
- Similar, but more efficient
- Boundary Kernighan-Lin
- Consider only boundary vertices to swap
- ... and many other variants
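A single KL-style pass can be sketched as follows. This is a deliberately naive version: it recomputes the edge cut for every candidate swap instead of maintaining incremental gain tables (which is where Fiduccia-Mattheyses gets its efficiency). `side[v]` is 0 or 1.

```python
def edge_cut(edges, side):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if side[u] != side[v])

def kl_pass(edges, side):
    """One Kernighan-Lin pass (naive sketch): repeatedly commit the swap
    with the best resulting cut, even if it worsens things, lock the
    swapped pair, and return the best partition seen during the pass."""
    n = len(side)
    locked = [False] * n
    cur = list(side)
    best, best_cut = list(cur), edge_cut(edges, cur)
    for _ in range(n // 2):
        choice, choice_cut = None, None
        for a in range(n):
            if locked[a] or cur[a] != 0:
                continue
            for b in range(n):
                if locked[b] or cur[b] != 1:
                    continue
                cur[a], cur[b] = 1, 0            # tentative swap
                c = edge_cut(edges, cur)
                cur[a], cur[b] = 0, 1            # undo
                if choice_cut is None or c < choice_cut:
                    choice, choice_cut = (a, b), c
        if choice is None:
            break
        a, b = choice
        cur[a], cur[b] = 1, 0                    # commit, even if worse
        locked[a] = locked[b] = True
        if choice_cut < best_cut:
            best, best_cut = list(cur), choice_cut
    return best, best_cut

# Two triangles joined by one edge; start from a deliberately bad partition
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part, cut = kl_pass(edges, [0, 1, 0, 1, 0, 1])
```

Committing swaps even when they worsen the cut, then keeping the best partition seen, is what lets KL climb out of local minima within a pass.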
Slide 18: Kernighan-Lin example
(Figure: an existing partition with edge cut 4; swapping one pair of vertices across the cut yields a better partition with edge cut 3.)
Slide 19: Spectral method
- Based on the observation that a Fiedler vector of a graph contains connectivity information
- Laplacian of a graph: L
- l_ii = d_i (degree of vertex i)
- l_ij = -1 if edge (i, j) exists, otherwise 0
- The smallest eigenvalue of L is 0, with eigenvector all 1s
- All other eigenvalues are positive for a connected graph
- Fiedler vector
- Eigenvector corresponding to the second smallest eigenvalue
Slide 20: Fiedler vector
- Consider a partitioning of V into A and B
- Let y_i = 1 if v_i ∈ A, and y_i = -1 if v_i ∈ B
- For load balance, Σ_i y_i = 0
- Also, Σ_{(i,j) ∈ E} (y_i - y_j)^2 = 4 × (number of edges across partitions)
- Also, y^T L y = Σ_i d_i y_i^2 - 2 Σ_{(i,j) ∈ E} y_i y_j = Σ_{(i,j) ∈ E} (y_i - y_j)^2
Slide 21: Optimization problem
- The optimal partition is obtained by solving
- Minimize y^T L y
- Constraints
- y_i ∈ {-1, 1}
- Σ_i y_i = 0
- This is NP-complete
- Relaxed problem
- Minimize y^T L y
- Constraints
- Σ_i y_i = 0
- Add a constraint on a norm of y, for example ||y||_2 = n^0.5
- Note
- (1, 1, ..., 1)^T is an eigenvector with eigenvalue 0
- For a connected graph, all other eigenvalues are positive, and their eigenvectors are orthogonal to this one, which implies Σ_i y_i = 0
- The objective function is minimized by a Fiedler vector
Slide 22: Spectral algorithm
- Find a Fiedler vector of the Laplacian of the graph
- Note that the Fiedler value (the second smallest eigenvalue) yields a lower bound on the communication cost when the load is balanced
- From the Fiedler vector, bisect the graph
- Let all vertices with components in the Fiedler vector greater than the median be in one part, and the rest in the other
- Recursively apply this to each partition
- Note: Finding the Fiedler vector of a large graph can be time consuming
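With a dense adjacency matrix, the whole algorithm fits in a few NumPy lines. This is a sketch for small graphs only; real codes use sparse iterative eigensolvers (e.g. Lanczos) precisely because of the cost noted above.

```python
import numpy as np

def spectral_bisect(A):
    """Spectral bisection sketch: form L = D - A, take the eigenvector of
    the second smallest eigenvalue (a Fiedler vector), split at its median."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return fiedler > np.median(fiedler)   # True on one side of the cut

# Path graph 0-1-2-3: the natural bisection is {0, 1} vs {2, 3}
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
side = spectral_bisect(A)
```

On the path graph the Fiedler vector varies monotonically along the path, so the median split cuts exactly the middle edge, which is the optimal bisection.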
Slide 23: Multilevel methods
- Idea
- It takes time to partition a large graph
- So partition a small graph instead!
- Three phases
- Graph coarsening
- Combine vertices to create a smaller graph
- Example: Find a suitable matching
- Apply this recursively until a suitably small graph is obtained
- Partitioning
- Use spectral or another partitioning algorithm to partition the small graph
- Multilevel refinement
- Uncoarsen the graph to get a partitioning of the original graph
- At each level, perform some graph refinement
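One coarsening level can be sketched via random matching. This is illustrative only: production multilevel partitioners typically use heavy-edge matching and carry vertex and edge weights through the levels, which this sketch omits.

```python
import random

def coarsen(adj):
    """One level of coarsening by random matching: match each unmatched
    vertex with an unmatched neighbor, then merge matched pairs."""
    vertices = list(adj)
    random.shuffle(vertices)              # random visit order
    match = {}
    for u in vertices:
        if u in match:
            continue
        for v in adj[u]:
            if v not in match:            # match u with an unmatched neighbor
                match[u], match[v] = v, u
                break
        else:
            match[u] = u                  # no free neighbor: u stays alone
    # relabel: one coarse id per matched pair (or lone vertex)
    cmap, next_id = {}, 0
    for u in adj:
        if u not in cmap:
            cmap[u] = cmap[match[u]] = next_id
            next_id += 1
    # coarse adjacency, dropping edges internal to a merged pair
    cadj = {i: set() for i in range(next_id)}
    for u in adj:
        for v in adj[u]:
            if cmap[u] != cmap[v]:
                cadj[cmap[u]].add(cmap[v])
    return cadj, cmap

# 6-cycle: a maximal matching merges it down to 3-4 coarse vertices
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
cadj, cmap = coarsen(cycle)
```

A partition of the coarse graph extends to the fine graph by giving each fine vertex the part of its coarse image via `cmap`, which is exactly the uncoarsening step above.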
Slides 24-28: Multilevel example (without refinement)
(Figures: a 12-vertex graph is repeatedly coarsened by merging matched vertex pairs into weighted vertices, the coarsest graph is bisected, and the partition is carried back up to the original 12-vertex graph.)
Slide 29: Dynamic partitioning
- We have an initial partitioning
- Now the graph changes
- Determine a good partition, fast
- Also minimize the number of vertices that need to be moved
- Examples
- PLUM
- JOSTLE
Slide 30: PLUM
- Partition based on the initial mesh
- Only the vertex and edge weights are changed
- Map partitions to processors
- Use more partitions than processors
- This ensures finer granularity
- Compute a similarity matrix
- It measures the savings in data redistribution cost for each (processor, partition) pair
- Choose an assignment of partitions to processors
- Example: Maximum weight matching
- Duplicate each processor (number of partitions)/P times
- Alternative: A greedy approximation algorithm
- http://citeseer.nj.nec.com/oliker98plum.html
Slide 31: JOSTLE
- Uses Hu and Blake's diffusive scheme for load balancing
- Solve Lx = b
- L = Laplacian of the processor graph, b_i = (weight on process P_i) - (average weight)
- Move max(x_i - x_j, 0) weight between P_i and P_j
- Select the vertices to move based on relative gain
- http://citeseer.nj.nec.com/walshaw97parallel.html
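The diffusive step can be sketched with a dense solve (illustrative only; `adj` is the adjacency matrix of the processor graph, and least squares is used because the Laplacian is singular):

```python
import numpy as np

def diffusion_flow(adj, loads):
    """Hu/Blake-style diffusion sketch: solve L x = b with
    b_i = load_i - mean load, then flow max(x_i - x_j, 0) units of work
    along each edge (i, j) of the processor graph."""
    A = np.asarray(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A                 # processor-graph Laplacian
    b = np.asarray(loads, dtype=float) - np.mean(loads)
    x, *_ = np.linalg.lstsq(L, b, rcond=None)      # L is singular: least squares
    n = len(loads)
    flow = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if A[i, j] and x[i] > x[j]:
                flow[i, j] = x[i] - x[j]           # ship work downhill
    return flow

# Two connected processors with loads 10 and 0: 5 units move from 0 to 1
flow = diffusion_flow([[0, 1], [1, 0]], [10, 0])
```

For a connected processor graph, b sums to zero and lies in the range of L, so the flows balance the loads exactly: applying them leaves every processor at the average load.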