Title: Dynamic Load Balancing in Scientific Simulation
Static Load Balancing: No Data Dependency
- Distribute the load evenly across processing units.
- Is this good enough? It depends! It is when:
  - there is no data dependency, and
  - the load distribution remains unchanged.
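When items are independent and their cost never changes, an even split computed once is enough. A minimal sketch, assuming n equal-cost work items and processing units numbered 0 to p-1 (the function name `block_distribute` is hypothetical):

```python
def block_distribute(n_items: int, n_units: int) -> list[range]:
    """Split item indices 0..n_items-1 into n_units contiguous blocks.

    Block sizes differ by at most one, which is as even as possible
    when every item costs the same.
    """
    base, extra = divmod(n_items, n_units)
    blocks = []
    start = 0
    for unit in range(n_units):
        # The first `extra` units each take one leftover item.
        size = base + (1 if unit < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for unit, block in enumerate(block_distribute(10, 3)):
        print(unit, list(block))
```

Calling `block_distribute(10, 3)` gives blocks of 4, 3 and 3 items. Because the load never changes, this split only needs to be computed once.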
Static Load Balancing: Data Dependency
- Distribute the load evenly across processing units (PUs).
- Minimize inter-processing-unit communication
  - by collocating the most communicating data on a single PU.
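The effect of collocation can be measured as the total weight of dependencies that cross PU boundaries. A minimal sketch, assuming dependencies are given as weighted pairs of data items and an assignment maps each item to a PU (the function name and the data are hypothetical):

```python
def inter_unit_communication(edges, owner):
    """Total weight of dependency edges whose two data items sit on different PUs.

    edges: iterable of (u, v, weight) tuples, one per data dependency.
    owner: dict mapping each data item to the PU that stores it.
    """
    return sum(w for u, v, w in edges if owner[u] != owner[v])


if __name__ == "__main__":
    # Hypothetical example: a<->b and c<->d communicate heavily, b<->c lightly.
    edges = [("a", "b", 5), ("c", "d", 5), ("b", "c", 1)]
    split_pairs = {"a": 0, "b": 1, "c": 0, "d": 1}       # 2 items per PU
    collocated_pairs = {"a": 0, "b": 0, "c": 1, "d": 1}  # 2 items per PU
    print(inter_unit_communication(edges, split_pairs))       # 11
    print(inter_unit_communication(edges, collocated_pairs))  # 1
```

Both assignments place two items on each PU, so both are balanced; collocating the heavily communicating pairs cuts the communication from 11 to 1.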
Load Balancing in Scientific Simulation
PUs need to communicate with each other to carry out the computation.
- Distribute the load evenly across processing units.
- Minimize inter-processing-unit communication
  - by collocating the most communicating data on a single PU.
- Minimize data migration among processing units.
Dynamic Load Balancing
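Migration is measured against the current assignment: every data item whose PU changes has to be shipped. A minimal sketch, assuming per-item sizes are known and both assignments cover the same items (the function name and the data are hypothetical):

```python
def migration_volume(old_owner, new_owner, size):
    """Total size of the data items whose PU changes between two assignments.

    old_owner, new_owner: dicts mapping every data item to a PU
                          (both are assumed to cover the same items).
    size: dict mapping each data item to its size, e.g. in bytes.
    """
    return sum(size[v] for v in new_owner if old_owner[v] != new_owner[v])


if __name__ == "__main__":
    # Hypothetical data: item sizes and the current assignment.
    size = {"a": 4, "b": 2, "c": 4, "d": 1}
    old = {"a": 0, "b": 0, "c": 1, "d": 1}
    move_small = {"a": 0, "b": 1, "c": 1, "d": 0}  # moves b and d
    move_large = {"a": 1, "b": 0, "c": 0, "d": 1}  # moves a and c
    print(migration_volume(old, move_small, size))  # 3
    print(migration_volume(old, move_large, size))  # 8
```

Both new assignments keep two items per PU, yet one moves 3 units of data and the other moves 8.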
Dynamic Load Balancing: (Hyper)graph Partitioning
- Given a (hyper)graph $G = (V, E)$.
- (Hyper)graph partitioning: partition $V$ into $k$ parts $P_0, P_1, \ldots, P_{k-1}$ such that the parts are
  - Disjoint: $P_0 \cup P_1 \cup \cdots \cup P_{k-1} = V$ and $P_i \cap P_j = \emptyset$ for $i \neq j$.
  - Balanced: $|P_i| \le \frac{|V|}{k}(1 + \epsilon)$ for every $i$.
- The edge-cut, i.e. the number of edges crossing different parts, is minimized.
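The conditions above can be checked directly. A minimal sketch, assuming unit vertex weights (so $|P_i|$ counts vertices, as in the balance constraint) and an ordinary graph given as a list of vertex pairs; `check_partition` and `edge_cut` are hypothetical helper names:

```python
def check_partition(vertices, parts, epsilon):
    """True if `parts` (a list of k vertex sets P_0..P_{k-1}) is a disjoint,
    balanced partition of `vertices`.

    Balance: every part holds at most (|V| / k) * (1 + epsilon) vertices.
    """
    all_vertices = set(vertices)
    k = len(parts)
    union = set().union(*parts)
    # Pairwise disjoint iff the part sizes add up to the size of their union.
    disjoint = sum(len(p) for p in parts) == len(union)
    covers = union == all_vertices
    limit = len(all_vertices) / k * (1 + epsilon)
    balanced = all(len(p) <= limit for p in parts)
    return disjoint and covers and balanced


def edge_cut(edges, parts):
    """Number of edges (u, v) whose endpoints lie in different parts."""
    part_of = {v: i for i, p in enumerate(parts) for v in p}
    return sum(1 for u, v in edges if part_of[u] != part_of[v])


if __name__ == "__main__":
    vertices = range(6)
    path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # hypothetical 6-vertex path
    contiguous = [{0, 1, 2}, {3, 4, 5}]
    interleaved = [{0, 2, 4}, {1, 3, 5}]
    print(check_partition(vertices, contiguous, 0.05), edge_cut(path, contiguous))
    print(check_partition(vertices, interleaved, 0.05), edge_cut(path, interleaved))
```

Both splits are disjoint and balanced; only the edge-cut (1 versus 5) tells them apart. With weighted vertices, the balance test would sum vertex weights instead of counting vertices.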
Dynamic Load Balancing: (Hyper)graph Repartitioning
- Given a partitioned (hyper)graph $G = (V, E)$.
- (Hyper)graph repartitioning: repartition $V$ into $k$ parts $P_0, P_1, \ldots, P_{k-1}$ such that:
  - the parts are disjoint;
  - the parts are balanced;
  - the edge-cut is minimal;
  - the migration from the existing partition is minimal.
Figure: initial partitioning.
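Repartitioning has to trade the edge-cut of the new partition against the data it moves. A minimal sketch, assuming the two are combined into one weighted sum with a hypothetical weight `alpha` (for instance, the number of iterations run before the next rebalance, since communication is paid every iteration while migration is paid once); the function name and data are illustrative:

```python
def repartition_cost(edges, old_part, new_part, size, alpha):
    """Weighted repartitioning cost: alpha * edge-cut + migration volume.

    edges: list of (u, v, weight) tuples.
    old_part, new_part: dicts mapping each vertex to its part.
    size: dict mapping each vertex to its data size.
    alpha: hypothetical weight of communication relative to migration.
    """
    cut = sum(w for u, v, w in edges if new_part[u] != new_part[v])
    migration = sum(size[v] for v in new_part if old_part[v] != new_part[v])
    return alpha * cut + migration


if __name__ == "__main__":
    # Hypothetical 4-vertex chain a-b-c-d; the old assignment is 3/1 imbalanced.
    edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
    size = {"a": 1, "b": 1, "c": 10, "d": 1}  # c carries a lot of data
    old = {"a": 0, "b": 0, "c": 0, "d": 1}
    move_c = {"a": 0, "b": 0, "c": 1, "d": 1}  # cut 1, moves 10
    move_a = {"a": 1, "b": 0, "c": 0, "d": 1}  # cut 2, moves 1
    for alpha in (1, 20):
        print(alpha,
              repartition_cost(edges, old, move_c, size, alpha),
              repartition_cost(edges, old, move_a, size, alpha))
```

Both candidates are balanced (2/2). With frequent rebalancing (alpha = 1), moving the small vertex wins, 3 versus 11; when many iterations pass between rebalances (alpha = 20), the lower edge-cut wins, 30 versus 41.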
Dynamic Load Balancing: (Hyper)graph Repartition-Based Approach
- Building the (hyper)graph:
  - Vertices represent data.
  - Vertex object size reflects the amount of data per vertex.
  - Vertex weight accounts for the computation per vertex.
  - Edges reflect data dependencies.
  - Edge weight represents the communication among vertices.
This reduces dynamic load balancing to a (hyper)graph repartitioning problem.
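The model above maps onto a small data structure. A minimal sketch, assuming each hyperedge (net) is stored as a set of pins plus a communication cost; the class and method names are hypothetical. For the cut it assumes the connectivity-minus-one metric, the sum over nets of cost × (λ − 1) where λ is the number of parts a net touches, chosen here because it is a standard way hypergraph models count communication volume:

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Hypergraph model for repartition-based load balancing (sketch).

    weight[v]: computation per vertex (vertex weight).
    size[v]:   amount of data per vertex (object size, i.e. what migration moves).
    nets:      list of (pins, cost) pairs; each net groups vertices that share
               a data dependency, and cost is its communication weight.
    """
    weight: dict
    size: dict
    nets: list = field(default_factory=list)

    def part_loads(self, part):
        """Sum of vertex weights (computation) assigned to each part."""
        loads = {}
        for v, p in part.items():
            loads[p] = loads.get(p, 0) + self.weight[v]
        return loads

    def connectivity_cut(self, part):
        """Sum over nets of cost * (lambda - 1), where lambda is the
        number of distinct parts touched by the net's pins."""
        total = 0
        for pins, cost in self.nets:
            spanned = {part[v] for v in pins}
            total += cost * (len(spanned) - 1)
        return total


if __name__ == "__main__":
    hg = Hypergraph(weight={"a": 2, "b": 1, "c": 1, "d": 2},
                    size={"a": 8, "b": 8, "c": 8, "d": 8},
                    nets=[({"a", "b", "c"}, 1), ({"c", "d"}, 2)])
    part = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(hg.part_loads(part), hg.connectivity_cut(part))
```

Unlike a plain edge-cut, a net spanning three parts costs twice its weight here, reflecting that its data must reach two remote parts.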
(Hyper)graph Repartition-Based Dynamic Load Balancing: Cost Model
(Hyper)graph Repartition-Based Dynamic Load Balancing: Network Topology
(Hyper)graph Repartition-Based Dynamic Load Balancing: Cache Hierarchy
Hierarchical Topology-Aware (Hyper)graph Repartition-Based Dynamic Load Balancing
- Inter-node repartitioning
  - Goal: group the most communicating data onto compute nodes that are close to each other.
  - Solution:
    - Regrouping.
    - Repartitioning.
    - Refinement.
- Intra-node repartitioning
  - Goal: group the most communicating data onto cores that share more levels of cache.
  - Solution 1: hierarchical repartitioning.
  - Solution 2: flat repartitioning.
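Both goals amount to weighting communication by how far apart the communicating parts are placed. A minimal sketch, assuming a symmetric part-to-part communication matrix and a hypothetical distance matrix between processing units (network hops between nodes, or unshared cache levels between cores); `topology_cost` is an illustrative name, not part of any library:

```python
def topology_cost(comm, placement, distance):
    """Communication cost weighted by how far apart communicating parts sit.

    comm[i][j]:     communication volume between parts i and j (symmetric).
    placement[i]:   processing unit (compute node or core) hosting part i.
    distance[a][b]: hypothetical topology distance between units a and b.
    """
    k = len(comm)
    total = 0
    for i in range(k):
        for j in range(i + 1, k):  # count each symmetric pair once
            total += comm[i][j] * distance[placement[i]][placement[j]]
    return total


if __name__ == "__main__":
    # Parts 0-1 and 2-3 communicate heavily, 1-2 lightly.
    comm = [[0, 5, 0, 0], [5, 0, 1, 0], [0, 1, 0, 5], [0, 0, 5, 0]]
    # Hypothetical 4 nodes on a line: distance = number of hops.
    line = [[abs(a - b) for b in range(4)] for a in range(4)]
    print(topology_cost(comm, [0, 1, 2, 3], line))
    print(topology_cost(comm, [0, 2, 1, 3], line))
```

With nodes on a line, keeping the heavy pairs on neighboring nodes costs 11, while separating them costs 21, even though both placements use every node once.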
Inter-node repartitioning example (figure): migration cost 2 (inter-node) and 2 (intra-node); communication cost 3 (inter-node).
Topology-Aware Inter-Node (Hyper)graph Repartitioning
- Inter-node (hyper)graph repartitioning:
  - Regrouping.
  - Repartitioning.
  - Refinement.
Example (figure): migration cost 2 (intra-node); communication cost 3 (inter-node).
Hierarchical Topology-Aware Intra-Node (Hyper)graph Repartitioning
- Main idea: repartition the subgraph assigned to each node hierarchically, following the cache hierarchy.
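The recursion can be sketched as follows, assuming the cache hierarchy is given as a nested list whose leaves are core ids (e.g. `[[0, 1], [2, 3]]` for two shared-cache groups of two cores) and using a simple BFS region-growing split as a stand-in for a real (hyper)graph partitioner. The topology, function names, and splitter are all hypothetical; the sketch partitions from scratch (it ignores migration) and counts each vertex as one unit of load:

```python
from collections import deque


def grow_split(vertices, adj, sizes):
    """Split `vertices` into groups of the given sizes by BFS region growing,
    a simple stand-in for a real (hyper)graph partitioner."""
    remaining = list(vertices)  # keep input order for determinism
    groups = []
    for target in sizes:
        pool = set(remaining)  # vertices not yet claimed by this group
        group, queue = [], deque()
        while len(group) < target:
            if not queue:
                # Start (or restart) from the first unclaimed vertex.
                seed = next(v for v in remaining if v in pool)
                pool.discard(seed)
                queue.append(seed)
            v = queue.popleft()
            group.append(v)
            for u in adj.get(v, ()):
                if u in pool:  # only grow into vertices of this subgraph
                    pool.discard(u)
                    queue.append(u)
        groups.append(group)
        taken = set(group)
        remaining = [v for v in remaining if v not in taken]
    return groups


def count_cores(tree):
    """Number of cores (leaves) under a hierarchy node."""
    return 1 if isinstance(tree, int) else sum(count_cores(c) for c in tree)


def hierarchical_partition(vertices, adj, tree):
    """Recursively split `vertices` following the cache-hierarchy `tree`.

    Returns a dict mapping core id -> list of vertices.
    """
    if isinstance(tree, int):
        return {tree: list(vertices)}
    shares = [count_cores(c) for c in tree]
    n, total = len(vertices), sum(shares)
    # Give each child a share proportional to the cores beneath it.
    sizes = [n * s // total for s in shares]
    sizes[-1] += n - sum(sizes)  # rounding leftovers go to the last child
    result = {}
    for child, group in zip(tree, grow_split(vertices, adj, sizes)):
        result.update(hierarchical_partition(group, adj, child))
    return result


if __name__ == "__main__":
    # Hypothetical 8-vertex path graph and a 2 x 2 cache hierarchy.
    adj = {v: [u for u in (v - 1, v + 1) if 0 <= u < 8] for v in range(8)}
    print(hierarchical_partition(list(range(8)), adj, [[0, 1], [2, 3]]))
```

Splitting first at the outer level of the hierarchy keeps neighboring vertices under the same shared cache before they are divided among individual cores.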
Flat Topology-Aware Intra-Node (Hyper)graph Repartitioning
Old partition assignment:
  Partition  P1     P2     P3
  Core       Core0  Core1  Core2
Figure: old partition and new partition.
Partition migration matrix (rows: new partitions; columns: cores):
        Core0  Core1  Core2  Core3
  P1      0      4      4      4
  P2      2      2      4      4
  P3      4      4      0      4
  P4      4      4      0      4
Partition communication matrix (rows and columns: new partitions):
        P1  P2  P3  P4
  P1     0   1   0   0
  P2     1   0   3   0
  P3     0   3   0   0
  P4     0   0   0   0
New partition assignment:
  Partition  P1     P2     P3     P4
  Core       Core0  Core1  Core2  Core3
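The migration and communication matrices suggest how a flat approach can map new partitions to cores: score each one-to-one mapping by its migration cost plus its communication weighted by how far apart the two cores sit in the cache hierarchy, and keep the cheapest. A minimal sketch, assuming that additive cost and a hypothetical four-core topology (Core0/Core1 share an L2, Core2/Core3 share another, all four share the L3); the function names are illustrative. Exhaustive search is fine for four cores but grows factorially, so a larger node would need a heuristic.

```python
from itertools import permutations

# Migration matrix from the slide, read as MIGRATION[p][c] = cost of
# placing new partition P(p+1) on Core c.
MIGRATION = [
    [0, 4, 4, 4],  # P1
    [2, 2, 4, 4],  # P2
    [4, 4, 0, 4],  # P3
    [4, 4, 0, 4],  # P4
]

# Communication matrix from the slide (symmetric).
COMM = [
    [0, 1, 0, 0],
    [1, 0, 3, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
]


def core_distance(a, b):
    """Hypothetical topology: 0 = same core, 1 = shared L2, 2 = L3 only."""
    if a == b:
        return 0
    return 1 if a // 2 == b // 2 else 2


def mapping_cost(cores):
    """Migration plus distance-weighted communication, where cores[p]
    is the core assigned to partition p."""
    n = len(cores)
    migration = sum(MIGRATION[p][cores[p]] for p in range(n))
    comm = sum(COMM[p][q] * core_distance(cores[p], cores[q])
               for p in range(n) for q in range(p + 1, n))
    return migration + comm


def best_mapping():
    """Try every one-to-one mapping of partitions to cores."""
    best, best_cost = None, float("inf")
    for cores in permutations(range(len(MIGRATION[0])), len(MIGRATION)):
        cost = mapping_cost(cores)
        if cost < best_cost:  # strict <: ties keep the earliest mapping
            best, best_cost = cores, cost
    return best, best_cost


if __name__ == "__main__":
    cores, cost = best_mapping()
    for p, c in enumerate(cores):
        print(f"P{p + 1} -> Core{c}")
    print("cost:", cost)
```

Under these assumptions the search returns P1 → Core0, P2 → Core1, P3 → Core2, P4 → Core3 at cost 13 (migration 6 plus communication 7). Several other mappings also cost 13, and the strict comparison keeps the first one enumerated, so a different distance model could change the choice.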