Title: Hypergraph-Partitioning Approaches for Workload Decomposition
1. Hypergraph-Partitioning Approaches for Workload Decomposition
Ümit V. Çatalyürek and Cevdet Aykanat
Department of Biomedical Informatics, The Ohio State University
Department of Computer Engineering, Bilkent University
2. What Do We Mean by Decomposition?
- Decomposition/partitioning of computation into smaller work units/work groups
- Workload partitioning
- Workload assignment (but not mapping)
- Divide the work and data for efficient parallel computation
3. Outline
- Partitioning-based decomposition models / Why hypergraphs?
- Standard graph model for SpMxV
- Hypergraph models for 1D decomposition for SpMxV
- Fine-grain hypergraph model for 2D decomposition
- Proposed two-phase coarse-grain decomposition for both graph and hypergraph models
- Application: SpMxV
- Coarse-grain (checkerboard) decomposition of SpMxV
- SpMxV experiment results
- Conclusion
4. What People Have Done: Existing Graph Models
- Standard graph model
- Multi-constraint graph partitioning
- Skewed partitioning
- Bipartite graph model
- Are they any good?
  - They might be sufficient for some cases, but not for all
5. Why They Are Not Sufficient
- Flaws
  - Wrong cost metric for communication volume
  - Latency: the number of messages is also important
  - Minimize the maximum volume and/or number of messages
  - Processor distance, number of switches, etc.
  - All of the above? Some of the above?
- Limitations
  - Standard graph model can express only symmetric dependencies
  - Directed graph: convert to an undirected graph, with weighting
  - Symmetric = identical partitioning of input and output data
  - Multiple computation phases (a solution: multi-constraint partitioning; we developed a multi-constraint hypergraph partitioner)
6. Hypergraph Models
- We proposed the use of hypergraphs for workload partitioning (PhD thesis, Bilkent99)
- Coarser-grain: owner computes (mapping HiPC95, LP decomposition Para95, SpMxV Irr96, TPDS99)
  - 1D partition in SpMxV
- Fine-grain: assign each operation between input and output (Irr01, PPSC01)
  - 2D fine-grain decomposition in SpMxV (scalar products $a_{ij} x_j$ contributing to $y_i$)
- Coarse-grain
  - 2D checkerboard partitioning in SpMxV
- Advantages
  - Correct cost for communication volume
  - Naturally handles asymmetry
  - Practical to use
  - Public tools
  - Good tools
  - Ensures a better upper bound on the number of messages
7. Application: Parallel Matrix-Vector Multiplication $y = Ax$
- Parallel iterative solvers
- 1D rowwise or columnwise partitioning of $A$
  - symmetric partitioning
  - processor $P_k$ computes linear vector operations on the $k$-th blocks of vectors
- rowwise: $P_k$ computes $y_k = A^r_k x$
  - entries of the $x$-vector are communicated
- columnwise: $P_k$ computes $y^k = A^c_k x_k$, where $y = \sum_k y^k$
  - entries of the $y^k$ vectors are communicated
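The two 1D schemes can be simulated sequentially to see which vector entries cross processor boundaries. The following is a minimal sketch, not the authors' code: the dict-of-nonzeros matrix layout, the `owner` array (mapping index i to the processor that holds x_i and y_i), and all function names are illustrative assumptions.

```python
# Sketch: 1D rowwise vs. columnwise parallel y = A x, simulated sequentially.
# A maps (i, j) -> a_ij; owner[i] is the processor holding x_i and y_i
# (symmetric partitioning). All names here are hypothetical.

def spmv(A, x, n):
    """Reference sequential product."""
    y = [0.0] * n
    for (i, j), a in A.items():
        y[i] += a * x[j]
    return y

def rowwise(A, x, n, owner, K):
    """P_k computes y_k = A^r_k x for its rows; remote x entries are received."""
    y = [0.0] * n
    needed = [set() for _ in range(K)]     # x indices each processor must receive
    for (i, j), a in A.items():
        k = owner[i]                       # row i is computed by P_k
        y[i] += a * x[j]
        if owner[j] != k:
            needed[k].add(j)
    return y, sum(len(s) for s in needed)  # result, words of x communicated

def columnwise(A, x, n, owner, K):
    """P_k computes the partial vector y^k = A^c_k x_k; y is the sum of the y^k."""
    partial = [[0.0] * n for _ in range(K)]
    sent = [set() for _ in range(K)]       # partial y indices each processor sends
    for (i, j), a in A.items():
        k = owner[j]                       # column j is handled by P_k
        partial[k][i] += a * x[j]
        if owner[i] != k:
            sent[k].add(i)
    y = [sum(partial[k][i] for k in range(K)) for i in range(n)]
    return y, sum(len(s) for s in sent)    # result, words of y^k communicated
```

Both routines return the same y as the sequential product; they differ only in whether x entries (rowwise) or partial y entries (columnwise) are counted as communicated.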
8. Graph Model for Representing Sparse Matrices
- edge $(v_i, v_j) \in E \Rightarrow y_i \leftarrow y_i + a_{ij} x_j$ and $y_j \leftarrow y_j + a_{ji} x_i$
- exchange of $x_i$ and $x_j$ values before local matrix-vector products
9. Graph Model Minimizes the Wrong Metric
- $\text{cost}(\Pi) = 2 \times 5 = 10$ words, but the actual communication volume is 7 words
- $P_1$ sends $x_i$ to both $P_2$ and $P_4$
- $P_2$ and $P_4$ send $\{x_j, x_k, x_l\}$ and $\{x_m, x_h\}$, respectively, to $P_1$
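The 10-versus-7 discrepancy can be reproduced with a few lines of Python. This is a sketch under stated assumptions: only the five cut edges implied by the slide (i-j, i-k, i-l, i-m, i-h) are modeled, the part numbers follow the processors named above, and the function names are hypothetical.

```python
# Sketch: edge-cut cost of the standard graph model vs. the words actually sent.
# Only the cut edges implied by the slide are listed (an assumption).
edges = [("i", "j"), ("i", "k"), ("i", "l"), ("i", "m"), ("i", "h")]
part = {"i": 1, "j": 2, "k": 2, "l": 2, "m": 4, "h": 4}

def edge_cut_cost(edges, part, w=2):
    """Graph-model cost: every cut edge is charged w = 2 words."""
    return sum(w for u, v in edges if part[u] != part[v])

def actual_volume(edges, part):
    """Each x value is sent only once to each remote processor that needs it."""
    dest = {v: set() for v in part}
    for u, v in edges:
        if part[u] != part[v]:
            dest[u].add(part[v])   # x_u goes to the processor of v
            dest[v].add(part[u])   # x_v goes to the processor of u
    return sum(len(s) for s in dest.values())
```

The graph model charges x_i once per cut edge (five times), although P1 sends it only twice; that overcount is the gap between the two numbers.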
10. Hypergraph Model for Representing Sparse Matrices for 1D Decomposition
- Each {vertex, net} pair represents a unique nonzero
- net-cut metric: $\text{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)$
- connectivity - 1 metric: $\text{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)\,(c(n_j) - 1)$
11. Hypergraph Models the Correct Metric
[Figure: the example hypergraph, with vertices $v_i, v_j, v_k, v_l, v_m, v_h$ and nets $n_i, n_j, n_k, n_l, n_m, n_h$ distributed over processors $P_1$ to $P_4$]
- connectivity values
  - $c(n_i) - 1 = 2$, $c(n_j) - 1 = c(n_k) - 1 = c(n_l) - 1 = c(n_m) - 1 = c(n_h) - 1 = 1$
- connectivity - 1 metric
  - $\text{cutsize}(\Pi) = 1 \times 2 + 5 \times 1 = 7$ words
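The 7-word figure can be recomputed from a column-net view of the same example. The pin lists below are an assumption inferred from the communication pattern (net n_i holds every row that uses x_i, and each other net holds its own row plus row i); the function names are hypothetical.

```python
# Sketch: connectivity - 1 cutsize on the slide's example.
# Pin lists are inferred from the communication pattern (an assumption).
nets = {
    "n_i": ["v_i", "v_j", "v_k", "v_l", "v_m", "v_h"],
    "n_j": ["v_j", "v_i"],
    "n_k": ["v_k", "v_i"],
    "n_l": ["v_l", "v_i"],
    "n_m": ["v_m", "v_i"],
    "n_h": ["v_h", "v_i"],
}
part = {"v_i": 1, "v_j": 2, "v_k": 2, "v_l": 2, "v_m": 4, "v_h": 4}

def connectivity(pins, part):
    """Number of distinct parts that contain at least one pin of the net."""
    return len({part[v] for v in pins})

def cutsize_connectivity_minus_one(nets, part, weight=None):
    """Sum of w(n) * (c(n) - 1) over all nets; uncut nets contribute 0."""
    total = 0
    for name, pins in nets.items():
        w = 1 if weight is None else weight[name]
        total += w * (connectivity(pins, part) - 1)
    return total
```

With these pins, net n_i touches three parts and every other net touches two, so the connectivity - 1 values are 2 and 1, matching the words each x value actually travels.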
12. Fine-Grain Hypergraph Model for 2D Decomposition
- $M \times M$ matrix $A$ with $Z$ nonzeros is represented by $H = (V, N)$
- $Z$ vertices: one vertex $v_{ij}$ for each $a_{ij} \neq 0$
- $2 \times M$ nets: one net for each row and for each column of $A$
  - $N = N_R \cup N_C$
  - row nets: $N_R = \{m_1, m_2, \dots, m_M\}$
  - column nets: $N_C = \{n_1, n_2, \dots, n_M\}$
- $v_{ij} \in m_i$ and $v_{ij} \in n_j$ iff $a_{ij} \neq 0$
- column-net $n_j$ represents the dependency of atomic tasks on $x_j$
- row-net $m_i$ represents the dependency of computing $y_i$ on the partial $y'_i$ results
13. Fine-Grain Hypergraph Model
- one vertex for each nonzero
14. Fine-Grain Hypergraph Model for 2D Decomposition
- unit net weighting: $w(n) = 1$ for each net $n \in N$
- use the connectivity - 1 metric: $\text{cutsize}(\Pi) = \sum_{n_j \in N_E} (c(n_j) - 1)$
- minimizing cutsize corresponds to minimizing the total volume of communication
- consistency of the model
  - exact correspondence between cutsize and communication volume
  - maintain symmetric partitioning: $y_i$ and $x_i$ assigned to the same processor
- consistency condition
  - $v_{ii} \in n_i$ and $v_{ii} \in m_i$ for each vertex $v_{ii}$ (holds iff $a_{ii} \neq 0$)
- consider a K-way partition $\Pi = \{V_1, V_2, \dots, V_K\}$ of $H = (V, N)$
  - $\Pi$ induces a partition on the nonzeros of matrix $A$
  - decode $v_{ii} \in V_k \Rightarrow$ assign $y_i$ and $x_i$ to processor $P_k$
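The correspondence between cutsize and communication volume can be checked on a small matrix. The sketch below is illustrative only: it assumes every diagonal entry is nonzero (the consistency condition), stores vertices as (i, j) tuples, and uses hypothetical function names.

```python
# Sketch: fine-grain model with one vertex per nonzero, one net per row and column.
# Assumes a_ii != 0 for all i, so v_ii fixes the owner of x_i and y_i.
from collections import defaultdict

def fine_grain_hypergraph(nonzeros):
    """Return (row_nets, col_nets): index -> set of vertices (i, j)."""
    row_nets, col_nets = defaultdict(set), defaultdict(set)
    for (i, j) in nonzeros:
        row_nets[i].add((i, j))    # v_ij is a pin of row net m_i
        col_nets[j].add((i, j))    # v_ij is a pin of column net n_j
    return row_nets, col_nets

def cutsize(row_nets, col_nets, part):
    """Connectivity - 1 metric with unit net weights."""
    total = 0
    for nets in (row_nets, col_nets):
        for pins in nets.values():
            total += len({part[v] for v in pins}) - 1
    return total

def communication_volume(nonzeros, part):
    """Count words directly: x_j expand plus partial y_i fold."""
    x_dest, y_src = defaultdict(set), defaultdict(set)
    for (i, j) in nonzeros:
        p = part[(i, j)]
        if p != part[(j, j)]:
            x_dest[j].add(p)       # owner of x_j sends it once to p
        if p != part[(i, i)]:
            y_src[i].add(p)        # p sends one partial y_i to the owner of y_i
    return (sum(len(s) for s in x_dest.values())
            + sum(len(s) for s in y_src.values()))
```

The two functions count the same quantity by different routes: one from net connectivities, the other from the messages themselves.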
15. Fine-Grain Hypergraph Model for 2D Decomposition
[Figure: example fine-grain partition of a sparse matrix, with each nonzero labeled by its assigned part]
- $\text{cutsize}(\Pi) = 8$
- Communication volume = 8
16. Two-Phase Coarse-Grain Decomposition
- Phase 1
  - Decompose the domain along one dimension to a group of processors
  - SpMxV: rowwise decomposition
  - graph/hypergraph partitioning: minimize communication volume during the expand phase of reduction
- Phase 2
  - Decompose the domain along the other dimension within each group
  - SpMxV: columnwise decomposition
  - multi-constraint graph/hypergraph partitioning: minimize communication volume during the gather phase of reduction
  - maintains computational balance while preserving coherence among decompositions within different processor groups
  - SpMxV: checkerboard decomposition
- Applicable to both graph and hypergraph models
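The slides do not spell out the vertex weights used in the multi-constraint phase. One natural choice, assumed here purely for illustration, gives each column one weight per row stripe from Phase 1 (the number of its nonzeros in that stripe), so that balancing every constraint balances every checkerboard block. The names `row_group`, `col_group`, P and Q are hypothetical.

```python
# Sketch: building Phase 2 constraints from a Phase 1 rowwise partition.
# Assumption: weight p of column j = nonzeros of column j in row stripe p.

def column_weight_vectors(nonzeros, row_group, P, n_cols):
    """w[j][p] = number of nonzeros of column j that lie in row stripe p."""
    w = [[0] * P for _ in range(n_cols)]
    for (i, j) in nonzeros:
        w[j][row_group[i]] += 1
    return w

def block_loads(w, col_group, P, Q):
    """loads[p][q] = nonzeros in checkerboard block (p, q)."""
    loads = [[0] * Q for _ in range(P)]
    for j, wj in enumerate(w):
        for p in range(P):
            loads[p][col_group[j]] += wj[p]
    return loads
```

A multi-constraint partitioner would search for the `col_group` that keeps every row of `loads` balanced while minimizing cutsize; here `col_group` is simply given, so the sketch only evaluates a candidate.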
17. Two-Phase in SpMxV, Phase 1: Rowwise Decomposition through HP
18. Two-Phase in SpMxV, Phase 1: Rowwise Decomposition through HP
[Figure: row stripes $R_1$ and $R_2$ over processors $P_{11}$, $P_{12}$, $P_{21}$, $P_{22}$]
19. Two-Phase in SpMxV, Phase 2: Columnwise Decomposition through Multi-constraint HP
[Figure: row stripes $R_1$ and $R_2$ over processors $P_{11}$, $P_{12}$, $P_{21}$, $P_{22}$]
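20. Two-Phase in SpMxV, Phase 2: Columnwise Decomposition through Multi-constraint HP
[Figure: checkerboard blocks of processors $P_{11}$, $P_{12}$, $P_{21}$, $P_{22}$ with weights $W_{11} = 12$, $W_{12} = 11$, $W_{21} = 12$, $W_{22} = 12$]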
21. Experimental Results: Communication Volume
22. Experimental Results
23. Experimental Results: Communication Volume
24. Experimental Results: Maximum Messages
25. Experimental Results: Partitioning Time
26. Experimental Results: Summary
27. Conclusion
- A suite of models/approaches for workload partitioning
  - 1D decomposition: coarse-grain (owner computes)
  - 2D decomposition: fine-grain
    - Doesn't restrict the place of computation to the owner of the input or output
  - 2D decomposition: coarse-grain (checkerboard)
    - Two-phase, with a better upper bound on the number of messages
    - Two-phase is applicable to both graph and hypergraph models
- Which one to use?
  - For a better-balanced workload and/or minimum communication volume: use fine-grain
  - If latency is important: use the proposed two-phase approach
28. End of Talk
29. Backup Slides
30. Graph Partitioning
- Graph $G = (V, E)$: set of vertices $V$ and set of edges $E$
- every edge $e_{ij} \in E$ connects a pair of distinct vertices $v_i$ and $v_j$
- K-way graph partition by edge separator: $\Pi = \{V_1, V_2, \dots, V_K\}$
  - $V_k$ is a nonempty subset of $V$, i.e., $V_k \subseteq V$,
  - parts are pairwise disjoint, i.e., $V_k \cap V_l = \emptyset$,
  - union of the $K$ parts is equal to $V$, i.e., $\bigcup_{k=1}^{K} V_k = V$.
- an edge $e_{ij}$ is said to be
  - cut if $v_i \in V_k$ and $v_j \in V_l$ and $k \neq l$
  - uncut if $v_i \in V_k$ and $v_j \in V_k$
- a partition is said to be balanced if
  - $W_k \le W_{avg}\,(1 + \varepsilon)$
  - $W_k$: weight of part $V_k$; $\varepsilon$: maximum imbalance ratio
- cost of a partition
  - $\text{cutsize}(\Pi) = \sum_{e_{ij} \in E_E} w(e_{ij})$
  - where $E_E$ is the set of cut edges
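The balance criterion and the edge-cut cost defined above translate directly into code. This is a minimal sketch; the dictionary-based graph layout and the function names are assumptions made for illustration.

```python
# Sketch: part weights, balance test and edge-cut cost for a K-way partition.
# vertex_weight: v -> weight; edge_weight: (u, v) -> weight; part: v -> 0..K-1.

def part_weights(vertex_weight, part, K):
    """W_k = total weight of the vertices assigned to part k."""
    W = [0] * K
    for v, w in vertex_weight.items():
        W[part[v]] += w
    return W

def is_balanced(W, eps):
    """True if W_k <= W_avg * (1 + eps) holds for every part."""
    w_avg = sum(W) / len(W)
    return all(w <= w_avg * (1 + eps) for w in W)

def graph_cutsize(edge_weight, part):
    """Sum of the weights of edges whose endpoints lie in different parts."""
    return sum(w for (u, v), w in edge_weight.items() if part[u] != part[v])
```

A partitioner minimizes `graph_cutsize` subject to `is_balanced`; with `eps = 0` every part must weigh exactly the average.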
31. Graph Partitioning
- Part weights
  - $W_1 = 16$, $W_2 = 16$
- Balance equation
  - $W_k \le W_{avg}\,(1 + \varepsilon)$
  - this is a balanced partition with $\varepsilon = 0$
- cut edges: $E_E = \{\{v_1, v_6\}, \{v_4, v_8\}, \{vv, v_7\}, \{v_5, v_7\}\}$
- $\text{cutsize}(\Pi) = \sum_{e \in E_E} w(e)$
- $\text{cutsize}(\Pi) = 7$
[Figure: example graph with vertex and edge weights, partitioned into two parts $P_1$ and $P_2$]
32. Hypergraph Partitioning
- Hypergraph $H = (V, N)$: a set of vertices $V$ and a set of nets $N$
- nets (hyperedges) connect two or more vertices
- every net $n_j \in N$ is a subset of vertices, i.e., $n_j \subseteq V$
- a graph is a special instance of a hypergraph
- K-way hypergraph partition: $\Pi = \{V_1, V_2, \dots, V_K\}$
- a net that has at least one pin in a part is said to connect that part
- connectivity set $C(n_j)$ of a net $n_j$: set of parts connected by $n_j$
- connectivity $c(n_j) = |C(n_j)|$ of a net $n_j$: number of parts connected by $n_j$
- a net $n_j$ is said to be
  - cut if $c(n_j) > 1$
  - uncut if $c(n_j) = 1$
- two cutsize definitions widely used in the VLSI community
  - net-cut metric: $\text{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)$
  - connectivity - 1 metric: $\text{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)\,(c(n_j) - 1)$
33. Hypergraph Partitioning
- cut nets: $N_E = \{n_1, n_8, n_{15}\}$
- connectivity sets
  - $C(n_1) = \{V_1, V_2\}$
  - $C(n_8) = C(n_{15}) = \{V_1, V_2, V_3\}$
- connectivity values
  - $c(n_1) = 2$, $c(n_8) = c(n_{15}) = 3$
- cutsize values assuming unit net weights
  - net-cut metric: $\text{cutsize}(\Pi) = |N_E| = 3$
  - connectivity - 1 metric: $\text{cutsize}(\Pi) = 1 + 2 + 2 = 5$
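Both cutsize definitions can be evaluated side by side. The sketch below does not reproduce the slide's figure: the pins are hypothetical, chosen only so that the three cut nets have the connectivity sets listed above, and one extra uncut net (`n2`) is included for contrast.

```python
# Sketch: net-cut and connectivity - 1 metrics for a partitioned hypergraph.
# Pins are made up so that C(n1) = {1, 2} and C(n8) = C(n15) = {1, 2, 3}.
part = {"a": 1, "b": 2, "c": 3, "d": 1}
nets = {
    "n1": ["a", "b"],
    "n8": ["a", "b", "c"],
    "n15": ["d", "b", "c"],
    "n2": ["a", "d"],          # uncut: both pins are in part 1
}

def connectivity_set(pins, part):
    """Set of parts that contain at least one pin of the net."""
    return {part[v] for v in pins}

def cut_metrics(nets, part, weight=None):
    """Return (net-cut cutsize, connectivity - 1 cutsize)."""
    net_cut = conn_minus_one = 0
    for name, pins in nets.items():
        w = 1 if weight is None else weight[name]
        c = len(connectivity_set(pins, part))
        if c > 1:                          # only cut nets contribute
            net_cut += w
            conn_minus_one += w * (c - 1)
    return net_cut, conn_minus_one
```

With unit weights the net-cut metric counts cut nets, while the connectivity - 1 metric also charges a net more for every additional part it spans.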
34. Graph Model for Representing Sparse Matrices
- standard graph model $G = (V, E)$ for matrix $A$
- vertex set: one vertex $v_i$ for each row/column $i$ of $A$
  - $v_i \in V \Leftrightarrow$ task $i$ of computing the inner product $y_i = \langle r_i, x \rangle$
- edge set $E$: $(v_i, v_j) \in E \Leftrightarrow a_{ij} \neq 0$ and $a_{ji} \neq 0$
- each edge denotes a bidirectional interaction between tasks $i$ and $j$
  - edge $(v_i, v_j) \in E \Rightarrow y_i \leftarrow y_i + a_{ij} x_j$ and $y_j \leftarrow y_j + a_{ji} x_i$
  - exchange of $x_i$ and $x_j$ values before local matrix-vector products $\Rightarrow$ communication of two words
  - edge weighting: $w(v_i, v_j) = 2$