Title: Software Pipelining By Nagaraju Pothineni 2003CSY0001
1 Software Pipelining
- A term-paper presentation on Software Pipelining
- By Nagaraju Pothineni, 2003CSY0001
2 Outline
- Definition
- Representation of the program
- Data Dependency Graph (DDG)
- Unconstrained scheduling
- Estimating Initiation Interval
- Modulo Scheduling
- Kernel Recognition
3 Definition
- Software pipelining: a compiler loop-optimization technique that restructures a loop to achieve a faster execution rate by overlapping the execution of successive iterations.
4 Data Dependency Graph
- Node: operation
- Arc: dependency
- Types of Dependencies
- True Dependencies (RAW)
- Anti Dependencies (WAR)
- Output Dependencies (WAW)
- Control Dependencies
- Dependency Vs Conflict
5 Data Dependency Graph
- Program segment
- 1. a = b + c
- 2. f = a - d
- 3. b = c + d
- 4. a = b / e
[Figure: DDG for the program segment, nodes 1 to 4, with arcs labeled RAW, WAR and WAW]
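A minimal sketch of how the RAW, WAR and WAW arcs of the program segment above can be found mechanically. Each statement is reduced to its destination and its set of source variables (the operators play no role in the dependencies). The names `stmts` and `dependencies` are illustrative and do not come from the presentation. The pairwise comparison is enough for this four-statement segment; a real compiler would also account for intervening redefinitions.

```python
# Statement number -> (destination variable, set of source variables).
stmts = {
    1: ("a", {"b", "c"}),
    2: ("f", {"a", "d"}),
    3: ("b", {"c", "d"}),
    4: ("a", {"b", "e"}),
}


def dependencies(stmts):
    """Return a sorted list of (kind, earlier, later, variable) tuples."""
    deps = []
    order = sorted(stmts)
    for pos, i in enumerate(order):
        dest_i, src_i = stmts[i]
        for j in order[pos + 1:]:
            dest_j, src_j = stmts[j]
            if dest_i in src_j:          # j reads what i wrote
                deps.append(("RAW", i, j, dest_i))
            if dest_j in src_i:          # j overwrites what i read
                deps.append(("WAR", i, j, dest_j))
            if dest_i == dest_j:         # j overwrites what i wrote
                deps.append(("WAW", i, j, dest_i))
    return sorted(deps)


for dep in dependencies(stmts):
    print(dep)
```

Each printed tuple corresponds to one arc of the DDG, from the earlier statement to the later one.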
6 Data Dependency Graph (For Loops)
- Two categories of arcs
- Loop Independent Arc
- Loop Carried Arc
- In a DFG, all arcs are Loop Independent Arcs
- Types of Loops
- Doall loop
- Doacross Loop
7 Data Dependency Graph (For Loops)
- Representation of arcs in the loops
- An arc a → b is annotated with a (diff, min) pair
- diff indicates that the dependency is between a_m and b_(m+diff), i.e. operation a of iteration m and operation b of iteration m + diff
- min indicates that if a_m is placed at time t, then b_(m+diff) can be placed no earlier than t + min
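The (diff, min) rule can be read as a check on a candidate schedule. The sketch below is only an illustration: the function name `satisfies`, the tuple layout of an arc, and the two small schedules are assumptions made for the example.

```python
def satisfies(arc, start, iterations):
    """Check one arc against a schedule.

    arc   = (a, b, diff, min_delay)
    start = {(operation, iteration): time step}
    The arc holds if b in iteration m + diff starts no earlier than
    a in iteration m plus min_delay, for every m where both exist.
    """
    a, b, diff, min_delay = arc
    return all(
        start[(b, m + diff)] >= start[(a, m)] + min_delay
        for m in range(1, iterations - diff + 1)
    )


arc = (1, 1, 2, 1)  # hypothetical arc from operation 1 to itself, annotated (2, 1)

# Six iterations of operation 1 started at times 1, 1, 2, 2, 3, 3.
staggered = {(1, m): (m - 1) // 2 + 1 for m in range(1, 7)}
# Six iterations of operation 1 all started at time 1.
all_at_once = {(1, m): 1 for m in range(1, 7)}

print(satisfies(arc, staggered, 6))
print(satisfies(arc, all_at_once, 6))
```

The staggered schedule respects the arc, while starting every iteration at the same time step violates it.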
8 Scheduling
- Operations from different iterations are scheduled together
- No need to unroll the loop
- Find the repeating pattern (the kernel), which becomes the new loop body
- Pipeline: executing iterations in parallel
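9 Greedy Scheduling
- Assume no resource constraints
- Example 1
  for (i = 1; i <= n; i++) {
    O1: a[i+2] = a[i] + 1
    O2: b[i] = a[i+2] / 2
    O3: c[i] = b[i] + 3
    O4: d[i] = c[i]
  }
- DDG: arc 1 → 1 annotated (2,1); arcs 1 → 2, 2 → 3 and 3 → 4 each annotated (0,1)
- Schedule (rows I1 to I6 are time steps, columns are iterations, entries are operation numbers)
  I1   1 1
  I2   2 2 1 1
  I3   3 3 2 2 1 1
  I4   4 4 3 3 2 2 1 1      (Kernel)
  I5       4 4 3 3 2 2 1 1
  I6           4 4 3 3 2 2 . . .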
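The greedy schedule for this example can be reproduced with a few lines of code. This is a sketch under the slide's assumption of unlimited resources: every operation is placed at the earliest time step allowed by the (diff, min) arcs of the DDG. The names `ARCS`, `OPS`, `greedy_schedule` and `row` are illustrative.

```python
# DDG of the example: (source, destination, diff, min).
ARCS = [(1, 1, 2, 1), (1, 2, 0, 1), (2, 3, 0, 1), (3, 4, 0, 1)]
OPS = [1, 2, 3, 4]  # listed so that loop-independent predecessors come first


def greedy_schedule(arcs, ops, iterations):
    """Place every operation of every iteration as early as possible."""
    start = {}
    for m in range(1, iterations + 1):
        for op in ops:
            t = 1
            for a, b, diff, min_delay in arcs:
                # The arc constrains op in iteration m only if the source
                # iteration m - diff exists.
                if b == op and m - diff >= 1:
                    t = max(t, start[(a, m - diff)] + min_delay)
            start[(op, m)] = t
    return start


def row(start, ops, t, iterations):
    """Operations issued at time t, ordered by iteration."""
    return [op for m in range(1, iterations + 1)
            for op in ops if start[(op, m)] == t]


schedule = greedy_schedule(ARCS, OPS, 10)
for t in range(1, 6):
    print(t, row(schedule, OPS, t, 10))
```

From time step 4 onward every row holds the same eight operations, shifted by two iterations per step, which is the repeating pattern marked as the kernel.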
10 Initiation Interval
- New loop body contains all the operations in the original loop
- Delay between initiation of iterations of the new loop: Initiation Interval (II), or length of the kernel
- Span of kernel: number of iterations from the original loop in the kernel
- Effective Initiation Interval (EII): average time one iteration takes to complete
- EII = II / Iteration_ct
11 Initiation Interval
- The kernel does not start and stop the way the original loop does
- Prelude: instructions before the new loop
- Postlude: instructions after the new loop
- L^k = Prelude K^m Postlude
- L: original loop (executed k times)
- K: kernel
- m = (k - n + 1) / Iteration_ct
- n: span
12 Estimating II
- Resource Constrained Lower bound
- Dependency constrained Lower bound
13 Resource Constrained LB
II ≥ 4
14 Dependency constrained LB
- Dependencies are transitive
- A path θ with sum of the diff values diff_θ and sum of the min values min_θ is equivalent to an arc annotated (diff_θ, min_θ)
15 Methods of computing II
- Enumerating cycles
- Shortest path algorithm
- Iterative shortest path
- Linear programming
16 Enumerating cycles
- Find all the cycles θ; the maximum of min_θ / diff_θ over the cycles gives II_dep
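A small sketch of cycle enumeration, suitable only for small graphs. Arcs are written as (source, destination, diff, min) with integer node numbers. The function names and the second, three-node graph are assumptions made for illustration; the first graph is the DDG of the greedy-scheduling example.

```python
from fractions import Fraction


def elementary_cycles(arcs):
    """Yield every elementary cycle once, as a list of arcs.

    Each cycle is reported from its smallest node, so it is not repeated.
    """
    out = {}
    for arc in arcs:
        out.setdefault(arc[0], []).append(arc)

    def extend(root, node, path, seen):
        for arc in out.get(node, []):
            nxt = arc[1]
            if nxt == root:
                yield path + [arc]
            elif nxt > root and nxt not in seen:
                yield from extend(root, nxt, path + [arc], seen | {nxt})

    for root in sorted(out):
        yield from extend(root, root, [], {root})


def ii_dep(arcs):
    """Maximum of (sum of min) / (sum of diff) over all cycles.

    Assumes every cycle has a positive diff sum, as in a valid loop DDG.
    """
    ratios = [Fraction(sum(a[3] for a in cyc), sum(a[2] for a in cyc))
              for cyc in elementary_cycles(arcs)]
    return max(ratios, default=Fraction(0))


example_loop = [(1, 1, 2, 1), (1, 2, 0, 1), (2, 3, 0, 1), (3, 4, 0, 1)]
hypothetical = [(1, 2, 0, 2), (2, 3, 0, 1), (3, 1, 1, 1), (2, 1, 2, 1)]
print(ii_dep(example_loop))
print(ii_dep(hypothetical))
```

For the example loop the only cycle is the self arc on operation 1, so the bound is 1/2. In the hypothetical graph the cycle 1 → 2 → 3 → 1 dominates the cycle 1 → 2 → 1.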
17 Shortest path algorithm
- Find the transitive closure of the dependency constraints of the graph
- Uses Floyd's all-pairs shortest-path algorithm
18 Transitive closure of a Graph
19 Iterative shortest path
- Assume an II and find the transitive closure
- If the assumed II is not adequate, then increment II and try again
- Transitive closure is found using path algebra
- M = [m_IJ], where m_IJ gives the number of time steps by which I and J must be separated
- An arc (diff, min) means the operations must be separated by at least min - II * diff
20 Iterative shortest path
- M^2 gives the minimum number of time steps by which operations must be separated, considering paths of length 2
- Similarly calculate M^i, where i is the maximum path length in the graph
- For M_IJ take the best (largest, i.e. most restrictive) value from M_IJ, M^2_IJ, M^3_IJ, ...
- If all M_II are non-positive, then II is adequate
21 Iterative shortest path (Example)
[Figure: two panels, "Incorrect II" and "Adequate II"]
22 Linear Programming
- For each arc a → b, write the inequality
- M_a,b ≥ min - II * diff
- Objective function: minimize II
- Solve using LP
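One way to write the program out in full, as a sketch: it assumes that M_a,b stands for the difference between the start times of b and a within the flat schedule. The symbol σ for a start time is introduced here and is not used in the presentation.

```latex
\begin{aligned}
\text{minimize}\quad   & II \\
\text{subject to}\quad & \sigma_b - \sigma_a \;\ge\; \mathit{min}_{a,b} - II \cdot \mathit{diff}_{a,b}
                         && \text{for every arc } a \to b
\end{aligned}
```

Because each diff is a constant of the graph, every constraint is linear in the unknowns σ and II, which is what allows an LP solver to be used.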
23 Modulo Scheduling
- Basic Scheduling Algorithm
- Modulo scheduling via hierarchical reduction
- Path Algebra
- Predicated Modulo scheduling
24 Modulo Scheduling
- Generate a flat schedule taking into account resource conflicts and data dependencies
- Identical flat schedules for each iteration
- Regular pipelining
- Each original iteration starts II time steps after the previous iteration
- Results in operations with the same time modulo II being scheduled together
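The folding of a flat schedule into a kernel can be sketched as follows. The flat schedule used here (four operations at times 0 to 3) and the function name `modulo_schedule` are assumptions made for the example; resource conflicts are not modeled.

```python
def modulo_schedule(flat, ii):
    """Fold a flat schedule into ii kernel rows.

    flat: {operation: time step within one iteration, starting at 0}
    Row r of the result lists (operation, stage) for every operation whose
    time is r modulo ii; the stage says how many kernel passes earlier the
    owning iteration was started.
    """
    rows = [[] for _ in range(ii)]
    for op, t in sorted(flat.items(), key=lambda item: item[1]):
        rows[t % ii].append((op, t // ii))
    return rows


flat = {"O1": 0, "O2": 1, "O3": 2, "O4": 3}  # hypothetical flat schedule
for r, ops_in_row in enumerate(modulo_schedule(flat, 2)):
    print(r, ops_in_row)
```

With II = 2, O1 and O3 share the first kernel row and O2 and O4 share the second; O3 and O4 belong to the iteration started one kernel pass earlier.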
25 Modulo Scheduling - Example
[Figure: flat schedule, and the resulting modulo schedule with II = 2]
26 Modulo scheduling via Hierarchical reduction
- DDG is modified.
- Nodes are the strongly connected components of the original DDG
- Draw an arc between two nodes if there is an edge in the original DDG between the two sets of nodes
- Each strongly connected component is scheduled using modulo scheduling
- Apply list scheduling to the modified DDG
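The graph modification step can be sketched with Kosaraju's algorithm for strongly connected components (one of several standard choices; the presentation does not name one). The five-node DDG and all function names below are hypothetical. Scheduling inside each component and the final list scheduling are not shown.

```python
def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm; returns {node: component id}."""
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)

    order, seen = [], set()

    def visit(n):  # first pass: record nodes in order of completion
        seen.add(n)
        for s in succ[n]:
            if s not in seen:
                visit(s)
        order.append(n)

    for n in nodes:
        if n not in seen:
            visit(n)

    comp = {}

    def assign(n, cid):  # second pass: walk the reversed graph
        comp[n] = cid
        for p in pred[n]:
            if p not in comp:
                assign(p, cid)

    cid = 0
    for n in reversed(order):
        if n not in comp:
            assign(n, cid)
            cid += 1
    return comp


def reduce_ddg(nodes, edges):
    """Collapse each component to one node; keep arcs between components."""
    comp = strongly_connected_components(nodes, edges)
    reduced = {(comp[a], comp[b]) for a, b in edges if comp[a] != comp[b]}
    return comp, reduced


nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 2), (3, 4), (4, 5), (5, 4)]  # hypothetical DDG
comp, reduced = reduce_ddg(nodes, edges)
print(comp)
print(sorted(reduced))
```

Operations 2 and 3 form one component and operations 4 and 5 another, so the reduced graph is a chain of three nodes that list scheduling can handle.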
27 Modulo scheduling via Hierarchical reduction - Example
28 Path Algebra
- Mathematical formulation of modulo scheduling
- Construct matrix M, where m_IJ represents the relative position of O_J from O_I
- If the chosen II is feasible, then generate the flat schedule from the matrix; else increment II and try again
- Limitation: resource constraints are considered
29 Predicated Modulo Scheduling
- Schedules loops containing predicates
- Resources for all operations in all decisions are available
- Hardware support
30 Kernel Recognition
- Unroll the loop and note dependencies
- Schedule operations as early as possible
- Find a block which is repeating
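The three steps can be sketched on a small loop DDG. Arcs are (source, destination, diff, min); the DDG used is a chain of four operations with a loop-carried self arc on the first one. The function names and the `max_len` limit are assumptions made for the example, and resources are treated as unlimited.

```python
def asap(arcs, ops, iterations):
    """Unroll the loop and start every operation as early as possible."""
    start = {}
    for m in range(1, iterations + 1):
        for op in ops:  # ops listed with loop-independent predecessors first
            t = 1
            for a, b, diff, min_delay in arcs:
                if b == op and m > diff:
                    t = max(t, start[(a, m - diff)] + min_delay)
            start[(op, m)] = t
    return start


def rows_by_time(start):
    """Group the schedule into {time: sorted [(iteration, operation), ...]}."""
    rows = {}
    for (op, m), t in start.items():
        rows.setdefault(t, []).append((m, op))
    return {t: sorted(r) for t, r in rows.items()}


def find_kernel(rows, max_len=4):
    """Return (first time, length, iteration shift) of the first block of rows
    that is repeated immediately with all iteration numbers advanced."""
    last = max(rows)
    for t in range(1, last + 1):
        for length in range(1, max_len + 1):
            if t + 2 * length - 1 > last:
                break
            first, repeat = rows.get(t, []), rows.get(t + length, [])
            if not first or not repeat:
                continue
            shift = repeat[0][0] - first[0][0]
            if shift > 0 and all(
                rows.get(u + length, []) == [(m + shift, op) for m, op in rows.get(u, [])]
                for u in range(t, t + length)
            ):
                return t, length, shift
    return None


arcs = [(1, 1, 2, 1), (1, 2, 0, 1), (2, 3, 0, 1), (3, 4, 0, 1)]
print(find_kernel(rows_by_time(asap(arcs, [1, 2, 3, 4], 12))))
```

The result names the time step where the repeating block starts, its length in rows, and how many iterations it advances per repetition. The loop must be unrolled far enough for the block to appear twice before the schedule drains.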
31 References
- Vicki H. Allan, Reese B. Jones, Randall M. Lee, Stephen J. Allan. Software Pipelining. ACM Computing Surveys, September 1995.
32 Thank You