Title: Software Pipelining in Pegasus/CASH
1. Software Pipelining in Pegasus/CASH
- Cody Hartwig
- Elie Krevat
- chartwig,ekrevat_at_cs.cmu.edu
2. Software Pipelining
- Software pipelining is a method for increasing the parallelism available to instruction scheduling
- Data dependencies limit the opportunity for parallel execution
- Software pipelining overlaps loop iterations, increasing the number of operations available to schedule between dependencies
- Many techniques exist (see the classification by Allan et al.)
  - Kernel recognition (e.g., Aiken and Nicolau)
    - Assumes the schedule for each iteration is fixed; the loop is unrolled n times
    - Pattern recognition identifies a repeating kernel
  - Modulo scheduling
    - Analyzes resource and precedence (data-dependence) constraints
    - Finds the minimum initiation interval to use when scheduling
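The minimum initiation interval (II) used by modulo schedulers is commonly computed as the larger of a resource bound and a recurrence bound. The C sketch below shows that textbook calculation; it is background on modulo scheduling in general, not part of the Pegasus/CASH approach, and the names `resource`, `recurrence`, `res_mii`, `rec_mii`, and `min_ii` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical description of one resource class, e.g. memory ports. */
struct resource {
    int uses_per_iter; /* operations per iteration that need this resource */
    int units;         /* copies of the resource usable each cycle (>= 1) */
};

/* Hypothetical description of one dependence cycle in the loop body. */
struct recurrence {
    int latency;  /* sum of operation latencies around the cycle */
    int distance; /* number of iterations the cycle spans (>= 1) */
};

/* Integer ceiling of a / b for positive operands. */
static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* ResMII = max over resources of ceil(uses / units); never below 1. */
int res_mii(const struct resource *r, size_t n) {
    int mii = 1;
    for (size_t k = 0; k < n; k++) {
        int bound = ceil_div(r[k].uses_per_iter, r[k].units);
        if (bound > mii) mii = bound;
    }
    return mii;
}

/* RecMII = max over recurrences of ceil(latency / distance); never below 1. */
int rec_mii(const struct recurrence *c, size_t n) {
    int mii = 1;
    for (size_t k = 0; k < n; k++) {
        int bound = ceil_div(c[k].latency, c[k].distance);
        if (bound > mii) mii = bound;
    }
    return mii;
}

/* Starting II for the scheduler: the larger of the two lower bounds. */
int min_ii(const struct resource *r, size_t nr,
           const struct recurrence *c, size_t nc) {
    int a = res_mii(r, nr);
    int b = rec_mii(c, nc);
    return a > b ? a : b;
}
```

For example, three memory operations per iteration on two ports give a resource bound of ceil(3/2) = 2, but a recurrence with latency 5 spanning one iteration raises the bound to 5, so that loop is limited by its dependencies rather than its resources.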
3. Software Pipelining in Pegasus/CASH
- Pegasus is an intermediate representation used by the CASH compiler
- The Pegasus graph models both control flow and data flow
- Our approach: apply optimizations to the Pegasus graph, not to the generated assembly
  - Abstracts away resource constraints
  - A feedback loop is possible after scheduling and register allocation (e.g., to implement less aggressive pipelining when registers spill)
4. How Operations Are Pipelined
- Our approach computes operation outputs for future loop iterations in the current iteration
- Operations are copied into the pre-header (which computes the value for the first iteration), and the data flow for the values before and after executing each operation is fed into the loop hyperblock
- Each loop iteration then uses the operation value already computed and computes the operation value for the next iteration
- This is analogous to preparing temporary variables for future iterations so that the loop body can be scheduled more efficiently
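The transformation can be mirrored at the source level. This C sketch assumes a loop that squares its induction variable and applies the idea to that single multiply; Pegasus rewrites the dataflow graph rather than C source, and both function names are hypothetical.

```c
/* Original loop: every iteration multiplies i by itself, then accumulates. */
long sum_squares(int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        long sq = (long)i * i;
        sum += sq;
    }
    return sum;
}

/* Same loop with the multiply pipelined by hand. */
long sum_squares_pipelined(int n) {
    int i = 0;
    long sum = 0;
    long next_sq = (long)i * i; /* pre-header: copy of the multiply, for i = 0 */
    for (; i < n; i++) {
        long sq = next_sq;                 /* value computed one iteration early */
        next_sq = (long)(i + 1) * (i + 1); /* compute the value for iteration i + 1 */
        sum += sq;
    }
    return sum;
}
```

The final iteration computes a product that is never used; that wasted, speculative work is harmless for a multiply, which is why only operations that may execute speculatively are candidates.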
5. Choosing Operations to Pipeline via Pattern Matching
- An operation may be pipelined if it matches one of several recognized patterns
- Patterns depend only on the type of the operation and the sources of its inputs
- The operation type must allow speculative execution (e.g., loads are OK, but stores are not)
- Operations on the most expensive paths to eta nodes are moved first
- The most expensive path is not necessarily the longest (e.g., a single load operation is more expensive than two add operations)
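Ranking paths by cost rather than by operation count amounts to a longest-path computation over weighted nodes. The C sketch below assumes a hypothetical cost table (load = 4, add = 1, eta = 0), chosen only so that one load outweighs two adds as the slide states, and a graph stored in topological order in which every path ends at an eta; it returns the operation heading the most expensive path. The types and names are illustrative, not Pegasus's.

```c
#define MAX_SUCC 4

enum op_kind { OP_ADD, OP_LOAD, OP_ETA };

/* Hypothetical relative costs; only "load > 2 * add" comes from the slide. */
static int op_cost(enum op_kind k) {
    switch (k) {
    case OP_LOAD: return 4;
    case OP_ADD:  return 1;
    default:      return 0; /* eta: end of the path */
    }
}

struct node {
    enum op_kind kind;
    int nsucc;
    int succ[MAX_SUCC]; /* consumer indices, all larger than this node's index */
};

/* Fills path_cost[v] with the cost of the most expensive path from v
   (including v itself) down to an eta, visiting nodes in reverse
   topological order. Returns the non-eta node with the highest path
   cost, or -1 if there is none. */
int most_expensive_start(const struct node *g, int n, int *path_cost) {
    int best = -1;
    for (int v = n - 1; v >= 0; v--) {
        int tail = 0;
        for (int k = 0; k < g[v].nsucc; k++) {
            int s = g[v].succ[k];
            if (path_cost[s] > tail) tail = path_cost[s];
        }
        path_cost[v] = op_cost(g[v].kind) + tail;
        if (g[v].kind != OP_ETA && (best < 0 || path_cost[v] > path_cost[best]))
            best = v;
    }
    return best;
}
```

On a graph where two chained adds and a single load feed the same eta, the load's path costs 4 against 2 for the adds, so the load is chosen first even though its path has fewer operations.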
6. Recognized Patterns
Figures: Arithmetic Operation, Load Operation, Cast Operation
As operations are moved, new operations come to match the recognized patterns.
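The pattern figures are not recoverable here, so the C predicate below encodes one hypothetical reading of the rules on slide 5: an operation qualifies if its kind is arithmetic, load, or cast (kinds that may run speculatively) and every input is a constant, a loop-invariant value, or a value entering through a loop-header merge (mu) node. Under this reading, once an operation is moved its result also arrives through a merge, so its consumers can match in turn. The enums, struct, and function names are illustrative assumptions, not the Pegasus/CASH code.

```c
#include <stdbool.h>

/* Hypothetical operation kinds and input sources. */
enum op_kind { OP_ARITH, OP_LOAD, OP_CAST, OP_STORE, OP_CALL };
enum input_src { SRC_CONST, SRC_INVARIANT, SRC_MU, SRC_IN_LOOP };

struct op {
    enum op_kind kind;
    int ninputs;
    enum input_src inputs[3];
};

/* Only side-effect-free kinds may be executed one iteration early. */
static bool is_speculable(enum op_kind k) {
    return k == OP_ARITH || k == OP_LOAD || k == OP_CAST;
}

/* Assumed rule: speculable kind, and no input computed inside the loop body. */
bool matches_pattern(const struct op *o) {
    if (!is_speculable(o->kind)) return false;
    for (int k = 0; k < o->ninputs; k++)
        if (o->inputs[k] == SRC_IN_LOOP) return false;
    return true;
}
```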
7. Example
    int i = 0;
    char a[100];
    while (i < 100) {
        char tmp = a[i];
        tmp = tmp + 2;
        a[i] = tmp;
        i++;
    }
The load and store are forced to execute in series.
Operations in red are available to move.
8. Step 1 and Step 2
Figures: Step 1, Step 2
Load and store are no longer dependent!
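A source-level C analogue shows why the dependence disappears once the load is pipelined. It assumes the update in the example is `tmp = tmp + 2` and that the array has one spare element, standing in for the speculative execution allowed for loads, since the last iteration loads a value it never uses; the function names are hypothetical.

```c
#define N 100

/* Loop as in the example: load, update, and store a[i] in every iteration. */
void update_original(char *a) {
    int i = 0;
    while (i < N) {
        char tmp = a[i];
        tmp = tmp + 2;
        a[i] = tmp;
        i++;
    }
}

/* Load pipelined by hand: a[i + 1] is loaded during iteration i, before the
   store to a[i]. The two touch different elements, so the order is safe.
   The last iteration loads a[N], so a must hold N + 1 elements. */
void update_pipelined(char *a) {
    int i = 0;
    char next = a[0];    /* pre-header: copy of the load, for i = 0 */
    while (i < N) {
        char tmp = next; /* value loaded during the previous iteration */
        next = a[i + 1]; /* load for the next iteration */
        tmp = tmp + 2;
        a[i] = tmp;
        i++;
    }
}
```

Within each iteration the load and the store now address different elements, so neither must wait for the other and the scheduler can overlap them.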
9. Evaluation: Moving Average
    void move_avg(int a[]) {
        int i = 1;
        while (i < 100) {
            int t1 = a[i];
            int t2 = a[i-1];
            a[i] = (t1 + t2) / 2;
            i++;
        }
    }
Cost of entire function = Cost(Pre-header) + 100 × Cost(Loop Body)
Cost before software pipelining: 2208
Cost after software pipelining: 1814
Software pipelining improves performance here by about 18% ((2208 - 1814) / 2208 ≈ 17.8%).
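One way the moving-average loop could be pipelined is sketched below in C. Only the load `t1 = a[i]` is moved, because `t2 = a[i-1]` reads the value the previous iteration just stored and cannot be hoisted above that store. The sketch assumes the array holds one extra element, since the last iteration loads a[100] without using it; the graph after pipelining may move more than this, and `move_avg_pipelined` is a hypothetical name.

```c
#define N 100

/* The loop from the evaluation slide. */
void move_avg(int a[]) {
    int i = 1;
    while (i < N) {
        int t1 = a[i];
        int t2 = a[i - 1];
        a[i] = (t1 + t2) / 2;
        i++;
    }
}

/* Load of a[i] pipelined: a[i + 1] is fetched one iteration early. It has not
   been stored yet at that point, so the early load sees the same value.
   The load of a[i - 1] stays put because it depends on the previous store. */
void move_avg_pipelined(int a[]) {
    int i = 1;
    int next_t1 = a[1];       /* pre-header: copy of the load, for i = 1 */
    while (i < N) {
        int t1 = next_t1;     /* loaded during the previous iteration */
        next_t1 = a[i + 1];   /* load for the next iteration; reads a[N] last */
        int t2 = a[i - 1];
        a[i] = (t1 + t2) / 2;
        i++;
    }
}
```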
10. Moving Average Before Software Pipelining
11. Moving Average After Software Pipelining
Pipelined graphs are considerably more complex.
12. Conclusion
- Software pipelining at the Pegasus level can achieve significant loop improvement (about 18% on the moving-average example)
- Most regular operation types are pipelinable via our iterative pattern-matching algorithm
- The cost of the improvement is increased register pressure and more complicated Pegasus graphs