Title: Recursion Unrolling for Divide and Conquer Programs
1. Recursion Unrolling for Divide and Conquer Programs
- Radu Rugina and Martin Rinard
- Presented by Cristian Petrescu-Prahova
2. Divide and Conquer
- Idea
  - Divide the problem into smaller subproblems and solve each in turn
  - Use recursion as the primary control structure
  - The base case computation terminates the recursion once a small enough size is reached
  - Combine the results to generate the solution of the original problem
- Interesting properties
  - Lots of inherent parallelism: recursion naturally generates concurrency
  - Good cache performance: a natural fit for cache hierarchies
- In practice
  - Potentially too much time is spent in the divide/combine phases
  - Increasing the size of the base case alleviates the problem
  - But the simplest and least error-prone coding style reduces the problem to a minimum size (typically one)
- Solution: recursion unrolling
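The overhead point above can be made concrete by counting procedure invocations. The following is a minimal sketch and not part of the original slides: `count_calls` and its `cutoff` parameter are hypothetical names, and the model assumes a two-way divide and conquer over n elements where n and the cutoff are powers of two.

```c
#include <assert.h>

/* Hypothetical helper: number of procedure invocations made by a two-way
   divide and conquer over n elements when the recursion stops at blocks
   of `cutoff` elements. Assumes n and cutoff are powers of two. */
long count_calls(long n, long cutoff) {
    assert(cutoff >= 1 && n >= cutoff);
    if (n <= cutoff) {
        return 1;   /* a single base-case invocation */
    }
    /* this invocation plus the two half-sized subproblems */
    return 1 + count_calls(n / 2, cutoff) + count_calls(n / 2, cutoff);
}
```

Under this model, `count_calls(1024, 1)` is 2047 while `count_calls(1024, 4)` is 511, so a four-element base case removes about three quarters of the calls.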
3. Example: Divide and Conquer Array Increment
    void dcInc(int *p, int n) {
      if (n == 1) {
        *p += 1;                   /* base case */
      } else {
        dcInc(p, n/2);             /* divide */
        dcInc(p + n/2, n/2);
      }
    }
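4. Inlining Recursive Calls
    void dcIncI(int *p, int n) {
      if (n == 1) {
        *p += 1;                              /* base case */
      } else {
        if (n/2 == 1) {
          *p += 1;                            /* base case */
        } else {
          dcIncI(p, n/2/2);                   /* divide */
          dcIncI(p + n/2/2, n/2/2);
        }
        if (n/2 == 1) {
          *(p + n/2) += 1;                    /* base case */
        } else {
          dcIncI(p + n/2, n/2/2);             /* divide */
          dcIncI(p + n/2 + n/2/2, n/2/2);
        }
      }
    }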
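5. Conditional Fusion
Before (dcIncI):
    void dcIncI(int *p, int n) {
      if (n == 1) {
        *p += 1;
      } else {
        if (n/2 == 1) {
          *p += 1;
        } else {
          dcIncI(p, n/2/2);
          dcIncI(p + n/2/2, n/2/2);
        }
        if (n/2 == 1) {
          *(p + n/2) += 1;
        } else {
          dcIncI(p + n/2, n/2/2);
          dcIncI(p + n/2 + n/2/2, n/2/2);
        }
      }
    }
After conditional fusion (dcIncF):
    void dcIncF(int *p, int n) {
      if (n == 1) {
        *p += 1;                              /* base case */
      } else {
        if (n/2 == 1) {
          *p += 1;                            /* base case */
          *(p + n/2) += 1;
        } else {
          dcIncI(p, n/2/2);                   /* divide */
          dcIncI(p + n/2/2, n/2/2);
          dcIncI(p + n/2, n/2/2);
          dcIncI(p + n/2 + n/2/2, n/2/2);
        }
      }
    }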
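Conditional fusion in isolation can be sketched as follows. This is an illustration under stated assumptions, not code from the slides: `unfused`, `fused`, and `check_fusion` are hypothetical names, and the arithmetic in the branches is arbitrary. The merge is valid here because nothing between the two tests modifies `n`, which is also the situation in the inlined array increment.

```c
#include <assert.h>

/* Two adjacent conditionals that test the same condition. */
int unfused(int n, int x) {
    if (n/2 == 1) { x += 1; } else { x -= 1; }
    if (n/2 == 1) { x *= 2; } else { x *= 3; }
    return x;
}

/* After fusion: one test, with the corresponding branches concatenated.
   Valid because neither branch of the first conditional changes n. */
int fused(int n, int x) {
    if (n/2 == 1) {
        x += 1;
        x *= 2;
    } else {
        x -= 1;
        x *= 3;
    }
    return x;
}

/* Checks that both versions agree on a small range of inputs. */
void check_fusion(void) {
    for (int n = 1; n <= 8; n++) {
        for (int x = -5; x <= 5; x++) {
            assert(unfused(n, x) == fused(n, x));
        }
    }
}
```

The fused version evaluates the condition once instead of twice and exposes a single, larger basic block per branch.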
6. Reroll Second Unrolling Iteration
    void dcInc2(int *p, int n) {
      if (n == 1) {
        *p += 1;
      } else if (n/2 == 1) {
        *p += 1;
        *(p + n/2) += 1;
      } else if (n/2/2 == 1) {
        *p += 1;
        *(p + n/2/2) += 1;
        *(p + n/2) += 1;
        *(p + n/2 + n/2/2) += 1;
      } else {
        dcIncI(p, n/2/2/2);
        dcIncI(p + n/2/2/2, n/2/2/2);
        dcIncI(p + n/2/2, n/2/2/2);
        dcIncI(p + n/2/2 + n/2/2/2, n/2/2/2);
        dcIncI(p + n/2, n/2/2/2);
        dcIncI(p + n/2 + n/2/2/2, n/2/2/2);
        dcIncI(p + n/2 + n/2/2, n/2/2/2);
        dcIncI(p + n/2 + n/2/2 + n/2/2/2, n/2/2/2);
      }
    }
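    void dcIncR(int *p, int n) {
      if (n == 1) {
        *p += 1;
      } else if (n/2 == 1) {
        *p += 1;
        *(p + n/2) += 1;
      } else if (n/2/2 == 1) {
        *p += 1;
        *(p + n/2/2) += 1;
        *(p + n/2) += 1;
        *(p + n/2 + n/2/2) += 1;
      } else {
        dcIncR(p, n/2);
        dcIncR(p + n/2, n/2);
      }
    }
Rerolling is needed to ensure that the largest unrolled base case is always executed. Without it, the recursive calls in dcInc2 shrink the problem by a factor of 8 at a time, so the recursion can step over the 4-element base case and finish in a smaller one.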
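The effect of rerolling can be observed by instrumenting the two procedures. The sketch below is an illustration, not the authors' code: the `hits` counters and `reset_hits` are hypothetical additions, the eight calls are written as a loop for brevity, and `dcInc2` is assumed to call itself in its recursion block. It assumes n is a power of two.

```c
#include <assert.h>
#include <string.h>

/* Illustration only: hits[k] counts the elements incremented by the
   unrolled base case that handles 2^k elements (k = 0, 1, 2). */
long hits[3];

void reset_hits(void) {
    memset(hits, 0, sizeof hits);
}

/* Unrolled twice, not rerolled. Assumption: the recursion block calls
   dcInc2 itself, so each call shrinks n by a factor of 8. */
void dcInc2(int *p, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);   /* n must be a power of two */
    if (n == 1) {
        *p += 1;
        hits[0] += 1;
    } else if (n/2 == 1) {
        *p += 1;
        *(p + n/2) += 1;
        hits[1] += 2;
    } else if (n/2/2 == 1) {
        *p += 1;
        *(p + n/2/2) += 1;
        *(p + n/2) += 1;
        *(p + n/2 + n/2/2) += 1;
        hits[2] += 4;
    } else {
        int e = n/2/2/2;                    /* size of each of the 8 subproblems */
        for (int k = 0; k < 8; k++) {
            dcInc2(p + k * e, e);
        }
    }
}

/* Rerolled: same base cases, but the recursion block halves n again. */
void dcIncR(int *p, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) {
        *p += 1;
        hits[0] += 1;
    } else if (n/2 == 1) {
        *p += 1;
        *(p + n/2) += 1;
        hits[1] += 2;
    } else if (n/2/2 == 1) {
        *p += 1;
        *(p + n/2/2) += 1;
        *(p + n/2) += 1;
        *(p + n/2 + n/2/2) += 1;
        hits[2] += 4;
    } else {
        dcIncR(p, n/2);
        dcIncR(p + n/2, n/2);
    }
}
```

For n = 16, `dcInc2` splits into eight subproblems of size 2, so all 16 elements are handled by the 2-element base case (`hits[1] == 16`). `dcIncR` halves down to size 4, so all 16 elements go through the largest base case (`hits[2] == 16`).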
7. Algorithm
    Algorithm RecursionUnrolling(Proc f, Int m)
      f_{unroll,0} = clone(f)
      for (i = 1; i <= m; i++)
        f_{unroll,i} = RecursionInline(f_{unroll,i-1}, f)
        f_{unroll,i} = ConditionalFusion(f_{unroll,i})
      f_{reroll,m} = RecursionRerolling(f_{unroll,m}, f)
      return f_{reroll,m}
8. Implementation details
- Recursion unrolling
  - Standard procedure inlining
  - Increases the code size exponentially; must be used with care
- Conditional fusion
  - Bottom-up traversal of the HTG with conditional matching
- Recursion rerolling
  - Replaces the unrolled procedure's recursion block with the rolled procedure's recursion block if the unrolled procedure's conditional sequence implies the rolled procedure's conditional sequence
- Simple transformations!
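The exponential growth mentioned above follows from a simple count. This is a simplified model, not taken from the slides: `call_sites_after_unrolling` is a hypothetical helper, and it assumes that every unrolling iteration replaces each recursive call site with a copy of the original body, which itself contains k call sites, before any rerolling takes place.

```c
#include <assert.h>

/* Hypothetical helper: number of recursive call sites in the recursion
   block after m unrolling iterations, for a procedure whose original
   body contains k recursive calls (count taken before rerolling). */
long call_sites_after_unrolling(long k, int m) {
    assert(k >= 1 && m >= 0);
    long sites = k;                 /* m = 0: the original procedure */
    for (int i = 1; i <= m; i++) {
        sites *= k;                 /* each call site expands into k sites */
    }
    return sites;
}
```

For the array increment example (k = 2) this gives 4 call sites after one iteration and 8 after two, matching dcIncI and dcInc2. For a procedure with 8 recursive calls, two iterations already produce 512 call sites.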
9. Experiments
- Programs
  - Mul: divide and conquer matrix multiplication
    - 1 recursive procedure with 8 recursive calls
    - Base case size: 1 element
  - LU: divide and conquer LU decomposition
    - 4 mutually recursive procedures; the main procedure has 8 recursive calls
    - Base case size: 1 element
- Implementation
  - C-to-C transformations in SUIF
- Comparison
  - Hand-coded divide and conquer programs from the Cilk benchmark set (designed for thread parallelization)
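To give a sense of the shape of the Mul benchmark described above, here is a minimal sketch of a divide and conquer matrix multiplication with one recursive procedure, 8 recursive calls, and a one-element base case. It is not the benchmark's actual code: the name `dcMul`, the row-major layout, and the `ld` row-stride parameter are assumptions made for illustration, and n is assumed to be a power of two.

```c
#include <assert.h>

/* Computes C += A * B for n x n blocks stored row-major, where `ld` is
   the row stride of the enclosing matrices. Assumes n is a power of two. */
void dcMul(const double *a, const double *b, double *c, int n, int ld) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) {
        *c += *a * *b;                              /* base case: 1 element */
    } else {
        int h = n / 2;
        /* quadrants: 11 top-left, 12 top-right, 21 bottom-left, 22 bottom-right */
        const double *a11 = a, *a12 = a + h;
        const double *a21 = a + h * ld, *a22 = a + h * ld + h;
        const double *b11 = b, *b12 = b + h;
        const double *b21 = b + h * ld, *b22 = b + h * ld + h;
        double *c11 = c, *c12 = c + h;
        double *c21 = c + h * ld, *c22 = c + h * ld + h;

        dcMul(a11, b11, c11, h, ld);                /* 8 recursive calls */
        dcMul(a12, b21, c11, h, ld);
        dcMul(a11, b12, c12, h, ld);
        dcMul(a12, b22, c12, h, ld);
        dcMul(a21, b11, c21, h, ld);
        dcMul(a22, b21, c21, h, ld);
        dcMul(a21, b12, c22, h, ld);
        dcMul(a22, b22, c22, h, ld);
    }
}
```

With a one-element base case, every scalar multiply-add costs a full procedure call, which is the overhead that recursion unrolling targets.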
10. Results
11. Conclusion
- Recursion unrolling is similar to loop unrolling
  - Basic recursion unrolling reduces procedure call overhead
- Extra optimizations
  - Conditional fusion simplifies the control flow
  - Recursion rerolling ensures the biggest unrolled base case is always executed
- The performance of the optimized programs is close to that of hand-coded programs