Title: C++ Design Techniques for High Performance
1C++ Design Techniques for High Performance
- Todd Veldhuizen
- Presented by Ganesh Bikshandi
2C++ Goals
- Library development
- Data Abstraction
- vector<int>, Stack, Queue, etc.
- Object-oriented Programming
- Virtual functions, inheritance
- Safety
- Strict type conformance
3Templates
- Parameterized types
- Unbounded set of 'related' types
- List<int>, List<double>
- Class Templates
- Function Templates
4Example
template <class T>
class Array { ... };

template <>
class Array<double> { ... };   // specialization for double

int main()
{
    Array<int> a;
    Array<double> b;
}
5Example
template <class T, int rank>
class Array { ... };

template <>
class Array<double, 2> { ... };   // specialization for double, rank 2

int main()
{
    Array<int, 3> a;
    Array<double, 2> b;
}
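As a minimal sketch of how a class template with a type parameter and an integer parameter can be filled in, the following uses a hypothetical `FixedArray<T, N>`. Note that `N` here is an element count, not the `rank` of the slide's `Array`; the names `FixedArray`, `total` and `example_total` are illustrative assumptions and do not come from the talk.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size array: T is the element type, N the element count.
template <class T, std::size_t N>
class FixedArray {
public:
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }
    static constexpr std::size_t size() { return N; }
private:
    T data_[N] = {};   // value-initialized storage
};

// Function template: one definition serves every FixedArray<T, N>.
template <class T, std::size_t N>
T total(const FixedArray<T, N>& a)
{
    T s = T();
    for (std::size_t i = 0; i < N; ++i) s += a[i];
    return s;
}

// Usage: FixedArray<int, 3> and FixedArray<double, 4> are distinct, unrelated types.
inline int example_total()
{
    FixedArray<int, 3> a;
    a[0] = 1; a[1] = 2; a[2] = 3;
    assert(a.size() == 3);
    return total(a);
}
```

Each distinct argument list produces a separate class, which is what makes the set of related types unbounded.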
6Virtual Functions
struct base {
    virtual void vf1();
};

class derived : public base {
public:
    void vf1();
};

void g(base* bp) { bp->vf1(); }

int main()
{
    derived d;
    base b;
    g(&d);   // calls derived::vf1
    g(&b);   // calls base::vf1
}
7Virtual Function mechanics (single inheritance)
derived object: ..... | vptr | .....   vptr -> vtable of derived: [ derived::vf1 ]
base object:    ..... | vptr | .....   vptr -> vtable of base:    [ base::vf1 ]
8Virtual Functions
struct base {
    virtual void vf1();
};

class derived : public base {
public:
    void vf1();
};

void g(base* bp)
{
    bp->vf1();   // compiled as: (*(bp->vptr[0]))(bp);
}

int main()
{
    derived d;
    base b;
    g(&d);   // calls derived::vf1
    g(&b);   // calls base::vf1
}
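The rewritten call `(*(bp->vptr[0]))(bp)` can be imitated by hand with a table of function pointers. The sketch below is only an emulation under assumed names (`Object`, `VTable`, `VFunc`, `base_vf1`, `derived_vf1`); real compilers choose their own layout, and the functions return an integer here purely so the dispatch can be observed.

```cpp
#include <cassert>

// Hand-written emulation of the vptr/vtable layout; all names are hypothetical.
struct Object;
typedef int (*VFunc)(Object*);

struct VTable { VFunc slots[1]; };       // slot 0 plays the role of vf1
struct Object { const VTable* vptr; };   // every object carries one vptr

int base_vf1(Object*)    { return 1; }   // stands in for base::vf1
int derived_vf1(Object*) { return 2; }   // stands in for derived::vf1

// One table per "class".
const VTable base_vtable    = { { base_vf1 } };
const VTable derived_vtable = { { derived_vf1 } };

// What bp->vf1() boils down to: load the vptr, load the slot, call indirectly.
int g(Object* bp) { return (*(bp->vptr->slots[0]))(bp); }

inline void example_dispatch()
{
    Object b = { &base_vtable };
    Object d = { &derived_vtable };
    assert(g(&b) == 1);   // reaches base_vf1
    assert(g(&d) == 2);   // reaches derived_vf1
}
```

The two dependent loads and the indirect call are the extra memory references that a direct call would not need.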
9Space Cost
- Space
- One vptr per object
- One vtable per class (usually).
- Growth factor: (sizeof(object) + 1) / sizeof(object), counting the added vptr as one word
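The per-object cost can be observed with `sizeof`. This is a rough sketch with two hypothetical structs, `PlainPoint` and `VirtualPoint`; the exact sizes are implementation-defined, although on common 64-bit ABIs one would typically expect 16 and 24 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Same data members; only the second type has virtual functions.
struct PlainPoint {
    double x, y;
    double norm2() const { return x * x + y * y; }
};

struct VirtualPoint {
    double x, y;
    virtual double norm2() const { return x * x + y * y; }
    virtual ~VirtualPoint() {}
};

// Growth factor measured in bytes rather than words.
inline double growth_factor()
{
    return static_cast<double>(sizeof(VirtualPoint)) /
           static_cast<double>(sizeof(PlainPoint));
}

inline void example_space_cost()
{
    // Holds on mainstream ABIs, which store a vptr in each polymorphic object.
    assert(sizeof(VirtualPoint) > sizeof(PlainPoint));
}
```

The smaller the object, the larger the growth factor, which is why the cost matters most for tiny value-like classes.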
10Time Cost
- Direct cost
- Extra memory references/call
- Indirect cost
- Restricts inlining
- Restricts loop invariant removal
- Restricts some more optimizations
- Pipeline Stalls due to branch misprediction
11Purpose
class Matrix {                            // abstract class
public:
    virtual double operator()(int i, int j) = 0;
};

class SymmetricMatrix : public Matrix {   // concrete class
public:
    double operator()(int i, int j) { ... }
};

class UpperTriMatrix : public Matrix {    // concrete class
public:
    double operator()(int i, int j) { ... }
};

double sum(Matrix& A) { ... }

SymmetricMatrix A;
sum(A);
12Problem
- sizeof(f) is small and frequency(f) is high: the virtual function body is tiny, but it is called very often
- This is not a rare occurrence
for (int i = 0; i < 100000; i++)
    for (int j = 0; j < 100000; j++)
        sum += a(i, j);
13Static polymorphism
template <class T_leaftype>
class Matrix {
public:
    double operator()(int i, int j) { return leaf(i, j); }
private:
    T_leaftype leaf;
};

class SymmetricMatrix {
public:
    double operator()(int i, int j) { ... }
};

class UpperTriMatrix {
public:
    double operator()(int i, int j) { ... }
};

template <class T_leaftype>
double sum(Matrix<T_leaftype>& a) { ... }

Matrix<SymmetricMatrix> A;
sum(A);
14Static polymorphism
template <class T_leaftype>
class Matrix {
public:
    T_leaftype& asLeaf() { return static_cast<T_leaftype&>(*this); }
    double operator()(int i, int j) { return asLeaf()(i, j); }   // delegate to leaf
};

class SymmetricMatrix : public Matrix<SymmetricMatrix> { ... };
class UpperTriMatrix : public Matrix<UpperTriMatrix> { ... };

SymmetricMatrix A;
sum(A);
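A compilable version of this pattern might look as follows. It is a sketch under stated assumptions: the leaf classes are fixed at 3x3, the packed storage and fill values in `SymmetricMatrix` are invented for illustration, and `sum` takes the dimension `n` as an extra argument that the slide does not show.

```cpp
#include <cassert>

template <class T_leaftype>
class Matrix {
public:
    T_leaftype& asLeaf() { return static_cast<T_leaftype&>(*this); }
    // Resolved at compile time: no vptr, and the leaf call can be inlined.
    double operator()(int i, int j) { return asLeaf()(i, j); }
};

// Hypothetical 3x3 symmetric matrix storing only the lower triangle.
class SymmetricMatrix : public Matrix<SymmetricMatrix> {
public:
    SymmetricMatrix() { for (int k = 0; k < 6; ++k) packed_[k] = k + 1; }
    double operator()(int i, int j)
    {
        if (i < j) { int t = i; i = j; j = t; }   // mirror the upper triangle
        return packed_[i * (i + 1) / 2 + j];
    }
private:
    double packed_[6];
};

// Hypothetical upper-triangular matrix of ones.
class UpperTriMatrix : public Matrix<UpperTriMatrix> {
public:
    double operator()(int i, int j) { return (i <= j) ? 1.0 : 0.0; }
};

// Accepts any leaf through its Matrix<...> base, like sum(Matrix&) did.
template <class T_leaftype>
double sum(Matrix<T_leaftype>& a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            s += a(i, j);
    return s;
}

inline void example_static_polymorphism()
{
    SymmetricMatrix A;
    UpperTriMatrix U;
    assert(sum(A, 3) == 32.0);   // diagonal 1+3+6, off-diagonal 2*(2+4+5)
    assert(sum(U, 3) == 6.0);    // six entries on or above the diagonal
}
```

The trade-off is that `sum` is instantiated once per leaf type, so the flexibility of choosing the matrix type at run time is given up in exchange for inlinable calls.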
15Usage in HTALib
class HTA<L> : public HTA<L-1> {
    operator()(int i, int j) { return wrapped_(i, j); }
private:
    HTAImpl<L> wrapped_;
};

class HTAImpl<L>   // HTA
class HTAImpl<0>   // Leaf
16Operator Overloading
- Enables clean syntax
- Array a = c + d;
- String s = a + b;
- Known restrictions
- Only existing C++ operators can be overloaded
- An operator keeps its arity: a binary operator, written as a member, takes only one argument
17Cost of operator overloading
Z = A + B + C;
.....
tmp1 = A.clone();
for (int i = 0; i < size; i++)
    tmp1.data_[i] = A.data_[i] + B.data_[i];
.....
tmp2 = tmp1.clone();
for (int i = 0; i < size; i++)
    tmp2.data_[i] = tmp1.data_[i] + C.data_[i];
....
for (int i = 0; i < size; i++)
    Z.data_[i] = tmp2.data_[i];
18Cost of operator overloading
- new overhead: each clone() allocates a temporary
- loop overhead: three loops instead of one
- 2N/M more memory traffic
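The temporaries can be counted directly. The sketch below assumes a hypothetical `Vec` class with a naive pairwise `operator+` and a global counter `g_allocations`; none of these names come from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int g_allocations = 0;   // counts buffers created through the sizing constructor

class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) { ++g_allocations; }
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

// Pairwise evaluation: every + allocates its own result and runs its own loop.
Vec operator+(const Vec& a, const Vec& b)
{
    Vec tmp(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) tmp[i] = a[i] + b[i];
    return tmp;
}

// Returns how many result buffers Z = A + B + C creates.
int temporaries_for_three_operands()
{
    Vec A(4, 1.0), B(4, 2.0), C(4, 3.0);
    const int before = g_allocations;
    Vec Z = A + B + C;          // (A + B) first, then (... + C)
    assert(Z[0] == 6.0);
    return g_allocations - before;
}
```

Two buffers and two loops are needed where a hand-written loop would use none and one, and each extra loop streams the whole array through memory again.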
19Stencil Computations
B = A[I, J] + A[I-1, J] + A[I+1, J] + A[I, J-1] + A[I, J+1]
  + A[I-1, J-1] + A[I+1, J-1] + A[I-1, J+1] + A[I+1, J+1]
(factor of 16 slowdown)
20Expression Templates
- Idea
- Delay evaluation of the expression
- Construct a parse tree of the expression
- Evaluate it on assignment (use)
21Expression Templates
Array A, B, C, D;
D = A + B + C;
D = X<X<Array, plus, Array>, plus, Array>();
22Expression Templates
struct plus { };    // Represents addition
class Array { };    // some array class

template <class Left, class Op, class Right>
class X { };

template <class Left>
X<Left, plus, Array> operator+(Left A, Array B)
{
    return X<Left, plus, Array>();
}
23Expression Templates
Array A, B, C, D;
D = A + B + C;
  = X<Array, plus, Array>() + C;
  = X<X<Array, plus, Array>, plus, Array>();
Parse tree: the root plus node has the subtree (A plus B) on its left and C on its right.
24Expression Templates
struct Array {
    ....
    template <class Left, class Op, class Right>
    void operator=(X<Left, Op, Right> expression)
    {
        for (int i = 0; i < N_; i++)
            data_[i] = expression[i];
    }
    double operator[](int i) { return data_[i]; }
    ....
};
25Expression Templates
template <class Left, class Op, class Right>
struct X {
    Left leftNode_;
    Right rightNode_;
    X(Left t1, Right t2) : leftNode_(t1), rightNode_(t2) { }
    double operator[](int i)
    { return Op::apply(leftNode_[i], rightNode_[i]); }
};

struct plus {
    static double apply(double a, double b) { return a + b; }
};
26Expression Templates
for (int i = 0; i < D.N_; i++)
    D.data_[i] = A.data_[i] + B.data_[i] + C.data_[i];
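The pieces from the preceding slides can be assembled into one small program. This is a sketch rather than the talk's own code: it assumes a `std::vector`-backed `Array`, renames the tag to `plus_op` to avoid confusion with `std::plus`, and stores operands by reference inside `X`, which is safe here only because the parse tree is consumed within the same full expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct plus_op {   // represents addition
    static double apply(double a, double b) { return a + b; }
};

// Parse-tree node: keeps references to its operands, computes element i on demand.
template <class Left, class Op, class Right>
struct X {
    const Left& left_;
    const Right& right_;
    X(const Left& l, const Right& r) : left_(l), right_(r) {}
    double operator[](std::size_t i) const { return Op::apply(left_[i], right_[i]); }
};

class Array {
public:
    explicit Array(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }

    // Assigning a parse tree runs the single fused loop.
    template <class Left, class Op, class Right>
    Array& operator=(const X<Left, Op, Right>& expr)
    {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = expr[i];
        return *this;
    }
private:
    std::vector<double> data_;
};

// Array + Array builds a leaf-level node; no arithmetic happens yet.
inline X<Array, plus_op, Array> operator+(const Array& a, const Array& b)
{ return X<Array, plus_op, Array>(a, b); }

// (expression) + Array wraps the existing tree in a new node.
template <class L, class O, class R>
X<X<L, O, R>, plus_op, Array> operator+(const X<L, O, R>& a, const Array& b)
{ return X<X<L, O, R>, plus_op, Array>(a, b); }

inline double example_expression_template()
{
    Array A(3, 1.0), B(3, 2.0), C(3, 3.0), D(3);
    D = A + B + C;          // one loop, no temporary arrays
    assert(D[2] == 6.0);
    return D[0];
}
```

Once `operator[]` and `apply` are inlined, the assignment loop has the same shape as the hand-written loop on the slide.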
27Template Meta programs
- Compile time programs
- More sophisticated than macros
- Compile time specialization of algorithms
- Partial evaluation of programs
- $P(S, V) \rightarrow P_S(V)$
- Turing complete
e.g. $\sum_j f_j \, e^{2\pi i k j}$
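Partial evaluation can be illustrated with a function much simpler than the transform above. In this sketch (hypothetical names `power` and `Power`), the general program takes both the exponent and the base at run time, while the template version fixes the exponent as the static input and leaves only the base as the varying input.

```cpp
#include <cassert>

// General program P(S, V): exponent n and base x are both run-time values.
inline double power(int n, double x)
{
    double r = 1.0;
    for (int i = 0; i < n; ++i) r *= x;
    return r;
}

// Specialized program P_S(V): the exponent N is fixed at compile time,
// so the loop is replaced by a chain of N multiplications.
template <int N>
struct Power {
    static double apply(double x) { return x * Power<N - 1>::apply(x); }
};

template <>
struct Power<0> {   // ends the recursion
    static double apply(double) { return 1.0; }
};

inline void example_partial_evaluation()
{
    assert(Power<3>::apply(2.0) == 8.0);
    assert(power(3, 2.0) == Power<3>::apply(2.0));
}
```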
28Template Meta Programming
#include <iostream>

template <int N>
struct fact {
    static const int val = N * fact<N-1>::val;
};

template <>
struct fact<1> {
    static const int val = 1;
};

int main(int argc, char* argv[])
{
    std::cout << fact<5>::val << std::endl;
}
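That the value really is computed by the compiler can be checked with `static_assert`, or by using it where a constant expression is required. The sketch below repeats the metaprogram with a base case at 0 instead of 1, which is an assumption made here so that `fact<0>` is also defined; `table_size` is a hypothetical helper.

```cpp
#include <cassert>

template <int N>
struct fact {
    static const int val = N * fact<N - 1>::val;
};

template <>
struct fact<0> {   // base case ends the recursion
    static const int val = 1;
};

// Checked during compilation; a wrong value would be a compile error.
static_assert(fact<5>::val == 120, "5! is evaluated by the compiler");

inline int table_size()
{
    int table[fact<3>::val] = {};   // an array bound must be a compile-time constant
    assert(fact<4>::val == 24);
    return static_cast<int>(sizeof(table) / sizeof(table[0]));
}
```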
29Template Meta Programs
- Traditional loop optimizations
template <typename T, int DIM>
class VectorOps {
public:
    static inline T dotProduct(const T* x, const T* y)
    {
        return x[DIM-1] * y[DIM-1]
             + VectorOps<T, DIM-1>::dotProduct(x, y);
    }
};
// A specialization for the smallest DIM (not shown) ends the recursion.

VectorOps<int, DIM>::dotProduct(x, y);
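A complete version needs a base case to stop the recursion. The sketch below adds one as a partial specialization for `DIM == 0`, which is an assumption since the slide does not show how the recursion ends; `example_dot` is a hypothetical helper.

```cpp
#include <cassert>

template <typename T, int DIM>
struct VectorOps {
    // Expands to x[DIM-1]*y[DIM-1] + ... + x[0]*y[0] with no loop counter.
    static inline T dotProduct(const T* x, const T* y)
    {
        return x[DIM - 1] * y[DIM - 1]
             + VectorOps<T, DIM - 1>::dotProduct(x, y);
    }
};

// Base case: an empty dot product is zero.
template <typename T>
struct VectorOps<T, 0> {
    static inline T dotProduct(const T*, const T*) { return T(); }
};

inline int example_dot()
{
    const int x[3] = {1, 2, 3};
    const int y[3] = {4, 5, 6};
    const int d = VectorOps<int, 3>::dotProduct(x, y);   // 3*6 + 2*5 + 1*4
    assert(d == 32);
    return d;
}
```

This amounts to complete loop unrolling driven by the template parameter, so it pays off mainly for small, fixed dimensions.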
30Template Meta Programs
- Avoiding costly if-else switches
template <int L, int M>
HTA<(L > M) ? L : M> operator+(HTA<L> lhs, HTA<M> rhs)
{
    return add_<M, (L > M)>::compute(lhs, rhs);
}

template <int L, bool flag>
struct add_ { .... };

template <int L>
struct add_<L, false> { ... };
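The same dispatch idea can be shown without the HTA classes. In this simplified sketch, `Level<L>` is a hypothetical stand-in for a level-L array, and `add_` is reduced to a single boolean parameter; the `tag` member exists only so that the selected branch can be observed.

```cpp
#include <cassert>

// Hypothetical stand-in for an array with L levels of hierarchy.
template <int L>
struct Level { int tag; };

// The branch is selected by a compile-time boolean, not by a run-time if.
template <bool lhs_is_deeper>
struct add_;

template <>
struct add_<true> {    // L > M: the result has the level of the left operand
    template <int L, int M>
    static Level<L> compute(Level<L>, Level<M>) { Level<L> r = {1}; return r; }
};

template <>
struct add_<false> {   // L <= M: the result has the level of the right operand
    template <int L, int M>
    static Level<M> compute(Level<L>, Level<M>) { Level<M> r = {2}; return r; }
};

template <int L, int M>
Level<((L > M) ? L : M)> operator+(Level<L> lhs, Level<M> rhs)
{
    return add_<(L > M)>::compute(lhs, rhs);
}

inline void example_dispatch_by_flag()
{
    Level<3> a = {0};
    Level<1> b = {0};
    assert((a + b).tag == 1);   // add_<true> was chosen
    assert((b + a).tag == 2);   // add_<false> was chosen
}
```

Only the selected branch is instantiated, so each branch may return a different type, which an ordinary if-else inside one function could not do.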
31Other Patterns
- Productivity
- Traits
- Type promotion
- Packages
- PETE (Portable Expression Template Engine)
- Automated generation of expression templates (ETs)
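Traits and type promotion can be sketched together: a traits class maps a pair of operand types to the type their sum should have. The names `promote` and `add` below are hypothetical, and only the int/double pairs are covered.

```cpp
#include <cassert>
#include <type_traits>

// Promotion trait. The primary template is left undefined, so an
// unsupported pair of types fails at compile time.
template <class A, class B> struct promote;
template <class A> struct promote<A, A>  { typedef A type; };
template <> struct promote<int, double>  { typedef double type; };
template <> struct promote<double, int>  { typedef double type; };

// The return type is looked up from the trait instead of being hard-coded.
template <class A, class B>
typename promote<A, B>::type add(A a, B b)
{
    typedef typename promote<A, B>::type R;
    return static_cast<R>(a) + static_cast<R>(b);
}

static_assert(std::is_same<promote<int, double>::type, double>::value,
              "int combined with double promotes to double");

inline void example_promotion()
{
    assert(add(1, 2.5) == 3.5);   // computed as double
    assert(add(2, 3) == 5);       // stays int
}
```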
32Conclusion
- Large-scale libraries incur numerous overheads from C++ constructs: virtual functions, operator overloading.
- The template techniques are accidental discoveries (side effects of the template mechanism), but they are effective: they close the C++/FORTRAN performance gap
- Drawbacks
- Complex design, code growth.
33Thanks
- Todd Veldhuizen
- (osl.iu.edu/tveldhui)