C++ Design Techniques for High Performance

1
C++ Design Techniques for High Performance
  • Todd Veldhuizen
  • Presented by Ganesh Bikshandi

2
C++ Goals
  • Library development
  • Data Abstraction
  • vector<int>, Stack, Queue, etc.
  • Object-oriented Programming
  • Virtual functions, inheritance
  • Safety
  • Strict type conformance

3
Templates
  • Parameterized types
  • Unbounded set of 'related' types
  • List<int>, List<double>
  • Class Templates
  • Function Templates

4
Example
template <class T> class Array { ... };
template <> class Array<double> { ... };   // specialization for T = double
int main() {
    Array<int> a;
    Array<double> b;
}
5
Example
template <class T, int rank> class Array { ... };
template <> class Array<double, 2> { ... };   // specialization for a rank-2 array of double
int main() {
    Array<int, 3> a;
    Array<double, 2> b;
}
6
Virtual Functions
struct base { virtual void vf1(); };
class derived : public base { public: void vf1(); };
void g(base* bp) { bp->vf1(); }
int main() {
    derived d;
    base b;
    g(&d);   // calls derived::vf1
    g(&b);   // calls base::vf1
}
7
Virtual Function mechanics (single inheritance)
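[Diagram: each object stores a vptr among its data members (..... vptr .....).
The vptr of a derived object leads to derived::vf1; the vptr of a base object leads to base::vf1.]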
8
Virtual Functions
struct base { virtual void vf1(); };
class derived : public base { public: void vf1(); };
void g(base* bp) {
    bp->vf1();   // compiled roughly as (*(bp->vptr[0]))(bp)
}
int main() {
    derived d;
    base b;
    g(&d);   // calls derived::vf1
    g(&b);   // calls base::vf1
}
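
The dispatch described on this slide can be observed directly. The following is a minimal sketch, assuming the slide's names (base, derived, vf1, g); returning a string from vf1 and the check_dispatch helper are additions made here so the result can be checked, not part of the original slide.

```cpp
#include <cassert>
#include <string>

struct base {
    virtual ~base() = default;
    // Returns a label so the caller can see which override ran (assumption).
    virtual std::string vf1() const { return "base::vf1"; }
};

struct derived : public base {
    std::string vf1() const override { return "derived::vf1"; }
};

// g only knows the static type base*, so the target of the call is
// looked up at run time through the object's vptr.
std::string g(const base* bp) { return bp->vf1(); }

// Hypothetical helper: the same call site reaches two different functions.
inline void check_dispatch() {
    derived d;
    base b;
    assert(g(&d) == "derived::vf1");
    assert(g(&b) == "base::vf1");
}
```

Because the target is unknown at compile time, the compiler generally cannot inline the call inside g unless it can prove the dynamic type.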
9
Space Cost
  • Space
  • One vptr per object
  • One vtable per class (usually).
  • Growth factor: (sizeof(object) + 1) / sizeof(object), where the 1 stands for the added vptr
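
The space cost can be checked with sizeof. This is a small sketch under the assumption of a mainstream ABI that stores one vptr per polymorphic object; PlainValue, VirtualValue and growth_factor are hypothetical names introduced here.

```cpp
// Same data member, with and without a virtual function.
struct PlainValue {
    double x;
};

struct VirtualValue {
    double x;
    virtual ~VirtualValue() {}   // forces a vptr into every object
};

// Ratio of the polymorphic object's size to the plain object's size.
constexpr double growth_factor() {
    return static_cast<double>(sizeof(VirtualValue)) /
           static_cast<double>(sizeof(PlainValue));
}

// Holds on common implementations; the C++ standard does not mandate it.
static_assert(growth_factor() > 1.0, "a vptr is expected to enlarge the object");
```

The smaller the object, the larger the relative growth, which is why the cost matters most for tiny value-like classes.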

10
Time Cost
  • Direct cost
  • Extra memory references/call
  • Indirect cost
  • Restricts inlining
  • Restricts loop invariant removal
  • Restricts other optimizations as well
  • Pipeline stalls due to branch misprediction

11
Purpose
class Matrix {                              // abstract class
public:
    virtual double operator()(int i, int j) = 0;
};
class SymmetricMatrix : public Matrix {     // concrete class
    double operator()(int i, int j) { ... }
};
class UpperTriMatrix : public Matrix {      // concrete class
    double operator()(int i, int j) { ... }
};
double sum(Matrix& A) { ... }
SymmetricMatrix A;
sum(A);
12
Problem
  • sizeof(f) is small and frequency(f) is high: the virtual function f does little work but is called very often
  • This is not a rare occurrence

for (int i = 0; i < 100000; i++)
    for (int j = 0; j < 100000; j++)
        sum += a(i, j);
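
A runnable version of the abstract Matrix design makes the problem concrete. This is only a sketch: the size() member, the toy element rule a(i, j) = i + j and the check_sum helper are assumptions added so the example is self-contained.

```cpp
#include <cassert>

class Matrix {                       // abstract class
public:
    virtual ~Matrix() = default;
    virtual double operator()(int i, int j) const = 0;
    virtual int size() const = 0;    // hypothetical: number of rows and columns
};

class SymmetricMatrix : public Matrix {
public:
    explicit SymmetricMatrix(int n) : n_(n) {}
    // Toy symmetric element rule (assumption): a(i, j) = i + j.
    double operator()(int i, int j) const override { return i + j; }
    int size() const override { return n_; }
private:
    int n_;
};

// Works for any Matrix, but pays one virtual call per element.
double sum(const Matrix& a) {
    double total = 0.0;
    const int n = a.size();
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            total += a(i, j);        // tiny body, very high call frequency
    return total;
}

inline void check_sum() {
    SymmetricMatrix a(3);
    assert(sum(a) == 18.0);          // sum of (i + j) over a 3 x 3 grid
}
```

The element access does almost no work, so the indirect call and the lost inlining dominate the inner loop.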
13
Static polymorphism
template <class T_leaftype> class Matrix {
public:
    double operator()(int i, int j) { return leaf(i, j); }
private:
    T_leaftype leaf;
};
class SymmetricMatrix {
public:
    double operator()(int i, int j) { ... }
};
class UpperTriMatrix {
public:
    double operator()(int i, int j) { .... }
};
template <class T_leaftype> double sum(Matrix<T_leaftype>& a) { ... }
Matrix<SymmetricMatrix> A;
sum(A);
14
Static polymorphism
template <class T_leaftype> class Matrix {
public:
    T_leaftype& asLeaf() { return static_cast<T_leaftype&>(*this); }
    double operator()(int i, int j) {
        return asLeaf()(i, j);   // delegate to leaf
    }
};
class SymmetricMatrix : public Matrix<SymmetricMatrix> { ... };
class UpperTriMatrix : public Matrix<UpperTriMatrix> { ... };
SymmetricMatrix A;
sum(A);
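
The pattern on this slide (the base class parameterized by its own leaf type) can be fleshed out as follows. This is a sketch, not the presenter's code: the const qualifiers, the size() members, the toy element rules and check_static_sum are assumptions added to make it compile and run.

```cpp
#include <cassert>

template <class T_leaftype>
class Matrix {
public:
    const T_leaftype& asLeaf() const {
        return static_cast<const T_leaftype&>(*this);
    }
    // Both calls are resolved at compile time, so they can be inlined.
    double operator()(int i, int j) const { return asLeaf()(i, j); }
    int size() const { return asLeaf().size(); }
};

class SymmetricMatrix : public Matrix<SymmetricMatrix> {
public:
    explicit SymmetricMatrix(int n) : n_(n) {}
    double operator()(int i, int j) const { return i + j; }   // toy rule
    int size() const { return n_; }
private:
    int n_;
};

class UpperTriMatrix : public Matrix<UpperTriMatrix> {
public:
    explicit UpperTriMatrix(int n) : n_(n) {}
    double operator()(int i, int j) const { return j >= i ? 1.0 : 0.0; }  // toy rule
    int size() const { return n_; }
private:
    int n_;
};

// One function template, instantiated separately for each leaf type.
template <class T_leaftype>
double sum(const Matrix<T_leaftype>& a) {
    double total = 0.0;
    const int n = a.size();
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            total += a(i, j);
    return total;
}

inline void check_static_sum() {
    assert(sum(SymmetricMatrix(3)) == 18.0);
    assert(sum(UpperTriMatrix(3)) == 6.0);
}
```

No vptr or vtable is involved; the trade-off is that sum is compiled once per leaf type, which is the code growth mentioned in the conclusion.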
15
Usage in HTALib
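template <int L> class HTA : public HTA<L-1> {
    // return type not shown on the slide
    operator()(int i, int j) { return wrapped_(i, j); }
private:
    HTAImpl<L> wrapped_;
};
template <int L> class HTAImpl { ... };   // HTA
template <> class HTAImpl<0> { ... };     // leaf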
16
Operator Overloading
  • Enables clean syntax
  • Array a = c + d
  • String s = a + b
  • Known restrictions
  • Only valid C++ operators
  • operator[] can take only one argument

17
Cost of operator overloading
Z = A + B + C;
.....
tmp1 = A.clone();
for (int i = 0; i < size; i++)
    tmp1.data_[i] = A.data_[i] + B.data_[i];
.....
tmp2 = tmp1.clone();
for (int i = 0; i < size; i++)
    tmp2.data_[i] = tmp1.data_[i] + C.data_[i];
....
for (int i = 0; i < size; i++)
    Z.data_[i] = tmp2.data_[i];
18
Cost of operator overloading
Z = A + B + C expands into the same temporaries and loops as on the previous slide, annotated with three costs:
  • new overhead: each clone() allocates a temporary (tmp1, tmp2)
  • loop overhead: three loops run where one would do
  • More memory traffic: a factor of 2N/M, for N operators and M distinct array operands
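
The hidden temporaries can be made visible by counting array constructions. The sketch below assumes a hypothetical NaiveArray with a conventional pairwise operator+; the global counter exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter: incremented each time fresh array storage is created.
static int g_arrays_created = 0;

struct NaiveArray {
    std::vector<double> data_;
    explicit NaiveArray(std::size_t n, double v = 0.0) : data_(n, v) {
        ++g_arrays_created;
    }
};

// Pairwise evaluation: every + allocates a result array and runs its own loop.
NaiveArray operator+(const NaiveArray& a, const NaiveArray& b) {
    NaiveArray tmp(a.data_.size());
    for (std::size_t i = 0; i < tmp.data_.size(); ++i)
        tmp.data_[i] = a.data_[i] + b.data_[i];
    return tmp;
}

// Returns how many arrays were created while evaluating a + b + c.
int temporaries_for_three_term_sum() {
    NaiveArray a(4, 1.0), b(4, 2.0), c(4, 3.0);
    g_arrays_created = 0;
    NaiveArray z = a + b + c;
    assert(z.data_[0] == 6.0);
    return g_arrays_created;
}
```

Two arrays are created for two operators, and each one is written once and read once more than a single fused loop would require.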
19
Stencil Computations
B = A[I, J] + A[I-1, J] + A[I+1, J] + A[I, J-1] + A[I, J+1]
  + A[I-1, J-1] + A[I+1, J-1] + A[I-1, J+1] + A[I+1, J+1]
(factor of 16 slowdown: 2N/M with N = 8 operators and M = 1 array operand)
20
Expression Templates
  • Idea
  • Delay the evaluation of the expression
  • Construct a parse tree of the expression
  • Evaluate it on assignment (use)

21
Expression Templates
Array A, B, C, D;
D = A + B + C;
D = X<X<Array, plus, Array>, plus, Array>();
22
Expression Templates
struct plus { };     // represents addition
class Array { };     // some array class
template <class Left, class Op, class Right> class X { };
template <class Left>
X<Left, plus, Array> operator+(Left A, Array B) {
    return X<Left, plus, Array>();
}
23
Expression Templates
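Array A, B, C, D;
D = A + B + C;
  = X<Array, plus, Array>() + C;
  = X<X<Array, plus, Array>, plus, Array>();
[Parse tree: the root plus node has the subtree plus(A, B) on the left and C on the right.]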
24
Expression Templates
struct Array {
    ....
    template <class Left, class Op, class Right>
    void operator=(X<Left, Op, Right> expression) {
        for (int i = 0; i < N_; i++)
            data_[i] = expression[i];   // evaluate the whole tree per element
    }
    double operator[](int i) { return data_[i]; }
    ....
};
25
Expression Templates
template <class Left, class Op, class Right>
struct X {
    Left leftNode_;
    Right rightNode_;
    X(Left t1, Right t2) : leftNode_(t1), rightNode_(t2) { }
    double operator[](int i) {
        return Op::apply(leftNode_[i], rightNode_[i]);
    }
};
struct plus {
    static double apply(double a, double b) { return a + b; }
};

26
Expression Templates
for (int i = 0; i < D.N_; i++)
    D.data_[i] = A.data_[i] + B.data_[i] + C.data_[i];
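
Putting the preceding slides together gives a small working expression-template sketch. Several details are assumptions made here rather than taken from the slides: the operation is named plus_op to avoid confusion with std::plus, nodes hold references instead of copies, storage is a std::vector, and check_fused_sum is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct plus_op {   // represents addition
    static double apply(double a, double b) { return a + b; }
};

// A parse-tree node: holds references to its operands, computes nothing yet.
template <class Left, class Op, class Right>
struct X {
    const Left& leftNode_;
    const Right& rightNode_;
    X(const Left& l, const Right& r) : leftNode_(l), rightNode_(r) {}
    double operator[](std::size_t i) const {
        return Op::apply(leftNode_[i], rightNode_[i]);
    }
};

struct Array {
    std::vector<double> data_;
    explicit Array(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }

    // Evaluation happens here: one loop, no temporary arrays.
    template <class Left, class Op, class Right>
    Array& operator=(const X<Left, Op, Right>& expression) {
        for (std::size_t i = 0; i < data_.size(); ++i)
            data_[i] = expression[i];
        return *this;
    }
};

// Array + Array builds a leaf-level node.
inline X<Array, plus_op, Array> operator+(const Array& a, const Array& b) {
    return X<Array, plus_op, Array>(a, b);
}

// (expression) + Array grows the tree by one level.
template <class L, class Op, class R>
X<X<L, Op, R>, plus_op, Array> operator+(const X<L, Op, R>& a, const Array& b) {
    return X<X<L, Op, R>, plus_op, Array>(a, b);
}

inline void check_fused_sum() {
    Array A(3, 1.0), B(3, 2.0), C(3, 3.0), D(3);
    D = A + B + C;   // type of the right side: X<X<Array, plus_op, Array>, plus_op, Array>
    assert(D[0] == 6.0 && D[2] == 6.0);
}
```

Because the nodes hold references, the expression should be consumed within the same statement; storing it in a variable for later use would leave references to destroyed temporaries.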
27
Template Meta programs
  • Compile time programs
  • More sophisticated than macros
  • Compile time specialization of algorithms
  • Partial evaluation of programs
  • $P(S, V) \rightarrow P_S(V)$
  • Turing Complete

e.g. $\sum f_j e^{2\pi i k j}$
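
Partial evaluation can be illustrated with a small power function: the exponent plays the role of the static input S and the base that of the run-time input V. Pow and cube are hypothetical names chosen for this sketch.

```cpp
#include <cassert>

// Pow<N>::apply(x) computes x^N. N is known at compile time (S),
// x only at run time (V).
template <int N>
struct Pow {
    static double apply(double x) { return x * Pow<N - 1>::apply(x); }
};

// Base case ends the compile-time recursion.
template <>
struct Pow<0> {
    static double apply(double) { return 1.0; }
};

// The residual program P_S(V) for S = 3; an optimizing compiler can
// reduce it to a few multiplications with no loop and no recursion.
inline double cube(double x) { return Pow<3>::apply(x); }

inline void check_pow() {
    assert(cube(2.0) == 8.0);
}
```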
28
Template Meta Programming
#include <iostream>
template <int N>
struct fact {
    static const int val = N * fact<N-1>::val;
};
template <>
struct fact<1> {
    static const int val = 1;
};
int main(int argc, char* argv[]) {
    std::cout << fact<5>::val << std::endl;   // 120, computed at compile time
}

29
Template Meta Programs
  • Traditional loop optimizations

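template <typename T, int DIM> class VectorOps {
public:
    static inline int dotProduct(const T x, const T y) {
        return x[DIM-1] * y[DIM-1] + VectorOps<T, DIM-1>::dotProduct(x, y);
    }
};
// A terminating specialization is also needed; the slide does not show it.
VectorOps<int[DIM], DIM>::dotProduct(x, y);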
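
A complete version of the unrolled dot product needs a base case to stop the recursion. In this sketch T is the element type and the operands are passed as pointers, which is a simplification chosen here and differs from the slide's parameterization; the DIM = 0 specialization and check_dot are likewise assumptions.

```cpp
#include <cassert>

template <typename T, int DIM>
struct VectorOps {
    // Each instantiation contributes one multiply-add, then recurses on DIM - 1.
    static T dotProduct(const T* x, const T* y) {
        return x[DIM - 1] * y[DIM - 1] + VectorOps<T, DIM - 1>::dotProduct(x, y);
    }
};

// Base case: an empty product contributes zero and ends the recursion.
template <typename T>
struct VectorOps<T, 0> {
    static T dotProduct(const T*, const T*) { return T(); }
};

inline void check_dot() {
    int x[3] = {1, 2, 3};
    int y[3] = {4, 5, 6};
    assert((VectorOps<int, 3>::dotProduct(x, y)) == 32);   // 4 + 10 + 18
}
```

After inlining, the call is equivalent to a fully unrolled expression, so the loop counter and branch disappear.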
30
Template Meta Programs
  • Avoiding costly if-else switches

template <int L, int M>
HTA<(L > M) ? L : M> operator+(HTA<L> lhs, HTA<M> rhs) {
    return add_<M, (L > M)>::compute(lhs, rhs);
}
template <int L, bool flag> struct add_ { .... };
template <int L> struct add_<L, false> { ... };
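
The same idea, selecting a branch at compile time through a bool template argument, can be shown without the HTA library. Level is a hypothetical stand-in for HTA<L>, and the three-parameter add_ below is a simplified variant written for this sketch, not the HTALib signature.

```cpp
#include <cassert>

// Hypothetical stand-in for HTA<L>: only carries its level in the type.
template <int L>
struct Level {};

// Primary template: chosen when the flag is true (left level is larger).
template <int L, int M, bool LeftIsDeeper>
struct add_ {
    static int compute(Level<L>, Level<M>) { return L; }
};

// Partial specialization: chosen when the flag is false.
template <int L, int M>
struct add_<L, M, false> {
    static int compute(Level<L>, Level<M>) { return M; }
};

// The comparison L > M is evaluated by the compiler, so no if-else
// remains in the generated code.
template <int L, int M>
int result_level(Level<L> lhs, Level<M> rhs) {
    return add_<L, M, (L > M)>::compute(lhs, rhs);
}

inline void check_levels() {
    assert(result_level(Level<3>(), Level<1>()) == 3);
    assert(result_level(Level<1>(), Level<2>()) == 2);
}
```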
31
Other Patterns
  • Productivity
  • Traits
  • Type promotion
  • Packages
  • PETE (Portable Expression Template Engine)
  • Automated generation of expression templates (ETs)
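
Traits-based type promotion can be sketched as a compile-time table from operand types to a result type. The promote trait and the add function are hypothetical names, and only a few type pairs are listed for illustration.

```cpp
#include <type_traits>

// Primary template left undefined: unsupported pairs fail to compile.
template <class A, class B>
struct promote;

// Same type on both sides: no promotion needed.
template <class T>
struct promote<T, T> { typedef T type; };

// Mixed pairs promote to the wider type.
template <> struct promote<int, double> { typedef double type; };
template <> struct promote<double, int> { typedef double type; };
template <> struct promote<float, double> { typedef double type; };
template <> struct promote<double, float> { typedef double type; };

// The return type is looked up in the trait at compile time.
template <class A, class B>
constexpr typename promote<A, B>::type add(A a, B b) {
    return a + b;
}

static_assert(std::is_same<promote<int, double>::type, double>::value,
              "int combined with double promotes to double");
static_assert(add(1, 2.5) == 3.5, "mixed-type add uses the promoted type");
```

A library can use such a trait to pick the element type of the result of Array<int> + Array<double> without any run-time check.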

32
Conclusion
  • Large-scale libraries incur numerous overheads
    from C++ constructs
  • Virtual functions, operator overloading.
  • These template techniques were discovered by accident,
    as side effects of templates, but they are effective
  • They close the C++/Fortran performance gap
  • Drawbacks
  • complex design, code growth.

33
Thanks
  • Todd Veldhuizen
  • (osl.iu.edu/tveldhui)