More Code Optimization - PowerPoint PPT Presentation

Title: Introduction to Computer Systems
Author: Binyu Zang
Last modified by: Yi Li
Created Date: 1/15/2000 7:54:11 AM
Document presentation format: PowerPoint PPT presentation

1
More Code Optimization
2
Outline
  • Memory Performance
  • Tuning Performance
  • Suggested reading
  • 5.12 5.14

3
Load Performance
  • The load unit can initiate only one load operation
    every clock cycle (Issue = 1.0)

    typedef struct ELE {
        struct ELE *next;
        int data;
    } list_ele, *list_ptr;

    int list_len(list_ptr ls)
    {
        int len = 0;
        while (ls) {
            len++;
            ls = ls->next;
        }
        return len;
    }

    # len in %eax, ls in %rdi
    .L11:
        addl  $1, %eax
        movq  (%rdi), %rdi
        testq %rdi, %rdi
        jne   .L11
4
Store Performance
  • The store unit can initiate only one store operation
    every clock cycle (Issue = 1.0)

    void array_clear(int *dest, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dest[i] = 0;
    }
5
Store Performance
  • The store unit can initiate only one store operation
    every clock cycle (Issue = 1.0)

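    void array_clear_4(int *dest, int n)
    {
        int i;
        int limit = n - 3;
        for (i = 0; i < limit; i += 4) {
            dest[i] = 0;
            dest[i+1] = 0;
            dest[i+2] = 0;
            dest[i+3] = 0;
        }
        /* Clear any remaining elements one at a time */
        for (; i < n; i++)
            dest[i] = 0;
    }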
6
Store Performance
    void write_read(int *src, int *dest, int n)
    {
        int cnt = n;
        int val = 0;
        while (cnt--) {
            *dest = val;
            val = (*src) + 1;
        }
    }
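Example A: write_read(&a[0], &a[1], 3)
Example B: write_read(&a[0], &a[0], 3)
(Figure: for each example, a table tracing the values of cnt, a, and val across the loop iterations.)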
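The difference between the two calls can be checked directly. The sketch below repeats write_read from the slide and adds a hypothetical helper, run_example, that runs either call on a fresh two-element array. The initial contents {-10, 17} are an assumption made for illustration; the slide's table values did not survive extraction.

```c
/* write_read as on the slide: store val to *dest, then load *src and add 1. */
void write_read(int *src, int *dest, int n)
{
    int cnt = n;
    int val = 0;

    while (cnt--) {
        *dest = val;
        val = (*src) + 1;
    }
}

/* Hypothetical helper (not from the slides): run Example A or B on a fresh
 * array with assumed initial contents {-10, 17} and report the final array. */
void run_example(int same_address, int result[2])
{
    int a[2] = { -10, 17 };

    if (same_address)
        write_read(&a[0], &a[0], 3);   /* Example B: each load reads the value just stored */
    else
        write_read(&a[0], &a[1], 3);   /* Example A: loads and stores use different addresses */

    result[0] = a[0];
    result[1] = a[1];
}
```

In Example A the load never depends on the preceding store, so successive iterations can overlap. In Example B every load must wait for the store to the same address, which lengthens the critical path.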
7
Load and Store Units
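(Figure: the store unit holds pending stores in a store buffer of address/data entries. The load unit compares each load address against the buffered store addresses and, on a matching address, takes the data from the store buffer. Both units connect to the data cache through address and data paths.)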
8
Graphical Representation
Registers: %eax, %ebx, %ecx, %edx

    Instruction           Operations
    movl %eax,(%ecx)      s_addr, s_data
    movl (%ebx),%eax      load
    addl $1,%eax          add
    subl $1,%edx          sub
    jne loop              jne

    // inner loop
    while (cnt--) {
        *dest = val;
        val = (*src) + 1;
    }
9
Graphical Representation
(Figure: data-flow graph of the inner loop, built from the operations s_addr, s_data, load, add, sub, and jg over registers %eax, %ebx, %ecx, and %edx.)
10
Graphical Representation (book)
(Figure: data-flow graphs for Example A and Example B, built from s_data, load, sub, and add operations, with the critical path marked.)
11
Graphical Representation
(Figure: data-flow graphs for Example A and Example B, built from s_data, load, sub, and add operations, with the critical path marked.)
12
Getting High Performance
  • High-level design
  • Choose appropriate algorithms and data structures
    for the problem at hand
  • Be especially vigilant to avoid algorithms or
    coding techniques that yield asymptotically poor
    performance

13
Getting High Performance
  • Basic coding principles
  • Avoid optimization blockers so that a compiler
    can generate efficient code.
  • Eliminate excessive function calls
  • Move computations out of loops when possible
  • Consider selective compromises of program
    modularity to gain greater efficiency
  • Eliminate unnecessary memory references.
  • Introduce temporary variables to hold
    intermediate results
  • Store a result in an array or global variable
    only when the final value has been computed.
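The last two points can be illustrated with a small sketch. Both function names below are hypothetical and not taken from the slides. The first version keeps the running sum in memory, so every iteration reads and writes *dest; the second accumulates in a local variable and stores only the final value.

```c
#include <stddef.h>

/* Running sum kept in memory: each iteration loads and stores *dest. */
void sum_to_memory(const int *data, size_t n, int *dest)
{
    size_t i;

    *dest = 0;
    for (i = 0; i < n; i++)
        *dest = *dest + data[i];
}

/* Running sum kept in a temporary: one store, after the loop. */
void sum_with_temp(const int *data, size_t n, int *dest)
{
    size_t i;
    int acc = 0;

    for (i = 0; i < n; i++)
        acc += data[i];
    *dest = acc;
}
```

A compiler generally cannot make this change on its own, because dest might point into data (memory aliasing), in which case the two versions would behave differently.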

14
Getting High Performance
  • Low-level optimizations
  • Unroll loops to reduce overhead and to enable
    further optimizations
  • Find ways to increase instruction-level
    parallelism by techniques such as multiple
    accumulators and reassociation
  • Rewrite conditional operations in a functional
    style to enable compilation via conditional data
    transfers
  • Write cache friendly code
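As a rough illustration of two of these techniques, the sketch below shows 2x loop unrolling with two accumulators, and a conditional written in a functional style. The function names are hypothetical, and whether the compiler actually emits conditional moves for the second function depends on the compiler and its flags.

```c
#include <stddef.h>

/* 2x unrolling with two independent accumulators, so the two
 * additions in each iteration do not depend on each other. */
long sum_two_accumulators(const long *data, size_t n)
{
    long acc0 = 0, acc1 = 0;
    size_t i;
    size_t limit = n - (n % 2);   /* largest even count <= n */

    for (i = 0; i < limit; i += 2) {
        acc0 += data[i];
        acc1 += data[i + 1];
    }
    for (; i < n; i++)            /* leftover element, if n is odd */
        acc0 += data[i];
    return acc0 + acc1;
}

/* Functional style: compute both results with conditional expressions
 * instead of branching around a swap. Afterwards a[i] <= b[i] for all i. */
void minmax(int *a, int *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int lo = a[i] < b[i] ? a[i] : b[i];
        int hi = a[i] < b[i] ? b[i] : a[i];
        a[i] = lo;
        b[i] = hi;
    }
}
```

For integer addition the two-accumulator result is identical to a plain loop; for floating-point data, reordering the additions can change rounding.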

15
Performance Tuning
16
Performance Tuning
  • Identify
  • Which part of the program is the hottest
  • Use profiling, a very useful method
  • Instrument the program
  • Run it with typical input data
  • Collect information from the result
  • Analyze the result

17
Program Example
  • Task
  • Analyzing the n-gram statistics of a text
    document
  • An n-gram is a sequence of n words occurring in a
    document
  • The program reads a text file,
  • creates a table of unique n-grams
  • specifying how many times each one occurs,
  • and sorts the n-grams in descending order of
    occurrence

18
Program Example
  • Steps
  • Convert strings to lowercase
  • Apply hash function
  • Read n-grams and insert into hash table
  • Mostly list operations
  • Maintain counter for each unique n-gram
  • Sort results
  • Data Set
  • Collected works of Shakespeare
  • 965,028 total words, 23,706 unique
  • N = 2, called bigrams
  • 363,039 unique bigrams
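The "hash table with a counter per unique n-gram" step might look roughly like the sketch below. All names (ngram_ele, count_ngram, NBUCKETS) are hypothetical, the table size is an arbitrary assumption, and the simple additive hash is only a stand-in since the slides do not show the program's hash function.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1021   /* assumed table size, not from the slides */

typedef struct ngram_ele {
    char *text;               /* the n-gram, e.g. "to be" */
    int count;                /* number of occurrences seen so far */
    struct ngram_ele *next;   /* next element in the same bucket */
} ngram_ele;

static ngram_ele *table[NBUCKETS];

/* Stand-in hash: sum of character codes modulo the table size. */
static unsigned hash_string(const char *s)
{
    unsigned h = 0;

    while (*s)
        h += (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Record one occurrence of n-gram s.
 * Returns its updated count, or -1 if memory allocation fails. */
int count_ngram(const char *s)
{
    unsigned h = hash_string(s);
    ngram_ele *e;

    /* List operation: scan the bucket for an existing entry. */
    for (e = table[h]; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return ++e->count;

    /* Not found: create a new entry at the front of the bucket list. */
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->text = malloc(strlen(s) + 1);
    if (e->text == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->text, s);
    e->count = 1;
    e->next = table[h];
    table[h] = e;
    return 1;
}
```

With few buckets or a weak hash, the bucket lists grow long and the scan dominates, which is the kind of cost a profile makes visible.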

19
Examples Timing
    unix> gcc -O1 -pg prog.c -o prog
    unix> ./prog file.txt
    unix> gprof prog

      %   cumulative   self               self     total
     time   seconds   seconds     calls  s/call   s/call  name
    97.58    173.05    173.05         1  173.05   173.05  sort_words
     2.36    177.24      4.19    965027    0.00     0.00  find_ele_rec
     0.12    177.46      0.22  12511031    0.00     0.00  Strlen
20
Principle
  • Interval counting
  • Maintain a counter for each function
  • The counter records the time spent executing that function
  • The program is interrupted at regular intervals (1 ms)
  • Check which function is executing when the interrupt
    occurs
  • Increment the counter for this function
  • The calling information is quite reliable
  • By default, the timings for library functions are
    not shown
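The counting scheme can be sketched as a small simulation. This is not how gprof is implemented internally; it only mirrors the bookkeeping described above. The names tally_samples and estimated_ms are hypothetical, and each sample is assumed to be the index of the function that was running when a timer interrupt fired.

```c
#include <stddef.h>

/* Assumed sampling interval, matching the 1 ms figure on the slide. */
#define SAMPLE_INTERVAL_MS 1

/* samples[i] is the index of the function that was executing at the
 * i-th timer interrupt. Increment that function's counter; samples
 * outside the range 0..nfuncs-1 are ignored. */
void tally_samples(const int *samples, size_t nsamples,
                   unsigned long *counters, size_t nfuncs)
{
    size_t i;

    for (i = 0; i < nsamples; i++) {
        if (samples[i] >= 0 && (size_t)samples[i] < nfuncs)
            counters[samples[i]]++;
    }
}

/* Estimated time in milliseconds attributed to one function. */
unsigned long estimated_ms(unsigned long counter)
{
    return counter * SAMPLE_INTERVAL_MS;
}
```

Because a whole interval is charged to whichever function happens to be running at the interrupt, the estimate for a short-lived function can be well off, which is why the timings are only approximate.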

21
Example Calling History
    index  % time  self  children     called             name
                                   158655725             find_ele_rec [5]
                   4.19    0.02    965027/965027         insert_string [4]
    [5]      2.4   4.19    0.02    965027+158655725      find_ele_rec [5]
                   0.01    0.01    363039/363039         new_ele [10]
                   0.00    0.01    363039/363039         save_string [13]
                                   158655725             find_ele_rec [5]

  • Ratio: 158655725/965027 = 164.4
  • The average lookup traverses about 164 list elements,
    i.e. the lists in the hash buckets are long

22
Code Optimizations
  • First step: use a more efficient sorting function
  • Library function qsort
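A minimal sketch of sorting with the standard library's qsort, in descending order of occurrence, is given below. The ngram_count record and the function names are assumptions for illustration; the slides do not show the program's actual data layout.

```c
#include <stdlib.h>

/* Hypothetical record: one unique n-gram and its occurrence count. */
typedef struct {
    const char *text;
    int count;
} ngram_count;

/* Comparison function for qsort: larger counts sort first. */
static int compare_desc(const void *pa, const void *pb)
{
    const ngram_count *a = pa;
    const ngram_count *b = pb;

    /* Yields -1, 0, or 1 without risking overflow from subtraction. */
    return (a->count < b->count) - (a->count > b->count);
}

/* Sort items[0..n-1] in descending order of count. */
void sort_by_count(ngram_count *items, size_t n)
{
    qsort(items, n, sizeof items[0], compare_desc);
}
```

Note that qsort does not guarantee any particular order among entries with equal counts.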

23
Further Optimizations
24
Optimizations
  • Iter first: Use an iterative function to insert
    elements in the linked list
  • Causes code to slow down
  • Iter last: Iterative function that places each new entry
    at the end of the list
  • Tends to place the most common words at the front of the list
  • Big table: Increase the number of hash buckets
  • Better hash: Use a more sophisticated hash function
  • Linear lower: Move strlen out of the loop
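To see why the choice of hash function matters, compare a purely additive hash with one that mixes in character positions. Both functions below are illustrative assumptions: the slides only say "more sophisticated hash function" and do not show either version.

```c
#include <stdint.h>

/* Additive hash: sums the character codes, so any two strings made of
 * the same characters in a different order collide. */
uint32_t hash_additive(const char *s)
{
    uint32_t h = 0;

    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* Shift-and-XOR hash: rotates the running value left by 5 bits before
 * mixing in each character, so character order affects the result. */
uint32_t hash_shift_xor(const char *s)
{
    uint32_t h = 0;

    while (*s)
        h = ((h << 5) | (h >> 27)) ^ (unsigned char)*s++;
    return h;
}
```

A caller would reduce either value modulo the number of buckets. Spreading keys more evenly only helps if the table also has enough buckets, which is why "big table" and "better hash" go together.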

25
Code Motion
    /* Convert string to lowercase: slow */
    void lower1(char *s)
    {
        int i;

        for (i = 0; i < strlen(s); i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] -= ('A' - 'a');
    }

26
Code Motion
    /* Convert string to lowercase: faster */
    void lower2(char *s)
    {
        int i;
        int len = strlen(s);

        for (i = 0; i < len; i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] -= ('A' - 'a');
    }

27
Code Motion
    /* Sample implementation of library function strlen */
    /* Compute length of string */
    size_t strlen(const char *s)
    {
        int length = 0;
        while (*s != '\0') {
            s++;
            length++;
        }
        return length;
    }
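The cost difference between lower1 and lower2 can be made visible by counting how many characters the length computation examines. In the sketch below, my_strlen, scan_steps, and steps_for are hypothetical instrumentation added for illustration; lower1 and lower2 follow the slides but call the instrumented length function.

```c
#include <stddef.h>

static unsigned long scan_steps;   /* characters examined by my_strlen */

/* Instrumented stand-in for strlen: counts every character it visits. */
static size_t my_strlen(const char *s)
{
    size_t length = 0;

    while (*s != '\0') {
        s++;
        length++;
        scan_steps++;
    }
    return length;
}

/* Length recomputed on every loop test: quadratic in the string length. */
void lower1(char *s)
{
    size_t i;

    for (i = 0; i < my_strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}

/* Length computed once before the loop: linear in the string length. */
void lower2(char *s)
{
    size_t i;
    size_t len = my_strlen(s);

    for (i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}

/* Run one of the two functions on s and return the characters scanned. */
unsigned long steps_for(void (*lower)(char *), char *s)
{
    scan_steps = 0;
    lower(s);
    return scan_steps;
}
```

For a string of length n, lower1 evaluates the loop test n + 1 times and scans n characters each time, giving n(n + 1) steps, while lower2 scans n characters in total.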

28
Code Motion
29
Performance Tuning
  • Benefits
  • Helps identify performance bottlenecks
  • Especially useful when you have a complex system with
    many components
  • Limitations
  • Only shows performance for the data tested
  • E.g., linear lower did not show a big gain, since
    words are short
  • Quadratic inefficiency could remain lurking in
    code
  • The timing mechanism is fairly crude
  • Only works for programs that run for > 3 seconds

30
Getting High Performance
  • High-level design
  • Choose appropriate algorithms and data structures
    for the problem at hand
  • Be especially vigilant to avoid algorithms or
    coding techniques that yield asymptotically poor
    performance

31
Getting High Performance
  • Basic coding principles
  • Avoid optimization blockers so that a compiler
    can generate efficient code.
  • Eliminate excessive function calls
  • Move computations out of loops when possible
  • Consider selective compromises of program
    modularity to gain greater efficiency
  • Eliminate unnecessary memory references.
  • Introduce temporary variables to hold
    intermediate results
  • Store a result in an array or global variable
    only when the final value has been computed.

32
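Limit: Amdahl's Law
  • $\alpha$ is the fraction of the original time spent in the part being improved; $k$ is the factor by which that part is sped up
  • $T_{new} = (1-\alpha)T_{old} + (\alpha T_{old})/k$
  • $\phantom{T_{new}} = T_{old}[(1-\alpha) + \alpha/k]$
  • $S = T_{old} / T_{new} = 1/[(1-\alpha) + \alpha/k]$
  • $S_{\infty} = 1/(1-\alpha)$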
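The formulas translate directly into code. The sketch below is a minimal illustration with hypothetical function names, where alpha is the fraction of time spent in the improved part and k is the speedup of that part.

```c
/* Overall speedup S = 1 / ((1 - alpha) + alpha / k).
 * Assumes 0 <= alpha <= 1 and k > 0. */
double amdahl_speedup(double alpha, double k)
{
    return 1.0 / ((1.0 - alpha) + alpha / k);
}

/* Limit of the speedup as k grows without bound: 1 / (1 - alpha).
 * Assumes 0 <= alpha < 1. */
double amdahl_limit(double alpha)
{
    return 1.0 / (1.0 - alpha);
}
```

For example, speeding up 60% of a program by a factor of 3 gives 1/(0.4 + 0.2), about 1.67, and no speedup of that part can push the overall gain beyond 2.5. Applied to the profile shown earlier, where sort_words accounts for 97.58% of the time, the bound is 1/0.0242, roughly 41.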

33
Profiling Tools
  • Unix
  • gprof
  • Intel's VTune
  • Valgrind
  • Windows
  • Intel's VTune

34
Next
  • System Level I/O