Title: ECE1724F Special Topics in Software Engineering
1ECE1724F Special Topics in Software Engineering
- Software Systems for Runtime Program Optimization
- http://www.eecg.toronto.edu/voss/ece1724f-03
- Lecture 1, 2004 September 14
2Contact Information
Michael Voss
Office: EA310 (for now)
Email: voss_at_eecg.toronto.edu
Phone: (416) 946-8031
Admin: ???
Webpage: www.eecg.toronto.edu/voss/ece1724f-04
3Your Information
- Name
- Email Address
- Department, Degree Objective, and Year
- Registered/Auditing
4Outline of the Course
- Introduction and Motivation
- A Compiler Primer / Refresher
- Empirical Optimization
- Dynamic Compilation
- Binary Translation
- JITs and VMs
5Prerequisites
- A compiler course would be best
- Feel free to ask questions
- Overview, not details
- Computer architecture
- Reasonable programming skills
- Most tools are C/C++
- We'll talk about Java some
- A few tools are written in Java
6Format of a Class Period
- The first 2 will be lecture only
- After that, a mix
- Part lecture
- Part paper presentation and discussion
- Why research papers?
- No textbook is available
- It's good to read papers
7I'll Post on the Web
- http://www.eecg.toronto.edu/voss/ece1724f-04
- Lecture Notes and Student Presentation Slides
- Reading List, a guess is available now
- Assignments, 1st one in a week or two
- Research Project Ideas, soon
- You can also propose your own
- Any announcements
- It is your responsibility to check here
8Grading
- 50% Project
- Proposal 5%
- Final Report 40%
- Final Presentation 5%
- 20% Assignments
- 20% In-Class Presentation
- 10% Participation
- Quizzes only if necessary
- Mostly attendance (read the papers!)
9Projects (50%)
- Small Group (2-3) or Individual
- Topics
- Evaluate an existing tool
- Use an existing tool
- Modify an existing tool
- Create a new tool
- Study Applications
- Written proposal, final report, and presentation at mini-conference
10Assignments (20%)
- 2 Assignments
- Grading (out of 10)
- Meets expectations (works or almost works): -0
- Doesn't work but it's close: -1
- You did a lot of work but it doesn't work: -2
- A feeble attempt or nothing turned in: -7
11Presentation (20%)
- Format depends on enrollment
- Will be a paper from the reading list
- Probably 45-60 minutes (yes, I'm serious)
- Mixed with discussion
- You can see possible topics on the web site.
- You will sign up for a paper next week.
12Participation (10%)
- For a discussion class to work
- Show up for class
- Read the papers ahead of time
- Participate in discussions
- If people seem unprepared:
- Quizzes on the reading
- Please don't make me grade quizzes!
13Software Systems for Runtime Program Optimization
14What is Optimization?
- Are applications really optimized?
- Do the best we can, or want to do
- 90 / 10 rule
- 90% of the benefit for 10% of the work
- The Dragon Book (Aho, Sethi and Ullman)
- preserve correctness
- on average improve performance
- are worth the effort
15What Are We Optimizing For?
Something that's important
16What's Important?
- Look at your applications
- Look at your machines
- Look at the bottlenecks
- Look at costs
17Optimization is Used Loosely
- Converting from one ISA to another
- Transmeta
- DAISY
- FX!32
- Improving performance
- HP Dynamo (as good as really good static)
- because programmers are lazy (practical)
- Because of New Programming Models
- Java runs everywhere, just not very well anywhere
18Technologies are Converging
19How do you tune for this environment?
- If machine is known
- Compile-time (accurate) prediction is hard
- Why?
- Machine is no longer known
- Compatible but diverse systems
- Condor pools, Grid or Web computing
- → Empirical optimization
20Best Point in Optimization Space?
21Conservative Assumptions
22Example: Loop Tiling
    for i = 1 to N do
      for k = 1 to N do
        t = A[i,k]
        for j = 1 to N do
          C[i,j] = C[i,j] + t * B[k,j]
[Figure: diagram with axes labeled j, k, i]
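23Example: Loop Tiling
    for kk = 1 to N by B do
      for jj = 1 to N by B do
        for i = 1 to N do
          for k = kk to min(kk+B-1,N) do
            t = A[i,k]
            for j = jj to min(jj+B-1,N) do
              C[i,j] = C[i,j] + t * B[k,j]
(B in the loop bounds is the tile size; B[k,j] is the matrix.)
[Figure: diagram with axes labeled j, k, i, jj, kk]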
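The two loop nests above can be sketched in C as follows. This is a minimal sketch, not the course's code: it assumes square row-major matrices in flat arrays, uses 0-based indices instead of the slide's 1-based ones, and renames the tile size to `tile` to avoid the clash with matrix B. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Untiled reference version: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double t = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += t * B[k * n + j];
        }
}

/* Tiled version: the k and j loops are blocked by `tile`, so a
   tile x tile block of B is reused across all values of i. */
void matmul_tiled(size_t n, size_t tile,
                  const double *A, const double *B, double *C) {
    assert(tile > 0);
    for (size_t kk = 0; kk < n; kk += tile)
        for (size_t jj = 0; jj < n; jj += tile)
            for (size_t i = 0; i < n; i++)
                for (size_t k = kk; k < min_sz(kk + tile, n); k++) {
                    double t = A[i * n + k];
                    for (size_t j = jj; j < min_sz(jj + tile, n); j++)
                        C[i * n + j] += t * B[k * n + j];
                }
}
```

Both versions accumulate each C[i,j] over k in the same order, so they produce identical results; only the memory access pattern changes. The best value of `tile` depends on the cache, which is the machine-dependent parameter the next slide asks about.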
24What Do You Need to Know?
- What did we assume?
- What happens if these assumptions don't hold?
- What parameters are machine dependent?
- What parameters are input dependent?
25Answer: Runtime Optimization
- More knowledge is available
- Machine information is known
- Input data set information is known
    DO I = 1,N                        DO I = 1,512
      DO J = 1,N                        DO J = 1,512
        DO K = 1,N     70% faster         DO K = 1,512
                           →
        ENDDO                             ENDDO
      ENDDO                             ENDDO
    ENDDO                             ENDDO
26How does this relax criteria?
- Must preserve correctness
- for (i=0; i<n; i++) a = a/b;
- → b0 = 1/b; for (i=0; i<n; i++) a = a*b0;
- Must on average improve performance
- Relax conservative assumptions
- Must be worth the effort
- Is it hard to use?
- Is it hard to implement?
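The divide-to-multiply example on this slide can be written out as the sketch below. The function names are hypothetical. In floating point, `a / b` and `a * (1.0 / b)` can differ in the last bits when `1.0 / b` is not exactly representable, which is why a conservative static compiler may refuse this rewrite.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: divide on every iteration. */
double scale_by_division(double a, double b, size_t n) {
    for (size_t i = 0; i < n; i++)
        a = a / b;
    return a;
}

/* Rewritten version: hoist the reciprocal, then multiply.
   Usually cheaper, but not guaranteed bit-identical to the division loop. */
double scale_by_reciprocal(double a, double b, size_t n) {
    assert(b != 0.0);
    double b0 = 1.0 / b;
    for (size_t i = 0; i < n; i++)
        a = a * b0;
    return a;
}
```

When b is a power of two the reciprocal is exact and both loops agree exactly; for other values the results are close but may not be equal. Whether that difference is acceptable is the kind of decision that can be deferred until more is known about the program and its input.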
27Everybody's Doing It
- Most commercial compilers (very limited)
- Transmeta's Code Morphing
- HP Dynamo / HP-MIT DynamoRIO
- IBM DPCL
- IBM Jalapeno JVM
- Sun's HotSpot Performance Engine
- .NET (Microsoft and Mono)
- Research groups
28Traditional Optimization Process
High-Level Language
Modify
Intermediate Language
Compile
Machine Code
Link
Load
Execute
30Optimize When Programming
- Pick best algorithm
- Program for the machine
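- Multiply instead of divide
- Program for locality
    Fortran (column-major)        C (row-major)
    do i = 1, n                   for (i = 0; i < n; i++)
      do j = 1, n                   for (j = 0; j < n; j++)
        a(j,i)                        a[i][j]
      enddo
    enddo
[Figure: memory layout of elements i=1,j=1; i=1,j=2; i=2,j=1; i=2,j=2]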
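As a rough illustration of programming for locality in C, the sketch below sums a row-major matrix in two loop orders. The function names are hypothetical and the matrix is assumed to be stored in a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Row order: the inner loop walks memory with stride 1,
   which matches C's row-major layout. */
double sum_row_order(size_t n, const double *a) {
    assert(a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}

/* Column order: same elements, but the inner loop strides by n,
   so consecutive accesses tend to land in different cache lines. */
double sum_column_order(size_t n, const double *a) {
    assert(a != NULL);
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}
```

In Fortran, where arrays are column-major, the preferred loop order is the opposite, which is why the slide's Fortran loop indexes a(j,i) with j innermost.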
32Compiler Optimization
- Many well known techniques
- Common subexpression elimination
- Strength reduction
- Instruction scheduling
- Register allocation
- Locality optimizations
- Limited by machine and input knowledge
34Link-time Optimization
- May have all objects available
- Can optimize across boundaries
- Software is becoming more modular
- Compiler does not see entire application
- Linker may see entire application
- Dynamic Link Libraries (DLLs)?
- Not mainstream!!
36Load-Time Optimization
- DLLs may now be available
- Can optimize entire application
- Machine information is available
- Can load DLLs during execution
- Now you see impact in runtime
- User sees load time
- Slowing down the critical path
- Not mainstream!!
38Runtime Optimization
- Entire application is available
- Machine information is available
- Input data set is available
- Adds overhead to critical path
- This is what the class is about
- Of great interest to many people!
40Feedback-Directed Compilation
- Most commercial compilers support it
- Find hotspots to direct optimization
- Help to direct inlining
- No guarantee that behavior repeats
- Definitely cannot do something unsafe
- Must find representative input
42Feedback-Directed Modification
- User changes the source
- Find bottlenecks
- change algorithm
- back-off of good practice
- The user can always do more
- knows the program's intent
- Always has been, always will be
44Runtime Program Optimization
Optimization: Modification of a program's behavior that has no perceptible effect on the output.
Runtime: At least part of the optimization decision making occurs during program execution.
45Runtime Optimization
- Pros
- Input data set knowledge
- Machine parameter knowledge
- More of the program available
- Cons
- Optimization is seen in the critical path
- Can perturb runtime, memory use
- Must amortize these costs
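One way to make "amortize these costs" concrete is a break-even count: if a runtime optimization costs `overhead` once and reduces the time per invocation from `t_before` to `t_after`, it pays off after at least overhead / (t_before - t_after) invocations. The helper below is a hypothetical sketch of that arithmetic, assuming all three values are measured in the same time unit.

```c
#include <assert.h>

/* Smallest number of invocations k such that
   overhead + k * t_after <= k * t_before.
   Returns 0 when the "optimized" code is not faster,
   meaning the overhead is never recovered. */
unsigned long break_even_runs(double overhead, double t_before, double t_after) {
    assert(overhead >= 0.0);
    double gain = t_before - t_after;      /* time saved per invocation */
    if (gain <= 0.0)
        return 0;
    unsigned long runs = (unsigned long)(overhead / gain);
    if ((double)runs * gain < overhead)    /* round up */
        runs++;
    return runs;
}
```

For example, an optimization that costs 10 time units and saves 2 units per invocation breaks even at 5 invocations; code that runs fewer times than that is better left alone.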
46How much is done at runtime?
[Figure: power and overhead plotted against % of decisions made at runtime]
47What is the overhead?
- Extra instructions
- Changes in cache and memory use
- More complex, less optimizable code
- May still use a traditional compiler
- Bad decisions
48Simplest Approaches
- Static Multiversioning
- Most compilers do this
- If-else or switch statement that selects
- Pros / Cons?
    if (N > Threshold)
      DOALL I = 1, N
    else
      DO I = 1, N
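The if-else selection above might look like this in C. It is a sketch under stated assumptions: `PAR_THRESHOLD` is a made-up cutoff, and the "parallel" version only splits the range into independent chunks that are run one after another here, standing in for a real DOALL so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

enum { PAR_THRESHOLD = 1000 };  /* hypothetical cutoff, would be tuned per machine */

static void add_range(size_t lo, size_t hi, const double *x, double *y) {
    for (size_t i = lo; i < hi; i++)
        y[i] += x[i];
}

/* Version 1: plain serial loop. */
static void add_serial(size_t n, const double *x, double *y) {
    add_range(0, n, x, y);
}

/* Version 2: four independent chunks that could be handed to threads.
   Executed sequentially in this sketch. */
static void add_chunked(size_t n, const double *x, double *y) {
    enum { CHUNKS = 4 };
    for (size_t c = 0; c < CHUNKS; c++)
        add_range(n * c / CHUNKS, n * (c + 1) / CHUNKS, x, y);
}

/* Both versions are compiled statically; the choice is made at runtime.
   Returns 1 if the chunked version ran, 0 if the serial one did. */
int add_multiversion(size_t n, const double *x, double *y) {
    assert(x != NULL && y != NULL);
    if (n > PAR_THRESHOLD) {
        add_chunked(n, x, y);
        return 1;
    }
    add_serial(n, x, y);
    return 0;
}
```

The runtime cost is one comparison per call; the price is code growth and a threshold that has to be chosen ahead of time.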
49- Parameterization
- Not uncommon with compilers
- Use a variable to change behavior
- Read or set variable based on environment
- Tile size for Tiling
- Pros / Cons?
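A minimal sketch of parameterization: the tile size is read from the environment at startup instead of being fixed at compile time. The variable name `TILE_SIZE` and both function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a positive decimal tile size; return `fallback` for NULL,
   non-numeric, trailing-garbage, or zero input. */
size_t parse_tile_size(const char *s, size_t fallback) {
    assert(fallback > 0);
    if (s == NULL || *s < '0' || *s > '9')
        return fallback;
    char *end;
    unsigned long v = strtoul(s, &end, 10);
    if (*end != '\0' || v == 0)
        return fallback;
    return (size_t)v;
}

/* Read the parameter from the (hypothetical) TILE_SIZE environment variable. */
size_t tile_size_from_env(size_t fallback) {
    return parse_tile_size(getenv("TILE_SIZE"), fallback);
}
```

The returned value would then be passed as the tile-size bound of a tiled loop nest, so the same binary can be retuned per machine without recompiling.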
50Example: Loop Tiling
    for i = 1 to N do
      for k = 1 to N do
        t = A[i,k]
        for j = 1 to N do
          C[i,j] = C[i,j] + t * B[k,j]
[Figure: diagram with axes labeled j, k, i]
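51Example: Loop Tiling
    for kk = 1 to N by B do
      for jj = 1 to N by B do
        for i = 1 to N do
          for k = kk to min(kk+B-1,N) do
            t = A[i,k]
            for j = jj to min(jj+B-1,N) do
              C[i,j] = C[i,j] + t * B[k,j]
(B in the loop bounds is the tile size; B[k,j] is the matrix.)
[Figure: diagram with axes labeled j, k, i, jj, kk]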
52Inspector-Executor Models
- Run an inspector loop to see if a technique should be applied
- Then run the executor loop using this decision
- Scheduling for parallel computation is a common example, runtime ddtest
- Research compilers
- Joel Saltz (UMD)
- Lawrence Rauchwerger (TA&M), not really IE model
- Pros / Cons?
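A small sketch of the inspector-executor idea, assuming a loop that updates `a[idx[i]]` through an index array that is only known at runtime. The inspector checks whether any index repeats; if none does, the iterations are independent and may run in any order. All names are hypothetical, and reverse iteration order stands in for a real parallel schedule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Inspector: true if no value in idx[0..n) repeats, i.e. every
   iteration touches a distinct element of a[]. */
bool inspect_independent(size_t n, const size_t *idx, size_t a_len) {
    bool *seen = calloc(a_len, sizeof *seen);
    if (seen == NULL)
        return false;                 /* be conservative on failure */
    bool ok = true;
    for (size_t i = 0; i < n && ok; i++) {
        assert(idx[i] < a_len);
        if (seen[idx[i]])
            ok = false;               /* two iterations hit the same element */
        seen[idx[i]] = true;
    }
    free(seen);
    return ok;
}

/* Executor: uses the inspector's verdict to pick a schedule.
   Returns true if the reordered ("parallel") schedule was used. */
bool run_loop(size_t n, const size_t *idx, const double *b,
              double *a, size_t a_len) {
    bool independent = inspect_independent(n, idx, a_len);
    if (independent) {
        for (size_t i = n; i-- > 0; )           /* any order is legal */
            a[idx[i]] = 2.0 * a[idx[i]] + b[i];
    } else {
        for (size_t i = 0; i < n; i++)          /* original serial order */
            a[idx[i]] = 2.0 * a[idx[i]] + b[i];
    }
    return independent;
}
```

The inspector costs one extra pass over `idx`, so the approach pays off mainly when the executor loop runs many times with the same index array.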
53Using Performance Feedback
- Run 2 or more versions of a piece of code
- Select the one that shows best performance
- fastest
- fewest cache misses
-
- Research compilers
- ATLAS (Whaley )
- Dynamic Feedback (Rinard )
- ADAPT (Voss )
- Pros / Cons?
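The "run several versions, keep the best" idea can be sketched as below. This is an illustration only, not how ATLAS, Dynamic Feedback, or ADAPT are implemented: it times each candidate once with the standard `clock()` function and returns the index of the fastest. The kernel names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(size_t n, double *data);

/* Two interchangeable versions of the same computation. */
void fill_forward(size_t n, double *d) {
    for (size_t i = 0; i < n; i++)
        d[i] = (double)i;
}

void fill_backward(size_t n, double *d) {
    for (size_t i = n; i-- > 0; )
        d[i] = (double)i;
}

/* Run every version once on the real data and return the index of
   the one with the smallest measured CPU time. */
size_t pick_fastest(const kernel_fn *versions, size_t count,
                    size_t n, double *data) {
    assert(count > 0);
    size_t best = 0;
    clock_t best_time = 0;
    for (size_t v = 0; v < count; v++) {
        clock_t start = clock();
        versions[v](n, data);
        clock_t elapsed = clock() - start;
        if (v == 0 || elapsed < best_time) {
            best = v;
            best_time = elapsed;
        }
    }
    return best;
}
```

A single short timing is noisy and the first version runs with a cold cache, so a practical system would repeat the measurements or time each version on comparable work.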
54Big Pros and Big Cons
- Pros
- Don't need to model
- Captures entire system behavior
- Cons
- Are comparisons valid?
- How do you debug?
- Is it necessary?
55Dynamic Compilation
- Generate new code at runtime
- Generally requires two stages
- analysis at compile-time
- application at runtime with staged compilers
- Several research groups
- DyC (UW)
- Tempo (INRIA)
- `C / Vcode (MIT)
- Pros / Cons?
    FUNC_PTR fp = gen_sub1(N);
    fp();
56Dynamic Compilation Pros/Cons
- Pros
- Generated new code, anything possible
- Cons
- Compiler in the critical path!!!!
- How do you access the new code?
- Must do quick optimization
57Binary Translation
- Make changes directly to the executable
- no source code needed
- Used for 2 main purposes
- to improve performance
- to run non-native code
- Many companies interested
- HP Dynamo
- Compaq FX!32
- IBM Daisy
58Binary Translation Pros/Cons
- Pros
- no need for source code !!! great !!!
- can work on instruction traces
- Cons
- In the critical path
- Might use interpretation
- Quick optimizations
- Instruction cache is not a data cache
59Java: the traditional way
javac
interpret
60Just-in-Time Compilers
Do we need to do this for every .class
file? Every class in a .java file? Every method
in a class? Every instruction in the method?
javac
JIT
execute
61JITs and VMs
- Move from interpreted to native code
- even bad native code is faster
- removes a software layer
- Move from bad native to good native
- Some do complicated things
- overlap compilation with execution
- only optimize HotSpots
- monitor and recompile if things change
- pick weird shapes for compilation units
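The "only optimize hot spots" policy can be sketched with an invocation counter. This is a toy model, not how any particular VM works: `slow` stands in for interpreted execution, `fast` stands in for JIT-compiled code, and `HOT_THRESHOLD` is a made-up value. A real VM would generate the fast version at the moment the method becomes hot.

```c
#include <assert.h>

enum { HOT_THRESHOLD = 3 };  /* hypothetical: calls tolerated before "compiling" */

typedef long (*method_fn)(long);

typedef struct {
    method_fn slow;     /* stand-in for interpretation */
    method_fn fast;     /* stand-in for compiled native code */
    unsigned calls;     /* invocation counter */
    int compiled;       /* nonzero once the method is considered hot */
} method_t;

/* Slow path: sum 1..n with a loop. */
long sum_loop(long n) {
    long s = 0;
    for (long i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Fast path: closed form, same result for n >= 0. */
long sum_formula(long n) {
    return n * (n + 1) / 2;
}

/* Dispatch: count invocations and switch to the fast version once
   the method has been called more than HOT_THRESHOLD times. */
long invoke(method_t *m, long arg) {
    if (!m->compiled && ++m->calls > HOT_THRESHOLD)
        m->compiled = 1;    /* a real VM would invoke its compiler here */
    return m->compiled ? m->fast(arg) : m->slow(arg);
}
```

Methods that are called only a few times never pay the compilation cost, while frequently called methods are switched to the fast version early.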
62Next Week: A Quick and Dirty Overview of Compiler Optimization