Title: EECS 583 Advanced Compilers Course Introduction
1EECS 583 Advanced Compilers: Course Introduction
- Fall 2007, University of Michigan
- September 5, 2007
2About Me
- Mahlke = "mall key"
- But just call me Scott
- 6 years here at Michigan
- Compiler guy who likes hardware
- Program optimization and building custom hardware for high performance
- Before this: HP Labs
- Compiler research for Itanium-like processors
- PICO: automatic design of NPAs
- Before before: Grad student at UIUC
- Before^3: Undergrad at UIUC
3Class Overview
- This class is NOT about
- Programming languages
- Parsing, syntax checking, semantic analysis
- Handling advanced language features: virtual functions, ...
- Frontend transformations
- Debugging
- Simulation
- Compiler backend
- Mapping applications to processor hardware
- Retargetability: work for multiple platforms (not hard coded)
- Work at the assembly-code level
- Processor independent -> Machine code
- Speed/Efficiency
- How to make the application run fast
- Use less memory (text, data)
4Background You Should Have
- 1. Programming
- Good C programmer (essential)
- Linux, gcc, gdb, emacs
- Compiler system not ported to Windows or Mac
- 2. Computer architecture
- EECS 370 is good, 470 is better but not essential
- Basics: caches, pipelining, function units, registers, virtual memory, branches, branch prediction, assembly code
- 3. Compilers
- Frontend stuff is not very relevant for this class
- Basic backend stuff we will go over fast
- Non-EECS 483 people will have to do some supplemental reading
- 4. Powerpoint
- You will have to make a presentation in this class
5Textbook and Other Classroom Material
- No required text: lecture notes, papers
- Other useful material
- Trimaran webpage: http://www.trimaran.org
- UIUC Impact webpage: http://www.crhc.uiuc.edu/Impact
- Course webpage + course newsgroup
- http://www.eecs.umich.edu/mahlke/583f07
- Lecture notes available the night before class
- Newsgroup: forum for helping each other. I will try to check regularly, but I won't be able to answer everything
- http://phorum.eecs.umich.edu
6What the Class Will be Like
- Class meeting time: 10:30 - 12:30, MW
- 2 hrs is hard to handle
- We'll go for an hour, take 10 min break
- Core backend stuff
- Textbook material: some overlap with 483
- Few homeworks to apply classroom material
- Research papers
- I'll present research material along the way
- Presentations by students: you guys are going to teach
7What the Class Will be Like (2)
- Learning compilers
- No memorizing definitions, terms, formulas, algorithms, etc.
- Learn by doing: writing code
- Substantial amount of programming
- Big learning curve for Trimaran compiler
- Reasonable amount of reading
- Classroom
- Attendance: you should be here
- Discussion is important
- Work out examples, discuss papers, etc.
- Each of you will teach some advanced material to the rest of us
- Essential to stay caught up
- Special interest groups: smaller meetings outside of class that focus on certain compiler topics
8Course Grading
- Yes, everyone will get a grade
- Distribution of grades, scale, etc.: ???
- Most (hopefully all) will get A's and B's
- Slackers will be obvious and will suffer
- Components
- Midterm exam: 25%
- Project: 45%
- Homeworks: 10%
- Paper presentation: 10%
- Class participation: 10%
9Homeworks
- Around 2-3 of these
- Small/modest programming assignments
- Design and implement something we discussed in class
- Goals
- Learn the important concepts
- Learn the compiler infrastructure so you can do the project
- Grading
- Good, weak effort but did something, did nothing (2/1/0)
- Working together is fine (and encouraged!)
- Make sure you understand things or it will come back to bite you
- For now, everyone must turn in their own assignment
10Projects: Most Important Part of the Class
- Design and implement an interesting compiler technique and demonstrate its usefulness
- Topic/scope/work
- 1-3 people per project
- You will pick the topics (I have to agree)
- Projects will be planned/organized at the SIG level
- You will have to
- Read background material
- Plan and design
- Implement and debug
- Deliverables
- Working implementation
- Project report: 5 page paper describing what you did/results
- 20 min presentation at end (demo if you want)
11Types of Projects
- New idea
- Small research idea
- Design and implement it, see how it works
- Extend existing idea
- Take an existing paper, implement their technique
- Then, extend it to do something interesting
- Generalize strategy, make more efficient/effective
- Implementation
- Take existing idea, create quality implementation in Trimaran
- Generate code for a real architecture
12Class Participation
- Interaction and discussion are essential in a graduate class
- Be here
- Don't just stare at the wall
- Be prepared to discuss the material
- Have something useful to contribute
- Opportunities for participation
- Research paper discussions: thoughts, comments, etc.
- Saying what you think in the special interest group meetings
- Solving class problems
13Special Interest Groups
- Divide up the class into focus groups
- Each group will meet at times TBD
- Identify research papers, discuss papers and project ideas
- Start SIGs about ½ way through class
- SIG topics from previous semesters
- Analysis and optimization
- Code generation (scheduling, register allocation, ...)
- Managing the memory hierarchy
- Power/energy management
- Reliability
- Multiple threads
14Special Interest Groups (2)
- FAQ
- Do I have to be in a group? Yes
- Can I be in more than 1 group? No
- Do I get to pick which group I am in? Sort of
- What if I get put in a group that I do not want to be in? Tough
- Do I have to go to the SIG meetings? Yes
- Can I do my project with someone in another SIG? No
15Contact Information
- Office: 4633 CSE
- Email: mahlke_at_umich.edu
- Office hours
- Mon, Wed briefly after class; Wed 4-5pm
- Visiting office hrs
- No GSI for this class
- I don't have the time or energy to debug everyone's code
- You will have to be independent in this class
- Read the documentation and look at the code
- Come to me when you are really REALLY stuck or confused
- Helping each other is encouraged
- Use the phorum
16Role of the Compiler: My Biased View
- Hardware people have to understand compilers
- No attention to compilers -> bad processor design
- Frontend material is not what real compiler people do
- Parsing, syntax checking, etc.: standard, mature field
- Buy a frontend from EDG
- Backend is where the action is at
- How to make code run fast (approach hand coding)
- How to reduce power/energy
- How to reduce code size
- How to reduce memory stalls
- How to make use of unusual architectural features
- How to design better processors
17Superscalar Processors
- Do everything in hardware
- Sequential code comes in
- Hardware parallelizes the code on the fly
- Traditional computer architecture class
- Emphasis on Pentium class architectures
- Desktop architecture is the only thing that is important
- In this class ...
- Very Long Instruction Word (VLIW) architectures and multicore VLIWs are the focus
- Why? Dumb hardware, smart compiler
- Burden shifted to the compiler to exploit machine resources
18VLIW/EPIC Architectures
- Our target processor for this class is VLIW/EPIC
- EPIC = Explicitly Parallel Instruction Computing
- Think of these as synonyms for this class
- Desktop
- IA-64 aka Itanium I and II, Merced, McKinley
- Embedded processors
- All high-performance DSPs are VLIW
- Why? Cost/power of superscalar, more scalability
- TI-C6x, Philips Trimedia, Starcore, ST-200
- Itanium (aka Itanic): is it a bad idea?
19VLIW/EPIC Philosophy
- Compiler creates complete plan of run-time execution (plan of execution, POE)
- At what time and using what resource
- POE communicated to hardware via the instruction set
- Processor obediently follows POE
- No dynamic scheduling, out of order execution (these second guess the compiler's plan)
- Compiler allowed to play the statistics
- Many types of info only available at run-time (branch directions, locations accessed via pointers)
- Traditionally compilers behave conservatively -> handle worst case possibility
- Allow the compiler to gamble when it believes the odds are in its favor: feedback directed optimization
- Expose microarchitecture to the compiler
- Memory system, branch execution
20Defining Feature I - MultiOp
- Superscalar
- Operations are sequential
- Hardware figures out resource assignment, time of execution
- MultiOp instruction
- Set of independent operations that are to be issued simultaneously (no sequential notion within a MultiOp)
- 1 instruction issued every cycle provides notion of time
- Resource assignment indicated by position in MultiOp
- POE communicated to hardware via MultiOps
Figure: one MultiOp with slots add | sub | load | load | store | mpy | shift | branch
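To make the MultiOp idea concrete, here is a minimal sketch of packing operations into MultiOps. It is not the Trimaran scheduler: the 4-slot machine in `SLOT_UNITS`, the operation names, the `deps` lists, and the unit latency are all assumptions made for illustration. The slot position stands for the resource and the list index stands for the issue cycle.

```python
# ASSUMPTION: hypothetical 4-wide machine; slot position selects the function unit.
SLOT_UNITS = ["alu", "alu", "mem", "mem"]


def pack_multiops(ops):
    """Greedily pack ops into MultiOps.

    ops: list of (name, unit, deps) in program order; deps name earlier ops.
    Returns a list of MultiOps, one per cycle; each MultiOp holds one entry
    per slot, with None meaning an empty slot (no-op).
    """
    schedule = []   # schedule[cycle][slot] -> op name or None
    cycle_of = {}   # op name -> issue cycle
    for name, unit, deps in ops:
        if unit not in SLOT_UNITS:
            raise ValueError(f"no slot for unit {unit!r}")
        # ASSUMPTION: unit latency, so a consumer may issue one cycle
        # after its latest producer.
        cycle = max((cycle_of[d] + 1 for d in deps), default=0)
        while True:
            while len(schedule) <= cycle:
                schedule.append([None] * len(SLOT_UNITS))
            free = [s for s, u in enumerate(SLOT_UNITS)
                    if u == unit and schedule[cycle][s] is None]
            if free:
                schedule[cycle][free[0]] = name
                cycle_of[name] = cycle
                break
            cycle += 1  # all matching slots taken, try the next MultiOp
    return schedule


OPS = [
    ("load_a", "mem", []),
    ("load_b", "mem", []),
    ("add", "alu", ["load_a", "load_b"]),
    ("sub", "alu", ["load_a", "load_b"]),
    ("mpy", "alu", ["add", "sub"]),
    ("store", "mem", ["mpy"]),
]

for cycle, multiop in enumerate(pack_multiops(OPS)):
    print(cycle, multiop)
```

Operations that share a MultiOp never depend on each other, which is what lets the hardware issue them together without checking. The sketch only honors the dependences listed in `deps`; a real scheduler would also track anti and output dependences and real latencies.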
21Defining Feature II - Exposed Latency
- Superscalar
- Sequence of atomic operations
- Sequential order defines semantics
- Unit assumed latency (UAL)
- Each conceptually finishes before the next one starts
- VLIW: non-atomic operations
- Register reads/writes for 1 operation separated in time
- Semantics determined by relative ordering of reads/writes
- Assumed latency (NUAL if > 1 for at least one op)
- Contract between the compiler and hardware
- Instruction issuance provides common notion of time
22UAL vs NUAL example
Assume load 4 cycles, add 1, mpy 3
Program (instruction slots 1-13), operations in order:
  r1 = load(r2)
  r1 = load(r3)
  r4 = mpy(r1, r5)
  r4 = add(r1, r6)
  r7 = mpy(r4, r9)
  r7 = add(r7, r8)
Two-phase view (time 1-13):
  Phase 1 operations: v1 = load(r2); v2 = load(r3); v3 = mpy(r1, r5); v4 = add(r1, r6); v5 = mpy(r4, r9); v6 = add(r7, r8)
  Phase 2 operations: r1 = v1; r1 = v2; r4 = v4; r4 = v3; r7 = v6; r7 = v5
Figure labels: NUAL, traditional (per-cycle placement of the operations not preserved)
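The two-phase reading of this example can be sketched in a few lines. The operations and latencies come from the slide; the issue cycles (one operation per cycle, cycles 1 to 6) and the rule that a result is written at issue cycle plus latency are assumptions, since the original cycle placement is not available here.

```python
from collections import defaultdict

# Latencies stated on the slide.
LATENCY = {"load": 4, "add": 1, "mpy": 3}

# (phase-1 result, destination register, opcode) in program order.
PROGRAM = [
    ("v1", "r1", "load"),
    ("v2", "r1", "load"),
    ("v3", "r4", "mpy"),
    ("v4", "r4", "add"),
    ("v5", "r7", "mpy"),
    ("v6", "r7", "add"),
]


def phase2_writes(program, issue_cycles):
    """Return (write_cycle, dest, value) events ordered by write cycle.

    ASSUMPTION: the register write happens at issue cycle + latency.
    """
    events = [
        (issue + LATENCY[opcode], dest, value)
        for (value, dest, opcode), issue in zip(program, issue_cycles)
    ]
    return sorted(events, key=lambda event: event[0])


def write_order_by_register(events):
    """Group the phase-2 writes by destination register, in time order."""
    order = defaultdict(list)
    for _cycle, dest, value in events:
        order[dest].append(value)
    return dict(order)


# ASSUMPTION: one operation issued per cycle, starting at cycle 1.
events = phase2_writes(PROGRAM, issue_cycles=[1, 2, 3, 4, 5, 6])
for event in events:
    print(event)
print(write_order_by_register(events))
```

Under these assumptions the add results reach r4 and r7 before the earlier-issued mpy results do, which is the write order listed under Phase 2. With NUAL the final register contents depend on when each write lands, not only on program order.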
23Other Architectural Features of VLIW/EPIC
- Add features into the architecture to support VLIW/EPIC philosophy
- Create more efficient POEs
- Expose the microarchitecture
- Play the statistics
- Example features
- Register files with explicit register renaming
- Unbundled branches
- Control/data speculation
- Memory hierarchy management
- Predicated execution
24Explicit Register Renaming
- Superscalar
- Small number of architectural registers
- Rename using large pool of physical registers at run-time
- VLIW
- Compiler responsible for all resource allocation including registers
- Rename at compile time: large pool of regs needed
- Static renaming
- Modify operands explicitly
- Dynamic renaming
- Operands not explicitly modified
- Is this feature lost? NO!
Figure: sequence Op1, Op2, Op3, Op4, with r13 renamed (r13 -> r67)
25Fancier Renaming With Rotating Registers
- Overlap loop iterations
- How do you prevent register overwrite in later iterations?
- Compiler-controlled dynamic register renaming
- Rotating registers
- Each iteration writes to r13
- But this gets mapped to a different physical register
- Block of consecutive regs allocated for each reg in loop corresponding to number of iterations it is needed
- actual reg = (reg + RRB) % NumRegs
- At end of each iteration, RRB--
Figure: iteration n (RRB = 7) and iteration n + 1 (RRB = 6), offset by II; each contains Op1, r13, Op2
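The rotating register mapping can be tried out with a small model. This is a sketch only: the class name `RotatingRegisterFile`, its methods, and the file size of 64 are assumptions for illustration; the mapping rule (reg plus RRB, modulo the file size) and the decrement at the end of each iteration are the ones on the slide.

```python
class RotatingRegisterFile:
    """Logical register names are offset by a rotating register base (RRB)."""

    def __init__(self, num_regs, rrb=0):
        self.num_regs = num_regs
        self.rrb = rrb % num_regs
        self.physical = [None] * num_regs

    def actual_reg(self, reg):
        # actual reg = (reg + RRB) % NumRegs
        return (reg + self.rrb) % self.num_regs

    def write(self, reg, value):
        self.physical[self.actual_reg(reg)] = value

    def read(self, reg):
        return self.physical[self.actual_reg(reg)]

    def end_iteration(self):
        # RRB-- at the end of each iteration, wrapping around the file.
        self.rrb = (self.rrb - 1) % self.num_regs


# ASSUMPTION: 64 rotating registers; the slide only gives RRB = 7 and RRB = 6.
rf = RotatingRegisterFile(num_regs=64, rrb=7)
rf.write(13, "iteration n")      # r13 maps to physical register 20
rf.end_iteration()               # RRB becomes 6
rf.write(13, "iteration n+1")    # r13 now maps to physical register 19
print(rf.read(13))               # value written by iteration n+1
print(rf.read(14))               # iteration n's value, now visible as r14
```

Both iterations name r13, yet neither overwrites the other. The older value stays reachable under the next higher register name, which is why the compiler reserves a block of consecutive registers for each value that lives across iterations.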
26Unbundled Branches
- Branch separated into 3 distinct operations
- 1. Prepare to branch: compute target address, prefetch instructions from likely target
- Executed well in advance of branch
- 2. Compute branch condition: comparison operation
- 3. Branch itself
Figure: a single Branch becomes
  PBR btr1, TARGET
  CMPP pr0, (x > 100)?
  BR btr1, pr0
27Control/Data Speculation
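Control speculation: hoist conditionally executed instructions above the condition
  Before: if (a > b) { x = u + w; y = *x; z = y + 4; . . . }
  After:  x = u + w; y = *x; z = y + 4; if (a > b) { . . . }
Data speculation: hoist loads/uses over potentially aliased stores
  Before: *a = b; . . . ; y = *x; z = y + 4
  After:  y = *x; z = y + 4; . . . ; *a = b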
28Predicated Execution
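Source code:
  a = b + c
  if (a > 0) e = f + g
  else e = f / g
  h = i - j
Traditional branching code (basic block in parentheses):
  add a, b, c (BB1)
  bgt a, 0, L1 (BB1)
  div e, f, g (BB3)
  jump L2 (BB3)
  L1: add e, f, g (BB2)
  L2: sub h, i, j (BB4)
Control flow: BB1 branches to BB2 or BB3, and both continue to BB4
Predicated code (basic block in parentheses):
  add a, b, c if T (BB1)
  p2 = a > 0 if T (BB1)
  p3 = a <= 0 if T (BB1)
  div e, f, g if p3 (BB3)
  add e, f, g if p2 (BB2)
  sub h, i, j if T (BB4)
BB1, BB2, BB3, BB4 become one block: p2 -> BB2, p3 -> BB3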
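A small simulation can confirm that the predicated sequence computes the same values as the branching version. This is a sketch for illustration only: the function names and the dictionary-based register model are assumptions, while the six operations and their guards follow the slide. A false guard simply nullifies its operation, so there is no branch in `run_predicated`.

```python
def run_branching(b, c, f, g, i, j):
    """Reference semantics of the source code, using a branch."""
    a = b + c
    if a > 0:
        e = f + g
    else:
        e = f / g
    h = i - j
    return a, e, h


def run_predicated(b, c, f, g, i, j):
    """Straight-line version: every operation carries a guard predicate."""
    r = {"b": b, "c": c, "f": f, "g": g, "i": i, "j": j}  # data registers
    p = {"T": True}                                       # predicate registers
    ops = [
        # (guard, destination file, destination name, computation)
        ("T", r, "a", lambda: r["b"] + r["c"]),
        ("T", p, "p2", lambda: r["a"] > 0),
        ("T", p, "p3", lambda: r["a"] <= 0),
        ("p3", r, "e", lambda: r["f"] / r["g"]),
        ("p2", r, "e", lambda: r["f"] + r["g"]),
        ("T", r, "h", lambda: r["i"] - r["j"]),
    ]
    for guard, dest_file, dest, compute in ops:
        if p[guard]:  # a false guard nullifies the operation
            dest_file[dest] = compute()
    return r["a"], r["e"], r["h"]


for args in [(1, 2, 8, 2, 5, 3), (-5, 2, 8, 2, 5, 3), (1, 2, 8, 0, 5, 3)]:
    print(args, run_predicated(*args) == run_branching(*args))
```

The third input has g = 0 on the path where a > 0: the guarded div is nullified, so the predicated version does not fault, matching the branching version, which never reaches the division.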
29Scaling VLIW Architectures
- Register file access latency
- Grows linearly with number of registers
- Grows quadratically with number of ports
- Increasing processor width requires increases to both
- Clustered Approach
- Decentralized architecture
- Break design down into multiple chunks aka clusters
- Communication through interconnection network
- Used in Alpha 21264, TI C6x, Analog Tigersharc and others.
Figure: Conventional Architecture, one Register File (RF) shared by all FUs; Clustered Architecture, Cluster 1 and Cluster 2, each with its own Register File and FUs
30Basics of Multicluster Compilation
- Objectives
- Divide instructions across clusters to maximize parallelism
- Minimize critical intercluster communication
Figure: Cluster 1 and Cluster 2, each with a Register File and units labeled I and MEM, joined by an Interconnection Network; operations shown include >> and LW, with an intercluster move between the clusters
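The communication side of these objectives can be sketched by counting the intercluster moves that a given partition needs. Everything below is an assumption for illustration: the function name, the tiny dataflow graph, and the rule that one move is needed per (producer, destination cluster) pair. It counts all moves, not only those on the critical path, and it ignores parallelism entirely.

```python
def intercluster_moves(edges, cluster_of):
    """Count the moves needed to forward values between clusters.

    edges: (producer, consumer) pairs of the dataflow graph.
    cluster_of: op name -> cluster id.
    ASSUMPTION: one move delivers a value to every consumer on the same
    remote cluster, so moves are counted per (producer, destination cluster).
    """
    moves = {
        (src, cluster_of[dst])
        for src, dst in edges
        if cluster_of[src] != cluster_of[dst]
    }
    return len(moves)


# Two independent chains (hypothetical ops): load -> compute -> store.
EDGES = [
    ("lw1", "shift1"), ("shift1", "st1"),
    ("lw2", "add2"), ("add2", "st2"),
]

# One chain per cluster: both clusters stay busy, no values cross.
BY_CHAIN = {"lw1": 0, "shift1": 0, "st1": 0, "lw2": 1, "add2": 1, "st2": 1}

# Alternating assignment: every dependence crosses the interconnect.
ALTERNATING = {"lw1": 0, "shift1": 1, "st1": 0, "lw2": 1, "add2": 0, "st2": 1}

print(intercluster_moves(EDGES, BY_CHAIN))
print(intercluster_moves(EDGES, ALTERNATING))
```

Both partitions put three operations on each cluster, so they look equally balanced, but the alternating one pays for a move on every dependence. A real partitioner would weigh such moves by whether they lengthen the critical path.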
31Multicore VLIWs
Figure: one core with Instruction Fetch/Decode, FUs plus Mem and Comm units, Register Files (GPR, FPR, PR, BTR), an L1 Instruction Cache and an L1 Data Cache connected to a banked L2, and links to north and to west
- Scalar operand network enables multicores to behave as a multicluster VLIW
32Speculating Larger Chunks of Work with a Transactional Memory
- Atomic and isolated execution
- Replace locks for critical sections
- No lock granularity problem
- Software Error Recovery
- Allow programmers to abort/rollback transactions when errors are detected
- Convenient interface for exception handling
- Enables thread level speculation
Figure: CPU with a write buffer (Wrt Buffer) and L1 data cache (L1 D)
33What if I Don't Care About These Architectures?
- How do we compile for superscalars?
- How do we compile for RISCs?
- All the basic compiler analyses and transformations are the same for all processor types
- They were developed for RISCs
- Superscalar compilers work by pretending the processor is a VLIW
- But must worry about hardware undoing what the compiler did
- Other resources to worry about (e.g., reorder buffer, reservation stations, etc.)
- Not all hardware features available