EECS 583 Advanced Compilers Course Introduction - PowerPoint PPT Presentation

Provided by: scottm3

Transcript and Presenter's Notes

1
EECS 583 Advanced Compilers: Course Introduction

Fall 2007, University of Michigan
September 5, 2007

2
About Me

Mahlke (pronounced "mall key")
But just call me Scott
6 years here at Michigan
Compiler guy who likes hardware
Program optimization and building custom hardware for high performance
Before this: HP Labs
Compiler research for Itanium-like processors
PICO: automatic design of NPAs
Before before: Grad student at UIUC
Before^3: Undergrad at UIUC

3
Class Overview

This class is NOT about
Programming languages
Parsing, syntax checking, semantic analysis
Handling advanced language features: virtual functions, ...
Frontend transformations
Debugging
Simulation
Compiler backend
Mapping applications to processor hardware
Retargetability: work for multiple platforms (not hard coded)
Work at the assembly-code level
Processor independent -> Machine code
Speed/Efficiency
How to make the application run fast
Use less memory (text, data)

4
Background You Should Have

1. Programming
Good C programmer (essential)
Linux, gcc, gdb, emacs
Compiler system not ported to Windows or Mac
2. Computer architecture
EECS 370 is good, 470 is better but not essential
Basics: caches, pipelining, function units, registers, virtual memory, branches, branch prediction, assembly code
3. Compilers
Frontend stuff is not very relevant for this class
Basic backend stuff we will go over fast
Non-EECS 483 people will have to do some supplemental reading
4. PowerPoint
You will have to make a presentation in this class

5
Textbook and Other Classroom Material

No required text: lecture notes, papers
Other useful material
Trimaran webpage: http://www.trimaran.org
UIUC Impact webpage: http://www.crhc.uiuc.edu/Impact
Course webpage + course newsgroup
http://www.eecs.umich.edu/mahlke/583f07
Lecture notes available the night before class
Newsgroup: forum for helping each other. I will try to check regularly, but I won't be able to answer everything
http://phorum.eecs.umich.edu

6
What the Class Will be Like

Class meeting time: 10:30 - 12:30, MW
2 hrs is hard to handle
We'll go for an hour, take 10 min break
Core backend stuff
Textbook material, some overlap with 483
Few homeworks to apply classroom material
Research papers
I'll present research material along the way
Presentations by students: you guys are going to teach

7
What the Class Will be Like (2)

Learning compilers
No memorizing definitions, terms, formulas, algorithms, etc
Learn by doing: writing code
Substantial amount of programming
Big learning curve for Trimaran compiler
Reasonable amount of reading
Classroom
Attendance: you should be here
Discussion is important
Work out examples, discuss papers, etc
Each of you will teach some advanced material to the rest of us
Essential to stay caught up
Special interest groups: smaller meetings outside of class that focus on particular compiler topics

8
Course Grading

Yes, everyone will get a grade
Distribution of grades, scale, etc - ???
Most (hopefully all) will get As and Bs
Slackers will be obvious and will suffer
Components
Midterm exam: 25%
Project: 45%
Homeworks: 10%
Paper presentation: 10%
Class participation: 10%

9
Homeworks

Around 2-3 of these
Small/modest programming assignments
Design and implement something we discussed in class
Goals
Learn the important concepts
Learn the compiler infrastructure so you can do the project
Grading
Good, weak effort but did something, did nothing (2/1/0)
Working together is fine (and encouraged!)
Make sure you understand things or it will come back to bite you
For now, everyone must turn in their own assignment

10
Projects: Most Important Part of the Class

Design and implement an interesting compiler
technique and demonstrate its usefulness
Topic/scope/work
1-3 people per project
You will pick the topics (I have to agree)
Projects will be planned/organized at the SIG
level
You will have to
Read background material
Plan and design
Implement and debug
Deliverables
Working implementation
Project report: 5 page paper describing what you did/results
20 min presentation at end (demo if you want)

11
Types of Projects

New idea
Small research idea
Design and implement it, see how it works
Extend existing idea
Take an existing paper, implement their technique
Then, extend it to do something interesting
Generalize strategy, make more efficient/effective
Implementation
Take existing idea, create quality implementation
in Trimaran
Generate code for a real architecture

12
Class Participation

Interaction and discussion is essential in a
graduate class
Be here
Don't just stare at the wall
Be prepared to discuss the material
Have something useful to contribute
Opportunities for participation
Research paper discussions: thoughts, comments, etc
Saying what you think in the special interest
group meetings
Solving class problems

13
Special Interest Groups

Divide up the class into focus groups
Each group will meet at times TBD
Identify research papers, discuss papers and
project ideas
Start SIGs about ½ way through class
SIG topics from previous semesters
Analysis and optimization
Code generation (scheduling, register allocation,
... )
Managing the memory hierarchy
Power/energy management
Reliability
Multiple threads

14
Special Interest Groups (2)

FAQ
Do I have to be in a group? Yes
Can I be in more than 1 group? No
Do I get to pick which group I am in? Sort of
What if I get put in a group that I do not want to be in? Tough
Do I have to go to the SIG meetings? Yes
Can I do my project with someone in another SIG? No

15
Contact Information

Office: 4633 CSE
Email: mahlke_at_umich.edu
Office hours
Mon, Wed briefly after class; Wed 4-5pm
Visiting office hrs
No GSI for this class
I don't have the time or energy to debug everyone's code
You will have to be independent in this class
Read the documentation and look at the code
Come to me when you are really REALLY stuck or
confused
Helping each other is encouraged
Use the phorum

16
Role of the Compiler: My Biased View

Hardware people have to understand compilers
No attention to compilers -> bad processor design
Frontend material is not what real compiler people do
Parsing, syntax checking, etc: standard, mature field
Buy a frontend from EDG
Backend is where the action is at
How to make code run fast (approach hand coding)
How to reduce power/energy
How to reduce code size
How to reduce memory stalls
How to make use of unusual architectural features
How to design better processors

17
Superscalar Processors

Do everything in hardware
Sequential code comes in
Hardware parallelizes the code on the fly
Traditional computer architecture class
Emphasis on Pentium class architectures
Desktop architecture is the only thing that is
important
In this class ...
Very Long Instruction Word (VLIW) architectures and multicore VLIWs are the focus
Why? Dumb hardware, smart compiler
Burden shifted to the compiler to exploit machine resources

18
VLIW/EPIC Architectures

Our target processor for this class is VLIW/EPIC
EPIC = Explicitly Parallel Instruction Computing
Think of these as synonyms for this class
Desktop
IA-64 aka Itanium I and II, Merced, McKinley
Embedded processors
All high-performance DSPs are VLIW
Why? Cost/power of superscalar, more scalability
TI-C6x, Philips Trimedia, Starcore, ST-200
Itanium (aka Itanic): is it a bad idea?

19
VLIW/EPIC Philosophy

Compiler creates complete plan of run-time execution (POE)
At what time and using what resource
POE communicated to hardware via the instruction set
Processor obediently follows POE
No dynamic scheduling, out of order execution (these second guess the compiler's plan)
Compiler allowed to play the statistics
Many types of info only available at run-time (branch directions, locations accessed via pointers)
Traditionally compilers behave conservatively -> handle worst case possibility
Allow the compiler to gamble when it believes the odds are in its favor: feedback directed optimization
Expose microarchitecture to the compiler
Memory system, branch execution

20
Defining Feature I - MultiOp

Superscalar
Operations are sequential
Hardware figures out resource assignment, time of
execution
MultiOp instruction
Set of independent operations that are to be issued simultaneously (no sequential notion within a MultiOp)
1 instruction issued every cycle: provides notion of time
Resource assignment indicated by position in MultiOp
POE communicated to hardware via MultiOps

Example MultiOp: add | sub | load | load | store | mpy | shift | branch
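
A minimal sketch of how independent operations could be grouped into MultiOps, written in Python for brevity. The `Op` record, the `pack_multiops` helper, and the greedy in-order policy are illustrative assumptions, not the Trimaran scheduler. It assumes unit latency and no resource limit other than issue width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str     # opcode, e.g. "add"
    dest: str     # register written
    srcs: tuple   # registers read


def pack_multiops(ops, width):
    """Greedily pack ops, in program order, into MultiOps of at most `width` slots.

    An op starts a new MultiOp when it depends on (or conflicts with) any op
    already placed in the current one, or when the current one is full.
    """
    multiops = []
    current = []
    for op in ops:
        conflict = any(
            prev.dest in op.srcs       # read-after-write
            or prev.dest == op.dest    # write-after-write
            or op.dest in prev.srcs    # write-after-read (kept conservative)
            for prev in current
        )
        if conflict or len(current) == width:
            multiops.append(current)
            current = []
        current.append(op)
    if current:
        multiops.append(current)
    return multiops


# Hypothetical four-operation sequence: mpy needs the results of add and sub.
ops = [
    Op("add", "a", ("b", "c")),
    Op("sub", "d", ("e", "f")),
    Op("mpy", "g", ("a", "d")),
    Op("load", "h", ("i",)),
]
packed = pack_multiops(ops, width=4)
for cycle, multiop in enumerate(packed, start=1):
    print(cycle, [op.name for op in multiop])
```

Each inner list is one MultiOp, so its index stands for the issue cycle and the position of an op inside it stands for the slot it would occupy.
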
21
Defining Feature II - Exposed Latency

Superscalar
Sequence of atomic operations
Sequential order defines semantics
Unit assumed latency (UAL)
Each conceptually finishes before the next one
starts
VLIW: non-atomic operations
Register reads/writes for 1 operation separated in time
Semantics determined by relative ordering of reads/writes
Assumed latency (non-unit, NUAL, if > 1 for at least one op)
Contract between the compiler and hardware
Instruction issuance provides common notion of time

22
UAL vs NUAL example
Assume load 4 cycles, add 1, mpy 3
Traditional (UAL), instructions 1-13:
r1 = load(r2)
r1 = load(r3)
r4 = mpy(r1, r5)
r4 = add(r1, r6)
r7 = mpy(r4, r9)
r7 = add(r7, r8)
NUAL, time 1-13:
Phase 1 operations: v1 = load(r2), v2 = load(r3), v3 = mpy(r1, r5), v4 = add(r1, r6), v5 = mpy(r4, r9), v6 = add(r7, r8)
Phase 2 operations: r1 = v1, r1 = v2, r4 = v4, r4 = v3, r7 = v6, r7 = v5
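
A minimal sketch of the difference between UAL and NUAL semantics, using the latencies assumed above (load 4, add 1, mpy 3). The three-operation program, its issue cycles, the register values, and the helper names `run_ual` and `run_nual` are made up for illustration and are not the schedule from the slide. In the NUAL model an operation reads its sources when it issues and writes its destination `LATENCY` cycles later.

```python
LATENCY = {"load": 4, "add": 1, "mpy": 3}  # cycles, as assumed on the slide


def execute(opcode, args, memory):
    """Compute the result of one operation from its source values."""
    if opcode == "load":
        return memory[args[0]]
    if opcode == "add":
        return args[0] + args[1]
    if opcode == "mpy":
        return args[0] * args[1]
    raise ValueError(opcode)


def run_ual(program, regs, memory):
    """UAL: each operation conceptually finishes before the next one starts."""
    regs = dict(regs)
    for _cycle, dest, opcode, srcs in program:
        regs[dest] = execute(opcode, [regs[s] for s in srcs], memory)
    return regs


def run_nual(program, regs, memory):
    """NUAL: sources are read at issue, the result lands LATENCY cycles later."""
    regs = dict(regs)
    issue = {}
    for cycle, dest, opcode, srcs in program:
        issue.setdefault(cycle, []).append((dest, opcode, srcs))
    pending = {}  # write cycle -> list of (dest, value)
    last_cycle = max(c + LATENCY[op] for c, _, op, _ in program)
    for cycle in range(1, last_cycle + 1):
        for dest, value in pending.pop(cycle, []):   # results landing now
            regs[dest] = value
        for dest, opcode, srcs in issue.get(cycle, []):
            value = execute(opcode, [regs[s] for s in srcs], memory)
            pending.setdefault(cycle + LATENCY[opcode], []).append((dest, value))
    return regs


# (issue cycle, dest, opcode, sources); hypothetical schedule
program = [
    (1, "r1", "load", ["r2"]),       # r1 is written at cycle 5
    (2, "r4", "add", ["r1", "r6"]),  # NUAL: still sees the old r1
    (5, "r7", "mpy", ["r1", "r5"]),  # sees the loaded r1
]
regs = {"r1": 10, "r2": 100, "r4": 0, "r5": 2, "r6": 1, "r7": 0}
memory = {100: 7}

ual = run_ual(program, regs, memory)
nual = run_nual(program, regs, memory)
print(ual["r4"], nual["r4"])
```

The add issued at cycle 2 falls inside the load's latency window, so under NUAL it uses the old value of r1. This is why the assumed latencies act as a contract between the compiler and the hardware.
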
23
Other Architectural Features of VLIW/EPIC

Add features into the architecture to support VLIW/EPIC philosophy
Create more efficient POEs
Expose the microarchitecture
Play the statistics
Example features
Register files with explicit register renaming
Unbundled branches
Control/data speculation
Memory hierarchy management
Predicated execution

24
Explicit Register Renaming

Superscalar
Small number of architectural registers
Rename using large pool of physical registers at
run-time
VLIW
Compiler responsible for all resource allocation including registers
Rename at compile time: large pool of regs needed
Static renaming
Modify operands explicitly
Dynamic renaming
Operands not explicitly modified
Is this feature lost? NO!

[Figure: sequence Op1 to Op4; r13 appears at Op1, and r13 -> r67 at Op3]
25
Fancier Renaming With Rotating Registers

Overlap loop iterations
How do you prevent register overwrite in later iterations?
Compiler-controlled dynamic register renaming
Rotating registers
Each iteration writes to r13
But this gets mapped to a different physical register
Block of consecutive regs allocated for each reg in loop, corresponding to number of iterations it is needed
actual reg = (reg + RRB) % NumRegs
At end of each iteration, RRB--
[Figure: iteration n (RRB = 7) and iteration n + 1 (RRB = 6), separated by II; Op1 and Op2 of each iteration refer to r13]
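
A minimal sketch of the rotating-register mapping, actual reg = (reg + RRB) % NumRegs with RRB decremented at the end of each iteration. The file size `NUM_REGS = 64` and the helper name `physical_reg` are assumptions for illustration. The starting value RRB = 7 matches the figure.

```python
NUM_REGS = 64  # size of the rotating register file (assumed for illustration)


def physical_reg(reg, rrb, num_regs=NUM_REGS):
    """Map an architectural register number to a physical one."""
    return (reg + rrb) % num_regs


rrb = 7
writes = []
for iteration in range(3):
    # Every iteration names r13, but the physical target shifts each time.
    writes.append(physical_reg(13, rrb))
    rrb -= 1  # rotate at the end of the iteration

print(writes)  # [20, 19, 18]

# The value written as r13 in iteration n (RRB = 7) is still reachable in
# iteration n + 1 (RRB = 6) under the name r14.
same_cell = physical_reg(13, 7) == physical_reg(14, 6)
```

Because the name shifts by one per iteration, a value that must live for k iterations needs k consecutive registers, which is the block allocation described above.
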
26
Unbundled Branches

Branch separated into 3 distinct operations
1. Prepare to branch: compute target address, prefetch instructions from likely target
Executed well in advance of branch
2. Compute branch condition: comparison operation
3. Branch itself

Example: Branch becomes
PBR btr1, TARGET
CMPP pr0, (x > 100)?
BR btr1, pr0
27
Control/Data Speculation
Control speculation: hoist conditionally executed instructions above the condition
Before: if (a > b) x u w y x z y 4 . . .
After: x u w y x z y 4 if (a > b) . . .
Data speculation: hoist loads/uses over potentially aliased stores
Before: a b . . . y x z y 4
After: y x z y 4 . . . a b
28
Predicated Execution
Source code
a = b + c
if (a > 0) e = f + g
else e = f / g
h = i - j
Traditional branching code
BB1: add a, b, c
BB1: bgt a, 0, L1
BB3: div e, f, g
BB3: jump L2
BB2: L1: add e, f, g
BB4: L2: sub h, i, j
Control flow: BB1 -> BB2 or BB3 -> BB4
Predicated code
BB1: add a, b, c if T
BB1: p2 = a > 0 if T
BB1: p3 = a <= 0 if T
BB3: div e, f, g if p3
BB2: add e, f, g if p2
BB4: sub h, i, j if T
Single block holding BB1, BB2, BB3, BB4, with p2 -> BB2 and p3 -> BB3
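
A minimal sketch of the if-conversion on this slide. The branching version follows the control flow, while the predicated version is a straight-line list of guarded operations where an operation whose predicate is false is simply nullified. The function names and the tuple encoding of operations are assumptions for illustration.

```python
def run_branching(b, c, f, g, i, j):
    """Traditional code: control flow picks BB2 or BB3."""
    a = b + c          # BB1: add a, b, c
    if a > 0:          # BB1: bgt a, 0, L1
        e = f + g      # BB2: add e, f, g
    else:
        e = f / g      # BB3: div e, f, g
    h = i - j          # BB4: sub h, i, j
    return a, e, h


def run_predicated(b, c, f, g, i, j):
    """Predicated code: every operation is fetched, its guard decides the effect."""
    env = {"b": b, "c": c, "f": f, "g": g, "i": i, "j": j, "T": True}
    ops = [
        # (guard, destination, computation)
        ("T",  "a",  lambda v: v["b"] + v["c"]),
        ("T",  "p2", lambda v: v["a"] > 0),
        ("T",  "p3", lambda v: v["a"] <= 0),
        ("p3", "e",  lambda v: v["f"] / v["g"]),
        ("p2", "e",  lambda v: v["f"] + v["g"]),
        ("T",  "h",  lambda v: v["i"] - v["j"]),
    ]
    for guard, dest, compute in ops:
        if env[guard]:             # a false predicate nullifies the operation
            env[dest] = compute(env)
    return env["a"], env["e"], env["h"]


taken = (1, 2, 8, 4, 9, 3)       # a = 3, so the add path runs
not_taken = (-5, 2, 8, 4, 9, 3)  # a = -3, so the div path runs
print(run_predicated(*taken), run_predicated(*not_taken))
```

Both versions compute the same results. The predicated one has no branch to mispredict, at the cost of issuing operations from both paths.
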
29
Scaling VLIW Architectures
Conventional Architecture

Register file access latency
Grows linearly with number of registers
Grows quadratically with number of ports
Increasing processor width requires increases to
both
Clustered Approach
Decentralized architecture
Break design down into multiple chunks, aka clusters
Communication through interconnection network
Used in Alpha 21264, TI C6x, Analog Tigersharc
and others.

[Figure: Conventional Architecture (one register file, RF, shared by all FUs) vs. Clustered Architecture (Cluster 1 and Cluster 2, each with its own register file and FUs)]
30
Basics of Multicluster Compilation

Objectives
Divide instructions across clusters to maximize
parallelism
Minimize critical intercluster communication

[Figure: Cluster 1 and Cluster 2, each with its own register file, joined by an interconnection network; an intercluster move carries a value from one cluster to the other]
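
A minimal sketch of the cost side of the second objective: given a data-dependence graph and a cluster assignment, count how many intercluster moves the assignment needs. The graph, the operation names, and the helper `intercluster_moves` are hypothetical. A real partitioner would also weigh whether each move sits on the critical path.

```python
def intercluster_moves(edges, cluster_of):
    """Count the moves needed for a given assignment of operations to clusters.

    edges: iterable of (producer, consumer) data dependences.
    cluster_of: dict mapping each operation to its cluster id.
    A value is moved at most once to each remote cluster that consumes it.
    """
    moves = set()
    for producer, consumer in edges:
        if cluster_of[producer] != cluster_of[consumer]:
            moves.add((producer, cluster_of[consumer]))
    return len(moves)


# Hypothetical graph: one load feeds an add and a multiply, both feed a store.
edges = [("ld", "add"), ("ld", "mul"), ("add", "st"), ("mul", "st")]

one_cluster = {"ld": 0, "add": 0, "mul": 0, "st": 0}
split = {"ld": 0, "add": 0, "mul": 1, "st": 0}

print(intercluster_moves(edges, one_cluster), intercluster_moves(edges, split))
```

Putting everything on one cluster needs no moves but gives up parallelism, while the split assignment lets add and mul run side by side at the price of two moves. This is the trade-off the two objectives describe.
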
31
Multicore VLIWs
[Figure: one core of a multicore VLIW: FUs; register files (GPR, FPR, PR, BTR); instruction fetch/decode; L1 instruction cache and L1 data cache connected to a banked L2; Mem and Comm units with links to the north and west]
Scalar operand network enables multicores to behave as a multicluster VLIW
32
Speculating Larger Chunks of Work with a Transactional Memory

Atomic and isolated execution
Replace locks for critical sections
No lock granularity problem
Software Error Recovery
Allow programmers to abort/rollback transactions
when errors are detected
Convenient interface for exception handling
Enables thread level speculation

[Figure: CPU with a write buffer and L1 data cache]
33
What if I Don't Care About These Architectures?

How do we compile for superscalars?
How do we compile for RISCs?
All the basic compiler analyses and
transformations are the same for all processor
types
They were developed for RISCs
Superscalar compilers work by pretending the
processor is a VLIW
But must worry about hardware undoing what the
compiler did
Other resources to worry about (e.g., reorder buffer, reservation stations, etc.)
Not all hardware features available