Title: Recap
Recap
Complex Instruction Set Computing (CISC)
- CISC: the older design idea (the x86 instruction set is CISC)
- Many (powerful) instructions supported within the ISA
- Upside: makes assembly programming much easier (there was a lot of assembly programming in the 1960s-70s), and the compiler is also simpler
- Upside: reduced instruction memory usage
- Downside: designing the CPU is much harder
Reduced Instruction Set Computing (RISC)
- RISC: a newer concept than CISC (but still old)
- MIPS, PowerPC, SPARC: all RISC designs
- Small instruction set; a CISC-type operation becomes a chain of RISC operations
- Upside: easier to design the CPU
- Upside: a smaller instruction set allows a higher clock speed
- Downside: assembly language is typically longer (though this is a compiler design issue)
- Most modern x86 processors are implemented using RISC techniques
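To make "a CISC-type operation becomes a chain of RISC operations" concrete, here is a minimal Python sketch over a hypothetical toy machine (the opcodes LOAD/ADD/STORE, the register names r1/r2 and the addresses are invented for illustration and do not model any real ISA). One memory-to-memory add is expanded into a load/store sequence; both versions leave memory in the same state, but the RISC version executes four instructions instead of one.

```python
# Hypothetical toy machine for illustration only: memory is a dict of
# address -> value, and the opcodes below do not model any real ISA.

def cisc_add_mem(mem, dst, src):
    """CISC-style memory-to-memory add: mem[dst] += mem[src] in ONE instruction."""
    mem[dst] = mem[dst] + mem[src]
    return 1  # instructions executed

def expand_to_risc(dst, src):
    """Expand the same operation into a RISC-style load/store sequence."""
    return [
        ("LOAD", "r1", dst),         # r1 <- mem[dst]
        ("LOAD", "r2", src),         # r2 <- mem[src]
        ("ADD", "r1", "r1", "r2"),   # r1 <- r1 + r2 (registers only)
        ("STORE", "r1", dst),        # mem[dst] <- r1
    ]

def run_risc(mem, program):
    """Execute a RISC-style program; only LOAD and STORE touch memory."""
    regs = {}
    for instr in program:
        op = instr[0]
        if op == "LOAD":
            _, rd, addr = instr
            regs[rd] = mem[addr]
        elif op == "ADD":
            _, rd, ra, rb = instr
            regs[rd] = regs[ra] + regs[rb]
        elif op == "STORE":
            _, rs, addr = instr
            mem[addr] = regs[rs]
        else:
            raise ValueError(f"unknown opcode {op}")
    return len(program)  # instructions executed

mem_cisc = {100: 7, 104: 5}
mem_risc = dict(mem_cisc)
n_cisc = cisc_add_mem(mem_cisc, 100, 104)
n_risc = run_risc(mem_risc, expand_to_risc(100, 104))
```

The growth from one instruction to four is the "assembly language typically longer" downside; in exchange, each RISC instruction does one simple thing and only loads and stores access memory.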
Birth of RISC
- Roots can be traced to three research projects
  - IBM 801 project (J. Cocke)
  - Berkeley RISC processor (1980, D. Patterson)
  - Stanford MIPS processor (1981, J. Hennessy)
- The Stanford and Berkeley projects were driven by interest in building a simple chip that could be made in a university environment
- Commercialization benefited from these two independent projects
  - Berkeley project → basis of Sun Microsystems' SPARC
  - Stanford project → began MIPS (used by SGI)
Modern RISC processors
- Complexity has nonetheless increased significantly
- Superscalar execution (where the CPU has multiple functional units of the same type, e.g., two add units) requires complex circuitry to control the scheduling of operations
- What if we could remove the scheduling complexity by using a smart compiler?
VLIW / EPIC
- VLIW: very long instruction word
- Idea: pack a number of non-interdependent operations into one long instruction
- Strong emphasis on compilers to schedule instructions
- Natural successor to RISC, designed to avoid the need for complex scheduling hardware in RISC designs
- EPIC (Explicitly Parallel Instruction Computing): its ISA is called IA-64
| Instr 1 | Instr 2 | Instr 3 |
(3 instructions scheduled into one long instruction word)
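The compiler's job in the picture above can be sketched as greedy list scheduling: fill each long instruction word with operations whose inputs were produced by earlier words. This is a minimal Python sketch under simplifying assumptions stated in the docstring (every value is written once, latencies and functional-unit types are ignored, bundle width 3 matches the figure); real IA-64 compilers handle much more, and `pack_bundles` is a hypothetical name.

```python
def pack_bundles(ops, width=3):
    """Greedily pack operations into VLIW bundles of at most `width` slots.

    ops: list of (dest, srcs) in program order. Assumes every dest is written
    exactly once, so only read-after-write dependences matter, and ignores
    operation latencies and functional-unit types.
    Returns a list of bundles, each a list of dest names.
    """
    producer = {dest: i for i, (dest, _) in enumerate(ops)}
    # deps[i]: indices of earlier ops whose results op i reads
    deps = [
        {producer[s] for s in srcs if s in producer and producer[s] < i}
        for i, (_, srcs) in enumerate(ops)
    ]
    placed = set()  # ops already in a finished bundle
    bundles = []
    remaining = list(range(len(ops)))
    while remaining:
        current = []
        for i in remaining:
            if len(current) == width:
                break
            # an op may issue only if all its producers are in earlier bundles
            if deps[i] <= placed:
                current.append(i)
        placed.update(current)
        bundles.append([ops[i][0] for i in current])
        remaining = [i for i in remaining if i not in placed]
    return bundles

ops = [
    ("a", ["x", "y"]),   # a = x + y
    ("b", ["x", "z"]),   # b = x * z
    ("c", ["y", "z"]),   # c = y - z
    ("d", ["a", "b"]),   # d = a + b   (needs a and b)
    ("e", ["c"]),        # e = c << 1  (needs c)
]
bundles = pack_bundles(ops)
```

Here the three independent operations share the first word and the two dependent ones share the second, so no scheduling hardware is needed at run time.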
The EPIC Philosophy
Who won?
- Modern x86 processors are RISC-CISC hybrids
- The ISA is translated at the hardware level into simpler, RISC-like instructions
- Very complicated designs, though, with lots of scheduling hardware
- MIPS, Sun SPARC, and DEC Alpha are much truer implementations of the RISC ideal
- A modern metric for the "RISC-ness" of a design: is it a load/store architecture, i.e., do only LOAD and STORE instructions access memory?
Multi-Core Technology
[Figure: Single Core (2004) → Dual Core, 2 or more cores (2005) → Multi-Core, 4 or more cores (2007), annotated "2X more cores"; each stage shows cores and caches]
- The Itanium architecture has a smaller core size, enabling up to 2x more cores per die than IA-32 for higher performance at the same cost
- The Itanium architecture was expected to enable up to 2x more cores per processor than Xeon processors by 2007
Multi-Core Processors
- Historical trend: 20-30%/yr performance gains by raising the clock frequency
- Future trend: multiply cores at a lower frequency for a better performance-to-power ratio
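Why lower frequency plus more cores helps can be checked with the standard first-order CMOS dynamic power model, P = C·V²·f (not stated on the slide). The sketch below is idealized: it assumes voltage can be lowered in proportion to frequency, ignores leakage, and assumes the workload splits perfectly across cores; the function names are hypothetical.

```python
def dynamic_power(c, v, f):
    """First-order CMOS dynamic (switching) power: P = C * V^2 * f."""
    return c * v * v * f

def throughput_and_power(cores, freq, c=1.0):
    """Normalized throughput and power for `cores` identical cores at `freq`.

    Assumptions (idealized, for illustration): supply voltage scales
    linearly with frequency (V = freq in normalized units), leakage power
    is ignored, and the workload splits perfectly across cores.
    """
    v = freq
    power = cores * dynamic_power(c, v, freq)
    throughput = cores * freq
    return throughput, power

one_fast = throughput_and_power(cores=1, freq=1.0)
two_slow = throughput_and_power(cores=2, freq=0.5)
```

Under these assumptions, two cores at half the frequency match the single core's throughput at a quarter of the power. In practice voltage cannot drop as far as frequency and not all code parallelizes, so real gains are smaller.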
32-Bit and 64-Bit Processors
[Figure: Long Mode vs. Legacy Mode, with layers for the user application, operating system kernel, and drivers]
The Role of Compilers
Compiler and ISA
- ISA decisions are no longer made just to make programming in assembly language (AL) easy
- Due to high-level languages (HLLs), the ISA is a compiler target today
- The performance of a computer will be significantly affected by the compiler
- Understanding compiler technology today is critical to designing and efficiently implementing an instruction set
- Architecture choice affects the code quality and the complexity of building a compiler for it
Goal of the Compiler
- Primary goal is correctness
- Second goal is speed of the object code
- Others
  - Speed of compilation
  - Ease of providing debug support
  - Interoperability among languages
  - Flexibility of the implementation: languages may not change much, but they do evolve, e.g., Fortran 66 → HPF
Make the frequent case fast and the rare case correct
Typical Modern Compiler Structure
[Figure: compiler phases linked by a common intermediate representation, labeled in order: somewhat language dependent, largely machine independent; small language dependencies, slight machine dependencies; language independent, highly machine dependent]
Typical Modern Compiler Structure (Cont.)
- Multi-pass structure → easier to write bug-free compilers
- Transform high-level (HL), more abstract representations into progressively lower-level representations, eventually reaching the instruction set
- Compilers must make assumptions about the ability of later steps to deal with certain problems
  - Ex. 1: choose which procedure calls to expand inline before knowing the exact size of the procedure being called
  - Ex. 2: global common sub-expression elimination
    - Find two instances of an expression that compute the same value and save the result of the first one in a temporary
    - The temporary must be in a register, not memory (for performance)
    - Assume the register allocator will allocate the temporary to a register
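Below is a minimal Python sketch of common sub-expression elimination. For brevity it works within a single basic block (the slide's example is the global version, which applies the same idea across branches), over a hypothetical three-address form of (dest, op, arg1, arg2) tuples, and it assumes each name is assigned only once. The reused temporary (`t1` here) only pays off if the register allocator keeps it in a register, which is exactly the assumption described above.

```python
COMMUTATIVE = {"+", "*"}

def eliminate_common_subexpressions(block):
    """Local CSE over (dest, op, arg1, arg2) tuples in one basic block.

    Assumes every name is assigned at most once in the block, so an
    expression cannot change value between two occurrences.
    """
    available = {}  # expression key -> name already holding its value
    out = []
    for dest, op, a, b in block:
        if op in COMMUTATIVE:
            key = (op, *sorted((a, b), key=str))  # b*c and c*b are the same
        else:
            key = (op, a, b)
        if key in available:
            # reuse the earlier result instead of recomputing it
            out.append((dest, "copy", available[key], None))
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out

block = [
    ("t1", "*", "b", "c"),
    ("t2", "+", "a", "t1"),
    ("t3", "*", "c", "b"),   # same value as t1
    ("t4", "-", "t2", "t3"),
]
optimized = eliminate_common_subexpressions(block)
```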
Optimization Types
- High level: done at the source-code level
  - Procedure called only once: put it inline and save the CALL
- Local: done within a basic block (straight-line code)
  - Common sub-expressions: expressions that produce the same value are computed only once
  - Constant propagation: replace a constant-valued variable with the constant, saving multiple accesses to a variable with the same value
- Global: same as local but done across branches
  - Code motion: remove code from loops that computes the same value on each pass and put it before the loop
  - Simplify or eliminate array-addressing calculations in loops
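The code-motion bullet can be sketched as loop-invariant code motion over the same kind of hypothetical (dest, op, arg1, arg2) tuples. This minimal Python version hoists a statement when none of its operands is changed inside the loop; a production compiler would also check that hoisting is safe (for instance when the loop might execute zero times), which is omitted here.

```python
def hoist_loop_invariants(loop_body):
    """Loop-invariant code motion over (dest, op, arg1, arg2) tuples.

    A statement is invariant if each operand is either not assigned in
    the loop or is the result of an already-hoisted statement. Assumes
    each name is assigned at most once in the body, no invariant result
    is read before the statement that computes it, and statements have
    no side effects.
    Returns (preheader, new_body): the preheader runs once before the loop.
    """
    assigned_in_loop = {dest for dest, _, _, _ in loop_body}
    hoisted = set()
    preheader, new_body = [], []
    for stmt in loop_body:
        dest, _, a, b = stmt
        operands = [x for x in (a, b) if x is not None]
        if all(x not in assigned_in_loop or x in hoisted for x in operands):
            preheader.append(stmt)
            hoisted.add(dest)
        else:
            new_body.append(stmt)
    return preheader, new_body

body = [
    ("t1", "*", "n", "4"),      # n is not changed by the loop
    ("t2", "+", "base", "t1"),  # depends only on invariants
    ("t3", "*", "i", "4"),      # depends on the loop counter i
    ("t4", "+", "t2", "t3"),    # array address base + n*4 + i*4
    ("i", "+", "i", "1"),       # loop counter update
]
preheader, new_body = hoist_loop_invariants(body)
```

`t1` and `t2` move before the loop, so the remaining body recomputes only the part of the address that actually changes each iteration.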
Optimization Types (Cont.)
- Register allocation
  - Use graph coloring (graph theory) to allocate registers; optimal coloring is NP-complete
  - The heuristic algorithm works best when there are at least 16 (and preferably more) registers
- Processor-dependent optimization
  - Strength reduction: replace a multiply with a shift-and-add sequence
  - Pipeline scheduling: reorder instructions to minimize pipeline stalls
  - Branch offset optimization: reorder code to minimize branch offsets
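The shift-and-add form of strength reduction rewrites a multiply by a constant k as one shifted copy of x per set bit of k, e.g. x * 10 = (x << 3) + (x << 1). This minimal Python sketch handles non-negative constants only; a real compiler also uses subtraction (x * 15 = (x << 4) - x) and applies the rewrite only when the sequence is cheaper than the target's multiply instruction. Function names are hypothetical.

```python
def shift_add_plan(k):
    """Shift amounts s such that k == sum(1 << s for s in plan)."""
    if k < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [s for s in range(k.bit_length()) if (k >> s) & 1]

def multiply_by_constant(x, k):
    """Compute x * k using only shifts and adds (one add per set bit of k)."""
    result = 0
    for s in shift_add_plan(k):
        result += x << s
    return result
```

For example, `shift_add_plan(10)` gives the shifts 1 and 3, so `multiply_by_constant(7, 10)` computes 14 + 56.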
Register Allocation
- One of the most important optimizations
- Based on graph coloring techniques
  - Construct a graph of possible allocations to a register
  - Use the graph to allocate registers efficiently
- Goal is to achieve 100% register allocation for all active variables
- Graph coloring works best when there are at least 16 general-purpose registers available for integers, and more for floating-point variables
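A simplified Chaitin-style allocator shows how the graph is used. Variables live at the same time interfere, and coloring the interference graph with k colors assigns k registers. This Python sketch is an illustration under stated assumptions: the interference graph is given (and symmetric), k = 3 or 2 stand in for a real register count, and spilled variables are simply reported, whereas a real allocator would insert spill code and retry.

```python
def allocate_registers(interference, k):
    """Simplified Chaitin-style graph-coloring register allocation.

    interference: dict var -> set of vars live at the same time
                  (must be symmetric). k: number of registers.
    Returns (assignment: var -> register number, spilled: list of vars).
    """
    graph = {v: set(ns) for v, ns in interference.items()}
    stack, spilled = [], []
    # Simplify: repeatedly remove a node with fewer than k neighbors;
    # such a node is guaranteed a free register later.
    while graph:
        node = next((v for v in graph if len(graph[v]) < k), None)
        if node is None:
            # Stuck: spill the most-constrained (highest-degree) variable.
            node = max(graph, key=lambda v: len(graph[v]))
            spilled.append(node)
        else:
            stack.append(node)
        for n in graph[node]:
            graph[n].discard(node)
        del graph[node]
    # Select: pop nodes and give each the lowest register not used by
    # its already-colored neighbors.
    assignment = {}
    while stack:
        v = stack.pop()
        taken = {assignment[n] for n in interference[v] if n in assignment}
        assignment[v] = min(r for r in range(k) if r not in taken)
    return assignment, spilled

interference = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

With three registers this graph colors without spills; with two, the triangle a-b-c cannot be colored and `a` is spilled. That is the practical reason for wanting many registers: the heuristic rarely gets stuck when k is large.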
Constant propagation
Example:
    a = 5;
    ... // no change to a so far
    if (a > b) ...
The statement (a > b) can be replaced by (5 > b). This could free a register when the comparison is executed. When applied systematically, constant propagation can improve the code significantly.
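Applied systematically, constant propagation tracks which variables hold known constants and substitutes them, folding operations whose operands all become constants. This is a minimal Python sketch over straight-line code in a hypothetical (dest, op, a, b) form, assuming no aliasing; the first test reproduces the (a > b) → (5 > b) rewrite from the example.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul,
            ">": lambda x, y: int(x > y)}

def propagate_constants(block):
    """Constant propagation with folding over straight-line code.

    block: list of (dest, op, a, b). Operands are variable names (str) or
    int constants; op "=" copies a into dest (b is None). Assumes no
    aliasing: only these statements change the variables.
    """
    known = {}  # variable -> constant it currently holds
    out = []
    for dest, op, a, b in block:
        # Replace each operand whose value is a known constant.
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            a, op, b = FOLDABLE[op](a, b), "=", None  # fold at compile time
        if op == "=" and isinstance(a, int):
            known[dest] = a
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
        out.append((dest, op, a, b))
    return out

slide_example = [
    ("a", "=", 5, None),    # a = 5
    ("t", ">", "a", "b"),   # t = (a > b)
]
```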
Strength reduction
Example:
    for (j = 0; j < n; j++)
        A[j] = 2*j;
    for (i = 0; 4*i < n; i++)
        A[4*i] = 0;
An optimizing compiler can replace the multiplication by 4 with an addition of 4. This is an example of strength reduction. In general, multiplications of a loop induction variable by a constant can be replaced by additions.
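The second loop can be written before and after strength reduction as follows. In this Python sketch, while loops stand in for the C-style for loop, and because i is only used to form 4*i, the reduced version also drops i itself (induction-variable elimination). The function names are hypothetical.

```python
def clear_every_fourth_naive(A, n):
    """for (i = 0; 4*i < n; i++) A[4*i] = 0;   (a multiply each iteration)"""
    i = 0
    while 4 * i < n:
        A[4 * i] = 0
        i += 1

def clear_every_fourth_reduced(A, n):
    """Same loop after strength reduction: k tracks 4*i and grows by 4."""
    k = 0          # invariant: k == 4*i for the original counter i
    while k < n:
        A[k] = 0
        k += 4     # the multiply 4*i became an add of 4
```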
Major Types of Optimizations and an Example in Each Class
Change in Instruction Count (IC) Due to Optimization
- Level 1: local optimizations, code scheduling, and local register allocation
- Level 2: global optimization, loop transformation (software pipelining), and global register allocation
- Level 3: procedure integration
How Can Architects Help Compiler Writers?
- Provide regularity
  - Address modes, operations, and data types should be orthogonal (independent of each other)
  - Simplifies code generation, especially in a multi-pass compiler
  - Counterexample: restricting which registers can be used for certain classes of instructions
- Provide primitives, not solutions
  - Special features that match an HLL construct are often unusable
  - What works in one language may be detrimental to others
How Can Architects Help Compiler Writers? (Cont.)
- Simplify trade-offs among alternatives
  - How to write good code? What is good code?
  - Metric: IC or code size (no longer true, because of caches and pipelining)
  - Anything that makes code-sequence performance obvious is a definite win!
  - How many times should a variable be referenced before it is cheaper to load it into a register?
- Provide instructions that bind the quantities known at compile time as constants
  - Don't hide compile-time constants
  - Instructions that work off something the compiler thinks could be a run-time-determined value handcuff the optimizer
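The "how many references before loading into a register" question becomes a one-line calculation once the costs are predictable, which is why making code-sequence performance obvious helps the compiler. The cost model below is hypothetical (all cycle counts are invented for illustration): each reference to a memory-resident variable pays a fixed extra penalty, while the register version pays one load, plus one store if the variable is modified.

```python
def break_even_references(load_cost, per_ref_penalty, store_cost=0):
    """Fewest references for which keeping a variable in a register wins.

    Hypothetical cost model in cycles: every reference to a memory-resident
    variable costs `per_ref_penalty` extra; the register version pays one
    load (plus one store if the variable is modified) and nothing per use.
    The register is strictly cheaper once refs * per_ref_penalty > overhead.
    """
    if per_ref_penalty <= 0:
        raise ValueError("per_ref_penalty must be positive")
    overhead = load_cost + store_cost
    return overhead // per_ref_penalty + 1
```

For instance, with a 3-cycle load and a 1-cycle penalty per memory reference, the register wins from the fourth reference on. If the ISA made these costs hard to predict, the compiler could not make this decision reliably.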
Short Summary: Compilers
- The ISA should have at least 16 GPRs (not counting FP registers) to simplify register allocation using graph coloring
- Orthogonality suggests that all supported addressing modes apply to all instructions that transfer data
- Simplicity: understand that less is more in ISA design
- Provide primitives instead of solutions
- Simplify trade-offs between alternatives
- Don't bind constants at run time
- Counterexample: lack of compiler support for multimedia instructions