Title: Lecture 4: Instruction Set Design/Pipelining
1Lecture 4 Instruction Set Design/Pipelining
- Instruction set design (Sections 2.9-2.12)
- control instructions
- instruction encoding
- Basic pipelining implementation (Section A.1)
2Control Transfer Instructions
- Conditional branches (75% - Int) (82% - FP)
- Jumps (6% - Int) (10% - FP)
- Procedure calls/returns (19% - Int) (8% - FP)
- Design issues
- How do you specify the target address?
- How do you specify the condition?
- What happens on a procedure call/return?
3Specifying the Target Address
- PC-relative: needs fewer bits to encode, independent of how/where the compiled code is linked, used for branches and jumps; typically, the displacement needs 4-8 bits
- Register-indirect jumps: the address is not known at compile time and has to be computed at run time (note: can use any other addressing mode too)
  - procedure returns
  - case statements
  - virtual functions
  - function pointers
  - dynamically shared libraries
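
To make the PC-relative bullet concrete, here is a minimal sketch in Python. It assumes a MIPS-like convention that the slide does not state: 4-byte instructions, and a displacement counted in instructions relative to the instruction after the branch. The helper names `signed_bits_needed` and `pc_relative_target` are hypothetical.

```python
def signed_bits_needed(displacement: int) -> int:
    """Smallest two's-complement width that can hold the displacement."""
    bits = 1
    while not (-(1 << (bits - 1)) <= displacement < (1 << (bits - 1))):
        bits += 1
    return bits


def pc_relative_target(pc: int, displacement: int, instr_bytes: int = 4) -> int:
    """Assumed convention: target = address of next instruction + scaled displacement."""
    return pc + instr_bytes + displacement * instr_bytes


# A short loop: branch back 5 instructions from a branch located at 0x1000.
loop_target = pc_relative_target(0x1000, -5)
loop_bits = signed_bits_needed(-5)

# The same code linked at a different base address keeps the same displacement,
# so the encoded branch does not change.
relocated_target = pc_relative_target(0x8000, -5)

print(hex(loop_target), loop_bits, hex(relocated_target))
```

Because only the displacement is encoded, the branch bits are identical wherever the code is placed, which is the link-independence property mentioned above.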
4Specifying the Condition
Name | Examples | How condition is tested | Advantages | Disadvantages
Condition Code (CC) | 80x86, ARM, PowerPC, SPARC | Tests special bits set by ALU ops | Sometimes condition is set for free | CC is extra state; instructions cannot be re-ordered
Condition Register | Alpha, MIPS | Comparison sets register and this is tested | Simple | Register pressure
Compare and branch | PA-RISC, VAX | Comparison is part of the branch | One instruction instead of two | Complex pipelines
5Procedure Call/Returns
- Need to maintain a stack of return addresses (in memory or in hardware)
- Can copy and save all registers together, or this can be done selectively
- Who is responsible for saving registers?
- Caller saving: correctness issues (a global register has to be made available to other procedures); it only saves values that it cares about
- Callee saving: it saves only as many registers as it needs (provided it doesn't call other procedures)
- A combination of both is typically employed
6Instruction Set Encoding
- Operations are easy to encode efficiently; the key issues are the number of operands and their addressing modes
- Few addressing modes -> low complexity in decoding and pipelining, but greater code size
- Fixed instruction lengths -> low complexity in decoding, but greater code size
7Instruction Lengths
8Dealing with Code Size in RISC
- Some hybrid versions allow for 16 and 32-bit instructions (40% reduction in code size); useful for embedded apps
- IBM PowerPC stores 32-bit instructions in compressed form in memory: more hardware complexity on an I-cache miss (need to translate from uncompressed to compressed in addition to virtual to physical)
- Reducing the register file size can also reduce the instruction length
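
The last bullet can be checked with a little arithmetic. The sketch below assumes a three-operand register format with a 6-bit opcode; both the format and the opcode width are illustrative assumptions, not something the slide specifies, and `three_operand_bits` is a hypothetical helper.

```python
def three_operand_bits(opcode_bits: int, num_registers: int) -> int:
    """Bits for an opcode plus three register specifiers."""
    # Bits needed to name one of num_registers registers.
    reg_bits = (num_registers - 1).bit_length()
    return opcode_bits + 3 * reg_bits


bits_32_regs = three_operand_bits(6, 32)  # 6 + 3 * 5
bits_8_regs = three_operand_bits(6, 8)    # 6 + 3 * 3

print(bits_32_regs, bits_8_regs)
```

Under these assumptions, going from 32 to 8 registers shrinks the operation from 21 bits to 15 bits, which is enough to fit in a 16-bit instruction.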
9Compiler Optimizations
- The phase-ordering problem: early phases have to assume that register allocation will find a register; else, optimizations such as common subexpression elimination may increase memory traffic
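
A toy cost model may help show why this matters. It is only a sketch under simplifying assumptions that are not in the slides: a subexpression appears `uses` times, recomputing it needs no memory accesses, and a spilled temporary costs one store plus one load per later use. The function `cse_cost` is hypothetical.

```python
def cse_cost(uses: int, apply_cse: bool, temp_in_register: bool) -> dict:
    """Count operations for a subexpression that appears `uses` times."""
    if not apply_cse:
        # Recompute the expression at every use.
        return {"computes": uses, "stores": 0, "loads": 0}
    if temp_in_register:
        # Compute once and keep the result in a register.
        return {"computes": 1, "stores": 0, "loads": 0}
    # No register was available: the temporary is spilled to memory.
    # One store after computing it, one load for each later use.
    return {"computes": 1, "stores": 1, "loads": uses - 1}


no_cse = cse_cost(2, apply_cse=False, temp_in_register=False)
cse_with_register = cse_cost(2, apply_cse=True, temp_in_register=True)
cse_with_spill = cse_cost(2, apply_cse=True, temp_in_register=False)

for name, cost in [("no CSE", no_cse),
                   ("CSE, register found", cse_with_register),
                   ("CSE, temporary spilled", cse_with_spill)]:
    print(name, cost)
```

In this model the optimization saves a computation either way, but when the later register allocation phase fails to find a register, it adds memory traffic that the unoptimized code did not have.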
10Register Allocation Issues
- Graph coloring: determine when variables are live and avoid allocating the same register to variables that are simultaneously live
- Stack variables (typically local to a procedure): easy to allocate registers for
- Global data: can be accessed from multiple places (aliasing), difficult to allocate to registers
- Heap data: dynamically created objects, accessed with pointers, difficult to allocate to registers because of aliasing
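
The graph-coloring idea can be sketched as follows. This is a simplified greedy heuristic, not the allocator of any particular compiler; the live ranges are made-up half-open intervals `(start, end)`, and the function names are hypothetical.

```python
def build_interference(live_ranges):
    """Two variables interfere if their live ranges overlap."""
    graph = {v: set() for v in live_ranges}
    names = list(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live_ranges[a]
            b_start, b_end = live_ranges[b]
            if a_start < b_end and b_start < a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph, num_registers):
    """Give each variable the lowest register not used by a neighbor.

    A variable mapped to None would have to be spilled to memory.
    """
    assignment = {}
    # Visit the most constrained (highest-degree) variables first.
    for v in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        taken = {assignment[n] for n in graph[v]
                 if assignment.get(n) is not None}
        free = [r for r in range(num_registers) if r not in taken]
        assignment[v] = free[0] if free else None
    return assignment


# Hypothetical live ranges, as [start, end) instruction indices.
live_ranges = {"a": (0, 4), "b": (1, 3), "c": (3, 6), "d": (5, 7)}
graph = build_interference(live_ranges)
registers = greedy_color(graph, num_registers=2)
print(registers)
```

Here four variables fit in two registers because no more than two are live at once; with a single register, some variables would map to None and need to be spilled.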
11Case Study The MIPS ISA
- Load-store architecture
- Focus on pipelining, decoding, and compiler efficiency
- In other words, RISC
12Registers
- 32 GPRs (general-purpose/integer registers) and 32 FPRs
- 64-bit registers: two single-precision FP values can fit in one register
- Register R0 is hardwired to zero: with displacement addressing mode, we can also accomplish absolute addressing; other uses for R0?
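
The R0 trick can be illustrated with a tiny register-file model. This is a sketch, not a MIPS simulator: the class `RegisterFile` and the helpers are hypothetical. The register-copy helper shows one commonly cited extra use of a zero register (copying a register by adding zero to it).

```python
class RegisterFile:
    """32 registers where register 0 always reads as zero."""

    def __init__(self, count: int = 32):
        self._regs = [0] * count

    def read(self, index: int) -> int:
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index != 0:  # writes to R0 are silently discarded
            self._regs[index] = value


def effective_address(regs: RegisterFile, base: int, displacement: int) -> int:
    """Displacement addressing: address = contents of base register + displacement."""
    return regs.read(base) + displacement


def copy_register(regs: RegisterFile, dest: int, src: int) -> None:
    """Register copy expressed as an add with R0 (dest = src + R0)."""
    regs.write(dest, regs.read(src) + regs.read(0))


regs = RegisterFile()
regs.write(0, 1234)       # has no effect
regs.write(5, 0x2000)

displaced = effective_address(regs, 5, 16)     # base + displacement
absolute = effective_address(regs, 0, 0x400)   # R0 as base: absolute address
copy_register(regs, 6, 5)

print(hex(displaced), hex(absolute), hex(regs.read(6)))
```

Using R0 as the base register turns the displacement itself into the address, so no separate absolute addressing mode is needed.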
13Instruction Format
14Control Instructions
- Comparisons with zero can happen as part of the branch
- Compares between registers are placed in other registers that are tested by branches
- Jump-and-link places the return address in register R31
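
These three bullets can be sketched as small Python functions over a list of 32 registers. The sketch assumes 4-byte instructions and no branch delay slot, so the return address is simply the address of the next instruction; the function names are hypothetical and do not correspond to exact MIPS mnemonics.

```python
INSTR_BYTES = 4  # assumed fixed instruction size


def set_less_than(regs, dest, a, b):
    """Compare two registers and place the result (1 or 0) in a third."""
    regs[dest] = 1 if regs[a] < regs[b] else 0


def branch_not_zero(regs, reg, pc, displacement):
    """The branch itself only compares a register with zero. Returns the next PC."""
    next_pc = pc + INSTR_BYTES
    if regs[reg] != 0:
        return next_pc + displacement * INSTR_BYTES
    return next_pc


def jump_and_link(regs, pc, target):
    """Save the return address in R31, then jump to the target."""
    regs[31] = pc + INSTR_BYTES
    return target


regs = [0] * 32
regs[1], regs[2] = 3, 7

set_less_than(regs, 3, 1, 2)                     # R3 = (R1 < R2)
taken_pc = branch_not_zero(regs, 3, 0x100, 5)    # branch is taken
call_pc = jump_and_link(regs, 0x200, 0x4000)     # procedure call

print(hex(taken_pc), hex(call_pc), hex(regs[31]))
```

A procedure return is then a register-indirect jump through R31.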
15Instruction Frequencies
16Summary
- In the 1960s, stack architectures were considered a good match for high-level languages
- In the 1970s, software costs were a concern: ISAs were enriched to make the compiler's job easier (CISC)
- In the 1980s, there was a push for simpler architectures: high clock speed and high parallelism (RISC)
- ISAs designed in 1980 are still around!
17The Assembly Line
[Figure: jobs A, B, and C plotted against time. Unpipelined: start and finish a job before moving to the next. Pipelined: break the job into smaller stages.]
18Performance Improvements?
- Does it take longer to finish each individual job?
- Does it take shorter to finish a series of jobs?
- What assumptions were made while answering these questions?
- Is a 10-stage pipeline better than a 5-stage pipeline?
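
One way to explore these questions is a simple timing model. It is a sketch under idealized assumptions that the slides leave open: the job splits into perfectly balanced stages, there are no stalls, and each stage pays a fixed latch overhead. The function names and the numbers are illustrative.

```python
def unpipelined_time(num_jobs: int, job_time: float) -> float:
    """Each job starts only after the previous one has finished."""
    return num_jobs * job_time


def pipelined_time(num_jobs: int, job_time: float, stages: int,
                   latch_overhead: float = 0.0) -> float:
    """Fill the pipeline once, then finish one job per stage time."""
    stage_time = job_time / stages + latch_overhead
    return (stages + num_jobs - 1) * stage_time


# Many jobs, small overhead: deeper pipelines finish the series sooner.
many_unpiped = unpipelined_time(1000, 10.0)
many_5 = pipelined_time(1000, 10.0, stages=5, latch_overhead=0.5)
many_10 = pipelined_time(1000, 10.0, stages=10, latch_overhead=0.5)

# A single job takes longer once latch overhead is included.
single_5 = pipelined_time(1, 10.0, stages=5, latch_overhead=0.5)

# Few jobs, large overhead: the deeper pipeline loses.
few_unpiped = unpipelined_time(3, 10.0)
few_5 = pipelined_time(3, 10.0, stages=5, latch_overhead=2.0)
few_10 = pipelined_time(3, 10.0, stages=10, latch_overhead=2.0)

print(many_unpiped, many_5, many_10)
print(single_5)
print(few_unpiped, few_5, few_10)
```

In this model, whether 10 stages beat 5 depends on the number of jobs and on the per-stage overhead; real pipelines also face unbalanced stages and stalls, which the sketch ignores.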