Title: Instruction Set Design
1. Instruction Set Design
- Vincent H. Berk
- September 29th, 2008
- Reading for today: Chapter 1.5-1.11, Mazor article
- Reading for Wednesday: Appendix B.1-B.11, Wulf article
- Homework for Wednesday: 1.1, 1.3, 1.6, 1.7, 1.13
2. Instruction Sets
[Figure: layered diagram with software on top, the instruction set in the middle, and hardware at the bottom]
3. Interface Design
- A good interface:
  - Lasts through many implementations (portability, compatibility).
  - Is used in many different ways (generality).
  - Provides convenient functionality to higher levels.
  - Permits an efficient implementation at lower levels.
4. Evolution of Instruction Sets
- Single Accumulator (EDSAC, 1950)
- Accumulator + Index Registers (Manchester Mark I, IBM 700 series, 1953)
- Separation of Programming Model from Implementation, branching into:
  - High-level Language Based (B5000, 1963)
  - Concept of a Family (IBM 360, 1964)
- General Purpose Register Machines, branching into:
  - Complex Instruction Sets (VAX, Intel 432, 1977-80), leading to CISC (Intel x86, Pentium II/III/4, Core 2, AMD Athlon/Opteron)
  - Load/Store Architecture (CDC 6600, Cray 1, 1963-76), leading to RISC (MIPS, SPARC, 88000, IBM RS6000, 1987)
5. Evolution of Instruction Sets
- Major advances in computer architecture are typically associated with landmark instruction set designs.
  - Example: stack vs. general-purpose registers (GPR)
- Design decisions must take into account:
  - technology
  - machine organization
  - programming languages
  - compiler technology
  - operating systems
- Few will ever design an instruction set, but understanding ISA design decisions is important.
6. Design Space of ISA
- Five primary dimensions:
  - Number of explicit operands (0, 1, 2, 3): the ISA class
  - Operand storage: where besides memory?
  - Effective address: how is the memory location specified?
  - Type and size of operands: byte, int, float, vector, ...; 32-bit or 64-bit? How is it specified?
  - Operations: add, sub, mul, ...; how is it specified?
- Other aspects:
  - Successor: how is the next instruction specified?
  - Conditions: how are they determined?
  - Encodings: fixed or variable? How wide?
  - Parallelism
7. Basic ISA Classes
- Accumulator
  - 1 address: add A (acc ← acc + mem[A])
  - 1+x address: addx A (acc ← acc + mem[A + x])
- Stack
  - 0 address: add (tos ← tos + next)
- General Purpose Register
  - 2 address: add A B (A ← A + B)
  - 3 address: add A B C (A ← B + C)
- Load/Store
  - 3 address: add Ra Rb Rc (Ra ← Rb + Rc)
  - load Ra Rb (Ra ← mem[Rb])
  - store Ra Rb (mem[Rb] ← Ra)
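As a rough illustration of how the classes differ, the Python sketch below computes C = A + B on toy models of each class. Memory is modeled as a dict, and the `push`/`pop` steps and the direct-address `load`/`store` steps are assumptions added so that each sequence is complete (the table above only gives load/store in register-indirect form). It counts instructions, not instruction bytes or memory traffic.

```python
# Toy models of four ISA classes computing C = A + B.
# Memory is a dict from symbolic addresses to values (an assumption for brevity).
# Each function returns the number of instructions it "executed".

def accumulator(mem):
    acc = mem["A"]          # load A   : acc <- mem[A]
    acc = acc + mem["B"]    # add B    : acc <- acc + mem[B]
    mem["C"] = acc          # store C  : mem[C] <- acc
    return 3

def stack(mem):
    s = []
    s.append(mem["A"])      # push A (assumed instruction)
    s.append(mem["B"])      # push B
    tos = s.pop()
    nxt = s.pop()
    s.append(nxt + tos)     # add      : tos <- tos + next
    mem["C"] = s.pop()      # pop C (assumed instruction)
    return 4

def gpr_3addr(mem):
    mem["C"] = mem["A"] + mem["B"]   # add C A B : C <- A + B, memory operands
    return 1

def load_store(mem):
    r = [0] * 4             # tiny register file
    r[1] = mem["A"]         # load  R1, A
    r[2] = mem["B"]         # load  R2, B
    r[3] = r[1] + r[2]      # add   R3, R1, R2 (registers only)
    mem["C"] = r[3]         # store R3, C
    return 4

for machine in (accumulator, stack, gpr_3addr, load_store):
    mem = {"A": 2, "B": 5, "C": 0}
    n = machine(mem)
    print(f"{machine.__name__:12s} C = {mem['C']}  instructions = {n}")
```

The memory-to-memory GPR form needs one instruction but must encode three memory addresses in it, while the load/store form needs four short instructions; that tradeoff between instruction count and instruction size recurs throughout the rest of the lecture.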
8. Primary Advantages and Disadvantages of Each Class of Machine
- Stack
  - Advantage: A simple model of expression evaluation (reverse Polish). Short instructions can yield good code density.
  - Disadvantage: A stack cannot be randomly accessed. This limitation makes it difficult to generate efficient code. It is also difficult to implement efficiently, since the stack becomes a bottleneck.
- Accumulator
  - Advantage: Minimizes the internal state of the machine. Short instructions.
  - Disadvantage: Since the accumulator is only temporary storage, memory traffic is highest for this approach.
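The reverse Polish evaluation model can be sketched in a few lines: a hypothetical compiler turns an expression tree into postfix zero-address code, and a small interpreter runs that code on a stack. The tuple expression format and the opcode names (`push`, `add`, `sub`, `mul`) are illustrative assumptions, not any real stack machine's instruction set.

```python
import operator

# Expression trees: a variable name (str) or a tuple (op, left, right).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def compile_stack(expr):
    """Emit zero-address stack code in postfix (reverse Polish) order."""
    if isinstance(expr, str):
        return [("push", expr)]
    op, left, right = expr
    return compile_stack(left) + compile_stack(right) + [(op, None)]

def run_stack(code, mem):
    """Execute stack code; every operation works on the top two entries."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(mem[arg])
        else:
            tos = stack.pop()
            nxt = stack.pop()
            stack.append(OPS[op](nxt, tos))   # result replaces both operands
    return stack.pop()

expr = ("mul", ("add", "A", "B"), ("sub", "C", "D"))   # (A + B) * (C - D)
code = compile_stack(expr)
print([op if arg is None else f"{op} {arg}" for op, arg in code])
# ['push A', 'push B', 'add', 'push C', 'push D', 'sub', 'mul']
print(run_stack(code, {"A": 1, "B": 2, "C": 7, "D": 3}))   # 12
```

Every intermediate value passes through the top of the stack, which is why the code is compact and also why the stack becomes a bottleneck.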
9. Register
- Advantage: Most general model for code generation.
- Disadvantage: All operands must be named, leading to longer instructions.
- While most early machines used stack or accumulator-style architectures, modern machines (designed in the last 10-15 years and still in use) use a general-purpose register architecture:
  - Registers are faster than memory.
  - Registers are easier for compilers to use.
  - Registers can be used more effectively than other forms of internal storage.
10. Machine Types
12. Addressing Modes
- Register: Ri
- Immediate (literal): v
- Direct (absolute): M[v]
- Base+Displacement: M[Ri + v]
- Register indirect: M[Ri]
- Base+Index (Indexed): M[Ri + Rj]
- Scaled Index: M[Ri + Rj*d + v]
- Autoincrement: M[Ri++]
- Autodecrement: M[Ri--]
- Memory indirect: M[M[Ri]]
(M[...] denotes a memory access; Ri and Rj are registers in the register file.)
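Each mode above is a different recipe for locating an operand. A minimal sketch, assuming a register file modeled as a Python list and memory as a dict of addresses; the function name `operand` and its parameters are hypothetical. Autoincrement and autodecrement here both access memory first and then adjust Ri, following the M[Ri++] and M[Ri--] notation; some machines, such as the VAX, decrement before the access instead.

```python
def operand(mode, regs, mem, i=0, j=0, v=0, d=1, size=4):
    """Fetch an operand using one of the addressing modes listed above.

    regs: list of register values (the register file)
    mem:  dict mapping address -> value (memory)
    i, j: register numbers; v: immediate value or displacement
    d:    scale factor for the scaled-index mode
    size: operand size in bytes, used by autoincrement/autodecrement
    """
    if mode == "register":          # Ri
        return regs[i]
    if mode == "immediate":         # v
        return v
    if mode == "direct":            # M[v]
        return mem[v]
    if mode == "base_disp":         # M[Ri + v]
        return mem[regs[i] + v]
    if mode == "reg_indirect":      # M[Ri]
        return mem[regs[i]]
    if mode == "indexed":           # M[Ri + Rj]
        return mem[regs[i] + regs[j]]
    if mode == "scaled":            # M[Ri + Rj*d + v]
        return mem[regs[i] + regs[j] * d + v]
    if mode == "autoinc":           # M[Ri++]: access, then Ri += size
        value = mem[regs[i]]
        regs[i] += size
        return value
    if mode == "autodec":           # M[Ri--]: access, then Ri -= size
        value = mem[regs[i]]
        regs[i] -= size
        return value
    if mode == "mem_indirect":      # M[M[Ri]]
        return mem[mem[regs[i]]]
    raise ValueError(f"unknown addressing mode: {mode}")

regs = [0, 100, 2, 108]
mem = {100: 7, 102: 5, 104: 9, 108: 100, 112: 11}
print(operand("scaled", regs, mem, i=1, j=2, d=4, v=4))   # M[100 + 2*4 + 4] = M[112] -> 11
```

Note how memory indirect costs two memory accesses for one operand, which is one reason RISC designs drop it.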
13. Addressing Modes
14. Memory Alignment
- Processors often require data types to be aligned on addresses that are a multiple of their size:
  - address mod sizeof(datatype) = 0
  - Bytes can be aligned anywhere.
  - 4-byte integers are aligned on addresses divisible by 4.
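The alignment rule is a single modulus test. This sketch checks it directly (the helper `is_aligned` is a hypothetical name) and then uses Python's standard `struct` module to show the padding alignment costs: the `=` prefix lays out a `char` followed by an `int` with no padding, while `@` applies the host's native alignment, which on common hosts inserts 3 padding bytes so that the int starts at a multiple of 4.

```python
import struct

def is_aligned(address: int, size: int) -> bool:
    """Natural alignment: address mod sizeof(datatype) == 0."""
    return address % size == 0

# A byte is aligned anywhere; a 4-byte int only at multiples of 4.
print([a for a in range(8) if is_aligned(a, 1)])   # [0, 1, 2, 3, 4, 5, 6, 7]
print([a for a in range(8) if is_aligned(a, 4)])   # [0, 4]

# '=' packs with standard sizes and no padding: 1 (char) + 4 (int) bytes.
# '@' uses native alignment, so the layout may include padding after the char.
print(struct.calcsize("=ci"), struct.calcsize("@ci"))
```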
Byte Order
- Little Endian: little end first (Intel)
- Big Endian: big end first (PowerPC, MIPS, network byte order)
- Bi-Endian: can do both (SPARC v9)
[Figure: the four bytes of a word shown in the two orders, D C B A and A B C D]
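Python's standard `struct` module makes the two byte orders easy to see: the `<` prefix packs little end first, `>` packs big end first, and `!` is network byte order, which is big-endian. The value 0x0A0B0C0D is an arbitrary choice whose bytes read A, B, C, D from most to least significant, mirroring the labels above.

```python
import struct
import sys

value = 0x0A0B0C0D   # most significant byte 0x0A ("A") ... least significant 0x0D ("D")

little = struct.pack("<I", value)   # little end first: D C B A
big = struct.pack(">I", value)      # big end first: A B C D
print(little.hex(" "), "|", big.hex(" "))   # 0d 0c 0b 0a | 0a 0b 0c 0d

# Network byte order ('!') is the same as big-endian.
print(struct.pack("!I", value) == big)      # True

# Reading little-endian bytes as big-endian yields a different number.
print(hex(struct.unpack(">I", little)[0]))  # 0xd0c0b0a

print(sys.byteorder)   # byte order of the machine running this script
```

This mismatch is exactly what happens when a little-endian host reads a network packet without converting from network byte order.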
15. Operations in the Instruction Set
- Arithmetic and logical: integer arithmetic and logical operations (add, and, subtract, or)
- Data transfer: loads/stores (move instructions on machines with memory addressing)
- Control: branch, jump, procedure call and return, traps
- System: operating system call, virtual memory management instructions
- Floating point: floating-point operations (add, multiply)
- Decimal: decimal add, decimal multiply, decimal-to-character conversions
- String: string move, string compare, string search
- Graphics: pixel and vertex operations
17. Control Flow
- PIC: position-independent code
- Caller vs. callee saving of state
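Position-independent code typically relies on branch targets encoded relative to the program counter rather than as absolute addresses. The sketch below uses a made-up three-instruction program in which every instruction occupies one address; the tuple format and the `nop`/`jmp_abs`/`jmp_rel` opcodes are hypothetical. It shows that relocating the code breaks the absolute jump but not the PC-relative one.

```python
# Hypothetical program: a list of (opcode, operand) pairs, one per address.
# jmp_abs jumps to a fixed address; jmp_rel jumps to PC + offset.
def jump_target(program, base, index):
    """Address reached by the jump at program[index] when the program
    is loaded starting at address `base`."""
    opcode, operand = program[index]
    pc = base + index
    if opcode == "jmp_abs":
        return operand            # ignores where the code was loaded
    if opcode == "jmp_rel":
        return pc + operand       # relative to the jump's own address
    raise ValueError(f"not a jump: {opcode}")

# Both jumps were written to reach the nop at offset 0, assuming the
# program would be loaded at address 1000.
program = [("nop", None), ("jmp_abs", 1000), ("jmp_rel", -2)]

for base in (1000, 5000):
    target = base + 0             # where the nop actually lives
    print(base,
          jump_target(program, base, 1) == target,   # absolute: correct only at 1000
          jump_target(program, base, 2) == target)   # PC-relative: correct anywhere
```

A loader could instead patch every absolute address at load time; PIC avoids that work, which matters for shared libraries mapped at different addresses in different processes.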
18. Instruction Set Encoding
- Affects program size:
  - Number of instructions: size of the opcode
  - Number of instructions, types of instructions
  - Number of operands
  - Number of registers: size of the operand fields
- Variable instruction length vs. fixed instruction length
  - Intel x86 instructions are between 1 and 17 bytes long.
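The link between the number of opcodes or registers and field width is a base-2 logarithm: naming n distinct items needs ceil(log2 n) bits. A small sketch with hypothetical helper names, using integer `bit_length` to avoid floating-point rounding:

```python
def field_bits(count: int) -> int:
    """Minimum field width that can name `count` distinct items: ceil(log2(count))."""
    return (count - 1).bit_length()

def bits_used(n_opcodes: int, n_registers: int, n_operands: int) -> int:
    """Width of one opcode field plus n_operands register-specifier fields."""
    return field_bits(n_opcodes) + n_operands * field_bits(n_registers)

# 64 opcodes, 32 registers, 3 register operands:
print(bits_used(64, 32, 3))    # 6 + 3*5 = 21
# Doubling the register file costs one bit per operand field:
print(bits_used(64, 64, 3))    # 6 + 3*6 = 24
```

With 64 opcodes and 32 registers, a 3-register instruction needs 21 bits, which fits comfortably in a fixed 32-bit format; the MIPS R-type layout spends the remaining 11 bits on its shamt and funct fields.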
20. RISC vs. CISC
- RISC: Reduced Instruction Set Computer
  - Small instruction sets
  - Fixed-length instructions that often execute in a single cycle
  - Operations performed only on registers
  - Simpler chip that can run at a higher clock speed
- CISC: Complex Instruction Set Computer
  - Large instruction sets
  - Complex, variable-length instructions
  - Memory-to-memory operations
21. Design Principles: CISC (Patterson, 1985)
- Richer instruction sets would simplify compilers.
- Richer instruction sets would alleviate the software crisis.
- Richer instruction sets would improve architecture quality.
- Since execution speed was proportional to program size, architectural techniques that led to smaller programs also led to faster computers.
22. Design Principles: RISC (Patterson, 1985)
- Functions should be kept simple unless there is a very good reason to do otherwise.
- Simple decoding and pipelined execution are more important than program size.
- Compiler technology should be used to simplify instructions rather than to generate complex instructions.
23. A Typical RISC (Patterson)
- 32-bit fixed-format instructions (3 formats)
- 32 32-bit general-purpose registers (R0 contains zero; double-precision numbers take two registers)
- Single addressing mode for load/store: base + displacement (no indirection)
- Simple branch conditions
- Delayed branch to avoid pipeline penalties
- Examples: DLX, SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM/Motorola PowerPC, Motorola M88000
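Hardwiring R0 to zero lets a small instruction set synthesize extra operations for free; a register move, for instance, becomes an add with R0. The class below is a hypothetical toy model, not a real simulator's API, and it ignores 32-bit wraparound.

```python
class RegisterFile:
    """Toy 32-register file in which R0 always reads as zero.

    Writes to R0 are silently discarded, as on MIPS/DLX-style machines.
    """
    def __init__(self, n: int = 32):
        self._regs = [0] * n

    def read(self, r: int) -> int:
        return 0 if r == 0 else self._regs[r]

    def write(self, r: int, value: int) -> None:
        if r != 0:                 # writes to R0 are ignored
            self._regs[r] = value

def add(rf: RegisterFile, rd: int, rs: int, rt: int) -> None:
    """add rd, rs, rt : rd <- rs + rt"""
    rf.write(rd, rf.read(rs) + rf.read(rt))

rf = RegisterFile()
rf.write(5, 42)
add(rf, 6, 5, 0)      # "move R6, R5" expressed as add R6, R5, R0
add(rf, 0, 5, 5)      # result discarded: R0 stays zero
print(rf.read(6), rf.read(0))   # 42 0
```

The same trick gives "clear a register" (add Rd, R0, R0) and, with an immediate form, "load a small constant" without dedicated opcodes.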
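24. MIPS Instruction Formats (DLX)
- R-type: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
- I-type: op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate/address (15-0, 16 bits)
- J-type: op (bits 31-26, 6 bits) | target address (25-0, 26 bits)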
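Packing and unpacking these fixed fields is just shifting and masking, which is part of why fixed formats decode so cheaply. A minimal sketch of the MIPS R-type layout above (DLX lays out its fields somewhat differently); the helper names `encode_r`/`decode_r` are hypothetical, while the register numbers ($t0 = 8, $t1 = 9, $t2 = 10) and the funct code 0x20 for `add` are the standard MIPS values.

```python
# (name, shift, width) for each field of the MIPS R-type format.
R_FIELDS = [("op", 26, 6), ("rs", 21, 5), ("rt", 16, 5),
            ("rd", 11, 5), ("shamt", 6, 5), ("funct", 0, 6)]

def encode_r(**fields: int) -> int:
    """Pack named R-type fields into a 32-bit instruction word."""
    word = 0
    for name, shift, width in R_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word

def decode_r(word: int) -> dict:
    """Unpack a 32-bit word into its R-type fields by shifting and masking."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in R_FIELDS}

# add $t0, $t1, $t2  ->  rd = 8 ($t0), rs = 9 ($t1), rt = 10 ($t2), funct = 0x20
word = encode_r(op=0, rs=9, rt=10, rd=8, shamt=0, funct=0x20)
print(f"0x{word:08X}")      # 0x012A4020
print(decode_r(word))
```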
25. Impact of Compiler Technology on Architecture Decisions
- The interaction of compilers and high-level languages significantly affects how programs use an instruction set.
- 1. How are variables allocated and addressed? How many registers are needed to allocate variables appropriately?
- 2. What is the impact of optimization techniques on instruction mixes?
- 3. What control structures are used, and with what frequency?
26. Instruction Set Properties That Simplify Compiler Writing
- 1. Provide regularity.
- 2. Provide primitives, not solutions.
- 3. Simplify tradeoffs among alternatives.
- 4. Provide instructions that bind the quantities known at compile time as constants.
27. DEC VAX: The Penultimate CISC
- VAX-11/780 introduced in 1977
- 2 goals:
  - 32-bit extension of the PDP-11 architecture (make customers comfortable)
  - ease the task of writing compilers and operating systems
- General-purpose register machine with a large, orthogonal instruction set
- 16 general-purpose registers (4 reserved)
- Large number of addressing modes, large number of instructions
28. DEC VAX (continued)
- Any combination of addressing modes works with nearly every opcode.
- Variable-length instructions
- A 3-operand instruction may have 0 to 3 operand memory references, each of which may use any of the addressing modes.
- Elaborate instructions can take dozens of clock cycles.
29. IBM 360/370
- 360 introduced in 1964: first to use the notion of instruction set architecture (370 introduced in 1970 as successor to 360)
- Goals:
  - exploit storage: large main storage, storage hierarchies
  - support concurrent I/O
  - create a general-purpose machine with new OS facilities and many data types
  - maintain strict upward and downward machine-language compatibility
- 32-bit machine with byte addressability and support for a variety of data types
30. IBM 360/370 (continued)
- 16 32-bit general-purpose registers
- 4 double-precision (64-bit) floating-point registers
- 5 instruction formats, each associated with a single addressing mode
- Basic operations:
  - logic operations on bits, character strings, and fixed words
  - decimal or character operations on strings of characters or decimal digits
  - fixed-point binary arithmetic
  - floating-point arithmetic
31. IBM 360
32. Cray
33. ISA Metrics
- Regularity (orthogonality)
  - No special registers, few special cases, all operand modes available with any data type or instruction type
- Primitives rather than solutions
- Completeness
  - Support for a wide range of operations and target applications
- Streamlined
  - Resource needs easily determined
- Ease of compilation
- Ease of implementation
- Scalability
- Density (network bandwidth and power consumption)