Title: Instruction Set Architectures
1Instruction Set Architectures
2Computer Architecture is ...
Instruction Set Architecture
Organization
Hardware
3The Big Picture
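[Figure: levels of abstraction, top to bottom: Requirements, Algorithms, Prog. Lang./OS, ISA, uArch, Circuit, Device]
[Figure labels: SPEC, Problem Focus, Performance Focus]
[Figure example, source code: f2() calling f3(s2, j, i), with accesses to s2->p and s2->q]
[Figure example, instructions: i1: ld r1, b <p1>; i2: ld r2, c <p1>; i3: ld r5, z <p3>; i4: mul r6, r5, 3 <p3>; i5: add r3, r1, r2 <p1>]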
4Instruction Set Architecture (ISA)
- To use the hardware of a computer, we must speak its language.
- The words of a computer's language are called instructions, and its vocabulary is called an instruction set.
- Instruction set of a computer: the portion of the computer visible to the assembly-level programmer or to the compiler writer.
5Instruction Set Architecture
Application
Instruction Set Architecture: SPARC, MIPS, ARM, x86, HP-PA, IA-64
Implementation: Intel Pentium X, Core 2; AMD K6, Athlon, Opteron; Transmeta Crusoe TM5x00
6Interface Design
- A good interface
- Lasts through many implementations (portability, compatibility)
- Is used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels
[Figure: one interface over time, with several uses (use) above it and successive implementations (imp 1, imp 2, imp 3) below it]
7Instruction Set Architecture
- Strong influence on cost/performance
- Remember the Execution Time formula?
- ISA influences everything!
- New ISAs are rare, but versions are not
- 16-bit, 32-bit, and 64-bit x86 versions
- Longevity is a strong function of marketing
prowess
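The execution time formula referred to above is the usual product of instruction count, cycles per instruction (CPI), and cycle time. A minimal sketch follows; the function name and all numbers are hypothetical, chosen only to show how an ISA that needs more instructions can still win if its CPI is lower.

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Execution time = instruction count x cycles per instruction x seconds per cycle."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical numbers for one program compiled for two ISAs, both at 1 GHz.
dense_isa = execution_time(1_000_000, 2.0, 1e-9)   # fewer, more complex instructions
simple_isa = execution_time(1_300_000, 1.2, 1e-9)  # more, simpler instructions

print(dense_isa, simple_isa)
```

The ISA affects all three factors: instruction count directly, and CPI and cycle time through the hardware needed to implement it.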
8Instruction Set Architecture
- What is an instruction set architecture?
- Just a set of instructions
- Each instruction is directly executed by the CPU hardware.
- How is it represented?
- By a binary format, typically bits, bytes, words.
- Word size is typically 16, 32, or 64 bits today.
- Length format options
- Fixed: each instruction is encoded in the same size field
- Variable: half-word, whole-word, and multiple-word instructions are possible
9Instruction Formats
Alpha (fixed length): 32 bits per instruction, 6-bit opcode
TRAP:    opcode
Branch:  opcode, RA
Mem:     opcode, RA, RB
Operate: opcode, RA, RB, RC
x86 (variable length): prefixes, opcode, addr mode, displ, imm
prefixes:   0 to 4 bytes of prefix
opcode:     1 or 2 bytes of opcode
addr mode:  0 to 2 bytes (ModR/M and SIB)
displ, imm: 0 to 8 bytes
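With a fixed-length format, decoding is a matter of shifting and masking at known bit positions. The sketch below packs and unpacks the opcode, RA, RB, and RC fields of a 32-bit operate-style word. The 6-bit opcode and 32-bit width come from the slide; the 5-bit register fields and their exact positions are assumptions for illustration, and the function names are hypothetical.

```python
def encode_operate(opcode, ra, rb, rc):
    """Pack fields into one 32-bit word (assumed layout, see comments below)."""
    return (opcode << 26) | (ra << 21) | (rb << 16) | rc

def decode_operate(word):
    """Split a 32-bit word back into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # top 6 bits
        "ra": (word >> 21) & 0x1F,      # assumed 5-bit field
        "rb": (word >> 16) & 0x1F,      # assumed 5-bit field
        "rc": word & 0x1F,              # assumed 5-bit field in the low bits
    }

word = encode_operate(0x10, 1, 2, 3)
fields = decode_operate(word)
print(hex(word), fields)
```

A variable-length format such as x86 cannot be decoded this way: the decoder must examine the prefixes and opcode before it knows where the remaining fields start.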
10Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
High-level Language Based (B5000 1963)
Concept of a Family (IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets (Vax, Intel 432 1977-80)
Load/Store Architecture (CDC 6600, Cray 1 1963-76)
CISC (Intel x86 1980-199x)
RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC 1987)
Mixed CISC and RISC? (IA-64 ... 1999)
11Instruction characteristics
- Usually a simple operation
- Identified by the op-code field
- But operations require operands: 0, 1, 2
- To identify where they are, they must be addressed at some piece of storage
- Typically main memory or registers
- Each has its own particular organization
- 2 options: implicit or explicit addressing
- Implicit: the op-code implies the address of the operands
- Explicit: the address is specified in some field of the instruction
12Instruction Set Architecture What Must be
Specified?
- Instruction Format or Encoding
- how is it decoded?
- Location of operands and result
- where other than memory?
- how many explicit operands?
- how are memory operands located?
- which can or cannot be in memory?
- Data type and Size
- Operations
- which are supported?
- Successor instruction
- jumps, conditions, branches
- fetch-decode-execute is implicit!
13Classifying ISA
- Based on CPU internal storage options AND number of operands.
- These choices critically affect number of instructions, CPI, and cycle time
14Basic ISA Classes
- Accumulator
- 1 address: add A (acc <- acc + Mem[A]).
- Stack
- 0 address: add (tos <- tos + second of stack).
- General Purpose Register
- 2 addresses: add A, B (EA(A) <- EA(A) + EA(B))
- 3 addresses: add A, B, C (EA(A) <- EA(B) + EA(C))
- Load/Store (register-register)
- ALU operations: no memory reference.
- 3 addresses: add R1, R2, R3 (R1 <- R2 + R3)
- load R1, R2 (R1 <- Mem[R2])
- store R1, R2 (Mem[R1] <- R2)
- Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?
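To make the comparison concrete, the sketch below mimics the instruction sequence each class would use for C = A + B and counts the instructions. This is an illustrative model, not a simulator of any real machine; the function names and the memory dictionary are hypothetical, and the register-memory case is the one-memory-operand form of the general purpose register class.

```python
def stack_machine(mem):
    stack = []
    stack.append(mem["A"])                   # push A
    stack.append(mem["B"])                   # push B
    stack.append(stack.pop() + stack.pop())  # add (0 addresses)
    mem["C"] = stack.pop()                   # pop C
    return 4                                 # instructions executed

def accumulator_machine(mem):
    acc = mem["A"]        # load A
    acc = acc + mem["B"]  # add B (1 address)
    mem["C"] = acc        # store C
    return 3

def register_memory_machine(mem):
    r1 = mem["A"]         # load R1, A
    r1 = r1 + mem["B"]    # add R1, B (one memory operand)
    mem["C"] = r1         # store C, R1
    return 3

def load_store_machine(mem):
    r1 = mem["A"]         # load R1, A
    r2 = mem["B"]         # load R2, B
    r3 = r1 + r2          # add R3, R1, R2 (no memory reference)
    mem["C"] = r3         # store C, R3
    return 4

results = {}
for machine in (stack_machine, accumulator_machine,
                register_memory_machine, load_store_machine):
    mem = {"A": 2, "B": 3, "C": 0}
    count = machine(mem)
    results[machine.__name__] = (mem["C"], count)

print(results)
```

Instruction count alone does not settle the comparison: bytes per instruction and cycles per instruction differ between the classes as well.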
15Operand Locations in Four ISA Classes
16Comparison of ISA Classes
- Memory efficiency? Instruction access? Data
access?
17General Purpose Register
- All machines after 1975 use general purpose registers.
- Advantages of registers
- Registers are faster than memory.
- Registers are easier for a compiler to use.
- E.g., (A*B) - (C*D) - (E*F): with registers the multiplications can be done in any order, but with a stack?
- Registers can hold variables.
- Memory traffic is reduced.
- Code density improves (since a register name takes fewer bits than a memory address).
18 How Many Registers?
- Depends on
- Compiler ability
- Program characteristics
- Lots of registers enable two important optimizations
- Register allocation (more variables can be in registers)
- Limiting reuse of registers improves parallelism
- Reuse example
- Load R2, A; Load R3, B; Load R4, C; Load R5, D
- Add R1, R2, R3
- Add R2, R5, R4 (reuse of R2)
- vs.
- Add R1, R2, R3
- Add R6, R4, R5 (no reuse: R6 is available)
- Without reuse, the Adds are parallelizable if there are two adders
- With reuse, the conflict artificially serializes the two instructions
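The conflict in the reuse example can be detected mechanically by comparing register names. The sketch below is a minimal check, assuming each instruction is written as a (destination, source 1, source 2) tuple; the function name is hypothetical, and the labels RAW, WAR, and WAW are the standard hazard names, which the slides do not use.

```python
def name_conflicts(first, second):
    """List the register conflicts that stop `second` from running alongside `first`.

    Each instruction is a (dest, src1, src2) tuple of register names.
    """
    d1, *s1 = first
    d2, *s2 = second
    conflicts = []
    if d1 in s2:
        conflicts.append("RAW")  # second reads what first writes (true dependence)
    if d2 in s1:
        conflicts.append("WAR")  # second overwrites a register first still reads
    if d1 == d2:
        conflicts.append("WAW")  # both write the same register
    return conflicts

# Add R1, R2, R3 followed by Add R2, R5, R4 (reuse of R2)
reuse = name_conflicts(("R1", "R2", "R3"), ("R2", "R5", "R4"))
# Add R1, R2, R3 followed by Add R6, R4, R5 (no reuse)
no_reuse = name_conflicts(("R1", "R2", "R3"), ("R6", "R4", "R5"))

print(reuse, no_reuse)
```

The reuse case reports only a WAR conflict: no data actually flows between the two Adds, so the serialization is an artifact of running out of register names.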
19Examples of Register Usage
- Typical ALU Instructions
- MIPS: add Rd, Rs, Rt -> (0,3)
- 80x86: ADD AL, [SI] -> (1,2)
- VAX: CMPB (R0), (R0) -> (2,2)
20Register Register (0,3)
- Notation (m,n): m memory operands, n total operands in an ALU instruction
- ALU is register to register
- Advantages
- Simple fixed-length instruction encoding
- Decode is simple since there are few instruction types
- Simple code generation model
- Instruction CPI tends to be very uniform
- Disadvantages
- Instruction count tends to be higher
- Some instructions are short, wasting instruction word bits
21Register-Memory (1,2)
- Evolved RISC and also old CISC
- Register-Memory ALU architecture
- Advantages
- ALU instructions can access data in memory directly, without loading it first
- Instruction format is relatively simple
- Density is improved over the Register-Register (0,3) model
- Disadvantages
- Operands are not equivalent: a source may be destroyed
- Need for a memory address field alongside the register fields
- CPI will vary
22Memory-Memory (3,3)
- True Memory-Memory architecture
- True and most complex CISC model
- Currently extinct
- Advantages
- Most compact
- Doesn't waste registers for temporary values
- Disadvantages
- Large variation in instruction size
- Large variation in CPI
23Summary Classifying ISA
- Computers should use general purpose registers
- Computers should use a load-store architecture
24Memory Addressing
- All architectures clearly need some way to address memory
- What is accessed: byte, word, multiple words?
- Since the 1980s, almost all machines are byte addressable
- But main memory is organized in 32- to 64-bit lines to match the cache model
- Can a word be placed on any boundary?
- Accessing a word or double word that crosses 2 lines requires 2 references
- Automatic alignment is possible but hides the number of references
25Addressing Objects Endianness and Alignment
- Two conventions for ordering bytes within a word
- Big Endian: address of most significant byte = word address (xx00 = Big End of word)
- IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
- Little Endian: address of least significant byte = word address (xx00 = Little End of word)
- Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
26Addressing Objects
- Endian Wars
- Order of bytes in words
- Big Endian: MSB at address xxxxx00
- Little Endian: MSB at address xxxxx11
- Big Endian (IBM, Motorola)
- Little Endian (DEC, Intel)
- The main problem with the two ordering schemes arises when two computers with different orderings have to communicate with each other (a hardware or software solution has to be provided to perform the proper conversion).
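The two byte orders, and the software conversion between them, can be seen with Python's standard struct module. This is a small illustration; the 32-bit value is arbitrary.

```python
import struct

value = 0x0A0B0C0D

big = struct.pack(">I", value)     # most significant byte at the lowest address
little = struct.pack("<I", value)  # least significant byte at the lowest address

# Reading big-endian bytes as if they were little-endian reverses the byte order,
# which is the error a conversion step has to undo.
swapped = int.from_bytes(big, "little")

print(big.hex(), little.hex(), hex(swapped))
```

This is why data exchanged between machines is normally sent in one agreed byte order, with each side converting to and from its own.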
27Objects Alignment
- Alignment requires that objects fall on an address that is a multiple of their size.
- Misalignment causes hardware complications, since memory is aligned on a word or double-word boundary
28Objects Alignment
- Byte alignment
- Any access is accommodated
- Word alignment
- Only accesses that are aligned at natural word boundaries are accommodated, due to DRAM/SRAM organization
- Reduces number of reads/writes to memory
- Eliminates hardware for alignment (typically expensive)
- Often handle misalignment via software
- Compiler detects it and generates appropriate instructions
[Figure: memory as bytes 0 to 7, word size 4 bytes]
Asking for words beginning at 0 or 4 is OK
Asking for other words requires two reads (e.g., ask for word starting at 2)
Unaligned access to bytes 2 to 5: read 1 fetches bytes 0 to 3, read 2 fetches bytes 4 to 7, then bytes 2, 3, 4, 5 are selected and reordered into one word
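The two-read case in the figure can be sketched as follows, assuming a memory that only delivers aligned 4-byte words. The function names are hypothetical, and real hardware would do the selection with shifters and multiplexers, not byte slicing.

```python
WORD_SIZE = 4  # bytes, as in the figure

def is_aligned(address, size):
    """An object is aligned if its address is a multiple of its size."""
    return address % size == 0

def word_reads(address, size=WORD_SIZE, word_size=WORD_SIZE):
    """Number of aligned word reads needed to fetch `size` bytes at `address`."""
    first_word = address // word_size
    last_word = (address + size - 1) // word_size
    return last_word - first_word + 1

def unaligned_load(memory, address, word_size=WORD_SIZE):
    """Fetch one word at any address using only aligned reads."""
    start = (address // word_size) * word_size
    fetched = b""
    for _ in range(word_reads(address, word_size, word_size)):
        fetched += memory[start:start + word_size]  # one aligned read
        start += word_size
    offset = address % word_size
    return fetched[offset:offset + word_size]       # select the requested bytes

memory = bytes(range(8))  # bytes 0..7, each holding its own address
print(word_reads(0), word_reads(2), list(unaligned_load(memory, 2)))
```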
29Addressing Modes
- An important aspect of ISA design
- Has major impact on both the HW complexity and the instruction count
- HW complexity affects the CPI and the cycle time
- Basically a set of mappings
- From address specified to address used
- Address used = effective address
- Effective address may go to memory or to a register array
- Effective address generation is an important focus
30Typical Memory Addressing Modes
Addressing mode    Sample instruction     Meaning
Register           Add R4, R3             Regs[R4] <- Regs[R4] + Regs[R3]
Immediate          Add R4, #3             Regs[R4] <- Regs[R4] + 3
Displacement       Add R4, 10(R1)         Regs[R4] <- Regs[R4] + Mem[10 + Regs[R1]]
Indirect           Add R4, (R1)           Regs[R4] <- Regs[R4] + Mem[Regs[R1]]
Indexed            Add R3, (R1 + R2)      Regs[R3] <- Regs[R3] + Mem[Regs[R1] + Regs[R2]]
Absolute           Add R1, (1001)         Regs[R1] <- Regs[R1] + Mem[1001]
Memory indirect    Add R1, @(R3)          Regs[R1] <- Regs[R1] + Mem[Mem[Regs[R3]]]
Autoincrement      Add R1, (R2)+          Regs[R1] <- Regs[R1] + Mem[Regs[R2]]; Regs[R2] <- Regs[R2] + d
Autodecrement      Add R1, -(R2)          Regs[R2] <- Regs[R2] - d; Regs[R1] <- Regs[R1] + Mem[Regs[R2]]
Scaled             Add R1, 100(R2)[R3]    Regs[R1] <- Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] * d]
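Each memory addressing mode is a different rule for turning the fields of an instruction into an effective address. The sketch below covers the modes that produce a memory address without side effects; the function signature, the register and memory dictionaries, and the operand size d = 4 are assumptions for illustration.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, d=4):
    """Return the memory address an operand resolves to.

    regs maps register names to values, mem maps addresses to values,
    and d is the operand size in bytes (used only by the scaled mode).
    """
    if mode == "displacement":
        return disp + regs[base]
    if mode == "register_indirect":
        return regs[base]
    if mode == "indexed":
        return regs[base] + regs[index]
    if mode == "absolute":
        return disp
    if mode == "memory_indirect":
        return mem[regs[base]]  # the register points at a pointer in memory
    if mode == "scaled":
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unsupported mode: {mode}")

regs = {"R1": 200, "R2": 8, "R3": 2}
mem = {200: 1001}

print(effective_address("displacement", regs, mem, base="R1", disp=10))
print(effective_address("scaled", regs, mem, base="R2", index="R3", disp=100))
```

Register and immediate modes are left out because their operands are not in memory, and autoincrement and autodecrement are left out because they also update the base register.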
31Addressing Modes Usage Example
For 3 programs running on VAX, ignoring direct register mode:
Displacement                    42% avg, 32% to 55%
Immediate                       33% avg, 17% to 43%
Register deferred (indirect)    13% avg, 3% to 24%
Scaled                          7% avg, 0% to 16%
Memory indirect                 3% avg, 1% to 6%
Misc                            2% avg, 0% to 3%
75% displacement and immediate; 88% displacement, immediate, and register indirect.
Observation: in addition to register direct, the displacement, immediate, and register indirect addressing modes are important.
32Benchmarks Show Mode Importance
Based on a VAX which supported everything from
SPEC89
33Utilization of Memory Addressing Modes
34Displacement Address Size Example
Avg. of 5 SPECint92 programs vs. avg. of 5 SPECfp92 programs
1% of addresses > 16 bits
12 to 16 bits of displacement needed
35Immediate Addressing Mode
About one quarter of data transfers and ALU
operations have an immediate operand for SPEC
CPU2000 programs.
36Immediate Addressing Mode
- 10 Programs from SPECInt92 and SPECfp92
37Immediate Addressing Mode
- 50% to 60% fit within 8 bits
- 75% to 80% fit within 16 bits
[Figure: programs shown: gcc, spice, TeX]
38Addressing Mode Summary
- Important data addressing modes
- Displacement
- Immediate
- Register Indirect
- Displacement size should be 12 to 16 bits.
- Immediate size should be 8 to 16 bits.
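Whether a constant fits a given displacement or immediate field is a simple range check. The sketch below assumes the field holds a two's complement signed value, which is an assumption here; the function names are hypothetical.

```python
def fits_signed(value, bits):
    """True if value is representable in a two's complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def min_signed_bits(value):
    """Smallest two's complement field width that can hold value."""
    bits = 1
    while not fits_signed(value, bits):
        bits += 1
    return bits

for constant in (3, 127, 128, -128, 32767, 40000):
    print(constant, min_signed_bits(constant), fits_signed(constant, 16))
```

Running a check like this over the constants in real programs is how measurements such as "75% to 80% fit within 16 bits" are obtained.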
39Instruction Operations
- Arithmetic and Logical
- add, subtract, and, or, etc.
- Data transfer
- Load, Store, etc.
- Control
- Jump, branch, call, return, trap, etc.
- Synchronization
- Test & Set.
- String
- string move, compare, search.
40Instruction Usage Example Top 10 Intel X86
Instructions
[Table: Rank (1 to 10) and Integer Average (percent of total executed); the table entries did not survive in this transcript]
Observation: simple instructions dominate instruction usage frequency.
41Instructions for Control Flow
Breakdown of control flow instructions into three classes (calls or returns, jumps, and conditional branches) for SPEC CPU2000 programs.
42Conditional Branch Distance
- Short displacement fields are often sufficient for branches
[Figure: branch distance, with series FP Average and Integer Average]
43Conditional Branch Addressing
- PC-relative, since most branch targets are close to the current PC address
- At least 8 bits.
- Compare Equal/Not Equal most important for integer programs.
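A PC-relative branch stores only a short signed displacement, which the hardware sign-extends and adds to the PC. The sketch below assumes 4-byte instructions and a displacement counted in instructions from the next instruction; both are illustrative choices, not something the slides specify, and the function names are hypothetical.

```python
def sign_extend(field, bits):
    """Interpret the low `bits` bits of field as a two's complement number."""
    field &= (1 << bits) - 1
    if field & (1 << (bits - 1)):
        return field - (1 << bits)
    return field

def branch_target(pc, disp_field, bits=8, instr_bytes=4):
    """Target = address of the next instruction + displacement in instructions."""
    return pc + instr_bytes + sign_extend(disp_field, bits) * instr_bytes

print(hex(branch_target(0x1000, 3)))     # forward branch
print(hex(branch_target(0x1000, 0xFF)))  # displacement field holds -1
```

With these assumptions an 8-bit field reaches 128 instructions backward and 127 forward.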
44Data Type and Size of Operands
- Byte, half word (16 bits), word (32 bits), double word (64 bits).
- Arithmetic
- Decimal: 4 bits per digit.
- Integers: 2's complement
- Floating-point: IEEE standard (single, double, extended precision).
45Type and Size of Operands
Distribution of data accesses by size for SPEC
CPU2000 benchmark programs
46Instruction Set Encoding
- Considerations affecting instruction set encoding
- The desire to have as many registers and addressing modes as possible.
- The impact of the size of the register and addressing mode fields on the average instruction size and on the average program size.
- The desire to encode instructions into lengths that will be easy to handle in the implementation: at a minimum, a multiple of bytes.
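The tension between more registers and instruction size can be put in numbers. The sketch below computes the bits needed to name a register and the resulting size of a three-register instruction; the 6-bit opcode and the register counts are hypothetical values chosen for illustration, and the function names are not from the slides.

```python
def register_field_bits(num_registers):
    """Bits needed to name one of num_registers registers."""
    return (num_registers - 1).bit_length()

def instruction_bits(opcode_bits, num_registers, operands=3):
    """Opcode plus one register field per operand."""
    return opcode_bits + operands * register_field_bits(num_registers)

def round_up_to_bytes(bits):
    """Smallest whole number of bytes that holds `bits` bits."""
    return -(-bits // 8)

for num_registers in (16, 32, 256):
    bits = instruction_bits(6, num_registers)
    print(num_registers, bits, round_up_to_bytes(bits))
```

Going from 32 to 256 registers adds three bits to every register field, which in this example pushes a three-operand instruction from 3 bytes to 4.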
47Instruction Format
- Fixed
- Operation, address specifier 1, address specifier 2, address specifier 3.
- MIPS, SPARC, Power PC.
- Variable
- Operation and no. of operands, address specifier 1, ..., address specifier n.
- VAX
- Hybrid
- Intel x86
- Operation, address specifier, address field.
- Operation, address specifier 1, address specifier 2, address field.
- Operation, address field, address specifier 1, address specifier 2.
- Summary
- If code size is most important, use variable format.
- If performance is most important, use fixed format.
48Three Examples of Instruction Set Encoding
Variable (e.g., VAX, 1-53 bytes):
Operation and no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
Fixed (e.g., DLX, MIPS, PowerPC, SPARC):
Operation | Address field 1 | Address field 2 | Address field 3
Hybrid (e.g., IBM 360/370, Intel 80x86):
Operation | Address specifier | Address field
Operation | Address specifier 1 | Address specifier 2 | Address field
Operation | Address specifier | Address field 1 | Address field 2
49Summary ISA
- Use general purpose registers with a load-store architecture.
- Support these addressing modes: displacement, immediate, register indirect.
- Support these simple instructions: load, store, add, subtract, move register, shift, compare equal, compare not equal, branch, jump, call, return.
- Support these data sizes: 8-, 16-, and 32-bit integers and IEEE FP standard.
- Provide at least 16 general purpose registers plus separate FP registers, and aim for a minimal instruction set.