Title: Instruction Set Architecture
- Pradondet Nilagupta
- Spring 2005
- (Original notes from Prof. Baniasadi, Prof. Shaaban, and Prof. Katz)
Outline
- ISA Introduction
- ISA Classification
- Memory Addressing
- Addressing Modes
- Operands
- Encoding ISA
Hot Topics in Computer Architecture
- 1950s and 1960s
  - Computer Arithmetic
- 1970s and 1980s
  - Instruction Set Design
  - ISA Appropriate for Compilers
- 1990s
  - Design of CPU
  - Design of memory system
  - Design of I/O system
  - Multiprocessors
  - Instruction Set Extensions
Instruction Set Architecture
- Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing-independent) program for that machine.
- The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.
Evolution of Instruction Sets
- Single Accumulator (EDSAC 1950)
- Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
- Separation of Programming Model from Implementation
  - High-level Language Based (B5000 1963)
  - Concept of a Family (IBM 360 1964)
- General Purpose Register Machines
  - Load/Store Architecture (CDC 6600, Cray 1 1963-76)
  - Complex Instruction Sets (Vax, Intel 432 1977-80)
- RISC (MIPS, SPARC, HP-PA, IBM RS6000, ... 1987)
Instruction Set Architecture
- The instruction set architecture serves as the interface between software and hardware (software above the instruction set, hardware below it)
Interface Design
- A good interface
  - Lasts through many implementations (portability, compatibility)
  - Is used in many different ways (generality)
  - Provides convenient functionality to higher levels
  - Permits an efficient implementation at lower levels
What Are the Components of an ISA? (1/2)
- Sometimes known as the programmer's model of the machine
- Storage cells
  - General and special purpose registers in the CPU
  - Many general purpose cells of the same size in memory
  - Storage associated with I/O devices
- The machine instruction set
  - The instruction set is the entire repertoire of machine operations
  - Makes use of storage cells, formats, and results of the fetch/execute cycle
  - i.e., register transfers
What Are the Components of an ISA? (2/2)
- The instruction format
  - Size and meaning of fields within the instruction
- The nature of the fetch-execute cycle
  - Things that are done before the operation code is known
Instruction Set Architecture
- Programmer's view: a computer program is a sequence of instructions such as ADD, SUBTRACT, AND, OR, COMPARE, ...
- Computer's view: the same instructions are bit patterns such as 01010, 01110, 10011, 10001, 11010, ..., handled by memory, CPU, and I/O
- Princeton (von Neumann) architecture
  - Data and instructions mixed in the same memory ("stored program computer")
  - Program as data (dubious advantage)
  - Storage utilization
  - Single memory interface
- Harvard architecture
  - Data and instructions in separate memories
  - Has advantages in certain high-performance implementations
Basic Issues in Instruction Set Design
- What operations (and how many) should be provided?
  - LD/ST/INC/BRN is sufficient to encode any computation, but it is not useful because programs would be too long!
- How (and how many) operands are specified?
  - Most operations are dyadic (e.g., A ← B + C)
  - Some are monadic (e.g., A ← ~B)
- How to encode these into consistent instruction formats?
  - Instructions should be multiples of basic data/address widths
- Typical instruction set:
  - 32-bit word
  - Basic operand addresses are 32 bits long
  - Basic operands, like integers, are 32 bits long
  - In the general case, an instruction could reference 3 operands (A ← B + C)
  - Challenge: encode operations in a small number of bits!
Execution Cycle
- Instruction Fetch: obtain instruction from program storage
- Instruction Decode: determine required actions and instruction size
- Operand Fetch: locate and obtain operand data
- Execute: compute result value or status
- Result Store: deposit results in storage for later use
- Next Instruction: determine successor instruction
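The six steps of the execution cycle can be traced in a minimal sketch. It assumes a hypothetical accumulator ISA with only LOAD, ADD, STORE, and HALT, stored as (opcode, address) pairs with symbolic memory addresses. It is an illustration of the cycle, not any real machine's encoding.

```python
def run(program, memory):
    """Execute (opcode, address) pairs on a toy accumulator machine until HALT."""
    acc, pc = 0, 0
    while True:
        instr = program[pc]                      # instruction fetch
        op, addr = instr                         # instruction decode
        if op == "HALT":
            return memory
        if op in ("LOAD", "ADD"):
            operand = memory[addr]               # operand fetch
        if op == "LOAD":                         # execute
            acc = operand
        elif op == "ADD":
            acc = acc + operand
        elif op == "STORE":
            memory[addr] = acc                   # result store
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1                                  # next instruction: always sequential here


memory = run([("LOAD", "A"), ("ADD", "B"), ("STORE", "C"), ("HALT", None)],
             {"A": 2, "B": 3, "C": 0})
print(memory["C"])  # prints 5
```

A real machine would also let the execute step redirect `pc` (branches and jumps). That decision belongs to the "next instruction" step, which this sketch keeps strictly sequential.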
What Must be Specified?
- Instruction format or encoding
  - How is it decoded?
- Location of operands and result
  - Where other than memory?
  - How many explicit operands?
  - How are memory operands located?
  - Which can or cannot be in memory?
- Data type and size
- Operations
  - Which are supported?
- Successor instruction
  - Jumps, conditions, branches
(Figure: the execution cycle, Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction.)
ISA
- What are the important questions?
Operand Locations in Four ISA Classes
(Figure: operand locations for the stack, accumulator, register-memory GPR, and load/store GPR classes.)
ISA Classes
- ISA classes?
  - Stack
  - Accumulator
  - Register-memory
  - Register-register/load-store
(Figure: Input1 and Input2 feed an Operation that produces an Output.)
ISA Classes
- Accumulator
  - 1 address: add A (acc ← acc + mem[A])
  - 1+x address: addx A (acc ← acc + mem[A + x])
- Stack
  - 0 address: add (tos ← tos + next)
- General Purpose Register
  - 2 address: add A B (EA(A) ← EA(A) + EA(B))
  - 3 address: add A B C (EA(A) ← EA(B) + EA(C))
- Load/Store
  - 3 address: add Ra Rb Rc (Ra ← Rb + Rc)
  - load Ra Rb (Ra ← mem[Rb])
  - store Ra Rb (mem[Rb] ← Ra)
(tos = top of stack; next = the stack entry below it)
- Comparison: bytes per instruction? Number of instructions? Cycles per instruction?
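ISA Classes: Stack
- Operate on the top of stack (TOS), put the result on TOS
- C = A + B?
  - PUSH A
  - PUSH B
  - ADD
  - POP C
- Memory is not touched by the ADD itself: its operands come from, and its result goes to, the stack
(Figure: top of stack feeding the operation, with memory below.)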
ISA Classes: Accumulator
- Accumulator
  - Implicit input and output
- C = A + B?
  - LOAD A: put A in the accumulator
  - ADD B: add B to AC, put the result in AC
  - STORE C: put AC in C
(Figure: accumulator (AC) and memory feeding the operation.)
ISA Classes: Register-Memory
- Inputs and outputs can be registers or memory
- C = A + B?
  - LOAD R1, A
  - ADD R3, R1, B
  - STORE R3, C
(Figure: register file and memory feeding the operation.)
ISA Classes: Register-Register
- Load/store architecture
- C = A + B?
  - LOAD R1, A
  - LOAD R2, B
  - ADD R3, R1, R2
  - STORE R3, C
(Figure: register file feeding the operation; memory is reached only through loads and stores.)
Operand Location
- Example: how do we compute C = A + B using the four classes of ISAs?

  Stack     Accumulator   Register (Reg-Mem)   Register (Load/Store)
  Push A    Load A        Load R1,A            Load R1,A
  Push B    Add B         Add R3,R1,B          Load R2,B
  Add       Store C       Store R3,C           Add R3,R1,R2
  Pop C                                        Store R3,C

- Comparing number of instructions?
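The comparison can be made concrete by counting, for each sequence in the table, the number of instructions and the number of operands that name memory. This sketch assumes A, B, and C are memory variables and that the stack top is held in the CPU, so the stack ADD touches no memory, as the stack slide states. The opcode spellings are just the ones from the table.

```python
# C = A + B in the four ISA classes, as (opcode, operand, ...) tuples.
SEQUENCES = {
    "stack": [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")],
    "accumulator": [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")],
    "register-memory": [("LOAD", "R1", "A"), ("ADD", "R3", "R1", "B"),
                        ("STORE", "R3", "C")],
    "load-store": [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                   ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")],
}
MEMORY_VARIABLES = {"A", "B", "C"}  # assumption: everything else is a register


def profile(seq):
    """Return (instruction count, number of memory-operand references)."""
    mem_refs = sum(1 for instr in seq for operand in instr[1:]
                   if operand in MEMORY_VARIABLES)
    return len(seq), mem_refs


for name, seq in SEQUENCES.items():
    print(name, profile(seq))
# stack (4, 3)
# accumulator (3, 3)
# register-memory (3, 3)
# load-store (4, 3)
```

Every class makes the same three data-memory references for this statement. The classes differ in instruction count and, not modeled here, in bytes per instruction and cycles per instruction.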
General Purpose Registers Dominate
General-Purpose Register (GPR) Machines
- Every ISA designed after 1980 uses a load-store GPR architecture (i.e., RISC, to simplify CPU design).
- Registers, like any other storage form internal to the CPU, are faster than memory.
- Registers are easier for a compiler to use.
- GPR architectures are divided into several types depending on two factors:
  - Whether an ALU instruction has two or three operands.
  - How many of the operands in ALU instructions may be memory addresses.
ISA Examples

  Machine            Year   Number of GPRs   Architecture
  EDSAC              1949   1                accumulator
  IBM 701            1953   1                accumulator
  CDC 6600           1963   8                load-store
  IBM 360            1964   16               register-memory
  DEC PDP-8          1965   1                accumulator
  DEC PDP-11         1970   8                register-memory
  Intel 8008         1972   1                accumulator
  Motorola 6800      1974   1                accumulator
  DEC VAX            1977   16               register-memory, memory-memory
  Intel 8086         1978   1                extended accumulator
  Motorola 68000     1980   16               register-memory
  Intel 80386        1985   8                register-memory
  MIPS               1985   32               load-store
  HP PA-RISC         1986   32               load-store
  SPARC              1987   32               load-store
  PowerPC            1992   32               load-store
  DEC Alpha          1992   32               load-store
  HP/Intel IA-64     2001   128              load-store
  AMD64 (EM64T)      2003   16               register-memory
Pros and cons for each ISA type
- Stack
  - Advantages: short instructions; good code density; simple to decode instructions
  - Disadvantages: lack of random access; efficient code is hard to get; the stack is often a bottleneck
- Accumulator
  - Advantages: minimal internal state; short instructions; simple to decode instructions
  - Disadvantages: very high memory traffic
- Register
  - Advantages: lots of code generation options; efficient code (compiler options)
  - Disadvantages: longer instructions; complex instructions
Examples of GPR Machines
For arithmetic/logic instructions:

  Memory addresses   Max. operands   Examples
  0                  3               SPARC, MIPS, PowerPC, Alpha
  1                  2               Intel 80x86, Motorola 68000
  2                  2               VAX
  3                  3               VAX
Pros/Cons of # Mem. Operands/# Operands (1/3)
- Register-register (0 memory operands/instr, 3 (register) operands/instr)
  - Pro: Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute.
  - Con: Higher instruction count than architectures with memory references in instructions. Some instructions are short, and bit encoding may be wasteful.
Pros/Cons of # Mem. Operands/# Operands (2/3)
- Register-memory (1, 2)
  - Pro: Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density.
  - Con: Operands are not equivalent, since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary by operand location.
Pros/Cons of # Mem. Operands/# Operands (3/3)
- Memory-memory (3, 3)
  - Pro: Most compact. Doesn't waste registers for temporaries.
  - Con: Large variation in instruction size, especially for three-operand instructions. Also, large variation in work per instruction. Memory accesses create a memory bottleneck.
Memory Addressing
- How do we specify memory addresses?
  - This issue is independent of the type of ISA (they all need to address memory)
- We need to specify
  - (1) Operand sizes
  - (2) Address alignment
  - (3) Byte ordering for multi-byte operands
  - (4) Addressing modes
Operand Sizes (1)
- Byte (8 bits), half-word (16 bits), word (32 bits), double word (64 bits)
- An ISA may (and typically does) support multiple operand sizes
  - The instruction must specify the operand size
  - E.g., LOAD.b R1,A vs. LOAD.w R1,A
  - Why? To make sure there's no garbage data
- But usually there is a default size
  - Most commonly a word on 32-bit machines
  - On x86, different register names for different sizes
  - (And think about Amdahl's Law too)
Alignment (2)
- For multi-byte memory operands
  - An aligned address for an n-byte operand is an address that is a multiple of n
  - Word-aligned: 0, 4, 8, 12, etc.
- An ISA can require alignment of operands
  - Assume it is required unless otherwise specified
  - MIPS: all memory operands must be aligned (special two-instruction sequences for accessing unaligned data in memory)
  - x86: no alignment required (but unaligned accesses are slower)
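The alignment rule above reduces to a single modulo test. Here is a minimal sketch. The function name is our own, and byte addresses are plain integers.

```python
def is_aligned(address: int, size: int) -> bool:
    """True if an operand of `size` bytes at `address` is naturally aligned."""
    if size <= 0:
        raise ValueError("operand size must be positive")
    return address % size == 0


print([a for a in range(16) if is_aligned(a, 4)])  # [0, 4, 8, 12]
print(is_aligned(6, 2), is_aligned(6, 4))          # True False
```

An ISA that requires alignment, such as MIPS, would treat a `False` result as an address error. On x86 the access would still complete, just more slowly.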
Byte Ordering (Endianness) (3)
- Layout of multi-byte operands in memory
  - Little endian (x86): least significant byte at the lowest address in memory
  - Big endian (most other ISAs): most significant byte at the lowest address in memory
    - Assume this ordering unless otherwise specified
- Some ISAs support both byte orderings
  - E.g., MIPS has a little-endian mode
Another view of Endianness
- No, we're not making this up.
- At word address 100 (assume a 4-byte word):
  - long a = 0x11223344;
  - Big-endian (MSB at word address) layout:
      address  100  101  102  103
      byte      11   22   33   44
      offset     0    1    2    3
  - Little-endian (LSB at word address) layout:
      address  103  102  101  100
      byte      11   22   33   44
      offset     3    2    1    0
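Python's built-in `int.to_bytes` reproduces both layouts for `a = 0x11223344` stored at word address 100. The loop prints, for each byte address, the byte found there under each ordering.

```python
value = 0x11223344   # long a = 0x11223344, stored at word address 100
base = 100

big = value.to_bytes(4, byteorder="big")        # MSB at the word address
little = value.to_bytes(4, byteorder="little")  # LSB at the word address

for offset in range(4):
    print(base + offset, f"{big[offset]:02x}", f"{little[offset]:02x}")
# 100 11 44
# 101 22 33
# 102 33 22
# 103 44 11
```

Reading the bytes back with the matching `byteorder` (`int.from_bytes`) recovers 0x11223344 in both cases. Only the placement in memory differs.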
Endianness Continued
- Usually instruction sets are byte addressed
- Provide access for bytes (8 bits), half-words (16 bits), words (32 bits), double words (64 bits)
- Two different ordering types: big/little endian
  - Little endian: puts the byte with address xx00 at the least significant position in the word (bits 7..0)
  - Big endian: puts the byte with address xx00 at the most significant position in the word (bits 31..24)
Addressing Modes (4)
- What is the location of an operand?
- Three basic possibilities
  - Register: operand is in a register
    - Register number encoded in the instruction
  - Immediate: operand is a constant
    - Constant encoded in the instruction
  - Memory: operand is in memory
    - Many addressing-mode possibilities
Immediate Addressing Mode
- Operand is a constant encoded in the instruction
- Can we have any value as an immediate?
  - x86: yes. The number of bytes used to encode the instruction changes to accommodate it.
  - RISC: no, instruction size is fixed (e.g., 32 bits)
- Immediates in RISC
  - Typically a load immediate instruction
  - Some bits are used to specify the instruction opcode
  - Remaining bits encode the immediate value
  - This is OK: most frequently needed constants have few bits
  - MIPS also has a special two-instruction sequence to put a full 32-bit immediate into a register
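On MIPS, the two-instruction sequence is typically LUI (load upper immediate) followed by ORI (OR immediate). The sketch below models only a 32-bit register view. The helper names are hypothetical.

```python
def split_for_lui_ori(value: int) -> tuple[int, int]:
    """Split a 32-bit constant into the 16-bit immediates for LUI (upper) and ORI (lower)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("constant must fit in 32 bits")
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def lui_then_ori(upper: int, lower: int) -> int:
    """32-bit register contents after `LUI rt, upper` followed by `ORI rt, rt, lower`."""
    reg = (upper << 16) & 0xFFFFFFFF  # LUI: immediate to the upper half, lower half zeroed
    reg |= lower                      # ORI: zero-extended immediate fills the lower half
    return reg


upper, lower = split_for_lui_ori(0x12345678)
print(hex(upper), hex(lower), hex(lui_then_ori(upper, lower)))  # 0x1234 0x5678 0x12345678
```

ORI is used for the lower half because it zero-extends its immediate. A sign-extending add-immediate would need the upper half adjusted whenever bit 15 of the lower half is 1.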
Size of Immediate Operand
(comments?)
Memory Addressing Modes (A)
- Register Indirect
  - Address is in a register
  - LD R1, (R2)
  - Use: access via a pointer or computed address
- Direct (Absolute)
  - Address is a constant
  - LD R1, (100)
  - Use: access to static data
  - Note: constant encoded in the instruction
Memory Addressing Modes (B)
- Displacement
  - Address is register + immediate
  - LD R1, 100(R2)
  - Use: local variables, fields in a structure
  - Can simulate register-indirect and direct modes
    - LD R1, 0(R2) and LD R1, 100(R0) (R0 always holds zero)
  - Note: displacement encoded in the instruction
Size of Displacement
Memory Addressing Modes (C)
- Autoincrement
  - Address is in a register
  - LD R1, (R2)+
  - Register value increased by d after the access (d is the data size, e.g., 4 for word accesses)
  - Some flavors useful for iterating through arrays, implementing a stack, etc.
- Indexed, Memory Indirect, Scaled, Autodecrement
  - See textbook (page 98)
FYI: More Addressing Modes
(Each entry: addressing mode: example instruction; meaning; when used)
- Register: Add R4, R3; R4 ← R4 + R3; when a value is in a register
- Immediate: Add R4, #3; R4 ← R4 + 3; for constants
- Displacement: Add R4, 100(R1); R4 ← R4 + Mem[100 + R1]; accessing local variables
- Register deferred or indirect: Add R4, (R1); R4 ← R4 + Mem[R1]; accessing using a pointer or a computed address
- Indexed: Add R3, (R1 + R2); R3 ← R3 + Mem[R1 + R2]; array addressing: R1 = base of array, R2 = index amount
- Direct or absolute: Add R1, (1001); R1 ← R1 + Mem[1001]; accessing static data; the address constant may need to be large
FYI: Addressing modes continued
- Memory indirect or memory deferred: Add R1, @(R3); R1 ← R1 + Mem[Mem[R3]]; if R3 is the address of a pointer p, then the mode yields *p
- Autoincrement: Add R1, (R2)+; R1 ← R1 + Mem[R2], then R2 ← R2 + d; useful for stepping through arrays within a loop: R2 points to the start of the array, and each reference increments R2 by d
- Autodecrement: Add R1, -(R2); R2 ← R2 - d, then R1 ← R1 + Mem[R2]; same use as autoincrement; autoincrement/autodecrement can be used for push/pop on a stack
- Scaled: Add R1, 100(R2)[R3]; R1 ← R1 + Mem[100 + R2 + R3 * d]; used to index arrays
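The "meaning" column of the two tables can be turned into a small operand-fetch routine. This is a sketch under simplifying assumptions. The mode labels are hypothetical names. Registers and memory are Python dicts, and memory maps a byte address directly to a word value. The auto modes update the register in place, post-increment and pre-decrement as in the tables.

```python
def operand_value(mode, regs, mem, d=4, **f):
    """Return the source operand for one addressing mode from the tables above.

    regs and mem are dicts; autoincrement/autodecrement update regs in place.
    d is the operand size in bytes (used by the auto and scaled modes).
    """
    if mode == "register":
        return regs[f["r"]]
    if mode == "immediate":
        return f["imm"]
    if mode == "displacement":
        return mem[f["disp"] + regs[f["r"]]]
    if mode == "register_indirect":
        return mem[regs[f["r"]]]
    if mode == "indexed":
        return mem[regs[f["base"]] + regs[f["index"]]]
    if mode == "direct":
        return mem[f["addr"]]
    if mode == "memory_indirect":
        return mem[mem[regs[f["r"]]]]
    if mode == "autoincrement":
        value = mem[regs[f["r"]]]
        regs[f["r"]] += d                  # increment after the access
        return value
    if mode == "autodecrement":
        regs[f["r"]] -= d                  # decrement before the access
        return mem[regs[f["r"]]]
    if mode == "scaled":
        return mem[f["disp"] + regs[f["base"]] + regs[f["index"]] * d]
    raise ValueError(f"unknown addressing mode {mode!r}")


# Hypothetical machine state, chosen only for illustration.
regs = {"R1": 1000, "R2": 8, "R3": 2000, "R4": 4}
mem = {1000: 10, 1004: 20, 1008: 30, 1100: 40, 1116: 50, 2000: 1004}

print(operand_value("displacement", regs, mem, disp=100, r="R1"))           # 40
print(operand_value("memory_indirect", regs, mem, r="R3"))                  # 20
print(operand_value("scaled", regs, mem, disp=100, base="R1", index="R4"))  # 50
value = operand_value("autoincrement", regs, mem, r="R1")
print(value, regs["R1"])                                                     # 10 1004
```

Displacement alone covers register indirect (displacement 0) and direct (a base register that holds zero). This is why load/store ISAs can drop those modes.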
Use of Addressing Modes (DSP)
- Hand-coded
  - Uses more powerful modes when possible
- Figure 2.11 in textbook
  - Goes through the mix of addressing modes and the % of time each is used for a TI DSP
    - 70%: immediate, displacement, register immediate, direct
    - 25%: autoincrement, autodecrement
- Thoughts?
  - (Random comment: make these 6 addressing modes, out of 17 total, the fastest)
Control Flow Instructions
- Up until now, we have implicitly discussed memory and arithmetic instructions
- Now, control flow: 4 basic types
  - Procedure calls
  - Procedure returns
  - Jumps
  - Conditional branches
Addr. Modes for CF Instructions
- PC-relative
  - Most commonly used for branches and jumps
  - Position-independent code
  - Target known at compile time
- Register indirect
  - Used when the target is not known at compile time (procedure returns, virtual functions and function pointers, dynamically loaded libraries, case/switch statements, etc.)
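A PC-relative target is just the PC plus a sign-extended offset. This sketch assumes a MIPS-style encoding: a 16-bit offset counted in words and measured from PC + 4. Other ISAs scale and anchor the offset differently. The function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def branch_target(pc: int, offset_field: int) -> int:
    """PC-relative target, assuming a MIPS-style 16-bit word offset counted from PC + 4."""
    return pc + 4 + (sign_extend(offset_field, 16) << 2)


print(hex(branch_target(0x1000, 3)))        # 0x1010 (3 words past PC + 4)
print(hex(branch_target(0x1000, 0xFFFF)))   # 0x1000 (offset -1: branch to itself)
# Position independence: the same encoded offset moves the same distance anywhere.
print(branch_target(0x5000, 3) - 0x5000)    # 16
```

Because the encoded offset never names an absolute address, the code can be loaded anywhere without patching branch targets. This is the position-independence property listed above.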
Size of Branch Displacement
(comments?)
Branch Conditions
(Thoughts on the percentages, what constructs they apply to, etc.?)
Call/Return Instructions
- Call
  - Minimum: save the return address to the stack (x86) or in a register (MIPS)
  - Can create a stack frame, save registers, etc.
- Return
  - Jumps to the return address
  - Can pop the stack frame, restore registers, etc.
- Simpler typically turns out to be better
  - E.g., many functions do not need a stack frame
MIPS Call/Return
- Call: Jump-And-Link (JAL <function>)
  - Puts the return address into R31, then jumps to the target address
- Return: Register-Indirect Jump (JR R31)
  - Jumps to the address in R31 (no special RET instruction)
- Stack frame create/pop via ordinary add/sub instructions (the stack-pointer register is R29)
- Register save/restore via ordinary load/store instructions
Procedure call essentials (1): Caller/Callee Mechanics
- Who does what when?

    foo() {                  bar(int a) {
      ...                      int temp = 3;
      bar(42);                 ...
      ...                      return(temp + a);
    }                        }

- Steps, in order:
  1. Caller at call time
  2. Callee at entry
  3. Callee at exit
  4. Caller after return
Procedure call essentials (2): MIPS Registers
Procedure call essentials (3): Good Strategy
- Do most of the work at callee entry/exit
- Caller at call time
  - Put arguments in a0..a3
  - Save any caller-save temporaries
  - jalr ..., ra
- Callee at entry
  - Allocate all stack space
  - Save ra and s0..s3 if necessary
- Callee at exit
  - Restore ra and s0..s3 if used
  - Deallocate all stack space
  - Put return value in v0
- Caller after return
  - Retrieve return value from v0
  - Restore any caller-save temporaries
Procedure call essentials (4): Summary
- Caller saves registers (those outside the agreed-upon convention, i.e., ax) at the point of call
- Callee saves registers (per convention, i.e., sx) at the point of entry
- Callee restores saved registers, and re-adjusts the stack before return
- Caller restores saved registers, and re-adjusts the stack before resuming from the call
- Big ?
- Is this clear? I can work through an example if needed.
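The four-step division of labor can be simulated for the `foo`/`bar(42)` example. This sketch makes several assumptions, none of which model real MIPS code. A Python dict stands in for the register file, using MIPS-style register names. A Python list stands in for the stack, with no stack-pointer arithmetic. Ordinary function calls stand in for JAL/JR, and the return address is a made-up value.

```python
def bar(regs, stack):
    """Callee: bar(int a) { int temp = 3; return temp + a; }"""
    # 2. Callee at entry: save ra and the callee-saved register it will use.
    stack.append(regs["ra"])
    stack.append(regs["s0"])
    regs["s0"] = 3                          # temp lives in callee-saved s0
    regs["t0"] = regs["s0"] + regs["a0"]    # t-registers may be clobbered freely
    # 3. Callee at exit: result in v0, restore saved registers in reverse order.
    regs["v0"] = regs["t0"]
    regs["s0"] = stack.pop()
    regs["ra"] = stack.pop()


def foo(regs, stack):
    """Caller: keeps a live value in t0 across the call bar(42)."""
    regs["t0"] = 99
    # 1. Caller at call time: save caller-save temporaries, pass the argument, link.
    stack.append(regs["t0"])
    regs["a0"] = 42
    regs["ra"] = 0x0040_0020                # hypothetical return address, as JAL would set
    bar(regs, stack)
    # 4. Caller after return: read v0, restore caller-save temporaries.
    result = regs["v0"]
    regs["t0"] = stack.pop()
    return result


regs = {"s0": 7, "t0": 0, "a0": 0, "v0": 0, "ra": 0}
stack = []
result = foo(regs, stack)
print(result, regs["s0"], regs["t0"], stack)  # 45 7 99 []
```

After the call, `s0` still holds the caller's 7 because the callee saved and restored it. `t0` is back to 99 only because the caller saved it, since `bar` was free to overwrite it.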
Instruction Encoding
- The instruction must specify
  - What is supposed to be done (opcode)
  - What the operands are
- Three popular formats
  - Variable format (VAX, x86)
    - Opcode specifies how many operands; operands are listed after the opcode
    - Each operand has an address specifier and an address field
    - Address specifier describes the addressing mode for that operand
  - Fixed format (RISC)
    - All instructions are the same size
    - Opcode specifies the addressing mode for load/store operations
    - All other operations use register operands
  - Hybrid format (IBM 360, some DSP processors)
    - Several (but few) fixed-size instruction formats
    - Some formats have address specifier fields
How is the operation specified?
- Typically in a bit field called the opcode
- Also must encode addressing modes, etc.
- Some options:
  - Variable:
    | Operation & # of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n |
  - Fixed:
    | Operation | Address field 1 | Address field 2 | Address field 3 |
  - Hybrid:
    | Operation | Address specifier | Address field |
    | Operation | Address specifier 1 | Address specifier 2 | Address field |
    | Operation | Address specifier | Address field 1 | Address field 2 |
Some random comments
- Variable format allows virtually all addressing modes with all operations
  - Best when there are many addressing modes and operations
- Fixed format combines operation and addressing mode into the opcode
  - Best when there are few addressing modes and operations
  - Good for RISC
- Hybrid approach is a 3rd alternative
- Usually need a separate address specifier per operand
- When encoding instructions, the number of registers and addressing modes can affect instruction size
Instruction Encoding Tradeoffs
- Decoding vs. programming
  - Fixed formats are easy to decode
  - Variable formats are easier to program (assembler)
  - But we mostly use compilers now
- Number of registers
  - More registers help optimization (a lot)
  - Operand fields are smaller with few registers
  - In general, we want many (e.g., 32) registers, but whether we want even more is still an open issue
Helping the Compiler Writers
- Regularity and orthogonality
  - General-purpose registers
  - If an operation works with one data type, it should work with all supported data types
  - If an operation works with one addressing mode, it should work with all of them
- Primitives, not solutions
  - E.g., JAL vs. an elaborate function call instruction
- Simplify tradeoffs
- Bind constants at compile time
Today's compilers work like this
(Passes in order, with the dependencies and function of each)
- Front end per language
  - Dependencies: language dependent; machine independent
  - Function: transform the language to a common, intermediate form
- Intermediate representation
- High-level optimizations
  - Dependencies: somewhat language dependent; largely machine independent
  - Function: for example, procedure inlining and loop transformations
- Global optimizer
  - Dependencies: small language dependencies; slight machine dependencies (i.e., register counts/types)
  - Function: global and local optimization + register allocation
- Code generator
  - Dependencies: highly machine dependent; language independent
  - Function: detailed instruction selection and machine-dependent optimizations (assembler next?)
How the architect can help the compiler writer
- Keep in mind
  - Most programs are locally simple!
  - Simple translations work just fine
- Complexity arises because programs require lots of instructions, and they must interact globally
  - Also because of the whole multiple-pass thing
- The compiler writer's corollary/rule/manifesto
  - Make the frequent cases fast and the rare cases correct!
Reading Assignment
- Read Section 2.12 (MIPS Architecture)
  - Especially Figure 2.27
- Not required, but good to read anyway:
  - Section 2.14 (Fallacies and Pitfalls)
  - Section 2.16 (Historical Perspective)
The DLX Microprocessor
- A generic microprocessor that we'll use from time to time
- Very similar to a MIPS machine
- Compiled by taking the average of a number of recent experimental and commercial machines
- Has 32 general purpose registers (R0, R1, ..., R31) and floating point registers
- Data types include
  - 8-bit bytes
  - 16-bit half words
  - 32-bit words for integer data
  - 32-bit single-precision and 64-bit double-precision words for floating point
DLX addressing modes
- The only data addressing modes are immediate and displacement
- Possible to implement register deferred and absolute (a displacement of 0, or base register R0)
  - Absolute: Add R1, (1001) means R1 ← R1 + M[1001]
  - Register deferred: Add R4, (R1) means R4 ← R4 + Mem[R1]
- DLX memory is byte addressable in Big Endian mode with a 32-bit address
- DLX uses a load/store architecture, so
  - All memory references are through loads or stores between memory and either GPRs or FPRs
DLX instruction format
DLX has 2 addressing modes, which are encoded in the opcode
- I-type instruction
    | Opcode (6) | rs1 (5) | rd (5) | Immediate (16) |
  - Encodes loads and stores of bytes, words, half words
  - All immediates (rd ← rs1 op immediate)
  - Conditional branch instructions (rs1 is the register, rd is unused)
  - Jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0)
- R-type instruction
    | Opcode (6) | rs1 (5) | rs2 (5) | rd (5) | func (11) |
  - Register-register ALU operations: rd ← rs1 func rs2
  - Function encodes the data path operation: Add, Sub, ...
  - Read/write special registers and moves
DLX instruction format
- J-type instruction
    | Opcode (6) | Offset added to PC (26) |
  - Jump and jump and link
  - Trap and return from exception
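The three DLX layouts can be packed into and unpacked from a 32-bit word with shifts and masks. The field order and widths below come from the format slides. Two things are assumptions. The opcode value is an arbitrary placeholder, not DLX's real opcode map. The immediate and offset fields are treated as two's complement, which matches their use as sign-extended displacements.

```python
# DLX field layouts from the slides above, most significant field first.
FORMATS = {
    "I": [("opcode", 6), ("rs1", 5), ("rd", 5), ("immediate", 16)],
    "R": [("opcode", 6), ("rs1", 5), ("rs2", 5), ("rd", 5), ("func", 11)],
    "J": [("opcode", 6), ("offset", 26)],
}
SIGNED = {"immediate", "offset"}  # assumption: these fields are two's complement


def encode(fmt: str, **values: int) -> int:
    """Pack named field values into one 32-bit instruction word."""
    word = 0
    for name, width in FORMATS[fmt]:
        v = values.get(name, 0)
        lo = -(1 << (width - 1)) if name in SIGNED else 0
        if not lo <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | (v & ((1 << width) - 1))
    return word


def decode(fmt: str, word: int) -> dict:
    """Unpack a 32-bit word into named fields, sign-extending the SIGNED ones."""
    fields, shift = {}, 32
    for name, width in FORMATS[fmt]:
        shift -= width
        v = (word >> shift) & ((1 << width) - 1)
        if name in SIGNED and v & (1 << (width - 1)):
            v -= 1 << width
        fields[name] = v
    return fields


# Every format must fill exactly 32 bits.
assert all(sum(w for _, w in layout) == 32 for layout in FORMATS.values())

# opcode=35 is a placeholder value, not a claim about DLX's real opcode map.
word = encode("I", opcode=35, rs1=2, rd=1, immediate=-4)
print(hex(word), decode("I", word))  # 0x8c41fffc {'opcode': 35, 'rs1': 2, 'rd': 1, 'immediate': -4}
```

Because every format puts the opcode in the same 6 leading bits, a decoder can read the opcode first and only then choose which layout to apply to the remaining 26 bits.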
An example MIPS machine
Memory Addressing
Displacement Address Size
- 12 to 16 bits of displacement needed
Addressing Objects: Endianness and Alignment
- Big Endian: address of the most significant byte = word address
  - IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
- Little Endian: address of the least significant byte = word address
  - Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
(Figure: byte numbering within word 0 from msb to lsb, 3 2 1 0 for little endian and 0 1 2 3 for big endian, with examples of aligned and non-aligned objects.)
- Alignment requires that objects fall on an address that is a multiple of their size.
Addressing Objects
(Figure: word 0 with bytes numbered 3 2 1 0 (little endian) and 0 1 2 3 (big endian), from msb to lsb.)
- Alignment requires that objects fall on an address that is a multiple of their size (e.g., a 2-byte object should be at 0, 2, 4, ...).
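Byte Swap Problem
(Figure: a word whose bytes are A (most significant) through D (least significant). At increasing byte addresses 0 to 3, a big-endian machine stores A, B, C, D, while a little-endian machine stores D, C, B, A. When words are transferred between the two, the bytes appear swapped.)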
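The byte swap problem can be shown with a 32-bit word written by a big-endian sender and read back unchanged by a little-endian receiver. In this sketch the byte values 0x0A to 0x0D are hypothetical stand-ins for A to D. The `bswap32` helper reverses the byte order with shifts and masks.

```python
def bswap32(word: int) -> int:
    """Reverse the byte order of a 32-bit word."""
    return (((word & 0x000000FF) << 24) |
            ((word & 0x0000FF00) << 8) |
            ((word & 0x00FF0000) >> 8) |
            ((word >> 24) & 0x000000FF))


original = 0x0A0B0C0D                               # bytes A, B, C, D (A most significant)
raw = original.to_bytes(4, byteorder="big")         # as laid out by the big-endian sender
misread = int.from_bytes(raw, byteorder="little")   # little-endian receiver, no conversion
print(hex(misread), hex(bswap32(misread)))          # 0xd0c0b0a 0xa0b0c0d
```

Swapping is its own inverse, so the same routine converts in either direction. It applies only to multi-byte values. Byte strings such as text must not be swapped.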
Typical Memory Addressing Modes (Again!!)
(Each entry: addressing mode: example; meaning)
- Register indirect: Add R4,(R1); R4 ← R4 + Mem[R1]
- Indexed: Add R3,(R1+R2); R3 ← R3 + Mem[R1+R2]
- Direct or absolute: Add R1,(1001); R1 ← R1 + Mem[1001]
- Memory indirect: Add R1,@(R3); R1 ← R1 + Mem[Mem[R3]]
- Auto-increment: Add R1,(R2)+; R1 ← R1 + Mem[R2]; R2 ← R2 + d
- Auto-decrement: Add R1,-(R2); R2 ← R2 - d; R1 ← R1 + Mem[R2]
- Scaled: Add R1,100(R2)[R3]; R1 ← R1 + Mem[100 + R2 + R3 * d]
Addressing Modes Usage Example
For 3 programs running on VAX, ignoring direct register mode:
- Displacement: 42% avg (32% to 55%)
- Immediate: 33% avg (17% to 43%)
- Register deferred (indirect): 13% avg (3% to 24%)
- Scaled: 7% avg (0% to 16%)
- Memory indirect: 3% avg (1% to 6%)
- Misc: 2% avg (0% to 3%)
75% are displacement and immediate; 88% are displacement, immediate, and register indirect.
Observation: in addition to register direct, the displacement, immediate, and register indirect addressing modes are important.
CISC to RISC observation (fewer addressing modes simplify CPU design)
Immediate Size
- 50% to 60% fit within 8 bits
- 75% to 80% fit within 16 bits
(Size of the immediate number used in an instruction)
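Whether a constant "fits within" an n-bit immediate field can be checked directly. This sketch assumes signed (two's-complement) immediates, as with add-immediate instructions that sign-extend. A zero-extended field would accept 0 to 2^n - 1 instead. The constants in the loop are illustrative only and are not measured program data.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a `bits`-wide two's-complement immediate field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# Illustrative constants only; these are not measured program data.
for c in [0, 1, -1, 100, 255, 1000, 40000]:
    print(c, fits_signed(c, 8), fits_signed(c, 16))
```

Constants that fail the 16-bit test are the ones that need a longer sequence, such as the LUI/ORI pair on MIPS.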
Addressing Summary
- Data Addressing modes that are important
- Displacement, Immediate, Register Indirect
- Displacement size should be 12 to 16 bits
- Immediate size should be 8 to 16 bits
Utilization of Memory Addressing Modes
(Figure: usage of memory addressing modes, with the most common modes highlighted.)
Displacement Address Size Example
- Avg. of 5 SPECint92 programs vs. avg. of 5 SPECfp92 programs
- 1% of addresses > 16 bits
- 12 to 16 bits of displacement needed
- CISC to RISC observation
Operation Types in the Instruction Set
(Each entry: operator type: examples)
- Arithmetic and logical: integer arithmetic and logical operations: add, or
- Data transfer: loads-stores (move on machines with memory addressing)
- Control: branch, jump, procedure call and return, traps
- System: operating system call, virtual memory management instructions
- Floating point: floating point operations: add, multiply
- Decimal: decimal add, decimal multiply, decimal-to-character conversion
- String: string move, string compare, string search
- Media: the same operation performed on multiple data (e.g., Intel MMX, SSE)
Instruction Usage Example: Top 10 Intel X86 Instructions
(Table: ranks 1 to 10 with the integer average percent of total instructions executed.)
Observation: Simple instructions dominate instruction usage frequency.
CISC to RISC observation
Instruction Set Encoding
- Considerations affecting instruction set encoding:
  - To have as many registers and addressing modes as possible.
  - The impact of the size of the register and addressing mode fields on the average instruction size and on the average program size.
  - To encode instructions into lengths that will be easy to handle in the implementation. At a minimum, lengths should be a multiple of bytes.
- Fixed length encoding: faster and easiest to implement in hardware.
- Variable length encoding: produces smaller instructions.
- Hybrid encoding.
CISC to RISC observation
Three Examples of Instruction Set Encoding
- Variable: VAX (1-53 bytes)
    | Operation & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n |
- Fixed: MIPS, PowerPC, SPARC (each instruction is 4 bytes)
    | Operation | Address field 1 | Address field 2 | Address field 3 |
- Hybrid: IBM 360/370, Intel 80x86
    | Operation | Address specifier | Address field |
    | Operation | Address specifier 1 | Address specifier 2 | Address field |
    | Operation | Address specifier | Address field 1 | Address field 2 |
Complex Instruction Set Computer (CISC)
- Emphasizes doing more with each instruction
  - Thus fewer instructions per program (more compact code)
- Motivated by the high cost of memory and hard disk capacity when the original CISC architectures were proposed
  - When the M6800 was introduced: 16K RAM = $500, 40M hard disk = $55,000
  - When the MC68000 was introduced: 64K RAM = $200, 10M HD = $5,000
- Original CISC architectures evolved with faster, more complex CPU designs, but backward instruction set compatibility had to be maintained
- Wide variety of addressing modes
  - 14 in MC68000, 25 in MC68020
- A number of instruction modes for the location and number of operands
  - The VAX has 0- through 3-address instructions
- Variable-length instruction encoding
Example CISC ISA: Motorola 680X0
GPR ISA (Register-Memory)
- 18 addressing modes
  - Data register direct.
  - Address register direct.
  - Immediate.
  - Absolute short.
  - Absolute long.
  - Address register indirect.
  - Address register indirect with postincrement.
  - Address register indirect with predecrement.
  - Address register indirect with displacement.
  - Address register indirect with index (8-bit).
  - Address register indirect with index (base).
  - Memory indirect postindexed.
  - Memory indirect preindexed.
  - Program counter indirect with index (8-bit).
  - Program counter indirect with index (base).
  - Program counter indirect with displacement.
  - Program counter memory indirect postindexed.
  - Program counter memory indirect preindexed.
- Operand size
  - Ranges from 1 to 32 bits; 1, 2, 4, 8, 10, or 16 bytes.
- Instruction encoding
  - Instructions are stored in 16-bit words.
  - The smallest instruction is 2 bytes (one word).
  - The longest instruction is 5 words (10 bytes) in length.
Example CISC ISA: Intel IA-32, X86 (80386)
GPR ISA (Register-Memory)
- 12 addressing modes
  - Register.
  - Immediate.
  - Direct.
  - Base.
  - Base + Displacement.
  - Index + Displacement.
  - Scaled Index + Displacement.
  - Based Index.
  - Based Scaled Index.
  - Based Index Displacement.
  - Based Scaled Index Displacement.
  - Relative.
- Operand sizes
  - Can be 8, 16, 32, 48, 64, or 80 bits long.
  - Also supports string operations.
- Instruction encoding
  - The smallest instruction is one byte.
  - The longest instruction is 12 bytes long.
  - The first bytes generally contain the opcode, mode specifiers, and register fields.
  - The remaining bytes are for address displacement and immediate data.
Reduced Instruction Set Computer (RISC)
- Focuses on reducing the number and complexity of instructions of the machine (thus more instructions are executed than on CISC).
- Reduced CPI. Goal: at least one instruction per clock cycle (CPI of 1 or less).
- Designed with pipelining in mind.
- Fixed-length instruction encoding.
- Only load and store instructions access memory.
- Simplified addressing modes.
  - Usually limited to immediate, register indirect, register displacement, indexed.
- Delayed loads and branches.
- Instruction pre-fetch and speculative execution.
- Examples: MIPS, SPARC, PowerPC, Alpha
Example RISC ISA: HP Precision Architecture, HP PA-RISC
- Operand sizes
  - Five operand sizes ranging in powers of two from 1 to 16 bytes.
- Instruction encoding
  - Instruction set has 12 different formats.
  - All are 32 bits in length.
- 7 addressing modes
  - Register
  - Immediate
  - Base with displacement
  - Base with scaled index and displacement
  - Predecrement
  - Postincrement
  - PC-relative
Example RISC ISA: DEC/Compaq/Intel? Alpha AXP
- Operand sizes
  - Four operand sizes: 1, 2, 4, or 8 bytes.
- Instruction encoding
  - Instruction set has 7 different formats.
  - All are 32 bits in length.
- 4 addressing modes
  - Register direct.
  - Immediate.
  - Register indirect with displacement.
  - PC-relative.
RISC ISA Example: MIPS R3000 (32-bits)
- 5 addressing modes
  - Register direct (arithmetic).
  - Immediate (arithmetic).
  - Base register + immediate offset (loads and stores).
  - PC relative (branches).
  - Pseudodirect (jumps).
- Operand sizes
  - Memory accesses in any multiple between 1 and 4 bytes.
- Instruction categories
  - Load/Store.
  - Computational.
  - Jump and Branch.
  - Floating Point (using coprocessor).
  - Memory Management.
  - Special.
A RISC ISA Example: MIPS
An Instruction Set Example: MIPS64
- A RISC-type 64-bit instruction set architecture based on the instruction set design considerations of chapter 2:
  - Uses general-purpose registers with a load/store architecture to access memory.
  - Reduced number of addressing modes: displacement (offset size of 16 bits) and immediate (16 bits).
  - Data sizes: 8 (byte), 16 (half word), 32 (word), and 64 (double word) bit integers, and 32-bit or 64-bit IEEE 754 floating-point numbers.
  - Uses fixed instruction encoding (32 bits) for performance.
  - Thirty-two 64-bit general-purpose integer registers (GPRs), R0, ..., R31. R0 always has a value of zero.
  - Separate thirty-two 64-bit floating point registers (FPRs), F0, F1, ..., F31. When holding a 32-bit single-precision number, the upper half of the FPR is not used.
95MIPS64 Instruction Format (1/2)
I - type instruction
Encodes Loads and stores of bytes, words, half
words. All immediates (rt rs op immediate)
Conditional branch instructions Jump register,
jump and link register ( rs destination,
immediate 0)
R-type instruction
Opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | func (6)
Register-register ALU operations: rd ← rs func rt. The function field encodes the data path operation (Add, Sub, ...). Also encodes read/write of special registers and moves.
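Packing the fields is a matter of shifting each one into its bit position. The sketch below encodes R-type and I-type words. The R-type widths follow the format above and the I-type layout (6/5/5/16) is the standard MIPS one. The test values use the classic 32-bit MIPS `add` (opcode 0, func 0x20) and `lw` (opcode 0x23) encodings purely as familiar examples, not as MIPS64 opcode assignments taken from these notes.

```python
def _check(value, bits, name):
    """Reject a field value that does not fit in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def encode_r(opcode, rs, rt, rd, shamt, func):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | func(6)."""
    for value, bits, name in ((opcode, 6, "opcode"), (rs, 5, "rs"), (rt, 5, "rt"),
                              (rd, 5, "rd"), (shamt, 5, "shamt"), (func, 6, "func")):
        _check(value, bits, name)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16); negative immediates use two's complement."""
    for value, bits, name in ((opcode, 6, "opcode"), (rs, 5, "rs"), (rt, 5, "rt")):
        _check(value, bits, name)
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_r(word):
    """Split a 32-bit R-type word back into its six fields."""
    return {"opcode": word >> 26, "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F, "shamt": (word >> 6) & 0x1F, "func": word & 0x3F}


# Classic MIPS32 examples: add $8, $9, $10 and lw $8, 4($9)
print(hex(encode_r(0, 9, 10, 8, 0, 0x20)))  # 0x12a4020
print(hex(encode_i(0x23, 9, 8, 4)))         # 0x8d280004
```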
96MIPS64 Instruction Format (2/2)
J-type instruction
Opcode (6) | Target (26)
Encodes jump and jump and link; trap and return from exception.
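Pseudodirect addressing (the J instruction's PC 36..63 ← name on the control-flow slide) keeps the high-order bits of the PC and replaces only the low 28 bits with the 26-bit field shifted left by 2. A sketch for a 64-bit PC. Taking the upper bits from PC + 4 (the delay-slot address, as in real MIPS hardware) is an assumption, since the notes do not say which PC value supplies them.

```python
MASK64 = (1 << 64) - 1
LOW28 = (1 << 28) - 1


def jump_target(pc, target26):
    """Pseudodirect: upper 36 bits come from PC + 4, low 28 bits are target26 << 2."""
    if not 0 <= target26 < (1 << 26):
        raise ValueError("J-type target field is 26 bits")
    upper = ((pc + 4) & MASK64) & ~LOW28
    return upper | (target26 << 2)


print(hex(jump_target(0x10000000, 0x100)))  # 0x10000400
```

A jump can therefore only reach addresses in the same 256 MB (2^28-byte) region as PC + 4; anything farther needs JR or JALR with a full register address.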
97MIPS Addressing Modes/Instruction Formats
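- All instructions 32 bits wide
- Register (direct) addressing, R-type: ALU operations
- Base + displacement addressing: loads/stores
- PC-relative addressing: branches
- Pseudodirect addressing for jumps (J-type) not shown here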
98MIPS64 Instructions Load and Store
(Notation: ←n transfers n bits; x^n replicates x n times; ## concatenates; a subscript selects bits, with bit 0 as the most significant.)
- LD R1,30(R2): Load double word. Regs[R1] ←64 Mem[30+Regs[R2]]
- LW R1,60(R2): Load word. Regs[R1] ←64 (Mem[60+Regs[R2]]_0)^32 ## Mem[60+Regs[R2]]
- LB R1,40(R3): Load byte. Regs[R1] ←64 (Mem[40+Regs[R3]]_0)^56 ## Mem[40+Regs[R3]]
- LBU R1,40(R3): Load byte unsigned. Regs[R1] ←64 0^56 ## Mem[40+Regs[R3]]
- LH R1,40(R3): Load half word. Regs[R1] ←64 (Mem[40+Regs[R3]]_0)^48 ## Mem[40+Regs[R3]] ## Mem[41+Regs[R3]]
- L.S F0,50(R3): Load FP single. Regs[F0] ←64 Mem[50+Regs[R3]] ## 0^32
- L.D F0,50(R2): Load FP double. Regs[F0] ←64 Mem[50+Regs[R2]]
- SD R3,500(R4): Store double word. Mem[500+Regs[R4]] ←64 Regs[R3]
- SW R3,500(R4): Store word. Mem[500+Regs[R4]] ←32 Regs[R3]_32..63
- S.S F0,40(R3): Store FP single. Mem[40+Regs[R3]] ←32 Regs[F0]_0..31
- S.D F0,40(R3): Store FP double. Mem[40+Regs[R3]] ←64 Regs[F0]
- SH R3,502(R2): Store half. Mem[502+Regs[R2]] ←16 Regs[R3]_48..63
- SB R2,41(R3): Store byte. Mem[41+Regs[R3]] ←8 Regs[R2]_56..63
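The integer loads differ only in how the fetched bytes are extended to 64 bits: LB, LH, and LW replicate the sign bit (Mem[...]_0), while LBU fills with zeros. This Python sketch assumes big-endian byte order, matching the LH definition that places Mem[40+Regs[R3]] ahead of Mem[41+Regs[R3]]. Memory is modelled as a `bytearray`, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def load(mem, addr, size, signed):
    """Read size bytes (big-endian) and extend them to a 64-bit register pattern."""
    value = int.from_bytes(mem[addr:addr + size], byteorder="big", signed=signed)
    return value & MASK64  # negative values become their 64-bit two's complement pattern


def store(mem, addr, size, value):
    """Write the low-order size bytes of value, e.g. SB stores bits 56..63."""
    low = value & ((1 << (8 * size)) - 1)
    mem[addr:addr + size] = low.to_bytes(size, byteorder="big")


def lb(mem, addr):
    return load(mem, addr, 1, signed=True)


def lbu(mem, addr):
    return load(mem, addr, 1, signed=False)


def lh(mem, addr):
    return load(mem, addr, 2, signed=True)


def lw(mem, addr):
    return load(mem, addr, 4, signed=True)


def ld(mem, addr):
    return load(mem, addr, 8, signed=False)  # all 64 bits, so no extension needed


mem = bytearray(64)
store(mem, 40, 1, 0x80)
print(hex(lb(mem, 40)), hex(lbu(mem, 40)))  # 0xffffffffffffff80 0x80
```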
99MIPS64 Instructions Arithmetic/Logical
- DADDU R1,R2,R3: Add unsigned. Regs[R1] ← Regs[R2] + Regs[R3]
- DADDI R1,R2,3: Add immediate. Regs[R1] ← Regs[R2] + 3
- LUI R1,42: Load upper immediate. Regs[R1] ← 0^32 ## 42 ## 0^16
- DSLL R1,R2,5: Shift left logical. Regs[R1] ← Regs[R2] << 5
- DSLT R1,R2,R3: Set less than. if (Regs[R2] < Regs[R3]) Regs[R1] ← 1 else Regs[R1] ← 0
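These semantics can be checked with Python integers masked to 64 bits. A sketch under stated assumptions: results wrap modulo 2^64, DADDI's overflow trap is not modelled, DSLT compares as signed numbers, and LUI follows the slide's 0^32 ## immediate ## 0^16 literally. The function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Reinterpret a 64-bit pattern as a two's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def sext16(imm):
    """Sign-extend a 16-bit immediate."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def daddu(a, b):
    """DADDU: 64-bit add that wraps modulo 2**64."""
    return (a + b) & MASK64


def daddi(a, imm16):
    """DADDI: add a sign-extended immediate (the overflow trap is not modelled)."""
    return (a + sext16(imm16)) & MASK64


def lui(imm16):
    """LUI as written on the slide: 0^32 ## imm16 ## 0^16."""
    return (imm16 & 0xFFFF) << 16


def dsll(a, shamt):
    """DSLL: logical left shift; bits shifted past bit 63 are discarded."""
    return (a << shamt) & MASK64


def dslt(a, b):
    """DSLT: 1 if a < b as signed 64-bit numbers, else 0."""
    return 1 if to_signed64(a) < to_signed64(b) else 0


print(hex(lui(42)), dslt(MASK64, 0))  # 0x2a0000 1
```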
100MIPS64 Instructions Control-Flow
- J name: Jump. PC_36..63 ← name
- JAL name: Jump and link. Regs[R31] ← PC+4; PC_36..63 ← name; ((PC+4) - 2^27) ≤ name < ((PC+4) + 2^27)
- JALR R2: Jump and link register. Regs[R31] ← PC+4; PC ← Regs[R2]
- JR R3: Jump register. PC ← Regs[R3]
- BEQZ R4,name: Branch equal zero. if (Regs[R4] == 0) PC ← name; ((PC+4) - 2^17) ≤ name < ((PC+4) + 2^17)
- BNEZ R4,name: Branch not equal zero. if (Regs[R4] != 0) PC ← name; ((PC+4) - 2^17) ≤ name < ((PC+4) + 2^17)
- MOVZ R1,R2,R3: Conditional move if zero. if (Regs[R3] == 0) Regs[R1] ← Regs[R2]
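MOVZ lets a compiler replace a short branch with a conditional move. The three-instruction sequence in the docstring below is a hypothetical illustration (not from the notes) that computes R1 = max(R2, R3) with DSLT and MOVZ and no branch. The Python models each step on a dictionary of 64-bit registers.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Reinterpret a 64-bit pattern as a two's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def dslt(regs, rd, rs, rt):
    """DSLT rd, rs, rt: Regs[rd] <- 1 if Regs[rs] < Regs[rt] (signed), else 0."""
    regs[rd] = 1 if to_signed64(regs[rs]) < to_signed64(regs[rt]) else 0


def daddu(regs, rd, rs, rt):
    """DADDU rd, rs, rt: Regs[rd] <- Regs[rs] + Regs[rt] (mod 2**64)."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK64


def movz(regs, rd, rs, rt):
    """MOVZ rd, rs, rt: if Regs[rt] == 0 then Regs[rd] <- Regs[rs]."""
    if regs[rt] == 0:
        regs[rd] = regs[rs]


def branchless_max(a, b):
    """Hypothetical sequence computing R1 = max(R2, R3) with no branch:

        DSLT  R4, R2, R3   ; R4 = 1 if R2 < R3
        DADDU R1, R3, R0   ; R1 = R3 (R0 is always zero)
        MOVZ  R1, R2, R4   ; if R4 == 0 (R2 >= R3) then R1 = R2
    """
    regs = {"R0": 0, "R2": a & MASK64, "R3": b & MASK64}
    dslt(regs, "R4", "R2", "R3")
    daddu(regs, "R1", "R3", "R0")
    movz(regs, "R1", "R2", "R4")
    return to_signed64(regs["R1"])


print(branchless_max(7, -3), branchless_max(-5, 2))  # 7 2
```

The design point is that all three instructions always execute, so there is no branch to mispredict; the cost is that both candidate values must be computed.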
101The Role of Compilers
- The structure of recent compilers (for each pass: dependencies, then function):
- Front end per language: language dependent, machine independent. Transforms the language to a common intermediate form.
- High-level optimizations: somewhat language dependent, largely machine independent. For example, procedure inlining and loop transformations.
- Global optimizer: small language dependencies; slight machine dependencies (e.g., register counts/types). Includes global and local optimizations and register allocation.
- Code generator: highly machine dependent, language independent. Detailed instruction selection and machine-dependent optimizations; may include or be followed by an assembler.
102The Role of Compilers
103Compiler Optimization and Instruction Count
Change in instruction count for the programs
lucas and mcf from SPEC2000 as compiler
optimizations vary.
104Typical Operations
- Data Movement: load (from memory), store (to memory), memory-to-memory move, register-to-register move, input (from I/O device), output (to I/O device), push, pop (to/from stack)
- Arithmetic: integer (binary + decimal) or FP add, subtract, multiply, divide
- Logical: not, and, or, set, clear
- Shift: shift left/right, rotate left/right
- Control (Jump/Branch): unconditional, conditional
- Subroutine Linkage: call, return
- Interrupt: trap, return
- Synchronization: test & set (atomic r-m-w)
- String: search, translate (e.g., char to int)
105Top 10 80x86 Instructions
106Methods of Testing Condition
- Condition Codes
- Processor status bits are set as a side effect of arithmetic instructions (possibly on moves) or explicitly by compare or test instructions.
- Ex: add r1, r2, r3
- bz label
- Condition Register
- Ex: cmp r1, r2, r3
- bgt r1, label
- Compare and Branch
- Ex: bgt r1, r2, label
107Condition Codes
Setting the CC as a side effect can reduce the number of instructions:
X: . . .
   SUB r0, #1, r0
   BRP X
vs.
X: . . .
   SUB r0, #1, r0
   CMP r0, #0
   BRP X
But it also has disadvantages:
- Not all instructions set the condition codes, and which do and which do not is often confusing! E.g., a shift instruction sets the carry bit.
- There is a dependency between the instruction that sets the CC and the one that tests it: to overlap their execution, they may need to be separated by an instruction that does not change the CC.
(Figure: two overlapped pipelines, ifetch → read → compute → write; the second instruction reads the old CC before the first has computed the new CC.)
108Branches
Conditional control transfers
- Four basic conditions: N (negative), Z (zero), V (overflow), C (carry)
- Sixteen combinations of the basic four conditions (+ = OR, ⊕ = XOR, ¬ = NOT):
- Always: unconditional
- Never: NOP
- Not Equal: ¬Z
- Equal: Z
- Greater: ¬(Z + (N ⊕ V))
- Less or Equal: Z + (N ⊕ V)
- Greater or Equal: ¬(N ⊕ V)
- Less: N ⊕ V
- Greater Unsigned: ¬(C + Z)
- Less or Equal Unsigned: C + Z
- Carry Clear: ¬C
- Carry Set: C
- Positive: ¬N
- Negative: N
- Overflow Clear: ¬V
- Overflow Set: V
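The sixteen conditions are Boolean functions of N, Z, V, and C. The sketch below computes the flags for a 32-bit compare (a - b) and evaluates each condition. It assumes C is set on a borrow (a < b as unsigned numbers), the convention under which C + Z means "less or equal unsigned"; the 32-bit width and the names are assumptions.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def flags_after_compare(a, b):
    """Return (N, Z, V, C) after computing a - b in WIDTH-bit arithmetic.

    C is treated as a borrow flag: it is set when a < b as unsigned numbers.
    """
    a &= MASK
    b &= MASK
    r = (a - b) & MASK
    n = bool(r & SIGN)                  # result is negative
    z = r == 0                          # result is zero
    v = bool((a ^ b) & (a ^ r) & SIGN)  # signed overflow: operand signs differ and result sign differs from a
    c = a < b                           # unsigned borrow
    return n, z, v, c


CONDITIONS = {
    "always":                 lambda n, z, v, c: True,
    "never":                  lambda n, z, v, c: False,
    "not equal":              lambda n, z, v, c: not z,
    "equal":                  lambda n, z, v, c: z,
    "greater":                lambda n, z, v, c: not (z or (n != v)),
    "less or equal":          lambda n, z, v, c: z or (n != v),
    "greater or equal":       lambda n, z, v, c: n == v,
    "less":                   lambda n, z, v, c: n != v,
    "greater unsigned":       lambda n, z, v, c: not (c or z),
    "less or equal unsigned": lambda n, z, v, c: c or z,
    "carry clear":            lambda n, z, v, c: not c,
    "carry set":              lambda n, z, v, c: c,
    "positive":               lambda n, z, v, c: not n,
    "negative":               lambda n, z, v, c: n,
    "overflow clear":         lambda n, z, v, c: not v,
    "overflow set":           lambda n, z, v, c: v,
}


def branch_taken(cond, a, b):
    """Would a branch on cond be taken after comparing a with b?"""
    return CONDITIONS[cond](*flags_after_compare(a, b))


print(branch_taken("less", 3, 5), branch_taken("greater unsigned", 3, 5))  # True False
```

Note how the same bit patterns give opposite answers: 1 compared with 0xFFFFFFFF is "greater" as signed numbers (1 > -1) but "less or equal unsigned".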
109Conditional Branch Distance
- Distance from branch in instructions: 2^i ⇒ ≤ ±2^(i-1) and > 2^(i-2)
- 25% of integer branches are 2 to 4 instructions away
110Conditional Branch Addressing
- PC-relative, since most branches are to nearby targets; at least 8 bits suggested (±128 instructions)
- Compare Equal/Not Equal most important for integer programs (86%)
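The ±128-instruction figure follows from the range of an n-bit two's complement displacement d. The same arithmetic, applied to a 16-bit word offset measured from PC + 4 (an assumption consistent with the BEQZ/BNEZ range given earlier), yields the ±2^17-byte branch reach.

```latex
\begin{aligned}
d &\in \left[-2^{\,n-1},\; 2^{\,n-1}-1\right] && \text{(displacement in instructions)}\\
n = 8:\quad d &\in [-128,\; 127]\\
n = 16,\ \text{4-byte instructions:}\quad \text{target} &= (PC+4) + 4d,\qquad 4d \in \left[-2^{17},\; 2^{17}-4\right]
\end{aligned}
```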
111Operation Summary
- Support these simple instructions, since they will dominate the number of instructions executed:
- load,
- store,
- add,
- subtract,
- move register-register,
- and,
- shift,
- compare equal, compare not equal,
- branch (with a PC-relative address at least 8 bits long),
- jump,
- call,
- return
112Data Types
- Bit: 0, 1
- Bit String: sequence of bits of a particular length. 4 bits is a nibble; 8 bits is a byte; 16 bits is a half-word (VAX word); 32 bits is a word (VAX long word)
- Character: ASCII 7-bit code; EBCDIC 8-bit code
- Decimal: digits 0-9 encoded as 0000b through 1001b; two decimal digits packed per 8-bit byte
- Integers: Sign Magnitude 0X vs. 1X; 1's Complement 0X vs. 1(¬X); 2's Complement 0X vs. (1's comp) + 1. Positive #'s are the same in all; the first two have two zeros; the last one is usually chosen.
- Floating Point: Single Precision, Double Precision, Extended Precision. Value = M × R^E (M = mantissa, R = base, E = exponent). How many +/- #'s? Where is the decimal point? How are +/- exponents represented?
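The three integer representations agree for positive numbers and differ only in how negatives are formed. A Python sketch for an 8-bit width (an assumption; the functions accept any width), plus packed BCD as described under Decimal. The first two encodings also have a negative zero (0x80 in sign magnitude, 0xFF in 1's complement), which these encoders never produce because Python has a single integer 0.

```python
def _check_range(x, lo, hi):
    """Reject values outside the representable range."""
    if not lo <= x <= hi:
        raise ValueError(f"{x} is out of range [{lo}, {hi}]")


def sign_magnitude(x, bits=8):
    """Positive: 0X; negative: 1X (sign bit set, magnitude unchanged)."""
    limit = 1 << (bits - 1)
    _check_range(x, -(limit - 1), limit - 1)
    return (limit | -x) if x < 0 else x


def ones_complement(x, bits=8):
    """Positive: 0X; negative: every bit of the positive pattern inverted."""
    limit = 1 << (bits - 1)
    _check_range(x, -(limit - 1), limit - 1)
    mask = (1 << bits) - 1
    return (~(-x) & mask) if x < 0 else x


def twos_complement(x, bits=8):
    """Negative numbers are (1's complement) + 1, i.e. x modulo 2**bits."""
    limit = 1 << (bits - 1)
    _check_range(x, -limit, limit - 1)
    return x & ((1 << bits) - 1)


def pack_bcd(n):
    """Packed decimal: two digits per byte, each digit coded 0000b..1001b."""
    if n < 0:
        raise ValueError("sign handling is not modelled")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1]) for i in range(0, len(digits), 2))


print([hex(f(-5)) for f in (sign_magnitude, ones_complement, twos_complement)])
# ['0x85', '0xfa', '0xfb']
```

The asymmetric range of 2's complement (-128 to 127 in 8 bits, versus -127 to 127 for the other two) is the price of having a single zero.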
113Operand Size Usage
- Support these data sizes and types 8-bit,
16-bit, 32-bit integers and 32-bit and 64-bit
IEEE 754 floating point numbers
114Instruction Format
- If an ISA has many memory operands per instruction and many addressing modes, it needs an address specifier per operand.
- If it is a load-store machine with 1 address per instruction and one or two addressing modes, the addressing mode can simply be encoded in the opcode.
115Generic Examples of Instruction Formats
(Figure: variable, fixed, and hybrid instruction formats)
116Summary of Instruction Formats
- If code size is most important, use variable-length instructions.
- If performance is most important, use fixed-length instructions.
117Lecture Summary ISA