Title: Topic II Instruction-Set Architecture
1 Topic II: Instruction-Set Architecture
- Introduction
- A Case Study: The MIPS Instruction-Set Architecture
2 Reading List
- Slides: Topic2x
- Hennessy & Patterson, Chapter 2
- Other papers as assigned in class or homeworks
3 The Stored Memory Computer
- Five parts of a computer:
  - Datapath (channels/changes bits)
  - Control (directs operations)
  - Memory (places to keep bits)
  - Input (get data from outside)
  - Output (send data to outside)
4 Steps in Executing an Instruction
- Instruction Fetch: Fetch the next instruction from memory
- Instruction Decode: Examine the instruction to determine
  - What operation is performed by the instruction (e.g., addition)
  - What operands are required, and where the result goes
- Operand Fetch: Fetch the operands
- Execution: Perform the operation on the operands
- Result Writeback: Write the result to the specified location
- Next Instruction: Determine where to get the next instruction
5 What is Specified in an ISA?
- Instruction Decode: How are operations and operands specified?
- Operand Fetch: Where can operands be located? How many?
- Execution: What operations can be performed? What data types and sizes?
- Result Writeback: Where can results be written? How many?
- Next Instruction: How can we choose the next instruction?
6 A Simple ISA: Memory-Memory
- What operations can be performed? Basic arithmetic (for now)
- What data types and sizes? 32-bit integers
- Where can operands and results be located? Memory
- How many operands and results? 2 operands, 1 result
- How are operations and operands specified?
  - OP DEST, SRC1, SRC2
- How can we choose the next instruction? Next in sequence
7 Memory Model
- Think of memory as being a large array of n integers, referenced by the index (Random Access Memory, or RAM)
For instance, M[1] contains the value 3. We can read and write these locations. These are the only locations available to us. All abstract locations (such as variables in a C program) must be assigned locations in M.

Address   Contents
0         14
1         3
2         99
. . .     . . .
n - 1     0
8 Simple Code Translation
- Given the C code
  - A = B + C
- Assuming that we decide that variable A uses location 100, B uses 48, and C uses 76, convert the code above to the following assembly code:
  - ADD M[100], M[48], M[76]
- How would we express
  - A = (B + C) * (D + E)
9 Using a Temporary Location
- Assume we put A in 100, B in 48, C in 76, D in 20, and E in 32.
- Now choose an unused memory location (e.g., 84).
  - ADD M[100], M[48], M[76]      A = B + C
  - ADD M[84], M[20], M[32]       temp = D + E
  - MUL M[100], M[100], M[84]     A = A * temp
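The three-instruction sequence above can be traced with a small simulator. This is a minimal sketch, not part of the original slides: the function name `run_memory_memory`, the tuple encoding of instructions, and the sample values chosen for B, C, D, and E are assumptions made for illustration, and 32-bit overflow is ignored.

```python
# Sketch of a memory-memory machine: every operand and result is a memory address.
# Each instruction is a tuple (op, dest, src1, src2); this encoding is hypothetical.

def run_memory_memory(program, memory):
    """Execute the instructions in sequence, reading and writing memory directly."""
    ops = {
        "ADD": lambda a, b: a + b,
        "MUL": lambda a, b: a * b,
    }
    for op, dest, src1, src2 in program:               # next instruction: next in sequence
        result = ops[op](memory[src1], memory[src2])   # operand fetch, then execute
        memory[dest] = result                          # result writeback
    return memory


# Variable placement from the slide: A=100, B=48, C=76, D=20, E=32, temp=84.
memory = [0] * 128
memory[48], memory[76], memory[20], memory[32] = 2, 3, 4, 5  # sample B, C, D, E

program = [
    ("ADD", 100, 48, 76),    # A = B + C
    ("ADD", 84, 20, 32),     # temp = D + E
    ("MUL", 100, 100, 84),   # A = A * temp
]
run_memory_memory(program, memory)
print(memory[100])  # (2 + 3) * (4 + 5) = 45
```

Every instruction here touches main memory three times (two reads, one write), which is the cost the next slide is concerned with.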
10 Problems with Memory-Memory ISAs
- Main memory is much slower than arithmetic circuits
  - This was as true in 1950 as it is today!
- It takes a lot of room to specify memory addresses
- Results are often used one or two instructions later
- Remember: make the common case fast!
- Solution: store temporary or intermediate results in fast memories near the arithmetic units.
11 Accumulator Machines
- An accumulator machine keeps a single high-speed buffer (e.g., a set of D latches or flip-flops, one for each data bit) near the arithmetic logic.
- In the simplest kind, only one operand can be specified; the accumulator is implicit. "OP operand" means
  - acc. = acc. OP operand
- Example:
  - LOAD M[48]      Load B into acc.
  - ADD M[76]       Add C to acc. (now has B + C)
  - STORE M[100]    Write acc. to A
12 Accumulator Machine Does A = (B + C) * (D + E)
- LOAD M[20]      Load D into acc.
- ADD M[32]       Add E to acc. (now has D + E)
- STORE M[100]    Write acc. to A
- LOAD M[48]      Load B into acc.
- ADD M[76]       Add C to acc. (now has B + C)
- MUL M[100]      Multiply acc. by A
- STORE M[100]    Write (B + C) * (D + E) to A
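The accumulator sequence above can be checked the same way. A minimal sketch, assuming a hypothetical `run_accumulator` helper, a (opcode, address) tuple per instruction, and arbitrary sample values for B, C, D, and E; none of these names come from the slides.

```python
# Sketch of an accumulator machine: one implicit register (acc) plus memory.

def run_accumulator(program, memory):
    """Run (opcode, address) pairs; 'OP operand' means acc = acc OP M[operand]."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "ADD":
            acc = acc + memory[addr]
        elif op == "MUL":
            acc = acc * memory[addr]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc


# Same placement as the slides: A=100, B=48, C=76, D=20, E=32.
acc_memory = [0] * 128
acc_memory[48], acc_memory[76], acc_memory[20], acc_memory[32] = 2, 3, 4, 5

acc_program = [
    ("LOAD", 20),    # acc = D
    ("ADD", 32),     # acc = D + E
    ("STORE", 100),  # A = D + E (A doubles as the temporary)
    ("LOAD", 48),    # acc = B
    ("ADD", 76),     # acc = B + C
    ("MUL", 100),    # acc = (B + C) * A
    ("STORE", 100),  # A = (B + C) * (D + E)
]
final_acc = run_accumulator(acc_program, acc_memory)
print(acc_memory[100])  # 45
```

Note that the intermediate sum D + E still has to be stored to memory and read back, which is the shortcoming discussed on the next slide.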
13 Shortcomings of Accumulator Machines
- Still requires storing lots of temporary and intermediate values in memory
- Accumulator is only really beneficial for a chain (sequence) of calculations where the result of one is the input to the next.
14 Still, Accumulator Machines Were Common in Early Computers
- A simple design, and hence popular, especially for
  - Early computers
  - Early microprocessors (4004, 8008)
  - Low-end (cheap) models
- Reason: accumulator logic is much more expensive than memory
  - Vacuum tubes vs. core memory
  - D flip-flops vs. DRAM
  - Precious space on processor chip vs. off-chip DRAM
15 Alternatives to Accumulator Machines
- If more hardware resources are available, put more fast storage locations alongside the accumulator
  - Stack machines
  - Register machines
    - Special purpose
    - General purpose
16 Stack Machines
- Idea: A pile of fast storage locations with a top and a bottom.
An instruction can only get at the top value, or maybe the top two or three values. We can put new values on the top (push) or take them off the top (pop), but that's it. We can't get to locations underneath the top unless we remove everything above.

Address        Contents
top            14
2nd from top   3
3rd from top   99
. . .          . . .
bottom         0
17 Stack Machine ISA
- Basic operations include
  - Load: get value from memory and push onto stack
  - Store: pop value off of stack and put into memory
  - Arithmetic: pop 1 or 2 values off of stack; push result on stack
  - Dup: get value at top of stack without removing it; push new copy onto stack (why is this useful?)
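18 Stack Machine Does A = (B + C) * (D + E)

Instruction     Stack after instruction (top first)
(start)         XXX  (stack top at start)
LOAD M[20]      (D), XXX
LOAD M[32]      (E), (D), XXX
ADD             (D + E), XXX
LOAD M[48]      (B), (D + E), XXX
(continued next slide)
19 Stack Machine (cont.)

Instruction     Stack after instruction (top first)
LOAD M[76]      (C), (B), (D + E), XXX
ADD             (B + C), (D + E), XXX
MULT            ((B + C) * (D + E)), XXX
STORE M[100]    XXX

Note that the stack is now the same as when we began.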
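The stack trace above can be reproduced with a short simulator. This is a sketch under stated assumptions: `run_stack`, the tuple instruction encoding, and the sample values for B, C, D, and E are illustrative and not taken from the slides; the string "XXX" stands in for whatever was on the stack at the start.

```python
# Sketch of a stack machine: arithmetic pops its operands and pushes the result.

def run_stack(program, memory, stack):
    """Run instructions against a Python list used as the stack (top = last item)."""
    for instr in program:
        op = instr[0]
        if op == "LOAD":                 # push M[addr]
            stack.append(memory[instr[1]])
        elif op == "STORE":              # pop into M[addr]
            memory[instr[1]] = stack.pop()
        elif op == "DUP":                # push a copy of the top value
            stack.append(stack[-1])
        elif op in ("ADD", "MULT"):      # pop two values, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# Same placement as the slides: A=100, B=48, C=76, D=20, E=32.
stack_memory = [0] * 128
stack_memory[48], stack_memory[76], stack_memory[20], stack_memory[32] = 2, 3, 4, 5

stack_program = [
    ("LOAD", 20), ("LOAD", 32), ("ADD",),   # stack: (D + E), XXX
    ("LOAD", 48), ("LOAD", 76), ("ADD",),   # stack: (B + C), (D + E), XXX
    ("MULT",),                              # stack: (B + C) * (D + E), XXX
    ("STORE", 100),                         # stack: XXX
]
final_stack = run_stack(stack_program, stack_memory, ["XXX"])
print(stack_memory[100], final_stack)
```

No temporary memory location is needed here: the intermediate sum D + E simply waits on the stack underneath B and C.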
20 Stack Machines Used
- Some early computers
- x86 floating-point unit (x87) (sort of)
- Java Virtual Machine (JVM)
21 Register Machines
- Idea: Put more storage locations (registers) near the accumulator
- Regs have names/numbers and can be used instead of memory
- Accessed much faster than main memory
  - (1-2 CPU cycles vs. 10s to 100s of cycles)
- Far fewer registers than memory locations
  - MIPS has 32 32-bit registers
  - Fewer regs, smaller addresses, fewer bits to name them
- A scarce resource: use them carefully!
22 Special- vs. General-Purpose Registers
- A special-purpose register is used for specific purposes, and there may be limitations on which operations can use it
  - Easier on the HW design: put the reg right where it's needed
  - More difficult for the compiler to use effectively
- A general-purpose register can be used in any operation
  - Datapaths more general, but routing is more difficult
23 Special-Purpose Registers: The Z-80 CPU
- Seven 8-bit registers: A, B, C, D, E, H, L (BC, DE, HL can be pairs)
- Three 16-bit registers: SP, IX, IY, plus PC (program counter)
- Add, subtract, shift can only be done to A (8-bit accumulator)
- Increment and decrement can be done to all regs and reg pairs
- Can fetch from memory at address (HL) and put in any 8-bit reg
- A fetch from address (BC) or (DE) can only go to A
- Fetches from (BC), (HL) and (IX) take different numbers of cycles
- Anyone want to write a compiler for this?
24 General Purpose Register (GPR) Machines
- The MIPS (and similar processors) has 32 General Purpose Registers (GPRs), each 32 bits long. All can be read or written, except register 0, which is always 0 and can't be changed.
- Register access time is uniform.

Address   Contents
0         0
1         3
2         99
. . .     . . .
31        14
25 GPR Machine Does A = (B + C) * (D + E)
- ADD $1, M[48], M[76]     R1 = B + C
- ADD $2, M[20], M[32]     R2 = D + E
- MUL M[100], $1, $2       A = R1 * R2
26 Some Trends
- From hardware technology: the number of registers that can be put on a chip has the potential to grow very fast (Moore's Law?)
- A very large register set will have slow access time.
- Instruction-set evolution is slow to accommodate changes in the number of registers
27 Memory and Data Sizes
- So far, we've only talked about uniform data sizes. Actual data come in many different sizes:
  - Single bits (boolean values, true or false)
  - Bytes (8 bits): characters (ASCII), very small integers
  - Halfwords (16 bits): characters (Unicode), short integers
  - Words (32 bits): long integers, floating-point (FP) numbers
  - Double-words (64 bits): very long integers, double-precision FP
  - Quad-words (128 bits): quad-precision floating-point numbers
NOTE: There is another data size, called extended double precision, which is 80 bits long. It is used in x86 FPUs.
28 Different Data Sizes
- How do we handle different data sizes?
  - Pick one size to be the unit stored in a single address
  - Store a larger datum in a set of contiguous memory locations
  - Store a smaller datum in one location; use shift and mask ops
- Today, almost all machines (including MIPS) are byte-addressable: each addressable location in memory holds 8 bits.
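To make "shift and mask ops" concrete, here is a minimal sketch of reading and replacing one byte inside a 32-bit word. The helper names `get_byte` and `set_byte` are hypothetical, and byte 0 is taken to be the least significant byte, which is an assumption of this example rather than something the slides specify.

```python
# Sketch: packing a smaller datum (a byte) inside a larger unit (a 32-bit word).

WORD_MASK = 0xFFFFFFFF  # keep results within 32 bits


def get_byte(word, index):
    """Return byte `index` of `word` (index 0 = least significant byte)."""
    return (word >> (8 * index)) & 0xFF


def set_byte(word, index, value):
    """Return `word` with byte `index` replaced by the low 8 bits of `value`."""
    cleared = word & ~(0xFF << (8 * index)) & WORD_MASK   # mask out the old byte
    return cleared | ((value & 0xFF) << (8 * index))      # shift the new byte in


word = 0x000F4240
print(hex(get_byte(word, 1)))        # 0x42
print(hex(set_byte(word, 0, 0xFF)))  # 0xf42ff
```

On a byte-addressable machine this work is done by the byte load/store hardware, so the programmer rarely has to spell it out.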
29 MIPS Memory
- On a byte-addressable machine such as the MIPS, if we say a word (32 bits) is stored at address 80, we mean it occupies locations 80-83. (The next word would start at 84.)
- Normally, multi-byte loads and stores must be aligned: the address of an n-byte load/store must be a multiple of n. For instance, halfwords can only be stored at even addresses.
- MIPS allows non-aligned loads and stores using special instructions, but they may be slower. (Most processors don't allow this at all!)
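The alignment rule above amounts to a single remainder test. A minimal sketch, with hypothetical helper names `is_aligned` and `bytes_occupied`:

```python
# Sketch: the alignment rule for an n-byte access on a byte-addressable machine.

def is_aligned(address, size):
    """An n-byte load/store is aligned when its address is a multiple of n."""
    return address % size == 0


def bytes_occupied(address, size):
    """List the byte addresses an n-byte datum stored at `address` occupies."""
    return list(range(address, address + size))


print(bytes_occupied(80, 4))  # [80, 81, 82, 83]
print(is_aligned(80, 4))      # True: 80 is a multiple of 4
print(is_aligned(82, 4))      # False: a word cannot start at 82
print(is_aligned(82, 2))      # True: a halfword can start at any even address
```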
30 Byte-Order (Endianness)
- For a multi-byte datum, which part goes in which byte?
- If $1 contains 1,000,000 (F4240H) and we store it into address 80:
  - On a big-endian machine, the "big" end goes into address 80
  - On a little-endian machine, it's the other way around

Big-endian:
Address    79   80   81   82   83   84
Contents        00   0F   42   40

Little-endian:
Address    79   80   81   82   83   84
Contents        40   42   0F   00
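The two byte layouts above can be reproduced with Python's built-in `int.to_bytes` and `int.from_bytes`. This is only an illustrative sketch; the variable names are arbitrary.

```python
# Sketch: the same 32-bit value laid out in big-endian and little-endian order.

value = 1_000_000  # 0x000F4240

big = value.to_bytes(4, byteorder="big")        # byte at address 80 comes first
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))     # 00 0f 42 40
print(little.hex(" "))  # 40 42 0f 00

# Reading big-endian bytes on a little-endian machine without conversion
# silently yields a different number.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))     # 0x40420f00
```

The misread value is the kind of compatibility problem that arises when multi-byte data moves between machines of different endianness.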
31 Big-Endian vs. Little-Endian
- Big-endian machines: MIPS, Sparc, 68000
- Little-endian machines: most Intel processors, Alpha, VAX, Intel 8086
- No real reason one is better than the other
- Compatibility problems transferring multi-byte data between big-endian and little-endian machines: CAREFUL!
- Read Appendix A-43 for more information.
32 Addressing Modes
- An ISA's addressing modes answer the question: where can operands be located?
- We have two types of storage in the MIPS (and most other machines): registers and main memory.
- We can go to either or both for operands. A single operand can come from either a register or a memory location, and addressing modes offer various ways of specifying this location.
33 Simple Addressing Modes
- In these modes, a location or datum is given directly in the instruction

Mode name              Example          Meaning
Register               mov $1, $2       R2 -> R1
Direct (or absolute)   mov $1, (40)     M[40] -> R1
Immediate              mov $1, 40       40 -> R1
34 Indirect Addressing Modes
- One or more registers are used to produce a memory address

Mode name       Example             Meaning
Reg. Indirect   mov $1, ($2)        M[R2] -> R1
Displacement    mov $1, 40($2)      M[40 + R2] -> R1
Indexed         mov $1, $4($2)      M[R4 + R2] -> R1
Mem. Indirect   mov $1, @($2)       M[M[R2]] -> R1
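The simple and indirect modes differ only in how the operand is located. A minimal sketch of that lookup follows; the function `fetch_operand`, its parameter names, and the register and memory contents are all made up for illustration.

```python
# Sketch: where the source operand comes from under each addressing mode.

def fetch_operand(mode, regs, mem, reg=None, const=None, index_reg=None):
    """Return the value that would be moved into the destination register."""
    if mode == "register":        # value is in a register
        return regs[reg]
    if mode == "immediate":       # value is the constant itself
        return const
    if mode == "direct":          # M[const]
        return mem[const]
    if mode == "reg_indirect":    # M[R]
        return mem[regs[reg]]
    if mode == "displacement":    # M[const + R]
        return mem[const + regs[reg]]
    if mode == "indexed":         # M[Rindex + R]
        return mem[regs[index_reg] + regs[reg]]
    if mode == "mem_indirect":    # M[M[R]]
        return mem[mem[regs[reg]]]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state: R2 = 8, R4 = 4, and a few memory cells filled in.
regs = {2: 8, 4: 4}
mem = [0] * 64
mem[8], mem[12], mem[40], mem[48] = 12, 55, 7, 99

print(fetch_operand("displacement", regs, mem, reg=2, const=40))  # M[48] = 99
print(fetch_operand("mem_indirect", regs, mem, reg=2))            # M[M[8]] = M[12] = 55
```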
35 Advanced Addressing Modes
- Extra features to support features in high-level languages or reduce the number of instructions during common memory accesses

Mode name        Example               Meaning
Auto-increment   mov $1, 4($2)++       M[4 + R2] -> R1
Auto-decrement   mov $1, 4($2)--       M[R2 - 4] -> R1
Scaled           mov $1, 40($2)[s]     M[40 + R2 x s] -> R1
36 Choices in Addressing Modes
- Anything goes: Any addressing mode may be used for any operand at any time
  - Easier to map high-level statements directly to instructions
  - Hard to design processor, due to all the complexity
- Limited addressing: Only allow a few modes, and/or restrict some operands to certain modes
  - Harder for compiler/programmer to follow all the rules
  - Code may be longer
37 Frequency of Addressing Modes
- 3 programs measured on VAX, which supports all kinds of modes

                 Frequency of mode (%)
Mode Name        Min.   Ave.   Max.
Displacement     32     42     55
Immediate        17     33     43
Reg. Indirect    3      13     24
Scaled           0      7      16
Mem. Indirect    1      3      6
Others           0      2      3
38 Empirical Data on Addressing Modes
- How big do the displacements need to be?
  - In a study of SPECint92 and SPECfp92, 99% of displacements fell within ±2^15
- How big do the immediates (constants) need to be?
  - Studies show 50-60% fit within 8 bits
  - 75-80% fit within 16 bits
Exercise: search current results (e.g. for SPEC2005?)
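"Fits within n bits" can be stated precisely for signed (two's complement) fields. A minimal sketch, assuming a hypothetical helper named `fits_signed`; the test values are arbitrary and are not measurements from any study.

```python
# Sketch: does a displacement or immediate fit in an n-bit signed field?

def fits_signed(value, bits):
    """True if `value` is representable in `bits`-bit two's complement,
    i.e. -2^(bits-1) <= value <= 2^(bits-1) - 1."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


print(fits_signed(32767, 16))   # True:  2^15 - 1 is the largest 16-bit value
print(fits_signed(32768, 16))   # False: 2^15 needs a 17th bit
print(fits_signed(-32768, 16))  # True:  -2^15 is the smallest 16-bit value
print(fits_signed(200, 8))      # False: 8-bit signed range is -128..127
```

A 16-bit signed field therefore covers exactly the ±2^15 range that the displacement measurements point to.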
39 How Do We Represent Instructions?
- We need some bits to tell what operation is performed (e.g., add, sub, mul, etc.); this is called the opcode.
- We need some bits for each operand and result (3 total, in our case)
  - What type of addressing mode
  - Number of the register, memory address and/or immediate constant
40 Variable-Length Instructions
- Since the VAX allows any mode for any operand, there could be an instruction with three 32-bit addresses (direct addressing), which means more than 12 bytes in this instruction.
- But registers need only a few bits to specify, so 12 bytes would be wasteful for an instruction using 3 registers only!
- Must use variable-length instructions. On the VAX, instructions can vary from 1 to 17 bytes!
41 Fixed-Length Instructions
- If every instruction has the same number of bits (preferably a nice even number like 16 or 32), many components of the processor will be simpler.
- But we either waste some amount of space or can't support all the addressing modes!
42 Loading Small Integers
- All registers in MIPS are 32 bits
- What if we load a byte or halfword into a reg?
  - Load the bits into the lowest 8 or 16 bits of the reg.
  - Unsigned load: All upper bits set to 0
  - Signed load: All upper bits set to sign bit (MSB of byte/halfword)
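The two kinds of load differ only in how the upper bits of the 32-bit register are filled. A minimal sketch, with hypothetical helper names `zero_extend` and `sign_extend` that model the register contents as a non-negative Python integer:

```python
# Sketch: filling a 32-bit register from an 8-bit or 16-bit datum.

REG_MASK = 0xFFFFFFFF  # a 32-bit register


def zero_extend(value, bits):
    """Unsigned load: keep the low `bits` bits, upper bits become 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Signed load: copy the datum's MSB into all upper bits of the register."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value & (1 << (bits - 1)):        # sign bit (MSB of byte/halfword) is set
        value |= REG_MASK & ~low_mask    # set every bit above the datum
    return value


print(hex(zero_extend(0xF0, 8)))     # 0xf0
print(hex(sign_extend(0xF0, 8)))     # 0xfffffff0  (-16 as a 32-bit value)
print(hex(sign_extend(0x7F, 8)))     # 0x7f        (positive: unchanged)
print(hex(sign_extend(0x8000, 16)))  # 0xffff8000
```

On MIPS these correspond to the unsigned loads `lbu`/`lhu` and the signed loads `lb`/`lh`.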
43 The RISC Approach
- In a Reduced Instruction Set Computer
  - All instructions are the same size (32 bits on the MIPS)
  - Few addressing modes are supported (only the frequent ones)
  - Only a few instruction formats (makes decoding easier!)
  - Arithmetic instructions can only work on registers
  - Data in memory must be loaded into registers before processing
    - This is called a load-store architecture
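As an illustration of a fixed 32-bit format, the sketch below packs and unpacks the MIPS register (R) format, whose fields are opcode (6 bits), rs, rt, rd, shamt (5 bits each), and funct (6 bits). The helper names `encode_r_type` and `decode_r_type` are hypothetical; the example instruction is `add $8, $9, $10`, which uses opcode 0 and funct 0x20.

```python
# Sketch: packing a MIPS R-format instruction into one 32-bit word.

# (field name, width in bits), most significant field first; widths sum to 32.
R_FIELDS = [("opcode", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6)]


def encode_r_type(**fields):
    """Pack the named fields into a 32-bit word; missing fields default to 0."""
    word = 0
    for name, width in R_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_r_type(word):
    """Split a 32-bit word back into its R-format fields."""
    fields = {}
    for name, width in reversed(R_FIELDS):   # peel fields off the low end
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


add_word = encode_r_type(rs=9, rt=10, rd=8, funct=0x20)  # add $8, $9, $10
print(f"{add_word:08x}")  # 012a4020
print(decode_r_type(add_word))
```

Because every field sits at a fixed bit position, the decoder can extract all of them in parallel, which is part of what makes decoding easier with few formats.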
44 RISC Criteria [Colwell 85]
- Single cycle operation
- Load/store machine
- Hardwired control
- Relatively few instructions and addressing modes
- Fixed instruction format
- More compile-time effort