Title: Instructions: Language of the Computer
1Instructions Language of the Computer
2Instruction Set Architecture: a Critical Interface
[Figure: the instruction set is the interface layer between software (above) and hardware (below).]
- The portion of the machine that is visible to the programmer or the compiler writer.
3Good ISA
- Lasts through many implementations (portability, compatibility)
- Can be used for many different applications (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels
4Von Neumann Machines
- Von Neumann invented the stored-program computer in 1945
- Instead of program code being hardwired, the program code (instructions) is placed in memory along with data
[Figure: a processor (Control, ALU) attached to a single memory that holds both Program and Data.]
5Stored Program Concept
- Instructions are bits
- Programs are stored in memory to be read or written just like data
- Fetch-execute cycle
  - Instructions are fetched and put into a special register
  - Bits in the register "control" the subsequent actions
  - Fetch the next instruction and continue
[Figure: one memory holds data, programs, compilers, editors, etc.]
6Execution Cycle
Obtain instruction from program storage
Determine required actions and instruction size
Locate and obtain operand data
Compute result value or status
Deposit results in storage for later use
Determine successor instruction
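The six steps above can be sketched as a loop over a toy accumulator machine. This is only an illustration, not MIPS: the opcode names (LOAD, ADD, STORE, HALT), the tuple encoding of instructions and the memory layout are all assumptions made for the example.

```python
# Toy stored-program machine: instructions and data share one memory list.
# The opcodes and the (opcode, address) tuple encoding are hypothetical.

def run(memory):
    pc = 0    # program counter: address of the next instruction
    acc = 0   # a single CPU register (accumulator)
    while True:
        instr = memory[pc]                # obtain instruction from program storage
        op, operand = instr               # determine required actions
        if op == "HALT":
            return memory
        if op == "LOAD":
            acc = memory[operand]         # locate and obtain operand data
        elif op == "ADD":
            acc = acc + memory[operand]   # compute result value
        elif op == "STORE":
            memory[operand] = acc         # deposit result in storage
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1                           # determine successor instruction

# Addresses 0-3 hold the program; addresses 4-6 hold the data A, B, C.
program = [("LOAD", 5), ("ADD", 6), ("STORE", 4), ("HALT", None), 0, 20, 22]
result = run(program)
```

After the run, address 4 (A) holds B + C. Note that the program and its data sit in the same list, which is the stored-program idea from the earlier slides.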
7Basic ISA Classes
- Memory-to-memory machines
  - Every instruction contains a full memory address for each operand
  - Maybe the simplest ISA design
  - However, memory is slow
  - Memory is big (lots of address bits)
8Memory-to-memory machine
- Assumptions
  - Two operands per operation
  - First operand is also the destination
  - Memory address: 16 bits (2 bytes)
  - Operand size: 32 bits (4 bytes)
  - Instruction code: 8 bits (1 byte)
- Example: A = B + C (hypothetical code)
  - mov A, B    ; A <- B
  - add A, C    ; A <- A + C
- Memory traffic
  - 5 bytes per instruction (1-byte code + two 2-byte addresses)
  - 4 bytes each to fetch the 1st and 2nd operands
  - 4 bytes to store the result
  - add needs 17 bytes (5 + 4 + 4 + 4) and mov needs 13 bytes (5 + 4 + 4)
  - Total: 30 bytes of memory traffic
9Why CPU Storage?
- A small amount of storage in the CPU
  - To reduce memory traffic by keeping repeatedly used operands in the CPU
  - Avoid re-referencing memory
  - Avoid having to specify the full memory address of the operand
- This is a perfect example of "make the common case fast."
- Simplest case
  - A machine with 1 cell of CPU storage: the accumulator
10Accumulator Machine
- Assumptions
  - Two operands per operation
  - 1st operand in the accumulator
  - 2nd operand in memory
  - Accumulator is also the destination (except for store)
  - Memory address: 16 bits (2 bytes)
  - Operand size: 32 bits (4 bytes)
  - Instruction code: 8 bits (1 byte)
- Example: A = B + C (hypothetical code)
  - load B     ; acc <- B
  - add C      ; acc <- acc + C
  - store A    ; A <- acc
- Memory traffic
  - 3 bytes per instruction (1-byte code + one 2-byte address)
  - 4 bytes to load or store the second operand
  - 7 bytes per instruction
  - 21 bytes total memory traffic
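The byte counts for the memory-to-memory and accumulator machines can be checked with a few lines of arithmetic. The sketch below uses only the sizes assumed on the slides; the helper name `traffic` is made up for the example.

```python
# Sizes taken from the slide assumptions.
OPCODE_BYTES = 1    # instruction code: 8 bits
ADDRESS_BYTES = 2   # memory address: 16 bits
OPERAND_BYTES = 4   # operand size: 32 bits

def traffic(num_addresses, operand_accesses):
    """Bytes moved for one instruction: its encoding plus its data accesses."""
    instruction_bytes = OPCODE_BYTES + num_addresses * ADDRESS_BYTES
    return instruction_bytes + operand_accesses * OPERAND_BYTES

# Memory-to-memory machine: two addresses per instruction.
mov_bytes = traffic(2, 2)   # read B, write A
add_bytes = traffic(2, 3)   # read A, read C, write A
memory_to_memory_total = mov_bytes + add_bytes

# Accumulator machine: one address and one data access per instruction.
accumulator_total = 3 * traffic(1, 1)   # load B, add C, store A
```

Under these assumptions the single accumulator already cuts the traffic for A = B + C from 30 bytes to 21.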
11Stack Machines
- Instruction sets are based on a stack model of execution.
- Aimed at compact instruction encoding
  - Most instructions manipulate the top few data items (mostly the top 2) of a pushdown stack.
  - Top few items of the stack are kept in the CPU
- Ideal for evaluating expressions (stack holds intermediate results)
- Were thought to be a good match for high-level languages
- Awkward
  - Become very slow if the stack grows beyond CPU local storage
  - No simple way to get data from the middle of the stack
12Stack Machines
- Binary arithmetic and logic operations
  - Operands: top 2 items on stack
  - Operands are removed from stack
  - Result is placed on top of stack
- Unary arithmetic and logic operations
  - Operand: top item on the stack
  - Operand is replaced by result of operation
- Data move operations
  - Push: place memory data on top of stack
  - Pop: move top of stack to memory
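The three kinds of stack operations listed above can be sketched as a small interpreter. The instruction names (push, pop, add, neg) and the dictionary standing in for memory are assumptions chosen for illustration; they do not describe any particular stack machine.

```python
def run_stack_machine(program, memory):
    """Execute a list of (opcode, *args) tuples against a dict used as memory."""
    stack = []
    for op, *args in program:
        if op == "push":                 # data move: memory -> top of stack
            stack.append(memory[args[0]])
        elif op == "pop":                # data move: top of stack -> memory
            memory[args[0]] = stack.pop()
        elif op == "add":                # binary op: remove two operands, push result
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "neg":                # unary op: replace the top item by its result
            stack.append(-stack.pop())
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory

memory = {"A": 0, "B": 20, "C": 22}
run_stack_machine([("push", "B"), ("push", "C"), ("add",), ("pop", "A")], memory)
```

Only push and pop name a memory location; add carries no operand address at all, which is where the compact encoding comes from.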
13General Purpose Register Machines
- With stack machines, only the top two elements of the stack are directly available to instructions. In general purpose register machines, the CPU storage is organized as a set of registers that are equally available to the instructions
- Frequently used operands are placed in registers (under program control)
  - Reduces instruction size
  - Reduces memory traffic
14General Purpose Registers Dominate
- 1975-present: all machines use general purpose registers
- Advantages of registers
  - Registers are faster than memory
  - Registers are easier for a compiler to use
    - e.g., (A*B) - (C*D) - (E*F) can do the multiplies in any order, vs. a stack
  - Registers can hold variables
    - Memory traffic is reduced, so the program is sped up (since registers are faster than memory)
    - Code density improves (since a register is named with fewer bits than a memory location)
15Classifying General Purpose Register Machines
- General purpose register machines are sub-classified based on whether or not memory operands can be used by typical ALU instructions
- Register-memory machines: machines where some ALU instructions can specify at least one memory operand and one register operand
- Load-store machines: the only instructions that can access memory are the load and the store instructions
16Comparing number of instructions
- Code sequence for A = B + C for five classes of instruction sets

Class                        Code sequence
Register (register-memory)   load R1, B; add R1, C; store A, R1
Register (load-store)        load R1, B; load R2, C; add R1, R1, R2; store A, R1
Stack                        push B; push C; add; pop A
Memory-to-memory             mov A, B; add A, C
Accumulator                  load B; add C; store A

MIPS is one of these
17Instruction Set Definition
- Objects = architecture entities = machine state
  - Registers
    - General purpose
    - Special purpose (e.g. program counter, condition code, stack pointer)
  - Memory locations
    - Linear address space: 0, 1, 2, ..., 2^s - 1
- Operations = instruction types
  - Data operation
    - Arithmetic
    - Logical
  - Data transfer
    - Move (from register to register)
    - Load (from memory location to register)
    - Store (from register to memory location)
  - Instruction sequencing
    - Branch (conditional)
    - Jump (unconditional)
18Registers (MIPS)
- 32 registers provided (but not 32 usable registers!)
  - R0 .. R31
  - Register R0 is hard-wired to zero
  - Register R1 is reserved for the assembler
- Operands of arithmetic instructions must be registers
19MIPS Software conventions for Registers
Number  Name     Use
0       zero     constant 0
1       at       reserved for assembler
2-3     v0-v1    expression evaluation and function results
4-7     a0-a3    arguments
8-15    t0-t7    temporary: caller saves (callee can clobber)
16-23   s0-s7    callee saves (callee must save)
24-25   t8-t9    temporary (cont'd)
26-27   k0-k1    reserved for OS kernel
28      gp       pointer to global area
29      sp       stack pointer
30      fp       frame pointer
31      ra       return address (HW)
20Memory Organization
- Viewed as a large, single-dimension array, with an address.
- A memory address is an index into the array
- "Byte addressing" means that the index points to a byte of memory.

Address  Contents
0        8 bits of data
1        8 bits of data
2        8 bits of data
3        8 bits of data
4        8 bits of data
5        8 bits of data
6        8 bits of data
...
21Memory Addressing
- Bytes are nice, but most data items use larger "words"
- For MIPS, a word is 32 bits or 4 bytes.
- 2 questions for design of ISA
  - Since one could read a 32-bit word as four loads of bytes from sequential byte addresses or as one load word from a single byte address,
    - How do byte addresses map to word addresses?
    - Can a word be placed on any byte boundary?
22Addressing Objects: Endianness and Alignment
- Big Endian: address of most significant byte = word address (xx00 = Big End of word)
  - IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
- Little Endian: address of least significant byte = word address (xx00 = Little End of word)
  - Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
[Figure: a 32-bit word drawn from msb to lsb. Little endian numbers the bytes 3 2 1 0, so byte 0 is the lsb; big endian numbers them 0 1 2 3, so byte 0 is the msb. A second panel contrasts an aligned word with a not-aligned word across byte addresses 0 1 2 3.]
- Alignment: requires that objects fall on an address that is a multiple of their size.
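Both ideas on this slide are easy to try out. The sketch below lays out one 32-bit word in each byte order using Python's built-in `int.to_bytes`, and adds a small alignment check; the sample value and the helper name `is_aligned` are chosen for illustration.

```python
word = 0x0A0B0C0D   # sample 32-bit value: most significant byte 0x0A, least 0x0D

# Index 0 of each result is the byte stored at the word address (xx00).
big = word.to_bytes(4, byteorder="big")
little = word.to_bytes(4, byteorder="little")

def is_aligned(address, size):
    """An object is aligned when its address is a multiple of its size."""
    return address % size == 0
```

With this value, `big[0]` is the most significant byte and `little[0]` is the least significant byte. A 4-byte word at address 6 is not aligned, while a 2-byte halfword at the same address is.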
23Assembly Language vs. Machine Language
- Assembly provides convenient symbolic representation
  - much easier than writing down numbers
  - e.g., destination first
- Machine language is the underlying reality
  - e.g., destination is no longer first
- Assembly can provide 'pseudoinstructions'
  - e.g., "move $t0, $t1" exists only in Assembly
  - would be implemented using "add $t0, $t1, $zero"
- When considering performance you should count real instructions
24MIPS arithmetic
- Design Principle: simplicity favors regularity. Why?
- Of course this complicates some things...
  - C code:
      A = B + C + D;
      E = F - A;
  - MIPS code:
      add $t0, $s1, $s2
      add $s0, $t0, $s3
      sub $s4, $s5, $s0
- Operands must be registers, only 32 registers provided
- Design Principle: smaller is faster. Why?
25MIPS arithmetic
- All ALU instructions have 3 operands
  - add R1, R2, R3
  - sub R1, R2, R3
- Operand order is fixed (destination first)
- Example
  - C code: A = B + C
  - MIPS code: add $s0, $s1, $s2
  - (registers associated with variables by compiler)
26Execution assembly instructions
- Program counter holds the instruction address
- CPU fetches the instruction from memory and puts it into the instruction register
- Control logic decodes the instruction and tells the register file, ALU and other registers what to do
- An ALU operation (e.g. add): data flows from the register file, through the ALU and back to the register file
27ALU Execution Example
28ALU Execution example
29Memory Instructions
- Load and store instructions
  - lw $t1, offset($t0)
  - sw $t1, offset($t0)
- Example
  - C code: A[8] = h + A[8];    (assume h in $s2 and base address of the array A in $s3)
  - MIPS code:
      lw  $t0, 32($s3)
      add $t0, $s2, $t0
      sw  $t0, 32($s3)
- Store word has destination last
- Remember arithmetic operands are registers, not memory!
30Accessing Data
- ALU generates address
- Address goes to Memory address register
- When memory is accessed, results are returned to Memory data register
- Notice that data and instruction addresses can be the same: both just address memory
31Memory Operations - Loads
- Load data from memory
  - lw R6, 0(R5)    # R6 <- mem[R5 + 0]
32Memory Operations - Stores
- Storing data to memory works essentially the same way
  - sw R6, 0(R5)
  - R6 = 200; let's assume R5 = 0x18
  - mem[0x18] <- 200
33So far we've learned
- MIPS
  - loading words but addressing bytes
  - arithmetic on registers only

Instruction            Meaning
add $s1, $s2, $s3      $s1 = $s2 + $s3
sub $s1, $s2, $s3      $s1 = $s2 - $s3
lw  $s1, 100($s2)      $s1 = Memory[$s2 + 100]
sw  $s1, 100($s2)      Memory[$s2 + 100] = $s1
34Use of Registers
- Example
    a = (b + c) - (d + e);    // C statement
                              // $s0 - $s4 hold a - e
    add $t0, $s1, $s2
    add $t1, $s3, $s4
    sub $s0, $t0, $t1
- Example
    a = b + A[4];             // add an array element to a var
                              // $s3 has address of A
    lw  $t0, 16($s3)
    add $s1, $s2, $t0
35Use of Registers load and store
- Example
    A[8] = a + A[6];        // A is in $s3, a is in $s2
    lw  $t0, 24($s3)        # $t0 gets A[6] contents
    add $t0, $s2, $t0       # $t0 gets the sum
    sw  $t0, 32($s3)        # sum is put in A[8]
36load and store
- Example
    a = b + A[i];           // A is in $s3; a, b, i in $s1, $s2, $s4
    add $t1, $s4, $s4       # $t1 = 2 * i
    add $t1, $t1, $t1       # $t1 = 4 * i
    add $t1, $t1, $s3       # $t1 = addr. of A[i] ($s3 + (4 * i))
    lw  $t0, 0($t1)         # $t0 = A[i]
    add $s1, $s2, $t0       # a = b + A[i]
37Example Swap
- Swapping words
- $s2 has the base address of the array v
- C code:
    temp = v[0];
    v[0] = v[1];
    v[1] = temp;
- MIPS code:
    swap: lw $t0, 0($s2)
          lw $t1, 4($s2)
          sw $t0, 4($s2)
          sw $t1, 0($s2)
38Machine Language
- Instructions, like registers and words of data, are also 32 bits long
  - Example: add $t0, $s1, $s2
  - registers have numbers: $t0 = 8, $s1 = 17, $s2 = 18
- Instruction format:

    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01000  00000  100000

- Can you guess what the field names stand for?
39Arithmetic Operation
- op: operation of the instruction
- rs: first register source operand
- rt: second register source operand
- rd: register destination operand
- shamt: shift amount
- funct: function (selects the type of ALU operation)
  - add: 32
  - sub: 34
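The R-type layout can be reproduced by packing the six fields into one 32-bit integer. The sketch below assumes the standard MIPS field widths of 6, 5, 5, 5, 5 and 6 bits, which match the bit groups shown for add $t0, $s1, $s2; the function name `encode_r_type` is made up for the example.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6, 5, 5, 5, 5, 6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# Register numbers from the slides.
T0, S1, S2 = 8, 17, 18

add_word = encode_r_type(0, S1, S2, T0, 0, 32)   # add $t0, $s1, $s2
sub_word = encode_r_type(0, S1, S2, T0, 0, 34)   # sub $t0, $s1, $s2
add_bits = format(add_word, "032b")
```

Although the destination comes first in assembly, `rd` sits in the fourth field of the machine word, which is the "destination is no longer first" point made earlier. The add and sub words differ only in the funct field.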
40Machine Language
- Consider the load-word and store-word instructions
  - What would the regularity principle have us do?
  - New principle: good design demands a compromise
- Introduce a new type of instruction format
  - I-type for data transfer instructions
  - the other format was R-type for register
- Example: lw $t0, 32($s2)

    op   rs   rt   16-bit number
    35   18   8    32

- Where's the compromise?
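The I-type example can be checked the same way. This sketch assumes the standard MIPS field widths (6, 5, 5 and 16 bits) and treats the 16-bit number as a signed two's-complement offset; the helper names are hypothetical.

```python
def encode_i_type(op, rs, rt, immediate):
    """Pack an I-type instruction; the immediate is truncated to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def decode_i_type(word):
    """Split a 32-bit word into (op, rs, rt, signed 16-bit immediate)."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend a negative offset
        imm -= 0x10000
    return op, rs, rt, imm

lw_word = encode_i_type(35, 18, 8, 32)   # lw $t0, 32($s2)
```

One visible compromise: in an I-type word the `rt` field names the register being loaded (a destination), whereas in an R-type word it names a source.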
41Generic Examples of Instruction Format Widths
[Figure: variable-width vs. fixed-width instruction formats.]
42Instruction Formats
- If code size is most important, use variable-length instructions
- If performance is most important, use fixed-length instructions
- Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit wide instructions (Thumb, MIPS16); per procedure, decide performance or density
- Some architectures are actually exploring on-the-fly decompression for more density.
43Example
- C code: A[300] = h + A[300];
- Assembler code
    lw  $t0, 1200($s3)
    add $t0, $s2, $t0
    sw  $t0, 1200($s3)
- Binary code (decimal notation)
[Table: the three instructions above, field by field in decimal.]
44Example
- Real binary code
- The highlighted number shows the difference of
only 1 bit in the op codes.
45Constants
- Small constants are used quite frequently (50% of operands)
  - e.g., A = A + 5; B = B + 1; C = C - 18;
- Solutions? Why not?
  - put 'typical constants' in memory and load them?
  - create hard-wired registers (like $zero) for constants like one?
- MIPS instructions:
    addi $s0, $s0, 4
    andi $s0, $s0, 6
    ori  $s0, $s0, 4
- Make the common case fast
46Loading Immediate Values
- How do we put a constant (immediate) value into a register?
  - addi R6, R0, 100
  - Put the value 100 into register R6: R6 <- 0 + 100 = 100
47Loading Immediate Values
    R: op  rs  rt  rd  shamt  funct
    I: op  rs  rt  16-bit address
- What should be the format of addi?
  - addi is in I format
- What's the largest immediate value that can be loaded into a register?
- But, how do we load larger numbers?
48Load Upper Immediate
- Transfers the immediate field into the register's top 16 bits and fills the register's lower 16 bits with zeros
- R8[31:16] <- immediate (the top 16 bits of R8); R8[15:0] <- 0
49How about larger constants?
- We'd like to be able to load a 32-bit constant into a register
- Must use two instructions; new "load upper immediate" instruction:
    lui $t0, 1010101010101010
- Then must get the lower-order bits right, i.e.,
    addi $t0, $t0, 0010101010101010
  or
    ori  $t0, $t0, 0010101010101010

    1010101010101010  0000000000000000    $t0 after lui
    0000000000000000  0010101010101010    immediate
    ----------------------------------    addi
    1010101010101010  0010101010101010    result
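The two-instruction sequence can be modeled with plain integer arithmetic. The sketch below relies on two facts about MIPS: addi sign-extends its 16-bit immediate, while ori zero-extends it. The function names simply mirror the mnemonics and are not a real simulator API.

```python
MASK32 = 0xFFFFFFFF

def lui(imm16):
    """Place the immediate in the upper 16 bits; the lower 16 bits become zero."""
    return (imm16 << 16) & MASK32

def ori(reg, imm16):
    """OR with a zero-extended 16-bit immediate."""
    return reg | (imm16 & 0xFFFF)

def addi(reg, imm16):
    """Add a sign-extended 16-bit immediate, wrapping to 32 bits."""
    signed = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (reg + signed) & MASK32

upper = 0b1010101010101010
lower = 0b0010101010101010

t0 = lui(upper)
with_ori = ori(t0, lower)
with_addi = addi(t0, lower)

# When bit 15 of the lower half is set, the two instructions disagree.
ori_high_bit = ori(lui(1), 0x8000)
addi_high_bit = addi(lui(1), 0x8000)
```

For the constant on the slide both routes give the same register value, because the lower half has a leading 0. With a lower half of 0x8000, addi subtracts instead of filling in the bits, which is why ori is the safer choice for assembling an arbitrary 32-bit pattern.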
50Logical Operations Shifting Bits
- Shift left or right with instructions sll and srl.
  - sll $t2, $s0, 2    # $t2 = $s0 << 2
  - srl $t2, $s0, 2    # $t2 = $s0 >> 2
- Fill with zeros for shift operations
- Example: sll $t0, $s2, 3
  - $s2 = 0110 0000 0000 0000 1100 1000 0000 1111
  - $t0 = 0000 0000 0000 0110 0100 0000 0111 1000
- The sll and srl instructions are R format instructions; the shift amount field is used

    op   rs   rt   rd   shamt   funct
    0    0    16   10   2       0
51More Logical Operations
- Logical Operations
  - AND
    - bit-wise AND between registers
    - and $t1, $s0, $s1
  - OR
    - bit-wise OR between registers
    - or $t1, $s0, $s1
  - NOR
    - bit-wise NOR between registers
    - nor $t1, $s0, $s1
    - nor $t1, $t0, $zero    # $t1 = NOT($t0)
  - Immediate modes
    - andi and ori
52Example
- Example
    and R3, R10, R16
    or  R4, R10, R16
    nor R5, R10, R16
  - R16 = 0000 0000 0000 0000 1100 1000 0000 1111
  - R10 = 0000 0000 0000 0110 0100 0000 0111 1000
  - R3  = 0000 0000 0000 0000 0100 0000 0000 1000
  - R4  = 0000 0000 0000 0110 1100 1000 0111 1111
  - R5  = 0000 0000 0000 0000 0000 0000 0000 0000 NOR'd: 1111 1111 1111 1001 0011 0111 1000 0000
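The bit patterns on this slide and on the shifting slide can be reproduced with Python integers masked to 32 bits. The helper names below (`sll`, `nor`, `bits`) are chosen for the example; Python has no fixed-width registers, so the mask stands in for the register width.

```python
MASK32 = 0xFFFFFFFF

def sll(value, amount):
    """Shift left logical: bits shifted past bit 31 are dropped, zeros fill in."""
    return (value << amount) & MASK32

def nor(a, b):
    """Bit-wise NOR on 32-bit values."""
    return ~(a | b) & MASK32

def bits(value):
    """Format a 32-bit value in groups of four bits, as on the slides."""
    text = format(value, "032b")
    return " ".join(text[i:i + 4] for i in range(0, 32, 4))

s2 = 0b0110_0000_0000_0000_1100_1000_0000_1111
t0 = sll(s2, 3)                      # sll $t0, $s2, 3

r16 = 0b0000_0000_0000_0000_1100_1000_0000_1111
r10 = 0b0000_0000_0000_0110_0100_0000_0111_1000
r3 = r10 & r16                       # and R3, R10, R16
r4 = r10 | r16                       # or  R4, R10, R16
r5 = nor(r10, r16)                   # nor R5, R10, R16
not_t0 = nor(t0, 0)                  # nor with zero gives NOT
```

Printing `bits(t0)`, `bits(r3)`, `bits(r4)` and `bits(r5)` gives the patterns shown for the shift and logical examples.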
53Example C Bit Fields
- C code:
    int data;
    struct
    {
      unsigned int ready: 1;
      unsigned int enable: 1;
      unsigned int receivedByte: 8;
    } receiver;

    data = receiver.receivedByte;
    receiver.ready = 0;
    receiver.enable = 1;
54Example C Bit Fields
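- Assume data and receiver are in $s0 and $s1
    sll  $s0, $s1, 22        # move the 8-bit field to the left end
    srl  $s0, $s0, 24        # move it to the right end: data = receivedByte
    andi $s1, $s1, 0xfffe    # ready = 0
    ori  $s1, $s1, 0x0002    # enable = 1
- Alternative code sequence (for extracting the field)
    srl  $s0, $s1, 2
    andi $s0, $s0, 0x00ff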
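The two extraction sequences can be compared directly. The sketch assumes the layout implied by the shift amounts in the MIPS code: ready in bit 0, enable in bit 1 and receivedByte in bits 2 to 9. C itself leaves bit-field layout to the implementation, so this layout is an assumption, as are the function names.

```python
MASK32 = 0xFFFFFFFF

def extract_with_shifts(receiver):
    """sll 22 then srl 24: isolate bits 9..2."""
    left = (receiver << 22) & MASK32   # field now occupies bits 31..24
    return left >> 24                  # field now occupies bits 7..0

def extract_with_mask(receiver):
    """srl 2 then andi 0x00ff: the alternative sequence."""
    return (receiver >> 2) & 0x00FF

def clear_ready_set_enable(receiver):
    """andi 0xfffe then ori 0x0002."""
    receiver &= 0xFFFE     # andi zero-extends, so bits 31..16 are cleared as well
    return receiver | 0x0002

# receivedByte = 0xA5, enable = 0, ready = 1
receiver = (0xA5 << 2) | 0b01
data = extract_with_shifts(receiver)
updated = clear_ready_set_enable(receiver)
```

Both extraction routines return the same byte. The andi step also clears the upper half of the register, which is harmless here only because the struct uses just the low 10 bits.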
55Instructions for Making Decisions
- beq reg1, reg2, L1
  - Go to the statement labeled L1 if the value in reg1 equals the value in reg2
- bne reg1, reg2, L1
  - Go to the statement labeled L1 if the value in reg1 does not equal the value in reg2
- j L1
  - Unconditional jump
- jr $t0
  - "jump register": jump to the instruction whose address is in register $t0
56Making Decisions
- Example
    if (a != b) goto L1;       // x, y, z, a, b mapped to $s0-$s4
    x = y + z;
    L1: x = x - a;

        bne $s3, $s4, L1       # goto L1 if a != b
        add $s0, $s1, $s2      # x = y + z (skipped if a != b)
    L1: sub $s0, $s0, $s3      # x = x - a (always executed)
- Reminder
  - Registers for variables in C code: $s0 ... $s7 = 16 ... 23
  - Registers for temporary variables: $t0 ... $t7 = 8 ... 15
  - Register $zero is always 0
57if-then-else
- Example
    if (a == b) x = y + z;
    else x = y - z;

          bne $s3, $s4, Else     # goto Else if a != b
          add $s0, $s1, $s2      # x = y + z
          j   Exit               # goto Exit
    Else: sub $s0, $s1, $s2      # x = y - z
    Exit:
58Example Loop with array index
- C code:
    Loop: g = g + A[i];
          i = i + j;
          if (i != h) goto Loop;
          ....
- $s1, $s2, $s3, $s4 = g, h, i, j; array A base = $s5
    LOOP: add $t1, $s3, $s3      # $t1 = 2 * i
          add $t1, $t1, $t1      # $t1 = 4 * i
          add $t1, $t1, $s5      # $t1 = addr. of A[i]
          lw  $t0, 0($t1)        # load A[i]
          add $s1, $s1, $t0      # g = g + A[i]
          add $s3, $s3, $s4      # i = i + j
          bne $s3, $s2, LOOP
59Loops
- Example
    while (A[i] == k)            // i, j, k in $s3, $s4, $s5
      i = i + j;                 // A is in $s6

    Loop: sll $t1, $s3, 2        # $t1 = 4 * i
          add $t1, $t1, $s6      # $t1 = addr. of A[i]
          lw  $t0, 0($t1)        # $t0 = A[i]
          bne $t0, $s5, Exit     # goto Exit if A[i] != k
          add $s3, $s3, $s4      # i = i + j
          j   Loop               # goto Loop
    Exit:
60Other decisions
- Set R1 on R2 less than R3: slt R1, R2, R3
  - Compares two registers, R2 and R3
  - R1 = 1 if R2 < R3, else R1 = 0
  - Example: slt $t1, $s1, $s2
- Branch less than
  - Example: if (A < B) goto LESS
  - slt $t1, $s1, $s2        # $t1 = 1 if A < B
  - bne $t1, $zero, LESS
61Switch statement
- C code:
    switch (k) {
      case 0: f = i + j; break;
      case 1: f = g + h; break;
      case 2: f = g - h; break;
      case 3: f = i - j; break;
    }
- f-k in $s0-$s5 and $t2 contains 4 (maximum of var k)
- The switch statement can be converted into a big chain of if-then-else statements.
- A more efficient method is to use a jump address table of addresses of alternative instruction sequences and the jr instruction. Assume the table base address is in $t4
62Switch cont.
        slt $t3, $s5, $zero      # is k < 0?
        bne $t3, $zero, Exit     # if k < 0, goto Exit
        slt $t3, $s5, $t2        # is k < 4?
        beq $t3, $zero, Exit     # if k >= 4, goto Exit
        sll $t1, $s5, 2          # $t1 = 4 * k
        add $t1, $t1, $t4        # $t1 = addr. of $t4[k]
        lw  $t0, 0($t1)          # $t0 = $t4[k]
        jr  $t0                  # jump to addr. in $t0
                                 # $t4[0] = &L0, $t4[1] = &L1, ...
    L0: add $s0, $s3, $s4        # f = i + j
        j   Exit
    L1: add $s0, $s1, $s2        # f = g + h
        j   Exit
    L2: sub $s0, $s1, $s2        # f = g - h
        j   Exit
    L3: sub $s0, $s3, $s4        # f = i - j
    Exit:
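The jump-table idea carries over to a high-level sketch: a list of functions plays the role of the table of code addresses, the range test mirrors the two slt checks, and the indexed call corresponds to the lw followed by jr. All names below are hypothetical.

```python
def case0(g, h, i, j):
    return i + j      # L0: f = i + j

def case1(g, h, i, j):
    return g + h      # L1: f = g + h

def case2(g, h, i, j):
    return g - h      # L2: f = g - h

def case3(g, h, i, j):
    return i - j      # L3: f = i - j

# Plays the role of the table whose base address is in $t4.
JUMP_TABLE = [case0, case1, case2, case3]

def switch_on(k, g, h, i, j, f):
    """Return the new value of f; f is unchanged when k is out of range."""
    if k < 0 or k >= len(JUMP_TABLE):    # the two slt range checks
        return f
    return JUMP_TABLE[k](g, h, i, j)     # indexed load, then jump
```

The cost of dispatch is the same for every case, whereas an if-then-else chain gets slower for later cases.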
63MIPS Instruction Formats
- More than one format for instructions, usually
- Different kinds of instructions need different kinds of fields, data
- Example: 3 MIPS instruction formats
    R: op  rs  rt  rd  shamt  funct
    I: op  rs  rt  16-bit address
    J: op  26-bit address
64Addresses in Branches and Jumps
- Instructions
  - bne $t4, $t5, Label    # next instruction is at Label if $t4 != $t5
  - beq $t4, $t5, Label    # next instruction is at Label if $t4 == $t5
  - j Label                # next instruction is at Label
- Formats
    I: op  rs  rt  16-bit address
    J: op  26-bit address
- Addresses are not 32 bits. How do we handle this with large programs?
- First idea: limitation of branch space to the first 2^16 addresses
65Addresses in Branches
- Instructions
  - bne $t4, $t5, Label    # next instruction is at Label if $t4 != $t5
  - beq $t4, $t5, Label    # next instruction is at Label if $t4 == $t5
- Formats
    I: op  rs  rt  16-bit address
- Treat the 16-bit number as an offset to the PC register: PC-relative addressing
  - Word offset instead of byte offset, why??
  - most branches are local (principle of locality)
- Jump instructions just use the high-order bits of PC: pseudodirect addressing
  - 32-bit jump address = 4 most significant bits of PC concatenated with 26-bit word address (or 28-bit byte address)
  - Address boundaries of 256 MB
66Conditional Branch Distance
25% of integer branches are 2 to 4 instructions
67Conditional Branch Addressing
- PC-relative since most branches are relatively close to the current PC
- At least 8 bits suggested (±128 instructions)
- Compare Equal/Not Equal most important for integer programs (86%)
68PC-relative addressing
- For larger distances, jump register (jr) is required.
69Example
    LOOP: mult $9, $19, $10      # R9 = R19 * R10
          lw   $8, 1000($9)      # R8 = Mem[R9 + 1000]
          bne  $8, $21, EXIT
          add  $19, $19, $20     # i = i + j
          j    LOOP
    EXIT: ......
- Assume LOOP is placed at location 80000
70Example
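    LOOP: mult $9, $19, $10      # R9 = R19 * R10
          lw   $8, 1000($9)      # R8 = Mem[R9 + 1000]
          bne  $8, $21, EXIT
          add  $19, $19, $20     # i = i + j
          j    LOOP
    EXIT: ...
- Assume LOOP is placed at location 80000
- Address field of bne: 2 (EXIT is 2 instructions past the instruction that follows bne)
- Address field of j: 20000 (the word address of LOOP, 80000 / 4)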
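The two address fields in this example follow from PC-relative and pseudodirect addressing. The sketch below assumes each instruction occupies 4 bytes starting at 80000, so bne sits at 80008, j at 80016 and EXIT at 80020; the function names are made up for illustration.

```python
def branch_offset(branch_address, target_address):
    """Word offset stored in a branch, relative to the instruction after it."""
    return (target_address - (branch_address + 4)) // 4

def jump_field(target_address):
    """26-bit word address stored in a j instruction."""
    return (target_address >> 2) & 0x3FFFFFF

def jump_target(pc, field):
    """Rebuild the 32-bit target: top 4 bits of PC + 4, then the field, then 00."""
    return ((pc + 4) & 0xF0000000) | (field << 2)

LOOP = 80000
BNE_ADDRESS = LOOP + 8     # third instruction
J_ADDRESS = LOOP + 16      # fifth instruction
EXIT = LOOP + 20           # first instruction after the loop

bne_field = branch_offset(BNE_ADDRESS, EXIT)
j_field = jump_field(LOOP)
```

Storing word offsets rather than byte offsets lets the 16-bit branch field reach four times as far, since instruction addresses are always multiples of 4.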
71MIPS Addressing Modes
CS 331
Xiaoyu Zhang, CSUSM
73Procedure calls
- Procedures or subroutines
  - Needed for structured programming
- Steps followed in executing a procedure call
  - Place parameters in a place where the procedure (callee) can access them
  - Transfer control to the procedure
  - Acquire the storage resources needed for the procedure
  - Perform the desired task
  - Place results in a place where the calling program (caller) can access them
  - Return control to the point of origin
74Resources Involved
- Registers used for procedure calling
  - $a0 - $a3: four argument registers in which to pass parameters
  - $v0 - $v1: two value registers in which to return values
  - $ra: one return address register to return to the point of origin
- Transferring the control to the callee
  - jal ProcedureAddress
  - jump-and-link to the procedure address
  - the return address (PC + 4) is saved in $ra
  - Example: jal 20000
- Returning the control to the caller
  - jr $ra
  - instruction following jal is executed next
75Memory Stacks
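- Useful for stacked environments/subroutine call and return even if an operand stack is not part of the architecture
- Stacks that grow up vs. stacks that grow down
[Figure: memory addresses from the low address (0, "Little") to the high address (inf., "Big"). SP marks the top of a stack holding a, b, c, which either grows up toward high addresses or grows down toward low addresses.]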
76Calling conventions
- C code:
    int func(int g, int h, int i, int j)
    {
      int f;
      f = (g + h) - (i + j);
      return (f);
    }
    // g, h, i, j in $a0, $a1, $a2, $a3; f in $s0
- MIPS code:
    func: addi $sp, $sp, -12     # make room in stack for 3 words
          sw   $t1, 8($sp)       # save the regs we want to use
          sw   $t0, 4($sp)
          sw   $s0, 0($sp)
          add  $t0, $a0, $a1     # $t0 = g + h
          add  $t1, $a2, $a3     # $t1 = i + j
          sub  $s0, $t0, $t1     # $s0 has the result
          add  $v0, $s0, $zero   # return reg $v0 has f
77Calling (cont.)
          lw   $s0, 0($sp)       # restore $s0
          lw   $t0, 4($sp)       # restore $t0
          lw   $t1, 8($sp)       # restore $t1
          addi $sp, $sp, 12      # restore $sp
          jr   $ra
- We did not have to restore $t0-$t9 (caller save)
- We do need to restore $s0-$s7 (must be preserved by callee)
78Nested Calls
Stacking of Subroutine Calls & Returns and Environments
[Figure: A calls B, B calls C, C returns, B returns. The stack of environments grows from A to A B to A B C, then shrinks back to A B and A.]
- Some machines provide a memory stack as part of the architecture (e.g., VAX, JVM)
- Sometimes stacks are implemented via software convention
79Compiling a String Copy Proc.
- C code:
    void strcpy(char x[], char y[])
    {
      int i = 0;
      while ((x[i] = y[i]) != 0)
        i++;
    }
    // x and y base addr. are in $a0 and $a1
- MIPS code:
    strcpy: addi $sp, $sp, -4        # reserve 1 word space in stack
            sw   $s0, 0($sp)         # save $s0
            add  $s0, $zero, $zero   # i = 0
    L1:     add  $t1, $a1, $s0       # addr. of y[i] in $t1
            lb   $t2, 0($t1)         # $t2 = y[i]
            add  $t3, $a0, $s0       # addr. of x[i] in $t3
            sb   $t2, 0($t3)         # x[i] = y[i]
            beq  $t2, $zero, L2      # if y[i] == 0 goto L2
            addi $s0, $s0, 1         # i++
            j    L1                  # go to L1
    L2:     lw   $s0, 0($sp)         # restore $s0
            addi $sp, $sp, 4         # restore $sp
            jr   $ra                 # return
80Array vs. Pointers
    Clear1(int array[], int size)
    {
      int i;
      for (i = 0; i < size; i++)
        array[i] = 0;
    }

    Clear2(int *array, int size)
    {
      int *p, i = 0;
      for (p = &array[0]; p < &array[size]; p++)
        *p = 0;
    }
    // $a0 = addr. of array, $a1 has size
81Arrays vs. Pointers MIPS for array version
    Clear1(int array[], int size)
    {
      int i;
      for (i = 0; i < size; i++)
        array[i] = 0;
    }
    // $a0 = addr. of array, $a1 has size, $t0 = i

           move $t0, $zero           # i = 0
    Loop1: add  $t1, $t0, $t0        # $t1 = 2 * i
           add  $t1, $t1, $t1        # $t1 = 4 * i
           add  $t2, $a0, $t1        # $t2 = addr. of array[i]
           sw   $zero, 0($t2)        # array[i] = 0
           addi $t0, $t0, 1          # i++
           slt  $t3, $t0, $a1        # check end of loop (i < size)
           bne  $t3, $zero, Loop1    # if i < size goto Loop1
82Arrays vs. Pointers MIPS for pointer version
    Clear2(int *array, int size)
    {
      int *p, i = 0;
      for (p = &array[0]; p < &array[size]; p++)
        *p = 0;
    }
    // $a0 = addr. of array, $a1 has size, $t0 = p

           move $t0, $a0             # p = addr. of array[0]
           add  $t1, $a1, $a1        # $t1 = 2 * size
           add  $t1, $t1, $t1        # $t1 = 4 * size
           add  $t2, $a0, $t1        # $t2 = addr. of array[size]
    Loop2: sw   $zero, 0($t0)        # store 0 in *p
           addi $t0, $t0, 4          # p = p + 4
           slt  $t3, $t0, $t2        # $t3 = (p < &array[size])
           bne  $t3, $zero, Loop2    # if p < &array[size] goto Loop2
- The pointer version reduces the number of instructions per iteration from 7 to 4
- Many optimizing compilers will generate this code, even for array-based C code
83Alternative Architectures
- Design alternative:
  - provide more powerful operations
  - goal is to reduce number of instructions executed
  - danger is a slower cycle time and/or a higher CPI
- Sometimes referred to as "RISC vs. CISC"
  - virtually all new instruction sets since 1982 have been RISC
  - VAX: minimize code size, make assembly language easy; instructions from 1 to 54 bytes long!
- We'll look at PowerPC and Intel IA-32
84PowerPC
- PowerPC is a RISC architecture very similar to MIPS, but has some unique instructions
- Indexed addressing
  - example: lw $t1, $a0+$s3    # $t1 = Memory[$a0 + $s3]
  - What do we have to do in MIPS?
- Update addressing
  - update a register as part of load (for marching through arrays)
  - example: lwu $t0, 4($s3)    # $t0 = Memory[$s3 + 4]; $s3 = $s3 + 4
  - What do we have to do in MIPS?
- Others
  - load multiple/store multiple
  - a special counter register: "bc Loop, ctr != 0" (decrement counter, if not 0 goto Loop)
85IA - 32
- 1978: The Intel 8086 is announced (16-bit architecture)
- 1980: The 8087 floating point coprocessor is added
- 1982: The 80286 increases address space to 24 bits and adds instructions
- 1985: The 80386 extends to 32 bits, new addressing modes
- 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
- 1997: 57 new "MMX" instructions are added, Pentium II
- 1999: The Pentium III added another 70 instructions (SSE)
- 2001: Another 144 instructions (SSE2)
- 2003: AMD extends the architecture to increase address space to 64 bits, widens all registers to 64 bits and other changes (AMD64)
- 2004: Intel capitulates and embraces AMD64 (calls it EM64T) and adds more media extensions
- This history illustrates the impact of the "golden handcuffs" of compatibility: "adding new features as someone might add clothing to a packed bag", "an architecture that is difficult to explain and impossible to love"
86IA-32 Overview
- Complexity
  - Instructions from 1 to 17 bytes long
  - one operand must act as both a source and destination
  - one operand can come from memory
  - complex addressing modes, e.g., "base or scaled index with 8 or 32 bit displacement"
- Saving grace
  - the most frequently used instructions are not too difficult to build
  - compilers avoid the portions of the architecture that are slow
- "what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective"
87To summarize
88Summary
- Instruction complexity is only one variable
  - lower instruction count vs. higher CPI / lower clock rate
- Design Principles
  - simplicity favors regularity
  - smaller is faster
  - good design demands compromise
  - make the common case fast
- Instruction set architecture
  - a very important abstraction indeed!