Title: Course number: CS141
1Introduction to Computer Architecture
Course number: CS141
Who? Tarun Soni (tsoni_at_cs.ucsd.edu)
TA: Wenjing Rao (wrao_at_cs) and Eric Liu (xeliu_at_cs)
Where? CENTR 119
When? M, W at 6-8:50pm
Textbook: Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface, 2nd edition.
Web-page: http://www-cse.ucsd.edu/users/tsoni/cse141 (slides, homework questions, other pointers and information)
Office hours: Tarun: Mon. 4pm-6pm, APM 3151. Yang Yu and Wenjing Rao: TBD, look on the webpage
2Today's Agenda
- Administrivia
- Technology trends
- Computer organization: concept of abstraction
- Instruction Set Architectures: Definition, types, examples
- Instruction formats: operands, addressing modes
- Operations: load, store, arithmetic, logical
- Control instructions: branch, jump, procedures
- Stacks
- Examples: in-line code, procedure, nested-procedures
- Other architectures
3Schedule-sort of
1 6/30 Intro., Technology, ISA
2 7/2 Performance, Cost, Arithmetic
3 7/7 Multiply, Divide?, FP numbers
4 7/9 Single cycle Datapath, Control
5 7/14 Multiple Cycle CPU, Microprogramming
6 7/16 Mid-term quiz
7 7/21 Pipelining intro, control, exceptions
8 7/23 Memory systems, Cache, Virtual memory
9 7/28 I/O Devices
10 7/30 Superscalars, Parallel machines
11 ?? Overview, wrapup, catchup ..
?? Final, 7-10 pm, Friday
4Grading
- Grade breakdown
- Mid-term (1.5 hours): 30%
- Final (3 hours): 40%
- Pop-Quizzes (3, 45 min each, only the 2 highest scores count): 30%
- Class Participation: Extras??
- Can't make exams: tell us early and we will work something out
- Homeworks do not need to be turned in. However, pop-quizzes will be based on hw.
- What is cheating?
- Studying together in groups is encouraged
- Work must be your own
- Common examples of cheating: copying an exam question from other material or other person...
- Better off to skip question (small fraction of grade.)
- Written/email request for changes to grades
- average grade will be a B or B set expectations
accordingly
5Why?
- You may become a practitioner someday?
- Keeper of Moore's law
- Architecture concepts are core to other sub-systems
- Video-processors
- Security engines
- Routing/Networking etc.
- Even if you become a software geek?
- Architecture enables a way of thinking
- Understanding leads to breadth and better implementation of software
6Computer of the day
Jacquard loom, late 1700s, for weaving silk
- Program on punch cards
- Microcode: each hole lifts a set of threads
- Or gate: thread lifted if any controlling hole punched
7Trends: Moore's law
8Trends: $1000 will buy you
9Trends: Densities
10Technology
Source: Intel Journal, May 2002
11Other technology trends
- Processor
- logic capacity: about 30% per year
- clock rate: about 20% per year
- Memory
- DRAM capacity: about 60% per year (4x every 3 years)
- Memory speed: about 10% per year
- Cost per bit: improves about 25% per year
- Disk
- capacity: about 60% per year
(Figure labels: Physics-advancement, Architecture-advancement, Speed, Capacity)
12SPEC Performance
(Figure annotation: RISC introduction)
performance now improves 50% per year (2x every 1.5 years)
13Organization A Basic Computer
Every computer has 5 basic components
Computer
Control
Input
Memory
Output
Datapath
14Organization: A Basic Computer
- Not all memory is created equal
- Cache: fast (expensive) memory is placed closer to the processor
- Main memory: less expensive memory, so we can have more
(Figure: Proc, Caches, Busses, adapters, Memory, Controllers, I/O Devices: Disks, Displays, Keyboards, Networks)
- Input and output (I/O) devices have the messiest organization
- Wide range of speed: graphics vs. keyboard
- Wide range of requirements: speed, standard, cost ...
- Least amount of research (so far)
15What is Computer Architecture
Computer Architecture = Instruction Set Architecture + Machine Organization
- Instruction Set Architecture: how you talk to the machine
- Machine Organization: what the machine looks like
Computer Architecture and Engineering
- Instruction Set Design: Interfaces, Compiler/System View
- Computer Organization: Hardware Components, Logic Designer's View
16Architecture?
Application
Operating
System
Compiler
Firmware
Instruction Set Architecture
I/O system
Instr. Set Proc.
Datapath Control
Digital Design
Circuit Design
Layout
- Coordination of many levels of abstraction
- Under a rapidly changing set of forces
- Design, Measurement, and Evaluation
17Levels of abstraction?
High Level Language Program:
  temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
(Compiler)
Assembly Language Program:
  lw $15, 0($2)
  lw $16, 4($2)
  sw $16, 0($2)
  sw $15, 4($2)
(Assembler)
Machine Language Program:
  0000 1001 1100 0110 1010 1111 0101 1000
  1010 1111 0101 1000 0000 1001 1100 0110
  1100 0110 1010 1111 0101 1000 0000 1001
  0101 1000 0000 1001 1100 0110 1010 1111
(Machine Interpretation)
Control Signal Specification:
  ALUOP[0:3] <= InstReg[9:11] & MASK
18Instruction Set Architecture
ISA is the agreed-upon interface between all the
software that runs on the machine and the
hardware that executes it.
software
instruction set
hardware
19Example ISAs
- IBM360, VAX etc.
- Digital Alpha (v1, v3) 1992-97
- HP PA-RISC (v1.1, v2.0) 1986-96
- Sun Sparc (v8, v9) 1987-95
- SGI MIPS (MIPS I, II, III, IV, V) 1986-96
- Intel (8086, 80286, 80386, 80486, Pentium, MMX, ...) 1978-96
- ARM (ARM7, 8, StrongARM) 1995-
Digital Signal Processors also have an ISA: TMS320, Motorola, OAK etc.
20ISAs
Instruction Set Architecture
How to talk to computers if you aren't in Star Trek
21ISAs
- Language of the Machine
- More primitive than higher level languages, e.g., no sophisticated control flow
- Very restrictive, e.g., MIPS Arithmetic Instructions
- We'll be working with the MIPS instruction set architecture
- similar to other architectures developed since the 1980's
- used by NEC, Nintendo, Silicon Graphics, Sony
- Design goals: maximize performance and minimize cost, reduce design time
22ISAs
- Ideally the only part of the machine visible to the programmer/compiler
- Available instructions (Opcodes)
- Formats
- Registers, number and type
- Addressing modes, access mechanisms
- Exception conditions etc.
23Instruction Set Architecture: What Must be Specified?
- Instruction Format or Encoding: how is it decoded?
- Location of operands and result: where other than memory? how many explicit operands? how are memory operands located? which can or cannot be in memory?
- Data type and Size
- Operations: what are supported
- Successor instruction: jumps, conditions, branches
fetch-decode-execute is implicit!
24Vocabulary
- superscalar processor -- can execute more than one instruction per cycle.
- cycle -- smallest unit of time in a processor.
- parallelism -- the ability to do more than one thing at once.
- pipelining -- overlapping parts of a large task to increase throughput without decreasing latency
25ISA Decisions
y = x + b
(y is the destination operand, + is the operation)
- operations
- how many?
- which ones
- operands
- how many?
- location
- types
- how to specify?
- instruction format
- size
- how many formats?
(add r1, r2, r5)
how does the computer know what 0001 0100 1101
1111 means?
26Crafting an ISA
- We'll look at some of the decisions facing an instruction set architect, and
- how those decisions were made in the design of the MIPS instruction set.
- MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction Set Computer) ISA.
- fixed instruction length
- few instruction formats
- load/store architecture
- RISC architectures worked because they enabled pipelining. They continue to thrive because they enable parallelism.
27Basic types of ISAs
Accumulator (1 register):
  1 address      add A          acc <- acc + mem[A]
  1+x address    addx A         acc <- acc + mem[A + x]
Stack:
  0 address      add            tos <- tos + next
General Purpose Register:
  2 address      add A B        EA(A) <- EA(A) + EA(B)
  3 address      add A B C      EA(A) <- EA(B) + EA(C)
Load/Store:
  3 address      add Ra Rb Rc   Ra <- Rb + Rc
                 load Ra Rb     Ra <- mem[Rb]
                 store Ra Rb    mem[Rb] <- Ra
Comparison: Bytes per instruction? Number of Instructions? Cycles per instruction?
28Instruction Count
C = A + B
- Accumulator (1 register): Load A; Add B; Store C
- Stack: Push A; Push B; Add; Pop C
- General Purpose Register (Register-Memory): Load R1,A; Add R1,B; Store C,R1
- Load/Store: Load R1,A; Load R2,B; Add R3,R1,R2; Store C,R3
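The instruction counts for C = A + B can be checked with a small sketch. The two toy interpreters below are hypothetical (the function names, tuple encoding, and memory dictionary are invented for illustration); only the opcode sequences come from the slide.

```python
# Toy interpreters for two of the ISA styles on the slide.
# The Python API here is an illustrative assumption, not a real simulator.

def run_accumulator(program, mem):
    """Execute (op, addr) pairs on a one-register accumulator machine."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown op {op}")
    return len(program)  # number of instructions executed


def run_stack(program, mem):
    """Execute instructions on a zero-address stack machine."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op}")
    return len(program)


mem_acc = {"A": 2, "B": 3, "C": 0}
acc_count = run_accumulator([("load", "A"), ("add", "B"), ("store", "C")], mem_acc)

mem_stk = {"A": 2, "B": 3, "C": 0}
stk_count = run_stack([("push", "A"), ("push", "B"), ("add",), ("pop", "C")], mem_stk)

print(acc_count, mem_acc["C"])
print(stk_count, mem_stk["C"])
```

Both machines compute the same result; the stack machine needs one more instruction, but each of its instructions names at most one address.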
29Instruction Length
Variable / Fixed / Hybrid
MIPS Instructions
- All arithmetic instructions have 3 operands
- Operand order is fixed (destination first)
C code: A = B + C
MIPS code: add $s0, $s1, $s2 (associated with variables by compiler)
30Instruction Length
- Variable-length instructions (Intel 80x86, VAX) require multi-step fetch and decode, but allow for a much more flexible and compact instruction set.
- Fixed-length instructions allow easy fetch and decode, and simplify pipelining and parallelism.
- All MIPS instructions are 32 bits long.
- this decision impacts every other ISA decision we make because it makes instruction bits scarce.
Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit wide instructions (Thumb, MIPS16); choose performance or density per procedure
- If code size is most important, use variable length instructions
- If performance is most important, use fixed length
31MIPS Instruction Format
R format: OP (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | sa (5 bits) | funct (6 bits)
I format: OP (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J format: OP (6 bits) | target (26 bits)
- the opcode tells the machine which format
- so add r1, r2, r3 has
- opcode=0, funct=32, rs=2, rt=3, rd=1, sa=0
- 000000 00010 00011 00001 00000 100000
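The field packing for add r1, r2, r3 can be reproduced with a few shifts. This is a minimal sketch: the helper name encode_r_type is hypothetical, while the field widths and values are the ones given on the slide.

```python
def encode_r_type(rs, rt, rd, sa, funct, opcode=0):
    """Pack R-format fields into one 32-bit word (helper name is illustrative)."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("sa", sa, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    # Field positions: opcode 31..26, rs 25..21, rt 20..16, rd 15..11, sa 10..6, funct 5..0
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


# add r1, r2, r3: opcode=0, funct=32, rs=2, rt=3, rd=1, sa=0
word = encode_r_type(rs=2, rt=3, rd=1, sa=0, funct=32)
bits = f"{word:032b}"
print(bits)
print(hex(word))
```

The printed bit string is the slide's six groups written without spaces.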
32Operands
- operands are generally in one of two places
- registers (32 int, 32 fp)
- memory (2^32 locations)
- registers are
- easy to specify
- close to the processor (fast access)
- the idea that we want to access registers whenever possible led to load-store architectures.
- normal arithmetic instructions only access registers
- only access memory with explicit loads and stores
33Load Store Architectures
Load-store architectures
- can do
- add r1 = r2 + r3
- and
- load r3, M(address)
- forces heavy dependence on registers, which is exactly what you want in today's CPUs
- can't do: add r1 = r2 + M(address)
- cost: more instructions
- benefit: fast implementation (e.g., easy pipelining)
Expect new instruction set architecture to use general purpose registers
Pipelining => expect it to use load store variant of GPR ISA
34General Purpose Registers
Advantages of registers
- registers are faster than memory
- registers are easier for a compiler to use
- e.g., (A*B) - (C*D) - (E*F) can do multiplies in any order vs. stack
- registers can hold variables
- memory traffic is reduced, so program is sped up
- code density improves (since register named with fewer bits than memory location)
MIPS Registers
- Programmable storage
- 2^32 x bytes of memory
- 31 x 32-bit GPRs (R0 = 0)
- 32 x 32-bit FP regs (paired DP)
- HI, LO, PC
35Memory Organization
- Viewed as a large, single-dimension array, with an address.
- A memory address is an index into the array
- "Byte addressing" means that the index points to a byte of memory.
0
8 bits of data
1
8 bits of data
2
8 bits of data
3
8 bits of data
4
8 bits of data
5
8 bits of data
6
8 bits of data
36Memory Organization
- Bytes are nice, but most data items use larger "words"
- For MIPS, a word is 32 bits or 4 bytes.
- 2^32 bytes with byte addresses from 0 to 2^32-1
- 2^30 words with byte addresses 0, 4, 8, ... 2^32-4
- Words are aligned, i.e., what are the least 2 significant bits of a word address?
0
32 bits of data
4
32 bits of data
Registers hold 32 bits of data
8
32 bits of data
12
32 bits of data
...
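The question about the two least significant bits can be answered by masking. A minimal sketch, assuming 4-byte words as stated on the slide; the function names are hypothetical.

```python
WORD_BYTES = 4  # a MIPS word is 32 bits, i.e. 4 bytes


def is_word_aligned(addr):
    """A byte address is word aligned when its two low bits are both zero."""
    return (addr & (WORD_BYTES - 1)) == 0


def word_index(addr):
    """Map an aligned byte address to its word number (0, 1, 2, ...)."""
    if not is_word_aligned(addr):
        raise ValueError(f"address {addr} is not word aligned")
    return addr >> 2


last_word_address = 2**32 - 4
print(is_word_aligned(8), is_word_aligned(6))
print(word_index(12))
print(word_index(last_word_address) + 1)  # total number of words
```

Word addresses 0, 4, 8, ... all end in binary 00, which is why 2^32 bytes hold exactly 2^30 words.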
37Data Types
- Bit: 0, 1
- Bit String: sequence of bits of a particular length
- 4 bits is a nibble
- 8 bits is a byte
- 16 bits is a half-word
- 32 bits is a word
- 64 bits is a double-word
- Character: ASCII 7 bit code
- Decimal: digits 0-9 encoded as 0000b thru 1001b; two decimal digits packed per 8 bit byte
- Integers: 2's Complement
- Floating Point: Single Precision, Double Precision, Extended Precision
- value = M x R^E (M: mantissa, R: base, E: exponent)
- How many +/- #'s? Where is decimal pt? How are +/- exponents represented?
38Operand Usage
Support data sizes and types 8-bit, 16-bit,
32-bit integers and 32-bit and 64-bit IEEE 754
floating point numbers
39Addressing: Endian-ness and alignment
- Big Endian: address of most significant byte = word address (xx00 = Big End of word)
- IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
- Little Endian: address of least significant byte = word address (xx00 = Little End of word)
- Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
Byte numbering within a word, from msb to lsb: little endian 3 2 1 0 (byte 0 at the lsb), big endian 0 1 2 3 (byte 0 at the msb)
Alignment: require that objects fall on an address that is a multiple of their size.
(Figure: Aligned vs. Not Aligned objects placed across bytes 0 1 2 3)
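The two byte orders can be seen by serializing one 32-bit value both ways. A short sketch using Python's built-in int.to_bytes; the sample value 0x0A0B0C0D is an arbitrary choice for illustration.

```python
value = 0x0A0B0C0D  # most significant byte is 0x0A, least significant is 0x0D

# Index 0 of each bytes object is the byte stored at the word address itself.
big = value.to_bytes(4, "big")        # big endian: MSB at the word address
little = value.to_bytes(4, "little")  # little endian: LSB at the word address

print(big.hex())
print(little.hex())

# Reading the bytes back with the matching convention recovers the value.
round_trip = int.from_bytes(little, "little")
```

Reading little-endian bytes with the big-endian convention (or the reverse) yields a different number, which is the usual source of byte-order bugs when data moves between machines.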
40Addressing Modes
how do we specify the operand we want?
- Register direct: R3
- Immediate (literal): #25
- Direct (absolute): M[10000]
- Register indirect: M[R3]
- Base+Displacement: M[R3 + 10000]
- if register is the program counter, this is PC-relative
- Base+Index: M[R3 + R4]
- Scaled Index: M[R3 + R4*d + 10000]
- Autoincrement: M[R3++]
- Autodecrement: M[R3--]
- Memory Indirect: M[ M[R3] ]
41Addressing Modes
Addressing mode       Example               Meaning
Register              Add R4,R3             R4 <- R4 + R3
Immediate             Add R4,#3             R4 <- R4 + 3
Displacement          Add R4,100(R1)        R4 <- R4 + Mem[100+R1]
Register indirect     Add R4,(R1)           R4 <- R4 + Mem[R1]
Indexed / Base        Add R3,(R1+R2)        R3 <- R3 + Mem[R1+R2]
Direct or absolute    Add R1,(1001)         R1 <- R1 + Mem[1001]
Memory indirect       Add R1,@(R3)          R1 <- R1 + Mem[Mem[R3]]
Auto-increment        Add R1,(R2)+          R1 <- R1 + Mem[R2]; R2 <- R2 + d
Auto-decrement        Add R1,-(R2)          R2 <- R2 - d; R1 <- R1 + Mem[R2]
Scaled                Add R1,100(R2)[R3]    R1 <- R1 + Mem[100+R2+R3*d]
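The memory-referencing modes in the table differ only in how the effective address is formed. The sketch below computes that address for several modes; the function name, mode strings, and register/memory dictionaries are hypothetical and chosen for illustration.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, scale=1):
    """Return the memory address an operand refers to (illustrative helper)."""
    if mode == "register_indirect":   # Mem[R1]
        return regs[base]
    if mode == "displacement":        # Mem[100 + R1]
        return disp + regs[base]
    if mode == "indexed":             # Mem[R1 + R2]
        return regs[base] + regs[index]
    if mode == "direct":              # Mem[1001]
        return disp
    if mode == "memory_indirect":     # Mem[Mem[R3]]
        return mem[regs[base]]
    if mode == "scaled":              # Mem[100 + R2 + R3*d]
        return disp + regs[base] + regs[index] * scale
    raise ValueError(f"unknown mode {mode}")


regs = {"R1": 1000, "R2": 8, "R3": 2000}
mem = {2000: 3000}

ea_disp = effective_address("displacement", regs, mem, base="R1", disp=100)
ea_index = effective_address("indexed", regs, mem, base="R1", index="R2")
ea_indirect = effective_address("memory_indirect", regs, mem, base="R3")
ea_scaled = effective_address("scaled", regs, mem, base="R2", index="R1", disp=100, scale=4)
print(ea_disp, ea_index, ea_indirect, ea_scaled)
```

Register and immediate modes are left out because they name no memory address at all.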
42Addressing Modes Usage
- 3 programs measured on machine with all address modes (VAX)
- --- Displacement: 42% avg, 32% to 55%
- --- Immediate: 33% avg, 17% to 43%
- --- Register deferred (indirect): 13% avg, 3% to 24%
- --- Scaled: 7% avg, 0% to 16%
- --- Memory indirect: 3% avg, 1% to 6%
- --- Misc: 2% avg, 0% to 3%
- 75% displacement & immediate
- 88% displacement, immediate & register indirect
- similar measurements
43Addressing mode usage: Application Specific
Program   Base+Displacement   Immediate   Scaled Index   Memory Indirect   All Others
TEX       56%                 43%         0%             1%                0%
Spice     58%                 17%         16%            6%                3%
GCC       51%                 39%         6%             1%                3%
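44MIPS Addressing Modes
- register direct: add $1, $2, $3 (R format: OP | rs | rt | rd | sa | funct)
- immediate: add $1, $2, #35 (I format: OP | rs | rt | immediate)
- base + displacement: lw $1, disp($2) (I format: OP | rs | rt | immediate; address = (rs) + immediate)
- register indirect: disp = 0
- absolute: (rs) = 0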
45MIPS ISA-so far
- fixed 32-bit instructions
- 3 instruction formats
- 3-operand, load-store architecture
- 32 general-purpose registers (integer, floating point)
- R0 always equals 0.
- 2 special-purpose integer registers, HI and LO, because multiply and divide produce more than 32 bits.
- registers are 32-bits wide (word)
- register, immediate, and base+displacement addressing modes
But what about the actual instructions themselves?
46Typical Operations (little change since 1960)
- Data Movement: Load (from memory), Store (to memory), memory-to-memory move, register-to-register move, input (from I/O device), output (to I/O device), push, pop (to/from stack)
- Arithmetic: integer (binary + decimal) or FP; Add, Subtract, Multiply, Divide
- Shift: shift left/right, rotate left/right
- Logical: not, and, or, set, clear
- Control (Jump/Branch): unconditional, conditional
- Subroutine Linkage: call, return
- Interrupt: trap, return
- Synchronization: test & set (atomic r-m-w)
- String: search, translate
- Graphics (MMX): parallel subword ops (4 16bit add)
4780x86 Instruction usage
48Instruction usage
Support the simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return
Compiler Issues
- orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
- completeness: support for a wide range of operations and target applications
- regularity: no overloading for the meanings of instruction fields
- streamlined: resource needs easily determined
Register Assignment is critical too
- Easier if lots of registers
49MIPS Instructions
- arithmetic
- add, subtract, multiply, divide
- logical
- and, or, shift left, shift right
- data transfer
- load word, store word
- conditional Branch
- unconditional Jump
50MIPS Instructions
- arithmetic
- add, subtract, multiply, divide
Instruction          Example            Meaning                          Comments
add                  add $1,$2,$3       $1 = $2 + $3                     3 operands; exception possible
subtract             sub $1,$2,$3       $1 = $2 - $3                     3 operands; exception possible
add immediate        addi $1,$2,100     $1 = $2 + 100                    + constant; exception possible
add unsigned         addu $1,$2,$3      $1 = $2 + $3                     3 operands; no exceptions
subtract unsigned    subu $1,$2,$3      $1 = $2 - $3                     3 operands; no exceptions
add imm. unsign.     addiu $1,$2,100    $1 = $2 + 100                    + constant; no exceptions
multiply             mult $2,$3         Hi, Lo = $2 x $3                 64-bit signed product
multiply unsigned    multu $2,$3        Hi, Lo = $2 x $3                 64-bit unsigned product
divide               div $2,$3          Lo = $2 / $3, Hi = $2 mod $3     Lo = quotient, Hi = remainder
divide unsigned      divu $2,$3         Lo = $2 / $3, Hi = $2 mod $3     Unsigned quotient & remainder
move from Hi         mfhi $1            $1 = Hi                          Used to get copy of Hi
move from Lo         mflo $1            $1 = Lo                          Used to get copy of Lo
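The Hi/Lo split for mult and div can be sketched with 32-bit masking. The function names below are hypothetical, and the signed division is assumed to truncate toward zero with the remainder taking the sign of the dividend; the slide itself does not state the rounding rule.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs, rt):
    """Signed multiply: return (Hi, Lo) halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & ((1 << 64) - 1)
    return (product >> 32) & MASK32, product & MASK32


def div(rs, rt):
    """Signed divide: return (Hi, Lo) = (remainder, quotient).

    Assumes truncation toward zero; a zero divisor raises ZeroDivisionError here.
    """
    a, b = to_signed32(rs), to_signed32(rt)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    remainder = a - quotient * b
    return remainder & MASK32, quotient & MASK32


hi, lo = mult(0x10000, 0x10000)  # 2^16 * 2^16 = 2^32 does not fit in 32 bits
print(hex(hi), hex(lo))
rem, quo = div(7, 2)
print(rem, quo)
```

The first product shows why two registers are needed: the low word is zero and the whole result lives in Hi, which mfhi would copy into a general register.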
51MIPS Instructions
- logical
- and, or, shift left, shift right
Instruction            Example           Meaning             Comment
and                    and $1,$2,$3      $1 = $2 & $3        3 reg. operands; Logical AND
or                     or $1,$2,$3       $1 = $2 | $3        3 reg. operands; Logical OR
xor                    xor $1,$2,$3      $1 = $2 ^ $3        3 reg. operands; Logical XOR
nor                    nor $1,$2,$3      $1 = ~($2 | $3)     3 reg. operands; Logical NOR
and immediate          andi $1,$2,10     $1 = $2 & 10        Logical AND reg, constant
or immediate           ori $1,$2,10      $1 = $2 | 10        Logical OR reg, constant
xor immediate          xori $1,$2,10     $1 = $2 ^ 10        Logical XOR reg, constant
shift left logical     sll $1,$2,10      $1 = $2 << 10       Shift left by constant
shift right logical    srl $1,$2,10      $1 = $2 >> 10       Shift right by constant
shift right arithm.    sra $1,$2,10      $1 = $2 >> 10       Shift right (sign extend)
shift left logical     sllv $1,$2,$3     $1 = $2 << $3       Shift left by variable
shift right logical    srlv $1,$2,$3     $1 = $2 >> $3       Shift right by variable
shift right arithm.    srav $1,$2,$3     $1 = $2 >> $3       Shift right arith. by variable
52MIPS Instructions
- data transfer
- load word, store word
Instruction         Comment
SW 500(R4), R3      Store word
SH 502(R2), R3      Store half
SB 41(R3), R2       Store byte
LW R1, 30(R2)       Load word
LH R1, 40(R3)       Load halfword
LHU R1, 40(R3)      Load halfword unsigned
LB R1, 40(R3)       Load byte
LBU R1, 40(R3)      Load byte unsigned
LUI R1, 40          Load Upper Immediate (16 bits shifted left by 16)
Why need LUI?
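One answer to the LUI question: an I-format instruction carries only a 16-bit immediate, so a full 32-bit constant takes two instructions, LUI for the upper half and an OR-immediate for the lower half. The sketch below models that pairing; the Python function names are hypothetical stand-ins for the instructions.

```python
def lui(imm16):
    """Load Upper Immediate: place 16 bits in the upper half, zero the lower half."""
    return (imm16 & 0xFFFF) << 16


def ori(reg_value, imm16):
    """OR immediate: the 16-bit constant is zero extended before the OR."""
    return reg_value | (imm16 & 0xFFFF)


def load_constant(value):
    """Build a 32-bit constant the way a lui/ori pair would (value assumed unsigned)."""
    upper = lui(value >> 16)
    return ori(upper, value & 0xFFFF)


print(hex(lui(40)))
print(hex(load_constant(0x12345678)))
```

The slide's LUI R1, 40 therefore leaves 40 * 65536 in R1.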
53MIPS Control Instructions
- How do you specify the destination of a branch/jump?
- studies show that almost all conditional branches go short distances from the current program counter (loops, if-then-else).
- we can specify a relative address in far fewer bits than an absolute address
- e.g., beq $1, $2, 100 => if ($1 == $2) PC = PC + 100 * 4
- How do we specify the condition of the branch?
- Condition Codes: processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions. Example: add r1, r2, r3; bz label
- Condition Register: Example: cmp r1, r2, r3; bgt r1, label
- Compare and Branch: Example: bgt r1, r2, label
54Conditional Branch Distance
55Conditional Branching
- PC-relative since most branches are relatively close to the current PC address
- At least 8 bits suggested (+/-128 instructions)
- Compare Equal/Not Equal most important for integer programs (86%)
56Conditional Branching
- Compare and Branch
- BEQ rs, rt, offset: if R[rs] == R[rt] then PC-relative branch
- BNE rs, rt, offset: <>
- Compare to zero and Branch
- BLEZ rs, offset: if R[rs] <= 0 then PC-relative branch
- BGTZ rs, offset: >
- BLTZ: <
- BGEZ: >=
- BLTZAL rs, offset: if R[rs] < 0 then branch and link (into R31)
- BGEZAL: >=
- Remaining set of compare and branch take two instructions
- Almost all comparisons are against zero!
MIPS Branch Instructions
- beq, bne: beq r1, r2, addr => if (r1 == r2) goto addr
- slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0
- these, combined with $0, can implement all fundamental branch conditions
- Always, never, !=, ==, >, <=, >=, <, >(unsigned), <=(unsigned), ...
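The claim that slt plus beq/bne against $0 covers the remaining conditions can be checked exhaustively on a small range. In this sketch each Python function stands for a two-instruction sequence; the function names are hypothetical.

```python
def slt(a, b):
    """Set on less than: 1 if a < b, else 0."""
    return 1 if a < b else 0


# Each helper models "slt $t, ..." followed by a beq/bne of $t against $0.
def branch_lt(a, b):   # slt $t,a,b ; bne $t,$0,target
    return slt(a, b) != 0


def branch_ge(a, b):   # slt $t,a,b ; beq $t,$0,target
    return slt(a, b) == 0


def branch_gt(a, b):   # slt $t,b,a ; bne $t,$0,target  (operands swapped)
    return slt(b, a) != 0


def branch_le(a, b):   # slt $t,b,a ; beq $t,$0,target
    return slt(b, a) == 0


pairs = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
all_match = all(
    branch_lt(a, b) == (a < b) and branch_ge(a, b) == (a >= b)
    and branch_gt(a, b) == (a > b) and branch_le(a, b) == (a <= b)
    for a, b in pairs
)
print(all_match)
```

Swapping the slt operands and choosing beq or bne gives all four orderings, so no dedicated bgt/ble hardware is needed.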
57Jumps
- need to be able to jump to an absolute address sometime
- need to be able to do procedure calls and returns
- jump -- j 10000 => PC = 10000
- jump and link -- jal 10000 => $31 = PC + 4; PC = 10000
- used for procedure calls
- jump register -- jr $31 => PC = $31
- used for returns, but can be useful for lots of other things.
58Jumps
MIPS Instruction Formats
R: OP (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | sa (5 bits) | funct (6 bits)
I: OP (6 bits) | rs (5 bits) | rt (5 bits) | Immediate (16 bits)
J: OP (6 bits) | target (26 bits)
MIPS Addressing Formats: Branches and Jumps
- Branch (e.g., beq) uses PC-relative addressing mode (few bits if addr typically close): base+displacement mode, with the PC being the base.
- Jump uses pseudo-direct addressing mode. 26 bits of the address are in the instruction, the rest is taken from the PC.
(Figure: the jump destination address combines the upper bits of the program counter with the 26-bit target field of the instruction.)
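Both target calculations can be written out as bit operations. This sketch follows the usual MIPS convention, which the slides state only loosely: the branch offset counts words relative to the instruction after the branch, and the jump keeps the top 4 bits of that next-instruction address. The function names are hypothetical.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed 2's complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, offset16):
    """PC-relative: base is PC + 4, displacement is the word offset times 4."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def jump_target(pc, target26):
    """Pseudo-direct: top 4 bits from PC + 4, then the 26-bit field, then 00."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


print(hex(branch_target(0x1000, 100)))       # forward branch by 100 words
print(hex(branch_target(0x1000, 0xFFFF)))    # offset -1: branches to itself
print(hex(jump_target(0x10001000, 0x2500)))
```

A 16-bit word offset reaches roughly 32K instructions in either direction, while the 26-bit jump field covers a 256 MB region selected by the upper PC bits.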
59MIPS Branch, Jump Instructions
Instruction            Example             Meaning                             Comments
branch on equal        beq $1,$2,100       if ($1 == $2) go to PC+4+100        Equal test; PC relative branch
branch on not eq.      bne $1,$2,100       if ($1 != $2) go to PC+4+100        Not equal test; PC relative
set on less than       slt $1,$2,$3        if ($2 < $3) $1=1; else $1=0        Compare less than; 2's comp.
set less than imm.     slti $1,$2,100      if ($2 < 100) $1=1; else $1=0       Compare < constant; 2's comp.
set less than uns.     sltu $1,$2,$3       if ($2 < $3) $1=1; else $1=0        Compare less than; natural numbers
set l. t. imm. uns.    sltiu $1,$2,100     if ($2 < 100) $1=1; else $1=0       Compare < constant; natural numbers
jump                   j 10000             go to 10000                         Jump to target address
jump register          jr $31              go to $31                           For switch, procedure return
jump and link          jal 10000           $31 = PC + 4; go to 10000           For procedure call
60Stacks
Stacking of Subroutine Calls & Returns and Environments
Call sequence: A: CALL B; B: CALL C; C: RET; B: RET
Stack contents over time: A | A B | A B C | A B | A
Some machines provide a memory stack as part of the architecture (e.g., VAX)
Sometimes stacks are implemented via software convention (e.g., MIPS)
61Stacks
Useful for stacked environments/subroutine call & return even if operand stack not part of architecture
Stacks that Grow Up vs. Stacks that Grow Down
(Figure: memory addresses run from 0 (Little) to inf. (Big); a stack holding a, b, c either grows up or grows down; SP points at either the Last Full or the Next Empty location.)
Little --> Big / Last Full
- POP: Read from Mem(SP), Decrement SP
- PUSH: Increment SP, Write to Mem(SP)
Little --> Big / Next Empty
- POP: Decrement SP, Read from Mem(SP)
- PUSH: Write to Mem(SP), Increment SP
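The two SP conventions for a stack growing from little to big addresses differ only in the order of the pointer update and the memory access. A minimal sketch, assuming one stack slot per address; the class name and its fields are hypothetical.

```python
class GrowUpStack:
    """Stack growing toward bigger addresses, under either SP convention."""

    def __init__(self, base, last_full):
        self.mem = {}
        self.last_full = last_full
        # Last Full: SP names the most recently pushed slot (none yet, so base - 1).
        # Next Empty: SP names the first free slot.
        self.sp = base - 1 if last_full else base

    def push(self, value):
        if self.last_full:
            self.sp += 1              # Increment SP
            self.mem[self.sp] = value  # Write to Mem(SP)
        else:
            self.mem[self.sp] = value  # Write to Mem(SP)
            self.sp += 1              # Increment SP

    def pop(self):
        if self.last_full:
            value = self.mem[self.sp]  # Read from Mem(SP)
            self.sp -= 1              # Decrement SP
        else:
            self.sp -= 1              # Decrement SP
            value = self.mem[self.sp]  # Read from Mem(SP)
        return value


full = GrowUpStack(base=100, last_full=True)
empty = GrowUpStack(base=100, last_full=False)
for item in ("a", "b", "c"):
    full.push(item)
    empty.push(item)
print(full.sp, empty.sp)
print(full.pop(), empty.pop())
```

Both variants store the same data at the same addresses; only the meaning of SP differs by one slot, which is why every routine sharing a stack has to agree on one convention.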
62Stack Frames
(Figure, from High Mem to Low Mem: ARGS; Callee Save Registers (old FP, RA); Local Variables. FP and SP point into the frame; the SP end grows and shrinks during expression evaluation.)
- Reference args and local variables at fixed (positive) offset from FP
- Many variations on stacks possible (up/down, last pushed / next)
- Block structured languages contain link to lexically enclosing frame
- Compilers normally keep scalar variables in registers, not memory!
63MIPS Software Register Conventions
0        zero     constant 0
1        at       reserved for assembler
2-3      v0-v1    expression evaluation & function results
4-7      a0-a3    arguments
8-15     t0-t7    temporary: caller saves (callee can clobber)
16-23    s0-s7    callee saves (caller can clobber)
24-25    t8-t9    temporary (cont'd)
26-27    k0-k1    reserved for OS kernel
28       gp       pointer to global area
29       sp       stack pointer
30       fp       frame pointer
31       ra       return address (HW)
64MIPS Branch Jump Instructions
65Example: Swap()
swap(int v[], int k)
{
  int temp;
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
}
- Can we figure out the code?
swap:                    // $4 = v, $5 = k
  muli $2, $5, 4         // $2 = k*4
  add  $2, $4, $2        // $2 = v + (4*k)
  lw   $15, 0($2)        // $15 = temp = ($2+0) = v[k]
  lw   $16, 4($2)        // $16 = ($2+4) = v[k+1]
  sw   $16, 0($2)        // v[k] = $16 = v[k+1]
  sw   $15, 4($2)        // v[k+1] = $15 = temp
  jr   $31               // return
66Example: Leaf_procedure()
int PairDiff(int a, int b, int c, int d)
{
  int temp;
  temp = (a+b)-(c+d);
  return temp;
}
Assume caller puts $a0-$a3 = a,b,c,d and wants result in $v0
PairDiff:
  sub $sp,$sp,12         // Make space for 3 temp locations
  sw  $t1, 8($sp)        // save $t1 (optional if MIPS convention)
  sw  $t0, 4($sp)        // save $t0 (optional if MIPS convention)
  sw  $s0, 0($sp)        // save $s0
  add $t0,$a0,$a1        // ($t0 = a+b)
  add $t1,$a2,$a3        // ($t1 = c+d)
  sub $s0,$t0,$t1        // ($s0 = $t0-$t1)
  add $v0,$s0,$zero      // store return value in $v0
  lw  $s0,0($sp)         // restore registers
  lw  $t0,4($sp)         // (optional if MIPS convention)
  lw  $t1,8($sp)         // (optional if MIPS convention)
  add $sp,$sp,12         // pop the stack
  jr  $ra                // The actual return to calling routine
67Example: Nested_procedure()
int fact(int n)
{
  if (n < 1) return (1);
  else return (n * fact(n-1));
}
- What about nested procedures? $ra ??
- Recursive procedures?
Assume $a0 = n
fact:
  sub  $sp,$sp,8         // Make space for 2 temp locations
  sw   $ra, 4($sp)       // save return address
  sw   $a0, 0($sp)       // save argument n (offset 0, matching the lw below)
  slt  $t0,$a0,1         // test for n<1
  beq  $t0,$zero,L1      // if (n>=1) goto L1
  add  $v0,$zero,1       // (n<1) case: $v0 = 1
  add  $sp,$sp,8         // pop the stack
  jr   $ra               // return
L1:                      // (n>=1) case
  sub  $a0,$a0,1         // n--
  jal  fact              // call fact again.
  lw   $a0,0($sp)        // fact() returns here. Restore n
  lw   $ra,4($sp)        // restore return address
  add  $sp,$sp,8         // pop stack
  mult $v0,$a0,$v0       // $v0 = n*fact(n-1)
  jr   $ra               // return to caller
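The reason fact must save $ra and n on the stack is that each nested jal overwrites both. The sketch below replays the recursion with an explicit Python list standing in for the memory stack; it is an illustration of the frame discipline, not a translation of the assembly, and the RETURN_HERE marker is a hypothetical stand-in for a saved return address.

```python
RETURN_HERE = "after_jal"  # hypothetical marker playing the role of a saved $ra


def fact(n):
    """Compute n! using explicit frames, as the MIPS routine does with $sp."""
    stack = []        # memory stack: one (ra, n) frame per active call
    max_depth = 0

    # Call phase: every call with n >= 1 saves its frame, then calls fact(n-1).
    while n >= 1:
        stack.append((RETURN_HERE, n))   # sw $ra / sw $a0
        max_depth = max(max_depth, len(stack))
        n -= 1                           # sub $a0,$a0,1 ; jal fact

    v0 = 1                               # base case (n < 1) returns 1

    # Return phase: each frame is restored and popped, then multiplied in.
    while stack:
        _ra, saved_n = stack.pop()       # lw $a0 / lw $ra / add $sp,$sp,8
        v0 = saved_n * v0                # $v0 = n * fact(n-1)
    return v0, max_depth


result, depth = fact(5)
print(result, depth)
```

For fact(5) five frames are live at the deepest point; without them, the inner calls would destroy the n and return address the outer calls still need.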
68Other Architectures
- Design alternative
- provide more powerful operations (e.g., DSP, Encryption engines, Java Processors)
- goal is to reduce number of instructions executed
- danger is a slower cycle time and/or a higher CPI
- Sometimes referred to as RISC vs. CISC
- virtually all new instruction sets since 1982 have been RISC
- VAX: minimize code size, make assembly language easy; instructions from 1 to 54 bytes long!
- We'll look at PowerPC and 80x86
69Power PC
- Indexed addressing
- example: lw $t1,$a0+$s3 // $t1 = Memory[$a0+$s3]
- What do we have to do in MIPS?
- Update addressing
- update a register as part of load (for marching through arrays)
- example: lwu $t0,4($s3) // $t0 = Memory[$s3+4]; $s3 = $s3+4
- What do we have to do in MIPS?
- Others
- load multiple/store multiple
- a special counter register: bc Loop
- decrement counter, if not 0 goto loop
70x86 Volume is beautiful
- 1978: The Intel 8086 is announced (16 bit architecture)
- 1980: The 8087 floating point coprocessor is added
- 1982: The 80286 increases address space to 24 bits, + instructions
- 1985: The 80386 extends to 32 bits, new addressing modes
- 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
- 1997: MMX is added
This history illustrates the impact of the "golden handcuffs" of compatibility: "adding new features as someone might add clothing to a packed bag", "an architecture that is difficult to explain and impossible to love". What the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective.
71x86 Complex Instruction Set
- See text for a detailed description.
- Complexity
- Instructions from 1 to 17 bytes long
- one operand must act as both a source and destination
- one operand can come from memory
- complex addressing modes, e.g., base or scaled index with 8 or 32 bit displacement
- Saving grace
- the most frequently used instructions are not too difficult to build
- compilers avoid the portions of the architecture that are slow
72Comparing Instruction Set Architectures
- Design-time metrics: Can it be implemented, in how long, at what cost? Can it be programmed? Ease of compilation?
- Static Metrics: How many bytes does the program occupy in memory?
- Dynamic Metrics: How many instructions are executed? How many bytes does the processor fetch to execute the program? How many clocks are required per instruction? How "lean" a clock is practical?
- Best Metric: Time to execute the program!
- This depends on
- instruction set,
- processor organization, and
- compilation techniques.
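The dynamic metrics above combine into execution time through the standard relation time = instructions executed x clocks per instruction / clock rate, which the slide implies but does not write out. The machines and numbers below are hypothetical, picked only to show that a lower instruction count does not guarantee a faster program.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Seconds to run a program: instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical machine A: more instructions, but each one is cheap.
time_a = execution_time(instruction_count=1.0e9, cpi=1.5, clock_rate_hz=500e6)

# Hypothetical machine B: 20% fewer instructions, but a higher CPI.
time_b = execution_time(instruction_count=0.8e9, cpi=2.5, clock_rate_hz=500e6)

print(time_a, time_b)
```

With these made-up figures machine A finishes first despite executing more instructions, which is the trade-off behind the time-to-execute metric.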
73Instruction Set Architectures: What did we learn today?
- MIPS is a general-purpose register, load-store, fixed-instruction-length architecture.
- MIPS is optimized for fast pipelined performance, not for low instruction count
- Four principles of IS architecture
- simplicity favors regularity
- smaller is faster
- good design demands compromise
- make the common case fast
74Today's Agenda
- Administrivia
- Technology trends
- Computer organization: concept of abstraction
- Instruction Set Architectures: Definition, types, examples
- Instruction formats: operands, addressing modes
- Operations: load, store, arithmetic, logical
- Control instructions: branch, jump, procedures
- Stacks
- Examples: in-line code, procedure, nested-procedures
- Other architectures