Title: Computer Architecture
1Computer Architecture
- Lecture 3
- Basic Fundamentals
- and
- Instruction Sets
2The Task of a Computer Designer
- 1.1 Introduction
- 1.2 The Task of a Computer Designer
- 1.3 Technology and Computer Usage Trends
- 1.4 Cost and Trends in Cost
- 1.5 Measuring and Reporting Performance
- 1.6 Quantitative Principles of Computer Design
- 1.7 Putting It All Together: The Concept of Memory Hierarchy
[Figure: the design cycle. Evaluate existing systems for bottlenecks, simulate new designs and organizations, implement the next-generation system, then evaluate again. Inputs to the cycle: benchmarks, workloads, implementation complexity, technology trends.]
3Technology and Computer Usage Trends
- When building a cathedral, numerous very practical considerations need to be taken into account:
- available materials
- worker skills
- willingness of the client to pay the price.
- Similarly, computer architecture is about working within constraints:
- What will the market buy?
- Cost/performance
- Tradeoffs in materials and processes
4Trends
- Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year.
- In 1975 he revised the rate to a doubling roughly every two years, and that trend has broadly continued since then.
5Measuring And Reporting Performance
- This section talks about:
- Metrics: how do we describe in a numerical way the performance of a computer?
- What tools do we use to find those metrics?
6Metrics
- Time to run the task (ExTime)
- Execution time, response time, latency
- Tasks per day, hour, week, sec, ns (Performance)
- Throughput, bandwidth
7Metrics - Comparisons
- "X is n times faster than Y" means
- $n = \dfrac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$
- Speed of Concorde vs. Boeing 747
- Throughput of Boeing 747 vs. Concorde
8Metrics - Comparisons
- Pat has developed a new product, "rabbit", whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:
- Performance Comparisons
Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle    30                    0.0333                3
Rabbit    60                    0.0166                1
- Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
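One way to check candidate statements like these is to compute the ratios directly. The sketch below uses only the transactions-per-second column (30 for turtle, 60 for rabbit); the function names are illustrative and not part of the lecture.

```python
def times_faster(perf_x, perf_y):
    """n such that 'X is n times faster than Y' (ratio of performance)."""
    return perf_x / perf_y


def percent_faster(perf_x, perf_y):
    """How much higher X's performance is than Y's, in percent."""
    return (perf_x / perf_y - 1.0) * 100.0


def percent_less_time(time_x, time_y):
    """How much less time X needs than Y, in percent."""
    return (1.0 - time_x / time_y) * 100.0


turtle_tps, rabbit_tps = 30.0, 60.0
# Time per transaction is the reciprocal of throughput.
turtle_time, rabbit_time = 1.0 / turtle_tps, 1.0 / rabbit_tps

n = times_faster(rabbit_tps, turtle_tps)
faster_pct = percent_faster(rabbit_tps, turtle_tps)
less_time_pct = percent_less_time(rabbit_time, turtle_time)
turtle_relative_speed_pct = 100.0 * turtle_tps / rabbit_tps

print(f"rabbit is {n:.1f}x as fast, i.e. {faster_pct:.0f}% faster")
print(f"rabbit takes {less_time_pct:.0f}% less time per transaction")
print(f"turtle is {turtle_relative_speed_pct:.0f}% as fast as rabbit")
```

Note that "percent faster" and "percent less time" are different quantities, which is why several of the statements that sound equivalent are not.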
9Metrics - Throughput
10Methods For Predicting Performance
- Benchmarks, Traces, Mixes
- Hardware: cost, delay, area, power estimation
- Simulation (many levels)
- ISA, RT, Gate, Circuit
- Queuing Theory
- Rules of Thumb
- Fundamental Laws/Principles
11Benchmarks
SPEC: System Performance Evaluation Cooperative
- First Round 1989
- 10 programs yielding a single number (SPECmarks)
- Second Round 1992
- SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
- Compiler flags unlimited. March 93 of DEC 4000 Model 610:
- spice: unix.c/def(sysv,has_bcopy,bcopy(a,b,c) memcpy(b,a,c)
- wave5: /ali(all,dcomnat)/aga/ur4/ur200
- nasa7: /norecu/aga/ur4/ur2200/lcblas
- Third Round 1995
- new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
- benchmarks useful for 3 years
- Single flag setting for all programs: SPECint_base95, SPECfp_base95
12Benchmarks
CINT2000 (Integer Component of SPEC CPU2000)
- Program Language What Is It
- 164.gzip C Compression
- 175.vpr C FPGA Circuit Placement and Routing
- 176.gcc C C Programming Language Compiler
- 181.mcf C Combinatorial Optimization
- 186.crafty C Game Playing Chess
- 197.parser C Word Processing
- 252.eon C++ Computer Visualization
- 253.perlbmk C PERL Programming Language
- 254.gap C Group Theory, Interpreter
- 255.vortex C Object-oriented Database
- 256.bzip2 C Compression
- 300.twolf C Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
13Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000)
- Program Language What Is It
- 168.wupwise Fortran 77 Physics / Quantum Chromodynamics
- 171.swim Fortran 77 Shallow Water Modeling
- 172.mgrid Fortran 77 Multi-grid Solver: 3D Potential Field
- 173.applu Fortran 77 Parabolic / Elliptic Differential Equations
- 177.mesa C 3-D Graphics Library
- 178.galgel Fortran 90 Computational Fluid Dynamics
- 179.art C Image Recognition / Neural Networks
- 183.equake C Seismic Wave Propagation Simulation
- 187.facerec Fortran 90 Image Processing: Face Recognition
- 188.ammp C Computational Chemistry
- 189.lucas Fortran 90 Number Theory / Primality Testing
- 191.fma3d Fortran 90 Finite-element Crash Simulation
- 200.sixtrack Fortran 77 High Energy Physics Accelerator Design
- 301.apsi Fortran 77 Meteorology: Pollutant Distribution
http://www.spec.org/osg/cpu2000/CFP2000/
14Benchmarks
Sample Results For SPECint2000
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
               Base       Base       Base    Peak       Peak       Peak
Benchmarks     Ref Time   Run Time   Ratio   Ref Time   Run Time   Ratio
164.gzip       1400       277        505     1400       270        518
175.vpr        1400       419        334     1400       417        336
176.gcc        1100       275        399     1100       272        405
181.mcf        1800       621        290     1800       619        291
186.crafty     1000       191        522     1000       191        523
197.parser     1800       500        360     1800       499        361
252.eon        1300       267        486     1300       267        486
253.perlbmk    1800       302        596     1800       302        596
254.gap        1100       249        442     1100       248        443
255.vortex     1900       268        710     1900       264        719
256.bzip2      1500       389        386     1500       375        400
300.twolf      3000       784        382     3000       776        387
SPECint_base2000                     438
SPECint2000                                                        442
Intel OR840 (1 GHz Pentium III processor)
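The ratio column in these results is 100 times the reference time divided by the measured run time, and the overall SPECint figure is the geometric mean of the per-benchmark ratios. The sketch below recomputes the base numbers from the reference and run times listed above. Because the published run times are rounded to whole seconds, a recomputed ratio can differ from the published one by a unit (176.gcc, for example); the function names are illustrative.

```python
import math

# (benchmark, reference time in s, base run time in s), copied from the results above.
BASE_RESULTS = [
    ("164.gzip", 1400, 277), ("175.vpr", 1400, 419), ("176.gcc", 1100, 275),
    ("181.mcf", 1800, 621), ("186.crafty", 1000, 191), ("197.parser", 1800, 500),
    ("252.eon", 1300, 267), ("253.perlbmk", 1800, 302), ("254.gap", 1100, 249),
    ("255.vortex", 1900, 268), ("256.bzip2", 1500, 389), ("300.twolf", 3000, 784),
]


def spec_ratio(ref_time, run_time):
    """Per-benchmark score: 100 * reference time / measured run time."""
    return 100.0 * ref_time / run_time


def geometric_mean(values):
    """n-th root of the product, computed via logs to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: spec_ratio(ref, run) for name, ref, run in BASE_RESULTS}
overall_base = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:6.0f}")
print(f"geometric mean of base ratios: {overall_base:.0f}")
```

The geometric mean is used so that the summary does not depend on which machine is chosen as the reference.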
15Benchmarks
Performance Evaluation
- For better or worse, benchmarks shape a field.
- Good products are created when we have:
- Good benchmarks
- Good ways to summarize performance
- Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
- If the benchmarks/summary are inadequate, then one must choose between improving the product for real programs vs. improving the product to get more sales. Sales almost always wins!
- Execution time is the measure of computer performance!
16Benchmarks
How to Summarize Performance
- Management would like to have one number.
- Technical people want more:
- They want to have evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment.
- There should be consistency when doing the measurements multiple times.
How would you report these results?
Computer A Computer B Computer C
Program P1 (secs) 1 10 20
Program P2 (secs) 1000 100 20
Total Time (secs) 1001 110 40
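To see why the choice of summary matters, the sketch below computes three common single-number summaries for the table above: total time, arithmetic mean, and geometric mean. It assumes each program is run once with equal weight, which the slide does not state; the function names are illustrative.

```python
import math

# Execution times in seconds, copied from the table above.
times = {
    "A": {"P1": 1.0, "P2": 1000.0},
    "B": {"P1": 10.0, "P2": 100.0},
    "C": {"P1": 20.0, "P2": 20.0},
}


def total_time(per_program):
    return sum(per_program.values())


def arithmetic_mean(per_program):
    return sum(per_program.values()) / len(per_program)


def geometric_mean(per_program):
    logs = [math.log(t) for t in per_program.values()]
    return math.exp(sum(logs) / len(logs))


for machine, per_program in times.items():
    print(
        f"Computer {machine}: total={total_time(per_program):7.1f}s  "
        f"arith={arithmetic_mean(per_program):6.1f}s  "
        f"geo={geometric_mean(per_program):5.1f}s"
    )
```

Total time ranks C ahead of B ahead of A, while the geometric mean rates A and B as equal even though B finishes the whole workload roughly nine times sooner. Reporting the individual times alongside any summary avoids hiding that difference.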
17Quantitative Principles of Computer Design
Make the common case fast. Amdahl's Law relates the total speedup of a system to the speedup of some portion of that system.
18Amdahl's Law
Quantitative Design
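- Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - F) + \dfrac{F}{S} \right]$
Speedup due to enhancement E:
$\text{Speedup}(E) = \dfrac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \dfrac{1}{(1 - F) + \dfrac{F}{S}}$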
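Amdahl's Law gives the overall speedup as 1 / ((1 - F) + F / S). A minimal sketch follows; the function name and the example values F = 0.4 and S = 10 are hypothetical and not taken from the lecture.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the task is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical example: 40% of the task runs 10x faster.
example = amdahl_speedup(0.4, 10.0)
# Even an arbitrarily large S cannot beat 1 / (1 - F).
upper_bound = 1.0 / (1.0 - 0.4)

print(f"speedup with F=0.4, S=10: {example:.4f}")
print(f"limit as S grows:         {upper_bound:.4f}")
```

The unenhanced fraction bounds the achievable speedup, which is the quantitative reason to make the common case fast.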
19Quantitative Design
Cycles Per Instruction
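CPI = (CPU Time × Clock Rate) / Instruction Count = Cycles / Instruction Count
$\text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$, where $I_i$ is the number of instructions of type $i$.
$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$, where $F_i = I_i / \text{Instruction Count}$ is the instruction frequency.
- Invest resources where time is spent!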
20Quantitative Design
Cycles Per Instruction
Suppose we have a machine where we can count the
frequency with which instructions are executed.
We also know how many cycles it takes for each
instruction type.
- Base Machine (Reg / Reg)
Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        0.5      (33%)
Load     20%    2        0.4      (27%)
Store    10%    2        0.2      (13%)
Branch   20%    2        0.4      (27%)
Total CPI                1.5
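The table can be reproduced by weighting each instruction type's cycle count by its frequency. The sketch below uses the frequencies and cycle counts of the base machine above; the variable names are illustrative.

```python
# Instruction mix for the base machine: op -> (frequency, cycles per instruction).
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# Contribution of each type to the overall CPI: frequency * cycles.
contribution = {op: freq * cycles for op, (freq, cycles) in MIX.items()}
total_cpi = sum(contribution.values())

# Share of execution time spent in each type.
time_share = {op: c / total_cpi for op, c in contribution.items()}

for op in MIX:
    print(f"{op:6s} CPI(i)={contribution[op]:.1f}  time={100 * time_share[op]:.0f}%")
print(f"Total CPI = {total_cpi:.1f}")
```

ALU operations are half of all instructions but only a third of the time, so the time column, not the frequency column, shows where an enhancement pays off.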
21Quantitative Design
Locality of Reference
- Programs access a relatively small portion of the address space at any instant of time.
- There are two different types of locality:
- Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.).
- Spatial Locality (locality in space/location): if an item is referenced, items whose addresses are close by tend to be referenced soon (straight-line code, array access, etc.).
22The Concept of Memory Hierarchy
Fast memory is expensive. Slow memory is
cheap. The goal is to minimize the
price/performance for a particular price point.
23Memory Hierarchy
Level           Typical Size     Access Time      Bandwidth (MB/sec)   Managed By
Registers       4 - 64           1 nsec           10,000 - 50,000      Compiler
Level 1 cache   <16 Kbytes       3 nsec           2000 - 5000          Hardware
Level 2 cache   <2 Mbytes        15 nsec          500 - 1000           Hardware
Memory          <16 Gigabytes    150 nsec         500 - 1000           OS
Disk            >5 Gigabytes     5,000,000 nsec   100                  OS/User
24Memory Hierarchy
- Hit: data appears in some block in the upper level (example: Block X)
- Hit Rate: the fraction of memory accesses found in the upper level
- Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
- Miss: data needs to be retrieved from a block in the lower level (Block Y)
- Miss Rate = 1 - (Hit Rate)
- Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Hit Time << Miss Penalty (500 instructions on 21264!)
25Memory Hierarchy
Registers
Level 1 cache
Level 2 Cache
Memory
Disk
- What is the cost of executing a program if:
- Stores are free (there's a write pipe)
- Loads are 20% of all instructions
- 80% of loads hit (are found) in the Level 1 cache
- 97% of loads hit in the Level 2 cache.
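One possible way to work this question is sketched below. It rests on several assumptions that the slide does not state: the access times are those of the hierarchy table (3 nsec for Level 1, 15 nsec for Level 2, 150 nsec for memory), the 97% figure is read as the cumulative fraction of loads satisfied by Level 1 or Level 2, the remaining 3% go to main memory, and accesses do not overlap. A different reading of the 97% (for example, as the hit rate among Level 1 misses only) would change the result.

```python
# Assumed access times in nanoseconds (taken from the hierarchy table).
L1_NS, L2_NS, MEM_NS = 3.0, 15.0, 150.0

LOAD_FRACTION = 0.20        # loads are 20% of all instructions
L1_HIT = 0.80               # 80% of loads hit in Level 1
L2_CUMULATIVE_HIT = 0.97    # assumed: 97% of loads are satisfied by L1 or L2


def average_load_latency_ns():
    """Average time per load, weighting each level by the loads it serves."""
    served_by_l1 = L1_HIT
    served_by_l2 = L2_CUMULATIVE_HIT - L1_HIT
    served_by_memory = 1.0 - L2_CUMULATIVE_HIT
    return served_by_l1 * L1_NS + served_by_l2 * L2_NS + served_by_memory * MEM_NS


load_latency = average_load_latency_ns()
# Stores are free, so only loads add memory time to the average instruction.
memory_ns_per_instruction = LOAD_FRACTION * load_latency

print(f"average load latency:        {load_latency:.2f} ns")
print(f"memory time per instruction: {memory_ns_per_instruction:.2f} ns")
```

Under these assumptions the 3% of loads that reach main memory account for almost half of the average load latency, which illustrates why Hit Time << Miss Penalty matters.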
26The Instruction Set
- 2.1 Introduction
- 2.2 Classifying Instruction Set Architectures
- 2.3 Memory Addressing
- 2.4 Operations in the Instruction Set
- 2.5 Type and Size of Operands
- 2.6 Encoding and Instruction Set
- 2.7 The Role of Compilers
- 2.8 The MIPS Architecture
- Bonus
27Introduction
- The Instruction Set Architecture is that portion
of the machine visible to the assembly level
programmer or to the compiler writer.
- What are the advantages and disadvantages of various instruction set alternatives?
- How do languages and compilers affect the ISA?
- Use the DLX architecture as an example of a RISC
architecture.
28Classifying Instruction Set Architectures
- Classifications can be by
- Stack/accumulator/register
- Number of memory operands.
- Number of total operands.
29Instruction Set Architectures
Basic ISA Classes
- Accumulator
- 1 address: add A (acc <- acc + mem[A])
- 1+x address: addx A (acc <- acc + mem[A + x])
- Stack
- 0 address: add (tos <- tos + next)
- General Purpose Register
- 2 address: add A B (EA(A) <- EA(A) + EA(B))
- 3 address: add A B C (EA(A) <- EA(B) + EA(C))
- Load/Store
- 0 Memory: load R1, Mem1
- load R2, Mem2
- add R1, R2
- 1 Memory: add R1, Mem2
ALU Instructions can have two or three operands.
ALU Instructions can have 0, 1, 2, 3 operands.
Shown here are cases of 0 and 1.
30Instruction Set Architectures
Basic ISA Classes
The results of the different address classes are easiest to see with the examples here, all of which implement the sequences for C = A + B.
Stack     Accumulator   Register (register-memory)   Register (load-store)
Push A    Load A        Load R1, A                   Load R1, A
Push B    Add B         Add R1, B                    Load R2, B
Add       Store C       Store C, R1                  Add R3, R1, R2
Pop C                                                Store C, R3
Registers are the class that won out. The more
registers on the CPU, the better.
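The difference between the classes can be made concrete with toy interpreters. The sketch below runs the stack, accumulator, and load-store sequences for C = A + B against a small dictionary standing in for memory. The tuple-based instruction encoding and the values A = 2, B = 3 are invented for illustration and do not correspond to any real machine.

```python
def run_stack(program, memory):
    """0-address machine: operands are implicit on an evaluation stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op {op!r}")
    return memory


def run_accumulator(program, memory):
    """1-address machine: one operand is always the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = memory[addr]
        elif op == "add":
            acc += memory[addr]
        elif op == "store":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown accumulator op {op!r}")
    return memory


def run_load_store(program, memory):
    """Load-store machine: ALU ops touch registers only."""
    regs = {}
    for op, *args in program:
        if op == "load":            # load Rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "add":           # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":         # store addr, Rs
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown load-store op {op!r}")
    return memory


STACK_PROG = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACC_PROG = [("load", "A"), ("add", "B"), ("store", "C")]
LS_PROG = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "C", "R3")]

stack_result = run_stack(STACK_PROG, {"A": 2, "B": 3})
acc_result = run_accumulator(ACC_PROG, {"A": 2, "B": 3})
ls_result = run_load_store(LS_PROG, {"A": 2, "B": 3})
print(stack_result["C"], acc_result["C"], ls_result["C"])
```

All three produce the same value of C; what differs is how many operands each instruction names explicitly and how many instructions touch memory.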
31Instruction Set Architectures
Intel 80x86 Integer Registers
GPR0 EAX Accumulator
GPR1 ECX Count register, string, loop
GPR2 EDX Data Register multiply, divide
GPR3 EBX Base Address Register
GPR4 ESP Stack Pointer
GPR5 EBP Base Pointer for base of stack seg.
GPR6 ESI Index Register
GPR7 EDI Index Register
CS Code Segment Pointer
SS Stack Segment Pointer
DS Data Segment Pointer
ES Extra Data Segment Pointer
FS Data Seg. 2
GS Data Seg. 3
PC EIP Instruction Counter
Eflags Condition Codes
32Memory Addressing
- Sections Include
- Interpreting Memory Addresses
- Addressing Modes
- Displacement Address Mode
- Immediate Address Mode
33Memory Addressing
Interpreting Memory Addresses
- What object is accessed as a function of the address and length?
- Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.
- Little Endian: puts the byte whose address is xx00 at the least significant position in the word.
- Big Endian: puts the byte whose address is xx00 at the most significant position in the word.
- Alignment: data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
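Byte order and alignment can both be checked in a few lines. The sketch below packs the same 32-bit value in little-endian and big-endian order with Python's standard `struct` module and tests whether an address is aligned for a given object size. The value 0x0A0B0C0D and the helper name `is_aligned` are illustrative.

```python
import struct

value = 0x0A0B0C0D

# "<I" = little-endian unsigned 32-bit, ">I" = big-endian unsigned 32-bit.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print("little-endian bytes, lowest address first:", little.hex(" "))
print("big-endian bytes, lowest address first:   ", big.hex(" "))


def is_aligned(address, size):
    """True if an object of `size` bytes at `address` sits on a size boundary."""
    return address % size == 0


print(is_aligned(8, 4))  # a 4-byte word at address 8
print(is_aligned(6, 4))  # a 4-byte word at address 6
```

In the little-endian layout the byte at the lowest address is the least significant one (0x0D); in the big-endian layout it is the most significant one (0x0A).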
34Memory Addressing
Addressing Modes
- This table shows the most common modes.
Addressing Mode     Example Instruction   Meaning                             When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]              When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3                  For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100 + R[R1]]     Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]           Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]            Used for static data.
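The five modes differ only in how the second operand of the Add is obtained. The sketch below resolves that operand for each mode against a toy register file and memory; the function name, keyword arguments, and all register and memory contents are made up for illustration.

```python
def fetch_operand(mode, regs, mem, *, reg=None, imm=None, disp=0, addr=None):
    """Return the source operand value for one of the five addressing modes."""
    if mode == "register":            # Add R4, R3
        return regs[reg]
    if mode == "immediate":           # Add R4, #3
        return imm
    if mode == "displacement":        # Add R4, 100(R1)
        return mem[disp + regs[reg]]
    if mode == "register_deferred":   # Add R4, (R1)
        return mem[regs[reg]]
    if mode == "absolute":            # Add R4, (1001)
        return mem[addr]
    raise ValueError(f"unknown addressing mode {mode!r}")


# Toy machine state (illustrative values only).
regs = {"R1": 200, "R3": 7, "R4": 10}
mem = {200: 50, 300: 40, 1001: 60}

results = {
    "register": regs["R4"] + fetch_operand("register", regs, mem, reg="R3"),
    "immediate": regs["R4"] + fetch_operand("immediate", regs, mem, imm=3),
    "displacement": regs["R4"] + fetch_operand("displacement", regs, mem, reg="R1", disp=100),
    "register_deferred": regs["R4"] + fetch_operand("register_deferred", regs, mem, reg="R1"),
    "absolute": regs["R4"] + fetch_operand("absolute", regs, mem, addr=1001),
}
for mode, new_r4 in results.items():
    print(f"{mode:18s} R4 would become {new_r4}")
```

Register deferred is the displacement mode with a displacement of zero, and absolute is the displacement mode with a base register that holds zero.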
35Memory Addressing
Displacement Addressing Mode
- How big should the displacement be?
- For addresses that do fit in the displacement size:
- Add R4, 10000(R0)
- For addresses that don't fit in the displacement size, the compiler must do the following:
- Load R1, address
- Add R4, 0(R1)
- How big the field should be depends on the typical displacements.
- On DLX, the space allocated is 16 bits. IA-32 encodes displacements of 8 or 32 bits (16 bits in 16-bit addressing mode).
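The compiler's decision between the one-instruction and two-instruction forms comes down to a range check. The sketch below assumes the displacement field is a signed two's complement value, as it is for a 16-bit DLX displacement; the function name and the test addresses other than 10000 are illustrative.

```python
def fits_signed(value, bits=16):
    """True if `value` is representable as a signed two's complement field."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


def addressing_plan(address, bits=16):
    """Describe which instruction sequence a compiler could emit for `address`."""
    if fits_signed(address, bits):
        return [f"Add R4, {address}(R0)"]
    # Too large for the field: materialize the address in a register first.
    return ["Load R1, address", "Add R4, 0(R1)"]


print(addressing_plan(10000))   # fits in 16 bits
print(addressing_plan(100000))  # does not fit in 16 bits
```

A wider field makes the two-instruction case rarer but costs encoding bits in every instruction that carries a displacement.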
36Memory Addressing
Immediate Address Mode
- Used where we want to get to a numerical value in
an instruction.
At high level:
a = b + 3;
if ( a > 17 ) goto Addr
At assembler level:
Load R2, #3
Add R0, R1, R2
Load R2, #17
CMPBGT R1, R2
Load R1, Address
Jump (R1)
37Operations In The Instruction Set
- Sections Include
- Detailed information about types of instructions.
- Instructions for Control Flow (conditional
branches, jumps)
38Operations In The Instruction Set
Operator Types
- Arithmetic and logical: and, add
- Data transfer: move, load
- Control: branch, jump, call
- System: system call, traps
- Floating point: add, mul, div, sqrt
- Decimal: add, convert
- String: move, compare
- Multimedia: 2D, 3D? e.g., Intel MMX and Sun VIS
39Operations In The Instruction Set
Control Instructions
Conditional branches are 20% of all instructions!!
- Control Instructions Issues
- taken or not
- where is the target
- link return address
- save or restore
- Instructions that change the PC
- (conditional) branches, (unconditional) jumps
- function calls, function returns
- system calls, system returns
40Operations In The Instruction Set
Control Instructions
- There are numerous tradeoffs:
- Compare and branch
- + no extra compare, no state passed between instructions
- -- requires ALU op, restricts code scheduling opportunities
- Implicitly set condition codes: Z, N, V, C
- + can be set "for free"
- -- constrains code reordering, extra state to save/restore
- Explicitly set condition codes
- + can be set "for free", decouples branch/fetch from pipeline
- -- extra state to save/restore
- Condition in general-purpose register
- + no special state, but uses up a register
- -- branch condition separate from branch logic in pipeline
- Some data for MIPS:
- > 80% of branches use immediate data, > 80% of those zero
- 50% of branches use = 0 or <> 0
- Compromise in MIPS:
- branch=0, branch<>0
- compare instructions for all other compares
41Operations In The Instruction Set
Control Instructions
- Link Return Address
- implicit register: many recent architectures use this
- + fast, simple
- -- s/w must save the register before the next call; surprise traps?
- explicit register
- + may avoid saving register
- -- register must be specified
- processor stack
- + recursion direct
- -- complex instructions
- Save or restore state
- What state?
- function calls: registers
- system calls: registers, flags, PC, PSW, etc.
- Hardware need not save registers
- Caller can save registers in use
- Callee can save registers it will use
- Hardware register save
- IBM STM, VAX CALLS
- Faster?
- Many recent architectures do no register saving
- Or do implicit register saving with register windows (SPARC)
42Type And Size of Operands
- The type of the operand is usually encoded in the opcode: an LDW implies loading of a word.
- Common sizes are:
- Character (1 byte)
- Half word (16 bits)
- Word (32 bits)
- Single Precision Floating Point (1 Word)
- Double Precision Floating Point (2 Words)
- Integers are two's complement binary.
- Floating point is IEEE 754.
- Some languages (like COBOL) use packed decimal.
43Encoding And Instruction Set
- This section has to do with how an assembly level instruction is encoded into binary.
- Ultimately, it's the binary that is read and interpreted by the machine.
44Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with the debugger option, so it is not optimized.
- for ( index = 0; index < iterations; index++ )
- 0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0
- 0040D3B6 EB 09 jmp main+0D1h (0040d3c1)
- 0040D3B8 8B 4D F0 mov ecx,dword ptr [ebp-10h]
- 0040D3BB 83 C1 01 add ecx,1
- 0040D3BE 89 4D F0 mov dword ptr [ebp-10h],ecx
- 0040D3C1 8B 55 F0 mov edx,dword ptr [ebp-10h]
- 0040D3C4 3B 55 F8 cmp edx,dword ptr [ebp-8]
- 0040D3C7 7D 15 jge main+0EEh (0040d3de)
- long_temp = (*alignment + long_temp) % 47;
- 0040D3C9 8B 45 F4 mov eax,dword ptr [ebp-0Ch]
- 0040D3CC 8B 00 mov eax,dword ptr [eax]
- 0040D3CE 03 45 EC add eax,dword ptr [ebp-14h]
- 0040D3D1 99 cdq
- 0040D3D2 B9 2F 00 00 00 mov ecx,2Fh
- 0040D3D7 F7 F9 idiv eax,ecx
- 0040D3D9 89 55 EC mov dword ptr [ebp-14h],edx
- 0040D3DC EB DA jmp main+0C8h (0040d3b8)
This code was produced using Visual Studio.
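To connect the bytes to the mnemonic, the sketch below picks apart the first instruction of the listing, `C7 45 F0 00 00 00 00`. It handles only this one shape (opcode C7 with a ModRM byte whose mod field is 01, meaning a register base plus an 8-bit displacement, followed by a 32-bit immediate) and is not a general x86 decoder; the helper names are illustrative.

```python
# Register numbering used in the ModRM byte (matches GPR0..GPR7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModRM byte into its mod (2 bits), reg (3 bits), rm (3 bits) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def sign_extend8(byte):
    """Interpret an 8-bit value as a signed two's complement number."""
    return byte - 0x100 if byte & 0x80 else byte


encoded = bytes.fromhex("C7 45 F0 00 00 00 00")

opcode = encoded[0]                              # C7 with reg field 0: mov r/m32, imm32
mod, reg, rm = split_modrm(encoded[1])           # 0x45 -> mod=01, reg=000, rm=101
displacement = sign_extend8(encoded[2])          # 0xF0 -> -16, i.e. -10h
immediate = int.from_bytes(encoded[3:7], "little")

sign = "-" if displacement < 0 else "+"
print(f"mov dword ptr [{REG32[rm]}{sign}{abs(displacement):X}h],{immediate}")
```

The 7-byte instruction spends one byte on the opcode, one on the ModRM postbyte, one on the displacement, and four on the immediate, which shows how much 80x86 instruction lengths vary with the operands.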
45Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with optimization.
- for ( index = 0; index < iterations; index++ )
- 00401000 8B 0D 40 54 40 00 mov ecx,dword ptr ds:[405440h]
- 00401006 33 D2 xor edx,edx
- 00401008 85 C9 test ecx,ecx
- 0040100A 7E 14 jle 00401020
- 0040100C 56 push esi
- 0040100D 57 push edi
- 0040100E 8B F1 mov esi,ecx
- long_temp = (*alignment + long_temp) % 47;
- 00401010 8D 04 11 lea eax,[ecx+edx]
- 00401013 BF 2F 00 00 00 mov edi,2Fh
- 00401018 99 cdq
- 00401019 F7 FF idiv eax,edi
- 0040101B 4E dec esi
- 0040101C 75 F2 jne 00401010
- 0040101E 5F pop edi
- 0040101F 5E pop esi
- 00401020 C3 ret
This code was produced using Visual Studio.
46Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with optimization.
- for ( index = 0; index < iterations; index++ )
- 0x804852f <main+143>: add $0x10,%esp
- 0x8048532 <main+146>: lea 0xfffffff8(%ebp),%edx
- 0x8048535 <main+149>: test %esi,%esi
- 0x8048537 <main+151>: jle 0x8048543 <main+163>
- 0x8048539 <main+153>: mov %esi,%eax
- 0x804853b <main+155>: nop
- 0x804853c <main+156>: lea 0x0(%esi,1),%esi
- long_temp = (*alignment + long_temp) % 47;
- 0x8048540 <main+160>: dec %eax
- 0x8048541 <main+161>: jne 0x8048540 <main+160>
- 0x8048543 <main+163>: add $0xfffffff4,%esp
This code was produced using gcc and gdb. Note that the representation of the code is dependent on the compiler/debugger!
47Encoding And Instruction Set
- 80x86 Instruction Encoding
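A morass of disjoint encodings!!
[Figure: 80x86 instruction formats; field widths in bits]
ADD: ADD (4), Reg (3), W (1), Disp. (8)
SHL: SHL (6), V/w (2), postbyte (8), Disp. (8)
TEST: TEST (7), W (1), postbyte (8), Immediate (8)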
48Encoding And Instruction Set
- 80x86 Instruction Encoding
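[Figure: 80x86 instruction formats; field widths in bits]
JE: JE (4), Cond (4), Disp. (8)
CALLF: CALLF (8), Offset (16), Segment Number (16)
MOV: MOV (6), D/w (2), postbyte (8), Disp. (8)
PUSH: PUSH (5), Reg (3)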
49The Role of Compilers
- Compiler goals
- All correct programs execute correctly
- Most compiled programs execute fast (optimizations)
- Fast compilation
- Debugging support
50The Role of Compilers
[Figure: compiler structure]
- Parsing -> intermediate representation
- Jump Optimization
- Loop Optimizations
- Register Allocation
- Code Generation -> assembly code
- Common Subexpression
- Procedure in-lining
- Constant Propagation
- Strength Reduction
- Pipeline Scheduling
51The Role of Compilers
Optimization Name    Explanation                                        % of the total number of optimizing transformations
High Level           At or near the source level; machine-independent   Not Measured
Local                Within straight-line code                          40%
Global               Across a branch                                    42%
Machine Dependent    Depends on machine knowledge                       Not Measured
52The Role of Compilers
- What compiler writers want
- regularity
- orthogonality
- composability
- Compilers perform a giant case analysis
- too many choices make it hard
- Orthogonal instruction sets
- operation, addressing mode, data type
- One solution or all possible solutions
- 2 branch conditions eq, lt
- or all six eq, ne, lt, gt, le, ge
- not 3 or 4
- There are advantages to having instructions that are primitives.
- Let the compiler put the instructions together to make more complex sequences.
53The MIPS Architecture
- MIPS is very RISC oriented.
- MIPS will be used for many examples throughout
the course.
MIPS (originally an acronym for Microprocessor
without Interlocked Pipeline Stages) is a RISC
microprocessor architecture developed by MIPS
Technologies. We will look at the Pipeline
concept in our next lecture.
The acronym RISC (pronounced risk), for reduced
instruction set computer represents a CPU design
strategy emphasizing the insight that simplified
instructions which "do less" may still provide
for higher performance if this simplicity can be
utilized to make instructions execute very fast.
Well known RISC families include DEC Alpha, ARC,
ARM, AVR, MIPS, PA-RISC, Power Architecture
(including PowerPC), and SPARC.
54The MIPS Architecture
- Addressing Modes
- Immediate
- Displacement
- (Register Mode used only for ALU)
- 32-bit byte addresses, aligned
- Load/store: only displacement addressing
- Standard data types
- 3 fixed-length formats
- 32 32-bit GPRs (r0 = 0)
- 16 64-bit (32 32-bit) FPRs
- FP status register
- No condition codes
- Data transfer
- load/store word, load/store byte/half word (signed?)
- load/store FP single/double
- moves between GPRs and FPRs
- ALU
- add/subtract (signed? immediate?)
- multiply/divide (signed?)
- and, or, xor (immediate?), shifts: ll, rl, ra (immediate?)
- sets (immediate?)
There's also MIPS 64, the current architecture:
- Standard data types
- 4 fixed-length formats (8, 16, 32, 64)
- 32 64-bit GPRs (r0 = 0)
- 64 64-bit FPRs
55The MIPS Architecture
- Control
- branches = 0, <> 0
- conditional branch testing FP bit
- jump, jump register
- jump link, jump link register
- trap, return-from-exception
- Floating Point
- add/sub/mul/div
- single/double
- fp converts, fp set
56The DLX Architecture
The DLX is a RISC processor architecture designed
by the principal designers of the MIPS and the
Berkeley RISC designs, the two benchmark examples
of RISC design. The DLX is essentially a cleaned
up and simplified MIPS with a simple 32-bit
load/store architecture. Intended primarily for
teaching purposes, the DLX design is widely used
in university-level computer architecture courses.
The next couple of lectures will use the MIPS and
DLX architectures as examples to demonstrate
concepts.
57End of Lecture