Title: Lecture 4 Instruction Set Architecture
1Lecture 4: Instruction Set Architecture
2Instruction Set Architecture
- 1950s to 1960s Computer Architecture Course: Computer Arithmetic
- 1970 to mid 1980s Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
- 1990s Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors
3Languages of Computers
- Machine Language
- Programs consist of machine instructions
- Directly executable without preprocessing
- Direct manipulation of machine registers
- Efficient in view of machine resource utilization
- Difficult to program
- Assembly language
- Improved version of machine language with emphasis on user-friendliness
- Symbolic machine language (symbols for operations and addresses)
- Assembler is needed to translate into a machine language program
- High-Level Language
- Programs consist of statements, each of which can be translated into several machine language instructions
- Need a compiler to translate into a machine language program
- Relatively easy to program compared to ML or AL
- Hardware resource utilization may be inefficient
4Semantic Gap Between ML and HLL
- As Hardware cost goes down, Software cost goes up
- Shortage of programmers
- Unreliable Software => Unreliable Computers
- Response: Keep the programming cost down
- Develop powerful, complex, user-friendly HLL
- HLL programmers are easy to train
- Greater Semantic Gap between HLL and Machine Language
- Execution inefficiency
- Software complexity
- Compiler complexity
- To offset the semantic gap
- Large instruction set
- Variety of addressing modes
- Hardware/Firmware implementation of HLL primitives
5Instruction Set
- Boundary between designers (architects) and programmers
- For designers: Specification of the function of the CPU
- For programmers: A pool of functions from which they choose to use in the program
"One would expect that human language should directly reflect the characteristics of human intellectual capabilities, that language should be a direct mirror of mind in ways which other systems of knowledge and belief cannot." - Noam Chomsky
- Instruction Set
- Language of a machine
- Characterizes the machine's capability and behavior
- Performance Issues
- Memory Bandwidth (MB) is used 1/2 for Instructions and 1/2 for Data
- For efficient utilization of MB, instruction representation must be as compact as possible whilst still being compatible with data
- von Neumann Bottleneck exists in MB
6Memory Bandwidth Issue
- Memory Bandwidth is used by CPU and I/O
- Memory Bandwidth given to CPU is used for Instruction Fetches and Operand Fetches or Operand Stores
- Consider an AC-machine: ADD X, or LDA X
7Machine Language
- Machine Language
- Vocabulary
- Operations
- Addressing Modes for operand addresses and the next instruction address
- Syntax
- Methods of representing operation (OP-code), operands, addresses in an instruction
- Instruction format
- Encoding of Instruction fields
- Grammar
- Rules of using instructions to make a program
8Components of an Instruction
- Operation Code(OP-code)
- Format specifier
- Long / Short
- Field definition
- Operation
- Types of operands
- Operand Address(es)
- Operand itself
- Addresses themselves (including abbreviated)
- Address modification specification
- Automatic indexing
- Relative address
- Sequencing
9Instruction Set and Computer Architecture
- Computer Architectures are classified into three classes according to the Register Structures for operand storage
- Stack Computer Architecture
- Accumulator (AC) Computer Architecture
- General Purpose Register Computer Architecture
10Stack Computer Architecture
Instruction: Operation
- PUSH X: if F=1, then S overflow
- POP X: if E=1, then empty S
- Unary Instr. (Shift Left): if E=1, then empty S
- Binary Instr. (ADD): if E=1, then empty S; if SP=(n-1), ...
11Characteristics of the Stack Architecture
- Instruction length is short
- No need to represent the address(es) of operand(s) in functional instructions
- Instruction execution time is fast
- Operand(s) access is fast because they are in the stack (register)
- Operand(s) must be stored in the stack before operating on them
- Inconvenient to prepare data in the stack
- Frequent use of PUSH and POP instructions to prepare data in the stack, which requires memory accesses
12AC Computer Architecture
(CPA)
(ADD X)
Transfer Instruction
- Characteristics
- Instruction execution time of binary instructions is slow
- One of the operands must be read from memory
- Instruction length is longer than in the stack architecture
- The memory address of one of the operands must be specified in the instruction, although AC (a data register) can be implied
- Frequency of LDA/STA instructions is high
- There is only one data register
13GPR Computer Architecture
Unary Instruction
Binary Instruction
Transfer Instruction
Characteristics
- Instruction length is short because register addresses are used for operands
- Instruction execution time is fast because all the operands are in the registers
- Frequency of using LD/ST instructions depends on the number of registers
- Opportunities for storing the results of operations in GPRs are high because there are many registers
14Computer Architecture?
- ". . . the attributes of a computing system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." - Amdahl, Blaauw, and Brooks, 1964
SOFTWARE
15Towards Evaluation of ISA and Organization
16Interface Design
- A Good Interface
- Lasts through many implementations (portability, compatibility)
- Is used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels
17Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
18Evolution of Instruction Sets
- Major advances in computer architecture are typically associated with landmark instruction set designs
- Ex: Stack (B1700) vs GPR (System/360)
- Design decisions must take into account
- technology(component)
- machine organization
- programming languages
- compiler technology
- operating systems
- And they in turn influence these
19Design Space of ISA
- Five Primary Dimensions
- Number of explicit operands ( 0, 1, 2, 3 )
- Operand Storage: Where besides memory?
- Effective Address: How is memory location specified?
- Type and Size of Operands: byte, int, float, vector, . . .
- How is it specified?
- Operations: add, sub, mul, . . .
- How is it specified?
- Other Aspects
- Successor: How is it specified?
- Conditions: How are they determined?
- Encoding: Fixed or variable? Wide?
- Parallelism
20Number of Explicit Operands
- To optimize the memory bandwidth required by instructions (for fetching from Memory), the number of explicitly specified operands in the instruction needs to be reduced
- 2 operands (GPR machine)
- 2 source operands (1 of the source operands is destroyed after execution to store the result)
- 1 operand (AC machine)
- 1 of the operands is implied to be a specific hardware register called the Accumulator (AC) (the result of the execution is also stored in this register)
- 0 operands (Stack machine)
- Both of the operands and the result are implied to be on a stack
21Operand Storage
- Storage
- Memory
- - Long memory addressing
- - Need to represent the address with a few bits
- Relative addressing with displacement
- Page/Segment addressing
- Register
- - General purpose register
- Short register addressing
- - AC
- Stack(register)
- - Does not need addresses
22Address Space and Storage Space
- Address Space
- Consists of addresses that programmers can use
- Storage Space
- Consists of physical storage locations
- For a simple low-cost machine, the Address Space and the Storage Space are identical
- Programmers program with the actual storage addresses
- Modern computers provide storage systems with independent Address and Storage Spaces
- An Effective Address (EA) needs to be obtained from the Address used in the program to access the operand from the memory
- Usually the Address Space is much larger than the Storage Space
- Virtual Storage System
23Effective Address
- Address and Physical Storage Location are two different concepts.
- Addresses of Operands are represented or implied in the instruction.
- Operand's address needs to be mapped into an Effective Address of the physical storage location
Basic Addressing Modes (A or R in instructions)
Mode: Algorithm; Advantage; Disadvantage
- Immediate: opd = A; no M refer; limited value
- Direct: EA = A; simple; limited addr space
- Indirect: EA = M[A]; large addr space; multiple M refer
- Register: EA = R; no M refer; limited addr space
- R Indirect: EA = M[R]; large addr space; extra M refer
- Displacement: EA = A + R; flexibility; complexity
- Stack: opd = S[TOP]; no M refer; limited applications
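The addressing modes listed above can be sketched as a small operand-fetch routine. This is only an illustration under stated assumptions: memory is modelled as a Python dict, registers as a list, and the function name `fetch_operand` and the mode names are hypothetical. The second return value counts the memory references needed to obtain the operand.

```python
def fetch_operand(mode, field, memory, registers, stack=None):
    """Return (operand, memory_references) for one addressing mode.

    `field` is the A (address/constant) or R (register number) field of the
    instruction; for displacement mode it is the pair (A, R).
    All names here are illustrative assumptions, not part of the lecture.
    """
    if mode == "immediate":            # operand is the field itself
        return field, 0
    if mode == "direct":               # field is the operand's memory address
        return memory[field], 1
    if mode == "indirect":             # field points to a word holding the address
        return memory[memory[field]], 2
    if mode == "register":             # operand is in register R
        return registers[field], 0
    if mode == "register_indirect":    # register R holds the operand's address
        return memory[registers[field]], 1
    if mode == "displacement":         # address is A plus the contents of R
        a, r = field
        return memory[a + registers[r]], 1
    if mode == "stack":                # operand is the top of the stack
        return stack[-1], 0
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state used only for this example.
memory = {100: 200, 200: 42, 205: 7}
registers = [0, 200, 5]

print(fetch_operand("direct", 100, memory, registers))               # (200, 1)
print(fetch_operand("indirect", 100, memory, registers))             # (42, 2)
print(fetch_operand("displacement", (200, 2), memory, registers))    # (7, 1)
```

The reference counts mirror the trade-offs in the list: indirect addressing reaches a large address space at the cost of an extra memory reference, while register and immediate operands need none.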
24Specification of Type and Size of Operand
- Specification of the Type of the operand
- Usually different op-codes for different types of operands
- Specification of the Size of the operand
- op-code represents the resolution of the operand address
- bit, byte, half word (upper/lower half), word, ...
- Length of operands
- Implicit
- Variable length
- Specified explicitly in the instruction
- Specified by a designated register
- Specified by the delimiter marks in the operand
- reserved-bit delimiter(field or word mark)
- reserved-bit configuration(record or group mark)
26Operation
- Specification
- Encoded (reason: to reduce the instruction length)
- Types
- Minimal Instruction Set
- Complex Instruction Set vs RISC
27Four Types of Operations
- Functional
- ADD, AND, CPA, CPC, ROL, CLA, CLC, INC,
- Transfer
- LDA, STA(LD, ST),
- Control
- JMP, JNA, JZA, JZC(SMA, SZA, SZC),
- Input/Output
- INP, OUT,
28Minimal Instruction Set
29Why NOT Use a Minimal Instruction Set?
- Inefficient: program size (M bandwidth), large IC and CPI
- Programming difficulty
30Instruction Set Design: Operations to Include in the Instruction Set
- Trade-off: 3 Es (Elegance, Efficiency, Environment)
- Elegance
- Completeness (Even Bn instruction is complete)
- Symmetry: AC <- f(AC, M[X]) and M[X] <- f(AC, M[X])
- Flexibility, Generality
- Efficiency
- Space
- Bit budget
- Efficient specification of address
- Fewer instructions require fewer bits to encode OP-code
- Frequency of use arguments
- Bandwidth arguments (NOP simply wastes memory bandwidth)
- Ratio of overheads: non-functional to functional
- Environment
- Multiprogramming (Relocation, Protection, Sharing)
- Code generation by compilers (compilers favor only a small portion of the instruction set)
31ISA Metrics
- Aesthetics
- Orthogonal
- No special registers, few special cases, all operand modes available with any data type or instruction type
- Completeness
- Support for a wide range of operations and target applications
- Regularity
- No overloading for the meanings of instruction fields
- Streamlined
- Resource needs easily determined
- Ease of compilation (programming?)
- Ease of implementation
- Scalability
32Powerful Instruction
Overhead for Execution(O)
(E)
- Rich, Powerful Instruction
- Instruction with longer Execution Time (E) to balance the overhead penalty (O)
- Instruction which has a large E/O
33Powerful Instructions
- Extended Arithmetic Function
- Multiply, divide, Trigonometric Functions, etc.
- Automatic Indexing
- BCT R1, addr (R1 <- R1 - 1; if R1 != 0 then PC <- addr)
- BXLE R1, R3, addr (R1 <- R1 + R3; if R3 is odd and R1 <= R3, then PC <- addr; if R3 is even and R1 <= R(3+1), then PC <- addr)
- Subroutine Linkage
- JMS X (M[X] <- PC, PC <- X+1)
34Powerful Instructions
- Process State Exchange (Context Switch)
- Instructions required in multiprogramming environments
- XJ (Exchange Jump of CDC 6000 series)
- Otherwise: LD R1, addr; LD R2, addr+1; ...; LD R5, addr+4
35Basic ISA Classes: Type of Internal Storage
36Stack Machines
- Instruction set
- Arithmetic operators (+, -, *, /, . . .)
- push A, pop A
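A zero-address stack machine of this kind can be sketched in a few lines. The interpreter below is a minimal illustration, assuming memory is a dict keyed by variable name and that the opcode names `add`, `sub`, `mul`, `div` stand for the arithmetic operators; none of these names come from the lecture.

```python
def run_stack_program(program, memory):
    """Execute a list of (opcode, operand) pairs on a zero-address stack machine."""
    stack = []
    binary_ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }
    for opcode, operand in program:
        if opcode == "push":          # push A: copy M[A] onto the stack
            stack.append(memory[operand])
        elif opcode == "pop":         # pop A: store the top of the stack into M[A]
            memory[operand] = stack.pop()
        elif opcode in binary_ops:    # both operands and the result are implied
            b = stack.pop()
            a = stack.pop()
            stack.append(binary_ops[opcode](a, b))
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


# A = B + C * D, with hypothetical values for B, C and D.
memory = {"A": 0, "B": 2, "C": 3, "D": 4}
program = [
    ("push", "B"), ("push", "C"), ("push", "D"),
    ("mul", None), ("add", None), ("pop", "A"),
]
run_stack_program(program, memory)
print(memory["A"])  # 14
```

Only `push` and `pop` carry an address; the arithmetic instructions need no operand field at all, which is why functional instructions on a stack machine are short.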
37The Case Against Stacks
- Performance is derived from the existence of several fast registers, not from the way they are organized
- Data does not always surface when needed
- Constants, repeated operands, common sub-expressions
- so TOP and Swap instructions are required
- Code density is about equal to that of GPR instruction sets
- Registers have short addresses
- Keep things in registers and reuse them
- Slightly simpler to write a poor compiler, but not an optimizing compiler
39GPR Machines
GPR(General Purpose Register)
- Faster than memory
- Easier for a compiler to use
- Used to hold variables, intermediate operands
- the memory traffic reduces
- the code density improves
- How many registers?
- depends on how they are used by the compiler
40How Many Registers in RF
6 algorithms from CALGO (ACM) written in 4 languages: ALGOL, BASIC, BLISS, FORTRAN
We need to try to keep the live registers in the RF
41GPR Machines
- Maximum number of operands(O)
- two or three operands
- Number of memory addresses(M)
- 0,1,2,3
42GPR Machines
Type: Register-register (0,3)
- Advantages: Simple, fixed-length instr. encoding. Simple code generation model.
- Disadvantages: Higher instruction count. Some instructions are short and bit encoding may be wasteful.
Type: Register-memory (1,2)
- Advantages: Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density.
- Disadvantages: A source operand is destroyed. Clocks per instruction varies by operand location.
Type: Memory-memory (3,3)
- Advantages: Program becomes most compact. No waste of registers for temporaries.
- Disadvantages: Large variation in instruction sizes and in work per instruction. Memory accesses create memory bottleneck.
43R-R vs RM
A+B+C
RR instructions: LD R1,A; LD R2,B; LD R3,C; ADD R4,R1,R2; ADD R5,R4,R3
RM instructions: LD R1,A; ADD R1,B; ADD R1,C
RM instructions reduce IC
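The comparison can be checked mechanically. The sketch below encodes both instruction sequences as tuples and counts instructions and memory operand references; the tuple encoding and the helper names `is_register` and `summarize` are assumptions made for this illustration.

```python
# Each instruction is (opcode, destination, *sources). Operands that are not
# register names (R1, R2, ...) are treated as memory operands.
RR_PROGRAM = [
    ("LD", "R1", "A"),
    ("LD", "R2", "B"),
    ("LD", "R3", "C"),
    ("ADD", "R4", "R1", "R2"),
    ("ADD", "R5", "R4", "R3"),
]
RM_PROGRAM = [
    ("LD", "R1", "A"),
    ("ADD", "R1", "B"),   # R1 <- R1 + M[B]
    ("ADD", "R1", "C"),   # R1 <- R1 + M[C]
]


def is_register(name):
    """True for names of the form R<number>."""
    return name.startswith("R") and name[1:].isdigit()


def summarize(program):
    """Return (instruction count, number of memory operand references)."""
    memory_refs = sum(
        1
        for instruction in program
        for operand in instruction[1:]
        if not is_register(operand)
    )
    return len(program), memory_refs


print(summarize(RR_PROGRAM))  # (5, 3)
print(summarize(RM_PROGRAM))  # (3, 3)
```

Both versions read A, B and C from memory once; the register-memory version simply folds two of the loads into the ADD instructions, which is where the lower instruction count comes from.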
44What About Actual Programs
- Consider a GPR machine with a large register file.
- Highly probable that the intermediate data can be found in a register
- Thus, LD/ST instructions will be used less frequently
- However, the frequency of using LD/ST instructions in computers that use RM instructions will be reduced further
45VAX-11
Variable format, 2- and 3-address instructions
- 32-bit word size, 16 GPR (4 reserved)
- Rich set of addressing modes (apply to any operand)
- Rich set of operations
- bit field, stack, call, case, loop, string, poly, system
- Rich set of data types (B, W, L, Q, O, F, D, G, H)
- Condition codes
46Kinds of Addressing Modes
Addressing Mode: Operand value
- Register direct: Ri
- Immediate (literal): v
- Direct (absolute): M[v]
- Register indirect: M[Ri]
- Base+Displacement: M[Ri + v]
- Base+Index: M[Ri + Rj]
- Scaled Index: M[Ri + Rj*d + v], e.g. d = 8
- Autoincrement: M[Ri], then Ri <- Ri + 1
- Autodecrement: Ri <- Ri - 1, then M[Ri]
- Memory Indirect: M[M[Ri]]
- Indirection Chains
47Memory Addressing Modes (VAX)
48Operand Address bits: Displacement Values
- This value is related to the operand address field when the address is represented by the displacement from the base address
- Wide distribution
- The vast majority: positive
- A majority of the large displacements: negative
49Operand Address bits Immediate Addressing Mode
Percentage of operations that use immediates
50Operand Address bits Immediate Addressing Mode
51Operations in the Instr. Set
Operator type: Examples
- Arithmetic and logical: Add, Subtract
- Data transfer: Load, Store, Move
- Control: Branch, Jump, Procedure Call, Return, Trap
- System: Operating System Call, VMM instructions
- Floating Point: Floating Point Add
- Decimal: Decimal Add, Decimal-to-Character Conversion
- String: String Move, String Compare, String Search
- Graphics: Pixel operations, Compress/Decompress op.
52Operations in the Instr. Set
Rank: 80x86 instruction; Integer average (% total executed)
- 1: load; 22
- 2: conditional branch; 20
- 3: compare; 16
- 4: store; 12
- 5: add; 8
- 6: and; 6
- 7: sub; 5
- 8: move reg-reg; 4
- 9: call; 1
- 10: return; 1
- Total: 96
53Control Flow Instructions
55RISC
56Instruction Execution Characteristics: Type of Operations
Relative Dynamic Frequencies of statements in HLL programs
- What type of statements is most frequent?
- Assignment statements dominate
- Functional instructions and Transfer instructions
- Movements of data must be made simple, thus fast
- Conditional Statements (if and loop together)
- Instructions with Control function
- Sequence control mechanism is important
57Instruction Execution Characteristics: Time Consumed by Statements
- Machine instruction weighted: Average No. of machine Instr. / Statement x Frequency of Occurrences
- Memory reference weighted: Average No. of memory references / Statement x Frequency of Occurrences
- Most time consuming statement is procedure CALL/RETURN
58Instruction Execution Characteristics: Type of Operands
- Majority of references are to scalars
- 80% are local to a procedure
- References to arrays/structures require index or pointer
- Locations of operands (Average per instruction)
- 0.5 operands in memory
- 1.4 operands in registers
59Instruction Execution Characteristics: Procedure Calls
- Two most significant aspects in implementing procedure Call/Returns
- Number of parameters
- Depth of nesting
- Statistics on Number of Parameters
- 98% of dynamically called procedures were passed fewer than 6 parameters
- 92% of them used fewer than 6 local scalar variables
60Multiple Register Sets
Multiple register sets
- Assume that we have several sets of registers so that each set can be used by a different procedure
- Saves some time in procedure CALL/RETURN simply by changing the R set pointer value
61Instruction Execution Characteristics: Depth of Procedure Nesting
Procedure Nesting and Register Set Window (figure: nesting depth vs. time t)
Shifting the register set window: need to save the information in one register set in the memory so that a register set can be used by the new procedure
Statistics: A window depth of 8 will need to shift on less than 1% of calls and returns
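How often the window has to be shifted depends on how far the nesting depth wanders, not on how deep it is in absolute terms. The sketch below counts shifts for a given call/return trace; the function name `count_window_shifts`, the event strings and the example trace are hypothetical, and the model (spill the oldest set on overflow, restore one set on underflow) is a simplifying assumption.

```python
def count_window_shifts(trace, num_windows):
    """Count calls/returns that force a register set to be saved or restored.

    `trace` is a sequence of "call" / "return" events. Register sets for
    nesting depths low..depth are assumed to be resident in the register file.
    """
    depth = 0    # current nesting depth
    low = 0      # shallowest depth whose register set is still resident
    shifts = 0
    for event in trace:
        if event == "call":
            depth += 1
            if depth - low + 1 > num_windows:   # overflow: spill the oldest set
                low += 1
                shifts += 1
        elif event == "return":
            depth -= 1
            if depth < low:                     # underflow: restore a saved set
                low = depth
                shifts += 1
        else:
            raise ValueError(f"unknown event: {event}")
    return shifts


# Hypothetical trace: three nested calls followed by three returns.
trace = ["call"] * 3 + ["return"] * 3
print(count_window_shifts(trace, num_windows=2))  # 4
print(count_window_shifts(trace, num_windows=8))  # 0
```

With enough register sets to cover the typical fluctuation in depth, almost no call or return touches memory, which is the point of the statistic quoted above.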
62RISC Philosophy (1): Make the Most Frequent Statements Execute Fast
Most frequent statements are Assignment-type statements, and each of them is translated by the compiler into a set of Functional Instructions and/or Transfer Instructions. Thus Functional and Transfer Instructions need to be made to execute fast.
Instruction Cycle of Functional Instruction or Transfer Instruction
63Assignment Statements
- To make the Instruction Fetch fast
- Short OP-code part: Small number of instructions in the instruction set
- Short Operand Address part: Keep the operands in the registers instead of M
- To make the Instruction Preparation fast
- Fixed length instruction
- Fixed format instruction
- Simple addressing modes
- To make the Operand Fetch fast
- Make the operands available from registers instead of memory
- Needs a large register file
- To make the Instruction Execution fast
- Multiple register set: Overlapping MRS
- Instruction execution pipeline
64RISC Philosophy (2): Make the Most Time-Consuming Statements Execute Fast
- Methods of passing Parameters
- Through memory
- Parameters are stored in the memory locations which are commonly accessible by both calling and called procedures
- Execution of CALL and RETURN instructions is very slow due to the memory accesses, especially when there are many parameters to pass
- Through registers
- Parameters are stored in the registers in CPU
- Calling procedure needs to save the registers, which are not used for passing parameters, in the memory. This results in a lot of memory accesses and makes the execution times of these instructions slow.
65CISC and RISC
- RISC
- A limited and simple instruction set
- A large number of GPR(Register File)
- An emphasis on optimizing the instruction
pipeline
66Large Register File
Quick access to operands is desirable
- Assignment Statements rely on Functional and Transfer Instructions
- Functional Instructions heavily rely on registers
- Frequency of Transfer Instructions depends on the number of registers in the register file
If the number of registers is small, a strategy is needed to keep the most frequently accessed operands in registers to minimize Register-Memory traffic
- Software approach: Maximize register usage by compiler (requires sophisticated program analysis)
- Hardware approach: More registers in the register file
67Register Window
- Fact
- Statistically, most operand references are to local scalars (80%)
- Local variables of a procedure cannot be accessed by other procedure(s)
- Problem
- Locals change with each procedure CALL/RETURN
- CALL/RETURN occurs frequently
- Parameters need to be passed around
- Observations
- Statistically, a few parameters (<6) and local variables (<6)
- Statistically, depth of procedure activation fluctuates within a relatively narrow range (<8)
- Solution
- Multiple small sets of registers
- Each set is assigned to a different procedure
- Windows for adjacent procedures overlap to allow parameter passing
68Multiple Register Set
- Each Register Set is assigned to a different procedure
- Size of a Register Set is equal to the size of a window
- Parameters need to be copied into the called/calling procedure's Register Set; however, there is no need to copy all the registers from the switched-off register set
- Requires register move instructions
69Overlapping Register Window
When multiple Register Sets are implemented in a large Register File, we call a Register Set a Register Window. Multiple register sets still require copying the parameter values between register sets.
Overlapping Register Window
- Portions of register windows overlap for passing parameters
- At any time only one window is visible
- No need for moving information for parameter passing
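The overlap can be expressed as a mapping from (window, logical register) to a physical register in a circular register file. The sketch below is one possible layout, not a description of any particular processor: the function name `physical_register` and the sizes (8 windows, 6 shared parameter registers, 10 locals) are assumptions chosen for illustration.

```python
def physical_register(window, logical, num_windows=8, num_shared=6, num_local=10):
    """Map a logical register of one window to a physical register index.

    Assumed window layout (logical numbering):
      0 .. num_shared-1                      incoming parameters
      num_shared .. num_shared+num_local-1   locals
      the next num_shared registers          outgoing parameters
    The outgoing registers of window w are physically the incoming registers
    of window w+1, and the register file wraps around circularly.
    """
    window_size = 2 * num_shared + num_local
    if not 0 <= logical < window_size:
        raise ValueError("logical register out of range")
    stride = num_shared + num_local          # physical registers added per window
    return (window * stride + logical) % (num_windows * stride)


# The caller (window 0) writes its first outgoing parameter register ...
caller_out0 = physical_register(0, 6 + 10)
# ... and the callee (window 1) reads it as its first incoming register.
callee_in0 = physical_register(1, 0)
print(caller_out0, callee_in0)  # 16 16
```

Because both logical names resolve to the same physical register, a CALL only has to advance the window pointer; no parameter values are moved.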
How about global variables?
70Global Variables
- Global Variables are commonly accessible by all the procedures
- Assign to memory locations by compiler
- Straightforward but inefficient for the frequently accessed global variables because of frequent memory accesses
- Set aside a set of Global Variable registers
- Available to all procedures
- Unified register numbering system to simplify instruction format
- e.g. R0 - R7: Global; R8 - R13: Current window
71Linear Organization of Register Windows
75Code Size
- Smaller programs
- Program takes less memory space
- Smaller program improves performance
- Fewer instructions
- Fewer bytes to fetch
- In a paging environment, occupies fewer pages and reduces page faults
- CISC
- Smaller number of instructions in the program (program may be shorter but does not necessarily take less space)
76Example
CISC
Memory Traffic: Instruction 56 bits; Data 32 x 3 = 96 bits; Total MB used 56 + 96 = 152 bits
RISC
LD Rb, B
LD Rc, C
ADD Ra, Rb, Rc
ST Ra, A
Memory Traffic: Instruction 104 bits; Data 96 bits; Total MB used 104 + 96 = 200 bits
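The bit counts in this example can be reproduced from instruction field widths. The slide does not state the widths, so the sketch below assumes an 8-bit opcode, 16-bit memory addresses and 4-bit register numbers; only the 32-bit operand size comes from the example itself. The helper names are hypothetical.

```python
OPCODE_BITS = 8      # assumption, not stated on the slide
ADDRESS_BITS = 16    # assumption, not stated on the slide
REGISTER_BITS = 4    # assumption, not stated on the slide
OPERAND_BITS = 32    # from the example: 32 x 3 = 96 bits of data


def instruction_bits(num_addresses, num_registers):
    """Length of one instruction with the given numbers of address/register fields."""
    return OPCODE_BITS + num_addresses * ADDRESS_BITS + num_registers * REGISTER_BITS


def memory_traffic(program):
    """program: list of (num_addresses, num_registers, num_memory_operands).

    Returns (instruction bits, data bits, total bits) moved over the memory bus.
    """
    instruction = sum(instruction_bits(a, r) for a, r, _ in program)
    data = sum(m for _, _, m in program) * OPERAND_BITS
    return instruction, data, instruction + data


# One three-address memory-to-memory instruction (A <- B + C).
memory_to_memory = [(3, 0, 3)]
# LD Rb,B; LD Rc,C; ADD Ra,Rb,Rc; ST Ra,A
load_store = [(1, 1, 1), (1, 1, 1), (0, 3, 0), (1, 1, 1)]

print(memory_traffic(memory_to_memory))  # (56, 96, 152)
print(memory_traffic(load_store))        # (104, 96, 200)
```

Under these assumed widths the data traffic is identical in both cases; the difference in total memory bandwidth comes entirely from the longer instruction stream of the load/store sequence.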
77Characteristic of RISC(1, 2)
- (1) 1 Instruction per cycle (memory cycle)
- Machine cycle = IF + IP + Time to fetch the operands from registers + Perform operation + Store the result in a register
- RISC instruction <=> CISC micro-instruction => No need to microprogram (Hardwired control)
- (2) Register-to-Register operation
- With only simple Load and Store operations for accessing memory (Load/Store Arch.)
- Simplifies the instruction set and control unit
78Characteristic of RISC(3, 4)
- (3) Simple Addressing Modes - Shorten EA generation time
- Almost all instructions use register addressing
- Relative addressing using PC, BAR, and Index address
- Other complex modes may be synthesized by software
- (4) Simple Instruction Format - Shorten instruction Decoding Time
- Usually one format
- Fixed length/align on word boundary
- Fixed field length
79Characteristic of RISC(5)
- (5) Pipelining (We will learn this later in detail)
- At this time, you just need to know that
- Instruction execution hardware can be made of a few interconnected independent sub-modules, called pipeline STAGEs
- An instruction's execution progresses through each pipeline stage in sequence
- When an instruction completes its execution at the i-th stage, the next instruction commences its execution at the i-th stage
- Thus, in the ideal situation, throughput increases nearly n times, where n is the number of pipeline stages
- Branch instructions make the pipelined execution inefficient
80Pipelined Execution
Execution of a Sequence of Instructions (figure: instructions I0 to I4 flowing through the pipeline stages, the last stage being S3)
- 1 instruction execution: I0 completes at 4t
- I1 completes at 5t, I2 at 6t, I3 at 7t, I4 at 8t
- n instructions complete at (n+3)t
- When n is large, it becomes nt
- Thus, 1 instruction in every t
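The timing above generalizes to (n + k - 1)t for n instructions on a k-stage pipeline. The sketch below computes it and the resulting speedup over unpipelined execution, assuming an ideal pipeline with no branches or stalls; the function names are hypothetical.

```python
def pipelined_time(num_instructions, num_stages=4, stage_time=1):
    """Completion time of n instructions on an ideal k-stage pipeline: (n + k - 1) * t."""
    if num_instructions == 0:
        return 0
    return (num_instructions + num_stages - 1) * stage_time


def unpipelined_time(num_instructions, num_stages=4, stage_time=1):
    """Each instruction occupies all k stages before the next one starts."""
    return num_instructions * num_stages * stage_time


def speedup(num_instructions, num_stages=4):
    """Ratio of unpipelined to pipelined completion time."""
    return unpipelined_time(num_instructions, num_stages) / pipelined_time(
        num_instructions, num_stages
    )


print(pipelined_time(1))   # 4  (I0 completes at 4t)
print(pipelined_time(5))   # 8  (I4 completes at 8t)
print(speedup(1000))       # close to 4, the number of stages
```

For large n the fill time of k - 1 cycles becomes negligible, so the speedup approaches the number of stages, which is the "nearly n times" throughput claim made for the ideal case.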
81A "Typical" RISC
- 32-bit fixed format instruction (3 formats)
- 32 32-bit GPR (R0 contains zero, DP takes a pair)
- 3-address, R-R functional instruction
- Single address mode for load/store: base + displacement
- no indirection
- Simple branch conditions
- Delayed branch
see SPARC, MIPS, MC88100, AMD2900, i960, i860, PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
82Branch Displacement
83Implementation of Conditional Branch Instructions
Evaluating branch conditions
Name: Condition Code (CC)
- How condition is tested: Test special bits set by ALU operations, possibly under program control.
- Advantages: Sometimes condition is set for free; if not, 2 instrs for a branch.
- Disadvantages: CC is an extra state. CCs constrain the ordering of instrs since they pass info from one instr to a branch.
Name: Condition Register
- How condition is tested: Test arbitrary registers set by the result of a comparison.
- Advantages: Simple. 2 instrs for a branch.
- Disadvantages: Uses up a register.
Name: Compare and Branch
- How condition is tested: Compare is part of the branch. Often compare is limited to subset.
- Advantages: 1 instr. rather than 2 for a branch.
- Disadvantages: May be too much work per instruction.
84Putting It All Together DLX Architecture
- Read Section 2.8 ---- MUST
- DLX emphasizes
- A simple load-store instruction set
- Design for pipelining efficiency
- A fixed instr. set encoding
- Efficiency as a compiler target
85Example MIPS
86The Different Goals for VAX and MIPS
- VAX - simple compilers and code density
- powerful addressing modes
- powerful instructions
- efficient instruction encoding
- few registers
- MIPS - high performance via pipelining, ease of HW implementation, compatibility with highly optimizing compilers
- simple instructions
- simple addressing modes
- fixed-length instruction formats
- a large number of registers
87VAX vs. MIPS
88Fallacies and Pitfalls
- Pitfall: Designing a high-level instruction set feature specifically oriented to supporting a high-level language structure.
- Fallacy: There is such a thing as a typical program.
- Fallacy: An architecture with flaws cannot be successful.
- 80x86 supports segmentation while others support paging
- Extended AC for integer, while others use GPR
- Stack for FP operations, while others abandoned stack
- Fallacy: You can design a flawless architecture.
- All architecture design involves trade-offs made in the context of a set of HW and SW technologies.
89Most Popular ISA of All Time: Intel 80x86
- 1971: Intel invents microprocessor 4004/8008, 8080 in 1975
- 1975: Gordon Moore realized one more chance for new ISA before ISA locked in for decades
- Hired CS people in Oregon
- Weren't ready in 1977 (CS people did 432 in 1980)
- Started crash effort for 16-bit microcomputer
- 1978: 8086 dedicated registers, segmented address, 16-bit
- 8088: 8-bit external bus version of 8086, added as afterthought
90Most Popular ISA of All Time: Intel 80x86
- 1980: IBM selects 8088 as basis for IBM PC
- 1980: 8087 floating point coprocessor adds 60 instructions using hybrid stack/register scheme
- 1982: 80286 24-bit address, protection, memory mapping
- 1985: 80386 32-bit address, 32-bit GP registers, paging
- 1989: 80486, Pentium in 1992: faster + MP, few instructions
9180x86 Addressing/Protection
9280x86 Instruction Format
- 8086 in blue 80386 extensions in red
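9380x86 Instruction Encoding: Address Specifier Field (Mod, Reg, R/M)
Register field (r): w=0; w=1 (16b); w=1 (32b)
- 0: AL; AX; EAX
- 1: CL; CX; ECX
- 2: DL; DX; EDX
- 3: BL; BX; EBX
- 4: AH; SP; ESP
- 5: CH; BP; EBP
- 6: DH; SI; ESI
- 7: BH; DI; EDI
r/m field with mod=0: 16b; 32b
- 0: addr=BX+SI; addr=EAX
- 1: addr=BX+DI; addr=ECX
- 2: addr=BP+SI; addr=EDX
- 3: addr=BP+DI; addr=EBX
- 4: addr=SI; addr=(sib)
- 5: addr=DI; addr=d32
- 6: addr=d16; addr=ESI
- 7: addr=BX; addr=EDI
r/m field with mod=1: 16b; 32b
- 0 to 3: same addr as mod=0, +d8; same addr as mod=0, +d8
- 4: SI+d8; (sib)+d8
- 5: DI+d8; EBP+d8
- 6: BP+d8; ESI+d8
- 7: BX+d8; EDI+d8
r/m field with mod=2: 16b; 32b
- 0 to 3: same addr as mod=0, +d16; same addr as mod=0, +d32
- 4: SI+d16; (sib)+d32
- 5: DI+d16; EBP+d32
- 6: BP+d16; ESI+d32
- 7: BX+d16; EDI+d32
mod=3: same as reg field
Address Specifier: Reg = 3 bits, R/M = 3 bits, Mod = 2 bits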
9480x86 Instruction Encoding: Sc/Index/Base field
Base + Scaled Index Mode: used when mod = 0, 1, 2 in 32-bit mode and r/m = 4
2-bit Scale Field, 3-bit Index Field, 3-bit Base Field
Value: Index; Base
- 0: EAX; EAX
- 1: ECX; ECX
- 2: EDX; EDX
- 3: EBX; EBX
- 4: no index; ESP
- 5: EBP; if mod=0, d32; if mod<>0, EBP
- 6: ESI; ESI
- 7: EDI; EDI
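The two encoding tables can be exercised with a small decoder for 32-bit mode. The sketch below splits the address specifier byte into Mod (bits 7-6), Reg (bits 5-3) and R/M (bits 2-0), and the Sc/Index/Base byte into Scale (bits 7-6), Index (bits 5-3) and Base (bits 2-0), which is the standard 80x86 layout. The function names and the textual output format are assumptions made for this illustration, and displacements are shown symbolically as d8/d32 rather than read from the instruction stream.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_modrm(byte):
    """Split the address specifier byte into its Mod, Reg and R/M fields."""
    return {"mod": (byte >> 6) & 0b11, "reg": (byte >> 3) & 0b111, "rm": byte & 0b111}


def decode_sib(byte):
    """Split the Sc/Index/Base byte; the 2-bit scale field encodes 1, 2, 4 or 8."""
    return {
        "scale": 1 << ((byte >> 6) & 0b11),
        "index": (byte >> 3) & 0b111,
        "base": byte & 0b111,
    }


def describe_operand(modrm_byte, sib_byte=None):
    """Describe the r/m operand of a 32-bit-mode instruction as text."""
    fields = decode_modrm(modrm_byte)
    mod, rm = fields["mod"], fields["rm"]
    if mod == 3:                                  # mod=3: r/m names a register
        return REG32[rm]
    disp = {0: "", 1: "+d8", 2: "+d32"}[mod]
    if rm == 4:                                   # r/m=4: a SIB byte follows
        if sib_byte is None:
            raise ValueError("r/m = 4 requires a SIB byte")
        sib = decode_sib(sib_byte)
        base = "d32" if (sib["base"] == 5 and mod == 0) else REG32[sib["base"]]
        index = "" if sib["index"] == 4 else f"+{REG32[sib['index']]}*{sib['scale']}"
        return f"[{base}{index}{disp}]"
    if rm == 5 and mod == 0:                      # mod=0, r/m=5: direct 32-bit address
        return "[d32]"
    return f"[{REG32[rm]}{disp}]"


print(describe_operand(0x44, 0x88))  # [EAX+ECX*4+d8]
print(describe_operand(0x05))        # [d32]
print(describe_operand(0xC3))        # EBX
```

The special cases in the code are exactly the irregular table entries: r/m = 4 escapes to the Sc/Index/Base byte, index 4 means no index, and code 5 means a bare 32-bit displacement when mod = 0.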
9580x86 Addressing Mode Usage for 32-bit Mode
Addressing mode: usage (%) in four programs; average (%)
- Register indirect: 10, 10, 6, 2; 7
- Base + 8-bit disp: 46, 43, 32, 4; 31
- Base + 32-bit disp: 2, 0, 24, 10; 9
- Indexed: 1, 0, 1, 0; 1
- Based indexed + 8b disp: 0, 0, 4, 0; 1
- Based indexed + 32b disp: 0, 0, 0, 0; 0
- Base + Scaled Indexed: 12, 31, 9, 0; 13
- Base + Scaled Index + 8b disp: 2, 1, 2, 0; 1
- Base + Scaled Index + 32b disp: 6, 2, 2, 33; 11
- 32-bit Direct: 19, 12, 20, 51; 26
9680x86 Length Distribution
97Instruction Counts 80x86 vs. DLX
Program: 80x86 instruction count; DLX instruction count; ratio DLX/80x86
- gcc: 3,771,327,742; 3,892,063,460; 1.03
- espresso: 2,216,423,413; 2,801,294,286; 1.26
- spice: 15,257,026,309; 16,965,928,788; 1.11
- nasa7: 15,603,040,963; 6,118,740,321; 0.39
98Intel Compiler vs. Compilers YOU Can Buy
- 66 MHz Pentium Comparison: SpecInt92; SpecFP92
- Intel Internal Optimizing Compiler: 64.6; 59.7
- Best 486 Compiler (June 1993): 57.6; 39.9
- Typical 486 Compiler in 1990, when Intel started project: 41.0; 32.5
- Integer: Intel 1.1X faster; FP: 1.5X faster
- 486 Comparison: SpecInt92; SpecFP92
- Intel Internal Optimizing Compiler: 35.5; 17.5
- Best 486 Compiler (June 1993): 32.2; 16.0
- Typical 486 Compiler in 1990, when Intel started project: 23.0; 12.8
- Integer: Intel 1.1X faster; FP: 1.1X faster
99Intel Summary
- Archeology: history of instruction design in a single product
- Address size: 16 bit vs. 32-bit
- Protection: Segmentation vs. paged
- Temp. storage: accumulator vs. stack vs. registers
- "Golden Handcuffs" of binary compatibility affect design 20 years later, as Moore predicted
- Not too difficult to make faster, as Intel has shown
- HP/Intel announcement of common future instruction set by 2000 means end of 80x86???
- Beauty is in the eye of the beholder
- At 50M/year sold, it is a beautiful business