COMP 206: Computer Architecture and Implementation - PowerPoint PPT Presentation

About This Presentation

Description:

Title: Lecture 4 Author: Montek Singh Last modified by: UNC-CS Created Date: 3/13/2000 2:52:39 AM Document presentation format: Letter Paper (8.5x11 in) – PowerPoint PPT presentation

Slides: 22

Provided by: Mont139

Learn more at: https://www.cs.unc.edu

Transcript and Presenter's Notes

1
COMP 206: Computer Architecture and Implementation

Montek Singh
Wed., Sep 10, 2003
Topic: Instruction Set Design

2
Organization of an Instruction
3
Classification by Operands

Important machines that are difficult to classify:
Intel 80x86
variable instruction size (1-17 bytes)
memory can be destination
uses implied registers
Motorola 680x0
instruction size 2, 4, 6, 8, or 10 bytes
two-address format only (2, 2)

(m, n) means m memory operands, n total operands
4
Instruction Set Design Objective 1

Code size (code density)
Depends on
size of MM/cache
access time of cache (on-chip/off-chip)
CPU-MM bandwidth
Frequently used (written down) instructions should be short
This implies variable-length instructions
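
As a rough illustration of why short encodings for frequently written instructions improve code density, the sketch below compares the average static instruction length of a variable-length encoding with a fixed 4-byte one. The instruction classes, their static shares, and their byte lengths are invented for illustration; they are not measurements of any real machine.

```python
# Hypothetical static instruction mix:
#   class -> (share of instructions in the program text, length in bytes)
# All numbers are illustrative assumptions, not measurements.
VARIABLE_LENGTH_MIX = {
    "register move/ALU": (0.50, 2),
    "load/store":        (0.30, 4),
    "branch":            (0.15, 2),
    "rare complex op":   (0.05, 8),
}
FIXED_LENGTH_BYTES = 4  # assumed fixed-length encoding for comparison


def average_length(mix):
    """Average bytes per instruction, weighted by static frequency."""
    return sum(share * length for share, length in mix.values())


variable_avg = average_length(VARIABLE_LENGTH_MIX)
print(f"variable-length: {variable_avg:.2f} bytes/instruction")
print(f"fixed-length:    {FIXED_LENGTH_BYTES:.2f} bytes/instruction")
```

With this made-up mix the variable-length encoding is denser because the common classes get the short encodings; the rare 8-byte instruction barely affects the average.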

5
Instruction Set Design Objective 2

Execution speed (performance)
Only frequently executed instructions should be included in the instruction set
Infrequently executed instructions slow down the others
Complex and long instructions tend to be used infrequently
Defining hardware-software interface
Frequently executed instructions should be fast
Pipelining should be made as easy as possible
Overlapped execution lowers the CPI value
Single instruction length, simple instruction formats, and few addressing modes for easy decoding
Three (register) address instructions decouple CPU and memory, and also do not destroy their operands (reducing memory accesses)
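
The point that frequently executed instructions should be fast can be made concrete with an average-CPI calculation. The following is a minimal sketch: the instruction classes, dynamic fractions, and cycle counts are hypothetical, chosen only to show that shaving a cycle off a frequent class lowers the average CPI more than shaving a cycle off a rarer one.

```python
# Hypothetical dynamic instruction mix:
#   class -> (fraction of executed instructions, cycles per instruction)
# The classes and numbers are illustrative assumptions only.
BASELINE = {
    "ALU":    (0.50, 1),
    "load":   (0.25, 2),
    "store":  (0.10, 2),
    "branch": (0.15, 3),
}


def average_cpi(mix):
    """Average cycles per instruction, weighted by dynamic frequency."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


def with_cycles(mix, name, cycles):
    """Return a copy of the mix with one class given a new cycle count."""
    updated = dict(mix)
    updated[name] = (mix[name][0], cycles)
    return updated


base = average_cpi(BASELINE)
faster_load = average_cpi(with_cycles(BASELINE, "load", 1))    # more frequent class
faster_store = average_cpi(with_cycles(BASELINE, "store", 1))  # less frequent class
print(f"baseline CPI:     {base:.2f}")
print(f"loads in 1 cycle: {faster_load:.2f}")
print(f"stores in 1 cycle:{faster_store:.2f}")
```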

6
Instruction Set Design Objective 3

Size and complexity of hardware (ALU, CU)
Implementing infrequently executed instructions ties down hardware that is rarely used and could be used for some other purpose with greater advantage
Some instructions should not be included in the instruction set

7
Instruction Set Design Objective 4

Instruction set as a programming language
Needs of a human programmer (less important today)
Several desirable properties of instruction sets have been recognized and described, such as orthogonality (each operand can be specified independently of the others) and consistency (being able to predict the remainder of an architecture given partial knowledge of the system)
Needs of an optimizing compiler
Simple instructions are more suitable for code optimizations
Optimizing compilers try to find the shortest or fastest code sequence that implements the semantics of an HLL program. To make code reorganization tractable, an instruction set is needed that makes the following easy to calculate or figure out:
the size of each instruction
the execution time of each instruction
the interactions between instructions
ISA features such as complex addressing modes, variable-length instructions, and special-purpose registers provide too many ways of doing the same thing and lead to combinatorial explosion

8
Addressing Modes
R: the register file; M: the memory address space; d: the size of the data item being accessed (1, 2, 4, or 8 bytes)

We can't directly refer to data values, only their addresses
Except for immediate operands
Register deferred and direct addressing modes can be synthesized from displacement addressing mode
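
The last point can be sketched in a few lines. The model below reuses the R and M notation; the function names are hypothetical, and the use of a register that always reads as zero (as in MIPS and DLX) to obtain direct addressing is an assumption, not something stated on the slide.

```python
# Minimal model: R is the register file, M the memory.
# Assumption (not from the slide): register 0 always reads as zero, as in MIPS/DLX.
R = [0] * 8
M = {}  # sparse memory: address -> value


def displacement(base_reg, disp):
    """Displacement mode: the operand is M[R[base_reg] + disp]."""
    return M[R[base_reg] + disp]


def register_deferred(base_reg):
    """Register deferred mode M[R[base_reg]]: displacement with disp = 0."""
    return displacement(base_reg, 0)


def direct(address):
    """Direct mode M[address]: displacement relative to the zero register."""
    return displacement(0, address)


M[100] = 11
M[104] = 22
R[3] = 100
print(displacement(3, 4), register_deferred(3), direct(104))
```

On a real machine the direct form only reaches addresses that fit in the displacement field, so it covers a limited range.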

9
Registers versus Cache

Similarities
Both are small, fast, and expensive (flip-flops)
Both are used to increase execution speed of CPU
Both operate based on locality of reference
Differences
Registers are visible in the ISA; caches are not (except for instructions for invalidation, prefetch, or flushing)
Number of registers is fixed by instruction format; size of cache is easily changeable
Registers have higher BW (3 words/cycle) and are random-access; caches have lower BW (1 word/cycle) and are associative
Register access time is fixed; cache access time is statistical
Register allocation is explicit, by the compiler; cache allocation is automatic
Registers require fewer bits to address; caches require full memory addresses
Registers create no I/O problems; caches do

10
Organization of Registers

One general-purpose set (all interchangeable, typeless)
One general-purpose set (a few with dedicated uses)
PDP-11: eight 16-bit registers (R6 = stack pointer, R7 = PC)
VAX 11/780: sixteen 32-bit registers (four special-purpose, R14 = stack pointer, R15 = PC)
Two sets
Motorola 68000: eight 32-bit data, eight 32-bit address
IBM 370: sixteen 32-bit integer, four 64-bit FP
DLX, MIPS: 31 32-bit integer, 32 32-bit FP
Three sets
CDC 6600: eight 18-bit integer, eight 18-bit address, eight 60-bit FP
Many registers with dedicated use
Intel 80x86

11
Notations for Information Representation
"On Holy Wars and a Plea for Peace", Danny Cohen, IEEE Computer 14(10), pages 49-54, Oct 1981
Q: How do we number these various units of information in a consistent manner?
12
Why Is Numbering Important?

English text is written left-to-right and the characters are numbered left-to-right
The digits of a number can be numbered in two different ways (starting from the most significant or the least significant end)
Memory locations are numbered (addresses)
Consequences of numbering
Data is stored in memory according to byte numbering (the lower-numbered byte goes into a byte in memory with a smaller address)
Data is sent through a bit-serial communication channel according to bit numbering (bit 0 goes first, followed by bit 1, etc.)
When displaying computer representation for humans
Numbers are written in the usual way (MSD on left, LSD on right)
Text is written in such a way as to match the numbering of numbers
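
The effect of byte numbering on memory layout can be seen with Python's built-in `int.to_bytes`. This is only an illustrative sketch; the value `0x12345678` is an arbitrary example, and index 0 of each byte string stands for the smallest memory address.

```python
value = 0x12345678  # arbitrary 32-bit example; its least significant byte is 0x78

# Little Endian: the least significant byte goes to the smallest address.
little = value.to_bytes(4, byteorder="little")
# Big Endian: the most significant byte goes to the smallest address.
big = value.to_bytes(4, byteorder="big")

for name, layout in (("little", little), ("big", big)):
    cells = " ".join(f"[{addr}]={byte:02x}" for addr, byte in enumerate(layout))
    print(f"{name:>6}: {cells}")

# Copying bytes address by address between machines that number bytes
# differently makes the receiver see a byte-swapped number.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))
```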

13
Consequences of Numbering
Machine A: Big Endian numbering
Machine B: Little Endian numbering
[Figure: 16-byte memory rows holding the number 1995 and the string "Washington", shown with their byte addresses]
Addresses f down to 0: n o t g n i h s a W 1 9 9 5
Addresses 0 up to f: 1 9 9 5 W a s h i n g t o n
Addresses f down to 0: n o t g n i h s a W 5 9 9 1
Fix: Complicate the protocol and treat numbers and character strings differently, which has to be done in software (by attaching descriptors to data items)
Addresses f down to 0: 1 9 9 5 W a s h i n g t o n
14
Odds and Ends about Numbering

The Little Endian notation is compatible with mathematical conventions of positional notation
The Little Endian notation has the disadvantage that it displays English text in reverse
To overcome this, manuals for Little Endian machines usually display character strings vertically
Example machines
Little Endian: PDP-11, VAX, 80x86
Big Endian: IBM 370, MIPS, DLX, SPARC
Mixed: Motorola 68000, Z8000
Big Endian byte ordering
Little Endian bit ordering

15
Alignment of Words in Memory

CPU accesses a 32-bit word of data starting at byte address xx00
Such an address (a multiple of 32 b / (8 b/B) = 4 B) is called word-aligned
Memory controller is simple and fast; data is available in one cycle
CPU accesses a 32-bit word of data starting at byte address 01111
Byte addresses are 01111, 10000, 10001, 10010 (misaligned address)
Doubles the access time of the word
Requiring aligned addresses results in a simpler memory controller and faster execution
Costs some loss of storage, and adds complexity in code generators
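
A small sketch of the alignment check, using the two example addresses from this slide. The function names are hypothetical, and the count of "words touched" is a simplified stand-in for the number of memory accesses a simple controller would need.

```python
WORD_BYTES = 4  # a 32-bit word


def is_aligned(address, size=WORD_BYTES):
    """An access is aligned when its address is a multiple of its size."""
    return address % size == 0


def words_touched(address, size=WORD_BYTES):
    """Number of aligned memory words covered by an access of `size` bytes."""
    first_word = address // WORD_BYTES
    last_word = (address + size - 1) // WORD_BYTES
    return last_word - first_word + 1


for address in (0b01100, 0b01111):
    print(f"{address:05b}: aligned={is_aligned(address)}, "
          f"words touched={words_touched(address)}")
```

The access at 01111 covers bytes 01111 through 10010, which straddle two aligned words, so it needs two word accesses instead of one.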

16
Sub-Word Accesses
CPU Register File (32 bits)

Byte operand in register is usually the rightmost byte of the register
Byte may come from any of the four memory banks
Needs routing/permuting hardware
Either at memory side of bus (justified bus)
Byte always travels on rightmost quarter of bus
Or on CPU side (unjustified bus)
Bus lanes are extensions of memory bank lanes
Source of complications in either case
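
The routing step can be modeled in software as a shift and a mask. This is a hedged sketch, not a description of any particular bus: it assumes Little Endian lane numbering (offset 0 within the word is the rightmost lane), and `load_byte` is a hypothetical name.

```python
WORD_BYTES = 4


def load_byte(word, address):
    """Return the addressed byte, right-justified in a 32-bit register value.

    Assumption: Little Endian lane numbering, so the byte at offset 0 within
    the word sits in the rightmost (least significant) lane.
    """
    lane = address % WORD_BYTES         # which memory bank / bus lane holds the byte
    return (word >> (8 * lane)) & 0xFF  # route that lane to the rightmost byte


word = 0xAABBCCDD
print([hex(load_byte(word, a)) for a in range(4)])
```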

17
Control Transfer Instructions

Terminology
BTA (Branch Target Address): the destination address of the branch
The BTA is static if it is always the same during execution
The BTA is dynamic if it can vary during a single execution of a program (procedure return, O-O dynamic dispatch, and switch statements are major examples)
Branch is taken if the next instruction to be executed is at address BTA
Branch is not taken if the next instruction to be executed is the one following the branch instruction (fall-through)
Branch outcome: whether the branch is taken or not taken
Forward branch: BTA > (PC), where (PC) is the address of the branch instruction
Backward branch: BTA < (PC)
An unconditional branch is always taken
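
The direction and outcome definitions translate directly into comparisons. The sketch below uses hypothetical function names and made-up addresses; the "self" case (a branch whose target is itself) is not covered by the definitions above and is handled separately.

```python
def direction(pc, bta):
    """Forward if the target lies after the branch, backward if before it."""
    if bta > pc:
        return "forward"
    if bta < pc:
        return "backward"
    return "self"  # branch to itself; outside the definitions above


def outcome(bta, next_pc):
    """Taken if the next instruction executed is at the BTA, else fall-through."""
    return "taken" if next_pc == bta else "not taken"


# Hypothetical addresses: a loop-closing branch at 0x120 that jumps back to 0x100,
# and a forward branch at 0x120 to 0x140 that falls through to 0x124.
print(direction(0x120, 0x100), outcome(0x100, 0x100))
print(direction(0x120, 0x140), outcome(0x140, 0x124))
```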

18
Code Generation Examples for Branches
if (x > 0) y += z; else y -= z;

      blez r7, L18
      addu r3, r3, r4
      j    L33
L18:  subu r3, r3, r4
L33:

while (a < b) { a++; b--; x++; }

      j    L33
L34:  addu r5, r5, 1
      addu r6, r6, -1
      addu r7, r7, 1
L33:  slt  r2, r5, r6
      bne  r2, r0, L34

Register r3 contains y
Register r4 contains z
Register r5 contains a
Register r6 contains b
Register r7 contains x
19
Classification of Branches
Classifying branches into these four groups permits us to compute some of the dynamic frequencies if some others have been measured.
Rule of thumb: Backward branches tend to be taken, forward branches tend not to be taken.
20
Computing Branch Frequencies
Assume that 75% of all branches are forward, and that 55% of all branches are taken. If 80% of all backward branches are taken, what is the probability that a taken branch is a forward branch?
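
One way to work this out, reading the three given figures as percentages (75%, 55%, 80%), is to split the taken branches into backward and forward ones and then condition on "taken". The sketch uses exact fractions; the variable names are ours.

```python
from fractions import Fraction

p_forward = Fraction(75, 100)               # given: 75% of branches are forward
p_taken = Fraction(55, 100)                 # given: 55% of branches are taken
p_taken_given_backward = Fraction(80, 100)  # given: 80% of backward branches are taken

p_backward = 1 - p_forward                                   # 25% are backward
p_taken_and_backward = p_taken_given_backward * p_backward   # taken backward branches
p_taken_and_forward = p_taken - p_taken_and_backward         # the remaining taken branches
p_forward_given_taken = p_taken_and_forward / p_taken        # condition on "taken"

print(p_forward_given_taken, float(p_forward_given_taken))
```

Under these figures, 20% of all branches are taken backward branches and 35% are taken forward branches, so the answer is 0.35 / 0.55 = 7/11, roughly 64%.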
21
Evaluating Branch Conditions

Typical set of condition codes (e.g., Motorola 680x0)
NegativeResult, ZeroResult, ArithmeticOverflow, CarryOut
Many RISC machines do not use condition codes (e.g., MIPS, Alpha)
Magnitude comparisons are done with explicit COMPARE instructions that put their results into named registers
Overflow and carry-out have to be inferred by software
Some instructions have two variants: one traps on overflow, the other does not
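
As an illustration of inferring carry-out and overflow in software, the sketch below models a 32-bit add using common comparison and sign-bit tricks. It is a model in Python under our own naming (`add32`, `MASK32`), not the code sequence any particular compiler emits.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Add two 32-bit values; return (result, carry_out, signed_overflow)."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    # Unsigned carry-out: the wrapped result is smaller than an operand.
    carry_out = result < a
    # Signed overflow: both operands share a sign and the result's sign differs.
    overflow = bool(((a ^ result) & (b ^ result)) >> 31)
    return result, carry_out, overflow


print(add32(0xFFFFFFFF, 1))  # unsigned wrap-around
print(add32(0x7FFFFFFF, 1))  # largest positive signed value plus one
```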