Advanced Computer Architecture 5MD00 / 5Z033: Instruction Set Design


1
Advanced Computer Architecture 5MD00 / 5Z033
Instruction Set Design

Henk Corporaal
www.ics.ele.tue.nl/heco/courses/aca
TU Eindhoven
November 2009

2
Lecture overview

ISA and Evolution
Architecture classes
Addressing
Operands
Operations
Encoding
RISC
SIMD extensions

3
Instruction Set Architecture

The instruction set architecture serves as the interface between software and hardware
It provides the mechanism by which the software tells the hardware what should be done
Architecture definition: the architecture of a system/processor is (a minimal description of) its behavior as observed by its immediate users

4
Instruction Set Design Issues

Where are operands stored?
registers, memory, stack, accumulator
How many explicit operands are there?
0, 1, 2, or 3
How is the operand location specified?
register, immediate, indirect, . . .
What types and sizes of operands are supported?
byte, int, float, double, string, vector, ...
What operations are supported?
basic operations: add, sub, mul, move, compare, ...
or also very complex operations?

5
Operands

How are operands designated?
fixed: always in the same place
by opcode: always the same for groups of instructions
by a field in the instruction: requires decode first
What is the format of the data?
binary
character
decimal (packed and unpacked)
floating-point: IEEE 754 (others used less and less)
size: 8-, 16-, 32-, 64-, 128-bit
What is the influence on the ISA (Instruction-Set Architecture)?

6
Operand Locations
7
Classifying ISAs
Accumulator (before 1960)
  1 address    add A             acc <- acc + mem[A]
Stack (1960s to 1970s)
  0 address    add               tos <- tos + next
Memory-Memory (1970s to 1980s)
  2 address    add A, B          mem[A] <- mem[A] + mem[B]
  3 address    add A, B, C       mem[A] <- mem[B] + mem[C]
Register-Memory (1970s to present)
  2 address    add R1, A         R1 <- R1 + mem[A]
               load R1, A        R1 <- mem[A]
Register-Register (Load/Store) (1960s to present)
  3 address    add R1, R2, R3    R1 <- R2 + R3
               load R1, R2       R1 <- mem[R2]
               store R1, R2      mem[R1] <- R2
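
To make the difference between the classes concrete, here is a minimal sketch of three toy machines that each compute C = A + B. The mnemonics, the dict-based memory, and the function names are assumptions made for illustration; the load/store variant uses direct addresses in load and store to keep the example short, whereas the slide addresses memory through a register.

```python
# Toy models of three ISA classes. All names here are illustrative
# assumptions, not part of any real instruction set.

def run_accumulator(mem, program):
    """1-address machine: one implicit accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc = acc + mem[addr]          # acc <- acc + mem[A]
        elif op == "store":
            mem[addr] = acc
    return mem

def run_stack(mem, program):
    """0-address machine: operands are implicit on the stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(mem[instr[1]])
        elif op == "add":
            tos = stack.pop()
            nxt = stack.pop()
            stack.append(tos + nxt)        # tos <- tos + next
        elif op == "pop":
            mem[instr[1]] = stack.pop()
    return mem

def run_load_store(mem, program):
    """3-address register machine: only load/store touch memory."""
    regs = {}
    for instr in program:
        op = instr[0]
        if op == "load":                   # load Rd, addr
            regs[instr[1]] = mem[instr[2]]
        elif op == "add":                  # add Rd, Rs1, Rs2
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif op == "store":                # store Rs, addr
            mem[instr[2]] = regs[instr[1]]
    return mem

ACC_PROG = [("load", "A"), ("add", "B"), ("store", "C")]
STACK_PROG = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
LS_PROG = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "R3", "C")]
```

The accumulator program needs three instructions and the stack and load/store programs four, but the instructions of the first two carry fewer explicit operands.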
8
Evolution of Architectures
Single Accumulator (EDSAC, 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series, 1953)
Separation of Programming Model from Implementation
  High-level Language Based (B5000, 1963)
  Concept of a Family (IBM 360, 1964)
General Purpose Register Machines
  Complex Instruction Sets (VAX, Intel 8086, 1977-80)
  Load/Store Architecture (CDC 6600, Cray 1, 1963-76)
RISC (MIPS, SPARC, 88000, IBM RS6000, ..., 1987)
9
Addressing Modes

Types
Register: data in a register
Immediate: data in the instruction
Memory: data in memory
Calculation of Effective Address
Direct: address in instruction
Indirect: address in register
Displacement: address = register or PC + offset
Indexed: address = register + register
Memory Indirect: address = contents of memory at the address in a register
What is the influence on ISA?

10
Types of Addressing Mode (VAX)

Addressing Mode        Example                Action
1. Register direct     Add R4, R3             R4 <- R4 + R3
2. Immediate           Add R4, #3             R4 <- R4 + 3
3. Displacement        Add R4, 100(R1)        R4 <- R4 + M[100 + R1]
4. Register indirect   Add R4, (R1)           R4 <- R4 + M[R1]
5. Indexed             Add R4, (R1 + R2)      R4 <- R4 + M[R1 + R2]
6. Direct              Add R4, (1000)         R4 <- R4 + M[1000]
7. Memory Indirect     Add R4, @(R3)          R4 <- R4 + M[M[R3]]
8. Autoincrement       Add R4, (R2)+          R4 <- R4 + M[R2]; R2 <- R2 + d
9. Autodecrement       Add R4, (R2)-          R4 <- R4 + M[R2]; R2 <- R2 - d
10. Scaled             Add R4, 100(R2)[R3]    R4 <- R4 + M[100 + R2 + R3*d]
Studies by Clark and Emer indicate that modes 1-4 account for 93% of all operands on the VAX
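
The effective-address calculations for the memory modes can be sketched as a single function. This is an illustrative model only: the function name, the mode strings, and the dict-based register file and memory are assumptions, and the autoincrement/autodecrement modes are left out because they also modify a register.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, d=4):
    """Return the memory address an operand refers to.

    regs and mem are plain dicts; d is the operand size in bytes.
    All parameter names are hypothetical, chosen for this sketch.
    """
    if mode == "displacement":        # 100(R1)     -> 100 + R1
        return disp + regs[base]
    if mode == "register_indirect":   # (R1)        -> R1
        return regs[base]
    if mode == "indexed":             # (R1 + R2)   -> R1 + R2
        return regs[base] + regs[index]
    if mode == "direct":              # (1000)      -> 1000
        return disp
    if mode == "memory_indirect":     # @(R3)       -> M[R3]
        return mem[regs[base]]
    if mode == "scaled":              # 100(R2)[R3] -> 100 + R2 + R3*d
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unknown addressing mode: {mode}")

regs = {"R1": 8, "R2": 16, "R3": 2}
mem = {8: 40}
addr = effective_address("scaled", regs, mem, base="R2", index="R3", disp=100)
```

Each extra mode adds a case the decoder and address-generation hardware must handle, which is one reason later designs kept only a few of them.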

11
Operations

Types
ALU: integer arithmetic and logical functions
Data transfer: loads/stores
Control: branch, jump, call, return, traps, interrupts
System: O/S calls, virtual memory management
Floating point: floating point arithmetic
Decimal: decimal arithmetic (BCD, binary coded decimal)
String: moves, compares, search, etc.
Graphics: pixel/vertex operations
Vector: vector (SIMD) functions
more complex ones
Addressing
Which addressing modes are supported for which operands?

12
80x86 Instruction Frequency
13
Relative Frequency of Control Instructions

Design hardware to handle branches quickly,
since these occur most frequently

14
Frequency of Operand Sizes on 32-bit Load-Store Machines

For floating-point operations, want good performance for 64-bit operands
For integer operations, want good performance for 32-bit operands
Recent architectures also support 64-bit integers

15
Instruction Encoding

Variable
Instruction length varies based on opcode and address specifiers
For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17 bytes
Good code density, but difficult to decode and pipeline
Fixed
Only a single size for all instructions
For example, MIPS, PowerPC, and SPARC all have 32-bit instructions
Not as good code density, but easier to decode and pipeline
Hybrid
Multiple format lengths, specified by the opcode
For example, IBM 360/370
Compromise between code density and ease of decoding

16
Instruction Encoding
17
Example: MIPS
Operands mostly at fixed positions
Fixed instruction size, few formats
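
Fixed operand positions make decoding a matter of shifts and masks. The sketch below splits a 32-bit word using the standard MIPS32 R-type field layout (opcode 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6); the function name is an assumption for this example, and the other formats (I-type, J-type) are not handled.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,  # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,  # bits 15..11, destination register
        "shamt":  (word >> 6) & 0x1F,   # bits 10..6, shift amount
        "funct":  word & 0x3F,          # bits 5..0, ALU function
    }

# add $t0, $s1, $s2  (rd = 8, rs = 17, rt = 18, funct = 0x20)
fields = decode_r_type(0x02324020)
```

Because rs and rt sit at the same bit positions in every format that uses them, a pipeline can start reading the register file before it knows which instruction it is decoding.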
18
Compilers and ISA

Compiler Goals
All correct programs compile correctly
Most compiled programs execute quickly
Most programs compile quickly
Achieve small code size
Provide debugging support
Multiple Source Compilers
Same compiler can compile different languages
Multiple Target Compilers
Same compiler can generate code for different machines ('cross-compiler')

19
Compiler basics: trajectory
[Figure: Source program -> Preprocessor -> Compiler (error messages) -> Assembler -> Loader/Linker (library code) -> Object program]
20
Compiler basics: structure / passes
Source code
Lexical analyzer: token generation
Parsing: check syntax, check semantics, parse tree generation
Intermediate code
Code optimization: data flow analysis, local optimizations, global optimizations
Code generation: code selection, peephole optimizations
Register allocation: making interference graph, graph coloring, spill code insertion, caller / callee save and restore code
Sequential code
Scheduling and allocation: exploiting ILP
Object code
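
The register allocation pass (interference graph, graph coloring, spilling) can be illustrated with a small greedy coloring. This is a simplified heuristic written for this example, not the algorithm of any particular compiler; the function name and the dict-of-sets graph representation are assumptions.

```python
def color_registers(interference, num_regs):
    """Greedily assign registers 0..num_regs-1 to variables.

    interference maps each variable to the set of variables that are
    live at the same time. Variables that cannot get a register are
    returned in the spill list (they would need spill code).
    """
    assignment, spilled = {}, []
    # Visit the most constrained variables (highest degree) first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    for var in order:
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_regs) if r not in used]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled

# Three variables that are all live simultaneously (a triangle).
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
assignment, spilled = color_registers(graph, num_regs=2)
```

With only two registers one of the three variables must be spilled; with three registers every variable gets its own.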
21
Compiler basics: structure
Simple compilation example
position := initial + rate * 60
Lexical analyzer
id1 := id2 + id3 * 60
Syntax analyzer
Intermediate code generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
Code generator
movf id3, r2
mulf 60, r2, r2
movf id2, r1
addf r2, r1
movf r1, id1
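
The first step of this example, the lexical analyzer, can be sketched in a few lines. The sketch assumes the source statement is written with := for assignment and that identifiers are numbered id1, id2, ... in order of first appearance; the function name and the regular expression are illustrative choices.

```python
import re

# identifiers | numbers | the := operator | single-character operators
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|:=|[+\-*/()]")

def lex(statement):
    """Return (tokens, symbol_table) for one assignment statement."""
    symbols = {}   # identifier name -> id1, id2, ...
    tokens = []
    for tok in TOKEN_RE.findall(statement):
        if tok[0].isalpha() or tok[0] == "_":
            if tok not in symbols:
                symbols[tok] = f"id{len(symbols) + 1}"
            tokens.append(symbols[tok])
        else:
            tokens.append(tok)
    return tokens, symbols

tokens, symbols = lex("position := initial + rate * 60")
```

The later passes work only on the token stream and the symbol table, never on the original character string.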
22
Designing ISA to Improve Compilation

Provide enough general purpose registers to ease register allocation (at least 16)
Provide regular instruction sets by keeping the operations, data types, and addressing modes largely orthogonal
Provide primitive constructs rather than trying to map to a high-level language
Allow compilers to help make the common case fast

23
A "Typical" RISC

32-bit fixed-length instructions
Only a few instruction formats
32 32-bit GPRs (general purpose registers)
3-address, reg-reg-reg / reg-imm-reg arithmetic instructions
Single addressing mode for load/store: base + displacement
no indirection
Simple branch conditions
Pipelined implementation
Separate instruction and data level-1 caches
Delayed branch?

24
Comparison MIPS with 80x86

How would you expect the x86 and MIPS architectures to compare on the following?
CPI on SPEC benchmarks
Ease of design and implementation
Ease of writing assembly language and compilers
Code density
Overall performance
What other advantages/disadvantages are there to the two architectures?

25
Instruction Set Extensions: Subword parallelism

Support graphics and multimedia applications
Intel's MMX Technology (introduced in 1997)
Intel's Internet Streaming SIMD Extensions (SSE to SSE4)
AMD's 3DNow! Technology
Sun's Visual Instruction Set
Motorola's and IBM's AltiVec Technology
These extensions improve the performance of
Computer-aided design
Internet applications
Computer visualization
Video games
Speech recognition

26
MMX Data Types

MMX Technology supports operations on the following 64-bit integer data types

Packed byte (eight 8-bit elements)
Packed word (four 16-bit elements)
Packed double word (two 32-bit elements)
Packed quad word (one 64-bit element)
27
SIMD Operations

MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD)
PADDW: Packed add word
In the example below, 4 parallel adds are performed on 16-bit elements
Most MMX instructions only require a single cycle

    A3    |  A2    |  A1    |  A0
  + B3    |  B2    |  B1    |  B0
  = A3+B3 |  A2+B2 |  A1+B1 |  A0+B0
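
The packed add can be emulated on ordinary integers to show that the four 16-bit lanes are independent. This is a software sketch of the behavior, not how the hardware is built; the helper names are made up for the example, and a 64-bit MMX register is modelled as a Python int with lane 0 in the low bits.

```python
LANES = 4          # four 16-bit words in a 64-bit register
MASK16 = 0xFFFF

def split_words(x):
    """Return the four 16-bit lanes of x, lane 0 (lowest bits) first."""
    return [(x >> (16 * i)) & MASK16 for i in range(LANES)]

def join_words(words):
    """Pack four 16-bit lanes back into one 64-bit value."""
    result = 0
    for i, w in enumerate(words):
        result |= (w & MASK16) << (16 * i)
    return result

def paddw(a, b):
    """Wrap-around packed add: each lane is added modulo 2**16."""
    sums = [(x + y) & MASK16 for x, y in zip(split_words(a), split_words(b))]
    return join_words(sums)

r = paddw(0x0001_0002_0003_0004, 0x0010_0020_0030_0040)
```

A plain 64-bit add would let a carry out of one lane spill into its neighbour; masking each lane separately is what distinguishes the packed operation.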
28
Saturating Arithmetic

Both wrap-around and saturating adds are supported
With saturating arithmetic, results that overflow/underflow are set to the largest/smallest representable value

PADDW: Packed wrap-around add
PADDUSW: Packed saturating add (unsigned)
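
The difference between the two kinds of add shows up only when a lane overflows. The sketch below models both behaviors for unsigned 16-bit lanes packed into a 64-bit integer; the function name and the saturate flag are assumptions made for this example.

```python
def packed_add_u16(a, b, saturate):
    """Add four unsigned 16-bit lanes of a and b.

    saturate=False wraps modulo 2**16 (like PADDW);
    saturate=True clamps at 0xFFFF (like PADDUSW).
    """
    result = 0
    for lane in range(4):
        shift = 16 * lane
        s = ((a >> shift) & 0xFFFF) + ((b >> shift) & 0xFFFF)
        s = min(s, 0xFFFF) if saturate else s & 0xFFFF
        result |= s << shift
    return result

wrapped = packed_add_u16(0xFFF0, 0x0020, saturate=False)
clamped = packed_add_u16(0xFFF0, 0x0020, saturate=True)
```

For pixel data the clamped result is usually the wanted one: a bright pixel made brighter should stay white rather than wrap around to nearly black.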
29
Pack and Unpack Instructions

Pack and unpack instructions provide conversion between standard data types and packed data types

PACKSSDW: Pack with signed saturation, double word to packed word
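
Packing with signed saturation narrows each 32-bit value to 16 bits and clamps anything out of range. In this sketch the two 64-bit source operands are modelled as lists of two signed double words each, which is an assumption made to keep the code readable; the function name simply mirrors the mnemonic.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def packssdw(dest_dwords, src_dwords):
    """Narrow 2 + 2 signed 32-bit values to four signed 16-bit values.

    The first operand supplies the two low words of the result and
    the second operand the two high words.
    """
    return [max(INT16_MIN, min(INT16_MAX, d))
            for d in list(dest_dwords) + list(src_dwords)]

words = packssdw([70000, -5], [-70000, 12])
```

Values that already fit in 16 bits pass through unchanged; only out-of-range values are clamped.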
30
Multiply-Add Operations

Many graphics applications require multiply-accumulate operations
Vector dot products: a · b
Matrix multiplies
Fast Fourier Transforms (FFTs)
Filter implementations

PMADDWD: Packed multiply-add, word to double word
31
Vector Dot Product

A dot product on an 8-element vector can be performed using 9 MMX instructions
Without MMX, 40 instructions are required

[Figure: the packed partial sums (a0*c0 + ... + a3*c3, 0) and (a4*c4 + ... + a7*c7, 0) are added to give a0*c0 + ... + a7*c7]
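
The data flow of the dot product can be modelled with a multiply-add step that behaves like PMADDWD: four 16-bit products are formed and adjacent pairs are summed into two 32-bit results. The sketch uses plain Python lists and ignores 32-bit overflow; the function names are assumptions, and it does not reproduce the exact 9-instruction sequence.

```python
def pmaddwd(a, c):
    """Multiply four word pairs and add adjacent products.

    a and c each hold four signed 16-bit values; the result holds
    two sums, a0*c0 + a1*c1 and a2*c2 + a3*c3.
    """
    assert len(a) == 4 and len(c) == 4
    return [a[0] * c[0] + a[1] * c[1], a[2] * c[2] + a[3] * c[3]]

def dot8(a, c):
    """Dot product of two 8-element vectors built from pmaddwd steps."""
    low = pmaddwd(a[:4], c[:4])      # products of elements 0..3
    high = pmaddwd(a[4:], c[4:])     # products of elements 4..7
    partial = [low[0] + high[0], low[1] + high[1]]   # packed 32-bit add
    return partial[0] + partial[1]   # final horizontal add

v = [1, 2, 3, 4, 5, 6, 7, 8]
result = dot8(v, v)
```

Two multiply-add steps cover all eight products, which is where most of the instruction savings come from.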
32
Packed Compare Instructions

Packed compare instructions allow a bit mask to be set or cleared
This is useful when images with certain qualities need to be extracted
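
A packed compare writes all ones into every lane where the comparison holds and all zeros elsewhere, and the resulting mask can then select between two sources with AND/OR operations. The sketch models 16-bit lanes as list elements; the function names and the pixel values are made up for the example.

```python
def pcmpeqw(a, b):
    """Per-lane equality: 0xFFFF where equal, 0x0000 where not."""
    return [0xFFFF if x == y else 0x0000 for x, y in zip(a, b)]

def select(mask, if_set, if_clear):
    """Take if_set lanes where the mask is all ones, else if_clear lanes."""
    return [(m & s) | (~m & 0xFFFF & c)
            for m, s, c in zip(mask, if_set, if_clear)]

# Replace every pixel equal to the key colour 7 by the background.
foreground = [7, 2, 7, 4]
background = [9, 9, 9, 9]
mask = pcmpeqw(foreground, [7, 7, 7, 7])
merged = select(mask, background, foreground)
```

No branch is needed per pixel: the comparison result is data, so the same instruction sequence runs regardless of which lanes match.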

33
MMX Instructions

MMX Technology adds 57 new instructions to the x86 architecture
Some of these instructions include (b = byte, w = 16-bit word, d = 32-bit double word, q = 64-bit quad word)
PADD(b, w, d): Packed addition
PSUB(b, w, d): Packed subtraction
PCMPEQ(b, w, d): Packed compare equal
PMULLw: Packed word multiply low
PMULHw: Packed word multiply high
PMADDwd: Packed word multiply-add
PSRL(w, d, q): Packed shift right logical
PACKSS(wb, dw): Pack data
PUNPCK(bw, wd, dq): Unpack data
PAND, POR, PXOR: Packed logical operations

34
MMX Performance Comparison
35
MMX Technology Summary

MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications
It provides a speedup of 1.5 to 2.0 for certain applications
MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance
MMX data types use the x86 floating point registers to avoid adding state to the processor
Makes it easy to handle context switches
Makes it hard to perform MMX and floating point instructions at the same time
Only increases the chip area by about 5%

36
Questions on MMX

What are the strengths and weaknesses of MMX Technology?
How could MMX Technology potentially be improved?
How did the developers of MMX preserve backward compatibility with the x86 architecture?
Why was this important?
What are the disadvantages of this approach?
What restrictions/limitations are there on the use of MMX Technology?

37
Internet Streaming SIMD Extensions (SSE)

Help improve the performance of video and 3D applications
Are designed for streaming data, which is used once and then discarded
70 new instructions beyond MMX Technology
Add 8 new 128-bit vector registers (XMM0 to XMM7)
Provide the ability to perform multiple floating point operations
Four parallel operations on 32-bit numbers
Reciprocal and reciprocal square root instructions: normalization
Packed average instruction: motion compensation
Provide data prefetch instructions
Make certain applications 1.5 to 2.0 times faster

38
Beyond SSE

SSE2: SIMD on any data type from 8-bit int to 64-bit double, using XMM vector registers
SSE4: dot-product operation
AVX (Advanced Vector Extensions), 2010
Sixteen 256-bit vector registers, YMM0 to YMM15
later extended to 512 and 1024 bits
3-operand instructions (instead of 2-operand instructions, where one register is both a source and the destination)
