Chapter 2: Instruction Set Architecture - PowerPoint PPT Presentation

Provided by: sari158

Title: Chapter 2: Instruction Set Architecture


1
Chapter 2 Instruction Set Architecture
  • Principles underlying modern ISA (Sections 2.1 to 2.10)
  • Compilers (Section 2.11)
  • Examples (Sections 2.12 and 2.13)
  • CISC vs. RISC (Section 2.1)
  • Recent advances

2
ISA Principles
  • Application area
  • Operands in CPU
  • ALU operands
  • Data storage: endianness and alignment
  • Addressing modes
  • Operand type and size
  • Operations
  • Control instructions
  • Encoding

3
Dependence on Application Area
  • Desktop
    • Performance
    • Integer and floating-point programs
  • Servers
    • Performance
    • Integer and character strings
  • Embedded systems
    • Code size
    • Real-time performance on continuous data streams
    • Hand-optimized kernels

5
Operand Storage in CPU
  • Why in CPU?
    • Faster access, shorter address
  • Accumulator: one implicit register (before 1960)
    • Minimum hardware resources
    • High memory traffic
  • Stack: LIFO storage (1960s and 1970s, and the Java Virtual Machine!)
    • Instructions implicitly access the top of the stack
    • Good code density
    • Stack can become a bottleneck, especially with pipelining
    • But the stack can be cached
  • Registers: 8 to 256 words (since the 1960s)
    • Flexible: temporaries and variables
    • Registers must be named
  • Most general-purpose systems now use registers; our focus too
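
As a rough illustration of the three storage styles, the statement C = A + B can be written for each one. The sketch below is a toy Python model; the instruction names (push, add, pop, load, store) and the memory dictionary are assumptions made for illustration, not a real ISA.

```python
# Toy stack machine: operands are implicit (the top of the stack).
def run_stack(program, memory):
    stack = []
    for op, *args in program:
        if op == "push":            # memory -> top of stack
            stack.append(memory[args[0]])
        elif op == "add":           # pops two operands, pushes the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":           # top of stack -> memory
            memory[args[0]] = stack.pop()
    return memory

memory = {"A": 2, "B": 3, "C": 0}
run_stack([("push", "A"), ("push", "B"), ("add",), ("pop", "C")], memory)

# The same statement in the other two styles (shown as comments only):
#   Accumulator:  load A;  add B;  store C
#   Load-store:   load R1,A;  load R2,B;  add R3,R1,R2;  store R3,C
```

The stack version names no registers at all, which is where its code density comes from; the load-store version names every operand explicitly.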

7
Registers vs. Caches [Ditzel and McLellan, 1982]
  • Register advantages
    • Faster (no addressing modes, no tags)
    • Deterministic (no misses) → can schedule for pipeline
    • Small → can duplicate for two ports
    • Short identifier (3 to 8 bits)
  • Register disadvantages
    • Save/restore on procedure calls
    • Can't take the address of a register
    • Fixed size (FP, strings, structures)
    • Compiler must control (an advantage?)

9
How Many Registers?
  • More registers mean:
    • Operands are held longer (decreases memory traffic and execution time)
    • Longer register specifier
    • Slower registers
    • More state, which means slower context switches
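
The "longer register specifier" cost can be made concrete: naming one of N registers takes ceil(log2 N) bits, and a three-operand instruction pays that three times. A minimal sketch follows; the function name is an illustrative assumption.

```python
def specifier_bits(num_registers):
    """Bits needed to name one register out of num_registers."""
    return (num_registers - 1).bit_length()

for n in (8, 32, 256):
    # A three-operand ALU instruction carries three register specifiers.
    print(n, "registers:", specifier_bits(n), "bits each,",
          3 * specifier_bits(n), "bits for three operands")
```

With 32 registers, 15 of the 32 bits in a fixed-length instruction go to register names; with 256 registers it would be 24.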

10
ALU Operands
  • Number of explicit operands
    • Two (destination equals one source): small instructions
    • Three: fewer instructions, orthogonal
  • Number of memory operands
    • Any (memory-memory): VAX
    • At least one register (register-memory): IBM 360
    • Zero (load-store): MIPS, Alpha, SPARC, Cray
      • Fixed-size instructions
      • Simple code generation model: all similar ALU instructions take the same time
      • Facilitates pipelining: no page faults, simple decoding
      • Needs loads/stores, higher instruction count

11
Endianness
  • Order of bytes within a word
  • Big endian: MSB at address xxxxx00
  • Little endian: LSB at address xxxxx00
  • Big endian (IBM, Motorola)
      Word address | Byte addresses, MSB to LSB
      0            | 0 1 2 3
      4            | 4 5 6 7
  • Little endian (DEC, Intel)
      Word address | Byte addresses, MSB to LSB
      0            | 3 2 1 0
      4            | 7 6 5 4
  • Does not matter
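
The two byte orders can be observed directly with Python's standard struct module. This is a small sketch; the sample value 0x00010203 is an arbitrary choice whose bytes are easy to tell apart.

```python
import struct

value = 0x00010203                    # MSB is 0x00, LSB is 0x03

big = struct.pack(">I", value)        # big endian: MSB stored at the lowest address
little = struct.pack("<I", value)     # little endian: LSB stored at the lowest address

print(list(big))                      # bytes in increasing address order
print(list(little))
```

Reading the bytes back with the wrong byte order yields a different integer, which is why endianness matters when data moves between machines.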

12
Alignment
  • What is alignment?
    • Address mod Size = 0
  • Example: aligned word (4 bytes)
  • Example: unaligned word (4 bytes)
  [Figure: byte layout of an aligned word (3 2 1 0) and an unaligned word (2 1 0 3)]
13
Alignment (Cont.)
  • No restrictions on alignment
    • Software is simple
    • Hardware must detect misalignment and (typically) make two memory accesses
    • Expensive logic
    • Usually slows down all references
  • Restricted alignment
    • Software must guarantee alignment
    • Hardware only detects misalignment and traps
  • Middle ground (VAX 8800)
    • Misaligned data OK, but slow
    • Traps on misaligned access, 10-cycle penalty
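
The alignment rule and the "two memory accesses" cost can be sketched in a few lines. The function names and the 4-byte memory word width are assumptions for illustration.

```python
def is_aligned(address, size):
    """An access is aligned when address mod size == 0."""
    return address % size == 0

def word_accesses(address, size, word_size=4):
    """Number of aligned word_size-byte memory words touched by the access."""
    first_word = address // word_size
    last_word = (address + size - 1) // word_size
    return last_word - first_word + 1

print(is_aligned(8, 4), word_accesses(8, 4))   # aligned word: one access
print(is_aligned(9, 4), word_accesses(9, 4))   # unaligned word: spans two words
```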

14
Addressing Modes
  • Possibilities
    • 1. Register
    • 2. Displacement
    • 3. Immediate
    • 4. Register deferred
    • 5. Indexed
    • 6. Absolute
    • 7. Memory deferred
    • 8. Autoincrement
    • 9. Autodecrement
    • 10. Scaled
  • Which modes to support and why?
    • Modes 1 to 4 account for 93% of all operands on the VAX!
    • Displacement and immediate modes are most common
15
Addressing Modes (Cont.)
  • What length of displacements to support?
  • Figure 2.8
  • What length of immediates to support?
  • Figure 2.10

16
DSP Addressing Mode Examples
  • Modulo or circular addressing
    • Handles circular buffers for infinite continuous streams
  • Bit-reverse addressing
    • Handles shuffles in FFT
  • Compilers find it difficult to generate the above modes
    • But lots of DSP applications use assembly code
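
Both DSP modes replace a few instructions of address arithmetic with a single address update. A minimal sketch of what the hardware computes follows; the function names are illustrative assumptions.

```python
def circular_next(index, step, length):
    """Modulo addressing: advance and wrap around the end of the buffer."""
    return (index + step) % length

def bit_reverse(index, bits):
    """Bit-reverse addressing: reverse the low `bits` bits of the index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

# Order in which an 8-point FFT visits its data.
fft_order = [bit_reverse(i, 3) for i in range(8)]
print(fft_order)
```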

17
Type and Size of Operands
  • Type usually encoded in opcode
  • Desktops and servers: type also gives size
    • Character: 1 byte
    • Half word: 16 bits
    • Word: 32 bits
    • Single-precision floating point: 1 word
    • Double-precision floating point: 2 words
    • Decimal: less common
    • Packed data types for multimedia (see 4 slides later)
  • Graphics
    • 2D: pixels with x, y, z coordinates (z says which images are visible)
    • 3D: add a coordinate for color and hidden surfaces
    • Each coordinate is 8, 16, or 32 bits

18
Type and Size of Operands (Cont.)
  • DSP processors
    • Fixed point: a cheap floating point
    • Fraction between -1 and +1
    • Exponent is separate
    • Programmer must ensure alignment of the result with the exponent
    • Wide internal registers to avoid roundoff errors
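
One common fixed-point layout is a 16-bit fraction with 15 bits after the binary point (often called Q15). The slides do not name a format, so Q15 and the helper names below are assumptions chosen for illustration.

```python
SCALE = 1 << 15                       # Q15: stored integer = fraction * 2**15

def to_q15(x):
    """Convert a fraction in [-1, +1) to a 16-bit integer, clamping at the ends."""
    return max(-32768, min(32767, round(x * SCALE)))

def q15_dot(xs, ys):
    """Dot product with a wide accumulator: products are summed at full
    precision and scaled back to Q15 only once, at the end."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y                  # each product has 30 fraction bits
    return acc >> 15

half = to_q15(0.5)
print(q15_dot([half, half], [half, half]) / SCALE)   # 0.5*0.5 + 0.5*0.5
```

Keeping the running sum wide is the software analogue of the wide internal registers mentioned above: rounding happens once instead of after every multiply.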

19
Operations
  • Arithmetic and logical
  • Memory
  • Control
  • System
  • Floating point
  • Decimal
  • String
  • Graphics, multimedia, DSP
  • First four categories supported by all systems

20
Multimedia Instructions
  • Recent general-purpose processors include multimedia instructions
  • Multimedia data is derived from sampling analog input
    • Correctness dictated by human perception
    • Smaller data types: 8-bit, 16-bit
    • Compare with 32- and 64-bit processor data paths
  • Significant levels of data parallelism
    • Large collection of small data elements
    • Identical processing of similar elements
    • E.g., image addition:
        for I = 1 to 1024
          for J = 1 to 1024
            dest[I,J] = src1[I,J] + src2[I,J]

21
Multimedia - Packed Data Types

  • A 16-bit operand in a 64-bit register: 48 bits are wasted! Can we use them in any way?
  • Pack four 16-bit elements into each 64-bit word (Operand 1, Operand 2, Result)
  • 4 operations in 1 cycle: speedup of 4X?
  • Called SIMD (single-instruction, multiple-data) parallelism
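
A packed add can be modeled by treating a 64-bit integer as four independent 16-bit lanes. This is a sketch of the semantics only (Python gains no speed from it); the helper names are illustrative assumptions, and overflow wraps around within each lane.

```python
LANES, LANE_BITS = 4, 16
LANE_MASK = (1 << LANE_BITS) - 1

def pack(values):
    """Place four 16-bit values side by side in one 64-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word

def unpack(word):
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

def packed_add(a, b):
    """One 'instruction', four additions; carries never cross a lane boundary."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])

result = packed_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40]))
print(unpack(result))
```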
22
Other Multimedia Extensions
  • Saturation arithmetic
  • Example: image addition
  • Saturation ensures clamping of values

    for I = 1 to 1024
      for J = 1 to 1024
        dest[I,J] = src1[I,J] + src2[I,J]
        if (dest[I,J] > 255) dest[I,J] = 255
        if (dest[I,J] < 0) dest[I,J] = 0
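
The difference between saturating and ordinary wraparound arithmetic shows up as soon as a pixel sum exceeds 255. A minimal sketch for 8-bit pixels; the function names and sample images are assumptions for illustration.

```python
def saturating_add_u8(a, b):
    """Clamp the sum to the 8-bit pixel range [0, 255]."""
    return max(0, min(255, a + b))

def wrapping_add_u8(a, b):
    """Ordinary 8-bit addition: the overflow bit is simply discarded."""
    return (a + b) & 0xFF

def add_images(src1, src2):
    return [[saturating_add_u8(p, q) for p, q in zip(row1, row2)]
            for row1, row2 in zip(src1, src2)]

print(add_images([[200, 10]], [[100, 20]]))   # bright pixel stays white
print(wrapping_add_u8(200, 100))              # without saturation it turns dark
```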
23
Other Multimedia Extensions (Cont.)
  • Sub-word rearrangement
    • How do we go from unpacked data types to packed data types?
    • Provide ISA support for pack, unpack, expand, align, ...
  • Support for other types of sub-word rearrangement
    • Shift, rotate, permute, ...
    • E.g., for the FFT butterfly algorithm
  • Many others
    • Conditional execution, memory instructions, special-purpose instructions, ...

24
Example Intel MMX ISA Extensions
  • 57 new instructions
  • Use FP registers, 64-bit data path, SIMD, saturation, ...
  • More information available from MMX Technology Overview, Intel web site: http://developer.intel.com/drg/mmx/manuals/overview/
25
Example Intel SSE ISA Extensions
  • 70 instructions
  • Separate register state, 128-bit data path, alignment support, cache hints, SIMD, ...
  • More information available from:
    • The Internet Streaming SIMD Extensions, Shreekanth Thakkar and Tom Huff, Intel Technology Journal Q2, 1999. http://developer.intel.com/technology/itj/q21999/articles/art_1.htm

26
Control Instructions
  • Examples: conditional branches, unconditional jumps, procedure calls/returns, O.S. calls/returns
  • Key aspects
  • Taken or not taken?
  • Where is the target?

28
Taken or Not Taken
  • Compare-and-branch instruction
    • No extra compare instruction
    • No state is passed between instructions
    • Requires an ALU operation
  • Condition codes (Z, N, V, C)
    • Can be set for free
    • Constrains code reordering
    • Extra state to save and implement
  • Condition in general-purpose register
    • No special state to save and implement
    • Uses up a register
  • DSPs: a repeat instruction repeats a loop a specified number of times
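
Condition codes are a by-product of an ALU operation, which is why they can be "set for free". The sketch below computes the four flags for a two's-complement addition; the function name and the 32-bit default width are assumptions for illustration.

```python
def add_with_flags(a, b, bits=32):
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "Z": result == 0,                 # zero
        "N": bool(result & sign),         # negative (sign bit set)
        "C": total > mask,                # carry out of the top bit
        # overflow: operands share a sign but the result's sign differs
        "V": bool(~(a ^ b) & (a ^ result) & sign),
    }
    return result, flags

print(add_with_flags(0x7FFFFFFF, 1))      # largest positive + 1 overflows
```

Because the flags are implicit state overwritten by the next ALU operation, no flag-setting instruction can be moved between the compare and the branch that reads it, which is the reordering constraint listed above.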

29
Taken or Not Taken (Cont.)
  • Some data for compare-and-branch
  • Figure 2.22

30
Where is the Target?
  • Could use an arbitrary specifier
    • Powerful
    • More bits to specify
    • More time to decode
  • PC-relative with immediate
    • Position independence (helps linking)
    • Short immediate sufficient
    • H&P: most instructions use fewer than 8 bits (Figure 2.20)
    • Target must be known statically
    • Can't jump arbitrarily far; other techniques are required for returns and distant jumps
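
PC-relative targeting amounts to one addition plus a range check on the immediate. The sketch below is simplified: it measures the offset in bytes from the branch's own address, whereas real ISAs often count in instructions and from the following instruction. The function names are illustrative assumptions.

```python
def fits_signed(value, bits):
    """True if value is representable as a bits-wide two's-complement immediate."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def branch_target(pc, offset, bits=16):
    """Target of a PC-relative branch; rejects offsets the immediate cannot hold."""
    if not fits_signed(offset, bits):
        raise ValueError("offset out of range: a register jump is needed instead")
    return pc + offset

# The same encoded offset works wherever the code is loaded.
print(hex(branch_target(0x1000, -24, bits=8)))
print(hex(branch_target(0x5000, -24, bits=8)))
```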

31
Where is the Target (Cont.)
  • Register
    • Short specification
    • Can jump anywhere
    • Dynamic target OK
    • Extra instruction to load the register
  • (Vectored) trap
    • Critical for O.S. calls
  • Common compromise
    • (Conditional) branches: PC-relative with short immediates
    • (Unconditional) jumps: PC-relative, register
    • Procedure calls: PC-relative, register
    • Procedure returns and indirect jumps: register
    • O.S. calls: trap
    • O.S. returns: register

32
Encoding the Instruction Set
  • Encoding affects size of program and implementation
  • Depends on many aspects of the ISA
  • Variable length (opcode only tells number of operands, not the type)
    • Minimizes code size
    • Hard to pipeline
  • Fixed length (opcode tells number of operands and addressing mode)
    • Easy to decode and pipeline
    • Increases code size
  • Hybrid approach

33
Compilers
  • Compilers form a GIANT case analysis
    • Too many choices make it hard
  • Provide orthogonal instruction sets
    • Operation
    • Addressing mode
    • Data type

34
Compilers (Cont.)
  • One solution or all possible solutions
    • 2 branch conditions (EQ, LT)
    • Or all 6 (EQ, LT, GT, NE, LE, GE)
    • Not 3 or 4
  • Primitives, NOT solutions
    • Semantic clash: "... by giving too much semantic content to the instruction, the machine designer made it possible to use the instruction only in limited contexts."
    • "In many of these cases, the high-level instructions are synthesized from more primitive operations which, if the compiler writer could access them, could be recomposed to more closely model the features actually needed."
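
The "2 or all 6" point rests on the fact that EQ and LT are enough primitives: a compiler can obtain the other four conditions by swapping operands or inverting the branch sense. A small sketch of that synthesis; the table and function names are illustrative assumptions.

```python
def eq(a, b):
    return a == b

def lt(a, b):
    return a < b

# Every comparison is built from the two primitives only.
CONDITIONS = {
    "EQ": lambda a, b: eq(a, b),
    "LT": lambda a, b: lt(a, b),
    "GT": lambda a, b: lt(b, a),          # swap the operands
    "NE": lambda a, b: not eq(a, b),      # branch on the opposite outcome
    "GE": lambda a, b: not lt(a, b),
    "LE": lambda a, b: not lt(b, a),      # swap and invert
}

print({name: test(3, 5) for name, test in CONDITIONS.items()})
```

Offering only 3 or 4 of the conditions would force the compiler to handle both cases (native and synthesized) without saving it any work, which is the argument against the middle ground.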

35
Example ISA 1: MIPS64 (Section 2.8)
  • RISC architecture
  • 32-bit byte addresses (aligned)
  • Load/store; only immediate and displacement addressing, with 16 bits
  • Registers
    • 32 64-bit general-purpose registers, R0 to R31 (R0 always 0)
    • 32 64-bit floating-point registers; can use as single or double precision
  • Control
    • Conditional branch: = 0 and ≠ 0
    • Jump: PC-relative and register
    • Others for FP, linking, trap
  • Three fixed-length instruction formats: 32 bits
  • Special operations
    • Paired single: two 32-bit FP values in one 64-bit datum, for graphics
    • Multiply-add for DSP
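
A fixed 32-bit format makes encoding and decoding a matter of shifts and masks. The sketch below uses the MIPS I-type layout (6-bit opcode, two 5-bit register fields, 16-bit immediate); the function names are illustrative assumptions, and the example encodes a load word (opcode 0x23) with displacement -4.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into one 32-bit word."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("field out of range")
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("immediate does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def decode_i_type(word):
    """Every field sits at a fixed position, so decoding needs no lookahead."""
    imm = word & 0xFFFF
    if imm & 0x8000:                  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, imm

word = encode_i_type(0x23, 1, 2, -4)  # load word: R2 <- Mem[R1 - 4]
print(hex(word), decode_i_type(word))
```

Five-bit register fields follow directly from having 32 registers, and the 16-bit field is what limits immediates and displacements to 16 bits.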

36
Example ISA 2: Trimedia CPU64 (Section 2.9)
  • Media processor
    • For multimedia workloads
    • Focused on parallelism
  • 128 64-bit registers for integer or floating point
  • SIMD instructions, saturation arithmetic
  • Very Long Instruction Word (VLIW)
    • Multiple independent operations encoded in a single instruction
    • Five operations for Trimedia CPU64
    • NOPs in instructions if five operations are not available
    • More on VLIW later
  • Compacts instructions in memory, decoded in I-cache
  • 25 functional units
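
The NOP padding can be seen in a toy packer that fills five-slot instructions in order. This is a simplified sketch: it assumes the operations are already known to be independent and ignores any slot restrictions a real machine may impose; the function and operation names are hypothetical.

```python
def pack_vliw(operations, slots=5):
    """Group operations into fixed-width instructions, padding with NOPs."""
    words = []
    for start in range(0, len(operations), slots):
        group = operations[start:start + slots]
        words.append(group + ["nop"] * (slots - len(group)))
    return words

program = pack_vliw(["add", "mul", "load", "load", "sub", "store", "add"])
for instruction in program:
    print(instruction)

wasted = sum(word.count("nop") for word in program)
print("NOP slots:", wasted, "of", len(program) * 5)
```

The padded slots are pure code-size overhead, which is the motivation for storing instructions in a compacted form in memory.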

37
Example ISA 3 - VAX
  • CISC architecture
  • Introduced by DEC in 1977
  • 16 GPRs (r15 is PC, r14 is SP)
  • Extremely orthogonal, memory-memory
  • Decode as byte stream
    • Opcode: operation, number of operands
    • Variable-length address specifiers
  • Virtually all addressing modes
  • Includes complex instructions: CRC, INSQUE

38
MIPS vs. VAX
  • VAX has too many modes and formats
  • Serial semantics can limit parallel
    interpretation
  • The big deal with RISC is not a REDUCED number of instructions; it is few modes and formats, which facilitate pipelining

40
CISC vs. RISC
  • Why CISC (60s and 70s)
    • Assembly programming
    • Small memory (dense encoding)
    • Microprogrammed control → complex instructions OK
  • Why RISC (70s and 80s)
    • Advances in compilers
    • Large memory, caches
    • VLSI (single-chip processor), pipelining, hardwired control → simple instructions
42
Outcome of RISC vs. CISC?
  • Millions of transistors per chip
  • Made sophisticated decoders for CISC possible
  • Focus on instruction-level parallelism
  • Decoding a single instruction is a small part of hardware and performance
  • Caches dominate chip
  • Internally, CISC processors decode into RISC-like
    instructions and execute on microarchitecture
    similar to RISCs
  • Above factors narrowed CISC vs. RISC gap
  • Non-technical issues played a large role

43
Recent Developments in Instruction Sets
  • Branches important → predication
  • Memory latency important → speculative loads, prefetching
  • Multimedia applications → multimedia ISA extensions
  • Embedded processors → variable-length instructions