IA-64 Architecture (Think Intel Itanium)

Slides: 36
Provided by: adria216
1
IA-64 Architecture (Think Intel Itanium)
also known as EPIC (Explicitly Parallel
Instruction Computing), a new kind of superscalar
computer

HW 5 - Due 12/4
Please clean up boards in lab by Dec 3:
  • Put good wires in the box
  • Take chips off of the board using the chip puller
  • Put parts away in the proper bins
THANKS!
2
Superpipelined and Superscalar Machines
  • Superpipelined machine
  • Superpipelined machines overlap pipeline stages
  • Relies on a stage being able to begin its operation
    before the previous one is complete.
  • Superscalar machine
  • A superscalar machine employs multiple
    independent pipelines to execute multiple
    independent instructions in parallel.
  • Particularly common instructions (arithmetic,
    load/store, conditional branch) can be executed
    independently.

3
Why A New Architecture Direction?
  • Processor designers' obvious choices for using the
    increasing number of transistors on chip and
    extra speed:
  • Bigger caches → diminishing returns
  • Increase degree of superscaling by adding more
    execution units → complexity wall: more logic,
    need for improved branch prediction, more renaming
    registers, more complicated dependencies
  • Multiple processors → challenge to use them
    effectively in general computing
  • Longer pipelines → greater penalty for
    misprediction

4
IA-64 Background
  • Explicitly Parallel Instruction Computing (EPIC)
  • Jointly developed by Intel and
    Hewlett-Packard (HP)
  • New 64-bit architecture
  • Not an extension of the x86 series
  • Not an adaptation of HP's 64-bit RISC architecture
  • To exploit increasing chip transistors and
    increasing speeds
  • Utilizes systematic parallelism
  • Departure from superscalar trend
  • Note: became the architecture of the Intel Itanium

5
Basic Concepts for IA-64
  • Instruction level parallelism
  • EXPLICIT in machine instruction, rather than
    determined at run time by processor
  • Long or very long instruction words (LIW/VLIW)
  • Fetch bigger chunks already preprocessed
  • Predicated Execution
  • Marking groups of instructions for a late
    decision on execution.
  • Control Speculation
  • Go ahead and fetch and decode instructions, but
    keep track of them so the decision to issue
    them, or not, can practically be made later
  • Data Speculation (or Speculative Loading)
  • Go ahead and load data early so it is ready when
    needed, and have a practical way to recover if
    the speculation proves wrong
  • Software Pipelining
  • Multiple iterations of a loop can be executed
    in parallel

6
General Organization
7
Predicate Registers
  • Used as a flag for instructions that may or may
    not be executed.
  • A set of instructions is assigned a predicate
    register when it is uncertain whether the
    instruction sequence will actually be executed
    (think branch).
  • Only instructions with a predicate value of true
    are executed.
  • When it is known that the instruction is going to
    be executed, its predicate is set. All
    instructions with that predicate true can now be
    completed.
  • Those instructions with predicate false are now
    candidates for cleanup.

8
Predication
9
Speculative Loading
10
General Organization
11
IA-64 Key Hardware Features
  • Large number of registers
  • IA-64 instruction format assumes 256 registers
  • 128 64-bit general purpose registers (integer, logical)
  • 128 82-bit floating point and graphics registers
  • 64 1-bit predicate registers for predicated execution
  • (To support high degree of parallelism)
  • Multiple execution units
  • Probably pipelined
  • 8 or more ?

12
IA-64 Register Set
13
Relationship between Instruction Type and
Execution Unit
14
IA-64 Execution Units
  • I-Unit
  • Integer arithmetic
  • Shift and add
  • Logical
  • Compare
  • Integer multimedia ops
  • M-Unit
  • Load and store
  • Between register and memory
  • Some integer ALU operations
  • B-Unit
  • Branch instructions
  • F-Unit
  • Floating point instructions

15
Instruction Format Diagram
16
Instruction Format
  • 128 bit bundles
  • Can fetch one or more bundles at a time
  • Bundle holds three instructions plus template
  • Instructions are usually 41 bits long
  • Each has an associated predicate register
  • Template contains info on which instructions can
    be executed in parallel
  • Not confined to single bundle
  • e.g. a stream of 8 instructions may be executed
    in parallel
  • Compiler will have re-ordered instructions to
    form contiguous bundles
  • Can mix dependent and independent instructions in
    same bundle
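
The bundle arithmetic above (three 41-bit instructions plus a template in 128 bits) can be checked with a small sketch. The 5-bit template width follows from 128 - 3 × 41. Placing the template in the low-order bits is an assumption of this sketch, as are the names `pack_bundle` and `unpack_bundle`; the slides do not give the field order.

```python
TEMPLATE_BITS = 5   # 128 - 3 * 41 bits remain for the template
SLOT_BITS = 41      # one instruction slot


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer.

    Assumed layout for this sketch: template in the low bits, then slots 0..2.
    """
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(3)]
    return template, slots
```

A round trip such as `unpack_bundle(pack_bundle(0x10, [1, 2, 3]))` returns the original template and slots, and a bundle with every field set to all ones fills exactly 128 bits.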

17
Field Encoding and Instruction Set Mapping
Note: a bar indicates a stop. Instructions after the
stop may depend on instructions before it.
18
Assembly Language Format
  • [qp] mnemonic [.comp] dest = srcs ;; //
  • qp - predicate register
  • 1 at execution → execute and commit result to
    hardware
  • 0 → result is discarded
  • mnemonic - name of instruction
  • comp - one or more instruction completers used to
    qualify mnemonic
  • dest - one or more destination operands
  • srcs - one or more source operands
  • ;; - instruction group stop (when
    appropriate)
  • A group is a sequence without read after write or
    write after write dependencies
  • Hardware does not need register dependency checks
    within a group
  • // - comment follows
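
As an illustration of the fields listed above, here is a minimal sketch that splits one line of the form `[qp] mnemonic[.comp] dest = srcs ;; // comment` into its parts. The regular expression and the function name `parse` are hypothetical and cover only instructions with a `dest = srcs` part (not branches or checks). An omitted qualifying predicate is reported as `p0`, the predicate register that always reads as true.

```python
import re

# Hypothetical pattern for this sketch: optional "(pN)", mnemonic, completers,
# destinations, "=", sources, optional ";;" stop, optional "//" comment.
LINE = re.compile(
    r"^\s*(?:\((?P<qp>p\d+)\)\s*)?"
    r"(?P<mnemonic>[a-z]+\d*)(?P<comp>(?:\.[a-z0-9]+)*)\s+"
    r"(?P<dest>[^=;/]+?)\s*=\s*(?P<srcs>[^;/]+?)\s*"
    r"(?P<stop>;;)?\s*(?://\s*(?P<comment>.*))?$"
)


def parse(line):
    """Return the fields of one 'dest = srcs' style instruction as a dict."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized instruction: {line!r}")
    return {
        "qp": m.group("qp") or "p0",  # no explicit predicate: always execute
        "mnemonic": m.group("mnemonic"),
        "completers": [c for c in m.group("comp").split(".") if c],
        "dests": [d.strip() for d in m.group("dest").split(",")],
        "srcs": [s.strip() for s in m.group("srcs").split(",")],
        "stop": m.group("stop") is not None,  # True if the line ends a group
    }
```

For example, `parse("(p1) add r3 = r1, r4 ;; // second group")` reports `p1` as the qualifying predicate, `add` as the mnemonic, and a stop at the end of the line.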

19
Assembly Example
Register Dependency
  • ld8 r1 = [r5] ;; //first group
  • add r3 = r1, r4 //second group
  • Second instruction depends on value in r1
  • Changed by first instruction
  • Cannot be in same group for parallel execution
  • Note: ;; ends the group of instructions that can
    be executed in parallel

20
Assembly Example
Multiple Register Dependencies
  • ld8 r1 = [r5] //first group
  • sub r6 = r8, r9 ;; //first group
  • add r3 = r1, r4 //second group
  • st8 [r6] = r12 //second group
  • Last instruction stores in the memory location
    whose address is in r6, which is established in
    the second instruction
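
The grouping rule in the two examples above (no read after write and no write after write inside a group) can be sketched as a small pass that decides where stops belong. This is only an illustration of the rule, not how any real IA-64 compiler works: it tracks register names only, ignores memory dependencies, and the function name `split_into_groups` and the tuple format are assumptions of the sketch.

```python
def split_into_groups(instrs):
    """Split instructions into groups with no RAW or WAW register dependency.

    instrs: list of (text, dest_registers, source_registers) tuples.
    Returns a list of groups, each a list of instruction texts; a stop (;;)
    would be placed between consecutive groups.
    """
    groups, current, written = [], [], set()
    for text, dests, srcs in instrs:
        raw = written & set(srcs)    # reads a register written in this group
        waw = written & set(dests)   # rewrites a register written in this group
        if raw or waw:
            groups.append(current)   # close the group: a stop goes here
            current, written = [], set()
        current.append(text)
        written |= set(dests)
    if current:
        groups.append(current)
    return groups


example = [
    ("ld8 r1 = [r5]",   ["r1"], ["r5"]),
    ("sub r6 = r8, r9", ["r6"], ["r8", "r9"]),
    ("add r3 = r1, r4", ["r3"], ["r1", "r4"]),
    ("st8 [r6] = r12",  [],     ["r6", "r12"]),
]
groups = split_into_groups(example)
```

On the four-instruction example the pass puts the load and the subtract in the first group and the add and the store in the second, matching the group comments on the slide.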

21
Assembly Example: Predicated Code
Consider the following program with branches
  • if (a && b)
  • j = j + 1
  • else
  • if (c)
  • k = k + 1
  • else
  • k = k - 1
  • i = i + 1

22
Assembly Example: Predicated Code
Pentium Assembly Code
      cmp a, 0    ; compare with 0
      je L1       ; branch to L1 if a = 0
      cmp b, 0
      je L1
      add j, 1    ; j = j + 1
      jmp L3
  L1: cmp c, 0
      je L2
      add k, 1    ; k = k + 1
      jmp L3
  L2: sub k, 1    ; k = k - 1
  L3: add i, 1    ; i = i + 1
  • Source Code
  • if (a && b)
  • j = j + 1
  • else
  • if (c)
  • k = k + 1
  • else
  • k = k - 1
  • i = i + 1

23
Assembly Example: Predicated Code
Pentium Code
      cmp a, 0
      je L1
      cmp b, 0
      je L1
      add j, 1
      jmp L3
  L1: cmp c, 0
      je L2
      add k, 1
      jmp L3
  L2: sub k, 1
  L3: add i, 1
IA-64 Code
       cmp.eq p1, p2 = 0, a
  (p2) cmp.eq p1, p3 = 0, b
  (p3) add j = 1, j
  (p1) cmp.ne p4, p5 = 0, c
  (p4) add k = 1, k
  (p5) add k = -1, k
       add i = 1, i
  • Source Code
  • if (a && b)
  • j = j + 1
  • else
  • if (c)
  • k = k + 1
  • else
  • k = k - 1
  • i = i + 1
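
To see that the predicated IA-64 sequence computes the same result as the branching source, the following sketch simulates it one instruction at a time. It assumes that all predicates start out false and that a compare whose qualifying predicate is false leaves both of its target predicates unchanged; the function names are illustrative.

```python
from itertools import product


def branching(a, b, c, i, j, k):
    """The source program, written with ordinary branches."""
    if a and b:
        j = j + 1
    elif c:
        k = k + 1
    else:
        k = k - 1
    i = i + 1
    return i, j, k


def predicated(a, b, c, i, j, k):
    """The IA-64 sequence: every instruction is guarded by a predicate."""
    p = [False] * 6                      # p[1]..p[5], assumed initially clear
    p[1], p[2] = (a == 0), (a != 0)      #      cmp.eq p1, p2 = 0, a
    if p[2]:
        p[1], p[3] = (b == 0), (b != 0)  # (p2) cmp.eq p1, p3 = 0, b
    if p[3]:
        j = 1 + j                        # (p3) add j = 1, j
    if p[1]:
        p[4], p[5] = (c != 0), (c == 0)  # (p1) cmp.ne p4, p5 = 0, c
    if p[4]:
        k = 1 + k                        # (p4) add k = 1, k
    if p[5]:
        k = -1 + k                       # (p5) add k = -1, k
    i = 1 + i                            #      add i = 1, i
    return i, j, k


# Compare both versions for every combination of zero and nonzero inputs.
all_match = all(
    branching(a, b, c, 0, 0, 0) == predicated(a, b, c, 0, 0, 0)
    for a, b, c in product([0, 1], repeat=3)
)
```

The Python `if` statements stand in for the hardware discarding the result of an instruction whose predicate is 0; no control flow in the original sense remains in the IA-64 code.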

24
Example of Predication
25
Data Speculation
  • Load data from memory before needed
  • What might go wrong?
  • Load moved before store that might alter memory
    location
  • Need subsequent check in value

26
Assembly Example: Control Speculation
Consider the following program
  • (p1) br some_label // cycle 0
  • ld8 r1 = [r5] // cycle 0 (indirect
    memory op 2 cycles)
  • add r1 = r1, r3 // cycle 2

27
Assembly Example: Control Speculation
Consider the following program
Original code
  • (p1) br some_label //cycle 0
  • ld8 r1 = [r5] //cycle 0
  • add r1 = r1, r3 //cycle 2
Speculated code
  • ld8.s r1 = [r5] //cycle -2
  • // other instructions
  • (p1) br some_label //cycle 0
  • chk.s r1, recovery //cycle 0
  • add r2 = r1, r3 //cycle 0
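
The idea behind the `ld8.s` / `chk.s` pair can be sketched as follows: the speculative load is hoisted above the branch and, instead of faulting, marks its result as invalid; the check after the branch runs recovery code only if the value is actually needed. In this sketch memory is a dictionary, a missing key stands in for a faulting address, and the `nat` flag stands in for the deferred-exception marker; all names are assumptions of the sketch.

```python
def ld_s(memory, addr):
    """Speculative load: defer a fault by returning (value, nat) with nat=True."""
    if addr not in memory:
        return None, True      # fault deferred, result marked invalid
    return memory[addr], False


def run_speculated(memory, r5, r3, p1):
    """Speculated code from the slide; returns r2, or None if the branch is taken."""
    r1, nat = ld_s(memory, r5)  # ld8.s r1 = [r5]   hoisted above the branch
    if p1:                      # (p1) br some_label
        return None             # load result is never used, fault never seen
    if nat:                     # chk.s r1, recovery
        r1 = memory[r5]         # recovery: redo the load non-speculatively
                                # (raises KeyError here, the real fault)
    return r1 + r3              # add r2 = r1, r3
```

With a valid address the early load simply hides the memory latency. With a bad address the fault surfaces only when the branch is not taken, which is exactly when the original code would have faulted.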

28
Assembly Example: Data Speculation
Consider the following program
  • st8 [r4] = r12 //cycle 0
  • ld8 r6 = [r8] //cycle 0 (indirect memory op
    2 cycles)
  • add r5 = r6, r7 //cycle 2
  • st8 [r18] = r5 //cycle 3

What if r4 and r8 point to the same address?
29
Assembly Example: Data Speculation
Consider the following program
Without data speculation
  • st8 [r4] = r12 //cycle 0
  • ld8 r6 = [r8] //cycle 0
  • add r5 = r6, r7 //cycle 2
  • st8 [r18] = r5 //cycle 3
With data speculation
  • ld8.a r6 = [r8] //cycle -2, adv
  • // other instructions
  • st8 [r4] = r12 //cycle 0
  • ld8.c r6 = [r8] //cycle 0, check
  • add r5 = r6, r7 //cycle 0
  • st8 [r18] = r5 //cycle 1
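
A minimal sketch of how the advanced load and the check load can cooperate: the advanced load records its target register and address in a small table, every store removes entries for the address it writes, and the check load reloads only if its entry is gone. The table here is a plain dictionary and the class and method names are assumptions; real hardware tracks this in a dedicated structure with more detail.

```python
class Machine:
    """Toy model of advanced loads: registers, memory, and a load table."""

    def __init__(self, memory):
        self.mem = dict(memory)
        self.reg = {}
        self.table = {}                 # register name -> address it was loaded from

    def ld_a(self, dest, addr):         # ld8.a dest = [addr]
        self.reg[dest] = self.mem[addr]
        self.table[dest] = addr

    def st(self, addr, value):          # st8 [addr] = value
        self.mem[addr] = value
        # A store to a tracked address invalidates the matching entries.
        self.table = {r: a for r, a in self.table.items() if a != addr}

    def ld_c(self, dest, addr):         # ld8.c dest = [addr]
        if dest in self.table:
            return False                # entry survived: early value is still good
        self.reg[dest] = self.mem[addr]
        return True                     # entry was removed: reload


def without_speculation(mem, r4, r8, r12, r7, r18):
    mem = dict(mem)
    mem[r4] = r12                       # st8 [r4] = r12
    r6 = mem[r8]                        # ld8 r6 = [r8]
    mem[r18] = r6 + r7                  # add r5 = r6, r7 ; st8 [r18] = r5
    return mem


def with_speculation(mem, r4, r8, r12, r7, r18):
    m = Machine(mem)
    m.ld_a("r6", r8)                    # load hoisted above the store
    m.st(r4, r12)
    reloaded = m.ld_c("r6", r8)
    m.st(r18, m.reg["r6"] + r7)
    return m.mem, reloaded
```

When r4 and r8 hold different addresses the check finds its entry and nothing is reloaded; when they hold the same address the store removes the entry and the check load fetches the new value, so both versions leave memory in the same state.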


30
Assembly Example: Data Speculation
Data Dependencies and Speculation
Speculation
  • ld8.a r6 = [r8] //cycle -2
  • // other instructions
  • st8 [r4] = r12 //cycle 0
  • ld8.c r6 = [r8] //cycle 0
  • add r5 = r6, r7 //cycle 0
  • st8 [r18] = r5 //cycle 1
Speculation with data dependency
  • ld8.a r6 = [r8] //cycle -3, adv ld
  • // other instructions
  • add r5 = r6, r7 //cycle -1, uses r6
  • // other instructions
  • st8 [r4] = r12 //cycle 0
  • chk.a r6, recover //cycle 0, check
  • back: //return pt
  • st8 [r18] = r5 //cycle 0
  • recover:
  • ld8 r6 = [r8] //get r6 from [r8]
  • add r5 = r6, r7 //re-execute
  • br back //jump back
31
Software Pipelining
  • // y[i] = x[i] + c
  • L1: ld4 r4 = [r5], 4 //cycle 0 load postinc 4
  • add r7 = r4, r9 //cycle 2
  • st4 [r6] = r7, 4 //cycle 3 store postinc 4
  • br.cloop L1 //cycle 3
  • Adds a constant to one vector and stores the result in
    another
  • No opportunity for instruction level parallelism
    in one iteration
  • Instructions in iteration x are all executed before
    iteration x+1 begins
  • If there are no address conflicts between loads and
    stores, independent instructions can be moved from
    iteration x+1 to iteration x

32
Pipeline - Unrolled Loop, Pipeline Display
Original Loop
  • L1: ld4 r4 = [r5], 4 //cycle 0 load postinc 4
  • add r7 = r4, r9 //cycle 2
  • st4 [r6] = r7, 4 //cycle 3 store postinc 4
  • br.cloop L1 //cycle 3
  • Unrolled loop
  • ld4 r32 = [r5], 4 //cycle 0
  • ld4 r33 = [r5], 4 //cycle 1
  • ld4 r34 = [r5], 4 //cycle 2
  • add r36 = r32, r9 //cycle 2
  • ld4 r35 = [r5], 4 //cycle 3
  • add r37 = r33, r9 //cycle 3
  • st4 [r6] = r36, 4 //cycle 3
  • ld4 r36 = [r5], 4 //cycle 4
  • add r38 = r34, r9 //cycle 4
  • st4 [r6] = r37, 4 //cycle 4
  • add r39 = r35, r9 //cycle 5
  • st4 [r6] = r38, 4 //cycle 5
  • add r40 = r36, r9 //cycle 6
  • st4 [r6] = r39, 4 //cycle 6
  • st4 [r6] = r40, 4 //cycle 7

Pipeline Display
33
Unrolled Loop Observations
  • Completes 5 iterations in 7 cycles
  • Compared with 20 cycles in original code
  • Assumes two memory ports
  • Load and store can be done in parallel

34
Support For Software Pipelining
  • Automatic register renaming
  • Fixed-size areas of the predicate and FP register files
    (p16-p63, fr32-fr127) and a programmable-size area
    of the GP register file (max r32-r127) are capable of
    rotation
  • Loop using r32 on first iteration automatically
    uses r33 on second
  • Predication
  • Each instruction in loop predicated on rotating
    predicate register
  • Determines whether pipeline is in prolog, kernel,
    or epilog
  • Special loop termination instructions
  • Branch instructions that cause registers to
    rotate and loop counter to decrement
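
The three mechanisms above can be sketched together for the y[i] = x[i] + c loop. In this toy model a rotating file maps logical register k to a physical slot through a base that is decremented on every rotation, so a value written to logical register 0 is read as logical register 1 after one rotation. Stage predicates rotate the same way and switch the load, add, and store stages on and off, which produces the prolog, kernel, and epilog without separate code. One kernel iteration per cycle, the file size of 4, and all names are assumptions of the sketch.

```python
class RotatingFile:
    """Register file whose logical-to-physical mapping shifts on each rotation."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0                       # rotating register base

    def _index(self, logical):
        return (logical + self.base) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # After rotation, what was logical k is visible as logical k + 1.
        self.base = (self.base - 1) % len(self.phys)


def pipelined_add(x, c):
    """Software-pipelined y[i] = x[i] + c with load/add/store in stages 0/2/3."""
    data, sums, preds = RotatingFile(4), RotatingFile(4), RotatingFile(4)
    for k in range(4):
        preds.write(k, False)               # no stage active before the loop
    y, next_load = [], 0
    for _cycle in range(len(x) + 3):        # 3 extra cycles drain the pipeline
        preds.write(0, next_load < len(x))  # new iteration starts while input remains
        if preds.read(3):                   # store stage (3 cycles after the load)
            y.append(sums.read(1))
        if preds.read(2):                   # add stage (2 cycles after the load)
            sums.write(0, data.read(2) + c)
        if preds.read(0):                   # load stage
            data.write(0, x[next_load])
            next_load += 1
        # Stands in for the loop branch that rotates the registers.
        data.rotate()
        sums.rotate()
        preds.rotate()
    return y


result = pipelined_add([1, 2, 3, 4, 5], 10)
```

The loop body never names a different register per iteration; the rotation does the renaming that the unrolled version had to spell out by hand with r32 through r40.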

35
Intel's Itanium Implements the IA-64