Cosc 2150 - PowerPoint PPT Presentation

Slides: 38
Provided by: csUwyoEd
Learn more at: https://www.cs.uwyo.edu
Transcript and Presenter's Notes

Title: Cosc 2150


1
Cosc 2150
  • Chapter 9 a
  • Instruction Level Parallelism
  • and Superscalar Processors

2
Introduction
  • Before we can look at different methods that are
    used to increase the speed of processors
  • We need to take a closer look at the
    fetch/execute cycle

3
Micro-Operations
  • A computer executes a program
  • Fetch/Execute cycle
  • Each cycle has a number of steps
  • see pipelining
  • Called micro-operations
  • Each step does very little
  • Atomic operation of CPU

4
Constituent Elements of Program Execution
5
Fetch - 4 Registers
  • Memory Address Register (MAR)
  • Connected to address bus
  • Specifies address for read or write op
  • Memory Buffer Register (MBR)
  • Connected to data bus
  • Holds data to write or last data read
  • Program Counter (PC)
  • Holds address of next instruction to be fetched
  • Instruction Register (IR)
  • Holds last instruction fetched

6
Fetch Sequence
  • Address of next instruction is in PC
  • Address (MAR) is placed on address bus
  • Control unit issues READ command
  • Result (data from memory) appears on data bus
  • Data from data bus copied into MBR
  • PC incremented by 1 (in parallel with data fetch
    from memory)
  • Data (instruction) moved from MBR to IR
  • MBR is now free for further data fetches

7
Fetch Sequence (symbolic)
  • (tx = time unit/clock cycle)
  • t1: MAR <- (PC)
  • t2: MBR <- (memory)
  •     PC <- (PC) + 1
  • t3: IR <- (MBR)
  • or
  • t1: MAR <- (PC)
  • t2: MBR <- (memory)
  • t3: PC <- (PC) + 1
  •     IR <- (MBR)
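
The first grouping above can be traced in a few lines of Python. This is only a sketch: the dict-based CPU model, the function name fetch_cycle, and the memory contents are assumptions made for illustration, not part of the slides.

```python
def fetch_cycle(cpu, memory):
    """Apply the three fetch time units to a dict-based CPU model."""
    # t1: MAR <- (PC)
    cpu["MAR"] = cpu["PC"]
    # t2: MBR <- (memory); PC <- (PC) + 1
    # These two micro-operations touch different registers,
    # so they can share one time unit.
    cpu["MBR"] = memory[cpu["MAR"]]
    cpu["PC"] = cpu["PC"] + 1
    # t3: IR <- (MBR)
    cpu["IR"] = cpu["MBR"]
    return cpu


# Hypothetical memory image: address -> instruction text
memory = {100: "ADD R1,X", 101: "BSA 300"}
cpu = fetch_cycle({"PC": 100, "MAR": None, "MBR": None, "IR": None}, memory)
print(cpu)
```

After the call, IR holds the instruction that was at the old PC, and PC already points at the next one.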

8
Rules for Clock Cycle Grouping
  • Proper sequence must be followed
  • MAR <- (PC) must precede MBR <- (memory)
  • Conflicts must be avoided
  • Must not read and write the same register at the same time
  • MBR <- (memory) and IR <- (MBR) must not be in the same
    cycle
  • Also, PC <- (PC) + 1 involves addition
  • Use ALU
  • May need additional micro-operations
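
The conflict rule can be checked mechanically. The sketch below assumes each micro-operation is written as a (destination, sources) pair and that a memory read uses MAR as a source. The function name conflicts and this encoding are illustrative choices, not something the slides define.

```python
def conflicts(group):
    """Registers written by one micro-op and read or written by a
    different micro-op in the same time unit."""
    bad = set()
    for i, (dest_i, _) in enumerate(group):
        for j, (dest_j, srcs_j) in enumerate(group):
            if i != j and (dest_i in srcs_j or dest_i == dest_j):
                bad.add(dest_i)
    return bad


# t2 of the fetch cycle: MBR <- (memory) together with PC <- (PC) + 1
ok_group = [("MBR", ["MAR", "memory"]), ("PC", ["PC"])]
# Illegal grouping: MBR <- (memory) together with IR <- (MBR)
bad_group = [("MBR", ["MAR", "memory"]), ("IR", ["MBR"])]

print(conflicts(ok_group))
print(conflicts(bad_group))
```

The first group reports no conflicts. The second reports MBR, which is why IR <- (MBR) has to wait for a later time unit.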

9
Indirect Cycle
  • MAR <- (IR(address)) - address field of IR
  • MBR <- (memory)
  • IR(address) <- (MBR(address))
  • MBR contains an address
  • IR is now in same state as if direct addressing
    had been used

10
Interrupt Cycle
  • t1: MBR <- (PC)
  • t2: MAR <- save-address
  •     PC <- routine-address
  • t3: memory <- (MBR)
  • This is a minimum
  • May be additional micro-ops to get addresses
  • N.B. saving context is done by interrupt handler
    routine, not micro-ops

11
Execute Cycle
  • Different for each instruction
  • In general, complete the task of the instruction
  • Example
  • ADD R1,X - add the contents of location X to
    Register 1, result in R1
  • t1: MAR <- (IR(address))
  • t2: MBR <- (memory)
  • t3: R1 <- (R1) + (MBR)

12
Execute Cycle (BSA)
  • BSA X - Branch and save address
  • Address of instruction following BSA is saved in
    X
  • Execution continues from X+1
  • t1: MAR <- (IR(address))
  •     MBR <- (PC)
  • t2: PC <- (IR(address))
  •     memory <- (MBR)
  • t3: PC <- (PC) + 1
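
A short trace makes the effect of BSA concrete. This is a sketch under assumptions: the dict-based CPU model, the function name execute_bsa, and the addresses 101 and 300 are hypothetical values chosen for the example.

```python
def execute_bsa(cpu, memory):
    """Execute cycle for BSA X on a dict-based CPU model."""
    # t1: MAR <- (IR(address)); MBR <- (PC)
    cpu["MAR"] = cpu["IR"]["address"]
    cpu["MBR"] = cpu["PC"]
    # t2: PC <- (IR(address)); memory <- (MBR)
    cpu["PC"] = cpu["IR"]["address"]
    memory[cpu["MAR"]] = cpu["MBR"]
    # t3: PC <- (PC) + 1
    cpu["PC"] = cpu["PC"] + 1
    return cpu


memory = {}
# PC already points past the BSA instruction (fetch incremented it to 101)
cpu = {"PC": 101, "IR": {"opcode": "BSA", "address": 300},
       "MAR": None, "MBR": None}
execute_bsa(cpu, memory)
print(memory[300], cpu["PC"])
```

The return address 101 ends up stored at location 300, and execution continues from 301.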

13
Instruction Cycle
  • Each phase decomposed into sequence of elementary
    micro-operations
  • E.g. fetch, indirect, and interrupt cycles
  • Execute cycle
  • One sequence of micro-operations for each opcode
  • Need to tie sequences together
  • Assume new 2-bit register
  • Instruction cycle code (ICC) designates which
    part of cycle processor is in
  • 00 Fetch
  • 01 Indirect
  • 10 Execute
  • 11 Interrupt
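
One way to picture how the ICC ties the sequences together is as a small state machine. The transitions below are an assumption based on the usual textbook flow (fetch, optional indirect, execute, optional interrupt). The slide gives only the four codes, and the function name next_icc and its parameters are hypothetical.

```python
FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11


def next_icc(icc, indirect_addressing=False, interrupt_pending=False):
    """Return the ICC value for the next phase (assumed transition rules)."""
    if icc == FETCH:
        # An indirect address must be resolved before executing
        return INDIRECT if indirect_addressing else EXECUTE
    if icc == INDIRECT:
        return EXECUTE
    if icc == EXECUTE:
        # Interrupts are serviced between instructions
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH  # after the interrupt cycle, fetch the handler's first instruction


# Walk one instruction that uses indirect addressing and is followed by an interrupt
state = FETCH
trace = [state]
state = next_icc(state, indirect_addressing=True)
trace.append(state)
state = next_icc(state)
trace.append(state)
state = next_icc(state, interrupt_pending=True)
trace.append(state)
state = next_icc(state)
trace.append(state)
print([format(s, "02b") for s in trace])
```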

14
What is Superscalar?
  • Common instructions (arithmetic, load/store,
    conditional branch) can be initiated and executed
    independently
  • Equally applicable to RISC and CISC
  • In practice usually RISC

15
General Superscalar Organization
16
Superpipelined
  • Many pipeline stages need less than half a clock
    cycle
  • Double internal clock speed gets two tasks per
    external clock cycle
  • Superscalar allows parallel fetch and execute

17
Superscalar v Superpipeline
18
Limitations
  • Instruction level parallelism
  • Compiler based optimisation
  • Hardware techniques
  • Limited by
  • True data dependency
  • Procedural dependency
  • Resource conflicts
  • Output dependency
  • Antidependency

19
True Data Dependency
  • ADD r1, r2 (r1 := r1 + r2)
  • MOVE r3, r1 (r3 := r1)
  • Can fetch and decode second instruction in
    parallel with first
  • Can NOT execute second instruction until first is
    finished
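
The stall caused by a true data dependency can be sketched with a tiny scheduler. It assumes unlimited functional units, a one-cycle latency, and instructions encoded as (destination, sources) pairs. These are all simplifications for illustration, and the function name schedule is hypothetical. It models true dependencies only.

```python
def schedule(instrs):
    """Earliest execute cycle per instruction, honouring only true
    (read-after-write) dependencies."""
    ready = {}    # register -> cycle in which its newest value is available
    cycles = []
    for dest, srcs in instrs:
        # An instruction can start once all of its sources are available
        start = max((ready.get(r, 0) for r in srcs), default=0)
        cycles.append(start)
        ready[dest] = start + 1
    return cycles


dependent = [("r1", ["r1", "r2"]),   # ADD r1, r2
             ("r3", ["r1"])]         # MOVE r3, r1 (needs the new r1)
independent = [("r1", ["r1", "r2"]),
               ("r3", ["r4"])]       # hypothetical MOVE r3, r4

print(schedule(dependent))
print(schedule(independent))
```

The dependent pair executes in cycles 0 and 1, while the independent pair can both execute in cycle 0.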

20
Procedural Dependency
  • Cannot execute instructions after a branch in
    parallel with instructions before a branch
  • Also, if instruction length is not fixed,
    instructions have to be decoded to find out how
    many fetches are needed
  • This prevents simultaneous fetches

21
Resource Conflict
  • Two or more instructions requiring access to the
    same resource at the same time
  • e.g. two arithmetic instructions
  • Can duplicate resources
  • e.g. have two arithmetic units

22
Effect of Dependencies
23
Design Issues
  • Instruction level parallelism
  • Instructions in a sequence are independent
  • Execution can be overlapped
  • Governed by data and procedural dependency
  • Machine Parallelism
  • Ability to take advantage of instruction level
    parallelism
  • Governed by number of parallel pipelines

24
Instruction Issue Policy
  • Order in which instructions are fetched
  • Order in which instructions are executed
  • Order in which instructions change registers and
    memory

25
In-Order Issue In-Order Completion
  • Issue instructions in the order they occur
  • Not very efficient
  • May fetch >1 instruction
  • Instructions must stall if necessary

26
In-Order Issue In-Order Completion (Diagram)
27
In-Order Issue Out-of-Order Completion
  • Output dependency
  • R3 := R3 + R5 (I1)
  • R4 := R3 + 1 (I2)
  • R3 := R5 + 1 (I3)
  • I2 depends on result of I1 - data dependency
  • If I3 completes before I1, R3 is left holding the
    result of I1 instead of I3 - output (write-write)
    dependency

28
In-Order Issue Out-of-Order Completion (Diagram)
29
Out-of-Order Issue Out-of-Order Completion
  • Decouple decode pipeline from execution pipeline
  • Can continue to fetch and decode until this
    pipeline is full
  • When a functional unit becomes available an
    instruction can be executed
  • Since instructions have been decoded, processor
    can look ahead

30
Out-of-Order Issue Out-of-Order Completion (Diagram)
31
Antidependency
  • Read-write (write-after-read) dependency
  • R3 := R3 + R5 (I1)
  • R4 := R3 + 1 (I2)
  • R3 := R5 + 1 (I3)
  • R7 := R3 + R4 (I4)
  • I3 cannot complete before I2 starts as I2 needs
    a value in R3 and I3 changes R3
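
The three kinds of register dependency can be told apart by comparing what two instructions read and write. The sketch below uses the standard hazard names: true = read-after-write (RAW), anti = write-after-read (WAR), output = write-after-write (WAW). It encodes each instruction as a (destination, sources) pair with constants omitted. The function name classify is hypothetical, and the pairwise check is a simplification that ignores any writes by instructions in between.

```python
def classify(earlier, later):
    """Dependencies of `later` on `earlier`, given program order."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("true (RAW)")     # later reads what earlier writes
    if l_dest in e_srcs:
        found.add("anti (WAR)")     # later overwrites what earlier reads
    if e_dest == l_dest:
        found.add("output (WAW)")   # both write the same register
    return found


I1 = ("R3", ["R3", "R5"])   # R3 := R3 + R5
I2 = ("R4", ["R3"])         # R4 := R3 + 1
I3 = ("R3", ["R5"])         # R3 := R5 + 1
I4 = ("R7", ["R3", "R4"])   # R7 := R3 + R4

print(classify(I1, I2))
print(classify(I2, I3))
print(classify(I1, I3))
print(classify(I3, I4))
```

I2 has a true dependency on I1, I3 has an antidependency on I2, and I3 has an output dependency on I1 (plus an antidependency, because I1 also reads R3).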

32
Register Renaming
  • Output and antidependencies occur because
    register contents may not reflect the correct
    ordering from the program
  • May result in a pipeline stall
  • Registers allocated dynamically
  • i.e. registers are not specifically named

33
Register Renaming example
  • R3b := R3a + R5a (I1)
  • R4b := R3b + 1 (I2)
  • R3c := R5a + 1 (I3)
  • R7b := R3c + R4b (I4)
  • Without subscript refers to logical register in
    instruction
  • With a subscript then hardware register allocated
  • Note R3a R3b R3c
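
The renaming in this example can be reproduced with a simple rule: read the current version of every source register, then allocate a fresh version for the destination. The sketch below is one possible way to do that, not a description of real hardware. The function name rename, the letter suffixes, and the (destination, sources) encoding with constants omitted are assumptions for illustration.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def rename(program):
    """Give every register write a fresh hardware register name."""
    version = {}   # logical register -> index of its current version

    def current(reg):
        # A register that has not been written yet is at version "a"
        return reg + LETTERS[version.setdefault(reg, 0)]

    renamed = []
    for dest, srcs in program:
        new_srcs = [current(s) for s in srcs]             # read before allocating
        version[dest] = version.setdefault(dest, 0) + 1   # fresh destination
        renamed.append((current(dest), new_srcs))
    return renamed


program = [("R3", ["R3", "R5"]),   # I1
           ("R4", ["R3"]),         # I2
           ("R3", ["R5"]),         # I3
           ("R7", ["R3", "R4"])]   # I4

result = rename(program)
for dest, srcs in result:
    print(dest, "<-", srcs)
```

Because I3 now writes R3c rather than the register that I1 wrote (R3b) and I2 reads, the output dependency and antidependency disappear and only the true dependencies remain.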

34
Speedups of Machine Organizations Without
Procedural Dependencies
35
Machine Parallelism
  • Duplication of Resources
  • Out of order issue
  • Renaming
  • Not worth duplicating functions without register
    renaming
  • Need instruction window large enough (more than
    8)

36
Superscalar Implementation
  • Simultaneously fetch multiple instructions
  • Logic to determine true dependencies involving
    register values
  • Mechanisms to communicate these values
  • Mechanisms to initiate multiple instructions in
    parallel
  • Resources for parallel execution of multiple
    instructions
  • Mechanisms for committing process state in
    correct order

37
Q & A