Instruction Level Distributed Processing - PowerPoint PPT Presentation

Provided by: qia73

1
Instruction Level Distributed Processing
  • Wenyin Fu
  • ECE Department
  • University of Wisconsin - Madison

2
History
  • ENIAC - The world's first general-purpose
    electronic computer
  • It was programmed by manually plugging cables
    and setting switches
  • EDVAC - Stored-program concept by John von
    Neumann

3
von Neumann architecture
4
von Neumann architecture
  • Store both data and instructions in the memory
  • A one-dimensional single memory
  • Sequential execution of instructions dictated by
    program order

5
Single cycle machine
  • Instruction fetch
  • Instruction decode and read register
  • Execution
  • Data access
  • Write register

All of these are done in a single cycle, which means
the cycle time must be long enough to accommodate
the longest operation!
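
As a rough illustration of the point above, the sketch below derives a single-cycle clock period from per-stage latencies. The instruction classes, the stages each one uses, and the nanosecond figures are all made-up assumptions for illustration, not numbers from the talk.

```python
# Hypothetical per-stage latencies in nanoseconds (illustrative values only).
STAGE_NS = {"fetch": 2, "decode": 1, "execute": 2, "memory": 2, "writeback": 1}

# Assumed mapping from instruction class to the stages it actually needs.
USES = {
    "load":   ["fetch", "decode", "execute", "memory", "writeback"],
    "store":  ["fetch", "decode", "execute", "memory"],
    "alu":    ["fetch", "decode", "execute", "writeback"],
    "branch": ["fetch", "decode", "execute"],
}

def latency_ns(kind):
    """Time one instruction of this class needs from fetch to completion."""
    return sum(STAGE_NS[stage] for stage in USES[kind])

def single_cycle_period_ns():
    """A single-cycle machine must clock at the slowest instruction's latency."""
    return max(latency_ns(kind) for kind in USES)

if __name__ == "__main__":
    for kind in USES:
        print(kind, latency_ns(kind), "ns")
    print("clock period:", single_cycle_period_ns(), "ns")
```

With these assumed numbers the load sets the clock period, so a branch that needs less time still occupies the full cycle.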
6
Multi cycle machine
  • Not every instruction takes the same time
  • A single-cycle design wastes a lot of resources
  • Break the operation of every instruction into
    small steps, each taking one cycle
  • Control logic counts the cycles based on decoded
    information
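
A minimal sketch of the multi-cycle idea, assuming hypothetical step counts per instruction class: each instruction is charged only for the steps it uses, and the result is compared with charging every instruction for the longest one. The step counts and the example program are assumptions, not figures from the presentation.

```python
# Hypothetical number of one-cycle steps per instruction class (assumed values).
STEPS = {"load": 5, "store": 4, "alu": 4, "branch": 3}

def multi_cycle_cycles(program):
    """Total cycles when each instruction takes only the steps it needs."""
    return sum(STEPS[kind] for kind in program)

def worst_case_cycles(program):
    """Total cycles if every instruction were charged for the longest one,
    counted in the same short cycle so the two numbers are comparable."""
    return len(program) * max(STEPS.values())

# A made-up instruction sequence used only for the comparison.
program = ["load", "alu", "alu", "branch", "store"]

if __name__ == "__main__":
    print("multi-cycle:", multi_cycle_cycles(program))
    print("worst case :", worst_case_cycles(program))
```

The saving depends entirely on the instruction mix: a program made only of loads would see no difference in this model.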

7
Pipelining
  • Overlap the execution of multiple instructions
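
The benefit of overlapping can be sketched with the standard ideal-pipeline count: with k stages and no stalls, n instructions finish in k + n - 1 cycles instead of n * k. The function names are hypothetical, and the model assumes one instruction enters the pipe per cycle with no hazards.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """Ideal pipeline: the first instruction takes n_stages cycles to drain,
    then one instruction completes every cycle after that."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

if __name__ == "__main__":
    n, k = 100, 5
    print("unpipelined:", unpipelined_cycles(n, k))
    print("pipelined  :", pipelined_cycles(n, k))
```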

8
Control hazard and data hazard
  • What to do when the branch condition isn't known
    yet? Stall the pipe. Easy but crude.
  • What to do if the operands you need have not yet
    been produced by the previous instruction? Stall
    the pipe? Why not let independent instructions go
    ahead?
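
To make the cost of stalling concrete, here is a small sketch of in-order issue under a simplified model: one instruction issues per cycle at most, and an instruction waits until all of its source registers are ready. The `Instr` type, the register names, and the latencies are hypothetical.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction
    latency: int            # assumed cycles from issue until the result is ready

def in_order_issue_cycles(program):
    """Return the cycle in which each instruction issues, strictly in order."""
    ready_at = {}   # register -> cycle at which its value becomes available
    cycle = 0       # earliest cycle the next instruction may issue
    issued = []
    for ins in program:
        start = max([cycle] + [ready_at.get(s, 0) for s in ins.srcs])
        issued.append(start)
        ready_at[ins.dest] = start + ins.latency
        cycle = start + 1   # later instructions cannot pass this one
    return issued

program = [
    Instr("r1", ("r0",), 3),        # load r1 <- [r0]
    Instr("r2", ("r1", "r1"), 1),   # add that needs the loaded value
    Instr("r3", ("r4", "r5"), 1),   # independent add
]

if __name__ == "__main__":
    print(in_order_issue_cycles(program))
```

In this model the independent add issues in cycle 4 even though its operands were ready from the start, because it is stuck behind the stalled add.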

9
Dynamic execution
  • Start executing instruction when its operands are
    ready, not necessarily in program order
  • Need some mechanism to ensure the sequential
    semantics, e.g. ROB (Reorder buffer)
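
A minimal sketch of the idea, under strong simplifying assumptions: instructions are fetched one per cycle, each starts as soon as its operands are ready, execution resources are unlimited, and only true (read-after-write) dependences are tracked. In-order commit stands in for what a reorder buffer enforces. The `Instr` type and the latencies are hypothetical.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction
    latency: int            # assumed cycles from start until the result is ready

def dynamic_schedule(program):
    """Return (start, commit) cycle lists for out-of-order start, in-order commit."""
    ready_at = {}      # register -> cycle at which its value becomes available
    start, commit = [], []
    last_commit = 0
    for i, ins in enumerate(program):
        # Instruction i is fetched in cycle i at the earliest.
        s = max([i] + [ready_at.get(r, 0) for r in ins.srcs])
        done = s + ins.latency
        ready_at[ins.dest] = done
        # Commit in program order: never before any older instruction.
        last_commit = max(last_commit, done)
        start.append(s)
        commit.append(last_commit)
    return start, commit

program = [
    Instr("r1", ("r0",), 3),        # load r1 <- [r0]
    Instr("r2", ("r1", "r1"), 1),   # add that needs the loaded value
    Instr("r3", ("r4", "r5"), 1),   # independent add
]

if __name__ == "__main__":
    print(dynamic_schedule(program))
```

Here the independent add starts in cycle 2, ahead of the dependent add in cycle 3, yet it commits no earlier than the instructions before it, which preserves the sequential semantics.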

10
Dynamic execution
11
Superscalar processor
  • Why issue only one instruction every cycle?
  • Issue multiple instructions every cycle
  • Needs more transistors to check control
    dependences (e.g. a conditional branch) and data
    dependences (e.g. an add after a load)
  • Today's CMOS technology makes the big transistor
    budget practical
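
The dependence check that multiple issue requires can be sketched as follows, assuming a simplified rule: instructions at the head of the window issue together until one of them reads a register written earlier in the same group. The function name and the `(dest, srcs)` tuple encoding are hypothetical, and write-after-write conflicts and control dependences are ignored.

```python
def issue_group(window, width):
    """Return how many leading instructions can issue in the same cycle.

    window: list of (dest, srcs) tuples in program order
    width:  maximum number of instructions issued per cycle
    """
    written = set()   # registers written by instructions already in the group
    count = 0
    for dest, srcs in window[:width]:
        if any(src in written for src in srcs):
            break     # needs a value produced inside this group
        written.add(dest)
        count += 1
    return count

window = [
    ("r1", ("r0",)),   # load r1 <- [r0]
    ("r2", ("r3",)),   # independent
    ("r4", ("r1",)),   # add after load: reads r1 written above
    ("r5", ("r6",)),   # independent, but behind the blocked add
]

if __name__ == "__main__":
    print(issue_group(window, 4))
```

Each candidate's sources are compared against every earlier destination in the group, so the number of comparisons grows roughly with the square of the issue width, which is one reason wider issue costs more transistors.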

12
Speculation
  • What do we do when we meet a conditional branch
    instruction? Wait until the condition is
    resolved? NO.
  • Use a predictor to guess where the program will
    go next.
  • Needs a recovery mechanism when the prediction is
    wrong.
  • Predictor accuracy is important because the
    penalty for a mispredicted branch in a deeply
    pipelined processor can be overwhelming.

13
ILDP: the future paradigm?
  • What do we want to build with 1 billion
    transistors? -> Put more stuff on the chip
  • Wire delay is increasing -> Design the chip
    in a distributed way
  • Reducing cycle time is still a useful way to
    speed up the processor -> Simple logic

14
Problems
  • Synchronization is the key.
  • Where to synchronize?
  • How to synchronize?
  • Is it possible to implement the design?
  • What is the impact on performance?

15
An ILDP processor
16
Issues of a distributed front-end
  • Bubbles caused by switches among branch
    predictors
  • Bubbles caused by misaligned branch instructions
  • Two extra pipe stages increase the misprediction
    penalty

Quantitatively, how big are they?
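
Before turning to simulation, a back-of-the-envelope model gives a feel for the last item. It uses the usual first-order estimate that each mispredicted branch adds a fixed number of penalty cycles. Every number below is a made-up assumption, not a result from the talk, and the model ignores the two kinds of bubbles listed above.

```python
def cpi(base_cpi, branches_per_instr, mispredict_rate, penalty_cycles):
    """First-order cycles per instruction with a branch misprediction penalty."""
    return base_cpi + branches_per_instr * mispredict_rate * penalty_cycles

if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted.
    baseline = cpi(1.0, 0.2, 0.05, penalty_cycles=10)
    # Same workload with two extra front-end stages added to the penalty.
    distributed = cpi(1.0, 0.2, 0.05, penalty_cycles=12)
    print("baseline CPI   :", round(baseline, 3))
    print("distributed CPI:", round(distributed, 3))
    print("slowdown       :", round(distributed / baseline, 4))
```

A closed-form estimate like this cannot capture how the effects interact, which is the reason a cycle-accurate simulation is needed.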
17
Simulation
  • Quantitative study calls for simulation
  • Needs to be accurate in terms of cycle counts
  • Faithfully simulate the microarchitecture
  • Easy to configure and debug
  • Speed is also a consideration

18
Simulation results
19
Performance loss breakdown