Ch 6: Pipelining (EE30332, modified from Dave Patterson's notes)
1
Ch 6: Pipelining (Modified from Dave Patterson's notes)
  • Laundry Example
  • Ann, Brian, Cathy, Dave each have one load of
    clothes to wash, dry, and fold
  • Washer takes 30 minutes
  • Dryer takes 30 minutes
  • Folder takes 30 minutes
  • Stasher takes 30 minutes to put clothes into
    drawers

2
Sequential Laundry
(Figure: timeline from 6 PM to 2 AM; each load occupies the washer, dryer, folder, and stasher for 30 minutes apiece, and one load finishes completely before the next begins.)
  • Sequential laundry takes 8 hours for 4 loads
  • If they learned pipelining, how long would
    laundry take?

3
Pipelined Laundry: Start Work ASAP
(Figure: timeline from 6 PM to 9:30 PM; each load starts as soon as the washer is free, so the four loads overlap across the washer, dryer, folder, and stasher.)
  • Pipelined laundry takes 3.5 hours for 4 loads!

4
Pipelining Lessons
  • Pipelining doesn't help the latency of a single
    task; it helps the throughput of the entire workload
  • Multiple tasks operate simultaneously using
    different resources
  • Potential speedup = number of pipe stages
  • Pipeline rate is limited by the slowest pipeline stage
  • Unbalanced lengths of pipe stages reduce speedup
  • Time to fill the pipeline and time to drain it
    reduce speedup
  • Stall for dependences
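The lessons above can be checked with a quick calculation (a minimal sketch; the stage count, load count, and stage time are taken from the laundry example):

```python
def sequential_time(n_tasks, n_stages, stage_min):
    # Each task runs all its stages to completion before the next starts.
    return n_tasks * n_stages * stage_min

def pipelined_time(n_tasks, n_stages, stage_min):
    # The first task fills the pipe (n_stages steps); each later task
    # adds one more step once the pipeline is full.
    return (n_stages + n_tasks - 1) * stage_min

seq = sequential_time(4, 4, 30)    # 480 min = 8 hours
pipe = pipelined_time(4, 4, 30)    # 210 min = 3.5 hours
speedup = seq / pipe               # below the 4x stage-count ideal
```

With many loads the fill time is amortized and the speedup approaches the stage count of 4; with only 4 loads it is 480/210 ≈ 2.3.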

5
Pipelining
  • Improve performance by increasing instruction
    throughput
  • To increase throughput, minimize the duration of
    each individual stage
  • One natural way to minimize stage duration is to
    split an instruction into more stages
  • One disadvantage of more stages is branching,
    because a branch alters the otherwise sequential
    instruction flow
  • What do we need to add to actually split the
    datapath into stages?
  • Answer: storage devices (pipeline registers)
    between the stages

6
The Five Stages of Load
  • Ifetch: fetch the instruction from the Instruction Memory
  • Reg/Dec: register fetch and instruction decode
  • Exec: calculate the memory address
  • Mem: read the data from the Data Memory
  • Wr: write the data back to the register file

7
Conventional Pipelined Execution Representation
(Figure: instructions listed down the page against time across the page, each occupying successive stages in successive cycles.)
8
Single Cycle, Multiple Cycle, vs. Pipeline
(Figure: the single-cycle implementation gives every instruction one long clock, so much of the cycle is wasted on fast instructions. The multicycle implementation uses a short clock and several cycles per instruction (Load, Store, R-type). The pipelined implementation overlaps Load, Store, and R-type so one instruction completes per short cycle.)
9
Why Pipeline?
  • Suppose we execute 100 instructions
  • Single-cycle machine
  • 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
  • Multicycle machine
  • 10 ns/cycle x 4.6 CPI (due to inst mix) x 100
    inst = 4600 ns
  • Ideal pipelined machine
  • 10 ns/cycle x (1 CPI x 100 inst + 4 cycles drain)
    = 1040 ns
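These totals can be reproduced directly (a sketch; the cycle times and CPIs are the figures given above):

```python
single_cycle = 45 * 1 * 100        # one 45 ns cycle per instruction
multicycle   = 10 * 4.6 * 100      # 10 ns cycles, 4.6 cycles/inst average
pipelined    = 10 * (1 * 100 + 4)  # 1 inst/cycle plus 4 cycles to drain

print(single_cycle, multicycle, pipelined)  # 4500 4600.0 1040
```

The pipelined machine approaches the multicycle machine's clock with the single-cycle machine's CPI, which is where the ~4.4x win comes from.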

10
Why Pipeline? Because the resources are there!
(Figure: five instructions, Inst 0-4, overlapped so every stage's hardware is busy each cycle.)
11
Can pipelining get us into trouble?
  • Yes: pipeline hazards
  • Structural hazards: attempt to use the same
    resource two different ways at the same time
  • E.g., a combined washer/dryer would be a structural
    hazard, or the folder is busy doing something else
    (watching TV)
  • Data hazards: attempt to use an item before it is
    ready
  • E.g., one sock of a pair is in the dryer and one in
    the washer; can't fold until the sock gets from the
    washer through the dryer
  • an instruction depends on the result of a prior
    instruction still in the pipeline
  • Control hazards: attempt to make a decision
    before the condition is evaluated
  • E.g., washing football uniforms and needing the
    proper detergent level; must see the result after
    the dryer before starting the next load
  • branch instructions
  • Can always resolve hazards by waiting
  • pipeline control must detect the hazard
  • and take action (or delay action) to resolve it

12
Single Memory is a Structural Hazard
(Figure: Load plus Instr 1-4 overlapped; in one cycle, Load's Mem stage and Instr 3's instruction fetch both need the single memory.)
Detection is easy in this case! (right-half highlight means read, left-half means write)
13
Structural Hazards limit performance
  • Example: if there are 1.3 memory accesses per
    instruction and only one memory access is possible
    per cycle, then
  • average CPI ≥ 1.3
  • otherwise the memory would be more than 100% utilized
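The bound can be sketched as a one-liner (the 1.3 accesses/instruction figure is from the slide; a single memory port is assumed):

```python
def min_cpi(accesses_per_inst, ports=1):
    # The memory serves at most `ports` accesses per cycle, so each
    # instruction needs at least accesses_per_inst / ports cycles of
    # memory time; CPI can never drop below 1 cycle either.
    return max(1.0, accesses_per_inst / ports)

min_cpi(1.3)  # 1.3 — the structural hazard caps performance
```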

14
Control Hazard Solutions
  • Stall: wait until the decision is clear
  • It's possible to move the decision up to the 2nd
    stage by adding hardware to compare the registers
    as they are read
  • Impact: 2 clock cycles per branch instruction —
    slow

(Figure: Add, Beq, then Load; the Load after the Beq cannot be fetched until the branch outcome is known, costing stall cycles.)
15
Control Hazard Solutions
  • Predict: guess one direction, then back up if
    wrong
  • Predict not taken
  • Impact: 1 clock cycle per branch instruction if
    right, 2 if wrong (right ~50% of the time)
  • A more dynamic scheme: keep a history per branch (right ~90%)
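The average branch cost under prediction follows directly (a sketch; 1 cycle on a correct guess and 2 on a wrong one, as above):

```python
def avg_branch_cycles(p_correct, ok=1, wrong=2):
    # Expected cycles per branch given the prediction accuracy.
    return p_correct * ok + (1 - p_correct) * wrong

avg_branch_cycles(0.5)  # 1.5 cycles with a 50% coin-flip guess
avg_branch_cycles(0.9)  # 1.1 cycles with 90% history-based accuracy
```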

(Figure: with predict-not-taken, the Load after the Beq is fetched immediately and squashed only if the branch turns out to be taken.)
16
Control Hazard Solutions
  • Redefine branch behavior (the branch takes effect
    after the next instruction): delayed branch
  • Impact: 0 clock cycles per branch instruction if
    the compiler can find an instruction to put in the
    delay slot (~50% of the time)
  • As we launch more instructions per clock cycle,
    this becomes less useful

(Figure: Add, Beq, then a Misc instruction fills the delay slot before the Load at the branch target.)
17
Data Hazard on r1
add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or  r8, r1, r9
xor r10, r1, r11
18
Data Hazard on r1
  • Dependencies backwards in time are hazards

(Figure: add r1,r2,r3 followed by sub, and, or, and xor, each reading r1; with stages IF, ID/RF, EX, MEM, WB, the sub and the and read r1 before the add has written it back.)
19
Data Hazard Solution
  • Forward the result from one stage to another
  • The "or" is OK if we define register read/write
    timing properly (write in the first half of the
    cycle, read in the second half)

(Figure: the same sequence with forwarding paths from the add's EX and MEM stages into the dependent instructions' EX stages.)
20
Forwarding (or Bypassing): What about Loads?
  • Dependencies backwards in time are hazards
  • Can't solve these with forwarding alone
  • Must delay/stall the instruction dependent on the
    load
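The load-use check above can be sketched in software (hypothetical instruction tuples; real hardware compares register fields in the decode stage):

```python
def needs_stall(prev, curr):
    """prev/curr are (opcode, dest, sources) tuples for adjacent instructions.

    A stall is needed when the previous instruction is a load and the
    current one reads the loaded register: the data only arrives at the
    end of MEM, too late to forward into the very next EX stage.
    """
    op, dest, _ = prev
    _, _, srcs = curr
    return op == "lw" and dest in srcs

lw  = ("lw",  "r1", ["r2"])
sub = ("sub", "r4", ["r1", "r3"])
needs_stall(lw, sub)  # True: insert one bubble, then forward
```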

(Figure: lw r1,0(r2) followed by sub r4,r1,r3; the loaded value arrives at the end of MEM, after the sub's EX has already started.)
21
Designing a Pipelined Processor
  • Go back and examine your datapath and control
    diagram
  • associate resources with states
  • ensure that flows do not conflict, or figure out
    how to resolve conflicts
  • assert control in the appropriate stage

22
Pipelined Processor (almost)
  • What happens if we start a new instruction every
    cycle?

(Datapath diagram: Next PC → PC → Inst. Mem → IR, with pipeline registers IRex, IRmem, IRwb carrying the instruction past the Reg. File/Exec, Mem Access/Data Mem, and write-back stages; Dcd, Ex, Mem, and WB control travel alongside, plus a Valid bit and an Equal test for branches.)
23
Control and Datapath
(Diagram: the same pipelined datapath annotated with control — IR, A/S latches, condition test, PC and memory selects, and register specifiers Rrd/Rrt.)
24
Pipelining the Load Instruction
(Figure: three back-to-back lw instructions over seven cycles, each stage occupied by a different lw each cycle.)
  • The five independent functional units in the
    pipeline datapath are:
  • Instruction Memory for the Ifetch stage
  • Register File's read ports (busA and busB) for
    the Reg/Dec stage
  • ALU for the Exec stage
  • Data Memory for the Mem stage
  • Register File's write port (busW) for the Wr
    stage

25
The Four Stages of R-type
  • Ifetch: fetch the instruction from the Instruction Memory
  • Reg/Dec: register fetch and instruction decode
  • Exec:
  • ALU operates on the two register operands
  • update PC
  • Wr: write the ALU output back to the register file

26
Pipelining the R-type and Load Instruction
(Figure: R-type, R-type, Load, R-type, R-type issued back to back; the Load's Wr in its 5th stage lands in the same cycle as a following R-type's Wr in its 4th stage.)
Oops! We have a problem!
  • We have a pipeline conflict, or structural hazard
  • Two instructions try to write to the register
    file at the same time!
  • There is only one write port

27
Important Observation
  • Each functional unit can only be used once per
    instruction
  • Each functional unit must be used at the same
    stage for all instructions
  • Load uses the Register File's write port during its
    5th stage
  • R-type uses the Register File's write port during its
    4th stage
  • There are 2 ways to solve this pipeline hazard

28
Solution 1: Insert a Bubble into the Pipeline
(Figure: a Load followed by R-types, with a bubble inserted so the later R-types are delayed by one cycle and no two register-file writes collide.)
  • Insert a bubble into the pipeline to prevent 2
    writes in the same cycle
  • The control logic can be complex
  • We lose an instruction fetch and issue opportunity
  • No instruction is started in Cycle 6!

29
Solution 2: Delay R-type's Write by One Cycle
  • Delay the R-type's register write by one cycle
  • Now R-type instructions also use the Register
    File's write port at stage 5
  • The Mem stage becomes a NOOP stage: nothing is
    done there
(Figure: the R-type redrawn with 5 stages — Ifetch, Reg/Dec, Exec, Mem (NOOP), Wr — so a mix of R-types and Loads issues one per cycle with no write-port conflict.)
30
The Four Stages of Store
  • Ifetch: fetch the instruction from the Instruction Memory
  • Reg/Dec: register fetch and instruction decode
  • Exec: calculate the memory address
  • Mem: write the data into the Data Memory

31
The Three Stages of Beq
  • Ifetch: fetch the instruction from the Instruction Memory
  • Reg/Dec: register fetch and instruction decode
  • Exec:
  • compare the two register operands
  • select the correct branch target address
  • latch it into the PC

32
Control Diagram
(Diagram: the pipelined datapath with its control signals — IR, A/S latches, condition test, PC and memory selects, and register specifiers Rrd/Rrt.)
33
Let's Try It Out
  10   lw   r1, r2(35)
  14   addI r2, r2, 3
  20   sub  r3, r4, r5
  24   beq  r6, r7, 100
  30   ori  r8, r9, 17
  34   add  r10, r11, r12
  100  and  r13, r14, 15
(these addresses are octal)
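Since the addresses are octal, sequential fetch adds 4 in base 8 (a quick check; Python's int(..., 8) parses octal strings):

```python
addrs = ["10", "14", "20", "24", "30", "34"]
# Each instruction is 4 bytes, so the PC advances by 4 — in octal,
# 14 + 4 = 20 and 24 + 4 = 30, which is why the slide's fetch
# sequence skips the digits 8 and 9.
for a, b in zip(addrs, addrs[1:]):
    assert int(a, 8) + 4 == int(b, 8)

oct(int("24", 8) + 4)  # '0o30'
```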
34
Start: Fetch 10
(Datapath snapshot: PC = 10; the lw at address 10 is fetched from the Inst. Mem into IR; Next PC computes 14.)
35
Fetch 14, Decode 10
(Snapshot: lw r1, r2(35) in Decode — rs = r2, immediate = 35; the addI at 14 is being fetched; PC = 14.)
36
Fetch 20, Decode 14, Exec 10
(Snapshot: the sub at 20 is being fetched; addI r2, r2, 3 in Decode; the lw in Exec computes the address r2 + 35; PC = 20.)
37
Fetch 24, Decode 20, Exec 14, Mem 10
(Snapshot: the beq at 24 is being fetched; sub r3, r4, r5 in Decode; the addI in Exec computes r2 + 3; the lw in Mem reads M[r2 + 35]; PC = 24.)
38
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
(Snapshot: the ori at 30 is being fetched; beq r6, r7, 100 in Decode; the sub in Exec computes r4 − r5; the addI's r2 + 3 is in Mem; the lw writes M[r2 + 35] into r1 in WB; PC = 30.)
39
Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14
(Snapshot: the add at 34 is being fetched; ori r8, r9, 17 in Decode; the beq in Exec compares r6 and r7 against target 100; the sub's r4 − r5 is in Mem; the addI writes r2 + 3 into r2 in WB; r1 already holds M[r2 + 35]; PC = 34.)
40
Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20
(Snapshot: the branch resolved taken, so the and at 100 is fetched; add r10, r11, r12 in Decode; the ori in Exec computes r9 | 17; the beq is in Mem; the sub writes r4 − r5 into r3 in WB; PC = 100.)
Oops — we should have only one delayed instruction!
41
Fetch 104, Dcd 100, Ex 34, Mem 30, WB 24
(Snapshot: fetch continues at 104; and r13, r14, r15 in Decode; the add in Exec computes r11 + r12; the ori is in Mem; the beq completes with no write-back; PC = 104.)
Squash the extra instruction in the branch shadow!
42
Fetch 108, Dcd 104, Ex 100, Mem 34, WB 30
(Snapshot: the and in Exec computes r14 & r15; the add's r11 + r12 is in Mem; the ori writes r9 | 17 into r8 in WB; the squashed branch-shadow instruction flows through as a no-op.)
Squash the extra instruction in the branch shadow!
43
Fetch 114, Dcd 110, Ex 104, Mem 100, WB 34
(Snapshot: the and's r14 & r15 is in Mem; the add writes r11 + r12 into r10 in WB; the squashed branch-shadow instruction gets NO WB and raises NO overflow exception; PC = 114.)
Squash the extra instruction in the branch shadow!
44
Summary: Pipelining
  • What makes it easy?
  • all instructions are the same length
  • just a few instruction formats
  • memory operands appear only in loads and stores
  • What makes it hard?
  • structural hazards: suppose we had only one
    memory
  • control hazards: need to worry about branch
    instructions
  • data hazards: an instruction depends on a
    previous instruction
  • We'll build a simple pipeline and look at these
    issues
  • We'll talk about modern processors and what
    really makes it hard:
  • exception handling
  • trying to improve performance with out-of-order
    execution, etc.

45
Summary
  • Pipelining is a fundamental concept
  • multiple steps using distinct resources
  • Utilize the capabilities of the datapath with
    pipelined instruction processing
  • start the next instruction while working on the
    current one
  • limited by the length of the longest stage (plus
    fill/flush time)
  • detect and resolve hazards

46
Ch6 Supplementary
  • Branch prediction
  • Branch prediction is critical for superpipelined
    and superscalar computers
  • The more instructions issued at the same time, the
    larger the penalty of hazards
  • Statistically, ~60% of conditional branches are
    taken
  • Higher-level (and more powerful) instruction sets
    need fewer conditional branches, such as those
    supporting variable-length operands
  • Conditional branching can be classified into two
    types:
  • program loops
  • random decision making

47
Static branch prediction
  • Static and dynamic branch prediction:
  • static predictions of conditional branches are
    determined by the compiler
  • dynamic predictions are generated at run time
    (during execution)
  • Static prediction is good for looping:
  • loop exit test at loop start: predict continue
  • loop exit test at loop end: predict branching
  • Random decisions: guess the branch is taken

48
Dynamic branch prediction
  • Dynamic: data sensitive, decided at run time (at
    execution)
  • One-bit dynamic branch prediction:
  • predict whatever the previous outcome was
  • Two-bit dynamic branch prediction:
  • if the previous two outcomes are the same, predict the same
  • if the previous two alternate, predict continued
    alternation:
  • branch, branch → predict branch
  • not branch, not branch → predict not branch
  • branch, not branch → predict branch
  • not branch, branch → predict not branch
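The two-bit scheme described here always predicts the outcome from two branches ago (note this is the alternation-tracking variant given on this slide, not the more common two-bit saturating counter). A minimal sketch:

```python
def predict(history):
    """history is a list of past outcomes, True = branch taken.

    With the last two outcomes (a, b): if a == b, predict the same;
    if they alternate, predict the alternation continues — in every
    case this equals the outcome from two branches ago.
    """
    a, b = history[-2], history[-1]
    return a if a != b else b  # equivalently: always history[-2]

predict([True, True])    # branch, branch     -> predict branch (True)
predict([True, False])   # branch, not branch -> predict branch (True)
predict([False, True])   # not branch, branch -> predict not branch (False)
```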

49
Branch prediction cont.
  • Branch prediction cache:
  • a cache with entries indexed by the addresses of
    branch instructions, each holding bits that record
    whether the branch previously branched or not
(Diagram: each entry stores the last and previous-to-last outcomes — B = branch, NB = not branch — plus a valid bit; the prediction is used when Valid AND the address matches.)
50
Other schemes for minimizing the incorrect-branch-prediction penalty
  • Speculative execution:
  • execute first, with a way to roll back
  • by doing the store-back on a shadow copy
  • by keeping a backup copy for rollback/undo
  • Conditional execution:
  • minimizes conditional branching for some common
    actions such as clear, set-to-1, move, or add
  • Delayed (conditional) branching:
  • always execute the next instruction, then do
    the conditional branch
  • saves a cycle or more

51
Branching
  • Call and Return are branching instructions that
    go through the stack
  • Software interrupts are implicit branches,
    transparent to the program even though their
    actions are carried out; they are treated as
    part of the program

52
Superscalar
  • Superscalar:
  • fetch and execute two or more instructions
    concurrently
  • to achieve CPI < 1
  • Dynamic issue: deciding at run time to schedule
    two or more instructions for concurrent execution
  • Requires multiple copies of functional units such
    as instruction fetch and arithmetic execution,
    plus multi-port register files and caches (or
    cache buffers)
  • There are more potential data-dependency hazards,
    resource hazards, and control hazards
  • The penalty for an incorrect branch prediction is BIG

53
VLIW
  • Very Long Instruction Word
  • A VLIW instruction is machine (implementation)
    dependent
  • A VLIW instruction consists of various fields:
  • each field specifies the operation of a
    functional unit, such as Ifetch (instruction
    fetch), Idecode, Ofetch (operand fetch), EX
    (integer execute), FPA (floating-point add),
    FPMUL, and FPDIV
  • Static issue: instructions are generated by
    compilers
  • Multiple instructions are issued and executed
    at the same time

54
VLIW advantages
  • Code is generated statically (by the compiler):
  • compilers can take a lot of time to pack the VLIW
    instructions; otherwise the packing must be done
    dynamically by a hardware instruction scheduler
    (circuitry that analyzes and schedules the
    functional units)
  • Easier to power down individual functional units
    when they are not used, and easier for compilers to
    deliberately arrange functional-unit execution
    to minimize power consumption
  • Can execute different architectures' instruction
    sets on one machine through the respective
    compilers
  • However, the functional units must be constructed
    to support those instruction sets and
    architectures

55
VLIW disadvantages
  • Compilers are hard to build
  • Machine dependent: different compilers are needed
    for different machines of the same architecture
  • Binary incompatible: different binary code is
    needed for different machines of the same
    architecture
  • The compiler cannot see the input data when
    compiling, so it must prepare for all possible
    cases of input data
  • Difficult to recover from compiler mistakes, and
    the time penalty can be BIG
  • Difficult to debug
  • Non-VLIW machines can also power down individual
    functional units when not used
