Title: CS/ECE 365 Computer Architecture
1. CS/ECE 365 Computer Architecture
- Soundararajan Ezekiel
- Department of Computer Science
- Ohio Northern University
2. Plan for Today
- Homework out today
- Quiz on Thursday
- Study hazards for the DLX architecture and do the problems
- Structural, control, and data hazards
- Datapath: introduction
3. Structural Hazard
- A machine with only one memory port will generate a conflict whenever a memory reference occurs.
- In this example, the load instruction uses the memory for a data access at the same time that instruction 3 wants to fetch an instruction from memory.
4. Two instructions need access to memory in clock cycle 4. This is the main reason for having separate memories: an I-memory for instructions and a D-memory for data values.
5. Two instructions need access to memory in clock cycle 4. We had to stall to fix this.
6. Example
- Suppose that data transfers constitute 40% of the mix, and that the ideal CPI of the pipelined machine, ignoring the structural hazard, is 1. Assume that the machine with the structural hazard has a clock rate that is 1.05 times higher than the clock rate of the machine without the hazard. Disregarding any other performance losses, is the pipeline with or without the structural hazard faster, and by how much?
7. Answer
- There are several ways to do this problem.
- The simplest is to compute the average instruction time for the two machines:
- $\text{Average instruction time} = \text{CPI} \times \text{Clock cycle time}$
- Since it has no stalls, the average instruction time for the ideal machine is simply $\text{Clock cycle time}_{\text{ideal}}$.
- The average instruction time for the machine with the structural hazard is
- $\text{Average instruction time} = \text{CPI} \times \text{Clock cycle time}$
- $= (1 + 0.4 \times 1) \times (\text{Clock cycle time}_{\text{ideal}} / 1.05)$
- $\approx 1.33 \times \text{Clock cycle time}_{\text{ideal}}$
- The machine without the hazard is faster, by a factor of about 1.33.
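The comparison above can be checked numerically. This is a minimal sketch, not part of the lecture: the function name and variable names are illustrative, and all times are expressed in units of the ideal machine's clock cycle time.

```python
def avg_instruction_time(cpi, clock_cycle_time):
    """Average instruction time = CPI x clock cycle time."""
    return cpi * clock_cycle_time

# Ideal machine: CPI of 1, one unit of clock cycle time.
ideal_time = avg_instruction_time(cpi=1.0, clock_cycle_time=1.0)

# Machine with the hazard: 40% of instructions are data transfers, each
# assumed to cost one stall cycle, and the clock rate is 1.05 times higher
# (so the cycle time is 1/1.05 of the ideal one).
data_transfer_fraction = 0.40
hazard_cpi = 1.0 + data_transfer_fraction * 1
hazard_time = avg_instruction_time(hazard_cpi, 1.0 / 1.05)

# How many times faster the machine without the hazard is.
speedup_without_hazard = hazard_time / ideal_time
print(f"{speedup_without_hazard:.2f}")  # 1.33
```

Even with the 5% faster clock, the stall on every data transfer leaves the machine with the hazard slower overall.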
8. Data Hazards
- A major effect of pipelining is to change the relative timing of instructions by overlapping their execution.
- ADD R1,R2,R3
- SUB R4,R1,R5
- AND R6,R1,R7
- OR R8,R1,R9
- XOR R10,R1,R11
- All the instructions after the ADD use the result of the ADD instruction.
9. Time (in clock cycles)
Instruction       CC1  CC2  CC3  CC4  CC5  CC6
ADD R1,R2,R3      IM   Reg  ALU  DM   Reg
SUB R4,R1,R5           IM   Reg  ALU  DM   Reg
AND R6,R1,R7                IM   Reg  ALU  DM
OR R8,R1,R9                      IM   Reg  ALU
XOR R10,R1,R11                        IM   Reg
The use of the result of the ADD instruction in the next three instructions causes a hazard, since the register is not written until after those instructions read it.
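The count of three affected instructions can be reproduced with a small timing model. This is a sketch under stated assumptions: registers are read in the second pipeline stage and written in the fifth, one instruction starts per clock cycle, and a read in the same cycle as the write is treated as unsafe (a conservative rule chosen here so that the result matches the slide). The tuple layout and the function name are hypothetical.

```python
# Assumed stage positions in a five-stage pipeline, counted from the cycle
# before the instruction starts: register read in stage 2, write in stage 5.
READ_STAGE = 2
WRITE_STAGE = 5

# (opcode, destination register, source registers)
program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]

def instructions_reading_too_early(program, producer_index=0):
    """Return opcodes that read the producer's destination before it is safely written."""
    _, dest, _ = program[producer_index]
    # Instruction k (0-based) starts in clock cycle k + 1.
    write_cycle = producer_index + WRITE_STAGE
    too_early = []
    for k in range(producer_index + 1, len(program)):
        op, _, sources = program[k]
        read_cycle = k + READ_STAGE
        # Assumption: reading in or before the write cycle yields the old value.
        if dest in sources and read_cycle <= write_cycle:
            too_early.append(op)
    return too_early

hazards = instructions_reading_too_early(program)
print(hazards)  # ['SUB', 'AND', 'OR']
```

The XOR reads R1 in clock cycle 6, after the ADD has written it in clock cycle 5, so it is not affected.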
10. Remedy
- The problem posed in the previous slide can be solved with a simple hardware technique called forwarding.
- The ADD produces its result in the EX/MEM register.
- The SUB needs this value at the ALU input latch.
- Forwarding moves the result from the EX/MEM register to the ALU input latch.
11. Data Hazard Classification
- Consider two instructions i and j, with i occurring before j.
- Possible data hazards:
- RAW (read after write): j tries to read a source before i writes it, so j incorrectly gets the old value. This is the most common type, and forwarding overcomes it.
- WAW (write after write): j tries to write an operand before it is written by i.
- This can happen only in pipelines that write in more than one pipe stage.
- The DLX integer pipeline writes a register only in WB and avoids this class of hazard.
- We will discuss this situation later.
12. Continued
- WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the new value.
- Not all potential data hazards can be handled by bypassing.
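The three classes can be told apart by comparing the registers that i and j read and write. The sketch below is illustrative only: the dictionary layout and the function name are assumptions, and it reports potential hazards based on register names alone. Whether a potential hazard actually occurs depends on the pipeline's timing.

```python
def classify_hazards(i, j):
    """Classify potential data hazards between an earlier instruction i and a later j.

    Each instruction is a dict with a 'dest' register (or None) and a tuple
    of source registers under 'srcs'.
    """
    found = []
    # RAW: j reads a register that i writes.
    if i["dest"] is not None and i["dest"] in j["srcs"]:
        found.append("RAW")
    # WAW: j writes the same register that i writes.
    if i["dest"] is not None and i["dest"] == j["dest"]:
        found.append("WAW")
    # WAR: j writes a register that i reads.
    if j["dest"] is not None and j["dest"] in i["srcs"]:
        found.append("WAR")
    return found

add = {"dest": "R1", "srcs": ("R2", "R3")}   # ADD R1,R2,R3
sub = {"dest": "R4", "srcs": ("R1", "R5")}   # SUB R4,R1,R5
print(classify_hazards(add, sub))  # ['RAW']
```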
13. Example
- Suppose that 30% of the instructions are loads, and half the time the instruction following a load instruction depends on the result of the load. If this hazard creates a single-cycle delay, how much faster is the ideal pipelined machine (with a CPI of 1) that does not delay the pipeline than the real pipeline? Ignore any stalls other than pipeline stalls.
14. Answer
- The ideal machine will be faster by the ratio of the CPIs.
- The CPI for an instruction following a load is 1.5 (since it stalls half the time).
- The effective CPI is $0.7 \times 1 + 0.3 \times 1.5 = 1.15$.
- This means that the ideal machine is 1.15 times faster.
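The same arithmetic can be written as a short function, which makes it easy to try other load fractions. This is a sketch; the function and parameter names are illustrative and not from the lecture.

```python
def effective_cpi(load_fraction, dependent_fraction, stall_cycles=1, base_cpi=1.0):
    """Effective CPI when some instructions that follow a load must stall."""
    # Average CPI of an instruction that follows a load.
    cpi_after_load = base_cpi + dependent_fraction * stall_cycles
    # Weight the two cases by how often each one occurs.
    return (1 - load_fraction) * base_cpi + load_fraction * cpi_after_load

cpi = effective_cpi(load_fraction=0.30, dependent_fraction=0.5)
print(round(cpi, 2))        # 1.15
print(round(cpi / 1.0, 2))  # speedup of the ideal machine: 1.15
```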
15. Compiler scheduling for data hazards
- Many types of stalls are quite frequent.
- The typical code-generation pattern for a statement such as A = B + C produces a stall for the load of the second data value, C.
- The next slide shows that the store of A need not cause another stall, since the result of the addition can be forwarded to the data memory for use by the store.
16. Figure
LW R1,B        IF     ID     EX     MEM    WB
LW R2,C               IF     ID     EX     MEM    WB
ADD R3,R1,R2                 IF     ID     STALL  EX     MEM    WB
SW A,R3                             IF     STALL  ID     EX     MEM    WB
- The DLX code sequence for A = B + C. The ADD instruction must be stalled to allow the load of C to complete. The SW need not be delayed further because the forwarding hardware passes the result from the MEM/WB register directly to the data memory input for storing.
17. Pipeline scheduling or instruction scheduling
- Rather than just allow the pipeline to stall, the compiler could try to schedule the pipeline to avoid these stalls by rearranging the code sequence to eliminate the hazard.
- Example: the compiler could try to avoid generating code with a load followed by the immediate use of the load destination register.
- This technique is called pipeline scheduling or instruction scheduling.
- It was first used in the 1960s and became more popular in the 1980s.
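To illustrate the effect of scheduling, the sketch below counts load-use stalls in two orderings of the same code. Everything here is a hypothetical illustration: the second statement (D = E - F), the register choices, and the function name are not from the lecture, and the model simply charges one stall whenever an instruction uses the result of the load immediately before it.

```python
def count_load_use_stalls(program):
    """Count instructions that use the destination of the load directly before them."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, current_srcs = current
        if prev_op == "LW" and prev_dest in current_srcs:
            stalls += 1
    return stalls

# (opcode, destination register, source registers); memory operands are omitted.
# Straightforward code for A = B + C followed by D = E - F.
unscheduled = [
    ("LW", "R1", ()), ("LW", "R2", ()), ("ADD", "R3", ("R1", "R2")), ("SW", None, ("R3",)),
    ("LW", "R4", ()), ("LW", "R5", ()), ("SUB", "R6", ("R4", "R5")), ("SW", None, ("R6",)),
]

# Same instructions, reordered so that no load is followed by a use of its result.
scheduled = [
    ("LW", "R1", ()), ("LW", "R2", ()), ("LW", "R4", ()), ("ADD", "R3", ("R1", "R2")),
    ("LW", "R5", ()), ("SW", None, ("R3",)), ("SUB", "R6", ("R4", "R5")), ("SW", None, ("R6",)),
]

print(count_load_use_stalls(unscheduled))  # 2
print(count_load_use_stalls(scheduled))    # 0
```

The reordering keeps every data dependence intact; it only moves independent instructions into the slots right after the loads.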
18. Implementing the control for the DLX pipeline
- The process of letting an instruction move from the instruction decode stage (ID) into the execution stage (EX) of this pipeline is usually called instruction issue; an instruction that has made this step is said to have issued.
- For the DLX integer pipeline, all the data hazards can be checked during the ID phase of the pipeline.
19.
- The load instruction has a delay or latency that cannot be eliminated by forwarding alone. Instead, we need to add hardware, called a pipeline interlock, to preserve the correct execution pattern.
- In general, a pipeline interlock detects a hazard and stalls the pipeline until the hazard is cleared.
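The interlock's decision can be sketched as a check made while an instruction is in ID: if the instruction ahead of it is a load whose destination is one of its sources, a stall cycle is inserted. This is a simplified software model, not the actual hardware logic; the dictionary fields and the function names are assumptions made for illustration.

```python
BUBBLE = None  # placeholder for a stall cycle inserted by the interlock

def load_interlock_needed(in_ex, in_id):
    """True when the instruction in ID reads the register that a load in EX will write."""
    return in_ex["op"] == "LW" and in_ex["dest"] in in_id["srcs"]

def issue_with_interlock(program):
    """Return the issue order, with BUBBLE wherever the interlock stalls the pipeline."""
    issued = []
    for instr in program:
        previous = issued[-1] if issued else BUBBLE
        if previous is not BUBBLE and load_interlock_needed(previous, instr):
            issued.append(BUBBLE)
        issued.append(instr)
    return issued

# Code for A = B + C; memory operands are omitted.
program = [
    {"op": "LW", "dest": "R1", "srcs": ()},
    {"op": "LW", "dest": "R2", "srcs": ()},
    {"op": "ADD", "dest": "R3", "srcs": ("R1", "R2")},
    {"op": "SW", "dest": None, "srcs": ("R3",)},
]

trace = ["stall" if slot is BUBBLE else slot["op"] for slot in issue_with_interlock(program)]
print(trace)  # ['LW', 'LW', 'stall', 'ADD', 'SW']
```

Only the ADD is held back, because it needs R2 from the load directly ahead of it; the SW follows an ADD rather than a load, so this check does not stall it.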
20. Datapath and control: introduction
- We studied the performance of a machine.
- Three factors: instruction count, clock cycle time, and clock cycles per instruction (CPI).
- $\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$
- $\text{Clock cycle time} = 1 / \text{Clock rate}$
- Clock cycle time: determined by hardware technology and organization
- CPI: determined by organization and instruction set architecture
- Instruction count: determined by instruction set architecture and compiler technology
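The CPU time relation can be evaluated directly. In this sketch the function name is illustrative and the numbers are hypothetical values chosen only to show the units; they do not describe any particular machine.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * clock_cycle_time

# Hypothetical program: one million instructions, CPI of 1.15, 100 MHz clock.
seconds = cpu_time(instruction_count=1_000_000, cpi=1.15, clock_rate_hz=100e6)
print(f"{seconds * 1e3:.1f} ms")  # 11.5 ms
```

Halving any one of the three factors halves the CPU time, which is why all three are design targets.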
21. Continued
- We will discuss the datapath and control unit for two different implementations of the MIPS instruction set.
- These include:
- Memory-reference instructions: load word (lw) and store word (sw)
- Arithmetic-logical instructions: add, sub, and, or, slt
- The instructions branch equal (beq) and jump (j)