CS/ECE 365 Computer Architecture

Slides: 23

Provided by: eso67

Transcript and Presenter's Notes

1
CS/ECE 365 Computer Architecture

Soundararajan Ezekiel
Department of Computer Science
Ohio Northern University

2
Plan for Today

Homework out today
Quiz on Thursday
Study hazards for the DLX architecture and do the problems:
structural, control, and data
Datapath: introduction

3
Structural Hazard

A machine with only one memory port will generate
a conflict whenever a memory reference occurs.
In this example, the load instruction uses the
memory for a data access at the same time that
instruction 3 wants to fetch an instruction from
memory.
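
The conflict can be located mechanically. The following is a minimal sketch, assuming the five-stage DLX pipeline (IF, ID, EX, MEM, WB), one instruction fetched per cycle, and no stalls; the function name and the list-of-opcodes representation are illustrative choices, not part of the slides.

```python
def memory_port_conflicts(ops):
    """Find cycles where a data access and an instruction fetch collide.

    ops[k] is the opcode of the instruction fetched in cycle k + 1.
    Assumes a single memory port and no stalls (hypothetical model).
    Returns a list of (cycle, memory_instr_index, fetched_instr_index).
    """
    conflicts = []
    for k, op in enumerate(ops):
        if op in ("LW", "SW"):
            # IF happens in cycle k + 1, so MEM happens three cycles later.
            mem_cycle = (k + 1) + 3
            # The instruction fetched in that same cycle has this index.
            fetch_index = mem_cycle - 1
            if fetch_index < len(ops):
                conflicts.append((mem_cycle, k, fetch_index))
    return conflicts


# A load followed by three other instructions: the load's data access
# and the fetch of instruction 3 both need memory in clock cycle 4.
print(memory_port_conflicts(["LW", "ADD", "SUB", "AND"]))  # [(4, 0, 3)]
```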

4
Two instructions need access to memory in clock
cycle 4. This is the main reason for having
separate memories: I-memory for instructions and
D-memory for data values.
5
Two instructions need access to memory in clock
cycle 4. We had to stall to fix this as it
6
Example

Suppose that data transfers constitute 40% of the
mix, and that the ideal CPI of the pipelined
machine, ignoring the structural hazard, is 1.
Assume that the machine with the structural
hazard has a clock rate that is 1.05 times
higher than the clock rate of the machine without
the hazard. Disregarding any other performance
losses, is the pipeline with or without the
structural hazard faster, and by how much?

7
Answer

There are several ways to do this problem. The
simplest is to compute the average instruction time
for the two machines:
Average instruction time = CPI × Clock cycle time
Since it has no stalls, the average instruction time
for the ideal machine is simply Clock cycle time ideal.
The average instruction time for the machine with
the structural hazard is
Average instruction time = CPI × Clock cycle time
= (1 + 0.4 × 1) × (Clock cycle time ideal / 1.05)
= 1.33 × Clock cycle time ideal
The machine without the hazard is faster, by a
factor of about 1.33.
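
The arithmetic can be checked with a short script. This is only a sketch: the ideal clock cycle time is normalized to 1.0, and the variable names are illustrative.

```python
def avg_instruction_time(cpi, cycle_time):
    """Average instruction time = CPI x clock cycle time."""
    return cpi * cycle_time


# Assumed normalization: the ideal machine's clock cycle time is 1.0 unit.
ideal_cycle = 1.0
ideal_time = avg_instruction_time(1.0, ideal_cycle)

# With the hazard: 40% of instructions stall for one cycle, and the
# clock rate is 1.05 times higher, so the cycle time is divided by 1.05.
hazard_cpi = 1.0 + 0.4 * 1
hazard_time = avg_instruction_time(hazard_cpi, ideal_cycle / 1.05)

# Ratio of the two average instruction times.
speedup_of_ideal = hazard_time / ideal_time
print(round(speedup_of_ideal, 2))  # 1.33
```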

8
Data Hazards

A major effect of pipelining is to change the
relative timing of instructions by overlapping
their execution.
ADD R1, R2, R3
SUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
XOR R10, R1,R11
All the instructions after the ADD use the result
of the ADD instruction.

9
[Figure: pipeline diagram over clock cycles CC1 to CC6 for the sequence ADD R1,R2,R3; SUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9; XOR R10,R1,R11. Each instruction passes through IM, Reg, ALU, DM, Reg, starting one cycle after the previous one.]
The use of the result of the ADD instruction in
the next three instructions causes a
hazard, since the register is not written until
after those instructions read it.
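
Which instructions are affected follows from the stage timing. The sketch below assumes stages numbered IF=1, ID=2, EX=3, MEM=4, WB=5, registers read in ID and written in WB, and that a read in the same cycle as the write still sees the old value; the function and its parameters are hypothetical names. If the register file instead writes in the first half of the cycle and reads in the second half, only two instructions would be flagged.

```python
# Each instruction is (opcode, destination, source1, source2), as on the slide.
program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]


def raw_hazards(program, producer_index, read_stage=2, write_stage=5):
    """Indices of later instructions that read the producer's result too early.

    Cycles are counted from the producer's IF (cycle 1). A consumer that is
    `distance` instructions later reads its registers in cycle
    distance + read_stage; the producer writes in cycle write_stage.
    Simplification: later redefinitions of the register are not tracked.
    """
    dest = program[producer_index][1]
    hazards = []
    for k in range(producer_index + 1, len(program)):
        if dest not in program[k][2:]:
            continue  # this instruction does not read the result
        distance = k - producer_index
        if distance + read_stage <= write_stage:
            hazards.append(k)
    return hazards


print(raw_hazards(program, 0))  # [1, 2, 3] -> SUB, AND, OR
```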
10
Remedy

The problem posed in the previous slide can be solved
with a simple hardware technique called
forwarding (also called bypassing).
The ADD produces its result in the EX/MEM
pipeline register.
The SUB needs this value at the ALU input latch.
Forwarding moves the result from the EX/MEM
register to the ALU input latch.
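
The selection made by the forwarding hardware can be sketched in software. This is an illustrative model, not the DLX control logic itself: the dictionary fields `writes_reg`, `dest`, and `alu_result` are assumed names for the contents of the EX/MEM register, and the hardwired-zero register R0 is ignored.

```python
def select_alu_operand(source_reg, regfile_value, ex_mem):
    """Choose the value that reaches the ALU input latch.

    ex_mem is a hypothetical snapshot of the EX/MEM pipeline register:
    {"writes_reg": bool, "dest": str, "alu_result": int}.
    If the previous instruction is about to write the register we need,
    take its ALU result instead of the stale register-file value.
    """
    if ex_mem["writes_reg"] and ex_mem["dest"] == source_reg:
        return ex_mem["alu_result"]
    return regfile_value


# ADD R1,R2,R3 has just finished EX; its result (here 30) sits in EX/MEM.
ex_mem = {"writes_reg": True, "dest": "R1", "alu_result": 30}

# SUB R4,R1,R5: R1 in the register file is still stale, so forward.
print(select_alu_operand("R1", 0, ex_mem))  # 30
# R5 is not produced by the ADD, so the register-file value is used.
print(select_alu_operand("R5", 7, ex_mem))  # 7
```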

11
Data Hazard Classification

Consider two instructions i and j, with i
occurring before j.
Possible data hazards:
RAW (read after write): j tries to read a source
before i writes it, so j incorrectly gets the old
value. This is the most common type; forwarding
overcomes it in many cases.
WAW (write after write): j tries to write an
operand before it is written by i.
This can happen only in pipelines that write in
more than one pipe stage.
The DLX integer pipeline writes a register only
in WB and so avoids this class of hazard.
We will discuss this situation later.

12
Data Hazard Classification (continued)

WAR (write after read): j tries to write a
destination before it is read by i, so i
incorrectly gets the new value.
Not all potential data hazards can be handled by
bypassing.
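
The three classes can be told apart by comparing register names. The sketch below reports which classes are possible for a pair of instructions; whether a hazard actually occurs depends on the pipeline, as noted above. The `(destination, sources)` representation is an assumption made for illustration.

```python
def classify_hazards(i, j):
    """Return the set of potential data hazards between i and j (i before j).

    Each instruction is (destination, sources), where destination may be
    None (for example, a store) and sources is a tuple of register names.
    """
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    kinds = set()
    if i_dest is not None and i_dest in j_srcs:
        kinds.add("RAW")  # j reads what i writes
    if i_dest is not None and i_dest == j_dest:
        kinds.add("WAW")  # both write the same register
    if j_dest is not None and j_dest in i_srcs:
        kinds.add("WAR")  # j writes what i reads
    return kinds


add = ("R1", ("R2", "R3"))          # ADD R1,R2,R3
sub = ("R4", ("R1", "R5"))          # SUB R4,R1,R5
overwrite_r2 = ("R2", ("R6", "R7"))  # hypothetical: writes R2, which ADD reads

print(classify_hazards(add, sub))           # {'RAW'}
print(classify_hazards(add, overwrite_r2))  # {'WAR'}
```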

13
Example

Suppose that 30% of the instructions are loads,
and half the time the instruction following a
load instruction depends on the result of the
load. If this hazard creates a single-cycle delay,
how much faster is the ideal pipelined
machine (with a CPI of 1) that does not delay the
pipeline than the real pipeline? Ignore any
stalls other than pipeline stalls.

14
Answer

The ideal machine will be faster by the ratio of
the CPIs.
The CPI for an instruction following a load is
1.5 (since it stalls half the time).
Loads are 30% of the mix, so the effective CPI is
(0.7 × 1 + 0.3 × 1.5) = 1.15.
This means that the ideal machine is 1.15 times faster.
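
The same calculation as a short script, with the problem's numbers as named variables (the names are illustrative):

```python
load_fraction = 0.30       # 30% of instructions are loads
dependent_fraction = 0.5   # half of the following instructions depend on the load
stall_cycles = 1           # each such dependence costs one cycle

# CPI of an instruction that follows a load.
cpi_after_load = 1 + dependent_fraction * stall_cycles

# Weighted average over the instruction mix.
effective_cpi = (1 - load_fraction) * 1 + load_fraction * cpi_after_load

# The ideal machine has CPI 1, so the speedup equals the effective CPI.
print(round(effective_cpi, 2))  # 1.15
```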

15
Compiler scheduling for data hazards

Many types of stalls are quite frequent.
The typical code generation pattern for a statement
such as A = B + C produces a stall for the load of the
second data value, C.
The next slide shows that the store of A need not
cause another stall, since the result of the
addition can be forwarded to the data memory for
use by the store.

16
Figure
LW  R1,B       IF  ID  EX  MEM    WB
LW  R2,C           IF  ID  EX     MEM    WB
ADD R3,R1,R2           IF  ID     stall  EX   MEM  WB
SW  A,R3                   IF     stall  ID   EX   MEM  WB

The DLX code sequence for A = B + C. The ADD
instruction must be stalled to allow the load of
C to complete. The SW need not be delayed further
because the forwarding hardware passes the result
from the MEM/WB latch directly to the data memory
input for storing.

17
Pipeline scheduling or instruction scheduling

Rather than just allow the pipeline to stall, the
compiler could try to schedule the pipeline to
avoid these stalls by rearranging the code
sequence to eliminate the hazard.
For example, the compiler could try to avoid
generating code with a load followed by the
immediate use of the load destination register.
This technique is called pipeline scheduling or
instruction scheduling.
It was first used in the 1960s and became more
popular in the 1980s.
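
The benefit of rearranging code can be illustrated by counting load-use stalls before and after scheduling. This is a sketch under stated assumptions: forwarding is present, so the only stall is a load immediately followed by a use of its destination; the two-statement example (A = B + C followed by D = E - F) and the register assignment are illustrative and do not come from the slides.

```python
def load_use_stalls(program):
    """Count loads immediately followed by a use of the loaded register.

    Each instruction is (opcode, destination, sources); stores have
    destination None. Assumes one stall cycle per load-use pair.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "LW" and prev[1] in cur[2]:
            stalls += 1
    return stalls


# Straightforward code for A = B + C; D = E - F.
unscheduled = [
    ("LW", "R1", ()), ("LW", "R2", ()),
    ("ADD", "R3", ("R1", "R2")), ("SW", None, ("R3",)),
    ("LW", "R4", ()), ("LW", "R5", ()),
    ("SUB", "R6", ("R4", "R5")), ("SW", None, ("R6",)),
]

# Same instructions, reordered so no load is followed by its use.
scheduled = [
    ("LW", "R1", ()), ("LW", "R2", ()), ("LW", "R4", ()),
    ("ADD", "R3", ("R1", "R2")), ("LW", "R5", ()),
    ("SW", None, ("R3",)), ("SUB", "R6", ("R4", "R5")),
    ("SW", None, ("R6",)),
]

print(load_use_stalls(unscheduled))  # 2
print(load_use_stalls(scheduled))    # 0
```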

18
Implementing the control for the DLX Pipeline

The process of letting an instruction move from
the instruction decode stage (ID) into the
execution stage (EX) of this pipeline is usually
called instruction issue; an instruction that has
made this step is said to have issued.
For the DLX integer pipeline, all the data hazards
can be checked during the ID phase of the pipeline.

The load instruction has a delay or latency that
cannot be eliminated by forwarding alone.
Instead, we need to add hardware, called a
pipeline interlock, to preserve the correct
execution pattern.
In general, a pipeline interlock detects a hazard
and stalls the pipeline until the hazard is cleared.
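
The effect of an interlock on the issue sequence can be sketched as follows. The model is deliberately small and assumes that forwarding handles every case except a load immediately followed by a use of its result; the function name and instruction tuples are hypothetical.

```python
def interlock_trace(program):
    """Return the issue sequence with a "stall" bubble after each load-use pair.

    Each instruction is (opcode, destination, sources). The check mirrors
    what an interlock does in ID: compare the sources of the instruction
    being decoded with the destination of a load just ahead of it.
    """
    trace = []
    prev = None
    for instr in program:
        opcode, _dest, sources = instr
        if prev is not None and prev[0] == "LW" and prev[1] in sources:
            trace.append("stall")
        trace.append(opcode)
        prev = instr
    return trace


# The A = B + C sequence: LW R1,B; LW R2,C; ADD R3,R1,R2; SW A,R3.
a_equals_b_plus_c = [
    ("LW", "R1", ()),
    ("LW", "R2", ()),
    ("ADD", "R3", ("R1", "R2")),
    ("SW", None, ("R3",)),
]

trace = interlock_trace(a_equals_b_plus_c)
print(trace)  # ['LW', 'LW', 'stall', 'ADD', 'SW']

# In a five-stage pipeline the last instruction finishes four cycles
# after it is fetched, so the sequence takes len(trace) + 4 cycles.
total_cycles = len(trace) + 4
print(total_cycles)  # 9
```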

20
Datapath and Control: Introduction

We have studied the performance of a machine.
It depends on three factors: instruction count (IC),
clock cycle time, and clock cycles per instruction (CPI).
CPU time = IC × CPI × Clock cycle time
Clock cycle time = 1 / Clock rate
Clock cycle time is determined by hardware technology
and organization.
CPI is determined by organization and instruction set
architecture.
Instruction count is determined by instruction set
architecture and compiler technology.
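
The CPU time formula translates directly into code. The numbers in the example call are made up for illustration only.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC x CPI x clock cycle time, with cycle time = 1 / clock rate."""
    clock_cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program: one million instructions, CPI of 1.15, 100 MHz clock.
seconds = cpu_time(1_000_000, 1.15, 100e6)
print(f"{seconds * 1e3:.1f} ms")
```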

21
Datapath and Control: Introduction (continued)

We will discuss the datapath and control unit for two
different implementations of the MIPS instruction
set, which includes:
Memory-reference instructions: load word (lw)
and store word (sw)
Arithmetic-logical instructions: add, sub, and,
or, slt
The branch equal (beq) and jump (j) instructions

