Title: CS/ECE 365 COMPUTER ARCHITECTURE
1. CS/ECE 365 COMPUTER ARCHITECTURE
- Soundararajan Ezekiel
- Department of Computer Science
- Ohio Northern University
2. Performance of a single-cycle CPU with FP instructions
- Suppose we have an FP unit that requires 8 ns for an FP add and 16 ns for an FP multiply
- Memory: 2 ns
- ALU and adders: 2 ns (FP ALU)
- Register file (read or write): 1 ns
- Find the performance ratio between the variable-clock and single-clock implementations
3. Assumptions
- All loads take the same time and comprise 31% of the instructions
- All stores take the same time and comprise 21%
- R-format: 27%
- Branch: 5%
- Jump: 2%
- FP add and subtract take the same time and together comprise 7%
- FP multiply and divide take the same time and together comprise 7%
4. Time per instruction class (ns)
Instr class   IM   Reg read   ALU op   DM   Reg write   Total
R-format      2    1          2        0    1           6
lw            2    1          2        2    1           8
sw            2    1          2        2    -           7
branch        2    1          2        -    -           5
jump          2    -          -        -    -           2
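5. Answer
- Cycle time for the single-cycle machine: FP multiply takes 2 + 1 + 16 + 1 = 20 ns (the longest instruction time)
- The time for an FP add instruction: 2 + 1 + 8 + 1 = 12 ns
- Average cycle time for the variable-cycle machine: 8 × 31% + 7 × 21% + 6 × 27% + 5 × 5% + 2 × 2% + 12 × 7% + 20 × 7% = 8.1 ns
- Ratio: 20 / 8.1 = 2.469, so the variable-clock machine is faster than the single-clock machine by a factor of 2.469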
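The arithmetic in this answer can be reproduced with a short script. This is only a sketch: the names (`MEM`, `REG`, `ALU`, `FP_ADD`, `FP_MUL`, `classes`) are made up for illustration, and the per-class paths assume each instruction uses the functional units listed in the table for its class, with the FP unit taking the place of the ALU for FP instructions.

```python
# Functional-unit latencies in ns, as given in the problem statement.
MEM, REG, ALU, FP_ADD, FP_MUL = 2, 1, 2, 8, 16

# For each instruction class: (time in ns, fraction of the instruction mix).
classes = {
    "load":       (MEM + REG + ALU + MEM + REG, 0.31),
    "store":      (MEM + REG + ALU + MEM,       0.21),
    "R-format":   (MEM + REG + ALU + REG,       0.27),
    "branch":     (MEM + REG + ALU,             0.05),
    "jump":       (MEM,                         0.02),
    "FP add/sub": (MEM + REG + FP_ADD + REG,    0.07),
    "FP mul/div": (MEM + REG + FP_MUL + REG,    0.07),
}

# Single-cycle clock: every instruction takes as long as the slowest one.
single_cycle = max(time for time, _ in classes.values())

# Variable clock: average instruction time weighted by the instruction mix.
variable_avg = sum(time * share for time, share in classes.values())

ratio = single_cycle / variable_avg

print(f"single-cycle clock: {single_cycle} ns")   # single-cycle clock: 20 ns
print(f"variable-clock avg: {variable_avg:.1f} ns")  # variable-clock avg: 8.1 ns
print(f"ratio: {ratio:.3f}")                      # ratio: 2.469
```

Changing the shares or latencies in `classes` shows how strongly the rare but slow FP multiply dominates the single-cycle clock.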
6. A multicycle implementation
- From the above example, we break each instruction into a series of steps corresponding to the functional unit operations that were needed
- Use these steps to create a multicycle implementation
- Each step takes 1 clock cycle
- This allows a functional unit to be used more than once per instruction, as long as it is used on different clock cycles
- This sharing reduces the hardware requirement
7. The simple datapath for the MIPS architecture
8. High-level view of multicycle datapath
[Figure: high-level view of the multicycle datapath. Labeled blocks: PC, memory (address in, instruction or data out), instruction register, memory data register, registers, A, B, ALU, ALUOut.]
9. Differences from the single-cycle datapath
- A single memory unit is used for both instructions and data
- There is a single ALU, rather than an ALU and two adders
- One or more registers are added after every major functional unit to hold the output of that unit until the value is used in a subsequent clock cycle
10. Note
- At the end of a clock cycle, all data that are used in subsequent clock cycles must be stored in a state element
- Data used by subsequent instructions in a later clock cycle are stored in one of the programmer-visible state elements (register file, PC, or memory)
- Data used by the same instruction in a later cycle must be stored in one of these additional registers
11. Position of the additional registers
- The position of the additional registers is determined by 2 factors:
- 1. What combinational units will fit in one clock cycle
- 2. What data are needed in later cycles implementing the instruction
12. Temporary registers
- The instruction register (IR) saves the output of the memory for an instruction read
- The memory data register (MDR) saves the output of the memory for a data read
- The A and B registers hold the register operand values read from the register file
- The ALUOut register holds the output of the ALU
13. Write control signal
- All the registers except the IR hold data only between a pair of adjacent clock cycles and thus do not need a write control signal
- The IR needs to hold the instruction until the end of execution of that instruction and thus needs a write control signal
14. Multiplexors
- Because several functional units are shared for different purposes, we need to add multiplexors and expand existing ones
- One memory is used for both instructions and data, so add one mux
- This mux selects between the 2 sources for a memory address, namely the PC (instruction access) and ALUOut (data access)
15. Replace ALU
- Replacing the 3 ALUs of the single-cycle datapath with a single ALU means that the single ALU must accommodate all the inputs that used to go to the 3 different ALUs
- This requires 2 changes:
- 1. An additional mux is added for the first ALU input. This multiplexor chooses between the A register and the PC
16. (continued)
- 2. The multiplexor on the second ALU input is changed from a 2-way to a 4-way mux. The two additional inputs to the multiplexor are the constant 4 (used to increment the PC) and the sign-extended and shifted offset field
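A small behavioral sketch of the two ALU input multiplexors described above. The select-signal names `alu_src_a` and `alu_src_b` and their encodings are assumptions made for illustration (the slides do not give them). The sketch also assumes that the two original inputs of the second mux are the B register and the sign-extended offset, and that the shift is a left shift by 2.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_inputs(alu_src_a, alu_src_b, pc, a, b, imm16):
    """Return the (first, second) ALU operands chosen by the two muxes.

    Assumed encodings (hypothetical):
      alu_src_a: 0 -> PC, 1 -> A register
      alu_src_b: 0 -> B register, 1 -> constant 4,
                 2 -> sign-extended offset, 3 -> sign-extended offset << 2
    """
    # 2-way mux on the first ALU input.
    first = pc if alu_src_a == 0 else a

    # 4-way mux on the second ALU input.
    offset = sign_extend16(imm16)
    second = (b, 4, offset, offset << 2)[alu_src_b]
    return first, second


# Incrementing the PC: first input is the PC, second is the constant 4.
print(alu_inputs(0, 1, pc=0x100, a=7, b=9, imm16=0xFFFF))  # (256, 4)
# R-format operation: both operands come from the A and B registers.
print(alu_inputs(1, 0, pc=0x100, a=7, b=9, imm16=0xFFFF))  # (7, 9)
```

With this arrangement the one ALU can do the work of the PC incrementer and the branch-address adder simply by changing the select signals on different clock cycles.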
17. Multicycle datapath for MIPS that handles the basic instructions
18. Control signals
- Because the datapath shown above takes multiple clock cycles per instruction, it requires a different set of control signals
- The PC, memory, registers, and IR need write control signals
- The memory also needs a read signal
- Each two-input mux needs a single control line
- Each four-input mux needs two control lines
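The control-line counts for the muxes follow from needing enough select bits to name every input. A minimal sketch (the function name `select_lines` is made up for illustration, and it assumes a mux with at least two inputs):

```python
def select_lines(num_inputs):
    """Number of control lines needed to choose among num_inputs mux inputs.

    Equivalent to ceil(log2(num_inputs)) for num_inputs >= 2.
    """
    if num_inputs < 2:
        raise ValueError("a multiplexor needs at least two inputs")
    return (num_inputs - 1).bit_length()


print(select_lines(2))  # 1
print(select_lines(4))  # 2
```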
19. Multicycle datapath for MIPS that handles the basic instructions (Fig. 5.32)
20. Jump and branch instructions
- The multicycle datapath still requires additions to support branches and jumps
- After these additions, Figure 5.33 shows the complete multicycle datapath
21. Figure 5.33
22. Performance of multicycle implementation
23. The correct answer should consider
- the clock cycle time, as well as
- the execution time per instruction
24. Example
- In estimating the performance of the single-cycle implementation, we assumed that only the major functional units had any delay (i.e., the delays of the muxes, control unit, PC access, sign-extension unit, and wires were considered negligible)
- Assume that we change the delays specified in the last class so that a different type of adder is used for simple addition:
- ALU: 2 ns
- Adder for PC + 4: X ns
- Adder for branch address computation: Y ns
25. What would the cycle time be if X = 3 and Y = 3?
- The key is to understand that the length of the longest path through the combinational logic determines the cycle time. We compute the length of the longest path for each instruction and then take the maximum
- At present the lw instruction has the longest path, 8 ns
- Answer: this change does not alter the current maximum of 8 ns
26. What would the cycle time be if X = 5 and Y = 5?
- Consider the beq instruction: it now needs X + Y = 10 ns
- So the cycle time is 10 ns
27. What would the cycle time be if X = 1 and Y = 8?
- You may think beq needs X + Y = 9 ns
- But that is not correct
- Think about it
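One way to check your reasoning on these three cases is to model the paths explicitly. The sketch below is an assumption-laden model, not the official solution: it assumes the branch adder cannot start until both PC + 4 (X ns) and the offset from instruction memory (2 ns) are available, that the beq comparison goes through instruction memory, register read, and the ALU, and that all other delays are negligible. The function name `cycle_time` is made up for illustration.

```python
MEM, REG, ALU = 2, 1, 2  # functional-unit delays in ns, from the earlier example


def cycle_time(x, y):
    """Longest combinational path in ns, for PC+4 adder delay x and branch adder delay y."""
    # Longest non-branch path (lw): IM + reg read + ALU + DM + reg write.
    lw = MEM + REG + ALU + MEM + REG

    # beq comparison path: IM + reg read + ALU.
    beq_compare = MEM + REG + ALU

    # beq target path: the branch adder waits for the later of PC+4 and the
    # fetched offset, then adds them (assumed model).
    beq_target = max(x, MEM) + y

    # The PC+4 adder itself is on every instruction's path.
    return max(lw, x, beq_compare, beq_target)


print(cycle_time(3, 3))  # 8
print(cycle_time(5, 5))  # 10
```

Calling `cycle_time(1, 8)` and comparing the result with X + Y shows which input the branch adder is actually waiting for.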
28. SUMMARY