The single cycle CPU - PowerPoint PPT Presentation

Slides: 63
Provided by: toda7
Transcript and Presenter's Notes



1
The single cycle CPU
2
Performance of Single-Cycle Machines
  • Memory Unit: 2 ns
  • ALU and Adders: 2 ns
  • Register file (Read or Write): 1 ns

    Class      Fetch  Decode  ALU  Memory  Write Back  Total
    R-format     2      1      2     0         1        6 ns
    LW           2      1      2     2         1        8 ns
    SW           2      1      2     2         -        7 ns
    Branch       2      1      2     -         -        5 ns
    Jump         2      -      -     -         -        2 ns
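The totals on this slide can be rechecked with a short sketch. The stage names and the mapping of classes to stages are read off the slide; the dictionary and function names are only illustrative.

```python
# Stage latencies in ns, as given on the slide:
# memory unit 2 ns, ALU/adders 2 ns, register file 1 ns.
STAGE_NS = {"fetch": 2, "decode": 1, "alu": 2, "memory": 2, "write_back": 1}

# Stages used by each instruction class (from the slide's table).
CLASS_STAGES = {
    "R-format": ["fetch", "decode", "alu", "write_back"],
    "LW": ["fetch", "decode", "alu", "memory", "write_back"],
    "SW": ["fetch", "decode", "alu", "memory"],
    "Branch": ["fetch", "decode", "alu"],
    "Jump": ["fetch"],
}


def total_ns(instruction_class):
    """Sum the latencies of the stages one instruction class uses."""
    return sum(STAGE_NS[stage] for stage in CLASS_STAGES[instruction_class])


totals = {name: total_ns(name) for name in CLASS_STAGES}

# A single cycle machine must fit its slowest instruction in one clock.
single_cycle_clock_ns = max(totals.values())
```

The slowest class (LW) sets the single cycle clock period, which is why every other instruction wastes part of the cycle.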

3
?? ??? ???? ?? cycle ?? ????? ??? ????? ?????
  • ????? ???? ?????? ?? ??????? ???? ?? ??????
  • R-type 44%, LW 24%, SW 12%
  • BRANCH 18%, JUMP 2%
  • I - instruction count
  • T - clock cycle time
  • CPI - cycles per instruction = 1
  • Execution = I × T × CPI, where the average
    T = 8×24% + 7×12% + 6×44% + 5×18% + 2×2% = 6.3 ns
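As a quick check of the 6.3 ns figure, here is a minimal sketch. It assumes the mix numbers on this slide are percentages of executed instructions and reuses the per-class times from the previous slide; the variable names are illustrative.

```python
# Instruction mix, assumed to be fractions of executed instructions.
MIX = {"R-format": 0.44, "LW": 0.24, "SW": 0.12, "Branch": 0.18, "Jump": 0.02}

# Time each class needs, in ns, from the single cycle performance slide.
TIME_NS = {"R-format": 6, "LW": 8, "SW": 7, "Branch": 5, "Jump": 2}

# Average clock period if every instruction took only the time it needs.
avg_clock_ns = sum(MIX[name] * TIME_NS[name] for name in MIX)

print(f"average cycle time: {avg_clock_ns:.2f} ns")
```

The exact sum is 6.34 ns, which the slide rounds to 6.3 ns.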

4
??????
  • EXE(single cycle) / EXE(variable clock)
    = (I × T single clock) / (I × T variable clock)
    = T single clock / T variable clock = 8 / 6.3
  • ??? ?? 1.27. ???? ???? ???? ???? ???? ????
    ?????? ??????? ??? ?????? ?? floating point
  • ?????? ???? ???? ????? ????? - ????? ??????
    ?????.
  • ?????? ????? ????? ???? ????? ?? cycles.
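The 8 ns versus 6.3 ns comparison on this slide reduces to one division, since the instruction count I cancels. A minimal sketch with illustrative names:

```python
T_SINGLE_NS = 8.0    # clock period of the single cycle machine
T_VARIABLE_NS = 6.3  # average period with a variable-length clock

# Same instruction count and CPI = 1 in both cases, so only T matters.
speedup = T_SINGLE_NS / T_VARIABLE_NS

print(f"variable clock is {speedup:.2f}x faster")
```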

5
Multicycle Approach
?????? ?????? ???? ?- Multicycle ?????? ????
?? ????? ??? ?? ???? ??????? ????? ???????
??. ?????? ??????? ????? ????? ???? ??????
????? ?? ??????.
6
???? ????? ?? ?????????? ?- Multicycle
??? ?? ?????? ??????. ?? ??? cycle - ??? ??
???? ?????? ?????? ??? ???. - ???? ?? ????
?????? ?????? ??? ??? - ?? ??? ???? ?? ????? ???
???????????. ????? ?? ????? ???? - ???? ??
?????? ???? ?????? ?????. - ???? ?????? ????? ??
???????? ??????? ??????.

7
Timing of a lw instruction in a single cycle CPU
[Timing diagram. Signals: PC (0x400000), I.Mem data (memory output), Rs, Rt (ALU inputs), D.Mem adrs (ALU output, the address), D.Mem data (Mem data). Stages: fetch 2 ns, decode 1 ns, execute 2 ns, memory 2 ns, write back 1 ns.]
We want to replace a long single CK cycle with 5 short ones.
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram, clock edges 0 to 5. fetch: PC (0x400000), instruction in IR; decode: A, B; execute: ALU calculates something, result in ALUout; memory: Mem data in MDR; write back.]
8
Therefore we should add registers to the single cycle CPU shown below
[Datapath diagram: PC, +4 Adder, Instruction Memory, Reg File (Rs = [25:21], Rt = [20:16], Rd; 5 bits each), Sext 16->32 of [15:0], ALU, Data Memory (Address, D.In, D.Out).]
9
Adding registers to split the instruction into 5 stages
[Datapath diagram: PC (PCWrite), +4 Adder, Instruction Memory, IR, Reg File (Rs = [25:21], Rt = [20:16], Rd; 5 bits each), A, B, Sext 16->32 of [15:0], ALU, ALUout, Data Memory (Address, D.In, D.Out), MDR; stage boundaries numbered 0 to 5.]
10
Here is the book's version of the multi-cycle CPU.
Only PC and IR have write enable signals. All
other registers hold data for a single cycle.
11
Here is our version of a multi-cycle CPU capable
of R-type, lw/sw and branch instructions
[Datapath diagram: PC, Instruction & data Memory, IR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0], two << 2 shifters.]
12
Let us explain the multi-cycle CPU
  • First we'll look at a CPU capable of performing
    only R-type instructions
  • Then, we'll add the lw instruction
  • And the sw instruction
  • Then, the beq instruction
  • And finally, the j instruction

13
Let us remind ourselves how a single cycle CPU
capable of performing R-type instructions works.
Here you see the data-path and the timing of an
R-type instruction.
[Datapath diagram: PC, +4 Adder, Instruction Memory, Reg File, ALU; opcode = [31:26] (6 bits), funct = [5:0] (6 bits).]
14
A single cycle CPU demo: R-type instruction
[Datapath diagram: PC, constant 4, Instruction Memory, Reg File, ALU.]
15
A multi cycle CPU capable of performing R-type
instructions
[Datapath diagram: PC, Instruction & data Memory, IR, Reg File (Rs = IR[25:21], Rt = IR[20:16], Rd; 5 bits each), A, B, ALU, ALUout.]
16
A multi cycle CPU capable of R-type
instructions: fetch
[Same datapath, fetch highlighted between clock edges 0 and 1.]
17
A multi cycle CPU capable of R-type
instructions: decode
[Same datapath, decode highlighted between clock edges 1 and 2.]
18
A multi cycle CPU capable of R-type
instructions: execute
[Same datapath, execute highlighted between clock edges 2 and 3.]
19
A multi cycle CPU capable of R-type
instructions: write back
[Same datapath, write back to Rd highlighted between clock edges 3 and 4.]
20
Timing of an R-type instruction in a single cycle
CPU
[Timing diagram, clock edges 0 to 4. Signals: PC (0x400000), Inst. Mem data (memory output, the instruction), Rs, Rt (ALU inputs), ALU output (data result of the calculation), GPR input. Stages: fetch, decode, execute, write back.]
Timing of an R-type instruction in a multi-cycle
CPU
[Timing diagram. fetch: PC, Mem data; IR changes from the previous instruction to the current instruction; decode: A, B; execute: ALUout; write back.]
21
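[Timing diagram of an R-type instruction in a multi-cycle CPU, annotated with the register transfers:]
  • fetch: IR = M(PC). Mem data holds the current instruction; IR changes from the previous instruction to the current one.
  • decode: A = Rs, B = Rt (the GPR outputs are loaded into A, B)
  • execute: ALUout = A op B
  • write back: Rd = ALUout, at the rising edge of CK
An R-type instruction takes 4 CKs.
Control signal shown: IRWrite
The state diagram:
  • IR = M(PC)
  • A = Rs, B = Rt
  • ALUout = A op B
  • Rd = ALUout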
22
A multi-cycle CPU capable of R-type instructions
(PC calc.)
[Datapath diagram: PC, Instruction & data Memory, IR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, and the constant 4 used for the PC calculation.]
23
[Timing diagram of an R-type instruction with the PC update:]
  • fetch: PC = PC + 4 (next PC = current PC + 4); Mem data, then IR, hold the current instruction
  • decode: the GPR outputs are loaded into A, B
  • execute: ALUout = A op B
  • write back: Rd = ALUout, at the rising edge of CK
Control signal shown: PCWrite
24
A multi cycle CPU capable of R-type
instructions: fetch
[Datapath diagram: PC, Instruction Memory, IR, Reg File (Rs = IR[25:21], Rt = IR[20:16], Rd; 5 bits each), A, B, ALU, ALUout, constant 4.]
25
The state diagram of a CPU capable of R-type
instructions only
  • IR = M(PC), PC = PC + 4
  • A = Rs, B = Rt
  • ALUout = A op B
  • Rd = ALUout
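The four states can be walked through in software. This is only a behavioral sketch, not the hardware: the function name, the dictionary used as memory and the `alu_op` callback are assumptions made for illustration, while the field positions (Rs = IR[25:21], Rt = IR[20:16], Rd = IR[15:11]) follow the standard MIPS R-type layout.

```python
def run_r_type(regs, memory, pc, alu_op):
    """Step one R-type instruction through the four states.

    Returns (new_pc, cycles). One state corresponds to one clock cycle.
    """
    # State 1, fetch: IR = M(PC), PC = PC + 4
    ir = memory[pc]
    pc = pc + 4

    # State 2, decode: A = Rs, B = Rt
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    rd = (ir >> 11) & 0x1F
    a, b = regs[rs], regs[rt]

    # State 3, execute: ALUout = A op B (kept to 32 bits)
    alu_out = alu_op(a, b) & 0xFFFFFFFF

    # State 4, write back: Rd = ALUout
    regs[rd] = alu_out

    return pc, 4


# Usage: add $3, $1, $2 is encoded as 0x00221820.
regs = [0] * 32
regs[1], regs[2] = 5, 7
memory = {0x400000: 0x00221820}
new_pc, cycles = run_r_type(regs, memory, 0x400000, lambda a, b: a + b)
```

Each comment block corresponds to one register, which is why A, B and ALUout are needed: they carry a value from one clock cycle to the next.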
26
The state diagram of a CPU capable of R-type and
lw instructions
  • ALUout = A + sext(imm)
  • MDR = M(ALUout)
  • Rt = MDR
27
We added registers to split the instruction into
5 stages. Let's discuss the lw instruction
[Datapath diagram: PC (PCWrite), +4 Adder, Instruction Memory, IR, Reg File (Rs = [25:21], Rt = [20:16], Rd; 5 bits each), A, B, Sext 16->32 of [15:0], ALU, ALUout, Data Memory (Address, D.In, D.Out), MDR; stage boundaries numbered 0 to 5.]
28
First we draw a multi-cycle CPU capable of R-type
and lw instructions
[Datapath diagram: PC, Instruction Memory, IR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0], Data Memory, MDR.]
We just moved the data memory.
All parts related to lw only are blue.
29
A multi-cycle CPU capable of R-type and lw
instructions: fetch
[Same datapath, fetch highlighted.]
30
A multi-cycle CPU capable of R-type and lw
instructions: decode
[Same datapath, decode highlighted; Rs = IR[25:21], Rt = IR[20:16]; a << 2 shifter follows the sign extension.]
31
A multi-cycle CPU capable of R-type and lw
instructions: AdrCmp
[Same datapath, address computation highlighted.]
32
A multi-cycle CPU capable of R-type and lw
instructions: memory
[Same datapath, memory access highlighted; labels include Branch Address and << 2.]
33
A multi-cycle CPU capable of R-type and lw
instructions: WB
[Same datapath, write back highlighted; Rt selects the destination register.]
34
Can we unite the Instruction and Data memories?
(They are not used simultaneously, as they are in
the single cycle CPU)
[Datapath diagram: PC, Instruction Memory, IR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0], Data Memory, MDR.]
35
So here is a multi-cycle CPU capable of R-type
and lw instructions, using a single memory for
instructions and data
[Datapath diagram: PC, Instruction & data Memory, IR, MDR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0].]
36
Timing of a lw instruction in a single cycle CPU
[Timing diagram. Signals: PC (0x400000), I.Mem data (memory output), Rs, Rt (ALU inputs), D.Mem adrs (ALU output, the address), D.Mem data (Mem data). Stages: fetch, decode, execute, memory, write back.]
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram. fetch: PC becomes PC+4, IR changes from the previous instruction to the current instruction; decode: A, B; execute: data address in ALUout; memory: Mem data in MDR; write back: data to Rt.]
37
[Timing diagram of a lw instruction in a multi-cycle CPU, annotated with the register transfers:]
  • fetch: IR = M(PC), PC = PC + 4
  • decode: A = Rs, B = Rt (the GPR outputs are loaded into A, B)
  • execute: ALUout = A + sext(imm), the data address
  • memory: MDR = M(ALUout)
  • write back: Rt = MDR, at the rising edge of CK
Control signals shown: PCWrite, IRWrite
38
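The state diagram of a CPU capable of R-type and
lw instructions
  • State 0, Fetch: IR = M(PC), PC = PC + 4
  • State 1, Decode: A = Rs, B = Rt
  • State 2, AdrCmp (lw): ALUout = A + sext(imm)
  • State 3, Load: MDR = M(ALUout)
  • State 4 (lw write back): Rt = MDR
  • State 6, ALU (R-type): ALUout = A op B
  • State 7, WBR (R-type write back): Rd = ALUout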
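The state diagram above can be read as a next-state function. The sketch below is an illustrative software model, assuming the state numbering 0, 1, 2, 3, 4 for the lw path and 6, 7 for the R-type path; the constant and function names are made up for the example.

```python
FETCH, DECODE, ADR_CMP, LOAD, WB_LW, ALU, WB_R = 0, 1, 2, 3, 4, 6, 7

OP_RTYPE = 0b000000
OP_LW = 0b100011


def next_state(state, opcode):
    """Return the state that follows `state` for the given opcode."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        # Only the decode state looks at the opcode.
        if opcode == OP_RTYPE:
            return ALU
        if opcode == OP_LW:
            return ADR_CMP
        raise ValueError(f"unsupported opcode {opcode:06b}")
    if state == ADR_CMP:
        return LOAD
    if state == LOAD:
        return WB_LW
    if state == ALU:
        return WB_R
    if state in (WB_LW, WB_R):
        return FETCH
    raise ValueError(f"unknown state {state}")


def cycles_per_instruction(opcode):
    """Count clock cycles from fetch until control returns to fetch."""
    state, count = FETCH, 0
    while True:
        count += 1
        state = next_state(state, opcode)
        if state == FETCH:
            return count
```

With this model an R-type instruction takes 4 cycles and lw takes 5, which matches the stage counts discussed earlier.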
39
A multi-cycle CPU capable of R-type, lw and sw
instructions
[Datapath diagram: PC, Instruction & data Memory, IR, MDR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0], << 2, Branch Address; the lw and sw paths are labelled.]
40
The state diagram of a CPU capable of R-type,
lw and sw instructions
  • IR = M(PC), PC = PC + 4
  • A = Rs, B = Rt
  • lw, sw: ALUout = A + sext(imm)
  • R-type: ALUout = A op B
  • sw: M(ALUout) = B
  • lw: MDR = M(ALUout)
  • R-type: Rd = ALUout
  • lw: Rt = MDR
41
A multi-cycle CPU capable of R-type, lw/sw and
branch instructions
[Datapath diagram: PC, Instruction & data Memory, IR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0], two << 2 shifters.]
42
Adding the instruction beq to the state diagram
Calc Rs - Rt (just to produce the zero signal)
Calc PC = PC + sext(imm) << 2
43
Adding the instruction beq to the state diagram,
a more efficient way: Let's use the decode state,
in which the ALU is doing nothing, to compute the
branch address. We'll have to store it for 1 more
CK cycle, until we know whether to branch or not!
(We store it in the ALUout reg.)
Calc Rs - Rt. If zero, load the PC with ALUout
data, else do not load the PC.
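A small sketch of the two beq steps may help. It assumes 32-bit addresses and that PC already holds PC + 4 when the branch address is computed during decode; the function names are hypothetical.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_address(pc, imm16):
    """Decode state: ALUout = PC + (sext(imm) << 2)."""
    return (pc + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def beq_complete(pc, alu_out, a, b):
    """Next state: compute A - B; load the PC from ALUout only if zero."""
    zero = ((a - b) & 0xFFFFFFFF) == 0
    return alu_out if zero else pc


# Usage: PC already incremented to 0x400004, branch offset of 3 words.
target = branch_address(0x400004, 3)
taken_pc = beq_complete(0x400004, target, 5, 5)
not_taken_pc = beq_complete(0x400004, target, 5, 6)
```

Computing the target early costs nothing, because the ALU is idle in decode, and it leaves the ALU free for the Rs - Rt comparison in the following cycle.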
44
A multi-cycle CPU capable of R-type, lw/sw and
branch instructions
[Datapath diagram: PC, Instruction & data Memory, IR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0], << 2; the PC is loaded from either PC+4 or the Branch Address.]
45
Adding the instruction j to the state diagram
PC = PC[31:28] || IR[25:0] << 2
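The jump address is a concatenation of the upper 4 bits of the PC with the 26-bit field of the instruction shifted left by 2. A minimal sketch, with a hypothetical function name:

```python
def jump_address(pc, ir):
    """PC = PC[31:28] concatenated with (IR[25:0] << 2)."""
    upper = pc & 0xF0000000               # PC[31:28], kept in place
    lower = (ir & 0x03FFFFFF) << 2        # IR[25:0] as a 28-bit byte address
    return upper | lower


# Usage: j with a 26-bit target field of 0x100000 (opcode 000010).
ir = (0b000010 << 26) | 0x100000
target = jump_address(0x400004, ir)
```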
46
A multi-cycle CPU capable of R-type, lw/sw,
branch and jump instructions
[Datapath diagram: PC, Instruction & data Memory, IR, Reg File (Rd, 5 bits), A, B, ALU, ALUout, constant 4, Sext 16->32 of IR[15:0], << 2. PC sources: PC+4 (next address), Branch Address, and Jump address = PC[31:28] || IR[25:0] << 2.]
47
????? ???? ??????? ??????
[State diagram with states 0 to 9.]
48
MultiCycle implementation with Control
49
Final State Machine

50
The final state diagram
51
(No Transcript)
52
MultiCycle implementation with Control
53
Finite State Machine for Control (The book's
version)
  • Implementation

54
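The Control Finite State Machine
[Block diagram: the State reg (clocked by ck) holds the current state; the outputs decoder produces the control signals from the current state; the next state calculation produces the next state from the current state, the opcode IR[31:26] and condition inputs (zero, neg, etc.).]
For 10 states coded 0-9, we need 4 bits, i.e.,
S3, S2, S1, S0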
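The 4-bit figure follows from needing enough bits to count up to the highest state number. A short illustrative sketch (function names are assumptions):

```python
def state_bits(n_states):
    """Number of bits needed to encode states 0 .. n_states - 1."""
    return max(1, (n_states - 1).bit_length())


def encode_state(state, width):
    """Return the state as a tuple of bits, most significant first."""
    return tuple((state >> i) & 1 for i in range(width - 1, -1, -1))


width = state_bits(10)                   # states 0-9
s3, s2, s1, s0 = encode_state(9, width)  # bits of the highest state
```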
55
The control signals decoder
We just implement the table of slide 54.
Let's look at ALUSrcA: it is 0 in states 0 and
1, and it is 1 in states 2, 6 and 8. In all
other states we don't care. Let's look at
PCWrite: it is 1 in states 0 and 9. In all
other states it must be 0. And so, we'll fill
the table below and build the decoder.
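The two signals described above can be written as a lookup keyed by state. This sketch covers only ALUSrcA and PCWrite, since those are the only values the slide spells out; `None` stands for a don't care, and the names are illustrative.

```python
# ALUSrcA per state; states not listed are don't cares.
ALU_SRC_A = {0: 0, 1: 0, 2: 1, 6: 1, 8: 1}

# PCWrite is 1 only in these states and must be 0 everywhere else.
PC_WRITE_STATES = {0, 9}


def decode_outputs(state):
    """Return the two control signals for a state (None = don't care)."""
    return {
        "ALUSrcA": ALU_SRC_A.get(state),
        "PCWrite": 1 if state in PC_WRITE_STATES else 0,
    }
```

The difference between the two signals matters for logic minimization: a don't care can be set to whichever value gives the smaller circuit, whereas PCWrite must be forced to 0 so the PC is never overwritten by accident.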
56
The state machine next state calc. logic
[Next-state logic diagram with terms for R-type, lw+sw, lw and sw.]
R-type=000000, lw=100011, sw=101011, beq=000100,
bne=000101, lui=001111, j=000010, jal=000011,
addi=001000
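The opcode list can be used directly as a lookup on IR[31:26]. A minimal sketch, with illustrative names; the 6-bit values are the standard MIPS opcodes listed on the slide.

```python
OPCODES = {
    "R-type": 0b000000,
    "lw": 0b100011,
    "sw": 0b101011,
    "beq": 0b000100,
    "bne": 0b000101,
    "lui": 0b001111,
    "j": 0b000010,
    "jal": 0b000011,
    "addi": 0b001000,
}

# Reverse map from opcode value to name.
NAMES = {value: name for name, value in OPCODES.items()}


def opcode_of(ir):
    """Extract the 6-bit opcode field IR[31:26]."""
    return (ir >> 26) & 0x3F


def instruction_class(ir):
    """Name the instruction class, or None if the opcode is not listed."""
    return NAMES.get(opcode_of(ir))
```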
57
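The Control Finite State Machine
[Block diagram: State reg (clocked by ck) holding the current state, outputs decoder producing the control signals, next state calculation fed by the opcode IR[31:26]. Moore machine: outputs such as PCWrite depend only on the current state. Mealy machine: the write signal to the PC also depends on an input, combining PCWrite with PCWriteCond and zero.]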
58
Finite State Machine for Control
59
ROM Implementation
  • ROM = "Read Only Memory"
  • values of memory locations are fixed ahead of
    time
  • A ROM can be used to implement a truth table
  • if the address is m bits, we can address 2^m
    entries in the ROM.
  • our outputs are the bits of data that the address
    points to. m is the "height", and n is the
    "width"

    address (m = 3) | outputs (n = 4)
    0 0 0           | 0 0 1 1
    0 0 1           | 1 1 0 0
    0 1 0           | 1 1 0 0
    0 1 1           | 1 0 0 0
    1 0 0           | 0 0 0 0
    1 0 1           | 0 0 0 1
    1 1 0           | 0 1 1 0
    1 1 1           | 0 1 1 1
60
ROM Implementation
  • How many inputs are there? 6 bits for opcode + 4
    bits for state = 10 address lines (i.e., 2^10 =
    1024 different addresses)
  • How many outputs are there? 16 datapath-control
    outputs + 4 state bits = 20 outputs
  • ROM is 2^10 x 20 = 20K bits (and a rather
    unusual size)
  • Rather wasteful, since for lots of the entries,
    the outputs are the same, i.e., opcode is often
    ignored
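The ROM size above is plain arithmetic on the bit counts given in the bullets. A short sketch, with illustrative names:

```python
OPCODE_BITS = 6
STATE_BITS = 4
DATAPATH_CONTROL_OUTPUTS = 16

address_lines = OPCODE_BITS + STATE_BITS               # ROM address width
rom_entries = 2 ** address_lines                       # one row per address
outputs_per_entry = DATAPATH_CONTROL_OUTPUTS + STATE_BITS  # ROM word width

rom_bits = rom_entries * outputs_per_entry
rom_kbits = rom_bits / 1024
```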

61
ROM vs PLA
  • Break up the table into two parts:
    4 state bits tell you the 16 outputs, 2^4 x 16
    bits of ROM; 10 bits tell you the 4 next state
    bits, 2^10 x 4 bits of ROM. Total: 4.3K bits of ROM
  • PLA is much smaller: it can share product terms,
    only needs entries that produce an active
    output, and can take into account don't cares
  • Size is (#inputs x #product-terms) + (#outputs x
    #product-terms). For this example:
    (10x17) + (20x17) = 510 PLA cells
  • PLA cells are usually about the size of a ROM cell
    (slightly bigger)
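The sizes in this comparison can be recomputed from the bit counts and the 17 product terms quoted above. This is a sketch with illustrative names; note that (10 x 17) + (20 x 17) evaluates to 510.

```python
STATE_BITS = 4
OPCODE_BITS = 6
CONTROL_OUTPUTS = 16
PRODUCT_TERMS = 17  # figure quoted on the slide

# Split ROM: one table for the control outputs, one for the next state.
outputs_rom_bits = 2 ** STATE_BITS * CONTROL_OUTPUTS
next_state_rom_bits = 2 ** (STATE_BITS + OPCODE_BITS) * STATE_BITS
split_rom_bits = outputs_rom_bits + next_state_rom_bits

# PLA: an AND plane (inputs x terms) plus an OR plane (outputs x terms).
pla_inputs = STATE_BITS + OPCODE_BITS
pla_outputs = CONTROL_OUTPUTS + STATE_BITS
pla_cells = pla_inputs * PRODUCT_TERMS + pla_outputs * PRODUCT_TERMS
```

The split ROM comes to 4352 bits (about 4.3K), against 20K bits for the single ROM, and the PLA needs a few hundred cells.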

62
End of multi-cycle implementation