Title: 55:035 Computer Architecture and Organization
2Outline
- Building a CPU
- Basic Components
- MIPS Instructions
- Basic 5 Steps for CPU
- Single-Cycle Design
- Multi-cycle Design
- Comparison of Single and Multi-cycle Designs
3Overview
- Brief look
- Digital logic
- CPU Datapath
- MIPS Example
4Digital Logic
- Multiplexer: data inputs A and B, select input S, output F
- D-type Flip-flop with Enable
5Digital Logic
- Registers: 4 bits, or N bits in general
- Inputs: EN (enable), Clock (edge-triggered)
6Digital Logic
Tri-state Driver (Buffer)
- Signals: in, drive, out
In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1
What is Z?
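The components on these slides can be sketched as small Python functions. This is only an illustration: the function names are made up here, Z is taken to mean the high-impedance (undriven) state, and the choice that S = 0 selects A is an assumption, since the slide does not state the select polarity.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer. Assumed polarity: S == 0 selects A, S == 1 selects B."""
    return b if s else a


def tristate(inp, drive):
    """Tri-state driver: passes the input when drive == 1, otherwise 'Z'."""
    return inp if drive else "Z"


def d_flip_flop_en(q, d, en):
    """Value stored in a D flip-flop with enable after one clock edge."""
    return d if en else q


# Reproduce the tri-state truth table (In, Drive, Out).
for inp, drive in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(inp, drive, tristate(inp, drive))
```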
7Digital Logic
Adder/Subtractor or ALU
- Inputs: A, B, Carry-in
- Outputs: F, Carry-out
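A minimal sketch of an adder/subtractor, assuming 32-bit operands and the usual two's-complement trick of inverting B and setting carry-in to 1 for subtraction. The function name and the `subtract` flag are illustrative, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF


def add_sub(a, b, subtract=False):
    """Return (F, carry_out) for A + B or A - B on 32-bit values.

    Subtraction is computed as A + (~B) + 1: B is inverted and
    the carry-in is forced to 1.
    """
    carry_in = 1 if subtract else 0
    b_eff = (~b & MASK32) if subtract else (b & MASK32)
    total = (a & MASK32) + b_eff + carry_in
    carry_out = 1 if total > MASK32 else 0
    return total & MASK32, carry_out


print(add_sub(5, 3))                 # addition
print(add_sub(5, 3, subtract=True))  # subtraction
```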
8Overview
- Brief look
- Digital logic
- How to Design a CPU Datapath
- MIPS Example
9Designing a CPU 5 Steps
- Step 1: Analyze the instruction set → datapath requirements
  - MIPS ADD, SUB, ORI, LW, SW, BR
  - Meaning of each instruction given by RTL (register transfers)
  - 2 types of registers: CPU/ISA registers, temporary registers
- Step 2: Datapath requirements → select the datapath components
  - ALU, register file, adder, data memory, etc.
- Step 3: Assemble the datapath
  - Datapath must support planned register transfers
  - Ensure all instructions are supported
- Step 4: Analyze datapath control required for each instruction
- Step 5: Assemble the control logic
10Step 1a Analyze ISA
- All MIPS instructions are 32 bits long.
- Three instruction formats
- R-type
- I-type
- J-type
- R: registers, I: immediate, J: jumps
- These formats intentionally chosen to simplify design
11Step 1b Analyze ISA
- Meaning of the fields
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
    - Destination is either rd (R-type) or rt (I-type)
  - shamt: shift amount
  - funct: selects the variant of the operation in the op field
  - immediate: address offset or immediate value
  - target address: target address of the jump instruction
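The field layout can be made concrete with a short decoder. This sketch uses the standard MIPS bit positions (op in bits 31..26, rs 25..21, rt 20..16, rd 15..11, shamt 10..6, funct 5..0); the function name `decode` and the dictionary layout are illustrative choices.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into all possible fields.

    Which fields are meaningful depends on the format (R, I, or J).
    """
    return {
        "op":     (instr >> 26) & 0x3F,
        "rs":     (instr >> 21) & 0x1F,
        "rt":     (instr >> 16) & 0x1F,
        "rd":     (instr >> 11) & 0x1F,
        "shamt":  (instr >> 6) & 0x1F,
        "funct":  instr & 0x3F,
        "imm16":  instr & 0xFFFF,
        "target": instr & 0x03FFFFFF,
    }


r_fields = decode(0x012A4021)  # addu $8, $9, $10 (R-type)
i_fields = decode(0x8D280004)  # lw $8, 4($9)     (I-type)
print(r_fields["rs"], r_fields["rt"], r_fields["rd"], hex(r_fields["funct"]))
print(hex(i_fields["op"]), i_fields["rs"], i_fields["rt"], i_fields["imm16"])
```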
12MIPS ISA subset for today
- ADD and SUB
- addU rd, rs, rt
- subU rd, rs, rt
- OR Immediate
- ori rt, rs, imm16
- LOAD and STORE Word
- lw rt, rs, imm16
- sw rt, rs, imm16
- BRANCH
- beq rs, rt, imm16
13Step 2 Datapath Requirements
- REGISTER FILE
- MIPS ISA requires 32 registers, 32b each
- Called a register file
- Contains 32 entries
- Each entry is 32b
- AddU rd,rs,rt or SubU rd,rs,rt
- Read two sources rs, rt
- Operation: rs + rt or rs - rt
- Write destination: rd ← rs +/- rt
- Requirements
- Read two registers (rs, rt)
- Perform ALU operation
- Write a third register (rd)
How to implement?
14Step 3 Datapath Assembly
- ADDU rd, rs, rt and SUBU rd, rs, rt
- Need an ALU
- Hook it up to REGISTER FILE
- REGFILE has 2 read ports (rs, rt), 1 write port (rd)
- Parameters come from instruction fields: rs, rt, rd
- Control signals depend upon instruction fields, e.g., ALUop = f(Instruction) = f(op, funct)
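A rough software model of this step: a register file with two read ports and one write port, feeding a two-function ALU. The class and function names are invented for illustration, and keeping register 0 at zero is a MIPS convention that the slide does not mention.

```python
class RegisterFile:
    """32 entries of 32 bits: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs, rt):
        """Read both source registers (the two read ports)."""
        return self.regs[rs], self.regs[rt]

    def write(self, rd, value, reg_write=True):
        """Write one register, only when RegWrite is asserted.

        Register 0 stays zero (MIPS convention, an assumption here).
        """
        if reg_write and rd != 0:
            self.regs[rd] = value & 0xFFFFFFFF


def alu(a, b, alu_op):
    """ALU with the two operations needed for ADDU and SUBU."""
    if alu_op == "add":
        return (a + b) & 0xFFFFFFFF
    if alu_op == "sub":
        return (a - b) & 0xFFFFFFFF
    raise ValueError("unsupported ALU operation")


rf = RegisterFile()
rf.write(9, 40)
rf.write(10, 2)
a, b = rf.read(9, 10)          # rs = 9, rt = 10
rf.write(8, alu(a, b, "add"))  # rd = 8, as in addu $8, $9, $10
print(rf.regs[8])
```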
15Steps 2 and 3 ORI Instruction
- ORI rt, rs, Imm16
- Need new ALUop for OR function, hook up to REGFILE
- 1 read port (rs), 1 write port (rt), 1 const value (Imm16)
- Parameters come from instruction fields: rs, rt (the write register is rt in place of rd)
- Control signals depend upon instruction fields, e.g., ALUsrc = f(Instruction) = f(op, funct)
16Steps 2 and 3 Destination Register
- Must select proper destination, rd or rt
- Depends on Instruction Type
- R-type may write rd
- I-type may write rt
17Steps 2 and 3 Load Word
- LW rt, rs, Imm16
- Need Data Memory: data ← Mem[Addr]
- Addr is rs + Imm16; Imm16 is signed, use ALU for +
- Store in rt: rt ← Mem[rs + Imm16]
18Steps 2 and 3 Store Word
- SW rt, rs, Imm16
- Need Data Memory: Mem[Addr] ← data
- Addr is rs + Imm16; Imm16 is signed, use ALU for +
- Store in Mem: Mem[rs + Imm16] ← rt
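The address calculation shared by LW and SW can be sketched as follows. The helper names are illustrative, and the dictionary standing in for data memory is an assumption made to keep the example short.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed value (sign extension)."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext16(imm16):
    """Zero extension, as used by ORI."""
    return imm16 & 0xFFFF


def effective_address(rs_value, imm16):
    """Addr = rs + sign_ext(Imm16), wrapped to 32 bits."""
    return (rs_value + sign_ext16(imm16)) & 0xFFFFFFFF


data_mem = {}                       # hypothetical data memory: address -> word
rs_value, rt_value = 0x1000, 0xCAFE

# SW rt, rs, -4 : Mem[rs + Imm16] <- rt
data_mem[effective_address(rs_value, 0xFFFC)] = rt_value
# LW rt, rs, -4 : rt <- Mem[rs + Imm16]
loaded = data_mem[effective_address(rs_value, 0xFFFC)]
print(hex(effective_address(rs_value, 0xFFFC)), hex(loaded))
```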
19Writes Need to Control Timing
- Problem: write to data memory
  - Data can come anytime
  - Addr must come first
  - MemWrite must come after Addr
  - Else? Writes to wrong Addr!
- Solution: use ideal data memory
  - Assume everything works ok
  - How to fix this for real?
    - One solution: synchronous memory
    - Another solution: delay MemWr to come late
- Problems? Write to register file
  - Does RegWrite signal come after WrReg number?
  - When does the write to a register happen?
  - Read from same register as being written?
20Missing Pieces Instruction Fetching
- Where does the Instruction come from?
  - From instruction memory, of course!
  - Recall: stored-program concept
  - Alternatives? How about hard-coding wires and switches? This is how ENIAC was programmed!
- How to branch?
  - BEQ rs, rt, Imm16
21Instruction Processing
- Fetch instruction
- Execute instruction
- Fetch next instruction
- Execute next instruction
- Fetch next instruction
- Execute next instruction
- Etc
- How to maintain sequence? Use a counter!
- Branches (out of sequence)? Load the counter!
22Instruction Processing
- Program Counter
  - Points to current instruction
  - Address to instruction memory
    - Instr ← InstrMem[PC]
- Next instruction: counts up by 4
  - Remember: memory is byte-addressable, instructions are 4 bytes
  - PC ← PC + 4
- Branch instruction: replace PC contents
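A small sketch of sequential fetching from a byte-addressable instruction memory. The `fetch` helper and the big-endian byte order are assumptions chosen for the example; the two instruction words are ordinary MIPS encodings used as sample data.

```python
def fetch(instr_mem, pc):
    """Instr <- InstrMem[PC]: read the 4 bytes starting at PC.

    Big-endian byte order is an assumption made for this sketch.
    """
    return int.from_bytes(instr_mem[pc:pc + 4], "big")


# Two 4-byte instructions stored back to back.
instr_mem = bytes.fromhex("012a4021" "8d280004")

pc = 0
trace = []
while pc < len(instr_mem):
    trace.append((pc, fetch(instr_mem, pc)))
    pc = pc + 4                      # PC <- PC + 4

for address, word in trace:
    print(address, hex(word))
```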
23Step 1 Analyze Instructions
- Register Transfer Language
R-type:  op | rs | rt | rd | shamt | funct  = InstrMem[PC]
I-type:  op | rs | rt | Imm16               = InstrMem[PC]

Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt];  PC ← PC + 4
SUBU    R[rd] ← R[rs] - R[rt];  PC ← PC + 4
ORI     R[rt] ← R[rs] | zero_ext(Imm16);  PC ← PC + 4
LOAD    R[rt] ← MEM[R[rs] + sign_ext(Imm16)];  PC ← PC + 4
STORE   MEM[R[rs] + sign_ext(Imm16)] ← R[rt];  PC ← PC + 4
BEQ     if (R[rs] == R[rt]) then PC ← PC + 4 + {sign_ext(Imm16), b00}
        else PC ← PC + 4
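The register transfers above translate almost line for line into a tiny interpreter. This is a sketch rather than a full MIPS model: `step`, `i_type`, and `r_type` are illustrative helpers, memory is a dictionary keyed by byte address, and only the six instructions of this subset are handled (using their standard opcode and funct encodings).

```python
MASK = 0xFFFFFFFF


def sext(imm):
    """Sign-extend a 16-bit immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm


def i_type(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def step(pc, R, MEM, instr):
    """Apply one instruction's register transfers and return the next PC."""
    op, rs, rt = instr >> 26, (instr >> 21) & 31, (instr >> 16) & 31
    rd, funct, imm = (instr >> 11) & 31, instr & 63, instr & 0xFFFF
    if op == 0 and funct == 0x21:            # ADDU
        R[rd] = (R[rs] + R[rt]) & MASK
    elif op == 0 and funct == 0x23:          # SUBU
        R[rd] = (R[rs] - R[rt]) & MASK
    elif op == 0x0D:                         # ORI (zero-extended immediate)
        R[rt] = R[rs] | imm
    elif op == 0x23:                         # LOAD
        R[rt] = MEM.get((R[rs] + sext(imm)) & MASK, 0)
    elif op == 0x2B:                         # STORE
        MEM[(R[rs] + sext(imm)) & MASK] = R[rt]
    elif op == 0x04:                         # BEQ
        if R[rs] == R[rt]:
            return (pc + 4 + (sext(imm) << 2)) & MASK
    else:
        raise ValueError("instruction outside the six-instruction subset")
    return (pc + 4) & MASK


R, MEM, pc = [0] * 32, {}, 0
program = [
    i_type(0x0D, 0, 1, 5),       # ori  $1, $0, 5
    i_type(0x0D, 0, 2, 7),       # ori  $2, $0, 7
    r_type(1, 2, 3, 0x21),       # addu $3, $1, $2
    i_type(0x2B, 0, 3, 0x100),   # sw   $3, 0x100($0)
    i_type(0x23, 0, 4, 0x100),   # lw   $4, 0x100($0)
]
while pc < 4 * len(program):
    pc = step(pc, R, MEM, program[pc // 4])
print(R[3], R[4], MEM)
```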
24Steps 2 and 3 Datapath Assembly
[Figure: PC feeds the Read address input of Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4.]
- PC: a register
  - Counter, counts by 4
  - Provides address to Instruction Memory
25Steps 2 and 3 Datapath Assembly
[Figure: instruction fetch with branch support. PC feeds Instruction Memory (Instruction[31:0], with fields [25:21], [20:16], [15:11], and [15:0] = Imm16). One adder computes PC + 4; a second adder adds the extended Imm16 after Shift Left 2; a mux controlled by PCSrc selects the next PC. Sign/Zero Extend (16 → 32 bits) is controlled by ExtOp.]
- PC: a register
  - Counter, counts by 4
  - Sometimes, must add {SignExtend(Imm16), b00} for branch instructions
- Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
26Steps 2 and 3 Add Previous Datapath
[Figure: complete single-cycle datapath. Components: PC, Instruction Memory, Register File (Read reg. 1, Read reg. 2, Write reg., Write data, Read data 1, Read data 2), Sign/Zero Extend (16 → 32 bits), ALU (ALU result, Zero), ALU Control, Data Memory (Address, Write data, Read data), PC + 4 adder, branch adder with Shift Left 2, and four muxes. Control signals: RegWrite, PCSrc, MemtoReg, ALUSrc, RegDst, MemWrite, ExtOp, ALUOp. ALU Control also takes Instruction[5:0] (funct).]
27What have we done?
- Created a simple CPU datapath
- Control still missing (next slide)
- Single-cycle CPU
- Every instruction takes 1 clock cycle
- Clocking ?
28One Clock Cycle
- Clock Locations
  - PC, REGFILE have clocks
- Operation
  - On rising edge, PC will get new value
    - Maybe REGFILE will have one value updated as well
  - After rising edge
    - PC and REGFILE can't change
    - New value out of PC
    - Instruction out of INSTRMEM
    - Instruction selects registers to read from REGFILE
    - Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
    - ALU does its work
    - DataMem may be read (depending on instruction)
    - Result value goes back to REGFILE
    - New PC value goes back to PC
    - Await next clock edge
Lots to do in only 1 clock cycle!
29Missing Steps?
- Control is missing (Steps 4 and 5 we mentioned earlier)
  - Generate the green signals
    - ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
  - These are all f(Instruction), where f() is a logic expression
  - Will look at control strategies in upcoming lecture
- Implementation Details
  - How to implement REGFILE?
    - Read port: tristate buffers? Multiplexer? Memory?
    - Two read ports: two of above?
    - Write port: how to write only 1 register?
  - How to control writes to memory? To register file?
- More instructions
  - Shift instructions
  - Jump instruction
  - Etc.
30 1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath, repeated from slide 26.]
31 1-cycle CPU Datapath + Control
[Figure: single-cycle datapath with a Control unit added. Control takes Instruction[31:26] and produces RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. ALU control takes ALUOp and Instruction[5:0]. The labels PCSrc, Branch, and the ALU Zero output appear at the next-PC selection.]
32 1-cycle CPU Control Lookup Table
Input or Output  Signal Name  R-format  Lw  Sw  Beq
Inputs           Op5          0         1   1   0
Inputs           Op4          0         0   0   0
Inputs           Op3          0         0   1   0
Inputs           Op2          0         0   0   1
Inputs           Op1          0         1   1   0
Inputs           Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
Outputs          ALUSrc       0         1   1   0
Outputs          MemtoReg     0         1   X   X
Outputs          RegWrite     1         1   0   0
Outputs          MemRead      0         1   0   0
Outputs          MemWrite     0         0   1   0
Outputs          Branch       0         0   0   1
Outputs          ALUOp1       1         0   0   0
Outputs          ALUOp0       0         0   0   1
- Also: I-type instructions (ORI), ExtOp (sign-extend control), etc.
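The lookup table maps directly onto a dictionary keyed by the 6-bit opcode. In this sketch the name `CONTROL` and the use of `None` for a don't-care (X) entry are illustrative choices; the values themselves are copied from the table.

```python
# Opcode (Op5..Op0) -> control outputs. None stands for X (don't care).
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),   # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),   # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}


def control(instr):
    """Look up the control outputs from Instruction[31:26]."""
    return CONTROL[(instr >> 26) & 0x3F]


print(control(0x8D280004))  # an lw instruction word
```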
33 1-cycle CPU Jump Instruction
[Figure: single-cycle datapath extended for the jump instruction. Jump address [31:0] is built from (PC + 4)[31:28] and Instruction[25:0]. Instruction fields shown: [31:26], [25:21], [20:16], [15:11], [15:0], [5:0].]
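A sketch of how the jump address is assembled, assuming the standard MIPS rule that the 26-bit target is shifted left by 2 and combined with the top four bits of PC + 4. The function name is illustrative.

```python
def jump_address(pc, instr):
    """Jump address[31:0] = (PC + 4)[31:28] : Instruction[25:0] : 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000          # (PC + 4)[31:28]
    lower = (instr & 0x03FFFFFF) << 2       # Instruction[25:0] shifted left 2
    return upper | lower


print(hex(jump_address(0x00400000, 0x08100004)))
```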
34 1-cycle CPU Problems?
- Every instruction 1 cycle
- Some instructions do more work
  - E.g., lw must read from DATAMEM
- All instructions must have same clock period
- Many instructions run slower than necessary
- Tricky timing on MemWrite, RegWrite(?) signals
  - Write signal must come after address is stable
- Need extra resources
  - PC+4 adder, ALU for BEQ instruction, DATAMEM + INSTRMEM
35Performance!
- Single-Cycle CPU Performance
  - Execute one instruction per clock cycle (CPI = 1)
  - Clock cycle time? Note dataflow includes:
    - INSTRMEM read
    - REGFILE access
    - Sign extension
    - ALU operation
    - DATAMEM read
    - REGFILE/PC write
  - Not every instruction uses all resources (e.g., DATAMEM read)
- Can we change clock period for each instruction?
  - No! (Why not?)
  - One clock period: the worst case!
  - This is why a single-cycle CPU is not good for performance
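The worst-case argument can be checked with a few lines of arithmetic. All delay values below are hypothetical numbers chosen only for illustration (they are not from the slides), and sign extension is ignored because it overlaps with the register read in this simplified model.

```python
# Hypothetical unit delays in picoseconds (illustrative only).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Units on the critical path of each instruction class.
PATHS = {
    "lw":    ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":    ["instr_mem", "reg_read", "alu", "data_mem"],
    "rtype": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq":   ["instr_mem", "reg_read", "alu"],
}

latency = {name: sum(DELAY_PS[u] for u in units) for name, units in PATHS.items()}

# A single-cycle CPU has one fixed period: the slowest instruction sets it.
clock_period = max(latency.values())
wasted = {name: clock_period - t for name, t in latency.items()}

print(latency, clock_period, wasted)
```

With these assumed delays the load sets the period, and every other instruction idles for part of each cycle.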
36 1-cycle CPU Datapath + Controller
[Figure: the single-cycle datapath with its controller, including the jump path shown on slide 33.]
37 1-cycle CPU Summary
- Operation
  - 1 cycle per instruction
  - Control signals held fixed during entire cycle (except BRANCH)
  - Only 2 registers
    - PC, updated every clock cycle
    - REGFILE, updated when required
  - During clock cycle, data flows from register-outputs to register-inputs
  - Fixed clock frequency / period
- Performance
  - 1 instruction per cycle
  - Slowest instruction determines clock frequency
- Outstanding issue: MemWrite timing
  - Assume this signal writes to memory at end of clock cycle
38Multi-cycle CPU Goals
- Improve performance
  - Break each instruction into smaller steps / multiple cycles
    - LW instruction → 5 cycles
    - SW instruction → 4 cycles
    - R-type instruction → 4 cycles
    - Branch, Jump → 3 cycles
  - Aim for 5x clock frequency
    - Complex instructions (e.g., LW) → 5 cycles → same performance as before
    - Simple instructions (e.g., ADD) → fewer cycles → faster
- Save resources (gates/transistors)
  - Re-use ALU over multiple cycles
  - Put INSTR + DATA in same memory
- MemWrite timing solved?
39Multi-cycle CPU Datapath
[Figure: multi-cycle datapath. One Memory (Address, Write data, MemData) holds both instructions and data. Other components: PC, Instruction Register (fields [25:21], [20:16], [15:11], [15:0], [5:0]), Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), temporary registers A and B, ALU (ALU result, Zero), ALUOut, Sign Extend, Shift Left 2, and the constant 4.]
- Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
- Move signal paths (4, Shift Left 2)
40Multi-cycle CPU Datapath
[Figure: the multi-cycle datapath, as on the previous slide.]
- Add registers + control signals (IR, MDR, A, B, ALUOut)
- Registers with no control signal load value every clock cycle (e.g., PC)
41Instruction Execution Example
- Execute a Load Word instruction
- LW rt, 0(rs)
- 5 Steps
- Fetch instruction
- Read registers
- Compute address
- Read data
- Write registers
42Load Word Instruction Sequence
[Figure: multi-cycle datapath with the instruction fetch path highlighted.]
1. Fetch Instruction: InstructionRegister ← Mem[PC]
43Load Word Instruction Sequence
[Figure: multi-cycle datapath with the register read path highlighted.]
2. Read Registers: A ← Registers[Rs]
44Load Word Instruction Sequence
[Figure: multi-cycle datapath with the address computation path highlighted.]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
45Load Word Instruction Sequence
[Figure: multi-cycle datapath with the data memory read path highlighted.]
4. Read Data: MDR ← Memory[ALUOut]
46Load Word Instruction Sequence
[Figure: multi-cycle datapath with the register write path highlighted.]
5. Write Registers: Registers[Rt] ← MDR
47Load Word Instruction Sequence
[Figure: multi-cycle datapath with all five Load Word paths highlighted.]
All 5 Steps Shown
48Multi-cycle Load Word Recap
- 1. Fetch Instruction: InstructionRegister ← Mem[PC]
- 2. Read Registers: A ← Registers[Rs]
- 3. Compute Address: ALUOut ← A + SignExt(Imm16)
- 4. Read Data: MDR ← Memory[ALUOut]
- 5. Write Registers: Registers[Rt] ← MDR
- Missing Steps?
49Multi-cycle Load Word Recap
- 1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
- 2. Read Registers: A ← Registers[Rs]
- 3. Compute Address: ALUOut ← A + SignExt(Imm16)
- 4. Read Data: MDR ← Memory[ALUOut]
- 5. Write Registers: Registers[Rt] ← MDR
- Missing Steps?
  - Must increment the PC
  - Do it as part of the instruction fetch (in step 1)
  - Need PCWrite control signal
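The five Load Word steps can be traced in software, one step per simulated clock cycle. This is a sketch under simplifying assumptions: the single memory is a dictionary keyed by byte address, the temporary registers live in a `state` dictionary, and the function name is invented for the example.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm


def multicycle_lw(state, regs, mem):
    """Generator: performs one LW step per clock cycle and yields its name."""
    state["IR"] = mem[state["PC"]]                    # 1. IR <- Mem[PC]
    state["PC"] += 4                                  #    PC <- PC + 4
    yield "fetch"
    state["A"] = regs[(state["IR"] >> 21) & 31]       # 2. A <- Registers[Rs]
    yield "read registers"
    imm = state["IR"] & 0xFFFF
    state["ALUOut"] = (state["A"] + sext16(imm)) & 0xFFFFFFFF  # 3. A + SignExt
    yield "compute address"
    state["MDR"] = mem[state["ALUOut"]]               # 4. MDR <- Memory[ALUOut]
    yield "read data"
    regs[(state["IR"] >> 16) & 31] = state["MDR"]     # 5. Registers[Rt] <- MDR
    yield "write registers"


regs = [0] * 32
regs[9] = 0x100
# One memory for instructions and data: lw $8, 4($9) at address 0.
mem = {0: 0x8D280004, 0x104: 0xDEADBEEF}
state = {"PC": 0}
steps = list(multicycle_lw(state, regs, mem))
print(steps, hex(regs[8]), state["PC"])
```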
50Multi-cycle R-Type Instruction
- 1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
- 2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
- 3. Compute Value: ALUOut ← A op B
- 4. Write Registers: Registers[Rd] ← ALUOut
- RTL describes data flow action in each clock cycle
  - Control signals determine precise data flow
  - Each step implies unique control values
51 Multi-cycle R-Type Instruction: Control Signal Values
- 1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
  - MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
- 2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
  - ALUSrcA=0, ALUSrcB=11, ALUop=00
- 3. Compute Value: ALUOut ← A op B
  - ALUSrcA=1, ALUSrcB=00, ALUop=10
- 4. Write Registers: Registers[Rd] ← ALUOut
  - RegDst=1, RegWrite, MemtoReg=0
- Each step implies unique control values
  - Fixed for entire cycle
  - Default value implied if unspecified
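The per-step control values can be written as a table with defaults filled in. Two assumptions are made in this sketch: a signal named without a value (IRWrite, PCWrite, RegWrite) is treated as asserted (1), and every unspecified signal defaults to 0. The names `DEFAULTS`, `R_TYPE_STEPS`, and `control_word` are illustrative.

```python
# Assumed default: every unspecified signal is 0.
DEFAULTS = dict(MemRead=0, IorD=0, IRWrite=0, PCWrite=0, PCSource=0b00,
                ALUSrcA=0, ALUSrcB=0b00, ALUop=0b00,
                RegDst=0, RegWrite=0, MemtoReg=0)

# Signals listed on the slide for each R-type step.
R_TYPE_STEPS = [
    ("fetch", dict(MemRead=1, ALUSrcA=0, IorD=0, IRWrite=1,
                   ALUSrcB=0b01, ALUop=0b00, PCWrite=1, PCSource=0b00)),
    ("read registers", dict(ALUSrcA=0, ALUSrcB=0b11, ALUop=0b00)),
    ("compute value", dict(ALUSrcA=1, ALUSrcB=0b00, ALUop=0b10)),
    ("write registers", dict(RegDst=1, RegWrite=1, MemtoReg=0)),
]


def control_word(step_signals):
    """Full set of control values for one cycle: defaults plus overrides."""
    return {**DEFAULTS, **step_signals}


words = [control_word(signals) for _, signals in R_TYPE_STEPS]
for (name, _), word in zip(R_TYPE_STEPS, words):
    print(name, word)
```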
52Check Your Work Is RTL Valid ?
- 1. Datapath check
  - Within one cycle
    - Each cycle has valid data flow path (path exists)
    - Each register gets only one new value
  - Across multiple cycles
    - Register value is defined before use, in a previous (earlier in time) clock cycle
    - E.g., A ← 3 must occur before B ← A
    - Make sure register value doesn't disappear if set >1 cycle earlier
- 2. Control signal check
  - Each cycle, RTL describing the datapath flow implies a value for each control signal
    - 0 or 1 or default or don't care
  - Each control signal gets only one fixed value the entire cycle
- 3. Overall check
  - Does the sequence of steps work?
53Multi-cycle BEQ Instruction
- 1. Fetch Instruction
  - InstructionRegister ← Mem[PC]; PC ← PC + 4
- 2. Read Registers, Precompute Target
  - A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt(Imm16), b00}
- 3. Compare Registers, Conditional Branch
  - if ((A - B) == 0) PC ← ALUOut
- Green shows PC calculation flow (in parallel with other operations)
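A sketch of the three BEQ steps. Note that by step 2 the PC already holds PC + 4, so the precomputed target is relative to the next instruction. The function name and the dictionary used as memory are assumptions of this example.

```python
MASK = 0xFFFFFFFF


def sext16(imm):
    """Sign-extend a 16-bit immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm


def multicycle_beq(pc, regs, mem):
    """Run the three BEQ steps and return the resulting PC."""
    # Step 1: fetch, and increment the PC.
    ir = mem[pc]
    pc = (pc + 4) & MASK
    # Step 2: read registers and precompute the branch target.
    a = regs[(ir >> 21) & 31]
    b = regs[(ir >> 16) & 31]
    alu_out = (pc + (sext16(ir & 0xFFFF) << 2)) & MASK
    # Step 3: compare by subtraction; branch only if the result is zero.
    if ((a - b) & MASK) == 0:
        pc = alu_out
    return pc


regs = [0] * 32
regs[1] = regs[2] = 5
mem = {8: 0x10220003}            # beq $1, $2, 3 at address 8
taken = multicycle_beq(8, regs, mem)
regs[2] = 6
not_taken = multicycle_beq(8, regs, mem)
print(taken, not_taken)
```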
54Multi-cycle Datapath with Control Signals
[Figure: multi-cycle datapath annotated with control signals PCSrc, PCWrite, IRWrite, ALUSrcA, ALUSrcB, IorD, MemRead, MemWrite, MemtoReg, RegDst, RegWrite, and ALUOp (into ALU Control together with Instruction[5:0]). Jump address [31:0] is built from PC[31:28] and Instr[25:0].]
55Multi-cycle Datapath with Controller
[Figure: the multi-cycle datapath with a controller whose input is Instr[31:26].]
59Multi-cycle CPU Control Overview
- General approach: Finite State Machine (FSM)
  - Need details in each branch of control
  - Precise outputs for each state (Mealy depends on inputs, Moore does not)
  - Precise next state for each state (can depend on inputs)
60How to Implement FSM ?
- Manually, with logic gates + FFs
  - Bubble diagram, next-state table, state assignment
  - Karnaugh map for each state bit, each output bit (painful!)
- High-level language description (e.g., Verilog, VHDL)
  - Describe FSM bubble diagram (next-states, output values)
  - Automatically synthesized into gates + FFs
- Microcode (µ-code) description
  - Sequence through many µ-ops for each CPU instruction
    - One µ-op (µ-instruction) sends correct control signal for 1 cycle
    - µ-op similar to one bubble in FSM
  - Acts like a mini-CPU within a CPU
    - µPC: microcode program counter
    - Microcode storage memory contains µ-ops
  - Can look similar to RTL or some new assembly language
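A next-state table for the control FSM can be sketched in a few lines. The state names below are hypothetical (the bubble diagrams are not reproduced in this text); the transitions are chosen so that the cycle counts match the multi-cycle goals stated earlier: LW 5, SW 4, R-type 4, branch and jump 3.

```python
# Next-state table. A string means an unconditional transition; a dict
# means the next state depends on the instruction class (an FSM input).
NEXT_STATE = {
    "FETCH": "DECODE",
    "DECODE": {"lw": "MEM_ADDR", "sw": "MEM_ADDR", "rtype": "EXEC",
               "beq": "BRANCH", "j": "JUMP"},
    "MEM_ADDR": {"lw": "MEM_READ", "sw": "MEM_WRITE"},
    "MEM_READ": "MEM_WB",
    "EXEC": "R_WB",
}


def next_state(state, kind):
    """States missing from the table are final and return to FETCH."""
    entry = NEXT_STATE.get(state, "FETCH")
    return entry[kind] if isinstance(entry, dict) else entry


def cycles(kind):
    """Count the states visited from FETCH until the FSM returns to FETCH."""
    state, count = "FETCH", 1
    while True:
        state = next_state(state, kind)
        if state == "FETCH":
            return count
        count += 1


cycle_counts = {kind: cycles(kind) for kind in ["lw", "sw", "rtype", "beq", "j"]}
print(cycle_counts)
```

The same table could drive a microcode sequencer: each state would become one µ-op, with the dictionary entries playing the role of the address select logic.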
61FSM Specification Bubble Diagram
Can build this by examining RTL. It is possible to automatically convert RTL into this form!
62FSM Gates + FFs Implementation
[Figure: FSM high-level organization.]
63FSM Microcode Implementation
[Figure: microcode controller. Microcode Storage (memory) produces the datapath control outputs and the sequencing control. The Microprogram Counter is updated through an adder (+1) and the Address Select Logic, whose inputs come from the instruction register opcode field.]
64Multi-cycle CPU with Control FSM
[Figure: multi-cycle datapath with the control FSM. Labels: Conditional Branch, FSM Control Outputs, controller input Instr[31:26], and Jump address [31:0] built from PC[31:28] and Instr[25:0].]
65Control FSM Overview
- General approach Finite State Machine (FSM)
- Need details in each branch of control
66Detailed FSM
67Detailed FSM
68Detailed FSM Instruction Fetch
69Detailed FSM Memory Reference
LW
SW
70Detailed FSM R-Type Instruction
71Detailed FSM Branch Instruction
72Detailed FSM Jump Instruction
73Performance Comparison
- Single-cycle CPU
- vs
- Multi-cycle CPU
74Simple Comparison
75What's really happening?
[Figure: timing of a Load Word instruction. Single-cycle CPU: one long clock cycle. Multi-cycle CPU (ideally): five short cycles, one each for Fetch, Decode, CalcAddr, Memory, and Write.]
76In practice, steps differ in speeds
Load Word Instruction
77Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
78Single-cycle vs Multi-cycle
SW instruction same speed
79Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
80Performance Summary
- Which CPU implementation is faster?
  - LW → single-cycle is faster
  - SW, R-type → about the same
  - BEQ, J → multi-cycle is faster
- Real programs use a mix of these instructions
- Overall performance depends on instruction frequency!
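How the instruction mix enters the comparison can be shown with a short calculation. The cycle counts come from the multi-cycle goals above; the instruction mix is hypothetical, and the multi-cycle clock is taken to be exactly 5x faster than the single-cycle clock, which is the ideal case rather than what a real design achieves.

```python
# Cycles per instruction class in the multi-cycle CPU.
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix in percent (illustrative only).
MIX_PERCENT = {"lw": 25, "sw": 10, "rtype": 45, "beq": 15, "j": 5}
assert sum(MIX_PERCENT.values()) == 100

# Average cycles per instruction for this mix.
cpi = sum(MIX_PERCENT[k] * CYCLES[k] for k in CYCLES) / 100

# Ideal case: multi-cycle period is 1/5 of the single-cycle period, so the
# average multi-cycle instruction time relative to single-cycle is cpi / 5.
relative_time = cpi / 5

print(cpi, relative_time)
```

A value of `relative_time` below 1 means the multi-cycle design is faster for that mix. In practice the steps are not equally long, so the real ratio would be less favorable than this ideal figure.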
81Implementation Summary
- Single-cycle CPU
  - 1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
  - No wasted time on most complex instruction
  - Large wasted time on simpler instructions
  - Simple controller (just a lookup table or memory)
  - Simple instructions
- Multi-cycle CPU
  - << 1 instruction per cycle (e.g., 1 MHz → 0.2 MIPS)
  - Small time wasted on most complex instruction
    - Hence, this instruction always slower than single-cycle CPU
  - Small time wasted on simple instructions
    - Eliminates large wasted time by using fewer clock cycles
  - Complex controller (FSM)
  - Potential to create complex instructions