1
CSC 4250 Computer Architectures
  • September 15, 2006
  • Appendix A. Pipelining

2
What is Pipelining?
  • Implementation technique whereby multiple
    instructions are overlapped in execution
  • Pipelining exploits parallelism among the
    instructions in a sequential instruction stream
  • Recall the formula: CPU time = IC × CPI × clock
    cycle time (see the sketch after this slide)
  • Pipelining yields a reduction in the average
    execution time per instruction, i.e., it
    decreases the CPI
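
  To make the formula concrete, here is a back-of-the-envelope sketch
  in Python (the instruction count, 1 ns cycle, and ideal CPI of 1 are
  illustrative assumptions, not figures from the slides):

    # CPU time = IC x CPI x clock cycle time
    IC = 1_000_000          # instruction count (assumed)
    cct = 1e-9              # clock cycle time in seconds (1 ns, assumed)

    cpi_unpipelined = 5.0   # every instruction occupies all 5 stages serially
    cpi_pipelined = 1.0     # ideal pipeline: one instruction completes per cycle

    t_unpipelined = IC * cpi_unpipelined * cct
    t_pipelined = IC * cpi_pipelined * cct
    print(t_unpipelined / t_pipelined)   # 5.0x speedup in the ideal case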

3
RISC Architectures
  • Reduced Instruction Set Computer
  • All operations on data apply to data in registers
  • The only operations that affect memory are loads
    and stores, which move data from memory to a
    register or from a register to memory, respectively
  • Instruction formats are few in number, with all
    instructions typically the same size

4
Three Classes of Instructions
  • We consider
  • ALU instructions
  • Load and store instructions
  • Branches (no jumps)

5
ALU Instructions
  • Take either two registers or a register and a
    sign-extended immediate, operate on them, and
    store the result into a third register
  • DADD R1,R2,R3
  • Format: Opcode | rs (R2) | rt (R3) | rd (R1) | shamt | opx
  • Reg[R1] ← Reg[R2] + Reg[R3]
  • DADDI R1,R2,3
  • Format: Opcode | rs (R2) | rt (R1) | Immediate
  • Reg[R1] ← Reg[R2] + 3
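
  As an illustration, the two formats above could be packed into
  32-bit words as follows; the field widths (6/5/5/5/5/6 bits) are an
  assumption in the MIPS style, not something the slide specifies:

    # R-type: Opcode | rs | rt | rd | shamt | opx
    def encode_r_type(opcode, rs, rt, rd, shamt, opx):
        # DADD R1,R2,R3 would use rs=2, rt=3, rd=1
        return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | opx

    # I-type: Opcode | rs | rt | 16-bit immediate
    def encode_i_type(opcode, rs, rt, imm):
        # DADDI R1,R2,3 would use rs=2, rt=1, imm=3
        return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)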

6
Load and Store Instructions
  • Take a register source (the base register) and an
    immediate field (the offset). Their sum (the
    effective address) is the memory address. The
    second register is the destination (load) or
    source (store) of the data.
  • LD R2,30(R1)
  • Format: Opcode | rs (R1) | rt (R2) | Immediate
  • Reg[R2] ← Mem[30 + Reg[R1]]
  • SD R2,30(R1)
  • Format: Opcode | rs (R1) | rt (R2) | Immediate
  • Mem[30 + Reg[R1]] ← Reg[R2]
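
  A small Python sketch of the effective-address computation for the
  two examples above; the dict-based register file and memory are only
  a toy model:

    dmem = {}                          # toy data memory
    reg = {i: 0 for i in range(32)}    # toy register file
    reg[1] = 0x1000                    # base register R1

    # LD R2,30(R1): effective address = 30 + Reg[R1]
    ea = 30 + reg[1]
    reg[2] = dmem.get(ea, 0)           # load

    # SD R2,30(R1): store Reg[R2] to the same effective address
    dmem[ea] = reg[2]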

7
Branches
  • Branches are conditional transfers of control
  • The branch destination is obtained by adding a
    sign-extended offset to the current PC
  • We consider only comparison against zero
  • BEQZ R1,name
  • BEQZ is a pseudo-instruction for BEQ with R0
  • BEQ R1,R0,name
  • Format: Opcode | rs (R1) | rt (R0) | Immediate
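
  A sketch of the branch semantics in Python; treating the immediate as
  a word offset (shifted left by 2 and added to NPC) follows the
  EX-stage description later in this deck and is an assumption here:

    def branch_target(pc, imm):
        # target = NPC + (sign-extended offset << 2), where NPC = PC + 4
        return (pc + 4) + (imm << 2)

    def beqz(reg, r1, pc, imm):
        # BEQZ R1,name is BEQ R1,R0,name; R0 always holds zero
        return branch_target(pc, imm) if reg[r1] == reg[0] else pc + 4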

8
RISC Instruction Set
  • Each instruction takes at most five clock cycles
  • Instruction fetch cycle (IF)
  • Instruction decode/register fetch cycle (ID)
  • Execution/effective address cycle (EX)
  • Memory access/branch completion (MEM)
  • Write-back cycle (WB)

9
Instruction Fetch (IF)
  • Send program counter (PC) to memory and fetch
    current instruction from memory
  • Update PC by adding 4 (why 4?).
  • Operations
  • IR ← Mem[PC]
  • NPC ← PC + 4

10
Instruction Decode/Register Fetch (ID)
  • Decode instruction
  • Read registers
  • Decoding is done in parallel with reading
    registers (fixed-field decoding)
  • Sign-extend the offset field
  • Operations
  • A ← Reg[rs]
  • B ← Reg[rt]
  • Imm ← sign-extended immediate field of IR
  • (A and B are temporary registers).

11
Execution/Effective Address (EX)
  • ALU operates on the operands prepared in ID,
    performing one of four possible functions
  • Memory ref. (add base register and offset)
  • ALUOutput ← A + Imm
  • Register-Register ALU instruction
  • ALUOutput ← A func B
  • Register-Immediate ALU instruction
  • ALUOutput ← A op Imm
  • Branch
  • ALUOutput ← NPC + (Imm << 2)
  • Cond ← (A == 0)

12
Memory Access/Branch Completion (MEM)
  • PC is updated
  • PC ← NPC
  • Access memory if needed
  • LMD (Load Memory Data register)
  • Load: LMD ← Mem[ALUOutput]
  • Store: Mem[ALUOutput] ← B
  • Branch
  • If (Cond) PC ← ALUOutput

13
Write Back (WB)
  • Register-Register ALU
  • Reg[rd] ← ALUOutput
  • Register-Immediate ALU
  • Reg[rt] ← ALUOutput
  • Load
  • Reg[rt] ← LMD
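
  Pulling slides 9 through 13 together, here is a compact, unpipelined
  Python sketch of one load instruction flowing through the five
  stages; the pre-decoded instruction tuple and dict memories are
  simplifying assumptions:

    imem = {0: ("LD", 2, 1, 30)}       # instruction at PC=0: LD R2,30(R1)
    dmem = {130: 99}                   # toy data memory
    reg = {i: 0 for i in range(32)}
    reg[1] = 100                       # base register
    pc = 0

    # IF: fetch the instruction and compute NPC
    ir = imem[pc]
    npc = pc + 4

    # ID: read registers and sign-extend the immediate
    op, rt, rs, imm = ir
    a, b = reg[rs], reg[rt]

    # EX: effective address for a memory reference
    alu_output = a + imm               # 100 + 30 = 130

    # MEM: update the PC and access data memory
    pc = npc
    lmd = dmem[alu_output]             # load memory data

    # WB: write the result back to the register file
    reg[rt] = lmd                      # Reg[R2] = 99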

14
Simple RISC Pipeline
  • Clock number
    Instr.        1   2   3   4   5   6   7   8   9
    Instr. i      IF  ID  EX  ME  WB
    Instr. i+1        IF  ID  EX  ME  WB
    Instr. i+2            IF  ID  EX  ME  WB
    Instr. i+3                IF  ID  EX  ME  WB
    Instr. i+4                    IF  ID  EX  ME  WB
  • What are the stages needed for an ALU
    instruction?
  • What are the stages needed for a Store
    instruction?
  • What are the stages needed for a Branch
    instruction?
  • Which stage is expected to take the most time?
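
  The staircase pattern above can be generated mechanically; a small,
  purely illustrative Python sketch:

    STAGES = ["IF", "ID", "EX", "ME", "WB"]

    def pipeline_diagram(n_instrs):
        # Instruction i enters IF in cycle i+1 and leaves WB in cycle i+5
        for i in range(n_instrs):
            row = ["  "] * i + STAGES
            print(f"Instr. i+{i}:  " + " ".join(row))

    pipeline_diagram(5)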

15
Figure A.2. Pipeline
16
Three Observations on Overlapping Execution
  • Use separate instruction and data memories, which
    is typically implemented with separate
    instruction and data caches. The use of separate
    caches eliminates a conflict for a single memory
    that would arise between instruction fetch and
    data memory access.

17
Three Observations on Overlapping Execution
  • The register file is used in two stages: for
    reading in ID and for writing in WB. These uses
    are distinct. Hence, we need to perform two reads
    and one write every clock cycle (why two reads?).
    To handle reads and a write to the same register
    (and for another reason that will arise later), we
    perform the register write in the first half of
    the clock cycle and the reads in the second half.

18
Three Observations on Overlapping Execution
  • To start a new instruction every clock cycle, we
    must increment and store the PC every cycle, and
    this must be done during the IF stage in
    preparation for the next instruction. Another
    problem is that a branch does not change the PC
    until the MEM stage (this problem will be handled
    soon).

19
Pipeline Registers
  • Prevent interference between two different
    instructions in adjacent stages of the pipeline.
  • Carry the data of a given instruction from one
    stage to the next.
  • Registers are edge-triggered: their values change
    on the clock edge.
  • Add pipelining overhead.
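
  One way to picture the pipeline registers of Figure A.3 is as latched
  records between stages; the field names below are assumptions based
  on the stage descriptions earlier in the deck (IF/ID and ID/EX shown,
  EX/MEM and MEM/WB follow the same pattern):

    from dataclasses import dataclass

    @dataclass
    class IF_ID:      # written at the end of IF, read by ID
        ir: int = 0
        npc: int = 0

    @dataclass
    class ID_EX:      # written at the end of ID, read by EX
        a: int = 0
        b: int = 0
        imm: int = 0
        npc: int = 0

    # On each clock edge, every pipeline register captures the values
    # produced by its upstream stage, so adjacent instructions never
    # overwrite each other's intermediate results.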

20
Figure A.3. Pipeline Registers
21
Example
  • Consider an unpipelined processor. Assume a 1 ns
    clock cycle, 4 cycles for ALU operations and
    branches, and 5 cycles for memory operations.
    Suppose the relative frequencies are 40%, 20%, and
    40%, respectively. The pipelining overhead is
    0.2 ns. What is the speedup from pipelining?

22
Answer
  • Average execution time on the unpipelined
    processor
  •   = Clock cycle × Average CPI
  •   = 1 ns × ((40% + 20%) × 4 + 40% × 5)
  •   = 4.4 ns
  • Pipelined instruction time = 1 ns + 0.2 ns
    overhead = 1.2 ns
  • Speedup from pipelining
  •   = 4.4 ns / 1.2 ns ≈ 3.7