Title: CS 230: Computer Organization and Assembly Language
1 CS 230: Computer Organization and Assembly Language
Department of Computer Science and Engineering
School of Computing and Informatics
Arizona State University
Slides courtesy Prof. Yann Hang Lee, ASU; Prof. Mary Jane Irwin, PSU; Ande Carle, UCB
2 Announcements
- Project 3
  - MIPS Assembler
- Project 4
  - MIPS Simulator
  - Due Nov 10, 2009
- Quiz 4
  - Nov 5, 2009
  - Single-cycle implementation
- Finals
  - Tuesday, Dec 08, 2009
  - Please come on time (you'll need all the time)
  - Open book, notes, and internet
  - No communication with any other human
3 Single Cycle - Abstract View
- Abstract View
  - elements that operate on data values (combinational)
  - elements that contain state (sequential)
- Implementation
  - Design the datapath
  - Design the control
[Figure: abstract view of the single-cycle datapath. Labels: PC; Instruction Memory (Address, Instruction); Register File (three Reg Addr inputs, two Read Data outputs, Write Data); ALU; Data Memory (Address, Read Data, Write Data)]
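4 Single cycle Datapath
[Figure: instruction fetch and next-PC logic. Labels: PC; Instruction Memory (Read Address, Instruction); an adder with constant 4; a second adder fed through a Shift left 2 unit; a 26-bit field shifted left 2 to 28 bits and combined with PC+4[31-28] into a 32-bit address; muxes controlled by PCSrc and Jump]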
5 Single cycle Datapath Control
[Figure: full single-cycle datapath with control. Labels: the Control Unit takes Instr[31-26] and outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, Jump, and ALUOp; Register File with Read Addr 1 (Instr[25-21]), Read Addr 2 (Instr[20-16]), Write Addr (mux of Instr[20-16] and Instr[15-11]), Write Data, Read Data 1, Read Data 2; Sign Extend of Instr[15-0] from 16 to 32 bits; ALU control fed by Instr[5-0]; ALU with zero and ovf outputs; Data Memory (Address, Write Data, Read Data); Instr[25-0] shifted left 2 and combined with PC+4[31-28]; two adders, Shift left 2, and muxes controlled by PCSrc and Jump]
6 Single cycle Control Unit
Instr   Opcode  RegDst  ALUSrc  MemtoReg  RegWr  MemRd  MemWr  Branch  ALUOp1  ALUOp0
R-type  000000  1       0       0         1      X      0      0       1       X
lw      100011  0       1       1         1      1      0      0       0       0
sw      101011  X       1       X         0      X      1      0       0       0
beq     000100  X       0       X         0      X      0      1       X       1
- Completely determined by the instruction opcode field
- Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation
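A minimal Python sketch of the table above as a lookup keyed by the opcode field. The names `SIGNALS`, `CONTROL_TABLE`, and `decode_control` are illustrative assumptions, not part of the course material; "X" is kept as a literal don't-care marker.

```python
# Signal order copied from the slide's table header.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWr", "MemRd",
           "MemWr", "Branch", "ALUOp1", "ALUOp0")

# opcode -> (instruction class, one character per signal; "X" = don't care)
CONTROL_TABLE = {
    0b000000: ("R-type", "1001X001X"),
    0b100011: ("lw",     "011110000"),
    0b101011: ("sw",     "X1X0X1000"),
    0b000100: ("beq",    "X0X0X01X1"),
}


def decode_control(instruction):
    """Return (class name, {signal: value}) for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F  # Instr[31-26]
    name, values = CONTROL_TABLE[opcode]
    return name, dict(zip(SIGNALS, values))


# 0x8FA80004 encodes lw $t0, 4($sp)
name, signals = decode_control(0x8FA80004)
print(name, signals)
```

Only the top six bits are examined, which mirrors the point that the main control is completely determined by the opcode field.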
7 Disadvantages of Single Cycle Implementation
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating point multiply
- Is wasteful of area, since some functional units must be duplicated because they cannot be shared during an instruction's execution
  - e.g., need separate adders to do PC update and branch target address calculations, as well as an ALU to do R-type arithmetic/logic operations and data memory address calculations
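To make the first point concrete, here is a small Python sketch. The per-unit delays are assumed round numbers chosen for illustration only (they do not come from these slides), and the unit names are hypothetical labels.

```python
# ASSUMED delays in picoseconds, for illustration only.
UNIT_DELAY_PS = {"IF": 200, "REG_READ": 100, "ALU": 200,
                 "MEM": 200, "REG_WRITE": 100}

# Functional units each instruction class passes through.
PATH = {
    "R-type": ["IF", "REG_READ", "ALU", "REG_WRITE"],
    "lw":     ["IF", "REG_READ", "ALU", "MEM", "REG_WRITE"],
    "sw":     ["IF", "REG_READ", "ALU", "MEM"],
    "beq":    ["IF", "REG_READ", "ALU"],
}


def instruction_delay(name):
    """Total delay of one instruction class along its path."""
    return sum(UNIT_DELAY_PS[unit] for unit in PATH[name])


def clock_period():
    """A single-cycle clock must fit the slowest instruction."""
    return max(instruction_delay(name) for name in PATH)


for name in PATH:
    print(name, instruction_delay(name), "ps needed,",
          clock_period(), "ps clock")
```

Under these assumed numbers every instruction pays the lw time, even beq, which needs far less.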
8 How to make it fast?
- Parallelism
- Short-cuts or Caching, or Bypassing
- Prediction
- Skip some work
- First form of parallelism is Pipelining
9 Pipelining: It's Natural!
- Laundry Example
- Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- Folder takes 20 minutes
10 Sequential Laundry
[Figure: timeline from 6 PM to midnight. The four loads run one after another in task order, each taking 30 + 40 + 20 minutes.]
- Sequential laundry takes 6 hours for 4 loads
11 Pipelined Laundry
[Figure: timeline from 6 PM to midnight. The four loads overlap in task order.]
Note: More time to do Project 4
- Pipelined laundry takes 3.5 hours for 4 loads
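The 6-hour and 3.5-hour figures can be checked with a short Python sketch. It assumes each machine handles one load at a time and that a finished load can wait in a hamper until the next machine is free; the function names are illustrative.

```python
STAGES = [30, 40, 20]  # washer, dryer, folder (minutes)


def sequential_minutes(loads, stages=STAGES):
    """Each load finishes completely before the next one starts."""
    return loads * sum(stages)


def pipelined_minutes(loads, stages=STAGES):
    """A load enters a stage once it is ready AND the machine is free."""
    free_at = [0] * len(stages)  # time at which each machine is next free
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for the next stage
        for s, duration in enumerate(stages):
            start = max(t, free_at[s])
            t = start + duration
            free_at[s] = t
        finish = t
    return finish


print(sequential_minutes(4) / 60, "hours sequential")
print(pipelined_minutes(4) / 60, "hours pipelined")
```

The 40-minute dryer sets the pace: after the first wash, one load completes every 40 minutes.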
12 Pipelining Lessons
[Figure: pipelined laundry timeline, 6 PM to 9 PM, loads in task order]
- Multiple tasks operating simultaneously
- Pipelining doesn't help latency of a single task; it helps throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Also, need time to fill and drain the pipeline
13 Pipelining: Some terms
- If you're doing laundry or implementing a μP (microprocessor), each stage where something is done is called a pipe stage
  - In the laundry example, the washer, dryer, and folding table are pipe stages; clothes enter at one end and exit at the other
  - In a μP, instructions enter at one end and have been executed when they leave
  - Another example: auto assembly line
- Throughput is how often stuff comes out of a pipeline
14 Technical details
- If times for all $S$ stages are equal to $T$:
  - Time for one initiation to complete is still $ST$
  - Time between 2 initiations is $T$, not $ST$
  - Initiations per second = $1/T$
- Pipelining: overlap multiple executions of the same sequence
  - Improves THROUGHPUT, not the time to perform a single operation
- Other examples
  - Automobile assembly plant, chemical factory, garden hose, cooking
15 More technical details
- Book's approach to drawing pipeline timing diagrams
  - Time runs left-to-right, in units of stage time
  - Each row below corresponds to a distinct initiation
  - Boundary between 2 column entries = pipeline register (i.e., hamper)
  - Must look at column contents to see what stage is doing what

0       1       2       3       4       5       6
Wash 1  Dry 1   Fold 1  Pack 1
        Wash 2  Dry 2   Fold 2  Pack 2
                Wash 3  Dry 3   Fold 3  Pack 3
                        Wash 4  Dry 4   Fold 4  Pack 4
                                Wash 5  Dry 5   Fold 5
                                        Wash 6  Dry 6

Time for $N$ initiations to complete: $NT + (S-1)T$
Throughput: time per initiation $= T + (S-1)T/N \to T$ as $N$ grows
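A quick numeric check of these two formulas in Python, assuming equal stage times as on the slide. The function names are illustrative.

```python
def total_time(n, s, t):
    """Time for n initiations through s equal stages of length t."""
    return n * t + (s - 1) * t


def time_per_initiation(n, s, t):
    """Average time per initiation; approaches t as n grows."""
    return total_time(n, s, t) / n


# Four stages (wash, dry, fold, pack), one time unit each.
for n in (1, 6, 1000):
    print(n, total_time(n, 4, 1), time_per_initiation(n, 4, 1))
```

With 6 initiations the total is 9 time units, which agrees with the diagram: initiation 6 starts at time 5 and needs 4 stages.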
16 Ideal pipeline speedup
Unpipelined
[Figure: four blocks of combinational logic, each with delay $t$, between two latches]
- delay for 1 piece of data = $4t$ + latch setup (assume small)
- approximate delay for 1000 pieces of data = $4000t$
Pipelined
[Figure: the same four blocks of combinational logic, each with delay $t$, with latches between them]
- delay for 1 piece of data = $4(t + \text{latch setup})$
- approximate delay for 1000 pieces of data = $3t + 1000t$
- speedup for 1000 pieces of data = $4000/1003 \approx 4$
Ideal speedup = number of pipeline stages
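The same calculation in general form, as a sketch that ignores latch setup time. The symbols $S$ (number of stages) and $N$ (pieces of data) are assumed names; $t$ is the per-stage logic delay used above.

```latex
\begin{align*}
T_{\text{unpipelined}}(N) &= N \cdot S \cdot t \\
T_{\text{pipelined}}(N)   &= (S - 1)\,t + N\,t \\
\text{speedup}(N) &= \frac{N S t}{(N + S - 1)\,t} = \frac{N S}{N + S - 1} \\
\text{speedup}(1000)\Big|_{S = 4} &= \frac{4000}{1003} \approx 3.99 \\
\lim_{N \to \infty} \text{speedup}(N) &= S
\end{align*}
```

The $(S-1)\,t$ term is the time to fill the pipeline; it stops mattering once $N$ is much larger than $S$.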
17 The new look dataflow
[Figure: pipelined datapath divided by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Labels: PC, ADD with constant 4, Inst. Memory, Register File (IR6..10, IR11..15), Sign Extend (16 to 32 bits), Comp. producing "Branch taken", ALU, Data Mem., MEM/WB.IR, and four muxes]
Data must be stored from one stage to the next in pipeline registers/latches. They hold temporary values between clocks and the information needed for execution.
18 Another way to look at it
              Clock Number
Inst.         1    2    3    4    5    6    7    8
Inst. i       IF   ID   EX   MEM  WB
Inst. i+1          IF   ID   EX   MEM  WB
Inst. i+2               IF   ID   EX   MEM  WB
Inst. i+3                    IF   ID   EX   MEM  WB
Axes: time runs left to right; program execution order (in instructions) runs top to bottom
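A diagram like this can be generated for any number of instructions. The Python sketch below assumes an ideal pipeline with no stalls; `pipeline_diagram` is an illustrative name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage in clock c+1."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i+s
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(4)):
    label = "i" if i == 0 else f"i+{i}"
    print(f"Inst. {label:<5}" + "".join(f"{cell:<5}" for cell in row))
```

Four instructions through five stages need 4 + 5 - 1 = 8 clocks, matching the clock numbers in the diagram.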
19 Questions about control signals
- Following discussion relevant to a single instruction
- Q: Are all control signals active at the same time?
- Q: Can we generate all these signals at the same time?
20 Passing control w/pipe registers
- Analogy: send instruction with car on assembly line
  - "Install Corinthian leather interior on car 6 @ stage 3"
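A minimal Python sketch of the idea: control signals are generated once at decode and then travel with the instruction through the pipeline registers, each stage dropping the ones already used. The grouping into EX, MEM, and WB signals follows the usual textbook convention and is an assumption here, as are the names `keep` and `clock`.

```python
# ASSUMED grouping of control signals by the stage that consumes them.
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRd", "MemWr")
WB_SIGNALS = ("RegWr", "MemtoReg")


def keep(bundle, names):
    """Copy only the named signals; an empty slot (None) stays empty."""
    if bundle is None:
        return None
    return {n: bundle[n] for n in names}


def clock(latches, decoded):
    """One clock edge: every pipeline register loads from the one before it."""
    return {
        "ID/EX": decoded,  # everything generated at decode
        "EX/MEM": keep(latches["ID/EX"], MEM_SIGNALS + WB_SIGNALS),
        "MEM/WB": keep(latches["EX/MEM"], WB_SIGNALS),
    }


# Control values for lw, as in the single-cycle control table.
LW = {"RegDst": "0", "ALUSrc": "1", "MemtoReg": "1", "RegWr": "1",
      "MemRd": "1", "MemWr": "0", "Branch": "0", "ALUOp1": "0", "ALUOp0": "0"}

latches = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
for decoded in (LW, None, None):  # lw followed by two empty slots
    latches = clock(latches, decoded)
    print(latches)
```

After three clocks only RegWr and MemtoReg remain, sitting in MEM/WB where the write-back stage needs them.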
21 Pipelined datapath w/control signals
[Figure: pipelined datapath with control signals; the only surviving label is "Registers"]
22 A Pipelined Processor
- Pipeline latches pass the status and result of the current instruction to the next stage
- Comparison
[Figure: clock timing for lw and sw, comparing the single-cycle implementation with the pipeline stages Ifetch, Dec/Reg, Exec, Mem]
23 Yoda says
- Ohhh. Great warrior. Wars not make one great.