William Stallings Computer Organization and Architecture 5th Edition


Slides: 109

Provided by: Adr453


Transcript and Presenter's Notes


1
William Stallings Computer Organization and Architecture, 5th Edition

Chapter 11
CPU Structure and Function

2
Topics

Processor Organization
Register Organization
Instruction Cycle
Instruction Pipelining
The Pentium Processor

3
CPU Structure

The CPU must:
Fetch instructions
Interpret instructions
Fetch data
Process data
Write data

The CPU needs a small internal memory
4
CPU With Systems Bus
5
CPU Internal Structure
6
Registers

The CPU must have some working space (temporary storage), called registers
Number and function vary between processor designs
One of the major design decisions
Top level of the memory hierarchy
Two categories:
User-visible registers
Control and status registers

7
User-Visible Registers

General purpose
Data
Address
Condition codes

8
User Visible Registers

General-purpose registers
May be truly general purpose
May be restricted
Data registers
Accumulator register
Addressing registers
Segment pointers
Index registers
Stack pointer

9
General or Special?

Make them general purpose:
Increases flexibility and programmer options
Increases instruction size and complexity
Make them specialized:
Smaller (faster) instructions
Less flexibility
The trend seems to be toward the use of specialized registers.

10
How Many GP Registers?

Between 8 and 32
Fewer registers mean more memory references
More registers do not noticeably reduce memory references

11
How Big?

Large enough to hold the largest address
Large enough to hold most data types
Often possible to combine two data registers
C programming:
double a
long int a

12
Condition Code Registers

Condition codes are bits set by the CPU hardware as the result of operations
Sets of individual bits
e.g., result of last operation was zero
At least partially visible to the user
Can be read (implicitly) by programs
e.g., jump if zero
Cannot (usually) be set by programs

13
Control and Status Registers

Program Counter (PC)
Instruction Register (IR)
Memory Address Register (MAR)
Memory Buffer Register (MBR)
Revision: what do these all do?

14
Program Status Word (PSW)

A set of bits; includes condition codes
Sign of last result
Zero
Carry
Equal
Overflow
Interrupt enable/disable
Supervisor

15
Program Status Word - Example

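Motorola 68000's PSW

Bit    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
Field  T   -   S   -   -   I2  I1  I0  -   -   -   X   N   Z   V   C

System byte (bits 15 to 8): T = trace mode, S = supervisor status, I2-I0 = interrupt mask
User byte (bits 7 to 0): condition codes X, N, Z, V, C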
16
Other Registers

May have registers pointing to:
Process control blocks (PCB) (see O/S)
Interrupt vectors (see O/S)
N.B. CPU design and operating system design are closely linked

17
Example Register Organizations
18
Instruction Cycle
An instruction cycle includes the following subcycles
19
Indirect Addressing Cycle

May require memory access to fetch operands
Indirect addressing requires more memory accesses
Can be thought of as an additional instruction subcycle

20
Instruction Cycle with Indirect
21
Instruction Cycle State Diagram
22
Data Flow (Instruction Fetch)

Depends on CPU design
In general, for a fetch:
PC contains address of next instruction
Address moved to MAR
Address placed on address bus
Control unit requests memory read
Result placed on data bus, copied to MBR, then to IR
Meanwhile, PC incremented by 1
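The fetch steps above can be sketched in a few lines of Python. This is only an illustrative model: the `CPU` class, its field names, and the 16-word memory are assumptions made for the example, not part of the slides, and the address and data buses are not modelled separately.

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    """Toy CPU holding only the registers named on the slide (hypothetical model)."""
    pc: int = 0
    mar: int = 0
    mbr: int = 0
    ir: int = 0
    memory: list = field(default_factory=lambda: [0] * 16)


def fetch(cpu: CPU) -> None:
    cpu.mar = cpu.pc               # address of next instruction moved to MAR
    cpu.mbr = cpu.memory[cpu.mar]  # memory read: the word arrives in MBR
    cpu.pc += 1                    # meanwhile, PC is incremented by 1
    cpu.ir = cpu.mbr               # MBR copied to IR


cpu = CPU(pc=3)
cpu.memory[3] = 0xAB               # arbitrary instruction word for the demo
fetch(cpu)
print(hex(cpu.ir), cpu.pc)
```

After one call, IR holds the word that was stored at the old PC address, and PC points at the following word.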

23
Data Flow (Fetch Diagram)
24
Data Flow (Indirect Cycle)

IR is examined
If indirect addressing, indirect cycle is performed
Rightmost N bits of MBR transferred to MAR
Control unit requests memory read
Result (address of operand) moved to MBR
Instruction format: op-code, address
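As a rough sketch of the indirect cycle, the function below masks off the rightmost N bits of MBR and uses them as a memory address. The register dictionary, the function name, and the 4-bit address field in the demo are assumptions for illustration only.

```python
def indirect_cycle(regs: dict, memory: list, addr_bits: int) -> None:
    """Replace the address field in MBR by the operand address it points to."""
    mask = (1 << addr_bits) - 1
    regs["MAR"] = regs["MBR"] & mask    # rightmost N bits of MBR -> MAR
    regs["MBR"] = memory[regs["MAR"]]   # memory read: address of operand -> MBR


memory = [0] * 16
memory[5] = 9                           # location 5 holds the operand's address
regs = {"MAR": 0, "MBR": 0x35}          # op-code 0x3, address field 0x5 (made-up values)
indirect_cycle(regs, memory, addr_bits=4)
print(regs)
```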

25
Data Flow (Indirect Diagram)
26
Data Flow (Execute Cycle)

May take many forms
Depends on instruction being executed
May include:
Memory read/write
Input/output
Register transfers
ALU operations

27
Data Flow (Interrupt Cycle)

Current PC saved to allow resumption after interrupt
Contents of PC copied to MBR
Special memory location (e.g., stack pointer) loaded to MAR
MBR written to memory
PC loaded with address of interrupt handling routine
Next instruction (first of interrupt handler) can be fetched
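The interrupt cycle can be modelled the same way. In this sketch the save location comes from a stack pointer that grows downward; that choice, the register dictionary, and the handler address are assumptions for the example rather than details given on the slide.

```python
def interrupt_cycle(regs: dict, memory: list, handler_address: int) -> None:
    regs["MBR"] = regs["PC"]             # contents of PC copied to MBR
    regs["SP"] -= 1                      # assumption: stack grows toward lower addresses
    regs["MAR"] = regs["SP"]             # save location loaded into MAR
    memory[regs["MAR"]] = regs["MBR"]    # MBR written to memory
    regs["PC"] = handler_address         # PC loaded with address of the handler


memory = [0] * 16
regs = {"PC": 7, "SP": 16, "MAR": 0, "MBR": 0}
interrupt_cycle(regs, memory, handler_address=2)
print(regs["PC"], memory[15])
```

The next fetch cycle then picks up the first instruction of the handler, and the saved PC value is available for resumption.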

28
Data Flow (Interrupt Diagram)
29
Pipelining

Laundry example
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
Folder takes 20 minutes

30
Sequential Laundry
(Timing diagram: four loads in task order against time from 6 PM to midnight; each load takes 30 + 40 + 20 minutes, one load after another)

Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would
laundry take?

31
Pipelined Laundry
(Timing diagram: the four loads in task order, overlapped, against time from 6 PM to midnight)

Pipelined laundry takes 3.5 hours for 4 loads
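The two totals can be checked with a short calculation. The sketch below assumes each machine handles one load at a time and that a finished load can wait in front of the next machine; the function names are made up for the example.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each machine starts a load as soon as both the load and the machine are free."""
    free_at = [0] * len(stage_minutes)    # time at which each machine is next free
    finished = 0
    for _ in range(loads):
        ready = 0                         # time at which this load left the previous stage
        for stage, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[stage])
            ready = start + minutes
            free_at[stage] = ready
        finished = ready
    return finished


stages = [30, 40, 20]                     # washer, dryer, folder
print(sequential_minutes(stages, 4) / 60)  # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 x 40 + 20 = 210 minutes: the 40-minute dryer is the slowest stage and sets the rate.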

32
Pipelining Lessons(1)

Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it reduce speedup
33
Pipelining Lessons(2)

(No Transcript)

34
Instruction Pipelining

Similar to assembly line in manufacturing plants
Products at various stages can be worked on simultaneously
→ Performance improved
First attempt: 2 stages
Fetch
Execution

35
Prefetch

Fetch accesses main memory
Execution usually does not access main memory
Can fetch next instruction during execution of current instruction
Called instruction prefetch
Ideally, instruction cycle time would be halved (if duration of F = duration of E)

36
Improved Performance(1)

But not doubled:
Fetch usually shorter than execution
Conditional branch makes next address unknown

37
Two Stage Instruction Pipeline
38
Improved Performance (2)

Reduce time loss due to branching by guessing
Prefetch the instruction after the branch instruction
If not branched:
use the prefetched instruction
else:
discard the prefetched instruction
fetch new instruction

39
Pipelining

Add more stages to improve performance

Instruction Cycle State Diagram
40
Pipelining

More stages → more speedup
FI: Fetch instruction
DI: Decode instruction
CO: Calculate operands
FO: Fetch operands
EI: Execute instruction
WO: Write result
The various stages are of nearly equal duration
Overlap these operations

41
Timing of Pipeline
42
Speedup of Pipelining (1)

9 instructions, 6 stages
Without pipelining: __ time units
With pipelining: __ time units
Speedup: _____
Q: 100 instructions, 6 stages: speedup ____
Q ? instructions k stages, speedup ____
Can you prove it (formally)?

43
Pipelining - Discussion

Assumes all stages are needed by every instruction
e.g., LOAD: WO not needed
Assumes all stages can be performed in parallel
e.g., FI, FO, and WO → memory conflicts
Timing is set up assuming all stages are needed by each instruction
→ Simplifies pipeline hardware
Assumes no conditional branch instructions

44
Limitation by Branching

Conditional branch instructions can invalidate several instruction prefetches
In our example (see next slide):
Instruction 3 is a conditional branch to instruction 15
The next instruction's address won't be known until instruction 3 is executed (at time unit 7)
→ pipeline must be cleared
No instruction is finished from time units 9 to 12
→ performance penalty

45
Branch in a Pipeline
46
(No Transcript)
47
Limitation by Data Dependencies

Data needed by the current instruction may depend on a previous instruction that is still in the pipeline
e.g., A ← B + C
      D ← A + E
48
Limitation by stage overhead

Ideally, more stages, more speedup
However:
more overhead in moving data between buffers
more overhead in preparation and delivery functions
more complex circuit for pipeline hardware

49
Pipeline Performance

Cycle time: $\tau = \max_{1 \le i \le k}[\tau_i] + d = \tau_m + d$
$\tau_m$ = maximum stage delay
$k$ = number of stages
$d$ = time delay of a latch
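A small numeric check of the cycle-time rule: the slowest stage plus the latch delay sets the clock for every stage. The delay values below are hypothetical and chosen only to show the effect of one unbalanced stage.

```python
def cycle_time(stage_delays, latch_delay):
    """tau = tau_m + d, where tau_m is the largest stage delay."""
    return max(stage_delays) + latch_delay


# Hypothetical delays in nanoseconds, for illustration only.
print(cycle_time([10, 10, 10, 10], latch_delay=1))   # 11
print(cycle_time([10, 10, 30, 10], latch_delay=1))   # 31
```

With one stage three times slower than the rest, every stage is clocked at the slow rate, which is why unbalanced stages reduce speedup.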

50
Pipeline Performance
(Datapath diagram with five stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back)
51
Pipeline Performance

Time to execute n instructions without pipelining: $T_1 = nk\tau$
Time to execute n instructions with pipelining: $T_k = [k + (n-1)]\tau$
Speedup: $S_k = T_1 / T_k = nk\tau / [k + (n-1)]\tau = nk / [k + (n-1)]$
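The formulas can be tried out directly. The following sketch measures time in units of the cycle time τ; the function names are illustrative, not from the slides.

```python
def unpipelined_time(n, k, tau=1.0):
    """T1 = n * k * tau: every instruction passes through all k stages alone."""
    return n * k * tau


def pipelined_time(n, k, tau=1.0):
    """Tk = [k + (n - 1)] * tau: k cycles for the first instruction, then one per cycle."""
    return (k + (n - 1)) * tau


def speedup(n, k):
    return unpipelined_time(n, k) / pipelined_time(n, k)


print(unpipelined_time(9, 6), pipelined_time(9, 6))   # 54.0 14.0
for n in (9, 100, 10_000):
    print(n, round(speedup(n, 6), 3))
```

As n grows, the k - 1 fill cycles matter less and the speedup approaches k, the number of stages.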

52
Pipeline Performance

Speedup of k-stage pipelining compared to without
pipelining
Q ? instructions k stages, speedup ____

53
(No Transcript)
54
Example

A pipeline with 4 stages, each taking Δt, processes 4 tasks.

(Space-time diagram: tasks 1 to 4 each pass through stages 1 2 3 4, offset by one stage)
55
Example

$T_1 = nk\tau = 4 \times 4\,\Delta t = 16\,\Delta t$
$T_k = [k + (n-1)]\tau = [4 + (4-1)]\,\Delta t = 7\,\Delta t$
$S_k = T_1 / T_k = 16/7$

56
(No Transcript)

57
(No Transcript)

58
(No Transcript)

59
(Space-time diagram: tasks 1 to 4 pass through stages 1 2 3 4, one stage apart, over time units 1 to 7)

60
(No Transcript)

61
(Space-time diagram: four-stage pipeline with stage delays t, t, 3t, t; tasks 1 to 4; time axis 1 to 15)
62
(No Transcript)

63
(Four-stage pipeline exercise. Case 1: t1 = t2 = t3 = t4. Case 2: t1 = t2, t3 = 2 t1, t4 = 4 t1)
64
Dealing with Branches

Multiple streams
Prefetch branch target
Loop buffer
Branch prediction
Delayed branching

65
Multiple Streams

Have two pipelines
Prefetch each branch into a separate pipeline
Use the appropriate pipeline
Problems:
Leads to register and memory contention delays
Multiple branches (i.e., additional branches entering the pipelines before the original branch decision is made) lead to further pipelines being needed
Can improve performance anyway
e.g., IBM 370/168

66
Prefetch Branch Target

Target of branch is prefetched in addition to instructions following branch
Keep target until branch is executed
Used by IBM 360/91

67
Loop Buffer (1)

Small, very fast memory
Maintained by fetch (IF) stage of pipeline
Contains the n most recently fetched instructions in sequence
If a branch is to be taken:
Hardware checks whether the target is in the buffer
If yes, the next instruction is fetched from the buffer
else it is fetched from memory
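The behaviour described above can be approximated in software as a small buffer of the most recently fetched instructions. This is a loose sketch: the `LoopBuffer` class is hypothetical, and real hardware holds a contiguous run of instructions and compares address bits rather than doing a dictionary lookup.

```python
from collections import OrderedDict


class LoopBuffer:
    """Keeps the `capacity` most recently fetched instructions (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()          # address -> instruction, oldest first

    def fetch(self, address, memory):
        """Return (instruction, hit); on a miss, read memory and remember the result."""
        if address in self.entries:
            return self.entries[address], True
        instruction = memory[address]
        self.entries[address] = instruction
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
        return instruction, False


memory = {100: "LOAD X,A", 101: "ADD 1,A", 102: "JUMP 100"}   # a three-instruction loop
buffer = LoopBuffer(capacity=4)
for iteration in range(2):
    hits = [buffer.fetch(address, memory)[1] for address in (100, 101, 102)]
    print(iteration, hits)
```

The first pass through the loop misses on every instruction; the second pass is served entirely from the buffer.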

68
Loop Buffer (2)

Reduces memory access time
Very good for small loops or jumps
If the buffer is big enough to contain the entire loop, instructions in the loop need to be fetched from memory only once, at the first iteration
Used by CRAY-1

69
Loop Buffer Diagram
70
Branch Prediction (1)

Predict whether a branch will be taken
If the prediction is right:
→ No branch penalty
If the prediction is wrong:
→ Empty pipeline
Fetch correct instruction
→ Branch penalty

71
Branch Prediction (2)

Prediction techniques
Static:
Predict never taken
Predict always taken
Predict by opcode
Dynamic:
Taken/not taken switch
Branch history table

72
Branch Prediction (3)

Predict never taken
Assume that jump will not happen
Always fetch next instruction
68020 and VAX 11/780
VAX will not prefetch after branch if a page fault would result (O/S vs. CPU design)
Predict always taken
Assume that jump will happen
Always fetch target instruction
Branches are taken more than 50% of the time

73
Branch Prediction (4)

Predict by opcode
Some instructions are more likely to result in a jump than others
Decision based on the opcode of the branch instruction
Can get up to 75% success

74
Branch Prediction (5)

Taken/not taken switch
One or more bits can be associated with each conditional branch instruction that reflect the recent history of the instruction
Good for loops

75
1-bit prediction

Prediction  Actual
1           1
1           1
1           1
1           0
0           1
1           1
1           1
1           0
0           1
1           1
1           1
1           0

2-bit prediction

Prediction  Bit 1  Bit 2
1           1      1
1           1      1
1           1      1
1           0      1
1           1      0
1           1      1
1           1      1
1           0      1
1           1      0
1           1      1
1           1      1
1           0      1

For ( . ) xxxxx
For ( . ) xxxxx
For ( . ) xxxxx
76
Branch Prediction State Diagram

1-bit prediction

77
Branch Prediction State Diagram

2-bit prediction

78
Branch Prediction State Diagram
79
Branch Prediction Flowchart

Taken/Not taken switch
Use two bits recording history

80
Exercise 11.5

81
Dealing With Branches

Branch history table
Branch target buffer

82
Dealing With Branches

Branch history table
A small cache memory associated with the instruction fetch stage of the pipeline
Each entry has three elements:
The address of a branch instruction
The target instruction or target address
The state of use of the instruction (history bits)
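The three elements of an entry map naturally onto a small lookup table keyed by branch address. The class below is a hypothetical sketch: it stores a target address and a two-bit state per branch, and the initial state and update rule are assumptions made for the example.

```python
class BranchHistoryTable:
    """branch address -> [target address, 2-bit state]; illustrative only."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_address, fall_through):
        """Return the address to fetch next: the stored target or the next instruction."""
        entry = self.entries.get(branch_address)
        if entry is not None and entry[1] >= 2:
            return entry[0]
        return fall_through

    def update(self, branch_address, target, taken):
        """Record the outcome once the branch has actually executed."""
        entry = self.entries.setdefault(branch_address, [target, 1])
        entry[0] = target
        entry[1] = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)


table = BranchHistoryTable()
print(table.predict(103, fall_through=104))   # no entry yet: fetch the next instruction
table.update(103, target=115, taken=True)
print(table.predict(103, fall_through=104))   # branch seen taken: fetch the target
```

Because the lookup happens in the fetch stage, a correct prediction lets the target be fetched without waiting for the branch to execute.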

83
Predict Never Taken Strategy

(No Transcript; refers to Figure 11.17)

84
Dealing With Branches
85
Branch History Table Strategy

(No Transcript)

86
Dealing With Branches
87
Delayed Branch

Rearrange instructions
Do not take jump until you have to

88
Normal and Delayed Branch

Address  Normal       Delayed      Optimized
100      LOAD X,A     LOAD X,A     LOAD X,A
101      ADD 1,A      ADD 1,A      JUMP 105
102      JUMP 105     JUMP 106     ADD 1,A
103      ADD A,B      NOOP         ADD A,B
104      SUB C,B      ADD A,B      SUB C,B
105      STORE A,Z    SUB C,B      STORE A,Z
106                   STORE A,Z

89
Use of Delayed Branch
90
Use of Delayed Branch
91
Use of Delayed Branch
92
Use of Delayed Branch
93
Scheduling the Branch-delay Slot

ADD R1,R2,R3
IF R2=0 THEN
(delay slot)

becomes

IF R2=0 THEN
ADD R1,R2,R3
94
Scheduling the Branch-delay Slot

SUB R4,R5,R6
ADD R1,R2,R3
IF R1=0 THEN
(delay slot)

becomes

SUB R4,R5,R6
ADD R1,R2,R3
IF R1=0 THEN
SUB R4,R5,R6
95
Scheduling the Branch-delay Slot

ADD R1,R2,R3
IF R1=0 THEN
(delay slot)
SUB R4,R5,R6

becomes

ADD R1,R2,R3
IF R1=0 THEN
SUB R4,R5,R6
96
Intel 80486 Pipelining(1)

Fetch
From cache or external memory
Put in one of two 16-byte prefetch buffers
Fill buffer with new data as soon as old data consumed
Average 5 instructions fetched per load
Independent of other stages to keep buffers full

97
Intel 80486 Pipelining(2)

Decode stage 1
Opcode and address-mode info
At most first 3 bytes of instruction
Can direct D2 stage to get rest of instruction
Decode stage 2
Expand opcode into control signals
Computation of complex address modes

98
Intel 80486 Pipelining(3)

Execute
ALU operations, cache access, register update
Writeback
Update registers and flags
Results sent to cache and bus interface write buffers

99
80486 Instruction Pipeline
100
Pentium 4 Registers
101
EFLAGS Register
102
Control Registers
103
MMX Register Mapping

MMX uses several 64-bit data types
Uses 3-bit register address fields
8 registers
No MMX-specific registers
Aliased to the lower 64 bits of the existing floating-point registers

104
MMX Register Mapping Diagram
105
Pentium Interrupt Processing

Interrupts: generated by hardware at random times
Maskable
Nonmaskable
Exceptions: generated from software and provoked by the execution of an instruction
Processor detected
Programmed

106
Pentium Interrupt Processing

Interrupt vector table
Each interrupt type assigned a number
Index to vector table
256 32-bit interrupt vectors
5 priority classes

107
Pentium Interrupt Processing(1)

Interrupt handling
1. The current stack segment register and the current extended stack pointer (ESP) register are pushed onto the stack
2. EFLAGS register is pushed onto the stack

108
Pentium Interrupt Processing(2)

3. Interrupt (IF) and trap (TF) flags are cleared
4. CS pointer and IP are pushed
5. Error code is pushed
6. The interrupt vector contents are fetched and loaded into the CS and IP or EIP registers
