William Stallings Computer Organization and Architecture 5th Edition - PowerPoint PPT Presentation

1
William Stallings Computer Organization and
Architecture 5th Edition
  • Chapter 11
  • CPU Structure and Function

2
Topics
  • Processor Organization
  • Register Organization
  • Instruction Cycle
  • Instruction Pipelining
  • The Pentium Processor

3
CPU Structure
  • CPU must:
  • Fetch instructions
  • Interpret instructions
  • Fetch data
  • Process data
  • Write data

CPU needs a small internal memory
4
CPU With System Bus
5
CPU Internal Structure
6
Registers
  • CPU must have some working space (temporary
    storage)
  • Called registers
  • Number and function vary between processor
    designs
  • One of the major design decisions
  • Top level of memory hierarchy
  • Two categories
  • User-visible registers
  • Control and status registers

7
User Visible Registers
  • General Purpose
  • Data
  • Address
  • Condition Codes

8
User Visible Registers
  • General Purpose Registers
  • May be true general purpose
  • May be restricted
  • Data registers
  • Accumulator register
  • Addressing registers
  • Segment pointers
  • Index registers
  • Stack Pointer

9
General or Special?
  • Make them general purpose
  • Increase flexibility and programmer options
  • Increase instruction size & complexity
  • Make them specialized
  • Smaller (faster) instructions
  • Less flexibility
  • The trend seems to be toward the use of
    specialized registers.

10
How Many GP Registers?
  • Between 8 and 32
  • Fewer → more memory references
  • More registers do not noticeably reduce memory
    references

11
How Big?
  • Large enough to hold the largest address
  • Large enough to hold most data types
  • Often possible to combine two data registers
  • e.g. C programming: double a; long int a;

12
Condition Code Registers
  • Condition codes are bits set by the CPU hardware
    as the result of operations.
  • Sets of individual bits
  • e.g. result of last operation was zero
  • At least partially visible to the user
  • Can be read (implicitly) by programs
  • e.g. Jump if zero
  • Cannot (usually) be set by programs
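To make "bits set by the CPU hardware as the result of operations" concrete, here is a minimal sketch of how four common condition codes could be derived from an 8-bit addition. The 8-bit width, the flag names Z, N, C, V, and the function name `add8_with_flags` are assumptions for illustration, not any particular CPU's definition.

```python
def add8_with_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 8-bit values and return (result, condition codes)."""
    raw = (a & 0xFF) + (b & 0xFF)
    result = raw & 0xFF
    flags = {
        "Z": int(result == 0),      # zero: result was zero
        "N": (result >> 7) & 1,     # sign: copy of the result's top bit
        "C": int(raw > 0xFF),       # carry out of bit 7
        # overflow: both operands share a sign that the result does not
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

# A program reads the codes implicitly, e.g. "jump if zero":
result, cc = add8_with_flags(0xFF, 0x01)
if cc["Z"]:
    print("branch taken: result was zero")
```

Note that the program never writes these bits directly; it only tests them, which matches the slide's point that condition codes usually cannot be set by programs.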

13
Control Status Registers
  • Program Counter (PC)
  • Instruction Register (IR)
  • Memory Address Register (MAR)
  • Memory Buffer Register (MBR)
  • Revision: what do these all do?

14
Program Status Word (PSW)
  • A set of bits; includes condition codes
  • Sign of last result
  • Zero
  • Carry
  • Equal
  • Overflow
  • Interrupt enable/disable
  • Supervisor (supervisor or user mode)

15
Program Status Word - Example
  • Motorola 68000's PSW
  • System Byte User Byte
  • Interrupt Mask
  • Supervisor Status
  • Trace Mode

Bit    15 14 13 12 11 10  9  8  |  7  6  5  4  3  2  1  0
Field   T  -  S  -  - I2 I1 I0  |  -  -  -  X  N  Z  V  C
(bits 15-8: system byte; bits 7-0: user byte)
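The bit layout above can be decoded with shifts and masks. A minimal sketch; the function name `decode_sr` is hypothetical, and unused bits are simply ignored.

```python
def decode_sr(sr: int) -> dict[str, int]:
    """Split a 16-bit 68000 status register into its named fields."""
    return {
        "T": (sr >> 15) & 1,      # trace mode (system byte)
        "S": (sr >> 13) & 1,      # supervisor state
        "I": (sr >> 8) & 0b111,   # interrupt mask I2 I1 I0
        "X": (sr >> 4) & 1,       # extend (user byte)
        "N": (sr >> 3) & 1,       # negative
        "Z": (sr >> 2) & 1,       # zero
        "V": (sr >> 1) & 1,       # overflow
        "C": sr & 1,              # carry
    }

print(decode_sr(0x2704))  # supervisor state, interrupt mask 7, Z set
```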
16
Other Registers
  • May have registers pointing to
  • Process control blocks (PCB) (see O/S)
  • Interrupt vectors (see O/S)
  • N.B. CPU design and operating system design are
    closely linked

17
Example Register Organizations
18
Instruction Cycle
An instruction cycle includes the following
subcycles:
19
Indirect Addressing Cycle
  • May require memory access to fetch operands
  • Indirect addressing requires more memory accesses
  • Can be thought of as an additional instruction
    subcycle

20
Instruction Cycle with Indirect
21
Instruction Cycle State Diagram
22
Data Flow (Instruction Fetch)
  • Depends on CPU design
  • In general
  • Fetch:
  • PC contains address of next instruction
  • Address moved to MAR
  • Address placed on address bus
  • Control unit requests memory read
  • Result placed on data bus, copied to MBR, then to
    IR
  • Meanwhile PC incremented by 1
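The fetch sequence can be written as register transfers on a toy machine. This is a minimal sketch, assuming word-addressed memory where each instruction occupies one word (so PC advances by 1, as on the slide); the bus transfers are collapsed into a list lookup, the PC increment is shown after the transfers even though hardware does it in parallel, and the class name `ToyCPU` is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ToyCPU:
    memory: list[int]   # word-addressed main memory
    pc: int = 0         # program counter
    mar: int = 0        # memory address register
    mbr: int = 0        # memory buffer register
    ir: int = 0         # instruction register

    def fetch(self) -> None:
        """One fetch cycle, as a sequence of register transfers."""
        self.mar = self.pc                # PC -> MAR (placed on the address bus)
        self.mbr = self.memory[self.mar]  # memory read; data bus -> MBR
        self.ir = self.mbr                # MBR -> IR
        self.pc += 1                      # PC incremented by 1

cpu = ToyCPU(memory=[0x1940, 0x5941, 0x2941])
cpu.fetch()
print(hex(cpu.ir), cpu.pc)   # 0x1940 1
```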

23
Data Flow (Fetch Diagram)
24
Data Flow (Indirect Cycle)
  • IR is examined
  • If indirect addressing, indirect cycle is
    performed
  • Rightmost N bits of MBR transferred to MAR
  • Control unit requests memory read
  • Result (address of operand) moved to MBR
  • Instruction format: | op-code | address |
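A minimal sketch of the indirect cycle, assuming a 16-bit instruction word whose rightmost N = 12 bits form the address field (the widths are assumptions, not from the slides). The address field selects a memory word that holds the operand's address.

```python
ADDR_BITS = 12                        # assumed N: width of the address field
ADDR_MASK = (1 << ADDR_BITS) - 1

def indirect_cycle(memory: list[int], mbr: int) -> tuple[int, int]:
    """Return (MAR, MBR) after one indirect cycle.

    On entry MBR holds the instruction; its address field points at a memory
    word that contains the operand's effective address.
    """
    mar = mbr & ADDR_MASK             # rightmost N bits of MBR -> MAR
    mbr = memory[mar]                 # control unit requests memory read
    return mar, mbr                   # MBR now holds the address of the operand

memory = [0] * 4096
memory[0x040] = 0x123                       # pointer word: the operand lives at 0x123
mar, mbr = indirect_cycle(memory, 0x8040)   # opcode 0x8, address field 0x040
print(hex(mar), hex(mbr))                   # 0x40 0x123
```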

25
Data Flow (Indirect Diagram)
26
Data Flow (Execute Cycle)
  • May take many forms
  • Depends on instruction being executed
  • May include
  • Memory read/write
  • Input/Output
  • Register transfers
  • ALU operations

27
Data Flow (Interrupt Cycle)
  • Current PC saved to allow resumption after
    interrupt
  • Contents of PC copied to MBR
  • Special memory location (e.g. stack pointer)
    loaded to MAR
  • MBR written to memory
  • PC loaded with address of interrupt handling
    routine
  • Next instruction (first of interrupt handler) can
    be fetched
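The interrupt-cycle transfers can be sketched the same way. Assumptions: the save location is the top of a stack that grows toward lower addresses, and the handler address is passed in directly (a real CPU would look it up); `interrupt_cycle` is a hypothetical name.

```python
def interrupt_cycle(memory: list[int], pc: int, sp: int, handler: int) -> tuple[int, int]:
    """Save PC and jump to the handler; return (new PC, new SP)."""
    mbr = pc                 # contents of PC copied to MBR
    sp -= 1                  # assumed: stack grows toward lower addresses
    mar = sp                 # save location (top of stack) loaded to MAR
    memory[mar] = mbr        # MBR written to memory
    return handler, sp       # PC loaded with the handler address

memory = [0] * 256
pc, sp = interrupt_cycle(memory, pc=0x30, sp=0x100, handler=0x80)
print(hex(pc), hex(sp), hex(memory[sp]))   # 0x80 0xff 0x30
```

Because the old PC now sits in memory, the handler can return by reloading it, which is what "allow resumption after interrupt" requires.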

28
Data Flow (Interrupt Diagram)
29
Pipelining
  • Laundry Example
  • Ann, Brian, Cathy, Dave each have one load of
    clothes to wash, dry, and fold
  • Washer takes 30 minutes
  • Dryer takes 40 minutes
  • Folder takes 20 minutes

30
Sequential Laundry
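[Figure: sequential laundry timeline from 6 PM to midnight; each load goes through washer (30 min), dryer (40 min), and folder (20 min) before the next load starts; task order on the vertical axis]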
  • Sequential laundry takes 6 hours for 4 loads
  • If they learned pipelining, how long would
    laundry take?

31
Pipelined Laundry
[Figure: pipelined laundry timeline from 6 PM to midnight with the four loads overlapped; task order on the vertical axis]
  • Pipelined laundry takes 3.5 hours for 4 loads
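Both totals can be checked with a small simulation. A sketch, assuming each stage starts a load as soon as both the stage and that load are free; the function names are hypothetical.

```python
def sequential_time(stages: list[int], loads: int) -> int:
    """Each load runs all stages before the next load starts."""
    return loads * sum(stages)

def pipelined_time(stages: list[int], loads: int) -> int:
    """Finish time when each stage starts a load as soon as both are free."""
    free_at = [0] * len(stages)          # time each stage becomes free
    for _ in range(loads):
        ready = 0                        # time this load finished its previous stage
        for s, duration in enumerate(stages):
            start = max(ready, free_at[s])
            free_at[s] = start + duration
            ready = free_at[s]
    return free_at[-1]

stages = [30, 40, 20]                    # washer, dryer, folder (minutes)
print(sequential_time(stages, 4) / 60)   # 6.0 hours
print(pipelined_time(stages, 4) / 60)    # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: after the first wash, the 40-minute dryer paces every load.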

32
Pipelining Lessons(1)
  • Pipelining doesn't help latency of a single task;
    it helps throughput of the entire workload
  • Pipeline rate limited by slowest pipeline stage
  • Multiple tasks operating simultaneously
  • Potential speedup = number of pipe stages
  • Unbalanced lengths of pipe stages reduce speedup
  • Time to fill the pipeline and time to drain it
    reduce speedup

[Figure: pipelined laundry timeline, 6 PM to 9 PM; task order on the vertical axis]
33
Pipelining Lessons(2)

34
Instruction Pipelining
  • Similar to assembly line in manufacturing plants
  • Products at various stages can be worked on
    simultaneously
  • → Performance improved
  • First attempt: 2 stages
  • Fetch
  • Execution

35
Prefetch
  • Fetch accesses main memory
  • Execution usually does not access main memory
  • Can fetch next instruction during execution of
    current instruction
  • Called instruction prefetch
  • Ideally instruction cycle time would be halved
  • (if duration(F) = duration(E))

36
Improved Performance(1)
  • But not doubled
  • Fetch usually shorter than execution
  • Conditional branch makes next address
    unknown

37
Two Stage Instruction Pipeline
38
Improved Performance (2)
  • Reduce time loss due to branching by guessing
  • Prefetch instruction after branching instruction
  • If not branched
  • use the prefetched instruction
  • else
  • discard the prefetched instruction
  • fetch new instruction

39
Pipelining
  • Add more stages to improve performance

Instruction Cycle State Diagram
40
Pipelining
  • More stages → more speedup
  • FI Fetch instruction
  • DI Decode instruction
  • CO Calculate operands
  • FO Fetch operands
  • EI Execute instruction
  • WO Write result
  • Various stages are of nearly equal duration
  • Overlap these operations

41
Timing of Pipeline
42
Speedup of Pipelining (1)
  • 9 instructions, 6 stages
  • w/o pipelining __ time units
  • w/ pipelining __ time units
  • speedup _____
  • Q: 100 instructions, 6 stages, speedup ____
  • Q: ∞ instructions, k stages, speedup ____
  • Can you prove it (formally)?
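To check the blanks, the sketch below builds the ideal space-time chart, assuming every stage takes one time unit, every instruction uses all six stages, and there are no branches or memory conflicts. `timing_chart` is a hypothetical helper.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timing_chart(n: int, stages: list[str] = STAGES) -> list[list[str]]:
    """Row i lists the stage instruction i+1 occupies in each time unit ('' if idle)."""
    k = len(stages)
    total = k + n - 1                     # last instruction leaves after k + n - 1 units
    chart = [["" for _ in range(total)] for _ in range(n)]
    for i in range(n):
        for s in range(k):
            chart[i][i + s] = stages[s]   # instruction i enters stage s at time i + s
    return chart

chart = timing_chart(9)
for i, row in enumerate(chart, start=1):
    print(f"I{i}: " + " ".join(f"{cell:>2}" for cell in row))
print("time units:", len(chart[0]))      # 14 with pipelining vs 9 * 6 = 54 without
```

Each new instruction enters FI one unit after the previous one, so after the first instruction's k units, one instruction completes per unit.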

43
Pipelining - Discussion
  • Assumes every instruction needs all stages
  • Not always true, e.g., LOAD does not need WO
  • Assumes all stages can be performed in parallel
  • Not always true, e.g., FI, FO, and WO → memory
    conflicts
  • Timing is set up assuming all stages are needed
    by each instruction
  • → Simplifies pipeline hardware
  • Assumes no conditional branch instructions

44
Limitation by Branching
  • Conditional branch instructions can invalidate
    several instruction prefetches
  • In our example (see next slide)
  • Instruction 3 is a conditional branch to
    instruction 15
  • Next instruction's address won't be known till
    instruction 3 is executed (at time unit 7)
  • Pipeline must be cleared
  • No instruction is finished from time units 9 to
    12
  • → Performance penalty

45
Branch in a Pipeline
46
(No Transcript)
47
Limitation by Data Dependencies
  • Data needed by current instruction may depend on
    a previous instruction that is still in pipeline
  • E.g., A ← B + C
  • D ← A + E
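The dependency in this example can be detected mechanically by comparing each instruction's source registers with the destinations of instructions just ahead of it. A minimal sketch; the `Instr` record and the `window` parameter (how many earlier instructions may still be in the pipeline) are hypothetical.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str                  # register written
    srcs: tuple[str, ...]      # registers read

def raw_hazards(program: list[Instr], window: int) -> list[tuple[int, int, str]]:
    """Find (producer, consumer, register) triples where the consumer reads a
    register written by an instruction fewer than `window` positions earlier."""
    hazards = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window + 1), j):
            if program[i].dest in consumer.srcs:
                hazards.append((i, j, program[i].dest))
    return hazards

program = [
    Instr("A", ("B", "C")),    # A <- B + C
    Instr("D", ("A", "E")),    # D <- A + E (needs the new A)
]
print(raw_hazards(program, window=3))   # [(0, 1, 'A')]
```

A real pipeline would respond to such a hit by stalling or forwarding; the sketch only reports it.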

48
Limitation by stage overhead
  • Ideally, more stages, more speedup
  • However,
  • more overhead in moving data between buffers
  • more overhead in preparation and delivery
    functions
  • more complex circuit for pipeline hardware

49
Pipeline Performance
  • Cycle time: $\tau = \max_{1 \le i \le k} [\tau_i] + d = \tau_m + d$
  • $\tau_m$ = maximum stage delay, $k$ = number of
    stages, $d$ = time delay of a latch
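The formula says the clock must accommodate the slowest stage plus one latch, so every stage gets the same period. A minimal sketch with hypothetical delays in arbitrary time units:

```python
def cycle_time(stage_delays: list[int], latch_delay: int) -> int:
    """tau = max(tau_i) + d: the clock must fit the slowest stage plus one latch."""
    return max(stage_delays) + latch_delay

# Hypothetical delays: the slow third stage sets the clock for all four stages.
print(cycle_time([10, 10, 30, 10], latch_delay=2))   # 32
```

Splitting the 30-unit stage into smaller ones would shorten the cycle, but each added stage also adds a latch delay $d$, which is the overhead listed on the previous slide.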

50
Pipeline Performance
[Figure: five-stage pipeline datapath: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back]
51
Pipeline Performance
  • Time to execute n instructions without
    pipelining: $T_1 = nk\tau$
  • Time to execute n instructions with pipelining:
    $T_k = [k + (n-1)]\tau$
  • Speedup: $S_k = \dfrac{T_1}{T_k} = \dfrac{nk\tau}{[k + (n-1)]\tau} = \dfrac{nk}{k + (n-1)}$
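The speedup formula can be evaluated exactly with Python's `fractions` module, which keeps the ratio in lowest terms. A minimal sketch; `speedup` is a hypothetical helper name.

```python
from fractions import Fraction

def speedup(n: int, k: int) -> Fraction:
    """S_k = T_1 / T_k = n*k / (k + n - 1) for n instructions on a k-stage pipeline."""
    return Fraction(n * k, k + n - 1)

print(speedup(9, 6))               # 27/7, i.e. 54 / 14
print(float(speedup(100, 6)))      # about 5.71
print(float(speedup(10**6, 6)))    # approaches k = 6 as n grows
```

As $n \to \infty$ the $(k - 1)$ fill-up term becomes negligible, so $S_k \to k$.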

52
Pipeline Performance
  • Speedup of k-stage pipelining compared to without
    pipelining
  • Q: ∞ instructions, k stages, speedup ____

53
(No Transcript)
54
Example
  • A 4-stage pipeline processes 4 tasks; each stage
    takes time Δt. Find the speedup.

[Figure: space-time diagram of tasks 1 to 4 passing through the four stages]
55
Example
  • $T_1 = nk\Delta t = 4 \times 4\,\Delta t = 16\,\Delta t$
  • $T_k = [k + (n-1)]\Delta t = [4 + (4-1)]\Delta t = 7\,\Delta t$
  • $S_k = T_1 / T_k = 16/7$
  • If one task takes time T without pipelining, a
    4-stage pipeline ideally completes one task every
    T/4, so throughput rises by up to 4 times.

59
[Figure: space-time diagram of tasks 1 to 4 through four stages of duration t; time axis 1 to 7]

61
[Figure: space-time diagram of a four-stage pipeline with stage times t, t, 3t, t processing tasks 1 to 4; time axis 1 to 15]
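With unequal stages, the slowest stage paces the pipeline. A simple model, assuming unlimited buffering between stages: the first task takes $\sum t_i$ and each later task adds $\max t_i$. For stage times t, t, 3t, t and 4 tasks this gives 6t + 3 × 3t = 15t, matching the 15-unit time axis on this slide. `total_time` is a hypothetical helper.

```python
def total_time(stage_times: list[int], n: int) -> int:
    """Time for n tasks when the slowest stage is the bottleneck:
    the first task takes sum(stage_times); each later one adds max(stage_times)."""
    return sum(stage_times) + (n - 1) * max(stage_times)

print(total_time([1, 1, 3, 1], 4))   # 15 (in units of t)
print(total_time([1, 1, 1, 1], 4))   # 7, the balanced case (k + n - 1) t
```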
63
  • Case 1: t1 = t2 = t3 = t4
  • Case 2: t1 = t2, t3 = 2t1, t4 = 4t1

[Figure: four-stage pipeline with stage times t1, t2, t3, t4]
64
Dealing with Branches
  • Multiple Streams
  • Prefetch Branch Target
  • Loop buffer
  • Branch prediction
  • Delayed branching

65
Multiple Streams
  • Have two pipelines; prefetch each branch into a
    separate pipeline; use the appropriate pipeline
  • Problems
  • Leads to register & memory contention delays
  • Multiple branches (i.e., additional branch
    entering pipelines before original branch
    decision made) lead to further pipelines being
    needed
  • Can improve performance, anyway
  • e.g., IBM 370/168

66
Prefetch Branch Target
  • Target of branch is prefetched in addition to
    instructions following branch
  • Keep target until branch is executed
  • Used by IBM 360/91

67
Loop Buffer (1)
  • Small, very fast memory
  • Maintained by fetch (IF) stage of pipeline
  • Contains the n most recently fetched instructions
    in sequence
  • If a branch is to be taken
  • Hardware checks whether the target is in buffer
  • If YES then
  • next instruction is fetched from the buffer
  • else fetch from memory
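A minimal sketch of a loop buffer that keeps the n most recently fetched instructions keyed by address. For simplicity it checks the buffer on every fetch rather than only on taken branches; the class name, the default n = 8, and the dictionary "memory" are assumptions for illustration.

```python
from collections import OrderedDict

class LoopBuffer:
    """Holds the n most recently fetched instructions, keyed by address."""

    def __init__(self, memory: dict[int, str], n: int = 8) -> None:
        self.memory = memory
        self.n = n
        self.entries: "OrderedDict[int, str]" = OrderedDict()
        self.memory_reads = 0

    def fetch(self, addr: int) -> str:
        if addr in self.entries:                # hit: served from the buffer
            self.entries.move_to_end(addr)      # now the most recently fetched
            return self.entries[addr]
        self.memory_reads += 1                  # miss: go to main memory
        instr = self.memory[addr]
        self.entries[addr] = instr
        if len(self.entries) > self.n:          # drop the least recently fetched
            self.entries.popitem(last=False)
        return instr

memory = {100: "LOAD", 101: "ADD", 102: "STORE", 103: "BNZ 100"}
buf = LoopBuffer(memory, n=8)
for _ in range(10):                             # ten iterations of a 4-instruction loop
    for addr in range(100, 104):
        buf.fetch(addr)
print(buf.memory_reads)                         # 4: only the first iteration reads memory
```

If the buffer is smaller than the loop (for example n = 2 here), every fetch misses, which is why the technique suits small loops.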

68
Loop Buffer (2)
  • Reduce memory access time
  • Very good for small loops or jumps
  • If buffer is big enough to contain entire loop,
    instructions in the loop need to be fetched from
    memory only once at the first iteration
  • Used by CRAY-1

69
Loop Buffer Diagram
70
Branch Prediction (1)
  • Predict whether a branch will be taken
  • If the prediction is right
  • → No branch penalty
  • If the prediction is wrong
  • → Empty pipeline
  • Fetch correct instruction
  • → Branch penalty

71
Branch Prediction (2)
  • Prediction techniques
  • Static
  • Predict never taken
  • Predict always taken
  • Predict by opcode
  • Dynamic
  • Taken/not taken switch
  • Branch history table

72
Branch Prediction (3)
  • Predict never taken
  • Assume that jump will not happen
  • Always fetch next instruction
  • 68020 & VAX 11/780
  • VAX will not prefetch after branch if a page
    fault would result (O/S v CPU design)
  • Predict always taken
  • Assume that jump will happen
  • Always fetch target instruction
  • Conditional branches are taken more than 50% of
    the time

73
Branch Prediction (4)
  • Predict by Opcode
  • Some instructions are more likely to result in a
    jump than others
  • Decision based on the opcode of the branch
    instruction.
  • Can get up to 75% success

74
Branch Prediction (5)
  • Taken/Not taken switch
  • One or more bits can be associated with each
    conditional branch instruction that reflect the
    recent history of the instruction.
  • Good for loops

75
Branch Prediction Example
  • 1-bit prediction
  • Predicted Actual
  • 1 1
  • 1 1
  • 1 1
  • 1 0
  • 0 1
  • 1 1
  • 1 1
  • 1 0
  • 0 1
  • 1 1
  • 1 1
  • 1 0

  • 2-bit prediction
  • 1 1 1
  • 1 1 1
  • 1 1 1
  • 1 0 1
  • 1 1 0
  • 1 1 1
  • 1 1 1
  • 1 0 1
  • 1 1 0
  • 1 1 1
  • 1 1 1
  • 1 0 1

for (...) xxxxx
for (...) xxxxx
for (...) xxxxx
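The example can be replayed in code. Assumptions: the branch is the loop-closing branch of a loop that runs four iterations (taken three times, then not taken) and is entered three times, and both predictors start in a "predict taken" state. The function names are hypothetical. In the 1-bit table, predicted and actual differ in five rows; a 2-bit saturating counter keeps predicting taken and misses only at each loop exit.

```python
def run_1bit(outcomes: list[bool], start: bool = True) -> int:
    """Mispredictions for a 1-bit predictor: predict the last outcome."""
    prediction, misses = start, 0
    for taken in outcomes:
        misses += prediction != taken
        prediction = taken
    return misses

def run_2bit(outcomes: list[bool], start: int = 3) -> int:
    """Mispredictions for a 2-bit saturating counter (0..3; >= 2 predicts taken)."""
    counter, misses = start, 0
    for taken in outcomes:
        misses += (counter >= 2) != taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# Loop branch: taken 3 times, then falls through; the loop is entered 3 times.
outcomes = [True, True, True, False] * 3
print(run_1bit(outcomes), run_2bit(outcomes))   # 5 3
```

The 1-bit predictor pays twice per loop execution (at the exit and again on re-entry), while the 2-bit counter needs two consecutive misses before it changes its prediction.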
76
Branch Prediction State Diagram
  • 1-bit prediction

77
Branch Prediction State Diagram
  • 2-bit prediction

78
Branch Prediction State Diagram
79
Branch Prediction Flowchart
  • Taken/Not taken switch
  • Use two bits recording history

80
Exercise
  • 11.5

81
Dealing With Branches
  • Branch history table
  • Branch target buffer

82
Dealing With Branches
  • Branch history table
  • A small cache memory associated with the
    instruction fetch stage of the pipeline.
  • There are three elements
  • The address of a branch instruction
  • Target instruction / target address
  • The state of use of the instruction (history bits)
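A minimal sketch of a branch history table holding the three elements per entry: the branch address (the dictionary key), the target address, and a state. Assumptions: the state is a 2-bit counter, instructions are numbered sequentially so "next" is pc + 1, and all names are hypothetical. The usage replays the earlier example where instruction 3 branches to instruction 15.

```python
from dataclasses import dataclass

@dataclass
class BHTEntry:
    target: int     # target address of the branch
    state: int      # assumed 2-bit history: 2 or 3 predict taken, 0 or 1 not taken

class BranchHistoryTable:
    def __init__(self) -> None:
        self.entries: dict[int, BHTEntry] = {}   # keyed by branch instruction address

    def next_fetch(self, pc: int) -> int:
        """Address the fetch stage should use after the instruction at pc."""
        entry = self.entries.get(pc)
        if entry is not None and entry.state >= 2:
            return entry.target        # predicted taken: fetch from the target
        return pc + 1                  # no entry or predicted not taken

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the resolved outcome once the branch has executed."""
        entry = self.entries.get(pc)
        if entry is None:
            self.entries[pc] = BHTEntry(target, 3 if taken else 0)
            return
        entry.target = target
        entry.state = min(entry.state + 1, 3) if taken else max(entry.state - 1, 0)

bht = BranchHistoryTable()
print(bht.next_fetch(3))               # 4: no entry yet
bht.update(3, taken=True, target=15)   # instruction 3 branches to 15
print(bht.next_fetch(3))               # 15
```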

83
Predict Never Taken Strategy
  • See Figure 11.17

84
Dealing With Branches
85
Branch History Table Strategy

86
Dealing With Branches
87
Delayed Branch
  • Rearrange instructions
  • Do not take jump until you have to

88
Normal and Delayed Branch
Address  Normal      Delayed     Optimized
100      LOAD X,A    LOAD X,A    LOAD X,A
101      ADD 1,A     ADD 1,A     JUMP 105
102      JUMP 105    JUMP 106    ADD 1,A
103      ADD A,B     NOOP        ADD A,B
104      SUB C,B     ADD A,B     SUB C,B
105      STORE A,Z   SUB C,B     STORE A,Z
106                  STORE A,Z
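The three columns can be compared by tracing which addresses execute. A minimal sketch, assuming exactly one delay slot on the delayed-branch machine and JUMP as the only control-transfer opcode; the interpreter ignores the data operations and `trace` is a hypothetical name.

```python
def trace(program: dict[int, str], start: int, delayed: bool, limit: int = 20) -> list[int]:
    """Return the addresses executed, in order.

    With delayed=True a JUMP takes effect only after the next instruction
    (the delay slot) has executed.
    """
    executed: list[int] = []
    pc, pending = start, None
    while pc in program and len(executed) < limit:
        executed.append(pc)
        op = program[pc]
        next_pc = pc + 1
        if pending is not None:           # this was the delay slot: jump now
            next_pc, pending = pending, None
        if op.startswith("JUMP"):
            target = int(op.split()[1])
            if delayed:
                pending = target          # transfer after one more instruction
            else:
                next_pc = target
        pc = next_pc
    return executed

normal_prog = {100: "LOAD X,A", 101: "ADD 1,A", 102: "JUMP 105",
               103: "ADD A,B", 104: "SUB C,B", 105: "STORE A,Z"}
delayed_prog = {100: "LOAD X,A", 101: "ADD 1,A", 102: "JUMP 106", 103: "NOOP",
                104: "ADD A,B", 105: "SUB C,B", 106: "STORE A,Z"}
optimized_prog = {100: "LOAD X,A", 101: "JUMP 105", 102: "ADD 1,A",
                  103: "ADD A,B", 104: "SUB C,B", 105: "STORE A,Z"}

print(trace(normal_prog, 100, delayed=False))    # [100, 101, 102, 105]
print(trace(delayed_prog, 100, delayed=True))    # [100, 101, 102, 103, 106]
print(trace(optimized_prog, 100, delayed=True))  # [100, 101, 102, 105]
```

The optimized version fills the delay slot with useful work (ADD 1,A), so it executes as few instructions as the normal version without wasting a cycle on NOOP.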

89
Use of Delayed Branch
90
Use of Delayed Branch
91
Use of Delayed Branch
92
Use of Delayed Branch
93
Scheduling the Branch-delay Slot
  • ADD R1,R2,R3
  • IF R2=0 THEN
  •   Delay slot
Becomes
  • IF R2=0 THEN
  •   ADD R1,R2,R3
94
Scheduling the Branch-delay Slot
  • SUB R4,R5,R6
  • ADD R1,R2,R3
  • IF R1=0 THEN
  •   Delay slot
Becomes
  • SUB R4,R5,R6
  • ADD R1,R2,R3
  • IF R1=0 THEN
  •   SUB R4,R5,R6
95
Scheduling the Branch-delay Slot
  • ADD R1,R2,R3
  • IF R1=0 THEN
  •   Delay slot
  • SUB R4,R5,R6
Becomes
  • ADD R1,R2,R3
  • IF R1=0 THEN
  •   SUB R4,R5,R6
96
Intel 80486 Pipelining(1)
  • Fetch
  • From cache or external memory
  • Put in one of two 16-byte prefetch buffers
  • Fill buffer with new data as soon as old data
    consumed
  • Average 5 instructions fetched per load
  • Independent of other stages to keep buffers full

97
Intel 80486 Pipelining(2)
  • Decode stage 1
  • Opcode & address-mode info
  • At most first 3 bytes of instruction
  • Can direct D2 stage to get rest of instruction
  • Decode stage 2
  • Expand opcode into control signals
  • Computation of complex address modes

98
Intel 80486 Pipelining(3)
  • Execute
  • ALU operations, cache access, register update
  • Writeback
  • Update registers & flags
  • Results sent to cache & bus interface write
    buffers

99
80486 Instruction Pipeline
100
Pentium 4 Registers
101
EFLAGS Register
102
Control Registers
103
MMX Register Mapping
  • MMX uses several 64-bit data types
  • Use 3-bit register address fields
  • 8 registers
  • No MMX-specific registers
  • Aliasing to lower 64 bits of existing floating
    point registers

104
MMX Register Mapping Diagram
105
Pentium Interrupt Processing
  • Interrupts generated by hardware at random times
  • Maskable
  • Nonmaskable
  • Exceptions generated from software and provoked
    by the execution of an instruction
  • Processor detected
  • Programmed

106
Pentium Interrupt Processing
  • Interrupt vector table
  • Each interrupt type assigned a number
  • Index to vector table
  • 256 32-bit interrupt vectors
  • 5 priority classes

107
Pentium Interrupt Processing(1)
  • Interrupt Handling
  • 1. If the transfer involves a change of privilege
    level, the current stack segment (SS) register and
    the current extended stack pointer (ESP) register
    are pushed onto the stack
  • 2. EFLAGS register is pushed onto the stack

108
Pentium Interrupt Processing(2)
  • 3. Interrupt (IF) and trap (TF) flags are cleared
  • 4. Current code segment (CS) pointer and
    instruction pointer (IP or EIP) are pushed onto
    the stack
  • 5. If appropriate, an error code is pushed
  • 6. The interrupt vector contents are fetched and
    loaded into the CS and IP or EIP registers
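The order of the interrupt-handling steps can be sketched on a toy register set. Assumptions: a single Python list stands in for the stack (append = push, no ESP arithmetic, no separate privileged stack), and the vector table maps a vector number directly to a (CS, EIP) pair, which simplifies the real descriptor-based lookup. IF and TF are bits 9 and 8 of EFLAGS; `enter_interrupt` is a hypothetical name.

```python
from typing import Optional

IF_BIT = 1 << 9   # interrupt-enable flag (bit 9 of EFLAGS)
TF_BIT = 1 << 8   # trap flag (bit 8 of EFLAGS)

def enter_interrupt(regs: dict[str, int], stack: list[int], vector: int,
                    vector_table: dict[int, tuple[int, int]],
                    privilege_change: bool = False,
                    error_code: Optional[int] = None) -> None:
    """Apply the six interrupt-entry steps to a toy register set (in place)."""
    if privilege_change:                           # step 1: save old SS and ESP
        stack.extend([regs["SS"], regs["ESP"]])
    stack.append(regs["EFLAGS"])                   # step 2: save EFLAGS
    regs["EFLAGS"] &= ~(IF_BIT | TF_BIT)           # step 3: clear IF and TF
    stack.extend([regs["CS"], regs["EIP"]])        # step 4: save return address
    if error_code is not None:                     # step 5: only some exceptions
        stack.append(error_code)
    regs["CS"], regs["EIP"] = vector_table[vector]  # step 6: load handler CS:EIP

regs = {"SS": 0x10, "ESP": 0x8000, "EFLAGS": 0x302, "CS": 0x08, "EIP": 0x1234}
stack: list[int] = []
enter_interrupt(regs, stack, vector=32, vector_table={32: (0x08, 0x5000)})
print([hex(v) for v in stack])                 # ['0x302', '0x8', '0x1234']
print(hex(regs["EFLAGS"]), hex(regs["EIP"]))   # 0x2 0x5000
```

Clearing IF before the handler runs keeps further maskable interrupts from arriving until the handler chooses to re-enable them.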