Lecture 14: Course Review

Transcript and Presenter's Notes


1
Lecture 14: Course Review
  • Kai Bu
  • kaibu@zju.edu.cn
  • http://list.zju.edu.cn/kaibu/comparch2015

2
THANK YOU
3
  • Email, LinkedIn, Twitter, Weibo... Don't hesitate
    to keep in touch :)

4
(No Transcript)
5
Lectures 02-03
  • Fundamentals of Computer Design

6
Classes of Parallel Architectures
  • by Michael Flynn
  • according to the parallelism in the instruction
    and data streams called for by the instructions
    at the most constrained component of the
    multiprocessor
  • SISD, SIMD, MISD, MIMD

7
SISD
  • Single instruction stream, single data stream:
    the uniprocessor
  • Can exploit instruction-level parallelism

8
SIMD
  • Single instruction stream, multiple data streams
  • The same instruction is executed by multiple
    processors using different data streams.
  • Exploits data-level parallelism
  • Data memory for each processor,
  • whereas a single instruction memory and control
    processor.

9
MISD
  • Multiple instruction streams, single data stream
  • No commercial multiprocessor of this type yet

10
MIMD
  • Multiple instruction streams, multiple data
    streams
  • Each processor fetches its own instructions and
    operates on its own data.
  • Exploits task-level parallelism

11
Instruction Set Architecture
  • ISA
  • actual programmer-visible instruction set
  • the boundary between software and hardware
  • 7 major dimensions

12
ISA Class
  • Most are general-purpose register architectures
    with operands of either registers or memory
    locations
  • Two popular versions:
  • register-memory ISA (e.g., 80x86):
  • many instructions can access memory
  • load-store ISA (e.g., ARM, MIPS):
  • only load or store instructions can access
    memory

13
ISA Memory Addressing
  • Byte addressing
  • Aligned address:
  • object width: s bytes
  • address: A
  • aligned if A mod s = 0 (sketch below)
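A quick way to check the rule (a minimal Python sketch; the addresses and widths are just illustrative):

    def is_aligned(address, width):
        # Aligned if the address is a multiple of the object width s.
        return address % width == 0

    print(is_aligned(0x1000, 8))  # True: 4096 mod 8 == 0
    print(is_aligned(0x1003, 4))  # False: misaligned, costs two accesses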

14
Each misaligned object requires two memory
accesses
15
ISA Addressing Modes
  • Specify the address of a memory object
  • Register, Immediate, Displacement

16
Trends in Cost
  • Cost of an Integrated Circuit
  • a wafer is tested and chopped into dies for
    packaging

17
Trends in Cost
  • Cost of an Integrated Circuit

(yield: the percentage of manufactured devices that
survives the testing procedure)
18
Trends in Cost
  • Cost of an Integrated Circuit

19
Trends in Cost
  • Cost of an Integrated Circuit

20
Trends in Cost
  • Cost of an Integrated Circuit
  • N: process-complexity factor for measuring
    manufacturing difficulty
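The cost formulas on these slides are images in the transcript; the standard forms they correspond to are:

    Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
    Cost of die = Cost of wafer / (Dies per wafer x Die yield)
    Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N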

21
Dependability
  • Two measures of dependability
  • Module reliability
  • Module availability

22
Dependability
  • Two measures of dependability
  • Module reliability
  • continuous service accomplishment from a
    reference initial instant
  • MTTF: mean time to failure
  • MTTR: mean time to repair
  • MTBF: mean time between failures
  • MTBF = MTTF + MTTR

23
Dependability
  • Two measures of dependability
  • Module reliability
  • FIT: failures in time
  • failures per billion hours
  • e.g., an MTTF of 1,000,000 hours
  • = 10^9/10^6
  • = 1000 FIT

24
Dependability
  • Two measures of dependability
  • Module availability
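The availability formula itself is an image in the transcript; the standard definition is:

    Module availability = MTTF / (MTTF + MTTR)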

25
Measuring Performance
  • Execution time
  • the time between the start and the completion of
    an event
  • Throughput
  • the total amount of work done in a given time

26
Measuring Performance
  • Computer X and Computer Y
  • X is n times faster than Y
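The ratio on the slide is an image; the standard relation is:

    n = Execution time_Y / Execution time_X = Performance_X / Performance_Y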

27
Quantitative Principles
  • Parallelism
  • Locality
  • temporal locality: recently accessed items are
    likely to be accessed in the near future
  • spatial locality: items whose addresses are near
    one another tend to be referenced close together
    in time

28
Quantitative Principles
  • Amdahl's Law

29
Quantitative Principles
  • Amdahl's Law: two factors (worked example below)
  • 1. Fraction_enhanced
  • e.g., 20/60 if 20 seconds out of a 60-second
    program can be enhanced
  • 2. Speedup_enhanced
  • e.g., 5/2 if enhanced to 2 seconds while
    originally 5 seconds
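Putting the two factors together (a minimal Python sketch using the numbers above):

    def amdahl_speedup(fraction_enhanced, speedup_enhanced):
        # Overall speedup = 1 / ((1 - F) + F / S)
        return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

    # 20 s of a 60 s program enhanced; enhanced part runs 5/2 = 2.5x faster
    print(amdahl_speedup(20/60, 5/2))  # ~1.25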

30
(No Transcript)
31
Quantitative Principles
  • The Processor Performance Equation
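The equation itself is an image in the transcript; its standard form is:

    CPU time = Instruction count x Cycles per instruction (CPI) x Clock cycle time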

32
(No Transcript)
33
(No Transcript)
34
Lecture 04
  • Instruction Set Principles

35
ISA Classification
  • Classification Basis
  • the type of internal storage
  • stack
  • accumulator
  • register
  • ISA Classes
  • stack architecture
  • accumulator architecture
  • general-purpose register architecture (GPR)

36
ISA Classes: Stack Architecture
  • implicit operands
  • on the Top Of the Stack
  • C = A + B:
  • Push A
  • Push B
  • Add
  • Pop C
  • First operand removed from stack
  • Second op replaced by the result

37
ISA Classes: Accumulator Architecture
  • one implicit operand: the accumulator
  • one explicit operand: mem location
  • C = A + B:
  • Load A
  • Add B
  • Store C
  • accumulator is both
  • an implicit input operand
  • and a result

38
ISA Classes: General-Purpose Register Arch
  • Only explicit operands
  • registers
  • memory locations
  • Operand access:
  • direct memory access, or
  • loaded into temporary storage first

39
ISA Classes: General-Purpose Register Arch
  • Two Classes
  • register-memory architecture
  • any instruction can access memory
  • load-store architecture
  • only load and store instructions can access
    memory

40
ISA Classes: General-Purpose Register Arch
  • Two Classes
  • register-memory architecture
  • any instruction can access mem
  • C = A + B:
  • Load R1, A
  • Add R3, R1, B
  • Store R3, C

41
ISA Classes: General-Purpose Register Arch
  • Two Classes
  • load-store architecture
  • only load and store instructions
  • can access memory
  • C = A + B:
  • Load R1, A
  • Load R2, B
  • Add R3, R1, R2
  • Store R3, C

42
GPR Classification
  • ALU instruction has 2 or 3 operands?
  • 2: one op is both result and source + 1 source op
  • 3: 1 result op + 2 source ops
  • ALU instruction has 0, 1, 2, or 3 operands of
    memory address?

43
Addressing Modes
  • How instructions specify addresses
  • of objects to access
  • Types
  • constant
  • register
  • memory location effective address

44
(figure: the most frequently used addressing modes)
45
(No Transcript)
46
Lectures 05-07
  • Pipelining

47
Pipelining
start executing one instruction before completing
the previous one
48
Pipelined Laundry: 3.5 Hours
  • Observations
  • No speedup for an individual task
  • e.g., A still takes 30+40+20 = 90 min
  • But speedup for average task execution time
  • e.g., 3.5x60/4 = 52.5 min < 90 min

(timeline figure: tasks A, B, C, D pipelined;
stage times 30, 40, 20 min)
49
MIPS Instruction
  • at most 5 clock cycles per instruction
  • IF ID EX MEM WB

50
MIPS Instruction
IF ID EX MEM WB
IF: IR <- Mem[PC]
    NPC <- PC + 4
51
MIPS Instruction
IF ID EX MEM WB
ID: A <- Regs[rs]
    B <- Regs[rt]
    Imm <- sign-extended immediate field of IR
           (lower 16 bits)
52
MIPS Instruction
IF ID EX MEM WB
EX: ALUOutput <- A + Imm               (memory reference)
    ALUOutput <- A func B              (register-register ALU)
    ALUOutput <- A op Imm              (register-immediate ALU)
    ALUOutput <- NPC + (Imm << 2);
    Cond <- (A == 0)                   (branch)
53
MIPS Instruction
IF ID EX MEM WB
MEM: LMD <- Mem[ALUOutput]             (load)
     Mem[ALUOutput] <- B               (store)
     if (Cond) PC <- ALUOutput         (branch)
54
MIPS Instruction
IF ID EX MEM WB
WB: Regs[rd] <- ALUOutput              (register-register ALU)
    Regs[rt] <- ALUOutput              (register-immediate ALU)
    Regs[rt] <- LMD                    (load)
55
MIPS Instruction Demo
  • Prof. Gurpur Prabhu, Iowa State Univ
    http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/DLXimplem.html
  • Load, Store
  • Register-register ALU
  • Register-immediate ALU
  • Branch

56-61
Load (pipeline walkthrough diagrams)
62-67
Store (pipeline walkthrough diagrams)
68-73
Register-Register ALU (pipeline walkthrough diagrams)
74-79
Register-Immediate ALU (pipeline walkthrough diagrams)
80-85
Branch (pipeline walkthrough diagrams)
86
Structural Hazard
  • Example
  • 1 mem port
  • mem conflict:
  • data access (Load's MEM)
  • vs
  • instr fetch (Instr i+3's IF)
(diagram: Load, Instr i+1, Instr i+2, Instr i+3
issued in successive cycles)
87
Structural Hazard
  • Stall Instr i+3
  • till CC 5

88
Data Hazard
DADD R1, R2, R3
DSUB R4, R1, R5
AND  R6, R1, R7
OR   R8, R1, R9    (no hazard: reg write in 1st half
                    cycle, read in 2nd half cycle)
XOR  R10, R1, R11
89
Data Hazard
  • Solution: forwarding
  • directly feed the EX/MEM and MEM/WB
  • pipeline registers' results back to the ALU inputs
  • if forwarding hw detects that a previous ALU op
    has written the reg corresponding to a source for
    the current ALU op,
  • control logic selects the forwarded result as
    the ALU input.

90
Data Hazard Forwarding
DADD R1, R2, R3
DSUB R4, R1, R5
AND  R6, R1, R7
OR   R8, R1, R9
XOR  R10, R1, R11
91
Data Hazard Forwarding
R1 forwarded from the EX/MEM pipeline register to DSUB
92
Data Hazard Forwarding
R1 forwarded from the MEM/WB pipeline register to AND
93
Data Hazard Forwarding
  • Generalized forwarding
  • pass a result directly to the functional unit
    that requires it
  • forward results to not only ALU inputs but also
    other types of functional units

94
Data Hazard Forwarding
  • Generalized forwarding

DADD R1, R2, R3
LD   R4, 0(R1)     (R1 forwarded)
SD   R4, 12(R1)    (R1 and R4 forwarded)
95
Data Hazard
  • Sometimes a stall is necessary

LD   R1, 0(R2)     (R1 available only from MEM/WB)
DSUB R4, R1, R5    (needs R1 at the start of EX)
Forwarding cannot go backward in time;
the pipeline has to stall.
96
Branch Hazard
  • Redo IF: essentially a stall
  • If the branch is untaken,
  • the stall is unnecessary.
97
Branch Hazard Solutions
  • 4 simple compile-time schemes, #1
  • Freeze or flush the pipeline
  • hold or delete any instructions after the branch
    till the branch destination is known
  • i.e., redo IF without the first IF

98
Branch Hazard Solutions
  • 4 simple compile-time schemes, #2
  • Predicted-untaken
  • simply treat every branch as untaken
  • when the branch is indeed untaken,
  • pipelining proceeds as if there were no hazard.

99
Branch Hazard Solutions
  • 4 simple compile-time schemes, #2
  • Predicted-untaken
  • but if the branch is taken:
  • turn the fetched instr into a no-op (idle)
  • restart the IF at the branch target addr

100
Branch Hazard Solutions
  • 4 simple compile-time schemes, #3
  • Predicted-taken
  • simply treat every branch as taken
  • does not apply to the five-stage pipeline
  • applies to scenarios where the branch target addr
    is known before the branch outcome.

101
Branch Hazard Solutions
  • 4 simple compile-time schemes, #4
  • Delayed branch
  • the branch takes effect after the next
    instruction
  • pipelining sequence:
  • branch instruction
  • sequential successor
  • branch target if taken

Branch delay slot: the next instruction
102
Branch Hazard Solutions
  • Delayed branch

103
(No Transcript)
104
Lectures 08-10
  • Memory Hierarchy

105
Memory Hierarchy
106
Cache Performance
107
Cache Performance
  • Memory stall cycles
  • the number of cycles during which the processor
    is stalled waiting for a mem access
  • Miss rate
  • number of misses over number of accesses
  • Miss penalty
  • the cost per miss (number of extra clock cycles
    to wait)

108
Block Placement
109
Block Identification
  • Block address + block offset
  • Block address = tag + index
  • Index: selects the set
  • Tag: checked against all blocks in the set
  • Block offset: the address of the desired data
    within the block chosen by index + tag
  • Fully associative caches have no index field
  • (address-splitting sketch below)
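A minimal Python sketch of the split (the 5 offset bits and 7 index bits are illustrative assumptions, not values from the slides):

    OFFSET_BITS = 5   # assumed 32-byte blocks
    INDEX_BITS = 7    # assumed 128 sets

    def split_address(addr):
        # Low bits: offset within the block; middle bits: set index; rest: tag.
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    print(split_address(0x12345678))  # (tag, set index, block offset)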

110
Write Strategy
  • Write-through
  • info is written to both the block in the cache
    and to the block in the lower-level memory
  • Write-back
  • info is written only to the block in the cache
  • to the main memory only when the modified cache
    block is replaced

111
Write Strategy
  • Options on a write miss
  • Write allocate:
  • the block is allocated on a write miss
  • No-write allocate:
  • the write miss does not affect the cache
  • the block is modified only in the lower-level
    memory
  • until the program tries to read the block
112
Write Strategy
113
Write Strategy
  • No-write allocate: 4 misses, 1 hit
  • cache not affected: address 100 is not in the
    cache
  • read 200: miss, block replaced; then write to
    200 hits
  • Write allocate: 2 misses, 3 hits

114
Avg Mem Access Time
  • Average memory access time
  • = Hit time + Miss rate x Miss penalty
115
Opt 4 Multilevel Cache
  • Two-level cache
  • Add another level of cache between the original
    cache and memory
  • L1 small enough to match the clock cycle time of
    the fast processor
  • L2 large enough to capture many accesses that
    would go to main memory, lessening miss penalty

116
Opt 4 Multilevel Cache
  • Average memory access time
  • = Hit time_L1 + Miss rate_L1 x Miss penalty_L1
  • = Hit time_L1 + Miss rate_L1
    x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)
  • Average mem stalls per instruction
  • = Misses per instruction_L1 x Hit time_L2
    + Misses per instr_L2 x Miss penalty_L2

117
Opt 4 Multilevel Cache
  • Local miss rate:
  • the number of misses in a cache
  • divided by the total number of mem accesses to
    this cache
  • Miss rate_L1, Miss rate_L2
  • Global miss rate:
  • the number of misses in the cache
  • divided by the number of mem accesses generated
    by the processor
  • Miss rate_L1, Miss rate_L1 x Miss rate_L2

118
  • Answer
  • 1. various miss rates?
  • L1: local = global
  • = 40/1000 = 4%
  • L2:
  • local = 20/40 = 50%
  • global = 20/1000 = 2%

119
  • Answer
  • 2. avg mem access time?
  • average memory access time
  • = Hit time_L1 + Miss rate_L1
    x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)
  • = 1 + 4% x (10 + 50% x 200)
  • = 5.4

120
  • Answer
  • 3. avg stall cycles per instruction?
  • average stall cycles per instruction
  • = Misses per instruction_L1 x Hit time_L2
    + Misses per instr_L2 x Miss penalty_L2
  • = (1.5 x 40/1000) x 10 + (1.5 x 20/1000) x 200
  • = 6.6 (re-checked in the sketch below)
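A minimal Python sketch re-checking the arithmetic (all parameters come from the example above):

    miss_rate_l1 = 40 / 1000         # L1 misses per processor access
    local_miss_rate_l2 = 20 / 40     # L2 misses per L2 access
    hit_l1, hit_l2, penalty_l2 = 1, 10, 200
    accesses_per_instr = 1.5

    amat = hit_l1 + miss_rate_l1 * (hit_l2 + local_miss_rate_l2 * penalty_l2)
    stalls = (accesses_per_instr * 40/1000) * hit_l2 \
             + (accesses_per_instr * 20/1000) * penalty_l2
    print(amat, stalls)  # -> 5.4 6.6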

121
Virtual Memory
122
Virtual Memory
  • Program uses
  • discontiguous memory locations
  • Use secondary/non-memory storage

123
Virtual Memory
  • Program thinks
  • contiguous memory locations
  • larger physical memory

124
Virtual Memory
  • relocation
  • allows the same program to run in any location in
    physical memory

125
Virtual Memory
  • Paged virtual memory
  • page: fixed-size block
  • Segmented virtual memory
  • segment: variable-size block

126
Virtual Memory
  • Paged virtual memory
  • address = page address + page offset
  • Segmented virtual memory
  • address = segment address + seg offset

127
Address Translation
  • Example Opteron data TLB

Steps 1-2: send the virtual address to all tags
Step 2: check the type of mem access against
protection info in the TLB
128
Address Translation
  • Example Opteron data TLB

Step 3: the matching tag sends the phy addr through
the multiplexor
129
Address Translation
  • Example Opteron data TLB

Step 4: concatenate the page offset to the phy page
frame to form the final phy addr
130
Virtual Memory Caches
131
(No Transcript)
132
Lectures 11-12
  • Storage

133
Disk
  • http://cf.ydcdn.net/1.0.1.19/images/computer/MAGDISK.GIF

134
Disk
  • http://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/images/Chapter10/10_01_DiskMechanism.jpg

135
Disk Capacity
  • Areal Density
  • = bits/inch^2
  • = (tracks/inch) x (bits-per-track/inch)

136
Disk Arrays
  • Disk arrays with redundant disks to tolerate
    faults
  • If a single disk fails, the lost information is
    reconstructed from redundant information
  • Striping: simply spreading data over multiple
    disks
  • RAID: redundant array of inexpensive/independent
    disks

137
RAID
138
RAID 0
  • JBOD just a bunch of disks
  • No redundancy
  • No failure tolerated
  • Measuring stick for other RAID levels in terms of
    cost, performance, and dependability

139
RAID 1
  • Mirroring or Shadowing
  • Two copies for every piece of data
  • one logical write = two physical writes
  • 100% capacity/space overhead
  • http://www.petemarovichimages.com/wp-content/uploads/2013/11/RAID1.jpg

140
  • https://www.icc-usa.com/content/raid-calculator/raid-0-1.png

141
RAID 2
  • http://www.acnc.com/raidedu/2
  • Each bit of a data word is written to a data disk
    drive
  • Each data word has its (Hamming Code) ECC word
    recorded on the ECC disks
  • On read, the ECC code verifies correct data or
    corrects single-disk errors

142
RAID 3
  • http://www.acnc.com/raidedu/3
  • Data striped over all data disks
  • Parity of a stripe goes to the parity disk
  • Requires at least 3 disks to implement

143
RAID 3
  • Even Parity
  • parity bit makes
  • the number of 1s even
  • p = (sum of data bits) mod 2
  • Recovery
  • if a disk fails,
  • subtract good data
  • from good blocks
  • what remains is the missing data (XOR sketch
    below)
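The "subtract mod 2" step is just XOR; a minimal Python sketch (block values are made up for illustration):

    from functools import reduce
    from operator import xor

    data = [0b1011, 0b0110, 0b1100]    # data blocks across three disks
    parity = reduce(xor, data)         # parity disk: XOR of all data blocks

    lost = data[1]                     # pretend disk 1 fails
    recovered = parity ^ data[0] ^ data[2]  # XOR parity with surviving blocks
    assert recovered == lost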

144
RAID 4
  • http://www.acnc.com/raidedu/4
  • Favors small accesses
  • Allows each disk to perform independent reads,
    using each sector's own error checking

145
RAID 5
  • http://www.acnc.com/raidedu/5
  • Distributes the parity info across all disks in
    the array
  • Removes the bottleneck of the single parity disk
    in RAID 3 and RAID 4

146
RAID 6 Row-diagonal Parity
  • RAID-DP
  • Recovers from two failures
  • xor-based:
  • row parity: XOR of the data blocks in a row
  • diagonal parity: XOR of the blocks along a
    diagonal

147-155
Double-Failure Recovery (step-by-step reconstruction
diagrams)
156
RAID Further Readings
  • Raid Types & Classifications
  • BytePile.com
  • https://www.icc-usa.com/content/raid-calculator/raid-0-1.png
  • RAID
  • JetStor
  • http://www.acnc.com/raidedu/0

157
Little's Law
  • Assumptions
  • multiple independent I/O requests in
    equilibrium:
  • input rate = output rate
  • a steady supply of tasks, independent of how
    long they wait for service

158
Little's Law
  • Mean number of tasks in system
  • = Arrival rate x Mean response time

159
Little's Law
  • Mean number of tasks in system
  • = Arrival rate x Mean response time
  • applies to any system in equilibrium
  • nothing inside the black box creating new tasks
    or destroying them

160
Little's Law
  • Observe a sys for Time_observe mins
  • Sum the times for each task to be serviced:
    Time_accumulated
  • Number_task: tasks completed during Time_observe
  • Time_accumulated >= Time_observe
  • because tasks can overlap in time
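From these observed quantities, the law follows directly:

    Mean number of tasks in system = Time_accumulated / Time_observe
    Mean response time = Time_accumulated / Number_task
    Arrival rate = Number_task / Time_observe
    hence: Mean number of tasks = Arrival rate x Mean response time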

161
Little's Law
162
Single-Server Model
  • Queue / Waiting line:
  • the area where the tasks accumulate, waiting to
    be serviced
  • Server:
  • the device performing the requested service

163
Single-Server Model
  • Time_server
  • average time to service a task
  • average service rate = 1/Time_server
  • Time_queue
  • average time per task in the queue
  • Time_system
  • average time/task in the system, or the response
    time
  • the sum of Time_queue and Time_server

164
Single-Server Model
  • Arrival rate
  • average number of arriving tasks per second
  • Length_server
  • average number of tasks in service
  • Length_queue
  • average length of queue
  • Length_system
  • average number of tasks in system,
  • the sum of Length_server and Length_queue

165
Server Utilization / Traffic Intensity
  • Server utilization:
  • the mean number of tasks being serviced,
  • i.e., the arrival rate divided by the service
    rate
  • Service rate = 1/Time_server
  • Server utilization
  • = Arrival rate x Time_server
  • (Little's law again)

166
Server Utilization
  • Example
  • an I/O sys with a single disk gets on average 50
    I/O requests per sec
  • 10 ms on avg to service an I/O request
  • server utilization
  • = arrival rate x Time_server
  • = 50 x 0.01 = 0.5 = 1/2
  • Could handle 100 tasks/sec, but only gets 50

167
Queue Discipline
  • How the queue delivers tasks to the server
  • FIFO: first in, first out
  • Time_queue
  • = Length_queue x Time_server
    + Mean time to complete service of the task being
    served when a new task arrives, if the server is
    busy

168
Queue
  • with exponential/Poisson distribution of
    events/requests

169
Length_queue
  • Example
  • an I/O sys with a single disk gets on average 50
    I/O requests per sec
  • 10 ms on avg to service an I/O request
  • Length_queue = ?
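The answer on the slide is an image; assuming the M/M/1 model implied by the Poisson/exponential slide above, the standard result gives:

    Length_queue = Server utilization^2 / (1 - Server utilization)
                 = 0.5^2 / (1 - 0.5) = 0.5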

170
(No Transcript)
171
Lecture 13
  • Multiprocessors

172
centralized shared-memory
eight or fewer cores
173
centralized shared-memory
Share a single centralized memory; all processors
have equal access to it
174
centralized shared-memory
All processors have uniform latency from memory:
Uniform memory access (UMA) multiprocessors
175
distributed shared memory
more processors
physically distributed memory
176
distributed shared memory
more processors
physically distributed memory
Distributing mem among the nodes increases
bandwidth and reduces local-mem latency
177
distributed shared memory
more processors
physically distributed memory
NUMA (nonuniform memory access): access time
depends on the location of the data word in mem
178
distributed shared memory
more processors
physically distributed memory
Disadvantages: more complex inter-processor
communication; more complex software to handle
distributed mem
179
Cache Coherence Problem
write-through cache
180
Cache Coherence Problem
  • Global state defined by main memory
  • Local state defined by the individual caches

181
Cache Coherence Problem
  • A memory system is Coherent if any read of a data
    item returns the most recently written value of
    that data item
  • Two critical aspects
  • coherence defines what values can be returned
    by a read
  • consistency determines when a written value
    will be returned by a read

182
Coherence Property
  • A read by processor P to location X that follows
    a write by P to X, with no writes of X by another
    processor occurring between the write and the
    read by P,
  • always returns the value written by P.
  • preserves program order

183
Coherence Property
  • A read by a processor to location X that follows
    a write by another processor to X returns the
    written value if the read and the write are
    sufficiently separated in time and no other
    writes to X occur between the two accesses.

184
Consistency
  • When a written value will be seen is important
  • For example, if a write of X on one processor
    precedes a read of X on another processor by a
    very small time, it may be impossible to ensure
    that the read returns the value of the data
    written,
  • since the written data may not even have left
    the processor at that point

185
Cache Coherence Protocols
  • Directory based
  • the sharing status of a particular block of
    physical memory is kept in one location, called
    the directory
  • Snooping
  • every cache that has a copy of the data from a
    block of physical memory could track the sharing
    status of the block

186
Snooping Coherence Protocol
  • Write invalidation protocol
  • invalidates other copies on a write
  • exclusive access ensures that no other readable
    or writable copies of an item exist when the
    write occurs

187
Snooping Coherence Protocol
  • Write invalidation protocol
  • invalidates other copies on a write

write-back cache
188
Snooping Coherence Protocol
  • Write update/broadcast protocol
  • update all cached copies of a data item when
    that item is written
  • consumes more bandwidth

189
Write Invalidation Protocol
  • To perform an invalidate, the processor simply
    acquires bus access and broadcasts the address to
    be invalidated on the bus
  • All processors continuously snoop on the bus,
    watching the addresses
  • The processors check whether the address on the
    bus is in their cache
  • if so, the corresponding data in the cache is
    invalidated (toy sketch below)
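A toy Python sketch of the invalidate-on-write idea (an illustrative model, not any particular machine's protocol; the three-processor setup and write-through memory are assumptions):

    caches = [dict() for _ in range(3)]   # one address->value map per processor

    def bus_write(writer, addr, value, memory):
        for pid, cache in enumerate(caches):
            if pid != writer:
                cache.pop(addr, None)     # snooping caches invalidate their copy
        caches[writer][addr] = value
        memory[addr] = value              # write-through, for simplicity

    def read(pid, addr, memory):
        # On a miss, fetch from memory (coherence misses show up here).
        return caches[pid].setdefault(addr, memory.get(addr, 0))

    memory = {0x10: 7}
    print(read(0, 0x10, memory))   # P0 caches 7
    bus_write(1, 0x10, 9, memory)  # P1 writes; P0's copy is invalidated
    print(0x10 in caches[0])       # False: P0's next read is a coherence miss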

190
Coherence Miss
  • True sharing miss
  • first write by a processor to a shared cache
    block causes an invalidation to establish
    ownership of that block
  • another processor reads a modified word in that
    cache block
  • False sharing miss

191
Coherence Miss
  • True sharing miss
  • False sharing miss
  • a single valid bit per cache block
  • occurs when a block is invalidated (and a
    subsequent reference causes a miss) because some
    word in the block, other than the one being read,
    is written into

192
Coherence Miss
  • Example
  • assume words x1 and x2 are in the same cache
    block, which is in shared state in the caches of
    both P1 and P2.
  • identify each miss as a true sharing miss, a
    false sharing miss, or a hit?

193
Coherence Miss
  • Example
  • 1. true sharing miss
  • since x1 was read by P2 and needs to be
    invalidated from P2

194
Coherence Miss
  • Example
  • 2. false sharing miss
  • since x2 was invalidated by the write of x1 in
    P1,
  • but that value of x1 is not used in P2

195
Coherence Miss
  • Example
  • 3. false sharing miss
  • since the block is in shared state, need to
    invalidate it to write
  • but P2 read x2 rather than x1

196
Coherence Miss
  • Example
  • 4. false sharing miss
  • need to invalidate the block
  • P2 wrote x1 rather than x2

197
Coherence Miss
  • Example
  • 5. true sharing miss
  • since the value being read was written by P2
    (invalid -gt shared)

198
Lab/Experiment
  • Refer also to the archlab website

199
?
200
  • Exam: July 5
  • one A4 sheet of handwritten notes
  • Good Luck :)

201
A Few More Words
202
  • Don't Settle
  • Strive for Better

203
  • If you can dream it,
  • you can accomplish it.

LinkedIn: www.youtube.com/watch?v=U6JxljIXzGw
204
Reid Hoffman, http://t.cn/zTrc5bd, The 3 Secrets of
Highly Successful Graduates