Title: Caches for Parallel Architectures (Coherence)
1 Caches for Parallel Architectures (Coherence)
- Figures and examples from:
- Parallel Computer Architecture: A Hardware/Software Approach, D. E. Culler, J. P. Singh, Morgan Kaufmann Publishers, Inc., 1999.
- Transactional Memory, D. Wood, Lecture Notes in ACACES 2009.
2 Processor Design
- Moore's Law (1965)
- The number of transistors per IC doubles every 2 years (or 18 months).
- In practice, processor performance doubles every 2 years.
- More and more problems
- Memory wall
- 1980: memory latency of roughly 1 instruction
- 2006: memory latency of roughly 1000 instructions
- Power and cooling walls
- Increasing design and verification complexity (design and test complexity)
- Limited room for further exploitation of ILP
- Result: parallel architectures
3 Parallel Architectures (1)
- Multiprocessors have grown considerably since the 1990s
- Servers
- Supercomputers, to achieve higher performance than a single processor
- Today (CMPs)
- Lower design cost through design reuse (replication)
- Exploiting thread-level parallelism (TLP) to tackle the memory wall
- Lower per-core power, more cores.
- Efficient use of multiprocessors (especially in servers) where thread-level parallelism exists
- Growing interest in server design and performance
4 Parallel Architectures (2)
- All of this leads to a new era in which multiprocessors play the leading role
- Desktop machines for every user with 2, 4, 6, 8, ... cores
- "We are dedicating all of our future product development to multicore designs. We believe this is a key inflection point for the industry." (Intel CEO Paul Otellini, 2005)
5 Classification of Parallel Architectures
- Single Instruction stream, Single Data stream (SISD)
- Single Instruction stream, Multiple Data streams (SIMD)
- Multiple processors, same instructions, different data (data-level parallelism).
- Multiple Instruction streams, Single Data stream (MISD)
- No such system has appeared on the market so far (it is mainly of interest for fault tolerance, e.g. computers that control aircraft flight).
- Multiple Instruction streams, Multiple Data streams (MIMD)
- Each processor executes its own instructions and processes its own data. Multiple parallel threads (thread-level parallelism).
- We will deal mainly with MIMD systems.
- Thread-level parallelism
- Flexible operation: either as single-user multiprocessors focused on the performance of one application, or as multiprogrammed multiprocessors running multiple tasks at the same time.
- Cost-performance advantages from using off-the-shelf processors.
7 MIMD Systems (1)
- Examples of MIMD systems
- Clusters (commodity/custom clusters)
- Multicore systems
- Each processor executes a different process.
- Process: a segment of code that can be executed independently. In a multiprogrammed environment the processors execute different tasks, so each process is independent of the others.
- When multiple processes share code and address space, they are called threads.
- Today the term thread is used more loosely for multiple executions that may run on different processors, regardless of whether they share the address space.
- Multithreaded architectures allow the simultaneous execution of multiple processes with different address spaces, as well as multiple threads that share the same address space.
8 MIMD Systems (2)
- Using an MIMD system with n processors efficiently requires at least n threads/processes.
- They are created by the programmer or the compiler.
- Grain size: the size (amount of computation) of each thread
- Fine-grain: a few tens of instructions (e.g. some iterations of a loop, instruction-level parallelism)
- Coarse-grain: millions of instructions (thread-level parallelism)
- MIMD systems fall into 2 categories, based on the organization of their memory hierarchy.
- Centralized shared-memory architectures
- Distributed memory architectures (physically distributed memory)
9 Centralized Shared-Memory Architectures
- Small number of processors (fewer than 100 in 2006).
- All processors share one centralized memory
- Multiple banks
- Point-to-point connections, switches
- Limited scalability
- Symmetric multiprocessors (SMPs)
- Memory has a symmetric relationship with the processors
- Uniform access time (Uniform Memory Access, UMA)
10 Distributed Memory Architectures (1)
- Memory is distributed locally to each processor.
- Advantages
- Higher memory bandwidth, if the majority of accesses are local to each node.
- Lower access time for data stored in each node's own memory.
- Disadvantages
- Complex data exchange between processors.
- Harder to develop software that exploits the increased memory bandwidth.
- Two communication models for data exchange
- Shared address space
- Message passing
11 Distributed Memory Architectures (2)
- Shared address space
- The physically distributed memories are used as a single, shared data space.
- The same physical address on 2 processors refers to the same location in the same part of physical memory.
- Communication goes through the shared space (implicitly, using plain loads and stores to shared variables).
- These multiprocessors are called Distributed Shared-Memory (DSM).
- Access time depends on where the data resides, hence NUMA (Non-Uniform Memory Access).
12 Distributed Memory Architectures (3)
- Private address space
- Each processor has its own address space, which no other processor can access.
- The same physical address on 2 processors refers to different locations in different parts of memory.
- Communication is explicit, through messages: message-passing multiprocessors.
- Each send-receive pair performs a pairwise synchronization as well as a memory-to-memory copy.
- e.g. clusters
13 Shared Memory Architectures (1)
14 Shared Memory Architectures (2)
- Fundamental property of memory systems
- Every read of a location must return the last value written to it.
- This property is fundamental for sequential and parallel programs alike.
- The property is preserved when multiple threads run on one processor, since they see the same memory hierarchy.
- In multiprocessor systems, however, each processor has its own cache.
- Potential problems
- Copies of a variable may exist in more than one cache.
- If a write is not visible to all processors, some of them may read the old value of the variable stored in their own cache.
- The cache coherence problem
15 A coherence problem in uniprocessor systems?
16 Direct Memory Access
- DMA and the CPU both access memory
- Solutions
- a) HW: cache invalidation for DMA writes or cache flush for DMA reads
- b) SW: the OS must ensure that the cache lines are flushed before an outgoing DMA transfer is started, and invalidated before a memory range affected by an incoming DMA transfer is accessed
- c) Non-cacheable DMAs
17 Example of the Cache Coherence Problem
- The processors see different values for variable u after operation 3
- With write-back caches, the value written back to memory depends on which cache evicts or writes back the data, and when
- Unacceptable, yet it happens often!
18 Example
- Two simultaneous withdrawals of 100 from the same account, from 2 different ATMs.
- Each transaction runs on a different processor.
- The address is held in register r3.
19 Example (no caches)
- Without caches: no problem!
20 Example (Incoherence)
- Write-back caches
- 3 possible copies: memory, p0, p1
- The system may be incoherent.
21 Example (Incoherence)
- Write-through caches
- Now 2 different copies!
- Still a problem! (e.g. suppose p0 performs another withdrawal)
- Write-through caches do not solve the problem!
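The ATM scenario with write-back caches can be reproduced in a few lines. This is only a minimal sketch: the `WriteBackCache` class, the address `ACCOUNT` and the starting balance of 500 are hypothetical names and values chosen for illustration, and the caches deliberately run no coherence protocol.

```python
class WriteBackCache:
    """Private write-back cache with NO coherence protocol (illustration only)."""

    def __init__(self, memory):
        self.memory = memory   # shared backing store: address -> value
        self.lines = {}        # private copies: address -> value
        self.dirty = set()     # addresses modified locally, not yet written back

    def load(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]              # hit: the copy may be stale

    def store(self, addr, value):
        self.lines[addr] = value             # only the private copy changes
        self.dirty.add(addr)

    def flush(self, addr):
        if addr in self.dirty:               # write back on eviction
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def withdraw(cache, addr, amount):
    balance = cache.load(addr)
    cache.store(addr, balance - amount)


ACCOUNT = 0x10             # hypothetical address (the one held in r3)
memory = {ACCOUNT: 500}    # hypothetical starting balance
p0, p1 = WriteBackCache(memory), WriteBackCache(memory)

withdraw(p0, ACCOUNT, 100)   # p0 caches 400; memory still holds 500
withdraw(p1, ACCOUNT, 100)   # p1 misses, reads the stale 500, caches 400
p0.flush(ACCOUNT)
p1.flush(ACCOUNT)
final_balance = memory[ACCOUNT]   # 400, although 200 was withdrawn in total
```

Memory, p0 and p1 each hold their own copy, and nothing forces them to agree, which is exactly the incoherent situation of the write-back example.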
22 Cache Coherence (1)
- Preserving the fundamental property
- Every read of a location must return the LAST value written to it.
- How is LAST defined?
- Sequential programs
- It is defined by the order imposed by the code.
- Parallel programs
- Two threads may write to the same address at the same instant.
- A thread may read a variable right after another thread has written it, but because of the propagation delay the write has not become visible yet.
- The order imposed by the code is defined within a thread.
- We also need to define an order that covers all threads (global ordering).
23 Cache Coherence (2)
- Suppose there is one central memory and no cache.
- Every operation on a memory location accesses the same physical location.
- Memory imposes a global order on the operations of all threads to that location.
- The operations of each thread keep their program order.
- Any interleaving that preserves the order of operations of the individual programs is acceptable / valid.
- "Last" is defined as the most recent operation in a hypothetical sequence that preserves the properties above.
- In a real system this global order cannot be constructed.
- Use of caches.
- Avoiding serialization.
- The system must be built so that programs behave as if this global order existed.
24 Cache Coherence - Definition
- A system is coherent if, for every execution, the results (the values returned by the read operations) are such that for each location we can construct a hypothetical serial order of all operations to that location that is consistent with the results of the execution and in which:
- The operations of each thread occur in the order in which they were issued by that thread.
- The value returned by a read operation is the value of the last write to that location according to the hypothetical serial order.
- 3 conditions for a system to be coherent.
25 Cache Coherence - Conditions
- 1. A read by processor P to a location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P.
- Preserves program order.
- Also holds for uniprocessors.
- 2. A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses.
- Write propagation
- A read operation cannot keep returning older values.
- 3. Writes to the same location are serialized; that is, two writes to the same location by any two processors are seen in the same order by all processors (e.g. if values 1 and then 2 are written to a location, processors can never see the value of the location as 2 and then later read it as 1).
- Write serialization. Do we need read serialization?
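Condition 3 can be phrased as a small check over what one processor observes. The sketch below is an illustration only: the function name and its arguments are hypothetical, and it assumes every write to the location stores a distinct value, so that a value identifies its write.

```python
def violates_write_serialization(write_order, observed):
    """Check one processor's reads against the global order of writes.

    write_order: values written to the location, in their serialized order.
    observed:    values this processor read from the location, in program order.
    Assumes all written values are distinct (an assumption of this sketch).
    """
    positions = [write_order.index(value) for value in observed]
    # A violation is any read that sees an earlier write after a later one.
    return any(later < earlier
               for earlier, later in zip(positions, positions[1:]))


sees_1_then_2 = violates_write_serialization([1, 2], [1, 1, 2])
sees_2_then_1 = violates_write_serialization([1, 2], [2, 1])
```

The second call corresponds to the example in condition 3: values 1 and then 2 were written, but the processor reads 2 and later 1.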
26 Bus Snooping Cache Coherence (1)
- Use of a bus
- It offers a simple and elegant implementation of cache coherence.
- Scalability problems.
- All devices attached to the bus can observe all bus transactions.
- Three phases in each transaction
- Arbitration: the bus arbiter decides which device gets to use the bus
- Command/address: the selected device broadcasts the type of command (read / write) and the address of the corresponding location. All devices observe it and decide whether it concerns them.
- Data transfer
27 Bus Snooping Cache Coherence (2)
- Exploiting the cache block state
- Along with the tag and data, each cache also stores the state of the block (e.g. invalid, valid, dirty).
- In effect, a finite state machine (FSM) operates for each block.
- Every access to a block, or to an address that maps to the same cache line as that block, causes a state change, i.e. a transition in the FSM.
- In multiprocessor systems the state of a block is a vector of length p, where p is the number of caches.
- The same FSM governs the state changes for all blocks in all caches.
- The state of a block may differ from cache to cache.
28 Hardware for Cache Coherence
- Coherence Controller (CC)
- Monitors the traffic on the bus (addresses and data)
- Executes the coherence protocol.
- Decides what to do with the local copy, based on what it sees being transmitted on the bus.
29 Bus Snooping Cache Coherence (3)
- Protocol implementation
- The cache controller receives input from 2 sides
- Memory access requests from the processor.
- The bus snooper reports the bus transactions performed by the other caches.
- In each case it responds by:
- Updating the state of the block according to the FSM.
- Sending data.
- Generating new bus transactions.
- Every protocol consists of the following building blocks:
- The set of allowed states for each block in the caches.
- The state transition diagram, which takes as input the state of the block and the processor request or the observed bus transaction, and gives as output the next allowed state for that block.
- The actions that must be performed when the block changes state.
30 Simple Invalidation-based protocol (1)
- Write-through, write-no-allocate caches
- 2 states for each block
- Valid
- Invalid
- On a write to a block
- Main memory is updated through a bus transaction.
- Each bus snooper notifies its cache controller, which invalidates the local copy if one exists.
- Multiple simultaneous reads are allowed (multiple readers). A write, however, invalidates the other copies.
- Is it coherent?
31 Simple Invalidation-based protocol (2)
- Can we construct a global order that satisfies program order and write serialization?
- We assume atomic bus transactions and memory operations.
- One transaction at a time on the bus.
- Each processor waits for its memory access to complete before it issues a new one.
- Writes (and invalidations) complete during the bus transaction.
- All writes appear on the bus (write-through protocol).
- Writes to a location are serialized according to the order in which they appear on the bus (bus order).
- Invalidations are also performed in bus order.
- How do we insert the reads into this order?
- Reads do not necessarily cause a bus transaction and can execute independently and concurrently in the caches.
32 Simple Invalidation-based protocol (3)
- Serializing the reads
- Read hit or read miss?
- Read miss
- It is serviced through a bus transaction, so it is serialized together with the writes.
- It will see the value of the last write according to bus order.
- Read hit
- It is serviced from the value held in the cache.
- It sees the value of the most recent write by the same processor, or of its most recent read (read miss).
- Both of these (write and read miss) are serviced through bus transactions.
- Hence read hits also see values according to bus order.
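The two-state protocol described in these three slides can be sketched as a small simulation. The `Bus` and `VICache` classes and the transaction names `BusRd`/`BusWr` are illustrative assumptions; the sketch models one atomic bus, write-through and write-no-allocate caches, and an absent entry standing for the Invalid state.

```python
class Bus:
    """Atomic bus: one transaction at a time, observed by every attached cache."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.log = []                        # transactions in bus order

    def write(self, source, addr, value):
        self.log.append(("BusWr", addr, value))
        self.memory[addr] = value            # write-through: memory is updated
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)      # the other snoopers invalidate

    def read(self, addr):
        self.log.append(("BusRd", addr))
        return self.memory[addr]


class VICache:
    def __init__(self, bus):
        self.bus = bus
        self.valid = {}                      # addr -> value; absent means Invalid
        bus.caches.append(self)

    def load(self, addr):
        if addr not in self.valid:           # read miss: bus transaction
            self.valid[addr] = self.bus.read(addr)
        return self.valid[addr]              # read hit: no bus traffic

    def store(self, addr, value):
        if addr in self.valid:               # write-no-allocate: update only if present
            self.valid[addr] = value
        self.bus.write(self, addr, value)    # every write appears on the bus

    def snoop_write(self, addr):
        self.valid.pop(addr, None)           # Valid -> Invalid


memory = {"u": 5}
bus = Bus(memory)
p0, p1 = VICache(bus), VICache(bus)
p0.load("u")
p1.load("u")
p1.store("u", 7)                  # invalidates p0's copy
value_seen_by_p0 = p0.load("u")   # read miss, re-fetches the new value
```

The list `bus.log` is the bus order: writes and read misses appear in it, while read hits are served locally between two of its entries.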
33 VI protocol - Example (write-back caches)
- The ld of p1 generates a BusRd
- p0 responds by writing back the modified block (WB) and invalidating it in its cache (transition to state I)
34 MSI Write-Back Invalidation Protocol (1)
- The VI protocol is not efficient
- From VI to MSI
- Split V into 2 states
- 3 states
- Modified (M)
- Shared (S)
- Invalid (I)
- 2 types of processor requests
- PrRd (read) and PrWr (write)
- 3 bus transactions
- BusRd: requests a copy with no intention of modifying it
- BusRdX: requests a copy in order to modify it
- BusWB: updates memory
35 MSI Write-Back Invalidation Protocol (2)
- State transition diagram
- Transitions due to operations of the local processor.
- Transitions due to observed bus transactions.
- A/B: if the cache controller observes A, then in addition to the transition to the new state it also causes B.
- --: no action.
- Transitions and actions for the replacement of a block in the cache are not included.
- The higher a block sits in the diagram, the more tightly bound it is to the processor.
36 MSI protocol - Example (write-back caches)
- The ld of p1 generates a BusRd
- p0 responds by writing back the modified block (WB) and changing its copy to S
- The st of p1 generates a BusRdX
- p0 responds by invalidating its copy (transition to I)
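The MSI example can be traced with a table-driven state machine. This is a sketch under assumptions: the two tables follow the states, requests and bus transactions listed on the MSI slides, while the `step` helper, the list representation of per-cache states and the omission of replacements are choices made here for illustration.

```python
# (state, processor request) -> (next state, bus transaction or None)
PROCESSOR = {
    ("I", "PrRd"): ("S", "BusRd"),
    ("I", "PrWr"): ("M", "BusRdX"),
    ("S", "PrRd"): ("S", None),
    ("S", "PrWr"): ("M", "BusRdX"),
    ("M", "PrRd"): ("M", None),
    ("M", "PrWr"): ("M", None),
}

# (state, observed bus transaction) -> (next state, action or None)
SNOOP = {
    ("I", "BusRd"): ("I", None),
    ("I", "BusRdX"): ("I", None),
    ("S", "BusRd"): ("S", None),
    ("S", "BusRdX"): ("I", None),
    ("M", "BusRd"): ("S", "BusWB"),
    ("M", "BusRdX"): ("I", "BusWB"),
}


def step(states, proc, request):
    """Apply one processor request to a single block; return the bus events."""
    events = []
    next_state, transaction = PROCESSOR[(states[proc], request)]
    if transaction is not None:
        events.append(transaction)
        for other in range(len(states)):     # every other cache snoops the bus
            if other != proc:
                states[other], action = SNOOP[(states[other], transaction)]
                if action is not None:
                    events.append(action)
    states[proc] = next_state
    return events


states = ["M", "I"]                    # p0 holds the block in state M
ld_events = step(states, 1, "PrRd")    # ld by p1
after_ld = list(states)
st_events = step(states, 1, "PrWr")    # st by p1
after_st = list(states)
```

After the ld both caches hold the block in S and a write-back has taken place; after the st p0 is in I and p1 in M, matching the example.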
37 MSI Coherence
- Write propagation is evident.
- Write serialization
- All writes that appear on the bus (BusRdX) are ordered by it.
- Reads that appear on the bus are ordered with respect to the writes.
- For writes that do not appear on the bus:
- A sequence of such writes between 2 bus transactions for the same block must come from the same processor P.
- In the serialization, the sequence appears between those 2 transactions.
- Reads by P see the writes in this order with respect to the other writes.
- Reads by other processors are separated from the sequence by a bus transaction, which thereby orders them with respect to the writes.
- Reads by all processors see the writes in the same order.
38 MESI Write-Back Invalidation Protocol (1)
- Problem with MSI
- 2 transactions to read and then modify a block, even if nobody shares it.
- 4 states
- Modified (M)
- Exclusive (E): only this cache has a copy (unmodified).
- Shared (S): two or more caches have a copy.
- Invalid (I)
- If nobody has a copy of the block, a PrRd results in the transition I -> E.
- The bus needs a "shared" signal as a response to a BusRd.
39 MESI Write-Back Invalidation Protocol (2)
- State transition diagram
- Transitions due to operations of the local processor.
- Transitions due to observed bus transactions.
- A/B: if the cache controller observes A, then in addition to the transition to the new state it also causes B.
- --: no action.
- A block may be in state S even though no other copies exist.
- How?
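The effect of the E state on the read-then-modify case can be sketched from the processor side only. The function name and the `shared_signal` parameter are assumptions of this sketch; it covers only local processor requests and leaves out snooped transitions and replacements.

```python
def mesi_processor_step(state, request, shared_signal=False):
    """Return (next_state, bus_transaction) for one block in one cache.

    shared_signal models the bus "shared" line sampled during a BusRd.
    """
    if request == "PrRd":
        if state == "I":
            return ("S" if shared_signal else "E"), "BusRd"
        return state, None                   # read hit in S, E or M
    if request == "PrWr":
        if state in ("I", "S"):
            return "M", "BusRdX"             # must obtain exclusivity first
        return "M", None                     # E -> M and M -> M are silent
    raise ValueError(f"unknown request: {request}")


# Read and then modify a block that nobody else shares.
state, transactions = "I", []
for request in ("PrRd", "PrWr"):
    state, transaction = mesi_processor_step(state, request, shared_signal=False)
    if transaction is not None:
        transactions.append(transaction)
```

Only the initial BusRd reaches the bus here, whereas the same pair of requests under MSI needs a BusRd followed by a BusRdX.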
40 Recap - Coherence Snooping Protocols
- We keep the processor, main memory and the caches.
- We extend the cache controller and exploit the bus.
- Write-back caches
- Efficient use of the limited bus bandwidth.
- Not all memory operations cause bus transactions.
- Coherence is harder to implement.
- Use of the modified state
- Exclusive ownership: no other valid copy exists.
- Main memory may or may not have a copy.
- The cache is responsible for supplying the block to whoever requests it.
- Exclusivity
- The cache may modify the block without notifying anyone: no bus transaction
- Before writing it must obtain exclusivity.
- Even if the block is valid: write miss
41 Invalidation Protocols
- Write miss
- It causes a special transaction: read-exclusive (RdX)
- It notifies the others that a write follows and acquires exclusive ownership.
- Everyone who holds a copy of the block deletes it.
- Only one RdX succeeds at a time. Multiple requests are serialized by the bus.
- Eventually the new data is written to main memory when the block is evicted from the cache.
- If a block has not been modified (modified state), it does not need to be written to main memory when it is evicted from the cache.
42 Update Protocols
- A write operation also updates any copies of the block in the other caches.
- Advantages
- Lower latency for accesses to the block from the other caches.
- All copies are updated with a single transaction.
- Disadvantages
- Multiple writes to the block by the same processor cause multiple update transactions.
43 Dragon Write-Back Update Protocol (1)
- 4 states
- Exclusive (E): only this cache has a copy (unmodified). Main memory is up to date.
- Shared-clean (Sc): two or more caches have a copy. Main memory is not necessarily up to date.
- Shared-modified (Sm): two or more caches have a copy, main memory is not up to date, and this cache is responsible for updating main memory when it evicts the block.
- Modified (M): only this cache holds the modified block, and main memory is not up to date.
- There is no Invalid state.
- The protocol always keeps the blocks that are in the caches up to date.
- Two new processor requests: PrRdMiss, PrWrMiss
- One new bus transaction: BusUpd
44 Dragon Write-Back Update Protocol (2)
45 Dragon Example
Processor action | State P1 | State P2 | State P3 | Bus action | Data supplied by
P1 reads u       | E        | ---      | ---      | BusRd      | Mem
P3 reads u       | Sc       | ---      | Sc       | BusRd      | Mem
P3 writes u      | Sc       | ---      | Sm       | BusUpd     | P3 Cache
P1 reads u       | Sc       | ---      | Sm       | ---        | ---
P2 reads u       | Sc       | Sc       | Sm       | BusRd      | P3 Cache
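The rows of the Dragon example can be regenerated by a compact simulation of a single block. This is a sketch under assumptions: the transitions follow the usual textbook description of Dragon (a miss issues BusRd and samples the shared line, a write in Sc or Sm issues BusUpd), `None` stands for "not cached", replacements are ignored, and when a write miss issues two transactions the returned supplier refers to the last one.

```python
def dragon_step(states, proc, op):
    """One access to block u. states maps processor name -> state or None.

    Returns (list of bus transactions, supplier of the data or None).
    """
    others = [p for p in states if p != proc and states[p] is not None]
    shared = bool(others)                     # the bus "shared" line
    bus, supplier = [], None

    if states[proc] is None:                  # PrRdMiss / PrWrMiss: BusRd
        owner = next((p for p in others if states[p] in ("M", "Sm")), None)
        supplier = f"{owner} Cache" if owner else "Mem"
        for p in others:                      # snoopers: E -> Sc, M -> Sm
            states[p] = {"E": "Sc", "M": "Sm"}.get(states[p], states[p])
        states[proc] = "Sc" if shared else "E"
        bus.append("BusRd")

    if op == "write":
        if states[proc] in ("Sc", "Sm"):      # other copies may exist: BusUpd
            bus.append("BusUpd")
            supplier = f"{proc} Cache"
            for p in others:                  # snoopers take the update, end in Sc
                states[p] = "Sc"
            states[proc] = "Sm" if shared else "M"
        else:                                 # E or M: silent write
            states[proc] = "M"
    return bus, supplier


states = {"P1": None, "P2": None, "P3": None}
trace = [("P1", "read"), ("P3", "read"), ("P3", "write"),
         ("P1", "read"), ("P2", "read")]
rows = []
for proc, op in trace:
    bus, supplier = dragon_step(states, proc, op)
    rows.append((proc, op, dict(states), bus, supplier))
```

Each entry of `rows` holds the per-processor states after the access, the bus transactions it generated and who supplied the data, which can be compared line by line with the table.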
46 Invalidation vs. Update Protocols
- A cache writes to a block. Before the next write to the same block, does anyone else want to read it?
- Yes:
- Invalidation
- Read miss: possibly multiple transactions
- Update
- Read hit if the readers already had copies: they are updated with a single transaction
- No:
- Invalidation
- Multiple writes without additional traffic on the bus
- Clears out copies that are not being used
- Update
- Multiple unnecessary updates (possibly even to dead copies)
47 Protocol Design Tradeoffs (1)
- Designing multiprocessor systems is complex
- Number of processors
- Memory hierarchy (levels, size, associativity, block size, ...)
- Bus
- Memory system (interleaved banks, width of banks, ...)
- I/O subsystem
- Cache coherence protocol (protocol class, states, actions, ...)
- The protocol affects system characteristics such as latency and bandwidth.
- The choice of protocol is influenced by the desired performance and behavior of the system, as well as by the organization of the memory hierarchy and of the communication.
48 Protocol Design Tradeoffs (2)
- Write-Update vs. Write-Invalidate
- Write-run: a series of writes by one processor to a memory block, whose beginning and end are delimited by operations on that block from other processors.
- W2, R1, W1, W1, R1, W1, R3
- Write-run length: 3
- Write-invalidate: a write-run of any length causes a single coherence miss.
- Write-update: a write-run of length L causes L updates.
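Write-runs can be extracted from an access trace mechanically. The sketch below is illustrative: the function name and the string encoding of accesses ("W1" for a write by processor 1, "R3" for a read by processor 3) are assumptions, and a run that touches the start or end of the trace is reported even though no other processor delimits it on that side.

```python
def write_runs(trace):
    """Return [(processor, number_of_writes)] for each write-run in the trace.

    A run is ended by any access to the block from a different processor;
    reads by the writing processor itself do not end it.
    """
    runs = []
    current, count = None, 0
    for access in trace:
        op, proc = access[0], access[1:]
        if proc != current:                 # another processor touches the block
            if count:
                runs.append((current, count))
            current, count = proc, 0
        if op == "W":
            count += 1
    if count:
        runs.append((current, count))
    return runs


trace = ["W2", "R1", "W1", "W1", "R1", "W1", "R3"]
runs = write_runs(trace)
longest = max(length for _, length in runs)

# Cost under the two rules stated on the slide.
invalidate_misses = len(runs)                      # one coherence miss per run
update_transactions = sum(n for _, n in runs)      # L updates per run of length L
```

For the trace on the slide the run by processor 1 has length 3, so write-update pays three updates for it where write-invalidate pays one coherence miss.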
49 4C Cache Misses Model
- Compulsory misses (cold)
- First access to a block.
- Increase the block size.
- Capacity misses
- The block does not fit in the cache (even in a fully associative cache).
- Increase the cache size.
- Conflict misses
- The block does not fit in the set it maps to.
- Increase the associativity.
- Coherence misses (communication)
- True sharing: a data word is used by 2 or more processors.
- False sharing: independent data words used by different processors belong to the same cache block.
51 Protocol Design Tradeoffs (3)
- Cache block size
- Increasing the block size can lead to:
- Lower miss rate (good spatial locality).
- Higher miss penalty and possibly higher hit cost.
- Higher miss rate due to false sharing (poor spatial locality).
- More traffic on the bus, because unneeded data is transferred (mismatch between fetch and access size, false sharing).
- There is a trend towards larger cache blocks.
- The cost of the bus transaction and of the memory access is amortized by transferring more data.
- Hardware and software mechanisms for dealing with false sharing.
52 False sharing reduction
- Improved data layout, so that independent data is not placed in the same block.
- Data padding
- e.g. dummy variables between lock variables that are placed close to one another.
- Tradeoff: locality vs. false sharing
- Use an array of arrays to make sure that each submatrix is placed contiguously in memory.
- Tradeoff: false sharing vs. instruction overhead
- Partial-block invalidation
- The block is split into sub-blocks, and state is kept for each of them.
- On every miss we fetch all invalid sub-blocks.
- We invalidate only the sub-block that contains the data to be modified.
- Tradeoff: fewer false sharing misses vs. more invalidation messages
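The effect of data padding can be checked with plain address arithmetic. This is a sketch under assumptions: the 64-byte block size, the 4-byte lock variables and both helper functions are hypothetical values and names chosen for illustration.

```python
def shares_block(offset_a, offset_b, block_size=64):
    """True if two byte offsets fall into the same cache block."""
    return offset_a // block_size == offset_b // block_size


def layout(sizes, pad_to=None):
    """Assign consecutive byte offsets; optionally pad each item to pad_to bytes."""
    offsets, cursor = [], 0
    for size in sizes:
        offsets.append(cursor)
        cursor += size
        if pad_to:
            cursor = -(-cursor // pad_to) * pad_to   # round up to a multiple
    return offsets


lock_sizes = [4, 4]                        # two hypothetical 4-byte lock variables
packed = layout(lock_sizes)                # locks placed back to back
padded = layout(lock_sizes, pad_to=64)     # dummy bytes inserted after each lock

packed_false_sharing = shares_block(*packed)
padded_false_sharing = shares_block(*padded)
```

The padded layout spends 60 extra bytes per lock, which is the locality side of the tradeoff named on the slide.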
53 Scalable Multiprocessor Systems
- Systems based on a bus are not scalable.
- All modules (cores, memories, etc.) are connected by a single set of wires.
- Limited bandwidth: it does not grow as more processors are added, which leads to saturation.
- Longer bus: higher latency.
- A scalable system must deal with these problems.
- Total bandwidth should grow with the number of processors.
- The time required for an operation should not grow much (e.g. exponentially) with the size of the system.
- It must be cost-effective.
- We lose fundamental properties of the bus.
- Unknown number of simultaneous transactions.
- There is no global arbitration.
- The effects (e.g. state changes) become directly visible only to the nodes that take part in the transaction.
54 Scalable Cache Coherence
- Interconnect
- Replace the bus with scalable interconnects (point-to-point networks, e.g. mesh)
- Processor snooping bandwidth
- Until now the protocols used broadcast (spam everyone!)
- A large percentage of snoops cause no transition
- For loosely shared data, most likely only one processor has a copy
- Hence: scalable directory protocol
- Notify only the processors that care about a particular block (spam only those that care!)
55 Directory-Based Cache Coherence (1)
- The cache block state can no longer be determined by observing the requests on the shared bus (implicit determination).
- It is determined and maintained in one place (the directory), to which requests can be addressed in order to look it up (explicit determination).
- Each memory block has a directory entry
- Book-keeping (which nodes have copies, the state of the memory copy, ...)
- All requests for the block go to the directory.
56 Directory-Based Cache Coherence (2)
57 Directory-Based Cache Coherence (3)
58 Directory Protocol Taxonomy
59 Directory-Based Cache Coherence (4)
- Directory protocols
- Pro: lower bandwidth consumption
- Con: higher latency
- Two cases of read miss
- Unshared block: get data from memory
- Bus: 2 hops (P0 -> memory -> P0)
- Directory: 2 hops (P0 -> memory -> P0)
- S/E block: get data from processor (P1)
- Bus: 2 hops (P0 -> P1 -> P0) (assuming cache-to-cache data transfer is allowed)
- Directory: 3 hops (P0 -> memory -> P1 -> P0)
- The second case occurs quite often in multiprocessor systems
- High probability that some processor has the block
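The two read-miss cases can be sketched with a directory that records which processors hold each block. Everything here is an illustrative assumption: the function name, the dictionary-of-sets directory, the node label "memory" for the home node, and the simplification that any current holder supplies the data, as in the 3-hop case above.

```python
def read_miss_path(directory, block, requester, home="memory"):
    """Message path of a read miss under a directory protocol.

    directory maps block -> set of processors that hold a copy.
    Returns (path, number_of_hops).
    """
    sharers = directory.setdefault(block, set())
    if sharers:
        supplier = sorted(sharers)[0]        # arbitrary choice among the holders
        path = [requester, home, supplier, requester]
    else:
        path = [requester, home, requester]  # unshared: memory supplies the data
    sharers.add(requester)                   # book-keeping: requester now has a copy
    return path, len(path) - 1


directory = {}
first_path, first_hops = read_miss_path(directory, "u", "P1")    # unshared block
second_path, second_hops = read_miss_path(directory, "u", "P0")  # P1 holds a copy
```

The directory entry is consulted on every miss, so the extra hop in the second case is the latency price paid for not broadcasting to all processors.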