1
Thoughts on Shared Caches
  • Jeff Odom, University of Maryland

2
A Brief History of Time
  • First there was the single CPU
  • Memory tuning new field
  • Large improvements possible
  • Life is good
  • Then came multiple CPUs
  • Rethink memory interactions
  • Life is good (again)
  • Now there's multi-core on multi-CPU
  • Rethink memory interactions (again)
  • Life will be good (we hope)

3
SMP vs. CMP
  • Symmetric Multiprocessing (SMP)
  • Single CPU core per chip
  • All caches private to each CPU
  • Communication via main memory
  • Chip Multiprocessing (CMP)
  • Multiple CPU cores on one integrated circuit
  • Private L1 cache
  • Shared second-level and higher caches

4
CMP Features
  • Thread-level parallelism
  • One thread per core
  • Same as SMP
  • Shared higher-level caches
  • Reduced latency
  • Improved memory bandwidth
  • Non-homogeneous data decomposition
  • Not all cores are created equal

5
CMP Challenges
  • New optimizations
  • False sharing/private data copies
  • Delaying reads until shared
  • Fewer locations to cache data
  • More chance of data eviction in high-throughput
    computations
  • Hybrid SMP/CMP systems
  • Connect multiple multi-core nodes
  • Composite cache sharing scheme
  • Cray XT4
  • 2 cores/chip
  • 2 chips/node

6
False Sharing
  • Occurs when two CPUs access different data
    structures on the same cache line
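
  A minimal C sketch of this effect (the pthreads setup and all names
  are illustrative assumptions, not from the slides): two threads
  update adjacent fields that will typically land on the same cache
  line, so each write invalidates the other CPU's copy of the line.

    #include <pthread.h>
    #include <stdio.h>

    /* a and b are logically independent, but adjacency makes them
       likely to share one cache line */
    struct { long a; long b; } counters;

    static void *bump_a(void *arg) {
        for (long i = 0; i < 100000000L; i++)
            counters.a++;   /* each write invalidates the other core's line */
        return NULL;
    }

    static void *bump_b(void *arg) {
        for (long i = 0; i < 100000000L; i++)
            counters.b++;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld %ld\n", counters.a, counters.b);
        return 0;
    }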

7-14
False Sharing (SMP)
  • (animation sequence; figures only, no transcript text)

15-22
False Sharing (CMP)
  • (animation sequence; figures only, no transcript text)

23
False Sharing (SMP vs. CMP)
  • With private L2 (SMP), modification of
    co-resident data structures results in trips to
    main memory
  • In CMP, false sharing impact is limited by the
    shared L2
  • Latency from L1 to L2 much less than L2 to main
    memory

24
Maintaining Private Copies
  • Two threads modifying the same cache line will
    want to move data to their L1
  • Simultaneous reading/modification causes
    thrashing between L1s and L2
  • Keeping a copy of the data in a separate cache
    line keeps it local to the processor (see the
    sketch below)
  • Updates to shared data occur less often
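
  A sketch of the private-copy technique using padding, assuming a
  64-byte cache line and a GCC-style alignment attribute:

    #define CACHE_LINE 64

    /* one counter per thread; the pad pushes each array element
       onto its own cache line, so private updates stay in that
       thread's L1 instead of ping-ponging */
    struct padded {
        long value;
        char pad[CACHE_LINE - sizeof(long)];
    };

    struct padded per_thread[2] __attribute__((aligned(CACHE_LINE)));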

25
Delaying Reads Until Shared
  • Often the results from one thread are pipelined
    to another
  • Typical signal-based sharing
  • Thread 1 (T1) accesses data, pulling it into T1's L1
  • T1 modifies data
  • T1 signals T2 that data is ready
  • T2 requests data, forcing eviction from T1's L1 into
    the shared L2
  • Data is now shared
  • The evicted L1 line is not refilled, wasting space

26
Delaying Reads Until Shared
  • Optimized sharing
  • T1 pulls data into its L1 as before
  • T1 modifies data
  • T1 waits until it has other data to fill the line
    with, then uses that load to push the modified line
    into the shared L2
  • T1 signals T2 that data is ready
  • T1 and T2 now share data in the L2
  • Eviction is a side effect of loading the new line
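
  For reference, the signal-based handoff on these two slides,
  sketched with C11 atomics (the flag, names, and spin-wait are
  illustrative assumptions); the delayed-read optimization itself
  hinges on when a cache line is evicted, which portable C cannot
  control directly:

    #include <stdatomic.h>

    long data;                    /* produced by T1, consumed by T2 */
    atomic_int ready;             /* the "data is ready" signal */

    void producer(void) {         /* runs as T1 */
        data = 42;                /* line pulled into T1's L1 and modified */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    void consumer(void) {         /* runs as T2 */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                     /* spin until signaled */
        long v = data;            /* unoptimized case: this read forces
                                     the eviction from T1's L1 */
        (void)v;
    }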

27
Hybrid Models
  • Most CMP systems will have SMP as well
  • Arbitrarily high core density on one chip not feasible
  • Want to balance processing with cache sizes
  • Different access patterns
  • Co-resident cores behave differently than cores on
    different nodes
  • Results may differ depending on which processor
    pairs you get

28
Experimental Framework
  • Simics simulator
  • Full system simulation
  • Hot-swappable components
  • Configurable memory system
  • Reconfigurable cache hierarchy
  • Roll-your-own coherency protocol
  • Simulated environment
  • SunFire 6800, Solaris 10
  • Single CPU board, 4 UltraSPARC IIi
  • Uniform main memory access
  • Similar to actual hardware on hand

29
Experimental Workload
  • NAS Parallel Benchmarks
  • Well known, standard applications
  • Various data access patterns (conjugate gradient,
    multi-grid, etc.)
  • OpenMP-optimized
  • Already converted from original serial versions
  • MPI-based versions also available
  • Small (W) workloads
  • Simulation framework slows down execution
  • Will examine larger (A-C) versions to verify
    tool correctness

30
Workload Results
  • Some benchmarks show marked improvement (CG)
  • Others show marginal improvement (FT)
  • Still others show asymmetrical loads (BT)
  • And some show asymmetrical improvement (EP)

31
The Next Step
  • How do we get programmers the data and tools to
    deal with this?
  • Hardware
  • Languages
  • Analysis tools
  • Specialized hardware counters
  • Which CPU forced eviction
  • Are cores or nodes contending for data
  • Coherency protocol diagnostics

32
The Next Step
  • CMP-aware parallel languages
  • Language-based frameworks make automatic
    optimization easier
  • OpenMP, UPC likely candidates
  • Specialized partitioning may be needed to
    leverage shared caches
  • Implicit data partitioning
  • Current languages distribute data uniformly
  • May require extensions (hints) in the form of
    language directives
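
  Today's OpenMP can only approximate such hints; a sketch using the
  existing schedule clause so each thread's chunk covers whole cache
  lines (the chunk size of 8 doubles = 64 bytes is an assumption):

    #include <omp.h>

    void scale(double *x, int n, double s) {
        /* static chunks of 8 doubles: each thread writes whole
           64-byte lines, avoiding false sharing at chunk
           boundaries */
        #pragma omp parallel for schedule(static, 8)
        for (int i = 0; i < n; i++)
            x[i] *= s;
    }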

33
The Next Step
  • Post-execution analysis tools
  • Identify memory hotspots
  • Provide hints on restructuring
  • Blocking (see the sketch below)
  • Execution interleaving
  • Convert SMP-optimized code for use in CMP
  • Dynamic instrumentation opportunities
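
  A sketch of the blocking restructuring such a tool might suggest
  (the tile size B is an assumed tuning parameter, chosen so a tile
  fits in the target cache):

    #define B 64   /* tile edge, in elements */

    /* blocked matrix transpose: each B-by-B tile is processed
       while it is still resident in cache, instead of streaming
       whole rows that evict each other */
    void transpose_blocked(int n, double dst[n][n], double src[n][n]) {
        for (int ii = 0; ii < n; ii += B)
            for (int jj = 0; jj < n; jj += B)
                for (int i = ii; i < ii + B && i < n; i++)
                    for (int j = jj; j < jj + B && j < n; j++)
                        dst[j][i] = src[i][j];
    }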

34
Questions?