Title: Cache - Basics
1. Cache - Basics

2. Computer System
- Instructions and data are stored in memory
- Processors access memory for:
  - Instruction fetch → access memory almost every cycle
  - Data load/store (about 20% of instructions) → access memory every 5th cycle
3. Random-Access Memory
- Static RAM (SRAM)
  - Each cell stores a bit with a six-transistor circuit
  - Retains its value indefinitely, as long as it is kept powered
  - Faster and more expensive than DRAM
- Dynamic RAM (DRAM)
  - Each cell stores a bit with a capacitor and a transistor
  - Value must be refreshed every 10-100 ms
  - Slower and cheaper than SRAM
4. CPU-Memory Performance Gap
- Processor-memory performance gap
  - Grows about 50% per year
  - No cache before 1980; 2-level caches since 1995

5. CPU-Memory Performance Gap (contd)
- The gap between DRAM, disk, and CPU speeds keeps increasing
6. Memory Hierarchy
- Cache: small, fast storage
  - Improves average access time to slow memory
  - Exploits spatial and temporal locality
- Caches beyond the memory hierarchy
  - TLB: cache of page-table entries
  - Branch predictor: cache of prediction information
7Memory Hierarchies
- Some fundamental and enduring properties of
hardware and software - Fast storage technologies cost more per byte and
have less capacity - The gap between CPU and main memory speed is
widening - Well-written programs tend to exhibit good
locality - They suggest an approach for organizing memory
and storage systems known as a memory hierarchy
8. An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) toward the top; larger, slower, and cheaper (per byte) toward the bottom:
- L0: CPU registers (hold words retrieved from the L1 cache)
- L1: on-chip L1 cache (SRAM)
- L2: on-chip L2 cache (SRAM)
- L3: main memory (DRAM)
- L4: local secondary storage (local magnetic disks)
- L5: remote secondary storage (distributed file systems, Web servers)
9. Hierarchy Works
- How the hierarchy works
  - Place a copy of frequently accessed data at the higher levels of the hierarchy
  - CPUs search for the highest copy of the data to be accessed
- Principle of locality
  - A program accesses a small portion of the address space in any given time period
  - 90/10 rule: 90% of the accesses are to 10% of the memory locations
- Users want large and fast memories!

  Technology   Access time (ns)   Cost per GB (2004)
  SRAM         0.5-5              $4,000-$10,000
  DRAM         50-70              $100-$200
  Disk         5M-20M             $0.50-$2
10. Locality
- Principle of Locality
  - Temporal locality
    - Recently referenced items are likely to be referenced in the near future.
  - Spatial locality
    - Items with nearby addresses tend to be referenced close together in time.
- Locality Example
  - Data
    - Referencing array elements in succession: spatial locality
    - Referencing sum each iteration: temporal locality
  - Instructions
    - Referencing instructions in sequence: spatial locality
    - Cycling through the loop repeatedly: temporal locality

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
11. Cache Terms
- Our initial focus: two levels (upper → cache, lower → memory)
- Block (or line): minimum unit of data
- Hit: data requested is in the upper level
- Miss: data requested is not in the upper level
- Hit rate = (number of accesses found in the upper level (cache)) / (number of accesses)
- Hit time = SRAM access time + time to determine hit/miss
- Miss rate = 1 - hit rate
- Miss penalty = time to fetch a block from the lower level (memory)
- Performance
  - Average access time = hit time + miss rate × miss penalty
12. Caching in a Memory Hierarchy
- Access addresses of blocks 4 and 10
- (Figure: the lower level holds blocks 0-15; copies of blocks 4 and 10 are placed in the upper level)
13. General Caching Concepts
- A program needs an object, which is stored in some blocks, e.g., 14 and 12
- Cache hit
  - Found at level k (e.g., block 14)
- Cache miss
  - Not found at level k, so fetched from level k+1 (e.g., block 12)
  - If level k is full, a victim block must be replaced (evicted) (e.g., block 4)
  - If the victim is clean (not modified), just replace it
  - If the victim is dirty (modified and different from the copy in level k+1), update the lower level first
- (Figure: requests for blocks 14 and 12; level k holds a few blocks while level k+1 holds blocks 0-15; the request for 14 hits, and the request for 12 misses and evicts block 4)
14. Locating Data Items
- How do we know if a data item is in the cache?
- Direct mapped
  - A block can go in exactly one place in the cache
  - Cache index = (block address) modulo (number of blocks in the cache)
15. Matching Address
- How do we know if the data in the cache corresponds to a requested word?
- Tag matching
  - A set of tags is stored in the cache along with the data items
  - Some of the upper bits of the address are used as the tag
- Address fields: [ Tag | Index | Block offset ], where the tag and index together form the block address
16. Validating Data Items
- How do we know that a cache block holds a valid data item?
- Add a valid bit to each cache block entry
  - If the valid bit is 0, there is no match
  - (i.e., the information in the tag and data block is invalid)
17. Cache Example (1)
- 8-word direct-mapped cache

18. Cache Example (2)

19. Cache Example (3)

20. Cache Example (4)

21. Cache Example (5)

22. Cache Example (6)
- A block is replaced if a newly accessed block is mapped onto the same location
23. Actions on Write
- Write through
  - Data is written to both the block in the current level (cache) and the block in the lower-level memory
  - Needs write buffers so the processor does not wait for the lower-level write transaction to complete
  - May result in repeated writes to the same location
- Write back
  - Data is written only to the block in the current level (cache)
  - A modified cache block is written to the lower-level memory when it is replaced (needs a dirty bit per cache block)
  - May result in writes on read misses

24. Actions on Write Misses
- Write allocate (fetch on miss)
  - Allocate an entry in the cache and fetch the data for the write miss
- No allocate (write around)
  - Without allocating an entry, update the lower level of the memory hierarchy
  Step on a write miss      Write through +   Write through +   Write back +
                            write allocate    no allocate       write allocate
  1 Pick replacement        yes               -                 yes
  2 Write back if dirty     -                 -                 yes
  3 Fetch block             yes               -                 yes
  4 Write cache             yes               -                 yes
  5 Write lower level       yes               yes               -
25. Direct-Mapped Cache
- Block size: 4 bytes (1 word)
- Address
  - Block offset: 2 bits
  - Index: 10 bits
  - Tag: 20 bits
- Total size
  - 2^10 × (1 + 20 + 32) bits (valid + tag + data per entry)
  - = 53 Kbits
- Locality exploited?
26. Spatial Locality
- Block size needs to be more than one word

27. 4-Word-Long Block Size
28. Memory Bandwidth
- Bandwidth: amount of data transferred per unit time
- Access 4 words (16 bytes), assuming 1 cycle to send the address, 15 cycles per DRAM access, and 1 cycle per bus transfer:
  - (A) 1-word-wide memory bus: 1 + 4×15 + 4×1 = 65 cycles
  - (B) 2-word-wide memory bus: 1 + 2×15 + 2×1 = 33 cycles
  - (C) 4-word-wide memory bus: 1 + 15 + 1 = 17 cycles
  - (D) 1-word-wide bus with multiple banks: 1 + 15 + 4×1 = 20 cycles
- Data interleaving on multiple banks achieves a high-bandwidth memory system with a narrow bus
29. Summary
- The processor-memory speed gap keeps increasing
- The memory hierarchy works
- Cache
  - Small SRAM storage for fast access from the processor
  - Performance: average access time = hit time + miss rate × miss penalty
- Locality exploited
  - Keep recently accessed data (temporal locality)
  - Bring data in by the block, which is larger than a word (spatial locality)
- Cache mechanism
  - Block placement (mapping), tag matching, valid bit
  - Actions on writes and write misses