Cache - Basics

Provided by: skk5 (http://csl.skku.edu)
1
Cache - Basics
2
Computer System
  • Instructions and data are stored in memory
  • Processors access memory for:
  • Instruction fetch → accesses memory almost every cycle
  • Data load/store (20% of instructions) → accesses memory every 5th cycle

3
Random-Access Memory
  • Static RAM (SRAM)
  • Each cell stores a bit with a six-transistor circuit
  • Retains its value indefinitely, as long as it is kept powered
  • Faster and more expensive than DRAM
  • Dynamic RAM (DRAM)
  • Each cell stores a bit with a capacitor and a transistor
  • Value must be refreshed every 10-100 ms
  • Slower and cheaper than SRAM

4
CPU-Memory Performance Gap
  • Processor-memory performance gap
  • Grows 50% per year
  • No cache before 1980; 2-level caches since 1995

5
CPU-Memory Performance Gap (cont'd)
  • The increasing gap between DRAM, disk, and CPU
    speeds

6
Memory Hierarchy
  • Cache: small, fast storage
  • Improves average access time to slow memory
  • Exploits spatial and temporal locality
  • Caches other than the memory hierarchy:
  • TLB: a cache of page-table entries
  • Branch predictor: a cache of prediction information

7
Memory Hierarchies
  • Some fundamental and enduring properties of
    hardware and software
  • Fast storage technologies cost more per byte and
    have less capacity
  • The gap between CPU and main memory speed is
    widening
  • Well-written programs tend to exhibit good
    locality
  • They suggest an approach for organizing memory
    and storage systems known as a memory hierarchy

8
An Example Memory Hierarchy
(Figure: memory hierarchy pyramid — smaller, faster, and costlier per byte toward the top; larger, slower, and cheaper per byte toward the bottom)
  • L0: registers (CPU registers hold words retrieved from the L1 cache)
  • L1: on-chip L1 cache (SRAM)
  • L2: on-chip L2 cache (SRAM)
  • L3: main memory (DRAM)
  • L4: local secondary storage (local magnetic disks)
  • L5: remote secondary storage (distributed file systems, Web servers)
9
Hierarchy Works
  • How the hierarchy works
  • Place a copy of frequently accessed data at the higher levels of the hierarchy
  • CPUs search for the highest copy of the data to be accessed
  • Principle of locality
  • A program accesses a small portion of the address space in any given time period
  • 90/10 rule: 90% of the accesses are to 10% of memory locations
  • Users want large and fast memories!

  Technology   Access time (ns)   Cost per GB (2004)
  SRAM         0.5-5              4,000-10,000
  DRAM         50-70              100-200
  Disk         5M-20M             0.50-2

10
Locality
  • Principle of Locality
  • Temporal locality
  • Recently referenced items are likely to be referenced in the near future.
  • Spatial locality
  • Items with nearby addresses tend to be referenced close together in time.
  • Locality example
  • Data
  • Referencing array elements in succession → spatial locality
  • Referencing sum in each iteration → temporal locality
  • Instructions
  • Referencing instructions in sequence → spatial locality
  • Cycling through the loop repeatedly → temporal locality

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;
11
Cache Terms
  • Our initial focus: two levels (upper, lower → cache, memory)
  • Block (or line): minimum unit of data
  • Hit: data requested is in the upper level
  • Miss: data requested is not in the upper level
  • Hit rate: number of accesses found in the upper level (cache) / number of accesses
  • Hit time: SRAM access time + time to determine hit/miss
  • Miss rate = 1 - hit rate
  • Miss penalty: time to fetch a block from the lower level (memory)
  • Performance
  • Average access time = hit time + miss rate × miss penalty
12
Caching in a Memory Hierarchy
  • Access addresses of blocks 4, 10
(Figure: the lower level holds blocks 0-15; copies of blocks 4 and 10 are placed in the upper level)
13
General Caching Concepts
Program
  • The program needs an object, which is stored in some blocks, e.g., 14 and 12
  • Cache hit
  • Found at level k (e.g., block 14)
  • Cache miss
  • Not found at level k, so fetched from level k+1 (e.g., block 12)
  • If level k is full, a victim block must be replaced (evicted) (e.g., block 4)
  • If the victim is clean (not modified), just replace it
  • If the victim is dirty (modified, and so different from the copy in level k+1), update the lower level first

(Figure: requests for blocks 14 and 12 — block 14 is found at level k (hit); block 12 is fetched from level k+1, which holds blocks 0-15, evicting block 4)
14
Locating Data Items
  • How do we know if a data item is in the cache?
  • Direct mapped
  • A block can go in exactly one place in the cache
  • Cache index = (block address) modulo (number of cache blocks in the cache)

15
Matching Address
  • How do we know if the data in the cache corresponds to a requested word?
  • Tag matching
  • A set of tags is stored in the cache along with the data items
  • Some of the upper bits of the address are used as the tag

Address = [ Tag | Index | Block offset ]
Block address = [ Tag | Index ]
16
Validating Data Items
  • How do we know that a cache block holds a valid data item?
  • Add a valid bit to the cache block entry
  • If the valid bit is 0, there is no match
  • (i.e., the information in the tag and data block is invalid)

17
Cache Example (1)
An 8-word direct-mapped cache
18
Cache Example (2)
19
Cache Example (3)
20
Cache Example (4)
21
Cache Example (5)
22
Cache Example (6)
A block is replaced if a newly accessed block maps onto the same location
23
Actions on Write
  • Write through
  • Data is written to both the block in the current level (cache) and the block in the lower-level memory
  • Write buffers are needed so the processor does not wait for the lower-level write transaction to complete
  • May result in repeated writes to the same location
  • Write back
  • Data is written only to the block in the current level (cache)
  • A modified cache block is written to the lower-level memory when it is replaced (needs a dirty bit per cache block)
  • May result in writes on read misses

24
Actions on Write Misses
  • Write allocate (fetch on miss)
  • Allocate an entry in the cache and fetch the data for the write miss
  • No allocate (write around)
  • Update the lower-level memory hierarchy without allocating a cache entry

Step   Write through + Write allocate   Write through + No allocate   Write back + Write allocate
1      pick replacement                 -                             pick replacement
2      -                                -                             write back if dirty
3      fetch block                      -                             fetch block
4      write cache                      -                             write cache
5      write lower level                write lower level             -
25
Direct-Mapped Cache
  • Block size: one 4-byte word
  • Address:
  • Block offset: 2 bits
  • Index: 10 bits
  • Tag: 20 bits
  • Total size:
  • 2^10 × (1 + 20 + 32) bits (valid bit + tag + data per entry)
  • = 53K bits
  • Locality exploited?

26
Spatial Locality
  • To exploit spatial locality, the block size needs to be more than one word

27
4 Word Long Block Size
  • 64KB Direct-Mapped Cache

28
Memory Bandwidth
  • Bandwidth
  • Amount of data transferred per unit time
  • Accessing 4 words (16 bytes):
  • (A) 1-word-wide memory bus
  • 1 + 4×15 + 4×1 = 65 cycles
  • (B) 2-word-wide memory bus
  • 1 + 2×15 + 2×1 = 33 cycles
  • (C) 4-word-wide memory bus
  • 1 + 15 + 1 = 17 cycles
  • (D) 1-word-wide bus with multiple memory banks
  • 1 + 15 + 4×1 = 20 cycles
  • Data interleaving on multiple banks achieves a high-bandwidth memory system with a narrow bus

29
Summary
  • The processor-memory speed gap increases
  • The memory hierarchy works
  • Cache
  • Small SRAM storage for fast access from the processor
  • Performance: average access time = hit time + miss rate × miss penalty
  • Locality exploited
  • Keep recently accessed data (temporal locality)
  • Bring data in by blocks larger than a word (spatial locality)
  • Cache mechanisms
  • Block placement (mapping), tag matching, valid bit
  • Actions on writes and write misses