Exploiting Memory Hierarchy 7.1, 7.2 - PowerPoint PPT Presentation

About This Presentation
Title:

Exploiting Memory Hierarchy 7.1, 7.2

Description:

Exploiting Memory Hierarchy. 7.1, 7.2. John Ashman. Memory, The More the Merrier. This introduction will explore ways in which programmers create illusions to ... – PowerPoint PPT presentation

Slides: 36
Provided by: juanv

Transcript and Presenter's Notes


1
Exploiting Memory Hierarchy 7.1, 7.2
  • John Ashman

2
Memory, The More the Merrier
  • This introduction will explore ways in which
    programmers are given the illusion of having
    unlimited, fast memory.
  • The laboriously long library analogy, summed up:
    keeping several items (books) at hand at once
    saves time.
  • Apply this to memory.

3
Principle of Locality
  • Programs access a relatively small portion of
    their address space at any given moment.
  • Temporal locality (in time)
  • If an item is referenced, it will probably be
    referenced again soon
  • Spatial locality (in space)
  • If an item is referenced, other items with nearby
    addresses will probably be referenced soon.

4
Why This Applies
  • Memory accesses come from natural program
    structures.
  • Loops exhibit temporal locality.
  • Programs in general show high levels of spatial
    locality (sequential access).
  • Accesses to an array also often show spatial
    locality.
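
As a rough illustration of spatial locality, the sketch below counts how many distinct memory blocks a sequence of byte addresses falls into. The 4-byte word, 16-byte block, 64-element array, and the helper name blocks_touched are assumptions chosen for the example, not values from the slides.

```python
def blocks_touched(addresses, block_size_bytes=16):
    """Return the number of distinct memory blocks covered by the byte addresses."""
    return len({addr // block_size_bytes for addr in addresses})

WORD = 4   # bytes per word (assumed)
N = 64     # number of array elements accessed (assumed)

# Sequential walk over an array: a[0], a[1], a[2], ...
sequential = [i * WORD for i in range(N)]
# Strided walk: a[0], a[16], a[32], ... (every access jumps 64 bytes)
strided = [i * 16 * WORD for i in range(N)]

print(blocks_touched(sequential))  # 64 words * 4 bytes / 16 bytes per block = 16
print(blocks_touched(strided))     # each access lands in its own block = 64
```

The sequential walk reuses each block for four consecutive accesses, which is the behavior a memory hierarchy exploits; the strided walk gets no such reuse.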

5
Memory Hierarchy
  • This is a structure that uses multiple levels of
    memory.
  • Each level has a different speed and size.
    Faster memory is naturally more expensive per
    bit, so the faster levels are also smaller.

6
Memory Types
  • DRAM (dynamic random access memory)
  • This is the main memory of the system. It is
    slower than SRAM but less costly, since it uses
    less area per bit of memory.
  • SRAM (static random access memory)
  • This is used for the levels closer to the CPU,
    namely the caches.
  • The magnetic disk
  • The lowliest level of them all: the largest and
    the slowest.

7
Comparison
8
Memory Hierarchy
9
Accessing Data/Memory
  • Let's consider just an upper and a lower level of
    memory, since data is copied only between two
    adjacent levels at a given time.
  • The minimum unit of information that can be
    present or not in the two-level hierarchy is
    called a block or line (in the library analogy,
    one book).

10
Block Access
11
Hit Rate
  • Hit rate: the fraction of memory accesses found
    in the cache. Basically, the ratio of successes
    in finding requested data in the upper level of
    memory.
  • Miss rate: (1 - hit rate), the fraction of
    accesses not found in the upper level.
  • Hit time: the time taken to access memory in the
    upper level.
  • Miss penalty: the time taken to replace a block
    in the upper level with the correct block from
    the lower level, plus the time to send it to the
    CPU.
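
A minimal sketch of how these quantities relate. The average-access-time formula (hit time + miss rate x miss penalty) is the standard textbook combination rather than something stated on the slide, and the counts and cycle values below are made-up inputs.

```python
def hit_rate(hits, accesses):
    """Fraction of accesses found in the upper level."""
    return hits / accesses

def miss_rate(hits, accesses):
    """Fraction of accesses not found in the upper level (1 - hit rate)."""
    return (accesses - hits) / accesses

def average_access_time(hit_time, miss_rate_value, miss_penalty):
    """Standard estimate: every access pays the hit time, misses also pay the penalty."""
    return hit_time + miss_rate_value * miss_penalty

# Hypothetical numbers: 950 hits out of 1000 accesses,
# a 1-cycle hit time and a 100-cycle miss penalty.
hits, accesses = 950, 1000
print(hit_rate(hits, accesses))
print(miss_rate(hits, accesses))
print(average_access_time(1, miss_rate(hits, accesses), 100))
```

Even with a 95% hit rate, the assumed 100-cycle penalty makes the average access several times slower than a hit, which is why both miss rate and miss penalty matter.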

12
Structure of Hierarchy
13
7.2 The Basics of Caches
  • Caches first appeared in the 1960s; the name
    refers to the level of the memory hierarchy
    between the CPU and main memory.
  • Today the term also refers to any storage
    managed to take advantage of locality of access.

14
1-Word Cache
15
Direct-Mapped Caches
  • In this structure, each memory location is
    mapped to exactly one location in the cache.
  • Mapping:
  • (block address) modulo (number of blocks in the
    cache)
  • This mapping is cheap to compute when the number
    of entries is a power of two.
  • In that case, the cache can be indexed directly
    with the low-order log2(number of cache blocks)
    bits of the block address.
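
The mapping can be sketched in two equivalent ways: with the modulo operation, or by masking off the low-order bits when the number of blocks is a power of two. The function names here are illustrative, not from the slides.

```python
def cache_index_mod(block_address, num_blocks):
    """Direct-mapped index: (block address) modulo (number of cache blocks)."""
    return block_address % num_blocks

def cache_index_bits(block_address, num_blocks):
    """Same index taken from the low-order log2(num_blocks) bits.

    Only valid when num_blocks is a power of two.
    """
    assert num_blocks > 0 and (num_blocks & (num_blocks - 1)) == 0
    return block_address & (num_blocks - 1)

# With 8 cache blocks, block address 22 (binary 10110) maps to entry 6 (binary 110).
print(cache_index_mod(22, 8))
print(cache_index_bits(22, 8))
```

Both functions return the same entry for every block address, which is why hardware can skip the division and simply wire the low-order bits to the cache index.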

16
Searching Within a Cache
  • How do we know if the cache contains the
    requested data?
  • Tag: a field containing the address information
    needed to identify whether the word in the cache
    is the requested one.
  • Note: it only needs to contain the upper portion
    of the address, the part that is not used as an
    index into the cache.

17
  • The upper 2 of the 5 address bits form the tag.
    The lower 3 select the block.
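
For the 5-bit addresses on this slide, the split into tag and index can be sketched as below. The function name split_address and the sample address are illustrative.

```python
INDEX_BITS = 3  # lower 3 bits select one of 8 cache blocks

def split_address(block_address):
    """Split a 5-bit block address into (tag, index)."""
    index = block_address & ((1 << INDEX_BITS) - 1)  # low-order 3 bits
    tag = block_address >> INDEX_BITS                # remaining upper 2 bits
    return tag, index

tag, index = split_address(0b10110)
print(f"tag={tag:02b} index={index:03b}")  # tag=10 index=110
```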

18
Checking Validity
  • One problem, especially when a program first
    starts executing, is that the cache holds no
    useful data, so the tags are meaningless. Even
    after many instructions have run, some cache
    entries may still be empty.
  • Valid bit: this is used to tell whether a cache
    entry holds valid data. If it is not set, the
    entry cannot be a match.

19
Accessing a Cache
  • Refer to pages 476 and 477.
  • A cache can use temporal locality to its
    advantage: more recently accessed words replace
    less recently accessed words in the cache.
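
A minimal sketch of a direct-mapped cache lookup with valid bits and tags, assuming an 8-entry cache and 5-bit block addresses. The class name and the reference sequence are illustrative; only hits and misses are tracked, not the data itself.

```python
class DirectMappedCache:
    """Tracks valid bits and tags only; the cached data is not modeled."""

    def __init__(self, index_bits=3):
        self.index_bits = index_bits
        num_blocks = 1 << index_bits
        self.valid = [False] * num_blocks  # every entry starts out invalid
        self.tags = [0] * num_blocks

    def access(self, block_address):
        """Return True on a hit; on a miss, install the block and return False."""
        index = block_address & ((1 << self.index_bits) - 1)
        tag = block_address >> self.index_bits
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss: the newly referenced block replaces whatever was in this entry.
        self.valid[index] = True
        self.tags[index] = tag
        return False

cache = DirectMappedCache()
references = [0b10110, 0b11010, 0b10110, 0b11010,
              0b10000, 0b00011, 0b10000, 0b10010]
results = ["hit" if cache.access(addr) else "miss" for addr in references]
print(results)
# ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss']
```

The last reference (10010) misses even though entry 010 is valid, because that entry holds tag 11 from the earlier reference to 11010. The new block replaces the older one.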

21
Sizes
  • 2^n entries: the total number of entries is a
    power of two.
  • MIPS words are aligned to multiples of 4 bytes,
    so the 2 least significant bits of an address
    select a byte within a word; they are ignored
    when selecting the word.

22
Specific Measurements
  • A direct-mapped cache of 2^n blocks with
    2^m-word blocks will require a tag field of size
    32 - (n + m + 2) bits.
  • n for the index, m for the word within the
    block, and 2 for the byte part of the address
  • Total number of bits = 2^n x (block size + tag
    size + valid field size)
  • = 2^n x (2^m x 32 + (32 - n - m - 2) + 1)
  • = 2^n x (2^m x 32 + 31 - n - m)
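
The bit count can be checked with a short calculation. This sketch assumes 32-bit addresses and 32-bit words, takes the block size in bits as 2^m words x 32 bits, and uses a 16 KB cache with 4-word blocks (n = 10, m = 2) as a worked input. The function name is illustrative.

```python
def cache_total_bits(n, m, address_bits=32, word_bits=32):
    """Total storage bits in a direct-mapped cache of 2**n blocks of 2**m words."""
    data_bits = (2 ** m) * word_bits          # block size in bits
    tag_bits = address_bits - (n + m + 2)     # address minus index, word and byte bits
    valid_bits = 1
    return (2 ** n) * (data_bits + tag_bits + valid_bits)

# 16 KB of data = 4K words = 1024 blocks of 4 words, so n = 10 and m = 2.
total = cache_total_bits(10, 2)
print(total)           # 150528 bits
print(total // 1024)   # 147 Kbits, versus 128 Kbits of data alone
```

The tag and valid bits add roughly 15% on top of the data storage in this configuration.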

23
Block Size vs. Miss Rate
24
Cost Of A Miss
  • Simply increasing the block size raises the cost
    of a miss. With larger blocks, the time to fetch
    a block increases.
  • The benefit of a lower miss rate is eventually
    overshadowed by the miss cost when the block size
    is increased without a proportional increase in
    cache size.
  • Early restart: resume execution as soon as the
    requested word is returned, without waiting for
    the rest of the block.

25
Handling Cache Misses
  • A cache miss is simply a request for data that
    fails because the data is not in the cache.
  • On a miss the processor stalls, freezing all
    execution until the data arrives from memory.
    This is unlike an interrupt, which would require
    saving the state of all registers.

26
Steps To Be Taken
  • Send the original PC value (current PC - 4) to
    memory.
  • Instruct main memory to perform a read and wait
    for the memory to complete its access.
  • Write the cache entry: enter the data and the
    address information (tag), and turn the valid
    bit on.
  • Restart the instruction execution at the first
    step. This will refetch the instruction, this
    time finding it in the cache.

27
Handling Writes
  • Suppose a store writes only to the cache and
    does not change main memory. This causes
    inconsistency between the cache and memory.
  • Write-through: this method always writes to
    both the cache and main memory on a write.
  • Used alone, this method is inefficient, because
    every write waits on main memory. It can reduce
    performance by as much as a factor of 10.

28
Improvement
  • Write buffer: stores data while it is waiting to
    be written to memory.
  • This helps so long as memory can complete writes
    faster than the processor generates them.
  • Stalls can still occur even when writes are
    generated more slowly than that, if the writes
    come in bursts.
  • In either case, the processor must still stall
    when the buffer is full.

29
Write-Back
  • In this alternative to write-through, data is
    written only to the block in the cache. The
    modified block is written to main memory only
    when it is replaced.
  • This is, of course, a more complicated method to
    implement.
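
To compare the two write policies, the sketch below counts main-memory writes for a sequence of stores to block addresses. It is a simplified model under stated assumptions: a direct-mapped cache of 8 blocks, no reads, no write buffer, and write-back flushing any modified blocks at the end. The function name and the store sequence are made up for the example.

```python
def memory_writes(store_blocks, policy, num_blocks=8):
    """Count writes to main memory for stores to the given block addresses."""
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    cached = [None] * num_blocks   # block address held in each cache entry
    dirty = [False] * num_blocks   # True if the entry was modified since it was loaded
    writes = 0
    for block in store_blocks:
        index = block % num_blocks
        if policy == "write-through":
            cached[index] = block
            writes += 1            # every store also goes to main memory
        else:
            if cached[index] != block:
                if dirty[index]:
                    writes += 1    # write the modified block back on replacement
                cached[index] = block
            dirty[index] = True    # the store only updates the cache
    if policy == "write-back":
        writes += sum(dirty)       # final flush of blocks still modified
    return writes

# Ten stores to block 3, then five to block 11 (both map to entry 3).
stores = [3] * 10 + [11] * 5
print(memory_writes(stores, "write-through"))  # 15
print(memory_writes(stores, "write-back"))     # 2
```

Repeated stores to the same block are where write-back saves the most memory traffic, at the cost of tracking a dirty bit per entry.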

30
The Intrinsity FastMath Processor
  • Implements the MIPS architecture with a simple
    cache design.
  • 12-stage pipeline.
  • Can request an instruction word and a data word
    every clock cycle, so it has separate
    instruction and data caches.
  • Each cache is 16 KB (4K words), with 16-word
    blocks.

31
Intrinsity Diagram
32
Designing Memory Systems to Support Caches
  • Cache misses are satisfied from main memory,
    which is built from DRAMs (designed for density
    over access time).
  • Miss penalty can be reduced if bandwidth from
    memory to cache is increased.
  • This allows a larger block size while keeping
    the miss penalty low.

33
Example
  • 1 memory bus clock cycle (mbcc) to send the
    address
  • 15 mbccs for each DRAM access initiated
  • 1 mbcc to send a word of data
  • Cache block of 4 words, one-word-wide bank:
  • Miss penalty = 1 + 4(15) + 4(1) = 65 mbccs;
    bandwidth = (4 x 4) / 65 ≈ 0.25 bytes per mbcc
  • Cache block of 4 words, two-word-wide bank:
  • Miss penalty = 1 + 2(15) + 2(1) = 33 mbccs;
    bandwidth = (4 x 4) / 33 ≈ 0.48 bytes per mbcc

34
Interleaving
  • This scheme allows sending an address to multiple
    banks at the same time, so that they all read
    simultaneously; the width of the bus or cache is
    not increased.
  • With four banks:
  • 1 cycle to transmit the address, 15 for all four
    banks to access memory, and 4 cycles to send the
    words back
  • Miss penalty = 1 + 1(15) + 4(1) = 20 mbccs
  • Bandwidth = (4 x 4) / 20 = 0.80 bytes per mbcc
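
The miss-penalty arithmetic for the three memory organizations (one-word-wide, two-word-wide, and four interleaved banks) can be reproduced with the sketch below. It uses the cycle counts from the example (1 to send the address, 15 per DRAM access, 1 per transfer) and a 4-word block of 4-byte words. The function and dictionary names are illustrative.

```python
ADDRESS_CYCLES = 1    # cycles to send the address
ACCESS_CYCLES = 15    # cycles per DRAM access that cannot be overlapped
TRANSFER_CYCLES = 1   # cycles per bus transfer
BLOCK_BYTES = 4 * 4   # 4 words of 4 bytes each

def miss_penalty(serial_accesses, transfers):
    """Memory bus clock cycles needed to fetch one cache block."""
    return (ADDRESS_CYCLES
            + serial_accesses * ACCESS_CYCLES
            + transfers * TRANSFER_CYCLES)

def bytes_per_cycle(penalty):
    """Bandwidth for a single miss: bytes transferred per bus clock cycle."""
    return BLOCK_BYTES / penalty

# (serial DRAM accesses, bus transfers) needed for a 4-word block
organizations = {
    "one-word-wide": (4, 4),
    "two-word-wide": (2, 2),
    "four interleaved banks": (1, 4),  # the four accesses overlap in time
}

for name, (accesses, transfers) in organizations.items():
    penalty = miss_penalty(accesses, transfers)
    print(f"{name}: {penalty} cycles, {bytes_per_cycle(penalty):.2f} bytes/cycle")
# one-word-wide: 65 cycles, 0.25 bytes/cycle
# two-word-wide: 33 cycles, 0.48 bytes/cycle
# four interleaved banks: 20 cycles, 0.80 bytes/cycle
```

Interleaving gets most of the benefit of a wider memory because the 15-cycle access latency, not the bus transfer, dominates the penalty.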

35
Summary
  • Direct-mapped caches are the simplest.
  • Write-through allows for both main memory and the
    cache to be written/updated simultaneously.
  • Write-back copies a block back to memory when it
    is replaced. (This will be elaborated upon
    later.)
  • The use of a larger block decreases miss rate,
    but can also increase miss penalty.