fig_04_00 - PowerPoint PPT Presentation

About This Presentation
Title:

fig_04_00

Description:

Chapter 4 (continued): Caching; Testing Memory Modules – PowerPoint PPT presentation

Slides: 21
Provided by: Andre851
Learn more at: https://eecs.ceas.uc.edu

Transcript and Presenter's Notes



1
Chapter 4 (continued): Caching; Testing Memory Modules
2
fig_04_30
Memory organization: typical memory map; for power loss
3
fig_04_31
Memory hierarchy
4
fig_04_32
Paging/caching: why it typically works: locality of reference (spatial/temporal); working set. Note: in real-time embedded systems, behavior may be atypical, but caching may still be a useful technique. Here we consider caching external to the CPU; the CPU may have one or more levels of caching built in.
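The working-set idea can be made concrete with a short sketch. Here the working set at time t is taken to be the set of distinct blocks referenced in the last `window` references. The trace and window size are hypothetical. The point is that a loop touching a few blocks repeatedly keeps the working set small, which is why a modest cache can still see a high hit rate.

```python
def working_set(trace, t, window):
    """Distinct blocks referenced in the `window` references ending at time t (inclusive)."""
    start = max(0, t - window + 1)
    return set(trace[start:t + 1])

# Hypothetical block-reference trace: a loop over blocks 0-2, then a loop over blocks 3-4.
trace = [0, 1, 2, 0, 1, 2, 3, 4, 3, 4]
print(working_set(trace, 5, 4))   # blocks touched near the end of the first loop
print(working_set(trace, 9, 4))   # blocks touched during the second loop
```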
5
fig_04_33
Typical memory system with cache: the hit rate (miss rate) is important. Remember: registers are here too.
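The hit rate matters because it sets the average access time. This sketch uses a common first-order model, which is an assumption here: every access first checks the cache (time $t_c$), and a miss also pays the main-memory time $t_m$. That gives $t_{avg} = h\,t_c + (1 - h)(t_c + t_m) = t_c + (1 - h)\,t_m$. The timings are hypothetical.

```python
def average_access_time(hit_rate, t_cache, t_memory):
    """First-order model (assumption): every access checks the cache (t_cache);
    a miss additionally pays the main-memory access time (t_memory)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return t_cache + miss_rate * t_memory

# Hypothetical timings in ns: cache 10, main memory 100.
for h in (0.80, 0.90, 0.95, 0.99):
    print(f"hit rate {h:.2f}: {average_access_time(h, 10, 100):.1f} ns")
```

Even a few percent of misses can dominate the average when $t_m$ is much larger than $t_c$.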
6
fig_04_33
Basic caching strategies: direct-mapped; associative; block-set associative. Questions: What is associative memory? What is overhead? What is efficiency (hit rate)? Is a bigger cache better?
7
  • Associative memory: storage location is related to the data stored
  • Example: hashing
  • --When a software program is compiled or assembled, a symbol table must be created to link addresses with symbolic names
  • --the table may be large; even binary search of names may be too slow
  • --convert each name to a number associated with the name; this number will be the symbol table index
  • For example, let a = 1, b = 2, c = 3, ...
  • Then cab has value $3 + 1 + 2 = 6$
  • ababab has value $3 \times (1 + 2) = 9$
  • And vvvvv has value $5 \times 22 = 110$
  • The address will be taken modulo a prime p; if we expect about 50 unique identifiers, we can take $p = 101$ (make storage about twice as large as the number of items to be stored, to reduce collisions)
  • Now the array of names in the symbol table will look like:
  • 0 --->
  • 1 --->
  • 2 --->
  • 6 ---> cab
  • 9 ---> ababab ---> vvvvv (since 110 mod 101 = 9, vvvvv collides with ababab and is chained after it)
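The hashing scheme above can be reproduced in a few lines. This is a minimal sketch assuming lowercase letter-only names and chaining of collisions within a bucket. `name_value` and `SymbolTable` are hypothetical names, and a real assembler's symbol table would also store the address for each name.

```python
def name_value(name):
    """Slide's scheme: a = 1, b = 2, ..., z = 26; a name's value is the sum of its letters.
    Assumes names contain only letters."""
    return sum(ord(ch) - ord("a") + 1 for ch in name.lower())

class SymbolTable:
    """Hash table with chaining: index = name_value(name) mod p."""

    def __init__(self, p=101):          # p prime, about twice the expected number of names
        self.p = p
        self.buckets = [[] for _ in range(p)]

    def index(self, name):
        return name_value(name) % self.p

    def insert(self, name):
        bucket = self.buckets[self.index(name)]
        if name not in bucket:          # collisions are chained in the same bucket
            bucket.append(name)
        return self.index(name)

    def contains(self, name):
        return name in self.buckets[self.index(name)]

table = SymbolTable()
for name in ("cab", "ababab", "vvvvv"):
    table.insert(name)
print(table.buckets[6], table.buckets[9])
```

A lookup only scans one short chain instead of the whole table. Note that "abc" hashes to the same bucket as "cab", so the chain must still be checked by name.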

8
Caching: the basic process (note the OVERHEAD for each task)
  • --program needs information M that is not in the CPU
  • --cache is checked for M (how do we know if M is in the cache?)
  • --hit: M is in the cache and can be retrieved and used by the CPU
  • --miss: M is not in the cache (M is in RAM or in secondary memory; where is M?)
  • --M must be brought into the cache:
  • --if there is room, M is copied into the cache (how do we know if there is room?)
  • --if there is no room, we must overwrite some info M' (how do we select M'?)
  • --if M' has not been modified, overwrite it (how do we know if M' has been modified?)
  • --if M' has been modified, we must save the changes (how do we save changes to M'?)
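The questions on this slide map onto the bookkeeping a cache controller must do. This minimal sketch makes several assumptions: a tiny fully associative write-back cache indexed by block number, a dict standing in for main memory, and eviction of the oldest block (FIFO) as a placeholder victim choice. The class and method names are hypothetical.

```python
from collections import OrderedDict

class WriteBackCache:
    """Hypothetical fully associative write-back cache of `capacity` blocks."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory                  # dict: block -> value, standing in for RAM
        self.lines = OrderedDict()            # block -> [value, dirty]; oldest first
        self.hits = self.misses = self.writebacks = 0

    def _locate(self, block):
        if block in self.lines:               # is M in the cache? check the tags
            self.hits += 1
            return self.lines[block]
        self.misses += 1                      # miss: M must be brought in from RAM
        if len(self.lines) >= self.capacity:  # no room: select a victim M'
            victim, (value, dirty) = self.lines.popitem(last=False)  # oldest block (FIFO)
            if dirty:                         # M' was modified: save the changes first
                self.memory[victim] = value
                self.writebacks += 1
        line = [self.memory.get(block, 0), False]
        self.lines[block] = line
        return line

    def read(self, block):
        return self._locate(block)[0]

    def write(self, block, value):
        line = self._locate(block)
        line[0] = value
        line[1] = True                        # mark dirty; RAM is updated on eviction

ram = {b: b * 10 for b in range(4)}
cache = WriteBackCache(capacity=2, memory=ram)
cache.read(0)
cache.write(1, 99)        # RAM still holds the old value of block 1
cache.read(2)             # evicts clean block 0: no copy-back needed
cache.read(3)             # evicts dirty block 1: its new value is written to RAM
print(ram[1], cache.hits, cache.misses, cache.writebacks)
```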
9
fig_04_34
Example: direct mapping. 32-bit words; the cache holds 64K words in 128 blocks of 0.5K words each. Memory addresses: 32 bits. Main memory: 128M words = 2K pages, each holding 128 blocks (= the cache size).
Tag table: 128 entries (one for each block in the cache). It contains:
  • Tag: the page the block came from
  • Valid bit: does this block contain data?
When data in a cached block is changed:
  • Write-through: any change is propagated immediately to main memory
  • Delayed write: since this data may change again soon, do not propagate the change to main memory immediately (this saves overhead); instead, set the dirty bit
  • Intermediate: use a queue, update periodically
When a new block is brought in, if the valid bit is true and the dirty bit is true, the old block must first be copied into main memory.
Replacement algorithm: none; each block has only one valid cache location.
fig_04_35
fig_04_36
Address fields: 2 bits: byte; 9 bits: word address; 7 bits: block address (index); 11 (of 15) bits: tag (the page the block is from)
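The field widths above can be checked with a small sketch. It splits a 32-bit byte address into byte, word, index, and tag fields and keeps a 128-entry tag table with valid and dirty bits (delayed write). The function and class names are hypothetical. The tag here takes all remaining high-order bits; for 128M words (2^29 bytes) of main memory, only the low 11 of them are needed.

```python
BYTE_BITS, WORD_BITS, INDEX_BITS = 2, 9, 7   # 4-byte words, 512-word blocks, 128 blocks

def split_address(addr):
    """Split a 32-bit byte address into (tag, index, word, byte) fields."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    index = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS + INDEX_BITS)   # page the block came from
    return tag, index, word, byte

class TagTable:
    """128-entry tag table with valid and dirty bits (delayed write)."""

    def __init__(self):
        n = 1 << INDEX_BITS
        self.tag = [0] * n
        self.valid = [False] * n
        self.dirty = [False] * n

    def access(self, addr, write=False):
        """Return (hit, must_copy_back) for one access."""
        tag, index, _, _ = split_address(addr)
        hit = self.valid[index] and self.tag[index] == tag
        copy_back = False
        if not hit:
            # The old block goes back to main memory only if it is valid and dirty.
            copy_back = self.valid[index] and self.dirty[index]
            self.tag[index], self.valid[index], self.dirty[index] = tag, True, False
        if write:
            self.dirty[index] = True          # delayed write: just set the dirty bit
        return hit, copy_back

addr = (5 << 18) | (3 << 11) | (100 << 2) | 2   # page 5, block 3, word 100, byte 2
print(split_address(addr))
```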
10
fig_04_37
Problem with direct mapping: two frequently used parts of code can be in different Block0s (block 0 of different pages, which share a single cache location), so repeated swapping would be necessary. This can degrade performance unacceptably, especially in real-time systems (similar to thrashing in an operating system's virtual memory system). Another method: associative mapping. Put a new block anywhere in the cache; now we need an algorithm to decide which block should be removed if the cache is full.
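The swapping problem shows up clearly in a small miss count. This sketch assumes the 128-block cache from the direct-mapping example and LRU replacement for the associative case. The trace is hypothetical: a loop alternating between main-memory blocks 0 and 128 (block 0 of page 0 and block 0 of page 1), which compete for cache slot 0 under direct mapping.

```python
from collections import OrderedDict

CACHE_BLOCKS = 128   # as in the direct-mapping example

def direct_mapped_misses(trace):
    """Each main-memory block m has exactly one cache slot: m mod CACHE_BLOCKS."""
    slots = [None] * CACHE_BLOCKS
    misses = 0
    for block in trace:
        i = block % CACHE_BLOCKS
        if slots[i] != block:
            misses += 1
            slots[i] = block          # the previous occupant is swapped out
    return misses

def associative_misses(trace):
    """A block may go anywhere; when full, evict the least recently used (assumed policy)."""
    lines = OrderedDict()
    misses = 0
    for block in trace:
        if block in lines:
            lines.move_to_end(block)
        else:
            misses += 1
            if len(lines) >= CACHE_BLOCKS:
                lines.popitem(last=False)
            lines[block] = None
    return misses

trace = [0, 128] * 50
print(direct_mapped_misses(trace), associative_misses(trace))
```

With direct mapping every one of the 100 references misses; with associative mapping only the first two do.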
11
fig_04_38
  • Step 1: locate the desired block within the cache; we must search the tag table. A linear search may be too slow, so search all entries in parallel or use hashing
  • Step 2: if there is a miss, decide which block to replace:
  • a. Add the time accessed to the tag table info; use temporal locality:
  • Least recently used (LRU): a FIFO-type algorithm
  • Most recently used (MRU): a LIFO-type algorithm
  • b. Choose a block at random

Drawbacks: long search times; complexity and cost of supporting logic. Advantages: more flexibility in managing cache contents.
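The replacement choices in Step 2 can be compared on a reference trace. This minimal sketch assumes a fully associative cache that keeps its blocks ordered from least to most recently used. The cyclic trace (a loop over 5 blocks with room for 4) is hypothetical. It was chosen because it is a known bad case for LRU, which evicts exactly the block that will be needed next.

```python
import random

def count_misses(trace, capacity, policy, seed=0):
    """Misses for a fully associative cache under 'LRU', 'MRU', or 'random' replacement."""
    rng = random.Random(seed)
    cache = []                      # ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.remove(block)
            cache.append(block)     # hit: now the most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            if policy == "LRU":
                cache.pop(0)        # evict least recently used
            elif policy == "MRU":
                cache.pop()         # evict most recently used
            elif policy == "random":
                cache.pop(rng.randrange(len(cache)))
            else:
                raise ValueError(f"unknown policy {policy!r}")
        cache.append(block)
    return misses

trace = list(range(5)) * 10         # hypothetical loop over 5 blocks, 50 references
for policy in ("LRU", "MRU", "random"):
    print(policy, count_misses(trace, 4, policy))
```

On more typical traces with strong temporal locality, LRU usually does well. This is one reason the slide suggests simulation rather than a single fixed answer.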
12
fig_04_39
Intermediate method: block-set associative cache. Each index now specifies a set of blocks. Main memory is divided into blocks organized into n groups: block m belongs to group number m mod n. Cache set number = main memory group number, so a block from main memory group j can go only into cache set j. Search time is less, since the search space is smaller. How many blocks per set? Use simulation to find the answer (one rule of thumb: doubling associativity ≈ doubling cache size; more than 4-way is probably not efficient).
Two-way set-associative scheme
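A minimal sketch of the block-set associative lookup follows. It assumes n sets of k blocks each and LRU replacement within a set; the slide does not fix a policy. Only the one set selected by m mod n is searched, which is where the shorter search time comes from. The class name and traces are hypothetical.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Hypothetical n-set, k-way cache: main-memory block m may only go into set m mod n."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]   # per set: LRU block first
        self.misses = 0

    def access(self, block):
        s = self.sets[block % self.num_sets]   # group number = set number = m mod n
        if block in s:                         # only this set's tags are searched
            s.move_to_end(block)
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)              # evict the LRU block within the set
        s[block] = None
        return False

two_way = SetAssociativeCache(num_sets=64, ways=2)
for b in [0, 128] * 50:                        # two blocks that share set 0
    two_way.access(b)
print(two_way.misses)
```

Two-way associativity absorbs a pair of conflicting blocks. A third block in the same set (e.g. a loop over 0, 64, 128) would again thrash a 2-way set.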
13
  • Example: 256K memory, 64 groups, 512 blocks
  • Blocks → Group (m mod 64)
  • 0, 64, 128, . . ., 384, 448 → 0
  • 1, 65, 129, . . ., 385, 449 → 1
  • 2, 66, 130, . . ., 386, 450 → 2
  • . . .
  • 63, 127, 191, . . ., 447, 511 → 63
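The rows of this table follow directly from the m mod 64 rule. This short sketch regenerates them. The 0.5K block size is inferred from 256K / 512 blocks, and the function name is hypothetical.

```python
NUM_BLOCKS = 512   # 256K memory in 0.5K blocks
NUM_GROUPS = 64

def blocks_in_group(group):
    """Main-memory blocks m with m mod NUM_GROUPS == group."""
    return list(range(group, NUM_BLOCKS, NUM_GROUPS))

for g in (0, 1, 2, 63):
    print(g, blocks_in_group(g))
```

Each group holds 512 / 64 = 8 blocks.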

14
fig_04_40
Dynamic memory allocation (virtual storage):
  • --for programs larger than main memory
  • --for multiple processes in main memory
  • --for multiple programs in main memory
General strategies may not work well because of the hard deadlines of real-time systems in embedded applications; general strategies are nondeterministic.
Simple setup: can swap processes/programs and their contexts
  • --Need storage (may be in firmware)
  • --Need small swap time compared to run time
  • --Need determinism
Ex: chemical processing, thermal control
15
fig_04_41
Overlays (pre-virtual storage): segment the program into one main section and a set of overlays (kept in ROM?). Swap overlays. Choose the segmentation carefully to prevent thrashing.
16
fig_04_42
Multiprogramming: similar to paging.
Fixed partition size: can get memory fragmentation. Example: if each partition is 2K and we have 3 jobs, J1 = 1.5K, J2 = 0.5K, J3 = 2.1K, allocate them to successive partitions (4 partitions in all). J2 is using only 0.5K of its partition; J3 is using 2 partitions, one of them for only 0.1K. If a new job of size 1K enters the system, there is no place for it, even though there is actually enough unused memory for it (0.5K + 1.5K + 1.9K = 3.9K in total).
Variable size: use a scheme like paging; include compaction. Choose parameters carefully to prevent thrashing.
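The fixed-partition example can be reproduced with a short sketch. Sizes are in units of 0.1K so the arithmetic stays exact, which is an assumption for illustration. Jobs are placed in successive 2K partitions, and a job larger than one partition takes several. The function name is hypothetical.

```python
PARTITION = 20        # 2K, in units of 0.1K
NUM_PARTITIONS = 4

def allocate(jobs, partitions=NUM_PARTITIONS, size=PARTITION):
    """Place jobs in successive fixed partitions; returns [(job, amount used)] per partition."""
    table = []
    for name, need in jobs:
        count = -(-need // size)                 # ceiling division: partitions required
        if len(table) + count > partitions:
            raise MemoryError(f"no free partition for {name}")
        for i in range(count):
            table.append((name, min(size, need - i * size)))
    return table

jobs = [("J1", 15), ("J2", 5), ("J3", 21)]       # 1.5K, 0.5K, 2.1K
table = allocate(jobs)
unused = sum(PARTITION - used for _, used in table)
print(table, unused / 10, "K unused")

try:
    allocate(jobs + [("J4", 10)])                # new 1K job
except MemoryError as err:
    print(err)
```

The 3.9K of free space is scattered inside partitions that are already assigned, so the 1K job is refused.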
17
fig_04_43
Memory testing: components and basic architecture
18
fig_04_45
Faults to test: data and address lines; stuck-at and bridging faults (if we assume no internal manufacturing defects)
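One standard way to exercise data lines for these faults is a walking-ones pattern: write a word with a single 1 in each bit position in turn, then read it back. The sketch below is hypothetical. It models an 8-bit data bus as a Python function, with an optional stuck-at fault and an optional bridging fault. The bridge is modeled as a wired-AND between two lines; that is a modeling assumption, since real bridges may behave differently.

```python
WIDTH = 8  # assumed data-bus width in bits

def make_bus(stuck_at=None, bridge=None, width=WIDTH):
    """Hypothetical bus model: returns f(value) = value read back after writing value.
    stuck_at: (line, level) forces one line to 0 or 1.
    bridge: (line_a, line_b) shorts two lines, modeled as wired-AND (assumption)."""
    mask = (1 << width) - 1

    def transfer(value):
        value &= mask
        if stuck_at is not None:
            line, level = stuck_at
            value = value | (1 << line) if level else value & ~(1 << line)
        if bridge is not None:
            a, b = bridge
            both = (value >> a) & (value >> b) & 1    # wired-AND of the two lines
            value &= ~((1 << a) | (1 << b))
            value |= (both << a) | (both << b)
        return value & mask

    return transfer

def walking_ones(transfer, width=WIDTH):
    """Write 1 << bit for each bit position; return positions whose pattern read back wrong."""
    return [bit for bit in range(width) if transfer(1 << bit) != (1 << bit)]

print(walking_ones(make_bus()))                  # fault-free bus
print(walking_ones(make_bus(stuck_at=(3, 0))))   # line 3 stuck-at-0
print(walking_ones(make_bus(bridge=(1, 2))))     # lines 1 and 2 bridged
```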
19
fig_04_49
ROM testing: stuck-at faults, bridging faults, correct data stored. Method: CRC (cyclic redundancy check) or signature analysis. Use an LFSR (linear feedback shift register) to compress a data stream into a K-bit pattern, similar to error checking (Q: how is error checking done?). ROM contents are modeled as an N × M-bit data stream (N = address size, M = word size).
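Signature analysis can be sketched as a serial LFSR. Each bit of the ROM stream is XORed with the register's high-order bit and fed back through the generator polynomial. Several choices here are assumptions for illustration: the 16-bit register (K = 16), the generator $x^{16} + x^{12} + x^5 + 1$, the zero initial value, and the 8-bit word size. The ROM image is hypothetical. A ROM passes if its signature matches the one computed from the known-good image.

```python
def lfsr_signature(words, word_bits=8, k=16, poly=0x1021):
    """Serial-input LFSR (CRC) signature of a word stream, MSB of each word first.
    poly encodes x^16 + x^12 + x^5 + 1 without the x^16 term (assumed generator)."""
    mask = (1 << k) - 1
    reg = 0                                          # assumed initial register value
    for word in words:
        for i in reversed(range(word_bits)):
            bit = (word >> i) & 1
            feedback = ((reg >> (k - 1)) & 1) ^ bit  # high-order bit XOR incoming bit
            reg = (reg << 1) & mask
            if feedback:
                reg ^= poly                          # taps set by the generator polynomial
    return reg

golden_rom = [0x3C, 0xA5, 0x00, 0xFF, 0x12, 0x34]    # hypothetical known-good ROM image
reference = lfsr_signature(golden_rom)

faulty_rom = list(golden_rom)
faulty_rom[2] ^= 0x08                                # one stored bit flipped
print(hex(reference), hex(lfsr_signature(faulty_rom)))
```

Any single-bit error changes a CRC signature. Some multi-bit errors can alias to the good signature, with a probability of roughly 2^-K for long streams.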
20
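  • Error checking: simple examples
  • Detect a 1-bit error: add a parity bit
  • Correct a 1-bit error: Hamming code
  • Example: send m message bits and r parity bits
  • The number of possible error positions is $m + r + 1$ (any of the $m + r$ bits, or no error), so we need $2^r \ge m + r + 1$
  • If $m = 8$, we need $r = 4$ ($2^4 = 16 \ge 13$, while $2^3 = 8 < 12$). Parity bit $r_i$ checks the parity of the bit positions whose binary representation has bit i set (position $2^i$ holds $r_i$ itself)
  • Pattern:
      Bit:    1    2    3    4    5    6    7    8    9    10   11   12
      Info:   r0   r1   m1   r2   m2   m3   m4   r3   m5   m6   m7   m8
      Value:  ---  ---  1    ---  1    0    0    ---  0    1    1    1
  • Set the parity to 0 (even) for each group:
  • $r_0$: bits 1, 3, 5, 7, 9, 11: $r_0 \oplus 1 \oplus 1 \oplus 0 \oplus 0 \oplus 1 = 0 \Rightarrow r_0 = 1$
  • $r_1$: bits 2, 3, 6, 7, 10, 11: $r_1 \oplus 1 \oplus 0 \oplus 0 \oplus 1 \oplus 1 = 0 \Rightarrow r_1 = 1$
  • $r_2$: bits 4, 5, 6, 7, 12: $r_2 \oplus 1 \oplus 0 \oplus 0 \oplus 1 = 0 \Rightarrow r_2 = 0$
  • $r_3$: bits 8, 9, 10, 11, 12: $r_3 \oplus 0 \oplus 1 \oplus 1 \oplus 1 = 0 \Rightarrow r_3 = 1$
  • Transmitted codeword (bits 1 to 12): 1 1 1 0 1 0 0 1 0 1 1 1
  • Exercise: suppose the message is sent and 1 bit is flipped in the received message
  • Compute the parity bits to see which bit is incorrect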
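The exercise can be checked mechanically. This is a minimal sketch for the m = 8, r = 4 layout above. Message bits fill the positions that are not powers of two, and each $r_i$ is chosen for even parity over its group. The receiver recomputes the checks: XORing together the positions of all 1 bits computes all four checks at once (bit i of the result is the parity of group $r_i$). A nonzero result is therefore the position of the single flipped bit. The function names are hypothetical.

```python
N_BITS = 12   # m = 8 message bits + r = 4 parity bits, positions 1..12

def hamming_encode(data):
    """data: [m1, ..., m8] -> codeword [bit1, ..., bit12] with even parity in each group."""
    code = [0] * (N_BITS + 1)            # index 0 unused so list index = bit position
    data_iter = iter(data)
    for pos in range(1, N_BITS + 1):
        if pos & (pos - 1):              # not a power of two -> message bit
            code[pos] = next(data_iter)
    for i in range(4):
        p = 1 << i                       # r_i sits at position 2**i
        group = [code[pos] for pos in range(1, N_BITS + 1) if pos & p and pos != p]
        code[p] = sum(group) % 2         # make the group's parity even (0)
    return code[1:]

def syndrome(codeword):
    """Recompute all parity checks at once: 0 = no error, else the bad bit's position."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s

sent = hamming_encode([1, 1, 0, 0, 0, 1, 1, 1])   # m1..m8 from the slide
received = list(sent)
received[6 - 1] ^= 1                               # flip bit 6 in transit
print(sent, syndrome(sent), syndrome(received))
```

To correct the error, the receiver flips the bit at the position given by the syndrome.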