Title: 6.5 Cache Memory

6.5 Cache Memory
- More effective, but expensive
- Modern disk drives include a small amount of internal cache
- Relatively smaller in size than MM
- Operates at or near the speed of the processor
- Sits between MM and the CPU
- Contains copies of sections of MM
- A portion of RAM used to speed up access to data on a disk
- Cache memory is memory that the computer microprocessor can access more quickly than it can access regular RAM
- L1 and L2 are levels of cache memory in a computer
- L1 cache is usually built onto the microprocessor chip itself
- L2 is usually a separate static RAM (SRAM) chip
- If the computer processor can find the data it needs for its next operation in cache memory, it will save time compared to having to get it from RAM
- Although caching improves performance, there is some risk involved. If the computer crashes (due to a power failure, for example), the system may not have time to copy the cache back to the disk. In this case, whatever changes you made to the data will be lost.

Cache-MM Interface
- Assume an access to MM causes a block of K words to be transferred to the CM
- The block transferred is stored in CM as a single unit called a slot/line/page
- Once copied, individual words within a line can be accessed by the CPU
- Data transfer and storage in the cache is done in h/w (i.e. the OS doesn't know about the cache)

Typical Cache Organisation (figure)

Cache Operation
- CPU requests the content of a memory location
- Check CM for this data
- If present, get from CM
- Otherwise, read the required block from MM into CM
- Deliver from CM to the CPU
- CM includes tags to identify which block of MM is in each CM slot (see the sketch below)
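
This read path can be summarised in code. The sketch below is illustrative only: the sizes, names, and the use of the full block number as the tag are assumptions made for brevity (the direct placement of blocks into slots anticipates the mapping functions introduced later), not details from the slides.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define NUM_LINES   4              /* CM slots (assumed size)           */
    #define BLOCK_WORDS 4              /* words per block (assumed size)    */
    #define MM_WORDS    64

    static uint32_t mm[MM_WORDS];      /* main memory (MM)                  */

    struct line {
        int      valid;                /* does this slot hold a block?      */
        uint32_t block;                /* which MM block is in the slot     */
        uint32_t data[BLOCK_WORDS];    /* copy of the block's words         */
    };
    static struct line cm[NUM_LINES];  /* cache memory (CM)                 */

    /* CPU requests the content of one memory word. */
    uint32_t cache_read(uint32_t addr)
    {
        uint32_t word  = addr % BLOCK_WORDS;   /* word within the block     */
        uint32_t block = addr / BLOCK_WORDS;   /* MM block number           */
        uint32_t slot  = block % NUM_LINES;    /* candidate CM slot         */

        /* Check CM: a valid slot whose tag matches the requested block.    */
        if (!cm[slot].valid || cm[slot].block != block) {
            /* Miss: read the required block from MM into CM.               */
            memcpy(cm[slot].data, &mm[block * BLOCK_WORDS], sizeof cm[slot].data);
            cm[slot].block = block;
            cm[slot].valid = 1;
        }
        return cm[slot].data[word];            /* deliver from CM to the CPU */
    }

    int main(void)
    {
        mm[42] = 0xBEEF;                                      /* put something in MM   */
        printf("first read:  0x%X\n", (unsigned)cache_read(42)); /* miss: fetched from MM */
        printf("second read: 0x%X\n", (unsigned)cache_read(42)); /* hit: served from CM   */
        return 0;
    }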

- Since MM >> CM, blocks are mapped to specific lines in CM through the use of a mapping function
- 3 mapping functions
- Direct
- Associative
- Set-associative

Direct Mapping
- Each MM block is assigned to a specific line in the CM
- If M = 64 and C = 4 (64 MM blocks, 4 cache lines):
- Line 0 can hold blocks 0, 4, 8, 12, ...
- Line 1 can hold blocks 1, 5, 9, 13, ...
- Line 2 can hold blocks 2, 6, 10, 14, ...
- Line 3 can hold blocks 3, 7, 11, 15, ...
- A direct mapping cache treats a MM address as 3 distinct fields (decoded in the sketch below)
- Tag identifier
- Line number identifier
- Word identifier
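
For a concrete picture of the three fields, the fragment below decodes a word address under assumed geometry matching the example above (64 MM blocks, 4 cache lines) plus an assumed block size of 4 words; the shift/mask constants follow from those assumptions, they are not given on the slides.

    #include <stdio.h>

    /* Assumed geometry: 4 words/block, 4 lines, 64 blocks -> 8-bit word address:
     *   bits 1..0  word identifier
     *   bits 3..2  line number identifier
     *   bits 7..4  tag identifier
     */
    int main(void)
    {
        unsigned addr = 0xB6;                 /* example word address                  */
        unsigned word = addr        & 0x3;    /* word within the block                 */
        unsigned line = (addr >> 2) & 0x3;    /* cache line the block maps to          */
        unsigned tag  =  addr >> 4;           /* distinguishes blocks sharing the line */

        printf("addr=0x%02X -> tag=%u line=%u word=%u\n", addr, tag, line, word);
        return 0;
    }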

Direct Mapping Cache Organisation (figure)
- The word identifier specifies the specific word in a cache line that is to be read
- The line identifier specifies the physical line in the cache that will hold the referenced address
- The tag is stored in the cache along with the data words of the line
- For every memory reference that the CPU makes, the specific line that would hold the reference is determined
- The tag held in that line is checked to see if the correct block is in the cache

Associative Mapping
- Lets a block be stored in any cache line that is not in use
- Must examine each line in the cache (through the tag id) to find the right memory block (see the sketch below)
- The address has 2 fields: word and tag
- Implement the cache in 2 parts
- The lines themselves in SRAM
- The tag storage in associative memory
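
A software analogue of the tag search is shown below. In hardware the comparison is done in parallel by the associative (content-addressable) memory; the sequential loop, the sizes, and the names here are illustrative assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LINES   8              /* assumed number of cache lines       */
    #define BLOCK_WORDS 4              /* assumed words per block             */

    struct line {
        int      valid;
        uint32_t tag;                  /* full block number serves as the tag */
        uint32_t data[BLOCK_WORDS];
    };
    static struct line cm[NUM_LINES];

    /* Return the index of the line holding `block`, or -1 on a miss.
     * Hardware checks every tag at once; software can only loop.            */
    int assoc_lookup(uint32_t block)
    {
        for (int i = 0; i < NUM_LINES; i++)
            if (cm[i].valid && cm[i].tag == block)
                return i;
        return -1;                     /* miss: any unused line may be filled */
    }

    int main(void)
    {
        cm[5].valid = 1; cm[5].tag = 37;                    /* pretend block 37 was loaded */
        printf("block 37 -> line %d\n", assoc_lookup(37));  /* 5        */
        printf("block  9 -> line %d\n", assoc_lookup(9));   /* -1: miss */
        return 0;
    }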

Associative Mapping Cache Organisation (figure)

Set-Associative Mapping
- Compromise between direct and fully associative mappings that builds on the strengths of both
- Divide the cache into a number of sets (v), each set holding a number of lines (k)
- A MM block can be stored in any one of the k lines in a set (see the sketch below)
- If a set can hold X lines, the cache is referred to as an X-way set-associative cache; commonly 2- or 4-way
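
The lookup combines both ideas: the block number selects a set directly, and only the k lines of that set are searched associatively. The sketch below uses assumed sizes (v = 4 sets, k = 2 ways) and illustrative names.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_SETS 4                 /* v, assumed                        */
    #define WAYS     2                 /* k, assumed: 2-way set associative */

    struct line {
        int      valid;
        uint32_t tag;                  /* block number / NUM_SETS           */
    };
    static struct line cm[NUM_SETS][WAYS];

    /* Return the way holding `block` within its set, or -1 on a miss. */
    int set_assoc_lookup(uint32_t block)
    {
        uint32_t set = block % NUM_SETS;      /* direct part: pick the set        */
        uint32_t tag = block / NUM_SETS;      /* tag distinguishes blocks in it   */

        for (int way = 0; way < WAYS; way++)  /* associative part: search k lines */
            if (cm[set][way].valid && cm[set][way].tag == tag)
                return way;
        return -1;
    }

    int main(void)
    {
        unsigned block = 13;                  /* maps to set 13 % 4 = 1    */
        cm[1][1].valid = 1;
        cm[1][1].tag   = block / NUM_SETS;    /* = 3                       */
        printf("block %u -> way %d of set %u\n",
               block, set_assoc_lookup(block), block % NUM_SETS);
        return 0;
    }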

Set-Associative Mapping Cache Organisation (figure)

Line Replacement Algorithms
Algorithms to determine which line to replace when an associative or set-associative cache is full (an LRU sketch follows below):
- LRU (Least Recently Used)
- FIFO (First In First Out)
- LFU (Least Frequently Used)
- Random
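
As an example of the first policy, the sketch below keeps a use counter per line and evicts the least recently used one. Real hardware tends to use cheaper approximations (for a 2-way set a single use bit is enough); the counter scheme, sizes, and names here are assumptions for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LINES 4                        /* assumed cache size */

    struct line {
        int      valid;
        uint32_t tag;
        uint64_t last_used;                    /* timestamp of last access       */
    };
    static struct line cm[NUM_LINES];
    static uint64_t now;                       /* monotonically increasing clock */

    /* Pick a victim line: an empty line if one exists, else the LRU line. */
    int lru_victim(void)
    {
        int victim = 0;
        for (int i = 0; i < NUM_LINES; i++) {
            if (!cm[i].valid)
                return i;                      /* free line, no eviction needed */
            if (cm[i].last_used < cm[victim].last_used)
                victim = i;                    /* older access -> better victim */
        }
        return victim;
    }

    /* Call on every hit or fill to mark the line as most recently used. */
    void lru_touch(int i) { cm[i].last_used = ++now; }

    int main(void)
    {
        for (int i = 0; i < NUM_LINES; i++) {  /* fill the cache            */
            cm[i].valid = 1;
            lru_touch(i);
        }
        lru_touch(0);                          /* re-use line 0             */
        printf("victim = line %d\n", lru_victim()); /* line 1 is now the LRU one */
        return 0;
    }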

Write Policy
- To update the original copy of the line in MM, a write policy is needed (a sketch contrasting the two follows below)
- Write through
- Any time a word in CM is changed, it is also changed in MM
- Both copies always agree
- Generates lots of memory writes to MM
- Write back
- During a write, only change the contents of the cache
- Update MM only when the cache line is to be replaced
- Causes cache coherency problems
- Complex circuitry to avoid this problem
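
The difference between the two policies comes down to when MM is updated. The sketch below is illustrative (names and sizes assumed); a single dirty bit per line is the usual write-back bookkeeping.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LINES   4
    #define BLOCK_WORDS 4

    struct line {
        int      valid, dirty;             /* dirty: CM newer than MM (write back) */
        uint32_t block;
        uint32_t data[BLOCK_WORDS];
    };
    static struct line cm[NUM_LINES];
    static uint32_t mm[64];

    /* Write through: update both copies so they always agree. */
    void write_through(struct line *l, uint32_t word, uint32_t value)
    {
        l->data[word] = value;
        mm[l->block * BLOCK_WORDS + word] = value;   /* extra MM write every time */
    }

    /* Write back: update only the cache and remember that MM is stale. */
    void write_back(struct line *l, uint32_t word, uint32_t value)
    {
        l->data[word] = value;
        l->dirty = 1;
    }

    /* On replacement, a dirty line must be copied back to MM first. */
    void evict(struct line *l)
    {
        if (l->valid && l->dirty)
            for (uint32_t w = 0; w < BLOCK_WORDS; w++)
                mm[l->block * BLOCK_WORDS + w] = l->data[w];
        l->valid = l->dirty = 0;
    }

    int main(void)
    {
        struct line *l = &cm[0];
        l->valid = 1; l->block = 3;
        write_through(l, 0, 111);    /* MM word 12 updated immediately      */
        write_back(l, 1, 222);       /* only CM changes; line marked dirty  */
        evict(l);                    /* dirty line copied back to MM now    */
        printf("mm[12]=%u mm[13]=%u\n", (unsigned)mm[12], (unsigned)mm[13]);
        return 0;
    }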

Number of Caches
- Single vs. 2-level
- On-chip cache
- Modern CPU chips have an onboard cache (L1), e.g. Pentium 16KB, PowerPC up to 64KB
- L1 provides the best performance gains
- A secondary, off-chip cache (L2) provides higher-speed access to MM
- Generally 512KB or less, otherwise not cost-effective
- Unified vs. split
- A unified cache stores data and instructions in one cache
- Only one cache to design and operate
- The cache is flexible and can balance the allocation of space to instructions or data to best fit the execution of the program, i.e. a higher hit ratio
- A split cache uses 2 caches (1 for instructions and 1 for data)
- Must build and manage 2 caches
- Static allocation of cache sizes
- Can outperform a unified cache in systems that support parallel execution and pipelining (reduced cache contention)
- Does the trend favour split caches?

6.6 External Memory
- Magnetic Disks
- Optical Disks
- Magnetic Tape
- RAID

Magnetic Disks
- The disk is a metal or plastic platter coated with a magnetizable material
- Data is recorded onto and later read from the disk using a conducting coil, the head
- Data is organized into concentric rings, called tracks, on the platter
- Tracks are separated by gaps
- The disk rotates at a constant speed

Disk Characteristics
- Single vs. multiple platters per drive (each platter has its own R/W head)
- Fixed vs. movable head
- A fixed head has one head per track
- A movable head uses one head per platter
- Removable vs. non-removable platters
- Data access times (a worked example follows below)
- Seek time: position the head over the correct track
- Rotational latency: time for the desired sector to come under the head
- Access time = seek time + rotational latency
- Block transfer time: time to read the block (sector) off the disk and transfer it to MM
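
As a worked example of how these terms combine, the small program below uses assumed figures (4 ms average seek, 7200 rpm, 500 sectors of 512 bytes per track); none of these numbers come from the slides.

    #include <stdio.h>

    int main(void)
    {
        double seek_ms     = 4.0;                    /* assumed average seek time   */
        double rpm         = 7200.0;                 /* assumed rotation speed      */
        double rotation_ms = 60000.0 / rpm;          /* one full rotation: ~8.33 ms */
        double latency_ms  = rotation_ms / 2.0;      /* on average, half a turn     */

        double track_bytes  = 512.0 * 500;           /* assumed 500 sectors/track   */
        double sector_bytes = 512.0;
        double transfer_ms  = rotation_ms * sector_bytes / track_bytes;

        double access_ms = seek_ms + latency_ms;     /* access time as defined above */
        printf("access time      = %.2f ms\n", access_ms);
        printf("+ block transfer = %.3f ms\n", transfer_ms);
        printf("total            = %.2f ms\n", access_ms + transfer_ms);
        return 0;
    }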

Optical Disks
- WORM: Write Once, Read Many
- Users can produce CD-ROMs in limited quantities
- A specially prepared disk is written to using a medium-power laser
- Can be read many times, just like normal CD-ROMs
- Permits archival storage
- Erasable optical disks
- Combine laser and magnetic technology to permit information storage
- A laser heats an area whose magnetic field orientation can then be changed to alter the stored information
- The changes can be detected using polarized light during reads

Magnetic Tapes
- The first kind of secondary memory
- Still widely used
- Popular for backups
- Very cheap but very slow
- Sequential access
- Data is organized as records, with physical gaps between records
- One word is stored across the width of the tape and read using multiple read/write heads

RAID Technology
- RAID (Redundant Array of Independent Disks), developed at Berkeley
- Several parallel disks operating as a single unit
- 6 levels, 0 through 5

RAID 0
- No redundancy techniques are used
- Data is distributed over all disks in the array
- Data is divided into strips for actual storage (see the mapping sketch below)
- Can be used to support high data transfer rates by making the block transfer size a multiple of the strip
- Can support low response time by making the block transfer size equal to a strip (supports multiple strip transfers in parallel)
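
The striping itself is just a modulo mapping of logical strip numbers onto disks. The fragment below (disk count and names assumed) shows where each logical strip lands.

    #include <stdio.h>

    #define NUM_DISKS 4                           /* assumed array size */

    /* RAID 0: logical strip i goes to disk i % N, at strip offset i / N. */
    int main(void)
    {
        for (int strip = 0; strip < 8; strip++)
            printf("logical strip %d -> disk %d, offset %d\n",
                   strip, strip % NUM_DISKS, strip / NUM_DISKS);
        return 0;
    }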

RAID 1
- All disks are mirrored (duplicated)
- Data is stored on a disk and on its mirror
- Reads can come from either the disk or its mirror
- Writes must be done to both the disk and the mirror
- Fault recovery is easy, i.e. use the data on the mirror
- Expensive

RAID 2
- All disks are used for every access; the disks are synchronized together
- Data strips are small (a byte)
- An error-correcting code is computed across all disks and stored on additional disks
- Uses fewer disks than RAID 1 but still expensive

RAID 3
- Like RAID 2, but only a single redundant disk is used
- A parity bit is computed for the set of individual bits in the same position on the disks
- If a drive fails, the parity information on the redundant disk can be used to calculate the data from the failed disk (see the sketch below)
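
The parity here is a bitwise XOR across the data disks, which is exactly what makes the reconstruction possible: XOR-ing the surviving disks with the parity gives back the missing data. The sketch below (4 data disks, byte-wide strips, example values all assumed) demonstrates both steps.

    #include <stdint.h>
    #include <stdio.h>

    #define DATA_DISKS 4                      /* assumed array size */

    int main(void)
    {
        uint8_t strip[DATA_DISKS] = {0x5A, 0x3C, 0xF0, 0x99};  /* same position on each disk */

        /* Parity bit per position = XOR of the bits in that position. */
        uint8_t parity = 0;
        for (int d = 0; d < DATA_DISKS; d++)
            parity ^= strip[d];

        /* Suppose disk 2 fails: rebuild it from the survivors plus parity. */
        uint8_t rebuilt = parity;
        for (int d = 0; d < DATA_DISKS; d++)
            if (d != 2)
                rebuilt ^= strip[d];

        printf("parity=0x%02X rebuilt=0x%02X original=0x%02X\n",
               (unsigned)parity, (unsigned)rebuilt, (unsigned)strip[2]);
        return 0;
    }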

RAID 4
- Access to individual strips rather than to all disks at once as in RAID 3
- Bit-by-bit parity is calculated across corresponding strips on each disk
- Parity strips are stored on the redundant disk
- Write penalty (see the update rule below)
- For every write to a strip, the parity strip must also be recalculated and written
- Thus 1 logical write equals 2 physical disk accesses
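
Because the parity is an XOR, it can be patched from the old data, the new data, and the old parity without rereading every disk. A one-line illustration (all values assumed):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t old_data = 0x5A, new_data = 0x7E, old_parity = 0x0F;

        /* XOR out the old strip contents and XOR in the new ones. */
        uint8_t new_parity = old_parity ^ old_data ^ new_data;

        printf("new parity = 0x%02X\n", (unsigned)new_parity);
        return 0;
    }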

RAID 5
- Parity information is distributed over the data disks in a round-robin scheme
- No dedicated parity disk is needed

6.7 Error Correction
- Semiconductor memories are subject to errors
- Hard (permanent) errors
- Soft (transient) errors
- Memory systems include logic to detect and/or correct errors
- The width of the memory word is increased
- The number of parity bits required depends on the level of detection and correction needed

General Error Detection and Correction
- A single error is a single bit flip; multiple bit flips can also occur in a word
- 2^M valid data words, where M is the data word length
- 2^(M+K) codeword combinations in the memory, where K is the number of code/parity bits
- Distribute the 2^M valid data words among the 2^(M+K) codeword combinations such that the distance between valid words is sufficient to distinguish the error

Single Error Detection and Correction (SEC)
- For each valid codeword, there are 2^K - 1 invalid codewords
- 2^K - 1 must be large enough to identify which of the M + K bit positions is in error
- Therefore 2^K - 1 >= M + K
- 8-bit data: 4 check bits
- 32-bit data: 6 check bits
- Bit position n is checked by those check bits Ci whose subscripts i sum to n, e.g. position 10, which holds data bit M6, is checked by bits C2 and C8 (10 = 2 + 8). A routine for finding K appears below.
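
The inequality gives the minimum number of check bits directly; the small routine below simply searches for the smallest K that satisfies it, reproducing the 8-bit and 32-bit figures quoted above.

    #include <stdio.h>

    /* Smallest K such that 2^K - 1 >= M + K (single error correction). */
    static int check_bits(int m)
    {
        int k = 1;
        while ((1 << k) - 1 < m + k)
            k++;
        return k;
    }

    int main(void)
    {
        printf("M =  8 -> K = %d\n", check_bits(8));    /* 4 */
        printf("M = 32 -> K = %d\n", check_bits(32));   /* 6 */
        printf("M = 64 -> K = %d\n", check_bits(64));   /* 7 */
        return 0;
    }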

Bit Position Layout

    Bit position:     12    11    10     9     8     7     6     5     4     3     2     1
    Position number:  1100  1011  1010  1001  1000  0111  0110  0101  0100  0011  0010  0001
    Data bit:         M8    M7    M6    M5          M4    M3    M2          M1
    Check bit:                                C8                      C4          C2    C1

Example: the 8-bit input word is 00111001 (M8...M1)

C1 = M1 ⊕ M2 ⊕ M4 ⊕ M5 ⊕ M7 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
C2 = M1 ⊕ M3 ⊕ M4 ⊕ M6 ⊕ M7 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
C4 = M2 ⊕ M3 ⊕ M4 ⊕ M8 = 0 ⊕ 0 ⊕ 1 ⊕ 0 = 1
C8 = M5 ⊕ M6 ⊕ M7 ⊕ M8 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0

Thus the check bits (C8 C4 C2 C1) in this case are 0111.
(An odd number of 1s gives 1, an even number gives 0.)

Say data bit 3 is in error (i.e. changed from 0 to 1), so the input data is now 00111101.

C1 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
C2 = 1 ⊕ 1 ⊕ 1 ⊕ 1 ⊕ 0 = 0
C4 = 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0
C8 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0

The newly generated check bits are 0001. Comparing the two sets of check bits (by XOR) gives the syndrome word:

         C8 C4 C2 C1
stored    0  1  1  1
new       0  0  0  1
XOR       0  1  1  0

The result is 0110, indicating that bit position 6, which contains data bit 3, is in error.
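
The same computation can be checked in code. The sketch below hard-codes the 12-bit layout used on the slides (C1, C2, C4, C8 at positions 1, 2, 4, 8, data bits elsewhere) and reproduces the example: check bits 0111 for 00111001, syndrome 0110 after data bit 3 is flipped. Everything beyond that layout is an illustrative assumption.

    #include <stdio.h>

    /* Data bit Mi sits at 12-bit position pos[i-1]; C1,C2,C4,C8 sit at 1,2,4,8. */
    static const int pos[8] = {3, 5, 6, 7, 9, 10, 11, 12};

    /* Compute the check bits C8 C4 C2 C1 for an 8-bit data word (M8..M1). */
    static unsigned check_bits(unsigned data)
    {
        unsigned c = 0;
        for (int k = 0; k < 4; k++) {               /* k selects C1,C2,C4,C8       */
            int cpos = 1 << k, parity = 0;
            for (int i = 0; i < 8; i++)             /* Ci covers every position    */
                if (pos[i] & cpos)                  /* whose number has bit k set  */
                    parity ^= (data >> i) & 1;      /* data bit M(i+1)             */
            c |= (unsigned)parity << k;
        }
        return c;                                   /* bit k of c is C(2^k)        */
    }

    int main(void)
    {
        unsigned word = 0x39;                       /* 00111001                    */
        unsigned stored = check_bits(word);         /* expect 0111                 */

        unsigned corrupted = word ^ (1u << 2);      /* flip data bit 3 -> 00111101 */
        unsigned fresh = check_bits(corrupted);     /* expect 0001                 */

        unsigned syndrome = stored ^ fresh;         /* expect 0110 = position 6    */
        printf("stored=%X fresh=%X syndrome=%X (bit position %u in error)\n",
               stored, fresh, syndrome, syndrome);
        return 0;
    }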

- To detect errors, compare the check bits read from memory with those recomputed during the read operation, using XOR
- If the result of the XOR is 0000, there is no error
- If it is non-zero, the numerical value of the result indicates the bit position in error
- If the XOR result was 0110, bit position 6 (M3) is in error
- Double error detection can be added with another check bit that implements a parity check over the whole word of M + K bits

Chapter Exercises
- Suggest reasons why RAMs traditionally have been organized as only one bit per chip whereas ROMs are usually organized with multiple bits per chip.
- Suppose an 8-bit data word stored in memory is 11000010. Using the Hamming algorithm, determine what check bits would be stored in memory with the data word.

- Berita Harian, 29/1/2005
- A data-processing memory chip claimed to be the fastest in the world for multimedia applications.
- According to its manufacturer, Samsung Electronics Co. (Samsung), the 256-megabit XDR DRAM (eXtreme Data Rate Dynamic Random Access Memory) chip (pictured) is 10 times faster than the memory chips used today in video equipment, game consoles, digital TVs, servers and workstations. Samsung, the world's second-largest maker of computer memory chips, has begun manufacturing the chip, demand for which is expected to reach 800 million units by 2009.