CPE 626 CPU Resources: ARM Cache Memories

Slides: 36

Provided by: Aleksandar84

Learn more at: http://www.ece.uah.edu

1
CPE 626 CPU Resources: ARM Cache Memories

Aleksandar Milenkovic
E-mail: milenka_at_ece.uah.edu
Web: http://www.ece.uah.edu/milenka

2
On-chip RAM

On-chip memory is essential if a processor is to deliver its best performance:
zero-wait-state access speed
power efficiency
reduced electromagnetic interference
In many embedded systems, simple on-chip RAM is preferred to a cache
Advantages:
simpler, cheaper, less power
more deterministic behavior
Disadvantages:
requires explicit control by the programmer

3
Unified instruction and data cache
4
Separate data and instruction caches
5
Direct-mapped cache organization
6
Two-way set-associative cache organization
7
Fully associative cache
8
An Example

ARM3, designed in 1989, was the first ARM processor to incorporate an on-chip cache
Design steps:
analysis using ARM2: collect hardware traces while running typical benchmarks
explore the upper-bound performance benefit by considering a perfect cache (one that always contains the requested data)
Assuming a 20MHz cache and 8MHz main memory, the relative performance of various systems is: no cache 1, instruction-only cache 1.95, data-only cache 1.13, instruction and data cache 2.5
investigate different cache organizations and sizes:
write-back, write-through, write-allocate, write-no-allocate, replacement policies, associativity, power

9
Summary of cache organizational options
10
Unified cache performance as a function of size
and organization
11
The effect of associativity on performance and
bandwidth requirement
12
ARM3 cache organization
64-way, 4KB cache
13
ARM600 cache control state machine

After initialization, the processor enters the <Check tag> state
If the address is non-sequential, does not fault in the MMU, and is either a read hit or a buffered write, the state machine remains in <Check tag>; a data value is read or written every clock cycle
When the next address is a sequential read in the same cache line or a sequential buffered write, the state moves to <Sequential fast>, where the data may be accessed without checking the tag and without activating the MMU; a data value is read or written every clock cycle
If the address is not in the cache or is an unbuffered write, an external access is needed; this begins in the <Start external> state. Reads from uncacheable memory and unbuffered writes are completed as single memory transactions. Cacheable reads perform a quad-word line fetch, after fetching the necessary translation information if this was not already in the MMU
Cycles where the CPU does not use memory are executed in the <Idle> state
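
The transitions described above can be sketched as a small decision function. This is a simplified illustration, not the actual ARM600 control logic: the `Access` fields and the `next_state` name are hypothetical, MMU faults and uncacheable reads are ignored, and the choice depends only on the next access rather than on the current state.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    CHECK_TAG = auto()
    SEQUENTIAL_FAST = auto()
    START_EXTERNAL = auto()
    IDLE = auto()


@dataclass
class Access:
    """Properties of the next memory access (hypothetical field names)."""
    uses_memory: bool = True
    is_write: bool = False
    buffered: bool = False    # only meaningful for writes
    sequential: bool = False
    same_line: bool = False   # only meaningful for sequential reads
    cache_hit: bool = False   # only meaningful for reads


def next_state(a: Access) -> State:
    """Choose the state that handles the next access (MMU faults ignored)."""
    if not a.uses_memory:
        return State.IDLE
    if a.is_write:
        if not a.buffered:
            return State.START_EXTERNAL      # unbuffered write goes off-chip
        return State.SEQUENTIAL_FAST if a.sequential else State.CHECK_TAG
    if not a.cache_hit:
        return State.START_EXTERNAL          # read miss needs a line fetch
    if a.sequential and a.same_line:
        return State.SEQUENTIAL_FAST         # no tag check, no MMU access
    return State.CHECK_TAG
```

A real controller would also model the states that follow <Start external> (translation fetch, line fetch), which the slide text does not enumerate.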

14
ARM600 cache control state machine
15
Memory Management

Today, computer systems typically run multiple processes, each with its own address space
It would be too expensive to dedicate a full address space worth of memory to each process (many use only a small part of their address space)
If the Principle of Locality allows caches to offer the speed of cache memory with the size of DRAM memory, then recursively DRAM can act as a cache for secondary storage (disk) => Virtual Memory
Virtual memory divides physical memory into blocks and allocates them to different processes

16
Virtual Memory Motivation

Historically, virtual memory was invented when programs became too large for physical memory
Allows the OS to share memory and protect programs from each other (main reason today)
Provides the illusion of very large memory:
the sum of the memory of many jobs can be greater than physical memory
each job is allowed to exceed the size of physical memory
Allows available physical memory to be very well utilized
Exploits the memory hierarchy to keep average access time low

17
Virtual Memory Terminology

Virtual Address:
address used by the programmer; the CPU produces virtual addresses
Virtual Address Space:
collection of such addresses
Memory (Physical or Real) Address:
address of a word in physical memory
Memory mapping or address translation:
process of virtual-to-physical address translation

18
Paging vs. Segmentation

Fixed-size blocks, called pages (4KB to 64KB)
Both logical and physical memory are divided into fixed-size components called pages (typically a few KBs)
The relationship between the logical and physical pages is stored in page tables (PTs), which are held in main memory
Variable-size blocks, called segments (1B to 64KB/4GB); each segment contains a particular sort of information
e.g., code segment, data segment, stack segment
Paged segments: a segment is an integral number of pages

19
Segmented memory management

Segmentation allows a program to have its own private view of memory
Segments are of variable size => free memory becomes fragmented over time
it is possible that a new program is unable to start when the memory is fragmented into small pieces, none of which is big enough to hold a segment, even if there is enough free memory in total => the OS is responsible for coalescing the free memory into one large piece

20
Paging memory management

Use table lookup (Page Table) for mappings; the Virtual Page Number is the index
Virtual Memory Mapping Function:
Physical Offset = Virtual Offset
Physical Page Number (P.P.N. or page frame) = PageTable[Virtual Page Number]

[Figure: a virtual address is translated into a physical address; bit positions 29 ... 10 and 9 ... 0 are labeled]
21
Paging memory management (contd)
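[Figure: the virtual address is split into a virtual page number and an offset; the virtual page number is the index into the Page Table, whose entries hold Valid, Access Rights, and Physical Page Number fields; the physical page number and the unchanged offset form the Physical Address]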
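
The lookup in this figure can be sketched in a few lines. This is a minimal illustration under stated assumptions: 4 KB pages, a Python dict standing in for the in-memory table (a real table is an array indexed by virtual page number), and hypothetical names `PTE`, `translate`, `PageFault`, and `ProtectionFault`.

```python
from dataclasses import dataclass

PAGE_BITS = 12                 # assumption: 4 KB pages
PAGE_SIZE = 1 << PAGE_BITS


@dataclass
class PTE:
    """One page table entry: Valid, Access Rights, Physical Page Number."""
    valid: bool
    rights: str                # e.g. "r" or "rw"
    ppn: int


class PageFault(Exception):
    pass


class ProtectionFault(Exception):
    pass


def translate(page_table, va, access="r"):
    """Map a virtual address to a physical address via a one-level table."""
    vpn = va >> PAGE_BITS                 # virtual page number = table index
    offset = va & (PAGE_SIZE - 1)         # offset is copied unchanged
    entry = page_table.get(vpn)
    if entry is None or not entry.valid:
        raise PageFault(hex(va))
    if access not in entry.rights:
        raise ProtectionFault(hex(va))
    return (entry.ppn << PAGE_BITS) | offset
```

For example, with `page_table = {0x1: PTE(True, "rw", 0x5)}`, the virtual address `0x1234` maps to `0x5234`: the page number changes from 1 to 5 and the offset `0x234` is preserved.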
22
Paging memory management (contd)

Size of a PT?
4KB pages, 32-bit VA => 2^20 entries x 20 bits (2.5MB)
Use two or more levels of page table
Example:
the 10 MSBs identify the appropriate second-level page table in the first-level page table directory
the second ten bits of the address then identify the page table entry that contains the physical page number
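
The size estimate and the two-level split can be checked with a short sketch. It assumes the 10/10/12 bit split from the example above; the nested dicts and the names `split` and `walk` are illustrative stand-ins for the directory and second-level tables.

```python
PAGE_BITS = 12    # 4 KB pages
L2_BITS = 10      # second ten bits select the entry in a second-level table
L1_BITS = 10      # 10 MSBs select the second-level table

# Flat table: 2^20 entries of 20 bits each.
FLAT_TABLE_BYTES = (1 << 20) * 20 // 8


def split(va):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    l1 = va >> (PAGE_BITS + L2_BITS)
    l2 = (va >> PAGE_BITS) & ((1 << L2_BITS) - 1)
    offset = va & ((1 << PAGE_BITS) - 1)
    return l1, l2, offset


def walk(directory, va):
    """Return the physical address, or None if either level has no mapping."""
    l1, l2, offset = split(va)
    second_level = directory.get(l1)
    if second_level is None:
        return None
    ppn = second_level.get(l2)
    if ppn is None:
        return None
    return (ppn << PAGE_BITS) | offset
```

The benefit of the two-level scheme is that second-level tables only need to exist for the regions of the address space a process actually uses.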

23
Mapping Virtual to Physical Memory

Program with 4 pages (A, B, C, D)
Any chunk of Virtual Memory can be assigned to any chunk of Physical Memory (page)

[Figure: Virtual Memory holds pages A, B, C, and D at 0, 4 KB, 8 KB, and 12 KB; pages A, B, and C are placed in scattered 4 KB frames of Physical Memory (frames labeled from 0 to 28 KB), and page D resides on Disk]
24
Fast Address Translation

PTs are stored in main memory => every memory access logically takes at least twice as long: one access to obtain the physical address and a second access to get the data
Observation: since there is locality in pages of data, there must be locality in the virtual addresses of those pages => remember the last translation(s)
Address translations are kept in a special cache called the Translation Look-Aside Buffer, or TLB
The TLB must be on chip; its access time is comparable to that of the cache

25
Typical TLB Format
Virtual Addr. Physical Addr. Dirty Ref Valid Access Rights

Tag: portion of the virtual address
Data: physical page number
Dirty: since write back is used, need to know whether or not to write the page to disk when it is replaced
Ref: used to help calculate LRU on replacement
Valid: entry is valid
Access rights: R (read permission), W (write permission)

26
Translation Look-Aside Buffers

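TLBs are usually small, typically 128 - 256 entries
Like any other cache, the TLB can be fully associative, set associative, or direct mapped

[Figure: the Processor sends a VA to the TLB Lookup; on a TLB hit the PA is sent to the Cache, and on a TLB miss the Translation is performed first; on a cache hit the Data is returned to the Processor, and on a cache miss Main Memory is accessed]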
27
TLB Translation Steps

Assume a 32-entry, fully associative TLB (Alpha AXP 21064)
1. Processor sends the virtual address to all tags
2. If there is a hit (there is an entry in the TLB with that Virtual Page number and its valid bit is 1) and there is no access violation, then
3. The matching tag sends the corresponding Physical Page number
4. Combine the Physical Page number and Page Offset to get the full physical address
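
The four steps can be sketched as a software model of a fully associative lookup. This is an illustration only: the 4 KB page size, the `TLBEntry` fields, and the `tlb_lookup` name are assumptions, and the loop stands in for tag comparisons that hardware performs in parallel.

```python
from dataclasses import dataclass

PAGE_BITS = 12   # assumption: 4 KB pages


@dataclass
class TLBEntry:
    vpn: int              # tag: virtual page number
    ppn: int              # data: physical page number
    valid: bool = True
    rights: str = "rw"    # R and/or W permission


def tlb_lookup(tlb, va, access="r"):
    """Return the physical address on a hit, or None on a TLB miss."""
    vpn = va >> PAGE_BITS
    offset = va & ((1 << PAGE_BITS) - 1)
    for entry in tlb:                         # step 1: compare against all tags
        if entry.valid and entry.vpn == vpn:  # step 2: hit with valid bit set
            if access not in entry.rights:
                raise PermissionError(f"access violation at {va:#x}")
            # steps 3 and 4: physical page number combined with page offset
            return (entry.ppn << PAGE_BITS) | offset
    return None
```

Returning `None` on a miss leaves the refill policy (hardware walk or OS trap) to the caller.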

28
What if not in TLB?

Option 1: Hardware checks the page table and loads the new Page Table Entry into the TLB
Option 2: Hardware traps to the OS; it is up to the OS to decide what to do
When in the operating system, we don't do translation (virtual memory is turned off)
The operating system knows which program caused the TLB fault or page fault, and knows which virtual address was requested
So it looks the data up in the page table
If the data is in memory, it simply adds the entry to the TLB, evicting an old entry from the TLB

29
What if the data is on disk?

We load the page off the disk into a free block of memory, using a DMA transfer
In the meantime, we switch to some other process waiting to be run
When the DMA is complete, we get an interrupt and update the process's page table
So when we switch back to the task, the desired data will be in memory

30
What if we don't have enough memory?

We choose some other page belonging to a program and transfer it onto the disk if it is dirty
If it is clean (the other copy is up to date), just overwrite that data in memory
We choose the page to evict based on the replacement policy (e.g., LRU)
And update that program's page table to reflect the fact that its memory moved somewhere else
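
The dirty versus clean distinction on eviction can be illustrated with a toy model. It is a sketch under simplifying assumptions: a single process, LRU replacement, and counters in place of real disk transfers; DMA and process switching are not modeled, and the `PagedMemory` class is hypothetical.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model of physical memory with LRU eviction and dirty tracking."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.resident = OrderedDict()   # page -> dirty flag, oldest use first
        self.disk_reads = 0
        self.disk_writes = 0

    def access(self, page, write=False):
        if page in self.resident:
            self.resident.move_to_end(page)       # mark as most recently used
        else:
            if len(self.resident) >= self.n_frames:
                _victim, dirty = self.resident.popitem(last=False)
                if dirty:
                    self.disk_writes += 1         # dirty page is written back
                # a clean victim is simply overwritten
            self.disk_reads += 1                  # load the page from disk
            self.resident[page] = False
        if write:
            self.resident[page] = True            # page is now dirty
```

With two frames, writing page A, reading B, and then reading C evicts the dirty page A and costs one disk write; a following read of D evicts the clean page B at no write cost.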

31
Page Replacement Algorithms

First-In/First-Out (FIFO):
in response to a page fault, replace the page that has been in memory for the longest period of time
does not make use of the principle of locality: an old but frequently used page could be replaced
easy to implement (the OS maintains a history thread through the page table entries)
usually exhibits the worst behavior
Least Recently Used (LRU):
selects the least recently used page for replacement
requires knowledge of past references
more difficult to implement, good performance
32
Page Replacement Algorithms

Not Recently Used (an estimation of LRU)
A reference bit flag is associated with each page table entry such that:
Ref flag = 1 if the page has been referenced in the recent past
Ref flag = 0 otherwise
If replacement is necessary, choose any page frame whose reference bit is 0
The OS periodically clears the reference bits
The reference bit is set whenever a page is accessed
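
The reference-bit scheme can be sketched as follows. The `NRU` class and its method names are hypothetical, and the fallback used when every reference bit is set (take the first page) is an assumption, since the slide does not cover that case.

```python
class NRU:
    """Not Recently Used victim selection with one reference bit per page."""

    def __init__(self, pages):
        self.ref = {page: 0 for page in pages}

    def touch(self, page):
        """Set the reference bit whenever the page is accessed."""
        self.ref[page] = 1

    def clear_all(self):
        """Periodic clearing of all reference bits, as done by the OS."""
        for page in self.ref:
            self.ref[page] = 0

    def pick_victim(self):
        """Return any page whose reference bit is 0."""
        for page, bit in self.ref.items():
            if bit == 0:
                return page
        # Assumption: if all pages were recently used, fall back to the first.
        return next(iter(self.ref))
```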

33
Virtual and physical caches

When a system incorporates both an MMU and a cache, the cache may operate with either virtual or physical addresses
Virtual cache
advantages:
cache access may start immediately; there is no need to activate the MMU if the data is found in the cache => saves power and eliminates address translation from a cache hit
drawbacks:
every time a process is switched, VAs refer to different physical addresses => the cache must be flushed on each process switch, or the width of the cache address tag must be increased with a PID
the OS and user programs may use two different VAs for the same physical location (synonyms, aliases) => may result in two copies of the same data; if we modify one, the other will have the wrong value

34
Virtual and physical caches (contd)

A paging MMU only affects the high-order address bits, while the cache is accessed by the low-order address bits
if these two sets do not overlap, the cache and MMU may proceed in parallel
the physical address from the MMU arrives at the right time to be compared with the physical address tags from the cache, hiding the address translation time behind the cache tag access
Limits: with a 4KB page, the maximum cache size is 4KB direct-mapped, 8KB 2-way set associative, 16KB 4-way, etc.
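
The limit follows from requiring that the cache index and line offset fit inside the page offset, so each way can be at most one page in size. A short sketch of that arithmetic, with hypothetical function names:

```python
def max_parallel_cache_bytes(page_bytes, ways):
    """Largest cache whose index bits lie entirely within the page offset."""
    return page_bytes * ways


def can_overlap_translation(cache_bytes, ways, page_bytes):
    """True if cache indexing uses only untranslated (page offset) bits."""
    way_bytes = cache_bytes // ways
    return way_bytes <= page_bytes
```

With 4 KB pages this reproduces the figures above: 4 KB direct-mapped, 8 KB 2-way, and 16 KB 4-way.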

35
The ARM710T cache organization
4-way, 4KB, 16B blocks, random replacement policy, write-through, virtual cache
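
For the parameters listed on this slide, the split of a 32-bit address into tag, set index, and block offset can be worked out as below. This is a sketch that assumes power-of-two sizes; `cache_fields` and `split_address` are illustrative names.

```python
def cache_fields(cache_bytes, ways, line_bytes, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits); sizes must be powers of two."""
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Return (tag, set index, block offset) for an address."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

A 4-way, 4KB cache with 16B blocks has 64 sets, so an address carries 4 offset bits, 6 index bits, and a 22-bit tag.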
