Title: Application-Controlled File Caching Policies
1. ECE 7995 Presentation
- Application-Controlled File Caching Policies
- Pei Cao, Edward W. Felten and Kai Li
- Presented By
- Mazen Daaibes
- Gurpreet Mavi
2. Outline (Part 1)
- Introduction
- User-Level File Caching
- Two-Level Replacement
- Kernel Allocation Policy
- An Allocation Policy
- First Try
- Swapping Positions
- Place Holders
- Final Allocation Scheme
3. Introduction
- File caching is a widely used technique in file system implementation.
- The major challenge is to provide a high cache hit ratio.
- Two-level cache management is used:
  - Allocation: the kernel allocates physical pages to individual applications.
  - Replacement: each application is responsible for deciding how to use its physical pages.
- Previous work has focused on replacement, largely ignoring allocation.
- Some applications have special knowledge about their file access patterns which can be used to make intelligent cache replacement decisions.
- Traditionally, these applications buffer file data in user address space as a way of controlling replacement.
- This approach leads to double buffering, because the kernel tries to cache file data as well.
- It also does not give the application real control, because the virtual memory system can still page out data in the user address space.
4. Introduction (cont.)
- This paper considers how to improve the performance of file caching by allowing user-level control over file cache replacement decisions.
- To reduce the miss ratio, a user-level cache needs not only an application-tailored replacement policy but also enough available cache blocks.
- The challenge is to let each user process control its own caching while the kernel dynamically allocates cache blocks among processes in a fair way, so that overall system performance improves.
- The key element in this approach is a sound allocation policy for the kernel, discussed next.
5. User-Level File Caching
- Two-Level Replacement
- The approach presented in this paper is called two-level cache block replacement.
- It splits responsibility between the kernel and user levels:
  - The kernel is responsible for allocating cache blocks to processes.
  - Each user process is free to control the replacement strategy on its share of cache blocks.
  - If a process chooses not to exercise this choice, the kernel applies a default strategy (LRU).
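The split of responsibility can be sketched in a few lines of code. This is a toy model with hypothetical names, not the paper's implementation: the swapping and place-holder machinery from the later slides is omitted, and a process without a registered manager simply falls back to kernel LRU.

```python
from collections import OrderedDict

class TwoLevelCache:
    """Toy model of two-level replacement: the kernel tracks a global LRU
    list and owns allocation; each process may register a manager callback
    that picks which of its own blocks to give up on a miss."""

    def __init__(self, size):
        self.size = size
        self.lru = OrderedDict()   # block -> owning process, oldest first
        self.managers = {}         # process -> fn(own_blocks, candidate) -> victim

    def reference(self, proc, block):
        """Returns True on a hit, False on a miss."""
        if block in self.lru:
            self.lru.move_to_end(block)        # hit: block becomes most recent
            return True
        if len(self.lru) >= self.size:         # miss with a full cache
            candidate = next(iter(self.lru))   # globally least-recent block
            owner = self.lru[candidate]
            own = [b for b, p in self.lru.items() if p == owner]
            choose = self.managers.get(owner, lambda own, c: c)
            del self.lru[choose(own, candidate)]   # evict the owner's choice
        self.lru[block] = proc                 # new block is most recent
        return False

# With no managers registered, this degenerates to plain global LRU:
cache = TwoLevelCache(2)
cache.reference("P", "A")      # miss
cache.reference("P", "B")      # miss
cache.reference("P", "A")      # hit; B is now least recent
cache.reference("Q", "C")      # miss: evicts B under the default LRU policy
```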
6User Level File Caching Cont.
Application P
Application Q
1. P misses
2. Kernel asks Q
4. Reallocates B
3. Q gives up B
Kernel
7. Kernel Allocation Policy
- The kernel's allocation policy should satisfy three principles:
  1. A process that never overrules the kernel does not suffer more misses than it would under global LRU.
  2. A foolish decision by one process never causes another process to suffer more misses.
  3. A wise decision by one process never causes any process, including itself, to suffer more misses.
8. First Try

  Time  P's ref  Q's ref   Cache state  Global LRU list (increasing recency ->)
  t0                       A W X B      A W X B   (A is least recently used, but P suggests B)
  t1             Y (miss)  A W X Y      A W X Y
  t3             Z (miss)  Z W X Y      W X Y Z   (P has no choice but to give up A)
  t5    A (miss)           Z A X Y      X Y Z A
  t7    B (miss)           Z A B Y      Y Z A B

Total misses: 4 (2 by Q, 2 by P); under global LRU: 3 (2 by Q, 1 by P).
This violates Principle 3.
9. Swapping Positions

  Time  P's ref  Q's ref   Cache state  Global LRU list (increasing recency ->)
  t0                       A W X B      A W X B   (swap A and B in the LRU list, then replace B)
  t1             Y (miss)  A W X Y      W X A Y
  t3             Z (miss)  A Z X Y      X A Y Z
  t5    A (hit)            A Z X Y      X Y Z A
  t7    B (miss)           A Z B Y      Y Z A B

Total misses: 3 (2 by Q, 1 by P); under global LRU: 3.
10. Swapping Positions (cont.)
- Swapping positions guarantees that if no process makes foolish choices, the global hit ratio is the same as or better than it would be under global LRU.
- But what if a process makes a foolish choice?
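The guarantee can be checked by replaying the trace from the two preceding slides in a small simulator. This is a sketch with hypothetical names; the block owners and P's preference for giving up B before A are taken from the example.

```python
def run(trace, lru, owner, policies, swap):
    """Replay (process, block) references; return the number of misses.
    lru is ordered oldest -> newest; policies maps a process to a function
    (own_blocks, candidate) -> victim; swap=True applies position-swapping."""
    misses = 0
    for proc, block in trace:
        if block in lru:
            lru.remove(block); lru.append(block)     # hit: move to MRU end
            continue
        misses += 1
        cand = lru[0]                                # kernel's LRU candidate
        pick = policies.get(owner[cand], lambda own, c: c)
        victim = pick([b for b in lru if owner[b] == owner[cand]], cand)
        if swap and victim != cand:                  # preserve cand's recency
            i, j = lru.index(cand), lru.index(victim)
            lru[i], lru[j] = lru[j], lru[i]
        lru.remove(victim); del owner[victim]
        lru.append(block); owner[block] = proc
    return misses

# Slide trace: cache holds A W X B (A least recent); P owns A and B and
# prefers to give up B first; Q follows the kernel's default.
trace = [("Q", "Y"), ("Q", "Z"), ("P", "A"), ("P", "B")]
def p_policy(own, cand):
    return next((b for b in ("B", "A") if b in own), cand)

start = lambda: (["A", "W", "X", "B"],
                 {"A": "P", "B": "P", "W": "Q", "X": "Q"})

lru, owner = start(); first_try  = run(trace, lru, owner, {"P": p_policy}, swap=False)
lru, owner = start(); with_swap  = run(trace, lru, owner, {"P": p_policy}, swap=True)
lru, owner = start(); global_lru = run(trace, lru, owner, {}, swap=False)
# first_try == 4 misses (violating Principle 3); with_swap == global_lru == 3
```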
11. No Place Holders

  Time  P's ref  Q's ref   Cache state  Global LRU list (increasing recency ->)
  t0                       X A Y        X A Y     (Q makes the wrong decision and replaces Y)
  t1             Z (miss)  X A Z        A X Z
  t3             Y (miss)  X Y Z        X Z Y
  t5    A (miss)           A Y Z        Z Y A

Total misses: 3 (2 by Q, 1 by P); under global LRU: 1 (by Q).
This violates Principle 2.
12. With Place Holders

  Time  P's ref  Q's ref   Cache state  Global LRU list (increasing recency ->)
  t0                       X A Y        X A Y     (Q makes the wrong decision and replaces Y)
  t1             Z (miss)  X A Z        A X(Y) Z  (a place holder is created for Y, pointing to X)
  t3             Y (miss)  Y A Z        A Z Y
  t5    A (hit)            Y A Z        Z Y A

Total misses: 2 (2 by Q, 0 by P); under global LRU: 1 (by Q).
Q hurts itself by its foolish decision, but it doesn't hurt anyone else; Principle 2 is satisfied.
13. To Summarize
- If a reference to cache block b hits:
  - b is moved to the head of the global LRU list.
  - Any place-holder pointing to b is deleted.
14. To Summarize (cont.)
- If a reference to cache block b misses:
  - Case 1: there is a place-holder for b, pointing to t. t is replaced and its page is given to b. If t is dirty, it is written to disk.
  - Case 2: there is no place-holder for b. The kernel finds the block at the end of the LRU list, say block c, belonging to process P, and consults P.
    - If P chooses to replace block x, the kernel swaps x and c in the LRU list.
    - If there is a place-holder pointing to x, it is changed to point to c; otherwise, a place-holder is built for x, pointing to c.
    - Finally, x's page is given to b.
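The summary above fits in a short class. This is a sketch with hypothetical names: a place-holder is created only when the manager overrides the kernel's candidate, and the dirty-block write-back is omitted.

```python
class PlaceholderCache:
    def __init__(self, lru, owner, policies):
        self.lru = lru            # block names, oldest -> newest
        self.owner = owner        # block -> owning process
        self.policies = policies  # process -> fn(own_blocks, candidate) -> victim
        self.ph = {}              # evicted block x -> resident block it points to
        self.misses = 0

    def reference(self, proc, b):
        if b in self.lru:                            # hit
            self.lru.remove(b); self.lru.append(b)   # b moves to the MRU head
            for x, tgt in list(self.ph.items()):     # drop place-holders -> b
                if tgt == b:
                    del self.ph[x]
            return
        self.misses += 1
        if b in self.ph:                             # case 1: place-holder for b
            t = self.ph.pop(b)                       # ...names the victim t
            self.lru.remove(t); del self.owner[t]
        else:                                        # case 2: consult owner of c
            c = self.lru[0]
            p = self.owner[c]
            own = [blk for blk in self.lru if self.owner[blk] == p]
            x = self.policies.get(p, lambda own, c: c)(own, c)
            if x != c:                               # swap x and c in LRU list
                i, j = self.lru.index(c), self.lru.index(x)
                self.lru[i], self.lru[j] = self.lru[j], self.lru[i]
                redirected = False
                for k, v in self.ph.items():
                    if v == x:                       # place-holder aimed at x
                        self.ph[k] = c               # ...now points at c
                        redirected = True
                if not redirected:
                    self.ph[x] = c                   # new place-holder for x
            self.lru.remove(x); del self.owner[x]
        self.lru.append(b); self.owner[b] = proc

# Replaying the place-holder example: cache X A Y (X least recent); Q owns
# X and Y and foolishly insists on evicting Y; P owns A.
def q_policy(own, cand):
    return "Y" if "Y" in own else cand

cache = PlaceholderCache(["X", "A", "Y"],
                         {"X": "Q", "A": "P", "Y": "Q"},
                         {"Q": q_policy})
for proc, blk in [("Q", "Z"), ("Q", "Y"), ("P", "A")]:
    cache.reference(proc, blk)
# cache.misses == 2: Q pays for its foolish choice, P's hit on A survives
```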
15. Outline (Part 2)
- Design Issues
  - User/Kernel Interaction
  - Shared Files
  - Prefetching
- Simulation
  - Simulated Application Policies
  - Simulation Environment
  - Results
- Conclusions
16. Design Issue 1: User/Kernel Interaction
- Allow each user process to give hints to the kernel: which blocks it no longer needs, or which blocks are less important than others.
- Inform the kernel about the process's access pattern for files (sequential, random, etc.); the kernel can then make decisions for the user process.
- Implement a fixed set of replacement policies in the kernel from which the user process can choose: LRU, MRU, LRU-K, etc.
- For full flexibility, the kernel can make an upcall to the manager process every time a replacement decision is needed.
- Each manager process can maintain a list of free blocks, and the kernel can take blocks off the list when it needs them.
- The kernel can implement some common policies and rely on upcalls for applications that do not want to use the common policies.
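None of these interfaces made it into mainstream kernels in full, but the "inform the kernel about the access pattern" option survives in a weaker form as POSIX's posix_fadvise. The sketch below shows it via Python's os module on a throwaway temporary file; it is Unix-only, and the kernel is free to ignore the advice.

```python
import os
import tempfile

# Create a small file to advise on (a stand-in for an application data file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"application data")
    path = f.name

fd = os.open(path, os.O_RDONLY)
# Declare a sequential scan over the whole file (len=0 means "to EOF");
# the kernel may respond with aggressive read-ahead and early eviction
# of pages the scan has already passed.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
data = os.read(fd, 4096)
# Declare the cached pages no longer needed once the scan is done
# (the "blocks it no longer needs" hint from the first bullet).
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)
os.remove(path)
```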
17. Design Issue 2: Shared Files
- Concurrently shared files are handled in one of two ways:
  - If all the sharing processes agree to designate a single process as manager for the shared file, the kernel allows this.
  - If the sharing processes fail to agree, management reverts to the kernel and the default global LRU policy is used.
18. Design Issue 3: Prefetching
- The prefetcher would be responsible for deciding how aggressively to prefetch; the allocator simply treats the prefetcher process as another process competing for memory in the file cache.
- Information about future file references (essential for prefetching) might be valuable to the replacement code as well, so adding prefetching may well make the allocator's job easier rather than harder.
- Interaction between the allocator and the prefetcher would also be useful: the allocator could inform the prefetcher about the current demand for cache blocks, and the prefetcher could then voluntarily free blocks when it realizes that some prefetched blocks are no longer useful.
19. Simulation
- Trace-driven simulation has been used to evaluate two-level replacement.
- In these simulations, the user-level managers used a general replacement strategy that takes advantage of knowledge about an application's file references.
- Two sets of traces were used to evaluate the scheme:
  - Ultrix
  - Sprite
20. Simulated Application Policies
- Two-level block replacement enables each user process to use its own replacement policy.
- This solves the problem for sophisticated applications that know exactly what replacement policy they want.
- For less sophisticated applications, knowledge about an application's file accesses can be used in the replacement policy.
- Such knowledge can often be obtained through general heuristics, from the compiler, or from the application writer.
21. Simulated Application Policies (cont.)
- The following replacement policy, based on the principle of RMIN (replace the block whose next reference is farthest in the future), has been proposed to exploit partial knowledge of the future file access sequence.
- When the kernel suggests a candidate replacement block to the manager process:
  - Find all blocks whose next references are definitely (or with high probability) after the next reference to the candidate block.
  - If there is no such block, replace the candidate block.
  - Otherwise, choose the block whose next reference is farthest after the next reference of the candidate block.
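The three steps above can be sketched as a manager-side helper. This is an illustration, not the paper's code: `next_ref` stands for whatever estimate of next-reference positions the access-pattern heuristics provide, with `math.inf` marking blocks believed dead (e.g. under an access-once pattern).

```python
import math

def rmin_choice(candidate, own_blocks, next_ref):
    """Pick a replacement victim given (estimated) next-reference positions.
    next_ref[b] is the predicted time of b's next reference; a missing entry
    means the block is not expected to be referenced again (math.inf)."""
    nxt = lambda b: next_ref.get(b, math.inf)
    later = [b for b in own_blocks if nxt(b) > nxt(candidate)]
    if not later:                 # nothing known to be referenced after candidate
        return candidate          # -> accept the kernel's suggestion
    return max(later, key=nxt)    # farthest future reference is evicted first

# e.g. the candidate "c" is re-read soonest, so "b" (next used latest) goes:
victim = rmin_choice("c", ["a", "b", "c"], {"a": 5, "b": 9, "c": 3})
# victim == "b"
```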
22. Simulated Application Policies (cont.)
- This strategy can be applied to general applications with the following common file reference patterns:
  - Sequential: most files are accessed sequentially most of the time.
  - File-specific sequence: some files are mostly accessed in one of a few sequences.
  - Filter: many applications access files one by one, in the order of their names on the command line, and access each file sequentially from beginning to end.
  - Same-order: a file or group of files is repeatedly accessed in the same order.
  - Access-once: many programs do not re-read or re-write file data that they have already accessed.
23. Simulation Environment
- Two trace-driven simulations were used for a preliminary evaluation of the ideas presented in this paper.
- Ultrix:
  - Traces of various applications running on a DEC 5000/200 workstation.
  - 1.6 MB file cache.
  - Block size: 8 KB.
- Sprite:
  - File system traces from UC Berkeley.
  - Records the file activity of about 40 clients over a period of 48 hours.
  - Client cache size: 7 MB.
24. Results: Ultrix Traces
- Postgres: a relational database system.
- Uses the sequential access pattern as its policy.
- The designer of the database system could certainly supply a better user-level policy, further improving the hit ratio.
25. Results: Ultrix Traces
- Cscope: an interactive tool for examining C sources.
- It reads its database of all source files sequentially from beginning to end to answer each query.
- Applying the right user-level policy (Same-order being the access pattern) reduces the miss ratio significantly.
26. Results: Ultrix Traces
- Link-editing with the Ultrix linker.
- The linker in this system makes many small file accesses.
- It doesn't fit the sequential access pattern, but it fits Access-once.
27. Results: Ultrix Traces
- Multi-process workload: Postgres, cscope, and the linker running concurrently.
- Each application was simulated running its own user-level policy, as discussed in the previous slides.
- This yields the curve directly above RMIN.
28. Results: Sprite Traces
- In a system with a slow network (e.g., Ethernet), client caching performance determines the file system performance on each workstation.
- Since most file accesses are sequential, the sequential heuristic can be used.
- The sequential pattern improves the hit ratio for about 10% of the clients.
29. Conclusions
- This paper has proposed a two-level replacement scheme for file cache management:
  - a kernel policy for cache block allocation, and
  - several user-level replacement policies.
- The kernel allocation policy guarantees performance improvements over the traditional global LRU file caching approach:
  - Processes that are unwilling or unable to predict their file access patterns perform at least as well as under global LRU.
  - A process that mispredicts its file access patterns cannot cause other processes to suffer more misses.
- The key contribution is the guarantee that a good user-level policy will improve the file cache hit ratio of the entire system.