Title: DEFER Cache
1. DEFER Cache: An Implementation
- Sudhindra Rao and Shalaka Prabhu
- Thesis Defense
- Master of Science
- Department of ECECS
- OSCAR Lab
2. DEFER Cache
- Overview
- Related Work
- DEFER Cache Architecture
- Implementation Motivation and Challenges
- Results and Analysis
- Conclusion
- Future Work
3. Overview
- Accessing remote memory is faster than accessing local disks; hence co-operative caching
- Current schemes: fast reads, slow writes
- Goal: combine replication with logging for fast, reliable write-back to improve write performance
- DCD and RAPID already do something similar
- New architecture: Distributed, Efficient and Reliable (DEFER) Co-operative Cache
- DEFER Cache: duplication and logging
- Co-operative caching for both read and write requests
- Vast performance gains (up to 11.5x speedup) due to write-back [9]
4. Co-operative Caching
- High Performance, Scalable LAN
- Slow speed of the file server disks
- Increasing RAM in the server
- File Server with 1GB RAM
- 64 clients with 64MB RAM = 4GB
- Cost-Effective Solution
- Using Remote Memory for Caching
- 6-12 ms to access 8KB of data from disk vs. 1.05 ms from a remote client
- Highly scalable
- But all related work focuses on read performance
5. Other Related Work
- N-Chance Forwarding
- Forwards singlets to remote host on capacity miss
- Re-circulates N times and is then written to the server disk
- Uses a write-through cache
- Co-operative Caching using hints
- Global Memory System
- Remote Memory Servers
- Log Structured Storage Systems
- LFS, Disk Caching Disk, RAPID
- NVRAM: not cost-effective with current technology
- What's DEFER?
- Improve write performance
- DCD using distributed systems
6. Log-based write mechanism
- DCD [7] and RAPID [8] implement log-based writes
- Improvement in small writes using a log
- Reliability and data availability from the log partition
(Figure: DCD-like structure of DEFER.)
7. Logging algorithm
(Figure: writing a segment. Blocks move from the RAM cache through the segment buffer to a free log segment on the cache disk, with the mapping table tracking their locations.)
- Write 128KB of LRU data to a cache-disk segment, in one large write (sketched below)
- Pick LRU data to capture temporal locality, improve reliability and reduce disk traffic
- Most data will be overwritten repeatedly
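A minimal sketch of the segment write in C, under assumed names (flush_segment, the mapping-table layout, BLOCK_SIZE) that are illustrative rather than from the DEFER sources: a filled 128KB buffer of LRU blocks goes to the cache disk in a single write, and the mapping table is updated to point at the new on-disk locations.

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE   8192                     /* 8KB blocks, as on slide 4 */
#define SEGMENT_SIZE (128 * 1024)             /* one 128KB log segment     */
#define BLOCKS_PER_SEG (SEGMENT_SIZE / BLOCK_SIZE)

struct map_entry { uint32_t block_no; off_t disk_off; };

/* Write the filled segment buffer to a free log segment with one large
 * sequential write, then record where each block now lives. */
int flush_segment(int cache_disk_fd, off_t seg_off,
                  const char seg_buf[SEGMENT_SIZE],
                  const uint32_t block_nos[BLOCKS_PER_SEG],
                  struct map_entry map[BLOCKS_PER_SEG])
{
    if (pwrite(cache_disk_fd, seg_buf, SEGMENT_SIZE, seg_off) != SEGMENT_SIZE)
        return -1;
    for (int i = 0; i < BLOCKS_PER_SEG; i++) {
        map[i].block_no = block_nos[i];
        map[i].disk_off = seg_off + (off_t)i * BLOCK_SIZE;
    }
    return 0;
}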
8. Garbage Collection
(Figure: the log disk on a client before and after garbage collection.)
- Data is written into the cache-disk continuously
- The cache-disk will eventually fill with log writes
- Most of the data in the cache-disk is garbage
- Caused by data overwriting
- Need to clean the garbage to free log disk space (a collector sketch follows)
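A minimal sketch of the cleaning step, with illustrative names rather than the DEFER sources: segments whose blocks have all been overwritten are simply reclaimed; a fuller collector would also copy surviving blocks forward and update the mapping table before freeing partially live segments.

#include <stdbool.h>

#define BLOCKS_PER_SEG 16   /* 128KB segment / 8KB blocks */

struct segment {
    bool in_use;
    bool live[BLOCKS_PER_SEG];   /* false once the block is overwritten */
};

/* Reclaim segments whose blocks are all garbage; returns how many
 * segments were freed back to the log. */
int collect_garbage(struct segment *segs, int nsegs)
{
    int reclaimed = 0;
    for (int s = 0; s < nsegs; s++) {
        if (!segs[s].in_use)
            continue;
        bool any_live = false;
        for (int b = 0; b < BLOCKS_PER_SEG; b++)
            if (segs[s].live[b]) { any_live = true; break; }
        if (!any_live) {
            segs[s].in_use = false;   /* segment returns to the free list */
            reclaimed++;
        }
    }
    return reclaimed;
}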
9. DEFER Cache Architecture
- Typical distributed system (client-server)
- Applications run on workstations (clients) and access files from the server
- Local disks on clients are used only for booting, swapping and logging
- Local RAM divided into an I/O cache and a segment buffer
- Local disk has a corresponding log partition
10. DEFER Cache Algorithms
- DEFER is DCD distributed over the network
- Best of co-operative caching and logging
- Reads handled exactly as in N-chance Forwarding
- Writes are immediately duplicated and eventually logged after a pre-determined time interval M (a write-path sketch follows this list)
- Dirty singlets are forwarded as in N-chance
- Three logging strategies used
- Server Logging
- Client Logging
- Peer Logging
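A minimal sketch of the write path, assuming illustrative names (defer_write, send_copy_to) that are not from the DEFER sources: the block is duplicated to a remote host right away and time-stamped so the logging daemon can flush it after interval M.

#include <stdint.h>
#include <string.h>
#include <time.h>

#define BLOCK_SIZE 8192

struct cache_block {
    uint32_t block_no;
    char     data[BLOCK_SIZE];
    int      dirty;
    time_t   mtime;     /* checked later by the logging daemon */
};

/* Hypothetical network hook: ships a copy of the block to whichever
 * host duplicates it (server, own log, or peer, per the strategy). */
extern int send_copy_to(int dup_host, const struct cache_block *b);

int defer_write(struct cache_block *b, const void *buf, int dup_host)
{
    memcpy(b->data, buf, BLOCK_SIZE);
    b->dirty = 1;
    b->mtime = time(NULL);              /* logged after interval M */
    return send_copy_to(dup_host, b);   /* duplicated immediately  */
}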
11. Server Logging
- Client copies block to server cache on a write
- Server Table maintains consistency
- Invalidates clients and logs the contents of the segment buffer
- Increased load on the server due to logging
12. Client Logging
(Figure: client logging, showing the server and Client 1.)
- Advantage: server load is decreased
- Disadvantage: availability of the block is affected
13. Peer Logging
(Figure: peer logging for Client 1.)
- Each workstation is assigned a peer; the peer performs logging
- Advantage: reduces server load without compromising availability
14. Reliability
- Every M seconds, blocks that were modified within the last M to 2M seconds are logged
- Thus, for M = 15, we guarantee that modified data reaches disk within 30 seconds
- Most UNIX systems use a delayed write-back of 30 seconds
- M can be reduced, to increase the frequency of logging, without introducing high overhead
- With DEFER, blocks are both logged and duplicated (a daemon sketch follows)
15. Crash Recovery
- Peer Logging
- Recovery algorithm works on the on-log-disk version of the data
- In-memory and on-log-disk copies are on different hosts
- Find the blocks that were updated by the crashed client, and the peer information
- Server initiates recovery of the blocks from the peer (sketched below)
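A minimal sketch of server-initiated recovery under peer logging, with illustrative names (dir_entry, fetch_from_peer) that are not from the DEFER sources: the server walks its directory table, finds blocks last written by the crashed client, and pulls each logged copy back from that client's peer.

#include <stdint.h>

struct dir_entry {
    uint32_t block_no;
    int      last_writer;   /* client that made the last write   */
    int      peer;          /* peer holding the logged duplicate */
};

/* Hypothetical transfer call: pull one logged block from the peer. */
extern int fetch_from_peer(int peer, uint32_t block_no);

int recover_client(const struct dir_entry *table, int n, int crashed)
{
    int recovered = 0;
    for (int i = 0; i < n; i++)
        if (table[i].last_writer == crashed &&
            fetch_from_peer(table[i].peer, table[i].block_no) == 0)
            recovered++;
    return recovered;
}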
16. Simulation Results
Simulation [9] using DiskSim, with synthetic and real-world traces
17. Real Workloads
- Snake: Peer Logging 4.9x; Server and Client Logging 4.4x
- Cello: Peer Logging 8.5x; Server and Client Logging 7.3x
18. DEFER Cache Implementation
19. DEFER Cache Architecture
20. DEFER Cache design
- Follow the design principles:
- Use only commodity hardware that is available in typical systems.
- Avoid storage media dependencies, such as use of only SCSI or only IDE disks.
- Keep the data structures and mechanisms simple.
- Support reliable persistence semantics.
- Separate mechanisms and policies.
21. Implementation
- Implementation on Linux, using open source
- Implemented client logging as a device driver or a library; no change to the system kernel or application code
- Uses Linux device drivers to create a custom block device attached to the network device; provides system call overload using a loadable kernel module
- Network device uses Reliable UDP to ensure fast and reliable data transfer
- Also provides a library for testing and for implementing on non-Linux systems; provides system call overload (a library sketch follows this list)
- Alternative approach using NVRAM under test
22. DEFER as a module
(Figure: the DEFER module attached to the network device.)
23. Data Management
- Plugs into the OS as a custom block device that contains memory and a disk
- Disk managed independently of the OS
- request_queue intercepted to redirect to the DEFER module
- Read/write overridden with DEFER read/write (a dispatch sketch follows this list)
- Interfaces with the network device to transfer data
- Interfaces with the kernel by registering special capabilities: logging, de-staging, garbage collection, data recovery on crash
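A user-space rendering of that request redirection (a sketch only; the real module works on the kernel's request_queue, and these names are illustrative): each request pulled from the intercepted queue is dispatched to the DEFER read or write path instead of the device's default handler.

#include <stddef.h>
#include <stdint.h>

enum rq_dir { RQ_READ, RQ_WRITE };

struct request {
    enum rq_dir     dir;
    uint32_t        block_no;
    void           *buf;
    struct request *next;
};

extern int defer_read(struct request *r);   /* serve from RAM cache/log */
extern int defer_write(struct request *r);  /* duplicate, log lazily    */

/* Drain the intercepted request queue through DEFER instead of the
 * default read/write handlers. */
void defer_request_fn(struct request *queue)
{
    for (struct request *r = queue; r != NULL; r = r->next) {
        if (r->dir == RQ_READ)
            defer_read(r);
        else
            defer_write(r);
    }
}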
24. DEFER Cache - Implementation
- Simulation results show an 11.5x speedup
- DEFER Cache implemented in a real system to support the simulation results
- A multi-hierarchy cache structure can be implemented at:
implemented at - Application level
- File System level
- Layered device driver
- Controller level
- Kernel device driver selected, as it achieves efficiency and flexibility
25. Implementation Design
- Implementation derived from DCD implementation.
- DEFER Cache can be considered a DCD over a distributed system.
- Implementation design consists of three modules:
- Data management
- Implements the caching activities on the local machine.
- Network interface
- Implements the network transfer of blocks to/from the server/client.
- Coordinating daemons
- Coordinate the activities of the two modules above.
26. Data Management
- Custom block device driver developed and plugged into the kernel during execution.
- Driver modified according to the DEFER Cache design.
- Request function of the device driver modified.
- Read/write for RAM replaced by DEFER Cache read/write calls.
27. Network Interface
- Implemented as a network block device (NBD) driver.
- NBD simulates a block device on the local client, but connects to a remote machine which actually hosts the data.
- Local disk representation of a remote client.
- Can be mounted and accessed as a normal block device.
- All read/write requests are transferred over the network to the remote machine (a server-loop sketch follows this list).
- Consists of three parts:
- NBD client
- NBD driver
- NBD server
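A minimal sketch of an NBD-style server loop (an illustrative protocol, not the real NBD wire format; the port, block count, and in-memory store are assumptions). It answers block read/write requests over UDP, matching the Reliable-UDP transport mentioned on slide 21.

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BLOCK_SIZE 8192
#define PORT 9000

struct nbd_req { uint32_t op; uint32_t block_no; char data[BLOCK_SIZE]; };
enum { OP_READ = 0, OP_WRITE = 1 };

static char store[1024][BLOCK_SIZE];    /* stand-in for the hosted disk */

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(PORT),
                                .sin_addr.s_addr = INADDR_ANY };
    bind(s, (struct sockaddr *)&addr, sizeof addr);

    for (;;) {
        struct nbd_req req;
        struct sockaddr_in cli; socklen_t len = sizeof cli;
        if (recvfrom(s, &req, sizeof req, 0,
                     (struct sockaddr *)&cli, &len) < (ssize_t)sizeof req)
            continue;                   /* ignore short datagrams */
        if (req.op == OP_WRITE)         /* store the client's block */
            memcpy(store[req.block_no % 1024], req.data, BLOCK_SIZE);
        else                            /* fill reply with the block */
            memcpy(req.data, store[req.block_no % 1024], BLOCK_SIZE);
        sendto(s, &req, sizeof req, 0, (struct sockaddr *)&cli, len);
    }
}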
28. NBD Design
(Figure: NBD design, spanning user space and kernel space: the kernel-side NBD driver with init_module(), register_blkdev(), blk_init_queue(), request() and the default queue; the user-space NBD client with ioctl() and transmit(); and the NBD server.)
29. NBD Client
(Figure: the NBD design diagram, highlighting the NBD client.)
30. NBD Driver
(Figure: the NBD design diagram, highlighting the NBD driver.)
31. NBD Driver
(Figure: the NBD design diagram, highlighting the NBD driver.)
32. Linux Device Driver Issues
- Successfully implemented Linux device drivers for the Data Management and Network Interface modules.
- Could not be thoroughly tested and validated.
- Poses the following problems:
- Clustering of I/O requests by kernel
- Kernel memory corruption
- Synchronization problem
- No specific debugging tool
33. User-mode Implementation
- Implementation of DEFER Cache switched to user mode.
- Advantages
- High flexibility; all data can be manipulated by the user according to requirements.
- Easier to design and debug.
- Good design can improve the performance.
- Disadvantages
- Response time is slower; worse if data is swapped.
34. User-Mode Design
- Simulates the drivers in user mode.
- All data structures used by the device drivers are duplicated in user space.
- Uses a raw disk.
- 32MB of buffer space allocated for DEFER Cache in RAM.
- Emulates the I/O buffer cache (a setup sketch follows this list).
35. DEFER Server - Implementation
- Governs the entire cluster of workstations.
- Maintains its own I/O cache and a directory table.
- The server directory table maintains consistency in the system.
- A server-client handshake is performed on every write update.
- The server directory table entry reflects the last writer.
- Used for garbage collection and data recovery (an entry sketch follows this list).
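A minimal sketch of the server-side bookkeeping on a write update, with illustrative field names (see also the recovery sketch on slide 15): the handshake records the last writer and the host that will log the duplicate, so the entry can later drive garbage collection and crash recovery.

#include <stdint.h>

struct server_entry {
    uint32_t block_no;
    int      last_writer;   /* client that made the latest write  */
    int      log_host;      /* where the duplicate will be logged */
    uint64_t version;       /* bumped on every write update       */
};

/* Called during the server-client handshake for each write. */
void record_write(struct server_entry *e, int writer, int log_host)
{
    e->last_writer = writer;
    e->log_host = log_host;
    e->version++;           /* stale copies elsewhere get invalidated */
}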
36. Initial Testing
- Basic idea: accessing remote data is faster than accessing data on the local disk.
- Is LAN speed faster than disk access speed?
- As UDP is used as the network protocol, the UDP transfer delay was measured.
- Underlying network: 100Mbps Ethernet.
- Used a UDP monitor program (sketched after this list).
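A minimal sketch of such a UDP monitor (illustrative, not the in-house tool; the peer address, port and message size are assumptions): it times one request/response round trip against an echoing peer, such as the server loop on slide 27.

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define MSG_SIZE 8192                 /* response size under test */

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in srv = { .sin_family = AF_INET,
                               .sin_port = htons(9000) };
    inet_pton(AF_INET, "192.168.0.2", &srv.sin_addr);  /* assumed peer */

    char buf[MSG_SIZE];
    memset(buf, 0, sizeof buf);

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    sendto(s, buf, sizeof buf, 0, (struct sockaddr *)&srv, sizeof srv);
    recv(s, buf, sizeof buf, 0);      /* wait for the echoed response */
    gettimeofday(&t1, NULL);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_usec - t0.tv_usec) / 1e3;
    printf("round trip for %d bytes: %.3f ms\n", MSG_SIZE, ms);
    close(s);
    return 0;
}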
37. UDP monitor - Results
Effect of varying response message size on response time
38. Benchmark Program
- Developed an in-house benchmark program.
- Generates requests using a history table.
- Generates temporal locality and spatial locality.
- Runs on each workstation.
- The following parameters can be modified at runtime (a generator sketch follows this list):
- Working set size
- Client cache size
- Server cache size
- Block size
- Correlation factor (c)
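A minimal sketch of such a request generator, under one plausible reading of the history-table policy (the real benchmark's policy is not spelled out here; HISTORY and the function name are illustrative): with probability c, the next block is reused from the history table (temporal locality), half the time shifted to its neighbor (spatial locality); otherwise a fresh random block from the working set is chosen and remembered.

#include <stdint.h>
#include <stdlib.h>

#define HISTORY 64

static uint32_t history[HISTORY];
static int hist_len, hist_pos;

uint32_t next_block(double c, uint32_t working_set_blocks)
{
    if (hist_len > 0 && drand48() < c) {
        /* correlated request: revisit a recent block or its neighbor */
        uint32_t b = history[rand() % hist_len];
        return (rand() % 2) ? (b + 1) % working_set_blocks : b;
    }
    uint32_t b = (uint32_t)(drand48() * working_set_blocks);
    history[hist_pos] = b;               /* remember for later reuse */
    hist_pos = (hist_pos + 1) % HISTORY;
    if (hist_len < HISTORY) hist_len++;
    return b;
}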
39. Results (working set size)
Effect of varying file size on bandwidth (c = 1)
40. Results (small writes)
Effect of small writes
41. Results (response time for small writes)
Effect of varying file size on response time (c = 1)
42. Results (response time for sharing data)
Effect of varying file size on response time (c = 0.75)
43. Results (varying client cache size)
Effect of varying client cache size on bandwidth
44. Results (varying server cache size)
Effect of varying server cache size on bandwidth
45. Results (latency)
Latency comparison of DEFER Cache and the baseline system
46. Results (delay measurements)
Delay comparison of DEFER Cache and the baseline system
47. Results (execution time)
Execution time for DEFER Cache and the baseline system
48. Conclusions
- Improves write performance for co-operative caching.
- Reduces the small-write penalty.
- Ensures reliability and data availability.
- Improves overall file system performance.
49. Future Work
- Improve user-level implementation.
- Extend kernel-level functionality to user level: intercept system-level calls and modify them to implement DEFER read/write calls.
- Kernel-level implementation.
- Successfully implement DEFER Cache at kernel level and plug it into the kernel.
50. Thank you