Title: Web Caching File System
1. Web Caching File System
- Jonathan Ledlie
- Matt McCormick
2. Outline
- Motivation - why design a new file system?
- Current state of affairs
- Design of web caching file system
- Performance comparison - WCFS to Unix
- Future work
- Conclusions
3. Two points
- Optimizing on invariants
- Impending I/O bottleneck
4. Motivation
- Disks are slow
- Communication rates increasing rapidly
- Web cache anomalies
- only write to files when they are created
- permissions stay constant for files
- all files have copies at the original server
5. Web Caching File System vs. Unix File System
- 50,000 of each operation
- WCFS is using one thread
- Unix file I/O is synchronous
- 500 MHz PIII, 8.5 GB disk
6. Current State of Affairs: Internet Topology
[Diagram: clients, client-side caches, the network, a server-side cache, and the server]
7. Current State of Affairs: Unix File System
- Life of file in web cache
- create, write, close
- open, read, close (multiple times)
- delete
- Using i-nodes
- lots of flexibility that is not needed
- extra access to disk for each file reference
- Directory structure and name lookup
8. Design of WCFS: Specializations
- Life of file
- create
- read (multiple times)
- delete
- No i-nodes or permanent file status data
- faster create and file access
- In memory hash table stores file locations
- faster file lookup and delete
- All file data written to consecutive blocks
- faster reads and writes
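
As a rough illustration of the in-memory hash table idea, the sketch below maps each URL to the starting block and length of its data. Keying on the MD5 digest of the URL follows the later Disk Object slide; the class layout, method names, and the `(start_block, length)` tuple are assumptions made for this example, not the actual WCFS code.

```python
import hashlib


class FileTable:
    """Hypothetical in-memory table: MD5(url) -> (start_block, length)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(url):
        # A fixed-size digest keeps keys uniform regardless of URL length.
        return hashlib.md5(url.encode("utf-8")).digest()

    def insert(self, url, start_block, length):
        self._entries[self._key(url)] = (start_block, length)

    def lookup(self, url):
        # Returns None when the URL is not cached; no disk access is needed.
        return self._entries.get(self._key(url))

    def remove(self, url):
        # Deleting a file only drops the in-memory entry.
        return self._entries.pop(self._key(url), None)


table = FileTable()
table.insert("http://example.com/index.html", start_block=128, length=4096)
location = table.lookup("http://example.com/index.html")
```

Because lookup and delete touch only memory, neither pays the extra disk access that an i-node reference would.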
9. Design of WCFS: Object Diagram
[Diagram labels: CacheDisk (getNewCacheObject); Disk; Cache (x3); RequestQueue; FileTable; BitMap; Request (x3)]
10. Design of WCFS: Disk Initialization
[Diagram labels: CacheDisk; Disk]
- First create cache disk object
- creates disk object to represent physical disk
- starts a disk thread running
- Disk object and physical disk
- utilize an SGI raw I/O patch for Linux
- bypass kernel and kernel buffers
11. Design of WCFS: Disk Object
[Diagram labels: Disk; RequestQueue; FileTable; BitMap]
- FileTable
- stores names and locations of files on disk
- MD5 conversion of url
- RequestQueue
- stores read and write requests from process threads
- whenever anything is in the queue, the disk thread runs
- BitMap
- keeps status of each block on disk
- locates and marks spot on disk for files to be placed
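
The block-tracking role of the BitMap can be sketched as a first-fit search for a run of consecutive free blocks. The first-fit policy, the method names, and the use of one byte per block (rather than a packed bit array) are assumptions chosen for readability; the slides do not say how WCFS searches the map.

```python
class BitMap:
    """Hypothetical block map: tracks which disk blocks are in use."""

    def __init__(self, num_blocks):
        # One byte per block for clarity; a real bitmap would pack 8 per byte.
        self._used = bytearray(num_blocks)

    def allocate(self, count):
        """Mark the first run of `count` consecutive free blocks as used.

        Returns the starting block index, or None if no such run exists.
        """
        if count < 1:
            raise ValueError("count must be at least 1")
        run_start, run_len = 0, 0
        for i, used in enumerate(self._used):
            if used:
                # The run is broken; restart just past this block.
                run_start, run_len = i + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self._used[b] = 1
                return run_start
        return None

    def free(self, start, count):
        for b in range(start, start + count):
            self._used[b] = 0


bitmap = BitMap(8)
first = bitmap.allocate(3)   # blocks 0-2
second = bitmap.allocate(2)  # blocks 3-4
bitmap.free(first, 3)        # leaves a 3-block hole at the front
```

Freeing the first file leaves a hole that only files of three blocks or fewer can reuse, which is the fragmentation question the future-work slide raises.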
12. Design of WCFS: Request Objects
[Diagram labels: RequestQueue; Request (x3)]
- Request
- write: starting block, length, buffer to write from
- read: starting block, length, buffer to write to
- (implies files must be smaller than virtual memory)
- Currently queued by FIFO (soon to be one-way elevator)
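
To make the difference between the two queueing policies concrete, the sketch below orders the same set of requests both ways. It reads "one-way elevator" as a circular scan (service requests in ascending block order from the current head position, then wrap around to the lowest block); that reading, the `Request` field names, and the `head_position` parameter are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical request record mirroring the fields listed on the slide.
Request = namedtuple("Request", ["op", "start_block", "length", "buffer"])


def fifo_order(requests):
    """Service requests in arrival order."""
    return list(requests)


def one_way_elevator_order(requests, head_position):
    """Sweep upward from the head, then wrap to the lowest pending block."""
    ahead = sorted(
        (r for r in requests if r.start_block >= head_position),
        key=lambda r: r.start_block,
    )
    behind = sorted(
        (r for r in requests if r.start_block < head_position),
        key=lambda r: r.start_block,
    )
    return ahead + behind


pending = [
    Request("read", 50, 4, bytearray(4)),
    Request("write", 10, 2, b"ab"),
    Request("read", 90, 1, bytearray(1)),
    Request("write", 30, 3, b"xyz"),
]
fifo_blocks = [r.start_block for r in fifo_order(pending)]
elevator_blocks = [r.start_block for r in one_way_elevator_order(pending, 40)]
```

With the head at block 40, FIFO visits blocks 50, 10, 90, 30, while the one-way sweep visits 50, 90, 10, 30, so the head reverses direction only once.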
13. Design of WCFS: Cache Objects for Threading
[Diagram labels: CacheDisk; Cache (x3)]
- Multiple threads for handling clients
- Each thread gets a single Cache object
- Cache Object
- create, read, remove, length, sync
- Thread create and read: asynchronous
- turned into request objects
- placed in request queue for disk
- Thread calls sync to guarantee its operations are done
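
The asynchronous flow described above (operations become requests, a disk thread drains the queue, and sync waits for completion) might look roughly like the following. This is a minimal sketch, not the WCFS implementation: the dictionary standing in for the physical disk, the per-request `threading.Event`, and the simplified `create(url, buffer)` signature are all assumptions.

```python
import queue
import threading


class Disk:
    """Hypothetical disk object with one worker thread and a FIFO queue."""

    def __init__(self):
        self._requests = queue.Queue()
        self._store = {}  # stand-in for the physical disk
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, request):
        self._requests.put(request)

    def _run(self):
        # The disk thread blocks until a request is queued, then services it.
        while True:
            op, url, buffer, done = self._requests.get()
            if op == "write":
                self._store[url] = bytes(buffer)
            elif op == "read":
                data = self._store.get(url, b"")
                buffer[: len(data)] = data
            done.set()


class Cache:
    """Hypothetical per-thread handle; create and read return immediately."""

    def __init__(self, disk):
        self._disk = disk
        self._pending = []

    def _enqueue(self, op, url, buffer):
        done = threading.Event()
        self._pending.append(done)
        self._disk.submit((op, url, buffer, done))

    def create(self, url, buffer):
        self._enqueue("write", url, buffer)

    def read(self, url, buffer):
        self._enqueue("read", url, buffer)

    def sync(self):
        # Block until every request issued through this handle has finished.
        for done in self._pending:
            done.wait()
        self._pending.clear()


cache = Cache(Disk())
cache.create("http://example.com/a", b"hello")
out = bytearray(5)
cache.read("http://example.com/a", out)
cache.sync()  # after this returns, `out` holds the file contents
```

The caller must not inspect `out` before `sync()` returns, which is the guarantee the last bullet describes.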
14. Design of WCFS: Code Snippet
- Common web caching operations
- create(url, buffer, size)
- read(url, buffer)
- remove(url)
- sync()
- Equivalent Operations in Unix
- fd = creat(url, permissions)
- write(fd, buffer, size)
- close(fd)
- fd = open(url, mode)
- read(fd, buffer, size)
- close(fd)
- unlink(url)
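
For comparison, the Unix sequence listed above can be written out with Python's thin wrappers over the same system calls. The function name, the `0o644` permission value, and the use of `os.open` with `O_CREAT` in place of `creat` are choices made for this sketch.

```python
import os
import tempfile


def unix_cache_lifecycle(path, data):
    """Run one create/write/close, open/read/close, unlink cycle."""
    # creat + write + close
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

    # open + read + close
    fd = os.open(path, os.O_RDONLY)
    try:
        result = os.read(fd, len(data))
    finally:
        os.close(fd)

    # unlink
    os.unlink(path)
    return result


with tempfile.TemporaryDirectory() as tmp:
    cached_path = os.path.join(tmp, "cached_object")
    round_trip = unix_cache_lifecycle(cached_path, b"page body")
    still_exists = os.path.exists(cached_path)
```

One cached object costs seven system calls here, against the three WCFS operations (create, read, remove) plus an occasional sync.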
15. Design of WCFS: Basic File System Layout
[Diagram labels: CacheDisk (getNewCacheObject); Disk; Cache (x3); RequestQueue; FileTable; BitMap; Request (x3)]
16. Design of WCFS: Feature Recap
- Raw I/O
- Multi-threading
- Asynchronous I/O
- Quick name lookup
- File data on consecutive blocks
17. Performance Comparisons: Trace
18. Performance Comparisons: Create
19. Performance Comparisons: Read
20. Performance Comparisons: Delete
21. Two points, revisited
- Optimizing on invariants
- Impending I/O bottleneck
22. What's coming...
- Real raw I/O and proper memory alignment
- Testing with more threads
- Trace testing
- Determining optimal fragmentation and cleaning
- Is MD5 a bottleneck?
- Elevator algorithm
- Adding save on clean shutdown
- Examine memory requirements for FileTable
23. Conclusions
- Unix file system induces unnecessary overhead
- Possible to take advantage of application-specific traits
- Specialization works