Title: Why panic()? Improving Reliability through Restartable File Systems
1. Why panic()? Improving Reliability through Restartable File Systems
- Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale,
  Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift
2. Data Availability
- Applications require data
- Use the FS to reliably store data
- Both hardware and software can fail
- Typical solution
  - Large clusters for availability
  - Reliability through replication
[Figure: replicated GFS masters, each serving a set of slave nodes]
3. User Desktop Environment
- Replication is infeasible for desktop environments
- Wouldn't RAID work?
  - Can only tolerate H/W failures
- FS crashes are more severe
  - Services/applications are killed
  - Requires an OS reboot and recovery
- Need better reliability in the event of file system failures
[Figure: applications running on an OS with a file system on a disk; RAID controller with multiple disks]
4. Outline
- Motivation
- Background
- Restartable file systems
- Advantages and limitations
- Conclusions
5. Failure Handling in File Systems
- Exception paths are not tested thoroughly
  - Exceptions: failed I/O, bad arguments, null pointers
- On errors, call panic(), BUG(), or BUG_ON()
- After a failure, data becomes inaccessible
- Reasons for no recovery code
  - Hard to apply corrective measures
  - Not straightforward to add recovery
6. Real-world Example: Linux 2.6.15 ReiserFS

    int journal_mark_dirty(...) {
        struct reiserfs_journal_cnode *cn = NULL;
        ...
        if (!cn) {
            cn = get_cnode(p_s_sb);
            if (!cn)
                reiserfs_panic(p_s_sb, "get_cnode failed!\n");
        }
        ...
    }

File systems already detect failures

    void reiserfs_panic(struct super_block *sb, ...) {
        BUG();
        /* this is not actually called, but makes
           reiserfs_panic() "noreturn" */
        panic("REISERFS panic %s\n", error_buf);
    }

Recovery is simplified by a generic recovery mechanism
7. Possible Solutions
- Code to recover from all failures
  - Not feasible in reality
- Restart on failure
  - Previous work has taken this approach
  - FSes need stateful, lightweight recovery

                 Heavyweight                        Lightweight
    Stateful     CuriOS, EROS                       (the goal of this work)
    Stateless    Nooks/Shadow, Xen, Minix,          SafeDrive, Singularity
                 L4, Nexus
8. Restartable File Systems
- Goal: build a lightweight, stateful solution to tolerate file-system failures
- Solution: a single, generic recovery mechanism for any file-system failure
  (see the sketch below)
  - Detect failures through assertions
  - Clean up resources used by the file system
  - Restore file-system state from before the crash
  - Continue to service new file-system requests
FS failures are completely transparent to applications
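A minimal user-space sketch of the detect / clean up / restore / continue cycle described above; it is not Membrane's kernel code, and every name (fs_assert, cleanup_fs_resources, restore_last_checkpoint, replay_logged_operations) is hypothetical.

    /* Hypothetical user-space model of the recovery cycle; illustrative only. */
    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf recovery_point;

    /* Failure detection: a failed assertion transfers control to recovery. */
    #define fs_assert(cond) \
        do { if (!(cond)) longjmp(recovery_point, 1); } while (0)

    static void cleanup_fs_resources(void)     { puts("release locks, memory, in-flight I/O"); }
    static void restore_last_checkpoint(void)  { puts("roll back to the last consistent state"); }
    static void replay_logged_operations(void) { puts("re-run operations since the checkpoint"); }

    int main(void)
    {
        if (setjmp(recovery_point)) {
            /* Steps from this slide: clean up, restore, keep servicing requests. */
            cleanup_fs_resources();
            restore_last_checkpoint();
            replay_logged_operations();
            puts("file system restarted; applications never noticed");
            return 0;
        }
        fs_assert(1 == 0);   /* simulate a failure detected inside an FS operation */
        return 0;
    }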
9. Challenges
- Transparency
  - Multiple applications using the FS upon a crash
  - Intertwined execution
- Fault-tolerance
  - Handle a gamut of failures
  - Transform them into fail-stop failures
- Consistency
  - The OS and FS could be left in an inconsistent state
10. Guaranteeing FS Consistency
- FS consistency is required to prevent data loss
  - Not all FSes support crash-consistency
  - FS state is constantly modified by applications
- Periodically checkpoint FS state (see the sketch below)
  - Mark dirty blocks as copy-on-write
  - Ensure each checkpoint is atomically written
- On a crash, revert back to the last checkpoint
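A minimal user-space sketch of the copy-on-write checkpoint/revert idea; block, checkpoint, write_block, and revert_to_checkpoint are hypothetical names that only model the concept, not the in-kernel mechanism.

    /* Hypothetical model: protect a dirty block at checkpoint time, copy it on
     * the first write afterwards, and revert to the protected image on a crash. */
    #include <stdio.h>
    #include <string.h>

    #define BLK 32

    struct block {
        char data[BLK];       /* current contents */
        char shadow[BLK];     /* copy made lazily when a protected block is written */
        int  cow_protected;   /* marked at checkpoint time */
        int  copied;          /* has the checkpointed image been preserved? */
    };

    static void checkpoint(struct block *b)
    {
        b->cow_protected = 1;   /* COW-protect the dirty block; no copy yet */
        b->copied = 0;
    }

    static void write_block(struct block *b, const char *new_data)
    {
        if (b->cow_protected && !b->copied) {
            memcpy(b->shadow, b->data, BLK);   /* copy on first write after checkpoint */
            b->copied = 1;
        }
        snprintf(b->data, BLK, "%s", new_data);
    }

    static void revert_to_checkpoint(struct block *b)
    {
        if (b->copied)
            memcpy(b->data, b->shadow, BLK);   /* discard post-checkpoint updates */
    }

    int main(void)
    {
        struct block b = { .data = "epoch 0 contents" };
        checkpoint(&b);                          /* epoch 0 is now protected */
        write_block(&b, "epoch 1 contents");     /* triggers the copy */
        revert_to_checkpoint(&b);                /* simulate a crash before commit */
        printf("after recovery: %s\n", b.data);  /* prints "epoch 0 contents" */
        return 0;
    }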
11. Overview of Our Approach
[Figure: an application issues open(file), write(), read(), write(), write(), close() through the VFS to the file system; a checkpoint divides the timeline into Epoch 0 (completed) and Epoch 1 (in progress), and a crash occurs during Epoch 1]
12. Checkpoint Mechanism
- File systems are constantly modified
  - Hard to identify a consistent recovery point
- Naïve solution: prevent any new FS operations and call sync
  - Inefficient and unacceptable overhead
13. Key Insight
- All requests go through the VFS layer
- File systems write to disk through the page cache
- Control requests to the FS and dirty pages to disk
[Figure: applications issue requests through the VFS to file systems (ext3, VFAT), which reach the disk via the page cache]
14. Generic COW-based Checkpoint
[Figure: three states of the application/VFS/file system/page cache/disk stack (Regular, At Checkpoint, After Checkpoint) illustrating Membrane's copy-on-write checkpointing]
15. Interaction with Modern FSes
- Modern FSes have a built-in crash-consistency mechanism
  - Journaling or snapshotting
- Seamlessly integrate with these mechanisms
- Need FSes to indicate the beginning and end of a transaction (see the sketch below)
  - Works for data and ordered journaling modes
  - Need to combine writeback mode with COW
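A minimal single-threaded sketch of why the file system must expose transaction boundaries; fs_tx_begin, fs_tx_end, and try_checkpoint are hypothetical names, not a real journaling API. A checkpoint requested mid-transaction is deferred to the next boundary, so every epoch captures a journal-consistent image.

    /* Hypothetical model: checkpoints are only taken between transactions. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool in_transaction;
    static bool checkpoint_requested;

    static void try_checkpoint(void)
    {
        if (in_transaction) {            /* defer: an epoch must not split a transaction */
            checkpoint_requested = true;
            return;
        }
        puts("checkpoint: epoch closed at a transaction boundary");
        checkpoint_requested = false;
    }

    static void fs_tx_begin(void) { in_transaction = true; }

    static void fs_tx_end(void)
    {
        in_transaction = false;
        if (checkpoint_requested)        /* take the deferred checkpoint now */
            try_checkpoint();
    }

    int main(void)
    {
        fs_tx_begin();
        try_checkpoint();                /* arrives mid-transaction: deferred */
        fs_tx_end();                     /* boundary reached: checkpoint taken */
        return 0;
    }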
16. Light-weight Logging
- Log operations at the VFS level (a sketch follows below)
  - Need not modify existing file systems
- Operations: open, close, read, write, symlink, unlink, seek, etc.
- Read
- Logs are thrown away after each checkpoint
- What about logging writes?
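A minimal sketch of VFS-level operation logging with hypothetical types and names (op_record, log_op, checkpoint_done); the real op-log lives in the kernel, but the idea is the same: record each operation that crosses the VFS boundary and discard the log once a checkpoint completes.

    /* Hypothetical in-memory operation log, cleared at every checkpoint. */
    #include <stdio.h>
    #include <string.h>

    enum op_type { OP_OPEN, OP_READ, OP_WRITE, OP_CLOSE, OP_UNLINK };

    struct op_record {
        enum op_type type;
        char         path[64];
        long long    offset;   /* for read/write */
        long long    count;    /* for read/write; write data is recovered via page stealing */
    };

    #define LOG_MAX 1024
    static struct op_record op_log[LOG_MAX];
    static int op_log_len;

    static void log_op(enum op_type t, const char *path, long long off, long long cnt)
    {
        if (op_log_len == LOG_MAX)
            return;                       /* a real system would force a checkpoint here */
        struct op_record *r = &op_log[op_log_len++];
        r->type = t; r->offset = off; r->count = cnt;
        snprintf(r->path, sizeof(r->path), "%s", path);
    }

    static void checkpoint_done(void) { op_log_len = 0; }   /* log thrown away */

    int main(void)
    {
        log_op(OP_OPEN,  "/home/user/file", 0, 0);
        log_op(OP_WRITE, "/home/user/file", 0, 4096);
        printf("%d operations logged since the last checkpoint\n", op_log_len);
        checkpoint_done();
        printf("%d operations logged after the checkpoint\n", op_log_len);
        return 0;
    }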
17. Page Stealing Mechanism
- Mainly used for replaying writes
- Goal: reduce the overhead of logging writes
- Solution: grab the data from the page cache during recovery (see the sketch below)
[Figure: three panels, Before Crash, During Recovery, and After Recovery, tracing write(fd, buf, offset, count) through the VFS, file system, and page cache; recovery retrieves the written data directly from the page cache]
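A minimal user-space model of page stealing with hypothetical names (page_cache, app_write, replay_write_after_crash): the log keeps only the write's offset and count, and recovery "steals" the payload that already resides in the page cache.

    /* Hypothetical one-page "page cache"; illustrative only. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct page { char data[PAGE_SIZE]; int dirty; };

    static struct page page_cache;   /* stands in for the page at file offset 0 */

    static void app_write(const char *buf, size_t len)
    {
        if (len > PAGE_SIZE)
            len = PAGE_SIZE;
        memcpy(page_cache.data, buf, len);   /* data lands in the page cache */
        page_cache.dirty = 1;                /* the log records only (offset, count) */
    }

    static void replay_write_after_crash(void)
    {
        /* Recovery re-issues the logged write, pulling the payload from the
         * surviving page-cache page rather than from the log itself. */
        printf("replaying write with stolen page contents: \"%s\"\n",
               page_cache.data);
    }

    int main(void)
    {
        app_write("hello, membrane", strlen("hello, membrane") + 1);
        replay_write_after_crash();
        return 0;
    }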
18. Handling Non-Determinism
19. Skip/Trust Unwind Protocol
20. Evaluation
21. OpenSSH Benchmark
22. Postmark Benchmark
23. Recovery Time
- Restart ext2 during a random-read microbenchmark
24. Recovery Time (Cont.)

    Data (MB)       Recovery Time (ms)
    10              12.9
    20              13.2
    40              16.1

    Open Sessions   Recovery Time (ms)
    200             11.4
    400             14.6
    800             22.0

    Log Records     Recovery Time (ms)
    200             11.4
    400             14.6
    800             22.0
25. Advantages
- Improves tolerance to file-system failures
- Builds trust in new file systems (e.g., ext4, btrfs)
- Quick-fix bug patching
  - Developers can transform corruptions into restarts
  - Restart instead of extensive code restructuring
- Encourages more integrity checks in FS code
  - Assertions can be seamlessly transformed into restarts
- File systems become more robust to failures/crashes
26. Limitations
- Only tolerates fail-stop failures
  - Not address-space based
  - Faults could corrupt other kernel components
- FS restart may be visible to applications
  - e.g., inode numbers could change after a restart
[Figure: inode mismatch example. Before the crash (Epoch 0), the application's create(file1) is assigned inode 15 by the file system; after crash recovery, the replayed create(file1) is assigned inode 12, so subsequent stat(file1) and write(file1, 4k) calls see a different inode number]
27. Conclusions
- Failures are inevitable in file systems
  - Learn to cope with them, not hope to avoid them
- Generic recovery mechanism for FS failures
  - Improves FS reliability and availability of data
- Users: install new FSes with confidence
- Developers: ship FSes faster, as not all exception cases are now show-stoppers
28. Thank You!
Advanced Systems Lab (ADSL), University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl