The Design and Implementation of a Log-Structured File System

1
The Design and Implementation of a Log-Structured
File System
  • Mendel Rosenblum, John Ousterhout
  • Presented by Ali Bakhoda, Ivan Sham

2
Motivation for the LFS
  • CPU speeds increased drastically
  • Disk access time did not
  • Applications becoming disk-bound
  • Reads handled by main memory
  • Disk traffic dominated by writes
  • But placement optimized for reads
  • Solution
  • Focus on performance of small file writes
  • Write new data sequentially to a log
  • Eliminates almost all seeks

3
Problems of Current File Systems
  • Many random accesses
  • Inodes separate from file
  • FFS requires 5 seeks for create
  • Poor bandwidth utilization
  • Synchronous metadata writes
  • Metadata updates dominate traffic
  • Bad for crash recovery
  • Must scan entire disk

4
Log-Structured File System
  • Improves write performance
  • Buffer disk writes
  • Write sequentially
  • Eliminate seeks for random access
  • Allocate a new version instead of updating the
    old file in place
  • Nearly 100% bandwidth utilization
  • Asynchronous writes
  • Two challenges for LFS
  • How to retrieve information?
  • How to manage free space?

5
Sprite LFS
  • Goals
  • Efficient small file writes
  • Match FFS on reads/large file writes
  • Based on Unix FFS
  • Uses segments and segment cleaner
  • Developed for Sprite OS in 1991
  • Sprite has been dead since 1994
  • Other LFS implementations exist

6
Information Retrieval
  • Does not scan whole log
  • inodes (same as Unix FFS)
  • File attributes, block addresses
  • inode map
  • Allow inodes to be written to log
  • Written out to log
  • Locations stored in fixed checkpoint regions
  • Unique ID for file is key
  • Active portions cached
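
The lookup path on this slide (inode map, then inode, then data blocks) can be illustrated with a toy in-memory log. This is a minimal sketch, not Sprite LFS code: `ToyLog`, `write_file`, and `read_file` are hypothetical names, the "disk" is a Python list, and an address is simply a list index.

```python
class ToyLog:
    """Toy append-only log with an inode map (illustrative only)."""

    def __init__(self):
        self.log = []          # the "disk": entries are only ever appended
        self.inode_map = {}    # inode number -> log address of newest inode

    def _append(self, entry):
        self.log.append(entry)
        return len(self.log) - 1   # address of the entry just written

    def write_file(self, inum, blocks):
        # New data blocks go to the end of the log, never in place.
        addrs = [self._append(("data", b)) for b in blocks]
        # The inode (holding the block addresses) is also written to the log.
        inode_addr = self._append(("inode", inum, addrs))
        # The inode map records where the newest inode lives.
        self.inode_map[inum] = inode_addr

    def read_file(self, inum):
        # Lookup: inode map -> inode -> data blocks; no log scan is needed.
        _, _, addrs = self.log[self.inode_map[inum]]
        return [self.log[a][1] for a in addrs]


fs = ToyLog()
fs.write_file(7, ["a", "b"])
fs.write_file(7, ["a", "c"])   # overwrite: a new version is appended
latest = fs.read_file(7)       # reads only the newest version
```

The first version of file 7 is still in the log after the overwrite, but nothing points to it any more; that dead space is what the segment cleaner later reclaims.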

7
Free space management
  • Goal: keep large extents of free space available for
    writing new data
  • LFS solution: divide the disk into fixed-length
    segments (512 kB or 1 MB)
  • Each segment written sequentially
  • Only empty segments can be written
  • Large enough to make seek time negligible
  • Older segments get fragmented meanwhile

8
Segment cleaning
  • Segment cleaning: the process of copying live
    data out of a segment
  • Read a number of segments into memory
  • Identify live data
  • Write only the live data back to a smaller number
    of clean segments

9
Segment Cleaning (cont.)
  • Basic mechanism
  • Segment summary block identifies information
  • For file block, file number and relative block
    number
  • Liveness can be determined by checking inode
  • Uid (inode number, version) helps to avoid some
    of these checks
  • One consequence: no free list, bitmap, or B-tree
  • Saves memory and disk space
  • Simplifies crash recovery
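
The liveness check described above can be sketched as follows. This is a simplified illustration under stated assumptions: `clean_segment` is a hypothetical helper, the segment summary is modeled as a list of tuples, and the inode lookup is modeled as a dictionary from (file number, block number) to the block's current address. The uid/version shortcut is left out.

```python
def clean_segment(summary, blocks, inode_block_addr):
    """Return the live blocks of one segment.

    summary: list of (file_number, block_number, address), one entry per
             block, standing in for the segment summary block.
    blocks:  dict mapping address -> block contents for this segment.
    inode_block_addr: dict mapping (file_number, block_number) -> current
             address, standing in for a lookup through the file's inode.
    """
    live = []
    for file_no, block_no, addr in summary:
        # A block is live only if the inode still points at this address.
        if inode_block_addr.get((file_no, block_no)) == addr:
            live.append((file_no, block_no, blocks[addr]))
    return live


summary = [(1, 0, 100), (1, 1, 101), (2, 0, 102)]
blocks = {100: "x", 101: "y", 102: "z"}
# File 1 block 1 was rewritten elsewhere (address 250); file 2 was deleted.
inode_block_addr = {(1, 0): 100, (1, 1): 250}
live = clean_segment(summary, blocks, inode_block_addr)
```

Only the live blocks would be written back to a clean segment; the whole old segment then becomes free without any free list or bitmap being consulted.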

10
Cleaning policies
  • Four policy issues
  • When to clean?
  • Continuously, or when a threshold is reached?
  • How many segments to clean?
  • The more segments, the more opportunities for
    arranging data
  • Which segments to clean?
  • Most fragmented ones?
  • How to group live blocks while cleaning?
  • Arrange by directory, arrange by age?
  • Focus on latter two issues
  • Simple thresholds used for former two

11
The Write cost metric
  • A way of comparing cleaning policies
  • Intuition: average time the disk is busy per byte of
    new data written
  • Definition
  • includes cleaning overhead
  • Depends on utilization
  • Large segments - seek and rotational latency
    negligible
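
For reference, the Rosenblum and Ousterhout paper defines write cost as follows, where u (with 0 < u < 1) is the utilization of the segments being cleaned and N, the number of segments the cleaner reads, is only a bookkeeping symbol that cancels out:

```latex
\text{write cost}
  = \frac{\text{total bytes read and written}}{\text{new data written}}
  = \frac{N + N u + N (1 - u)}{N (1 - u)}
  = \frac{2}{1 - u}
```

The three terms in the numerator are reading the segments, writing back their live data, and writing the new data that fills the reclaimed space. For example, cleaning segments at u = 0.8 gives a write cost of 10. A completely empty segment (u = 0) need not be read at all, so its write cost is 1.0 rather than 2.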

12
Cleaning policies
Low utilization, low write cost
  • Underutilized disk gives low write cost, but high
    storage cost!
  • But the utilization that matters is that of the
    segments being cleaned (not of the disk overall)
  • Achieve a bimodal distribution: keep most segments
    nearly full, but a few nearly empty

13
Achieving bimodal distribution
  • First attempt (greedy policy)
  • Always choose segment with lowest utilization
  • Sorts live blocks by age before rewriting
  • FAILURE!
  • Bimodal cleaning policy
  • Intuition: free space in cold (more stable)
    segments is more valuable
  • Assumption: stability of a segment is proportional
    to the age of its youngest block (i.e., older means colder)
  • Implementation: cost/benefit analysis
  • Clean the segments with the highest benefit-to-cost ratio
  • Still group by age before rewriting
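
The cost/benefit selection can be sketched with the ratio from the paper, benefit/cost = (1 - u) * age / (1 + u), where u is the segment's utilization. The helper names, the sample segments, and the age units below are assumptions made for illustration only.

```python
def benefit_to_cost(utilization, age):
    """(1 - u) * age / (1 + u).

    Benefit: free space gained (1 - u), weighted by how long it is
    likely to stay free (age of the youngest block).
    Cost: reading the whole segment (1) plus writing back live data (u).
    """
    return (1.0 - utilization) * age / (1.0 + utilization)


def pick_segments(segments, count):
    """segments: list of (name, utilization, age_of_youngest_block)."""
    ranked = sorted(segments,
                    key=lambda s: benefit_to_cost(s[1], s[2]),
                    reverse=True)
    return [name for name, _, _ in ranked[:count]]


segments = [
    ("hot", 0.30, 1.0),     # emptier, but modified very recently
    ("cold", 0.75, 20.0),   # fuller, but stable for a long time
]
chosen = pick_segments(segments, 1)
```

With these sample numbers the cold segment wins even though it is much fuller, whereas a greedy policy that looks only at utilization would pick the hot one.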

14
Effects of Cost/Benefit Policy
  • Cold segments cleaned at 75% utilization
  • Hot segments cleaned at 15% utilization
  • Implementation supported by segment usage table
  • Number of live bytes, most recent modification
    time

15
Crash Recovery
  • Disk may be in inconsistent state
  • New file created, directory not updated
  • Traditional Unix
  • Scan all metadata
  • Very slow, and getting slower
  • Log-structured FS
  • Scan end of log
  • Sprite LFS: checkpoints and roll-forward

16
Checkpoints
  • All FS structures are consistent
  • 2 steps to create a checkpoint
  • Write out all modified info to log
  • File data blocks
  • Indirect blocks
  • Inodes, and inode maps
  • Segment usage tables
  • Write checkpoint region

17
Checkpoint regions
  • Contains
  • addresses of all blocks in inode map
  • segment usage table
  • current time
  • pointer to last segment written
  • Two of them, for safety
  • Timestamp is written last; recovery uses the region
    with the latest timestamp
  • Located at fixed positions on disk
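
Why two regions plus a timestamp written last is safe can be shown with a small sketch. It assumes that checkpoints alternate between the two regions, and it models each region as a plain dictionary; `pick_checkpoint` and the field names are hypothetical.

```python
def pick_checkpoint(region_a, region_b):
    """Return the region holding the most recent complete checkpoint.

    Because the timestamp is the last thing written, a checkpoint write
    that is interrupted by a crash leaves the old (smaller) timestamp in
    that region, so the other region is chosen instead.
    """
    return max((region_a, region_b), key=lambda r: r["timestamp"])


# Region A holds a complete checkpoint taken at time 100.
region_a = {"timestamp": 100, "last_segment": 41}
# A crash hit while region B was being rewritten: its pointer is new,
# but its timestamp still dates from the checkpoint before A's.
region_b = {"timestamp": 70, "last_segment": 44}

chosen = pick_checkpoint(region_a, region_b)
```

Recovery then starts from the chosen region and, if roll-forward is enabled, replays the log written after it.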

18
Checkpoint Policy
  • Creation
  • Periodically
  • Unmount
  • Shutdown
  • Potential improvement
  • Create after a certain amount of data has been
    written

19
Roll-Forward
  • Scan latest log after crash
  • Recover info, fix inconsistent state
  • Use segment summary block
  • Update inode-map
  • Ignore data blocks without inode
  • Adjust utilizations
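
The roll-forward steps above can be sketched as a replay of segment summaries. This is a deliberately simplified model, not the Sprite LFS algorithm: `roll_forward` is a hypothetical helper, summaries are lists of tuples, utilization adjustment is omitted, and a file's new data blocks are accepted whenever any new inode for that file appears in the replayed log.

```python
def roll_forward(inode_map, summaries):
    """Replay segment summaries written after the last checkpoint.

    inode_map: dict inode number -> inode address, as of the checkpoint.
    summaries: list of segment summaries in log order; each summary is a
        list of ("inode", inum, address) or ("data", inum, address).
    Returns (recovered inode map, addresses of ignored data blocks).
    """
    recovered = dict(inode_map)
    new_inodes = set()
    for summary in summaries:
        for kind, inum, addr in summary:
            if kind == "inode":
                recovered[inum] = addr   # the newest inode in the log wins
                new_inodes.add(inum)
    # Data blocks whose file has no new inode are treated as incomplete.
    ignored = [addr
               for summary in summaries
               for kind, inum, addr in summary
               if kind == "data" and inum not in new_inodes]
    return recovered, ignored


checkpoint_map = {1: 10}
summaries = [
    [("data", 1, 50), ("inode", 1, 51)],   # file 1 fully written
    [("data", 2, 60)],                     # file 2: data only, no inode
]
recovered, ignored = roll_forward(checkpoint_map, summaries)
```

File 1 picks up its new inode, while the orphaned block of file 2 is skipped and its space can be reclaimed later by the cleaner.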

20
Directory entry / inode
  • Might be inconsistent after crash
  • Sprite LFS uses a special record
  • Directory operation log
  • Appears in the log before the corresponding
    directory entry or inode
  • Can recover directory and inodes
  • Can't recover new files with no inode
  • Introduces extra synchronization

21
Experience with Sprite LFS
  • 1 year development
  • Manages 5 partitions / 30 users
  • No roll-forward
  • 30 second checkpoint interval
  • Feels like Unix FFS

22
Microbenchmarks
  • SunOS 4.0.3: 8 kB block
  • Sprite LFS: 4 kB block, 1 MB segment
  • Sun-4/260
  • 16.67 MHz
  • 32 MB RAM
  • Wren IV Disk
  • 1.3 MB/sec (SATA: 3 Gbit/sec)
  • 17.5 ms seek time (Barracuda 8 ms)
  • 300 MB (Barracuda 750 GB)
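
These disk figures also give a feel for why a 1 MB segment makes seek time negligible. The back-of-envelope calculation below is only a sketch: it ignores rotational latency, treats 1 MB as 10^6 bytes, charges one full seek per transfer, and the function name is made up for illustration.

```python
BANDWIDTH_BYTES_PER_SEC = 1.3e6   # Wren IV transfer rate from the slide
SEEK_SEC = 0.0175                 # Wren IV seek time from the slide


def seek_fraction(transfer_bytes):
    """Fraction of total time spent seeking for one seek plus one transfer."""
    transfer_sec = transfer_bytes / BANDWIDTH_BYTES_PER_SEC
    return SEEK_SEC / (SEEK_SEC + transfer_sec)


segment_fraction = seek_fraction(1_000_000)   # one 1 MB segment
block_fraction = seek_fraction(4096)          # one 4 kB block

print(f"1 MB segment: {segment_fraction:.1%} of time spent seeking")
print(f"4 kB block:   {block_fraction:.1%} of time spent seeking")
```

Under these assumptions a seek costs a few percent of the time when writing a whole segment, but most of the time when writing a single 4 kB block.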

23
Microbenchmarks

24
Cleaning Overhead
  • Better performance than simulations
  • Files longer than 1 block
  • Really cold files

25
Miscellaneous Stuff
  • Crash recovery not on production system
  • Calculated recovery time depends on file sizes
    and amount of data
  • High overhead for metadata update
  • 99% of disk usage is file data
  • 13% of traffic is metadata

26
Questions?

27
Discussion
  • Performance
  • Seems to be very specific to load
  • Requires large space
  • Cold start effects, degradation
  • Slowly growing files

28
Discussion
  • Segment Cleaner
  • Cleaning policy
  • Impact on disk life
  • Impact on performance

29
Discussion
  • Is write traffic really dominating?

30
Discussion
  • Temporal vs. Logical Locality
  • Most of the time, they are the same
  • Not the case for file server

31
Discussion
  • Current state
  • CPUs are faster
  • Why isn't this more widely used?
  • Compare to Unix FFS
  • Apply same optimization to FFS
  • Segment size for modern system

32
Discussion
  • Hardware problems
  • Bad sectors
  • Scaling of capacity
  • Combine disk with Flash