Log Structured File System - PowerPoint PPT Presentation

Slides: 33
Provided by: joonw
1
Log Structured File System
2
Transactions in File System
  • Main Points
  • reliability from unreliable components
  • concept
  • atomicity: all or nothing
  • durability: once it happens, it is there
  • serializability: transactions appear to happen
    one by one
  • Motivation
  • File Systems have lots of data structures
  • bitmap for free blocks
  • directory
  • file header
  • indirect blocks
  • data blocks
  • for performance reasons, all must be cached
  • read requests are easy
  • what about writes?

3
Transactions in File System
  • Write to cache
  • write-through: the cache is of no help for writes
  • write-back: data can be lost on a crash
  • Multiple updates that belong to one operation
  • what happens if a crash occurs between updates?
  • e.g. 1: move a file between directories
  • delete file from old directory
  • add file to new directory
  • e.g. 2: create a new file
  • allocate space on disk for header, data
  • write new header to disk
  • add the new file to the directory

4
Transactions in File System
  • Unix Approach (ad hoc)
  • meta-data consistency
  • synchronous write-through
  • multiple updates are done in specific order
  • after crash, fsck program fixes up anything in
    progress
  • e.g.
  • file created, but not yet in a directory =>
    delete file
  • blocks allocated, but not in bitmap => update
    bitmap
  • user data consistency
  • write back to disk every 30 seconds or by user
    request
  • no guarantee blocks are written to disk in any
    order
  • no support for transactions
  • user may want multiple file operations done as a
    unit

5
Transactions in File System
  • Write-ahead logging
  • Almost all file systems designed since 1985 use
    write-ahead logging
  • Windows NT, Solaris, OSF, etc.
  • mechanism
  • operation
  • write all changes in a transaction to log
  • send file changes to disk
  • reclaim log space
  • if crash, read the log
  • if the log isn't complete, no change!
  • if the log is completely written, apply all changes
    to disk
  • if the log is empty, then don't worry: all updates
    have already gotten to disk
  • pros & cons
  • + reliability
  • + asynchronous write-behind
  • - all data written twice
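
The write-ahead logging steps above can be sketched as a toy model. This is only an illustration under stated assumptions: the `Disk` class, the `COMMIT` marker, and the function names are hypothetical and not taken from any real file system. A log counts as "completely written" here when its last record is the commit marker.

```python
from dataclasses import dataclass, field

COMMIT = "COMMIT"  # hypothetical marker that closes a transaction in the log


@dataclass
class Disk:
    """Toy disk: `blocks` is the file system proper, `log` is the write-ahead log."""
    blocks: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def log_transaction(disk, changes, crash_before_commit=False):
    """Write all changes of one transaction to the log, then a commit record."""
    for block_id, value in changes.items():
        disk.log.append((block_id, value))
    if not crash_before_commit:
        disk.log.append(COMMIT)


def apply_log(disk):
    """Send the logged changes to their home locations, then reclaim log space."""
    for record in disk.log:
        if record != COMMIT:
            block_id, value = record
            disk.blocks[block_id] = value
    disk.log.clear()


def recover_from_log(disk):
    """After a crash: redo a complete log, discard an incomplete one."""
    if disk.log and disk.log[-1] == COMMIT:
        apply_log(disk)    # log completely written: apply all changes
    else:
        disk.log.clear()   # log incomplete or empty: no change to the disk
```

Moving a file between directories becomes one transaction: a crash before the commit record leaves both directories untouched, and a crash after it redoes both updates.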

6
Log-Structured File Systems
  • Idea
  • write data once
  • log is the only copy of the data
  • as you modify disk blocks, store them in log
  • put everything (data blocks, file headers, etc.) on
    the log
  • Data fetch
  • if need to get data from disk, get it from the
    log
  • keep map in memory
  • tells you where everything is
  • map should be in the log for crash recovery
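
A minimal sketch of this idea, assuming a Python list stands in for the on-disk log and a dictionary for the in-memory map. The class `TinyLFS` and its record layout are hypothetical and chosen only for illustration.

```python
class TinyLFS:
    """Toy log-structured store: the log is the only copy of the data."""

    def __init__(self):
        self.log = []        # append-only list standing in for the disk
        self.block_map = {}  # in-memory map: block id -> position in the log

    def write(self, block_id, data):
        """Modified blocks are never updated in place; they are appended to the log."""
        self.log.append(("data", block_id, data))
        self.block_map[block_id] = len(self.log) - 1

    def read(self, block_id):
        """The map says where the newest copy of a block lives in the log."""
        _, _, data = self.log[self.block_map[block_id]]
        return data

    def write_map(self):
        """Put a copy of the map in the log so it survives a crash."""
        self.log.append(("map", None, dict(self.block_map)))

    @classmethod
    def recover(cls, log):
        """Rebuild the in-memory map from the last map record found in the log."""
        fs = cls()
        fs.log = list(log)
        for kind, _, payload in reversed(fs.log):
            if kind == "map":
                fs.block_map = dict(payload)
                break
        return fs
```

Overwriting a block leaves the old copy in the log as garbage, which is why a log-structured design needs the space management discussed later.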

7
Log-Structured File Systems
  • Advantage
  • all writes are sequential!!
  • no seeks, except for reads
  • reads can be handled by cache
  • cache is getting bigger
  • in the extreme case, disk I/O is needed only for
    writes, which are sequential
  • same problems as contiguous allocation
  • many files are deleted in the first 5 minutes
  • need garbage collection
  • if disk fills up, problem!!
  • keep disk under-utilized

8
Log-Structured File Systems
  • Mechanism
  • Issues for implementing the log
  • how to retrieve information from the log
  • enough free space for the log
  • Cache file changes, then write them sequentially to
    the disk in a single operation
  • fast writes
  • Information retrieval
  • inode map at a fixed checkpoint region
  • indices to inodes contained in the write
  • most of them are cached in memory
  • fast reads

9
Log Examples
[Figure: on-disk layouts. LFS log: data, i-node, dir, i-node, data, i-node, dir, i-node, map. FFS: i-node, i-node, data, dir, i-node, i-node, dir, data.]
  • In FFS, each inode is at a fixed location on disk
  • an index into the inode set is sufficient to find
    it
  • in LFS, a map is needed to locate an inode, since
    inodes are mixed with data on the log

10
Log-Structured File Systems
  • Space management
  • holes left by deleting files
  • threading
  • use the dispersed holes like a linked list
  • fragmentation will get worse
  • copying
  • copy files out of the log to leave a large hole
  • expensive especially for long-lived files
  • Segment
  • Concept
  • clean segments are linked (threading)
  • segments with holes may be copied into a clean
    segment
  • collect long-lived files into the same segment
  • segment cleaning policy
  • when? low watermark for clean segments
  • how many segments? high watermark
  • which segments? - most fragmented
  • how to group files?
  • files in the same directory
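
The when / how many / which questions of the cleaning policy can be sketched as one function. This is a simplified illustration: the function name and the utilization table are assumptions, and it counts every cleaned segment as one reclaimed segment, ignoring the space that the copied live data consumes.

```python
def segments_to_clean(utilization, clean_count, low_watermark, high_watermark):
    """Return ids of segments to clean, most fragmented first.

    utilization: dict mapping segment id -> fraction of live data (0.0 to 1.0)
                 for every segment that currently holds data.
    clean_count: number of clean segments available right now.
    """
    if clean_count >= low_watermark:
        return []                           # when? only below the low watermark
    needed = high_watermark - clean_count   # how many? up to the high watermark
    most_fragmented_first = sorted(utilization, key=utilization.get)
    return most_fragmented_first[:needed]   # which? the least-utilized segments
```

For example, with utilizations `{"s1": 0.9, "s2": 0.1, "s3": 0.5, "s4": 0.3}`, two clean segments, a low watermark of 3, and a high watermark of 4, the function selects `s2` and `s4`.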

11
Log-Structured File Systems
  • Recovery
  • checkpoints and roll-forward (NOT a roll-back!!)
  • possible since all the file operations are in the
    log
  • checkpoint
  • a point in the log at which the file system is
    consistent and complete
  • contains
  • address of inode maps
  • segment usage table
  • current time
  • checkpoint region
  • contains checkpoint
  • placed at a specific location on disk
  • operation
  • 1. write out all modified information to disk
  • 2. write out checkpoint region
  • on a crash,
  • roll forward through the operations logged after
    the last checkpoint
  • if the crash occurs while writing a checkpoint,
  • keep old checkpoint
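
Checkpointing and roll-forward can be sketched on a toy log. The record layout (`("data", inode_no)` and `("inode", inode_no)` tuples), the dictionary standing in for the disk, and the function names are assumptions made for illustration. The handling of a crash during the checkpoint write itself is not modeled.

```python
def take_checkpoint(disk, inode_map, now):
    """Write the checkpoint only after all modified information is in the log."""
    disk["checkpoint"] = {
        "inode_map": dict(inode_map),   # stands in for the addresses of the inode map
        "log_end": len(disk["log"]),    # everything before this point is covered
        "time": now,
    }


def roll_forward(disk):
    """Start from the last checkpoint, then scan the log records written after it."""
    checkpoint = disk["checkpoint"]
    inode_map = dict(checkpoint["inode_map"])
    for position in range(checkpoint["log_end"], len(disk["log"])):
        kind, inode_no = disk["log"][position]
        if kind == "inode":             # a newer inode was logged after the checkpoint
            inode_map[inode_no] = position
        # data records with no later inode are simply never referenced
    return inode_map
```

Nothing is undone during recovery: records after the checkpoint are only added to the recovered state, which is why this is a roll-forward rather than a roll-back.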

12
Roll-Forward
  • Recover as much information as possible
  • the segment summary block may show
  • a new inode: its data blocks must precede it in
    the log, so just update the inode map
  • data blocks without an inode: ignore them, since we
    don't know whether the data blocks are complete
  • Each inode has a counter indicating how many
    directories refer to it
  • the reference counter may be updated, but the
    directory not written
  • the directory may be written, but the reference
    counter not updated
  • LFS employs a special write-ahead log for directory
    changes

13
Informed Prefetching and Caching
  • Instructor Joonwon Lee

14
Introduction
  • Prefetching
  • memory prefetching (to cache memory)
  • more about the issues in architecture
  • too fast to be controlled by some intelligence
  • disk prefetching (to memory buffer)
  • disk latency is orders of magnitude larger
  • Pros & cons of prefetching
  • reduce latency when the prefetched data is
    accessed
  • file cache may be wasted if the prefetched data
    is unused
  • difficult to know when the prefetched data will
    be used
  • interference with other cached data and virtual
    memory is difficult to understand
  • Assumptions
  • disk parallelism is underutilized
  • file performance is getting more important with
    faster CPU
  • applications provide hints

15
Limits of RAID
  • RAID increases disk throughput when the workload
    can be processed in parallel
  • very large accesses
  • multiple concurrent accesses
  • Many real I/O workloads are not parallel
  • get a byte from a file
  • think
  • get another byte from (the same or another) file
  • access only a single disk at a time

16
Real I/O Workload
  • Recent trends
  • faster CPUs generate I/O requests more often
  • programs favor larger data objects
  • file cache hit ratio is more important than
    before
  • Most of the workload is reads
  • writes can be done behind, in parallel
  • reads block the application
  • most access patterns are predictable
  • Let's use the predictability as hints

17
Overview of Informed Prefetching
  • Application discloses its future resource
    requirements
  • the system makes the final decisions, not
    applications
  • Disclosing hints are issued through ioctl
  • file specifier
  • file name or file descriptor
  • pattern specifier
  • sequential
  • list of <offset, length>
  • What to do with the disclosing hints
  • parallelize the I/O requests for RAID
  • keep the data in the cache
  • schedule disk to reduce seek time

18
Informed Cache Manager Schematic
19
A System Model
  • $T_{disk}$: latency of the disk fetch
  • $T_{driver}$: buffer allocation, queueing at the
    driver, and interrupt service

20
Benefit of a buffer
  • $T_{stall}(x)$: read stall time when there are x
    buffers for x prefetches
  • $T_{pf}(x)$: service time for a hinted read when
    there are x buffers

- benefit of using one more buffer
[Figure: timeline of issue(a) and use(a) with x buffers; labels: > x($T_{cpu}$ + $T_{hit}$ + $T_{driver}$), $T_{stall}$.]
21
Stall time for a disk access
  • At worst, it takes $T_{disk}$
  • at best (all cache hits, no stall), the CPU takes
    x($T_{cpu}$ + $T_{hit}$ + $T_{driver}$) before it
    generates the x-th request
  • prefetch horizon P($T_{cpu}$): the distance at which
    $T_{stall}$ becomes zero, i.e., there is no need to
    prefetch beyond this point
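
The two bounds above combine into a simple estimate: a block prefetched x accesses ahead stalls for at most the disk latency minus the best-case time the CPU spends on the x intervening accesses. The sketch below assumes exactly that bound; the function and parameter names are illustrative, and all times share one unit (for example milliseconds).

```python
import math


def stall_time(x, t_disk, t_cpu, t_hit, t_driver):
    """Upper bound on the stall for a block prefetched x accesses in advance."""
    overlap = x * (t_cpu + t_hit + t_driver)  # best-case time before the x-th request
    return max(0.0, t_disk - overlap)


def prefetch_horizon(t_disk, t_cpu, t_hit, t_driver):
    """Smallest prefetch depth at which the stall bound reaches zero."""
    return math.ceil(t_disk / (t_cpu + t_hit + t_driver))
```

With a disk latency of 15 and 2 time units of CPU, hit, and driver work per access, prefetching 3 accesses ahead still leaves a stall of up to 9, while the bound reaches zero at a depth of 8.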

22
What really happens
  • 3 buffers are assumed
  • so,

23
Benefit of a single buffer
  • When used for prefetching
  • When used for demand miss
  • ?

24
Model Verification
  • The model underestimates the stall time due to
  • neglecting disk contention
  • variation in disk service time (queueing effect)
  • overall, it is a good estimator

25
Cost of Shrinking LRU buffer cache
  • hit ratio H(n) for file cache with n buffers
  • service time
  • cost for taking a buffer from the file cache
  • H(n) varies with workload
  • need dynamic run time monitoring
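
The formulas for this slide did not survive in the transcript. The sketch below assumes the usual expected-value form: service time is the hit ratio times the hit time plus the miss ratio times the miss time, and the cost of giving up one buffer is the resulting increase in service time. The names `t_hit`, `t_miss`, and `hit_ratio` are assumptions, with `hit_ratio` being any callable that returns H(n).

```python
def lru_service_time(n, hit_ratio, t_hit, t_miss):
    """Expected service time per access for an LRU cache with n buffers."""
    h = hit_ratio(n)
    return h * t_hit + (1.0 - h) * t_miss


def cost_of_shrinking(n, hit_ratio, t_hit, t_miss):
    """Increase in expected service time when the cache shrinks from n to n-1 buffers."""
    return (lru_service_time(n - 1, hit_ratio, t_hit, t_miss)
            - lru_service_time(n, hit_ratio, t_hit, t_miss))
```

Because H(n) varies with the workload, `hit_ratio` would in practice be backed by run-time measurements rather than a fixed table.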

26
Cost for Ejecting a Prefetched Block
  • cost is paid when the ejected block is accessed
    again later
  • cost when that block is prefetched x accesses
    in advance
  • $T_{stall}$ can be zero beyond the prefetch horizon
  • ejection frees one block for y-x accesses
  • increase in service time per access is

[Figure: timeline with labels y, x, prefetch, eject, reaccess, and the region affected by eviction.]
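
The expression for the per-access increase is missing from the transcript. A hedged reconstruction: assume that bringing the ejected block back costs one driver overhead plus the stall for a prefetch issued x accesses ahead, and that this cost is spread over the y-x accesses for which the buffer was free. The function name and the `stall_time_at` callable are assumptions.

```python
def eject_cost_per_access(x, y, t_driver, stall_time_at):
    """Extra service time per access caused by ejecting a block early.

    x: how many accesses in advance the block is prefetched back.
    y: how many accesses after the ejection the block is accessed again.
    stall_time_at: callable returning the stall time for a prefetch depth.
    """
    if y <= x:
        raise ValueError("the block must be re-accessed after it can be prefetched back")
    refetch_cost = t_driver + stall_time_at(x)  # assumed cost of fetching it again
    return refetch_cost / (y - x)               # spread over the accesses that gained a buffer
```

When x is at or beyond the prefetch horizon the stall term is zero, so only the driver overhead remains to be amortized.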
28
Seeking Global Optimum
  • Normalization of each estimate: LRU, hinted
    prefetch
  • multiply each with its usage rate
  • unhinted demand access rate for the LRU cache
    estimate ($T_{LRU}$)
  • access rate to the hinted sequence for the hinted
    prefetch estimate ($T_{PF}$)
  • When a manager needs a new block
  • each estimator selects the least valuable block
  • hint: the block that is accessed furthest in the
    future
  • LRU: the block at the bottom of the LRU stack
  • the manager selects the least valuable block
  • compare the benefit with the cost of least
    valuable block
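
The selection step can be sketched as a comparison in a common currency. The sketch assumes each estimator has already normalized its cost by its usage rate; the function names and the shape of the `candidates` table are hypothetical.

```python
def pick_victim(candidates):
    """Pick the globally least valuable block.

    candidates: dict mapping estimator name -> (block id, normalized cost of
                giving that block up), one nomination per estimator.
    """
    name = min(candidates, key=lambda k: candidates[k][1])
    block, cost = candidates[name]
    return name, block, cost


def block_to_reallocate(benefit, candidates):
    """Return the block to take if the benefit of a new buffer exceeds its cost."""
    _, block, cost = pick_victim(candidates)
    return block if benefit > cost else None
```

For example, if the LRU estimator nominates block `b7` at cost 0.4 and the hint estimator nominates `b3` at cost 0.1, a request whose benefit is 0.25 takes `b3`, while a request whose benefit is 0.05 takes nothing.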

29
Real World Estimator
  • LRU Cache hit ratio
  • can be measured in the cache, but the cache size
    varies over time, so use ghost buffers to
    measure up to the maximum size of the cache
  • keeping history of each cache block is too much
    work
  • use segment
  • Use system wide prefetch horizon
  • upper bound
  • For $T_{eject}$, assume prefetch happens at the
    prefetch horizon
  • assuming

30
After 4 Years,
  • Providing hints is too much burden to programmers
  • Automatic hints generation is desired
  • there are idle CPU times when program blocks for
    I/O
  • speculative execution can provide hints for
    future I/O accesses
  • Approaches made
  • a kernel thread performs the speculative
    execution
  • this speculating thread shares the address space
  • Issues
  • run time overhead
  • incorrectness
  • may affect the correctness of the results
  • incorrect hints may waste I/O bandwidth

31
Ensuring Program Correctness
  • Software copy-on-write
  • prevents code/data distortion
  • for each new write to a memory region, make a
    copy
  • insert code at every load/store to check if it is
    to a copied region
  • software fault isolation
  • code is inserted into a copy of the code (shadow
    code)
  • original code is not changed, so no overhead for
    normal execution
  • Generates no system call
  • system state is not changed by the speculative
    execution
  • Signal handler
  • catches all exceptions that may disturb normal
    execution

32
Generating Correct and Timely Hints
  • Problems
  • the speculating thread may lag behind, generating
    stale hints
  • the speculating thread may stray from the
    execution path
  • How to detect the problems
  • the original thread checks the hint log prepared
    by the speculating thread
  • if it is wrong, the original thread prepares a
    copy of its register set and sets a flag
  • when the speculating thread is invoked, it checks
    the flag
  • if set, it restarts using the saved register set