Soft Updates (McKusick and Ganger)

1
Soft Updates (McKusick and Ganger)
  • Kenneth Chiu

2
Two Kinds of Data
  • Two kinds of data
  • Metadata
  • Directories, inodes, free block maps, etc.
  • File data
  • Actual data in the file.
  • What kinds of consistency guarantees?
  • No pointers to uninitialized space
  • No multiple resource ownership
  • No unreferenced live resources
  • File data guarantees?

3
Kinds of Failures
  • What kinds of failures are there?
  • Power
  • Bad disk
  • Misbehaving controller, wrong blocks
  • Taxonomy
  • Halting failure
  • Byzantine

4
Traditional Synchronicity
  • Creating a file
  • First initialize inode
  • Then point to it.
  • Alternatives
  • Ignore it
  • Logging/atomic/doing-things-carefully
  • Non-volatile RAM (NVRAM)
  • Soft-updates

5
Soft-Updates
  • Allow block writes to be reordered.
  • But make sure the data that is written is
    consistent with what has been written before.
  • Use additional bookkeeping to coerce the data in
    the block to be consistent.

6
Set Theory
  • Reflexive
  • Irreflexive
  • Symmetric
  • Antisymmetric, asymmetric
  • Transitive

7
Orders
  • Weak
  • Reflexive, antisymmetric, transitive
  • Strict (Strong)
  • Irreflexive, asymmetric, transitive
  • Partial order
  • Not all elements comparable
  • Total order
  • All elements comparable
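
The definitions on the last two slides can be checked mechanically for small relations. The following is a minimal sketch, assuming a relation is given as a set of ordered pairs over a finite set of elements; the function names are illustrative and not part of the presentation.

```python
from itertools import product


def is_reflexive(elems, rel):
    return all((x, x) in rel for x in elems)


def is_irreflexive(elems, rel):
    return all((x, x) not in rel for x in elems)


def is_antisymmetric(rel):
    # (x, y) and (y, x) may both be present only when x == y.
    return all(x == y or (y, x) not in rel for (x, y) in rel)


def is_asymmetric(rel):
    # (x, y) present implies (y, x) absent, which also rules out (x, x).
    return all((y, x) not in rel for (x, y) in rel)


def is_transitive(rel):
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)


def is_total(elems, rel):
    # Every pair of distinct elements is comparable in one direction.
    return all(x == y or (x, y) in rel or (y, x) in rel
               for x, y in product(elems, repeat=2))


def classify(elems, rel):
    """Return (kind, extent) using the definitions from the Orders slide."""
    weak = is_reflexive(elems, rel) and is_antisymmetric(rel) and is_transitive(rel)
    strict = is_irreflexive(elems, rel) and is_asymmetric(rel) and is_transitive(rel)
    kind = "weak" if weak else "strict" if strict else "not an order"
    extent = "total" if is_total(elems, rel) else "partial"
    return kind, extent
```

For example, "less than" on {1, 2, 3} classifies as strict and total, while "divides" on the same set is weak and partial because 2 and 3 are not comparable.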

8
Dependencies
[Figure: dependency graph over operations O1, O2, O3, and O4, shown alongside the updates A ← 3 and B ← 2]
  • Strict or weak? Total or partial?
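
One way to explore the question is to treat the dependencies as a graph and ask which write orders respect it. The sketch below assumes a hypothetical diamond-shaped graph (O1 before O2 and O3, both before O4); the edges in the original figure are not recoverable from the transcript, so these are illustrative only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: operation -> operations that must reach disk first.
deps = {
    "O1": set(),
    "O2": {"O1"},
    "O3": {"O1"},
    "O4": {"O2", "O3"},
}


def valid_order(order, deps):
    """True if every operation appears after all of its prerequisites."""
    done = set()
    for op in order:
        if not deps[op] <= done:
            return False
        done.add(op)
    return True


# One order that satisfies the dependencies; others may exist.
one_order = list(TopologicalSorter(deps).static_order())
```

Under these assumed edges, both O1, O2, O3, O4 and O1, O3, O2, O4 are valid, because O2 and O3 are not comparable. That is the sense in which a dependency relation like this is a partial order rather than a total one.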

9
Execution
[Figure: six panels, Step 1 through Step 6, showing T1, T2, and the operations O1 to O4]

10
Multiple Views
[Figure: invariant A > B; operations A ← 3 and B ← 2; values A1, A3, B0, and B2 shown across the on-disk and in-core views; states S1 to S6]
  • Some invariants to preserve.
  • Some write ordering to preserve those invariants.
  • One view in memory, must be efficient.
  • Another view on disk.
  • May not match.
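
The relationship between an invariant and a write ordering can be made concrete with a small enumeration. This is a sketch under an assumed reading of the slide's labels: the invariant is A > B, the on-disk values are A = 1 and B = 0, and the in-core values are A = 3 and B = 2. That reading is an interpretation, and `safe_orders` is a hypothetical helper.

```python
from itertools import permutations


def safe_orders(on_disk, in_core, invariant):
    """Return the write orders for which the invariant holds after every write."""
    result = []
    for order in permutations(in_core):
        state = dict(on_disk)
        ok = invariant(state)
        for var in order:
            state[var] = in_core[var]      # one variable reaches disk
            ok = ok and invariant(state)   # a crash could happen right here
        if ok:
            result.append(order)
    return result


orders = safe_orders(
    on_disk={"A": 1, "B": 0},
    in_core={"A": 3, "B": 2},
    invariant=lambda s: s["A"] > s["B"],
)
```

With these values only the order A then B is safe. Writing B first would leave A = 1, B = 2 on disk, which violates A > B if the system halts at that point.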

11
Cycles
[Figure: operations A ← 1, B ← 2, and C ← 3, with A, B, and C in a single block]
  • What if all on one disk block? Solve the problem?
    Strict or weak?

12
Induced Cycles
[Figure: operations A ← 1, B ← 2, C ← 3, and D ← 4; in-core blocks {A, D}, {B}, and {C} (false sharing)]
  • Solutions/workarounds?
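
An induced cycle can be reproduced by projecting operation-level dependencies onto the blocks that hold them. The sketch below assumes a hypothetical chain of dependencies (A before B before C before D) and the block layout {A, D}, {B}, {C}; the direction of the arrows in the original figure is not recoverable, so the chain is an assumption.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+


def has_cycle(graph):
    """graph maps each node to the set of nodes that must be written first."""
    try:
        TopologicalSorter(graph).prepare()
        return False
    except CycleError:
        return True


def block_graph(op_deps, block_of):
    """Lift per-operation dependencies to dependencies between whole blocks."""
    graph = {block: set() for block in block_of.values()}
    for op, prerequisites in op_deps.items():
        for pre in prerequisites:
            if block_of[op] != block_of[pre]:
                graph[block_of[op]].add(block_of[pre])
    return graph


# Hypothetical chain: A must be on disk before B, B before C, C before D.
op_deps = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"C"}}
# A and D share a block, which is where the false sharing comes from.
block_of = {"A": "blk1", "B": "blk2", "C": "blk3", "D": "blk1"}
```

Here the operations themselves are acyclic, yet the block graph is cyclic: blk2 waits on blk1 (for A), blk3 waits on blk2, and blk1 waits on blk3 (for D). No whole-block write order satisfies all three constraints.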

13
Undo/Redo
[Figure: operations A ← 1, B ← 2, C ← 3, and D ← 4; in-core blocks {A, D}, {B}, and {C}]
  • Undo (roll-back) before writing.
  • Redo (roll-forward) after writing.
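
The undo/redo idea can be sketched in a few lines. This is a toy model and not the FFS implementation: the `Block` class, the per-field dependency table, and the dict that stands in for the disk are all assumptions made for illustration. Fields whose prerequisites are not yet on disk are rolled back before the write and rolled forward again afterwards.

```python
class Block:
    """An in-core block whose fields may carry write-ordering dependencies."""

    def __init__(self, name, fields):
        self.name = name
        self.incore = dict(fields)  # latest in-memory values
        self.stable = dict(fields)  # last value of each field known to be on disk
        self.deps = {}              # field -> (block, field) that must hit disk first

    def update(self, field, value, after=None):
        self.incore[field] = value
        if after is not None:
            self.deps[field] = after


def write_block(block, disk):
    """Write one block; return the names of the fields that were rolled back."""
    saved = {}
    # Undo: roll back every field whose prerequisite is not on disk yet.
    for field, (dep_block, dep_field) in block.deps.items():
        if disk[dep_block.name][dep_field] != dep_block.incore[dep_field]:
            saved[field] = block.incore[field]
            block.incore[field] = block.stable[field]
    disk[block.name] = dict(block.incore)  # the write itself
    # Fields written with their new value are now stable and dependency-free.
    for field in list(block.incore):
        if field not in saved:
            block.stable[field] = block.incore[field]
            block.deps.pop(field, None)
    # Redo: restore the newer in-core values so memory stays current.
    block.incore.update(saved)
    return sorted(saved)


blk1 = Block("blk1", {"A": 0, "D": 0})
blk2 = Block("blk2", {"B": 0})
blk3 = Block("blk3", {"C": 0})
disk = {b.name: dict(b.incore) for b in (blk1, blk2, blk3)}

blk1.update("A", 1)
blk2.update("B", 2, after=(blk1, "A"))
blk3.update("C", 3, after=(blk2, "B"))
blk1.update("D", 4, after=(blk3, "C"))
```

In this scenario, writing blk1 first puts A = 1 on disk while D is rolled back to 0; blk2 and blk3 can then be written as they are, and a second write of blk1 carries D = 4. The cost of breaking the cycle this way is the extra write of blk1.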

14
Efficiency
[Figure: operations O1, O2, O3, and O4; one block holds O1 and another holds O2, O3, and O4]
Which block should be written first?

15
Three Rules
  • Never point to a structure before it has been
    initialized. (Why?)
  • An inode must be initialized before a directory
    entry references it.
  • Never reuse a resource before nullifying all
    previous pointers to it.
  • An inode's pointer to a data block must be
    nullified before that disk block may be
    reallocated for a new inode.
  • Never reset the last pointer to a live resource
    before a new pointer has been set.
  • When renaming a file, do not remove the old name
    for an inode until after the new name has been
    written.
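
The "why" behind these rules becomes visible if a crash is allowed between any two writes. The following sketch checks the first and third rules against every intermediate on-disk state; the state keys and helper names are hypothetical, and each structure is modeled as a single boolean.

```python
def crash_states(initial, writes):
    """Yield the on-disk state before any write and after each write in turn."""
    state = dict(initial)
    yield dict(state)
    for key, value in writes:
        state[key] = value
        yield dict(state)


def always_holds(initial, writes, invariant):
    """True if the invariant holds at every point where a crash could occur."""
    return all(invariant(s) for s in crash_states(initial, writes))


# Rule 1: never point to a structure before it has been initialized.
create_start = {"inode_initialized": False, "dirent_points_to_inode": False}


def rule1(s):
    return s["inode_initialized"] or not s["dirent_points_to_inode"]


# Rule 3: never reset the last pointer before a new pointer has been set.
rename_start = {"old_name": True, "new_name": False}


def rule3(s):
    return s["old_name"] or s["new_name"]


good_create = [("inode_initialized", True), ("dirent_points_to_inode", True)]
bad_create = list(reversed(good_create))
good_rename = [("new_name", True), ("old_name", False)]
bad_rename = list(reversed(good_rename))
```

In this model the reversed create order leaves a window in which the directory entry points at an uninitialized inode, and the reversed rename order leaves a window in which the file has no name at all.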

16
Previous Solutions
  • Synchronous writes
  • NVRAM
  • Atomic updates
  • Scheduler-enforced ordering
  • Changes to the disk scheduler
  • Interbuffer dependencies
  • Too many synchronous writes to avoid cycles.

17
Characteristics of an Ideal Solution
  • Applications should never wait unless they choose
    to do so.
  • Propagate modified metadata with the minimum number
    of writes (allow coalescing).
  • Minimize memory usage.
  • Write-back code and disk scheduler should not be
    constrained in ordering.
  • Any inherent conflicts?

18
Cyclic Dependency

19
Undo/Redo 1

20
Undo/Redo 2
  • Ordering between add and delete?

21
Undo/Redo 3

22
Soft Updates in FFS
  • Block allocation
  • Block deallocation
  • Link addition
  • Link removal

23
Block Allocation
Initialize block
Set block pointer
Free bitmap
  • Why?

24
Block Deallocation
Clear block pointer
Free bitmap
  • Why?

25
Link Addition
New inode
New directory entry
Free bitmap
  • Why?

26
Link Removal
Decrement inode ref count
Clear directory entry
  • Why?

27
File System Recovery
  • File system can always be used safely.
  • How would you define safely?
  • Possible inconsistencies
  • Unused blocks may not be in free space maps.
  • Unreferenced inodes may not appear in the free
    inode maps.
  • Inode link counts may exceed the actual number of
    directory entries.
  • Fixed up by running a background job.
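
The three inconsistencies listed above are all leaks rather than dangling pointers, so a cleanup pass only has to reclaim resources. The sketch below works on a toy in-memory model (dicts and sets) whose layout is an assumption; it is not the actual background job, and it ignores open-but-unlinked files and the fact that directories are themselves inodes.

```python
from collections import Counter


def background_fixup(all_blocks, inodes, dirents, free_blocks, free_inodes):
    """Reclaim leaked resources in place.

    inodes:  inode number -> {"nlink": int, "blocks": set of block numbers}
    dirents: list of (name, inode number) pairs
    """
    refs = Counter(ino for _name, ino in dirents)

    # Inode link counts may exceed the actual number of directory entries.
    for ino, inode in inodes.items():
        if ino not in free_inodes and inode["nlink"] > refs[ino]:
            inode["nlink"] = refs[ino]

    # Unreferenced inodes may not appear in the free inode map.
    for ino, inode in inodes.items():
        if ino not in free_inodes and inode["nlink"] == 0:
            free_inodes.add(ino)
            inode["blocks"] = set()

    # Unused blocks may not be in the free space map.
    in_use = set().union(*(inode["blocks"] for ino, inode in inodes.items()
                           if ino not in free_inodes))
    free_blocks |= all_blocks - in_use
```

Because every fix only returns something to a free map or lowers a count, the file system remains usable before the pass runs; the cost of delaying it is wasted space, not corruption.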

28
Implementation Complexities
  • fsync()
  • Requires all data and metadata associated with a
    file be written to disk.
  • For example, /a/b/file.
  • Requires a lot of searching and flushing.
  • Memory usage
  • Memory is allocated dynamically.
  • If disk is slow, could grow without bound.
  • Solve by blocking when memory usage is too high.

29
  • Useless write-backs
  • Writing in the wrong order is bad.
  • Modified syncer algorithms responsible for
    flushing to disk.
  • Also modified buffer reclamation code.

30
Performance Evaluation
  • Tested 3 instances
  • No Order
  • Conventional
  • Soft Updates
  • 300 MHz Pentium II, 128 MB of RAM, FreeBSD 4.0.

31
Microbenchmarks
  • Create, read and delete
  • Cold cache

32
Create

33
Delete
  • The drop is due to the indirect block.
  • Claim: soft updates (SU) are faster because the work is done in the background.

34
Read

35
Summary
  • Key ideas
  • Dependencies and how to deal with them.
  • Induced cycles and how to deal with them.
  • Two main approaches
  • Apply the writes in the correct order.
  • Apply the writes in any order, but undo/redo to
    maintain consistency.
  • Which is better?