A Low-Bandwidth Network File System

Slides: 25
Provided by: jeha1
Learn more at: https://www2.cs.uh.edu

1
A Low-Bandwidth Network File System
  • A. Muthitacharoen, MIT
  • B. Chen, MIT
  • D. Mazieres, NYU

2
Key Ideas
  • A network file system for slow or wide-area
    networks
  • Exploits similarities between files or versions
    of the same file
  • Avoids sending data that can be found in the
    server's file system or the client's cache
  • Also uses conventional compression and caching
  • Requires 90% less bandwidth than traditional
    network file systems

3
Working on slow networks
  • Make local copies
  • Must worry about update conflicts
  • Use remote login
  • Only for text-based applications
  • Use LBFS instead
  • Better than remote login
  • Must deal with issues like auto-saves blocking
    the editor for the duration of transfer

4
LBFS
  • Exploits cross-file similarities, especially with
    previous versions of the same file
  • Auto-save files
  • LBFS file server divides the files it stores
    into chunks and indexes the chunks by hash value
  • LBFS client similarly indexes a large persistent
    file cache
  • LBFS never transfers chunks that the recipient
    already has
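
The idea of never transferring a chunk the recipient already holds can be sketched as follows. This is a minimal illustration, not the LBFS wire protocol: the names chunk_key and plan_transfer are hypothetical, and the recipient's index is modeled as a plain set of SHA-1 digests.

```python
import hashlib


def chunk_key(chunk):
    """Identify a chunk by its SHA-1 digest (hex string)."""
    return hashlib.sha1(chunk).hexdigest()


def plan_transfer(chunks, recipient_keys):
    """Split chunks into hash-only references and data that must be sent.

    recipient_keys is the set of chunk keys the other side already indexes
    (hypothetical stand-in for the server index or the client cache).
    """
    refs, payload = [], []
    for chunk in chunks:
        key = chunk_key(chunk)
        if key in recipient_keys:
            refs.append(key)       # recipient can rebuild this chunk locally
        else:
            payload.append(chunk)  # only new data crosses the network
    return refs, payload


# The recipient holds the previous version; only one chunk changed.
old_version = [b"header", b"body v1", b"footer"]
new_version = [b"header", b"body v2", b"footer"]
recipient_keys = {chunk_key(c) for c in old_version}
refs, payload = plan_transfer(new_version, recipient_keys)
print(len(refs), "chunks referenced by hash;", len(payload), "chunk sent in full")
```

In this toy case two of the three chunks travel as short hashes and only the modified chunk is sent in full.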

5
Previous Work (I)
  • AFS Callbacks require server to notify clients
    when a cached file has been modified
  • Leases achieve same goal but have an expiration
    time
  • Coda supports slow networks and even disconnected
    operation
  • Defers some updates to save bandwidth
  • OceanStore applies Bayou's conflict resolution
    mechanisms to a file system

6
Previous Work (II)
  • Operation-based updates (Lee et al.)
  • Proxy-client close to the server duplicates
    client computations in the hope of duplicating
    its output files
  • Spring and Wetherall propose to use two large
    cooperating caches storing identical copies of
    the last n megabytes of network traffic
  • Rsync uses directory tree mirroring at client and
    server.

7
LBFS
  • LBFS provides close-to-open consistency
  • Similar to AFS session consistency
  • LBFS assumes clients will have a cache large
    enough to contain a user's entire working set of
    files
  • When possible, LBFS reconstitutes files using
    chunks of existing data in the file system and
    client cache instead of transmitting those chunks
    over the network

8
Indexing Issues
  • Major challenge is keeping the index a
    reasonable size while dealing with shifting
    offsets
  • Indexing conventional file blocks would not work
  • Indexing and hashing overlapping file blocks at
    all offsets would require too much space
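
A small experiment can illustrate why indexing conventional fixed-size blocks does not survive shifting offsets. The sketch below assumes a 4 KB block size and random test data, both chosen only for illustration: inserting a single byte at the front of a file moves every later block boundary, so no block hash of the edited file matches the original.

```python
import hashlib
import random

BLOCK = 4096  # hypothetical fixed block size


def fixed_block_hashes(data, block=BLOCK):
    """Hash every aligned, non-overlapping block of the data."""
    return {hashlib.sha1(data[i:i + block]).digest()
            for i in range(0, len(data), block)}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = b"X" + original  # one byte inserted at offset 0 shifts everything

shared = fixed_block_hashes(original) & fixed_block_hashes(edited)
print("blocks in common after a 1-byte insertion:", len(shared))
```

Although the two files differ by one byte, the fixed-block index finds nothing in common, which is the problem content-based boundaries are meant to avoid.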

9
LBFS Solution
  • Considers only non-overlapping chunks of files
  • Sets chunk boundaries based on file contents to
    avoid sensitivity to shifting file offset
  • Examines every overlapping 48-byte region of the
    file to select boundary regions, or
    breakpoints, using Rabin fingerprints
  • Expected chunk size is 8 KB plus the size of the
    48-byte breakpoint window
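
The content-defined chunking described above can be sketched as follows. This is an illustration under stated assumptions, not the LBFS implementation: a simple polynomial rolling hash stands in for Rabin fingerprints, and BASE, MOD, and MAGIC are hypothetical values. A window becomes a breakpoint when the low 13 bits of its hash equal MAGIC, which gives an expected chunk size of about 2^13 = 8 KB if the hash is well distributed.

```python
import random

WINDOW = 48            # breakpoint window size in bytes (from the slide)
MASK = (1 << 13) - 1   # test the low 13 bits of the window hash
MAGIC = 0x78           # hypothetical breakpoint value; any fixed 13-bit value works
BASE = 257             # hypothetical parameters of the stand-in rolling hash
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunks(data):
    """Split data into non-overlapping, content-defined chunks."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # shift in the newest byte
        if i + 1 >= WINDOW and (h & MASK) == MAGIC:
            out.append(data[start:i + 1])            # this window ends a chunk
            start = i + 1
    if start < len(data):
        out.append(data[start:])                     # trailing partial chunk
    return out


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
INSERT_AT = 100_000
edited = original[:INSERT_AT] + b"inserted text" + original[INSERT_AT:]

old, new = chunks(original), set(chunks(edited))
reused = sum(1 for c in old if c in new)
print(f"{reused} of {len(old)} original chunks reappear unchanged")
```

Because each boundary depends only on the 48 bytes that precede it, an insertion can disturb only the chunks near the edit. Every chunk that ends before the insertion point, or starts at least one window after it, is reproduced byte for byte.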

10
Handling Insertions

11
More Indexing Issues
  • Pathological cases
  • Very small chunks
  • Sending hashes of chunks would consume as much
    bandwidth as just sending the file
  • Very large chunks
  • Cannot be sent in a single RPC
  • LBFS imposes minimum and maximum chunk sizes
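
One way to apply minimum and maximum chunk sizes is to post-process the candidate breakpoints produced by the content-based scan. The sketch below is an assumption-laden illustration: clamp_boundaries is a hypothetical helper, and the 2 KB and 64 KB limits follow the LBFS paper's description rather than this slide, so treat them as tunable parameters. Candidates that would create a chunk smaller than the minimum are ignored, and artificial boundaries are inserted whenever a chunk would exceed the maximum.

```python
MIN_CHUNK = 2 * 1024    # assumed minimum chunk size
MAX_CHUNK = 64 * 1024   # assumed maximum chunk size


def clamp_boundaries(candidates, length, min_size=MIN_CHUNK, max_size=MAX_CHUNK):
    """Return final chunk end offsets for a file of `length` bytes.

    candidates: offsets in (0, length] where the content-based scan
    found a breakpoint.
    """
    ends, start = [], 0
    for cand in sorted(candidates) + [length]:
        # Force boundaries while the next candidate is too far away.
        while cand - start > max_size:
            start += max_size
            ends.append(start)
        # Skip candidates that would create a too-small chunk;
        # the end of the file always closes the final chunk.
        if cand > start and (cand - start >= min_size or cand == length):
            ends.append(cand)
            start = cand
    return ends


print(clamp_boundaries([100, 3000, 3500, 200000], 210000))
```

In the example, the candidates at 100 and 3500 are dropped for being too close to the previous boundary, and the long stretch without breakpoints is cut every 64 KB.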

12
The Chunk Database
  • Indexes each chunk by the first 64 bits of its
    SHA-1 hash
  • To avoid synchronization problems, LBFS always
    recomputes the SHA-1 hash of any data chunk
    before using it
  • Simplifies crash recovery
  • Recomputed SHA-1 values are also used to detect
    hash collisions in the database
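
The lookup discipline above can be sketched with an in-memory index. This is a simplified illustration: ChunkIndex and its read_chunk callback are hypothetical names, and a Python dict stands in for the on-disk database. The index key is the first 64 bits (8 bytes) of the SHA-1 digest, and every lookup recomputes the full hash of the data it finds, so a stale entry or a 64-bit collision is treated as a miss.

```python
import hashlib


class ChunkIndex:
    """Maps the first 64 bits of a chunk's SHA-1 hash to a chunk location."""

    def __init__(self, read_chunk):
        self._read_chunk = read_chunk  # hypothetical callback: location -> bytes
        self._index = {}

    @staticmethod
    def key(sha1_digest):
        return sha1_digest[:8]  # 8 bytes = 64 bits

    def add(self, data, location):
        self._index[self.key(hashlib.sha1(data).digest())] = location

    def lookup(self, sha1_digest):
        """Return the chunk with this SHA-1 digest, or None if unavailable."""
        location = self._index.get(self.key(sha1_digest))
        if location is None:
            return None
        data = self._read_chunk(location)
        # Recompute before use: the file may have changed since indexing,
        # or two different chunks may share the same 64-bit prefix.
        if hashlib.sha1(data).digest() != sha1_digest:
            return None
        return data


store = {"file1:0": b"hello"}            # toy stand-in for the file system
index = ChunkIndex(store.__getitem__)
index.add(b"hello", "file1:0")
digest = hashlib.sha1(b"hello").digest()
print(index.lookup(digest))              # index and data agree
store["file1:0"] = b"HELLO"              # data changes behind the index's back
print(index.lookup(digest))              # stale entry is detected
```

Because the index is never trusted without verification, it does not have to be kept perfectly in sync with the file system, which is what simplifies crash recovery.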

13
Protocol
  • Based on NFS version 3
  • Adds
  • Extensions to exploit inter-file commonality
    (GETHASH)
  • Leases
  • Compresses all traffic using conventional gzip

14
File Consistency (I)
  • Whenever a client makes any RPC on an LBFS file,
    it gets back a read lease on the file.
  • If a user opens a file whose lease has expired,
    the client asks the server for the attributes of
    the file
  • This request also grants the client a read lease
    on the file
  • Client can check if it has the current version of
    the file in its cache
  • If the file's modification or inode change time
    has changed, the client must obtain the new
    contents of the file from the server
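
The client-side open logic described on this slide can be sketched as follows. All names are hypothetical: CachedFile models one cache entry, and get_attrs and fetch are callbacks standing in for the attribute and read RPCs. The sketch assumes that a cached copy covered by an unexpired lease can be used without contacting the server.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedFile:
    data: bytes
    mtime: float          # modification time when the file was cached
    ctime: float          # inode change time when the file was cached
    lease_expires: float  # local time at which the read lease runs out


def open_cached(entry, get_attrs, fetch, now=None):
    """Return valid file contents, contacting the server only if needed.

    get_attrs() -> (mtime, ctime, lease_seconds) and fetch() -> bytes are
    hypothetical stand-ins for RPCs to the server.
    """
    now = time.time() if now is None else now
    if now < entry.lease_expires:
        return entry.data                      # lease still valid
    mtime, ctime, lease_seconds = get_attrs()  # also renews the lease
    entry.lease_expires = now + lease_seconds
    if (mtime, ctime) != (entry.mtime, entry.ctime):
        entry.data = fetch()                   # cached copy is out of date
        entry.mtime, entry.ctime = mtime, ctime
    return entry.data


entry = CachedFile(b"old", mtime=1.0, ctime=1.0, lease_expires=100.0)
print(open_cached(entry, lambda: (2.0, 2.0, 60.0), lambda: b"new", now=50.0))
print(open_cached(entry, lambda: (2.0, 2.0, 60.0), lambda: b"new", now=200.0))
```

The first open falls inside the lease and uses the cache; the second finds the lease expired, sees changed times, and fetches the new contents.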

15
File Consistency (II)
  • No need for write leases
  • LBFS provides close-to-open consistency
  • Server never demands back a dirty file
  • If multiple clients are writing the same file, the
    last one to close the file will overwrite changes
    from the others
  • File updates are atomic
  • Limits damage caused by concurrent updates

16
Security Issues
  • LBFS uses SFS security infrastructure
  • Servers have public keys
  • Messages are encrypted
  • Specific security issue
  • A user could check whether the file system
    contains a particular chunk of data by observing
    subtle timing differences in the server's answer
    to a CONDWRITE request

17
Implementation (I)

18
Implementation (II)
  • Uses NFS
  • Two NFS-related issues
  • When server commits a temporary file to a target
    file, it must copy the contents of the temporary
    file onto the target file to preserve the target
    file i-node
  • Hard to preserve previous contents of a truncated
    file
  • Message order is guaranteed by TCP
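
The first NFS-related issue, committing a temporary file by copying so that the target keeps its i-node, can be illustrated with ordinary file operations. This is a local sketch only, assuming a POSIX-like file system; commit_by_copy is a hypothetical helper, not the server's code. A rename would be cheaper, but it would replace the target's i-node with the temporary file's.

```python
import os
import shutil
import tempfile


def commit_by_copy(tmp_path, target_path):
    """Overwrite the target's contents with the temporary file's contents.

    The target is opened in place rather than replaced, so it keeps
    its i-node number.
    """
    with open(tmp_path, "rb") as src, open(target_path, "r+b") as dst:
        shutil.copyfileobj(src, dst)
        dst.truncate()  # drop any leftover tail from a longer old version
    os.remove(tmp_path)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "target")
tmp = os.path.join(workdir, "tmp")
with open(target, "wb") as f:
    f.write(b"old contents, longer than the new ones")
with open(tmp, "wb") as f:
    f.write(b"new contents")

inode_before = os.stat(target).st_ino
commit_by_copy(tmp, target)
print("i-node preserved:", os.stat(target).st_ino == inode_before)
```

The cost of this approach is an extra copy of the file's data on the server, which is the price of keeping the target's identity stable.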

19
Evaluation (I)
  • Commonality of data in /usr/local

20
Evaluation (II)
  • Normalized bandwidth consumption (2 of 3
    benchmarks)

21
Key
  • First four bars of each workload show upstream
    bandwidth, the second four downstream bandwidth.
  • CIFS is Windows' native network file system
  • Leases+Gzip uses LBFS file caching, leases, and
    data compression but not its chunking scheme
  • LBFS, new DB is LBFS starting with a new
    database

22
Evaluation (III)
  • Normalized application times

23
Key
  • Execution times were normalized
  • Measurements were made over a cable modem link
    with a 384 Kb/s uplink and a 1.5 Mb/s downlink
  • LAN data were obtained on a 100 Mb/s full-duplex
    LAN.

24
Conclusion
  • Under normal circumstances, LBFS consumes 90%
    less bandwidth than traditional file systems.
  • Makes transparent remote file access a viable and
    less frustrating alternative to running
    interactive programs on remote machines.