Transcript and Presenter's Notes

Title: Scale and Performance in a Distributed File System


1
Scale and Performance in a Distributed File System
  • 2008-20952

2
Contents
  • Andrew File System
  • The Prototype
  • Changes for Performance
  • Effect of Changes for Performance
  • Comparison with NFS
  • Conclusion

3
Andrew File System
  • Developed at Carnegie Mellon University
  • A distributed file system designed with scale as
    a primary consideration
  • Presents a homogeneous, location-transparent file
    name space to all client workstations
  • Built on 4.2BSD
  • Servers
  • A set of trusted servers, collectively called Vice
  • Clients
  • A user-level process, Venus, on each workstation
  • Hooks the file system calls
  • Caches files from Vice
  • Stores modified copies of files back on the
    servers
  • Contacts servers only on opens and closes; whole
    files are transferred (sketched below)
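
The whole-file transfer model can be illustrated with a minimal sketch. All helper names below are hypothetical, and the real Venus intercepts 4.2BSD file system calls inside the kernel rather than wrapping open and close in user code:

/* Sketch of whole-file caching: servers are contacted only on open
 * and close; reads and writes go to the local copy. */
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helpers, assumed for illustration only: */
bool        cache_has_current_copy(const char *vice_path);
const char *cache_path_for(const char *vice_path);
void        fetch_whole_file(const char *vice_path);   /* Vice -> cache */
void        store_whole_file(const char *vice_path);   /* cache -> Vice */
bool        was_modified(int fd);

int venus_open(const char *vice_path, int flags)
{
    /* Server contact happens here, on open... */
    if (!cache_has_current_copy(vice_path))
        fetch_whole_file(vice_path);
    /* ...after which all I/O goes to the cached local copy. */
    return open(cache_path_for(vice_path), flags);
}

int venus_close(int fd, const char *vice_path)
{
    /* ...and here, on close: write the whole file back if dirty. */
    if (was_modified(fd))
        store_whole_file(vice_path);
    return close(fd);
}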

4
Andrew File System
  • Overview

5
Prototype (Description)
  • A dedicated server process per client
  • Each server contained a directory hierarchy
    mirroring the structure of the Vice files
  • .admin directory: Vice file status info (e.g.
    access lists)
  • Stub directories: represent portions of the name
    space stored on other servers, embedding the
    location database in the file tree
  • The Vice-Venus interface names files by their
    full pathname
  • There is no notion of a low-level name such as an
    inode
  • Before using a cached file, Venus verifies its
    timestamp with the server (sketched below)
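
A sketch of the prototype's validation policy, with hypothetical helpers standing in for the server RPC and the local cache metadata; the point is that even a cache hit costs a server round trip:

/* Prototype behavior: before using a cached file, compare its cached
 * timestamp against the server's copy. One RPC per open, even on hits. */
#include <time.h>

time_t vice_get_timestamp(const char *vice_path);   /* assumed RPC      */
time_t cached_timestamp(const char *vice_path);     /* assumed metadata */
void   fetch_whole_file(const char *vice_path);     /* assumed fetch    */

void validate_cache_entry(const char *vice_path)
{
    /* Contact a server on EVERY use of a cached file. */
    if (vice_get_timestamp(vice_path) != cached_timestamp(vice_path))
        fetch_whole_file(vice_path);    /* stale: refetch whole file */
}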

6
Prototype (Qualitative Observation)
  • stat primitive
  • Used to test for the presence of files
  • Used to obtain status information before opening
    files
  • Each stat call involved a cache validity check
  • Increased total running time and the load on
    servers
  • Dedicated processes
  • Virtue of simplicity; robust system
  • Excessive context-switching overhead
  • Critical resource limits exceeded
  • High virtual memory paging demands

7
Prototype (Qualitative Observation)
  • Remote Procedure Call (RPC)
  • Simplified the implementation
  • Caused network-related kernel resource limits to
    be exceeded
  • Location Database
  • Difficult to move users' directories between
    servers
  • Etc.
  • Vice files could be used without recompilation or
    relinking

8
Prototype
  • Benchmark
  • Command scripts that operate on a collection of
    files
  • 70 files (the source code of an application
    program)
  • 200 KB in total
  • 5 phases: MakeDir, Copy, ScanDir, ReadAll, Make

9
Prototype
  • Vice call profiling
  • The two most frequent call types obtain status
    information about files absent from the cache and
    validate cache entries
10
Prototype
  • Benchmark
  • Load unit: the load placed on a server by a
    single client workstation running this benchmark
  • One load unit corresponds to about five Andrew
    users

11
Prototype
  • Benchmark

12
Prototype
  • CPU/disk utilization profiling
  • The performance bottleneck is the server CPU
  • Frequent context switches
  • Time spent by the servers traversing full
    pathnames

13
Changes for Performance
  • Cache management
  • Name resolution
  • Communication and server process structure
  • Low-level storage representation

14
Changes for Performance
  • Cache management
  • Previous cache management
  • Status cache (in virtual memory) / data cache (on
    local disk)
  • Interception of opening and closing operations
    only
  • Modifications to a cached file are reflected back
    to Vice when the file is closed
  • Callback: the server promises to notify Venus
    before allowing a modification (sketched below)
  • This reduces cache validation traffic
  • Both Venus and the servers must maintain callback
    state information
  • There is a potential for inconsistency
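
A minimal sketch of the server-side callback bookkeeping; the data structure and the notification primitive (an RPC to Venus in the real system) are assumptions for illustration:

#include <stddef.h>

#define MAX_HOLDERS 64

struct callback_list {
    int    holders[MAX_HOLDERS];   /* workstations holding a callback */
    size_t n;
};

/* Assumed notification primitive: */
void notify_callback_broken(int workstation, const char *file);

/* On a fetch, the server promises to notify this client before any
 * modification is allowed. */
void grant_callback(struct callback_list *cb, int workstation)
{
    if (cb->n < MAX_HOLDERS)
        cb->holders[cb->n++] = workstation;
}

/* Before allowing a modification, break every outstanding promise. */
void break_callbacks(struct callback_list *cb, const char *file)
{
    for (size_t i = 0; i < cb->n; i++)
        notify_callback_broken(cb->holders[i], file);
    cb->n = 0;
}

A cache entry covered by a valid callback can then be used without any network traffic, which removes the per-open validation cost observed in the prototype.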

15
Changes for Performance
  • Name resolution
  • Previous name resolution
  • inode: unique, fixed-length
  • pathname: one or more per file, variable-length
  • The namei routine maps a pathname to an inode
  • CPU overhead on the servers
  • Each Vice pathname involves an implicit namei
    operation (sketched below)
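
The overhead can be seen in a namei-style walk: one directory lookup per pathname component, and in the prototype this ran on the server for every full Vice pathname. A sketch, with lookup_in_dir as an assumed helper:

#include <string.h>

int lookup_in_dir(int dir_inode, const char *name);  /* assumed helper */

int namei_sketch(const char *path)
{
    char buf[1024];
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    int ino = 2;  /* conventional root inode number */
    /* One directory scan per component: /a/b/c costs three lookups. */
    for (char *c = strtok(buf, "/"); c != NULL; c = strtok(NULL, "/"))
        ino = lookup_in_dir(ino, c);
    return ino;
}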

16
Changes for Performance
  • fid: unique, fixed-length
  • Venus maps each component of a pathname to a fid
  • Three 32-bit fields: volume number, vnode number,
    uniquifier (see the struct below)
  • Volume number: identifies a volume on one server
  • Vnode number: index into the volume's file
    storage information array
  • Uniquifier: allows reuse of vnode numbers
  • Since fids carry no location information, moving
    files does not invalidate the contents of
    directories cached on workstations
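
The fid described above, written as a C struct; the field names follow the slide, while the exact C layout is an illustrative assumption:

#include <stdint.h>

/* Three 32-bit fields, 96 bits total, and no location information. */
struct vice_fid {
    uint32_t volume;      /* identifies a volume on one server        */
    uint32_t vnode;       /* index into the volume's file info array  */
    uint32_t uniquifier;  /* makes reused vnode numbers unambiguous   */
};

Because a fid carries no server location, a volume can move between servers without invalidating fids already cached in workstation directories.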

17
Changes for Performance
  • Communication and server process structure
  • Use LWPs (lightweight processes) instead of a
    dedicated process per client (sketched below)
  • An LWP is bound to a particular client only for
    the duration of a single server operation
  • Use an RPC mechanism
  • Low-level storage representation
  • Access files by their inodes rather than full
    pathnames
  • vnodes on the servers
  • inodes on the clients
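
A sketch of the pooled-server structure; POSIX threads stand in here for the LWP package actually used, and the request-queue helpers are assumed:

#include <pthread.h>

struct request;                          /* one client operation     */
struct request *next_request(void);      /* assumed: blocks for work */
void            serve(struct request *); /* assumed: performs the op */

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Bound to a client only for the duration of ONE operation. */
        serve(next_request());
    }
    return NULL;
}

int main(void)
{
    /* A small fixed pool replaces one dedicated process per client. */
    pthread_t pool[5];
    for (int i = 0; i < 5; i++)
        pthread_create(&pool[i], NULL, worker, NULL);
    for (int i = 0; i < 5; i++)
        pthread_join(pool[i], NULL);
    return 0;
}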

18
Changes for Performance
  • Overall design: resolving a pathname component D
    (sketched below)
  • If D is in the cache and has a callback on it:
    use it with no network traffic
  • If D is in the cache but has no callback on it:
    validate it with the server and re-establish the
    callback
  • If D is not in the cache: fetch it from the
    server and obtain a callback on it
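
The three cases above, combined into one decision routine; every helper is assumed, and d stands for a single cached pathname component:

#include <stdbool.h>

struct entry { bool has_callback; /* ... cached data ... */ };

struct entry *cache_lookup(const char *d);        /* assumed          */
bool          server_validate(const char *d);     /* assumed: one RPC */
struct entry *fetch_with_callback(const char *d); /* assumed: one RPC */

struct entry *resolve_component(const char *d)
{
    struct entry *e = cache_lookup(d);

    /* Case 1: cached with a callback -- no network traffic at all. */
    if (e && e->has_callback)
        return e;

    /* Case 2: cached without a callback -- revalidate with the
     * server and re-establish the callback. */
    if (e && server_validate(d)) {
        e->has_callback = true;
        return e;
    }

    /* Case 3: not cached (or stale) -- fetch and obtain a callback. */
    return fetch_with_callback(d);
}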

19
Effect of Changes for Performance
  • 19% slower than a stand-alone workstation (cf.
    the prototype was 70% slower)
  • Scalability

20
Effect of Changes for Performance
  • Scalability

21
Effect of Changes for Performance
  • Scalability

22
Effect of Changes for Performance
  • General Observations

23
Effect of Changes for Performance
  • General Observations

24
Comparison with NFS
25
Comparison with NFS (contd)
26
Comparison with NFS (contd)
  • Advantage of a remote-open file system such as
    NFS
  • Low latency

27
Conclusion
  • Scale impacts Andrew in many areas.
  • In early 1987: about 400 workstations, 16
    servers, 3500 users, and 6000 MB of data.