Distributed File Systems II - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Distributed File Systems II


1
Distributed File Systems - II
  • Trishali Nayar
  • Staff Software Engineer
  • India Systems Technology Lab
  • IBM Pune.

2
Agenda
  • Andrew File System (AFS)
  • Basic Concepts
  • Features
  • Architecture
  • OpenAFS
  • Introduction
  • Availability
  • Advantages for Colleges
  • Universities using this setup

3
Distributed File Systems
  • A distributed file system enables co-operating
    hosts (clients and servers) to efficiently share
    file system resources across both local area and
    wide area networks.
  • It allows users to access remote files and
    directories and treat those files and directories
    as if they were local.
  • Operating system commands can be used to create,
    remove, read, write, and set file attributes for
    remote files and directories.

4
AFS
  • AFS - Andrew File System
  • AFS was pioneered at Carnegie Mellon University
    (CMU) and supported and developed as a product by
    Transarc Corporation (now IBM Pittsburgh Labs)
  • AFS is a distributed file system that enables
    users to share and access files stored in a
    network of computers as easily as they access
    files stored on their local machines.
  • The file system is called distributed for this
    reason: files can reside on many different
    machines, but are available to users on every
    machine.
  • It provides location independence, scalability
    and transparent migration capabilities for data.

5
AFS Concepts
  • Client-Server Model
  • Cell
  • Filespace
  • Volumes
  • Mount Points

6
Client-Server Model
  • AFS uses a client/server computing model. In
    client/server computing, there are two types of
    machines.
  • Server machines store data and perform services
    for client machines.
  • Client machines perform computations for users
    and access data and services provided by server
    machines.
  • Some machines act as both clients and servers. In
    most cases, you work on a client machine,
    accessing files stored on a file server machine.

7
Client-Server Model
  • Servers serve, clients request
  • AFS stores files on a subset of the machines in a
    network, called file server machines.
  • File server machines provide file storage and
    delivery service, along with other specialized
    services, to the other subset of machines in the
    network, the client machines.

8
Cell
  • The cell is the administrative domain in AFS.
  • Each cell's administrators determine how client
    machines are configured and how much storage
    space is available to each user.
  • The organization corresponding to a cell can be a
    company, a university department, or any defined
    group of users.
  • From a hardware perspective, a cell is a grouping
    of client machines and server machines defined to
    belong to the same cell.
  • For example, directories and files relevant to
    the ABC Corporation can be stored in the cell
    /afs/abc.com

9
AFS Cell
10
Filespace
  • While each cell organizes and maintains its own
    filespace, it can also connect with the filespace
    of other AFS cells.
  • The result is a huge filespace that enables file
    sharing within and across cells.

11
Volumes and Mount Points
  • The storage disks in a computer are divided into
    sections called partitions. AFS further divides
    partitions into units called volumes, each of
    which houses a subtree of related files and
    directories.
  • A volume is a container for storing related
    files and directories.
  • System administrators can move volumes from one
    file server machine to another without your
    noticing, because AFS automatically tracks a
    volume's location.
  • You access the contents of a volume by accessing
    its mount point in the AFS filespace.

12
Volumes and Mount Points
  • A mount point is a special file system element
    that looks and acts like a regular directory, but
    tells AFS the volume's name.
  • When you change to a different directory you
    sometimes cross a mount point and start accessing
    the contents of a different volume than before.
  • You do not notice the crossing, however, because
    AFS automatically interprets mount points and
    retrieves the contents of the new directory from
    the appropriate volume.
  • You do not need to track which volume, partition,
    or file server machine is housing a directory's
    contents.
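
The mount-point crossing described above can be sketched in a few lines of Python. This is a toy in-memory model, not AFS code: the `VOLUMES` table, the `MountPoint` class, the volume named `usr`, and the `resolve` function are all assumptions made for illustration, and the path is taken relative to the cell root.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountPoint:
    """A directory entry that names another volume (hypothetical model)."""
    volume: str


# Hypothetical volumes: each maps entry names to file contents (str),
# a sub-directory (dict), or a MountPoint naming another volume.
VOLUMES = {
    "root.cell": {"usr": MountPoint("usr")},
    "usr": {"smith": MountPoint("user.smith")},
    "user.smith": {"notes.txt": "hello from AFS"},
}


def resolve(path, root_volume="root.cell"):
    """Walk a path, switching volumes whenever a mount point is crossed.

    Returns (list of volumes visited, final entry).
    """
    volume = root_volume
    visited = [volume]
    entry = VOLUMES[volume]
    for part in path.strip("/").split("/"):
        entry = entry[part]
        if isinstance(entry, MountPoint):
            # Crossing a mount point: continue in the named volume.
            volume = entry.volume
            visited.append(volume)
            entry = VOLUMES[volume]
    return visited, entry


visited, entry = resolve("/usr/smith/notes.txt")
print(visited, entry)
```

The caller only supplies a pathname; the list of volumes visited is bookkeeping that the user never sees.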

13
Volumes and Mount Points
  • User volumes are typically named user.username.
    For example, the volume for a user named smith in
    the cell abc.com is called user.smith and is
    mounted at the directory /afs/abc.com/usr/smith.
  • AFS volumes can be stored on different file
    server machines. When a machine becomes
    unavailable, only the volumes on that machine are
    inaccessible. Volumes stored on other machines
    are still accessible.
  • If a volume's mount point resides in a volume
    that is stored on an unavailable machine, the
    former volume is also inaccessible.
  • Volumes containing frequently used directories
    are often copied and distributed to many file
    server machines.

14
AFS Features
  • Scalability
  • Ease of Administration
  • Performance
  • Reliability
  • Location Transparency
  • Security
  • Access Control
  • Coexistence
  • Portability and Heterogeneity

15
AFS Features
16
Scalability
  • Smoothly supports 200:1 client/server ratios
    within a single installation.

17
Ease of Administration
  • Data mobility: Improved and balanced utilization
    of disk resources is facilitated by the fact that
    AFS supports transparent relocation of user data
    between partitions on a single server machine or
    between two different machines. In a situation
    where a machine must be brought down for an
    extended period, all its storage may be migrated
    to other servers so that users may continue their
    work completely unaffected.
  • Statistics: Each AFS agent facilitates collection
    of statistical data on its performance,
    configuration, and status via its RPC interface.
    Thus, the system is easy to monitor. Examples
    include usage statistics, current disk capacities,
    and whether the server is unavailable. Administrators
    monitoring this information can thus quickly
    react to correct overcrowded disks and machine
    crashes.

18
Ease of Administration
  • Automated "nanny" services: Each fileserver
    machine runs a BOS Server process, which assists
    in the machine's administration. This server is
    responsible for monitoring the health of the AFS
    agents under its care, bringing them up in the
    proper order after a system reboot, answering
    requests as to their status and restarting them
    when they fail. It also accepts commands to
    start, suspend, or resume these processes.
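
A minimal sketch of the "nanny" behaviour described above, assuming a toy in-memory model. The `Agent` and `Nanny` classes and their methods are hypothetical names for illustration, not the BOS Server interface; "suspend" is modelled here as stopping an agent and excluding it from restarts.

```python
class Agent:
    """Stand-in for one supervised server process (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.suspended = False


class Nanny:
    """Starts agents in order, reports status, restarts failed agents."""

    def __init__(self, names_in_start_order):
        # dict preserves insertion order, which doubles as the start order.
        self.agents = {name: Agent(name) for name in names_in_start_order}

    def boot(self):
        """Bring agents up in the configured order, as after a reboot."""
        started = []
        for agent in self.agents.values():
            if not agent.suspended:
                agent.running = True
                started.append(agent.name)
        return started

    def status(self, name):
        agent = self.agents[name]
        if agent.suspended:
            return "suspended"
        return "running" if agent.running else "down"

    def suspend(self, name):
        agent = self.agents[name]
        agent.suspended = True
        agent.running = False

    def resume(self, name):
        agent = self.agents[name]
        agent.suspended = False
        agent.running = True

    def check(self):
        """Restart every agent that is down but not suspended."""
        restarted = []
        for agent in self.agents.values():
            if not agent.running and not agent.suspended:
                agent.running = True
                restarted.append(agent.name)
        return restarted
```

For example, after `nanny = Nanny(["vlserver", "ptserver", "fileserver"])` and `nanny.boot()`, marking one agent as not running and calling `nanny.check()` brings it back, while a suspended agent is left alone. The agent names here are only illustrative.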

19
Ease of Administration
  • Online backup: Backups may be performed on the
    data stored by the AFS file server machines
    without bringing those machines down for the
    duration. Copy-on-write snapshots are taken of
    the data to be preserved, and tape backup is
    performed from these clones. One added benefit is
    that these backup clones are online and
    accessible by users.
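
The copy-on-write idea behind these backup clones can be illustrated with a small sketch. It assumes a hypothetical `Volume` class whose clone copies only the name-to-data mapping; the data objects stay shared until the live volume replaces one of them. Real AFS clones work at the level of on-disk structures, which this model does not attempt to show.

```python
class Volume:
    """Toy volume: maps file names to immutable contents (hypothetical)."""

    def __init__(self, files):
        self.blocks = dict(files)

    def clone(self):
        """Take a snapshot that shares all data with the live volume."""
        snapshot = Volume({})
        # Copies references only; no file contents are duplicated.
        snapshot.blocks = dict(self.blocks)
        return snapshot

    def write(self, name, data):
        # The live volume points at new data; the snapshot keeps the old.
        self.blocks[name] = data

    def read(self, name):
        return self.blocks[name]


live = Volume({"a.txt": "version 1", "b.txt": "unchanged"})
backup = live.clone()
live.write("a.txt", "version 2")
print(backup.read("a.txt"), "|", live.read("a.txt"))
```

Because the snapshot stays readable while the live volume keeps changing, a tape backup can run from the snapshot without taking the server down.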

20
Ease of Administration
  • Systems administrators are able to make
    configuration changes from any client in the AFS
    cell.
  • With AFS it is simple to effect changes without
    having to take systems off-line.

21
Account Manager
22
Account Manager
23
Server Manager
24
Efficiency Boosters
  • Performance: Local Caching and Callbacks
  • System availability: Replication

25
Replication
  • Replication of databases. Replication refers to
    making a copy, or clone, of a source read/write
    volume and then placing the copy on one or more
    additional file server machines in the cell.
  • Improves reliability and availability of the
    contents.
  • No one machine need become overburdened with
    requests for a popular file, either, because the
    file is available from several machines.
  • Replication is most appropriate for volumes that
    contain popular files that do not change very
    often.
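
As a rough illustration of how replicas spread load and survive an outage, the sketch below picks a replica site for a client. The `REPLICA_SITES` table, the server names, and the `pick_site` function are assumptions for this example; the actual client-side selection in AFS is not described in these slides.

```python
import zlib

# Hypothetical table: read-only volume name -> servers holding a replica.
REPLICA_SITES = {"docs.readonly": ["fs1", "fs2", "fs3"]}


def pick_site(volume, client, up_servers, sites=REPLICA_SITES):
    """Choose one reachable replica site for this client.

    Different clients hash to different sites, so requests for a popular
    volume are spread across machines; unreachable sites are skipped.
    """
    candidates = [s for s in sites[volume] if s in up_servers]
    if not candidates:
        raise LookupError(f"no replica of {volume} is reachable")
    index = zlib.crc32(client.encode("utf-8")) % len(candidates)
    return candidates[index]


# With fs1 and fs3 down, every client is still served from fs2.
print(pick_site("docs.readonly", "client-a", up_servers={"fs2"}))
```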

26
Caching
  • Caching increases the speed and efficiency of
    file access in AFS.
  • Each AFS client machine dedicates a portion of
    its local disk or memory to a cache where it
    stores data temporarily.
  • When an application program (such as a text
    editor) running on a client machine requests data
    from an AFS file, the request passes through the
    Cache Manager.
  • The Cache Manager is a portion of the client
    machine's kernel that translates file requests
    from local application programs into
    cross-network requests to the File Server process
    running on the file server machine storing the
    file.

27
Caching
  • When the Cache Manager receives the requested
    data from the File Server, it stores it in the
    cache and then passes it on to the application
    program.

28
Caching Benefits
  • Caching improves the speed of data delivery to
    application programs in the following ways:
  • When the application program repeatedly asks for
    data from the same file, it is already on the
    local disk. The application does not have to wait
    for the Cache Manager to request and receive the
    data from the File Server.
  • Caching data eliminates the need for repeated
    request and transfer of the same data, so network
    traffic is reduced. Thus, initial requests and
    other traffic can get through more quickly.

29
Issues with Caching
  • Thorny issue of Cache Consistency.
  • This problem is solved using a mechanism referred
    to as a callback.
  • A callback is a promise by a File Server to a
    Cache Manager to inform the latter when a change
    is made to any of the data delivered by the File
    Server.

30
Callbacks
  • When a File Server delivers a writable copy of a
    file to the Cache Manager, the File Server sends
    along a callback with that file.
  • If the source version of the file is changed by
    another user, the File Server breaks the callback
    associated with the cached version of that
    file, indicating to the Cache Manager that it
    needs to update the cached copy.
  • The callback mechanism ensures that the Cache
    Manager always requests the most up-to-date
    version of a file.
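
The interplay between caching and callbacks can be sketched as follows. This is a simplified single-process model: the `FileServer` and `CacheManager` classes and their method names are assumptions for illustration, and a broken callback is modelled simply as dropping the cached copy so that the next read fetches again.

```python
class FileServer:
    """Toy file server that remembers which clients hold a callback."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}  # path -> set of CacheManagers to notify
        self.fetches = 0

    def fetch(self, path, client):
        """Deliver a file and register a callback promise for the client."""
        self.fetches += 1
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data):
        """Update a file and break every outstanding callback on it."""
        self.files[path] = data
        for client in self.callbacks.pop(path, set()):
            client.break_callback(path)


class CacheManager:
    """Toy client-side cache that trusts its copy until a callback breaks."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def break_callback(self, path):
        # The cached copy may be stale; discard it so the next read refetches.
        self.cache.pop(path, None)


server = FileServer()
server.files["/afs/abc.com/usr/smith/notes.txt"] = "v1"
client = CacheManager(server)
client.read("/afs/abc.com/usr/smith/notes.txt")   # fetched over the "network"
client.read("/afs/abc.com/usr/smith/notes.txt")   # served from the cache
server.store("/afs/abc.com/usr/smith/notes.txt", "v2")
print(client.read("/afs/abc.com/usr/smith/notes.txt"), server.fetches)
```

Repeated reads cost no server traffic while the callback holds, and the client contacts the server again only after the file has actually changed.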

31
AFS Filespace
  • AFS acts as an extension of your machine's local
    UNIX file system. Your system administrator
    creates a directory on the local disk of each AFS
    client machine to act as a gateway to AFS. By
    convention, this directory is called /afs, and it
    functions as the root of the AFS filespace.
  • Just like the UNIX file system, AFS uses a
    hierarchical file structure (a tree). Under the
    /afs root directory are subdirectories created by
    your system administrator, including your home
    directory.
  • Files relevant only to the local machine are
    usually stored on the local machine. All other
    files can be stored in AFS, enabling many users
    to share them and freeing the local machine's
    disk space for other uses.

32
AFS Client on Windows
33
AFS Client on UNIX
34
Global Namespace and Location Transparency
  • Common Namespace from all locations.
  • User need not know where his/her file is located
    on the server.
  • Encourages collaborative work and dissemination
    of information, as everyone has a common frame of
    reference.

35
Global Namespace and Location Transparency
/afs/<cellname>/project/global_team
Physically separated locations still using same
pathname
36
Security
  • One way AFS provides adequate security is by
    requiring that servers and clients prove their
    identities to one another before they exchange
    information. This is achieved by using Kerberos
    Mutual authentication.
  • Even in a cell where file sharing is especially
    frequent and widespread, it is not desirable that
    every user have equal access to every file.

37
Security - Kerberos Mutual Authentication
  • Mutual authentication requires that both server
    and client demonstrate knowledge of a "shared
    secret" (like a password) known only to the two
    of them.
  • Mutual authentication guarantees that servers
    provide information only to authorized clients
    and that clients receive information only from
    legitimate servers.
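
The shared-secret idea can be illustrated with a simple challenge-response exchange. This is not the Kerberos protocol, which involves tickets and a trusted third party; it is a deliberately reduced sketch using Python's standard `hmac` module, and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def prove(secret: bytes, challenge: bytes) -> bytes:
    """Answer a challenge in a way that requires knowing the secret."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Check a response against what the holder of `secret` would send."""
    return hmac.compare_digest(prove(secret, challenge), response)


def mutual_authenticate(client_secret: bytes, server_secret: bytes) -> bool:
    """Each side challenges the other; both checks must pass."""
    client_challenge = secrets.token_bytes(16)
    server_challenge = secrets.token_bytes(16)
    # The client checks the server's answer using the client's own secret.
    server_ok = verify(client_secret, client_challenge,
                       prove(server_secret, client_challenge))
    # The server checks the client's answer using the server's own secret.
    client_ok = verify(server_secret, server_challenge,
                       prove(client_secret, server_challenge))
    return server_ok and client_ok


print(mutual_authenticate(b"shared", b"shared"))    # both know the secret
print(mutual_authenticate(b"shared", b"impostor"))  # server cannot prove it
```

The secret itself never crosses the network in this sketch; only answers to fresh random challenges do.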

38
Security - ACL
  • Users themselves control another aspect of AFS
    security, by determining who has access to the
    directories they own.
  • AFS does not rely on the mode bit protections of
    a standard UNIX system (though its protection
    system does interact with these mode bits).
  • More Granular Access Control Lists, better than
    UNIX rwx, especially for group access.
  • For any directory a user owns, he or she can
    build an access control list (ACL) that grants or
    denies access to the contents of the directory.
    An access control list pairs specific users with
    specific types of access privileges.
  • An ACL offers seven separate permissions, and up
    to twenty different people or groups of people
    can appear on an ACL.
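
A minimal sketch of a per-directory ACL with seven rights and a twenty-entry limit, as described above. The single-letter rights follow the usual AFS abbreviations (r, l, i, d, w, k, a); the `DirectoryACL` class, its methods, and the user and group names are hypothetical, and the sketch models grants only, not denials.

```python
RIGHTS = {
    "r": "read", "l": "lookup", "i": "insert", "d": "delete",
    "w": "write", "k": "lock", "a": "administer",
}
MAX_ENTRIES = 20


class DirectoryACL:
    """Toy ACL: maps a user or group name to a set of granted rights."""

    def __init__(self):
        self.entries = {}

    def set(self, who, rights):
        unknown = set(rights) - set(RIGHTS)
        if unknown:
            raise ValueError(f"unknown rights: {sorted(unknown)}")
        if who not in self.entries and len(self.entries) >= MAX_ENTRIES:
            raise ValueError("ACL already holds the maximum number of entries")
        self.entries[who] = set(rights)

    def allows(self, user, groups, right):
        """True if the user, or any group the user belongs to, has the right."""
        principals = {user, *groups}
        return any(right in self.entries.get(p, ()) for p in principals)


acl = DirectoryACL()
acl.set("smith", "rlidwka")       # the owner gets all seven rights
acl.set("smith:friends", "rl")    # a group may read and list only
print(acl.allows("jones", ["smith:friends"], "r"))
print(acl.allows("jones", ["smith:friends"], "w"))
```

Granting rights to a group in one entry is what makes this finer-grained than the single group slot in UNIX mode bits.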

39
ACL Directory Permissions
40
ACL File Permissions
41
ACL Diagram
42
Coexistence
  • Organizations currently employ other distributed
    file systems, most notably NFS. AFS was designed
    to run simultaneously with other DFSs without
    interfering in their operation. In fact, an
    NFS-AFS translator agent exists that allows
    pure-NFS client machines to transparently access
    files in the AFS space.

43
Portability
  • AFS is implemented using the standard VFS and
    vnode interfaces pioneered and advanced by Sun
    Microsystems, hence it is easily portable between
    different platforms from a single vendor or from
    different vendors.

44
Heterogeneity
  • Available on a large number of hardware platforms
    and operating systems
  • Useful for a large community of unrelated
    organizations to utilize a wide variety of
    computing environments.

45
AFS - Architecture
46
File Server Machines
  • File server machines store the files in the
    distributed file system, and a server process
    running on the file server machine delivers and
    receives files. AFS file server machines run a
    number of server processes.
  • Each process has a special function, such as
    maintaining databases important to AFS
    administration, managing security or handling
    volumes. This modular design enables each server
    process to specialize in one area, and thus
    perform more efficiently.
  • The processes are called server processes
    because each provides a distinct specialized
    service: one handles file requests, another
    tracks file location, a third manages security,
    and so on.

47
Architecture
  • The Authentication Server helps ensure that
    communications on the network are secure. It
    verifies user identities at login and provides
    the facilities through which participants in
    transactions prove their identities to one
    another (mutually authenticate). It maintains the
    Authentication Database.
  • The Protection Server helps users control who has
    access to their files and directories. Users can
    grant access to several other users at once by
    putting them all in a group entry in the
    Protection Database maintained by the Protection
    Server.

48
Architecture
  • The Volume Server performs all types of volume
    manipulation. It helps the administrator move
    volumes from one server machine to another to
    balance the workload among the various machines.
  • The Volume Location Server (VL Server) maintains
    the Volume Location Database (VLDB), in which it
    records the location of volumes as they move from
    one file server machine to another. This service
    is the key to transparent file access for users.
  • The Backup Server maintains the Backup Database,
    in which it stores information related to the
    Backup System. It enables the administrator to
    back up data from volumes to tape. The data can
    then be restored from tape in the event that it
    is lost from the file system.
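
The cooperation between volume moves and the location database can be sketched as below. The `VLDB` class, the `servers` dictionary, and the helper functions are hypothetical names for a toy model; they are not the Volume Server or VL Server interfaces.

```python
class VLDB:
    """Toy volume location database: volume name -> server name."""

    def __init__(self):
        self.location = {}

    def register(self, volume, server):
        self.location[volume] = server

    def lookup(self, volume):
        return self.location[volume]


def move_volume(vldb, servers, volume, destination):
    """Move a volume's data and record its new location."""
    source = vldb.lookup(volume)
    servers[destination][volume] = servers[source].pop(volume)
    vldb.register(volume, destination)


def read_file(vldb, servers, volume, name):
    """Clients name only the volume; the VLDB supplies the server."""
    return servers[vldb.lookup(volume)][volume][name]


servers = {"fs1": {"user.smith": {"notes.txt": "hello"}}, "fs2": {}}
vldb = VLDB()
vldb.register("user.smith", "fs1")

before = read_file(vldb, servers, "user.smith", "notes.txt")
move_volume(vldb, servers, "user.smith", "fs2")
after = read_file(vldb, servers, "user.smith", "notes.txt")
print(before == after, vldb.lookup("user.smith"))
```

The read call is identical before and after the move, which is the sense in which the location database keeps file access transparent.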

49
Cache Manager
  • The Cache Manager resides on AFS client rather
    than file server machines. It is not a process
    per se, but rather a part of the kernel on AFS client
    machines that communicates with AFS server
    processes. Its main responsibilities are to
    retrieve files for application programs running
    on the client and to maintain the files in the
    cache.

50
AFS Client on Windows
51
AFS Client on Windows
52
A useful infrastructure for University
Environments
53
OpenAFS
  • IBM branched the source of the AFS product, and
    made a copy of the source available for community
    development and maintenance.
  • They called the release OpenAFS.

54
OpenAFS Availability
  • http://www.openafs.org
  • Easily Available
  • Freeware

55
OpenAFS Available on Vast Range of Platforms
  • IBM AIX 5.2, 5.3
  • SGI Irix 6.5
  • HP/UX 11i
  • RHEL- 3.0, 4.0
  • Solaris SPARC 7, 8, 9, 10
  • Windows 2000/XP/2003
  • The latest versions of all operating systems
    are supported

56
Advantages for College Environments
  • Easily available and Free
  • Available on Multiple Platforms
  • Each AFS site can have its own AFS cell, which is
    usually named after the institution or a research
    group that administers it. For example, the CMU cell
    is called andrew.cmu.edu
  • Co-operating colleges can form a Common Namespace
    and share information.

57
Advantages for College Environments
  • A student can access his/her files from any
    terminal, using the same pathname.
  • Access control for users and groups of users can
    easily be added.
  • Replication takes place with no downtime to
    users.
  • Easy to administer and to use.
  • Client caching provides good performance and
    reduces load on the network.

58
Cross-cell Sharing
  • Participating in the AFS global namespace makes a
    cell's local file tree visible to AFS users in
    foreign cells and makes other cells' file trees
    visible to local users.
  • It makes file sharing across cells as easy as
    sharing within a cell. Making a file tree visible
    does not mean making it vulnerable.
  • Participation in a global namespace is not
    mandatory. Some cells use AFS primarily to
    facilitate file sharing within the cell and are
    not interested in providing their users with
    access to foreign cells.

59
Universities using AFS/OpenAFS
  • Stanford
  • CMU
  • Duke University
  • University of Maryland
  • Penn State University
  • University of North Carolina
  • University of Auckland
  • University of Edinburgh
  • University of Pittsburgh and many others

60
Questions
61
Thank You