Title: Distributed File Systems II
1Distributed File Systems - II
- Trishali Nayar
- Staff Software Engineer
- India Systems Technology Lab
- IBM Pune.
2Agenda
- Andrew File System (AFS)
- Basic Concepts
- Features
- Architecture
- OpenAFS
- Introduction
- Availability
- Advantages for Colleges
- Universities using this setup
3Distributed File Systems
- A distributed file system enables co-operating
hosts (clients and servers) to efficiently share
file system resources across both local area and
wide area networks.
- It allows users to access remote files and
directories and treat those files and directories
as if they were local.
- Operating system commands can be used to create,
remove, read, write, and set file attributes for
remote files and directories.
4AFS
- AFS - Andrew File System
- AFS was pioneered at Carnegie Mellon University
(CMU) and supported and developed as a product by
Transarc Corporation (now IBM Pittsburgh Labs).
- AFS is a distributed file system that enables
users to share and access files stored in a
network of computers as easily as they access
files stored on their local machines.
- The file system is called distributed for this
reason: files can reside on many different
machines, but are available to users on every
machine.
- It provides location independence, scalability,
and transparent migration capabilities for data.
5AFS Concepts
- Client-Server Model
- Cell
- Filespace
- Volumes
- Mount Points
6Client-Server Model
- AFS uses a client/server computing model. In
client/server computing, there are two types of
machines.
- Server machines store data and perform services
for client machines.
- Client machines perform computations for users
and access data and services provided by server
machines.
- Some machines act as both clients and servers. In
most cases, you work on a client machine,
accessing files stored on a file server machine.
7Client-Server Model
- Servers serve, clients request.
- AFS stores files on a subset of the machines in a
network, called file server machines.
- File server machines provide file storage and
delivery service, along with other specialized
services, to the other subset of machines in the
network, the client machines.
8Cell
- The cell is the administrative domain in AFS.
- Each cell's administrators determine how client
machines are configured and how much storage
space is available to each user.
- The organization corresponding to a cell can be a
company, a university department, or any defined
group of users.
- From a hardware perspective, a cell is a grouping
of client machines and server machines defined to
belong to the same cell.
- For example, directories and files relevant to
the ABC Corporation can be stored in the cell
abc.com, under /afs/abc.com.
9AFS Cell
10Filespace
- While each cell organizes and maintains its own
filespace, it can also connect with the filespace
of other AFS cells.
- The result is a huge filespace that enables file
sharing within and across cells.
11Volumes and Mount Points
- The storage disks in a computer are divided into
sections called partitions. AFS further divides
partitions into units called volumes, each of
which houses a subtree of related files and
directories.
- A volume is a container for storing related files
and directories.
- System administrators can move volumes from one
file server machine to another without your
noticing, because AFS automatically tracks a
volume's location.
- You access the contents of a volume by accessing
its mount point in the AFS filespace.
12Volumes and Mount Points
- A mount point is a special file system element
that looks and acts like a regular directory, but
tells AFS the volume's name.
- When you change to a different directory you
sometimes cross a mount point and start accessing
the contents of a different volume than before.
- You do not notice the crossing, however, because
AFS automatically interprets mount points and
retrieves the contents of the new directory from
the appropriate volume.
- You do not need to track which volume, partition,
or file server machine is housing a directory's
contents.
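The mount-point crossing described on this slide can be sketched roughly as follows. This is a toy model, not AFS code: the tables MOUNT_POINTS and VOLUME_LOCATIONS, the function locate, and the server names fs1.abc.com and fs2.abc.com are all assumptions made up for illustration.

```python
# Toy model of mount-point resolution; all names and data are illustrative.

# Mount points: directory path -> name of the volume mounted there.
MOUNT_POINTS = {
    "/afs/abc.com": "root.cell",
    "/afs/abc.com/usr/smith": "user.smith",
}

# Stand-in for volume location data: volume name -> file server machine.
VOLUME_LOCATIONS = {
    "root.cell": "fs1.abc.com",
    "user.smith": "fs2.abc.com",
}


def locate(path):
    """Return (volume, server) for the deepest mount point at or above path."""
    parts = path.rstrip("/").split("/")
    # Walk from the full path up toward the root, looking for a mount point.
    for end in range(len(parts), 1, -1):
        prefix = "/".join(parts[:end])
        if prefix in MOUNT_POINTS:
            volume = MOUNT_POINTS[prefix]
            return volume, VOLUME_LOCATIONS[volume]
    raise FileNotFoundError(f"no AFS mount point covers {path}")
```

In this sketch the caller only ever supplies a pathname; which volume and which server hold the data is looked up on its behalf, which is the point the slide makes.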
13Volumes and Mount Points
- User volumes are typically named user.username.
For example, the volume for a user named smith in
the cell abc.com is called user.smith and is
mounted at the directory /afs/abc.com/usr/smith.
- AFS volumes can be stored on different file
server machines. When a machine becomes
unavailable, only the volumes on that machine are
inaccessible. Volumes stored on other machines
are still accessible.
- If a volume's mount point resides in a volume
that is stored on an unavailable machine, the
former volume is also inaccessible.
- Volumes containing frequently used directories
are often copied and distributed to many file
server machines.
14AFS Features
- Scalability
- Ease of Administration
- Performance
- Reliability
- Location Transparency
- Security
- Access Control
- Coexistence
- Portability and Heterogeneity
15AFS Features
16Scalability
- Smoothly supports 200:1 client/server ratios
within a single installation.
17Ease of Administration
- Data mobility: Improved and balanced utilization
of disk resources is facilitated by the fact that
AFS supports transparent relocation of user data
between partitions on a single server machine or
between two different machines. In a situation
where a machine must be brought down for an
extended period, all its storage may be migrated
to other servers so that users may continue their
work completely unaffected.
- Statistics: Each AFS agent facilitates collection
of statistical data on its performance,
configuration, and status via its RPC interface.
Thus, the system is easy to monitor. Examples
include usage statistics, current disk capacities,
and whether a server is unavailable.
Administrators monitoring this information can
thus quickly react to correct overcrowded disks
and machine crashes.
18Ease of Administration
- "Automated nanny" services: Each file server
machine runs a BOS Server process, which assists
in the machine's administration. This server is
responsible for monitoring the health of the AFS
agents under its care, bringing them up in the
proper order after a system reboot, answering
requests as to their status and restarting them
when they fail. It also accepts commands to
start, suspend, or resume these processes.
19Ease of Administration
- Online backup: Backups may be performed on the
data stored by the AFS file server machines
without bringing those machines down for the
duration. Copy-on-write snapshots are taken of
the data to be preserved, and tape backup is
performed from these clones. One added benefit is
that these backup clones are online and
accessible by users.
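The copy-on-write idea behind these backup clones can be illustrated with a small sketch. It is only an analogy under stated assumptions: the Volume class and its clone and write methods are hypothetical, and a Python dictionary stands in for the on-disk structures that a real volume clone would share.

```python
# Toy copy-on-write volume; class and method names are hypothetical.

class Volume:
    def __init__(self, files):
        # Maps file name -> contents. Copying the table does not copy the data.
        self.files = dict(files)

    def clone(self):
        """Snapshot: duplicate only the name table, sharing the contents."""
        return Volume(self.files)

    def write(self, name, data):
        """Rebind one name in this volume; existing clones keep the old data."""
        self.files[name] = data
```

Because a clone keeps pointing at the old contents, a backup can read from it while users continue to write to the live volume.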
20Ease of Administration
- Systems administrators are able to make
configuration changes from any client in the AFS
cell.
- With AFS it is simple to effect changes without
having to take systems off-line.
21Account Manager
22Account Manager
23Server Manager
24Efficiency Boosters
- Performance: Local Caching and Callbacks
- System availability: Replication
25Replication
- Replication of databases. Replication refers to
making a copy, or clone, of a source read/write
volume and then placing the copy on one or more
additional file server machines in the cell.
- Improves reliability and availability of the
contents.
- No one machine need become overburdened with
requests for a popular file, either, because the
file is available from several machines.
- Replication is most appropriate for volumes that
contain popular files that do not change very
often.
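As a rough sketch of why replication helps availability, a client that knows several sites for a replicated volume can simply skip the ones that are down. The function pick_site and the site names are assumptions for illustration; how a real Cache Manager ranks and chooses sites is not covered on this slide.

```python
# Toy replica selection; the function and site names are hypothetical.

def pick_site(replica_sites, unavailable):
    """Return the first replica site that is not marked unavailable."""
    for site in replica_sites:
        if site not in unavailable:
            return site
    raise ConnectionError("no replica of the volume is reachable")
```

With three replica sites, the loss of any one machine still leaves the volume's contents readable from the other two.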
26Caching
- Caching increases the speed and efficiency of
file access in AFS.
- Each AFS client machine dedicates a portion of
its local disk or memory to a cache where it
stores data temporarily.
- When an application program (such as a text
editor) running on a client machine requests data
from an AFS file, the request passes through the
Cache Manager.
- The Cache Manager is a portion of the client
machine's kernel that translates file requests
from local application programs into
cross-network requests to the File Server process
running on the file server machine storing the
file.
27Caching
- When the Cache Manager receives the requested
data from the File Server, it stores it in the
cache and then passes it on to the application
program.
28Caching Benefits
- Caching improves the speed of data delivery to
application programs in the following ways:
- When the application program repeatedly asks for
data from the same file, it is already on the
local disk. The application does not have to wait
for the Cache Manager to request and receive the
data from the File Server.
- Caching data eliminates the need for repeated
request and transfer of the same data, so network
traffic is reduced. Thus, initial requests and
other traffic can get through more quickly.
29Issues with Caching
- Thorny issue of Cache Consistency.
- This problem is solved using a mechanism referred
to as a callback.
- A callback is a promise by a File Server to a
Cache Manager to inform the latter when a change
is made to any of the data delivered by the File
Server.
30Callbacks
- When a File Server delivers a writable copy of a
file to the Cache Manager, the File Server sends
along a callback with that file.
- If the source version of the file is changed by
another user, the File Server breaks the callback
associated with the cached version of that
file, indicating to the Cache Manager that it
needs to update the cached copy.
- The callback mechanism ensures that the Cache
Manager always requests the most up-to-date
version of a file.
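The callback mechanism above can be sketched as a small in-memory model. This is an illustration under simplifying assumptions, not the AFS implementation: the FileServer and CacheManager classes, their method names, and the fetches counter are hypothetical, and whole files are cached where a real client would manage its cache differently.

```python
# Toy model of callbacks; class and method names are hypothetical.

class FileServer:
    def __init__(self):
        self.files = {}      # file name -> current contents
        self.callbacks = {}  # file name -> cache managers holding a callback

    def fetch(self, name, cache_manager):
        """Deliver a file and record a callback promise for the requester."""
        self.callbacks.setdefault(name, set()).add(cache_manager)
        return self.files[name]

    def store(self, name, data):
        """Update a file, then break every outstanding callback on it."""
        self.files[name] = data
        for cache_manager in self.callbacks.pop(name, set()):
            cache_manager.break_callback(name)


class CacheManager:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # file name -> cached contents with a valid callback
        self.fetches = 0  # number of requests sent to the server

    def read(self, name):
        """Serve from the cache while the callback is intact, else refetch."""
        if name not in self.cache:
            self.cache[name] = self.server.fetch(name, self)
            self.fetches += 1
        return self.cache[name]

    def break_callback(self, name):
        """The server reports a change, so the cached copy is discarded."""
        self.cache.pop(name, None)
```

In this model, repeated reads of an unchanged file cost one fetch, and the next read after a change triggers exactly one more, which mirrors the traffic reduction claimed for caching.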
31AFS Filespace
- AFS acts as an extension of your machine's local
UNIX file system. Your system administrator
creates a directory on the local disk of each AFS
client machine to act as a gateway to AFS. By
convention, this directory is called /afs, and it
functions as the root of the AFS filespace.
- Just like the UNIX file system, AFS uses a
hierarchical file structure (a tree). Under the
/afs root directory are subdirectories created by
your system administrator, including your home
directory.
- Files relevant only to the local machine are
usually stored on the local machine. All other
files can be stored in AFS, enabling many users
to share them and freeing the local machine's
disk space for other uses.
32AFS Client on Windows
33AFS Client on UNIX
34Global Namespace and Location Transparency
- Common Namespace from all locations.
- User need not know where his/her file is located
on the server.
- Encourages collaborative work and dissemination
of information, as everyone has a common frame of
reference.
35Global Namespace and Location Transparency
/afs/<cellname>/project/global_team
Physically separated locations still use the same
pathname
36Security
- One way AFS provides adequate security is by
requiring that servers and clients prove their
identities to one another before they exchange
information. This is achieved by using Kerberos
mutual authentication.
- Even in a cell where file sharing is especially
frequent and widespread, it is not desirable that
every user have equal access to every file.
37Security - Kerberos Mutual Authentication
- Mutual authentication requires that both server
and client demonstrate knowledge of a "shared
secret" (like a password) known only to the two
of them.
- Mutual authentication guarantees that servers
provide information only to authorized clients
and that clients receive information only from
legitimate servers.
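The shared-secret idea can be illustrated with a simplified challenge-response exchange. This is explicitly not the Kerberos protocol that AFS uses: there are no tickets or key distribution here, and the functions prove, verify, and mutually_authenticate are hypothetical names. The sketch only shows how each side can demonstrate knowledge of a secret without transmitting it.

```python
import hashlib
import hmac
import secrets


def prove(shared_secret: bytes, challenge: bytes) -> bytes:
    """Answer a challenge in a way that requires knowing the shared secret."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify(shared_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Check a peer's answer; the secret itself is never sent."""
    return hmac.compare_digest(prove(shared_secret, challenge), response)


def mutually_authenticate(client_secret: bytes, server_secret: bytes) -> bool:
    """Each side challenges the other; both checks must pass."""
    client_challenge = secrets.token_bytes(16)  # sent by client to server
    server_challenge = secrets.token_bytes(16)  # sent by server to client
    # The client checks the server's answer to the client's challenge.
    server_is_genuine = verify(
        client_secret, client_challenge, prove(server_secret, client_challenge)
    )
    # The server checks the client's answer to the server's challenge.
    client_is_genuine = verify(
        server_secret, server_challenge, prove(client_secret, server_challenge)
    )
    return server_is_genuine and client_is_genuine
```

If either party holds a different secret, its answer fails the other side's check and no information is exchanged.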
38Security - ACL
- Users themselves control another aspect of AFS
security, by determining who has access to the
directories they own.
- AFS does not rely on the mode bit protections of
a standard UNIX system (though its protection
system does interact with these mode bits).
- Access control lists are more granular than UNIX
rwx mode bits, especially for group access.
- For any directory a user owns, he or she can
build an access control list (ACL) that grants or
denies access to the contents of the directory.
An access control list pairs specific users with
specific types of access privileges.
- An ACL offers seven separate permissions, and up
to twenty different people or groups of people
can appear on it.
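A directory ACL of this kind can be sketched as a mapping from users or groups to permission letters. The seven letters used here are the standard AFS rights (r read, l lookup, i insert, d delete, w write, k lock, a administer); the data layout, the function names, and the example users and group are assumptions for illustration, and entries that deny access are left out to keep the sketch short.

```python
# Toy ACL check; the data layout and function names are hypothetical.

# The seven AFS rights: read, lookup, insert, delete, write, lock, administer.
RIGHTS = set("rlidwka")


def effective_rights(acl, groups, user):
    """Union of rights granted to the user directly and through groups."""
    granted = set(acl.get(user, ""))
    for group, members in groups.items():
        if user in members:
            granted |= set(acl.get(group, ""))
    return granted & RIGHTS


def can(acl, groups, user, right):
    """True if the user holds the given right on the directory."""
    return right in effective_rights(acl, groups, user)
```

For example, with acl = {"smith": "rlidwka", "staff": "rl"} and groups = {"staff": {"jones"}}, jones can list and read the directory through the staff entry but cannot write to it.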
39ACL Directory Permissions
40ACL File Permissions
41ACL Diagram
42Coexistence
- Organizations currently employ other distributed
file systems, most notably NFS. AFS was designed
to run simultaneously with other DFSs without
interfering in their operation. In fact, an
NFS-AFS translator agent exists that allows
pure-NFS client machines to transparently access
files in the AFS space.
43Portability
- AFS is implemented using the standard VFS and
vnode interfaces pioneered and advanced by Sun
Microsystems, hence it is easily portable between
different platforms from a single vendor or from
different vendors.
44Heterogeneity
- Available on a large number of hardware platforms
and operating systems.
- Useful for a large community of unrelated
organizations that use a wide variety of
computing environments.
45AFS - Architecture
46File Server Machines
- File server machines store the files in the
distributed file system, and a server process
running on the file server machine delivers and
receives files. AFS file server machines run a
number of server processes.
- Each process has a special function, such as
maintaining databases important to AFS
administration, managing security, or handling
volumes. This modular design enables each server
process to specialize in one area, and thus
perform more efficiently: one handles file
requests, another tracks file location, a third
manages security, and so on.
47Architecture
- The Authentication Server helps ensure that
communications on the network are secure. It
verifies user identities at login and provides
the facilities through which participants in
transactions prove their identities to one
another (mutually authenticate). It maintains the
Authentication Database.
- The Protection Server helps users control who has
access to their files and directories. Users can
grant access to several other users at once by
putting them all in a group entry in the
Protection Database maintained by the Protection
Server.
48Architecture
- The Volume Server performs all types of volume
manipulation. It helps the administrator move
volumes from one server machine to another to
balance the workload among the various machines.
- The Volume Location Server (VL Server) maintains
the Volume Location Database (VLDB), in which it
records the location of volumes as they move from
one file server machine to another. This service
is the key to transparent file access for users.
- The Backup Server maintains the Backup Database,
in which it stores information related to the
Backup System. It enables the administrator to
back up data from volumes to tape. The data can
then be restored from tape in the event that it
is lost from the file system.
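The role of the VLDB in transparent volume moves can be sketched with a small lookup table. The class VolumeLocationDB, its methods, and the server names are hypothetical, and the real VLDB records more than a single site per volume; the sketch only shows that clients ask by volume name, so a move changes the answer without changing the question.

```python
# Toy Volume Location Database; class and method names are hypothetical.

class VolumeLocationDB:
    def __init__(self):
        self._site = {}  # volume name -> file server machine holding it

    def register(self, volume, server):
        """Record where a newly created volume lives."""
        self._site[volume] = server

    def lookup(self, volume):
        """Return the file server machine currently holding the volume."""
        return self._site[volume]

    def move(self, volume, new_server):
        """Record a move and return the previous site."""
        old_server = self._site[volume]
        self._site[volume] = new_server
        return old_server
```

A client that looks up user.smith before and after a move gets two different servers back, while the pathname it uses stays the same.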
49Cache Manager
- The Cache Manager resides on AFS client machines
rather than file server machines. It is not a
process per se, but rather a part of the kernel on
AFS client
machines that communicates with AFS server
processes. Its main responsibilities are to
retrieve files for application programs running
on the client and to maintain the files in the
cache.
50AFS Client on Windows
51AFS Client on Windows
52A useful infrastructure for University
Environments
53OpenAFS
- IBM branched the source of the AFS product and
made a copy of the source available for community
development and maintenance.
- They called the release OpenAFS.
54OpenAFS Availability
- http://www.openafs.org
- Easily Available
- Freeware
55OpenAFS Available on Vast Range of Platforms
- IBM AIX 5.2, 5.3
- SGI Irix 6.5
- HP-UX 11i
- RHEL 3.0, 4.0
- Solaris SPARC 7, 8, 9, 10
- Windows 2000/XP/2003
- The latest versions of all these operating
systems are supported
56Advantages for College Environments
- Easily available and Free
- Available on Multiple Platforms
- Each AFS site can have its own AFS cell, which is
usually named after the institution or a research
group that administers it. For example, the CMU
cell is called andrew.cmu.edu.
- Co-operating colleges can form a Common Namespace
and share information.
57Advantages for College Environments
- A student can access his/her files from any
terminal, using the same pathname.
- Access control for users and groups of users can
easily be added.
- Replication takes place with no downtime to
users.
- Easy to administer and to use.
- Client caching provides good performance and
reduces load on the network.
58Cross-cell Sharing
- Participating in the AFS global namespace makes a
cell's local file tree visible to AFS users in
foreign cells and makes other cells' file trees
visible to local users.
- It makes file sharing across cells as easy as
sharing within a cell. Making a file tree visible
does not mean making it vulnerable.
- Participation in a global namespace is not
mandatory. Some cells use AFS primarily to
facilitate file sharing within the cell and are
not interested in providing their users with
access to foreign cells.
59Universities using AFS/OpenAFS
- Stanford
- CMU
- Duke University
- University of Maryland
- Penn State University
- University of North Carolina
- University of Auckland
- University of Edinburgh
- University of Pittsburgh and many others
60Questions
61Thank You