Title: File System Implementation
1File System Implementation
2Objectives
- To describe the details of implementing local
file systems and directory structures
- To describe the implementation of remote file
systems
- To discuss block allocation and free-block
algorithms and trade-offs
3File-System Structure
- A file system poses two distinct design problems:
- Defining how the file system should look to the
user
- Creating algorithms and data structures to map
the logical file system onto the physical device
- The file system resides on secondary storage (disks)
- The file system is organized into layers
- Each level uses the features of the lower level to create new
features for use by higher levels
- File control block (FCB): a storage structure consisting
of information about a file
4A Typical File Control Block
5File-System Implementation
- Several on-disk and in-memory structures are used
to implement a file system
- On-disk structures
- A boot control block
- A partition control block
- A directory structure
- file control blocks
- In-memory structures
- An in-memory partition table
- An in-memory directory structure
- The system-wide open-file table
- The per-process open-file table
6Creating a File
- To create a new file, the application program
calls the logical file system (which knows the
format of the directory structures)
- Allocates a new FCB
- Reads the appropriate directory into memory
- Updates the directory with the new file name and FCB
- Writes it back to disk
- Some operating systems (UNIX) treat a directory
exactly as a file; other operating systems
(Windows) implement separate system calls for
files and directories and treat directories
separately from files
7Opening a File
- Before a file can be used for I/O operations, it
must first be opened
- The open call passes the file name to the file system
- The directory structure (usually cached) is
searched for the given file name
- Once the file is found, the FCB is copied into
the system-wide open-file table in memory
- An entry is made in the per-process open-file
table, with a pointer to the entry in the system-wide
open-file table
- The open call returns a pointer to the
appropriate entry in the per-process file-system
table; all file operations are performed via this
pointer (file descriptor in UNIX, file handle in
Windows)
8Closing a File
- After all I/O operations are complete, a file
should be closed
- The per-process table entry is removed and the
system-wide entry's open count is decremented
- When all users that have opened the file close
it, the updated file information is copied back
to the disk-based directory structure and the
system-wide open-file table entry is removed
- Using a caching scheme, all information about
an open file, except for its actual data blocks,
is in memory
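The open and close steps can be sketched as a toy model of the two tables. This is only an illustration under simplifying assumptions: the class name OpenFileTables, the dictionary-based FCB, and the lowest-free-number descriptor policy are hypothetical and do not describe any particular kernel.

```python
class OpenFileTables:
    """Toy model of the system-wide and per-process open-file tables."""

    def __init__(self, directory):
        self.directory = directory   # file name -> FCB (a plain dict here)
        self.system_wide = {}        # file name -> {"fcb": ..., "open_count": n}
        self.per_process = {}        # pid -> {descriptor: file name}

    def open(self, pid, name):
        # Search the (cached) directory structure for the file name.
        if name not in self.directory:
            raise FileNotFoundError(name)
        entry = self.system_wide.get(name)
        if entry is None:
            # First open: copy the FCB into the system-wide table.
            entry = {"fcb": dict(self.directory[name]), "open_count": 0}
            self.system_wide[name] = entry
        entry["open_count"] += 1
        # Add a per-process entry that refers to the system-wide entry.
        table = self.per_process.setdefault(pid, {})
        fd = 0
        while fd in table:
            fd += 1
        table[fd] = name
        return fd

    def close(self, pid, fd):
        # Remove the per-process entry and decrement the open count.
        name = self.per_process[pid].pop(fd)
        entry = self.system_wide[name]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:
            # Last close: copy the FCB back and drop the system-wide entry.
            self.directory[name] = entry["fcb"]
            del self.system_wide[name]
```

Two processes opening the same file share one system-wide entry whose open count is 2; the entry disappears only after both have closed the file.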
9In-Memory File System Structures
(a) refers to opening a file. (b) refers to
reading a file.
10Disk Partition and Mounting
- The layout of a disk can have many variations,
depending on the operating system
- A disk can be divided into multiple partitions,
or a partition can span multiple disks
- Raw: containing no file system
- Cooked: containing a file system
- Boot information can be stored in a separate
partition
- The root partition, which contains the
operating-system kernel, is mounted at boot time
(other partitions can be mounted later)
- The operating system notes in its mount table
that a file system is mounted and the type of
file system
- Windows mounts each partition under a separate drive
letter
- In UNIX, file systems can be mounted at any
directory
11Virtual File Systems
- Modern operating systems support multiple
types of file systems concurrently
- How are multiple types of file systems integrated
into a directory structure?
- How can users seamlessly move between file
systems?
- Most operating systems use an object-oriented
implementation
- Allows very dissimilar file-system types to be
implemented within the same structure (e.g., NFS)
- Users can access files within multiple file
systems, on the same disk or across the network
- Data structures and procedures are used to
isolate the basic system-call functionality from
the implementation details
12Virtual File Systems Implementation
- Consists of three major layers
- File-system interface: based on the open,
read, write, and close calls, and file descriptors
- Virtual File System (VFS)
- Separates file-system-generic operations from
their implementation
- Based on a file-representation structure
called a vnode
- File-system protocol
- Implements the file-system type or remote file
system protocol
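The separation between generic operations and their implementation can be illustrated with a small object-oriented sketch. All names here (VNode, LocalVNode, RemoteVNode, vfs_read) are hypothetical and chosen only for illustration; a real vnode carries many more operations and fields.

```python
class VNode:
    """File-representation object; each file-system type overrides read()."""

    def read(self):
        raise NotImplementedError


class LocalVNode(VNode):
    """Stands in for a file on a local file system."""

    def __init__(self, data):
        self.data = data

    def read(self):
        return self.data


class RemoteVNode(VNode):
    """Stands in for a remote file; fetch is a placeholder for a protocol call."""

    def __init__(self, fetch):
        self.fetch = fetch

    def read(self):
        return self.fetch()


def vfs_read(vnode):
    # Generic layer: it only knows the vnode interface, not the file-system type.
    return vnode.read()
```

The caller of vfs_read never checks whether the file is local or remote; the vnode object selects the implementation.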
13Directory Implementation
- Linear List
- Uses a linear list of file names with pointers to
data blocks; requires a linear search to find a
particular entry
- Simple to program but time-consuming to execute
- Hash Table
- Uses a linear list to store directory entries
but uses hashing to find the entry
- Hashing can greatly decrease the directory search
time
- Must handle collisions: situations where two file
names hash to the same location
- Major difficulties with a hash table are its
fixed size and dependence on the hash function
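A minimal sketch of a hashed directory with a fixed number of buckets, resolving collisions by chaining. The class name HashedDirectory and the byte-sum hash are assumptions made for the example; the byte-sum hash is deliberately weak so that collisions are easy to produce.

```python
class HashedDirectory:
    """Fixed-size hash table of directory entries with chained collisions."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, name):
        # Toy hash: sum of the name's bytes modulo the (fixed) table size.
        return self.buckets[sum(name.encode()) % len(self.buckets)]

    def add(self, name, fcb):
        bucket = self._bucket(name)
        for i, (existing, _) in enumerate(bucket):
            if existing == name:
                bucket[i] = (name, fcb)   # replace an existing entry
                return
        bucket.append((name, fcb))        # a collision just extends the chain

    def lookup(self, name):
        # Only one bucket is searched instead of the whole directory.
        for existing, fcb in self._bucket(name):
            if existing == name:
                return fcb
        return None
```

With this hash, the names "ab" and "ba" land in the same bucket, yet both remain retrievable because the chain is searched by name.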
14Allocation Methods
- An allocation method refers to how disk blocks
are allocated for files
- Three major methods of allocating disk space are
in wide use
- Contiguous allocation
- Linked allocation
- Indexed allocation
- Each method has its advantages and disadvantages
- Some systems support all three but more commonly
a system will use one particular method
15Contiguous-Allocation
- Requires each file to occupy a set of contiguous
blocks on the disk
- Disk addresses define a linear ordering on the
disk
- Simple: only the starting location (block number) and
length (number of blocks) are required
- If a file is n blocks long and starts at location
b, then it occupies blocks b, b+1, b+2, ..., b+n-1
- The directory entry for each file
indicates the address of the starting block and
the length of the area allocated for this file
- Both sequential and direct access are supported
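Direct access under contiguous allocation reduces to one addition. The helper below is a sketch; the function name contiguous_block and its bounds check are illustrative assumptions.

```python
def contiguous_block(start, length, logical_block):
    """Map a logical block of a file to its disk block.

    start  -- disk address b of the file's first block
    length -- number of blocks n allocated to the file
    """
    if not 0 <= logical_block < length:
        raise IndexError("logical block lies outside the file")
    # The file occupies blocks b, b+1, ..., b+n-1.
    return start + logical_block
```

For a file that starts at block 14 and is 3 blocks long, logical block 2 is disk block 16, and logical block 3 is rejected.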
16Contiguous Allocation of Disk Space
17Contiguous Allocation (Cont.)
- Contiguous allocation has some problems
- Dynamic storage-allocation
- How to satisfy a request of size n from a list of
free blocks
- External fragmentation
- Free space is broken into chunks and the largest
chunk is insufficient for a request
- Determining how much space is needed for a file
- Allocate too little and the file cannot be
extended
- Allocate too much and space is wasted
- File cannot grow
18Extent Based Systems
- To minimize the drawbacks of contiguous file
allocation, some file systems (e.g., Veritas File
System) use a modified scheme
- A contiguous chunk of space is allocated
initially and, when that amount is not large
enough, another chunk of contiguous space
(an extent) is added to the initial allocation
- Extent-based file systems allocate disk blocks in
extents
- Internal fragmentation can still be a problem if
the extents are too large
- External fragmentation can be a problem as
extents of various sizes are allocated and
de-allocated
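With extents, a logical block is located by walking the file's extent list. The sketch below assumes the extents are kept as (start, length) pairs in file order; the function name extent_block and that representation are illustrative, not the layout of any specific file system.

```python
def extent_block(extents, logical_block):
    """Map a logical block to a disk block through a list of extents.

    extents -- list of (start, length) pairs in file order
    """
    if logical_block < 0:
        raise IndexError("negative logical block")
    remaining = logical_block
    for start, length in extents:
        if remaining < length:
            # The block falls inside this contiguous extent.
            return start + remaining
        remaining -= length
    raise IndexError("logical block lies outside the file")
```

For a file with extents (100, 4) and (50, 2), logical block 3 is disk block 103 and logical block 4 is disk block 50.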
19Linked Allocation
- Solves all the problems of contiguous allocation
- Each file is a linked list of disk blocks; blocks
may be scattered anywhere on the disk
- The directory contains a pointer to the first and
last blocks of a file
20Linked Allocation
21Linked Allocation (Cont.)
- No external fragmentation
- Any free block on the free-space list can be used
to satisfy a request
- A file can grow as long as free blocks are
available; there is never a need to compact disk space
- Linked allocation does have disadvantages
- Only effective for sequential-access files
- Space required for the list pointers
- Reliability: a lost or damaged pointer breaks the chain
- The File Allocation Table (FAT) is a variation of
the linked allocation method used to support
direct access
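A FAT keeps the next-block links in one table indexed by block number, so a chain can be followed without reading the data blocks themselves. The sketch below models the table as a dictionary; the EOF marker value, the function names, and the block numbers in the usage note are made up for illustration.

```python
EOF = -1  # hypothetical end-of-chain marker


def fat_chain(fat, first_block):
    """Return the list of disk blocks of a file, in order."""
    chain = []
    block = first_block
    while block != EOF:
        chain.append(block)
        if len(chain) > len(fat):
            # A chain longer than the table means the links form a loop.
            raise ValueError("corrupted FAT: cycle in block chain")
        block = fat[block]
    return chain


def fat_block(fat, first_block, logical_block):
    """Find the disk block holding a given logical block of the file."""
    return fat_chain(fat, first_block)[logical_block]
```

With fat = {217: 618, 618: 339, 339: EOF} and a directory entry pointing to block 217, the file occupies blocks 217, 618, 339, and logical block 2 is disk block 339.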
22File-Allocation Table
23Indexed Allocation
- Solves the external-fragmentation and
size-declaration problems of contiguous
allocation
- Supports direct access by bringing all the
pointers together into the index block
- Each file has its own index block, which is an
array of disk-block addresses
24Example of Indexed Allocation
25Indexed Allocation (Cont)
- Indexed allocation does suffer from wasted space
- Every file must have an index block, so the index block
should be as small as possible; a large file may, however,
require more than one index block
- Linked scheme
- Multilevel scheme
- Combined scheme
26Linked Scheme
- An index block is normally one disk block
- Can be read and written directly by itself
- To allow for large files, link together several
index blocks (no limit on size)
27Indexed Allocation Mapping (Cont.)
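- Two-level index (maximum file size is $512^3$)
- The logical address $LA$ is divided by $512 \times 512$:
$Q_1 = \lfloor LA / (512 \times 512) \rfloor$, $R_1 = LA \bmod (512 \times 512)$
- $Q_1$ = displacement into the outer index; $R_1$ is used as
follows:
$Q_2 = \lfloor R_1 / 512 \rfloor$, $R_2 = R_1 \bmod 512$
- $Q_2$ = displacement into the block of the index table
- $R_2$ = displacement into the block of the file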
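The two divisions that produce Q1, R1, Q2 and R2 can be written out directly. This sketch assumes a block holds 512 entries, matching the constants on the slide; the function name two_level_index is illustrative.

```python
def two_level_index(la, block_size=512):
    """Split a logical address into (Q1, Q2, R2) for a two-level index."""
    # Q1 selects the entry in the outer index; R1 is the remainder.
    q1, r1 = divmod(la, block_size * block_size)
    # Q2 selects the entry in the inner index block; R2 is the offset
    # inside the data block.
    q2, r2 = divmod(r1, block_size)
    return q1, q2, r2
```

For example, LA = 1 x 512 x 512 + 2 x 512 + 3 = 263171 maps to outer entry 1, inner entry 2, offset 3, and the largest address 512^3 - 1 maps to (511, 511, 511).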
28Multilevel Index
- Use index of index blocks
- Use a first-level index block to point to a set
of second-level index blocks, which in turn point
to the file blocks
- With 4 KB blocks and 4-byte index entries, what
is the maximum file size using a 2-level index?
- Could be extended to a third or fourth level,
depending on the maximum file size
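The question about the maximum file size can be worked through with a short calculation: a 4 KB block holds 4096 / 4 = 1024 index entries, so two levels address 1024 x 1024 data blocks of 4 KB each. The function name max_file_size is an assumption made for this sketch.

```python
def max_file_size(block_size, pointer_size, levels):
    """Maximum file size in bytes for a pure multilevel index."""
    pointers_per_block = block_size // pointer_size
    # Each level multiplies the number of reachable data blocks.
    data_blocks = pointers_per_block ** levels
    return data_blocks * block_size


two_level = max_file_size(4096, 4, 2)    # 1024 * 1024 * 4096 bytes
three_level = max_file_size(4096, 4, 3)  # one more factor of 1024
```

Under these assumptions a two-level index reaches 2^32 bytes (4 GB), and a third level raises the limit to 2^42 bytes (4 TB).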
29Multi-level Index mapping
30Combined Scheme UNIX (4K bytes per block)
- Keep the first n pointers of the index block in
the file's inode
- Indexed allocation suffers from some of the same
performance problems as linked allocation
- The index blocks can be cached in memory, but the
data blocks may be spread all over a volume
31Free-Space Management
- Need to reuse the space from deleted files for
new files
- To keep track of free disk space, the system
maintains a free-space list
- Records all free blocks: those not allocated to a
file or directory
- To create a file, the free-space list is searched
and that space is allocated to the new file; this
space is then removed from the list
- When a file is deleted, its disk space is added to
the free-space list
32Bit Vector
- Frequently, the free-space list is implemented as
a bit map or bit vector
- Each block is represented by 1 bit
- If the block is free, the bit is 1; if the block
is allocated, the bit is 0
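A minimal sketch of a bit-vector free list, using a Python list of 0/1 integers in place of packed bits. The function names are illustrative; real implementations scan a word at a time rather than bit by bit.

```python
def first_free_block(bitmap):
    """Return the number of the first free block (bit == 1), or None."""
    for block, bit in enumerate(bitmap):
        if bit == 1:
            return block
    return None


def allocate(bitmap):
    """Allocate the first free block and mark it as in use (bit = 0)."""
    block = first_free_block(bitmap)
    if block is None:
        raise RuntimeError("no free blocks")
    bitmap[block] = 0
    return block


def release(bitmap, block):
    """Return a block to the free list (bit = 1)."""
    bitmap[block] = 1
```

Starting from the bitmap [0, 0, 1, 1, 0, 1], two allocations return blocks 2 and 3; releasing block 2 makes it the first free block again.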
33Linked List
- Link together all the free disk blocks
- The first block contains a pointer to the next
free disk block, and so on
- Grouping
- Stores the addresses of n free blocks in the
first free block
- Large numbers of free blocks can be found quickly
- Counting
- Stores the address of the first free block and
the number n of free contiguous blocks that follow it
- The overall list will be shorter
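The counting representation can be illustrated by deriving it from a free-block bitmap in which 1 marks a free block. In this sketch each entry is a pair (first free block, number of contiguous free blocks in the run, including the first); the function name count_runs and that exact pair layout are assumptions made for the example.

```python
def count_runs(bitmap):
    """Convert a free-block bitmap (1 = free) into (start, count) runs."""
    runs = []
    start = None
    for block, bit in enumerate(bitmap):
        if bit == 1 and start is None:
            start = block                        # a new free run begins
        elif bit == 0 and start is not None:
            runs.append((start, block - start))  # the run just ended
            start = None
    if start is not None:
        runs.append((start, len(bitmap) - start))
    return runs
```

The bitmap [0, 1, 1, 1, 0, 0, 1, 1] has five free blocks but needs only two entries, (1, 3) and (6, 2), which is why the overall list is shorter when free space tends to be contiguous.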
34Linked Free Space List on Disk
35Efficiency and Performance
- Efficiency depends on
- disk allocation and directory algorithms
- types of data kept in a file's directory entry
- Performance
- disk cache: a separate section of main memory for
frequently used blocks
- free-behind and read-ahead: techniques to
optimize sequential access
- improve PC performance by dedicating a section of
memory as a virtual disk, or RAM disk
36Page Cache
- A page cache caches pages rather than disk blocks
using virtual memory techniques
- Memory-mapped I/O uses a page cache
- Routine I/O through the file system uses the
buffer (disk) cache
37Unified Buffer Cache
- A unified buffer cache uses the same page cache
to cache both memory-mapped pages and ordinary
file system I/O
38Recovery
- Care must be taken to ensure that system failure
does not result in loss of data or in data
inconsistency
- Consistency checking
- Compares data in the directory structure with data
blocks on disk, and tries to fix inconsistencies
- The allocation and free-space-management
algorithms dictate what types of problems the
checker can find
- Backup and Restore
- Use system programs to back up data from disk to
another storage device (floppy disk, magnetic
tape)
- Recover a lost file or disk by restoring data from
backup
39Log Structured File Systems
- Log-structured (or journaling) file systems
record each update to the file system as a
transaction
- All transactions are written to a log. A
transaction is considered committed once it is
written to the log. However, the file system may
not yet be updated.
- The transactions in the log are asynchronously
written to the file system. When the file system
is modified, the transaction is removed from the
log.
- If the system crashes, all remaining
transactions in the log must still be performed
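The commit, apply, and recover steps can be sketched with an in-memory model. The class JournaledStore and its method names are hypothetical, and a dictionary stands in for the on-disk file system; a real journal must also handle partially written log records, which this sketch ignores.

```python
class JournaledStore:
    """Toy journal: updates are committed to a log, then applied later."""

    def __init__(self):
        self.log = []    # committed transactions not yet applied
        self.data = {}   # stands in for the on-disk file system

    def commit(self, updates):
        # A transaction counts as committed once it is in the log.
        self.log.append(dict(updates))

    def checkpoint(self):
        # Apply the oldest logged transaction, then remove it from the log.
        if self.log:
            self.data.update(self.log[0])
            self.log.pop(0)

    def recover(self):
        # After a crash, every transaction still in the log is performed.
        while self.log:
            self.checkpoint()
```

If two transactions are committed and only the first is applied before a crash, calling recover() replays the second, so the store ends in the state both transactions describe.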
40The Sun Network File System (NFS)
- An implementation and a specification of a
software system for accessing remote files across
LANs (or WANs)
- The implementation is part of the Solaris and
SunOS operating systems running on Sun
workstations, using an unreliable datagram
protocol (UDP/IP) and Ethernet
41Three Independent File Systems
42Mounting in NFS
Mounts
Cascading mounts
43NFS Mount Protocol
- Establishes the initial logical connection between
server and client
- Mount operation includes the name of the remote directory
to be mounted and the name of the server machine storing
it
- Mount request is mapped to the corresponding RPC and
forwarded to the mount server running on the server
machine
- Export list: specifies local file systems that the
server exports for mounting, along with names of
machines that are permitted to mount them
- Following a mount request that conforms to its
export list, the server returns a file handle, a
key for further accesses
- File handle: a file-system identifier and an
inode number to identify the mounted directory
within the exported file system
- The mount operation changes only the user's view
and does not affect the server side
44NFS (Cont.)
- Interconnected workstations are viewed as a set of
independent machines with independent file
systems, which allows sharing among these file
systems in a transparent manner
- A remote directory is mounted over a local file
system directory
- The mounted directory looks like an integral
subtree of the local file system, replacing the
subtree descending from the local directory
- Specification of the remote directory for the
mount operation is nontransparent; the host name
of the remote directory has to be provided
- Files in the remote directory can then be
accessed in a transparent manner
- Subject to access-rights accreditation,
potentially any file system (or directory within
a file system) can be mounted remotely on top of
any local directory
45NFS (Cont.)
- NFS is designed to operate in a heterogeneous
environment of different machines, operating
systems, and network architectures; the NFS
specification is independent of these media
- This independence is achieved through the use of
RPC primitives built on top of an External Data
Representation (XDR) protocol used between two
implementation-independent interfaces
- The NFS specification distinguishes between the
services provided by a mount mechanism and the
actual remote-file-access services
46NFS Protocol
- Provides a set of remote procedure calls for
remote file operations. The procedures support
the following operations:
- searching for a file within a directory
- reading a set of directory entries
- manipulating links and directories
- accessing file attributes
- reading and writing files
- NFS servers are stateless; each request has to
provide a full set of arguments (NFS V4, which is just
becoming available, is very different: it is stateful)
- Modified data must be committed to the server's
disk before results are returned to the client
(losing the advantages of caching)
- The NFS protocol does not provide
concurrency-control mechanisms
47Three Major Layers of NFS Architecture
- UNIX file-system interface (based on the open,
read, write, and close calls, and file
descriptors)
- Virtual File System (VFS) layer: distinguishes
local files from remote ones, and local files are
further distinguished according to their
file-system types
- The VFS activates file-system-specific operations
to handle local requests according to their
file-system types
- Calls the NFS protocol procedures for remote
requests
- NFS service layer: bottom layer of the
architecture
- Implements the NFS protocol
48Schematic View of NFS Architecture
49NFS Path-Name Translation
- Performed by breaking the path into component
names and performing a separate NFS lookup call
for every pair of component name and directory
vnode
- To make lookup faster, a directory-name-lookup
cache on the client's side holds the vnodes for
remote directory names
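Component-by-component translation with a client-side cache can be sketched as follows. The function translate_path, the lookup callback (a stand-in for the NFS lookup call), and the use of plain strings as vnodes are all assumptions made for illustration; cache invalidation is left out.

```python
def translate_path(path, root_vnode, lookup, cache):
    """Resolve a path one component at a time.

    lookup(dir_vnode, name) -- placeholder for one NFS lookup call
    cache                   -- dict mapping (dir_vnode, name) to a vnode
    """
    vnode = root_vnode
    for name in path.strip("/").split("/"):
        if not name:
            continue                       # skip empty components
        key = (vnode, name)
        if key not in cache:
            # Cache miss: one lookup call per (directory vnode, name) pair.
            cache[key] = lookup(vnode, name)
        vnode = cache[key]
    return vnode
```

Resolving /usr/local the first time issues two lookup calls; resolving it again with the same cache issues none.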
50NFS Remote Operations
- Nearly one-to-one correspondence between regular
UNIX system calls and the NFS protocol RPCs
(except opening and closing files)
- NFS adheres to the remote-service paradigm, but
employs buffering and caching techniques for the
sake of performance
- File-blocks cache: when a file is opened, the
kernel checks with the remote server whether to
fetch or revalidate the cached attributes
- Cached file blocks are used only if the
corresponding cached attributes are up to date
- File-attribute cache: the attribute cache is
updated whenever new attributes arrive from the
server
- Clients do not free delayed-write blocks until
the server confirms that the data have been
written to disk