Title: File system general aspects
1File system general aspects
- Organizes and controls all persistent information
- One corner of the security blanket
- Files are an abstraction for many things
2Where does it fit in
System call interface
Virtual file system layer
Ext3 (Linux native)
Vfat (MS-DOS)
Descriptor-based objects
Page, disk block and inode caches
Device drivers
Networking
3Some basic techniques
4Abstraction
- Virtual file system
- Allows a common interface to a large variety of
file systems
- Implemented as an abstraction layer
- File system implementations are in
/usr/src/linux/fs
- Uses the fops structure of pointers to methods;
this allows adding specialized methods to unusual
objects
- Program can function with several kinds of
objects, e.g. data files, interactive windows,
network connections
- Device special files
- System calls on device special files use the
standard methods, but the versions implemented in
the device driver
- Device driver can be added later when the device
is added to the system or even invented
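The fops idea above can be pictured as a table of method pointers that each kind of object fills in with its own versions. What follows is only a user-space analogy in Python, not kernel code; the names FileOps, RamFile, NullDevice, vfs_read and vfs_write are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileOps:
    """Table of method pointers, loosely modelled on the fops idea."""
    read: Callable[[object, int], bytes]
    write: Callable[[object, bytes], int]


class RamFile:
    """Hypothetical ordinary data file held in memory."""

    def __init__(self, data: bytes = b""):
        self.data = bytearray(data)
        self.pos = 0
        self.ops = FileOps(read=RamFile._read, write=RamFile._write)

    def _read(self, count):
        chunk = bytes(self.data[self.pos:self.pos + count])
        self.pos += len(chunk)
        return chunk

    def _write(self, buf):
        self.data += buf
        return len(buf)


class NullDevice:
    """Hypothetical device special file: reads are empty, writes are discarded."""

    def __init__(self):
        self.ops = FileOps(read=lambda self, count: b"",
                           write=lambda self, buf: len(buf))


def vfs_read(obj, count):
    # The caller never checks the object's type; it only follows the pointer.
    return obj.ops.read(obj, count)


def vfs_write(obj, buf):
    return obj.ops.write(obj, buf)
```

A program written against vfs_read and vfs_write works unchanged with either object, which is the point of the abstraction layer: a new kind of object only has to supply its own table.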
5Caching
- Frequently used items are kept in memory
- They are written to background storage, but this
need be done only periodically or when the object
is closed
- Note that disk operations are several orders of
magnitude slower than memory
- Repeated reads or reads in the same block often
cause several memory operations, but only one
disk operation
- Caching can also be used to reduce the number of
network transfers, for example in network file
systems
- Global file system data is always cached; other
open objects as needed
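The claim that repeated reads in the same block cost only one disk operation can be sketched with a toy cache. This is a simplified model, not how the kernel's caches are implemented; the BlockCache class and its disk_reads counter are hypothetical, and the "disk" is just a bytes object.

```python
class BlockCache:
    """Toy read cache: at most one simulated disk read per block."""

    def __init__(self, disk: bytes, block_size: int = 1024):
        self.disk = disk
        self.block_size = block_size
        self.blocks = {}        # block number -> cached block contents
        self.disk_reads = 0     # counts simulated disk operations

    def _get_block(self, n):
        if n not in self.blocks:
            self.disk_reads += 1            # cache miss: one slow disk access
            start = n * self.block_size
            self.blocks[n] = self.disk[start:start + self.block_size]
        return self.blocks[n]

    def read(self, offset, count):
        out = bytearray()
        while count > 0 and offset < len(self.disk):
            block = self._get_block(offset // self.block_size)
            within = offset % self.block_size
            piece = block[within:within + count]
            out += piece
            offset += len(piece)
            count -= len(piece)
        return bytes(out)
```

Two reads that fall inside block 0 raise disk_reads to 1 only; a read that crosses into block 1 raises it to 2.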
6Common layered objects
Network connection
Pseudo or real terminal
Application layer
Application
Transport (TCP) layer
Network (IP) layer
Link (LAN) layer
Physical layer
7The parts of a generic file system
- Global information
- File system type and size
- Status information (up-to-date, time stamps,
bootable)
- Size and location of each component
- Per-file description
- Type, size, permissions
- Time stamps
- Pointers to data blocks or elements
- Data blocks
- Single data area (Unix/Linux, MS-DOS)
- Multiple attributes, of which data is one (NTFS)
8The 5 basic file operations
- open(path, type of open) returns a descriptor
- Associates a descriptor with a file; all other
calls refer to the file using this descriptor
- close(descriptor) returns success or failure
- Dissociates object from system; descriptor is
available for a new open
- read(descriptor, buffer, maxbytes) returns how
many bytes read
- Reads information into buffer, or obtains it if
read in advance
- write(descriptor, buffer, maxbytes) returns how
many bytes written
- Transfers contents of buffer into kernel space;
kernel writes them in its own good time
- ioctl(descriptor, command)
- Sends a command to the object designated by the
descriptor; this command may be device-dependent
- Note: there are more file operations; these are
the most common and important ones. Details
follow
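The open, write, read and close calls listed above can be tried from user space through Python's os module, which wraps the descriptor-based system calls fairly directly. This is a minimal sketch; the roundtrip function and the scratch file name are hypothetical.

```python
import os
import tempfile


def roundtrip(payload: bytes) -> bytes:
    """Write payload to a scratch file and read it back using raw descriptors."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.dat")   # hypothetical scratch file

        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            written = os.write(fd, payload)    # returns how many bytes were accepted
            if written != len(payload):
                raise OSError("short write")
        finally:
            os.close(fd)                       # the descriptor number may now be reused

        fd = os.open(path, os.O_RDONLY)
        try:
            return os.read(fd, 4096)           # returns at most 4096 bytes
        finally:
            os.close(fd)
```

ioctl is left out of the sketch because its commands are device-dependent; on Unix systems Python exposes it as fcntl.ioctl.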
9The file-operations structure (half)
10The file-operations structure (half)
11Linux specialties
- Ext2 file system
- Clustered implementation for efficiency
- Has hooks, but no implementation, for undelete
- mmap idea
- A file being accessed randomly can be treated as
virtual memory
- Accessing a block is handled through the virtual
memory subsystem, not the disk block cache
- Dentry (directory entry) objects
- Cached
- Accessed via hashing
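The mmap idea mentioned on this slide can be tried from user space: once a file is mapped, a byte at an arbitrary offset is changed by ordinary indexing rather than by seek and write calls. This is a minimal sketch using Python's standard mmap module; the function name and scratch file are hypothetical.

```python
import mmap
import os
import tempfile


def patch_byte_via_mmap(size: int, offset: int, value: int) -> bytes:
    """Create a zero-filled scratch file, change one byte through a memory
    map, and return the contents as read back with ordinary file I/O."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mapped.bin")
        with open(path, "wb") as f:
            f.write(bytes(size))
        with open(path, "r+b") as f:
            with mmap.mmap(f.fileno(), 0) as m:   # length 0 maps the whole file
                m[offset] = value                 # random access by indexing
                m.flush()                         # push the change back to the file
        with open(path, "rb") as f:
            return f.read()
```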
12Internal Linux structures
- Directory objects
- These are artificially constructed for file
systems without directories (FAT)
- Dentry objects
- Multipurpose
- Acts as a controller for the inode cache
- In Linux it is effectively an entry in the system
open file table
- In-core inodes
- These are artificially constructed for file
systems without inodes (FAT)
- These are always cached, hence the term in-core
13The ext2 file system
- Efficiency features
- Block size 1024, 2048, 4096
- Partitioned into groups to shorten seek time
- Choice of blocks/inode
- Preallocated contiguous spaces for regular files
- Fast symbolic links (link info stored in inode)
- Safety features
- Doing operations in repairable order
- Making a new hard link is done by
- Incrementing refcount in inode
- Then adding new name to directory
- If a crash happens between these two operations,
fsck finds and fixes it
- Support for (even root cannot override these)
- Immutable files
- Append-only files
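The "repairable order" rule for hard links can be illustrated with a small simulation. The inode is modelled as a dict with a refcount and the directory as a dict of names; add_hard_link and fsck_verdict are hypothetical names, and the verdict strings are not real fsck output.

```python
def add_hard_link(inode, directory, name, safe_order=True, crash_between=False):
    """Perform the two updates of a hard-link creation, optionally stopping
    after the first one to simulate a crash."""
    def bump():
        inode["refcount"] += 1

    def add_name():
        directory[name] = inode

    first, second = (bump, add_name) if safe_order else (add_name, bump)
    first()
    if crash_between:
        return            # simulated crash: the second update never happens
    second()


def fsck_verdict(inode, directory):
    """Compare the stored refcount with the number of names found."""
    names = sum(1 for target in directory.values() if target is inode)
    if inode["refcount"] == names:
        return "consistent"
    if inode["refcount"] > names:
        return "refcount too high: repairable, no name is dangling"
    return "refcount too low: a name could outlive its inode"
```

With the order given on the slide, a crash leaves the count too high, which a checker can lower without losing data. In the reverse order a crash leaves two names for an inode whose count is 1, so removing one name would free an inode that the other name still uses.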
14Planned new features for ext2
- Block fragmentation
- Same block can contain fragments of several file
ends
- Access control lists
- Fine-grained and temporary control of access
- Handling compressed and encrypted files
- Note: compress first, then encrypt
- Logical deletion (supports undelete)
- Reason is obvious
15Structure of an ext2 cluster
- Features and size
- Bitmaps are all one block in size, so number of
items <= bits per block
- Looking at bitmaps word-by-word allows fast
searching for holes
- Superblock and group descriptors are copies of
those in other groups
- Number of groups per file system >=
partition_size/group_size
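The word-by-word bitmap search and the two size relations on this slide can be sketched as follows. This is an illustration under stated assumptions, not ext2 source code: words are taken to be 32 bits with bit 0 as the least significant bit, every group is assumed to be as large as one bitmap block allows, and the function names are hypothetical.

```python
WORD_BITS = 32
FULL_WORD = (1 << WORD_BITS) - 1


def find_first_free(bitmap_words):
    """Return the index of the first 0 bit, or -1 if every bit is set."""
    for word_index, word in enumerate(bitmap_words):
        if word == FULL_WORD:
            continue                       # whole word in use: skip 32 items at once
        free = ~word & FULL_WORD           # 1 bits now mark free items
        lowest = free & -free              # isolate the lowest set bit
        return word_index * WORD_BITS + lowest.bit_length() - 1
    return -1


def items_per_group(block_size_bytes):
    # A bitmap occupies one block, so it can track at most this many items.
    return 8 * block_size_bytes


def groups_needed(partition_blocks, block_size_bytes):
    per_group = items_per_group(block_size_bytes)
    return -(-partition_blocks // per_group)   # ceiling division
```

For a 1024-byte block this gives at most 8192 items per group, so a partition of 8193 blocks needs two groups.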
16Contents of superblock and group descriptors (partial list)
- Superblock
- Must fit in 1024 bytes (why?)
- Flexible version number, etc.
- Size and current number of everything global
- Group descriptor (24 bytes)
- Location and current number of
- Bitmaps, inode tables, directories, and data
blocks
- Information on each group is found in each
group's descriptor area; each group knows about
all the others
17Inodes and the inode table
- Table size is implied in group descriptor
- Inode size is a fixed 128 bytes
- Structure is almost like that of all other Unix
filesystems; also
- Fragment address
- ACL for descriptor
18Usage of blocks: regular files and directories
- Directory
- Names can be up to 255 bytes; entry must contain
length
- End of entry is padded to multiple of 4
- Symbolic link
- Destination (if less than 60 char) is in inode,
else in a data block
- Device file, pipe, and socket
- Everything fits into inode
- Bitmap caches are used
- One each for inode and data blocks
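The directory-entry rules above (name of up to 255 bytes, a stored length, padding to a multiple of 4) and the 60-character symbolic-link rule can be made concrete with a short sketch. The slide does not give the field layout; the 8-byte header used here (inode number, record length, name length, file type) follows the commonly documented ext2 entry format and should be read as an assumption, as should the function names. A real entry may also carry a larger record length, for example when it is the last one in a block.

```python
import struct


def pack_dir_entry(inode_no: int, name: str, file_type: int = 1) -> bytes:
    """Pack one directory entry: inode (4 bytes), record length (2),
    name length (1), file type (1), then the name padded with NULs so the
    whole entry is a multiple of 4 bytes long."""
    raw = name.encode("utf-8")
    if not 1 <= len(raw) <= 255:
        raise ValueError("name must be 1..255 bytes")
    rec_len = (8 + len(raw) + 3) & ~3      # round header + name up to a multiple of 4
    header = struct.pack("<IHBB", inode_no, rec_len, len(raw), file_type)
    return (header + raw).ljust(rec_len, b"\0")


def symlink_fits_in_inode(target: str) -> bool:
    # Rule from the slide: destinations shorter than 60 characters stay in the inode.
    return len(target.encode("utf-8")) < 60
```

A 5-byte name therefore yields a 16-byte entry: 8 header bytes, 5 name bytes and 3 bytes of padding.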
19Disk space management
- Goals
- Avoid fragmentation
- Time efficiency
- Methods for achieving these goals
- Allocation
- Directories should be evenly distributed among
groups
- Files should be near their directories
- Preallocation allocates adjacent blocks near any
new one
20VFS in action
21The basic objects of VFS
- The superblock object
- Stores information concerning a mounted
filesystem. For disk-based filesystems, this
object usually corresponds to a filesystem
control block stored on disk.
- The inode object
- Stores general information about a specific file.
For disk-based filesystems, this object usually
corresponds to a file control block stored on
disk. Each inode object is associated with an
inode number, which uniquely identifies the file
within the filesystem.
- The file object
- Stores information about the interaction between
an open file and a process. This information
exists only in kernel memory during the period
when a process has the file open.
- The dentry object
- Stores information about the linking of a
directory entry (that is, a particular name of
the file) with the corresponding file. Each
disk-based filesystem stores this information in
its own particular way on disk.
24
- llseek(file, offset, origin): Updates the file
pointer.
- read(file, buf, count, offset): Reads count bytes
from a file starting at position offset; the
value offset (which usually corresponds to the
file pointer) is then increased.
- aio_read(req, buf, len, pos): Starts an
asynchronous I/O operation to read len bytes into
buf from file position pos (introduced to support
the io_submit() system call).
- write(file, buf, count, offset): Writes count
bytes into a file starting at position offset;
the value offset (which usually corresponds to
the file pointer) is then increased.
- aio_write(req, buf, len, pos): Starts an
asynchronous I/O operation to write len bytes
from buf to file position pos.
- readdir(dir, dirent, filldir): Returns the next
directory entry of a directory in dirent; the
filldir parameter contains the address of an
auxiliary function that extracts the fields in a
directory entry.
- poll(file, poll_table): Checks whether there is
activity on a file and goes to sleep until
something happens on it.
- ioctl(inode, file, cmd, arg): Sends a command to
an underlying hardware device. This method
applies only to device files.
- unlocked_ioctl(file, cmd, arg): Similar to the
ioctl method, but it does not take the big kernel
lock (see the section "The Big Kernel Lock" in
Chapter 5). It is expected that all device
drivers and all filesystems will implement this
new method instead of the ioctl method.
- compat_ioctl(file, cmd, arg): Method used to
implement the ioctl() 32-bit system call by
64-bit kernels.
- mmap(file, vma): Performs a memory mapping of the
file into a process address space (see the
section "Memory Mapping" in Chapter 16).
- open(inode, file): Opens a file by creating a new
file object and linking it to the corresponding
inode object (see the section "The open() System
Call" later in this chapter).
- flush(file): Called when a reference to an open
file is closed. The actual purpose of this method
is filesystem-dependent.
- release(inode, file): Releases the file object.
Called when the last reference to an open file is
closed, that is, when the f_count field of the
file object becomes 0.
- fsync(file, dentry, flag): Flushes the file by
writing all cached data to disk.
- aio_fsync(req, flag): Starts an asynchronous I/O
flush operation.
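The effect of llseek, read and fsync on the file pointer can be observed from user space through the corresponding system calls. The sketch below uses Python's os wrappers rather than the kernel methods themselves; offsets_demo and the scratch file are hypothetical.

```python
import os
import tempfile


def offsets_demo():
    """Show that read() advances the file pointer that lseek() sets."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "offsets.dat")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.write(fd, b"0123456789")
            os.fsync(fd)                                # ask for cached data to be flushed
            os.lseek(fd, 2, os.SEEK_SET)                # move the file pointer to offset 2
            first = os.read(fd, 3)                      # pointer advances by 3
            after_read = os.lseek(fd, 0, os.SEEK_CUR)   # query the pointer without moving it
            second = os.read(fd, 3)                     # continues where the last read stopped
        finally:
            os.close(fd)
        return first, after_read, second
```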