Title: A FAST FILE SYSTEM FOR UNIX
Marshall K. McKusick, William N. Joy, Samuel J. Leffler, Robert S. Fabry - CSRG, UC Berkeley
2. PAPER HIGHLIGHTS
- Main objective of FFS was to improve file system bandwidth
- Key ideas were
  - Subdividing disk partitions into cylinder groups, each having both i-nodes and data blocks
  - Using larger blocks but managing block fragments
  - Replicating the superblock
3. THE OLD UNIX FILE SYSTEM
- Each disk partition contains
  - a superblock containing the parameters of the file system disk partition
  - an i-list with one i-node for each file or directory in the disk partition, and a free list
  - the data blocks (512 bytes)
4. More details
- File systems cannot span multiple partitions
  - Must use mount() to merge several file systems into a single tree
- Superblock contains
- The number of data blocks in the file system
- A count of the maximum number of files
- A pointer to the free list
5. File types
- Three types of files
  - ordinary files: uninterpreted sequences of bytes
  - directories: accessed through special system calls
  - special files: allow access to hardware devices but are not really files
6. Ordinary files (I)
- Five basic file operations are implemented
  - open() returns a file descriptor
  - read() reads a given number of bytes
  - write() writes a given number of bytes
  - lseek() changes the position of the current byte
  - close() destroys the file descriptor
7. Ordinary files (II)
- All reads and writes are sequential
  - The effect of direct access is achieved by manipulating the offset through lseek()
- Files are stored in fixed-size blocks
  - Block boundaries are hidden from the users
  - Same as in FAT and NTFS file systems
8. The file metadata
- Include the file size, file owner, access rights, and time of last modification, but not the file name
- Stored in the file i-node
- Accessed through special system calls: chmod(), chown(), ...
9. I/O buffering
- UNIX caches in main memory
  - I-nodes of opened files
  - Recently accessed file blocks
- Delayed write policy
  - Increases the I/O throughput
  - Will result in lost writes whenever a process or the system crashes
- Terminal I/O is buffered one line at a time
10. Directories (I)
- Map file names to i-node addresses
- Do not contain any other information!
11. Directories (II)
- Two or more directory entries can point to the same i-node
  - A file can have several names
- Directory subtrees cannot cross file system boundaries
- To avoid loops in the directory structure, directory files cannot have more than one pathname
12. Mounting a file system
[Diagram: the root partition holds / with subdirectories usr and bin; a second partition is mounted at /usr]
- After mount, the root of the second partition can be accessed as /usr
13. Special files
- Map file names to system devices
  - /dev/tty: your terminal screen
  - /dev/kmem: the kernel memory
  - /dev/fd0: the floppy drive
- Main motivation is to allow accessing these devices as if they were files
  - No separate I/O constructs for devices
14. A file system
[Diagram: a partition laid out as superblock, then i-nodes, then data blocks]
15. The i-node (I)
- Each i-node contains
- The user-id and the group-id of the file owner
- The file protection bits
- The file size
- The times of file creation, last usage and last
modification
16. The i-node (II)
- The number of directory entries pointing to the file, and
- A flag indicating whether the file is a directory, an ordinary file, or a special file
- Thirteen block addresses
- The file name(s) can be found in the directory entries pointing to the i-node
17. Storing block addresses
18. Addressing file contents
- The i-node has ten direct block addresses
  - The first 5,120 bytes of a file are directly accessible from the i-node
- The next block address contains the address of a block holding 512/4 = 128 block addresses
  - The next 64 KB of a file require one level of indirection
19. Addressing file contents
- The next block address allows access to a total of (512/4)² = 16K data blocks
  - The next 8 MB of a file require two levels of indirection
- The last block address allows access to a total of (512/4)³ = 2M blocks
  - The next GB of a file requires three levels of indirection
20. Explanation
- File sizes can vary from a few hundred bytes to a few gigabytes, with a hard limit of 4 gigabytes
- The designers of UNIX selected an i-node organization that
organization that - Wasted little space for small files
- Allowed very large files
21. Discussion
- What is the true cost of accessing large files?
- UNIX caches i-nodes and data blocks
- When we access a very large file sequentially, we fetch each block of pointers only once
  - Very small overhead
- Random access will result in more overhead if we cannot cache all the blocks of pointers
22. First Berkeley modifications
- Staging modifications to critical file system information so that they could either be completed or repaired cleanly after a crash
- Increasing the block size to 1,024 bytes
  - Improved performance by a factor of more than two
  - Still did not let the file system use more than four percent of the disk bandwidth
23. What is disk bandwidth?
- Maximum throughput of a file system if the disk drive were continuously transferring data
- Actual bandwidths are much lower because of
- Disk seeks
- Disk rotational latency
24. Major issue
- As files were created and deleted, the free list became entirely random
  - Files were allocated random blocks that could be anywhere on the disk
- Caused a very significant degradation of file system performance (factor of 5!)
- Problem is not unique to the old UNIX file system
- Still present in FAT and NTFS file systems
25. THE FAST FILE SYSTEM
- BSD 4.2 introduced the fast file system
- Superblock is replicated on different cylinders of the disk
- One i-node table per group of cylinders
  - Minimizes disk arm motions
- I-node now has 15 block addresses
  - Minimum block size is 4 KB
  - 15th block address is never used
26. Cylinder groups
- Each disk partition is subdivided into groups of consecutive cylinders
- Each cylinder group contains a bit map of all available blocks in the cylinder group
  - Better than a linked list
- The file system will attempt to keep consecutive blocks of the same file in the same cylinder group
27. Larger block sizes
- FFS uses larger blocks
  - At least 4 KB
- Blocks can be subdivided into 2, 4, or 8 fragments that can be used to store
  - Small files
  - The tails of larger files
28. Replicating the superblock
- Each cylinder group has its own copy of the superblock
- Ensures that a single head crash would never delete all copies of the superblock
29. Explanations (I)
- Increasing the block size to 4 KB eliminates the third level of indirection
- Keeping consecutive blocks of the same file in the same cylinder group reduces disk arm motions
30. Internal fragmentation issues
- Since UNIX file systems typically store many very small files, increasing the block size results in an unacceptably high level of internal fragmentation
31. The solution
- Using 4 KB blocks without allowing fragments would have wasted 45.6 percent of the disk space
  - This would be less true today
- FFS solution is to allocate block fragments to small files and the tail ends of large files
  - Allows efficient sequential access to large files
  - Minimizes disk fragmentation
32. Layout policies (I)
- FFS tries to place all data blocks for a file in the same cylinder group, preferably
  - At rotationally optimal positions
  - In the same cylinder
- Large files could quickly use up all available space in the cylinder group
33. Layout policies (II)
- FFS redirects block allocation to a different cylinder group
  - when a file exceeds 48 kilobytes
  - and at every megabyte thereafter
34. PERFORMANCE IMPROVEMENTS
- Read rates improved by a factor of seven
- Write rates improved by a factor of almost three
- Transfer rates for FFS do not deteriorate over time
  - No need to defragment the file system from time to time
- Must keep a reasonable amount of free space
  - Ten percent would be ideal
35. Limitations of approach (I)
- Even FFS does not utilize the full disk bandwidth
  - Log-structured file systems do most writes in sequential fashion
- Crashes may leave the file system in an inconsistent state
  - Must check the consistency of the file system at boot time
36. Limitations of approach (II)
- Most of the good performance of FFS is due to its extensive use of I/O buffering
  - Physical writes are totally asynchronous
- Metadata updates must follow a strict order
  - Cannot create a new directory entry before the new i-node it points to
  - Cannot delete an old i-node before deleting the last directory entry pointing to it
37. Example: Creating a file (I)
[Diagram: a directory with entry abc pointing to i-node-1 and entry ghi pointing to i-node-3]
- Assume we want to create a new file tuv
38. Example: Creating a file (II)
[Diagram: the same directory with a new entry tuv whose i-node does not yet exist]
- Cannot write directory entry tuv before its i-node
39. Limitations of approach (III)
- Out-of-order metadata updates can leave the file system in a temporarily inconsistent state
  - Not a problem as long as the system does not crash between the two updates
  - Systems are known to crash
- FFS performs synchronous updates of directories and i-nodes
  - Solution is safe but costly
40. OTHER ENHANCEMENTS
- Longer file names
  - 256 characters
- File locking
- Symbolic links
- Disk quotas
41. File locking
- Allows controlling shared access to a file
  - We want a one writer/multiple readers policy
- Older versions of UNIX did not allow file locking
- System V allows file and record locking at byte-level granularity through fcntl()
- Berkeley UNIX has purely advisory file locks (like asking people to knock before entering)
42. Symbolic links
- With Berkeley UNIX symbolic links, you can write
  - ln -s /usr/bin/programs /bin/programs
- even though /usr/bin/programs and /bin/programs are in two different partitions
- Symbolic links point to another directory entry instead of the i-node