Title: File Systems and Disk Management
File System
- Interface between applications and mass storage devices
- Provides an abstraction for mass storage and I/O devices
File System Abstraction
Physical reality | File system abstraction
Block-oriented | Byte-oriented
Physical sectors | Named files
No protection | Users protected from one another
Data might be corrupted if machine crashes | Robust to machine failures
File System Components
- Disk management: organizes disk blocks into files
- Naming: provides file names and directories to users, instead of track and sector numbers
- Protection: keeps information secure from other users
- Reliability: protects against information loss due to system crashes
User vs. System View of a File
- User level: individual files
- System call level: a collection of bytes
- Operating system level
  - A block is a logical transfer unit
    - Used even for getc() and putc()
    - Typically 4 Kbytes under UNIX
  - A sector is a physical transfer unit
    - Traditionally 512-byte sectors on disks
  - The block size is a multiple of the sector size
User vs. System View of a File (continued)
- A process
  - Reads bytes 2 to 12
- OS
  - Fetches the block containing those bytes
  - Returns those bytes to the process
User vs. System View of a File (continued)
- A process
  - Writes bytes 2 to 12
- OS
  - Fetches the block containing those bytes
  - Modifies those bytes
  - Writes out the block
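The read and write paths above can be sketched in a few lines of Python. This is only an illustration: `BlockDevice`, `read_bytes`, and `write_bytes` are hypothetical names, the "disk" is an in-memory list, and the 4 KB block size is an assumption taken from the UNIX figure quoted earlier.

```python
BLOCK_SIZE = 4096  # assumed logical transfer unit (4 KB)


class BlockDevice:
    """Hypothetical in-memory disk that only transfers whole blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = bytes(data)


def read_bytes(dev, start, end):
    """Return bytes start..end (inclusive), fetching every block they touch."""
    out = bytearray()
    pos = start
    while pos <= end:
        block_no, offset = divmod(pos, BLOCK_SIZE)
        count = min(end - pos + 1, BLOCK_SIZE - offset)
        out += dev.read_block(block_no)[offset:offset + count]
        pos += count
    return bytes(out)


def write_bytes(dev, start, data):
    """Write data at byte offset start, one read-modify-write per block."""
    pos, i = start, 0
    while i < len(data):
        block_no, offset = divmod(pos, BLOCK_SIZE)
        count = min(len(data) - i, BLOCK_SIZE - offset)
        block = bytearray(dev.read_block(block_no))        # fetch the block
        block[offset:offset + count] = data[i:i + count]   # modify those bytes
        dev.write_block(block_no, block)                   # write out the block
        pos += count
        i += count
```

Even an 11-byte request moves a full block in each direction, which is the gap between the byte-oriented view and the block-oriented reality.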
Ways to Access Files
- People use file systems
- Designing a file system involves understanding how people use file systems
- Sequential access: bytes are accessed in order
- Random access (direct access): bytes are accessed in any order
- Content-based access: bytes are accessed according to constraints on contents
  - e.g., return 100 bytes starting with "aye carumba"
File Usage Patterns
- Most files are small, and most references are to small files
  - e.g., .login and .c files
- Large files use up most of the disk space
  - e.g., mp3 files
- Large files account for most of the bytes transferred between memory and disk
- Bad news for file system designers
File System Design Constraints
- High performance
- Efficient access of small files
  - Many small files
  - Used frequently
- Efficient access of large files
  - Consume most disk space
  - Account for most of the data movement
Some Definitions
- A file contains a file header, which associates the file with its disk sectors
[Figure: file header]
Some Definitions (continued)
- A file system needs a disk allocation bitmap to represent free space on the disk, one bit per block
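A minimal sketch of such a bitmap, assuming the convention that a set bit means "allocated" and a clear bit means "free" (the slides do not fix a convention). The class and method names are hypothetical.

```python
class Bitmap:
    """One bit per disk block; 1 = allocated, 0 = free (assumed convention)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # round up to whole bytes

    def is_free(self, n):
        return ((self.bits[n // 8] >> (n % 8)) & 1) == 0

    def mark_used(self, n):
        self.bits[n // 8] |= 1 << (n % 8)

    def mark_free(self, n):
        self.bits[n // 8] &= ~(1 << (n % 8)) & 0xFF

    def find_free(self):
        """Return the lowest-numbered free block, or None if the disk is full."""
        for n in range(self.num_blocks):
            if self.is_free(n):
                return n
        return None
```

At one bit per block, a bitmap for a million blocks occupies about 122 KB, so it is cheap to keep in memory.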
Disk Allocation Policies
- Contiguous allocation
- Linked-list allocation
- Segment-based allocation
- Indexed allocation
- Multi-level indexed allocation
- Hashed allocation
Contiguous Allocation
- File blocks are stored contiguously on disk
- To allocate a file,
  - Specify the file size
  - Search the disk allocation bitmap for consecutive free blocks
[Figure: file header]
Pros and Cons of Contiguous Allocation
- Pro: fast sequential access
- Pro: ease of computing random file locations
  - Add an offset to the first disk block location
- Con: external fragmentation
- Con: difficulty in growing files
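The allocation search and the offset computation for contiguous allocation might look like the sketch below. A list of booleans stands in for the disk allocation bitmap, and the dictionary returned by `allocate` is a hypothetical file header; both are assumptions made for illustration.

```python
def find_contiguous(free, size):
    """Return the first index of `size` consecutive free blocks, or None."""
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == size:
                return run_start
        else:
            run_len = 0  # run broken by an allocated block
    return None


def allocate(free, size):
    """Allocate `size` contiguous blocks; return a header or None on failure."""
    start = find_contiguous(free, size)
    if start is None:
        return None
    for i in range(start, start + size):
        free[i] = False
    return {"start": start, "length": size}  # hypothetical file header


def block_location(header, logical_block):
    """Random access: add an offset to the first disk block location."""
    if not 0 <= logical_block < header["length"]:
        raise IndexError("block outside file")
    return header["start"] + logical_block
```

External fragmentation shows up when `allocate` returns None even though enough free blocks exist in total, because no single run is long enough.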
Linked-List Allocation
- Each file block on a disk is associated with a pointer to the next block
- A special marker indicates the end of the file
- e.g., MS-DOS file system
  - File allocation table (FAT)
[Figure: file header]
Pros and Cons of Linked-List Allocation
- Pro: files can grow dynamically with incremental allocation of blocks
- Con: sequential access may suffer
  - Blocks may not be contiguous
- Con: horrible random accesses
  - May involve multiple sequential searches
- Con: unreliable
  - A corrupted pointer can lead to loss of the remaining file
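A FAT-style table can be sketched as a list in which entry `b` holds the number of the block that follows block `b`. The `EOF` value and the function names are choices made for this sketch, not the actual MS-DOS on-disk encoding.

```python
EOF = -1  # special end-of-file marker (value chosen for this sketch)


def file_blocks(fat, first_block):
    """Sequential access: follow the chain from the first block to EOF."""
    blocks = []
    b = first_block
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks


def nth_block(fat, first_block, n):
    """Random access: reaching block n means following n pointers."""
    b = first_block
    for _ in range(n):
        b = fat[b]
        if b == EOF:
            raise IndexError("past end of file")
    return b


def append_block(fat, first_block, new_block):
    """Grow the file by one block: walk to the tail and link the new block."""
    b = first_block
    while fat[b] != EOF:
        b = fat[b]
    fat[b] = new_block
    fat[new_block] = EOF
```

The cost of `nth_block` grows with `n`, and a single bad entry in `fat` cuts off every block after it, which matches the random-access and reliability drawbacks listed above.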
Indexed Allocation
- Uses a preallocated index to directly track the file block locations
[Figure: file header]
Pros and Cons of Indexed Allocation
- Pro: fast lookups and random accesses
- Con: file blocks may be scattered all over the disk
  - Poor sequential access
  - Needs a defragmenter
- Con: needs to reallocate the index as the file size increases
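As a rough sketch, an indexed file header can be modeled as a fixed-size array of disk block numbers. `IndexedFile` is a hypothetical name, and doubling the index when it fills up is just one possible reallocation policy; it assumes an initial `index_size` of at least 1.

```python
class IndexedFile:
    """File header holding a preallocated index of disk block numbers."""

    def __init__(self, index_size):
        self.index = [None] * index_size  # preallocated; must be >= 1 entry
        self.num_blocks = 0

    def append(self, disk_block):
        if self.num_blocks == len(self.index):
            # Index is full: reallocate a larger one (doubling is assumed here)
            self.index = self.index + [None] * len(self.index)
        self.index[self.num_blocks] = disk_block
        self.num_blocks += 1

    def lookup(self, logical_block):
        """Random access is a single array lookup."""
        if not 0 <= logical_block < self.num_blocks:
            raise IndexError("block outside file")
        return self.index[logical_block]
```

The block numbers stored in the index need not be adjacent, which is why lookups are fast but sequential reads may hop around the disk.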
Segment-Based Allocation
- Needs a segment table to allocate multiple contiguous regions of blocks
[Figure: file header]
Pros and Cons of Segment-Based Allocation
- Pro: relaxes the requirement for large contiguous disk regions
- Con: random accesses are not as fast as with pure contiguous allocation
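To see why random access is slower than with pure contiguous allocation, consider a sketch in which the segment table is a list of `(start_block, length)` pairs in file order. The representation and the name `locate` are assumptions for illustration.

```python
def locate(segments, logical_block):
    """Map a logical block number to a disk block via the segment table."""
    if logical_block < 0:
        raise IndexError("negative block number")
    remaining = logical_block
    for start, length in segments:
        if remaining < length:
            return start + remaining  # offset within this contiguous region
        remaining -= length           # skip past this segment
    raise IndexError("block outside file")
```

The lookup walks the table instead of doing a single addition, but within each segment the blocks stay contiguous, so sequential reads remain fast.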
Multilevel Indexed Allocation
- Certain index entries point to index blocks, as opposed to data blocks (e.g., Linux ext2)
[Figure: file header]
Multilevel Indexed Allocation (continued)
- A single indirect block contains pointers to data blocks
- A double indirect block contains pointers to single indirect blocks
- A triple indirect block contains pointers to double indirect blocks
Pros and Cons of Multilevel Indexed Allocation
- Pro: optimized for small and large files
  - Small files are accessed through the first 12 pointers
  - Large files can grow incrementally
- Con: multiple disk accesses to fetch a data block under the triple indirect block
- Con: largest file size capped by the number of pointers
- Con: arbitrary file size boundaries among levels
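The level boundaries and the size cap can be worked out with a short sketch. The 12 direct pointers come from the slide; the 4 KB block size and 4-byte pointers (so 1024 pointers per index block) are assumptions, and `classify` is a hypothetical helper.

```python
BLOCK_SIZE = 4096                        # assumed block size
PTR_SIZE = 4                             # assumed size of one block pointer
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE  # 1024 under these assumptions
NUM_DIRECT = 12                          # direct pointers in the file header


def classify(logical_block):
    """Return (level, indices): which pointer chain reaches a logical block."""
    n = logical_block
    if n < 0:
        raise ValueError("negative block number")
    if n < NUM_DIRECT:
        return ("direct", (n,))
    n -= NUM_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("single", (n,))
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return ("double", divmod(n, PTRS_PER_BLOCK))
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        i, rest = divmod(n, PTRS_PER_BLOCK ** 2)
        j, k = divmod(rest, PTRS_PER_BLOCK)
        return ("triple", (i, j, k))
    raise ValueError("beyond the maximum file size")


def max_file_blocks():
    """File size cap, in blocks, implied by the number of pointers."""
    return (NUM_DIRECT + PTRS_PER_BLOCK
            + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3)
```

Under these assumptions a file up to 48 KB (12 blocks) needs no index blocks at all, while a block reached through the triple indirect pointer costs three extra index-block reads before the data block itself.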
Hashed Allocation
- Allocates a disk block by hashing the block content to a disk location
[Figure: old file header and new file header]
Pros and Cons of Hashed Allocation
- Pro: file blocks of the same content can share the same disk block to save storage
  - e.g., empty blocks
- Pro: good for backups and archival
  - Small modifications to a large file result in only additional storage of the changes
- Con: poor disk performance
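The sharing behavior can be illustrated with a toy content-addressed store. Here a dictionary keyed by hash stands in for "disk location", SHA-256 is an arbitrary choice of hash function, and the tiny 4-byte block size only keeps the example readable; none of these details come from the slides.

```python
import hashlib


class HashedStore:
    """Toy block store where a block's address is the hash of its content."""

    def __init__(self):
        self.blocks = {}  # hash -> block content (stand-in for disk locations)

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key):
        return self.blocks[key]


def store_file(store, data, block_size=4):
    """Store a file block by block; the header is the list of block hashes."""
    return [store.put(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def load_file(store, header):
    """Rebuild a file from its header."""
    return b"".join(store.get(key) for key in header)
```

Storing a second version of a file that differs in one block adds only that block; the old and new file headers share every unchanged entry, which is what makes the scheme attractive for backups and archival.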