Title: Local File Stores
Notice of Review Session
- 12/11 (reading day)
- 10:00 am to 12:00 noon
- Bowne 111 (this room)
Job of a File Store
- Recall that the File System is responsible for namespace management, locking, quotas, etc.
- The File Store's responsibility is to manage the placement of data and index blocks on the disk.
- Prior to 4.4BSD, these were rolled into one module, also called the filesystem (remember, there were no vnodes).
Physical Disk Layout
[Figure: overhead view of a disk, labeling a sector, a track, and a cylinder]
- The same track on each platter in a disk makes a cylinder.
- Partitions are groups of contiguous cylinders.
- Disk blocks are composed of one or more contiguous sectors.
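To make the geometry concrete, here is a minimal Python sketch that turns a cylinder/head/sector (CHS) address into a linear sector number. It assumes the classic CHS convention (cylinders and heads counted from 0, sectors counted from 1) and uses the 255-head, 63-sector geometry from the sample partition table; the function name chs_to_lba is ours, not part of any real API.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Linear sector number for a cylinder/head/sector address.

    Assumes the classic CHS convention: cylinders and heads count from 0,
    sectors within a track count from 1.
    """
    if not 0 <= head < heads:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    # A cylinder holds one track per head (surface), each with sectors_per_track sectors.
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


# Geometry from the sample partition table: 255 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 255, 63))   # 0: first sector on the disk
print(chs_to_lba(1, 0, 1, 255, 63))   # 16065: first sector of cylinder 1
```

Because every sector of a cylinder (255 × 63 = 16065 sectors) is numbered before moving to the next cylinder, consecutive linear sectors inside one cylinder can be reached without a seek.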
Sample Partition Table
Disk /dev/sda: 255 heads, 63 sectors, 1106 cylinders
Units = cylinders of 16065 * 512 bytes

Device     Boot  Start   End    Blocks  Id  System
/dev/sda1            1   261   2096451  83  Linux
/dev/sda2          262  1106   6787462   5  Extended
/dev/sda5          262   264     24066  83  Linux
/dev/sda6          265   814   4417843  83  Linux
/dev/sda7          815  1089   2208906  83  Linux
/dev/sda8         1090  1106    136521  82  Linux swap
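The Blocks column can be reproduced from the geometry. This sketch assumes fdisk reports Blocks in 1 KiB units and that the first track (63 sectors) of each primary or logical partition is not counted, because it holds the MBR or an extended boot record; the extended container (/dev/sda2) skips nothing. These are our assumptions, chosen because they reproduce every row of the table; partition_kib is a hypothetical helper.

```python
SECTOR_BYTES = 512
SECTORS_PER_CYLINDER = 255 * 63   # heads * sectors per track = 16065


def partition_kib(start_cyl, end_cyl, skipped_sectors=63):
    """Size in 1 KiB blocks of a partition covering cylinders start_cyl..end_cyl.

    skipped_sectors is an assumption: the first track (63 sectors) of a primary
    or logical partition holds a boot record and is not counted.
    """
    sectors = (end_cyl - start_cyl + 1) * SECTORS_PER_CYLINDER - skipped_sectors
    return sectors * SECTOR_BYTES // 1024


print(partition_kib(1, 261))                        # /dev/sda1: 2096451
print(partition_kib(262, 1106, skipped_sectors=0))  # /dev/sda2 (extended): 6787462
```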
vnode File Store Operations
- valloc, vfree: object creation and deletion
- update: attribute update
- vget, blkatoff, read, write, fsync: object read/write
- truncate: change in space allocation (size)
valloc and vfree
- valloc creates a new object
- Returns a number (identifier)
- Mapping of names to identifiers is the job of the namespace code; the filestore only deals with numbers
- vfree takes the number of an object and releases the storage holding that object
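The division of labor can be sketched with a toy, hypothetical filestore in Python: valloc hands out object numbers and vfree recycles them, and nothing in the class knows about names. ToyFilestore and its fields are illustrative only; the real 4.4BSD routines operate on vnodes and on-disk inodes.

```python
class ToyFilestore:
    """Hypothetical filestore that knows objects only by number, never by name."""

    def __init__(self, max_objects):
        # Free object numbers, used as a stack; initially ordered so 0 comes out first.
        self._free = list(range(max_objects - 1, -1, -1))
        self._storage = {}   # object number -> bytearray of object data

    def valloc(self):
        """Create a new, empty object and return its number."""
        if not self._free:
            raise OSError("filestore full")
        number = self._free.pop()
        self._storage[number] = bytearray()
        return number

    def vfree(self, number):
        """Release the storage held by object `number` and recycle the number."""
        del self._storage[number]
        self._free.append(number)


fs = ToyFilestore(max_objects=4)
a = fs.valloc()   # 0
b = fs.valloc()   # 1
fs.vfree(a)
c = fs.valloc()   # 0 again: the number was recycled
```

A namespace layer, such as a directory mapping names to these numbers, would sit above this class.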
update
- Changes attributes of an object
- Owner
- Group
- Permissions
- Timestamps
- Does not interpret these fields in any way
vget, read, write, blkatoff, fsync
- vget retrieves an entire object from the filestore
- read copies data from the object into a buffer (uses a uio structure)
- write copies data from a buffer to the object (uses a uio structure)
- blkatoff is like read, but returns a pointer instead of copying data (the data stays in the kernel)
- fsync writes out all dirty buffers for the object
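The difference between read (copy) and blkatoff (pointer) can be illustrated in Python with a bytearray copy versus a memoryview. ToyObject and its 8-byte block size are hypothetical; in the kernel, the pointer blkatoff returns refers to a cached block, not a Python view.

```python
class ToyObject:
    """Hypothetical filestore object whose data lives in one in-memory bytearray."""

    def __init__(self, data, block_size=8):
        self.data = bytearray(data)
        self.block_size = block_size

    def read(self, offset, buf):
        """Copy up to len(buf) bytes starting at offset into buf; return the count."""
        chunk = self.data[offset:offset + len(buf)]   # slicing a bytearray makes a copy
        buf[:len(chunk)] = chunk
        return len(chunk)

    def blkatoff(self, offset):
        """Return a zero-copy view of the block containing offset."""
        start = (offset // self.block_size) * self.block_size
        return memoryview(self.data)[start:start + self.block_size]


obj = ToyObject(b"abcdefghijklmnop")
buf = bytearray(4)
obj.read(2, buf)          # buf holds a private copy of b"cdef"
view = obj.blkatoff(10)   # view of block 1 (b"ijklmnop"), sharing obj.data's memory
obj.data[2] = ord("Z")    # does not affect buf
obj.data[8] = ord("X")    # is visible through view
print(bytes(buf), bytes(view))   # b'cdef' b'Xjklmnop'
```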
truncate
- Changes the amount of space an object has
- Historically, truncate only shortened objects (decreased their size)
- In 4.4BSD, truncate can also expand the size of an object (a confusing name)
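At user level, the POSIX truncate(2) call shows the same two-way behavior: a file can be cut down or extended, and the extension reads back as zero bytes. This sketch uses Python's os.truncate; it illustrates the system-call interface, not the filestore routine itself.

```python
import os
import tempfile


def truncate_demo():
    """Shrink and then grow a file with truncate(2), via Python's os.truncate."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo")
        with open(path, "wb") as f:
            f.write(b"hello world")
        os.truncate(path, 5)          # shorten: only b"hello" remains
        size_after_shrink = os.path.getsize(path)
        os.truncate(path, 4096)       # lengthen: the new bytes read back as zeros
        with open(path, "rb") as f:
            data = f.read()
    return size_after_shrink, len(data), data[:5], data[5:].count(0)


print(truncate_demo())   # (5, 4096, b'hello', 4091)
```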
Filestores and Partitions
- One-to-one relationship
- No more than one filestore per partition
- Filestores may not span partitions
- The filestore is responsible for managing space within its partition
- Creation, storage, retrieval, deletion of files
- Flat name space (inode numbers and data block numbers)
Allocation Strategies
- In the old days: contiguous allocation
- All blocks of a file stored together
- As the file grows, it moves around the disk
- Requires compaction
- Indexed allocation (inodes) allows non-contiguous (scattered) allocation
- No compaction necessary
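A toy model of indexed allocation: each file's "inode" is just a list of block numbers, so a file can grow into any free block and deleting a file never forces other files to move. IndexedDisk is a hypothetical name, and the lowest-free-block policy is an arbitrary choice that keeps the example deterministic.

```python
class IndexedDisk:
    """Toy disk using indexed allocation: each file's inode is a list of block numbers."""

    def __init__(self, nblocks):
        self.free = set(range(nblocks))
        self.inodes = {}   # file id -> block numbers, in logical-block order

    def append_block(self, fid):
        """Grow a file by one block, taking any free block (here: the lowest)."""
        if not self.free:
            raise OSError("disk full")
        blk = min(self.free)
        self.free.remove(blk)
        self.inodes.setdefault(fid, []).append(blk)
        return blk

    def delete(self, fid):
        """Return all of a file's blocks to the free set; nothing else moves."""
        self.free.update(self.inodes.pop(fid))


disk = IndexedDisk(6)
disk.append_block("a")   # block 0
disk.append_block("b")   # block 1
disk.append_block("a")   # block 2
disk.delete("b")         # frees block 1
disk.append_block("a")   # reuses block 1: file "a" now lives in blocks [0, 2, 1]
```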
Block I/O
- Even if a user asks for just one byte, the disk transfers a block
- The file system divides files into fixed-size logical blocks
- Size depends on the underlying filestore
- Logical blocks are stored in physical disk blocks
- One or more contiguous sectors
- E.g., 8,192-byte blocks and 512-byte sectors
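With 8,192-byte blocks and 512-byte sectors, each block spans 16 sectors. The sketch below maps a file byte offset to a logical block and a physical block to its sectors; it assumes physical block 0 begins at sector 0 of the partition and ignores any partition offset.

```python
BLOCK_SIZE = 8192
SECTOR_SIZE = 512
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE   # 16


def locate(byte_offset):
    """File byte offset -> (logical block number, offset within that block)."""
    return divmod(byte_offset, BLOCK_SIZE)


def block_sectors(physical_block):
    """Sectors holding a physical block, assuming block 0 starts at sector 0."""
    first = physical_block * SECTORS_PER_BLOCK
    return range(first, first + SECTORS_PER_BLOCK)


print(locate(10000))                 # (1, 1808)
print(list(block_sectors(2))[:3])    # [32, 33, 34]
```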
Disk Request Handling
- The user sees an array of bytes
- The user makes a request with a pointer to a buffer and a length
- No alignment guarantees with respect to blocks
- No size guarantees with respect to blocks
- Disk blocks are buffered in the filesystem buffer cache
Steps in Request Handling: Example of a Simple Write
- For each block the request touches:
- Allocate a buffer
- Determine the location of the physical block on disk
- Request the disk controller to read the block's contents into the buffer, and wait
- Copy from the user's I/O buffer to the system buffer
- Write the block to disk and continue (don't wait)
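The steps above can be sketched in Python with a tiny 8-byte block size. disk and block_map are hypothetical stand-ins for the device and for the filestore's block lookup, and the write-back here is synchronous, unlike the "don't wait" of the last step. The sketch skips the read when a whole block is overwritten, since the read only matters for partial-block writes.

```python
BLOCK_SIZE = 8   # tiny block size so the example is easy to follow


def write(disk, block_map, file_offset, data):
    """Block-at-a-time write.

    disk: dict of physical block number -> bytes (stand-in for the device).
    block_map: list giving the physical block for each logical block
    (hypothetical; assumed to be allocated already).
    """
    done = 0
    while done < len(data):
        lblk, boff = divmod(file_offset + done, BLOCK_SIZE)
        pblk = block_map[lblk]                          # locate the physical block
        n = min(BLOCK_SIZE - boff, len(data) - done)    # bytes going into this block
        if n == BLOCK_SIZE:
            buf = bytearray(BLOCK_SIZE)                 # whole block: no read needed
        else:
            buf = bytearray(disk.get(pblk, bytes(BLOCK_SIZE)))  # read current contents
        buf[boff:boff + n] = data[done:done + n]        # copy from the user's buffer
        disk[pblk] = bytes(buf)                         # write back (synchronous here)
        done += n
    return done


disk = {}
written = write(disk, [5, 9, 2], 6, b"HELLO!")   # spans logical blocks 0 and 1
```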
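Anatomy of a Request: write(fd, buffer, cnt)
[Figure: the user's buffer covers logical blocks 0 through 3 of the logical file; each logical block passes through a system buffer and is written to a physical disk block, with the physical blocks (e.g., 12767, 32447, 90255, 82653) scattered across the disk]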
Next Time
- Berkeley FFS
- Log-structured File System
Traditional Unix File Systems
- Filesystem descriptive information is kept in the superblock
- Number of data blocks
- Maximum number of files
- Pointer to free list
- About 3% of the blocks were inodes
- All inodes grouped together, followed by data
- 512-byte blocks, often on different cylinders
- Drives up seek time per byte transferred
Berkeley "Old" File System
- Improve reliability and throughput
- Stage modifications to critical file structures (make them atomic), facilitating recovery
- Double block size
- 2x as much data transferred on each read
- More than doubled performance
- More files fit entirely in their direct blocks
Problems with the Old File System
- The free list started out with nice grouping
- As files were created and destroyed, the free list fragmented
- Essentially random placement of data blocks
- Throughput dropped by a factor of 5 in a few weeks
Key Observation
- What is the dominant factor in disk operations?
- Keeping all the data blocks for a file on the same cylinder, or a few nearby cylinders, would ameliorate this
- Inodes need to be kept near the data, too
Berkeley Fast File System
- 4,096-byte blocks (or a larger power of 2)
- Allows 2^32-byte (4 GiB) files with 2 levels of indirection
- Block size recorded in the superblock
- Use cylinder groups to reduce scattering
- Groups of consecutive cylinders on the disk
- Inode and data for a file stay in the same cylinder group
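The 2^32 figure follows from pointer arithmetic. Assuming 4-byte block pointers and 12 direct pointers in the inode (our assumption, matching the classic FFS inode), a 4,096-byte indirect block holds 1,024 pointers, so the double-indirect block alone reaches 1024^2 × 4096 = 2^32 bytes. max_file_size is a hypothetical helper.

```python
def max_file_size(block_size, ptr_bytes=4, n_direct=12, indirect_levels=2):
    """Largest file (in bytes) an inode can address, assuming n_direct direct
    pointers plus one indirect pointer at each level 1..indirect_levels."""
    ptrs_per_block = block_size // ptr_bytes
    blocks = n_direct + sum(ptrs_per_block ** k for k in range(1, indirect_levels + 1))
    return blocks * block_size


size = max_file_size(4096)
print(size, size >= 2**32)   # 4299210752 True
```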
Cylinder Groups
- Bookkeeping information
- Redundant copy of the superblock
- Bitmap of free blocks
- Summary information about allocation
- Default of 1 inode per 2048 bytes of space in the group (more than we should need)
- Bookkeeping information is staggered across platters for availability
Wasted Space with Block Sizes
1993 survey: median file size < 2048 bytes, mean 22 KB
Tradeoffs
- The previous chart showed the tradeoff between block size and waste
- Throughput goes up with block size. Why? Is this necessarily a good measure?
- Maximum file size goes up with block size
- How can we get the best of both worlds?
- Fragments (uniform pieces of blocks) can be allocated, e.g., in a 4096/1024 file system
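A quick way to see what fragments buy: compute the space allocated for one file in a 4096/1024 file system with and without fragments. This sketch simplifies the FFS policy by letting any file's tail use fragments (FFS uses fragments only for the last block of a small file); allocated_bytes is a hypothetical helper, and metadata is ignored.

```python
def allocated_bytes(file_size, block_size=4096, frag_size=1024, use_fragments=True):
    """Data space allocated for one file (metadata ignored).

    Simplified policy: full blocks for everything but the tail; the tail gets
    just enough fragments when use_fragments is True, else a whole block.
    """
    full_blocks, tail = divmod(file_size, block_size)
    if tail == 0:
        return full_blocks * block_size
    if use_fragments:
        tail_alloc = -(-tail // frag_size) * frag_size   # round up to whole fragments
    else:
        tail_alloc = block_size
    return full_blocks * block_size + tail_alloc


# A 2,000-byte file (near the 1993 median) wastes 48 bytes with fragments, 2,096 without.
print(allocated_bytes(2000), allocated_bytes(2000, use_fragments=False))   # 2048 4096
```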
Parameterized Filesystems
- File system performance can depend on many factors
- Processor speed
- Hardware (controller) support for large transfers and caching
- Maximum disk bandwidth
- Rotational/seek latencies
Layout Policies
- Global
- Group related data within the same cylinder group
- Looked at another way, spread unrelated data across cylinder groups
- Inode and data block allocation (coarse)
- Local
- Which data blocks to allocate (taking into account the ability of the disk to read contiguous blocks)
More on Global Layout
- Inodes
- Inodes in the same directory are often accessed together (e.g., ls)
- When allocating space for a new directory, find a cylinder group with few directories and a greater-than-average number of free inodes
- Why?
- Data blocks
- Allocate space for large files across cylinder groups
- Keeps blocks in the same group contiguous
- Prevents any one cylinder group from becoming too full (forcing other files to spill over)
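The directory-placement rule can be sketched as follows. This is loosely modeled on the FFS heuristic rather than copied from it: among cylinder groups with at least the average number of free inodes, pick the one with the fewest directories (ties go to the lowest group number). CylGroup and pick_dir_group are hypothetical names. The example also hints at the "Why?": files created in the new directory will want inodes in the same group, so a group short on free inodes is a poor home even if it holds few directories.

```python
from dataclasses import dataclass


@dataclass
class CylGroup:
    ndirs: int    # directories already in this group
    nifree: int   # free inodes in this group


def pick_dir_group(groups):
    """Index of the cylinder group chosen for a new directory."""
    avg = sum(g.nifree for g in groups) / len(groups)
    # Using >= guarantees at least one candidate (the group with the most free inodes).
    candidates = [i for i, g in enumerate(groups) if g.nifree >= avg]
    return min(candidates, key=lambda i: (groups[i].ndirs, i))


groups = [CylGroup(5, 100), CylGroup(1, 10), CylGroup(3, 200), CylGroup(2, 150)]
print(pick_dir_group(groups))   # 3: group 1 has fewer directories but too few free inodes
```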
Log-Structured File System
- The FFS was designed when memory was expensive
- The buffer cache would be relatively small
- Files would need to be read from disk often
- But memory is now cheap
- Does that help us with any problems?
Problems with FFS
- Synchronous I/O
- File creation/deletion requires up to five operations, two of which are synchronous
- In reality, only a minor issue
- Seek times
- Not a big issue for a single file, if the allocation routines do their jobs
- An issue when writing multiple files
Enter the LFS
- Store all data in a single, contiguous log
- Never seek between writes (because you're always writing at the end of the contiguous space)
- Also works well for reading small, contiguously written files
- Seek to the first data block of the file
- Read the entire file
LFS Ignores Many Variables
- The LFS ignores processor speed, rotational latency, etc. when laying out files
- Access model
- Reads are cached
- Writes are contiguous
LFS Data Structures
- Basically the same as FFS
- Superblock
- Inodes
- Directories
- Allows analysis tools for FFS to work on LFS
LFS Layout
[Figure: LFS disk layout. The disk holds a disklabel, superblocks, and a sequence of segments (segment 1, segment 2, ...). Each segment is made of partial segments: a segment summary followed by data blocks and inodes. The segment summary records checksums, the next segment, the file count, the inode count, file info entries 1 through n (each with the number of blocks, version, last block size, and logical blocks 1 through n), and the inode disk addresses (inode daddr).]
More on Layout
- The disk is divided into segments
- View the log as a linked list of segments
- Easy to set aside segments to be cleaned
- Reuse portions of the log that are no longer needed (logically overwritten blocks)
- Read in a segment, discard dead blocks, and write live blocks to the end of the log
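Segment cleaning can be sketched with a log modeled as a Python list of (inode number, logical block, data) entries plus a map from each block to the log address of its current copy; a block is live only if the map still points at it. lfs_write, clean_segment, and this liveness test are simplifying assumptions; the real cleaner consults segment summaries and the inode map.

```python
def lfs_write(log, block_map, ino, lbn, data):
    """Append a new copy of (ino, lbn) at the log tail and point the map at it."""
    log.append((ino, lbn, data))
    block_map[(ino, lbn)] = len(log) - 1


def clean_segment(log, block_map, seg_start, seg_end):
    """Copy the live blocks of log[seg_start:seg_end] to the log tail.

    A block is live if block_map still points at its address; otherwise it was
    logically overwritten. Returns the addresses that may now be reused.
    """
    for addr in range(seg_start, seg_end):
        ino, lbn, data = log[addr]
        if block_map.get((ino, lbn)) == addr:        # still the current copy
            lfs_write(log, block_map, ino, lbn, data)
    return list(range(seg_start, seg_end))


log, block_map = [], {}
lfs_write(log, block_map, 1, 0, b"a")
lfs_write(log, block_map, 1, 1, b"b")
lfs_write(log, block_map, 2, 0, b"c")
lfs_write(log, block_map, 1, 0, b"a2")      # overwrites inode 1, block 0: address 0 is dead
reusable = clean_segment(log, block_map, 0, 2)   # only (1, 1) is live and gets copied
```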
LFS Operation
- Accumulate dirty blocks in memory
- Write out an entire segment at a time
- Segments are usually 0.5 or 1 MB each
- Inodes are interleaved with data
- No fixed position on disk (unlike FFS)
- Keep an inode map to index by inode number and find the inode's place on disk (an additional data structure)
Example: Reading a File with a Known inode Number
- Read in the superblock; extract the location of the index file
- Read in the block of inodes containing the index file's inode; find it
- Read the data block of the index file containing the inode-map entry for the requested file's inode
- Use the disk address in the inode-map entry to read the block of inodes for the requested file; find its inode
- Use the disk addresses in the inode to read the data
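The read path above can be sketched in Python with the disk modeled as a dict from address to a toy block. IMAP_ENTRIES_PER_BLOCK, IFILE_INO, and the block layouts are hypothetical simplifications (a real index file also holds segment-usage and cleaner information). The list of addresses read shows why, without caching, one data block can cost five disk reads.

```python
IMAP_ENTRIES_PER_BLOCK = 4   # hypothetical: inode-map entries per index-file data block
IFILE_INO = 1                # hypothetical inode number of the index file


def lfs_read_block(disk, sb_addr, ino, lbn):
    """Return (data, addresses read) for logical block lbn of inode ino.

    Toy block formats in `disk`:
      superblock      -> {"ifile_inode_addr": address of block holding the index file's inode}
      block of inodes -> {inode number: [data block addresses]}
      index-file data -> {inode number: address of the block of inodes holding it}
      data block      -> bytes
    """
    reads = []

    def read(addr):
        reads.append(addr)
        return disk[addr]

    sb = read(sb_addr)                                        # 1. superblock
    ifile_inode = read(sb["ifile_inode_addr"])[IFILE_INO]     # 2. index file's inode
    imap = read(ifile_inode[ino // IMAP_ENTRIES_PER_BLOCK])   # 3. inode-map block
    inode = read(imap[ino])[ino]                              # 4. requested file's inode
    data = read(inode[lbn])                                   # 5. data block
    return data, reads


disk = {
    0: {"ifile_inode_addr": 50},
    50: {IFILE_INO: [60]},    # index file has one data block, at address 60
    60: {3: 70},              # inode-map entry: inode 3 is in the block at 70
    70: {3: [80, 81]},        # inode 3's data block addresses
    80: b"hello",
    81: b"world",
}
data, reads = lfs_read_block(disk, 0, ino=3, lbn=1)
print(data, reads)   # b'world' [0, 50, 60, 70, 81]
```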
Use of Caches
- Normally, almost everything we just talked about will be cached
- Think about the disk overhead of reading inodes over and over
- This is one of the main overheads of the FFS
- We have cheap, big memory, remember?
Writing to the Log
- The LFS writes all dirty blocks whenever:
- Any one block has to be written (e.g., on an fsync), or
- ¼ of the total buffers are dirty
- One or more partial segments will be written
Writing to the Log II
- Traverse the vnode list, gathering all dirty blocks
- Sort by file and logical block number
- Supports contiguous operations
- Assign disk addresses
- Update metadata (inodes, index blocks) and add them to the data to be written
- Format into partial segments
- Create segment summaries
- Checksum
- Write
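The gather, sort, assign, format, and summarize steps can be sketched in Python. The sketch omits the metadata step (rewriting inodes and index blocks), and its checksum is zlib.crc32 over the assigned addresses, a stand-in rather than the checksum LFS actually uses; plan_segment_write and its summary fields are hypothetical.

```python
import zlib


def plan_segment_write(dirty, next_addr, blocks_per_partial):
    """dirty: iterable of (inode number, logical block) pairs for dirty buffers.

    Returns (partial segments as (summary, blocks) pairs, block -> new address,
    next free log address).
    """
    order = sorted(set(dirty))                                    # sort by file, then block
    addrs = {blk: next_addr + i for i, blk in enumerate(order)}   # contiguous addresses
    partials = []
    for i in range(0, len(order), blocks_per_partial):            # format partial segments
        blocks = order[i:i + blocks_per_partial]
        summary = {
            "nblocks": len(blocks),
            "nfiles": len({ino for ino, _ in blocks}),
            # Stand-in checksum over the assigned addresses.
            "cksum": zlib.crc32(repr([addrs[b] for b in blocks]).encode()),
        }
        partials.append((summary, blocks))
    return partials, addrs, next_addr + len(order)


partials, addrs, nxt = plan_segment_write(
    [(7, 2), (3, 0), (7, 0), (3, 1), (7, 1)], next_addr=1000, blocks_per_partial=2)
```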
Checkpoints
- The LFS checkpoints the filesystem periodically (to assist in crash recovery)
- The index file and its metadata must be written
- The superblock must be updated (location of the index file) and written