Title: Chapter 8 File Management
8.1 Introduction
- Data should be organized in some convenient and
efficient manner. In particular, users should be
able to
- Put data into files
- Find and use files that have previously been
created
File System
- Set of OS Services that provides Files and
Directories for user applications
8.2 Files
- A file is simply a sequence of bytes that has
been stored on some storage device on the
computer
Files
- Those bytes will contain whatever data we would
like to store in the file, such as
- A text file containing just the characters that
we are interested in
- A word processing document file that also
contains data about how to format the text
- A database file that contains data organized in
multiple tables
- In general, the File Management system does not
have any knowledge about how the data in a file
is organized. That is the responsibility of the
application programs that create and use the file.
Permanent (non-volatile) Storage Devices
- Disk Drives
- Flash Memory (Memory stick)
- CDs and DVDs
- Magnetic tape drives
8.2.1 File Attributes
- Name
- Symbolic (Human-readable) name of the file
- Type
- Executable file, print file, etc.
- Location
- Where file is on disk
File Attributes
- Size
- Protection
- Who can read, write file, etc.
- Time, date
- When file was created, modified, accessed
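The attributes listed above can be queried from a program. The following is a minimal sketch, assuming a Python environment; the scratch file and its contents are made up for illustration, and the exact protection string depends on the operating system.

```python
import os
import stat
import tempfile
import time

# Create a small scratch file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"hello, file system")
    path = f.name

info = os.stat(path)                    # ask the OS for the file's attributes

name = os.path.basename(path)           # Name: human-readable name
size = info.st_size                     # Size in bytes
mode = stat.filemode(info.st_mode)      # Protection, shown as a 10-character string
modified = time.ctime(info.st_mtime)    # Time and date of last modification

print(name, size, mode, modified)
os.remove(path)
```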
8.2.2 Folders
- An important attribute of folders is the Name
- Typically, a folder may contain Files and other
Folders (commonly called sub-folders or
sub-directories)
- This results in a Tree Structure of Folders and
Files.
Folder/Directory Tree Structure
8.2.3 Pathnames
- The pathname of a file specifies the sequence of
folders one must traverse to travel down the tree
to the file.
- This pathname actually describes the absolute
path of the file, which is the sequence of
folders one must travel from the root of the tree
to the desired file.
- A relative path describes the sequence of folders
one must traverse starting at some intermediate
place on the absolute path.
- The absolute path provides a unique
identification for a file. Two different files
can have the same filename as long as the
resulting pathnames are unique.
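The difference between absolute and relative paths can be sketched with Python's pathlib. The folder names below (/home/alice, /home/bob) are hypothetical, and PurePosixPath is used so that no real files are needed.

```python
from pathlib import PurePosixPath

# Hypothetical tree containing two files that share the name report.txt.
absolute = PurePosixPath("/home/alice/docs/report.txt")
other = PurePosixPath("/home/bob/docs/report.txt")

# The absolute path lists every folder from the root down to the file.
folders = absolute.parts

# A relative path starts at an intermediate folder on the absolute path.
start = PurePosixPath("/home/alice")
relative = absolute.relative_to(start)

# Joining the starting folder and the relative path recovers the absolute path.
rebuilt = start / relative

# Same filename, different absolute paths: two distinct files.
same_name = absolute.name == other.name
same_path = absolute == other

print(folders, relative, rebuilt, same_name, same_path)
```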
File Links
- Allow a directory entry to point to a file (or
entry) that is not directly below it in the tree
structure
- Unix Symbolic Link
- Windows Shortcut
Link in Directory Tree Structure
8.3 Access Methods
- An access method describes the manner and
mechanisms by which a process accesses the data
in a file.
- There are two common access methods
- Sequential
- Random (or Direct)
File Operations
- When a process needs to use a file, there are a
number of operations it can perform
- Open
- Close
- Read
- Write
Create File
- Allocate space for file
- Make entry for file in the Directory
8.3.1 Open File
- Makes the file accessible for read/write operations
- Locates the file in the Directory
- Returns an internal ID for the file
- Commonly called a Handle
- handle = open(filename, parameters)
File Open
8.3.2 Close File
- Makes file no longer accessible from application
- Deletes the Handle created by Open
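A minimal sketch of Open and Close, assuming Python's os module as a thin wrapper over the operating system's calls. Here the handle is a file descriptor (a small integer); the scratch file is created only for the example.

```python
import os
import tempfile

# Scratch file so the sketch is self-contained.
fd_tmp, path = tempfile.mkstemp()
os.close(fd_tmp)

# Open: the OS locates the file in the directory and returns a handle.
handle = os.open(path, os.O_RDWR)
handle_is_int = isinstance(handle, int)

os.write(handle, b"abc")        # the handle is valid between open and close

# Close: the handle is released and can no longer be used.
os.close(handle)
try:
    os.write(handle, b"more")
    usable_after_close = True
except OSError:
    usable_after_close = False

os.remove(path)
print(handle_is_int, usable_after_close)
```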
File Close
8.3.3 Read File
- System call specifies
- Handle from Open call
- Memory location, length of information to be read
- Possibly, location in the file where data is to
be read from
- read(file handle, buffer)
- read(file handle, buffer, length)
Read File
- Uses Handle to locate file on disk
- Uses the file's Read Pointer to determine the
position in the file to read from
- Updates the file's Read Pointer
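The read steps above can be sketched as follows, assuming Python's os.read and os.lseek. In this API the buffer is returned rather than passed in, and an explicit file location is set by moving the pointer first. The file contents are made up for illustration.

```python
import os
import tempfile

# Scratch file with known contents.
fd_tmp, path = tempfile.mkstemp()
os.write(fd_tmp, b"ABCDEFGHIJ")
os.close(fd_tmp)

handle = os.open(path, os.O_RDONLY)

first = os.read(handle, 4)                           # read 4 bytes at the pointer
pos_after_first = os.lseek(handle, 0, os.SEEK_CUR)   # query the read pointer
second = os.read(handle, 4)                          # continues where the first read stopped

# Optional explicit location: move the pointer, then read from there.
os.lseek(handle, 1, os.SEEK_SET)
from_offset_1 = os.read(handle, 3)

os.close(handle)
os.remove(path)
print(first, pos_after_first, second, from_offset_1)
```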
8.3.4 Write File
- System call specifies
- Handle from Open call
- Memory location, length of information to be written
- Possibly, location in the file where data is to
be written
- write(file handle, buffer, length)
Write File
- Uses Handle to locate file on disk
- Uses the file's Write Pointer to determine the
position in the file to write to
- Updates the file's Write Pointer
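A matching sketch for Write, again assuming Python's os module. os.write takes the handle and the buffer (the length is implied by the buffer) and returns the number of bytes written; the data is illustrative.

```python
import os
import tempfile

# Empty scratch file.
fd_tmp, path = tempfile.mkstemp()
os.close(fd_tmp)

handle = os.open(path, os.O_WRONLY)

written = os.write(handle, b"hello world")    # returns the number of bytes written
pos = os.lseek(handle, 0, os.SEEK_CUR)        # write pointer has advanced by that amount

# Optional explicit location: reposition, then overwrite in place.
os.lseek(handle, 6, os.SEEK_SET)
os.write(handle, b"WORLD")
os.close(handle)

with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
print(written, pos, contents)
```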
Delete File
- Deletes entry for file in Directory
- De-allocates disk space used by the file
8.3.5 Sequential Access
- If the process has opened a file for sequential
access, the File Management subsystem will keep
track of the current file position for reading
and writing.
- To carry this out, the system will maintain a
file pointer that will be the position of the
next read or write.
File Pointer
- The value of the file pointer will be initialized
during Open to one of two possible values
- Normally, this value will be set to 0 to start
the reading or writing at the beginning of the
file.
- If the file is being opened to append data to the
file, the file pointer will be set to
the current size of the file.
- After each read or write, the file
pointer will be incremented by the amount of data
that was read or written.
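The two initial values of the file pointer can be observed with Python's built-in open and tell. This is a sketch on a made-up 10-byte file; other languages and operating systems may report the append position differently.

```python
import os
import tempfile

# Existing file of size 10.
fd_tmp, path = tempfile.mkstemp()
os.write(fd_tmp, b"0123456789")
os.close(fd_tmp)

# Normal open: the pointer starts at 0.
with open(path, "rb") as f:
    start_pos = f.tell()
    f.read(3)
    after_read = f.tell()       # incremented by the 3 bytes read

# Open for append: the pointer starts at the current file size.
with open(path, "ab") as f:
    append_pos = f.tell()
    f.write(b"XY")
    after_write = f.tell()      # incremented by the 2 bytes written

os.remove(path)
print(start_pos, after_read, append_pos, after_write)
```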
8.3.6 Streams, Pipes, and I/O Redirection
- A Stream is the flow of data bytes, one byte
after another, into the process (for reading) and
out of the process (for writing).
- This concept applies to Sequential Access and was
originally invented for network I/O, but several
modern programming environments (e.g. Java, C)
have also incorporated it.
Standard I/O
- Standard Input
- Defaults to keyboard
- Standard Output
- Defaults to console
I/O Redirection
- Standard Input can come from a file
- app.exe < def.txt
- Standard Output can go to a file
- app.exe > def.txt
- Standard Output from one application can be
Standard Input for another
- app1.exe | app2.exe
- This is called a Pipe
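Redirection of Standard Input and Standard Output can also be set up by a parent program. The sketch below assumes Python's subprocess module; the "application" is a hypothetical one-line child process that upper-cases its input, and in.txt / out.txt are made-up file names.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical application: a child process that upper-cases Standard Input.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "in.txt")
out_path = os.path.join(workdir, "out.txt")
with open(in_path, "w") as f:
    f.write("redirected text\n")

# Same effect as running the child with "< in.txt > out.txt" in a shell.
with open(in_path, "rb") as src, open(out_path, "wb") as dst:
    subprocess.run(child, stdin=src, stdout=dst, check=True)

with open(out_path) as f:
    result = f.read()

shutil.rmtree(workdir)
print(result)
```

The child never opens a file itself; it only reads Standard Input and writes Standard Output, which is what makes redirection possible without changing the application.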
A Pipe
Pipe
- A Pipe is a connection that is dynamically
established between two processes.
- When a process reads data, the data will come
from another process rather than a file. Thus, a
pipe has a process at one end that is writing to
the pipe and another process reading data at the
other end of the pipe.
- It is often the situation that one process will
produce output that another process needs for
input.
- Rather than having the first process write to a
file and the second process read that file, we
can save time by having the two processes
communicate via a pipe.
Pipe and Performance
- Using a pipe can improve system performance in
two ways
- By not using a file, the applications avoid the
time that disk I/O would take.
- A pipe has the characteristic that the receiving
process can read whatever data has already been
written. Thus we do not need to wait until the
first process has written all of the data before
we start executing the second process. This
creates a pipeline similar to an automobile
assembly line to speed up overall performance.
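A pipe between two processes can be sketched with Python's subprocess module. Both "applications" are hypothetical one-line child processes: the producer prints the numbers 0 to 4, and the consumer sums whatever arrives on its Standard Input.

```python
import subprocess
import sys

# Two hypothetical applications, written inline so the sketch is self-contained.
producer = [sys.executable, "-c",
            "for i in range(5): print(i)"]
consumer = [sys.executable, "-c",
            "import sys; print(sum(int(line) for line in sys.stdin))"]

# Same effect as "producer | consumer" in a shell.
p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()       # the parent drops its copy of the pipe's read end
output, _ = p2.communicate()
p1.wait()

total = int(output.decode().strip())
print(total)
```

Both processes are started before any data has been produced, so the consumer can begin reading as soon as the producer writes its first line; no intermediate file is created.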
8.4 Directory Functions
- Search for a file
- Create a file
- Delete a file
- List a directory
- Rename a file
- Traverse the file system
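Each of the directory functions above has a counterpart in most programming environments. The following sketch assumes Python's os module and works inside a temporary directory; the file names a.txt, b.txt, c.txt and the folder sub are made up.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()       # scratch directory

# Create a file, plus a sub-folder containing another file.
open(os.path.join(root, "a.txt"), "w").close()
os.mkdir(os.path.join(root, "sub"))
open(os.path.join(root, "sub", "b.txt"), "w").close()

# List a directory.
listing = sorted(os.listdir(root))

# Search for a file by name in one directory.
found = "a.txt" in os.listdir(root)

# Rename a file.
os.rename(os.path.join(root, "a.txt"), os.path.join(root, "c.txt"))

# Traverse the file system below root.
all_files = sorted(
    os.path.relpath(os.path.join(dirpath, name), root)
    for dirpath, _dirs, files in os.walk(root)
    for name in files
)

# Delete a file.
os.remove(os.path.join(root, "c.txt"))
remaining = sorted(os.listdir(root))

shutil.rmtree(root)
print(listing, found, all_files, remaining)
```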
8.5 File Space Allocation
- Contiguous
- File is allocated contiguous disk space
File System Implementation
- A possible file system layout
A Master Boot Record (MBR) is a special type of
boot sector at the very beginning of partitioned
computer mass storage devices. The MBR holds the
information on how the logical partitions,
containing file systems, are organized on that
medium.
Implementing Files (1)
- (a) Contiguous allocation of disk space for 7
files
- (b) State of the disk after files D and E have
been removed
Contiguous Allocation
- Advantages
- Simple to implement
- Good disk I/O performance
- Disadvantages
- Need to know max file size ahead of time
- Probably will waste disk space
- Necessary space may not be available
Contiguous Allocation
Read/Write Disk Address Calculation
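With contiguous allocation, the disk address for a read or write follows directly from the file's starting block and the byte offset. A minimal sketch, assuming a 512-byte block size and a hypothetical file that starts at disk block 100:

```python
BLOCK_SIZE = 512    # assumed bytes per disk block


def disk_address(start_block, file_offset, block_size=BLOCK_SIZE):
    """Map a byte offset in a contiguous file to (disk block, offset in block)."""
    block = start_block + file_offset // block_size
    within = file_offset % block_size
    return block, within


# Hypothetical file whose first block is disk block 100.
addr = disk_address(100, 1300)
print(addr)
```

Because only the starting block is needed, no table lookup is involved, which is one reason contiguous allocation gives good disk I/O performance.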
8.5.1 Cluster Allocation
- Cluster Allocation
- Disk space allocated in blocks
- Space allocated as needed
Cluster Allocation
Implementing Files (3)
- Linked list allocation using a file allocation
table in RAM
Implementing Files (4)
Cluster Allocation
- Advantages
- Tends not to waste disk space
- Disadvantages
- Additional overhead to keep track of clusters
- Can cause poor disk I/O performance
- May limit maximum size of File System
Cluster Performance
- Clusters tend to be scattered around the disk
- This is called External Fragmentation
- Can cause poor performance as disk arm needs to
move a lot
- Requires De-fragmentation utility
Cluster Performance
- Large clusters can reduce External Fragmentation
- If there are lots of small files, then space will
be wasted inside each cluster
- This is called Internal Fragmentation
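The cost of internal fragmentation is easy to estimate: a file always occupies a whole number of clusters, so the unused tail of its last cluster is wasted. A small sketch with a hypothetical 1,000-byte file and two assumed cluster sizes:

```python
KB = 1024


def allocated_bytes(file_size, cluster_size):
    """Space actually reserved: a whole number of clusters."""
    clusters = (file_size + cluster_size - 1) // cluster_size   # round up
    return clusters * cluster_size


def wasted_bytes(file_size, cluster_size):
    """Internal fragmentation: unused space in the file's last cluster."""
    return allocated_bytes(file_size, cluster_size) - file_size


# A hypothetical 1,000-byte file under two cluster sizes.
waste_small = wasted_bytes(1000, 4 * KB)
waste_large = wasted_bytes(1000, 32 * KB)
print(waste_small, waste_large)
```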
Managing Cluster Allocation
- Linked
- Each cluster has a pointer to the next cluster
- Indexed
- Single table has pointers to each of the clusters
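The two schemes differ mainly in how the n-th cluster of a file is found. The sketch below uses a made-up file stored in clusters 4, 7, 2, and 10; the -1 end marker and the dictionary layout are assumptions for illustration, not the format of any real file system.

```python
END = -1    # assumed end-of-chain marker

# Linked: each cluster records the number of the next cluster.
# Hypothetical table: next_cluster[i] is the cluster that follows cluster i.
next_cluster = {4: 7, 7: 2, 2: 10, 10: END}


def linked_chain(first):
    """Follow the pointers from the first cluster to the end of the file."""
    chain = []
    cluster = first
    while cluster != END:
        chain.append(cluster)
        cluster = next_cluster[cluster]
    return chain


def linked_lookup(first, n):
    """Find the n-th cluster of the file by walking n links."""
    cluster = first
    for _ in range(n):
        cluster = next_cluster[cluster]
    return cluster


# Indexed: one table lists the file's clusters in order.
index_block = [4, 7, 2, 10]


def indexed_lookup(n):
    """Find the n-th cluster directly from the index."""
    return index_block[n]


print(linked_chain(4), linked_lookup(4, 2), indexed_lookup(2))
```

In the linked scheme, random access to the middle of a file requires following the chain from the start, while the indexed scheme reaches any cluster with a single table lookup.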
Linked Blocks
Index Block
8.6 Real-World Systems
- Microsoft FAT
- Microsoft NTFS
- Linux Ext2, Ext3
- Others
8.6.1 MS FAT System
- FAT16 (FAT stands for file allocation table)
- MS-DOS, Windows 95
- Max 2 GB space for a file system
- Generally bad disk fragmentation
- FAT32
- Windows 98
- Supported by Windows 2000, XP, 2003
The MS-DOS File System (1)
- The MS-DOS directory entry
The Windows 98 File System (1)
- The extended MS-DOS directory entry used in
Windows 98
Cluster Sizes of FAT16 and FAT32
Drive Size           Default FAT16 Cluster Size   Default FAT32 Cluster Size
260 MB to 511 MB     8 KB                         Not supported
512 MB to 1,023 MB   16 KB                        4 KB
1,024 MB to 2 GB     32 KB                        4 KB
2 GB to 8 GB         Not supported                4 KB
8 GB to 16 GB        Not supported                8 KB
16 GB to 32 GB       Not supported                16 KB
> 32 GB              Not supported                32 KB
Windows FAT Table
8.6.2 Windows NTFS File System
- The NTFS file system (New Technology File System)
is based on a structure called the "master file
table" or MFT, which is able to hold detailed
information on files. This system allows the use
of long names and, unlike the FAT32 system, it
can be case-sensitive, which means that it is
capable of distinguishing lower-case and
upper-case letters (Windows itself normally
treats file names as case-insensitive).
- Available on Windows 2000, XP, 2003
- Maintains transaction log to recover after reboot
- Support for file protection
- Large (64 bit) cluster pointers
- Allows small clusters
- Reduces internal fragmentation
Windows NTFS File System
- The Master File Table contains records about the
files and directories of the partition. The first
record, called a descriptor, contains information
on the MFT itself (a copy of it is stored in the
second record). The third record contains the log
file, a file recording all actions performed on the
partition. The following records, making up what
is known as the core, reference each file and
directory of the partition in the form of objects
with assigned attributes.
File System Structure (1)
- The NTFS master file table
File System Structure (2)
- The attributes used in MFT records
File System Structure (3)
- An MFT record for a three-run, nine-block file
8.6.3 Linux Ext2 and Ext3 File System
- Ext2
- Ext2 stands for second extended file system.
- It was introduced in 1993. Developed by Rémy
Card.
- Maximum individual file size can be from 16 GB to
2 TB, depending on the block size
- Ext3
- Ext3 stands for third extended file system.
- It was introduced in 2001. Developed by Stephen
Tweedie.
- Ext3 has been available since Linux kernel
2.4.15.
- Maximum individual file size can be from 16 GB to
2 TB, depending on the block size
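The 16 GB and 2 TB figures can be reproduced with a short calculation. This is a sketch under stated assumptions, not taken from the slides: an i-node with 12 direct block pointers plus one single, one double, and one triple indirect pointer, 4-byte block pointers, and an overall cap of 2^32 sectors of 512 bytes on the space one file may occupy.

```python
GIB = 2**30


def ext2_max_file_size(block_size, pointer_size=4, direct=12):
    """Estimate the largest file (in bytes) for a given block size."""
    pointers = block_size // pointer_size           # pointers per indirect block
    blocks = direct + pointers + pointers**2 + pointers**3
    by_pointers = blocks * block_size               # limit from the i-node pointers
    cap = 2**32 * 512                               # assumed 32-bit count of 512-byte sectors
    return min(by_pointers, cap)


small = ext2_max_file_size(1024) / GIB      # 1 KB blocks: a little over 16 GiB
large = ext2_max_file_size(4096) / GIB      # 4 KB blocks: capped at 2048 GiB (2 TiB)
print(round(small, 2), large)
```

With 1 KB blocks the pointer structure is the limit, while with 4 KB blocks the pointers could address about 4 TiB and the sector-count cap becomes the limit instead.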
UNIX File System (1)
- Disk layout in classical UNIX systems
UNIX File System (3)
- The relation between the file descriptor
table, the open file description
UNIX File System (2)
- Structure of the i-node
The Linux File System
- Layout of the Linux Ext2 file system.