1
Linux Virtual File System
  • Robert Ledford
  • Leif Wickland
  • CS518
  • Fall 2004
  • I/O, I/O, It's off to disk I go-o-o, A bit or
    byte to read or write, I/O, I/O, I/O...

2
Overview
  • What is a file system?
  • Historical view of file systems
  • Another layer of indirection
  • Do I have to?
  • File systems have layers
  • How is it done?
  • Sign me up

4
What is a file system?
  • Speaking broadly, a file system is the logical
    means for an operating system to store and
    retrieve data on the computer's hard disks, be
    they local drives, network-available volumes, or
    exported shares in a storage area network (SAN)

5
What is a file system?
  • There is some ambiguity in the term file
    system. The term can be used to mean any of the
    following:
  • - The type of a file system refers to a specific
    implementation such as ext2, reiserfs, or nfs.
    Each implementation contains the methods and
    data structures that an operating system uses to
    keep track of files on a disk or partition

6
What is a file system?
  • - An instance of a file system refers to a file
    system type residing at a location such as
    /dev/hda4
  • - Additionally a file system can refer to the
    methods and data structures that an operating
    system uses to keep track of files on a disk or
    partition

7
What is a file system?
  • Linux keeps regular files and directories on
    block devices such as disks
  • A Linux installation may have several physical
    disk units, each containing one or more file
    system types
  • Partitioning a disk into several file system
    instances makes it easier for administrators to
    manage the data stored there

8
What is a file system?
[Figure: overhead view of a disk platter, labeling a sector, a track, and a cylinder]
  • The same track on each platter in a disk makes a
    cylinder; partitions are groups of contiguous
    cylinders
  • Disk blocks are composed of one or more
    contiguous sectors

9
What is a file system?
  • Why have multiple partitions?
  • Encapsulate your data
  • - Since file system corruption is local to a
    partition, you stand to lose only some of your
    data if an accident occurs

10
What is a file system?
  • Increase disk space efficiency
  • - You can format partitions with varying block
    sizes, depending on your usage
  • - If your data is in a large number of small
    files (less than 1k) and your partition uses 4k
    sized blocks, you are wasting more than 3k for
    every file
  • - In general, you waste on average one half of a
    block for every file, so matching block size to
    the average size of your files is important if
    you have many files
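
The block-size arithmetic above can be sketched in a few lines of C. This is a minimal illustration, assuming space is handed out only in whole blocks and ignoring inode overhead and any tail-packing a real file system might do; the function names are hypothetical.

```c
/* Bytes a file occupies on disk when space is allocated in whole
 * blocks of block_size bytes (hypothetical helper). */
unsigned long allocated_bytes(unsigned long file_size,
                              unsigned long block_size)
{
    /* Round the file size up to the next whole block. */
    unsigned long blocks = (file_size + block_size - 1) / block_size;
    return blocks * block_size;
}

/* Slack space: allocated bytes that hold no file data. */
unsigned long wasted_bytes(unsigned long file_size,
                           unsigned long block_size)
{
    return allocated_bytes(file_size, block_size) - file_size;
}
```

For a 1000-byte file, wasted_bytes(1000, 4096) gives 3096 bytes of slack, while wasted_bytes(1000, 1024) gives only 24.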

11
What is a file system?
  • Limit data growth
  • - Runaway processes or maniacal users can
    consume so much disk space that the operating
    system no longer has room on the hard drive for
    its bookkeeping operations
  • - This can lead to disaster. By segregating
    space, you ensure that things other than the
    operating system die when allocated disk space is
    exhausted

12
What is a file system?
  • Partitioning tools and utilities
  • - fdisk
    rledford@leonard > sudo fdisk -l /dev/hda
    Disk /dev/hda: 255 heads, 63 sectors, 4863 cylinders
    Units = cylinders of 16065 * 512 bytes

    Device     Boot  Start   End    Blocks  Id  System
    /dev/hda1            1    26    208813  83  Linux
    /dev/hda2           27  1070   8385930  83  Linux
    /dev/hda3         1071  1853   6289447  83  Linux
    /dev/hda4         1854  4863  24177825   f  Win95 Ext'd (LBA)
    /dev/hda5         1854  2375   4192933  83  Linux
    /dev/hda6         2376  2897   4192933  83  Linux
    /dev/hda7         2898  3028   1052226  82  Linux swap
    /dev/hda8         3029  4863  14739606  83  Linux
  • - parted: GNU partition editor

13
What is a file system?
  • Review Questions
  • - Where does Linux keep regular file types?
  • On block devices such as disks.
  • - On average, how much of a block is wasted for
    every file?
  • On average ½ of a block is wasted for every file.

14
What is a file system?
  • Review
  • File system instances reside on partitions
  • Partitioning is a means to divide a single hard
    drive into many logical drives
  • A partition is a contiguous set of blocks on a
    drive that are treated as an independent disk
  • A partition table is an index that relates
    sections of the hard drive to partitions

15
What is a file system?
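[Figure: A possible file system instance layout. The entire disk holds the Master Boot Record, the Partition Table, and the Disk Partitions. A file system instance within a partition holds a Boot Block, a Super Block, an Inode List, and Data Blocks.]
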
16
What is a file system?
  • The central structural concepts of a file system
    type are
  • - Boot Block
  • - Super Block
  • - Inode List
  • - Data Block


17
What is a file system?
  • Boot Block
  • - Occupies the beginning of a file system
  • - Typically residing at the first sector, it may
    also contain the bootstrap code that is read into
    the machine at boot time
  • - Although only one boot block is required to
    boot the system, every file system may contain a
    boot block

18
What is a file system?
  • Super Block
  • - Describes the state of a file system
  • - How large it is
  • - How many files it can store
  • - Where to find free space in the file system
  • - Additional data that assists the file
    management system with operating on the file
    system

19
What is a file system?
  • Super Block
  • - Duplicate copies of the super block may reside
    throughout the file system in case the super
    block is corrupted

20
What is a file system?
  • Inode List
  • - An inode is the internal representation of a
    file. It contains
  • - the description of the disk layout of the
    file data
  • - the file owner
  • - the permissions
  • - The inode list contains all of the inodes
    present in an instance of a file system

21
What is a file system?
  • Data Blocks
  • - Contain the file data in the file system
  • - Additional administrative data
  • - An allocated data block can belong to one and
    only one file in the file system

22
What is a file system?
  • On a Linux system, a user or user program sees a
    single file hierarchy, rooted at /
  • Every file and directory can trace its origins on
    a tree back to the root directory

23
What is a file system?
  • /
  • bin boot dev etc home lib mnt
    proc root sbin tmp usr var

24
What is a file system?
  • Review Questions
  • - Where is the bootstrap code located?
  • In the Boot Block.
  • - What contains information about a file's owner
    and the file's permissions?
  • The inode.
  • - What is the index that relates sections of the
    hard drive to partitions?
  • The Partition Table.

25
What is a file system?
  • A file system implements the basic operations to
    manipulate files and directories
  • These basic operations include
  • - Opening of files
  • - Closing of files
  • - Creation of directories
  • - Listing of the contents of directories
  • - Removal of files from a directory

26
What is a file system?
  • The kernel deals on a logical level with file
    systems rather than with disks
  • The separate file systems that the system may use
    are not accessed by device identifiers
  • Instead they are combined into a single
    hierarchical tree structure that represents the
    file systems as a single entity

27
What is a file system?
  • So what is a file system?
  • - A file system is a set of abstract data types
    that are implemented for the storage,
    hierarchical organization, manipulation,
    navigation, access, and retrieval of data

28
Overview
  • What is a file system?
  • Historical view of file systems
  • Another layer of indirection
  • Do I have to?
  • File systems have layers
  • How is it done?
  • Sign me up

29
Historical view of file systems
  • Managing storage has long been a prominent role
    for operating systems.
  • This role was so important to the MS-DOS OS that
    it was named after that function.
  • DOS stands for Disk Operating System.
  • Created in 1980.

30
Historical view of file systems
  • The file system hasn't changed much since the
    1960s.
  • A research paper was presented in 1965 describing
    "A General-Purpose File System For Secondary
    Storage."
  • Laid out the notion of a hierarchical file system
    much as is used today.

31
Historical view of file systems
  • File system features in 1965 paper
  • Files
  • Directories
  • Links
  • Access permissions
  • Create, access and modify times
  • Path nomenclature: Directory:directory:file
  • All devices mount into a unified hierarchy
  • Backing up
  • Everybody knew it was a good idea back then, too.

32
Historical view of file systems
  • This type of file system was implemented in
    Multics.
  • Unix was created as an emasculated version of
    Multics.
  • Project started as gaming system in 1969.
  • The designers of Unix had worked on Multics and
    brought to Unix a Multics-style file system.

33
Historical view of file systems
File System API in Unix System V, c. 1983
  • chdir    change directory
  • chmod    change permission
  • chown    change owner
  • chroot   change root
  • close    close a file
  • creat    create a file
  • dup      copy file descriptor
  • link     add a file reference
  • lseek    set open file cursor
  • mknod    make a special file
  • mount    graft in a file system
  • open     open a file
  • pipe     create a pipe
  • read     read from a file
  • stat     get file status
  • umount   opposite of mount
  • unlink   delete file
  • write    write to a file

34
Historical view of file systems
  • Tanenbaum wrote Minix, a pedagogical version of
    Unix, in 1987.
  • Of course, it included a Unix-style file API.

35
Historical view of file systems
  • Linus Torvalds introduced Linux in 1991.
  • He developed Linux on Minix.
  • Consequently it was convenient for the OSes to
    share a file system and file API.
  • Thus, Linux inherited the same style file system
    as presented in the 1965 paper.
  • Today Linux supports a superset of the file
    system features available in Unix System V.

36
Historical view of file systems
  • Review Questions
  • In what year was the paper released that
    described the file system design that is the
    ancestor of the Linux file system?
  • 1965
  • Yes, that's about 40 years ago

37
Overview
  • What is a file system?
  • Historical view of file systems
  • Another layer of indirection
  • Do I have to?
  • File systems have layers
  • How is it done?
  • Sign me up

38
Another layer of indirection
  • Multics, Unix, Minix, and Linux originally
    supported only one file system.
  • They only understood one type of layout on disk
    for directories and files.
  • Because of its origins, Linux initially supported
    just the Minix file system.
  • Limited to small partitions and short filenames
  • However, it wasn't long before people wanted more
    from their file systems

39
Another layer of indirection
  • The problem
  • Linux was implemented like this: User Program
    -> Minix FS Interface -> Hard Drive
  • To add support for another file system in a
    similar manner was unsavory and didn't scale.
  • User program must call a separate API for each
    type of file system.
[Figure: User Program calls Minix FS and Other FS through separate interfaces; the two file systems sit on Hard Drive A and Hard Drive B]

40
Another layer of indirection
  • "Any problem in computer science can be solved
    with another layer of indirection."
  • - David Wheeler (chief programmer for the EDSAC
    project in the early 1950s)

41
Another layer of indirection
  • The solution was to add a layer of indirection to
    the file system stack.
  • In Linux this layer is called the virtual file
    system (VFS).
  • User programs access any file system through a
    consistent API.
  • All File Systems implement an API which is called
    by the VFS.

[Figure: User Program -> Virtual File System -> Minix FS and Other FS -> Hard Drive A and Hard Drive B]

42
Another layer of indirection
  • The VFS is
  • Another layer of indirection
  • A file system- and device-agnostic layer of the
    operating system
  • A consistent API for user applications to access
    storage independent of the underlying device or
    type of file system
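
The idea of one consistent API in front of many file system implementations can be sketched with a table of function pointers. This is a toy model, not kernel code: every name here (toy_fs_ops, toy_vfs_read, memfs, zerofs) is hypothetical, and the two "file systems" are invented purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Operations every toy file system must provide to the toy VFS. */
struct toy_fs_ops {
    const char *name;
    long (*read)(void *fs_private, char *buf, size_t count, long pos);
};

/* An open file: which implementation owns it, plus a cursor. */
struct toy_file {
    const struct toy_fs_ops *ops;
    void *fs_private;
    long pos;
};

/* The VFS-level read: identical for every file system type. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t count)
{
    long n = f->ops->read(f->fs_private, buf, count, f->pos);
    if (n > 0)
        f->pos += n;   /* advance the cursor by the bytes read */
    return n;
}

/* Hypothetical "memfs": the file is a C string held in memory. */
static long memfs_read(void *priv, char *buf, size_t count, long pos)
{
    const char *data = priv;
    size_t len = strlen(data);
    size_t n;

    if (pos < 0 || (size_t)pos >= len)
        return 0;                      /* at or past end of file */
    n = len - (size_t)pos;
    if (n > count)
        n = count;
    memcpy(buf, data + pos, n);
    return (long)n;
}

/* Hypothetical "zerofs": every read yields zero bytes, forever. */
static long zerofs_read(void *priv, char *buf, size_t count, long pos)
{
    (void)priv;
    (void)pos;
    memset(buf, 0, count);
    return (long)count;
}

const struct toy_fs_ops memfs_ops  = { "memfs",  memfs_read  };
const struct toy_fs_ops zerofs_ops = { "zerofs", zerofs_read };
```

A caller only ever uses toy_vfs_read(); which read function runs depends on the ops pointer stored in the file, which is the layer of indirection the slides describe.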

43
[Figure: Layers from top to bottom. User space: Task 1, Task 2, ..., Task n. Kernel space (software, inside the Linux kernel): the VIRTUAL FILE SYSTEM; the file system implementations minix, ext2, msdos, and proc; the Buffer Cache; the device drivers for hard disk and floppy disk. Hardware: Hard Disk and Floppy Disk.]
Robbed from http://www.cs.usfca.edu/cruse/cs326/lesson22.ppt

44
Overview
  • What is a file system?
  • Historical view of file systems
  • Another layer of indirection
  • Do I have to?
  • File systems have layers
  • How is it done?
  • Sign me up

45
Do I have to?
  • Is a VFS worth doing?
  • What do you think?

46
Do I have to?
  • Cons
  • Harms system performance
  • Another layer of indirection
  • Adds to the size of the system because more code
    must be written
  • A conceptually simpler system
  • Pros
  • Enables using multiple file systems
  • Facilitates research
  • Makes the computer more useful
  • My Linux box has ext2, ext3, FAT32, and NTFS
    partitions mounted
  • Facilitates code reuse
  • Simplifies implementation

47
Do I have to?
  • Is a VFS worth doing?
  • Ultimately, the answer is yes for general purpose
    operating systems.
  • All modern commercial operating systems do.
  • What would you do if you had to design an OS
    file system? Would you use a VFS?

48
Overview
  • What is a file system?
  • Historical view of file systems
  • Another layer of indirection
  • Do I have to?
  • File systems have layers
  • How is it done?
  • Sign me up

49
File systems have layers
  • Like onions and ogres
  • In this context file system means the software
    stack that extends from the user application to
    the hardware.

50
File systems have layers
  • The file system of Unix System V has three layers
  • File/Directory API
  • Inodes
  • Buffers

[Figure: User Program on top; File/Directory API, Inodes, and Buffers (together the file system) in the middle; Storage Device at the bottom]

51
File systems have layers
  • Unix System V File/Directory API
  • Functions like open() and read()
  • Called by user programs
  • Directories are implemented as files
  • Contain each child's
  • - Name
  • - Inode number

52
File systems have layers
  • Unix System V Inodes
  • Allocate disk blocks for files.
  • Record file attributes
  • - Owner
  • - Access permissions
  • - File size
  • - Type (file, directory, or special)
  • - Not file names

  • If file names aren't stored in inodes, where are
    they stored?
  • File names are stored in the parent directory
    entry.

53
File systems have layers
  • Unix System V Inodes, continued
  • Stored on disk
  • Cached in memory
  • With added details
  • Device the inode is from
  • The inode number
  • If the file is a mount point
  • Much more
  • Many pathnames may point to a single inode

54
File systems have layers
  • Implementation of an inode

[Figure: An inode holds the file attributes, ten direct pointers (Direct 0 through Direct 9), and single, double, and triple indirect pointers. Each direct pointer references a data block. The single indirect pointer references an indirect block of pointers to data blocks; the double and triple indirect pointers add one and two further levels of indirect blocks.]

55
File systems have layers
  • Notes on previous diagram
  • All the direct pointers of an inode are used
    before using an indirect pointer
  • All of the slots in a single indirect block are
    consumed before starting to use double indirect
    blocks
  • Likewise for double indirect
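
The fill order in these notes can be sketched as a function that says which kind of pointer reaches a given block of a file. This is a minimal sketch, assuming ten direct pointers as in the diagram; the number of pointers that fit in one indirect block is a parameter because the slides do not give it, and all names are hypothetical.

```c
#define NUM_DIRECT 10   /* Direct 0 through Direct 9, per the diagram */

enum ptr_level {
    TOO_LARGE = -1,
    DIRECT = 0,
    SINGLE_INDIRECT = 1,
    DOUBLE_INDIRECT = 2,
    TRIPLE_INDIRECT = 3
};

/* Which pointer level reaches logical block n of a file (n counts
 * from 0), when each indirect block holds ptrs_per_block pointers. */
enum ptr_level level_for_block(unsigned long long n,
                               unsigned long long ptrs_per_block)
{
    unsigned long long span = ptrs_per_block;

    if (n < NUM_DIRECT)
        return DIRECT;
    n -= NUM_DIRECT;

    if (n < span)                 /* blocks behind the single indirect */
        return SINGLE_INDIRECT;
    n -= span;

    span *= ptrs_per_block;       /* blocks behind the double indirect */
    if (n < span)
        return DOUBLE_INDIRECT;
    n -= span;

    span *= ptrs_per_block;       /* blocks behind the triple indirect */
    if (n < span)
        return TRIPLE_INDIRECT;

    return TOO_LARGE;             /* beyond what the inode can address */
}
```

With 256 pointers per indirect block, blocks 0 to 9 are direct, 10 to 265 use the single indirect block, and block 266 is the first to need the double indirect pointer.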

56
File systems have layers
  • Unix System V Buffers
  • In memory copy of contents of a disk block
  • Same size
  • Mechanism through which caching is achieved
  • Read ahead
  • Delayed write

57
File systems have layers
  • Review Questions
  • Where is the type of a directory entry stored?
  • In the inode the directory entry points to.
  • Where is the name of an inode stored?
  • In the directory entry which points to it.
  • Can more than one file point to a given inode?
  • Yes. Many files may point to the same inode.
  • Can more than one inode point to a disk block?
  • No. Inodes point to zero or more blocks, but a
    block may be referenced by zero or one inode.
  • How is a directory different from a file?
  • A directory is a type of file, as indicated by
    the inode, which contains a listing of the
    directory's children

58
File systems have layers
  • How do you get from a file name to a file's
    contents?
  • Recursive procedure
  • 1. Start with inode of current or root directory
  •    - If path begins with / use root, otherwise use
    current
  •    - Inodes for both are cached for the process
  • 2. Get disk block(s) pointed to by inode for
    directory
  • 3. Search directory listing for next part of path
  • 4. If found, get the inode pointed to by entry
  • 5. If inode says child is a directory, go to 2. If
    child is a file, get disk block(s) inode points
    to.

59
File systems have layers
  • Resolving a file path to a file's contents
  • Consider how we'd resolve the path
    /foo/foobarred/found to that file's contents in
    this example directory tree.
  • See following slides for an example

/
    foo
        foobarred
            lost
            found
        .ssh
    bar
    .bashrc

60
File systems have layers
  • Resolving a file path to a file's contents
  • Step through the following slides to see the
    process of resolving the file name
    /foo/foobarred/found to its contents

Inodes
    Inode  Type  Disk Block
    0      Dir   0
    1      Dir   7
    2      Dir   2
    3      File  3
    4      Dir   5
    5      File  4
    6      File  6
    7      File  1

Disk Blocks
    Block  Contents
    0      foo 1, .bashrc 5, bar 4
    1      I'm found
    2      lost 6, found 7
    3      All my secrets would go in here
    4      I get into bars with my aliases
    5      -empty-
    6      I'm lost
    7      foobarred 2, .ssh 3

61
File systems have layers
  • Resolving a file path to a file's contents
  • Inode for root directory is known to be in slot
    0.

62
File systems have layers
  • Resolving a file path to a file's contents
  • Root directory inode points to block 0

63
File systems have layers
  • Resolving a file path to a file's contents
  • Root directory listing says that inode 1 is for
    child foo.

64
File systems have layers
  • Resolving a file path to a file's contents
  • foo's inode says that it's a directory and points
    to block 7

65
File systems have layers
  • Resolving a file path to a file's contents
  • foo's directory listing says that inode 2 is for
    child foobarred

66
File systems have layers
  • Resolving a file path to a file's contents
  • foobarred's inode says that it is a directory and
    its listing is in block 2

67
File systems have layers
  • Resolving a file path to a file's contents
  • Directory listing of foobarred says that child
    found has inode 7.

68
File systems have layers
  • Resolving a file path to a file's contents
  • Inode says that found is a file and 1 is its
    first block

69
File systems have layers
  • Resolving a file path to a file's contents
  • The content of /foo/foobarred/found is "I'm found"

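
The walk-through just completed can be replayed in code. The sketch below hard-codes the example's inode and disk block tables and resolves an absolute path one segment at a time; the types and the function name toy_read_path are hypothetical, and real file systems store directories and blocks very differently.

```c
#include <stddef.h>
#include <string.h>

enum kind { KIND_DIR, KIND_FILE };

struct toy_inode { enum kind kind; int block; };
struct toy_entry { const char *name; int ino; };
struct toy_block {
    const char *text;             /* contents if owned by a file */
    struct toy_entry entries[3];  /* listing if owned by a directory */
    int n_entries;
};

/* Inode table from the example: inode number -> type, disk block. */
static const struct toy_inode inodes[8] = {
    { KIND_DIR, 0 },  { KIND_DIR, 7 },  { KIND_DIR, 2 },  { KIND_FILE, 3 },
    { KIND_DIR, 5 },  { KIND_FILE, 4 }, { KIND_FILE, 6 }, { KIND_FILE, 1 },
};

/* Disk blocks from the example: block number -> contents. */
static const struct toy_block blocks[8] = {
    { .entries = { { "foo", 1 }, { ".bashrc", 5 }, { "bar", 4 } },
      .n_entries = 3 },
    { .text = "I'm found" },
    { .entries = { { "lost", 6 }, { "found", 7 } }, .n_entries = 2 },
    { .text = "All my secrets would go in here" },
    { .text = "I get into bars with my aliases" },
    { .n_entries = 0 },           /* empty directory listing */
    { .text = "I'm lost" },
    { .entries = { { "foobarred", 2 }, { ".ssh", 3 } }, .n_entries = 2 },
};

/* Resolve an absolute path to a file's contents.
 * Returns NULL if the path is relative, missing, or not a file. */
const char *toy_read_path(const char *path)
{
    int ino = 0;                  /* root directory is inode 0 */
    const char *p = path;

    if (path[0] != '/')
        return NULL;
    while (*p == '/')
        p++;

    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of next segment */
        const struct toy_block *dir;
        int next = -1;

        if (inodes[ino].kind != KIND_DIR)
            return NULL;
        dir = &blocks[inodes[ino].block];

        /* Linear search of the unsorted directory listing. */
        for (int i = 0; i < dir->n_entries; i++) {
            const char *name = dir->entries[i].name;
            if (strlen(name) == len && strncmp(name, p, len) == 0) {
                next = dir->entries[i].ino;
                break;
            }
        }
        if (next < 0)
            return NULL;

        ino = next;
        p += len;
        while (*p == '/')
            p++;
    }

    if (inodes[ino].kind != KIND_FILE)
        return NULL;
    return blocks[inodes[ino].block].text;
}
```

Calling toy_read_path("/foo/foobarred/found") visits inodes 0, 1, 2, and 7 and returns the text stored in block 1, matching the slides.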
70
File systems have layers
  • Review Questions
  • What are the layers of the System V file system?
  • File/Directory API
  • Inodes
  • Buffers

71
File systems have layers
  • Problems in System V file system implementation
  • Searching a directory listing for a child is time
    consuming
  • Directory listing is unsorted
  • Allows entries to be inserted and removed cheaply
  • Makes searching expensive
  • Requires a linear search; a binary search can't
    be used
  • Implies that time to find entry increases
    linearly with the number of directory entries

72
File systems have layers
  • Problems in System V file system implementation
  • The listing for the directory must be read from
    disk at each step in a path
  • Can cause the disk head to jump around
  • For example, if you want to read the file
    /foo/foobarred/found, you have to read and search
    the three directory listings along the way
  • Detrimental to performance

73
File systems have layers
  • The Linux solution to these shortcomings
  • Add another layer of indirection
  • Layer is called the dcache
  • Short for directory cache
  • Caches the contents of directory listings
  • The dcache is composed of dentries
  • Short for directory entry
  • A dentry is a cached association from a path name
    to an inode
  • Also caches relationships to other dentries

74
File systems have layers
  • With the addition of the dcache, what does the
    Linux file system software stack look like,
    compared to the System V file system software
    stack?
  • Glad you asked
  • Next slide, please

75
File systems have layers
  • Since version 2.1, the Linux virtual file system
    has had three layers
  • File/Directory API
  • Dcache
  • Inodes
  • Buffers are not part of VFS
  • Dcache doesn't exist in System V

[Figure: Layers of the Linux kernel FS, top to bottom: User Program; File/Directory API, Dcache, and Inodes (together the virtual file system); Real FS Implementation; Buffers; Storage Device]

76
File systems have layers
  • Modern Linux File/Directory API
  • Implements a superset of the System V API
  • User programs interact with the File/Directory
    API through path names or integer file
    descriptors
  • A file descriptor is an index into an array of
    pointers to file structures
  • More on that later

77
File systems have layers
  • Modern Linux Dcache
  • Improves performance
  • Composed of dentries
  • Caches the inode that a path name points to
  • Caches relationships to other dentries
  • Parent
  • Siblings
  • Children
  • Has a hash value for the name of the file it
    represents
  • Speeds up string comparisons
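
Why a stored hash speeds up name comparisons can be shown with a small sketch: two names with different hashes or lengths cannot be equal, so the byte-by-byte comparison is skipped for most mismatches. The struct, the function names, and the hash function here are all hypothetical choices for illustration; this is not the kernel's dentry layout or its hash.

```c
#include <stddef.h>
#include <string.h>

/* A name with its precomputed hash and length (hypothetical). */
struct toy_name {
    unsigned long hash;
    size_t len;
    const char *name;
};

/* A simple multiplicative string hash, chosen only for illustration. */
unsigned long toy_hash(const char *s, size_t len)
{
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h;
}

/* Build a toy_name; the hash is computed once, up front. */
struct toy_name toy_name_make(const char *s)
{
    struct toy_name n;
    n.len = strlen(s);
    n.hash = toy_hash(s, n.len);
    n.name = s;
    return n;
}

/* Compare cheaply first; fall back to memcmp only on a hash match. */
int toy_name_equal(const struct toy_name *a, const struct toy_name *b)
{
    if (a->hash != b->hash || a->len != b->len)
        return 0;
    return memcmp(a->name, b->name, a->len) == 0;
}
```

The final memcmp is still needed because two different names can share a hash value.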

78
File systems have layers
  • Modern Linux Inodes
  • Much like System V inodes
  • Contain pointers to file system implementation
    specific operations

79
File systems have layers
  • Modern Linux Real File System Implementation
  • Can be compiled into the kernel or can be loaded
    as a kernel module

80
File systems have layers
  • Modern Linux File Buffers
  • Integrated with the virtual memory cache since
    1999
  • See virtual memory presentation for more
    information

81
File systems have layers
  • Review Questions
  • What's the purpose of a dentry?
  • A dentry exists to cache a directory entry in
    order to improve performance
  • What are some of the things a dentry links to?
  • A dentry links to
  • - Its parent
  • - A list of its children
  • - A list of its siblings
  • - Its file's in-memory inode

82
Overview
  • What is a file system?
  • Historical view of file systems
  • Another layer of indirection
  • Do I have to?
  • File systems have layers
  • How is it done?
  • Sign me up

83
How is it done?
  • Down to the nitty-gritty code details
  • There's nothing to fear here

84
How is it done?
  • There are quite a few C structures that the
    kernel employs to keep track of open files
  • struct task_struct (see sched.h)
  • - Details about a user process
  • - Has a pointer to a files_struct
  • - During a syscall, the kernel has a pointer to
    the current process's task_struct
  • struct files_struct (see file.h)
  • - Tracks the open files for a process
  • - Has an array of pointers to file struct
    instances
  • - When a user program opens a file, it's given an
    integer index into this array

85
How is it done?
  • File-related kernel structures, continued
  • struct file (see fs.h)
  • - The kernel maintains an array of these with an
    element for each open file system-wide
  • - Has a pointer to dentry plus file owner,
    read/write cursor position, and more
  • struct dentry (see dcache.h)
  • - Points to the inode for the file
  • struct inode (see fs.h)
  • - Stores its block number and the device it's from
  • - Points to operations in the FS implementation
    which can read the disk
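
The chain of pointers among these structures can be sketched with heavily simplified stand-ins. Every type and function below is hypothetical (prefixed toy_) and keeps only the one or two fields needed to follow a file descriptor to its inode; the real kernel structures contain far more.

```c
#include <stddef.h>

#define TOY_MAX_FDS 16

struct toy_inode  { unsigned long ino; };
struct toy_dentry { const char *name; struct toy_inode *inode; };
struct toy_file   { struct toy_dentry *dentry; long pos; };
struct toy_files  { struct toy_file *fd_array[TOY_MAX_FDS]; };
struct toy_task   { struct toy_files *files; };

/* Store a file struct pointer in the lowest free slot of the
 * process's array and return that slot's index: the descriptor. */
int toy_fd_install(struct toy_task *t, struct toy_file *f)
{
    for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
        if (t->files->fd_array[fd] == NULL) {
            t->files->fd_array[fd] = f;
            return fd;
        }
    }
    return -1;   /* table full */
}

/* Follow task -> files -> file -> dentry -> inode for a descriptor. */
struct toy_inode *toy_fd_to_inode(const struct toy_task *t, int fd)
{
    struct toy_file *f;

    if (fd < 0 || fd >= TOY_MAX_FDS)
        return NULL;
    f = t->files->fd_array[fd];
    if (f == NULL || f->dentry == NULL)
        return NULL;
    return f->dentry->inode;
}
```

The integer a user program holds is meaningful only as an index into its own process's array, which is why the same descriptor number can refer to different files in different processes.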

86
How is it done?
  • The following diagram shows the relationship of
    the aforementioned structures

87
[Figure: Inside the Linux kernel. The user program holds a file handle (an integer value). The process's task_struct holds a files_struct pointer; the files_struct holds an array of file struct pointers; each of those points to a file struct in the kernel's array of all file structs. A file struct holds a dentry pointer into the dcache; a dentry holds an inode pointer into the in-memory inode cache. Each inode struct corresponds to a disk inode in the FS instance on the storage device, which in turn refers to data blocks.]

88
How is it done?
  • Review Questions
  • When a user program calls open() to open a file,
    a non-negative return value indicates success.
    What is the function of that non-negative
    number?
  • The return value is the index into the array of
    file structure pointers in the files_struct
    structure.
  • What is the cardinality between user processes
    and files_struct instances?
  • There is a one-to-one relationship between
    files_struct instances and user processes.
  • Why?
  • Because there are one-to-one relationships
    between user processes and task_struct instances
    and between task_struct instances and
    files_struct instances.

89
How is it done?
  • Review Questions
  • If two user processes open the same file, are two
    or one file struct instances created?
  • Trick question
  • Normally, two instances are created
  • However, if a process opened a file and then
    forked to create another process, parent and
    child have the file open and both share one file
    struct instance
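
The difference between the two cases is observable from user space, because the read/write cursor lives in the file struct. The sketch below uses only standard POSIX calls; the helper names and the scratch file path passed in are hypothetical, and error handling is kept minimal.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open a file, fork, let the child write n bytes through the
 * inherited descriptor, then report the parent's cursor.
 * Parent and child share one cursor, so the result is n. */
off_t offset_after_child_write(const char *path, size_t n)
{
    int status;
    off_t pos;
    pid_t pid;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {                       /* child */
        for (size_t i = 0; i < n; i++)
            if (write(fd, "x", 1) != 1)
                _exit(1);
        _exit(0);
    }
    waitpid(pid, &status, 0);             /* parent waits for child */
    pos = lseek(fd, 0, SEEK_CUR);         /* where is our cursor now? */
    close(fd);
    unlink(path);
    return pos;
}

/* Open the same file twice and write n bytes through the second
 * descriptor. Each open() has its own cursor, so the first
 * descriptor's cursor stays at 0. */
off_t offset_after_second_open_write(const char *path, size_t n)
{
    off_t pos;
    int fd2;
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd1 < 0)
        return -1;
    fd2 = open(path, O_WRONLY);
    if (fd2 < 0) {
        close(fd1);
        unlink(path);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        if (write(fd2, "x", 1) != 1)
            break;
    pos = lseek(fd1, 0, SEEK_CUR);
    close(fd2);
    close(fd1);
    unlink(path);
    return pos;
}
```

Both helpers create and then delete the scratch file named by path, so it should be a name that is safe to overwrite.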

90
How is it done?
  • Great. Now we see how the data relates
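  • But we'd like to see some action
  • Consider the following C program

    #include <fcntl.h>   /* open() */
    #include <stdio.h>   /* printf() */
    #include <unistd.h>  /* write(), lseek(), read(), close() */

    int main(void)
    {
        const int seekFromStart = 0;   /* same value as SEEK_SET */
        const int rdWrCreate = 00102;  /* octal: O_RDWR | O_CREAT on Linux */
        char ch = 'A';
        /* O_CREAT needs a third argument: the new file's mode */
        int fd = open("fsSample.txt", rdWrCreate, 0644);
        write(fd, &ch, 1);             /* pass the address of ch */
        lseek(fd, 0, seekFromStart);
        read(fd, &ch, 1);
        close(fd);
        printf("The character read was %c\n", ch);
        return 0;
    }
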
  • What actually happens when those file system API
    functions are called?
  • Links in diagrams go to Linux 2.6 source

91
How is it done?
  • Call flow: user program -> open() -> system call
    layer -> sys_open()
  • open(): takes the path name of the file to open
    as an argument
  • sys_open() calls, in order
  • - getname(): make a copy of the path name string
    in kernel space
  • - get_unused_fd(): reserve an unused element in
    the array of file struct pointers in the
    process's files_struct instance
  • - filp_open(): the workhorse; see the next slide
    for more detail. Will return a file struct
    instance for the opened file.
  • - fd_install()
  • - putname()

92
How is it done?
  • Call flow: user program -> open() -> system call
    layer -> sys_open() -> filp_open() ->
    open_namei(), then dentry_open()
  • open_namei() calls
  • - path_lookup(): determine if the path is
    relative to the current directory, the root
    directory, or a process-specific root. Get the
    dentry for that directory. Call
    link_path_walk(), which performs file name
    resolution. See next slide for more.
  • - __lookup_hash(): get a dentry for the file
    we're opening. Call the real FS implementation
    to populate the structure if not cached.
  • - vfs_create(): if the file doesn't exist, call
    the real FS implementation to create it.
  • - may_open(): check that the user has permission
    to open the file.

93
How is it done?
  • Call chain: User program open() -> System call layer
    sys_open() -> filp_open() -> open_namei() ->
    path_lookup() -> link_path_walk()
  • Purpose: Resolve the path name to a dentry
  • Set inode to the inode for the directory that
    path_lookup() determined the path name was relative to
  • While there are segments left in the path name
  • - Fail if user does not have permission to inode
  • - Parse the next /-separated segment in path name
  • - If the segment is ..
  • - - Set inode to the parent of inode, allowing for
    crossing mount points or inode being root dir
  • - Get a dentry for child of inode named by segment
  • - If the dentry for the segment isn't cached
  • - - Ask FS implementation for dentry and its inode
  • - If the dentry points to a symbolic link
  • - - Get the dentry and inode for the item pointed to
  • - If the dentry is for a mount point
  • - - Get the dentry and inode for mounted root dir
  • - Set inode to dentry's inode
  • Return the dentry that owns inode
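
The segment-by-segment loop above can be sketched in user space. This is only a toy model, not the kernel's link_path_walk(): the toy_dentry type and the toy_* helpers are hypothetical names invented for illustration, and permission checks, symbolic links, mount points, and the dentry cache are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical stand-in for a dentry: a name, a parent, some children. */
struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;                 /* the root is its own parent */
    struct toy_dentry *child[TOY_MAX_CHILDREN];
    int nchildren;
};

/* Attach kid to dir under the given name. */
static struct toy_dentry *toy_add_child(struct toy_dentry *dir,
                                        struct toy_dentry *kid,
                                        const char *name)
{
    assert(dir->nchildren < TOY_MAX_CHILDREN);
    kid->name = name;
    kid->parent = dir;
    kid->nchildren = 0;
    dir->child[dir->nchildren++] = kid;
    return kid;
}

/* Find the child of dir whose name equals the first len bytes of seg. */
static struct toy_dentry *toy_lookup(struct toy_dentry *dir,
                                     const char *seg, size_t len)
{
    for (int i = 0; i < dir->nchildren; i++) {
        const char *n = dir->child[i]->name;
        if (strlen(n) == len && strncmp(n, seg, len) == 0)
            return dir->child[i];
    }
    return NULL;
}

/* Walk path one segment at a time, starting from the directory start.
 * Returns NULL when a segment does not exist. */
static struct toy_dentry *toy_path_walk(struct toy_dentry *start,
                                        const char *path)
{
    struct toy_dentry *cur = start;

    while (*path) {
        while (*path == '/')                   /* skip separators */
            path++;
        size_t len = strcspn(path, "/");       /* length of next segment */
        if (len == 0)
            break;                             /* only trailing slashes left */
        if (len == 2 && path[0] == '.' && path[1] == '.') {
            cur = cur->parent;                 /* ".." at the root stays put */
        } else if (!(len == 1 && path[0] == '.')) {
            cur = toy_lookup(cur, path, len);
            if (!cur)
                return NULL;
        }
        path += len;
    }
    return cur;
}
```

Making the root its own parent is what lets ".." at the top of the tree resolve without a special case in the loop.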
94
How is it done?
  • Call chain: User program open() -> System call layer
    sys_open() -> filp_open()
  • filp_open() calls
  • - open_namei(): Returned a dentry for the path name
  • - dentry_open(): Allocate a file struct instance. Populate
    that file struct from the dentry
95
How is it done?
  • User program: open()
  • System call layer: sys_open() calls, in order
  • - getname()
  • - get_unused_fd()
  • - filp_open(): Returned a file struct instance for the
    opened file
  • - fd_install(): Set the previously reserved file struct
    pointer to the file struct instance returned by
    filp_open()
  • - putname(): Deallocate the kernel copy of the filename
96
How is it done?
  • User program: open()
  • System call layer: sys_open()
  • - Returns the index where fd_install() put the new file
    struct pointer. This value is called the file
    descriptor, or FD.
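
To make the "reserve a slot, install the pointer, return the index" sequence concrete, here is a minimal user-space sketch. The toy_* names and the fixed-size table are assumptions made for illustration; the kernel's files_struct is more elaborate and can grow.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_OPEN 8

/* Hypothetical stand-ins for the file struct and files_struct. */
struct toy_file {
    long f_pos;
};

struct toy_files_struct {
    struct toy_file *fd[TOY_NR_OPEN];       /* array of file struct pointers */
    unsigned char in_use[TOY_NR_OPEN];      /* slot reserved? */
    int next_fd;                            /* where to start searching */
};

/* Reserve the lowest free slot at or after next_fd. */
static int toy_get_unused_fd(struct toy_files_struct *files)
{
    for (int fd = files->next_fd; fd < TOY_NR_OPEN; fd++) {
        if (!files->in_use[fd]) {
            files->in_use[fd] = 1;
            files->next_fd = fd + 1;
            return fd;
        }
    }
    return -1;   /* table full; the kernel reports -EMFILE here */
}

/* Point the reserved slot at the opened file. */
static void toy_fd_install(struct toy_files_struct *files, int fd,
                           struct toy_file *file)
{
    files->fd[fd] = file;
}

/* The tail end of open(): opened stands in for filp_open()'s result. */
static int toy_open(struct toy_files_struct *files, struct toy_file *opened)
{
    int fd = toy_get_unused_fd(files);
    if (fd < 0)
        return fd;
    toy_fd_install(files, fd, opened);
    return fd;                              /* this index is the FD */
}
```

In this sketch the table starts empty, so the first FD handed out is 0; a real process normally already has FDs 0, 1, and 2 open.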
97
How is it done?
  • User program: read()
  • - Takes the open file's FD as an argument
  • System call layer: sys_read() calls, in order
  • - fget_light(): Get the file struct instance for the FD
  • - file_pos_read(): Copy the read/write cursor location from
    the file struct
  • - vfs_read(): Call the real FS implementation to read a
    specified number of bytes from the file
  • - file_pos_write(): Increment the read/write cursor by the
    number of bytes read
  • - fput_light(): Release the file struct instance for the FD
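
The copy-cursor, call-the-FS, store-cursor pattern can be sketched as follows. Everything here is a simplified assumption: toy_file, toy_file_operations, and the in-memory "memfs" are hypothetical, and user-space copying, locking, and reference counting are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical per-file-system operation table with a single entry. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t count, long *pos);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    long f_pos;              /* the read/write cursor */
    const char *data;        /* backing bytes for the toy "memfs" */
    size_t size;
};

/* "Real FS implementation": copy bytes out of memory and advance *pos. */
static long memfs_read(struct toy_file *f, char *buf, size_t count, long *pos)
{
    if (*pos < 0 || (size_t)*pos >= f->size)
        return 0;                               /* at or past end of file */
    size_t left = f->size - (size_t)*pos;
    if (count > left)
        count = left;
    memcpy(buf, f->data + *pos, count);
    *pos += (long)count;
    return (long)count;
}

static const struct toy_file_operations memfs_fops = { memfs_read };

/* Mirrors the order of the steps on the slide. */
static long toy_sys_read(struct toy_file *f, char *buf, size_t count)
{
    long pos = f->f_pos;                        /* file_pos_read() */
    long n = f->f_op->read(f, buf, count, &pos);/* vfs_read() */
    f->f_pos = pos;                             /* file_pos_write() */
    return n;
}
```

Working on a local copy of the cursor means the file struct is only updated once, after the file system has reported how far it actually got.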
98
How is it done?
  • User program: lseek()
  • - Takes the open file's FD as an argument
  • System call layer: sys_lseek() calls, in order
  • - fget_light(): Get the FD's file struct instance
  • - vfs_llseek(), which calls default_llseek(): Get the
    read/write cursor location from the file struct. Update
    the cursor according to the arguments. Save the cursor
    location back into the file struct
  • - fput_light(): Release the FD's file struct instance
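
A rough sketch of how the cursor update might look, assuming the usual SEEK_SET, SEEK_CUR, and SEEK_END meanings. The toy_file type and toy_llseek() are hypothetical; the kernel version works on 64-bit offsets and reports errors as negative errno values rather than -1.

```c
#include <assert.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for the file struct. */
struct toy_file {
    long f_pos;      /* the read/write cursor */
    long size;       /* file length, needed for SEEK_END */
};

/* Compute the new cursor from the arguments and store it in the file. */
static long toy_llseek(struct toy_file *f, long offset, int whence)
{
    long pos;

    switch (whence) {
    case SEEK_SET:
        pos = offset;                 /* absolute position */
        break;
    case SEEK_CUR:
        pos = f->f_pos + offset;      /* relative to the cursor */
        break;
    case SEEK_END:
        pos = f->size + offset;       /* relative to end of file */
        break;
    default:
        return -1;                    /* unknown whence */
    }
    if (pos < 0)
        return -1;                    /* refuse a negative cursor */
    f->f_pos = pos;                   /* save back into the file struct */
    return pos;
}
```

Note that a failed seek leaves the stored cursor untouched, since the new value is only written after it has been validated.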
99
How is it done?
  • User program: write()
  • - Takes the open file's FD as an argument
  • System call layer: sys_write() calls, in order
  • - fget_light(): Get the file struct instance for the FD
  • - file_pos_read(): Copy the read/write cursor location from
    the file struct
  • - vfs_write(): Call the real FS implementation to write the
    specified bytes at the write cursor location
  • - file_pos_write(): Increment the read/write cursor by the
    number of bytes written
  • - fput_light(): Release the file struct instance for the FD
100
How is it done?
  • User program: close()
  • - Takes the FD of the file to close as an argument
  • System call layer: sys_close()
  • - Frees the FD's slot in the array of file struct pointers
  • - Saves the value of the FD so that an unused slot can be
    found quickly when the next one is needed
  • - Calls filp_close(), which calls the real FS
    implementation to flush any unwritten data to storage
  • filp_close() calls fput()
  • - If this is the last reference to the file struct
    instance
  • - - Allow the real FS implementation to free resources
  • - - Deallocate the dentry for the file
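
The close path combines two ideas: freeing the descriptor slot (and remembering it as a search hint) and dropping one reference to a possibly shared file struct. A minimal sketch under those assumptions follows; the toy_* names are hypothetical, and the "released" flag merely stands in for the file system's cleanup and the dentry release.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_OPEN 8

/* Hypothetical stand-in for the file struct. */
struct toy_file {
    int f_count;     /* number of references to this file struct */
    int released;    /* set once the last reference is dropped */
};

struct toy_files_struct {
    struct toy_file *fd[TOY_NR_OPEN];
    int next_fd;     /* lowest slot that might be free */
};

/* Drop one reference; clean up only when it was the last one. */
static void toy_fput(struct toy_file *f)
{
    if (--f->f_count == 0)
        f->released = 1;   /* stand-in for FS cleanup and dentry release */
}

static int toy_close(struct toy_files_struct *files, int fd)
{
    if (fd < 0 || fd >= TOY_NR_OPEN || files->fd[fd] == NULL)
        return -1;                      /* the kernel reports -EBADF here */

    struct toy_file *f = files->fd[fd];
    files->fd[fd] = NULL;               /* free the FD's slot */
    if (fd < files->next_fd)
        files->next_fd = fd;            /* remember it for the next open */
    toy_fput(f);
    return 0;
}
```

Two descriptors that share one file struct (for example after dup()) show why the reference count matters: the first close only frees a slot, and the second one triggers the cleanup.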
101
Overview
  • What is a file system?
  • Historical view of file systems
  • Another layer of indirection
  • Do I have to?
  • File systems have layers
  • How is it done?
  • Sign me up

102
Sign me up.
  • So how do the Linux kernel and the Virtual File
    System find out about a particular file system
    implementation?
  • (e.g., ext2, nfs, reiserfs)

103
Sign me up.
  • When you configure a Linux kernel build you are
    prompted to choose whether each of the supported
    file systems is included in your build
  • When the kernel is actually built, the file
    system startup code contains calls to the
    initialization routines of all of the built-in
    file systems

104
Sign me up.
  • Linux file systems may also be built as modules
    and, in this case, they may be demand loaded as
    they are needed or loaded by hand using insmod
  • Whenever a file system module is loaded it
    registers itself with the kernel and unregisters
    itself when it is unloaded

105
Sign me up.
  • Each specific file system's initialization
    routine registers the file system with the
    Virtual File System
  • Each registered file system is represented by a
    file_system_type data structure, which contains
    the name of the file system and a pointer to its
    VFS Super Block read routine

106
Sign me up.
  • When a file system is registered the
    file_system_type data structure is populated with
    data specific to the file system
  • And the file_system_type structure is put into a
    list, pointed at by the file_systems pointer

107
Sign me up.
  • The kernel utilizes the file_systems list to
    check if a specific file system has been
    registered
  • And to assist with the mapping of the Virtual
    File System's operations to a specific file
    system implementation's operations by using the
    VFS Super Block read routine

108
Sign me up.
  • Registered File Systems Example
  • file_systems pointer -> list of file_system_type
    structures, each linked to the following one through next
  • - file_system_type: name (ext2), fs_flags, read_super(),
    next
  • - file_system_type: name (reiserfs), fs_flags,
    read_super(), next
  • - file_system_type: name (nfs), fs_flags, read_super(),
    next
109
Sign me up.
  • Where is the file_system_type structure declared?
  • What does it look like?
  • linux/fs.h
  • struct file_system_type {
  •     const char *name;
  •     int fs_flags;
  •     struct super_block *(*read_super) (struct
        super_block *, void *, int);
  •     struct file_system_type *next;
  • };

110
Sign me up.
  • Some details of the file_system_type structure
  • - name
  • - The name of the file system type, such as
    ext2, nfs, reiserfs, etc.
  • - This field is used as a key, and it is not
    possible to register a file system under a name
    that is already in use
  • - Function find_filesystem() utilizes the name
    field to check if a file system has already
    been registered

111
Sign me up.
  • - fs_flags
  • - A number of flags which record features of
    the file system
  • - If FS_REQUIRES_DEV is set in fs_flags, then a
    block device must be given when mounting the
    file system
  • - Not all file systems need a device to hold
    them. The /proc file system, for example, does
    not require a block device

112
Sign me up.
  • - fs_flags (continued)
  • - Aside: The command
  • cat /proc/filesystems
  • displays file system information from the
    file_systems list of file_system_type structures
  • - In particular, it is possible to see if a
    file system requires a device or not by noting
    the presence or lack of nodev in front of
    a file system listing

113
Sign me up.
  • - fs_flags (continued)
  • ledford@esus cat /proc/filesystems
    nodev   rootfs
    nodev   bdev
    nodev   proc
    nodev   sockfs
    nodev   tmpfs
    nodev   shm
    nodev   pipefs
    nodev   binfmt_misc
            ext3
            ext2
    nodev   ramfs
            iso9660
    nodev   nfs
    nodev   smbfs
    nodev   autofs
            reiserfs
    nodev   devpts
            xfs
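
A listing like this can be produced by walking the list of registered types and printing "nodev" for every entry that does not have FS_REQUIRES_DEV set. The sketch below is a user-space approximation: toy_fs_type, TOY_FS_REQUIRES_DEV, and toy_format_fs_list() are hypothetical names, and the flag value is chosen arbitrarily.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_FS_REQUIRES_DEV 1   /* hypothetical flag value */

/* Hypothetical, trimmed-down stand-in for file_system_type. */
struct toy_fs_type {
    const char *name;
    int fs_flags;
    struct toy_fs_type *next;
};

/* Write one "nodev<TAB>name" or "<TAB>name" line per list entry into buf.
 * Returns the number of characters the full listing needs, which exceeds
 * size - 1 if the output had to be truncated. */
static int toy_format_fs_list(const struct toy_fs_type *head,
                              char *buf, size_t size)
{
    int len = 0;

    for (const struct toy_fs_type *t = head;
         t != NULL && (size_t)len < size; t = t->next) {
        len += snprintf(buf + len, size - (size_t)len, "%s\t%s\n",
                        (t->fs_flags & TOY_FS_REQUIRES_DEV) ? "" : "nodev",
                        t->name);
    }
    return len;
}
```

With a two-entry list of proc (no flag) followed by ext2 (flag set), the buffer would hold a "nodev" line for proc and a bare, tab-indented line for ext2.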

114
Sign me up.
  • - next
  • - A pointer used for chaining all the
    file_system_type structures together

  • file_systems -> file_system_type (ext2) ->
    file_system_type (reiserfs) -> file_system_type (nfs),
    each holding name, fs_flags, read_super() and next
115
Sign me up.
  • - read_super
  • - This routine is called by the VFS when an
    instance of the file system is mounted
  • struct super_block *(*read_super) (struct
    super_block *, void *, int);
  • - What is going on here?
  • - A VFS super_block structure is being
    populated by the read_super function with
    data specific to a particular file system

116
Sign me up.
  • struct super_block *(*read_super) (struct
    super_block *, void *, int);
  • - The void pointer points to data that has
    been passed down from the mount system call
  • - The trailing int signifies whether or not
    read_super should be silent about errors
  • - This is only set when mounting the root
    file system
  • - Several file systems may be tried when
    attempting to mount the root file system so
    avoiding unsightly errors is desired

117
Sign me up.
  • So what gets called when we register a file
    system?
  • Linux finds out about new file system types by
    calls to
  • register_filesystem()
  • And forgets about them by the calls to its
    counterpart
  • unregister_filesystem()

118
Sign me up.
  • The formal declarations are
  • #include <linux/fs.h>
  • int register_filesystem(struct file_system_type
    *fs);
  • int unregister_filesystem(struct file_system_type
    *fs);

119
Sign me up.
  • register_filesystem()
  • linux/fs/super.c
  • int register_filesystem(struct file_system_type
    *fs)
  • {
  •     struct file_system_type **tmp;
  •     if (!fs)
  •         return -EINVAL;
  •     if (fs->next)    /* already linked into a list */
  •         return -EBUSY;
  •     tmp = &file_systems;
  •     while (*tmp) {
  •         if (strcmp((*tmp)->name,
            fs->name) == 0)    /* name already registered */
  •             return -EBUSY;
  •         tmp = &(*tmp)->next;
  •     }
  •     *tmp = fs;    /* append at the end of the list */
  •     return 0;
  • }

120
Sign me up.
  • unregister_filesystem()
  • linux/fs/super.c
  • int unregister_filesystem(struct file_system_type
    *fs)
  • {
  • #ifdef CONFIG_MODULES
  •     struct file_system_type **tmp;
  •     tmp = &file_systems;
  •     while (*tmp) {
  •         if (fs == *tmp) {
  •             *tmp = fs->next;    /* unlink fs from the list */
  •             fs->next = NULL;
  •             return 0;
  •         }
  •         tmp = &(*tmp)->next;
  •     }
  • #endif
  •     return -EINVAL;
  • }
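
The two routines above rely on a pointer-to-pointer walk, which lets the same code handle insertion and removal at the head of the list and in the middle without special cases. The following is a user-space sketch of that technique, not the kernel source: toy_fs_type, toy_file_systems, toy_register(), and toy_unregister() are hypothetical names, and locking and the CONFIG_MODULES guard are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for file_system_type. */
struct toy_fs_type {
    const char *name;
    int fs_flags;
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems;   /* head of the list */

/* Append fs unless it is already linked or its name is taken. */
static int toy_register(struct toy_fs_type *fs)
{
    struct toy_fs_type **tmp;

    if (!fs)
        return -EINVAL;
    if (fs->next)
        return -EBUSY;
    /* tmp always points at the link that would have to change. */
    for (tmp = &toy_file_systems; *tmp; tmp = &(*tmp)->next)
        if (strcmp((*tmp)->name, fs->name) == 0)
            return -EBUSY;
    *tmp = fs;                  /* tmp now addresses the final NULL link */
    return 0;
}

/* Unlink fs by redirecting whichever link currently points at it. */
static int toy_unregister(struct toy_fs_type *fs)
{
    struct toy_fs_type **tmp;

    for (tmp = &toy_file_systems; *tmp; tmp = &(*tmp)->next) {
        if (*tmp == fs) {
            *tmp = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;             /* fs was never registered */
}
```

Because tmp holds the address of a link rather than of a node, removing the first entry simply rewrites the head pointer itself, which is the case the later unregistering slides illustrate.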

121
Sign me up.
  • How is the file_system_type structure populated
    with file system specific data?
  • - Each file system implementation defines a
    file_system_type structure with data specific to
    that file system

122
Sign me up.
  • Example
  • - The ext2 file_system_type structure
  • linux/fs/ext2/super.c
  • static struct file_system_type ext2_fs_type = {
  •     "ext2",
  •     FS_REQUIRES_DEV /* | FS_IBASKET */,
        /* ibaskets have unresolved bugs */
  •     ext2_read_super,
  •     NULL
  • };
  • - Here we can see the fields name, fs_flags,
    read_super and next being populated

123
Sign me up.
  • How is the register_filesystem() called from a
    specific file system?
  • - Each file system implementation defines an
    init function that calls register_filesystem()

124
Sign me up.
  • Example
  • - The ext2 init_ext2_fs() function
  • linux/fs/ext2/super.c
  • static int __init init_ext2_fs(void)
  • {
  •     return register_filesystem(&ext2_fs_type);
  • }
  • - This is pretty self-explanatory

125
Sign me up.
  • How is the unregister_filesystem() called from a
    specific file system?
  • - Each file system implementation defines an
    exit function that calls unregister_filesystem()

126
Sign me up.
  • Example
  • - The ext2 exit_ext2_fs() function
  • linux/fs/ext2/super.c
  • static void __exit exit_ext2_fs(void)
  • {
  •     unregister_filesystem(&ext2_fs_type);
  • }
  • - Again, this is self-explanatory

127
Sign me up.
  • So how and where are the calls to the init and
    exit functions made?
  • linux/fs/ext2/super.c
  • module_init(init_ext2_fs)
  • module_exit(exit_ext2_fs)
  • - module_init() and module_exit() are macros that
    name the functions which begin the file-system-specific
    registration and unregistration processes

128
Sign me up.
  • But when are the functions named in module_init()
    and module_exit() called?
  • - The function named in module_init() is called
    when the module is loaded with insmod, if the file
    system is built as a module
  • - Otherwise it is called along with all of the
    other init calls made during the kernel boot
    process
  • - The function named in module_exit() is called
    when the module is unloaded with rmmod, if the file
    system is built as a module

129
Sign me up.
  • So how does this all fit together to register a
    file system?

130
Sign me up.
  • Registering the ext2 file system
  • module_init(init_ext2_fs): Called at boot time or with
    insmod
  • init_ext2_fs(void) calls
    register_filesystem(&ext2_fs_type)
  • - Takes a file_system_type structure specific to ext2 as
    a parameter: name (ext2), fs_flags, read_super(), next
  • - Populates the file_systems list with the
    file_system_type structure for ext2
  • file_systems -> file_system_type: name (ext2), fs_flags,
    read_super(), next
131
Sign me up.
  • So how does this all fit together to unregister a
    file system?

132
Sign me up.
  • Unregistering the foo file system
  • module_exit(exit_foo_fs): Called with rmmod
  • exit_foo_fs(void) calls
    unregister_filesystem(&foo_fs_type)
  • - Takes a file_system_type structure specific to foo as a
    parameter: name (foo), fs_flags, read_super(), next
  • - Remove from the file_systems list the file_system_type
    structure for foo
  • file_systems -> file_system_type: name (foo), fs_flags,
    read_super(), next
133
Sign me up.
  • Unregistering the foo file system
  • module_exit(exit_foo_fs): Called with rmmod
  • exit_foo_fs(void) calls
    unregister_filesystem(&foo_fs_type)
  • - Remove from the file_systems list the file_system_type
    structure for foo
  • file_systems -> file_system_type: name (foo), fs_flags,
    read_super(), next
134
Sign me up.
  • Unregistering the foo file system
  • module_exit(exit_foo_fs): Called with rmmod
  • exit_foo_fs(void) calls
    unregister_filesystem(&foo_fs_type)
  • - Remove from the file_systems list the file_system_type
    structure for foo
  • - Update the file_systems pointer to point at what next
    was pointing at in the file_system_type structure for
    foo. In this case it is ext2
  • file_systems -> file_system_type: name (ext2), fs_flags,
    read_super(), next
135
Sign me up.
  • Whoptie doo, Basil.. what does it all mean?
  • - For the Virtual File System layer to work with
    a specific file system implementation it must
    have some knowledge of the file system
  • - This knowledge is acquired in part by
    registering the file system
  • - Once this is done the Virtual File System can
    map calls to files, on a specific file system,
    through the correct structures and perform the
    requested operations

136
Sign me up.
  • What's next?
  • - After the file system has been registered we
    must mount it in order to use it
  • - A file system is mounted at boot or with the
    use of the mount command
  • - To unmount a file system the umount command is
    used

137
Sign me up.
  • So what does mounting do?
  • - When a file system is mounted the file_systems
    list is searched to see if the file system has
    been registered
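
As a rough illustration of that search, the sketch below looks a type up by name and, if it is found, hands a super block to that type's read_super routine. All of the names (toy_super_block, toy_fs_type, foo_read_super(), toy_get_fs_type(), toy_mount()) and the magic number are hypothetical, and the real mount path does far more work than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the VFS super block. */
struct toy_super_block {
    unsigned long s_magic;
};

/* Hypothetical, trimmed-down stand-in for file_system_type. */
struct toy_fs_type {
    const char *name;
    struct toy_super_block *(*read_super)(struct toy_super_block *,
                                          void *, int);
    struct toy_fs_type *next;
};

/* "foo" file system: fill in the super block with made-up data. */
static struct toy_super_block *foo_read_super(struct toy_super_block *sb,
                                              void *data, int silent)
{
    (void)data;      /* mount options, unused in this sketch */
    (void)silent;    /* nonzero would suppress error messages */
    sb->s_magic = 0xF00;   /* made-up magic number */
    return sb;
}

static struct toy_fs_type foo_fs_type = { "foo", foo_read_super, NULL };
static struct toy_fs_type *toy_file_systems = &foo_fs_type;

/* Search the registered types by name. */
static struct toy_fs_type *toy_get_fs_type(const char *name)
{
    for (struct toy_fs_type *t = toy_file_systems; t; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t;
    return NULL;
}

/* Mount: find the type, then let it populate the super block. */
static struct toy_super_block *toy_mount(const char *name,
                                         struct toy_super_block *sb,
                                         void *data)
{
    struct toy_fs_type *type = toy_get_fs_type(name);
    if (!type)
        return NULL;                 /* file system type not registered */
    return type->read_super(sb, data, 0);
}
```

The function pointer is the only link between the generic code and the "foo" implementation, which is the indirection the registration step sets up.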
