Title: nfsv4 and linux
1. nfsv4 and linux
- peter honeyman
- linux scalability project
- center for information technology integration
- university of michigan
- ann arbor
2. open source reference implementation
- sponsored by sun microsystems
- part of citi's linux scalability project
- ietf reference implementation
- 257 page spec
- linux and openbsd
- interoperates with solaris, java, network appliance, hummingbird, emc, ...
- september 1 code drop for linux 2.2.14
3. what's new?
- lots of state
- compound rpc
- extensible security added to rpc layer
- delegation for files - client cache consistency
- lease-based, non-blocking, byte-range locks
- win32 share locks
- mountd gone
- lockd, statd gone
4. nfsv4 state
- state is new to nfs protocol
- nfsv3 lockd manages state
- compound rpc - server state
- dos share locks - server and client state
- delegation - server and client state
- server maintains per-thread global state
- client and server maintain file, lock, and lock owner state
5. server state per global thread
- compound operations often use result of previous operation as arguments
- nfs file handle is the coin of the realm
- current file handle is like a current working directory
- some operations (rename) need two file handles - save file handle
6. compound rpc
- hope is to reduce traffic
- complex calling interface
- partial results used
- rpc/xdr layering
- variable length kmalloc buffer for args and recv
- want to xdr args directly into rpc buffer
- want to allow variable length receive buffer
7. rpc/xdr layering
- rpc layer does not interpret compound ops
- replay cache locking vs. regular
- have to decode to decide which replay cache to use
8. example mount compound rpc
putrootfh lookup getattr getfh
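A minimal sketch of how a server might evaluate the mount compound above, assuming a toy in-memory namespace. The `TREE` contents, the path-tuple stand-in for a file handle, and the helper names are illustrative assumptions, not the reference implementation. It shows the two properties the earlier slides mention: each operation works on the current file handle left by the previous one, and evaluation stops at the first failure so the caller sees partial results.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NFS4_OK = "NFS4_OK"
NFS4ERR_NOENT = "NFS4ERR_NOENT"
NFS4ERR_NOFILEHANDLE = "NFS4ERR_NOFILEHANDLE"

# Hypothetical server namespace: nested dicts stand in for directories.
TREE = {"export": {"home": {}}}


@dataclass
class CompoundState:
    """State one compound request carries from operation to operation."""
    current_fh: Optional[Tuple[str, ...]] = None  # a path tuple plays the file handle


def directory_at(fh):
    """Resolve a path-tuple file handle to its directory dict."""
    node = TREE
    for name in fh:
        node = node[name]
    return node


def op_putrootfh(state, _arg):
    state.current_fh = ()
    return NFS4_OK, None


def op_lookup(state, name):
    if state.current_fh is None:
        return NFS4ERR_NOFILEHANDLE, None
    if name not in directory_at(state.current_fh):
        return NFS4ERR_NOENT, None
    state.current_fh = state.current_fh + (name,)
    return NFS4_OK, None


def op_getattr(state, _arg):
    if state.current_fh is None:
        return NFS4ERR_NOFILEHANDLE, None
    return NFS4_OK, {"type": "dir", "entries": len(directory_at(state.current_fh))}


def op_getfh(state, _arg):
    if state.current_fh is None:
        return NFS4ERR_NOFILEHANDLE, None
    return NFS4_OK, state.current_fh


OPS = {
    "PUTROOTFH": op_putrootfh,
    "LOOKUP": op_lookup,
    "GETATTR": op_getattr,
    "GETFH": op_getfh,
}


def run_compound(ops):
    """Run operations in order; stop at the first error, keeping partial results."""
    state = CompoundState()
    results = []
    for name, arg in ops:
        status, value = OPS[name](state, arg)
        results.append((name, status, value))
        if status != NFS4_OK:
            break
    return results
```

Calling `run_compound([("PUTROOTFH", None), ("LOOKUP", "export"), ("GETATTR", None), ("GETFH", None)])` yields four results. A failing LOOKUP in the middle truncates the list at that operation.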
9. nfsv4 mount
- server pseudofs joins exported subtrees with a read only virtual file system
- any client can mount into the pseudofs
- users browse the pseudofs (via lookup)
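One way to picture the pseudofs is as a tree built from the export list, with one node per path component and a flag on each exported directory. The sketch below assumes that representation. The node layout and the `build_pseudofs` and `walk` names are hypothetical, chosen only for illustration.

```python
def new_node():
    """A pseudofs directory: read-only, holding only child names."""
    return {"exported": False, "children": {}}


def build_pseudofs(export_paths):
    """Mirror the path components leading to each exported directory."""
    root = new_node()
    for path in export_paths:
        node = root
        for part in filter(None, path.split("/")):
            node = node["children"].setdefault(part, new_node())
        node["exported"] = True
    return root


def walk(root, path):
    """LOOKUP-style traversal.

    Returns "pseudo" while still inside the virtual tree, "exported" once the
    path reaches or passes an exported directory (where the real file system
    would take over), and None if the name is not visible at all.
    """
    node = root
    for part in filter(None, path.split("/")):
        if node["exported"]:
            return "exported"
        node = node["children"].get(part)
        if node is None:
            return None
    return "exported" if node["exported"] else "pseudo"
```

With exports `/a/b` and `/a/c/d`, a client browsing `/a` or `/a/c` stays in the pseudofs, while `/a/b` crosses into an exported subtree.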
10. nfsv4 pseudofs
- access into exported sub trees based on user's credentials and permissions
- client /etc/fstab doesn't change with server's export list
- server /etc/exports doesn't need to maintain an ip based access list
11. mounting a pseudo file system
[figure: an nfsv4 client, presenting user creds, mounts the server's pseudo fs, which mirrors the server's local fs; legend: local fs directory, pseudo fs directory, exported directory]
- the server boots, parses /etc/exports, and creates the pseudo fs, mirroring the local fs up to the exported directories. the local fs exported directories are mounted on their pseudo fs counterparts.
- the user has read-only access to the pseudo fs, and traverses the pseudo fs until encountering an exported directory.
- the user's permissions in the negotiated security realm determine access to the exported directory.
- the client boots and mounts a directory of the pseudo fs with the AUTH_SYS security flavor.
- the first nfsv4 procedure that acts on the exported directory causes nfsd to return NFS4ERR_WRONGSEC, causing the client to call SECINFO and obtain the list of security flavors on the exported directory.
- before the first open, the client calls SETCLIENTID to negotiate a per-server unique client identifier.
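The NFS4ERR_WRONGSEC and SECINFO exchange described above can be sketched as a retry loop on the client. This is a toy model under stated assumptions: the `Server` class, the `WrongSec` exception standing in for the error status, and the flavor strings are all hypothetical. Real flavor negotiation happens at the rpc layer.

```python
AUTH_SYS = "AUTH_SYS"
KRB5 = "RPCSEC_GSS/krb5"  # illustrative label for a kerberos 5 gss flavor


class WrongSec(Exception):
    """Stands in for an NFS4ERR_WRONGSEC status."""


class Server:
    def __init__(self, flavors_by_dir):
        # exported directory -> security flavors accepted there
        self.flavors_by_dir = flavors_by_dir

    def secinfo(self, directory):
        """Return the list of flavors usable on the directory."""
        return list(self.flavors_by_dir[directory])

    def access(self, directory, flavor):
        if flavor not in self.flavors_by_dir[directory]:
            raise WrongSec(directory)
        return (directory, flavor)


def client_access(server, directory, supported, flavor=AUTH_SYS):
    """Try the mount-time flavor first; on WRONGSEC ask SECINFO and retry."""
    try:
        return server.access(directory, flavor)
    except WrongSec:
        for candidate in server.secinfo(directory):
            if candidate in supported:
                return server.access(directory, candidate)
        raise
```

A client that mounted with AUTH_SYS and supports both flavors ends up using the kerberos 5 flavor on a directory that requires it. A client with no acceptable flavor sees the error propagate.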
12. rpcsec_gss
- mit krb5 gssrpc and sesame are open source, but neither is really rpcsec_gss
- sun released their rpcsec_gss, a complete rewrite of onc
- gss and sun onc: a tough match
- both are transport independent
- gss channel bindings / onc xprt
- overloading of the program's null_proc
13. kernel rpcsec_gss
- rpc layering had to be violated
- gss implementations are not kernel safe
- security service code not kernel safe (kerberos 5)
- kernel security services implemented as rpc upcalls to a user-level daemon, gssd
- but only some services - e.g. encryption - need to be in the kernel
14. rpcsec_gss: where are we now?
- (mostly) complete user-level kerberos 5 implementation
- linux kernel implementation with kerberos 5
- mutual authentication
- session key setup
- no encryption
- gssd
15. kerberos 5 security initialization
[figure: numbered message flow (steps 1-10) among the kerberos 5 kdc and, on both the nfs client and the nfsd server, the user-level gssd and the kernel]
- 2,3: kerberos 5 tcp/ip
- 1,4,6,7: gssd rpc interface
- 5,8: nfsv4 overloaded null procedure
- 9,10: nfsv4 compound procedure
16. locking
- lease based locks
- no byte range callback mechanism
- server defines a lease for per client lock state
- server can reclaim client state if lease not renewed
- open sets lock state, including lock owner (clientid, pid)
- server returns lock stateid
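The lease rules above can be sketched as a small table keyed by client. This is an illustration only: the `LeaseTable` class, the integer stateids, and the choice to renew the lease on every open are assumptions, and time is passed in explicitly so the behavior is easy to follow.

```python
class LeaseTable:
    """Per-client lock state guarded by a server-defined lease."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expires_at = {}   # clientid -> time the lease lapses
        self.stateids = {}     # clientid -> {stateid: lock owner}
        self._next_stateid = 1

    def renew(self, clientid, now):
        """Push the client's lease expiry forward by one lease period."""
        self.expires_at[clientid] = now + self.lease_seconds

    def open(self, clientid, pid, now):
        """Record lock state for lock owner (clientid, pid); return a stateid."""
        stateid = self._next_stateid
        self._next_stateid += 1
        self.stateids.setdefault(clientid, {})[stateid] = (clientid, pid)
        self.renew(clientid, now)  # assumption: an open also renews the lease
        return stateid

    def reclaim_expired(self, now):
        """Drop all state of clients whose lease has lapsed; return their ids."""
        expired = sorted(c for c, t in self.expires_at.items() if now > t)
        for clientid in expired:
            del self.expires_at[clientid]
            self.stateids.pop(clientid, None)
        return expired
```

A client that keeps renewing keeps its state. One that goes silent past the lease period loses every stateid it held, without the server ever having to call it back.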
17. locking
- stateid mutating operations are ordered (open, close, lock, locku, open_downgrade)
- lock owner can request a byte range lock and then
  - upgrade the initial lock
  - unlock a sub-range of the initial lock
- server is not required to support sub-range lock semantics
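For a server that does choose to support sub-range semantics, unlocking the middle of a held range splits it in two. A minimal sketch, assuming half-open `(start, end)` byte ranges held by a single lock owner. The `unlock_range` helper is hypothetical.

```python
def unlock_range(held, start, end):
    """Remove [start, end) from a list of half-open (lo, hi) ranges.

    A range that only partly overlaps the unlocked span is trimmed; a range
    that strictly contains it is split into the two surviving pieces.
    """
    remaining = []
    for lo, hi in held:
        if hi <= start or lo >= end:
            remaining.append((lo, hi))      # no overlap, keep as is
            continue
        if lo < start:
            remaining.append((lo, start))   # piece before the unlocked span
        if hi > end:
            remaining.append((end, hi))     # piece after the unlocked span
    return remaining
```

Unlocking bytes 40 to 60 of a lock on 0 to 100 leaves two locks, one on 0 to 40 and one on 60 to 100.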
18. server lock state
- need to associate file, lock, lock owner, lease
- per lock owner lock sequence number
- per file state in hash table
- may move file state into struct file private area
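The per lock owner sequence number is what lets the server order stateid mutating operations and recognize retransmissions. The sketch below assumes the behavior described in the published nfsv4 specification (next sequence number is executed, a repeat of the last one is answered from a cached reply, anything else is rejected). The `LockOwner` class and `handle` function are illustrative names, not the implementation's.

```python
class LockOwner:
    """Server-side record for one lock owner."""

    def __init__(self):
        self.last_seqid = 0        # sequence number of the last executed request
        self.cached_reply = None   # reply to that request, kept for replays


def handle(owner, seqid, operation):
    """Execute, replay, or reject a request based on its sequence number."""
    if seqid == owner.last_seqid + 1:
        # next in order: run it and remember the reply
        owner.cached_reply = operation()
        owner.last_seqid = seqid
        return owner.cached_reply
    if seqid == owner.last_seqid and owner.cached_reply is not None:
        # retransmission of the last request: do not run it twice
        return owner.cached_reply
    return "NFS4ERR_BAD_SEQID"
```

The design point is that a replayed lock request must not take the lock a second time, so the reply is cached per lock owner rather than recomputed.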
19. server lock state
- lock owners in hash table
- server doesn't own the inode
- lock state in linked list off file state
- stateid handle to server lock state
- per client state in hash table - lock lease
20. client lock state
- lock owners in hash table
- per lock owner lock sequence number
- use struct file private data area
- client owns the inode, use private inode data area
21. client lock state
- use inode file_lock struct private data area for byte range lock state
- (eventually) store same locking state as the server for delegated files
- use the super block private data area to store per server state (returned clientid)
22. delegation
- intent is to reduce traffic
- server decides to hand out delegation at open
- if client accepts, client provides callback
- many read delegations, or one write delegation
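The "many read delegations, or one write delegation" rule can be sketched as a per-file check at open time. This toy model simply declines to delegate when the rule would be broken. The `FileDelegations` class is hypothetical, and recalling an outstanding delegation through the client callback is deliberately left out.

```python
class FileDelegations:
    """Delegations currently outstanding for one file."""

    def __init__(self):
        self.readers = set()   # clientids holding read delegations
        self.writer = None     # clientid holding the write delegation, if any

    def offer(self, clientid, want_write):
        """Return "read" or "write" if a delegation is handed out, else None."""
        if self.writer is not None:
            return None                 # a write delegation excludes all others
        if want_write:
            if self.readers:
                return None             # readers exist, cannot delegate write
            self.writer = clientid
            return "write"
        self.readers.add(clientid)
        return "read"
```

Handing out a delegation stays the server's decision. Returning None here models the server opening the file normally without delegating.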
23. delegation
- when a client holds a delegation for a cached file it handles
  - all locking, share and byte range
  - future opens
- client can't reclaim a delegation without a new open
- no delegation for directories
24. server delegation state
- associates delegation with a file
- delegation state in linked list off file state
- stateid separate from the lock stateid
- client call back path
25. linux vfs changes
- shared problem: open with o_excl, described by peter braam
- nfsv4 implements win32 share locks, which require atomic open with create
- linux 2.2.x and linux 2.4 vfs is problematic
26. linux vfs changes
- to create and open a file, three inode operations are called in sequence
  - lookup resolves the last name component
  - create is called to create an inode
  - open is called to open the file
27. xopen
- inherent race condition means no atomicity
- we partially solved this problem
- we added a new inode operation which performs the open system call in one step
- int xopen(struct file *filep, struct inode *dir_i, struct dentry *dentry, int mode)
- if the xopen() inode operation is null, the current two step code is used
- nfsv4 open subsumes lookup, create, open, access
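The dispatch rule above (use xopen when the file system provides it, otherwise fall back to the existing separate calls) can be modeled outside the kernel. The sketch below is only a model: the operations table is a plain dict, directories are dicts, and `open_with_create` and `make_ops` are hypothetical names, not kernel interfaces.

```python
def open_with_create(ops, directory, name):
    """Open a file, creating it if needed, through an operations table."""
    xopen = ops.get("xopen")
    if xopen is not None:
        # one call: the file system can do lookup, create and open atomically
        return xopen(directory, name)
    # fallback: separate calls, with a window between them for races
    if ops["lookup"](directory, name) is None:
        ops["create"](directory, name)
    return ops["open"](directory, name)


def make_ops(trace, with_xopen):
    """Build a toy operations table that records which calls were made."""
    def lookup(directory, name):
        trace.append("lookup")
        return directory.get(name)

    def create(directory, name):
        trace.append("create")
        directory[name] = {"name": name}
        return directory[name]

    def open_(directory, name):
        trace.append("open")
        return directory[name]

    def xopen(directory, name):
        trace.append("xopen")
        return directory.setdefault(name, {"name": name})

    ops = {"lookup": lookup, "create": create, "open": open_}
    if with_xopen:
        ops["xopen"] = xopen
    return ops
```

In the fallback path another process can create the file between lookup and create. The single xopen call is what gives a protocol like nfsv4 room to make the whole sequence one atomic server operation.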
28. user name space
- local file system uses uid/gid
- protocol specifies <user name>@<realm>
- different security can produce different name spaces
29. user name space
- unix user name
- kerberos 5 realm
- pki realm - x500 or dn naming
- gssd resolves <user name>@<realm> to local file system representation
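The translation from a protocol name to a local id can be sketched as a parse followed by a table lookup. The mapping table, the fallback id 65534 (a common convention for an unprivileged "nobody" user), and the function names are assumptions for illustration. How gssd actually stores or derives the mapping is not specified here.

```python
def parse_principal(name):
    """Split "<user name>@<realm>" at the last "@" into (user, realm)."""
    user, sep, realm = name.rpartition("@")
    if not sep or not user or not realm:
        raise ValueError(f"not of the form user@realm: {name!r}")
    return user, realm


def to_local_id(name, table, nobody=65534):
    """Map a protocol name to a local uid, or to `nobody` if unknown."""
    return table.get(parse_principal(name), nobody)
```

Keying the table on the (user, realm) pair rather than the user alone reflects the point above that different security realms can produce different name spaces.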
30. open issues
- local file system choices
- currently ext2
- acl implementation will determine fs for linux 2.4
- kernel additions and changes
- rpc rewrite
- crypto in the kernel
- atomic open
31. next steps
- march 31 - full linux 2.4 implementation, without acls
- june 30 - acls added
- network appliance sponsored nfsv3/v4 linux performance project
32. any questions?
http://www.citi.umich.edu/projects/nfsv4
http://www.nfsv4.org