1
Coda Server Internals
  • Peter J Braam

2
Contents
  • Data structure overview
  • Volumes
  • Vnodes
  • Inodes

3
Data Structure Overview
Object | Purpose | Resides where
Inodes | File contents | /vicep partitions
Volumes, vnodes, directory contents, ACLs, reslogs | Metadata, directory contents | RVM
Volinfo records | Volume location | VLDB, VRDB (RW db files)
VSGDB, Pdb records, tokens | Security | VSGDB, .pdb, .tk files (dynamic RO db files)
Servers/SCM, partitions, startup flags, Skipvolumes, LOG/DATA/DB locators | Configuration data | Static data

4
RVM layout (coda_globals.h)
  • Already_initialized (int)
  • struct VolHead[MAXVOLS]
  • struct VnodeDiskObject SmallVnodeFreeLists[SM_FREESIZE]
  • short SmallVnodeIndex
  • Same for large vnodes
  • MaxVolId (unsigned long)
  • Remainder is dynamically allocated
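To picture how a fixed-size free list and its index might cooperate in the static part of the segment, here is a small Python model. The field names echo the slide, but the sizes, the stack discipline (the index counts cached free vnodes) and the helpers `take_small_vnode`/`return_small_vnode` are assumptions for illustration, not the coda_globals.h definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAXVOLS = 4       # hypothetical; the real constant is much larger
SM_FREESIZE = 8   # hypothetical free-list capacity


@dataclass
class VnodeDiskObject:
    """Stand-in for an rvm_malloc'ed on-disk vnode."""
    vnode_number: int = 0
    uniquifier: int = 0


@dataclass
class RecoverableRoot:
    """Static head of the RVM segment; everything else is allocated dynamically."""
    already_initialized: int = 0
    vol_heads: List[Optional[object]] = field(default_factory=lambda: [None] * MAXVOLS)
    small_vnode_free_list: List[Optional[VnodeDiskObject]] = field(
        default_factory=lambda: [None] * SM_FREESIZE)
    small_vnode_index: int = 0  # assumed: number of cached free small vnodes
    max_vol_id: int = 0

    def take_small_vnode(self) -> VnodeDiskObject:
        # Pop a cached vnode if there is one; otherwise create a fresh one
        # (standing in for rvm_malloc).
        if self.small_vnode_index > 0:
            self.small_vnode_index -= 1
            vn = self.small_vnode_free_list[self.small_vnode_index]
            self.small_vnode_free_list[self.small_vnode_index] = None
            return vn
        return VnodeDiskObject()

    def return_small_vnode(self, vn: VnodeDiskObject) -> bool:
        # Push onto the free list; False means the list is full and a real
        # server would have to free the object instead.
        if self.small_vnode_index >= SM_FREESIZE:
            return False
        self.small_vnode_free_list[self.small_vnode_index] = vn
        self.small_vnode_index += 1
        return True
```

Keeping a cache of preallocated vnodes in a fixed array avoids an allocation on every create, which is presumably why the free lists live at a fixed place in RVM.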

5
Volume zoo (volume.h, camprivate.h)
  • RVM structures
  • VolumeData
  • VolHead
  • VolumeHeader
  • VolumeDiskData
  • VM structures
  • Volume
  • VolumeInfo ..

6
A volume in RVM
  • Contains a pointer to rvm_malloc'ed data

7
VolumeDiskData (rvm)
  • Lots of fields, including:
  • Identity: location, partition, name
  • Runtime info: use, inService, blessed, salvaged
  • Vnode related: next uniquifier
  • Version vector
  • Resolution flags, pointer to recov_vol_log
  • Quota
  • Resource usage: filecount, diskused, etc.

8
Volumes in VM
  • struct Volume objects sit in VolHash, holding copies of the RVM data structures
  • Volumes are salvaged before being attached to VolHash
  • Model of operation (file server):
  • GetVolume: copy out from RVM
  • Make your modifications in VM
  • PutVolume: commit in an RVM transaction
  • Model of operation (Volutil):
  • Operate directly on RVM
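The file server's copy-out / modify / copy-back model can be sketched as follows. `FakeRVM`, `get_volume` and `put_volume` are hypothetical names; the toy transaction simply stages a deep copy, whereas real RVM uses rvm_begin_transaction, rvm_set_range and rvm_end_transaction.

```python
import copy
from typing import Dict


class FakeRVM:
    """Toy recoverable store: changes become visible only when a transaction ends."""

    def __init__(self) -> None:
        self.data: Dict[int, dict] = {}
        self.committed_transactions = 0

    def begin_transaction(self) -> Dict[int, dict]:
        # Work on a private copy so an abandoned transaction leaves RVM untouched.
        return copy.deepcopy(self.data)

    def end_transaction(self, staged: Dict[int, dict]) -> None:
        self.data = staged
        self.committed_transactions += 1


def get_volume(rvm: FakeRVM, vol_id: int) -> dict:
    # GetVolume: copy the RVM image out into VM.
    return copy.deepcopy(rvm.data[vol_id])


def put_volume(rvm: FakeRVM, vol_id: int, vm_copy: dict) -> None:
    # PutVolume: write the VM copy back inside a single RVM transaction.
    staged = rvm.begin_transaction()
    staged[vol_id] = copy.deepcopy(vm_copy)
    rvm.end_transaction(staged)
```

Until put_volume runs, edits exist only in the VM copy, so RVM never sees a half-finished update.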

9
Volumes in Venus RPCs
  • Only one RPC: GetVolInfo
  • Used for mount point traversal
  • Relates only to the:
  • volume location database
  • volume replication database
  • VSGDB
  • Could live in a separate Volume Location Server

10
Vnodes (cvnode.h)
  • Small and large vnodes; large ones are for directories
  • The difference is the ACL at the back of large vnodes
  • Inode field:
  • for small vnodes, the disk file inode number
  • for large vnodes, the RVM address of the directory inode
  • Contain an important small structure: vv_t (version vector)
  • Contain pointers to reslog entries
  • In VM: cvnodes with a hash table, freelists, etc.

11
Vnodes in RVM
  • RVM VnodeDiskinfo (rvm_malloc'ed)
  • Vnodes sit on rec_smolists
  • Each link points to a DiskVnode
  • The lists link vnodes with identical vnode numbers but different uniquifiers
  • New vnodes are taken from the FreeLists (index.cc, recova.cc, recovb.cc, recovc.cc)
  • Volumes have arrays of rec_smolists, which grow when they are full
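A per-volume array indexed by vnode number, where each slot chains the vnodes that share that number but differ in uniquifier, might look like this. Python lists stand in for the rec_smolist linked lists, and the doubling growth policy plus the names `VnodeIndex`, `insert` and `lookup` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiskVnode:
    vnode_number: int
    uniquifier: int


class VnodeIndex:
    """Per-volume array of lists, one slot per vnode number, grown on demand."""

    def __init__(self, size: int = 4) -> None:
        self.lists: List[List[DiskVnode]] = [[] for _ in range(max(1, size))]

    def _grow(self) -> None:
        # Assumed growth policy: double the array when it is too small.
        self.lists.extend([[] for _ in range(len(self.lists))])

    def insert(self, vn: DiskVnode) -> None:
        while vn.vnode_number >= len(self.lists):
            self._grow()
        self.lists[vn.vnode_number].append(vn)

    def lookup(self, vnode_number: int, uniquifier: int) -> Optional[DiskVnode]:
        # Index the array by vnode number, then walk that slot's chain
        # until the uniquifier matches.
        if vnode_number >= len(self.lists):
            return None
        for vn in self.lists[vnode_number]:
            if vn.uniquifier == uniquifier:
                return vn
        return None
```

Reusing a vnode number with a new uniquifier keeps old and new objects distinct, which is why a single slot can hold several entries.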

12
Vnodes in action
  • Model:
  • GetFSObj calls GetVnode
  • work is done
  • PutFSObjects calls:
  • rvm_begin_transaction
  • ReplaceVnode: copies data from VM to RVM
  • rvm_end_transaction
  • Getting a vnode takes 3 pointer dereferences, possibly 3 page faults, vs. 1 for local file systems.
  • Is this necessary? Probably not. Should it be cured? Yes!

13
Directories (rvm)
  • DirInode:
  • page table and copy-on-write refcount
  • DirPages, 2048 bytes each:
  • together make up the directory
  • divided into 64 blobs of 32 bytes
  • Hash table for fast name lookups
  • Blob freelist:
  • array of free blobs per page
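The page arithmetic is 2048 / 32 = 64 blobs per page. The sketch below counts how many blobs one directory entry needs and finds a contiguous run of free blobs on a page. Splitting the first blob into a 16-byte header and 16 name bytes is an assumption, as are the names `name_blobs` and `DirPage`; the name hash table is not modelled.

```python
PAGE_SIZE = 2048
BLOB_SIZE = 32
BLOBS_PER_PAGE = PAGE_SIZE // BLOB_SIZE  # 64 blobs of 32 bytes
FIRST_BLOB_NAME_BYTES = 16  # assumption: 16-byte entry header + 16 name bytes


def name_blobs(name: str) -> int:
    """Blobs needed for one entry, counting the NUL terminator of the name."""
    n = len(name.encode()) + 1
    overflow = max(0, n - FIRST_BLOB_NAME_BYTES)
    return 1 + (overflow + BLOB_SIZE - 1) // BLOB_SIZE


class DirPage:
    def __init__(self) -> None:
        self.used = [False] * BLOBS_PER_PAGE

    def free_blobs(self) -> int:
        return self.used.count(False)

    def alloc(self, count: int) -> int:
        """Mark `count` contiguous free blobs used; return the first index, or -1."""
        if count <= 0 or count > BLOBS_PER_PAGE:
            return -1
        run = 0
        for i, in_use in enumerate(self.used):
            run = 0 if in_use else run + 1
            if run == count:
                start = i - count + 1
                for j in range(start, i + 1):
                    self.used[j] = True
                return start
        return -1
```

A name of up to 15 characters then fits in a single blob, and each further 32 bytes of name costs one more blob.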

14
Directories
  • More than one vnode can point to a directory (copy-on-write)
  • VM hash table of DirHandles, which:
  • point to a contiguous VM copy of the directory
  • point to the DirInode
  • have a lock, etc.
  • Model as for volumes and vnodes
  • Critique: too baroque
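Copy-on-write sharing of a directory between vnodes, driven by the DirInode refcount, can be sketched as below. The dictionary of entries and the helpers `clone_dir` and `add_entry` are hypothetical simplifications of the page-based structure.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DirInode:
    entries: Dict[str, int] = field(default_factory=dict)  # name -> vnode number
    refcount: int = 1


@dataclass
class DirVnode:
    inode: DirInode


def clone_dir(src: DirVnode) -> DirVnode:
    # Share instead of copying: bump the refcount and point at the same DirInode.
    src.inode.refcount += 1
    return DirVnode(src.inode)


def add_entry(vn: DirVnode, name: str, vnode_number: int) -> None:
    # Copy on write: a shared DirInode is duplicated before the first change,
    # so the other sharers keep seeing the old contents.
    if vn.inode.refcount > 1:
        vn.inode.refcount -= 1
        vn.inode = DirInode(dict(vn.inode.entries))
    vn.inode.entries[name] = vnode_number
```

Cloning costs only a refcount update; the price of copying is paid by whichever sharer writes first.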

15
Files
  • The vnode references the file by InodeNumber
  • Files are copy-on-write
  • There are FileInodes, like DirInodes, but they are held in an external DB or in the inode itself
  • The server always reads/writes whole files (this could be exploited)

16
Volinit and salvage
  • Set up volume hash table, serverlist,
    DiskPartitionList
  • Cycle through the partitions and check each one:
  • obtain the list of inodes
  • every inode has a vnode
  • every vnode has a directory name
  • every directory name has a vnode
  • Put each volume in the VM hash table
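The three salvage invariants amount to set comparisons between inodes, vnodes and directory names. This toy checker only reports violations; its inputs, the function name `salvage_check`, and exempting a root vnode numbered 1 (the root has no name in any parent) are all assumptions.

```python
from typing import Dict, List, Set


def salvage_check(inodes: Set[int],
                  vnode_inodes: Dict[int, int],
                  dir_entries: Dict[str, int],
                  root_vnode: int = 1) -> List[str]:
    """Report violations of the salvager's invariants.

    inodes:       inode numbers found on the partition
    vnode_inodes: vnode number -> inode number it references
    dir_entries:  name -> vnode number, across all directories
    root_vnode:   assumed root vnode number, exempt from needing a name
    """
    problems = []
    referenced = set(vnode_inodes.values())
    named = set(dir_entries.values())
    for ino in sorted(inodes - referenced):
        problems.append(f"inode {ino} has no vnode")
    for vn in sorted(set(vnode_inodes) - named - {root_vnode}):
        problems.append(f"vnode {vn} has no directory name")
    for name, vn in sorted(dir_entries.items()):
        if vn not in vnode_inodes:
            problems.append(f"name {name!r} points to missing vnode {vn}")
    return problems
```

A real salvager would also repair what it finds (for example, by freeing orphaned inodes) before the volume is put in the hash table.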

17
Server connection info
  • Array of HostEntry (one per Venus)
  • Contains a linked list of connections
  • Contains a callback connection id
  • Connection setup:
  • the first binding creates a host callback connection
  • each new binding creates a new connection and verifies the callback connection
  • done in RPC2_NewBinding and ViceNewConnectFS
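The connection setup rule can be sketched with a host table keyed by host name (a dict rather than the array on the slide). `HostTable`, `new_binding` and the sequential connection ids are hypothetical; real ids come from RPC2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HostEntry:
    """One entry per Venus (client) host."""
    host: str
    connections: List[int] = field(default_factory=list)
    callback_conn: Optional[int] = None


class HostTable:
    def __init__(self) -> None:
        self.hosts: Dict[str, HostEntry] = {}
        self._next_conn = 1

    def _new_conn_id(self) -> int:
        cid = self._next_conn
        self._next_conn += 1
        return cid

    def new_binding(self, host: str) -> int:
        entry = self.hosts.setdefault(host, HostEntry(host))
        # The first binding from a host also sets up its callback connection;
        # later bindings find it present and only add a regular connection.
        if entry.callback_conn is None:
            entry.callback_conn = self._new_conn_id()
        conn = self._new_conn_id()
        entry.connections.append(conn)
        return conn
```

Each client thus ends up with exactly one callback connection, no matter how many regular connections it binds.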

18
Callbacks
  • Hash table of FileEntries; each contains:
  • Fid
  • number of users
  • linked list of callbacks
  • Callbacks point to a HostEntry
  • Operations:
  • RPC: BreakCallBack
  • Local: placing, delete, deleteVenus
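A minimal model of the callback table: a map from Fid to the hosts holding a callback, with place, break and per-client delete. The class name, the host strings, and the policy that the modifying host keeps its own callback are assumptions; the "number of users" counter is not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Fid = Tuple[int, int, int]  # (volume, vnode, uniquifier)


class CallbackTable:
    def __init__(self) -> None:
        # One FileEntry per Fid: the set of hosts holding a callback on it.
        self.entries: Dict[Fid, Set[str]] = defaultdict(set)

    def place(self, fid: Fid, host: str) -> None:
        self.entries[fid].add(host)

    def break_callback(self, fid: Fid, modifier: str) -> List[str]:
        """Return the hosts that would receive a BreakCallBack RPC.

        Assumed policy: the host that made the change keeps its callback.
        """
        holders = self.entries.get(fid, set())
        victims = sorted(h for h in holders if h != modifier)
        holders.difference_update(victims)
        if not holders:
            self.entries.pop(fid, None)
        return victims

    def delete_venus(self, host: str) -> None:
        # Drop every callback held by one client.
        for fid in list(self.entries):
            self.entries[fid].discard(host)
            if not self.entries[fid]:
                del self.entries[fid]
```

Empty FileEntries are removed eagerly so the table only holds Fids someone still caches.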

19
Callbacks
  • The callback connection is unauthenticated; this should be fixed. The session key for the CB connection should not expire.
  • As a side effect, the callback connection is used for BackFetch: bulk transfer of files during reintegration.

20
RPC processing
  • Venus RPCs
  • srvproc.cc - standard file ops
  • srvproc2.cc - standard volume ops
  • codaproc.cc - repair stuff
  • codaproc2.cc - reintegration stuff
  • Volutil RPCs
  • vol-your-rpc.cc (in coda-src/volutil)
  • Resolution: covered below

21
RPC processing
  • RPC structure
  • ValidateParms: validate parameters, hand off COP2, cid
  • GetObject: VM copy, lock objects
  • CheckSemantics:
  • concurrency, integrity, permissions
  • Perform operations:
  • BulkTransfer, UpdateObjects, OutParms
  • PutObject: RVM transactions, inode deletions
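The phase order can be expressed as a driver that always releases objects. Every name here is hypothetical, and discarding all changes when a semantic check or the operation fails is an assumption about the error path, which the slide does not describe.

```python
from typing import Callable, List


def process_rpc(validate: Callable[[], None],
                get_objects: Callable[[], List[str]],
                check_semantics: Callable[[List[str]], None],
                perform: Callable[[List[str]], None],
                put_objects: Callable[[List[str], bool], None]) -> bool:
    """Run one server RPC through the phases listed on the slide.

    check_semantics or perform may raise to signal an error. put_objects
    always runs, so locks are released, and is told whether to commit.
    Returns True if the operation was committed.
    """
    validate()                 # nothing is locked yet, so errors simply propagate
    objects = get_objects()    # VM copies, locked
    committed = False
    try:
        check_semantics(objects)
        perform(objects)
        committed = True
    except Exception:
        pass  # error path: fall through, release objects without committing
    finally:
        put_objects(objects, committed)
    return committed
```

Centralizing the release in a finally clause keeps the lock discipline in one place instead of in every RPC handler.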

22
vlists
  • GetFSObjects instantiates a vlist
  • The RPC needs a list of objects copied from RVM
  • Modification status is held there (did copy-on-write kick in, etc.)
  • PutObjects:
  • rvm_begin_transaction
  • walk through the list: copy, rvm_set_range, unlock
  • rvm_end_transaction
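The PutObjects walk over a vlist might be modelled as below. `VlistEntry`, `ToyRVM` and `put_objects` are hypothetical; the staged dict plays the role of the transaction, and the range is declared before the write because RVM requires rvm_set_range to precede a modification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Fid = Tuple[int, int, int]


@dataclass
class VlistEntry:
    fid: Fid
    vm_copy: dict
    changed: bool = False  # modification status kept on the vlist
    locked: bool = True


@dataclass
class ToyRVM:
    store: Dict[Fid, dict] = field(default_factory=dict)
    set_ranges: List[Fid] = field(default_factory=list)


def put_objects(rvm: ToyRVM, vlist: List[VlistEntry]) -> None:
    staged: Dict[Fid, dict] = {}             # "rvm_begin_transaction"
    for e in vlist:
        if e.changed:
            rvm.set_ranges.append(e.fid)     # "rvm_set_range", declared before the write
            staged[e.fid] = dict(e.vm_copy)  # copy VM -> RVM
        e.locked = False                     # unlock
    rvm.store.update(staged)                 # "rvm_end_transaction": apply all at once
```

Unchanged objects are only unlocked, so a read-only RPC logs nothing.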

23
COP2 handling
  • In COP2, Venus gives the final VV to the server
  • COP2s are sent out by Venus (with some delay), often piggybacked in bulk
  • The server keeps pending COP2 entries in a hash table (coppend.cc)
  • Manager thread: CopPendingManager
  • Runs every minute
  • Removes entries more than 900 seconds old
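The pending-COP2 table and its periodic manager can be simulated with explicit timestamps. The 60 s period and 900 s limit come from the slide; keying entries by store id, and the names `CopPendingTable` and `run_manager`, are assumptions.

```python
from typing import Dict, Hashable

MANAGER_PERIOD = 60  # the manager runs every minute
MAX_AGE = 900        # entries more than 900 seconds old are removed


class CopPendingTable:
    def __init__(self) -> None:
        self.pending: Dict[Hashable, float] = {}  # assumed key: store id -> enqueue time

    def add(self, storeid: Hashable, now: float) -> None:
        self.pending[storeid] = now

    def dequeue(self, storeid: Hashable) -> bool:
        # Called when the COP2 for this operation arrives.
        return self.pending.pop(storeid, None) is not None

    def expire(self, now: float) -> int:
        """One pass of the manager: drop stale entries and return how many."""
        stale = [k for k, t in self.pending.items() if now - t > MAX_AGE]
        for k in stale:
            del self.pending[k]
        return len(stale)


def run_manager(table: CopPendingTable, start: float, end: float) -> int:
    """Simulate the manager waking every MANAGER_PERIOD seconds in [start, end]."""
    removed = 0
    t = start
    while t <= end:
        removed += table.expire(t)
        t += MANAGER_PERIOD
    return removed
```

Because the manager only wakes once a minute, an entry can outlive the 900-second limit by up to one period.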

24
COP2 to RVM
  • Data can be:
  • piggybacked on another RPC
  • sent in a ViceCop2 RPC
  • Both cases call InternalCop2 (srvproc.cc)
  • InternalCop2 (codaproc.cc):
  • notifies the manager to dequeue
  • gets the FS objects listed for the COP2
  • installs the final VVs into RVM (in an RVM transaction)

25
COP2 Problems
  • An easy cause of conflicts in replicated volumes when clients access objects in rapid succession. (Can be fixed easily during write-back caching operation.)
  • Not optimized for singly replicated volumes.

26
Resolution
  • Initiated by the client with an RPC to the coordinator
  • ViceResolve (codaproc.cc)
  • Coordinator:
  • sets up connections to the VSG (unauthenticated)
  • LockAndFetch (res/reslock, resutil):
  • lock volumes
  • collect the closure

27
Resolution - special cases
  • RegResDirRequired (rvmres/rvmrescoord.cc)
  • checks for:
  • unresolved ancestors
  • objects that are already inconsistent
  • runts (missing objects)
  • weak equality (identical storeid)
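Several of these checks rest on comparing version vectors. Below is the textbook version-vector comparison, plus one assumed reading of weak equality: the vectors differ but the store ids, which identify the last update, match, as might happen when a COP2 was lost. The function names and the storeid tuple shape are hypothetical.

```python
from typing import Hashable, Sequence


def compare_vv(a: Sequence[int], b: Sequence[int]) -> str:
    """Classic version-vector comparison over per-replica update counters."""
    if len(a) != len(b):
        raise ValueError("vectors must have one slot per replica")
    a_ge = all(x >= y for x, y in zip(a, b))
    b_ge = all(y >= x for x, y in zip(a, b))
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "dominates"
    if b_ge:
        return "dominated"
    return "conflict"


def classify(vv_a: Sequence[int], storeid_a: Hashable,
             vv_b: Sequence[int], storeid_b: Hashable) -> str:
    # Assumed weak equality: different vectors, same last update.
    rel = compare_vv(vv_a, vv_b)
    if rel != "equal" and storeid_a == storeid_b:
        return "weak equality"
    return rel
```

Only the "conflict" outcome means that neither replica has seen all of the other's updates.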

28
RecovDirResolve
  • Phase II (rvmres/rescoord.cc, subphase?.cc):
  • the coordinator requests logs from the other servers
  • subordinates lock affected directories and marshall their logs
  • the coordinator merges the logs
  • Phase III:
  • ship the merged log to the subordinates
  • perform operations on VM copies
  • return results to the coordinator

29
Resolution
  • Phase IV (the old Phase 3):
  • collect results, compute new VVs, ship them to the subordinates
  • commit results

30
Comments on resolution
  • Old versions of resolution:
  • OldDirResolve: resolves only runts and weak equality
  • DirResolve: resolves only in VM
  • Remove these
  • The resolve directory has nothing to do with resolution and should be called librepair. The server uses only one function in it; repair uses the rest.

31
Volume Log
  • During FS operations, log entries are created for
    use during resolution
  • Different format per operation (rvmres/recov_vollog.cc)
  • Added to the vlist by SpoolVMLogRecord
  • Put in RVM at commit time

32
Repair
  • Venus makes a ViceRepair RPC
  • File and symlink repair: BulkTransfer the object
  • Directory repair: BulkTransfer the repair file and replay the operations
  • Venus follows this with a COP2 multi-RPC
  • For directory repair, Venus invokes an asynchronous resolve

33
Future
  • Good:
  • The design is simple and efficient
  • There is little C code left; it should be eliminated
  • Easy to multi-thread
  • Bad:
  • Scalability: 8 GB in practice, 40 GB in theory
  • Data handling is bad and tricky to fix
  • The volume code is the worst; rewrite it