Title: The Next Generation Root File Server
- Andrew Hanushevsky
- Stanford Linear Accelerator Center
- 27-September-2004
- http://xrootd.slac.stanford.edu
What is xrootd?
- File Server
- Provides high performance file-based access
- Scalable, extensible, naively usable
- Fault tolerant
- Server failures handled in a natural way
- Servers may be dynamically added and removed
- Secure
- Framework allows use of almost any protocol
- Rootd Compatible
Goals II
- Simplicity
- Can run xrootd out of the box
- No config file needed for uncomplicated/small installations
- Generality
- Can configure xrootd for ultimate performance
- Meant for intermediate to large-scale sites
How is high performance achieved?
- Rich but efficient server protocol
- Combines file serving with P2P elements
- Allows client hints for improved performance
- Pre-read, prepare, and client access processing hints
- Multiplexed request stream
- Multiple parallel requests allowed per client
- An extensible base architecture
- Heavily multi-threaded
- Clients are given dedicated threads whenever possible
- Extensive use of OS I/O features
- Async I/O, device polling, etc.
- Load adaptive reconfiguration.
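The multiplexed request stream mentioned above can be sketched as follows. This is an illustrative Python sketch, not the actual xroot wire protocol: the class and request strings are invented. The key idea is that each request carries a stream id, so many requests can be in flight on one connection and responses can be paired up even when they return out of order.

```python
# Illustrative sketch (not the real xroot protocol): requests share one
# connection, each tagged with a stream id so responses can be matched.
import itertools

class MultiplexedStream:
    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}             # stream id -> request still in flight

    def send(self, request):
        sid = next(self._ids)          # tag the request with a fresh stream id
        self._pending[sid] = request
        return sid

    def complete(self, sid, response):
        request = self._pending.pop(sid)   # pair the response with its request
        return request, response

stream = MultiplexedStream()
a = stream.send("read /data/f1 0 4096")
b = stream.send("stat /data/f2")
# Responses may arrive in any order; the stream id disambiguates them.
req, _ = stream.complete(b, "ok")
assert req == "stat /data/f2"
```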
xrootd Server Architecture
Diagram: the server is a layered stack with a P2P heart.
- xrd: protocol thread manager for the application
- xrootd: protocol layer, implementing the xroot protocol plus authentication
- ofs: filesystem logical layer, with odc and optional authorization
- oss: filesystem physical layer (included in the distribution)
- mss / _fs: filesystem implementation
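The layering above can be illustrated with a small sketch. The class names here are invented for illustration; the real xrootd interfaces are C++ plugin classes, not these. The point is the separation of concerns: the logical layer (the ofs role) wraps the physical layer (the oss role), with authorization as an optional hook.

```python
# Minimal sketch of the layered stack (names invented, not xrootd's C++ API).
class PhysicalStore:                  # plays the oss role
    def __init__(self, files):
        self.files = files
    def read(self, path):
        return self.files[path]

class LogicalFS:                      # plays the ofs role
    def __init__(self, oss, authorize=None):
        self.oss = oss
        self.authorize = authorize    # authorization is optional, as in the diagram
    def fetch(self, user, path):
        if self.authorize is not None and not self.authorize(user, path):
            raise PermissionError(f"{user} may not read {path}")
        return self.oss.read(path)    # delegate the actual I/O downward

fs = LogicalFS(PhysicalStore({"/store/f1": b"payload"}),
               authorize=lambda user, path: user == "alice")
assert fs.fetch("alice", "/store/f1") == b"payload"
```

Because each layer only sees the one below it, an implementation (disk, mss, proxy) can be swapped without touching the protocol layer above.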
Rootd Bilateral Compatibility
Diagram: compatibility runs in both directions.
- Client side: the application's TXNetFile object can talk to either xrootd or rootd; TNetFile continues to talk to rootd.
- Server side: xrootd includes rootd compatibility, so existing TNetFile clients can still be served.
How performant is it?
- Can deliver data at disk speeds (streaming mode)
- Assuming a good network and a proper TCP buffer size
- Low CPU overhead
- 75% less CPU than NFS for the same data load
- It is memory hungry, however.
- General requirements
- Middling speed machine
- The more CPUs the better
- 1-2GB of RAM
How is scalability achieved?
- Protocol allows server scalability
- Server directed I/O segmenting
- Request deferral to pace client load
- Unsolicited responses for ad hoc client steering
- P2P elements for lashing servers together
- Request redirection key element
- Integrated with a P2P control network
- olbd servers provide control information
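The request-redirection element above can be sketched as a simple client loop. Hostnames and response tuples here are invented for illustration, not the xroot protocol encoding: a server may answer with a redirect instead of data, and the client simply retries at the new host, which is how one logical request fans out across many servers.

```python
# Toy redirection loop (hostnames and message shapes invented).
def fetch(servers, host, path, max_hops=8):
    for _ in range(max_hops):
        kind, payload = servers[host](path)
        if kind == "data":
            return payload
        host = payload                 # redirect: retry at the named host
    raise RuntimeError("redirect loop")

servers = {
    "redirector": lambda p: ("redirect", "data03"),
    "data03":     lambda p: ("data", b"bytes of " + p.encode()),
}
assert fetch(servers, "redirector", "/store/f1") == b"bytes of /store/f1"
```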
How does it scale?
- xrootd scales in multiple dimensions
- Can run multiple load-balanced xrootds
- Provides a single uniform name and data space
- Scales from 1 to over 32,000 cooperating data servers
- Architected as self-configuring structured peer-to-peer (SP2) data servers
- Servers can be added and removed at any time
- Client (TXNetFile) understands SP2 configurations
- xrootd informs the client when running in this mode
- Client has more recovery options in the event of failure
Load Balancing Implementation
- Control Interface (olbd)
- Load balancing meta operations
- Find files, change status, forwarded requests
- Data Interface (xrootd)
- Data is provided to clients
- Interfaces to olbd via the ofs layer
- Separation is important
- Allows use of any protocol
- Client need not know the control protocol
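A toy version of the control-side choice can make the separation concrete. The report structure and field names below are invented, not the olbd protocol: among servers reporting that they hold the file, the least loaded wins, and the client only ever sees the resulting redirect from xrootd, never this control traffic.

```python
# Toy olbd-style server selection (report structure invented).
def select_server(reports, path):
    holders = [(r["load"], r["host"]) for r in reports if path in r["files"]]
    return min(holders)[1] if holders else None

reports = [
    {"host": "data01", "load": 0.7, "files": {"/f1"}},
    {"host": "data02", "load": 0.2, "files": {"/f1", "/f2"}},
    {"host": "data03", "load": 0.1, "files": {"/f2"}},
]
assert select_server(reports, "/f1") == "data02"   # least loaded holder of /f1
assert select_server(reports, "/f9") is None       # nobody has it
```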
Entity Relationships
Diagram: two overlaid networks connect the same machines.
- xrootd data network: redirectors steer clients to data; data servers provide the data to data clients.
- olbd control network: manager olbds on the redirectors collect resource information and file locations from server olbds on the data servers.
Typical SP2 Configuration
Diagram: each data server's olbd subscribes to the redirector's olbd; the client contacts the redirector's xrootd, which dynamically selects a data server; one data server fronts an mss.
Example SLAC Configuration
Diagram: client machines are steered by the redirectors (kanolb-a, bbr-olb03, bbr-olb04) to data servers kan01 through kanxx.
Why do this?
- Can transparently and incrementally scale
- Servers can come and go
- Load balancing effects recovery
- New servers can be added at any time
- Servers may be brought down for maintenance
- Files can be moved around in real time
- Clients simply adjust to the new configuration
- The TXNetFile object handles the recovery protocol
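The recovery idea behind the last bullet can be sketched as a retry loop. The logic below is invented for illustration and is not the actual TXNetFile code: on a dead server, the client goes back to the redirector, which now steers it to a server that is still up.

```python
# Sketch of redirect-based recovery (illustrative, not TXNetFile itself).
def read_with_recovery(redirect, read, path, attempts=3):
    last_error = None
    for _ in range(attempts):
        host = redirect(path)          # redirector picks a currently live server
        try:
            return read(host, path)
        except ConnectionError as e:
            last_error = e             # server went away; ask the redirector again
    raise last_error

alive = {"data02"}
hosts = iter(["data01", "data02"])     # data01 was removed mid-run

def read(host, path):
    if host not in alive:
        raise ConnectionError(host)
    return b"ok"

assert read_with_recovery(lambda p: next(hosts), read, "/f1") == b"ok"
```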
What we have seen
- For a single server
- 1,000 simultaneous clients
- 2,200 simultaneous open files
- Bottlenecks
- Disk I/O (memory next)
What have we heard?
- The system is too stable
- Users now run extra-long jobs (1-2 weeks)
- Errors are not discovered until weeks later
- The system is too aggressive
- New servers are immediately taken over
- Easy configuration, but startling for administrators
Next: Getting remote data
Diagram: SLAC, IN2P3, and RAL each sit behind their own firewall; IN2P3 and RAL proxies connect to the xrootds at SLAC. Firewalls require proxy servers.
Proxy Service
- Attempts to address competing goals
- Security
- Deal with firewalls
- Scalability
- Administrative
- Configuration
- Performance
- Ad hoc forwarding for near-zero wait time
- Intelligent caching in local domain
Proxy Implementation
- Uses capabilities of olbd and xrootd
- Simply an extension of local load balancing
- Implemented as a special file system type
- Interfaces in the ofs layer
- Functions in the oss layer
- Primary developer is Heinz Stockinger
Proxy Interactions
Diagram: a client at SLAC is redirected (via red01) to proxy01; the SLAC proxy olb contacts the local olb at RAL through the RAL proxy olb, which locates the file on the RAL data servers (data01 through data04); steps 1 through 5 trace this flow in the figure.
Why This Arrangement?
- Minimizes cross-domain knowledge
- Necessary for scalability in all areas
- Security
- Configuration
- Fault tolerance and recovery
Scalable Proxy Security
Diagram: the SLAC proxy olbd and the RAL proxy olbd sit on either side of the firewall, each with its own data servers.
1. Authenticate and develop a session key
2. Distribute the session key to authenticated subscribers
3. Data servers can log into each other using the session key
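The three steps above can be sketched as follows. The cryptography is reduced to an HMAC for brevity, and the host names and key handling are invented for illustration: once both domains share one session key, any two data servers can verify each other without re-running the full authentication across the firewall.

```python
# Sketch of session-key login (crypto simplified; names invented).
import hashlib, hmac, os

session_key = os.urandom(32)                       # step 1: agreed after authentication
subscribers = {"slac-data01": session_key,         # step 2: key distributed to
               "ral-data01":  session_key}         # authenticated subscribers

def login_token(host, challenge):
    # step 3: a server proves possession of the key without ever sending it
    return hmac.new(subscribers[host], challenge, hashlib.sha256).hexdigest()

challenge = b"nonce-4711"
# Both sides derive the same token, so either can verify the other's login.
assert login_token("slac-data01", challenge) == login_token("ral-data01", challenge)
```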
Proxy Performance
- Introduces minimal latency overhead
- Virtually undetectable from US to Europe
- Negligible on faster links
- 2% slower on fast US/US links
- 10% slower on a LAN
- Can be further improved
- Parallel streams
- Better window size calculation
- Asynchronous I/O
Conclusion
- xrootd provides high performance file access
- Unique performance, usability, scalability, security, compatibility, and recoverability characteristics
- Should scale to tens of thousands of clients
- Can support tens of thousands of servers
- Distributed as part of the CERN ROOT package
- Open software, supported by
- SLAC (server) and INFN-Padova (client)