Title: xrootd
1 xrootd
- Andrew Hanushevsky
- Stanford Linear Accelerator Center
- 5-April-05
- http://xrootd.slac.stanford.edu
2 Outline
- Problem Statement
- Goals
- Achieving Goals
- Necessary Scalable Architecture
- Other Necessary Elements
3 The Three-Legged Stool
- Scalable High Performance Data Access
- Non-trivial task
- Lots of data, clients, servers, etc.
- Requires Coordinated Approach
- Grid
- Mostly for bulk transfer and replication
- Disk Cache Management
- Mostly for MSS Services
- Data Delivery
- High speed access to actual data
- Together, the three legs support Analysis
4 Data Delivery Goals
- High Performance File-Based Access
- Scalable, extensible, usable
- Fault tolerance
- Servers may be dynamically added and removed
- Flexible Security
- Allowing use of almost any protocol
- Simplicity
- Can run xrootd out of the box
- No config file needed for uncomplicated/small installations
- Generality
- Can configure xrootd for ultimate performance
- Meant for intermediate to large-scale sites
- Rootd Compatibility
5 How is high performance achieved?
- Rich but efficient server protocol
- Combines file serving with P2P elements
- Allows client hints for improved performance
- Pre-read, prepare, client access processing hints
- Multiplexed request stream
- Multiple parallel requests allowed per client
- An extensible base architecture
- Heavily multi-threaded
- Clients are given dedicated threads whenever possible
- Extensive use of OS I/O features
- Async I/O, memory-mapped I/O, device polling, etc.
- Load-adaptive reconfiguration
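The multiplexed request stream idea can be sketched as follows. The header layout here (2-byte stream id, 2-byte opcode, 4-byte payload length) is purely illustrative, not the actual xrootd wire format; tagging each request with a stream id is what lets responses return out of order over one socket.

```python
# Illustrative sketch of request multiplexing over a single connection.
# Header layout is hypothetical, NOT the real xrootd protocol.
import struct

def pack_request(stream_id: int, opcode: int, payload: bytes) -> bytes:
    # 2-byte stream id, 2-byte opcode, 4-byte length, then the payload.
    return struct.pack(">HHI", stream_id, opcode, len(payload)) + payload

def unpack_request(data: bytes):
    # Parse the fixed 8-byte header, then slice out the payload.
    sid, op, length = struct.unpack(">HHI", data[:8])
    return sid, op, data[8:8 + length]

req = pack_request(7, 3011, b"/store/file.root")
sid, op, payload = unpack_request(req)
```

Because every response carries its originating stream id back, a client can keep many requests in flight per connection, which is the "multiple parallel requests allowed per client" point above.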
6 How performant is it?
- Can deliver data at disk speeds (streaming mode)
- Assuming a good network, properly sized TCP buffers, and asynchronous I/O
- Low CPU overhead
- 75% less CPU than NFS for the same data load
- General requirements
- Speed of machine should be matched to NIC
- Rule of thumb: 1MHz per 1Mb/s (minimum)
- The more CPUs the better
- 1GB of RAM
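The sizing rule of thumb reduces to simple arithmetic: CPU megahertz should at least match NIC megabits per second. The helper name below is hypothetical, used only to make the arithmetic concrete.

```python
# Back-of-envelope server sizing from the "1MHz per 1Mb/s" rule of thumb.
def min_cpu_mhz(nic_mbps: float) -> float:
    # 1 MHz of CPU per 1 Mb/s of NIC bandwidth (stated minimum).
    return nic_mbps * 1.0

# A gigabit NIC (1000 Mb/s) therefore calls for at least ~1 GHz of CPU.
print(min_cpu_mhz(1000))
```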
7 Device Speed Delivery (xrootd)
8 Low Latency Per Request (xrootd)
9 Scalable Performance (xrootd)
10 The Good, the Bad, and the Ugly
11 xrootd Server Scaling
- Linear scaling relative to load
- Allows deterministic sizing of server
- Disk
- NIC
- CPU
- Memory
12 Scaling in the next dimension
- xrootd servers can be clustered
- Increase access points and available data
- Cluster overhead scales linearly
- Allows deterministic sizing of cluster
- In fact, the cluster becomes more efficient as it grows
- Supports massive clusters
- Architected for over 262,144 servers/cluster
- Clusters self-organize
- 280 nodes self-cluster in about 7 seconds
13 Acronyms, Entities, Relationships
[Diagram] xrootd data network: redirectors steer clients to the data; data servers provide the data. olbd control network: managers, supervisors, and servers exchange resource information and file locations.
14 Cluster Architecture
- A manager is an optionally replicated xrootd/olbd pair functioning as a root node
- A server is an xrootd/olbd pair leaf node that delivers data
- A cell is 1 to 64 entities (servers or cells) clustered around a cell manager called a supervisor
- Up to 64 servers or cells can connect to a manager
15 Cell Cluster Equivalence
- Cell clusters are equivalent to B-64 trees
- One logical root node
- Multiple intermediate nodes
- Leaf nodes are data servers
- Scales as 64^n, where n is the height of the tree
- Scales very quickly (64^2 = 4,096; 64^3 = 262,144)
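The 64^n scaling is easy to verify: with a fan-out of 64 at every level, a tree of height 2 already reaches 4,096 data servers and height 3 reaches 262,144, matching the cluster architecture limit cited earlier. The function name below is illustrative, not part of xrootd.

```python
# Leaf capacity of a B-64 cell tree: each manager or supervisor fans out
# to at most 64 children, so capacity grows as fanout ** height.
def max_servers(height: int, fanout: int = 64) -> int:
    return fanout ** height

print(max_servers(2))  # height 2: 4096 servers
print(max_servers(3))  # height 3: 262144 servers
```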
16 Example SLAC Configuration
[Diagram] Data servers kan01, kan02, kan03, kan04, ..., kanxx; olbd nodes kanolb-a, bbr-olb03, bbr-olb04; client machines.
17 Bottom-Heavy System
- The olbd/xrootd architecture is bottom-heavy
- Major decisions occur at leaf nodes
- Allows super-rapid client dispersal
- Ideal scaling characteristics
18 Marginal Redirection Overhead
[Chart] Server cache search overhead, Linux vs. Solaris.
19 Other Necessary Items
- Fault Tolerance
- Proxy Service
- Integrated Security
- Application Server Monitoring
- Mass Storage System Support
- Grid Support
20 Fault Tolerance
- Servers and resources may come and go
- Uses control network (olbd) to effect recovery
- New servers can be added at any time for any reason
- Files can be moved around in real time
- Clients simply adjust to the new configuration
- TXNetFile object handles recovery protocol
- Future: provide a Protocol UnResp interface
- Can be used to perform reactive client scheduling
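The client-side recovery idea behind TXNetFile can be sketched as follows. All class and method names here are hypothetical stand-ins, not the real client API; the point is simply that on failure the client returns to the redirector, which steers it to a surviving server.

```python
# Hedged sketch of redirect-and-retry fault tolerance. FakeServer and
# FakeRedirector are illustrative stand-ins, NOT the real xrootd client.
class FakeServer:
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive
    def open(self, path):
        if not self.alive:
            raise ConnectionError(self.name)  # simulate a dead data server
        return f"{self.name}:{path}"

class FakeRedirector:
    def __init__(self, servers):
        self.servers = list(servers)
    def locate(self, path):
        return self.servers[0]       # steer client to a known server
    def report_down(self, server):
        self.servers.remove(server)  # drop the dead node from the pool

def open_with_recovery(redirector, path, max_retries=3):
    # On each failure, ask the redirector again; it adjusts to the new
    # configuration, so the client simply retries transparently.
    for _ in range(max_retries):
        server = redirector.locate(path)
        try:
            return server.open(path)
        except ConnectionError:
            redirector.report_down(server)
    raise ConnectionError(f"could not open {path}")

handle = open_with_recovery(
    FakeRedirector([FakeServer("a", alive=False), FakeServer("b")]),
    "/store/file.root")
```

The first server is down, so the sketch falls through to the second; the application never sees the failure, which is the "clients simply adjust" behavior described above.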
21 Levels of Fault Tolerance
[Diagram] Fault tolerance at each level of the tree:
- Manager (root node): fully replicated xrootd/olbd pairs
- Supervisor (intermediate node): hot spares
- Data server (leaf node): data replication, restaging, proxy search
22 Proxy Support
[Diagram] Client machines (INDRA) reach a proxy xrootd through a firewall; the proxy contacts the olbd and data servers data01-data04 (IN2P3).
23 Monitoring
- xrootd provides selectable event traces
- Login, file open, close, disconnect
- Application markers and client information
- All read/write requests
- Highly scalable architecture used
- Complete toolkit for utilizing data
- Critical for tuning applications
- Jacek Becla (SLAC) is working on this
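Consuming such an event trace amounts to aggregating per-event records. The record format below (event, client, path, bytes) is hypothetical, invented for illustration; it is not xrootd's actual monitoring stream layout.

```python
# Hedged sketch of summarizing a monitoring event trace.
# The record schema here is hypothetical, not xrootd's real format.
from collections import defaultdict

def summarize(events):
    """Total bytes read per file across all clients in a trace."""
    totals = defaultdict(int)
    for ev in events:
        if ev["event"] == "read":
            totals[ev["path"]] += ev["bytes"]
    return dict(totals)

trace = [
    {"event": "open",  "client": "c1", "path": "/a.root", "bytes": 0},
    {"event": "read",  "client": "c1", "path": "/a.root", "bytes": 4096},
    {"event": "read",  "client": "c2", "path": "/a.root", "bytes": 1024},
    {"event": "close", "client": "c1", "path": "/a.root", "bytes": 0},
]
```

Aggregations like this (per-file read volume, per-client request mix) are the kind of data that makes the trace useful for tuning applications.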
24 MSS Support
- Lightweight agnostic interfaces provided
- oss.mssgwcmd command
- Invoked for each create, dirlist, mv, rm, stat
- oss.stagecmd command
- Long running command, request stream protocol
- Used to populate disk cache (i.e., stage-in)
[Diagram] The xrootd oss layer invokes mssgwcmd and stagecmd to reach the MSS.
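Wiring a site script into these hooks might look like the configuration fragment below. The directive names (oss.mssgwcmd, oss.stagecmd) come from the slide, but the script paths are hypothetical placeholders.

```
# Hypothetical xrootd oss-layer config fragment (paths are placeholders).
# Gateway command: invoked per create, dirlist, mv, rm, stat.
oss.mssgwcmd /opt/site/bin/mssgw.sh
# Long-running stager: populates the disk cache (stage-in).
oss.stagecmd /opt/site/bin/stagein.sh
```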
25 Leaf Node SRM
- The MSS interface is an ideal spot for an SRM hook
- Use existing hooks or a new long-running hook
- mssgwcmd, stagecmd
- oss.srm command
- Processes external disk cache management requests
- Should scale quite well
[Diagram] Grid -> srm -> xrootd (oss layer) -> MSS
26 Root Node SRM
- Team the olbd with SRM
- File management and discovery
- Tight management control
- Several issues need to be considered
- Introduces many new failure modes
- Will not generally scale
[Diagram] Grid -> srm -> olbd (root node) -> MSS
27 SRM Status
- Two sites are looking at providing an SRM layer
- BNL
- IN2P3
- Unfortunately, the SRM interface is in flux
- Heavy vs. light protocol
- Will work with the LBL team to speed the effort
28 Conclusion
- xrootd provides high-performance file access
- Unique performance, usability, scalability, security, compatibility, and recoverability characteristics
- One server can easily support over 600 parallel clients
- New software architecture
- Challenges
- Maintain scaling while interfacing to external systems
- Opportunities
- Can provide data access at the LHC scale