Title: Scalla Update
1. Scalla Update
- Andrew Hanushevsky
- Stanford Linear Accelerator Center
- Stanford University
- 25-June-2007
- HPDC DMG Workshop
- http://xrootd.slac.stanford.edu
2. Outline
- Introduction
- Design points
- Architecture
- Clustering
- The critical protocol element
- Capitalizing on Scalla Features
- Solving some vexing grid-related problems
- Conclusion
3. What is Scalla?
- Structured Cluster Architecture for Low Latency Access
- Low latency access to data via xrootd servers
- POSIX-style byte-level random access
- Hierarchical directory-like name space of arbitrary files
- Does not have full file system semantics
- This is not a general purpose data management solution
- Protocol includes high performance scalability features
- Structured clustering provided by olbd servers
- Exponentially scalable and self organizing
4. General Design Points
- High speed access to experimental data
- Write once read many times processing mode
- Small block sparse random access (e.g., root files)
- High transaction rate with rapid request dispersal (fast opens)
- Low setup cost
- High efficiency data server (low CPU/byte overhead, small memory footprint)
- Very simple configuration requirements
- No 3rd party software needed (avoids messy dependencies)
- Low administration cost
- Non-assisted fault-tolerance
- Self-organizing servers remove need for configuration changes
- No database requirements (no backup/recovery issues)
- Wide usability
- Full POSIX access
- Server clustering for scalability
- Plug-in architecture and event notification for applicability (HPSS, Castor, etc.)
5. xrootd Plugin Architecture
[Diagram: the xrootd plug-in stack, built on the protocol driver (Xrd); many ways to accommodate other systems.]
6. Architectural Significance
- Plug-in Architecture Plus Events
- Easy to integrate other systems
- Orthogonal Design
- Uniform client view irrespective of server function
- Easy to integrate distributed services
- System scaling always done in the same way
- Plug-in Multi-Protocol Security Model
- Permits real-time protocol conversion
- System Can Be Engineered For Scalability
- Generic clustering plays a significant role
7. Quick Note on Clustering
- xrootd servers can be clustered
- Increase access points and available data
- Allows for automatic failover
- Structured point-to-point connections
- Cluster overhead (human and non-human) scales linearly
- Cluster size is not limited
- I/O performance is not affected
- Always pairs xrootd and olbd servers
- Data handled by xrootd, cluster management by olbd
- Symmetric cookie-cutter arrangement (a no-brainer)
- Architecture can be used in very novel ways
- E.g., cascaded caching for single point files (ask me)
- Redirection Protocol is Central
[Diagram: each cluster node pairs an xrootd data server with an olbd cluster-management daemon.]
8. File Request Routing
[Diagram: file request routing in a cluster of up to 64 data servers, with no external database. The client sends open(/a/b/c) to the manager (head node/redirector). The manager asks the data servers "Who has /a/b/c?"; server C answers "I have", and the manager replies "go to C". The client then issues open(/a/b/c) directly to server C. Managers cache the next hop to the file, so a 2nd open is redirected immediately. The client sees all servers as xrootd data servers.]
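To make the routing concrete, here is a toy Python simulation of the redirect-and-cache behavior just described; the class names and in-memory tables are invented for illustration and are not the actual xrootd/olbd implementation.

```python
# Toy simulation of file request routing; class names and tables are
# invented for illustration, not the actual xrootd/olbd code.
class DataServer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

class Manager:
    """Head node / redirector: polls servers once, caches the next hop."""
    def __init__(self, servers):
        self.servers = servers
        self.next_hop = {}                      # path -> server name

    def open(self, path):
        if path in self.next_hop:               # 2nd open: no polling
            return f"go to {self.next_hop[path]}"
        for s in self.servers:                  # "Who has /a/b/c?"
            if path in s.files:                 # "I have"
                self.next_hop[path] = s.name
                return f"go to {s.name}"
        return "file not found"

mgr = Manager([DataServer("A", []), DataServer("B", []),
               DataServer("C", ["/a/b/c"])])
print(mgr.open("/a/b/c"))   # polls the servers, then -> "go to C"
print(mgr.open("/a/b/c"))   # answered straight from the next-hop cache
```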
9. Two Level Routing
[Diagram: two-level routing scales the cluster to 64×64 (4,096) servers. The client sends open(/a/b/c) to the manager (head node/redirector), which asks "Who has /a/b/c?". Supervisor C (a sub-redirector) answers "I have" on behalf of its subtree, and the manager replies "go to C". The client reissues open(/a/b/c) at C, which asks its own data servers; server F answers "I have", C replies "go to F", and the client opens the file at F. The client sees all servers as xrootd data servers.]
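The same idea composes recursively: a supervisor is just a redirector that its manager treats as a data server. The standalone toy below illustrates this (names and tables are again invented, not the real implementation).

```python
# Standalone toy of two-level routing; names and tables are invented.
# A supervisor is just a redirector that its manager treats as a data
# server, so the scheme composes: 64 x 64 = 4,096 servers at two levels.
class Node:
    def __init__(self, name, files=(), children=()):
        self.name = name
        self.children = list(children)
        self._files = set(files)

    def files(self):
        # A redirector claims every file its subtree holds ("I have").
        return self._files | set().union(*(c.files() for c in self.children))

    def open(self, path):
        for c in self.children:           # "Who has /a/b/c?"
            if path in c.files():
                return f"go to {c.name}"  # client re-opens at c
        return "file not found"

f = Node("F", files=["/a/b/c"])
sup = Node("C", children=[Node("D"), Node("E"), f])    # supervisor
mgr = Node("M", children=[Node("A"), Node("B"), sup])  # manager
print(mgr.open("/a/b/c"))   # -> "go to C" (the supervisor)
print(sup.open("/a/b/c"))   # -> "go to F" (the data server)
```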
10. Significance Of This Approach
- Uniform Redirection Across All Servers
- Natural distributed request routing
- No need for central control
- Scales in much the same way as the internet
- Only immediate paths to the data are relevant, not the location
- Integration and distribution of disparate services
- Client is unaware of the underlying model
- Critical for distributed analysis using stored code
- Natural fit for the grid
- Distributed resources in multiple administrative domains
11. Capitalizing on Scalla Features
- Addressing Some Vexing Grid Problems
- GSI overhead
- Data Access
- Firewalls
- SRM
- Transfer overhead
- Network
- Bookkeeping
- Scalla building blocks are fundamental elements
- Many solutions are constructed in the same way
12. GSI Issues
- GSI Authentication is Resource Intensive
- Significant CPU and administrative resources
- Process occurs on each server
- Well Known Solution
- Perform authentication once and convert protocol
- Example: GSI to Kerberos conversion
- Elementary Feature of Scalla Design
- Allows each site to choose local mechanism
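A toy sketch of the sign-once pattern (the token format, shared key, and function names are invented; this is not the actual Scalla security plug-in): the first server pays for the expensive GSI handshake once, then issues a cheaply verifiable signed token that every other server in the cluster accepts.

```python
# Toy sketch of authenticate-once-and-convert. The key, token format,
# and function names are invented, not Scalla's actual API.
import hashlib
import hmac
import time

SHARED_KEY = b"cluster-wide-secret"   # assumption: servers share a key

def issue_token(user, lifetime=3600):
    """Done once, after the expensive GSI handshake succeeds."""
    expiry = int(time.time()) + lifetime
    msg = f"{user}:{expiry}".encode()
    sig = hmac.new(SHARED_KEY, msg, hashlib.sha1).hexdigest()
    return f"{user}:{expiry}:{sig}"

def verify_token(token):
    """Done by every data server: a cheap HMAC check, no GSI involved."""
    user, expiry, sig = token.rsplit(":", 2)
    msg = f"{user}:{expiry}".encode()
    good = hmac.new(SHARED_KEY, msg, hashlib.sha1).hexdigest()
    return hmac.compare_digest(sig, good) and time.time() < int(expiry)

t = issue_token("alice")
print(verify_token(t))   # True until the signature expires, after which
                         # the client is redirected back for a new one
```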
13. Speeding GSI Authentication
[Diagram: the 1st point of contact is a specialized xroot server with a GSI-to-SSI plug-in. It performs the GSI handshake, returns a signed cert, and redirects the client to a standard xrootd cluster whose servers accept SSI authentication at subsequent points of contact. The client sees all servers as xrootd data servers and can be redirected back to the 1st point of contact when the signature expires.]
14. Firewall Issues
- Scalla Architected as a Peer-to-Peer Model
- A server can act as a client
- Provides Built-In Proxy Support
- Can bridge firewalls
- Scalla clients also support SOCKS4 protocol
- Elementary Feature of Scalla Design
- Allows each site to choose its own security policy
15. Vaulting Firewalls
[Diagram: the 1st point of contact is a specialized xroot server with a proxy plug-in sitting outside the firewall; subsequent data access passes through it to a standard xrootd cluster behind the firewall. The client sees all servers as xrootd data servers.]
16. Grid FTP Issues
- Scalla Integrates With Other Data Transports
- Using the POSIX Preload Library
- Rich emulation avoids application modification
- Example: GSIftp
- Elementary Feature of Scalla Design
- Allows fast and easy deployment
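A minimal deployment sketch, assuming a hypothetical library path and server command line: the POSIX preload library is interposed via LD_PRELOAD, so the unmodified transport's open/read/write calls are served by the xrootd cluster instead of the local filesystem.

```python
# Launch an unmodified data transport (here a hypothetical GSIftp server
# command) with the xrootd POSIX preload library interposed. The library
# path and the command line below are assumptions for illustration.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/opt/xrootd/lib/libXrdPosixPreload.so"  # assumed path

# The rich POSIX emulation means the server itself needs no modification.
subprocess.run(["gsiftp-server", "-p", "2811"], env=env)  # assumed command
```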
17. Providing Grid FTP Access
[Diagram: the 1st point of contact is a standard GSIftp server running with the preload library; subsequent data access passes through the firewall to a standard xrootd cluster. The FTP servers can be firewalled and replicated for scaling.]
18. SRM Issues
- Data Access via SRM Falls Out
- Requires a trivial SRM
- Only need a closed SURL-to-TURL rewriting mechanism
- Thanks to Wei Yang for this insight
- Some Caveats
- Requires changes to existing SRMs
- Simple if URL rewriting were a standard plug-in
- Plan to have StoRM and LBL SRM versions available
- Many SRM functions become no-ops
- Generally not needed for basic point-to-point transfers
- Typical for smaller sites (i.e., tier 2 and smaller)
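A hedged sketch of what such a closed SURL-to-TURL rewriter amounts to; the mapping table mirrors the srmhost/ftphost example on the next slide and is an assumption, not the planned StoRM/LBL implementation.

```python
# Minimal closed SURL-to-TURL rewriter (an illustrative sketch, not an
# actual SRM). The single rule mirrors the deck's srmhost/ftphost example.
REWRITE = {"srm://srmhost": "gsiftp://ftphost"}   # closed, site-local table

def surl_to_turl(surl):
    for prefix, target in REWRITE.items():
        if surl.startswith(prefix):
            return target + surl[len(prefix):]
    raise ValueError(f"no rewrite rule for {surl}")

print(surl_to_turl("srm://srmhost/a/b/c"))   # -> gsiftp://ftphost/a/b/c
```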
19. Providing SRM Access
[Diagram: SRM access is a simple interposed add-on. The client contacts srmhost (the SRM), which rewrites the SURL to the TURL gsiftp://ftphost/a/b/c; ftphost is a standard GSIftp server running the preload library, and subsequent data access flows to the standard xrootd cluster.]
20. A Scalla Storage Element (SE)
[Diagram: a complete Scalla SE. Clients (Linux, MacOS, Solaris, Windows) contact managers, which route them to data servers; gridFTP, SRM, and proxy front ends are optional. All servers, including gridFTP, SRM, and proxy, can be replicated/clustered within the Scalla framework for scaling and fault tolerance.]
21. Data Transport Issues
- Enormous effort spent on bulk transfer
- Requires significant SE resource near CEs
- Impossible to capitalize on opportunistic resources
- Can result in large wasted network bandwidth
- Unless most of the data is used multiple times
- Still have the missing file problem
- Requires significant bookkeeping effort
- Large job startup delays until all of the required data arrives
- Bulk transfer originated from a historical view of the WAN
- Too high latency, unstable, and unpredictable for real-time access
- Large unused, relatively cheap network capacity for bulk transfers
- Much of this is no longer true
- It's time to reconsider these beliefs
22. WAN Real Time Access?
- When RTT/p < CPU/event, where p is the number of pre-fetched events
- Real-time WAN access becomes equivalent to LAN access
- Some assumptions here
- Pre-fetching is possible
- Analysis framework structured to be asynchronous
- Firewall problems addressed
- For instance, using proxy servers
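A worked example with assumed numbers: at a 100 ms round-trip time and 10 ms of CPU per event, a pre-fetch depth of p = 10 already drives RTT/p down to the per-event CPU time.

```python
# Assumed numbers: how deep must pre-fetching be before the WAN
# behaves like the LAN (RTT/p < CPU/event)?
rtt = 0.100            # round-trip time in seconds (assumed)
cpu_per_event = 0.010  # CPU time per event in seconds (assumed)

p_min = rtt / cpu_per_event
print(f"pre-fetch depth needed: p > {p_min:.0f} events")  # -> p > 10
```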
23. Workflow In a WAN Model
- Bulk transfer only long-lived useful data
- Need a way to identify this
- Start jobs the moment enough data is present
- Any missing files can be found on the net
- LAN access to high use / high density files
- WAN access to everything else
- Locally missing files
- Low use or low density files
- Initiate background bulk transfer when appropriate
- Switch to local copy when finally present
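A hedged, self-contained sketch of this workflow; every class and method below is an invented placeholder, not a Scalla API.

```python
# Sketch of the WAN workflow: read locally when possible, fall back to
# WAN access, and migrate useful files in the background.
class LocalSE:
    def __init__(self, files):
        self.files = set(files)
    def has(self, path):
        return path in self.files
    def open(self, path):
        return f"LAN read of {path}"
    def fetch_in_background(self, path):
        self.files.add(path)  # stub: a real bulk transfer completes later

class Federation:
    def open(self, path):
        return f"WAN read of {path}"  # real-time access via peer clusters

def open_for_analysis(path, se, fed):
    if se.has(path):
        return se.open(path)          # LAN: high-use / high-density files
    se.fetch_in_background(path)      # bulk transfer when appropriate;
                                      # the job switches to the copy later
    return fed.open(path)             # meanwhile, read it over the WAN

se, fed = LocalSE(["/hot/file"]), Federation()
print(open_for_analysis("/hot/file", se, fed))   # -> LAN read
print(open_for_analysis("/cold/file", se, fed))  # -> WAN read, copy queued
```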
24. Scalla Supports WAN Models
- Native latency reduction protocol elements
- Asynchronous pre-fetch
- Maximizes overlap between client CPU and network transfers
- Request pipelining
- Vastly reduces request/response latency (see the sketch below)
- Vectored reads and writes
- Allows multi-file and multi-offset access with one request
- Client-scheduled parallel streams
- Removes the server from second-guessing the application
- Integrated proxy server clusters
- Firewalls addressed in a scalable way
- Federated peer clusters
- Allows real-time search for files on the WAN
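A back-of-envelope illustration (all numbers assumed) of why pipelining and vectored requests matter: n small dependent reads cost n round trips, while pipelined or vectored requests cost roughly one.

```python
# Assumed numbers: latency-only cost of n small sparse reads,
# issued serially versus pipelined (or as one vectored request).
rtt = 0.100    # WAN round-trip time in seconds (assumed)
n_reads = 50   # small sparse reads needed by the application (assumed)

serial = n_reads * rtt   # request, wait a full RTT, request again, ...
pipelined = rtt          # all requests in flight at once, or a single
                         # vectored read carrying every file/offset pair
print(f"serial: {serial:.1f}s   pipelined/vectored: {pipelined:.1f}s")
# -> serial: 5.0s   pipelined/vectored: 0.1s
```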
25. A WAN Data Access Model
Sites Federated As Independent Peer Clusters
- Independent Tiered Resource Sites
- Cross-share data when necessary
- Local SE unavailable or file is missing
26. Conclusion
- Many ways to build a Grid Storage Element (SE)
- Choice depends on what needs to be accomplished
- Lightweight, simple solutions often work the best
- This is especially relevant to smaller or highly distributed sites
- WAN-cognizant architectures should be considered
- Effort needs to be spent on making analysis WAN-compatible
- This may be the best way to scale production LHC analysis
- Data analysis presents the most difficult challenge
- The system must withstand 1,000s of simultaneous requests
- Must be lightning fast within significant financial constraints
27. Acknowledgements
- Software Collaborators
- INFN/Padova: Fabrizio Furano (client-side), Alvise Dorigo
- Root: Fons Rademakers, Gerri Ganis (security), Bertrand Bellenet (Windows)
- Alice: Derek Feichtinger, Guenter Kickinger, Andreas Peters
- STAR/BNL: Pavel Jackl
- Cornell: Gregory Sharp
- SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger
- Operational collaborators
- BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
- Funding
- US Department of Energy
- Contract DE-AC02-76SF00515 with Stanford University