Title: Building Large Scale Fabrics: A Summary
1. Building Large Scale Fabrics: A Summary
Marcel Kunze, FZK
2. Observation
- Everybody seems to need unprecedented amounts of CPU, disk, and network bandwidth
- Trend towards PC-based computing fabrics and commodity hardware:
  - LCG (CERN), L. Robertson
  - CDF (Fermilab), M. Neubauer
  - D0 (Fermilab), I. Terekhov
  - Belle (KEK), P. Krokovny
  - HERA-B (DESY), J. Hernandez
  - LIGO, P. Shawhan
  - Virgo, D. Busculic
  - AMS, A. Klimentov
- Considerable savings in cost w.r.t. RISC-based farms ("Not enough bang for the buck", M. Neubauer)
3. AMS02 Benchmarks
- Execution time of the AMS standard job compared to the CPU clock 1) (benchmark plot)
1) V. Choutko, A. Klimentov, AMS Note 2001-11-01
4. Fabrics and Networks: Commodity Equipment Needed for LHC at CERN in 2006
- Storage: raw recording rate of 0.1-1 GB/s, accumulating 5-8 PetaBytes/year, 10 PetaBytes of disk (rough sanity check below)
- Processing: about 200,000 of today's (2001) fastest PCs
- Networks: 5-10 Gbps between main Grid nodes
- Distributed computing effort to avoid congestion: 1/3 at CERN, 2/3 elsewhere
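To make the storage figures concrete, here is a back-of-the-envelope check in Python. The ~1e7 seconds of effective data taking per year is my own assumption (a common rule of thumb for accelerator experiments), not a number quoted in the talk.

```python
# Back-of-the-envelope check of the quoted LHC storage numbers.
# Assumption (not from the talk): ~1e7 seconds of effective data
# taking per year.

SECONDS_PER_YEAR = 1e7          # assumed live time per year
PETABYTE = 1e15                 # bytes

for rate_gb_per_s in (0.1, 0.5, 1.0):
    bytes_per_year = rate_gb_per_s * 1e9 * SECONDS_PER_YEAR
    print(f"{rate_gb_per_s:4.1f} GB/s -> {bytes_per_year / PETABYTE:5.1f} PB/year")

# Output:
#  0.1 GB/s ->   1.0 PB/year
#  0.5 GB/s ->   5.0 PB/year
#  1.0 GB/s ->  10.0 PB/year
# i.e. a sustained average of roughly 0.5-0.8 GB/s fits the quoted 5-8 PB/year.
```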
5. PC Cluster 5 (Belle)
- 1U servers, Pentium III 1.2 GHz, 256 CPUs (128 nodes)
6. PC Cluster 6
- 3U blade servers, LP Pentium III 700 MHz, 40 CPUs (40 nodes)
7. Disk Storage
8. IDE Performance
9. Basic Questions
- Compute farms contain several thousands of computing elements
- Storage farms contain thousands of disk drives
- How to build scalable systems?
- How to build reliable systems?
- How to operate and maintain large fabrics?
- How to recover from errors?
- EDG deals with the issue (P. Kunszt)
- IBM deals with the issue (N. Zheleznykh)
  - Project Eliza: self-healing clusters
- Several ideas and tools are already on the market
10. Storage Scalability
- Difficult to scale up to systems of thousands of components and keep a single system image (NFS automounter, symbolic links, etc.)
- (M. Neubauer: CAF's rootd does not need this and allows direct worldwide access to distributed files without mounts; see the sketch after this list)
- Scalability in size and throughput by means of storage virtualisation
- Allows setting up non-TCP/IP based systems to handle multi-GB/s
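As an illustration of mount-free remote access in the ROOT ecosystem, the sketch below opens a file through a root:// URL, the way rootd-served files are addressed. The host and path are placeholders, and ROOT's Python bindings (PyROOT) are assumed to be available; CAF itself does not necessarily use this exact interface.

```python
# Minimal sketch of mount-free remote file access via ROOT.
# Assumptions: ROOT with Python bindings is installed, and a rootd/xrootd
# daemon serves the (hypothetical) host and path below.
import ROOT

# A root:// URL is resolved by ROOT itself; no NFS mount or automounter
# is involved on the client side.
f = ROOT.TFile.Open("root://dataserver.example.org//data/run123/events.root")

if f and not f.IsZombie():
    f.ls()        # list the objects stored in the remote file
    f.Close()
else:
    print("could not open remote file")
```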
11. Virtualisation of Storage
(Diagram: data servers mount the virtual storage as a SCSI device; input traffic passes through a load-balancing switch; shared data access via Oracle or PROOF; Storage Area Network based on FC-AL, InfiniBand, etc.; 200 MB/s sustained; scalability)
12. Storage Elements (M. Gasthuber)
- PNFS: Perfectly Normal FileSystem
  - Stores metadata with the data
  - 8 hierarchies of file tags (see the sketch after this list)
  - Migration of data (hierarchical storage systems)
- dCache: a development of DESY and Fermilab
  - ACLs, Kerberos, ROOT-aware
  - Web monitoring
  - Cached as well as direct tape access
  - Fail-safe
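PNFS exposes its per-directory metadata through "magic" file names inside the mounted tree; the fragment below reads directory tags that way. The mount point and the tag name "sGroup" are examples only, and an NFS-mounted PNFS namespace is assumed.

```python
# Sketch: reading PNFS directory tags through the magic ".(...)" file names.
# Assumptions: a PNFS namespace is NFS-mounted under the example path below;
# the tag name "sGroup" is only a commonly used example.
import os

pnfs_dir = "/pnfs/example.org/data/myexperiment"   # hypothetical mount point

# ".(tags)()" lists the tags defined for the directory, one per line.
with open(os.path.join(pnfs_dir, ".(tags)()")) as f:
    tag_files = [line.strip() for line in f if line.strip()]
print("tags:", tag_files)

# ".(tag)(<name>)" returns the value of a single tag.
with open(os.path.join(pnfs_dir, ".(tag)(sGroup)")) as f:
    print("sGroup =", f.read().strip())
```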
13. Necessary Admin Tools (A. Manabe)
- System (SW) installation/update
  - Dolly (image cloning)
- Configuration
  - Arusha (http://ark.sourceforge.net)
  - LCFGng (http://www.lcfg.org)
- Status monitoring / system health check
  - CPU/memory/disk/network utilization: Ganglia 1), Palantir 2) (a minimal query is sketched below)
  - (Sub-)system service sanity check: PIKT 3), Pica 4), cfengine
- Command execution
  - WANI: web-based remote command executor
1) http://ganglia.sourceforge.net
2) http://www.netsonde.com
3) http://pikt.org
4) http://pica.sourceforge.net/wtf.html
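As a taste of how such monitoring data can be collected programmatically, the sketch below pulls the cluster-state XML that a Ganglia gmond daemon publishes to any TCP connection on its default port 8649. The host name is a placeholder and the default port is assumed unchanged.

```python
# Sketch: fetching cluster metrics from a Ganglia gmond daemon.
# gmond answers any TCP connection on its default port 8649 with an
# XML dump of the cluster state.  The host name below is a placeholder.
import socket
import xml.etree.ElementTree as ET

def fetch_gmond_xml(host, port=8649):
    chunks = []
    with socket.create_connection((host, port), timeout=10) as s:
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

xml_dump = fetch_gmond_xml("monitor-node.example.org")
root = ET.fromstring(xml_dump)

# Print one metric (1-minute load) per host as a trivial health overview.
for host in root.iter("HOST"):
    for metric in host.iter("METRIC"):
        if metric.get("NAME") == "load_one":
            print(host.get("NAME"), "load_one =", metric.get("VAL"))
```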
14. WANI is implemented on top of the Webmin GUI
(Screenshot: start page with command input and node selection)
15. Command Execution Result
(Screenshot: results from 200 nodes on one page, listed by host name; a minimal command fan-out is sketched below)
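WANI itself is a Webmin module, but the core idea of fanning a command out to many nodes and collecting the results on one page can be sketched in a few lines. The node names and the use of passwordless ssh are assumptions for illustration, not part of WANI.

```python
# Sketch of the idea behind a remote command executor like WANI:
# run one command on many nodes in parallel and collect the results.
# Node names and passwordless ssh are illustrative assumptions.
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = [f"node{i:03d}.example.org" for i in range(1, 201)]   # hypothetical farm

def run_on(node, command):
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", node, command],
            capture_output=True, text=True, timeout=30,
        )
        return node, proc.returncode, proc.stdout.strip()
    except Exception as exc:              # unreachable node, timeout, ...
        return node, -1, str(exc)

with ThreadPoolExecutor(max_workers=32) as pool:
    results = pool.map(lambda n: run_on(n, "uptime"), NODES)

# One line per host: the equivalent of "results from 200 nodes on one page".
for node, rc, out in results:
    status = "OK" if rc == 0 else f"rc={rc}"
    print(f"{node:24s} {status:6s} {out}")
```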
16. (No transcript)
17. CPU Scalability
- The current tools scale up to about 1000 CPUs (in the previous example, 10,000 CPUs would require checking 50 pages)
- Autonomous operation is required
  - Intelligent self-healing clusters (a toy sketch follows below)
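To make "self-healing" concrete, here is a deliberately tiny sketch of the pattern such tools automate: probe a service and restart it automatically instead of waiting for an operator. The probed port and the restart command are placeholders, not anything taken from Project Eliza or EDG.

```python
# Toy sketch of the self-healing pattern: probe a service, restart it
# automatically if the probe fails.  The port and restart command are
# placeholders; real tools (cfengine, Project Eliza, ...) add policy,
# escalation and logging on top of this loop.
import socket
import subprocess
import time

SERVICE_PORT = 8649                              # e.g. a monitoring daemon
RESTART_CMD = ["/etc/init.d/gmond", "restart"]   # hypothetical init script

def service_alive(port, host="localhost"):
    try:
        with socket.create_connection((host, port), timeout=5):
            return True
    except OSError:
        return False

while True:
    if not service_alive(SERVICE_PORT):
        print("service down, attempting automatic restart")
        subprocess.run(RESTART_CMD, check=False)
    time.sleep(60)                               # re-check once a minute
```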
18. Resource Scheduling
- Problem: how to access local resources from the Grid?
- Local batch queues vs. global batch queues (see the sketch after this list)
- Extension of Dynamite (University of Amsterdam) to work with Globus: Dynamite-G (I. Shoshmina)
- Open question: how do we deal with interactive applications on the Grid?
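The local-versus-global distinction mostly comes down to which front end a job goes through. The sketch below contrasts a local PBS submission with a submission via the Globus Toolkit command-line client; the script path, the gatekeeper contact string, and the choice of PBS are assumptions for illustration and are not part of Dynamite-G.

```python
# Sketch: the same payload submitted to a local batch queue (PBS "qsub")
# versus through the Grid middleware (Globus "globus-job-submit").
# The script path, gatekeeper contact string and choice of PBS are
# illustrative assumptions.
import subprocess

PAYLOAD = "/home/user/jobs/reco.sh"                   # hypothetical job script
GATEKEEPER = "gatekeeper.example.org/jobmanager-pbs"  # hypothetical contact string

def submit_local(script):
    """Submit directly to the site's own batch queue."""
    return subprocess.run(["qsub", script], capture_output=True, text=True)

def submit_grid(contact, script):
    """Submit through the Globus gatekeeper, which forwards to the local queue."""
    return subprocess.run(["globus-job-submit", contact, script],
                          capture_output=True, text=True)

print(submit_local(PAYLOAD).stdout)             # local queue: prints a PBS job id
print(submit_grid(GATEKEEPER, PAYLOAD).stdout)  # grid: prints a job contact URL
```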
19. Conclusions
- A lot of tools already exist
- A lot of work still needs to be done in the fabric area in order to get reliable, scalable systems