Title: PROOF and Condor
1 PROOF and Condor
- Fons Rademakers
- http://root.cern.ch
2 PROOF - Parallel ROOT Facility
- Collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group
- Part of and based on the ROOT framework
- Makes heavy use of ROOT networking and other infrastructure classes
3 Main Motivation
- Design a system for the interactive analysis of very large sets of ROOT data files on a cluster of computers
- The main idea is to speed up query processing by employing parallelism
- In the GRID context, this model will be extended from a local cluster to a wide-area virtual cluster. The emphasis in that case is not so much on interactive response as on transparency
- With a single query, a user can analyze a globally distributed data set and get back a single result
- The main design goals are
- Transparency, scalability, adaptability
4 Main Features
- The PROOF system allows
- Parallel analysis of trees in a set of files
- Parallel analysis of objects in a set of files
- Parallel execution of scripts
- on a cluster of heterogeneous machines
5 Parallel Chain Analysis
[Diagram: a local PC running ROOT connects to a remote PROOF cluster whose proof.conf lists slave node1, node2, node3 and node4, each holding .root data files and the ana.C script. The session on the local PC is: root[0] tree.Process("ana.C") (local), root[1] gROOT->Proof("remote"), root[2] chain.Process("ana.C") (parallel on the cluster); a fuller sketch of such a session follows.]
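The commands in the diagram can be put together into a short interactive session. The lines below are a minimal sketch only: the cluster name "remote", the tree name "AOD" and the file locations are placeholders, not values from this talk.

  // Minimal parallel chain analysis (illustrative names only)
  root[0] TChain chain("AOD")                // tree name inside the files
  root[1] chain.Add("/data/run1/*.root")     // placeholder file locations
  root[2] chain.Add("/data/run2/*.root")
  root[3] gROOT->Proof("remote")             // open the PROOF session
  root[4] chain.Process("ana.C")             // script runs in parallel on the slaves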
6 PROOF - Architecture
- Data Access Strategies
- Local data first, also rootd, rfio, dCache, SAN/NAS
- Transparency
- Input objects copied from client
- Output objects merged, returned to client (a merge sketch follows this list)
- Scalability and Adaptability
- Vary packet size (specific workload, slave performance, dynamic load)
- Heterogeneous Servers
- Support for multi-site configurations
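As an illustration of the output-merging step, a simplified merge on the master could look like the sketch below. It assumes the slaves return histograms with identical names and the function name is hypothetical; real PROOF handles arbitrary objects.

  #include "TList.h"
  #include "TH1.h"

  // Merge one slave's output list into the master's accumulated list.
  void MergeOutputs(TList *merged, TList *slaveOutput)
  {
     TIter next(slaveOutput);
     while (TObject *obj = next()) {
        TH1 *h = (TH1 *) merged->FindObject(obj->GetName());
        if (h)
           h->Add((TH1 *) obj);        // accumulate into the existing histogram
        else
           merged->Add(obj->Clone());  // first contribution from any slave
     }
  }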
7 Workflow for Tree Analysis - Pull Architecture
[Diagram: pull architecture for tree analysis. The client calls Process("ana.C"); the master initializes its packet generator and forwards Process("ana.C") to slaves 1..N. Each slave calls GetNextPacket() on the master and is answered with an entry range, e.g. (0,100), (100,100), (200,100), (300,40), (340,100), (440,50), (490,100), (590,60), processing each packet before asking for the next. When the packets are exhausted the slaves call SendObject(histo); the master adds the histograms and waits for the next command, and the client displays the merged histograms. A conceptual sketch of the pull loop follows.]
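The pull loop above can be summarized in a few lines. This is a conceptual sketch with hypothetical class names, not the actual PROOF packetizer, which also adapts the packet size to slave performance and load (hence the varying packet sizes in the diagram).

  #include <cstdio>

  struct Packet { long first, num; };      // range of tree entries

  // Master-side packet generator: hands out the next unprocessed range.
  class PacketGenerator {
     long fNext, fTotal, fSize;
  public:
     PacketGenerator(long total, long size) : fNext(0), fTotal(total), fSize(size) {}
     bool GetNextPacket(Packet &p) {
        if (fNext >= fTotal) return false;        // no more work
        p.first = fNext;
        p.num   = (fTotal - fNext < fSize) ? fTotal - fNext : fSize;
        fNext  += p.num;
        return true;
     }
  };

  int main() {
     PacketGenerator gen(650, 100);               // 650 entries, fixed packet size
     Packet p;
     // Each slave would call GetNextPacket() remotely; here one loop stands
     // in for all slaves pulling work until the generator is exhausted.
     while (gen.GetNextPacket(p))
        std::printf("process entries (%ld,%ld)\n", p.first, p.num);
     return 0;
  }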
8 Data Access Strategies
- Each slave gets assigned, as much as possible, packets representing data in local files
- If there is no (more) local data, get remote data via rootd, rfiod or dCache (needs a good LAN, like Gb ethernet)
- In case of SAN/NAS just use a round-robin strategy (sketched below)
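A rough sketch of this assignment policy, with hypothetical types and ignoring packet granularity:

  #include <cstddef>
  #include <string>
  #include <vector>

  struct FileSlot { std::string file, host; bool done; };

  // Pick the next file for 'slave': local data first, otherwise the first
  // remaining file (served remotely via rootd/rfiod/dCache, or plain round
  // robin when the storage is shared SAN/NAS). Returns -1 when nothing is left.
  int NextFile(std::vector<FileSlot> &files, const std::string &slave, bool sharedStorage)
  {
     int fallback = -1;
     for (std::size_t i = 0; i < files.size(); ++i) {
        if (files[i].done) continue;
        if (fallback < 0) fallback = (int)i;
        if (!sharedStorage && files[i].host == slave)
           return (int)i;                          // local data first
     }
     return fallback;
  }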
9 Additional Issues
- Error handling
- Death of master and/or slaves
- Ctrl-C interrupt
- Authentication
- Globus, ssh, kerb5, SRP, clear passwd, uid/gid matching
- Sandbox and package manager
- Remote user environment
10 Running a PROOF Job
- Specify a collection of TTrees or files with objects

  root[0]  gROOT->Proof("cluster.cern.ch")
  root[1]  TDSet *set = new TDSet("TTree", "AOD")
  root[2]  set->AddQuery("lfn:/alice/simulation/2003-04", "V0.6.root")
  root[10] set->Print("a")
  root[11] set->Process("mySelector.C")

- Returned by DB or File Catalog query etc.
- Use logical filenames (lfn)
11 The Selector
- Basic ROOT TSelector
- Created via TTree::MakeSelector()

  // Abbreviated version
  class TSelector : public TObject {
  protected:
     TList *fInput;
     TList *fOutput;
  public:
     void   Init(TTree *);
     void   Begin(TTree *);
     void   SlaveBegin(TTree *);
     Bool_t Process(int entry);
     void   SlaveTerminate();
     void   Terminate();
  };
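For example, a concrete selector derived from this interface could fill one histogram per slave and add it to fOutput so the master can merge it. This is a hypothetical sketch: the class name and the leaf "px" are made up for illustration, and it follows the abbreviated interface shown above.

  #include "TSelector.h"
  #include "TTree.h"
  #include "TLeaf.h"
  #include "TH1F.h"

  class MySelector : public TSelector {
     TTree *fTree;   // tree/chain being processed on this slave
     TH1F  *fHpx;    // per-slave histogram, merged by the master
  public:
     MySelector() : fTree(0), fHpx(0) {}
     void   Init(TTree *tree) { fTree = tree; }
     void   SlaveBegin(TTree *) {
        fHpx = new TH1F("hpx", "px distribution", 100, -5, 5);
        fOutput->Add(fHpx);                    // output goes back to the master
     }
     Bool_t Process(int entry) {
        fTree->GetEntry(entry);
        fHpx->Fill(fTree->GetLeaf("px")->GetValue());
        return kTRUE;
     }
     void   Terminate() { /* runs on the client: draw the merged "hpx" */ }
  };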
12 PROOF Scalability
8.8 GB in 128 files: 1 node takes 325 s, 32 nodes in parallel take 12 s, a speedup of roughly 27 (about 85% parallel efficiency).
Setup: 32 nodes with dual Itanium II 1 GHz CPUs, 2 GB RAM, 2x75 GB 15K SCSI disks and Fast Ethernet each. Each node holds one copy of the data set (4 files, 277 MB in total); the 32 nodes together hold 8.8 GB in 128 files, 9 million events.
13 PROOF and Data Grids
- Many services are a good fit
- Authentication
- File Catalog, replication services
- Resource brokers
- Job schedulers
- Monitoring
- => Use abstract interfaces
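One way to read "use abstract interfaces": PROOF programs against small abstract base classes, and a given Grid project plugs its concrete services in behind them. The class below is purely illustrative, not an existing ROOT or Grid API.

  #include <string>
  #include <vector>

  // Hypothetical abstract catalog interface: PROOF only sees this class,
  // while each Grid middleware provides its own concrete implementation,
  // selected at run time via configuration or a plugin mechanism.
  class TVirtualCatalog {
  public:
     virtual ~TVirtualCatalog() {}
     // Resolve a logical file name into the physical replicas it maps to.
     virtual std::vector<std::string> Resolve(const std::string &lfn) = 0;
  };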
14 The Condor Batch System
- Full-featured batch system
- Job queuing, scheduling policy, priority scheme, resource monitoring and management
- Flexible, distributed architecture
- Dedicated clusters and/or idle desktops
- Transparent I/O and file transfer
- Based on 15 years of advanced research
- Platform for ongoing CS research
- Production quality, in use around the world, pools with 100s to 1000s of nodes
- See http://www.cs.wisc.edu/condor
15 COD - Computing On Demand
- Active, ongoing research and development
- Share batch resources with interactive use
- Most of the time normal Condor batch use
- An interactive job borrows the resource for a short time
- Integrated into the Condor infrastructure
- Benefits
- Large amount of resources for interactive bursts
- Efficient use of resources (100% use)
16 COD - Operations
17 PROOF and COD
- Integrate PROOF and Condor COD
- Great cooperation with the Condor team
- Master starts slaves as COD jobs
- Standard connection from master to slave
- Master resumes and suspends slaves as needed around queries (a control-flow sketch follows this list)
- Use Condor or an external resource manager to allocate nodes (VMs)
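A very rough sketch of that query-time control flow. All types and method names here are hypothetical stand-ins; the real coupling goes through the Condor COD mechanism rather than these calls.

  #include <cstddef>
  #include <vector>

  // Stand-in for a PROOF slave running inside a COD claim.
  struct CodSlave {
     void Resume();                      // corresponds to resuming the COD job
     void Suspend();                     // corresponds to suspending it again
     void Process(const char *selector); // normal PROOF packet processing
  };

  // Borrow the batch nodes only for the duration of one query.
  void RunQuery(std::vector<CodSlave> &slaves, const char *selector)
  {
     for (std::size_t i = 0; i < slaves.size(); ++i) slaves[i].Resume();
     for (std::size_t i = 0; i < slaves.size(); ++i) slaves[i].Process(selector);
     for (std::size_t i = 0; i < slaves.size(); ++i) slaves[i].Suspend();
  }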
18 PROOF and COD
[Diagram: the client's PROOF session uses slaves running as COD jobs on Condor nodes, alongside the normal Condor batch jobs on the same machines.]
19 PROOF and COD Status
- Status
- Basic implementation finished
- Successfully demonstrated at SC03 with 45 slaves as part of PEAC
- TODO
- Further improve the interface between PROOF and COD
- Implement resource accounting
20 PEAC - PROOF Enabled Analysis Cluster
- Complete event analysis solution
- Data catalog and data management
- Resource broker
- PROOF
- Components used: SAM catalog, dCache, new global resource broker, Condor/COD, PROOF
- Multiple computing sites with independent storage systems
21 PEAC System Overview
22 PEAC Status
- Successful demo at SC03
- Four sites, up to 25 nodes
- Real CDF StNtuple-based analysis
- COD tested with 45 slaves
- Doing a post mortem and planning the next design and implementation phases
- Available manpower will determine the time line
- Plan to use a 250-node cluster at FNAL
- Other cluster at UCSD
23 Conclusions
- PROOF is maturing
- Lots of interest from experiments with large data sets
- COD is essential to share batch and interactive work on the same cluster
- Maximizes resource utilization
- PROOF turns out to be a powerful application to use and show the power of Grid middleware to its full extent
- See tomorrow's talk by Andreas Peters on PROOF and AliEn