Title: Grid Enabled Analysis for Particle Physics
1Grid Enabled Analysis for Particle Physics
- Julian Bunn, Caltech
- (Conrad Steenberg, Frank van Lingen, Michael Thomas)
- Visit of Lt. General Syed Shujaat Hussain (NUST)
- November 2003
2Particle Physics Computing Challenges
Petabytes, Petaflops, Global Collaborations
- Geographical dispersion: People and resources
- Complexity: The detector and the LHC environment
- Scale: Tens of Petabytes per year of data
5000 Physicists, 250 Institutes, 60 Countries
New Forms of Distributed Systems: Data Grids
3Data Grid Hierarchy
CERN/Outside Resource Ratio 1:2; Tier0/(Σ Tier1)/(Σ Tier2) 1:1:1
- Experiment to Online System: PByte/sec
- Online System to Tier 0+1: 100-1500 MBytes/sec
- Tier 0+1: CERN, 1M SI95, 1 EB Disk, Tape Robot
- Tier 0+1 to Tier 1: 10-100 Gbps
- Tier 1: FNAL (200k SI95, 600 TB), IN2P3 Center, INFN Center, RAL Center
- Tier 1 to Tier 2: 10 Gbps
- Tier 2 to Tier 3: 10 Gbps
- Tier 3: Institutes (1 TIPS), with a physics data cache
- Tier 3 to Tier 4: 0.1-10 Gbps
- Tier 4: Workstations
Physicists work on analysis channels. Each institute has 10 physicists working on one or more channels.
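To give a feel for the link speeds quoted for the hierarchy, the following is a minimal back-of-the-envelope sketch. It assumes an ideal link used at its full nominal rate with no protocol overhead, and a decimal petabyte (10^15 bytes); the function name is chosen only for illustration.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link of link_gbps.

    Assumes the link is fully available and ignores protocol overhead.
    """
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


PETABYTE = 1e15  # decimal petabyte, in bytes (an assumption of this sketch)

if __name__ == "__main__":
    # Link speeds taken from the range quoted on the slide.
    for gbps in (0.1, 10, 100):
        days = transfer_time_seconds(PETABYTE, gbps) / 86400
        print(f"1 PB over {gbps} Gbps: {days:.1f} days")
```

Under these assumptions a single petabyte needs about nine days on a dedicated 10 Gbps link, which is why tens of petabytes per year cannot simply be copied to every site.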
4Caltech Tier2 (at CACR)
Caltech Tier2 User Community: There are currently 57 registered users from Caltech (HEP, CACR, CS), CERN, UFL, UC Davis, UCR, UCSD, UCLA, UWM, FNAL, ANL and Romania.
Caltech Tier2 Uses, of 3 Clusters:
- PG: Participating in CMS PreChallenge Production
- IGT: US CMS simulation, reco and analysis; Calib
- DGT: Develop and test CMS grid software
Cluster 1: Production Grid (PG)
- 1 x 4U front-end node with dual P4 Xeon 2.4 GHz CPUs, 2 GB, 1.5 TB disk
- 22 x 1U compute nodes with dual P4 Xeon 2.8/2.4 GHz CPUs, 1 GB
- 1.5 TB storage
Cluster 2: Integration Grid Testbed (IGT)
- 1 x 7U dual P3 1 GHz front-end server with 2 GB
- 20 x 2U dual P3 800 MHz compute nodes with 512 MB
- 1 x 3 TB RAID array
Cluster 3: Development Grid Testbed (DGT)
- 5-node AMD Athlon-based cluster
- 2 x 1U dual P4 Xeon 2.4 GHz/1 GB MonALISA servers
- 1 x 1U dual P4 Xeon 2.4 GHz/1 GB network server
- 1 x Sun Ultra-250 network data server
Gigabit connection to public and private networks
5COJAC: CMS ORCA Java Analysis Component
Java3D, Objectivity, JNI, Web Services
- Object Collections
- Multiplatform, Light Client
- Interface to OODBMS
Demonstrated between Caltech, Rio de Janeiro and Chile in 2002
6Grid Enabled Analysis Architecture
[Diagram]
- Clients: ROOT, Laptop, Browser, PDA, Desktop
- Servers: Clarens, Peer Group, Super Peer Group
- Services: DAGs, CMS Apps, File Transfer, MonALISA, Others
- Legend: From GriPhyN, iVDGL, Globus etc.; Caltech/CMS Developments
7GAE - Grid Analysis Environment
- The development of a physics analysis environment that integrates with the Grid systems is where the real Grid Challenge lies
- To be used by a large diverse community
- 100s - 1000s of tasks with different technical demands
- Needs priorities
- Needs security
- How much automation is possible? How much is desirable?
- GAE is a key to success or failure for physics Grid applications
- Where the physics gets done
- Where the Grid End-to-End Services and the Grid Application Software Layers get built
- Where we learn how to collaborate remotely to do physics
8GAE Tools: Clarens
- Our emphasis is on accommodating existing analysis tools in our CAIGEE architecture
- To facilitate this, we use the Clarens Dataserver
- Clarens is server software that makes datasets and services available to clients in a suitable lingua franca
- Clients initially Grid-authenticate with a Clarens server, and then are able to make use of a wide set of data and analysis services on offer
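The authenticate-then-call pattern described for Clarens can be sketched roughly as below. This is not the Clarens API: the class `ToyDataServer`, its method names, and the service names `catalog.list` and `echo.echo` are all hypothetical, and a plain DN string stands in for real certificate-based Grid authentication.

```python
import secrets


class ToyDataServer:
    """Hypothetical stand-in for a Clarens-like server (not the real API)."""

    def __init__(self, accepted_dns):
        self.accepted_dns = set(accepted_dns)
        self.sessions = {}  # session token -> client DN
        # Services exposed to authenticated clients (names are invented).
        self.services = {
            "catalog.list": lambda: ["dataset_a", "dataset_b"],
            "echo.echo": lambda text: text,
        }

    def authenticate(self, dn):
        """Return a session token if the DN is accepted, else raise."""
        if dn not in self.accepted_dns:
            raise PermissionError(f"unknown client: {dn}")
        token = secrets.token_hex(16)
        self.sessions[token] = dn
        return token

    def call(self, token, method, *args):
        """Dispatch a service call for an already authenticated session."""
        if token not in self.sessions:
            raise PermissionError("not authenticated")
        if method not in self.services:
            raise KeyError(f"no such service: {method}")
        return self.services[method](*args)


if __name__ == "__main__":
    server = ToyDataServer(accepted_dns=["/O=Example/CN=Alice"])
    session = server.authenticate("/O=Example/CN=Alice")
    print(server.call(session, "catalog.list"))
```

The point of the sketch is only the ordering: one authentication step up front, after which any offered service is reachable through the same session.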
9Interconnected System
- Try to make sense of the Alphabet soup
- Service/functionality oriented view
- Providers
- Clients
- Both
- Middleware/information providers
10Connecting components
- Client/server view based on resource abundance
- Middleware/IP helps organize resources in a resource-scarce environment
- G, OGS, Tomcat, MonALISA, web server
- Metadata catalogs: some missing
- Needed: make a (more) consistent and unified environment without resorting to X scripting languages as glue
- Interact with network-enabled components in or from the most sensible environment for the task
- Not only client/server, but between all components
11Enable higher level services
- Reduce the development impedance for higher level services to function properly
- E.g. MonALISA uses modules to monitor using 'ping', SNMP, Ganglia etc., but provides aggregated information using a single remote API (SOAP)
- Reduce manual interaction
- Counter-example: VO management
- Obtain certificates from X CAs via LDAP
- Store in VO LDAP server, create VO
- Extract structure using a different tool, using a config file to create a new config file (gridmap) used by middleware (Globus gatekeeper)
- Site admins must maintain separate copies of gridmap files for different clusters/servers
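The manual gridmap step in the counter-example could, in principle, be generated from a single VO membership table. The sketch below assumes the conventional grid-mapfile line format (a quoted DN followed by a local account name); the group names, DNs, site names and account names are invented for illustration and do not come from the slides.

```python
def make_gridmap(vo_groups, account_for_group):
    """Render grid-mapfile text: one '"DN" account' line per user.

    A DN that appears in several groups is mapped once, to the account
    of the first group in sorted order (a choice made for this sketch).
    """
    lines = []
    seen = set()
    for group in sorted(vo_groups):
        account = account_for_group[group]
        for dn in sorted(vo_groups[group]):
            if dn in seen:
                continue
            seen.add(dn)
            lines.append(f'"{dn}" {account}')
    return "\n".join(lines) + "\n"


def gridmaps_for_all_sites(vo_groups, site_accounts):
    """Build one gridmap per cluster/server from the same VO data."""
    return {
        site: make_gridmap(vo_groups, accounts)
        for site, accounts in site_accounts.items()
    }


# Hypothetical VO membership and per-site account mappings.
VO_GROUPS = {
    "cms.analysis": ["/O=Example/CN=Alice", "/O=Example/CN=Bob"],
    "cms.production": ["/O=Example/CN=Bob", "/O=Example/CN=Carol"],
}
SITE_ACCOUNTS = {
    "tier2-pg": {"cms.analysis": "cmsuser", "cms.production": "cmsprod"},
    "tier2-igt": {"cms.analysis": "igtuser", "cms.production": "igtprod"},
}

if __name__ == "__main__":
    for site, text in gridmaps_for_all_sites(VO_GROUPS, SITE_ACCOUNTS).items():
        print(f"# {site}")
        print(text, end="")
```

Generating every site's file from one source is one way to avoid the separately maintained copies that the slide complains about.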
12Security and Virtual Organization
- Authentication via X.509 certificates
- Verifies certificate chain up to a list of accepted Certificate Authority certificates
- Client identified internally by the certificate distinguished name (DN); uniqueness ensured by CA
- Authorization done using an internal VO
- VO consists of a hierarchy of groups and users
- Does not need to store client certificates, uses DNs
- VO data stored in DB
[Diagram]
- SUPER ADMIN GROUP (DN1, DN2, ...): specified in server setup file; can create groups; can add users to admin group
- ADMIN GROUP: part of GROUP N; can add users to GROUP N; can add users to admin group
- GROUP N: Member DN1, Member DN2, ...
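The group and admin rules on this slide can be modelled with a small data structure. The following is a sketch under stated assumptions, not the server's actual implementation: the class and method names are hypothetical, and the rule that admin rights on a parent group extend to its child groups is an assumption about how the hierarchy behaves.

```python
class VirtualOrganization:
    """Toy VO: groups keyed by name, users identified only by their DN."""

    def __init__(self, super_admin_dns):
        # In the slides, super admins come from the server setup file.
        self.super_admins = set(super_admin_dns)
        self.groups = {}  # name -> {"parent", "admins", "members"}

    def is_admin(self, dn, group):
        """Super admins, or admins of the group or of any ancestor group."""
        if dn in self.super_admins:
            return True
        while group is not None:
            if dn in self.groups[group]["admins"]:
                return True
            group = self.groups[group]["parent"]
        return False

    def create_group(self, actor, name, parent=None):
        if actor not in self.super_admins:
            raise PermissionError("only super admins may create groups")
        if parent is not None and parent not in self.groups:
            raise KeyError(f"unknown parent group: {parent}")
        self.groups[name] = {"parent": parent, "admins": set(), "members": set()}

    def add_admin(self, actor, group, dn):
        if not self.is_admin(actor, group):
            raise PermissionError("actor is not an admin of this group")
        self.groups[group]["admins"].add(dn)

    def add_member(self, actor, group, dn):
        if not self.is_admin(actor, group):
            raise PermissionError("actor is not an admin of this group")
        self.groups[group]["members"].add(dn)

    def is_member(self, dn, group):
        return dn in self.groups[group]["members"]


if __name__ == "__main__":
    vo = VirtualOrganization(super_admin_dns=["/O=Example/CN=Root"])
    vo.create_group("/O=Example/CN=Root", "cms")
    vo.add_admin("/O=Example/CN=Root", "cms", "/O=Example/CN=Alice")
    vo.add_member("/O=Example/CN=Alice", "cms", "/O=Example/CN=Bob")
    print(vo.is_member("/O=Example/CN=Bob", "cms"))
```

Only DNs are stored, which mirrors the point that client certificates themselves need not be kept.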
13CAIGEE architecture II
14Interactive Analysis
- Use Clarens as RPC layer
- Python as scripting language already used
- Multithreaded analysis job listens to RPCs
- Use Condor/PBS/LSF as scheduler to start and kill
jobs.
[Diagram components: Client; Clarens; Head node with Sched and Clarens; Farm node with Pclarens and Analysis]
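The idea of a multithreaded analysis job that listens to RPCs can be illustrated with Python's standard XML-RPC modules. This is a minimal sketch, not Clarens or Pclarens code: the `AnalysisJob` class, the `status` and `stop` method names, and the fake event loop are assumptions, and the scheduler (Condor/PBS/LSF) that would start the job on a farm node is not modelled.

```python
import threading
import time
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class AnalysisJob:
    """Toy analysis loop that can be inspected and stopped via RPC."""

    def __init__(self):
        self.events_processed = 0
        self._stop_requested = threading.Event()
        self._lock = threading.Lock()

    def run(self):
        # Stand-in for the real event loop of an analysis program.
        while not self._stop_requested.is_set():
            with self._lock:
                self.events_processed += 1
            time.sleep(0.001)

    def status(self):
        with self._lock:
            return {
                "events_processed": self.events_processed,
                "running": not self._stop_requested.is_set(),
            }

    def stop(self):
        self._stop_requested.set()
        return True


def start_job():
    """Start the analysis thread and an RPC listener thread on localhost."""
    job = AnalysisJob()
    # Port 0 lets the operating system pick a free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(job.status, "status")
    server.register_function(job.stop, "stop")
    threading.Thread(target=job.run, daemon=True).start()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    return job, server, f"http://127.0.0.1:{port}"


if __name__ == "__main__":
    job, server, url = start_job()
    client = ServerProxy(url)
    time.sleep(0.05)
    print(client.status())
    client.stop()
    server.shutdown()
    server.server_close()
```

One thread does the analysis while another answers remote calls, so a client can query progress or stop the job without interrupting the event loop.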
15Current work
- WSDL interface descriptions/Resource/data discovery
- Integration with CMS analysis tools
- POOL RL catalog interface
- NorduGrid RL catalog interface (ATLAS)
- BOSS job submission (INFN)
- Java version of server
- Sphinx job scheduling
- Chimera virtual data system
- OGSI compatibility
- Monitoring integration via MonALISA
16GAE Collaboration Desktop Example
- Four-screen Analysis Desktop: 4 Flat Panels, 5120 x 1024
- Driven by a single server and single graphics card
- Allows simultaneous work on:
- Traditional analysis tools (e.g. ROOT)
- Software development
- Event displays
- MonALISA monitoring displays; other Grid views
- Job-progress Views
- Persistent collaboration (VRVS shared windows)
- Online event or detector monitoring
- Web browsing, email
17GAE Tools: PDA Client
- A handheld GAE client: fruits of collaboration between NUST and Caltech
- Software is Java Analysis Studio (JAS) ported to the Pocket PC 2002 OS
- Hardware is any Pocket PC 2002 device (but we use HP/Compaq iPAQ devices)
- This tool includes Grid authentication/security components
18Grid-Enabled Analysis Prototypes
Collaboration Analysis Desktop
COJAC (via Web Services)