Title: ATLAS and the Grid
1 ATLAS and the Grid
ACAT02, Moscow, June 2002
RWL Jones, Lancaster University
2 The ATLAS Computing Challenge
- Running conditions at startup:
- 0.8x10^9 event sample → 1.3 PB/year, before data processing
- Reconstructed events plus Monte Carlo data → 10 PB/year (3 PB on disk)
- CPU: 1.6M SpecInt95, including analysis
- CERN alone can handle only a fraction of these resources
3 The Solution: The Grid
- Note: this is truly HPC, but it requires more than raw computing power
- The Grid is not designed for tightly-coupled problems, but there are many spin-offs
4 ATLAS Needs Grid Applications
- The ATLAS OO software framework is Athena, which co-evolves with the LHCb Gaudi framework
- ATLAS is truly intercontinental
- In particular, it is present on both sides of the Atlantic
- Opportunity: the practical convergence between US and European Grid projects will come through the transatlantic applications
- Threat: there is an inevitable tendency towards fragmentation/divergence of effort, which must be resisted
- Other relevant talks:
- Nick Brook: co-development with LHCb, especially through the UK GridPP collaboration (or rather, I'll present this later)
- Alexandre Vaniachine: describing work for the ATLAS Data Challenges
5 Data Challenges
Test Bench: Data Challenges
- Prototype I (May 2002)
- Performance and scalability testing of components of the computing fabric (clusters, disk storage, mass storage system, system installation, system monitoring) using straightforward physics applications. Test job scheduling and data replication software (DataGrid release 1.2).
- Prototype II (Mar 2003)
- Prototyping of the integrated local computing fabric, with emphasis on scaling, reliability and resilience to errors. Performance testing of LHC applications. Distributed application models (DataGrid release 2).
- Prototype III (Mar 2004)
- Full-scale testing of the LHC computing model with fabric management and Grid management software for Tier-0 and Tier-1 centres, with some Tier-2 components (DataGrid release 3).
6 The Hierarchical View
[Diagram: the tiered LHC computing model]
- Online System: one bunch crossing per 25 ns, 100 triggers per second, each event is 1 MByte; PBytes/sec off the detector, 100 MBytes/sec to the Offline Farm (20 TIPS; 1 TIPS = 25,000 SpecInt95, a 1999 PC is about 15 SpecInt95)
- Tier 0: CERN Computer Centre (>20 TIPS), fed at 100 MBytes/sec; linked to Tier 1 at Gbits/sec (or by air freight)
- Tier 1: regional centres, e.g. UK Regional Centre (RAL), US, French and Italian Regional Centres
- Tier 2: Tier-2 centres (1 TIPS each), linked at Gbits/sec
- Tier 3: institute servers (e.g. Lancaster 0.25 TIPS, Sheffield, Manchester, Liverpool) with a physics data cache; physicists work on analysis channels, each institute has 10 physicists working on one or more channels, and data for these channels should be cached by the institute server; 100-1000 Mbits/sec to Tier 4
- Tier 4: workstations
7 A More Grid-like Model
[Diagram: the LHC Computing Facility]
8 Features of the Cloud Model
- All regional facilities have 1/3 of the full reconstructed data
- Allows more on-disk/fast-access space, saves tape
- Multiple copies mean no need for tape backup
- All regional facilities have all of the analysis data (AOD)
- The resource broker can still keep jobs fairly local (see the sketch after this list)
- Centres are Regional and NOT National
- Physicists from other regions should also have access to the computing resources
- Cost sharing is an issue
- Implications for the Grid middleware on accounting:
- Between experiments
- Between regions
- Between analysis groups
- Also, different activities will require different priorities
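A minimal sketch of how a resource broker might keep jobs "fairly local" by ranking candidate sites on data locality and then load. This is illustrative only: the site list, load figures and ranking rule are assumptions, not EDG or ATLAS broker code.

    # Illustrative only: rank candidate sites for a job by whether they hold
    # the required dataset replica and by their current load.
    def rank_sites(sites, dataset):
        """Return sites ordered best-first for running a job on `dataset`."""
        def score(site):
            has_replica = dataset in site["replicas"]        # prefer local data
            return (0 if has_replica else 1, site["load"])   # then lower load
        return sorted(sites, key=score)

    if __name__ == "__main__":
        sites = [
            {"name": "CERN", "replicas": {"esd_part1"}, "load": 0.9},
            {"name": "RAL",  "replicas": {"esd_part2"}, "load": 0.4},
            {"name": "BNL",  "replicas": {"esd_part1"}, "load": 0.3},
        ]
        for s in rank_sites(sites, "esd_part1"):
            print(s["name"])   # BNL, CERN, RAL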
9 Resource Estimates
10 Resource Estimates
- Analysis resources?
- 20 analysis groups
- 20 jobs/group/day → 400 jobs/day
- Sample size: 10^8 events
- 2.5 SI95·s/event → 10^11 SI95·s/day ≈ 1.2x10^6 SI95 (quick check below)
- Additional 20% for activities on smaller samples
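The arithmetic behind the estimate above, written out as a quick check (pure illustration of the slide's numbers):

    # Quick check of the analysis CPU estimate.
    jobs_per_day = 20 * 20          # 20 groups x 20 jobs/group/day
    events_per_job = 1e8            # sample size
    si95_sec_per_event = 2.5

    si95_seconds_per_day = jobs_per_day * events_per_job * si95_sec_per_event
    sustained_si95 = si95_seconds_per_day / 86400   # seconds in a day

    print("%.1e SI95*s/day" % si95_seconds_per_day)  # 1.0e+11
    print("%.2e SI95 sustained" % sustained_si95)    # about 1.2e6, matching the slide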
11 Rough Architecture
[Diagram of the rough architecture; components:]
- User
- User interface to the Grid / experiment framework
- Middleware: resource broker (RB), Grid information service (GIS)
- Data catalogue
- Job configuration / VDC / metadata
- Installation of software and environment
12 Test Beds
- EDG Test Bed 1
- Common to all LHC experiments
- Using/testing EDG Test Bed 1 release code
- Already running boxed fast simulation and installed full simulation
- US ATLAS Test Bed
- Demonstrate the success of the Grid computing model for HEP:
- in data production
- in data access
- in data analysis
- Develop and deploy Grid middleware and applications
- wrap layers around apps
- simplify deployment
- Evolve into a fully functioning, scalable, distributed, tiered Grid
- NorduGrid
- Developing a regional test bed
- Light-weight Grid user interface, working prototypes, etc.
- See the talk by Aleksandr Konstantinov
13 EDG Release 1.2
- EDG has a strong emphasis on middleware development; applications come second
- ATLAS has been testing the stable releases of the EDG software as they become available, as part of WP8 (ATLAS key contact: Silvia Resconi)
- EDG Release 1.2 is under test by Integration Team people plus the Loose Cannons (experiment-independent people) on the development testbed at CERN
- Standard requirements must be met before the ATLAS Applications people test a release:
- The development testbed must consist of at least 3 sites in 3 different countries (e.g. CERN, CNAF, RAL)
- There must be a long (>24 hours) unattended period with a low error rate (<1% of jobs failed)
http://pcatl0a.mi.infn.it/resconi/validation/valid.html
14 EDG TestBed 1 Status (28 May 2002, 17:03)
- Web interface showing the status of ~400 servers at Testbed 1 sites
- 5 main production centres
15 GridPP Sites in Testbed(s)
16 NorduGrid Overview
- Launched in spring 2001, with the aim of creating a Grid infrastructure in the Nordic countries
- Partners from Denmark, Norway, Sweden, and Finland
- Initially the Nordic branch of the EU DataGrid (EDG) project testbed
- Independent developments
- Relies on funding from NorduNet2
http://www.nordugrid.org
17 US Grid Test Bed Sites
U Michigan
Lawrence Berkeley National Laboratory
Boston University
Argonne National Laboratory
Brookhaven National Laboratory
Indiana University
Oklahoma University
University of Texas at Arlington
US-ATLAS testbed launched February 2001
18 US Hardware and Deployment
- 8 gatekeepers: ANL, BNL, LBNL, BU, IU, UM, OU, UTA
- Farms: BNL, LBNL, IU, UTA; multiple R&D gatekeepers
- Uniform OS through kickstart
- Running RH 7.2
- First stage deployment:
- Pacman, Globus 2.0b, cernlib (installations)
- Simple application package
- Second stage deployment:
- Magda, Chimera, GDMP (Grid data management)
- Third stage:
- MC production software, VDC
- Many US names mentioned later; thanks also to Craig Tull, Dan Engh, Mark Sosebee
19 Important Components
- GridView - a simple script tool to monitor the status of the test bed (Java version being developed)
- Gripe - unified user accounts
- Magda - MAnager for Grid Data
- Pacman - package management and distribution tool
- Grappa - web portal based on active notebook technology
20 Grid User Interface
- Several prototype interfaces:
- GRAPPA
- EDG
- NorduGrid
- Lightweight
- Nothing experiment-specific
- GRAT
- Line mode (and we will always need to retain line mode!)
- Now defining a joint ATLAS/LHCb user interface, GANGA
- Co-evolution with Grappa
- Knowledge of the experiment OO architecture needed (Athena/Gaudi)
21 Interfacing Athena/Gaudi to the GRID
[Diagram: the GANGA/Grappa GUI mediating between the Athena/GAUDI application (jobOptions / virtual data, algorithms; histograms, monitoring, results) and the GRID services]
22 EDG GUI for Job Submission
23 GRAPPA
- Based on the XCAT Science Portal, a framework for building personal science portals
- A science portal is an application-specific Grid portal
- Active notebook:
- HTML pages to describe the features of the notebook and how to use it
- HTML forms which can be used to launch parameterizable scripts (transformation)
- Parameters stored in a sub-notebook (derivation)
- Very flexible:
- Jython - access to Java classes (see the short example after this slide)
- Globus Java CoG kit
- XCAT
- XMESSAGES
- Not every user has to write scripts
- Notebooks can be shared among users
- Import/export capability
- Shava Smallen, Rob Gardner
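For readers unfamiliar with the "Jython - access to Java classes" point: in Jython, Java classes are imported and used with Python syntax, which is what lets notebook scripts drive Java-based Grid components such as the CoG kit. A minimal, generic illustration using only standard Java classes (no GRAPPA- or CoG-specific names; the file name and parameters are invented):

    # Jython: Java classes are imported like Python modules and used directly.
    from java.util import Properties
    from java.io import FileOutputStream

    # Build a small job-parameter set the way a notebook script might,
    # then persist it with an ordinary Java API.
    params = Properties()
    params.setProperty("executable", "athena")
    params.setProperty("jobOptions", "AtlfastOptions.txt")
    params.store(FileOutputStream("job.properties"), "example GRAPPA-style parameters")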
24 GRAPPA/XCAT Science Portal Architecture
The prototype can:
- Submit Athena jobs to Grid computing elements
- Manage jobOptions, record sessions
- Staging and output collection supported
- Tested on the US ATLAS Grid Testbed
25 GANGA/Grappa Development Strategy
- Completed existing-technology requirement survey
- Must be Grid-aware but not Grid-dependent
- Still want to be able to pack and go to a standalone laptop
- Must be component-based
- Interface technologies (standards needed → GGF)
- Programmatic API (e.g. C, C++, etc.)
- Scripting as glue, a la Stallman (e.g. Python)
- Others, e.g. SOAP, CORBA, RMI, DCOM, .NET, etc.
- Defining the experiment software services to capture and present the functionality of the Grid service
26 Possible Designs
- Two ways of implementation:
- Based on one of the general-purpose Grid portals (not tied to a single application/framework):
- Alice Environment (AliEn)
- Grid Enabled Web eNvironment for Site-Independent User Job Submission (GENIUS)
- Grid access portal for physics applications (Grappa)
- Based on the concept of a Python bus (P. Mato):
- use whichever modules are required to provide the full functionality of the interface
- use Python to glue these modules, i.e., allow interaction and communication between them (see the sketch below)
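A minimal sketch of the "Python bus" idea: independent modules are plugged into a common Python object that routes requests between them. This is illustrative only; the module names and registration API are assumptions, not GANGA code.

    # Illustrative "Python bus": modules register services on a shared bus and
    # call each other through it, so the GUI, job-configuration and Grid-access
    # pieces stay interchangeable.
    class Bus:
        def __init__(self):
            self._services = {}

        def register(self, name, service):
            self._services[name] = service

        def call(self, name, method, *args):
            return getattr(self._services[name], method)(*args)

    class JobConfig:
        def build(self, options_file):
            return {"executable": "athena", "jobOptions": options_file}

    class GridSubmitter:
        def submit(self, job):
            # A real module would talk to the middleware here.
            print("submitting", job)
            return "job-id-001"

    bus = Bus()
    bus.register("config", JobConfig())
    bus.register("grid", GridSubmitter())

    job = bus.call("config", "build", "AtlfastOptions.txt")
    job_id = bus.call("grid", "submit", job)
    print(job_id)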
27 Python Bus
[Diagram: the Python bus connecting interface modules to the GRID and the Internet]
28 Modules description
- GUI module
- Provides basic functionality
- Can be implemented using:
- the wxPython extension module
- the Qt/Desktop C++ toolkit, etc.
29 Installation Tools
- To use the Grid, deployable software must be deployed on the Grid fabrics, and the deployable run-time environment established (Unix and Windows)
- Installable code and run-time environment/configuration
- Both ATLAS and LHCb use CMT for software management and environment configuration
- CMT knows the package interdependencies and external dependencies → it is the obvious tool to prepare the deployable code and to expose the dependencies to the deployment tool (Christian Arnault, Chas Loomis); a sketch of this hand-off follows the list
- Grid-aware tool to deploy the above:
- PACMAN (Saul Youssef) is a candidate which seems fairly easy to interface with CMT
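How the CMT-to-deployment-tool hand-off could look is sketched below. This is purely hypothetical: the input text only mimics the shape of output from CMT's "cmt show uses", and the emitted lines only imitate a dependency stanza; none of this is the actual CMT or PACMAN tooling described on the slide.

    # Hypothetical illustration: turn a CMT-style "show uses" listing into a
    # flat dependency list that a deployment tool could consume.
    cmt_show_uses = """\
    use AtlasPolicy AtlasPolicy-01-02-03
    use GaudiKernel v12 (/afs/cern.ch/atlas/software)
    use CLHEP 1.8.0.0 External
    """

    def parse_uses(text):
        """Yield (package, version) pairs from lines beginning with 'use'."""
        for line in text.splitlines():
            parts = line.split()
            if len(parts) >= 3 and parts[0] == "use":
                yield parts[1], parts[2]

    for package, version in parse_uses(cmt_show_uses):
        print("depends %s %s" % (package, version))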
30 Installation Issues
- Most Grid projects seem to assume either that code is pre-installed or that it can be dumped each time into the input sandbox
- The only route for installing software through the Grid seems to be as data in Storage Elements
- In general these are non-local
- Hard to introduce directory trees etc. this way (file-based)
- How do we advertise installed code?
- Check it is installed by a preparation task sent to the remote fabric before/with the job (see the sketch below)
- Advertise that the software is installed in your information service, for use by the resource broker
- Probably need both!
- The local environment and external packages will always be a problem
- Points to a virtual machine idea eventually: Java?
- Options?
- DAR: mixed reports, but CMS are interested
- PACKMAN from AliEn
- LCFG, OSCAR: not really suitable, more for site management?
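A minimal sketch of the "preparation task" idea: a small probe sent ahead of, or with, the job checks whether the required release is present on the remote fabric and reports back so the submitter (or broker) can react. The release number, paths and environment variable here are made-up examples, not ATLAS conventions.

    # Illustrative preparation task: probe a worker node for an installed release
    # before running the real job. Paths and variable names are hypothetical.
    import os
    import sys

    REQUIRED_RELEASE = "3.0.1"
    CANDIDATE_ROOTS = [
        os.environ.get("ATLAS_SW_ROOT", "/opt/atlas"),   # hypothetical env var
        "/usr/local/atlas",
    ]

    def release_installed(release):
        for root in CANDIDATE_ROOTS:
            if os.path.isdir(os.path.join(root, release)):
                return True
        return False

    if __name__ == "__main__":
        if release_installed(REQUIRED_RELEASE):
            print("OK: release %s found" % REQUIRED_RELEASE)
            sys.exit(0)
        print("MISSING: release %s not installed" % REQUIRED_RELEASE)
        sys.exit(1)   # non-zero status tells the submitter to install or resubmit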
31 CMT and deployable code
- Christian Arnault and Charles Loomis have a beta release of CMT that will produce package rpms, which is a large step along the way
- Still need to have minimal dependencies / clean code!
- Need to make the package dependencies explicit
- rpm requires root to install into the system database (but not for a private installation)
- Developer and binary installations are being produced; probably needs further refinement
- Work to expose dependencies as PACMAN cache files is ongoing
- Note there is much work elsewhere on producing rpms of ATLAS code, notably in Copenhagen; this effort has the advantage that the full dependency knowledge in CMT can be exposed
32 pacman
- Package manager for the Grid, in development by Saul Youssef (Boston U, GriPhyN/iVDGL)
- A single tool to easily manage installation and environment setup for the long list of ATLAS, Grid and other software components needed to Grid-enable a site
- fetch, install, configure, add to login environment, update
- Sits on top of (and is compatible with) the many software packaging approaches (rpm, tar.gz, etc.)
- Uses a dependency hierarchy, so one command can drive the installation of a complete environment of many packages
- Packages are organized into caches hosted at various sites
- The "how to fetch" can be cached rather than the desired object itself
- Includes a web interface (for each cache) as well as command-line tools
33 An encryption package needed by Globus
- name: SSLeay
- description: Encryption
- url: http://www.psy.uq.oz.au/ftp/Crypto/ssleay
- source: http://www.psy.uq.oz.au/ftp/Crypto/ssleay
- systems: linux-i386: SSLeay-0.9.0b.tar.gz, SSLeay-0.9.0b; linux2: SSLeay-0.9.0b.tar.gz, SSLeay-0.9.0b; sunos5: SSLeay-0.9.0b.tar.gz, SSLeay-0.9.0b
- depends:
- exists: /usr/local/bin/perl
- inpath: gcc
- bins:
- paths:
- enviros:
- localdoc: README
- daemons:
- install: root: ./Configure linux-elf, make clean, ...
34 Grid Applications Toolkit
- Horst Severini, Kaushik De, Ed May, Wensheng Deng, Jerry Gieraltowski (US Test Bed)
- Repackaged Athena-Atlfast (OO fast detector simulation) for the grid testbed (building on Julian Phillips and the UK effort)
- Script 1 can run on any Globus-enabled node (requires transfer of 17 MB of source)
- Script 2 runs on machines with the packaged software preinstalled on the grid site
- Script 3 runs on AFS-enabled sites (the latest version of the software is used)
- Other user toolkit contents:
- check the status of grid nodes
- submit jobs (without worrying about the underlying middleware or ATLAS software)
- uses only basic RSL and globus-url-copy (see the wrapper sketch below)
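A rough illustration of how such wrapper scripts typically work, not the GRAT scripts themselves: drive the standard Globus 2 command-line tools from a thin script so the user never sees RSL or data-transfer details. The gatekeeper host, job script and file paths are invented for the example.

    # Illustrative wrapper in the spirit of the GRAT scripts (hostnames and
    # paths are made up). Uses the standard Globus 2 command-line tools.
    import subprocess

    GATEKEEPER = "atlas00.example.edu"          # hypothetical gatekeeper

    def run_remote(executable, *args):
        """Run an executable on the remote gatekeeper via globus-job-run."""
        cmd = ["globus-job-run", GATEKEEPER, executable] + list(args)
        return subprocess.call(cmd)

    def fetch_output(remote_path, local_path):
        """Copy a result file back with globus-url-copy (GridFTP)."""
        src = "gsiftp://%s%s" % (GATEKEEPER, remote_path)
        dst = "file://%s" % local_path
        return subprocess.call(["globus-url-copy", src, dst])

    if __name__ == "__main__":
        run_remote("/bin/sh", "run_atlfast.sh")          # hypothetical job script
        fetch_output("/tmp/atlfast.ntuple", "/tmp/atlfast.ntuple")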
35 Monitoring Tool
- GridView - a simple visualization tool using the Globus Toolkit
- First native Globus application for the ATLAS grid (March 2001)
- Collects information using Globus tools. Archival information is stored in a MySQL server on a different machine. Data are published through a web server on a third machine (see the sketch below).
- Plans:
- Java version
- Better visualization
- Historical plots
- Hierarchical MDS information
- Graphical view of system health
- New MDS schemas
- Optimize archived variables
- Publishing historical information through GIIS servers??
- Explore discovery tools
- Explore scalability to large systems
- Patrick McGuigan
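A minimal sketch of the "archive to MySQL" step. Illustrative only: the table layout, column names, hostnames and credentials are assumptions, and the real GridView gathers its input from Globus/MDS tools rather than a hard-coded snapshot.

    # Illustrative archiver: store one monitoring snapshot per node in MySQL.
    # Table layout and credentials are invented for the example.
    import time
    import MySQLdb  # classic MySQL driver for Python

    snapshot = {"atlas00.example.edu": "up", "atlas01.example.edu": "down"}

    db = MySQLdb.connect(host="archive.example.edu", user="gridview",
                         passwd="secret", db="monitoring")
    cur = db.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS node_status
                   (ts INT, node VARCHAR(64), status VARCHAR(16))""")
    now = int(time.time())
    for node, status in snapshot.items():
        cur.execute("INSERT INTO node_status (ts, node, status) VALUES (%s, %s, %s)",
                    (now, node, status))
    db.commit()
    db.close()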
36 [Image-only slide, no transcript]
37 MDS Information
- Listing of available object classes
38 More Details
39 MDS Team
- Dantong Yu, Patrick McGuigan, Craig Tull, KD, Dan Engh
- Site monitoring
- publish BNL acas information ?
- Glue schema testbed ?
- Software installation
- pacman information provider
- Application monitoring
- Grid monitoring
- GridView, Ganglia
- hierarchical GIIS server ?
- GriPhyN-PPDG GIIS server ?
40 Data Management Architecture
- AMI: ATLAS Metadata Interface. Query LFNs, associated attributes and values
- MAGDA: MAnager for Grid-based Data. Manages replication, physical location
- VDC: Virtual Data Catalog. Derive and transform LFNs
41 Managing Data - Magda
- MAnager for Grid-based Data (essentially the replica catalogue tool)
- Designed for managed production and chaotic end-user usage
- Designed for rapid development of components to support users quickly, with components later replaced by Grid Toolkit elements
- Deployed as an evolving production tool and as a testing ground for Grid Toolkit components
- GDMP will be incorporated
- Application in DCs:
- Logical files can optionally be organized into collections
- File management in production: replication between BNL and CERN, BNL data access
- GDMP integration, replication and end-user data access in DC1
- Developments:
- Interface with AMI (ATLAS Metadata Interface, allows queries on Logical File Name collections by users; a Grenoble project)
- Interface to the Virtual Data Catalogue (see A. Vaniachine's talk)
- Interfacing with the hybrid ROOT/RDBMS event store
- Athena (ATLAS offline framework) integration, further grid integration
Info: http://www.usatlas.bnl.gov/magda/info
Engine: http://www.usatlas.bnl.gov/magda/dyShowMain.pl
T. Wenaus, W. Deng
42 Magda Architecture Schema
- MySQL database at the core of the system
- DB access via perl, C++, java and cgi (perl) scripts; C++ and Java APIs auto-generated off the MySQL DB schema
- User interaction via web interface and command line
- Principal components (see the illustrative sketch after this slide):
- File catalog covering any file types
- Data repositories organized into sites, each with its locations
- Computers with repository access; a host can access a set of sites
- Replication operations organized into tasks
- Interfacing to the Data Catalogue efforts at Grenoble
- Will replace with standard components (e.g. GDMP) as they become available for production
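An illustrative reading of the component list above as a relational layout. The table and column names are guesses made for explanation only, not the real Magda schema, and the example runs in an in-memory SQLite database rather than MySQL purely so the relationships can be exercised.

    # Illustrative relational layout for the components listed above.
    import sqlite3

    SCHEMA = """
    CREATE TABLE logical_file (lfn TEXT PRIMARY KEY, collection TEXT);
    CREATE TABLE site      (site_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE location  (location_id INTEGER PRIMARY KEY, site_id INTEGER, path TEXT);
    CREATE TABLE host_site (host TEXT, site_id INTEGER);       -- hosts access sites
    CREATE TABLE replica   (lfn TEXT, location_id INTEGER);    -- physical copies
    CREATE TABLE task      (task_id INTEGER PRIMARY KEY, src_site INTEGER,
                            dst_site INTEGER, state TEXT);     -- replication tasks
    """

    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO site VALUES (1, 'BNL')")
    db.execute("INSERT INTO location VALUES (1, 1, '/data/dc1')")
    db.execute("INSERT INTO logical_file VALUES ('dc1.evgen.0001.root', 'dc1')")
    db.execute("INSERT INTO replica VALUES ('dc1.evgen.0001.root', 1)")
    print(db.execute("""SELECT lfn, name, path FROM replica
                        JOIN location USING (location_id)
                        JOIN site USING (site_id)""").fetchall())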
43 Magda Architecture
- DB access via perl, C++, java and cgi (perl) scripts
- C++ and Java APIs auto-generated off the MySQL DB schema
- User interaction via web interface and command line
44 Magda Sites
45 Conclusion
- The Grid is the only viable solution to the ATLAS computing problem
- The problems of coherence across the Atlantic are large
- ATLAS (and CMS etc.) are at the sharp end, so we will force the divide to be bridged
- Many applications have been developed, but they need to be refined/merged
- These revise our requirements; we must use LCG/GGF and any other forum to ensure the middleware projects satisfy the real needs: this is not a test bed!
- The progress so far is impressive and encouraging
- Good collaborations (especially ATLAS/LHCb)
- The real worry is scaling up to the full system
- Money!
- Manpower!
- Diplomacy?!