ATLAS and the Grid
Transcript and Presenter's Notes
1
ATLAS and the Grid
ACAT02, Moscow, June 2002
RWL Jones, Lancaster University
2
The ATLAS Computing Challenge
  • Running conditions at startup
  • 0.8×10⁹ event sample → 1.3 PB/year, before data processing
  • Reconstructed events, Monte Carlo data → 10 PB/year (3 PB on disk)
  • CPU: 1.6M SpecInt95, including analysis
  • CERN alone can handle only a fraction of these resources

3
The Solution: The Grid
Note: truly HPC, but requires more.
Not designed for tightly-coupled problems, but with many spin-offs.
4
ATLAS Needs Grid Applications
  • The ATLAS OO software framework is Athena, which co-evolves with the LHCb Gaudi framework
  • ATLAS is truly intercontinental
  • In particular, it is present on both sides of the Atlantic
  • Opportunity: the practical convergence between US and European Grid projects will come through the transatlantic applications
  • Threat: there is an inevitable tendency towards fragmentation/divergence of effort, which must be resisted
  • Other relevant talks:
  • Nick Brook: co-development with LHCb, especially through the UK GridPP collaboration (or rather, I'll present this later)
  • Alexandre Vaniachine: describing work for the ATLAS Data Challenges

5
Data Challenges
Test Bench Data Challenges
  • Prototype I: May 2002
  • Performance and scalability testing of components of the computing fabric (clusters, disk storage, mass storage system, system installation, system monitoring) using straightforward physics applications. Test job scheduling and data replication software (DataGrid release 1.2).
  • Prototype II: Mar 2003
  • Prototyping of the integrated local computing fabric, with emphasis on scaling, reliability and resilience to errors. Performance testing of LHC applications. Distributed application models (DataGrid release 2).
  • Prototype III: Mar 2004
  • Full-scale testing of the LHC computing model with fabric management and Grid management software for Tier-0 and Tier-1 centres, with some Tier-2 components (DataGrid release 3).

6
The Hierarchical View
[Tier diagram, reconstructed from the slide text]
  • Scale: 1 TIPS = 25,000 SpecInt95; a PC (1999) ≈ 15 SpecInt95
  • One bunch crossing per 25 ns; 100 triggers per second; each event is 1 MByte
  • Detector → Online System at ~PBytes/sec; Online System → Offline Farm (~20 TIPS) at ~100 MBytes/sec
  • Tier 0: CERN Computer Centre (>20 TIPS, HPSS), fed at ~100 MBytes/sec; linked onwards at Gbits/sec or by air freight
  • Tier 1: Regional Centres, each with HPSS: UK (RAL), US, French and Italian centres
  • Tier 2: Tier-2 centres of ~1 TIPS each, linked at Gbits/sec
  • Tier 3: institute servers with a physics data cache (e.g. Lancaster ~0.25 TIPS, Sheffield, Manchester, Liverpool); physicists work on analysis channels, each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server; 100-1000 Mbits/sec links
  • Tier 4: workstations
(A rough rate check follows.)
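As a rough sanity check (not from the original slides) of the trigger rate and event size quoted above, assuming a nominal 10⁷ seconds of effective data-taking per year:

    # Rough rate check; the 1e7 s/year of live time is an added assumption.
    trigger_rate_hz = 100        # triggers per second (from the slide)
    event_size_mb = 1.0          # MB per raw event (from the slide)
    live_seconds_per_year = 1e7  # assumed, not from the slide

    rate_mb_per_s = trigger_rate_hz * event_size_mb
    raw_pb_per_year = rate_mb_per_s * live_seconds_per_year / 1e9  # MB -> PB

    print(f"raw data rate  : {rate_mb_per_s:.0f} MB/s")       # ~100 MB/s
    print(f"raw data volume: {raw_pb_per_year:.1f} PB/year")  # ~1 PB/year

This is consistent in order of magnitude with the ~1.3 PB/year quoted earlier for the raw event sample.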
7
A More Grid-like Model
The LHC Computing Facility
8
Features of the Cloud Model
  • All regional facilities have 1/3 of the full
    reconstructed data
  • Allows more on disk/fast access space, saves tape
  • Multiple copies mean no need for tape backup
  • All regional facilities have all of the analysis
    data (AOD)
  • Resource broker can still keep jobs fairly local
  • Centres are Regional and NOT National
  • Physicists from other regions should also have access to the computing resources
  • Cost sharing is an issue
  • Implications for the Grid middleware on
    accounting
  • Between experiments
  • Between regions
  • Between analysis groups
  • Also, different activities will require different
    priorities

9
Resource Estimates
10
Resource Estimates
  • Analysis resources?
  • 20 analysis groups
  • 20 jobs/group/day → 400 jobs/day
  • sample size 10⁸ events
  • 2.5 SI95·s/event → 10¹¹ SI95·s/day ≈ 1.2×10⁶ SI95
  • Additional ~20% for activities on smaller samples (see the worked check below)
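The estimate above can be reproduced directly; a minimal sketch, where the 86,400 s/day divisor is the only added assumption:

    # Reproduce the analysis-resource estimate from the bullets above.
    groups = 20
    jobs_per_group_per_day = 20
    events_per_sample = 1e8        # events per analysis sample
    si95_s_per_event = 2.5         # SI95*seconds of CPU per event

    jobs_per_day = groups * jobs_per_group_per_day                         # 400
    si95_s_per_day = jobs_per_day * events_per_sample * si95_s_per_event   # 1e11
    sustained_si95 = si95_s_per_day / 86400                                # ~1.2e6

    print(jobs_per_day, "jobs/day")
    print(f"{si95_s_per_day:.1e} SI95*s/day -> {sustained_si95:.2e} SI95 sustained")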

11
Rough Architecture
[Diagram: components of the rough architecture]
  • Installation of software and environment
  • Middleware: Resource Broker (RB), Grid Information Service (GIS)
  • Data Catalogue
  • Job Configuration / VDC / metadata
  • User interface to the Grid and the experiment framework
  • User
12
Test Beds
  • EDG Test Bed 1
  • Common to all LHC experiments
  • Using/testing EDG test bed 1 release code
  • Already running boxed fast simulation and
    installed full simulation
  • US ATLAS Test Bed
  • Demonstrate success of grid computing model for
    HEP
  • in data production
  • in data access
  • in data analysis
  • Develop and deploy grid middleware and applications
  • wrap layers around apps
  • simplify deployment
  • Evolve into fully functioning scalable
    distributed tiered grid
  • NorduGrid
  • Developing a regional test bed
  • Light-weight Grid user interface, working
    prototypes etc
  • see talk by Aleksandr Konstantinov

13
EDG Release 1.2
  • EDG has a strong emphasis on middleware development; applications come second
  • ATLAS has been testing the stable releases of the EDG software as they become available, as part of WP8 (ATLAS key contact: Silvia Resconi)
  • EDG Release 1.2 is under test by Integration Team people plus Loose Cannons (experiment-independent people) on the development testbed at CERN.
  • Standard requirements must be met before the ATLAS Applications people test a release:
  • The development testbed must consist of at least 3 sites in 3 different countries (e.g. CERN, CNAF, RAL)
  • There must be a long (> 24 hours) unattended period with a low error rate (< 1% of jobs failed)
(A sketch of checking these criteria against a job log follows below.)

http://pcatl0a.mi.infn.it/resconi/validation/valid.html
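A minimal sketch of checking those acceptance criteria against a job log; the log format and field names here are illustrative, not part of the EDG/WP8 tooling:

    # Check the acceptance criteria against an (illustrative) job log.
    from datetime import datetime, timedelta

    jobs = [
        # (site, country, submission time, succeeded?)
        ("CERN", "CH", datetime(2002, 5, 27, 9, 0), True),
        ("CNAF", "IT", datetime(2002, 5, 27, 21, 0), True),
        ("RAL",  "UK", datetime(2002, 5, 28, 12, 0), True),
    ]

    countries = {country for _, country, _, _ in jobs}
    span = max(t for _, _, t, _ in jobs) - min(t for _, _, t, _ in jobs)
    failure_rate = sum(1 for *_, ok in jobs if not ok) / len(jobs)

    print(">=3 sites in >=3 countries:", len(countries) >= 3)
    print("unattended period > 24 h  :", span > timedelta(hours=24))
    print("failure rate < 1%         :", failure_rate < 0.01)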
14
EDG TestBed 1 Status, 28 May 2002, 17:03
  • Web interface showing status of (400) servers
    at testbed 1 sites
  • 5 Main Production Centres

15
GridPP Sites in Testbed(s)
16
NorduGrid Overview
  • Launched in spring 2001, with the aim of creating
    a Grid infrastructure in the Nordic countries
  • Partners from Denmark, Norway, Sweden, and
    Finland
  • Initially the Nordic branch of the EU DataGrid
    (EDG) project testbed
  • Independent developments
  • Relies on funding from NorduNet2

http://www.nordugrid.org
17
US Grid Test Bed Sites
U Michigan
Lawrence Berkeley National Laboratory
Boston University
Argonne National Laboratory
Brookhaven National Laboratory
Indiana University
Oklahoma University
University of Texas at Arlington
US-ATLAS testbed launched February 2001
18
US Hardware and Deployment
  • 8 gatekeepers - ANL, BNL, LBNL, BU, IU, UM, OU, UTA
  • Farms - BNL, LBNL, IU, UTA; multiple R&D gatekeepers
  • Uniform OS through kickstart
  • Running RH 7.2
  • First stage deployment:
  • Pacman, Globus 2.0b, cernlib (installations)
  • Simple application package
  • Second stage deployment:
  • Magda, Chimera, GDMP (Grid data management)
  • Third stage:
  • MC production software, VDC
  • Many US names mentioned later, thanks also to
    Craig Tull, Dan Engh, Mark Sosebee

19
Important Components
  • GridView - simple script tool to monitor status
    of test bed (Java version being developed)
  • Gripe - unified user accounts
  • Magda - MAnager for Grid Data
  • Pacman - package management and distribution tool
  • Grappa - web portal based on active notebook
    technology

20
Grid User Interface
  • Several prototype interfaces
  • GRAPPA
  • EDG
  • NorduGrid
  • Lightweight
  • Nothing experiment specific
  • GRAT
  • Line mode (and we will always need to retain line
    mode!)
  • Now defining an ATLAS/LHCb joint user interface,
    GANGA
  • Co-evolution with Grappa
  • Knowledge of experiment OO architecture needed
    (Athena/Gaudi)

21
Interfacing Athena/Gaudi to the GRID
[Diagram: the GANGA/Grappa GUI sits between GRID Services and the Athena/GAUDI application, passing in jobOptions / virtual data / algorithms and returning histograms, monitoring and results]
22
EDG GUI for Job Submission

23
GRAPPA
  • Based on the XCAT Science Portal, a framework for building personal science portals
  • A science portal is an application-specific Grid portal
  • Active notebook:
  • HTML pages to describe the features of the notebook and how to use it
  • HTML forms which can be used to launch parameterizable scripts (transformation)
  • Parameters stored in a sub-notebook (derivation); see the sketch after this list
  • Very flexible
  • Jython - access to Java classes
  • Globus Java CoG kit
  • XCAT
  • XMESSAGES
  • Not every user has to write scripts
  • Notebooks can be shared among users
  • Import/export capability
  • Shava Smallen, Rob Gardner
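A schematic illustration of the transformation/derivation split described above (plain Python, not GRAPPA's actual API; the athena command line is a placeholder):

    # Transformation: a parameterizable script template (what the HTML form launches).
    from string import Template

    transformation = Template("athena ${job_options}  # run over ${events} events")

    # Derivation: the concrete parameter values, stored in a sub-notebook.
    derivation = {"job_options": "AtlfastOptions.txt", "events": 1000}

    print(transformation.substitute(derivation))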

24
GRAPPA/XCAT Science Portal Architecture
The prototype can
  • Submit Athena jobs to Grid computing elements
  • Manage JobOptions, record sessions
  • Staging and output collection supported
  • Tested on US ATLAS Grid Testbed

25
GANGA/Grappa Development Strategy
  • Completed existing technology requirement
    survey
  • Must be Grid aware but not Grid-dependent
  • Still want to be able to pack and go to a
    standalone laptop
  • Must be component-based
  • Interface Technologies (standards needed → GGF)
  • Programmatic API (e.g. C, C++, etc.)
  • Scripting as glue à la Stallman (e.g. Python)
  • Others, e.g. SOAP, CORBA, RMI, DCOM, .NET, etc.
  • Defining the experiment software services to
    capture and present the functionality of the Grid
    service

26
Possible Designs
  • Two possible implementations:
  • Based on one of the general-purpose grid portals
    (not tied to a single application/framework)
  • Alice Environment (AliEn)
  • Grid Enabled Web eNvironment for Site-Independent
    User Job Submission (GENIUS)
  • Grid access portal for physics applications
    (Grappa)
  • Based on the concept of a Python bus (P. Mato):
  • use whichever modules are required to provide the full functionality of the interface
  • use Python to glue these modules together, i.e. allow interaction and communication between them (a minimal sketch follows after this list)
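A minimal sketch of the Python-bus idea (module and method names are placeholders, not the actual GANGA design): independent modules register on a shared bus and talk only through it.

    # Minimal sketch of a Python "bus" gluing independent modules together.
    class Bus:
        def __init__(self):
            self.modules = {}

        def register(self, name, module):
            self.modules[name] = module
            module.bus = self      # give each module a handle back to the bus

        def call(self, name, method, *args, **kwargs):
            return getattr(self.modules[name], method)(*args, **kwargs)


    class JobSubmitter:            # placeholder Grid-side module
        def submit(self, job_options):
            print("submitting job with", job_options)
            return "job-001"


    class Gui:                     # placeholder GUI module (wxPython, Qt, ...)
        def run(self):
            job_id = self.bus.call("submitter", "submit", "AtlfastOptions.txt")
            print("GUI: submitted", job_id)


    bus = Bus()
    bus.register("submitter", JobSubmitter())
    bus.register("gui", Gui())
    bus.call("gui", "run")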

27
Python Bus
[Diagram: the Python bus connecting modules to the GRID and the Internet]
28
Module descriptions
  • GUI module
  • Provides basic functionality
  • Can be implemented using:
  • the wxPython extension module
  • the Qt desktop C++ toolkit, etc. (a minimal wxPython sketch follows)
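For illustration, a minimal wxPython sketch of such a GUI module (written against the modern wx namespace; the 2002-era wxPython API was different):

    import wx

    class SubmitFrame(wx.Frame):
        def __init__(self):
            super().__init__(None, title="Grid job submission")
            button = wx.Button(self, label="Submit job")
            button.Bind(wx.EVT_BUTTON, self.on_submit)

        def on_submit(self, event):
            # Placeholder action; a real module would talk to the bus/Grid services.
            wx.MessageBox("Job submitted (placeholder)", "Info")

    if __name__ == "__main__":
        app = wx.App()
        SubmitFrame().Show()
        app.MainLoop()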

29
Installation Tools
  • To use the Grid, the software must be deployable on the Grid fabrics, and the deployable run-time environment established (Unix and Windows)
  • Installable code and run-time environment/configuration:
  • Both ATLAS and LHCb use CMT for software management and environment configuration
  • CMT knows the package interdependencies and external dependencies → this is the obvious tool to prepare the deployable code and to expose the dependencies to the deployment tool (Christian Arnault, Chas Loomis)
  • Grid-aware tool to deploy the above:
  • PACMAN (Saul Youssef) is a candidate which seems fairly easy to interface with CMT

30
Installation Issues
  • Most Grid projects seem to assume either that code is pre-installed or else that it can be dumped each time into the input sandbox
  • The only route for installation of software through the Grid seems to be as data in Storage Elements
  • In general these are non-local
  • Hard to introduce directory trees etc. this way (file based)
  • How do we advertise installed code?
  • Check it is installed by a preparation task sent to the remote fabric before/with the job
  • Advertise that the software is installed in your information service, for use by the resource broker
  • Probably need both!
  • The local environment and external packages will always be a problem
  • Points to a virtual machine idea eventually (Java?)
  • Options?
  • DAR: mixed reports, but CMS are interested
  • PACKMAN from AliEn
  • LGCG, OSCAR: not really suitable, more for site management?

31
CMT and deployable code
  • Christian Arnault and Charles Loomis have a beta-release of CMT that will produce package rpms, which is a large step along the way
  • Still need to have minimal dependencies/clean code!
  • Need to make the package dependencies explicit
  • Rpm requires root to install into the system database (but not for a private installation)
  • Developer and binary installations being produced; probably needs further refinement
  • Work to expose dependencies as PACMAN cache files is ongoing
  • Note: much work elsewhere on producing rpms of ATLAS code, notably in Copenhagen; this effort has the advantage that the full dependency knowledge in CMT can be exposed

32
pacman
  • Package manager for the Grid, in development by Saul Youssef (Boston U, GriPhyN/iVDGL)
  • A single tool to easily manage installation and environment setup for the long list of ATLAS, grid and other software components needed to Grid-enable a site
  • fetch, install, configure, add to login environment, update
  • Sits on top of (and is compatible with) the many software packaging approaches (rpm, tar.gz, etc.)
  • Uses a dependency hierarchy, so one command can drive the installation of a complete environment of many packages
  • Packages organized into caches hosted at various sites
  • How to fetch a package can be cached, rather than the desired object itself
  • Includes a web interface (for each cache) as well as command-line tools

33
  • An encryption package needed by Globus (example pacman cache file entry):

    name        SSLeay
    description Encryption
    url         http://www.psy.uq.oz.au/ftp/Crypto/ssleay
    source      http://www.psy.uq.oz.au/ftp/Crypto/ssleay
    systems     linux-i386 SSLeay-0.9.0b.tar.gz,SSLeay-0.9.0b,\
                linux2 SSLeay-0.9.0b.tar.gz,SSLeay-0.9.0b,\
                sunos5 SSLeay-0.9.0b.tar.gz,SSLeay-0.9.0b
    depends
    exists      /usr/local/bin/perl
    inpath      gcc
    bins
    paths
    enviros
    localdoc    README
    daemons
    install     root ./Configure linux-elf,make clean, \
34
Grid Applications Toolkit
  • Horst Severini, Kaushik De, Ed May, Wensheng Deng, Jerry Gieraltowski (US Test Bed)
  • Repackaged Athena-Atlfast (OO fast detector simulation) for the grid testbed (building on Julian Phillips and the UK effort)
  • Script 1 can run on any Globus-enabled node (requires transfer of 17 MB of source)
  • Script 2 runs on machines with the packaged software preinstalled on the grid site
  • Script 3 runs on AFS-enabled sites (the latest version of the software is used)
  • Other user toolkit contents:
  • check the status of grid nodes
  • submit jobs (without worrying about the underlying middleware or ATLAS software)
  • uses only basic RSL and globus-url-copy (a sketch of such a wrapper follows)
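A sketch of how such a toolkit script can wrap the Globus 2 command-line tools; the gatekeeper contact string, host and file paths are placeholders, not the actual GRAT scripts:

    import subprocess

    GATEKEEPER = "grid01.example.edu/jobmanager-pbs"   # placeholder contact string

    def check_node(contact):
        """Check a grid node by running a trivial remote command via GRAM."""
        result = subprocess.run(["globus-job-run", contact, "/bin/hostname"],
                                capture_output=True, text=True)
        return result.stdout.strip()

    def fetch_output(host, remote_path, local_path):
        """Copy a remote output file back with globus-url-copy (GridFTP)."""
        subprocess.run(["globus-url-copy",
                        f"gsiftp://{host}{remote_path}",
                        f"file://{local_path}"],
                       check=True)

    if __name__ == "__main__":
        print("node answers as:", check_node(GATEKEEPER))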

35
Monitoring Tool
  • GridView - a simple visualization tool using
    Globus Toolkit
  • First native Globus application for ATLAS grid
    (March 2001)
  • Collects information using Globus tools. Archival information is stored in a MySQL server on a different machine. Data are published through a web server on a third machine. (A sketch of this collect-and-archive loop follows after this list.)
  • Plans
  • Java version
  • Better visualization
  • Historical plots
  • Hierarchical MDS information
  • Graphical view of system health
  • New MDS schemas
  • Optimize archived variables
  • Publishing historical information through GIIS
    servers??
  • Explore discovery tools
  • Explore scalability to large systems
  • Patrick McGuigan
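A sketch of that collect-and-archive loop. The host names, table layout and grid-info-search options are assumptions, and sqlite3 stands in here for the MySQL server used by the real tool:

    import sqlite3, subprocess, time

    db = sqlite3.connect("gridview_archive.db")
    db.execute("CREATE TABLE IF NOT EXISTS status (ts INTEGER, host TEXT, raw TEXT)")

    def poll(host):
        # Query the site's MDS/GRIS with the Globus 2 client; options are illustrative.
        out = subprocess.run(["grid-info-search", "-h", host],
                             capture_output=True, text=True).stdout
        db.execute("INSERT INTO status VALUES (?, ?, ?)",
                   (int(time.time()), host, out))
        db.commit()

    for host in ["grid01.example.edu", "grid02.example.edu"]:  # placeholder hosts
        poll(host)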

36
(No Transcript)
37
MDS Information
Listing of available object classes
38
More Details
39
MDS Team
  • Dantong Yu, Patrick McGuigan, Craig Tull, KD, Dan
    Engh
  • Site monitoring
  • publish BNL acas information ?
  • Glue schema testbed ?
  • Software installation
  • pacman information provider
  • Application monitoring
  • Grid monitoring
  • GridView, Ganglia
  • hierarchical GIIS server ?
  • GriPhyN-PPDG GIIS server ?

40
Data Management Architecture
AMI (ATLAS Metadata Interface): query LFNs and their associated attributes and values
MAGDA (MAnager for Grid-based Data): manages replication and physical location
VDC (Virtual Data Catalog): derive and transform LFNs
41
Managing Data -Magda
  • MAnager for Grid-based Data (essentially the
    replica catalogue tool)
  • Designed for managed production and chaotic
    end-user usage
  • Designed for rapid development of components to
    support users quickly, with components later
    replaced by Grid Toolkit elements
  • Deploy as an evolving production tool and as a
    testing ground for Grid Toolkit components
  • GDMP will be incorporated
  • Application in DCs
  • Logical files can optionally be organized into
    collections
  • File management in production: replication between CERN and BNL, BNL data access
  • GDMP integration, replication and end-user data access in DC1
  • Developments:
  • Interface with AMI (ATLAS Metadata Interface, allows queries on Logical File Name collections by users; Grenoble project)
  • Interfaces to the Virtual Data Catalogue (see Alexandre Vaniachine's talk)
  • Interfacing with the hybrid ROOT/RDBMS event store
  • Athena (ATLAS offline framework) integration, further grid integration

Info: http://www.usatlas.bnl.gov/magda/info
Engine: http://www.usatlas.bnl.gov/magda/dyShowMain.pl
T Wenaus, W Deng
42
Magda Architecture Schema
  • MySQL database at the core of the system
  • DB access via perl, C++, java and cgi (perl) scripts; C++ and Java APIs auto-generated from the MySQL DB schema
  • User interaction via web interface and command line
  • Principal components:
  • File catalog covering any file types
  • Data repositories organized into sites, each with its locations
  • Computers with repository access: a host can access a set of sites
  • Replication operations organized into tasks
  • Interfacing to Data Catalogue efforts at Grenoble
  • Will replace with standard components (e.g. GDMP) as they become available for production (a toy sketch of these catalog relationships follows)
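A toy sketch of the catalog relationships just listed (plain Python dataclasses, not the real Magda schema; all names are placeholders):

    from dataclasses import dataclass, field

    @dataclass
    class Site:                      # a data repository site with its locations
        name: str
        locations: list = field(default_factory=list)

    @dataclass
    class Host:                      # a computer that can access a set of sites
        name: str
        sites: list = field(default_factory=list)

    @dataclass
    class LogicalFile:               # a catalog entry with replicas at locations
        lfn: str
        replicas: list = field(default_factory=list)   # (site, location) pairs

    cern = Site("CERN", ["/castor/atlas"])
    bnl = Site("BNL", ["/data/atlas"])
    host = Host("node01.bnl.example", [bnl])           # placeholder host

    f = LogicalFile("dc1.0001.simul.root",             # placeholder LFN
                    replicas=[(cern, "/castor/atlas"), (bnl, "/data/atlas")])

    # A host can read any replica held at a site it has access to.
    print([loc for site, loc in f.replicas if site in host.sites])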

43
Magda Architecture
  • DB access via perl, C++, java and cgi (perl) scripts
  • C++ and Java APIs auto-generated from the MySQL DB schema
  • User interaction via web interface and command
    line

44
Magda Sites
45
Conclusion
  • The Grid is the only viable solution to the ATLAS
    Computing problem
  • The problems of coherence across the Atlantic are
    large
  • ATLAS (and CMS etc) are at the sharp end, so we
    will force the divide to be bridged
  • Many applications have been developed, but need
    to be refined/merged
  • These revise our requirements: we must use LCG/GGF and any other forum to ensure the middleware projects satisfy the real needs; this is not a test bed!
  • The progress so far is impressive and encouraging
  • Good collaborations (especially ATLAS/LHCb)
  • The real worry is scaling up to the full system
  • Money!
  • Manpower!
  • Diplomacy?!