Title: Achieving the Vision: Grid2003 and Beyond
Slide 1: Achieving the Vision: Grid2003 and Beyond
Paul Avery, University of Florida (avery@phys.ufl.edu)
Slide 2: Grid2003
- Running since Oct. 2003
- 27 sites (U.S., Korea)
- 2100-2800 CPUs
- 700-1100 concurrent jobs
- 10 applications
http://www.ivdgl.org/grid2003
Slide 3: Grid2003: A Collaborative Effort
- Participating sites in U.S. Trillium Grid projects
  - PPDG (Particle Physics Data Grid)
  - GriPhyN
  - iVDGL (International Virtual Data Grid Laboratory)
  - (US-ATLAS, US-CMS, BTeV, LIGO, SDSS, Computer Science)
- US-ATLAS and US-CMS effort
  - Fermilab, LBL, Argonne
  - U. New Mexico, U. Texas Arlington
- International sites affiliated with LHC experiments
  - Kyungpook National University (Korea, CMS)
- New sites
  - University at Buffalo (CCR)
Slide 4: Grid2003: A Federated Approach
- Federation: example from US-LHC testbeds
  - Local responsibility for facilities, but reporting to US-LHC projects
  - Systems and support, local resources with well-defined interfaces
  - General grid-wide services provided by (some) sites
- Six distinct Virtual Organizations (VOs) within Grid2003
  - US-ATLAS
  - US-CMS
  - BTeV
  - LIGO
  - SDSS
  - iVDGL
Slide 5: Organization in the Grid2003 Federation
- Grid sites: autonomy, control, agreements, policies
  - Set up and manage systems (mix of local and Grid use)
  - Install and configure middleware on head nodes
  - Automate central monitoring, validation, diagnosis
- Grid system services
  - Collaborative approach to bringing up cross-site services (VO management, monitoring, configuration management)
  - Interfaces well defined through VDT and services
  - Robust against single points of failure
- Grid application groups
  - 10 applications in several domains
  - End-to-end operations, diagnosis, and production services
Slide 6: Middleware Packaging and Distribution
- Virtual Data Toolkit (VDT) from GriPhyN
  - Globus, Condor, GriPhyN Chimera, Pegasus, DAGMan
- Pacman from iVDGL
  - Meta-packaging and distribution tool
- VO management scripts from EDG
  - Mapping user accounts across multiple VOs
- Schema and information providers
  - From the joint DataTAG/EDG/Trillium GLUE project
- MonALISA monitoring framework from Caltech
- NetLogger monitoring package from DOE Science Grid
- Upgrades during the project
  - VDT 1.1.9 to VDT 1.1.11
  - MDS 2.2 to MDS 2.4
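The VO account mapping mentioned above amounts to assigning each grid certificate DN a stable local pool account drawn from its VO's pool. A minimal sketch of that idea, in the spirit of the classic grid-mapfile; the class, account-naming scheme, and DNs below are illustrative, not the actual EDG scripts:

```python
# Sketch of grid-mapfile-style DN-to-account mapping for multiple VOs.
# Pool-account naming (e.g. "uscms001") is an assumption for illustration.

from itertools import count

class VoMapper:
    def __init__(self, vo_pools):
        # vo_pools: VO name -> local pool-account prefix
        self.vo_pools = vo_pools
        self.assigned = {}                                # DN -> account
        self.counters = {vo: count(1) for vo in vo_pools}

    def map_dn(self, dn, vo):
        """Return a stable local pool account for a certificate DN."""
        if dn not in self.assigned:
            prefix = self.vo_pools[vo]
            self.assigned[dn] = f"{prefix}{next(self.counters[vo]):03d}"
        return self.assigned[dn]

mapper = VoMapper({"uscms": "uscms", "usatlas": "usatlas"})
acct1 = mapper.map_dn("/DC=org/DC=doegrids/CN=Alice Example", "uscms")
acct2 = mapper.map_dn("/DC=org/DC=doegrids/CN=Alice Example", "uscms")
acct3 = mapper.map_dn("/DC=org/DC=doegrids/CN=Bob Example", "usatlas")
```

The key property is stability: the same DN always maps back to the same local account, so a user's files and jobs stay associated with one identity per site.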
Slide 7: Applications Run on Grid2003
- High-energy physics
  - US-ATLAS analysis (DIAL), US-ATLAS simulation (GCE)
  - US-CMS simulation (MOP)
  - BTeV simulation
- Gravity waves
  - LIGO blind search for continuous sources
- Digital astronomy
  - SDSS cluster finding (maxBcg)
- Bioinformatics
  - Bio-molecular analysis (SnB)
  - Genome analysis (GADU/Gnare)
- CS demonstrators
  - Job Exerciser, GridFTP Demo, NetLogger-grid2003
Slide 8: Grid2003: A Necessary Step
- Learning how to cope with large scale
  - Interesting failure modes emerge as scale increases
  - Enormous human burden, barely possible on the SC2003 timescale
  - Previous experience from Grid testbeds was critical
- Learning how to operate a Grid
  - Add sites, recover from errors, provide information, update software, test applications, ...
  - Need tools, services, procedures, documentation, organization
  - Need reliable, intelligent, skilled people
- Learning how to delegate responsibilities
  - Multiple levels: project, VO, service, site, application
  - Essential for future growth
- Grid2003 experience critical for building useful Grids
  - See "Grid2003 Project Lessons" for details
Slide 9: Grid2003: A Success Story!
- Much larger than originally planned
  - More sites, CPUs, simultaneous jobs
  - More applications (10) in more diverse areas
- Able to accommodate a new institution and application
  - U. Buffalo
- Survived updates of critical software
  - VDT, MDS, MonALISA
- Still operational after 2.5 months
- US-CMS using it for production simulations
  - Twice the resources available in US-CMS alone (next slide)
Slide 10: US-CMS Production
[Chart: US-CMS production jobs, split between USCMS and non-USCMS resources]
Slide 11: Lesson 1: Building Stuff Matters
- Building something brings out the best in people (similar to a large HEP detector)
  - Cooperation
  - Willingness to invest time
  - Striving for excellence!
- Grid development requires significant deployments
  - CMS testbed: debugging Globus, Condor (early 2002)
  - ATLAS testbed: early development of Grid tools
  - SDSS, LIGO, CMS: virtual data tools
- Powerful training mechanism
  - Good starting point for new institutions
Slide 12: Lesson 2: Packaging Matters
- VDT and Pacman
  - Simple installation and configuration of Grid tools (and applications)
  - Hugely important for first testbeds in 2002
  - Major advances over 13 VDT releases
  - Great improvements expected in Pacman 3
- Packaging is a strategic issue!
  - More than a convenience: crucial to our future success
  - Packaging → uniformity and automation → lower barriers to scaling
- Automation is the next frontier
  - Reduce FTE overhead, communication traffic
  - Automate installation, configuration, testing, validation
  - Automate software updates, enable remote installation, etc.
  - Develop a complete Grid2003 installation in Pacman 3?
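The automation goals above reduce, in essence, to running a fixed battery of install/validation checks against every site and reporting what failed. A toy sketch of such a validation loop; the site records, check functions, and version threshold are hypothetical stand-ins for real Pacman installs and grid service probes:

```python
# Toy sketch of automated site validation across a Grid2003-style
# federation. Sites and checks are illustrative assumptions only.

def check_middleware(site):
    # Require the upgraded VDT release (1.1.11) mentioned on slide 6.
    return site["vdt_version"] >= (1, 1, 11)

def check_head_node(site):
    return site["head_node_up"]

CHECKS = [("middleware", check_middleware), ("head node", check_head_node)]

def validate(sites):
    """Run every check against every site; map site name -> failed checks."""
    report = {}
    for site in sites:
        failed = [name for name, check in CHECKS if not check(site)]
        report[site["name"]] = failed
    return report

sites = [
    {"name": "UFlorida", "vdt_version": (1, 1, 11), "head_node_up": True},
    {"name": "KNU",      "vdt_version": (1, 1, 9),  "head_node_up": True},
]
report = validate(sites)
```

Running the battery centrally and on a schedule is what turns per-site manual effort into the lower FTE overhead the slide argues for.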
Slide 13: Grid2003 and Beyond (1)
- Continuing commitment of Grid2003 stakeholders
  - Deploy Functional Demonstration Grids: Grid2004, Grid2005, ...
- Continuing evolution of Functional Demonstration Grids
  - New release every 6-12 months, with increasing functionality and scale
- Continuing commitment to Grid and related R&D
  - CS research, VDT improvements (GriPhyN, PPDG)
  - Security (PPDG)
  - Advanced monitoring (MonALISA/GEMS, MDS, ...)
  - Collaborative tools, e.g. VRVS, AG, ...
Slide 14: Grid2003 and Beyond (2)
- Continuing development of new tools and services
  - Grid-enabled analysis
  - UltraLight infrastructures: CPU + storage + optical networks
- Continuing development and exploitation of networks
  - National: HENP WG on Internet2, National Lambda Rail
  - International: SCIC, AMPATH, world data-transfer speed records
Slide 15: Chimera Virtual Data System
- Virtual Data Language (VDL)
  - Describes virtual data products
- Virtual Data Catalog (VDC)
  - Used to store VDL
- Abstract job-flow planner
  - Creates a logical DAG (dependency graph)
- Concrete job-flow planner
  - Interfaces with a Replica Catalog
  - Provides a physical DAG submission file to Condor-G
- Generic and flexible
  - As a toolkit and/or a framework
  - In a Grid environment or locally
[Diagram: VDL (XML) → Virtual Data Catalog → Abstract Planner → DAX (XML) → Concrete Planner + Replica Catalog → DAG → DAGMan; virtual data applied to CMS production via MCRunJob]
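The two-stage planning above, deriving a logical DAG from declared data dependencies and then binding logical file names to physical replicas, can be sketched roughly as follows. The derivation tuples, catalog contents, and function names are illustrative only, not Chimera's actual VDL syntax or API:

```python
# Rough sketch of Chimera-style two-stage planning: derivations declare
# inputs/outputs (abstract DAG); a concrete planner then binds logical
# file names to physical replicas. All names here are illustrative.

# Abstract stage: derivations as (transformation, inputs, outputs).
derivations = [
    ("gen",  [],             ["events.lfn"]),
    ("sim",  ["events.lfn"], ["hits.lfn"]),
    ("reco", ["hits.lfn"],   ["tracks.lfn"]),
]

def abstract_dag(derivations):
    """Logical DAG: an edge from the producer of a file to its consumer."""
    producer = {out: tr for tr, _, outs in derivations for out in outs}
    return [(producer[i], tr) for tr, ins, _ in derivations for i in ins]

# Concrete stage: resolve logical names against a replica catalog.
replica_catalog = {"events.lfn": "gsiftp://site-a/data/events.root"}

def concrete_plan(derivations, catalog):
    """Jobs to run, with inputs bound to physical replicas where known."""
    plan = []
    for tr, ins, outs in derivations:
        bound = {i: catalog.get(i, f"<produce {i} upstream>") for i in ins}
        plan.append((tr, bound, outs))
    return plan

edges = abstract_dag(derivations)
plan = concrete_plan(derivations, replica_catalog)
```

In the real system the abstract result is the DAX document and the concrete result is a DAG submission file handed to DAGMan/Condor-G; this sketch only shows the dependency and binding logic those steps share.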
Slide 16: National Lambda Rail Footprint
- Started in 2003
- Initial: 4 × 10 Gb/s
- Future: 40 × 10 Gb/s
Slide 17: UltraLight
Unified infrastructure: computing, storage, networking
- 10 Gb/s network
- Caltech, UF, FIU, UM, MIT
- SLAC, FNAL, BNL
- International partners
- Cisco, Level(3), Internet2
Slide 18: Grid2003 and Beyond (3)
- Continuing commitment to international collaboration
  - Close coordination with LCG
  - Constant participation in LHC production computing exercises
  - Development of new international partners (Brazil, Korea, ...)
  - GLORIAD, ITER, ...
- Continuing commitment to multi-disciplinary activities
  - HEP, CS, LIGO, astronomy, biology, coastal engineering, ...
- Continuing evolution of interactions with funding agencies
  - Partnership of DOE (labs) and NSF (universities)
  - Close interaction of directorates within NSF (e.g., CHEPREO)
- Continuing commitment to coordinated outreach
  - QuarkNet, GriPhyN, iVDGL, PPDG, CHEPREO, CMS, ATLAS
  - Jan. 29-30 Needs Assessment Workshop in Miami
  - Digital Divide efforts (Feb. 15-20 Rio workshop)
Slide 19: An Inter-Regional Center for High Energy Physics Research and Educational Outreach (CHEPREO) at Florida International University
- E/O Center in the Miami area
- iVDGL Grid activities
- CMS research
- AMPATH network (S. America)
Funded September 2003
Slide 20: Is Grid2003 a Path to Open Science Grid?
- Yes (previous slides), but not the whole story; still needed:
  - Security
  - User account management
  - Storage management
  - Cluster management
  - Accounting
  - Database integration
  - Optical network integration
  - Heterogeneity (IA64, G5, other Linux flavors)
  - MPI-type applications
  - More (applications, manpower, computing resources)
- We need collaborators!