Title: The Grid: What, Why, How, When?
Tony Doyle
Contents
- Data is everywhere
- E-Science
- LHC Motivation
- What are the Challenges?
- What are the limits?
- What is the Grid?
- How does the Grid Work?
- (Security) Interlude
- What is GridPP?
- GridPP in context
- Organisation
- Applications development
- Middleware development
- Hardware deployment
- LCG deployment status
- Is GridPP a Grid?
- What was GridPP1?
- What lies ahead?
- Summary
Data is everywhere
What is done with data?
Nothing
Read it
Listen to it
Analyse it
Watch it
Calculate what the weather is going to do
Calculate how proteins fold
"Job"
Science generates data and might require a Grid?
Earth Observation
Bioinformatics
Astronomy
Digital Curation
Healthcare
?
Collaborative Engineering
e-Science
Dr John Taylor - Director General of Research Councils
"Science increasingly done through distributed
global collaborations enabled by the internet
using very large data collections, terascale
computing resources and high performance
visualisation"
"e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it"
"e-Science will change the dynamic of the way
Science is undertaken"
Why (particularly) the LHC?
1. Rare Phenomena - Huge Background
[Figure: the Higgs signal lies 9 orders of magnitude below the rate of all interactions]
2. Complexity
"When you are face to face with a difficulty you are up against a discovery" (Lord Kelvin)
What are the challenges?
- Must:
- share data between thousands of scientists with multiple interests
- link major (Tier-0, Tier-1) and minor (Tier-1, Tier-2) computer centres
- ensure all data accessible anywhere, anytime
- grow rapidly, yet remain reliable for more than a decade
- cope with different management policies of different centres
- ensure data security
- be up and running routinely by 2007
What are the challenges?
Data Management, Security and Sharing:
1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies
What are the limits on Data?
[Figure: "Advanced Areal Density Trends" (M. Leonhardt, 4-9-02). Areal density (Gb/in²) against year, 1987-2022, showing technical progress for magnetic disk, optical disk, helical tape, serpentine longitudinal tape, parallel track longitudinal tape, tape demos, probe, volumetric optical and atom level storage, against technology boundaries set by the superparamagnetic effect, probe contact area viability and atom surface density. 1 Terabit/in² and 1 PetaBit/in² are marked; the LHC era is highlighted.]
Currently disk capacity doubles every year (or so) for unit cost.
What are the limits on CPU? Moore's Law
"No Exponential is Forever ... but We Can Delay 'Forever'" (ftp://download.intel.com/research/silicon/Gordon_Moore_ISSCC_021003.pdf)
[Figure: technical progress against technology boundaries; LHC era marked]
Currently CPU performance doubles every two years (or so) for unit cost.
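The two doubling times above compound very differently. A minimal sketch, assuming the slides' "(or so)" figures hold as exact exponentials (disk doubling every year, CPU every two years); `growth_factor` and `DOUBLING_YEARS` are illustrative names, not part of any GridPP tool:

```python
# Sketch: capacity per unit cost under a constant doubling time.
# Assumed doubling times, taken from the slides' "(or so)" figures;
# real technology trends are not exact exponentials.
DOUBLING_YEARS = {"disk": 1.0, "cpu": 2.0}


def growth_factor(years: float, doubling_years: float) -> float:
    """Capacity per unit cost after `years`, relative to today (today = 1.0)."""
    return 2.0 ** (years / doubling_years)


if __name__ == "__main__":
    # Roughly three years from late 2004 to LHC start-up in 2007.
    for resource, t_double in DOUBLING_YEARS.items():
        print(f"{resource}: x{growth_factor(3, t_double):.1f} per unit cost")
```

Under these assumptions the same money buys about 8 times more disk but only about 2.8 times more CPU by 2007, which is why CPU is the harder target to plan for.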
Applies to our problem?
Step-1: financial planning
Step-2: compare to (e.g. Tier-1) experiment requirements
Step-3: conclude that more than one centre is needed
Step-4: a Grid?
Ian Foster / Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
Currently network performance doubles every year (or so) for unit cost.
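Steps 1-3 reduce to a ceiling division: what one centre can afford against what the experiments require. A minimal sketch with hypothetical figures (the kSI2k values below are invented for illustration, not GridPP planning numbers):

```python
import math


def centres_needed(requirement: float, capacity_per_centre: float) -> int:
    """Step 3: smallest number of centres whose combined capacity meets the requirement."""
    if capacity_per_centre <= 0:
        raise ValueError("each centre must provide some capacity")
    return math.ceil(requirement / capacity_per_centre)


if __name__ == "__main__":
    # Hypothetical figures, in kSI2k, for illustration only:
    per_centre = 1_500    # Step 1: what one centre's budget buys
    requirement = 10_000  # Step 2: the experiments' requirement
    print(centres_needed(requirement, per_centre))  # 7, so Step 4: a Grid?
```

As soon as the answer is greater than one, the centres have to be made to act as a single resource, which is the question Step 4 poses.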
Electricity Grid
- Analogy with the Electricity Power Grid
Power Stations
Distribution Infrastructure
'Standard Interface'
Computing Grid
Computing and Data Centres
Fibre Optics of the Internet
What is the Grid? Hour Glass
I. Experiment Layer e.g. Portals
II. Application Middleware e.g. Metadata
III. Grid Middleware e.g. Information Services
IV. Facilities and Fabrics e.g. Storage Services
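One common way to motivate the narrow waist of the hour glass (layer III) is to count integrations. This is a standard argument, not one made on the slide: without shared Grid middleware every application layer needs its own adapter to every fabric; with it, each side implements the common interfaces once. The sizes in the example are hypothetical.

```python
def adapters_without_waist(n_apps: int, n_fabrics: int) -> int:
    """Every application layer needs its own adapter to every fabric."""
    return n_apps * n_fabrics


def adapters_with_waist(n_apps: int, n_fabrics: int) -> int:
    """Each application and each fabric implements the shared middleware once."""
    return n_apps + n_fabrics


if __name__ == "__main__":
    # Hypothetical sizes: 4 LHC experiments, 20 computing sites.
    print(adapters_without_waist(4, 20), adapters_with_waist(4, 20))  # 80 24
```

The gap widens as sites join, which is why agreeing on standard Grid middleware matters more than any single site's choices.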
How do I start? http://www.gridpp.ac.uk/start/
- Getting started as a Grid user
- Quick start guide for LCG2: GridPP guide to starting as a user of the Large Hadron Collider Computing Grid.
- Getting an e-Science certificate: in order to use the Grid you need a Grid certificate. This page introduces the UK e-Science Certification Authority, which issues certificates to users. You can get a certificate from here.
- Using the LHC Computing Grid (LCG): CERN's guide to the steps you need to take in order to become a user of the LCG. This includes contact details for support.
- LCG user scenario: this describes, in a practical way, the steps a user has to follow to send and run jobs on LCG and to retrieve and process the output successfully.
- Currently being improved..
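In the LCG-2 user scenario a job is described in a small Job Description Language (JDL) file, which the user submits from a User Interface machine after creating a proxy from their certificate (in LCG-2 this was done with commands such as `grid-proxy-init`, `edg-job-submit`, `edg-job-status` and `edg-job-get-output`; check the LCG user guide for current details). The helper below is hypothetical; it only writes a minimal JDL covering the `Executable`, `Arguments`, `StdOutput`, `StdError` and `OutputSandbox` attributes.

```python
def make_jdl(executable: str, arguments: str = "",
             stdout: str = "std.out", stderr: str = "std.err") -> str:
    """Return a minimal JDL job description as text (hypothetical helper)."""
    lines = [f'Executable = "{executable}";']
    if arguments:
        lines.append(f'Arguments = "{arguments}";')
    lines.append(f'StdOutput = "{stdout}";')
    lines.append(f'StdError = "{stderr}";')
    # The output sandbox lists the files to bring back when the job finishes.
    lines.append(f'OutputSandbox = {{"{stdout}", "{stderr}"}};')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_jdl("/bin/hostname"), end="")
```

Saved to a file such as `hostname.jdl`, this is the kind of description the user would pass to the submission command; the output sandbox is what the user retrieves afterwards.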
Job Submission (behind the scenes)
Replica Catalogue
Information Service
Resource Broker
Authorisation & Authentication
Job Submission Service
Logging & Book-keeping
Compute Element
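A toy model of how these services cooperate when a job arrives. All data structures, names and the ranking rule are hypothetical (a real Resource Broker matches JDL requirements against published resource state and is far richer). The flow is: check the user is authorised, ask the Replica Catalogue where the input data lives, ask the Information Service which Compute Elements have capacity, pick one, and record every decision in Logging & Book-keeping.

```python
from dataclasses import dataclass


@dataclass
class ComputeElement:
    name: str
    free_cpus: int


# Hypothetical stand-ins for the services named on the slide.
REPLICA_CATALOGUE = {"lfn:run42.dat": {"ce-ral", "ce-cern"}}  # file -> CEs close to a replica
INFORMATION_SERVICE = [  # state each CE publishes
    ComputeElement("ce-ral", free_cpus=12),
    ComputeElement("ce-cern", free_cpus=40),
    ComputeElement("ce-glasgow", free_cpus=100),
]
AUTHORISED_SUBJECTS = {"CN=Steven Hawking"}  # outcome of authentication/authorisation
LOGGING_BOOKKEEPING = []  # record of every brokering decision


def resource_broker(subject: str, input_file: str, cpus_needed: int) -> str:
    """Pick a Compute Element for a job; return its name."""
    if subject not in AUTHORISED_SUBJECTS:
        LOGGING_BOOKKEEPING.append(("rejected", subject))
        raise PermissionError(f"{subject} is not authorised")
    near_data = REPLICA_CATALOGUE.get(input_file, set())
    candidates = [ce for ce in INFORMATION_SERVICE
                  if ce.name in near_data and ce.free_cpus >= cpus_needed]
    if not candidates:
        LOGGING_BOOKKEEPING.append(("no-match", subject, input_file))
        raise RuntimeError("no suitable Compute Element")
    # Hypothetical ranking: among CEs near the data, prefer the most free CPUs.
    chosen = max(candidates, key=lambda ce: ce.free_cpus)
    # The Job Submission Service would now hand the job to `chosen`.
    LOGGING_BOOKKEEPING.append(("submitted", subject, chosen.name))
    return chosen.name


if __name__ == "__main__":
    print(resource_broker("CN=Steven Hawking", "lfn:run42.dat", cpus_needed=8))
```

Here `ce-glasgow` has the most free CPUs but holds no replica of the input file, so the sketch sends the job to `ce-cern`: moving the job to the data is usually cheaper than moving the data to the job.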
From Web to Grid: Who do you trust?
No-one?
It depends on what you want (assume it's scientific collaboration)
How do I Authorise?
[Diagram: Authentication Certificates (e.g. CN=Steven Hawking) are checked against a VO Directory and an Authorization Directory; directory entries shown include o=xyz,dc=eu-datagrid,dc=org and o=testbed,dc=eu-datagrid,dc=org.]
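The diagram separates two questions: authentication (does the certificate prove who you are?) and authorisation (is the certificate holder a member of a Virtual Organisation that this site has agreed to serve?). A minimal sketch with hypothetical directory contents and a hypothetical site policy; in spirit it resembles a Globus grid-mapfile, which maps certificate subjects to local accounts, but it is not that file's format.

```python
# Hypothetical VO directory: VO name -> certificate subjects of its members.
VO_DIRECTORY = {
    "testbed": {"CN=Steven Hawking"},
    "xyz": set(),
}
# Hypothetical site policy: accepted VOs and the local account each maps to.
SITE_POLICY = {"testbed": "testbed001"}


def authorise(subject: str, certificate_valid: bool) -> str:
    """Return the local account for `subject`, or raise PermissionError."""
    # Authentication: does the certificate prove who the user is?
    if not certificate_valid:
        raise PermissionError("authentication failed: certificate not valid")
    # Authorisation: is the user in a VO that this site has agreed to serve?
    for vo, members in VO_DIRECTORY.items():
        if subject in members and vo in SITE_POLICY:
            return SITE_POLICY[vo]
    raise PermissionError(f"{subject} is not a member of any VO accepted here")
```

For example, `authorise("CN=Steven Hawking", True)` returns `"testbed001"`. Keeping VO membership separate from site policy lets a VO manage its own members while each site only decides which VOs it trusts.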
What is GridPP?
- 19 UK Universities, CCLRC (RAL & Daresbury) and CERN
- Funded by the Particle Physics and Astronomy Research Council (PPARC)
- GridPP1 - 2001-2004, £17m, "From Web to Grid"
- GridPP2 - 2004-2007, £16m, "From Prototype to Production"
GridPP in Context
[Diagram (not to scale): GridPP shown among the Experiments, Tier-2 Centres, Apps Int, Institutes, the UK Core e-Science Programme and EGEE]
Enabling Grids for E-science in Europe
- Deliver a 24/7 Grid service to European science
- build a consistent, robust and secure Grid network that will attract additional computing resources
- continuously improve and maintain the middleware in order to deliver a reliable service to users
- attract new users from industry as well as science and ensure they receive the high standard of training and support they need
- 100 million euros / 4 years, funded by EU
- >400 software engineers + service support
- 70 European partners
Level-2 Grid
- In future will include services to facilitate collaborative (grid) computing:
- Authentication (PKI X.509)
- Job submission/batch service
- Resource brokering
- Authorisation
- Virtual Organisation management
- Certificate management
- Information service
- Data access/integration (SRB/OGSA-DAI/DQPS)
- National Registry (of registries)
- Data replication
- Data caching
- Grid monitoring
- Accounting
Organisation
[Chart: CB and PMB; a Deployment Board (Tier1/Tier2, Testbeds, Rollout; service specification and provision); a User Board (requirements, application development, user feedback); middleware areas: Metadata, Storage, Workload, Network, Security, Info. Mon.]
Mapping Grid Structures
[Chart mapping GridPP's User Board (requirements, application development, user feedback), Deployment Board (Tier1/Tier2, Testbeds, Rollout; service specification and provision), PMB and middleware areas (Metadata, Workload, Network, Security, Info. Mon., Storage) onto the experiments (Expmts), ARDA, LCG and EGEE]
Application Development
ATLAS
LHCb
CMS
SAMGrid (FermiLab)
BaBar (SLAC)
QCDGrid
PhenoGrid
Middleware Development
Network Monitoring
Configuration Management
Grid Data Management
Storage Interfaces
Information Services
Security
UK Tier-1/A Centre: Rutherford Appleton Laboratory
- High quality data services
- National and international role
- UK focus for international Grid development
- 1000 CPU
- 200 TB Disk
- 60 TB Tape (Capacity 1PB)
UK Tier-2 Centres
- ScotGrid: Durham, Edinburgh, Glasgow
- NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
- SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
- LondonGrid: Brunel, Imperial, QMUL, RHUL, UCL
GridPP Deployment Status
GridPP deployment is part of LCG (currently the largest Grid in the world). The future Grid in the UK is dependent upon LCG releases.
- Three Grids on a global scale in HEP (similar functionality):
- LCG (GridPP): 90 (15) sites, 8700 (1500) CPUs
- Grid3 (USA): 29 sites, 2800 CPUs
- NorduGrid: 30 sites, 3200 CPUs
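The three Grids' figures can be combined to put LCG's size and GridPP's contribution in proportion. A small worked sketch using only the numbers listed above; the variable names are illustrative:

```python
# Figures from the list above: (sites, CPUs).
GRIDS = {
    "LCG": (90, 8700),
    "Grid3 (USA)": (29, 2800),
    "NorduGrid": (30, 3200),
}
GRIDPP = (15, 1500)  # the UK (GridPP) part of LCG

total_sites = sum(sites for sites, _ in GRIDS.values())
total_cpus = sum(cpus for _, cpus in GRIDS.values())
lcg_share_of_hep_cpus = GRIDS["LCG"][1] / total_cpus
gridpp_share_of_lcg_cpus = GRIDPP[1] / GRIDS["LCG"][1]

if __name__ == "__main__":
    print(total_sites, total_cpus)              # 149 14700
    print(f"{lcg_share_of_hep_cpus:.0%}")       # 59%
    print(f"{gridpp_share_of_lcg_cpus:.0%}")    # 17%
```

So LCG holds well over half of the CPUs across the three HEP Grids, and GridPP supplies roughly one sixth of LCG's.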
LCG Overview
- By 2007:
- 100,000 CPUs
- More than 100 institutes worldwide
- building on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)
- prototype went live in September 2003 in 12 countries
- Extensively tested by the LHC experiments during this summer
Deployment Status (26/10/04)
- Incremental releases: significant improvements in reliability, performance and scalability
- within the limits of the current architecture
- scalability is much better than expected a year ago
- Many more nodes and processors than anticipated
- installation problems of last year overcome
- many small sites have contributed to MC productions
- Full-scale testing as part of this year's data challenges
- GridPP "The Grid becomes a reality": widely reported
- British Embassy (USA)
- British Embassy (Russia)
- Technology Sites
Data Challenges
- Ongoing..
- Grid and non-Grid Production
- Grid now significant
- ALICE: 35 CPU Years
- Phase 1 done
- Phase 2 ongoing
LCG
- CMS: 75 M events and 150 TB; first of this year's Grid data challenges
Entering Grid Production Phase..
ATLAS Data Challenge
- 7.7 M GEANT4 events and 22 TB
- UK: 20% of LCG
- Ongoing..
- (3) Grid Production
- 150 CPU years so far
- Largest total computing requirement
- Small fraction of what ATLAS need..
Entering Grid Production Phase..
LHCb Data Challenge
- 424 CPU years (4,000 kSI2k months), 186M events
- UK's input significant (>1/4 total)
- LCG(UK) resource (%):
- Tier-1: 7.7
- Tier-2 sites: London 3.9, South 2.3, North 1.4
- DIRAC (%):
- Imperial 2.0, L'pool 3.1, Oxford 0.1, ScotGrid 5.1
Entering Grid Production Phase..
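The per-site figures can be checked against the ">1/4 total" claim. A small sketch, assuming each figure is a percentage of the whole LHCb data challenge; it also derives the average CPU rating implied by quoting both 424 CPU years and 4,000 kSI2k months:

```python
# Assumption: each value is a percentage of the total LHCb data challenge.
UK_SHARE_PERCENT = {
    "LCG Tier-1": 7.7,
    "LCG Tier-2 London": 3.9,
    "LCG Tier-2 South": 2.3,
    "LCG Tier-2 North": 1.4,
    "DIRAC Imperial": 2.0,
    "DIRAC Liverpool": 3.1,
    "DIRAC Oxford": 0.1,
    "DIRAC ScotGrid": 5.1,
}
TOTAL_CPU_YEARS = 424
TOTAL_KSI2K_MONTHS = 4_000

uk_total_percent = sum(UK_SHARE_PERCENT.values())
uk_cpu_years = TOTAL_CPU_YEARS * uk_total_percent / 100
# kSI2k-years per CPU-year, i.e. the average rating of one CPU in kSI2k.
implied_ksi2k_per_cpu = TOTAL_KSI2K_MONTHS / 12 / TOTAL_CPU_YEARS

if __name__ == "__main__":
    print(f"UK share: {uk_total_percent:.1f}% ({uk_cpu_years:.0f} CPU years)")
    print(f"average CPU: {implied_ksi2k_per_cpu:.2f} kSI2k")
```

The UK sites add up to about 25.6%, consistent with "more than a quarter", or roughly 109 of the 424 CPU years; the average CPU works out at roughly 0.79 kSI2k.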
Paradigm Shift: Transition to Grid
424 CPU Years
- May: 89% : 11% (11% of DC04)
- Jun: 80% : 20% (25% of DC04)
- Jul: 77% : 23% (22% of DC04)
- Aug: 27% : 73% (42% of DC04)
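Weighting each month by its share of DC04 turns the monthly splits into an overall figure. A sketch under a loudly stated assumption: the second number in each split is read as the Grid (LCG) share, which the "Transition to Grid" title suggests but the extracted slide text does not label.

```python
TOTAL_CPU_YEARS = 424
# month -> (first split %, second split % [assumed Grid share], share of DC04 %)
DC04 = {
    "May": (89, 11, 11),
    "Jun": (80, 20, 25),
    "Jul": (77, 23, 22),
    "Aug": (27, 73, 42),
}

# Sanity checks: each split covers the whole month, and the months cover DC04.
assert all(first + second == 100 for first, second, _ in DC04.values())
assert sum(share for _, _, share in DC04.values()) == 100

grid_cpu_years = sum(
    TOTAL_CPU_YEARS * (share / 100) * (grid / 100)
    for _, grid, share in DC04.values()
)

if __name__ == "__main__":
    print(f"{grid_cpu_years:.1f} CPU years on the Grid "
          f"({grid_cpu_years / TOTAL_CPU_YEARS:.0%} of DC04)")
```

On that reading, about 178 of the 424 CPU years (roughly 42%) ran on the Grid, most of it in August alone.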
More Applications
- ZEUS uses LCG
- needs the Grid to respond to increasing demand for MC production
- 5 million Geant events on Grid since August 2004
- QCDGrid
- For UKQCD
- Currently a 4-site data grid
- Key technologies used: Globus Toolkit 2.4, European DataGrid, eXist XML database
- managing a few hundred gigabytes of data
Issues
First large-scale Grid production problems
being addressed at all levels
Is GridPP a Grid?
1. Coordinates resources that are not subject to centralized control: YES. This is why development and maintenance of LCG is important.
2. ... using standard, open, general-purpose protocols and interfaces: YES. VDT (Globus/Condor-G) and EDG/EGEE (gLite) meet this requirement.
3. ... to deliver nontrivial qualities of service: YES. LHC experiments' data challenges over the summer of 2004.
http://www-fp.mcs.anl.gov/foster/Articles/WhatIsTheGrid.pdf
http://agenda.cern.ch/fullAgenda.php?idaa042133
What was GridPP1?
- A team that built a working prototype grid of significant scale:
- > 1,500 (7,300) CPUs
- > 500 (6,500) TB of storage
- > 1000 (6,000) simultaneous jobs
- A complex project where 82% of the 190 tasks for the first three years were completed
A Success: "The achievement of something desired, planned, or attempted"
Aims for GridPP2? From Prototype to Production
[Diagram, 2001 to 2004 to 2007: from separate experiments, resources and multiple accounts (CERN Computer Centre, RAL Computer Centre, 19 UK Institutes), via prototype Grids (CERN Prototype Tier-0 Centre, UK Prototype Tier-1/A Centre, 4 UK Prototype Tier-2 Centres), to 'One' Production Grid (CERN Tier-0 Centre, UK Tier-1/A Centre, 4 UK Tier-2 Centres). Experiments shown: BaBar, D0, CDF, ATLAS, LHCb, CMS, ALICE. Grid projects shown: BaBarGrid, SAMGrid, EDG, GANGA, ARDA, LCG, EGEE.]
Planning: GridPP2 ProjectMap
Structures agreed and in place (except LCG phase-2)
What lies ahead? Some mountain climbing..
Annual data storage: 12-14 PetaBytes per year
CD stack with 1 year of LHC data (~20 km)
Concorde (15 km)
We are here (1 km)
100 Million SPECint2000 ≈ 100,000 PCs (3 GHz Pentium 4)
Importance of step-by-step planning: pre-plan your trip, carry an ice axe and crampons and arrange for a guide.
In production terms, we've made base camp.
Quantitatively, we're 9% of the way there in terms of CPU (9,000 ex 100,000) and disk (3 ex 12-14 × 3 years).
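The mountain analogy rests on a few lines of arithmetic. A sketch assuming a 700 MB CD that is 1.2 mm thick and decimal units (1 PB = 10^6 GB); these CD parameters are assumptions, not figures from the talk:

```python
CD_CAPACITY_GB = 0.7    # assumption: a 700 MB CD-R
CD_THICKNESS_MM = 1.2   # assumption: bare disc, no case


def cd_stack_km(petabytes: float) -> float:
    """Height of a stack of CDs holding `petabytes` of data, in km."""
    n_cds = petabytes * 1e6 / CD_CAPACITY_GB  # 1 PB = 1e6 GB (decimal units)
    return n_cds * CD_THICKNESS_MM / 1e6      # mm -> km


# "100 Million SPECint2000 ≈ 100,000 PCs" implies the rating of one PC.
si2k_per_pc = 100e6 / 100_000
# Progress so far, using the figures on the slide.
cpu_fraction = 9_000 / 100_000
disk_fraction_range = (3 / (14 * 3), 3 / (12 * 3))  # 3 ex 12-14 × 3 years

if __name__ == "__main__":
    print(f"{cd_stack_km(12):.1f}-{cd_stack_km(14):.1f} km of CDs per year")
    print(f"{si2k_per_pc:.0f} SPECint2000 per PC")
    print(f"CPU {cpu_fraction:.0%}, disk {disk_fraction_range[0]:.0%}-{disk_fraction_range[1]:.0%}")
```

With these assumptions one year of data makes a stack of about 21-24 km, consistent with the ~20 km quoted; each 3 GHz Pentium 4 counts as about 1,000 SPECint2000; and progress is 9% on CPU and roughly 7-8% on disk.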
Summary
1. Why? 2. What? 3. How? 4. When?
From the Particle Physics perspective the Grid is:
1. needed to utilise large-scale computing resources efficiently and securely
2. a) a working prototype running today on large testbed(s)
   b) about seamless discovery of computing resources
   c) using evolving standards for interoperation
   d) the basis for computing in the 21st Century
   e) not (yet) as transparent as end-users want it to be
3. see the GridPP getting started pages (two-day EGEE training courses available)
4. a) now, at prototype level, for simple(r) applications (e.g. experiment Monte Carlo production)
   b) September 2007 for more complex applications (e.g. data analysis), ready for the LHC