Title: eScience
1. eScience & Grid Computing - Graduate Lecture
- 5th November 2012
- Robin Middleton, PPD/RAL/STFC
- (Robin.Middleton@stfc.ac.uk)
I am indebted to the EGEE, EGI, LCG and
GridPP projects and to colleagues therein for
much of the material presented here.
2. eScience Graduate Lecture
A high-level look at some aspects of computing
for particle physics today
- What is eScience, what is the Grid ?
- Essential grid components
- Grids in HEP
- The wider picture
- Summary
3. What is eScience?
- also e-Infrastructure, cyberinfrastructure, e-Research, ...
- Includes
- grid computing (e.g. WLCG, EGEE, EGI, OSG, TeraGrid, NGS)
- computationally and/or data intensive, highly distributed over wide area
- digital curation
- digital libraries
- collaborative tools (e.g. Access Grid)
- other areas
- Most UK Research Councils active in e-Science
- BBSRC
- NERC (e.g. climate studies, NERC DataGrid - http://ndg.nerc.ac.uk/ )
- ESRC (e.g. NCeSS - http://www.merc.ac.uk/ )
- AHRC (e.g. studies in collaborative performing arts)
- EPSRC (e.g. MyGrid - http://www.mygrid.org.uk/ )
- STFC (e.g. GridPP - http://www.gridpp.ac.uk/ )
4. eScience - year 2000
- Professor Sir John Taylor, former (1999-2003) Director General of the UK Research Councils, defined eScience thus:
- "science increasingly done through distributed global collaborations enabled by the internet, using very large data collections, terascale computing resources and high performance visualisation."
- Also quotes from Professor Taylor:
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
- "e-Science will change the dynamic of the way science is undertaken."
5. What is Grid Computing?
- Grid Computing
- term invented in the 1990s as a metaphor for making computer power as easy to access as the electric power grid (Foster & Kesselman - "The Grid: Blueprint for a new computing infrastructure")
- combines computing resources from multiple administrative domains - CPU and storage, loosely coupled
- serves the needs of one or more virtual organisations (e.g. LHC experiments)
- different from
- Cloud Computing (e.g. Amazon Elastic Compute Cloud - http://aws.amazon.com/ec2/ )
- Volunteer Computing (SETI@home, LHC@home - http://boinc.berkeley.edu/projects.php )
6. Essential Grid Components
- Middleware
- Information System
- Workload Management & Portals
- Data Management
- File transfer
- File catalogue
- Security
- Virtual Organisations
- Authentication
- Authorisation
- Accounting
7. Information System
- At the heart of the Grid
- Hierarchy of BDII (LDAP) servers
- GLUE information schema
- (http://www.ogf.org/documents/GFD.147.pdf)
- LDAP (Lightweight Directory Access Protocol)
- tree structure
- DN: Distinguished Name
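As an illustration (not from the original slides), the short Python sketch below queries a BDII with the python-ldap bindings. The hostname is a made-up placeholder; port 2170, the o=grid base DN and the GlueService attributes follow the usual GLUE 1.x conventions, but a real query would need adapting to the infrastructure in question.

    # Hedged sketch: query a (hypothetical) top-level BDII for storage/compute services
    # using python-ldap. BDIIs conventionally listen on port 2170 with base "o=grid".
    import ldap

    BDII_URL = "ldap://bdii.example.org:2170"   # placeholder endpoint
    BASE_DN = "o=grid"                          # conventional GLUE base DN

    conn = ldap.initialize(BDII_URL)
    conn.simple_bind_s()                        # anonymous bind is the norm for a BDII

    # Ask for GLUE 1.x service entries and print their type and endpoint
    results = conn.search_s(
        BASE_DN,
        ldap.SCOPE_SUBTREE,
        "(objectClass=GlueService)",
        ["GlueServiceType", "GlueServiceEndpoint"],
    )
    for dn, attrs in results:
        print(dn, attrs.get("GlueServiceType"), attrs.get("GlueServiceEndpoint"))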
8. Workload Management System (WMS)
- For example, composed of the following parts
- User Interface (UI): access point for the user to the WMS
- Resource Broker (RB): the broker of Grid resources, responsible for finding the best resources to which to submit jobs
- Job Submission Service (JSS): provides a reliable submission system
- Information Index (BDII): a server (based on LDAP) which collects information about Grid resources, used by the Resource Broker to rank and select resources
- Logging and Bookkeeping services (LB): store job information, available for users to query
- However, you are much more likely to use a portal to submit work

Example JDL:
    Executable         = "gridTest";
    StdError           = "stderr.log";
    StdOutput          = "stdout.log";
    InputSandbox       = {"/home/robin/test/gridTest"};
    OutputSandbox      = {"stderr.log", "stdout.log"};
    InputData          = "lfn:testbed0-00019";
    DataAccessProtocol = "gridftp";
    Requirements       = other.Architecture == "INTEL" && \
                         other.OpSys == "LINUX" && other.FreeCpus > 4;
    Rank               = other.GlueHostBenchmarkSF00;
9. Portals - Ganga
- Job Definition & Management
- Implemented in Python
- Extensible plug-ins
- Used by ATLAS, LHCb & non-HEP communities
- http://ganga.web.cern.ch/ganga/index.php
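A minimal Ganga session might look like the sketch below (an illustration, not taken from the slides). The executable path is the one from the JDL example above, the LCG backend is an assumption, and attribute names can differ between Ganga versions; inside the Ganga shell the Job/Executable/LCG names are already in scope.

    # Hedged sketch of a Ganga (Python) session: define a simple job and send it
    # to the Grid. Executable path and backend choice are illustrative assumptions.
    from ganga import Job, Executable, LCG   # not needed inside the Ganga shell itself

    j = Job()
    j.name = "gridTest"
    j.application = Executable(exe="/home/robin/test/gridTest")
    j.backend = LCG()                        # route the job via the gLite/LCG WMS
    j.submit()

    print(j.status)                          # e.g. "submitted" -> "running" -> "completed"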
10. Data Management
- Storage Element (SE)
- >1 implementation
- all are accessed through the SRM (Storage Resource Manager) interface
- DPM: Disk Pool Manager (disk only)
- secure authentication via GSI, authorisation via VOMS
- full POSIX ACL support with DN (userid) and VOMS groups
- disk pool management (direct socket interface)
- storage name space (aka storage file catalog)
- DPM can act as a site-local replica catalog
- SRMv1, SRMv2.1 and SRMv2.2
- gridFTP, rfio
- dCache (disk + tape) developed at DESY
- ENSTORE developed at Fermilab
- CASTOR developed at CERN
- CERN Advanced STORage manager
- HSM: Hierarchical Storage Manager
- disk cache + tape
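To make the naming concrete (an illustration, not from the slides): an SRM Storage URL (SURL) names a file on a particular SE, and the SE's SRM interface resolves it to a transport URL for a protocol such as gridFTP or rfio. The SURL below is invented; the sketch simply pulls it apart with the Python standard library.

    # Hedged sketch: dissect a made-up SRM SURL.
    from urllib.parse import urlparse

    surl = "srm://dpm.example.org:8446/dpm/example.org/home/atlas/data/events.root"
    parts = urlparse(surl)

    print(parts.scheme)    # "srm"   - the file is reached via the SE's SRM interface
    print(parts.hostname)  # "dpm.example.org" - the Storage Element (here a DPM)
    print(parts.path)      # the entry in the SE's storage name space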
11. File Transfer Service
- File Transfer Service is a data movement fabric service
- multi-VO service, balances usage of site resources according to VO and site policies
- uses the SRM and gridFTP services of a Storage Element (SE)
- Why is it needed?
- For the user, the service it provides is the reliable point-to-point movement of Storage URLs (SURLs) among Storage Elements
- For the site manager, it provides a reliable and manageable way of serving file movement requests from their VOs
- For the VO manager, it provides the ability to control requests coming from users (re-ordering, prioritization, ...)
12. File Catalogue
- LFC: LHC File Catalogue - a file location service
- Glossary
- LFN: Logical File Name; GUID: Global Unique ID; SURL: Storage URL
- Provides a mapping from one or more LFNs to the physical location of a file
- Authentication & authorisation is via a grid certificate
- Provides very limited metadata: size, checksum
- Experiments usually have a metadata catalogue layered above the LFC
- e.g. AMI: ATLAS Metadata Interface
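To illustrate the mapping the LFC provides (all names below are invented), a single logical file name resolves via a GUID to one or more physical replicas, plus the very limited metadata mentioned above:

    # Hedged illustration, with made-up names, of the LFC mapping:
    # one LFN -> one GUID -> one or more replica SURLs (+ size and checksum).
    catalogue = {
        "lfn:/grid/atlas/user/robin/events.root": {
            "guid": "a1b2c3d4-0000-1111-2222-333344445555",
            "replicas": [
                "srm://dpm.site-a.example.org/dpm/site-a.example.org/home/atlas/events.root",
                "srm://dcache.site-b.example.org/pnfs/site-b.example.org/data/atlas/events.root",
            ],
            "metadata": {"size": 1234567890, "checksum": "ad0c38f1"},
        }
    }

    # A job looks up the LFN and picks a convenient replica.
    for surl in catalogue["lfn:/grid/atlas/user/robin/events.root"]["replicas"]:
        print(surl)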
13. Grid Security
- Based around X.509 certificates & Public Key Infrastructure (PKI)
- issued by Certificate Authorities
- forms a hierarchy of trust
- Glossary
- CA: Certificate Authority
- RA: Registration Authority
- VA: Validation Authority
- How it Works
- User applies for a certificate with a public key at an RA
- RA confirms the user's identity to the CA, which in turn issues the certificate
- User can then digitally sign a contract using the new certificate
- User identity is checked by the contracting party with the VA
- VA receives information about issued certificates from the CA
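As an illustration (not from the slides), the sketch below opens a user certificate with the third-party cryptography Python package and prints the subject DN and the issuing CA, i.e. the two ends of the trust relationship described above. The ~/.globus/usercert.pem path is the conventional location and is an assumption here.

    # Hedged sketch: inspect an X.509 grid certificate (assumed to live in ~/.globus/).
    from pathlib import Path
    from cryptography import x509

    pem = Path.home().joinpath(".globus", "usercert.pem").read_bytes()
    cert = x509.load_pem_x509_certificate(pem)

    print("Subject DN :", cert.subject.rfc4514_string())  # who the certificate identifies
    print("Issuer DN  :", cert.issuer.rfc4514_string())   # the CA that vouches for them
    print("Valid until:", cert.not_valid_after)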
14. Virtual Organisations
- Aggregation of groups (& individuals) sharing use of (distributed) resources to a common end under an agreed set of policies
- a semi-informal structure orthogonal to normal institutional allegiances
- e.g. a HEP experiment
- Grid Policies
- Acceptable use, Grid Security, New VO registration
- http://proj-lcg-security.web.cern.ch/proj-lcg-security/security_policy.html
- VO-specific environment
- experiment libraries, databases, ...
- resource sites declare which VOs they will support
15. Security - The Three As
- Authentication
- verifying that you are who you say you are
- your Grid Certificate is your passport
- Authorisation
- knowing who you are, validating what you are permitted to do
- e.g. submit analysis jobs as a member of LHCb
- e.g. VO capability to manage production software
- Accounting (auditing)
- local logging: what you have done & your jobs!
- aggregated into a grid-wide repository
- provides
- usage statistics
- information source in event of security incident
16. Grids in HEP
- LCG, EGEE & EGI Projects
- GridPP
- The LHC Computing Grid
- Tiers 0,1,2
- The LHC OPN
- Experiment Computing Models
- Typical data access patterns
- Monitoring
- Resource provider's view
- VO view
- End-user view
17. LCG → EGEE → EGI
LCG = LHC Computing Grid: Distributed Production Environment for Physics Data Processing. World's largest production computing grid. In 2011: >250,000 CPU cores, 15 PB/yr, 8000 physicists, 500 institutes.
EGEE = Enabling Grids for E-sciencE: starts from the LCG infrastructure. Production Grid in 27 countries. HEP, BioMed, CompChem, Earth Science, ... EU support.
18. GridPP
- Phase 1: 2001-2004
- Prototype (Tier-1)
- Phase 2: 2004-2008
- From Prototype to Production
- Production (Tier-1 & 2)
- Phase 3: 2008-2011
- From Production to Exploitation
- Reconstruction, Monte Carlo, Analysis
- Phase 4: 2011-2014
- routine operation during LHC running
- Integrated within the LCG/EGI framework
- UK Service Operations (LCG/EGI)
- Tier-1 & Tier-2s
- HEP Experiments
- @ LHC, FNAL, SLAC
- GANGA (LHCb & ATLAS)
- Working with NGS & informing the UK NGI for EGI
(Figure: Tier-1 Farm Usage)
19. LCG - The LHC Computing Grid
- Worldwide LHC Computing Grid - http://lcg.web.cern.ch/lcg/
- Framework to deliver distributed computing for the LHC experiments
- Middleware / Deployment
- (Service/Data Challenges)
- Security (operations policy)
- Applications (Experiment) Software
- Distributed Analysis
- Private Optical Network
- Experiments → Resources → MoUs
- Coverage
- Europe → EGI
- USA → OSG
- Asia → Naregi, Taipei, China
- Other
20. LHC Computing Model
(Figure: The LHC Computing Centre - CERN Tier 0)
21. LHCOPN - Optical Private Network
- Principal means to distribute LHC data
- Primarily linking Tier-0 and Tier-1s
- Some Tier-1 to Tier-1 traffic
- Runs over leased lines
- Some resilience
- Mostly based on 10 Gigabit technology
- Reflects Tier architecture
22. LHC Experiment Computing Models
- General (ignoring experiment specifics)
- Tier-0 (@CERN)
- 1st pass reconstruction (including initial calibration)
- RAW data storage
- Tier-1
- Re-processing & some centrally organised analysis
- Custodial copy of RAW data, some ESD, all AOD, some SIMU
- Tier-2
- (chaotic) user analysis & simulation
- some AOD (depends on local requirements)
- Event sizes determine disk buffers at experiments & Tier-0
- Event datasets
- formats (RAW, ESD, AOD, etc.)
- (adaptive) placement (near analysis) & replicas
- Data streams: physics-specific, debug, diagnostic, express, calibration
- CPU & storage requirements
- Simulation
23. Typical Data Access Patterns
Typical LHC particle physics experiment: one year of acquisition and analysis of data.

Dataset sizes:
- Raw Data: 1000 TB
- Reco-V1: 1000 TB
- Reco-V2: 1000 TB
- ESD-V1.1, ESD-V1.2, ESD-V2.1, ESD-V2.2: 100 TB each
- AOD: 9 × 10 TB

Access rates (aggregate, average):
- 100 Mbytes/s (2-5 physicists)
- 500 Mbytes/s (5-10 physicists)
- 1000 Mbytes/s (50 physicists)
- 2000 Mbytes/s (150 physicists)
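Summing the nominal figures above gives a feel for the scale (a back-of-envelope illustration, not from the slides):

    # Back-of-envelope sum of the nominal dataset sizes listed above, in TB.
    raw  = 1 * 1000          # Raw Data
    reco = 2 * 1000          # Reco-V1, Reco-V2
    esd  = 4 * 100           # ESD-V1.1, V1.2, V2.1, V2.2
    aod  = 9 * 10            # nine AOD sets

    total_tb = raw + reco + esd + aod
    print(total_tb, "TB, i.e. about", total_tb / 1000, "PB per experiment per year")  # 3490 TB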
24. Monitoring
- A resource provider's view
25. Monitoring
- Virtual Organisation specifics
26. Monitoring - Dashboards
- Virtual Organisation view
- e.g. ATLAS dashboard
27. Monitoring - Dashboards
- For the end user
- available through dashboard
28. The Wider Picture
- What some other communities do with Grids
- The ESFRI projects
- Virtual Instruments
- Digital Curation
- Clouds
- Volunteer Computing
- Virtualisation
29. What are other communities doing with grids?
- Astronomy & Astrophysics
- large-scale data acquisition, simulation, data storage/retrieval
- Computational Chemistry
- use of software packages (incl. commercial) on EGEE
- Earth Sciences
- Seismology, Atmospheric modeling, Meteorology, Flood forecasting, Pollution
- Fusion (build up to ITER)
- Ion Kinetic Transport, Massive Ray Tracing, Stellarator Optimization
- Computer Science
- collect data on Grid behaviour (Grid Observatory)
- High Energy Physics
- four LHC experiments, BaBar, D0, CDF, Lattice QCD, Geant4, SixTrack, ...
- Life Sciences
- Medical Imaging, Bioinformatics, Drug discovery
- WISDOM: drug discovery for neglected / emergent diseases (malaria, H5N1, ...)
30. ESFRI Projects (European Strategy Forum on Research Infrastructures)
- Many are starting to look at their e-Science needs
- some at a similar scale to the LHC (petascale)
- project / design study stage
- http://cordis.europa.eu/esfri/
(Figure: Cherenkov Telescope Array)
31. Virtual Instruments
- Integration of scientific instruments into the Grid
- remote operation, monitoring, scheduling, sharing
- GridCC - Grid enabled Remote Instrumentation with Distributed Control and Computation
- CR: build workflows to monitor & control remote instruments in real time
- CE, SE, ES, IS & SS as in a normal grid
- Monitoring services
- Instrument Element (IE)
- interfaces for remote control & monitoring
- CMS run control includes an IE, but not really exploited (yet)!
- DORII - Deployment Of Remote Instrumentation Infrastructure
- Consolidation of GridCC with EGEE, g-Eclipse, Open MPI, VLab
- The Liverpool Telescope - robotic
- not just remote control, but fully autonomous
- scheduler operates on the basis of an observing database
- (http://telescope.livjm.ac.uk/)
32. Digital Curation
- Preservation of digital research data for future use
- Issues
- media, data formats, metadata, data management tools, reading (FORTRAN), ...
- digital curation lifecycle - http://www.dcc.ac.uk/digital-curation/what-digital-curation
- Digital Curation Centre - http://www.dcc.ac.uk/
- NOT a repository!
- strategic leadership
- influence national (& international) policy
- expert advice for both users and funders
- maintains suite of resources and tools
- raise levels of awareness and expertise
33. JADE (1978-86)
- New results from old data
- new & improved theoretical calculations, MC models & optimised observables
- better understanding of the Standard Model (top, W, Z)
- re-do measurements: better precision, better systematics
- new measurements, but at (lower) energies not available today
- new phenomena: check at lower energies
- Challenges
- rescue data from (very) old media, resurrect old software & data management, implement modern analysis techniques
- but luminosity files were lost & recovered from an ASCII printout in an office cleanup
- Since 1996
- 10 publications (as recent as 2009)
- 10 conference contributions
- a few PhD theses
- (ack. S. Bethke)
34. What is HEP doing about it?
- ICFA Study Group on Data Preservation and Long Term Analysis in High Energy Physics - https://www.dphep.org/
- 5 workshops so far & intermediate report to ICFA
- Available at arXiv:0912.0255
- Initial recommendations: December 2009
- Blueprint for Data Preservation in High Energy Physics to follow
35. Grids, Clouds, Supercomputers, ...
(Ack: Bob Jones, former EGEE Project Director)
- Grids
- Collaborative environment
- Distributed resources (political/sociological)
- Commodity hardware (also supercomputers)
- (HEP) data management
- Complex interfaces (bug not feature)
- Supercomputers
- Expensive
- Low latency interconnects
- Applications peer reviewed
- Parallel/coupled applications
- Traditional interfaces (login)
- Also SC grids (DEISA, TeraGrid)
- Clouds
- Proprietary (implementation)
- Economies of scale in management
- Commodity hardware
- Virtualisation for service provision and encapsulating the application environment
- Details of physical resources hidden
- Simple interfaces (too simple?)
- Volunteer computing
- Simple mechanism to access millions of CPUs
- Difficult if (much) data involved
- Control of environment ? check
- Community building & people involved in Science
- Potential for huge amounts of real work
36. Clouds / Volunteer Computing
- Clouds are largely commercial
- Pay for use
- Interfaces from grids exist
- absorb peak demands (e.g. before a conference!)
- CernVM images exist
- Volunteer Computing
- LHC@Home
- SixTrack: study particle orbit stability in accelerators
- Garfield: study behaviour of gas-based detectors
37. Virtualisation
- Virtual implementation of a resource, e.g. a hardware platform
- a current buzzword, but not new: IBM launched VM/370 in 1972!
- Hardware virtualisation
- one or more virtual machines running an operating system within a host system
- e.g. run Linux (guest) in a virtual machine (VM) with Microsoft Windows (host)
- independent of hardware platform: migration between (different) platforms
- run multiple instances on one box: provides isolation (e.g. against rogue s/w)
- Hardware-assisted virtualisation
- not all machine instructions are virtualisable (e.g. some privileged instructions)
- h/w assist traps such instructions and provides hardware emulation of them
- Implementations
- Xen, VMware, VirtualBox, Microsoft Virtual PC, ...
- Interest to HEP?
- the above & the opportunity to tailor to experiment needs (e.g. libraries, environment)
- CernVM: CERN-specific Linux environment - http://cernvm.cern.ch/portal/
- CernVM-FS: network filesystem to access experiment-specific software
- Security: certificate to assure origin/validity of the VM
38. Summary
- What is eScience about and what are Grids
- Essential components of a Grid
- middleware
- virtual organisations
- Grids in HEP
- LHC Computing Grid
- A look outside HEP
- examples of what others are doing