Title: The Grid: Meeting the LHC computing challenge
1. The Grid: Meeting the LHC computing challenge
- Gavin McCance
- University of Glasgow
- RSE, 6th February 2002
2. Outline
- Scale of the LHC computing challenge
- Grid Middleware
- Data Replication
- Experimental testbed
3. LHC computing challenge
- Typical experiment:
- 2 MB per event
- 2.7 × 10⁹ event sample → 5.4 PB/year
- Up to 9 PB/year of Monte Carlo samples
- Very large storage and computational requirements
- CERN can handle at most 1/3 of this!
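The slide's storage figures can be verified with quick arithmetic. A minimal sketch (the event size, sample size and Monte Carlo volume come from the slide; the script itself is only illustrative):

```python
# Back-of-envelope check of the data volumes quoted on the slide.
EVENT_SIZE_MB = 2.0          # 2 MB per event
EVENTS_PER_YEAR = 2.7e9      # 2.7 x 10^9 event sample
MONTE_CARLO_PB = 9.0         # up to 9 PB/year of Monte Carlo

# 1 PB = 1e9 MB, so convert MB/year to PB/year.
raw_pb = EVENT_SIZE_MB * EVENTS_PER_YEAR / 1e9
total_pb = raw_pb + MONTE_CARLO_PB

print(f"Raw data:               {raw_pb:.1f} PB/year")   # 5.4 PB/year
print(f"Total with Monte Carlo: {total_pb:.1f} PB/year")
print(f"Max CERN share (~1/3):  {total_pb / 3:.1f} PB/year")
```

This confirms the 5.4 PB/year figure, and shows why distributing the remaining two thirds of the load is unavoidable.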
4. Computing challenge
- Distribute data store and compute resources
- Take advantage of existing local clusters and local infrastructure
- Easier to get funding for local clusters, particularly cross-experiment or cross-disciplinary compute resources
5. Tiered model
- Tier 0: CERN computer centre, with HPSS mass storage; holds the basic reconstructed data
- Tier 1: Regional Centres (French, Italian, US, RAL), each with HPSS
- Tier 2: Tier-2 Centres; higher-level analysis data and Monte Carlo
- Tier 3: Institutes; tag data and Monte Carlo
6. UK Grid
- GridPP collaboration
- Tier 1: RAL Regional Centre, with HPSS; basic reconstructed data
- Tier 2: Tier-2 Centres; higher-level analysis data and Monte Carlo
- Tier 3: Institutes; tag data and Monte Carlo
7. GridPP
8. GridPP
- £17M three-year project
- Working in collaboration with the EU DataGrid project
- Middleware production
- Integration of middleware technologies into HEP experiments
- Validation of Grid software
9. Middleware
Layered stack, top to bottom:
- Application programs (e.g. a gridopen() call)
- Grid middleware: layered APIs giving transparent security, transparent data access, and intelligent use of distributed resources
- Underlying systems: data-access specifics (HPSS, Castor), job-submission specifics (PBS, LSF), site-specific security procedures
10. Middleware Activities
- GridPP mirrors EU DataGrid
- Workload Management
- What jobs go where?
- Data Management
- Where's the (best) data?
- Information Services
- What's the state of everything?
11. Middleware Activities
- Fabric and Mass Storage Management
- Interfaces to underlying systems
- Network Monitoring
- What's the bandwidth from here to there?
- Security
- Crops up everywhere; must be transparent to applications
12. Data Management
- Data Replication
- Meta Data Catalogues
- Replica Optimisation
- Which replica should I use?
13. Data Replication
- Problems if data exist in only one place:
- No one site can afford to store all the data!
- Multiple accesses to the same data overload the network (petabytes!)
- What if the site or network is down?
- Make replicas! But then we need to keep track of all the files and their various replicas
- Need a replica catalogue!
14. Replica Catalogues
- Distributed catalogue held in a database
- A globally unique Logical File Name (LFN) maps to multiple physical instances of the file (PFNs)
- The database must be globally accessible and secure
- Key is to leverage industry-standard technologies

(Diagram: one LFN, File-1, mapping to PFN replicas of File-1 at Paris, Glasgow and Chicago)
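The LFN-to-PFN mapping above can be sketched as a simple in-memory catalogue. The real catalogue is a secure, globally distributed database; the class, URL scheme and site names here are illustrative assumptions only:

```python
# Minimal sketch of a replica catalogue: one logical name, many physical copies.
from collections import defaultdict

class ReplicaCatalogue:
    def __init__(self):
        self._replicas = defaultdict(list)  # LFN -> list of PFNs

    def register(self, lfn, pfn):
        """Record a new physical replica of a logical file (idempotent)."""
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def lookup(self, lfn):
        """Return all known physical instances of a logical file."""
        return list(self._replicas[lfn])

rc = ReplicaCatalogue()
rc.register("lfn:File-1", "gsiftp://paris.example.org/data/File-1")
rc.register("lfn:File-1", "gsiftp://glasgow.example.org/data/File-1")
rc.register("lfn:File-1", "gsiftp://chicago.example.org/data/File-1")
print(rc.lookup("lfn:File-1"))  # three PFNs for the one LFN
```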
15. Metadata Catalogue
- Allows a client to securely access any remote SQL database on the Grid over HTTP(S)
- SQL Metadata Service in front of Oracle, MySQL or PostgreSQL back ends
- PKI security
- Standard communication protocols (XML/SOAP over HTTPS)
16. Distribution
- Don't want a single point of failure or bottleneck
- Must distribute the SQL database
- Designing scalable architectures
- e.g. a replica catalogue may exist at each storage site, responsible for its own files
- Queries propagate down until replica information is found
17. Choosing the best
- What does "the best replica" mean?
- Nearest? Fastest? Real cost?
- For multiple files, the best run location is some minimisation
- Network cost: from network monitoring
- Monetary cost: e.g. the EU-US link
- A reasonable decision must be made on the basis of limited information!
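Replica selection, as described above, reduces to a minimisation over estimated costs. A minimal sketch; the sites and cost figures are made up, and a real optimiser would feed in live network-monitoring data and the monetary cost of expensive links:

```python
# Pick the replica whose estimated transfer cost is lowest.
def best_replica(replicas, cost):
    """replicas: list of site names; cost: site -> estimated cost."""
    return min(replicas, key=cost)

replicas = ["paris", "glasgow", "chicago"]
estimated_cost = {"paris": 12.0, "glasgow": 3.5, "chicago": 40.0}  # arbitrary units

print(best_replica(replicas, estimated_cost.get))  # glasgow
```

The interesting part is not the `min` call but the cost function: it must combine incomplete, stale measurements into a single comparable number.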
18. Economic models
- Data files are viewed as commodities to be bought and sold by storage sites
- The buyer is a job requesting a file
- The (virtual) cost is set by a reverse auction: buy from the cheapest site
19. Economic replication
- If a storage site believes it can make money on a popular file (based on its observation of access patterns), it can buy it from another site (replication)
- Selfish local optimisation should lead to a reasonable global optimisation of file distribution
- Inherently distributed optimisation: no distributed-planning overhead!

(Diagram: site B buys File 1 from site A)
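A toy version of the economic scheme above: a job collects price offers from every site holding the file and buys from the cheapest, while each site tracks demand and buys its own copy of any file it keeps being asked for. The prices, threshold and class names are illustrative assumptions, not the project's model:

```python
# Reverse auction: the buyer (a job) takes the lowest offer.
def reverse_auction(offers):
    """offers: {site_name: price}. Return (winning_site, price)."""
    return min(offers.items(), key=lambda kv: kv[1])

class StorageSite:
    POPULARITY_THRESHOLD = 3  # demand level at which replication looks profitable

    def __init__(self, name):
        self.name = name
        self.requests_seen = {}  # LFN -> observed access count

    def observe_request(self, lfn):
        """Track local demand for a file (the site's 'market research')."""
        self.requests_seen[lfn] = self.requests_seen.get(lfn, 0) + 1

    def should_buy(self, lfn):
        """Selfish local rule: replicate any file that is in steady demand."""
        return self.requests_seen.get(lfn, 0) >= self.POPULARITY_THRESHOLD

site_b = StorageSite("B")
for _ in range(3):
    site_b.observe_request("lfn:File-1")

print(reverse_auction({"A": 5.0, "B": 2.0}))  # ('B', 2.0)
print(site_b.should_buy("lfn:File-1"))        # True
```

No site plans globally; each acts on its own observations, which is exactly the "no distributed planning overhead" property claimed on the slide.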
20. Will it work???
- Developing a simulation tool
- A simulated Grid provides a testing arena for these ideas!
21. Testbed Software
- UK HEP is providing a testbed
- EU experiments: CERN LHC
- US experiments: Fermilab / SLAC
- First EU DataGrid software release!
- Currently being tested
22. Experiment integration
- Taking the kit and trying to integrate it into the experiments' software frameworks
- ATLAS/LHCb software framework (GAUDI)
- Make Grid services transparently available to ATLAS and LHCb programs

(Diagram: the GANGA framework connects the experiment software to the Grid middleware)
23. UK and EU Testbed
- Some successful tests so far
- e.g. large file transfers between the UK, Italy, the US and CERN
- Increasing Monte Carlo challenges planned
- Currently a UK testbed
24. Finally
- Basic Grid software has been delivered
- More developments to come
- Integration with experiments and testing
- Already successful tests
- An excellent base to build on!
- Plenty still to do!