Title: Response to the PPRP
Introduction
GridPP has addressed the 10 questions received from the PPRP in the document http://www.gridpp.ac.uk/docs/gridpp3/pprp/GridPPResponseToPPRP_FINAL.doc. This presentation will go through those responses. In addition, GridPP has addressed 120 comments and questions from 7 referees in the documents http://www.gridpp.ac.uk/docs/gridpp3/pprp/referee_response_1_2.doc and http://www.gridpp.ac.uk/docs/gridpp3/pprp/referee_response_3_4_5_6_7.doc. The PPRP has also been presented with a number of related documents detailing the input to the PPRP questions from the three large LHC experiments, together with various background documents addressing individual areas. All this information is collected on the web page http://www.gridpp.ac.uk/docs/gridpp3/pprp/.
PPRP Question-1
1. The Panel would like to further understand
the advantages of the proposed overarching GridPP
model for operations (as opposed to development)
as against each experiment making its own
arrangements.
A) The GridPP identity enables a unified and coordinated voice for the UK community that raises our profile, strengthens our negotiating power, increases our influence and enables better communication.
B) Cross-experiment support: the middleware stack is presently divided into lower-level middleware that is part of the gLite release and higher-level middleware that is provided by the experiments. The common goal is to continue to move middleware from the experiment-specific to the generic level. Thus, future support for the middleware stack must follow this transition and is part of the overarching GridPP model.
C) The Tier centre structure has been set up by and through the GridPP project. The Tier-2 MOUs between GridPP and the Institutes establish a uniform responsibility, and the critical relationship between the Tier-1 and the Tier-2s is carefully supervised through the deployment team. An overarching project is more likely to succeed in nurturing these structures to optimise the UK Grid for Particle Physics.
PPRP Question-1 (continued)
D) The GridPP Deployment Team: the deployment of LCG releases will be better implemented by a coordinated deployment team managed by a common project.
E) Without an overarching project, there is a risk that the UK Particle Physics Grid would fragment into a set of experiment-specific resource clusters, which would completely undermine the advantages that predicated the decision to take the Grid approach that has been the basis for investment over the last 5 years.
In addition, statements have been received (and presented in full to the PPRP) from the three large LHC experiments which:
- All strongly support the concept.
- Propose no alternative.
PPRP Question-2
2. The Panel would like to explore the priorities and potential options for descope.
- If funding were only available to support 30%, 50% or 70% of the total request, what would be the priority areas for investment in terms of obtaining the best UK science return?
- What would be the political and experimental impacts of funding at a much lower level?
- How would you prioritise the work packages?
PPRP Question-2: Preamble
- 0) Computing is an integral part of the LHC project.
- GridPP3 is the continuation of a project with a previously defined scope. This is not a new initiative where the scope and scale are more elastic. WE HAVE A WELL DEFINED (FIXED) TASK TO PERFORM.
- The scope of the project was described in the PPARC call. The original proposal required a careful evaluation of the minimum requirements consistent with meeting the PPARC call. In particular, it did not provide the UK with additional capacity that might give a competitive edge.
- Hardware is the biggest item. As you de-scope and reduce hardware you keep all the data and service tasks but throw away the ability to do any physics.
- We are embedded in an international context and have been for 5 years. We cannot sensibly move away from the LCG model of middleware, operations and support. The levels of service expected are agreed in the MOU signed by PPARC.
- As in any international collaboration (e.g. the detectors), there are elements of service work that need to be contributed in a broadly pro-rata manner.
- BOTTOM LINE: It is enormously difficult to de-scope a project that is well underway with well defined responsibilities. We basically start to fail as we de-scope.
Input to Scenario Planning - Resources
Changes in the LHC schedule have prompted another round of resource planning. New global resource requirements were presented to the CRRB (Oct 24th), from which new UK resource requirements have been derived and incorporated in the scenario planning. Hardware prices have been re-examined following the recent Tier-1 purchase (CPU was cheaper than expected). We have adjusted (lowered) our best empirical estimate of future prices but have also declared a contingency on hardware spend of 25% (up from 15%) over the lifetime of the project. The combination of the above results in a 9% saving on the project cost.
Input to Scenario Planning - ATLAS
The priority of the ATLAS-UK collaboration, to ensure the best science return, is the hardware and its operation. Within this, ATLAS notes that UK Tier-2 resources contribute directly to the UK output, whereas shortages in Tier-1 resources affect all ATLAS physicists globally. For Tier-1 resources, ATLAS regard the 15% hardware reduction proposed in the 70% scenario as barely manageable; the 50% scenario would do serious damage to the analysis capacity for the large UK physics community and would also threaten the calibration and commissioning of the SCT. To reduce the Tier-2 hardware, cuts would have to be made in simulation, calibration, and then analysis capability, but even the first of these will degrade physics output. Tier-2 cannot be cut below the 70% scenario. ATLAS has derived the UK fraction of the global requirements by noting that UK authorship is 12.5% (now 13.9%) of the global ATLAS Tier-1 authorship and that 4 out of 30 (13.3%) of the ATLAS Tier-2s are in the UK.
Input to Scenario Planning - CMS
The priority of the CMS-UK collaboration is access to Tier-2 resources in the UK and access to Tier-1 resources, preferably in the UK. CMS argue that the 70% scenario, achieved with a 15% reduction in the requested hardware, would be at the threshold for the UK to host a CMS Tier-1. In the 50% scenario, the priority for CMS would be to protect their Tier-2 resources, which would then have to be supported by a Tier-1 external to the UK. The revised CMS UK hardware request is based on a more detailed algorithm than a simple fraction of the global requirements. The scale is set by the dual requirements of (a) a minimum size for a CMS Tier-1 of 50% of the average CMS Tier-1 (7% of global requirements) and (b) the UK fraction of Tier-1 authors (same basis as ATLAS) of 8%. The details are calculated from the dual requirements to accept 4 out of CMS's 50 data-streams (8%) and the need for the Tier-1 to serve an entire AOD dataset.
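For orientation, the two requirements are numerically consistent (this arithmetic is ours; the number of CMS Tier-1 centres is inferred rather than stated here): 4 of CMS's 50 data-streams gives 4 / 50 = 8%, matching the authorship-based figure, while 50% of an average CMS Tier-1 corresponding to 7% of the global requirement implies an average Tier-1 share of about 14%, i.e. roughly seven CMS Tier-1 centres.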
Input to Scenario Planning - LHCb
The LHCb collaboration has a somewhat different computing model from ATLAS and CMS, with most analysis performed at the Tier-1 and the Tier-2s used predominantly for Monte Carlo simulation. LHCb prioritizes Tier-1 hardware and its operation, followed by Tier-2 hardware and its operation, and finally support etc. The revised hardware requests from UK LHCb are based on the new global requirements, calculated from the UK authorship fraction of 18.6% (revised upwards from 16.6% at the time of the GridPP3 submission). The Tier-2 resource request also includes 18.6% of the global LHCb Tier-2 resource shortfall of 30%, to give a total of about 24% of the global Tier-2 requirements. It is noted that any fall below the global authorship fraction of 18.6% at either the Tier-1 or Tier-2 would have to be negotiated in a global context.
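The quoted Tier-2 total follows directly (our arithmetic): the authorship share plus the same share of the 30% shortfall gives 18.6% × (1 + 0.30) ≈ 24%, consistent with the figure above.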
70% Scenario
An example 70% scenario based on experiment inputs and a bottom-up examination of all posts. (This example corresponds to 73.5% of the original request.)
What has been lost in the 70% scenario? 15% of Hardware
- Hardware at the Tier-1 and Tier-2 is reduced by 15%.
- Contributes to a global shortfall of Tier-1 resources for all three LHC experiments.
- If cuts are applied uniformly:
  - Takes CMS to the threshold level for a UK Tier-1.
  - Takes ATLAS to the threshold for holding the entire AOD in the UK.
  - Reduces the LHCb UK Tier-1 resources below the UK authorship fraction (un-quantified cost/consequences).
- The reduction of hardware directly (and disproportionately) impacts the ability of UK groups to produce physics output and will be a competitive disadvantage.
What has been lost in the 70% scenario? 7% of Tier-1 Staff Effort
- Staffing effort at the Tier-1 in the proposal is barely adequate to meet the MOU quality of service and was identified as a significant risk.
- Staffing effort does not scale linearly with hardware.
- Cuts are achieved by removing a 3-FTE ramp-up of Tier-1 staff in the GridPP2 period (designed to match the ramp-up of hardware) and 1 FTE during the GridPP3 period (probably from the incident response team).
- The working allowance, previously included to address the risk of failing to meet MOU service levels, has also been removed.
- The net result is a significant increase in the risk that the Tier-1 service levels will not be met in full.
What has been lost in the 70% scenario? 11% of Tier-2 Staff Effort
- The Tier-2 staff would be reduced by 1.75 FTE out of 14.75.
- This is likely to contribute to either or both of:
  (a) a reduction of Tier-2 resources levered from the institutes;
  (b) a reduction in the service level achieved at the Tier-2s.
- The working allowance, previously included to address the risk of failing to meet MOU service levels, has also been removed.
- The net result is an increase in the risk that the Tier-2 resource and service levels will not be met.
What has been lost in the 70% scenario? 31% of Support Staff Effort
The Data Management post (1 FTE) for Replica Optimisation is not funded. This work was judged a good investment to optimise the use of limited storage resources; removing funding for this post forgoes the prospect of much greater savings on the purchase of storage resources in the future. A reduction in data storage support (0.5 SY) reduces the flexibility to support multiple storage technologies in the UK (GridPP does not wish to support multiple storage technologies but recognises the likely need). Continuing support (0.5 FTE) for the GridPP Real Time Monitor would not be funded. The RTM is the public face of the LCG/EGEE Grid: a highly visible and acclaimed demonstration showpiece that has repeatedly illustrated the UK's position as a major international player in this field. There is a 1-FTE reduction in the support for the R-GMA information and monitoring system. This major UK contribution is deeply embedded in the EGEE/LCG stack, and any reduction in effort must be carefully planned in conjunction with our partners to try to minimise disruption globally.
What has been lost in the 70% scenario? 31% of Support Staff Effort (continued)
- The Security Vulnerability work (0.5 FTE) would be dropped. During GridPP2 the UK has pro-actively taken a leading international role in developing security vulnerability policies and procedures.
- Support for GridSite would be reduced by 0.5 FTE. The GridSite security toolkit, developed by GridPP, is embedded in the EGEE/LCG middleware and used as the basis for the GridPP and other websites, together with the GridSiteWiki.
- A Networking post in the GridPP3 proposal, designed to help network provision and network monitoring, would be reduced to 50%. This reduces the network support at a time when the network will be coming under intense stress and production standards are required.
- The loss of over 30% of the support staff means that the UK Grid will operate less effectively (Data Management, Storage, Networking) and international roles and responsibilities will be reduced or lost (RTM, R-GMA, Vulnerabilities, GridSite).
What has been lost in the 70% scenario? 12% of Operations, 10% of Management, 25% of Outreach
Support for the UK Grid Operations Centre in GridPP3 would be reduced from 3 to 2 FTE (the current manpower is 5.5 FTE, funded by EGEE). This increases the risk that the Grid Operations Centre, on which GridPP relies to provide Grid monitoring, ticketing and accounting, would not function effectively.
In the reduced scenarios the task of managing the project is likely to be at least as difficult as for the full proposal. Nevertheless, management effort would be reduced, primarily by not buying out 25% of an FTE for the User Board Chair as currently proposed. There is a risk that the User Board would not be as pro-active at collecting or presenting the users' requirements and concerns as desired. The 0.5 FTE requested for Industrial Liaison would be dropped. This means that we are unlikely to establish much industrial outreach.
50% Scenario
An example 50% scenario based on experiment inputs and a bottom-up examination of all posts.
What has been lost in the 50% scenario? 40% of Tier-1 Hardware
40% of the Tier-1 hardware would be lost. All three LHC experiments would need to negotiate the consequences of providing significantly less Tier-1 resource than their UK author fraction (un-quantified cost). The UK could no longer host a CMS Tier-1 centre, and special arrangements would need to be made to provide the UK CMS Tier-2s with access to resources and support at a non-UK Tier-1 (un-quantified cost). For ATLAS and LHCb, this level of Tier-1 resource would do serious damage to the analysis capacity for the large UK physics communities, and for ATLAS it would also threaten the calibration and commissioning of the SCT.
What has been lost in the 50% scenario? 30% of Tier-2 Hardware
30% of the Tier-2 hardware would be lost. The physics output for all three experiments would be reduced; competitive advantage would be completely lost. ATLAS would apply reductions to simulation, calibration, and then analysis capability, but even the first of these will degrade physics output. LHCb would reduce Monte Carlo simulation, similarly compromising physics output. As CMS's sole UK resource, the reduction would directly scale the CMS physics output.
What has been lost in the 50% scenario? 22% of Tier-1 Staff, 23% of Tier-2 Staff
Tier-1 staff would be further reduced from 17 to 14 FTE. Comparing this with the current level of 13.5 FTE, it is quite apparent that the Tier-1 (which would have much more hardware by that point) could not reach the level of service defined in the MOU signed by PPARC. There would need to be international negotiations as to whether the Tier-1 could function as such for either of the two remaining experiments. Tier-2 staff would be further reduced from 13 to 11 FTE. This is likely to contribute to either or both of (a) a reduction of Tier-2 resources levered from the institutes and (b) a reduction in the service level achieved at the Tier-2s.
What has been lost in the 50% scenario? 66% of the Support Staff
The support post for generic metadata issues would be lost and all support would have to be via the experiments. Support for grid storage technologies would be reduced from 7 SY to 2 SY over the project. This would (probably) be limited to Castor support at CCLRC; institutes would need to look elsewhere for support on the technologies likely to be deployed there. The portal work would be stopped, leaving the smaller or future experiments with a higher hurdle to getting on the Grid. The testing and performance monitoring work associated with the Workload Management system would stop. This is an area where there is strong European pressure to continue, and it is of potentially direct benefit to UK physics by providing knowledge about the current condition of the Grid on a site-by-site basis.
What has been lost in the 50% scenario? 66% of the Support Staff (continued)
Support for information and monitoring systems would be reduced to 1 FTE. (R-GMA could not be supported, and negotiations with our international partners would have to determine how best to use this post to help the transition to whatever new system evolved internationally.) Security support would be reduced to 1 FTE. This would be split, as deemed appropriate at the time, between VOMS support and Operational Security. The support for GridSite (an international obligation) would be dropped. Again, this would have to involve discussion with international partners, since the LCG/EGEE middleware stack would be at risk. The networking support post for monitoring and provision would be lost. This would be in a regime where the need for network support has become more critical, with at least one of the major experiments attempting to use a non-UK Tier-1.
What has been lost in the 50% scenario? 30% of the Operations Staff, 25% of Management, all dissemination (except in the GridPP2 period)
Support for the Grid Operations Centre would be further reduced from 2 to 1.5 FTE, further increasing the risk that the GOC, on which GridPP relies to provide Grid monitoring, ticketing and accounting, would not function effectively. One of the four Tier-2 coordinators would be lost. This increases the risk of failure of part of the Tier-2 organisation, reduces the deployment team, and increases the likelihood that delays to upgrades at some sites will reduce the available resources, with a direct impact on physics output. Management would be further reduced (this would have to be optimised); there is a risk that the management becomes less engaged and therefore less effective. All dissemination and outreach activities would be stopped after the GridPP2 phase is complete.
30% Scenario
GridPP has examined the original PPARC call and has determined that it is unable to form a proposal that meets any of the criteria listed with funding at the 30% level. Criterion 2. a) of the call reads: "Underpin the particle physics programme by delivering the functional Tier 1 centre for the LHC experiments and for the other experiments where UK groups will require computing GRID access and facilities." The 50% scenario presented above already fails to meet this criterion because the Tier-1 would be sub-threshold for at least one of the LHC experiments. At the 30% funding level there could only be a Tier-1 for (probably) one LHC experiment. Most likely, in a 30% scenario there would be no Tier-1 and the resources would be used as a Tier-2 (though it is not clear what to do about LHCb). Etc. (see document).
Question-2 Summary
GridPP has taken input from the 3 large LHC experiments as guidance in an attempt to design a GridPP3 project in 70% and 50% funding scenarios. The outcome is a 74% funding scenario that preserves 85% of the hardware (the threshold for a UK CMS Tier-1) but is likely to result in a failure to meet service levels, inadequate support across the UK in many areas, and the elimination of much of the UK obligation to the international effort that directly and indirectly benefits UK physicists. A 55% scenario is provided that does not work: it does not respect the criteria of the call, and there are large political and financial unknowns associated with delivering less than a pro-rata share of LHC hardware, so the real cost cannot be provided. The UK Grid would not function at the required level of service, and support for UK users would be completely inadequate. We do not regard the fine details of these scenarios as fixed, but they are offered as examples of our approach to, and the consequences of, funding below 90% of the original proposal.
Risks
- GridPP believes that the risks introduced by the 74% scenario are very large and urges the PPRP to consider an outcome closer to 90%.
- In the 90% scenario, all the risks defined in the GridPP3 proposal still apply, except that there is an increased risk that hardware is more costly than planned.
- In the 74% scenario, there is an additional risk that the level of hardware provision for all 3 experiments will compromise the physics output. For LHCb there are unknown consequences of providing hardware below the authorship-fraction level.
- In the 74% scenario, service levels signed up to by PPARC at the Tier-1 and Tier-2 are severely at risk.
- In the 74% scenario, support for middleware in the UK will be inadequate. There is a risk that this will seriously undermine physics output.
- In the 74% scenario, most middleware contributions by the UK to the international effort will be dropped. This puts the whole Grid at risk and damages the UK's reputation and influence.
PPRP Question-3
3. The UK would like to play a key role in this
important project but the current financial
constraints necessitate focusing on the crucial
areas and what needs to be done. The Panel would
like to identify these areas, giving
consideration to the current LHC timescale, and
to understand the implications of delaying parts
of the project, especially with regard to
hardware (e.g. same CPU performance with fewer, faster processors).
Identifying crucial areas is covered by the scenario planning presented in response to Question-2 and by each of the responses to GridPP from the three large LHC experiments. The new LHC timescale has been included in the new resource requirements prepared by the LHC experiments and presented to the CRRB on October 24th 2006. These new global requirements have been used to derive new UK requirements, as described in the response to Question-2 and in the experiment documents. The resource requirements are effectively shifted later, which, combined with the reduced hardware cost estimates used by GridPP, has resulted in about a 9% saving on the project cost. This is embedded in the 70% and 50% scenario plans.
PPRP Question-4
- PART-1: The Panel wishes to understand better the apparent disparity between the estimated Tier-1 needs of CMS and ATLAS. It seems that ATLAS requires roughly twice the CPU and disk resource, but less tape, than CMS. Given the similar computing models between the two experiments, relatively small differences in the parameters chosen seem to have significant implications on the assessment of need and hence cost.
- PART-2: How has GridPP interacted with the experiments to ensure that the most cost effective solution has been arrived at?
- PART-3: The Panel wishes to understand the levels of requests for Tier-1 facilities by the different experiments relative to the UK contribution to each experiment.
Part-2: GridPP relies on the careful scrutiny and rigorous peer review of the computing models and global resource levels by the LHCC and the CRRB to ensure that the most cost effective solution has been achieved.
PPRP Question-4
PART-3
ATLAS has derived the UK fraction of the global Tier-1 requirements by noting that UK authorship is 12.5% (now 13.9%) of the global ATLAS Tier-1 authorship. CMS has derived their UK Tier-1 hardware request based on a more detailed algorithm than a simple fraction of the global requirements. The scale is set by the dual requirements of (a) a minimum size for a CMS Tier-1 of 50% of the average CMS Tier-1 (7% of global requirements) and (b) the UK fraction of Tier-1 authors (same basis as ATLAS) of 8%. The details are calculated from the dual requirements to accept 4 out of CMS's 50 data-streams (8%) and the need for the Tier-1 to serve an entire AOD dataset. This latter requirement results in a slightly larger fractional requirement at the Tier-1 in early years, which then reduces to 8% in the steady state. LHCb has derived the UK fraction of Tier-1 resources from the UK authorship fraction of 18.6% (revised from 16.6% at the time of the GridPP3 submission).
PPRP Question-4
PART-1
The latest round of resource review has led to convergence of the models. In particular, CMS has increased its trigger rate (now similar to ATLAS) during the early years to acquire more calibration and Standard Model physics data. Event sizes, data rates, processing times, and replication strategy have evolved to become significantly closer. The remaining difference is the strategy for data storage and replication. ATLAS: 2 copies of the ESD data distributed over all Tier-1 centres, plus a cumulative AOD sample spanning multiple years, all on disk. CMS: 1 copy of RECO (ESD) stored over all Tier-1 centres, in addition to CERN, and only a single year's AOD stored on disk at Tier-1s (previous years are accessible from tape). This leads to smaller Tier-1 disk requirements from CMS, but higher requirements on tape infrastructure, bandwidth and storage. These are different optimisations that will probably converge as experience is gained.
PPRP Question-5
5. The Panel would like the applicants to justify
the rationale behind the proposed regional Tier-2
structure in GridPP3 and to set out the pros and
cons of other possible structures, for example,
experiment based or rationalised structure with
fewer Tier-2 sites, or fewer institutes. The
Panel would like the applicants to consider
possible cost savings and improvements in
efficiency and service delivery that different
structures might produce.
We need to discuss the past, the present, and the future. The underlying message is that the proposed system is the logical development of the current structure, which works well and, in turn, was developed for good reasons. We see much, much bigger risks to performance in breaking the current structure than in keeping it.
PPRP Question-5
History of the Tier-2 Structure
The current Tier-2s were formed naturally in response to local and regional funding opportunities and other geo-political considerations. Many assumed (and used as leverage) a continuing relationship with the Particle Physics community. It is natural that all Particle Physics groups wished to be associated with a Tier-2, but this was not a GridPP requirement; however, it was clearly and uniformly perceived as beneficial for the local physicists and the institute. In GridPP1 there was no PPARC funding for Tier-2s, and in GridPP2 there was PPARC funding for some manpower at Tier-2s (plus some specialised servers) but not for the bulk of the computing resources. Nevertheless, large amounts of resources were made available. GridPP has interacted with the four Tier-2 centres through their management boards. The overhead of having more than one site within a Tier-2 is, to first order, an internal choice (the JeS submission requirement for the GridPP3 proposal broke this model).
PPRP Question-5
Current Status of the Tier-2 Structure
There are currently 17 institutes organised into 4 distributed Tier-2s. Of the 17 institutes, 4 have no GridPP manpower, 8 have less than one FTE, and 5 have one or more FTEs of GridPP manpower. The total of 9 FTE funded by GridPP for hardware support (plus 5.5 FTE specialist posts) is clearly a very cost-effective situation given the 3703 KSI2K of CPU and 263 TB of disk available (06Q1 numbers). For comparison, the Tier-1 had 13.5 GridPP-funded FTE and made available 830 KSI2K and 180 TB in the same period.
Performance measures are being developed (within GridPP and wLCG); the UK is probably ahead of the game here. There are more details in the written response, but the UK Tier-2 performance is:
- good relative to other countries;
- improving, even though the hurdles are getting higher;
- on track to meet the MOU requirements.
PPRP Question-5
Future of the Tier-2 Structure
- GridPP proposes to continue to develop 4 regional Tier-2 centres.
- GridPP would like to remain neutral on the number of sites and institutions within each Tier-2, and simply offer a package of hardware money and effort to each Tier-2 in return for the delivery of a specified quantity of resource and a specified service level. We believe this approach:
  - allows a market-driven optimisation of resources according to constraints which are outside the control and knowledge of GridPP (e.g. other sources of funding; institutional priorities and strategies; prior commitments and aspirations);
  - builds upon a system that is both viewed and measured as successful;
  - is in the best interests of physicists at all institutes, allowing some small measure of local control whilst enabling Grid access to vast resources and providing on-site expertise in as many places as possible.
PPRP Question-5
Future of the Tier-2 Structure (continued)
Alternative structures have been considered:
- Fewer Tier-2s: we foresee no advantage in having the same number of institutes associated with fewer Tier-2s, and clear disadvantages.
- Fewer institutes: hardware and manpower costs remain the same; running and infrastructure costs are likely to become more visible. There are some gains in the efficiency of staff effort by concentration of resources (though this means less levered effort, not less GridPP effort; the service level may be easier to achieve). It may alienate some institutes, will result in less leverage of resources, and will leave some institutes without local expertise. We conclude it would cost more and deliver fewer resources; the service level might be better, but physicists would be less supported. Not the optimisation we chose.
- Experiment-based Tier-2s: this runs against the grain and would leave the UK at odds with the rest of the wLCG; it is not a sensible Grid structure and would limit the peak resources available to individual experiments. It would most likely lead to a divergence from standards and a fragmented UK Grid.
PPRP Question-6
6. The Panel would like to explore the impact to
the UK of leadership roles within LCG. What are
the benefits and costs to the UK of this,
particularly with regard to middleware?
The big picture: roles (e.g. leadership) and duties (e.g. middleware support) for the LCG project must be shared between the members. This allows the common project to benefit from all the available skills and expertise; it provides a contribution in kind that should broadly reflect the size of the contributing group; it demonstrates the engagement of all partners; and, in return, it enables strategic influence and other tangible benefits. Performing duties gives us the credibility to take on leadership roles. Appendix-D of the proposal listed 86 external roles of members of GridPP within related projects, of which 17 are specifically LCG related, 22 are within EGEE, and a further 8 are associated with computing within the LHC experiment collaborations.
PPRP Question-6
Specific examples:
a) David Kelsey: Coordinator of LCG Grid Security, Chair of the Joint (LCG/EGEE/OSG) Security Policy Group, and Deputy Director of EGEE Security.
b) Jeremy Coles: Secretary of the LCG Grid Deployment Board.
c) John Gordon: UK Representative on the LCG Management Board and a Deputy Chair.
d) Neil Geddes: UK member of the LCG Oversight Board (OB) and LCG Collaboration Board Chair.
e) EGEE: Project Executive Board: Frank Harris, Dave Kelsey, and previously Pete Clarke. Project Management Board Chair: Robin Middleton (to summer 06). Project Collaboration Board: Dave Colling, John Gordon, Jeff Tseng, Tony Doyle and Roger Barlow. EGEE JRA1 (Middleware re-engineering) Cluster Leader (UK): Steve Fisher.
PPRP Question-6
Related examples:
i) Nick Brook (formerly GridPP UB Chair and PMB member) is the LHCb computing coordinator.
ii) Roger Jones (currently GridPP Applications Coordinator and PMB member) is the chair of the ATLAS International Computing Board.
iii) Dave Newbold (formerly GridPP UB Chair and PMB member) is the chair of the CMS Computing Committee.
We conclude that, as a consequence of investment and hard work over the last five years, the current overall influence of the UK in the LHC experiments is very high. This ultimately benefits UK physicists and has been a good investment.
PPRP Question-7
7. Before making a recommendation to the office
about the extension to GridPP2 the Panel would
like more information about each of the posts and
to know whether they are core activities. What
are the implications of not funding these posts
and what evidence is there that a delay in
resolving this will lead to a loss of staff who
might be expected to continue into GridPP3?
Detailed information on the areas covered by the GridPP2 extension was provided in the GridPP3 proposal. Specific information on each individual post was provided on the Institutional JeS forms submitted to PPARC. All these posts are considered core to the current programme during the 7-month extension period of GridPP2, when they will be necessary for the build-up of the Production Grid prior to LHC data-taking. It should be noted that funding for the applications posts was not requested, but many of these have not been funded on the RG, leaving a serious shortage of effort. If not funded: we will lose our entire pool of highly skilled staff; the UK will not be ready for LHC data; much of the current work will be abandoned; and large amounts of resources will have been wasted. Evidence: 25% turnover of staff since proposal submission, c.f. 10% p.a. previously.
PPRP Question-8
8. The Panel would like to see a full
justification for each of the posts requested in
GridPP3 and to see the cost to PPARC (including
estates and indirect costs) of each post.
A separate document has been provided for PPARC staff, including full details extracted from the Institutional JeS submissions. This incorporates a compilation of the institute submissions organised by work package, giving the justification and costs for each post, and should be read in conjunction with the proposal and relevant appendices.
PPRP Question-9
9. The Panel would like to explore the issues of
quality assurance in both Tier-1 and Tier-2
activities. How will the applicants ensure that
GridPP3 provides an adequate and cost-effective
service to its users?
The service levels at the Tier-1 and Tier-2 are
defined by the International Memorandum of
Understanding. The Tier-1/A Management Board,
including PPARC representation, advises all
stakeholders on whether the Tier-1/A Service at
RAL is delivering its objectives on time and
making appropriate use of its available
resources. The main instrument for assuring
quality and levels of service at the Tier-2s will
be a new Memorandum of Understanding between
GridPP and the institutes as described in the
Tier-2 Appendix to the GridPP3 Proposal. This
would set out the required levels of service in
order for the UK to meet its WLCG MoU commitments
and provide the necessary service to UK
physicists. (continued)
PPRP Question-9
Quality assurance is performed by monitoring the performance of the Tier-1 and Tier-2 against the MOU commitments, and their performance relative to international partners. As previously described, monitoring is already advanced and being developed further. We currently monitor:
- CPU and storage usage
- Site functional tests
- Configuration tests
- Ticket response times
- Upgrade timescales
- Scheduled downtime
- VO support
- Transfer tests
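To make the comparison against MOU commitments concrete, here is a minimal sketch (in Python) of how site-functional-test results might be rolled up into a per-site availability figure and checked against a target. The data format, names, and the 95% threshold are illustrative assumptions for this sketch, not GridPP's actual monitoring implementation.

```python
# Illustrative sketch only: aggregates site functional test (SFT) results into a
# per-site availability fraction and flags sites below an assumed MOU-style target.
# The data format, names and the 0.95 threshold are assumptions for this example,
# not GridPP's actual monitoring tooling.

from collections import defaultdict

MOU_AVAILABILITY_TARGET = 0.95  # assumed example target, not the real MOU figure


def site_availability(test_results):
    """test_results: iterable of (site, passed) pairs from scheduled SFT runs."""
    counts = defaultdict(lambda: [0, 0])  # site -> [passed runs, total runs]
    for site, passed in test_results:
        counts[site][1] += 1
        if passed:
            counts[site][0] += 1
    return {site: passed / total for site, (passed, total) in counts.items()}


def below_target(availability, target=MOU_AVAILABILITY_TARGET):
    """Return the sites whose availability falls below the target."""
    return {site: frac for site, frac in availability.items() if frac < target}


if __name__ == "__main__":
    results = [("SiteA", True), ("SiteA", True), ("SiteA", False),
               ("SiteB", True), ("SiteB", True)]
    avail = site_availability(results)
    print(avail)                 # SiteA ~0.67, SiteB 1.0
    print(below_target(avail))   # sites needing follow-up
```

In practice the same roll-up would be applied period by period, alongside the other metrics listed above (ticket response times, scheduled downtime, transfer tests), to judge whether the WLCG MoU service levels are being met.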
PPRP Question-10
10. The Panel would like information on where the Tier-1 centre will be housed at RAL.
- Is any construction or refurbishment of an appropriate building on the critical path for the GridPP project?
- Will the centre have sufficient space available to meet GridPP's requirements?
- What are the risks associated with this?
- How will this be funded?
The Atlas Centre at RAL has sufficient capacity to house the full GridPP3 requirements for 2008 LHC running as given in the proposal. CCLRC has approved construction of a new computer building at RAL, budgeted at approximately £17M, which will be funded by the CCLRC Capital Investment Plan. Completion is due in the summer of 2008, in time for the autumn delivery which will meet the 2009 data-taking requirements. This has sufficient space for capacity to grow to 2012, when the number of racks is expected to have reached a steady state.
PPRP Question-10
The main risks are:
a) Late completion. There is some slack in the schedules to meet the data-taking requirements for April 2009, which mitigates this risk.
b) Power and cooling required to deliver the required resources may exceed the estimates. This is mitigated by the inclusion of chilled water mains in the new building to allow direct water cooling of the hottest racks if power densities exceed current estimates.
c) Electricity charges for power and cooling, which are currently met by CCLRC overhead charges. It is possible that at some future time these may be attributed directly to GridPP. This is explicitly listed as a potential call on contingency in the GridPP3 proposal.
SUMMARY
- GridPP and the experiments have described the advantages of the proposed overarching GridPP model for operations.
- The potential options for descoping the GridPP project are extremely limited, but we have provided input on the 3 scenarios.
- The crucial GridPP areas have been described, taking the LHC experiment requirements fully into account.
- The ATLAS and CMS planned trigger rates have converged and the computing models are similar. Residual differences have been identified.
- The proposed Tier-2 structure and the pros and cons of other possible structures have been explored.
SUMMARY (continued)
- The impact on the UK of leadership roles within LCG and EGEE, and the benefits of the middleware roles, have been described.
- GridPP has collated the required financial information about each of the posts in the GridPP2 and GridPP3 periods, according to work package.
- GridPP has collated the required post descriptions for each of the posts in the GridPP2 and GridPP3 periods, according to work package.
- The mechanisms that ensure quality assurance at the Tier-1 and Tier-2 have been described, and the associated costs recognised, in order to deliver a performant Grid to end-users.
- GridPP has indicated that the computer building at RAL will be funded via CCLRC, is no longer on the critical path, and will provide sufficient space. Residual risks have been identified.