Title: Distributed Analysis at the LCG
1 Distributed Analysis at the LCG
- Torre Wenaus, BNL/CERN
- LCG Applications Area Manager
- http://cern.ch/lcg/peb/applications
- Caltech Grid Enabled Analysis Workshop
- June 24, 2003
2 Distributed Analysis Related Activity at the LCG
- Middleware requirements and use cases arising
from distributed analysis (GAG, HEPCAL)
- See Ruth's talk
- Analysis modelling (Grid Technology Area)
- See Kathrin Paschen's talk
- Distributed analysis application layer
(Applications Area)
- ARDA RTAG
- Hopes for this meeting
3 Applications Area Activity
(Figure legend) Blue: Common activity; Grey: Experiment specific
Products mentioned are examples, not a
comprehensive list
4 Distributed Analysis in the Applications Area
- Anticipated activity
- Grid interfaces to the experiments: interfaces
to physicist end users, and grid-enabled services
serving higher level applications and frameworks
- Integration/adaptation of physics applications
software in the grid environment
- Prerequisite: A mandate coming from agreement
among experiments on common work
- Via an RTAG (Requirements and Technical
Assessment Group)
- Distributed Analysis RTAG just established a week
ago
- But even in the absence of a mandate, we have
started limited, focused work because we have two
people hired explicitly to work on distributed
analysis
- Development of a remote launch service
- Task agreed upon a week ago, and now starting
5 Remote Launch Service
- A grid service in the LCG architecture
- Remotely launch the clients and/or masters making
up a distributed parallel interactive analysis
task
- Using grid middleware
- Providing immediate launch and responsiveness
- A generic service usable in different analysis
tool contexts
- The service will be integrated and used in both
PROOF and Ganga
- i.e. integrated with ROOT/CINT and as a Ganga
Python module
- What middleware can/should we use? Looking first
at Condor: Computing On Demand (COD) appears
to have the specs we need
- Very interesting talk by Derek Wright at
http://www.cs.wisc.edu/condor/CondorWeek2003/presentations/
- Maarten Ballintijn may already have Condor/PROOF
COD working?
- Looking forward to PROOF demo
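The remote launch idea on this slide can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the names `LaunchBackend`, `LocalBackend`, `RemoteLaunchService` and `launch_task` are not taken from PROOF, Ganga or Condor, and the backend here simply starts local processes as a stand-in for grid middleware such as Condor COD. The point is the shape of the service: one generic launch interface, a pluggable middleware adapter behind it, and a helper that starts one master plus its clients.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class LaunchBackend(Protocol):
    """Hypothetical middleware adapter; a real one might wrap Condor COD."""

    def start(self, host: str, command: List[str]) -> subprocess.Popen: ...


class LocalBackend:
    """Stand-in backend (assumption): ignores the host, starts a local process."""

    def start(self, host: str, command: List[str]) -> subprocess.Popen:
        return subprocess.Popen(command, stdout=subprocess.PIPE, text=True)


@dataclass
class RemoteLaunchService:
    """Generic launch service, independent of the analysis tool using it."""

    backend: LaunchBackend
    running: Dict[str, subprocess.Popen] = field(default_factory=dict)
    _next_id: int = 0

    def launch(self, role: str, host: str, command: List[str]) -> str:
        """Start one master or client and return a unique handle for it."""
        handle = f"{role}@{host}#{self._next_id}"
        self._next_id += 1
        self.running[handle] = self.backend.start(host, command)
        return handle

    def wait(self, handle: str) -> str:
        """Block until the process exits and return its standard output."""
        out, _ = self.running.pop(handle).communicate()
        return out.strip()


def launch_task(
    service: RemoteLaunchService,
    master_host: str,
    client_hosts: List[str],
    command_for: Callable[[str], List[str]],
) -> List[str]:
    """Launch one master and one client per host; return all handles."""
    handles = [service.launch("master", master_host, command_for("master"))]
    handles += [
        service.launch("client", host, command_for("client"))
        for host in client_hosts
    ]
    return handles


if __name__ == "__main__":
    svc = RemoteLaunchService(LocalBackend())
    cmd = lambda role: [sys.executable, "-c", f"print('{role} ready')"]
    for h in launch_task(svc, "m0", ["w1", "w2"], cmd):
        print(h, "->", svc.wait(h))
```

Keeping the middleware behind a small adapter is one way to make the same service usable from both a ROOT/CINT context and a Python module; the immediate-launch and responsiveness requirements would have to be met by the real backend, which this sketch does not attempt.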
6 Other Distributed Analysis Tasks
- Before remote launch service was chosen as an
initial distributed analysis task, others were
proposed and considered
- An indication of (some of) what is seen to be
missing for interactive analysis
- Proposed tasks were:
- Grid-based control/communication service used
between interactive masters/clients
- Development of an OGSA(-like?) service making use
of GSI
- Is no middleware project going to provide us with
this essential service?
- Interface to datasets/file catalogs including
querying on tags, LFN, etc., i.e., a dataset
service
- Interface to resource broker to find the best
location(s), based on the data set and
interactive availability, where to run the query
- Do today's resource brokers understand
distributed interactive analysis? Will
tomorrow's?
- Comments on these and on how best to use 1-1.5
FTEs on distributed analysis are welcome
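To make the dataset service and resource broker tasks above more concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not a description of any existing catalog or broker: the `Site` record, the `query_catalog` and `rank_sites` functions, the tag names and the LFN strings are all invented for the example. It shows a tag query returning LFNs, followed by a ranking of sites by how much of the dataset they hold locally and how many interactive slots are free right now.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Site:
    """Hypothetical view of a site as an interactive-aware broker might see it."""

    name: str
    free_interactive_slots: int  # slots that could start immediately
    replicas: FrozenSet[str]     # LFNs with a replica at this site


def query_catalog(catalog: Dict[str, Set[str]], required_tags: Set[str]) -> List[str]:
    """Return the sorted LFNs whose tag set contains every required tag."""
    return sorted(lfn for lfn, tags in catalog.items() if required_tags <= tags)


def rank_sites(sites: Iterable[Site], lfns: Iterable[str], min_slots: int = 1) -> List[Site]:
    """Rank sites that can start now: most local files first, then most free slots."""
    wanted = set(lfns)
    usable = [s for s in sites if s.free_interactive_slots >= min_slots]
    return sorted(
        usable,
        key=lambda s: (-len(wanted & s.replicas), -s.free_interactive_slots, s.name),
    )


if __name__ == "__main__":
    # Invented catalog content, for illustration only.
    catalog = {
        "lfn:/a/1": {"2003", "dimuon"},
        "lfn:/a/2": {"2003"},
        "lfn:/a/3": {"2003", "dimuon"},
    }
    lfns = query_catalog(catalog, {"dimuon"})
    sites = [
        Site("A", 0, frozenset(lfns)),
        Site("B", 4, frozenset({"lfn:/a/1"})),
        Site("C", 2, frozenset(lfns)),
    ]
    print([s.name for s in rank_sites(sites, lfns)])
```

The ranking key puts data locality ahead of slot count, which is only one possible policy; a site with no free interactive slots is excluded outright, reflecting the requirement that an interactive query must start immediately rather than queue.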
7 RTAG on An Architectural Roadmap towards
Distributed Analysis (ARDA) 1)
- Observation
- Different LHC experiments have developed packages
(AliEn, Ganga, Dirac, Impala, Boss, Grappa,
Magda) that either sit on top of, complement,
expand or parallel the functionality of the Grid
middleware (VDT, EDG)
- At this time the LCG is coming to grips with the
middleware development requirements
- There is an expectation that an OGSA Services
Architecture will be the basis for future
development.
- The Experiments need to specify in their TDRs,
baselines, fallback and development strategies
- Motivation
- To agree on requirements as laid out in a first
step by recent work within the GAG and identify
commonalities within the current projects which
might allow the LCG (both in the AA and GTA
areas) to provide a focus of effort.
- To provide guidance to the LCG on future
Middleware development directions and interfacing
work to match the experiment requirements
- To build on the richness of the current technical
solutions to avoid duplication of efforts
- To clearly identify the roles and
responsibilities of the components/layers/services
in the experiment DA planning
- To give guidance to the community on the expected
division of work between the experiments, the LCG
and the external projects.
1) Arda was the name given by the Elves to their
World and all it contained, see
www.glyphweb.com/arda/
8 Mandate for the ARDA RTAG
- To review the current DA activities and to
capture their architectures in a consistent way
- To confront these existing projects with the HEPCAL
II use cases and the users' potential work
environments in order to explore potential
shortcomings.
- To consider the interfaces between Grid, LCG and
experiment-specific services
- Review the functionality of experiment-specific
packages, state of advancement and role in the
experiment.
- Identify similar functionalities in the different
packages
- Identify functionalities and components that
could be integrated in the generic GRID
middleware
- To confront the current projects with critical
GRID areas
- To develop a roadmap specifying wherever possible
the architecture, the components and potential
sources of deliverables to guide the medium term
(2 year) work of the LCG and the DA planning in
the experiments.
9 Schedule and Makeup of ARDA RTAG
- The RTAG shall provide a draft report to the SC2
by September 03.
- It should contain initial guidance to the LCG and
the experiments to inform the September LHCC
manpower review, in particular on the expected
responsibilities of:
- The experiment projects
- The LCG (Development and interfacing work rather
than coordination work)
- The external projects
- The final RTAG report is expected for October 03.
- The RTAG shall be composed of
- Two members from each experiment
- Representatives of the LCG GTA and AA
- If not included above, the RTAG shall co-opt or
invite representatives from the major Distributed
Analysis projects and non-LHC running experiments
with DA experience.
10 This Meeting
- I hope this meeting can give a kick start to the
RTAG
- Informed by a survey of what exists (code, use
cases) now,
- What are the components/layers/services required
specifically for distributed analysis?
- What software is currently existing or in the
works to cover these?
- Can an architecture that is realizable in the
near term be blocked out? Can it be agreed on?
- On the principle that we have to start with
realizable architectures and tools and build
upwards incrementally over time
- With due consideration for the R&D nature of
present work, can we work in a coherent and
complementary way?
- Can we identify elements which should be pursued
as common solutions?
- When we confront current middleware with our
needs, what is missing? How will the holes be
filled?