Title:
1 Grey areas of the new architecture
- Massimo Sgaravatto
- INFN Padova
2 Issues
- Many topics reported in D1.4 were not deeply discussed
- Some were NEVER discussed
- Not sure if there is a general consensus on what has been written (hope so)
- In any case, D1.4 is too vague
- OK for a high-level architecture document such as D1.4
- Not enough, in my opinion, to describe in detail how the whole system will work and how everything must be reorganized/implemented
- Not all components are in the picture (e.g. the Grid Accounting components)
4 Examples of areas that must be clarified
- Reservation and co-allocation
- How is a reservation/co-allocation used by a job ?
- Where and how is the status of a reservation/co-allocation kept ? LB ?
- Interfaces with GARA
- Interfaces with LB
- Which components push events to LB ?
- Which events are pushed to LB ?
- Collection jobs (e.g. jobs belonging to the same DAG)
- LB API needed for job checkpointing
- Which events can the Log Monitor notify to the Workload Manager, and what are the expected actions ?
- Is a job submitted to CondorG only when a suitable resource has been found, or is it immediately inserted into the CondorG queue on hold, and then released when a suitable resource is found ?
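The last question above contrasts two submission strategies. A toy model may make the difference concrete. This is only a sketch: `ToyQueue`, `Job` and both functions are hypothetical names invented for illustration and are not CondorG or EDG interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    state: str = "waiting"          # waiting -> held -> queued
    resource: Optional[str] = None  # filled in once a match is found


@dataclass
class ToyQueue:
    """Hypothetical in-memory stand-in for the CondorG queue."""
    jobs: Dict[str, Job] = field(default_factory=dict)

    def insert(self, job: Job, hold: bool = False) -> None:
        job.state = "held" if hold else "queued"
        self.jobs[job.job_id] = job

    def release(self, job_id: str, resource: str) -> None:
        job = self.jobs[job_id]
        job.resource = resource
        job.state = "queued"


def submit_when_matched(queue: ToyQueue, job: Job, resource: str) -> None:
    """Strategy A: the job enters the queue only after a resource is found."""
    job.resource = resource
    queue.insert(job)


def submit_on_hold(queue: ToyQueue, job: Job) -> None:
    """Strategy B: the job enters the queue at once, on hold, with no resource."""
    queue.insert(job, hold=True)
```

In this sketch, strategy B makes the queue aware of the job before matchmaking, so a later `release(job_id, resource)` call is needed; with strategy A, a job that is still waiting for a match has to be tracked somewhere outside the queue.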
5 What is needed (in my opinion)
- Necessary to define the whole architecture much more clearly and in much more detail
- Needed to define, considering the various use cases (the various commands and the various events which could occur), the exact functionalities provided by these components and the interfaces between them
- Necessary to define clear responsibilities for the various components
- This must be done NOW if we want to rely on the new architecture by release 2.0
6 Responsibilities
- User Interface: Datamat
- Network Server: Catania (recycle some existing code of RB ?)
- Protocol: Catania (recycle some existing code of RB ?)
- Workload Manager: CNAF (recycle some existing code of RB ?)
- Reservation Agent: CNAF
- Co-Allocation Agent: CNAF
- Resource Broker (MatchMaker): Catania
- Partitioner: Padova
- Helper: Francesco G.
- Job Adapter: CNAF (recycle some existing code of jobwrapper)
- JSS object (Padova)
- Log Monitor: Padova (evolution of JSSparser)
- Logging and Bookkeeping: CESNET
- Integration with DAGMan: CNAF
- Grid Accounting components: Torino
- Interactive jobs support integration
7 Proposed schedule
- Today: define responsibilities for the various modules
- Today: define which functionalities can realistically be in place (and tested) for release 2.0 (8 working weeks till the end of September)
- Planned new functionalities (release 1.4 and 2.0):
  - Support for interactive jobs
  - Support for job dependencies
  - Integration with WP2 query optimization service
  - Java API (if needed by applications)
  - GUI
  - Advance reservation API
  - Deployment of Accounting infrastructure over Testbed (HLRs with command line interface)
  - Support for logical trivial job checkpointing
  - Support for job partitioning
  - Full integration of cost estimation/accounting into scheduling policies
  - Integration of advance reservation/co-allocation into the Resource Broker
  - RB relying on the new IS Glue Schema
- Today and in the next days: identify which other components are missing in the picture and plug them in (only Grid Accounting stuff ?)
8 Proposed schedule
- (Chat) meetings to discuss in more detail the functionalities of the various components and the interfaces between them
- Start by considering existing functionalities, then consider, one by one, the new functionalities that will be in place for release 2.0
- Starting this Wednesday (real meeting between a few partners)
- Date ??: New CVS in place
- Date ??: Start implementation relying on the new CVS
- September 2-5: EDG Workshop in Budapest
- September 9: start hands-on meeting
- September 30: release 2.0
9 Mail from Bob Jones
- Reflecting on what we discussed and taking into account the opinions of several of you, I think we should be more realistic and assume there will only be at most one more EDG release after 1.2 that is deployed on the production testbed in 2002. The SC2002 et al. demos for November should be prepared based on release 1.2. Obviously the development and certification testbeds will be more advanced. For the EU review at the start of 2003, I think we could imagine providing demos of what is currently possible on the production testbed (i.e. reuse the SC2002 et al. demos) and also show them the latest features of the development or certification testbeds.
10 Mail from Bob Jones
- Mware sw scheduling info
Please look at the software release plan (http://edms.cern.ch/document/333297) and, for each item for your WP listed in release 1.2, 1.3, 1.4 and 2.0, tell me:
Delivery date: When you expect it to be delivered.
Note 1: If it is already included in release 1.2 then just say "1.2".
Note 2: "delivered" means documented and tested (REALLY!).
Effort Required: State how much effort is required to make the delivery (remember: documented and tested). Please specify in (wo)man weeks. Identify who will perform the work (i.e. specify the names and how many weeks of work they do each).
Note 1: please check with the people concerned that your information is correct and that they can schedule the estimated time (i.e. they are not over committed with other tasks, on holiday for that period etc.).
Dependencies: List other sw not already included in release 1.2 that it depends on (both in your WP and any other).
GLUE schema: please be sure to include details of the work on the information providers/consumers (including their current status).
In general I prefer you to be pessimistic rather than optimistic about your dates.
11 Software release plan
| Item | Expected release date | Involved people | Estimated effort required | Dependencies |
12 WP1 Software release plan
| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| C API | 1.3 | Datamat | | |
| Support for MPICH jobs | 1.3 | Padova | | |
| Improving error reporting | 1.3 | Datamat, Catania | | |
| Support for interactive jobs | 1.4 | Milano | | |
| Job dependencies | 1.4 | CNAF | | Condor team? |
| Integration with WP2 Query Optim. Service | 1.4 | Catania | | WP2 Query Opt. Service |
13 WP1 Software release plan
| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| Java API (if needed) | 1.4 | Datamat | | |
| GUI | 1.4 | Datamat | | |
| Deployment of Accounting infrast. over Testbed (HLRs with command line interface) | 1.4 | Torino | | WP4? |
| Advance reservation API | 1.4 | CNAF | | |
14 WP1 Software release plan
| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| RB relying on the Glue schema | 1.4 | Catania | | Schema and DIT defined, WP4 (inf. pr.) |
| Job checkpointing | 2.0 | Pd, Ces. | | LB |
| Job partitioning | 2.0 | Padova | | Job checkp., job depend. |
| Full integration of cost estimation/accounting into scheduling policies | 2.0 | Catania, Torino | | |
| Integration of advance res./co-all. into RB | 2.0 | Catania, CNAF | | |
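The dependencies listed for these items imply an order in which work can be completed. As a rough sketch, they can be written as a graph and sorted topologically with Python's standard `graphlib` (Python 3.9+). The sketch assumes that the abbreviations "Job checkp." and "job depend." refer to the "Job checkpointing" and "Job dependencies" items; the dictionary and function names are illustrative only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# item -> set of things it depends on, copied from the Dependencies column.
# Assumption: "Job checkp." / "job depend." mean the items of the same name.
DEPENDS_ON = {
    "RB relying on the Glue schema": {"Schema and DIT defined", "WP4 (inf. pr.)"},
    "Job checkpointing": {"LB"},
    "Job partitioning": {"Job checkpointing", "Job dependencies"},
}


def delivery_order(depends_on):
    """Return one valid order in which every item comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = delivery_order(DEPENDS_ON)
for position, item in enumerate(order, start=1):
    print(position, item)
```

Any order produced this way places LB before job checkpointing, and both job checkpointing and job dependencies before job partitioning.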
15 My personal ideas
- Deliver new 1.2 RPMs as requested
- JSS problems, fixes for outstanding issues with autotools (if any)
- No new 1.3 RPMs
- To avoid being asked to support 1.3 (as happened with 1.2) and therefore not being able to implement the new stuff
- Deliver 2.0 RPMs (but with fewer functionalities than originally planned)
16 WP1 Sw rel. plan (my prop.)
| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| C API | 1.3 -> 2.0 | SM, MP (CT), Datamat (FP, AM), CESNet (AK), Pd (RP) | 3 person weeks | |
| Support for MPICH jobs | 1.3 -> 2.0 | Padova (AG) | ½ person week | |
| Improving error reporting and communication from UI | 1.3 -> 2.0 | Datamat (FP, AM), Catania (SM, MP) | 2 person weeks | |
| Support for interactive jobs | 1.4 -> 2.0 | Mi (MM), CNAF (ER), Datamat (FP, AM) | 3 person weeks | |
| Job dependencies | 1.4 -> 2.0 | CNAF (FG, ER), Cesnet (all), Datamat (FP, AM) | 16 person weeks | |
| Integration with WP2 Query Optim. Service | 1.4 -> 2.0 | Catania (SM, MP) | 1 person week | WP2 Query Opt. Service |
17 WP1 Sw rel. plan (my prop.)
| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| Java API, GUI | 1.4 -> 2.0 | Datamat (GA) | 6 person weeks | |
| Deployment of Accounting infrast. over Testbed (HLRs with command line interface) | 1.4 -> 2.0 | Torino (AG, SB) | 8 person weeks | WP4 |
| Advance reservation API | 1.4 -> 2.0 | CNAF (FG, ER, SF) | 2 person weeks | |
18 WP1 Sw rel. plan (my prop.)
| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| RB relying on the Glue schema | 1.4 -> 2.0 | Catania (SM, MP) | 2 person weeks | Schema and DIT defined, WP4 (inf. pr.) |
| Job checkpointing | 2.0 | Pd (AG, RP), Ces. (MM) | 6 person weeks | LB |
| Job partitioning | 2.0 -> after 2.0 | Padova (AG, RP) | 4 person weeks | Job checkp., job depend. |
| Full integration of price estimation/accounting into scheduling policies | 2.0 -> after 2.0 | Catania (SM, MP), Torino (SB, AG) | 8 person weeks | |
| Integration of advance res./co-all. into RB | 2.0 -> after 2.0 | Catania (SM, MP), CNAF (ER, SF, FG) | 12 person weeks | WP4, WP5, WP7 |
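As a rough cross-check of the proposal, the effort figures can be summed per target release and compared with the 8 working weeks available until the end of September. The sketch below copies the person-week figures from the three tables above and makes two assumptions: "½" is read as 0.5, and entries whose date ends in "2.0" are read as targeted at release 2.0, while those ending in "after 2.0" are read as deferred. The names `PLAN`, `WORKING_WEEKS` and `effort_by_target` are illustrative only.

```python
# (item, assumed target release, effort in person weeks)
PLAN = [
    ("C API", "2.0", 3),
    ("Support for MPICH jobs", "2.0", 0.5),
    ("Improving error reporting and communication from UI", "2.0", 2),
    ("Support for interactive jobs", "2.0", 3),
    ("Job dependencies", "2.0", 16),
    ("Integration with WP2 Query Optim. Service", "2.0", 1),
    ("Java API, GUI", "2.0", 6),
    ("Deployment of Accounting infrast. over Testbed", "2.0", 8),
    ("Advance reservation API", "2.0", 2),
    ("RB relying on the Glue schema", "2.0", 2),
    ("Job checkpointing", "2.0", 6),
    ("Job partitioning", "after 2.0", 4),
    ("Full integration of price estimation/accounting", "after 2.0", 8),
    ("Integration of advance res./co-all. into RB", "after 2.0", 12),
]

WORKING_WEEKS = 8  # "8 working weeks till the end of September"


def effort_by_target(plan):
    """Sum the person weeks for each target release."""
    totals = {}
    for _item, target, weeks in plan:
        totals[target] = totals.get(target, 0) + weeks
    return totals


totals = effort_by_target(PLAN)
# Average number of full-time people needed to fit the 2.0 work into the window.
full_time_people = totals["2.0"] / WORKING_WEEKS
print(totals, full_time_people)
```

Under these assumptions the items kept for release 2.0 add up to 49.5 person weeks, i.e. roughly six people working full time for the whole period, before counting any dependency-related waiting.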