Title: Middleware emerging onto the NGS: Resource Broker
1 Middleware emerging onto the NGS: Resource Broker
Mike Mineter, mjm_at_nesc.ac.uk
2 Outline
- NGS middleware: toolkits inviting development of higher-level services
- By projects, e.g. RealityGrid and BRIDGES
- For deployment as NGS services
- What is a Resource Broker?
- Where does it come from?
- LCG-2 (EGEE-0)
- Providing production service for LCG-2
- Being configured for the NGS
- Current LCG-2 activity
3 Resource broker
- On the current NGS we have
- GRAM to submit jobs
- Information service to tell us which queues are busy
- The RB takes the work out of deciding where to run a job
- First step: the LCG-2 RB is being added to the NGS
- (LCG: Large Hadron Collider Computing Grid)
4 Current production middleware: LCG-2
5 Major components
[Diagram: User interface, Resource Broker, Replica Catalogue, Information Service, Authorization & Authentication, Logging & Book-keeping, Computing Element; labels also include Input sandbox, Broker Info, Output sandbox, Job Status]
6 [Diagram: RB node (Network Server, Workload Manager, Job Controller); Replica Location Server; Information Service; Computing Element and Storage Element, with characteristics and status]
7 UI: allows users to access the functionalities of the Workload Management System (WMS) via command line, GUI, C++ and Java APIs
Job status: submitted
[Diagram: RB node (Network Server, Workload Manager, Job Controller - CondorG); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status]
8 Job Description Language (JDL): used to specify job characteristics and requirements
- edg-job-submit myjob.jdl
- myjob.jdl:
- JobType = "Normal";
- Executable = "$(CMS)/exe/sum.exe";
- InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
- OutputSandbox = {"sim.err", "test.out", "sim.log"};
- Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueHostOperatingSystemRelease == "Red Hat 7.3" && other.GlueCEPolicyMaxCPUTime > 10000;
- Rank = other.GlueCEStateFreeCPUs;
Job status: submitted
[Diagram: RB node (Network Server, Workload Manager, Job Controller - CondorG); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status]
9 NS: network daemon responsible for accepting incoming requests
[Diagram: Job and Input Sandbox files; RB node (Network Server, Workload Manager, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
10 Job submission
WM: acts to satisfy the request
[Diagram: Job; RB node (Network Server, Workload Manager, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
11 Job submission
Where must this job be executed?
[Diagram: RB node (Network Server, Workload Manager, Match-Maker/Broker, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
12 Job submission
Matchmaker: responsible for finding the best CE for a job
[Diagram: RB node (Network Server, Workload Manager, Match-Maker/Broker, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
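The matchmaking step can be illustrated with a minimal Python sketch. This is not the LCG-2 implementation, which relies on Condor matchmaking; it only shows the idea that Requirements filter the candidate CEs and Rank chooses among those that remain. The `ComputingElement` class, the CE names and the numbers are hypothetical, and the three fields are assumed to stand in for the GLUE attributes used in the JDL example on slide 8.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """Hypothetical, simplified view of what a CE publishes."""
    name: str          # made-up CE identifier
    os_name: str       # stands in for GlueHostOperatingSystemName
    max_cpu_time: int  # stands in for GlueCEPolicyMaxCPUTime
    free_cpus: int     # stands in for GlueCEStateFreeCPUs


def requirements(ce):
    # Mirrors part of the JDL Requirements expression from slide 8.
    return ce.os_name == "linux" and ce.max_cpu_time > 10000


def rank(ce):
    # Mirrors the JDL Rank expression: prefer the CE with most free CPUs.
    return ce.free_cpus


def match(ces):
    """Return the best-ranked CE that satisfies the requirements, or None."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Made-up example data.
ces = [
    ComputingElement("ce-a", "linux", 20000, 4),
    ComputingElement("ce-b", "linux", 5000, 64),   # fails the CPU-time requirement
    ComputingElement("ce-c", "linux", 48000, 12),
]
best = match(ces)
print(best.name if best else "no matching CE")
```

Keeping the filter (`requirements`) separate from the ordering (`rank`) follows the way the JDL separates the two expressions.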
13 Job submission
Where are the needed data (on which SEs)?
What is the status of the Grid?
[Diagram: RB node (Network Server, Workload Manager, Match-Maker/Broker, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
14 Job submission
CE choice
[Diagram: RB node (Network Server, Workload Manager, Match-Maker/Broker, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
15 Job submission
Job Adapter: responsible for the final touches to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)
[Diagram: RB node (Network Server, Workload Manager, Job Adapter, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
16 Job submission
Job Controller: responsible for the actual job management operations (done via CondorG)
[Diagram: Job; RB node (Network Server, Workload Manager, Job Controller, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
17 Job submission
[Diagram: Job; RB node (Network Server, Workload Manager, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics and status; Job Status]
18 Compute element: reminder!
[Diagram: Job request; I.S.; Grid gate node with Globus gatekeeper, gridmapfile, Logging and Info system; Local resource management system (Condor / PBS / LSF master); Homogeneous set of worker nodes]
19 Job submission
Input Sandbox files
Grid-enabled data transfers/accesses
[Diagram: RB node (Network Server, Workload Manager, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element; Storage Element; Job Status]
20 Job submission
Output Sandbox files
[Diagram: RB node (Network Server, Workload Manager, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element; Storage Element; Job Status]
21 Job submission
edg-job-get-output <dg-job-id>
[Diagram: RB node (Network Server, Workload Manager, Job Controller - CondorG, RB storage); Replica Location Server; Information Service; Computing Element; Storage Element; Job Status]
22 Job submission
Job status, in the order shown: submitted, waiting, ready, scheduled, running, done, cleared
[Diagram: RB node (Network Server, Workload Manager, Job Controller - CondorG, RB storage); Output Sandbox files; Replica Location Server; Information Service; Computing Element; Storage Element]
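The status sequence on this slide can be written down as a small Python sketch. It assumes a strictly linear progression through the seven states listed on the slide; failure or cancellation paths are not shown on the slide and are not modelled here. The function names are hypothetical and do not correspond to any LCG-2 API.

```python
# Job states in the order they appear on the slide.
STATES = ["submitted", "waiting", "ready", "scheduled",
          "running", "done", "cleared"]


def next_state(current):
    """Return the state that follows `current` in the slide's sequence."""
    index = STATES.index(current)  # raises ValueError for an unknown state
    if index == len(STATES) - 1:
        raise ValueError(f"'{current}' is the final state")
    return STATES[index + 1]


def lifecycle(start="submitted"):
    """Return the list of states from `start` through to 'cleared'."""
    path = [start]
    while path[-1] != STATES[-1]:
        path.append(next_state(path[-1]))
    return path


print(" -> ".join(lifecycle()))
```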
23 Job monitoring
edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>
LB: receives and stores job events; processes corresponding job status
LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies LB
[Diagram: RB node (Network Server, Workload Manager, Job Controller - CondorG, Log Monitor, Logging & Bookkeeping); Log of job events; Job status; Computing Element]
24 LCG-2 and NGS
- LCG-2 replica management
- Logical file names, mapped by catalogue to multiple physical files
- Storage element
- Corresponds to NGS data node (approx.)
- Compute element
- A batch queue: PBS or Condor, for example
- Information service
- Same middleware and GLUE schema are used
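The replica-management idea (one logical file name mapped by a catalogue to several physical files) can be sketched in Python as follows. This is an illustrative in-memory model only, not the LCG-2 catalogue interface; the `ReplicaCatalogue` class, its methods, and the file names and hosts are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy catalogue: logical file name (LFN) -> physical file names (PFNs)."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, lfn, pfn):
        # Record a physical copy of a logical file, ignoring duplicates.
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def lookup(self, lfn):
        # Return every known physical copy; empty list if the LFN is unknown.
        return list(self._replicas.get(lfn, []))


# Made-up names, for illustration only.
catalogue = ReplicaCatalogue()
catalogue.register("lfn:/demo/run1.dat", "gsiftp://se1.example.org/data/run1.dat")
catalogue.register("lfn:/demo/run1.dat", "gsiftp://se2.example.org/store/run1.dat")

for pfn in catalogue.lookup("lfn:/demo/run1.dat"):
    print(pfn)
```

A broker that knows which storage elements hold a replica can then prefer compute elements close to the data, which is the question posed on slide 13.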
25 More about the RB
- Developed by the European DataGrid project (EDG), then hardened by LCG, and now one of the sources for the EGEE middleware (next talk)
- Uses components of Condor
- matchmaker and Condor-G
- Try the GENIUS portal on GILDA
- GILDA is a dissemination grid running the LCG-2 middleware
- Demo site: https://grid-demo.ct.infn.it/
- And look at http://lcg.web.cern.ch/LCG/ and http://www.hep.ph.ic.ac.uk/e-science/projects/demo/index.html
26 Implications for the NGS
- Are being worked out!
- Integration with NGS core nodes in progress
- UI requirements?
- LCG user interface, OGSA-DAI, SRB client
- Lighter-weight alternatives?
- To packaging?
- For client software
27 Summary
- The resource broker receives a job description in JDL
- It chooses a batch queue for job submission
- It's an example of the higher-level services that will be deployed for the NGS, built upon the current toolkits