Middleware emerging onto the NGS: Resource Broker

1
Middleware emerging onto the NGS: Resource Broker
Mike Mineter, mjm_at_nesc.ac.uk
2
Outline
  • NGS middleware: toolkits inviting development of
    higher-level services
  • By projects, e.g. RealityGrid and BRIDGES
  • For deployment as NGS services
  • What is a Resource Broker?
  • Where does it come from?
  • LCG-2 (= EGEE-0)
  • Providing production service for LCG-2
  • Being configured for the NGS
  • Current LCG-2 activity

3
Resource broker
  • On the current NGS we have
  • GRAM to submit jobs
  • Information service to tell us which queues are
    busy
  • The RB takes the work out of deciding where to
    run a job
  • First step: the LCG-2 RB is being added to the
    NGS
  • (LCG: Large Hadron Collider Computing Grid)

4
Current production middleware: LCG-2
5
Major components
[Diagram: User Interface, Resource Broker, Replica Catalogue, Information Service, Authorization and Authentication, Logging and Bookkeeping, and Computing Element, with labelled flows for the input sandbox plus broker info, the output sandbox, and job status]
6
[Diagram: RB node, Network Server, Workload Manager, Job Controller, Information Service, Replica Location Server, Computing Element, and Storage Element; label: characteristics and status]
7
Job status: submitted
UI: allows users to access the functionalities of the WMS (via command line, GUI, C and Java APIs)
[Diagram: RB node, Network Server, Workload Manager, Job Controller (CondorG), Information Service, Replica Location Server, Computing Element, and Storage Element; labels: CE characteristics and status, SE characteristics and status]
8
  • edg-job-submit myjob.jdl
  • myjob.jdl:
      JobType = "Normal";
      Executable = "(CMS)/exe/sum.exe";
      InputSandbox = {"/home/user/WP1testC", "/home/file", "/home/user/DATA/"};
      OutputSandbox = {"sim.err", "test.out", "sim.log"};
      Requirements = other.GlueHostOperatingSystemName == "linux" &&
                     other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
                     other.GlueCEPolicyMaxCPUTime > 10000;
      Rank = other.GlueCEStateFreeCPUs;

Job status: submitted
Job Description Language (JDL): used to specify job characteristics and requirements
[Diagram: RB architecture as on slide 7]
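A JDL file is a list of "name = value;" attribute assignments, so it can be generated from a script. The following is a minimal sketch, assuming a hypothetical helper render_jdl and a hypothetical marker class Expr for unquoted expressions such as Rank; neither is part of the EDG or LCG-2 tools.

```python
class Expr(str):
    """Marks a value as a raw JDL expression that must not be quoted."""


def render_jdl(attrs):
    """Render a dict of job attributes as JDL text (illustrative helper only)."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, Expr):
            # Expressions (Requirements, Rank) are written verbatim.
            rendered = str(value)
        elif isinstance(value, (list, tuple)):
            # Lists become brace-delimited, comma-separated quoted strings.
            rendered = "{" + ", ".join(f'"{item}"' for item in value) + "}"
        else:
            # Plain values are written as quoted strings.
            rendered = f'"{value}"'
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines)


job = {
    "JobType": "Normal",
    "OutputSandbox": ["sim.err", "test.out", "sim.log"],
    "Rank": Expr("other.GlueCEStateFreeCPUs"),
}
print(render_jdl(job))
```

The resulting text could be saved as myjob.jdl and passed to edg-job-submit as on the slide.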
9
NS: network daemon responsible for accepting incoming requests
[Diagram: RB architecture as on slide 7, also showing the Job, its Input Sandbox files, and RB storage]
10
Job submission
WM: acts to satisfy the request
[Diagram: RB architecture as on slide 7, also showing the Job and RB storage]
11
Job submission
Where must this job be executed?
[Diagram: RB architecture as on slide 7, also showing the Match-Maker/Broker and RB storage]
12
Job submission
Matchmaker: responsible for finding the best CE for a job
[Diagram: RB architecture as on slide 7, also showing the Match-Maker/Broker and RB storage]
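The matchmaking step can be pictured as "filter by Requirements, then sort by Rank". The sketch below is an assumption-laden illustration in Python, not the Condor matchmaker itself: the CE records and their "id" values are hypothetical, while the Glue attribute names and thresholds are taken from the JDL example on slide 8.

```python
def meets_requirements(ce):
    """Requirements from the slide-8 JDL, written as a Python predicate."""
    return (
        ce["GlueHostOperatingSystemName"] == "linux"
        and ce["GlueHostOperatingSystemRelease"] == "Red Hat 7.3"
        and ce["GlueCEPolicyMaxCPUTime"] > 10000
    )


def rank(ce):
    """Rank from the slide-8 JDL: prefer CEs with more free CPUs."""
    return ce["GlueCEStateFreeCPUs"]


def choose_ce(ces):
    """Return the matching CE with the highest rank, or None if none match."""
    candidates = [ce for ce in ces if meets_requirements(ce)]
    return max(candidates, key=rank) if candidates else None


# Hypothetical CE records, standing in for what the Information Service reports.
ces = [
    {"id": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 5000, "GlueCEStateFreeCPUs": 50},
    {"id": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 12},
]

best = choose_ce(ces)
print(best["id"])  # ce-c: ce-b has more free CPUs but fails the CPU-time requirement
```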
13
Job submission
Where (on which SEs) is the needed data?
What is the status of the Grid?
[Diagram: RB architecture as on slide 7, also showing the Match-Maker/Broker and RB storage]
14
Job submission
CE choice
[Diagram: RB architecture as on slide 7, also showing the Match-Maker/Broker and RB storage]
15
Job submission
Job Adapter: responsible for the final touches to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)
[Diagram: RB architecture as on slide 7, also showing the Job Adapter and RB storage]
16
Job submission
Job Controller: responsible for the actual job management operations (done via CondorG)
[Diagram: RB architecture as on slide 7, also showing the Job and RB storage]
17
Job submission
[Diagram: RB architecture as on slide 7, also showing the Job and RB storage]
18
Compute element reminder!

[Diagram: Job request; Grid gate node with Globus gatekeeper, gridmapfile, logging, and information system (I.S.); local resource management system (Condor / PBS / LSF master); homogeneous set of worker nodes]
19
Job submission
[Diagram: RB architecture as on slide 7, also showing RB storage, Input Sandbox files, and grid-enabled data transfers/accesses]
20
Job submission
[Diagram: RB architecture as on slide 7, also showing RB storage and Output Sandbox files]
21
Job submission
edg-job-get-output <dg-job-id>
[Diagram: RB architecture as on slide 7, also showing RB storage]
22
Job submission
Job status: submitted → waiting → ready → scheduled → running → done → cleared
[Diagram: RB architecture as on slide 7, also showing RB storage and Output Sandbox files]
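The job states listed on this slide can be modelled as an ordered enumeration. This is a minimal sketch that assumes a strictly linear progression; it covers only the states named on the slide and treats the ordering as an illustration rather than the full state machine of the workload management system.

```python
from enum import Enum


class JobState(Enum):
    """Job states in the order they appear on the slide."""
    SUBMITTED = "submitted"
    WAITING = "waiting"
    READY = "ready"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    DONE = "done"
    CLEARED = "cleared"


# Enum members iterate in definition order, which gives the assumed sequence.
_ORDER = list(JobState)


def next_state(state):
    """Return the state that follows `state`, or None after the last one."""
    index = _ORDER.index(state)
    return _ORDER[index + 1] if index + 1 < len(_ORDER) else None


state = JobState.SUBMITTED
while state is not None:
    print(state.value)
    state = next_state(state)
```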
23
Job monitoring
edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>
LB: receives and stores job events; processes the corresponding job status
LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies LB
[Diagram: RB node with Network Server, Workload Manager, Job Controller (CondorG), Log Monitor, and Logging and Bookkeeping; Computing Element; log of job events; job status]
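The division of labour between the Log Monitor and Logging and Bookkeeping can be sketched as follows. Everything here is hypothetical: the class and function names are invented for illustration, and the two-field "job_id state" log format is an assumption, not the real CondorG log format.

```python
from collections import defaultdict


class LoggingBookkeeping:
    """Toy LB: stores job events and derives the current status from them."""

    def __init__(self):
        self._events = defaultdict(list)

    def log_event(self, job_id, state):
        """Record one event for a job."""
        self._events[job_id].append(state)

    def status(self, job_id):
        """Current status is the most recent event, or None for unknown jobs."""
        events = self._events.get(job_id)
        return events[-1] if events else None

    def logging_info(self, job_id):
        """Full event history for a job, oldest first."""
        return list(self._events.get(job_id, []))


def log_monitor(log_lines, lb):
    """Toy LM: parse 'job_id state' lines (assumed format) and notify the LB."""
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not fit the assumed format
        job_id, state = parts
        lb.log_event(job_id, state)


lb = LoggingBookkeeping()
log_monitor(["job1 scheduled", "job2 scheduled", "malformed", "job1 running"], lb)
print(lb.status("job1"))        # running
print(lb.logging_info("job1"))  # ['scheduled', 'running']
```

In this picture, status plays the role of edg-job-status and logging_info the role of edg-job-get-logging-info.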
24
LCG-2 and NGS
  • LCG-2 replica management
      • Logical file names, mapped by catalogue to
        multiple physical files
  • Storage element
      • Corresponds to NGS data node (approx.)
  • Compute element
      • A batch queue: PBS or Condor, for example
  • Information service
      • Same middleware and GLUE schema are used
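The replica-management idea above, one logical file name mapped to several physical files, amounts to a one-to-many lookup. The sketch below assumes a hypothetical in-memory catalogue with invented file names and storage element hosts; it is not the LCG-2 replica catalogue API.

```python
# Hypothetical catalogue: each logical file name (LFN) maps to one or more
# physical replicas, recorded here as (storage_element, physical_path) pairs.
replica_catalogue = {
    "lfn:run42.dat": [
        ("se1.example.org", "/data/run42.dat"),
        ("se2.example.org", "/store/cms/run42.dat"),
    ],
    "lfn:calib.db": [
        ("se2.example.org", "/store/calib.db"),
    ],
}


def storage_elements_for(lfn, catalogue):
    """List the storage elements that hold a replica of one logical file."""
    return [se for se, _path in catalogue.get(lfn, [])]


def common_storage_elements(lfns, catalogue):
    """Storage elements that hold a replica of every requested logical file."""
    sets = [set(storage_elements_for(lfn, catalogue)) for lfn in lfns]
    return set.intersection(*sets) if sets else set()


print(storage_elements_for("lfn:run42.dat", replica_catalogue))
print(common_storage_elements(["lfn:run42.dat", "lfn:calib.db"], replica_catalogue))
```

A lookup like this is what lets a broker answer "which SEs hold the needed data?" before it chooses a compute element.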

25
More about the RB
  • Developed by the European DataGrid project (EDG),
    then hardened by LCG, and now one of the
    sources for the EGEE middleware (next talk)
  • Uses components of Condor:
    matchmaker and Condor-G
  • Try the GENIUS portal on GILDA
  • GILDA is a dissemination grid running the LCG-2
    middleware
  • Demo site: https://grid-demo.ct.infn.it/
  • And look at http://lcg.web.cern.ch/LCG/ and
    http://www.hep.ph.ic.ac.uk/e-science/projects/demo/index.html

26
Implications for the NGS
  • Are being worked out!
  • Integration with NGS core nodes in progress
  • UI requirements?
  • LCG user interface, OGSA-DAI, SRB client
  • Lighter-weight alternatives?
  • To packaging?
  • For client software

27
Summary
  • The resource broker receives a job description in
    JDL
  • It chooses a batch queue for job submission
  • It's an example of the higher-level services that will
    be deployed for the NGS, built upon the current
    toolkits