Interactive MPI on Demand

Slides: 18
Provided by: Miron1
Transcript and Presenter's Notes


1
Interactive MPI on Demand
2
Unix Tool Philosophy
  • 1) Individual tools do one thing well
  • 2) Communicate via ASCII streams
  • 3) Are composable

3
The Paradox
  • Universal assent that it's good
  • No one uses it
  • (Except for shell one-liners)
  • grep abc | sort | uniq -c | sort -n
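
A minimal sketch of the one-liner above, wrapped in a function so it can be reused. The function name count_matches and the sample input are illustrative assumptions, not part of the original slides.

```shell
# Count how often each matching line occurs, least frequent first.
# Each tool does one job; the pipe composes them over plain text.
count_matches() {
  # $1 = pattern to search for; input lines are read from stdin
  grep -- "$1" | sort | uniq -c | sort -n
}

# Sample input: three lines match "abc", two of them are identical.
printf 'abc\nxyz\nabcd\nabc\n' | count_matches abc
```

Each stage can be swapped or dropped independently, which is the composability the slide refers to.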

4
More than just shell scripts
  • Division in Unix processes provides
  • Restartability
  • Better security
  • Scalable across multi-core

5
For example
  • Qmail
  • Secure, stable
  • Implemented across a dozen processes

6
Getting back to Condor
  • Condor uses this in some places
  • x-GAHPs
  • condor_master
  • Replaceable shadow/starter pairs
  • Multi_shadow vs. many shadow
  • But not everywhere
  • schedd

7
Condor Daemons as Components
  • Very successful strategy
  • Glide-in
  • Personal Condor
  • Hoffman and schedds as jobs
  • Condor-C

8
Case Study: MPI on Demand
  • The problem
  • Have a pool with lots of machines
  • Very-long running (weeks) vanilla jobs
  • Need to run big, but short MPI
  • Can't reboot startds
  • Need Dedicated scheduler
  • Requires dedicated machines

9
Possible Solutions
  • Add suspension slot
  • Requires Reboot
  • Submit MPI job normally
  • Preempts vanilla job

10
COD refresher
  • COD: Computing On Demand
  • No Scheduling
  • No File Transfer
  • When COD runs, vanilla job suspends
  • Checkpoint to swap
  • Needs security on to work
  • Explicitly allowed

11
Startd as COD job
  • Overview
  • Launch personal condor
  • Run startds as COD jobs on base pool
  • Report to personal Condor
  • Base jobs suspend
  • Submit parallel job to personal Condor
  • Remove COD startds
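
As a rough sketch of the "submit parallel job" step, the function below writes a parallel-universe submit file for the personal Condor. The function name write_submit_file, the file names, and the executable mpi_wrapper.sh are hypothetical placeholders; the slides do not show the actual submit file.

```shell
# Write a parallel-universe submit description (sketch only).
write_submit_file() {
  # $1 = output path
  # $2 = number of machines, i.e. COD startds that reported in
  cat > "$1" <<EOF
universe = parallel
executable = mpi_wrapper.sh
machine_count = $2
output = mpi.out.\$(NODE)
error = mpi.err.\$(NODE)
log = mpi.log
queue
EOF
}

# Usage, with CONDOR_CONFIG pointing at the personal Condor:
#   write_submit_file mpi.sub 3 && condor_submit mpi.sub
```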

12
Startd under COD Details
  • Two condor_config files: careful!
  • COD provides no file transfer
  • Can re-use existing startd binary
  • Need to pre-stage or NFS config_file
  • Don't lose the claim ID!
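
The second config file, the one the COD-launched startd reads, might look something like the fragment generated below. This is a sketch under assumptions: the function name write_cod_config and its arguments are hypothetical, the fragment is not a complete configuration, and the slides do not show the real file. The DedicatedScheduler lines follow the usual pattern for advertising a startd to a dedicated scheduler.

```shell
# Generate a config fragment for startds that report to the personal Condor.
write_cod_config() {
  # $1 = output path (pre-staged or on NFS, since COD transfers no files)
  # $2 = host running the personal Condor
  # $3 = local directory for this startd's log/spool/execute
  cat > "$1" <<EOF
CONDOR_HOST = $2
LOCAL_DIR = $3
LOG = \$(LOCAL_DIR)/log
SPOOL = \$(LOCAL_DIR)/spool
EXECUTE = \$(LOCAL_DIR)/execute
DAEMON_LIST = MASTER, STARTD
DedicatedScheduler = "DedicatedScheduler@$2"
STARTD_ATTRS = \$(STARTD_ATTRS), DedicatedScheduler
EOF
}

# Usage (hypothetical host and paths):
#   write_cod_config /nfs/new_config personal.example.org /tmp/p-condor
```

Keeping this file separate from the base pool's condor_config is what lets the same startd binary serve both pools.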

13
Example code
  • HOSTS="a b c"
  • for h in $HOSTS; do
  •   condor_cod request -name "$h" > "claimid.$h"
  • done
  • for n in claimid.*; do
  •   condor_cod activate -id "$(cat "$n")" -jobad ja
  • done
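
The final step of the workflow, removing the COD startds, could be sketched as follows. It assumes, as the slide's loop does, that each claimid.<host> file holds just the claim ID. The function name release_claims and the COD_CMD override (useful for dry runs) are hypothetical additions.

```shell
# COD_CMD is a hypothetical override for dry runs; the default is the real tool.
COD_CMD=${COD_CMD:-condor_cod}

# Release every saved COD claim and delete its claim-ID file.
release_claims() {
  # $1 = directory holding the claimid.<host> files
  for f in "$1"/claimid.*; do
    [ -e "$f" ] || continue   # glob matched nothing: no claims to release
    # Remove the file only if the release succeeded, so the ID is not lost.
    $COD_CMD release -id "$(cat "$f")" && rm -f "$f"
  done
}

# Usage:
#   release_claims .                  # release for real
#   COD_CMD=echo; release_claims .    # dry run: print the commands instead
```

Note that the dry run above still deletes the claim-ID files, because echo succeeds; try it on a copy of the directory.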

14
COD JOB_AD
  • CMD = "/nfs/path/run-startd.sh"
  • IWD = "/tmp"
  • Out = "startd.out"
  • Err = "startd.err"
  • Universe = 5

15
run-startd.sh
  • mkdir -p p-condor/{spool,log,execute}
  • export CONDOR_CONFIG=/nfs/new_config
  • exec /usr/sbin/condor_master -f -t

16
Summary
  • Use Condor daemons as components
  • Mix-and-match as needed

17
Questions?
  • Thank You!