Condor - PowerPoint PPT Presentation


Title: Condor


1
Condor
  • What it is
  • and
  • How it works.
  • Paul Wilson
  • Department of Earth Sciences
  • University College, London

2
What is High-Throughput Computing?
  • High-performance: CPU cycles/second under ideal
    circumstances.
  • How fast can I run simulation X on this
    machine?
  • High-throughput: CPU cycles/day (week, month,
    year?) under non-ideal circumstances.
  • How many times can I run simulation X in the
    next month using all available machines?
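The difference between the two questions can be made concrete with some rough arithmetic. The sketch below is illustrative only: the function name, the machine counts, the 6-hour run time and the availability fractions are made-up assumptions, not figures from the talk, and it ignores runs lost when a job is interrupted.

```python
def runs_per_month(machines, hours_per_run, availability, days=30):
    """Estimate completed runs of one simulation over `days` days.

    availability is the assumed fraction of each day (0.0 to 1.0) that a
    machine is free to run jobs, i.e. the "non-ideal circumstances".
    """
    usable_hours = machines * days * 24 * availability
    return int(usable_hours // hours_per_run)


# Hypothetical case 1: one dedicated machine that is always available.
single = runs_per_month(machines=1, hours_per_run=6, availability=1.0)

# Hypothetical case 2: 100 desktop machines that are free half of the time.
pool = runs_per_month(machines=100, hours_per_run=6, availability=0.5)

print(single, pool)
```

Under these assumptions the pool of part-time machines completes far more runs per month than the single dedicated machine, which is the throughput question rather than the performance question.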

3
What is Condor?
  • Condor converts collections of distributively
    owned workstations and dedicated clusters into a
    distributed high-throughput computing facility.
  • Condor uses ClassAd Matchmaking to make sure that
    everyone is happy.

4
Condor Project (Established '85)
  • Distributed High Throughput Computing research
    performed by a team of 25 faculty, full-time
    staff and students who:
  • face software engineering challenges in a
    distributed UNIX/Linux/NT environment,
  • are involved in national and international
    collaborations,
  • actively interact with academic and commercial
    users,
  • maintain and support a large distributed
    production environment,
  • and educate and train students.
  • Funding: US Govt. (DoD, DoE, NASA, NSF),
    AT&T, IBM, Intel, Microsoft, UW-Madison

5
The Condor System
  • Unix and NT
  • Operational since 1986
  • Software available free on the web
  • More than 150 Condor installations worldwide in
    academia and industry
  • Condor at UCL will be one of the largest single
    Condor pools in the world.

6
Some HTC Challenges
  • Condor does whatever it takes to run your jobs,
    even if some machines:
  • Crash (or are disconnected)
  • Run out of disk space
  • Don't have your software installed
  • Are frequently needed by others
  • Are far away and managed by someone else

7
Layout of the Condor Pool
ClassAd Communication Pathway
  • Machines can be configured to do one of 3 things:
  • Run AND submit jobs.
  • Run jobs only.
  • Submit jobs only.
  • This 3-machine pool has the following:
  • A run and submit-enabled Central Manager
  • 2 x run-enabled pool machines.
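One way to picture the three roles is as the set of long-running daemons each kind of machine hosts. The mapping below is a sketch that borrows the daemon names introduced on the following slides; the dictionary layout, the role labels and the function name are assumptions made for illustration, and a real pool sets this in Condor's configuration rather than in Python.

```python
# Long-running daemons per role, using the daemon names from the slides
# that follow. Shadows and starters are left out because they exist only
# for the lifetime of a job.
ROLE_DAEMONS = {
    "submit": {"schedd"},
    "run": {"startd"},
    "central_manager": {"collector", "negotiator"},
}


def daemons_for(roles):
    """Return the sorted daemon names for a machine with the given roles."""
    daemons = {"master"}  # the master runs on every machine in the pool
    for role in roles:
        daemons |= ROLE_DAEMONS[role]
    return sorted(daemons)


# The 3-machine pool described on the slide.
pool = {
    "central manager": daemons_for(["central_manager", "run", "submit"]),
    "pool machine 1": daemons_for(["run"]),
    "pool machine 2": daemons_for(["run"]),
}

for name, daemons in pool.items():
    print(name, daemons)
```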

8
Job Startup
  • Diagram components: Submit, Schedd, Shadow,
    Startd, Starter, Customer Job

9
The Condor Daemons
  • master
  • schedd
  • shadow
  • startd
  • starter
  • negotiator
  • collector

10
Master daemon: big daddy.
  • 1. master: ALL MACHINES IN POOL
  • Runs all the time on every machine.
  • Spawns all other daemons, monitors each and
    restarts them if they crash.
  • Has cmd-line tools to reconfigure the daemons:
  • condor_off / condor_on (switch off/on condor resource)
  • condor_reconfig (reconfig the master config)
  • condor_off -master (switch off master daemon)
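The "spawn, monitor, restart" behaviour can be sketched as a small supervisor loop. This is a toy model, not Condor's implementation: `ToyDaemon` and `ToyMaster` are hypothetical classes, and a crash is simulated by flipping a flag rather than by killing a real process.

```python
class ToyDaemon:
    """Stand-in for a daemon process; `alive` is set False to simulate a crash."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.starts = 1  # how many times this daemon has been started


class ToyMaster:
    """Starts a set of daemons and restarts any that are found dead."""

    def __init__(self, names):
        self.children = {name: ToyDaemon(name) for name in names}

    def check_once(self):
        """Do one monitoring pass and return the names that were restarted."""
        restarted = []
        for daemon in self.children.values():
            if not daemon.alive:
                daemon.alive = True
                daemon.starts += 1
                restarted.append(daemon.name)
        return restarted


master = ToyMaster(["schedd", "startd"])
master.children["startd"].alive = False  # simulate a startd crash
restarted = master.check_once()
print(restarted)
```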

11
Job Scheduling and Monitoring
  • 2 & 3. schedd and shadow: MACHINE SUBMITTING JOBS
  • Any machine submitting jobs runs these daemons.
  • When a job is submitted the Shadow monitors it
    and controls file I/O, remote calls etc.
  • Schedd represents requests to the pool, and
    stores the job_queue. There are cmd-line tools to
    view and manipulate the queue:
  • condor_rm (remove a job from the queue)
  • condor_q (look at the current queue)
  • condor_submit (submit a job to the queue)
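For scripted use, the three queue tools can be wrapped in a few lines of Python. The sketch below only builds the command lines; `queue_command` and `run` are hypothetical helper names, `job.sub` is a made-up submit file name, and actually calling `run` assumes the Condor tools are installed and on the PATH.

```python
import subprocess

# The three queue tools named on the slide.
QUEUE_TOOLS = ("condor_submit", "condor_q", "condor_rm")


def queue_command(tool, *args):
    """Build the argument list for one of the queue tools."""
    if tool not in QUEUE_TOOLS:
        raise ValueError(f"not a queue tool from the slide: {tool}")
    return [tool, *args]


def run(argv):
    """Run a command and return its standard output (needs Condor installed)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# "job.sub" is a hypothetical submit description file; "289.0" is a job ID
# in the form that condor_q prints in its ID column.
submit_cmd = queue_command("condor_submit", "job.sub")
show_cmd = queue_command("condor_q")
remove_cmd = queue_command("condor_rm", "289.0")
print(submit_cmd, show_cmd, remove_cmd)
```

Keeping command construction separate from execution makes the wrapper easy to check on a machine that has no Condor installation.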

12
Job Starting and Execution
  • 4 & 5. startd and starter: MACHINE RUNNING JOBS.
  • Startd represents the machine capable of running
    jobs.
  • Advertises the machine's attributes to the pool's
    central manager, in order to be matched with
    likely jobs.
  • Startd daemon spawns the starter when issued with
    a job; the starter sets up the execution
    environment and runs the job.
  • The starter daemon communicates with the shadow
    daemon on the submitting machine; this
    communication controls the I/O for the job.
  • starter and shadow daemons exist only for the
    lifetime of the job, and each job has its own
    pair.

13
Resource information Collection
  • 6. collector: Pool Central Manager ONLY.
  • Collects information about the Condor pool.
  • All other daemons on all pool machines
    periodically send their ClassAds to the
    collector.
  • The condor_status command queries the collector
    and provides a rich list of data for each machine
    in the pool.
  • condor_status -v is a long list of information.
  • condor_status is a summary, mainly showing
    whether the machine is idle, busy, matched or
    vacating.

14
Matching Jobs with Resources
  • 7. negotiator: Pool Central Manager ONLY.
  • This is the backbone of the Condor system.
  • Responsible for job to machine matchmaking.
  • Queries the collector periodically for the
    current status of all resources in the pool.
  • Contacts each schedd daemon with waiting job
    requests, and pairs these requests with any
    free resources that satisfy them.
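The cycle described above can be sketched as a simple first-fit loop. This is a deliberately reduced model, not the negotiator's real algorithm: the function name, the dictionary keys and the memory-only fit test are assumptions for illustration, and it ignores anything else a real negotiator may take into account.

```python
def negotiation_cycle(machine_ads, job_queues):
    """One pass: give each waiting job the first free machine that fits.

    machine_ads: list of dicts, as the collector might report them.
    job_queues:  {schedd_name: [job dicts]} holding the waiting jobs.
    Returns a list of (schedd_name, job_id, machine_name) matches.
    """
    free = [m for m in machine_ads if m["state"] == "Unclaimed"]
    matches = []
    for schedd, jobs in job_queues.items():
        for job in jobs:
            for machine in free:
                if machine["memory_mb"] >= job["memory_mb"]:
                    matches.append((schedd, job["id"], machine["name"]))
                    free.remove(machine)  # a machine is matched only once
                    break
    return matches


# Hypothetical pool state and job queue.
machines = [
    {"name": "a", "state": "Unclaimed", "memory_mb": 512},
    {"name": "b", "state": "Claimed", "memory_mb": 256},
    {"name": "c", "state": "Unclaimed", "memory_mb": 256},
]
queues = {
    "submit-host": [
        {"id": "1.0", "memory_mb": 512},
        {"id": "1.1", "memory_mb": 256},
        {"id": "1.2", "memory_mb": 128},
    ]
}
matches = negotiation_cycle(machines, queues)
print(matches)
```

In this example the third job stays in the queue because both free machines are already matched; it would be considered again on the next cycle.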

15
What is ClassAd Matchmaking?
  • Condor uses ClassAd Matchmaking to make sure that
    work gets done within the constraints of both
    users and owners.
  • Users (jobs) have constraints:
  • "I need an Alpha with 256 MB RAM"
  • Owners (machines) have constraints:
  • "Only run jobs when I am away from my desk and
    never run jobs owned by Bob."
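The key idea is that a match is two-sided: the job must accept the machine and the machine must accept the job. The sketch below expresses the two example constraints as Python functions. It is an analogy only: real ClassAds are written in Condor's own expression language, and the attribute names (`arch`, `memory_mb`, `owner_at_desk`, `owner`) and the user "alice" are hypothetical.

```python
def two_way_match(job, machine):
    """A match requires each side's requirements to accept the other side."""
    return (job["requirements"](job, machine)
            and machine["requirements"](machine, job))


# User constraint: "I need an Alpha with 256 MB RAM".
job = {
    "owner": "alice",
    "requirements": lambda me, other: (
        other["arch"] == "ALPHA" and other["memory_mb"] >= 256
    ),
}

# Owner constraint: "Only run jobs when I am away from my desk and
# never run jobs owned by Bob."
machine = {
    "arch": "ALPHA",
    "memory_mb": 512,
    "owner_at_desk": False,
    "requirements": lambda me, other: (
        not me["owner_at_desk"] and other["owner"] != "bob"
    ),
}

bobs_job = dict(job, owner="bob")
occupied = dict(machine, owner_at_desk=True)

print(two_way_match(job, machine))       # both sides agree
print(two_way_match(bobs_job, machine))  # the machine's owner refuses Bob
print(two_way_match(job, occupied))      # the owner is at the desk
```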

16
The ClassAd 1: Condor's Temping Agency.
  • Each Condor Pool MACHINE provides a ClassAd to
    the central manager.
  • This is an advertisement of a pool machine's
    attributes and status.
  • Chinon 1 condor_status
    Name                 OpSys    Arch   State      Activity  LoadAv  Mem  ActvtyTime
    chinon.geol.ucl.ac.  IRIX65   SGI    Unclaimed  Idle      0.198   512  0+00:00:04
    sancerre.geol.ucl.a  WINNT51  INTEL  Unclaimed  Busy      0.020   512  002284
    geol-ws42.geol.ucl   WINNT50  INTEL  Claimed    Busy      0.990   256  0+13:27:21
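A summary listing like this is easy to post-process. The sketch below splits each row on whitespace and pairs it with the header names; `parse_status` is a hypothetical helper, the sample text copies two rows from the slide (with host names truncated as shown there), and the parser assumes no column value contains a space and that the text holds only a header and machine rows.

```python
def parse_status(text):
    """Turn a whitespace-separated status listing into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Two rows taken from the slide; all values are kept as strings.
SAMPLE = """\
Name                OpSys   Arch  State     Activity LoadAv Mem ActvtyTime
chinon.geol.ucl.ac. IRIX65  SGI   Unclaimed Idle     0.198  512 0+00:00:04
geol-ws42.geol.ucl  WINNT50 INTEL Claimed   Busy     0.990  256 0+13:27:21
"""

machines = parse_status(SAMPLE)

# Machines that are unclaimed and idle, i.e. candidates for a new job.
idle = [m["Name"] for m in machines
        if m["State"] == "Unclaimed" and m["Activity"] == "Idle"]
print(idle)
```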

17
The ClassAd 2: The Local Job Centre.
  • Each Condor Pool JOB provides a ClassAd to the
    central manager.
  • This is an advert of a job's requirements, files
    and location.
  • Chinon 1 condor_q
    -- Submitter: sancerre.geol.ucl.ac.uk : <128.40.78.1931029> : sancerre.geol.ucl.ac.uk
     ID      OWNER    SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
     237.4   arnaud   5/12 11:01   0+00:12:55  R   0    5.9   abinis.exe
     289.0   pwilson  5/16 14:53   0+01:03:42  R   0    25.4  metadise.exe

18
How do I use Condor?
  • This will be covered after the break in three
    sessions:
  • DEMO: How to install Condor and create a pool
  • DEMO: How to submit and monitor Condor jobs
  • PRACTICAL: Job submission exercise.