Title: Condor at Cardiff
1 Condor at Cardiff
2 Contents
- What is Condor
- Condor at Cardiff
- Condor Users at Cardiff
- Green Computing at Cardiff
- Advanced Research Computing at Cardiff
- Virtualization
- Patterns
3 What is Condor
- Condor is the name for two species of New World vultures, each in a monotypic genus
- They are the largest flying land birds in the Western Hemisphere
4 What is Condor
- A specialised workload management system for compute-intensive jobs
- Users submit their jobs to Condor
- Condor places them into a queue
- Condor chooses where and when to run them
- Condor carefully monitors their progress
- Condor informs the user upon completion
- http://www.cs.wisc.edu/condor/
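The submit, queue, place, monitor and inform steps listed on this slide can be illustrated with a toy model. This is only a sketch of the workflow, not Condor's real matchmaking; the names `Job`, `ToyPool` and `run_cycle` are hypothetical and exist only for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    owner: str
    command: str
    status: str = "idle"  # idle -> running -> completed


@dataclass
class ToyPool:
    """Toy stand-in for a workload manager (hypothetical, not Condor code)."""
    machines: list
    queue: deque = field(default_factory=deque)
    notifications: list = field(default_factory=list)

    def submit(self, job):
        # Submitted jobs wait in a first-in, first-out queue.
        self.queue.append(job)

    def run_cycle(self):
        # Decide where and when: give each machine at most one queued job.
        running = []
        for machine in self.machines:
            if not self.queue:
                break
            job = self.queue.popleft()
            job.status = "running"
            running.append((job, machine))
        # The toy "monitors" by finishing every running job at the end of
        # the cycle, then records a message for the job's owner.
        for job, machine in running:
            job.status = "completed"
            self.notifications.append(
                f"{job.owner}: '{job.command}' completed on {machine}"
            )
        return len(running)


pool = ToyPool(machines=["pc-01", "pc-02"])
jobs = [Job("alice", f"simulate --run {i}") for i in range(3)]
for job in jobs:
    pool.submit(job)
first = pool.run_cycle()   # two machines, so two jobs run
second = pool.run_cycle()  # the remaining job runs in the next cycle
```

With two machines and three jobs, the third job stays queued until the second cycle, which is the behaviour the "places them into a queue" bullet describes.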
5 Condor at Cardiff - Pilot
- The Condor pool began as a pilot service in April 2004, led by Dr Hugh Beedie, CTO of Information Services, in conjunction with staff at the Welsh e-Science Centre
- First user from the School of Business
- A solution looking for problems
6 Condor at Cardiff - Production
- The Condor pool transitioned to a production service in January 2006 with the appointment of Dr James Osborne as project manager
- Latest user from the School of Psychology
- Doubled size of pool, tripled number of users
- Distributed using Novell ZENworks
- Common condor_config files: EA, EI, S, SEA
- Injected condor_config_local variables
- IS_OWNED_BY, IS_EXECUTE_ALWAYS, RANK
7 Central Manager
- Daemons: master, collector, negotiator
Execute Nodes
- 1600 Workstations
- Daemons: master, startd, starter
Submit Nodes
- 30 Workstations
- Daemons: master, schedd, shadow
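The role-to-daemon layout on this slide (central manager: master, collector, negotiator; execute nodes: master, startd, starter; submit nodes: master, schedd, shadow) can be written down as a small lookup table. The sketch below renders it as Condor `DAEMON_LIST` settings. The helper name `daemon_list` is hypothetical, and the actual Cardiff configuration files are not shown in the slides, so this is an illustration only.

```python
# Daemons per machine role, as listed on the slide.
ROLE_DAEMONS = {
    "central_manager": ["master", "collector", "negotiator"],
    "execute": ["master", "startd", "starter"],
    "submit": ["master", "schedd", "shadow"],
}

# The starter and shadow are launched per job (by the startd and the schedd
# respectively), so they are left out of the list the master starts at boot.
SPAWNED_PER_JOB = {"starter", "shadow"}


def daemon_list(role):
    """Return a DAEMON_LIST configuration line for the given role."""
    names = [d.upper() for d in ROLE_DAEMONS[role] if d not in SPAWNED_PER_JOB]
    return "DAEMON_LIST = " + ", ".join(names)


for role in ROLE_DAEMONS:
    print(f"{role}: {daemon_list(role)}")
```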
8 Condor Users at Cardiff
- User, in a computing context, refers to one who uses a computer system
- Users may need to identify themselves for the purposes of accounting, security, logging and resource management
- Users are also widely characterized as the class of people that uses a system without the complete technical expertise required to fully understand the system
9 Growth of User Base
10 Diversity of User Base
- Architecture: 1
- Biosciences: 9
- Business: 1
- Computer Sci: 6
- Engineering: 3
- Epidemiology: 2
- History Arch: 2
- Mathematics: 2
- Optometry: 2
- Physics: 2
- Psychology: 1
- Social Sci: 1
- Total: 32
11 Diversity of Applications
- Blast, Damfilt
- Dammin, Energyplus
- Gasbor, Grinder
- Lea, Leadmix
- Matlab, Msvar
- Oxcal, Perl
- Pest, R
- Sienna, Structure
- Econometric Modelling
- Fluid Dynamics
- Fourier Analysis
- Geological Modelling
- Image Processing
- Radiation Transport
- Travelling Salesman
- WIFI Roaming
12 Donna Lammie
Structural Biophysics Group
- OPTOM
- X-Ray Diffraction
- Determine shape of molecules
- Time on a single workstation: 2-3 Days
- Time on the Condor pool: 2-3 Hours
- Speed-up factor of 2000%
13 Donna Lammie
C. Baldock et al. Nanostructure of Fibrillin-1 Reveals Compact Conformation of EGF Arrays and Mechanism for Extensibility. Proceedings of the National Academy of Sciences of the United States of America, 103(32):11922-11927, August 2006.
14 Patrick Downes
Research Assistant
- Velindre Cancer Centre
- Monte Carlo simulation
- Radiotherapy dose calculation
- Time on a single workstation: 3 Months
- Time on the Condor pool: 36 Hours
- Speed-up of 6000%
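The quoted speed-ups can be checked against the run times on the two user slides. The sketch below assumes a 30-day month and takes the midpoint of each "2-3" range; neither assumption comes from the slides. On that basis the figures 2000 and 6000 appear to be percentages rather than plain factors.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: the slides do not define a month


def speed_up(workstation_hours, pool_hours):
    """Return how many times faster the pool run is than one workstation."""
    return workstation_hours / pool_hours


# X-ray diffraction: 2-3 days on a workstation against 2-3 hours on the pool
# (midpoints of both ranges).
diffraction = speed_up(2.5 * HOURS_PER_DAY, 2.5)

# Monte Carlo dose calculation: 3 months against 36 hours.
dose = speed_up(3 * DAYS_PER_MONTH * HOURS_PER_DAY, 36)

print(f"diffraction: {diffraction:.0f}x ({diffraction * 100:.0f}%)")
print(f"dose: {dose:.0f}x ({dose * 100:.0f}%)")
```

Under these assumptions the dose calculation comes out at 60 times faster, which is 6000 when written as a percentage. The diffraction case comes out at 24 times, close to the rounded 2000 on the slide.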
15 Patrick Downes
16 Green Computing at Cardiff
- Green Computing is the study and practice of using computing resources efficiently
- Typically, technological systems or computing products that incorporate green computing principles take into account the so-called triple bottom line of economic viability, social responsibility, and environmental impact
17 Power Consumption
Based on a P4 3GHz PC with 512MB RAM
18 Watts Up Pro
- Measures
- Watts, Volts, Amps, WattHrs, Cost, Avg Kwh, Mo Cost, Max Wts, Max Vlt, Max Amp, Min Wts, Min Vlt, Min Amp, Pwr Fct, Dty Cyc, Pwr Cyc
- Freq: 1 second
- Duration: 15 minutes
19 Economic Viability
Based on a P4 3GHz PC with 512MB RAM
- Makes sound financial sense
- Hibernate saves £60 per year
- Condor costs £30 per year (max)
- Dedicated costs £150 per year
- Condor is 5 times cheaper
Saving of Hibernate = Cost of 100W Electricity (Idle State) for 16 Hours out of 24
Cost of Condor = Cost of 150W Electricity (Condor State) - Cost of 100W Electricity (Idle State)
Cost of Dedicated = Cost of 150W Electricity (Condor State) + Cost of 100W Electricity (Air Con)
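The footnote can be read as three formulas: the Hibernate saving is the cost of 100W for 16 hours out of 24, the Condor cost is the cost of 150W minus the cost of 100W, and the Dedicated cost is the cost of 150W plus 100W of air conditioning. The sketch below works these through. The tariff of 0.10 per kWh and the use of the same 16-hour window for all three cases are assumptions, chosen because they come close to the slide's 60, 30 and 150 per year.

```python
HOURS_PER_YEAR = 16 * 365  # 16 hours out of 24, every day of the year
PRICE_PER_KWH = 0.10       # assumed tariff; not stated on the slides

IDLE_W = 100      # idle state
CONDOR_W = 150    # running a Condor job
AIR_CON_W = 100   # air conditioning for a dedicated machine


def annual_cost(watts, hours=HOURS_PER_YEAR, price=PRICE_PER_KWH):
    """Cost of drawing `watts` for `hours` per year at `price` per kWh."""
    return watts / 1000 * hours * price


hibernate_saving = annual_cost(IDLE_W)
condor_cost = annual_cost(CONDOR_W) - annual_cost(IDLE_W)
dedicated_cost = annual_cost(CONDOR_W) + annual_cost(AIR_CON_W)

print(f"Hibernate saves {hibernate_saving:.2f} per year")
print(f"Condor costs {condor_cost:.2f} per year")
print(f"Dedicated costs {dedicated_cost:.2f} per year")
print(f"Dedicated / Condor = {dedicated_cost / condor_cost:.1f}")
```

The ratio of 5 between Dedicated and Condor does not depend on the assumed tariff, since the price cancels out: 250W against 50W.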
20 Environmental Impact
Based on a P4 3GHz PC with 512MB RAM
- Makes sound environmental sense
- Hibernate saves 650 kg CO2 per year
- Condor emits 325 kg CO2 per year (max)
- Dedicated emits 1,625 kg CO2 per year
- Condor is 5 times greener
21 Across Campus
Based on 10,000 P4 3GHz PCs with 512MB RAM
- Makes sound financial sense
- Hibernate would save £600,000 per year
- Hibernate 16 out of 24 hours
- Makes sound environmental sense
- Hibernate would save 6,500 tonnes of CO2 per year
- Rainforest required: 52 km²
- Rainforest required: 40% of the area of Cardiff
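The campus-wide figures follow from scaling the single-PC numbers by 10,000 machines. A minimal check, using only values quoted on the slides (the variable names are illustrative):

```python
PCS = 10_000
SAVING_PER_PC = 60      # per year, from the single-PC slide
CO2_KG_PER_PC = 650     # kg per year, from the single-PC slide
RAINFOREST_KM2 = 52     # area quoted on this slide

campus_saving = PCS * SAVING_PER_PC             # 600,000 per year
campus_co2_tonnes = PCS * CO2_KG_PER_PC / 1000  # 6,500 tonnes per year

# Absorption rate implied by the rainforest comparison.
tonnes_per_km2 = campus_co2_tonnes / RAINFOREST_KM2

print(campus_saving, campus_co2_tonnes, tonnes_per_km2)
```

The rainforest comparison therefore implies roughly 125 tonnes of CO2 absorbed per square kilometre per year. The slide does not give the source of that rate.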
22 Cardiff's Condor Pool
- ...is the equivalent of a £500,000 supercomputer
- costs £50,000 in equipment, power, and staff
- improves return on investment
- ...is one of the largest pools in the UK
- and we plan to expand the pool
- ...is probably the most utilised pool in the UK
- by a factor of 10
- ...has more users than any other pool in the UK
- and we are working hard to keep it that way
Nobody corrected me at the 1st Campus Grids SIG in Oxford
Nobody corrected me at the 21st Open Grid Forum in Manchester
23 The ARC Spectrum
- HPC
- Tightly Coupled
- Supercomputers
- NUMA Machines
- Million
- HTC
- Loosely Coupled
- Small Clusters
- Campus Grids
- Thousand
- H Thousand
Large Clusters, SMP, H Thousand, Million
24 The ARC Division
- ARCCA will provide, co-ordinate, support and develop advanced research computing services for researchers at Cardiff University
- ARCCA will also work with clients and partners outside the University through a range of outreach activities
- ARCCA is staffed with experts in the field who are already available to help and support your research needs through a range of services
- ARCCA is procuring a range of dedicated high-end computing equipment, which is planned to be fully operational by early 2008
25 The ARC Organisation
- Prof Martyn Guest: Director of ARC
- Dr Christine Kitchen: Manager of ARC
- Dr James Osborne: Applications
- Mr Huw Lynes: Infrastructure
- Ms Liz Fitzgerald: Admin Officer
- Another Programmer
26 Prof Martyn Guest
- 2007
- Director of Advanced Research Computing at Cardiff
- 1995
- Associate Director of Computational Science and Engineering at Daresbury
- 1971
- PhD Theoretical Chemistry
- 1967
- BSc Chemistry
27 The ARC Cluster
- 256 x Compute Nodes (Cluster)
- Dual Socket Quad Core Intel Xeon E5472 3.0GHz
- 16 GB of Memory
- ConnectX InfiniBand, Dual GigE
- 4 x Compute Nodes (SMP)
- Quad Socket Quad Core Intel Xeon X7350 2.93GHz
- 32 GB of Memory, 1 TB of Local Disk (RAID5)
- ConnectX InfiniBand, Dual GigE, Resilient PSU
28 The ARC Cluster
- 4 x Login Nodes
- Dual Socket Quad Core Intel Xeon E5472 3.0GHz
- 32 GB of Memory, 0.5 TB of Local Disk (RAID1)
- ConnectX InfiniBand, Dual GigE, Resilient PSU
- 2 x Storage Nodes
- Dual Socket Quad Core Intel Xeon E5472 3.0GHz
- 32 GB of Memory, Resilient PSU
- ConnectX InfiniBand, Dual GigE, Fibre Channel
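The node specifications on the two cluster slides imply the total core and memory counts below. This is a small tally that assumes the memory figures are per node in gigabytes; the `NodeType` class is hypothetical and exists only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    count: int
    sockets: int
    cores_per_socket: int
    memory_gb: int  # per node

    @property
    def cores(self):
        return self.count * self.sockets * self.cores_per_socket

    @property
    def total_memory_gb(self):
        return self.count * self.memory_gb


NODES = [
    NodeType("compute (cluster)", 256, 2, 4, 16),
    NodeType("compute (SMP)", 4, 4, 4, 32),
    NodeType("login", 4, 2, 4, 32),
    NodeType("storage", 2, 2, 4, 32),
]

for node in NODES:
    print(f"{node.name}: {node.cores} cores, {node.total_memory_gb} GB")

# Cores available for jobs: the two compute node types only.
compute_cores = sum(n.cores for n in NODES if n.name.startswith("compute"))
print(f"compute cores: {compute_cores}")
```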
29 Virtualization
- Virtualization is a broad term that refers to the abstraction of computer resources
- This includes making a single physical resource appear to function as multiple logical resources
- Or it can include making multiple physical resources appear as a single logical resource
30 Central Manager Utilisation
Based on 6 months of monitoring
- CPU (Percentage): 16.60 (average), 65.99 (max)
- (Single Socket Single Core Intel Xeon 2.4GHz)
- RAM (GB): 1.09 (average), 1.60 (max)
- 55.00% and 80.00% of current capacity (2 GB)
- Disk (GB): 1.25 (average), 1.50 (max)
- 1.71% and 2.05% of current capacity (73 GB)
31 Central Manager Utilisation
Based on 6 months of monitoring
- Net In (Kbps): 29.66 (average), 45.39 (max)
- 0.02% and 0.04% of current capacity (Gigabit)
- Net Out (Kbps): 39.13 (average), 86.17 (max)
- 0.03% and 0.07% of current capacity (Gigabit)
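The "of current capacity" figures on the two utilisation slides read as percentages of the stated capacity, and can be reproduced as follows. One assumption is needed for the network rows: the quoted figures only match if "Kbps" is taken as kilobytes per second, so the sketch multiplies by 8 before comparing against a gigabit link.

```python
def percent_of(value, capacity):
    """Return value as a percentage of capacity, rounded to 2 decimals."""
    return round(value / capacity * 100, 2)


RAM_CAPACITY_GB = 2
DISK_CAPACITY_GB = 73
LINK_KBIT_PER_S = 1_000_000  # gigabit link, in kilobits per second

# (average, max) pairs taken from the slides.
ram = [percent_of(v, RAM_CAPACITY_GB) for v in (1.09, 1.60)]
disk = [percent_of(v, DISK_CAPACITY_GB) for v in (1.25, 1.50)]

# Assumption: "Kbps" means kilobytes per second, hence the factor of 8.
net_in = [percent_of(v * 8, LINK_KBIT_PER_S) for v in (29.66, 45.39)]
net_out = [percent_of(v * 8, LINK_KBIT_PER_S) for v in (39.13, 86.17)]

print("RAM %:", ram)
print("Disk %:", disk)
print("Net in %:", net_in)
print("Net out %:", net_out)
```

The average RAM figure comes out at 54.5, which the slide appears to round up to 55. The other values match the slides to two decimal places.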
32 Central Manager Virtualization
Based on 6 months of monitoring
- 1 x Condor Server
- Dual Socket Quad Core Intel Xeon E5472 3.0GHz
- 32 GB of Memory, 0.5 TB of Local Disk (RAID1)
- Dual GigE, Resilient PSU
- 4 x Virtual Central Managers ?
- 2 x Virtual Submit Nodes ?
33 Design Patterns
- A Design Pattern is a general repeatable solution to a commonly occurring problem in software design
- A design pattern is not a finished design that can be transformed directly into code
- It is a description or template for how to solve a problem that can be used in many different situations
35 Questions
- condor_at_cardiff.ac.uk
- http://www.cardiff.ac.uk/arcca/
- http://www.cs.wisc.edu/condor/