Title: Installation of a Condor Supercomputing pool
1 Installation of a Condor Supercomputing pool
- Brain Campbell
- Bryce Carmichael
- Unquiea Wade
- Mentor: Dr. Eric Akers
2 Abstract
The International Polar Year was designed to
study and better understand the current state of
climatic changes to the world's ice sheets.
For the last few decades, automated weather
stations and satellites in geosynchronous orbit
have been producing data sets. Today, large
amounts of that data remain unexplored because of
insufficient funding and scarce resources. For
this reason, the Polar Grid concept was proposed
to distribute the analysis of the existing data
sets. The goal of the Elizabeth City State
University Polar Grid Team was to construct a
model network to serve as the base for a
supercomputing pool. The pool will be constructed
on the university's campus and linked to the
overall Polar Grid system. Software and protocols
currently in use at other institutions around the
nation were researched, and from these the Condor
software was chosen. Condor, created and
developed at the University of Wisconsin, was
selected for its ease of use and its capacity for
expansion. An eighteen-node computing pool was
constructed and tested using Condor in the
second-floor lab of Dixon Hall. The pool
comprised seventeen desktops running on a Windows
NT platform, with the pool's master, a
Linux-based server, housed in Lane Hall.
3 Purpose
- Utilize all of our computers.
- Gain knowledge about supercomputing.
- Set up a pool of computers that can be accessed by
Polar Grid.
- Familiarize team members with job submission
and overall operation of Condor.
4 Introduction to Supercomputing
- What is supercomputing?
- Supercomputing is a term given to a system capable
of processing at speeds much greater than
commercially available CPUs.
- High-throughput computing describes systems with
intermediate processing abilities.
5 Distributed vs. Parallel
- Distributed computing utilizes a network of many
computers, each accomplishing a portion of an
overall task, to achieve a computational result
much more quickly than with a single computer.
- Distributed computing also allows many users to
interact and connect openly.
- Parallel processing is the simultaneous
processing of the same task on two or more
microprocessors in order to obtain faster
results.
- The computing resources for parallel processing
can be a single computer with multiple processors.
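The split-and-combine idea behind both approaches can be sketched in a few lines of Python. This is only an illustration, not part of the Condor pool itself: the function names and the sum-of-squares task are made up for the example, and threads stand in for the workers. For CPU-bound work, passing `ProcessPoolExecutor` as `executor_cls` would be the usual swap to get separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(chunk):
    """Work done by one worker: sum of squares over its own chunk."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def sum_of_squares(n, workers=4, executor_cls=ThreadPoolExecutor):
    """Hand each chunk to a worker, then combine the partial results."""
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))


if __name__ == "__main__":
    print(sum_of_squares(1000))
```

Each worker sees only its own portion of the task, which is the property a distributed pool relies on when the workers are separate machines.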
6 Size vs. Efficiency
- Parallel processing allows closer communication
between nodes, increasing efficiency.
- As the size of the network grows, communication
takes up a greater part of each CPU's time.
- This can be limited by using more than one type
of protocol in a system.
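A toy model can make the size-versus-efficiency trade-off concrete. The sketch below is an assumption made for illustration, not a measurement from the pool: it supposes the work divides evenly across nodes and that every node exchanges one fixed-cost message with every other node. The parameter names and default values are hypothetical.

```python
def communication_fraction(nodes, work=1000.0, msg_cost=0.5):
    """Fraction of each node's time spent communicating in a toy model.

    Assumptions (not from the slides): the work divides evenly, and every
    node exchanges one message of fixed cost with every other node.
    """
    compute = work / nodes                # time spent on useful work
    communicate = msg_cost * (nodes - 1)  # time spent talking to peers
    return communicate / (compute + communicate)


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(n, round(communication_fraction(n), 3))
```

Under these assumptions the communication share rises steadily as nodes are added, which is the effect described in the second bullet above.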
7 Hardware/Software Options
Condor is a specialized workload management
system for compute-intensive jobs. Like other
full-featured batch systems, Condor provides a
job queueing mechanism, scheduling policy,
priority scheme, resource monitoring, and
resource management.
Beowulf is a design for high-performance parallel
computing clusters on inexpensive personal
computer hardware. A Beowulf cluster is a group of
usually identical PCs running a free and open
source software (FOSS) Unix-like operating
system, such as BSD, Linux or Solaris.
BOINC is a software platform for volunteer
computing and desktop Grid computing. BOINC is
designed to support applications that have large
computation requirements, storage requirements,
or both.
8 History of Condor
- The Condor project was started in 1988.
- Condor was built from the results of the Remote
Unix project and from the continuation of
research in the area of Distributed Resource
Management (DRM).
- Condor was created at the University of
Wisconsin-Madison (UW-Madison), and it was first
installed as a production system in the
UW-Madison Department of Computer Science.
9 Why choose Condor?
- Versatility
- Capability of switching between distributed and
parallel computing
- Jobs can be written in multiple programming
languages for simple execution.
- Operates on multiple platforms
10 Resources Required
- Availability: open source software.
- Easy expansion: any number of nodes can be added
to an existing pool.
- Cost efficiency: any CPU meeting the base
requirements can be used efficiently.
11 System Requirements
- Windows
- Condor for Windows requires Windows 2000 (or
better) or Windows XP.
- 300 megabytes of free disk space is recommended.
Significantly more disk space may be needed to
run jobs with large data files.
- Condor for Windows will operate on either an NTFS
or FAT file system. However, for security
purposes, NTFS is preferred.
- Unix
- The size requirements for the downloads
currently vary from about 20 Mbytes
(statically linked HP Unix on a PA RISC) to more
than 50 Mbytes (dynamically linked Irix on an
SGI).
- In addition, you will need a lot of disk space
in the local directory of any machines that are
submitting jobs to Condor.
12 Installation
http://parrot.cs.wisc.edu/
Condor software can be accessed through the
project's main website. Condor can be downloaded
for various platforms such as Solaris,
Linux/Unix, Windows, and Mac. Administrative and
user manuals are also available on the website.
13 Configuration
- Installation was carried out through the Windows
installation wizard.
- Changes to the defaults:
- Pool master node: a Linux-based machine in Lane
Hall (10.40.20.37). Having a Linux-based master
will allow the eventual use of the full array of
Condor options.
- Read/write access: parameters changed to include
10... to allow feedback and access from different
nodes.
- Due to the use of the CERSER labs during class
hours, each node is required to be idle for 15
minutes before it is available to perform tasks.
- If a task is interrupted, it will be restarted
on a different machine if the original node is
not freed in less than ten minutes.
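The idle and restart rules above can be modeled as two small decision functions. The sketch below is plain Python written for illustration. Condor expresses such policies in its own configuration file, and none of the names here are Condor settings. The "continue on the original node" outcome is an assumed reading of the slide, which only states when a task is restarted elsewhere.

```python
# Thresholds taken from the slide, expressed in seconds.
IDLE_BEFORE_AVAILABLE = 15 * 60    # node must be idle this long to accept tasks
RESTART_ELSEWHERE_AFTER = 10 * 60  # how long to wait for the original node


def node_is_available(idle_seconds):
    """A node may take on tasks once it has been idle for 15 minutes."""
    return idle_seconds >= IDLE_BEFORE_AVAILABLE


def interrupted_task_action(seconds_since_interrupt, original_node_free):
    """Decide what happens to a task whose node was reclaimed by a user.

    Hypothetical return values, chosen for this example:
      "continue_on_original": node was freed within ten minutes (assumed)
      "restart_elsewhere":    ten minutes passed without the node being freed
      "wait":                 still inside the ten-minute window
    """
    if original_node_free and seconds_since_interrupt < RESTART_ELSEWHERE_AFTER:
        return "continue_on_original"
    if seconds_since_interrupt >= RESTART_ELSEWHERE_AFTER:
        return "restart_elsewhere"
    return "wait"


if __name__ == "__main__":
    print(node_is_available(20 * 60))
    print(interrupted_task_action(12 * 60, original_node_free=False))
```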
14 Job Submission and Tracking
Jobs can be submitted using any executable file
format. They are submitted from the condor/bin
directory using the command condor_submit
filename. The status of the nodes within the
system can be checked using the command
condor_status.
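The file passed to condor_submit is a submit description file that names the executable and where its output should go. The Python sketch below generates a minimal one and hands it to condor_submit when that command is on the PATH. The file names (`hello.exe`, `hello.sub`) and the helper function are hypothetical, chosen for illustration; the vanilla universe is assumed because the pool's nodes run Windows.

```python
import shutil
import subprocess
from pathlib import Path


def make_submit_description(executable, arguments="", basename="job"):
    """Return the text of a minimal Condor submit description file."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        f"output = {basename}.out",  # standard output of the job
        f"error = {basename}.err",   # standard error of the job
        f"log = {basename}.log",     # Condor's event log for the job
        "queue",                     # submit one instance of the job
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    path = Path("hello.sub")  # hypothetical file name
    path.write_text(make_submit_description("hello.exe", basename="hello"))
    if shutil.which("condor_submit"):
        # Equivalent to typing: condor_submit hello.sub
        subprocess.run(["condor_submit", str(path)], check=True)
    else:
        print(path.read_text())
```

Keeping the description in a small generator like this makes it easy to submit the same executable with different arguments or output names.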
15 Condor Status Menu
The condor_status command brings up a listing of
the current platform and availability of each
node. Availability is signified by the one-word
qualifiers in the fourth column.
- Unclaimed: The node is open but is unable to
perform the specified task.
- Claimed: The node is currently running a
specified task.
- Matched: The node is open and can perform a
specified task.
- Owner: The node has a local user demanding its
attention.
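Since the state sits in the fourth column, a listing can be summarized with a short script. The sketch below counts nodes per state from whitespace-separated lines. The sample lines are invented for illustration and are not real condor_status output from the pool; the host names and other column values are placeholders.

```python
from collections import Counter

STATE_COLUMN = 3  # fourth whitespace-separated column, per the slide
KNOWN_STATES = {"Owner", "Unclaimed", "Matched", "Claimed"}


def count_states(lines):
    """Count how many nodes are in each state; skip headers and blanks."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > STATE_COLUMN and fields[STATE_COLUMN] in KNOWN_STATES:
            counts[fields[STATE_COLUMN]] += 1
    return counts


# Made-up sample listing (hypothetical hosts, not captured output).
SAMPLE = """\
Name OpSys Arch State Activity
node01 WINNT51 INTEL Unclaimed Idle
node02 WINNT51 INTEL Claimed Busy
node03 WINNT51 INTEL Owner Idle
node04 WINNT51 INTEL Unclaimed Idle
""".splitlines()


if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE).items()):
        print(state, n)
```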
16 Job Submission and Tracking
After submission, a task can be traced through the
pool using the condor_q command. The results of
the task can be seen in the output files
created by the executable, or in the
.log file that is created automatically for each
task.
17 Results
A Condor pool of 17 nodes running on a
Windows NT platform has been established in the
Dixon Hall laboratory, operating under a
Linux-based master housed at the Lane Hall
offices. To date, simple tasks written in C have
been submitted and have run successfully through
the pool. Diagnostic assessment showed two CPUs
not connected to the network and naming
redundancies that hindered the installation of
the Condor system.
18 Conclusions
- Installation of Condor was a success.
- Expansion of the cluster is easy and can be done
efficiently with minimal cost in resources.
- Management and programming with Condor can be
done on an undergraduate level and is encouraged.
19 Future Work
- Familiarize more of the CERSER teams with Condor
software.
- Continue the expansion of the Condor pool.
- Link ECSU to the Polar Grid network.
- Encourage the development of programs to aid
future CERSER research projects.
21 Questions