Title: Grid Computing: Harnessing Underutilized Resources
1Grid Computing: Harnessing Underutilized Resources
- UNCW Department of Chemistry and Biochemistry
Seminar, September 24, 2004
- Ned H. Martin
2Outline
- Definition of Grid computing
- A brief history of computing
- Growth of computing power
- Rationale for Grid computing
- How a Grid works
- Examples of Grid projects
- Grid computing in NC
- Limitations of Grid computing
- UNCW Grid initiative: GridNexus
- What's next?
3Definition of Grid Computing
- Grid computing is a form of distributed computing
that involves coordinated and controlled sharing
of diverse computing, application, data,
storage, or network resources across dynamic and
geographically dispersed multi-institutional
virtual organizations.
- A user of Grid computing does not need to have
the data and the software on the same computer,
and neither must be on the user's home (login)
computer.
4Grid Computing
- The term Grid computing suggests a computing
paradigm similar to an electric power grid - a
variety of resources contribute power into a
shared "pool" for many consumers to access on an
as-needed basis.
5Background of Grid Computing
- The idea of Grid computing resulted from the
confluence of three developments:
- The proliferation of largely unused computing
resources (especially desktop computers)
- Their greatly increased CPU speed in recent years
- The widespread availability of fast, universal
network connections (the Internet).
6Brief History of Computing
- 1943: "I think there is a world market for maybe
5 computers." Thomas Watson, chairman of IBM
- 1947: Testudo, the very first computer in the
Netherlands. The relay-based machine was 5 m
long. Adding took 30 s and multiplication 45 s.
7Brief History of Computing
- 1949: "Computers in the future may weigh no more
than 1.5 tons." -Popular Mechanics, forecasting
the relentless march of science
- 1957: "I have traveled the length and breadth of
this country and talked with the best people, and
I can assure you that data processing is a fad
that won't last out the year." -The business book
editor for Prentice Hall.
8Brief History of Computing
- 1977: "There is no reason anyone would want a
computer in their home." -Ken Olsen, president,
chairman and founder of Digital Equipment Corp.
- 1980: "DOS addresses only 1 Megabyte of RAM
because we cannot imagine any applications
needing more." -Microsoft on the development of
DOS.
- 1981: "640k ought to be enough for anybody."
-Bill Gates
9Brief History of Computing
- 1979: Introduction of the 8086 chip by Intel.
It used a 16-bit processor that was too expensive, so an
8-bit version was developed (the 8088), which was
chosen by IBM for the first IBM PC. Available
clock frequencies ranged up to 10 MHz. It had an
instruction set of about 300 operations. At
introduction the fastest processor was the 8 MHz
version, which achieved 0.8 MIPS (0.8 x 10^6
instructions per second) and contained 29,000
transistors.
10Brief History of Computing
- 1982: Intel 80286 released. It supported clock
frequencies of up to 20 MHz. At introduction the
fastest version ran at 12.5 MHz, achieved 2.7
MIPS, and contained 134,000 transistors.
- 1985: Intel 80386 DX released. It supported clock
frequencies of up to 33 MHz. At the date of
release the fastest version ran at 20 MHz and
achieved 6.0 MIPS. It contained 275,000
transistors.
11Brief History of Computing
- 1989: Intel 80486 DX released. It
contained the equivalent of about 1.2 million
transistors. At the time of release the fastest
version ran at 25 MHz and achieved up to 20 MIPS.
Later versions had clock speeds up to 100 MHz.
- 1993: Intel Pentium released. At that time it was
only available in 60 and 66 MHz versions, which
achieved up to 100 MIPS, with over 3.1 million
transistors.
12Brief History of Computing
- 1995: Pentium Pro released. At introduction it
achieved a clock speed of up to 200 MHz. It
achieved 440 MIPS and contained 5.5 million
transistors, nearly 2400 times as many
as the first microprocessor in 1971, and was capable
of 70,000 times as many instructions per second.
- 2004: Pentium 4 chips available with clock speeds
of up to 3.6 GHz, providing 11,356 MIPS and
containing 125,000,000 transistors.
- 2005: 500,000,000 transistors!
13Growth of Computing Power
[Figure: growth of computing power through 2004]
14Rationale for Grid Computing
- The proliferation of largely unused computing
resources (especially desktop computers, of which
152 million were sold in 2003).
- Their greatly increased CPU speed in recent years
(now >3 GHz).
- The widespread availability of fast, universal
network connections (the Internet).
15Rationale for Grid Computing
- High-performance computers (formerly called
supercomputers) are very expensive to buy and
maintain.
- Much of the recent enhancement of computing power
has come through the application of
multiple CPUs to a problem (e.g., NCSC had a
720-processor IBM parallel computer).
- Many computing tasks relegated to these
(especially massively parallel) computers could
be performed by a divide-and-conquer strategy
using many more, although slower, processors, as
are available on a Grid.
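The divide-and-conquer idea can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: local threads stand in for Grid nodes, and the function names and the sum-of-squares task are hypothetical, not taken from any Grid project.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide a list into n_chunks nearly equal, contiguous pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """The work one (hypothetical) Grid node performs on its piece."""
    return sum(x * x for x in chunk)


def grid_sum_of_squares(items, n_workers=4):
    """Divide the task, process the pieces concurrently, combine the results."""
    chunks = split_into_chunks(items, n_workers)
    # Threads stand in for remote nodes here; a real Grid would ship each
    # chunk to a different computer through its middleware.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


data = list(range(1, 1001))
total = grid_sum_of_squares(data, n_workers=8)
```

The approach works when the pieces are independent, as here; tasks whose pieces must exchange data frequently gain less from many slow, loosely connected processors.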
16How a Grid Works
- The term "grid computing" suggests a computing
paradigm similar to an electric power grid - a
variety of resources contribute power into a
shared "pool" for many consumers to access on an
as-needed basis.
- Ideally the user does not know or care where the
computing operation is being performed; the
process is invisible to the user.
- Middleware handles security, authentication,
authorization, resource selection, and routing of
input and output seamlessly.
17Examples of Grid Projects
- SETI@home
- DNet (distributed.net)
- GRID.ORG (anti-cancer ligand screening)
- IBM Smallpox cure
- Entropia.org
- CERN
18Grid Projects: SETI@home
- SETI@home
- A large-scale search through data gathered by
radio telescopes in Puerto Rico for evidence of
extraterrestrial life
- Involved more than 3 million computers averaging
about 14 teraflops, or 14 trillion floating-point
operations per second
- Utilized over 500,000 years of processing time in
the past year and a half.
19Grid Projects: DNet
- DNet (distributed.net)
- Began in 1997 as the first general-purpose
distributed computing network on the Internet
- Highly successful in bringing individuals
together to complete cryptographic challenges via
a distributed environment.
- Equivalent to more than 160,000 PII 266 MHz
computers working 24 hours a day, 7 days a week,
365 days a year!
- The core distributed.net development team joined
United Devices in 2000.
20Grid Projects: GRID.ORG
- The United Devices Cancer Research Project
(GRID.ORG) will advance research to uncover new
cancer drugs through the combination of
chemistry, computers, and specialized software.
- The research centers on proteins that have been
determined to be a possible target for cancer
therapy. Through a process called "virtual
screening", LigandFit docking software by
Accelrys identifies molecules that interact with
these proteins, and determines which ones have a
high likelihood of being developed into a drug.
- In the first year and a half, over 3.5 million
drug candidates were screened using over a
million personal computers.
21Grid Projects: Smallpox Cure
- Smallpox cure
- To help find a cure for smallpox, IBM and a group
of partners harnessed the processing power of 2
million idle PCs. They then screened 35 million
drug compounds and smallpox proteins to find the
most effective cure.
22Grid Projects: Entropia
- In 1997, Entropia applied idle computers
worldwide to problems of scientific interest. In
just two years, this network grew to encompass
30,000 computers with an aggregate speed of over
one teraflops. Among its several
scientific achievements is the identification of
the largest known prime number.
23Grid Projects: CERN
- CERN
- By 2005, detectors at the Large Hadron Collider
at CERN, the European Laboratory for Particle
Physics, will produce several petabytes of data
per year - a million times the storage capacity
of a desktop computer
- Just the basic data analysis requires 20 teraflops
of computing power (the fastest supercomputer
produces 3 teraflops).
- More sophisticated analyses will need orders of
magnitude more computing power
24Grid Computing in NC
- NCBioGrid (www.ncbiogrid.org/), an outgrowth of
the High Performance Computing and Data Storage
Focus Group of the NC Genomics and Bioinformatics
Consortium
- NC Computing Grid now includes 7 universities
plus MCNC; UNCW will be joining soon
- UNCW Grid started as a grid for UNCW
bioinformatics/genomics research, expanded now
into chemistry and business applications.
25Limitations of Grid Computing
- Currently, although efforts are being made to
standardize protocols (e.g., Globus toolkit and
Avaki), interacting with Grid services remains a
complex process.
- Most of the existing applications that access
Grid services require the user to type cumbersome
commands, often using a command-line interface.
- Creating new clients and services requires
programming in a language such as C or Java and
using a host of libraries for interacting with
Open Grid Services Infrastructure, Grid Security
Infrastructure, Web Services Description Language,
and other standards.
26Limitations of Grid Computing
- These tools and techniques are useful to a select
group of computing specialists; however, the only
way to make Grid resources accessible to a wide
range of users is to provide a relatively simple
graphical user interface (GUI).
- The UNCW Grid project proposes to develop a
Graphical Grid User Interface that is easy to use
and can access a wide range of applications.
- Our hope is to create an interface to Grid
computing that accomplishes what Internet
browsers (Netscape and Internet Explorer) did to
open up the WWW.
27UNCW Grid Initiative: GridNexus
- This initiative grew in part out of a need for
HPC resources following the closure of the NCSC
in June 2003, coupled with the availability of
faculty with software programming expertise and
others with computing applications that could
benefit from use of a Grid.
- The UNC-OP funded UNCW's proposal for $557,634
over two years to develop Grid portals (GUI
middleware to allow users to access software on
computers on a Grid).
28UNCW Grid Initiative: GridNexus
- The UNCW Grid Computing Project is a two-year
collaborative project among a multi-discipline,
multi-investigator core research team at UNCW and
several discipline-focused researchers at partner
institutions: NCSU, WCU, NCCU, ECU, and CFCC. The
research areas and institutional interests of
this project are:
- Advanced Grid Software Development (UNCW)
- Computational Chemistry (UNCW and ECU)
- Bioinformatics (UNCW, NCSU, and NCCU)
- Combinatorics (UNCW)
- Business Computing (UNCW and NCCU)
- Education and Training (UNCW, WCU, CFCC)
- This project proposes to develop a Grid interface
that is easy to use and may be used by a
wide range of applications and users. We have
developed an innovative graphical user interface
(GUI) for grid applications. In particular, we
introduced a new scripting language (JXPL)
designed for web-based services and a GUI for
creating scripts, and have demonstrated the use
of these tools with grid services.
29UNCW Grid Initiative: GridNexus
- UNCW's initiative is unique in that it involves
undergraduate students as the main players in the
development of the Grid portal (GUI).
- Undergraduate computer science students are
partnered with faculty and students in
application areas (chemistry, biology, business)
to develop graphical front-ends to access
services (programs) on computers on the Grid.
- Grid portals are being developed for the two
computational chemistry programs (Gaussian 03 and
DMol) most often used in research by our faculty
and students.
30Resources of UNCW Grid
- Beowulf cluster: 16 PIII processors in Computer
Sciences Department
- Fire and FireDev servers plus disk storage
devices
- PQS Quantum Cube: 8-CPU cluster with PQS and
Gaussian 03 computational chemistry software,
plus TCP-Linda environment.
- An 8-processor IBM blade cluster with 0.5 TB disk
storage will be added soon.
- Other computers may be added, including the
possibility of using all computing lab computers,
or possibly even all faculty/staff computers
(when not in use).
31Remote Computing before Grid
- Now, to submit a quantum chemistry calculation
to a remote computer, e.g., at NCSU, one must:
- Telnet to the remote computer and log in (separate login
and password for each user account and for each
computer)
- FTP the input data file from the local computer to the remote
machine (requires login, password)
- Create and edit an input file for the job (using vi
or other text editor)
- Create a .job file, edit it if necessary
- Select a queue based on CPUs and time required;
submit the .job file
- Check progress of the calculation by periodically
telnetting to the remote machine to look
for a file that indicates completion of the job.
- FTP the output file to the local computer
- Open the output file in a text editor, examine
numerical data
- Open the output file in a commercial program on the local
computer to visualize the structure
32Remote Computing on a Grid
- In the future, using Grid middleware to submit a
quantum chemistry calculation to a remote
computer at NCSU:
- Log in to the Grid (single user login and password to
access ANY Grid resource)
- Select a data file and job parameters from
pull-down menus; click to submit (.input and .job
files are created automatically by Grid middleware,
and the job is submitted automatically to an appropriate
available computer)
- Upon completion of the computation, the output file is
automatically sent to the local computer to visualize
the structure (which can also be automated).
33Development of a Grid Portal
- The objective is to make accessing HPC resources
(wherever they may be located) easy for scientists
who are not computer savvy.
- Most computation involves doing various
mathematical operations on a dataset.
- A GUI approach is employed, in which the user,
after a single login that checks authentication
and authorization, can create a workflow of
functions/operations graphically by connecting
boxes dragged from a series of lists of options,
then applying that series of steps to a dataset.
- Such a workflow can be saved for subsequent
application to another dataset.
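A workflow of this kind can be thought of as an ordered list of operations applied to a dataset. The following Python sketch is illustrative only: the `Workflow` class and the three example steps are hypothetical and do not describe how GridNexus itself is implemented.

```python
class Workflow:
    """An ordered series of labeled operations that can be reused."""

    def __init__(self, name):
        self.name = name
        self.steps = []  # list of (label, function) pairs

    def add_step(self, label, func):
        """Append one operation (one 'box' in the GUI) to the workflow."""
        self.steps.append((label, func))
        return self  # returning self allows steps to be chained

    def run(self, dataset):
        """Apply every step in order, feeding each result to the next."""
        result = dataset
        for _label, func in self.steps:
            result = func(result)
        return result


# Hypothetical example: clean text rows, convert to numbers, take the mean.
mean_workflow = (
    Workflow("mean of cleaned values")
    .add_step("drop blank rows", lambda rows: [r for r in rows if r.strip()])
    .add_step("convert to float", lambda rows: [float(r) for r in rows])
    .add_step("mean", lambda xs: sum(xs) / len(xs))
)

first_result = mean_workflow.run(["1.0", "", "3.0"])
# The same saved workflow is applied to another dataset without changes.
second_result = mean_workflow.run(["2", "4", " ", "6"])
```

Because the workflow object holds only the steps and not the data, it can be kept and applied to any later dataset, which is the reuse described above.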
34Development of a Grid Portal
- Job submission: Ideally, in a grid, the grid
middleware should select the best resource:
those computers that are available, capable, and
have the software needed to handle the job.
- The user need not select nor know where the
computation is taking place. In fact, the job
may even be passed from one computer to another
for various aspects of the calculation.
- The output is returned to the user's workstation
or account, rather than the user having to access
and download the output file from a remote
computer.
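The selection rule described above (available, capable, and having the needed software) can be sketched in Python. This is a simplified illustration, not the behavior of any actual Grid middleware: the `Resource` and `Job` classes, the free-CPU counts, and the tie-breaking rule are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    free_cpus: int                      # hypothetical current availability
    software: set = field(default_factory=set)
    online: bool = True


@dataclass
class Job:
    software: str
    cpus: int


def select_resource(job, resources):
    """Return a resource that can run the job, or None if none qualifies."""
    candidates = [
        r for r in resources
        if r.online                       # available
        and r.free_cpus >= job.cpus       # capable
        and job.software in r.software    # has the software needed
    ]
    if not candidates:
        return None
    # Assumed tie-break: prefer the machine with the most free CPUs.
    return max(candidates, key=lambda r: r.free_cpus)


# Machine names follow the UNCW resources listed in this talk; the
# availability figures are made up for the example.
grid = [
    Resource("PQS Quantum Cube", free_cpus=8, software={"PQS", "Gaussian 03"}),
    Resource("Beowulf cluster", free_cpus=16, software={"Gaussian 03"}),
]

gaussian_choice = select_resource(Job("Gaussian 03", cpus=4), grid)
pqs_choice = select_resource(Job("PQS", cpus=8), grid)
no_choice = select_resource(Job("DMol", cpus=2), grid)
```

The user only describes the job; which machine runs it is decided by the selection function, so the location of the computation stays invisible to the user.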
35UNCW's Grid Portal: GridNexus
- 3 main application types: genomics/
bioinformatics, business, and chemistry
- Chemistry resources on UNCW Grid:
- PQS Quantum Cube: 8-CPU cluster with PQS and
Gaussian 03 computational chemistry software and
TCP-Linda
- Beowulf cluster: 16-CPU cluster with Gaussian 03
computational chemistry software and TCP-Linda
- Soon to be added: IBM blade server with 8 or 16
CPUs; Gaussian 03 will be installed on it.
- Java script for file transformation, e.g., to
convert a HyperChem file into a Gaussian 03 input
file
36Quantum Chemistry Portal
- A GUI is under development to allow a user
to select the following from pull-down menus
within boxes that are linked into a workflow:
- Data input file
- Transform to another file type if necessary
- Level of calculation: HF, DFT, MP2, etc.
- Basis set: 6-31G(d,p), 6-311G(2d,p), etc.
- Number of processors needed
- CPU time requested
- Keywords: opt, nmr, freq, pop=npa, etc.
- Charge and multiplicity
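To show how these menu selections could end up in a job, here is a minimal Python sketch that assembles a Gaussian-style input file from them. It is an assumption-laden illustration, not the portal's code: the function name is hypothetical, the water coordinates are rough illustrative values, and the `%NProcShared` line and route-section layout should be checked against the Gaussian 03 manual. CPU time is omitted because it would normally be a queue setting rather than part of the input file.

```python
def build_gaussian_input(title, atoms, method="HF", basis="6-31G(d)",
                         keywords=("NMR",), charge=0, multiplicity=1,
                         processors=4):
    """Assemble a Gaussian-style input deck as one string.

    atoms is a list of (element symbol, x, y, z) tuples in Angstroms.
    """
    lines = [f"%NProcShared={processors}"]     # number of processors
    route = f"# {method}/{basis}"              # level of theory / basis set
    if keywords:
        route += " " + " ".join(keywords)      # e.g. opt, nmr, freq
    lines.append(route)
    lines.append("")                           # blank line ends the route
    lines.append(title)
    lines.append("")                           # blank line ends the title
    lines.append(f"{charge} {multiplicity}")   # charge and multiplicity
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                           # blank line ends the molecule
    return "\n".join(lines) + "\n"


# Rough, illustrative geometry for water (not an optimized structure).
water = [
    ("O", 0.000000, 0.000000, 0.117000),
    ("H", 0.000000, 0.757000, -0.469000),
    ("H", 0.000000, -0.757000, -0.469000),
]

deck = build_gaussian_input("water NMR", water)
```

Each argument corresponds to one pull-down box, so a GUI only has to collect the choices and pass them to one function like this.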
37Design of UNCW Grid GUI
- Select from pull-down menus in categories
Basis Set
Data sets (Windows Explorer-like file
browser)
Level of Theory
CPU Time
Processors
Chg. Multiplicity
Keywords
File Type Transformer
Submit
Visualize
38Design of UNCW Grid GUI
- Select from pull-down menus in categories
Basis Set
Data sets (Windows Explorer-like file
browser)
Level of Theory
HF MP2 DFT
CPU Time
Processors
Chg. Multiplicity
Keywords
File Type Transformer
Submit
Visualize
39Design of UNCW Grid GUI
- Functions can be grouped into sets called
workflows for repetitive operations
Basis Set
Data sets (Windows Explorer-like file
browser)
Level of Theory
CPU Time
Processors
Chg. Multiplicity
Keywords
File Type Transformer
Submit
Visualize
40Design of UNCW Grid GUI
- Preferences among choices can be saved as part of
the workflow
6-31G(d)
Data sets (Windows Explorer-like file
browser)
HF
4000
4
0,1
NMR
File Type Transformer
Submit
Visualize
41Design of UNCW Grid GUI
- The result is a much simpler process for
the user:
Select data, Transform it
Calculate, Visualize
42Design of UNCW Grid GUI
- Multiple repeatedly used sets of commands
(workflows) can be saved
- A user's preferences within a workflow (e.g.,
level of theory, basis set, processors, CPU
time requested, keywords, charge and
multiplicity) could be saved also (future design
feature).
- In the future a user may need only to specify a
data set (file) and link it to a pre-set
workflow to initiate a calculation!
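Saving preferences and linking them to a data file could look like the Python sketch below. It is a hypothetical illustration: the key names, the JSON format, and the file name `molecule.hin` are assumptions, and the values simply echo the example choices shown on the GUI design slide (the units of the CPU time value are not stated there).

```python
import json

# Example preferences, taken from the GUI design slide's sample choices.
preset = {
    "basis_set": "6-31G(d)",
    "level_of_theory": "HF",
    "cpu_time": 4000,        # units not specified in the talk
    "processors": 4,
    "charge": 0,
    "multiplicity": 1,
    "keywords": ["NMR"],
}


def save_preset(preferences):
    """Serialize preferences to JSON text that could be written to a file."""
    return json.dumps(preferences, indent=2, sort_keys=True)


def load_preset(text):
    """Restore preferences from previously saved JSON text."""
    return json.loads(text)


def make_job(dataset_path, preferences):
    """Link a data set to saved preferences without modifying the preset."""
    job = dict(preferences)
    job["dataset"] = dataset_path
    return job


saved_text = save_preset(preset)
restored = load_preset(saved_text)
job = make_job("molecule.hin", restored)  # hypothetical HyperChem file name
```

With the preset stored, the only new information needed for each calculation is the data file.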
43Chemistry Portal
- Initially, the portal will operate under Linux
- Next it will be ported to operate under Windows
- Eventually, computations will be submitted online
through web browsers
- This could be accomplished from any device (e.g.,
PC, laptop, or even a cell phone) that can access
the Internet.
44JXPL Language
- UNCW Mathematics faculty member Dr. Jeff Brown, with help
from Computer Science faculty member Dr. Clayton Ferner
and recent graduate Mike Wood, developed a new
Java-based programming language called JXPL.
- JXPL is the language used in the GridNexus
project, and is designed for use with web
services and grid services
- The advantages of JXPL include:
- It is readily extensible
- It interfaces easily with (LISP-like) data
structures in the GUI
- JXPL scripts are written in XML, a commonly used
language
45What's Next?
- More filters to transform data need to be
developed and tested
- Fancier graphics may be added to the GUIs
- More computational nodes will be added to the
Grid. The eventual goal is to include all NC
institutions of higher learning.
- Extend Grid to include more software applications
- Extend Grid services to other disciplines
- Include industry and businesses as users and
developers.
46References
- http://people.uncw.edu/vetterr/grid/proposal/UNC-OP_Grid_Project20Overview.htm
- http://www.ox.compsoc.net/swhite/history/
- http://www.grid.org
- http://www.gridcomputingplanet.com/
- http://www.globus.org/research/papers/anatomy.pdf
- http://www.ibm.com/grid
- http://www.globus.org
- http://www.usatlas.bnl.gov/computing/grid/
47Acknowledgments
- UNC-OP for funding the UNCW Grid Initiative
Proposal
- Fostering Undergraduate Research Partnerships
through a Graphical User Environment for the
North Carolina Computing Grid, Dr. Ron Vetter,
PI
- Co-PIs: Dr. Rebecca S. Boston, NCSU; Dr. Anthony
Wilkinson, WCU; Dr. Marilyn McClelland, NCCU; Dr.
Libero Bartolotti, ECU; Ms. Judy Porter, CFCC.
- UNCW Participants: Computer Science: Dr. Ron
Vetter, Dr. Clayton Ferner, Dr. David Berman, and
Dr. Tom Hudson. Information Technology Systems:
Dr. Bob Tyndall and Mr. Bobby Miller.
Mathematics and Statistics: Dr. Jeff Brown.
Chemistry and Biochemistry: Dr. Ned H. Martin.
Biological Sciences: Dr. Ann Stapleton.
Information Systems and Operations Management:
Dr. Tom Janicki.
- UNCW Computer Science students working on the
Chemistry portal: Tristan
Carland, Jerry Martin, Andrew Martin