Title: Grid Computing
1Grid Computing
- Mark P. Wachowiak, Ph.D.
- February 2, 2007
2Objectives
- Grid computing
- Software and middleware for the grid
- Present and future grid applications
3Grid Computing
- Definition
- Grid computing is distributed computing performed transparently across multiple administrative domains (P.V. Coveney).
- Distributed high-performance computing.
- Large, geographically distributed networks of computers.
- Provides a means for using distributed resources to solve large problems.
- What the Web did for communication, grids endeavor to do for computation.
4Grid Computing (2)
- Very general computing applications
- Database searches and queries.
- Scientific applications.
- Weather prediction.
- Cryptography.
- Business applications.
- Transparency
- Distributing computational resources among
multiple and widely separated sources and users
is a difficult algorithmic problem.
5Characteristics of Grids
- Grids coordinate resources that are not subject to centralized control.
- Grids use standard, open, general-purpose protocols and interfaces.
- Grids deliver high qualities of service.
- http://devresource.hp.com/drc/technical_papers/grid_soa/04.png
6Grid vs. Parallel Computing
Beowulf cluster
SHARCNet University of Western Ontario
compneuro.uwaterloo.ca/beowulf.html
7Grid vs. Parallel Computing (2)
- Grid computing is distinguished from parallel computing on one or more multiprocessors:
- Parallel computing: locally clustered machines or large supercomputers.
- Grid computing: computation across different administrative domains.
www.chemistry.msu.edu/Facilities/Supercomputer/
8Two Tenets of Grid Computing
- Virtualization
- Individual resources (such as computers, disks, information sources, and applications) are pooled together and made available through abstractions.
- Overcomes hard-coded connections between providers and consumers of resources.
- Provisioning
- When a request for a resource is made, a specific resource is identified to fulfill the request.
- The system determines how to meet the need, and optimizes system performance.
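The two tenets can be illustrated with a small sketch. This is not taken from any grid toolkit: the class names, fields, and the "most free capacity" selection policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One physical resource; all names and fields here are hypothetical."""
    name: str
    kind: str          # e.g. "cpu" or "disk"
    capacity: int      # total units offered
    in_use: int = 0    # units currently allocated

    @property
    def free(self) -> int:
        return self.capacity - self.in_use


class ResourcePool:
    """Virtualization: consumers see one pool, not individual providers."""

    def __init__(self, resources):
        self.resources = list(resources)

    def provision(self, kind: str, units: int) -> str:
        """Provisioning: pick a specific resource to satisfy the request.

        Policy (an assumption): choose the candidate with the most free
        capacity, which spreads load across providers.
        """
        candidates = [r for r in self.resources
                      if r.kind == kind and r.free >= units]
        if not candidates:
            raise LookupError(f"no {kind} resource with {units} free units")
        chosen = max(candidates, key=lambda r: r.free)
        chosen.in_use += units
        return chosen.name


pool = ResourcePool([
    Resource("node-a", "cpu", capacity=8, in_use=6),
    Resource("node-b", "cpu", capacity=16, in_use=4),
    Resource("store-1", "disk", capacity=500),
])
# The caller asks for "2 CPU units" and never names a machine.
print(pool.provision("cpu", 2))
```

The consumer only states what it needs; which provider serves the request is decided inside the pool, so no connection between consumer and provider is hard-coded.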
9Characteristics of Grid Applications
- Data are acquired by scientific instruments.
- Data are stored in archives on separate, perhaps geographically separated sites.
- Data are managed by teams belonging to different organizations.
- Large quantities of data (tera- or petabytes) are collected.
- Software is used to analyze and summarize the raw data.
10The Importance of Standardization
- Without standardization, grid computing practitioners would need to acquire accounts at many different computer centers, managed by different organizations.
- Different security and authentication protocols and accounting practices would have to be applied.
- Very heterogeneous software environment.
11Objectives
- Grid computing
- Software and middleware for the grid
- Grid applications
12Importance of Middleware
- Middleware eases the grid user's experience and provides levels of abstraction.
- Middleware extends the Web's information and database management capabilities.
- Allows remote deployment of computational resources.
13Globus Toolkit
- Most widely used middleware for grids.
- Open source toolkit for building computing grids.
- Provides a standard platform upon which other services build.
- Provides directory services, security, and resource management.
www.globus.org
14Objectives
- Grid computing
- Software and middleware for the grid
- Grid applications
15CPU Scavenging
- Unused PC resources worldwide are harnessed. Also known as shared computing.
- CPU-scavenging systems gain and lose machines at unpredictable times as users interact with their computers, or as network connections fail.
- CPU-scavengers can migrate jobs to allow smooth operation.
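A toy scheduler shows how job migration might work when machines come and go. It is a sketch under simple assumptions (one job per machine, interrupted jobs restart from scratch); the class and method names are hypothetical and do not come from any real scavenging system.

```python
from collections import deque


class ScavengingScheduler:
    """Toy scheduler: jobs run on idle machines and are requeued
    (migrated) when a machine leaves. All names are hypothetical."""

    def __init__(self, jobs):
        self.pending = deque(jobs)   # jobs waiting for a machine
        self.running = {}            # machine name -> job

    def machine_idle(self, machine):
        """A machine became available; give it work if any is pending."""
        if self.pending and machine not in self.running:
            self.running[machine] = self.pending.popleft()

    def machine_lost(self, machine):
        """Owner returned or the network failed: put the job back."""
        job = self.running.pop(machine, None)
        if job is not None:
            self.pending.appendleft(job)   # retry interrupted work first

    def job_done(self, machine):
        """The machine finished its job; return it (or None)."""
        return self.running.pop(machine, None)


scheduler = ScavengingScheduler(["job-1", "job-2"])
scheduler.machine_idle("pc-17")
scheduler.machine_lost("pc-17")   # the user came back to the keyboard
scheduler.machine_idle("pc-42")   # the interrupted job moves here
print(scheduler.running)
```

A real system would also checkpoint partial results so that a migrated job does not have to start over.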
16SETI@home
- Search for Extraterrestrial Intelligence
- Goal: to analyze vast amounts of data from the Arecibo radio telescope.
- Initiated by the Space Sciences Laboratory at the University of California, Berkeley.
www.ras.ucalgary.ca/svlbiimages/arecibo.jpg, www.artscouncil.org.uk/spaceart
17SETI@home (2)
- Uses a free screen saver, available to the public.
- When activated, the screensaver program downloads time sequences of radio telescope data and searches them for radio sources.
- SETI@home has more than 5 million participants.
- Inspiration for other scientific applications in need of large computing resources.
18SETI@home (3)
- Main purpose: a program downloads and analyzes radio telescope data.
- Data is recorded at the Arecibo Observatory in Puerto Rico.
- The data is sent to Berkeley, where it is processed into units of 107 seconds of data.
- These work units are sent from the SETI@home server over the Internet to participating computers around the world for analysis.
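Cutting a recording into work units can be sketched as below. Only the 107-second unit length comes from the slide; the function name, the sample rate, and the plain-list data format are assumptions, and the sketch ignores any overlap or frequency splitting a real pipeline may use.

```python
def split_into_work_units(samples, sample_rate_hz, unit_seconds=107):
    """Yield consecutive chunks that each cover `unit_seconds` of data.

    `samples` is any sliceable sequence; the final chunk may be shorter.
    """
    unit_len = int(sample_rate_hz * unit_seconds)
    if unit_len <= 0:
        raise ValueError("sample rate and unit length must be positive")
    for start in range(0, len(samples), unit_len):
        yield samples[start:start + unit_len]


# Hypothetical recording: 1000 samples taken at 2 samples per second.
recording = list(range(1000))
units = list(split_into_work_units(recording, sample_rate_hz=2))
print(len(units), [len(u) for u in units])
```

Each chunk is independent of the others, which is what lets the server hand them to different participating computers.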
19SETI@home (4)
- The analysis software can search for signals with about one-tenth the strength of those sought in previous surveys, because it makes use of a very computationally intensive algorithm.
- Data is merged into a database using SETI@home computers in Berkeley. Various pattern-detection algorithms are applied to search for the most interesting signals.
20SETI@home User Client
21BOINC
- Berkeley Open Infrastructure for Network Computing.
- Funded by the National Science Foundation.
- Used in the SETI@home project.
- Client-server architecture
- Client: used by the computer supplying resources for one or more BOINC projects. Performs the computations.
- Server: system software, such as database services and the project's web site.
22Remote Procedure Calls
- Mechanism by which the server communicates with the client in BOINC.
- Similar to a regular function call or method invocation, but the function is called on one computer and executed on another.
23Remote Procedure Calls - Examples
- Return screensaver mode
- get_screensaver_mode(int status)
- Get a list of results for jobs in progress
- get_results(RESULTS)
- Get a list of file transfers in progress
- get_file_transfers(FILE_TRANSFERS)
- Get the client's current state
- get_state(CC_STATE)
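The general idea of a remote procedure call can be tried with Python's standard-library XML-RPC modules. This is only an illustration of the mechanism: BOINC uses its own RPC protocol, not the one shown here, and the returned fields are invented. Only the name get_state is borrowed from the list above.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_state():
    """Hypothetical stand-in for a client-state query; fields are invented."""
    return {"projects": ["example_project"], "tasks_in_progress": 2}


def start_server():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_state)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def call_get_state(port):
    # The proxy turns an ordinary-looking method call into a network request.
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.get_state()


if __name__ == "__main__":
    server = start_server()
    try:
        print(call_get_state(server.server_address[1]))
    finally:
        server.shutdown()
        server.server_close()
```

The caller writes `proxy.get_state()` as if it were a local function, while the body of `get_state` runs in the server's thread; across two machines only the address would change.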
24Human Proteome Folding Project (HPFP)
- Goal: to predict the structure of human proteins.
- Devised at the Institute for Systems Biology, University of Washington.
- Produces the likely structures for each of the proteins using a set of predefined rules.
- Improved knowledge of human proteins is important in developing new therapies.
- Officially completed on July 18, 2006.
- Second stage now underway.
25Human Proteome Folding Project
WCG desktop console: users monitor progress on the protein-folding project.
Typical desktop screensaver setup for HPFP
http://msnbcmedia.msn.com/j/msnbc/Components/Photos/041116/folding2.hmedium.jpg
http://ndg.gunzclan.org/Charlotte/graphics/2/images/IMG0153_JPG.jpg
26Business Applications
- Business application grid (BAG).
- Major focus is using existing grid computing technologies to unite all of an organization's desktops, workstations, servers, printers, peripherals, etc., to perform useful work during idle time.
- Usually focused on well-defined problems:
- Calculating performance averages for a mutual fund.
- Reducing processing time in wealth management systems.
- Database applications.
27Business Applications (2)
- A large financial services company uses specialized grid software for new corporate banking applications.
- Oracle Corporation offers a grid database system.
28Business Grid Middleware
- Provides an IT-level infrastructure to support business applications.
- Middleware provides services for composing, submitting, and managing business applications.
- Business functions (e.g. credit card authorization and shipping-and-handling services) are not provided.
- Globus Toolkit 4 makes it easier to build an application that taps into existing distributed computing resources (e.g. servers, storage, databases).
29Conclusions
- Grid computing is an enabling technology that is rapidly gaining popularity in:
- Science.
- Medicine.
- Engineering.
- Business and financial applications.
- Many software vendors offer grid computing toolkits and middleware.
- In 2004, 20% of companies were seeking grid computing solutions (Evans Data Corp.).
30Benefits of Grid Computing
- Collaboration.
- Increased productivity.
- Efficient use of resources and storage.
- Cost-effectiveness.
- Heterogeneous environments.
- Failure tolerance.
- Transparency.
31Challenges
- Lack of control over resources, administration.
- Security.
- Middleware.
- Network failures.
- Cultural issues.
32Thank you.
33Open grid services architecture
- OGSA: standard for grid-based applications.
- Framework for meeting grid requirements.
Figure: layers of the architecture, from top to bottom
- Application-specific grid services, with application-specific interfaces (e.g. astronomy, biomedical informatics, high-energy physics).
- OGSA services, with standard interfaces: directory, management, security.
- OGSI (Open Grid Services Infrastructure) services, with grid service interfaces (e.g. GridService, Factory): naming, service data (metadata), service creation and deletion, fault model, service groups.
- Web services.
34Globus toolkit
- Other non-GT3 services can run on top of the GT3 architecture.
- Replica management: keeps track of subsets of large data sets that are being worked on.
- Job management: checking the status of jobs, pausing, stopping if necessary.
- Index services: helping to locate grid resources to meet specific needs.
- Reliable file transfer service (RFT): performs large file transfers from a client to a grid service.
- Security services restrict access to grid services so that only authorized clients can use them, providing another layer of security on top of firewalls.
- Low-level functions
http://gdp.globus.org/gt3-tutorial/multiplehtml/ch01s04.html
35Other grid tools
- Resource management
- Grid Resource Allocation and Management Protocol (GRAM)
- Information Services
- Monitoring and Discovery Service (MDS)
- Security Services
- Grid Security Infrastructure (GSI)
- Data Movement and Management
- Global Access to Secondary Storage (GASS) and GridFTP
36World-Wide Telescope (2002)
- Goal: deployment of data resources shared by astronomers.
- Data
- Archives of observations over a particular period of time, part of the EM spectrum, and area of the sky.
- Observations collected at different sites around the world.
- Data on the same celestial objects are combined over different periods of time and different parts of the EM spectrum.
37World-Wide Telescope (2)
- Data archives (? terabyte) are managed locally by the teams that collect the data.
- As data is acquired, it is analyzed and stored as transformed data so that it can be used by remote astronomy sites.
- Librarian role of scientists.
- Metadata is required to describe:
- Time the data was collected.
- Part of the sky observed.
- Instruments used.
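A metadata record with these three fields, and a simple search over a catalog of such records, might look like the sketch below. The record layout, the field names, and the free-form sky-region labels are assumptions for illustration; real astronomy archives use richer schemas and sky coordinates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ObservationMetadata:
    archive: str        # hypothetical archive identifier
    observed_on: date   # time the data was collected
    sky_region: str     # part of the sky observed (free-form label here)
    instrument: str     # instrument used


def find_observations(catalog, *, sky_region=None, instrument=None,
                      start=None, end=None):
    """Return the entries that match every criterion that was supplied."""
    matches = []
    for entry in catalog:
        if sky_region is not None and entry.sky_region != sky_region:
            continue
        if instrument is not None and entry.instrument != instrument:
            continue
        if start is not None and entry.observed_on < start:
            continue
        if end is not None and entry.observed_on > end:
            continue
        matches.append(entry)
    return matches


# Invented catalog entries, for illustration only.
catalog = [
    ObservationMetadata("archive-north", date(2001, 3, 14), "region-A", "radio"),
    ObservationMetadata("archive-south", date(2001, 6, 2), "region-A", "optical"),
    ObservationMetadata("archive-south", date(2002, 1, 20), "region-B", "optical"),
]
for hit in find_observations(catalog, sky_region="region-A"):
    print(hit.archive, hit.observed_on, hit.instrument)
```

Searching the metadata first lets a remote astronomer locate the relevant archives without transferring any of the underlying observation data.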
38WCG ongoing projects
- FightAIDS@Home
- Launched by WCG in 2005.
- Each computer processes one potential drug molecule and tests how well it would dock with HIV protease, inhibiting viral reproduction.
- Human Proteome Folding Phase 2
- Released in 2006.
- Extension of HPF1, focusing on human-secreted proteins.
- Better protein models, but more computationally intensive.
39World Community Grid (WCG)
- Goal: to create the world's largest public computing grid for humanitarian concerns.
- Administered and funded by IBM.
- Platforms: Windows, Linux, and Mac OS X.
- Uses the idle time of Internet-connected desktop computers.
- The agent works as a screen saver (like SETI@home), only using a computer's resources when it would otherwise be idle, and returning resources to the users when requested.
- Projects are approved by an advisory board: representatives of major research institutions, universities, UN, WHO.
40WCG Smallpox research
- Completed project.
- WCG largely began due to the success of this project in shaving years off research time.
- Analysis of therapeutic candidates to fight the smallpox virus.
- About 35 million potential drug molecules were screened against several smallpox proteins, resulting in 44 new potential treatments.
41WCG Ongoing projects (2)
- Help Defeat Cancer (2006)
- Processes large numbers of tissue samples using tissue microarrays.
- Genome Comparison (2006)
- Compares gene sequences of different organisms to find similarities.
- Goal: determining the purpose of specific gene sequences in particular functions by comparing them with similar sequences with known functions in another organism.
42Other grid projects
Description of the project (Reference)
1. Aircraft engine maintenance using fault histories and sensors for predictive diagnostics (www.cs.york.ac.uk/dame)
2. Telepresence for predicting the effects of earthquakes on buildings, using simulations and test sites (www.neesgrid.org)
3. Bio-medical informatics network providing researchers with access to experiments and visualizations of results (nbcr.sdsc.edu)
4. Analysis of data from the CMS high energy particle detector at CERN by physicists world-wide over 15 years (www.uscms.org)
5. Testing candidate drug molecules for their effect on the activity of a protein, by performing parallel computations using idle desktop computers (Taufer et al. 2003; Chien 2004)
6. Use of the Sun Grid Engine to enhance aerial photographs by using spare capacity on a cluster of web servers (www.globexplorer.com)
7. The butterfly Grid supports multiplayer games for very large numbers of players on the internet over the Globus toolkit (www.butterfly.net)
8. The Access Grid supports the needs of small group collaboration, for example by providing shared workspaces (www.accessgrid.org)
43Requirements of grid systems
- Remote access to resources, specifically, to archived data.
- Data processing at the site where the data is managed.
- Remote requests (queries) result in a visualization or results from a small quantity of data.
- The resource manager of a data archive creates instances of services when they are needed.
- Similar to the distributed object model, where servant objects are created when needed.
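On-demand creation of service instances can be sketched with a small factory. The class names, the per-client instance policy, and the in-memory record list are all assumptions for illustration; they are not the interfaces of any actual grid middleware.

```python
class QueryService:
    """Servant created on demand to answer queries against one archive."""

    def __init__(self, archive_name, records):
        self.archive_name = archive_name
        self._records = records

    def query(self, predicate):
        # Processing happens where the data lives; only matches are returned.
        return [r for r in self._records if predicate(r)]


class ArchiveResourceManager:
    """Hypothetical resource manager acting as a factory for services."""

    def __init__(self, archive_name, records):
        self.archive_name = archive_name
        self._records = records
        self._instances = {}   # client id -> live service instance

    def create_service(self, client_id):
        """Create an instance only when a client first needs one."""
        if client_id not in self._instances:
            self._instances[client_id] = QueryService(
                self.archive_name, self._records)
        return self._instances[client_id]

    def destroy_service(self, client_id):
        """Release the instance once the client is finished."""
        self._instances.pop(client_id, None)


manager = ArchiveResourceManager("sky-archive", list(range(100)))
service = manager.create_service("astronomer-1")
print(service.query(lambda value: value % 25 == 0))
manager.destroy_service("astronomer-1")
```

The query runs next to the archive and returns only a small result, matching the requirement that data be processed at the site where it is managed.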
44Requirements of grid systems (2)
- Metadata to describe characteristics of archived data.
- Directory services based on the metadata.
- Software for:
- Query management.
- Data transfer.
- Resource reservation.