Title: Grid
1 Grid
2 Introduction to Grid Computing
- What is a Grid? An integrated, advanced cyberinfrastructure that delivers:
  - Computing capacity
  - Data capacity
  - Communication capacity
- Why? There are many applications that are characterized as follows:
  - Large, varied, distributed collaborations need to work together
  - Need lots of cycles and storage (we are talking about teraflops, terabytes)
  - Need to share results, codes, parameter files, etc.
3 Grid Motivation
- Grid computing was originally about extending scientific computing on single machines to distributed systems
- Despite the improvement in raw computing power, storage capacity, and communication, it is difficult to keep up with the increased demand from the types of applications being developed.
4 Grid Motivation
- Scientific Applications
  - Analysis of large data volumes from different sources
  - Lots of computation needed to model an aspect of the natural world
  - Often requires substantially different types of computational resources
  - Projected data is measured in petabytes: lots of storage
5 Grid Motivation
- Astronomy
  - Digital sky surveys
- Medical data
  - X-ray, mammography data, etc. (many petabytes)
  - Digitizing patient records (ditto)
- Molecular genomics and related disciplines
  - Human Genome, other genome databases
  - Proteomics (protein structure, activities, ...)
  - Protein interactions, drug delivery
- Virtual Population Laboratory (proposed)
  - Simulate likely spread of disease outbreaks
- Brain scans (3-D, time dependent)
- Climate studies
6 Grid Motivation
- In the business world, companies want to integrate, manage, and analyze large volumes of data
  - Example: An insurance company mines data from partner hospitals for fraud detection
7 Grid Motivation
- Could buy additional machines
- There is a lot of computing power that is unutilized or underutilized most of the time
- How can applications take advantage of the multiple resources available in an effective manner?
- A grid is intended to allow the sharing, selection, and aggregation of a wide variety of geographically dispersed resources owned by different organizations (virtual organizations)
8 Emergence of the Virtual Organization
- Resource sharing and coordinated problem solving in dynamic virtual organizations
The Anatomy of the Grid, Foster, Kesselman, Tuecke, 2001
9 Other Distributed Infrastructures
- Road, rail, telephones, power, banking, water, electrical
- All started locally, then regionally, then nationally, and then internationally
- Provide reliable, relatively low-cost access to a standardized service
- Available to the masses
10 Electrical Power Grid
- Single entity providing power
- Relatively efficient, low cost, reliable
- US Grid links 10K generators
- Complex physical connections and trading mechanisms
- Components heterogeneous and operated/owned by different companies
- Consumers differ in the amount of power they use, the quality of service they require, and the price they will pay
- Economics important: grid driven by economic factors. Reserve capacities, trading power.
- Politics important: success depended on regulatory, political, and institutional developments as much as technical innovation
- Control important: infrastructure for monitoring, management, and control
11 Emergence of the Virtual Organization
- Commonalities
  - Need to discover and share resources
  - Do not necessarily trust all other participants
  - Not just about document exchange: also about remote software, computers, data, sensors, etc.
  - Resource sharing is conditional and the conditions are dynamic
  - Can only use resources for a limited class of problems or at certain times of the day.
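The conditional sharing described on this slide (a limited class of problems, certain times of the day) can be sketched as a small policy check. This is only an illustration under stated assumptions: the `SharingPolicy` class, its fields, and the job class names are hypothetical and do not come from any real grid middleware.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical policy a resource owner attaches to a shared resource."""
    allowed_job_classes: frozenset  # the limited class of problems
    window_start: time              # start of the daily sharing window
    window_end: time                # end of the daily sharing window

    def permits(self, job_class: str, at: time) -> bool:
        """Return True if a job of this class may run at the given time of day."""
        if job_class not in self.allowed_job_classes:
            return False
        if self.window_start <= self.window_end:
            return self.window_start <= at < self.window_end
        # The window wraps past midnight (for example 22:00 to 06:00).
        return at >= self.window_start or at < self.window_end


# Assumed example: the owner shares the machine only overnight,
# and only for two kinds of scientific jobs.
policy = SharingPolicy(
    allowed_job_classes=frozenset({"climate-simulation", "genome-analysis"}),
    window_start=time(22, 0),
    window_end=time(6, 0),
)
print(policy.permits("climate-simulation", time(23, 30)))  # inside the window
print(policy.permits("climate-simulation", time(12, 0)))   # outside the window
print(policy.permits("data-mining", time(23, 30)))         # class not allowed
```

Because the conditions are dynamic, an owner would replace the policy object over time rather than hard-code it; the check itself stays the same.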
12 What is a Grid? Checklist (Foster)
- Coordinates distributed resources using non-centralized control mechanisms.
  - A grid integrates and coordinates resources and users that live within different administrative domains
  - E.g., different administrative units of the same company or different companies
  - Addresses the issues of security, policy, payment, and membership
13 What is a Grid? Checklist (Foster)
- Uses standard, open, general-purpose protocols and interfaces
  - A grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access.
- Delivers nontrivial qualities of service
  - Resources should be used in a coordinated fashion to deliver various qualities of service
  - Quality of service is usually defined in metrics such as response time, throughput, availability, etc.
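To make the quality-of-service metrics named above concrete, here is a minimal sketch of how response time, throughput, and availability might be computed from job records. The `JobRecord` type, the sample numbers, and the 100-second observation window are assumptions chosen for illustration, not measurements from any real grid.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """One completed request; times are seconds since the window opened."""
    submitted: float
    finished: float


def mean_response_time(jobs):
    """Average time from submission to completion, in seconds."""
    return sum(j.finished - j.submitted for j in jobs) / len(jobs)


def throughput(jobs, window_seconds):
    """Completed jobs per second over the observation window."""
    return len(jobs) / window_seconds


def availability(uptime_seconds, window_seconds):
    """Fraction of the window during which the service was usable."""
    return uptime_seconds / window_seconds


# Assumed sample data: three jobs observed over a 100-second window,
# during which the service was up for 95 seconds.
jobs = [JobRecord(0, 30), JobRecord(10, 70), JobRecord(20, 50)]
window = 100.0
print(mean_response_time(jobs))   # (30 + 60 + 30) / 3 seconds
print(throughput(jobs, window))   # jobs per second
print(availability(95.0, window)) # fraction of time available
```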
14 Grid vs. Internet?
- We've had computers connected by networks for 20 years
- The Grid brings additional notions
  - Virtual Organizations
  - Infrastructure to enable computation to be carried out across these
  - Authentication, monitoring, information, resource discovery, status, coordination, etc.
- Can I just plug my application into the Grid?
  - No! Much work to do to get there!
15 Are these Grids?
- Cluster Management Systems
  - Examples: Sun's Sun Grid Engine, Platform's Load Sharing Facility
  - These can be installed on a parallel computer or in a local area network
  - Can deliver a quality of service
  - Each may be an important component of a Grid, but by itself does not constitute a Grid
16 Are these Grids?
- Multi-site scheduler
  - Example: Platform's MultiCluster scheduler
  - Yes: not terribly sophisticated, but it is a grid
- Gnutella
  - Maybe: is it too specialized?
  - Is it open or is it a standard?
- WWW
- Foster's checklist more clearly applies to large-scale Grid deployments
  - Data Grid: GriPhyN, PPDG, EU DataGrid, iVDGL, DataTAG, NASA's Information Power Grid
  - TeraGrid: used to link major US academic sites
17 Advantages of Grid Computing
- Uses resources scattered across the world
  - Access to more computing power
  - Better access to data
  - Utilize unused cycles
- Facilitates Virtual Organizations (VO)
  - Groups of organizations that use the Grid to share resources
18 Online Access to Scientific Instruments
[Figure: Advanced Photon Source. Labels: real-time collection, tomographic reconstruction, archival storage, wide-area dissemination, desktop and VR clients with shared controls. Source: www.globus.org]
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
19 Data Grids for High Energy Physics
www.globus.org
Image courtesy Harvey Newman, Caltech
20 NEES (Network for Earthquake Engineering Simulation) Collaboration
U.Nevada Reno
www.neesgrid.org
21 Home Computers Evaluate AIDS Drugs
www.globus.org
- Community
  - 1000s of home computer users
  - Philanthropic computing vendor (Entropia)
  - Research group (Scripps)
- Common goal: advance AIDS research
22 SHARCNET
SHARCNET is a high-performance scientific computing project involving the University of Western Ontario, the University of Guelph, McMaster University, the University of Windsor, and Wilfrid Laurier University.
SHARCNET provides UWO researchers with world-class computing capabilities. As of November 2001, the computer cluster at the University of Western Ontario was the fastest computer at a Canadian university and the 12th fastest at any university in North America.
[Figure: map of SHARCNET sites in South Western Ontario, a "cluster of clusters" or "super cluster" linked by ultra-high-speed fiber optic networking: The University of Western Ontario (London), University of Guelph (Guelph), Wilfrid Laurier University (Waterloo), McMaster University (Hamilton), University of Windsor (Windsor)]
http://www.sharcnet.ca
23 Example Grids
NSF PACI Grid
NorduGrid
25 Ideal Grid-based Scientific Computation
- User submits request through GUI
  - Application
  - Operating System and other requirements
  - Input data
- Grid finds and allocates resources to satisfy request
- Grid monitors request processing
  - Moves job when resources fail or are too busy
- Grid notifies user when results are available
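The request flow on this slide (submit, allocate, monitor, move on failure, notify) can be sketched as a small simulation. Everything here is an assumption made for illustration: the `Resource` and `Request` types, the site names, and the `healthy` flag that stands in for a failed or overloaded resource. Real grid middleware is far more involved.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    healthy: bool = True   # False simulates a failed or too-busy resource


@dataclass
class Request:
    application: str
    requirements: str
    input_data: str


def run_on(resource, request):
    """Simulated execution: raises if the resource fails or is too busy."""
    if not resource.healthy:
        raise RuntimeError(f"{resource.name} unavailable")
    return f"{request.application} finished on {resource.name}"


def process(request, resources, notify):
    """Try resources in order, moving the job whenever one fails."""
    for resource in resources:
        try:
            result = run_on(resource, request)
        except RuntimeError as err:
            notify(f"moving job: {err}")
            continue
        notify(f"results available: {result}")
        return result
    notify("request failed: no usable resource")
    return None


# Hypothetical request and resource pool; the first site is down,
# so the job is moved to the second one.
messages = []
request = Request("simulate", "Linux, MPI", "input.dat")
pool = [Resource("siteA", healthy=False), Resource("siteB")]
outcome = process(request, pool, messages.append)
print(outcome)
print(messages)
```

The `notify` callback keeps the sketch independent of how the user is actually told about results (GUI, email, or anything else).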
26 Example
- Assume a source file Main.F on machine A and an input file on machine B. Main.F is written using MPI; it will need around 4 GB of core memory to run, it will take several hours to complete, and it will produce a large output file.
- What functionality is needed?
27 Issues
- How to select a machine to run it on?
- How to provide an executable which can run on that machine?
- How to move the input file?
- How to start the executable?
- How to monitor the job? When does it start? When does it finish?
- How to move the output file back?
- What about security?
- How do we know if it didn't work and how it failed?
28 How to Select a Machine
- What properties of a machine are we interested in?
- What resources does my executable require?
  - 4 GB memory, several hours of compute time
  - Enough disk space for the output
- What kind of environment do I need on the machine?
  - OS limitations?
  - MPI? (Which version?), Fortran?
- What resources am I authorized to run on?
- How quickly will it run?
- How much will it cost/what is my allocation there?
- How to find all this information? What should the user provide?
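The questions on this slide amount to matching a job's requirements against machine properties and then ranking the machines that qualify. A minimal sketch follows; the `Machine` and `JobRequirements` types, the machine names, and all numbers other than the 4 GB memory figure are hypothetical, and ranking by a single `relative_speed` value is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    memory_gb: float
    free_disk_gb: float
    max_walltime_hours: float
    os: str
    software: set = field(default_factory=set)
    authorized_users: set = field(default_factory=set)
    relative_speed: float = 1.0   # higher is faster (assumed benchmark score)


@dataclass
class JobRequirements:
    user: str
    memory_gb: float
    disk_gb: float
    walltime_hours: float
    os: str
    software: set


def satisfies(machine, job):
    """True if the machine meets every requirement and the user may run there."""
    return (
        job.user in machine.authorized_users
        and machine.memory_gb >= job.memory_gb
        and machine.free_disk_gb >= job.disk_gb
        and machine.max_walltime_hours >= job.walltime_hours
        and machine.os == job.os
        and job.software <= machine.software   # subset test
    )


def select_machine(machines, job):
    """Pick the fastest machine that satisfies the job, or None."""
    candidates = [m for m in machines if satisfies(m, job)]
    return max(candidates, key=lambda m: m.relative_speed, default=None)


machines = [
    Machine("small", 2, 500, 48, "linux", {"mpi", "fortran"}, {"alice"}, 3.0),
    Machine("mid", 8, 200, 24, "linux", {"mpi", "fortran"}, {"alice", "bob"}, 1.0),
    Machine("fast", 16, 200, 24, "linux", {"mpi", "fortran"}, {"bob"}, 2.0),
    Machine("big", 32, 1000, 72, "linux", {"mpi", "fortran", "c"}, {"alice"}, 1.5),
]
# 4 GB of memory comes from the slide; disk and walltime are assumed values.
job = JobRequirements("alice", 4, 50, 6, "linux", {"mpi", "fortran"})
chosen = select_machine(machines, job)
print(chosen.name if chosen else "no suitable machine")
```

Cost and allocation could be handled the same way, either as another filter in `satisfies` or as part of the ranking key.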
29 More Complicated
- What if the program might need to read in data kept on machine C while it is running?
- What about distributing across processors on different machines?
- What if I have a lot of interconnected programs?
- How do I find the output file afterwards?
- What if it doesn't work?
30 Common Features Needed by Grid Systems
- A resource registry is an information source that allows entities to publish and update information about the resources they wish to share
Figure from Sean Norman's reading course presentation
31 Common Features Needed by Grid Systems
- Client is typically an agent acting on behalf of the user
- Acquires resources requested by the user by consulting resource registries
- Submits an allocation request to the resource manager(s) responsible for the desired resources
32 Common Features Needed by Grid Systems
- If the request can be accommodated, the resource manager(s) update status information for the acquired resources in the resource registries
- Client then sends the appropriate executables and input data to the allocated resources and receives a reference to the execution in return
33 Common Features Needed by Grid Systems
- Reference allows the client to monitor the execution of a job and inquire about its status
- Client may also receive the results of the job once its execution is complete
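The interaction among registry, client, and resource manager described on the last four slides can be sketched in a few classes. This is a toy, in-process model under stated assumptions: all class and method names are hypothetical, the "executable" is a plain Python callable, and execution is synchronous, so a job is already done by the time its reference is returned.

```python
import itertools


class ResourceRegistry:
    """Information source where owners publish and update resource status."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, info):
        self._entries[name] = dict(info)

    def update(self, name, **changes):
        self._entries[name].update(changes)

    def find(self, predicate):
        return [n for n, info in self._entries.items() if predicate(info)]

    def info(self, name):
        return dict(self._entries[name])


class ResourceManager:
    """Controls one resource and records its allocation in the registry."""

    _ids = itertools.count(1)

    def __init__(self, name, registry, cpus):
        self.name = name
        self.registry = registry
        self.jobs = {}
        registry.publish(name, {"cpus": cpus, "status": "free"})

    def allocate(self):
        if self.registry.info(self.name)["status"] != "free":
            return False
        self.registry.update(self.name, status="allocated")
        return True

    def execute(self, executable, input_data):
        ref = f"{self.name}/job-{next(self._ids)}"
        # Simulated synchronous run; a real system would run this remotely.
        self.jobs[ref] = {"status": "done", "result": executable(input_data)}
        self.registry.update(self.name, status="free")
        return ref

    def status(self, ref):
        return self.jobs[ref]["status"]

    def result(self, ref):
        return self.jobs[ref]["result"]


class Client:
    """Agent acting for the user: consults the registry, then the managers."""

    def __init__(self, registry, managers):
        self.registry = registry
        self.managers = managers   # resource name -> ResourceManager

    def submit(self, min_cpus, executable, input_data):
        names = self.registry.find(
            lambda info: info["cpus"] >= min_cpus and info["status"] == "free"
        )
        for name in names:
            manager = self.managers[name]
            if manager.allocate():
                return manager, manager.execute(executable, input_data)
        return None, None


registry = ResourceRegistry()
managers = {
    name: ResourceManager(name, registry, cpus)
    for name, cpus in [("clusterA", 4), ("clusterB", 64)]
}
client = Client(registry, managers)
manager, ref = client.submit(16, sum, [1, 2, 3])
print(ref, manager.status(ref), manager.result(ref))
```

The returned `ref` plays the role of the execution reference: the client keeps it and uses it later to ask for status and results.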
34 Some Solutions
- Middleware Toolkits: not all speak (or spoke) Globus
  - Condor
  - Globus Toolkit
  - Legion/Avaki
  - Codine (now Sun Grid Engine)
- Higher Level Toolkits (build on Globus)
  - JavaCoG
  - GridPortal Toolkit, Grid Portal Development Toolkit (GPDK)
  - Condor-G
  - SGE