Title: Principles of High Performance Computing ICS 632
1Principles of High Performance Computing (ICS 632)
- Introduction to Grid Computing
2Grid Computing
- Term coined by Ian Foster in the mid-90s
- Vision for large-scale computing
- Analogy with the Power Grid
- Availability
- Standard Interface
- Distributed
- Market
4On 10/07/2008
Big companies
New companies
5Science Today
- Massive computer simulation
- Massive computerized data analysis
6Multiple CPUs
- The single-CPU system that solves today's problems is not going to happen for a long time!
- And by then, we'll have bigger problems anyway
- Solution: use multiple CPUs
SGI Altix (up to 512 CPUs)
Dual-Xeon Motherboard
7Multiple Computers
- Adding CPUs to a single computer becomes very expensive
- How about multiple computers together?
- Linux Clusters (60% of Top-500 list)
Blue/Gene 30K computers
8Beyond the machine room?
- Need more capacity than available at (most) single sites
- Everyone would like a 10K-node 100GHz cluster
- Very expensive (cooling, power)
- More economical to have multiple sites
- Need to locate available resources now
- Data/Instruments are inherently distributed
9Grid Computing: A Brief History
- Early 90s
- Gigabit testbeds, metacomputing
- Mid to late 90s
- Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments
- 2005
- Hundreds of application communities and projects
- Major infrastructure deployments
- Significant technology base (esp. Globus Toolkit™)
- Major industrial interest and involvement
- Global Grid Forum: 400 organizations, 50 countries
10Grid Computing: A Definition
- Definition: resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
11The TeraGrid
10,000 processors, 1 PetaByte of storage
12Grids that I use
13Desktop Grids (SETI@home)
- Detect any alien signals received through the Arecibo radio telescope
- Uses the idle cycles of computers to analyze the data generated from the telescope
- Over 500,000 active participants, most of whom run a screensaver on a home PC
- Over a cumulative 20 TeraFlop/sec
- TeraGrid: 40 TeraFlop/sec
- Cost: $700K!!
- TeraGrid: > $100M
- Companies: United Devices
- Intranet solutions
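The cost comparison on this slide can be made concrete with a little arithmetic. The following is a minimal Python sketch; it assumes the two cost figures are in US dollars and treats the TeraGrid figure as a lower bound. The function and variable names are illustrative only.

```python
def cost_per_teraflop(total_cost, teraflops):
    """Return the cost of one TeraFlop/sec of delivered capacity."""
    return total_cost / teraflops


# Figures taken from the slide; the dollar unit is an assumption.
seti_cost_per_tf = cost_per_teraflop(700_000, 20)
teragrid_cost_per_tf = cost_per_teraflop(100_000_000, 40)  # lower bound

# How many times more expensive is one TeraFlop/sec on the TeraGrid?
ratio = teragrid_cost_per_tf / seti_cost_per_tf

print(f"SETI@home: {seti_cost_per_tf:,.0f} per TeraFlop/sec")
print(f"TeraGrid:  {teragrid_cost_per_tf:,.0f} per TeraFlop/sec (at least)")
print(f"Ratio:     about {ratio:.0f}x")
```

The comparison is only indicative: donated desktop cycles suit loosely coupled work, so the two systems do not serve the same applications.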
14Domain-specific VOs (CMS)
1800 Physicists, 150 Institutes, 32 Countries
100 PB of data by 2010; 50,000 CPUs
15Domain-specific VOs (CMS)
16Domain-Specific Grids (Grid3)
- Grid2003: An Operational Grid (Oct 2003)
- 28 sites (3K CPUs)
- 7 VOs (each for a physics application)
17Domain-Specific Grids (Grid3)
18The Big Question
- At some level, all these applications share common needs
- find resources
- acquire resources
- locate and move data
- start/monitor computation
- all securely and conveniently
- Can a single software infrastructure support all of the above?
19The Globus Alliance
- http://www.globus.org
- Development of Grid protocols and services
- Protocol-mediated access to remote resources
- On the Grid, speak "Intergrid" protocols
- Mostly (extensions to) existing protocols
- Development of Grid APIs and SDKs
- Interfaces to Grid protocols and services
- Facilitate application development by supplying higher-level abstractions
20Globus Toolkit (GTK)
- A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
- Offers a modular set of orthogonal services
- Implements standard Grid protocols and APIs
- Available under a liberal open source license
- Large community of developers and users
- Commercial support
21GTK Services
- GTK services span four main areas
- Security
- Resource Management
- Data Management
- Information Services
- Version 2.4 released in 2003
- Garnered a large scientific user community and became the de facto standard
22Globus Security
- Usual concepts
- Authentication: establishing identity
- Authorization: establishing rights
- Key Features
- easy to use
- single sign-on
- delegation
- mutual user-resource authentication
- integration with local systems (Kerberos, AFS, ...)
- Can be called directly by developers
- Is integrated as part of most Globus SDKs
- Typically (mostly) invisible to the user
23Create Processes at A and B that Communicate and Access Files at C
- Diagram labels: Site A (Kerberos) and Site B (Unix), each with a Computer and a Globus resource manager; Site C (Kerberos) with a Storage system and a Globus FTP server
24Globus Security
- Globus Security Infrastructure (GSI)
- Extensions to standard protocols and APIs
- Standards: SSL/TLS, X.509 and CA, GSS-API
- Extensions for single sign-on and delegation
- Uses well-known PKI technology
- A private key is used to encrypt data.
- A public key can decrypt data encrypted with the private key.
- All in an X.509 certificate
- Someone's subject name (user ID)
- Their public key
- A signature from a Certificate Authority (CA) that
- Certificates for users and resources
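The relationship between keys, signatures, and certificates can be illustrated with textbook RSA. This is a toy Python sketch, not GSI and not a real X.509 implementation: the key sizes are far too small to be secure, the certificate is a plain dictionary, and every name in it (`make_toy_keypair`, `issue_certificate`, the subject strings) is hypothetical.

```python
import hashlib


def make_toy_keypair(p, q, e=65537):
    """Build a textbook RSA key pair from two primes (toy sizes, NOT secure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse; needs Python 3.8 or later
    return (e, n), (d, n)  # (public key, private key)


def _digest(data, n):
    """Hash the data and reduce it so it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, private_key):
    """'Encrypt' the digest with the private key."""
    d, n = private_key
    return pow(_digest(data, n), d, n)


def verify(data, signature, public_key):
    """'Decrypt' the signature with the public key and compare digests."""
    e, n = public_key
    return pow(signature, e, n) == _digest(data, n)


def issue_certificate(subject, subject_public_key, ca_private_key):
    """Bind a subject name to a public key with a CA signature."""
    body = f"{subject}|{subject_public_key}".encode()
    return {
        "subject": subject,
        "public_key": subject_public_key,
        "ca_signature": sign(body, ca_private_key),
    }


def check_certificate(cert, ca_public_key):
    """Check that the CA really signed this subject/key binding."""
    body = f"{cert['subject']}|{cert['public_key']}".encode()
    return verify(body, cert["ca_signature"], ca_public_key)


# Mersenne primes keep the example short; real keys are much larger.
ca_public, ca_private = make_toy_keypair(2**31 - 1, 2**61 - 1)
user_public, user_private = make_toy_keypair(2**17 - 1, 2**19 - 1)

cert = issue_certificate("/O=Grid/CN=Alice", user_public, ca_private)
forged = dict(cert, subject="/O=Grid/CN=Mallory")

print(check_certificate(cert, ca_public))
print(check_certificate(forged, ca_public))
```

Anyone holding the CA's public key can check the certificate without contacting the CA, which is what makes this scheme usable across sites.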
25Globus Resource Mngmt
- Goal: allow users and applications to find and utilize grid resources
- Requirements
- Allows for programs to be started on grid resources
- must provide an interface to local mechanisms (fork, PBS, SGE, Condor, etc.)
- Allows for resources to be described in some sort of language
- Allows for reservation, co-allocation, etc.
- Note that there are many hard policy issues here, which are not addressed by Globus SDKs
26Globus Resource Mngmt
- GRAM: Globus Resource Allocation Manager
- implemented as part of a gatekeeper daemon that sits on top of the local resource manager
- receives requests and starts local processes
- uses HTTP, integrated with GSI
- RSL: Resource Specification Language
- Specifies resource requirements
- Specifies job configuration
- Uses LDAP-like syntax
- Support for reservation and co-allocation (GARA)
- A few other things
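To give a feel for what an RSL job description looks like, here is a minimal Python sketch that renders a dictionary as an RSL-style conjunction of `(attribute=value)` clauses. The helper `to_rsl` is hypothetical and handles only the simplest case; the attribute names used are meant as illustrations, so check the RSL documentation for the exact attributes a given GRAM installation accepts.

```python
def to_rsl(attributes):
    """Render a dict as an RSL-style conjunction: &(name=value)(name=value)..."""
    clauses = []
    for name, value in attributes.items():
        text = str(value)
        if " " in text:
            # Quote values containing spaces so they stay a single token.
            text = '"' + text + '"'
        clauses.append(f"({name}={text})")
    return "&" + "".join(clauses)


# A job that runs one program on four processors.
job = {"executable": "/bin/hostname", "count": 4}
print(to_rsl(job))
```

A string of this kind is what a client hands to the gatekeeper, which then translates it for the local resource manager.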
27Globus Data Mngmt
- Goal: provide all functionality needed for supporting a community of users that use/produce large data collections
- Often termed "DataGrid"
- convenient terminology
- not a separate infrastructure
- Requirements
- Efficient protocol for large data transfers
- Ways to replicate data and locate replicas
28Globus Data Mngmt
- Data transfers with GridFTP
- Extension to the well-supported FTP protocol
- Integrated with GSI
- Third-party transfers
- Partial file access
- Striping and interface to MPI I/O
- Parallel transfers
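Partial file access is what makes parallel transfers possible: each stream fetches a different byte range of the same file. The Python sketch below only shows how such ranges might be computed; it does not use GridFTP itself, and `split_ranges` is a hypothetical helper.

```python
def split_ranges(file_size, streams):
    """Split [0, file_size) into contiguous (offset, length) chunks, one per stream."""
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


print(split_ranges(10, 4))
```

Each (offset, length) pair would be requested over its own connection, and the receiver writes each chunk at its offset.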
29Globus Data Management
- Metadata catalog: describes data
- Replica catalog: file locations (LDAP)
- Diagram labels: Application, Metadata Catalog, Replica Catalog, replica selection among several replicas
- Globus provides APIs
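Replica selection can be sketched as a lookup followed by a choice. In this Python sketch the replica catalog is a plain dictionary rather than an LDAP directory, the host names and the logical file name are made up, and the selection criterion (highest estimated bandwidth) is just one assumption among many possible policies.

```python
# Hypothetical catalog: logical file name -> known physical copies.
replica_catalog = {
    "lfn://experiment/run42.dat": [
        {"url": "gsiftp://site-a.example.org/data/run42.dat", "bandwidth_mbps": 100.0},
        {"url": "gsiftp://site-b.example.org/data/run42.dat", "bandwidth_mbps": 450.0},
        {"url": "gsiftp://site-c.example.org/data/run42.dat", "bandwidth_mbps": 80.0},
    ],
}


def select_replica(catalog, logical_name):
    """Return the replica with the highest estimated bandwidth."""
    replicas = catalog.get(logical_name, [])
    if not replicas:
        raise LookupError(f"no replica registered for {logical_name}")
    return max(replicas, key=lambda replica: replica["bandwidth_mbps"])


best = select_replica(replica_catalog, "lfn://experiment/run42.dat")
print(best["url"])
```

In practice the bandwidth estimates would come from an information service and would be somewhat out of date by the time they are used.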
30Globus Information Services
- Goal: allow decision making
- Need information to answer
- What resources are available?
- What is their state?
- Which ones should I use?
- Challenges
- Information is always old
- Distributed state is never coherent
- Scalability over tens of thousands of resources
31Globus Information Services
- Globus provides two components
- Resource description services
- Supplies information about a resource
- Resource index services
- Supplies information which was gathered from resource description services
- Provides naming and indexing for fast retrieval
- Uses LDAP
- With two protocols
- Grid resource registration protocol
- Grid resource enquiry protocol
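The split between registration and enquiry, and the fact that information is always old, can be shown with a small in-memory index. This Python sketch does not use LDAP or the Globus protocols; `ResourceIndex` and its methods are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class ResourceIndex:
    """Toy index service: resources register descriptions, clients enquire."""

    def __init__(self):
        self._entries = {}  # name -> (description, time of registration)

    def register(self, name, description, now):
        """Registration: a resource pushes its current description."""
        self._entries[name] = (dict(description), now)

    def enquire(self, now, max_age):
        """Enquiry: return descriptions no older than max_age, tagged with their age."""
        fresh = {}
        for name, (description, registered_at) in self._entries.items():
            age = now - registered_at
            if age <= max_age:
                fresh[name] = dict(description, age_seconds=age)
        return fresh


index = ResourceIndex()
index.register("cluster-a", {"free_cpus": 12}, now=0)
index.register("cluster-b", {"free_cpus": 64}, now=50)

# At time 60, only entries at most 30 seconds old are trusted.
print(index.enquire(now=60, max_age=30))
```

Reporting the age with every answer lets the client decide how much to trust it, since the state may have changed since registration.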
32GT2: Why is it good?
- Good technical solutions for key problems
- Has enabled the first generation of production grid systems and applications
- Has provided reference implementations that interface to many systems
- Has garnered industrial support
- Has created a community of developers who
- build on top of Globus services
- add to Globus services
33GT2: Why is it bad?
- Protocol deficiencies, e.g.
- Heterogeneous basis: HTTP, LDAP, FTP, custom
- No standard means of invocation, notification, error propagation, authorization, termination
- Significant missing functionality
- e.g., interfaces to classes of resources
- databases, instruments, etc.
- requires the development of specialized interfaces, as was done for, e.g., batch systems
- Little work on total system properties, e.g.
- Dependability, end-to-end QoS
- Reasoning is made difficult by protocol/implementation heterogeneity
34Evolution of Business
- We see something that happened to programming languages
- FORTRAN for science
- COBOL for business
- But scientists want to manipulate records
- And businesses want to do forecasting
- Today, modern language designers do not make a distinction between scientific languages and business languages.
- Distributed computing done by scientists increasingly resembles distributed computing done by businesses.
35Business Grid Computing
- Walmart
- 423 TBytes
- data from 1,387 discount stores, 1,615 Supercenters, 542 Sam's Clubs, and 75 Neighborhood Markets in the United States, plus 1,520 more stores worldwide.
- Real time computing action
- Amazon.com
- Processes several GBytes of data per second
- Linux clusters
- eBay
- 2 data centers, 5 planned
- Google
- Pixar, Dreamworks
36Business Grid Computing
- Grid Computing useful for businesses
- Other intriguing applications
- On-line gaming
- File-sharing applications
- Question: could many non-scientific applications require the same software infrastructure as scientific applications?
- Should we sell GTK to industry???
37Web Services!!
- Increasingly popular standards-based framework for accessing network applications
- W3C standardization: Microsoft, IBM, Sun, others
- WSDL: Web Services Description Language
- Interface Definition Language for Web services
- SOAP: Simple Object Access Protocol
- XML-based RPC protocol; common WSDL target
- WS-Inspection
- Conventions for locating service descriptions
- UDDI: Universal Description, Discovery, and Integration
- Directory for Web services
- Clearly provides a lot of the things we need to achieve the goals of Grid computing and to satisfy the technology requirements
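To show what "XML-based RPC" means in practice, the Python sketch below builds a SOAP 1.1 request envelope with the standard library. The operation name `submitJob`, its parameter, and the service namespace `urn:example:jobs` are invented for illustration; a real service would define them in its WSDL.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, params, service_ns):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


xml_text = build_soap_request(
    "submitJob", {"executable": "/bin/hostname"}, "urn:example:jobs"
)
print(xml_text)
```

The resulting text would typically be sent as the body of an HTTP POST to the service endpoint, and the reply comes back in a similar envelope.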
38Four Fundamental Concepts
- Naming and bindings
- Ways to reference a service
- Information model
- Ways to find out information about a service
- Lifecycle
- Ways to create and destroy services
- Done by factories
- Services have time-to-live
- Notification
- Ways to be notified when something happens
- With these simple concepts, it is possible to re-implement all the functionality of GTK2
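The lifecycle concept (factories that create services, and services that disappear unless their time-to-live is extended) can be sketched in a few lines of Python. This is an illustration of the idea only, not the OGSA interfaces; `Factory`, `Service`, and their methods are hypothetical, and the lifetimes 10 and 1000 are borrowed from the bioinformatics example in these slides.

```python
class Service:
    """A service instance with a time-to-live."""

    def __init__(self, name, created_at, lifetime):
        self.name = name
        self.expires_at = created_at + lifetime

    def keepalive(self, now, extension):
        """Push the expiry time out to at least now + extension."""
        self.expires_at = max(self.expires_at, now + extension)

    def is_alive(self, now):
        return now < self.expires_at


class Factory:
    """Creates service instances and destroys the ones that have expired."""

    def __init__(self, kind):
        self.kind = kind
        self.instances = []

    def create(self, now, lifetime):
        service = Service(f"{self.kind}-{len(self.instances) + 1}", now, lifetime)
        self.instances.append(service)
        return service

    def reap(self, now):
        """Remove expired instances and return their names."""
        expired = [s.name for s in self.instances if not s.is_alive(now)]
        self.instances = [s for s in self.instances if s.is_alive(now)]
        return expired


mining_factory = Factory("miner")
database_factory = Factory("database")

miner = mining_factory.create(now=0, lifetime=10)
database = database_factory.create(now=0, lifetime=1000)

miner.keepalive(now=8, extension=10)   # the miner now expires at time 18
print(mining_factory.reap(now=20))     # no further keepalive arrived
print(database_factory.reap(now=20))
```

Because destruction is the default, a client that crashes never leaves services behind forever; it simply stops sending keepalives.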
39Data Mining for Bioinformatics
- Diagram (credit: Ian Foster), animated over slides 39 to 48
- Diagram labels: User Application, Community Registry, Mining Factory, Database Factory, Database Services (BioDB 1 ... BioDB n), Compute Service Provider, Storage Service Provider
- User: "I want to create a personal database containing data on e.coli metabolism"
40Data Mining for Bioinformatics
- Request: "Find me a data mining service, and somewhere to store data"
41Data Mining for Bioinformatics
- Reply: handles for the Mining and Database factories
42Data Mining for Bioinformatics
- "Create a data mining service with initial lifetime 10"
- "Create a database with initial lifetime 1000"
43Data Mining for Bioinformatics
- A Miner and a Database now appear in the diagram
44Data Mining for Bioinformatics
- Query messages are sent (two Query arrows)
45Data Mining for Bioinformatics
- Queries continue
- Keepalive messages for both the Miner and the Database
46Data Mining for Bioinformatics
- Results are returned (two Results arrows)
- Keepalive messages for both the Miner and the Database
47Data Mining for Bioinformatics
- Keepalive messages continue for the Database only
48Data Mining for Bioinformatics
- The Miner is gone; the Database remains and is kept alive
49GT3 OGSA Globus
- GT3 Core
- Implements Grid service interfaces and behaviors
- Reference implementation of evolving standard
- Java first, C soon, C?
- GT3 Base Services
- Evolution of current Globus Toolkit capabilities
- Backward compatible
- Many other Grid services on top
- Not too often that Academia really follows Industry :)
- Diagram layers: Other Grid Services, GT3 Data Services, GT3 Base Services, GT3 Core
50WSRF
- The WS community critiqued OGSI
- Too much stuff in one specification
- Does not work well with current WS and XML tools
- WSDL 2.0 incompatible with OGSI extensions of WSDL 1.0
- Web Service Resource Framework
- Re-factoring of OGSI to exploit new developments in WS technology
- Implemented in GTK 4.x
51Evolution
52Production Grids
- Let's look at one production Grid
- TeraGrid
- Material from the "State of the TeraGrid" presentation by Charlie Catlett
- Other material from www.teragrid.org
53TeraGrid
55Roaming Usage?
- Roaming Usage
- Users request a grid resource
- They may end up running anywhere
- Specific Usage
- Users request a particular resource
- Roaming is only a small portion of the workload
- Means that users don't really buy into the grid idea
- Probably due to the fact that contention for resources isn't super high right now
- But still close to 100% utilization
- It takes time for users to truly trust this grid thing
59Tons of Hardware Resources
- http://www.teragrid.org/userinfo/hardware/specs.php
- Indiana
- 96-node IA32 cluster
- NCSA
- 2x512-proc SGI shared memory machine
- 632-node IA64
- 1.2K-node Condor pool
- 1280-node Xeon cluster
- PSC
- 2090-node Cray XT3
- 765-node Alphaserver cluster
- Purdue
- 2.6K-node Condor pool
- etc.
- Storage, Visualization, Common Software Stack
60Grid Applications
- Applications that run over multiple resources are not large MPI applications
- Network latency would be prohibitive
- Heterogeneity is sort of annoying
- Co-scheduling may be difficult
- Many current Grid applications fit into the "hybrid parallelism" category
- See Scheduling lecture
- Note that many of these Grids just support many applications which in themselves are not really Grid applications
- e.g., users just want to use a cluster
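A back-of-the-envelope model shows why network latency rules out tightly coupled MPI applications across sites. All numbers in this Python sketch are assumptions chosen for illustration (10 ms of computation and 100 sequential messages per iteration, 10 microseconds of latency inside a cluster, 50 ms between sites); they are not measurements.

```python
def iteration_time(compute_s, messages, latency_s):
    """Time for one iteration: computation plus one latency per sequential message."""
    return compute_s + messages * latency_s


# Assumed workload: 10 ms of computation and 100 sequential messages per iteration.
COMPUTE_S = 0.010
MESSAGES = 100

# Assumed one-way latencies.
CLUSTER_LATENCY_S = 10e-6   # inside one cluster
WIDE_AREA_LATENCY_S = 50e-3  # between distant sites

cluster_time = iteration_time(COMPUTE_S, MESSAGES, CLUSTER_LATENCY_S)
wide_area_time = iteration_time(COMPUTE_S, MESSAGES, WIDE_AREA_LATENCY_S)
slowdown = wide_area_time / cluster_time

print(f"cluster:   {cluster_time:.3f} s per iteration")
print(f"wide area: {wide_area_time:.3f} s per iteration")
print(f"slowdown:  {slowdown:.0f}x")
```

Under these assumptions almost all of the wide-area iteration is spent waiting on the network, which is why applications with little communication between sites are the ones that work well on grids.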
61Conclusion
- Grid Computing is not only a buzzword
- It's real and grids are in place
- But a lot of Grid solutions are just old products rehashed
- Affix a "Grid Inside" sticker
- Many different notions of grids
- The current state of the software infrastructure is an OK engineering solution, which could be vastly improved
- But it's usable and used
- Not clear where industry/academia will drive it
- The new thing is Cloud Computing
- Virtual Machine Technology, leasing of resources
- In the next lecture we'll talk about systems issues for grid computing
- These are independent of the technology trend of the moment