Title: Dr Simon See
1Asia Pacific Science and Technology Center, March 2003
Dr Simon See
Director, HPTC Asia Pacific, Asia Pacific Science and Technology Center
Associate Professor, Nanyang Technological Univ.
http://apstc.sun.com.sg
2Challenges
Driving low cost high performance computing in
Academic and Research computing
- Cost of computing
- Budgets/Grants
- Resource sharing and pooling
- Utilization of idle resources
- Not enough throughput: more and faster compute resources required
- More thorough model verification
- Growing datasets: larger, more complex designs
- 'high resolution' data analysis
- Apps being developed for small systems and memory to keep costs down
3The Dollar Challenge
Getting the most research out of the dollar
Problem
Getting the most research for the dollar. Need
for flexible, scalable, inexpensive systems.
Solution
Virtualization of resources on a Grid, with easy
access to data, computing, visualization, and
collaboration services
Benefits
Allows for low startup costs to a highly flexible
and scalable architecture.
4The Grid Concept
- A Grid is a hardware/software/instruments/services
infrastructure that provides dependable,
consistent, inexpensive access to computational
resources.
Analogies: electrical power, water, telephony
5Cluster/Grid Computing Covered
- Compute intensive HPTC applications
- Requires a set resource, runs for a period of time and exits
- Typically non-interactive applications
- Throughput applications
- Iterative processes such as circuit verification
or fluid dynamics
6Ian Foster's Definitions
- In 1998, the computational grid is a hardware/software infrastructure
- Dependable, consistent, pervasive, inexpensive
- Provides access to high-end computational capabilities
- In 2000, the computational grid allows for coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations
- Three key criteria
- Coordinates resources not subject to centralized control
- Uses standard, open, general-purpose protocols and interfaces
- Delivers non-trivial qualities of service
Foster, Tuecke 2000
7Gartner Research Definition
- Gartner defines a grid as a collection of resources owned by multiple organizations, coordinated in such a way as to allow them to solve a single common problem
- Three forms of grids
- Computing grid: harnesses multiple computers from several owners to run one very large application
- Data grid: uses multiple storage systems from several owners, dividing data across the combined resources to host one very large data set
- Collaboration grid: ties together multiple collaborative systems from several owners to allow collaboration on a common issue
8IDC's Definition
- Set of independent computers combined into a unified system through systems software and networking technologies
- Grids share resources among independent owners
- Resources can be added or removed at any time; configuration is not fixed and can change
- Connected using industry-standard technology (networking, I/O, web interfaces)
9IBM's Definition
- Technical definition is the ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity, and a vast array of other computing resources over the Internet
- Grid computing is a network of computation tools and protocols for coordinated resource sharing and problem solving among pooled assets
- Application processing, distributed across multiple locations, and interconnected through a shared network such as the Internet
10Sun Microsystems Definition
Problem-solving through resource pooling in virtual systems
Virtualization of: Resources into a dynamic, single compute resource from federated assets
Transparent scalability of: CPU cycles, storage
Access that is: Dependable, consistent, pervasive, inexpensive
11Many Definitions of Grid Computing
- According to Ian Foster, the father of grid computing, the term grid has been hijacked to embrace everything from advanced networking to artificial intelligence
- Marketers are applying grid labels to all sorts of products and services, adding to the confusion and
12Definitions
Historical Perspective
Grid Computing Environments
[Diagram labels: Hardware, Software, Process, Instruments, Services; Compute Farm, Grid, Cluster]
A Grid is a hardware/software/instruments/service
infrastructure that provides dependable,
consistent, pervasive, inexpensive access to
computational and data capabilities. Foster,
Kesselman, 1999
13Definitions
Historical Perspective
Grid Computing Environments
- Distributed Resource Management
- SGE
- LSF
- NQS
- DJM
- Parallel Computing Environments
- PVM
- MPL
- MPI
- FORTRAN 90
- Linda
14Grid Computing Genealogy
Historical Perspective
[Diagram labels: Load Balancer, Platform Globus, Utopia, LSF]
- Early Grid Technologies
- Distributed Job Manager (DJM)
- Network Queuing System (NQS)
- University Research projects
- Mature Commercial Products
- Sun Grid Engine (Sun, formerly Codine/GRD)
- Load Sharing Facility (Platform Computing)
- LoadLeveler (IBM)
- Industry Collaboration / Leadership
- Globus
- Global Grid Forum
- Platform Globus
[Genealogy diagram nodes: DJM, NQS/Exec, CONNECT Queue, NQS, NQE, PBS, Task Broker, NC Toolset, Condor, UniJES, LoadLeveler, Codine, SGE, GRD, SGE/EE, DNQS, DQS, Avaki, SQS, Globus, GGF]
15Class of Grid Computing
Cluster Grid: Departmental Computing
- Simplest Grid deployment
- Maximum utilization of departmental resources
- Resources allocated based on priorities
Campus Grid: Enterprise Computing
- Resources shared within the enterprise
- Policies ensure computing on demand
- Gives multiple groups seamless access to enterprise resources
Global Grid: Internet Computing
- Resources shared over the Internet
- Global view of distributed datasets
- Growth path for enterprise Campus Grids
16Cluster Grid
Departmental Computing
Usage: Simplest Grid deployment. Single team, project, or department. Single site (firewall).
Benefit: Optimal alignment of resources, tasks, and budgets
Industry Examples:
- Automotive: More simulations for safer cars
- Entertainment: Faster image-frame rendering
- Life Sciences: Pattern matching against huge datasets
- EDA: Increased design iterations create more powerful devices
17Campus Grid
Enterprise Computing
Usage: Multiple teams in an organization share one or more Cluster Grids. Single site to enterprise-wide.
Benefit: Maximum ROI and utility
Industry Examples:
- Manufacturing: Collaborative engineering projects
- Oil and Gas: Mining distributed databases
- Finance: More Monte Carlo simulations for uncovering new business
18Global Grid
Internet Computing
Usage: Linked Cluster and Campus Grid models across many organizations. Typically used for research.
Benefit: Creates a large virtual system. Facilitates collaboration between organizations.
Industry Examples:
- Medicine: Provides expert teams access to medical instruments and distributed computing resources
- Academia: Facilitates collaboration between geographically dispersed groups
- Research: Enables compute-intensive projects beyond the firewall
N1
19- Enterprise Grids -
- Commercial implementation of production grids within major corporations that have a global presence and a great need for resource access.
- Virtual collaboration and resource sharing take place behind a corporate firewall; this model is simple and possible to implement in today's environment.
20- Collaboration Grids -
- Covers project collaboration between organizations in similar industries or interest areas to reach common objectives.
- e.g. TeraGrid, White Rose, SETI@home
21- Service Grids -
- The third phase will be driven by ordinary users demanding grid services as a utility.
- Involves sale or lease of computer resources, including bandwidth, applications, and storage, over the Internet on a per-user or as-needed basis.
- Will emerge only after the grid gains recognition as a reliable and secure resource based on widely accepted standards and protocols.
22Application Requirements
Requirements and Policies
- Basic Information
- Primary stakeholders
- Operating Systems and support mechanisms
- Deployment model
- I/O and data requirements
- Execution Profile
- Dynamic memory allocation, footprint
- Performance characteristics, reference benchmarks
- Network I/O
- Job duration characteristics
- Utilization trends
- Application Characteristics
- Dependencies
- Priorities vs. other applications
- Data Management post execution
- Checkpointing capable
23Architectural Requirements
Requirements and Policies
- Availability
- Downtime impact of Grid environment
- Impact of interruption of individual jobs
- Acceptable maintenance windows
- Scalability
- Anticipated growth over 1/3/5 years
- Desired scaling strategy and response to peak loads
- Strategy for technology refresh, evolution
- Manageability
- Skill set/workload of administration personnel
- Expected stability of applications
- Code management, software distribution mechanisms
- Security
- User authentication mechanisms
- Internet access to grid
- Data Security requirements
24Additional Requirements
Requirements and Policies
- Data Distribution
- Location, volume, refresh, and security of data
- Usability
- Skill set and client environment can impact usability of environment
- Psychological factors important
- Operations Management
- Limited staff to manage hundreds or thousands of CPUs
- Resources added in large blocks
- Change control is an extremely critical operation
25Tool
26Grid Tools
- Architectural Tools
- Monitoring, accounting
- Developer tools
- Code development env, debugger
- Tool kit
- Deployment tools
- User tools
- Portal
- Grid GUI
27GriDE Overview (2)
- Q: What is GriDE?
- A: GriDE is an integrated development environment that makes it straightforward for scientists and engineers to construct grid applications. It provides friendly tools to access grid resources and makes the development approach easy and fast.
28GriDE Overview (3)
- Proposed Components
- Workflow Editor
- Cross Compiler
- Grid Debugger
- Performance Tuning
- Data Grid Access
- Project Collaboration
29GriDE Overview (4)
30GriDE Overview (5)
- Implementation
- Extends NetBeans IDE
- JDK 1.4
- Globus Toolkit 2.x
- Java Cog Kit 1.1
- JGraph, Castor XML Framework
- Runs on Solaris, Linux, and Windows platforms
31Grid Job Submission (1)
- Three ways to submit jobs to Globus gatekeeper
- Quick submission
- Multiple job submission
- Manual submission
32Grid Job Submission (2)
33Grid Job Submission (3)
34Grid Resource Browsing (2)
35Grid Job Monitoring (1)
- Display job status (stage in, running, stage out, waiting, etc.)
- Use the Global Access to Secondary Storage (GASS) service to fetch results
- The job history is tree-view structured
36Grid Job Monitoring (2)
38Grid Workflow (1)
- Connect different grid jobs to create more complex computation
- Manage complexity of workflow
- Task dependencies: output of one task used as input of other tasks
- Can be directly represented as a directed graph
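The directed-graph view of a workflow can be sketched with Python's standard `graphlib` module. The task names in the usage note are hypothetical, and this is a minimal illustration of dependency ordering rather than the GriDE workflow engine itself.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(deps):
    """Return tasks in an order that respects every dependency.

    deps maps each task to the set of tasks whose output it consumes.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # A workflow with a cycle can never be scheduled.
        raise ValueError(f"workflow is not acyclic: {exc.args[1]}") from exc


def ready_batches(deps):
    """Group tasks into batches whose members could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # all tasks whose inputs are available
        batches.append(batch)
        sorter.done(*batch)  # mark the batch finished so successors become ready
    return batches
```

For a workflow such as `{"preprocess": set(), "simulate_a": {"preprocess"}, "simulate_b": {"preprocess"}, "merge": {"simulate_a", "simulate_b"}}`, `ready_batches` groups the two simulations into one batch, since both depend only on `preprocess`.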
39Grid Workflow Editor (1)
- Allows construction of workflow for single, multiple and array jobs
- Generation of RSL for submission to Globus Toolkit 2.x
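As a rough idea of what generated RSL might look like, the sketch below builds Globus Toolkit 2.x style request strings from job parameters. The helper names are hypothetical and the quoting rule (doubling embedded quotes) is a simplifying assumption; a real generator would follow the full RSL specification, and multi-requests normally also carry a resource manager contact per sub-request.

```python
def _quote(value):
    # Assumption: values are double-quoted and embedded quotes are doubled.
    return '"' + str(value).replace('"', '""') + '"'


def _relation(name, value):
    """Render one (attribute=value) relation; sequences become space-separated values."""
    values = value if isinstance(value, (list, tuple)) else [value]
    return f"({name}={' '.join(_quote(v) for v in values)})"


def single_job_rsl(executable, arguments=(), count=1, stdout=None):
    """Build a conjunction request ('&') for one job; count > 1 gives an array-style job."""
    parts = [_relation("executable", executable)]
    if arguments:
        parts.append(_relation("arguments", list(arguments)))
    parts.append(f"(count={int(count)})")
    if stdout:
        parts.append(_relation("stdout", stdout))
    return "&" + "".join(parts)


def multi_job_rsl(job_requests):
    """Combine several single-job requests into one multi-request ('+')."""
    return "+" + "".join(f"({request})" for request in job_requests)
```

For example, `single_job_rsl("/bin/echo", ["hello", "grid"], count=2, stdout="out.txt")` yields `&(executable="/bin/echo")(arguments="hello" "grid")(count=2)(stdout="out.txt")`.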
40Grid Workflow Editor (2)
41Workflow Engine - Architecture Overview
42Main goals
- To deliver one uniform GUI that supports different computational and storage infrastructures
- To retrieve the user's entire working environment from any grid access point (support for a mobile user)
- To constitute a bridge between the end user and the grid
- To hide the complexity of retrieving all necessary applications and data
- To be platform-independent
- To make it easy to add new functionality (based on web services)
43What does it look like?
44Functionality overview: Single sign-on technology
X.509 certificates that are signed by a CA for each virtual organisation.
45Functionality overview: User settings/profile management
46Functionality overview: Job submission/monitoring/visualisation
47Functionality overview: Data Management
Supported protocols: FTP, GridFTP, HTTP
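One way a data-management layer could route a transfer to the right protocol handler is by URL scheme. The sketch below is an assumption about how such dispatch might work, not the portal's actual code; `gsiftp` is the URL scheme commonly used for GridFTP, and the function name and return shape are hypothetical.

```python
from urllib.parse import urlparse

# Map URL schemes to the protocols listed on the slide.
SUPPORTED_PROTOCOLS = {"ftp": "FTP", "gsiftp": "GridFTP", "http": "HTTP"}


def classify_transfer(url):
    """Return (protocol, host, path) for a supported data URL, else raise ValueError."""
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    if scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {scheme or 'none'}")
    return SUPPORTED_PROTOCOLS[scheme], parsed.hostname, parsed.path
```

A caller would then hand the host and path to whichever client library implements the returned protocol.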
48Grid Organizations
49Global Grid Forum (GGF)
- http://www.gridforum.org/
- Grid community and standards organization
- Sponsors include Sun, IBM, HP, SGI, and Cisco
- 5000 members are researchers, vendors, and practitioners of grid computing
- Mission is to oversee the development of industry standards for grid computing
- 40 working groups produce technical specifications and implementation guidelines
- OGSA-WG
- OGSI-WG
50Globus
- http://www.globus.org/
- The Globus Project is a multi-institutional grid research and development organization that began in 1996.
- Collaborates with real grid projects in science and industry.
- Develops the Globus Toolkit, which is an open source software base used for building grid infrastructures and applications.
- Develops and promotes standard grid protocols that enable interoperability and shared infrastructure.
- Develops and promotes standard grid software APIs that enable portability and code sharing.
- Co-founded the Global Grid Forum (GGF), which fosters grid standardization and community
51Open Grid Services Architecture - OGSA
- http://www.globus.org/ogsa/
- Open Grid Services Architecture
- Proposed architectural standard for grid services; defines what they are, what they are capable of, and what technologies they are based on (services must be OGSI-compliant)
- Modernizes and extends Globus Toolkit protocols to recast grid concepts within a service-oriented framework based on web services technologies
- Defines programmatic interfaces, management interfaces, naming conventions, and directories for convergence of grid computing and web services
52Open Grid Services Infrastructure (OGSI)
- Open Grid Services Infrastructure
- Companion implementation specification to OGSA
- Defines how entities can create, discover, and interact with a grid service
- Defines fundamental interfaces (using WSDL) and behaviors for distributed systems management
- Relies on grid technology and web service technologies (SOAP, WSDL, UDDI)
- Global Grid Forum OGSI working group, OGSI-WG
53DRMAA: Distributed Resource Management Application API
- http://www.gridforum.org/3_SRM/drmaa.htm
- Distributed Resource Management Application API
- Provides write-once capabilities to any DRM system that supports DRMAA
- Co-chaired by Sun and Intel and developed in collaboration with Cadence Design Systems, HP, IBM, Platform Computing, Robarts Research Institute, and Veridian Systems
- Provides for submission, control, and monitoring of jobs to one or more DRM systems
- Sun plans on creating a reference implementation of the DRMAA specification
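The "write-once" idea can be illustrated with a small abstract interface that application code targets, with one adapter per DRM system behind it. Everything below is hypothetical and is not the DRMAA API or any of its language bindings: the class names, method names, action names, and the in-memory backend exist only to show the shape of submission, control, and monitoring behind a common interface.

```python
from abc import ABC, abstractmethod
import itertools


class ResourceManager(ABC):
    """Hypothetical common interface; a real adapter would wrap SGE, LSF, PBS, etc."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def submit(self, command, args=()):
        """Queue a job and return its id."""

    @abstractmethod
    def control(self, job_id, action):
        """Apply 'hold', 'release', or 'terminate' to a job."""

    @abstractmethod
    def status(self, job_id):
        """Return the job's current state as a string."""


class InMemoryManager(ResourceManager):
    """Stand-in backend that only tracks state, used here for illustration."""

    _TRANSITIONS = {"hold": "held", "release": "queued", "terminate": "terminated"}

    def __init__(self, name):
        super().__init__(name)
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, command, args=()):
        job_id = f"{self.name}-{next(self._ids)}"
        self._jobs[job_id] = "queued"
        return job_id

    def control(self, job_id, action):
        self._jobs[job_id] = self._TRANSITIONS[action]

    def status(self, job_id):
        return self._jobs[job_id]


def submit_everywhere(managers, command, args=()):
    """Application code written once against the interface, run on any backend."""
    return {manager.name: manager.submit(command, args) for manager in managers}
```

The application-level function never mentions a specific DRM system, which is the property the specification aims for.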
54Enterprise Grid Alliance
- Founding members: EMC, Fujitsu-Siemens, HP, NEC, Network Appliance, Oracle, Sun Microsystems
- Sponsor members: Ascential Software, Optena, Paremus
- Contributors: Cassatt, Brocade, Novell
- Associate: Cisco, Citrix, Enigmatec, Force 10 Networks, TopSpin, Data Synapse
55Grid Exchange Concept
Resource Users
Resource Owners
Enablers to Trade
- Pricing Model
- Catalog
- Billing Support
- Trading Agents
- Accounting Agents
- Utility Model: The Contract
- Broker to fulfill the contract
56- Flexibility
- Resources can be obtained by users when they need them
- Efficiency
- Resource price reflects resource value
- Scalability
- New entities can be added easily
- Feedback
- Prices of resources, value of resources
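To make the exchange idea concrete, the sketch below matches buy-side bids against sell-side offers for CPU hours. The slides do not specify a pricing model, so the matching rule (highest bid meets lowest offer, trade at the midpoint price) and all names and numbers are assumptions chosen purely for illustration.

```python
def match_orders(bids, offers):
    """Match resource users with resource owners.

    bids and offers are lists of (party, price_per_cpu_hour, cpu_hours).
    Returns a list of trades as (buyer, seller, cpu_hours, price).
    """
    # Copy into mutable lists: best (highest) bids first, cheapest offers first.
    bids = sorted(([p, price, qty] for p, price, qty in bids), key=lambda o: -o[1])
    offers = sorted(([p, price, qty] for p, price, qty in offers), key=lambda o: o[1])
    trades = []
    b = s = 0
    # Trade while the best remaining bid still covers the cheapest remaining offer.
    while b < len(bids) and s < len(offers) and bids[b][1] >= offers[s][1]:
        qty = min(bids[b][2], offers[s][2])
        price = (bids[b][1] + offers[s][1]) / 2  # assumed midpoint pricing rule
        trades.append((bids[b][0], offers[s][0], qty, price))
        bids[b][2] -= qty
        offers[s][2] -= qty
        if bids[b][2] == 0:
            b += 1
        if offers[s][2] == 0:
            s += 1
    return trades
```

With bids `[("lab_a", 10, 100), ("lab_b", 6, 50)]` and offers `[("site_x", 4, 80), ("site_y", 8, 100)]`, `lab_a` buys 80 hours from `site_x` at 7.0 and 20 hours from `site_y` at 9.0, while `lab_b` stays unmatched because its bid is below the remaining offer. The resulting trade prices are the kind of feedback signal the slide refers to.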
57Grid as Commodity
- As Resources
- Compute Capacity
- Storage
- Bandwidth
- Applications
- As Services
- Elapsed time
- Reliability
- Availability
- Security
58Grid Exchange Architecture
[Architecture diagram labels: Grid Directory Server; Market Info; Application; Buy Side; WS-Transactions Protocol; Discovery; Job Control; SS Trader; Scheduler; Local Resource Manager; Sell Side Brokers; Resource Reserve; BS Trader; SGE/N-1; Deployment Agent; Allocation; Local Infrastructure; Sell Side; Broker]
59Grid Exchange Architecture
Pricing
Trading
Contracts
Settlement
Administration Console Sun ONE Portal Server
Toolkit/SDK
Grid Exchange Broker
BROKER
Buy Side
Sell Side
Sun ONE (OGSA Enabled) Middleware
Resource Managers (PBS, LSF, SGE, NDQS)
Resources (Servers, Storage, Networks etc.)
60Grid Exchange Architecture
Sun ONE Portal
GxE Broker Services
Sun ONE Grid HLS
OGSA HLS
JXTA P2P
Sun ONE Grid Core Services
OGSA Core Services
GxE Core Services
N-1
Billing Account
N-1 Core Services
Local Resources
61Observation
- The research phase is almost over.
62Observation
- The research phase is almost over.
- Grid is into its development phase
63Observation
- The research phase is almost over.
- Grid is into its development phase
- Have we developed systems for computer scientists
or non-computer scientists?
64Observation
- The research phase is almost over.
- Grid is into its development phase
- Have we developed systems for computer scientists or non-computer scientists?
- How do we make it pervasive?
- Easy to use?
- Driven by economic needs?
65Thank You