Title: Grid Services
1Grid Services
- Presented by
- Karan Bhatia
2Hype Curve
3Overview
- Grid Computing Background
- Definition
- Opportunities
- Markets
- Technical Challenges
- Security Infrastructure
- Resource Management
- Service Interoperability
- Summary
4Grid Computing is
- Co-ordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. (Foster, Kesselman, Tuecke)
- Co-ordinated - multiple resources working in concert, e.g. disk and CPU, or instruments and databases, etc.
- Resources - compute cycles, databases, files, application services, instruments.
- Problem solving - focus on solving scientific problems
- Dynamic - environments that are changing in unpredictable ways
- Virtual Organization - resources spanning multiple organizations and administrative domains, security domains, and technical domains
5Grid Computing is (Industry)
- about finding distributed, underutilized compute resources (systems, desktops, storage) and provisioning those resources to users or applications requiring them. (The Grid Report, Clabby Analytics)
- Distributed - all the resources lying around in departments or server rooms.
- Underutilized - typical utilization of big iron is 5 to 10%. Organizations save money by increasing utilization versus purchasing new resources.
- Resources - servers and server cycles, applications, data resources
- Provisioning - predict and schedule resource use depending on load.
6Types of Grids
- Compute Grids
- SETI@home, Entropia, United Devices, Condor
- Data Grids
- Storage Resource Broker (SRB), Avaki, BIRN, GEON
- Collaboration Grids
- Instrumentation (telescience), applications
- Enterprise Grids
- Majority of commercial interest
- Partner Grids
- B2B, Academic/Govt Grids
- Service Grids
- Utility Computing, On Demand, pervasive, autonomic, etc.
7A Grid is
- the next generation Internet,
- all about free cycles à la SETI@home,
- a distributed object system,
- a new programming model,
- a replacement for high performance computing,
8Example TeleScience Grid
9Grid Resources - Networks
10Grid Resources - Compute
11Top 500.org
13Another Grid Example Google
- Queries
- 150 M queries/day (2000/s)
- 100 countries
- 3.3 B documents
- Hardware
- 15,000 Linux systems in 6 data centers
- 15 Tflop/s and 1000 TB total capacity
- 40-80 1U/2U servers/cabinet
- 100 Mb Ethernet switches/cabinet with gigabit uplinks
- Growth from 4000 systems (18 M queries/day)
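As a quick sanity check on the figures above, the sketch below derives the average query rate and rough per-system shares. It assumes load is spread evenly over a 24-hour day and over all 15,000 systems, which the slide does not claim; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the slide's Google figures.
# Assumption: load is spread evenly over the day and over all systems.
QUERIES_PER_DAY = 150e6      # 150 M queries/day
SYSTEMS = 15_000             # Linux systems in 6 data centers
TOTAL_FLOPS = 15e12          # 15 Tflop/s
TOTAL_STORAGE_TB = 1000      # 1000 TB total capacity

SECONDS_PER_DAY = 24 * 60 * 60

avg_queries_per_second = QUERIES_PER_DAY / SECONDS_PER_DAY
queries_per_system_per_day = QUERIES_PER_DAY / SYSTEMS
gflops_per_system = TOTAL_FLOPS / SYSTEMS / 1e9
storage_gb_per_system = TOTAL_STORAGE_TB * 1000 / SYSTEMS

print(f"average rate:       {avg_queries_per_second:.0f} queries/s")
print(f"per system per day: {queries_per_system_per_day:.0f} queries")
print(f"per system compute: {gflops_per_system:.1f} Gflop/s")
print(f"per system storage: {storage_gb_per_system:.1f} GB")
```

The average comes out a little under the slide's rounded 2000/s, which is what one would expect if the quoted rate is rounded up or reflects busier periods.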
14Grid Resources - Data
- SDSC Resources
- HPSS
- SDSC's central long-term data storage system,
- one of the world's largest IBM High Performance Storage System (HPSS) units,
- currently holds more than a petabyte (a million gigabytes) of data in approximately 21 million files,
- It has the capacity to store six petabytes of data; files are added at an average rate of 10,000 gigabytes per month.
- Storage-Area Network (SAN)
- A 72-processor Sun Microsystems SunFire 15K high-end server and 11 Brocade switches (1,400 ports)
- 225,000 gigabytes of networked disk storage for data-oriented applications.
- 1 TB of data 2500
15Protein Data Bank (PDB)
16Putting it all together TeraGrid
17Grid Market
18Grid Companies
- IBM
- on demand solutions
- Sun Microsystems
- N1 initiative
- Oracle
- 10g
- Dell
- HP
- utility computing
- Platform Computing
- LSF, metaclustering
- United Devices
- Desktop grids
- DataSynapse
- Akamai
- Google?
- Sony online entertainment?
- Where's Microsoft?
19Grid Organizations
- Global Grid Forum (GGF)
- Organization for the Advancement of Structured Information Standards (OASIS)
- Distributed Management Task Force (DMTF)
- World Wide Web Consortium (W3C)
- Globus Alliance
- NSF Middleware Initiative (NMI)
- NASA IPG
- DOE Science Grid
- EU DataGrid
- NSF TeraGrid
20Technical Challenges for Grid Computing
21Challenges Security
- Grids traverse organizational boundaries
- Different administration domains have different authentication mechanisms
- Resources have different use agreements and sharing priorities
- Single sign-on
- Multiple passwords difficult to manage
- Rights delegation
- Trust
- Authentication of users
- Authorization of users
- Resource access
22Security
- Public Key Infrastructure
- Public key A.public
- Private key A.private
- Supports Encryption
- Message to B
- m' = F(m, A.private), send m' to B
- recv m', m = F(m', A.public)
- Digital Signatures
- Signed message to B
- m' = (m, F(m, A.private))
- Receiver verifies that m is from A and has not been tampered with
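The signing and verification steps can be sketched with textbook RSA. This is a toy illustration, not a usable implementation: the slide does not say which function F is, so modular exponentiation is an assumption here, the primes 61 and 53 are far too small for real use, and the names `digest`, `sign`, and `verify` are hypothetical.

```python
import hashlib

# Toy textbook-RSA key pair for A. Tiny primes: illustration only.
P, Q = 61, 53
N = P * Q                      # modulus shared by both keys
PHI = (P - 1) * (Q - 1)
E = 17                         # exponent of A.public
D = pow(E, -1, PHI)            # exponent of A.private (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """A signs: apply F with A.private to the message digest."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """B verifies: apply F with A.public and compare with a fresh digest."""
    return pow(signature, E, N) == digest(message)


message = b"run job 42 on cluster X"
signature = sign(message)
print(verify(message, signature))              # genuine signature
print(verify(message, (signature + 1) % N))    # altered signature
```

Only the holder of A.private can produce a signature that checks out under A.public, which is what lets the receiver conclude that the message came from A.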
23Grid Security Infrastructure (GSI)
- A central concept in GSI authentication is the certificate.
- Every user and service on the Grid is identified via a certificate, a text file containing the following information:
- a subject name identifying the person or object that the certificate represents,
- the public key belonging to the subject,
- the identity of a Certificate Authority (CA) that has signed the certificate to certify that the public key and the identity both belong to the subject,
- the digital signature of the named CA.
24Proxy Certificate
- A proxy consists of a new certificate with a new public and private key.
- The new certificate contains the owner's identity, modified slightly to indicate that it is a proxy.
- The new certificate is signed by the owner rather than a CA.
- This is called a self-signed certificate.
- The certificate also includes a time notation after which the proxy should no longer be accepted by others.
- Proxies have limited lifetimes in order to minimize the security vulnerability.
- Because the proxy isn't valid for very long, it doesn't have to be kept quite as secure as the owner's private key.
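A minimal sketch of the proxy idea, under stated assumptions: keys are plain placeholder strings, the actual signature is left out, and the `/CN=proxy` suffix, the class, and both function names are hypothetical rather than the GSI implementation. It only models the three properties listed above: a new key, an owner-derived identity, and a limited lifetime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Certificate:
    subject: str          # who the certificate represents
    public_key: str       # placeholder for real key material
    issuer: str           # who signed it (a CA, or the owner for a proxy)
    not_after: datetime   # time after which it must not be accepted


def make_proxy(owner: Certificate, proxy_public_key: str,
               lifetime: timedelta, now: datetime) -> Certificate:
    """Derive a short-lived proxy: new key, owner-derived subject, owner as issuer."""
    return Certificate(
        subject=owner.subject + "/CN=proxy",   # owner's identity, marked as a proxy
        public_key=proxy_public_key,
        issuer=owner.subject,                  # signed by the owner, not a CA
        not_after=min(now + lifetime, owner.not_after),
    )


def is_acceptable(proxy: Certificate, owner: Certificate, now: datetime) -> bool:
    """Accept only an unexpired proxy that was issued by this owner."""
    return (proxy.issuer == owner.subject
            and proxy.subject.startswith(owner.subject + "/")
            and now <= proxy.not_after)


now = datetime(2004, 1, 1, 12, 0, tzinfo=timezone.utc)
owner = Certificate("/O=Grid/CN=Alice", "alice-key", "/O=Grid/CN=Example CA",
                    now + timedelta(days=365))
proxy = make_proxy(owner, "proxy-key", timedelta(hours=12), now)

print(is_acceptable(proxy, owner, now + timedelta(hours=1)))    # within lifetime
print(is_acceptable(proxy, owner, now + timedelta(hours=13)))   # after expiry
```

Capping `not_after` at the owner's own expiry is a design choice of this sketch: a proxy should not outlive the certificate it was derived from.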
25Mutual Authentication
26Additional Challenges
- Certificate Management
- MyProxy
- Role-based Access Control
- CAS, VOMS
- Authorization services
- Integration with applications Portals
27Challenges Resource Management
- Resources loosely-coupled
- Higher network latencies
- Planned and unplanned disruptions
- How to provide QoS guarantees?
- Case Study Entropia Desktop Grids
- Additional trust/security issues
28Entropia Inc.
- 1997 Scott Kurowski developed GIMPS (Great Internet Mersenne Prime Search)
- First generation network
- Jan 2000 Kurowski and Chien start Entropia Inc.
- FightAIDS@Home with Art Olson, Scripps Research
- Second generation network
- July 2002 DCGrid 5.0 released
- Third generation network
29Entropia 1 GIMPS
- Over 1.5 Billion CPU hours served
- 300,000 machines, over 4 years operational
- Every PC and hardware config imaginable (proc, memory, disk, etc.)
- Every networking hookup imaginable
- Found 35th, 36th, 37th, 38th, and 39th Mersenne Primes
30Entropia 2 FightAIDS@Home
- Sept 2000 launch
- Internet-Based
- 54,657 total machines
- 10,770,506 total hours of computation
- 27,881 peak billions of calculations/sec
31Entropia 3 DCGrid
- Enterprise focus
- Tremendous resources available in enterprise
- Complements other HPC resources
- Computing Platform
- Arbitrary application (open scheduling model)
- Security, unobtrusiveness, manageability guaranteed
- Focus on
- Pharmaceuticals, Chemicals, and Materials
- Financial Services
32DCGrid Architecture
33Commoditization of Hardware
[Figure labels: Vector Processors, PC Grids, Beowulf Clusters]
34Price/Performance
[Chart: Performance (TFLOPS; labels 1.0, 2.0, 4.0, 8.0) versus Cost (labels 600,000; 3,000,000; 15,000,000)]
35Server vs. Desktop Grids
- Server environment
- Fixed IP, always connected
- Always-on operation
- Moderate number of systems (10s to 100s)
- Dedicated use, trusted systems
- Desktop environment
- Dynamic, temporary IP, intermittent connection
- Off evenings, off weekends, off lunch
- Large numbers of systems (100s to 1000s, or more?)
- Shared resources, potentially untrusted users
- These differences give rise to desktop Grid challenges
36Typical PC-Grid Environment
37PC-Grid Challenges
- Provide a stable compute environment for apps
- Isolate app from variable desktop environment
- Operate in environment of dynamic use
- Unobtrusiveness and Fault Tolerance are key!
- Provide simple application integration
- Support ANY Application without modification
- Provide centralized management console
- Zero additional management costs
38Workflow
39Stable Compute Environment
- Entropia Proprietary Sandbox
- Binary-level protection
- System virtualization (registry, file system, network)
- Open Scheduling Infrastructure
- Intelligent scheduling (match resources to subjob requirements)
- Manage subjob redundancy/fault tolerance
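Matching resources to subjob requirements, with redundant copies for fault tolerance, might look roughly like the sketch below. This is not Entropia's scheduler: the `Machine` and `Subjob` fields, the idle flag, and the policy of preferring the smallest adequate machine are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    memory_mb: int
    disk_mb: int
    idle: bool


@dataclass
class Subjob:
    name: str
    memory_mb: int
    disk_mb: int


def eligible(machine: Machine, subjob: Subjob) -> bool:
    """A machine qualifies if it is idle and meets the subjob's requirements."""
    return (machine.idle
            and machine.memory_mb >= subjob.memory_mb
            and machine.disk_mb >= subjob.disk_mb)


def schedule(subjob: Subjob, machines: list[Machine], copies: int = 2) -> list[str]:
    """Pick up to `copies` distinct eligible machines, smallest adequate memory first."""
    candidates = sorted((m for m in machines if eligible(m, subjob)),
                        key=lambda m: m.memory_mb)
    return [m.name for m in candidates[:copies]]


machines = [
    Machine("pc1", 256, 2000, True),
    Machine("pc2", 512, 4000, True),
    Machine("pc3", 1024, 8000, False),   # busy: the primary user is active
    Machine("pc4", 128, 1000, True),     # too small for this subjob
]
subjob = Subjob("dock-017", 200, 1500)
print(schedule(subjob, machines))
```

Running each subjob on more than one machine means a desktop that is switched off mid-run does not stall the overall job.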
40Manage Dynamic Use
- PC primary use must be respected!
- Entropia Proprietary Sandbox
- Guaranteed to run at idle priority
- Limit application capability
- Monitor page faults, network access
- Management
- Provide time-of-use windows
- Different levels of unobtrusiveness
- Gathers 95% of cycles
41Application Integration
- Support any Win32 binary
- Language Neutral (C, C++, Fortran, Java, C#, etc.)
- Compiler/library Neutral
[Diagram labels: App A, App B, App C; Client1, Client2 (qsub, qstat); Open Grid Platform; Run Applications; Application Preparation Tools]
42Manageability
43Application Performance
[Chart labels: HMMER, GOLD, AUTODOCK, DOCK]
44Scheduling Performance
45Challenges Service Interoperability
- Trying to force homogeneity on users is futile. Everyone has their own preferences, sometimes even dogma.
- The Internet provides the model
46Typical Application
47Typical Application
- Implementations are provided by a mix of
- Application-specific code
- Off the shelf tools and services
- Tools and services from the Globus Toolkit
- Tools and services from the Grid community (compatible with GT)
- Glued together by
- Application development
- System integration
48How it Really Happens (without the Grid)
49How it Really Happens (with the Grid)
50Theory -> Practice
51What You Get in the Globus Toolkit
- OGSI (3.x)/WSRF (4.x) Core Implementation
- Used to develop and run OGSA-compliant Grid Services (Java, C/C++)
- Basic Grid Services
- Popular among current Grid users, common interfaces to the most typical services; includes both OGSA and non-OGSA implementations
- Developer APIs
- C/C++ libraries and Java classes for building Grid-aware applications and tools
- Tools and Examples
- Useful tools and examples based on the developer APIs
52Components in Globus Toolkit 3.0
GSI
WU GridFTP
JAVA WS Core (OGSI)
Pre-WS GRAM
WS-Security
RFT (OGSI)
OGSI C Bindings
WS GRAM (OGSI)
RLS
Data Management
Security
WS Core
Resource Management
Information Services
53Components in Globus Toolkit 3.2
JAVA WS Core (OGSI)
Pre-WS GRAM
WS GRAM (OGSI)
OGSI C Bindings
OGSI Python Bindings (contributed)
pyGlobus (contributed)
Data Management
Security
WS Core
Resource Management
Information Services
54Planned Components in GT 4.0
pyGlobus (contributed)
Authz Framework
Data Management
Security
WS Core
Resource Management
Information Services
55Grid and Web Services Convergence
- The definition of WSRF means that the Grid and
Web services communities can move forward on a
common base.
56Grid Services Example
- (from the Sotomayor tutorial)
- MathService API
- add(int x)
- subtract(int x)
- getvalue()
Note 1: How is this different from
- Web Services?
- CORBA?
- COM/DCOM?
Note 2: This is too simple! What about
- co-ordination/workflows
- personalization
- presentation
- security
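To make the MathService API concrete, here is a plain-Python stand-in for the service logic. It is only a local sketch: there is no Grid or web service plumbing, the starting value of 0 is an assumption, and `get_value` is a Python-style spelling of the slide's `getvalue()`.

```python
class MathService:
    """Stateful service: each instance keeps its own running value."""

    def __init__(self) -> None:
        self._value = 0   # assumed initial value

    def add(self, x: int) -> None:
        """Add x to the stored value; nothing is returned."""
        self._value += x

    def subtract(self, x: int) -> None:
        """Subtract x from the stored value; nothing is returned."""
        self._value -= x

    def get_value(self) -> int:
        """Return the current stored value."""
        return self._value


service = MathService()
service.add(10)
service.subtract(3)
print(service.get_value())
```

The point of the example is the state: `add` and `subtract` take one argument because they act on a value held by the service instance, which is what separates it from a stateless call.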
57OGSI (or what is a grid service?)
- Using web service infrastructure
- MathService is defined by WSDL (like IDL)
<?xml version="1.0" encoding="UTF-8"?>
...
<types>
  <xsd:schema targetNamespace="http://www.gt3tutorial.org/namespaces/0.2/core/gwsdl/Math"
      attributeFormDefault="qualified"
      elementFormDefault="qualified"
      xmlns="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="add">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="addResponse">
      <xsd:complexType/>
    </xsd:element>
    ...
</types>
<message name="AddInputMessage">
  <part name="parameters" element="tns:add"/>
</message>
<message name="AddOutputMessage">
  <part name="parameters" element="tns:addResponse"/>
</message>
...
<gwsdl:portType name="MathPortType" extends="ogsi:GridService">
  <operation name="add">
    <input message="tns:AddInputMessage"/>
    <output message="tns:AddOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="subtract">
    <input message="tns:SubtractInputMessage"/>
    <output message="tns:SubtractOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="getValue">
    <input message="tns:GetValueInputMessage"/>
    <output message="tns:GetValueOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
</gwsdl:portType>
</definitions>
58Basic Concepts
59The GridService PortType
- a grid service is a web service that implements the GridService PortType
<portType name="GridService">
  <operation name="setServiceData"> snip </operation>
  <operation name="destroy"> snip </operation>
  <operation name="requestTerminationAfter"> snip </operation>
  <operation name="requestTerminationBefore"> snip </operation>
  <operation name="findServiceData"> snip </operation>
</portType>
<gwsdl:portType name="GridService">
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="constant" name="interface" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false"
      mutability="mutable" name="serviceDataName" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false"
      mutability="mutable" name="factoryLocator" nillable="true" type="ogsi:LocatorType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false"
      mutability="extendable" name="gridServiceHandle" nillable="false" type="ogsi:HandleType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="mutable" name="gridServiceReference" nillable="false" type="ogsi:ReferenceType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="static" name="findServiceDataExtensibility" nillable="false"
      type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="static" name="setServiceDataExtensibility" nillable="false"
      type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false"
      mutability="mutable" name="terminationTime" nillable="false"
      type="ogsi:TerminationTimeType"/>
  <sd:staticServiceDataValues>
    <ogsi:findServiceDataExtensibility inputElement="ogsi:queryByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:setByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:deleteByServiceDataNames"/>
  </sd:staticServiceDataValues>
</gwsdl:portType>
60GridService PortType
- FindServiceData()
- QueryByServiceDataNames()
- GetServiceData()
- SetByServiceDataNames()
- DeleteByServiceDataNames()
- RequestTerminationAfter()
- RequestTerminationBefore()
- Destroy()
61Capabilities of a Grid Service
- 2-level naming (GSH vs. GSR)
- Factories
- Lifetime management
- Service Data Elements
- Event Notification
- ServiceGroups
62GSH versus GSR
- A GSH (Grid Service Handle) is a unique name for a Grid Service Instance
- A GSR (Grid Service Reference) is a possibly temporary mechanism for accessing the Grid Service Instance
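The two-level naming can be pictured as a lookup from a stable name to whatever access information is current. The sketch below is a toy model, not the OGSI interface: the `HandleResolver` class, its `bind` and `resolve` methods, and the example handle and endpoint strings are all hypothetical.

```python
class HandleResolver:
    """Maps a permanent handle (GSH) to the currently valid reference (GSR)."""

    def __init__(self) -> None:
        self._references: dict[str, str] = {}

    def bind(self, gsh: str, gsr: str) -> None:
        """Record, or replace, the reference currently valid for a handle."""
        self._references[gsh] = gsr

    def resolve(self, gsh: str) -> str:
        """Return the current reference; raises KeyError for an unknown handle."""
        return self._references[gsh]


resolver = HandleResolver()
handle = "gsh://example.org/math/instance-1"             # stable name
resolver.bind(handle, "http://host-a:8080/math/1")       # where it lives now
first = resolver.resolve(handle)

resolver.bind(handle, "http://host-b:8080/math/1")       # instance has moved
second = resolver.resolve(handle)

print(first)
print(second)
```

Clients hold on to the handle and re-resolve it when a reference stops working, so a service can move or change binding without breaking them.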
63Factories
- Create new instances of services dynamically
- Individualized Instances
- lifetime management techniques
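A rough sketch of a factory with soft-state lifetime management follows. The class and method names (`Factory`, `create`, `extend`, `reap`) are hypothetical, and time is passed in as plain numbers so the example stays deterministic; it is not the OGSI Factory portType.

```python
import itertools


class Instance:
    def __init__(self, handle: str, termination_time: float) -> None:
        self.handle = handle
        self.termination_time = termination_time   # soft-state expiry


class Factory:
    """Creates individually named instances and reaps the expired ones."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self.instances: dict[str, Instance] = {}

    def create(self, now: float, lifetime: float) -> Instance:
        """Create a new instance that expires `lifetime` seconds from `now`."""
        handle = f"{self._prefix}/{next(self._counter)}"
        instance = Instance(handle, now + lifetime)
        self.instances[handle] = instance
        return instance

    def extend(self, handle: str, new_termination_time: float) -> None:
        """Client keep-alive: push the termination time further out."""
        inst = self.instances[handle]
        inst.termination_time = max(inst.termination_time, new_termination_time)

    def reap(self, now: float) -> list[str]:
        """Destroy every instance whose termination time has passed."""
        expired = [h for h, i in self.instances.items() if i.termination_time <= now]
        for handle in expired:
            del self.instances[handle]
        return expired


factory = Factory("math")
a = factory.create(now=0, lifetime=60)
b = factory.create(now=0, lifetime=60)
factory.extend(b.handle, 300)          # only b's client keeps it alive
reaped = factory.reap(now=100)
print(reaped, list(factory.instances))
```

With soft state, an instance whose client disappears is cleaned up automatically once its termination time passes, without an explicit destroy call.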
64Service Data Elements
- Generalized State
- useful for describing capability
- Get/Set model similar to JavaBeans Properties
- Can specify initial values in WSDL
- Integrated with Notification mechanism
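The get/set model with a hook for notifications could be sketched as below. The `ServiceData` class and its `find`, `set`, and `on_change` methods are hypothetical names chosen for this illustration, and the initial dictionary merely stands in for initial values declared in WSDL.

```python
class ServiceData:
    """Named state elements with get/set access and change callbacks."""

    def __init__(self, initial: dict[str, object]) -> None:
        self._values = dict(initial)   # stands in for initial values from WSDL
        self._listeners = []

    def find(self, name: str) -> object:
        """Look up one service data element by name."""
        return self._values[name]

    def set(self, name: str, value: object) -> None:
        """Update an element and tell every registered listener."""
        self._values[name] = value
        for listener in self._listeners:
            listener(name, value)

    def on_change(self, listener) -> None:
        """Register a callable invoked as listener(name, value) on each set."""
        self._listeners.append(listener)


sde = ServiceData({"value": 0, "capability": "add,subtract"})
changes = []
sde.on_change(lambda name, value: changes.append((name, value)))

sde.set("value", 7)
print(sde.find("value"), sde.find("capability"), changes)
```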
65Service Data Elements: GridService
- Interface
- ServiceDataName
- FactoryLocator
- GridServiceHandle
- GridServiceReference
- TerminationTime
66Notifications
- Source
- implements NotificationSourcePortType
- sends a notification message (XML Element) to Sinks
- Sink
- implements NotificationSinkPortType
- sends a notification subscription request to source
- causes a GridService Instance of portType NotificationSubscription to be created
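The source, sink, and subscription roles map naturally onto an observer pattern. The sketch below is an in-process analogy only: the three classes and their methods are hypothetical stand-ins, and messages are plain strings rather than real XML elements sent over the network.

```python
class NotificationSink:
    """Receives notification messages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[str] = []

    def deliver_notification(self, message: str) -> None:
        self.received.append(message)


class Subscription:
    """Stands in for the subscription instance created per subscribe request."""

    def __init__(self, sink: NotificationSink) -> None:
        self.sink = sink
        self.active = True


class NotificationSource:
    """Accepts subscriptions and pushes messages to subscribed sinks."""

    def __init__(self) -> None:
        self._subscriptions: list[Subscription] = []

    def subscribe(self, sink: NotificationSink) -> Subscription:
        subscription = Subscription(sink)
        self._subscriptions.append(subscription)
        return subscription

    def notify(self, message: str) -> None:
        for subscription in self._subscriptions:
            if subscription.active:
                subscription.sink.deliver_notification(message)


source = NotificationSource()
sink = NotificationSink("monitor")
subscription = source.subscribe(sink)

source.notify("<value>7</value>")
subscription.active = False            # subscription ended
source.notify("<value>8</value>")
print(sink.received)
```

Modelling the subscription as its own object mirrors the slide's point that a subscription is itself an instance, so it can be managed and ended like any other.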
67ServiceGroups
- A grid service that maintains information about other grid services
- Can be used to implement a classic registry model
- Can be used for dataset replication
- A grid service can belong to more than one Service Group
- Membership in a ServiceGroup can be homogeneous or heterogeneous
- Service group portTypes are optional
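A registry-style service group can be sketched as a collection of member handles with some descriptive content attached. The `ServiceGroup` class, its methods, and the handle strings below are hypothetical and only illustrate the registry use and multiple membership described above.

```python
class ServiceGroup:
    """Registry-style group: member handles plus descriptive content."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: dict[str, dict[str, str]] = {}

    def add(self, handle: str, **content: str) -> None:
        """Add a member service, described by arbitrary key/value content."""
        self._entries[handle] = content

    def remove(self, handle: str) -> None:
        """Remove a member; unknown handles are ignored."""
        self._entries.pop(handle, None)

    def find(self, **criteria: str) -> list[str]:
        """Return handles whose content matches every given key/value pair."""
        return [h for h, c in self._entries.items()
                if all(c.get(k) == v for k, v in criteria.items())]


registry = ServiceGroup("registry")      # heterogeneous membership
replicas = ServiceGroup("replicas")      # homogeneous membership

registry.add("math/1", kind="MathService")
registry.add("data/1", kind="ReplicaService")
replicas.add("data/1", kind="ReplicaService")   # same service, second group

print(registry.find(kind="MathService"))
print(replicas.find(kind="ReplicaService"))
```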
68Grid Services Summary
- Extends Web Services to support Transient Services
- WSDL 1.2 expected to include extensions
- Requires support for factories, lifetime management, soft-state management, and notifications
- Java implementation pretty solid
- Security implementation still shaky
69Other Challenges
- Developing user interfaces
- Data Management
- Scheduling/co-scheduling of resources
- Failure management
- Application development
- Performance
- Many others
70What I hope you got from this talk
- Grid Computing is about
- Co-ordinated use of different resources
- Provisioning resources for increased utilization
- Scaling to large numbers of resources, services and users
- Many systems being built
- Many Applications being developed
71Pop Quiz
- What is the definition of Grid Computing?
- What kinds of resources are we talking about?
- What are the main technical challenges in building grids?
- Why should you care?
- What is a proxy certificate? And why is it not encrypted?