Title: Complexity Computational Environment, integrating data and simulation on the Grid: Multiscale comput
1Complexity Computational Environment, integrating data and simulation on the Grid: Multiscale computing
JPL, June 18 2003
http://www.grid2002.org
- Geoffrey Fox, Marlon Pierce
- Community Grids Lab
- Indiana University
- gcf_at_indiana.edu
http://academia.web.cern.ch/academia/lectures/grid/
2Grid Backdrop from CT Project
- Grid Computational Environment (GCE) for SERVOGrid based on Web services (WS)
- Job submission, job management, simple security (to be addressed), file processing
- Support key simulation and pattern recognition codes (DISLOC, SIMPLEX, VC, PARK, GEOFEST, DAHMM, PDPC) as WS
- Current
- Support databases and visualization
- Simple workflow, notification, metadata services
- Initial Schema for GEM specific (meta-)data
- Portlet based interfaces
- Extend to ACES (Japan, Australia) for distributed computers, software, databases, clients
- Collaboration and other useful portlets
- Can inherit Globus support from Alliance Portal, NMI efforts
3AIST Additions
- Compatibility with Grid Services
- Use of OGSA-DAI XML and SQL database standards
- Including extensions for streaming (sensor) data
- Including extensions for integration with simulations
- Optimization for parallel simulations (e.g. parallel IO) (?)
- Better workflow, notification, metadata services
- openGIS/GML compatibility (fault etc. Schema)
- Semantic Grid
- Autonomic (Robust, Reliable, Resilient) services (?)
- Support multi-scale simulations and data assimilation
- ServoPSE Problem Solving Environments (?)
- GeoLanguage (ServoML specializing CCEML) integrating workflow and multi-scale support
- Interactive portlet based front end with Matlab and/or Mathematica style interface
4SERVOGrid Caricature
5Sources of Grid Technology?
- Grids support distributed collaboratories or virtual organizations that support People, Computers, Observational Data and results of thought and data processing
- The Web and Web Services
- Most important for Information Grids as these are naturally service-based
- Distributed Objects (CORBA, Java/Jini, COM)
- Distributed Object same as a Service
- Globus, Legion, Condor, NetSolve, Ninf and other High Performance Computing activities
- Compute/File Grids that need to be made into services (Globus GT3) and integrated with Information Grids for Geocomplexity
- Peer-to-peer Networks
6Taxonomy of Grid Functionalities
7Approach
[Figure label: Application WS]
- Build on e-Science methodology and Grid technology
- Geocomplexity (and Biocomplexity) applications with multi-scale models, scalable parallelism, data assimilation as key issues
- Data-driven models for earthquakes
- Use existing code/database technology (SQL/Fortran/C) linked to Application Web/OGSA services
- XML specification of models, computational steering, scale supported at Web Service level, as high performance is not needed here
- Allows use of Semantic Grid technology
- AIST builds on CT
8SERVOGrid (Complexity) Computing Model
[Figure boxes: OGSA-DAI Grid Services; Grid Data Filter; HPC Simulation; Grid Data Assimilation; Analysis, Control, Visualize; Other Grid and Web Services]
- This type of Grid integrates with parallel computing
- Multiple HPC facilities, but only use one at a time
- Many simultaneous data sources and sinks
- Distributed filters massage data for simulation
9Data Assimilation
- Data assimilation implies one is solving some optimization problem which might have Kalman Filter like structure
- As discussed by DAO at Earth Science meeting, one will become more and more dominated by the data (Nobs much larger than number of simulation points).
- Natural approach is to form, for each local (position, time) patch, the important data combinations so that optimization doesn't waste time on large error or insensitive data.
- Data reduction done in natural distributed fashion, NOT on HPC machine, as distributed computing is most cost effective if calculations are essentially independent
- Filter functions must be transmitted from HPC machine
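As one hedged illustration of the "Kalman Filter like structure" mentioned above, the sketch below shows the scalar Kalman measurement update in Python. The function name `kalman_update` and all numbers are assumptions made for this example; they are not taken from SERVOGrid.

```python
def kalman_update(x_prior, p_prior, z, r):
    """Combine a prior estimate with one observation (scalar case).

    x_prior, p_prior: prior state estimate and its variance
    z, r: observed value and its error variance
    """
    gain = p_prior / (p_prior + r)           # weight given to the observation
    x_post = x_prior + gain * (z - x_prior)  # move the estimate toward the data
    p_post = (1.0 - gain) * p_prior          # uncertainty shrinks after the update
    return x_post, p_post


# Illustrative values only: prior 0 with variance 1, observation 2 with variance 1.
x_post, p_post = kalman_update(0.0, 1.0, 2.0, 1.0)

# An observation with a very large error variance barely moves the estimate,
# which is why large-error data add little to the optimization.
x_noisy, p_noisy = kalman_update(0.0, 1.0, 2.0, 1e6)
```

The second call shows why filtering out large-error data before the optimization loses little information.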
10Distributed Filtering
Nobs(local patch) >> Nfiltered(local patch) ≈ Number_of_Unknowns(local patch)
In the simplest approach, filtered data are obtained by linear transformations on the original data, based on Singular Value Decomposition of the least squares matrix
[Figure: geographically distributed sensor patches on the distributed machine; for local patch 1 and local patch 2 a Data Filter turns Nobs(local patch) into Nfiltered(local patch); the HPC machine sends the needed filter, receives filtered data, and factorizes the matrix to a product of local patches]
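A minimal NumPy sketch of the SVD-based filtering idea, under simplifying assumptions: every patch observes the same small set of unknowns through its own least squares matrix, and the helper names (`build_patch_filter`, `filter_patch_data`) and synthetic data are invented for this example. The "HPC side" builds a filter per patch, the "patch side" applies it to the local observations, and only the few filtered values travel back.

```python
import numpy as np


def build_patch_filter(A_patch):
    """HPC side: thin SVD of one patch's least squares matrix, A = U S V^T."""
    U, s, Vt = np.linalg.svd(A_patch, full_matrices=False)
    filter_matrix = U.T               # shape (n_unknowns, n_obs_patch)
    reduced_matrix = s[:, None] * Vt  # S V^T, shape (n_unknowns, n_unknowns)
    return filter_matrix, reduced_matrix


def filter_patch_data(filter_matrix, d_patch):
    """Patch side: reduce n_obs_patch observations to n_unknowns numbers."""
    return filter_matrix @ d_patch


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x_true = np.array([1.0, -2.0, 0.5])
A_patches = [rng.normal(size=(n_obs, x_true.size)) for n_obs in (400, 600)]
d_patches = [A @ x_true + 0.01 * rng.normal(size=A.shape[0]) for A in A_patches]

filters, reduced = zip(*(build_patch_filter(A) for A in A_patches))
filtered = [filter_patch_data(F, d) for F, d in zip(filters, d_patches)]

# HPC side: solve the small stacked system built from filtered data only.
x_hat, *_ = np.linalg.lstsq(np.vstack(reduced), np.concatenate(filtered), rcond=None)

# Reference: the same fit using every raw observation.
x_full, *_ = np.linalg.lstsq(np.vstack(A_patches), np.concatenate(d_patches), rcond=None)
```

In this setting the fit from filtered data matches the fit from all raw observations, because the normal equations are unchanged by the filtering.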
11Grid Politics
- There is a Global Grid Forum meeting 3 times per year with about 700 attendees per meeting
- Exchange information and define standards for everything not done in W3C and OASIS
- e.g. Grid Service, Security, What is a Job, Database, Computer, How to build portals ...
- There is a large project called Globus developing software largely for compute/file Grids
- There are some 50 Grid projects (mainly in Europe and USA) developing software and applications as well as installing infrastructure
- Some are deployment: EDG, NMI, VDT ...
- There are related initiatives called CyberInfrastructure (NSF USA) and e-Science (UK)
- There is a proposed OMII (Open Middleware Infrastructure Institute), an international Alliance of separately funded projects with common coordination
12OGSA OGSI Hosting Environments
- Start with Web Services in a hosting environment
- Add OGSI to get a Grid service and a component model
- Add OGSA to get Interoperable Grid, correcting differences in base platform and adding key functionalities
13OGSI: Open Grid Services Infrastructure
- http://www.gridforum.org/ogsi-wg
- It is a component model for web services.
- It defines a set of behavior patterns that each OGSI service must exhibit.
- Every Grid Service portType extends a common base type.
- Defines an introspection model for the service
- You can query it (in a standard way) to discover
- What methods/messages a port understands
- What other port types the service provides
- If the service is stateful, what the current state is
- A set of standard portTypes for
- Message subscription and notification
- Service collections
- Each service is identified by a URI called the Grid Service Handle (GSH)
- GSHs are bound dynamically to Grid Service References (GSRs, typically WSDL docs)
- A GSR may be transient. GSHs are fixed.
- Handle map services translate GSHs into GSRs.
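The handle/reference split can be sketched as a small lookup service. This is a toy model in Python, not the OGSI interface itself: the class names, the `expires_at` field, and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridServiceReference:
    """Transient binding information (a GSR is typically a WSDL doc)."""
    wsdl_url: str
    expires_at: float  # hypothetical expiry time, in seconds


class HandleMap:
    """Toy handle map: translates a fixed GSH into the GSR that is currently valid."""

    def __init__(self):
        self._bindings = {}

    def bind(self, gsh, gsr):
        self._bindings[gsh] = gsr

    def resolve(self, gsh, now):
        gsr = self._bindings.get(gsh)
        if gsr is None or gsr.expires_at <= now:
            raise LookupError(f"no valid GSR for {gsh}")
        return gsr


handle_map = HandleMap()
gsh = "http://example.org/handles/service-instance-1"  # hypothetical handle

handle_map.bind(gsh, GridServiceReference("http://host-a.example.org/service?wsdl", 100.0))
first = handle_map.resolve(gsh, now=10.0)

# The service moves to another host: the GSR changes, the GSH does not.
handle_map.bind(gsh, GridServiceReference("http://host-b.example.org/service?wsdl", 200.0))
second = handle_map.resolve(gsh, now=150.0)
```

Clients hold only the GSH, so a change of host or binding stays invisible to them apart from a fresh lookup.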
14OGSA-DAI (Malcolm Atkinson, Edinburgh)
UK e-Science Grid Core Programme: Development of Data Access and Integration Services for OGSA
http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI
- Access to XML Databases
- Access to Relational Databases
- Distributed Query Processing (DB Federation)
- XML Schema Support for e-Science
15DAI Key Services
- GridDataService (GDS): Access to data and DB operations
- GridDataServiceFactory (GDSF): Makes GDS and GDSF
- GridDataServiceRegistry (GDSR): Discovery of GDS(F) and Data
- GridDataTranslationService (GDTS): Translates or transforms data
- GridDataTransportDepot (GDTD): Data transport with persistence
Integrated Structured Data Transport; Relational and XML models supported; Role-based Authorisation; Binary structured files (later)
16Interface transparency: one GDS supports multiple database types
[Figure: one GDS in front of multiple database types, including a relational database]
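Interface transparency can be sketched with a single abstract interface and two backends. This is a hedged Python illustration using only the standard library (`sqlite3` and `xml.etree.ElementTree`); it does not use the OGSA-DAI API, and the class names, the `faults` schema, and the records are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class FaultDataService(ABC):
    """The one interface a client sees, whatever database sits behind it."""

    @abstractmethod
    def long_faults(self, min_length_km):
        """Return sorted names of faults at least min_length_km long."""


class RelationalFaultService(FaultDataService):
    def __init__(self, connection):
        self._connection = connection

    def long_faults(self, min_length_km):
        rows = self._connection.execute(
            "SELECT name FROM faults WHERE length_km >= ? ORDER BY name",
            (min_length_km,),
        )
        return [row[0] for row in rows]


class XmlFaultService(FaultDataService):
    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def long_faults(self, min_length_km):
        return sorted(
            fault.get("name")
            for fault in self._root.findall("fault")
            if float(fault.get("length_km")) >= min_length_km
        )


# Made-up records, stored once relationally and once as XML.
records = [("fault_a", 12.0), ("fault_b", 80.0), ("fault_c", 150.0)]
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE faults (name TEXT, length_km REAL)")
connection.executemany("INSERT INTO faults VALUES (?, ?)", records)
xml_text = "<faults>" + "".join(
    f'<fault name="{name}" length_km="{length}"/>' for name, length in records
) + "</faults>"

services = [RelationalFaultService(connection), XmlFaultService(xml_text)]
results = [service.long_faults(50.0) for service in services]
```

The caller issues the same request to both services and never sees SQL or XML syntax.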
17Integration of Data and Filters
- One has the OGSA-DAI Data repository interface combined with WSDL of the (Perl, Fortran, Python ...) filter
- User only sees WSDL, not data syntax
- Some non-trivial issues as to where the filtering compute power is
- Microsoft says filter next to data
18MultiScale
[Figure labels: Load Balancing; Algorithms; InfoGrid; Grid Portals; Parallel Computing; Extended/Integrated VA PARK GEOFEST; Integrated CCE; Computer Science; Large System Simulations; Visualization; e-Science Collaboration Grid Infrastructure; Modeling; Grid; General Complex Systems Simulations; Clusters; Databases; Geology; GeoInformatics; Other Fields X-Complexity; Experiments; Field; Sensors/Satellites; Complex Fluids; Stock Market; BioComplexity]
19SERVOGrid Complexity Computing Environment
[Figure boxes: Users; CCE Control Portal Aggregation; Middle Tier with XML Interfaces; Application Service-1, -2, -3; XML Meta-data Service; Complexity Simulation Service; Parallel Simulation Service; Database Service; Compute Service; Sensor Service; Visualization Service]
20SERVOGrid Requirements
- Seamless access to data repositories and large scale computers
- Integration of multiple data sources including sensors, databases, file systems with analysis system
- Including filtered OGSA-DAI
- Rich meta-data generation and access with SERVOGrid specific Schema extending openGIS standards and using Semantic Grid
- Portals with component model for user interfaces and web control of all capabilities
- Collaboration to support world-wide work
- Basic Grid tools: workflow and notification
21Portal such as Jetspeed
[Figure labels: Hosting Environment (twice); Grid Computing or Programming Environments; Application/User Framework supporting development and deployment of OGSI compliant AWS (Application Web Services); Generic Application Services; Web Services; OGSA Interoperability Layer (twice); Core Grid; Sophisticated System Services; Resource Grid Services, e.g. DAI compliant database; Resources]
22Taxonomy of Grid Operational Style
23Paradigms, Protocols, Platforms and Hosting
- We can start from the Web view where the basic Grid paradigm is
- Meta-data rich Web Services communicating via messages
- These have some basic support from some runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3)
- These are the distributed equivalent of operating system functions as in UNIX Shell
- Called Hosting Environment or platform
24Permeating Principles and Policies
- Meta-data rich Message-linked Web Services as the permeating paradigm
- User Component Model such as Enterprise JavaBean (EJB) or .NET.
- Service Management framework including a possible Factory mechanism
- High level Invocation Framework describing how you interact with system components.
- This could for example be used to allow the system to be built from either W3C or GGF style (OGSI) Web Services and to protect the user from changes in their specifications.
- Security is a service but the need for fine grain selective authorization encourages
- Policy context that sets the rules for each particular Grid.
- Currently OGSA supports policies for routing, security and resource use.
- The Grid Fabric or set of resources needs mechanisms to manage them. This includes automatic recording of meta-data and configuration of software.
- Quality of service (QoS) for the Network and this implies performance monitoring and bandwidth reservation services.
- Challenging as end-to-end and not just backbone QoS is needed.
- Messaging systems like MQSeries from IBM provide robustness from asynchronous delivery and can abstract destination and allow customization of content such as converting between different interface specifications.
- Messaging is built on transport mechanisms which can be used to support mechanisms to implement QoS and to virtualize ports
25Virtualization
- The Grid could and sometimes does virtualize various concepts
- Location: URI (Uniform Resource Identifier) virtualizes URL
- Replica management (caching) virtualizes file location, generalized by GriPhyN virtual data concept
- Protocol: message transport and WSDL bindings virtualize transport protocol as a QoS request
- P2P or Publish-subscribe messaging virtualizes matching of source and destination services
- Semantic Grid virtualizes Knowledge as a meta-data query
- Brokering virtualizes resource allocation
- Virtualization implies references can be indirect
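To make the publish-subscribe item concrete, here is a minimal in-process sketch in Python. It is not a real messaging product: the `Broker` class, the topic name, and the message content are assumptions for illustration. Publishers name a topic rather than a destination, and delivery is queued so that sending and receiving are decoupled.

```python
from collections import defaultdict, deque


class Broker:
    """Toy publish-subscribe broker with queued (asynchronous) delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._pending = deque()                # messages not yet delivered

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher returns at once and never names a destination service.
        self._pending.append((topic, message))

    def deliver_pending(self):
        """Deliver queued messages to whoever subscribed; return the delivery count."""
        delivered = 0
        while self._pending:
            topic, message = self._pending.popleft()
            for callback in self._subscribers[topic]:
                callback(message)
                delivered += 1
        return delivered


broker = Broker()
portal_log, archive_log = [], []
broker.subscribe("job.finished", portal_log.append)
broker.subscribe("job.finished", archive_log.append)

broker.publish("job.finished", {"job": "run-1"})
received_before_delivery = len(portal_log)  # nothing has been delivered yet
delivery_count = broker.deliver_pending()
```

The matching of source to destination lives entirely in the broker's subscription table, which is the indirection the slide describes.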
26Interfaces and Functionality and Semantics I
- The Grid platform tries to minimize detail in protocols and maximize detail in interfaces to enhance scaling
- However rich meta-data and semantics are critical for correct and interesting operation
- Put as much semantic interpretation as you can into specific services
- Lack of Semantic interoperation is in fact the main weakness of today's Grids and Web services
- Everything becomes a service whether system or application level
- There are some very important Global Services
- Discovery (look up) and Registration of service metadata
- Workflow
- MetaSchedulers
27Interfaces and Functionality and Semantics II
- There are many other generally important services
- OGSA-DAI: The Database Service
- Portal Service linked to by WSRP (Web services for Remote Portals)
- Notification of events
- Job submission
- Provenance: interpret meta-data about history of data
- File Interfaces
- Sensor service: satellites
- Visualization
- Basic brokering/scheduling
28Categories of Worldwide Grid Services to be exploited by SERVOGrid
- 1) Types of Grid
- R3
- Lightweight
- P2P
- Federation and Interoperability
- 2) Core Infrastructure and Hosting Environment
- Service Management
- Component Model
- Service wrapper/Invocation
- Messaging
- 3) Security Services
- Certificate Authority
- Authentication
- Authorization
- Policy
- 4) Workflow Services and Programming Model
- Enactment Engines (Runtime)
- Languages and Programming
- Compiler
- 7) Information Grid Services
- OGSA-DAI/DAIT
- Integration with compute resources
- P2P and database models
- 8) Compute/File Grid Services
- Job Submission
- Job Planning Scheduling Management
- Access to Remote Files, Storage and Computers
- Replica (cache) Management
- Virtual Data
- Parallel Computing
- 9) Other services including
- Grid Shell
- Accounting
- Fabric Management
- Visualization, Data-mining and Computational Steering
- Collaboration
- 10) Portals and Problem Solving Environments
- 11) Network Services
29Two-level Programming I
- The paradigm implicitly assumes a two-level Programming Model
- We make a Service (same as a distributed object or computer program running on a remote computer) using conventional technologies
- C, Java or Fortran Monte Carlo module
- Data streaming from a sensor or Satellite
- Specialized (JDBC) database access
- Such nuggets accept and produce data from users, files and databases
- The Grid is built by coordinating such nuggets, assuming we have solved the problem of programming the nugget
30Two-level Programming II
- The Grid is discussing the linkage and distribution of the nuggets, with the only addition being runtime interfaces to the Grid as opposed to UNIX data streams
- Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
- Such interpretative environments are the single processor analog of Grid Programming, and this tends to be called workflow
- Workflow is the composition of multiple services (programs) together to make a new service
- Includes Software Bus, Application Integration, Co-ordination Languages etc.
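The single-processor analog of this composition can be sketched in a few lines of Python. The nuggets below (`read_sensor`, `remove_outliers`, `mean`) are hypothetical stand-ins for real services, and the `compose` helper is an assumption of this example. The point is that linking services yields something that is itself a service.

```python
from functools import reduce


def compose(*services):
    """Link services left to right; the result is itself a service."""
    def workflow(data):
        return reduce(lambda value, service: service(value), services, data)
    return workflow


# Hypothetical nuggets, each written with conventional technology.
def read_sensor(raw_text):
    return [float(token) for token in raw_text.split()]


def remove_outliers(values):
    return [value for value in values if abs(value) < 100.0]


def mean(values):
    return sum(values) / len(values)


clean_mean = compose(read_sensor, remove_outliers, mean)
result = clean_mean("1.0 2.0 999.0 3.0")

# The composite can be linked again, exactly like any other service.
scaled_mean = compose(clean_mean, lambda value: value * 10)
scaled_result = scaled_mean("1.0 2.0 999.0 3.0")
```

On a Grid the function calls would become messages between remote services, but the two-level structure is the same.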
31Workflow
- Workflow has at least 4 parts
- Programming Environment, typically a GUI to drag and drop services and their linkages (familiar from AVS etc., which was workflow for visualization)
- Language, from XML to extended Python
- Compiler converting Language into executable
- Runtime controlling flow of information and notification events
- Can use Python, Mathematica, Matlab, JavaSpaces, IBM BPEL4WS, DoE CCA etc.
- Don't think current systems are very near what we will want, but expect much progress over next 3 years and plenty of systems to work with
- Metadata critical to tell you how to combine services in a sensible way, so workflow engines must interface with metadata service
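A toy version of three of these parts (language, compiler, runtime) can be sketched in Python. Everything here is an assumption for illustration: the XML vocabulary, the `METADATA` table standing in for a metadata service, and the service names. The compiler consults the metadata to reject linkages that make no sense, and the runtime emits a notification after each step.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata service: name -> (input type, output type, callable).
METADATA = {
    "parse": ("text", "numbers", lambda text: [float(t) for t in text.split()]),
    "square": ("numbers", "numbers", lambda xs: [x * x for x in xs]),
    "total": ("numbers", "number", sum),
}

WORKFLOW_XML = """<workflow>
  <step service="parse"/>
  <step service="square"/>
  <step service="total"/>
</workflow>"""

BAD_WORKFLOW_XML = """<workflow>
  <step service="total"/>
  <step service="square"/>
</workflow>"""


def compile_workflow(xml_text, metadata):
    """Turn the XML description into an executable plan, checking types via metadata."""
    names = [step.get("service") for step in ET.fromstring(xml_text).findall("step")]
    for upstream, downstream in zip(names, names[1:]):
        produced, expected = metadata[upstream][1], metadata[downstream][0]
        if produced != expected:
            raise TypeError(f"{upstream} produces {produced}, {downstream} expects {expected}")
    return [(name, metadata[name][2]) for name in names]


def run_workflow(plan, data, notify):
    """Runtime: pass data along the plan and emit a notification event per step."""
    for name, service in plan:
        data = service(data)
        notify(f"{name} finished")
    return data


events = []
answer = run_workflow(compile_workflow(WORKFLOW_XML, METADATA), "1 2 3", events.append)

try:
    compile_workflow(BAD_WORKFLOW_XML, METADATA)
    rejected = False
except TypeError:
    rejected = True
```

The second workflow is refused at compile time because `total` yields a single number while `square` expects a list, which is the kind of check only metadata makes possible.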
32Workflow, GCEs and Problem Solving Environments (PSEs)
- There is some confusion between fields of workflow (Grid Computing Environments GCE) and PSEs
- To the extent PSEs just allow manipulation of nuggets, they are indistinguishable from a domain specific GCE
- They are distinct if they support intra nugget operations such as
- Integration of mesh and simulation
- Closely coupled code linkage
- Generation of code from high level interface like Mathematica
- Even in latter case, a new generation of PSEs should be built with Grid architecture, e.g. message based and using Grid services like metadata and notification
33Selected GeoInformatics Data
[Figure labels: Tool MetaData; XML Meta-data Service; MultiScale Ontologies; Job MetaData; Complexity Scripts; Workflow; SERVOPSE Programs using CCEML (SERVOML); SERVOGrid Complexity Simulation Service]
Importance of Metadata Service: how should this be implemented?
34Metadata Approaches
- Specialized services like UDDI and MDS (Globus)
- Nobody likes UDDI
- MDS uses LDAP
- RGMA is MDS with a relational database backend
- By hand as in current GEM Portal, which is roughly the same as using service-stored SDEs (Service Data Elements) as in OGSI
- Some new MDS coming from Globus GT3?
- Current MDS has both a Schema (insufficient for us) and a database technology
- Semantic Grid technologies
- Some basic XML database (Oracle, Xindice ...)
- If OGSA compliant (not defined yet), then doesn't matter that much
35Workflow and SERVOGrid CCE
- SERVOGrid should use workflow technology to support both
- code and data coupling (DISLOC with SIMPLEX etc.)
- Multiscale features
- Implementing multiscale model requires
- building Web services for each model,
- describing each model with metadata and
- describing linkage of models (linkage of ports on web services)
- and describing when to use which scale model
- So workflow and multiscale depend on web services described by rich metadata
- This analysis isn't correct if scales must be tightly coupled, as current workflow won't support this (CCA from DoE claims to address this but not clear if general)
- We should focus on multiscale models with loose nugget coupling
- Hopefully we will learn how to take the same architecture, compile away inefficiencies and get high performance on tighter coupling than conventional distributed workflow
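One hedged way to picture "describing when to use which scale model" is a registry of model metadata that a workflow consults before dispatching work. The Python sketch below is purely illustrative: the model names, the `min_scale_km`/`max_scale_km` fields, and the scale ranges are invented and do not describe any SERVOGrid code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMetadata:
    name: str            # name of the web service wrapping the model
    min_scale_km: float  # smallest length scale the model handles (inclusive)
    max_scale_km: float  # largest length scale the model handles (exclusive)


# Hypothetical registry entries; the ranges are made up.
REGISTRY = [
    ModelMetadata("fine_scale_model", 0.0, 10.0),
    ModelMetadata("regional_model", 10.0, 1000.0),
]


def select_model(scale_km, registry):
    """Pick the service whose metadata says it covers the requested scale."""
    for model in registry:
        if model.min_scale_km <= scale_km < model.max_scale_km:
            return model.name
    raise LookupError(f"no model registered for scale {scale_km} km")


def plan_multiscale_run(scales_km, registry):
    """Loosely coupled plan: one independent service call per requested scale."""
    return [(scale, select_model(scale, registry)) for scale in scales_km]


multiscale_plan = plan_multiscale_run([0.5, 50.0], REGISTRY)
```

Each entry in the plan is an independent service invocation, which fits the loose nugget coupling recommended above; tightly coupled scales would need more than this.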
36Technologies under development at Indiana
- Portal Infrastructure and Portlets integrating with rest of Globus/OGSA-DAI Community
- Including job submission, management of modest meta-data and linkage to databases
- Should package as application web service toolkit and test on ACES world wide iSERVOGrid
- Some core portal Metadata (Semantic Grid) services
- Messaging system between Web services that is useful for
- Service Management/Autonomic Grids
- Security
- Notification service
- Collaboration infrastructure and portlets
37Web Services as a Portlet
- Each Web Service naturally has a user interface specified as just another port
- Customizable for universal access
- This gives each Web Service a Portlet view specified (in XML as always) by WSRP (Web services for Remote Portals)
- So component model for resources automatically gives a component model for user interfaces
- When you build your application, you define the portlet at the same time
[Figure labels: Application as a WS; General Application Ports: Interface with other Web Services; User Face of Web Service: WSRP Ports define WS as a Portlet; Web Services have other ports (Grid Service) to be OGSI compliant]
38Online Knowledge Center built from Portlets
A set of UI Components
- Web Services provide a component model for the middleware (see large common component architecture effort in Dept. of Energy)
- Should match each WSDL component with a corresponding user interface component
- Thus one must use a component model for the portal, with again an XML specification (portalML) of portal component
39Sample page with several portlets: proxy credential manager, submission, monitoring
40Administer Grid Portal
Provide information about application and host
parameters
Select application to edit