Complexity Computational Environment, integrating data and simulation on the Grid: Multiscale comput

1
Complexity Computational Environment, integrating
data and simulation on the Grid: Multiscale
computing
JPL, June 18 2003
http://www.grid2002.org
  • Geoffrey Fox, Marlon Pierce
  • Community Grids Lab
  • Indiana University
  • gcf_at_indiana.edu

http://academia.web.cern.ch/academia/lectures/grid/
2
Grid Backdrop from CT Project
  • Grid Computational Environment (GCE) for
    SERVOGrid based on Web services (WS)
  • Job submission, job management, simple security
    (to be addressed), file processing
  • Support key simulation and pattern recognition
    codes as Web services (DISLOC, SIMPLEX, VC, PARK,
    GEOFEST, DAHMM, PDPC)
  • Current
  • Support databases and visualization
  • Simple workflow, notification, metadata services
  • Initial Schema for GEM specific (meta-)data
  • Portlet based Interfaces
  • Extend to ACES (Japan, Australia) for distributed
    computers, software, databases, clients
  • Collaboration and other useful portlets
  • Can inherit Globus support from Alliance Portal,
    NMI efforts

3
AIST Additions
  • Compatibility with Grid Services
  • Use of OGSA-DAI XML and SQL database standards
  • Including extensions for streaming (sensor) data
  • Including extensions for integration with
    simulations
  • Optimization for parallel simulations (e.g.
    parallel IO) (?)
  • Better workflow, notification, metadata services
  • openGIS/GML compatibility (fault etc. Schema)
  • Semantic Grid
  • Autonomic (Robust Reliable Resilient) services
    (?)
  • Support multi-scale simulations and data
    assimilation
  • ServoPSE Problem Solving Environments (?)
  • GeoLanguage (ServoML specializing CCEML)
    integrating workflow and multi-scale support
  • Interactive portlet based front end with Matlab
    and/or Mathematica style interface

4
SERVOGrid Caricature
5
Sources of Grid Technology?
  • Grids support distributed collaboratories or
    virtual organizations that support People,
    Computers, Observational Data and results of
    thought and data processing
  • The Web and Web Services
  • Most important for Information Grids as these are
    naturally service-based
  • Distributed Objects (CORBA, Java/Jini, COM)
  • A Distributed Object is the same as a Service
  • Globus, Legion, Condor, NetSolve, Ninf and other
    High Performance Computing activities
  • Compute/File Grids that need to be made into
    services (Globus GT3) and integrated with
    Information Grids for Geocomplexity
  • Peer-to-peer Networks

6
Taxonomy of Grid Functionalities
7
Approach
Application WS
  • Build on e-Science methodology and Grid
    technology
  • Geocomplexity (and Biocomplexity) applications
    with multi-scale models, scalable parallelism,
    data assimilation as key issues
  • Data-driven models for earthquakes
  • Use existing code/database technology
    (SQL/Fortran/C) linked to Application Web/OGSA
    services
  • XML specification of models, computational
    steering, scale supported at Web Service level
    as we don't need high performance here
  • Allows use of Semantic Grid technology
  • AIST builds on CT

8
SERVOGrid (Complexity) Computing Model
[Diagram labels: OGSA-DAI Grid Services; Analysis,
Control, Visualize; Grid Data Filter; HPC Simulation;
Grid Data Assimilation; Other Grid and Web Services]
  • This type of Grid integrates with parallel
    computing
  • Multiple HPC facilities, but only one is used at
    a time
  • Many simultaneous data sources and sinks
  • Distributed filters massage data for simulation
9
Data Assimilation
  • Data assimilation implies one is solving some
    optimization problem which might have Kalman
    Filter like structure
  • As discussed by DAO at Earth Science meeting, one
    will become more and more dominated by the data
    (Nobs much larger than number of simulation
    points).
  • Natural approach is to form for each local
    (position, time) patch the important data
    combinations so that optimization doesn't waste
    time on large error or insensitive data.
  • Data reduction is done in a natural distributed
    fashion, NOT on the HPC machine, as distributed
    computing is most cost effective if calculations
    are essentially independent
  • Filter functions must be transmitted from HPC
    machine

10
Distributed Filtering
  • N_obs(local patch) >> N_filtered(local patch)
    ≈ Number_of_Unknowns(local patch)
  • In the simplest approach, filtered data is
    obtained by linear transformations on the
    original data, based on the Singular Value
    Decomposition of the least squares matrix
  • HPC machine: factorize the matrix into a product
    of local patches; send the needed filter,
    receive filtered data
  • Distributed machine: geographically distributed
    sensor patches; patch i passes its
    N_obs(local patch i) data through a filter to
    give N_filtered(local patch i)
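The division of labour described above can be sketched in a few lines. This is a minimal illustration, not the SERVOGrid implementation: instead of the SVD-based filter named on the slide it uses the simpler normal-equation sums (A^T A and A^T d) as the linear transformation that reduces each patch's data, and the function names, the two-unknown model and the synthetic observations are all assumptions made for the example.

```python
# Sketch: per-patch data reduction for a linear least-squares fit d ~ A x.
# Each patch compresses many observations into a few numbers; the central
# solver only ever sees the reduced data. All names here are hypothetical.

def reduce_patch(rows, obs):
    """Runs on the distributed machine. rows is the patch's design matrix
    (one list per observation); obs holds the matching observed values."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atd = [sum(r[i] * d for r, d in zip(rows, obs)) for i in range(n)]
    return ata, atd

def solve_2x2(m, b):
    """Solve a 2x2 linear system by Cramer's rule (enough for this sketch)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - b[0] * m[1][0]) / det]

def assimilate(patches):
    """Runs on the HPC machine: sum the reduced data from all patches."""
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atd = [0.0, 0.0]
    for rows, obs in patches:
        p_ata, p_atd = reduce_patch(rows, obs)  # would be a remote call
        for i in range(2):
            atd[i] += p_atd[i]
            for j in range(2):
                ata[i][j] += p_ata[i][j]
    return solve_2x2(ata, atd)

# Synthetic data: two patches observing the line d = 1 + 2 t at different times.
patch1 = ([[1.0, float(t)] for t in range(0, 50)],
          [1.0 + 2.0 * t for t in range(0, 50)])
patch2 = ([[1.0, float(t)] for t in range(50, 100)],
          [1.0 + 2.0 * t for t in range(50, 100)])
x = assimilate([patch1, patch2])
```

Here each patch sends 6 numbers instead of 50 observations, which is the point of doing the reduction away from the HPC machine.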
11
Grid Politics
  • There is a Global Grid Forum meeting 3 times per
    year with about 700 attendees per meeting
  • Exchange information and define standards for
    everything not done in W3C and OASIS
  • e.g. Grid Service, Security, What is a Job,
    Database, Computer, How to build portals ...
  • There is a large project called Globus developing
    software largely for compute/file Grids
  • There are some 50 Grid projects (mainly in Europe
    and USA) developing software and applications as
    well as installing infrastructure
  • Some are deployment projects: EDG, NMI, VDT ...
  • There are related initiatives called
    CyberInfrastructure (NSF USA) and e-Science (UK)
  • There is a proposed OMII (Open Middleware
    Infrastructure Institute) an international
    Alliance of separately funded projects with
    common coordination

12
OGSA, OGSI and Hosting Environments
  • Start with Web Services in a hosting environment
  • Add OGSI to get a Grid service and a component
    model
  • Add OGSA to get Interoperable Grid correcting
    differences in base platform and adding key
    functionalities

13
OGSI: Open Grid Services Infrastructure
  • http://www.gridforum.org/ogsi-wg
  • It is a component model for web services.
  • It defines a set of behavior patterns that each
    OGSI service must exhibit.
  • Every Grid Service portType extends a common
    base type.
  • Defines an introspection model for the service
  • You can query it (in a standard way) to discover
  • What methods/messages a port understands
  • What other port types does the service provide?
  • If the service is stateful what is the current
    state?
  • A set of standard portTypes for
  • Message subscription and notification
  • Service collections
  • Each service is identified by a URI called the
    Grid Service Handle
  • GSHs are bound dynamically to Grid Service
    References (GSRs, typically WSDL documents)
  • A GSR may be transient. GSHs are fixed.
  • Handle map services translate GSHs into GSRs.
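The handle and reference split can be illustrated with a toy handle map. This is only a sketch of the indirection, not the OGSI interface: the class names, the `bind`/`resolve` methods and the example.org URLs are all hypothetical.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceReference:
    """Stand-in for a GSR: how to reach the service right now."""
    endpoint: str
    generation: int

class HandleMap:
    """Toy handle map: resolves a fixed handle (GSH) to its current reference."""

    def __init__(self):
        self._current = {}
        self._generations = itertools.count(1)

    def bind(self, handle, endpoint):
        """Bind or rebind a handle; any older reference becomes stale."""
        ref = ServiceReference(endpoint, next(self._generations))
        self._current[handle] = ref
        return ref

    def resolve(self, handle):
        """Return the current reference for a handle."""
        return self._current[handle]

    def is_current(self, handle, ref):
        """True if ref is still the live reference for this handle."""
        return self._current.get(handle) == ref

handles = HandleMap()
gsh = "http://example.org/grid/handles/disloc-42"  # hypothetical, fixed handle
first = handles.bind(gsh, "http://host-a.example.org/disloc.wsdl")
second = handles.bind(gsh, "http://host-b.example.org/disloc.wsdl")  # service moved
```

A client that keeps only the handle keeps working after the service moves; a client that cached `first` has to go back to the handle map.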

14
OGSA-DAI (Malcolm Atkinson, Edinburgh)
  • UK e-Science Grid Core Programme
  • Development of Data Access and Integration
    Services for OGSA
  • http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI
  • Access to XML Databases
  • Access to Relational Databases
  • Distributed Query Processing (DB Federation)
  • XML Schema Support for e-Science
15
DAI Key Services
  • GridDataService (GDS): access to data and DB
    operations
  • GridDataServiceFactory (GDSF): makes GDS and GDSF
  • GridDataServiceRegistry (GDSR): discovery of
    GDS(F) and data
  • GridDataTranslationService (GDTS): translates or
    transforms data
  • GridDataTransportDepot (GDTD): data transport
    with persistence
  • Integrated structured data transport
  • Relational and XML models supported
  • Role-based authorisation
  • Binary structured files (later)
16
Interface transparency: one GDS supports multiple
database types
[Diagram label: Relational database]
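Interface transparency can be sketched as one service interface placed in front of two different storage models. This is not the OGSA-DAI API: the `DataService` class, the backend classes, the `faults` table and the slip values are invented for illustration, using only the Python standard library.

```python
import sqlite3
import xml.etree.ElementTree as ET

class RelationalBackend:
    """Relational store (in-memory SQLite) holding made-up fault records."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE faults (name TEXT, slip_mm REAL)")
        self.db.executemany("INSERT INTO faults VALUES (?, ?)",
                            [("A", 12.5), ("B", 3.0)])

    def fetch(self, min_slip):
        cur = self.db.execute(
            "SELECT name FROM faults WHERE slip_mm >= ? ORDER BY name",
            (min_slip,))
        return [row[0] for row in cur]

class XmlBackend:
    """XML store holding the same made-up records."""
    def __init__(self):
        self.root = ET.fromstring(
            '<faults><fault name="A" slip_mm="12.5"/>'
            '<fault name="B" slip_mm="3.0"/></faults>')

    def fetch(self, min_slip):
        return sorted(f.get("name") for f in self.root.iter("fault")
                      if float(f.get("slip_mm")) >= min_slip)

class DataService:
    """Hypothetical GDS-like front end: callers never see the storage model."""
    def __init__(self, backend):
        self.backend = backend

    def faults_with_slip_at_least(self, min_slip):
        return self.backend.fetch(min_slip)

from_sql = DataService(RelationalBackend()).faults_with_slip_at_least(10.0)
from_xml = DataService(XmlBackend()).faults_with_slip_at_least(10.0)
```

Both calls go through the same method and return the same answer, whichever database type sits behind the service.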
17
Integration of Data and Filters
  • One has the OGSA-DAI Data repository interface
    combined with WSDL of the (Perl, Fortran,
    Python ...) filter
  • User only sees WSDL not data syntax
  • Some non-trivial issues as to where the filtering
    compute power is
  • Microsoft says filter next to data

18
[Diagram of the areas feeding an Integrated CCE.
Labels: MultiScale; Load Balancing; Algorithms;
InfoGrid; Grid Portals; Parallel Computing;
Extended/Integrated VA PARK GEOFEST; Computer
Science; Large System Simulations; Visualization;
e-Science Collaboration Grid Infrastructure;
Modeling; Grid; General Complex Systems
Simulations; Clusters; Databases; Geology;
GeoInformatics; Other Fields X-Complexity;
Experiments; Field; Sensors/Satellites; Complex
Fluids; Stock Market; BioComplexity]
19
SERVOGrid Complexity Computing Environment
[Diagram labels: Users; CCE Control Portal
Aggregation; Middle Tier with XML Interfaces;
Application Service-1, -2, -3; XML Meta-data
Service; Complexity Simulation Service; Parallel
Simulation Service; Database Service; Compute
Service; Sensor Service; Visualization Service]
20
SERVOGrid Requirements
  • Seamless Access to Data repositories and large
    scale computers
  • Integration of multiple data sources including
    sensors, databases, file systems with analysis
    system
  • Including filtered OGSA-DAI
  • Rich meta-data generation and access with
    SERVOGrid specific Schema extending openGIS
    standards and using Semantic Grid
  • Portals with component model for user interfaces
    and web control of all capabilities
  • Collaboration to support world-wide work
  • Basic Grid tools: workflow and notification

21
[Diagram: layered Grid architecture. Labels, in the
order extracted: Portal such as Jetspeed; Hosting
Environment (twice); Grid Computing or Programming
Environments; Application/User Framework supporting
development and deployment of OGSI compliant AWS
(Application Web Services); Generic Application
Services; Web Services; OGSA Interoperability
Layer; Core Grid; Sophisticated System Services;
OGSA Interoperability Layer; Resource Grid Services
(e.g. DAI compliant database); Resources]
22
Taxonomy of Grid Operational Style
23
Paradigms, Protocols, Platforms and Hosting
  • We can start from the Web view where the basic
    Grid paradigm is
  • Meta-data rich Web Services communicating via
    messages
  • These have some basic support from some runtime
    such as .NET, Jini (pure Java), Apache Tomcat
    with Axis (Web Service toolkit), Enterprise
    JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit
    3)
  • These are the distributed equivalent of operating
    system functions as in UNIX Shell
  • Called Hosting Environment or platform

24
Permeating Principles and Policies
  • Meta-data rich Message-linked Web Services as the
    permeating paradigm
  • User Component Model such as Enterprise
    JavaBean (EJB) or .NET.
  • Service Management framework including a possible
    Factory mechanism
  • High level Invocation Framework describing how
    you interact with system components.
  • This could for example be used to allow the
    system to be built from either W3C or GGF style
    (OGSI) Web Services and to protect the user from
    changes in their specifications.
  • Security is a service but the need for fine grain
    selective authorization encourages
  • Policy context that sets the rules for each
    particular Grid.
  • Currently OGSA supports policies for routing,
    security and resource use.
  • The Grid Fabric or set of resources needs
    mechanisms to manage them. This includes
    automatic recording of meta-data and
    configuration of software.
  • Quality of service (QoS) for the Network and this
    implies performance monitoring and bandwidth
    reservation services.
  • Challenging as end-to-end and not just backbone
    QoS is needed.
  • Messaging systems like MQSeries from IBM provide
    robustness from asynchronous delivery and can
    abstract destination and allow customization of
    content such as converting between different
    interface specifications.
  • Messaging is built on transport mechanisms which
    can be used to support mechanisms to implement
    QoS and to virtualize ports

25
Virtualization
  • The Grid could and sometimes does virtualize
    various concepts
  • Location: URI (Universal Resource Identifier)
    virtualizes URL
  • Replica management (caching) virtualizes file
    location, generalized by the GriPhyN virtual data
    concept
  • Protocol: message transport and WSDL bindings
    virtualize transport protocol as a QoS request
  • P2P or Publish-subscribe messaging virtualizes
    matching of source and destination services
  • Semantic Grid virtualizes Knowledge as a
    meta-data query
  • Brokering virtualizes resource allocation
  • Virtualization implies references can be indirect

26
Interfaces and Functionality and Semantics I
  • The Grid platform tries to minimize detail in
    protocols and maximize detail in interfaces to
    enhance scaling
  • However rich meta-data and semantics are critical
    for correct and interesting operation
  • Put as much semantic interpretation as you can
    into specific services
  • Lack of Semantic interoperation is in fact the
    main weakness of today's Grids and Web services
  • Everything becomes a service whether system or
    application level
  • There are some very important Global Services
  • Discovery (look up) and Registration of service
    metadata
  • Workflow
  • MetaSchedulers

27
Interfaces and Functionality and Semantics II
  • There are many other generally important services
  • OGSA-DAI: the Database Service
  • Portal Service linked to by WSRP (Web services
    for Remote Portals)
  • Notification of events
  • Job submission
  • Provenance: interpret meta-data about history of
    data
  • File Interfaces
  • Sensor services (satellites ...)
  • Visualization
  • Basic brokering/scheduling

28
Categories of Worldwide Grid Services to be
exploited by SERVOGrid
  • 1) Types of Grid
  • R3
  • Lightweight
  • P2P
  • Federation and Interoperability
  • 2) Core Infrastructure and Hosting Environment
  • Service Management
  • Component Model
  • Service wrapper/Invocation
  • Messaging
  • 3) Security Services
  • Certificate Authority
  • Authentication
  • Authorization
  • Policy
  • 4) Workflow Services and Programming Model
  • Enactment Engines (Runtime)
  • Languages and Programming
  • Compiler
  • 7) Information Grid Services
  • OGSA-DAI/DAIT
  • Integration with compute resources
  • P2P and database models
  • 8) Compute/File Grid Services
  • Job Submission
  • Job Planning Scheduling Management
  • Access to Remote Files, Storage and Computers
  • Replica (cache) Management
  • Virtual Data
  • Parallel Computing
  • 9) Other services including
  • Grid Shell
  • Accounting
  • Fabric Management
  • Visualization Data-mining and Computational
    Steering
  • Collaboration
  • 10) Portals and Problem Solving Environments
  • 11) Network Services

29
Two-level Programming I
  • The paradigm implicitly assumes a two-level
    Programming Model
  • We make a Service (same as a distributed object
    or computer program running on a remote
    computer) using conventional technologies
  • C, Java or Fortran Monte Carlo module
  • Data streaming from a sensor or Satellite
  • Specialized (JDBC) database access
  • Such nuggets accept and produce data from users,
    files and databases
  • The Grid is built by coordinating such nuggets
    assuming we have solved problem of programming
    the nugget

30
Two-level Programming II
  • The Grid addresses the linkage and distribution
    of the nuggets, the only addition being runtime
    interfaces to the Grid as opposed to UNIX data
    streams
  • Familiar from use of UNIX Shell, Perl or Python
    scripts to produce real applications from core
    programs
  • Such interpretative environments are the single
    processor analog of Grid Programming and this
    tends to be called workflow
  • Workflow is the composition of multiple services
    (programs) together to make a new service
  • Includes Software Bus, Application
    Integration, Co-ordination Languages etc.
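The idea that workflow composes services into a new service can be shown with ordinary function composition. In this sketch the "services" are local Python functions standing in for remote nuggets; the names and the sample readings are invented.

```python
def compose(*services):
    """Chain services so each output feeds the next.
    The result is itself a service and can be composed again."""
    def composite(data):
        for service in services:
            data = service(data)
        return data
    return composite

# Hypothetical nuggets; in a Grid each would be a remote service.
def read_sensor(_request):
    return [3.0, 4.0, 100.0, 5.0]

def drop_outliers(values):
    return [v for v in values if v < 50.0]

def mean(values):
    return sum(values) / len(values)

cleaned_mean = compose(read_sensor, drop_outliers, mean)    # a new service
report = compose(cleaned_mean, lambda m: f"mean={m:.1f}")   # composed again
```

Calling `report(None)` runs the whole chain; the caller does not need to know it is built from four smaller pieces.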

31
Workflow
  • Workflow has at least 4 parts
  • Programming Environment typically GUI to drag
    and drop services and their linkages (familiar
    from AVS etc. which was workflow for
    visualization)
  • Language from XML to extended Python
  • Compiler converting Language into executable
  • Runtime controlling flow of information and
    notification events
  • Can use Python, Mathematica, Matlab, JavaSpaces,
    IBM BPEL4WS, DoE CCA etc.
  • We don't think current systems are very near
    what we will want, but expect much progress over
    the next 3 years and plenty of systems to work
    with
  • Metadata critical to tell you how to combine
    services in a sensible way so workflow engines
    must interface with metadata service
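Three of the four parts listed above (language, compiler, runtime) can be sketched with the standard library; the GUI is omitted. The XML vocabulary (`workflow`, `step`, `service`) and the two toy services are assumptions for this example and do not follow BPEL4WS or any other real workflow language.

```python
import xml.etree.ElementTree as ET

# "Language": a made-up XML description of a two-step workflow.
WORKFLOW_XML = """
<workflow>
  <step name="scale" service="scale" />
  <step name="shift" service="shift" />
</workflow>
"""

# Hypothetical services, looked up by name.
SERVICES = {"scale": lambda x: x * 2, "shift": lambda x: x + 1}

def compile_workflow(xml_text, services):
    """'Compiler': turn the XML description into an ordered list of callables."""
    plan = []
    for step in ET.fromstring(xml_text).iter("step"):
        plan.append((step.get("name"), services[step.get("service")]))
    return plan

def run(plan, value, notify):
    """'Runtime': pass data along the plan and emit one event per step."""
    for name, service in plan:
        value = service(value)
        notify(f"{name} done")
    return value

events = []
result = run(compile_workflow(WORKFLOW_XML, SERVICES), 5, events.append)
```

The `notify` callback is where a real runtime would hook into a notification service.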

32
Workflow, GCEs and Problem Solving Environments
(PSEs)
  • There is some confusion between fields of
    workflow (Grid Computing Environments GCE) and
    PSEs
  • To the extent that PSEs just allow manipulation
    of nuggets, they are indistinguishable from a
    domain specific GCE
  • They are distinct if they support intra nugget
    operations such as
  • Integration of mesh and simulation
  • Closely coupled code linkage
  • Generation of code from high level interface like
    Mathematica
  • Even in the latter case, a new generation of PSEs
    should be built with Grid architecture, e.g.
    message based and using Grid services like
    metadata and notification

33
Importance of Metadata Service: how should this
be implemented?
[Diagram labels: XML Meta-data Service; Selected
GeoInformatics Data; Tool MetaData; MultiScale
Ontologies; Job MetaData; Complexity Scripts;
Workflow; SERVOPSE Programs using CCEML (SERVOML);
SERVOGrid Complexity Simulation Service]
34
Metadata Approaches
  • Specialized services like UDDI and MDS (Globus)
  • Nobody likes UDDI
  • MDS uses LDAP
  • RGMA is MDS with a relational database backend
  • By hand as in current GEM Portal which is
    roughly same as using service stored SDEs
    (Service Data Elements) as in OGSI
  • Some new MDS coming from Globus GT3?
  • Current MDS has both a Schema (insufficient for
    us) and a database technology
  • Semantic Grid technologies
  • Some basic XML database (Oracle, Xindice ...)
  • If OGSA compliant (not defined yet), then
    doesn't matter that much

35
Workflow and SERVOGrid CCE
  • SERVOGrid should use workflow technology to
    support both
  • code and data coupling (DISLOC with SIMPLEX
    etc.)
  • Multiscale features
  • Implementing multiscale model requires
  • building Web services for each model,
  • describing each model with metadata and
  • Describing linkage of models (linkage of ports on
    web services)
  • And describing when to use which scale model
  • So workflow and multiscale depend on web services
    described by rich metadata
  • This analysis isn't correct if scales must be
    tightly coupled, as current workflow won't
    support this (CCA from DoE claims to address this
    but it is not clear if it is general)
  • We should focus on multiscale models with loose
    nugget coupling
  • Hopefully we will learn how to take same
    architecture, compile away inefficiencies and get
    high performance on tighter coupling than
    conventional distributed workflow
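The metadata needed for loosely coupled multiscale models can be sketched as a small catalogue: each model declares the scale range it is valid for and the ports it exposes, and the workflow uses that to pick a model and to check a linkage. The model names, scale ranges and port names below are invented for illustration.

```python
# Hypothetical metadata for two models at different scales.
MODELS = [
    {"name": "fault_patch", "min_km": 0.0, "max_km": 10.0,
     "inputs": ["stress"], "outputs": ["slip"]},
    {"name": "regional", "min_km": 10.0, "max_km": 1000.0,
     "inputs": ["slip"], "outputs": ["strain_field"]},
]

def select_model(length_km, models=MODELS):
    """Decide which scale model to use from its declared validity range."""
    for model in models:
        if model["min_km"] <= length_km < model["max_km"]:
            return model["name"]
    raise LookupError(f"no model covers {length_km} km")

def can_link(upstream, downstream, models=MODELS):
    """A linkage is valid if some output port of one model feeds an input
    port of the other."""
    by_name = {m["name"]: m for m in models}
    outputs = set(by_name[upstream]["outputs"])
    inputs = set(by_name[downstream]["inputs"])
    return bool(outputs & inputs)

small = select_model(2.0)
large = select_model(250.0)
```

Because both decisions are driven by metadata, adding a third scale means adding a catalogue entry, not changing the workflow code.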

36
Technologies under development at Indiana
  • Portal Infrastructure and Portlets integrating
    with rest of Globus/OGSA-DAI Community
  • Including job submission, management of modest
    meta-data and linkage to databases
  • Should package as application web service
    toolkit and test on ACES world wide iSERVOGrid
  • Some core portal Metadata (Semantic Grid)
    services
  • Messaging system between Web services that is
    useful for
  • Service Management/Autonomic Grids
  • Security
  • Notification service
  • Collaboration infrastructure and portlets

37
Web Services as a Portlet
  • Each Web Service naturally has a user interface
    specified as just another port
  • Customizable for universal access
  • This gives each Web Service a Portlet view
    specified (in XML as always) by WSRP (Web
    services for Remote Portals)
  • So component model for resources automatically
    gives a component model for user interfaces
  • When you build your application, you define the
    portlet at the same time

[Diagram labels]
  • Application as a WS: general application ports
    interface with other Web Services
  • User face of Web Service: WSRP ports define the
    WS as a portlet
  • Web Services have other ports (Grid Service) to
    be OGSI compliant
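A rough sketch of a service that carries its own user-interface port alongside its application port. This does not use WSRP; the `StrainService` class, its methods and the dictionary returned by `user_face` are hypothetical stand-ins for the idea that the portal can be assembled from what each service declares.

```python
class StrainService:
    """Hypothetical application service with two kinds of port."""

    def compute(self, displacement_mm, length_mm):
        """Application port: used by other services."""
        return displacement_mm / length_mm

    def user_face(self):
        """User-interface port: describes how a portal should present this
        service."""
        return {"title": "Strain",
                "inputs": ["displacement_mm", "length_mm"],
                "action": "compute"}

def build_portal(services):
    """Assemble one portlet description per service from its user face."""
    return [svc.user_face() for svc in services]

def submit(service, portlet, form_values):
    """Route a filled-in portlet form to the service's application port."""
    args = [form_values[name] for name in portlet["inputs"]]
    return getattr(service, portlet["action"])(*args)

service = StrainService()
portal = build_portal([service])
answer = submit(service, portal[0], {"displacement_mm": 5.0, "length_mm": 1000.0})
```

The portal code never names `compute` directly; it learns the action and its inputs from the service's own description.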
38
Online Knowledge Center built from Portlets
A set of UI components
  • Web Services provide a component model for the
    middleware (see large common component
    architecture effort in Dept. of Energy)
  • Should match each WSDL component with a
    corresponding user interface component
  • Thus one must use a component model for the
    portal with again an XML specification (portalML)
    of portal component

39
Sample page with several portlets: proxy
credential manager, submission, monitoring
40
Administer Grid Portal
Provide information about application and host
parameters
Select application to edit