Title: eScience eBusiness eGovernment and their Technologies Introduction
1 e-Science, e-Business, e-Government and their
Technologies: Introduction
- Bryan Carpenter, Geoffrey Fox, Marlon Pierce
- Pervasive Technology Laboratories
- Indiana University Bloomington IN 47401
- January 12 2004
- dbcarpen_at_indiana.edu
- gcf_at_indiana.edu
- mpierce_at_cs.indiana.edu
- http://www.grid2004.org/spring2004
2 Course Topics: Background/Core
- Java Programming
- We will assume basic Java programming proficiency
- We will cover Java client/server, three-tiered
and network programming.
- Ancillary but interesting Java topics to be
covered include Apache Ant, XML-Beans, and Java
Message Service
- XML and XML Schema
- We will provide introductory material.
- Necessary to understand Web Service standards
- XML/Java Programming
- XML Databases (Xindice, Sleepycat)
- XPath, XQuery
3 Course Topics: Web and Grid Services
- Overview Material
- Grid and Web Service Architectures
- Basic Web Service Standards
- WSDL, SOAP structure and definitions
- Building services in Java: Apache Axis
- Advanced Web Services: Emerging capabilities
- WS-ReliableMessaging, WS-Security, WS-Transaction
- Computational Grids
- Globus Toolkit 2
- Java COG Kit for Globus programming
- Grids Meet Web Services
- Open Grid Service Architecture/Infrastructure
- Implementations: GSX from Indiana University
- The Semantic Grid: Information Models for
Describing Resources
- RDF, DAML+OIL, and OWL
4 What are we doing
- This is a semester-long course on Grids (viewed
as technologies and infrastructure) and their
application mainly to science but also to
business and government
- We will assume a basic knowledge of the Java
language and then interweave 6 topic areas; the
first four cover technologies that will be used
by students
- Advanced Java including networking, Java Server
Pages and perhaps servlets
- XML: Specification, Tools, Linkage to Java
- Web Services: Basic Ideas, WSDL, Axis and Tomcat
- Grid Systems: GT3/Cogkit, Gateway, XSOAP, Portlet
- Advanced Technology Surveys: CORBA as history,
OGSA-DAI, security, Semantic Grid, Workflow
- Applications: Bioinformatics, Particle Physics,
Engineering, Crises, Computing-on-demand Grid,
Earth Science
5 Grid Computing: Making The Global Infrastructure
a Reality
- Based on work done in preparing book edited
with Fran Berman and Anthony J.G. Hey
- ISBN 0-470-85319-0
- Hardcover, 1080 Pages
- Published March 2003
- http://www.grid2002.org
6 Other
- See the webcast in an Oracle technology series:
http://webevents.broadcast.com/techtarget/Oracle/100303/index.asp?loc10
- See also the Gap Analysis:
http://grids.ucs.indiana.edu/ptliupages/publications/GapAnalysis30June03v2.pdf
- We can send you nicely printed versions of this
- The end of it is a good collection of references,
and it gives both a general survey of current
Grids and specific examples from the UK
- Appendix with more details is
http://grids.ucs.indiana.edu/ptliupages/publications/Appendix30June03.pdf
- See also GlobusWorld http://www.globusworld.org/
- and the Grid Forum http://www.gridforum.org
7 e-moreorlessanything and the Grid
- e-Business captures an emerging view of
corporations as dynamic virtual organizations
linking employees, customers and stakeholders
across the world.
- The growing use of outsourcing is one example
- e-Science is the similar vision for scientific
research with international participation in
large accelerators, satellites or distributed
gene analyses.
- The Grid integrates the best of the Web,
traditional enterprise software, high performance
computing and Peer-to-peer systems to provide the
information technology e-infrastructure for
e-moreorlessanything.
- A deluge of data of unprecedented and inevitable
size must be managed and understood.
- People, computers, data and instruments must be
linked.
- On demand assignment of experts, computers,
networks and storage resources must be supported
8 So what is a Grid?
- Supporting human decision making with a network
of at least four large computers, perhaps six or
eight small computers, and a great assortment of
disc files and magnetic tape units - not to
mention remote consoles and teletype stations -
all churning away. (Licklider 1960)
- Coordinated resource sharing and problem solving
in dynamic multi-institutional virtual
organizations
- Infrastructure that will provide us with the
ability to dynamically link together resources as
an ensemble to support the execution of
large-scale, resource-intensive, and distributed
applications.
- Realizing the thirty-year dream of science fiction
writers who have spun yarns featuring worldwide
networks of interconnected computers that behave
as a single entity.
9 What is a High Performance Computer?
- We might wish to consider three classes of
multi-node computers
- 1) Classic MPP with microsecond latency and
scalable internode bandwidth (tcomm/tcalc about 10
or so)
- 2) Classic Cluster, which can vary from
configurations like 1) to 3) but typically has
millisecond latency and modest bandwidth
- 3) Classic Grid or distributed systems of
computers around the network
- Latencies of inter-node communication are 100s of
milliseconds but can have good bandwidth
- All have same peak CPU performance but
synchronization costs increase as one goes from
1) to 3)
- Cost of system (dollars per gigaflop) decreases
by factors of 2 at each step from 1) to 2) to 3)
- One should NOT use classic MPP if class 2) or 3)
suffices unless some security or data issue
dominates over cost-performance
- One should not use a Grid as a true parallel
computer; it can link parallel computers
together for convenient access etc.
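A rough way to see why synchronization cost separates the three classes is sketched below. It assumes a very simple model that is not from the slides: every compute step ends with one synchronization costing one network latency. The 10 ms compute step and the function name `sync_efficiency` are illustrative assumptions; the latencies are the orders of magnitude quoted above.

```python
# Order-of-magnitude latencies for the three classes on the slide.
LATENCY_SECONDS = {
    "MPP": 1e-6,       # microsecond latency
    "Cluster": 1e-3,   # millisecond latency
    "Grid": 100e-3,    # 100s of milliseconds
}

T_CALC = 10e-3  # assumed: 10 ms of computation between synchronizations


def sync_efficiency(t_calc: float, latency: float) -> float:
    """Fraction of each step spent computing when every step ends
    with one synchronization that costs one network latency."""
    return t_calc / (t_calc + latency)


for name, latency in LATENCY_SECONDS.items():
    print(f"{name:8s} efficiency = {sync_efficiency(T_CALC, latency):.4f}")
```

Under these assumed numbers the cluster spends about 91% of its time computing and the Grid about 9%, which is one way to read the advice not to treat a Grid as a true parallel computer.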
10 e-Science
- e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it. This is a
major UK Program
- e-Science reflects growing importance of
international laboratories, satellites and
sensors and their integrated analysis by
distributed teams
- CyberInfrastructure is the analogous US initiative
- Grid Technology supports e-Science and
CyberInfrastructure. It is software (middleware)
built on top of networks
11 Global Terabit Research Network
- The Grid software and resources run on top of
high performance global networks
12 USA Network
13 Terabit Networks
- Network performance will increase faster than
Moore's law, partly because optical fiber has
almost unlimited bandwidth and partly because
there are many old networks to be replaced
- Home dial-ups (56kbit) -> DSL/Cable Modem (2
megabits/sec) -> FTTP (Fiber to the Premise at
gigabit performance)
- 2006 Goal of Global Terabit Research Network:
International : National Backbone : Organization :
Optical Desktop : Copper Desktop is
1000 : 1000 : 100 : 10 : 1 Gigabit/sec
14 e-Business and (Virtual) Organizations
- Enterprise Grid supports information system for
an organization; includes university computer
center, (digital) library, sales, marketing,
manufacturing
- Outsourcing Grid links different parts of an
enterprise together (Gridsourcing)
- Manufacturing plants with designers
- Animators with electronic game or film designers
and producers
- Coaches with aspiring players (e-NCAA or e-NFL
etc.)
- Customer Grid links businesses and their
customers as in many web sites such as amazon.com
- e-Multimedia can use secure peer-to-peer Grids to
link creators, distributors and consumers of
digital music, games and films respecting rights
- Distance education Grid links teacher at one
place, students all over the place, mentors and
graders; shared curriculum, homework, live
classes
15 e-Defense and e-Crisis
- Grids support Command and Control and provide
Global Situational Awareness
- Link commanders and frontline troops to
themselves and to archival and real-time data;
link to what-if simulations
- Dynamic heterogeneous wired and wireless networks
- Security and fault tolerance essential
- System of Systems: Grid of Grids
- The command and information infrastructure of
each ship is a Grid; each fleet is linked
together by a Grid; the President is informed by
and informs the national defense Grid
- Grids must be heterogeneous and federated
- Crisis Management and Response enabled by a Grid
linking sensors, disaster managers, and first
responders with decision support
16 Classes of Computing Grid Applications
- Running Pleasingly Parallel Jobs as in United
Devices, Entropia (Desktop Grid) cycle stealing
systems
- Can be managed (inside the enterprise as in
Condor) or more informal (as in SETI@Home)
- Computing-on-demand in Industry where jobs
spawned are perhaps very large (SAP, Oracle ...)
- Support distributed file systems as in Legion
(Avaki), Globus with (web-enhanced) UNIX
programming paradigm
- Particle Physics will run some 30,000
simultaneous jobs this way
- Pipelined applications linking data/instruments,
compute, visualization
- Seamless Access where Grid portals allow one to
choose one of multiple resources with a common
interface
17 Utility Computing
- An important business application of Grids is
utility computing
- Namely support a pool of computers to be assigned
as needed to take up extra demand
- Pool shared between multiple applications
- This application is common in academia where
different simulations share resources
- Web Servers
- Financial Modeling
- Data-mining
- Simulation response to crisis like forest fire or
earthquake
- Architecture is Farm of Grid Services connected
to Internet, not cluster of computers connected to
each other
18 Resources-on-demand
- Computing-on-demand uses dynamically assigned
(shared) pool of resources to support excess
demand in flexible cost-effective fashion
(Figure labels: Static Assignment with redundancy; Dynamic on-demand Assignment)
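The difference between the two assignment styles can be illustrated with a small calculation. This is only a sketch: the demand figures, the application names and both function names are made up for illustration and do not come from the slides.

```python
# Hypothetical demand (in computers) per time slot for three
# applications that could share one pool.
demand = {
    "web_servers":        [4, 9, 5, 3],
    "financial_modeling": [8, 2, 3, 2],
    "crisis_simulation":  [1, 1, 10, 1],
}


def static_pool_size(demand_by_app):
    """Static assignment: each application owns enough machines
    for its own peak, so the individual peaks add up."""
    return sum(max(series) for series in demand_by_app.values())


def dynamic_pool_size(demand_by_app):
    """On-demand assignment: one shared pool sized for the largest
    total demand seen in any single time slot."""
    totals = [sum(slot) for slot in zip(*demand_by_app.values())]
    return max(totals)


print("static :", static_pool_size(demand))    # 9 + 8 + 10 = 27
print("dynamic:", dynamic_pool_size(demand))   # busiest slot: 5 + 3 + 10 = 18
```

The saving depends entirely on the peaks not coinciding; if every application peaked in the same slot, the two sizes would be equal.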
19 Some Important Styles of Grids
- Computational Grids were origin of concepts and
link computers across the globe; high latency
stops this from being used as parallel machine
- Knowledge and Information Grids link sensors and
information repositories as in Virtual
Observatories or BioInformatics
- More detail on next slide
- Education Grids link teachers, learners, parents
as a VO with learning tools, distant lectures
etc.
- e-Science Grids link multidisciplinary
researchers across laboratories and universities
- Community Grids focus on Grids involving large
numbers of peers rather than focusing on linking
major resources; links Grid and Peer-to-peer
network concepts
- Semantic Grid links Grid and AI community with
Semantic Web (ontology/meta-data enriched
resources) and Agent concepts
20 Information/Knowledge Grids
- Distributed (10s to 1000s) data sources
(instruments, file systems, curated databases ...)
- Data Deluge: 1 (now) to 100s of petabytes/year
(2012)
- Moore's law for Sensors
- Possible filters assigned dynamically (on-demand)
- Run image processing algorithm on telescope image
- Run Gene sequencing algorithm on compiled data
- Needs decision support front end with what-if
simulations
- Metadata (provenance) critical to annotate data
- Integrate across experiments as in
multi-wavelength astronomy
Data Deluge comes from pixels/year available
21 2.4 Petabytes Today
22 SERVOGrid: Solid Earth Research Virtual
Observatory will link Australia, Japan, USA
23 SERVOGrid Requirements
- Seamless Access to Data repositories and large
scale computers
- Integration of multiple data sources including
sensors, databases, file systems with analysis
system
- Including filtered OGSA-DAI (Grid database
access)
- Rich meta-data generation and access with
SERVOGrid specific Schema extending openGIS
(Geography as a Web service) standards and using
Semantic Grid
- Portals with component model for user interfaces
and web control of all capabilities
- Collaboration to support world-wide work
- Basic Grid tools: workflow and notification
24 DAME
- Rolls Royce and UK e-Science Program: Distributed
Aircraft Maintenance Environment
(Figure labels: In flight data; 5000 engines; Gigabyte per aircraft per engine per transatlantic flight; Global Network such as SITA; Ground Station; Airline; Engine Health (Data) Center; Maintenance Centre; Internet, e-mail, pager)
25 NASA Aerospace Engineering Grid
26 Virtual Observatory Astronomy Grid: Integrate
Experiments
(Figure panels: Radio; Far-Infrared; Visible; Dust Map; Visible X-ray; Galaxy Density Map)
27 e-Chemistry Laboratory: Experiments-on-demand
(Figure labels: Grid-enabled Output Streams; Grid Resources)
28 CERN LHC Data Analysis Grid
29 Typical Grid Architecture
(Figure labels: User Services; Core Grid)
30 Sources of Grid Technology
- Grids support distributed collaboratories or
virtual organizations integrating concepts from
- The Web
- Agents
- Distributed Objects (CORBA, Java/Jini, COM)
- Globus, Legion, Condor, NetSolve, Ninf and other
High Performance Computing activities
- Peer-to-peer Networks
- With perhaps the Web and P2P networks being the
most important for Information Grids and Globus
for Compute Grids
31 The Essence of Grid Technology?
- We will start from the Web view and assert that
the basic paradigm is
- Meta-data rich Web Services communicating via
messages
- These have some basic support from some runtime
such as .NET, Jini (pure Java), Apache
Tomcat+Axis (Web Service toolkit), Enterprise
JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit
3)
- These are the distributed equivalent of operating
system functions as in UNIX Shell
- Called Hosting Environment or platform
- W3C standard WSDL defines IDL (Interface
standard) for Web Services
32 Meta-data
- Meta-data is usually thought of as data about
data
- The Semantic Web is at its simplest considered as
adding meta-data to web pages
- For example, the hospital web-page has meta-data
telling you its location, phone-number,
specialties which can be used to automate
Google-style searches to allow planning of
disease/accident treatment from web
- Modern trend (Semantic Grid) is meta-data about
web-services e.g. specify details of interface
and usage
- Such as that a bioinformatics service is free or
bandwidth input is of limited amount
- Provenance (history and ownership of data) very
important
33 A typical Web Service
- In principle, services can be in any language
(Fortran .. Java .. Perl .. Python) and the
interfaces can be method calls, Java RMI
Messages, CGI Web invocations, or totally compiled
away (inlining)
- The simplest implementations involve XML messages
(SOAP) and programs written in net friendly
languages like Java and Python
(Figure labels: Payment, Credit Card; Warehouse, Shipping control; Web Services; WSDL interfaces)
34 Services and Distributed Objects
- A web service is a computer program running on
either the local or remote machine with a set of
well defined interfaces (ports) specified in XML
(WSDL)
- Web Services (WS) have many similarities with
Distributed Object (DO) technology but there are
some (important) technical and religious points
(not easy to distinguish)
- CORBA, Java, COM are typical DO technologies
- Agents are typically SOA (Service Oriented
Architecture)
- Both involve distributed entities but Web
Services are more loosely coupled
- WS interact with messages; DO with RPC (Remote
Procedure Call)
- DO have factories; WS manage instances
internally and interaction-specific state is not
exposed and hence need not be managed
- DO have explicit state (stateful services); WS
use context in the messages to link interactions
(stateful interactions)
- Claim: DOs do NOT scale; WS build on experience
(with CORBA) and do scale
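The idea of "context in the messages" linking interactions can be sketched as follows. This is a minimal illustration, not how any particular Web Service toolkit does it: the `order_service` function, the message fields and the item names are all hypothetical.

```python
def order_service(message: dict) -> dict:
    """A hypothetical service that keeps no per-client state.
    Everything it needs arrives in the message's 'context'."""
    context = dict(message.get("context", {}))   # copy, never mutate the input
    items = list(context.get("items", []))
    if message["action"] == "add":
        items.append(message["item"])
    context["items"] = items
    return {"status": "ok", "context": context}


# The client threads the returned context into its next message,
# so the interaction is stateful while the service is not.
reply1 = order_service({"action": "add", "item": "book"})
reply2 = order_service({"action": "add", "item": "cd",
                        "context": reply1["context"]})
print(reply2["context"]["items"])
```

Because the service holds nothing between calls, any replica of it could handle the second message, which is one reading of the scaling claim above.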
35 Details of Web Service Protocol Stack
- UDDI finds where programs are
- remote (distributed) programs are just Web
Services
- (not a great success)
- WSFL links programs together (under revision as
BPEL4WS)
- WSDL defines interface (methods, parameters, data
formats)
- SOAP defines structure of message including
serialization of information
- HTTP is negotiation/transport protocol
- TCP/IP is layers 3-4 of OSI
- Physical Network is layer 1 of OSI
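To make the SOAP layer of this stack concrete, the sketch below builds and then parses a SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree`. The operation `getShippingStatus`, its `orderId` parameter and the `urn:example:shipping` namespace are hypothetical; in a real service they would be taken from that service's WSDL, and the bytes would normally travel in an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:shipping"                         # hypothetical service namespace


def build_request(order_id: str) -> bytes:
    """Serialize a call to a hypothetical getShippingStatus operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}getShippingStatus")
    ET.SubElement(call, f"{{{APP_NS}}}orderId").text = order_id
    return ET.tostring(envelope, encoding="utf-8")


def read_order_id(message: bytes) -> str:
    """Receiving side: parse the envelope and pull out the parameter."""
    root = ET.fromstring(message)
    path = f"{{{SOAP_NS}}}Body/{{{APP_NS}}}getShippingStatus/{{{APP_NS}}}orderId"
    return root.find(path).text


request = build_request("A-1001")
print(request.decode("utf-8"))
print(read_order_id(request))
```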
36 Classic Grid Architecture
(Figure labels: Resources; Content Access; Composition; Middle Tier: Brokers, Service Providers; Netsolve; Security; Collaboration; Computing; Middle Tier becomes Web Services; Clients; Users and Devices)
37 Grid Services for the Education Process
- Learning Object XML standards already exist
- Registration
- Performance (grading)
- Authoring of Curriculum
- Online laboratories for real and virtual
instruments
- Homework submission
- Quizzes of various types (multiple choice, random
parameters)
- Assessment data access and analysis
- Synchronous Delivery of Curricula including
Audio/Video Conferencing and other synchronous
collaborative tools as Web Services
- Scheduling of courses and mentoring sessions
- Asynchronous access, data-mining and knowledge
discovery
- Learning Plan agents to guide students and
teachers
38 Grid Learning Model
- Education and Research Grids share some services
both for content and process
- For example collaboration services are largely
identical
- Research will use much larger simulation engines
to get high resolution results
- Maybe a researcher uses a CAVE to visualize;
education a Macintosh
- Both can share data services but run them through
different filters to select for precision
(research) or pedagogical value (education)
- Education has digital textbook frontend to
resources of the research Grid
- Both use same workflow technologies to link
services together
39 (No Transcript)
40 Some Observations
- Traditional Grids manage and share
asynchronous resources in a rather centralized
fashion
- Peer-to-peer networks are just like Grids with
different implementations of message-based
services like registration and look-up
- Collaboration systems like WebEx/Placeware
(Application sharing) or Polycom (audio/video
conferencing) can be viewed as Grids
- Computers are fast and getting faster. One can
afford many strategies that used to be
unrealistic including rich (usually XML based)
messaging
- Web Services interact with messages
- Everything (including applications like
PowerPoint) will be a Web Service?
- Grids, P2P Networks, Collaborative Environments
are (will be) managed message-linked Web Services
41 Peer to Peer Grid
(Figure labels: Peers; Service Facing Web Service Interfaces; User Facing Web Service Interfaces; Peer to Peer Grid: a democratic organization)
42 System and Application Services?
- There are generic Grid system services: security,
collaboration, persistent storage, universal
access
- OGSA (Open Grid Service Architecture) is
implementing these as extended Web Services
- An Application Web Service is a capability used
either by another service or by a user
- It has input and output ports; data is from
sensors or other services
- Consider Satellite-based Sensor Operations as a
Web Service
- Satellite management (with a web front end)
- Each tracking station is a service
- Image Processing is a pipeline of filters which
can be grouped into different services
- Data storage is an important system service
- Big services built hierarchically from basic
services
- Portals are the user (web browser) interfaces to
Web services
43 Satellite Science Grid Environment
44 What is Happening?
- Grid ideas are being developed in (at least) two
communities
- Web Service: W3C, OASIS
- Grid Forum (High Performance Computing,
e-Science)
- Service Standards are being debated
- Grid Operational Infrastructure is being deployed
- Grid Architecture and core software being
developed
- Particular System Services are being developed
centrally; OGSA framework for this in
- Lots of fields are setting domain specific
standards and building domain specific services
- There is a lot of hype
- Grids are viewed differently in different areas
- Largely computing-on-demand in industry (IBM,
Oracle, HP, Sun)
- Largely distributed collaboratories in academia
45 OGSA, OGSI and Hosting Environments
- Start with Web Services in a hosting environment
- Add OGSI to get a Grid service and a component
model
- Add OGSA to get Interoperable Grid, correcting
differences in base platform and adding key
functionalities
46 Technical Activities of Note
- Look at different styles of Grids such as
Autonomic (Robust Reliable Resilient)
- New Grid architectures hard due to investment
required
- Critical Services Such as
- Security: build message based not connection
based
- Notification: event services
- Metadata: Use Semantic Web, provenance
- Databases and repositories: instruments, sensors
- Computing: Submit job, scheduling, distributed
file systems
- Visualization, Computational Steering
- Fabric and Service Management
- Network performance
- Program the Grid: Workflow
- Access the Grid: Portals, Grid Computing
Environments
47 Issues and Types of Grid Services
- 1) Types of Grid
- R3
- Lightweight
- P2P
- Federation and Interoperability
- 2) Core Infrastructure and Hosting Environment
- Service Management
- Component Model
- Service wrapper/Invocation
- Messaging
- 3) Security Services
- Certificate Authority
- Authentication
- Authorization
- Policy
- 4) Workflow Services and Programming Model
- Enactment Engines (Runtime)
- Languages and Programming
- Compiler
- 7) Information Grid Services
- OGSA-DAI/DAIT
- Integration with compute resources
- P2P and database models
- 8) Compute/File Grid Services
- Job Submission
- Job Planning Scheduling Management
- Access to Remote Files, Storage and Computers
- Replica (cache) Management
- Virtual Data
- Parallel Computing
- 9) Other services including
- Grid Shell
- Accounting
- Fabric Management
- Visualization, Data-mining and Computational
Steering
- Collaboration
- 10) Portals and Problem Solving
Environments
- 11) Network Services
48 Technology Components of (Services in) a
Computing Grid
(Figure labels, numbered as on the slide)
- 1 Job Management Service (Grid Service Interface
to user or program client)
- 2 Schedule and control Execution
- 3 Access to Remote Computers
- 5 Data Transfer
- 6 File and Storage Access
- 7 Cache Data Replicas
- 8 Virtual Data
- 9 Grid MPI
- 10 Job Status
49 Approach
- Build on e-Science methodology and Grid
technology
- Science applications with multi-scale models,
scalable parallelism, data assimilation as key
issues
- Data-driven models for earthquakes, climate,
environment ..
- Use existing code/database technology
(SQL/Fortran/C) linked to Application Web/OGSA
services
- XML specification of models, computational
steering, scale supported at Web Service level
as one doesn't need high performance here
- Allows use of Semantic Grid technology
(Figure label: Application WS)
50 (Figure labels: User Services; Grid Computing
Environments; Core Grid)
51 Why we can dream of using HTTP and that slow stuff
- We have at least three tiers in computing
environment
- Client (user portal)
- Middle Tier (Web Servers/brokers)
- Back end (databases, files, computers etc.)
- In Grid programming, we use HTTP (and used to use
CORBA and Java RMI) in middle tier ONLY to
manipulate a proxy for real job
- Proxy holds metadata
- Control communication in middle tier only uses
metadata
- Real (data transfer) high performance
communication in back end
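A minimal sketch of the proxy idea follows. It assumes a hypothetical `JobProxy` record and a hypothetical control-message format; neither is taken from GT3 or any other toolkit, and the data URI is a made-up example. The point it illustrates is that the slow middle-tier protocol only ever touches a few bytes of metadata, never the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class JobProxy:
    """Middle-tier stand-in for a real job: metadata only, no payload."""
    job_id: str
    data_uri: str            # where the back end can find the real data
    size_bytes: int
    status: str = "submitted"
    history: list = field(default_factory=list)


def handle_control_message(proxy: JobProxy, message: dict) -> JobProxy:
    """Control traffic (e.g. arriving over HTTP) reads and updates
    only the proxy; the bulk data stays in the back end."""
    if message["op"] == "set_status":
        proxy.history.append(proxy.status)
        proxy.status = message["value"]
    return proxy


# Hypothetical 5-terabyte job: the proxy describes it without holding it.
proxy = JobProxy("job-42", "http://backend.example.org/run42.dat", 5 * 10**12)
handle_control_message(proxy, {"op": "set_status", "value": "running"})
print(proxy.status, proxy.history)
```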
52 Virtualization
- The Grid could and sometimes does virtualize
various concepts; should do more
- Location: URI (Universal Resource Identifier)
virtualizes URL (WS-Addressing goes further)
- Replica management (caching) virtualizes file
location, generalized by GriPhyN virtual data
concept
- Protocol: message transport and WSDL bindings
virtualize transport protocol as a QoS request
- P2P or Publish-subscribe messaging virtualizes
matching of source and destination services
- Semantic Grid virtualizes Knowledge as a
meta-data query
- Brokering virtualizes resource allocation
- Virtualization implies all references can be
indirect and needs powerful mapping (look-up)
services -- metadata
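Indirect references plus a look-up service can be sketched with replica management as the example. The catalog contents, the `lfn://` naming scheme, the host names and the lowest-latency selection rule are all assumptions made for illustration, not a description of any real replica catalog.

```python
# Hypothetical replica catalog: logical file name -> physical copies.
REPLICA_CATALOG = {
    "lfn://servo/gps-2003.dat": [
        {"url": "http://au.example.org/gps-2003.dat", "latency_ms": 180},
        {"url": "http://us.example.org/gps-2003.dat", "latency_ms": 40},
    ],
}


def resolve(logical_name: str, catalog=REPLICA_CATALOG) -> str:
    """Map a logical name to the physical copy with the lowest latency.
    Callers never hard-code a physical location."""
    replicas = catalog[logical_name]
    best = min(replicas, key=lambda r: r["latency_ms"])
    return best["url"]


print(resolve("lfn://servo/gps-2003.dat"))
```

The selection rule is where metadata earns its keep: changing it (cheapest copy, nearest copy, freshest copy) needs no change in the code that asks for the file.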
53 Integration of Data and Filters
- One has the OGSA-DAI Data repository interface
combined with WSDL of the (Perl, Fortran, Python
...) filter
- User only sees WSDL not data syntax
- Some non-trivial issues as to where the filtering
compute power is
- Microsoft says filter next to data
54 SERVOGrid Complexity Computing Environment
(Figure labels: Users; CCE Control Portal Aggregation; Middle Tier with XML Interfaces; Application Service-1, -2, -3; XML Meta-data Service; Complexity Simulation Service; Parallel Simulation Service; Database Service; Compute Service; Sensor Service; Visualization Service)
55 SERVOGrid (Complexity) Computing Model
- This Type of Grid integrates with Parallel
computing
- Multiple HPC facilities but only use one at a time
- Many simultaneous data sources and sinks
- Distributed Filters massage data for simulation
(Figure labels: OGSA-DAI Grid Services; Analysis, Control, Visualize; Grid; Data; Filter; HPC Simulation; Grid Data Assimilation; Other Grid and Web Services)
56 Two-level Programming I
- The paradigm implicitly assumes a two-level
Programming Model
- We make a Service (same as a distributed object
or computer program running on a remote
computer) using conventional technologies
- C, Java or Fortran Monte Carlo module
- Data streaming from a sensor or Satellite
- Specialized (JDBC) database access
- Such services accept and produce data from users,
files and databases
- The Grid is built by coordinating such services
assuming we have solved problem of programming
the service
57 Two-level Programming II
- The Grid is discussing the composition of
distributed services with the runtime interfaces
to Grid as opposed to UNIX pipes/data streams
- Familiar from use of UNIX Shell, PERL or Python
scripts to produce real applications from core
programs
- Such interpretative environments are the single
processor analog of Grid Programming
- Some projects like GrADS from Rice University are
looking at integration between service and
composition levels but dominant effort looks at
each level separately
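The two levels can be mimicked on a single processor, in the spirit of the shell-script analogy above. In this sketch the three "services" are ordinary Python functions standing in for remote services, the sample values and the threshold are made up, and `compose` is a hypothetical helper playing the role of the workflow level.

```python
from functools import reduce


# Level 1: individual services, written with conventional technology.
def sensor_service(_):
    """Stand-in for data streaming from a sensor (values are made up)."""
    return [3.0, 4.5, 100.0, 5.0]


def filter_service(samples):
    """Stand-in for a filter service: drop readings above a threshold."""
    return [s for s in samples if s < 50.0]


def analysis_service(samples):
    """Stand-in for an analysis module: report the mean."""
    return sum(samples) / len(samples)


# Level 2: composition, where each service's output feeds the next,
# much as a UNIX pipe links programs.
def compose(*services):
    return lambda data=None: reduce(lambda acc, svc: svc(acc), services, data)


workflow = compose(sensor_service, filter_service, analysis_service)
print(workflow())
```

On a Grid the calls inside `compose` would become messages to remote services, but the composition logic would keep the same shape.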
58 Conclusions
- Grids are inevitable and pervasive
- Can expect Web Services and Grids to merge with a
common set of general principles but different
implementations with different scaling and
functionality trade-offs
- e-Science will grow in importance as Science
grows as an international team sport; this affects
scientists and organizations
- Enough is known that one can start today
- We will be flooded with data, information and
purported knowledge
- One should be learning about Grids, understanding
relevant Web and Grid standards and developing
new domain specific standards
- Note many existing (standards) efforts assume
client-server and not a brokered service model;
these will need to change!