Title: The Grid: The Past, Present, and Possible Future
1. The Grid: The Past, Present, and Possible Future
- Mark Baker
- ACET, University of Reading
- Tel: +44 118 378 8615
- E-mail: Mark.Baker@computer.org
- Web: http://acet.rdg.ac.uk/mab
2. Outline
- Characterisation of the Grid.
- The evolution of the Grid.
- Convergence of technologies:
  - WS-RF.
- The UK e-Science programme.
- e-Science applications:
  - The GridCast Project,
  - OGSA-DAI.
- Summary and conclusions.
3. Characterisation of the Grid
- In 2001, Foster, Kesselman and Tuecke refined their original definition of a grid to:
  - "co-ordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations".
- This definition is the one most commonly used today to abstractly define a grid.
4. Characterisation of the Grid
- Foster later produced a three-part checklist that can be used to help decide whether a system really is a grid:
  - Co-ordinated resource sharing with no centralised point of control, where the users reside in different administrative domains:
    - If this is not true, it is probably not a grid system!
  - Standard, open, general-purpose protocols and interfaces:
    - If not, it is unlikely that the system components will be able to communicate or inter-operate, and it is likely that we are dealing with an application-specific system, and not the Grid.
  - Delivering non-trivial qualities of service:
    - Here we are considering how the components that make up a grid can be used in a co-ordinated way to deliver combined services, which are appreciably greater than the sum of the individual components.
    - These services may be associated with throughput, response time, mean time between failure, security, or many other facets.
5. Characterisation of the Grid
- From a commercial viewpoint, IBM defines a grid as:
  - "a standards-based application/resource sharing architecture that makes it possible for heterogeneous systems and applications to share compute and storage resources transparently".
6. What is not a Grid!
- A cluster, a network-attached storage device, a desktop PC, a scientific instrument, a network: these are not grids.
  - Each might be an important component of a grid, but by itself it does not constitute a grid.
- Screen savers/cycle stealers:
  - SETI@home, Folding@home, etc.,
  - Other application-specific distributed computing projects.
- Most of the current "Grid" providers:
  - Proprietary technology with a closed model of operation.
- Globus:
  - It is a toolkit used to build a system that might work as, or within, a grid.
- Sun Grid Engine, Platform LSF and related products.
- Almost anything referred to as a grid by marketeers!
7. Evolution of the Grid
- The early to mid-1990s mark the emergence of the early metacomputing or grid environments.
- Typically, the objective of these early metacomputing projects was to provide computational resources to a range of high-performance applications.
- Two representative projects, in the vanguard of this type of technology, were FAFNER and I-WAY, both circa 1995.
8. Convergence of Technologies
- Both projects attempted to provide metacomputing resources from opposite ends of the computing spectrum:
  - FAFNER was a Web-based effort to factor RSA-130 (an RSA Factoring Challenge number), capable of running on any workstation with more than 4 Mbytes of memory, and was aimed at a trivially parallel application.
  - I-WAY was a means of unifying the resources of large US supercomputing centres, and was targeted at high-performance (compute/data intensive) applications.
- Each project was in the vanguard of metacomputing and helped pave the way for many of the succeeding projects:
  - FAFNER was the forerunner of the likes of SETI@home, Folding@home and Distributed.net,
  - I-WAY was the same for Globus, Legion, and UNICORE.
9. Convergence of Technologies
- Since the emergence of the second generation of systems (e.g. Globus/Legion, circa 1995), a number of classes of wide-area systems have been developed:
  - Grid-based, aimed at HPC compute/data-intensive work, e.g. Globus/Legion/UNICORE,
  - Object-based, e.g. CORBA/CCA/Jini/Java-RMI,
  - Web-based, e.g. Javelin, SETI@home, Charlotte, Folding@home, ParaWeb, distributed.net,
  - Enterprise: bespoke systems, such as IBM's WebSphere, BEA's WebLogic, and Microsoft's .NET platform.
10. Convergence of Technologies
- Over the years, the developers in these four areas evolved their systems; there were many overlaps, various collaborations started, and to an extent a realisation was reached that a unified approach was needed to develop middleware to support wide-area applications.
- Unifying standards bodies helped this process, for example GGF, OASIS, W3C, and IETF.
- Convergence of WS, HPC, OO, SOA, etc.
- One result of this was the Open Grid Services Architecture (OGSA), which was announced at GGF4 in February 2002 and was declared the flagship architecture in March 2004.
  - OGSA was based on Web Services technologies.
11. The OGSA Architecture
12. Convergence of Technologies
- The OGSA document, first released at GGF11 in June 2004, gave current thinking on the required capabilities and was released in order to stimulate further discussion.
- Note: instantiations of OGSA depend on emerging specifications.
- Currently the OGSA document does not contain sufficient information to develop an actual implementation of an OGSA-based system.
- The first OGSA-based reference implementation was GT3 (OGSI), released in July 2003.
- Major problems were identified with OGSI; some were political and others were technical.
13. Convergence of Technologies
- In January 2004, a significant shift happened when WS-RF was announced.
- Problems had been identified with OGSI:
  - Re-implementation of a lot of layers that are already standardised in commodity Web Services, for example GWSDL,
  - It was felt that too much was packed into one specification,
  - It did not work well with existing tooling for Web Services,
  - Too OO!
- Whereas with WS-RF:
  - The new mechanisms build on top of existing WS standards and add a few more,
  - It basically rebuilds OGSI functionality using WS tooling, extending where necessary,
  - It is dependent on six new or emerging WS specifications!
14. Grid and Web Services: Convergence!
- (Diagram: the Grid track, GT1 to GT2 to OGSI, and the Web track, HTTP to WSDL/WS-* to WSDL 2 and WSDM, started far apart and converge at WSRF.)
- WSRF meant that the Grid and Web communities are moving forward on a common base!
15. WSRF Family of Specifications
- WSRF is a framework consisting of a number of specifications:
  - WS-ResourceProperties,
  - WS-ResourceLifetime,
  - WS-ServiceGroup,
  - WS-Notification,
  - WS-BaseFaults,
  - WS-RenewableReferences.
- Associated WS specifications:
  - WS-Addressing.
16. WS-RF Specifications
- WS-ResourceProperties:
  - A WS-Resource has zero or more properties expressible in XML, representing a view on the WS-Resource's state.
- WS-ResourceLifetime:
  - This specification standardises the means by which a WS-Resource can be destroyed, and by which its lifetime can be monitored and manipulated.
- WS-ServiceGroup:
  - This specification defines a means of representing and managing heterogeneous, by-reference collections of Web services.
- WS-BaseFaults:
  - Defines an XML Schema for base faults, along with rules for how this base fault type is used and extended by Web services.
17. WS-RF Specifications
- WS-Addressing:
  - Provides a mechanism to place the target, source and other important addressing information directly within a Web services message.
- WS-Notification:
  - WS-BaseNotification defines the interfaces for NotificationProducers and NotificationConsumers.
  - WS-BrokeredNotification defines the interface for the NotificationBroker, an intermediary that, among other things, allows the publication of messages from entities that are not themselves service providers.
  - WS-Topics defines a mechanism to organise and categorise items of interest for subscription, known as "topics".
- (A minimal conceptual sketch of these resource ideas follows below.)
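To make the ideas above more concrete, here is a minimal, self-contained sketch in plain Java of a stateful "WS-Resource" with properties that expose a view on its state, an explicit termination time, and simple change notifications. It only illustrates the concepts from WS-ResourceProperties, WS-ResourceLifetime and WS-Notification; it does not use the real WSRF or Globus APIs, and every class and method name in it is invented.

```java
// Conceptual sketch of the WS-Resource ideas described above (properties, an
// explicit lifetime, and notification of subscribers).  Plain Java for
// illustration only: this is NOT the WSRF or Globus API.
import java.time.Instant;
import java.util.*;
import java.util.function.Consumer;

public class WsResourceSketch {

    /** A stateful resource addressed alongside an otherwise stateless service. */
    static class WsResource {
        private final Map<String, String> properties = new HashMap<>();        // cf. WS-ResourceProperties
        private Instant terminationTime;                                        // cf. WS-ResourceLifetime
        private final List<Consumer<String>> subscribers = new ArrayList<>();   // cf. WS-Notification

        void setProperty(String name, String value) {
            properties.put(name, value);
            // Property changes are pushed to any subscribed consumers.
            subscribers.forEach(s -> s.accept(name + " changed to " + value));
        }

        Optional<String> getProperty(String name) {
            return Optional.ofNullable(properties.get(name));
        }

        /** Scheduled destruction, as WS-ResourceLifetime allows. */
        void setTerminationTime(Instant when) { terminationTime = when; }

        boolean isExpired() {
            return terminationTime != null && Instant.now().isAfter(terminationTime);
        }

        void subscribe(Consumer<String> consumer) { subscribers.add(consumer); }
    }

    public static void main(String[] args) {
        WsResource job = new WsResource();
        job.subscribe(msg -> System.out.println("notification: " + msg));
        job.setProperty("state", "Running");                     // property change triggers notification
        job.setTerminationTime(Instant.now().plusSeconds(3600)); // destroy after an hour
        System.out.println("state = " + job.getProperty("state").orElse("unknown"));
        System.out.println("expired? " + job.isExpired());
    }
}
```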
18. Emerging Grid Standards
- (See the April 2005 issue of IEEE Computer.)
19. Emerging Grid Standards (continued)
20. e-Science
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
- "e-Science will change the dynamics of the way science is undertaken."
  - John Taylor, Director General of Research Councils, Office of Science and Technology
21. The Drivers for e-Science
- More data:
  - Instrument resolution and laboratory automation,
  - Storage capacity and data sources.
- More computation:
  - The computation available to simulations is doubling every year.
- Faster networks:
  - Bandwidth,
  - The need to schedule.
- More inter-play and collaboration:
  - Between scientists, engineers, computer scientists, etc.,
  - Between computation and data.
22. The Drivers for e-Science
- Collaboration,
- Data deluge,
- Digital technology:
  - Ubiquity,
  - Cost reduction,
  - Performance increase.
- In summary:
  - Shared data, information and computation by geographically dispersed communities.
23. The UK e-Science Programme
- First Phase (2001-2004):
  - Application Projects: £74M, all areas of science and engineering.
  - Core Programme: £15M research infrastructure, £40M collaborative industrial projects.
- Second Phase (2003-2006):
  - Application Projects: £96M, all areas of science and engineering.
  - Core Programme: £16M research infrastructure, DTI Technology Fund.
24. The UK e-Science Programme
- An exciting portfolio of Research Council e-Science projects.
- Beginning to see the e-Science infrastructure deliver some early wins in several areas:
  - Astronomy, chemistry, bioinformatics, engineering, environment, healthcare, etc.
- The UK is unique in its strong industrial component:
  - Over 60 UK companies contributing over £30M,
  - Engineering, pharmaceutical, petrochemical and IT companies, commerce, media.
25. And the future
- Grid Operations Centre, National Grid Service and AAA services,
- Open Middleware Infrastructure Institute,
- National e-Science Institute,
- Digital Curation Centre,
- International standards activity,
- Needs continued support from the Research Councils, with identifiable e-Science funding lines post-2006.
26. e-Science Case Studies
- The GridCast Project: Grid-based Broadcast Infrastructures
  - http://www.qub.ac.uk/escience
27. The Grid Scenario: The BBC Nations (BBC NI, Scotland and Wales)
- The focus of the project is the distribution of stored media files and their management across multiple sites.
- BBC Nations provide customised services in each nation.
- Television programmes are distributed to the BBC Nations from BBC Network (London) using dedicated leased ATM circuits.
28. Grid Infrastructure
- Technical:
  - High-bandwidth network connections inter-connect broadcast locations,
  - Network bandwidth means geography is less of an issue.
- Organisational:
  - Less centralised.
29. Overview
- The aim was to develop a baseline media grid to support a broadcaster:
  - Manage distributed collections of stored media,
  - Prototype security and access mechanisms,
  - Integrate processing and technical resources,
  - Integrate with media standards and hardware.
- To analyse Quality of Service issues:
  - Analyse remote content distribution infrastructures,
  - Analyse remote service provision.
- To analyse reactivity, reliability and resilience issues in a grid-based broadcast infrastructure.
30. Characteristics
- Stored media files are gigabytes in size, and growing:
  - One hour of content is roughly 200 Gbytes; distribution amounts to around 1 petabyte per year (see the arithmetic sketch below).
- Management and distribution is a significant technical challenge.
- Metadata, which includes location, timings, artists and storage formats, is an integral part of the broadcast structure.
- Content is a valuable commodity: access, modification and copying must be controlled.
- High levels of quality are required.
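The 200 Gbytes/hour and 1 petabyte/year figures above can be sanity-checked with a quick back-of-the-envelope calculation. The small Java sketch below (decimal units and a 365-day year assumed; the implied hours-per-day value is derived here, not a quoted BBC figure) shows the two numbers are consistent with distributing roughly 13-14 hours of content per day.

```java
// Back-of-the-envelope check of the figures on this slide (decimal units assumed):
// at ~200 GB per broadcast hour, how many hours per day of distributed content
// would add up to ~1 PB per year?
public class MediaVolumeCheck {
    public static void main(String[] args) {
        double gbPerHour = 200.0;                 // ~200 GB for one hour of stored media
        double petabyteInGb = 1_000_000.0;        // 1 PB expressed in GB (decimal)

        double hoursPerYear = petabyteInGb / gbPerHour;  // hours of content per year
        double hoursPerDay = hoursPerYear / 365.0;       // implied daily distribution

        System.out.printf("Hours of content per year: %.0f%n", hoursPerYear);  // ~5000
        System.out.printf("Implied hours per day:     %.1f%n", hoursPerDay);   // ~13.7
    }
}
```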
31. A Virtualised Infrastructure
- (Diagram of the virtualised broadcast infrastructure; services such as "Sound Improvement" are shown.)
32. Model Grid Service Operation
- A schedule is registered with the (network) schedule management service.
- The schedule is automatically distributed to the (nation) schedule management component.
- The local controller receives notification of schedule availability.
- The Nation Controller registers the schedule with local (nation) schedule management.
- Transport services develop a transport plan for content movement.
- The scheduled transport service moves content as defined in the transport plan.
- (A sketch of this flow follows below.)
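The sketch below models the six-step flow above as plain Java objects so the sequence is easier to follow: a network-level schedule manager distributes a schedule, each nation controller is notified, builds a transport plan, and "moves" the content. All class, method and programme names are invented for illustration and bear no relation to the actual GridCast implementation.

```java
// Minimal sketch of the GridCast-style schedule/transport flow described above.
// All names here are invented; this is not the GridCast API.
import java.util.ArrayList;
import java.util.List;

public class ScheduleFlowSketch {

    record Schedule(String id, List<String> programmeIds) {}
    record TransportPlan(String scheduleId, List<String> transfers) {}

    /** Network-level schedule management: accepts a schedule and notifies nations. */
    static class NetworkScheduleManager {
        private final List<NationController> nations = new ArrayList<>();

        void register(NationController nation) { nations.add(nation); }

        void registerSchedule(Schedule s) {
            // Steps 1-3: distribute the schedule and notify each nation's controller.
            for (NationController n : nations) n.onScheduleAvailable(s);
        }
    }

    /** Nation-level controller: registers the schedule locally and plans transport. */
    static class NationController {
        private final String nation;

        NationController(String nation) { this.nation = nation; }

        void onScheduleAvailable(Schedule s) {
            // Step 4: register with local schedule management (kept implicit here).
            // Step 5: develop a transport plan for the content the schedule needs.
            List<String> transfers = s.programmeIds().stream()
                    .map(p -> "fetch " + p + " to " + nation).toList();
            TransportPlan plan = new TransportPlan(s.id(), transfers);
            // Step 6: the scheduled transport service moves content per the plan.
            plan.transfers().forEach(t -> System.out.println("[" + nation + "] " + t));
        }
    }

    public static void main(String[] args) {
        NetworkScheduleManager network = new NetworkScheduleManager();
        network.register(new NationController("BBC NI"));
        network.register(new NationController("BBC Scotland"));
        network.registerSchedule(new Schedule("evening-schedule",
                List.of("prog-001", "prog-002")));
    }
}
```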
33. Broadcast grid issues
- Business change:
  - A revised organisational model (services and resources),
  - Each broadcast location gains control; there is no single network schedule.
- Resilience:
  - Resource sharing and no single programme repository,
  - A BBC Nation can be anywhere!
- Reliability:
  - Use resources available in other BBC sites or from third-party suppliers.
- Cost:
  - Better use of resources and less need for backup resources,
  - Less dependence on particular vendors or suppliers.
- Customisation:
  - Schedule, local resources, local capabilities.
- Interoperability:
  - The business model facilitates sharing with other broadcasters.
34. GridCast: A Summary
- Television programme distribution:
  - Using a grid architecture to distribute programmes between broadcast sites,
  - Concentrating initially on recorded material.
- Television programme production:
  - Using a grid architecture to monitor and facilitate programme production.
- Television production technical assets:
  - Using a grid architecture to facilitate access to, and use of, broadcasting resources in television programme production.
35. OGSA Data Access and Integration (OGSA-DAI)
- Middleware for distributed data access over the Grid.
- UK e-Science: Edinburgh, Manchester and Newcastle.
- Industry partners: IBM, Oracle and Microsoft.
- (Diagram: distributed query/SQL access to DBMS and XML data sources through OGSA-DAI, layered over OGSA/WSRF and TCP/IP. A hedged client-side sketch of this pattern follows below.)
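As an illustration of the client-side pattern the diagram implies, where a client submits query "activities" to a remote data service rather than opening a direct database connection, here is a small conceptual sketch. It is not the real OGSA-DAI client API; the interface, the endpoint URL and the example table are all invented for illustration.

```java
// Conceptual sketch only: this is NOT the real OGSA-DAI client API.  It simply
// illustrates the pattern on this slide: a client talks to a remote data service
// (exposed over OGSA/WSRF and TCP/IP) instead of opening a direct JDBC/native
// connection to each underlying DBMS.  All names below are invented.
import java.util.List;
import java.util.Map;

public class DataServiceSketch {

    /** A remote data resource (relational or XML) exposed through the middleware. */
    interface GridDataService {
        /** Run a query "activity" on the remote resource and return rows as maps. */
        List<Map<String, Object>> perform(String queryActivity);
    }

    /** Toy stand-in for binding to a remote service endpoint. */
    static GridDataService connect(String serviceUrl) {
        System.out.println("Connecting to data service at " + serviceUrl);
        // A real implementation would send the activity as a SOAP message;
        // here we just return a canned row so the sketch runs on its own.
        return query -> List.of(Map.of("object_id", "NGC 4526", "magnitude", 10.7));
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; a real deployment would publish its own URL.
        GridDataService sky = connect("http://example.org/dai/AstronomyDataService");
        List<Map<String, Object>> rows =
                sky.perform("SELECT object_id, magnitude FROM observations WHERE magnitude < 12");
        rows.forEach(System.out::println);
    }
}
```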
36. OGSA-DAI Projects
- OGSA-DAI is one of the Grid Middleware Centre projects.
- A collaboration between:
  - EPCC,
  - IBM (and Oracle in phase 1),
  - National e-Science Centre,
  - Manchester University,
  - Newcastle University.
- Project funding:
  - OGSA-DAI, 2002-03: £3.3 million from the UK Core e-Science funding programme,
  - DAIT (DAI Two), 2003-06: £1.3 million from the UK e-Science Core Programme II.
- "OGSA-DAI" is a trademark.
- Funded by the UK's Department of Trade and Industry and the Engineering and Physical Sciences Research Council as part of the e-Science Core Programme.
37. OGSA-DAI User Project Classification
- (Diagram: user projects grouped around OGSA-DAI by area: physical sciences, biological sciences, computer sciences, and commercial applications.)
38. Example Projects Using OGSA-DAI
- Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
- N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-8080)
- BioSimGrid (http://www.biosimgrid.org/)
- AstroGrid (http://www.astrogrid.org/)
- BioGrid (http://www.biogrid.jp/)
- GEON (http://www.geongrid.org/)
- OGSA-DAI (http://www.ogsadai.org.uk)
- eDiaMoND (http://www.ediamond.ox.ac.uk/)
- OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
- FirstDig (http://www.epcc.ed.ac.uk/firstdig/)
- GeneGrid (http://www.qub.ac.uk/escience/projects.phpgenegrid)
- INWA (http://www.epcc.ed.ac.uk/)
- myGrid (http://www.mygrid.org.uk/)
- ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
- IU RGRBench (http://www.cs.indiana.edu/plale/projects/RGR/OGSA-DAI.html)
39. The FirstDIG Project
- The FirstDIG (First Data Investigation on the Grid) project deployed OGSA-DAI within the First South Yorkshire bus operational environment:
  - First plc is the UK's largest public transport operator,
  - Within their UK bus operations they have a huge range of data sources: vehicle mileage, fuel consumption, maintenance records, revenue, reliability, etc.
- A generic Grid Data Service Browser has been built and used to interrogate and combine data from OGSA-DAI enabled data sources to answer business questions posed by First South Yorkshire (a hedged sketch of such a combined query follows below).
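As a hedged illustration of the kind of question such a browser might answer by combining two OGSA-DAI enabled sources, the sketch below joins (in memory) a mileage result set with a fuel-consumption result set to derive fuel economy per vehicle. The table-like data, vehicle ids and figures are entirely made up; the real FirstDIG schemas and queries are not reproduced here.

```java
// Hedged illustration of combining data from two separate grid data sources
// (one holding mileage records, one fuel records) to answer a business question:
// fuel economy per vehicle.  All identifiers and numbers are hypothetical.
import java.util.Map;

public class FuelEconomySketch {
    public static void main(String[] args) {
        // Pretend these two maps are result sets returned by two different data services.
        Map<String, Double> milesByVehicle  = Map.of("V101", 1250.0, "V102", 980.0);
        Map<String, Double> litresByVehicle = Map.of("V101", 520.0,  "V102", 450.0);

        // "Join" the two result sets on vehicle id and derive miles per litre.
        milesByVehicle.forEach((vehicle, miles) -> {
            Double litres = litresByVehicle.get(vehicle);
            if (litres != null && litres > 0) {
                System.out.printf("%s: %.2f miles/litre%n", vehicle, miles / litres);
            }
        });
    }
}
```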
40. Summary
- The e-Science programme has pump-primed the take-up of the Grid in the UK.
- The programme is perceived as being a great success, giving the UK a lead in e-Science.
- It has not been without its problems, not least of which were the move to WSRF and the take-up of the various WS specifications.
- Output from the programme has led to a number of other projects that will address the current gaps in grid technologies.
- New infrastructure-related funding (JISC) supports this by implementing and deploying the technologies (VREs).
- All the projects are collaborations between academia and industry.
41. Some Further Work!
- Robust, reliable and inter-operable middleware that can scale to support a global infrastructure:
  - The UK OMII is meant to be hardening existing software.
- Funding for implementation and deployment, rather than just research:
  - UK JISC for academia,
  - UK DTI for commerce/industry.
- Security and trust mechanisms.
- Take-up of Semantic Web technologies to speed the automation of component interaction.
- Open-source software and agreed standards:
  - GGF, OASIS, EGA, IETF, W3C, etc.,
  - We desperately need to standardise the standards!
- Educational aspects: undergraduate and other courses.
42. Summary: Successful Grid Areas
- Distributed database integration: intelligent queries and data-mining across heterogeneous data sources.
- Parameter sweeps: run sequential tasks many times with different input data (see the sketch after this list).
- Coupled simulations: the output of one simulation is the input of another.
- Distributed resources: sensors and equipment, processing, data silos, and visualisation at different remote sites.
- Application Service Provision: services on demand!
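As a minimal sketch of what a parameter sweep looks like in code, the snippet below runs the same (stub) simulation over a list of parameter values, using a local thread pool as a stand-in for grid workers. In a real sweep each run would be submitted as an independent job to a scheduler such as Sun Grid Engine or Platform LSF; the stub computation and parameter values here are arbitrary.

```java
// Minimal parameter-sweep sketch: the same task is run many times with different
// input parameters.  The runs are independent, so they can be farmed out to
// whatever workers (cluster nodes, grid resources) are available; here a local
// thread pool stands in for those workers.
import java.util.List;
import java.util.concurrent.*;

public class ParameterSweepSketch {
    // The simulation stub: in reality this would launch an external job.
    static double simulate(double parameter) {
        return Math.sin(parameter) * parameter;   // placeholder computation
    }

    public static void main(String[] args) throws Exception {
        List<Double> parameters = List.of(0.1, 0.5, 1.0, 2.0, 5.0);
        ExecutorService pool = Executors.newFixedThreadPool(4);   // stand-in for grid workers

        List<Future<Double>> results = parameters.stream()
                .map(p -> pool.submit(() -> simulate(p)))
                .toList();

        for (int i = 0; i < parameters.size(); i++) {
            System.out.printf("param=%.1f -> result=%.4f%n",
                    parameters.get(i), results.get(i).get());
        }
        pool.shutdown();
    }
}
```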
43. Acknowledgements and Links
- Prof. Ron Perrott and the Belfast e-Science Centre:
  - http://www.qub.ac.uk/escience
- Prof. Malcolm Atkinson, NeSC.
- The OGSA-DAI project site:
  - http://www.ogsadai.org.uk
44. Shameless Plug
- http://www.amazon.co.uk/exec/obidos/ASIN/0470094176/
45. DS Online
46. DS Online
47. The End!
48. Other e-Science Projects
- Comb-e-Chem, a combinatorial chemistry application: http://www.combechem.org/
  - The system allows students and researchers to virtually mix chemicals together and then try to identify the compounds they produce and the particular benefits these compounds may have.
  - Chemistry, CS, Maths, and IT Innovation.
- DAME (Distributed Aircraft Maintenance Environment): http://www.cs.york.ac.uk/dame/
  - Aims to produce sensors that measure the temperature, vibration, and pressure of aeroplane engines as they fly from one location to another.
  - Instead of waiting until a plane lands, sensor data will be sampled in flight and compared with existing patterns.
  - If problems are detected, mechanics can replace the damaged or faulty engine parts as soon as the plane lands and before anything drastic occurs.
  - Universities, Rolls-Royce, Data Systems & Solutions, and Cybula.
49. Other e-Science Projects
- The Geodise project is a grid-enabled optimisation and design search programme for engineers: http://www.geodise.org/
  - The project will allow aerospace companies to speed up the design process of their vehicles by capturing knowledge from previous designs and putting it together for simulations.
  - Universities, BAE Systems, Rolls-Royce and Fluent.
- Discovery Net: http://www.discovery-on-the.net/
  - This project is producing high-throughput sensing applications such as environmental sensors and bioinformatic monitors.
  - The aim is for doctors to someday be able to monitor the blood pressure, temperature, and drug intake of all their patients.
  - A sensor on the patient's body will communicate the data through a mobile wireless communication device to the doctor's office.
  - Universities, InforSense, deltaDOT, and HydroVenturi.
50. References
- Ian Foster and Carl Kesselman (Editors), "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann Publishers, 1st edition (November 1, 1998), ISBN 1558604758.
- I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", International Journal of Supercomputer Applications, 15(3), 2001.
- Three Point Checklist, http://www.gridtoday.com/02/0722/100136.html
- IBM Grid Computing, http://www-1.ibm.com/grid/grid_literature.shtml
- FAFNER, http://www.npac.syr.edu/factoring.html
- I. Foster, J. Geisler, W. Nickless, W. Smith, and S. Tuecke, "Software Infrastructure for the I-WAY High Performance Distributed Computing Experiment", in Proc. 5th IEEE Symposium on High Performance Distributed Computing, pp. 562-571, 1997.
- WS-RF, http://www.globus.org/wsrf