Title: The Grid and enabling applications for it
1 The Grid and enabling applications for it
CCPN/TEMBLOR Workshop, Hinxton, 19th May 2004
Mark Hayes, Technical Director, Cambridge
eScience Centre
2 In the beginning
"The collection of people, hardware, and
software... will become a node in a
geographically distributed computer network.
Through the network... all the large computers
can communicate with one another. And through
them, all the members of the community can
communicate with other people, with programs,
with data, or with a selected combination of
those resources."
J.C.R. Licklider, The Computer as a Communication Device, Science and
Technology, April 1968
The ARPAnet in 1970
3 International connectivity - 1991
4 International connectivity - 1997
5 International bandwidth
From 3D geographic network displays - Cox et
al, ACM Sigmod Record - December 1996
6 What does the Internet look like?
http://www.cybergeography.org/
7 The World Wide Web
Invented at CERN by Tim Berners-Lee in 1989 as a
tool for collaboration and information sharing in
the particle physics community.
8 The Grid - 1998
Editors: Foster & Kesselman. 700 pages, 22 chapters, 40 authors.
A computational Grid is a hardware and software infrastructure that
provides dependable, consistent, pervasive and inexpensive access to
high-end computational capabilities.
Analogy with the electrical power grid - just plug in.
9 The Grid - 2003
Editors: Berman, Hey, Fox. 1000 pages, 43 chapters, 116 authors.
Applications, data sharing and virtual
communities.
10 Four types of Grid
- CPU intensive cycle scavenging (SETI@home)
- Data sharing
- Application provision
- Human-human interaction (e.g. Access Grid)
11 Early distributed computing
1.2 million CPU years so far...
Brute force attempt to crack strong encryption
Protein folding
12 It's not just compute cycles...
An exponential growth in data from many areas of
science.
13 The data explosion - some big numbers
- CFD turbulence simulations - 100TB
- BaBar particle physics experiment - 1TB/day
- CERN LHC will generate 1GB/s or 10PB/year
- VLBA radio telescope generates 1GB/s today
- NCBI/EMBL database is only 0.5TB but doubling each year
- brain imaging - 4TB/brain at full colour, 10µm resolution
  (4PB/brain at 1µm, i.e. cellular resolution)
- Pixar - 100TB/movie
"FTP and GREP are not adequate" (Jim Gray)
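A rough sanity check on these figures can be sketched in Python. The function names and the use of decimal units (1PB = 10^15 bytes) are assumptions made for illustration, not part of the talk. The arithmetic suggests that 10PB/year at 1GB/s corresponds to roughly 10^7 seconds of data taking per year rather than round-the-clock running, and that refining the resolution tenfold in three dimensions accounts for the jump from 4TB to 4PB per brain.

```python
# Back-of-envelope checks on the data volumes quoted above.
# Decimal units are assumed: 1 GB = 1e9, 1 TB = 1e12, 1 PB = 1e15 bytes.

GB, TB, PB = 1e9, 1e12, 1e15
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def yearly_volume(rate_bytes_per_s, live_seconds=SECONDS_PER_YEAR):
    """Bytes produced in a year at a given rate and amount of live time."""
    return rate_bytes_per_s * live_seconds


def live_seconds_for(volume_bytes, rate_bytes_per_s):
    """Seconds of data taking needed to reach a volume at a given rate."""
    return volume_bytes / rate_bytes_per_s


def doubling_growth(start_bytes, years):
    """Size after a number of years if the volume doubles every year."""
    return start_bytes * 2 ** years


def refine_resolution(volume_bytes, factor, dimensions=3):
    """Scale a volume dataset when the linear resolution is refined by `factor`."""
    return volume_bytes * factor ** dimensions


if __name__ == "__main__":
    print(yearly_volume(1 * GB) / PB)          # 1GB/s around the clock, in PB
    print(live_seconds_for(10 * PB, 1 * GB))   # live time implied by 10PB/year
    print(doubling_growth(0.5 * TB, 10) / TB)  # 0.5TB after ten doublings, in TB
    print(refine_resolution(4 * TB, 10) / PB)  # 4TB scan, 10x finer, in PB
```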
14 Application provision
- Google - 10K cpus, 2PB database (2 years ago)
- free email services - HotMail, Yahoo! 2-10PB storage
- netsolve - numerical algorithms on demand, with Matlab & Mathematica plugins
- renderfarm.net - graphics rendering on demand
15 The Access Grid
High end video conferencing and collaboration
technology. O(100) nodes world wide.
"...one of the most compelling glimpses into the
future I've seen since I first saw NCSA Mosaic."
Larry Smarr
16 The Grid in the UK
Pilot projects in particle physics, astronomy,
medicine, bioinformatics, environmental
sciences...
Contributing to international Grid software
development efforts
10 regional eScience Centres
17 Some UK Grid resources
- Daresbury - loki - 64 proc Alpha cluster
- Manchester - green - 512 proc SGI Origin 3800
- Imperial - saturn - large SMP Sun
- Southampton - iridis - 400 proc Intel Linux cluster
- Rutherford Appleton Lab - hrothgar - 32 proc Intel Linux
- Cambridge - herschel - 32 proc Intel Linux cluster
- ...
- coming soon: 4x >64 CPU JISC clusters, HPC(X)
18 Applications on the UK Grid
Ion diffusion through radiation damaged crystal
structures (Mark Calleja, Earth Sciences,
Cambridge)
- Monte Carlo simulation: lots of independent runs
- small input & output
- more CPU -> higher temperatures, better stats
- access to 100 CPUs on the UK Grid
- Condor-G client tool for farming out jobs
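The pattern in this list (many independent runs, tiny inputs and outputs, statistics that improve with more CPU) can be sketched in Python. This is a toy model under stated assumptions, not the ion diffusion code: `single_run`, `combine` and `farm_out` are hypothetical names, the physics is replaced by a 1D random walk, and a local thread pool stands in for the Grid worker nodes that Condor-G would target.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def single_run(seed, n_steps=1000):
    """One independent run: a toy 1D random walk, returning squared displacement."""
    rng = random.Random(seed)  # private generator, so runs never share state
    position = 0
    for _ in range(n_steps):
        position += rng.choice((-1, 1))
    return position * position


def combine(results):
    """Reduce the per-run outputs to a mean and its standard error."""
    mean = statistics.fmean(results)
    std_err = statistics.stdev(results) / len(results) ** 0.5
    return mean, std_err


def farm_out(seeds, max_workers=8):
    """Run every seed independently; threads stand in for Grid worker nodes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(single_run, seeds))


if __name__ == "__main__":
    # More runs means a smaller standard error on the same quantity.
    for n_runs in (10, 100, 1000):
        mean, err = combine(farm_out(range(n_runs)))
        print(f"{n_runs:5d} runs: <x^2> = {mean:8.1f} +/- {err:6.1f}")
```

Each run needs only a seed as input and returns a single number, which is why this kind of job tolerates slow wide-area links so well.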
19 Applications on the UK Grid
Reality Grid (Stephen Pickles, Robin Pinning -
Manchester)
- Fluid dynamics of complex mixtures, e.g. oil, water and solid particles (mud)
- Used CPU at London, Cambridge
- Remote visualisation using SGI Onyx in Manchester (from a laptop in Sheffield)
- Computational steering
20 Applications on the UK Grid
GENIE - Grid Enabled Integrated Earth system
model (Steven Newhouse, Murtaza Gulamali -
Imperial)
- Ocean-atmosphere modelling
- How does moisture transport from the atmosphere affect ocean circulation?
- 1000 independent 4000-year runs (3 days real time!) on 200 CPUs
- Flocked Condor pools at London & Southampton
- Coupled modelling
21 $1 buys...
- 1 day of cpu time
- 4 GB ram for a day
- 1 GB of network bandwidth
- 1 GB of disk storage
- 10 M database accesses
- 10 TB of disk access (sequential)
- 10 TB of LAN bandwidth (bulk)
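The list above can be read as a simple cost model, sketched here in Python. The dictionary keys, the function names and the reading of "network bandwidth" as wide-area traffic are assumptions made for illustration; the quantities are copied from the list, each being what one unit of money buys.

```python
# Amount of each resource that one unit of money buys, copied from the list above.
# Key names are illustrative; "wan_gb" assumes the 1 GB entry means wide-area traffic.
UNITS_PER_MONEY = {
    "cpu_days": 1,
    "ram_4gb_days": 1,
    "wan_gb": 1,
    "disk_gb": 1,
    "db_accesses": 10e6,
    "disk_access_tb": 10,
    "lan_tb": 10,
}


def job_cost(usage):
    """Total cost of a job, given how much of each resource it uses."""
    return sum(amount / UNITS_PER_MONEY[name] for name, amount in usage.items())


def worth_shipping(cpu_days_gained, wan_gb_moved):
    """True if the remote CPU time gained is worth more than the data transfer."""
    return job_cost({"cpu_days": cpu_days_gained}) > job_cost({"wan_gb": wan_gb_moved})


if __name__ == "__main__":
    # A compute-heavy job with tiny I/O versus a data-heavy job with little compute.
    print(job_cost({"cpu_days": 100, "wan_gb": 0.01}))
    print(job_cost({"cpu_days": 1, "wan_gb": 1000}))
    print(worth_shipping(100, 1), worth_shipping(1, 1000))
```

On these prices a gigabyte moved across the network costs as much as a day of CPU time, so only jobs with a high ratio of compute to data are worth sending elsewhere.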
22 How do you move a terabyte?
Source: TeraScale SneakerNet, Jim Gray et al
23 Some consequences
Compute cycles are (almost) free by comparison with network costs.
- The cheapest and fastest way to move 1TB of data out of CERN is still by
  FedEx.
- This considers only bandwidth; low latency networks are even more
  expensive! (MPI over WAN doesn't work well.)
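The bandwidth argument can be made concrete with a short Python sketch. The link speeds and the 24-hour courier delivery time below are illustrative assumptions, not figures from the talk or from Gray's paper, and the function names are hypothetical.

```python
TB = 1e12  # decimal terabyte, in bytes


def transfer_hours(n_bytes, link_mbps, efficiency=1.0):
    """Hours needed to push n_bytes through a link of link_mbps megabits/s."""
    bits = n_bytes * 8
    return bits / (link_mbps * 1e6 * efficiency) / 3600


def effective_mbps(n_bytes, hours):
    """Bandwidth equivalent to delivering n_bytes in the given number of hours."""
    return n_bytes * 8 / (hours * 3600) / 1e6


if __name__ == "__main__":
    # Assumed sustained link speeds, in megabits per second.
    for mbps in (10, 100, 1000):
        print(f"{mbps:5d} Mbit/s: {transfer_hours(TB, mbps):7.1f} hours per TB")
    # Assumed next-day courier delivery of a 1TB disk.
    print(f"courier, 24 h: {effective_mbps(TB, 24):.1f} Mbit/s equivalent")
```

Under these assumptions a disk in a parcel matches a link sustaining a little over 90 Mbit/s end to end. Latency is a separate matter that this calculation ignores.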
24 What makes a good Grid application?
A distributed community of users. Tiny network
input & output, huge compute requirement.
Database access & storage is also expensive, therefore
put the computation near the data.
25 Web services
- A web service is a network-accessible application identified by a URI,
  e.g. http://terraservice.net/TerraService.asmx?op=GetTile
- Its interface is defined in terms of XML based messages
- These messages are transported by internet protocols (usually HTTP)
- The application & its interface definition should be discoverable by
  other applications
- It should be independent of OS platform & programming language
- W3C standards body: http://www.w3c.org/
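What an "XML based message" looks like can be sketched in Python using only the standard library. This builds and parses a SOAP 1.1 envelope without any network I/O. The service namespace `http://example.org/tiles` and the parameters `x` and `y` are hypothetical placeholders, not the real TerraService interface; an actual client would take those details from the service's WSDL and POST the envelope over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/tiles"                # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation name and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the operation name and parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]  # first child of Body is the call

    def local_name(tag):
        return tag.split("}", 1)[1]  # drop the "{namespace}" prefix

    return local_name(call.tag), {local_name(c.tag): c.text for c in call}


if __name__ == "__main__":
    # Parameter names are placeholders, not the real GetTile signature.
    message = build_request("GetTile", {"x": 10, "y": 20})
    print(message)
    print(parse_request(message))
```

Because both sides agree only on the XML message format, the client and the service can be written in different languages on different platforms.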
26 Acronym soup
XML - eXtensible Markup Language
XSLT - eXtensible Stylesheet Language Transformations
SOAP - Simple Object Access Protocol
WSDL - Web Services Description Language
UDDI - Universal Description, Discovery & Integration protocol
BPEL - Business Process Execution Language
WSIF - Web Services Invocation Framework
...
27 terraservice.net
Web service interface to http://terraserver.microsoft.com/
Example app: the US Department of Agriculture has a database of soil
properties, federated with terraservice.net to
provide geographical & topographic detail.
28 Databases available as Web Services
- Google
- Amazon
- SDSS SkyServer
- EMBL
- EBI-MSD
- EBI Open Bibliographic Query Service
- ...
- http://www.escience.cam.ac.uk/services/dblist.html
29 Radar scattering from aircraft
Aim: increase the efficiency of the aircraft
design & engineering process, and extend the scale of radar
scattering simulations to otherwise intractable
objects (i.e. whole aircraft).
A collaboration between the University of Cambridge Department of
Applied Mathematics & Theoretical Physics (DAMTP)
and the BAE Advanced Technology Centre at Filton.
Mark Spivack (PI), Andrew Usher (visualisation programmer),
Xiaobo Yang (scientific programmer), CC-HPCF, new cluster.
BAE input: expertise, data,...
30 [Diagram; labels: BAE, Cambridge, Portal, HPCF, Reflection data, Visualisation, CAD Design]
31 Visualisation tools
Based on the Visualisation Toolkit - open source
C++ library: cross platform, extendable, large
user base - http://www.vtk.org
Surface currents, virtual fly through, looking
for hot-spots
32 Increasing efficiency
The calculation can be split into a two stage
process:
- Initial long-running, high fidelity calculation of induced surface
  currents on the HPCF.
- 3D electromagnetic fields can be calculated on a cluster.
- Using an approximation technique currently under development,
  subsequent small changes can be re-calculated on the cluster.
- In theory, this would allow interactive design of the aircraft without
  the need for scheduling long-running jobs.
33 Tying it all together with Web Services
34 Questions?