Title: What is the e in eScience
1What is the e in e-Science?
W T Hewitt Wednesday 30th June 2004 MIMAS
Open Forum 2004
Manchester
2e-Science
- e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it.
- e-Science will change the dynamic of the way
science is undertaken.
- John Taylor,
- Director General of Research Councils,
- Office of Science and Technology
3Behind The Wall
- Today - many bits of walls, ad hoc Client-Server
Scientist
4Behind The Wall
- Next generation - Information Utilities and
collaboratories
MIDDLEWARE
Scientist
GRID
Scientist
Scientist
5Why GRID?
- VERY VERY IMPORTANT
- The GRID is one way to realise the e-Science
vision
- WE ARE TRYING TO DO E-SCIENCE!
6Why Grids?
- Large-scale science and engineering are done
through
- the interaction of people,
- heterogeneous computing resources, information
systems, and instruments,
- all of which are geographically and
organizationally dispersed.
- The overall motivation for Grids is to
facilitate the routine interactions of these
resources in order to support large-scale science
and engineering.
7The Grid is
- "the web on steroids."
- "Napster for Scientists" (of data grids)
- "the solution to all your problems."
- "evil." (a system manager, of Globus)
- "distributed computing re-badged."
- "distributed computing across multiple
administrative domains"
8The Grid
- provides "Flexible, secure, coordinated
resource sharing among dynamic collections of
individuals, institutions, and resources"
- From "The Anatomy of the Grid: Enabling Scalable
Virtual Organizations"
- "enables communities (virtual organizations)
to share geographically distributed resources as
they pursue common goals -- assuming the absence
of central location, central control,
omniscience, existing trust relationships."
- Which tense?
- Provides
- May provide
- Will provide
9CERN Large Hadron Collider (LHC)
Raw Data: 1 Petabyte / sec
Filtered: 100 Mbyte / sec, 1 Petabyte / year, 1 Million CD ROMs
CMS Detector
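As a rough check on the figures above, the sketch below converts the filtered rate into a yearly volume. The 10^7 seconds of data-taking per year and the 1 GB per disc are assumptions chosen for illustration, not figures from the slide; decimal units are used throughout.

```python
# Back-of-envelope check of the LHC data volumes quoted on the slide.
# ASSUMPTIONS (not from the slide): about 1e7 seconds of data-taking per
# year, decimal units (1 PB = 1e15 bytes), and roughly 1 GB per disc.

RAW_RATE_BYTES_PER_SEC = 1e15         # 1 Petabyte / sec (from the slide)
FILTERED_RATE_BYTES_PER_SEC = 100e6   # 100 Mbyte / sec (from the slide)
SECONDS_OF_DATA_TAKING = 1e7          # assumed effective running time per year
BYTES_PER_DISC = 1e9                  # assumed capacity per CD-ROM


def yearly_volume_bytes(rate_bytes_per_sec, seconds=SECONDS_OF_DATA_TAKING):
    """Volume accumulated in one year at a constant rate."""
    return rate_bytes_per_sec * seconds


yearly = yearly_volume_bytes(FILTERED_RATE_BYTES_PER_SEC)
petabytes_per_year = yearly / 1e15
discs_per_year = yearly / BYTES_PER_DISC
# How much the filtering stage reduces the raw stream.
reduction_factor = RAW_RATE_BYTES_PER_SEC / FILTERED_RATE_BYTES_PER_SEC

print(f"filtered volume: {petabytes_per_year:g} PB / year")
print(f"equivalent discs: {discs_per_year:g}")
print(f"raw-to-filtered reduction: {reduction_factor:g}x")
```

Under these assumptions the slide's three filtered figures are mutually consistent; a different running time or disc capacity would shift the result proportionally.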
10Data Grids for High Energy Physics
11Examples?
- A biochemist exploits 10,000 computers to screen
100,000 compounds in an hour
- A biologist combines a range of diverse and
distributed resources (databases, tools,
instruments) to answer complex questions
- 1,000 physicists worldwide pool resources for
petaop analyses of petabytes of data
- Civil engineers collaborate to design, execute,
analyze shake table experiments
12Examples
- Climate scientists visualize, annotate, analyze
terabyte simulation datasets
- An emergency response team couples real time
data, weather model, population data
- A multidisciplinary analysis in aerospace couples
code and data in four companies
- A home user invokes architectural design
functions at an application service provider
13Broader Context
- Grid has much in common with major industrial
thrusts
- Business-to-business,
- Peer-to-peer,
- Application Service Providers,
- Storage Service Providers,
- Distributed Computing,
- Internet Computing
- Sharing not adequately addressed by existing
technologies
- Complicated requirements: "run program X at site
Y subject to community policy P, providing access
to data at Z according to policy Q"
- High performance: unique demands of advanced
high-performance systems
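The "complicated requirements" bullet can be read as two independent policy checks that must both pass. The following is a minimal sketch, assuming hypothetical names throughout (`Request`, `COMMUNITY_POLICY`, `DATA_POLICY`, `is_permitted`, and the example users and sites); real Grid middleware expresses such policies quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request: run a program at one site using data at another."""
    user: str
    program: str     # program X
    site: str        # site Y
    data_site: str   # data held at Z


# Community policy P (illustrative): which programs may run at which site.
COMMUNITY_POLICY = {("siteY", "programX")}
# Data policy Q (illustrative): which users may read data at which site.
DATA_POLICY = {("alice", "siteZ")}


def is_permitted(req, community_policy, data_policy):
    """Both policies are evaluated separately and both must allow the request."""
    may_run = (req.site, req.program) in community_policy
    may_read = (req.user, req.data_site) in data_policy
    return may_run and may_read


allowed = is_permitted(
    Request("alice", "programX", "siteY", "siteZ"), COMMUNITY_POLICY, DATA_POLICY
)
denied = is_permitted(
    Request("bob", "programX", "siteY", "siteZ"), COMMUNITY_POLICY, DATA_POLICY
)
print(allowed, denied)
```

Keeping the two policies separate mirrors the slide's point that the compute site and the data site are owned by different parties, each applying its own rules.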
14What is the Grid?
- "Grid computing is distinguished from
conventional distributed computing by its focus
on large-scale resource sharing, innovative
applications, and, in some cases,
high-performance orientation...we review the
"Grid problem", which we define as flexible,
secure, coordinated resource sharing among
dynamic collections of individuals, institutions,
and resources - what we refer to as virtual
organizations."
- From "The Anatomy of the Grid: Enabling Scalable
Virtual Organizations" by Foster, Kesselman and
Tuecke
15What is the Grid?
- Resource sharing and coordinated problem solving in
dynamic, multi-institutional virtual
organizations
- On-demand, ubiquitous access to computing, data,
and all kinds of services
- New capabilities constructed dynamically and
transparently from distributed services
- No central location, No central control, No
existing trust relationships, Little
predetermination
- Uniformity
- Pooling Resources
16Diverse global services
Grid services
Local OS
17Common principles
- Single sign-on
- Often implying Public Key Infrastructure (PKI)
- Standard protocols and services
- Respect for autonomy of resource owner
- Layered architectures
- Higher-level infrastructures hiding heterogeneity
of lower levels
- Interoperability is paramount
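The layering principle can be pictured as one uniform interface in front of different local systems. This is a minimal sketch with hypothetical classes (`ComputeResource`, `QueueBasedSite`, `CycleScavengingSite`) and a hypothetical `run_everywhere` helper; no real scheduler or middleware API is called.

```python
from abc import ABC, abstractmethod


class ComputeResource(ABC):
    """Higher-level interface: callers never see the local system behind it."""

    @abstractmethod
    def submit(self, executable: str) -> str:
        """Start the executable and return a human-readable status."""


class QueueBasedSite(ComputeResource):
    """Stand-in for a site that runs jobs through a batch queue."""

    def __init__(self, queue: str):
        self.queue = queue

    def submit(self, executable: str) -> str:
        return f"queued {executable} on {self.queue}"


class CycleScavengingSite(ComputeResource):
    """Stand-in for a site that runs jobs on idle desktop machines."""

    def __init__(self, pool: str):
        self.pool = pool

    def submit(self, executable: str) -> str:
        return f"matched {executable} to an idle machine in {self.pool}"


def run_everywhere(resources, executable):
    """The caller uses one call regardless of what sits underneath."""
    return [resource.submit(executable) for resource in resources]


statuses = run_everywhere(
    [QueueBasedSite("batch"), CycleScavengingSite("campus")], "analysis"
)
for status in statuses:
    print(status)
```

Each site keeps its own local mechanism, which is one way to respect the autonomy of the resource owner while still presenting a standard service to users.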
18Grid Middleware
- Middleware
- Globus
- UNICORE
- Legion and Avaki
- Scheduling
- Sun Grid Engine
- Load Sharing Facility (LSF)
- from Platform Computing
- OpenPBS and PBS(Pro)
- from Veridian
- Maui scheduler
- Condor
- could also go under middleware
- Data
- Storage Resource Broker (SRB)
- Replica Management
- OGSA-DAI
- Web services (WSDL, SOAP, UDDI)
- IBM Websphere
- Microsoft .NET
- Sun Open Net Environment (Sun ONE)
- PC Grids
- Peer-to-Peer computing
19- Seamless Access to Multiple Datasets
- www.sve.man.ac.uk/Research/AtoZ/SAMD
20SAMD
- ESRC demonstrator showing the benefits of
applying grid technologies to an ordinary social
science query
- We solve a genuine problem from the UK academic
social science community - a multivariate
analysis using a complex mathematical algorithm
- Based on a major social science databank, the
Office for National Statistics Time Series Data,
hosted at MIMAS
21Before SAMD
22Motivation
- Web-based access to socio-economic datasets such
as Office for National Statistics Time Series Data
has led to greatly increased use, but:
- No standard authentication or authorisation
- too many usernames and passwords to remember
- To automate search and retrieval, can only
emulate navigation through "screen scraping"
- breaks whenever the interface is "improved"
- discourages third party developments and periodic
re-analysis
- Data must be downloaded and saved to local disk
- not necessarily the system on which subsequent
analysis is to be performed
- inefficient, especially for large datasets
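To illustrate why screen scraping "breaks whenever the interface is improved", the sketch below extracts the same value from two versions of a page and from a machine-readable export. The HTML snippets, the value and the CSV layout are invented for illustration; they do not reproduce any real MIMAS page or service.

```python
import csv
import io
import re

# Two hypothetical renderings of the same figure; only the markup differs.
OLD_PAGE = '<td class="value">104.2</td>'
NEW_PAGE = '<span class="val">104.2</span>'

# A hypothetical machine-readable export of the same figure.
MACHINE_READABLE = "period,value\n2004Q1,104.2\n"


def scrape_value(html):
    """Screen scraping: tied to the exact markup of the page."""
    match = re.search(r'<td class="value">([\d.]+)</td>', html)
    return float(match.group(1)) if match else None


def read_value(csv_text):
    """Structured access: depends only on the named column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return float(rows[0]["value"])


print(scrape_value(OLD_PAGE))       # works against the old markup
print(scrape_value(NEW_PAGE))       # fails once the page is restyled
print(read_value(MACHINE_READABLE))
```

The scraper fails silently on the restyled page even though the data has not changed, which is the maintenance burden that discourages third party tools.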
23SAMD solution
- Use Grid Security Infrastructure for "single
sign-on" authentication everywhere
- Modified standard Apache web server to accept
proxy credentials
- Permits re-use of existing CGI code
- Use third party file transfers (GridFTP) to move
data directly to where it's needed
- Use standard Globus mechanisms to
- Locate HPC facility for analysis
- Stage analysis binary from local repository and
run analysis job on HPC facility
- Retrieve results
- It all worked, and cut the data collection and
analysis time down to around 5 minutes.
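The steps above can be sketched as an ordered list of Globus Toolkit command lines. This is an illustration under stated assumptions, not the SAMD implementation: the host names, paths and file names are hypothetical, the "locate HPC facility" step is replaced by a host passed in by the caller, and file permissions and error handling are ignored. The function only builds the command lines and executes nothing.

```python
def plan_samd_workflow(data_url, hpc_host, binary_url, work_dir):
    """Return the command lines for one analysis run, in order.

    All URLs, hosts and paths are supplied by the caller and are
    hypothetical in the example below. Nothing is executed here.
    """
    staged_data = f"gsiftp://{hpc_host}{work_dir}/input.dat"
    staged_binary = f"gsiftp://{hpc_host}{work_dir}/analysis"
    remote_results = f"gsiftp://{hpc_host}{work_dir}/results.out"
    return [
        # 1. Single sign-on: create a proxy credential once.
        ["grid-proxy-init"],
        # 2. Third party transfer: data goes straight to the HPC facility.
        ["globus-url-copy", data_url, staged_data],
        # 3. Stage the analysis binary from the local repository.
        ["globus-url-copy", binary_url, staged_binary],
        # 4. Run the analysis job on the HPC facility.
        ["globus-job-run", hpc_host,
         f"{work_dir}/analysis", f"{work_dir}/input.dat"],
        # 5. Retrieve the results (assumed output file name).
        ["globus-url-copy", remote_results, "file:///tmp/results.out"],
    ]


steps = plan_samd_workflow(
    data_url="gsiftp://data.example.org/ons/series.dat",
    hpc_host="hpc.example.org",
    binary_url="file:///home/user/bin/analysis",
    work_dir="/scratch/samd",
)
for step in steps:
    print(" ".join(step))
```

Only the first step asks the user for a passphrase; every later step reuses the proxy credential, which is what makes the sequence scriptable.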
24SAMD Architecture
25SAMD User Interface
26What's new?
- Web interfaces to datasets?
- We show that there are more flexible ways of
delivering access to data over the internet than
through static web pages alone
- Single sign-on?
- We show that the domain of single sign-on can be
much broader than provided by Athens
- Graphical User Interfaces?
- We show that it's possible for a third party to
develop new tools independently of data
providers
- A short script can encapsulate all the essential
functionality of the SAMD GUI
- Integration, Interoperability!
27- If one centre is good then many must be better
28National Centres
- National e-Science Centre
- EPSRC, www.nesc.ac.uk
- National e-Social Science Centre
- ESRC, www.ncess.ac.uk
- National Institute for Environmental e-Science
- NERC, www.niees.ac.uk
- OMII
- www.omii.ac.uk
- Data Curation Centre
- www.dcc.ac.uk
- National Text Mining Centre
- National Grid Service (Grid Support Centre)
- www.ngs.ac.uk
- Access Grid Support Centre
29Regional Centres and Centres of Excellence
30National Grid Service
- UK Production Data and Computational Grid
- Oxford and Leeds (White Rose Grid)
- Compute Nodes
- Bristol and Cardiff
- Manchester and CCLRC-RAL
- Data Nodes
http://www.csar.cfs.ac.uk/ 512 Itanium2 processor SGI Altix,
512 processor Origin3800
http://www.hpcx.ac.uk/ 1600 IBM p690 Regatta processors
31National Grid Service
- Thus, the NGS provides access to
- over 3,000 processors,
- over 36TB of "data-grid" capacity,
- common scientific applications
- and extensive data archives.
- Other resource providers anticipated to join in
the future
32National Grid Service
- More than just computation and data resources
- In future will include services to facilitate
collaborative (grid) computing
- Authentication (PKI, X.509)
- Job submission/batch service
- Authorisation
- Certificate management
- Virtual Organisation management
- Data access/integration services
(SRB/OGSA-DAI/DQPS)
- Information service
- National Registry (of registries)
- Data replication
- Data caching
- Grid monitoring
- Accounting
33
34Today's Grid
- A Single System Image
- Transparent wide-area access to large data banks
- Transparent wide-area access to applications on
heterogeneous platforms
- Transparent wide-area access to processing
resources
- Security, certification, single sign-on
authentication, AAA
- Grid Security Infrastructure,
- Data access, Transfer and Replication
- GridFTP, Giggle
- Computational resource discovery, allocation and
process creation
- GRAM, UNICORE, Condor-G
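Resource discovery and allocation can be pictured as matching a job's requirements against advertised resource descriptions. This is a minimal sketch with invented resource records and a hypothetical `find_resources` helper; it is not the interface of GRAM, UNICORE or Condor-G.

```python
# Invented resource advertisements; names and numbers are illustrative only.
RESOURCES = [
    {"name": "site-a", "free_cpus": 64, "arch": "itanium2"},
    {"name": "site-b", "free_cpus": 8, "arch": "power4"},
    {"name": "site-c", "free_cpus": 256, "arch": "itanium2"},
]


def find_resources(resources, min_cpus, arch):
    """Return resources that satisfy the job, largest free capacity first."""
    matches = [
        r for r in resources
        if r["free_cpus"] >= min_cpus and r["arch"] == arch
    ]
    return sorted(matches, key=lambda r: r["free_cpus"], reverse=True)


candidates = find_resources(RESOURCES, min_cpus=32, arch="itanium2")
print([r["name"] for r in candidates])
```

A real information service would also have to cope with stale advertisements and with owners' local policies, which this sketch leaves out.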
35Reality Checks!!
- The Technology is Ready
- Not true: it's emerging
- Building middleware, Advancing Standards,
Developing Dependability
- Building demonstrators.
- The computational grid is in advance of the data
intensive middleware
- Integration and curation are probably the
obstacles
- But!! It doesn't have to be all there to be
useful.
- We know how we will use grid services
- No Disruptive technology
- Lower the barriers of entry.
36Grid Evolution
- 1st Generation Grid
- Computationally intensive, file access/transfer
- Bag of various heterogeneous protocols and
toolkits
- Recognises internet, Ignores Web
- Academic teams
- 2nd Generation Grid
- Data intensive - knowledge intensive
- Services-based architecture
- Recognises Web and Web services
- Global Grid Forum
- Industry participation
We are here!
37I don't want to share! Do I need a grid?
38In conclusion
- The GRID is not, and will not be, free
- must pay for resources
39Acknowledgements
- Carole Goble
- Stephen Pickles
- Keith Cole
- John Brooke
- Paul Jeffreys
- University of Manchester
- Academic collaborators
- Industrial collaborators
- Funding Agencies: DTI, EPSRC, NERC, ESRC, PPARC
40SVE @ Manchester Computing
World Leading Supercomputing Service, Support and
Research. Bringing Science and Supercomputers
Together
www.man.ac.uk/sve sve@man.ac.uk