What is the e in eScience

1
What is the e in e-Science?
W T Hewitt, Wednesday 30th June 2004
MIMAS Open Forum 2004, Manchester
2
e-Science
  • "e-Science is about global collaboration in key
    areas of science, and the next generation of
    infrastructure that will enable it."
  • "e-Science will change the dynamic of the way
    science is undertaken."
  • John Taylor, Director General of Research Councils,
    Office of Science and Technology

3
Behind The Wall
  • Today - many bits of walls, ad hoc Client-Server

Scientist
4
Behind The Wall
  • Next generation - Information Utilities and
    collaboratories

[Diagram labels: Scientists, MIDDLEWARE, GRID]
5
Why GRID?
  • VERY VERY IMPORTANT
  • The GRID is one way to realise the e-Science
    vision
  • WE ARE TRYING TO DO E-SCIENCE!

6
Why Grids?
  • Large-scale science and engineering are done
    through the interaction of people, heterogeneous
    computing resources, information systems, and
    instruments, all of which are geographically and
    organizationally dispersed.
  • The overall motivation for Grids is to
    facilitate the routine interactions of these
    resources in order to support large-scale science
    and engineering.

7
The Grid is
  • "the web on steroids."
  • "Napster for Scientists" (of data grids)
  • "the solution to all your problems."
  • "evil." (a system manager, of Globus)
  • "distributed computing re-badged."
  • "distributed computing across multiple
    administrative domains"

8
The Grid
  • provides "Flexible, secure, coordinated
    resource sharing among dynamic collections of
    individuals, institutions, and resources"
  • From "The Anatomy of the Grid: Enabling Scalable
    Virtual Organizations"
  • "enables communities (virtual organizations)
    to share geographically distributed resources as
    they pursue common goals -- assuming the absence
    of central location, central control,
    omniscience, existing trust relationships."
  • Which tense?
  • Provides
  • May provide
  • Will provide

9
CERN Large Hadron Collider (LHC)
Raw Data: 1 Petabyte / sec
Filtered: 100 Mbyte / sec = 1 Petabyte / year = 1 Million CD ROMs
CMS Detector
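
The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 Mbyte = 10^6 bytes, 1 Petabyte = 10^15 bytes) and a nominal CD-ROM capacity of 700 Mbyte; neither assumption is stated on the slide.

```python
# Back-of-envelope check of the LHC data-rate figures quoted above.
# Assumptions (not from the slide): decimal units, ~700 MB per CD-ROM.
MB = 10**6
PB = 10**15

raw_rate = 1 * PB           # bytes per second coming off the detector
filtered_rate = 100 * MB    # bytes per second kept after filtering
yearly_volume = 1 * PB      # bytes stored per year
cd_capacity = 700 * MB      # nominal CD-ROM capacity (assumed)

reduction_factor = raw_rate // filtered_rate
seconds_to_fill = yearly_volume // filtered_rate
days_to_fill = seconds_to_fill / 86_400
cds_needed = yearly_volume / cd_capacity

print(f"Filtering reduces the data rate by a factor of {reduction_factor:,}")
print(f"1 PB at 100 MB/s takes {seconds_to_fill:,} s, about {days_to_fill:.0f} days")
print(f"1 PB fills about {cds_needed / 1e6:.1f} million CD-ROMs")
```

Under these assumptions, 1 Petabyte per year corresponds to roughly 116 days of data taking at the filtered rate, and "1 Million CD ROMs" reads as an order-of-magnitude figure.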
10
Data Grids for High Energy Physics
11
Examples?
  • A biochemist exploits 10,000 computers to screen
    100,000 compounds in an hour
  • A biologist combines a range of diverse and
    distributed resources (databases, tools,
    instruments) to answer complex questions
  • 1,000 physicists worldwide pool resources for
    petaop analyses of petabytes of data
  • Civil engineers collaborate to design, execute,
    analyze shake table experiments

12
Examples
  • Climate scientists visualize, annotate, analyze
    terabyte simulation datasets
  • An emergency response team couples real time
    data, weather model, population data
  • A multidisciplinary analysis in aerospace couples
    code and data in four companies
  • A home user invokes architectural design
    functions at an application service provider

13
Broader Context
  • Grid has much in common with major industrial
    thrusts
  • Business-to-business,
  • Peer-to-peer,
  • Application Service Providers,
  • Storage Service Providers,
  • Distributed Computing,
  • Internet Computing
  • Sharing not adequately addressed by existing
    technologies
  • Complicated requirements: "run program X at site
    Y subject to community policy P, providing access
    to data at Z according to policy Q"
  • High performance: unique demands of advanced
    high-performance systems
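
The "complicated requirements" bullet can be made concrete with a toy model. This is a minimal sketch, not any real grid authorisation API: the `Request` type, the two policy functions, and the rule tables are all hypothetical names introduced for illustration. It shows a request being granted only when two independently owned policies (P from the community, Q from the data owner) both agree.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Request:
    user: str
    program: str        # "X" on the slide
    site: str           # "Y"
    data_location: str  # "Z"


Policy = Callable[[Request], bool]

# Hypothetical rule tables; a real grid would obtain these from the
# community and from the data owner rather than hard-coding them.
COMMUNITY_PROGRAMS_BY_SITE = {"Y": {"X"}}
DATA_READERS = {"Z": {"alice"}}


def community_policy_p(req: Request) -> bool:
    """Policy P: the community decides which programs may run at which site."""
    return req.program in COMMUNITY_PROGRAMS_BY_SITE.get(req.site, set())


def data_policy_q(req: Request) -> bool:
    """Policy Q: the owner of the data at Z decides who may read it."""
    return req.user in DATA_READERS.get(req.data_location, set())


def authorise(req: Request, policies: Iterable[Policy]) -> bool:
    """Grant the request only if every independently owned policy agrees."""
    return all(policy(req) for policy in policies)


policies = [community_policy_p, data_policy_q]
print(authorise(Request("alice", "X", "Y", "Z"), policies))  # True
print(authorise(Request("bob", "X", "Y", "Z"), policies))    # False
```

Keeping each policy as a separate function mirrors the point of the slide: no single party owns all the rules, so the decision has to be composed from several sources.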

14
What is the Grid?
  • "Grid computing is distinguished from
    conventional distributed computing by its focus
    on large-scale resource sharing, innovative
    applications, and, in some cases,
    high-performance orientation...we review the
    "Grid problem", which we define as flexible,
    secure, coordinated resource sharing among
    dynamic collections of individuals, institutions,
    and resources - what we refer to as virtual
    organizations."
  • From "The Anatomy of the Grid: Enabling Scalable
    Virtual Organizations" by Foster, Kesselman and
    Tuecke

15
What is the Grid?
  • Resource sharing and coordinated problem solving
    in dynamic, multi-institutional virtual
    organizations
  • On-demand, ubiquitous access to computing, data,
    and all kinds of services
  • New capabilities constructed dynamically and
    transparently from distributed services
  • No central location, No central control, No
    existing trust relationships, Little
    predetermination
  • Uniformity
  • Pooling Resources

16
  • Grid Middleware

Diverse global services
Grid services
Local OS
17
Common principles
  • Single sign-on
  • Often implying Public Key Infrastructure (PKI)
  • Standard protocols and services
  • Respect for autonomy of resource owner
  • Layered architectures
  • Higher-level infrastructures hiding heterogeneity
    of lower levels
  • Interoperability is paramount
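
The layering principle can be illustrated with a small sketch. Everything here is hypothetical: `LocalScheduler`, the two site classes, and `GridJobService` are made-up names, and the strings they return stand in for whatever each site's own job system would actually do. The point is only that the higher layer offers one interface while the lower layer stays heterogeneous and under the control of each resource owner.

```python
from abc import ABC, abstractmethod
from typing import Dict


class LocalScheduler(ABC):
    """Lower layer: each site keeps its own, different job system."""

    @abstractmethod
    def submit(self, executable: str, cpus: int) -> str:
        """Return a site-specific job description."""


class SiteAScheduler(LocalScheduler):
    def submit(self, executable: str, cpus: int) -> str:
        return f"site-a run {executable} --cpus {cpus}"


class SiteBScheduler(LocalScheduler):
    def submit(self, executable: str, cpus: int) -> str:
        return f"site-b job(exe={executable}, n={cpus})"


class GridJobService:
    """Higher layer: one uniform call, whatever the site underneath."""

    def __init__(self, sites: Dict[str, LocalScheduler]) -> None:
        self._sites = sites

    def run(self, site: str, executable: str, cpus: int = 1) -> str:
        return self._sites[site].submit(executable, cpus)


grid = GridJobService({"a": SiteAScheduler(), "b": SiteBScheduler()})
for site in ("a", "b"):
    print(grid.run(site, "analysis", cpus=4))
```

A caller of `GridJobService.run` never sees the differences between the two sites, which is what "hiding heterogeneity of lower levels" amounts to in this toy setting.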

18
Grid Middleware
  • Middleware: Globus, UNICORE, Legion and Avaki
  • Scheduling: Sun Grid Engine; Load Sharing
    Facility (LSF) from Platform Computing; OpenPBS
    and PBS Pro from Veridian; Maui scheduler;
    Condor (could also go under middleware)
  • Data: Storage Resource Broker (SRB), Replica
    Management, OGSA-DAI
  • Web services (WSDL, SOAP, UDDI): IBM WebSphere,
    Microsoft .NET, Sun Open Net Environment
    (Sun ONE)
  • PC Grids
  • Peer-to-Peer computing

19
  • Seamless Access to Multiple Datasets
  • www.sve.man.ac.uk/Research/AtoZ/SAMD

20
SAMD
  • ESRC demonstrator showing the benefits of
    applying grid technologies to an ordinary social
    science query
  • We solve a genuine problem from the UK academic
    social science community - a multivariate
    analysis using a complex mathematical algorithm
  • Based on a major social science databank, the
    Office for National Statistics Time Series Data,
    hosted at MIMAS

21
Before SAMD
22
Motivation
  • Web-based access to socio-economic datasets such
    as Office for National Statistics Time Series Data
    has led to greatly increased use, but:
  • No standard authentication or authorisation
  • too many usernames and passwords to remember
  • To automate search and retrieval, can only
    emulate navigation through "screen scraping"
  • breaks whenever the interface is "improved"
  • discourages third party developments and periodic
    re-analysis
  • Data must be downloaded and saved to local disk
  • not necessarily the system on which subsequent
    analysis is to be performed
  • inefficient, especially for large datasets

23
SAMD solution
  • Use Grid Security Infrastructure for "single
    sign-on" authentication everywhere
  • Modified standard Apache web server to accept
    proxy credentials
  • Permits re-use of existing CGI code
  • Use third party file transfers (GridFTP) to move
    data directly to where it's needed
  • Use standard Globus mechanisms to
  • Locate HPC facility for analysis
  • Stage analysis binary from local repository and
    run analysis job on HPC facility
  • Retrieve results
  • It all worked, and cut the data collection and
    analysis time down to around 5 minutes.
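
The single sign-on idea behind these steps can be sketched as a toy model. This is not Globus or Grid Security Infrastructure code: `ProxyCredential`, `sign_on`, `run_workflow`, the step descriptions, and the 12-hour lifetime are all assumptions made for illustration. The user authenticates once, and every later step presents the same short-lived credential instead of a separate username and password.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class ProxyCredential:
    subject: str
    expires: datetime

    def is_valid(self, now: datetime) -> bool:
        return now < self.expires


def sign_on(subject: str, now: datetime,
            lifetime: timedelta = timedelta(hours=12)) -> ProxyCredential:
    """Authenticate once; the 12-hour lifetime is an assumed default."""
    return ProxyCredential(subject, now + lifetime)


# Step descriptions paraphrase the slide; they are labels, not real calls.
WORKFLOW_STEPS = [
    "query the data service through the web server",
    "third-party transfer of the data to the HPC facility",
    "stage the analysis binary",
    "run the analysis job",
    "retrieve the results",
]


def run_workflow(cred: ProxyCredential, now: datetime) -> List[str]:
    """Present the same credential at every step; fail if it has expired."""
    log = []
    for step in WORKFLOW_STEPS:
        if not cred.is_valid(now):
            raise PermissionError(f"credential expired before: {step}")
        log.append(f"{cred.subject}: {step}")
    return log


start = datetime(2004, 6, 30, 9, 0)
cred = sign_on("researcher", start)
for line in run_workflow(cred, start + timedelta(minutes=5)):
    print(line)
```

Because the credential expires on its own, a step attempted after the lifetime has passed raises `PermissionError` rather than silently reusing stale access.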

24
SAMD Architecture
25
SAMD User Interface
26
What's new?
  • Web interfaces to datasets?
  • We show that there are more flexible ways of
    delivering access to data over the internet than
    through static web pages alone
  • Single sign-on?
  • We show that the domain of single sign-on can be
    much broader than provided by Athens
  • Graphical User Interfaces?
  • We show that it's possible for a third party to
    develop new tools independently of data
    providers
  • A short script can encapsulate all the essential
    functionality of the SAMD GUI
  • Integration, Interoperability!

27
  • If one centre is good then many must be better

28
National Centres
  • National e-Science Centre
  • EPSRC, www.nesc.ac.uk
  • National e-Social Science Centre
  • ESRC, www.ncess.ac.uk
  • National Institute for Environmental e-Science
  • NERC, www.niees.ac.uk
  • OMII
  • www.omii.ac.uk
  • Digital Curation Centre
  • www.dcc.ac.uk
  • National Text Mining Centre
  • National Grid Service (Grid Support Centre)
  • www.ngs.ac.uk
  • Access Grid Support Centre

29
Regional Centres and Centres of Excellence
30
National Grid Service
  • UK Production Data and Computational Grid
  • Oxford and Leeds (White Rose Grid): Compute Nodes
  • Bristol and Cardiff
  • Manchester and CCLRC-RAL: Data Nodes

http://www.csar.cfs.ac.uk/ 512 Itanium2 processor SGI Altix,
512 processor Origin3800
http://www.hpcx.ac.uk/ 1600 IBM p690 Regatta processors
31
National Grid Service
  • Thus, the NGS provides access to
  • over 3,000 processors,
  • over 36TB of "data-grid" capacity,
  • common scientific applications
  • and extensive data archives.
  • Other resource providers anticipated to join in
    the future

32
National Grid Service
  • More than just computation and data resources
  • In future will include services to facilitate
    collaborative (grid) computing
  • Authentication (PKI, X.509)
  • Job submission/batch service
  • Authorisation
  • Certificate management
  • Virtual Organisation management
  • Data access/integration services
    (SRB/OGSA-DAI/DQPS)
  • Information service
  • National Registry (of registries)
  • Data replication
  • Data caching
  • Grid monitoring
  • Accounting

33
  • Conclusions

34
Today's Grid
  • A Single System Image
  • Transparent wide-area access to large data banks
  • Transparent wide-area access to applications on
    heterogeneous platforms
  • Transparent wide-area access to processing
    resources
  • Security, certification, single sign-on
    authentication, AAA
  • Grid Security Infrastructure
  • Data access, Transfer and Replication
  • GridFTP, Giggle
  • Computational resource discovery, allocation and
    process creation
  • GRAM, UNICORE, Condor-G

35
Reality Checks!!
  • The Technology is Ready
  • Not true: it's emerging
  • Building middleware, Advancing Standards,
    Developing Dependability
  • Building demonstrators.
  • The computational grid is in advance of the data
    intensive middleware
  • Integration and curation are probably the
    obstacles
  • But!! It doesn't have to be all there to be
    useful.
  • We know how we will use grid services
  • No Disruptive technology
  • Lower the barriers of entry.

36
Grid Evolution
  • 1st Generation Grid
  • Computationally intensive, file access/transfer
  • Bag of various heterogeneous protocols and
    toolkits
  • Recognises internet, Ignores Web
  • Academic teams
  • 2nd Generation Grid
  • Data intensive - knowledge intensive
  • Services-based architecture
  • Recognises Web and Web services
  • Global Grid Forum
  • Industry participation

We are here!
37
I don't want to share! Do I need a grid?
38
In conclusion
  • The GRID is not, and will not be, free
  • must pay for resources

39
Acknowledgements
  • Carole Goble
  • Stephen Pickles
  • Keith Cole
  • John Brooke
  • Paul Jeffreys
  • University of Manchester
  • Academic collaborators
  • Industrial collaborators
  • Funding Agencies: DTI, EPSRC, NERC, ESRC, PPARC

40
SVE @ Manchester Computing
World Leading Supercomputing Service, Support and
Research: Bringing Science and Supercomputers Together
www.man.ac.uk/sve sve@man.ac.uk