Title: Grid Computing: Concepts, Applications, and Technologies
1 Grid Computing: Concepts, Applications, and Technologies
- Ian Foster
- Mathematics and Computer Science Division
- Argonne National Laboratory
- and
- Department of Computer Science
- The University of Chicago
- http://www.mcs.anl.gov/foster
Grid Computing in Canada Workshop, University of Alberta, May 1, 2002
2 Outline
- The technology landscape
- Grid computing
- The Globus Toolkit
- Applications and technologies: data-intensive, distributed computing, collaborative, remote access to facilities
- Grid infrastructure
- Open Grid Services Architecture
- Global Grid Forum
- Summary and conclusions
4 Living in an Exponential World (1): Computing and Sensors
- Moore's Law: transistor count doubles every 18 months
(Figure: magnetohydrodynamics, star formation)
5 Living in an Exponential World (2): Storage
- Storage density doubles every 12 months
- Dramatic growth in online data (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes)
  - 2000: 0.5 petabyte
  - 2005: 10 petabytes
  - 2010: 100 petabytes
  - 2015: 1,000 petabytes?
- Transforming entire disciplines in the physical and, increasingly, biological sciences; humanities next?
6 Data Intensive Physical Sciences
- High energy and nuclear physics
  - Including new experiments at CERN
- Gravity wave searches
  - LIGO, GEO, VIRGO
- Time-dependent 3-D systems (simulation, data)
- Earth observation, climate modeling
- Geophysics, earthquake modeling
- Fluids, aerodynamic design
- Pollutant dispersal scenarios
- Astronomy: digital sky surveys
7 Ongoing Astronomical Mega-Surveys
- Large number of new surveys
  - Multi-TB in size, 100M objects or larger
  - In databases
  - Individual archives planned and under way
- Multi-wavelength view of the sky
  - More than 13 wavelengths covered within 5 years
- Impressive early discoveries
  - Finding exotic objects by unusual colors: L and T dwarfs, high-redshift quasars
  - Finding objects by time variability: gravitational micro-lensing
- Surveys include MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
8 Crab Nebula in Four Spectral Regions
(Images: X-ray, optical, infrared, and radio views)
9 Coming Floods of Astronomy Data
- The planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008!
- All-sky survey every few days, so fine-grained time series will be available for the first time
10 Data Intensive Biology and Medicine
- Medical data
  - X-ray, mammography data, etc. (many petabytes)
  - Digitizing patient records (ditto)
- X-ray crystallography
- Molecular genomics and related disciplines
  - Human Genome, other genome databases
  - Proteomics (protein structure, activities, ...)
  - Protein interactions, drug delivery
- Virtual Population Laboratory (proposed)
  - Simulate likely spread of disease outbreaks
- Brain scans (3-D, time dependent)
11 A Brain is a Lot of Data! (Mark Ellisman, UCSD)
- And comparisons must be made among many
- "We need to get to one micron to know the location of every cell. We're just now starting to get to 10 microns. Grids will help get us there and further."
12 An Exponential World (3): Networks (Or, Coefficients Matter...)
- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference: an order of magnitude every 5 years
- 1986 to 2000
  - Computers: x 500
  - Networks: x 340,000
- 2001 to 2010
  - Computers: x 60
  - Networks: x 4,000
(Graph: Moore's Law vs. storage improvements vs. optical improvements. From Scientific American, Jan 2001, by Cleo Vilett; source: Vinod Khosla, Kleiner, Caufield and Perkins.)
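The gap between these growth rates compounds quickly. As a rough illustration (simple arithmetic only, using the doubling times quoted above):

    # Rough arithmetic behind the "order of magnitude per 5 years" claim.
    def gain(months, doubling_months):
        return 2 ** (months / doubling_months)

    five_years = 60  # months
    computers = gain(five_years, 18)   # ~10x
    networks = gain(five_years, 9)     # ~100x
    print(f"computers ~{computers:.0f}x, networks ~{networks:.0f}x, "
          f"gap ~{networks / computers:.0f}x per 5 years")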
14 Evolution of the Scientific Process
- Pre-electronic
  - Theorize and/or experiment, alone or in small teams; publish paper
- Post-electronic
  - Construct and mine very large databases of observational or simulation data
  - Develop computer simulations and analyses
  - Exchange information quasi-instantaneously within large, distributed, multidisciplinary teams
15 Evolution of Business
- Pre-Internet
  - Central corporate data processing facility
  - Business processes not compute-oriented
- Post-Internet
  - Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
  - Outsourcing becomes feasible -> service providers of various sorts
  - Business processes increasingly computing- and data-rich
16 The Grid
- Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
17 An Example Virtual Organization: CERN's Large Hadron Collider
- 1800 physicists, 150 institutes, 32 countries
- 100 PB of data by 2010; 50,000 CPUs?
18 Grid Communities and Applications: Data Grids for High Energy Physics
www.griphyn.org, www.ppdg.net, www.eu-datagrid.org
19 Data Integration and Mining (credit: Sara Graves): From Global Information to Local Knowledge
- Emergency response
- Precision agriculture
- Urban environments
- Weather prediction
20 Intelligent Infrastructure: Distributed Servers and Services
21 The Grid Opportunity: eScience and eBusiness
- Physicists worldwide pool resources for peta-op analyses of petabytes of data
- Civil engineers collaborate to design, execute, and analyze shake table experiments
- An insurance company mines data from partner hospitals for fraud detection
- An application service provider offloads excess load to a compute cycle provider
- An enterprise configures internal and external resources to support its eBusiness workload
22 Grid Computing
23 The Grid: A Brief History
- Early 90s
  - Gigabit testbeds, metacomputing
- Mid to late 90s
  - Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments
- 2002
  - Dozens of application communities and projects
  - Major infrastructure deployments
  - Significant technology base (esp. the Globus Toolkit)
  - Growing industrial interest
  - Global Grid Forum: 500 people, 20 countries
24 Challenging Technical Requirements
- Dynamic formation and management of virtual organizations
- Online negotiation of access to services: who, what, why, when, how
- Establishment of applications and systems able to deliver multiple qualities of service
- Autonomic management of infrastructure elements
- Open Grid Services Architecture: http://www.globus.org/ogsa
25 Grid Concept (Take 1)
- Analogy with the electrical power grid
- On-demand access to ubiquitous distributed computing
- Transparent access to multi-petabyte distributed databases
- Easy to plug resources into
- Complexity of the infrastructure is hidden
- "When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special-purpose appliances." (George Gilder)
26 Grid Vision (Take 2)
- e-Science and information utilities (Taylor)
  - Science increasingly done through distributed global collaborations between people, enabled by the Internet
  - Using very large data collections, terascale computing resources, and high-performance visualisation
  - Derived from instruments and facilities controlled and shared via the infrastructure
  - Scaling: x1000 in processing power, data, bandwidth
27 Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, ...
  - Heterogeneity of device, mechanism, policy
  - Sharing conditional: negotiation, payment, ...
- Coordinated problem solving
  - Integration of distributed resources
  - Compound quality-of-service requirements
- Dynamic, multi-institutional virtual organizations
  - Dynamic overlays on classic organizational structures
  - Map to underlying control mechanisms
28 The Grid World: Current Status
- Dozens of major Grid projects in scientific and technical computing, research, and education
  - www.mcs.anl.gov/foster/grid-projects
- Considerable consensus on key concepts and technologies
  - Open source Globus Toolkit a de facto standard for major protocols and services
- Industrial interest emerging rapidly
  - IBM, Platform, Microsoft, Sun, Compaq, ...
- Opportunity: convergence of eScience and eBusiness requirements and technologies
29 Contemporary Grid Projects
- Computer science research
  - Wide variety of projects worldwide
  - Situation confused by profligate use of the label
- Technology development
  - R&E: Condor, Globus, EU DataGrid, GriPhyN
  - Industrial: significant efforts emerging
- Infrastructure development
  - Persistent services as well as hardware
- Application
  - Deployment and production application
30 Selected Major Grid Projects (1)
- Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): create and deploy group collaboration systems using commodity technologies
- BlueGrid (IBM): Grid testbed linking IBM laboratories
- DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): create operational Grid providing access to resources at three U.S. DOE weapons laboratories
- DOE Science Grid (sciencegrid.org; DOE Office of Science): create operational Grid providing access to resources and applications at U.S. DOE science laboratories and partner universities
- Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): delivery and analysis of large climate model datasets for the climate research community
- European Union (EU) DataGrid (eu-datagrid.org; European Union): create and apply an operational grid for applications in high energy physics, environmental science, and bioinformatics
31 Selected Major Grid Projects (2)
- EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): create technologies for remote access to supercomputing resources and simulation codes; in GRIP, integrate with the Globus Toolkit
- Fusion Collaboratory (fusiongrid.org; DOE Office of Science): create a national computational collaboratory for fusion research
- Globus Project (globus.org; DARPA, DOE, NSF, NASA, Microsoft): research on Grid technologies; development and support of the Globus Toolkit; application and deployment
- GridLab (gridlab.org; European Union): Grid technologies and applications
- GridPP (gridpp.ac.uk; U.K. eScience): create and apply an operational grid within the U.K. for particle physics research
- Grid Research Integration Development and Support Center (grids-center.org; NSF): integration, deployment, and support of the NSF Middleware Infrastructure for research and education
32 Selected Major Grid Projects (3)
- Grid Application Development Software (hipersoft.rice.edu/grads; NSF): research into program development technologies for Grid applications
- Grid Physics Network (griphyn.org; NSF): technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS
- Information Power Grid (ipg.nasa.gov; NASA): create and apply a production Grid for aerosciences and other NASA missions
- International Virtual Data Grid Laboratory (ivdgl.org; NSF): create an international Data Grid to enable large-scale experimentation on Grid technologies and applications
- Network for Earthquake Engineering Simulation Grid (neesgrid.org; NSF): create and apply a production Grid for earthquake engineering
- Particle Physics Data Grid (ppdg.net; DOE Science): create and apply production Grids for data analysis in high energy and nuclear physics experiments
33 Selected Major Grid Projects (4)
- TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
- UK Grid Support Center (grid-support.ac.uk; U.K. eScience): support center for Grid projects within the U.K.
- Unicore (BMBFT): technologies for remote access to supercomputers
Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS. See also www.gridforum.org
34 Grid Activities in Japan
- Ninf (ETL/TIT)
  - Developing network-enabled servers
  - Collaborating with NetSolve, UTK
  - Grid RPC APM WG proposal
- Metacomputing (TACC/JAERI)
  - MPI for vectors, PACX-MPI, STAMPI
  - Stuttgart, Manchester, Taiwan, Pittsburgh
- Virtual Supercomputing Center
  - Deploy portal for assembling a supercomputer center
- Globus promotion
  - Firewall compliance extension
- ApGrid
  - A regional testbed across the Pacific Rim
- Resources
  - Data Reservoir: 300M JPY x 3 yrs
  - Ninf-g/Grid-RPC: 200M JPY
- Networking infrastructure
  - SuperSINET, JGN: unknown
36 Grid Technologies: Resource Sharing Mechanisms That...
- Address security and policy concerns of resource owners and users
- Are flexible enough to deal with many resource types and sharing modalities
- Scale to large numbers of resources, many participants, many program components
- Operate efficiently when dealing with large amounts of data and computation
37 Aspects of the Problem
- Need for interoperability when different groups want to share resources
  - Diverse components, policies, mechanisms
  - E.g., standard notions of identity, means of communication, resource descriptions
- Need for shared infrastructure services to avoid repeated development and installation
  - E.g., one port/service/protocol for remote access to computing, not one per tool/application
  - E.g., Certificate Authorities are expensive to run
- A common need for protocols and services
38 Hence, a Protocol-Oriented View of Grid Architecture, that Emphasizes...
- Development of Grid protocols and services
  - Protocol-mediated access to remote resources
  - New services: e.g., resource brokering
  - "On the Grid" = speak Intergrid protocols
  - Mostly (extensions to) existing protocols
- Development of Grid APIs and SDKs
  - Interfaces to Grid protocols and services
  - Facilitate application development by supplying higher-level abstractions
39 The Hourglass Model
- Focus on architecture issues
  - Propose a set of core services as basic infrastructure
  - Use these to construct high-level, domain-specific solutions
- Design principles
  - Keep participation cost low
  - Enable local control
  - Support for adaptation
- Cf. the IP hourglass model
(Figure: hourglass with applications at the top, diverse global services below, a narrow neck of core services, and the local OS at the base)
40 Layered Grid Architecture (by Analogy to the Internet Architecture)
41 Globus Toolkit
- A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
  - Offers a modular set of orthogonal services
  - Enables incremental development of Grid-enabled tools and applications
  - Implements standard Grid protocols and APIs
  - Available under a liberal open source license
  - Large community of developers and users
  - Commercial support
42 General Approach
- Define Grid protocols and APIs
  - Protocol-mediated access to remote resources
  - Integrate and extend existing standards
  - "On the Grid" = speak Intergrid protocols
- Develop a reference implementation
  - Open source Globus Toolkit
  - Client and server SDKs, services, tools, etc.
- Grid-enable a wide variety of tools
  - Globus Toolkit, FTP, SSH, Condor, SRB, MPI, ...
- Learn through deployment and applications
43 Key Protocols
- The Globus Toolkit centers around four key protocols
  - Connectivity layer
    - Security: Grid Security Infrastructure (GSI)
  - Resource layer
    - Resource management: Grid Resource Allocation Management (GRAM)
    - Information services: Grid Resource Information Protocol (GRIP) and Grid Index Information Protocol (GIIP)
    - Data transfer: Grid File Transfer Protocol (GridFTP)
- Also key collective-layer protocols
  - Information services, replica management, etc.
44 Globus Toolkit Structure
(Figure: GRAM on compute resources, GridFTP on data resources, and MDS, each built on GSI and sharing common machinery for service naming, soft-state management, reliable invocation, and notification, with other services or applications layered above.)
45 Connectivity Layer: Protocols and Services
- Communication
  - Internet protocols: IP, DNS, routing, etc.
- Security: Grid Security Infrastructure (GSI)
  - Uniform authentication, authorization, and message protection mechanisms in a multi-institutional setting
  - Single sign-on, delegation, identity mapping
  - Public key technology, SSL, X.509, GSS-API
  - Supporting infrastructure: Certificate Authorities, certificate and key management, ...
- GSI: www.gridforum.org/security/gsi
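GSI builds its single sign-on and delegation machinery on top of standard TLS/X.509 mutual authentication. As a point of reference only (this is the generic TLS layer, not GSI itself; the host name, port, and credential paths are placeholders), a minimal client-side sketch in Python:

    import socket
    import ssl

    # Plain TLS/X.509 mutual authentication: the standard machinery that GSI
    # extends with proxy certificates and delegation. All names are placeholders.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile="ca-bundle.pem")                    # trusted CAs
    ctx.load_cert_chain(certfile="usercert.pem", keyfile="userkey.pem")  # our identity

    with socket.create_connection(("gridservice.example.org", 2119)) as sock:
        with ctx.wrap_socket(sock, server_hostname="gridservice.example.org") as tls:
            # Both ends have now proven X.509 identities; a GSI layer would go on
            # to delegate a short-lived proxy credential over this channel.
            print("authenticated peer:", tls.getpeercert()["subject"])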
46 Why Grid Security is Hard
- Resources being used may be extremely valuable and the problems being solved extremely sensitive
- Resources are often located in distinct administrative domains
  - Each resource may have its own policies and procedures
- The set of resources used by a single computation may be large, dynamic, and/or unpredictable
  - Not just client/server
- It must be broadly available and applicable
  - Standard, well-tested, well-understood protocols
  - Integration with a wide variety of tools
47 Grid Security Requirements
- User view: 1) easy to use; 2) single sign-on; 3) run applications (ftp, ssh, MPI, Condor, Web, ...); 4) user-based trust model; 5) proxies/agents (delegation)
- Resource owner view: 1) specify local access control; 2) auditing, accounting, etc.; 3) integration with local systems (Kerberos, AFS, license managers); 4) protection from compromised resources
- Developer view: API/SDK with authentication, flexible message protection, flexible communication, delegation, ...; direct calls to various security functions (e.g., GSS-API), or security integrated into higher-level SDKs (e.g., GlobusIO, Condor-G, MPICH-G2, HDF5)
48 Grid Security Infrastructure (GSI)
- Extensions to existing standard protocols and APIs
  - Standards: SSL/TLS, X.509, CA, GSS-API
  - Extensions for single sign-on and delegation
- Globus Toolkit reference implementation of GSI
  - SSLeay/OpenSSL + GSS-API + delegation
- Tools and services to interface to local security
  - Simple ACLs; SSLK5 and PKINIT for access to K5, AFS, etc.
- Tools for credential management
  - Login, logout, etc.
  - Smartcards
  - MyProxy: Web portal login and delegation
  - K5cert: automatic X.509 certificate creation
49 GSI in Action: Create Processes at A and B that Communicate and Access Files at C
(Figure: a single user sign-on spans Site A (Kerberos), Site B (Unix), and Site C (Kerberos), with computers at A and B and a storage system at C.)
50 GSI Working Group Documents
- Grid Security Infrastructure (GSI) Roadmap
  - Informational draft overview of working group activities and documents
- Grid security protocols and syntax
  - X.509 Proxy Certificates
  - X.509 Proxy Delegation Protocol
  - The GSI GSS-API Mechanism
- Grid security APIs
  - GSS-API Extensions for the Grid
  - GSI Shell API
51 GSI Futures
- Scalability in numbers of users and resources
  - Credential management
  - Online credential repositories (MyProxy)
  - Account management
- Authorization
  - Policy languages
  - Community authorization
- Protection against compromised resources
  - Restricted delegation, smartcards
52 Community Authorization
(Figure; credit: Laura Pearlman, Steve Tuecke, Von Welch, and others)
53 Resource Layer: Protocols and Services
- Grid Resource Allocation Management (GRAM)
  - Remote allocation, reservation, monitoring, and control of compute resources
- GridFTP protocol (FTP extensions)
  - High-performance data access and transport
- Grid Resource Information Service (GRIS)
  - Access to structure and state information
- Others emerging: catalog access, code repository access, accounting, etc.
- All built on the connectivity layer: GSI and IP
- GRAM, GridFTP, GRIS: www.globus.org
54 Resource Management
- The Grid Resource Allocation Management (GRAM) protocol and client API allow programs to be started and managed on remote resources, despite local heterogeneity
- The Resource Specification Language (RSL) is used to communicate requirements
- A layered architecture allows application-specific resource brokers and co-allocators to be defined in terms of GRAM services
- Integrated with Condor, PBS, MPICH-G2, ...
55 Resource Management Architecture
(Figure: an application expresses requirements in RSL; RSL specialization, informed by queries to an information service, produces ground RSL and then simple ground RSL, which GRAM services pass to local resource managers such as LSF, Condor, and NQE.)
56 Data Access and Transfer
- GridFTP: extended version of the popular FTP protocol for Grid data access and transfer
- Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.:
  - Third-party data transfers, partial file transfers
  - Parallelism, striping (e.g., on PVFS)
  - Reliable, recoverable data transfers
- Reference implementations
  - Existing clients and servers: wuftpd, ncftp
  - Flexible, extensible libraries in the Globus Toolkit
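GridFTP extends the standard FTP command set rather than replacing it. As a point of reference only, the sketch below drives the unextended subset, a partial, restartable retrieval, using Python's standard ftplib; the host, path, and byte offset are placeholders:

    from ftplib import FTP

    # Plain-FTP fragment: GridFTP layers GSI security, parallel and striped
    # streams, and richer restart markers on top of this basic command set.
    def fetch_tail(host, path, local_path, offset):
        """Resume a transfer from byte `offset` using the FTP REST mechanism."""
        with FTP(host) as ftp:
            ftp.login()                      # anonymous login for the sketch
            with open(local_path, "ab") as out:
                ftp.retrbinary(f"RETR {path}", out.write, rest=offset)

    fetch_tail("ftp.example.org", "/pub/climate/jan1998.nc", "jan1998.nc", 1_000_000)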
57 The Grid Information Problem
- Large numbers of distributed "sensors" with different properties
- Need for different views of this information, depending on community membership, security constraints, intended purpose, sensor type
58 The Globus Toolkit Solution: MDS-2
- Registration and enquiry protocols, information models, query languages
- Provides standard interfaces to sensors
- Supports different directory structures supporting various discovery/access strategies
59 Globus Applications and Deployments
- Application projects include
  - GriPhyN, PPDG, NEES, EU DataGrid, ESG, Fusion Collaboratory, etc.
- Infrastructure deployments include
  - DISCOM, NASA IPG, NSF TeraGrid, DOE Science Grid, EU DataGrid, etc.
  - UK Grid Center, U.S. GRIDS Center
- Technology projects include
  - Data Grids, Access Grid, Portals, CORBA, MPICH-G2, Condor-G, GrADS, etc.
60 Globus Futures
- Numerous large projects are pushing hard on production deployment and application
  - Much will be learned in the next 2 years!
- Active R&D program, focused for example on
  - Security and policy for resource sharing
  - Flexible, high-performance, scalable data sharing
  - Integration with Web Services, etc.
  - Programming models and tools
- Community code development producing a true Open Grid Architecture
62 Important Grid Applications
- Data-intensive
- Distributed computing (metacomputing)
- Collaborative
- Remote access to, and computer enhancement of, experimental facilities
64 Data Intensive Science 2000-2015
- Scientific discovery increasingly driven by IT
  - Computationally intensive analyses
  - Massive data collections
  - Data distributed across networks of varying capability
  - Geographically distributed collaboration
- Dominant factor: data growth (1 petabyte = 1,000 TB)
  - 2000: 0.5 petabyte
  - 2005: 10 petabytes
  - 2010: 100 petabytes
  - 2015: 1,000 petabytes?
How to collect, manage, access, and interpret this quantity of data?
Drives demand for Data Grids to handle the additional dimension of data access and movement
65 Data Grid Projects
- Particle Physics Data Grid (US, DOE): Data Grid applications for HENP experiments
- GriPhyN (US, NSF): petascale virtual-data Grids
- iVDGL (US, NSF): global Grid laboratory
- TeraGrid (US, NSF): distributed supercomputing resources (13 TFlops)
- European DataGrid (EU, EC): Data Grid technologies, EU deployment
- CrossGrid (EU, EC): Data Grid technologies, EU emphasis
- DataTAG (EU, EC): transatlantic network, Grid applications
- Japanese Grid Projects (ApGrid) (Japan): Grid deployment throughout Japan
- Common threads across these projects: collaborations of application scientists and computer scientists; infrastructure development and deployment; Globus based
66 Grid Communities and Applications: Data Grids for High Energy Physics
www.griphyn.org, www.ppdg.net, www.eu-datagrid.org
67 Biomedical Informatics Research Network (BIRN)
- An evolving reference set of brains provides essential data for developing therapies for neurological disorders (multiple sclerosis, Alzheimer's, etc.)
- Today
  - One lab, small patient base
  - 4 TB collection
- Tomorrow
  - Tens of collaborating labs
  - Larger population sample
  - 400 TB data collection: more brains, higher resolution
  - Multiple-scale data integration and analysis
68 Digital Radiology (Hollebeek, U. Pennsylvania)
- Hospital digital data: mammograms, X-rays, MRI, CAT scans, endoscopies, ...
  - Very large data sources; great clinical value to digital storage and manipulation, and significant cost savings
  - 7 terabytes per hospital per year
  - Dominated by digital images
- Why mammography
  - Clinical need for film recall and computer analysis
  - Large volume (~4,000 GB/year; 57% of total)
  - Storage and records standards exist
  - Great clinical value
69 Earth System Grid (ANL, LBNL, LLNL, NCAR, ISI, ORNL)
- Enable a distributed community of thousands to perform computationally intensive analyses on large climate datasets
- Via
  - Creation of a Data Grid supporting secure, high-performance remote access
  - Smart data servers supporting reduction and analyses
  - Integration with environmental data analysis systems, protocols, and thin clients
- www.earthsystemgrid.org (soon)
70 Earth System Grid Architecture
(Figure: an application resolves an attribute specification for a logical collection and logical file names against a metadata catalog and replica catalog (MDS), obtains multiple candidate locations, uses replica selection informed by NWS performance information and predictions to pick a replica, and issues GridFTP commands against storage at the chosen replica location: disk caches, a disk array, or a tape library.)
71 Data Grid Toolkit Architecture
72 A Universal Access/Transport Protocol
- Suite of communication libraries and related tools that support
  - GSI security
  - Third-party transfers
  - Parameter set/negotiate
  - Partial file access
  - Reliability/restart
  - Logging/audit trail
  - Integrated instrumentation
  - Parallel transfers
  - Striping (cf. DPSS)
  - Policy-based access control
  - Server-side computation (later)
- All based on a standard, widely deployed protocol
73 And the Universal Protocol is... GridFTP
- Why FTP?
  - Ubiquity enables interoperation with many commodity tools
  - Already supports many desired features, easily extended to support others
- We use the term GridFTP to refer to
  - The transfer protocol that meets these requirements
  - The family of tools that implement the protocol
- Note: GridFTP > FTP
- Note that despite the name, GridFTP is not restricted to file transfer!
74 GridFTP: Basic Approach
- FTP is defined by several IETF RFCs
- Start with the most commonly used subset
  - Standard FTP: get/put etc., third-party transfer
- Implement RFC'ed but often unused features
  - GSS binding, extended directory listing, simple restart
- Extend in various ways, while preserving interoperability with existing servers
  - Striped/parallel data channels, partial file transfer, automatic and manual TCP buffer setting, progress monitoring, and extended restart
75 The GridFTP Family of Tools
- Patches to existing FTP code
  - GSI-enabled versions of existing FTP client and server, for high-quality production code
- Custom-developed libraries
  - Implement the full GridFTP protocol, targeting custom use and high performance
- Custom-developed tools
  - E.g., high-performance striped FTP server
76 High-Performance Data Transfer
77 GridFTP for Efficient WAN Transfer
- Transfer TB datasets
- Highly secure authentication
- Parallel transfer for speed
- LLNL -> Chicago transfer (slow site network interfaces)
- FUTURE: integrate striped GridFTP with parallel storage systems, e.g., HPSS
(Figure: parallel transfer fully utilizes the bandwidth of the network interface on single nodes; striped transfer across parallel filesystems fully utilizes the bandwidth of a Gb WAN using multiple nodes.)
78 GridFTP for User-Friendly Visualization Setup
- High-resolution visualization is too large for display on a single system
  - Needs to be tiled, 24-bit -> 16-bit depth
  - Needs to be staged to display units
- GridFTP/ActiveMural integration application performs tiling, data reduction, and staging in a single operation
  - PVFS/MPI-IO on server
  - MPI process group transforms data as needed before transfer
- Performance is currently bounded by 100 Mb/s NICs on display nodes
79 Distributed Computing and Visualization
(Figure: a simulation code is submitted to a remote center for execution on 1000s of nodes; the remote center generates TB datasets; FLASH data are transferred to ANL for visualization via a WAN transfer, with GridFTP parallelism utilizing high bandwidth (capable of utilizing >Gb/s WAN links); at Chiba City, visualization code constructs and stores high-resolution visualization frames for display on many devices; a LAN/WAN transfer by a user-friendly striped GridFTP application tiles the frames and stages the tiles onto display nodes; the ActiveMural display shows very high resolution, large-screen dataset animations.)
- FUTURE (1-5 yrs)
  - 10s of Gb/s LANs, WANs
  - End-to-end QoS
  - Automated replica management
  - Server-side data reduction and analysis
  - Interactive portals
80 SC2001 Experiment: Simulation of an HEP Tier 1 Site
- Tiered (hierarchical) site structure
  - All data generated at lower tiers must be forwarded to the higher tiers
  - Tier 1 sites may have many sites transmitting to them simultaneously and will need to sink a substantial amount of bandwidth
- We demonstrated the ability of GridFTP to support this at SC2001 in the Bandwidth Challenge
  - 16 sites, with 27 hosts, pushed a peak of 2.8 Gb/s to the showfloor in Denver, with a sustained bandwidth of nearly 2 Gb/s
81 Visualization of Network Traffic During the Bandwidth Challenge
82 The Replica Management Problem
- Maintain a mapping between logical names for files and collections and one or more physical locations
- Important for many applications
- Example: CERN high-level trigger data
  - Multiple petabytes of data per year
  - Copy of everything at CERN (Tier 0)
  - Subsets at national centers (Tier 1)
  - Smaller regional centers (Tier 2)
  - Individual researchers will have copies
83 Our Approach to Replica Management
- Identify replica cataloging and reliable replication as two fundamental services
  - Layer on other Grid services: GSI, transport, information service
  - Use as building blocks for other tools
- Advantage
  - These services can be used in a wide variety of situations
84 Replica Catalog Structure: A Climate Modeling Example
(Figure: a replica catalog holds logical collections such as "CO2 measurements 1998" and "CO2 measurements 1999". The 1998 collection lists logical files (Jan 1998, Feb 1998, ...), each with attributes such as size (e.g., 1468762 bytes). Location entries tie subsets of the collection to physical storage: jupiter.isi.edu holds Mar, Jun, and Oct 1998 with protocol gsiftp and URL constructor gsiftp://jupiter.isi.edu/nfs/v6/climate, while sprite.llnl.gov holds Jan and Dec 1998 with protocol ftp and URL constructor ftp://sprite.llnl.gov/pub/pcmdi.)
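A minimal sketch of the mapping this example describes, reusing the host names and URL constructors from the figure; the data structures and lookup function are illustrative, not the Globus replica catalog API:

    # Sketch of a replica catalog: logical collection -> locations -> physical URLs.
    CATALOG = {
        "CO2 measurements 1998": [
            {"url_constructor": "gsiftp://jupiter.isi.edu/nfs/v6/climate/",
             "files": {"Mar 1998", "Jun 1998", "Oct 1998"}},
            {"url_constructor": "ftp://sprite.llnl.gov/pub/pcmdi/",
             "files": {"Jan 1998", "Dec 1998"}},
        ],
    }

    def locate(collection, logical_file):
        """Return all physical URLs believed to hold a replica of the logical file."""
        return [loc["url_constructor"] + logical_file
                for loc in CATALOG.get(collection, ())
                if logical_file in loc["files"]]

    print(locate("CO2 measurements 1998", "Jan 1998"))
    # -> ['ftp://sprite.llnl.gov/pub/pcmdi/Jan 1998']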
85 Giggle: A Scalable Replica Location Service
- Local replica catalogs maintain definitive information about replicas
- They publish (perhaps approximate) information to indexes using soft-state techniques
- A variety of indexing strategies is possible
86 GriPhyN: Application Science and CS Grids
- GriPhyN = Grid Physics Network
  - US-CMS: high energy physics
  - US-ATLAS: high energy physics
  - LIGO/LSC: gravity wave research
  - SDSS: Sloan Digital Sky Survey
  - Strong partnership with computer scientists
- Design and implement production-scale grids
  - Develop common infrastructure, tools, and services
  - Integration into the 4 experiments
  - Application to other sciences via the Virtual Data Toolkit
- Multi-year project
  - R&D for grid architecture (funded at $11.9M + $1.6M)
  - Integrate Grid infrastructure into experiments through the VDT
87 GriPhyN Institutions
- U Florida
- U Chicago
- Boston U
- Caltech
- U Wisconsin, Madison
- USC/ISI
- Harvard
- Indiana
- Johns Hopkins
- Northwestern
- Stanford
- U Illinois at Chicago
- U Penn
- U Texas, Brownsville
- U Wisconsin, Milwaukee
- UC Berkeley
- UC San Diego
- San Diego Supercomputer Center
- Lawrence Berkeley Lab
- Argonne
- Fermilab
- Brookhaven
88 GriPhyN: PetaScale Virtual Data Grids
(Figure: production teams, individual investigators, and workgroups use interactive user tools that drive request planning and scheduling tools, request execution and management tools, and virtual data tools. These rely on resource management services, security and policy services, and other Grid services, layered above transforms, raw data sources, and distributed resources (code, storage, CPUs, networks), at a scale of ~1 petaop/s and ~100 petabytes.)
89 GriPhyN/PPDG Data Grid Architecture
(Figure: an application hands a DAG to a planner, which consults catalog services, monitoring, information services, and policy/security components; the resulting DAG goes to an executor that drives replica management, a reliable transfer service, and compute and storage resources. An initial solution is operational. Credit: Ewa Deelman, Mike Wilde.)
90 GriPhyN Research Agenda
- Virtual Data technologies
  - Derived data, calculable via algorithm
  - Instantiated 0, 1, or many times (e.g., caches)
  - "Fetch value" vs. "execute algorithm"
  - Potentially complex (versions, cost calculation, etc.)
  - E.g., LIGO: "Get gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year"
- For each requested data value, need to
  - Locate item materialization, location, and algorithm
  - Determine costs of fetching vs. calculating
  - Plan data movements and computations to obtain results
  - Execute the plan (a sketch of the fetch-vs-recompute decision follows below)
91 Virtual Data in Action
- A data request may
  - Compute locally
  - Compute remotely
  - Access local data
  - Access remote data
- Scheduling based on
  - Local policies
  - Global policies
  - Cost
(Figure: major facilities and archives; regional facilities and caches; local facilities and caches)
92 GriPhyN Research Agenda (cont.)
- Execution management
  - Co-allocation (CPU, storage, network transfers)
  - Fault tolerance, error reporting
  - Interaction, feedback to planning
- Performance analysis (with PPDG)
  - Instrumentation, measurement of all components
  - Understand and optimize grid performance
- Virtual Data Toolkit (VDT)
  - VDT = virtual data services + virtual data tools
  - One of the primary deliverables of the R&D effort
  - Technology transfer to other scientific domains
93 Programs as Community Resources: Data Derivation and Provenance
- Most scientific data are not simple measurements; essentially all are
  - Computationally corrected/reconstructed
  - And/or produced by numerical simulation
- And thus, as data and computers become ever larger and more expensive
  - Programs are significant community resources
  - So are the executions of those programs
94 (continued)
- "I've come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes."
- "I've detected a calibration error in an instrument and want to know which derived data to recompute."
- "I want to apply an astronomical analysis program to millions of objects. If the results already exist, I'll save weeks of computation."
- "I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won't have to write one from scratch."
(Figure: data is created by and consumed by derivations, each an execution of a transformation.)
95 The Chimera Virtual Data System (GriPhyN Project)
- Virtual data catalog
  - Transformations, derivations, data
- Virtual data language
  - Data definition and query
- Applications include browsers and data analysis applications
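A toy sketch of the catalog idea: transformations (programs), derivations (executions with specific inputs), and the data they produce, so one can ask "has this product already been derived?" The class and record layout are illustrative, not Chimera's virtual data language:

    # Toy virtual data catalog: transformations, derivations, and derived data.
    class VirtualDataCatalog:
        def __init__(self):
            self.transformations = {}   # name -> description
            self.derivations = []       # recorded executions

        def add_transformation(self, name, description):
            self.transformations[name] = description

        def record_derivation(self, transformation, inputs, output):
            self.derivations.append({"transformation": transformation,
                                     "inputs": tuple(inputs),
                                     "output": output})

        def find_existing(self, transformation, inputs):
            """Return the output name if this derivation was already run."""
            for d in self.derivations:
                if d["transformation"] == transformation and d["inputs"] == tuple(inputs):
                    return d["output"]
            return None

    vdc = VirtualDataCatalog()
    vdc.add_transformation("cluster_finder", "find galaxy clusters in a sky tile")
    vdc.record_derivation("cluster_finder", ["tile_042.fits"], "clusters_042.dat")
    print(vdc.find_existing("cluster_finder", ["tile_042.fits"]))  # clusters_042.dat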
96 SDSS Galaxy Cluster Finding
97 Cluster-Finding Data Pipeline
98 Virtual Data in CMS
- "Virtual Data: Long Term Vision of CMS", CMS Note 2001/047, GriPhyN 2001-16
99 Early GriPhyN Challenge Problem: CMS Data Reconstruction
(Figure: a master Condor job running at a Caltech workstation coordinates a secondary Condor job on the Wisconsin pool, an NCSA Linux cluster, and NCSA UniTree, a GridFTP-enabled FTP server. Steps shown:
2) Launch secondary job on the Wisconsin pool; input files via Globus GASS
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool
4) 100 data files transferred via GridFTP, 1 GB each
5) Secondary job reports complete to the master
6) Master starts reconstruction jobs via the Globus jobmanager on the cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to the master
Credit: Scott Koranda, Miron Livny, and others.)
100 Trace of a Condor-G Physics Run
102 Distributed Computing
- Aggregate computing resources and codes
  - Multidisciplinary simulation
  - Metacomputing/distributed simulation
  - High-throughput/parameter studies
- Challenges
  - Heterogeneous compute and network capabilities, latencies, dynamic behaviors
- Example tools
  - MPICH-G2: Grid-aware MPI
  - Condor-G, Nimrod-G: parameter studies
103 Multidisciplinary Simulations: Aviation Safety
(Figure: whole-system simulations are produced by coupling all of the sub-system simulations:
- Wing models: lift capabilities, drag capabilities, responsiveness
- Stabilizer models: deflection capabilities, responsiveness
- Airframe models
- Crew capabilities: accuracy, perception, stamina, reaction times, SOPs
- Engine models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
- Landing gear models: braking performance, steering capabilities, traction, dampening capabilities)
104 MPICH-G2: A Grid-Enabled MPI
- A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
  - Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
- Requires services for authentication, resource allocation, executable staging, output, etc.
- Programs run in the wide area without change
  - Modulo accommodating heterogeneous communication performance
- See also MetaMPI, PACX, STAMPI, MAGPIE
- www.globus.org/mpi
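The "programs run without change" point is simply that the code is ordinary MPI, with nothing Grid-specific in it. For illustration only, here is such a program in Python using mpi4py (a present-day MPI binding, not part of MPICH-G2, which targeted C and Fortran):

    # Ordinary MPI program: under MPICH-G2 the same SPMD code could span
    # machines at multiple sites. mpi4py is used here only for illustration.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank contributes a value; the sum is gathered on rank 0.
    local = rank + 1
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"{size} ranks, sum of contributions = {total}")

Launched with an ordinary mpirun, the same source could, under MPICH-G2, be started across Globus-managed resources, as slide 106 sketches.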
105 Grid-based Computation: Challenges
- Locate suitable computers
- Authenticate with appropriate sites
- Allocate resources on those computers
- Initiate computation on those computers
- Configure those computations
- Select appropriate communication methods
- Compute with suitable algorithms
- Access data files, return output
- Respond appropriately to resource changes
106 MPICH-G2: Use of Grid Services
- The user runs grid-proxy-init, then mpirun -np 256 myprog
- mpirun generates a resource specification that is used to acquire and start the requested resources
107 Cactus (Allen, Dramlitsch, Seidel, Shalf, Radke)
- Modular, portable framework for parallel, multidimensional simulations
- Construct codes by linking
  - Small core ("flesh"): management services
  - Selected modules ("thorns"): numerical methods, grids and domain decompositions, visualization and steering, etc.
  - Custom linking/configuration tools
- Developed for astrophysics, but not astrophysics-specific
- www.cactuscode.org
108 Cactus: An Application Framework for Dynamic Grid Computing
- Cactus thorns for active management of application behavior and resource use
- Heterogeneous resources, e.g.
  - Irregular decompositions
  - Variable halo for managing message size
  - Message compression (computation/communication tradeoff)
  - Communication scheduling for computation/communication overlap
- Dynamic resource behaviors/demands, e.g.
  - Performance monitoring, contract violation detection
  - Dynamic resource discovery and migration
  - User notification and steering
109 Cactus Example: Terascale Computing
- Solved Einstein's equations for gravitational waves (real code)
  - Tightly coupled; communication required through derivatives
  - Must communicate 30 MB/step between machines
  - Time step takes 1.6 sec
- Used 10 ghost zones along the direction between machines; communicate every 10 steps
- Compression/decompression on all data passed in this direction
- Achieved 70-80% scaling, 200 GF (only 14% scaling without these tricks)
110 Cactus Example (2): Migration in Action
111 IPG Milestone 3: Large Computing Node (completed 12/2000)
(Figure: OVERFLOW running on the IPG, using Globus and MPICH-G2 for intra-problem, wide-area communication across NASA Glenn (Cleveland, OH), Ames (Moffett Field, CA), and Langley (Hampton, VA), including "Sharp" and "Lomax", a 512-node SGI Origin 2000; the application is a high-lift subsonic wind tunnel model. Application POC: Mohammad J. Djomehri. Slide courtesy Bill Johnston, LBNL and NASA.)
112 High-Throughput Computing: Condor
- High-throughput computing platform for mapping many tasks to idle computers
- Three major components
  - Scheduler: manages pools of distributively owned or dedicated computers
  - DAGman: manages user task pools
  - Matchmaker: schedules tasks to computers
- Parameter studies, data analysis
- Condor-G extensions support wide-area execution in a Grid environment
- www.cs.wisc.edu/condor
113 Defining a DAG
- A DAG is defined by a .dag file, listing each of its nodes and their dependencies; for example, diamond.dag:
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D
- Each node runs the Condor job specified by its accompanying Condor submit file
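To make the dependency semantics concrete, a small sketch (illustrative only, not DAGMan itself) computes an execution order for the diamond DAG above: a node runs only after all of its parents have finished.

    from graphlib import TopologicalSorter  # Python 3.9+

    # Dependencies of the diamond DAG: B and C depend on A, D depends on B and C.
    deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

    ts = TopologicalSorter(deps)
    ts.prepare()
    while ts.is_active():
        ready = list(ts.get_ready())      # nodes whose parents have all finished
        print("can run in parallel:", ready)
        ts.done(*ready)                   # pretend the jobs completed
    # Output: A first, then B and C together, then D.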
114 High-Throughput Computing: Mathematicians Solve NUG30
- Looking for the solution to the NUG30 quadratic assignment problem
- An informal collaboration of mathematicians and computer scientists
- Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors), across 8 sites in the U.S. and Italy
- Solution: 14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
- MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
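As a quick sanity check on those figures (illustrative arithmetic only), 3.46E8 CPU-seconds over 7 days corresponds to roughly 570 processors busy around the clock, against the reported peak of 1009:

    # Back-of-the-envelope check of the NUG30 numbers quoted above.
    cpu_seconds = 3.46e8
    wall_seconds = 7 * 24 * 3600          # 7 days
    avg_processors = cpu_seconds / wall_seconds
    print(f"average concurrency ~ {avg_processors:.0f} processors (peak reported: 1009)")
    # -> average concurrency ~ 572 processors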
115 Grid Application Development Software (GrADS) Project
- hipersoft.rice.edu/grads
117 Access Grid
- High-end group work and collaboration technology
- Grid services being used for discovery, configuration, authentication
- O(50) systems deployed worldwide
- Basis for the SC2001 "SC Global" event in November 2001
  - www.scglobal.org
- www.accessgrid.org
119 Grid-Enabled Research Facilities Leverage Investments
- Research instruments, satellites, particle accelerators, MRI machines, etc., cost a great deal
- Data from those devices can be accessed and analyzed by many more scientists
  - Not just the team that gathered the data
- More productive use of instruments
  - Calibration, data sampling during a run, via on-demand real-time processing
120 Telemicroscopy and Grid-Based Computing
(Figure: imaging instruments feed data acquisition; the network connects processing and analysis on computational resources, large-scale databases, and advanced visualization.)
121 APAN Trans-Pacific Telemicroscopy Collaboration: Osaka-U, UCSD, ISI (slide courtesy Mark Ellisman, UCSD)
(Figure: the UHVEM in Osaka, Japan is coupled to NCMIR and SDSC in San Diego via the Tokyo XP, TransPAC, STAR TAP (Chicago), and vBNS networks, with Globus linking UCSD and CRL/MPT; first- and second-generation configurations are shown.)
122 Network for Earthquake Engineering Simulation
- NEESgrid: US national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, and each other
- On-demand access to experiments, data streams, computing, archives, collaboration
- Argonne, Michigan, NCSA, UIUC, USC
- www.neesgrid.org
123 Experimental Facilities Can Include Field Sites
- Remotely controlled sensor grids for field studies, e.g., in seismology and biology
- Wireless/satellite communications
- Sensor-net technology for low-cost communications
125 Nature and Role of Grid Infrastructure
- Persistent Grid infrastructure is critical to the success of many eScience projects
  - High-speed networks, certainly
  - Remotely accessible compute and storage
  - Persistent, standard services: PKI, directories, reservation, ...
  - Operational support and procedures
- Many projects creating such infrastructures
  - Production operation is the goal, but there is much to learn about how to create and operate them
126 A National Grid Infrastructure
127 Example Grid Infrastructure Projects
- I-WAY (1995): 17 U.S. sites for one week
- GUSTO (1998): 80 sites worldwide, experimental
- NASA Information Power Grid (since 1999)
  - Production Grid linking NASA laboratories
- INFN Grid, EU DataGrid, iVDGL, ... (2001)
  - Grids for data-intensive science
- TeraGrid, DOE Science Grid (2002)
  - Production Grids linking supercomputer centers
- U.S. GRIDS Center
  - Software packaging, deployment, support
128 The 13.6 TF TeraGrid: Computing at 40 Gb/s
(Figure: site resources at NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, each with HPSS or UniTree archival storage and external network connections, interconnected by the 40 Gb/s TeraGrid backbone. NCSA, SDSC, Caltech, Argonne; www.teragrid.org)
129 TeraGrid (Details)
130 Targeted StarLight Optical Network Connections
(Figure: planned optical connections converging on StarLight in Chicago, including CERN, the Asia-Pacific, SURFnet, CAnet4 (Vancouver, Seattle), NTON (Portland, San Francisco), U Wisconsin, NYC, PSC, IU, NCSA, the 40 Gb DTF backbone to Los Angeles and San Diego (SDSC), Atlanta, and AMPATH. www.startap.net)
131 CAnet 4 Architecture
(Figure: CANARIE CAnet 4 topology of GigaPOPs linked by ORAN and carrier DWDM, with nodes including Victoria, Vancouver, Calgary, Edmonton, Saskatoon, Regina, Winnipeg, Thunder Bay, Windsor, Toronto, Ottawa, Montreal, Quebec, Fredericton, Charlottetown, Halifax, and St. John's, plus cross-border links to Seattle, Chicago, New York, and Boston; possible future CAnet 4 nodes are indicated.)
132 (Network map, 2001.9.3; 622 Mbps x 2 links)
133 iVDGL: A Global Grid Laboratory
- "We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." (From NSF proposal, 2001)
- International Virtual-Data Grid Laboratory
  - A global Grid laboratory (US, Europe, Asia, South America, ...)
  - A place to conduct Data Grid tests "at scale"
  - A mechanism to create common Grid infrastructure
  - A laboratory for other disciplines to perform Data Grid tests
  - A focus of outreach efforts to small institutions
- U.S. part funded by NSF (2001-2006)
  - $13.7M (NSF) + $2M (matching)
134 Initial US-iVDGL Data Grid
(Map: SKC, BU, Wisconsin, PSU, BNL, Fermilab, Hampton, Indiana, JHU, Caltech, UCSD, Florida, Brownsville; other sites to be added in 2002.)
135 iVDGL: International Virtual Data Grid Laboratory
- U.S. PIs: Avery, Foster, Gardner, Newman, Szalay
- www.ivdgl.org
136 iVDGL Architecture (from proposal)
137 US iVDGL Interoperability
- US-iVDGL-1 milestone (August 2002)
(Figure: US-iVDGL-1 in August 2002: an iGOC coordinating ATLAS, SDSS/NVO, CMS, and LIGO sites.)
138 Transatlantic Interoperability
- iVDGL-2 milestone (November 2002)
(Figure: iVDGL-2 in November 2002: DataTAG links the US iVDGL with CERN, INFN, and UK PPARC; an iGOC, outreach, and CS research groups coordinate ATLAS, SDSS/NVO, CMS, and LIGO sites including ANL, UC, BNL, FNAL, BU, CIT, JHU, HU, PSU, UCSD, IU, UTB, UF, U of A, LBL, UWM, UM, UCB, OU, UTA, ISI, NU, and UW.)
139 Another Example: INFN Grid in Italy
- 20 sites, 200 persons, 90 FTEs, 20 IT
- Preliminary budget for 3 years: 9 M euros
- Activities organized around
  - Software development with DataGrid, DataTAG
  - Testbeds (financed by INFN) for DataGrid, DataTAG, US-EU InterGrid
  - Experiments' applications
  - Tier1..Tier-n prototype infrastructure
  - Large-scale testbeds provided by LHC experiments, Virgo, ...
140 U.S. GRIDS Center: NSF Middleware Infrastructure Program
- GRIDS = Grid Research, Integration, Deployment, Support
- NSF-funded center to provide
  - State-of-the-art middleware infrastructure to support national-scale collaborative science and engineering
  - Integration platform for experimental middleware technologies
- ISI, NCSA, SDSC, UC, UW
- NMI software release o