Title: CS 267: Applications of Parallel Computers: Grid Computing
1CS 267 Applications of Parallel Computers: Grid Computing
- Kathy Yelick
- Material based on lectures by Ian Foster and Carl Kesselman
2Grid Computing is NOT
Slide source Jim Napolitano _at_ RPI
3Problem Overview
- Many activities in research and education are collaborative
- sharing of data and code
- sharing of computing and experimental facilities
- Goal of grids is to simplify these activities
- Approach
- Create advanced middleware services that enable large-scale, flexible resource sharing on a national and international scale.
Slide source Ian Foster _at_ ANL
4Outline
- Problem statement: What are Grids?
- Architecture and technologies
- Projects
- Future
5The Grid Problem
- Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
Slide source Ian Foster _at_ ANL
6What is a Grid?
- The "Grid problem" is defined by Ian Foster as
- flexible, secure, coordinated resource sharing
- among dynamic collections of individuals, institutions, and resources (referred to as virtual organizations)
- Terminology comes from an analogy to the electric power grid
- Computing as a (public) utility
- The key technical challenges in grid computing
- authentication
- authorization
- resource access
- resource discovery
- Grids are sometimes categorized as
- Data grids, computational grids, business grids, ...
Slide derived from Sara Murphy _at_ HP
7The evolving notion of the Grid
- "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computing capabilities."
- Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure (Morgan-Kaufmann Publishers, SF, 1999), 677 pp., ISBN 1-55860-8
- "The Grid is an infrastructure to enable virtual communities to share distributed resources to pursue common goals"
- "The Grid infrastructure consists of protocols, application programming interfaces, and software development kits to provide authentication, authorization, and resource location/access"
- Foster, Kesselman, Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, http://www.globus.org/research/papers.html
- "The Grid integrates services across distributed, heterogeneous, dynamic virtual organizations formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships in both e-business and e-science"
- Foster, Kesselman, Nick, Tuecke, The Physiology of the Grid, http://www.globus.org/research/papers/ogsa.pdf
8Elements of the Problem
- Resource sharing
- Computers, storage, sensors, networks, ...
- Sharing is always conditional: issues of trust, policy, negotiation, payment, ...
- Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, ...
- Dynamic, multi-institutional virtual organizations
- Community overlays on classic organizational structures
- Large or small, static or dynamic
Slide source Ian Foster _at_ ANL
9Why Grids?
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data
- Civil engineers collaborate to design, execute, and analyze shake table experiments
- Climate scientists visualize, annotate, and analyze terabyte simulation datasets
- A home user invokes architectural design functions at an application service provider
- The service provider buys cycles from compute cycle providers
Slide source Dave Angulo _at_ U Chicago
10Why Grids Now?
Conventional Wisdom, General Terms
- Deployed Internet bandwidth is increasing at a faster rate than CPU speed, memory size, or data storage size
- Therefore, it makes more sense to plan for computing that is portable, so that it can be distributed worldwide.
11Why Grids Now?
- CPU speed doubles every 18 months
- Moore's Law
- Data storage doubles every 12 months
- Deployed network bandwidth doubles every 9 months
- 1986-2001: roughly a factor of 340,000
- Gilder's Law
- Internet bandwidth is increasing faster than CPU speed, memory size, or data storage size
- Therefore, plan for computing that is portable, so that it can be distributed worldwide (a back-of-the-envelope comparison of the growth rates follows below).
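To make the relative growth rates concrete, here is a small back-of-the-envelope calculation (not from the original slides) comparing the three doubling periods over the 15-year window the slide cites:

```python
# Compare exponential growth under the doubling periods quoted on the slide
# (CPU: 18 months, storage: 12 months, network bandwidth: 9 months).
months = 15 * 12  # the 1986-2001 window, 180 months

for name, doubling_period in [("CPU speed", 18),
                              ("Data storage", 12),
                              ("Network bandwidth", 9)]:
    growth = 2 ** (months / doubling_period)
    print(f"{name:18s} grows by a factor of about {growth:,.0f}")

# CPU speed          grows by a factor of about 1,024
# Data storage       grows by a factor of about 32,768
# Network bandwidth  grows by a factor of about 1,048,576
#
# The slide's observed x340,000 for deployed bandwidth corresponds to a
# doubling period of 180 / log2(340,000), roughly 9.8 months, consistent
# with the quoted "every 9 months".
```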
12The Grid World Current Status
- Dozens of major Grid projects in scientific and technical computing, research, and education
- Deployment, application, technology
- Considerable consensus on key concepts and technologies
- Globus Toolkit has emerged as the de facto standard for major protocols and services
- Although there are still competing alternatives
- Global Grid Forum is a significant force
- Cross-project community
Slide derived from Dave Angulo _at_ U Chicago
13Science Grids: Really Big Science
- The process of large-scale science is changing
- Large-scale science and engineering problems
- require collaborative use of many compute, data, and instrument resources, all of which must be integrated with application components, and
- efficient use of large resources is important
- data sets that are
- developed by independent teams of researchers
- or are obtained from multiple instruments
- at different geographic locations
Slide derived from Bill Johnston _at_ LBNL
14Grid Applications in Physics
- GriPhyN
- CS and physics collaboration to develop the virtual data concept
- Physics: ATLAS, CMS, LIGO, Sloan Digital Sky Survey
- CS: build on existing technologies such as Globus, Condor, fault tolerance, storage management (SRB), plus new computer science research
- Main goal is to develop a virtual data catalog and data language (VDC, VDL, VDLI)
- iVDGL
- International Virtual Data Grid Laboratory
- Platform to design, implement, integrate, and test grid software
- Infrastructure for ATLAS and CMS prototype Tier 2 centers
- Forum for grid interoperability; collaborate with EU DataGrid, DataTag, etc.
Slide source Rob Gardner _at_ Indiana U
15Data Grids for High Energy Physics
Compact Muon Solenoid (CMS) at CERN
- There is a bunch crossing every 25 nsecs; there are about 100 triggers per second, and each triggered event is about 1 MByte in size (a quick rate check follows after this slide)
[Diagram: tiered data flow from the detector (PBytes/sec) through the online system and an offline processor farm (~20 TIPS) at 100 MBytes/sec into the Tier 0 CERN Computer Centre; 622 Mbits/sec links (or air freight, deprecated) feed Tier 1 regional centres (FermiLab ~4 TIPS, France, Italy, Germany); further 622 Mbits/sec links feed Tier 2 centres and institutes (~0.25 TIPS each) with physics data caches (~1 MBytes/sec); Tier 4 is the physicists' workstations]
Slide source Ian Foster _at_ ANL
Image courtesy Harvey Newman, Caltech
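As a quick sanity check on the numbers above (my arithmetic, not part of the slide):

```python
# Rough data-rate arithmetic from the figures quoted on the slide:
# ~100 triggered events per second, ~1 MByte per event.
events_per_sec = 100
event_size_mb = 1.0

rate_mb_s = events_per_sec * event_size_mb        # 100 MB/s, matching the slide
per_year_pb = rate_mb_s * 3600 * 24 * 365 / 1e9   # 1 PB = 1e9 MB

print(f"{rate_mb_s:.0f} MB/s recorded, about {per_year_pb:.1f} PB per year")
# -> 100 MB/s recorded, about 3.2 PB per year, i.e. the "petabytes per year"
#    that later slides refer to.
```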
16Network for Earthquake Eng. Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, and each other
- On-demand access to experiments, data streams, computing, archives, and collaboration
Slide source Ian Foster _at_ ANL
NEESgrid Argonne, Michigan, NCSA, UIUC, USC
17What is a Grid Architecture?
- Descriptive
- Provide a common vocabulary for use when describing Grid systems
- Guidance
- Identify key areas in which services are required
- Prescriptive
- Define standard Intergrid protocols and APIs to facilitate creation of interoperable Grid systems and portable applications
Slide source Ian Foster _at_ ANL
18What Sorts of Standards?
- Need for interoperability when different groups want to share resources
- E.g., IP lets me talk to your computer, but how do we establish and maintain sharing?
- How do I discover, authenticate, authorize, and describe what I want to do, etc.?
- Need for shared infrastructure services to avoid repeated development and installation, e.g.
- One port/service for remote access to computing, not one per tool/application
- X.509 enables sharing of Certificate Authorities
Slide source Ian Foster _at_ ANL
19A Grid Architecture Must Address
- Development of Grid protocols and services
- Protocol-mediated access to remote resources
- New services, e.g., resource brokering
- Being "on the Grid" means speaking Intergrid protocols
- Mostly (extensions to) existing protocols
- Development of Grid APIs and SDKs
- Facilitate application development by supplying higher-level abstractions
- The model is the Internet and Web
Slide source Ian Foster _at_ ANL
20Grid Services (aka Middleware) and Tools
[Diagram: grid services (middleware) and tools layered above the network]
Slide source Ian Foster _at_ ANL
21Layered Grid Architecture
A Grid architecture uses layers, analogous to the IP protocol layers
[Diagram: layered Grid architecture, with the Application layer on top of the Collective, Resource, Connectivity, and Fabric layers]
Slide source Ian Foster _at_ ANL
22Where Are We With Architecture?
- No official standards exist
- But
- Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
- GGF has an architecture working group
- Technical specifications are being developed for architecture elements, e.g., security, data, resource management, information
- Internet drafts submitted in the security area
Slide source Ian Foster _at_ ANL
23Grid Services Architecture (2): Connectivity Layer
- Communication
- Internet protocols: IP, DNS, routing, etc.
- Security: Grid Security Infrastructure (GSI)
- Uniform authentication and authorization mechanisms in a multi-institutional setting
- Single sign-on, delegation, identity mapping
- Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions)
- Supporting infrastructure: Certificate Authorities, key management, etc.
Slide source Ian Foster _at_ ANL
24GSI in Action: Create Processes at Sites A and B that Communicate and Access Files at Site C
[Diagram: a single user, authenticated once through GSI, creates processes on computers at Site A (Kerberos) and Site B (Unix) that communicate with each other and access files on a storage system at Site C (Kerberos)]
Slide source Ian Foster _at_ ANL
25Grid Services Architecture (3): Resource Layer
- The Resource Layer provides protocols and services
- Resource management: GRAM
- Remote allocation, reservation, monitoring, and control of compute resources
- Data access: GridFTP
- High-performance data access and transport
- Information: MDS (GRRP, GRIP)
- Access to structure and state information
- Others emerging: catalog access, code repository access, accounting, ...
- All integrated with GSI
Slide source Ian Foster _at_ ANL
26GRAM: Resource Management Protocol
- Grid Resource Allocation and Management
- Allocation, monitoring, and control of computations
- Simple HTTP-based RPC (a toy sketch follows after this slide)
- Job request: returns an opaque, transferable "job contact" string for access to the job
- Job cancel, job status, job signal
- Event notification (callbacks) for state changes
- Protocol/server address robustness (exactly-once execution), authentication, authorization
- Servers for most schedulers; C and Java APIs
Slide source Ian Foster _at_ ANL
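To make the RPC flavor of the protocol concrete, here is a toy, in-memory stand-in for a GRAM-style service; the class, method names, and RSL-like request string are illustrative assumptions, not the actual Globus Toolkit API.

```python
import uuid

# Toy, in-memory stand-in for a GRAM-style service.  It mimics only the
# operations named on the slide (job request returning an opaque contact
# string, status, cancel); it is NOT the Globus Toolkit API.

class ToyGramService:
    def __init__(self):
        self._jobs = {}  # job contact string -> job state

    def submit(self, request: str) -> str:
        # A real server would parse an RSL-style request and hand the job to
        # a local scheduler; here we simply record it as ACTIVE.
        contact = f"https://gatekeeper.example.edu:2119/{uuid.uuid4()}"
        self._jobs[contact] = "ACTIVE"
        return contact                      # opaque, transferable job contact

    def status(self, contact: str) -> str:
        return self._jobs[contact]          # e.g. PENDING, ACTIVE, DONE, FAILED

    def cancel(self, contact: str) -> None:
        self._jobs[contact] = "FAILED"      # cancelled jobs reach a terminal state


if __name__ == "__main__":
    gram = ToyGramService()
    job = gram.submit('&(executable="/bin/analyze")(count=32)')  # RSL-like request
    print(gram.status(job))   # ACTIVE
    gram.cancel(job)
    print(gram.status(job))   # FAILED
```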
27Resource Management
- Advance reservations
- As prototyped in GARA over the previous 2 years
- Multiple resource types
- Manage anything: storage, networks, etc.
- Recoverable requests, timeouts, etc.
- Builds on early work with the Condor group
- Use of SOAP (RPC using HTTP and XML)
- A first step towards Web Services
- Policy evaluation points for restricted proxies
Slide source Ian Foster _at_ ANL
Karl Czajkowski, Steve Tuecke, others
28Data Access and Transfer
- GridFTP: extended version of the popular FTP protocol for Grid data access and transfer
- Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.
- Third-party data transfers, partial file transfers (see the sketch after this slide)
- Parallelism, striping (e.g., on PVFS)
- Reliable, recoverable data transfers
- Reference implementations
- Existing clients and servers: wuftpd, ncftp
- Flexible, extensible libraries
Slide source Ian Foster _at_ ANL
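As a sketch of what the feature list above means in practice, the toy request object below models a GridFTP-style transfer; the class, its fields, and the host names are illustrative assumptions, not the real GridFTP protocol or any Globus client library.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model of a GridFTP-style transfer request, illustrating the features
# on the slide (third-party transfer, partial files, parallel streams).

@dataclass
class TransferRequest:
    source: str                      # e.g. a gsiftp:// URL on a storage server
    destination: str                 # e.g. a gsiftp:// URL on a compute cluster
    parallel_streams: int = 4        # several TCP streams to fill fat pipes
    offset: int = 0                  # partial-file transfer: start here ...
    length: Optional[int] = None     # ... copy this many bytes (None = to EOF)
    third_party: bool = True         # client mediates a server-to-server copy

def describe(req: TransferRequest) -> str:
    mode = "third-party" if req.third_party else "client-server"
    part = (f"bytes {req.offset}-{req.offset + req.length}"
            if req.length is not None else "whole file")
    return (f"{mode} transfer of {part} from {req.source} to {req.destination} "
            f"using {req.parallel_streams} parallel streams")

if __name__ == "__main__":
    print(describe(TransferRequest(
        "gsiftp://storage.sitec.example/data/run42.dat",
        "gsiftp://cluster.sitea.example/scratch/run42.dat",
        parallel_streams=8, length=10**9)))
```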
29Grid Services Architecture (4): Collective Layer
- Index servers, aka metadirectory services
- Custom views on dynamic resource collections assembled by a community
- Resource brokers (e.g., Condor Matchmaker)
- Resource discovery and allocation (see the matchmaking sketch after this slide)
- Replica management and replica selection
- Optimize aggregate data access performance
- Co-reservation and co-allocation services
- End-to-end performance
- Etc.
Slide source Ian Foster _at_ ANL
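The matchmaking idea behind a resource broker can be sketched in a few lines; this toy version (resource names and attributes invented for illustration, and much simpler than Condor's ClassAd matching) pairs a job's requirements with advertised resources.

```python
# Toy resource broker: match job requirements against resource advertisements.
# Real brokers such as the Condor Matchmaker use a richer ClassAd language
# and two-sided ranking; this only shows the basic idea.

resources = [  # resource "ads" published by sites (illustrative values)
    {"name": "siteA-cluster", "cpus": 64,  "mem_gb": 128,  "arch": "x86_64"},
    {"name": "siteB-farm",    "cpus": 512, "mem_gb": 1024, "arch": "x86_64"},
    {"name": "siteC-smp",     "cpus": 16,  "mem_gb": 512,  "arch": "ia64"},
]

job = {"cpus": 128, "mem_gb": 256, "arch": "x86_64"}  # job requirements

def matches(job, res):
    return (res["cpus"] >= job["cpus"] and
            res["mem_gb"] >= job["mem_gb"] and
            res["arch"] == job["arch"])

candidates = [r for r in resources if matches(job, r)]
# Rank: prefer the smallest machine that still fits (leave big ones free).
best = min(candidates, key=lambda r: r["cpus"], default=None)
print(best["name"] if best else "no match")   # -> siteB-farm
```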
30The Grid Information Problem
- Large numbers of distributed sensors with different properties
- Need for different views of this information, depending on community membership, security constraints, intended purpose, and sensor type
Slide source Ian Foster _at_ ANL
31The Globus Toolkit Solution: MDS-2
- Registration and enquiry protocols, information models, query languages
- Provides standard interfaces to sensors
- Supports different directory structures, supporting various discovery/access strategies
Slide source Ian Foster _at_ ANL
Karl Czajkowski, Steve Fitzgerald, others
32GriPhyN/PPDG Data Grid Architecture
[Diagram: an application submits a DAG to a planner, which draws on catalog services, monitoring, information services, and replica management; a DAG executor then carries out the plan, subject to policy/security, using a reliable transfer service over compute and storage resources. The initial solution is operational.]
Slide source Ian Foster _at_ ANL
Ewa Deelman, Mike Wilde, others
www.griphyn.org
33The Network Weather Service
- A distributed system for producing short-term deliverable performance forecasts
- Goal: dynamically measure and forecast the performance deliverable at the application level from a set of network resources
- Measurements currently supported
- Available fraction of CPU time
- End-to-end TCP connection time
- End-to-end TCP network latency
- End-to-end TCP network bandwidth
Slide source Rich Wolski _at_ UCSB
34NWS System Architecture
- Design objectives
- Scalability: scales to any metacomputing infrastructure
- Predictive accuracy: provides accurate measurements and forecasts
- Non-intrusiveness: shouldn't load the resources it monitors
- Execution longevity: available at all times
- Ubiquity: accessible from everywhere, monitors all resources
Slide source Rich Wolski _at_ UCSB
35System Components
- Four different component processes
- Persistent State process: handles storage of measurements
- Name Server process: directory server for the system
- Sensor processes: measure current performance of different resources
- Forecaster process: predicts the deliverable performance of a resource during a given time frame
Slide source Rich Wolski _at_ UCSB
36NWS Processes
Slide source Rich Wolski _at_ UCSB
37NWS Components
- Persistent State Management
- Naming Server
- Performance Monitoring: NWS Sensors
- CPU Sensor
- Network Sensor
- Sensor Control
- Cliques: hierarchy and contention
- Adaptive time-out discovery
- Forecasting
- Forecaster and forecasting models
- Sample forecaster results
Slide source Rich Wolski _at_ UCSB
38Persistent State Management
- All NWS processes are stateless
- The system state (measurements) is managed by the Persistent State (PS) process
- Storage and retrieval of measurements
- Measurements are time-stamped plain-text strings
- Measurements are written to disk immediately and acknowledged
- Measurements are stored in a circular queue of tunable size (sketched after this slide)
Slide source Rich Wolski _at_ UCSB
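A minimal sketch of the storage scheme described above: a bounded, circular log of time-stamped text records that are also appended to disk. File layout and names are invented for illustration; this is not the actual NWS code.

```python
import time
from collections import deque

# Toy persistent-state store in the spirit of the slide: measurements are
# time-stamped plain-text strings kept in a circular queue of tunable size
# and appended to disk as they arrive.

class MeasurementStore:
    def __init__(self, path: str, capacity: int = 1000):
        self.path = path
        self.queue = deque(maxlen=capacity)   # old entries fall off the end

    def record(self, resource: str, value: float) -> str:
        line = f"{time.time():.3f} {resource} {value}"
        self.queue.append(line)
        with open(self.path, "a") as f:       # write immediately, then ack
            f.write(line + "\n")
        return "ACK"

    def recent(self, resource: str, n: int = 10):
        return [l for l in self.queue if f" {resource} " in l][-n:]

if __name__ == "__main__":
    store = MeasurementStore("nws_measurements.log", capacity=5)
    for i in range(8):
        store.record("cpu.host1.example", 0.9 - 0.05 * i)
    print(store.recent("cpu.host1.example"))  # only the last 5 survive
```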
39Naming Server
- Primitive text-string directory service for the NWS system
- The only component known system-wide
- Information stored includes
- Name-to-IP binding information
- Group configuration
- Parameters for various processes
- Each process must periodically refresh its registration with the name server
- Centralized
Slide source Rich Wolski _at_ UCSB
40Performance Monitoring
- Actual monitoring is performed by a set of sensors
- Accuracy vs. intrusiveness
- A sensor's life:
    Register with the NS
    Query the NS for parameters
    Generate conditional test
    Forever:
        if conditions are met then
            perform test
            time-stamp results and send them to the PS
        refresh registration with the NS
Slide source Rich Wolski _at_ UCSB
41CPU Sensor
- Measures the available CPU fraction
- Testing tools
- Unix uptime: reports the load average over the past x minutes
- Unix vmstat: reports idle, user, and system time
- Active probes
- Accuracy
- Results assume a full-priority job
- Doesn't know the priority of jobs in the queue (a simple load-average estimate is sketched after this slide)
Slide source Rich Wolski _at_ UCSB
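A minimal sketch of turning a load average into an availability estimate, under the common (here assumed) approximation that a new full-priority job on an n-CPU host gets roughly min(1, n / (load + 1)) of a CPU. This is illustrative only; the real NWS CPU sensor combines uptime, vmstat, and active probes.

```python
import os

# Estimate the CPU fraction available to a new full-priority job from the
# 1-minute load average, using the rough approximation n_cpus / (load + 1),
# capped at 1.0.

def available_cpu_fraction() -> float:
    load1, _, _ = os.getloadavg()        # the same figure `uptime` reports
    n_cpus = os.cpu_count() or 1
    return min(1.0, n_cpus / (load1 + 1.0))

if __name__ == "__main__":
    print(f"estimated available CPU fraction: {available_cpu_fraction():.2f}")
```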
42Active Probing Improvements
[Graphs: CPU availability measurements produced using vmstat and using uptime, illustrating the improvements from active probing]
Slide source Rich Wolski _at_ UCSB
43Network Sensor
- Carries out network-related measurements
- Testing uses active network probes
- Establish and release TCP connections
- Move large (small) amounts of data to measure bandwidth (delay)
- Measures connections with all peer sensors
- Problems
- Accuracy depends on the socket interface
- Complexity: N² - N tests, collisions (contention)
Slide source Rich Wolski _at_ UCSB
44Network Sensor Control
- Sensors are organized into sensor sets called cliques
- Each clique is configurable and has one leader
- Clique sets are logical, but can be based on physical topology
- Leaders are elected using a distributed election protocol
- A sensor can participate in many cliques
- Advantages
- Scalability, by organizing cliques in a hierarchy
- Reduces the N² - N test count (see the arithmetic after this slide)
- Accuracy, through more frequent tests
Slide source Rich Wolski _at_ UCSB
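To see how much a hierarchy helps, here is a quick calculation (my illustrative model, not from the slides) comparing all-pairs probing in one flat clique against a two-level arrangement of smaller cliques plus a leaders' clique:

```python
# Pairwise bandwidth tests needed for N sensors:
#  - one flat clique: every ordered pair, N^2 - N tests
#  - two-level hierarchy: k cliques of N/k sensors each, plus a clique
#    formed by the k leaders.

def flat_tests(n: int) -> int:
    return n * n - n

def two_level_tests(n: int, k: int) -> int:
    m = n // k                          # sensors per clique (assume k divides n)
    return k * (m * m - m) + (k * k - k)

n, k = 100, 10
print(flat_tests(n))          # 9900 tests
print(two_level_tests(n, k))  # 10 * 90 + 90 = 990 tests
```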
45Clique Hierarchy
Slide source Rich Wolski _at_ UCSB
46Contention
- Each leader maintains a clique token (and the time between tokens)
- The sensor that holds the token performs all of its tests, then passes the token to the next sensor in the list
- Adaptive time-out discovery
- Tokens have a time-out field
- Tokens have sequence numbers
- The leader adaptively controls the time-out
Slide source Rich Wolski _at_ UCSB
47Forecaster Process
- A forecasting driver plus compile-time prediction modules
- Forecasting process (sketched after this slide)
- Fetch the required measurements from the PS
- Pass the time series to each prediction module
- Choose the best returned prediction
- Incorporate more sophisticated prediction techniques?
[Graphs: recorded vs. forecasted bandwidth between UC Santa Barbara and Kansas State U.]
Slide source Rich Wolski _at_ UCSB
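A minimal sketch of the "run every predictor, keep the one with the lowest error so far" loop described above; the two predictors and the error metric are illustrative choices, not the actual NWS forecasting models.

```python
# Toy NWS-style forecaster: apply several simple prediction models to a
# time series and report the forecast from whichever model has had the
# lowest total absolute error on the data seen so far.

def last_value(history):            # persistence model
    return history[-1]

def running_mean(history):          # long-term average model
    return sum(history) / len(history)

PREDICTORS = {"last_value": last_value, "running_mean": running_mean}

def best_forecast(series):
    errors = {name: 0.0 for name in PREDICTORS}
    # Replay history: at each step, predict the next point from the prefix.
    for t in range(1, len(series)):
        prefix, actual = series[:t], series[t]
        for name, f in PREDICTORS.items():
            errors[name] += abs(f(prefix) - actual)
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](series)

if __name__ == "__main__":
    bandwidth = [8.2, 8.0, 8.4, 7.9, 8.1, 8.3, 8.0, 8.2]   # Mb/s, made-up data
    model, forecast = best_forecast(bandwidth)
    print(model, round(forecast, 2))
```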
48Sample Graph
49Selected Major Grid Projects
Name, URL, Sponsors, Focus
- Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create and deploy group collaboration systems using commodity technologies
- BlueGrid (IBM): Grid testbed linking IBM laboratories
- DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create an operational Grid providing access to resources at three U.S. DOE weapons laboratories
- DOE Science Grid (sciencegrid.org; DOE Office of Science): Create an operational Grid providing access to resources and applications at U.S. DOE science laboratories and partner universities
- Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community
- European Union (EU) DataGrid (eu-datagrid.org; European Union): Create and apply an operational grid for applications in high energy physics, environmental science, and bioinformatics
Slide source Ian Foster _at_ ANL
50Selected Major Grid Projects
Name, URL/Sponsor, Focus
- EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create technologies for remote access to supercomputer resources and simulation codes; in GRIP, integrate with Globus
- Fusion Collaboratory (fusiongrid.org; DOE Office of Science): Create a national computational collaboratory for fusion research
- Globus Project (globus.org; DARPA, DOE, NSF, NASA, Microsoft): Research on Grid technologies; development and support of the Globus Toolkit; application and deployment
- GridLab (gridlab.org; European Union): Grid technologies and applications
- GridPP (gridpp.ac.uk; U.K. eScience): Create and apply an operational grid within the U.K. for particle physics research
- Grid Research Integration Development Support Center (grids-center.org; NSF): Integration, deployment, and support of the NSF Middleware Infrastructure for research and education
Slide source Ian Foster _at_ ANL
51Selected Major Grid Projects
Name, URL/Sponsor, Focus
- Grid Application Development Software (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications
- Grid Physics Network (griphyn.org; NSF): Technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS
- Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions
- International Virtual Data Grid Laboratory (ivdgl.org; NSF): Create an international Data Grid to enable large-scale experimentation on Grid technologies and applications
- Network for Earthquake Engineering Simulation Grid (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering
- Particle Physics Data Grid (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments
Slide source Ian Foster _at_ ANL
52Selected Major Grid Projects
Name, URL/Sponsor, Focus
- TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
- UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.
- Unicore (BMBFT): Technologies for remote access to supercomputers
Also many technology R&D projects, e.g., Condor, NetSolve, Ninf, NWS. See also www.gridforum.org
Slide source Ian Foster _at_ ANL
53The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Diagram: the four TeraGrid sites (NCSA/PACI with 8 TF and 240 TB, SDSC with 4.1 TF and 225 TB, Caltech, and Argonne), each with local site resources and HPSS or UniTree archival storage, interconnected with one another and with external networks at 40 Gb/s]
Slide source Ian Foster _at_ ANL
TeraGrid/DTF NCSA, SDSC, Caltech, Argonne
www.teragrid.org
54iVDGL: International Virtual Data Grid Laboratory
Slide source Ian Foster _at_ ANL
U.S. PIs Avery, Foster, Gardner, Newman, Szalay
www.ivdgl.org
55NSF GRIDS Center
- Grid Research, Integration, Deployment, and Support (GRIDS) Center
- Develop, deploy, and support
- Middleware infrastructure for national-scale collaborative science and engineering
- Integration platform for experimental middleware technologies
- UC, USC/ISI, UW, NCSA, SDSC
- Partner with Internet2, SURA, and Educause in the NSF Middleware Initiative
Slide source Ian Foster _at_ ANL
www.grids-center.org
www.nsf-middleware.org
56The State of Grids: Some Case Studies
- Further, Grids are becoming a critical element of many projects, e.g.
- The high energy physics problem of managing and analyzing petabytes of data per year has driven the development of Grid data services
- The National Earthquake Engineering Simulation Grid has developed a highly application-oriented approach to using Grids
- The astronomy data federation problem has promoted work on Web Services based interfaces
57High Energy Physics Data Management
- Petabytes of data per year must be distributed to hundreds of sites around the world for analysis
- This involves
- Reliable, wide-area, high-volume data management
- Global naming, replication, and caching of datasets
- Easily accessible pools of computing resources
- Grids have been adopted as the infrastructure for this HEP data problem
58High Energy Physics Data Management: CERN / LHC Data
One of science's most challenging data management problems
[Diagram: the CERN LHC CMS detector (15m x 15m x 22m, 12,500 tons, 700M; human figure shown for scale, ~2 m) and online system feed event simulation and reconstruction at the CERN Tier 0/1 center (HPSS), with roughly a PByte/sec off the detector and ~100 MBytes/sec recorded; 2.5 Gbits/sec links carry data to Tier 1 regional centers (FermiLab USA, France, Germany, Italy); 0.6-2.5 Gbps links feed Tier 2 analysis centers and then Tier 3 institutes (~0.25 TIPS) with physics data caches; Tier 4 physicist workstations connect at 100-1000 Mbits/sec]
CERN/CMS data goes to 6-8 Tier 1 regional centers, and from each of these to 6-10 Tier 2 centers. Physicists work on analysis channels at 135 institutes. Each institute has ~10 physicists working on one or more channels. 2,000 physicists in 31 countries are involved in this 20-year experiment, in which DOE is a major player.
Courtesy Harvey Newman, Caltech
59High Energy Physics Data Management
- Virtual data catalogues and on-demand data generation have turned out to be an essential aspect
- Some types of analysis are pre-defined and catalogued prior to generation, and the data products are then generated on demand when the virtual data catalogue is accessed
- Sometimes regenerating derived data is faster and easier than trying to store and/or retrieve that data from remote repositories (see the sketch below)
- For similar reasons this is also of great interest to the EOS (Earth Observing Satellite) community
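A minimal sketch of the "materialize on demand" idea described above: the catalog records how each derived dataset is produced, and a request either returns a cached copy or re-runs the recorded transformation. The names and structures are invented for illustration and are not the GriPhyN VDC/VDL.

```python
# Toy virtual data catalog: each derived dataset is described by the
# transformation and inputs that produce it.  A request returns a cached
# copy if one exists, otherwise the data are regenerated on demand.

catalog = {
    # derived dataset -> (transformation name, input datasets)
    "higgs_candidates_v2": ("select_events", ["raw_run_1042"]),
}
materialized = {"raw_run_1042": list(range(10))}   # pretend the raw data exist

def select_events(raw):
    return [x for x in raw if x % 3 == 0]          # stand-in "analysis" step

TRANSFORMS = {"select_events": select_events}

def get(dataset: str):
    if dataset in materialized:                    # cached replica available
        return materialized[dataset]
    transform, inputs = catalog[dataset]           # otherwise regenerate it
    result = TRANSFORMS[transform](*[get(i) for i in inputs])
    materialized[dataset] = result                 # optionally cache the result
    return result

print(get("higgs_candidates_v2"))   # -> [0, 3, 6, 9]
```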
60US-CMS/LHC Grid Data Services Testbed: International Virtual Data Grid Laboratory
[Diagram: interactive user tools sit above virtual data tools (metadata catalogues and virtual data catalogues holding metadata descriptions of analyzed data), data generation request planning and scheduling tools, and request execution management tools; these in turn use core Grid services (security and policy, resource management, other Grid services) to run transforms over distributed resources (code, storage, CPUs, networks) and the raw data source]
61CMS Event Simulation Using GriPhyN
- Production run on the Integration Testbed (400 CPUs at 5 sites)
- Simulate 1.5 million full CMS events for physics studies
- 2 months of continuous running across the 5 testbed sites
- Managed by a single person at the US-CMS Tier 1 site
- 30 CPU-years delivered; 1.5 million events delivered to CMS physicists
62National Earthquake Engineering Simulation Grid
- NEESgrid will link earthquake researchers across the U.S. with leading-edge computing resources and research equipment, allowing collaborative teams (including remote participants) to plan, perform, and publish their experiments
- Through the NEESgrid, researchers will
- perform tele-observation and tele-operation of experiments: shake tables, reaction walls, etc.
- publish to, and make use of, a curated data repository using standardized markup
- access computational resources and open-source analytical tools
- access collaborative tools for experiment planning, execution, analysis, and publication
63NEES Sites
- Large-Scale Laboratory Experimentation Systems
- University at Buffalo, State University of New
York - University of California at Berkeley
- University of Colorado, Boulder
- University of Minnesota-Twin Cities
- Lehigh University
- University of Illinois, Urbana-Champaign
- Field Experimentation and Monitoring
Installations - University of California, Los Angeles
- University of Texas at Austin
- Brigham Young University
- Shake Table Research Equipment
- University at Buffalo, State University of New
York - University of Nevada, Reno
- University of California, San Diego
- Centrifuge Research Equipment
- University of California, Davis
- Rensselaer Polytechnic Institute
- Tsunami Wave Basin
- Oregon State University, Corvallis, Oregon
- Large-Scale Lifeline Testing
- Cornell University
64NEESgrid Earthquake Engineering Collaboratory
[Diagram: the NEESgrid collaboratory uses high-performance networks to connect instrumented structures and sites, laboratory equipment, field equipment, a curated data repository, a simulation tools repository, large-scale computation, global connections, and remote users, including K-12 faculty and students]
65NEESgrid Approach
- Package a set of application-level services and the supporting Grid software in a single point of presence (POP)
- Deploy the POP to a select set of earthquake engineering sites to provide the applications, data archiving, and Grid services
- Assist in developing common metadata so that the various instruments and simulations can work together
- Provide the required computing and data storage infrastructure
66NEESgrid Multi-Site Online Simulation (MOST)
- A partnership between the NEESgrid team and the UIUC and Colorado equipment sites to showcase NEESgrid capabilities
- A large-scale experiment conducted in multiple geographical locations, combining physical experiments with numerical simulation in an interchangeable manner
- The first integration of NEESgrid services with application software developed by earthquake engineers (UIUC, Colorado, and USC) to support a real EE experiment
- See http://www.neesgrid.org/most/
67NEESgrid Multi-Site Online Simulation (MOST)
[Photos: the UIUC and U. Colorado experimental setups]
68Multi-Site, On-Line Simulation Test (MOST)
[Diagram: the UIUC experimental model (MOST-SIM team: Dan Abrams, Amr Elnashai, Dan Kuchma, Bill Spencer, and others) and the Colorado experimental model (FHT team: Benson Shing and others) are coupled with an NCSA computational model through a simulation coordinator]
69 1994 Northridge Earthquake Simulation Requires a Complex Mix of Data and Models
[Images: bridge piers 5, 6, 7, and 8 of the simulated structure]
NEESgrid provides the common data formats, uniform data archive interfaces, and computational services needed to support this multidisciplinary simulation
Amr Elnashai, UIUC
70NEESgrid Architecture
[Diagram: user interfaces (web browser, Java applet) support multidisciplinary simulations, collaborations, and experiments. A NEESpop at each site hosts NEES operations, E-Notebook services, metadata services, the CompreHensive collaborativE Framework (CHEF), NEESgrid monitoring, and video services. Grid services (GridFTP, the NEESgrid streaming data system, accounts, MyProxy) connect these, together with data acquisition systems and a simulation coordinator, to the NEES distributed resources: instrumented structures and sites, laboratory equipment, large-scale storage, large-scale computation, a curated data repository, and a simulation tools repository]
71The Changing Face of Observational Astronomy
- Large digital sky surveys are becoming the dominant source of data in astronomy: > 100 TB, growing rapidly
- Current examples: SDSS, 2MASS, DPOSS, GSC, FIRST, NVSS, RASS, IRAS; CMBR experiments; microlensing experiments; NEAT, LONEOS, and other searches for Solar system objects
- Digital libraries: ADS, astro-ph, NED, CDS, NSSDC
- Observatory archives: HST, CXO, space- and ground-based
- Future: QUEST2, LSST, and other synoptic surveys; GALEX, SIRTF, astrometric missions, GW detectors
- Data sets orders of magnitude larger, more complex, and more homogeneous than in the past
72The Changing Face of Observational Astronomy
- Virtual Observatory: a federation of N archives
- Possibilities for new discoveries grow as O(N²)
- Current sky surveys have proven this
- Very early discoveries from Sloan (SDSS), 2 Micron (2MASS), Digital Palomar (DPOSS)
- See http://www.us-vo.org
73Sky Survey Federation
74Mining Data is Often a Critical Aspect of Doing Science
- The ability to federate survey data is enormously important
- Studying the Cosmic Microwave Background, a key tool in studying the cosmology of the universe, requires combined observations from many instruments in order to isolate the extremely weak signals of the CMB
- The datasets that represent the material between us and the CMB are collected from different instruments and are stored and curated at many different institutions
- This is immensely difficult without approaches like the National Virtual Observatory to provide a uniform interface to all of the different data formats and locations
(Julian Borrill, NERSC, LBNL)
75NVO Approach
- The focus is on adapting emerging information technologies to meet astronomy research challenges
- Metadata, standards, protocols (XML, HTTP)
- Interoperability
- Database federation
- Web Services (SOAP, WSDL, UDDI)
- Grid-based computing (OGSA)
- Federating databases is difficult, but very valuable
- An XML-based mark-up for astronomical tables and catalogs: VOTable
- Developed a metadata management framework
- Formed international registry, dm (data models), semantics, and dal (data access layer) discussion groups
- As with NEESgrid, Grids are helping to unify the community
76NVO Image Mosaicking
- Specify a box by position and size
- The SIAP server returns the relevant images (a query sketch follows after this slide), listing for each
- Footprint
- Logical Name
- URL
- One can choose a standard URL (http://...) or an SRB URL (srb://nvo.npaci.edu/...)
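A hedged sketch of what such a query can look like: Simple Image Access services are typically invoked as an HTTP GET carrying a sky position and box size and return a VOTable listing the matching images. The base URL below is a placeholder, not a real service endpoint.

```python
from urllib.parse import urlencode
# import urllib.request   # uncomment to actually issue the query

# Build a Simple Image Access (SIA) style query: a sky position (RA, Dec in
# degrees) and a box size in degrees.  The endpoint here is a placeholder;
# a real registry would supply the base URL.

base_url = "https://siap.example.org/cgi-bin/siaquery"   # placeholder endpoint
params = {
    "POS": "180.0,-0.5",     # RA,Dec of the box center (degrees)
    "SIZE": "0.2,0.2",       # box width,height (degrees)
    "FORMAT": "image/fits",  # ask only for FITS images
}
query_url = f"{base_url}?{urlencode(params)}"
print(query_url)
# The response would be a VOTable whose rows describe each matching image:
# its footprint on the sky, a logical name, and a URL (http:// or srb://)
# from which the pixels can be retrieved.
```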
77Atlasmaker Virtual Data System
[Diagram: higher-level Grid services coordinate metadata repositories federated by OAI, data repositories federated by SRB, and compute resources federated by TeraGrid/IPG through core Grid services; in the pictured steps, the mosaic is computed on TG/IPG (step 2c) and the result is stored and returned (step 2d)]
78Background Correction
[Images: a sky field before (uncorrected) and after (corrected) background correction]
79NVO Components
[Diagram: NVO components: visualization tools and resource/service registries sit above Web Services such as Simple Image Access services and Cone Search services, which exchange VOTables annotated with UCDs; a cross-correlation engine and streaming Grid services connect these to data archives and computing resources]
80International Virtual Observatory Collaborations
- German AVO
- Russian VO
- e-Astronomy Australia
- IVOA (International Virtual Observatory Alliance)
- Astrophysical Virtual Observatory (European Commission)
- AstroGrid, UK e-science program
- Canada
- VO India
- ...
- VO Japan (leading the work on the VO query language)
- VO China
US contacts: Alex Szalay <szalay@jhu.edu>, Roy Williams <roy@cacr.caltech.edu>, Bob Hanisch <hanisch@stsci.edu>
81And What's This Got To Do With ...
- CORBA?
- Grid-enabled CORBA is underway
- Java, Jini, Jxta?
- Java CoG Kit; Jini and Jxta futures uncertain
- Web Services, .NET, J2EE?
- Major Globus focus (GRAM-2: SOAP, WSDL)
- Workflow/choreography services
- Q: What can the Grid offer to Web services?
- The next revolutionary technology of the month?
- They'll need Grid technologies too
Slide source Ian Foster _at_ ANL
82The Future: All Software is Network-Centric
- We don't build or buy computers anymore; we borrow or lease the required resources
- When I walk into a room, need to solve a problem, need to communicate, ...
- A computer is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, and networks
- Similar observations apply to software
Slide source Ian Foster _at_ ANL
83And Thus
- Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => many more components (and services), massive parallelism
- All resources are owned by others => sharing (for fun or profit) is fundamental: trust, policy, negotiation, payment
- All computing is performed on unfamiliar systems => dynamic behaviors, discovery, adaptivity, failure
Slide source Ian Foster _at_ ANL
84Acknowledgments
- Globus R&D is joint with numerous people
- Carl Kesselman, Co-PI
- Steve Tuecke, principal architect at ANL
- Others to be acknowledged
- GriPhyN R&D is joint with numerous people
- Paul Avery, Co-PI; Newman, Lazzarini, Szalay
- Mike Wilde, project coordinator
- Carl Kesselman, Miron Livny: CS leads
- ATLAS, CMS, LIGO, SDSS participants and others
- Support: DOE, DARPA, NSF, NASA, Microsoft
Slide source Ian Foster _at_ ANL
85Summary
- The Grid problem: resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
- Grid architecture: emphasizes protocol and service definition to enable interoperability and resource sharing
- Globus Toolkit: a source of protocol and API definitions and reference implementations
- See globus.org, griphyn.org, gridforum.org, grids-center.org, nsf-middleware.org
Slide source Ian Foster _at_ ANL