Title: An Introduction to The Grid
1. An Introduction to The Grid
- Mike Wilde
- Mathematics and Computer Science Division
- Argonne National Laboratory
Oak Park River Forest High School - May 22, 2002
2. Topics
- Grids in a nutshell
- What are Grids?
- Why are we building Grids?
- What Grids are made of
- The Globus Project and Toolkit
- How Grids are helping (big) Science
3. A Grid to Share Computing Resources
4. Grid Applications
- Authenticate once
- Submit a grid computation (code, resources, data, ...)
- Locate resources
- Negotiate authorization, acceptable use, etc.
- Select and acquire resources
- Initiate data transfers, computation
- Monitor progress
- Steer computation
- Store and distribute results
- Account for usage
5. Natural Science Drives Computer Science
6. Scientists Write Software to Probe the Nature of the Universe
7. Data Grids for High Energy Physics
Image courtesy Harvey Newman, Caltech
8. The Grid
- Emerging computational and networking infrastructure
- Pervasive, uniform, and reliable access to remote data, computational, sensor, and human resources
- Enables new approaches to applications and problem solving
- Remote resources are the rule, not the exception
- Challenges:
- Many different computers and operating systems
- Failures are common: something is always broken
- Different organizations have different rules for security and computer usage
9. Motivation
- Sharing the computing power of multiple
organizations to help virtual organizations solve
big problems
10. Elements of the Problem
- Resource sharing
- Computers, storage, sensors, networks, ...
- Sharing is always conditional: issues of trust, policy, negotiation, payment, ...
- Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, ...
- Dynamic, multi-institutional virtual organizations
- Community overlays on classic organizational structures
- Large or small, static or dynamic
11. Size of the Problem
- Teraflops of compute power
- Equal to thousands of 1 GHz Pentiums
- Petabytes of data per year per experiment
- 1 PB = 25,000 × 40 GB disks (checked below)
- 40 Gb/sec of network bandwidth
- 400 × 100 Mb/sec LAN cables (stretched across the country and the Atlantic)
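As a quick sanity check of the arithmetic above, a two-line Python sketch (only the slide's own numbers are used):

# Check the equivalences quoted on this slide.
disk_gb = 25_000 * 40                               # 25,000 disks x 40 GB
print(disk_gb, "GB =", disk_gb / 1_000_000, "PB")   # -> 1.0 PB
lan_mb = 400 * 100                                  # 400 cables x 100 Mb/s
print(lan_mb, "Mb/s =", lan_mb / 1_000, "Gb/s")     # -> 40.0 Gb/s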
12. Sockets: the Basic Building Block
[Diagram: Program A and Program B communicating over an IP network through sockets]
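A minimal Python sketch of this diagram, assuming an arbitrary loopback port: Program B listens on a socket and Program A connects across the IP network to send it a message.

import socket
import threading

srv = socket.socket()                      # Program B's listening socket
srv.bind(("127.0.0.1", 5000))              # arbitrary example port
srv.listen(1)

def program_b():
    conn, _ = srv.accept()                 # wait for Program A to connect
    print("B received:", conn.recv(1024).decode())
    conn.close()

t = threading.Thread(target=program_b)
t.start()

a = socket.create_connection(("127.0.0.1", 5000))   # Program A connects
a.sendall(b"hello over the IP network")
a.close()
t.join()
srv.close()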
13. Services Are Built on Sockets
[Diagram: a web-browser client talks to a web server over an IP network; protocol: HTTP]
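As a minimal sketch of this slide's pattern, the client below speaks the HTTP protocol to a web server over a raw socket; example.org stands in for a real server.

import socket

# Open a TCP connection to a web server and issue a bare HTTP/1.0 request.
with socket.create_connection(("example.org", 80)) as s:
    s.sendall(b"GET / HTTP/1.0\r\nHost: example.org\r\n\r\n")
    reply = b""
    while chunk := s.recv(4096):           # read until the server closes
        reply += chunk

print(reply.split(b"\r\n", 1)[0].decode()) # the HTTP status line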
14. Client-Server Model
[Diagram: clients connect to a central web server over an IP network; protocol: HTTP]
15. Familiar Client-Server Apps
- Email
- Protocols: POP, SMTP
- File copying
- Protocol: FTP (see the sketch below)
- Logging in to remote computers
- Protocol: Telnet
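For one protocol from this list, a short sketch using Python's standard ftplib client; the server and file name are hypothetical.

from ftplib import FTP

# Anonymous FTP download: the protocol behind classic file copying.
with FTP("ftp.example.org") as ftp:        # hypothetical server
    ftp.login()                            # anonymous login
    with open("readme.txt", "wb") as out:
        ftp.retrbinary("RETR readme.txt", out.write)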
16. Peer-to-Peer Model
[Diagram: peers connect directly to one another over an IP network; protocol: Gnutella]
17. Familiar Peer-to-Peer Apps
- File (music) sharing
- Protocols: Napster, Gnutella
- Chat (sort of)
- Protocols: IRC, Instant Messenger
- Video conferencing
- Protocol: H.323
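A minimal sketch of the peer-to-peer model from slide 16: each node listens like a server and also connects out like a client, with no central server. Ports are arbitrary examples.

import socket
import threading

def start_peer(port):
    """Each peer listens on its own port, like a server."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", port))
    srv.listen(1)

    def serve():
        conn, _ = srv.accept()
        conn.sendall(b"hello from peer %d" % port)
        conn.close()

    threading.Thread(target=serve).start()
    return srv

servers = [start_peer(7001), start_peer(7002)]

# ...and each peer also connects out like a client.
for port in (7002, 7001):
    with socket.create_connection(("127.0.0.1", port)) as c:
        print(c.recv(1024).decode())

for srv in servers:
    srv.close()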
18. The Globus Project and the Globus Toolkit
19. The Globus Toolkit: Four Main Components
- Grid Security Infrastructure
- A trustable digital ID for every user and computer
- Information Services
- Find all the computers and file servers I can use
- Resource Management
- Select computers and run programs on them
- Data Management
- Fast and secure data transfer (parallel)
- Making and tracking replicas (copies) of files
- ...plus Common Software Infrastructure
- Libraries for writing Grid software applications
20. Running Programs on the Grid
[Diagram: the client makes MDS client API calls to an MDS Grid Index Info Server to locate resources, and to an MDS Grid Resource Info Server (across the site boundary) to get resource info; the info server queries the current status of the resource. GRAM client API calls request resource allocation and process creation through the Globus Security Infrastructure to the Gatekeeper, which creates a Job Manager. The Job Manager parses the request with the RSL library, asks the Local Resource Manager to allocate and create processes, monitors and controls them, and delivers GRAM state-change callbacks to the client.]
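A hedged sketch of this flow, driving the GT2 globusrun client from Python; the gatekeeper contact string and the RSL request are illustrative assumptions, not values from the slide.

import subprocess

# Hypothetical gatekeeper contact; a real one would come from MDS lookups.
contact = "gatekeeper.example.org/jobmanager"
# A minimal RSL resource specification: run one copy of /bin/hostname.
rsl = "&(executable=/bin/hostname)(count=1)"

# -o streams the job's output back; -r names the remote resource.
subprocess.run(["globusrun", "-o", "-r", contact, rsl], check=True)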
21. The Grid Information Problem
- Large numbers of distributed sensors with different properties
- Need for different views of this information, depending on community membership, security constraints, intended purpose, and sensor type
22. Grid Information Service
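In Globus Toolkit 2 this service (MDS) was LDAP-based, so an ordinary LDAP search can browse it. A hedged sketch using the third-party ldap3 package; the host is hypothetical, while port 2135 and the Mds-Vo-name=local, o=grid base were the GT2 defaults.

from ldap3 import Connection, Server

server = Server("giis.example.org", port=2135)      # hypothetical GIIS
with Connection(server, auto_bind=True) as conn:
    # List everything this index server knows about its resources.
    conn.search("Mds-Vo-name=local, o=grid", "(objectclass=*)",
                attributes=["*"])
    for entry in conn.entries:
        print(entry.entry_dn)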
23. GridFTP: A Ubiquitous, Secure, High-Performance Data Access Protocol
- Common transfer protocol
- All systems can exchange files with each other
- VERY fast
- Sends files faster than 1 gigabit per second (see the transfer sketch below)
- Secure
- Makes important data hard to damage or intercept
- Applications can tailor it to their needs
- Building in security or on-the-fly processing
- Interfaces to many storage systems
- Disk farms, tape robots
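A hedged sketch of a parallel transfer with the toolkit's globus-url-copy client, run from Python; the hosts, paths, and the choice of 8 streams are illustrative assumptions.

import subprocess

subprocess.run(
    ["globus-url-copy",
     "-p", "8",                                     # 8 parallel data streams
     "gsiftp://source.example.org/data/run42.dat",  # hypothetical source
     "gsiftp://dest.example.org/data/run42.dat"],   # hypothetical destination
    check=True,
)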
24. Striped GridFTP Server
[Diagram: a GridFTP client holds a GridFTP control channel to the GridFTP server master of a parallel backend launched with mpirun; the master coordinates the backend nodes over MPI (Comm_World and a sub-communicator) plus a control socket; the nodes access a parallel file system (e.g., PVFS, PFS) through MPI-IO and drive the GridFTP data channels to the client or to another striped GridFTP server.]
25. Striped GridFTP Application: Video Server
26. Replica Catalog Structure
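The figure for this slide did not survive extraction. As a rough sketch of the idea (per the replica-tracking bullet on slide 19), a replica catalog maps logical file names to their physical copies; every name and URL below is illustrative.

# Logical collection -> logical file name -> physical replica URLs.
replica_catalog = {
    "cms-run42": {
        "hits.DB": [
            "gsiftp://ncsa.example.org/store/hits.DB",
            "gsiftp://caltech.example.org/store/hits.DB",
        ],
    },
}

def locate(collection, lfn):
    """Return the physical locations of every replica of a logical file."""
    return replica_catalog[collection][lfn]

print(locate("cms-run42", "hits.DB"))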
27. Programming with Globus
- UNIX-based; Windows coming soon
- Used by the rest of the Globus Toolkit
- Users can use it for portability and convenience
- Windows, UNIX, and Macintosh computers can all join the Grid
- Portable programming is very important
- Event-driven programming
- A way of writing programs that handle many things at once (see the sketch below)
- Parallel programs
- Writing programs that can utilize many computers to solve a single problem
- MPI: a popular Message Passing Interface developed at Argonne and other laboratories
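A minimal sketch of event-driven programming as described above, using Python's asyncio: one event loop interleaves several tasks in a single program. The task names and delays are invented for illustration.

import asyncio

async def task(name, delay):
    # Stand-in for an I/O operation that completes after `delay` seconds.
    await asyncio.sleep(delay)
    return name + " done"

async def main():
    # Start all three at once; the event loop handles them concurrently.
    results = await asyncio.gather(
        task("transfer", 0.2), task("query", 0.1), task("monitor", 0.3))
    print(results)

asyncio.run(main())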
28. Grids and Applications
29. Hunting for Gravity Waves
30. Grid Communities and Applications: Network for Earthquake Engineering Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, and each other
- On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
www.neesgrid.org
31. The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Diagram: four TeraGrid sites joined by 40 Gb/s external networks: Caltech, Argonne, NCSA/PACI (8 TF, 240 TB), and SDSC (4.1 TF, 225 TB); each has its own site resources, with HPSS archives at three sites and UniTree at NCSA.]
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne
www.teragrid.org
32. iVDGL Map, Circa 2002-2003
[Map: iVDGL sites, showing Tier0/1, Tier2, and Tier3 facilities linked at 10 Gbps, 2.5 Gbps, 622 Mbps, and other speeds]
33. What's It Like to Work on the Grid?
- A fascinating problem on the frontiers of computer science
- Work with people from around the world and many branches of science
- Local labs and universities are at the forefront
- Argonne, Fermilab
- Illinois (UIC and UIUC), U of Chicago, Northwestern
- Wisconsin also very active!
34. Access Grid
- Collaborative work among large groups
- 50 sites worldwide
- Use Grid services for discovery, security
- See also www.scglobal.org
Access Grid Argonne, others
www.mcs.anl.gov/FL/accessgrid
35. Come Visit and Explore
- Argonne and Fermilab are right in our own backyard!
- Visits
- Summer programs
36. Supplementary Material
37. Executor Example: Condor DAGMan
- Directed Acyclic Graph Manager
- Specifies the dependencies between Condor jobs using a DAG data structure
- Manages dependencies automatically
- (e.g., "Don't run job B until job A has completed successfully.")
- Each job is a node in the DAG
- Any number of parent or child nodes
- No loops
A minimal DAG sketch follows after this slide.
Slide courtesy Miron Livny, U. Wisconsin
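A minimal sketch of using DAGMan, assuming hypothetical node names and submit files: write a DAG file that encodes "don't run job B until job A succeeds" and hand it to condor_submit_dag.

from pathlib import Path
import subprocess

dag = """\
# Each JOB line names a node and the Condor submit file that runs it.
JOB A a.sub
JOB B b.sub
# B is a child of A: it may start only after A completes successfully.
PARENT A CHILD B
"""

Path("example.dag").write_text(dag)
subprocess.run(["condor_submit_dag", "example.dag"], check=True)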
38. Executor Example: Condor DAGMan (cont.)
- DAGMan acts as a meta-scheduler
- Holds and submits jobs to the Condor queue at the appropriate times, based on DAG dependencies
- If a job fails, DAGMan continues until it can no longer make progress, then creates a rescue file with the current state of the DAG
- When the failed job is ready to be re-run, the rescue file is used to restore the prior state of the DAG
Slide courtesy Miron Livny, U. Wisconsin
39. Virtual Data in CMS
"Virtual Data: Long Term Vision of CMS," CMS Note 2001/047, GriPhyN 2001-16
40. CMS Data Analysis
Dominant use of virtual data in the future
[Diagram: the CMS analysis chain for events 1-3. Uploaded raw data (simulated or real, roughly 50-300 KB per event) plus ~100 KB of calibration data feed the reconstruction algorithm, yielding reconstructed data (~100 KB per event, produced by physics analysis jobs); jet finders 1 and 2 reduce each event to ~5-7 KB, and tags 1 and 2 to 100-200 bytes. Everything downstream of the raw data is virtual data produced by algorithms.]
41. Production Pipeline: GriPhyN-CMS Demo

Stage      pythia       cmsim      writeHits   writeDigis
CPU/run    2 min        8 hours    5 min       45 min
Data/run   0.5 MB       175 MB     275 MB      105 MB
Output     truth.ntpl   hits.fz    hits.DB     digis.DB

1 run = 500 events. SC2001 demo version.
42. GriPhyN Virtual Data: Tracking Complex Dependencies
[Dependency graph: simulate -t 10 produces file1 and file2; reformat -f fz turns file2 into file3, file4, and file5; conv -i esd -o aod turns file2 into file6; summarize -t 10 turns file6 into file7; psearch -t 10 combines file1, file3, file4, file5, and file7 into file8, the requested file.]
- The dependency graph contains 8 files and 5 programs (simulate, reformat, conv, summarize, psearch)
43. Re-creating Virtual Data
[Dependency graph repeated from slide 42]
- To re-create file8, step 1:
- simulate → file1, file2
44. Re-creating Virtual Data
[Dependency graph repeated from slide 42]
- To re-create file8, step 2:
- Files 3, 4, 5, and 6 are derived from file2
- reformat → file3, file4, file5
- conv → file6
45. Re-creating Virtual Data
[Dependency graph repeated from slide 42]
- To re-create file8, step 3:
- file7 depends on file6
- summarize → file7
46. Re-creating Virtual Data
[Dependency graph repeated from slide 42]
- To re-create file8, final step:
- file8 depends on files 1, 3, 4, 5, and 7
- psearch → file8
A code sketch of this whole walk follows below.
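A sketch of the walk in slides 43-46 as Python: to materialize the requested file, first recursively materialize the inputs of the program that produces it. The graph literal mirrors slide 42; the printed run order matches steps 1 through the final step.

# program -> (input files, output files), from the slide-42 graph
programs = {
    "simulate -t 10":     ([], ["file1", "file2"]),
    "reformat -f fz":     (["file2"], ["file3", "file4", "file5"]),
    "conv -i esd -o aod": (["file2"], ["file6"]),
    "summarize -t 10":    (["file6"], ["file7"]),
    "psearch -t 10":      (["file1", "file3", "file4", "file5", "file7"],
                           ["file8"]),
}
# Invert the graph: which program produces each file?
producer = {f: p for p, (_, outs) in programs.items() for f in outs}

def recreate(f, have):
    """Materialize file f, re-deriving any missing inputs first."""
    if f in have:
        return
    prog = producer[f]
    for dep in programs[prog][0]:
        recreate(dep, have)
    print("run:", prog)                  # a real system would execute it
    have.update(programs[prog][1])       # all of its outputs now exist

recreate("file8", set())
# run order: simulate, reformat, conv, summarize, psearch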
47. Virtual Data Catalog: Conceptual Data Structure
[Diagram: a transformation's PARAMETER LIST, with entries PARAMETER i filename1, PARAMETER p -g, PARAMETER E PTYPE=muon, and PARAMETER O filename2]
48. CMS Pipeline in VDL

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end
begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end
begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end
begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end
begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end
begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
[Diagram: the pipeline stages pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis]
49. Virtual Data for Real Science: A Prototype Virtual Data Catalog
Architecture of the system:
- Virtual Data Catalog (PostgreSQL)
- Virtual Data Language: VDL Interpreter (VDLI)
- Grid testbed
- Production DAG of simulated CMS data
50. Early GriPhyN Challenge Problem: CMS Data Reconstruction
[Diagram: a master Condor job runs on a Caltech workstation; a secondary Condor job runs on the Wisconsin pool; reconstruction runs on an NCSA Linux cluster; NCSA UniTree is a GridFTP-enabled FTP server.]
2) Launch secondary job on Wisconsin pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool
4) 100 data files transferred via GridFTP, 1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
Scott Koranda, Miron Livny, others
51. GriPhyN-LIGO SC2001 Demo
52. GriPhyN CMS SC2001 Demo
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
[Diagram: "Bandwidth-Greedy Grid-enabled Object Collection Analysis for Particle Physics." A Denver client issues requests against a full event database of 100,000 large objects, a full event database of 40,000 large objects, and a tag database of 140,000 small objects; data flows over parallel tuned GSI FTP.]
53. iVDGL
- International Virtual-Data Grid Laboratory
- A place to conduct Data Grid tests at scale
- A concrete manifestation of world-wide Grid activity
- A continuing activity that will drive Grid awareness
- Scale of effort
- For national and international-scale Data Grid tests and operations
- Computation- and data-intensive computing
- Who?
- Initially US, UK, Italy, EU; later Japan, Australia
- Russia, China, Pakistan, India, South America?
- StarLight and other international networks vital
U.S. Co-PIs: Avery, Foster, Gardner, Newman, Szalay
54. iVDGL Map, Circa 2002-2003
[Map: same as slide 32 - iVDGL sites showing Tier0/1, Tier2, and Tier3 facilities linked at 10 Gbps, 2.5 Gbps, 622 Mbps, and other speeds]
55. Summary
- Grids: resource sharing and problem solving in dynamic virtual organizations
- Many projects are now working to develop, deploy, and apply the relevant technologies
- Common protocols and services are critical
- The Globus Toolkit: a source of protocol and API definitions and reference implementations
- Rapid progress on the definition, implementation, and application of the Data Grid architecture
- Harmonizing U.S. and E.U. efforts is important