An Introduction to The Grid

Transcript and Presenter's Notes
1
An Introduction to The Grid
  • Mike Wilde
  • Mathematics and Computer Science Division
  • Argonne National Laboratory

Oak Park River Forest High School, May 22, 2002
2
Topics
  • Grids in a nutshell
  • What are Grids?
  • Why are we building Grids?
  • What Grids are made of
  • The Globus Project and Toolkit
  • How Grids are helping (big) Science

3
A Grid to Share Computing Resources
4
Grid Applications
  • Authenticate once
  • Submit a grid computation (code,
    resources, data, ...)
  • Locate resources
  • Negotiate authorization, acceptable use, etc.
  • Select and acquire resources
  • Initiate data transfers, computation
  • Monitor progress
  • Steer computation
  • Store and distribute results
  • Account for usage
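Taken together, these steps form a single job lifecycle. A toy, self-contained sketch of that lifecycle in Python (every function below is a stand-in invented for illustration, not a Globus API; only the ordering of the steps is the point):

import random

def authenticate() -> str:
    return "proxy-credential"                    # authenticate once

def locate_resources(need_cpus: int) -> list:
    return ["site0", "site1", "site2"]           # locate resources

def run_grid_job(code: str, data: str) -> None:
    cred = authenticate()
    candidates = locate_resources(need_cpus=100)
    site = random.choice(candidates)             # negotiate, select, acquire
    print("staging", data, "to", site)           # initiate data transfers
    for pct in (25, 50, 75, 100):                # initiate computation, then
        print(code, "on", site, pct, "percent")  # monitor and steer progress
    print("storing results; usage billed to", cred)  # store results, account

run_grid_job("analysis.exe", "events.dat")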

5
Natural Science drives Computer Science
6
Scientists write software to probe the nature of
the universe
7
Data Grids for High Energy Physics
Image courtesy Harvey Newman, Caltech
8
The Grid
  • Emerging computational and networking
    infrastructure
  • Pervasive, uniform, and reliable access to remote
    data, computational, sensor, and human resources
  • Enables new approaches to applications and problem
    solving
  • Remote resources become the rule, not the exception
  • Challenges
  • Many different computers and operating systems
  • Failures are common: something is always broken
  • Different organizations have different rules for
    security and computer usage

9
Motivation
  • Sharing the computing power of multiple
    organizations to help virtual organizations solve
    big problems

10
Elements of the Problem
  • Resource sharing
  • Computers, storage, sensors, networks, ...
  • Sharing is always conditional: issues of trust,
    policy, negotiation, payment, ...
  • Coordinated problem solving
  • Beyond client-server: distributed data analysis,
    computation, collaboration, ...
  • Dynamic, multi-institutional virtual orgs
  • Community overlays on classic org structures
  • Large or small, static or dynamic

11
Size of the problem
  • Teraflops of compute power
  • Equal to thousands of 1 GHz Pentiums
  • Petabytes of data per year per experiment
  • 1 PB = 25,000 x 40 GB disks
  • 40 Gb/sec of network bandwidth
  • = 400 x 100 Mb/sec LAN cables (stretched across
    the country and the Atlantic)
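A quick sanity check of the two equivalences above (plain arithmetic, nothing grid-specific):

petabyte_in_gb = 1_000_000           # 1 PB = 10^6 GB in decimal units
print(petabyte_in_gb / 40)           # 25000.0 disks of 40 GB each

wan_gbps, lan_mbps = 40, 100
print(wan_gbps * 1000 / lan_mbps)    # 400.0 LAN cables of 100 Mb/sec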

12
Sockets: the Basic Building Block
[Diagram: Program A and Program B exchange messages over a socket
across an IP network]
13
Services are built on Sockets
[Diagram: a web browser (client) talks to a web server over an IP
network; protocol: HTTP]
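The send/recv pattern these diagrams describe can be shown concretely. A minimal echo pair using only the Python standard library (the host and port are arbitrary choices for the sketch):

import socket
import sys

HOST, PORT = "127.0.0.1", 5000        # arbitrary for this sketch

def server() -> None:                 # Program B: the service
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _addr = srv.accept()    # wait for the client to connect
        with conn:
            data = conn.recv(1024)             # recv the request
            conn.sendall(b"echo: " + data)     # send the reply

def client() -> None:                 # Program A: the requester
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(b"hello grid")            # send
        print(sock.recv(1024).decode())        # recv -> "echo: hello grid"

server() if sys.argv[1] == "server" else client()

Run the server in one terminal and the client in another; HTTP, FTP, and the other protocols on the following slides are conventions layered on exactly this kind of connection.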
14
Client-Server Model
[Diagram: many clients connect to a single web server over an IP
network; protocol: HTTP]
15
Familiar Client-Server Apps
  • Email
  • Protocols: POP, SMTP
  • File copying
  • Protocol: FTP
  • Logging in to remote computers
  • Protocol: Telnet

16
Peer-to-Peer Model
[Diagram: peers connect directly to one another over an IP network;
protocol: Gnutella]
17
Familiar Peer-to-Peer Apps
  • File (music) sharing
  • Protocols: Napster, Gnutella
  • Chat (sort of)
  • Protocols: IRC, Instant Messenger
  • Video conferencing
  • Protocol: H.323

18
The Globus Project and The Globus Toolkit

19
The Globus Toolkit: Four Main Components
  • Grid Security Infrastructure
  • A trustable digital ID for every user and
    computer
  • Information Services
  • Find all the computers and file servers I can
    use
  • Resource Management
  • Select computers and run programs on them
  • Data Management
  • Fast and secure data transfer (parallel)
  • Making and tracking replicas (copies) of files
  • plus Common Software Infrastructure
  • Libraries for writing Grid software applications

20
Running Programs on the Grid
[Architecture diagram, summarized:]
  • The client makes MDS API calls to a Grid Index Info Server to
    locate resources, then further MDS calls to a Grid Resource Info
    Server (which queries the current status of the resource) to get
    resource information.
  • The client makes GRAM API calls to request resource allocation
    and process creation, and receives GRAM state-change callbacks as
    the job runs.
  • At the site boundary, the Gatekeeper authenticates the request
    via the Globus Security Infrastructure, parses it (RSL library),
    and creates a Job Manager.
  • The Job Manager asks the Local Resource Manager to allocate and
    create processes, then monitors and controls them.
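The request the Gatekeeper parses is written in RSL, the Globus resource specification language. A minimal RSL sketch (the executable and attribute values are invented for illustration):

& (executable = "/bin/hostname")
  (count = 4)
  (stdout = "hostname.out")

Each parenthesized pair names one attribute of the job: what to run, how many processes, and where standard output should go.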
21
The Grid Information Problem
  • Large numbers of distributed sensors with
    different properties
  • Need for different views of this information,
    depending on community membership, security
    constraints, intended purpose, sensor type

22
Grid Information Service
23
GridFTP: Ubiquitous, Secure, High-Performance
Data Access Protocol
  • Common transfer protocol
  • All systems can exchange files with each other
  • VERY fast
  • Sends files faster than 1 gigabit per second
  • Secure
  • Makes important data hard to damage or intercept
  • Applications can tailor it to their needs
  • Building in security or on-the-fly processing
  • Interfaces to many storage systems
  • Disk farms, tape robots

24
Striped GridFTP Server
[Architecture diagram, summarized:]
  • A GridFTP client holds a GridFTP control channel (and control
    socket) to the GridFTP server master process.
  • The master uses mpirun to launch a parallel server backend whose
    nodes coordinate over MPI (COMM_WORLD and a sub-communicator).
  • Backend nodes read and write through MPI-IO to a parallel file
    system (e.g. PVFS, PFS).
  • Data flows over multiple parallel GridFTP data channels to the
    client or to another striped GridFTP server.
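Reduced to its essence, striping moves one file over several TCP connections at once so the aggregate transfer can outrun any single stream. A toy illustration of that idea in Python (this is not the GridFTP protocol: the host, port, and length-prefix framing are invented for the sketch, and a matching receiver is assumed on the other end):

import os
import socket
import threading

FILE = "data.bin"                    # file to send (assumed to exist)
HOST, PORT = "receiver.example.org", 9000   # invented endpoint
STREAMS = 4                          # number of parallel data channels

def send_range(offset: int, length: int) -> None:
    # One byte range per dedicated connection, prefixed by (offset, length).
    with socket.create_connection((HOST, PORT)) as sock, open(FILE, "rb") as f:
        f.seek(offset)
        sock.sendall(offset.to_bytes(8, "big") + length.to_bytes(8, "big"))
        remaining = length
        while remaining > 0:
            chunk = f.read(min(65536, remaining))
            sock.sendall(chunk)
            remaining -= len(chunk)

size = os.path.getsize(FILE)
step = (size + STREAMS - 1) // STREAMS       # split into STREAMS byte ranges
threads = [
    threading.Thread(target=send_range,
                     args=(i * step, min(step, size - i * step)))
    for i in range(STREAMS) if i * step < size
]
for t in threads:
    t.start()
for t in threads:
    t.join()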

25
Striped GridFTP Application: Video Server
26
Replica Catalog Structure
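The figure for this slide is not recoverable, but the core idea follows from slide 19's Data Management component: the catalog maps each logical file name to the physical locations of its replicas. A minimal sketch of that mapping (the file and site names are invented):

# Toy replica catalog: logical file name -> physical replica locations.
replica_catalog = {
    "lfn://cms/run42/hits.DB": [
        "gsiftp://storage.site-a.example.org/pvfs/run42/hits.DB",
        "gsiftp://tape.site-b.example.org/hpss/run42/hits.DB",
    ],
}

def locate(lfn: str) -> list:
    # Every known physical copy of a logical file.
    return replica_catalog.get(lfn, [])

def register(lfn: str, url: str) -> None:
    # Record a newly made replica of a logical file.
    replica_catalog.setdefault(lfn, []).append(url)

register("lfn://cms/run42/hits.DB", "gsiftp://site-c.example.org/hits.DB")
print(locate("lfn://cms/run42/hits.DB"))   # three replicas to choose from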
27
Programming with Globus
  • UNIX-based; Windows coming soon
  • Used by the rest of the Globus Toolkit
  • Users can also use it for portability and
    convenience
  • Windows, UNIX, and Macintosh computers can all
    join the Grid
  • Portable programming is very important
  • Event-Driven Programming
  • A way of writing programs that handle many things
    at once
  • Parallel Programs
  • Writing programs that can utilize many computers
    to solve a single problem (see the sketch below)
  • MPI: a popular Message Passing Interface
    developed at Argonne and other laboratories
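A minimal parallel program using MPI's Python bindings (mpi4py is an assumption for the sketch; production codes of this era were mostly C and Fortran). Each process sums a disjoint slice of the work and rank 0 combines the results:

# Run with, e.g.: mpirun -np 4 python partial_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # this process's id: 0 .. size-1
size = comm.Get_size()            # number of cooperating processes

# Each process sums a disjoint slice of 0..999, striding by `size`.
partial = sum(range(rank, 1000, size))

# Combine the partial sums on rank 0.
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of 0..999 =", total)    # 499500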

28
Grids and Applications
29
Hunting for Gravity Waves
30
Grid Communities and Applications: Network for
Earthquake Engineering Simulation
  • NEESgrid: national infrastructure to couple
    earthquake engineers with experimental
    facilities, databases, computers, and each other
  • On-demand access to experiments, data streams,
    computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
www.neesgrid.org
31
The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Diagram, summarized: four sites, NCSA/PACI (8 TF, 240 TB), SDSC
(4.1 TF, 225 TB), Caltech, and Argonne, each with site resources
including archival storage (HPSS; UniTree at NCSA), interconnected
and linked to external networks at up to 40 Gb/s]
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne
www.teragrid.org
32
iVDGL Map Circa 2002-2003
[Map legend: Tier0/1 facility; Tier2 facility; Tier3 facility;
10 Gbps link; 2.5 Gbps link; 622 Mbps link; other link]
33
What's it like to Work on the Grid?
  • A fascinating problem on the frontiers of
    computer science
  • Work with people from around the world and many
    branches of science
  • Local Labs and Universities at the forefront
  • Argonne, Fermilab
  • Illinois (UIC and UIUC), U of Chicago,
    Northwestern
  • Wisconsin also very active!

34
Access Grid
  • Collaborative work among large groups
  • 50 sites worldwide
  • Use Grid services for discovery, security
  • See also www.scglobal.org

Access Grid: Argonne, others
www.mcs.anl.gov/FL/accessgrid
35
Come Visit and Explore
  • Argonne and Fermilab are right in our own
    backyard!
  • Visits
  • Summer programs

36
Supplementary Material
37
Executor Example: Condor DAGMan
  • Directed Acyclic Graph Manager
  • Specify the dependencies between Condor jobs
    using a DAG data structure
  • Manage dependencies automatically
  • (e.g., "Don't run job B until job A has
    completed successfully.")
  • Each job is a node in the DAG
  • Any number of parent or child nodes
  • No loops

Slide courtesy Miron Livny, U. Wisconsin
38
Executor Example: Condor DAGMan (cont.)
  • DAGMan acts as a meta-scheduler
  • Holds and submits jobs to the Condor queue at the
    appropriate times, based on DAG dependencies
  • If a job fails, DAGMan continues until it can no
    longer make progress, then creates a rescue
    file with the current state of the DAG
  • When the failed job is ready to be re-run, the
    rescue file is used to restore the prior state of
    the DAG

Slide courtesy Miron Livny, U. Wisconsin
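A minimal DAGMan input file for a diamond-shaped dependency, using the JOB/PARENT-CHILD syntax sketched above (the node names and submit-file names are invented):

# diamond.dag
JOB A a.submit
JOB B b.submit
JOB C c.submit
JOB D d.submit
# B and C must wait for A; D must wait for both B and C.
PARENT A CHILD B C
PARENT B C CHILD D

Submitted with condor_submit_dag diamond.dag; if, say, node C fails, DAGMan writes a rescue file so the run can later resume without re-running A and B.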
39
Virtual Data in CMS
"Virtual Data: Long-Term Vision of CMS," CMS Note
2001/047, GriPhyN 2001-16
40
CMS Data Analysis
Dominant use of Virtual Data in the Future
[Diagram, summarized: for each event (Event 1, 2, 3), raw data
(simulated or real, roughly 50 KB to 300 KB per object) is processed
by a Reconstruction Algorithm using calibration data; physics
analysis jobs (Jet finder 1, Jet finder 2) produce reconstructed data
of roughly 5 KB to 7 KB, from which small tags (Tag 1, Tag 2, roughly
100 to 200 bytes) are derived. The figure distinguishes uploaded
data, virtual data, and algorithms.]
41
Production Pipeline: GriPhyN-CMS Demo
[Pipeline, summarized: each run flows through four stages]

Stage:         pythia      cmsim     writeHits  writeDigis
CPU per run:   2 min       8 hours   5 min      45 min
Data per run:  0.5 MB      175 MB    275 MB     105 MB
Output file:   truth.ntpl  hits.fz   hits.DB    digis.DB

1 run = 500 events; the SC2001 demo version used 1 event.
42
GriPhyN Virtual Data: Tracking Complex
Dependencies
[Dependency graph: simulate -t 10 produces file1 and file2;
reformat -f fz and conv -i esd -o aod derive further files from
file2; summarize -t 10 produces file7; psearch -t 10 produces file8,
the requested file]
  • The dependency graph records 8 files and the
    programs (simulate, reformat, conv, summarize,
    psearch) that derive them

43
Re-creating Virtual Data
[Same dependency graph as the previous slide; file8 is the requested
file]
  • To re-create file8, step 1:
  • simulate produces file1 and file2

44
Re-creating Virtual Data
[Same dependency graph; file8 is the requested file]
  • To re-create file8, step 2:
  • files 3, 4, 5, 6 are derived from file2
  • reformat produces file3, file4, file5
  • conv produces file6

45
Re-creating Virtual Data
[Same dependency graph; file8 is the requested file]
  • To re-create file8, step 3:
  • file7 depends on file6
  • summarize produces file7

46
Re-creating Virtual Data
[Same dependency graph; file8 is the requested file]
  • To re-create file8, final step:
  • file8 depends on files 1, 3, 4, 5, 7
  • psearch produces file8
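The four steps above amount to a depth-first walk of the dependency graph. A small self-contained sketch of that recomputation logic (the table transcribes the example; "running" a program is simulated by printing, and in this toy version a program is re-run once per missing output):

# file -> (deriving program, input files), transcribed from the example.
derivations = {
    "file1": ("simulate -t 10", []),
    "file2": ("simulate -t 10", []),
    "file3": ("reformat -f fz", ["file2"]),
    "file4": ("reformat -f fz", ["file2"]),
    "file5": ("reformat -f fz", ["file2"]),
    "file6": ("conv -i esd -o aod", ["file2"]),
    "file7": ("summarize -t 10", ["file6"]),
    "file8": ("psearch -t 10",
              ["file1", "file3", "file4", "file5", "file7"]),
}

def recreate(target: str, have: set) -> None:
    # Depth-first: materialize missing inputs, then derive the target.
    if target in have:
        return
    program, inputs = derivations[target]
    for f in inputs:
        recreate(f, have)
    print("run:", program, "->", target)
    have.add(target)

recreate("file8", set())    # replays steps 1-4 in dependency order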

47
Virtual Data Catalog: Conceptual Data Structure
[Diagram, summarized: a transformation's PARAMETER LIST holds typed
parameters, e.g. i filename1 (an input file), p -g (a program flag),
E PTYPE=muon (an environment setting), o filename2 (an output file)]
48
CMS Pipeline in VDL
begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end

[Pipeline stages: pythia_input -> pythia.exe -> cmsim_input ->
cmsim.exe -> writeHits -> writeDigis]
49
Virtual Data for Real Science: A Prototype
Virtual Data Catalog
Architecture of the System
Virtual Data Catalog (PostgreSQL)
Virtual Data Language (VDL) Interpreter (VDLI)
Grid testbed
Production DAG of Simulated CMS Data
50
Early GriPhyN Challenge Problem: CMS Data
Reconstruction
[Components: a master Condor job on a Caltech workstation; a
secondary Condor job on the Wisconsin (WI) Condor pool; an NCSA Linux
cluster; and NCSA UniTree, a GridFTP-enabled FTP server. Steps, in
order:]
2) Launch secondary job on WI pool; input files
   via Globus GASS
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool
4) 100 data files transferred via GridFTP, 1 GB
   each
5) Secondary job reports complete to master
6) Master starts reconstruction jobs via Globus
   jobmanager on the cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to
   UniTree
9) Reconstruction job reports complete to master
Scott Koranda, Miron Livny, others
51
GriPhyN-LIGO SC2001 Demo
52
GriPhyN CMS SC2001 Demo
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
[Diagram, summarized: a Denver client issues requests against a full
event database of 100,000 large objects and another of 40,000 large
objects, plus a tag database of 140,000 small objects; transfers run
over parallel tuned GSI FTP]
Bandwidth-Greedy Grid-enabled Object Collection
Analysis for Particle Physics
53
iVDGL
  • International Virtual-Data Grid Laboratory
  • A place to conduct Data Grid tests at scale
  • Concrete manifestation of world-wide grid
    activity
  • Continuing activity that will drive Grid
    awareness
  • Scale of effort
  • For national and international-scale Data Grid
    tests and operations
  • Computation- and data-intensive computing
  • Who
  • Initially US, UK, Italy, EU; Japan, Australia
  • Russia, China, Pakistan, India, South America?
  • StarLight and other international networks vital

U.S. Co-PIs: Avery, Foster, Gardner, Newman,
Szalay
54
iVDGL Map Circa 2002-2003
[Map legend: Tier0/1 facility; Tier2 facility; Tier3 facility;
10 Gbps link; 2.5 Gbps link; 622 Mbps link; other link]
55
Summary
  • Grids: resource sharing and problem solving in
    dynamic virtual organizations
  • Many projects are now working to develop, deploy,
    and apply the relevant technologies
  • Common protocols and services are critical
  • The Globus Toolkit is a source of protocol and API
    definitions and reference implementations
  • Rapid progress on the definition, implementation, and
    application of Data Grid architecture
  • Harmonizing U.S. and E.U. efforts is important