Title: Grids and 21st Century Data Intensive Science
1 - Grids and 21st Century Data Intensive Science
Paul Avery, University of Florida, avery@phys.ufl.edu
Physics Colloquium, Johns Hopkins University, October 6, 2005
2 - Outline of Talk
- Cyberinfrastructure and Grids
- Data intensive disciplines and Data Grids
- The Trillium Grid collaboration
- GriPhyN, iVDGL, PPDG
- The LHC and its computing challenges
- Grid3 and the Open Science Grid
- A bit on networks
- Education and Outreach
- Challenges for the future
- Summary
Presented from a physicist's perspective!
3 - Cyberinfrastructure (cont)
- Virtual teams, communities, enterprises and organizations that use specific software programs, services, instruments, data, information, knowledge.
- Cyberinfrastructure: the layer of enabling hardware, algorithms, software, communications, institutions, and personnel. A platform that empowers researchers to innovate and eventually revolutionize what they do, how they do it, and who participates.
- Base technologies: computation, storage, and communication components that continue to advance in raw capacity at exponential rates.
Paraphrased from NSF Blue Ribbon Panel report, 2003
Challenge: creating and operating advanced cyberinfrastructure and integrating it in science and engineering applications.
4 - Cyberinfrastructure and Grids
- Grid: geographically distributed computing resources configured for coordinated use
- Fabric: physical resources & networks provide raw capability
- Ownership: resources controlled by owners and shared w/ others
- Middleware: software ties it all together (tools, services, etc.)
- Enhancing collaboration via transparent resource sharing
US-CMS Virtual Organization
5 - Data Grids & Collaborative Research
- Team-based 21st century scientific discovery
- Strongly dependent on advanced information technology
- People and resources distributed internationally
- Dominant factor: data growth (1 Petabyte = 1000 TB); growth rate quantified in the sketch below
  - 2000: 0.5 Petabyte
  - 2005: 10 Petabytes
  - 2010: 100 Petabytes
  - 2015-17: 1000 Petabytes?
- Drives need for powerful linked resources: Data Grids
  - Computation: massive, distributed CPU
  - Data storage and access: distributed hi-speed disk and tape
  - Data movement: international optical networks
- Collaborative research and Data Grids
  - Data discovery, resource sharing, distributed analysis, etc.
How to collect, manage, access and interpret this quantity of data?
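To make the exponential-growth claim concrete, here is a minimal Python sketch (not from the talk) that computes the annual growth factor and doubling time implied by the Petabyte figures above; the 2015-17 entry is taken at 2016.

```python
import math

# Data volumes quoted on this slide (year -> Petabytes).
volumes_pb = {2000: 0.5, 2005: 10, 2010: 100, 2016: 1000}

years = sorted(volumes_pb)
span = years[-1] - years[0]                                   # 16 years
total_factor = volumes_pb[years[-1]] / volumes_pb[years[0]]   # 2000x
annual_factor = total_factor ** (1.0 / span)                  # ~1.6x per year
doubling_years = math.log(2) / math.log(annual_factor)

print(f"Growth over {span} years: {total_factor:.0f}x")
print(f"Implied annual growth: {annual_factor:.2f}x "
      f"(doubling roughly every {doubling_years:.1f} years)")
```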
6 - Examples of Data Intensive Disciplines
- High energy & nuclear physics
  - Belle, BaBar, Tevatron, RHIC, JLAB
  - Large Hadron Collider (LHC)
- Astronomy
  - Digital sky surveys (SDSS), Virtual Observatories
  - VLBI arrays: multiple-Gb/s data streams
- Gravity wave searches
  - LIGO, GEO, VIRGO, TAMA, ACIGA, ...
- Earth and climate systems
  - Earth Observation, climate modeling, oceanography, ...
- Biology, medicine, imaging
  - Genome databases
  - Proteomics (protein structure & interactions, drug delivery, ...)
  - High-resolution brain scans (1-10 µm, time dependent)
7 - Bottom-up Collaboration: Trillium
- Trillium = PPDG + GriPhyN + iVDGL
  - PPDG: $12M (DOE) (1999-2006)
  - GriPhyN: $12M (NSF) (2000-2005)
  - iVDGL: $14M (NSF) (2001-2006)
  - 150 people with large overlaps between projects
  - Universities, labs, foreign partners
- Strong driver for funding agency collaborations
  - Inter-agency: NSF-DOE
  - Intra-agency: Directorate-Directorate, Division-Division
- Coordinated internally to meet broad goals
  - CS research, developing/supporting Virtual Data Toolkit (VDT)
  - Grid deployment, using VDT-based middleware
  - Unified entity when collaborating internationally
8 - Our Vision & Goals
- Develop the technologies & tools needed to exploit a Grid-based cyberinfrastructure
- Apply and evaluate those technologies & tools in challenging scientific problems
- Develop the technologies & procedures to support a permanent Grid-based cyberinfrastructure
- Create and operate a persistent Grid-based cyberinfrastructure in support of discipline-specific research goals
End-to-end
GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium
9 - Our Science Drivers
- Experiments at the Large Hadron Collider
  - New fundamental particles and forces
  - 100s of Petabytes, 2007-?
- High energy & nuclear physics experiments
  - Top quark, nuclear matter at extreme density
  - 1 Petabyte (1000 TB), 1997-present
- LIGO (gravity wave search)
  - Search for gravitational waves
  - 100s of Terabytes, 2002-present
- Sloan Digital Sky Survey
  - Systematic survey of astronomical objects
  - 10s of Terabytes, 2001-present
10 - Common Middleware: Virtual Data Toolkit (VDT)
[Diagram: VDT build-and-test flow - sources (CVS) are built and tested under NMI on a Condor pool spanning 22 operating systems, then packaged (RPMs, GPT source bundles), patched, and distributed via a Pacman cache. Many contributors.]
A unique laboratory for testing, supporting, deploying, packaging, upgrading, troubleshooting complex sets of software!
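As a rough, hypothetical illustration (this is not the VDT's actual tooling), the sketch below captures the idea behind the flow above: every component is built from source and tested on every supported platform, and is only packaged for release if it passes everywhere. Component and platform names are placeholders.

```python
# Hypothetical sketch of a multi-platform build/test/package gate,
# loosely modeled on the VDT flow: sources -> build -> test on a pool
# of many operating systems -> package and cache for distribution.

def build(component: str, platform: str) -> bool:
    """Placeholder: a real system would check out sources and compile here."""
    return True

def run_tests(component: str, platform: str) -> bool:
    """Placeholder: a real system would run the component's test suite here."""
    return True

def release(components, platforms):
    """Return the components that build and pass tests on every platform."""
    packaged = []
    for comp in components:
        if all(build(comp, p) and run_tests(comp, p) for p in platforms):
            packaged.append(comp)
    return packaged

if __name__ == "__main__":
    # Placeholder names; the real VDT matrix covered ~22 operating systems.
    print(release(["globus", "condor"], ["rhel3_i386", "debian3_x86"]))
```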
11 - VDT Growth Over 3 Years
www.griphyn.org/vdt/
[Chart: number of VDT components vs. time - VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.7 (switch to Globus 2.2); VDT 1.1.8 (first real use by LCG); VDT 1.1.11 (Grid3)]
12 - Components of VDT 1.3.5
- Globus 3.2.1
- Condor 6.7.6
- RLS 3.0
- ClassAds 0.9.7
- Replica 2.2.4
- DOE/EDG CA certs
- ftsh 2.0.5
- EDG mkgridmap
- EDG CRL Update
- GLUE Schema 1.0
- VDS 1.3.5b
- Java
- Netlogger 3.2.4
- Gatekeeper-Authz
- MyProxy 1.11
- KX509
- System Profiler
- GSI OpenSSH 3.4
- Monalisa 1.2.32
- PyGlobus 1.0.6
- MySQL
- UberFTP 1.11
- DRM 1.2.6a
- VOMS 1.4.0
- VOMS Admin 0.7.5
- Tomcat
- PRIMA 0.2
- Certificate Scripts
- Apache
- jClarens 0.5.3
- New GridFTP Server
- GUMS 1.0.1
13 - Collaborative Relationships: A VDT Perspective
[Diagram: partner science, networking, and outreach projects feed requirements into the Virtual Data Toolkit; Computer Science research contributes techniques & software; the VDT is exercised in prototyping experiments and production deployment, with tech transfer to the larger science community (Globus, Condor, NMI, iVDGL, PPDG, DISUN, EGEE, LHC experiments, QuarkNet, CHEPREO, Digital Divide; U.S. and international Grids, outreach). Other linkages: work force, CS researchers, industry.]
14 - Goal: Peta-scale Data Grids for Global Science
[Diagram: layered Data Grid architecture - single researchers, workgroups, and production teams work through interactive user tools; below these sit request execution & management tools, request planning & scheduling tools, and virtual data tools; these rely on resource management services, security and policy services, and other Grid services; all running over distributed resources (code, storage, CPUs, networks) and the raw data source. Targets: PetaOps, Petabytes, Performance.]
15 - Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
16 - The LIGO Scientific Collaboration (LSC) and the LIGO Grid
- LIGO Grid: 6 US sites + 3 EU sites (Birmingham & Cardiff/UK, AEI/Germany)
LHO, LLO = LIGO observatory sites; LSC = LIGO Scientific Collaboration
17 - The Large Hadron Collider & its Frontier Computing Challenges
18 - Large Hadron Collider (LHC) @ CERN
- 27 km tunnel in Switzerland & France
- Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
- Search for:
  - Origin of mass
  - New fundamental forces
  - Supersymmetry
  - Other new particles
- 2007-?
19 - CMS: Compact Muon Solenoid
[Detector drawing; "inconsequential humans" shown for scale]
20 - LHC Data Rates: Detector to Storage
- Collision rate: 40 MHz (TBytes/sec off the detector)
- Physics filtering:
  - Level 1 Trigger (special hardware): 40 MHz -> 75 kHz, 75 GB/sec
  - Level 2 Trigger (commodity CPUs): 75 kHz -> 5 kHz, 5 GB/sec
  - Level 3 Trigger (commodity CPUs): 5 kHz -> 150 Hz, 0.25-1.5 GB/sec
- Raw data to storage (+ simulated data)
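As a quick back-of-the-envelope check of these numbers (a sketch, not part of the talk): the ratio of rates gives the rejection factor at each trigger level, and dividing Level 1's output bandwidth by its rate gives the approximate event size.

```python
# Rates taken from this slide, in Hz: collisions in, and output of each trigger level.
rates_hz = {
    "collisions":          40e6,
    "Level 1 (hardware)":  75e3,
    "Level 2 (CPU farm)":  5e3,
    "Level 3 (CPU farm)":  150,
}

names = list(rates_hz)
for upstream, downstream in zip(names, names[1:]):
    factor = rates_hz[upstream] / rates_hz[downstream]
    print(f"{downstream}: keeps ~1 in {factor:,.0f} events")

# Event size implied by Level 1: 75 GB/s at 75 kHz is about 1 MB per event.
print(f"Level 1 event size: ~{75e9 / 75e3 / 1e6:.1f} MB")
print(f"Overall rate reduction: {rates_hz['collisions'] / rates_hz['Level 3 (CPU farm)']:.1e}")
```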
21 - Complexity: Higgs Decay to 4 Muons
(+30 minimum bias events)
All charged tracks with pt > 2 GeV
Reconstructed tracks with pt > 25 GeV
10^9 collisions/sec; selectivity: 1 in 10^13
22 - LHC: Petascale Global Science
- Complexity: millions of individual detector channels
- Scale: PetaOps (CPU), 100s of Petabytes (data)
- Distribution: global distribution of people & resources
BaBar/D0 example (2004): 700 physicists, 100 institutes, 35 countries
CMS example (2007): 5000 physicists, 250 institutes, 60 countries
23 - LHC: Beyond Moore's Law
[Chart: projected LHC computing requirements compared with Moore's Law (2000)]
24 - LHC Global Data Grid (2007)
- 5000 physicists, 60 countries
- 10s of Petabytes/yr by 2008
- 1000 Petabytes in < 10 yrs?
[Diagram: CMS Experiment tiered data grid]
- Online System -> CERN Computer Center (Tier 0): 150-1500 MB/s
- Tier 0 -> Tier 1: 10-40 Gb/s
- Tier 1 -> Tier 2: >10 Gb/s
- Tier 2 -> Tier 3: 2.5-10 Gb/s
- Tier 3 -> Tier 4: physics caches, PCs
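As an illustration only (not from the talk), the links and bandwidths above can be captured in a small data structure to estimate how long a bulk transfer takes over each hop; the 100 TB dataset size and the midpoint bandwidths are arbitrary assumptions.

```python
# Tier-to-tier links from this slide, with assumed nominal bandwidths in Gb/s
# (rough midpoints of the quoted ranges; 750 MB/s ~ 6 Gb/s for the Tier 0 link).
links_gbps = {
    "Online -> Tier 0 (CERN)": 6.0,    # 150-1500 MB/s
    "Tier 0 -> Tier 1":        25.0,   # 10-40 Gb/s
    "Tier 1 -> Tier 2":        10.0,   # >10 Gb/s
    "Tier 2 -> Tier 3":        6.0,    # 2.5-10 Gb/s
}

def transfer_hours(terabytes: float, gbps: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and contention."""
    bits = terabytes * 8e12
    return bits / (gbps * 1e9) / 3600

dataset_tb = 100  # example dataset size
for link, bw in links_gbps.items():
    print(f"{link}: ~{transfer_hours(dataset_tb, bw):.0f} h for {dataset_tb} TB at {bw} Gb/s")
```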
25 - Grids and Globally Distributed Teams
- Non-hierarchical: chaotic analyses & productions
- Superimpose significant random data flows
26 - Grid3 and Open Science Grid
27 - Grid3: A National Grid Infrastructure
- October 2003 - July 2005
- 32 sites, 4,000 CPUs: universities + 4 national labs
- Sites in US, Korea, Brazil, Taiwan
- Applications in HEP, LIGO, SDSS, genomics, fMRI, CS
www.ivdgl.org/grid3
28 - Grid3 Applications
www.ivdgl.org/grid3/applications
29 - Grid3 Shared Use Over 6 Months
[Chart: CPU usage vs. time over 6 months]
30 - Grid3 Production Over 13 Months
31 - U.S. CMS 2003 Production
- 10M p-p collisions: largest ever
- 2x simulation sample
- ½ manpower
- Multi-VO sharing
32 - Grid3 Lessons Learned
- How to operate a Grid as a facility
  - Tools, services, error recovery, procedures, docs, organization
  - Delegation of responsibilities (project, VO, service, site, ...)
  - Crucial role of Grid Operations Center (GOC)
- How to support people ↔ people relations
  - Face-to-face meetings, phone cons, 1-1 interactions, mail lists, etc.
- How to test and validate Grid tools and applications
  - Vital role of testbeds
- How to scale algorithms, software, process
  - Some successes, but interesting failure modes still occur
- How to apply distributed cyberinfrastructure
  - Successful production runs for several applications
33 - http://www.opensciencegrid.org
34 - Open Science Grid (July 20, 2005)
- Production Grid: 50 sites, 15,000 CPUs
- Sites in US, Korea, Brazil, Taiwan
- Integration Grid: 10-12 sites
35 - OSG Participating Disciplines
36 - OSG Operations Snapshots
[Monitoring snapshots; sites include Taiwan, S. Korea, São Paulo]
37 - OSG Grid Partners
38 - OSG Technical Groups & Activities
- Technical Groups address and coordinate technical areas
  - Propose and carry out activities related to their given areas
  - Liaise & collaborate with other peer projects (U.S. & international)
  - Participate in relevant standards organizations
  - Chairs participate in Blueprint, Integration and Deployment activities
- Activities are well-defined, scoped tasks contributing to OSG
  - Each Activity has deliverables and a plan
  - ...is self-organized and operated
  - ...is overseen & sponsored by one or more Technical Groups
TGs and Activities are where the real work gets done
39 - OSG Technical Groups
40 - OSG Activities
41 - Connections to European Projects: LCG and EGEE
42 - OSG Integration Testbed
[Map of testbed sites, including Taiwan, Brazil, Korea]
43 - Networks
44 - Evolving Science Requirements for Networks (DOE High Performance Network Workshop)
See http://www.doecollaboratory.org/meetings/hpnpw/
45 - UltraLight: Advanced Networking in Applications
Funded by ITR (2004)
- 10 Gb/s network
- Caltech, UF, FIU, UM, MIT
- SLAC, FNAL
- Intl partners
- Level(3), Cisco, NLR
46 - UltraLight: New Information System
- A new class of integrated information systems
  - Includes networking as a managed resource for the first time
  - Uses hybrid packet-switched and circuit-switched optical network infrastructure
  - Monitor, manage & optimize network and Grid systems in real time
- Flagship applications: HEP, eVLBI, burst imaging
  - Terabyte-scale data transactions in minutes (see the sketch below)
  - Extend real-time eVLBI to the 10-100 Gb/s range
- Powerful testbed
  - Significant storage, optical networks for testing new Grid services
- Strong vendor partnerships
  - Cisco, Calient, NLR, CENIC, Internet2/Abilene
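The "Terabyte-scale transactions in minutes" figure follows directly from the 10 Gb/s number; here is a minimal arithmetic sketch, assuming an ideal, fully utilized link with no protocol or disk overhead.

```python
def minutes_to_transfer(terabytes: float, link_gbps: float) -> float:
    """Ideal transfer time in minutes on a fully utilized link."""
    bits = terabytes * 8e12
    return bits / (link_gbps * 1e9) / 60

# 1 TB over 10 Gb/s: roughly 13 minutes.
print(f"1 TB at 10 Gb/s:  {minutes_to_transfer(1, 10):.1f} min")
# The same transfer at 100 Gb/s (the eVLBI target range): just over a minute.
print(f"1 TB at 100 Gb/s: {minutes_to_transfer(1, 100):.1f} min")
```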
47 - Education and Outreach
48 - iVDGL, GriPhyN Education/Outreach
- Basics
  - $200K/yr
  - Led by UT Brownsville
  - Workshops, portals, tutorials
  - New partnerships with QuarkNet, CHEPREO, LIGO E/O, ...
49 - US Grid Summer Schools
- June 2004: first US Grid Tutorial (South Padre Island, TX)
  - 36 students, diverse origins and types
- July 2005: second Grid Tutorial (South Padre Island, TX)
  - 42 students, simpler physical setup (laptops)
- Reaching a wider audience
  - Lectures, exercises, video, on web
  - Students, postdocs, scientists
  - Coordination of training activities
  - Grid Cookbook
  - More tutorials, 3-4/year
  - CHEPREO tutorial in 2006
50 - QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp
51 - Student Muon Lifetime Analysis in GriPhyN/QuarkNet
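As a hedged illustration of the kind of analysis this slide refers to (the students' actual data and tools may differ), the sketch below generates synthetic muon decay times and extracts the lifetime; for exponentially distributed decay times the maximum-likelihood estimate of the lifetime is simply the sample mean. The true muon lifetime is about 2.2 µs.

```python
import numpy as np

# Synthetic stand-in for detector data: decay times (microseconds) drawn
# from an exponential distribution with the known muon lifetime ~2.2 us.
rng = np.random.default_rng(seed=42)
true_lifetime_us = 2.2
decay_times = rng.exponential(scale=true_lifetime_us, size=5000)

# Maximum-likelihood estimate of an exponential lifetime is the sample mean;
# its statistical uncertainty is mean / sqrt(N).
tau_hat = decay_times.mean()
tau_err = tau_hat / np.sqrt(decay_times.size)

print(f"Fitted lifetime: {tau_hat:.3f} +/- {tau_err:.3f} us "
      f"(true value {true_lifetime_us} us)")
```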
52 - CHEPREO: Center for High Energy Physics Research and Educational Outreach (Florida International University)
- Physics Learning Center
- CMS Research
- iVDGL Grid Activities
- AMPATH network (S. America)
- Funded September 2003
  - $4M initially (3 years)
  - MPS, CISE, EHR, INT
53 - Grids and the Digital Divide
- Background
  - World Summit on the Information Society
  - HEP Standing Committee on Inter-regional Connectivity (SCIC)
- Themes
  - Global collaborations, Grids and addressing the Digital Divide
  - Focus on poorly connected regions
  - Brazil (2004), Korea (2005)
54 - Grid Timeline
[Timeline chart - milestones include: PPDG ($9.5M), GriPhyN ($12M), iVDGL ($14M), first US-LHC Grid testbeds, VDT 1.0, Grid communications, CHEPREO ($4M), Grid3 operations, Grid Summer Schools, UltraLight ($2M), Digital Divide Workshops, DISUN ($10M), LIGO Grid, OSG operations, start of LHC]
55 - Fulfilling the Promise of Next Generation Science
- Supporting permanent, national-scale Grid infrastructure
  - Large CPU, storage and network capability crucial for science
  - Support personnel, equipment maintenance, replacement, upgrade
  - Tier1 and Tier2 resources a vital part of infrastructure
  - Open Science Grid a unique national infrastructure for science
- Supporting the maintenance, testing and dissemination of advanced middleware
  - Long-term support of the Virtual Data Toolkit
  - Vital for reaching new disciplines & for supporting large international collaborations
- Continuing support for HEP as a frontier challenge driver
  - Huge challenges posed by LHC global interactive analysis
  - New challenges posed by remote operation of a Global Accelerator Network
56 - Fulfilling the Promise (2)
- Creating even more advanced cyberinfrastructure
  - Integrating databases in large-scale Grid environments
  - Interactive analysis with distributed teams
  - Partnerships involving CS research with application drivers
- Supporting the emerging role of advanced networks
  - Reliable, high-performance LANs and WANs necessary for advanced Grid applications
- Partnering to enable stronger, more diverse programs
  - Programs supported by multiple Directorates, a la CHEPREO
  - NSF-DOE joint initiatives
  - Strengthen the ability of universities and labs to work together
- Providing opportunities for cyberinfrastructure training, education & outreach
  - Grid tutorials, Grid Cookbook
  - Collaborative tools for student-led projects & research
57 - Summary
- Grids enable 21st century collaborative science
  - Linking research communities and resources for scientific discovery
  - Needed by global collaborations pursuing petascale science
- Grid3 was an important first step in developing US Grids
  - Value of planning, coordination, testbeds, rapid feedback
  - Value of learning how to operate a Grid as a facility
  - Value of building & sustaining community relationships
- Grids drive the need for advanced optical networks
- Grids impact education and outreach
  - Providing technologies & resources for training, education, outreach
  - Addressing the Digital Divide
- OSG: a scalable computing infrastructure for science?
  - Strategies needed to cope with increasingly large scale
58 - Grid Project References
- Open Science Grid: www.opensciencegrid.org
- Grid3: www.ivdgl.org/grid3
- Virtual Data Toolkit: www.griphyn.org/vdt
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org
- PPDG: www.ppdg.net
- CHEPREO: www.chepreo.org
- UltraLight: ultralight.cacr.caltech.edu
- Globus: www.globus.org
- Condor: www.cs.wisc.edu/condor
- LCG: www.cern.ch/lcg
- EU DataGrid: www.eu-datagrid.org
- EGEE: www.eu-egee.org
59 - Extra Slides
60 - Partnerships Drive Success
- Integrating Grids in scientific research
  - Lab-centric: activities center around a large facility
  - Team-centric: resources shared by distributed teams
  - Knowledge-centric: knowledge generated & used by a community
- Strengthening the role of universities in frontier research
  - Couples universities to frontier data intensive research
  - Brings front-line research and resources to students
  - Exploits intellectual resources at minority or remote institutions
- Driving advances in IT/science/engineering
  - Domain sciences ↔ Computer Science
  - Universities ↔ Laboratories
  - Scientists ↔ Students
  - NSF projects ↔ NSF projects
  - NSF ↔ DOE
  - Research communities ↔ IT industry
61 - University Tier2 Centers
- Tier2 facility
  - Essential university role in extended computing infrastructure
  - 20-25% of a Tier1 national laboratory, supported by NSF
  - Validated by 3 years of experience (CMS, ATLAS, LIGO)
- Functions
  - Perform physics analysis, simulations
  - Support experiment software
  - Support smaller institutions
- Official role in Grid hierarchy (U.S.)
  - Sanctioned by MOU with parent organization (ATLAS, CMS, LIGO)
  - Selection by collaboration via a careful process
  - Local P.I. with reporting responsibilities