Title: Open Science Grid
1  Open Science Grid
- Linking Universities and Laboratories in National Cyberinfrastructure
www.opensciencegrid.org
Paul Avery, University of Florida, avery_at_phys.ufl.edu
NRC-ICME Meeting, National Academies, March 13, 2007
2  Cyberinfrastructure and Grids
- Grid: geographically distributed computing resources configured for coordinated use
- Fabric: physical resources and networks providing raw capability
- Ownership: resources controlled by owners and shared with others
- Middleware: software tying it all together (tools, services, etc.)
- Enhancing collaboration via transparent resource sharing
(Figure: the US-CMS Virtual Organization)
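As a concrete (and deliberately simplified) illustration of these definitions, the toy Python sketch below models resource owners who share their fabric with selected virtual organizations. The site and VO names are hypothetical examples; this is not OSG software.

    # Illustrative only: a toy model of VO-based resource sharing.
    # Site and VO names are hypothetical, not real OSG entities.
    from dataclasses import dataclass, field

    @dataclass
    class Site:
        name: str
        cpus: int
        supported_vos: set = field(default_factory=set)  # VOs the owner shares with

    @dataclass
    class VirtualOrganization:
        name: str
        members: set = field(default_factory=set)

    def usable_sites(vo, sites):
        """Sites whose owners have agreed to share resources with this VO."""
        return [s for s in sites if vo.name in s.supported_vos]

    sites = [
        Site("uf_hpc", cpus=512, supported_vos={"uscms", "ligo"}),
        Site("fnal_gp", cpus=2048, supported_vos={"uscms"}),
    ]
    uscms = VirtualOrganization("uscms", members={"alice_analyst", "bob_prod"})

    print([s.name for s in usable_sites(uscms, sites)])  # ['uf_hpc', 'fnal_gp']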
3  Open Science Grid (July 20, 2005)
- Consortium of many organizations (multiple disciplines)
- Production grid cyberinfrastructure
- 70 sites, 24,000 CPUs: US, UK, Brazil, Taiwan
4  OSG Close-up on U.S. Sites
5  The Open Science Grid Consortium
(Diagram: Open Science Grid at the center, linked to:)
- Science projects and communities
- U.S. grid projects
- LHC experiments
- University facilities
- Regional and campus grids
- Education communities
- Multi-disciplinary facilities
- Computer science
- Laboratory centers
- Technologists (network, HPC, ...)
6  Open Science Grid Basics
- Who
  - Computer scientists, IT specialists, physicists, biologists, etc.
- What
  - Shared computing and storage resources
  - High-speed production and research networks
  - Meeting place for research groups, software experts, IT providers
- Vision
  - Maintain and operate a premier distributed computing facility
  - Provide education and training opportunities in its use
  - Expand reach and capacity to meet needs of stakeholders
  - Dynamically integrate new resources and applications
- Members and partners
  - Members: HPC facilities, campus, laboratory and regional grids
  - Partners: interoperation with TeraGrid, EGEE, NorduGrid, etc.
7  Crucial Ingredients in Building OSG
- Science push: ATLAS, CMS, LIGO, SDSS
  - 1999: foresaw overwhelming need for distributed cyberinfrastructure
- Early funding: the Trillium consortium
  - PPDG: $12M (DOE) (1999-2006)
  - GriPhyN: $12M (NSF) (2000-2006)
  - iVDGL: $14M (NSF) (2001-2007)
  - Supplements, new funded projects
- Social networks: 150 people with many overlaps
  - Universities, labs, SDSC, foreign partners
- Coordination: pooling resources, developing broad goals
- Common middleware: Virtual Data Toolkit (VDT)
- Multiple grid deployments/testbeds using VDT
- Unified entity when collaborating internationally
- Historically, a strong driver for funding agency collaboration
8  OSG History in Context
(Timeline, 1999-2009: PPDG (DOE), GriPhyN (NSF), and iVDGL (NSF) together forming Trillium; Grid3 (DOE+NSF) leading into OSG; LIGO preparation then operation; LHC construction and preparation then LHC ops; the European Grid / Worldwide LHC Computing Grid; campus and regional grids)
9  Principal Science Drivers
- High energy and nuclear physics
  - 100s of petabytes (LHC), 2007
  - Several petabytes, 2005
- LIGO (gravity wave search)
  - 0.5 - several petabytes, 2002
- Digital astronomy
  - 10s of petabytes, 2009
  - 10s of terabytes, 2001
- Other sciences coming forward
  - Bioinformatics (10s of petabytes)
  - Nanoscience
  - Environmental
  - Chemistry
  - Applied mathematics
  - Materials science?
10  OSG Virtual Organizations
11  OSG Virtual Organizations (2)
12  Partners Federating with OSG
- Campus and regional
  - Grid Laboratory of Wisconsin (GLOW)
  - Grid Operations Center at Indiana University (GOC)
  - Grid Research and Education Group at Iowa (GROW)
  - Northwest Indiana Computational Grid (NWICG)
  - New York State Grid (NYSGrid) (in progress)
  - Texas Internet Grid for Research and Education (TIGRE)
  - nanoHUB (Purdue)
  - LONI (Louisiana)
- National
  - Data Intensive Science University Network (DISUN)
  - TeraGrid
- International
  - Worldwide LHC Computing Grid Collaboration (WLCG)
  - Enabling Grids for E-SciencE (EGEE)
  - TWGrid (from Academia Sinica Grid Computing)
  - Nordic Data Grid Facility (NorduGrid)
  - Australian Partnerships for Advanced Computing (APAC)
13  Defining the Scale of OSG: Experiments at the Large Hadron Collider
- 27 km tunnel spanning Switzerland and France
(Figure: the LHC _at_ CERN, with the ATLAS, CMS, ALICE, LHCb, and TOTEM experiments)
- Search for
  - Origin of mass
  - New fundamental forces
  - Supersymmetry
  - Other new particles
- 2007?
14  LHC Data and CPU Requirements
(Detector figures: CMS, ATLAS, LHCb)
- Storage
  - Raw recording rate: 0.2 - 1.5 GB/s
  - Large Monte Carlo data samples
  - 100 PB by 2012
  - 1000 PB later in the decade?
- Processing
  - PetaOps (> 300,000 3 GHz PCs)
- Users
  - 100s of institutes
  - 1000s of researchers
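To relate the raw recording rate to the petabyte totals above, here is a back-of-the-envelope calculation (not from the slides; it assumes roughly 1e7 seconds of effective data taking per year, a common rule of thumb):

    # Back-of-the-envelope check relating recording rate to yearly volume.
    # Assumption (not from the slides): ~1e7 seconds of data taking per year.
    SECONDS_PER_YEAR = 1.0e7          # assumed effective live time
    PB = 1.0e6                        # 1 PB = 1e6 GB (decimal units)

    for rate_gb_s in (0.2, 1.5):      # raw recording rates quoted on the slide
        pb_per_year = rate_gb_s * SECONDS_PER_YEAR / PB
        print(f"{rate_gb_s} GB/s -> ~{pb_per_year:.0f} PB/year of raw data")

    # 0.2 GB/s -> ~2 PB/year, 1.5 GB/s -> ~15 PB/year of raw data; Monte Carlo
    # and derived data multiply this further, in line with 100 PB by 2012.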
15  OSG and LHC Global Grid
- 5000 physicists, 60 countries
- 10s of petabytes/yr by 2009
- CERN / outside: 10-20%
(Tiered data-flow diagram: CMS Experiment online system -> CERN Computer Center (Tier 0) at 200 - 1500 MB/s; Tier 0 -> Tier 1 at 10-40 Gb/s; Tier 1 -> OSG Tier 2 at >10 Gb/s; Tier 2 -> Tier 3 at 2.5-10 Gb/s; Tier 4: physics caches and PCs)
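As a sanity check on the link speeds in the tier diagram, the illustrative arithmetic below (assuming ideal, fully utilized links with no protocol overhead) shows why multi-10 Gb/s networks are needed at petabyte scale:

    # Illustrative only: time to move 1 PB over the tiered links listed above,
    # assuming ideal, fully utilized links (no protocol overhead).
    PB_BITS = 1.0e15 * 8              # 1 PB in bits (decimal units)

    for gbps in (2.5, 10, 40):        # link speeds quoted in the tier diagram
        seconds = PB_BITS / (gbps * 1.0e9)
        print(f"{gbps:>4} Gb/s -> {seconds / 86400:.1f} days per PB")

    # ~37 days at 2.5 Gb/s, ~9.3 days at 10 Gb/s, ~2.3 days at 40 Gb/s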
16  LHC Global Collaborations
(Collaboration maps: CMS, ATLAS)
- 2000 - 3000 physicists per experiment
- USA is 20-31% of total
17  LIGO: Search for Gravity Waves
- LIGO Grid
  - 6 US sites
  - 3 EU sites (UK, Germany)
(Map: LIGO Grid sites, including Birmingham; LHO and LLO are the LIGO observatory sites; LSC = LIGO Scientific Collaboration)
18  Sloan Digital Sky Survey: Mapping the Sky
19  Bioinformatics: GADU / GNARE
- GADU performs
  - Acquisition: acquires genome data from a variety of publicly available databases and stores it temporarily on the file system
  - Analysis: runs publicly available tools and in-house tools on the grid, using the acquired data and data from the integrated database
  - Storage: stores the parsed data acquired from public databases and the parsed results of the tools and workflows used during analysis
- Public databases: genomic databases available on the web, e.g. NCBI, PIR, KEGG, EMP, InterPro, etc.
- GADU using the grid: applications are executed on the grid as workflows and results are stored in the integrated database
  - Compute resources: TeraGrid, OSG, DOE SG (bidirectional data flow)
- Data sources: SEED (data acquisition), Shewanella Consortium (genome analysis), others
- Integrated database (services to other groups) includes
  - Parsed sequence data and annotation data from public web sources
  - Results of the different tools used for analysis: BLAST, Blocks, TMHMM, ...
- Applications (web interfaces) based on the integrated database
  - Chisel: protein function analysis tool
  - TARGET: targets for structural analysis of proteins
  - PUMA2: evolutionary analysis of metabolism
  - PATHOS: pathogenic DB for bio-defense research
  - Phyloblocks: evolutionary analysis of protein families
- GNARE: Genome Analysis Research Environment
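The acquire -> analyze -> store pattern described above can be pictured as a small directed acyclic graph of grid jobs. The sketch below expresses it as a Condor DAGMan DAG purely for illustration; the submit-file names are hypothetical placeholders, and this is not GADU's actual implementation.

    # Minimal sketch (not GADU's code): acquire -> analyze -> store written
    # out as a Condor DAGMan DAG. Submit-file names are hypothetical.
    dag_lines = [
        "JOB acquire  acquire.sub",    # fetch sequences from a public database
        "JOB analyze  analyze.sub",    # e.g. run BLAST over the acquired data
        "JOB store    store.sub",      # parse results into the integrated DB
        "PARENT acquire CHILD analyze",
        "PARENT analyze CHILD store",
    ]

    with open("gadu_like_workflow.dag", "w") as dag:
        dag.write("\n".join(dag_lines) + "\n")

    # The DAG would then be run with:  condor_submit_dag gadu_like_workflow.dag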
20  Nanoscience Simulations
nanoHUB.org
21  OSG Engagement Effort
- Purpose: bring non-physics applications to OSG
- Led by RENCI (UNC, NC State, Duke)
- Specific targeted opportunities
  - Develop relationships
  - Direct assistance with the technical details of connecting to OSG
- Feedback and new requirements for OSG infrastructure (to facilitate inclusion of new communities)
  - More and better documentation
  - More automation
22  OSG and the Virtual Data Toolkit
- VDT: a collection of software
  - Grid software (Condor, Globus, VOMS, dCache, GUMS, Gratia, ...)
  - Virtual Data System
  - Utilities
- VDT: the basis for the OSG software stack
  - Goal is easy installation with automatic configuration
  - Now widely used in other projects
  - Has a growing support infrastructure
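For orientation, the sketch below shows (in Python, purely as an illustration) how one might generate a Condor-G submit description that sends a job through a Globus gatekeeper at an OSG site, exercising two of the VDT components named above. The gatekeeper hostname, executable, and file names are hypothetical, and details varied across VDT versions.

    # Illustrative only: writing a Condor-G submit description that routes a
    # job through a Globus (GT2) gatekeeper. Hostname and script are hypothetical.
    submit = """\
    universe      = grid
    grid_resource = gt2 gatekeeper.example.edu/jobmanager-condor
    executable    = analyze.sh
    arguments     = input.dat
    output        = analyze.out
    error         = analyze.err
    log           = analyze.log
    queue
    """

    with open("osg_job.sub", "w") as f:
        f.write(submit)

    # A user with a valid grid proxy would then run:  condor_submit osg_job.sub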
23  Why Have the VDT?
- Everyone could download the software from the providers
- But the VDT
  - Figures out dependencies between software packages
  - Works with providers on bug fixes
  - Automatically configures and packages the software
  - Tests everything on 15 platforms (and growing):
    - Debian 3.1
    - Fedora Core 3
    - Fedora Core 4 (x86, x86-64)
    - RedHat Enterprise Linux 3 AS (x86, x86-64, ia64)
    - RedHat Enterprise Linux 4 AS (x86, x86-64)
    - ROCKS Linux 3.3
    - Scientific Linux Fermi 3
    - Scientific Linux Fermi 4 (x86, x86-64, ia64)
    - SUSE Linux 9 (IA-64)
24  VDT Growth Over 5 Years (1.6.1 now)
(Chart: number of VDT components vs. time)
vdt.cs.wisc.edu
25  OSG Jobs Snapshot: 6 Months
(Chart, Sep-Mar: 5000 simultaneous jobs from multiple VOs)
26  OSG Jobs Per Site: 6 Months
(Chart, Sep-Mar: 5000 simultaneous jobs at multiple sites)
27  Completed Jobs/Week on OSG
(Chart, Sep-Mar: up to 400K completed jobs/week; CMS data challenge annotated)
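A rough cross-check of these charts (illustrative arithmetic, not from the slides, and mixing figures quoted on different dates):

    # Illustrative cross-check of the job numbers above (assumptions inline).
    simultaneous_jobs = 5000          # from the snapshot charts
    total_cpus = 24000                # from slide 3 (July 2005 figure, so approximate)
    jobs_per_week = 400_000           # peak completed jobs/week from slide 27

    occupancy = simultaneous_jobs / total_cpus
    jobs_per_minute = jobs_per_week / (7 * 24 * 60)

    print(f"occupancy in one snapshot: ~{occupancy:.0%}")              # ~21%
    print(f"completion rate at peak: ~{jobs_per_minute:.0f} jobs/min")  # ~40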
28  Communications: International Science Grid This Week
- SGTW -> iSGTW
- 2 years
- Diverse audience
- >1000 subscribers
www.isgtw.org
29  OSG News: Monthly Newsletter
16 issues by Feb. 2007
www.opensciencegrid.org/osgnews
30  Grid Summer Schools
- Summer 2004, 2005, 2006
  - 1 week _at_ South Padre Island, Texas
  - Lectures plus hands-on exercises for 40 students
  - Students of differing backgrounds (physics and CS), minorities
- Reaching a wider audience
  - Lectures, exercises, and video on the web
  - More tutorials, 3-4/year
  - Students, postdocs, scientists
  - Agency-specific tutorials
31  Project Challenges
- Technical constraints
  - Commercial tools fall far short, require (too much) invention
  - Integration of advanced CI, e.g. networks
- Financial constraints (see next slide)
  - Fragmented short-term funding injections (recent: $30M/5 years)
  - Fragmentation of individual efforts
- Distributed coordination and management
  - Tighter organization within member projects compared to OSG
  - Coordination of schedules and milestones
  - Many phone/video meetings, travel
  - Knowledge dispersed, few people have a broad overview
32  Funding Milestones: 1999 - 2007
(Timeline of milestones including: PPDG ($9.5M), GriPhyN ($12M), iVDGL ($14M), CHEPREO ($4M), UltraLight ($2M), DISUN ($10M), OSG ($30M, NSF and DOE); first US-LHC grid testbeds; VDT 1.0 and VDT 1.3; Grid3 start; OSG start; LHC start; LIGO Grid; Grid Summer Schools 2004, 2005, 2006; Digital Divide Workshops 2004, 2005, 2006; grid communications)
- Categories shown: grid and networking projects; large experiments; education, outreach and training
33  Challenges from Diversity and Growth
- Management of an increasingly diverse enterprise
  - Sci/eng projects, organizations, disciplines as distinct cultures
  - Accommodating new member communities (expectations?)
- Interoperation with other grids
  - TeraGrid
  - International partners (EGEE, NorduGrid, etc.)
  - Multiple campus and regional grids
- Education, outreach and training
  - Training for researchers and students, but also project PIs and program officers
- Operating a rapidly growing cyberinfrastructure
  - 25K -> 100K CPUs, 4 -> 10 PB disk
  - Management of and access to rapidly increasing data stores (see following slide)
  - Monitoring, accounting, achieving high utilization
  - Scalability of the support model (see following slide)
34  Rapid Cyberinfrastructure Growth: LHC
- Meeting LHC service challenge milestones
- Participating in worldwide simulation productions
(Chart: projected CPU growth across CERN, Tier-1, and Tier-2 centers, reaching the equivalent of 140,000 PCs in 2008)
35  OSG Operations
- Distributed model
  - Scalability!
  - VOs, sites, providers
  - Rigorous problem tracking and routing
  - Security
  - Provisioning
  - Monitoring
  - Reporting
Partners with EGEE operations
36  Five Year Project Timeline and Milestones
(Timeline figure; milestones by track:)
- LHC: contribute to the Worldwide LHC Computing Grid; LHC event data distribution and analysis; LHC simulations; support 1000 users and a 20 PB data archive
- LIGO: contribute to LIGO workflow and data analysis; Advanced LIGO; LIGO Data Grid dependent on OSG; LIGO data run SC5
- STAR, CDF, D0, astrophysics: CDF simulation, then CDF simulation and analysis; D0 simulations; D0 reprocessing; STAR data distribution and jobs; 10K jobs per day
- Additional science communities: added one community at a time (eight increments shown)
- Facility security: risk assessment, audits, incident response, management, operations, technical controls; plan v1 and a first audit, followed by yearly risk assessments and audits
- Facility operations and metrics: increase robustness and scale; operational metrics defined and validated each year
- Interoperate and federate with campus and regional grids
- VDT and OSG software releases: major release every 6 months, minor updates as needed; VDT 1.4.0, 1.4.1, 1.4.2 plus incremental updates; OSG 0.6.0, 0.8.0, 1.0, 2.0, 3.0; dCache with role-based authorization; accounting; auditing; federated monitoring and information services; VDS with SRM; common s/w distribution with TeraGrid; transparent data and job movement with TeraGrid; EGEE using VDT 1.4.x; transparent data management with EGEE
- Extended capabilities (increase scalability and performance for jobs and data to meet stakeholder needs): integrated network management; SRM/dCache extensions; just-in-time workload management; VO services infrastructure; data analysis (batch and interactive) workflow; improved workflow and resource selection; work with SciDAC-2 CEDS and security with open science
37  Extra Slides
38  Motivation: Data Intensive Science
- 21st century scientific discovery
  - Computationally and data intensive
  - Theory + experiment + simulation
  - Internationally distributed resources and collaborations
- Dominant factor: data growth (1 petabyte = 1000 terabytes)
  - 2000: 0.5 petabyte
  - 2007: 10 petabytes
  - 2013: 100 petabytes
  - 2020: 1000 petabytes
- Powerful cyberinfrastructure needed
  - Computation: massive, distributed CPU
  - Data storage and access: large-scale, distributed storage
  - Data movement: international optical networks
  - Data sharing: global collaborations (100s - 1000s)
  - Software: managing all of the above
How to collect, manage, access and interpret this quantity of data?
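The projected growth corresponds to a data doubling time of under two years; the short calculation below (using only the figures on this slide) makes that explicit:

    import math

    # Illustrative arithmetic using only the growth figures on this slide.
    data_pb = {2000: 0.5, 2007: 10, 2013: 100, 2020: 1000}

    growth_factor = data_pb[2020] / data_pb[2000]   # 2000x over 20 years
    doublings = math.log2(growth_factor)            # ~11 doublings
    doubling_time_years = (2020 - 2000) / doublings

    print(f"growth factor 2000->2020: {growth_factor:.0f}x")
    print(f"doubling time: ~{doubling_time_years:.1f} years")   # ~1.8 years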
39  VDT Release Process ("Subway Map")
(Process flow over time: gather requirements -> build software -> test -> validation test bed -> VDT release -> ITB release candidate -> integration test bed -> OSG release)
From Alain Roy
40  VDT Challenges
- How should we smoothly update a production service?
  - In-place vs. on-the-side
  - Preserve the old configuration while making big changes
  - Still takes hours to fully install and set up from scratch
- How do we support more platforms?
  - A struggle to keep up with the onslaught of Linux distributions (Fedora Core 3/4/6, RHEL 3/4, BCCD, ...)
  - AIX? Mac OS X? Solaris?
- How can we accommodate native packaging formats?
  - RPM
  - Deb
41  OSG Integration Testbed (ITB)
Test new VDT versions before release
42  Collaboration with Internet2 (www.internet2.edu)
43  Collaboration with National LambdaRail (www.nlr.net)
- Optical, multi-wavelength, community-owned or leased dark fiber (10 GbE) networks for R&E
- Spawning state-wide and regional networks (FLR, SURA, LONI, ...)
- Bulletin: NLR-Internet2 merger announcement