1
  • Grids and 21st Century Data Intensive Science

Paul Avery, University of Florida (avery@phys.ufl.edu)
Physics Colloquium, Johns Hopkins University, October 6, 2005
2
Outline of Talk
  • Cyberinfrastructure and Grids
  • Data intensive disciplines and Data Grids
  • The Trillium Grid collaboration
  • GriPhyN, iVDGL, PPDG
  • The LHC and its computing challenges
  • Grid3 and the Open Science Grid
  • A bit on networks
  • Education and Outreach
  • Challenges for the future
  • Summary

Presented from a physicist's perspective!
3
Cyberinfrastructure (cont)
  • Virtual teams, communities, enterprises and
    organizations that use specific software
    programs, services, instruments, data,
    information, knowledge.
  • Cyberinfrastructure: layer of enabling hardware, algorithms, software, communications, institutions, and personnel. A platform that empowers researchers to innovate and eventually revolutionize what they do, how they do it, and who participates.
  • Base technologies: Computation, storage, and communication components that continue to advance in raw capacity at exponential rates.

Paraphrased from NSF Blue Ribbon Panel report,
2003
Challenge: Creating and operating advanced cyberinfrastructure and integrating it into science and engineering applications.
4
Cyberinfrastructure and Grids
  • Grid: Geographically distributed computing resources configured for coordinated use
  • Fabric: Physical resources & networks provide raw capability
  • Ownership: Resources controlled by owners and shared w/ others
  • Middleware: Software ties it all together: tools, services, etc.
  • Enhancing collaboration via transparent resource sharing

US-CMS Virtual Organization
5
Data Grids & Collaborative Research
  • Team-based 21st century scientific discovery
  • Strongly dependent on advanced information
    technology
  • People and resources distributed internationally
  • Dominant factor: data growth (1 Petabyte = 1000 TB)
  • 2000: 0.5 Petabyte
  • 2005: 10 Petabytes
  • 2010: 100 Petabytes
  • 2015-17: 1000 Petabytes?
  • Drives need for powerful, linked resources: Data Grids
  • Computation: Massive, distributed CPU
  • Data storage and access: Distributed hi-speed disk and tape
  • Data movement: International optical networks
  • Collaborative research and Data Grids
  • Data discovery, resource sharing, distributed
    analysis, etc.

How to collect, manage, access and interpret
this quantity of data?
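As a rough illustration of these numbers (a minimal back-of-the-envelope sketch in Python; the growth rate is derived only from the figures on this slide, and the link speeds below are illustrative assumptions):

  import math

  # Data-volume figures quoted above, in petabytes (the 2015-17 projection taken as ~2016).
  volumes_pb = {2000: 0.5, 2005: 10, 2010: 100, 2016: 1000}

  years = sorted(volumes_pb)
  growth = (volumes_pb[years[-1]] / volumes_pb[years[0]]) ** (1.0 / (years[-1] - years[0]))
  print("implied annual growth factor: ~%.1fx" % growth)                  # ~1.6x per year
  print("doubling time: ~%.1f years" % (math.log(2) / math.log(growth)))  # ~1.5 years

  # Why data movement needs international optical networks:
  # time to ship one petabyte over a single dedicated link, ignoring protocol overhead.
  petabyte_bits = 1e15 * 8
  for gbps in (1, 10, 40):                       # illustrative link speeds
      days = petabyte_bits / (gbps * 1e9) / 86400.0
      print("1 PB at %2d Gb/s: ~%.0f days" % (gbps, days))

Even at 10 Gb/s, a single petabyte takes on the order of a week to move, which is why both the growth rate and the movement of such datasets drive the need for Data Grids.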
6
Examples of Data Intensive Disciplines
  • High energy & nuclear physics
  • Belle, BaBar, Tevatron, RHIC, JLAB
  • Large Hadron Collider (LHC)
  • Astronomy
  • Digital sky surveys (SDSS), Virtual Observatories
  • VLBI arrays: multiple Gb/s data streams
  • Gravity wave searches
  • LIGO, GEO, VIRGO, TAMA, ACIGA, ...
  • Earth and climate systems
  • Earth Observation, climate modeling, oceanography, ...
  • Biology, medicine, imaging
  • Genome databases
  • Proteomics (protein structure & interactions, drug delivery, ...)
  • High-resolution brain scans (1-10 µm, time dependent)

7
Bottom-up Collaboration: Trillium
  • Trillium = PPDG + GriPhyN + iVDGL
  • PPDG: $12M (DOE) (1999-2006)
  • GriPhyN: $12M (NSF) (2000-2005)
  • iVDGL: $14M (NSF) (2001-2006)
  • 150 people with large overlaps between projects
  • Universities, labs, foreign partners
  • Strong driver for funding agency collaborations
  • Inter-agency: NSF + DOE
  • Intra-agency: Directorate-Directorate, Division-Division
  • Coordinated internally to meet broad goals
  • CS research, developing/supporting Virtual Data Toolkit (VDT)
  • Grid deployment, using VDT-based middleware
  • Unified entity when collaborating internationally

8
Our Vision & Goals
  • Develop the technologies & tools needed to exploit a Grid-based cyberinfrastructure
  • Apply and evaluate those technologies & tools in challenging scientific problems
  • Develop the technologies & procedures to support a permanent Grid-based cyberinfrastructure
  • Create and operate a persistent Grid-based cyberinfrastructure in support of discipline-specific research goals

End-to-end
GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium
9
Our Science Drivers
  • Experiments at the Large Hadron Collider
  • New fundamental particles and forces
  • 100s of Petabytes, 2007 - ?
  • High Energy & Nuclear Physics expts
  • Top quark, nuclear matter at extreme density
  • 1 Petabyte (1000 TB), 1997 - present
  • LIGO (gravity wave search)
  • Search for gravitational waves
  • 100s of Terabytes, 2002 - present
  • Sloan Digital Sky Survey
  • Systematic survey of astronomical objects
  • 10s of Terabytes, 2001 - present

10
Common Middleware: Virtual Data Toolkit
[Diagram: VDT build and test flow - sources (CVS) and NMI components are built into binaries and tested on a Build & Test Condor pool covering 22 operating systems, with patching, then packaged into Pacman caches, RPMs, and GPT source bundles.]
Many Contributors
A unique laboratory for testing, supporting,
deploying, packaging, upgrading,
troubleshooting complex sets of software!
11
VDT Growth Over 3 Years
www.griphyn.org/vdt/
[Chart: number of VDT components over three years, with milestones: VDT 1.0 (Globus 2.0b, Condor 6.3.1), VDT 1.1.7 (switch to Globus 2.2), VDT 1.1.8 (first real use by LCG), VDT 1.1.11 (Grid3).]
12
Components of VDT 1.3.5
  • Globus 3.2.1
  • Condor 6.7.6
  • RLS 3.0
  • ClassAds 0.9.7
  • Replica 2.2.4
  • DOE/EDG CA certs
  • ftsh 2.0.5
  • EDG mkgridmap
  • EDG CRL Update
  • GLUE Schema 1.0
  • VDS 1.3.5b
  • Java
  • Netlogger 3.2.4
  • Gatekeeper-Authz
  • MyProxy 1.11
  • KX509
  • System Profiler
  • GSI OpenSSH 3.4
  • Monalisa 1.2.32
  • PyGlobus 1.0.6
  • MySQL
  • UberFTP 1.11
  • DRM 1.2.6a
  • VOMS 1.4.0
  • VOMS Admin 0.7.5
  • Tomcat
  • PRIMA 0.2
  • Certificate Scripts
  • Apache
  • jClarens 0.5.3
  • New GridFTP Server
  • GUMS 1.0.1

13
Collaborative Relationships: A VDT Perspective
[Diagram: Computer Science Research feeds techniques & software into the Virtual Data Toolkit, which is deployed to the larger science community and transferred to U.S. and international Grids and outreach. Partner science, networking, and outreach projects (Globus, Condor, NMI, iVDGL, PPDG, DISUN, EGEE, LHC experiments, QuarkNet, CHEPREO, Digital Divide) supply requirements, prototyping & experiments, and production deployment. Other linkages: work force, CS researchers, industry.]
14
Goal: Peta-scale Data Grids for Global Science
[Diagram: production teams, workgroups, and single researchers use interactive user tools layered on request execution & management tools, request planning & scheduling tools, and virtual data tools; these rest on resource management, security and policy, and other Grid services, over distributed resources (code, storage, CPUs, networks) and the raw data source, at PetaOps / Petabytes performance scales.]
15
Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
16
The LIGO Scientific Collaboration (LSC) and the LIGO Grid
  • LIGO Grid: 6 US sites + 3 EU sites (Cardiff/UK, AEI/Germany)

[Map: LIGO Grid sites, including Birmingham. LHO, LLO = LIGO observatory sites; LSC = LIGO Scientific Collaboration.]
17
Large Hadron Collider & its Frontier Computing Challenges
18
Large Hadron Collider (LHC) @ CERN
  • 27 km tunnel in Switzerland & France

TOTEM
CMS
ALICE
LHCb
  • Search for
  • Origin of Mass
  • New fundamental forces
  • Supersymmetry
  • Other new particles
  • 2007 - ?

ATLAS
19
CMS: Compact Muon Solenoid
Inconsequential humans
20
LHC Data Rates: Detector to Storage
  • 40 MHz collision rate → TBytes/sec (physics filtering)
  • Level 1 Trigger (special hardware) → 75 KHz, 75 GB/sec
  • Level 2 Trigger (commodity CPUs) → 5 KHz, 5 GB/sec
  • Level 3 Trigger (commodity CPUs) → 150 Hz, 0.25-1.5 GB/sec raw data to storage (+ simulated data)
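A quick consistency check of the reduction in this chain (a sketch using only the rates shown above; the ~1e7 live seconds per year is a commonly used assumption, not a figure from the slide):

  # Trigger chain rates quoted above, in events per second.
  rates_hz = {"collisions": 40e6, "Level 1": 75e3, "Level 2": 5e3, "Level 3 / storage": 150}

  stages = list(rates_hz)
  for prev, cur in zip(stages, stages[1:]):
      print("%s -> %s: rejection factor ~%.0f" % (prev, cur, rates_hz[prev] / rates_hz[cur]))
  print("overall: ~1 event in %.1e reaches storage" % (rates_hz["collisions"] / rates_hz["Level 3 / storage"]))

  # Yearly raw-data volume at the quoted 0.25-1.5 GB/s, assuming ~1e7 live seconds/year.
  for gb_per_s in (0.25, 1.5):
      print("%.2f GB/s -> ~%.1f PB/year" % (gb_per_s, gb_per_s * 1e7 / 1e6))

With those assumptions the quoted output rate corresponds to a few to roughly fifteen petabytes of raw data per year, consistent with the storage figures elsewhere in this talk.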
21
Complexity: Higgs Decay to 4 Muons
(+ 30 minimum bias events)
All charged tracks with pT > 2 GeV
Reconstructed tracks with pT > 25 GeV
10^9 collisions/sec, selectivity: 1 in 10^13
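For scale, the selectivity quoted above translates into the following event counts (simple arithmetic; the ~1e7 seconds of running per year is again an assumption):

  collision_rate = 1e9     # collisions per second (from the slide)
  selectivity = 1e-13      # 1 interesting event in 10^13
  live_seconds = 1e7       # assumed live time per year

  per_second = collision_rate * selectivity
  print("selected events: %.0e/s  (~%.0f per day, ~%.0f per year)"
        % (per_second, per_second * 86400, per_second * live_seconds))

That is roughly one signal event every few hours buried in a billion collisions per second, which is what makes the trigger and data-handling problem so demanding.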
22
LHC: Petascale Global Science
  • Complexity: Millions of individual detector channels
  • Scale: PetaOps (CPU), 100s of Petabytes (Data)
  • Distribution: Global distribution of people & resources

BaBar/D0 Example (2004): 700 Physicists, 100 Institutes, 35 Countries
CMS Example (2007): 5000 Physicists, 250 Institutes, 60 Countries
23
LHC: Beyond Moore's Law
[Chart: projected LHC computing requirements compared with a Moore's Law (2000) extrapolation.]
24
LHC Global Data Grid (2007)
  • 5000 physicists, 60 countries
  • 10s of Petabytes/yr by 2008
  • 1000 Petabytes in < 10 yrs?

CMS Experiment data flow:
  • Online System → Tier 0 (CERN Computer Center): 150-1500 MB/s
  • Tier 0 → Tier 1: 10-40 Gb/s
  • Tier 1 → Tier 2: >10 Gb/s
  • Tier 2 → Tier 3: 2.5-10 Gb/s
  • Tier 3 → Tier 4 (physics caches, PCs)
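To make these link speeds concrete, the sketch below estimates how long the lower end of each quoted bandwidth would take to move a nominal 10 PB (one year's data at "10s of Petabytes/yr"), ignoring protocol overhead and sharing; the dataset size and the specific link values are illustrative assumptions:

  # Lower-bound link speeds from the tier diagram above (Gb/s), and a nominal 10 PB dataset.
  links_gbps = {"Tier 0 -> Tier 1": 10, "Tier 1 -> Tier 2": 10, "Tier 2 -> Tier 3": 2.5}
  dataset_bits = 10e15 * 8

  for link, gbps in links_gbps.items():
      days = dataset_bits / (gbps * 1e9) / 86400.0
      print("%s at %4.1f Gb/s: ~%.0f days for 10 PB" % (link, gbps, days))

At these rates a full copy of the dataset cannot simply flow down the hierarchy, which is consistent with the lower tiers holding caches and subsets rather than complete replicas.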
25
Grids and Globally Distributed Teams
  • Non-hierarchical: Chaotic analyses & productions
  • Superimpose significant random data flows

26
Grid3 and Open Science Grid
27
  • Grid3: A National Grid Infrastructure
  • October 2003 - July 2005
  • 32 sites, 4,000 CPUs: Universities + 4 national labs
  • Sites in US, Korea, Brazil, Taiwan
  • Applications in HEP, LIGO, SDSS, Genomics, fMRI, CS

www.ivdgl.org/grid3
28
Grid3 Applications
www.ivdgl.org/grid3/applications
29
Grid3 Shared Use Over 6 months
30
Grid3 Production Over 13 Months
31
U.S. CMS 2003 Production
  • 10M p-p collisions: largest ever
  • 2x simulation sample
  • ½ manpower
  • Multi-VO sharing

32
Grid3 Lessons Learned
  • How to operate a Grid as a facility
  • Tools, services, error recovery, procedures,
    docs, organization
  • Delegation of responsibilities (Project, VO, service, site, ...)
  • Crucial role of Grid Operations Center (GOC)
  • How to support people ↔ people relations
  • Face-face meetings, phone cons, 1-1 interactions, mail lists, etc.
  • How to test and validate Grid tools and
    applications
  • Vital role of testbeds
  • How to scale algorithms, software, process
  • Some successes, but interesting failure modes
    still occur
  • How to apply distributed cyberinfrastructure
  • Successful production runs for several
    applications

33
http://www.opensciencegrid.org
34
Open Science Grid: July 20, 2005
  • Production Grid: 50 sites, 15,000 CPUs
  • Sites in US, Korea, Brazil, Taiwan
  • Integration Grid: 10-12 sites

35
OSG Participating Disciplines
36
OSG Operations Snapshots
37
OSG Grid Partners
38
OSG Technical Groups & Activities
  • Technical Groups address and coordinate technical
    areas
  • Propose and carry out activities related to their
    given areas
  • Liaise & collaborate with other peer projects (U.S. & international)
  • Participate in relevant standards organizations.
  • Chairs participate in Blueprint, Integration and
    Deployment activities
  • Activities are well-defined, scoped tasks
    contributing to OSG
  • Each Activity has deliverables and a plan
  • is self-organized and operated
  • is overseen & sponsored by one or more Technical Groups

TGs and Activities are where the real work gets
done
39
OSG Technical Groups
40
OSG Activities
41
Connections to European Projects: LCG and EGEE
42
OSG Integration Testbed
[Map: OSG Integration Testbed sites, including Taiwan, Brazil, and Korea.]
43
Networks
44
Evolving Science Requirements for Networks (DOE
High Perf. Network Workshop)
See http://www.doecollaboratory.org/meetings/hpnpw/
45
UltraLight: Advanced Networking in Applications
Funded by ITR 2004
  • 10 Gb/s network
  • Caltech, UF, FIU, UM, MIT
  • SLAC, FNAL
  • Intl partners
  • Level(3), Cisco, NLR

46
UltraLight New Information System
  • A new class of integrated information systems
  • Includes networking as a managed resource for the
    first time
  • Uses hybrid packet-switched and circuit-switched optical network infrastructure
  • Monitor, manage & optimize network and Grid systems in real time
  • Flagship applications: HEP, eVLBI, burst imaging
  • Terabyte-scale data transactions in minutes (see the rough estimate after this list)
  • Extend Real-Time eVLBI to the 10 - 100 Gb/s range
  • Powerful testbed
  • Significant storage, optical networks for testing
    new Grid services
  • Strong vendor partnerships
  • Cisco, Calient, NLR, CENIC, Internet2/Abilene
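A rough sense of what "Terabyte-scale data transactions in minutes" demands of the network (a sketch; the transfer-time targets below are illustrative, not from the slide):

  # Sustained throughput needed to move 1 TB within a given number of minutes.
  terabyte_bits = 1e12 * 8
  for minutes in (10, 30):
      gbps = terabyte_bits / (minutes * 60.0) / 1e9
      print("1 TB in %2d minutes needs ~%.1f Gb/s sustained" % (minutes, gbps))

Sustained rates of several to tens of Gb/s per transaction are what push UltraLight toward the 10 - 100 Gb/s range and toward treating the network itself as a managed resource.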


47
Education and Outreach
48
iVDGL, GriPhyN Education/Outreach
  • Basics
  • $200K/yr
  • Led by UT Brownsville
  • Workshops, portals, tutorials
  • New partnerships with QuarkNet, CHEPREO, LIGO E/O, ...

49
US Grid Summer Schools
  • June 2004: First US Grid Tutorial (South Padre Island, TX)
  • 36 students, diverse origins and types
  • July 2005: Second Grid Tutorial (South Padre Island, TX)
  • 42 students, simpler physical setup (laptops)
  • Reaching a wider audience
  • Lectures, exercises, video, on web
  • Students, postdocs, scientists
  • Coordination of training activities
  • Grid Cookbook
  • More tutorials, 3-4/year
  • CHEPREO tutorial in 2006

50
QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp
51
Student Muon Lifetime Analysis in GriPhyN/QuarkNet
52
CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
  • Physics Learning Center
  • CMS Research
  • iVDGL Grid Activities
  • AMPATH network (S. America)
  • Funded September 2003
  • $4M initially (3 years)
  • MPS, CISE, EHR, INT

53
Grids and the Digital Divide
  • Background
  • World Summit on Information Society
  • HEP Standing Committee on Inter-regional
    Connectivity (SCIC)
  • Themes
  • Global collaborations, Grids and addressing the
    Digital Divide
  • Focus on poorly connected regions
  • Brazil (2004), Korea (2005)

54
Grid Timeline
First US-LHC Grid Testbeds
Grid Communications
Grid3 operations
GriPhyN, $12M
UltraLight, $2M
Start of LHC
DISUN, $10M
LIGO Grid
iVDGL, $14M
CHEPREO, $4M
OSG operations
VDT 1.0
Grid Summer Schools
PPDG, $9.5M
Digital Divide Workshops
55
Fulfilling the Promise of Next Generation Science
  • Supporting permanent, national-scale Grid
    infrastructure
  • Large CPU, storage and network capability crucial
    for science
  • Support personnel, equipment maintenance,
    replacement, upgrade
  • Tier1 and Tier2 resources a vital part of
    infrastructure
  • Open Science Grid: a unique national infrastructure for science
  • Supporting the maintenance, testing and dissemination of advanced middleware
  • Long-term support of the Virtual Data Toolkit
  • Vital for reaching new disciplines & for supporting large international collaborations
  • Continuing support for HEP as a frontier
    challenge driver
  • Huge challenges posed by LHC global interactive
    analysis
  • New challenges posed by remote operation of
    Global Accelerator Network

56
Fulfilling the Promise (2)
  • Creating even more advanced cyberinfrastructure
  • Integrating databases in large-scale Grid
    environments
  • Interactive analysis with distributed teams
  • Partnerships involving CS research with
    application drivers
  • Supporting the emerging role of advanced networks
  • Reliable, high performance LANs and WANs
    necessary for advanced Grid applications
  • Partnering to enable stronger, more diverse
    programs
  • Programs supported by multiple Directorates, a la
    CHEPREO
  • NSF-DOE joint initiatives
  • Strengthen ability of universities and labs to
    work together
  • Providing opportunities for cyberinfrastructure training, education & outreach
  • Grid tutorials, Grid Cookbook
  • Collaborative tools for student-led projects & research

57
Summary
  • Grids enable 21st century collaborative science
  • Linking research communities and resources for
    scientific discovery
  • Needed by global collaborations pursuing
    petascale science
  • Grid3 was an important first step in developing
    US Grids
  • Value of planning, coordination, testbeds, rapid
    feedback
  • Value of learning how to operate a Grid as a
    facility
  • Value of building & sustaining community relationships
  • Grids drive need for advanced optical networks
  • Grids impact education and outreach
  • Providing technologies & resources for training, education, outreach
  • Addressing the Digital Divide
  • OSG: a scalable computing infrastructure for science?
  • Strategies needed to cope with increasingly large scale

58
Grid Project References
  • Open Science Grid
  • www.opensciencegrid.org
  • Grid3
  • www.ivdgl.org/grid3
  • Virtual Data Toolkit
  • www.griphyn.org/vdt
  • GriPhyN
  • www.griphyn.org
  • iVDGL
  • www.ivdgl.org
  • PPDG
  • www.ppdg.net
  • CHEPREO
  • www.chepreo.org
  • UltraLight
  • ultralight.cacr.caltech.edu
  • Globus
  • www.globus.org
  • Condor
  • www.cs.wisc.edu/condor
  • LCG
  • www.cern.ch/lcg
  • EU DataGrid
  • www.eu-datagrid.org
  • EGEE
  • www.eu-egee.org

59
Extra Slides
60
Partnerships Drive Success
  • Integrating Grids in scientific research
  • Lab-centric: Activities center around a large facility
  • Team-centric: Resources shared by distributed teams
  • Knowledge-centric: Knowledge generated/used by a community
  • Strengthening the role of universities in
    frontier research
  • Couples universities to frontier data intensive
    research
  • Brings front-line research and resources to
    students
  • Exploits intellectual resources at minority or
    remote institutions
  • Driving advances in IT/science/engineering
  • Domain sciences ↔ Computer Science
  • Universities ↔ Laboratories
  • Scientists ↔ Students
  • NSF projects ↔ NSF projects
  • NSF ↔ DOE
  • Research communities ↔ IT industry

61
University Tier2 Centers
  • Tier2 facility
  • Essential university role in extended computing
    infrastructure
  • 20-25% of a Tier1 national laboratory, supported by NSF
  • Validated by 3 years of experience (CMS, ATLAS,
    LIGO)
  • Functions
  • Perform physics analysis, simulations
  • Support experiment software
  • Support smaller institutions
  • Official role in Grid hierarchy (U.S.)
  • Sanctioned by MOU with parent organization
    (ATLAS, CMS, LIGO)
  • Selection by collaboration via careful process
  • Local P.I. with reporting responsibilities