Title: Who Uses the Open Science Grid
1. Who Uses the Open Science Grid
- Ruth Pordes
- Fermilab Computing Division
- US CMS
2. Who?
- Scientists use the OSG: Computer Scientists, Nuclear Physicists, Biologists, Gravitational Wave Physicists, Astrophysicists, Particle Physicists
- Educators and students use the OSG
  - To participate in sharing distributed data and computing
  - To teach and learn about information technologies
- Information Technologists use the OSG
  - To provide and support their software and services
- System and Facility Administrators use the OSG
  - To enable their resources and facilities to be used more broadly
  - To work with peer experts to solve computing issues they have in common
3. the scientists
- Groups of researchers adapt their applications and use the resources that give them the most payback for their effort and time.
- First they use Sites owned by members of their research group.
- The research groups are an integral part of the OSG, driving its development and mode of operation.
4. from large research groups
ATLAS Collaboration (Building 40 at CERN)
5. to single researchers
6. educators and students
- The well-defined software stack and the recipes for building and monitoring the grid provide cookbooks for grid exercises and examples.
7. high schools collaborating and sharing
8. in e-labs
Experiential, continuous, open forums for scholarly collaboration.
9. cluster, storage and fabric providers
- Provide a common access interface to their facilities.
- Collaborate in building and running OSG as a coherent distributed facility.
10. which makes it all a bit like...
11. facility administrators
- DOE Laboratory Facilities
  - interfacing, configuring, monitoring, auditing their compute farms and storage silos
SLAC Compute Farm on OSG: batch system priorities manage use; site policies control access.
12. Fermilab STK disk-cache and mass storage silo: multiple GridFTP doors manage rate; site policies control access.
13. testing new information technology
- Readiness Plans Review:
  - Description of the Service
  - Dependencies and Interfaces to other services
  - Required Resources
  - Server Requirements
  - Packaging
  - Installation and Configuration
  - Test Harness
  - Validation
  - Contact Information
e.g. testing new releases of GridFTP across OSG
14. service / software discovery
- 66 servers registered across OSG.
15. CCR University at Buffalo Applications
- Grid-Enabling Application Templates (GATs) provide tailored portals for many different applications to run on the OSG infrastructure transparently to the user.
- GRASE VO Science and Engineering Applications (courtesy Mark L. Green)
- The web portal provides the infrastructure required for defining the job-template workflows.
- The web portal is designed around a central database that contains the "state" of all the grid users, the infrastructure and resource information, etc.
- The "state" of the ACDC-Grid can be queried by grid users.
- Molecular Structure Determination
- Quantum Chemistry
- Earthquake Engineering
- Princeton Ocean Model and Biohazards
- Geophysical Mass Flows
- Optimization Software Tool (numerical methods)
16. Individual researchers taking advantage of available compute cycles on the grid
The 2004 SDSS Southern Coadd project combined images from a 300-square-degree region along the southern celestial equator, imaged an average of about ten times, to allow scientists to detect very faint and distant astronomical objects. 15 sites were used in an opportunistic mode to coadd all available data; more than 44,000 computing jobs, requiring more than two terabytes of input data transferred to remote grid nodes, were processed.
The elongated blue object shows a strong-lensing arc system discovered in the data. The arc is a background galaxy whose image has been distorted by the gravitational strong-lensing effect of the foreground cluster of galaxies (the orange-yellow objects).
2005 SDSS: 1) quasar spectral fits: approximately 50,000 spectra, each requiring 1 hour of CPU time to fit; 2) cluster finding: evaluating selecting sites based on the location of the data. 2005: the Dark Energy Survey (DES) may be interested in simulations.
Courtesy J. Annis and SGTW
17. BioInformatics: GADU
- Scan all publicly available protein sequences and make a repository of the results, using several bioinformatics tools (BLAST, Blocks, etc.)
- Goal: 2 scans and full database updates per month. Each database update entails 20,000 jobs running at about 2 hours each on one node (average 100 jobs DC).
- Uses both TeraGrid and Grid3/OSG (>30 sites on Grid3). TeraGrid press release: http://www.teragrid.org/news/news05/gadu.html
- GADU runs jobs opportunistically, using the Grid Catalog to maintain the status of sites and select a site that is appropriate for a given job.
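A back-of-envelope check of the figures above can be sketched in Python (assuming "average 100 jobs DC" means roughly 100 jobs running concurrently; that reading is an assumption, not stated on the slide):

```python
# Back-of-envelope check of the GADU workload figures above.
# Assumption: "average 100 jobs DC" ~ 100 jobs running concurrently.
jobs_per_update = 20_000    # jobs per full database update
hours_per_job = 2           # ~2 hours each on one node
concurrent_jobs = 100       # assumed average concurrency

cpu_hours = jobs_per_update * hours_per_job       # total CPU-hours per update
wall_days = cpu_hours / concurrent_jobs / 24      # wall-clock days per update

print(f"{cpu_hours:,} CPU-hours, ~{wall_days:.0f} days per update")
# → 40,000 CPU-hours, ~17 days per update
```

At that concurrency one update takes just over two weeks, so the stated goal of two full updates per month is only barely within reach and depends on the grid delivering slightly more than 100 slots on average.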
18. GNARE Site Selection
One challenge in using the Grid reliably for high-throughput analysis is monitoring the state of all Grid sites and how well they have performed for job requests from a given submit host.
We view a site as available if our submit host can communicate with it, if it is responding to Globus job-submission commands, and if it will run our jobs promptly, with minimal queuing delays.
GriPhyN presentation: Dinanath Sulakhe, Alex Rodriguez, Mike Wilde, Nika Nefedova, Jens Voeckler, Natalia Maltsev, Ian Foster, Rick Stevens (ANL)
Implementation of the Site Selector
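The three availability criteria above can be sketched as a simple filter. This is an illustrative sketch, not the actual GNARE implementation: the `SiteStatus` fields stand in for real probes (network reachability, a Globus test submission, an observed queue delay) whose implementations are not shown, and the 600-second delay threshold is an arbitrary example value.

```python
# Sketch of the site-availability test described on the slide above.
# SiteStatus fields are stand-ins for real monitoring probes.
from dataclasses import dataclass

@dataclass
class SiteStatus:
    reachable: bool             # submit host can communicate with the site
    accepts_globus_jobs: bool   # responds to Globus job-submission commands
    queue_delay_s: float        # observed queuing delay for our test jobs

def is_available(status: SiteStatus, max_delay_s: float = 600.0) -> bool:
    """A site is 'available' only if all three slide criteria hold."""
    return (status.reachable
            and status.accepts_globus_jobs
            and status.queue_delay_s <= max_delay_s)

def select_sites(sites: dict[str, SiteStatus]) -> list[str]:
    """Return the available sites, best (lowest queue delay) first."""
    ok = [name for name, s in sites.items() if is_available(s)]
    return sorted(ok, key=lambda name: sites[name].queue_delay_s)
```

Ranking the survivors by queue delay, rather than returning an unordered set, matches the slide's emphasis on sites that "run our jobs promptly".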
19. functional magnetic resonance imaging (fMRI) at Dartmouth
- Using the Virtual Data System, in collaboration with GriPhyN, to run fMRI normalization on local resources, the Green campus Grid, and Grid3 using the same infrastructure.
Jed Dobson, Dartmouth College; Mike Wilde, University of Chicago
20. LIGO
- Large body of legacy software and ongoing analyses.
- Data Grid moving and tracking data replicas over all LIGO sites.
- LIGO sites and applications on OSG testing workflow and integrating existing science analyses, e.g. the workflow for an inspiral search (neutron stars, black holes, other compact objects) for a single 2048-second stretch of data.
Virtual Data System Workflows
21. integrating existing facilities, e.g. STAR
- Jobs on 2 STAR sites while solving software installation over the grid
- A third site is joining in South America
22. production for Tevatron experiments, e.g. D0
- D0 re-reprocessing across 8 sites, >2500 CPUs, in Europe and the US
- (10-15 CPU secs/event; 10^7 events/day)
- A 4-6 month task.
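The quoted rate implies a steady CPU count that can be checked directly, taking 12.5 CPU-s/event as the midpoint of the 10-15 s range:

```python
# Sanity check of the D0 re-reprocessing rate: how many CPUs must be
# busy around the clock to process 10^7 events/day at 12.5 CPU-s/event?
events_per_day = 1e7
cpu_s_per_event = 12.5      # midpoint of the quoted 10-15 s range
seconds_per_day = 86_400

cpus_busy = events_per_day * cpu_s_per_event / seconds_per_day
print(f"~{cpus_busy:.0f} CPUs busy continuously")
# → ~1447 CPUs busy continuously
```

Roughly 1450 fully-busy CPUs against the >2500 available is consistent with realistic grid efficiencies and scheduling gaps well below 100% utilisation.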
23. Metrics: efficiency
24. for US LHC, OSG is on the critical path
- Late 2007? Early 2008?
- Real data collected: O(10) to O(100) TB
- Simulated data produced: O(100) TB
- Number of LHC users: 4000 (2000 active)
- Number of US-LHC users: 1000 (500 active)
- Main type of analysis activity: understand the detectors by reading and processing the data many, many times.
  - Re-reconstruction
  - Calibration, re-calibration
  - RAW-level work (i.e. not AOD, perhaps not even ESD!)
  - Multiple passes through the same datasets
- Support 1000s of jobs per user
- Access to O(100) TB of (real and simulated) data
- Support VO-defined prioritisation of work/resource access (based on roles)
- Data distribution, tracking and access are a key challenge
- 2 years (and counting) to deploy a production-level, multi-user infrastructure!
25. Service Challenges: ramp up to LHC start-up service
- Jun 05: Technical Design Report
- Sep 05: SC3 Service Phase
- May 06: SC4 Service Phase
- Sep 06: initial LHC Service in stable operation
- Apr 07: LHC Service commissioned
SC2: reliable data transfer (disk-network-disk); 5 Tier-1s, aggregate 500 MB/sec sustained at CERN.
SC3: reliable base service; most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 500 MB/sec, including mass storage (25% of the nominal final throughput for the proton period).
SC4: all Tier-1s and major Tier-2s capable of supporting the full experiment software chain, including analysis; sustain nominal final grid data throughput.
LHC Service in Operation: September 2006; ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput.
26. distributing the data to the analysis farms
- Summer 2005 Service Challenge from CERN to all Tier-1s: half the rate needed at the start of the experiments.
- CMS CERN data-transfer plots: FNAL has separate streams to tape and disk. Data is transferred to the US Tier-1 and to Tier-2s (Nebraska, Purdue, Wisconsin) through the SRM/(resilient) dCache interface.
22 Terabytes/day
OSG has 2 SS (Storage Service) sites where space (disk or tape) is managed across VOs and can be long-lived and/or (soon) reserved.
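The 22 Terabytes/day figure converts to a sustained rate as follows (using decimal units, 1 TB = 10^6 MB):

```python
# Convert the sustained CMS transfer figure above to MB/s.
tb_per_day = 22
seconds_per_day = 86_400
mb_per_s = tb_per_day * 1e6 / seconds_per_day   # 1 TB = 1e6 MB (decimal)
print(f"~{mb_per_s:.0f} MB/s sustained")
# → ~255 MB/s sustained
```

That is roughly half of the 500 MB/sec service-challenge target, in line with the "half the rate needed at the start of the experiments" point above.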
27. getting to LHC data taking
Simulation production: sharing resources
ATLAS Rome Physics Production
28. Metrics: ATLAS efficiency
ATLAS DC2 overview of Grids as of 2005-02-24 18:11:30

Grid       submitted  pending  running  finished   failed  efficiency (%)
Grid3             36        3      814    153028    46943              77
NorduGrid         29      130     1105    114264    70349              62
LCG               60      528      610    145692   242247              38
TOTAL            125      661     2529    412984   359539              53

- Capone submitted and managed ATLAS jobs on Grid3: >150K
- In 2004: 1.2M CPU-hours
- Grid3 sites with more than 1000 successful DC2 jobs: 20
- Capone instances with >1000 jobs: 13
Courtesy of Rob Gardner
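The efficiency column in the table above is simply finished / (finished + failed), which the counts reproduce:

```python
# Reproduce the DC2 efficiency column from the finished/failed counts above.
rows = {
    "Grid3":     (153028, 46943),
    "NorduGrid": (114264, 70349),
    "LCG":       (145692, 242247),
    "TOTAL":     (412984, 359539),
}
efficiency = {grid: round(100 * done / (done + failed))
              for grid, (done, failed) in rows.items()}
print(efficiency)
# → {'Grid3': 77, 'NorduGrid': 62, 'LCG': 38, 'TOTAL': 53}
```

Note that still-pending and running jobs are excluded from the denominator, so these are completion efficiencies rather than overall submission success rates.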
29. 91 entries (out of about 100 talks), 21 F plus 70 M (preliminary)
and last but not least, all of you!
30. so now OSG is open
31.
- ensure the stability that is a hallmark of Grid3
- add new capabilities, applications and sites
- make installation of sites and applications failproof and easy
- work on making policies and agreements easy and transparent
- acknowledge and appreciate all who contribute!