Title: DØ Computing Model
1. DØ Computing Model, Monte Carlo, Data Reprocessing
Gavin Davies, Imperial College London
DOSAR Workshop, São Paulo, September 2005
2. Outline
- Operational status
  - Globally we continue to do well
  - A view shared by the recent Run II Computing Review
- DØ computing model
  - Ongoing, long-established plan
- Production computing
  - Monte Carlo
  - Reprocessing of Run II data: 10⁹ events reprocessed on the grid, the largest HEP grid effort
- Looking forward
- Conclusions
3. Snapshot of Current Status
- Reconstruction keeping up with data taking
- Data handling is performing well
- Production computing is off-site and grid-based; it continues to grow and to work well
  - Over 75 million Monte Carlo events produced in the last year
  - Run IIa data set (10⁹ events) being reprocessed on the grid
- Analysis CPU power has been expanded
- Globally doing well
  - A view shared by the recent Run II Computing Review
4. Computing Model
- Started with distributed computing, with evolution to automated use of common tools/solutions on the grid (SAM-Grid) for all tasks
  - Scalable
- Not alone: a joint effort with others at FNAL and elsewhere, including the LHC
- 1997 original plan:
  - All Monte Carlo to be produced off-site
  - SAM to be used for all data handling, providing a data-grid
- Now: Monte Carlo and data reprocessing with SAM-Grid
- Next: other production tasks (e.g. fixing), then user analysis
- Use the concept of Regional Centres
  - DOSAR one of the pioneers
  - Builds local expertise
5. Reconstruction Release
- Periodically update the version of the reconstruction code
  - As new / more refined algorithms are developed
  - As understanding of the detector improves
  - Frequency of releases decreases with time
- One major release in the last year: p17
  - Basis for the current Monte Carlo (MC) and data reprocessing
- Benefits of p17:
  - Reco speed-up
  - Full calorimeter calibration
  - Fuller description of detector material
  - Use of zero-bias overlay for MC
- (More details: http://cdinternal.fnal.gov/RUNIIRev/runIIMP05.asp)
6. Data Handling - SAM
- SAM continues to perform well, providing a data-grid
  - 50 SAM sites worldwide
  - Over 2.5 PB (50B events) consumed in the last year
  - Up to 300 TB moved per month
  - Larger SAM cache solved tape-access issues (the cache idea is sketched below)
- Continued success of SAM shifters
  - Often remote collaborators
  - Form the first line of defense
- SAMTV monitors SAM and the SAM stations
http://d0db-prd.fnal.gov/sm_local/SamAtAGlance/
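To make the cache point concrete, here is a minimal Python sketch of a disk cache sitting in front of tape; the class and method names are invented for illustration and are not the real SAM interface:

```python
from collections import OrderedDict

class CacheStation:
    """Toy model (not SAM code): a station's disk cache in front of tape."""

    def __init__(self, capacity_bytes):
        self.cache = OrderedDict()   # filename -> size, kept in LRU order
        self.capacity = capacity_bytes
        self.used = 0
        self.tape_reads = 0          # each miss costs one slow tape access

    def fetch(self, filename, size):
        if filename in self.cache:                   # hit: serve from disk
            self.cache.move_to_end(filename)
            return f"/cache/{filename}"
        self.tape_reads += 1                         # miss: stage from tape
        while self.used + size > self.capacity and self.cache:
            _, old = self.cache.popitem(last=False)  # evict least recent
            self.used -= old
        self.cache[filename] = size
        self.used += size
        return f"/cache/{filename}"
```

With a larger capacity, fewer files are evicted between reuses, so `tape_reads` drops; that is the effect the slide reports.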
7. SAM-Grid
More than 10 DØ execution sites: http://samgrid.fnal.gov:8080/
SAM (data handling) + JIM (job submission and monitoring) = SAM-Grid
http://samgrid.fnal.gov:8080/list_of_schedulers.php
http://samgrid.fnal.gov:8080/list_of_resources.php
8. Remote Production Activities: Monte Carlo - I
- Over 75M events produced in the last year, at more than 10 sites
  - More than double last year's production
  - Vast majority on shared sites
  - DOSAR a major part of this
- SAM-Grid introduced in spring '04, becoming the default
  - Based on the request system and jobmanager-mc_runjob
  - MC software package retrieved via SAM, the same way as on the central farm
- Average production efficiency ~90%
  - Average inefficiency due to grid infrastructure: 1-5% (bookkeeping sketched below)
  - http://www-d0.fnal.gov/computing/grid/deployment-issues.html
- Continued move to common tools
  - DOSAR sites continue the move from McFarm to SAM-Grid (from '04)
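As an illustration of the efficiency bookkeeping above, a small Python sketch; the job counts are invented, and only the ~90% / 1-5% split mirrors the slide:

```python
# Illustrative bookkeeping only (numbers below are invented, not DØ data):
# how an overall production efficiency splits into grid-infrastructure
# losses and everything else (code crashes, site problems, ...).

def efficiency_report(submitted, grid_failures, other_failures):
    succeeded = submitted - grid_failures - other_failures
    return {
        "overall efficiency": succeeded / submitted,
        "grid-infrastructure loss": grid_failures / submitted,
        "other loss": other_failures / submitted,
    }

# Example: 10000 jobs, 3% lost to grid infrastructure, 7% to other causes,
# giving the ~90% overall / 1-5% grid-loss picture quoted on the slide.
print(efficiency_report(10_000, grid_failures=300, other_failures=700))
```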
9. Remote Production Activities: Monte Carlo - II
- Beyond just shared resources
- More than 17M events produced directly on LCG via submission from Nikhef
  - A good example of a remote site driving the development
- Similar momentum building on/for OSG
  - Two good site examples within the p17 reprocessing
10. Remote Production Activities: Reprocessing - I
- After significant improvements to the reconstruction, reprocess old data
- p14: winter 2003/04
  - 500M events, 100M remotely, from DST
  - Based around mc_runjob
  - Distributed computing rather than grid
- p17: end of March to October
  - x10 larger, i.e. 1000M events, 250 TB
  - Basically all remote
  - From raw data, i.e. use of DB proxy servers
  - SAM-Grid as the default (using mc_runjob)
  - ~3200 1 GHz PIIIs for 6 months (back-of-envelope check below)
- Massive activity - the largest grid activity in HEP
http://www-d0.fnal.gov/computing/reprocessing/p17/
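A rough consistency check of these numbers; the per-event cost is derived here, not stated on the slide:

```python
# Back-of-envelope check of the slide's numbers (a sketch only).

events = 1_000_000_000          # 10^9 events in the p17 reprocessing
cpus = 3200                     # ~3200 1 GHz PIII-equivalent CPUs
months = 6
seconds = months * 30 * 24 * 3600

cpu_seconds_available = cpus * seconds
implied_s_per_event = cpu_seconds_available / events
print(f"available: {cpu_seconds_available:.2e} CPU-seconds")
print(f"implied reconstruction cost: ~{implied_s_per_event:.0f} s/event "
      f"on a 1 GHz PIII (ignoring inefficiency and merging overhead)")
```

This comes out at roughly 50 CPU-seconds per event, i.e. the quoted farm size is consistent with reconstructing 10⁹ events in about six months.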
11. Reprocessing - II
- A grid job spawns many batch jobs (fan-out and merge sketched below)
- [Diagram: production and merging workflows]
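A minimal sketch of that fan-out and the merging step, with invented function and file names:

```python
# Hypothetical sketch: one grid job over a dataset fans out into many
# batch jobs, whose outputs are then merged into fewer large files.

def split_into_batch_jobs(files, files_per_job):
    """One grid job -> many batch jobs, each processing a file slice."""
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

def merge(outputs, outputs_per_merge):
    """Merging step: combine many small outputs into fewer large files."""
    return [f"merged({', '.join(chunk)})"
            for chunk in (outputs[i:i + outputs_per_merge]
                          for i in range(0, len(outputs), outputs_per_merge))]

dataset = [f"raw_{n:04d}.evpack" for n in range(10)]
jobs = split_into_batch_jobs(dataset, files_per_job=3)   # 4 batch jobs
outputs = [f"tmb_{n:04d}" for n in range(10)]
print(len(jobs), "batch jobs;", merge(outputs, outputs_per_merge=5))
```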
12. Reprocessing - III
- SAM-Grid provides:
  - A common environment and operation scripts at each site
  - Effective book-keeping
  - SAM avoids data duplication and defines recovery jobs (idea sketched below)
  - JIM's XML database used to ease bug tracing
- Tough deploying a product that is still evolving, with limited manpower, to new sites (we are a running experiment)
  - Very significant improvements in JIM (scalability) during this period
- Certification of sites - need to check:
  - SAM-Grid vs usual production
  - Remote sites vs central site
  - Merged vs unmerged files
- [Plot: FNAL vs SPRACE certification comparison]
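The recovery-job idea in miniature (helper names invented; the real SAM bookkeeping is richer):

```python
# Compare the files a project was asked to process with the files
# recorded as successfully consumed, and define a recovery job over the
# difference. Processing only the difference, rather than resubmitting
# everything, is also what avoids duplicated output.

def define_recovery_job(requested_files, consumed_files):
    remaining = set(requested_files) - set(consumed_files)
    return sorted(remaining)            # input list for the recovery job

requested = ["f001", "f002", "f003", "f004", "f005"]
consumed = ["f001", "f003", "f004"]     # f002, f005 failed or were lost
print(define_recovery_job(requested, consumed))  # ['f002', 'f005']
```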
13. Reprocessing - IV
http://samgrid.fnal.gov:8080/cgi-bin/plot_efficiency.cgi
- Monitoring (illustration)
  - Overall efficiency, speed, or by site
- Status into the end-game
  - Data sets all allocated; moving to clean-up
  - Must now push on the Monte Carlo
- 855M events done
14. SAM-Grid Interoperability
- Need access to greater resources as data sets grow
- Ongoing programme on LCG and OSG interoperability
- Step 1 (co-existence): use shared resources with a SAM-Grid head-node
  - Widely done for both reprocessing and MC
  - OSG co-existence shown for data reprocessing
- Step 2: SAMGrid-LCG interface
  - SAM does the data handling; JIM, the job submission
  - Basically a forwarding mechanism (sketched below)
  - Prototype established at IN2P3/Wuppertal
  - Extending to production level
- OSG activity increasing, building on the LCG experience
- Team work between core developers / sites
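A hypothetical sketch of the forwarding idea, with all names invented: the user submits once via SAM-Grid, a forwarding node translates the job for the target grid, and SAM still handles the data in every case:

```python
# Not the real SAMGrid-LCG interface; a toy illustration of forwarding.

def forward_job(job, target_grid):
    """Translate a SAM-Grid job description for the target grid."""
    translators = {
        "samgrid": lambda j: {"scheduler": "jim", **j},
        "lcg": lambda j: {"jdl": f"Executable = \"{j['executable']}\";"},
        "osg": lambda j: {"condor_submit": {"executable": j["executable"]}},
    }
    submission = translators[target_grid](job)
    # Data handling is unchanged: input files still come via SAM.
    submission["input_dataset"] = job["sam_dataset"]
    return submission

job = {"executable": "d0reco", "sam_dataset": "p17-reprocess-chunk-42"}
print(forward_job(job, "lcg"))
```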
15. Looking Forward
- Increased data sets require increased resources for MC, reprocessing, etc.
- The route to these is increased use of the grid and common tools
- Have an ongoing joint programme, but work to do:
  - Continue development of SAM-Grid
    - Automated production job submission by shifters
  - Deployment team
    - Bring in new sites in a manpower-efficient manner
    - The benefit of a new site goes well beyond a CPU count; we appreciate and value this
  - Full interoperability
    - Ability to access all shared resources efficiently
- Additional resources for the above recommended by the Taskforce
16. Conclusions
- The computing model continues to be successful
  - Based around grid-like computing, using common tools
- A key part of this is the production computing: MC and reprocessing
- Significant advances this year
  - Continued migration to common tools
  - Progress on interoperability, both LCG and OSG
    - Two reprocessing sites operating under OSG
- p17 reprocessing a tremendous success
  - Strongly praised by the Review Committee
  - DOSAR a major part of this
  - More general contribution also strongly acknowledged
- Thank you
- Let's all keep up the good work
17. Back-up
18. Terms
- Tevatron
  - Approximately equivalent challenge to the LHC in today's money
  - Running experiments
- SAM (Sequential Access to Metadata)
  - Well-developed metadata and distributed data replication system
  - Originally developed by DØ and FNAL-CD
- JIM (Job Information and Monitoring)
  - Handles job submission and monitoring (all but data handling)
  - SAM + JIM = SAM-Grid, a computational grid
- Tools
  - Runjob - handles job workflow management
  - dØtools - user interface for job submission
  - dØrte - specification of runtime needs
19. Reminder of Data Flow
- Data acquisition (raw data in evpack format)
  - Currently limited to a 50 Hz Level-3 accept rate
  - Request to increase to 100 Hz, as planned for Run IIb (see later)
- Reconstruction (tmb/DST in evpack format)
  - Additional information added to the tmb, so only the tmb is kept (DST format stopped)
  - Sufficient for complex corrections, including track fitting
- Fixing (tmb in evpack format)
  - Improvements / corrections coming after the cut of a production release
  - Centrally performed
- Skimming (tmb in evpack format)
  - Centralised event streaming based on reconstructed physics objects
  - Selection procedures regularly improved
- Analysis (output: ROOT histograms)
  - Common root-based Analysis Format (CAF) introduced in the last year
  - tmb format remains (full chain restated below)
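The same chain, restated compactly; the stages and formats are taken from the bullets above, while the layout itself is only illustrative:

```python
# Compact restatement of the DØ processing chain described on this slide.
DATA_FLOW = [
    # stage,             output format
    ("data acquisition", "raw, evpack"),
    ("reconstruction",   "tmb, evpack"),
    ("fixing",           "tmb, evpack"),
    ("skimming",         "tmb, evpack"),
    ("analysis",         "ROOT histograms (CAF)"),
]

for stage, fmt in DATA_FLOW:
    print(f"{stage:17s} -> {fmt}")
```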
20. Remote Production Activities: Monte Carlo
21. The Good and Bad of the Grid
- The only viable way to go
  - Increase in resources (CPU and potentially manpower)
  - Work with, not against, the LHC
  - Still limited
- BUT
  - Need to conform to standards: dependence on others
  - Long-term solutions must be favoured over short-term idiosyncratic convenience
    - Or we won't be able to maintain adequate resources
  - Must maintain a production-level service (papers) while increasing functionality
  - As transparent as possible to non-experts