Title: DØ Computing Model and Operational Status
Slide 1: DØ Computing Model and Operational Status
Gavin Davies, Imperial College London
Run II Computing Review, September 2005
Slide 2: Outline
- Operational status
  - Globally continue to do well
- DØ computing model and data flow
  - Ongoing, long-established plan
- Highlights from the last year
  - Algorithms
  - SAM-Grid reprocessing of Run II data
    - 10⁹ events reprocessed on the grid: the largest HEP grid effort
- Looking forward
  - Budget request
  - Manpower
- Conclusions
Slide 3: Snapshot of Current Status
- Reconstruction keeping up with data taking
- Data handling is performing well
- Production computing is off-site and grid-based
  - It continues to grow and work well
  - Over 75 million Monte Carlo events produced in the last year
  - Run IIa data set being reprocessed on the grid: 10⁹ events
- Analysis CPU power has been expanded
- Globally doing well
Slide 4: Computing Model
- Started with distributed computing, evolving to automated use of common tools/solutions on the grid (SAM-Grid) for all tasks
  - Scalable
  - Not alone: joint effort with CD and others, including the LHC experiments
- 1997: original plan
  - All Monte Carlo to be produced off-site
  - SAM to be used for all data handling, providing a data-grid
- Now: Monte Carlo and data reprocessing with SAM-Grid
- Next: other production tasks, e.g. fixing, and then user analysis (in order of increasing complexity)
Slide 5: Reminder of Data Flow
- Data acquisition (raw data in evpack format)
  - Currently limited to a 50 Hz Level-3 accept rate
  - Request increase to 100 Hz, as planned for Run IIb (see later)
- Reconstruction (tmb/DST in evpack format)
  - Additional information in tmb → tmb only (DST format stopped)
  - Sufficient for complex corrections, including track fitting
- Fixing (tmb in evpack format)
  - Improvements/corrections coming after the cut of a production release
  - Centrally performed
- Skimming (tmb in evpack format)
  - Centralised event streaming based on reconstructed physics objects
  - Selection procedures regularly improved
- Analysis (output ROOT histograms)
  - Common ROOT-based Analysis Format (CAF) introduced in the last year; tmb format remains
- The whole chain is sketched in code below
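The stages above compose naturally, so a few lines of Python can summarise the chain. This is an illustrative sketch only, not DØ code: the function names, dictionary fields and stream labels are all hypothetical stand-ins for the evpack/tmb stages named on this slide.

```python
# Hypothetical sketch of the DØ data flow: raw evpack -> reco (tmb) ->
# fixing -> skimming into streams -> CAF-style analysis. Illustrative only.

def reconstruct(raw):
    """Raw evpack event -> tmb with reconstructed physics objects."""
    return {"format": "tmb", "objects": list(raw["detector_hits"])}

def fix(tmb):
    """Centrally applied corrections arriving after the production cut."""
    tmb["fixed"] = True
    return tmb

def skim(tmb):
    """Centralised event streaming based on reconstructed physics objects."""
    return [s for s in ("EM", "MU", "JET") if s in tmb["objects"]]

def analyse(tmb, streams):
    """Stand-in for a CAF analysis job producing ROOT histograms."""
    return {"streams": streams, "fixed": tmb["fixed"]}

raw_event = {"format": "evpack", "detector_hits": ["EM", "JET"]}
tmb = fix(reconstruct(raw_event))
print(analyse(tmb, skim(tmb)))
```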
Slide 6: Reconstruction
- Central farm
  - Processing
  - Reprocessing (SAM-Grid) with spare cycles
  - Evolving to shared FNAL farms
- Reco timing
  - Significant improvement, especially at higher instantaneous luminosity
  - See Qizhong's talk
Slide 7: Highlights: Algorithms
- Algorithms reaching maturity
- P17 improvements include:
  - Reco speed-up
  - Full calorimeter calibration
  - Fuller description of detector material
- Common Analysis Format (CAF)
  - Limits development of different ROOT-based formats
  - Common object-selection, trigger-selection and normalization tools
  - Simplifies and accelerates analysis development
- First 1 fb⁻¹ analyses by Moriond
- See Qizhong's talk
Slide 8: Data Handling - SAM
- SAM continues to perform well, providing a data-grid
- 50 SAM sites worldwide
- Over 2.5 PB (50 billion events) consumed in the last year
- Up to 300 TB moved per month
- Larger SAM cache solved tape access issues
- Continued success of SAM shifters
  - Often remote collaborators
  - Form the first line of defense
- SAMTV monitors SAM and the SAM stations (a toy sketch of the SAM ideas follows below)
- http://d0db-prd.fnal.gov/sm_local/SamAtAGlance/
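Two ideas carry most of the weight in the description above: a metadata catalogue that turns declarative queries into datasets, and station caches that serve files from local disk before falling back to tape. The sketch below is not the SAM API; the catalogue records, class and function names are all hypothetical, chosen only to make the two ideas concrete.

```python
# Toy model of SAM-style data handling (not the real SAM interfaces):
# dataset definition by metadata query, plus a cache-before-tape station.

CATALOGUE = [  # file metadata records, as a SAM-like catalogue might hold
    {"file": "mu_0001.tmb", "trigger": "MU", "run": 178000},
    {"file": "em_0001.tmb", "trigger": "EM", "run": 178001},
]

def define_dataset(**constraints):
    """Return the files whose metadata match all given constraints."""
    return [r["file"] for r in CATALOGUE
            if all(r.get(k) == v for k, v in constraints.items())]

class StationCache:
    """Deliver a file from local disk cache if present, else from tape."""
    def __init__(self):
        self.disk = set()
    def deliver(self, name):
        source = "cache" if name in self.disk else "tape"
        self.disk.add(name)  # a larger cache means fewer tape accesses
        return name, source

cache = StationCache()
for f in define_dataset(trigger="MU"):
    print(cache.deliver(f))
```

The "larger SAM cache solved tape access issues" bullet corresponds to growing `disk` here: repeat deliveries hit the cache instead of tape.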
Slide 9: Remote Production Activities / SAM-Grid
- Monte Carlo
  - Over 75M events produced in the last year, at more than 10 sites
  - More than double last year's production
  - Vast majority on shared sites (often national Tier-1 sites, primarily LCG)
  - SAM-Grid introduced in spring '04, becoming the default
  - Consolidation of SAM-Grid / LCG co-existence
  - Over 17M events produced directly on LCG via submission from Nikhef
- Data reprocessing
  - After significant improvements to reconstruction, reprocess old data
  - P14 (winter 03/04, from DST): 500M events, 100M off-site
  - P17 (now, from raw): 1B events, SAM-Grid the default, basically all off-site
  - Massive task: the largest HEP activity on the grid
    - Equivalent to 3200 1 GHz PIIIs for 6 months
  - Led to significant improvements to SAM-Grid
  - Collaborative effort
Slide 10: Reprocessing / SAM-Grid - I
- More than 10 DØ execution sites
- SAM (data handling) + JIM (job submission and monitoring) → SAM-Grid
- http://samgrid.fnal.gov:8080/
- http://samgrid.fnal.gov:8080/list_of_schedulers.php
- http://samgrid.fnal.gov:8080/list_of_resources.php
Slide 11: Reprocessing / SAM-Grid - II
- SAM-Grid enables a common environment and operation scripts, as well as effective book-keeping
  - JIM's XML-DB used for monitoring / bug tracing
  - SAM avoids data duplication and defines recovery jobs
- Monitor speed and efficiency by site or overall (sketched below)
  - http://samgrid.fnal.gov:8080/cgi-bin/plot_efficiency.cgi
- Started end of March
- Comment
  - Tough deploying a product under evolution to new sites (we are a running experiment)
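The per-site monitoring amounts to simple bookkeeping over job records. A minimal sketch of the kind of aggregation behind a page like plot_efficiency.cgi follows; the record fields and site list are assumptions for illustration, not JIM's actual schema.

```python
# Hypothetical per-site efficiency/throughput summary from job records,
# in the spirit of plot_efficiency.cgi (fields and sites are illustrative).

from collections import defaultdict

jobs = [  # (site, events_processed, succeeded)
    ("GridKa", 240_000, True), ("IN2P3", 310_000, True),
    ("GridKa", 0, False),      ("Westgrid", 150_000, True),
]

stats = defaultdict(lambda: {"jobs": 0, "ok": 0, "events": 0})
for site, events, ok in jobs:
    s = stats[site]
    s["jobs"] += 1
    s["ok"] += ok       # booleans count as 0/1
    s["events"] += events

for site, s in sorted(stats.items()):
    print(f"{site:10s} eff={s['ok'] / s['jobs']:5.0%} events={s['events']:,}")
```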
Slide 12: SAM-Grid Interoperability
- Need access to greater resources as data sets grow
- Ongoing programme of LCG and OSG interoperability
- Step 1 (co-existence): use shared resources with a SAM-Grid head-node
  - Widely done for both reprocessing and MC
  - OSG co-existence shown for data reprocessing
- Step 2: SAMGrid-LCG interface
  - SAM does data handling; JIM, job submission
  - Basically a forwarding mechanism (sketched below)
  - Prototype established at IN2P3/Wuppertal
  - Extending to production level
- OSG activity increasing; builds on the LCG experience
- Limited manpower
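The "forwarding mechanism" of Step 2 can be pictured as a single SAM-Grid entry point that dispatches jobs to shared LCG or OSG compute elements while SAM keeps ownership of data delivery. The class, method and backend names below are illustrative only, not the SAMGrid-LCG interface itself.

```python
# Toy model of the Step-2 forwarding idea: JIM accepts a SAM-Grid job and
# forwards it to a shared grid backend; SAM still handles the data.

class ForwardingNode:
    """One SAM-Grid entry point in front of several shared grids."""
    def __init__(self, backends):
        self.backends = backends  # e.g. LCG and OSG compute elements

    def submit(self, job):
        # simple least-loaded choice; the real policy is an open question here
        backend = min(self.backends, key=lambda b: b["queued"])
        backend["queued"] += 1
        # SAM, not the backend, locates and delivers the input dataset
        return f"{job['name']} -> {backend['name']} (data via SAM)"

node = ForwardingNode([{"name": "LCG-CE", "queued": 3},
                       {"name": "OSG-CE", "queued": 5}])
print(node.submit({"name": "reprocess_p17_run178000"}))
```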
Slide 13: Looking Forward: Budget Request
- Long-planned increase to 100 Hz for Run IIb
- Experiment performing well
  - Run II average data-taking efficiency 84%, now pushing 90%
- Making efficient use of data and resources
  - Many analyses published (have a complete analysis with 0.6 fb⁻¹ of data)
- Core physics program saturates the 50 Hz rate at 1 × 10³²
  - Maintaining 50 Hz at 2 × 10³² → an effective loss of 1-2 fb⁻¹ (rough arithmetic below)
- http://d0server1.fnal.gov/projects/Computing/Reviews/Sept2005/Index.html
- Increase requires $1.5M in FY06/07, and <$1M after
  - Details to come in Amber's talk
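For scale only, a back-of-the-envelope version of the 1-2 fb⁻¹ figure: if the core physics menu saturates 50 Hz at 1 × 10³², a linear scaling means it needs ~100 Hz at 2 × 10³², so a 50 Hz cap records half of it. The delivered-luminosity input and the linear scaling are assumptions for illustration, not numbers from this review.

```python
# Rough arithmetic behind the 1-2 fb^-1 effective loss (assumed inputs).

delivered_fb = 3.0        # fb^-1 delivered at ~2e32 (assumption, for scale)
needed_hz, cap_hz = 100, 50   # menu rate at 2e32 vs. the current cap

recorded_fraction = cap_hz / needed_hz            # keep 0.5 of core physics
effective_loss = delivered_fb * (1 - recorded_fraction)
print(f"effective loss ~ {effective_loss:.1f} fb^-1")   # ~1.5 fb^-1
```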
Slide 14: Looking Forward: Manpower - I
- From the directorate Taskforce report on manpower issues:
  - Some vulnerability through the limited number of suitably qualified experts in either the collaboration or CD
  - Databases serve most of our needs, but concern w.r.t. the trigger and luminosity databases
  - Online system: key dependence on a single individual
  - Offline code management, build and distribution systems
- Additional areas where central consolidation of hardware support → reduced overall manpower needs
  - e.g. Level-3 trigger hardware support
- Under discussion with CD
Slide 15: Looking Forward: Manpower - II
- Increased data sets require increased resources
  - Route to these is increased use of the grid and common tools
- Have an ongoing joint program, but work to do...
- Need effort to:
  - Continue development of SAM-Grid
    - Automated production job submission by shifters
    - User analyses
  - Deployment team
    - Bring in new sites in a manpower-efficient manner
  - Full interoperability
    - Ability to access all shared resources efficiently
- Additional resources for the above recommended by the Taskforce
  - Support recommendation that some of the additional manpower come via more guest scientists, postdocs and associate scientists
Slide 16: Conclusions
- Computing model continues to be successful
- Significant advances this year
  - Reco speed-up, Common Analysis Format
  - Extension of grid capabilities
    - P17 reprocessing with SAM-Grid
    - Interoperability
- Want to maintain / build on this progress
- Potential issues / challenges being addressed
  - Short term: ongoing action on immediate vulnerabilities
  - Longer term: larger data sets
    - Continued development of common tools, increased use of the grid
    - Continued development of the above in collaboration with others
  - Manpower injection required to achieve reduced effort in steady state, with increased functionality (see Taskforce talk)
- Globally doing well
Slide 17: Back-up
Slide 18: Terms
- Tevatron
  - Approximately equivalent challenge to the LHC in today's money
  - Running experiments
- SAM (Sequential Access to Metadata)
  - Well-developed metadata and distributed data replication system
  - Originally developed by DØ and FNAL-CD
- JIM (Job Information and Monitoring)
  - Handles job submission and monitoring (all but data handling)
  - SAM + JIM → the SAM-Grid computational grid
- Tools (a workflow sketch follows below)
  - Runjob: handles job workflow management
  - dØtools: user interface for job submission
  - dØrte: specification of runtime needs
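To make the division of labour among these tools concrete, here is an illustrative sketch (not Runjob code) of the workflow-management idea: a job described once as ordered steps, each declaring its runtime needs in a dØrte-like way. The step executables and database names are shown for flavour and should be treated as assumptions.

```python
# Illustrative Runjob-style workflow: ordered steps, each with declared
# runtime needs (names of executables and needs are assumptions).

workflow = {
    "name": "mc_production",
    "steps": [
        {"exe": "d0gstar", "needs": ["geometry_db"]},          # simulation
        {"exe": "d0sim",   "needs": ["calib_db"]},             # digitisation
        {"exe": "d0reco",  "needs": ["calib_db", "lumi_db"]},  # reconstruction
    ],
}

def run(workflow):
    """Run steps in order; each step sees only its declared runtime needs."""
    for i, step in enumerate(workflow["steps"], 1):
        print(f"step {i}: {step['exe']} (needs: {', '.join(step['needs'])})")

run(workflow)
```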
Slide 19: Reprocessing
Slide 20: The Good and Bad of the Grid
- Only viable way to go
  - Increase in resources (CPU and potentially manpower)
  - Work with, not against, the LHC
  - Still limited
- BUT
  - Need to conform to standards; dependence on others...
  - Long-term solutions must be favoured over short-term idiosyncratic convenience
    - Or we won't be able to maintain adequate resources
  - Must maintain production-level service (papers) while increasing functionality
  - As transparent as possible to the non-expert