Title: Status of DØ Computing at UTA
1 Status of DØ Computing at UTA
DoE Site Visit, Nov. 11, 2002
Jae Yu, University of Texas at Arlington
- Introduction
- The DØ Remote Computing Effort
- DØRACE
- DØRAC
- DØGrid Software Development Effort
- Conclusions
2 Introduction
- Sharing a large data set with a large collaboration spread throughout the world requires prompt implementation of a remote analysis system
- Remote computing capability at DØ was a primary focus of UTA's DØ computing Grid effort in the past year
- Leverage experience as the ONLY active US DØ MC farm
  - Home-grown MC farm control software handles job scheduling and CPU resource management (a minimal scheduling sketch follows this list)
  - Farm data delivery system switched to network-based transfer
  - Mark will cover the details of farm operation during the last year
- UTA has been the primary leader of the DØRACE effort
  - Code distribution and setup system
- The UTA DØ Grid team played a leadership role in the DØ Grid software development effort
- Need to play a significant role in DØ Grid development and prepare for ATLAS
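The McFarm control software itself is not shown in these slides; purely as an illustration of the kind of job scheduling and CPU resource management it handles, below is a minimal Python sketch of a FIFO farm scheduler. All class, node, and job names are hypothetical and do not reflect McFarm's actual design.

    # Hypothetical sketch of a farm job scheduler: queued MC requests are
    # dispatched to idle worker nodes, FIFO, one job per CPU at a time.
    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str          # e.g. an MC request identifier
        events: int        # number of events to generate

    class FarmScheduler:
        def __init__(self, nodes):
            self.free_nodes = deque(nodes)   # idle CPUs available for work
            self.pending = deque()           # jobs waiting for a CPU
            self.running = {}                # node -> job currently assigned

        def submit(self, job):
            self.pending.append(job)

        def dispatch(self):
            # Assign pending jobs to free nodes until one of the queues empties.
            while self.pending and self.free_nodes:
                node = self.free_nodes.popleft()
                job = self.pending.popleft()
                self.running[node] = job
                print(f"starting {job.name} ({job.events} events) on {node}")

        def job_finished(self, node):
            # Called when a node reports completion; the CPU returns to the pool.
            done = self.running.pop(node)
            self.free_nodes.append(node)
            print(f"{done.name} finished on {node}")

    if __name__ == "__main__":
        sched = FarmScheduler(["node01", "node02"])
        sched.submit(Job("ttbar_sample", 5000))
        sched.submit(Job("qcd_sample", 10000))
        sched.dispatch()
        sched.job_finished("node01")
        sched.dispatch()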
3 What do we want to do with the data?
- Want to analyze data no matter where we are! Location- and time-independent analysis
4 The DØ collaboration: 650 collaborators, 78 institutions, 18 countries
5 What do we need?
- A remote DØ software development environment
  - Allow remote participation in code development, which might soon be a bottleneck in expediting physics results
  - Allow remote analysis for histogram production
  - Allow a remote reconstruction or production environment
- Optimized resource management tools
  - Allow local resources to be utilized maximally
  - Allow tapping into available computing resources at other locations
  - Allow remote resources to participate in global reconstruction or production
- Efficient and transparent data delivery and sharing
  - Allow location-independent access to data
  - Allow quicker access to sufficient statistics for initial analyses
  - Allow data sharing throughout the entire network of the collaboration
- Minimized dependence on central data storage
  - Alleviate the load on central data storage and servers
6 Existing Grid Tools
- Condor (http://www.condor.org): resource-finding batch control system
- Globus (http://www.globus.org): middleware
  - GSI: Grid Security Infrastructure
  - MDS: Meta Directory Service for system monitoring
  - GRAM: Grid Resource Allocation Manager
  - GridFTP
- Condor-G: high-level interface to the Grid (a submission sketch follows this list)
- DAGMan: job chain manager
- Hawkeye: status monitoring tool
- And they keep sprouting like bamboo after rain, every day
- These are not fully complete on the DØ time scale
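For concreteness, here is a minimal sketch of submitting one job through Condor-G: a Globus-universe submit description is written out and handed to condor_submit. The gatekeeper host, executable, and file names are placeholders, not DØ's actual configuration.

    # Minimal Condor-G submission sketch: write a Globus-universe submit
    # description and hand it to condor_submit. Host and file names are
    # placeholders for illustration only.
    import subprocess
    from pathlib import Path

    SUBMIT_FILE = Path("mc_job.submit")

    submit_description = """\
    universe        = globus
    globusscheduler = gatekeeper.example.edu/jobmanager-pbs
    executable      = run_mc.sh
    arguments       = 1000
    output          = mc_job.out
    error           = mc_job.err
    log             = mc_job.log
    queue
    """

    def submit_job():
        SUBMIT_FILE.write_text(submit_description)
        # condor_submit returns non-zero on failure; check=True surfaces that.
        subprocess.run(["condor_submit", str(SUBMIT_FILE)], check=True)

    if __name__ == "__main__":
        submit_job()

In practice Condor-G layers on top of the GRAM and GSI components listed above, so the same submit description can target any Globus gatekeeper for which the user holds credentials.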
7 DØRACE: Why Do We Need It?
- DØ Remote Analysis Coordination Effort (to be replaced by an organization called the DØ Offsite Analysis Task Force)
- In existence to accomplish:
  - Setting up and maintaining the remote analysis environment
  - Promoting institutional contributions made remotely
  - Allowing remote institutions to participate in data analysis
- To prepare for the future of data analysis:
  - More efficient and faster delivery of multi-PB data
  - More efficient sharing of processing resources
  - Preparation for possible massive re-processing and MC production, to expedite the process
  - Expedited production of physics results
8 DØRACE (Cont'd)
- Maintain self-sustained support among the remote institutions to build a broader base of knowledge
  - Alleviate the load on experts by sharing knowledge, allowing them to concentrate on preparing for the future
- Improve communication between the central experiment site and the remote institutions
  - Minimize travel around the globe for data access
  - Address sociological issues for HEP people at their home institutions and within the field
- Prepare the necessary software for an extremely distributed computing environment
- The primary goal is to allow individual desktop users to make significant contributions without being at the lab
9 Software Distribution (DØRACE Setup)
10 DØRACE Setup Strategy
- Categorized the remote analysis system setup by functionality (a phase-check sketch follows this list):
  - Desktop only
  - A modest analysis server
- Linux installation
- UPS/UPD installation and deployment
- External package installation via UPS/UPD
  - CERNLIB
  - KAI-lib
  - Root
- Download and install a DØ release
  - Tar-ball for ease of initial setup?
  - Use of existing utilities to download the latest release
- Installation of cvs
- Code development
  - KAI C++ compiler
- SAM station setup
- Phase IV: Data Delivery
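As a rough illustration of walking a remote site through this setup sequence, the sketch below checks which prerequisites are already present on a machine. The commands probed, the release directory path, and the grouping into phases are illustrative assumptions, not the official DØRACE procedure.

    # Hypothetical DØRACE-style setup checklist: report which setup phases
    # appear to be satisfied on this machine. The commands and paths probed
    # here are illustrative assumptions, not the official requirements.
    import os
    import shutil

    PHASES = {
        "External packages (CERNLIB, Root, ...)": ["root"],
        "Code development (cvs, compiler)": ["cvs", "g++"],
        "Data delivery (SAM station)": ["sam"],
    }

    def check_phase(commands):
        """A phase counts as satisfied only if every listed command is on PATH."""
        return all(shutil.which(cmd) is not None for cmd in commands)

    def report(release_dir="/d0dist"):
        # A DØ software release would live in some local directory; the path
        # used here is just a placeholder.
        print(f"DØ release directory present: {os.path.isdir(release_dir)}")
        for phase, commands in PHASES.items():
            status = "ok" if check_phase(commands) else "missing"
            print(f"{phase}: {status}")

    if __name__ == "__main__":
        report()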
11 Progressive
12 DØRACE Deployment Map (US and EU only)
Map legend: Processing Center; Analysis Site with SAM; Analysis Site without SAM; No DØRACE
13 DØRAM Hardware Infrastructure (DØRAC)
14 DØ Remote Analysis Model (DØRAM)
15 What is a DØRAC?
- An institute with large, concentrated, and available computing resources
  - Many 100s of CPUs
  - Many 10s of TB of disk cache
  - Many 100s of Mbytes of network bandwidth
  - Possibly equipped with HPSS
- An institute willing to provide certain services to a few smaller institutes in the region
- An institute willing to provide increased infrastructure as the data from the experiment grow
- An institute willing to provide support personnel if necessary
- Complementary to the central facility
- http://www-hep.uta.edu/d0race/d0rac-wg/d0rac-final.pdf
16 What services does a DØRAC provide?
- Services to IACs (Institutional Analysis Centers); a minimal service-interface sketch follows this list
  - Accept and execute analysis batch job requests
  - Provide cache and storage space
  - Store and provide access to desired data sets
  - Provide database access
  - Provide intermediary code distribution
- Services to the Collaboration
  - Generate and reconstruct MC data sets
  - Participate in re-reconstruction of data
  - I think we will be needing ab initio reconstruction as well
  - Provide manpower support for the above activities
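To make the IAC-facing services more concrete, here is a minimal sketch of a RAC front end that queues batch requests and serves data sets from a local cache, pulling from the central facility only on a cache miss. The class and method names are hypothetical and do not correspond to any actual DØRAC software.

    # Hypothetical sketch of a RAC service front end for IACs: batch job
    # requests are queued, and data-set requests are served from a local
    # cache, falling back to the central facility (CAC) on a miss.
    class RegionalAnalysisCenter:
        def __init__(self, name):
            self.name = name
            self.job_queue = []        # accepted batch job requests
            self.cache = {}            # data set name -> local file list

        def accept_job(self, iac, job_description):
            # Service to IACs: accept and execute analysis batch job requests.
            self.job_queue.append((iac, job_description))
            return len(self.job_queue)  # a simple request ticket number

        def get_dataset(self, dataset):
            # Service to IACs: store and provide access to desired data sets.
            if dataset not in self.cache:
                self.cache[dataset] = self.fetch_from_central(dataset)
            return self.cache[dataset]

        def fetch_from_central(self, dataset):
            # Placeholder for a transfer from the central facility; a real RAC
            # would use SAM/GridFTP here.
            print(f"{self.name}: fetching {dataset} from the central facility")
            return [f"{dataset}.part1", f"{dataset}.part2"]

    if __name__ == "__main__":
        rac = RegionalAnalysisCenter("UTA-RAC")
        ticket = rac.accept_job("IAC-LTU", "histogram production on thumbnails")
        files = rac.get_dataset("tmb_sample_2002")
        print(ticket, files)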
17 Regional Analysis Center Requirements
- Located in a geographically and infrastructurally sensible place
- Sufficiently large bandwidth to FNAL, other RACs, and IACs
- Large storage space (robotic and/or disk) to store:
  - 100% of the TMB (greatly reduced data set) in each RAC
  - 100% of the DST summed over all RACs, distributed randomly → most complementary to the CAC (Central Analysis Center)
  - MC data sets
- Sufficiently large compute resources
- Support for the infrastructure and maintenance
18 Sears Model of Categorization
- Best RACs
  - Gbit or better network bandwidth
  - Robotic tape storage: 170 TB
  - Disk storage space: 110 TB
  - Compute resources: 50 CPU/year/RAC
  - Provide database proxy service
  - Cost: $1M / Run IIa
- Good RACs
  - Gbit or better network bandwidth
  - Disk storage: 60 TB
  - Compute resources: 50 CPU/year/RAC
  - Provide database proxy service
  - Cost: $300k / Run IIa
- Better RACs: $300k to $1M / Run IIa
19 Other Issues
- Obtaining personnel support commitments
  - A serious MOU structure to spell out commitments
- Sharing resources with other experiments and disciplines
  - Emergency resource loans
  - Technical conflicts, such as differences in OS
- Need for a world-wide management structure
  - How do we resolve conflicts and allocate resources?
  - How is priority between physics groups within the experiment determined?
  - How do we address issues that affect other experiments and disciplines?
20 DØRAC Implementation Status
- DØ Offsite Analysis Task Force formed → central coordination body to establish the RACs
- Karlsruhe in Germany has been selected as the first RAC
  - Full Thumbnail data files have been, and are constantly being, transferred
  - Data verification procedure in progress → being improved
  - Cluster the associated IACs (German institutions and Imperial College, UK) and implement services to IACs
  - Monitor activities:
    - Data transfer from FNAL to Karlsruhe
    - Data access from IACs to RACs
- IN2P3, Lyon, France, approved by the French management to become another RAC
- UTA has submitted a joint HEP-CSE proposal to NSF and has been awarded $950k to establish a Distributed Computing Center (MRI-DCC) → the first US RAC
- BNL and UCR are preparing to apply for funds to establish RACs
21 Special MC Farm Tasks
- Unique to the UTA farm
- Tracking algorithm task force
  - 100k MC events under 7 different algorithms
  - Performance testing: efficiency, CPU time, etc.
  - An essential task (or use case) for a RAC
  - Executables implemented on the UTA HEP farm via a tarball
  - Data retrieved from the central SAM system
  - Automation of the process is in progress (a sketch of the loop follows this list)
  - Re-reconstruct multiple times with the various algorithms
  - Results delivered back into the SAM system automatically
  - Soon to be asked to carry out a second wave of tracking reconstruction
- Calorimeter Calibration Task Force MC generation
  - Multiple physics samples with various levels of underlying event and noise added
  - Still in progress
- RAW data reprocessing needs access to various databases
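The automation being put in place amounts to a loop over tracking algorithms: fetch the input events from SAM, run the tarball executable for each algorithm, and store the output back into SAM. A minimal sketch of that loop follows; the helper functions, executable name, and algorithm labels are placeholders rather than the actual UTA farm scripts.

    # Hypothetical sketch of the re-reconstruction loop: for each tracking
    # algorithm, run the tarball executable over the SAM-delivered input and
    # register the output back into SAM. All commands and paths are placeholders.
    import subprocess

    ALGORITHMS = ["algoA", "algoB", "algoC"]   # stand-ins for the tested algorithms
    INPUT_FILE = "mc_events.raw"               # placeholder for the SAM-delivered data

    def fetch_from_sam(dataset):
        # Placeholder: in reality this would be a SAM data-delivery request.
        print(f"fetching {dataset} from SAM")
        return INPUT_FILE

    def store_to_sam(output_file):
        # Placeholder: in reality the output would be declared and stored in SAM.
        print(f"storing {output_file} back into SAM")

    def reconstruct(input_file, algorithm):
        output_file = f"reco_{algorithm}.root"
        # The reconstruction executable unpacked from the tarball; the name and
        # options here are illustrative only.
        subprocess.run(
            ["./d0reco_from_tarball", "--tracking", algorithm,
             "--input", input_file, "--output", output_file],
            check=True,
        )
        return output_file

    if __name__ == "__main__":
        events = fetch_from_sam("tracking_task_force_100k")
        for algo in ALGORITHMS:
            store_to_sam(reconstruct(events, algo))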
22 DØGrid Network Map
23 MRI-DCC
- The MRI award allows for the construction of a distributed computing center (MRI-DCC) which acts as a DØRAC
- The maximum capacity of the MRI-DCC is:
  - Close to 300 CPUs
  - Over 50 TB of disk space, if not 100 TB
  - Fast network access to storage resources
- Multi-disciplinary research activity → promotes improvements in software and algorithm development
- Immediate application to ATLAS
24 DØRAM Software Infrastructure (SAM-Grid)
- DØ already has the data delivery part of a Grid system (SAM)
- The project started in 2001 as part of the PPDG collaboration to handle DØ's expanded needs
- A joint FNAL-DØ-CDF SAM-Grid project group was recently formed
- The current SAM-Grid team includes:
  - Andrew Baranovski, Gabriele Garzoglio, Lee Lueking, Dane Skow, Igor Terekhov, Rod Walker (Imperial College), Jae Yu (UTA), Drew Meyer (UTA), Tomasz Wlodek, in collaboration with the U. Wisconsin Condor team
  - http://www-d0.fnal.gov/computing/grid
- UTA proposed InGrid, a high-level interface to the Grid (see Drew's talk)
- We have been getting a flux of CSE students to work on InGrid development
25 UTA DØ Grid Accomplishments
1. Submit DØ MC jobs via Condor-G and DAGMan (a DAG-generation sketch follows this list)
2. Collect MC generation status from each farm through MDS (McView)
3. Improve McFarm to a Grid-enabled version (GEM) to fully utilize Grid tools
   - Tested using Oklahoma Univ. → the LTU and Tata farms are very close to being ready
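A chained MC job under DAGMan is described by a small DAG file listing one Condor submit file per stage plus the parent/child ordering. The sketch below writes such a DAG for a generic generate/simulate/reconstruct chain and submits it with condor_submit_dag; the stage and submit-file names are placeholders, not McFarm's actual job definitions.

    # Minimal DAGMan sketch: write a .dag file chaining three MC stages and
    # hand it to condor_submit_dag. Stage and file names are placeholders.
    import subprocess
    from pathlib import Path

    # Each stage is assumed to have its own Condor(-G) submit description.
    STAGES = ["generate", "simulate", "reconstruct"]

    def write_dag(dag_path="mc_chain.dag"):
        lines = [f"JOB {stage} {stage}.submit" for stage in STAGES]
        # Enforce sequential execution: generate -> simulate -> reconstruct.
        for parent, child in zip(STAGES, STAGES[1:]):
            lines.append(f"PARENT {parent} CHILD {child}")
        Path(dag_path).write_text("\n".join(lines) + "\n")
        return dag_path

    if __name__ == "__main__":
        dag = write_dag()
        subprocess.run(["condor_submit_dag", dag], check=True)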
26 UTA-FNAL CSE Master's Student Exchange Program
- To establish usable Grid software on the DØ time scale, the project needs highly skilled software developers
- FNAL cannot afford computer professionals
- The UTA CSE department has 450 MS students → many are highly trained but back at school due to the economy
- Students can participate in frontier Grid computing topics in real-life situations
- Students' Master's theses become well-documented records of the work, something lacking in many HEP computing projects
- The first-generation students have completed their tasks and are defending their theses this semester
  - Abhishek Rana and Siddarth Patel
- Arrangements for the next generations of students (6 months at a time) are complete → looking for good candidate students
27 The UTA-DØGrid Team
- Faculty: Jae Yu, Kaushik De, David Levine (CSE)
- Senior Research Associate: Mark Sosebee
  - DØRACE setup management, Remote SAM Initiative, MC farm system management
- Research Associate: Tomasz Wlodek
  - Grid implementation of McFarm job submission and bookkeeping
- Software Program Consultant: Drew Meyer
  - Design of the Grid interface (InGrid), development of McFarm software, RAC software development and implementation
- CSE Master's degree students
  - Anand Balasubramanian: InGrid prototype development
  - Vivek Desai: InGrid C++ implementation
  - Kumaran Sambandan: network analysis
- EE volunteer student: Prashant Bhamidipati
  - MC farm operation
- CSE undergraduate student: Karthik Gopalratnam
  - Phasing out of MC farm operation and into McFarm Gridification
28 Conclusions
- The DØ detector is taking data at a rate of 2-3 million events per day
  - Data taking is progressing well (over 80% efficiency)
  - Awaiting the accelerator to deliver the promised luminosity
  - Software and analysis are now in the hot seat
- Exploiting available intelligence and resources in an extremely distributed environment points to a practical implementation of the Grid
- Three large steps in the practical implementation of the Grid:
  - Software distribution and setup
  - Distributed resource sharing
  - Transparent data access
- DØ is a perfect place to test the Grid in action
  - Need to expedite analyses in a timely fashion
  - Need to distribute data sets throughout the collaboration
- Some successes:
  - UTA McFarm deployment and job submission exercise
  - FNAL JIM monitoring package
29 Conclusions (Cont'd)
- The DØ Remote Analysis Effort and the UTA-DØGrid team have accomplished a great deal, including 5 presentations at TSAPS in October
- Will take a multi-pronged approach to accomplish the ultimate goal in a timely manner:
  - Expeditious implementation of a DØ RAC at UTA
  - Develop RAC support software via the DØ Grid effort
  - Use the system for data analyses
- The university has been supporting the UTA-DØGrid effort tremendously
- The UTA DØ Grid team is carrying the Grid torch together with the Fermilab team
30 In Retrospect
- UTA was the largest offsite farm in Run I, still is, and will continue to play a crucial role in DØ MC production
- The leadership is being taken over by the European institutions
- The recent appointment places UTA in a good position to play a leading role in the DØ remote analysis and Grid development effort
  - A European-dominated effort
  - UTA must be a flagship institution in the US effort
  - Must work together with the DØ computing and PPDG groups
- UTA plans to lead the DØ Offsite Analysis System establishment effort → will play a leading role in the DØ Grid development effort
- Plans to establish UTA as a DØ Regional Analysis Center:
  - Improved network bandwidth
  - Establishment of a scalable storage system
  - Resource management software
  - Storage management software
- Prepare UTA to become an ATLAS Tier 2 site