Title: Status of DØ Computing at UTA
1 Status of DØ Computing at UTA
DoE Site Visit, Nov. 11, 2002
Jae Yu, University of Texas at Arlington
- Introduction
- The DØ Remote Computing Effort
- DØRACE
- DØRAC
- DØGrid Software Development Effort
- Conclusions
2 Introduction
- Sharing a large data set with a large collaboration spread throughout the world requires prompt implementation of a remote analysis system
- Remote computing capability at DØ was a primary focus of UTA's DØ computing Grid effort in the past year
- Leverage experience as the ONLY active US DØ MC farm
  - Home-grown MC farm control software handles job scheduling and CPU resource management (a minimal scheduling sketch follows this list)
  - Farm data delivery system switched to network-based transfer
  - Mark will cover the details of farm operation during the last year
- UTA has been the primary leader of the DØRACE effort
  - Code distribution and setup system
- The UTA DØ Grid team played a leadership role in the DØ Grid software development effort
- Need to play a significant role in DØ Grid development and prepare for ATLAS
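The McFarm control software itself is not shown in these slides; purely as an illustration of the kind of job scheduling and CPU resource management it handles, below is a minimal Python sketch of a FIFO farm scheduler. All class, node, and job names are hypothetical and do not reflect McFarm's actual design.

    # Hypothetical sketch of a farm job scheduler: queued MC requests are
    # dispatched to idle worker nodes, FIFO, one job per CPU at a time.
    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str          # e.g. an MC request identifier
        events: int        # number of events to generate

    class FarmScheduler:
        def __init__(self, nodes):
            self.free_nodes = deque(nodes)   # idle CPUs available for work
            self.pending = deque()           # jobs waiting for a CPU
            self.running = {}                # node -> job currently assigned

        def submit(self, job):
            self.pending.append(job)

        def dispatch(self):
            # Assign pending jobs to free nodes until one of the queues empties.
            while self.pending and self.free_nodes:
                node = self.free_nodes.popleft()
                job = self.pending.popleft()
                self.running[node] = job
                print(f"starting {job.name} ({job.events} events) on {node}")

        def job_finished(self, node):
            # Called when a node reports completion; the CPU returns to the pool.
            done = self.running.pop(node)
            self.free_nodes.append(node)
            print(f"{done.name} finished on {node}")

    if __name__ == "__main__":
        sched = FarmScheduler(["node01", "node02"])
        sched.submit(Job("ttbar_sample", 5000))
        sched.submit(Job("qcd_sample", 10000))
        sched.dispatch()
        sched.job_finished("node01")
        sched.dispatch()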
3 What do we want to do with the data?
- Want to analyze data no matter where we are! Location- and time-independent analysis
4 The DØ collaboration: 650 collaborators, 78 institutions, 18 countries
5 What do we need?
- A remote DØ software development environment
  - Allow remote participation in code development, which might soon be a bottleneck in expediting physics results
  - Allow remote analysis for histogram production
  - Allow a remote reconstruction or production environment
- Optimized resource management tools
  - Allow local resources to be utilized maximally
  - Allow tapping into available computing resources at other locations
  - Allow remote resources to participate in global reconstruction or production
- Efficient and transparent data delivery and sharing
  - Allow location-independent access to data
  - Allow quicker access to sufficient statistics for initial analyses
  - Allow data sharing throughout the entire network of the collaboration
- Minimized dependence on central data storage
  - Alleviate the load on central data storage and servers
6 Existing Grid Tools
- Condor (http://www.condor.org): resource-finding batch control system
- Globus (http://www.globus.org): middleware
  - GSI: Grid Security Infrastructure
  - MDS: Meta Directory Service for system monitoring
  - GRAM: Grid Resource Allocation Manager
  - GridFTP
- Condor-G: high-level interface to the Grid (a submission sketch follows this list)
- DAGMan: job chain manager
- Hawkeye: status monitoring tool
- And they keep sprouting like bamboo after rain, every day
- These are not fully complete on the DØ time scale
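For concreteness, here is a minimal sketch of submitting one job through Condor-G: a Globus-universe submit description is written out and handed to condor_submit. The gatekeeper host, executable, and file names are placeholders, not DØ's actual configuration.

    # Minimal Condor-G submission sketch: write a Globus-universe submit
    # description and hand it to condor_submit. Host and file names are
    # placeholders for illustration only.
    import subprocess
    from pathlib import Path

    SUBMIT_FILE = Path("mc_job.submit")

    submit_description = """\
    universe        = globus
    globusscheduler = gatekeeper.example.edu/jobmanager-pbs
    executable      = run_mc.sh
    arguments       = 1000
    output          = mc_job.out
    error           = mc_job.err
    log             = mc_job.log
    queue
    """

    def submit_job():
        SUBMIT_FILE.write_text(submit_description)
        # condor_submit returns non-zero on failure; check=True surfaces that.
        subprocess.run(["condor_submit", str(SUBMIT_FILE)], check=True)

    if __name__ == "__main__":
        submit_job()

In practice Condor-G layers on top of the GRAM and GSI components listed above, so the same submit description can target any Globus gatekeeper for which the user holds credentials.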
7 DØRACE: Why Do We Need It?
- DØ Remote Analysis Coordination Effort (to be replaced by an organization called the DØ Offsite Analysis Task Force)
- In existence to accomplish:
  - Setting up and maintaining the remote analysis environment
  - Promoting institutional contributions made remotely
  - Allowing remote institutions to participate in data analysis
- To prepare for the future of data analysis:
  - More efficient and faster delivery of multi-PB data
  - More efficient sharing of processing resources
  - Preparation for possible massive re-processing and MC production, to expedite the process
  - Expedited production of physics results
8 DØRACE (Cont'd)
- Maintain self-sustained support among the remote institutions to build a broader base of knowledge
  - Alleviate the load on experts by sharing knowledge, allowing them to concentrate on preparing for the future
- Improve communication between the central experiment site and the remote institutions
  - Minimize travel around the globe for data access
  - Address sociological issues for HEP people at their home institutions and within the field
- Prepare the necessary software for an extremely distributed computing environment
- The primary goal is to allow individual desktop users to make significant contributions without being at the lab
9 Software Distribution (DØRACE Setup)
10 DØRACE Setup Strategy
- Categorized the remote analysis system setup by functionality (a phase-check sketch follows this list):
  - Desktop only
  - A modest analysis server
- Linux installation
- UPS/UPD installation and deployment
- External package installation via UPS/UPD
  - CERNLIB
  - KAI-lib
  - Root
- Download and install a DØ release
  - Tar-ball for ease of initial setup?
  - Use of existing utilities to download the latest release
- Installation of cvs
- Code development
  - KAI C++ compiler
- SAM station setup
- Phase IV: Data Delivery
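As a rough illustration of walking a remote site through this setup sequence, the sketch below checks which prerequisites are already present on a machine. The commands probed, the release directory path, and the grouping into phases are illustrative assumptions, not the official DØRACE procedure.

    # Hypothetical DØRACE-style setup checklist: report which setup phases
    # appear to be satisfied on this machine. The commands and paths probed
    # here are illustrative assumptions, not the official requirements.
    import os
    import shutil

    PHASES = {
        "External packages (CERNLIB, Root, ...)": ["root"],
        "Code development (cvs, compiler)": ["cvs", "g++"],
        "Data delivery (SAM station)": ["sam"],
    }

    def check_phase(commands):
        """A phase counts as satisfied only if every listed command is on PATH."""
        return all(shutil.which(cmd) is not None for cmd in commands)

    def report(release_dir="/d0dist"):
        # A DØ software release would live in some local directory; the path
        # used here is just a placeholder.
        print(f"DØ release directory present: {os.path.isdir(release_dir)}")
        for phase, commands in PHASES.items():
            status = "ok" if check_phase(commands) else "missing"
            print(f"{phase}: {status}")

    if __name__ == "__main__":
        report()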
11 Progressive
12 DØRACE Deployment Map (US and EU only)
Map legend: Processing Center; Analysis Site with SAM; Analysis Site without SAM; No DØRACE
13 DØRAM Hardware Infrastructure (DØRAC)
14 DØ Remote Analysis Model (DØRAM)
15 What is a DØRAC?
- An institute with large, concentrated, and available computing resources
  - Many 100s of CPUs
  - Many 10s of TB of disk cache
  - Many 100s of Mbytes of network bandwidth
  - Possibly equipped with HPSS
- An institute willing to provide certain services to a few smaller institutes in the region
- An institute willing to provide increased infrastructure as the data from the experiment grow
- An institute willing to provide support personnel if necessary
- Complementary to the central facility
- http://www-hep.uta.edu/d0race/d0rac-wg/d0rac-final.pdf
16 What services does a DØRAC provide?
- Services to IACs (Institutional Analysis Centers); a minimal service-interface sketch follows this list
  - Accept and execute analysis batch job requests
  - Provide cache and storage space
  - Store and provide access to desired data sets
  - Provide database access
  - Provide intermediary code distribution
- Services to the Collaboration
  - Generate and reconstruct MC data sets
  - Participate in re-reconstruction of data
  - I think we will be needing ab initio reconstruction as well
  - Provide manpower support for the above activities
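To make the IAC-facing services more concrete, here is a minimal sketch of a RAC front end that queues batch requests and serves data sets from a local cache, pulling from the central facility only on a cache miss. The class and method names are hypothetical and do not correspond to any actual DØRAC software.

    # Hypothetical sketch of a RAC service front end for IACs: batch job
    # requests are queued, and data-set requests are served from a local
    # cache, falling back to the central facility (CAC) on a miss.
    class RegionalAnalysisCenter:
        def __init__(self, name):
            self.name = name
            self.job_queue = []        # accepted batch job requests
            self.cache = {}            # data set name -> local file list

        def accept_job(self, iac, job_description):
            # Service to IACs: accept and execute analysis batch job requests.
            self.job_queue.append((iac, job_description))
            return len(self.job_queue)  # a simple request ticket number

        def get_dataset(self, dataset):
            # Service to IACs: store and provide access to desired data sets.
            if dataset not in self.cache:
                self.cache[dataset] = self.fetch_from_central(dataset)
            return self.cache[dataset]

        def fetch_from_central(self, dataset):
            # Placeholder for a transfer from the central facility; a real RAC
            # would use SAM/GridFTP here.
            print(f"{self.name}: fetching {dataset} from the central facility")
            return [f"{dataset}.part1", f"{dataset}.part2"]

    if __name__ == "__main__":
        rac = RegionalAnalysisCenter("UTA-RAC")
        ticket = rac.accept_job("IAC-LTU", "histogram production on thumbnails")
        files = rac.get_dataset("tmb_sample_2002")
        print(ticket, files)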
17 Regional Analysis Center Requirements
- Located in a geographically and infrastructurally sensible place
- Sufficiently large bandwidth to FNAL, other RACs, and IACs
- Large storage space (robotic and/or disk) to store:
  - 100% of the TMB (greatly reduced data set) in each RAC
  - 100% of the DST summed over all RACs, distributed randomly → most complementary to the CAC (Central Analysis Center)
  - MC data sets
- Sufficiently large compute resources
- Support for the infrastructure and maintenance
18 Sears Model of Categorization
- Best RACs
  - Gbit or better network bandwidth
  - Robotic tape storage: 170 TB
  - Disk storage space: 110 TB
  - Compute resources: 50 CPU/year/RAC
  - Provide database proxy service
  - Cost: $1M / Run IIa
- Good RACs
  - Gbit or better network bandwidth
  - Disk storage: 60 TB
  - Compute resources: 50 CPU/year/RAC
  - Provide database proxy service
  - Cost: $300k / Run IIa
- Better RACs: $300k to $1M / Run IIa
19 Other Issues
- Obtaining personnel support commitments
  - A serious MOU structure to spell out commitments
- Sharing resources with other experiments and disciplines
  - Emergency resource loans
  - Technical conflicts, such as differences in OS
- Need for a world-wide management structure
  - How do we resolve conflicts and allocate resources?
  - How is priority between physics groups within the experiment determined?
  - How do we address issues that affect other experiments and disciplines?
20 DØRAC Implementation Status
- DØ Offsite Analysis Task Force formed → central coordination body to establish the RACs
- Karlsruhe in Germany has been selected as the first RAC
  - Full Thumbnail data files have been, and are constantly being, transferred
  - Data verification procedure in progress → being improved
  - Cluster the associated IACs (German institutions and Imperial College, UK) and implement services to IACs
  - Monitor activities:
    - Data transfer from FNAL to Karlsruhe
    - Data access from IACs to RACs
- IN2P3, Lyon, France, approved by the French management to become another RAC
- UTA has submitted a joint HEP-CSE proposal to NSF and has been awarded $950k to establish a Distributed Computing Center (MRI-DCC) → the first US RAC
- BNL and UCR are preparing to apply for funds to establish RACs
21 Special MC Farm Tasks
- Unique to the UTA farm
- Tracking algorithm task force
  - 100k MC events under 7 different algorithms
  - Performance testing: efficiency, CPU time, etc.
  - An essential task (or use case) for a RAC
  - Executables implemented on the UTA HEP farm via a tarball
  - Data retrieved from the central SAM system
  - Automation of the process is in progress (a sketch of the loop follows this list)
  - Re-reconstruct multiple times with the various algorithms
  - Results delivered back into the SAM system automatically
  - Soon to be asked to carry out a second wave of tracking reconstruction
- Calorimeter Calibration Task Force MC generation
  - Multiple physics samples with various levels of underlying event and noise added
  - Still in progress
- RAW data reprocessing needs access to various databases
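The automation being put in place amounts to a loop over tracking algorithms: fetch the input events from SAM, run the tarball executable for each algorithm, and store the output back into SAM. A minimal sketch of that loop follows; the helper functions, executable name, and algorithm labels are placeholders rather than the actual UTA farm scripts.

    # Hypothetical sketch of the re-reconstruction loop: for each tracking
    # algorithm, run the tarball executable over the SAM-delivered input and
    # register the output back into SAM. All commands and paths are placeholders.
    import subprocess

    ALGORITHMS = ["algoA", "algoB", "algoC"]   # stand-ins for the tested algorithms
    INPUT_FILE = "mc_events.raw"               # placeholder for the SAM-delivered data

    def fetch_from_sam(dataset):
        # Placeholder: in reality this would be a SAM data-delivery request.
        print(f"fetching {dataset} from SAM")
        return INPUT_FILE

    def store_to_sam(output_file):
        # Placeholder: in reality the output would be declared and stored in SAM.
        print(f"storing {output_file} back into SAM")

    def reconstruct(input_file, algorithm):
        output_file = f"reco_{algorithm}.root"
        # The reconstruction executable unpacked from the tarball; the name and
        # options here are illustrative only.
        subprocess.run(
            ["./d0reco_from_tarball", "--tracking", algorithm,
             "--input", input_file, "--output", output_file],
            check=True,
        )
        return output_file

    if __name__ == "__main__":
        events = fetch_from_sam("tracking_task_force_100k")
        for algo in ALGORITHMS:
            store_to_sam(reconstruct(events, algo))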
22 DØGrid Network Map
23 MRI-DCC
- The MRI award allows for the construction of a distributed computing center (MRI-DCC) which acts as a DØRAC
- The maximum capacity of the MRI-DCC is:
  - Close to 300 CPUs
  - Over 50 TB of disk space, if not 100 TB
  - Fast network access to storage resources
- Multi-disciplinary research activity → promotes improvements in software and algorithm development
- Immediate application to ATLAS
24 DØRAM Software Infrastructure (SAM-Grid)
- DØ already has the data delivery part of a Grid system (SAM)
- The project started in 2001 as part of the PPDG collaboration to handle DØ's expanded needs
- A joint FNAL-DØ-CDF SAM-Grid project group was recently formed
- The current SAM-Grid team includes:
  - Andrew Baranovski, Gabriele Garzoglio, Lee Lueking, Dane Skow, Igor Terekhov, Rod Walker (Imperial College), Jae Yu (UTA), Drew Meyer (UTA), Tomasz Wlodek, in collaboration with the U. Wisconsin Condor team
  - http://www-d0.fnal.gov/computing/grid
- UTA proposed InGrid, a high-level interface to the Grid (see Drew's talk)
- We have been getting a flux of CSE students to work on InGrid development
25 UTA DØ Grid Accomplishments
1. Submit DØ MC jobs via Condor-G and DAGMan (a DAG-generation sketch follows this list)
2. Collect MC generation status from each farm through MDS (McView)
3. Improve McFarm to a Grid-enabled version (GEM) to fully utilize Grid tools
   - Tested using Oklahoma Univ. → the LTU and Tata farms are very close to being ready
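A chained MC job under DAGMan is described by a small DAG file listing one Condor submit file per stage plus the parent/child ordering. The sketch below writes such a DAG for a generic generate/simulate/reconstruct chain and submits it with condor_submit_dag; the stage and submit-file names are placeholders, not McFarm's actual job definitions.

    # Minimal DAGMan sketch: write a .dag file chaining three MC stages and
    # hand it to condor_submit_dag. Stage and file names are placeholders.
    import subprocess
    from pathlib import Path

    # Each stage is assumed to have its own Condor(-G) submit description.
    STAGES = ["generate", "simulate", "reconstruct"]

    def write_dag(dag_path="mc_chain.dag"):
        lines = [f"JOB {stage} {stage}.submit" for stage in STAGES]
        # Enforce sequential execution: generate -> simulate -> reconstruct.
        for parent, child in zip(STAGES, STAGES[1:]):
            lines.append(f"PARENT {parent} CHILD {child}")
        Path(dag_path).write_text("\n".join(lines) + "\n")
        return dag_path

    if __name__ == "__main__":
        dag = write_dag()
        subprocess.run(["condor_submit_dag", dag], check=True)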
26 UTA-FNAL CSE Master's Student Exchange Program
- To establish usable Grid software on the DØ time scale, the project needs highly skilled software developers
- FNAL cannot afford computer professionals
- The UTA CSE department has 450 MS students → many are highly trained but back at school due to the economy
- Students can participate in frontier Grid computing topics in real-life situations
- Students' Master's theses become well-documented records of the work, something lacking in many HEP computing projects
- The first-generation students have completed their tasks and are defending their theses this semester
  - Abhishek Rana and Siddarth Patel
- Arrangements for the next generations of students (6 months at a time) are complete → looking for good candidate students
27 The UTA-DØGrid Team
- Faculty: Jae Yu, Kaushik De, David Levine (CSE)
- Senior Research Associate: Mark Sosebee
  - DØRACE setup management, Remote SAM Initiative, MC farm system management
- Research Associate: Tomasz Wlodek
  - Grid implementation of McFarm job submission and bookkeeping
- Software Program Consultant: Drew Meyer
  - Design of the Grid interface (InGrid), development of McFarm software, RAC software development and implementation
- CSE Master's degree students
  - Anand Balasubramanian: InGrid prototype development
  - Vivek Desai: InGrid C++ implementation
  - Kumaran Sambandan: network analysis
- EE volunteer student: Prashant Bhamidipati
  - MC farm operation
- CSE undergraduate student: Karthik Gopalratnam
  - Phasing out of MC farm operation and into McFarm Gridification
28 Conclusions
- The DØ detector is taking data at a rate of 2-3 million events per day
  - Data taking is progressing well (over 80% efficiency)
  - Awaiting the accelerator to deliver the promised luminosity
  - Software and analysis are now in the hot seat
- Exploiting available intelligence and resources in an extremely distributed environment points to a practical implementation of the Grid
- Three large steps in the practical implementation of the Grid:
  - Software distribution and setup
  - Distributed resource sharing
  - Transparent data access
- DØ is a perfect place to test the Grid in action
  - Need to expedite analyses in a timely fashion
  - Need to distribute data sets throughout the collaboration
- Some successes:
  - UTA McFarm deployment and job submission exercise
  - FNAL JIM monitoring package
29 Conclusions (Cont'd)
- The DØ Remote Analysis Effort and the UTA-DØGrid team have accomplished a great deal, including 5 presentations at TSAPS in October
- Will take a multi-pronged approach to accomplish the ultimate goal in a timely manner:
  - Expeditious implementation of a DØ RAC at UTA
  - Develop RAC support software via the DØ Grid effort
  - Use the system for data analyses
- The university has been supporting the UTA-DØGrid effort tremendously
- The UTA DØ Grid team is carrying the Grid torch together with the Fermilab team
30 In Retrospect
- UTA was the largest offsite farm in Run I, still is, and will continue to play a crucial role in DØ MC production
- The leadership is being taken over by the European institutions
- The recent appointment places UTA in a good position to play a leading role in the DØ remote analysis and Grid development effort
  - A European-dominated effort
  - UTA must be a flagship institution in the US effort
  - Must work together with the DØ computing and PPDG groups
- UTA plans to lead the DØ Offsite Analysis System establishment effort → will play a leading role in the DØ Grid development effort
- Plans to establish UTA as a DØ Regional Analysis Center:
  - Improved network bandwidth
  - Establishment of a scalable storage system
  - Resource management software
  - Storage management software
- Prepare UTA to become an ATLAS Tier 2 site