Transcript and Presenter's Notes

Title: Status of DØ Computing at UTA


1
Status of DØ Computing at UTA
DoE Site Visit, Nov. 11, 2002
Jae Yu, University of Texas at Arlington
  • Introduction 
  • The DØ Remote Computing Effort
  • DØRACE
  • DØRAC
  • DØGrid Software Development Effort
  • Conclusions

2
Introduction
  • Sharing a large data set with a large collaboration throughout the world requires prompt implementation of a Remote Analysis System
  • Remote computing capability at DØ was a primary focus of UTA's DØ Computing Grid effort in the past year
  • Leverage experience as the ONLY active US DØ MC farm
  • Home-grown MC farm control software handles job scheduling and CPU resource management
  • Farm data delivery system switched to network based
  • Mark will cover the details of farm operation during the last year
  • UTA was the primary leader of the DØRACE effort
  • Code distribution and setup system
  • The UTA DØ Grid team played a leadership role in the DØ Grid software development effort
  • Need to play a significant role in DØ Grid development and prepare for ATLAS

3
What do we want to do with the data?
  • Want to analyze data no matter where we are!!!

Location- and time-independent analysis
4
650 Collaborators, 78 Institutions, 18 Countries
5
What do we need?
  • Remote DØ software development environment
  • Allow remote participation in code development, which might soon become a bottleneck in expediting physics results
  • Allow remote analysis for histogram production
  • Allow a remote reconstruction or production environment
  • Optimized resource management tools
  • Allow maximal utilization of local resources
  • Allow tapping into available computing resources at other locations
  • Allow participation of remote resources in global reconstruction or production
  • Efficient and transparent data delivery and sharing
  • Allow location-independent access to data
  • Allow quicker access to sufficient statistics for initial analyses
  • Allow data sharing throughout the entire network of the collaboration
  • Minimize dependence on central data storage
  • Alleviate the load on central data storage and servers

6
Existing Grid Tools
  • Condor (http://www.condor.org): resource-finding batch control system
  • Globus (http://www.globus.org): middleware
  • GSI: Grid Security Infrastructure
  • MDS: Meta Directory Service for system monitoring
  • GRAM: Grid Resource Allocation Manager
  • GridFTP
  • Condor-G: high-level interface to the Grid
  • DAGMan: job chain manager (see the submission sketch after this list)
  • Hawkeye: status monitoring tools
  • And they are sprouting like bamboo after rain every day
  • These are not fully complete on the DØ time scale
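To make the interplay of a couple of these tools concrete, below is a minimal sketch that chains two Grid jobs with Condor-G and DAGMan. The gatekeeper contact string, executable paths, and file names are placeholders for illustration, not actual DØ configuration, and a working Condor installation is assumed.

```python
# Minimal sketch: chaining two Grid jobs with Condor-G and DAGMan.
# The gatekeeper contact, executables, and file names are placeholders
# for illustration, not actual DØ configuration.
import subprocess
from pathlib import Path

GATEKEEPER = "gatekeeper.example.edu/jobmanager-pbs"  # hypothetical site

def write_submit_file(name: str, executable: str) -> Path:
    """Write a Condor-G submit description for one Grid job."""
    path = Path(f"{name}.submit")
    path.write_text(
        "universe = globus\n"
        f"globusscheduler = {GATEKEEPER}\n"
        f"executable = {executable}\n"
        f"output = {name}.out\n"
        f"error  = {name}.err\n"
        f"log    = {name}.log\n"
        "queue\n"
    )
    return path

# Two stages of a toy MC chain: generation, then reconstruction.
gen = write_submit_file("mc_generate", "/usr/local/bin/mc_generate.sh")
rec = write_submit_file("mc_reconstruct", "/usr/local/bin/mc_reconstruct.sh")

# DAGMan description: reconstruction runs only after generation succeeds.
Path("mc_chain.dag").write_text(
    f"JOB generate {gen}\n"
    f"JOB reconstruct {rec}\n"
    "PARENT generate CHILD reconstruct\n"
)

# Hand the chain to DAGMan (requires Condor on the local machine).
subprocess.run(["condor_submit_dag", "mc_chain.dag"], check=True)
```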

7
DØRACE: Why Do We Need It?
  • DØ Remote Analysis Coordination Efforts (to be replaced by an organization called the DØ Offsite Analysis Task Force)
  • In existence to accomplish:
  • Setting up and maintaining the remote analysis environment
  • Promoting institutional contributions from remote sites
  • Allowing remote institutions to participate in data analysis
  • Preparing for the future of data analysis:
  • More efficient and faster delivery of multi-PB data
  • More efficient sharing of processing resources
  • Preparation for possible massive re-processing and MC production to expedite the process
  • Expedited physics result production

8
DØRACE Cont'd
  • Maintain self-sustained support amongst the remote institutions to construct a broader base of knowledge
  • Alleviate the load on experts by sharing the knowledge, allowing them to concentrate on preparing for the future
  • Improve communication between the central experiment site and the remote institutions
  • Minimize travel around the globe for data access
  • Sociological issues of HEP people at their home institutions and within the field
  • Prepare the necessary software for an extremely distributed computing environment
  • Primary goal is to allow individual desktop users to make significant contributions without being at the lab

9
Software Distribution (DØRACE Setup)
10
DØRACE Setup Strategy
  • Categorized remote analysis system setup by functionality:
  • Desktop only
  • A modest analysis server
  • Linux installation
  • UPS/UPD installation and deployment
  • External package installation via UPS/UPD (a small check sketch follows this list)
  • CERNLIB
  • KAI-lib
  • ROOT
  • Download and install a DØ release
  • Tar-ball for ease of initial setup?
  • Use of existing utilities for latest release download
  • Installation of CVS
  • Code development
  • KAI C++ compiler
  • SAM station setup
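A small verification sketch for the phased setup above: it only checks that the external tools a DØ release build expects are visible on the PATH. The executable names assumed here (paw for CERNLIB, root, cvs, and KCC for the KAI C++ compiler) are illustrative guesses, not an official DØRACE checklist.

```python
# Sketch: verify that the external packages a DØ release expects are
# visible before attempting the release download. The binary names are
# assumptions, not an official DØRACE checklist.
import shutil

REQUIRED_TOOLS = {
    "CERNLIB": "paw",    # assumed front-end binary for CERNLIB
    "ROOT": "root",
    "CVS": "cvs",
    "KAI C++": "KCC",
}

missing = {pkg: exe for pkg, exe in REQUIRED_TOOLS.items()
           if shutil.which(exe) is None}

if missing:
    for pkg, exe in missing.items():
        print(f"missing: {pkg} (expected executable '{exe}' on PATH)")
else:
    print("all external packages found; ready to install a DØ release")
```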

Phase IV: Data Delivery
11
Progressive
12
DØRACE Deployment Map (US and EU only)
Map legend: Processing Center / Analysis Site with SAM / Analysis Site without SAM / No DØRACE
13
DØRAM Hardware Infrastructure (DØRAC)
14
DØ Remote Analysis Model (DØRAM)
15
What is a DØRAC?
  • An institute with large, concentrated, and available computing resources
  • Many 100s of CPUs
  • Many 10s of TBs of disk cache
  • Many 100s of Mbytes of network bandwidth
  • Possibly equipped with HPSS
  • An institute willing to provide certain services
    to a few small institutes in the region
  • An institute willing to provide increased
    infrastructure as the data from the experiment
    grows
  • An institute willing to provide support personnel
    if necessary
  • Complementary to the central facility

http://www-hep.uta.edu/d0race/d0rac-wg/d0rac-final.pdf
16
What services does DØRAC provide?
  • Services to IACs:
  • Accept and execute analysis batch job requests
  • Provide cache and storage space
  • Store and provide access to desired data sets
  • Provide database access
  • Provide intermediary code distribution
  • Services to the Collaboration:
  • Generate and reconstruct MC data sets
  • Participate in re-reconstruction of data
  • I think we will be needing ab initio reconstruction as well
  • Provide manpower support for the above activities

17
Regional Analysis Center Requirements
  • Located in a geographically and infrastructurally sensible place
  • Sufficiently large bandwidth to FNAL, other RACs, and IACs
  • Large storage space (robotic and/or disk) to store:
  • 100% of TMB (greatly reduced data set) in each RAC
  • 100% of DST in the sum of all RACs, distributed randomly → most complementary to the CAC (a toy placement sketch follows this list)
  • Store MC data sets
  • Sufficiently large compute resources
  • Support for the infrastructure and maintenance
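A toy sketch of the storage layout above, under the assumption that every RAC keeps the full TMB sample while the DST files are spread across the RACs so that their union covers 100%. The RAC names follow sites mentioned later in the talk; the file lists are invented for the example.

```python
# Toy illustration of the storage layout: every RAC keeps the full
# thumbnail (TMB) sample, while the DST files are spread across the RACs
# so that their union covers 100% of the DSTs. File lists are invented.
import random

RACS = ["Karlsruhe", "Lyon", "UTA"]              # example RACs only
tmb_files = [f"tmb_{i:04d}" for i in range(12)]  # pretend TMB files
dst_files = [f"dst_{i:04d}" for i in range(12)]  # pretend DST files

placement = {rac: {"TMB": list(tmb_files), "DST": []} for rac in RACS}

random.seed(0)
for dst in dst_files:                            # random, non-overlapping spread
    placement[random.choice(RACS)]["DST"].append(dst)

for rac, tiers in placement.items():
    print(f"{rac}: {len(tiers['TMB'])} TMB files, {len(tiers['DST'])} DST files")
```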

18
Sears Model of Categorization
  • Best RACs:
  • Gbit or better network bandwidth
  • Robotic tape storage: 170 TB
  • Disk storage space: 110 TB
  • Compute resources: 50 CPU/year/RAC
  • Provide database proxy service
  • Cost: $1M/Run IIa
  • Good RACs:
  • Gbit or better network bandwidth
  • Disk storage: 60 TB
  • Compute resources: 50 CPU/year/RAC
  • Provide database proxy service
  • Cost: $300k/Run IIa
  • Better RACs: $300k-$1M/Run IIa

19
Other Issues
  • Obtaining personnel support commitments
  • Serious MOU structure to spell out commitments
  • Sharing resources with other experiments and disciplines
  • Emergency resource loans
  • Technical conflicts, such as differences in OS
  • Need a world-wide management structure
  • How do we resolve conflicts and allocate resources?
  • How is the priority between physics groups within the experiment determined?
  • How do we address issues that affect other experiments and disciplines?

20
DØRAC Implementation Status
  • DØ Offsite Analysis Task Force formed → central coordination body to establish the RACs
  • Karlsruhe in Germany has been selected as the first RAC
  • Full Thumbnail data files have been and are constantly being transferred
  • Data verification procedure in progress → being improved
  • Cluster associated IACs (German institutions and Imperial College, UK) and implement services to IACs
  • Monitor activities:
  • Data transfer from FNAL to Karlsruhe
  • Data access from IACs to RACs
  • IN2P3, Lyon, France, approved by the French management to become another RAC
  • UTA has submitted a joint HEP-CSE proposal to NSF and has been awarded $950k to establish a Distributed Computing Center (MRI-DCC) → first US RAC
  • BNL and UCR preparing to apply for funds to establish RACs

21
Special MC Farm Tasks
  • Unique to the UTA Farm
  • Tracking algorithm task force:
  • 100k MC events under 7 different algorithms
  • Performance testing: efficiency, CPU time, etc.
  • An essential task (or use case) for a RAC
  • Executables implemented at the UTA HEP Farm via a tarball
  • Data retrieved from the central SAM system
  • Automation process in progress (see the sketch after this list)
  • Re-reconstruct multiple times with various algorithms
  • Results delivered back into the SAM system automatically
  • Soon to be asked to carry out a second wave of tracking reconstruction
  • Calorimeter Calibration Task Force MC generation:
  • Multiple physics samples with various levels of underlying event and noise added
  • Still in progress
  • RAW data reprocessing needs access to various databases
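The automation is still in progress; the sketch below only illustrates the intended loop: stage the sample once, re-reconstruct it with each candidate algorithm, and hand the output back for storage. All three helper functions, the dataset name, and the algorithm labels are hypothetical placeholders, not the real McFarm or SAM interfaces.

```python
# Sketch of the intended automation loop for the tracking task force:
# retrieve the sample once, re-reconstruct it with each candidate
# algorithm, and return the output for storage. All helpers below are
# hypothetical placeholders, not the real McFarm or SAM interfaces.
from pathlib import Path

ALGORITHMS = [f"tracking_v{i}" for i in range(1, 8)]  # 7 candidate algorithms

def fetch_from_sam(dataset: str, workdir: Path) -> Path:
    """Placeholder: stage the input dataset from SAM into a work area."""
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir / dataset           # real version would invoke the SAM client

def run_reconstruction(sample: Path, algorithm: str) -> Path:
    """Placeholder: re-reconstruct the sample with one algorithm."""
    print(f"re-reconstructing {sample.name} with {algorithm}")
    return sample.with_suffix(f".{algorithm}.reco")

def store_to_sam(output: Path) -> None:
    """Placeholder: declare and store the output back into SAM."""
    print(f"storing {output.name} back into SAM")

def reprocess(dataset: str) -> None:
    sample = fetch_from_sam(dataset, Path("work"))
    for algorithm in ALGORITHMS:
        store_to_sam(run_reconstruction(sample, algorithm))

reprocess("tracking_tf_100k_events")   # hypothetical dataset name
```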

22
DØGrid Network Map
23
MRI-DCC
  • The MRI award allows for the construction of a distributed computing center (MRI-DCC) which acts as a DØRAC
  • The maximum capacity of the MRI-DCC is:
  • Close to 300 CPUs
  • Over 50 TB of disk space, if not 100 TB
  • Fast network access to storage resources
  • Multi-disciplinary research activity → promotes improvements in software and algorithm development
  • Immediate application to ATLAS

24
DØRAM Software Infrastructure (SAM-Grid)
  • DØ already has the data delivery part of a Grid system (SAM)
  • Project started in 2001 as part of the PPDG collaboration to handle DØ's expanded needs
  • Recently formed joint FNAL-DØ-CDF SAM-Grid project group
  • The current SAM-Grid team includes:
  • Andrew Baranovski, Gabriele Garzoglio, Lee Lueking, Dane Skow, Igor Terekhov, Rod Walker (Imperial College), Jae Yu (UTA), Drew Meyer (UTA), and Tomasz Wlodek, in collaboration with the U. Wisconsin Condor team
  • http://www-d0.fnal.gov/computing/grid
  • UTA proposed InGrid, a high-level interface to the Grid (see Drew's talk)
  • We have been getting a flux of CSE students to work on InGrid development

25
UTA DØ Grid Accomplishments
1. Submit DØ MC jobs via Condor-G + DAGMan
2. Collect MC generation status from each farm through MDS (McView) (a rough query sketch follows)
3. Improve McFarm to a Grid-enabled version (GEM) to fully utilize Grid tools → tested using Oklahoma Univ.; LTU and the Tata farm are very close to being ready
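A rough sketch of a McView-style status poll over MDS: query each farm's GRIS via LDAP and print whatever host attributes it reports. The farm hostnames, port, base DN, filter, and attribute names are assumptions for illustration, and the sketch uses the third-party ldap3 Python package rather than whatever McView actually uses.

```python
# Rough sketch of a McView-style status poll: query each farm's MDS
# server over LDAP and print the host attributes it reports. Hostnames,
# port, base DN, filter, and attributes are assumptions for illustration.
from ldap3 import Server, Connection   # third-party LDAP client

FARMS = ["mcfarm.example-a.edu", "mcfarm.example-b.edu"]  # hypothetical farms
MDS_PORT = 2135                        # conventional Globus MDS (GRIS) port
BASE_DN = "Mds-Vo-name=local, o=grid"  # conventional MDS search base

def poll_farm(host: str) -> None:
    """Query one farm's GRIS and print the entries it returns."""
    conn = Connection(Server(host, port=MDS_PORT), auto_bind=True)
    conn.search(BASE_DN, "(objectclass=*)",
                attributes=["Mds-Host-hn", "Mds-Cpu-Total-Free"])
    for entry in conn.entries:
        print(host, entry.entry_attributes_as_dict)
    conn.unbind()

for farm in FARMS:
    try:
        poll_farm(farm)
    except Exception as exc:           # unreachable farm, schema mismatch, ...
        print(f"{farm}: query failed ({exc})")
```
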
26
UTA-FNAL CSE Master's Student Exchange Program
  • In order to establish usable Grid software on the DØ time scale, the project needs highly skilled software developers
  • FNAL cannot afford computer professionals
  • UTA's CSE department has 450 MS students → many are highly trained but back in school due to the economy
  • Students can participate in frontier Grid computing topics in real-life situations
  • Students' Master's theses become a well-documented record of the work, something lacking in many HEP computing projects
  • The first-generation students have completed their tasks and are defending their theses this semester
  • Abhishek Rana and Siddarth Patel
  • Arrangements for the next generations of students (6 months at a time) completed → looking for good candidate students

27
The UTA-DØGrid Team
  • Faculty: Jae Yu, Kaushik De, David Levine (CSE)
  • Senior Research Associate: Mark Sosebee
  • DØRACE setup management, Remote SAM Initiative, MC Farm system management
  • Research Associate: Tomasz Wlodek
  • Grid implementation of McFarm job submission and bookkeeping
  • Software Program Consultant: Drew Meyer
  • Design of the Grid interface (InGrid), development of McFarm software, RAC software development and implementation
  • CSE Master's Degree Students:
  • Anand Balasubramanian: InGrid prototype development
  • Vivek Desai: InGrid C++ implementation
  • Kumaran Sambandan: network analysis
  • EE Volunteer Student: Prashant Bhamidipati
  • MC Farm operation
  • CSE Undergraduate Student: Karthik Gopalratnam
  • Phasing out of MC Farm operation and into McFarm Gridification

28
Conclusions
  • The DØ detector is taking data at a rate of 2-3 million events a day
  • Data taking is progressing well (over 80% efficiency)
  • Awaiting the accelerator to deliver the promised luminosity
  • Now software and analysis are in the hot seat
  • Exploiting available intelligence and resources in an extremely distributed environment points to a practical implementation of the Grid
  • Three large steps in a practical implementation of the Grid:
  • Software distribution and setup
  • Distributed resource sharing
  • Transparent data access
  • DØ is a perfect place to test the Grid in action
  • Need to expedite analyses in a timely fashion
  • Need to distribute data sets throughout the collaboration
  • Some successes:
  • UTA McFarm deployment and job submission exercise
  • FNAL JIM monitoring package

29
  • The DØ Remote Analysis Effort and the UTA-DØGrid team have accomplished a great deal, including:
  • 5 presentations at TSAPS in Oct.
  • Will take a multi-pronged approach to accomplish the ultimate goal in a timely manner:
  • Expeditious implementation of the DØ RAC at UTA
  • Develop RAC support software via the DØ Grid effort
  • Use the system for data analyses
  • The university has been supporting the UTA-DØGrid effort tremendously
  • The UTA DØ Grid team is carrying the Grid torch together with the Fermilab team

30
In Retrospect
  • UTA had the largest offsite farm in Run I and is still playing, and will keep playing, a crucial role in DØ MC production
  • The leadership is being taken away by the European institutions
  • The recent appointment places UTA in a good position to play a leading role in the DØ remote analysis and Grid development effort
  • A European-dominated effort
  • UTA must be a flagship institution in the US effort
  • Must work together with the DØ computing and PPDG groups
  • UTA plans to lead the DØ Offsite Analysis System establishment effort → will play a leading role in the effort for DØ Grid development
  • Plans to establish UTA as a DØ Regional Center:
  • Improved network bandwidth
  • Establishment of a scalable storage system
  • Resource management software
  • Storage management software
  • Prepare UTA to become an ATLAS Tier 2 site