Running Analysis on the GRID using AliEn, ROOT

Slides: 41
Provided by: pred84
Learn more at: https://uscms.org
1
Running Analysis on the GRID using AliEn, ROOT and PROOF
P. Buncic, F. Rademakers
2
Timeline
Functionality Simulation
Interoperability Reconstruction
Performance, Scalability, Standards Analysis
3
AliEn Architecture
[Diagram, arranged from low level to high level, in three groups: External software; AliEn Core Components and services; Interfaces]
[Diagram labels: Perl Core, Perl Modules, External Libraries, LDAP, Database Proxy, ADBI, File and Metadata Catalogue, Authentication, RB, FS, CE, SE, Config Mgr, Package Mgr, Logger, SOAP/XML, V.O. Packages and Commands, User Application, API (C/C++/perl), User Interface, CLI, GUI, Web Portal]
4
AliEn Components
AliEn Web of Collaborating Services
Modules and libraries
5
Command Interface
6
GUI AliEn Xfiles
7
Portal
  • http://alien.cern.ch
  • Generic Web portal
  • User can
    • interact with AliEn
    • submit jobs
    • check job status
  • Administrator can
    • configure system
    • monitor status
    • check syslog
    • update distribution

8
File catalogue
If you're a programmer, one of the great things
about Linux and Unix is that everything is a file
-- or at least acts like one. From devices to
sockets, the "everything is a file" paradigm has
served Unix well for a long, long time.
9
Resource Broker
Pull instead of traditional Push architecture
[Diagram labels: Authen, Broker, TransferBroker, TransferOptimiser, Logger, IS]
10
Job Execution and Scheduling
Pull model
Condor JDL
Components: Computing Element, Task Queue, Process Monitor, Cluster Monitor, Broker, Manager, Optimizer
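The pull model named on the two slides above can be sketched as follows. This is a minimal illustration, not AliEn code: the `Job`, `SiteOffer` and `TaskQueue` types and all of their fields are hypothetical, and the matching rule (one package name plus free disk space) is an assumption standing in for a real JDL match.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical job description; the fields are illustrative only.
struct Job {
    int id;
    std::string requiredPackage;  // software the job needs at the site
    std::size_t minFreeDiskMB;    // scratch space the job needs
};

// Hypothetical description of what a Computing Element (CE) advertises.
struct SiteOffer {
    std::string installedPackage;
    std::size_t freeDiskMB;
};

class TaskQueue {
public:
    void submit(const Job& job) { jobs_.push_back(job); }

    // Called by a CE when it has a free slot. Hands out the oldest job
    // whose requirements the offer satisfies; returns false if none fits,
    // in which case the CE simply asks again later.
    bool pull(const SiteOffer& offer, Job& out) {
        for (std::deque<Job>::iterator it = jobs_.begin(); it != jobs_.end(); ++it) {
            if (it->requiredPackage == offer.installedPackage &&
                it->minFreeDiskMB <= offer.freeDiskMB) {
                out = *it;
                jobs_.erase(it);
                return true;
            }
        }
        return false;
    }

    std::size_t pending() const { return jobs_.size(); }

private:
    std::deque<Job> jobs_;  // jobs wait here until some CE asks for work
};
```

The point of the design is that the queue never needs to know the state of any site: a job leaves the queue only when a site that can actually run it asks for work.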
11
Production Status
PPR production (2002-03)
http://alien.cern.ch/Alien/main?task=production
12
Near term plans
  • Grid federations
    • AliEn → EDG
    • AliEn → AliEn (just installed in Moscow, to be
      installed soon at OSC, Ohio)
    • AliEn → LCG-1 (collaboration with India)
  • OGSA
    • Currently working on AliEn Web Service API
    • Next step: AliEn Web Service → OGSA Grid Service
  • P2P
    • Exploring the possibility of using P2P technology
      (Jabber) as an alternative SOAP transport and
      resource discovery mechanism
  • Performance
    • Ongoing work on internal DB API
    • Investigating semantic query caching on the
      client side

13
Near term plans (cont.)
  • Monitoring
    • Deploying MonaLisa (collaboration with CMS)
    • 3D visualization and Grid control from mobile
      devices (collaboration with Ericsson)
  • Deployment
    • Virtual server
    • Remote Software Management (collaboration with
      Ericsson)
  • Analysis
    • SuperPROOF (collaboration with ROOT team and with
      help from HP)
      • SC2003
  • Simulation
    • Simulation of Web services based distributed
      computing environment (collaboration with UH and
      CMS/Monarc/MonaLisa)

14
Near term goal
  • SC2003
    • Demo
  • ALICE Physics Data Challenge
    • 1Q2004
    • 10% of resources
    • 300TB of data to be generated in 3 months using
      • LCG-1 resources
      • All other resources we can possibly get hold of
  • Our users (ALICE physicists) expect us to provide
    them with seamless and transparent access to the
    entire dataset (or large chunks of it) directly
    from the ROOT prompt

15
Two Analysis Scenarios
  • Asynchronous
    • Interactive batch
    • Job splitting and batch processing (transparent
      to end user)
    • Can be done using existing tools
      • AliEn + ROOT
      • Scheduled file transfers
  • True Interactive
    • Instantaneous analysis results
    • Needs
      • New functionality (AliEn + PROOF)
      • High system availability
      • Reliable and fast file transport mechanism
        including transparent file caching

16
  • Bits & Pieces

17
AliEn Catalogue
[Diagram: catalogue tree. Labels: alien/, alice/, atlas/, prod/, data/, mc/, a/, b/, file01.root (original file), mirror <A>, mirror <B>, soap://, root://, castor://]
  • We already have a File Catalogue based on
    federated (MySQL) databases
  • We need a reliable file transport mechanism to
    support interactive work in a distributed
    environment

18
File Access
open(alien/alice/data/file01.root)
- where is the file?
[Diagram: the catalogue tree (alien/, alice/, atlas/, prod/, data/, mc/, a/, b/) with file01.root pointing to three PFNs: the original file, mirror <A> and mirror <B>. Protocol labels: soap://, root://, castor://]
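The lookup behind "where is the file?" can be sketched as a map from a logical file name (LFN) to its registered physical file names (PFNs). This is only an illustration: the `Replica` and `FileCatalogue` types, the storage element names and the example URLs are all hypothetical, and the selection rule (prefer a replica at the client's closest storage element, otherwise fall back to the first registered copy) is an assumption rather than AliEn's documented policy.

```cpp
#include <map>
#include <string>
#include <vector>

// One physical copy of a file. Names and URLs used with it are made up.
struct Replica {
    std::string storageElement;  // e.g. "SE-A" (hypothetical name)
    std::string pfn;             // physical file name, e.g. a root:// or castor:// URL
};

class FileCatalogue {
public:
    // Register the original file or a mirror under a logical file name.
    void addReplica(const std::string& lfn, const Replica& replica) {
        entries_[lfn].push_back(replica);
    }

    // Returns the PFN held at closeSE if there is one, otherwise the first
    // registered replica. Returns an empty string for an unknown LFN.
    std::string resolve(const std::string& lfn, const std::string& closeSE) const {
        std::map<std::string, std::vector<Replica> >::const_iterator it = entries_.find(lfn);
        if (it == entries_.end() || it->second.empty()) return "";
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            if (it->second[i].storageElement == closeSE) return it->second[i].pfn;
        }
        return it->second.front().pfn;
    }

private:
    std::map<std::string, std::vector<Replica> > entries_;
};
```

A client would call `resolve` with the LFN passed to `open` and then hand the returned PFN to whichever protocol handler matches its URL scheme.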
19
Grid File Transfer Layer
[Diagram labels: User WS; SE1, SE2, SE3, each with an MSS; Local Transfer Layer; Global Transfer Layer]
  • Local File Access
    • on-site access from SE to Mass Storage System
    • many existing solutions (rootd/rfio/posix/dcache
      api etc.)
  • Global File Access
    • access/transfer between SEs and user
      workstations
    • gridFTP, bbFTP, xrootd ... ?

20
Scheduled File Transfers
Follows the model of job scheduling and execution
Components: Storage Element Transfer Daemon, Transfer Queue, Broker, Manager, Optimizer
21
Analysis Requirements
  • Certificate based authentication with ACLs as in
    File Catalogue
  • High/low-speed transfers (encrypted/unencrypted)
  • External/dynamic regulation of transfer speed per
    user/connection ('Network Weather Service')
  • Efficient and reliable enough to handle chaotic
    load from analysis
  • New AliEn I/O Service
    • Transfer re-routing through caches ('NWS')
    • distributed caches (>1 entry point)
    • distributed I/O servers (>1 entry point,
      redirection)
    • only 2 operations allowed
      • read according to catalogue permissions
      • write once, according to catalogue permissions
        => libAliEn
    • no directory manipulation/creation/deletion
      • done by SE service
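The restriction to two operations (read, and write once, both subject to catalogue permissions) can be sketched as below. The `IoService` class, its in-memory storage and the per-file user lists are hypothetical stand-ins; the real service would consult the File Catalogue ACLs and real storage.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical I/O service exposing only read and write-once.
class IoService {
public:
    // Stand-ins for ACL entries that would come from the File Catalogue.
    void allowRead(const std::string& lfn, const std::string& user) { readers_[lfn].insert(user); }
    void allowWrite(const std::string& lfn, const std::string& user) { writers_[lfn].insert(user); }

    // Write succeeds only for a permitted user and only if the file
    // does not exist yet (write once, no overwrite).
    bool write(const std::string& user, const std::string& lfn, const std::string& data) {
        if (!isListed(writers_, lfn, user)) return false;
        if (files_.count(lfn) != 0) return false;
        files_[lfn] = data;
        return true;
    }

    // Read succeeds only for a permitted user and an existing file.
    bool read(const std::string& user, const std::string& lfn, std::string& out) const {
        if (!isListed(readers_, lfn, user)) return false;
        std::map<std::string, std::string>::const_iterator it = files_.find(lfn);
        if (it == files_.end()) return false;
        out = it->second;
        return true;
    }

private:
    typedef std::map<std::string, std::set<std::string> > Acl;

    static bool isListed(const Acl& acl, const std::string& lfn, const std::string& user) {
        Acl::const_iterator it = acl.find(lfn);
        return it != acl.end() && it->second.count(user) != 0;
    }

    Acl readers_;
    Acl writers_;
    std::map<std::string, std::string> files_;  // in-memory stand-in for storage
};
```

There is deliberately no delete, rename or directory call in this interface, mirroring the slide's statement that such operations are left to the SE service.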

22
Crosslink-Cache
[Diagram labels: Client A, Main Cache Cluster, Regional Cache]
Cache levels: main, regional, local
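One way to read the three cache levels is as a nearest-first lookup. The sketch below assumes that a client checks the local cache first, then the regional cache, then the main cache cluster, and that a hit at a farther level is copied into the nearer ones. Both the lookup order and the copy-on-hit behaviour are assumptions, and the `CacheHierarchy` class is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

class CacheHierarchy {
public:
    enum Level { Local = 0, Regional = 1, Main = 2, Miss = 3 };

    CacheHierarchy() : levels_(3) {}

    // Place a file into one cache level.
    void store(Level level, const std::string& lfn) {
        assert(level != Miss);  // Miss is a lookup result, not a storage level
        levels_[level].insert(lfn);
    }

    // Returns the nearest level holding the file. On a hit, the file is
    // also copied into every nearer level so the next lookup is cheaper.
    Level fetch(const std::string& lfn) {
        for (std::size_t i = 0; i < levels_.size(); ++i) {
            if (levels_[i].count(lfn) != 0) {
                for (std::size_t j = 0; j < i; ++j) levels_[j].insert(lfn);
                return static_cast<Level>(i);
            }
        }
        return Miss;  // caller would fall back to the storage element
    }

private:
    std::vector<std::set<std::string> > levels_;  // index 0 = local, 2 = main
};
```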
23
Cache-And-Forward Server
[Diagram labels: Client A, Client B, host1port9999, host2port9999, host3port9999, API, I/O d, Off-site Cache, On-site Cache, Local Disk]
Supports load balancing, multithreading, I/O buffering
24
AliEn I/O Server
Secure (ACLs as set in the File Catalogue)
Interfaces to local MSS via AliEn MSI
Uses Network Weather Service and Cache discovery
25
The Global Grid File System
Transfer Layer
Data Catalogue
Storage Elements
DB
AliEn can make all storage resources distributed
worldwide appear as a single (albeit big) hard
disk
26
Case A
  • Interactive Batch Analysis

27
AliEnFS
  • AliEnFS is written as a module for LUFS, Linux
    Userland File System (http://lufs.sourceforge.net/)
  • Kernel module delegates VFS calls to various FS
    daemons, which run in user space, allowing easy
    use of existing cryptographic libraries

28
LUFS GridFS Extension
User Space
Offered as a free gift to LCG Project
29
AliEn + ROOT (A)
TGrid
TAlien : TGrid
  Authentication, Catalogue Browsing
TAlienFile : TFile
  ROOT File access via AliEn
TAlienAnalysis
  Parallel Grid Analysis Object
TAlienJob
  AliEn Job (belonging to TAlienAnalysis Object)
TAlienJobIO
  Managing File I/O for a specific analysis job
The Analysis Object
TAlienAnalysis
  - each Analysis Object is stored with a unique name in the user directory
  - can be reinstantiated anytime from a ROOT session
30
AliEn + ROOT (A)
[Workflow diagram]
USER provides: Analysis Macro, Input Files, Query for Input Data
  → new TAlienAnalysis Object
  → produces: List of Input Data Locations
  → Job Splitting: IO Object 1 for Site A, IO Object 2 for Site A, IO Object 1 for Site B, IO Object 1 for Site C
  → Job Submission: Job Object 1 for Site A, Job Object 2 for Site A, Job Object 1 for Site B, Job Object 1 for Site C
  → Execution
  → Histogram Merging, Tree Chaining
  → Results
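The job splitting step in this workflow can be sketched as grouping the input files by the site that holds them and cutting each group into subjobs. This is a minimal illustration under assumptions: the `InputFile` and `SubJob` types and the `maxFilesPerJob` limit are hypothetical, and a real splitter would also weigh file sizes and site load.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One input file and the site where a copy lives (illustrative types).
struct InputFile {
    std::string lfn;
    std::string site;
};

// One subjob: a site plus the files it should process there.
struct SubJob {
    std::string site;
    std::vector<std::string> files;
};

// Groups files by site, then splits each group into chunks of at most
// maxFilesPerJob files. Subjobs come out ordered by site name.
std::vector<SubJob> splitBySite(const std::vector<InputFile>& inputs,
                                std::size_t maxFilesPerJob) {
    assert(maxFilesPerJob > 0);
    std::map<std::string, std::vector<std::string> > bySite;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        bySite[inputs[i].site].push_back(inputs[i].lfn);
    }

    std::vector<SubJob> jobs;
    std::map<std::string, std::vector<std::string> >::const_iterator it;
    for (it = bySite.begin(); it != bySite.end(); ++it) {
        const std::vector<std::string>& files = it->second;
        for (std::size_t start = 0; start < files.size(); start += maxFilesPerJob) {
            std::size_t end = std::min(start + maxFilesPerJob, files.size());
            SubJob job;
            job.site = it->first;
            job.files.assign(files.begin() + start, files.begin() + end);
            jobs.push_back(job);
        }
    }
    return jobs;
}
```

With three files at site A, one at B, one at C and a limit of two files per job, this yields two subjobs for A and one each for B and C, the same shape as the diagram.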
31
C++ equivalent (A)
// connect and authenticate to the GRID Service "alien" as user
TGrid *alien = TGrid::Connect("alien", user, "", "");
// create a new analysis object (<unique ID>, <title>, #subjobs)
TAlienAnalysis *analysis = new TAlienAnalysis("pass001", "MyAnalysis", 10);
// set the program which executes the analysis macro/script
analysis->Exec("AliRoot.sh", "file:/home/peters/test.C");  // script to execute
analysis->Query("2002-10/V3.08.Rev.04/00110/galice.root?pt>0.2");
analysis->OutputFileAutoMerge(true);  // merge all produced .root files
analysis->Split();       // split the task into subjobs
analysis->Run();         // submit all subjobs to the AliEn queue
analysis->GetResults();  // download partial/final results and merge them
analysis->Info();        // display job information
32
Interactive batch
AliEn
ROOT
33
AliEn + ROOT
  • Uses AliEn API to split jobs and send them for
    execution (interactive batch)
  • Merges output files (histograms, trees)
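Merging the histograms returned by the subjobs amounts to adding bin contents. The sketch below shows that idea on plain vectors so it stays self-contained; the `Histogram` typedef and `mergeHistograms` function are hypothetical, and in a ROOT session one would more likely rely on ROOT's own facilities such as `TH1::Add`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bin contents only; a stand-in for a real histogram class.
typedef std::vector<double> Histogram;

// Adds the partial histograms bin by bin. All partial results are
// expected to share the same binning.
Histogram mergeHistograms(const std::vector<Histogram>& partials) {
    Histogram total;
    for (std::size_t p = 0; p < partials.size(); ++p) {
        const Histogram& part = partials[p];
        if (total.empty()) {
            total = part;  // first result defines the binning
            continue;
        }
        assert(part.size() == total.size());
        for (std::size_t bin = 0; bin < total.size(); ++bin) {
            total[bin] += part[bin];
        }
    }
    return total;
}
```

Because addition is order-independent, partial results can be merged as they arrive, which fits the "download partial/final results" step of the analysis object.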
34
Case B
True Interactive Analysis
35
True Interactive Analysis (B)
Super PROOF
PROOF Classic
36
AliEn + PROOF (B)
  • AliEn
    • data splitting
    • data access/replication
    • access control
  • SuperPROOF
    • Uses AliEn API (C++ API) to carry out job
      decomposition and collects output from slave
      PROOF servers
  • PROOF
    • Just like usual PROOF running on a local site
    • process control
      • static model: the population of PROOF daemons
        is maintained on dedicated sites/nodes
      • dynamic model: PROOF daemons are started on
        demand by AliEn using dedicated queues
      • mixed model: minimum population for fast
        response and dynamic start-up of PROOF daemons
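The three process control models can be expressed with one rule: keep at least a minimum population of daemons and grow on demand up to a limit. The sketch below is hypothetical throughout; in particular `maxPopulation` is an assumed cap that the slides do not mention. Under these assumptions the static model is the case where minimum and maximum are equal, and the dynamic model is the case where the minimum is zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical pool settings for PROOF daemons at one site.
struct ProofPool {
    std::size_t minPopulation;  // daemons kept running for fast response
    std::size_t maxPopulation;  // assumed upper limit for on-demand start-up
};

// Number of daemons that should be running for the requested worker count.
std::size_t targetPopulation(const ProofPool& pool, std::size_t requestedWorkers) {
    assert(pool.minPopulation <= pool.maxPopulation);
    return std::min(std::max(requestedWorkers, pool.minPopulation),
                    pool.maxPopulation);
}

// Number of additional daemons to start, given how many already run.
std::size_t daemonsToStart(const ProofPool& pool, std::size_t running,
                           std::size_t requestedWorkers) {
    std::size_t target = targetPopulation(pool, requestedWorkers);
    return target > running ? target - running : 0;
}
```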

37
Pre-started analysis services
[Diagram labels: Metadata Catalog Service, Replica Catalog Service, ROOT, SuperPROOF, Match-making Service, Information Service, PROOF, Worker nodes]
38
Analysis Requirements
  • Very preliminary personal estimate of satisfied
    requirements
  • Reflects the current status (no future plans for
    development)
  • Best and worst case scenarios
  • <Overall>
    • All requirements taken together (equal weight)
  • <Service>
    • Average of all services

39
Torres Excel chart
  • In principle, the architecture is similar
  • Workflow deviates in 2 scenarios
  • Difference
    • AliEn uses pull architecture
    • ROOT as analysis prompt
  • AliEn is self contained and provides all required
    components and services
    • But they have unusual and different names

40
Conclusions
  • In its present state, AliEn + ROOT can satisfy
    75-85% of user requirements
  • Adding SuperPROOF into the picture should further
    improve the situation
    • Possibility to do fast tests on a subset of a
      large dataset in true interactive fashion
  • We are open to suggestions and cooperation on all
    fronts