1
EPP Grid Activities
  • AusEHEP Wollongong
  • Nov 2004

2
Grid Anatomy
  • What are the essential components?
  • CPU Resources & Middleware (common software
    interface)
  • Data Resources & Middleware
  • replica catalogues unifying many data sources
  • Authentication Mechanism
  • Certificates (Globus GSI), Certificate
    Authorities (see the proxy sketch after this
    slide)
  • Virtual Organisation Information Services
  • Grid consists of VOs!? users & resources
    participating in a VO
  • Who is a part of what research/effort/group
  • Authorisation for resource use
  • Job Scheduling, Dispatch, and Information
    Services
  • Collaborative Information Sharing Services
  • Documentation & Discussion (web, wiki, ...)
  • Meetings & Conferences (video conf., AccessGrid)
  • Code & Software (CVS, CMT, PacMan...)
  • Data Information (Meta Data systems)

2nd Generation Grid: Globus
3rd Generation Grid
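
On the authentication item above: in practice, GSI authentication starts from a personal certificate signed by a recognised CA, plus a short-lived proxy created per session. A minimal sketch using the standard Globus Toolkit 2 command-line clients (the pass phrase and identity involved are your own; nothing here is site-specific):

  grid-cert-request                    # one-off: generate a key pair and a signing request for your CA
  grid-proxy-init                      # per-session: create a short-lived proxy (prompts for your pass phrase)
  grid-proxy-info -subject -timeleft   # inspect the proxy identity and its remaining lifetime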
3
2nd Generation
  • Accessible resources for Belle/ATLAS
  • We have access to around 120 CPUs (over 2 GHz)
  • APAC, AC3, VPAC, ARC
  • currently 50 Grid accessible
  • Continuing to encourage HPC facilities to install
    middleware
  • We have access to the ANUSF petabyte storage
    facility
  • Will request 100 TB for Belle data.
  • SRB (Storage Resource Broker)
  • Replica catalogue federating KEK/Belle, ANUSF,
    Melbourne EPP data storage
  • Used to participate in Belle's 4×10⁹ event MC
    production during 2004

4
2nd Generation
  • SRB (Storage Resource Broker)
  • Globally accessible virtual file system
  • Domains of storage resources
  • eg. ANUSF domain contains the ANU petabyte
    storage facility and disk on Roberts in Melbourne
  • Federations of Domains
  • eg. ANUSF and KEK are federated
  • Scd /anusf/home/ljw563.anusf
    Sls -l
    Sget datafile.mdst
    Scd /bcs20zone/home/srb.KEK-B
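
The reverse direction is symmetric. A small sketch with the standard SRB Scommands (the file name is illustrative):

  Sinit                 # open an SRB session using your local SRB configuration
  Sput myana.hbook .    # upload a local file into the current collection
  Sls -l                # confirm the file is registered
  Sexit                 # close the session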

5
Grid Anatomy
  • What are the essential components?
  • CPU Resources & Middleware
  • Data Resources & Middleware
  • replica catalogues unifying many data sources
  • Authentication Mechanism
  • Globus GSI, Certificate Authorities
  • Virtual Organisation Information Services
  • Grid consists of VOs!? users & resources
    participating in a VO
  • Who is a part of what research/effort/group
  • Authorisation for resource use
  • Job Scheduling, Dispatch, and Information
    Services
  • Collaborative Information Sharing Services
  • Documentation & Discussion (web, wiki, ...)
  • Meetings & Conferences (AccessGrid)
  • Code & Software (CVS, CMT, PacMan...)
  • Data Information (Meta Data systems)

3rd Generation Grid
6
3rd Generation Solutions
  • NorduGrid -> ARC (Advanced Resource Connector)
  • Nordic Countries plus others like Australia
  • We've used this for ATLAS DC2 (submission sketch
    after this slide)
  • Globus 2.4 based middleware
  • Stable, patched, and redesigned collection of
    existing middleware (Globus, EDG)
  • Grid3 Middleware -> VDT
  • US-based coordination between iVDGL, GriPhyN,
    PPDG
  • Globus 2.4 based middleware
  • LHC Computing Grid (LCG) <- EDG -> EGEE
  • Multiple Tiers: CERN T0, Japan/Taiwan T1,
    Australia T2?
  • Regional Operations Centre in Taiwan
  • Substantial recent development needs to be
    looked at once again!
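
For the ARC route mentioned above, submission goes through an xRSL job description. A minimal sketch (the job name, script, and time limit are made up for illustration):

  cat > job.xrsl <<'EOF'
  & (executable="run_sim.csh")
    (jobName="belle-mc-test")
    (cpuTime="60")
  EOF
  ngsub -f job.xrsl     # broker the job to a matching ARC resource
  ngstat -a             # poll the status of all our jobs
  ngget -a              # fetch the outputs of finished jobs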

7
3rd Generation Solutions
  • Still a lot of development going on.
  • data-aware job scheduling is still developing
  • VO systems are starting to emerge
  • meta-data infrastructure is basic
  • Deployment is still a difficult task.
  • middleware runs on prescribed systems/OSs only

8
Grid Anatomy
  • What are the essential components?
  • CPU Resources & Middleware
  • Data Resources & Middleware
  • replica catalogues unifying many data sources
  • Authentication Mechanism
  • Globus GSI, Certificate Authorities
  • Virtual Organisation Information Services
  • Grid consists of VOs!? users & resources
    participating in a VO
  • Who is a part of what research/effort/group
  • Authorisation for resource use
  • Job Scheduling, Dispatch, and Information
    Services
  • Collaborative Information Sharing Services
  • Documentation & Discussion (web, wiki, ...)
  • Meetings & Conferences (AccessGrid)
  • Code & Software (CVS, CMT, PacMan...)
  • Data Information (Meta Data systems)

VOs
9
Virtual Organisation Systems
  • Now there are 3 systems available
  • EDG/NorduGrid LDAP-based VO (see the
    grid-mapfile sketch after this slide)
  • VOMS (VO Membership Service) from LCG
  • CAS (Community Authorisation Service) from Globus
  • In 2003 we modified NorduGrid VO software for use
    with the Belle Demo Testbed, SC2003 HPC Challenge
    (world's largest testbed)
  • More useful for rapid Grid deployment than the
    above systems.
  • Accommodates Resource Owners' security policies
  • resource organisations are part of the community
  • their internal security policies are frequently
    ignored/by-passed
  • Takes CAs into account
  • certificate authorities are a part of the
    community
  • a VO should be able to list CAs who they trust to
    sign certificates
  • Compatible with existing Globus
  • Might be of use/interest to the Australian Grid
    community?
  • GridMgr (Grid Manager)
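
With an LDAP-based VO, authorisation on a resource largely reduces to generating the local Globus grid-mapfile from the VO directory. A hand-rolled sketch (the server, base DN, and mapped account are hypothetical; EDG's edg-mkgridmap tool automates the same idea):

  # pull member certificate subjects out of the VO directory ...
  ldapsearch -x -H ldap://vo.example.org -b "ou=belle,o=vo" "(objectClass=*)" description
  # ... and write one mapping line per member into /etc/grid-security/grid-mapfile:
  # "/C=AU/O=APACGrid/OU=EPP/CN=Jane Analyst" belleuser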

10
Virtual Organisation Systems
  • How do VOs manage internal priorities?
  • This problem has not yet become apparent!
  • This has been left up to local resource settings.
  • For non VO resources, changes would require
    allocation or configuration renegotiation.
  • CAS is the only VO middleware to address this
  • done by VOs specifying policies allowing/denying
    access to resources
  • local resource priorities are not taken into
    account
  • difficult to predict the effect
  • VO-managed job queue
  • centrally managed VO priorities, independent of
    locally managed resource priorities
  • resource job consumers pull jobs from the queue
  • the VO decides and can change which jobs are run
    first
  • results of prototype testing: a fair-share
    system could be used (sketch after this list)
  • users/groups are allocated a target fraction of
    all resources
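
The fair-share idea in one line: order queued work by how far each group is below its target fraction. A toy sketch over a made-up usage table (the file format is invented for illustration):

  cat > usage.txt <<'EOF'
  belle 0.50 0.30
  atlas 0.40 0.45
  other 0.10 0.05
  EOF
  # columns: group, target fraction, consumed fraction;
  # priority = target / consumed, and the highest-priority group pulls the next job
  awk '{ printf "%s %.2f\n", $1, $2 / ($3 + 0.01) }' usage.txt | sort -k2 -rn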

11
Grid Anatomy
  • What are the essential components?
  • CPU Resources & Middleware
  • Data Resources & Middleware
  • replica catalogues unifying many data sources
  • Authentication Mechanism
  • Globus GSI, Certificate Authorities
  • Virtual Organisation Information Services
  • Grid consists of VOs!? users & resources
    participating in a VO
  • Who is a part of what research/effort/group
  • Authorisation for resource use
  • Job Scheduling, Dispatch, and Information
    Services
  • Collaborative Information Sharing Services
  • Documentation & Discussion (web, wiki, ...)
  • Meetings & Conferences (AccessGrid)
  • Code & Software (CVS, CMT, PacMan...)
  • Data Information (Meta Data systems)

Job Scheduling
12
Data Grid Scheduling
  • Task -> Job1, Job2 ...
  • Job1 -> input replica 1, input replica 2 ...
  • Job1 Input -> CPU resource 1 ...
  • How do you determine what/where is best?

13
Data Grid Scheduling
  • What's the problem?
  • Try to schedule wisely
  • free resources, close to input data, fewer
    failures
  • Some resources are inappropriate
  • need to parse and check job requirements and
    resource info (RSL - Resource Specification
    Language)
  • Job failure is common (see the retry sketch
    after this list)
  • error reporting is minimal
  • need multiple retries for each operation
  • need to try other resources in case of resource
    failure
  • eventually we stop and mark a job as BAD
  • What about firewalls?
  • some resources have CPUs which cannot access data
  • Schedulers
  • Nimrod/G (parameter sweep, not Data Grid)
  • GridBus Scheduler (during 2003-2004 we aided them
    towards SRB)
  • GQSched (prototype developed in 2002, used in
    2003 demos)
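
The retry logic above has to wrap every remote operation. A bare-bones sketch around the GT2 client globus-job-run (the host names are placeholders, and myjob.csh is assumed to be already staged on each resource):

  #!/bin/sh
  for host in cluster1.example.org cluster2.example.org; do
    for attempt in 1 2 3; do
      # success on any resource/attempt ends the job
      globus-job-run "$host" /bin/csh myjob.csh && exit 0
    done
  done
  echo "all resources failed: marking job BAD" >&2
  exit 1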

14
Data Grid Scheduling
  • GQSched (Grid Quick Scheduler)
  • Idea is based around the Nimrod model (user
    driven parameter sweep dispatcher)
  • Addition of sweeps over data files and
    collections
  • Built in 2002 as a demonstration to computer
    scientists of simple data grid scheduling
  • Simple tool familiar to Physicists
  • Shell script, Environment parameters
  • Data Grid Enabled
  • Seamless access to data catalogues and Grid
    storage systems
  • Protocols: GSIFTP, GASS (and also non-Grid
    protocols: HTTP, HTTPS, FTP)
  • Catalogues: GTK2 Replica Catalog, SRB (currently
    testing)
  • Scheduling based on metrics for CPU Resource /
    Data Resource combinations
  • previous failures of job on resource
  • nearness of physical file locations (replicas)
  • resource availability
  • Extra features
  • Pre- and Post-processing for preparation/collation
    of data and job status checks
  • Creation and clean-up of unique job execution
    area
  • Private-network-friendly staging of files for
    specific resources (3-stage jobs)
  • Automatic retry and resubmit of jobs

15
Grid Scheduling
  • gqsched myresources myscript.csh

#!/bin/csh -f
#Param MYFILE GridFile srb:/anusf/home/ljw563.anusf/proc1/*.mdst
#StageIn $MYFILE
#StageIn recon.conf event.conf
#StageIn particle.conf
echo Processing Job $JOBID on $MYFILE on host `hostname`
basfexec -v b20020424_1007 << EOF
path create main
module register user_ana
path add_module main user_ana
initialize
histogram define somehisto.hbook
process_event $FILE 1000 $EVENTSKIP
terminate
EOF
echo Finished JobID $JOBID .
#StageOut output.mdst srb:/anusf/home/ljw563.anusf/procout1/
#StageOut myana.hbook myana.$JOBID.hbook
16
Grid Anatomy
  • What are the essential components?
  • CPU Resources & Middleware
  • Data Resources & Middleware
  • replica catalogues unifying many data sources
  • Authentication Mechanism
  • Globus GSI, Certificate Authorities
  • Virtual Organisation Information Services
  • Grid consists of VOs!? users & resources
    participating in a VO
  • Who is a part of what research/effort/group
  • Authorisation for resource use
  • Job Scheduling, Dispatch, and Information
    Services
  • Collaborative Information Sharing Services
  • Documentation & Discussion (web, wiki, ...)
  • Meetings & Conferences (AccessGrid)
  • Code & Software (CVS, CMT, PacMan...)
  • Data Information (Meta Data systems)

Meta-Data
17
Meta-Data System
  • Advanced Meta-Data Repository
  • Advanced: above and beyond file/collection-
    oriented meta-data
  • Data oriented queries
  • List the files resulting from task X.
  • Retrieve the list of all simulation data of
    event type X.
  • How can file X be regenerated? (if lost or
    expired)
  • Other queries we can imagine
  • What is the status of job X?
  • What analyses similar to X have been undertaken?
  • What tools are being used for X analysis?
  • Who else is doing analysis X or using tool Y?
  • What are the typical parameters used for tool X?
    And for analysis Y?
  • Search for data skims (filtered sets) that are
    supersets of my analysis criteria.

18
Meta-Data System
  • XML
  • some great advantages
  • natural tree structure
  • strict schema, data can be validated
  • powerful query language (XPath)
  • format is very portable
  • information readily transformable (XSLT)
  • some real disadvantages
  • XML databases are still developing, not scalable
  • XML DBs are based on lots of documents of the
    same type
  • Would need to break tree into domains, query
    becomes difficult
  • LDAP
  • compromise
  • natural tree structure
  • loose schema but well defined
  • reasonable query feature, not as good as XML
  • very scalable (easily distributed and mirrored)
  • information can be converted to XML with little
    effort if necessary
  • structure/schema is easily accessible and
    describes itself! (see the LDIF sketch after
    this list)
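
To make the LDAP option concrete: a minimal sketch of storing and querying per-job meta-data with the standard OpenLDAP clients. The server, base DN, and attribute names are invented for illustration and presume a matching schema on the server:

  cat > job.ldif <<'EOF'
  dn: jobId=12345,ou=metadata,o=epp
  objectClass: extensibleObject
  jobId: 12345
  taskName: mc-prod
  eventType: charm
  outputFile: srb:/anusf/home/ljw563.anusf/procout1/output.mdst
  EOF
  # upload the entry ...
  ldapadd -x -H ldap://mdserver.example.org -D "cn=admin,o=epp" -W -f job.ldif
  # ... then "list the files resulting from task mc-prod":
  ldapsearch -x -H ldap://mdserver.example.org -b "ou=metadata,o=epp" "(taskName=mc-prod)" outputFile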

19
Meta-Data System
  • Components
  • Navigation, Search, Management of MD
  • Task/Job/Application generated MD
  • Merging and Uploading MD

[Diagram: user-supplied and software-generated meta-data are merged and uploaded into an LDAP Server]
20
Meta-Data System
  • Navigation and Creation via Web
  • Search is coming

21
How to use it all together?
  • Getting set up
  • Certificate from a recognised CA (VPAC)
  • Accounts on each CPU/storage resource
  • ANUSF storage, VPAC, ARC (UniMelb), APAC
  • Install required software on resources (eg. BASF)
  • Your certificate in the VO system
  • Running jobs (see the session sketch at the end)
  • Find SRB input files, set up output collection
  • Convert your scripts to GQSched scripts
  • Run GQSched to execute jobs
  • Meta Data
  • Find/Create a context for your tasks (what you
    are currently doing)
  • Submit this with your job, or store with output
  • Merge context & output meta-data, then upload
  • NOT COMPLETE: need auto-generated MD from
    BASF/jobs
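
Putting the pieces together, one pass through the workflow might look like the following (the collection paths follow the earlier slides; the rest is illustrative):

  grid-proxy-init                            # authenticate with your certificate
  Sinit                                      # open an SRB session
  Sls /anusf/home/ljw563.anusf/proc1         # locate the input files
  Smkdir /anusf/home/ljw563.anusf/procout1   # set up the output collection
  Sexit
  gqsched myresources myscript.csh           # dispatch and monitor the jobs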