Title: The new EGRID infrastructure
The new EGRID infrastructure
- An update on the status of the EGRID project
The new EGRID infrastructure
- The EGRID project
  - To implement an Italian national grid facility for processing economic and financial data.
  - The underlying fabric on top of which partner projects develop economic and financial applications.
The new EGRID infrastructure
- Summary
  - Original user requirements
  - The first EGRID release
  - Operating problems
  - Redesigning EGRID
  - A web portal to access EGRID
I. Original user requirements
Original user requirements
- HW infrastructure to store and manage 2 TB of stock exchange data: NYSE, LSE, Borsa di Milano, etc.
- Privacy: legally binding disclosure policies.
  - Users do not have the same read rights: one research group has a contract with NYSE for a specific company, another group has a contract with LSE for all companies, etc.
  - Two classes of users: those that upload raw stock exchange data and may remove it, and those that work on the data.
- Facility organised for raw data pre-processing and end-user applications.
II. The first EGRID release
The first EGRID release
- Meeting the HW infrastructure requirement
  - Bulk computing power and bulk storage rented from INFN Padova, part of the Physics grid!
  - Employed the same EDG middleware that INFN uses.
- Two-tiered topology dictated by network connectivity
  - Partner projects have limited connectivity: peripheral sites were installed to supply local services.
    - Cache area for large data transfers.
    - Job execution points for non-CPU-intensive data processing.
  - INFN Padova has good connectivity: it supplies services to the whole community.
The first EGRID release
- Meeting data privacy: EDG's data access mechanism implied critical and fragile fine-tuning.
  - Classic SE: local files exposed through GridFTP.
  - GridFTP allows file manipulation compatible with the underlying Unix filesystem permissions.
- The underlying filesystem must be carefully managed
  - Users mapped to specific local accounts, not pool accounts.
  - Users partitioned into specially created groups that reflect data access patterns.
  - A carefully crafted directory tree guides data access.
  - Users have the same UID across all SEs.
  - Replication/synchronisation of the directory structure across all SEs.
  - Users supplied with tools to manage permissions coherently across all SEs.
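As an illustration of why this scheme depends on careful group and permission management, the following is a minimal sketch of the classic Unix read check (owner bits, then group bits, then other bits). The UIDs, GIDs and the idea of one group per data contract are hypothetical values chosen for the example; they are not taken from the EGRID configuration.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalUser:
    uid: int
    gids: frozenset  # all groups the local account belongs to


@dataclass(frozen=True)
class FileEntry:
    owner_uid: int
    group_gid: int
    mode: int  # permission bits, e.g. 0o640


def can_read(user: LocalUser, entry: FileEntry) -> bool:
    """Classic Unix rule: the first matching class (owner, group, other) decides."""
    if user.uid == entry.owner_uid:
        return bool(entry.mode & stat.S_IRUSR)
    if entry.group_gid in user.gids:
        return bool(entry.mode & stat.S_IRGRP)
    return bool(entry.mode & stat.S_IROTH)


# Hypothetical setup: GID 2001 stands for "has a contract for this dataset".
NYSE_CONTRACT_GID = 2001
LSE_CONTRACT_GID = 2002

uploader = LocalUser(uid=1001, gids=frozenset({NYSE_CONTRACT_GID}))
researcher_a = LocalUser(uid=1002, gids=frozenset({NYSE_CONTRACT_GID}))
researcher_b = LocalUser(uid=1003, gids=frozenset({LSE_CONTRACT_GID}))

raw_file = FileEntry(owner_uid=1001, group_gid=NYSE_CONTRACT_GID, mode=0o640)

a_can_read = can_read(researcher_a, raw_file)  # member of the contract group
b_can_read = can_read(researcher_b, raw_file)  # different contract, no access
```

Because the decision rests entirely on numeric UIDs and GIDs, the same accounts, groups and directory permissions have to exist identically on every SE, which is the management burden the slide describes.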
The first EGRID infrastructure
- Meeting the pre-processing requirement: supported with a tailor-made wrapper component.
  - Developers can more easily grid-enable pre-processing operations.
  - Users can more easily run grid pre-processing on given datasets.
  - Common Unix commands such as cat, cut and grep were adapted to operate on grid-stored files.
The first EGRID infrastructure
- Meeting user needs
  - User applications are specific to research interests: programmes and function libraries were developed to aid porting of applications.
  - To facilitate installation of grid client SW, LiveCD technology was employed.
III. Operating problems
Operating problems
- HW infrastructure
  - Only one large computing site: insufficient to demonstrate grid potential for distributed resource allocation.
  - Two-tiered topology problematic: maintenance tasks fell to a designated local user, and EGRID could not dedicate enough manpower to the job.
Operating problems
- Privacy
  - EDG and its successor middleware LCG still lacked a data access mechanism strong enough for EGRID.
  - The implemented solution is complex and does not scale: a real account for each user in each SE, permissions on the filesystem make tree replication tricky, etc.
  - The middleware did not allow a solution in line with a pervasive grid view.
Operating problems
- User needs
  - Only a small part of the community used the tailor-made command line tools.
  - UI distributed on LiveCD spared users a workstation reinstallation, but
    - users complained of awkward usage
    - it interfered with their usual way of working
IV. Redesigning EGRID
Redesigning EGRID
- Driving factors
  - Leaner and more general infrastructure
  - Robust privacy
  - Thoroughly re-examined grid usability
Redesigning EGRID
- HW infrastructure
  - Added a second large computing centre: INFN Catania.
  - Dropped the two-tiered topology.
Redesigning EGRID
- Privacy
  - Classic SE replaced with a specific implementation of the Storage Resource Manager (SRM) protocol, currently being completed.
  - The implementation is the result of the StoRM collaboration with INFN-CNAF.
  - Not a proprietary solution: SRM is becoming the standard for grid disk access, so the security solution is compatible with mainstream grid trends.
Redesigning EGRID
- How StoRM solves privacy
  - All file requests are brokered with the SRM protocol.
  - When StoRM receives an SRM request for a file:
    - StoRM asks the policy source for the access rights to the given file for the given grid credentials.
    - The check is made at the grid credential level, not at the local user level as before!
    - Physical enforcement through just-in-time ACL setup:
      - By default files have no ACLs set up: no user can access them.
      - The local Unix account corresponding to the grid credentials is determined.
      - An ACL granting the requested access is set up for the local user.
      - The ACL is removed when the file is no longer needed.
  - StoRM leverages the grid's LogicalFileCatalogue (LFC) as policy source: compatible with mainstream grid trends.
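The just-in-time ACL sequence described on this slide can be sketched as a small in-memory simulation. This is not StoRM's code or API: the class name, the dictionary-based policy source, the account map, and the example credentials and paths are all assumptions made for illustration, and a real implementation would set filesystem ACLs rather than update a dictionary.

```python
from contextlib import contextmanager


class AccessDenied(Exception):
    """Raised when the policy source does not grant the requested right."""


class JustInTimeAcl:
    """Toy broker: grant an ACL only for the duration of one request."""

    def __init__(self, policy, account_map):
        self.policy = policy            # {(grid credential, path): {rights}}
        self.account_map = account_map  # {grid credential: local Unix account}
        self.acls = {}                  # {path: {local account: {rights}}}

    @contextmanager
    def request(self, credential, path, right):
        # 1. The check is made against the grid credential, not a local user.
        if right not in self.policy.get((credential, path), set()):
            raise AccessDenied(f"{credential} may not {right} {path}")
        # 2. Determine the local account that corresponds to the credential.
        local_user = self.account_map[credential]
        # 3. Set up an ACL granting exactly the requested access.
        self.acls.setdefault(path, {}).setdefault(local_user, set()).add(right)
        try:
            yield local_user
        finally:
            # 4. Remove the ACL once the file is no longer needed.
            entries = self.acls[path]
            entries[local_user].discard(right)
            if not entries[local_user]:
                del entries[local_user]
            if not entries:
                del self.acls[path]


# Hypothetical credentials, accounts and file path.
policy = {("/O=Example/CN=Alice", "/data/nyse/acme.csv"): {"read"}}
accounts = {"/O=Example/CN=Alice": "egrid001", "/O=Example/CN=Bob": "egrid002"}
broker = JustInTimeAcl(policy, accounts)

with broker.request("/O=Example/CN=Alice", "/data/nyse/acme.csv", "read") as user:
    acl_during = {p: {u: set(r) for u, r in e.items()} for p, e in broker.acls.items()}
acl_after = dict(broker.acls)

try:
    with broker.request("/O=Example/CN=Bob", "/data/nyse/acme.csv", "read"):
        bob_allowed = True
except AccessDenied:
    bob_allowed = False
```

The sketch deliberately ignores concurrent requests for the same file and right; handling those would need reference counting or a similar mechanism.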
Redesigning EGRID
- Completing data privacy
  - ELFI tool developed to allow access to grid files through the classic POSIX I/O software interface.
  - ELFI is a FUSE filesystem implementation: grid resources are seen through local mount points.
  - ELFI speaks the SRM protocol: there is a lack of SRM clients.
Redesigning EGRID
- ELFI allows more
  - All existing file management tools work automatically with grid files
    - Text tools: cat, grep, etc.
    - Graphical tools: Konqueror, etc.
  - Helps RAD/prototyping: developers do not have to learn new APIs when porting applications.
  - At sites supporting ELFI on WNs, applications are spared the need to explicitly run grid file transfer commands.
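To make the "no new APIs" point concrete, here is a minimal sketch of a grep-like helper written with nothing but ordinary file I/O. On a site where ELFI exposes grid storage under a local mount point, the same function would simply be given a path below that mount point. In this example a temporary directory stands in for the mount point, and the file name and contents are invented.

```python
import tempfile
from pathlib import Path


def grep_lines(path, needle):
    """Return the lines of a text file that contain `needle`, using plain POSIX I/O."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]


# A temporary directory plays the role of the (hypothetical) ELFI mount point.
with tempfile.TemporaryDirectory() as mount_point:
    sample = Path(mount_point) / "trades.csv"
    sample.write_text("ACME,10.5\nOTHER,3.2\nACME,10.7\n", encoding="utf-8")
    matches = grep_lines(sample, "ACME")
```

Nothing in `grep_lines` knows about the grid: that is the property that lets existing tools and ported applications work unchanged.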
Redesigning EGRID
- Grid usability
  - A web portal is the key solution: portals have long proved to be effective ways to allow user interaction with an organisation's information system.
  - Old command line tools will remain
    - For backwards compatibility.
    - For the few users that eagerly adopted them.
  - New development will concentrate on the web portal.
V. A web portal to access EGRID
A web portal to access EGRID
- Main entrance to the new EGRID infrastructure.
- All tools in one place: graphical UI
  - Closer to the users' way of working.
  - Lowers resistance to new technology.
- No need to install grid SW on the user's workstation
  - Interaction through the portal as displayed in a web browser.
- P-grade chosen as portal technology
  - Sufficiently sophisticated as a starting point to meet EGRID requirements.
  - Does not fully meet EGRID requirements: extra development needed.
A web portal to access EGRID
- P-grade's GUI simplifies many routine tasks and masks complexity
  - No need to manually handle job identification strings.
  - Display keeps track of launched jobs and their status, and allows output retrieval, job cancelling, etc.
  - Easily choose the Broker for automatic job submission, or specific CEs.
  - Enough flexibility to allow direct JDL attribute specification.
  - Graphical browsing of grid resources and file management: no need for distinct tools.
A web portal to access EGRID
- P-grade portal adds new functionality
  - Although MPI jobs can also be run from the CLI, P-grade supplies a special API that allows a graphical report on such jobs to be displayed.
  - Workflow manager
    - Graphically specify several jobs.
    - Define connections among them showing data flow.
    - Portal takes care of retrieving job output and feeding it to linked jobs.
    - Monitoring of the workflow is done graphically, showing data flow.
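The workflow idea (jobs connected by data flow, with each job's output fed to the jobs linked to it) can be sketched as a dependency graph executed in topological order. This is only an illustration of the concept, not P-grade's workflow engine: the function name, job names and toy computations are assumptions, and jobs run locally as plain Python callables instead of being submitted to the grid.

```python
from graphlib import TopologicalSorter


def run_workflow(jobs, links):
    """Run jobs in dependency order and pass upstream outputs downstream.

    jobs:  {job name: callable taking a dict {upstream name: output}}
    links: {job name: list of upstream job names whose output it consumes}
    """
    graph = {name: set(links.get(name, ())) for name in jobs}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        # Retrieve the outputs of linked jobs and feed them to this job.
        inputs = {upstream: outputs[upstream] for upstream in links.get(name, ())}
        outputs[name] = jobs[name](inputs)
    return outputs


# Hypothetical three-step workflow: load -> square -> total.
jobs = {
    "load": lambda inputs: [1, 2, 3],
    "square": lambda inputs: [x * x for x in inputs["load"]],
    "total": lambda inputs: sum(inputs["square"]),
}
links = {"square": ["load"], "total": ["square"]}

results = run_workflow(jobs, links)
```

`graphlib` is in the Python standard library from version 3.9; it raises `CycleError` if the links form a loop, which corresponds to a workflow that could never start.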
A web portal to access EGRID
- Extra development needed
  - Improved proxy management
  - SRM data management
  - SRM support in Workflow
  - Support for special workflow jobs: swarm jobs
A web portal to access EGRID
- Improved proxy management
  - P-grade first uploads the user's private key to the host where the Portal resides, then transfers it to the MyProxy server.
  - To lower security risks, EGRID needs the key to be transferred directly from the user's workstation to the MyProxy server.
  - A Java WebStart application was developed by EGRID and seamlessly integrated into P-grade's credentials portlet.
A web portal to access EGRID
- SRM data management
  - P-grade allows browsing of files in a classic SE and of files local to the user's workstation.
  - P-grade does not support SRM, and does not support browsing of files on the portal hosting machine.
  - ELFI allows access to StoRM through a local mount point.
  - It is easier to write a portlet that allows browsing of portal-local resources than one that deals with the new SRM protocol.
  - EGRID developed a new portlet to allow such browsing.
A web portal to access EGRID
- SRM support in Workflow
  - Workflow definition requires input and output files to be defined for each job.
  - For each file, its location must be specified.
    - P-grade supports classic SEs and the user's workstation.
    - SRM is not supported.
  - New file location support in P-grade: the host containing the portal itself. StoRM will be accessed through the ELFI local mount point!
  - Ongoing collaboration with P-grade developers to better define the requirement and study feasibility.
A web portal to access EGRID
- Swarm Workflow jobs
  - Swarm jobs: an application is run repeatedly on different datasets; a final job collects the results and carries out the final aggregate computation.
  - Currently P-grade workflows allow only manual job parameter specification: an automatic mechanism is needed.
  - This feature is already present in P-grade's release schedule.
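The swarm pattern described above (one application, many datasets, one final collecting job) can be sketched as follows. This is not P-grade functionality: the function names, the dataset names and the toy "mean price" application are assumptions, and a thread pool stands in for grid job submission.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_swarm(application, datasets, collector):
    """Run `application` once per dataset, then aggregate with `collector`.

    datasets:  {dataset name: data}
    collector: callable taking {dataset name: partial result}
    """
    names = sorted(datasets)
    # One job per dataset is generated automatically from the dataset list.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(application, (datasets[name] for name in names)))
    # The final job collects all partial results and aggregates them.
    return collector(dict(zip(names, partials)))


# Hypothetical datasets: prices recorded on two trading days.
datasets = {"day1": [10.0, 12.0], "day2": [20.0, 22.0]}

highest_daily_mean = run_swarm(
    application=mean,
    datasets=datasets,
    collector=lambda partials: max(partials.values()),
)
```

The point of the automatic mechanism is visible in `run_swarm`: adding a dataset to the dictionary adds a job, with no per-job parameters to fill in by hand.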
A web portal to EGRID
- Possible drawback
  - Java technology is used extensively, also on the client side: Applets and Java WebStart are used for certain operations, so users must have a Java Virtual Machine installed.
  - Given the ubiquitous nature of Java, this should not be a big problem.
Acknowledgements
- StoRM collaboration with INFN-CNAF of the grid.IT project: Dr. Mirco Mazzuccato, Dr. Antonia Ghiselli.
- P-grade team headed by Prof. Peter Kacsuk of MTA Sztaki, Hungarian Academy of Sciences.
- EGRID project leaders: Dr. Alvise Nobile of ICTP, Dr. Stefano Cozzini of INFM Democritos.
- EGRID team: Alessio Terpin, Angelo Leto, Antonio Messina, Ezio Corso, Riccardo di Meo, Riccardo Murri.