Title: The GridPort Toolkit: a System for Building Grid Portals
1. The GridPort Toolkit: a System for Building Grid Portals
- M. Thomas, S. Mock, M. Dahan, K. Mueller, D. Sutton
- San Diego Supercomputer Center, UCSD
- and
- J. Boisseau
- Texas Advanced Computing Center, Univ. of Texas at Austin
- Presented at the 10th IEEE International Symposium on High Performance Distributed Computing
- 7-9 August 2001, San Francisco, CA
2. Outline
- Intro/Background/Motivation
- The GridPort Toolkit
- GridPort-based Application Portals
- Web Services Experiments
- Future Work/Conclusions
- PDA demo (iPAQ): https://hotpage.npaci.edu/pda
3. Motivation
- The computational science environment is complex
- Users now have access to a variety of distributed resources
- Interfaces to these resources vary and change often
- Policies, accounts, etc. differ across sites/orgs
- Computational scientists are not computer scientists
- Provide universal, easy access to resources
- Focus on keeping the GCE simple, easy to use, and easily accessible
- Users of GridPort-based portals need no software downloads or client-side configuration changes; portals run in common web browsers
- Driving philosophy
- Focus on Grid users and developers who will benefit from simple portals and portal technologies
- Reduce the workload on Grid users and Grid application developers
- Support users and smaller projects
4. A Few Grid Resources
5. Evolution of GridPort/HotPage
- 1997-1998 (the intern years)
- NPACI HotPage project started: informational services
- 1999
- Informational HotPage installed at other sites
- Globus Toolkit -> interactive services (beta, GRAM/GSI)
- Formed GridPort Toolkit project: technology transfer
- 2000
- Developed GridPort v1.0 to support application portals (LAPK, GAMESS)
- User portal collaboration -> GSI across NPACI, Alliance, NASA/IPG
- PDA version
- 2001
- Released GridPort Toolkit v2.0 for public use
- Session state, single login, SRB integration coupled to GSI
- Web services experiments
- HotPage updated interface supporting PACI/NPACI personalization
- Creating Perl package version of GridPort
- Created Globus Perl CoG Module
6. GridPort Design Requirements
- Universal access
- Portals will be web-based and must support old browsers
- Portals must run anywhere, anytime, and leave no data behind
- Require no downloads, plug-ins, or applications
- Technology transfer
- GridPort is a Grid jump-start kit
- Leverage infrastructure provided by the World Wide Web
- Use common Grid technologies and standards
- Minimize impact on already-burdened resource administrators
- The GridPort Toolkit should not require additional services to be run on the HPC systems
- Provide a scalable and flexible infrastructure
- Facilitate adding/removing Grid resources, services, jobs, and users
7. GridPort Design Requirements (cont.)
- Security
- Support HTTPS/SSL encryption at all layers and provide access control, based on GSI
- Single login
- Required for easy access/navigation across Grid resources
- Client applications and portal services should be able to run on separate webservers
- Enable scientists to build their own application portals and use existing portals for common infrastructure services
- Any site should be able to host a portal
- Any user should be able to create their own portal if they have accounts and a certificate
- Adopt Global Grid Forum standards
- Actively collaborate with and promote Global Grid Forum activities
8. GridPort Toolkit Architecture
9. GridPort Layers
- Clients
- Web browsers, including PDA versions
- Plan to expand to other wireless devices
- Application Portals
- Currently, they exist on the same physical machine and share a domain (cookies)
- Served to clients by separate virtual webservers
- hotpage.npaci.edu or gridport.npaci.edu
- All use the same instance of the GridPort libraries
- Share data, libraries, filespace, and other services on the webserver machine
- Single-login environment
10. GridPort Layers (cont.)
- Portal Services
- For portals and users
- Managing session state, portal accounts, file collections
- Monitoring the information services
- Services that are portal specific
- Not typically addressed by Grid or web developers
- Grid Services
- Standard middle and backend tiers of the Grid
- Globus, Legion, SRB, NWS, AppLeS, and (someday) metaschedulers
- Resources
- Compute
- Archival
11. GridPort Layers
12. Commercial Technologies Employed
- Server
- Netscape or Apache servers
- HTTPS, SSL, HTML/JavaScript, SSH
- Perl 5.0/CGI
- Database: flat text configuration files
- Migrating to SQL/Oracle in limited cases (reliability)
- Will use DB to generate text files
- OS: Unix/Solaris, Linux
- Client
- Netscape Communicator, IE (4.0 or greater)
- PC, Mac, Sun/Solaris, SGI
- HTTPS, SSL, HTML/JavaScript (limited use)
13. Grid Technologies
- Globus/GRAM gatekeeper
- Used to run interactive jobs and tasks, and to submit batch jobs on remote resources
- Grid Security Infrastructure (GSI)/MyProxy
- Used for security and authentication
- Grid Information Service/Grid Resource Information Service (GIS/GRIS)
- Used for information services where available
- SDSC Storage Resource Broker (SRB)
- Used for distributed file collection and management
- Key problem
- Not all partners install and maintain all software
14. Services Supported
- Portal user accounts
- Online account/certificate creation -> unique portal ID
- Associate portal ID with DN
- Associate DN with user IDs in mapfiles -> authenticate
- Track sessions, user preferences, distributed filespace
- Portal users must have a valid PKI/GSI certificate
- Accepted certs: NPACI, Alliance, NASA/IPG, Cactus, Globus
- This is a complex process, so it does not scale yet
- Authentication (2 ways)
- Authentication against certificate data stored in the SDSC certificate repository
- MyProxy server
- We save the proxy file for the duration of the session
- Sessions expire after a timeout period or when the user logs out
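The account chain above (portal ID -> DN -> local user ID in each resource's mapfile) can be sketched in Python. This assumes the standard Globus grid-mapfile layout, one quoted certificate DN followed by one or more comma-separated local account names per line; the helper names, the in-memory dictionaries, and the host names are hypothetical stand-ins for GridPort's own account files.

```python
import shlex


def parse_gridmap(text):
    """Parse grid-mapfile text into {DN: [local user IDs]}.

    Assumes the usual Globus layout: a double-quoted certificate DN, then one
    or more local account names separated by commas. Blank lines and lines
    starting with '#' are skipped.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps a quoted DN containing spaces together as one token
        parts = shlex.split(line)
        if len(parts) < 2:
            continue  # malformed entry: a DN with no local account
        dn = parts[0]
        # Rejoin the remainder so "a, b" and "a,b" both yield ["a", "b"]
        users = ",".join(parts[1:]).split(",")
        mapping.setdefault(dn, []).extend(u for u in users if u)
    return mapping


def local_user(portal_id, portal_accounts, gridmaps, host):
    """Resolve a portal ID to its local account on one host (hypothetical helper).

    portal_accounts: {portal ID: DN}, recorded when the portal account is created.
    gridmaps: {host name: parsed grid-mapfile for that host}.
    """
    dn = portal_accounts[portal_id]
    users = gridmaps.get(host, {}).get(dn)
    if not users:
        raise PermissionError(f"{dn!r} is not mapped to an account on {host}")
    return users[0]  # first listed account is treated as the default
```

Keeping one parsed mapfile per host mirrors the key problem noted earlier: every resource keeps its own mapfile, so a DN missing from one host's file simply fails authentication there.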
15. Services Supported (cont.)
- Jobs
- Executed via the Globus/GRAM gatekeeper
- Simple Unix-type commands
- mkdir, ls, rmdir, cd, and pwd (part of the API)
- Compiling and running programs
- Job and batch script submission and deletion, and viewing of job status and history
- Files
- Access to compute, archival, and portal file space
- File transfer
- Between the local workstation and the HPC resources
- Between any 2 resources (via SRB)
- Perform common file management operations on remote files
- tar/untar, gzip/gunzip, and movement to archival storage
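A minimal Python sketch of how a portal might turn one of the simple Unix-type commands above into a `globus-job-run` invocation against a GRAM gatekeeper. It assumes the command runs under `/bin/sh` on the remote host and that the portal tracks the user's working directory itself, since each GRAM request is independent and a bare `cd` would not persist. The whitelist, helper names, and host name are hypothetical; the sketch only builds the argument list and contacts nothing.

```python
import posixpath
import shlex

# Hypothetical whitelist of the simple commands named on the slide.
# 'cd' is deliberately absent: it is handled portal-side by change_dir().
ALLOWED = {"ls", "mkdir", "rmdir", "pwd"}


def build_gram_command(host, cwd, command, args=()):
    """Return an argv list that runs one whitelisted command via globus-job-run."""
    if command not in ALLOWED:
        raise ValueError(f"command not permitted: {command}")
    # Quote every piece so user-supplied names cannot inject shell syntax
    remote = "cd {} && {}".format(
        shlex.quote(cwd),
        " ".join(shlex.quote(part) for part in (command, *args)),
    )
    return ["globus-job-run", host, "/bin/sh", "-c", remote]


def change_dir(cwd, target):
    """Update the portal's record of the user's working directory for 'cd'."""
    return posixpath.normpath(posixpath.join(cwd, target))
```

On a host with the Globus client tools and a valid user proxy, the returned list could be handed to `subprocess.run`; building it as a list rather than one string avoids a second round of local shell parsing.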
16. GridPort Interactive Services Diagram
17. GridPort File Management
18. Resources Supported
- Compute
- IBM (Blue Horizon, SP)
- Compaq (TCS1)
- CRAY (T3E, T90)
- Sun (E10K)
- SGI (O2K)
- Hewlett Packard (V2500)
- Workstations and clusters.
- Others
- Archival
- HPSS, DMF, MASS
- Any system running Globus can be added
- Multiple sites, centers, and orgs
- PACI Grid: NPACI, Alliance, PSC, hopefully DTF
- NASA/IPG
- Multiple sites/locations
- SDSC
- NCSA
- Pittsburgh Supercomputing Center
- Universities: UT Austin, Univ. of Kentucky, Boston Univ.
19. Security Implementation
- Security between the client -> web server -> grid
- SSL/RC4-40 128-bit key / SSL RSA X509 certificate
- GSI authentication used for all portal services
- Transparent access to the grid via GSI infrastructure
- Authentication tracked with cookies
- Coupled to server DB/session tracking; maintains session state
- Assigned a random value by the webserver at login
- Random value in the cookie corresponds to a session file
- Session file contains a timestamp
- Single login environment
- Provides access to all NPACI resources where GSI is available
- With full account access privileges for the specific host
- Within the same domain because of cookies
20. Security Implementation (cont.)
- User authentication via valid proxy files
- Proxy generated from key/cert pair or retrieved from MyProxy
- Sensitive data (proxies) stored in a restricted-access portal repository
- Repository located outside the webserver filespace
- Has user and group permissions control
- Portal acts as proxy
- Executes requests on behalf of the user
- Only what the user is authorized to access
- Based on credentials presented when the portal account was created
- Users have the same level of access to a resource as if logged on
- Globus used for client requests on resources
- GSI used at all layers -> forward session proxy file
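A minimal Python sketch of the proxy repository rules above: proxies are written outside the webserver's document root and are readable only by the account the portal runs under. The owner-only modes are an assumption (the slide mentions user and group permission control, which a real deployment might configure differently), the `x509up_` file prefix is borrowed from the Globus default proxy file name, and the function names are hypothetical.

```python
import os
import stat


def store_proxy(repo_dir, docroot, portal_id, proxy_pem):
    """Write a session proxy into a restricted repository and return its path.

    Refuses any repository inside the webserver document root, and keeps both
    the directory (0700) and the proxy file (0600) owner-only.
    """
    if not portal_id.isalnum():
        raise ValueError("portal ID must be alphanumeric")
    repo = os.path.realpath(repo_dir)
    root = os.path.realpath(docroot)
    if os.path.commonpath([repo, root]) == root:
        raise ValueError("proxy repository must be outside the webserver filespace")
    os.makedirs(repo, mode=0o700, exist_ok=True)
    os.chmod(repo, 0o700)  # tighten a pre-existing directory as well
    path = os.path.join(repo, f"x509up_{portal_id}")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(proxy_pem)
    os.chmod(path, 0o600)  # the O_CREAT mode is ignored if the file already existed
    return path


def proxy_is_private(path):
    """True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Resolving both paths with `realpath` before comparing keeps a symlink inside the document root from smuggling the repository back into web-visible space.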
21. Applications Running on GridPort
- 2 approaches for portals
- Those developed by the NPACI team
- Those developed by the application team/developer
- Application portals in production
- PACI HotPage, https://hotpage.paci.org
- NPACI HotPage, https://hotpage.npaci.edu
- Pharmacokinetic Modeling, https://gridport.npaci.edu/LAPK
- General Atomic and Molecular Electronic Structure System (GAMESS), https://gridport.npaci.edu/GAMESS
- Portals developed by project application developers
- Bays to Estuaries Project (BBE), http://bbe.npaci.edu
- Protein Database/CE Portal, https://gridport.npaci.edu/CE
- Telescience (9/30/01), https://gridport.npaci.edu/Telescience
22. Using GridPort
- Install Perl libraries and GridPort code on the webserver
- Application portal developer incorporates GridPort libraries directly into code
- Can modify or add subroutines
- General pattern (for our dev. team)
- Uses between 3 and 6 lines of Perl code to access functions
- Jobs, files, auth, etc.
- Each of the CGI scripts for application portals developed with GridPort follows this pattern
- An example: HotPage batch job submission
- Contains three lines of code that reference GridPort
- The other lines of code (750) are specific to HotPage
23. HotPage View Job Submission
24. Laboratory for Applied Pharmacokinetics (LAPK)
- Community Model Portal
- Users are doctors, so they need an extremely simple interface
- Must be portable: run from many countries/labs
- Need to hide details such as
- Resources, files, batch scripts, compilation, UNIX environment
- Uses gridport.npaci.edu portal services/capabilities
- File upload/download between local host/portal/HPC systems
- Jobs
- Job submit (builds batch script, moves files, submits jobs)
- Job tracking: moves results to user filespace when complete
- Job cancel/delete
- Job history: maintains relevant job information
- Impact
- LAPK users can now run multiple jobs at one time using the portal
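The job-submit step above builds a batch script before moving files and submitting the job. A minimal Python sketch of that step, assuming a PBS-style scheduler: the `#PBS` directives and `$PBS_O_WORKDIR` are standard PBS, but the slides do not say which batch system LAPK targeted, and the function names, queue name, and `./model_run` executable are hypothetical. The point is that the portal, not the physician, fills in queue, walltime, and node count.

```python
import shlex


def format_walltime(minutes):
    """Convert a run length in minutes to PBS's HH:MM:SS walltime form."""
    if minutes <= 0:
        raise ValueError("walltime must be positive")
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def build_batch_script(job_name, queue, minutes, nodes, executable, inputs):
    """Return the text of a PBS-style batch script for one model run.

    The user only supplies a run name and input files; queue, walltime and
    node count come from portal defaults, hiding batch details from the user.
    """
    # Keep only safe characters and a short length for the scheduler's job name
    safe_name = "".join(c for c in job_name if c.isalnum() or c in "-_")[:15]
    lines = [
        "#!/bin/sh",
        f"#PBS -N {safe_name or 'portaljob'}",
        f"#PBS -q {queue}",
        f"#PBS -l nodes={nodes}",
        f"#PBS -l walltime={format_walltime(minutes)}",
        'cd "$PBS_O_WORKDIR"',  # PBS sets this to the directory qsub ran from
        # Quote the command line so uploaded file names cannot inject shell syntax
        " ".join(shlex.quote(arg) for arg in [executable, *inputs]),
    ]
    return "\n".join(lines) + "\n"
```

The portal would then write this text next to the uploaded input files and submit it, keeping the returned job ID for the job history.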
25. LAPK Job Submit and Job History
26. GCE Web Services
- A new architecture for GCEs is emerging
- Workshop held at SDSC (May 2001) to discuss this
- Grid Portals Markup Language/XML
- Constructing a GCE testbed
- Based on the web services model that is currently evolving in the commercial world
- Sun Jxta, IBM WebSphere, Microsoft .NET
- XML/SOAP/UDDI/WSDL
- CCA (see Gannon's talk)
- In this experiment, our port is a URL
- Key advantage: the client may be a web page/portal, another application, or a Grid service
- Allows separation of the function of hosting the client from the service or application being used
27. A Web Services Experiment: GridPort Client Toolkit
- Focus on medium/small applications and researchers
- Choose a simple protocol (HTTP/CGI/Perl)
- Client/application can be located on any server or system
- Connection to portal services is through the GCT
- https://portals.npaci.edu/client/tools/FUNCTIONS
- Inherits all existing portal services running on the portal
- Including authentication/single login
- It's easy
- Took 1 week to develop GCT
- Key project goal
- Allow scientists to write local portals/apps/etc. and use services
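Because the GCT exposes each portal function as an HTTPS URL under the base shown above, any HTTP client can call it; the BBE portal did so from Perl with LWP. A minimal Python sketch of building such a call with the standard library, assuming form-encoded POST parameters and that the portal's session cookie is passed back on later calls. The function names (`login`, `job_submit`) and field names are hypothetical placeholders, not the documented GCT interface, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL from the slide; function and field names used with it are hypothetical.
GCT_BASE = "https://portals.npaci.edu/client/tools/"


def gct_request(function, fields, session_cookie=None):
    """Build (but do not send) an HTTPS POST to one GCT portal function."""
    if not function.isidentifier():
        raise ValueError(f"bad function name: {function!r}")
    data = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if session_cookie:
        # Returning the portal's single-login cookie lets the remote client
        # reuse the session the portal has already authenticated.
        headers["Cookie"] = session_cookie
    return Request(GCT_BASE + function, data=data, headers=headers, method="POST")
```

Passing the request to `urllib.request.urlopen` would send it; presumably the cookie set by a `login` call would then be supplied as `session_cookie` on subsequent calls, which is how the client inherits single login.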
28. Web Services Experiment: GridPort Client Toolkit (cont.)
- Ease of use
- Do not have to install complex code to get started
- Webservers, no Globus, no SSH, no SSL, no PKI, etc.
- Do not have to write complex interface scripts to access these services (we've done that already)
- Do not have to fund advanced web development teams
- Client has local control over the project, including filespace, etc.
- Integration into existing portals has been done
- Bays to Estuaries project
29. Services Implemented in GCT
- Authentication
- Login
- Logout
- Check authentication state
- Jobs
- Submit jobs to queues
- Cancel jobs
- Execute commands (command-line interface)
- Files
- Upload from local host
- Download to local host
- FTP move file
- View portal filespace (?)
- Commands
- pwd
- cd
- whoami
- Etc.
30. GridPort Client Toolkit DemoApp
31. Basin, Bays to Estuaries (BBE) Portal
- Community model scientific portal for conducting multi-model Earth System Science (ESS)
- Simulations are run to forecast the transport of sediments within the San Diego Bay area during a storm
- Technology developed for the BBE project
- Website located on BBE webserver/machine
- http://bbe.npaci.edu
- Uses SRB for file management (GSI)
- Perl/CGI based portal
- Minimal effort required to modify code
- Uses GCT for all interactive functions
- Hardest part was installing the Perl LWP module on the local system
- Roughly 14 tests needed to integrate GCT into the portal
- 4 new Perl scripts required
32. Basin, Bays to Estuaries (BBE) Portal
33. Conclusions
- Remember your client
- Developer ≠ User
- Robust portals can be built with simple technologies
- GridPort is a good jump-start toolkit
- Promotes rapid deployment
- We need
- Grid accounts, so we don't have to update 10 billion mapfiles
- Common/shared security
- Grid metaschedulers, so our users can run on the best available system
- Grid-aware compilers
- Grid information services that are fast
- Grid web services
34. Future Work
- GridPort v3.0
- In the process of evaluating new technologies
- Java technologies to support CCA efforts
- Considering a move away from Perl (XML incompatibilities are one reason)
- Data portal technologies: SRB and GSI-FTP
- Support personalization at the account level
- Web services used by production application portals
- HotPage v3.0
- Expand to DTF and non-NSF PACI systems
- Expand personalization -> MyHotPage
- Implement use of the NPACI machines database
- Update to accommodate virtual organization concepts
- Automatic SRB collections for all users
35. New Directions
- Continue web services architecture research
- Collaboration with the GGF/GCE Research Area and working groups
- GCE testbed plan underway
- USA: PACI, Alliance, NASA, Jefferson Lab, PNNL, others
- Europe: Daresbury, Cactus, others?
- Collaboration with Sun CAL(IT)2 project
36. GridPort Team
- SDSC Staff
- Mary Thomas
- Steve Mock
- Kurt Mueller
- Maytal Dahan
- Cathie Mills
- Student interns: Ray Regno, Akhil Seth
- A collective effort supported by SDSC services
- Server systems (Josh Polterock)
- HPC Systems (Victor Hazelwood)
- Databases (Dave Archibal)
- Distr. Computing (Keith Thompson, Bill Link)
37. Acknowledgements
- San Diego Supercomputer Center and the NSF-funded PACI programs for their support (both with resources and staff)
- Grants
- NPACI, NSF-ACI-975249
- NPACI NSF-NASA IPG Project Supplement
- Pharmacokinetic Modeling, NCRR Grant No. RR11526
- NBCR, NIH/NCRR P41 RR08605-07
- Collaborators
- PACI partners
- The User Portal Collaboration members: NASA/IPG, LBL, PNNL
- The Globus team, for providing valuable input and ideas on this project: Gregor von Laszewski, Carl Kesselman, and others
- The Global Grid Forum GCE working group
38. References
- GridPort Toolkit website
- https://gridport.npaci.edu
- NPACI HotPage user portal
- HotPage: https://hotpage.npaci.edu
- Accounts: http://hotpage.npaci.edu/accounts
- Downloads
- http://gridport.npaci.edu/downloads
- GridPort Toolkit
- NPACI HotPage
- GCT Portal (frames based)
- Contact
- Mary Thomas (mthomas_at_sdsc.edu)