Title: Data Collection Management within the NPACI Toolkit
- Mary Thomas
- Texas Advanced Computing Center, The University of Texas at Austin, and NPACI
- Presented at the NPACI All Hands Meeting (AHM) 2003, San Diego, CA
- NPACI AHM 2003, Parallel Session 4: Collection Management
Abstract
- The GridPort Toolkit is a simple portal developer's API that accesses a large set of grid services and software, including the Globus Toolkit, the Network Weather Service (NWS), GridFTP, and the Storage Resource Broker (SRB). GridPort portals include Telescience, several NBCR portals, and the "Cosmic Web Portal," a portal system developed for astrophysicist Mike Norman and his research community. We will discuss the approach currently used by GridPort, the unique solutions applied to the Cosmic Web Portal, and our future plans.
Outline
- Motivation
- GridPort Architecture
- Data Management Tools
- GridFTP
- SRB
- Examples of GridPort-based Data Portals
Portals Provide Simple Interfaces
- Portals are web-based, which has advantages:
- Users already know and understand the web
- Portals can serve as a layer in the middle-tier infrastructure of the Grid, integrating various Grid services and resources
- Users can be isolated from resource-specific details
- A single web interface isolates system changes and differences
- Not an end-all solution: several issues and challenges remain
- Performance, scalability
- Tradeoffs
Simple Computational Grid
[Figure: diagram of a simple computational grid; the only surviving label is "LSF"]
- Resource view
- Full functionality
- Very complex
Conceptual Web Grid
GridPort Architecture
Services Provided by Grid Computing Portals
- Grid-Specific
- Security
- Account and allocation management
- Information, discovery and monitoring
- Resource scheduling and management
- Data and collection management
- Application support
- Portal-specific
- View customization, user session management, and portal logging
- Groups, roles, sharing, access control
- Collaboration and communication systems: chat/instant messaging services, whiteboards, calendars, newsgroups, citation browsers
- Ubiquitous access: browsers, command line, cell phones, PDAs
GridPort SRB
- With SRB capabilities, file access is direct and virtualized
- Access through a single SRB account allows more flexible data management
GridPort 2.0
- Part of NPACKage
- Perl/CGI
- Easy to install
- Dynamic
- Multiportal architecture
- Account data: manages certificates/keys and session information for users
- Grid services: Globus, SRB, NWS, etc.
- Thin client
Recommended Technologies
- Collection Management
- GridFTP
- SDSC Storage Resource Broker (SRB)
- for file collection management
- High speed network interfaces
- Information Services
- Globus MDS 2.2, GIIS, GRIS
- NWS, data from LSF, United Devices, etc.
- Web-service-based GIS archival system (Grid-IAS)
- Custom information provider scripts
- Grid Monitoring System (Java-enhanced version from NCSA)
- Portals
- Java, Jetspeed, portlets, CC
- Web services (in addition to Grid)
- Database back end
- XML
- Grid
- OGSI/OGSA (Globus Toolkit 3.0)
- NPACKage
- Globus GT 2.x (NMI R1, R2; also earlier versions)
- Security
- GSI (Grid Security Infrastructure) is the key enabling technology
- MyProxy for remote proxies
- Job Execution
- Globus GRAM Gatekeeper (key component), used to run batch and interactive jobs and tasks on remote resources
- Scheduler: Platform Computing (LSF, MultiCluster)
- Integration with SGE, AVAKI, others (Texas grid)
- Queuing systems: PBS, LSF, etc.
PACI HotPage
- Access portal to all resources
- Information portal for all users
- Secure access for authorized users
- PACI Grid software used
- Globus Toolkit (GRAM, GSI, GRIS, GIIS), SRB, MyProxy, NWS
- Built with the GridPort Toolkit
- GP 2.0 Perl/CGI
- Services provided
- Resource information/status
- Job control
- Data collection management
- Command execution
- Personalization
Path Forward for GridPort
- OGSA: huge impact
- Software package compliance
- GT 3.0 integration
- NPACKage
- Continue with GridFTP, SRB integration, others
- NMI releases
- Emerging portal technologies and standards
- GCE component portal architecture and repository
- Portlets/Jetspeed
- GridPort 3.0 Toolkit
- New architecture based on grid services and workflow
- Open source/team approach
- PyGridPort: GridPort built on LBL's pyGlobus (DOE)
Web Services
- Architecture mechanisms for:
- Dynamic service discovery (UDDI)
- Separation of implementation from function (WSDL)
- Known protocols (SOAP/HTTP, SOAP/RPC)
- Service provider encapsulates implementation details
- Client doesn't need the details, just where and how to send the request
- Commercial world is developing P2P web services
- In some ways, Globus GRAM is a web service
- Advantage: language independent, so services can run on any system
- Community pursuing Python, Java, C at this time
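To make the SOAP and WSDL bullets concrete, the sketch below builds a minimal SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree`. Only the envelope namespace is the real SOAP 1.1 one; the operation name `getJobStatus`, its `jobId` parameter, and the `urn:example:gridport` namespace are hypothetical placeholders, not part of GridPort or any published service. In a real deployment the WSDL document would define them, and the client would only need to know where and how to send the request.

```python
import xml.etree.ElementTree as ET

# Real SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; a real service would publish this in its WSDL.
SERVICE_NS = "urn:example:gridport"

# Readable prefixes for serialized output.
ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def build_soap_request(operation: str, params: dict) -> bytes:
    """Return a serialized SOAP 1.1 envelope that calls `operation` with `params`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        arg = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        arg.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical operation and job identifier, for illustration only.
    print(build_soap_request("getJobStatus", {"jobId": "12345"}).decode("utf-8"))
```

The client code never sees how the service is implemented; it only agrees on the message format, which is what makes the approach language independent.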
Open Grid Services Architecture
- IBM and the Globus team integrated key concepts of the Grid and the web
- Taking the Grid community to the next level: services are interoperable
- Protocol-based rather than implementation-based
- Protocol examples: telnet, ftp, ssh
- telnet: login, password
- Grid: security (PKI, GSI); persistence (the stateless web is gone: track tasks, user info, etc.); handles to instances
- Web: HTTP transport layer, Simple Object Access Protocol (SOAP), XML, Web Services Description Language (WSDL)
OGSA Component Approach: Workflow
- Grid web services components
- Standard interface
- Dynamic composition, transfer, exchange of data
JetSpeed and Portlets
- New direction for the grid computing portal community, based on Apache and open source
- Uses Java plug-in software behind web servers
- Builds dynamic web pages based on the client request:
- Executes a set of components (Java portlets)
- Composites them into a web page
- Returns the page to the user
- Portlets are exchanged by sharing code
- WSDL will be employed
- NCSA has developed GridFTP portlets; the GridPort team is developing SRB portlets
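The page-building steps listed for Jetspeed (execute a set of portlets, composite their output, return the page) follow a simple aggregation pattern. The Python sketch below models that pattern under stated assumptions: a "portlet" is treated as a plain function from request parameters to an HTML fragment, and the two example portlets are hypothetical. This is not the Jetspeed or Java Portlet API, which is Java-based and far richer.

```python
from html import escape
from typing import Callable, Dict, List

# Hypothetical model: a portlet maps request parameters to an HTML fragment.
Portlet = Callable[[Dict[str, str]], str]


def welcome_portlet(request: Dict[str, str]) -> str:
    user = escape(request.get("user", "anonymous"))
    return f"<p>Welcome, {user}</p>"


def file_list_portlet(request: Dict[str, str]) -> str:
    path = escape(request.get("path", "/"))
    return f"<p>Remote listing of {path} would appear here</p>"


def render_page(portlets: List[Portlet], request: Dict[str, str]) -> str:
    """Execute each portlet for this request and composite the fragments into one page."""
    fragments = []
    for portlet in portlets:
        try:
            fragments.append(f'<div class="portlet">{portlet(request)}</div>')
        except Exception as exc:  # one failing portlet should not break the whole page
            fragments.append(f'<div class="portlet error">{escape(str(exc))}</div>')
    return "<html><body>" + "".join(fragments) + "</body></html>"


if __name__ == "__main__":
    print(render_page([welcome_portlet, file_list_portlet], {"user": "alice", "path": "/home"}))
```

Isolating each portlet's failure is one reasonable design choice for an aggregator that runs code shared by other groups; a real portal framework adds per-user layout, caching, and state on top of this.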
Jetspeed Advantages
- Overall portal customization
- Java portlets: small pieces of code that perform tasks
- Can install someone else's portlets
- Individual user customization
- Fulfills users' need to tailor their portal interface to their liking
- Open source
- Always being debugged and re-released
- One downside of open source is that documentation is limited, but the tight user/developer community provides some assistance
- Template interfaces such as Velocity and JSP allow the presentation layer to be separated from the Java program layer
Portlet-based Tools and Technology
- Provided capabilities
- Management of user proxy certificates
- Remote file management via GridFTP
- Collaboration tools: news/message systems
- Event/logging service
- Access to OGSA services
- Specialized application factories
- Access to directory services and metadata tools
See http://www.extreme.indiana.edu/xportlets
Jetspeed and GridPort 2.0
- The path forward allows adoption of new portal technologies while supporting the production NPACI infrastructure
- Only minor modifications were made to the GridPort Perl modules:
- Authentication
- Pass Jetspeed session data
- Set GridPort cookies
- Current progress
- GridPort login/logout
- Globus Run
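Passing Jetspeed session data into GridPort amounts to issuing a cookie that the GridPort side can read back. GridPort 2.0 does this in Perl; the sketch below only illustrates the idea with Python's standard `http.cookies` module. The cookie name `gridport_session` and the session value are hypothetical; the real GridPort modules define their own cookie names and contents.

```python
from http.cookies import SimpleCookie


def gridport_cookie_header(jetspeed_session_id: str, path: str = "/") -> str:
    """Build a Set-Cookie header that carries a portal session id into GridPort.

    The cookie name "gridport_session" is hypothetical, for illustration only.
    """
    cookie = SimpleCookie()
    cookie["gridport_session"] = jetspeed_session_id
    cookie["gridport_session"]["path"] = path
    cookie["gridport_session"]["secure"] = True    # only send over HTTPS
    cookie["gridport_session"]["httponly"] = True  # not readable by page scripts
    return cookie.output()


if __name__ == "__main__":
    print(gridport_cookie_header("abc123"))
```

Marking the cookie `Secure` and `HttpOnly` is a defensive choice for a session token that grants access to grid credentials.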
GridPort 3.0 GCE Portal
- Expanded CE layer (thin client, GCE Shell, portals, portlets, apps, etc.)
- Distributed grid and web services (OGSA)
- NPACKage compliant
- Workflow interaction between components
- Component approach
- Needs OOP capability: Java
- Python, PHP/Perl
- XML, database at core
GridPort-Based Portals
GridPort Data-Intensive Portals
- Cosmic Web Portal (PI Mike Norman, UCSD)
- Astrophysics code ENZO
- Example of a large collection browser
- Telescience (PI Mark Ellisman, UCSD)
- https://gridport.npaci.edu/Telescience
- Example of complex data
- Real-time data acquisition system and visualization
- Requires high bandwidth and metadata cataloguing
- Biomedical Informatics Research Network (BIRN, PI Peter Arzberger, UCSD)
Cosmic Data Portal (PI Mike Norman, UCSD)
- Astrophysics code ENZO
- Generates terabytes of data per run on Blue Horizon
- Output dumps occur about 35 minutes apart
- About 14 GB per dump
- About 75 dumps per run
- Data portal to view ENZO data collections
- Secure access
- Stores user searches for future applications (visualization, compute)
- Portal developer: Cathie Mills (SDSC)
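The per-run figures above are consistent with the terabyte scale quoted. A back-of-the-envelope check, using only the slide's approximate numbers, decimal units (1 GB = 1000 MB), and the simplifying assumption that dumps arrive evenly every 35 minutes:

```python
# Approximate figures from the slide; decimal units (1 GB = 1000 MB).
GB_PER_DUMP = 14
DUMPS_PER_RUN = 75
MINUTES_BETWEEN_DUMPS = 35

# Total output per run: 14 GB x 75 dumps.
total_gb = GB_PER_DUMP * DUMPS_PER_RUN
# Rough run length, counting one 35-minute interval per dump.
run_hours = DUMPS_PER_RUN * MINUTES_BETWEEN_DUMPS / 60
# Average rate the storage path must absorb if dumps arrive evenly.
sustained_mb_per_s = GB_PER_DUMP * 1000 / (MINUTES_BETWEEN_DUMPS * 60)
sustained_mbit_per_s = sustained_mb_per_s * 8

print(f"{total_gb} GB per run over roughly {run_hours:.1f} hours")
print(f"about {sustained_mb_per_s:.1f} MB/s ({sustained_mbit_per_s:.0f} Mbit/s) sustained")
```

This gives 1050 GB (about 1 TB) per run spread over roughly 44 hours, or an average of about 6.7 MB/s (about 53 Mbit/s) that must be moved into storage for the whole run.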
Cosmic SRB Web Browser
- Virtualized views
- Top view: by logical location rather than physical location
- Bottom view: by attributes rather than location
- Future plans include ENZO job submission and using SRB to migrate files during data runs
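The two virtualized views rest on one idea: users address data by logical name or by metadata, and a catalog maps those to physical replicas. The Python sketch below is a toy model of that idea under stated assumptions; it is not the SRB or MCAT API, and the logical paths, replica strings, and the `redshift` attribute are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    logical_path: str        # name users browse, independent of storage location
    replicas: List[str]      # physical locations (hypothetical "resource:path" strings)
    attributes: Dict[str, str] = field(default_factory=dict)


class Catalog:
    """Toy metadata catalog mapping logical names to replicas and attributes."""

    def __init__(self) -> None:
        self._objects: Dict[str, DataObject] = {}

    def register(self, obj: DataObject) -> None:
        self._objects[obj.logical_path] = obj

    def list_dir(self, logical_dir: str) -> List[str]:
        """Logical view: objects under a collection path, wherever the bytes live."""
        prefix = logical_dir.rstrip("/") + "/"
        return sorted(p for p in self._objects if p.startswith(prefix))

    def query(self, **wanted: str) -> List[str]:
        """Attribute view: objects whose metadata matches every key/value given."""
        return sorted(
            o.logical_path
            for o in self._objects.values()
            if all(o.attributes.get(k) == v for k, v in wanted.items())
        )

    def resolve(self, logical_path: str) -> str:
        """Pick a physical replica for a logical name (here simply the first one)."""
        return self._objects[logical_path].replicas[0]
```

Because callers only hold logical names, files can later be migrated between storage systems (as the future plans describe) by updating the replica list, without changing what users see.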
Telescience for Advanced Tomography Applications (PI Mark Ellisman, UCSD)
- Example of complex data
- Real-time data acquisition system and visualization
- Requires high bandwidth and metadata cataloguing
- NPACI Alpha Project to create a set of tools for:
- Remote control of the UCSD high-energy electron microscope
- Computation of 3-D structures (electron tomographic volumes)
- Depositing results into a database, forming a library of computerized cell-level brain structures
- Tomography (a layperson's view)
- A high-energy (400 keV) electron microscope is used to scan a physical sample sent by the user to the facility
- A series of projections is taken under user control
- The specimen is successively tilted in small angular increments
- The data are analyzed and reconstructed to produce a 3-D image
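The tilt series described above determines how many projections a session produces. The slide gives no actual tilt range or step, so the sketch below assumes a hypothetical range of -60 to +60 degrees; it computes each angle from its index rather than by repeated addition, so fractional steps do not accumulate rounding error.

```python
from typing import List


def tilt_angles(start_deg: float, stop_deg: float, step_deg: float) -> List[float]:
    """Tilt angles (degrees) for a single-axis tilt series, including both ends."""
    if step_deg <= 0 or stop_deg < start_deg:
        raise ValueError("need step_deg > 0 and stop_deg >= start_deg")
    count = int(round((stop_deg - start_deg) / step_deg)) + 1
    # One projection image is recorded at each angle.
    return [start_deg + i * step_deg for i in range(count)]


if __name__ == "__main__":
    # Hypothetical range and increment, for illustration only.
    angles = tilt_angles(-60, 60, 2)
    print(len(angles), "projections from", angles[0], "to", angles[-1])
```

Under these assumed values the series has 61 projections; halving the increment to 0.5 degrees would give 241, which shows how the angular step drives data volume and acquisition time.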
Telescience Data Portal
- Access to high-performance and long-term storage facilities across computational domains with a point-and-click interface
- NPACI (SRB collections located at SDSC)
- NASA/IPG (SRB collections)
- Others
- Seamless integration with the SRB (Storage Resource Broker)
- High-speed access to data over advanced networks such as Internet2
- NREN and Abilene networks
- > 60 Mbit/s data transfer rate
- Portal (in production)
- Online certificate generation (at NPACI)
- Online SRB collection creation and management
- Compute (O2K, Globus)
- Integrates existing tools
- GridPort provides grid access
Telescience Access to Instruments/Data
Data Performance
- The Telescience Portal couples NASA/IPG and SDSC/UCSD NPACI resources
- Mass storage (SRB) for collections
- Compute (O2K, Globus)
- NREN and Abilene networks
- Tests ran successfully on September 24, 2001
- > 60 Mbit/s data transfer rate
- Portal in production
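To put the reported transfer rate in perspective, an idealized transfer time is simply the payload in bits divided by the link rate. The sketch below ignores protocol overhead and latency, and the 1 GB file size is a hypothetical example; since the slide reports a rate above 60 Mbit/s, times computed at exactly 60 Mbit/s are upper bounds.

```python
def transfer_seconds(size_bytes: float, rate_mbit_per_s: float) -> float:
    """Idealized transfer time: payload bits divided by link rate (no overhead)."""
    if rate_mbit_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * 8 / (rate_mbit_per_s * 1e6)


if __name__ == "__main__":
    # Hypothetical 1 GB (decimal) dataset at the reported 60 Mbit/s floor.
    print(f"{transfer_seconds(1e9, 60):.0f} s")
```

At 60 Mbit/s a 1 GB dataset moves in about 133 seconds (a little over two minutes), which is what makes interactive, portal-driven access to remote collections practical.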
BIRN Portal
- Production portal
- GridPort 2.2/NPACKage
- Extends the Telescience architecture
- Uses GridFTP and SRB
- Uses GridPort to integrate Telescience technologies with the Grid
- Access to instruments
- Globus job control
- SRB data collections
Future Directions
- Portals Workshop on Friday (open to all)
- Goal is to bring NPACI portal developers and users together
- GridPort 2.2 is part of NPACKage and compliant with the NSF NMI program
- Perl has no GSI security capabilities, so we are moving away from it
- Developing Jetspeed/portlet solutions for GridPort
- Planning a pyGlobus-based version, pyGridPort
- Collaborating with U. Michigan, Indiana, Argonne, and NCSA to develop a grid portlet repository
- Developing GridPort GCE
- OGSA, Java/portlets, GCEShell interfaces
GridPort Project Team
- The GridPort Project represents collaboration efforts spanning the PACI Program
- Mary Thomas, Jay Boisseau, Maytal Dahan, Eric Roberts, Tomislav Urban (TACC)
- Cathie Mills, Steve Mock, Kurt Mueller (SDSC)
- Charles Severance, Joseph Hardin (U. Mich)
- Dennis Gannon, Geoffrey Fox, Marlon Pierce (Indiana)
- Argonne/ISI Globus development team
- And input from other institutions/projects
- NASA/IPG, GGF/GCE Research Group
- NBCR, Telescience, etc.
References
- Related AHM sessions
- Tutorial 7: SRB
- Tutorial 9: Grid Portals
- Parallel Session 2 (Wednesday): Grid Experiences
- Workshop on Portals (Friday)
- GridPort Toolkit contact: Mary Thomas (mthomas@tacc.utexas.edu)
- Project website: http://gridport.npaci.edu
- Download: http://gridport.npaci.edu/download