Title: Overview and Access to PACI Resources
Overview and Access to PACI Resources
- John Towns
- Division Director, Scientific Computing
- NCSA and the Alliance
- jtowns@ncsa.edu
PACI Resources at a Glance
- Alliance Resources
- NCSA
- SGI Origin2000
- HP X-Class (Exemplar)
- NT Supercluster
- Boston University
- SGI Origin2000
- Univ of New Mexico
- IBM SP
- RoadRunner cluster
- Los Lobos cluster
- Univ of Kentucky
- HP N-Class cluster
- MHPCC
- IBM SP
- Univ of Wisconsin
- Condor flock
- NPACI Resources
- SDSC
- IBM SP
- Cray T90
- Cray T3E
- SUN HPC10000
- Univ of Texas
- Cray SV1
- Cray T3E
- Univ of Michigan
- IBM SP
- Caltech
- HP X-Class (Exemplar)
The Big Machines in PACI
- NCSA SGI Origin2000 Array
- 1528 MIPS R10k processors, 618 GB RAM, 4.3 TB scratch disk, 680 Gflop/s peak
- 1 x 56-proc (195MHz), 14GB RAM
- 1 x 64-proc (195MHz), 16GB RAM
- 1 x 128-proc (195MHz), 64GB RAM
- 4 x 128-proc (195MHz), 32GB RAM
- 3 x 128-proc (250MHz), 64GB RAM
- 1 x 128-proc (250MHz), 72GB RAM
- 2 x 256-proc (250MHz), 128GB RAM
- SDSC IBM SP (Blue Horizon)
- 1152 IBM Power3/222MHz processors in 144 nodes, 512 GB RAM, 5.0 TB scratch disk, 1.0 Tflop/s peak
- Each node has
- 8 Power3 processors
- 4 GB RAM
PACI Vector Machines
- NPACI Vector Resources
- SDSC
- Cray T90
- 14 Cray Vector processors
- 4 GB RAM
- 24 Gflop/s peak
- Univ of Texas
- Cray SV1
- 16 Cray CMOS Vector processors
- 16 GB RAM
- 19.2 Gflop/s peak
PACI Shared Memory Systems
- Alliance SMP systems
- NCSA
- HP X-Class (Exemplar)
- 64 PA-8000/180MHz processors
- 16 GB RAM
- 46 Gflop/s peak
- Univ of Kentucky
- HP N-Class
- 96 PA-8500/440MHz processors
- 12 x 8-proc systems
- 96 GB RAM total
- 96 Gflop/s peak
- Boston University
- Origin2000
- 192 MIPS R10000/195MHz processors
- 36 GB RAM
- 74.8 Gflop/s peak
- NPACI SMP systems
- SDSC
- SUN HPC10000
- 64 UltraSPARC II processors
- 64 GB RAM
- 51 Gflop/s peak
- Caltech
- HP X-Class (Exemplar)
- 256 PA-8000/180MHz processors
- 64 GB RAM
- 185 Gflop/s peak
PACI MPP Resources
- Alliance MPP systems
- Univ of New Mexico
- IBM SP
- 96 IBM Power2/66MHz processors
- 6.0 GB RAM
- 25 Gflop/s peak
- Maui High Performance Computing Center (MHPCC)
- IBM SP
- 32 IBM P2SC/160MHz processors
- 24 GB RAM
- 20 Gflop/s peak
- NPACI MPP systems
- SDSC
- Cray T3E
- 272 DEC Alpha 21164 processors
- 34 GB RAM
- 154 Gflop/s peak
- Univ of Texas
- Cray T3E
- 88 DEC Alpha 21164 processors
- 11 GB RAM
- 34 Gflop/s peak
- Univ of Michigan
- IBM SP
- 64 IBM Power2/160MHz processors
- 64 GB RAM
- 30 Gflop/s peak
PACI PC and Workstation Clusters
- Alliance PC and workstation clusters
- NCSA
- NT Supercluster
- 256 PentiumIII/550MHz processors
- 32 PentiumII/330MHz processors
- 72 GB RAM
- 151 Gflop/s peak
- University of Wisconsin
- Condor flock (a minimal submit-file sketch follows this list)
- 700 processors of various types
- 64 MB - 1 GB RAM per processor
- University of New Mexico
- RoadRunner Linux cluster
- 128 PentiumII/450MHz processors
- 32 GB RAM
- 56 Gflop/s peak
- Los Lobos Linux cluster
- 512 PentiumIII/733MHz processors
- 256 GB RAM
- 375 Gflop/s peak
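For context on how work reaches a Condor flock: jobs are described in a submit file and handed to the scheduler. A minimal sketch, assuming a hypothetical program and file names (real flocks add requirements and rank expressions):

  # sketch.submit -- hypothetical minimal Condor submit description
  universe   = vanilla     # plain serial job, no checkpointing
  executable = my_sim      # hypothetical user program
  arguments  = -n 1000
  output     = sim.out
  error      = sim.err
  log        = sim.log
  queue 1                  # submit one instance

Submitted with condor_submit sketch.submit and monitored with condor_q; Condor matches the job to any idle machine in the flock that meets its requirements.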
Getting a Small Allocation
- Similar processes for Alliance and NPACI
- Alliance StartUp accounts
- Up to 10,000 SUs on any Alliance resource (see the worked example after this list)
- Online form with brief project description
- Submit at any time; approximately 30-day turn-around
- NPACI Expedited accounts
- Up to 5,000 SUs on most systems
- Up to 100 SUs on Cray T90
- Up to 200 SUs on Cray SV1
- Printable forms available to complete and submit
- Submit any time
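For scale: assuming the common convention that one SU corresponds to roughly one processor-hour (exact charging formulas vary by site and machine), a 10,000-SU StartUp award covers, for example, a 16-processor job running for about 625 hours (16 x 625 = 10,000).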
Getting Moderate Allocations
- Similar process for Alliance and NPACI
- Both are peer-review processes
- Alliance Allocations Board (AAB)
- 10,001-100,000 SUs on any Alliance resource
- Meets quarterly, with allocations active on the first of January, April, July, and October
- NPACI Partnership Resource Allocation Committee (PRAC)
- 5,001-50,000 SUs on most NPACI resources
- 101-2,000 SUs on Cray T90
- 201-4,000 SUs on Cray SV1
- Meets twice per year with allocations active on
the first of January and July
Getting Large Allocations
- Single process for PACI program
- Jointly reviewed by AAB and PRAC
- National Resource Allocation Committee (NRAC)
- >100,000 SUs on any Alliance resource
- >50,000 SUs on most NPACI resources
- >2,000 SUs on Cray T90
- >4,000 SUs on Cray SV1
- Meets twice per year with allocations active on
the first of April and October
The Alliance VMR
- An evolving, persistent Alliance computational grid
- Connects the PACS ACR sites, giving the user the perception of ONE machine room
What is the Alliance Virtual Machine Room?
- Infrastructure
- Supercomputers, networks, visualization resources, data archives and databases, instruments, etc.
- Middleware
- Primarily Globus components
- Grid services
- Security infrastructure, grid information sources, resource management, job submission and control, data management, etc.
- Portal interfaces and portal services
- User Portal, Chemical Sciences Portal, etc.
- Support services
- Consulting and helpdesk
VMR Deployment Areas
- VMR Operations
- Establish a distributed operations support team for the VMR
- Establish VMR Policies and Procedures
- Storage
- Tie together storage archive resources within the VMR while adding new capabilities
- Account Management
- Account creation and management
- Usage reporting for allocated projects
- Grid Security Infrastructure
- Deploy PKI/GSI authentication services
- Interface to local policies and mechanisms
- Globus Installation and Maintenance
- Deploy appropriate Globus components
- User Services
- Prepare users for new VMR technologies
- Deploy these technologies and nurture use
- User Portal
- User interface to VMR
- Portal services to be leveraged
VMR Operations
- VMR Resource Monitoring
- Critical resource information monitored by central VMR site management
- Tools and mechanisms to monitor system resources
- 24x7 Operations
- Management policies and procedures
- Central VMR web site
- Common Helpdesk
- Central VMR trouble ticket system
- Base System Documentation and System Admin Support
- Links to local system documentation for each participating site
- System admin policies and procedures
- VMR Systems Software Repository
- Current set of software necessary for a site to participate in the VMR
Storage
- Access to established archives
- Secure FTP server (gsiftp) installed at all sites (a hedged transfer example follows this list)
- Common command-line interface to local archive or LES archive server
- Will be testing Distributed UniTree disk caches
- In house at NCSA, testing stability
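As a sketch of the command-line model (host and path names are hypothetical; globus-url-copy is the Globus client for gsiftp:// URLs in toolkit releases of this period, though the exact client bundled for the VMR may differ):

  # authenticate once via GSI, then move a file into a site archive
  grid-proxy-init
  globus-url-copy file:///scratch/run01.dat \
      gsiftp://archive.example-site.edu/home/jdoe/run01.dat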
Account Management
- Infrastructure
- Transaction-based info exchange
- Maintain local identities
- Account/Allocation management
- Alliance Distinguished Name (DN) creation
- Account creation/removal centrally managed
- Usage reporting
- Regular reporting from some sites of usage against Alliance allocations
Grid Security Infrastructure
- Certificate Authority (CA)
- Certificate request, creation, expiration
- Certificate management
- GSI deployment
- Globus, GSI FTPd and sshd
- Admin guide
- Client deployment (a command sketch follows this list)
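A sketch of the user-side GSI workflow with the standard Globus tools (the CA signing and local policy steps described on this slide happen around the first command):

  # generate a key pair and a certificate request to send to the CA
  grid-cert-request
  # after the signed certificate is installed, create a short-lived proxy
  grid-proxy-init
  # inspect the proxy: subject DN, issuer, time remaining
  grid-proxy-info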
Globus Installation and Maintenance
- Globus installation
- Globus v1.1.2 is the current version used in the VMR
- Globus capabilities
- Submit jobs to remote systems (a hedged example follows this list)
- MDS information services
- Initial infrastructure in place
- Data being published by all sites
- Hardware resources
- Hardware status info
- Queue status info
- Globus job info
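Two hedged examples of these capabilities (the resource contact string and MDS host are illustrative; the MDS port and base DN depend on the deployment):

  # submit a 4-way job to a remote VMR machine via GRAM, streaming output back
  globusrun -o -r modi4.ncsa.uiuc.edu/jobmanager \
      '&(executable=/bin/hostname)(count=4)'

  # MDS is an LDAP directory, so published resource data can be
  # queried with a generic LDAP client
  ldapsearch -h mds.example-site.edu -p 2135 -b "o=Grid" "(objectclass=*)"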
VMR User Support
- GSI Documentation
- Collaboration on GSI user tools
- Feedback and testing
- VMR Friendly User list generated
Alliance User Portal, aka MyGrid
- A prototype was shown at SC99
- Recent work has focused on component technologies
- Applicable to other portal efforts!
- Also working on component applications
- Working with SDSC/NPACI on a PACI User Portal
- User Portal intended to be the interface to the VMR computing environment
Component Technologies
- MyProxy security
- Credential delegation and authentication for actions on behalf of the grid user (a hedged usage sketch follows this list)
- Focus on authentication
- File transfer facilities
- Primary concern is moving (large) data files securely
- Initial Java bean interface using GSSFTP
- Job submission
- Mostly app-specific today
- Initial general framework prototype using plug-ins developed
- Search engine and documentation
- Bought AltaVista license
- Currently indexing all HPC documentation at all partner sites
- Using this to re-vamp NCSA HPC documentation
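A sketch of the MyProxy flow, assuming a hypothetical server name (flag spellings vary slightly across MyProxy versions):

  # user, from a trusted machine: deposit a delegated credential
  myproxy-init -s myproxy.example-site.edu -l jdoe
  # portal, later, acting on the user's behalf: retrieve a short-lived proxy
  myproxy-get-delegation -s myproxy.example-site.edu -l jdoe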
Component Applications
- Systems/job status information
- What is the state of every machine in the VMR?
- What is the state of every job in the VMR?
- Definition of XML DTD formats (a hypothetical sketch follows this list)
- Currently implemented for Origin2000
- Mass store status
- Usage statistics
- Network link status
- Currently implemented for NCSA archive
- Consulting access
- On-line help with desktop sharing and phone call
- Trying out WebEx
- Allocations data access
- Interface to check allocation status from User Portal
- Direct access to centralized database backend
- Proposal process
- Interface to manage proposal process from User Portal
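The slides do not reproduce the DTDs themselves; purely as a hypothetical sketch of the kind of status document such a DTD would describe:

  <!-- hypothetical, not the actual VMR DTD -->
  <!ELEMENT machineStatus (name, site, state, jobsRunning, jobsQueued)>
  <!ELEMENT name (#PCDATA)> <!ELEMENT site (#PCDATA)>
  <!ELEMENT state (#PCDATA)> <!-- e.g. up | down | degraded -->
  <!ELEMENT jobsRunning (#PCDATA)> <!ELEMENT jobsQueued (#PCDATA)>

  <machineStatus>
    <name>Origin2000 Array</name> <site>NCSA</site> <state>up</state>
    <jobsRunning>42</jobsRunning> <jobsQueued>17</jobsQueued>
  </machineStatus>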
User Portal Today
- [Screenshot slides showing the current User Portal interface]
Component Applications - Ideas and things to integrate
- Usage analysis
- Use detailed job log info (JPMDB) for analysis of usage
- Standard views
- Ad hoc queries (a hypothetical query sketch follows this list)
- Individual user and project
- Network status information
- Use information from various probes in the network
- AMP, Surveyor, OCXmon, Network Weather Service, etc.
- Provide info on network link status
- up/down, current traffic, current latency, etc.
- Allocation/account management tools
- Build on database backend and the ALPO project
- Proposal submission, review, and award
- Add/remove users, check allocation status
- Provide end-of-project report
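For flavor, an ad hoc query of the kind envisioned here might look like the following (the table and column names are invented; the actual JPMDB schema is not described on these slides):

  -- SUs charged per project for one user, over a date range
  SELECT project, SUM(cpu_hours) AS sus_charged
  FROM   job_log
  WHERE  user_name  = 'jdoe'
    AND  start_time >= '2000-01-01'
  GROUP  BY project;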
More Ideas
- Access to other information sources
- Abstract databases
- News, calendars, discussion groups, ...
- Web search engines
- Electronic scientific notebooks
- Collaboration tools
- Frameworks for user extensibility
- Job submission for specific applications
- Job monitoring hooks