Title: UC3 Shared Research Computing Service

1
UC3 Shared Research Computing Service
Gary Jung, LBNL
UC3 Technical Architecture Team
UC Grid Summit, April 1, 2009
2
Background
  • UC Guidance Council chartered to identify and recommend strategic directions to guide future IT investments and the academic information environment
  • Sustaining and enhancing academic quality and competitiveness was a primary goal
  • Recommendation to develop UC Research Cyber Infrastructure (CI) Services
  • High Performance Research Computing component
  • UC CI Planning and Implementation Committee formed
  • Proposed Pilot
  • Phase 1: Deploy two moderately sized institutional clusters
  • Phase 2: Extend this infrastructure to other hosting models and refine the application support and service model. Connect interested UC campuses to the UC Grid pilot.

3
Phase 1 Pilot Proposals
  • 32 proposals received from all campuses (except UCM), LANL, and LBNL
  • 24 projects suitable for running on clusters
  • Research areas represented in the selected proposals include:
  • Astrophysics
  • Bioinformatics
  • Biology
  • Biophysics
  • Climate Modeling
  • Computational Chemistry
  • Computational Methods
  • Genomics
  • Geosciences
  • Material Sciences
  • Nanosciences
  • Oceanic Modeling

4
UC3 Phase 1 Implementation
  • Architectural Principles
  • Create a consistent user experience.
  • Identical use policies, administrative practices,
    and help mechanisms.
  • Minimize disorientation when moving between
    clusters
  • Not necessarily binary compatible
  • Design with future requirements in mind.
  • Shared filesystems
  • mutual disaster recovery
  • Future expansion of compute or storage
  • metascheduling capability
  • tighter integration
  • Respect local practices.
  • Operational practices at the two sites differ.
  • Acceptable as long as the differences are transparent to users
  • Build a balanced system.
  • Goal is a general-purpose resource suitable for broad scientific use

5
UC3 Phase 1 Implementation
  • Technical Architecture
  • Compute
  • Two 256-node, dual-socket, quad-core Linux clusters
  • Spec will be for the Nehalem processor, but alternatives allowed
  • RFQ will ask for pricing on multiple processor speeds so that the review team can consider price/performance trade-offs
  • 16GB per node (2GB/core), with a 24GB option (aggregate capacity is sketched after this list)
  • Fabric spec will be ConnectX InfiniBand or QLogic TrueScale InfiniBand, but vendors can additionally bid Myrinet 10G
  • Single director-class switch to provide full bisection bandwidth
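
As a rough sanity check on the compute spec, here is a minimal Python sketch of the aggregate capacity implied by these figures; the node count, sockets, cores, and per-node memory come from the bullets above, and the script itself is purely illustrative.

```python
# Aggregate capacity implied by the compute spec on this slide.
# Node count, sockets, cores, and per-node memory come from the
# bullets above; everything else is illustration.

NODES = 256            # nodes per cluster
SOCKETS_PER_NODE = 2   # dual-socket
CORES_PER_SOCKET = 4   # quad-core (Nehalem-class)
GB_PER_NODE = 16       # baseline configuration; a 24GB option was also specified

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node
total_memory_tb = NODES * GB_PER_NODE / 1024

print(f"Cores per node:     {cores_per_node}")                         # 8
print(f"Cores per cluster:  {total_cores}")                            # 2048
print(f"Memory per cluster: {total_memory_tb:.0f} TB")                 # 4 TB
print(f"Memory per core:    {GB_PER_NODE / cores_per_node:.0f} GB")    # 2 GB
```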

6
UC3 Phase 1 Implementation
  • Storage Architecture
  • Two-tier storage solution (see the layout sketch after this list)
  • Stable, robust enterprise NFS for the home filesystem
  • Parallel filesystem for scratch.
  • No use of local disks; the parallel filesystem needs to provide adequate performance to negate the need for local disk.
  • Will consider turnkey solutions because of minimal ongoing staffing
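
A minimal sketch of the two-tier layout described above, assuming hypothetical mount points (/home on the NFS appliance, /scratch on the parallel filesystem); the slide fixes only the tiers themselves, and the actual filesystem layout was still to be discussed (see the User Services slide).

```python
import os

# Hypothetical mount points for the two storage tiers; only the tiers
# (NFS home, parallel scratch) come from the slide, not the paths.
TIERS = {
    "/home": "enterprise NFS (home directories, backed up)",
    "/scratch": "parallel filesystem (large temporary job I/O, no local disk needed)",
}

def tier_for(path: str) -> str:
    """Return the storage tier a path belongs to, or 'unknown'."""
    abspath = os.path.abspath(path)
    for mount in sorted(TIERS, key=len, reverse=True):
        if abspath == mount or abspath.startswith(mount + "/"):
            return TIERS[mount]
    return "unknown"

# Job output belongs on scratch; small configuration files live in home.
print(tier_for("/scratch/jdoe/run42/output.dat"))
print(tier_for("/home/jdoe/params.cfg"))
```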

7
UC3 Phase 1 Implementation
  • Technical Architecture
  • Home Directory Considerations
  • 24 PIs/projects with about 10 users per project
  • Assuming 10GB per user for home directory use
  • Assuming 1-2TB per project
  • Suggest minimum size of 50TB for home directories and backups (arithmetic sketched after this list)
  • 240 users x 10GB = 2.4TB for users
  • 1TB per project x 24 projects = 24TB
  • Low-maintenance NFS appliance is desired.
  • Parallel Filesystem Considerations
  • Lustre is the likely choice due to availability and cost
  • Terascala provides a turnkey Lustre solution, currently under evaluation at LBNL
  • Another consideration would be DIY Lustre; initial and ongoing support effort will be a factor in deciding.
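
The capacity arithmetic behind the 50TB suggestion, restated as a small Python sketch; all figures come from the bullets above, and decimal terabytes are assumed to match the slide's 2.4TB result.

```python
# Home-directory sizing from the figures on this slide.
projects = 24
users_per_project = 10
gb_per_user = 10
tb_per_project = 1        # slide assumes 1-2TB per project; low end used here

user_data_tb = projects * users_per_project * gb_per_user / 1000   # 2.4 TB
project_data_tb = projects * tb_per_project                        # 24 TB
primary_tb = user_data_tb + project_data_tb                        # 26.4 TB

print(f"User data:    {user_data_tb:.1f} TB")
print(f"Project data: {project_data_tb} TB")
print(f"Primary data: {primary_tb:.1f} TB")
# The suggested 50TB minimum leaves headroom for backups and for the
# 2TB-per-project upper estimate.
```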

8
UC3 Phase 1 Network Architecture
9
UC3 Phase 1 Procurement
  • Procurement
  • High-profile opportunity for vendors
  • Three major procurements: Clusters, NFS Storage, Parallel Filesystem Storage
  • Single procurement for each major component.
  • Scored and weighted evaluation criteria (a scoring sketch follows this list)
  • No acceptance criteria other than demonstrated compatibility/integration requirements as specified in the subcontract.
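
A minimal sketch of scored-and-weighted bid evaluation; the criteria, weights, and vendor scores below are made up purely for illustration, since the real criteria and weights would be defined in each RFQ.

```python
# Weighted scoring of vendor bids. Criteria, weights, and scores are
# invented for illustration only.
WEIGHTS = {
    "price_performance": 0.40,
    "technical_compliance": 0.30,
    "support": 0.20,
    "delivery": 0.10,
}

def weighted_score(scores):
    """Combine per-criterion scores (0-10) into one weighted total."""
    return sum(WEIGHTS[c] * s for c, s in scores.items())

bids = {
    "Vendor A": {"price_performance": 8, "technical_compliance": 9, "support": 7, "delivery": 6},
    "Vendor B": {"price_performance": 9, "technical_compliance": 7, "support": 8, "delivery": 8},
}

for vendor in sorted(bids, key=lambda v: weighted_score(bids[v]), reverse=True):
    print(f"{vendor}: {weighted_score(bids[vendor]):.2f}")
```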

10
UC3 Phase 1 Timeline
  • Schedule
  • Feb - Develop spec for major components
  • Mar - Finalize RFQs
  • Apr - Issue Cluster and NFS Storage RFQ. Vendor
    responses due late-Apr
  • May - Issue Cluster and NFS Storage award early
    May.
  • Jun - Delivery of Cluster and NFS Storage
    hardware. Install.
  • Late Jun - Issue RFQ for Parallel Filesystem
    Storage.
  • Jul - Available for early users
  • Aug - Add Parallel Filesystem Storage

11
UC3 Phase 1 User Services
  • User Experience
  • Shared logins across systems
  • Agreement on a uniform UID space (a simple consistency check is sketched after this list)
  • Agreement on the CentOS 5.3 operating system, OpenMPI, and the Moab scheduler.
  • Still need to discuss filesystem layout
  • NFS cross-mounting of home directories across the L2 network
  • Still need to work out:
  • Help desk procedures
  • Ticket system
  • Web site
  • Documentation
  • How to get help
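
To illustrate the uniform-UID requirement, a minimal Python sketch that compares passwd-style account dumps from the two sites and flags mismatches; the dump file names and export format are hypothetical.

```python
# Check that users present on both clusters map to the same UID, so
# NFS cross-mounted home directories keep consistent ownership.
# The dump file names and passwd-style format are assumptions.

def load_accounts(passwd_dump: str) -> dict:
    """Map username -> UID from a passwd-style dump (user:x:uid:...)."""
    accounts = {}
    with open(passwd_dump) as fh:
        for line in fh:
            fields = line.strip().split(":")
            if len(fields) >= 3 and fields[2].isdigit():
                accounts[fields[0]] = int(fields[2])
    return accounts

site_a = load_accounts("site_a_passwd.txt")   # hypothetical export, cluster A
site_b = load_accounts("site_b_passwd.txt")   # hypothetical export, cluster B

for user in sorted(set(site_a) & set(site_b)):
    if site_a[user] != site_b[user]:
        print(f"UID mismatch for {user}: {site_a[user]} vs {site_b[user]}")
```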

12
UC3 Phase 1 Governance
  • Governance
  • Oversight board consisting of stakeholders to be
    established
  • Scope will include
  • Policy and definition of metrics
  • Compute and storage allocations
  • System configuration details (e.g. scheduler
    priority)
  • What happens after the 2-year pilot?
  • Who gets access? Additional users?
  • Strategy to make the service sustainable
  • Condo clusters

13
UC3 Phase 1 Open Issues
  • UC Grid Technical Issues
  • How would we configure UC Grid for UC3?
  • OTP support
  • Moab scheduler support
  • Integration with the Gold banking system for allocations
  • Other Issues
  • How might we implement a UC-wide distributed user
    services team?
  • How do we build the customer relationships?
