Asia Pacific Grid: Towards a production Grid

1
Asia Pacific Grid: Towards a production Grid
  • Yoshio Tanaka
  • Grid Technology Research Center,
  • Advanced Industrial Science and Technology, Japan

2
Contents
  • Updates from PRAGMA 5
  • Demo at SC2003 (climate simulation using Ninf-G)
  • Joint demo with NCHC
  • Joint demo with TeraGrid
  • Experiences and Lessons Learned
  • Towards a production Grid

3
Why the climate simulation?
  • Climate simulation is used as a test application to
    evaluate progress of resource sharing between institutions
  • We can confirm achievements of:
    • Globus-level resource sharing
      • Globus is correctly installed
      • Mutual authentication based on GSI
    • High-level middleware (GridRPC) level resource sharing
      • JobManager works well
      • Network configuration of the cluster (note that
        most clusters use private IP addresses)

4
Behavior of the System
  • Ninf-G client at AIST
  • Servers: NCSA cluster (225 CPUs), AIST cluster (50 CPUs),
    Titech cluster (200 CPUs), KISTI cluster (25 CPUs)
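The fan-out above (one Ninf-G client at AIST driving server clusters at NCSA, AIST, Titech, and KISTI) can be sketched structurally in Python. This is an analogy only: Ninf-G itself is a C implementation of the GridRPC API, and the server names and dummy task below are purely illustrative.

```python
# Structural analogy of the one-client / many-servers GridRPC fan-out.
# Ninf-G is a C implementation of GridRPC; the names here are made up.
from concurrent.futures import ThreadPoolExecutor

SERVERS = ["ncsa", "aist", "titech", "kisti"]  # stand-ins for the clusters

def run_scenario(server, scenario_id):
    # In the real system this would be a remote grpc_call() that runs
    # one climate-simulation scenario on the named cluster.
    return (server, scenario_id * 2)  # dummy computation

with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
    results = list(pool.map(run_scenario, SERVERS, range(len(SERVERS))))

print(results)
```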
5
Terrible 3 weeks (PRAGMA5 to SC2003)
  • Increased resources
    • 14 clusters -> 22 clusters
    • 317 CPUs -> 853 CPUs
  • Installed Ninf-G and the climate simulation on TeraGrid
    • Account was given on Nov. 4th
    • Ported Ninf-G2 to the IA64 architecture

6
Necessary steps for the demo
  • Apply for an account at each site
  • Add an entry to the grid-mapfile
  • Test globusrun
    • Authentication
      • Is my CA trusted? Do I trust your CA?
      • Is my entry in the grid-mapfile?
    • DNS lookup
      • Reverse lookup is used for server authentication
    • Firewall / TCP Wrapper
      • Can I connect to the Globus gatekeeper?
      • Can the Globus jobmanager connect to my machine?
    • Jobmanager
      • Is the queuing system (e.g. PBS, SGE) installed
        appropriately?
      • Does the jobmanager script work as expected?
  • In the case of TeraGrid
    • Obtained my user certificate from the TeraGrid CA (NCSA CA)
    • Asked TITECH and KISTI to trust the NCSA CA
    • It was not feasible to ask TeraGrid to trust the AIST
      GTRC CA
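Two of the checks above, the reverse DNS lookup that GSI server authentication relies on and firewall reachability of the gatekeeper, can be sketched as a small Python script. The host names are placeholders; port 2119 is the default GT2 gatekeeper port.

```python
import socket

def reverse_lookup_ok(ip):
    """GSI server authentication depends on reverse DNS: the IP must
    have a PTR record, and the name should resolve back to the IP."""
    try:
        name = socket.gethostbyaddr(ip)[0]
        return ip in socket.gethostbyname_ex(name)[2]
    except OSError:
        return False

def gatekeeper_reachable(host, port=2119, timeout=5):
    """Firewall / TCP Wrapper check: can we open a TCP connection
    to the Globus gatekeeper port at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(reverse_lookup_ok("127.0.0.1"))
```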

7
Necessary steps for the demo (cont'd)
  • Install Ninf-G2
    • A frequent problem was inappropriate installation of the
      GT2 SDK
    • GT2 manual specifies flavors: GRAM and Data bundles with
      gcc32dbg, Info bundle with gcc32dbgpthr
    • Asked sites to additionally install the Info SDK with the
      gcc32dbg flavor
  • Test the Ninf-G application
    • Can the Ninf-G server program connect back to the client?
    • If private IP addresses are used for the backend nodes,
      NAT must be available
  • These are application/middleware-specific requirements;
    requirements depend on the application and middleware
    • A new Ninf-G application (TDDFT) needs the Intel Fortran
      Compiler
    • Other applications need GAMESS / Gaussian
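The connect-back test can be sketched in Python: a Ninf-G server must open a TCP connection back to the client, which is exactly what fails behind private addresses without NAT. The demo below stands in for both sides on localhost, so the host and port are illustrative.

```python
import socket

def can_connect_back(client_host, client_port, timeout=5):
    """A Ninf-G server process must be able to open a TCP connection
    *back* to the client; with private backend addresses this fails
    unless NAT (or some relay) is configured."""
    try:
        with socket.create_connection((client_host, client_port),
                                      timeout=timeout):
            return True
    except OSError:
        return False

# Local demonstration: a listener on localhost stands in for the client.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
ok = can_connect_back("127.0.0.1", port)
print(ok)
listener.close()
```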

8
Lessons Learned
  • Much effort is needed for initiation
  • MDS is not scalable and is still unstable
    • Need to modify some parameters in grid-info-slapd.conf
  • The testbed was unstable
    • Unstable / poor network
    • System maintenance (incl. software upgrades) without
      notification
      • Realized only when the application failed
      • "It worked well yesterday, but I'm not sure whether it
        works today"
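The slide does not name the parameters that had to be changed. As an illustrative sketch only: grid-info-slapd.conf is MDS's slapd configuration, so the standard OpenLDAP search limits are the usual candidates, with values that are purely hypothetical and site-specific.

```
# grid-info-slapd.conf (illustrative values only; tune per site)
sizelimit  50     # max entries returned per search
timelimit  120    # abort searches running longer than 120 s
```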

9
Lessons Learned (cont'd)
  • Difficulties caused by the grass-roots approach
    • It is not easy to keep the GT2 version coherent between
      sites
    • Different requirements for the Globus Toolkit between
      users
  • Most resources are not dedicated to the testbed
    • Resources may be busy / highly utilized
    • Need a Grid-level scheduler; a fancy Grid reservation
      system?
  • (From the point of view of resource providers) we need
    flexible control of donated resources
    • e.g. 32 nodes for default users, 64 nodes for specific
      groups, 256 nodes for my organization
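Such per-group limits could be expressed in the batch system's queue configuration. The sketch below uses OpenPBS/Torque qmgr syntax; the queue names, group names, and limits are all hypothetical.

```
# Illustrative qmgr input; queue names, groups, and limits are made up.
create queue default
set queue default resources_max.nodect = 32    # default users: 32 nodes
create queue partners
set queue partners acl_group_enable = True
set queue partners acl_groups = pragma
set queue partners resources_max.nodect = 64   # specific group: 64 nodes
```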

10
Summary of current status (cont'd)
  • What has been done?
    • Resource sharing between more than 20 sites (853 CPUs
      were used by a Ninf-G application)
    • Use of GT2 as common software
  • What hasn't?
    • Formalize how to use the Grid testbed
      • I could use it, but it is difficult for others
      • I was given an account at each site through personal
        communication
    • Provide documentation
    • Keep the testbed stable
    • Develop management tools
      • Browse information
      • CA/certificate management

11
Towards a production Grid
  • Define minimum requirements for Grid middleware
    • The Resources WG has the responsibility
    • cf. the NMI and TeraGrid software stacks
    • Each site must follow the requirements
  • Keep the testbed as stable as possible
  • Understand that security is definitely essential for
    international collaboration
    • What is the security (CA) policy in the Asia Pacific?

12
Towards a production Grid (cont'd)
  • Draft the "Asia Pacific Grid Middleware Deployment Guide",
    a recommendation document for deployment of Grid middleware
    • Minimum requirements
    • Configuration
  • Draft the "Instruction of Grid Operation in the Asia
    Pacific Region", which describes how to run a Grid
    Operation Center to support management of a stable Grid
    testbed
  • Launch the Asia Pacific Grid Policy Management Authority
    (http://www.apgridpma.org/)
    • Coordinate security levels in Asia
    • Interact with bodies outside Asia (DOEGrids PMA,
      EUGridPMA)
  • A sophisticated users' guide is necessary

13
Towards a production Grid (cont'd)
  • Each site should provide a document and/or web pages for
    users
    • Requirements for users
    • How to obtain an account
    • Available resources
      • Hardware
      • Software and its configuration
    • Resource utilization policy
    • Support and contact information

14
Future Plan (cont'd)
  • Should think about a GT3/GT4-based Grid testbed
  • Each CA must provide a CP/CPS
  • International collaboration
    • TeraGrid, UK e-Science, EUDG, etc.
  • Run more applications to evaluate the feasibility of the
    Grid
    • Large-scale clusters with fat links
    • Many small clusters with thin links

15
Summary
  • It is tough work to make resources available for
    applications
    • Many steps
  • It is tough to keep the testbed stable
  • Many issues to be solved towards a production Grid
    • Technical
      • Local and global schedulers
      • Dedication / reservation / co-allocation
    • Political
      • CA policy
      • How can I get an account on your site?
    • Both
      • Coordination of middleware
      • More interaction between the Resources and
        Applications WGs is necessary
      • Need to establish the procedures necessary for
        resource sharing