Asia Pacific Grid: Towards a production Grid

1
Asia Pacific Grid: Towards a production Grid
  • Yoshio Tanaka
  • Grid Technology Research Center,
  • Advanced Industrial Science and Technology, Japan

2
Contents
  • Updates from PRAGMA 5
  • Demo at SC2003 (climate simulation using Ninf-G)
  • Joint demo with NCHC
  • Joint demo with TeraGrid
  • Experiences and Lessons Learned
  • Towards a production Grid

3
Why the climate simulation?
  • Climate simulation is used as a test application to
    evaluate progress of resource sharing between institutions
  • We can confirm achievements of:
    • Globus-level resource sharing
      • Globus is correctly installed
      • Mutual authentication based on GSI
    • High-level middleware (GridRPC) level resource sharing
      • JobManager works well
      • Network configuration of the cluster (note that
        most clusters use private IP addresses)

4
Behavior of the System
  • Ninf-G client at AIST
  • Servers: NCSA cluster (225 CPUs), AIST cluster (50 CPUs),
    Titech cluster (200 CPUs), KISTI cluster (25 CPUs)
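The fan-out above (one Ninf-G client at AIST driving server clusters at NCSA, AIST, Titech, and KISTI) can be sketched structurally in Python. This is an analogy only: Ninf-G itself is a C implementation of the GridRPC API, and the server names and dummy task below are purely illustrative.

```python
# Structural analogy of the one-client / many-servers GridRPC fan-out.
# Ninf-G is a C implementation of GridRPC; the names here are made up.
from concurrent.futures import ThreadPoolExecutor

SERVERS = ["ncsa", "aist", "titech", "kisti"]  # stand-ins for the clusters

def run_scenario(server, scenario_id):
    # In the real system this would be a remote grpc_call() that runs
    # one climate-simulation scenario on the named cluster.
    return (server, scenario_id * 2)  # dummy computation

with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
    results = list(pool.map(run_scenario, SERVERS, range(len(SERVERS))))

print(results)
```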
5
Terrible 3 weeks (PRAGMA5 to SC2003)
  • Increased resources
    • 14 clusters -> 22 clusters
    • 317 CPUs -> 853 CPUs
  • Installed Ninf-G and the climate simulation on TeraGrid
    • Account was given on Nov. 4th
    • Ported Ninf-G2 to the IA64 architecture

6
Necessary steps for the demo
  • Apply for an account at each site
  • Add an entry to the grid-mapfile
  • Test globusrun
    • Authentication
      • Is my CA trusted? Do I trust your CA?
      • Is my entry in the grid-mapfile?
    • DNS lookup
      • Reverse lookup is used for server authentication
    • Firewall / TCP Wrapper
      • Can I connect to the Globus gatekeeper?
      • Can the Globus jobmanager connect to my machine?
    • Jobmanager
      • Is the queuing system (e.g. PBS, SGE) installed
        appropriately?
      • Does the jobmanager script work as expected?
  • In the case of TeraGrid
    • Obtained my user certificate from the TeraGrid CA (NCSA CA)
    • Asked TITECH and KISTI to trust the NCSA CA
    • It was not feasible to ask TeraGrid to trust the AIST
      GTRC CA
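Two of the checks above, the reverse DNS lookup that GSI server authentication relies on and firewall reachability of the gatekeeper, can be sketched as a small Python script. The host names are placeholders; port 2119 is the default GT2 gatekeeper port.

```python
import socket

def reverse_lookup_ok(ip):
    """GSI server authentication depends on reverse DNS: the IP must
    have a PTR record, and the name should resolve back to the IP."""
    try:
        name = socket.gethostbyaddr(ip)[0]
        return ip in socket.gethostbyname_ex(name)[2]
    except OSError:
        return False

def gatekeeper_reachable(host, port=2119, timeout=5):
    """Firewall / TCP Wrapper check: can we open a TCP connection
    to the Globus gatekeeper port at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(reverse_lookup_ok("127.0.0.1"))
```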

7
Necessary steps for the demo (cont'd)
  • Install Ninf-G2
    • A frequent problem was inappropriate installation of the
      GT2 SDK
    • GT2 manual specifies flavors: GRAM and Data bundles with
      gcc32dbg, Info bundle with gcc32dbgpthr
    • Asked sites to additionally install the Info SDK with the
      gcc32dbg flavor
  • Test the Ninf-G application
    • Can the Ninf-G server program connect back to the client?
    • If private IP addresses are used for the backend nodes,
      NAT must be available
  • These are application/middleware-specific requirements;
    requirements depend on the application and middleware
    • A new Ninf-G application (TDDFT) needs the Intel Fortran
      Compiler
    • Other applications need GAMESS / Gaussian
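The connect-back test can be sketched in Python: a Ninf-G server must open a TCP connection back to the client, which is exactly what fails behind private addresses without NAT. The demo below stands in for both sides on localhost, so the host and port are illustrative.

```python
import socket

def can_connect_back(client_host, client_port, timeout=5):
    """A Ninf-G server process must be able to open a TCP connection
    *back* to the client; with private backend addresses this fails
    unless NAT (or some relay) is configured."""
    try:
        with socket.create_connection((client_host, client_port),
                                      timeout=timeout):
            return True
    except OSError:
        return False

# Local demonstration: a listener on localhost stands in for the client.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
ok = can_connect_back("127.0.0.1", port)
print(ok)
listener.close()
```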

8
Lessons Learned
  • Much effort is needed for initiation
  • MDS is not scalable and is still unstable
    • Need to modify some parameters in grid-info-slapd.conf
  • The testbed was unstable
    • Unstable / poor network
    • System maintenance (incl. software upgrades) without
      notification
      • Realized only when the application failed
      • "It worked well yesterday, but I'm not sure whether it
        works today"
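The slide does not name the parameters that had to be changed. As an illustrative sketch only: grid-info-slapd.conf is MDS's slapd configuration, so the standard OpenLDAP search limits are the usual candidates, with values that are purely hypothetical and site-specific.

```
# grid-info-slapd.conf (illustrative values only; tune per site)
sizelimit  50     # max entries returned per search
timelimit  120    # abort searches running longer than 120 s
```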

9
Lessons Learned (cont'd)
  • Difficulties caused by the grass-roots approach
    • It is not easy to keep the GT2 version coherent between
      sites
    • Different requirements for the Globus Toolkit between
      users
  • Most resources are not dedicated to the testbed
    • Resources may be busy / highly utilized
    • Need a Grid-level scheduler; a fancy Grid reservation
      system?
  • (From the point of view of resource providers) we need
    flexible control of donated resources
    • e.g. 32 nodes for default users, 64 nodes for specific
      groups, 256 nodes for my organization
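Such per-group limits could be expressed in the batch system's queue configuration. The sketch below uses OpenPBS/Torque qmgr syntax; the queue names, group names, and limits are all hypothetical.

```
# Illustrative qmgr input; queue names, groups, and limits are made up.
create queue default
set queue default resources_max.nodect = 32    # default users: 32 nodes
create queue partners
set queue partners acl_group_enable = True
set queue partners acl_groups = pragma
set queue partners resources_max.nodect = 64   # specific group: 64 nodes
```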

10
Summary of current status (cont'd)
  • What has been done?
    • Resource sharing between more than 20 sites (853 CPUs
      were used by a Ninf-G application)
    • Use of GT2 as common software
  • What hasn't?
    • Formalize how to use the Grid testbed
      • I could use it, but it is difficult for others
      • I was given an account at each site through personal
        communication
    • Provide documentation
    • Keep the testbed stable
    • Develop management tools
      • Browse information
      • CA/certificate management

11
Towards a production Grid
  • Define minimum requirements for Grid middleware
    • The Resources WG has the responsibility
    • cf. the NMI and TeraGrid software stacks
    • Each site must follow the requirements
  • Keep the testbed as stable as possible
  • Understand that security is definitely essential for
    international collaboration
    • What is the security (CA) policy in the Asia Pacific?

12
Towards a production Grid (cont'd)
  • Draft the "Asia Pacific Grid Middleware Deployment Guide",
    a recommendation document for deployment of Grid middleware
    • Minimum requirements
    • Configuration
  • Draft the "Instruction of Grid Operation in the Asia
    Pacific Region", which describes how to run a Grid
    Operation Center to support management of a stable Grid
    testbed
  • Launch the Asia Pacific Grid Policy Management Authority
    (http://www.apgridpma.org/)
    • Coordinate security levels in Asia
    • Interact with bodies outside Asia (DOEGrids PMA,
      EUGridPMA)
  • A sophisticated users' guide is necessary

13
Towards a production Grid (cont'd)
  • Each site should provide a document and/or web pages for
    users
    • Requirements for users
    • How to obtain an account
    • Available resources
      • Hardware
      • Software and its configuration
    • Resource utilization policy
    • Support and contact information

14
Future Plan (cont'd)
  • Should think about a GT3/GT4-based Grid testbed
  • Each CA must provide a CP/CPS
  • International collaboration
    • TeraGrid, UK e-Science, EUDG, etc.
  • Run more applications to evaluate the feasibility of the
    Grid
    • Large-scale clusters with fat links
    • Many small clusters with thin links

15
Summary
  • It is tough work to make resources available for
    applications
    • Many steps
  • It is tough to keep the testbed stable
  • Many issues to be solved towards a production Grid
    • Technical
      • Local and global schedulers
      • Dedication / reservation / co-allocation
    • Political
      • CA policy
      • How can I get an account on your site?
    • Both
      • Coordination of middleware
      • More interaction between the Resources and
        Applications WGs is necessary
      • Need to establish the procedures necessary for
        resource sharing