Title: HPWREN and International Collaborations
1 PRAGMA Grid Lessons Learned
Cindy Zheng, David Abramson, Peter Arzberger,
Shahaan Ayyub, Colin Enticott, Slavisa Garic,
Wojtek Goscinski, Mason J. Katz, Bu Sung Lee,
Phil M. Papadopoulos, Sugree Phatanapherom,
Somsak Sriprayoonsakul, Yoshio Tanaka, Yusuke
Tanimura, Osamu Tatebe, Putchong Uthayopas and
the whole PRAGMA Grid team
Pacific Rim Application and Grid Middleware Assembly
http://www.pragma-grid.net http://goc.pragma-grid.net
2 Overview
- PRAGMA
- PRAGMA Grid
- People
- Hardware
- Software
- Operations
- Grid Applications
- Grid Middleware
- Security
- Infrastructure
- Services
- Grid Interoperations
Heterogeneity, People, Collaborations, Integrations
Lessons learned
3 PRAGMA
A Practical Collaborative Framework: People and applications
Overarching Goals
Strengthen Existing and Establish New Collaborations
Work with Science Teams to Advance Grid Technologies and Improve the Underlying Infrastructure
In the Pacific Rim and Globally
http://www.pragma-grid.net
4 PRAGMA Member Institutions
CRAY PNWG USA
JLU China
KBSI KISTI Konkuk Korea
CNIC China
AIST CCS CMC NARC OsakaU TITech Japan
UUtah USA
UoHyd India
CalIT2 CRBS SDSC UCSD USA
APAN Japan
NCSA StarLight TransPAC2 USA
ASGCC NCHC Taiwan
CICESE Mexico
KU NECTEC TNGC Thailand
APAC Australia
BII IHPC NGO Singapore
BeSTGRID New Zealand
MIMOS USM Malaysia
MU Australia
37 institutions from 12 countries/regions
Founded 2002, Supported by Members
http://www.pragma-grid.net
5 Overview and Approach
Process to Promote Routine Use Team Science
Application-Driven Collaborations: Applications, Middleware
Outcomes: Improved middleware, Broader Use, New Collaborations, Transfer Tech., Standards, Publications, New Knowledge, Data Access, Education
6 PRAGMA Working Groups
- Bioscience
- Telescience
- Geo-science
- Resources and data
- Grid middleware interoperability
- Global grid usability and productivity
- The PRAGMA Grid effort is led by the resources and data working group, but relies on collaborations and contributions among all working groups.
7 PRAGMA Grid
JLU China
AIST OsakaU UTsukuba TITech Japan
NCSA USA
CNIC GUCAS China
UZH Switzerland
KISTI Korea
BU USA
UUtah USA
SDSC USA
LZU China
UPRM Puerto Rico
ASGC NCHC Taiwan
UoHyd India
CICESE Mexico
CUHK HongKong
UNAM Mexico
NECTEC ThaiGrid Thailand
HCMUT IOIT-HCM Vietnam
ITCR Costa Rica
APAC QUT Australia
MIMOS USM Malaysia
BII IHPC NGO NTU Singapore
UCN Chile
BESTGrid New Zealand
UChile Chile
MU Australia
32 institutions in 16 countries/regions, 27 compute sites, 14 Gfarm sites (6 in preparation)
14 Gfarm sites: AIST, CNIC, GUCAS, NCSA, UZH, SDSC, LZU, ASGC, CUHK, NECTEC, ThaiGrid, IOIT-HCM, MIMOS, NGO
8 PRAGMA Grid Members and Team
http://goc.pragma-grid.net/wiki/index.php/Site_status_and_tasks
- Sites
- 23 sites from PRAGMA member institutions
- 15 sites from Non-PRAGMA member institutions
- 27 sites contributed compute clusters
- Team members
- 160 and growing
- one management contact / site
- 1-3 technical support contacts / site
- 1-4 application drivers / application
- 1-5 / middleware development team
9 PRAGMA Grid Compute Resources
http://goc.pragma-grid.net/pragma-doc/computegrid.html
10 Characteristics of PRAGMA Grid
- Grass-roots
- Voluntary contribution
- Open (PRAGMA member or not, Pacific Rim or not)
- Long-term collaborative working experiment
- Heterogeneous
- Funding
- No uniform infrastructure management
- Variety of sciences and applications
- Site policies, system and network environments
- Realistically tough
- Good for development, collaborations,
integrations and testing
11 PRAGMA Grid Software Layers
http://goc.pragma-grid.net/pragma-doc/userguide/join.html
Applications: Phylogenetic, FMO, CSTFT, Savannah, MM5, AMBER, Siesta
Application Middleware: Ninf-G, Nimrod/G, Mpich-GX
Infrastructure Middleware: Gfarm, SCMSWeb, MOGAS, CSF
Globus (required)
Local job scheduler (require one): SGE, PBS, LSF, SQMS
12 PRAGMA Grid Operations
13 One of the major lessons from PRAGMA Grid, which everybody has noticed and would agree with: you have to Grid people before you can Grid machines
14 Grid Operation
http://goc.pragma-grid.net, http://wiki.pragma-grid.net
- Develop and maintain mutually beneficial and happy relationships among all people involved
- Geographies, time-zones, languages
- Funding, chain-of-command, priorities
- Mutual benefit, consensus, active leadership
- Coordinator, site contacts
- Collaboration tools
- Mailing lists, VTCs, Skype, semi-annual workshops
- Grid Operation Center (GOC)
- Wiki: all sites, application and middleware teams collaborate
- Heterogeneity
- Tolerate, technology, overcome and take advantage
- Software inventory instead of software stack
- Many sub-grids for applications
- Recommendation instead of requirements
- Software license
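The "software inventory instead of software stack" and "many sub-grids for applications" points can be pictured with a small sketch. It assumes each site simply publishes the set of software it has chosen to install, and an application's sub-grid is the set of sites that cover its needs. The site names come from this deck, but the inventories, requirements and function names are made up for illustration and are not a PRAGMA tool.

```python
# Hypothetical inventory: site name -> software that site has installed.
# The assignments below are illustrative assumptions, not real site data.
SITE_INVENTORY = {
    "SDSC": {"Globus", "SGE", "Ninf-G", "Gfarm"},
    "AIST": {"Globus", "SGE", "Ninf-G", "Gfarm"},
    "MU": {"Globus", "PBS", "Nimrod/G"},
    "KISTI": {"Globus", "PBS", "Mpich-GX"},
}

# Hypothetical application requirements.
APP_REQUIREMENTS = {
    "TDDFT": {"Globus", "Ninf-G"},
    "Savannah": {"Globus", "Nimrod/G"},
    "iGAP": {"Globus", "Gfarm"},
}


def sub_grid(requirements, inventory):
    """Return the sorted list of sites whose inventory covers the requirements."""
    return sorted(site for site, software in inventory.items()
                  if requirements <= software)


def missing_software(requirements, inventory):
    """For sites outside the sub-grid, list what they would need to install."""
    return {site: sorted(requirements - software)
            for site, software in inventory.items()
            if not requirements <= software}


if __name__ == "__main__":
    for app, needs in APP_REQUIREMENTS.items():
        print(app, "->", sub_grid(needs, SITE_INVENTORY))
```

Under this view no site has to install everything; `missing_software` turns a requirement into a recommendation for the sites that want to join a given sub-grid.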
15 Create New Ways To Operate
http://goc.pragma-grid.net, http://wiki.pragma-grid.net
- Lack of precedent
- Everyone contributes ideas, suggestions
- Evolving and improving over time
- Everyone documents and updates (wiki)
- Create new procedures
- New site setup to join PRAGMA Grid
- http://goc.pragma-grid.net/pragma-doc/userguide/join.html
- New user/application to run in PRAGMA grid
- http://goc.pragma-grid.net/pragma-doc/userguide/pragma_user_guide.html
- Tabulate information
- Application pages, site pages, resources tables, status pages
- Publish instructions
- Software deployment procedures, tools
16 Application Driven
17 Applications and Middleware
http://goc.pragma-grid.net/applications/default.html
- Real science applications paired with and driving middleware development
- Open to applications of all scientific disciplines
- Achieve long runs and scientific results
- 30 applications in 3 years
- Climate simulation
- Savannah/Nimrod (MU, Australia)
- MM5/Mpich-GX (CICESE, Mexico; KISTI, Korea)
- Quantum-mechanics, quantum-chemistry
- TDDFT, QM-MD, FMO/Ninf-G (AIST, Japan)
- Genomics and meta-genomics
- iGAP/Gfarm/CSF (UCSD, USA; AIST, Japan; JLU, China)
- HPM genomics (IOIT-HCM, Vietnam)
- mpiBlast/Mpich-G2 (ASGC, Taiwan)
- Phylogenetic/Gfarm/CSF (UWisc and UCSD, USA)
- Computational chemistry and fluid dynamics
- CSE-Online (UUtah, USA)
- e-AIRS (KISTI, Korea)
- Gamess-APBS/Nimrod (UZurich, Switzerland)
- Molecular simulation
18 Applications By PRIME Students
http://prime.ucsd.edu/student_collections2007.htm
- Providing UCSD undergraduate students with international interdisciplinary research internships and cultural experiences since 2004.
- Sample applications run in PRAGMA grid this year
- Climate modeling
- Multi-walled carbon nanotube and polyethylene oxide composite computer visualization model
- Metabolic regulation of ionic currents and pumps in rabbit ventricular myocyte model
- Improving binding energy using quantum mechanics
- Cardiac mechanics modeling
- H5N1 simulation
- Shp2 Protein Tyrosine Phosphatase Inhibitor simulation for cancer research
19 Lessons Learned From Running Applications
- PRAGMA grid and its heterogeneous environment are great for
- Testing
- Collaborating
- Integrating
- Sharing
- Not easy
- Middleware needs improvements
- Work in heterogeneous environment
- Fault tolerance
- Need user friendly portals and services
- Automate and integrate
- Information collections (grid monitoring, workflow)
- Decisions and executions (scheduling)
- Domain specific easy user interfaces (portals, CE tools)
20 Grid Middleware
21 Ninf-G
http://ninf.apgrid.org
- Developed by AIST, Japan
- Based on GridRPC model
- Supports parallel computing
- OGF standard
- Integrated into NMI release 8 (first non-US software in NMI)
- Integrated with Rocks
- 4 applications ran in PRAGMA grid, 2 ran in multi-grid
- TDDFT
- QM/MD
- FMO
- CSTFT (UPRM)
- Achieved long runs (50 days)
- Improved fault-tolerance
- Simplified deployment procedures
- Sped up development cycles
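As a rough illustration of why fault tolerance matters for long GridRPC-style runs, the sketch below farms independent calls out to several servers and fails over when one is unavailable. It is not the Ninf-G or GridRPC API: the remote procedure is simulated locally, and every name in it (`simulated_remote_square`, `call_with_failover`, `farm`, the server labels) is a hypothetical stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


class RemoteFailure(Exception):
    """Raised when a (simulated) remote server cannot complete a call."""


def simulated_remote_square(server, x):
    """Stand-in for a remote procedure; the server named 'flaky' always fails."""
    if server == "flaky":
        raise RemoteFailure(f"{server} is unreachable")
    return x * x


def call_with_failover(func, arg, servers):
    """Try servers in order and return (server, result) from the first success."""
    failures = []
    for server in servers:
        try:
            return server, func(server, arg)
        except RemoteFailure as exc:
            failures.append(str(exc))
    raise RuntimeError(f"all servers failed for {arg!r}: {failures}")


def farm(func, args, servers, workers=4):
    """Run independent calls concurrently, rotating the preferred server per task."""
    def rotated(i):
        k = i % len(servers)
        return servers[k:] + servers[:k]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call_with_failover, func, a, rotated(i))
                   for i, a in enumerate(args)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    servers = ["site-a", "flaky", "site-b"]
    for server, value in farm(simulated_remote_square, range(6), servers):
        print(server, value)
```

A single unreachable site then costs one retry per affected task instead of aborting a run that may have been going for weeks.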
22 Nimrod/G
http://www.csse.monash.edu.au/davida/nimrod
- Developed by Monash University (MU), Australia
- Supports large scale parameter sweeps on Grid infrastructure
- Easy user interface: Nimrod portals
- MU, Australia
- UZurich, Switzerland
- UCSD, USA
- 3 applications ran in PRAGMA grid and 1 runs in multi-grids
- Savannah climate simulation (MU)
- GAMESS/APBS (UZurich)
- Siesta (UZurich)
- Developed interface to Unicore
- Achieved long runs (90 different scenarios of 6 weeks each)
- Improved fault-tolerance (innovate time_step)
- Enhancements in data and storage handling
Description of Parameters, PLAN FILE
1:30pm Tutorial by David Abramson, Blair Bethwaite
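A parameter sweep of the kind Nimrod/G supports amounts to running one job per point in the Cartesian product of the parameter values. The sketch below shows only that expansion step. It does not reproduce Nimrod plan-file syntax, and the parameter names, values and command template are hypothetical.

```python
from itertools import product


def expand_sweep(parameters):
    """Yield one dict per point in the Cartesian product of parameter values."""
    names = list(parameters)
    for values in product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))


def command_for(point, template):
    """Fill a command template with one parameter combination."""
    return template.format(**point)


# Hypothetical parameters for a climate-style sweep (illustrative only).
SWEEP = {
    "burn_fraction": [0.0, 0.25, 0.5],
    "start_month": ["may", "aug"],
}

if __name__ == "__main__":
    template = "./model --burn {burn_fraction} --start {start_month}"
    for point in expand_sweep(SWEEP):
        print(command_for(point, template))
```

Each generated command is independent of the others, which is what lets a sweep spread across many heterogeneous sites.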
23 Mpich-GX
http://www.moredream.org/mpich.htm
- Mpich-GX
- Korea Institute of Science and Technology Information (KISTI), Korea
- Based on Mpich-G2
- Grid-enabled MPI, supports
- Private IP
- Fault tolerance
- MM5 and WRF
- CICESE, Mexico
- Medium scale atmospheric simulation model
- Experiment
- KGrid
- WRF works well with MPICH-GX
- MM5 experienced scaling problems with MPICH-GX when using more than 24 processors in a cluster
- Functionality of the private IP is usable
- Performance of the private IP is reasonable
24 MM5-WRF/Mpich-GX Experiment
[Figure: Hurricane Marty simulation and Santa Ana Winds simulation run with Mpich-GX (private IP support, fault tolerance support) across KGrid, SDSC (USA) and CICESE, Ensenada (México); host labels: eolo, pluto; output]
4pm tomorrow: Tutorial by Oh-kyoung Kwon
25 Science, Technologies, Collaborations, Integrations
26 PRAGMA is a great model and needs to be emulated.
Has helped weaken barriers between different
research groups across different continents and
allowed people to trust and collaborate rather
than compete.
- Arun Agarwal
- UoHyd, India
27 Collaborations With Science and Technology Teams
- Grid security
- Naregi (Japan), APGrid, GAMA (SDSC, USA)
- Grid infrastructure
- Monitoring - SCMSWeb (ThaiGrid, Thailand)
- Accounting - MOGAS (NTU, Singapore)
- Metascheduling - Community Scheduler Framework (JLU, China)
- Cyber-environment - CSE-Online (UUtah, USA)
- Rocks and middleware (SDSC, USA)
- Ninf-G, SCE, Gfarm, Bio, KRocks, Condor,
- Science, datagrid, sensor, network
- Biosciences Avian Flu, portal,
- Gfarm-fuse (AIST, Japan)
- GEON data network
- GLEON sensor network
- OptIPuter
- High performance networked TDW
- Telescience
28 Grid Security
- Trust in PRAGMA grid, http://goc.pragma-grid.net/pragma-doc/certificates.html
- IGTF distribution
- Non-IGTF distribution (trust all PRAGMA Grid sites)
- APGrid PMA
- One of three IGTF founding PMAs
- Many PRAGMA grid sites are members
- PRAGMA CA
- Naregi-CA
- AIST, UCSD, UChile, UoHyd, UPRM
- PRAGMA CA (experimental and production)
- Based on Naregi-CA
- Catch-all CA for PRAGMA
- Production CA is IGTF compliant
- Myproxy and VOMS services
- APAC
- Work with GAMA
- Integrate with Naregi-CA (Naregi, UCSD)
- Integration with VOMS (AIST)
- Add servlet for account management (UChile)
- Lessons learned
- Leverage resources, setups and expertise
- Balance and consider both security and easy access and use
- Get more user communities involved with grid security
29 Gfarm Grid File System
http://datafarm.apgrid.org
- AIST, UTsukuba, Open source development at SourceForge.net
- Grid file system that federates storage of each site
- Meta-server keeps track of file copies and locations
- Can be mounted from cluster nodes and clients (GfarmFS-FUSE)
- Parallel I/O, near site copy for scalable performance
- Replication for fault tolerance
- Uses GSI authentication
- Easy application deployment, file sharing
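The roles described above (a meta-server tracking copies, reads served from a near-site copy, replicas as a fallback) can be sketched with a toy catalog. This is an illustration of the idea only and does not use the Gfarm API. The class, function names, file name and distance values are assumptions.

```python
class ReplicaCatalog:
    """Toy metadata server: maps a logical file name to the sites holding copies."""

    def __init__(self):
        self._locations = {}

    def register(self, filename, site):
        """Record that a copy of the file exists at the given site."""
        self._locations.setdefault(filename, set()).add(site)

    def locations(self, filename):
        """Return the set of sites that hold a copy (empty if unknown)."""
        return set(self._locations.get(filename, ()))


def choose_replica(catalog, filename, distance, available):
    """Pick the closest available copy; farther copies serve as fallback."""
    candidates = [s for s in catalog.locations(filename) if s in available]
    if not candidates:
        raise FileNotFoundError(f"no reachable replica of {filename}")
    return min(candidates, key=lambda s: (distance[s], s))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("genome.db", "AIST")
    catalog.register("genome.db", "SDSC")
    # Hypothetical network cost from the client's site to each storage site.
    distance = {"AIST": 120, "SDSC": 5}
    print(choose_replica(catalog, "genome.db", distance, {"AIST", "SDSC"}))
    # If the near copy is down, the remaining replica is used instead.
    print(choose_replica(catalog, "genome.db", distance, {"AIST"}))
```

Every read consults the catalog first, which suggests why a remote or overloaded meta-server can become the limiting factor for workloads with many small files.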
30 PRAGMA Gfarm Datagrid
http://goc.pragma-grid.net/pragma-doc/datagrid.html
- Compute Cluster
31 Develop and Test GfarmFS-FUSE in PRAGMA Grid
http://goc.pragma-grid.net/wiki/index.php/Resources_and_Data
- Testing with applications
- iGAP (Gfarm, Japan; UCSD, USA; JLU, China)
- Huge number of small files
- High meta-data access overhead
- Meta-data cache server
- Dramatic improvements (44 sec -> 3.54 sec)
- AMBER (USM, Malaysia; Gfarm, Japan)
- Remote Gfarm meta-server
- Meta-server is bottleneck
- File sharing permission, security
- 2.0 improved performance
- Use as a shared storage only
- Version 1.4 works well in local or regional grid
- GeoGrid, Japan
- CLGrid, Chile
- Integration
- SCMSWeb (ThaiGrid, Thailand)
- Rocks (SDSC, USA; UZH, Switzerland)
32 SCMSWeb
http://www.opensce.org/components/SCMSWeb
- Developed by Kasetsart University and ThaiGrid
- Web-based real-time grid monitoring system
- System usage, Job/queue status
- Probe Globus authentication, job submission, gridftp, Gfarm access,
- Network bandwidth measurements with Iperf
- PRAGMA grid geo map
- Support Linux, Solaris. Good meta-view, easy user interface, excellent user support
- Develop and test in PRAGMA grid
- Deployed in 27 sites, improve scalability and performance
- Sites help with porting to ia64 and Solaris
- Demands push fast expansion of functionalities
- More regional/national grids learned and adopted
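Probing each service at each site naturally produces a site-by-service matrix. The sketch below builds one from simulated probes. It is not SCMSWeb code: the probes look results up in a made-up table instead of contacting Globus, gridftp or Gfarm, and the site and probe names are illustrative.

```python
# Simulated grid state; a real probe would contact the service instead.
# Site and service names here are illustrative assumptions.
FAKE_STATE = {
    "site-a": {"auth": True, "job_submit": True, "gridftp": True},
    "site-b": {"auth": True, "job_submit": False, "gridftp": True},
    "site-c": {"auth": False},  # missing entries make the probe raise KeyError
}


def make_probe(service):
    """Build a probe function that looks the service up in the simulated state."""
    def probe(site):
        return FAKE_STATE[site][service]
    return probe


PROBES = {name: make_probe(name) for name in ("auth", "job_submit", "gridftp")}


def run_probes(sites, probes):
    """Run every probe against every site and return {site: {probe: bool}}."""
    matrix = {}
    for site in sites:
        row = {}
        for name, probe in probes.items():
            try:
                row[name] = bool(probe(site))
            except Exception:
                row[name] = False  # a crashing probe counts as a failed check
        matrix[site] = row
    return matrix


def failing(matrix):
    """List (site, probe) pairs that did not pass, sorted for stable display."""
    return sorted((site, name)
                  for site, row in matrix.items()
                  for name, ok in row.items() if not ok)


if __name__ == "__main__":
    matrix = run_probes(sorted(FAKE_STATE), PROBES)
    for site, row in matrix.items():
        print(site, " ".join(f"{n}={'ok' if ok else 'FAIL'}" for n, ok in row.items()))
```

Treating a probe that raises as a failure keeps one misconfigured site from breaking the whole monitoring pass.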
33 SCMSWeb Collaborations and Integrations
- Grid Interoperation Now (GIN, OGF) http://forge.gridforum.org/sf/wiki/do/viewPage/projects.gin/wiki/GinOps
- Worked with PRAGMA grid, TeraGrid, OSG, NorduGrid and EGEE on GIN testbed monitoring http://goc.pragma-grid.net/cgi-bin/scmsweb/probe.cgi, added probes to handle various grid service configurations/tests.
- Worked with CERN and implemented an XML -> LDIF translator for GIN geo map http://maps.google.com/maps?q=http://lfield.home.cern.ch/lfield/gin.kml
- Worked with many grid monitor software developers on a common schema for cross-grid monitoring http://wiki.pragma-grid.net/index.php?title=GIN_%28Grid_Inter-operation_Now%29_Monitoring
- Software integration and interoperations
- Rocks SCE roll
- MOGAS, grid accounting
- CSE-Online, CSF, provide resource info
- Things are being worked on and planned
- Data federator for grid applications
- Provide site software information
- Standardize data extractions and formats
- Improve data storage with RDBMS
- Interoperate with other monitoring software
- Ganglia support
34 MOGAS
http://ntu-cg.ntu.edu.sg/pragma/index.jsp
- Multi-Organization Grid Accounting System (MOGAS)
- Led by Nanyang Technological University, funded by National Grid Office in Singapore
- Built on Globus core (gridftp, GRAM, GSI)
- Supports GT2, 3, 4, SGE, PBS
- Job/user/cluster/OU/grid level usage, job logs, metering and charging tools
- Develop and test in PRAGMA grid
- Deployed on 14 sites: different GT versions, job schedulers, GRAM scripts, security policies
- Feedback, improve, automate deployment procedure
- Decentralized servers and better database to improve scalability and performance
- Collaborations and integrations with applications and other middleware teams push the development of easy database interface
4pm MOGAS tutorial by Francis Lee
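Reporting usage at the user, cluster, organization and grid levels comes down to summing the same job records under different keys. The sketch below shows that roll-up. The record fields, values and the flat charging rate are hypothetical and do not reflect the MOGAS schema or its tools.

```python
from collections import defaultdict

# Hypothetical job log records; field names are illustrative, not the MOGAS schema.
JOB_LOG = [
    {"user": "alice", "cluster": "cluster-1", "org": "SDSC", "cpu_hours": 12.0},
    {"user": "bob", "cluster": "cluster-1", "org": "SDSC", "cpu_hours": 3.5},
    {"user": "alice", "cluster": "cluster-2", "org": "AIST", "cpu_hours": 8.0},
    {"user": "carol", "cluster": "cluster-2", "org": "AIST", "cpu_hours": 1.5},
]


def usage_by(records, level):
    """Sum CPU hours grouped by one field ('user', 'cluster' or 'org')."""
    totals = defaultdict(float)
    for record in records:
        totals[record[level]] += record["cpu_hours"]
    return dict(totals)


def grid_total(records):
    """Total CPU hours across the whole grid."""
    return sum(record["cpu_hours"] for record in records)


def charge(totals, rate_per_cpu_hour):
    """Convert usage totals into charges at a flat (assumed) rate."""
    return {key: round(hours * rate_per_cpu_hour, 2) for key, hours in totals.items()}


if __name__ == "__main__":
    for level in ("user", "cluster", "org"):
        print(level, usage_by(JOB_LOG, level))
    print("grid", grid_total(JOB_LOG))
    print("charges", charge(usage_by(JOB_LOG, "org"), 2.0))
```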
35 CSF4
http://goc.pragma-grid.net/wiki/index.php/CSF_server_and_portal
- Community Scheduler Framework, v4 meta-scheduler
- Developed by Jilin University, China
- Grid services hosted in GT4, WSRF compliant, execution component in Globus Toolkit 4
- Open Source, http://sourceforge.net/projects/gcsf
- Supports GT2/4, LSF, PBS, SGE, Condor
- Easy user interface - portal
- Testing and collaborating in PRAGMA
- Testing with application iGAP (UCSD, AIST, KISTI, )
- Collaborate and integrate with Gfarm on data staging (AIST, Japan)
- Set up a CSF server and portal (SDSC, USA)
- Collaborate/integrate with SCMSWeb for resource information (ThaiGrid, Thailand)
- Leverage resources and global grid testing environment
1:30pm CSF4 Tutorial by Zhao-hui Ding
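A meta-scheduler sits above the local schedulers and uses resource information (for example, from a monitoring system) to decide where a job should go. The sketch below shows one simple selection policy under assumed inputs. It is not CSF4's algorithm or interface: the `Cluster` fields, the cluster names and the tie-breaking rule are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    scheduler: str    # local scheduler type, e.g. "SGE", "PBS", "LSF"
    free_cpus: int
    queued_jobs: int


def pick_cluster(clusters, cpus_needed, supported_schedulers):
    """Choose an eligible cluster with the shortest queue, then the most free CPUs."""
    eligible = [c for c in clusters
                if c.scheduler in supported_schedulers and c.free_cpus >= cpus_needed]
    if not eligible:
        return None  # nothing fits right now; the caller may queue or retry later
    return min(eligible, key=lambda c: (c.queued_jobs, -c.free_cpus, c.name))


if __name__ == "__main__":
    # Hypothetical snapshot of resource information.
    snapshot = [
        Cluster("cluster-a", "SGE", free_cpus=16, queued_jobs=4),
        Cluster("cluster-b", "PBS", free_cpus=32, queued_jobs=1),
        Cluster("cluster-c", "LSF", free_cpus=64, queued_jobs=0),
    ]
    choice = pick_cluster(snapshot, cpus_needed=8, supported_schedulers={"SGE", "PBS"})
    print(choice.name if choice else "no eligible cluster")
```

The decision is only as good as the resource snapshot, which is one reason integration with a monitoring service for resource information is useful.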
36 Computational Science Engineering Online
http://cse-online.net
- Developed by University of Utah, USA (Thanh N. Truong)
- Desktop tool, user friendly interface enables seamless access to remote data, tools and grid computing resources
- Currently supports computational chemistry
- Can be customized for other domain sciences
- Developed interface to TeraGrid
- Collaborate with ThaiGrid as case study
- Used for Computational workshop
- Extend grid access to portal architecture
- Improved security
- Working on interface to PRAGMA grid
- Heterogeneity
Quantum Chemistry
Nano-materials
Drug Design
37 Collaborations with OptIPuter
http://www.optiputer.net
- OptIPuter (Optical networking, Internet Protocol, computer storage, processing and visualization technologies)
- Infrastructure that will tightly couple computational resources over parallel optical networks using the IP communication mechanism
- Central architectural element is optical networking, not computers
- Enable scientists who are generating terabytes and petabytes of data to interactively visualize, analyze, and correlate their data from multiple storage sites connected to optical networks
- Rocks VIS-roll (SDSC)
- Networked Tile Display Walls (TDW)
- Low cost
- For research collaboration
- For remote education and conferencing
- Deployed in PRAGMA grid
- 9 sites and more to follow
- Future plan
- Global Lambda Integrated Facility (GLIF)
- Solve grid application bandwidth problem
CNIC, China
UCSD, USA
38 Grid Interoperation Now (GIN)
http://forge.gridforum.org/sf/wiki/do/viewPage/projects.gin/wiki/GinOps
- OGF GIN-OPS
- GIN testbed (February 2006, on-going)
- Application driven
- TDDFT/Ninf-G (PRAGMA - AIST, Japan)
- PRAGMA, TeraGrid, OSG, NorduGrid, EGEE
- Savannah fire simulation (PRAGMA - Monash University, Australia)
- PRAGMA, TeraGrid, OSG
- Multi-Grid monitoring
- SCMSWeb probe matrix (PRAGMA - ThaiGrid, Thailand)
- Common schema (PRAGMA, TeraGrid, EGEE, NorduGrid)
39 OSG-PRAGMA Grid Interoperation Experiments
http://goc.pragma-grid.net/wiki/index.php/Main_Page#Grid_Inter-operations
- More resources and support from each grid, but no special arrangements
- Application long-run
- GridFMO/Ninf-G: Large scale quantum chemistry (Tsutomu Ikegami, AIST, Japan)
- 240 CPUs from OSG and PRAGMA grid, 10 days x 7 calculations
- Fault-tolerance enabled long-run
- Meaningful and usable scientific results
40 Lessons Learned From Grid Interoperation
- Grid interoperation makes large scale calculations possible
- Differences among grids provide learning, collaboration and integration opportunities
- IGTF, VOMS (GIN)
- Common Software Area (TeraGrid)
- Ninf-G (AIST/PRAGMA) interface to NorduGrid
- Nimrod-G (MU/PRAGMA) interface to Unicore (PRIME)
- VDT (OSG) and Rocks (SDSC/PRAGMA) integration
- Differences in grid environments are a source of difficulties for users and applications
- Different user access setup procedures take extra effort
- Different job submission protocols
- GRAM, Sandbox, gridftp, modified GRAM,
- One-to-one interface building is neither scalable nor desirable. Standards are needed.
- Middleware fault tolerance and flexible resource management are important
41 Collaborate in Publishing Research Results
- Some published papers in 2007
- Amaro RE, Minh DDL, Cheng LS, Lindstrom WM Jr, Olson AJ, Lin JH, Li WW, and McCammon JA. Remarkable Loop Flexibility in Avian Influenza N1 and Its Implications for Antiviral Drug Design. J. Am. Chem. Soc. 2007, 129, 7764-7765 (PRIME)
- Choi Y, Jung S, Kim D, Lee J, Jeong K, Lim SB, Heo D, Hwang S, and Byeon OH. "Glyco-MGrid: A Collaborative Molecular Simulation Grid for e-Glycomics," in 3rd IEEE International Conference on e-Science and Grid Computing, Bangalore, India, 2007. Accepted.
- Ding Z, Wei W, Luo Y, Ma D, Arzberger PW, and Li WW, "Customized Plug-in Modules in Metascheduler CSF4 for Life Sciences Applications," New Generation Computing, In Press, 2007.
- Ding Z, Wei S, Ma D, and Li WW, "VJM -- A Deadlock Free Resource Co-allocation Model for Cross Domain Parallel Jobs," in HPC Asia 2007, Seoul, Korea, 2007, In Press.
- Görgen K, Lynch H, Abramson D, Beringer J and Uotila P. "Savanna fires increase monsoon rainfall as simulated using a distributed computing environment", to appear, Geophysical Research Letters.
- Ichikawa K, Date S, Krishnan S, Li W, Nakata K, Yonezawa Y, Nakamura H, and Shimojo S, "Opal OP: An extensible Grid-enabling wrapping approach for legacy applications", GCA2007 - Proceedings of the 3rd workshop on Grid Computing Applications, pp. 117-127, Singapore, June 2007 a. (PRIUS)
- Ichikawa K, Date S, and Shimojo S. A Framework for Meta-Scheduling WSRF Based Services, Proceedings of 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 2007), Victoria, Canada, pp. 481-484, Aug. 2007 b. (PRIUS)
- Kuwabara S, Ichikawa K, Date S, and Shimojo S. A Built-in Application Control Module for SAGE, Proceedings of 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 2007), Victoria, Canada, pp. 117-120, Aug. 2007. (PRIUS)
- Takeda S, Date S, Zhang J, Lee BU, and Shimojo S. Security Monitoring Extension For MOGAS, GCA2007 - Proceedings of the 3rd workshop on Grid Computing Applications, pp. 128-137, Singapore, June 2007. (PRIUS)
- Tilak S, Hubbard P, Miller M, and Fountain T, "The Ring Buffer Network Bus (RBNB) DataTurbine Streaming Data Middleware for Environmental Observing Systems," to appear in the Proceedings of the e-Science 2007
- Zheng C, Katz M, Papadopoulos P, Abramson D, Ayyub S, Enticott C, Garic S, Goscinski W, Arzberger P, Lee B S, Phatanapherom S, Sriprayoonsakul S, Uthayopas P, Tanaka Y, Tanimura Y, Tatebe O. Lesson Learned Through Driving Science Applications in the PRAGMA Grid. Int. J. Web and Grid Services, Vol. 3, No. 3, pp. 287-312. 2007
42 Summary
- PRAGMA grid
- Shared vision lowers resistance to using others' software and testing on others' resources
- Formed new development collaborations
- Size and heterogeneity explore issues which a functional grid must resolve
- Management, resources and software coordination
- Identity and fault management
- Scalability and performance
- Feedback between application and middleware helps improve software and promote software integration
- Heterogeneous global grid
- Is realistic and challenging
- Can be good for middleware development and testing
- Can be useful for real science
- Impact
- Software dissemination (Rocks, Ninf-G, Nimrod, SCMSWeb, Naregi-CA, )
- Help new national/regional grids (Chile, Vietnam, Hong Kong, )
- Key is people, is collaboration
43 A Grass Roots Effort
- One of the most important lessons of the Internet is that it grows most successfully where grass roots initiatives are encouraged and enabled. The Internet has historically grown from the bottom up, and this aspect continues to fuel its continued growth in the academic and commercial sectors.
- Vint Cerf, UN Economic and Social Council in 2000
44
- PRAGMA is supported by the National Science Foundation (Grant No. INT-0216895, INT-0314015, OCI-0627026) and by member institutions
- PRIME is supported by the National Science Foundation under NSF INT 04007508
- PRAGMA grid is the result of contributions and support from all PRAGMA grid team members
Thank You
http://www.pragma-grid.net http://goc.pragma-grid.net