Title: Deployment, Deployment, Deployment
March 2002
- Randy Burris
- Center for Computational Sciences
- Oak Ridge National Laboratory
Overview of this presentation
- Our goal: let scientists (our customers) do science without worrying about their computing environment
- Our clientele
- Four disciplines (climate, astrophysics, genomics and proteomics, high-energy physics)
- National labs and universities
- Using resources all over the country
- Residing all over the place
- We must deploy the result (Deploy or die)
Well, OK. But deploy what?
- Where are the commonalities in our space?
- Security and trust: nonexistent to extreme
- Network connectivity: dialup to OC12
- File sizes: bytes to terabytes
- File location: local unit to partitions around the world
- Visualization: static to dynamic real-time
- And so on.
- We can't do it all.
- So exactly what are we going to deploy?
- And how should we proceed?
Achieving successful deployment
- For each of the 4 projects, define basic steps
- Define target environment(s)
- Characterize successful deployment (in each)
- Prototype in a close-to-production environment
- Deploy in production
- In parallel with the above
- Produce documentation at every step
- Develop tools for support staff
- Start now.
Step 1: Define target environment(s)
- We cannot support all combinations.
- Security: DCE, Kerberos, PKI, GSS, firewalls, …
- Compute resource: MPP, cluster, workstation, …
- User platform: MPP, cluster, Unix/Linux, Windows, …
- Storage
- Storage resource: HPSS, PVFS, ?
- User API for access to data
- NetCDF, HDF5, both, something else?
- HRM, pftp, GridFTP, hsi, …
- Network
- WAN: GigE/jumbo, FastE, OC12, OC3, ESnet, hops, …
- LAN: GigE, FastE, iSCSI, FibreChannel, …
- Visualization: CAVE, workstations, Palm Pilots, …
- We will have to choose.
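The claim that we cannot support all combinations can be made concrete with a quick count. The Python sketch below multiplies the number of options named on this slide. It assumes each list is closed (the slide leaves them open-ended, so the real number is larger) and that a deployment picks exactly one option per axis; the axis names and the narrowed choices are hypothetical, for illustration only.

```python
from math import prod

# Options named on the Step 1 slide. Assumptions: each list is treated as
# closed (the slide's lists are open-ended), and one option is chosen per axis.
# "hops" is left out of WAN because it describes a path, not a choice.
TARGET_AXES = {
    "security": ["DCE", "Kerberos", "PKI", "GSS", "firewalls"],
    "compute resource": ["MPP", "cluster", "workstation"],
    "user platform": ["MPP", "cluster", "Unix/Linux", "Windows"],
    "storage resource": ["HPSS", "PVFS"],
    "data API": ["NetCDF", "HDF5", "both"],
    "transfer tool": ["HRM", "pftp", "GridFTP", "hsi"],
    "WAN": ["GigE/jumbo", "FastE", "OC12", "OC3", "ESnet"],
    "LAN": ["GigE", "FastE", "iSCSI", "FibreChannel"],
    "visualization": ["CAVE", "workstations", "Palm Pilots"],
}

# Hypothetical narrowing, for illustration only: not a decision from the slides.
EXAMPLE_CHOICES = {
    "security": ["Kerberos", "PKI"],
    "storage resource": ["HPSS"],
    "data API": ["HDF5"],
    "transfer tool": ["GridFTP", "hsi"],
}


def count_combinations(axes):
    """Number of distinct target environments when one option is picked per axis."""
    return prod(len(options) for options in axes.values())


def count_after_choosing(axes, chosen):
    """Same count after restricting some axes to a chosen subset of options."""
    for name, subset in chosen.items():
        if not set(subset) <= set(axes[name]):
            raise ValueError(f"unknown option on axis {name!r}: {subset}")
    return prod(len(chosen.get(name, options)) for name, options in axes.items())


if __name__ == "__main__":
    print(count_combinations(TARGET_AXES))                     # 86400
    print(count_after_choosing(TARGET_AXES, EXAMPLE_CHOICES))  # 2880
```

Even with closed lists the space is in the tens of thousands; pinning just four axes cuts it by a factor of 30, which is the case for choosing target environments early.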
Step 2: Characterize successful deployment
- A. Correct operation in the security environment
- B. Optimized performance in the target network environment
- C. Rugged infrastructure
- D. Unobtrusive infrastructure
- E. Thorough documentation for users and support staff
Step 2: Characterize A. Security
- I believe we must define the environment into which we intend to deploy.
- Starting now
- Because it will take a long time and will almost certainly require development.
- Questions to which we need answers
- Are we concerned with DOE sites, or with DOE and NSF sites?
- Are there circumstances where clear-text passwords are OK? Where no security is OK?
- Must we support authentication in PKI, GSI, DCE, and/or Kerberos?
- Will all of our infrastructure work with firewalls at one or both ends of a transfer? Whose firewalls? What filtering parameters? …
Step 2: Characterize B. Network
- On what network are the end nodes?
- What is our target environment: ESnet, ESnet plus Internet2, Grid, WWW, …?
- What throughput is needed for effective science?
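One way to start answering the throughput question is back-of-the-envelope arithmetic on the link types already named in this deck. The Python sketch below assumes nominal line rates (SONET OC-n = n × 51.84 Mbit/s), decimal terabytes, and no protocol overhead unless an efficiency factor is supplied; real transfers will be slower, so treat the results as lower bounds on time. The function names are hypothetical.

```python
# Nominal line rates in bits per second. SONET OC-n runs at n * 51.84 Mbit/s.
LINE_RATE_BPS = {
    "FastE": 100e6,
    "OC3": 155.52e6,
    "OC12": 622.08e6,
    "GigE": 1000e6,
}

TB = 10**12  # assumption: decimal terabyte


def transfer_seconds(size_bytes, rate_bps, efficiency=1.0):
    """Time to move size_bytes at rate_bps, scaled by an assumed efficiency in (0, 1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (rate_bps * efficiency)


def required_rate_bps(size_bytes, seconds):
    """Sustained throughput needed to move size_bytes within the given time."""
    return size_bytes * 8 / seconds


if __name__ == "__main__":
    for name, rate in LINE_RATE_BPS.items():
        hours = transfer_seconds(TB, rate) / 3600
        print(f"1 TB over {name}: {hours:.1f} h at line rate")
    overnight = required_rate_bps(TB, 8 * 3600)
    print(f"1 TB in 8 h needs {overnight / 1e6:.0f} Mbit/s sustained")
```

For example, moving 1 TB overnight (8 hours) needs roughly 278 Mbit/s sustained: more than OC3 can carry, but within OC12 at line rate. Passing, say, `efficiency=0.5` models a path that delivers half its nominal rate; that value is illustrative, not a measurement.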
Step 2: Characterize C. Rugged
- Must not crash (of course)
- Must be in service when needed
- Must be secure
- Must have a support plan (one that does not require an army of support people)
- Must have a trouble-resolution mechanism and resources
- Must survive normal maintenance
- System software patches and upgrades
- Equipment upgrades
Step 2: Characterize D. Unobtrusive
- User should need minimal knowledge
- The deeper the infrastructure, the less the user should need to know
- User should be protected from mistakes
- Try not to let the user screw things up
- Documentation and real-time warnings
- Effective defaults
Step 2: Characterize E. Documentation
- White papers to inform larger community
- For users: how-to-use documents
- For system-admin staff
- How to install, debug, maintain, troubleshoot
- For user-support staff
- How to troubleshoot
- Tuning knobs
- For programmers
- Overview documents to give context
- Correct interface documents
- Correct documentation for all appropriate platforms
Step 3: Prototype in a close-to-production environment
- Example of deployment approach on Probe
- Deploy early prototypes at Oak Ridge and NERSC
- Use Probe, Probe HPSS, Production HPSS, and supercomputers
- Use (and require) documented code and procedures
- As development progresses, evaluate and address deployment issues such as security, network performance, and system-admin documentation
- As the prototype becomes more robust, migrate more functions to the Oak Ridge and NERSC production environments
- Continue to evaluate and address deployment issues, which now include user and user-support documentation
- Iterate as necessary
- When this sequence is done, you're in production.
Overview of ORNL Architecture, March 2002
[Diagram. Components shown: Gigabit Ethernet (jumbo frames); IBM and Compaq supercomputers and a 64-node Linux cluster; Production HPSS; Probe HPSS; two disk caches; two STK libraries; Origin 2000 Reality Monster; Stingray (RS/6000 S80); Marlin (RS/6000 H70); other Probe nodes; CAVE; 220 GB SCSI RAID; 360 GB Sun FibreChannel RAID; 360 GB FibreChannel RAID; 600 GB SCSI JBOD.]
Example: How the Terascale Supernova Initiative could be prototyped
[Diagram. Components shown: bulk storage; IBM and Compaq supercomputers; Production HPSS; Probe HPSS; CAVE; data reduction and pre-viz manipulation; rendering; Stingray (RS/6000 S80); Marlin (RS/6000 H70); Origin 2000 Reality Monster; other Probe nodes; external ESnet router.]
We should start right away
- Select initial, intermediate, and ultimate target environments
- Including supported applications, platforms, security, and target network
- Describe them in a white paper
- Seek common elements in supported applications
- Develop a deployment plan for common elements
- Write a white paper describing the deployment plan
- Specify our approach to deploying support for those elements
- Identify unmet requirements and how to remedy them
- Describe our approach to ruggedness and unobtrusiveness
- Address non-common elements in supported applications
- Seek to minimize their impact
- Specify our approach to deploying support for those elements
- Develop deployment plans and describe them
- Write a white paper describing the deployment plan
DISCUSSION?
Serious questions for early resolution
- What is the role of HPSS?
- HPSS will never be pervasive: it is expensive.
- Treat HPSS sites as primary repositories?
- Which file transfer protocol(s) do we support?
- GridFTP, pftp, hsi
Probe, the place to be: Overview of ORNL Probe Cell, February 2002
[Diagram, divided into Probe and Production sections. Components shown: Compaq DS20; IBM and Compaq supercomputers; RS/6000 44P-170; Gigabit Ethernet; Sun Ultra 10; external ESnet router; link to NERSC Probe; IBM F50; Sun E450; GSN switch; GSN bridge; SGI Origin 200; Marlin (RS/6000 H70); RS/6000 B80; Origin 2000 Reality Monster; Sun E250; Stingray (RS/6000 S80); 360 GB Sun FibreChannel disks; FibreChannel switch; two STK silos; 200 GB SCSI RAID disks; 360 GB STK FibreChannel disks; 3494 library.]
Backup slide
Technology on hand and available
- Software
- HPSS (unlimited instantiations) and HPSS development license
- HDF5, NetCDF
- R, ggobi
- gcc suite
- C on Solaris, AIX, IRIX, and Tru64
- Fortran on AIX
- Oracle 8i and DB2 (current developer editions) on AIX
- Globus 2.0 on AIX and Solaris
- HRM
- Inter-HPSS hsi application
- OPNET modeling product
- MPI/IO testbed
- 18 nodes: IBM/AIX, Sun/Solaris, SGI/IRIX, Compaq/Tru64
- GRID nodes (Sun/Solaris, IBM/AIX, possibly Linux)
- ESnet III: OC12 externally; GigE with jumbo frames and Fast Ethernet internally
- Web100 and NET100 participation