Title: SCICOMP, IBM, and TACC: Then, Now, and Next
1. SCICOMP, IBM, and TACC: Then, Now, and Next
- Jay Boisseau, Director
- Texas Advanced Computing Center
- The University of Texas at Austin
- August 10, 2004
2. Precautions
- This presentation contains some historical recollections from over 5 years ago. I can't usually recall what I had for lunch yesterday.
- This presentation contains some ideas on where I think things might be going next. If I can't recall yesterday's lunch, it seems unlikely that I can predict anything.
- This presentation contains many tongue-in-cheek observations, exaggerations for dramatic effect, etc.
- This presentation may cause boredom, drowsiness, nausea, or hunger.
3. Outline
- Why Did We Create SCICOMP 5 Years Ago?
- What Did I Do with My Summer (and the Previous 3 Years)?
- What Is TACC Doing Now with IBM?
- Where Are We Now? Where Are We Going?
4. Why Did We Create SCICOMP 5 Years Ago?
5. The Dark Ages of HPC
- In the late 1990s, most supercomputing was accomplished on proprietary systems from IBM, HP, SGI (including Cray), etc.
- User environments were not very friendly
- Limited development environments (debuggers, optimization tools, etc.)
- Very few cross-platform tools
- Difficult programming tools (MPI, OpenMP; some things haven't changed)
6. Missing Cray Research
- Cray was no longer the dominant company, and it showed
- The trend toward commoditization had begun
- Systems were not balanced
- Cray T3Es were used longer than any production MPP
- Software for HPC was limited and not as reliable
- Who doesn't miss real checkpoint/restart, automatic performance monitoring, no weekly PM downtime, etc.?
- Companies were not as focused on HPC/research customers as on larger markets
7. 1998-99: Making Things Better
- John Levesque hired by IBM to start the Advanced Computing Technology Center (ACTC)
- Goal: ACTC should provide to customers what Cray Research used to provide
- Jay Boisseau became first Associate Director of Scientific Computing at SDSC
- Goal: ensure SDSC helped users migrate from the Cray T3E to the IBM SP and do important, effective computational research
8. Creating SCICOMP
- John and Jay hosted a workshop at SDSC in March 1999, open to users and center staff, to discuss the current state, issues, techniques, and results in using IBM systems for HPC
- SP-XXL already existed, but was exclusive and more systems-oriented
- Its success led to the first IBM SP Scientific Computing User Group meeting (SCICOMP) in August 1999 in Yorktown Heights, with Jay as first director
- Second meeting held in early 2000 at SDSC
- In late 2000, John and Jay invited international participation in SCICOMP at the IBM ACTC workshop in Paris
9. What Did I Do with My Summer (and the Previous 3 Years)?
10. Moving to TACC?
- In 2001, I accepted the job as director of TACC
- Major rebuilding task:
- Only 14 staff
- No R&D programs
- Outdated HPC systems
- No visualization, grid computing, or data-intensive computing
- Little funding
- Not much profile
- Past political issues
11. Moving to TACC!
- But big opportunities:
- Talented key staff in HPC, systems, and operations
- Space for growth
- IBM Austin across the street
- Almost every other major HPC vendor has a large presence in Austin
- UT Austin has both quality and scale in sciences, engineering, and CS
- UT and Texas have unparalleled internal and external support (pride is not always a vice)
- Austin is a fantastic place to live (and recruit)
12-13. Moving to TACC!
- TEXAS-SIZED opportunities:
- Talented key staff in HPC, systems, and operations
- Space for growth
- IBM Austin across the street
- Almost every other major HPC vendor has a large presence in Austin
- UT Austin has both quality and scale in sciences, engineering, and CS
- UT and Texas have unparalleled internal and external support (pride is not always a vice)
- Austin is a fantastic place to live (and recruit)
- I got the chance to build something else good and important
14. TACC Mission
- To enhance the research and education programs of The University of Texas at Austin and its partners through research, development, operation, and support of advanced computing technologies.
15. TACC Strategy
- To accomplish this mission, TACC:
- Evaluates, acquires, and operates advanced computing systems
- Provides training, consulting, and documentation to users
- Collaborates with researchers to apply advanced computing techniques
- Conducts research and development to produce new computational technologies
(Figure labels: Resources & Services; Research & Development)
16-19. TACC Advanced Computing Technology Areas
- High Performance Computing (HPC): numerically intensive computing produces data
- Scientific Visualization (SciVis): rendering data into information and knowledge
- Data & Information Systems (DIS): managing and analyzing data for information and knowledge
- Distributed and Grid Computing (DGC): integrating diverse resources, data, and people to produce and share knowledge
20. TACC Activities & Scope
(Figure: scope of TACC activities, since 1986 vs. since 2001)
21. TACC Applications Focus Areas
- TACC advanced computing technology R&D must be driven by applications
- TACC Applications Focus Areas:
- Chemistry → Biosciences
- Climate/Weather/Ocean → Geosciences
- CFD
22. TACC HPC & Storage Systems
- LONGHORN: IBM Power4 system, 224 CPUs (1.16 Tflops), ½ TB memory, 7.1 TB disk
- LONESTAR: Cray-Dell Xeon Linux cluster, 1028 CPUs (6.3 Tflops), 1 TB memory, 40 TB disk
- TEJAS: IBM Pentium III Linux cluster, 64 CPUs (64 Gflops), 32 GB memory, 1 TB disk
- ARCHIVE: STK PowderHorns (2), 2.8 PB max capacity, managed by Cray DMF
- SAN: Sun SANs (2), 8 TB / 4 TB, to be expanded
23. ACES VisLab
- Front and rear projection systems
- 3x1 cylindrical immersive environment, 24-foot diameter
- 5x2 large-screen tiled display of 16:9 panels
- Full immersive capabilities with head/motion tracking
- High-end rendering systems:
- Sun E25K: 128 processors, ½ TB memory, > 3 Gpoly/sec
- SGI Onyx2: 24 CPUs, 6 IR2 graphics pipes, 25 GB memory
- Matrix switch between systems, projectors, and rooms
24. TACC Services
- TACC resources and services include:
- Consulting
- Training
- Technical documentation
- Data storage/archival
- System selection/configuration consulting
- System hosting
25. TACC R&D: High Performance Computing
- Scalability, performance optimization, and performance modeling for HPC applications
- Evaluation of cluster technologies for HPC
- Portability and performance issues of applications on clusters
- Climate, weather, and ocean modeling collaboration and support of DoD
- Starting CFD activities
26. TACC R&D: Scientific Visualization
- Feature detection / terascale data analysis
- Evaluation of performance characteristics and capabilities of high-end visualization technologies
- Hardware-accelerated visualization and computation on GPUs
- Remote interactive visualization / grid-enabled interactive visualization
27. TACC R&D: Data & Information Systems
- Newest technology group at TACC
- Initial R&D focused on creating/hosting scientific data collections
- Interests / plans:
- Geospatial and biological database extensions
- Efficient ways to collect/create metadata
- DB clusters / parallel DB I/O for scientific data
28. TACC R&D: Distributed & Grid Computing
- Web-based grid portals
- Grid resource data collection and information services
- Grid scheduling and workflow
- Grid-enabled visualization
- Grid-enabled data collection hosting
- Overall grid deployment and integration
29. TACC R&D: Networking
- Very new activities
- Exploring high-bandwidth (OC-12, GigE, OC-48, OC-192) remote and collaborative grid-enabled visualization
- Exploring network performance for moving terascale data on 10 Gbps networks (TeraGrid)
- Exploring GigE aggregation to fill 10 Gbps networks (parallel file I/O, parallel database I/O); a back-of-envelope sketch follows this list
- Recruiting a leader for TACC networking R&D activities
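As a rough illustration of the aggregation point above, a back-of-envelope sketch; the per-stream goodput number is an assumption for illustration, not a TACC measurement.

```python
import math

# Assumed sustained goodput of one GigE stream after TCP/framing overhead;
# real numbers depend on NICs, hosts, and tuning.
per_stream_gbps = 0.9
link_gbps = 10.0

streams = math.ceil(link_gbps / per_stream_gbps)
print(f"~{streams} parallel GigE streams to fill a {link_gbps:.0f} Gbps link")
# ~12 concurrent streams, which is why parallel file I/O and parallel
# database I/O (many simultaneous flows) are a natural fit for such links
```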
30. TACC Growth
- New infrastructure provides UT with comprehensive, balanced, world-class resources:
- 50x HPC capability
- 20x archival capability
- 10x network capability
- World-class VisLab
- New SAN
- New comprehensive R&D program with a focus on impact:
- Activities in HPC, SciVis, DIS, DGC
- New opportunities for professional staff:
- 40 new, wonderful people in 3 years, adding to the excellent core of talented people who have been at TACC for many years
31. Summary of My Time with TACC Over the Past 3 Years
- TACC provides terascale HPC, SciVis, storage, data collection, and network resources
- TACC provides expert support services: consulting, documentation, and training in HPC, SciVis, and Grid
- TACC conducts applied research and development in these advanced computing technologies
- TACC has become one of the leading academic advanced computing centers in 3 years
- I have the best job in the world, mainly because I have the best staff in the world (but also because of UT and Austin)
32. And one other thing kept me busy the past 3 years...
33. What Is TACC Doing Now with IBM?
34. UT Grid: Enable Campus-wide Terascale Distributed Computing
- Vision: provide high-end systems, but move from island to hub of the campus computing continuum
- Provide models for local resources (clusters, vislabs, etc.), training, and documentation
- Develop procedures for connecting local systems to the campus grid
- Single sign-on, data space, compute space
- Leverage every PC, cluster, NAS, etc. on campus!
- Integrate digital assets into the campus grid
- Integrate UT instruments and sensors into the campus grid
- Joint project with IBM
35. Building a Grid Together
- UT Grid: joint between UT and IBM
- TACC wants to be a leader in e-science
- IBM is a leader in e-business
- UT Grid enables both to:
- Gain deployment experience (IBM Global Services)
- Have an R&D testbed
- Deliverables/benefits:
- Deployment experience
- Grid Zone papers
- Other papers
36. UT Grid: Initial Focus on Computing
- High-throughput parallel computing:
- Project Rodeo
- Use CSF to schedule to LSF, PBS, and SGE clusters across campus
- Use Globus 3.2 → GT4
- High-throughput serial computing:
- Project Roundup uses United Devices software on campus PCs
- Also interfacing to the Condor flock in the CS department (the general farming pattern is sketched below)
37. UT Grid: Initial Focus on Computing
- Develop CSF adapters for popular resource management systems through collaboration:
- LSF: done by Platform Computing
- Globus: done by Platform Computing
- PBS: partially done
- SGE
- LoadLeveler
- Condor
38. UT Grid: Initial Focus on Computing
- Develop CSF capability for flexible job requirements:
- Serial vs. parallel: no difference, just specify the number of CPUs
- Number: facilitate ensembles
- Batch: whenever, or by priority
- Advance reservation: needed for coupling, interactive use
- On-demand: needed for urgency
- Integrate data management for jobs into CSF:
- SAN makes it easy
- GridFTP is somewhat simple, if crude
- Avaki Data Grid is a possibility
39. UT Grid: Initial Focus on Computing
- Completion time in a compute grid is a function of:
- Data transfer times: use NWS for network bandwidth and file transfer time predictions (Rich Wolski, UCSB)
- Queue wait times: use new software from Wolski to predict start of execution in batch systems
- Application performance times: use Prophesy (Valerie Taylor) for application performance prediction
- Develop a CSF scheduling module that is data, network, and performance aware (a minimal sketch of the idea follows)
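A minimal sketch of that idea: predicted completion time is the sum of predicted transfer, queue-wait, and run times, and the scheduler picks the resource that minimizes it. The numbers below are hypothetical stand-ins for what NWS, the queue-wait predictor, and Prophesy would actually supply.

```python
def predicted_completion(input_bytes, bandwidth_Bps, queue_wait_s, runtime_s):
    """Completion time = data transfer time + queue wait + application run time."""
    transfer_s = input_bytes / bandwidth_Bps
    return transfer_s + queue_wait_s + runtime_s

# hypothetical per-cluster predictions (bandwidth in bytes/s from NWS,
# queue wait from the batch predictor, run time from Prophesy)
clusters = {
    "lonestar": dict(bandwidth_Bps=80e6, queue_wait_s=1800, runtime_s=3600),
    "ices":     dict(bandwidth_Bps=40e6, queue_wait_s=300,  runtime_s=5400),
}

input_bytes = 20e9  # 20 GB of input data to stage to the chosen cluster

best = min(clusters, key=lambda c: predicted_completion(input_bytes, **clusters[c]))
print("schedule job on:", best)
```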
40. UT Grid: Full Service!
- UT Grid will offer a complete set of services:
- Compute services
- Storage services
- Data collections services
- Visualization services
- Instruments services
- But this will take 2 years; focusing on compute services now
41. UT Grid: Interfaces
- Grid User Portal:
- Hosted, built on GridPort
- Augments developers by providing info services
- Enables productivity by simplifying production usage
- Grid User Node:
- Hosted; software includes GridShell plus client versions of all other UT Grid software
- Downloadable version enables configuring a local Linux box into UT Grid (eventually, Windows and Mac)
42. UT Grid: Logical View
- Integrate distributed TACC resources first (Globus, LSF, NWS, SRB, United Devices, GridPort)
(Diagram: TACC HPC, Vis, and Storage resources, actually spread across two campuses, form the hub)
43-45. UT Grid: Logical View
- Next add other UT resources, one building at a time, as spokes using the same tools and procedures
(Diagrams: ICES clusters and data, then PGE clusters and data, then a GEO instrument and data and a BIO cluster and instrument, attached as spokes to the TACC hub)
46. UT Grid: Logical View
- Finally, negotiate connections between spokes for willing participants to develop a P2P grid
(Diagram: the ICES, PGE, GEO, and BIO spokes connected directly to one another as well as to the TACC hub)
47-49. UT Grid: Physical View
(Diagrams: physical topology built up in three steps. TACC systems first: storage, Power4, cluster, and visualization resources split between the research campus (CMS) and the main campus (ACES), connected through switches and NOCs, linked by GAATN and to external networks. ICES clusters and data are added next, then PGE clusters and data.)
50. Texas Internet Grid for Research & Education (TIGRE)
- Multi-university grid: Texas, A&M, Houston, Rice, Texas Tech
- Build-out in 2004-05
- Will integrate additional universities
- Will facilitate academic research capabilities across Texas, using Internet2 initially
- Will extend to industrial partners to foster academic/industrial collaboration on R&D
51. NSF TeraGrid: National Cyberinfrastructure for Computational Science
- TeraGrid is the world's largest cyberinfrastructure for computational research
- Includes NCSA, SDSC, PSC, Caltech, Argonne, Oak Ridge, Indiana, Purdue
- Massive bandwidth! Each connection is one or more 10 Gbps links!
- TACC will provide terascale computing, storage, and visualization resources
- UT will provide terascale geosciences data sets
52. Where Are We Now? Where Are We Going?
53. The Buzz Words
- Clusters, Clusters, Clusters
- Grids & Cyberinfrastructure
- Data, Data, Data
54. Clusters, Clusters, Clusters
- No sense in trying to make long-term predictions here
- 64-bit is going to be more important (duh), but is not yet (for most workloads)
- Evaluate options, but the differences are not so great (for diverse workloads)
- Pricing is generally normalized to performance (via sales) for commodities
55. Grids & Cyberinfrastructure Are Coming... Really!
- The Grid is coming... eventually
- The concept of a Grid was ahead of the standards
- But we all use distributed computing anyway, and the advantages are just too big not to solve the issues
- Still have to solve many of the same distributed computing research problems (but at least now we have standards to develop to)
- Grid computing is here... almost
- WSRF means finally getting the standards right
- Federal agencies and companies alike are investing heavily in good projects and starting to see results
56. TACC Grid Tools and Deployments
- Grid computing tools:
- GridPort: transparent grid computing from the Web
- GridShell: transparent grid computing from the CLI
- CSF: grid scheduling
- GridFlow / GridSteer: coupling visualization, steering simulations
- Cyberinfrastructure deployments:
- TeraGrid: national cyberinfrastructure
- TIGRE: state-wide cyberinfrastructure
- UT Grid: campus cyberinfrastructure for research & education
57. Data, Data, Data
- Our ability to create and collect data (computing systems, instruments, sensors) is exploding
- The availability of data is even driving new modes of science (e.g., bioinformatics)
- Data availability and the need for sharing and analysis are driving the other aspects of computing:
- Need for 64-bit microprocessors and improved memory systems
- Parallel file I/O (see the sketch below)
- Use of scientific databases, parallel databases
- Increased network bandwidth
- Grids for managing and sharing remote data
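To make the parallel file I/O bullet concrete, a minimal MPI-IO sketch (shown with mpi4py purely for brevity; the file name and array size are illustrative): every rank writes its own contiguous slice of a global array into one shared file with a single collective call, instead of funneling all data through rank 0.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# each rank owns a contiguous slice of a global dataset
local = np.full(1_000_000, rank, dtype="f8")

# collective open and write: all ranks target the same shared file
fh = MPI.File.Open(comm, "snapshot.dat",
                   MPI.MODE_WRONLY | MPI.MODE_CREATE)
fh.Write_at_all(rank * local.nbytes, local)
fh.Close()
```

Run with something like `mpiexec -n 16 python snapshot.py`; on a parallel file system the aggregate write bandwidth then scales with the number of writers.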
58. Renewed U.S. Interest in HEC Will Have Impact
- While clusters are important, non-clusters are still important!!!
- Projects like IBM Blue Gene/L, Cray Red Storm, etc. address different problems than clusters
- The DARPA HPCS program is really important, but only a start
- Strategic national interests require national investment!!!
- I think we'll see more federal funding for innovative research into computer systems
59. Visualization Will Catch Up
- Visualization often lags behind HPC and storage:
- Flops get publicity
- Bytes can't get lost
- Even Rain Man can't get insight from terabytes of 0s and 1s
- The explosion in data creates limitations, requiring:
- Feature detection (good)
- Downsizing the problem (bad)
- Downsampling data (ugly)
60. Visualization Will Catch Up
- As PCs impacted HPC, so are graphics cards impacting visualization
- Custom SMP systems using graphics cards (Sun, SGI)
- Graphics clusters (Linux, Windows)
- As with HPC, there is still a need for custom, powerful visualization solutions for certain problems
- SGI has largely exited this market
- IBM left long ago... please come back!
- Again, this requires federal investment
61. What Should You Do This Week?
62. Austin Is Fun, Cool, Weird, Wonderful
- Mix of hippies, slackers, academics, geeks, politicos, musicians, and cowboys
- "Keep Austin Weird"
- Live Music Capital of the World (seriously)
- Also great restaurants, cafes, clubs, bars, theaters, galleries, etc.
- http://www.austinchronicle.com/
- http://www.austin360.com/xl/content/xl/index.html
- http://www.research.ibm.com/arl/austin/index.html
63. Your Austin To-Do List
- Eat barbecue at Rudy's, Stubb's, Iron Works, Green Mesquite, etc.
- Eat Tex-Mex at Chuy's, Trudy's, Maudie's, etc.
- Have a cold Shiner Bock (not Lone Star)
- Visit 6th Street and the Warehouse District at night
- See sketch comedy at Esther's Follies
- Go to at least one live music show
- Learn to two-step at The Broken Spoke
- Walk/jog/bike around Town Lake
- See a million bats emerge from the Congress Ave. bridge at sunset
- Visit the Texas State History Museum
- Visit the UT main campus
- See a movie at Alamo Drafthouse Cinema (arrive early, order beer and food)
- See the Round Rock Express at the Dell Diamond
- Drive into the Hill Country, visit small towns and wineries
- Eat Amy's Ice Cream
- Listen to and buy local music at Waterloo Records
- Buy a bottle each of Rudy's Barbecue Sauce and Tito's Vodka
64. Final Comments & Thoughts
- I'm very pleased to see SCICOMP is still going strong
- Great leaders and a great community make it last
- Still a need for groups like this:
- Technologies get more powerful, but not necessarily simpler, and impact comes from effective utilization
- More importantly, there is always a need for energetic, talented people to make a difference in advanced computing:
- Contribute to valuable efforts
- Don't be afraid to start something if necessary
- Change is good (even if the only thing certain about change is that things will be different afterwards)
- Enjoy Austin!
- Ask any TACC staff about places to go and things to do
65. More About TACC
- Texas Advanced Computing Center
- www.tacc.utexas.edu
- info@tacc.utexas.edu
- (512) 475-9411