SCICOMP, IBM, and TACC: Then, Now, and Next
1
SCICOMP, IBM, and TACC: Then, Now, and Next
  • Jay Boisseau, Director
  • Texas Advanced Computing Center
  • The University of Texas at Austin
  • August 10, 2004

2
Precautions
  • This presentation contains some historical
    recollections from over 5 years ago. I can't
    usually recall what I had for lunch yesterday.
  • This presentation contains some ideas on where I
    think things might be going next. If I can't
    recall yesterday's lunch, it seems unlikely that
    I can predict anything.
  • This presentation contains many tongue-in-cheek
    observations, exaggerations for dramatic effect,
    etc.
  • This presentation may cause boredom, drowsiness,
    nausea, or hunger.

3
Outline
  • Why Did We Create SCICOMP 5 Years Ago?
  • What Did I Do with My Summer (and the Previous 3
    Years)?
  • What is TACC Doing Now with IBM?
  • Where Are We Now? Where Are We Going?

4
Why Did We Create SCICOMP 5 Years Ago?
5
The Dark Ages of HPC
  • In late 1990s, most supercomputing was
    accomplished on proprietary systems from IBM, HP,
    SGI (including Cray), etc.
  • User environments were not very friendly
  • Limited development environment (debuggers,
    optimization tools, etc.)
  • Very few cross platform tools
  • Difficult programming tools (MPI, OpenMP; some
    things haven't changed)

6
Missing Cray Research
  • Cray was no longer the dominant company, and it
    showed
  • Trend towards commoditization had begun
  • Systems were not balanced
  • Cray T3Es were used longer than any production
    MPP
  • Software for HPC was limited, not as reliable
  • Who doesn't miss real checkpoint/restart,
    automatic performance monitoring, no weekly PM
    downtime, etc.?
  • Companies were not as focused on HPC/research
    customers as on larger markets

7
1998-99: Making Things Better
  • John Levesque hired by IBM to start the Advanced
    Computing Technology Center
  • Goal: ACTC should provide to customers what Cray
    Research used to provide
  • Jay Boisseau became first Associate Director of
    Scientific Computing at SDSC
  • Goal: Ensure SDSC helped users migrate from Cray
    T3E to IBM SP and do important, effective
    computational research

8
Creating SCICOMP
  • John and Jay hosted a workshop at SDSC in March
    1999, open to users and center staff
  • to discuss current state, issues, techniques, and
    results in using IBM systems for HPC
  • SP-XXL already existed, but was exclusive and
    more systems-oriented
  • Success led to the first IBM SP Scientific Computing
    User Group meeting (SCICOMP) in August 1999 in
    Yorktown Heights, with Jay as first director
  • Second meeting held in early 2000 at SDSC
  • In late 2000, John & Jay invited international
    participation in SCICOMP at the IBM ACTC workshop in
    Paris

9
What Did I Do with My Summer (and the Previous 3
Years)?
10
Moving to TACC?
  • In 2001, I accepted job as director of TACC
  • Major rebuilding task
  • Only 14 staff
  • No R&D programs
  • Outdated HPC systems
  • No visualization, grid computing or
    data-intensive computing
  • Little funding
  • Not much profile
  • Past political issues

11
Moving to TACC!
  • But big opportunities
  • Talented key staff in HPC, systems, and
    operations
  • Space for growth
  • IBM Austin across the street
  • Almost every other major HPC vendor has large
    presence in Austin
  • UT Austin has both quality and scale in sciences,
    engineering, CS
  • UT and Texas have unparalleled internal &
    external support (pride is not always a vice)
  • Austin is a fantastic place to live (and recruit)

12
Moving to TACC!
  • TEXAS-SIZED opportunities
  • Talented key staff in HPC, systems, and
    operations
  • Space for growth
  • IBM Austin across the street
  • Almost every other major HPC vendor has large
    presence in Austin
  • UT Austin has both quality and scale in
    sciences, engineering, CS
  • UT and Texas have unparalleled internal &
    external support (pride is not always a vice)
  • Austin is a fantastic place to live (and recruit)

13
Moving to TACC!
  • TEXAS-SIZED opportunities
  • Talented key staff in HPC, systems, and
    operations
  • Space for growth
  • IBM Austin across the street
  • Almost every other major HPC vendor has large
    presence in Austin
  • UT Austin has both quality and scale in
    sciences, engineering, CS
  • UT and Texas have unparalleled internal &
    external support (pride is not always a vice)
  • Austin is a fantastic place to live (and recruit)
  • I got the chance to build something else good and
    important

14
TACC Mission
  • To enhance the research & education programs of
    The University of Texas at Austin and its
    partners through research, development, operation
    & support of advanced computing technologies.

15
TACC Strategy
  • To accomplish this mission, TACC:
  • Evaluates, acquires & operates advanced computing
    systems
  • Provides training, consulting, and documentation
    to users
  • Collaborates with researchers to apply advanced
    computing techniques
  • Conducts research & development to produce new
    computational technologies

Resources & Services
Research & Development
16
TACC Advanced Computing Technology Areas
  • High Performance Computing (HPC)
  • numerically intensive computing produces data

17
TACC Advanced Computing Technology Areas
  • High Performance Computing (HPC)
  • numerically intensive computing produces data
  • Scientific Visualization (SciVis)
  • rendering data into information & knowledge

18
TACC Advanced Computing Technology Areas
  • High Performance Computing (HPC)
  • numerically intensive computing produces data
  • Scientific Visualization (SciVis)
  • rendering data into information & knowledge
  • Data & Information Systems (DIS)
  • managing and analyzing data for information &
    knowledge

19
TACC Advanced Computing Technology Areas
  • High Performance Computing (HPC)
  • numerically intensive computing produces data
  • Scientific Visualization (SciVis)
  • rendering data into information & knowledge
  • Data & Information Systems (DIS)
  • managing and analyzing data for information &
    knowledge
  • Distributed and Grid Computing (DGC)
  • integrating diverse resources, data, and people
    to produce and share knowledge

20
TACC Activities & Scope
(Chart: TACC activities & scope, since 1986 vs. since 2001)
21
TACC Applications Focus Areas
  • TACC advanced computing technology R&D must be
    driven by applications
  • TACC Applications Focus Areas
  • Chemistry → Biosciences
  • Climate/Weather/Ocean → Geosciences
  • CFD

22
TACC HPC & Storage Systems
  • LONGHORN: IBM Power4 system, 224 CPUs (1.16 Tflops),
    ½ TB memory, 7.1 TB disk
  • LONESTAR: Cray-Dell Xeon Linux cluster, 1028 CPUs
    (6.3 Tflops), 1 TB memory, 40 TB disk
  • TEJAS: IBM Linux Pentium III cluster, 64 CPUs
    (64 Gflops), 32 GB memory, 1 TB disk
  • ARCHIVE: STK PowderHorns (2), 2.8 PB max capacity,
    managed by Cray DMF
  • SAN: Sun SANs (2), 8 TB / 4 TB, to be expanded
23
ACES VisLab
  • Front and Rear Projection Systems
  • 3x1 cylindrical immersive environment, 24 ft
    diameter
  • 5x2 large-screen, 16:9 panel tiled display
  • Full immersive capabilities with head/motion
    tracking
  • High-end rendering systems
  • Sun E25K: 128 processors, ½ TB memory, > 3
    Gpoly/sec
  • SGI Onyx2: 24 CPUs, 6 IR2 graphics pipes, 25 GB
    memory
  • Matrix switch between systems, projectors, rooms

24
TACC Services
  • TACC resources and services include
  • Consulting
  • Training
  • Technical documentation
  • Data storage/archival
  • System selection/configuration consulting
  • System hosting

25
TACC R&D: High Performance Computing
  • Scalability, performance optimization, and
    performance modeling for HPC applications
  • Evaluation of cluster technologies for HPC
  • Portability and performance issues of
    applications on clusters
  • Climate, weather, and ocean modeling: collaboration
    with and support of DoD
  • Starting CFD activities

26
TACC R&D: Scientific Visualization
  • Feature detection / terascale data analysis
  • Evaluation of performance characteristics and
    capabilities of high-end visualization
    technologies
  • Hardware accelerated visualization and
    computation on GPUs
  • Remote interactive visualization / grid-enabled
    interactive visualization

27
TACC R&D: Data & Information Systems
  • Newest technology group at TACC
  • Initial R&D focused on creating/hosting
    scientific data collections
  • Interests / plans
  • Geospatial and biological database extensions
  • Efficient ways to collect/create metadata
  • DB clusters / parallel DB I/O for scientific data

28
TACC R&D: Distributed & Grid Computing
  • Web-based grid portals
  • Grid resource data collection and information
    services
  • Grid scheduling and workflow
  • Grid-enabled visualization
  • Grid-enabled data collection hosting
  • Overall grid deployment and integration

29
TACC R&D: Networking
  • Very new activities
  • Exploring high-bandwidth (OC-12, GigE, OC-48,
    OC-192) remote and collaborative grid-enabled
    visualization
  • Exploring network performance for moving
    terascale data on 10 Gbps networks (TeraGrid)
  • Exploring GigE aggregation to fill 10 Gbps
    networks (parallel file I/O, parallel database
    I/O); see the arithmetic sketch after this list
  • Recruiting a leader for TACC networking R&D
    activities
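
To make the bandwidth goals above concrete, here is a back-of-the-envelope sketch in Python (my own illustrative numbers and efficiency assumption, not TACC measurements): roughly ten aggregated GigE streams are needed to fill a 10 Gbps link, and a terabyte takes a few hours at GigE rates but well under an hour on a filled 10 Gbps pipe.

```python
# Back-of-the-envelope numbers for terascale data movement.
# Illustrative assumptions only, not measured TACC results.

GBIT = 1e9  # bits per second in 1 Gbps

def transfer_time_hours(data_tb: float, rate_gbps: float, efficiency: float = 0.7) -> float:
    """Hours to move data_tb terabytes at rate_gbps, with an assumed
    protocol/striping efficiency factor (0.7 is an arbitrary example)."""
    bits = data_tb * 8e12                      # 1 TB (decimal) = 8e12 bits
    return bits / (rate_gbps * GBIT * efficiency) / 3600.0

# Roughly ten 1 Gbps (GigE) streams are needed to saturate a 10 Gbps link.
streams_needed = 10 / 1.0

print(f"~{streams_needed:.0f} aggregated GigE streams to fill 10 Gbps")
print(f"1 TB over one GigE stream    : {transfer_time_hours(1, 1):.1f} h")
print(f"1 TB over a full 10 Gbps pipe: {transfer_time_hours(1, 10):.2f} h")
```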

30
TACC Growth
  • New infrastructure provides UT with
    comprehensive, balanced, world-class resources
  • 50x HPC capability
  • 20x archival capability
  • 10x network capability
  • World-class VisLab
  • New SAN
  • New comprehensive R&D program with focus on
    impact
  • Activities in HPC, SciVis, DIS, DGC
  • New opportunities for professional staff
  • 40 new, wonderful people in 3 years, adding to
    the excellent core of talented people that have
    been at TACC for many years

31
Summary of My Time with TACC Over the Past 3 Years
  • TACC provides terascale HPC, SciVis, storage,
    data collection, and network resources
  • TACC provides expert support services:
    consulting, documentation, and training in HPC,
    SciVis, and Grid
  • TACC conducts applied research & development in
    these advanced computing technologies
  • TACC has become one of the leading academic
    advanced computing centers in 3 years
  • I have the best job in the world, mainly because
    I have the best staff in the world (but
    also because of UT and Austin)

32
And one other thing kept me busy the past 3 years
33
What is TACC Doing Now with IBM?
34
UT Grid: Enable Campus-wide Terascale Distributed
Computing
  • Vision: provide high-end systems, but move from
    island to hub of campus computing continuum
  • provide models for local resources (clusters,
    vislabs, etc.), training, and documentation
  • develop procedures for connecting local systems
    to campus grid
  • single sign-on, data space, compute space
  • leverage every PC, cluster, NAS, etc. on campus!
  • integrate digital assets into campus grid
  • integrate UT instruments & sensors into campus
    grid
  • Joint project with IBM

35
Building a Grid Together
  • UT Grid: Joint Between UT and IBM
  • TACC wants to be leader in e-science
  • IBM is a leader in e-business
  • UT Grid enables both to
  • Gain deployment experience (IBM Global Services)
  • Have an R&D testbed
  • Deliverables/Benefits
  • Deployment experience
  • Grid Zone papers
  • Other papers

36
UT Grid: Initial Focus on Computing
  • High-throughput parallel computing
  • Project Rodeo
  • Use CSF to schedule to LSF, PBS, SGE clusters
    across campus
  • Use Globus 3.2 → GT4
  • High-throughput serial computing
  • Project Roundup uses United Devices software on
    campus PCs
  • Also interfacing to Condor flock in CS department

37
UT Grid: Initial Focus on Computing
  • Develop CSF adapters for popular resource
    management systems through collaboration (a
    hypothetical adapter interface is sketched after
    this list)
  • LSF: done by Platform Computing
  • Globus: done by Platform Computing
  • PBS: partially done
  • SGE
  • LoadLeveler
  • Condor
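
For illustration only, the sketch below shows the general shape of such an adapter: a uniform submit/status/cancel view over a local scheduler. This is not the real CSF plug-in API (the actual adapters noted above were contributed by Platform Computing and others against the Globus/CSF interfaces); the class and method names here are hypothetical.

```python
# Hypothetical sketch of what a resource-manager adapter exposes to a
# meta-scheduler such as CSF. NOT the real CSF plug-in interface; it only
# illustrates the common operations each local scheduler must support.
from abc import ABC, abstractmethod


class ResourceManagerAdapter(ABC):
    """Uniform view of a local scheduler (LSF, PBS, SGE, LoadLeveler, Condor)."""

    @abstractmethod
    def submit(self, job_description: dict) -> str:
        """Translate a generic job description into a local submission;
        return the local job id."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Map the local job state onto a common vocabulary
        (e.g. 'pending', 'running', 'done', 'failed')."""

    @abstractmethod
    def cancel(self, job_id: str) -> None:
        """Remove the job from the local queue."""


class PBSAdapter(ResourceManagerAdapter):
    """Illustrative stub: a real adapter would shell out to qsub/qstat/qdel
    (or use a PBS API) and translate job attributes in both directions."""

    def submit(self, job_description: dict) -> str:
        raise NotImplementedError("would call qsub here")

    def status(self, job_id: str) -> str:
        raise NotImplementedError("would call qstat here")

    def cancel(self, job_id: str) -> None:
        raise NotImplementedError("would call qdel here")
```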

38
UT Grid: Initial Focus on Computing
  • Develop CSF capability for flexible job
    requirements (see the sketch after this list)
  • Serial vs. parallel: no difference, just specify
    the number of CPUs
  • Number: facilitate ensembles
  • Batch: whenever, or by priority
  • Advanced reservation: needed for coupling,
    interactive use
  • On-demand: needed for urgency
  • Integrate data management for jobs into CSF
  • SAN makes it easy
  • GridFTP is somewhat simple, if crude
  • Avaki Data Grid is a possibility
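
As a concrete illustration of flexible job requirements plus data staging, here is a minimal sketch. The GridJob class and its field names are assumed, not the actual CSF/RSL job-description schema used by UT Grid; the stage-in helper simply shells out to globus-url-copy, the standard GridFTP command-line client, and the URLs in the example are hypothetical.

```python
# Minimal sketch of a flexible job request plus GridFTP stage-in.
# GridJob and its field names are illustrative assumptions, not the actual
# CSF or RSL job-description schema.
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GridJob:
    executable: str
    ncpus: int = 1                # serial vs. parallel: just the CPU count
    ensemble_size: int = 1        # number of independent instances (ensembles)
    mode: str = "batch"           # "batch" | "reservation" | "on-demand"
    start_not_before: Optional[str] = None          # ISO time, reservations only
    stage_in: List[Tuple[str, str]] = field(default_factory=list)  # (src, dst) URLs


def stage_inputs(job: GridJob) -> None:
    """Copy input files with the stock GridFTP client before submission."""
    for src, dst in job.stage_in:
        # globus-url-copy <sourceURL> <destURL> is the standard GridFTP CLI.
        subprocess.run(["globus-url-copy", src, dst], check=True)


# Example: a 64-way parallel job run as a 10-member ensemble, with one input
# file pulled from a GridFTP server (hypothetical paths and host names).
job = GridJob(
    executable="/work/apps/model/model.exe",
    ncpus=64,
    ensemble_size=10,
    stage_in=[("gsiftp://archive.example.edu/data/input.nc",
               "file:///scratch/input.nc")],
)
# stage_inputs(job)  # would invoke globus-url-copy for each (src, dst) pair
```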

39
UT Grid: Initial Focus on Computing
  • Completion time in a compute grid is a function
    of the following (see the sketch after this list):
  • data transfer times
  • Use NWS for network bandwidth predictions, file
    transfer time predictions (Rich Wolski, UCSB)
  • queue wait times
  • Use new software from Wolski for prediction of
    start of execution in batch systems
  • application performance times
  • Use Prophesy (Valerie Taylor) for applications
    performance prediction
  • Develop CSF scheduling module that is data,
    network, and performance aware
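
The list above amounts to a simple cost model: predicted completion time on a cluster is the sum of predicted transfer time, predicted queue wait, and predicted run time, and the scheduler picks the minimum. Below is a minimal sketch that assumes those three predictions are already available (e.g., from NWS, the batch-queue wait predictor, and Prophesy); the function, cluster names, and numbers are made up for illustration.

```python
# Sketch of a data-, network-, and performance-aware placement decision.
# The per-cluster predictions stand in for what NWS (transfer time), the
# batch-queue wait-time predictor, and Prophesy (run time) would supply;
# the numbers are invented for this example.

predictions = {
    # cluster: (transfer_s, queue_wait_s, run_time_s)
    "lonestar": (1200.0,  600.0, 3600.0),
    "longhorn": ( 300.0, 5400.0, 2700.0),
    "tejas":    (  60.0,  120.0, 9000.0),
}

def predicted_completion(transfer_s: float, queue_wait_s: float, run_time_s: float) -> float:
    """Completion time = data transfer + queue wait + application run time."""
    return transfer_s + queue_wait_s + run_time_s

best = min(predictions, key=lambda c: predicted_completion(*predictions[c]))

for cluster, parts in predictions.items():
    print(f"{cluster:9s} -> {predicted_completion(*parts) / 3600:.2f} h predicted")
print("schedule on:", best)
```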

40
UT Grid: Full Service!
  • UT Grid will offer a complete set of services
  • Compute services
  • Storage services
  • Data collections services
  • Visualization services
  • Instruments services
  • But this will take 2 years... focusing on compute
    services now

41
UT Grid: Interfaces
  • Grid User Portal
  • Hosted, built on GridPort
  • Augment developers by providing info services
  • Enable productivity by simplifying production
    usage
  • Grid User Node
  • Hosted, software includes GridShell plus client
    versions of all other UT Grid software
  • Downloadable version enables configuring local
    Linux box into UT Grid (eventually, Windows and
    Mac)

42
UT Grid: Logical View
  • Integrate distributed TACC resources first
    (Globus, LSF, NWS, SRB, United Devices, GridPort)

(Diagram: TACC HPC, Vis, and Storage resources, actually spread across two campuses)
43
UT Grid: Logical View
  • Next add other UT resources in one building as a
    spoke, using the same tools and procedures

(Diagram: TACC HPC, Vis, and Storage hub with ICES data and clusters as a spoke)
44
UT Grid: Logical View
  • Next add other UT resources in one building as a
    spoke, using the same tools and procedures

(Diagram: TACC HPC, Vis, and Storage hub with ICES clusters and PGE data and clusters as spokes)
45
UT Grid: Logical View
  • Next add other UT resources in one building as a
    spoke, using the same tools and procedures

(Diagram: TACC HPC, Vis, and Storage hub with ICES, PGE, GEO, and BIO clusters, data, and instruments as spokes)
46
UT Grid: Logical View
  • Finally, negotiate connections between spokes for
    willing participants to develop a P2P grid.

(Diagram: spokes such as the ICES, PGE, GEO, and BIO resources interconnected with each other as well as with the TACC hub)
47
UT Grid: Physical View (TACC Systems)

(Network diagram: TACC Storage, Power4, and Cluster systems behind switches on the research campus, with TACC Vis in ACES on the main campus, linked via NOCs, CMS, GAATN, and external networks)
48
UT Grid: Physical View (Add ICES Resources)

(Network diagram: the same TACC systems with ICES clusters and ICES data added on the main campus)
49
UT Grid: Physical View (Add Other Resources)

(Network diagram: the same TACC and ICES systems with PGE clusters and PGE data added behind an additional switch)
50
Texas Internet Grid for Research & Education
(TIGRE)
  • Multi-university grid: Texas, A&M, Houston, Rice,
    Texas Tech
  • Build-out in 2004-5
  • Will integrate additional universities
  • Will facilitate academic research capabilities
    across Texas using Internet2 initially
  • Will extend to industrial partners to foster
    academic/industrial collaboration on R&D

51
NSF TeraGrid: National Cyberinfrastructure for
Computational Science
  • TeraGrid is the world's largest cyberinfrastructure
    for computational research
  • Includes NCSA, SDSC, PSC, Caltech, Argonne, Oak
    Ridge, Indiana, Purdue
  • Massive bandwidth! Each connection is one or more
    10 Gbps links!

  • TACC will provide terascale computing, storage,
    and visualization resources
  • UT will provide terascale geosciences data sets
52
Where Are We Now? Where Are We Going?
53
The Buzz Words
  • Clusters, Clusters, Clusters
  • Grids & Cyberinfrastructure
  • Data, Data, Data

54
Clusters, Clusters, Clusters
  • No sense in trying to make long-term predictions
    here
  • 64-bit is going to be more important (duh), but is
    not yet (for most workloads)
  • Evaluate options, but differences are not so
    great (for diverse workloads)
  • Pricing is generally normalized to performance
    (via sales) for commodities

55
Grids & Cyberinfrastructure Are Coming... Really!
  • The Grid is coming... eventually
  • The concept of a Grid was ahead of the standards
  • But we all use distributed computing anyway, and
    the advantages are just too big not to solve the
    issues
  • Still have to solve many of the same distributed
    computing research problems (but at least now we
    have standards to develop to)
  • grid computing is here... almost
  • WSRF means finally getting the standards right
  • Federal agencies and companies alike are
    investing heavily in good projects and starting
    to see results

56
TACC Grid Tools and Deployments
  • Grid Computing Tools
  • GridPort: transparent grid computing from the Web
  • GridShell: transparent grid computing from the CLI
  • CSF: grid scheduling
  • GridFlow / GridSteer: for coupling vis and steering
    simulations
  • Cyberinfrastructure Deployments
  • TeraGrid: national cyberinfrastructure
  • TIGRE: state-wide cyberinfrastructure
  • UT Grid: campus cyberinfrastructure for research &
    education

57
Data, Data, Data
  • Our ability to create and collect data (computing
    systems, instruments, sensors) is exploding
  • Availability of data even driving new modes of
    science (e.g., bioinformatics)
  • Data availability and the need for sharing and
    analysis are driving the other aspects of computing
  • Need for 64-bit microprocessors, improved memory
    systems
  • Parallel file I/O
  • Use of scientific databases, parallel databases
  • Increased network bandwidth
  • Grids for managing, sharing remote data

58
Renewed U.S. Interest in HEC Will Have Impact
  • While clusters are important, non-clusters are
    still important!!!
  • Projects like IBM Blue Gene/L, Cray Red Storm,
    etc. address different problems than clusters
  • DARPA HPCS program is really important, but only
    a start
  • Strategic national interests require national
    investment!!!
  • I think we'll see more federal funding for
    innovative research into computer systems

59
Visualization Will Catch Up
  • Visualization often lags behind HPC, storage
  • Flops get publicity
  • Bytes can't get lost
  • Even Rain Man can't get insight from terabytes of
    0s and 1s
  • Explosion in data creates limitations requiring:
  • Feature detection (good)
  • Downsizing problem (bad)
  • Downsampling data (ugly)

60
Visualization Will Catch Up
  • As PCs impacted HPC, so are graphics cards
    impacting visualization
  • Custom SMP systems using graphics cards (Sun,
    SGI)
  • Graphics clusters (Linux, Windows)
  • As with HPC, still a need for custom, powerful
    visualization solutions on certain problems
  • SGI has largely exited this market
  • IBM left long ago... please come back!
  • Again, requires federal investment

61
What Should You Do This Week?
62
Austin is Fun, Cool, Weird, Wonderful
  • Mix of hippies, slackers, academics, geeks,
    politicos, musicians, and cowboys
  • Keep Austin Weird
  • Live Music Capital of the World (seriously)
  • Also great restaurants, cafes, clubs, bars,
    theaters, galleries, etc.
  • http://www.austinchronicle.com/
  • http://www.austin360.com/xl/content/xl/index.html
  • http://www.research.ibm.com/arl/austin/index.html

63
Your Austin To-Do List
  • Eat barbecue at Rudy's, Stubb's, Iron Works,
    Green Mesquite, etc.
  • Eat Tex-Mex at Chuy's, Trudy's, Maudie's,
    etc.
  • Have a cold Shiner Bock (not Lone Star)
  • Visit 6th Street and Warehouse District at night
  • See sketch comedy at Esther's Follies
  • Go to at least one live music show
  • Learn to two-step at The Broken Spoke
  • Walk/jog/bike around Town Lake
  • See a million bats emerge from Congress Ave.
    bridge at sunset
  • Visit the Texas State History Museum
  • Visit the UT main campus
  • See a movie at the Alamo Drafthouse Cinema (arrive
    early, order beer & food)
  • See the Round Rock Express at the Dell Diamond
  • Drive into Hill Country, visit small towns and
    wineries
  • Eat Amy's Ice Cream
  • Listen to and buy local music at Waterloo Records
  • Buy a bottle each of Rudy's Barbecue Sauce and
    Tito's Vodka

64
Final Comments & Thoughts
  • I'm very pleased to see SCICOMP is still going
    strong
  • Great leaders and a great community make it last
  • Still a need for groups like this
  • technologies get more powerful, but not
    necessarily simpler, and impact comes from
    effective utilization
  • More importantly, always a need for energetic,
    talented people to make a difference in advanced
    computing
  • Contribute to valuable efforts
  • Don't be afraid to start something if necessary
  • Change is good (even if the only thing certain
    about change is that things will be different
    afterwards)
  • Enjoy Austin!
  • Ask any TACC staff about places to go and things
    to do

65
More About TACC
  • Texas Advanced Computing Center
  • www.tacc.utexas.edu
  • info@tacc.utexas.edu
  • (512) 475-9411