2. Future of Scientific Computing
- Marvin Theimer
- Software Architect
- Windows Server High Performance Computing Group
- Microsoft Corporation
3. Supercomputing Goes Personal
4. Molecular Biologist's Workstation
- High-end workstation with internal cluster nodes
- 8-Opteron, 20 Gflops workstation/cluster for O($10,000)
- Turn-key system purchased from a standard OEM
- Pre-installed set of bioinformatics applications
- Run interactive workstation applications that offload computationally intensive tasks to attached cluster nodes
- Run workflows consisting of visualization and analysis programs that process the outputs of simulations running on attached cluster nodes
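The offload pattern on this slide can be sketched in Python under loose assumptions: a local process pool stands in for the attached cluster nodes, and `simulate_chunk` and `analyze` are hypothetical functions rather than any real bioinformatics application. The front end submits independent tasks, then runs a small analysis step over the returned outputs, mirroring the workflow bullet.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def simulate_chunk(seed: int, steps: int = 100_000) -> float:
    """Hypothetical compute-heavy kernel: mean of a linear congruential sequence."""
    x = seed
    total = 0.0
    for _ in range(steps):
        x = (1103515245 * x + 12345) % 2**31
        total += x / 2**31  # each value lies in [0, 1)
    return total / steps


def offload(executor: Executor, seeds) -> list:
    """Send one task per seed to the executor (stand-in for cluster nodes)."""
    return list(executor.map(simulate_chunk, seeds))


def analyze(results: list) -> float:
    """Stand-in for an analysis program that consumes simulation outputs."""
    return sum(results) / len(results)


if __name__ == "__main__":
    # A process pool plays the role of the attached cluster nodes (assumption).
    with ProcessPoolExecutor(max_workers=4) as pool:
        outputs = offload(pool, range(8))
    print(f"{len(outputs)} runs, mean of means = {analyze(outputs):.4f}")
```

Taking the executor as a parameter keeps the interactive code independent of where the work runs; on a real system the process pool would be replaced by whatever job-submission mechanism the cluster provides.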
5. The Future: Supercomputing on a Chip
- IBM Cell processor
- 256 Gflops today
- 4-node personal cluster: ~1 Tflops
- 32-node personal cluster: Top100-class
- Intel many-core chips
- 100s of cores on a chip in 2015 (Justin Rattner, Intel)
- 4 cores/Tflop → 25 Tflops/chip
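The aggregate figures above follow from simple multiplication. A quick check in Python, assuming one Cell processor per node, 1 Tflops = 1,000 Gflops, peak (not sustained) rates, and 100 cores as the low end of "100s of cores"; the function names are illustrative only.

```python
CELL_PEAK_GFLOPS = 256  # per Cell processor, as quoted on the slide


def cluster_peak_tflops(nodes: int, gflops_per_node: float = CELL_PEAK_GFLOPS) -> float:
    """Peak Tflops of a cluster, assuming one processor per node."""
    return nodes * gflops_per_node / 1000


def chip_peak_tflops(cores: int, cores_per_tflop: float) -> float:
    """Peak Tflops of a many-core chip, given how many cores deliver 1 Tflops."""
    return cores / cores_per_tflop


print(cluster_peak_tflops(4))    # 1.024  (the "1 Tflops" personal cluster)
print(cluster_peak_tflops(32))   # 8.192
print(chip_peak_tflops(100, 4))  # 25.0
```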
6. The Continuing Trend Towards Decentralized, Dedicated Resources
- Grids of personal and departmental clusters
- Personal workstations and departmental servers
- Minicomputers
- Mainframes
7. The Evolving Nature of HPC
8. Exploding Data Sizes
- Experimental data: TBs → PBs
- Modeling data
- Today
- 10s to 100s of GB per simulation is the common case
- Applications mostly run in isolation
- Tomorrow
- 10s to 100s of TBs, all of it to be archived
- Whole-system modeling and multi-application workflows
9. How Do You Move a Terabyte?
Material courtesy of Jim Gray
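The question reduces to arithmetic: transfer time is data size divided by sustained bandwidth. A sketch in Python, assuming a decimal terabyte (10^12 bytes), illustrative nominal link speeds rather than measured ones, and an efficiency factor for protocol overhead and contention; the 50% figure is an assumption, not a measurement.

```python
TERABYTE = 10**12  # bytes, decimal terabyte (assumption)

# Illustrative nominal link speeds in bits per second (not measurements).
LINKS = {
    "10 Mbps": 10e6,
    "100 Mbps": 100e6,
    "1 Gbps": 1e9,
    "10 Gbps": 10e9,
}


def transfer_hours(num_bytes: int, link_bps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move num_bytes at `efficiency` times the nominal link rate."""
    seconds = num_bytes * 8 / (link_bps * efficiency)
    return seconds / 3600


for name, bps in LINKS.items():
    ideal = transfer_hours(TERABYTE, bps)
    realistic = transfer_hours(TERABYTE, bps, efficiency=0.5)  # assumed 50% utilization
    print(f"{name:>8}: {ideal:7.1f} h ideal, {realistic:7.1f} h at 50%")
```

On a fully used 100 Mbps link one terabyte takes about 22 hours, and at 10 Mbps it takes over nine days, which is the context for the "via FedEx?" bullet on the next slide.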
10. Anticipated HPC Grid Topology
- Islands of high connectivity
- Simulations done on personal and workgroup clusters
- Data stored in data warehouses
- Data analysis best done inside the data warehouse
- Wide-area data sharing/replication via FedEx?
[Diagram: personal cluster, workgroup cluster, data warehouse]
11. Data Analysis and Mining
- Traditional approach
- Keep data in flat files
- Write C or Perl programs to compute specific analysis queries
- Problems with this approach
- Imposes significant development time
- Scientists must reinvent DB indexing and query technologies
- Have to copy the data from the file system to the compute cluster for every query
- Results from the astronomy community
- Relational databases can yield speed-ups of one to two orders of magnitude
- SQL plus application/domain-specific stored procedures greatly simplify creation of analysis queries
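The contrast between the two approaches can be sketched with Python's built-in `sqlite3` module. The catalog schema, column names, and synthetic values are hypothetical, SQLite is used only because it ships with Python, and nothing here measures the speed-ups reported by the astronomy community. It only shows that the flat-file version hand-codes the scan while the relational version states the query and lets an index serve it. SQLite has no stored procedures, so a registered Python function stands in for a domain-specific one.

```python
import csv
import io
import sqlite3

# Hypothetical synthetic catalog rows: (object_id, ra, dec, magnitude).
ROWS = [(i, (i * 7.3) % 360, (i * 3.1) % 180 - 90, 10 + i % 15) for i in range(1000)]


def flat_file_query(text: str, max_mag: float) -> list:
    """Traditional approach: a hand-written scan over a CSV flat file."""
    hits = []
    for obj_id, _ra, _dec, mag in csv.reader(io.StringIO(text)):
        if float(mag) <= max_mag:
            hits.append(int(obj_id))
    return hits


def sql_query(conn: sqlite3.Connection, max_mag: float) -> list:
    """Relational approach: a declarative query that can use an index."""
    cur = conn.execute(
        "SELECT object_id FROM objects WHERE magnitude <= ? ORDER BY object_id",
        (max_mag,),
    )
    return [row[0] for row in cur]


# Write the catalog out as a flat file (kept in memory here).
buf = io.StringIO()
csv.writer(buf).writerows(ROWS)
flat_text = buf.getvalue()

# Load the same catalog into a relational table with an index on magnitude.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (object_id INTEGER PRIMARY KEY, ra REAL, dec REAL, magnitude REAL)"
)
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", ROWS)
conn.execute("CREATE INDEX idx_magnitude ON objects (magnitude)")

# Stand-in for a domain-specific stored procedure: relative flux from magnitude.
conn.create_function("relative_flux", 1, lambda mag: 10 ** (-0.4 * mag))

print(flat_file_query(flat_text, 11) == sql_query(conn, 11))  # True
print(conn.execute(
    "SELECT COUNT(*) FROM objects WHERE relative_flux(magnitude) > relative_flux(11)"
).fetchone()[0])  # 67
```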
12. Is That the End of the Story?
[Diagram: personal cluster, relational data warehouse, workgroup cluster]
13. Too Much Complexity
2004 NAS supercomputing report: O(35) new computational scientists graduated per year
- Parallel application development
- Chip-level, node-level, cluster-level, LAN grid-level, WAN grid-level parallelism
- OpenMP, MPI, HPF, Global Arrays, …
- Component architectures
- Performance and configuration tuning
- Debugging/profiling/tracing/analysis
- Digital experimentation
- Experiment management
- Provenance (data and workflows)
- Version management (data and workflows)
Domain science
- Distributed systems issues
- Security
- System management
- Directory services
- Storage management
[Diagram: personal cluster, relational data warehouse, workgroup cluster]
14. Separating the Domain Scientist from the Computer Scientist
Computer scientist
Parallel/distributed file systems, relational data warehouses, dynamic systems management, Web Services and HPC grids
Concrete concurrency
Concrete workflow
Abstract concurrency
Computational scientist
Parallel domain application development
Abstract workflow
Domain scientist
(Interactive) scientific workflow, integrated with collaboration-enhanced office automation tools
Example:
- Write scientific paper (Word)
- Record experiment data (Excel)
- Individual experiment run (Workflow orchestrator)
- Analyze data (SQL Server)
- Share paper with co-authors (SharePoint)
- Collaborate with co-authors (NetMeeting)
15. Scientific Information Worker: Past and Future
- Past
- Buy lab equipment
- Keep lab notebook
- Run experiments by hand
- Assemble and analyze data (using a stats package)
- Collaborate by phone/email; write up results with LaTeX
- Metaphor
- Physical experimentation
- Do it yourself
- Lots of disparate systems/pieces
- Future
- Buy hardware and software
- Automatic provenance
- Workflow with 3rd-party domain packages
- Excel and Access/SQL Server
- Office tool suite with collaboration support
- Metaphor
- Digital experimentation
- Turn-key desktop supercomputer
- Single integrated system
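The "automatic provenance" and "workflow with 3rd party domain packages" items can be illustrated with a toy orchestrator that records, for every step, which function ran plus content hashes of its inputs and output. This is a hypothetical Python sketch: the `Workflow` class, `run_step`, `digest`, and the two example steps are invented for illustration and do not correspond to any particular workflow product. Inputs and outputs are assumed to be JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


def digest(value: Any) -> str:
    """Short content hash of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class Workflow:
    """Hypothetical orchestrator that logs provenance as a side effect of running steps."""
    provenance: list = field(default_factory=list)

    def run_step(self, name: str, func: Callable[..., Any], *inputs: Any) -> Any:
        output = func(*inputs)
        self.provenance.append({
            "step": name,
            "function": func.__name__,
            "inputs": [digest(x) for x in inputs],
            "output": digest(output),
        })
        return output


def simulate(n: int) -> list:
    """Stand-in for a third-party domain package."""
    return [i * i for i in range(n)]


def summarize(data: list) -> dict:
    return {"count": len(data), "mean": sum(data) / len(data)}


wf = Workflow()
data = wf.run_step("simulate", simulate, 10)
summary = wf.run_step("summarize", summarize, data)
print(summary)  # {'count': 10, 'mean': 28.5}
print(json.dumps(wf.provenance, indent=2))
```

Because each record stores the hash of what a step consumed, the second entry's input hash matches the first entry's output hash, so the chain from raw simulation output to summary can be checked afterwards without the scientist writing any bookkeeping code.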