Title: Where Did the Big Visions Go?
1Where Did the Big Visions Go?
- Dan Reed
- Director, NCSA and the Alliance
- Chief Architect, NSF TeraGrid
- Edward William and Jane Marr Gutgsell Professor
- University of Illinois
- reed_at_ncsa.uiuc.edu
2Presentation Outline
Where there is no vision, the people
perish. Proverbs 29:18
- A geobiology primer
- nature has a few lessons to share
- A bit of computing history
- heed the words of Santayana
- A snapshot of current reality
- science, clusters and Grids
- Musings on the future
- technical, political, and scientific
3A Bit of Perspective
4A Geobiology Primer
You Are Here!
http://www.ucmp.berkeley.edu
Source: UC-Berkeley Museum of Paleontology
5A Few Key Points
- Paleozoic/Carboniferous (350M years)
- eastern U.S. covered by coal swamps
- most of U.S. is underwater
- reptiles arise and diversify
- Paleozoic/Devonian (400M years)
- mollusks, arthropods, and amphibians invade land
- fish diversify and dominate oceans
- Paleozoic/Silurian (430M years)
- invertebrates dominate
- a few plants and (maybe) some animals invade land
- Paleozoic/Ordovician (500M years)
- first vertebrates appear
- Paleozoic/Cambrian (600M years)
- most phyla first appear
- Precambrian (4.5B years)
- the earth forms and cools
- evidence of bacteria, cyanobacteria and
stromatolites
6A Few Key Points
- Cenozoic/Quaternary (2M years)
- humans dominate near the end (100K years)
- Cenozoic/Tertiary (65M years)
- mammals dominate as large animals
- K/T mass extinction due to meteor impact
- Mesozoic/Cretaceous (130M years)
- first flowers and primates appear
- Mesozoic/Jurassic (180M years)
- dinosaurs dominate; birds appear
- Mesozoic/Triassic (230M years)
- dinosaurs appear
- Atlantic Ocean forms
- Paleozoic/Permian (270M years)
- reptiles dominate land; seas contract
- Permian mass extinction (95% of all life)
- climate change?
7The Cambrian Explosion
- Most phyla appear
- sponges, archaeocyathids, brachiopods
- trilobites, primitive mollusks, echinoderms
- Indeed, most appeared quickly!
- Tommotian and Atdabanian
- as little as five million years
8Cambrian Explosion Causes
- Lots of theories
- plants lowered CO2 levels
- increased O2 enabled metazoan development
- snowball Earth
- climate change triggered rapid evolution
- genetics suggests earlier divergence
- fossil records generally require skeletons
- panspermia via comet seeding
- Lessons for computing
- it doesn't take long when conditions are right
- raw materials and environment
- leave fossil records if you want to be
remembered!
9Presentation Outline
Where there is no vision, the people
perish. Proverbs 29:18
- A geobiology primer
- nature has a few lessons to share
- A bit of computing history
- heed the words of Santayana
- A snapshot of current reality
- science, clusters and Grids
- Musings on the future
- technical, political, and scientific
10Parallel Computing
- IBM Stretch
- design goal: 100-200X IBM 704
- world's fastest machine until 1964
- parallelism as an enabler
- design timeline
- 1961 LASL delivery; retired 1971
- 1962 Harvest NSA delivery; retired 1976
- $13.5M list price ($95M in current $)
- architectural features
- interleaving, pipelining, prefetching
- speculation and forwarding
- Illinois/Burroughs ILLIAC IV
- world's fastest machine as design goal
- launched 1974, retired 1982
- $30M circa 1972 ($130M in current $)
- 64 processor SIMD (1/4th design target)
- array language support (Glypnir and IVTRAN)
- thin film memory (2K words/processor)
- ARPANET for remote access
Failing Gloriously
11Man-Computer Symbiosis
- It seems reasonable to envision, for a time 10 or
15 years hence, a 'thinking center' that will
incorporate the functions of present-day
libraries together with anticipated advances in
information storage and retrieval.
- The picture readily enlarges itself into a
network of such centers, connected to one another
by wide-band communication lines and to
individual users by leased-wire services. In such
a system, the speed of the computers would be
balanced, and the cost of the gigantic memories
and the sophisticated programs would be divided
by the number of users.
J.C.R. Licklider, 1960
12Human-Computer Symbiosis
- PLATO (Programmed Logic for Automated Teaching
Operations)
- begun in 1960, led by Illinois' Don Bitzer
- several spinoffs via CDC, NovaNET, ...
- Illinois classroom use until 1985
- 10 million hours 1978-1985
- over 3 million hours in Notes
- early online community
- computer music and plasma touch panel displays
- lessons later gave us Lotus Notes and Mosaic
- Project MAC
- Man and Computer or Multiple Access Computer
- $25M ARPA funding from 1963-1970
- $108M in current $
- J.C.R. Licklider suggestion, Robert Fano
leadership
- Multiplexed Information and Computing Service
(MULTICS)
- virtual memory, hierarchical file systems, time
sharing, ...
- a host of innovative ideas and collaborations
Failing Gloriously
13ARPANET
Vint Cerf
Len Kleinrock
BBN IMP Team
Bob Kahn
Larry Roberts
Note the timescale!
14Presentation Outline
Where there is no vision, the people
perish. Proverbs 29:18
- A geobiology primer
- nature has a few lessons to share
- A bit of computing history
- heed the words of Santayana
- A snapshot of current reality
- science, clusters and Grids
- Musings on the future
- technical, political, and scientific
15The Really Big Questions
- Life and nature
- structures, processes, and interactions
- Matter and universe
- origins, structure, manipulation, and futures
- interactions, systems, and context
- Humanity
- creativity, socialization, and community
- Answering big questions requires
- boldness to engage opportunities
- new approaches and infrastructure
- new collaborations
- interdisciplinary partnerships
16Big Science Visions Are Common
- Multilevel biological modeling
- from molecules and structures to organisms and
ecologies
- petascale systems and beyond
- Distributed, virtual astronomy
- real-time data analysis and multi-modal data
fusion from distributed archives
- Personalized, in situ medicine
- drug design tailored to individual DNA with
embedded micro-transfusers
- High-energy physics/cosmology fusion
- dark matter, the standard model, and the theory
of everything
- Integrated climate change and urban/social
planning
- multidisciplinary data fusion, modeling, and
analysis
Big Questions to Get Big Answers
17Big Science Visions Are Common
Source: DOE Genomes to Life
18Distributed Virtual Astronomy
- Capabilities
- homogeneous, multi-wavelength data
- observations of millions of objects
- mega-sky surveys (2MASS, SLOAN, ...)
- Initiatives
- U.S. National Virtual Observatory (NVO)
- Caltech, JHU, ALMA, HST, ...
- EU Astrophysical Virtual Observatory (AVO)
- ESO, CNRS, CDS, ...
- Grid data mining and archives
- discovering significant patterns
- analysis of rich image/catalog databases
- understanding complex astrophysical systems
- integrated data/large numerical simulations
HST Data Access
19Earthquake Engineering
- NSF Network for Earthquake Engineering
Simulation (NEES)
- seamless testing and simulation
- earthquake hazard mitigation
- structural, geotechnical, tsunami
- national IT infrastructure
- NCSA/UIUC CE leadership
20The U.S. NSF PACI TeraGrid
ETF Expansion
$53M NCSA, SDSC, Argonne, Caltech plus $7.5M
Illinois I-WIRE Initiative and California CENIC
Source: Bill Cheswick
Internet circa 1999
Internet circa 1969
21NCSA Terascale Linux Clusters
- 1 TF IA-32 Pentium III cluster (Platinum)
- 512 1 GHz dual processor nodes
- Myrinet 2000 interconnect
- 5 TB of RAID storage
- deployed 2001
- 1 TF IA-64 Itanium cluster (Titan)
- 164 800 MHz dual processor nodes
- Myrinet 2000 interconnect
- deployed 2001
- Large-scale calculations on both
- molecular dynamics (Schulten)
- first nanosecond/day calculations
- gas dynamics (Woodward)
- others underway
- Several additional Pentium-4 TF planned
- beyond ETF TeraGrid
- Software packaging for communities
- NCSA machine room expansion
- capacity to 20 TF and expandable
- dedicated September 5, 2001
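As a rough cross-check of the "1 TF" labels on the two clusters, peak rate can be estimated as nodes x processors per node x clock rate x floating-point operations per cycle. The node counts and clock rates below come from the slide; the per-cycle figures (1 for the Pentium III, 4 for the Itanium) are assumptions used only for illustration, and `peak_teraflops` is a hypothetical helper name.

```python
def peak_teraflops(nodes, procs_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TF: every processor retires flops_per_cycle each cycle."""
    gigaflops = nodes * procs_per_node * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0

# Node counts and clocks are from the slide; flops_per_cycle values are assumed.
platinum = peak_teraflops(nodes=512, procs_per_node=2, clock_ghz=1.0, flops_per_cycle=1)
titan = peak_teraflops(nodes=164, procs_per_node=2, clock_ghz=0.8, flops_per_cycle=4)

print(f"Platinum peak: {platinum:.2f} TF")  # about 1.02 TF
print(f"Titan peak:    {titan:.2f} TF")     # about 1.05 TF
```

Under these assumptions both machines land just above 1 TF, which matches the labels; sustained rates on real applications would be lower.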
22Grid Projects in e-Science
Funding and Software
Source: Randy Butler
23Presentation Outline
Where there is no vision, the people
perish. Proverbs 29:18
- A geobiology primer
- nature has a few lessons to share
- A bit of computing history
- heed the words of Santayana
- A snapshot of current reality
- science, clusters and Grids
- Musings on the future
- technical, political, and scientific
24Lessons Learned?
- Commercial/historical
- ACRI ACRI-1, Ametek 2010, Burroughs BSP, TI ASC,
ETA ETA-10, Denelcor HEP, Gould NPL, Multiflow
8/256, TMC CM-2 and CM-5, BBN Butterfly, FPS
AP-128, SSI SS-1, Goodyear MPP, ICL DAP, INMOS
Transputer, CCC Cray-3, CRI T3E, KSR K-1,
Stardent, Convex C-4, Alliant FX/80, Sequent,
Encore, Intel Touchstone, MasPar MP-2, Meiko
CS-2, NCUBE nCube/10, Compaq AlphaServer
- Commercial/current
- IBM p690, SGI Origin3000, HP Superdome, Fujitsu
VPP5000, Hitachi SR8000, NEC SX-6, Cray SV1/X1
- Research/historical
- Texas TRAC, Illinois ILLIAC, Illinois Cedar,
Stanford DASH/FLASH, NYU Ultracomputer, IBM RP3,
CMU C.mmp, CMU Cm*, Manchester Dataflow, NASA
FEM, Purdue PASM, Purdue Pringle, SUPRENUM,
Caltech Cosmic Cube, DEC Andromeda, MIT
J-Machine, Monsoon, Beowulf
- Research/current
- Tokyo GRAPE6, Columbia QCDOC, IBM BG/L, DIVA,
Gilgamesh, ...
25Near Term Directions for Clusters
- High-density web server farms (IA-32, AMD,
Transmeta)
- blade servers optimized for dense web serving
- scalable, but not targeting high-performance
numerical computing
- Passive backplane clusters (IA-32, InfiniBand)
- reasonably dense packaging possible
- high scalability is not a design goal
- Server-based clusters (IA-64, x86-64 and Power4)
- good price/performance but poor packaging density
- designed for commercial, I/O intensive
configurations
- Sony PlayStation 2 (Emotion Engine, IBM Cell
Project)
- excellent pure price/performance: $50K/Teraflop
- imbalanced systems and complex microarchitectures
- NCSA is building a 100 node (0.5 TF peak) PS2
cluster
26Exoscale Computing Options
- Quantum
- superposition, Hilbert spaces, and the EPR
paradox
- Biological
- DNA encoding and PCR
- Silicon
- escaping the von Neumann bottleneck
- PIM
27U.S. Terrestrial Network Supply
DWDMInput
- Transoceanic bandwidth is similar
- greater than 75% of lit capacity is unused!
- one of the pluses of the dot.com crash
- How do we leverage lambda dominance?
- provisioning sites, circuit switching redux, ...
Source: TeleGeography 2002/Network Photonics
28The Revolution Is Here!
- PCs
- 100-150 million/year
- Embedded processors
- 4 billion in 1997
- 8 billion in 2000
- Wireless explosion
- telephones and 802.11
- Electronic tags and intelligent objects
- tags on everyday things (and individuals)
- books, instruments, papers, forms, clothes,
medicine
- creating the ubiquitous infosphere
- EC is discussing RF tags on every Euro
- security, anti-counterfeiting, and currency
tracking
Smart Labels
SC02 Smart Badge Experience
Point of Sale
29Redefining Software
- Systems
- MULTICS/UNIX/Linux
- deus ex machina model still dominates
- how do we redefine system management?
- creating adaptive, nimble, resilient, autonomic
behavior
- Programming
- FORTRAN/MPI
- little change since Backus et al.
- how do we redefine programming?
- eliminating the user/developer dichotomy: power
to the people
- Interaction
- Globus/OGSA/...
- how do we redefine interaction modalities?
- creating true plug and play
- Embrace dynamic equilibrium
- decentralized control and lossy behavior
- non-traditional metrics for efficacy
- new fault-tolerance approaches for >10K nodes
30GrADS: The Big Picture
GrADS participants: Andrew Chien, Fran Berman,
Jack Dongarra, Ian Foster, Dennis Gannon, Ken
Kennedy, Carl Kesselman, Lennart Johnsson, and
Dan Reed
31Signatures and Contracts
Knowledge Repository
- A contract specifies that, given
- a set of resources (compute, network, I/O, ...)
- with certain capabilities (FLOP rate, latency, ...)
- for given application parameters (matrix size, ...)
- the application will
- exhibit a specified, measurable, and desirable
performance
- sustain F FLOPS/second, render R frames/second, ...
- Performance contracts specify a convolution of
- application intrinsic behavior and system
resource responses (signatures)
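The contract idea above can be sketched in a few lines: a record of promised resources, application parameters, and performance commitments, plus a check of measured values against those commitments. This is a minimal illustration only, not the GrADS or Autopilot interface; `PerformanceContract`, its fields, the metric names, and the crisp 10 percent tolerance are all hypothetical (the slide indicates a fuzzy logic decision process rather than a hard threshold).

```python
from dataclasses import dataclass

@dataclass
class PerformanceContract:
    """Hypothetical contract record; field names are illustrative assumptions."""
    resources: dict      # promised capabilities, e.g. FLOP rate, latency
    app_params: dict     # application parameters, e.g. matrix size
    commitments: dict    # metric name -> minimum sustained value
    tolerance: float = 0.10  # assumed slack before a shortfall counts as a violation

    def violations(self, measured):
        """Return {metric: (promised, observed)} for every commitment not met."""
        failed = {}
        for metric, promised in self.commitments.items():
            observed = measured.get(metric, 0.0)
            if observed < promised * (1.0 - self.tolerance):
                failed[metric] = (promised, observed)
        return failed

contract = PerformanceContract(
    resources={"flop_rate": 1.0e9, "latency_s": 1.0e-5},
    app_params={"matrix_size": 4096},
    commitments={"flops": 5.0e8, "frames_per_s": 30.0},
)

# Made-up measurements: FLOPS within tolerance, frame rate well short.
measured = {"flops": 4.8e8, "frames_per_s": 20.0}
broken = contract.violations(measured)
print(broken)  # {'frames_per_s': (30.0, 20.0)}
```

A monitoring loop could call `violations` on each new batch of sensor readings and hand any non-empty result to whatever component decides on corrective action.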
[Figure: Autopilot fuzzy logic decision process (fuzzy logic rule base, fuzzifier, defuzzifier, inputs, outputs), connected through sensors and actuators to the system and the instrumented Grid application(s); inset plot of a signature trajectory, m(t) versus t]
32The Large System Problem
- Detailed measurements enable
- flexible post-mortem analysis
- spatio-temporal correlations
- But, they produce large data volumes
- exacerbated by trans-terascale systems
- 10K-100K processors
- Possible solutions
- statistical clustering (activity identification)
- projection pursuit (metric identification)
- population sampling (behavioral identification)
- Population sampling (Linux clusters)
- 8 percent error bound and 90 percent confidence
- 87 node minimum sample size
- for 1024 processors does not rise
- 94 percent of cases within 8 percent
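One standard way to arrive at a minimum sample size of this kind is the textbook formula for estimating a mean to within a relative error bound, with a finite population correction. The sketch below uses that formula; it is not necessarily the method behind the slide's figures. The coefficient of variation `cv` is an assumed input (the slide does not give one), and `min_sample_size` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def min_sample_size(population, cv, rel_error=0.08, confidence=0.90):
    """Nodes to sample so the estimated mean is within rel_error at the given confidence.

    cv is the assumed coefficient of variation (std dev / mean) of the metric.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided normal quantile
    n0 = (z * cv / rel_error) ** 2                    # infinite-population size
    n = n0 / (1.0 + (n0 - 1.0) / population)          # finite population correction
    return min(population, math.ceil(n))

# cv=0.5 is an illustrative assumption, not a value from the slides.
sizes = {N: min_sample_size(N, cv=0.5) for N in (1024, 10_000, 100_000)}
for N, n in sizes.items():
    print(f"{N:>7} processors -> sample {n} nodes")
```

Under this formula the required sample grows only slightly as the population grows and is capped by the infinite-population value `n0`, so a sample of roughly a hundred nodes can remain adequate for much larger systems.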
Statistical Clustering
Cluster Utilization Samples
Source: Celso Mendes
33Big Means What?
- Big projects are getting smaller!
- remember the effects of inflation
- We need to think bigger!
- what is a >$100M project?
5 Escalation
34What's the Moral?
- Set some priorities
- no priorities means no vision
- no vision means no intellectual commitment
- Choose some directions
- technology and applications
- identify a driving problem
- Think at appropriate scales
- financial and temporal
- you must be tall enough to attack the city
- Take some bigger risks
- technical and political
- most innovative projects fail
- at least by narrow technical measures
- and that's just fine!
35Expeditions: The Research Time Tunnel
- Context shapes research, e.g.,
- nanotechnology and materials science
- atomic level manipulations
- biology, from structure to function
- PCR (polymerase chain reaction)
- microarrays, large-scale sequencing, ...
- microprocessors, Ethernet, and UNIX
- workstations, distributed systems, clusters, ...
- Prototyping the future
- early exploration of possibilities
- barely feasible now becomes commonplace then
- Anyone can play with today's technology
- the action is in defining the future
- that means playing with tomorrow's technology
today
36Responding With Breakthrough Science
Smart Objects
Petabyte Archives
Ubiquitous Sensor/actuator Networks
National Petascale Systems
Collaboratories
Responsive Environments
Terabit Networks
Laboratory Terascale Systems
Contextual Awareness
Ubiquitous Infosphere
Building Up
Building Out
Science, Policy and Education
37The Instruments of Innovation
- Nothing tends so much to the advancement of
knowledge as the application of a new instrument.
The native intellectual powers of men in
different times are not so much the causes of the
different success of their labors, as the
peculiar nature of the means and artificial
resources in their possession.
- Sir Humphry Davy
38Questions?