Title: CS160
1 CS160 Spring 2000
http://www-cse.ucsd.edu/classes/sp00/cse160
- Prof. Fran Berman - CSE
- Dr. Philip Papadopoulos - SDSC
2 Two Instructors/One Class
- We are team-teaching the class
- Lectures will be split about 50-50 along topic lines. (We'll keep you guessing as to who will show up next lecture.)
- TA is Derrick Kondo. He is responsible for grading homework and programs
- Exams will be graded by Papadopoulos/Berman
3 Prerequisites
- Know how to program in C
- CSE 100 (Data Structures)
- CSE 141 (Computer Architecture) would be helpful
but not required.
4 Grading
- 25% Homework
- 25% Programming assignments
- 25% Midterm
- 25% Final
- Homework and programming assignments are due at the beginning of section
5 Policies
- Exams are closed book, closed notes
- No Late Homework
- No Late Programs
- No Makeup exams
- All assignments are to be your own original work.
- Cheating/copying from anyone/anyplace will be
dealt with severely
6 Office Hours (Papadopoulos)
- My office is SSB 251 (next to SDSC)
- Hours will be TuTh 2:30-3:30 or by appointment
- My email is phil@sdsc.edu
- My campus phone is 822-3628
7 Course Materials
- Book: Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, by B. Wilkinson and Michael Allen
- Web site: Will try to make lecture notes available before class
- Handouts: As needed
8 Computers/Programming
- Please see the TA about getting an account for the undergrad APE lab
- We will use PVM for programming on workstation clusters
- A word of advice: with the web, you can probably find almost-completed source code somewhere. Don't do this. Write the code yourself. You'll learn more. See policy on copying.
9 Any Other Administrative Questions?
10 Introduction to Parallel Computing
- Topics to be covered (see syllabus online for full details):
- Machine architecture and history
- Parallel machine organization
- Parallel algorithm paradigms
- Parallel programming environments and tools
- Heterogeneous computing
- Evaluating performance
- Grid computing
- Parallel programming and project assignments
11 What IS Parallel Computing?
- Applying multiple processors to solve a single problem
- Why?
- Increased performance for rapid turnaround time (wall clock time)
- More available memory on multiple machines
- Natural progression of the standard von Neumann architecture
12 World's 10th Fastest Machine (as of November 1999) @ SDSC
1152 Processors
13 Are There Really Problems that Need O(1000) Processors?
- Grand Challenge Codes
- First Principles Materials Science
- Climate modeling (ocean, atmosphere)
- Soil Contamination Remediation
- Protein Folding (gene sequencing)
- Hydrocodes
- Simulated nuclear device detonation
- Code breaking (No Such Agency)
14 There must be problems with the approach
- Scaling with efficiency (speedup)
- Unparallelizable portions of code (Amdahl's law)
- Reliability
- Programmability
- Algorithms
- Monitoring
- Debugging
- I/O
- These and more keep the field interesting
15 A Brief History of Parallel Supercomputers
- There have been many (dead) supercomputers
- The Dead Supercomputer Society
- http://ei.cs.vt.edu/history/Parallel.html
- Parallel Computing Works
- Will touch on about a dozen of the important ones
16 Basic Measurement Yardsticks
- Peak performance (AKA "guaranteed never to exceed"): nprocs x FLOPS/proc
- NAS Parallel Benchmarks
- Linpack benchmark for the TOP 500
- Later in the course, we will explore how to "fool the masses" and valid ways to measure performance
17 Illiac IV (1966-1970)
- $100 million in 1990 dollars
- Single Instruction, Multiple Data (SIMD)
- 32-64 processing elements
- 15 MFLOPS
- Ahead of its time
18 ICL DAP (1979)
- Distributed Array Processor (also SIMD)
- 1K-4K bit-serial processors
- Connected in a mesh
- Required an ICL mainframe to front-end the main processor array
- Never caught on in the US
19 Goodyear MPP (late 1970s)
- 16K bit-serial processors (SIMD)
- NASA Goddard Space Flight Center
- Only a few sold. Similar to the ICL DAP
- About 100 MFLOPS (roughly a 100 MHz Pentium)
20 Cray-1 (1976)
- Seymour Cray, Designer
- NOT a parallel machine
- Single-processor machine with vector registers
- Largely regarded as starting the modern supercomputer revolution
- 80 MHz processor (80 MFLOPS)
21 Denelcor HEP (Heterogeneous Element Processor, early 1980s)
- Burton Smith, Designer
- Multiple Instruction, Multiple Data (MIMD)
- Fine-grain (instruction-level) and large-grain parallelism (16 processors)
- Instructions from different programs ran in per-processor hardware queues (128 threads/proc)
- Precursor to the Tera MTA (Multithreaded Architecture)
- Full/empty bit for every memory location allowed fast synchronization
- Important research machine
22 Caltech Cosmic Cube (1983)
- Chuck Seitz (founded Myricom) and Geoffrey Fox (lattice gauge theory)
- First hypercube interconnection network
- 8086/8087-based machine with Eugene Brooks' Crystalline Operating System (CrOS)
- 64 processors by 1983
- About 15x cheaper than a VAX 11/780
- Begat nCUBE, Floating Point Systems, Ametek, Intel Supercomputers (all dead companies)
- 1987 vector coprocessor system achieved 500 MFLOPS
23 Cray X-MP (1983) and Cray-2 (1985)
- Up to 4-way shared-memory machines
- This was the first supercomputer at SDSC
- Best performance (600 MFLOPS peak)
- Best price/performance of the time
24 Late 1980s
- Proliferation of (now dead) parallel computers
- CM-2 (SIMD) (Danny Hillis)
- 64K bit-serial processors, 2048 vector coprocessors
- Achieved 5.2 GFLOPS on Linpack (LU factorization)
- Intel iPSC/860 (MIMD - MPP)
- 128 processors
- 1.92 GFLOPS (Linpack)
- Cray Y-MP (vector super)
- 8 processors (333 MFLOPS/proc peak)
- Achieved 2.1 GFLOPS (Linpack)
- BBN Butterfly (shared memory)
- Many others (long since forgotten)
25 Early 1990s
- Intel Touchstone Delta and Paragon (MPP)
- Follow-on to the iPSC/860
- 13.2 GFLOPS on 512 processors
- 1024 nodes delivered to ORNL in 1993 (150 GFLOPS peak)
- Cray C-90 (vector super)
- 16-processor update of the Y-MP
- Extremely popular, efficient, and expensive
- Thinking Machines CM-5 (MPP)
- Up to 16K processors
- 1024-node system at Los Alamos National Lab
26 More 1990s
- Distributed Shared Memory
- KSR-1 (Kendall Square Research)
- COMA (Cache-Only Memory Architecture)
- University projects
- Stanford DASH processor (Hennessy)
- MIT Alewife (Agarwal)
- Cray T3D/T3E: fast processor mesh with up to 512 Alpha CPUs
27 What Can You Buy Today? (not an exhaustive list)
- IBM SP
- Large MPP or cluster
- SGI Origin 2000
- Large distributed shared-memory machine
- Sun HPC 10000: 64-processor true shared memory
- Compaq Alpha cluster
- Tera MTA
- Multithreaded architecture (one in existence)
- Cray SV-1 vector processor
- Fujitsu and Hitachi vector supers
28 Clusters
- Poor man's supercomputer?
- A pile of PCs
- Ethernet or high-speed (e.g., Myrinet) network
- Likely to be the dominant high-end architecture
- Essentially a build-it-yourself MPP
29 Next Time
- Flynn's Taxonomy
- Bit-Serial, Vector, Pipelined Processors
- Interconnection Networks
- Routing Techniques
- Embedding
- Cluster interconnects
- Network Bisection