Title: CS433 Spring 2001 Introduction
1. CS433 Spring 2001: Introduction
2. Course objectives and outline
- You will learn about:
  - Parallel programming models
    - Emphasis on 3: message passing, shared memory, and shared objects
    - Ongoing evaluation and comparison of models
  - Parallel application classes
  - Parallel architectures
    - Message passing support, routing, interconnection networks
    - Cache-coherent scalable shared memory, synchronization
    - Relaxed consistency models
    - Novel architectures: Tera, Blue Gene, processors-in-memory
  - Commonly needed parallel algorithms/operations
  - Performance analysis of parallel applications
  - Parallel application case studies
3. Project and homeworks
- Significant course project (in both effort and grade percentage)
  - groups of 5 students
- Homeworks/machine problems
  - weekly (sometimes biweekly)
- Parallel machines
  - NCSA Origin 2000, PC/Sun clusters
4. Resources
- Much of the course will be run via the web
  - Lecture slides and assignments will be available on the course web page: http://www-courses.cs.uiuc.edu/cs433
  - Most of the reading material (papers, manuals) will be on the web
  - Projects will coordinate and submit information on the web
  - Web pages for individual projects will be linked to the course web page
- Newsgroup: uiuc.class.cs433
- You are expected to read the newsgroup and web pages regularly
5. Advent of parallel computing
- "Parallel computing is necessary to increase speeds" - the cry of the 70s
  - Yet processors kept pace with Moore's law, doubling speeds every 18 months
- Now, finally, the time is ripe
  - uniprocessors are commodities (and processor speeds show signs of slowing down)
  - Highly economical to build parallel machines
6. Why parallel computing?
- It is the only way to increase speed beyond uniprocessors
  - Except, of course, waiting for uniprocessors to become faster!
- Several applications require orders of magnitude higher performance than is feasible on uniprocessors
- Cost effectiveness
  - the older argument: in 1985, a supercomputer cost 2000 times more than a desktop, yet performed only 400 times faster - i.e., 5x worse performance per dollar
  - So combine microcomputers to get speed at lower cost
- Incremental scalability
  - can get in-between performance points with 20, 50, 100, ... processors
- But:
  - You may get speedup lower than 400 on 2000 processors! (see the sketch below)
  - Microcomputers became faster, effectively killing supercomputers
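The speedup caveat above is usually explained with Amdahl's law, which the slide does not name explicitly. A minimal Python sketch (the serial fractions are assumptions for illustration, not figures from the slide) shows how even a tiny serial fraction caps 2000 processors near 400x:

    # Amdahl's law: speedup on p processors when a fraction s of the
    # work is inherently serial.
    def amdahl_speedup(p, s):
        return 1.0 / (s + (1.0 - s) / p)

    for s in (0.0, 0.002, 0.01, 0.1):
        print(f"serial fraction {s:6.1%}: speedup on 2000 procs = "
              f"{amdahl_speedup(2000, s):7.1f}")

    # serial fraction   0.0%: speedup on 2000 procs =  2000.0
    # serial fraction   0.2%: speedup on 2000 procs =   400.2
    # serial fraction   1.0%: speedup on 2000 procs =    95.3
    # serial fraction  10.0%: speedup on 2000 procs =    10.0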
7. Technology Trends
- The natural building block for multiprocessors is now also about the fastest!
8. General Technology Trends
- Microprocessor performance increases 50-100% per year
- Transistor count doubles every 3 years
- DRAM size quadruples every 3 years
- Huge investment per generation is carried by the huge commodity market
- The point is not that single-processor performance is plateauing, but that parallelism is a natural way to improve it.
[Chart: integer and floating-point performance of commodity microprocessors, 1987-1992 - Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC Alpha]
9. Technology: A Closer Look
- Basic advance is decreasing feature size (λ)
  - Circuits become either faster or lower in power
- Die size is growing too
  - Clock rate improves roughly proportional to improvement in λ
  - Number of transistors improves like λ² (or faster)
- Performance > 100x per decade: clock rate ~10x, the rest from transistor count (see the sketch below)
- How to use more transistors?
  - Parallelism in processing
    - multiple operations per cycle reduces CPI
  - Locality in data access
    - avoids latency and reduces CPI
    - also improves processor utilization
  - Both need resources, so there is a tradeoff
- Fundamental issue is resource distribution, as in uniprocessors
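A quick check of the ">100x per decade" claim, as a minimal Python sketch using the slides' own rates (clock ~10x per decade, transistor count doubling every 3 years):

    # Compound the two factors from the slides: clock rate and
    # transistor count (doubling every 3 years, per slide 8).
    clock_growth = 10.0                  # per decade, from the slide
    transistor_growth = 2 ** (10 / 3)    # ≈ 10.1x per decade

    # If the extra transistors are turned into performance (ILP, caches),
    # total growth is roughly the product of the two factors.
    print(f"transistors: {transistor_growth:.1f}x per decade")
    print(f"performance: ~{clock_growth * transistor_growth:.0f}x per decade")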
10. Clock Frequency Growth Rate
11. Transistor Count Growth Rate
- 100 million transistors on chip by the early 2000s A.D.
- Transistor count grows much faster than clock rate
  - ~40% per year: an order of magnitude more contribution in two decades
12. Similar Story for Storage
- Divergence between memory capacity and speed
  - Capacity increased by 1000x from 1980-95, speed only 2x (see the growth-rate sketch below)
  - Gigabit DRAM by c. 2000, but the gap with processor speed is greater
- Larger memories are slower, while processors get faster
  - Need to transfer more data in parallel
  - Need deeper cache hierarchies
  - How to organize caches?
- Parallelism increases the effective size of each level of the hierarchy, without increasing access time
- Parallelism and locality within memory systems too
  - New designs fetch many bits within the memory chip, followed by fast pipelined transfer across a narrower interface
  - Buffer caches the most recently accessed data
- Disks too: parallel disks plus caching
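The capacity-vs-speed divergence is easier to see as annual growth rates. A minimal Python sketch (arithmetic assumed; the 1000x/2x figures are from the slide):

    # Compound annual growth rate implied by total growth over a period.
    def annual_rate(total_growth, years):
        return total_growth ** (1.0 / years) - 1.0

    years = 15                                   # 1980-1995, from the slide
    print(f"capacity: {annual_rate(1000, years):.1%} per year")  # ~58.5%
    print(f"speed:    {annual_rate(2, years):.1%} per year")     # ~4.7%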
13. Architectural Trends
- Architecture translates technology's gifts into performance and capability
- Resolves the tradeoff between parallelism and locality
  - Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
  - Tradeoffs may change with scale and technology advances
- Understanding microprocessor architectural trends
  - Helps build intuition about design issues of parallel machines
  - Shows the fundamental role of parallelism even in "sequential" computers
- Four generations of architectural history: vacuum tube, transistor, IC, VLSI
  - Here we focus only on the VLSI generation
- Greatest delineation in VLSI has been in the type of parallelism exploited
14. Architectural Trends
- Greatest trend in the VLSI generation is the increase in parallelism
- Up to 1985: bit-level parallelism: 4-bit -> 8-bit -> 16-bit
  - slows after 32-bit
  - adoption of 64-bit now under way, 128-bit far off (not a performance issue)
  - great inflection point when a 32-bit micro and cache fit on a chip
- Mid 80s to mid 90s: instruction-level parallelism
  - pipelining and simple instruction sets, plus compiler advances (RISC)
  - on-chip caches and functional units => superscalar execution
  - greater sophistication: out-of-order execution, speculation, prediction
    - to deal with control transfer and latency problems
15. Economics
- Commodity microprocessors are not only fast but CHEAP
  - Development cost is tens of millions of dollars ($5-100 million typical)
  - BUT many more are sold than supercomputers
  - Crucial to take advantage of the investment and use the commodity building block
  - Exotic parallel architectures amount to no more than special-purpose machines
- Multiprocessors are being pushed by software vendors (e.g. databases) as well as hardware vendors
- Standardization by Intel makes small, bus-based SMPs a commodity
- Desktop: a few smaller processors versus one larger one?
  - Multiprocessor on a chip
16. What to Expect?
- Parallel machine classes
  - Cost and usage define a class! The architecture of a class may change.
  - Desktops, engineering workstations, database/web servers, supercomputers, ...
- Commodity (home/office) desktop
  - less than $10,000
  - possible to provide 10-50 processors for that price!
  - Driver applications
    - games, video/signal processing, ...
    - possibly peripheral AI: speech recognition, natural language understanding (?), smart spaces and agents
    - New applications?
17. Engineering Workstations
- Price: less than $100,000 (used to be)
  - the new acceptable price level may be $50,000
- 100 processors, large memory
- Driver applications
  - CAD (computer-aided design) of various sorts
  - VLSI
  - Structural and mechanical simulations
  - Etc. (many specialized applications)
18. Commercial Servers
- Price range variable ($10,000 to several hundred thousand dollars)
  - the defining characteristic is usage
- Database servers, decision support (MIS), web servers, e-commerce
  - High availability and fault tolerance are the main criteria
- Trends to watch out for:
  - Likely emergence of specialized architectures/systems
    - E.g. Oracle's "No Native OS" approach
  - Currently dominated by database servers and TPC benchmarks
    - TPC: transactions per second
  - But this may change to data mining and application servers, with a corresponding impact on architecture
19. Supercomputers
- Definition: an expensive system?!
  - Used to be defined by architecture (vector processors, ...)
  - More than a million US dollars?
  - Thousands of processors
- Driving applications
  - Grand challenges in science and engineering:
    - Global weather modeling and forecasting
    - Rational drug design / molecular simulations
    - Processing of genetic (genome) information
    - Rocket simulation
    - Airplane design (wings and fluid flow, ...)
  - Operations research?? Not recognized yet
  - Other non-traditional applications?
20. Consider Scientific Supercomputing
- Proving ground and driver for innovative architecture and techniques
  - Market smaller relative to commercial as MPs become mainstream
  - Dominated by vector machines starting in the 70s
- Microprocessors have made huge gains in floating-point performance:
  - high clock rates
  - pipelined floating-point units (e.g., multiply-add every cycle)
  - instruction-level parallelism
  - effective use of caches (e.g., automatic blocking)
- Plus economics
- Large-scale multiprocessors replace vector supercomputers
  - Well under way already
21. Scientific Computing Demand
22. Engineering Computing Demand
- Large parallel machines are a mainstay in many industries:
  - Petroleum (reservoir analysis)
  - Automotive (crash simulation, drag analysis, combustion efficiency)
  - Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  - Computer-aided design
  - Pharmaceuticals (molecular modeling)
  - Visualization
    - in all of the above
    - entertainment (films like Toy Story)
    - architecture (walk-throughs and rendering)
  - Financial modeling (yield and derivative analysis)
  - etc.
23. Applications: Speech and Image Processing
- Also CAD, databases, ...
- 100 processors gets you 10 years ahead; 1000 gets you 20! (see the sketch below)
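A minimal Python sketch of the "years of headstart" arithmetic (the annual growth rate is an assumption, not from the slide): if uniprocessor performance grows ~50% per year, p processors deliver today what a uniprocessor would otherwise reach only years later.

    import math

    # Years until a uniprocessor matches a p-fold speedup, assuming
    # compound annual performance growth (50%/year assumed here).
    def years_of_headstart(p, annual_growth=0.5):
        return math.log(p) / math.log(1.0 + annual_growth)

    for p in (100, 1000):
        print(f"{p:5d} processors ≈ {years_of_headstart(p):.0f} years")
    # 100 processors ≈ 11 years; 1000 ≈ 17 years - roughly the
    # slide's "10 and 20".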
24. Learning Curve for Parallel Applications
- AMBER molecular dynamics simulation program
- Starting point was vector code for the Cray-1
- 145 MFLOPS on the Cray C90; 406 MFLOPS for the final version on a 128-processor Paragon; 891 MFLOPS on a 128-processor Cray T3D (ratios computed below)
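For scale, the MFLOPS figures above as ratios over the vector baseline, in a minimal Python sketch (figures from the slide; the machine labels are assumptions):

    mflops = {
        "Cray C90 (vector baseline)": 145,
        "128-proc Intel Paragon (final version)": 406,
        "128-proc Cray T3D": 891,
    }
    base = mflops["Cray C90 (vector baseline)"]
    for machine, rate in mflops.items():
        print(f"{machine:40s} {rate:4d} MFLOPS ({rate / base:.1f}x)")
    # The tuned parallel versions end up ~2.8x and ~6.1x the C90 rate.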
25. Raw Uniprocessor Performance: LINPACK
26. 500 Fastest Computers