Title: CS433 Spring 2001 Introduction
1. CS433 Spring 2001: Introduction
2. Course objectives and outline
- You will learn about:
  - Parallel programming models
    - Emphasis on 3: message passing, shared memory, and shared objects
    - Ongoing evaluation and comparison of models
  - Parallel application classes
  - Parallel architectures
    - Message passing support, routing, interconnection networks
    - Cache-coherent scalable shared memory, synchronization
    - Relaxed consistency models
    - Novel architectures: Tera, Blue Gene, processors-in-memory
  - Commonly needed parallel algorithms/operations
  - Performance analysis of parallel applications
  - Parallel application case studies
3. Project and homeworks
- Significant course project (in both effort and grade percentage)
  - groups of 5 students
- Homeworks/machine problems
  - weekly (sometimes biweekly)
- Parallel machines
  - NCSA Origin 2000, PC/Sun clusters
4. Resources
- Much of the course will be run via the web
  - Lecture slides and assignments will be available on the course web page: http://www-courses.cs.uiuc.edu/cs433
  - Most of the reading material (papers, manuals) will be on the web
  - Projects will coordinate and submit information on the web
  - Web pages for individual projects will be linked to the course web page
- Newsgroup: uiuc.class.cs433
- You are expected to read the newsgroup and web pages regularly
5. Advent of parallel computing
- "Parallel computing is necessary to increase speeds" - the cry of the 70s
  - Yet processors kept pace with Moore's law, doubling speeds every 18 months
- Now, finally, the time is ripe
  - uniprocessors are commodities (and processor speeds show signs of slowing down)
  - Highly economical to build parallel machines
6. Why parallel computing?
- It is the only way to increase speed beyond uniprocessors
  - Except, of course, waiting for uniprocessors to become faster!
- Several applications require orders of magnitude higher performance than is feasible on uniprocessors
- Cost effectiveness
  - the older argument: in 1985, a supercomputer cost 2000 times more than a desktop, yet performed only 400 times faster - i.e., 5x worse performance per dollar
  - So combine microcomputers to get speed at lower cost
- Incremental scalability
  - can get in-between performance points with 20, 50, 100, ... processors
- But:
  - You may get speedup lower than 400 on 2000 processors! (see the sketch below)
  - Microcomputers became faster, effectively killing supercomputers
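The speedup caveat above is usually explained with Amdahl's law, which the slide does not name explicitly. A minimal Python sketch (the serial fractions are assumptions for illustration, not figures from the slide) shows how even a tiny serial fraction caps 2000 processors near 400x:

    # Amdahl's law: speedup on p processors when a fraction s of the
    # work is inherently serial.
    def amdahl_speedup(p, s):
        return 1.0 / (s + (1.0 - s) / p)

    for s in (0.0, 0.002, 0.01, 0.1):
        print(f"serial fraction {s:6.1%}: speedup on 2000 procs = "
              f"{amdahl_speedup(2000, s):7.1f}")

    # serial fraction   0.0%: speedup on 2000 procs =  2000.0
    # serial fraction   0.2%: speedup on 2000 procs =   400.2
    # serial fraction   1.0%: speedup on 2000 procs =    95.3
    # serial fraction  10.0%: speedup on 2000 procs =    10.0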
7. Technology Trends
- The natural building block for multiprocessors is now also about the fastest!
8. General Technology Trends
- Microprocessor performance increases 50-100% per year
- Transistor count doubles every 3 years
- DRAM size quadruples every 3 years
- Huge investment per generation is carried by the huge commodity market
- The point is not that single-processor performance is plateauing, but that parallelism is a natural way to improve it.
[Chart: integer and floating-point performance of commodity microprocessors, 1987-1992 - Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC Alpha]
9. Technology: A Closer Look
- Basic advance is decreasing feature size (λ)
  - Circuits become either faster or lower in power
- Die size is growing too
  - Clock rate improves roughly proportional to improvement in λ
  - Number of transistors improves like λ² (or faster)
- Performance > 100x per decade: clock rate ~10x, the rest from transistor count (see the sketch below)
- How to use more transistors?
  - Parallelism in processing
    - multiple operations per cycle reduces CPI
  - Locality in data access
    - avoids latency and reduces CPI
    - also improves processor utilization
  - Both need resources, so there is a tradeoff
- Fundamental issue is resource distribution, as in uniprocessors
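A quick check of the ">100x per decade" claim, as a minimal Python sketch using the slides' own rates (clock ~10x per decade, transistor count doubling every 3 years):

    # Compound the two factors from the slides: clock rate and
    # transistor count (doubling every 3 years, per slide 8).
    clock_growth = 10.0                  # per decade, from the slide
    transistor_growth = 2 ** (10 / 3)    # ≈ 10.1x per decade

    # If the extra transistors are turned into performance (ILP, caches),
    # total growth is roughly the product of the two factors.
    print(f"transistors: {transistor_growth:.1f}x per decade")
    print(f"performance: ~{clock_growth * transistor_growth:.0f}x per decade")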
10. Clock Frequency Growth Rate
11. Transistor Count Growth Rate
- 100 million transistors on chip by the early 2000s A.D.
- Transistor count grows much faster than clock rate
  - ~40% per year: an order of magnitude more contribution in two decades
12. Similar Story for Storage
- Divergence between memory capacity and speed
  - Capacity increased by 1000x from 1980-95, speed only 2x (see the growth-rate sketch below)
  - Gigabit DRAM by c. 2000, but the gap with processor speed is greater
- Larger memories are slower, while processors get faster
  - Need to transfer more data in parallel
  - Need deeper cache hierarchies
  - How to organize caches?
- Parallelism increases the effective size of each level of the hierarchy, without increasing access time
- Parallelism and locality within memory systems too
  - New designs fetch many bits within the memory chip, followed by fast pipelined transfer across a narrower interface
  - Buffer caches the most recently accessed data
- Disks too: parallel disks plus caching
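The capacity-vs-speed divergence is easier to see as annual growth rates. A minimal Python sketch (arithmetic assumed; the 1000x/2x figures are from the slide):

    # Compound annual growth rate implied by total growth over a period.
    def annual_rate(total_growth, years):
        return total_growth ** (1.0 / years) - 1.0

    years = 15                                   # 1980-1995, from the slide
    print(f"capacity: {annual_rate(1000, years):.1%} per year")  # ~58.5%
    print(f"speed:    {annual_rate(2, years):.1%} per year")     # ~4.7%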
13. Architectural Trends
- Architecture translates technology's gifts into performance and capability
- Resolves the tradeoff between parallelism and locality
  - Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
  - Tradeoffs may change with scale and technology advances
- Understanding microprocessor architectural trends
  - Helps build intuition about design issues of parallel machines
  - Shows the fundamental role of parallelism even in "sequential" computers
- Four generations of architectural history: vacuum tube, transistor, IC, VLSI
  - Here we focus only on the VLSI generation
- Greatest delineation in VLSI has been in the type of parallelism exploited
14. Architectural Trends
- Greatest trend in the VLSI generation is the increase in parallelism
- Up to 1985: bit-level parallelism: 4-bit -> 8-bit -> 16-bit
  - slows after 32-bit
  - adoption of 64-bit now under way, 128-bit far off (not a performance issue)
  - great inflection point when a 32-bit micro and cache fit on a chip
- Mid 80s to mid 90s: instruction-level parallelism
  - pipelining and simple instruction sets, plus compiler advances (RISC)
  - on-chip caches and functional units => superscalar execution
  - greater sophistication: out-of-order execution, speculation, prediction
    - to deal with control transfer and latency problems
15. Economics
- Commodity microprocessors are not only fast but CHEAP
  - Development cost is tens of millions of dollars ($5-100 million typical)
  - BUT many more are sold than supercomputers
  - Crucial to take advantage of the investment and use the commodity building block
  - Exotic parallel architectures amount to no more than special-purpose machines
- Multiprocessors are being pushed by software vendors (e.g. databases) as well as hardware vendors
- Standardization by Intel makes small, bus-based SMPs a commodity
- Desktop: a few smaller processors versus one larger one?
  - Multiprocessor on a chip
16. What to Expect?
- Parallel machine classes
  - Cost and usage define a class! The architecture of a class may change.
  - Desktops, engineering workstations, database/web servers, supercomputers, ...
- Commodity (home/office) desktop
  - less than $10,000
  - possible to provide 10-50 processors for that price!
  - Driver applications
    - games, video/signal processing, ...
    - possibly peripheral AI: speech recognition, natural language understanding (?), smart spaces and agents
    - New applications?
17. Engineering Workstations
- Price: less than $100,000 (used to be)
  - the new acceptable price level may be $50,000
- 100 processors, large memory
- Driver applications
  - CAD (computer-aided design) of various sorts
  - VLSI
  - Structural and mechanical simulations
  - Etc. (many specialized applications)
18. Commercial Servers
- Price range variable ($10,000 to several hundred thousand dollars)
  - the defining characteristic is usage
- Database servers, decision support (MIS), web servers, e-commerce
  - High availability and fault tolerance are the main criteria
- Trends to watch out for:
  - Likely emergence of specialized architectures/systems
    - E.g. Oracle's "No Native OS" approach
  - Currently dominated by database servers and TPC benchmarks
    - TPC: transactions per second
  - But this may change to data mining and application servers, with a corresponding impact on architecture
19. Supercomputers
- Definition: an expensive system?!
  - Used to be defined by architecture (vector processors, ...)
  - More than a million US dollars?
  - Thousands of processors
- Driving applications
  - Grand challenges in science and engineering:
    - Global weather modeling and forecasting
    - Rational drug design / molecular simulations
    - Processing of genetic (genome) information
    - Rocket simulation
    - Airplane design (wings and fluid flow, ...)
  - Operations research?? Not recognized yet
  - Other non-traditional applications?
20. Consider Scientific Supercomputing
- Proving ground and driver for innovative architecture and techniques
  - Market smaller relative to commercial as MPs become mainstream
  - Dominated by vector machines starting in the 70s
- Microprocessors have made huge gains in floating-point performance:
  - high clock rates
  - pipelined floating-point units (e.g., multiply-add every cycle)
  - instruction-level parallelism
  - effective use of caches (e.g., automatic blocking)
- Plus economics
- Large-scale multiprocessors replace vector supercomputers
  - Well under way already
21. Scientific Computing Demand
22. Engineering Computing Demand
- Large parallel machines are a mainstay in many industries:
  - Petroleum (reservoir analysis)
  - Automotive (crash simulation, drag analysis, combustion efficiency)
  - Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  - Computer-aided design
  - Pharmaceuticals (molecular modeling)
  - Visualization
    - in all of the above
    - entertainment (films like Toy Story)
    - architecture (walk-throughs and rendering)
  - Financial modeling (yield and derivative analysis)
  - etc.
23. Applications: Speech and Image Processing
- Also CAD, databases, ...
- 100 processors gets you 10 years ahead; 1000 gets you 20! (see the sketch below)
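A minimal Python sketch of the "years of headstart" arithmetic (the annual growth rate is an assumption, not from the slide): if uniprocessor performance grows ~50% per year, p processors deliver today what a uniprocessor would otherwise reach only years later.

    import math

    # Years until a uniprocessor matches a p-fold speedup, assuming
    # compound annual performance growth (50%/year assumed here).
    def years_of_headstart(p, annual_growth=0.5):
        return math.log(p) / math.log(1.0 + annual_growth)

    for p in (100, 1000):
        print(f"{p:5d} processors ≈ {years_of_headstart(p):.0f} years")
    # 100 processors ≈ 11 years; 1000 ≈ 17 years - roughly the
    # slide's "10 and 20".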
24. Learning Curve for Parallel Applications
- AMBER molecular dynamics simulation program
- Starting point was vector code for the Cray-1
- 145 MFLOPS on the Cray C90; 406 MFLOPS for the final version on a 128-processor Paragon; 891 MFLOPS on a 128-processor Cray T3D (ratios computed below)
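For scale, the MFLOPS figures above as ratios over the vector baseline, in a minimal Python sketch (figures from the slide; the machine labels are assumptions):

    mflops = {
        "Cray C90 (vector baseline)": 145,
        "128-proc Intel Paragon (final version)": 406,
        "128-proc Cray T3D": 891,
    }
    base = mflops["Cray C90 (vector baseline)"]
    for machine, rate in mflops.items():
        print(f"{machine:40s} {rate:4d} MFLOPS ({rate / base:.1f}x)")
    # The tuned parallel versions end up ~2.8x and ~6.1x the C90 rate.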
25. Raw Uniprocessor Performance: LINPACK
26. 500 Fastest Computers