Title: Parallel computer architecture overview
Parallel computers: a collection of processing elements that cooperate to solve large problems fast.
- Some broad issues that distinguish parallel computers:
  - Resource allocation
    - How large a collection?
    - How powerful are the elements?
    - How much memory?
  - Data access, communication, and synchronization
    - How do the elements cooperate and communicate?
    - How are data transmitted between processors?
    - What are the abstractions and primitives for cooperation?
  - Performance and scalability
    - How does it all translate into performance?
    - How does it scale?
Studying the fundamental principles and design trade-offs
- History: diverse and innovative organizational structures, often tied to novel programming models
- Rapidly matured under strong technological constraints
- The microprocessor is ubiquitous
  - Laptops and supercomputers are fundamentally similar!
- Technological trends cause diverse approaches to converge
- Technological trends make parallel computing inevitable
  - Parallelism is now in the mainstream
- Need to understand fundamental principles and design trade-offs, not just taxonomies
Technology trend
- Figure from Patterson's parallel architectures book (1999).
- The performance of microprocessors is catching up with that of supercomputers.
- In terms of performance improvement, nothing beats microprocessors.
- To maintain that improvement, more and more supercomputer features are built into microprocessors.
- Use commodity microprocessors to build everything (if you can't beat them, join them).
- Mainframes and minicomputers have pretty much disappeared in today's world, replaced by server farms (clusters of servers).
  - Virtualization on clusters.
- Many supercomputers are clusters of servers/workstations (see www.top500.org).
Microprocessor architecture trend in parallelism
- Up to 1985: bit-level parallelism: 4-bit → 8-bit → 16-bit → 32-bit
  - Slows after 32 bits
  - Adoption of 64-bit is well under way; 128-bit is far off (not a performance issue)
  - Great inflection point when a 32-bit micro and its cache fit on one chip
- Basic pipelining, hardware support for complex operations like FP multiply, etc.
- Intel 4004 to 386
Microprocessor architecture trend in parallelism
- Mid '80s to mid '90s: instruction-level parallelism
  - Pipelining and simple instruction sets, plus compiler advances (RISC)
  - Larger on-chip caches
    - But quadrupling the cache size only halves the miss rate
  - More functional units → superscalar execution
    - But limited performance scaling
- Intel 486 to Pentium III/IV
Microprocessor architecture trend in parallelism
- After the mid-'90s
  - Greater sophistication: out-of-order execution, speculation, prediction
    - To deal with control-transfer and latency problems
  - Very wide-issue processors
    - Don't help many applications very much
    - Need multiple threads (SMT) to exploit them
  - Increased complexity and size lead to slowdown
    - Long global wires
    - Increased access times to data
    - Longer time to market
Potential of ILP
- Depending on the application, even with infinite resources (memory bandwidth, perfect branch prediction, register renaming, etc.), the speedup is limited to roughly 1.3 to 17 (results of different studies).
- Next step (happening now): thread-level parallelism in microprocessors.
  - Multithreading, multicore
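One way to see why ILP speedups saturate is Amdahl's-law-style arithmetic (a simplification for illustration, not the methodology of the cited studies): the portion of the instruction stream that cannot issue in parallel caps the benefit of ever-wider hardware.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Speedup when a fraction of the work can use n parallel units."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_units)

# Even with unlimited units, the serial portion caps the speedup:
# if only 90% of the work can proceed in parallel, the speedup can
# never exceed 1 / 0.1 = 10x, no matter how wide the machine is.
print(amdahl_speedup(0.9, 4))          # modest win on a 4-wide machine
print(amdahl_speedup(0.9, 1_000_000))  # approaches the 10x ceiling
```

This is why the studies report such a wide range (1.3 to 17): the limit depends almost entirely on how much exploitable parallelism the application itself exposes.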
Parallel architectures
- Thread-level parallelism has traditionally been supported by parallel architectures:
  - Shared memory
  - Distributed memory
  - Hybrid
Shared memory architectures
- All processors access all memory as a global address space.
- Changes made by one processor are visible to other processors.
- Two types, based on differences in memory access speed:
  - Uniform memory access (UMA)
  - Non-uniform memory access (NUMA)
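As a loose software analogy, threads within one process share a single address space, so one thread's writes are visible to all the others; a minimal Python sketch:

```python
import threading

# One address space: every thread reads and writes the same objects,
# just as every processor in a shared memory machine sees all memory.
shared = {"total": 0}
lock = threading.Lock()  # synchronization primitive for safe updates

def worker(amount):
    # An update made here is visible to all other threads, mirroring
    # how one processor's writes become visible to the rest.
    with lock:
        shared["total"] += amount

threads = [threading.Thread(target=worker, args=(10,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["total"])  # 80: every update landed in the one shared memory
```

The lock is doing real work here: visibility alone is not enough, since concurrent updates to the same location must still be coordinated.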
UMA shared memory architecture (mostly bus-based MPs)
- A micro on a chip makes it natural to connect many to shared memory.
- Dominates the server and enterprise market; moving down to the desktop.
- Faster processors began to saturate the bus, then bus technology advanced.
- Today there is a range of sizes for bus-based systems, from desktops to large servers (Symmetric Multiprocessor (SMP) machines).
Bus bandwidth in Intel systems
NUMA shared memory architecture
- Identical processors, but a processor's access time differs for different parts of the memory.
- Often made by physically linking SMP machines (SGI Origin 2000, up to 512 processors).
- The next-generation SMP interconnects (Intel Common System Interface (CSI) and AMD HyperTransport) have this flavor, but the processors are close to each other.
Shared memory architecture: advantages and disadvantages
- Advantages
  - Globally shared memory provides a user-friendly programming perspective.
- Disadvantages
  - Lack of scalability (adding processors changes the traffic requirements of the interconnect).
  - Not easy to build big ones.
  - Writing correct shared memory parallel programs is not straightforward.
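To illustrate the last point, here is a sketch of the classic lost-update hazard and its fix, with Python threads standing in for processors sharing memory:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n, use_lock):
    global counter
    for _ in range(n):
        if use_lock:
            with lock:       # critical section: the read-modify-write is atomic
                counter += 1
        else:
            counter += 1     # unprotected: another thread can interleave between
                             # the read and the write, silently losing an update

def run(use_lock, n_threads=4, n=100_000):
    global counter
    counter = 0
    ts = [threading.Thread(target=increment, args=(n, use_lock))
          for _ in range(n_threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

print(run(use_lock=True))   # always 400000
print(run(use_lock=False))  # may come up short on some runs: a data race
```

The unlocked version can appear to work for many runs and then fail under load, which is exactly why correct shared memory programming is harder than it looks.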
Distributed memory architectures
- Processors have their own local memory. Memory addresses in one processor do not map to another processor:
  - No concept of a global address space.
  - No concept of cache coherency.
- To access data in another processor, use explicit communication.
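A minimal sketch of explicit communication, using Python's multiprocessing module as a stand-in for MPI-style message passing (the Pipe endpoints play the role of send/receive channels; the analogy to MPI is loose):

```python
import multiprocessing as mp

def worker(conn):
    # This process has its own address space: it cannot see the parent's
    # variables, so the only way to get data is an explicit receive.
    data = conn.recv()                  # explicit receive, like MPI_Recv
    conn.send([x * x for x in data])    # explicit send of results, like MPI_Send
    conn.close()

def main():
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3])         # ship the data into the other memory
    result = parent_conn.recv()         # get the answer back the same way
    p.join()
    return result

if __name__ == "__main__":
    print(main())  # [1, 4, 9]
```

Note that nothing is shared implicitly: every byte the worker operates on arrived through a message, which is the defining property of the distributed memory model.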
Distributed memory architectures
- The networks can be very different across distributed memory architectures.
- Massively parallel processors (MPPs) usually use a specially designed network.
  - IBM Blue Gene, IBM SP series
- Clusters of workstations usually use system/local area networks.
  - Lemieux at PSC uses Quadrics.
  - Lonestar at TACC uses InfiniBand.
  - UC-TG at Argonne uses Myrinet.
  - Sax at CSIT and my Cetus use Gigabit Ethernet.
- Grid computers use the Internet as the network.
Advantages and disadvantages
- Advantages
  - Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionately.
  - Each processor can rapidly access its own memory without interference and without the overhead incurred in maintaining cache coherency.
  - Cost effectiveness: can use commodity, off-the-shelf processors and networking.
- Disadvantages
  - The programmer is responsible for the details associated with data communication.
  - It may be difficult to map existing data structures, based on global memory, to this memory organization.
Hybrid distributed memory systems
- Current trends indicate that this type of architecture will prevail and grow at the high end of computing for the foreseeable future.
- The advantages and disadvantages of both shared and distributed memory apply.