Commodity Processor with Commodity Interprocessor Connection
1
Commercial Parallel Computer Architecture
(a spectrum from loosely coupled to tightly coupled)
  • Commodity processor with commodity inter-processor connection
    • Clusters
    • Pentium, Itanium, Opteron, Alpha
    • GigE, Infiniband, Myrinet, Quadrics, SCI
    • NEC TX7
    • HP Alpha
  • Commodity processor with custom interconnect
    • SGI Altix (Intel Itanium 2)
    • Cray Red Storm (AMD Opteron)
  • Custom processor with custom interconnect
    • Cray X1
    • NEC SX-7
    • IBM Regatta
    • IBM Blue Gene/L

2
Supercomputer Examples
  • SGI Altix
    • The Columbia supercomputer at NASA's Advanced Supercomputing
      Facility at Ames Research Center.
    • A 10,240-processor SGI Altix system comprising 20 nodes, each with
      512 Intel Itanium 2 processors, running the Linux operating system.
    • Used for black-hole simulations.
  • Hitachi SR11000
  • NEC SX-7
  • Apple
  • Cray Red Storm
  • Cray BlackWidow
  • IBM Blue Gene/L

3
  • IBM Regatta p690
    • 41 SMP nodes with 32 processors each (1,312 processors total)
    • Processor type: POWER4, 1.7 GHz
    • Overall peak performance: 8.9 Teraflops
    • Linpack: 5.6 Teraflops
    • Main memory: 41 x 128 GB (5.2 TB aggregate)
    • Operating system: AIX 5.2
  • Fujitsu PRIMEPOWER
    • 16 SPARC64 processors, 1.35 GHz / 1.89 GHz
    • 128 GB memory
    • 16 disks
    • 2 x 8-way system boards
    • Solaris 8, 9, 10

4
Processors used in supercomputers, and their performance
Linpack is a standard benchmark that measures how fast a computer solves
a dense system of linear equations.
(Gflop/s = one billion floating-point operations per second.)
  • Intel Pentium Xeon, 3.2 GHz: peak 6.4 Gflop/s; Linpack 100: 1.7 Gflop/s; Linpack 1000: 3.1 Gflop/s
  • AMD Opteron, 2.2 GHz: peak 4.4 Gflop/s; Linpack 100: 1.3 Gflop/s; Linpack 1000: 3.1 Gflop/s
  • Intel Itanium 2, 1.5 GHz: peak 6 Gflop/s; Linpack 100: 1.7 Gflop/s; Linpack 1000: 5.4 Gflop/s
  • HP PA-RISC
  • Sun UltraSPARC IV
  • HP Alpha EV68, 1.25 GHz: 2.5 Gflop/s
  • MIPS R16000
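Linpack performance like the "Linpack 1000" figures above can be estimated
by timing a dense solve. A minimal Python sketch, assuming the standard
2/3*n^3 + 2*n^2 flop count for an n x n system (NumPy's solver stands in
for the benchmark's LU factorization; results reflect whatever machine it
runs on, not the table above):

```python
import time
import numpy as np

def linpack_gflops(n=1000, seed=0):
    """Estimate Gflop/s by timing a dense n x n solve, Linpack-style."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, b)                     # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # standard Linpack flop count
    return flops / elapsed / 1e9

print(f"Linpack 1000 estimate: {linpack_gflops(1000):.2f} Gflop/s")
```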
5
Inter-processor connection technologies
  • Gigabit Ethernet
  • Myrinet
  • Infiniband
  • QsNet
  • SCI
More detail on each follows.
6
Tree, Fat Tree
  • Tree network: there is only one path between any pair of processors.
  • Fat-tree network: increases the number of communication links close to
    the root, so the root level has more physical connections (see the
    sketch below).
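A minimal sketch of the fat-tree idea, assuming an idealized binary fat
tree whose link capacity doubles at each level toward the root: aggregate
bandwidth then stays constant per level, whereas a plain tree thins out at
the root.

```python
def per_level_bandwidth(levels=4, leaf_links=8, leaf_capacity=1.0):
    """Compare aggregate bandwidth per level: plain tree vs. fat tree."""
    for level in range(levels):
        links = leaf_links >> level               # links halve per level up
        fat_cap = leaf_capacity * (1 << level)    # fat tree: capacity doubles
        plain_cap = leaf_capacity                 # plain tree: fixed capacity
        print(f"level {level}: {links} links | "
              f"fat-tree total {links * fat_cap:g} | "
              f"plain-tree total {links * plain_cap:g}")

per_level_bandwidth()   # plain-tree total shrinks toward the root
```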
7
Torus topology
  • Also known as a wrapped-around mesh: a mesh with wraparound links in
    each dimension, e.g. a three-dimensional mesh (see the sketch below).
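A minimal sketch of torus addressing, with nodes labeled by integer
coordinates (an illustrative convention, not from the slides): the
wraparound links turn every dimension into a ring, which halves the
worst-case distance compared with an open mesh.

```python
def torus_neighbors(node, dims):
    """Neighbors of `node` in a torus: +/-1 in each dimension, wrapping around."""
    nbrs = []
    for d, size in enumerate(dims):
        for step in (-1, +1):
            nbr = list(node)
            nbr[d] = (nbr[d] + step) % size   # the wraparound link
            nbrs.append(tuple(nbr))
    return nbrs

def torus_hops(a, b, dims):
    """Minimum hop count, taking the shorter way around each ring."""
    return sum(min((x - y) % s, (y - x) % s) for x, y, s in zip(a, b, dims))

dims = (4, 4, 4)                                  # a small 3-D torus
print(torus_neighbors((0, 0, 0), dims))           # six neighbors in 3-D
print(torus_hops((0, 0, 0), (2, 3, 1), dims))     # 2 + 1 + 1 = 4 hops
```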
8
Clos network
  • A kind of multistage switching network.
  • Three stages, each consisting of a number of crossbar switches.
  • The middle stage has redundant switching boxes to reduce the blocking
    probability (quantified in the sketch below).
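A short sketch of the textbook three-stage Clos(m, n, r) analysis: r
ingress crossbars of size n x m, m middle crossbars of size r x r, and r
egress crossbars of size m x n (the parameter names are standard
conventions, not from the slides). Raising m adds redundant middle-stage
paths, which is exactly what lowers the blocking probability.

```python
def clos_properties(n, m, r):
    """Classify a 3-stage Clos(m, n, r) network by its blocking behavior."""
    return {
        "hosts": n * r,                            # total input ports
        "strictly_nonblocking": m >= 2 * n - 1,    # Clos's 1953 condition
        "rearrangeably_nonblocking": m >= n,       # may require rerouting
    }

print(clos_properties(n=8, m=15, r=16))   # m = 2n-1: strictly nonblocking
print(clos_properties(n=8, m=8,  r=16))   # m = n: rearrangeable only
```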

9
Myrinet
  • By Myricom
  • First Myrinet shipped in 1994
  • An alternative to Ethernet for connecting the nodes in a cluster
  • Operates entirely in user space, with no operating-system delays

10G PCI Express NIC with fiber connectors
Myrinet switch: 10 Gbps, 12,800; Clos networks with up to 128 host ports
10
QsNetII network
  • By Quadrics (formed in 1996)
  • Uses a fat-tree topology
  • QsNetII scales up to 4,096 nodes
  • Each node may have multiple CPUs
  • Designed for use within SMP systems
  • MPI latency on a standard AMD Opteron starts at 1.22 µs
  • Bandwidth on Intel Xeon EM64T is 912 Mbytes/s
    (these two figures feed the message-time sketch below)

QsNetII E-Series 128-way switch
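Those two figures suggest the usual first-order message-time model
T = latency + size / bandwidth. A minimal sketch plugging in the QsNetII
numbers above (an illustration, not a vendor performance model):

```python
def transfer_time_us(size_bytes, latency_us=1.22, bandwidth_mbps=912.0):
    """Fixed latency plus serialization time, in microseconds.

    Defaults are the QsNetII figures quoted above: 1.22 us MPI latency
    and 912 Mbytes/s bandwidth.
    """
    return latency_us + size_bytes / (bandwidth_mbps * 1e6) * 1e6

for size in (8, 1024, 1048576):           # 8 B, 1 KiB, 1 MiB
    print(f"{size:>8} B: {transfer_time_us(size):10.2f} us")
```

Small messages are latency-bound; beyond roughly latency x bandwidth
(about 1.1 KB here), serialization time starts to dominate.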
11
IBM BlueGene/L node
  • Each chip contains two nodes
  • Each node is a PPC440 processor
  • Each node has 512 MB of local memory
  • Each node runs a lightweight OS with MPI
  • Each node runs one user process
  • No context switching at a node

12
BlueGene/L Interconnection
  • Uses five networks:
  • Gigabit Ethernet for I/O nodes and connections to external systems
  • A control network over Fast Ethernet
  • A 3-D torus for node-to-node message passing
    • Handles the majority of application traffic (MPI messaging)
    • Longest path is 64 hops (checked in the sketch below)
    • The MPI software is highly customized
  • A collective network for broadcasting
  • A barrier network
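The 64-hop figure follows from the torus shape: the worst case travels
halfway around each of the three rings. A quick check, assuming the
commonly cited 64 x 32 x 32 shape of the full 65,536-node torus:

```python
def torus_max_hops(dims):
    """Worst-case hop count in a torus: half of each ring, summed."""
    return sum(s // 2 for s in dims)

print(torus_max_hops((64, 32, 32)))   # 32 + 16 + 16 = 64 hops
```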

13
BlueGene/L Interconnection Networks

Global tree
  • Interconnects all compute and I/O nodes (1,024)
  • One-to-all broadcast functionality
  • Reduction-operation functionality
  • 2.8 Gb/s of bandwidth per link
  • One-way tree-traversal latency: 2.5 µs
  • 23 TB/s total binary-tree bandwidth (64k machine)

3-dimensional torus
  • Interconnects all compute nodes (65,536)
  • Virtual cut-through hardware routing
  • 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
  • 1 µs latency between nearest neighbors, 5 µs to the farthest
  • 4 µs latency for one hop with MPI, 10 µs to the farthest

Ethernet
  • Incorporated into every node ASIC
  • Active in the I/O nodes (1:64 I/O-to-compute ratio)
  • All external communication (file I/O, control, user interaction, etc.)

Low-latency global barrier and interrupt
  • Round-trip latency: 1.3 µs
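The tree's broadcast and reduction functions are what MPI collectives map
onto. A minimal, generic MPI sketch using mpi4py (assumed available;
nothing here is BlueGene/L-specific; run with, say,
mpirun -np 4 python collectives.py):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Reduction: combine one value per rank; on hardware with a collective
# network, partial results are combined as they flow toward the root.
total = comm.allreduce(rank, op=MPI.SUM)

# Broadcast: one-to-all distribution, the tree's other function.
data = comm.bcast("hello" if rank == 0 else None, root=0)

if rank == 0:
    print(f"sum of ranks = {total}, broadcast payload = {data!r}")
```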