Advancing Supercomputer Performance Through Interconnection Topology Synthesis

1
Advancing Supercomputer Performance Through
Interconnection Topology Synthesis
  • Yi Zhu, Michael Taylor, Scott B. Baden and
    Chung-Kuan Cheng
  • Department of Computer Science and Engineering
  • University of California, San Diego

2
Outline
  • Introduction
  • Design Flow, Formulation & Algorithms
  • Example: Blue Gene/L Packaging
      • Overview
      • Models & Constraints
  • Experiments
      • Benchmark Instances
      • Generated Instances
  • Conclusion & Future Work

3
Interconnection Networks
  • Interconnection networks have become a more
    critical factor than computing or memory modules
    (W. Dally, HPCA 2007 keynote speech)
  • Popular network topologies
      • Hypercube (SGI Origin2000)
      • 2D torus (Cray X1)
      • 3D torus (Cray T3E and XT3, IBM Blue Gene/L)
      • Crossbar (NEC Earth Simulator)
      • Folded Clos (Cray BlackWidow)
      • Fat tree, flattened butterfly, etc.

4
Our Work
  • We propose a design methodology that selects the
    topology minimizing the average latency
  • The design flow is fully automated
  • Physical constraints can be specified by users
  • An efficient multi-commodity flow algorithm
    evaluates each candidate topology
  • We demonstrate the efficiency using the Blue Gene/L
    packaging framework

5
Design Flow
6
Multi-Commodity Flow (MCF)
  • Graph G(V,E)
  • K commodities, each with a source, a sink, and a
    demand d(k)
  • Each edge e has a capacity u(e)
  • Each edge e has a weight w(e)
  • Minimum-cost MCF: each commodity k is routed d(k)
    units under the capacity constraints, minimizing
    $\sum_{e \in E} w(e) f(e)$, where f(e) is the flow
    routed on edge e
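Written out as a linear program, the minimum-cost MCF on this slide might look as follows. This is the standard textbook formulation rather than anything taken from the slides; the per-commodity flow variables $f_k(e)$, the source and sink names $s_k$, $t_k$, and the outgoing/incoming edge sets $\delta^{+}(v)$, $\delta^{-}(v)$ are notation assumed here.

```latex
\[
\begin{aligned}
\min\quad & \sum_{e \in E} w(e)\, f(e) \\
\text{s.t.}\quad
& f(e) = \sum_{k=1}^{K} f_k(e) \le u(e) && \forall e \in E \\
& \sum_{e \in \delta^{+}(v)} f_k(e) - \sum_{e \in \delta^{-}(v)} f_k(e) =
  \begin{cases}
    d(k)  & v = s_k \\
    -d(k) & v = t_k \\
    0     & \text{otherwise}
  \end{cases}
  && \forall v \in V,\ k = 1, \dots, K \\
& f_k(e) \ge 0 && \forall e \in E,\ k = 1, \dots, K
\end{aligned}
\]
```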

7
Map Supercomputer Performance Evaluation to MCF
Problem
  • Nodes ↔ processors
  • Edges ↔ interconnection links
  • Commodities ↔ communications
  • Demands ↔ communication bandwidth (injection
    rate)
  • Flow amounts ↔ wire assignments
  • Capacity constraints ↔ physical constraints
    (wires, pins, board dimensions)
  • Edge weights ↔ unit latency (unit power)
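The mapping can be made concrete with a small sketch. The code below is not the authors' evaluator: it routes each commodity along a single minimum-weight path instead of solving the MCF, which is a deliberate simplification. The names `evaluate` and `shortest_path` and the dictionary layouts are hypothetical, and the network is assumed to be connected.

```python
import heapq
from collections import defaultdict

def shortest_path(adj, src, dst):
    """Dijkstra over edge weights (unit latency); returns the node list src..dst."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == dst:
            break
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for nxt, w in adj[v]:
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = v
                heapq.heappush(heap, (nd, nxt))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def evaluate(links, demands):
    """links: {(u, v): (capacity, weight)}, undirected interconnection links.
    demands: {(source, sink): bandwidth}, one entry per commodity.
    Returns (average latency per unit of traffic, set of overloaded links)."""
    adj = defaultdict(list)
    for (u, v), (_, w) in links.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    load = defaultdict(float)   # flow amount f(e) accumulated on each link
    total_cost = 0.0            # sum of w(e) * f(e)
    for (s, t), d in demands.items():
        path = shortest_path(adj, s, t)
        for a, b in zip(path, path[1:]):
            key = (a, b) if (a, b) in links else (b, a)
            load[key] += d
            total_cost += d * links[key][1]
    overloaded = {e for e, f in load.items() if f > links[e][0]}
    return total_cost / sum(demands.values()), overloaded

# Hypothetical 4-processor ring: every link has capacity 2 and unit latency 1.
ring = {(0, 1): (2.0, 1.0), (1, 2): (2.0, 1.0), (2, 3): (2.0, 1.0), (3, 0): (2.0, 1.0)}
print(evaluate(ring, {(0, 2): 1.0, (1, 3): 1.0}))
```

A real MCF solver would split a commodity across several paths when one link saturates; this sketch only reports which links would be overloaded.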

8
An Example on Maximum Concurrent Flow
  • Two commodities, s1→t1 and s2→t2, both with demand
    d(1) = d(2) = 1
  • Optimal throughput: 1.5
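For reference, the maximum concurrent flow problem asks for the largest factor $\lambda$ such that $\lambda\, d(k)$ units of every commodity can be routed at the same time. The standard formulation is sketched below; it is not taken from the slides, and $\lambda$, $f_k(e)$, $s_k$, $t_k$, $\delta^{+}(v)$ and $\delta^{-}(v)$ are assumed notation. Under this reading, the throughput of 1.5 quoted above would be the optimal $\lambda$ for the pictured network.

```latex
\[
\begin{aligned}
\max\quad & \lambda \\
\text{s.t.}\quad
& \sum_{k=1}^{K} f_k(e) \le u(e) && \forall e \in E \\
& \sum_{e \in \delta^{+}(v)} f_k(e) - \sum_{e \in \delta^{-}(v)} f_k(e) =
  \begin{cases}
    \lambda\, d(k)  & v = s_k \\
    -\lambda\, d(k) & v = t_k \\
    0               & \text{otherwise}
  \end{cases}
  && \forall v \in V,\ k = 1, \dots, K \\
& f_k(e) \ge 0,\ \lambda \ge 0 && \forall e \in E,\ k = 1, \dots, K
\end{aligned}
\]
```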

9
Approximation Algorithms
  • By LP duality theory, for a maximization problem
    any primal feasible value P, dual feasible value
    D, and optimal value OPT satisfy P ≤ OPT ≤ D
  • Increase P and decrease D iteratively until the
    duality gap is small enough
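One way to read "small enough" is as a relative gap. The short derivation below assumes a tolerance $0 < \varepsilon < 1$ (a symbol not used in the slides) and $D > 0$; it shows why stopping at a relative gap of $\varepsilon$ yields a $(1-\varepsilon)$-approximation.

```latex
\[
P \le \mathrm{OPT} \le D,
\qquad
D - P \le \varepsilon D
\;\Longrightarrow\;
P \ge (1 - \varepsilon)\, D \ge (1 - \varepsilon)\, \mathrm{OPT}.
\]
```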

10
Blue Gene/L: An Example
11
Assumptions
  • We follow the same hierarchical structure:
    midplane → node card → compute card
  • The properties of the boards (dimensions, layers,
    dielectric) remain unchanged
  • We seek better topologies than the existing 3D
    torus to implement the networks in the midplane

12
Topology Generation
  • Generate 8-node 1D topologies and duplicate them
    to each row and column
  • Topologies are isomorph-free and have a maximum
    degree bound for each node
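A brute-force sketch of isomorph-free generation under a degree bound is given below. It is not the generator used in the slides: it canonicalizes each graph by trying every node relabelling, which is only practical for very small node counts (the 8-node case would need a dedicated generator). The function names and the connectivity requirement are assumptions made for illustration.

```python
from itertools import combinations, permutations

def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def canonical(n, edges):
    """Smallest relabelled edge list over all node permutations.
    Two graphs are isomorphic exactly when their canonical forms match."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabelled < best:
            best = relabelled
    return best

def isomorph_free_topologies(n, max_degree):
    """All connected n-node graphs with bounded degree, one per isomorphism class."""
    all_edges = list(combinations(range(n), 2))
    seen, result = set(), []
    # A connected graph on n nodes needs at least n - 1 edges.
    for m in range(n - 1, len(all_edges) + 1):
        for edges in combinations(all_edges, m):
            degree = [0] * n
            for u, v in edges:
                degree[u] += 1
                degree[v] += 1
            if max(degree) > max_degree or not is_connected(n, edges):
                continue
            key = canonical(n, edges)
            if key not in seen:
                seen.add(key)
                result.append(edges)
    return result

# With 4 nodes and degree at most 2, only the path and the cycle remain.
print(isomorph_free_topologies(4, 2))
```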

isomorph-free topologies
13
Node Card Graph Model
Figure labels: Horizontal, Strongly Connected, Vertical, Generated Topology
14
Midplane Graph Model
Coteus et al., "Packaging the Blue Gene/L Supercomputer,"
IBM J. of Res. & Dev., Vol. 49, pp. 213-248
15
Experiment 1: Benchmark Instances
  • NAS Parallel Benchmarks (121/128 processes)

Flow chart boxes: Benchmark source code; Compiled with
Intel Trace Collector & Analyzer; Executable; Run on
multi-processor machines; Traffic Patterns; Our design
flow; Best topology; Simulated annealing placement;
Task placement; Output
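The slides name simulated annealing for task placement but give no details. The sketch below is one plausible setup, not the authors' implementation: the cost (traffic volume times hop distance), the swap move, the geometric cooling schedule, and every parameter value are assumptions.

```python
import math
import random

def placement_cost(placement, traffic, dist):
    """Sum of traffic volume times hop distance between the hosting nodes."""
    return sum(vol * dist[placement[a]][placement[b]] for (a, b), vol in traffic.items())

def anneal_placement(traffic, dist, n_tasks, steps=5000, t0=5.0, alpha=0.999, seed=0):
    """Swap-based simulated annealing; returns the best placement seen and its cost.
    placement[i] is the node that hosts task i."""
    rng = random.Random(seed)
    current = list(range(n_tasks))          # start with task i on node i
    cur_cost = placement_cost(current, traffic, dist)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n_tasks), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = placement_cost(current, traffic, dist)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
        temp *= alpha                        # geometric cooling
    return best, best_cost

# Hypothetical instance: 8 tasks on an 8-node ring, each task talks to one partner.
n = 8
ring_dist = [[min(abs(i - j), n - abs(i - j)) for j in range(n)] for i in range(n)]
traffic = {(0, 4): 1.0, (1, 5): 1.0, (2, 6): 1.0, (3, 7): 1.0}
best, cost = anneal_placement(traffic, ring_dist, n)
print(best, cost)
```

Because the best placement seen so far is kept separately, the returned cost is never worse than that of the initial placement.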
16
Benchmarks
Characteristics
Communication Pattern: MG
17
Results
  • Optimal: each instance has its own topology
  • Aggregate: one topology for all instances
  • 3D Torus: the existing 3D torus topology

18
Experiment 2: Generated Instances
  • Randomly generated communications
      • Scalar values that represent the bandwidth
        demand between each pair of nodes
      • More general, time-independent
  • Control parameters
      • Number of communication demands: O(n) pairs
      • Communication amount: uniform traffic, varied
        case by case (different congestion levels)
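A generator along these lines could be sketched as follows. The function name, the `pairs_per_node` constant behind the O(n) pair count, and the seeding are all assumptions; the slides do not say how pairs were drawn.

```python
import random

def generate_instance(n, pairs_per_node=2, amount=1.0, seed=0):
    """Draw pairs_per_node * n distinct (source, sink) pairs among n nodes,
    each with the same bandwidth demand `amount` (uniform traffic).
    Requires pairs_per_node * n <= n * (n - 1)."""
    rng = random.Random(seed)
    demands = {}
    while len(demands) < pairs_per_node * n:
        s, t = rng.sample(range(n), 2)   # two different nodes
        demands[(s, t)] = amount         # redrawn duplicates just overwrite
    return demands

# Varying `amount` from case to case gives different congestion levels.
light = generate_instance(8, amount=0.5)
heavy = generate_instance(8, amount=2.0)
print(len(light), len(heavy))
```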

19
Latency-Throughput Tradeoffs
Distribution: 40 / 50 / 10
20
Topologies with Different Injection Rates
With a larger injection rate, more (red) links must
cross the cut between nodes 4 and 5 in order to
reduce the number of hops
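The link between cut width and hop count can be checked on a toy case. The sketch below assumes an 8-node 1D topology with nodes numbered 1 to 8 and compares a plain chain with the same chain plus one hypothetical wraparound link; neither topology is taken from the slides.

```python
from collections import deque

def cut_links(edges, left_max=4):
    """Number of links with one endpoint in 1..left_max and the other above it."""
    return sum(1 for u, v in edges if min(u, v) <= left_max < max(u, v))

def average_hops(n, edges):
    """Mean shortest-path hop count over all ordered node pairs (BFS from each node)."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))

chain = [(i, i + 1) for i in range(1, 8)]   # 1-2-3-4-5-6-7-8
ring = chain + [(1, 8)]                     # one extra link across the cut
for name, topo in (("chain", chain), ("ring", ring)):
    print(name, cut_links(topo), average_hops(8, topo))
```

The extra link doubles the number of links crossing the cut between nodes 4 and 5 and lowers the average hop count, which matches the trend described above.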
21
Conclusion
  • A design flow for interconnection network
    synthesis
      • Fully automated
      • Explores a large design space
      • Efficient evaluation algorithm
  • Future work
      • Power consumption
      • Accurate simulation

22
Q&A
  • Thank you!