Optics group - PowerPoint PPT Presentation

1
Stage-Distributed Time-Division Permutation
Routing in a Multistage Optically Interconnected
Fabric
Alvaro Cassinelli(1), Makoto Naruse(2), Alain Goulet(1), and Masatoshi Ishikawa(1)
(1) University of Tokyo, Dept. Information Physics and Computing, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan.
(2) Communications Research Laboratory, 4-2-1 Nukui-kita, Koganei, Tokyo 184-8795, Japan.
http://www.k2.t.u-tokyo.ac.jp/index-e.html
2
PLAN of the presentation
I. Introduction: space-domain optical switching
fabrics
II. Column-Control in Multistage Interconnection
Networks (CCMINs)
III. Folded Optical Implementation of a
transparent CCMIN
IV. Packet switching in a buffered CCMIN (new)
V. Conclusion and Further Research
VI. Some References
3
I. Introduction: the problem under study
How to design an efficient optical switching
fabric for addressing:
  • Processor-memory bottleneck in supercomputers
  • Router bottleneck in the Next Generation Optical
    Internet

These problems have some similarities: low
latency required, synchronization, high
bandwidth.
The traffic characteristics change:
synchronous/asynchronous, regular/arbitrary
request patterns, fixed/variable length of data
bursts (granularity).
In fact, the above problems are case studies
among a continuum of situations.
4
I. Introduction: optics inside routers
Where to use optics?
  • interconnect router subsystems
  • at the (unbuffered) switching fabric (OXC)
  • at the interfaces and controller (all-optical
    routing)

5
II. Column-Control in Multistage Interconnection
Networks
II.1 Multistage Interconnection Networks
II.2 Column-Control in MINs
II.3 Permutation Capacity of CCMIN
II.4 Unbuffered CCMIN for permutation routing
6
II.1 Multistage Interconnection Networks
Basic switching fabric: Full Crossbar (XC)
  • Wide-sense non-blocking
  • Low latency

  • O(N^2) complexity (using 2x2 switches)
  • Simultaneous switching noise
  • Central controller bottleneck
  • Poor modularity
  • Circuit switching: good for low-latency
    memory-processor communications.
  • Packet switching: maximum throughput of 63%
    without buffers (uniform traffic).
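
The 63% figure can be checked with a short calculation. The sketch below assumes independent Bernoulli requests with uniformly chosen outputs and no buffering; the function name and the `load` parameter are illustrative, not taken from the slides.

```python
def crossbar_acceptance(n_ports: int, load: float) -> float:
    """Fraction of requests accepted by an unbuffered N x N crossbar.

    Assumes each input issues a request with probability `load` per slot,
    outputs are chosen uniformly, and each output serves one request per slot.
    """
    # An output stays idle only if none of the N inputs requests it.
    busy_output = 1.0 - (1.0 - load / n_ports) ** n_ports
    # Served requests per port divided by offered requests per port.
    return busy_output / load


for n in (2, 8, 64, 1024):
    print(n, round(crossbar_acceptance(n, 1.0), 3))
```

At full load the value decreases toward 1 - 1/e (about 0.632) as the number of ports grows.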

7
II.1 Multistage Interconnection Networks
It still has point-to-point full connectivity.
(and is self-routing)
8
II.2 Column-Control in MINs
  • Column-control simplifies hardware and control

9
II.3 Permutation Capacity of CCMIN
However
local-blocking
if blocking was a problem for a MIN
10
II.3 Permutation Capacity of CCMIN
64x64 network
  • Requests are serviced by circuit switching (or by
    on-the-fly packet switching)
  • Input requests are independent Bernoulli trials
    (the parameter is the input request probability
    per unit time)
  • Uniform traffic: equal probability of requesting
    any output port

[Plot: probability of request acceptance vs. input
request probability per unit time, for the
crossbar, the standard MIN and the CCMIN. The
crossbar curve tends to 63% when N → ∞, because of
HOL blocking; the standard MIN and CCMIN curves
both tend to 0 when N → ∞.]
CCMIN cannot be used to service arbitrary
requests in a circuit-switched manner!
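
To see why the acceptance probability of a standard MIN falls as the network grows, here is a minimal sketch of the usual textbook stage-by-stage recursion for an unbuffered MIN of 2x2 switches under uniform, independent traffic. It is not necessarily the model behind the plotted curves, and the names are illustrative.

```python
def min_acceptance(n_stages: int, load: float) -> float:
    """Acceptance probability of an unbuffered MIN built from 2x2 switches.

    Assumes uniform, independent traffic: an output link of a 2x2 switch is
    idle only if neither of its two inputs asks for it.
    """
    p = load
    for _ in range(n_stages):
        p = 1.0 - (1.0 - p / 2.0) ** 2
    return p / load


for n in (3, 6, 10, 20):  # N = 2**n ports
    print(2 ** n, round(min_acceptance(n, 1.0), 3))
```

Each extra stage removes a fraction of the surviving requests, so the ratio keeps shrinking with the number of stages, while the crossbar stays near 63%.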
11
II.4 Unbuffered CCMIN for permutation routing
[Figure: 16 processors, numbered 1 to 16, with
labels C1 to C4]
4-D hypercube-connected multiprocessor
Synchronous, weakly connected parallel computer
(all processors use the same permutation in a
given time slot)
12
III. Folded Optical Implementation of a
transparent CCMIN
III.1 Designing a CCMIN for circuit-switched
permutation routing
III.2 Folded Optical Implementation
III.3 Experimental Demonstration
III.4 Possible applications
13
III.1 Designing a CCMIN for circuit-switched
permutation routing
  • Number of permutations: 2^n (n = 3)

3-stage CC-Baseline Network
  • These are {c3, id} x {c2, id} x {c1, id}
  • These are just the required permutations to
    implement the (3D) hypercube!

[Figure: 3-stage network whose stages are labeled
(c3, id), (c1, id) and (c2, id)]
A multistage version of most parallel-computer
direct-network topologies (hypercube,
cube-connected-cycles, deBruijn, etc.) can be
implemented as a CCMIN with properly designed
inter-stage permutation modules.
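
The set of permutations reachable by a column-controlled network can be enumerated directly. The sketch below assumes that, seen end to end, stage k contributes either the identity or the cube permutation c_k (complement of bit k), and ignores the fixed inter-stage wiring; all names are illustrative.

```python
from itertools import product

N_BITS = 3                 # n = 3 stages, N = 2**n = 8 ports
N_PORTS = 1 << N_BITS


def cube(k: int, port: int) -> int:
    """Cube permutation c_k: complement bit k (k = 1 is the least significant)."""
    return port ^ (1 << (k - 1))


def ccmin_permutations() -> dict:
    """Map each column setting to the permutation it realizes.

    A setting is a tuple of n bits; a 1 at position k-1 means that stage k
    applies c_k, a 0 means that it applies the identity.
    """
    table = {}
    for setting in product((0, 1), repeat=N_BITS):
        perm = []
        for port in range(N_PORTS):
            out = port
            for k, active in enumerate(setting, start=1):
                if active:
                    out = cube(k, out)
            perm.append(out)
        table[setting] = perm
    return table


perms = ccmin_permutations()
print(len(perms))            # number of reachable permutations
print(perms[(1, 0, 0)])      # only c_1 active
```

Under this assumption every setting is an XOR of the port index with a fixed mask, and the single-bit masks link exactly the neighbours of a hypercube.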
14
III.2 Folded Optical Implementation
Multistage Interconnection Network architecture
  • Planar implementation: electronic, planar
    lightwave circuit (PLC)
  • 3D implementation: free space, guided-wave

Dense, efficient 3D folded inter-stage optical
interconnects
Optical Multistage Architecture Paradigm (fixed
interconnections)
15
III.2 Folded Optical Implementation
slide not shown in main presentation
Guided-wave (fiber-based) modules vs. free-space
  • Fixed interconnections, no broadcast: optical
    fiber is OK.
  • Better efficiency (and, just like free-space
    optics, no crosstalk in 3D).
  • No space-invariance imposed.
  • Precise and robust alignment possible.
  • Theoretically more volume-efficient than the
    free-space counterpart.
  • Hard to build? Not fundamentally difficult
    (can be automated, permutation decomposition
    possible).
  • Alignment of output and input.
  • Power dissipation: the fundamental limit is very
    far away compared with electronics.

Integrated 2D folded perfect-shuffle
permutation module
Prototype fiber module (fibers and holders)
Waveguide arrays for fixed, point-to-point and
space-variant interconnections are an interesting
alternative to free-space optics.
16
Prototype (non-integrated) 4x4 fiber module
slide not shown in main presentation
Input (VCSEL 8544nm)
Output (CCD)
Two holder prototypes: Zirconium, SiO2. Pitch
2505 µm. Multimode graded-index fibers,
NA = 0.21 (core 50 µm, cladding 126 µm).
Transmission loss: 3 dB/km.
17
III.2 Multiple-permutation module
Besides density, reduced crosstalk and optical
efficiency, there is another nice feature of the
guided-wave approach to plane-to-plane optical
interconnections
18
Cube Permutations for N = 2^n
slide not shown in main presentation
Unfolded (example with n = 4): c1, c3, c4
Cube permutation c_k:
$c_k: (b_n, \dots, b_{k+1}, b_k, b_{k-1}, \dots, b_2, b_1) \mapsto (b_n, \dots, b_{k+1}, \bar{b}_k, b_{k-1}, \dots, b_2, b_1)$
Folded
If k ≤ n/2, c_k exchanges only rows. If k > n/2,
c_k exchanges only columns. The modules are just
the same, rotated.
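
The row/column property of the folded cube permutations can be verified for n = 4. The sketch below assumes one particular folding, with the low n/2 bits of the port index as the row and the high n/2 bits as the column; the slides do not specify the mapping, and all names are illustrative.

```python
N_BITS = 4                      # n = 4, N = 16 ports
HALF = N_BITS // 2
SIDE = 1 << HALF                # the folded plane is SIDE x SIDE


def to_plane(port: int) -> tuple:
    """Fold a port index onto the plane as (row, column).

    Assumed mapping: low n/2 bits give the row, high n/2 bits the column.
    """
    return port & (SIDE - 1), port >> HALF


def cube(k: int, port: int) -> int:
    """Cube permutation c_k: complement bit k (k = 1 is the least significant)."""
    return port ^ (1 << (k - 1))


def moves(k: int) -> str:
    """Report whether c_k changes the row index, the column index, or both."""
    changed = set()
    for port in range(1 << N_BITS):
        r0, c0 = to_plane(port)
        r1, c1 = to_plane(cube(k, port))
        if r0 != r1:
            changed.add("row")
        if c0 != c1:
            changed.add("column")
    return "+".join(sorted(changed))


for k in range(1, N_BITS + 1):
    print(k, moves(k))
```

With this folding, c_1 and c_2 only move elements between rows and c_3 and c_4 only between columns, which is why one physical module, rotated by 90 degrees, can serve both cases.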
19
III.2 Experimental Demonstration
plane mapping (folding)
Row-Column Folded bi-permutation module
Unfolded hypercube and identity permutations
Prototype implementation using optical fibers
(*) not unique!
20
III.2 Experimental Demonstration
slide not shown in main presentation
topology is mapped on a plane
four-dimensional hypercube-connected
multiprocessor
(processors interconnected through a 2D optical
socket or lying in a VLSI chip matrix)
Spanned 4D hypercube (use four bi-permutation
modules)
21
III.2 Experimental Demonstration
slide not shown in main presentation
Output (CCD camera)
Input (VCSEL array)
Exit first module
Commutation pitch: 125 µm
Alignment tolerance: ±5 µm (half peak power).
Input second module
Inter-module coupling efficiency: 1.7 dB (no
additional optics, matching oil or antireflection
coating).
→ Validation of simple cascaded architecture.
22
III.2 Experimental Demonstration
Visualization of 2D permutation switching using a
pair of modules
C2 or Id
C1 or Id
23
III.2 Demonstration electromechanical actuator
X-Y electromagnetically actuated device
(can vibrate the module in both X and Y
directions; in principle, permutation
interleaving is possible in both directions)
Resonant frequency: 430 Hz (±62.5 µm)
(Micro electro-mechanical actuators (MEMS) may
also be an interesting alternative when switching
latency in the millisecond range is tolerable)
24
III.2 Demonstration electromechanical actuator
slide not shown in main presentation
Resonant-frequency round-robin permutation
scheduling
[Timeline: Interconnect 1, Interconnect 2,
Interconnect 3, ..., Interconnect N, one per time
slot]
25
III.2 Demonstration electromechanical actuator
slide not shown in main presentation
Input: slow row/column scan of VCSEL array
No electromagnetic actuation: fixed Identity
permutation
Electromagnetic actuation: Identity and Cube2
permutations alternate at 860 Hz.
26
III.2 Demonstration electromechanical actuator
Input: 635 nm laser modulated at 500 MHz. Output:
high-speed photodetector.
[Figure: actuator position and photodetector
signal vs. time; time scale 200 µs]
  • Switching latency between interconnections:
    0.96 ms (*)
  • Time slot (3 dB): 200 µs
  • With a 10 Gb/s optical link, the burst size is
    2 Mbit per channel (every millisecond): an
    average bandwidth of 2 Gb/s per channel

(*) MEMS routers: ms range.
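
The burst-size arithmetic can be written out explicitly. The sketch below takes the usable time slot as 200 µs, the value consistent with the 2 Mbit burst quoted on the slide, and one burst per millisecond; the constant and function names are illustrative.

```python
LINK_RATE_BPS = 10e9        # 10 Gb/s optical link (from the slide)
TIME_SLOT_S = 200e-6        # usable time slot, taken here as 200 µs
CYCLE_S = 1e-3              # one burst every millisecond (from the slide)


def burst_size_bits(rate_bps: float, slot_s: float) -> float:
    """Bits that fit in one time slot at the given line rate."""
    return rate_bps * slot_s


def average_bandwidth_bps(rate_bps: float, slot_s: float, cycle_s: float) -> float:
    """Average per-channel bandwidth when one slot is used per cycle."""
    return burst_size_bits(rate_bps, slot_s) / cycle_s


print(burst_size_bits(LINK_RATE_BPS, TIME_SLOT_S) / 1e6, "Mbit per burst")
print(average_bandwidth_bps(LINK_RATE_BPS, TIME_SLOT_S, CYCLE_S) / 1e9, "Gb/s average")
```

The link is therefore used about one fifth of the time; the rest of each cycle is spent switching.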
27
III.4 Possible applications of an optical CCMIN
  • Possible computing applications
  • The present system is not usable for typical
    memory-processor communications, which require
    low latencies (< 100 ns), unless other
    switching hardware is used (acousto-optic cells:
    µs range / electro-optical materials: ns range)
  • If the processing time is long (slow switching
    latency) and the bursts of data are large, the
    electromechanical system may be used (FFT, large
    database retrieval, etc.)
  • Communication networks
  • Burst switching at the WAN level (ms-range
    reconfiguration times).
  • Scientific-dedicated, transparent networks with
    long holding times and high bandwidth
    (TransLight, GLIF). MEMS switches are currently
    used (reconfiguration times in the range of a
    second are OK). An optical GSMIN may be used to
    regularly provide interconnection configurations.
  • If the switching time is reduced, it can be used
    to perform cyclic permutation scheduling in a
    virtual output queued (VOQ) switch, leading to
    100% throughput (Stanford Tiny-Tera Switch)
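
As an illustration of cyclic permutation scheduling, the sketch below uses a plain cyclic-shift schedule: in slot t, input i is connected to output (i + t) mod N. This is one possible reading of the term, not a description of the Tiny-Tera scheduler, and the names are illustrative.

```python
def cyclic_schedule(n_ports: int, slot: int) -> list:
    """Output connected to each input during time slot `slot` (cyclic shift)."""
    return [(inp + slot) % n_ports for inp in range(n_ports)]


def voq_service_counts(n_ports: int, n_slots: int) -> list:
    """Count how often each (input, output) queue is visited over n_slots."""
    counts = [[0] * n_ports for _ in range(n_ports)]
    for t in range(n_slots):
        for inp, out in enumerate(cyclic_schedule(n_ports, t)):
            counts[inp][out] += 1
    return counts


for t in range(4):
    print(t, cyclic_schedule(4, t))
```

Over N consecutive slots every virtual output queue is visited exactly once, with no arbitration between inputs.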

28
slide not shown in main presentation
Burst interconnection within a short time
slot (e.g. 10 Gb/s, 100 ns → 1 kbit)
[Timeline: computation, one stage (e.g. 1 ms);
interconnection switching interval (e.g. 1 ms)]

Burst Interconnects
Slow switching may be okay
29
IV. Packet switching in a buffered CCMIN
IV.1 Buffering in blocking networks
IV.2 FIFO Buffered CCMIN architecture
IV.3 Performance evaluation
IV.4 Delay-line buffered architecture
30
IV.1 Buffering for packet switching
Blocking is a serious drawback for circuit
switching; it is less serious for packet switching.
  • Unbuffered networks (even wide-sense
    non-blocking ones) suffer from HOL blocking:
    buffering is unavoidable.
  • Input queues, output queues, virtual output
    queues and internal buffering have been explored
    in crossbars as well as in MINs.
  • However, an advantage of buffered MINs over
    buffered crossbars is that the stage-distributed
    switching marries well with the distribution of
    buffering (thus avoiding large buffers).

Buffering is a solution adopted in usual MINs:
how much is a CCMIN improved by buffering?
31
IV.2 FIFO Buffered CCMIN architecture
Why might this architecture compare well with
standard buffered MINs?
  • For uniform traffic, at each stage half of the
    packets wait and half pass: individual
    switch/buffer control is, presumably, not really
    required.

Inter-stage FIFO buffers
  • What's more:
  • Arbitration for configuring the Global Switches
    may not be necessary at all!

32
IV.3 Performance: global control vs. local control
Seven-stage, 128x128 input/output fabrics
(Note: inter-stage transfer with maximum speed-up
equal to the size of the buffer)
[Plot: probability of packet acceptance vs. input
request probability per unit time. Curves:
crossbar, standard MIN and Global Switched MIN;
curve labels give the buffer size, 0 to 6.]
Performance of Global Switched MIN compares very
well with that of a standard MIN.
  • GSMIN performance evolves faster with buffer
    size
  • For a buffer size of 5 packets, performances
    are equivalent
  • For a buffer size of 3 packets, performances
    are better than the crossbar (Xbar)

33
IV.3 Performance: global control with blind
alternation
Blind switch alternation of a GSMIN
[Plot: probability of packet acceptance vs. input
request probability per unit time. Curves:
crossbar, blind alternation and fair switching;
curve labels give the buffer size, 0 to 6.]
As expected, blind alternation of switch states
gives the same performance as a fair
switch selection (for uniform traffic).
This is very interesting, because it means that a
Standard MIN can be operated blindly if the traffic
is uniform enough. The interconnection scheduling
bottleneck (CLOS, etc.) is eliminated by using a
Time-Division Permutation Routing strategy.
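
The idea of blind alternation can be tried out on a toy model. The sketch below is not the simulator behind the plotted curves: it assumes that stage k either keeps a packet on its line or moves it to line XOR 2^k, that all switches of a stage share one state, that stage k crosses on slots where (slot + k) is odd, and that at most one packet leaves each FIFO per slot. All names and default values are hypothetical.

```python
import random
from collections import deque


def simulate(n_stages=4, buffer_size=2, load=0.5, n_slots=4000, seed=1):
    """Toy FIFO-buffered, column-controlled network with blind alternation.

    Returns (offered, accepted, delivered, misrouted) packet counts.
    """
    rng = random.Random(seed)
    n_ports = 1 << n_stages
    # buffers[k][line] holds the destinations of packets waiting before stage k.
    buffers = [[deque() for _ in range(n_ports)] for _ in range(n_stages)]
    offered = accepted = delivered = misrouted = 0

    for slot in range(n_slots):
        # Serve the last stage first so that freed space is usable upstream.
        for k in reversed(range(n_stages)):
            cross = (slot + k) % 2 == 1          # blind, traffic-independent state
            for line in range(n_ports):
                queue = buffers[k][line]
                if not queue:
                    continue
                dest = queue[0]
                needs_cross = bool(((line ^ dest) >> k) & 1)
                if needs_cross != cross:
                    continue                      # wait for the right state
                out_line = line ^ (1 << k) if cross else line
                if k == n_stages - 1:
                    queue.popleft()
                    delivered += 1
                    misrouted += int(out_line != dest)
                elif len(buffers[k + 1][out_line]) < buffer_size:
                    buffers[k + 1][out_line].append(queue.popleft())
        # New arrivals: Bernoulli requests with uniformly chosen destinations.
        for line in range(n_ports):
            if rng.random() < load:
                offered += 1
                if len(buffers[0][line]) < buffer_size:
                    buffers[0][line].append(rng.randrange(n_ports))
                    accepted += 1
    return offered, accepted, delivered, misrouted


for size in (1, 2, 4):
    off, acc, dlv, bad = simulate(buffer_size=size)
    print(size, round(acc / off, 3), bad)
```

No arbiter appears anywhere in the loop: the switch state depends only on the slot number, and every delivered packet still reaches its destination.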
34
IV.4 Delay-line buffered architecture
Reliable optical memories are still too difficult
to implement...
delay-line buffer
What about just delaying packets?
(since there are only two states per stage, only
a single delay-line may give good performance)
output
input
35
slide not shown in main presentation
We didn't study a standard MIN with
delay-lines.
[Figure labels: delay-line buffer, switch, input,
output]
36
IV.4 Performance of a delay-line buffered
architecture
Blind alternation of global switch states is
assumed.
[Plot: probability of packet acceptance vs. input
request probability per unit time. Curves:
crossbar, delay-line architecture and Global
Switched MIN; curve labels give the buffer size,
0 to 6.]
(We didn't study a standard MIN with
delay-lines.)
Using a single selectable delay per channel and
per stage, performance lies somewhere between
that of the one-packet and two-packet FIFO-buffered
architectures.
37
V. Conclusion
V.1 Results
V.2 Further Research
38
V.1 Conclusion
Summarizing
  • Column-Control simplifies MIN hardware and
    control
  • Column-Control MIN may have enough permutation
    capacity for specific applications (highly
    parallel algorithms)
  • Column-Controlled MIN can be efficiently
    implemented using dense plane-to-plane optical
    interconnections
  • Column-Controlled MIN can be used for packet
    switching if buffered, giving roughly the same
    performance as standard MINs
  • Path-selection mechanism may be blind (i.e.
    round-robin, time-division permutation routing)
    without appreciable degradation of performance.

39
V.2 Further Research
On transparent circuit switched CCMINs
  • An arbitrary permutation request may be serviced
    by multiplexing in time the available set of
    permutations. This needs input buffers and
    speed-up (i.e. short switching latency). This has
    been explored in standard MINs using 2x2
    switches
  • Design of active modules, and multi-function
    modules (containing more than two permutations,
    but also other optical functions - e.g. optical
    delay lines)
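
Servicing an arbitrary permutation by time-multiplexing the available set can be sketched as follows. The example assumes that the fabric offers exactly the permutations port → port XOR mask (the cube-permutation products) and that one time slot is used per distinct mask; the function name is illustrative.

```python
from collections import defaultdict


def time_multiplex(permutation: list) -> dict:
    """Group the inputs of `permutation` by the XOR mask that serves them.

    Each distinct mask needs its own time slot; the returned dict maps a
    mask to the inputs that can transmit during that slot.
    """
    slots = defaultdict(list)
    for src, dst in enumerate(permutation):
        slots[src ^ dst].append(src)
    return dict(slots)


# Example: a cyclic shift by one on 8 ports is not itself an XOR permutation.
shift = [(p + 1) % 8 for p in range(8)]
schedule = time_multiplex(shift)
for mask, inputs in sorted(schedule.items()):
    print(f"mask {mask:03b}: inputs {inputs}")
```

The number of distinct masks is the speed-up needed to service the request within one slot of the original time base, which is why input buffers and a short switching latency are required.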

On buffered packet switched CCMINs
  • How heavily do the studied architectures rely
    on the URM assumption? Study more realistic
    traffic models / ways to balance the non-regular
    traffic.
  • Other models of buffers: in particular,
    inter-stage virtual output queues (VOQ) may give
    very good performance in a CCMIN (because with a
    speed-up of only 2, each stage will have 100%
    throughput). Two parallel delay-line buffers?

40
slide not shown in main presentation
V.2 Fast switching permutation modules
stack of PLC layers coupled in the normal
direction
Reconfiguration time can be of the order of
nanoseconds!
  • Simulation of a crossbar by speed-up (TDM
    connections for local area networks)
  • Core of a permutation-routing switch for
    inter-processor communications in a parallel
    computer

41
slide not shown in main presentation
V.2 Advanced further research
Based on the observation that VOQ and speed-up,
plus optimal permutation decomposition, are the
basic ingredients of the Birkhoff-von Neumann
switch (plus load-balancing to simplify the
decomposition → Tiny-Tera switch) with 100%
throughput, it will be interesting to study:
1) a constrained decomposition of a rate
matrix onto the set of available CCMIN
permutations;
2) a multistage version of the BVN
switch, where the permutation decomposition is
done
a) at each stage (using bi-permutation
modules, this will probably lead to a simple
forced-alternate mode and reduce the size of the
VOQ to only 2, which may be accommodated by
simple delay-lines!);
b) every few stages, so
that the available set of permutations will be
very reduced, but still larger than 2. This may
optimize the design of buffer functions (no need
to put them in all stages).
Thank you for your attention
42
VI. Some References
slide not shown in main presentation
Traffic models: J. Cao et al., Internet traffic
tends toward Poisson and independent as load
increases, Nonlinear Estimation and
Classification, eds. C. Holmes et al., Springer,
NY, 2002.
Thermo-optic matrix: Goh01.
Round-robin (TDM): Thompson91.
Crosstalk can be solved by decomposing a
permutation into semi-permutations, with an
increase in the number of network stages: Qiao.
Volume-consumption comparisons of free-space and
guided-wave optical interconnections, Y. Li and
J. Popelek, Appl. Opt., Vol. 39, No. 11,
pp. 1815-1825, April 2000.
Study of inter-stage VOQ in MINs: Kolias, Dual
Banyan Switch.
W.J. Dally, Virtual-Channel Flow Control, IEEE
Trans. Parallel and Distr. Systems, Vol. 3, No. 2,
Mar. 1992, pp. 194-205. Dally studies DAMQ
(dynamically allocated multi-queue buffers),
which looks quite similar to hop-mode buffers.