Title: Status of Joint Research Project
TIPC as TML
draft-maloy-tipc-01.txt
Jon Maloy, Ericsson; Steven Blake, Modularnet;
Maarten Koning, Wind River; Jamal Hadi Salim, Znyx;
Hormuzd Khosravi, Intel
IETF-61, Washington DC, Nov 2004
TIPC
- A transport protocol for cluster environments
- Connectionless and connection-oriented, reliable or unreliable
- Reliable or unreliable multicast
- Usage not limited to the ForCES context
- A framework for detecting, supervising and maintaining cluster topology
- Available as a portable open source code package under BSD licence
- 12,000 lines of C code, 112 kbyte Linux kernel module
- Runs on 4 OSes so far, with more to come
- Proven concept, used and deployed in several Ericsson products
ForCES Protocol Framework
[Figure: ForCES protocol messages exchanged between CE and FE]
TIPC as L2 TML
[Figure: ForCES protocol messages carried over TIPC acting as an L2 TML]
Interface Adaptation
[Figure: interface adaptation layers between the ForCES protocol messages and TIPC]
Fulfilling Requirements (1)
- Reliability
  - Reliable transport in all modes
  - Can be made unreliable per socket/direction (see the sketch below)
- Security
  - Only secure within closed networks
  - No explicit authentication/encryption support yet, but planned
  - Not IP-based: no router will forward TIPC messages
- Congestion Control
  - At three levels: connection/transport, signalling link and carrier
  - Gives feedback to the PL layer if a connection is broken or a message is rejected
- Multicast/Broadcast
  - Supported
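
A minimal sketch of the per-socket/per-direction knob, assuming the AF_TIPC socket API that later entered mainline Linux (the 2004 code base may differ in names and details):

    #include <sys/socket.h>
    #include <linux/tipc.h>

    int sd = socket(AF_TIPC, SOCK_RDM, 0);
    int on = 1;

    /* Let messages sent from this socket be dropped under congestion
     * instead of being queued/retransmitted. */
    setsockopt(sd, SOL_TIPC, TIPC_SRC_DROPPABLE, &on, sizeof(on));

    /* Let messages sent from this socket that prove undeliverable be
     * discarded instead of being returned to the sender as rejected. */
    setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE, &on, sizeof(on));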
Fulfilling Requirements (2)
- Timeliness
  - Immediate delivery (no Nagle algorithm)
  - Inter-node delivery time on the order of 100 microseconds
- HA Considerations
  - L2 link failure detection and failover handled transparently for the user
  - Connection aborted with an error code if no redundant carrier is available
  - Peer node failure detected after 0.5-1.5 seconds
- Encapsulation
  - 24 bytes of extra header
  - 40 extra bytes for connectionless
- Priorities
  - Supports 4 message importance priorities, determining congestion and abort/rejection levels (see the sketch below)
  - Are 8 levels really needed?
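
A hedged illustration of selecting one of the four importance levels, again assuming the later mainline socket API:

    #include <sys/socket.h>
    #include <linux/tipc.h>

    int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);

    /* One of TIPC_LOW/MEDIUM/HIGH/CRITICAL_IMPORTANCE; the higher the
     * importance, the later a message is dropped or rejected under
     * congestion. */
    int imp = TIPC_HIGH_IMPORTANCE;
    setsockopt(sd, SOL_TIPC, TIPC_IMPORTANCE, &imp, sizeof(imp));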
Connection Directly on TIPC
[Figure: FBs X and Y in the CE object connect directly over TIPC to LFBs 1 and 2 in the FE object]
Connections via FE/CE Object
[Figure: FBs X and Y reach LFBs 1 and 2 through a TIPC connection terminated by the CE object and the FE object]
Connection Usage
[Figure: two connections between the CE object and the FE object over TIPC:
- Traffic data connection: low priority; reliable CE->FE, unreliable FE->CE
- Control connection: high priority; reliable in both directions]
Functional Addressing - Unicast
- Function Address
  - Persistent, reusable 64-bit port identifier assigned by the user
  - Consists of a type number and an instance number
- Function Address Sequence
  - A sequence of function addresses with the same type (see the sketch below)
[Figure: the client calls sendto(type foo, instance 33); Server Process, Partition A has bound (type foo, lower 0, upper 99) and receives the message; Server Process, Partition B has bound (type foo, lower 100, upper 199)]
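
A sketch of the bind/sendto pair above in C against the later mainline AF_TIPC API; the type number 1000 standing in for "type foo" is a made-up example value:

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    #define FOO_TYPE 1000                  /* hypothetical "type foo" */

    /* Server, partition A: bind a function address sequence. */
    int srv_sd = socket(AF_TIPC, SOCK_RDM, 0);
    struct sockaddr_tipc srv;
    memset(&srv, 0, sizeof(srv));
    srv.family = AF_TIPC;
    srv.addrtype = TIPC_ADDR_NAMESEQ;      /* bind a range, not one instance */
    srv.scope = TIPC_CLUSTER_SCOPE;
    srv.addr.nameseq.type = FOO_TYPE;
    srv.addr.nameseq.lower = 0;
    srv.addr.nameseq.upper = 99;
    bind(srv_sd, (struct sockaddr *)&srv, sizeof(srv));

    /* Client: address a single instance; the lookup selects the
     * partition whose range covers it (here partition A). */
    int cli_sd = socket(AF_TIPC, SOCK_RDM, 0);
    struct sockaddr_tipc dst;
    memset(&dst, 0, sizeof(dst));
    dst.family = AF_TIPC;
    dst.addrtype = TIPC_ADDR_NAME;
    dst.addr.name.name.type = FOO_TYPE;
    dst.addr.name.name.instance = 33;
    dst.addr.name.domain = 0;              /* zone-wide lookup */
    sendto(cli_sd, "ping", 4, 0, (struct sockaddr *)&dst, sizeof(dst));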
Address Mapping - Unicast
[Figure: on the CE, FB X (RSVP 77) calls tml_bind(RSVP, 77), which the TML API maps to the TIPC call bind(RSVP, 77, 77); on the FE, LFB 1 (Meter 44) calls tml_bind(meter, 44), mapped to bind(meter, 44, 44)]
Connection Setup
[Figure: LFB 1 (Meter 44) on FE 17 calls tml_connect(RSVP, 77, CEID 8), which the TML maps to the TIPC call connect(RSVP, 77, node 8) towards FB X (RSVP 77) on CE 8]
If instance numbers are coordinated over the whole cluster, there is no need for LFBs to know the CE ID, as the sketch below illustrates.
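
A hedged sketch of how the tml_connect() call could map onto the socket API (later mainline AF_TIPC; RSVP_TYPE is a hypothetical type number for the RSVP function):

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    #define RSVP_TYPE 2000                 /* hypothetical type for "RSVP" */

    int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);
    struct sockaddr_tipc peer;
    memset(&peer, 0, sizeof(peer));
    peer.family = AF_TIPC;
    peer.addrtype = TIPC_ADDR_NAME;        /* connect by function address */
    peer.addr.name.name.type = RSVP_TYPE;
    peer.addr.name.name.instance = 77;
    peer.addr.name.domain = 0;             /* cluster-unique instance, so no
                                              node/CE identity is supplied */
    connect(sd, (struct sockaddr *)&peer, sizeof(peer));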
Functional Addressing - Multicast
- Based on Function Address Sequences
- Any partition overlapping with the range used in the destination address receives a copy of the message
- The client defines the multicast group per call (see the sketch below)
[Figure: the client calls sendto(type foo, lower 33, upper 133); both Server Process, Partition A (bound to foo, 0-99) and Server Process, Partition B (bound to foo, 100-199) receive a copy]
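
The client-side call sketched against the later mainline API (type number 1000 again stands in for "type foo"): the destination is a name sequence, and every overlapping partition gets one copy:

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    int sd = socket(AF_TIPC, SOCK_RDM, 0);
    struct sockaddr_tipc mc;
    memset(&mc, 0, sizeof(mc));
    mc.family = AF_TIPC;
    mc.addrtype = TIPC_ADDR_MCAST;         /* multicast to a name sequence */
    mc.addr.nameseq.type = 1000;           /* hypothetical "type foo" */
    mc.addr.nameseq.lower = 33;
    mc.addr.nameseq.upper = 133;           /* overlaps partitions A and B */
    sendto(sd, "ping", 4, 0, (struct sockaddr *)&mc, sizeof(mc));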
Address Mapping - Multicast
[Figure: on the CE, FB X (RSVP 77) calls tml_mcast(meter_mc, group X), mapped to the TIPC call sendto(meter_mc, X, X); on the FE, Meter 13 and Meter 44 have each called tml_join(meter_mc, X), mapped to bind(meter_mc, X, X), so both receive a copy]
Questions?
Why TIPC in ForCES?
- Congestion control at three levels
  - Connection level, signalling link level and media level
  - Based on 4 importance priorities
- Simple to configure
  - Each node needs to know only its own identity
  - Automatic neighbour detection using multicast/broadcast
- Lightweight, reactive connections
  - Immediate connection abortion at node/process failure or overload
- Topology Subscription Service
  - Functional and physical topology
Functional View
[Figure: TIPC functional decomposition]
- API adapters (Socket API Adapter, Port API Adapter, other API adapters) on top of the User Adapter API
- Address Subscription, Address Resolution, Address Table Distribution
- Connection Supervision, Route/Link Selection, Reliable Multicast, node-internal messaging
- Neighbour Detection, Link Establishment/Supervision/Failover
- Fragmentation/De-fragmentation, Packet Bundling, Congestion Control, Sequence/Retransmission Control
- Bearer Adapter API over bearers such as Infiniband, Mirrored Memory, Ethernet, SCTP and UDP
Network Topology
[Figure: network hierarchy with Zone <1> and Zone <2>; Zone <2> contains Cluster <2.1> with Slave Node <2.1.3333>; Zone <1> contains Node <1.2.3>]
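
The <zone.cluster.node> triple packs into a single 32-bit network address: 8 bits of zone, 12 bits of cluster, 12 bits of node (later mainline linux/tipc.h headers expose this as a tipc_addr() helper). A small illustration:

    /* addr = (zone << 24) | (cluster << 12) | node */
    unsigned int node_1_2_3     = (1u << 24) | (2u << 12) | 3u;    /* 0x01002003 */
    unsigned int slave_2_1_3333 = (2u << 24) | (1u << 12) | 3333u; /* <2.1.3333> */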
Functional Addressing - Unicast
- Function Address
  - Persistent, reusable 64-bit port identifier assigned by the user
  - Consists of a type number and an instance number
- Function Address Sequence
  - A sequence of function addresses with the same type
[Figure: as before, the client calls sendto(type foo, instance 33), delivered to Server Process, Partition A (foo, 0-99); Server Process, Partition B holds (foo, 100-199)]
Functional Addressing - Multicast
- Based on Function Address Sequences
- Any partition overlapping with the range used in the destination address receives a copy of the message
- The client defines the multicast group per call
[Figure: as before, sendto(type foo, lower 33, upper 133) reaches both Partition A (foo, 0-99) and Partition B (foo, 100-199)]
Location Transparency
- Location of the server is not known by the client
- Lookup of the physical destination is performed on-the-fly
- Efficient: no secondary messaging involved
[Figure sequence: the client on Node <1.1.1> calls sendto(type foo, lower 33, upper 133); the message reaches Server Process, Partition A (foo, 0-99) whether that process runs on the same node or on Node <1.1.2>; Server Process, Partition B holds (foo, 100-199)]
Address Binding
- Many sockets may bind to the same partition
- Closest-First or Round-Robin lookup algorithm, chosen by the client
- The same socket may bind to many partitions
- The same socket may bind to different functions (see the sketch below)
[Figure sequence: two server processes both bind (type foo, lower 0, upper 99); one process binds both (foo, 0-99) and (foo, 100-199); another binds (foo, 0-99) and (bar, 0-999); the client's sendto(type foo, lower 33, upper 133) is delivered according to the chosen algorithm]
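
A sketch of the multi-bind cases, assuming the later mainline semantics where repeated bind() calls publish additional names on one socket; the type numbers 1000 and 1001 for "foo" and "bar" are made-up example values:

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    int sd = socket(AF_TIPC, SOCK_RDM, 0);
    struct sockaddr_tipc name;
    memset(&name, 0, sizeof(name));
    name.family = AF_TIPC;
    name.addrtype = TIPC_ADDR_NAMESEQ;
    name.scope = TIPC_CLUSTER_SCOPE;

    /* Publish (foo, 0-99)... */
    name.addr.nameseq.type = 1000;         /* "foo" */
    name.addr.nameseq.lower = 0;
    name.addr.nameseq.upper = 99;
    bind(sd, (struct sockaddr *)&name, sizeof(name));

    /* ...and (bar, 0-999) on the very same socket. */
    name.addr.nameseq.type = 1001;         /* "bar" */
    name.addr.nameseq.upper = 999;
    bind(sd, (struct sockaddr *)&name, sizeof(name));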
Functional Topology Subscription
- Function Address / Address Partition bind/unbind events (see the sketch below)
[Figure: the client calls subscribe(type foo, lower 0, upper 500) and is notified as Server Process, Partition A binds (foo, 0-99) and Server Process, Partition B binds (foo, 100-199)]
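
A sketch of such a subscription against the topology-server interface of the later mainline AF_TIPC (type number 1000 again stands in for "type foo"): connect to the built-in topology server, send a struct tipc_subscr, and read struct tipc_event notifications:

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);

    /* The topology server listens on the reserved name
     * {TIPC_TOP_SRV, TIPC_TOP_SRV}. */
    struct sockaddr_tipc topsrv;
    memset(&topsrv, 0, sizeof(topsrv));
    topsrv.family = AF_TIPC;
    topsrv.addrtype = TIPC_ADDR_NAME;
    topsrv.addr.name.name.type = TIPC_TOP_SRV;
    topsrv.addr.name.name.instance = TIPC_TOP_SRV;
    connect(sd, (struct sockaddr *)&topsrv, sizeof(topsrv));

    struct tipc_subscr sub;
    memset(&sub, 0, sizeof(sub));
    sub.seq.type = 1000;                   /* hypothetical "type foo" */
    sub.seq.lower = 0;
    sub.seq.upper = 500;
    sub.timeout = TIPC_WAIT_FOREVER;
    sub.filter = TIPC_SUB_SERVICE;         /* partition-level bind/unbind */
    send(sd, &sub, sizeof(sub), 0);

    struct tipc_event evt;                 /* evt.event is TIPC_PUBLISHED
                                              or TIPC_WITHDRAWN */
    recv(sd, &evt, sizeof(evt), 0);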
Network Topology Subscription
- Node/Cluster/Zone availability events
- Same mechanism as for function events (sketch continued below)
[Figure: the client on Node <1.1.1> calls subscribe(type node, lower 0x1001000, upper 0x1001009) and receives events (node, 0x1001002) and (node, 0x1001003) as Node <1.1.2> and Node <1.1.3> become available]
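
Extending the subscription sketch above to node availability; the reserved name type 0 for node state (TIPC_NODE_STATE in later mainline headers) and per-publication filtering are assumptions from that later API:

    /* Same socket and struct as in the previous sketch. */
    sub.seq.type = 0;                      /* reserved node-state type
                                              (TIPC_NODE_STATE later on) */
    sub.seq.lower = 0x01001000;            /* nodes <1.1.0> .. */
    sub.seq.upper = 0x01001009;            /* .. <1.1.9> */
    sub.filter = TIPC_SUB_PORTS;           /* one event per node publication */
    send(sd, &sub, sizeof(sub), 0);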
ForCES Applied on TIPC
[Figure: network equipment with one Control Element (hosting OSPF, RIP, COPS, CLI, SNMP and other applications) connected to a Forwarding Element via the ForCES protocol over TIPC]
ForCES Applied on TIPC
[Figure: the same network equipment scaled out to several Control Elements and several Forwarding Elements, all interconnected via the ForCES protocol over TIPC]
CONNECTIONS
- Establishment based on functional addressing
  - Selectable lookup algorithm, partitioning, redundancy, etc.
- No protocol messages exchanged during setup/shutdown
  - Only payload-carrying messages (see the server-side sketch below)
  - Traditional TCP-style connection setup/shutdown as an alternative
- End-to-end flow control
  - SOCK_SEQPACKET
  - SOCK_STREAM
- SOCK_RDM for connectionless and multicast
- SOCK_DGRAM can easily be added if needed
- The same holds for an unreliable SOCK_SEQPACKET
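
A hedged server-side sketch (later mainline API; type 1000 once more stands in for "type foo"): a SOCK_SEQPACKET listener bound to partition B, where accept() yields a connection without any dedicated handshake messages on the wire:

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    int lsd = socket(AF_TIPC, SOCK_SEQPACKET, 0);
    struct sockaddr_tipc srv;
    memset(&srv, 0, sizeof(srv));
    srv.family = AF_TIPC;
    srv.addrtype = TIPC_ADDR_NAMESEQ;
    srv.scope = TIPC_CLUSTER_SCOPE;
    srv.addr.nameseq.type = 1000;          /* "foo" */
    srv.addr.nameseq.lower = 100;
    srv.addr.nameseq.upper = 199;          /* partition B */
    bind(lsd, (struct sockaddr *)&srv, sizeof(srv));
    listen(lsd, 5);

    /* Per the slides, the association is carried by payload messages
     * themselves; accept() surfaces it once the peer's data arrives. */
    int csd = accept(lsd, NULL, NULL);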
CONNECTIONS
- No protocol messages exchanged during setup/shutdown
- Only payload-carrying messages
[Figure sequence: the client issues sendto(type foo, instance 117) towards Server Process, Partition B (foo, 117); then connect(client) followed by send(); then connect(server)]
CONNECTIONS
- Immediate abortion event in case of peer process crash
- Immediate abortion event in case of peer node crash
- Immediate abortion event in case of communication failure
- Immediate abortion event in case of node overload
[Figure sequence: a connection between a client on Node <1.1.5> and Server Process, Partition B on Node <1.1.3> is aborted in each of the failure cases]
Network Redundancy
- Retransmission protocol and congestion control at signalling link level
- Normally two links per node pair, for full load sharing and redundancy
- Smooth failover in case of single link failure, with no consequences for user-level connections
[Figure: two parallel links between Node <1.1.5> and Node <1.1.3> carrying a connection to Server Process, Partition B; traffic fails over to the remaining link when one fails]
Remaining Work
- Implementation
  - Reliable multicast not fully implemented yet (expected end of Q1)
  - Re-stabilization after the most recent changes
  - Re-implementation of multi-cluster neighbour detection and link setup
- Protocol
  - Fully manual inter-cluster link setup
  - Guaranteeing Name Table consistency between clusters
  - Slave node Name Table reduction
  - ?????
http://tipc.sourceforge.net
QUESTIONS?