Title: Substrate Control: Overview
1Substrate Control Overview
- Fred Kuhns
- fredk_at_arl.wustl.edu
- Applied Research Laboratory
- Washington University in St. Louis
2Overview
- Last time
  - control software architecture
  - component responsibilities
  - basic abstractions: meta-interface and tunnels
  - TCAM: slice-oriented view of lookup filters and example IPv4 LPM entries
- This time
  - SW architecture update
  - assignments and current efforts
  - assigning bandwidth and queue weights
  - allocating code options and NPE resources
3System Block Diagram
[Figure: system block diagram of the SPP node. Labeled elements: PLC; Control Processor (CP); Boot and Configuration Control (BCC); tftp, dhcpd, sshd, httpd, routed; Resource DB, route DB, Slivers DB, boot files, user info, flow stats, nodeconf.xml; Shelf manager; I2C (IPMI); Hub; Fabric Ethernet Switch (10Gbps, data path); Base Ethernet Switch (1Gbps, control); LC; Substrate Control Daemon (SCD); ARP Table, FIB, NAT, Tunnel filters (in/out); RTM; External Interfaces (10 x 1GbE); NPE (three) and GPE (two) boards; xscale, NPU-A, NPU-B, TCAM, SPI, PCI, GE interfaces; vnet, user slivers, pl_netflow, flow stats (netflow); Standalone GPEs; bootcd (cacert.pem, boot_server, plnote.txt); sppnode.txt; Power Control Unit (has own IP Address).]
Notes on the figure:
- ReBoot how??
- move pl_netflow to cp?
- manage LC Tables
- All flow monitoring done at Line Card
4The SPP Node
- Slice instantiation
  - Allocate VM instance on a GPE
  - May request code option instance, NPE resources and interface bandwidth
- Share a common set of (global) IP addresses
  - UDP/TCP port space shared across GPE/NPEs
  - Line card TCAM filters direct traffic
    - Send unregistered traffic originating outside the node to CP.
    - Unregistered traffic originating within node uses NAT (on line card).
    - Application may register server ports. This causes a filter to be inserted in the line card directing traffic to a specific GPE.
    - Application must register ports (or tunnels) associated with fast path instances.
- It is assumed that fast path instances will use tunnels (overlays) to send traffic between routing nodes.
  - Currently we only support UDP tunnels but will extend to include GRE and possibly others.
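[Figure: the SPP node between the Internet and its boards. Labeled elements: LC with Ingress and Egress paths, Ingress lookup table, TCAM, SCD (ARP, NAT), IP route and ARP; fabric; NPE with SRAM, code option and fast path FPx; GPE with vmx, NMP, RMP and planetlab OS; CP with SNM and SRM; local delivery and exceptions (UDP Tunnel).]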
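The line card steering rules on this slide can be sketched as a small classifier. This is only an illustration: on the node the decision is made by TCAM filters, and the names `Packet`, `steer` and the `registered` table are hypothetical. The sketch also assumes that traffic leaving the node from a registered port bypasses NAT, which the slide implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    proto: str           # "udp" or "tcp"
    src_port: int
    dst_port: int
    from_outside: bool   # True if the packet originated outside the SPP node


def steer(pkt, registered):
    """Return where the line card sends pkt (hypothetical model).

    registered maps (proto, port) -> board name such as "GPE2" or "NPE1",
    standing in for the filters inserted when a slice registers a server
    port or a fast path tunnel.
    """
    if pkt.from_outside:
        # A registered port matches a specific filter; everything else
        # arriving from outside the node is sent to the CP.
        return registered.get((pkt.proto, pkt.dst_port), "CP")
    if (pkt.proto, pkt.src_port) in registered:
        # Assumption: registered ports leave the node without translation.
        return "egress"
    # Unregistered traffic originating within the node uses NAT.
    return "NAT"
```

For example, with `registered = {("udp", 5000): "NPE1"}` an external UDP packet to port 5000 is steered to `NPE1`, while an external TCP packet to port 22 falls through to the CP.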
5Key Software Control Components
[Figure: key software control components. Labeled elements: Primary Hub (Logical Slot 1, Channel 1) with snmpd, Fabric SW and Base SW (each with a vlan Table), SFP and XFP ports; CP with SRM and Resource DB; GPE with vmx, NMP, RMP, root context, planetlab OS and vnet; NPE with SCD, SRAM, TCAM, MUX and fast paths FPk, FPx; LC with SCD, TCAM and control; SP.]
Annotations on the links between components:
- Allocate VLANs, enable ports, stats
- Assign slices to GPE. Boot management and PLC proxy.
- Filter management, BW allocations and stats
- Slice requests to allocate or free resources.
- Resource allocations and slice bindings
- Slice owned resource management
- Exception and Local delivery traffic. Includes shim header with RxMI.
6Software Control Components
- Utilities: parts of BCC to generate config and distribution files
  - Node configuration and management: generate config files, dhcp, tftp, ramdisk
  - Boot CD and distribution file management (images, RPM and tar files) for GPEs and CP.
- Control processor
  - Boot and Configuration Control (BCC): GPEs retrieve boot images and scripts from BCC
  - System Resource Manager (SRM): Runtime central resource manager
  - System Node Manager (SNM): Interface with PlanetLab Central (PLC)
  - http daemon providing a node specific interface to netflow data (planetflow)
  - netflow runtime database and data management
  - User authentication and ssh forwarding daemon
  - Routing protocol daemon (BGP/OSPF/RIP) for maintaining FIB in Line Card
- General Purpose Element (GPE)
  - Local Boot Manager (LBM): Modified BootManager running on the GPEs
  - Resource Manager Proxy (RMP)
  - Node Manager Proxy (NMP): the required changes to existing Node Manager software.
- Network Processor Element (NPE)
  - Substrate Control Daemon (SCD)
  - TCAM Library
  - kernel module to read/write memory locations (wumod)
Working, fredk
7Slice-Centric View
[Figure: slice-centric view of the fast path for slice x. Meta-interfaces MI1, MI2, ..., MIn are tunnels with endpoints (myIP, Port1) ... (myIP, Portn). Each meta-interface has a wrr scheduler with a minimum bandwidth (BW1,min, BW2,min, ..., BWn,min) and a set of queues (q0, qi, qj, qk, ql, qGPE). Other labels: TCAM (Filters), stats, SRAM block, DRAM block, VLAN, GPE, qparams (qlen, threshold, weight), substrate slice state (max Buffers, max weights).]
- Allocate and free fast path: code option instance, NPE resources and interface BW
- Manage interfaces
  - Get interface attributes: ifn, type, ipaddr, linkBW, availBW, ...
  - If peering then get peer's IP address
  - Allocate aggregate interface bandwidth
  - Allocate external port number(s)
- Define meta-interfaces
  - Substrate adds line card filter(s)
  - Slice may specify minimum BW
- Associate queues with meta-interfaces
  - Substrate has to map meta-interface numbers used in TCAM filters to the corresponding local addresses
- Manage queue parameters, get queue length
  - threshold, bandwidth (weight)
- Manage TCAM filters
  - add, remove, update, get, lookup
  - Substrate remaps slice ids (qid, fid, mi, stats) to global identifier
- One-time or periodic statistics
  - Periodic uses polled or callback model
- Read and write SRAM
  - Substrate verifies address and length
  - Extended to also support DRAM memory
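The "substrate verifies address and length" step can be sketched as a bounds check on a per-slice window into shared memory. This is a minimal illustration, not the SCD code: the class `SliceMemory`, its constructor arguments and the use of `ValueError` are assumptions, and a `bytearray` stands in for the NPE's SRAM.

```python
class SliceMemory:
    """Per-slice window into a shared SRAM image (hypothetical sketch)."""

    def __init__(self, sram, base, size):
        if base < 0 or size < 0 or base + size > len(sram):
            raise ValueError("slice block lies outside SRAM")
        self._sram = sram    # shared memory image
        self._base = base    # start of this slice's block
        self._size = size    # length of this slice's block in bytes

    def _check(self, offset, length):
        # Offsets are relative to the slice's block; reject anything
        # negative or reaching past the end of that block.
        if offset < 0 or length < 0 or offset + length > self._size:
            raise ValueError("access outside the slice's block")

    def mem_write(self, offset, length, data):
        if len(data) != length:
            raise ValueError("length does not match data")
        self._check(offset, length)
        start = self._base + offset
        self._sram[start:start + length] = data

    def mem_read(self, offset, length):
        self._check(offset, length)
        start = self._base + offset
        return bytes(self._sram[start:start + length])
```

Because the slice only ever supplies block-relative offsets, it never needs to know the absolute address of its block, and the same check could cover a DRAM block.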
8RMP Interface
Each entry lists the return value(s) followed by the call.
Fast path:
  retcode alloc_fast_path(copt, atype, attrs)
  retcode free_fast_path()
Interfaces (entry: ifn, type, ipaddr, linkBW, availBW):
  entry,... get_interfaces()
  entry get_ifattrs(ifn)
  ipaddr get_ifpeer(ifn)
  retcode alloc_ifbw(ifn, bw)
  port alloc_port(ipaddr, port, proto)
Endpoints (meta-interfaces):
  mi add_endpoint(ep_type, params, BW)
  mi add_udp_endpoint(ipaddr, port, BW)
  ep_type, params get_endpoint(mi)
Queues:
  retcode bind_queue(mi, list_type, qid_list)
  retcode set_queue_params(qid, thresh, weight)
  threshold, weight get_queue_params(qid)
  qlen get_queue_len(qid)
Filters:
  retcode write_fltr(fid, keyN, maskN, resultM)
  retcode update_result(fid, resultM)
  fltr_t get_fltr(fid), fltr_t get_fltr(keyN)
  resultM lookup_fltr(key)
  retcode rem_fltr(fid), retcode rem_fltr(keyN)
Statistics:
  uint32, tstamp read_stats(index, location, what)
  handle create_periodic(id, P, cnt, type, loc, what)
  retcode delete_periodic(handle)
  retcode set_callback(handle, xport)
  stats_t get_periodic(handle)
Memory:
  ret_t mem_write(offset, len, data)
  data mem_read(offset, len)
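A typical call sequence against this interface can be sketched with an in-memory stand-in. `FakeRMP` below is hypothetical: it borrows the call names from the slide, but the return codes (0 for success, -1 for failure), the way meta-interface numbers are assigned, and the ignored `list_type` argument are assumptions made only for illustration.

```python
class FakeRMP:
    """In-memory stand-in for the RMP; not the real daemon."""

    OK, FAIL = 0, -1   # assumed retcode values

    def __init__(self, interfaces):
        # interfaces: {ifn: {"type", "ipaddr", "linkBW", "availBW"}}
        self.ifs = interfaces
        self.endpoints = {}   # mi -> (ipaddr, port, bw)
        self.bindings = {}    # mi -> list of slice-local qids
        self.qparams = {}     # qid -> (threshold, weight)

    def get_interfaces(self):
        return [dict(ifn=ifn, **attrs) for ifn, attrs in self.ifs.items()]

    def alloc_ifbw(self, ifn, bw):
        entry = self.ifs.get(ifn)
        if entry is None or bw > entry["availBW"]:
            return self.FAIL
        entry["availBW"] -= bw
        return self.OK

    def add_udp_endpoint(self, ipaddr, port, bw):
        mi = len(self.endpoints) + 1   # assumed numbering scheme
        self.endpoints[mi] = (ipaddr, port, bw)
        return mi

    def bind_queue(self, mi, list_type, qid_list):
        # list_type is accepted but ignored in this sketch.
        if mi not in self.endpoints:
            return self.FAIL
        self.bindings.setdefault(mi, []).extend(qid_list)
        return self.OK

    def set_queue_params(self, qid, thresh, weight):
        self.qparams[qid] = (thresh, weight)
        return self.OK

    def get_queue_params(self, qid):
        return self.qparams[qid]


# Reserve bandwidth, define a UDP tunnel endpoint, then bind and tune queues.
rmp = FakeRMP({1: {"type": "Internet", "ipaddr": "192.0.2.1",
                   "linkBW": 1_000_000_000, "availBW": 1_000_000_000}})
status = rmp.alloc_ifbw(1, 100_000_000)
mi = rmp.add_udp_endpoint("192.0.2.1", 5000, 100_000_000)
bound = rmp.bind_queue(mi, "qids", [0, 1])
rmp.set_queue_params(0, thresh=64, weight=2)
```

The order mirrors the slice-centric view: interface bandwidth first, then the meta-interface, then the queues that feed it.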
9Short Term Milestones
- 21/09/07
  - SRM-SCD: alloc_fp(). Does not include tcamLib. fred, mart.
  - RMP-SRM: noop(). fred, mike.
- 28/09/07 (1 week delta)
  - SRM-SCD: rem_fp(xsid). Does not include tcamLib. fred, mart.
  - tcamLib: API tests, config file with search machine and multiple DBs, reasonably complex DB (say using jdd's configurations for SIGCOMM). jonathon.
  - Rudiments of SNMP interface to SRM. fred
- 05/10/07 (1 week delta)
  - SCD: alloc_fp/free_fp using tcamLib, retest asynchronous copt_freed(). Includes configuration file with target search machines and default entries for both NPE and LC. mart, jonathon.
  - SCD: simple client driven tests of tcam operations (add, remove, update, lookup). mart, jonathon
  - SCD-SRM: fp_freed(xsid). Asynchronous free when slice queues are non-empty. fred, mart.
  - RMP-SRM-SCD: alloc_fp(...) and free_fp(). mike, mart, fred.
- 12/10/07 (1 week delta)
  - RMP: send commands from slice to RMP using UNIX domain sockets. Map slice to its planetlab id (PlabID). fred, mike
  - Configure HUB using snmp from srm: initialization, hardware discovery, add/remove VLAN. fred
- 19/10/07 (1 week delta)
  - IDT kernel module, locking. jonathon
  - SRM: interface and bandwidth management. Verify interface management with simple client: get_interfaces(), get_ifattrs(), get_ifpeer(), alloc_ifbw(). fred
  - RMP-SCD: tcam operations write_filter(), update_result(), get_filter(), lookup_fltr(), rem_fltr(). Must add code to map MI in filter to internal representation and prepend the VLAN tag. mike, jonathon, mart.
10Example Outlining Slice Interface and Abstractions
Slice Interface and Queue Allocations: Port, BW, QList
- QList: qid, weight, threshold, ...
Physical Port (Interface) Attributes: ifn, type, ipaddr, linkBW, availBW
- ifn: Interface number
- type: Internet, Peering
- Operations: get_interfaces(), get_ifattrs(ifn), get_ifpeer(ifn), alloc_ifbw(ifn, bw)
[Figure: two fast path slices on the NPE feeding one interface on the LC. FP slice1 owns queues q10, q11, ..., q1n (qid in 0...n-1) and FP slice2 owns queues q20, q21, ..., q2m (qid in 0...m-1), each set served by wrr. Other labels: FP1, FP2, BW1, BW11, BW21, ipAddr, linkBW, GPE.]
11
- QM throughput estimates, up to 20 schedulers
  - 5 schedulers per microengine, 4 uengines
  - 2.5 Gbps per microengine
  - 1 Gbps per scheduler
- Add scd commands
  - initialize static code option/substrate memory/tables
    - parse block
    - header format
    - queue manager
    - ???
  - load microengine code
- Use second VLAN tag to represent meta-interface
12Single Interface Example
- LC Ingress
  - One queue per slice with reserved bandwidth (really one per scheduler)
  - One queue for best effort traffic to each GPE
  - One scheduler for CP with queues for reserved traffic plus BE
- LC Egress
  - At least one scheduler for each physical interface
  - One queue for each active slice with MI defined for the associated scheduler
  - One best effort queue for each board (GPE, CP, NPE?)
- NPE
  - Slice binds queues to meta-interfaces, hence physical interfaces
  - Slice either reserves BW on a physical interface or it is assigned the minimum
  - Substrate assigns a per interface maximum weight for each slice
  - Substrate sets scheduler rates according to aggregate allocations
- Manage scheduler rates to control aggregate traffic to interfaces and boards.
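[Figure: single interface example across LC ingress, NPE and LC egress. LC ingress filters on dst addr, proto, port/icmp and feeds wrr schedulers SchedNPE1 (queues qxs1, qxs2, ..., qxsn), SchedGPE1 (queues qps1, qps2, ..., qpsn, qBE) and SchedCP. The NPE holds per-slice queues with weights (q11 w11, q12 w12, ..., q1n w1n; qp1 wp1, qp2 wp2, ..., qpm wpm) plus local delivery and exception traffic, sending BWNPE1,I1 toward interface 1 and BWNPE1,GPE1 toward GPE1. LC egress filters on src addr, proto, port/icmp and feeds SchedI1 (queues qs1, qs2, ..., qsn, qGPE, qCP) for interface 1 at BWI1. Other labels: VLAN, qw,GPE1.]
Annotations on the figure:
- Total weight for all slices (i) and queues (j)
- max weight for interface I1 (Wk)
- slices minimum allocated BW
- scheduler rate
- minimum weight 1 MTU sized packet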
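One way to picture the weight and rate rules on this slide is the sketch below. It is only an illustration under stated assumptions: the slides do not give a formula, so the proportional split, the function names `slice_max_weights` and `scheduler_rate`, the byte-valued weights and the 1500-byte MTU are all hypothetical.

```python
MTU = 1500  # bytes; assumed value for the "1 MTU sized packet" minimum weight


def _effective(reserved_bw, min_bw):
    # A slice that reserved nothing on the interface is assigned the minimum.
    return {s: (bw if bw > 0 else min_bw) for s, bw in reserved_bw.items()}


def slice_max_weights(reserved_bw, total_weight, min_bw):
    """Split an interface's weight budget across slices in proportion to BW.

    reserved_bw:  {slice: bits/second reserved on this interface, 0 if none}
    total_weight: weight budget W_k for the interface's scheduler
    min_bw:       bandwidth given to slices without a reservation (> 0)
    """
    if min_bw <= 0:
        raise ValueError("min_bw must be positive")
    eff = _effective(reserved_bw, min_bw)
    total_bw = sum(eff.values())
    # No slice gets less than one MTU of weight, so with a small budget the
    # assigned weights can add up to more than total_weight.
    return {s: max(MTU, total_weight * bw // total_bw) for s, bw in eff.items()}


def scheduler_rate(reserved_bw, min_bw, link_bw):
    """Aggregate allocation on the interface, capped at the link rate."""
    return min(sum(_effective(reserved_bw, min_bw).values()), link_bw)
```

With reservations of 300, 100 and 0 Mbps, a 100 Mbps minimum and a weight budget of 100000, the three slices would receive maximum weights of 60000, 20000 and 20000, and the scheduler rate would be the 500 Mbps aggregate unless the link is slower.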
13Two Interface Example Setting Queue Weights
Slice i, slice qid j and scheduler k.
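[Figure: two interface example. On the NPE, FP1 owns queues q10, q11, ..., q1n and FP2 owns queues q20, q21, ..., q2m; the queues are grouped under separate wrr schedulers for traffic to interface 1 and traffic to interface 2. On the LC, interface 1 (IP1) and interface 2 (IP2) each have their own linkBW and GPE traffic. Bandwidth labels: BW1, BW2, BW11, BW12.]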
14Allocating Code Option Instance
- Slice to RMP
  - npeIP alloc_fast_path(copt, atype, attrs)
    - uint16_t copt: NPE code option {IPv4 = 0, I3 = 1}
    - uint16_t atype: Reservation type {Shared = 0, Firm = 1}
    - uint32_t attrib[]: Array of resource allocation parameters
    - attrib_t:
      - uint32_t bw, pps // bits/second, packets/second
      - uint32_t fltrs, queues, buffers, stats // totals
      - uint32_t sram, dram // memory block size in Bytes
- RMP to SRM
  - xsid, npeIP alloc_fast_path(PlabID, copt, atype, attrs)
    - uint32_t PlabID: GPE/PlanetLab slice identifier. The SRM allocates an internal slice identifier (xsid) unique within the SPP node. All substrate operations use the xsid.
- SRM to SCD
  - set_fast_path(xsid, copt, VLAN, TParams, Mem)
    - uint16_t xsid: internal slice id.
    - uint16_t VLAN
    - uint32_t TParams: Qs, Fltrs, Buffers, Stats
    - mem_t Mem: SRAMOffset, Size, DRAMOffset, Size
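The three hops above can be sketched as data types plus a toy SRM that maps a PlabID to an xsid. This is a hedged illustration: the class names, the sequential xsid allocator and the `sent` message log are hypothetical, the enum values follow one reading of the slide's "IPv40,I31" and "Shared 0, Firm 1", and the real messages are exchanged between daemons rather than appended to a list.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import count


class CodeOption(IntEnum):
    IPV4 = 0
    I3 = 1


class AllocType(IntEnum):
    SHARED = 0
    FIRM = 1


@dataclass
class Attrib:
    bw: int        # bits/second
    pps: int       # packets/second
    fltrs: int     # totals
    queues: int
    buffers: int
    stats: int
    sram: int      # memory block size in bytes
    dram: int


class ToySRM:
    """Assigns an xsid per PlabID and records what it would send to the SCD."""

    def __init__(self, npe_ip):
        self._npe_ip = npe_ip
        self._next_xsid = count(1)   # assumption: xsids are handed out in order
        self.xsid_of = {}            # PlabID -> xsid
        self.sent = []               # stand-in for messages to the SCD

    def alloc_fast_path(self, plab_id, copt, atype, attrs, vlan):
        xsid = next(self._next_xsid)
        if xsid >= 2 ** 16:
            raise OverflowError("xsid must fit in uint16_t")
        self.xsid_of[plab_id] = xsid
        # atype is not forwarded: set_fast_path on the slide carries no atype.
        self.sent.append(("set_fast_path", xsid, int(copt), vlan, attrs))
        return xsid, self._npe_ip
```

Keeping the PlabID-to-xsid table only in the SRM matches the statement that all substrate operations use the xsid.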
15Allocating NPE (Creating Meta-Router)
[Figure: message flow for allocating an NPE fast path (FP), with numbered steps 1 to 6. Labeled elements: Host (located within node); GPE with planetlab OS, root context, NMP and RMP; NPE with SCD, PE and fast path FPk; LC with MI1; CP with Resource DB, sliver tbl and user login info; PLC; Substrate; 10GbE (fabric, data) and 1GbE (base, control) switches; VLANk.]
Annotations on the figure:
- Slice PlabID requests code option copt with resources params
- If sufficient resources available then assign internal slice identifier (xsid) and associate with allocation: Slice, VLAN, NPE (copt, fltrs, Qs, stats, buffs, SRAM, DRAM), EP, MI, GPE IP, control Port
- Allocate and Enable VLAN to isolate internal slice traffic, VLANk
- Send message to SCD informing it of the new allocation: xsid, VLAN, params
- Allocate code option: copt, fltrs, Qs, stats, buffs, SRAM, DRAM
- Returns status and assigned global Port number
- Cache assigned xsid. Open local socket for exception and local delivery traffic, return to client vserver
16SRM Allocating NPE Resources
- Actions required to allocate code option instance and resources
- Select NPE
  - Load balance across available NPEs: of the eligible NPEs, select the one with the greatest head room.
  - Eligible if sufficient resources (SRAM, TCAM space, queues, etc)
  - Select NPE with greatest firm BW and PPS. If tie then select greatest available soft resources, else pick lowest numbered.
- Either allocates requested resources or returns error
  - Keeps memory map of SRAM (and DRAM) so it can perform allocation, though the absolute starting address is not required.
  - If compaction is necessary then must communicate with SCD directly.
- Allocate VLAN and configure switch.
- Send command to selected NPE's SCD
  - set_fast_path(xsid,copt,VLAN,Params)
  - SCD updates tables and creates local mappings.
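The NPE selection rule can be sketched as an eligibility filter followed by a tie-breaking sort key. The record layout and names below are hypothetical, soft resources are collapsed into a single number, and comparing firm BW before PPS is an assumption, since the slide does not say how the two are combined.

```python
from dataclasses import dataclass


@dataclass
class NPEState:
    number: int     # slot number, used as the final tie-breaker
    firm_bw: int    # remaining firm bandwidth, bits/second
    firm_pps: int   # remaining firm packets/second
    soft: int       # remaining soft (shared) resources, one aggregate figure
    sram: int       # free SRAM in bytes
    queues: int     # free queues
    fltrs: int      # free TCAM filter entries


def select_npe(npes, need):
    """Return the NPE with the most head room, or None if none is eligible.

    need: dict with the requested "sram", "queues" and "fltrs".
    """
    eligible = [n for n in npes
                if n.sram >= need["sram"]
                and n.queues >= need["queues"]
                and n.fltrs >= need["fltrs"]]
    if not eligible:
        return None   # caller reports an allocation error
    # Greatest firm BW, then PPS, then soft resources, then lowest number.
    return min(eligible,
               key=lambda n: (-n.firm_bw, -n.firm_pps, -n.soft, n.number))
```

Returning `None` rather than a partial allocation corresponds to "either allocates requested resources or returns error".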
17Freeing Code Option Instance (Fast path)
- Slice to RMP and RMP to SRM
  - void free_fast_path(): requires asynchronous processing by SRM and SCD.
- SRM to SCD
  - Success/Pending/Failed rem_fast_path(xsid)
  - SRM first sends a request to the SCD directing it to free all resources assigned to slice xsid.
  - The SCD first disables the fast path and GPE (how?) so no new packets will be processed.
  - Then it checks all queues assigned to xsid. If all are empty then resources are freed and a Success response is sent to the SRM.
  - Else, if packets are in any of the queues, the SCD must send a Pending response to the SRM and periodically check all queues assigned to xsid. When they are all empty the SCD sends an asynchronous Successful-deallocation message (which includes the slice's xsid) to the SRM, notifying it that all resources associated with xsid are now free.
  - If the SCD returns Success then the SRM marks the resources as available and removes the slice from its internal xsid (fast path) tables.
  - If the SCD returns Pending then the SRM registers a callback method which is called when the SCD sends the resource freed message.
  - Regardless of whether the resources are freed immediately or asynchronously, the SRM returns Success to the RMP.
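The Success/Pending handshake described above can be sketched with two small classes. This is a simplified model, not the daemons' code: `ToySCD`, `ToySRM`, the `poll` method and the string status values are hypothetical, the periodic check is driven by hand, and returning "Failed" to the caller for an unknown xsid is an assumption the slide does not cover.

```python
class ToySCD:
    def __init__(self, queues):
        self.queues = queues             # xsid -> {qid: packets still queued}
        self.enabled = set(queues)       # fast paths currently accepting packets
        self.pending = set()             # xsids waiting for queues to drain

    def rem_fast_path(self, xsid):
        if xsid not in self.queues:
            return "Failed"
        self.enabled.discard(xsid)       # no new packets will be processed
        if any(self.queues[xsid].values()):
            self.pending.add(xsid)
            return "Pending"
        del self.queues[xsid]            # all queues empty: free immediately
        return "Success"

    def poll(self, on_freed):
        """One periodic check; report drained slices through on_freed(xsid)."""
        for xsid in sorted(self.pending):
            if not any(self.queues[xsid].values()):
                self.pending.discard(xsid)
                del self.queues[xsid]
                on_freed(xsid)           # asynchronous "resources freed" message


class ToySRM:
    def __init__(self, scd, fast_paths):
        self.scd = scd
        self.fast_paths = set(fast_paths)   # xsids with allocated resources
        self.awaiting = set()               # xsids whose free is still pending

    def free_fast_path(self, xsid):
        status = self.scd.rem_fast_path(xsid)
        if status == "Failed":
            return "Failed"                 # assumption, see lead-in
        if status == "Success":
            self.fast_paths.discard(xsid)
        else:
            self.awaiting.add(xsid)         # completed later by fp_freed()
        return "Success"                    # immediate or asynchronous free

    def fp_freed(self, xsid):
        self.awaiting.discard(xsid)
        self.fast_paths.discard(xsid)
```

The SRM keeps a pending slice in its tables until `fp_freed` arrives, so the resources are not handed to another slice while packets are still draining.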
18Comments
- Pace SCD message processing
- Drop threshold using packets or packets and length
- Limit BW over allocations
- Use fast path, not slice
- GPE traffic to NPE turned off when freeing fast path
- How long to wait for Qs to drain
- Turn off FP using vlan table