Transcript and Presenter's Notes

Title: Substrate Control: Overview


1
Substrate Control Overview
  • Fred Kuhns
  • fredk_at_arl.wustl.edu
  • Applied Research Laboratory
  • Washington University in St. Louis

2
Overview
  • Last time
  • control software architecture
  • component responsibilities
  • basic abstractions: meta-interfaces and tunnels,
    TCAM, slice-oriented view of lookup filters, and
    example IPv4 LPM entries
  • This time
  • SW architecture update
  • assignments and current efforts
  • assigning bandwidth and queue weights
  • allocating code options and NPE resources

3
System Block Diagram
[Block diagram of the SPP node. Components shown: Control Processor (CP)
running the Substrate Control Daemon (SCD), Boot and Configuration Control
(BCC), tftp/dhcpd, sshd, httpd, routed, with the Resource DB, Slivers DB,
route DB, user info and boot files (bootcd, cacert.pem, boot_server,
plnote.txt, sppnode.txt, nodeconf.xml); standalone GPEs hosting pl_netflow,
user slivers and flow stats (netflow); NPEs, each with NPU-A/NPU-B, an
xscale control processor, TCAM and PCI/SPI interfaces; a Line Card (LC)
holding the ARP table, FIB and NAT/tunnel filters (in/out), with 10 x 1GbE
external interfaces; a Power Control Unit (with its own IP address); a
shelf manager; a hub; a Fabric Ethernet Switch (10 Gbps, data path); a
Base Ethernet Switch (1 Gbps, control); and I2C (IPMI). All flow
monitoring is done at the Line Card. Annotations on the slide: "ReBoot
how??", "move pl_netflow to cp?", "manage LC Tables", and the external
PLC.]
4
The SPP Node
  • Slice instantiation
  • Allocate VM instance on a GPE
  • may request a code option instance, NPE resources
    and interface bandwidth
  • Share a common set of (global) IP addresses
  • UDP/TCP port space shared across GPEs/NPEs
  • Line card TCAM filters direct traffic (see the
    sketch after this list)
  • send unregistered traffic originating outside the
    node to the CP
  • unregistered traffic originating within the node
    uses NAT (on the line card)
  • an application may register server ports, causing a
    filter to be inserted in the line card directing
    that traffic to a specific GPE
  • an application must register ports (or tunnels)
    associated with fast path instances
  • It is assumed that fast path instances will use
    tunnels (overlays) to send traffic between
    routing nodes.
  • Currently we only support UDP tunnels but will
    extend to include GRE and possibly others.
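
A minimal sketch of the line-card ingress decision described in the
bullets above. The structure and names here are illustrative assumptions
(the real classification is a TCAM filter lookup, not C code):

  #include <stdbool.h>
  #include <stdint.h>

  struct pkt_key { uint32_t dst_ip; uint16_t dst_port; uint8_t proto; };

  struct fltr_result {
      bool     hit;    /* a registered port or fast-path filter matched */
      uint16_t dest;   /* internal destination: GPE sliver, NPE fast path */
  };

  /* Hypothetical wrapper around the line card's TCAM lookup. */
  extern struct fltr_result tcam_lookup(const struct pkt_key *k);

  enum { DEST_CP = 0 };  /* control processor */

  /* Decide where an externally arriving packet is sent inside the node. */
  static uint16_t classify_ingress(const struct pkt_key *k)
  {
      struct fltr_result r = tcam_lookup(k);
      if (r.hit)
          return r.dest;  /* registered server port or fast-path tunnel */
      return DEST_CP;     /* unregistered external traffic goes to the CP */
  }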

[Diagram: traffic from the Internet enters the LC (Ingress/Egress), whose
SCD manages ARP, NAT, the IP route/ARP tables and the ingress lookup
table; matched traffic crosses the fabric to NPE fast paths (FPx, with a
code option, SRAM and TCAM) or to GPEs, which run vmx, the NMP and RMP on
the PlanetLab OS; local delivery and exception traffic reaches the GPEs
over a UDP tunnel; the CP hosts the SRM and SNM.]
5
Key Software Control Components
[Diagram: the Primary Hub (logical slot 1, channel 1) runs snmpd and holds
the VLAN tables for the Fabric SW and Base SW (SFP/XFP ports); the CP's
SRM allocates VLANs, enables ports and collects stats, assigns slices to
GPEs, handles boot management and acts as PLC proxy, backed by the
Resource DB; on each GPE the NMP and RMP (root context, PlanetLab OS,
vnet, vmx) forward slice requests to allocate or free resources, and the
SRM returns resource allocations and slice bindings; the LC and NPE SCDs
perform filter management, BW allocations and stats for the fast paths
(FPk/FPx, TCAM, SRAM, MUX, SP), with slice-owned resource management in
the slice context; exception and local-delivery traffic carries a shim
header with the RxMI.]
6
Software Control Components
  • Utilities: parts of the BCC that generate config and
    distribution files
  • Node configuration and management: generates
    config files (dhcp, tftp, ramdisk)
  • Boot CD and distribution file management (images,
    RPM and tar files) for GPEs and CP
  • Control Processor (CP)
  • Boot and Configuration Control (BCC): GPEs
    retrieve boot images and scripts from the BCC
  • System Resource Manager (SRM): runtime central
    resource manager
  • System Node Manager (SNM): interface with
    PlanetLab Central (PLC)
  • http daemon providing a node-specific interface
    to netflow data (planetflow)
  • netflow runtime database and data management
  • User authentication and ssh forwarding daemon
  • Routing protocol daemon (BGP/OSPF/RIP) for
    maintaining the FIB in the Line Card
  • General Purpose Element (GPE)
  • Local Boot Manager (LBM): modified BootManager
    running on the GPEs
  • Resource Manager Proxy (RMP)
  • Node Manager Proxy (NMP): the required changes to
    the existing Node Manager software
  • Network Processor Element (NPE)
  • Substrate Control Daemon (SCD)
  • TCAM library
  • kernel module to read/write memory locations
    (wumod)

Working, fredk
7
Slice-Centric View
[Diagram: slice-centric view of the substrate slice state for fast path
Slicex: meta-interfaces MI1..MIn, each a tunnel bound to myIP and a port
with a minimum bandwidth BWi,min; per-MI WRR schedulers over queue sets
(q0..qk) with qparams (qlen, threshold, weight); TCAM filters; stats; and
SRAM/DRAM blocks. Also shown: a queue toward the GPE (qGPE), a VLAN tag,
and per-slice maxima on buffers and weights.]
  • Allocate and free fast path code option
    instances, NPE resources and interface BW
  • Manage interfaces
  • Get interface attributes: ifn, type, ipaddr,
    linkBW, availBW, ...
  • If peering, get the peer's IP address
  • Allocate aggregate interface bandwidth
  • Allocate external port number(s)
  • Define meta-interfaces
  • Substrate adds line card filter(s)
  • Slice may specify a minimum BW
  • Associate queues with meta-interfaces
  • Substrate has to map meta-interface numbers used
    in TCAM filters to the corresponding local
    addresses
  • Manage queue parameters, get queue length
  • threshold, bandwidth (weight)

  • Manage TCAM filters
  • add, remove, update, get, lookup
  • Substrate remaps slice-local ids (qid, fid, mi,
    stats) to global identifiers (see the sketch after
    this list)
  • One-time or periodic statistics
  • Periodic uses a polled or callback model
  • Read and write SRAM
  • Substrate verifies address and length
  • Extended to also support DRAM memory
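
The slide only states that the substrate remaps per-slice identifiers to
global ones; a minimal sketch of one way such a mapping table could look
(all names and sizes here are hypothetical, not taken from the SPP code):

  #include <stdint.h>

  #define MAX_QUEUES  64
  #define MAX_FILTERS 128
  #define MAX_MIS     16
  #define MAX_STATS   64

  /* Hypothetical per-slice remapping table: translates the slice's local
   * queue/filter/meta-interface/stats indices into node-global identifiers
   * before any TCAM or queue-manager operation is issued. */
  struct slice_map {
      uint16_t xsid;                /* internal (node-wide) slice id      */
      uint16_t vlan;                /* VLAN tag assigned to the slice     */
      uint32_t qid[MAX_QUEUES];     /* slice qid  -> global queue id      */
      uint32_t fid[MAX_FILTERS];    /* slice fid  -> global filter id     */
      uint32_t mi[MAX_MIS];         /* slice MI   -> local MI address     */
      uint32_t stats[MAX_STATS];    /* slice stats index -> global index  */
  };

  /* Look up the global queue id for a slice-local qid; returns 0 on success. */
  static int map_qid(const struct slice_map *m, uint32_t slice_qid,
                     uint32_t *global_qid)
  {
      if (slice_qid >= MAX_QUEUES)
          return -1;                /* outside the slice's allocation     */
      *global_qid = m->qid[slice_qid];
      return 0;
  }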

8
RMP Interface
qlen               get_queue_len(qid)
retcode            write_fltr(fid, keyN, maskN, resultM)
retcode            update_result(fid, resultM)
fltr               get_fltr(fid)
fltr_t             get_fltr(keyN)
resultM            lookup_fltr(key)
retcode            rem_fltr(fid)
retcode            rem_fltr(keyN)
uint32, tstamp     read_stats(index, location, what)
handle             create_periodic(id, P, cnt, type, loc, what)
retcode            delete_periodic(handle)
retcode            set_callback(handle, xport)
stats_t            get_periodic(handle)
ret_t              mem_write(offset, len, data)
data               mem_read(offset, len)
retcode            alloc_fast_path(copt, atype, attrs)
retcode            free_fast_path()
entry, ...         get_interfaces()   [entry = ifn, type, ipaddr, linkBW, availBW]
entry              get_ifattrs(ifn)
ipaddr             get_ifpeer(ifn)
retcode            alloc_ifbw(ifn, bw)
port               alloc_port(ipaddr, port, proto)
mi                 add_endpoint(ep_type, params, BW)
mi                 add_udp_endpoint(ipaddr, port, BW)
ep_type, params    get_endpoint(mi)
retcode            bind_queue(mi, list_type, qid_list)
retcode            set_queue_params(qid, thresh, weight)
threshold, weight  get_queue_params(qid)
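
A sketch of how a slice might use a few of the calls above from inside its
vserver. Only the call names and arguments come from the slide; the C
bindings, the rmp_ prefix, the return conventions and all literal values
are illustrative assumptions:

  #include <stdint.h>

  /* Assumed C prototypes for the RMP calls named above. */
  extern int rmp_alloc_fast_path(uint16_t copt, uint16_t atype,
                                 const uint32_t *attrs);
  extern int rmp_add_udp_endpoint(uint32_t ipaddr, uint16_t port, uint32_t bw);
  extern int rmp_bind_queue(int mi, int list_type, const uint32_t *qid_list);
  extern int rmp_set_queue_params(uint32_t qid, uint32_t thresh,
                                  uint32_t weight);

  int configure_fast_path(void)
  {
      /* Example resource request: bw, pps, fltrs, queues, buffers, stats,
       * SRAM bytes, DRAM bytes. */
      uint32_t attrs[] = { 100000000, 150000, 32, 8, 4096, 64, 1 << 20, 0 };

      if (rmp_alloc_fast_path(/*copt=*/0, /*atype=*/1, attrs) < 0)
          return -1;

      /* Create a meta-interface: a UDP tunnel endpoint with reserved BW
       * (example IP/port values). */
      int mi = rmp_add_udp_endpoint(/*ipaddr=*/0x80fa0001, /*port=*/0,
                                    /*BW=*/50000000);
      if (mi < 0)
          return -1;

      /* Bind two slice-local queues to the meta-interface and weight them. */
      uint32_t qids[] = { 0, 1 };
      rmp_bind_queue(mi, /*list_type=*/0, qids);
      rmp_set_queue_params(/*qid=*/0, /*thresh=*/256, /*weight=*/10);
      rmp_set_queue_params(/*qid=*/1, /*thresh=*/256, /*weight=*/5);
      return 0;
  }
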
9
Short Term Milestones
  • 21/09/07
  • SRM-SCD alloc_fp(). Does not include tcamLib.
    fred, mart.
  • RMP-SRM noop(). fred, mike.
  • 28/09/07 (1 week delta)
  • SRM-SCD rem_fp(xsid). Does not include tcamLib.
    fred, mart.
  • tcamLib API tests: config file with a search
    machine and multiple DBs, including a reasonably
    complex DB (say, using jdd's configurations for
    SIGCOMM). jonathon.
  • rudiments of SNMP interface to SRM. fred
  • 05/10/07 (1 week delta)
  • SCD alloc_fp()/free_fp() using tcamLib; retest
    asynchronous copt_freed(). Includes configuration
    file with target search machines and default
    entries for both NPE and LC. mart, jonathon.
  • SCD simple client-driven tests of TCAM
    operations (add, remove, update, lookup). mart,
    jonathon
  • SCD-SRM fp_freed(xsid). Asynchronous free when
    slice queues are non-empty. fred, mart.
  • RMP-SRM-SCD alloc_fp(...) and free_fp(). mike,
    mart, fred.
  • 12/10/07 (1 week delta)
  • RMP: send commands from slice to RMP using UNIX
    domain sockets. Map the slice to its PlanetLab id
    (PlabID). fred, mike
  • Configure HUB using SNMP from the SRM:
    initialization, hardware discovery, add/remove
    VLAN. fred
  • 19/10/07 (1 week delta)
  • IDT kernel module, locking. jonathon
  • SRM interface and bandwidth management: verify
    interface management with a simple client using
    get_interfaces(), get_ifattrs(), get_ifpeer(),
    alloc_ifbw(). fred
  • RMP-SCD TCAM operations: write_filter(),
    update_result(), get_filter(), lookup_fltr(),
    rem_fltr(). Must add code to map the MI in a filter
    to its internal representation and prepend the VLAN
    tag. mike, jonathon, mart.

10
Example Outlining Slice Interface and Abstractions
[Diagram: slice interface and queue allocations specify a port, BW and a
QList (qid, weight, threshold, ...). Physical port (interface) attributes
are ifn (interface number), type (Internet or Peering), ipaddr, linkBW and
availBW, accessed with get_interfaces(), get_ifattrs(ifn), get_ifpeer(ifn)
and alloc_ifbw(ifn, bw). The figure shows two fast paths, FP slice1 and FP
slice2, on the NPE, each with a WRR-scheduled queue set (q10..q1n with qid
in 0..n-1 and BW11; q20..q2m with qid in 0..m-1 and BW21) feeding the LC,
where the interface's linkBW at ipAddr is shared among BW11, BW21, BW1 and
GPE traffic.]
11
  • QM throughput estimates: up to 20 schedulers
  • 5 schedulers per microengine, 4 microengines
  • 2.5 Gbps per microengine
  • 1 Gbps per scheduler
  • Add SCD commands to
  • initialize static code option/substrate
    memory/tables
  • parse block
  • header format
  • queue manager
  • ???
  • load microengine code
  • Use a second VLAN tag to represent the
    meta-interface (see the sketch after this list)
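
As a concrete illustration of the last bullet, a doubly VLAN-tagged frame
header might look like the sketch below. The TPID values and field layout
follow standard 802.1Q/802.1ad conventions and are assumptions, not
something specified on the slide:

  #include <stdint.h>

  /* One 802.1Q tag: TPID followed by PCP(3) | DEI(1) | VID(12). */
  struct vlan_tag {
      uint16_t tpid;          /* e.g. 0x88a8 (outer) or 0x8100 (inner)    */
      uint16_t tci;
  } __attribute__((packed));

  /* Sketch of a stacked-VLAN Ethernet header: the outer tag isolates the
   * slice's fast path (the VLAN allocated by the SRM) and the inner tag
   * would carry the meta-interface number, per the bullet above. */
  struct qinq_eth_hdr {
      uint8_t  dst[6];
      uint8_t  src[6];
      struct vlan_tag outer;  /* slice / fast-path VLAN                   */
      struct vlan_tag inner;  /* meta-interface number in the 12-bit VID  */
      uint16_t ethertype;     /* e.g. 0x0800 for IPv4                     */
  } __attribute__((packed));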

12
Single Interface Example
  • LC Ingress
  • One queue per slice with reserved bandwidth
    (really one per scheduler)
  • One queue for best-effort traffic to each GPE
  • One scheduler for the CP, with queues for reserved
    traffic plus best effort
  • LC Egress
  • At least one scheduler for each physical
    interface
  • One queue for each active slice with an MI defined
    on the associated scheduler
  • One best-effort queue for each board (GPE, CP,
    NPE?)
  • NPE
  • Slice binds queues to meta-interfaces, and hence to
    physical interfaces
  • Slice either reserves BW on a physical interface
    or is assigned the minimum
  • Substrate assigns a per-interface maximum weight
    for each slice
  • Substrate sets scheduler rates according to
    aggregate allocations (see the sketch after this
    list)
  • Manage scheduler rates to control aggregate
    traffic to interfaces and boards.
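
A minimal sketch of the last two bullets. The rate computation and all
names are assumptions, not taken from the SPP software: the substrate
could set each egress scheduler's rate to the sum of the reserved
bandwidths bound to that interface plus a best-effort allowance, capped at
the link rate.

  #include <stdint.h>

  struct slice_alloc {
      uint32_t reserved_bw;   /* bits/s reserved by the slice on this ifn */
  };

  /* Hypothetical aggregate-rate computation for one egress scheduler. */
  static uint32_t scheduler_rate(const struct slice_alloc *s, int nslices,
                                 uint32_t best_effort_bw, uint32_t link_bw)
  {
      uint64_t total = best_effort_bw;
      for (int i = 0; i < nslices; i++)
          total += s[i].reserved_bw;
      return total > link_bw ? link_bw : (uint32_t)total;
  }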

[Diagram: on the ingress side of interface 1, the LC classifies on
destination address, protocol and port/ICMP into per-NPE WRR schedulers
(SchedNPE1 with queues qxs1..qxsn), per-GPE schedulers (SchedGPE1 with
qps1..qpsn plus a best-effort queue qBE), and SchedCP. On the egress side,
the LC classifies on source address, protocol and port/ICMP into a
per-interface scheduler SchedI1 with one queue and weight per slice
(q11 w11, q12 w12, ..., q1n w1n, ..., qp1 wp1, qp2 wp2, ..., qpm wpm) plus
qGPE and qCP, running at scheduler rate BWI1; the total weight over all
slices i and queues j is bounded by the maximum weight Wk for interface
I1, weights reflect each slice's minimum allocated BW, and the minimum
weight corresponds to one MTU-sized packet. NPE-to-GPE local delivery and
exception traffic is carried on a VLAN with bandwidth BWNPE1,GPE1, and NPE
egress toward interface 1 has bandwidth BWNPE1,I1.]
13
Two Interface Example: Setting Queue Weights
Notation: slice i, slice queue id j, scheduler k.
[Diagram: an NPE with two fast paths, FP1 and FP2, attached through the LC
to two physical interfaces IP1 (BW1, linkBW) and IP2 (BW2, linkBW). Each
fast path has WRR-scheduled queue sets, q10..q1n toward interface 1 and
q20..q2m toward interface 2, with per-slice reserved bandwidths such as
BW11 and BW12, plus GPE traffic sharing each interface.]
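
The slides do not spell out the weight formula. One plausible assignment,
consistent with the per-interface maximum weight and minimum-bandwidth
bullets on the previous slide (this is an assumption, not from the
slides), gives slice i's queue j on scheduler k a share of the interface's
maximum weight proportional to its reserved bandwidth, subject to a
minimum weight of one MTU-sized packet:

  % Hypothetical weight assignment (assumption):
  %   W_k      = maximum total weight allowed on scheduler/interface k
  %   B_{i,k}  = bandwidth reserved by slice i on interface k
  %   n_{i,k}  = number of queues slice i binds to scheduler k
  %   w_{\min} = minimum weight (one MTU-sized packet per round)
  w_{ijk} \;=\; \max\!\left( w_{\min},\;
      \frac{B_{i,k}}{\sum_{i'} B_{i',k}} \cdot \frac{W_k}{n_{i,k}} \right)
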
14
Allocating Code Option Instance
  • Slice to RMP
  • npeIP alloc_fast_path(copt, atype, attrs)
  • uint16_t copt: NPE code option (IPv4 = 0, I3 = 1)
  • uint16_t atype: reservation type (Shared = 0,
    Firm = 1)
  • uint32_t attrs[]: array of resource
    allocation parameters
  • attrib_t: uint32_t bw, pps // bits/second,
    packets/second
  •           uint32_t fltrs, queues, buffers,
    stats // totals
  •           uint32_t sram, dram // memory
    block size in bytes
  • RMP to SRM
  • xsid, npeIP alloc_fast_path(PlabID, copt,
    atype, attrs)
  • uint32_t PlabID: GPE/PlanetLab slice
    identifier. The SRM allocates an internal slice
    identifier (xsid) unique within the SPP node; all
    substrate operations use the xsid.
  • SRM to SCD
  • set_fast_path(xsid, copt, VLAN, TParams, Mem)
  • uint16_t xsid: internal slice id
  • uint16_t VLAN
  • uint32_t TParams: Qs, Fltrs, Buffers, Stats
  • mem_t Mem: SRAMOffset, Size, DRAMOffset, Size
    (see the sketch after this list)
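
Collecting the parameter descriptions above into C declarations (a sketch:
the field names come from the slide, but the exact packing, widths beyond
those shown, and the call-chain comment layout are assumptions):

  #include <stdint.h>

  /* Resource-allocation parameters passed in alloc_fast_path(). */
  typedef struct {
      uint32_t bw;        /* bits/second                                  */
      uint32_t pps;       /* packets/second                               */
      uint32_t fltrs;     /* total TCAM filters                           */
      uint32_t queues;    /* total queues                                 */
      uint32_t buffers;   /* total buffers                                */
      uint32_t stats;     /* total stats counters                         */
      uint32_t sram;      /* SRAM block size in bytes                     */
      uint32_t dram;      /* DRAM block size in bytes                     */
  } attrib_t;

  /* Memory assignment handed from the SRM to the SCD in set_fast_path(). */
  typedef struct {
      uint32_t sram_offset, sram_size;
      uint32_t dram_offset, dram_size;
  } mem_t;

  /* Call chain for creating a fast-path instance:
   *   slice -> RMP : npeIP = alloc_fast_path(copt, atype, attrs)
   *   RMP  -> SRM : (xsid, npeIP) = alloc_fast_path(PlabID, copt, atype, attrs)
   *   SRM  -> SCD : set_fast_path(xsid, copt, VLAN, TParams, Mem)
   */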

15
Allocating NPE (Creating Meta-Router)
[Diagram: message sequence for creating a meta-router, spanning the GPE
(NMP, RMP, root context, PlanetLab OS), the CP (Substrate, SRM, Resource
DB, sliver table, user login info, PLC), the selected NPE (SCD, fast path
FPk) and the LC, connected by the 10GbE fabric (data) and 1GbE base
(control) networks. Steps shown: the slice (PlabID) requests code option
copt with resource params; if sufficient resources are available, the SRM
assigns an internal slice identifier (xsid) and associates it with the
allocation {Slice, VLAN, NPE, copt, fltrs, Qs, stats, buffs, SRAM, DRAM,
EP, MI, GPE IP, control port}; a VLAN (VLANk) is allocated and enabled to
isolate the internal slice traffic; a message is sent to the SCD informing
it of the new allocation (xsid, VLAN, params), and the code option, fltrs,
Qs, stats, buffs, SRAM and DRAM are allocated; the assigned xsid is cached
and a local socket is opened for exception and local-delivery traffic;
status and the assigned global port number (meta-interface MI1) are
returned to the client vserver.]
16
SRM Allocating NPE Resources
  • Actions required to allocate a code option instance
    and resources
  • Select an NPE (see the sketch after this list)
  • Load balance across available NPEs: of the
    eligible NPEs, select the one with the greatest
    headroom
  • Eligible if sufficient resources are available
    (SRAM, TCAM space, queues, etc.)
  • Select the NPE with the greatest firm BW and PPS.
    If tied, select the one with the greatest available
    soft resources, else pick the lowest numbered.
  • Either allocate the requested resources or return
    an error
  • Keep a memory map of SRAM (and DRAM) so the SRM can
    perform allocation, though the absolute starting
    address is not required
  • If compaction is necessary, communicate with the
    SCD directly
  • Allocate a VLAN and configure the switch
  • Send a command to the selected NPE's SCD
  • set_fast_path(xsid, copt, VLAN, Params)
  • SCD updates tables and creates local mappings.
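
A sketch of the NPE-selection heuristic described in the bullets above.
The comparison order (firm BW, then firm PPS, then soft resources, then
lowest NPE number) and all names are illustrative assumptions:

  #include <stdbool.h>
  #include <stdint.h>

  struct npe_state {
      int      id;
      bool     eligible;        /* enough SRAM, TCAM space, queues, ...   */
      uint64_t firm_bw_avail;   /* unreserved firm bandwidth (bits/s)     */
      uint64_t firm_pps_avail;  /* unreserved firm packet rate            */
      uint64_t soft_avail;      /* remaining shared/soft capacity         */
  };

  /* Returns true if candidate a is preferred over b. */
  static bool better(const struct npe_state *a, const struct npe_state *b)
  {
      if (a->firm_bw_avail  != b->firm_bw_avail)
          return a->firm_bw_avail  > b->firm_bw_avail;
      if (a->firm_pps_avail != b->firm_pps_avail)
          return a->firm_pps_avail > b->firm_pps_avail;
      if (a->soft_avail     != b->soft_avail)
          return a->soft_avail     > b->soft_avail;
      return a->id < b->id;     /* final tie-break: lowest numbered NPE   */
  }

  /* Pick an eligible NPE, or return -1 if none can host the fast path. */
  static int select_npe(const struct npe_state *npe, int n)
  {
      int best = -1;
      for (int i = 0; i < n; i++) {
          if (!npe[i].eligible)
              continue;
          if (best < 0 || better(&npe[i], &npe[best]))
              best = i;
      }
      return best < 0 ? -1 : npe[best].id;
  }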

17
Freeing Code Option Instance (Fast path)
  • Slice to RMP and RMP to SRM
  • void free_fast_path(): requires asynchronous
    processing by the SRM and SCD
  • SRM to SCD
  • Success/Pending/Failed rem_fast_path(xsid)
  • The SRM first sends a request to the SCD directing
    it to free all resources assigned to slice xsid.
  • The SCD first disables the fast path and GPE
    (how?) so no new packets will be processed.
  • Then it checks all queues assigned to xsid. If all
    are empty, the resources are freed and a Success
    response is sent to the SRM.
  • Else, if packets remain in any of the queues, the
    SCD must send a Pending response to the SRM and
    periodically check all queues assigned to xsid.
    When they are all empty, the SCD sends an
    asynchronous successful-deallocation message
    (which includes the slice's xsid) to the SRM,
    notifying it that all resources associated with
    xsid are now free.
  • If the SCD returns Success, the SRM marks the
    resources as available and removes the slice from
    its internal xsid (fast path) tables.
  • If the SCD returns Pending, the SRM registers a
    callback method which is called when the SCD sends
    the resources-freed message.
  • Regardless of whether the resources are freed
    immediately or asynchronously, the SRM returns
    Success to the RMP (see the sketch after this
    list).
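
A minimal sketch of the SCD side of rem_fast_path(xsid) as described
above. The function and type names are illustrative assumptions; only the
Success/Pending behavior comes from the slide:

  #include <stdbool.h>
  #include <stdint.h>

  enum scd_status { SCD_SUCCESS, SCD_PENDING, SCD_FAILED };

  extern void disable_fast_path(uint16_t xsid);      /* stop new packets    */
  extern bool slice_queues_empty(uint16_t xsid);     /* all queues drained? */
  extern void release_slice_resources(uint16_t xsid);
  extern void schedule_drain_check(uint16_t xsid);   /* periodic re-check   */

  enum scd_status scd_rem_fast_path(uint16_t xsid)
  {
      disable_fast_path(xsid);             /* no new packets are processed  */

      if (slice_queues_empty(xsid)) {
          release_slice_resources(xsid);   /* free immediately              */
          return SCD_SUCCESS;
      }
      /* Packets remain queued: answer Pending and keep polling; when the
       * queues drain, the periodic check (not shown) frees the resources
       * and sends the asynchronous fp_freed(xsid) message to the SRM.     */
      schedule_drain_check(xsid);
      return SCD_PENDING;
  }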

18
Comments
  • Pace SCD message processing
  • Drop threshold using packets, or packets and
    length
  • Limit BW over-allocation
  • Use fast path, not slice
  • GPE traffic to the NPE is turned off when freeing a
    fast path
  • How long to wait for Qs to drain?
  • Turn off the FP using the VLAN table