1
Supercharged PlanetLab Platform, Control Overview
  • Fred Kuhns
  • fredk@arl.wustl.edu
  • Applied Research Laboratory
  • Washington University in St. Louis

2
Prototype Organization
  • One NP blade (with RTM) implements the Line Card
  • separate ingress/egress pipelines
  • A second NP hosts multiple slice fast paths
  • multiple static code options for diverse slices
  • configurable filters and queues
  • GPEs run the standard PlanetLab OS with vservers

3
Connecting an SPP
[Diagram: an SPP connecting East Coast, West Coast, and local/regional sites. Hosts and plab/SPP nodes reach the SPP over point-to-point links and an Ethernet switch, with ARP end stations and intermediate routers along the path. For now, assume there is just a single connection to the public Internet.]
4
System Block Diagram
[Block diagram of the SPP node. A Line Card (LC) NP blade with RTM provides the 10 x 1GbE external interfaces and holds the ARP table, FIB, and NAT/tunnel filters (in/out); all flow monitoring (netflow) is done at the Line Card. NPE blades (each with NPU-A/NPU-B, TCAM, XScale control processors, GE and SPI interfaces) and GPE blades (running pl_netflow, vnet, and user slivers, attached over PCI) connect through the Hub, which contains the 10Gbps fabric Ethernet switch (data path) and the 1Gbps base Ethernet switch (control), plus a shelf manager reached over I2C (IPMI) and a power control unit with its own IP address. The Control Processor (CP) runs the Substrate Control Daemon (SCD), Boot and Configuration Control (BCC), tftp, dhcpd, sshd, httpd, and routed; holds the Resource DB, nodeconf.xml, route DB, boot files, Slivers DB, user info, and flow stats; manages the LC tables; and serves the boot CD materials (bootcd, cacert.pem, boot_server, plnode.txt, sppnode.txt) to standalone GPEs. PLC is external. Open annotations on the slide: "ReBoot how??" and "move pl_netflow to cp?".]
5
Software Components
  • Utilities: parts of the BCC that generate config and distribution files
  • Node configuration and management: generate config files, dhcp, tftp, ramdisk
  • Boot CD and distribution file management (images, RPM and tar files) for GPEs and CP
  • Control Processor (CP)
  • Boot and Configuration Control (BCC)
  • System Resource Manager (SRM)
  • System Node Manager (SNM)
  • user authentication and ssh forwarding daemon
  • http daemon providing a node-specific interface to netflow data (planetflow)
  • Routing protocol daemon (BGP/OSPF/RIP) for maintaining the FIB in the Line Card
  • General Purpose Element (GPE)
  • Local Boot Manager (LBM): modified BootManager running on the GPEs
  • Resource Manager Proxy (RMP)
  • Node Manager Proxy (NMP): the required changes to the existing Node Manager software
  • Network Processor Element (NPE)
  • Substrate Control Daemon (SCD, formerly known as wuserv)
  • kernel module to read/write memory locations (wumod)
  • command interpreter for configuring NPU memory (wucmd)
  • Modified Radisys/Intel ramdisk Linux kernel source

6
Boot and Configuration Control
7
Boot and Configuration Control
  • Read config file and allocate IP subnets and
    addresses for substrate
  • Initialize Hub (delegate to SRM)
  • base and fabric switches
  • Initialize any switches not within the chassis
  • Create dhcp configuration file and start the daemon (a config-generation sketch follows this list)
  • assigns control IP subnets and addresses
  • assigns internal substrate IP subnet on the fabric Ethernet
  • Initialize Line Card to forward all traffic to the CP
  • Use the control interface, base or front panel (base is only connected to NPU-A)
  • All ingress traffic is sent to the CP
  • What about egress traffic when we are multi-homed, either through different physical ports or one port with more than one next hop?
  • We could assume only one physical port and one next hop.
  • This is a general issue; the general solution is to run routing protocols on the CP and keep the line card's TCAM up to date.
  • Start remaining system-level services (i.e. daemons)
  • wuarl daemons
  • system daemons: sshd, httpd, routed
  • System Node Manager maintains user login information for ssh forwarding
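
A minimal sketch of the dhcp step above: the BCC could emit a dhcpd.conf for the base (control) subnet with fixed addresses for the GPEs. The subnet, MAC addresses, and template below are illustrative assumptions, not the actual BCC code.

    # Hypothetical sketch: emit a minimal dhcpd.conf for the base (control) subnet.
    # The subnet layout, MAC addresses, and boot file name are invented for illustration.

    CONTROL_SUBNET = "10.0.1.0"
    CONTROL_MASK = "255.255.255.0"
    GPES = {  # MAC address -> (hostname, fixed control address)
        "00:16:3e:00:00:01": ("gpe1", "10.0.1.11"),
        "00:16:3e:00:00:02": ("gpe2", "10.0.1.12"),
    }

    def make_dhcpd_conf(subnet, netmask, hosts):
        lines = [
            "ddns-update-style none;",
            f"subnet {subnet} netmask {netmask} {{",
            f"    option subnet-mask {netmask};",
            '    filename "pxelinux.0";   # boot image served over tftp',
            "}",
        ]
        for mac, (name, addr) in hosts.items():
            lines += [
                f"host {name} {{",
                f"    hardware ethernet {mac};",
                f"    fixed-address {addr};",
                "}",
            ]
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        with open("dhcpd.conf", "w") as f:
            f.write(make_dhcpd_conf(CONTROL_SUBNET, CONTROL_MASK, GPES))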

8
Boot and Configuration Control
  • Assist GPE in booting
  • Download from PLC SPP specific version of the
    BootManager and NodeManager tar/rpm
    distributions.
  • Downloads/maintains Planetlab bootstrap
    distribution
  • Updated BootCD
  • The boot CD contains SPP config file with CP
    address, spp_config.
  • No modifications to initial boot scripts, they
    contact the BCC over the fabric interface (using
    the substrate IP subnet) and download the next
    stage.
  • GPEs obtain distribution files from the BCC on
    the CP
  • SPP changes are confined to the BootManager and
    NodeManager sources (that is the plan)
  • PLC Database updated to place all SPP nodes in
    the SPP Node Group, we use this to trigger
    additional special processing.
  • Modified BootManager scripts configure control
    interfaces (Base) and 2 Fabric interfaces (2 per
    Hub).
  • Creates/Updates spp_config file on GPE node
  • Installs BootStrap source then overwrites the
    NodeManager with our modified version.
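
For concreteness, a minimal sketch of reading spp_config on a GPE, assuming a simple key=value format; only the presence of a CP address comes from the slide, and the key names and values are invented.

    # Hypothetical sketch: parse a key=value style spp_config file on the GPE.
    # Only CP_ADDR is suggested by the slides; the other keys are invented.
    SAMPLE = """\
    CP_ADDR=10.0.2.1        # control processor, reached over the fabric subnet
    BOOT_SERVER=10.0.2.1
    NODE_GROUP=SPP
    """

    def parse_spp_config(text):
        conf = {}
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()   # drop comments and blanks
            if not line:
                continue
            key, _, value = line.partition("=")
            conf[key.strip()] = value.strip()
        return conf

    if __name__ == "__main__":
        print(parse_spp_config(SAMPLE))   # {'CP_ADDR': '10.0.2.1', ...}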

9
Node Manager
10
System Node Manager
  • Logically the top-half of the PlanetLab Node
    Manager
  • PLC API method GetSlivers()
  • periodically call PLC for current list of slices
    assigned to this node
  • assign system slivers to each GPE, then split application slivers across available GPEs (see the sketch after this list)
  • keep persistent tables to handle daemon crashes
    or local device reboots
  • Local GetSlivers() (xmlrpc interface) to GPEs
  • gives each Node Manager Proxy (per GPE) its list of allocated slivers along with other node-specific data: timestamp, list of configuration files, node id, node groups, network addresses, assigned slivers
  • Resource management across GPEs
  • Manage Pool and VM RSpec assignment for each GPE
  • opportunity to extend RSpecs to account for
    distributed resources.
  • Perform top-half processing of the per-GPE NMP API (exported to slivers on this node only). Calls on one GPE may impact resource assignments or sliver status on a different GPE.
  • Ticket(), GetXIDs(), GetSSHKeys(), Create(),
    Destroy(), Start(), Stop(), GetEffectiveRSpec(),
    GetRSpec(), GetLoans(), validate_loans(),
    SetLoans()
  • Currently the node manager uses CA certs and SSH keys when communicating with PLC; we will need to do the same. But we can relax security between the SNM and the NMPs.
  • Tightly coupled with the System Resource Manager
  • Maintain a globally unique (to the node) Sliver ID, which corresponds to what we call the meta-router ID, and make it available to the SRM when enabling fast-path processing (VLANs, UDP port numbers, etc.)
  • Must request/maintain the list of available GPEs and the resource availability on each; used for allocating slivers to GPEs and handling RSpecs
  • SRM may delegate GPE management to SNM.
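
A minimal sketch of the assignment step referenced above, assuming a simple round-robin split of application slivers and a JSON file as the persistent table; the slice names, policy, and file name are illustrative assumptions, not the actual System Node Manager code.

    # Hypothetical sketch of the SNM assignment step: system slivers go to
    # every GPE, application slivers are split across the available GPEs,
    # and the result is persisted to survive daemon crashes or reboots.
    import json

    SYSTEM_SLICES = {"pl_netflow", "pl_conf"}   # invented example names

    def assign_slivers(slices, gpes, state_file="sliver_assignments.json"):
        assignments = {gpe: [] for gpe in gpes}
        app = []
        for name in slices:
            if name in SYSTEM_SLICES:
                for gpe in gpes:                 # system slivers on every GPE
                    assignments[gpe].append(name)
            else:
                app.append(name)
        for i, name in enumerate(sorted(app)):   # simple round-robin split
            assignments[gpes[i % len(gpes)]].append(name)
        with open(state_file, "w") as f:         # persistent table
            json.dump(assignments, f, indent=2)
        return assignments

    if __name__ == "__main__":
        print(assign_slivers(["pl_netflow", "example_slice_a", "example_slice_b"],
                             ["gpe1", "gpe2"]))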

11
SNM Questions
  • Robustness -- not contemplated for this version
  • If a GPE goes down do we migrate slivers to
    remaining GPEs?
  • If a GPE is added do we migrate some slivers to
    new GPE to load balance?
  • Intermediate solution
  • If a GPE goes down, mark the corresponding slices as unmapped and reassign them to the remaining GPEs
  • No migration of slivers when GPEs are added, just
    assign new slivers to the new GPE
  • Do we need to intercept any of the API calls made
    against the PLC?
  • What about the boot manager api calls and the
    uploading of boot log files (alpina boot logs)?
  • implementation of the remote reboot command and
    console logging.

12
Node Manager Proxy
  • Bottom half of the existing Node Manager
  • modify GetSlivers() to call the System Node Manager (sketched below)
  • use the base interface and different security (currently they wrap xmlrpc calls with a curl command which includes the PLC's certified public key)
  • Forward GPE-oriented sliver resource operations to the SNM (see the API list in the SNM description)
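
A minimal sketch of how the proxy's modified GetSlivers() might forward to the SNM over the base interface; the URL, port, and exact call are assumptions, not the actual NMP change.

    # Hypothetical sketch: the NMP asks the System Node Manager on the CP for
    # its sliver list instead of calling PLC directly. Inside the chassis the
    # curl/CA-cert wrapper used for PLC can be relaxed.
    import xmlrpc.client

    SNM_URL = "http://10.0.1.1:8001/"     # CP on the base (control) subnet, invented

    def get_slivers_from_snm(node_id):
        snm = xmlrpc.client.ServerProxy(SNM_URL, allow_none=True)
        return snm.GetSlivers(node_id)    # local GetSlivers(), served by the SNM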

13
System Resource Manager
14
System Resource Manager
[Diagram: the System Resource Manager and Resource DB on the CP coordinate with the RMP and NMP in the root context of each GPE (running the PlanetLab OS), with the SCD on each NPE (hosting the slice fast paths FPk), and with the LC.]
15
System Resource Manager
  • Maintains tables describing system hardware components and their attributes (a sketch of such tables follows this list)
  • NPEs: code options, memory blocks, counters, TCAM entries
  • GPEs and their HW attributes
  • Sliver attributes corresponding to internal
    representations and control mechanisms
  • unique Sliver ID (aka meta-router ID)
  • global port space across assigned IP addresses
  • fast path VLAN assignment and corresponding IP
    Subnets
  • HUB Management
  • Manage fabric Ethernet switches (including any
    used external to the Chassis or in a
    multi-chassis scenario)
  • Manage base switch
  • Manage line card table entries??
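
A minimal sketch of the tables above as in-memory records; the field names mirror the attributes listed here, but the structure is an illustrative assumption, not the actual Resource DB schema.

    # Hypothetical sketch of SRM resource tables using Python dataclasses.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class NPERecord:
        npe_id: int
        code_options: List[str]          # static fast-path code options loaded
        free_mem_blocks: List[int]       # SRAM block indices still available
        free_counters: int
        free_tcam_entries: int

    @dataclass
    class SliverRecord:
        sliver_id: int                   # aka meta-router ID, unique per node
        vlan_id: int                     # sliver ID maps 1-to-1 to a VLAN
        ports: List[int] = field(default_factory=list)   # global port space
        fast_path_subnet: str = ""       # IP subnet for the fast-path VLAN

    @dataclass
    class ResourceDB:
        npes: Dict[int, NPERecord] = field(default_factory=dict)
        slivers: Dict[int, SliverRecord] = field(default_factory=dict)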

16
System Resource Management
  • Allocate global port space (sketched, together with sliver ID allocation, after this list)
  • input: Slice ID, global IP address (or 0), proto (UDP), port (or 0)
  • actions: allocate port
  • output: IP address, port, proto, or 0 if we can't allocate
  • Allocate Sliver ID
  • input: slice name
  • actions:
  • allocate a unique Sliver ID and assign it to the slice
  • allocate a VLAN ID (1-to-1 map of sliver ID to VLAN)
  • output: Sliver ID, VLAN ID
  • Allocate NPE code option (internal)
  • input: Sliver ID, code option ID
  • actions:
  • assign an NPE slot to the slice
  • allocate a code option instance from an eligible NPE (yields NPE, instance ID)
  • allocate a memory block for the instance (the instance ID is just an index into an array of preallocated memory blocks)
  • output: NPE instance (NPE ID, slot number)
  • Allocate Stats Index
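
A minimal sketch of the first two operations above; the 0-on-failure convention and the 1-to-1 sliver-to-VLAN mapping come from the slides, while the port range, VLAN numbering, and data structures are illustrative assumptions.

    # Hypothetical sketch of SRM allocation calls for ports and sliver IDs.
    import itertools

    _next_sliver = itertools.count(1)
    _used_ports = set()                  # (ip, proto, port) tuples in use

    def allocate_port(slice_id, ip_addr, proto="UDP", port=0,
                      low=33000, high=34000):
        """Return (ip_addr, port, proto), or 0 if no port can be allocated."""
        candidates = [port] if port else range(low, high)
        for p in candidates:
            if (ip_addr, proto, p) not in _used_ports:
                _used_ports.add((ip_addr, proto, p))
                return (ip_addr, p, proto)
        return 0

    def allocate_sliver_id(slice_name):
        """Assign a node-unique Sliver ID and a VLAN mapped 1-to-1 to it."""
        sliver_id = next(_next_sliver)
        vlan_id = 100 + sliver_id        # invented VLAN numbering scheme
        return sliver_id, vlan_id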

17
System Resource Manager
  • Add Tunnel (aka meta-interface) to NPE instance (sketched after this list)
  • input: Sliver ID, NPE instance, IP address, UDP port
  • actions:
  • add mapping to the NPE demux table: VLAN + IP addr + UDP port <-> instance ID
  • update the instance's attribute block: tunnel fields, exception/local delivery, QID, physical port, Ethernet addr for NPE/LC
  • update the next-hop table (result index maps to the next-hop tunnel)
  • set default QM weights, number of queues, thresholds
  • update Line Card ingress and egress lookup tables: tunnel, NPE Ethernet address, physical port, QIDs, etc.??
  • update LC ingress and egress queue attributes for the tunnel??
  • Create NPE Sliver instance
  • input: Slice ID; IP address, UDP port; interface ID, physical port; SRAM block; number of filter table entries; number of queues; number of packet buffers; code option; amount of SRAM required; total reserved bandwidth
  • actions:
  • allocate NPE code option
  • add tunnel to NPE instance
  • enable the sliver's VLAN on the associated fabric interface ports
  • delegate to RMP: configure the GPE vnet module (via RMP) to accept the sliver's VLAN traffic; open a UDP port for data and control in the root context and pass it back to the client
  • output: (NPE code option) instance number
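
A minimal sketch of the Add Tunnel action above, with Python dictionaries standing in for the NPE demux table and per-instance attribute block; the field names and return value are illustrative assumptions.

    # Hypothetical sketch: map (VLAN, IP, UDP port) to a fast-path instance
    # and record the tunnel's attributes for that instance.
    demux_table = {}        # (vlan, ip_addr, udp_port) -> instance_id
    instance_attrs = {}     # instance_id -> attribute block

    def add_tunnel(sliver_id, vlan, instance_id, ip_addr, udp_port,
                   qid, phys_port, next_hop_mac):
        key = (vlan, ip_addr, udp_port)
        if key in demux_table:
            raise ValueError("tunnel already mapped")
        demux_table[key] = instance_id
        attrs = instance_attrs.setdefault(instance_id, {"tunnels": []})
        attrs["tunnels"].append({
            "sliver_id": sliver_id,
            "ip": ip_addr, "udp_port": udp_port,
            "qid": qid, "phys_port": phys_port,
            "next_hop_mac": next_hop_mac,    # Ethernet addr on the NPE/LC side
        })
        return len(attrs["tunnels"]) - 1     # meta-interface index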

18
Resource Manager Proxy
  • Act as intermediary between client virtual machines and the node control infrastructure (interface sketched after this list)
  • all exported interfaces are implemented by the RMP
  • managing the life cycle of an NPE code instance
  • accessing instance data and memory locations
  • read/write a code option instance's memory block
  • get/set queue attributes: threshold, weight
  • get/add/remove/update lookup table entries (i.e. TCAM filters)
  • get/clear pre/post queue counters for a given stats index
  • one-time or periodic get
  • get packet/byte counters for a tunnel at the Line Card
  • allocate/release local port
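
A minimal sketch of the interface the RMP might export to slivers; the method names below paraphrase the bullet list and are assumptions, not the actual RMP API.

    # Hypothetical sketch of the RMP's exported interface (stubs only).
    class ResourceManagerProxy:
        def create_instance(self, slice_id, code_option, rspec): ...
        def destroy_instance(self, instance_id): ...
        def read_mem(self, instance_id, offset, length): ...
        def write_mem(self, instance_id, offset, data): ...
        def set_queue(self, instance_id, qid, threshold, weight): ...
        def add_filter(self, instance_id, key, result): ...       # TCAM entry
        def remove_filter(self, instance_id, key): ...
        def get_counters(self, instance_id, stats_index, period=None): ...
        def get_tunnel_counters(self, tunnel_id): ...              # from the Line Card
        def alloc_port(self, proto, port=0): ...
        def release_port(self, proto, port): ...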

19
Example Scenarios
20
Default Traffic Configurations
[Diagram: SPP node with LC, NPE, and GPE (NMP, RMP, MP in the root context of the PlanetLab OS) attached to the 10GbE fabric (data) switch and the 1GbE base (control) switch, plus the Substrate, the CP (user login info, Resource DB, sliver table), and PLC. Control messages are sent over the isolated base Ethernet switch, for isolation and security. Default traffic is forwarded to the CP over the 10Gbps fabric Ethernet switch. The Line Card performs a NAT-like function for traffic from vservers.]
21
Logging Into a Slice
[Diagram: same node layout as the previous slide, with a host located within the node. An ssh connection is directed to the CP (ssh forwarder, user login info) for user authentication; once authenticated, the session is forwarded to the appropriate GPE and vserver.]
22
Update Local Slice Definitions
[Diagram: same node layout. The CP retrieves/updates slice descriptions from PLC, updates the local database, and allocates slice instances (slivers) to GPE nodes.]
23
Creating Local Slice Instance
[Diagram: same node layout. The CP retrieves/updates slice descriptions from PLC and creates the new slice on the assigned GPE.]
24
Allocating NPE (Creating Meta-Router)
[Diagram: same node layout, with fast path FPk and VLANk on the NPE and meta-interface MI1 on the LC. The request from the client vserver is forwarded to the System Resource Manager, which returns status and the assigned global port number. The SRM allocates the NPE sliver (code option, SRAM, interfaces/ports, etc.) and associates shared NPE resources with the new slice fast path: SRAM block, number of filter table entries, number of queues, number of packet buffers, code option, amount of SRAM required, total reserved bandwidth. It allocates a global UDP port for the requested interface(s) and configures the Line Card, allocates and enables a VLAN (VLANk) to isolate internal slice traffic, and opens a local socket for exception and local-delivery traffic, which is returned to the client vserver. FP = fast path.]
25
Managing the Data Path
  • Allocate or delete an NPE slice instance (a client-side usage sketch follows the diagram below)
  • Add, remove or alter filters
  • each slice is allocated a portion of the NPE's TCAM
  • Read or write to per-slice memory blocks in SRAM
  • each slice is allocated a block of SRAM
  • Read counters
  • one-time or periodic
  • Set queue rate or threshold
  • Get queue lengths

[Diagram: the RMP in the GPE root context (PlanetLab OS, NMP) sends data-path management requests over the 1GbE base (control) switch to the SCD on the NPE, which manages the fast path FPk and its data paths DPl; the 10GbE fabric (data) switch and the CP (user login info, Resource DB, sliver table) are also shown. FP = fast path.]
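
A minimal client-side sketch of these operations, assuming the sliver reaches the RMP over xmlrpc and reusing the illustrative method names from the RMP sketch earlier; the URL, port, and argument layout are assumptions.

    # Hypothetical sketch: a sliver tunes its fast path through the RMP.
    import xmlrpc.client

    rmp = xmlrpc.client.ServerProxy("http://127.0.0.1:8002/", allow_none=True)

    def tune_fast_path(instance_id):
        rmp.add_filter(instance_id, {"dport": 5000}, {"qid": 3})   # key -> result
        rmp.set_queue(instance_id, 3, 64, 2)        # qid, threshold, weight
        return rmp.get_counters(instance_id, 0)     # one-time read, stats index 0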
26
Misc Functions
27
Other LC Functions
  • Line Card Table maintenance
  • a multi-homed SPP node must be able to send packets to the correct next-hop router/end system
  • random traffic from/to the GPE must be handled correctly
  • tunnels represent point-to-point connections, so it may be acceptable to explicitly indicate to which of possibly several interfaces and next-hop (Ethernet) devices the tunnel should be bound
  • alternatively, if we are running the routing protocols, we could provide the user with the output port via a utility program
  • But there are problems with running routing protocols: we could forward all route updates to the CP, but standard implementations assume the interfaces are physically connected to the end system.
  • We could play tricks as VINI does.
  • Or we assume that there is only one interface connected to one Ethernet device.
  • NAT functions (a mapping sketch follows this list)
  • traffic originating from within the SPP
  • may also want to selectively map global proto/port numbers to specific GPEs?
  • ARP and FIB on the Line Card
  • route daemon runs on the CP and keeps the FIB up to date
  • ARP runs on the XScale and maps FIB next-hop entries to their corresponding Ethernet destination addresses
  • netflow
  • flow-based statistics collection
  • SRM collects periodically and posts via the web interface
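
A minimal sketch of the NAT-like mapping above: outbound flows from GPEs/vservers are rewritten to the node's public address, and selected global proto/port pairs can be pinned to specific GPEs for inbound traffic. The addresses, port range, and table layout are illustrative assumptions, not the Line Card implementation.

    # Hypothetical sketch of the LC's NAT-like translation tables.
    PUBLIC_ADDR = "128.252.19.10"        # invented public address of the SPP

    nat_out = {}      # (gpe_addr, proto, src_port) -> public src_port
    nat_in = {        # (proto, public_port) -> (gpe_addr, port); static pins
        ("UDP", 33001): ("10.0.2.11", 33001),
    }
    _next_port = 50000

    def translate_outbound(gpe_addr, proto, src_port):
        """Return the (public_addr, public_port) this flow is rewritten to."""
        global _next_port
        key = (gpe_addr, proto, src_port)
        if key not in nat_out:
            nat_out[key] = _next_port
            nat_in[(proto, _next_port)] = (gpe_addr, src_port)
            _next_port += 1
        return PUBLIC_ADDR, nat_out[key]

    def translate_inbound(proto, dst_port):
        """Return the internal (gpe_addr, port) target, or None (deliver to CP)."""
        return nat_in.get((proto, dst_port))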

28
Other Functions
  • vnet
  • isolation based on VLAN IDs
  • support port reservations
  • ssh forwarding (a forwarding-wrapper sketch follows this list)
  • maintain user login information on the CP
  • modify the ssh daemon (or have a wrapper) to forward user logins to the correct GPE
  • rebooting the node (SPP), even when the line card fails??
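
A minimal sketch of such a wrapper on the CP, assuming the login name is the slice name and the SNM keeps a slice-to-GPE table; the table contents, address scheme, and exec approach are illustrative assumptions.

    # Hypothetical sketch: forward an incoming ssh login to the GPE that
    # hosts the user's vserver, based on the CP's user login info.
    import os
    import sys

    # slice login name -> GPE hosting its vserver (maintained by the SNM)
    USER_LOGIN_INFO = {
        "example_slice_a": "10.0.2.11",
        "example_slice_b": "10.0.2.12",
    }

    def forward_login(slice_name):
        gpe = USER_LOGIN_INFO.get(slice_name)
        if gpe is None:
            sys.exit("unknown slice: %s" % slice_name)
        # Replace this process with an ssh session into the same slice on the GPE.
        os.execvp("ssh", ["ssh", "-t", "%s@%s" % (slice_name, gpe)])

    if __name__ == "__main__":
        forward_login(os.environ.get("USER", ""))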