Supercharged PlanetLab Platform, Control Overview

1
Supercharged PlanetLab Platform, Control Overview

Fred Kuhns
fredk_at_arl.wustl.edu
Applied Research Laboratory
Washington University in St. Louis

2
Prototype Organization

One NP blade (with RTM) implements the Line Card
  separate ingress/egress pipelines
Second NP hosts multiple slice fast-paths
  multiple static code options for diverse slices
  configurable filters and queues
GPEs run the standard PlanetLab OS with vServers

3
Connecting an SPP
[Figure: network diagram. Labels: SPP; plab/SPP nodes (East Coast, Local/Regional, West Coast); point-to-point links; Ethernet SW; Host.]
ARP endstations and intermediate routers
For now assume there is just a single connection to the public Internet
4
System Block Diagram
[Figure: block diagram of an SPP node. Only the labels survive in this transcript; they are listed below.]
External: PLC; Power Control Unit (has own IP address); external interfaces (10 x 1GbE); RTM
Blades: LC, NPE, GPE, Hub, Control Processor (CP), standalone GPEs
NPE and LC hardware: NPU-A, NPU-B, xscale, TCAM, SPI, GE, PCI
Line Card tables: ARP table, FIB, NAT, tunnel filters (in/out), flow stats (netflow)
GPE: bootcd (cacert.pem, boot_server, plnode.txt), sppnode.txt, pl_netflow, user slivers, vnet, interfaces
Hub: Fabric Ethernet Switch (10 Gbps, data path); Base Ethernet Switch (1 Gbps, control); I2C (IPMI); Shelf manager
CP: tftp, dhcpd, sshd, httpd, routed; Resource DB, route DB, Slivers DB, nodeconf.xml, boot files, user info, flow stats
Software: Substrate Control Daemon (SCD); Boot and Configuration Control (BCC)
Notes: All flow monitoring is done at the Line Card. Manage LC tables. Reboot how?? Move pl_netflow to the CP?
5
Software Components

Utilities: parts of BCC that generate config and distribution files
  Node configuration and management: generate config files, dhcp, tftp, ramdisk
  Boot CD and distribution file management (images, RPM and tar files) for GPEs and CP
Control Processor (CP)
  Boot and Configuration Control (BCC)
  System Resource Manager (SRM)
  System Node Manager (SNM)
  user authentication and ssh forwarding daemon
  http daemon providing a node-specific interface to netflow data (planetflow)
  Routing protocol daemon (BGP/OSPF/RIP) for maintaining the FIB in the Line Card
General Purpose Element (GPE)
  Local Boot Manager (LBM): modified BootManager running on the GPEs
  Resource Manager Proxy (RMP)
  Node Manager Proxy (NMP): the required changes to the existing Node Manager software
Network Processor Element (NPE)
  Substrate Control Daemon (SCD, formerly known as wuserv)
  kernel module to read/write memory locations (wumod)
  Command interpreter for configuring NPU memory (wucmd)
  Modified Radisys and Intel source: ramdisk, Linux kernel

6
Boot and Configuration Control
7
Boot and Configuration Control

Read config file and allocate IP subnets and addresses for the substrate
Initialize Hub (delegate to SRM)
  base and fabric switches
  Initialize any switches not within the chassis
Create dhcp configuration file and start daemon
  assigns control IP subnets and addresses
  assigns internal substrate IP subnet on the fabric Ethernet
Initialize Line Card to forward all traffic to CP
  Use the control interface, base or front panel (Base is connected only to NPU-A).
  All ingress traffic is sent to the CP
  What about egress traffic when we are multi-homed, either through different physical ports or one port with more than one next hop?
    We could assume only one physical port and one next hop.
    This is a general issue; the general solution is to run routing protocols on the CP and keep the line card's TCAM up to date.
Start remaining system-level services (i.e., daemons)
  wuarl daemons
  system daemons: sshd, httpd, routed
System Node Manager maintains user login information for ssh forwarding
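
The first and third steps above can be sketched as follows. This is a minimal illustration, not the BCC implementation: the 10.0.0.0/16 block, the /24 split, the blade names and the MAC addresses are all assumptions chosen for the example. The output uses ISC dhcpd host-stanza syntax.

```python
import ipaddress

def allocate_substrate(block, blades):
    """Split `block` into a control and a fabric /24, one address per blade."""
    subnets = ipaddress.ip_network(block).subnets(new_prefix=24)
    control, fabric = next(subnets), next(subnets)
    control_hosts = iter(control.hosts())
    fabric_hosts = iter(fabric.hosts())
    plan = {}
    for name in blades:
        plan[name] = {"control": next(control_hosts),
                      "fabric": next(fabric_hosts)}
    return control, fabric, plan

def dhcp_host_entries(plan, macs):
    """Render one dhcpd host stanza per blade for its control address."""
    lines = []
    for name, addrs in plan.items():
        lines.append(
            f"host {name} {{ hardware ethernet {macs[name]}; "
            f"fixed-address {addrs['control']}; }}"
        )
    return "\n".join(lines)

# Hypothetical blade list and MACs (documentation MAC range).
BLADES = ["cp", "lc", "npe1", "gpe1", "gpe2"]
MACS = {name: f"00:00:5e:00:53:{i:02x}" for i, name in enumerate(BLADES, start=1)}

control, fabric, plan = allocate_substrate("10.0.0.0/16", BLADES)
print(dhcp_host_entries(plan, MACS))
```

Here the control subnet stands for the base Ethernet and the fabric subnet for the internal substrate addresses; a real configuration would read both from the node config file.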

8
Boot and Configuration Control

Assist GPE in booting
  Download from PLC the SPP-specific versions of the BootManager and NodeManager tar/rpm distributions.
  Download and maintain the PlanetLab bootstrap distribution
Updated BootCD
  The boot CD contains the SPP config file, spp_config, with the CP address.
  No modifications to the initial boot scripts; they contact the BCC over the fabric interface (using the substrate IP subnet) and download the next stage.
GPEs obtain distribution files from the BCC on the CP
  SPP changes are confined to the BootManager and NodeManager sources (that is the plan)
  PLC database updated to place all SPP nodes in the SPP Node Group; we use this to trigger additional special processing.
  Modified BootManager scripts configure the control interface (Base) and 2 Fabric interfaces (2 per Hub).
  Creates/updates the spp_config file on the GPE node
  Installs the BootStrap source, then overwrites the NodeManager with our modified version.

9
Node Manager
10
System Node Manager

Logically the top half of the PlanetLab Node Manager
PLC API method GetSlivers()
  periodically call PLC for the current list of slices assigned to this node
  assign system slivers to each GPE, then split application slivers across available GPEs
  keep persistent tables to handle daemon crashes or local device reboots
Local GetSlivers() (xmlrpc interface) to GPEs
  Node Manager Proxies (per GPE): list of allocated slivers along with other node-specific data (timestamp, list of configuration files, node id, node groups, network addresses, assigned slivers)
Resource management across GPEs
  Manage Pool and VM RSpec assignment for each GPE
  opportunity to extend RSpecs to account for distributed resources.
  Perform top-half processing of the per-GPE NMP API (exported to slivers on this GPE only). Calls on one GPE may impact resource assignments or sliver status on a different GPE
    Ticket(), GetXIDs(), GetSSHKeys(), Create(), Destroy(), Start(), Stop(), GetEffectiveRSpec(), GetRSpec(), GetLoans(), validate_loans(), SetLoans()
Currently the Node Manager uses CA certs and SSH keys when communicating with PLC; we will need to do the same. But we can relax security between the SNM and the NMPs.
Tightly coupled with the System Resource Manager
  Maintain a globally unique (to the node) Sliver ID, which corresponds to what we call the meta-router ID, and make it available to the SRM when enabling fast path processing (VLANs, UDP port numbers, etc.).
  Must request/maintain the list of available GPEs and the resource availability on each. Used for allocating slivers to GPEs and handling RSpecs.
  SRM may delegate GPE management to SNM.
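
The assignment rule above (system slivers on every GPE, application slivers split across GPEs) could look like the sketch below. The least-loaded placement policy, the (name, is_system) record shape and the `existing` table are assumptions made for illustration; the slides do not say how the split is computed.

```python
def assign_slivers(slivers, gpes, existing=None):
    """Return {gpe: [sliver names]}.

    `slivers` is a list of (name, is_system) pairs, a simplified stand-in
    for the records returned by GetSlivers(). `existing` maps application
    sliver names to the GPE they already run on, so that a restart of the
    daemon does not move running slivers.
    """
    existing = existing or {}
    table = {gpe: [] for gpe in gpes}
    app_load = {gpe: 0 for gpe in gpes}
    pending = []
    for name, is_system in slivers:
        if is_system:
            for gpe in gpes:                      # system slivers go on every GPE
                table[gpe].append(name)
        elif existing.get(name) in table:
            table[existing[name]].append(name)    # keep the current placement
            app_load[existing[name]] += 1
        else:
            pending.append(name)
    for name in pending:
        target = min(gpes, key=lambda g: app_load[g])   # least-loaded GPE
        table[target].append(name)
        app_load[target] += 1
    return table
```

Persisting the returned table (for example to a file on the CP) and passing it back as `existing` is one way to get the crash tolerance mentioned above.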

11
SNM Questions

Robustness: not contemplated for this version
  If a GPE goes down, do we migrate its slivers to the remaining GPEs?
  If a GPE is added, do we migrate some slivers to the new GPE to balance load?
  Intermediate solution
    If a GPE goes down, mark the corresponding slices as unmapped and reassign them to the remaining GPEs
    No migration of slivers when GPEs are added; just assign new slivers to the new GPE
Do we need to intercept any of the API calls made against the PLC?
What about the boot manager API calls and the uploading of boot log files (alpina boot logs)?
Implementation of the remote reboot command and console logging.
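
A sketch of the intermediate solution, assuming a simple sliver-to-GPE placement table and a least-loaded reassignment policy (both are illustrative choices, not taken from the slides):

```python
def handle_gpe_down(placement, failed, gpes):
    """Reassign the slivers of a failed GPE in place.

    `placement` maps sliver name -> GPE name, or None when unmapped.
    Returns the list of GPEs that are still available.
    """
    remaining = [g for g in gpes if g != failed]
    # Step 1: mark the slivers on the failed GPE as unmapped.
    for sliver, gpe in placement.items():
        if gpe == failed:
            placement[sliver] = None
    # Step 2: reassign unmapped slivers, least-loaded remaining GPE first.
    load = {g: 0 for g in remaining}
    for gpe in placement.values():
        if gpe in load:
            load[gpe] += 1
    for sliver in sorted(s for s, g in placement.items() if g is None):
        if not remaining:
            break                      # nothing left to host the sliver
        target = min(remaining, key=lambda g: load[g])
        placement[sliver] = target
        load[target] += 1
    return remaining
```

Slivers that were never on the failed GPE are not touched, which matches the "no migration" rule for the GPE-added case.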

12
Node Manager Proxy

Bottom half of the existing Node Manager
  modify GetSlivers() to call the System Node Manager.
  use the base interface and different security (currently they wrap xmlrpc calls with a curl command which includes the PLC's certified public key).
Forward GPE-oriented sliver resource operations to the SNM; see the API list in the SNM description
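
The redirected GetSlivers() call can be sketched with Python's standard xmlrpc modules. Everything here is an assumption for illustration: the sliver table, the record fields, the single `gpe_id` argument, and plain HTTP on the loopback address standing in for the base interface with relaxed security.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical SNM-side table: slivers assigned to each GPE.
SLIVER_TABLE = {"gpe1": ["pl_netflow", "slice_a"], "gpe2": ["pl_netflow"]}

def snm_get_slivers(gpe_id):
    """SNM side: return the node data for one GPE (simplified record)."""
    return {"node_id": 1, "slivers": SLIVER_TABLE.get(gpe_id, [])}

def start_snm(host="127.0.0.1"):
    """Serve GetSlivers in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, 0), logRequests=False)
    server.register_function(snm_get_slivers, "GetSlivers")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def nmp_get_slivers(snm_url, gpe_id):
    """NMP side: ask the SNM, not the PLC, for this GPE's slivers."""
    with ServerProxy(snm_url) as snm:
        return snm.GetSlivers(gpe_id)

server = start_snm()
url = "http://127.0.0.1:%d" % server.server_address[1]
data = nmp_get_slivers(url, "gpe1")
server.shutdown()
server.server_close()
print(data)
```

The rest of the Node Manager on the GPE would keep consuming the returned record as before; only the endpoint it calls changes.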

13
System Resource Manager
14
System Resource Manager
[Figure: components around the System Resource Manager. Labels: LC; GPE (NMP, RMP, root context, PlanetLab OS); NPE (SCD, fast paths FPk); Resource DB.]
15
System Resource Manager

Maintains table describing system hardware components and their attributes
  NPEs: code options, memory blocks, counters, TCAM entries
  GPEs and HW attributes
Sliver attributes corresponding to internal representations and control mechanisms
  unique Sliver ID (aka meta-router ID)
  global port space across assigned IP addresses
  fast path VLAN assignment and corresponding IP subnets
Hub management
  Manage fabric Ethernet switches (including any used external to the chassis or in a multi-chassis scenario)
  Manage base switch
  Manage line card table entries??

16
System Resource Management

Allocate global port space
  input: Slice ID, Global IP address = 0, proto = UDP, Port = 0
  actions: allocate port
  output: IP Address, Port, Proto, or 0 if it can't allocate
Allocate Sliver ID
  input: Slice name
  actions:
    Allocate a unique Sliver ID and assign it to the slice
    allocate VLAN ID (1-to-1 map of Sliver ID to VLAN)
  output: Sliver ID, VLAN ID
Allocate NPE code option (internal)
  input: Sliver ID, code option id
  action: Assign NPE slot to slice
    Allocate a code option instance from an eligible NPE: NPE, instance ID
    Allocate a memory block for the instance (the instance ID is just an index into an array of preallocated memory blocks).
  output: NPE Instance = (NPE ID, Slot Number)
Allocate Stats Index
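
The three allocation calls above can be modeled in a few lines. This is a toy sketch under stated assumptions: the class and method names, the VLAN base of 100, the UDP port pool and the `npes` dictionary layout are all hypothetical, and the "Global IP address = 0" wildcard is not modeled.

```python
class SystemResourceManager:
    """Toy model of the SRM allocation calls (names are illustrative)."""

    def __init__(self, npes=None, first_vlan=100, port_range=(20000, 20003)):
        self.npes = npes or {}            # NPE ID -> {"options": set, "slots": list}
        self.first_vlan = first_vlan      # assumed base of the sliver VLAN range
        self.port_range = port_range      # assumed inclusive UDP port pool
        self.sliver_ids = {}              # slice name -> Sliver ID
        self.ports = {}                   # (ip, proto, port) -> Sliver ID

    def allocate_sliver_id(self, slice_name):
        """Return (Sliver ID, VLAN ID); the VLAN is a 1-to-1 function of the ID."""
        if slice_name not in self.sliver_ids:
            self.sliver_ids[slice_name] = len(self.sliver_ids) + 1
        sid = self.sliver_ids[slice_name]
        return sid, self.first_vlan + sid

    def allocate_port(self, sliver_id, ip, proto="udp", port=0):
        """Return (ip, port, proto), or 0 when the request cannot be met.

        port == 0 means "any free port", mirroring the Port = 0 input.
        """
        lo, hi = self.port_range
        candidates = range(lo, hi + 1) if port == 0 else [port]
        for p in candidates:
            if (ip, proto, p) not in self.ports:
                self.ports[(ip, proto, p)] = sliver_id
                return ip, p, proto
        return 0

    def allocate_code_option(self, sliver_id, code_option):
        """Return (NPE ID, slot) from the first eligible NPE, or None."""
        for npe_id, npe in self.npes.items():
            if code_option in npe["options"] and None in npe["slots"]:
                slot = npe["slots"].index(None)   # slot doubles as memory block index
                npe["slots"][slot] = sliver_id
                return npe_id, slot
        return None
```

Because the slot number indexes a preallocated array of memory blocks, allocating the instance and allocating its memory block are the same step in this model.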

17
System Resource Manager

Add Tunnel (aka Meta-Interface) to NPE Instance
  input: Sliver ID, NPE Instance, IP Address, UDP Port
  actions:
    Add mapping to NPE demux table: (VLAN, IP Addr, UDP Port) <-> Instance ID
    Update the instance's attribute block: tunnel fields, exception/local delivery, QID, physical port, Ethernet addr for NPE/LC
    Update next hop table (result index maps to next hop tunnel)
    Set default QM weights, number of queues, thresholds.
    Update Line Card ingress and egress lookup tables: tunnel, NPE Ethernet address, physical port, QIDs, etc.??
    Update LC ingress and egress queue attributes for the tunnel??
Create NPE Sliver instance
  input: Slice ID; IP address, UDP Port; Interface ID, Physical Port; SRAM block; number of filter table entries; number of queues; number of packet buffers; code option; amount of SRAM required; total reserved bandwidth
  actions:
    Allocate NPE code option
    Add tunnel to NPE Instance
    enable the Sliver VLAN on the associated fabric interface ports
    delegate to RMP: configure the GPE vnet module (via RMP) to accept the Sliver's VLAN traffic. Open UDP port for data and control in the root context and pass it back to the client.
  output: (NPE code option) Instance number
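
The demux table entry added by "Add Tunnel" is a two-way mapping between (VLAN, IP address, UDP port) and an instance ID. A minimal sketch, with a hypothetical class name and Python dictionaries standing in for the NPE's lookup memory:

```python
class DemuxTable:
    """Bidirectional map: (VLAN, IP address, UDP port) <-> instance ID."""

    def __init__(self):
        self.by_key = {}        # (vlan, ip, port) -> instance ID
        self.by_instance = {}   # instance ID -> set of keys (its tunnels)

    def add_tunnel(self, vlan, ip, port, instance_id):
        key = (vlan, ip, port)
        if key in self.by_key:
            raise ValueError(f"tunnel {key} already bound")
        self.by_key[key] = instance_id
        self.by_instance.setdefault(instance_id, set()).add(key)

    def lookup(self, vlan, ip, port):
        """Return the instance ID for a packet's header fields, or None."""
        return self.by_key.get((vlan, ip, port))

    def tunnels(self, instance_id):
        """Return the sorted list of tunnels bound to one instance."""
        return sorted(self.by_instance.get(instance_id, ()))
```

The forward direction serves the data path (classify an arriving packet); the reverse direction serves control operations such as listing or tearing down the tunnels of one instance.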

18
Resource Manager Proxy

Acts as intermediary between client virtual machines and the node control infrastructure.
  all exported interfaces are implemented by the RMP
Managing the life cycle of an NPE code instance
Accessing instance data and memory locations
  read/write to a code option instance's memory block
  get/set queue attributes: threshold, weight
  get/add/remove/update lookup table entries (i.e., TCAM filters)
  get/clear pre/post queue counters for a given stats index
    one-time or periodic get
  get packet/byte counter for a tunnel at the Line Card
  allocate/release local port
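
The counter operations above (get, clear, one-time or periodic) can be illustrated as follows. The class, the meaning given to "pre" and "post" (packets counted before and after the queue) and the polling generator are assumptions for this sketch; real counters would be read from NPE memory through the SCD.

```python
import time

class QueueStats:
    """Pre/post queue packet counters, one pair per stats index (toy model)."""

    def __init__(self, size):
        self.pre = [0] * size     # packets counted before the queue
        self.post = [0] * size    # packets counted after the queue

    def record(self, index, dequeued=True):
        """Count one packet; a packet dropped in the queue never reaches post."""
        self.pre[index] += 1
        if dequeued:
            self.post[index] += 1

    def get(self, index, clear=False):
        """One-time get; optionally clear the counters after reading."""
        sample = (self.pre[index], self.post[index])
        if clear:
            self.pre[index] = self.post[index] = 0
        return sample

def periodic_get(stats, index, samples, interval_s=0.0):
    """Periodic get: yield `samples` readings, `interval_s` seconds apart."""
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        yield stats.get(index)
```

A client would call `get` for a single reading or iterate over `periodic_get` to sample a stats index at a fixed interval.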

19
Example Scenarios
20
Default Traffic Configurations
[Figure: node diagram. Labels: PLC; CP; LC; NPE; GPE (NMP, RMP, root context, PlanetLab OS); user login info; Resource DB; sliver tbl; 10GbE (fabric, data); 1GbE (base, control); Substrate.]
Control messages are sent over an isolated base Ethernet switch, for isolation and security.
The Line Card performs a NAT-like function for traffic from vservers.
Default traffic is forwarded to the CP over the 10 Gbps Ethernet switch (aka fabric).
21
Logging Into a Slice
[Figure: node diagram (PLC, CP, LC, NPE, GPE, fabric and base switches) with a host located within the node and an ssh forwarder on the CP.]
The ssh connection is directed to the CP for user authentication.
Once authenticated, the session is forwarded to the appropriate GPE and vserver.
22
Update Local Slice Definitions
[Figure: node diagram (PLC, CP, LC, NPE, GPE, fabric and base switches) with a host located within the node.]
Retrieve/update slice descriptions (PLC).
Update the local database and allocate slice instances (slivers) to GPE nodes.
23
Creating Local Slice Instance
[Figure: node diagram (PLC, CP, LC, NPE, GPE, fabric and base switches) with a host located within the node.]
Create new slice.
Retrieve/update slice descriptions.
24
Allocating NPE (Creating Meta-Router)
[Figure: node diagram (PLC, CP, LC, NPE, GPE, fabric and base switches) with a host located within the node, a fast path FPk on the NPE (FP = fast path), meta-interface MI1 and VLANk.]
Allocate NPE sliver: code option, SRAM, interfaces/ports, etc.
Forward the request to the System Resource Manager.
Allocate shared NPE resources and associate them with the new slice fast path: SRAM block; number of filter table entries; number of queues; number of packet buffers; code option; amount of SRAM required; total reserved bandwidth.
Allocate and enable a VLAN to isolate internal slice traffic, VLANk.
Allocate a global UDP port for the requested interface(s); configure the Line Card.
Open a local socket for exception and local delivery traffic; return it to the client vserver.
Returns status and the assigned global port number.
25
Managing the Data Path

Allocate or delete NPE slice instance
Add, remove or alter filters
  each slice is allocated a portion of the NPE's TCAM
Read or write to per-slice memory blocks in SRAM
  each slice is allocated a block of SRAM
Read counters
  one-time or periodic
Set queue rate or threshold.
Get queue lengths

[Figure: data path management. Labels: CP (user login info, Resource DB, sliver tbl); GPE (NMP, RMP, root context, PlanetLab OS); NPE (SCD, FPk, DPl); 10GbE (fabric, data); 1GbE (base, control). FP = fast path.]
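
The two per-slice limits named on this slide (a share of the TCAM and a private SRAM block) can be sketched as bounds checks. The class name, the exception types and the word-addressed SRAM model are assumptions for illustration.

```python
class SliceDataPath:
    """Per-slice view of NPE resources: a TCAM share and one SRAM block."""

    def __init__(self, max_filters, sram_words):
        self.max_filters = max_filters    # this slice's portion of the TCAM
        self.filters = {}                 # filter ID -> (match key, result)
        self.sram = [0] * sram_words      # the slice's private SRAM block
        self.next_id = 0

    def add_filter(self, key, result):
        """Install a filter, refusing once the slice's TCAM share is used up."""
        if len(self.filters) >= self.max_filters:
            raise RuntimeError("slice TCAM share exhausted")
        self.next_id += 1
        self.filters[self.next_id] = (key, result)
        return self.next_id

    def remove_filter(self, filter_id):
        del self.filters[filter_id]

    def write(self, offset, values):
        """Write words into the block; offsets are relative to the block start."""
        if offset < 0 or offset + len(values) > len(self.sram):
            raise IndexError("write outside the slice's SRAM block")
        self.sram[offset:offset + len(values)] = values

    def read(self, offset, count):
        if offset < 0 or count < 0 or offset + count > len(self.sram):
            raise IndexError("read outside the slice's SRAM block")
        return self.sram[offset:offset + count]
```

Keeping offsets relative to the slice's own block means a slice can never name memory that belongs to another slice.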
26
Misc Functions
27
Other LC Functions

Line Card table maintenance
  a multi-homed SPP node must be able to send packets to the correct next hop router/endsystem
  random traffic from/to the GPE must be handled correctly
  tunnels represent point-to-point connections, so it may be all right to explicitly indicate which of possibly several interfaces and next (Ethernet) hop devices the tunnel should be bound to
  alternatively, if we are running the routing protocols we could provide the user with the output port via a utility program.
  But there are problems with running routing protocols: we could forward all route updates to the CP, but standard implementations assume the interfaces are physically connected to the endsystem.
  We could play tricks as VINI does.
  Or we assume that there is only one interface connected to one Ethernet device.
NAT functions
  traffic originating from within the SPP
  may also want to selectively map global proto/port numbers to specific GPEs?
ARP and FIB on Line Card
  route daemon runs on the CP and keeps the FIB up to date
  ARP runs on the xscale and maps FIB next hop entries to their corresponding Ethernet destination addresses.
netflow
  flow-based statistics collection
  SRM collects periodically and posts via web
28
Other Functions

vnet
  isolation based on VLAN IDs
  support port reservations
ssh forwarding
  maintain user login information on the CP
  modify the ssh daemon (or have a wrapper) to forward user logins to the correct GPE
Rebooting the node (SPP), even when the line card fails??
