Title: Congestion Management and Service Differentiation
1Congestion Management and Service Differentiation
- Aditya Akella
- Carnegie Mellon University
- Joint work with Mukesh Agrawal, David Friedman,
David Nagle and Srinivasan Seshan
2Motivation
- End-systems implement many functions
- Reliability
- In-order delivery
- Demultiplexing
- Message boundaries
- Connection abstraction
- Congestion control
Of these, congestion control MUST be done for all
communications
3Congestion Control 101
- Congestion control: a mechanism to track available bandwidth
- Two phases
- Seek available bandwidth
- Back-off upon over-estimation
- TCP uses AIMD
- Additive Increase (AI) determines aggressiveness in seeking available bandwidth
- Multiplicative Decrease (MD) determines the extent of back-off (see the sketch below)
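As a concrete illustration of AIMD, here is a minimal sketch in C (not the kernel code): the window grows by one segment per loss-free round-trip and is halved when a loss signals over-estimation.

    /* Minimal AIMD sketch: cwnd measured in segments.
     * Additive increase: +1 segment per loss-free RTT.
     * Multiplicative decrease: halve the window on loss. */
    #include <stdio.h>

    static double cwnd = 1.0;   /* congestion window, in segments */

    static void aimd_update(int loss_seen)   /* called once per RTT */
    {
        if (loss_seen)
            cwnd /= 2.0;        /* MD: back off after over-estimating */
        else
            cwnd += 1.0;        /* AI: keep probing for bandwidth */
        if (cwnd < 1.0)
            cwnd = 1.0;
    }

    int main(void)
    {
        for (int rtt = 0; rtt < 10; rtt++)
            aimd_update(rtt == 6);   /* pretend a single loss in RTT 6 */
        printf("cwnd after 10 RTTs: %.1f segments\n", cwnd);
        return 0;
    }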
4Why is Congestion Control Important?
- Key to using the network efficiently
- Aim of tracking: minimize losses along the way
- Lost packets waste resources upstream
5Outline
- The Congestion Manager (CM)
- CM Architecture
- Evaluation of CM
- Interactions with QoS and Multi-Path Routing
- False-Sharing
- Application-Customizable Traffic Management
6Problems with Current Solution
Yesterday's Applications
HTTP
Telephony
Streaming
Interactive Games
- When everything used TCP, it was sufficient
- Today's traffic has moved beyond this
- Not everything wants TCP
- Also, multiple TCP streams are less social
7The Big Picture
[Figure: HTTP, audio, and two video flows running over TCP and UDP; the Congestion Manager sits above IP and maintains per-macroflow statistics (cwnd, rtt, etc.)]
- All congestion management tasks performed in CM
- Applications learn and adapt using API
- Transmissions are orchestrated by CM
- All flows to the same destination form a macroflow
- The macroflow is the granularity of sharing
8The CM Architecture
[Figure: the sender app exchanges data and callbacks with the in-kernel Congestion Manager through the API; the CM comprises a Congestion Controller (flow integration) and a Scheduler (per-flow scheduling); the receiver app returns feedback]
- Congestion Controller is responsible for deciding when to send a packet
- Scheduler is responsible for deciding which flow should send a packet (see the sketch below)
- Receiver network stack is unmodified
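A rough sketch of that split, with hypothetical structures and names (not the actual CM kernel code): the congestion controller answers "may a packet go out now?", and the scheduler picks which flow in the macroflow uses the slot.

    /* Hypothetical sketch of the CM split: the congestion controller decides
     * WHEN a packet may be sent, the scheduler decides WHICH flow sends.
     * Structures and names are illustrative, not the real implementation. */
    typedef struct {
        int id;
        int has_data;        /* does this flow have a pending request? */
    } cm_flow;

    typedef struct {
        double cwnd;         /* shared congestion window for the macroflow */
        double outstanding;  /* packets currently in flight */
        cm_flow *flows;
        int nflows;
        int next;            /* round-robin pointer used by the scheduler */
    } cm_macroflow;

    /* Congestion controller: is there room under the shared window? */
    int cm_may_send(const cm_macroflow *m)
    {
        return m->outstanding < m->cwnd;
    }

    /* Scheduler: round-robin over flows that have something to send. */
    cm_flow *cm_pick_flow(cm_macroflow *m)
    {
        for (int i = 0; i < m->nflows; i++) {
            int idx = (m->next + i) % m->nflows;
            if (m->flows[idx].has_data) {
                m->next = (idx + 1) % m->nflows;
                return &m->flows[idx];
            }
        }
        return 0;            /* nothing to send right now */
    }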
9A Simple API
- Apps use CM API to
- Inform CM about network state
- Request and schedule transmissions
- API overview
- open/close new connections
- request permission to send
- notify of transmission
- update with successes and losses
- Plus callbacks
- send a packet
- rate has changed
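Declared in C, the interface above might look roughly like this; the names and signatures are illustrative guesses patterned on the cmapp_update callback named later in the talk, not a verbatim copy of the CM headers.

    /* Illustrative sketch of the CM API surface listed above.
     * Exact names and signatures are assumptions, not the real headers. */
    #include <stddef.h>
    #include <netinet/in.h>

    typedef int cm_flow_id;

    /* Calls from the application into the CM. */
    cm_flow_id cm_open(struct in_addr dst);               /* join a macroflow */
    void       cm_close(cm_flow_id flow);
    void       cm_request(cm_flow_id flow);               /* ask to send */
    void       cm_notify(cm_flow_id flow, size_t nsent);  /* report a send */
    void       cm_update(cm_flow_id flow, size_t nrecd,
                         size_t nlost, int rtt_us);       /* successes/losses */

    /* Callbacks from the CM into the application. */
    void cmapp_send(cm_flow_id flow);                  /* "send a packet now" */
    void cmapp_update(cm_flow_id flow, double rate,
                      double srtt);                    /* "rate has changed" */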
10Transmission API
- Traditional kernel buffered-send has problems
- Does not allow app to pull back data
- E.g., transcoding
cm_send(..., dst)
- Lesson: move buffering into the application
11Transmission API
- Request/callback-based send
- Schedule requests, not packets
- Enables apps to adapt at the last instant (see the sketch below)
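A sketch of that pattern under the illustrative API above: the application asks the CM for permission with cm_request() and only picks the data when the send callback fires, so it can adapt at the last instant. get_data() and transmit() stand in for application logic.

    /* Request/callback send pattern for an asynchronous source (sketch).
     * cm_request/cm_notify/cmapp_send follow the illustrative API above;
     * get_data() and transmit() are application placeholders. */
    #include <stddef.h>

    extern void  cm_request(int flow);
    extern void  cm_notify(int flow, size_t nsent);
    extern char *get_data(size_t *len);      /* app picks the data NOW */
    extern void  transmit(int flow, const char *buf, size_t len);

    void on_new_data_ready(int flow)
    {
        cm_request(flow);     /* schedule a request, not a packet: nothing
                                 is handed to the kernel yet */
    }

    /* CM callback: permission to send one packet has been granted. */
    void cmapp_send(int flow)
    {
        size_t len;
        char *buf = get_data(&len);   /* choose data at the last instant */
        transmit(flow, buf, len);
        cm_notify(flow, len);         /* tell the CM what actually went out */
    }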
12Transmission API (cont.)
- The request API fits asynchronous sources
- wait_for(some_events)
- get_data()
- send()
- ...
- Synchronous sources transmit on a timer and need to know the current rate
- do_every_t_ms
- get_data()
- send()
- ...
- Solution: cmapp_update(rate, srtt) callback (see the sketch below)
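For a synchronous, timer-driven source, here is a sketch of how the cmapp_update callback could drive adaptation; encode_at_rate() is a hypothetical application hook, and the callback signature simply mirrors the (rate, srtt) arguments named above.

    /* Sketch: a timer-driven source adapts to the rate reported by the CM.
     * encode_at_rate() is a hypothetical application hook (e.g., pick a
     * codec bit-rate); the callback signature follows the slide above. */
    static double current_rate_bps = 64000.0;   /* last rate reported by CM */

    void cmapp_update(int flow, double rate, double srtt)
    {
        (void)flow; (void)srtt;
        current_rate_bps = rate;          /* remember the new allowed rate */
    }

    extern void encode_at_rate(double bps);

    void do_every_t_ms(void)              /* called from the app's timer */
    {
        encode_at_rate(current_rate_bps); /* generate data to match the rate */
    }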
13Outline
- The Congestion Manager (CM)
- CM Architecture
- Evaluation of CM
- Interactions with QoS and Multi-Path Routing
- False-Sharing
- Application-Customizable Traffic Management
14Benefits of Sharing
[Figure: experiment between MIT and Utah over a 60 ms RTT, high-bandwidth path, comparing a CM-enabled sender to stock Linux]
- Web-like workload
- Issue a request every 500 ms regardless of completion of the earlier request
- Measure completion times
15Integrating Congestion Control Helps
- Throughput can benefit from sharing
- Web-like workload
- Internet path: MIT → Utah
- 500ms request spacing
- 128k file
- Applications benefit locally from the CM
16Outline
- The Congestion Manager (CM)
- CM Architecture
- Evaluation of CM
- Interactions with QoS and Multi-Path Routing
- False-Sharing
- Application-Customizable Traffic Management
17What is False Sharing?
- False sharing occurs when
- Flows 1 and 2 are treated differently by the network (e.g., DiffServ)
- Flows 1 and 2 take different paths (e.g., dispersity routing, NATs)
- Evaluate the impact, detection and response
[Figure: Src sends Flow 1 and Flow 2 to Dst]
- Is congestion control compromised? Does the performance of individual flows suffer?
- When and how can false sharing be detected?
- How should end systems be modified to deal with false sharing?
18Impact of False Sharing
- Simulation set-up
- 20% of flows belong to the Assured Forwarding class and 80% are Best Effort
- Bandwidth of a CM flow is determined by the slowest of its flows
- No danger of overloading links
- Performance may suffer
19Detecting False Sharing: Motivation
[Figure: loss correlation plot for unshared flows]
- Uncorrelated delays and losses across flows are a strong indicator of false sharing
- Tests compare auto-correlation with cross-correlation
20End System Response: Motivation
[Figure: correlation measure vs. time (seconds) for two cases, "Flows Share a Bottleneck" and "Flows Share no Bottlenecks", each plotting auto-correlation and cross-correlation]
- When the 90% confidence intervals for the auto- and cross-correlation metrics no longer overlap, the test outputs a decision of share or no share (a simplified version of the comparison is sketched below)
- It is harder to detect shared bottlenecks (90 secs) than to detect no shared bottlenecks (35 secs)
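To make the comparison concrete, here is a simplified sketch of such a test on delay samples: if the cross-correlation between two flows does not exceed each flow's own lag-1 auto-correlation, the flows look falsely shared. The real tests add confidence intervals and careful sampling; this shows only the core comparison.

    /* Simplified shared-bottleneck test (sketch): compare cross-correlation
     * of two flows' delay samples with each flow's lag-1 auto-correlation.
     * No confidence intervals here, unlike the tests described above. */
    #include <math.h>
    #include <stddef.h>

    static double mean(const double *x, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += x[i];
        return s / n;
    }

    /* Pearson correlation of x[0..n-1] and y[0..n-1]. */
    double correlation(const double *x, const double *y, size_t n)
    {
        double mx = mean(x, n), my = mean(y, n);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (size_t i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / sqrt(sxx * syy);
    }

    /* Lag-1 auto-correlation of one flow's delay samples. */
    double autocorrelation(const double *x, size_t n)
    {
        return correlation(x, x + 1, n - 1);
    }

    /* 1 = evidence of a shared bottleneck, 0 = evidence of false sharing. */
    int flows_appear_shared(const double *d1, const double *d2, size_t n)
    {
        double cross = correlation(d1, d2, n);
        double self  = fmin(autocorrelation(d1, n), autocorrelation(d2, n));
        return cross > self;
    }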
21End System Response: Design Issues
- Default behavior: start with congestion sharing and detect problems
- Won't overload the network
- Delay and loss correlation tests detect false sharing more easily than they detect shared bottlenecks
- Scheduling detection tests work best when packets are nicely interleaved
- Possible only when flows belong to the same macroflow
- Upon detection, segregate flows
- Congestion Manager associates flows with different macroflows
- Addition to the API to support control of sharing
22Outline
- The Congestion Manager (CM)
- CM Architecture
- Evaluation of CM
- Interactions with QoS and Multi-Path Routing
- False-Sharing
- Application-Customizable Traffic Management
23Motivation
- Storage is increasingly moving onto the Internet
- Interaction with Internet protocols (congestion control)
- SCSI over IP (iSCSI)
- Utilization concerns
- Increase sender aggressiveness
- Bandwidth allocation for storage apps
- Prioritize by task (backup vs data mining)
- Prioritize by user
24Best-Effort vs Service Differentiation
- TCP's congestion control + IP's best-effort service model
- Provides fair (equal) sharing of bandwidth
- What about bandwidth allocation that is not necessarily equal?
- Existing methods
- Integrated Services, Differentiated Services
- Depend on network support
- Problems
- Scalability (IntServ)
- No guaranteed service with high aggregation (DiffServ)
25Simple Solution
- Ask end-systems to do bandwidth allocation
- Scalable
- Fine-grained control on perceived performance
- But,
- How can the sending rate be controlled?
- What about security?
26Answers
- Sending rate
- Congestion control to limit sending rate
- Security
- Use iNICs to detect and prevent misbehavior
- In a controlled environment
- Enforce social behavior
27Congestion Control Revisited
28Controlling Sending Rate
- Can leverage congestion control
- Connection aggressiveness determines bandwidth share
- Aggressiveness can be tuned via congestion control parameters (see the sketch below)
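One way such tuning could work (an assumption patterned on MulTCP-style emulation of multiple connections, not necessarily the scheme evaluated in this work): scale the additive increase by the flow's weight w and soften the back-off to cwnd * (1 - 1/(2w)), so the flow behaves roughly like w ordinary TCP connections.

    /* Weighted AIMD (sketch, MulTCP-style assumption): a flow of weight w
     * increases its window by w per RTT and on loss backs off to
     * cwnd * (1 - 1/(2w)), roughly emulating w ordinary TCP connections. */
    void weighted_aimd_update(double *cwnd, double w, int loss_seen)
    {
        if (loss_seen)
            *cwnd *= 1.0 - 1.0 / (2.0 * w);   /* gentler back-off for higher w */
        else
            *cwnd += w;                       /* additive increase scaled by w */
        if (*cwnd < 1.0)
            *cwnd = 1.0;
    }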
29Basic Operation
- End-users with a higher QoS requirement get a higher weight
- Higher weight yields a larger share of bandwidth
- Assuming some mechanism to assign weights
- But,
- End-users mostly act as sinks
- Need the source system to employ appropriate per-flow parameters
30Solution: Token Exchange
- End-systems buy tokens
- Tokens = money
- Sending tokens over to the source = buying service
- Number of tokens sent determines the quality of service provided (a possible mapping is sketched below)
- How are tokens generated? Distributed?
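As one hypothetical mapping (purely an assumption for illustration), the source could convert the rate at which a receiver spends tokens into the weight used by the congestion controller, with a cap to keep any one flow from becoming too aggressive.

    /* Hypothetical token-to-weight mapping at the source: weight grows with
     * the rate of tokens received, up to a cap.  The price and cap are
     * illustrative assumptions, not values from the talk. */
    double weight_from_tokens(double tokens_per_sec)
    {
        const double tokens_per_unit_weight = 10.0;   /* illustrative price */
        const double max_weight = 8.0;                /* aggressiveness cap */
        double w = 1.0 + tokens_per_sec / tokens_per_unit_weight;
        return w > max_weight ? max_weight : w;
    }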
31Mechanism Issues
- What if everyone is rich?
- High loss
- Nobody gains
- Overall performance suffers
- Options
- Limit token availability
- Control mapping of tokens to aggressiveness
32Challenges
- Coping with many aggressive users
- API Design
- Usable by apps
- Achievable by network
- Leverages the CM
- Handle network dynamics
- Arrival patterns, flow lifetimes
- Adaptation speed
33Posters
- Application-Customizable Traffic Management
- Mukesh Agrawal, David Friedman, David Nagle and Srinivasan Seshan
- The Impact of False Sharing on Shared Congestion Management
- Aditya Akella, Hari Balakrishnan and Srinivasan Seshan
- URL
- http://nms.lcs.mit.edu/projects/CM
34CM Macroflow
- For efficient aggregation
- Need to identify which flows should share info
- Typically flows to the same dest
- Default granularity of aggregation
- Macroflow
- A group of flows sharing the same congestion state and state info
- All flows to the same destination address (see the sketch below)
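A sketch of that default aggregation rule (illustrative structures, not the CM's actual data structures): flows are looked up by destination address, and a new flow joins any existing macroflow to the same host so that it shares that congestion state.

    /* Sketch of default macroflow aggregation keyed by destination address.
     * Structures and the hash are illustrative, not the real CM internals. */
    #include <netinet/in.h>

    #define MF_TABLE_SIZE 256

    struct macroflow {
        struct in_addr dst;     /* all member flows go to this address */
        double cwnd, srtt;      /* congestion state shared by those flows */
        struct macroflow *next;
    };

    static struct macroflow *mf_table[MF_TABLE_SIZE];

    struct macroflow *macroflow_for(struct in_addr dst)
    {
        unsigned h = dst.s_addr % MF_TABLE_SIZE;
        for (struct macroflow *m = mf_table[h]; m; m = m->next)
            if (m->dst.s_addr == dst.s_addr)
                return m;       /* existing macroflow: share its state */
        return 0;               /* none yet: a real client would create one */
    }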
35Implementation Issues
- CM Client
- Sending app is responsible for a flow's transmissions
- TCP sender using CM for congestion control
- Implemented as an in-kernel client
- Same performance as standard TCP
- Only 0-3% overhead
- User-space clients
- Interact with the CM using a library (libcm)
- Kernel-user interface optimized to ensure minimum overhead