Title: Scalable, Robust Wide-area Control Architecture for Integrated Communications
1Scalable, Robust Wide-area Control Architecture
for Integrated Communications
- Helen J. Wang
- Qualifying Examination
- March 8, 2000
2Motivation
- Lack of support for
- Integrated use of heterogeneous devices (old and new)
- Rapid, arbitrary communication service customization
3Limitations of Existing Systems
- Telecommunications network
- engineered with one app and device in mind
- Existing Internet Telephony systems
- ease of service creation, but limited
- scalability, availability and fault tolerance not
fully addressed
4How good is a communication system? (Dissertation Goals)
- Functionality: the communication services it can support, and the ease of creating them
- Viability: scalability, robustness
- Focus on the control aspect
- control architecture: system components and signaling protocol (session setup, tear-down, and control)
5Problem Statement
- Given heterogeneity, how to design a scalable,
robust wide-area control architecture that
supports easy creation of a wide range of
communication services? And how should these
services be created?
6Outline
- Related Work and Research Contribution
- Control Architecture
- Signaling Protocol
- Service Creation Model
- Summary, Methodology, Research Agenda
7Related Work
8Overview of Research Contributions
- A scalable control architecture
- A robust signaling protocol
- A user-level, easy service creation model
- Publications
- A Signaling System Using Light Weight Sessions, accepted to Infocom 2000.
- Helen J. Wang, et al. ICEBERG: An Internet-Core Network Architecture for Integrated Communications, accepted to IEEE Personal Communications, April 2000.
9Outline
- Related Work and Research Contribution
- Control Architecture
- Signaling Protocol
- Service Creation Model
- Summary, Methodology, Research Agenda
10Control Architecture Goals
- Any-to-any communication
- inter-working, composition of data transformation
- Personal mobility
- unique ID, name mapping
- Personalized communication services
- preference storage and management
- Enable user-activity driven services
- activity tracking
11Control Architecture: Components and Their Operations
[Figure labels: Alice_at_domain1, Bob_at_domain2, Pick up, Data Path]
12Leverage Cluster Computing Platforms
- iPOP must be scalable and robust: leverage cluster computing platforms such as Ninja and AS1
- Our requirements
- highly available service invocation: Ninja Base
- fault-tolerant service session: AS1
- session state maintained on client (IAP)
- iPOP on Ninja Base, augmented with client heartbeat support from AS1
13Control Architecture: Facts
[Figure labels: Access net, Call Agent, IAP, PR, PAC; Local area communication, Wide-area communication]
- One Call Agent per caller per device
- One type of IAP per access network
14Outline
- Related Work and Research Contribution
- Control Architecture
- Signaling Protocol
- Service Creation Model
- Summary, Research Methodology, Agenda
15Signaling Protocol
- Basic call service: building blocks for supplementary services
- Conventional: two-party, homogeneous devices
- ICEBERG communication model
- multi-device communication
- invitation-based participation
- large number of dynamic small-group communications
- Richer primitives: add/remove an endpoint during a session
- conference call and service handoff are first-class services; services that require endpoint changes are trivial to implement
16Challenges in Signaling: Problems with SIP
[Figure labels: CA1, CA2, CA5; Alice, Bob, Carol, Dale]
- no consideration of session dynamics: membership, component failure
- bridged conference: centralized component to maintain state -- single point of failure
17Problems with H.323
- Centralized approach for conferencing
- Limited fault tolerance measures
- process-pair style
- cannot capture new state during fault recovery
- Complex
18Lessons Learned
- Correctness and robustness
- need to maintain up-to-date membership and session state (call parties, device status, data path info) in the face of transient component failures, network partitions, and any exceptional conditions
- distributed approach rather than centralized
19Our Approach
- Maintain membership and session state as soft state in a distributed fashion
- Soft state: expires unless refreshed; protocol action upon new state or timeout; error recovery same as normal operation
- Questions: call setup latency requirement? bandwidth scalability problems?
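The soft-state idea on this slide can be illustrated with a minimal sketch. The class name `SoftStateTable`, its callbacks, and the explicit `now` argument are hypothetical choices made for this illustration; they are not taken from the ICEBERG implementation.

```python
class SoftStateTable:
    """Hypothetical soft-state store: entries expire unless refreshed."""

    def __init__(self, lifetime, on_new=None, on_timeout=None):
        self.lifetime = lifetime                      # seconds an entry survives without refresh
        self.on_new = on_new or (lambda key, value: None)
        self.on_timeout = on_timeout or (lambda key: None)
        self._entries = {}                            # key -> (value, deadline)

    def refresh(self, key, value, now):
        """Install or refresh an entry; act only when the state is new or changed."""
        old = self._entries.get(key)
        self._entries[key] = (value, now + self.lifetime)
        if old is None or old[0] != value:
            self.on_new(key, value)

    def expire(self, now):
        """Drop entries whose deadline has passed and act on each timeout."""
        dead = [k for k, (_, deadline) in self._entries.items() if deadline <= now]
        for k in dead:
            del self._entries[k]
            self.on_timeout(k)
        return dead
```

A lost refresh and a failed component look the same to the receiver: the entry simply times out, so error recovery takes the same code path as normal operation.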
20Signaling Protocol: Session Membership
- Session membership
- membership: CAs
- IP multicast group service: overkill for small-group communication
- per-group state in routers, IP address scarcity, deployment issues (access control, accountability)
- Solution: run an application-level group membership protocol among participating IAPs
21Signaling Protocol: Capture the Complete Session State
[Figure labels: Call Agent, Session state, Comm Session, iPOP, IAP, iPOP HB]
22Signaling Protocol: Fault Tolerance
[Figure labels: Call Agent, Session state, Comm Session, APC, iPOP, IAP, iPOP HB]
23Signaling Protocol: Fault Tolerance
[Figure labels: Call Agent, Session state, Comm Session, APC, iPOP, IAP]
24Signaling Protocol: Fault Tolerance
[Figure labels: Call Agent, Session state, Comm Session, APC, iPOP, IAP]
25Invitation Protocol
- Invite a Call Agent to participate in a session
- Also a soft state protocol for robustness
- IAP maintains the call state machine and sends a stateful, keep-alive heartbeat to the iPOP
- Call Agents advance call state machines on IAPs through periodic install-state messages until receiving a new heartbeat with the new state
- Soft state inter-iPOP communication
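The install-state loop described on this slide can be sketched as a small simulation. The names `IAP` and `advance_state`, and the `delivered` sequence that models message loss, are assumptions made for this illustration; the sketch also assumes every heartbeat reaches the Call Agent.

```python
class IAP:
    """Hypothetical access point holding a call state machine."""

    def __init__(self, state="idle"):
        self.state = state

    def install(self, new_state):
        self.state = new_state

    def heartbeat(self):
        # The keep-alive heartbeat is stateful: it carries the current call state.
        return self.state


def advance_state(iap, new_state, delivered, max_periods=10):
    """Resend install-state once per period until a heartbeat shows new_state.

    `delivered` yields one boolean per period: True if that period's
    install-state message got through, False if it was lost.
    Returns the number of periods used, or None if the state never advanced.
    """
    for period, ok in zip(range(1, max_periods + 1), delivered):
        if ok:
            iap.install(new_state)
        if iap.heartbeat() == new_state:
            return period
    return None
```

The sender needs no explicit acknowledgement or retransmission timer: the next heartbeat either confirms the new state or triggers another install-state message.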
26Bandwidth Scalability
- Soft state period selection: call setup latency and fault recovery time vs. bandwidth overhead
- An optimization problem: minimize bandwidth overhead, subject to the following constraints
- expected call setup latency (1.5 seconds)
- standard deviation (0.5 second)
- fault recovery time (1 and 4 seconds for local and wide area)
- parameters: 2% wide-area loss rate, 0.2% local-area loss rate, 2 ms local-area propagation delay, 100 ms wide-area delay
- Result: local-area period 1 s, 800 bps; wide-area period 3 s, 233 bps
- for a 64 kbps data stream, local-area control traffic is about 1%
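As a quick check of the overhead figures on this slide, the control traffic can be expressed as a fraction of the 64 kbps data stream. The function name `overhead_fraction` is a hypothetical helper for this illustration.

```python
def overhead_fraction(control_bps, data_bps):
    """Control traffic as a fraction of the data stream rate."""
    return control_bps / data_bps


DATA_BPS = 64_000                              # 64 kbps data stream from the slide
local = overhead_fraction(800, DATA_BPS)       # local-area control traffic, 1 s period
wide = overhead_fraction(233, DATA_BPS)        # wide-area control traffic, 3 s period

print(f"local-area overhead: {local:.2%}")
print(f"wide-area overhead:  {wide:.2%}")
```

Both ratios come out at roughly one percent or below, which is consistent with the slide's claim that control traffic is small relative to the data stream.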
27Processing Scalability
- Compare our single-cluster system against a class 4 switch, which is a local (end) office: 250 calls/second
- Our current prototype yields 10 calls/second on a PC due to an inefficient RMI implementation (10s of ms); 25 PCs match a class 4 switch
28Outline
- Related Work and Research Contribution
- Control Architecture
- Signaling Protocol
- Service Creation Model
- Research Agenda
29Service Creation Model
- Focus: control, redirection services
- Goal: end users can easily customize the control services in any arbitrary way
- Issues
- service creation/customization
- service invocation
- service portability
- system support
30Intelligent Network
- Separate service logic from basic call processing
[Figure labels: Switch, Service Logic, Trigger]
- Service portability: standardize the basic call state machine -> too strict a standard -> failed
- Limitation: no user-level customization
31Proposed Approach
- Call-processing-implementation-independent customization: use high-level events, e.g., call request received, callee device busy, callee device not answering
- Service creation: condition-action pairs
- Condition: conjunction of high-level events, user-interested conditions, and boolean expressions
- Action: composition of system primitives
- Hypothesis: condition-action pairs are sufficient
32Proposed Approach: Service Invocation and Portability
[Figure labels: Preference Registry, Call Agent, PAC, Condition/Action pairs]
- Service portability: standardize the events and system primitives, much easier than standardizing the call state machine
33An Example: Completion of Calls to Busy Subscriber
- callee busy and caller hangs up
- -> register with callee PAC
- callee PAC reject
- -> exit
- callee PAC notify
- -> invite caller, invite callee
- caller busy -> wait 5 minutes
- re-register with the callee PAC
- hangup time > 1 hour
- -> de-register with callee PAC, exit
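The busy-subscriber example above can be read as a table of condition-action pairs. The sketch below encodes it that way; the `Rule` type, the event and primitive names, and the `hours_since_hangup` context key are hypothetical, chosen only to mirror the slide's wording.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    events: FrozenSet[str]                         # conjunction: all must be present
    actions: Tuple[str, ...]                       # sequence of system primitives
    guard: Callable[[Dict], bool] = lambda ctx: True  # extra boolean expression


RULES = [
    Rule(frozenset({"callee_busy", "caller_hang_up"}), ("register_with_callee_pac",)),
    Rule(frozenset({"callee_pac_reject"}), ("exit",)),
    Rule(frozenset({"callee_pac_notify"}), ("invite_caller", "invite_callee")),
    Rule(frozenset({"caller_busy"}), ("wait_5_minutes", "re_register_with_callee_pac")),
    Rule(frozenset(), ("de_register_with_callee_pac", "exit"),
         guard=lambda ctx: ctx.get("hours_since_hangup", 0) > 1),
]


def fire(events, ctx=None) -> List[str]:
    """Return the primitives of every rule whose condition holds."""
    ctx = ctx or {}
    present = set(events)
    actions: List[str] = []
    for rule in RULES:
        if rule.events <= present and rule.guard(ctx):
            actions.extend(rule.actions)
    return actions
```

For instance, `fire({"callee_busy", "caller_hang_up"})` selects only the registration primitive, while a context with `hours_since_hangup` above 1 selects de-registration and exit.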
34An Example, Cont.
- System support issues
- extended Call Agent lifetime
- queue management on the PAC
- track event sequence: stack of timed events, stack depth depending on user preferences
35How good is a communication system?
- Functionality: services
- component identification
- powerful signaling protocol primitives
- easy, user-centric service creation model
- Viability: scalability, robustness
- first application of soft state to a signaling protocol; bandwidth overhead is not an issue; can fulfill latency requirements
- processing scalability and local-area robustness by leveraging cluster computing platforms
36Outline
- Related Work and Research Contribution
- Control Architecture
- Signaling Protocol
- Service Platform
- Methodology and Research Agenda
37Methodology: 1st Iteration (Completed)
- Control architecture
- Session maintenance protocol
- Control architecture
- Signaling protocol
- session maintenance protocol
Design
Prototype
Analysis Evaluation
- Measured the current prototype
- Simple soft state period analysis
38Methodology: 2nd Iteration Overview
- Service creation model
- Possibly revise the design of the control architecture and the signaling protocol
- Completed work
- invitation protocol
- membership protocol
- Wide-area testbed
- Group membership protocol
- Invitation protocol
- Service creation model
Design
Prototype
Analysis Evaluation
- Evaluation: scalability, robustness, service creation, hard/soft state comparison
- Analysis: group membership protocol, service creation
39Research Agenda
- Phase 1: complete and fine-tune service creation model design (1 month)
- define events and system primitives
- preference conflict resolution
- identify service creation interaction with the control architecture and signaling
- Planned paper submission on service creation model design to SmartNet 3/31
40Research Agenda
- Phase 2: 2nd iteration prototyping (3 - 6 months)
- invitation protocol, membership protocol
- employ Ninja vSpace
- release ICEBERG to Ericsson, TU Berlin, NTT and construct a wide-area test-bed
- service creation model
- Planned paper submission to ICNP (May) or INFOCOM (July) on protocols and analysis
41Research Agenda, Cont.
- Phase 3: Evaluation (6 months)
- processing scalability: measure call processing time and # of simultaneous sessions, compare against class 4 switch
- bandwidth scalability: group membership protocol analysis, dynamic soft state period selection
- robustness: emulate failure conditions (losses, long delays, component failures), run system over time
- hard/soft state comparison: bandwidth usage, latency, fault recovery time
42Research Agenda, Cont.
- Service creation evaluation
- comparable functionality: implement representative IN services such as call completion upon busy
- new services such as policy-based call waiting
- system extensibility: # of lines of code and amount of time to develop new primitives for new services
- Planned paper submission on wide-area testbed experience and evaluation to SIGMETRICS 3/2001
43Research Agenda, Cont.
- Phase 4: Write thesis (6 months)
- compile the publications
44Acronyms Lookup
- APC: Automatic Path Creation
- CA: Call Agent
- IAP: ICEBERG Access Point
- iPOP: ICEBERG Point of Presence
- NMS: Name Mapping Service
- PAC: Personal Activity Coordinator
- PR: Preference Registry
45Soft and Hard State
- Soft State
- expires unless refreshed; protocol action upon new state and timeout
- loss of state will not stop the system -- robust
- eventual consistency
- error recovery built into normal operation -- simple, but longer latency, and no diagnosis
- Hard State
- explicit state setup, once only (bandwidth and processing efficiency)
- explicit error detection and recovery, synchronously at involved components -- complex but immediate
- better consistency guarantees
46Signaling Protocol: Group Membership Protocol
- Periodic membership exchange among members
- no bootstrapping needed: every member knows at least one other member (invitation-based)
- on receiving a superset or disjoint set: immediate synchronization with the rest of the session
- run among the IAPs for Call Agent fault recovery
- time-stamped <IAP, CA> list
- Convergence efficiency rather than bandwidth efficiency
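One way to picture the membership exchange is as a merge of time-stamped (IAP, CA) lists. The sketch below is an illustration under assumptions: the `merge` function, the dictionary representation, and the rule of keeping the newest timestamp per member are not specified on the slide. Member removal is left to soft-state expiry and is not modeled.

```python
from typing import Dict, Tuple

Member = Tuple[str, str]      # (IAP, CA) pair, as in the slide's list
View = Dict[Member, float]    # member -> timestamp of the latest report


def merge(local: View, received: View) -> Tuple[View, bool]:
    """Merge a received membership list into the local view.

    Returns the merged view and a flag saying whether to synchronize with the
    rest of the session immediately instead of waiting for the next period.
    """
    merged = dict(local)
    for member, ts in received.items():
        # Keep the most recent timestamp seen for each member.
        if ts > merged.get(member, float("-inf")):
            merged[member] = ts
    mine, theirs = set(local), set(received)
    # Immediate sync when the peer knows strictly more, or knows a disjoint set.
    sync_now = theirs > mine or mine.isdisjoint(theirs)
    return merged, sync_now
```

Triggering an immediate exchange in the superset and disjoint cases favors fast convergence over bandwidth savings, matching the slide's last bullet.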
47Period Selection
- Soft state period dominates fault recovery time, affects bandwidth overhead
- cannot trade latency for bandwidth scalability
- Problem: what period values to select to fulfill the call setup latency and fault recovery latency requirements and minimize the bandwidth overhead? -- an optimization problem
48Select Period: Problem Formulation
- Call setup latency = time to receive 8 local-area and 4 wide-area msgs in sequence + msg processing time
- Time to receive a local-area msg = f(local-area period, local-area loss rate, local-area propagation delay)
- The optimization problem
- find local-area and wide-area periods that minimize bandwidth overhead, subject to the following constraints
- E(call setup latency) < 1.5 seconds
- Standard deviation (call setup latency) < 0.5 second
- local-area fault recovery time < 1 s; wide-area < 4 s
- with parameters: 2% wide-area loss rate, 0.2% local-area loss rate, 2 ms local-area propagation delay, 100 ms wide-area delay
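The problem on this slide can be written compactly as a constrained minimization. The symbols are assumptions introduced here for illustration: $T_l$ and $T_w$ are the local-area and wide-area soft state periods, $B$ is the bandwidth overhead, $L$ is the call setup latency, and $R_l$, $R_w$ are the local-area and wide-area fault recovery times.

```latex
\begin{aligned}
\min_{T_l,\,T_w} \quad & B(T_l, T_w) \\
\text{subject to} \quad & \mathbb{E}[L] < 1.5\ \text{s}, \\
& \sigma(L) < 0.5\ \text{s}, \\
& R_l < 1\ \text{s}, \qquad R_w < 4\ \text{s}
\end{aligned}
```

Here $L$ is the sum of the receive times of 8 local-area and 4 wide-area messages plus message processing time, and each receive time depends on the period, loss rate, and propagation delay of its network, as stated in the bullets above.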
49Results Period f (processing)
- fault recovery time constraints dominate the effects on period
- local-area period: 1 s
- 800 bps overhead
- wide-area period: 3 s
- 233 bps overhead
- for 64kbps data stream, 1 of members
50Proposed Approach: Service Creation
[Figure labels: Call Agent, GUI, User]
- Condition: conjunction of high-level events, user-interested conditions, and boolean expressions
- Action: sequence of system primitives
- Advantage: call processing implementation independent
- Hypothesis: condition-action pairs are sufficient
51An Example