1
Scaling SIP Servers
  • Sankaran Narayanan
  • Joint work with CINEMA team
  • IRT Group Meeting April 17, 2002

2
Agenda
  • Introduction
  • Issues in scaling
  • Facets of sipd architecture
  • Some results
  • Conclusion and Future Work

3
Introduction: SIP servers
  • SIP signaling: proxy, redirect
  • Proxies
  • Call routing by contact location
  • UDP/TCP/TLS
  • Stateful or stateless
  • Programmable scripts
  • User location: registrars

4
What is scale?
  • Large call volumes, commodity hardware
    (Schulzrinne, "Industrial strength internet telephony")
  • Response times (mean, deviation), turnaround time
  • Goals
  • Delay budget (SIPstone)
  • R2 < 2 s
  • R1 < 500 ms
  • Class-5 switches handle
  • > 750K BHCA (busy hour call attempts)

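Figure: message flows behind the delay budget. Registration: REGISTER, 200 OK (response time R1). Call setup through the proxy: INVITE, 180, 200 and ACK, each relayed by the proxy (response time R2).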
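Checking measured response times against this budget can be sketched as follows. This is a minimal illustration, assuming samples are already collected in seconds; the name `summarize` and the choice to report the share of samples under the threshold (rather than judging only the mean) are assumptions for this sketch, not part of the SIPstone definition.

```python
from statistics import mean, pstdev

# Delay budget from the slide, in seconds (R1 < 500 ms, R2 < 2 s).
DELAY_BUDGET_S = {"R1": 0.5, "R2": 2.0}


def summarize(metric: str, samples: list[float]) -> dict:
    """Mean, standard deviation and share of samples within the budget.

    Reporting the fraction under the threshold is an assumption of this
    sketch; it is not taken from the SIPstone specification.
    """
    limit = DELAY_BUDGET_S[metric]
    within = sum(1 for s in samples if s < limit)
    return {
        "metric": metric,
        "mean_s": mean(samples),
        "stdev_s": pstdev(samples),
        "fraction_within_budget": within / len(samples),
    }


if __name__ == "__main__":
    # Hypothetical REGISTER response times (R1), in seconds.
    print(summarize("R1", [0.10, 0.20, 0.60, 0.30]))
```

Returning the deviation next to the mean keeps the variability visible, which the slide lists as a scaling criterion alongside the mean.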
5
Limits to scaling
  • Not CPU bound
  • Network I/O blocking
  • Wait for responses
  • Latency: contact and DNS lookups
  • OS resource limits
  • Open files (< 1024 on Unix)
  • LWPs (Solaris) vs. user-kernel threads (Linux,
    Windows)
  • Try not to
  • Customize and recompile the OS
  • Move (parts of the) server into the kernel (khttpd, AFPA, ...)
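The open-file limit can often be raised without recompiling the OS. A minimal sketch, assuming a Unix host and Python's standard `resource` module: the hypothetical helper raises the soft `RLIMIT_NOFILE` limit toward a requested value but never above the hard limit, which an unprivileged process cannot exceed.

```python
import resource


def raise_open_file_limit(desired: int) -> int:
    """Raise the soft open-file limit toward `desired`; return the soft limit.

    Hypothetical helper. An unprivileged process may raise its soft limit
    up to, but not beyond, the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
    # RLIM_INFINITY means "no limit", so there is nothing to raise.
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft


if __name__ == "__main__":
    print("soft open-file limit:", raise_open_file_limit(4096))
```

Raising the limit does not help code that still uses select(), which on most systems cannot watch descriptors numbered FD_SETSIZE (typically 1024) or higher; poll() has no such cap.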

6
The problem
  • Scaling CPU-bound jobs (throughput = 1/delay)
  • Hardware: CPU speed, RAM, ...
  • Software: better OS, scheduler, ...
  • Algorithm: optimize protocol processing
  • Blocking (network, disk I/O) is expensive
  • Hypothesis
  • Turn I/O-bound into CPU-bound by reducing blocking
  • Optimized resource usage gives stability at high loads

7
Facets of sipd architecture
  • Blocking
  • Process models
  • Socket management
  • Protocol processing

8
Blocking
  • Mutex, event (socket, timeout), fread
  • Queue builds up
  • Potentially high variability
  • Tandem queue system
  • Easy to fix
  • Non-blocking calls (event-driven; later!)
  • Move the queue to a different thread (lazy logger)

Figure: logger sequence: lock, write, unlock.
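The lazy-logger idea can be sketched as follows: request threads only enqueue a line, and a single background thread performs the blocking write, so no request thread waits on the logger's lock or on disk I/O. This is a hypothetical Python illustration, not sipd's code; the class name and the `None` shutdown sentinel are our own choices.

```python
import io
import queue
import threading
from typing import TextIO


class LazyLogger:
    """Request threads enqueue log lines; one writer thread does the I/O."""

    def __init__(self, sink: TextIO) -> None:
        self._queue = queue.Queue()  # unbounded, so log() never blocks
        self._sink = sink
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def log(self, line: str) -> None:
        # put_nowait never blocks on an unbounded queue.
        self._queue.put_nowait(line)

    def _drain(self) -> None:
        while True:
            line = self._queue.get()
            if line is None:  # sentinel posted by close()
                break
            self._sink.write(line + "\n")  # only this thread touches the sink
        self._sink.flush()

    def close(self) -> None:
        """Write out pending lines and stop the writer thread."""
        self._queue.put(None)
        self._writer.join()


if __name__ == "__main__":
    buf = io.StringIO()
    logger = LazyLogger(buf)
    logger.log("INVITE received")
    logger.close()
    print(buf.getvalue(), end="")
```

Because the queue is unbounded, sustained overload turns into memory growth rather than blocking; a bounded queue would need a drop policy of its own.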
9
Blocking (2)
  • Call routing involves (≥ 1) contact lookups
  • Approx. 10 ms per query
  • Cache
  • Works well for sipd-style servers
  • Fetch-on-demand with replacement (harder)
  • Loading the entire database is easy
  • Needs periodic refresh for long-lived servers
  • Potentially useful for DNS SRV lookups (?)

Figure: contacts are loaded from the SQL database into a cache with periodic refresh; a cache lookup takes < 1 ms.
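The "load the entire database, refresh periodically" variant can be sketched as below. It assumes the whole contact table fits in memory; `load_all` is a hypothetical callback standing in for the SQL query, not a specific database API.

```python
import threading
from typing import Callable, Dict, List

Contacts = Dict[str, List[str]]


class ContactCache:
    """In-memory copy of the contact table with periodic full reload."""

    def __init__(self, load_all: Callable[[], Contacts]) -> None:
        self._load_all = load_all  # hypothetical stand-in for the SQL query
        self._table: Contacts = dict(load_all())

    def lookup(self, user: str) -> List[str]:
        # Pure in-memory read: no database round trip on the call path.
        return self._table.get(user, [])

    def refresh(self) -> None:
        # Build the new table first, then swap the reference (attribute
        # assignment is atomic in CPython), so lookups see old or new data.
        self._table = dict(self._load_all())

    def start_periodic_refresh(self, interval_s: float) -> threading.Event:
        """Reload every `interval_s` seconds until the returned event is set."""
        stop = threading.Event()

        def loop() -> None:
            while not stop.wait(interval_s):  # False on timeout, so reload
                self.refresh()

        threading.Thread(target=loop, daemon=True).start()
        return stop
```

The refresh runs on its own thread, so the reload's database latency never lands on a call-routing request.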
10
REGISTER performance
Figure: REGISTER response time on a single-CPU Sun Ultra10. Response time stays constant with the cache (FastSQL).
11
Process models (1)
  • One thread per request
  • Doesn't scale
  • Too many threads over a short timescale
  • Stateless proxy: 2-4 threads per transaction
  • High load affects throughput

12
Process models (2)
Figure: incoming requests R1-R4 wait in a queue served by a thread pool.
  • Thread pool + queue
  • Less thread overhead, more useful processing
  • Overload management
  • Drop requests before responses; drop tail
  • Not enough if holding time is high
  • Each request holds (blocks) a thread
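A thread pool fed by a bounded queue with this overload policy might look like the sketch below. The exact policy is our assumption: when the queue is full, an arriving request is refused (drop tail), while an arriving response evicts the newest queued request, so requests are dropped in preference to responses. Names are hypothetical.

```python
import collections
import threading
from typing import Callable, Deque, Tuple


class OverloadQueue:
    """Bounded queue that drops requests before responses when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.dropped = 0
        self._items: Deque[Tuple[object, bool]] = collections.deque()
        self._cv = threading.Condition()

    def offer(self, msg: object, is_response: bool) -> bool:
        with self._cv:
            if len(self._items) >= self.capacity:
                if not is_response:
                    self.dropped += 1  # drop tail: refuse the new request
                    return False
                # Make room for the response by discarding the newest request.
                for i in range(len(self._items) - 1, -1, -1):
                    if not self._items[i][1]:
                        del self._items[i]
                        self.dropped += 1
                        break
                else:  # queue holds only responses: drop the newcomer
                    self.dropped += 1
                    return False
            self._items.append((msg, is_response))
            self._cv.notify()
            return True

    def take(self) -> object:
        with self._cv:
            while not self._items:
                self._cv.wait()
            return self._items.popleft()[0]


def start_pool(q: OverloadQueue, handle: Callable[[object], None], n: int) -> None:
    """Start n worker threads that process queued messages forever."""

    def worker() -> None:
        while True:
            handle(q.take())

    for _ in range(n):
        threading.Thread(target=worker, daemon=True).start()
```

The last bullet still applies: a worker that blocks for a long holding time serves nothing meanwhile, so a deeper queue mostly turns overload into extra delay.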

13
Stateless proxy (Solaris)
  • Turnaround time is almost constant for the stateless
    proxy
  • The sudden increase in response time is a client-side
    problem
  • UDP losses on the Ultra10 at (120 × 6 × 500 × 8) bps, i.e. about 2.9 Mbit/s

14
Stateless proxy (Linux)
  • Request turnaround time breaks down
  • Response turnaround time is constant
  • Effect of high holding times and thread scheduling
  • How should the queue size be set? (to investigate)
15
Queue evolution for sipd
Number of requests (y-axis) waiting in the queue
for a free thread on Solaris (left) and Linux
(right) over a period of up-time (x-axis).
16
Process models (3)
  • Blocking thread model needs too many threads
  • A stateful transaction stays alive for 30 s
  • Return the thread to the free pool instead of blocking
  • Event-driven architectures
  • State transitions triggered by a global event
    scheduler
  • Handlers such as OnIncoming1xx(), OnInviteTimeout(), ...
  • SIP-CGI: multiple pre-forked processes
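An event-driven transaction can be sketched with one global scheduler that runs message events and timers as callbacks, so no thread sits blocked for the transaction's lifetime. This is a hypothetical, simplified illustration (not the full RFC 3261 state machine): the handler names echo OnIncoming1xx() and OnInviteTimeout() from the slide, the 32 s timeout is chosen to match RFC 3261's default Timer B (64 × T1), and the clock is simulated to keep the sketch deterministic.

```python
import collections
import heapq
import itertools
from typing import Callable, Deque, List, Tuple

Callback = Callable[[], None]


class EventScheduler:
    """Single global scheduler: message events and timers run as callbacks."""

    def __init__(self) -> None:
        self.now = 0.0  # simulated clock, in seconds
        self._ready: Deque[Callback] = collections.deque()
        self._timers: List[Tuple[float, int, Callback, List[bool]]] = []
        self._seq = itertools.count()  # tie-breaker for equal due times

    def post(self, cb: Callback) -> None:
        self._ready.append(cb)

    def call_later(self, delay: float, cb: Callback) -> List[bool]:
        handle = [True]  # setting handle[0] = False cancels the timer
        heapq.heappush(self._timers, (self.now + delay, next(self._seq), cb, handle))
        return handle

    def run(self) -> None:
        while self._ready or self._timers:
            if self._ready:
                self._ready.popleft()()
            else:
                due, _, cb, handle = heapq.heappop(self._timers)
                self.now = due
                if handle[0]:
                    cb()


class InviteTransaction:
    """Simplified INVITE client transaction driven only by scheduler events."""

    TIMEOUT_S = 32.0  # assumption: RFC 3261 default Timer B (64 x T1)

    def __init__(self, sched: EventScheduler) -> None:
        self.state = "calling"
        self._timer = sched.call_later(self.TIMEOUT_S, self.on_invite_timeout)

    def on_incoming_1xx(self) -> None:  # cf. OnIncoming1xx()
        if self.state == "calling":
            self.state = "proceeding"

    def on_final_response(self) -> None:
        if self.state in ("calling", "proceeding"):
            self.state = "completed"
            self._timer[0] = False  # cancel the pending timeout

    def on_invite_timeout(self) -> None:  # cf. OnInviteTimeout()
        # Only times out if no response at all has arrived.
        if self.state == "calling":
            self.state = "timed_out"
```

Each transaction is just an object plus a timer entry, so thousands of 30 s transactions cost memory rather than threads.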

17
Socket management
  • Problem: open socket limit (1024), liveness
    detection, retransmission
  • One socket per transaction does not scale
  • Global socket if the downstream server is alive: soft
    state works for UDP
  • Hard for TCP/TLS connections
  • Worse for Java servers: no select, poll
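For UDP, one shared socket plus a soft-state liveness table can be sketched as follows: a downstream server counts as alive while datagrams keep arriving from it, and its entry simply expires after a period of silence. Class names and the 32 s default TTL are hypothetical choices for this sketch. It covers UDP only; with TCP/TLS each downstream connection still holds its own descriptor.

```python
import socket
import time
from typing import Callable, Dict, Tuple

Addr = Tuple[str, int]


class SoftStateTable:
    """A peer is alive while we keep hearing from it; entries expire silently."""

    def __init__(self, ttl_s: float = 32.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s  # hypothetical default
        self._clock = clock
        self._last_heard: Dict[Addr, float] = {}

    def heard_from(self, addr: Addr) -> None:
        self._last_heard[addr] = self._clock()

    def is_alive(self, addr: Addr) -> bool:
        t = self._last_heard.get(addr)
        if t is None or self._clock() - t > self.ttl_s:
            self._last_heard.pop(addr, None)  # expire stale state
            return False
        return True


class SharedUdpTransport:
    """One UDP socket shared by all transactions instead of one per transaction."""

    def __init__(self, table: SoftStateTable, bind: Addr = ("127.0.0.1", 0)) -> None:
        self.table = table
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(bind)

    @property
    def address(self) -> Addr:
        return self.sock.getsockname()

    def send(self, data: bytes, addr: Addr) -> None:
        self.sock.sendto(data, addr)

    def receive(self, timeout_s: float = 1.0) -> Tuple[bytes, Addr]:
        self.sock.settimeout(timeout_s)
        data, addr = self.sock.recvfrom(65535)
        self.table.heard_from(addr)  # any datagram refreshes liveness
        return data, addr

    def close(self) -> None:
        self.sock.close()
```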

18
Optimizing protocol processing
  • Not too useful if CPU is not the bottleneck
  • Text protocol - parsing, formatting overheads
  • Order of headers matters (Via)
  • Other optimizations (parse-on-demand, date
    formatting)
  • . . .
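Parse-on-demand can be sketched by keeping the raw header lines and scanning only as far as the requested header, while preserving order so that the first Via is the topmost hop. This is a hypothetical illustration assuming CRLF line endings, no header folding, and one value per header line (comma-combined Via values are not split). The compact forms (v for Via, i for Call-ID, and so on) are those defined in RFC 3261.

```python
from typing import List, Optional

# RFC 3261 compact header forms for a few common headers.
COMPACT = {"v": "via", "f": "from", "t": "to", "i": "call-id",
           "m": "contact", "l": "content-length"}


class LazySipMessage:
    """Keeps raw header lines and parses a header only when asked for it."""

    def __init__(self, raw: bytes) -> None:
        head, _, self.body = raw.partition(b"\r\n\r\n")
        lines = head.decode("utf-8").split("\r\n")
        self.start_line = lines[0]
        self._raw = lines[1:]   # untouched until a header is requested
        self.lines_scanned = 0  # instrumentation for the example

    def _values(self, name: str, first_only: bool) -> List[str]:
        want = COMPACT.get(name.lower(), name.lower())
        found: List[str] = []
        for line in self._raw:
            self.lines_scanned += 1
            key, _, value = line.partition(":")
            key = key.strip().lower()
            if COMPACT.get(key, key) == want:
                found.append(value.strip())
                if first_only:
                    break  # the rest of the message stays unparsed
        return found

    def header(self, name: str) -> Optional[str]:
        vals = self._values(name, first_only=True)
        return vals[0] if vals else None

    def all_headers(self, name: str) -> List[str]:
        # Order is preserved: for Via, index 0 is the topmost hop.
        return self._values(name, first_only=False)
```

A stateless proxy that only needs the top Via and the Request-URI never pays for parsing the remaining headers.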

19
Conclusion
  • Unlike web servers: can be stateful, less disk
    I/O, lesser impact of TCP stack/behavior, ...
  • Pros: UDP, stateless routing, load balancing
    using DNS, ...
  • Challenges: scaling the state machine, ...
  • Towards 2.5M BHCA (3600 messages/s)
  • Event-driven architecture (SEDA?)
  • Resource management (file limits, threads)
  • Tuning the operating system (scheduler, ...)
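BHCA counts call attempts in the busiest hour, so the per-second call rate is a division by 3600 s. The worked numbers below use only the two BHCA figures quoted in this talk; turning calls/s into messages/s additionally requires the number of SIP messages each call generates at the server, which depends on the call flow.

```latex
\begin{aligned}
2.5\,\text{M BHCA} &= \frac{2.5 \times 10^{6}\ \text{calls}}{3600\ \text{s}} \approx 694\ \text{calls/s},\\[4pt]
750\,\text{K BHCA} &= \frac{7.5 \times 10^{5}\ \text{calls}}{3600\ \text{s}} \approx 208\ \text{calls/s}.
\end{aligned}
```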

20
Future work
  • Stateful proxy performance
  • Evaluate event-driven architecture
  • Effect of request forking (> 1 contact) on
    server behavior
  • Programmable scripts
  • Queue management and overload control
  • Other types of servers (conference servers, media
    servers, etc.)

21
References
  • CINEMA web page. http://www.cs.columbia.edu/IRT/cinema
  • H. Schulzrinne. Industrial strength internet
    telephony. Presentation at the 6th SIP bakeoff, Dec.
    2000.
  • H. Schulzrinne et al. SIPstone: Benchmarking
    SIP server performance. CS Technical report,
    Columbia University.