VoIP Testing: A How-to Session for Performance and
Functional Test Methodologies
Chris Bajorek, Director, CT Labs
Before We Start
  • Every year we consult with many companies,
    helping them to perform many different types of
    VoIP-oriented tests
  • This provides a unique industry perspective on
    the market readiness of a wide range of VoIP
  • I'm pleased to have this opportunity to share our
    test experiences with you today

VoIP Products by Market Area
  • Residential (Voice over Broadband)
  • Analog terminal adapters, VoIP softphones,
    residential routers
  • Enterprise
  • IP PBXs, IP Contact Centers, VoIP phones and
    softphones, firewalls/ALGs, intrusion
    prevention devices, media servers (conferencing,
    voice mail, IVR)
  • Next-Gen Network Carriers and Service Providers
  • Session border controllers, softswitches, media
    servers, proxies, media gateways, VQ enhancement

Building VoIP Networks: IMS is here and it needs
  • Key elements of IMS
  • Enables innovative new applications
  • High levels of network complexity
  • Modules from multiple vendors must peacefully coexist
  • High rate of carrier adoption
  • Global deployments
  • Standards based
  • Exploits strengths of IP and SIP

IMS Basics: Functions by Layer
  • Services / Application Layer
  • Application servers, Media servers
  • Control / Switching Layer
  • HSS (Home Subscriber Server)
  • CSCF (Call Session Control Function)
  • BGCF (Breakout Gateway Control Function)
  • MGCF (Media Gateway Control Function)
  • MRFC (Media Resource Function Controller)
  • etc.
  • Transport / Access Layer

Risks of Inadequate Testing
  • From the CT Labs VoIP project files
  • VoIP terminal adapters that behave unreliably and
    emulate an occasional bad Internet connection
  • IP PBXs that drop calls when subjected to only
    certain types of call loads
  • VoIP soft clients that distort the caller audio
  • High-end enterprise firewalls that grind to a
    standstill under certain denial-of-service
  • Session border controllers that degrade voice
    quality at traffic levels below rated maximums

Test Automation-Reaping the Benefits of Shorter
Test Cycles
Test Automation-Why you should consider using it
sooner, not later
Test Automation: The Benefits
  • Tightly controlled test environment
  • All aspects of the test setup can be controlled
    and coordinated by testing scripts
  • Repeatable results
  • Key to resolving issues that arise during testing
  • Includes ability to exactly reproduce product
    settings and test conditions
  • Faster test execution
  • Weeks of manual testing can literally be executed
    in hours
  • Increased accuracy of results reporting
  • All of the above resulting in
  • Lower testing costs over the product's lifetime
  • Greater product and delivered-service reliability
  • Fewer field failures, fewer customer-reported

Challenges Using Live Callers in Tests
  • The exact timing and sequence of caller actions
    is not synchronized or repeatable
  • Ability to distinguish and describe nuances of
    results varies widely from person to person
  • i.e. reliability of reported results can be low
  • Ability to correlate assessment of voice quality
    and anomalies across multiple listeners is
    typically poor
  • Unless you just happen to know how to run ITU-T
    P.800 MOS tests
  • Call arrival profiles difficult to control when
    using large numbers of callers for load tests
  • In other words, don't expect more than coarse

Conference test, via automation
[Diagram: three automated test conferences, labeled Conf 1, Conf 2, and Conf 3]

Automation-based VoIP Testing Goals
  • Verify call-handling performance
  • Verify voice quality
  • With a wide variety of caller and noise
  • Verify performance under real-world traffic and
    network impairment conditions
  • Verify performance under malicious attack
  • Verify service reliability
  • i.e. Availability of service under extended test
    run durations
  • Verify interoperability and feature interaction
  • Verify quality of access to enhanced services
  • Applications such as voice mail, conferencing,
    IVR, etc.

Real-world automation testing: The 3-phase
  • Phase 1: Test with minimal stress in a sterile
  • i.e. no WAN impairments or network traffic, light
    call loads
  • This establishes an important performance
  • Phase 2: Test with realistic network traffic and
    call load conditions
  • Phase 3: Test to rated device call loads

Rules of Thumb that simply do not work
  • "I tested it with 50 calls and the CPU only went
    to 25%, so we know the device can scale to 200."
  • Not quite. Our experience shows that in fact
    most VoIP devices exhibit performance
    thresholding effects that are not linear and very
    hard to predict. In other words, after a certain
    load or capacity limit is reached the device can
    fail catastrophically.
  • If you don't test to full rated capacity, you are
    playing Russian Roulette with your customers.

Rules of Thumb that simply do not work
  • "We don't need to test voice quality because we
    are OEMing the software that does that part."
  • Dangerous assumption. OEM software typically has
    many interface points and configuration options
    and is hardly in and of itself a guarantee of
    performance. The glue code around these objects
    can still cause voice quality issues.

Emulation of Network Impairments
  • Perfectly clean networks are not the real world
  • Real networks corrupt the flow of packets in the
    following time-varying ways
  • Packet loss (especially burst loss), packet
    duplication, and out-of-order packets
  • Latency and jitter
  • Restricted bandwidth
  • If you test while inducing these conditions, your
    product or service will be the cause of far fewer
    post-deployment issues
  • You can perform both static and dynamic emulation
    of impairment conditions
  • Both have value depending on nature of the VoIP
  • e.g. IP phone that renegotiates codec type or
    codec mode when network degrades in mid-call

Adding Internet Mix Network Traffic
  • The goal: see the DUT's impact on VoIP calls when
    subjected to network traffic at rated capacity
  • Product examples
  • Firewalls, intrusion prevention devices, IP
    phones with integrated switch ports, session
    border controllers, etc
  • What we do: Generate real session-based Internet
    Mix traffic and measure throughput performance
    of VoIP calls and IMIX traffic
  • e.g. HTTP, FTP, P2P, SMTP, POP3, etc.
  • Open source tool: D-ITG, http://www.grid.unina.i
  • Notable vendor: Shenick (www.shenick.com)

Voice and Video Quality Assessment:
Automated Testing Techniques
Voice Quality Test Techniques
  • Automated VQ measurement techniques are designed
    to estimate the way humans perceive voice quality
  • MOS live listener tests done per ITU-T P.800
  • Active versus Passive VQ monitoring
  • Passive: E-Model via packet inspection
  • Active: end-to-end VQ measurement to the audio
  • Both techniques have their benefits

Active vs Passive VQ Testing
  • Active voice quality testing
  • Involves evaluation of received audio signals
    as compared to known references
  • i.e. you drive real 2-way calls through the VoIP
  • PESQ: ITU-T P.862 (2001)
  • High correlation with standard MOS-LQ subjective
  • Benefits: More accurate, uses mature standards
    (PESQ) for automated quality assessment
  • Negatives: Consumes VoIP network resources

Active vs Passive VQ Testing
  • Passive voice quality testing
  • Involves passive evaluation of call-based packet
  • ITU-T G.107 E-Model
  • Can return estimated MOS-LQ and MOS-CQ scores
    (Listening versus Conversational)
  • Benefits: Can be embedded into products and test
    equipment with relatively low resource footprint
  • Negatives: Ignores (or models) VoIP
    endpoint-specific responses to network
    conditions. Vendor implementations can vary.
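
The E-Model produces a transmission rating factor R, which is then converted to an estimated MOS. The sketch below shows only that final conversion step as given in ITU-T G.107. Computing R itself from packet loss, delay, and codec parameters is not shown, and the function name is a hypothetical choice.

```python
def r_factor_to_mos(r):
    """Convert an E-Model rating factor R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7.0e-6 * r * (r - 60.0) * (100.0 - r)

# Print the estimated MOS for a few sample R values.
for r in (50.0, 70.0, 80.0, 93.2):
    print(f"R = {r:5.1f}  ->  estimated MOS = {r_factor_to_mos(r):.2f}")
```

Because this score is derived from network statistics rather than from received audio, it cannot reflect distortion introduced inside an endpoint.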

How PESQ works
  • Computes a voice quality score by comparing
    degraded received audio with a reference speech
  • Reference prompts are actual speech clips played
    during an active test call
  • Quality scores relate only to the time during the
    test call when the reference prompts were played
    with far-end audio being captured
  • The calculation is not just comparing the
    reference and degraded waveforms, it is using a
    human perceptual model to ultimately compute a
    quality score (1 = bad to 4.5 = excellent)

What PESQ VQ Testing is designed for
  • PESQ is a way to quickly and cost-effectively
    estimate the effects of one-way speech distortion
    and noise on speech quality
  • PESQ is endpoint-agnostic: it can be used for
    VoIP-to-VoIP, VoIP-to-PSTN calls, etc.
  • Strengths
  • Provides excellent estimate of voice quality
  • Tests can be performed quickly
  • Tests are very repeatable

Passive versus Active VQ: A Real Example
  • From actual CT Labs project
  • In this example, the phone had quality issues
    that the passive test did not see
  • Being aware of the difference in scoring
    techniques is critical when debugging reported VQ

Video Quality Test Techniques
  • Automated Video quality measurement techniques
    estimate the way humans perceive picture quality
  • Live viewer tests done per ITU-R BT.500
  • Three classes of objective video quality
  • Full reference, partial reference, and zero
  • Full reference techniques
  • PSNR (most used), VQM, SSIM. See ITU-T J.144.
  • Compute intensive, not useful for real time
  • Software suite available at http://www.compressio
  • Zero reference techniques
  • Best suited for in-service monitoring
  • Standards activity continues
  • Encompasses quality tests for picture, audio,
    multimedia, and network's ability to carry
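
As an illustration of the full reference approach, here is a minimal sketch of PSNR computed between a reference frame and a degraded frame. It assumes each frame is a flat list of 8-bit luma samples; the function name and the sample values are hypothetical.

```python
import math

def psnr(reference, degraded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length frames."""
    if not reference or len(reference) != len(degraded):
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 4-sample "frames" used only to exercise the function.
ref_frame = [52, 55, 61, 59]
deg_frame = [54, 55, 60, 57]
print(f"PSNR = {psnr(ref_frame, deg_frame):.2f} dB")
```

PSNR needs the original frame at the measurement point, which is why full reference methods suit lab testing better than in-service monitoring.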

Load and Stress Testing
  • What it is
  • Verifying the DUT's performance at rated call and
    traffic loads
  • Verify those theoretical specs on the data
  • How many simultaneous sessions? It's all
  • A full load stress test on a 2-line VoIP
    terminal adapter will require 2 simultaneous
  • A full load stress test on a carrier-grade
    session border controller may require upwards of
    150,000 simultaneous SIP calls with media (or
  • The key is this: if you want to be assured of
    acceptable performance at your spec sheet limits,
    you cannot linearly scale a partial load test's
    results

Load / Stress Testing: Helpful Hints
  • Use call rates and call ramp profiles that
    emulate the actual call environment, e.g. burst,
    ramp, etc.
  • Monitor and log DUT platform resources during
  • CPU, memory, disk I/O, network I/O can all
    provide clues as to why a test failed
  • Capture periodic snapshots into a log file for
    post-test-run analysis: Windows Perfmon, Linux
    (various utilities)
  • Synchronize system clocks on DUT and test
    equipment devices before a test run
  • Allows failure events to be correlated from logs

Load / Stress Testing: Pitfalls
  • Temptation is to do high volumes of simple
  • Problem with this: it will not exercise internal
    resources in a real-world way
  • Example: Conference bridge load test
  • The wrong way: calls with a simple 1-dimensional
    "can you hear me?" test
  • The right way: multiple conferences of varying
    sizes with real talker-listener exchanges
  • Not running tests long enough
  • Not testing during DUT housekeeping periods
  • Leaving verbose DUT logging enabled can consume
    significant resources

Test Automation Setups
Functional Testing
  • What it is
  • Verify that the DUT can execute all features and
    functions correctly (positive stimulus/response
  • Verify that DUT responds properly to negative
  • Very often ignored, to the detriment of product
    stability in field
  • How many simultaneous sessions to test?
  • Depends on the device: one or a few as required to
    verify all features
  • Quick examples of functional
  • Application servers: Conferencing
  • Verify all host and listener TUI commands and DUT
  • VoIP endpoint devices: Terminal Adapters (TAs)
  • Verify all call features against
    softswitch/feature server environments
  • Question: Does verifying voice quality belong in
    a functional test?

Functional Testing: A Few Hints
  • Test script synchronization with DUT is key
  • DTMF or MF handshaking
  • Typically involves tagging voice prompts with
    numeric sequences
  • Speech recognition
  • Delays
  • Automation-based functional tests allow
  • Much faster test cycles
  • TA functional test plan comparison
  • 150 test cases verified against 4 different
    softswitch platforms
  • Good idea: a functional test suite can be turned
    into a performance test suite
  • If the tests are designed on a flexible call
    generator platform
  • Can mix call traffic from functional and load
    generator platforms

Test Automation Setups
Session Border Controller/Firewall Automation
Setup Goals
  • Verify call-handling performance and advertised
    specifications at real-world high density VoIP
  • Verify Voice Quality under different Codec, frame
    packing, and other configuration settings
  • Verify call-handling performance when subjected
    to different call rate profiles, e.g. burst,
    ramp, etc.
  • Verify thru-SBC registration performance under
    burst registration conditions
  • Verify ability to survive and handle legitimate
    VoIP call loads while under various types of DoS
  • Verify long-term call handling reliability

SBC / Firewall Automation Test Setup
[Diagram labels: SIP Attack Generator, Protected Network, Unprotected Network, voice quality opt. (shown twice)]

Terminal Adapter (TA) Real-World Network Model

Automated Feature Test Suite Goals
  • Automate as much of the Terminal Adapter
    interoperability feature regression test as
  • i.e. Verify call features of TA devices against
    core VoIP service architectures
  • Support input configuration files, event and
    error log files
  • Support multiple TA devices and PSTN access lines
    in setup

Automation Feature Test Solution
Automation Feature Test Framework Details
  • Supports 140 feature tests
  • Including 2-way calls, 3-way calls, features
    including hold/park/transfer, 911/411, voice
    mail, voice quality checking
  • Test run results captured in easily analyzed logs
  • Custom reports are generated
  • Individual test case scripts easily changed

Setting up for IMS Tests
IMS Product Tests                 IMS Function Tests
Application Server                App Server, OSA-SCS, IM-SSF, SCIM, CCCF
Gateway Controller, Call Agents   BGCF, MGCF, I-BCF
Border Elements, Gateways         A-BGF, T-MGF, SGF, IWF, I-BGF
Subscriber Databases              HSS, SLF
Media Servers                     MRFC, MRFP
Policy and Resource Function      PDF, NASS, RACS
Session Controllers               S-CSCF, I-CSCF, P-CSCF
  • Emulation of IMS devices in a QA lab setting will
    be critical unless you plan to purchase,
    support, and maintain a wide variety of third
    party IMS devices in your lab, a costly and
    time-consuming proposition.

VoIP Security Testing: Issues to consider
VoIP Vulnerabilities/Threats
  • The bad news: VoIP systems are vulnerable
  • Platforms are vulnerable
  • VoIP-specific attacks are becoming more common
  • The good news: The threat is still developing
  • VoIP handsets are still in minority out there
  • "Vast majority of VoIP is company-internal"
    (Mark Collier, CTO, SecureLogix)
  • VoIP networks share the same vulnerabilities that
    plague data networks, PLUS some specific
    additional threats

VoIP Product Vulnerabilities
[Diagram: layers of a VoIP product and example threats at each layer]
  • Voice Applications: Toll Fraud, SPIT
  • VoIP Protocols: Protocol Attacks, SIP Floods, RTP Floods
  • Services (Database, Web Server): Slammer worm, SQL attacks
  • Network Stack(s) (IP, UDP, TCP, RTP): SYN Floods, etc. (many)
  • Telephony Devices, Network Devices: OS attacks (Windows worms, viruses)
  • Physical infrastructure (power, wiring): Physical Hacking

DoS Attack Testing
  • Generate SIP-specific attacks (send fuzzed and
    other types of SIP protocol packet floods) while
    also sending legitimate SIP calls
  • Measure call performance (dropped, blocked,
    delayed calls), voice quality with security
    measures in place
  • Test calls sent with media (real speech) to
    verify true voice quality via PESQ while under

SIP-Specific Attacks to Launch
  • i.e. in addition to lower-layer well known DoS
  • Blast packets from these scenarios at up to line
  • Malformed and Torture Test floods
  • Using SIP packets from open source PROTOS test
  • INVITE, REGISTER, and Response floods
  • Spoofed variations for above
  • i.e. Spoofing the IP address and port of
    legitimate devices, or spoofing the Via or AoR
    of legitimate users
  • RTP attacks
  • Rogue / Random RTP Fraud and Floods

SIP-Specific Attacks: What to expect
  • Run each variation for 10-15 minutes
  • In the presence of varying levels of legitimate
    VoIP traffic
  • Monitoring DUT resources (CPU, memory), call
    completion rates, and voice quality of completed
  • It's typical to see threshold failure effects
  • i.e. above certain levels of legitimate SIP calls
    and attack packets, service takes a major hit.
    Below that threshold normal calls may be handled
  • DUT often shows weakness within seconds of test
  • DUT may exhibit hard or soft crashes
  • Voice quality may show early warning of
    catastrophic failure

Good resources on VoIP Security
  • NIST: National Institute of Standards and
    Technology
  • Publication 800-58, Security Considerations for
    VoIP Systems (99 pgs, free)
  • http://csrc.nist.gov/publications/nistpubs
  • VoIPSA: Voice over IP Security Alliance
  • Promoting education and awareness, research,
    testing methodologies, and tools
  • Extensive membership: vendors, VoIP providers,
    researchers, security vendors, test tool vendors
  • www.voipsa.org
  • PROTOS group - University of Oulu in Finland
  • Using protocol fuzzing to discover a wide variety
    of DoS and buffer overflow vulnerabilities
  • Have exposed HTTP, LDAP, SNMP, WAP, and VoIP
  • www.ee.oulu.fi/research/ouspg/protos/index.html
  • Mu Security
  • Manufacturers of a powerful protocol mutation
    tester (Mu-4000)
  • www.musecurity.com

Feel free to call if you have any questions
Chris Bajorek
chris_at_ct-labs.com
916-577-2110 (direct line)
