Title: The In Vivo Testing Approach
1The In Vivo Testing Approach
- Christian Murphy, Gail Kaiser, Ian Vo, Matt Chu
- Columbia University
2Problem Statement
- It is infeasible to fully test a large system prior to deployment, considering:
- different runtime environments
- different configuration options
- different patterns of usage
- This problem may be compounded by moving apps
from single-CPU machines to multi-core processors
3Our Solution
- Continually test applications executing in the field (in vivo) as opposed to only testing in the development environment (in vitro)
- Conduct the tests in the context of the running application
- Do so without affecting the system's users
4[Figure: int main() ... foo(x) shown twice; in the second copy the call to foo(x) is paired with test_foo(x)]
5Contributions
- A new testing approach called in vivo testing, designed to execute tests in the deployment environment
- A new type of test called in vivo tests
- An implementation framework called Invite
6Related Work
- Perpetual testing [Clarke, SAS'00]
- Skoll [Memon, ICSE'04]
- Gamma [Orso, ISSTA'02]
- CBI [Liblit, PLDI'03]
- Distributed In Vivo Testing [Chu, ICST'08]
7Example of Defect Cache
private int numItems = 0, currSize = 0;
private int maxCapacity = 1024; // in bytes

public int getNumItems() {
    return numItems;
}

public boolean addItem(CacheItem i) throws ... {
    // Maximum capacity check
    if (currSize + i.size < maxCapacity) {
        numItems++;
        add(i);
        currSize += i.size;
        return true;
    }
    else return false;
}
8Insufficient Unit Test
public void testAddItem() {
    Cache c = new Cache();
    assert(c.addItem(new CacheItem()));
    assert(c.getNumItems() == 1);
    assert(c.addItem(new CacheItem()));
    assert(c.getNumItems() == 2);
}
1. Assumes an empty/new cache
2. Doesn't take into account the various states that the cache can be in
9Defects Targeted
- Unit tests that make incomplete assumptions about the state of objects in the application
- Possible field configurations that were not tested in the lab
- A legal user action that puts the system in an unexpected state
- A sequence of unanticipated user actions that breaks the system
- Defects that only appear intermittently
10Applications Targeted
- Applications that produce calculations or results that may not be obviously wrong
- Non-testable programs
- Simulations
- Applications in which extra-functional behavior may be wrong even if the output is correct
- Caching systems
- Scheduling of tasks
11In Vivo Testing Process
- Create test code (using existing unit tests or new In Vivo tests)
- Instrument the application using the Invite testing framework
- Configure the framework
- Deploy/execute the application in the field
12Model of Execution
[Flowchart: when a function is about to be executed, a decision is made whether to run a test. No: execute function. Yes: create sandbox, run test, and execute function.]
13Writing In Vivo Tests
/* Method to be tested */
public boolean addItem(CacheItem i) { . . . }

/* JUnit style test */
public void testAddItem() {
    Cache c = new Cache();
    if (c.addItem(new CacheItem()))
        assert (c.getNumItems() == 1);
}

/* In Vivo style test */
public boolean testAddItem(CacheItem i) {
    Cache c = this;
    int oldNumItems = getNumItems();
    if (c.addItem(i))
        return (c.getNumItems() == oldNumItems + 1);
    else return true;
}
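The difference between the two test styles can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyCache`, its integer-sized items, and in particular the defect (a full cache reports success without storing anything) are hypothetical and are not taken from the slides or from any real caching library. The sketch also omits sandboxing, so the in vivo check modifies the cache it runs on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy cache used only to contrast the two test styles.
class ToyCache {
    private final List<Integer> itemSizes = new ArrayList<>();
    private int numItems = 0, currSize = 0;
    private final int maxCapacity; // in bytes

    ToyCache(int maxCapacity) { this.maxCapacity = maxCapacity; }

    int getNumItems() { return numItems; }

    // Adds an item of the given size (in bytes).
    boolean addItem(int size) {
        if (currSize + size < maxCapacity) {
            itemSizes.add(size);
            currSize += size;
            numItems++;
            return true;
        }
        // Assumed defect for illustration: a full cache reports success
        // although nothing was stored.
        return true;
    }

    // JUnit-style check: only ever exercises a fresh, empty cache.
    static boolean unitTestAddItem() {
        ToyCache c = new ToyCache(1024);
        return c.addItem(100) && c.getNumItems() == 1
            && c.addItem(100) && c.getNumItems() == 2;
    }

    // In vivo style check: runs against whatever state this cache is in.
    boolean inVivoTestAddItem(int size) {
        int oldNumItems = getNumItems();
        if (addItem(size)) return getNumItems() == oldNumItems + 1;
        else return true;
    }
}

class InVivoTestDemo {
    public static void main(String[] args) {
        System.out.println("unit test passes: " + ToyCache.unitTestAddItem());
        ToyCache c = new ToyCache(1024);
        while (c.getNumItems() < 10) c.addItem(100); // 10 items, 1000 bytes used
        System.out.println("in vivo test, room left: " + c.inVivoTestAddItem(10));
        System.out.println("in vivo test, cache full: " + c.inVivoTestAddItem(100));
    }
}
```

The unit-style check passes because it always starts from an empty cache, while the in vivo style check fails only once the cache has reached the state in which the assumed defect shows up.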
14Instrumentation
/* Method to be tested */
public boolean __addItem(CacheItem i) { . . . }

/* In Vivo style test */
public boolean testAddItem(CacheItem i) { ... }

public boolean addItem(CacheItem i) {
    if (Invite.runTest("Cache.addItem")) {
        Invite.createSandboxAndFork();
        if (Invite.isTestProcess()) {
            if (testAddItem(i) == false) Invite.fail();
            else Invite.succeed();
            Invite.destroySandboxAndExit();
        }
    }
    return __addItem(i);
}
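The wrapper pattern above can be approximated in plain Java with a rough sketch. The slides describe forking a sandboxed test process; standard Java has no fork, so this sketch substitutes a copy of the object as the sandbox, which is an assumption and not how Invite works. The `Counter` class, the `testProbability` field, and the `failedTests` counter are all hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class standing in for an instrumented application object.
class Counter {
    static final AtomicInteger failedTests = new AtomicInteger();
    static double testProbability = 1.0; // assumed configuration knob

    private int value;

    Counter(int value) { this.value = value; }

    int get() { return value; }

    // Original method, renamed as in the slide's __addItem convention.
    private int __increment() { return ++value; }

    // In vivo style test; meant to be invoked on a sandbox copy.
    private boolean testIncrement() {
        int old = value;
        return __increment() == old + 1 && value == old + 1;
    }

    // Wrapper installed under the original method name.
    int increment() {
        if (Math.random() < testProbability) {
            // Stand-in for "create sandbox and fork": test a copy of the
            // current state so the real object is not modified by the test.
            Counter sandbox = new Counter(value);
            if (!sandbox.testIncrement()) failedTests.incrementAndGet();
        }
        return __increment(); // the real call always proceeds
    }
}

class InstrumentationDemo {
    public static void main(String[] args) {
        Counter c = new Counter(5);
        System.out.println("result: " + c.increment());
        System.out.println("failed tests: " + Counter.failedTests.get());
    }
}
```

Copying works here only because the object's state is a single field; state held outside the object (files, databases, network) would need a different isolation mechanism.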
15Configuration
- Each instrumented method has a set probability ρ with which its test(s) will run
- To avoid bottlenecks, the following can also be configured:
- Maximum allowed performance overhead
- Maximum number of simultaneous tests
- Also, what action to take when a test fails
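One way to combine a per-method probability with a cap on simultaneous tests is sketched below. The `TestPolicy` class and its method names are hypothetical and do not reflect Invite's actual configuration interface; the maximum-overhead setting and the failure action are left out.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical policy object; names are illustrative, not Invite's API.
class TestPolicy {
    private final double rho;               // probability that a call triggers a test
    private final int maxSimultaneousTests; // cap on tests running at once
    private final Random random;
    private final AtomicInteger running = new AtomicInteger();

    TestPolicy(double rho, int maxSimultaneousTests, long seed) {
        this.rho = rho;
        this.maxSimultaneousTests = maxSimultaneousTests;
        this.random = new Random(seed);
    }

    // Returns true if a test may start now. The caller must call
    // testFinished() once that test has completed.
    boolean tryStartTest() {
        if (random.nextDouble() >= rho) return false; // not selected this time
        if (running.incrementAndGet() > maxSimultaneousTests) {
            running.decrementAndGet();                // over the cap: back off
            return false;
        }
        return true;
    }

    void testFinished() { running.decrementAndGet(); }

    int runningTests() { return running.get(); }
}

class TestPolicyDemo {
    public static void main(String[] args) {
        TestPolicy policy = new TestPolicy(0.25, 2, 42L);
        int started = 0;
        for (int call = 0; call < 1000; call++) {
            if (policy.tryStartTest()) {
                started++;
                policy.testFinished(); // tests finish immediately in this demo
            }
        }
        System.out.println("tests started out of 1000 calls: " + started);
    }
}
```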
16Case Studies
- Applied the testing approach to two caching systems
- OSCache 2.1.1
- Apache JCS 1.3
- Both had known defects that were found by users (no corresponding unit tests for these defects)
- Goal: demonstrate that traditional unit tests would miss these defects but In Vivo testing would detect them
17Experimental Setup
- An undergraduate student created unit tests for the methods that contained the defects
- These tests passed in development
- The student was then asked to convert the unit tests to In Vivo tests
- A driver was created to simulate real usage in a deployment environment
18Discussion
- In Vivo testing revealed all defects, even though unit testing did not
- Some defects only appeared in certain states, e.g. when the cache was at full capacity
- These are the very types of defects that In Vivo testing targets
- However, the approach depends heavily on the quality of the tests themselves
19Performance Evaluation
- We instrumented three C and two Java applications with the framework and varied the value ρ (probability that a test is run)
- Applications were run with real-world inputs on a dual-core 3 GHz server with 1 GB RAM
- No constraints were placed on maximum allowable overhead or simultaneous tests
20Experimental Results
[Figure: execution time (seconds) versus percent of function calls resulting in tests (0, 25, 50, 75, 100)]
21Discussion
- Percent overhead is not a meaningful metric since it depends on the number of tests run
- More tests = more overhead
- Short-running programs with lots of tests will have significantly more overhead than long-running programs
- For C, the overhead was 1.5 ms per test
- For Java, around 5.5 ms per test
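The per-test figures above can be turned into a back-of-the-envelope estimate, sketched here. The call count, the value of the test probability, and the two program run times are made-up numbers chosen for illustration; only the 1.5 ms and 5.5 ms per-test costs come from the slides.

```java
class OverheadEstimate {
    // Expected added time in milliseconds:
    // (instrumented calls) * (probability a call triggers a test) * (cost per test).
    static double expectedOverheadMs(long instrumentedCalls, double rho, double msPerTest) {
        return instrumentedCalls * rho * msPerTest;
    }

    // The same absolute overhead expressed as a percentage of the base run time.
    static double percentOverhead(double overheadMs, double baseRunTimeMs) {
        return 100.0 * overheadMs / baseRunTimeMs;
    }

    public static void main(String[] args) {
        long calls = 10_000;   // hypothetical number of instrumented calls
        double rho = 0.25;     // hypothetical test probability

        double javaMs = expectedOverheadMs(calls, rho, 5.5);
        double cMs = expectedOverheadMs(calls, rho, 1.5);
        System.out.println("Java overhead (ms): " + javaMs);
        System.out.println("C overhead (ms): " + cMs);

        // Same number of tests, very different percentages:
        System.out.println("percent of a 10 s run: " + percentOverhead(javaMs, 10_000));
        System.out.println("percent of a 1000 s run: " + percentOverhead(javaMs, 1_000_000));
    }
}
```

Under these assumed inputs the absolute cost is identical in both runs, but the percentage differs by a factor of 100, which is why the slide treats per-test cost as the more useful number.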
22Future Work
- Ensure that a test does not affect the external system state (database, network, etc.)
- Adjust the frequency of test execution based on context or resource availability (CPU usage, number of threads, etc.)
- Apply the approach to certain domains, e.g. security testing
23Conclusion
- We have presented a new testing approach called in vivo testing, designed to execute tests in the deployment environment
- We have also presented an implementation framework called Invite
- In Vivo testing is an effective technique for detecting defects not caught in the lab
25Distributed In Vivo Testing [Chu, ICST'08]
- Testing load is distributed to members of an application community
- Each of the N members performs 1/Nth of the testing so as to reduce overhead
- We have also considered an autonomic approach that balances testing load according to usage profile
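One simple way to picture the 1/N split is sketched below, under the assumption that each member scales its local test probability by 1/N. The slides do not say how the load is actually divided, so this scheme and the names `DistributedLoad`, `perMemberProbability`, and `expectedTestsPerMember` are hypothetical.

```java
class DistributedLoad {
    // Assumed scheme: each of the n members runs tests with probability rho / n.
    static double perMemberProbability(double rho, int members) {
        if (members < 1) throw new IllegalArgumentException("members must be >= 1");
        return rho / members;
    }

    // Expected number of tests a single member runs for its own calls.
    static double expectedTestsPerMember(long callsPerMember, double rho, int members) {
        return callsPerMember * perMemberProbability(rho, members);
    }

    public static void main(String[] args) {
        double rho = 0.5;          // hypothetical community-wide test probability
        long callsPerMember = 800; // hypothetical number of calls per member
        for (int n : new int[] {1, 4, 8}) {
            System.out.println(n + " member(s): "
                + expectedTestsPerMember(callsPerMember, rho, n) + " expected tests each");
        }
    }
}
```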