Title: 272: Software Engineering Fall 2008
1. 272 Software Engineering Fall 2008
- Instructor: Tevfik Bultan
- Lecture 15: Interface Extraction
2. Software Interfaces
- Here are some basic questions about software interfaces:
  - How to specify software interfaces?
  - How to check conformance to software interfaces?
  - How to extract software interfaces from existing software?
  - How to compose software interfaces?
- Today we will talk about some research that addresses these questions
3. Software Interfaces
- In this lecture we will talk about interface extraction for software components
- The interface of a software component should answer the following question:
  - What is the correct way to interact with this component?
  - Equivalently, what are the constraints imposed on other components that wish to interact with this component?
- Interface descriptions in common programming languages are not very informative
  - Typically, the interface of a component is a set of procedures with their names and their argument and return types
4. Software Interfaces
- Let's think about an object-oriented programming language
- You interact with an object by sending it a message (which means calling a method of that object)
- What do you need to know to call a method?
  - The name of the method and the types of its arguments
- What are the constraints on interacting with an object?
  - You need a reference to the object
  - You have to have access (public, protected, private) to the method that you are calling
- One may want to express other kinds of constraints on software interfaces
  - It is common to have constraints on the order in which a component's methods can be called
  - For example, a call to the consume method is allowed only after a call to the produce method
- How can we specify software interfaces that can express such constraints?
5. Software Interfaces
- Note that object-oriented programming languages enforce one simple constraint about the order of method executions:
  - The constructor of the object must be executed before any other method can be executed
  - This rule is very static: it is true for every object of every class in every execution
- We want to express restrictions on the order of the method executions
  - We want a flexible and general way of specifying such constraints
6. Software Interfaces
- First, I will talk about the following paper:
  - "Automatic Extraction of Object-Oriented Component Interfaces," J. Whaley, M. C. Martin and M. S. Lam, Proceedings of the International Symposium on Software Testing and Analysis, July 2002.
- The following slides are based on the above paper and the slides from Whaley's webpage
7. Automatic Interface Extraction
- The basic idea is to extract the interface from the software automatically
  - The interface is not written as a separate specification
  - There is no possibility of inconsistency between the interface specification and the code, since the interface specification is extracted from the code
- The extracted interface can be used for dynamic or static analysis of the software
- It can be helpful as a reverse engineering tool
8. What are Software Interfaces
- In the scope of the work by Whaley et al., interfaces are constraints on the orderings of method calls
- For example:
  - method m1 can be called only after a call to method m2
  - both methods m1 and m2 have to be called before method m3 is called
9. How to Specify the Orderings
- Use a Finite State Machine (FSM) to express ordering constraints
  - States correspond to methods
  - Transitions imply the ordering constraints
[Figure: two states, M1 and M2, with a transition M1 → M2]
Method M2 can be called after method M1 is called
10. Example: File
- There are two special states, START and END, indicating the start and end of execution
[Figure: FSM for File with states START, open, read, write, close, END]
11. A Simple OO Component Model
- Each object follows an FSM model.
- One state per method, plus START and END states.
- Method call causes a transition to a new state.
[Figure: FSM for File with states START, open, read, write, close, END, next to a generic transition from state m1 to state m2]
m1(); m2() is legal, new state is m2
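The model above can be sketched in a few lines of Java. This is only an illustration: the class name LastCallFsm, its methods, and the particular File transitions in fileModel() are assumptions made for the example (the figure's exact edges are not reproduced here), and the END state is left out for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Interface FSM with one state per method: the state is the last method called. */
class LastCallFsm {
    static final String START = "START";
    // allowed.get(s) holds the methods that may be called when the last call was s
    private final Map<String, Set<String>> allowed = new HashMap<>();
    private String state = START;

    /** Declares that method `to` may be called right after method `from`. */
    LastCallFsm allow(String from, String to) {
        allowed.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        return this;
    }

    /** Takes the transition for a call, or reports that the call order is illegal. */
    boolean call(String method) {
        if (!allowed.getOrDefault(state, Set.of()).contains(method)) {
            return false; // not in the interface; the state is left unchanged
        }
        state = method; // the new state is simply the method just called
        return true;
    }

    String state() {
        return state;
    }

    /** Hypothetical File interface: open first, then reads and writes, then close. */
    static LastCallFsm fileModel() {
        return new LastCallFsm()
            .allow(START, "open")
            .allow("open", "read").allow("open", "write").allow("open", "close")
            .allow("read", "read").allow("read", "write").allow("read", "close")
            .allow("write", "read").allow("write", "write").allow("write", "close");
    }
}
```

With this sketch, LastCallFsm.fileModel().call("read") is rejected because no transition leads from START to read, while open, read, close is accepted in that order.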
12. The Interface Model
- Note that this is a very simple model
  - It only remembers what the last called method is
  - There is no differentiation between different invocations of the same method
- This simple model reduces the number of possible states
- Obviously, not all orderings can be expressed this way
13. Adding more precision
- With the above model we cannot express constraints such as:
  - Method m1 has to be called twice before method m2 can be called
- We can add more precision by remembering the last k method calls
  - If we have n methods this will create n^k states in the FSM
- Whaley et al. suggest other ways of improving the precision without getting this exponential blow-up
14. The Interface Model
- If the only state information is the name of the last method called, in which situations is this information not precise enough?
- Problem 1: Assume that there are two independent sequences of methods that can be interleaved arbitrarily
  - once we call a method from one of the sequences we lose the information about the other sequence
- Problem 2: Assume that there is a method which can be followed by all the other methods
  - then once we get to that method any following behavior is possible, independent of the previous calls
15. Problem 1
- Consider the following scenario:
  - An object has two fields, a and b
  - There are four methods: set_a(), get_a(), set_b(), get_b()
  - Each field must be set before being read
- We would like to have an interface specification that specifies the above constraints
- Can we build an FSM that corresponds to these constraints?
16. Problem 1
- These kinds of constraints create a problem because once we call the set_a method it is possible to go to any other method
  - The FSM does not remember the history of the method calls
  - The FSM only keeps track of the last method call
- Solution: Use one FSM for each field and take their product
The FSM below allows the following sequence: start → set_a() → get_b()
[Figure: single FSM over the states set_a, get_a, set_b, get_b]
17. Splitting by fields
- Separate the constraints about different fields into different, independent constraints
- Use multiple FSMs executing concurrently (or use a product FSM)
[Figure: FSMs over set_a, get_a, set_b, get_b; the single last-call FSM is labeled "Imprecise", the per-field FSMs are labeled "Adds more precision"]
18. Product FSM
The product FSM does not allow the following sequence: start → set_a() → get_b()
There is a transition from each state to the END state
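As a rough sketch of the idea, the per-field FSMs can be run side by side, which behaves like their product without building the product states explicitly. The class names FieldSubmodel and ProductModel and the exact transitions chosen in twoFields() are assumptions for illustration; transitions to END are omitted.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One last-call FSM restricted to the methods that touch a single field. */
class FieldSubmodel {
    private final Set<String> methods = new HashSet<>();
    private final Set<String> transitions = new HashSet<>(); // encoded as "from->to"
    private String state = "START";

    FieldSubmodel allow(String from, String to) {
        methods.add(to);
        transitions.add(from + "->" + to);
        return this;
    }

    boolean owns(String method) {
        return methods.contains(method);
    }

    boolean call(String method) {
        if (!transitions.contains(state + "->" + method)) return false;
        state = method;
        return true;
    }
}

/** Runs the submodels concurrently; a call only moves the submodels that own it. */
class ProductModel {
    private final List<FieldSubmodel> submodels;

    ProductModel(List<FieldSubmodel> submodels) {
        this.submodels = submodels;
    }

    /** A call is legal only if every submodel that owns the method accepts it. */
    boolean call(String method) {
        boolean ok = true;
        for (FieldSubmodel m : submodels) {
            if (m.owns(method)) ok &= m.call(method);
        }
        return ok;
    }

    /** Hypothetical model for the two-field example: each field is set before it is read. */
    static ProductModel twoFields() {
        FieldSubmodel a = new FieldSubmodel()
            .allow("START", "set_a").allow("set_a", "set_a").allow("set_a", "get_a")
            .allow("get_a", "get_a").allow("get_a", "set_a");
        FieldSubmodel b = new FieldSubmodel()
            .allow("START", "set_b").allow("set_b", "set_b").allow("set_b", "get_b")
            .allow("get_b", "get_b").allow("get_b", "set_b");
        return new ProductModel(List.of(a, b));
    }
}
```

Here set_a() followed by get_b() is rejected, because the submodel for field b is still in START when get_b() arrives.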
19. Product FSM
- The product FSM has more states than the FSM which just remembers the last call
- Assume that there are n1 methods for field 1 and n2 methods for field 2
  - simple FSM: n1 + n2 states
  - product FSM: n1 × n2 states
- Note that the number of states in the product FSM will be exponential in the number of fields
20. Problem 2
- It is common to have methods which are used to query the state of an object
  - These methods do not change the state of the object
- After such state-preserving methods all other methods can be called
  - Calling a state-preserving method does not change the state of the object
  - If a method can be called before a call to a state-preserving method, then it can be called after the call to the state-preserving method
- Since the only information we keep in the FSM is the last method call, if there exists an object state where a method can be called, then that method can also be called after a call to a state-preserving method
21. Problem 2
- getFileDescriptor is state-preserving
- Once getFileDescriptor is called then any behavior becomes possible
- The FSM for Socket allows the sequence:
  - start → getFileDescriptor() → connect()
- Solution:
  - distinguish between state-modifying and state-preserving methods
  - Calls to state-preserving methods do not change the state of the FSM
[Figure: FSM for Socket with states START, connect, getFileDescriptor, close, END]
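22. State-preserving methods
Calls to state-preserving methods do not change the state of the FSM
[Figure: FSM for Socket with states START, connect, close, END and the state-preserving method getFileDescriptor, next to a generic example with methods m1 and m2]
m1 is state-modifying, m2 is state-preserving: m1(); m2() is legal, new state is m1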
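A minimal sketch of this refinement is shown below. The class name SocketModel and the specific edges are assumptions for illustration, not the extracted Socket model: state-modifying calls move the FSM to the called method, while state-preserving calls are checked but leave the state untouched.

```java
import java.util.HashSet;
import java.util.Set;

/** Last-call FSM that treats state-preserving methods separately (hypothetical edges). */
class SocketModel {
    private final Set<String> modifying = new HashSet<>();  // "state->method": moves to method
    private final Set<String> preserving = new HashSet<>(); // "state->method": state unchanged
    private String state = "START";

    SocketModel() {
        modifying.add("START->connect");
        modifying.add("connect->close");
        preserving.add("START->getFileDescriptor");
        preserving.add("connect->getFileDescriptor");
    }

    boolean call(String method) {
        String edge = state + "->" + method;
        if (preserving.contains(edge)) {
            return true; // legal, and the FSM state stays where it was
        }
        if (modifying.contains(edge)) {
            state = method; // only state-modifying calls are remembered
            return true;
        }
        return false;
    }

    String state() {
        return state;
    }
}
```

In this sketch, connect() followed by getFileDescriptor() leaves the model in state connect, so a second connect() is still rejected. A model that moved to a getFileDescriptor state would have forgotten that connect() had already been called.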
23. Summary of Model
- Product of FSMs
  - Per-thread, per-instance
- One submodel per field
  - Use static analysis to find the methods that either read the value of the field or modify the value of the field.
  - Identifies the methods that belong to a submodel
  - The methods that read and write to a field will be in the FSM for that field
  - Separates state-modifying and state-preserving methods.
- One submodel per Java interface
  - Implementation not required
24. Extraction Techniques
Static | Dynamic
For all possible program executions | For one particular program execution
Conservative | Exact (for that execution)
Analyze implementation | Analyze component usage
Detect illegal transitions | Detect legal transitions
Superset of ideal model (upper bound) | Subset of ideal model (lower bound)
25. Static Model Extraction
- Static model extraction relies on a defensive programming style
- Programmers generally put checks in the code that will throw exceptions in case the methods are not used in the correct order
- Such checks implicitly encode the software interface
- The static extraction algorithm infers the method orderings from these checks that come from defensive programming
26. Static Model Extractor
- Defensive programming
  - Implementation throws exceptions (user or system defined) on illegal input.
    public void connect() {
        connection = new Socket();
    }
    public void read() throws IOException {
        if (connection == null)
            throw new IOException();
    }
27. Extracting Interface Statically
- The static algorithm has two main steps:
  - For each method m, identify those fields and predicates that guard whether exceptions can be thrown
  - Find the methods m' that set those fields to values that can cause the exception
    - This means that immediate transitions from m' to m are illegal
- The complement of the illegal transitions forms the model of transitions accepted by the static analysis
28. Detecting Illegal Transitions
- Only support simple predicates
  - Comparisons with constants, null pointer checks
- The goal is to find method pairs <source, target> such that:
  - Source method executes:
    - field = const
  - Target method executes:
    - if (field == const) throw exception
29. Algorithm
- How to find the target method: control dependence
  - Find the following predicates: a predicate such that throwing an exception is control dependent on that predicate
  - This can be done by computing the control dependence information for each method
- For each exception, check if the predicate guarding its execution (i.e., the predicate that it is control dependent on) satisfies the following:
  - it is a single comparison between a field of the current object and a constant value
  - the field is not written in the current method before it is tested
- Such fields are marked as state variables
30. Algorithm
- The second step looks for methods which assign constant values to state variables
- How to find the source method: constant propagation
  - Does a method always set a field to a constant value at its exit?
- If we find such a method and see that
  - that constant value satisfies the predicate that guards an exception in another method
  - then this means that we found an illegal transition
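The pairing step of the two slides above can be sketched as follows. This assumes the control dependence and constant propagation analyses have already produced a per-method summary. The classes MethodSummary and StaticExtractor and the hand-written summaries in listItrSummaries() are hypothetical, chosen to mirror an iterator whose state variable is an integer field that the constructor initializes to -1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written method summary; a real extractor would compute these from the code. */
class MethodSummary {
    final String name;
    final Integer throwsWhenFieldEquals; // guard "if (field == c) throw", or null if none
    final Integer fieldAtExit;           // constant always assigned at exit, or null if unknown

    MethodSummary(String name, Integer throwsWhenFieldEquals, Integer fieldAtExit) {
        this.name = name;
        this.throwsWhenFieldEquals = throwsWhenFieldEquals;
        this.fieldAtExit = fieldAtExit;
    }
}

class StaticExtractor {
    /** Returns every "source->target" pair where the source's exit constant trips the target's guard. */
    static List<String> illegalTransitions(List<MethodSummary> methods) {
        List<String> illegal = new ArrayList<>();
        for (MethodSummary source : methods) {
            if (source.fieldAtExit == null) continue; // exit value unknown: transition stays legal
            for (MethodSummary target : methods) {
                if (source.fieldAtExit.equals(target.throwsWhenFieldEquals)) {
                    illegal.add(source.name + "->" + target.name);
                }
            }
        }
        return illegal;
    }

    /** Assumed summaries for a list iterator with one integer state variable. */
    static List<MethodSummary> listItrSummaries() {
        return List.of(
            new MethodSummary("START", null, -1),     // constructor leaves the field at -1
            new MethodSummary("next", null, null),
            new MethodSummary("previous", null, null),
            new MethodSummary("set", -1, null),       // throws when the field is -1
            new MethodSummary("add", null, -1),
            new MethodSummary("remove", -1, -1));     // throws on -1 and resets the field to -1
    }
}
```

Every pair not returned by illegalTransitions is kept in the model, which is why the statically extracted interface is conservative.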
31. Sidenote: Control Dependence
- A statement S in the program is control dependent on a predicate P (an expression that evaluates to true or false) if the evaluation of that predicate at runtime may decide whether S will be executed or not
- For example, in the following program segment
  - if (x > y) max = x; else max = y;
  - the statements max = x and max = y are control dependent on the predicate (x > y)
- A common compiler analysis technique is to construct a control dependence graph
  - In a control dependence graph there is an edge from a node n1 to another node n2 if n2 is control dependent on n1
32. Sidenote: Constant Propagation
- Constant propagation is a well-known static analysis technique
- Constant propagation statically determines the expressions in the program which always evaluate to a constant value
- Example:
  - y = 0; if (x > y) then x = 5; else x = 5 + y; z = x * x;
  - The value assigned to z is the constant 25 and we can determine this statically (at compile time)
- Constant propagation is used in compilers to optimize the generated code.
  - Constant folding: If an expression is known to have a constant value, it can be replaced with the constant value at compile time, preventing the computation of the expression at runtime.
33. Static Extraction
- Static analysis of java.util.AbstractList.ListItr with the lastRet field as the state variable
- The analysis identifies the following transitions as illegal:
  - start → set
  - start → remove
  - remove → set, add → set
  - remove → remove
  - add → remove
- The interface FSM contains all the remaining transitions
34. Automatic documentation
- Interface generated for java.util.AbstractList.ListItr
[Figure: FSM with states START, "next, previous", set, add, remove]
35. Dynamic Interface Extractor
- Goal: find the legal transitions that occur during an execution of the program
- Java bytecode instrumentation
  - insert code at the method entries and exits to track the last-call information
- For each thread, each instance of a class
  - Track last state-modifying method for each submodel.
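The bookkeeping done by the inserted code might look roughly like the sketch below. It is a simplification under stated assumptions: the hook onCall is called explicitly instead of being woven in by bytecode instrumentation, there is a single submodel, every method is treated as state-modifying, and per-thread tracking is omitted. The class name DynamicExtractor is hypothetical.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Records, per object instance, every observed "last call -> current call" transition. */
class DynamicExtractor {
    // keyed by object identity so that each instance has its own last-call state
    private final Map<Object, String> lastCall = new IdentityHashMap<>();
    private final Set<String> observed = new TreeSet<>();

    /** Hook that the instrumentation would invoke at each method entry. */
    void onCall(Object instance, String method) {
        String previous = lastCall.getOrDefault(instance, "START");
        observed.add(previous + "->" + method); // a transition seen in this execution is legal
        lastCall.put(instance, method);
    }

    /** The extracted model: a lower bound containing only transitions that actually occurred. */
    Set<String> model() {
        return observed;
    }
}
```

Because the state is kept per instance, interleaved calls on two different objects do not produce spurious transitions between them.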
36. Dynamic Interface Checker
- The dynamic interface checker uses the same mechanism as the dynamic interface extractor
- When there is a transition which is not in the model
  - instead of adding it to the model
  - it throws an exception
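A checker along these lines could be sketched as below. The class name DynamicChecker, the explicit onCall hook, and the choice of IllegalStateException are assumptions for illustration; the model is a set of "from->to" strings, and per-thread and per-submodel tracking are omitted.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Tracks the last call per instance and rejects transitions that are not in the model. */
class DynamicChecker {
    private final Set<String> model; // allowed transitions, encoded as "from->to"
    private final Map<Object, String> lastCall = new IdentityHashMap<>();

    DynamicChecker(Set<String> model) {
        this.model = model;
    }

    /** Hook that the instrumentation would invoke at each method entry. */
    void onCall(Object instance, String method) {
        String previous = lastCall.getOrDefault(instance, "START");
        String edge = previous + "->" + method;
        if (!model.contains(edge)) {
            // the extractor would add the edge here; the checker reports it instead
            throw new IllegalStateException("transition not in interface: " + edge);
        }
        lastCall.put(instance, method);
    }
}
```

The only difference from an extractor is the branch taken on an unknown transition, which matches the point that both tools share one mechanism.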
37. Experiences
- Whaley et al. applied these techniques to several applications
Program | Description | Lines of code
Java.net 1.3.1 | Networking library | 12,000
Java libraries 1.3.1 | General purpose library | 300,000
J2EE 1.2.1 | Business platform | 900,000
joeq | Java virtual machine | 65,000
38. Automatic documentation
J2EE TransactionManager (dynamic)
An example FSM model that is dynamically generated and provides a specification of the interface
[Figure: FSM with states start, suspend, rollback, commit, resume, END]
39. Test coverage
- Dynamically extracted interfaces can be used as a test coverage criterion
  - The transitions that are not present in the interface imply that those method call sequences were not generated by the test cases
- For example, the fact that there are no self-edges in the FSM for IIOPOutputStream implies that only a max recursion depth of 1 was tested
J2EE IIOPOutputStream (dynamic)
[Figure: FSM with states START, increaseRecursionDepth, simpleWriteObject, decreaseRecursionDepth, END]
40. Upper/lower bound of model
SocketImpl model
[Figure: FSM for SocketImpl with states START, getFileDescriptor, "available, getInputStream, getOutputStream", connect, close, END; transitions are marked as (dynamic) or (static)]
- Statically generated transitions provide an upper approximation of the possible method call sequences
- Dynamically generated transitions provide a lower approximation of the possible method call sequences
41. Finding API bugs
- Automated interface extraction can be used to detect bugs
- The interface extracted from the joeq virtual machine showed unexpected transitions
[Figure: expected API for jq_Method next to actual API for jq_Method, over the states START, prepare, setOffset, compile]
42. Summary: Automatic Interface Extraction
- Product of FSMs
  - Model is simple, but useful
- Static and dynamic analysis techniques
  - Generate upper and lower bounds for the interfaces
- Useful for
  - Documentation generation
  - Test coverage
  - Finding API bugs
43. Automated Interface Extraction, Continued
- There is more recent work on interface extraction for Java:
  - "Synthesis of Interface Specifications for Java Classes," R. Alur, P. Cerny, P. Madhusudan, W. Nam, in Proceedings of Principles of Programming Languages (POPL 2005).
- They built a tool called JIST (Java Interface Synthesis Tool).
- I will discuss this work in the rest of the lecture.
44. Java Interface Synthesis Tool (JIST)
- Here is the problem that JIST is trying to solve:
  - Given a class and a property such as "the exception E should not be raised"
  - generate a behavioral interface specification for the class that corresponds to the most general way of invoking the methods in the class without violating the safety property.
45. Safe Interface
- Let E denote the unsafe states of the program (for example, an exception is raised)
  - E specifies the safety requirement, i.e., a state satisfying E should not be reached
- An interface specification for a class is a safe interface with respect to a requirement E
  - if it is guaranteed that the program never reaches the unsafe state E as long as the class is used according to the interface specification
46. Most Permissive Safe Interface
- The most permissive safe interface is a safe interface that puts the least amount of restrictions on the users of the class
- Interface I is more permissive than interface I' if any call sequence allowed by I' is also allowed by I
- If I is the most permissive safe interface, then for any safe interface I', I is more permissive than I'
- JIST is guaranteed to find a safe interface but it is not guaranteed to find the most permissive safe interface
47. Interface Synthesis Steps
- STEP 1: Abstract the class to a Boolean program using predicate abstraction
  - The predicates are provided by the user
- STEP 2: Find a winning strategy in a two-player partial information game
  - Player-0 is the user of the class. Player-0 chooses to invoke one of the methods of the class.
  - Player-1, the abstract class, chooses a corresponding possible execution through the abstract state-transition graph which results in an abstract return value.
  - A strategy for Player-0 is winning if the game always stays away from the abstract states satisfying the requirement E (E is provided by the user)
- The most permissive winning strategy can be represented as a DFA
  - They use the L* algorithm to compute this DFA
  - L* is an algorithm for learning a regular language using membership and equivalence queries
48. JIST Architecture
[Figure: JIST architecture. Components and artifacts shown: Java, Java compiler, Java Byte Code, Soot, Jimple, Predicates, Predicate Abstractor, Boolean Jimple, Game Language Converter, Symbolic Class (NuSMV Language), Interface Synthesizer, Interface Automaton. STEP 1 (Abstraction) produces the Boolean Symbolic Class; STEP 2 (Partial Information Game Solving) produces the Interface.]
49. STEP 1: Predicate Abstraction
- JIST uses a predicate abstraction technique similar to the one used in the SLAM model checker
- Predicate abstraction is an automated abstraction technique which can be used to reduce the state space of a program
- The basic idea in predicate abstraction is to remove some variables from the program by just keeping information about a set of predicates about them
- For example, a predicate such as x = y may be the only information necessary about variables x and y to determine the behavior of the program
  - In that case we can just store a boolean variable which corresponds to the predicate x = y and remove variables x and y from the program
- Predicate abstraction is a technique for doing such abstractions automatically
50. Predicate Abstraction
- Given a program and a set of predicates, predicate abstraction abstracts the program so that only the information about the given predicates is preserved
- The abstracted program adds nondeterminism, since in some cases it may not be possible to figure out what the next value of a predicate will be based on the predicates in the given set
- One needs an automated theorem prover to compute the abstraction
51. Predicate Abstraction, Simple Example
- Assume that we have two integer variables x, y
- Abstract the program y = y + 1 using a single predicate x = y
- We will represent the predicate x = y as the boolean variable B in the abstract program
  - B = true will mean x = y and B = false will mean x ≠ y
Concrete statement: y = y + 1
Step 1: Calculate the preconditions
  {x = y + 1} y = y + 1 {x = y}
  {x ≠ y + 1} y = y + 1 {x ≠ y}   (precondition for B being false after executing the statement y = y + 1)
Step 2: Use decision procedures to determine if the predicates used for abstraction imply any of the preconditions
  x = y → x = y + 1 ? No
  x ≠ y → x = y + 1 ? No
  x = y → x ≠ y + 1 ? Yes
  x ≠ y → x ≠ y + 1 ? No
Step 3: Generate abstract code
  IF B THEN B = false ELSE B = true or false (nondeterministic choice)
(Example taken from Matt Dwyer's slides)
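The abstract statement from Step 3 can be checked against the concrete one by brute force on a small range of integers. The sketch below is only a sanity check, not how a predicate abstractor works (that relies on decision procedures); the class and method names are made up for the example.

```java
import java.util.Set;

/** Compares the abstract transformer for y = y + 1 with the concrete statement. */
class PredicateAbstractionDemo {
    /** Possible values of B after the statement, given B before it (B stands for x == y). */
    static Set<Boolean> abstractStep(boolean b) {
        return b ? Set.of(false)        // x == y implies x != y + 1, so B becomes false
                 : Set.of(true, false); // nothing is implied: nondeterministic choice
    }

    /** Checks that every concrete step in [lo, hi] is allowed by the abstract step. */
    static boolean soundOnRange(int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            for (int y = lo; y <= hi; y++) {
                boolean before = (x == y);
                boolean after = (x == y + 1); // value of x == y once y = y + 1 has run
                if (!abstractStep(before).contains(after)) return false;
            }
        }
        return true;
    }
}
```

The abstract program may allow outcomes that no concrete run produces from a given state, but it must never miss one that does occur, and soundOnRange tests exactly that.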
52. STEP 1: Predicate Abstraction
- JIST's predicate abstraction implementation does not handle the following:
  - Floating point types, arrays, recursive method calls (method calls are handled by inlining), exceptions (other than the one used for the requirement E)
- They do not use an automated theorem prover since they only handle simple expressions
- The result of the abstraction step is an abstract class which only contains boolean variables and is nondeterministic
  - It provides an over-approximation of the behaviors that can be generated by the concrete class
  - I.e., if a call sequence does not reach E in the abstract class then it is guaranteed that it will not reach E in the concrete class
53. STEP 2: Game Solving
- Player-0: user of the abstract class
- Player-1: the abstract class
- Game:
  - Player-0 chooses a method and calls it
  - Player-1 picks a possible execution for the method that is called (remember that there is nondeterminism)
  - Player-0 wins if E is not reached
- Question: Find the most permissive winning strategy for Player-0
  - The most permissive winning strategy corresponds to the interface for the class
54. Game Solving via Learning
- Results from game theory show that the winning strategy can be characterized as a DFA
- JIST uses a learning algorithm called L* to find a winning strategy
- L* is an algorithm that can compute a DFA by repeatedly asking membership and equivalence queries
  - Membership query: Is this string accepted by the target DFA?
  - Equivalence query: Given a DFA (a guess), is it equal to the target DFA?
    - If the equivalence query returns false, it should also give a counter-example string that is accepted by one of the DFAs but not the other
- If these two types of queries can be answered, then the L* algorithm can compute the target DFA
55. Implementing Equivalence Queries
- Let G be the DFA guessed by the learning algorithm and let T be the target DFA
- Equivalence query: Are the language accepted by G and the language accepted by T equal?
- The equivalence query can be divided into two separate queries:
  - L(G) = L(T) if and only if L(G) ⊆ L(T) and L(T) ⊆ L(G)
- They can handle subset queries precisely
  - Membership queries can also be translated to subset queries (generate a DFA that accepts only the input string)
- They cannot handle superset queries precisely, and because of that they are not guaranteed to compute the most permissive interface
  - However, they always compute a safe interface
56. Implementing Subset Queries
- Checking a subset query means the following:
  - The learning algorithm suggests an interface I
  - They compute the composition of this interface I with the abstract class A (A || I)
  - Then they check if A || I satisfies the property AG(¬E) using the model checker NuSMV
    - E is the requirement (and the interface is a safe interface if E never becomes true)
- If A || I satisfies AG(¬E), then I is a safe interface and hence it accepts a subset of the language accepted by the most permissive interface
  - The answer to the subset query is TRUE
- If A || I violates AG(¬E), then they generate a counter-example execution which shows that I can lead to a state satisfying E, i.e., it is not a safe interface
  - The answer to the subset query is FALSE and the counter-example is returned to the learning algorithm
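For intuition, the check of AG(¬E) on the composition can be sketched as an explicit breadth-first search over pairs of abstract-class state and interface state. This is a stand-in for the symbolic check that JIST delegates to NuSMV, under several assumptions: states are small integers, both components start in state 0, return values are ignored, and the class name SubsetQuery and its data layout are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SubsetQuery {
    /**
     * Explores the composition of an abstract class and a candidate interface.
     * aClass: state -> method -> possible next states (nondeterministic).
     * iface:  state -> method -> next state (absent methods are disallowed).
     * Returns a call sequence that can reach an error state, or null if none exists.
     */
    static List<String> counterExample(Map<Integer, Map<String, Set<Integer>>> aClass,
                                       Set<Integer> error,
                                       Map<Integer, Map<String, Integer>> iface) {
        Deque<int[]> work = new ArrayDeque<>();        // pairs {class state, interface state}
        Deque<List<String>> paths = new ArrayDeque<>(); // call sequence leading to each pair
        Set<String> seen = new HashSet<>();
        work.add(new int[] {0, 0});
        paths.add(new ArrayList<>());
        seen.add("0,0");
        while (!work.isEmpty()) {
            int[] cur = work.poll();
            List<String> path = paths.poll();
            for (Map.Entry<String, Integer> call : iface.getOrDefault(cur[1], Map.of()).entrySet()) {
                List<String> next = new ArrayList<>(path);
                next.add(call.getKey());
                Set<Integer> succs = aClass.getOrDefault(cur[0], Map.of())
                                           .getOrDefault(call.getKey(), Set.of());
                for (int a : succs) {
                    if (error.contains(a)) return next; // E is reachable: I is not safe
                    if (seen.add(a + "," + call.getValue())) {
                        work.add(new int[] {a, call.getValue()});
                        paths.add(next);
                    }
                }
            }
        }
        return null; // no error state reachable: the subset query answers TRUE
    }
}
```

A non-null result plays the role of the counter-example that is handed back to the learning algorithm. Because the search considers every nondeterministic successor, one bad execution is enough to reject the interface.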
57. Implementing Superset Queries
- Checking a superset query means the following:
  - The learning algorithm suggests an interface I
  - I is a superset of the most permissive safe interface
    - if all the call sequences that are not allowed by I lead to some execution of class A which reaches E
- There is no efficient way of checking this
- They check the following:
  - If in any call sequence, the first method call that is not allowed by I always reaches E, then the answer to the superset query is TRUE
  - Otherwise, we look at the counter-example call sequence generated by the model checker and check if that call sequence is safe
    - If it is, then the answer to the superset query is FALSE and that call sequence is a counter-example to the superset query
    - If it is not safe, then we do not know the answer to the superset query, but we can still report the interface as a safe interface since it passed the subset query
58. Experiments
- The automatically synthesized interfaces for some Java classes:
  - Signature, ServerTableEntry, ListItr, PipedOutputStream
- The computation time is 5 to 100 seconds
- In 4 out of 6 cases they found the most permissive interface
[Figure: Signature class interface, an automaton with states s0, s1, s2 and transitions labeled with the methods initSign, initVerify, update, sign, verify]