Title: Discovering Models of Software Processes from Event-based Data
1Discovering Models of Software Processes from
Event-based Data
- Jonathan E. Cook and Alexander L. Wolf
- TOSEM 1998
- June 11, 2002
- Yoon, Kyung-A
2Contents
- Introduction
- Approach
- Method for process discovery
- RNet
- Ktail
- Markov
- DaGama discovery tool
- Case study
- Conclusion
3Introduction(1/3) - background and motivation
- Many software process technologies assume the existence of a formal model of a process, for
  - Unambiguity
  - Communication
  - Automation
(Figure: process model and process discovery)
4Introduction(2/3) - process discovery
- Methods for automatically deriving a formal model of a process from basic event data collected on the process
- Foundation for process discovery
  - Grammar inference
    - Sentences in the language ↔ data describing the process behavior
    - Grammar of the language ↔ formal model of the process
  - Data mining
    - The task of discovering behavioral information in data
  - Reverse engineering
5Introduction(3/3)- event-based framework and FSM
- Event-based framework
  - Event
    - Is typed and can have attributes (e.g., time)
    - Used to characterize the dynamic behavior of a process in terms of identifiable, instantaneous actions
  - A single event stream represents one execution of one process
- FSM (finite-state machine)
  - Convenient and sufficiently powerful for describing historical patterns of actual behavior
  - Reduces the complexity of discovery problems
  - No inherent ability to model concurrency
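A minimal sketch of these two ideas, assuming a hypothetical `Event` record and a hand-written deterministic `FSM` class (neither name comes from the paper): an event stream is a list of typed events, and the FSM either accepts or rejects it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A typed, instantaneous event; `time` is an optional attribute."""
    type: str
    time: float = 0.0


class FSM:
    """Deterministic finite-state machine over event types (illustrative only)."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        # Maps (state, event type) -> next state.
        self.transitions = dict(transitions)

    def accepts(self, stream):
        """Return True if the event stream drives the FSM to an accepting state."""
        state = self.start
        for event in stream:
            key = (state, event.type)
            if key not in self.transitions:
                return False
            state = self.transitions[key]
        return state in self.accepting


# One execution of one process: Edit, then Review, then Checkin.
stream = [Event("Edit", 1.0), Event("Review", 2.0), Event("Checkin", 3.0)]
fsm = FSM(
    start=0,
    accepting={3},
    transitions={(0, "Edit"): 1, (1, "Review"): 2, (2, "Checkin"): 3},
)
accepted = fsm.accepts(stream)
```

Because each state has at most one successor per event type, two activities running in parallel cannot be expressed directly, which is the concurrency limitation noted above.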
6Approach
- Goal of work
- To use event data collected from a software
process execution to infer a formal model of the
behavior of the process
7Method for process discovery- overview
- Three grammar inference methods
  - RNet
    - Statistical (neural network) approach that looks at the past behavior to characterize a state
  - Ktail
    - Algorithmic approach that looks at the future behavior to compute a possible current state
  - Markov
    - Hybrid statistical and algorithmic approach that looks at the neighboring past and future behavior to define a state
- Simple event stream example
  - Event types: Edit, Review, Checkin
  - Edit-Review-Checkin (ERC)
  - Edit-Checkin-Review (ECR)
8Method for process discovery- RNet(1/2)
- Statistical approach
  - Extended by Das and Mozer (1994)
  - Supports an arbitrary number of token types
  - A standard feed-forward neural network is trained by propagating the difference between actual and desired outputs backward through the network
9Method for process discovery- RNet(2/2)
- Result
  - RNet successfully produces a deterministic FSM
    - Edit-Review-Checkin and Edit-Checkin-Review
  - RNet models behavior that is not present in the stream
    - Edit-Review-Review
- Advantage
  - Robust w.r.t. input stream noise
- Disadvantage
  - Training is very slow
  - The size of the net grows rapidly with the number of token types
10Method for process discovery- Ktail(1/3)
- Algorithmic approach
  - Based on work by Biermann and Feldman (1972)
  - Takes a sample string as input and gives an FSM as output
- The basic concept of Ktail
  - A state is defined by what future behaviors can occur from it
  - The current state is reached by a given history (string prefix)
  - Future behavior is defined as the next k tokens
  - This work examines a k-length future from all points in an input string and reduces the number of states in the FSM
11Method for process discovery- Ktail(2/3)
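- Definition of Ktail
  - An equivalence class $E$ is a set of prefixes such that
    $\forall (p, p') \in E,\ \forall t \in T_k:\ p \cdot t \in P \Leftrightarrow p' \cdot t \in P$
    - $S$: the set of sample strings
    - $A$: the alphabet of tokens that make up the strings in $S$
    - $P$: the set of all prefixes in $S$
    - $p \in P$: a valid prefix for some subset of the strings in $S$
    - $t$: a token string (tail); $p \cdot t$ is the prefix $p$ followed by $t$
    - $T_k$: the set of all strings composed from $A$ of length $k$ or less
  - Transitions among states are given by the set $D$ of destination states
    $D = \bigcup_{p \in E_i} E[p \cdot a]$
    - $D$: the destination states of the transitions
    - $E_i$: a given state (equivalence class)
    - $a$: a token, $a \in A$
    - $E[p \cdot a]$: the equivalence class containing the prefix $p \cdot a$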
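The definition above can be sketched in a few lines, assuming sample strings are given as lists of tokens. The function names `k_tails` and `ktail_fsm` are hypothetical, and this is an illustration of the equivalence-class idea rather than the authors' implementation: prefixes with the same set of tails of length k or less share a state.

```python
from collections import defaultdict


def k_tails(samples, k):
    """Map each prefix p to the set of tails t (length <= k) with p.t in P."""
    tails = defaultdict(set)
    for s in samples:
        for i in range(len(s) + 1):
            for j in range(i, min(i + k, len(s)) + 1):
                tails[tuple(s[:i])].add(tuple(s[i:j]))
    return tails


def ktail_fsm(samples, k):
    """Group prefixes into states by their k-tail sets and derive transitions."""
    tails = k_tails(samples, k)
    state_ids = {}  # k-tail set -> state number
    state_of = {}   # prefix -> state number
    for p in sorted(tails, key=lambda p: (len(p), p)):
        state_of[p] = state_ids.setdefault(frozenset(tails[p]), len(state_ids))
    # D for (E_i, a) is the set of states of all p.a with p in E_i.
    transitions = defaultdict(set)
    for p in tails:
        if p:
            transitions[(state_of[p[:-1]], p[-1])].add(state_of[p])
    return state_of, dict(transitions)


samples = [["Edit", "Review", "Checkin"], ["Edit", "Checkin", "Review"]]
state_of, transitions = ktail_fsm(samples, k=2)
```

With these two sample strings and k = 2, both complete strings have only the empty tail, so they land in the same state. A larger k distinguishes more prefixes and yields more states.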
12Method for process discovery- Ktail(3/3)
- FSM inferred by Ktail (k = 2)
- Merging states
  - If S1 has transitions to states S2, ..., Sn for a token t, and if the sets of output transition tokens for the states S2, ..., Sn are equivalent or strict subsets, then we merge states S2, ..., Sn.
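One way to read the merging rule is sketched below. It assumes that "equivalent or strict subsets" means the output-token sets of the destination states are pairwise comparable under set inclusion; that reading, the transition-table layout, and the name `merge_once` are assumptions of this sketch, not taken from the paper.

```python
from collections import defaultdict


def out_tokens(transitions, state):
    """Tokens labelling the outgoing transitions of `state`."""
    return {tok for (src, tok) in transitions if src == state}


def merge_once(transitions):
    """Apply the merging rule to the first qualifying (state, token) pair.

    `transitions` maps (state, token) -> set of destination states.
    Returns (new_transitions, changed).
    """
    for (src, tok), dests in transitions.items():
        if len(dests) < 2:
            continue
        outs = [out_tokens(transitions, d) for d in dests]
        # Assumed reading: every pair of output-token sets is equal or nested.
        if all(a <= b or b <= a for a in outs for b in outs):
            keep = min(dests)
            rename = lambda s: keep if s in dests else s
            merged = defaultdict(set)
            for (s, t), ds in transitions.items():
                merged[(rename(s), t)].update(rename(d) for d in ds)
            return dict(merged), True
    return transitions, False


# State 0 reaches states 1 and 2 on token "t"; outputs {"x"} and {"x", "y"} are nested.
example = {(0, "t"): {1, 2}, (1, "x"): {3}, (2, "x"): {3}, (2, "y"): {4}}
merged, changed = merge_once(example)
```

Calling `merge_once` repeatedly until `changed` is False would apply the rule to a fixed point.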
13Method for process discovery- Markov(1/6)
- Hybrid of statistical and algorithmic approaches
  - Uses the concept of Markov models to find the most probable event sequence productions
  - Algorithmically converts those probabilities into states and state transitions
- Assumptions of the Markov model
  - There are a finite number of states defined for the process
  - At any point in time, the probability of the process being in some state depends only on the previous state that the process was in
  - The state transition probabilities do not change over time
  - The initial state of the process is defined probabilistically
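Written out for a first-order model, these assumptions take the following form. The symbols $X_t$ (state at step $t$), $s_i$ (states), $a_{ij}$ (transition probabilities), and $\pi_i$ (initial distribution) are notation assumed here, not taken from the slides.

```latex
% Finite state set
X_t \in \{s_1, \dots, s_N\}

% Dependence on the previous state only, with probabilities that do not change over time
P(X_{t+1} = s_j \mid X_t = s_i, X_{t-1}, \dots, X_0)
  = P(X_{t+1} = s_j \mid X_t = s_i) = a_{ij} \quad \text{for all } t

% Probabilistic initial state
P(X_0 = s_i) = \pi_i, \qquad \sum_{i=1}^{N} \pi_i = 1
```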
14Method for process discovery- Markov(2/6)
- Four steps
  - Step 1: Construct the event-sequence probability tables by traversing the event stream
  - Step 2: Construct the event graph from the probability tables
  - Step 3: Find the overconnected vertices and correct them by splitting
  - Step 4: Convert the event graph to proper form
- Construction of the event-sequence probability
tables by traversing the event stream
First- and second-order event-sequence
probability tables
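A minimal sketch of this step, assuming each table entry is the relative frequency with which an event follows the preceding one event (first order) or two events (second order). The function name `sequence_probabilities` and the sample stream are made up for illustration.

```python
from collections import Counter, defaultdict


def sequence_probabilities(stream, order):
    """Estimate P(next event | previous `order` events) from frequency counts."""
    counts = defaultdict(Counter)
    for i in range(len(stream) - order):
        context = tuple(stream[i:i + order])
        counts[context][stream[i + order]] += 1
    return {
        context: {event: n / sum(followers.values())
                  for event, n in followers.items()}
        for context, followers in counts.items()
    }


# Illustrative stream: E = Edit, R = Review, C = Checkin.
stream = list("ERCECRERC")
first_order = sequence_probabilities(stream, order=1)
second_order = sequence_probabilities(stream, order=2)
```

In this stream, Edit is followed by Review twice and by Checkin once, so the first-order entry for Edit followed by Review is 2/3.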
16Method for process discovery- Markov(4/6)
- Construction of the event graph from the
probability tables
(Figure: event graph with vertices E, R, and C)
First- and second-order event-sequence
probability tables
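A sketch of how an event graph might be derived from a first-order table, assuming one vertex per event type and an edge for every event pair whose probability meets a cutoff. The `threshold` parameter, the function name `event_graph`, and the probabilities in the example table are assumptions of this sketch, not values from the paper.

```python
def event_graph(first_order, threshold):
    """Build (vertices, edges) from a table {(prev,): {next: probability}}."""
    vertices = set()
    edges = set()
    for (prev,), followers in first_order.items():
        vertices.add(prev)
        for nxt, probability in followers.items():
            vertices.add(nxt)
            # Keep only sequences that are probable enough to count as behavior.
            if probability >= threshold:
                edges.add((prev, nxt))
    return vertices, edges


# Hypothetical probabilities for E = Edit, R = Review, C = Checkin.
table = {
    ("E",): {"R": 0.6, "C": 0.4},
    ("R",): {"C": 0.9, "E": 0.1},
    ("C",): {"E": 0.5, "R": 0.5},
}
vertices, edges = event_graph(table, threshold=0.2)
```

Raising the cutoff prunes rare sequences, which is one way a method of this kind can tolerate noise in the event stream.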
17Method for process discovery- Markov(5/6)
- Find the overconnected vertices and correct them by
splitting
First- and second-order event-sequence
probability tables
18Method for process discovery- Markov(6/6)
- Conversion of the event graph to proper form
19Method for process discovery- evaluation
- Comparison of discovery methods
- Event stream length vs. time and space requirements
- Number of event types vs. time and space requirements
20Method for process discovery- DaGama discovery
tool
- DaGama fits into the Balboa data analysis framework
- Usage
  - Select an event stream
  - Choose a discovery method
  - Specify the method's parameters
  - Run the method
- The discovered model is displayed in a Balboa process model viewer
- The model can be edited by the process engineer
21Case study- overview
- Conducted at AT&T Bell Labs with DaGama
- Change request process for a large telecommunications software system
- The prescribed process documented by the organization was not strictly enforced
- 159 executions of the process
  - 141 accepted fixes and 18 rejected fixes
- 32 event types
22Case study- discovering a process model
- DaGama found the general patterns of behavior entrenched within the data
  - These serve as a sound starting point for a process engineer to construct an accurate and useful model
- The discovered model reflected a much greater amount of the process behavior than the prescribed process model documented by the organization
about 65
23Conclusion
- The Ktail and Markov methods show the most promise; RNet is not sufficiently mature
- Methods for process discovery support the process engineer in constructing initial process models
- They may give the process engineer clues as to when and in what direction the process model should evolve, based on data from the currently executing process