Title: Graph-Based Methods for the Representation and Analysis of Business Workflows
1 Graph-Based Methods for the Representation and Analysis of Business Workflows
- Amitava Bagchi
- Indian Institute of Management Calcutta
2 References
- Mukherjee, Arindam; Sen, Anup K.; and Bagchi, Amitava (2004), "Information analysis in workflows represented as task-precedence metagraphs", Proc. WITS-2004, Workshop on Information Technology and Systems, Seattle, WA, USA, pp. 32-37.
- Mukherjee, Arindam; Sen, Anup K.; and Bagchi, Amitava (2005), "Representation, Analysis and Verification of Business Processes: A Metagraph-Based Approach", Working Paper WPS-552, Indian Institute of Management Calcutta (http://www.iimcal.ac.in).
3 Outline
- Business Process & Workflow
- Metagraphs & Information Elements
- Task Precedence Metagraphs (TPMGs)
- Information Analysis & Graphical Algorithm
- Functional & Organizational Perspectives
- Workflow Verification
4 Objectives
- To describe an AND/OR graph representation scheme for business workflows
- To present a graph traversal algorithm for the analysis of information flow in such workflows
- To extend the above method to task and resource analyses
- To outline how the structural correctness of workflows can be verified
5 Business Process & Workflow
- A business process consists of a set of related tasks in one or more functional areas (such as finance or marketing) which, when performed in any one of several permissible orders, enable an organization to achieve a business goal.
- Example: a loan appraisal system used in a bank
6 Loan Appraisal Business Process: Example
7 Legend (see figure, p. 6)
- PD: applicant's property data
- CD: data on comparable properties
- AC: applicant's account data
- APD: loan application data
- AV: appraised value of property
- CR: applicant's credit rating
- LA: loan amount
- RLA: revised loan amount
- LR: risk level of loan
- AR, MR, BR: the loan risk level is acceptable, marginally bad, bad
- BP: current portfolio of bank's loans
- RE: bank's current loan exposure
- YES: the application is approved
- NO: the application is rejected
8 Business Process & Workflow
- Workflow (or workflow instance): a specific instance of the flow of control in a business process; it is a sub-graph of the given process graph.
- In practice, the terms business process and workflow are often used interchangeably.
9 Workflow Instance 1
10 Workflow Instance 2
11 Workflow Instance 3
12 Business Process Modeling: Existing Approaches
- Petri Nets & Related Formalisms
  - Petri Nets (van der Aalst & van Hee 2002)
  - Workflow Management Coalition (WfMC) Guidelines (http://www.wfmc.org)
- Metagraph-Based Formalisms
  - Metagraphs (Basu & Blanning 2000, 1999, 1994)
13 Petri Nets & Related Formalisms
- The main focus is on the precedence relationships between tasks.
- Flow of information plays a subsidiary role.
- Commercial products such as IBM's MQSeries have adopted this convention.
- Widely used, and quite suitable for engineering applications.
14 Metagraph-Based Formalisms
- A metagraph is a directed (hyper-)graph. It can be viewed as a special type of AND/OR graph.
- A metagraph is typically small (at most a few hundred nodes) and is explicitly available, i.e., the entire graph is supplied as input to a search algorithm. So the expansion of a node just means moving to its immediate successors.
- In contrast, in (implicit) game trees, new nodes are actually added to the graph as they are generated.
15 Metagraph-Based Formalisms
- Each node in a metagraph contains one or more information elements (items).
- In information analysis, an input set of items is supplied at the start, specifying the business information initially available.
- Another set of items, called the output set, contains the target items that are desired as output.
16 Metagraph-Based Formalisms
- Each arc represents a task that converts one set of items into another set of items.
- The objective is to start from the input set of items, perform the tasks in the given order of precedence, and derive all the items in the output set.
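As a rough illustration of this objective, the sketch below treats each arc as a task mapping a set of input items to a set of output items and forward-chains from the input set until nothing new can be derived. It is a minimal sketch under stated assumptions: the task names and the wiring between items are hypothetical (only the item codes come from the legend), and it ignores precedence and OR choices, firing any task as soon as its inputs are available.

```python
from typing import FrozenSet, NamedTuple, Set


class Task(NamedTuple):
    """A metagraph arc: a task that turns a set of input items into output items."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


# Hypothetical tasks; only the item codes are taken from the loan appraisal legend.
TASKS = [
    Task("rate_credit", frozenset({"AC"}), frozenset({"CR"})),
    Task("appraise", frozenset({"PD", "CD"}), frozenset({"AV"})),
    Task("assess_risk", frozenset({"CR", "AV", "APD", "LA"}), frozenset({"LR"})),
]


def derivable(tasks, input_set: Set[str], output_set: Set[str]) -> bool:
    """Forward-chain from input_set; True if every item in output_set gets produced."""
    available = set(input_set)
    changed = True
    while changed:
        changed = False
        for task in tasks:
            # Fire a task once all its inputs are available and it adds something new.
            if task.inputs <= available and not task.outputs <= available:
                available |= task.outputs
                changed = True
    return set(output_set) <= available


print(derivable(TASKS, {"AC", "PD", "CD", "APD", "LA"}, {"LR"}))  # True
print(derivable(TASKS, {"AC", "PD"}, {"LR"}))                     # False
```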
17 Metagraph-Based Formalisms
- The metagraph convention puts more emphasis on the flow of information, and so has an advantage over Petri nets for business applications.
- However, it is not widely used in practice because it suffers from certain shortcomings.
18 Metagraph for Loan Evaluation Process
[Figure: metagraph for the loan evaluation process. Recoverable node labels: Account Data (AC), Applicant Loan Data (APD), Property Data (PD), Comparables Data (CD), Bank's Portfolio (BP), Credit Rating (CR), Appr. Value (AV), Loan Amt. (LA), Loan Risk (LR), Risk Exposure (RE), Accept Risk (AR), Marg. Bad Risk (MR), Bad Risk (BR), Loan Approved (YES), Loan Rejection (NO).]
19 Metagraphs
- The existing metagraph model for workflows has three main shortcomings:
  - Flow of control is not displayed clearly, and the diagram appears cluttered.
  - The analysis makes use of symbolic matrices, which are not easy to manipulate.
  - A clear distinction is not always drawn between OR joins and AND joins (or even between OR splits and AND splits).
20 Task Precedence Metagraphs (TPMGs)
- A TPMG is a modified form of metagraph.
- It is visually more appealing and looks more like an AND/OR graph.
- It is less cluttered, so the flow of control is discerned more easily.
- A TPMG is more general than a WfMC graph in that AND and OR splits and joins are not always required to be matched in pairs.
21 Terminology
- Tasks & Propagation Edges
- Init Nodes & Prop Nodes
- OR Nodes & AND Nodes
- Split Nodes & Join Nodes
22 Task Precedence Metagraphs (TPMGs)
- Edges are of two types:
  - Tasks, shown as bold arrows: a task converts the set of items at its start into another set of items, which cannot be obtained from any other task.
  - Propagation edges, shown as lightly drawn arrows: a propagation edge conveys an item from the outgoing end of a task to the incoming end of another task.
23 Task Precedence Metagraphs (TPMGs)
- Nodes are also of two types:
  - Init nodes
    - An init node has a single outgoing edge, corresponding to a task.
    - It is shown as a bold oval.
  - Prop nodes
    - A prop node can have multiple outgoing edges, all of which are propagation edges.
    - It is shown as a lightly drawn oval.
24 Task Precedence Metagraphs (TPMGs)
- Init and Prop Nodes
- On every directed path, init nodes alternate with
prop nodes, i.e., a TPMG is a directed bipartite
graph, just like a Petri net.
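The bipartite rule can be checked mechanically. The sketch below is a hypothetical helper, assuming a TPMG is given as a map from node to "init" or "prop" plus separate lists of task edges and propagation edges; the `goals` parameter, which exempts terminal init nodes from needing an outgoing task, is an assumption not stated on the slides.

```python
from collections import Counter


def check_tpmg_shape(kind, tasks, props, goals=frozenset()):
    """Return a list of violations of the TPMG shape rules (empty list: none found).

    kind:  node -> "init" or "prop"
    tasks: (init, prop) pairs, drawn as bold arrows
    props: (prop, init) pairs, drawn as light arrows (propagation edges)
    goals: init nodes exempted from needing an outgoing task (an assumption)
    """
    errors = []
    for src, dst in tasks:
        if kind[src] != "init" or kind[dst] != "prop":
            errors.append(f"task {src}->{dst} must run from an init node to a prop node")
    for src, dst in props:
        if kind[src] != "prop" or kind[dst] != "init":
            errors.append(f"propagation edge {src}->{dst} must run from a prop node to an init node")
    # Every (non-goal) init node has exactly one outgoing edge, and it is a task.
    out_tasks = Counter(src for src, _ in tasks)
    for node, k in kind.items():
        if k == "init" and node not in goals and out_tasks[node] != 1:
            errors.append(f"init node {node} must have exactly one outgoing task")
    return errors


# Hypothetical four-node TPMG: a -task-> p -prop-> b -task-> q
KIND = {"a": "init", "p": "prop", "b": "init", "q": "prop"}
TASKS = [("a", "p"), ("b", "q")]
PROPS = [("p", "b")]
print(check_tpmg_shape(KIND, TASKS, PROPS))  # []
```

Because every edge must alternate init to prop or prop to init, any directed path automatically alternates the two node types.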
25 Task Precedence Metagraphs (TPMGs)
- Nodes are also classified as OR nodes or AND nodes.
- An OR node (identified by its own sign in the figures) shows alternative paths for the flow of control.
- An AND node (identified by its own sign in the figures) indicates that flow of control takes place along all the edges at the same time.
- An OR (or AND) node is either a split node or a join node.
26 Task Precedence Metagraphs (TPMGs)
- Split & Join Nodes
  - A split node is a node at which multiple paths begin. It is always a prop node.
  - A join node is a node at which multiple paths end. It is always an init node.
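These two rules suggest a simple classification by degree, sketched below under the assumption that nodes are typed as "init" or "prop" and edges are given as (source, destination) pairs; the function name and the example graph are hypothetical.

```python
from collections import Counter


def split_join_roles(kind, edges):
    """Label nodes with several outgoing edges 'split' and several incoming edges 'join'.

    kind:  node -> "init" or "prop"
    edges: all (src, dst) pairs, tasks and propagation edges alike
    Raises ValueError if a split is not a prop node or a join is not an init node.
    """
    outdeg = Counter(src for src, _ in edges)
    indeg = Counter(dst for _, dst in edges)
    roles = {}
    for node, k in kind.items():
        if outdeg[node] > 1:
            if k != "prop":
                raise ValueError(f"{node}: a split node must be a prop node")
            roles[node] = "split"
        if indeg[node] > 1:
            if k != "init":
                raise ValueError(f"{node}: a join node must be an init node")
            roles[node] = "join"
    return roles


# Hypothetical diamond: prop node p splits, init node z joins.
KIND = {"p": "prop", "x": "init", "y": "init", "q": "prop", "r": "prop", "z": "init"}
EDGES = [("p", "x"), ("p", "y"), ("x", "q"), ("y", "r"), ("q", "z"), ("r", "z")]
print(split_join_roles(KIND, EDGES))  # {'p': 'split', 'z': 'join'}
```

Whether a split or join is OR or AND is a separate label that the figures carry and this sketch does not infer.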
27 Task Precedence Metagraphs (TPMGs)
- However, a TPMG differs from a Petri net in that every node has an associated subset of labeled items. This underscores the role of business information in a business process.
28 Information Analysis
- Given a workflow, we seek answers to questions of the following type:
  - Suppose a set A of items is supplied. Starting from A, can we produce all the items in another given set B?
  - Is item a essential for producing item b?
- These can be formulated as graph search problems.
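The second question can be sketched as two reachability runs: derive b once normally and once with item a made unavailable. This is a minimal sketch, assuming tasks are (inputs, outputs) pairs of item sets, that any task fires once its inputs are present (OR choices are ignored), and that a task which would also yield a still contributes its other outputs; the example tasks are hypothetical.

```python
def closure(tasks, start, blocked=frozenset()):
    """Items derivable from `start` when the items in `blocked` can never be obtained.

    tasks: list of (inputs, outputs) pairs of item sets. A task needing a blocked
    item never fires; a task that also yields other items still contributes them.
    """
    have = set(start) - set(blocked)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in tasks:
            new_items = set(outputs) - set(blocked) - have
            if set(inputs) <= have and new_items:
                have |= new_items
                changed = True
    return have


def is_essential(tasks, start, a, b):
    """True if b can be derived from `start`, but not once item a is unavailable."""
    return b in closure(tasks, start) and b not in closure(tasks, start, {a})


# Hypothetical tasks: x -> a, a -> b, y -> c
TASKS = [({"x"}, {"a"}), ({"a"}, {"b"}), ({"y"}, {"c"})]
print(is_essential(TASKS, {"x", "y"}, "a", "b"))                       # True
print(is_essential(TASKS + [({"c"}, {"b"})], {"x", "y"}, "a", "b"))    # False: c gives b
```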
29 Information Analysis
- But a standard AND/OR graph search algorithm such as AO* (Nilsson 1980) is not appropriate for our purpose, because a TPMG differs from an AND/OR graph in some ways:
  - TPMG: multiple start nodes
  - AND/OR graph: one start node
30 Information Analysis
- TPMG: both AND joins and OR joins
- AND/OR graph: only OR joins
- TPMG: can have directed cycles
- AND/OR graph: AO* assumes it is cycle-free
- Note that a project scheduling network has only AND splits/joins and no OR splits/joins.
31 Algorithm InfAnalysis
- Algorithm InfAnalysis is an iterative graph search algorithm.
- Given:
  - an explicit TPMG
  - an input set of items
  - an output (target) set of items
- it determines whether all the items in the output set can be derived.
32 Algorithm InfAnalysis
- Algorithm InfAnalysis has some similarities with A* and AO*, and makes use of an edge-marking method.
- We think of the given TPMG as representing a business process, and of the marked solution sub-graph produced by InfAnalysis as a workflow instance.
33 Algorithm InfAnalysis
- Makes use of four lists:
  - ITEMSET: initially contains the input set of items; new items get added as nodes get expanded
  - TARGET: contains the items desired as output
  - FRONTIER: holds only init nodes; initially holds those that have all their items in ITEMSET
  - STACK: needed for processing OR nodes; remembers which OR alternative should be processed next
34 Algorithm InfAnalysis
- An active node is an init node in FRONTIER with all its items in ITEMSET.
- At each iteration, InfAnalysis looks for an active node in FRONTIER, processes the corresponding task, and updates ITEMSET and FRONTIER.
35 Algorithm InfAnalysis
- If all items in TARGET belong to ITEMSET, then a solution has been found (success).
- If there is no active node in FRONTIER, then the next OR alternative in STACK must be pursued.
- If STACK is also empty, then failure.
36 Algorithm InfAnalysis
- Thus the algorithm traverses the given TPMG exhaustively, looking for a workflow instance that generates, for the given input set, a set of items that contains the given output set.
- When a workflow instance is traversed, the edges in the instance get marked (say, by colouring them red).
37 Algorithm InfAnalysis
- When the next workflow instance is examined, the marking at the corresponding OR split node is changed.
- The advantage of marking is that each instance need not be traversed from scratch; the work done earlier can be remembered and partly reused.
- The algorithm assumes that the TPMG is structurally valid.
38 Algorithm InfAnalysis
- Example: for the loan appraisal process, we want to know whether, given the set of items S = {LA, PD, CD, AC, APD, BP} as input, we can produce the item YES as output.
- A graph search algorithm is appropriate for such problems. To keep the algorithm simple, we do not indicate the edge markings.
39 Algorithm InfAnalysis
initialize ITEMSET, FRONTIER, STACK
do while (TARGET is not a subset of ITEMSET)
    if (there is an active node n in FRONTIER) then
        remove n from FRONTIER
        expand n, entering its init successors in FRONTIER,
            its OR split nodes in STACK, and new items in ITEMSET
    else if (STACK is not empty) then     // examine next workflow instance
        take next init successor p of OR node m on top of STACK
        enter p in FRONTIER, adding its items to ITEMSET
        if (m has no remaining untried successors) then pop m
    else                                  // no remaining workflow instances
        announce failure; exit
    end if
end do
announce success; exit
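The pseudocode above can be sketched in Python roughly as follows. This is a hedged sketch, not the authors' implementation: it replaces STACK and the edge markings with recursion that re-runs each OR alternative from a saved copy of ITEMSET and FRONTIER, expands each init node at most once (so a directed cycle is traversed at most once), and needs sortable node names. The five-node TPMG is hypothetical; only the item codes come from the legend.

```python
def inf_analysis(init_nodes, prop_nodes, input_items, target):
    """Search for a workflow instance that derives every item in `target`.

    init_nodes: init node -> (items it needs, prop node produced by its task)
    prop_nodes: prop node -> (items produced, "AND" or "OR", successor init nodes)
    Returns the init nodes expanded (i.e. the tasks performed), or None on failure.
    """
    target = frozenset(target)

    def run(itemset, frontier, done):
        while not target <= itemset:
            # An active node is an init node in FRONTIER whose items are all in ITEMSET.
            active = next((n for n in sorted(frontier) if init_nodes[n][0] <= itemset), None)
            if active is None:
                return None  # this workflow instance cannot proceed
            frontier, done = frontier - {active}, done | {active}
            items, logic, succs = prop_nodes[init_nodes[active][1]]
            itemset = itemset | items
            if logic == "OR" and len(succs) > 1:
                # OR split: try each alternative in turn, backtracking on failure.
                for s in succs:
                    found = run(itemset, frontier | ({s} - done), done)
                    if found is not None:
                        return found
                return None
            frontier = frontier | (set(succs) - done)
        return done

    start = frozenset(n for n, (items, _) in init_nodes.items() if items <= input_items)
    return run(frozenset(input_items), start, frozenset())


# Hypothetical TPMG: two tasks feed an AND join i3, whose output LR is an OR split.
INIT = {
    "i1": (frozenset({"AC"}), "p1"),                      # calculate credit rating
    "i2": (frozenset({"PD", "CD"}), "p2"),                # appraise property
    "i3": (frozenset({"CR", "AV", "APD", "LA"}), "p3"),   # AND join: assess loan risk
    "i4": (frozenset({"LR"}), "p4"),                      # approve
    "i5": (frozenset({"LR"}), "p5"),                      # reject
}
PROP = {
    "p1": (frozenset({"CR"}), "AND", ["i3"]),
    "p2": (frozenset({"AV"}), "AND", ["i3"]),
    "p3": (frozenset({"LR"}), "OR", ["i4", "i5"]),        # OR split
    "p4": (frozenset({"YES"}), "AND", []),
    "p5": (frozenset({"NO"}), "AND", []),
}
INPUTS = {"AC", "PD", "CD", "APD", "LA"}
print(sorted(inf_analysis(INIT, PROP, INPUTS, {"YES"})))  # ['i1', 'i2', 'i3', 'i4']
print(sorted(inf_analysis(INIT, PROP, INPUTS, {"NO"})))   # ['i1', 'i2', 'i3', 'i5']
```

Re-running from saved state is simpler to get right than undoing markings, at the price of repeating some work; the edge-marking scheme on the slides avoids that repetition.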
40 Algorithm InfAnalysis: Observations
- Works correctly on the example shown earlier (TPMG for loan appraisal).
- But for more complex TPMGs containing OR split nodes that are not descendants of each other, the STACK must be replaced by a more flexible data structure.
41 Functional Perspective
- Queries that relate to the execution of tasks rather than to the flow of information:
  - Which other tasks must be completed before a given task t can start?
  - If a task t cannot be executed, which other tasks become inoperable?
42 Functional Perspective
- Algorithm InfAnalysis can be modified in a simple way to answer such queries.
- For example, to find the tasks that must be completed before task t can start, consider the set S of items contained in the init node that immediately precedes t. Run InfAnalysis with the given inputs and with S as the target set; the required tasks are those in the marked sub-graph.
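The slide reads the required tasks off the marked sub-graph of InfAnalysis. For the simpler case of an AND-only TPMG in which each item is produced by exactly one task, the same question can be sketched as a backward trace from the items in S; this is a swapped-in technique, not the authors' procedure, and it returns only the tasks S actually depends on, which may be fewer than a forward search happens to mark. The `producer` and `needs` mappings and all task names are hypothetical.

```python
def prerequisite_tasks(producer, needs, inputs, items):
    """Tasks that must run before every item in `items` is available.

    producer: item -> the single task that produces it
    needs:    task -> items required by the init node feeding that task
    inputs:   items supplied at the start (they need no task)
    Assumes an AND-only TPMG; raises KeyError for an item with no producer.
    """
    required, pending = set(), [i for i in items if i not in inputs]
    while pending:
        task = producer[pending.pop()]
        if task not in required:  # also guards against revisiting along cycles
            required.add(task)
            pending.extend(i for i in needs[task] if i not in inputs)
    return required


# Hypothetical tasks over items from the loan appraisal legend.
PRODUCER = {"CR": "rate_credit", "AV": "appraise", "LR": "assess_risk", "YES": "approve"}
NEEDS = {"rate_credit": {"AC"}, "appraise": {"PD", "CD"},
         "assess_risk": {"CR", "AV", "APD", "LA"}, "approve": {"LR"}}
INPUTS = {"AC", "PD", "CD", "APD", "LA"}
# S = items of the init node that immediately precedes the task "approve"
print(sorted(prerequisite_tasks(PRODUCER, NEEDS, INPUTS, NEEDS["approve"])))
# ['appraise', 'assess_risk', 'rate_credit']
```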
43 Organizational Perspective
- Queries that relate to resources (i.e., the executors of tasks, whether human agents or machines):
  - If a resource r is unavailable, some tasks will not get performed. As a result, some other resources might become idle. Which resources will become idle?
44 Organizational Perspective
- Again, Algorithm InfAnalysis can be modified in a simple way to answer such queries.
- For example, to determine the other resources that become idle when resource r is unavailable, first find the set T of tasks that r executes. We can then determine which other tasks are held up because the tasks in T cannot be executed. This tells us whether any resources have become completely idle.
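A minimal sketch of this query follows, assuming every task fires once as soon as its input items are available (no OR choices) and that each resource's tasks are given as a set; the task names, resource names and the `executes` mapping are hypothetical. A resource counts as idle here if it has work in the normal run but none once r's tasks are disabled.

```python
def fired_tasks(tasks, start, disabled=frozenset()):
    """Tasks that eventually run when every enabled task fires once its inputs exist."""
    have, fired = set(start), set()
    changed = True
    while changed:
        changed = False
        for name, (inputs, outputs) in tasks.items():
            if name not in disabled and name not in fired and inputs <= have:
                fired.add(name)
                have |= outputs
                changed = True
    return fired


def idle_resources(tasks, executes, start, r):
    """Resources other than r that have work normally but none once r's tasks are held up."""
    normal = fired_tasks(tasks, start)
    reduced = fired_tasks(tasks, start, disabled=executes[r])
    return {s for s, ts in executes.items()
            if s != r and (ts & normal) and not (ts & reduced)}


# Hypothetical tasks (inputs, outputs) and resource assignments.
TASKS = {"rate_credit": ({"AC"}, {"CR"}), "appraise": ({"PD", "CD"}, {"AV"}),
         "assess_risk": ({"CR", "AV"}, {"LR"}), "approve": ({"LR"}, {"YES"})}
EXECUTES = {"credit_officer": {"rate_credit"}, "appraiser": {"appraise"},
            "risk_analyst": {"assess_risk"}, "manager": {"approve"}}
print(sorted(idle_resources(TASKS, EXECUTES, {"AC", "PD", "CD"}, "appraiser")))
# ['manager', 'risk_analyst']
```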
45 Temporal Constraints
- The control structure of a workflow imposes temporal constraints on tasks: if a task precedes another task, it must be performed earlier.
- If temporal information, such as the duration of tasks, is supplied, then issues arise similar to those in project scheduling.
- However, the presence of directed cycles in workflows causes additional complications.
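For the acyclic, AND-only case that coincides with project scheduling, earliest start times follow the usual critical-path forward pass: each task starts when its slowest predecessor finishes. The sketch below uses Python's standard `graphlib` module; it raises `CycleError` on a directed cycle, which is precisely the case where workflows need extra treatment. Durations and task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def earliest_starts(duration, preds):
    """Earliest start time of each task in an acyclic, AND-only precedence graph.

    duration: task -> duration; preds: task -> tasks that must finish first.
    Raises graphlib.CycleError if the precedence graph has a directed cycle.
    """
    start = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        start[task] = max((start[p] + duration[p] for p in preds.get(task, ())), default=0)
    return start


# Hypothetical durations (in days) and precedences.
DURATION = {"rate_credit": 2, "appraise": 5, "assess_risk": 1, "approve": 1}
PREDS = {"assess_risk": ["rate_credit", "appraise"], "approve": ["assess_risk"]}
print(earliest_starts(DURATION, PREDS)["approve"])  # 6
```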
46 Structural Verification
- A valid workflow always serves a business goal.
- Given a business process W supplied in the form of a TPMG, how do we tell whether W is valid?
- To ensure the validity of W, some structural (i.e., syntactic) constraints must be imposed on W.
- We now give examples of such structural constraints.
47 Structural Problem: Deadlock
- Deadlock: caused when an OR split node is nested with an AND join node.
- In the figure, only one of the two outgoing edges at the OR split node 2 can be marked at any time, so execution cannot proceed beyond the AND join node 7.
- A valid workflow must not have any deadlocks.
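The deadlock can be reproduced with a small control-flow simulation. In this sketch only nodes 2 (OR split) and 7 (AND join) follow the figure; the other node numbers are made up, the init/prop distinction is ignored, and the `choose` policy for the OR split is an assumption.

```python
# Hypothetical control-flow graph in the shape of the deadlock figure:
# OR split at node 2, AND join at node 7; the other node numbers are made up.
SUCC = {1: [2], 2: [3, 4], 3: [5], 4: [6], 5: [7], 6: [7], 7: [8]}


def simulate(succ, kind, start, choose=lambda node, options: options[0]):
    """Propagate control from `start` and return the set of nodes reached.

    kind[node] is "OR" or "AND". An OR split marks only the chosen outgoing edge;
    any other node marks all its outgoing edges. An AND join is reached only once
    all of its incoming edges are marked; any other node on its first marked edge.
    """
    preds = {}
    for node, nexts in succ.items():
        for nxt in nexts:
            preds.setdefault(nxt, []).append(node)
    marked, reached, queue = set(), {start}, [start]
    while queue:
        node = queue.pop()
        outs = succ.get(node, [])
        if kind.get(node) == "OR" and len(outs) > 1:
            outs = [choose(node, outs)]
        for nxt in outs:
            marked.add((node, nxt))
            if nxt in reached:
                continue
            if kind.get(nxt) == "AND" and len(preds[nxt]) > 1:
                ready = all((p, nxt) in marked for p in preds[nxt])
            else:
                ready = True
            if ready:
                reached.add(nxt)
                queue.append(nxt)
    return reached


reached = simulate(SUCC, {2: "OR", 7: "AND"}, 1)
print(7 in reached)  # False: the AND join at node 7 waits forever
```

Making node 2 an AND split instead lets both incoming edges of node 7 get marked, and control then reaches node 8.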
48 Structural Problem: Lack of Synchronization
- Lack of synchronization: caused when an AND split node is nested with an OR join node.
- In the figure, since both outgoing edges at the AND split node 2 will get marked, the task (7,8) will be executed twice.
- A valid workflow must not suffer from lack of synchronization.
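Counting how often control arrives at each node makes the double execution visible. This sketch assumes, as in the figure, that the only split is an AND split (node 2) and the only join an OR join (node 7); the other node numbers are hypothetical, and the loop would not terminate on a graph with directed cycles.

```python
from collections import Counter, deque

# Hypothetical graph in the shape of the figure: AND split at node 2, OR join at node 7.
SUCC = {1: [2], 2: [3, 4], 3: [5], 4: [6], 5: [7], 6: [7], 7: [8]}


def count_executions(succ, start):
    """Count how often control arrives at each node, assuming every split is an AND
    split (control goes down all branches) and every join is an OR join (each arrival
    passes straight through, with no synchronization). Assumes an acyclic graph."""
    runs = Counter()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        runs[node] += 1
        queue.extend(succ.get(node, []))
    return runs


runs = count_executions(SUCC, 1)
print(runs[7], runs[8])  # 2 2: the task (7,8) is executed twice
```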
49 Structural Problem: Non-Terminating Cycle
- Non-terminating cycle: caused when control cannot exit from a directed cycle.
- This problem can be avoided when every directed cycle is well-formed, i.e., it has an OR join node lying on it through which control can enter, and an OR split node lying on it through which control can exit (see the loan appraisal example).
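The well-formedness condition can be checked cycle by cycle. The sketch below, intended only for small graphs, enumerates each simple directed cycle once (rooted at its smallest node, so node names must be comparable) and reports cycles that lack an OR node with an edge entering from outside the cycle or an OR node with an edge leaving it. The node labels and example graphs are hypothetical, and the init/prop distinction is ignored.

```python
def simple_cycles(succ):
    """Enumerate each simple directed cycle once, rooted at its smallest node.
    Exponential in the worst case, so only suitable for small graphs."""
    nodes = sorted(set(succ) | {n for nexts in succ.values() for n in nexts})
    cycles = []
    for root in nodes:
        stack = [(root, [root])]
        while stack:
            node, path = stack.pop()
            for nxt in succ.get(node, []):
                if nxt == root:
                    cycles.append(path)
                elif nxt > root and nxt not in path:
                    stack.append((nxt, path + [nxt]))
    return cycles


def ill_formed_cycles(succ, kind):
    """Cycles lacking an OR join entered from outside the cycle, or an OR split
    with an exit to outside the cycle. kind: node -> "OR" or "AND"."""
    preds = {}
    for node, nexts in succ.items():
        for nxt in nexts:
            preds.setdefault(nxt, []).append(node)
    bad = []
    for cycle in simple_cycles(succ):
        on_cycle = set(cycle)
        can_enter = any(kind.get(n) == "OR" and any(p not in on_cycle for p in preds.get(n, []))
                        for n in cycle)
        can_exit = any(kind.get(n) == "OR" and any(s not in on_cycle for s in succ.get(n, []))
                       for n in cycle)
        if not (can_enter and can_exit):
            bad.append(cycle)
    return bad


# Hypothetical loop: enter at OR join 2, exit at OR split 4.
LOOP = {1: [2], 2: [3], 3: [4], 4: [2, 5]}
print(ill_formed_cycles(LOOP, {2: "OR", 4: "OR"}))  # []
```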
50 Other Structural Errors
- Examples of other structural errors that must be eliminated:
  - Dangling nodes: it should be ensured that if a node in a TPMG contains items that are not target items, then the node has a successor task.
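The dangling-node rule translates directly into a filter, sketched below under the assumption that each node's items and successor list are given as plain maps; a node with no entry in `successors` is treated as having none. Node names and items are hypothetical.

```python
def dangling_nodes(items, successors, target):
    """Nodes holding at least one non-target item but having no successor.

    items: node -> set of items; successors: node -> list of successor nodes;
    target: set of target (output) items.
    """
    return sorted(n for n, its in items.items()
                  if set(its) - set(target) and not successors.get(n))


# Hypothetical prop nodes; p6 holds RE, but nothing consumes it.
ITEMS = {"p3": {"LR"}, "p4": {"YES"}, "p5": {"NO"}, "p6": {"RE"}}
SUCCESSORS = {"p3": ["i4", "i5"], "p4": [], "p5": []}
print(dangling_nodes(ITEMS, SUCCESSORS, {"YES", "NO"}))  # ['p6']
```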
51 Structural Verification
- The structural verification algorithm TPMG_SYN traverses the workflow instances in the given TPMG one by one, looking for structural problems.
- As soon as it locates a problem, it terminates with an appropriate error message.
- If TPMG_SYN does not find a problem, the given TPMG models a valid business process.
52 Structural Verification
- TPMG_SYN has many similarities with Algorithm InfAnalysis.
- For convenience, TPMG_SYN assumes that there is one start node and one goal node. If this does not hold for the given TPMG, an AND split node can be added at the top and an OR join node at the bottom.
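The normalization step can be sketched as a graph transformation. This assumes, as a simplification, that nodes without predecessors are the original start nodes and nodes without successors the original goal nodes (a start node lying on a cycle would be missed); the names START and GOAL and the example graph are hypothetical.

```python
def add_single_start_and_goal(succ, start="START", goal="GOAL"):
    """Return (new_succ, kind) with one start node and one goal node added.

    Nodes without predecessors are treated as the original start nodes and nodes
    without successors as the original goal nodes (a simplifying assumption).
    START is an AND split over all start nodes; GOAL is an OR join over all goals.
    """
    nodes = set(succ) | {n for nexts in succ.values() for n in nexts}
    has_pred = {n for nexts in succ.values() for n in nexts}
    sources = sorted(n for n in nodes if n not in has_pred)
    sinks = sorted(n for n in nodes if not succ.get(n))
    new_succ = {n: list(nexts) for n, nexts in succ.items()}
    new_succ[start] = sources
    for n in sinks:
        new_succ.setdefault(n, []).append(goal)
    new_succ[goal] = []
    return new_succ, {start: "AND", goal: "OR"}


# Hypothetical graph with two start nodes (a, b) and two goal nodes (yes, no).
new_succ, kind = add_single_start_and_goal({"a": ["c"], "b": ["c"], "c": ["yes", "no"]})
print(new_succ["START"], new_succ["yes"], new_succ["no"])  # ['a', 'b'] ['GOAL'] ['GOAL']
```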
53 Algorithm TPMG_SYN
1   initialize ITEMSET, FRONTIER, STACK; finish := false
2   do while (finish = false)
3       if (there is a non-goal active node n in FRONTIER) then
4           remove n from FRONTIER
5           expand n, entering its init successors in FRONTIER,
6               its OR split nodes in STACK, and new items in ITEMSET
7
8       else if (there is a goal node in FRONTIER) then
9           announce "one workflow instance scanned"
10      else if (STACK is not empty) then
11          take next init successor p of OR node m on top of STACK
12          enter p in FRONTIER, adding its items to ITEMSET
13          if (m has no remaining untried successors) then pop m
14
15      else finish := true
16
17  announce "the given TPMG is valid"; exit
54 TPMG_SYN: Detection of Errors
- Illegal cycles and lack of synchronization can both be detected at line 5 when node n is expanded. We can simply check whether, as a result of the expansion, an OR join node has two marked incoming edges. This would be illegal in general, but in some situations it can indicate the presence of a legal directed cycle.
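The check at line 5 can be sketched as a count of marked incoming edges per OR join. This is a hypothetical helper, assuming the marked edges of the current workflow instance are kept as (source, destination) pairs; deciding whether a hit is really the entry of a legal directed cycle is left to the caller.

```python
from collections import Counter


def doubly_marked_or_joins(marked, kind):
    """OR join nodes with two or more marked incoming edges in the current instance.

    marked: set of (src, dst) edges marked so far; kind: node -> "OR" or "AND".
    Each hit signals lack of synchronization or an illegal cycle, unless the caller
    knows the node is the entry of a legal (well-formed) directed cycle.
    """
    incoming = Counter(dst for _, dst in marked)
    return sorted(n for n, count in incoming.items() if count >= 2 and kind.get(n) == "OR")


# Hypothetical marking after an AND split feeds both branches into OR join 7.
print(doubly_marked_or_joins({(1, 2), (5, 7), (6, 7)}, {2: "OR", 7: "OR"}))  # [7]
```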
55 TPMG_SYN: Detection of Errors
- Deadlock can be detected at line 8 when no goal node is found but inactive nodes are present in FRONTIER.
- Dangling nodes can also be detected at line 5 when node n is expanded.
56 Workflow Verification
- Note that some semantic constraints are imposed by the meanings of the items contained in the TPMG nodes.
- TPMG_SYN, when appropriately modified, can perform certain types of semantic verification of TPMGs.
57 Workflow Verification
- A similar verification procedure can be devised
for workflows drawn using Petri Nets or any other
WfMC convention.