Title: New Algorithms for Enumerating All Maximal Cliques
1 New Algorithms for Enumerating All Maximal Cliques
- Kazuhisa Makino (Osaka University, JAPAN)
- Takeaki Uno (National Institute of Informatics, JAPAN)
- 9/Jul/2004, SWAT 2004
2 Background
- Recently, enumeration algorithms have been attracting interest
- There are still many nice unsolved problems
- (unlike ordinary discrete algorithms)
- The recent increase in computer power makes many enumeration problems practically solvable
- ⇒ many applications have been appearing, such as genome analysis, data mining, clustering, and so on
- Some (theoretical) algorithms use enumeration as a subroutine
- (e.g. recognition of perfect graphs)
3 Background (cont.)
- My institute has 100 researchers in informatics
- At least 5 researchers (independently) use implementations of enumeration algorithms
- Suppose that there are 100,000 researchers in informatics in the world
- ⇒ 5,000 researchers use enumeration algorithms?
4 Problems and Results
- Problem 1: for a given graph G = (V, E), enumerate all maximal cliques in G
- Problem 2: for a given bipartite graph G = (V1 ∪ V2, E), enumerate all maximal bipartite cliques in G
- (Problem 2 is a special case of Problem 1)
- We propose algorithms for solving these problems that reduce the time complexity in dense cases and in sparse cases
- Computational experiments on random graphs and real-world data
5 Difficulty
- Consider branch-and-bound type enumeration:
- divide the maximal cliques into two groups
- maximal cliques including v / not including v
- If a group includes no maximal clique ⇒ cut off the branch
- But finding a maximal clique that includes no vertex of a given set S is NP-complete
- ⇒ cannot cut off subproblems (branches) that include no maximal clique
[Figure: branching tree with branches v1 ∈ K / v1 ∉ K, then v2 ∈ K / v2 ∉ K]
6 Existing Studies and Ours
- O(|V||E|): Tsukiyama, Ide, Ariyoshi, Shirakawa
- O(|V||E|), lexicographic order: Johnson, Yannakakis, Papadimitriou
- O(α(G)|E|): Chiba, Nishizeki
- (α(G): arboricity of G, with m/(n−1) ≤ α(G) ≤ m^(1/2))
- many heuristic algorithms in data mining, for the bipartite case
Ours:
- O(|V|^2.376) (dense case)
- O(Δ^4) (sparse case)
- O((Δ*)^4 + θ^3) (θ vertices have degree > Δ*)
- O(Δ^3) (bipartite case)
- O(Δ^2) (bipartite case, using much memory)
7 Enumeration of Maximal Cliques
- Improved version of the algorithm of Tsukiyama et al.
- Idea: construct a route on all maximal cliques to be traversed
- For a maximal clique K of G = (V, E):
- C(K): lexicographically maximum maximal clique including K
- K≤i: vertices of K with indices ≤ i
- i(K): minimum index i s.t. C(K≤i) = C(K≤i+1) = … = K
- parent of a maximal clique K: C(K≤i(K)−1)
- the parent is lexicographically larger than K
Lexicographically larger: {1,2,3} > {1,2,4}, {1,3,6} > {1,4,5}
[Figure: example graph on vertices 1 to 11, marking a maximal clique K and its index i(K)]
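The definitions of C(K), K≤i, i(K) and the parent can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the 5-vertex graph is hypothetical, vertices are assumed to be the integers 1..n, and the helper names `complete`, `prefix`, `parent_index` and `parent` are chosen here for illustration. The sketch assumes that i(K) is the smallest i with C(K≤i) = K, and that C is computed greedily by scanning vertices in index order.

```python
def complete(adj, clique):
    """C(K): lexicographically maximum maximal clique including `clique`.

    Greedy: scan vertices by increasing index and add each vertex that is
    adjacent to every vertex collected so far.
    """
    result = set(clique)
    for v in sorted(adj):
        if v not in result and result <= adj[v]:
            result.add(v)
    return frozenset(result)


def prefix(clique, i):
    """K<=i: the vertices of K with indices <= i."""
    return frozenset(v for v in clique if v <= i)


def parent_index(adj, clique):
    """i(K): minimum index i such that C(K<=i) = K (0 only for the root)."""
    clique = frozenset(clique)
    for i in range(max(adj) + 1):
        if complete(adj, prefix(clique, i)) == clique:
            return i
    raise ValueError("not a maximal clique of this graph")


def parent(adj, clique):
    """Parent C(K<=i(K)-1); None for the root C(empty set)."""
    i = parent_index(adj, clique)
    return None if i == 0 else complete(adj, prefix(clique, i - 1))


# Hypothetical example graph; its maximal cliques are {1,2,3}, {2,3,4}, {4,5}.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
root = complete(adj, set())
print(sorted(root))                      # [1, 2, 3]
print(sorted(parent(adj, {2, 3, 4})))    # [1, 2, 3]
print(sorted(parent(adj, {4, 5})))       # [2, 3, 4]
```

In this example the parents form the chain {4,5} → {2,3,4} → {1,2,3}, and each parent is lexicographically larger than its child.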
8 Graph Representation of Relation
- Parent-child relation is acyclic
- ⇒ its graph representation forms a tree (enumeration tree)
Visit all maximal cliques by depth-first search
⇒ need to find the children of a maximal clique
9 Child of Maximal Clique
- Γ(vi): vertices adjacent to vi
- K[i] = C((K≤i ∩ Γ(vi)) ∪ {vi})
- H is a child of K only if H = K[i] for some i > i(K)
- (H is a child of K if the parent of K[i] is K)
- i(K[i]) = i
Construct K[i] in O(|E|) time; construct its parent in O(|E|) time (O(Δ^2) time)
For i = i(K)+1, …, |V|: O(|V||E|) time in total
⇒ enumerate in O(|V||E|) time per maximal clique
[Figure: example graph on vertices 1 to 11, marking a maximal clique K with i(K) = 6]
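The child rule can be turned into a complete, if slow, enumeration by traversing the enumeration tree. The sketch below is only an illustration under stated assumptions: vertices are the integers 1..n, the helper names are hypothetical, the example graph is made up, and each candidate K[i] is accepted by recomputing its parent directly rather than by the faster tests discussed on the later slides. An explicit stack replaces recursion.

```python
def complete(adj, clique):
    """C(K): greedily add the lowest-indexed vertex adjacent to all members."""
    result = set(clique)
    for v in sorted(adj):
        if v not in result and result <= adj[v]:
            result.add(v)
    return frozenset(result)


def prefix(clique, i):
    """K<=i: the vertices of K with indices <= i."""
    return frozenset(v for v in clique if v <= i)


def parent_index(adj, clique):
    """i(K): minimum index i such that C(K<=i) = K."""
    for i in range(max(adj) + 1):
        if complete(adj, prefix(clique, i)) == clique:
            return i
    raise ValueError("not a maximal clique of this graph")


def children(adj, clique):
    """Yield every K[i], i > i(K), whose parent is K and whose index is i."""
    for i in range(parent_index(adj, clique) + 1, max(adj) + 1):
        if i in clique:
            continue
        cand = complete(adj, (prefix(clique, i) & adj[i]) | {i})
        if (parent_index(adj, cand) == i
                and complete(adj, prefix(cand, i - 1)) == clique):
            yield cand


def enumerate_maximal_cliques(adj):
    """Depth-first traversal of the enumeration tree from the root C(empty)."""
    stack = [complete(adj, frozenset())]
    while stack:
        clique = stack.pop()
        yield clique
        stack.extend(children(adj, clique))


# Hypothetical example graph.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
for k in enumerate_maximal_cliques(adj):
    print(sorted(k))
# [1, 2, 3]
# [2, 3, 4]
# [4, 5]
```

Because every maximal clique other than the root has exactly one parent and one index i(K), each clique is produced once without storing the cliques already found.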
10 Characterization of Child
- The parent of K[i] is K ⇔
- (1) no vj, j < i, is adjacent to all vertices in (K≤i ∩ Γ(vi)) ∪ {vi}
- (2) no vj, j < i, is adjacent to all vertices in (K≤i ∩ Γ(vi)) ∪ K≤j
- (1) is not satisfied ⇒ K[i] and the parent of K[i] include some vj ∉ K
- (2) is not satisfied ⇒ the parent of K[i] includes some vj ∉ K
Example: K = {3,4,7,9}, K[10] = {3,7,10}, K≤5 = {3,4}, K≤7 ∩ Γ(v10) = {3,7}
[Figure: vertices 3, 4, 7, 9, 10, highlighting (K≤10 ∩ Γ(v10)) ∪ {v10}]
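11 Use of Matrix Multiplication
- Check the conditions (1) and (2) by matrix multiplication
- (1) no vj, j < i, is adjacent to all vertices in (K≤i ∩ Γ(vi)) ∪ {vi}
- ith row of the left matrix ⇔ (K≤i ∩ Γ(vi)) ∪ {vi}
- jth column of the right matrix ⇔ Γ(vj)
- (i, j) cell of the product ⇔ |((K≤i ∩ Γ(vi)) ∪ {vi}) ∩ Γ(vj)|
- vj is adjacent to all vertices of the set ⇔ the (i, j) cell equals |(K≤i ∩ Γ(vi)) ∪ {vi}|
[Figure: matrix with rows (K≤i ∩ Γ(vi)) ∪ {vi} times matrix with columns Γ(vj) gives the matrix with cells |Γ(vj) ∩ ((K≤i ∩ Γ(vi)) ∪ {vi})|]
Condition (2) can be checked in the same way
Checked in O(|V|^2.376) time ⇒ time complexity is O(|V|^2.376) per maximal clique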
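The matrix formulation of condition (1) can be sketched as follows. This is an illustration under assumptions, not the authors' code: vertices are the integers 1..n, the function name `condition1_indices` and the example graph are hypothetical, and a naive cubic product stands in for a fast matrix multiplication routine.

```python
def condition1_indices(adj, clique):
    """Indices i (vi not in K) for which condition (1) holds.

    Row i of `left` is the 0/1 vector of S_i = (K<=i ∩ Γ(vi)) ∪ {vi};
    column j of `right` is the 0/1 vector of Γ(vj). The (i, j) cell of
    the product is |S_i ∩ Γ(vj)|, so vj is adjacent to all of S_i
    exactly when that cell equals |S_i|.
    """
    n = max(adj)
    verts = range(1, n + 1)
    clique = set(clique)
    left = [[1 if u == i or (u in clique and u <= i and u in adj[i]) else 0
             for u in verts] for i in verts]
    right = [[1 if u in adj[j] else 0 for j in verts] for u in verts]
    # Naive product; a fast matrix multiplication would be used instead.
    prod = [[sum(left[a][k] * right[k][b] for k in range(n))
             for b in range(n)] for a in range(n)]
    satisfied = []
    for i in verts:
        if i in clique:
            continue
        size = sum(left[i - 1])
        if all(prod[i - 1][j - 1] < size for j in range(1, i)):
            satisfied.append(i)
    return satisfied


# Hypothetical example graph with maximal cliques {1,2,3}, {2,3,4}, {4,5}.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(condition1_indices(adj, {1, 2, 3}))   # [4]
print(condition1_indices(adj, {2, 3, 4}))   # [1, 5]
```

Condition (1) alone does not decide childhood: an index returned here still has to satisfy i > i(K) and condition (2).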
12 Sparse Cases
- If vi is adjacent to no vertex in K
- ⇒ K[i] = C((K≤i ∩ Γ(vi)) ∪ {vi}) = C({vi})
- ⇒ parent of K[i] = C(C({vi})<i)
- If C({vi})<i = ∅, the parent of K[i] is K0 (the lexicographically maximum maximal clique)
- If C({vi})<i ≠ ∅, (1) is not satisfied
- ⇒ If K ≠ K0, K[i] is not a child of K
- Since |K| ≤ Δ+1, at most Δ(Δ+1) vertices are adjacent to K
- Each K[i] takes O(Δ^2) time to construct the parent
Δ: max. degree
O(Δ^4) per maximal clique
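The counting argument can be illustrated with a short sketch. The function name `candidate_indices` and the example graph are hypothetical; the point is only that, for K ≠ K0, the indices i worth trying are the vertices outside K adjacent to at least one vertex of K, and that there are at most Δ(Δ+1) of them.

```python
def candidate_indices(adj, clique):
    """Vertices outside K that are adjacent to at least one vertex of K."""
    clique = set(clique)
    neighbours = set()
    for v in clique:
        neighbours |= adj[v]
    return sorted(neighbours - clique)


# Hypothetical example graph with maximum degree 3.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
max_degree = max(len(nbrs) for nbrs in adj.values())
cands = candidate_indices(adj, {2, 3, 4})
print(cands)                                          # [1, 5]
print(len(cands) <= max_degree * (max_degree + 1))    # True
```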
13 Bipartite Clique
- Enumerate maximal bipartite cliques in G = (V1 ∪ V2, E)
- (= maximal cliques in G' = (V1 ∪ V2, E ∪ (V1 × V1) ∪ (V2 × V2)))
- ⇒ enumerated in O(|V|^2.376) time for each
- But a sparse bipartite graph becomes dense in this way
- ⇒ need some improvements for sparse cases
[Figure: bipartite graph with vertex sets V1 and V2]
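The reduction from bipartite cliques to cliques can be sketched as below. The function name `augment_bipartite` and the tiny graph are hypothetical; the sketch only shows how the added edges inside V1 and inside V2 make a sparse bipartite graph dense.

```python
def augment_bipartite(v1, v2, edges):
    """Adjacency sets of G' = (V1 ∪ V2, E ∪ (V1 × V1) ∪ (V2 × V2))."""
    adj = {v: set() for v in list(v1) + list(v2)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # Make each side a clique (no self-loops).
    for side in (v1, v2):
        for a in side:
            adj[a] |= set(side) - {a}
    return adj


# Hypothetical bipartite graph: V1 = {1, 2}, V2 = {3, 4}.
g = augment_bipartite([1, 2], [3, 4], [(1, 3), (1, 4), (2, 3)])
print(sorted(g[1]))                            # [2, 3, 4]
print(sum(len(n) for n in g.values()) // 2)    # 5 edges: 3 original + 2 added
```

Here the bipartite clique ({1}, {3, 4}) of G corresponds to the clique {1, 3, 4} of G'.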
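14 Fast Construction of K[i]
- For any maximal bipartite clique K:
- K ∩ V2 = ∩_{v ∈ K∩V1} Γ(v)
- K ∩ V1 = ∩_{v ∈ K∩V2} Γ(v)
- K[i] ∩ V1 for all i are computed in O(Δ^2) time
- K[i] for all i are computed in O(Δ^3) time
[Figure: bipartite graph with V1 = {1, 2, 3, 4} and V2 (vertices v1, v2, v5, v6 shown), with the neighbourhoods Γ(1), …, Γ(4) and the cliques K[v1] and K[v6]]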
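The two intersection identities suggest a simple way to build a maximal bipartite clique from any subset of V1. The sketch below is an illustration only: the names `common_neighbours` and `maximal_biclique_from` and the example graph are hypothetical, and `adj` holds the original bipartite adjacency (no edges inside V1 or V2).

```python
def common_neighbours(adj, vertices, universe):
    """Vertices of `universe` adjacent to every vertex in `vertices`."""
    result = set(universe)
    for v in vertices:
        result &= adj[v]
    return frozenset(result)


def maximal_biclique_from(adj, v1, v2, seed):
    """Maximal bipartite clique whose V2 side is the common neighbourhood of `seed`."""
    side2 = common_neighbours(adj, seed, v2)    # K ∩ V2 = ∩ Γ(v), v in seed
    side1 = common_neighbours(adj, side2, v1)   # K ∩ V1 = ∩ Γ(v), v in K ∩ V2
    return side1, side2


# Hypothetical bipartite graph: V1 = {1, 2, 3}, V2 = {4, 5, 6}.
v1, v2 = {1, 2, 3}, {4, 5, 6}
adj = {1: {4, 5}, 2: {4, 5}, 3: {5, 6}, 4: {1, 2}, 5: {1, 2, 3}, 6: {3}}
a, b = maximal_biclique_from(adj, v1, v2, {1})
print(sorted(a), sorted(b))    # [1, 2] [4, 5]
a, b = maximal_biclique_from(adj, v1, v2, {1, 3})
print(sorted(a), sorted(b))    # [1, 2, 3] [5]
```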
15 Checking the Parent
- Put small indices to V1, large indices to V2
- ⇒ K[i] is a child of K ⇔ K[i]<i = K<i
- ⇒ checked in O(Δ) time
[Figure: V1 carries the indices 1, 2, 3, …, |V1|−1, |V1|; V2 carries the indices |V1|+1, |V1|+2, …, |V1|+|V2|]
Enumerated in O(Δ^3) time for each
O(Δ^2) by using memory
16 Computational Experiments
- for randomly generated graphs:
- vertex vi is connected to each of the vertices from i−r to i+r with probability 1/2
Faster than Tsukiyama's algorithm
Computation time is linear in the maximum degree
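A generator for this kind of random instance might look like the sketch below. It assumes that each pair (i, j) with 0 < |i − j| ≤ r is decided by one independent fair coin flip; the function name `random_band_graph` and the `seed` parameter are hypothetical additions for reproducibility, not part of the original experiments.

```python
import random


def random_band_graph(n, r, seed=0):
    """Random graph on vertices 1..n: i and j (0 < |i - j| <= r) are joined
    with probability 1/2, independently for each pair."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(1, n + 1)}
    for i in range(1, n + 1):
        # Each unordered pair is considered once, from its smaller endpoint.
        for j in range(i + 1, min(i + r, n) + 1):
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj


g = random_band_graph(n=20, r=3, seed=1)
print(max(len(nbrs) for nbrs in g.values()))   # at most 2 * r = 6
```

The parameter r bounds the maximum degree by 2r, which is why varying r is a convenient way to study running time against the maximum degree.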
17 Benchmark Problems
- Problem of finding frequent closed item sets from a database
- ⇒ equivalent to maximal bipartite clique enumeration
- Used in KDD Cup (data mining algorithm competition)
- BMS-WebView1 (from Web-log data)
- |V|: 60,000, ave. degree 2.5
- BMS-WebView2 (from Web-log data)
- |V|: 80,000, ave. degree 5
- BMS-POS (from POS data)
- |V|: 510,000, ave. degree 6
- IBM-Artificial (artificial data)
- |V|: 100,000, ave. degree 10
18 Results
19 Conclusion and Future Work
- Proposed fast algorithms for enumerating
- maximal cliques: O(|V|^2.376), O(Δ^4), O((Δ*)^4 + θ^3)
- maximal bipartite cliques: O(|V|^2.376), O(Δ^3), O(Δ^2)
- Examined benchmark problems of data mining, and showed that our algorithm performs well
- Future work
- Can we improve more? What is the difficulty?
- Can we enumerate other maximal (minimal) graph objects?
- Can we apply matrix multiplication to other enumeration problems?
- What can be enumerated efficiently in practice?
20 Frequent Sets
- Input graph:
- An item and a customer are connected iff the customer purchased the item
- In a maximal bipartite clique:
- Customers have similar preferences
- Items are frequently purchased together
- Agrawal et al. 96, Zaki et al. 02, Pei 00, Han 00, …
[Figure: bipartite graph between customer1, customer2, customer3, customer4 and the items beer, nappy, milk]
21 Few Large Degree Vertices
- Very few vertices (denoted by T) have large degrees
- Divide the maximal cliques into two groups:
- (a) cliques not included in T
- (b) cliques included in T
- (a) can be enumerated in O((Δ*)^4) time
- A maximal clique K of the graph induced by T is a maximal clique of G ⇔ K is not included in any clique of (a)
- ⇒ O(|T|^3) time for each
[Figure: vertices of small degree (< Δ*) and the set T of vertices of large degree]
O((Δ*)^4 + |T|^3) per maximal clique
22 Avoid Duplications by Using Memory
- We can avoid duplications by storing all maximal bipartite cliques
- From K ∩ V1 = Γ(K ∩ V2), we store K ∩ V1 for all K
- 1. Get a K from memory (one that is not yet processed)
- 2. Generate all K[i] ∩ V1
- 3. Store each K[i] ∩ V1 if it is not in memory
- 4. Go to 1 if some maximal clique is not yet processed
Enumerated in O(Δ^2) time for each
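Steps 1 to 4 can be sketched as a worklist algorithm that keeps every V1 side seen so far in memory. This is a simplified illustration, not the authors' procedure: instead of the exact K[i] construction it uses the neighbour rule (K ∩ V1) ∩ Γ(w) for each w ∈ V2 outside K, which is an assumption of this sketch, and the names and the example graph are hypothetical. Pairs with an empty side are skipped in the output.

```python
from collections import deque


def common_neighbours(adj, vertices, universe):
    """Vertices of `universe` adjacent to every vertex in `vertices`."""
    result = set(universe)
    for v in vertices:
        result &= adj[v]
    return frozenset(result)


def enumerate_bicliques_with_memory(adj, v1, v2):
    """Enumerate maximal bipartite cliques, storing only their V1 sides."""
    start = frozenset(v1)
    memory = {start}                 # every V1 side generated so far
    queue = deque([start])           # stored sides not yet processed
    found = []
    while queue:                     # step 4: repeat while something is unprocessed
        side1 = queue.popleft()      # step 1: get an unprocessed entry
        side2 = common_neighbours(adj, side1, v2)
        if side1 and side2:
            found.append((side1, side2))
        for w in sorted(set(v2) - side2):
            nxt = side1 & adj[w]     # step 2: generate a neighbouring V1 side
            if nxt not in memory:    # step 3: store it only if it is new
                memory.add(nxt)
                queue.append(nxt)
    return found


# Hypothetical bipartite graph: V1 = {1, 2, 3}, V2 = {4, 5, 6}.
v1, v2 = {1, 2, 3}, {4, 5, 6}
adj = {1: {4, 5}, 2: {4, 5}, 3: {5, 6}, 4: {1, 2}, 5: {1, 2, 3}, 6: {3}}
for side1, side2 in enumerate_bicliques_with_memory(adj, v1, v2):
    print(sorted(side1), sorted(side2))
# [1, 2, 3] [5]
# [1, 2] [4, 5]
# [3] [5, 6]
```

The membership test on `memory` replaces the parent check, which is the trade of extra space for a cheaper per-clique cost that the slide describes.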