Title: Efficient Identification of Overlapping Communities

1
Efficient Identification of Overlapping Communities

Jeffrey Baumes
Mark Goldberg
Malik Magdon-Ismail

Rensselaer Polytechnic Institute, Troy, NY
2
Outline

Communities as clusters
What is a cluster?
Cluster seed procedure (LA)
Cluster refinement procedure (IS2)
Experimental results
Conclusions and future work

3
Communities as clusters

Malicious groups use large communication networks for planning and coordination
Their goal: remain undetected
Our goal: sift through communications for suspicious patterns, using structure only, not content

4
Communities as clusters

Detecting all social groups (malicious or not) will aid in searching for hidden groups
Social groups tend to communicate densely
Approach: find social groups by finding clusters in the graph of the communication network

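[Figure: nodes are actors and an edge means actor A communicates with actor B; a densely connected group is marked "likely a social group", and the same group after adding external edges is marked "likely not a social group".]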
5
What is a cluster?

Many partitioning algorithms exist
Social groups often overlap
Instead, define clusters as locally optimal with respect to density

[Figure: partitioning vs. overlapping clustering]
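The slides do not give the density function. As a minimal sketch, assume a hypothetical metric `density(C) = e_in / (e_in + e_out)`, where `e_in` counts edges with both ends in cluster `C` and `e_out` counts edges leaving it. Under that assumption, a cluster is locally optimal when no single node addition or removal increases the metric:

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def density(graph: Graph, cluster: Set[int]) -> float:
    """Hypothetical metric: internal edges / (internal + boundary edges)."""
    e_in = 0
    e_out = 0
    for u in cluster:
        for v in graph[u]:
            if v in cluster:
                e_in += 1  # counted once from each endpoint
            else:
                e_out += 1
    e_in //= 2
    total = e_in + e_out
    return e_in / total if total else 0.0


def is_locally_optimal(graph: Graph, cluster: Set[int]) -> bool:
    """True if no single node addition or removal increases density."""
    base = density(graph, cluster)
    for v in graph:
        candidate = cluster - {v} if v in cluster else cluster | {v}
        if candidate and density(graph, candidate) > base:
            return False
    return True


# Two triangles joined by the bridge edge 2-3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(density(g, {0, 1, 2}))             # 0.75
print(is_locally_optimal(g, {0, 1, 2}))  # True
print(is_locally_optimal(g, {0, 1}))     # False
```

This particular metric gives every whole connected component a score of 1.0, so a practical metric would likely need an additional term; the sketch only illustrates the local-optimality test itself.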
6
Two-stage process
communication network → seed procedure → seed clusters → refinement procedure → final clusters
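The two-stage process can be sketched as a function that takes the seed and refinement procedures as parameters. This is an illustration only: the names `two_stage`, `seed`, and `refine` are hypothetical, and dropping duplicate final clusters (different seeds may refine to the same cluster) is an assumption, not something the slides state:

```python
from typing import Callable, Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]
SeedProcedure = Callable[[Graph], List[Set[int]]]        # e.g. RaRe or LA
RefineProcedure = Callable[[Graph, Set[int]], Set[int]]  # e.g. IS or IS2


def two_stage(graph: Graph, seed: SeedProcedure,
              refine: RefineProcedure) -> List[FrozenSet[int]]:
    """Run the seed procedure, then refine every seed cluster."""
    final: List[FrozenSet[int]] = []
    seen: Set[FrozenSet[int]] = set()
    for cluster in seed(graph):
        refined = frozenset(refine(graph, set(cluster)))
        # Assumption: keep each distinct non-empty final cluster once.
        if refined and refined not in seen:
            seen.add(refined)
            final.append(refined)
    return final


# Trivial stand-ins: a fixed list of seeds and an identity "refinement".
g = {0: {1}, 1: {0}, 2: set()}
print(two_stage(g, lambda gr: [{0, 1}, {1, 0}, {2}], lambda gr, c: c))
# [frozenset({0, 1}), frozenset({2})]
```

Passing the procedures in as functions keeps the old pair (RaRe, IS) and the new pair (LA, IS2) interchangeable within the same pipeline.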
7
Original procedures
communication network → Rank Removal (RaRe) → seed clusters → Iterative Scan (IS) → final clusters
Jeffrey Baumes, Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail, Nathan Preston. "Finding Communities by Clustering a Graph into Overlapping Subgraphs", International Conference on Applied Computing (IADIS 2005), Feb 22-25, Algarve, Portugal.
8
Proposed new procedures
communication network → Link Aggregate (LA) → seed clusters → Iterative Scan 2 (IS2) → final clusters
9
Link Aggregate (LA)

Order the nodes (two ordering routines are used)
Pass through the nodes in that order
For each node, add it to every cluster whose density it improves; if it improves none, start a new cluster with it
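A minimal sketch of the LA pass described above, assuming a hypothetical density metric (internal edges divided by internal plus boundary edges), since the slides do not define one. The two ordering routines are not named, so the order is simply passed in; a node joins a cluster only if it strictly increases that cluster's density:

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[int, Set[int]]


def density(graph: Graph, cluster: Set[int]) -> float:
    """Hypothetical metric: internal edges / (internal + boundary edges)."""
    e_in = sum(1 for u in cluster for v in graph[u] if v in cluster) // 2
    e_out = sum(1 for u in cluster for v in graph[u] if v not in cluster)
    total = e_in + e_out
    return e_in / total if total else 0.0


def link_aggregate(graph: Graph, order: Iterable[int]) -> List[Set[int]]:
    """One pass over the nodes in the given order, producing seed clusters."""
    clusters: List[Set[int]] = []
    for v in order:
        joined = False
        for cluster in clusters:
            # Add v to every existing cluster whose density it improves.
            if density(graph, cluster | {v}) > density(graph, cluster):
                cluster.add(v)
                joined = True
        if not joined:
            clusters.append({v})  # v improved no cluster: start a new one
    return clusters


# Two triangles joined by the bridge edge 2-3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(link_aggregate(g, [0, 1, 2, 3, 4, 5]))  # [{0, 1, 2}, {3, 4, 5}]
```

Because a node may join several clusters in the same pass, the seeds can overlap. The result also depends on the order: in this sketch, starting from the high-degree bridge nodes 2 and 3 would grow a single seed that covers the whole graph.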

10
LA procedure
11
LA procedure
[Figure: example network with nodes labeled 1 to 35]
12
LA procedure
[Figure: example network with nodes labeled 1 to 35]
13
LA procedure
[Figure: example network with nodes labeled 1 to 35]
14
LA procedure
[Figure: example network with nodes labeled 1 to 35]
15
LA procedure
[Figure: example network with nodes labeled 1 to 35]
16
LA procedure
[Figure: example network with nodes labeled 1 to 35]
17
Iterative Scan (IS)

Old refinement procedure
Traverses the entire node list, adding or removing nodes that increase the density
Repeats the process until no improvement is possible
May be inefficient in sparse networks
Guaranteed to be locally optimal
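A minimal sketch of an IS-style refinement, assuming a hypothetical density metric (internal edges divided by internal plus boundary edges). Every node in the network is considered on every pass, which is why the cost grows with the whole graph rather than with the cluster. Each accepted move strictly increases the score, so the loop terminates, and when it stops no single move can help, which is the local-optimality guarantee:

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def density(graph: Graph, cluster: Set[int]) -> float:
    """Hypothetical metric: internal edges / (internal + boundary edges)."""
    e_in = sum(1 for u in cluster for v in graph[u] if v in cluster) // 2
    e_out = sum(1 for u in cluster for v in graph[u] if v not in cluster)
    total = e_in + e_out
    return e_in / total if total else 0.0


def iterative_scan(graph: Graph, seed: Set[int]) -> Set[int]:
    """IS-style refinement: scan every node until no move helps."""
    cluster = set(seed)
    best = density(graph, cluster)
    improved = True
    while improved:
        improved = False
        for v in graph:  # the whole node list, not just the neighborhood
            if v in cluster:
                if len(cluster) == 1:
                    continue  # never empty the cluster
                candidate = cluster - {v}
            else:
                candidate = cluster | {v}
            score = density(graph, candidate)
            if score > best:  # strict gain guarantees termination
                cluster, best = candidate, score
                improved = True
    return cluster


g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(iterative_scan(g, {2, 3}))  # {0, 1, 2}
```

Starting from the bridge {2, 3}, the scan first absorbs 0 and 1, then drops 3, ending at the triangle {0, 1, 2}.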

18
Iterative Scan 2 (IS2)

New refinement procedure
Traverses only the neighborhood of the cluster, adding or removing nodes that increase the density
Repeats the process until no improvement is possible
More efficient than IS in sparse networks despite its overhead, but less efficient in dense networks

19
IS2 procedure
20
IS2 procedure
21
IS2 procedure
22
IS2 procedure
23
IS2 procedure
24
Experimental results

Compare run time of new vs. old
Compare cluster quality of new vs. old
Compare on different network types
Random
Preferential attachment
Real-world
Compare possible actor orderings for LA
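The slides do not say how the random and preferential attachment test networks were generated or with which parameters. As an assumption, the sketch below uses two standard models: Erdős-Rényi G(n, p) for random graphs and Barabasi-Albert-style growth for preferential attachment. The function names and the `seed` parameter are hypothetical:

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def random_graph(n: int, p: float, seed: int = 0) -> Graph:
    """Erdos-Renyi G(n, p): each pair of nodes is linked with probability p."""
    rng = random.Random(seed)
    g: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                g[u].add(v)
                g[v].add(u)
    return g


def preferential_attachment(n: int, m: int, seed: int = 0) -> Graph:
    """Barabasi-Albert-style growth (requires n > m): each new node links to
    m distinct existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    g: Graph = {v: set() for v in range(n)}
    pool: List[int] = []  # each node appears once per incident edge end
    for u in range(m + 1):  # start from a clique on m + 1 nodes
        for v in range(u + 1, m + 1):
            g[u].add(v)
            g[v].add(u)
            pool += [u, v]
    for new in range(m + 1, n):
        chosen: Set[int] = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))  # degree-proportional pick
        for t in chosen:
            g[new].add(t)
            g[t].add(new)
            pool += [new, t]
    return g
```

The two models differ mainly in degree distribution: G(n, p) degrees concentrate around the mean, while preferential attachment produces a few high-degree hubs.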

25
RaRe vs. LA run time
[Figure: two run-time plots, one comparing Original RaRe, New RaRe, and LA, and one comparing New RaRe and LA]
26
IS vs. IS2 run time
Define a combined refinement procedure: use IS for dense graphs and IS2 for sparse graphs
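The rule of using IS for dense graphs and IS2 for sparse ones can be sketched as a simple dispatch on edge density. The slides give no cutoff, so `threshold` and the function names below are hypothetical:

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def edge_density(graph: Graph) -> float:
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    return 2 * m / (n * (n - 1))


def choose_refinement(graph: Graph, threshold: float = 0.1) -> str:
    """Pick 'IS' for dense graphs and 'IS2' for sparse ones.

    threshold is a hypothetical cutoff; the slides do not specify one.
    """
    return "IS" if edge_density(graph) >= threshold else "IS2"


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 100} for i in range(100)}
print(choose_refinement(triangle))  # IS
print(choose_refinement(path))      # IS2
```

A global density cutoff is only one possible signal; the choice could also be made per cluster, based on the size of its neighborhood.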
27
Old vs. new quality
[Figure: two plots comparing New RaRe → IS with LA → IS2]
28
Preferential attachment
[Figure: two plots comparing New RaRe → IS with LA → IS2 on preferential attachment graphs]
29
Real-World Networks

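Ratio new/old: (LA → IS)/(RaRe → IS)
[Table: new/old ratio for each real-world network; refinement procedure used per network: IS, IS2, IS2, IS2, IS]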
30
LA ordering
31
Conclusions and future work

Overlapping clustering may be used to discover
social groups in communication networks
The new algorithm is more efficient in many
cases, while keeping the same or better quality
A unified algorithm should choose strategies and
parameters based on network properties

32
Questions
33
Rank Removal

Existing seed procedure
Removes highly connected nodes until the network is broken into small clusters
Adds each removed node back into the clusters it is well connected to
Two main inefficiencies:
Computes PageRank at every iteration
Computes connected components at every iteration
PageRank could be computed once, but recomputing connected components is crucial
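A minimal sketch of Rank Removal with PageRank computed only once and connected components recomputed after each removal. The parameters `max_size` (largest allowed component) and `min_links` (links needed to count as "well connected"), and discarding singleton components, are hypothetical choices, not values from the slides:

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def pagerank(graph: Graph, damping: float = 0.85, iters: int = 50) -> Dict[int, float]:
    """Plain power-iteration PageRank on an undirected graph."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in graph}
        for u, nbrs in graph.items():
            targets = nbrs if nbrs else graph.keys()  # isolated node: spread evenly
            share = damping * rank[u] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def components(graph: Graph, alive: Set[int]) -> List[Set[int]]:
    """Connected components of the subgraph induced by `alive`."""
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for s in alive:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for w in graph[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def rank_removal(graph: Graph, max_size: int = 3, min_links: int = 2) -> List[Set[int]]:
    rank = pagerank(graph)  # computed once instead of at every iteration
    alive = set(graph)
    removed: List[int] = []
    for v in sorted(graph, key=lambda x: -rank[x]):
        # Components are recomputed after every removal.
        if all(len(c) <= max_size for c in components(graph, alive)):
            break
        alive.discard(v)
        removed.append(v)
    clusters = [c for c in components(graph, alive) if len(c) > 1]
    for v in removed:
        for c in clusters:
            if len(graph[v] & c) >= min_links:  # assumed "well connected" rule
                c.add(v)
    return clusters


g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(c) for c in rank_removal(g)))  # [[0, 1, 2], [3, 4, 5]]
```

On two triangles joined by a bridge, removing one high-rank bridge node splits the graph; adding it back attaches it only to the triangle it has two links into.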

34
LA procedure detail
35
IS2 procedure detail
36
RaRe vs. LA
37
RaRe vs. LA
38
RaRe vs. LA
39
IS vs. IS2
40
IS vs. IS2
41
IS vs. IS2
42
Run time RaRe vs. LA
43
Run time IS vs. IS2
44
Cluster quality
45
Cluster quality
46
Preferential attachment run time
47
Preferential attachment quality
48
LA ordering run time
49
LA ordering quality
