Clustering Social Networks

1
Clustering Social Networks
  • Nina Mishra et al.
  • Presented by Nam Nguyen

2
(α,β)-Cluster
  • Definition
  • Given a graph G = (V,E) where every vertex has a
    self-loop, C ⊆ V is an (α,β)-cluster if
  • 1. Internally dense: ∀v ∈ C, |E(v,C)| ≥ β|C|
  • 2. Externally sparse: ∀u ∈ V\C, |E(u,C)| ≤ α|C|

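(Figure: a vertex v inside C with at least β|C| neighbors in C, and a vertex u outside C with at most α|C| neighbors in C.)
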
3
Example
  • {a,b,c,d} and {d,e,f,g} are (1/4, 1)-clusters.
  • h and i do not fall into any (α,β)-cluster
    for 0 ≤ α < ½ < β ≤ 1;
  • thus, they would not be clustered.
  • ⇒ (α,β)-clusters are able to detect
    overlapping clusters.
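
The two conditions of the definition can be checked mechanically. The sketch below is illustrative only: the graph is a hypothetical stand-in for the example (two 4-cliques sharing vertex d), h and i are left out because their edges are not given in the transcript, and the helper names `build_graph` and `is_cluster` are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets, with a self-loop added on every vertex."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for v in adj:
        adj[v].add(v)  # the definition assumes every vertex has a self-loop
    return adj

def is_cluster(adj, C, alpha, beta):
    """Check both conditions of the (alpha, beta)-cluster definition."""
    C = set(C)
    size = len(C)
    if size == 0:
        return False
    # Internally dense: every member sees at least beta * |C| members.
    dense = all(len(adj[v] & C) >= beta * size for v in C)
    # Externally sparse: every outsider sees at most alpha * |C| members.
    sparse = all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)
    return dense and sparse

# Hypothetical stand-in for the example: two 4-cliques that share vertex d.
edges = list(combinations("abcd", 2)) + list(combinations("defg", 2))
adj = build_graph(edges)
quarter = Fraction(1, 4)  # exact arithmetic avoids float rounding at the boundary

print(is_cluster(adj, "abcd", quarter, 1))  # True
print(is_cluster(adj, "defg", quarter, 1))  # True
print(is_cluster(adj, "abc", quarter, 1))   # False: d sees all of {a, b, c}
```

Both clusters contain d, which is the overlapping behavior the last bullet refers to.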

4
Problem definition
  • Objective
  • Identify clusters that are internally dense,
    i.e., each vertex in the cluster is adjacent to
    at least a β-fraction of the cluster, and
    externally sparse, i.e., any vertex outside of
    the cluster is adjacent to at most an α-fraction
    of the vertices in the cluster.
  • Given 0 ≤ α < β ≤ 1, find all (α,β)-clusters in the
    network.

5
Contributions of the paper
  • Give a bound for the overlap of two
    (α,β)-clusters A and B.
  • They overlap in at most |C| · min{1 - (β - α),
    α/(2β - 1)} vertices.
  • If the ratio of the sizes of A and B is at most
    (1 - α)/(1 - β), then one cluster cannot be
    contained in the other.
  • Give a loose upper bound for the number of
    (α,1)-clusters of size s: O((n/s)^(αs+1)).
  • Introduce the ρ-champion of a cluster: if
    β > ½(1 + ρ + α), there is a simple deterministic
    algorithm for finding all such clusters in time
  • O(m^0.7 n^1.2 + n^(2+o(1)))
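
To get a feel for the first two bounds, the arithmetic can be sketched as below. This is a hedged illustration: it assumes the overlap bound reads min{1 - (β - α), α/(2β - 1)} times the cluster size and that the two clusters have equal size; the function names are hypothetical.

```python
from fractions import Fraction

def overlap_bound(alpha, beta, size):
    """Assumed reading of the overlap bound for two equal-size clusters."""
    if 2 * beta <= 1:
        raise ValueError("the alpha / (2*beta - 1) term needs beta > 1/2")
    return min(1 - (beta - alpha), alpha / (2 * beta - 1)) * size

def containment_ratio(alpha, beta):
    """Size-ratio threshold (1 - alpha) / (1 - beta) from the slide."""
    if beta >= 1:
        raise ValueError("the ratio is undefined for beta = 1")
    return (1 - alpha) / (1 - beta)

# Two (1/4, 1)-clusters of size 4 may share at most one vertex.
print(overlap_bound(Fraction(1, 4), Fraction(1), 4))        # 1
# For alpha = 1/4, beta = 3/4 the size-ratio threshold is 3.
print(containment_ratio(Fraction(1, 4), Fraction(3, 4)))    # 3
```

The first value agrees with the earlier example, where the two (1/4, 1)-clusters share exactly one vertex.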

6
Some minor remarks
  • As β → 1, the cluster C tends to a clique.
  • As α → 0, C tends to a disconnected component.
  • If β < ½, then C might contain two disconnected
    components.
  • We want α < β and β > ½.
  • (0, β)-clusters ⇔ finding connected components
    and outputting the β-connected ones.
  • (1 - 1/n, 1)-clusters ⇔ finding the maximal cliques
    in a graph.
  • ((1 - ε)β, β)-clusters ⇔ finding quasi-cliques.
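
The maximal-clique remark can be checked by brute force on a small graph. In the sketch below the 5-vertex graph and all helper names are hypothetical; with β = 1 a cluster must be a clique, and with α = 1 - 1/n no outside vertex may be adjacent to every member, which is maximality.

```python
from fractions import Fraction
from itertools import combinations

def with_self_loops(edges, vertices):
    adj = {v: {v} for v in vertices}  # self-loop on every vertex
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def is_cluster(adj, C, alpha, beta):
    C = set(C)
    size = len(C)
    dense = all(len(adj[v] & C) >= beta * size for v in C)
    sparse = all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)
    return dense and sparse

def is_maximal_clique(adj, C):
    C = set(C)
    clique = all(C <= adj[v] for v in C)
    extendable = any(C <= adj[u] for u in adj if u not in C)
    return clique and not extendable

# Hypothetical graph: a triangle a-b-c with a path c-d-e attached.
vertices = "abcde"
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = with_self_loops(edges, vertices)
n = len(adj)
alpha = Fraction(n - 1, n)

subsets = [set(S) for k in range(1, n + 1) for S in combinations(vertices, k)]
clusters = [S for S in subsets if is_cluster(adj, S, alpha, 1)]
cliques = [S for S in subsets if is_maximal_clique(adj, S)]
print(clusters == cliques)  # True
```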

7
Result 1
  • Question
  • How about the intersection of 3 (or more)
    (α,β)-clusters of the same size? Of different sizes?
  • How about the intersection of an (α,β)-cluster
    and an (α,β)-cluster of the same size? Of
    different sizes?

8
Result 2: Bounding the number of (α,1)-clusters
  • Proof
  • Two clusters of the same size s can share at most
    αs vertices.
  • Every subset of size (αs + 1) must appear in at
    most one set in C.
  • There are (n choose s) subsets of s elements from n
    elements, each of which contains (s choose αs + 1)
    subsets of size (αs + 1).
  • Therefore, we can have at most [formula not
    preserved in the transcript] clusters in C
  • ⇒ |C| = O((n/s)^(αs+1))

9
This bound is tight
  • When α = 0:
  • No overlapping ⇒ the number of clusters of size s
    is at most n/s.
  • When α → 1 (α = (n - 1)/n):
  • Consider the complement of the following graph.
  • Let s = n = N/2; then the bound is 2^n.
  • In fact, we do have 2^n (α,1)-clusters of size n,
    obtained by choosing from the set
  • B = {b1 b2 … bn : bi is either xi or yi}
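
The second tight case can be reproduced on a small instance. The graph figure is not preserved in the transcript, so this sketch assumes it is the perfect matching x_i - y_i on N = 2n vertices; the helper names are hypothetical, and the bound is evaluated as (N/s)^(αs+1) with the constant hidden by the O(·) ignored.

```python
from fractions import Fraction
from itertools import product

def matching_complement(n):
    """Complement of the assumed matching x_i - y_i, with self-loops."""
    vertices = [(side, i) for side in "xy" for i in range(n)]
    adj = {}
    for side, i in vertices:
        partner = ("y" if side == "x" else "x", i)
        # Adjacent to everything except the matched partner (self included).
        adj[(side, i)] = {u for u in vertices if u != partner}
    return adj

def is_cluster(adj, C, alpha, beta):
    C = set(C)
    size = len(C)
    dense = all(len(adj[v] & C) >= beta * size for v in C)
    sparse = all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)
    return dense and sparse

n = 4
adj = matching_complement(n)
alpha = Fraction(n - 1, n)

# Every way of picking b_i from {x_i, y_i} for i = 0..n-1.
choices = [[(side, i) for i, side in enumerate(sides)]
           for sides in product("xy", repeat=n)]
clusters = [B for B in choices if is_cluster(adj, B, alpha, 1)]

N, s = 2 * n, n
bound = (N // s) ** int(alpha * s + 1)  # (N/s)^(alpha*s + 1) = 2^n here
print(len(clusters), bound)  # 16 16
```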

10
An algorithm for finding clusters with champions
  • Why?
  • In the last example, each vertex has as many
    neighbors outside as within the cluster.
  • There is no vertex that champions the cluster
    (having more friends inside than outside).
  • Why not find one that champions and start with it?
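
The "as many neighbors outside as within" observation can be counted directly. As before, this sketch assumes the missing graph is the complement of a perfect matching x_i - y_i, and it uses only the informal notion of a champion given above (more friends inside than outside); the function names are hypothetical.

```python
def matching_complement(n):
    """Complement of the assumed matching x_i - y_i, with self-loops."""
    vertices = [(side, i) for side in "xy" for i in range(n)]
    adj = {}
    for side, i in vertices:
        partner = ("y" if side == "x" else "x", i)
        adj[(side, i)] = {u for u in vertices if u != partner}
    return adj

def friends_inside_outside(adj, v, C):
    friends = adj[v] - {v}  # ignore the self-loop when counting friends
    return len(friends & C), len(friends - C)

n = 4
adj = matching_complement(n)
cluster = {("x", i) for i in range(n)}  # one of the 2^n clusters

counts = [friends_inside_outside(adj, v, cluster) for v in cluster]
has_champion = any(inside > outside for inside, outside in counts)
print(counts)        # [(3, 3), (3, 3), (3, 3), (3, 3)]
print(has_champion)  # False
```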

11
Algorithm (cont'd)
  • Assumption
  • A big gap between β and α/2: β > ½ + (α + ρ)/2
  • Why?
  • Recall the last example: we have 2^n possible
    clusters of size n ⇒ too many.
  • Any algorithm that outputs more clusters than
    nodes is undesirable.
  • Thus, we need some restriction to reduce the
    number of returned clusters.

12
Algorithm (cont'd)
  • How many clusters with ρ-champions should we
    have?
  • A big gap between β and α/2: β > ½ + (α + ρ)/2
  • How to find them?

13
Algorithm (cont'd)
  • If v and c have sufficiently many neighbors in
    common, then v is part of the cluster C that c
    champions.
  • ⇒ That is what line 5 is for.
  • Running time of the algorithm
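
The algorithm listing that "line 5" refers to is not preserved in the transcript, so the following is only a sketch of the idea, not the paper's listing. It assumes that "sufficiently many" means at least (2β - 1)·s shared neighbors for a target cluster size s, that candidates are the vertices within two hops of c, and that only sets of size exactly s are reported; these choices and every function name are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations

def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for v in adj:
        adj[v].add(v)  # self-loop on every vertex
    return adj

def is_cluster(adj, C, alpha, beta):
    C = set(C)
    size = len(C)
    dense = all(len(adj[v] & C) >= beta * size for v in C)
    sparse = all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)
    return dense and sparse

def clusters_with_champions(adj, alpha, beta, s):
    """Try every vertex c as a champion and grow a candidate cluster."""
    found = []
    threshold = (2 * beta - 1) * s  # assumed meaning of "sufficiently many"
    for c in adj:
        # Vertices within two hops of c (adj[c] contains c via its self-loop).
        candidates = set().union(*(adj[w] for w in adj[c]))
        C = {v for v in candidates if len(adj[v] & adj[c]) >= threshold}
        if len(C) == s and is_cluster(adj, C, alpha, beta) and C not in found:
            found.append(C)
    return found

# Hypothetical test graph: two 4-cliques that share vertex d.
edges = list(combinations("abcd", 2)) + list(combinations("defg", 2))
adj = build_graph(edges)
found = clusters_with_champions(adj, Fraction(1, 4), 1, 4)
print(sorted("".join(sorted(C)) for C in found))  # ['abcd', 'defg']
```

Starting from d yields all seven vertices, which is rejected; d has as many friends outside either cluster as inside, so the clusters are recovered from the other vertices instead.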

14
Experimental Results
  • For real networks
  • Do (α,β)-clusters with ρ-champions exist? ⇒ Use
    Tsukiyama's algorithm.
  • If they do exist, do most (α,β)-clusters have
    ρ-champions?
  • Results
  • Able to find 90% of the maximal cliques in
    graphs where α = ½.
  • No strong ρ-champions in missed clusters.
  • Running time: much faster than Tsukiyama's
    algorithm.
  • Datasets
  • High Energy Physics Theory Co-Author graph (HEP)
  • Theory Co-Author graph (TA)
  • A subset of the LiveJournal graph (LP)

15-18
Results
  • (Four slides of result figures; the plots are not preserved in this transcript.)

19
References
  • [1] Clustering Social Networks, Nina Mishra,
    Robert Schreiber, Isabelle Stanton, and Robert E.
    Tarjan (2007)