Consistent Bipartite Graph Co-Partitioning for High-Order Heterogeneous Co-Clustering

Talk at NTU, 2005.11.11. Tie-Yan Liu, WSM Group, Microsoft Research Asia.

1
Consistent Bipartite Graph Co-Partitioning for
High-Order Heterogeneous Co-Clustering
  • Tie-Yan Liu
  • WSM Group, Microsoft Research Asia
  • 2005.11.11
  • Joint work with Bin Gao, Peking University

2
Outline
  • Motivation
  • What is high-order heterogeneous co-clustering
  • Why previous methods cannot work well on this
    problem
  • Consistent Bipartite Graph Co-partitioning (CGBC)
  • Experimental Evaluation
  • Conclusions and Future Work

3
Clustering
  • Clustering groups data objects into clusters so
    that objects in the same cluster are similar to
    each other.
  • Spectral Clustering
  • Models the similarity of data objects by an
    affinity graph, and assumes that the best
    clustering result corresponds to the minimal
    (ratio, normalized or min-max) graph cut.
  • It can be proven that the minimum of the
    normalized cut can be achieved by minimizing the
    objective function $\frac{q^T L q}{q^T D q}$
    subject to $q^T D e = 0$ and $q \neq 0$, where D
    is the diagonal degree matrix of the affinity
    graph, L is its Laplacian and e is the all-ones
    vector,
  • and the corresponding solution q is the
    eigenvector associated with the second smallest
    eigenvalue of the generalized eigenvalue problem
    $L q = \lambda D q$.
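
As a rough illustration of the spectral bipartition described on this slide, the NumPy sketch below solves the generalized eigenvalue problem L q = λ D q through the equivalent symmetric problem on D^{-1/2} L D^{-1/2}, and then thresholds the second eigenvector at zero. The function name `normalized_cut_bipartition`, the toy affinity matrix and the zero threshold are assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a graph in two using the second generalized eigenvector of (L, D)."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                          # node degrees (assumed strictly positive)
    L = np.diag(d) - W                         # unnormalized graph Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L q = lambda D q  <=>  (D^-1/2 L D^-1/2) v = lambda v  with  q = D^-1/2 v
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    q = d_inv_sqrt * eigvecs[:, 1]             # eigenvector of the second smallest eigenvalue
    return q, (q > 0).astype(int)              # zero threshold is an illustrative choice

# Toy affinity graph (made-up data): two triangles joined by one weak edge.
W = np.array([
    [0, 1, 1, 0,   0, 0],
    [1, 0, 1, 0,   0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1,   0, 1],
    [0, 0, 0, 1,   1, 0],
], dtype=float)
q, labels = normalized_cut_bipartition(W)
print(labels)
```

The two triangles end up with different labels; which triangle receives label 0 depends on the arbitrary sign of the eigenvector.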

4
Co-Clustering
  • Co-clustering groups two types of objects into
    their own clusters simultaneously.
  • Bipartite graph partitioning (Dhillon and Zha)
  • Use a bipartite graph to model the
    inter-relationship between the two types of
    objects; the edges in the bipartite graph are all
    of the same type, so the graph cut is still easy
    to define.
  • It can be proven that the solutions are the
    singular vectors associated with the second
    largest singular value of the normalized
    inter-relationship matrix (equivalently, the
    second smallest eigenvalue of the generalized
    eigenvalue problem).
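
A minimal sketch of bipartite spectral co-clustering follows, assuming a non-negative relation matrix A with no empty rows or columns. It takes the second left and right singular vectors of D1^{-1/2} A D2^{-1/2}, that is, the pair with the second largest singular value, which corresponds to the second smallest eigenvalue of the generalized eigenvalue problem on the bipartite graph. The function name `bipartite_cocluster`, the toy matrix and the zero threshold are hypothetical.

```python
import numpy as np

def bipartite_cocluster(A):
    """Co-cluster rows and columns of a relation matrix A into two groups each."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)                         # row-object degrees (assumed > 0)
    d2 = A.sum(axis=0)                         # column-object degrees (assumed > 0)
    A_n = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]   # D1^-1/2 A D2^-1/2
    U, s, Vt = np.linalg.svd(A_n, full_matrices=False)      # singular values, descending
    # Index 0 is the trivial solution (singular value 1); index 1 is the second largest.
    x = U[:, 1] / np.sqrt(d1)                  # embedding of the row objects
    y = Vt[1, :] / np.sqrt(d2)                 # embedding of the column objects
    return (x > 0).astype(int), (y > 0).astype(int)

# Toy relation matrix (made-up data): 4 row objects, 5 column objects, two blocks
# connected by two weak entries of 0.1.
A = np.array([
    [2, 1, 1,   0,   0],
    [1, 2, 1,   0.1, 0],
    [0, 0, 0.1, 2,   1],
    [0, 0, 0,   1,   2],
], dtype=float)
row_labels, col_labels = bipartite_cocluster(A)
print(row_labels, col_labels)
```

Rows and columns of the same block receive the same label because the left and right singular vectors of one singular value share a consistent sign.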

5
High-order Heterogeneous Co-Clustering (HHCC)
  • HHCC groups multiple (more than two) types of
    objects into clusters simultaneously.
  • Order is defined as the number of types of
    objects.
  • If we use graphs to represent the
    inter-relationships between data objects, the
    edges within each bipartite graph are of the same
    type, but edges in different bipartite graphs are
    of different types. This is what "heterogeneous"
    refers to, in contrast to spectral clustering and
    bipartite graph co-clustering.

6
HHCC is not a Rare Problem
  • Typical examples
  • Surrounding Text, Web Image, Visual Features
  • User, Query, Click-through
  • Many other examples
  • Category, Document, Term
  • Reader, Newspaper, Article
  • Passenger, Airplane, Airways
  • Webpage, Website, Site-group
  • Article, Magazine, Category
  • Hardware, Computer, Usage
  • Software, People, Community

7
Why Is HHCC a New Problem?
  • Although bipartite graph partitioning is just a
    trivial extension of spectral clustering, the
    extension to HHCC is non-trivial.
  • Since there are different types of edges in the
    HHCC problem, the cut of high-order data is
    difficult to define. It may not be reasonable to
    assign weights to heterogeneous edges in order to
    make their contributions to the graph cut
    comparable.
  • Simply applying spectral clustering may cause the
    high-order problem to degrade into a 2-order
    problem.

8
An Example of Weighting Heterogeneous Edges
  • No matter how we adjust the weight a to balance
    the different types of edges, we cannot cluster
    X into two groups successfully.
  • [Figure: embeddings produced by spectral
    clustering for a = 0.01, a = 1 and a = 100]

9
An Example of Weighting Heterogeneous Edges (Cont.)
  • Mathematical Proof.

Including X and Z
10
Order Degradation
2-Order Heterogeneous graph
11
Our Solution
  • We tackle the aforementioned problems by
    proposing a new solution to HHCC: Consistent
    Bipartite Graph Co-Partitioning (CGBC).
  • Where should we get started?
  • Star-structured HHCC
  • The concept of consistency
  • An SDP-based solution

12
Why Star-Structured?
  • Star structure means that in the heterogeneous
    graph there is a central type of objects which
    connects to all the other types of objects, and
    there are no direct connections between any of
    the other object types.
  • The star structure is the simplest but a very
    common case of HHCC.

13
Why Star-Structured?
  • The star structure is the simplest but a very
    common case of HHCC.
  • Surrounding text
  • Web Images
  • Visual features
  • Author
  • Conference
  • Paper
  • Key Word
  • Customer
  • Shareholder
  • Shop
  • Supplier
  • Advertisement Media

14
The Concept of Consistency
  • Divide the star-structured HHCC problem into a
    set of bipartite sub-problems, where each
    sub-problem has only homogeneous edges.
  • Solve each sub-problem separately, to avoid
    order degradation.
  • Add a global constraint on the central type of
    objects, so as to get a feasible cut for the
    original problem.

15
The Concept of Consistency
  • Divide this tripartite graph into two bipartite
    graphs.
  • Partition these two graphs simultaneously and
    consistently.

16
Formulating the Optimization Problem
  • Minimize the cuts of the two bipartite graphs,
    with the constraints that their partitioning
    results on the central type of objects are the
    same.
  • Objective Function

The definitions of q and p express the
consistency between these two graphs: the y in
the two embeddings is the same, so we actually
force the partitioning on the central type of
objects to be the same.
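
One way the objective might be written, consistent with the description on this slide, is sketched below. The symbols x and z for the embeddings of the two outer object types, y for the central type, and e for the all-ones vector are notational assumptions; D^{(1)}, L^{(1)}, D^{(2)} and L^{(2)} denote the degree and Laplacian matrices of the two bipartite graphs. The exact form used in the talk was not preserved in the transcript.

```latex
\begin{aligned}
\min_{q \neq 0}\; & \frac{q^{T} L^{(1)} q}{q^{T} D^{(1)} q},
\qquad
\min_{p \neq 0}\; \frac{p^{T} L^{(2)} p}{p^{T} D^{(2)} p} \\
\text{s.t.}\;\; & q^{T} D^{(1)} e = 0, \quad p^{T} D^{(2)} e = 0, \\
& q = \begin{pmatrix} x \\ y \end{pmatrix}, \quad
  p = \begin{pmatrix} y \\ z \end{pmatrix}.
\end{aligned}
```

Because the same y appears in both q and p, any feasible solution assigns each central object a single embedding value, which is the consistency requirement.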
17
How to Solve the Optimization Problem (1): Convert it to a QCQP Problem
  • Simplify the original problem to
    single-objective programming.
  • Since the normalized Rayleigh quotient is
    already a scalar measure of the graph structure,
    a linear combination of the two Rayleigh
    quotients is reasonable, and its weight
    indicates which graph we should trust more.
  • Linear combination is only one approach to
    multi-objective programming; other methods that
    do not need this weighting parameter can also be
    used.
  • With some auxiliary notation, the result is a
    sum-of-ratios quadratic fractional programming
    problem, which is then converted to
    quadratically constrained quadratic programming
    (QCQP).
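
The linear combination of the two Rayleigh quotients can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names are hypothetical, the toy matrices are made up, and the convention that beta weights the first graph and 1 - beta the second is assumed rather than taken from the talk.

```python
import numpy as np

def bipartite_laplacian(R):
    """Return (D, L) for the bipartite graph with relation matrix R."""
    R = np.asarray(R, dtype=float)
    m, n = R.shape
    W = np.block([[np.zeros((m, m)), R], [R.T, np.zeros((n, n))]])
    D = np.diag(W.sum(axis=1))
    return D, D - W

def rayleigh_quotient(L, D, v):
    """Normalized Rayleigh quotient (v^T L v) / (v^T D v)."""
    return float(v @ L @ v) / float(v @ D @ v)

def combined_objective(A, B, x, y, z, beta):
    """beta * quotient of graph 1 + (1 - beta) * quotient of graph 2 (assumed convention)."""
    D1, L1 = bipartite_laplacian(A)
    D2, L2 = bipartite_laplacian(B)
    q = np.concatenate([x, y])     # embedding for the first graph
    p = np.concatenate([y, z])     # shares y with q: the consistency constraint
    return beta * rayleigh_quotient(L1, D1, q) + (1 - beta) * rayleigh_quotient(L2, D2, p)

# Toy data (made up): graph 1 is perfectly separable, graph 2 is fully connected.
A = np.eye(2)
B = np.ones((2, 2))
x = np.array([1.0, -1.0])
y = np.array([1.0, -1.0])
z = np.array([1.0, -1.0])
value = combined_objective(A, B, x, y, z, beta=0.25)
print(value)
```

With this toy data the first quotient is 0 and the second is 1, so the combined value equals 1 - beta; moving beta towards 1 expresses more trust in the first graph.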
18
How to Solve the Optimization Problem (2): Convert QCQP to SDP
  • Semi-definite Programming (SDP)
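
The conversion rests on a standard identity: a quadratic form in a vector is linear in the rank-one matrix built from that vector. The sketch below uses a generic vector v, symmetric matrices M_0 and M_i, and scalars c_i as placeholder symbols; they are assumptions for illustration and not the notation of the talk.

```latex
\begin{aligned}
& v^{T} M v = \operatorname{tr}\!\left(M\, v v^{T}\right) = M \bullet W,
  \qquad W = v v^{T}, \\[4pt]
\text{QCQP:}\quad & \min_{v}\; v^{T} M_{0} v
  \quad \text{s.t.}\quad v^{T} M_{i} v \le c_{i}, \\[4pt]
\text{SDP relaxation:}\quad & \min_{W}\; M_{0} \bullet W
  \quad \text{s.t.}\quad M_{i} \bullet W \le c_{i}, \quad W \succeq 0 .
\end{aligned}
```

The relaxation drops the requirement that W has rank one and keeps only positive semi-definiteness, which makes the problem convex.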
19
The Final Algorithm (CGBC)
  • Set the parameters β, ?1 and ?2.
  • Given the inter-relation matrices A and B, form
    the corresponding diagonal matrices and Laplacian
    matrices D(1), D(2), L(1) and L(2).
  • Extend D(1), D(2), L(1) and L(2) to ?1, ?2, ?1
    and ?2, and form ?, such that the coefficient
    matrices in the SDP problem can be computed.
  • Solve the above SDP problem by a certain
    iterative algorithm such as SDPA.
  • Extract ? from W and regard it as the embedding
    vector of the heterogeneous objects.
  • Run the k-means algorithm on ? to obtain the
    desired partitioning of the heterogeneous objects.
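
The matrix-forming step and the final k-means step can be sketched as follows. The SDP step is not reproduced here because it needs an external solver, so the embedding vector passed to k-means is a made-up placeholder rather than a real SDP solution. The function names and the toy relation matrices are hypothetical as well.

```python
import numpy as np

def bipartite_degree_and_laplacian(R):
    """Return the diagonal degree matrix D and Laplacian L = D - W for relation matrix R."""
    R = np.asarray(R, dtype=float)
    m, n = R.shape
    W = np.block([[np.zeros((m, m)), R], [R.T, np.zeros((n, n))]])
    D = np.diag(W.sum(axis=1))
    return D, D - W

def kmeans_1d(values, k=2, iters=50):
    """Plain Lloyd iterations on scalar embedding values, with a deterministic start."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0, 1, k))    # spread initial centers
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):                        # keep old center if cluster is empty
                new_centers[j] = values[labels == j].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Made-up relation matrices: A links 2 objects to 3 central objects, B links those 3 to 2 others.
A = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
B = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D1, L1 = bipartite_degree_and_laplacian(A)
D2, L2 = bipartite_degree_and_laplacian(B)

# Placeholder embedding of all 7 objects (standing in for the vector extracted from W).
embedding = np.array([-0.9, -0.8, -1.0, 0.7, 0.9, 0.8, 1.0])
labels = kmeans_1d(embedding, k=2)
print(labels)
```

Each Laplacian built this way has rows that sum to zero, which is a quick sanity check before the matrices are passed on to a solver.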

20
CGBC's Extension to the k-Star-Structured HHCC
21
Experiment on Toy Problem
  • [Figure: relation matrices A and B, and the
    embedding values of the heterogeneous objects
    for β = 0, 0.2, 0.4, 0.6, 0.8 and 1.0]
  • At one extreme of β the cut is based entirely on
    the first graph: Y(812).
  • At the other extreme the cut is based entirely
    on the second graph: Y(128).
  • In between, a more reasonable cut is obtained,
    based on information from both the first and the
    second graph.

22
Experiment on Web Image Clustering
23
Embedding of the Clustering
Hill vs Owl
Flying vs Map
24
Average Performance
Performance Comparison
25
Conclusions
  • We propose a new problem named high-order
    heterogeneous co-clustering (HHCC).
  • We propose a consistent bipartite graph
    co-partitioning algorithm to solve the HHCC
    problem with star-structured inter-relationship.
  • Various experiments demonstrate the effectiveness
    of our proposed algorithm.

26
References
  • Bin Gao, Tie-Yan Liu, et al., Consistent Bipartite
    Graph Co-Partitioning for Star-Structured
    High-Order Heterogeneous Data Co-Clustering, in
    Proceedings of the Eleventh ACM SIGKDD
    International Conference on Knowledge Discovery
    and Data Mining (KDD 2005), pp. 41-50.
  • Bin Gao, Tie-Yan Liu, Tao Qin, Qian-Sheng Cheng,
    Wei-Ying Ma, Web Image Clustering by Consistent
    Utilization of Low-level Features and Surrounding
    Texts, in Proceedings of ACM Multimedia 2005.

27
Thanks!
Contact: tyliu@microsoft.com, http://research.microsoft.com/users/tyliu/