Clustering Graphs by Weighted Substructure Mining

1
Clustering Graphs by Weighted Substructure Mining

Max Planck Institute for Biological Cybernetics
Koji Tsuda

Joint work with Taku Kudo (Google Japan)
2
Unsupervised Clustering of Labeled Undirected Graphs
3
Graph Structures in Biology

DNA Sequence
RNA
Texts in literature

Compounds

[Figure: a chemical compound drawn as a graph with atoms (H, C, O) as labeled vertices]
Example text: Amitriptyline inhibits adenosine uptake
4
Substructure Representation

0/1 vector of pattern indicators
Huge dimensionality!
Need Graph Mining for selecting features
Better than paths (Marginalized graph kernels)

[Figure: a graph mapped to a 0/1 vector indexed by patterns]
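A minimal sketch of the 0/1 pattern-indicator vector follows. It assumes a toy setting that is not from the slides: each graph is a pair (vertex labels, edge list), and the only patterns considered are single labeled edges. The helper names `edge_patterns` and `indicator_matrix` are hypothetical. Real substructure patterns are arbitrary connected subgraphs, which is why the dimensionality becomes huge.

```python
def edge_patterns(graph):
    """Return the set of single-edge patterns (sorted label pairs) found in a graph."""
    labels, edges = graph
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}


def indicator_matrix(graphs):
    """Build the pattern vocabulary and one 0/1 indicator row per graph."""
    per_graph = [edge_patterns(g) for g in graphs]
    patterns = sorted(set().union(*per_graph))
    X = [[1 if p in found else 0 for p in patterns] for found in per_graph]
    return patterns, X


# Two toy labeled undirected graphs: (vertex labels, edges).
g1 = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
g2 = ({0: "C", 1: "H"}, [(0, 1)])
patterns, X = indicator_matrix([g1, g2])
```

Each row of `X` is the representation of one graph; each column corresponds to one entry of `patterns`.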
5
Overview

Clustering algorithm based on substructure representation
Key: selecting informative substructures
EM-based graph clustering
Fitting a binomial mixture model
Combination of L1 regularization and weighted substructure mining

6
Quick Review of Graph Mining
7
Graph Mining

Analysis of graph databases
Find all patterns satisfying predetermined conditions
Frequent substructure mining
Combinatorial, exhaustive
Recently developed: AGM (Inokuchi et al., 2000), gSpan (Yan et al., 2002), Gaston (2004)

8
Graph Mining

Frequent Substructure Mining
Enumerate all patterns that occur in at least m graphs
Indicator of pattern k in graph i: 1 if the pattern occurs in the graph, 0 otherwise

Support(k): number of graphs in which pattern k occurs
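As a small illustration of these two definitions, the sketch below computes supports as column sums of an indicator matrix and keeps the patterns that reach the threshold m. The matrix `X` and the function names are made up for the example; an actual miner never materializes the full matrix.

```python
def supports(X):
    """Support of each pattern: the column sums of a 0/1 indicator matrix."""
    return [sum(column) for column in zip(*X)]


def frequent(X, m):
    """Indices of the patterns that occur in at least m graphs."""
    return [k for k, s in enumerate(supports(X)) if s >= m]


# Rows are graphs, columns are patterns (toy data).
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
support_per_pattern = supports(X)
frequent_patterns = frequent(X, m=2)
```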
9
gSpan (Yan and Han, 2002)

Efficient frequent substructure mining method
DFS code
Efficient detection of isomorphic patterns
We extend gSpan for our work

10
Enumeration on Tree-shaped Search Space

Each node has a pattern
Generate nodes from the root
Add an edge at each step

11
Tree Pruning
Support(g): number of occurrences of pattern g

Anti-monotonicity
If support(g) < m, stop exploring!

[Figure: search tree; nodes below a pruned pattern are not generated]
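The pruning rule can be sketched on a simplified stand-in. The code below is not gSpan: it replaces graphs by plain sets of items and patterns by subsets, so there is no DFS code or isomorphism check. It only shows the tree-shaped enumeration (one item added per step) and the anti-monotone cut. The function name `mine_frequent` and the toy data are assumptions for illustration.

```python
def mine_frequent(transactions, min_support):
    """Depth-first enumeration of itemsets with anti-monotone support pruning."""
    items = sorted(set().union(*transactions))
    found = {}

    def support(pattern):
        # Number of transactions that contain every item of the pattern.
        return sum(1 for t in transactions if pattern <= t)

    def extend(pattern, start):
        # Children are generated by adding one item at a time.
        for idx in range(start, len(items)):
            child = pattern | {items[idx]}
            s = support(child)
            if s < min_support:
                continue  # prune: every extension of child has support <= s
            found[frozenset(child)] = s
            extend(child, idx + 1)

    extend(frozenset(), 0)
    return found


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
result = mine_frequent(transactions, min_support=2)
```

Because support can only shrink when a pattern grows, the subtree below an infrequent pattern is never generated.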
12
Discriminative patterns: Weighted Substructure Mining

w_i > 0: positive class
w_i < 0: negative class
Weighted Substructure Mining
Patterns with large frequency difference
Not anti-monotonic: use a bound
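A hedged sketch of mining with a bound, again on an itemset stand-in rather than real graphs. It assumes the score of a pattern is the absolute weighted support, |sum of w_i over the graphs containing the pattern|, and a threshold named `tau`; both are assumptions, not recovered from the slide. Since a larger pattern occurs in a subset of the graphs, its score can be at most the larger of the positive and the negative weight mass of the current pattern, which gives a safe pruning bound.

```python
def mine_weighted(transactions, weights, tau):
    """Enumerate itemsets whose absolute weighted support reaches tau."""
    items = sorted(set().union(*transactions))
    found = {}

    def gain_and_bound(pattern):
        pos = sum(w for t, w in zip(transactions, weights) if pattern <= t and w > 0)
        neg = sum(-w for t, w in zip(transactions, weights) if pattern <= t and w < 0)
        # gain: score of this pattern; bound: best score any extension can reach.
        return abs(pos - neg), max(pos, neg)

    def extend(pattern, start):
        for idx in range(start, len(items)):
            child = pattern | {items[idx]}
            gain, bound = gain_and_bound(child)
            if bound < tau:
                continue  # no extension of child can reach tau
            if gain >= tau:
                found[frozenset(child)] = gain
            extend(child, idx + 1)

    extend(frozenset(), 0)
    return found


transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"c"}]
weights = [1, 1, -1, -1]  # positive class first, negative class last
result = mine_weighted(transactions, weights, tau=2)
```

In this toy data the pattern {"a"} scores below the threshold while its extension {"a", "b"} reaches it, which is why the score itself cannot be used for pruning and a bound is needed.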

13
Multiclass version

Multiple weight vectors, one per class
Each weight takes one value if the graph belongs to the class and another otherwise
Search for patterns overrepresented in a class

14
EM-based clustering of graphs
15
EM-based graph clustering

Motivation
Learning a mixture model in the feature space of patterns
Basis for more complex probabilistic inference
L1 regularization and graph mining
E-step -> Mining -> M-step

16
Probabilistic Model

Binomial Mixture
Each Component

Mixing weight for cluster
Parameter vector for cluster
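For reference, a mixture over independent 0/1 pattern indicators is commonly written as below. The symbols (x_k for the indicator of pattern k, alpha_l for the mixing weight of cluster l, p_lk for the occurrence probability of pattern k in cluster l, c clusters, d patterns) are assumed notation, not recovered from the slide, and the slide's parameter vector may use a different parameterization such as log-odds.

```latex
p(\mathbf{x}) = \sum_{\ell=1}^{c} \alpha_\ell \, p_\ell(\mathbf{x}), \qquad
p_\ell(\mathbf{x}) = \prod_{k=1}^{d} p_{\ell k}^{\,x_k} \, (1 - p_{\ell k})^{1 - x_k}, \qquad
\sum_{\ell=1}^{c} \alpha_\ell = 1
```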
17
Function to minimize

L1-regularized log likelihood
Baseline constant
ML parameter estimate using a single binomial distribution
In the solution, most parameters are exactly equal to the baseline constants

18
E-step

Active pattern
E-step computed only with active patterns (computable!)
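Why the E-step only needs active patterns can be sketched as follows, assuming a mixture of independent 0/1 indicators parameterized by occurrence probabilities (the names `alpha`, `P`, and `active` are illustrative). If a pattern has the same probability in every cluster, its factor is identical across clusters and cancels when the posterior is normalized, so it can be skipped.

```python
import math


def e_step(X, alpha, P, active):
    """Responsibilities r[i][l], computed from the columns listed in `active` only."""
    R = []
    for x in X:
        logs = []
        for a, p in zip(alpha, P):
            ll = math.log(a)
            for k in active:
                ll += math.log(p[k]) if x[k] else math.log(1.0 - p[k])
            logs.append(ll)
        top = max(logs)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        R.append([u / total for u in unnorm])
    return R


X = [[1, 0, 1], [0, 1, 1]]
alpha = [0.5, 0.5]
# Pattern 2 has the same probability in both clusters, so it is not active.
P = [[0.9, 0.1, 0.6], [0.1, 0.9, 0.6]]
r_active = e_step(X, alpha, P, active=[0, 1])
r_all = e_step(X, alpha, P, active=[0, 1, 2])
```

Both calls give the same responsibilities up to rounding, even though the first one ignores the inactive pattern.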

19
M-step

Putative cluster assignment by E-step
Each parameter is solved separately
Use graph mining to find active patterns
Then, solve it only for active patterns

20
Solution

Occurrence probability in a cluster
Overall occurrence probability
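A sketch of what such a per-parameter solution can look like, derived under assumptions that are not confirmed by the slide: the parameter is the log-odds of an occurrence probability, and the M-step minimizes the responsibility-weighted negative log likelihood plus `lam` times the absolute deviation from the baseline. Under those assumptions the optimum has a closed form in probability space: the cluster-specific occurrence probability is shrunk toward the overall probability `p0`, and stays exactly at `p0` when the difference is small. The function name and arguments are hypothetical.

```python
def m_step_parameter(r, x, p0, lam):
    """Solve one (cluster, pattern) parameter; return (probability, is_active).

    r   : responsibilities of this cluster for each graph
    x   : 0/1 indicators of this pattern for each graph
    p0  : overall occurrence probability of the pattern (baseline)
    lam : L1 regularization strength (assumed notation)
    """
    R = sum(r)
    S = sum(ri * xi for ri, xi in zip(r, x))
    grad = S - R * p0  # weighted occurrences in excess of the baseline
    if abs(grad) <= lam:
        return p0, False  # parameter stays at the baseline: inactive
    shift = lam if grad > 0 else -lam
    return (S - shift) / R, True  # shrunk estimate: active


r = [1.0, 1.0, 0.0, 0.0]
x = [1, 1, 0, 0]
weak = m_step_parameter(r, x, p0=0.5, lam=0.5)
strong = m_step_parameter(r, x, p0=0.5, lam=1.5)
```

With the weaker penalty the pattern is active and its probability moves from 0.5 toward the in-cluster frequency; with the stronger penalty it stays at the baseline.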

21
Important Observation
For active pattern k, the occurrence probability in a graph cluster is significantly different from the average
22
Mining for Active Patterns F

F is rewritten in the following form
Active patterns can be found by graph mining (multiclass version)!
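One way to see the connection, sketched under assumptions rather than taken from the slide: suppose each parameter is a log-odds and is penalized by lambda times its absolute deviation from the baseline. Then the subgradient condition at the baseline decides whether pattern k is active in cluster l, and expanding the overall occurrence probability turns that condition into a weighted support. The symbols r_il (responsibility), x_ik (indicator), p_0k (overall occurrence probability), and lambda are assumed notation.

```latex
% Pattern k is active in cluster \ell when the baseline is not optimal:
\left| \sum_{i=1}^{n} r_{i\ell} \, (x_{ik} - p_{0k}) \right| > \lambda,
\qquad p_{0k} = \frac{1}{n} \sum_{j=1}^{n} x_{jk}

% Expanding p_{0k} gives a weighted support with one weight vector per cluster:
\sum_{i=1}^{n} r_{i\ell} \, (x_{ik} - p_{0k})
  = \sum_{i=1}^{n} w_{\ell i} \, x_{ik},
\qquad w_{\ell i} = r_{i\ell} - \frac{1}{n} \sum_{j=1}^{n} r_{j\ell}
```

In this form the weights are positive for graphs that lean toward the cluster and negative for the rest, which matches the weighted mining setting with one weight vector per class.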

23
Experiments: RNA graphs

Stem as a node
Secondary structure by RNAfold
0/1 Vertex label (self loop or not)

24
Clustering RNA graphs

Three Rfam families
Intron GP I (Int, 30 graphs)
SSU rRNA 5 (SSU, 50 graphs)
RNase bact a (RNase, 50 graphs)
Three bipartition problems
Results evaluated by ROC scores (area under the ROC curve)
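For readers unfamiliar with the measure, the area under the ROC curve equals the fraction of (positive, negative) pairs that a score ranks correctly, counting ties as one half. The sketch below computes it that way on made-up scores; how the slides derive a score from a cluster assignment is not stated, so the inputs here are purely illustrative.

```python
def roc_score(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
auc = roc_score(scores, labels)
```

A score of 1.0 means the two families are perfectly separated, and 0.5 corresponds to a random ordering.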

25
Examples of RNA Graphs
26
ROC Scores
27
Number of Patterns and Time
28
Found Patterns
29
Conclusion

Probabilistic clustering based on substructure representation
Inference helped by graph mining
Many possible extensions
Naïve Bayes
Graph PCA, LFD, CCA
Semi-supervised learning
Applications in Biology?
