Stochastic Block Models of Mixed Membership - PowerPoint PPT Presentation

Slides: 24
Provided by: edoardo
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Stochastic Block Models of Mixed Membership
  • Edo Airoldi (1,2), Dave Blei (2), Steve Fienberg (1),
    Eric Xing (1)
  • (1) Carnegie Mellon University, (2) Princeton
    University

SAMSI, High Dimensional Inference and Random
Matrices, September 17th, 2006
2
The Scientific Problem
  • Protein-protein interactions in Yeast
  • Different studies test protein interactions with
    different technologies (precision)

3
The Data: Interaction Graphs
  • M proteins in a graph (nodes)
  • M^2 observations on pairs of proteins
  • Edges are random quantities, Y_{n,m}
  • Interactions are not independent
  • Interacting proteins form a protein complex
  • T graphs on the same set of proteins
  • Partial annotations for each protein, X_n

M = 871 nodes; M^2 ≈ 750K entries
4
The Scientific Problems
  • What are stable protein complexes?
  • They perform many cellular processes
  • A protein may be a member of several complexes
  • How many are there?
  • How do stable protein complexes interact?
  • Test hypotheses (inform new analyses)
  • Learn complex-to-complex interaction patterns

5
More Network Data
Disease Spread
Electronic Circuit
Food Web
Internet
Social Network
6
An Abstraction of the Data
  • A collection of unipartite graphs
    G_{1:T} = (Y_{1:T}, N)
  • Integer, real, multivariate edge weights
    Y_t = { Y_{t,nm} : n, m ∈ N }
  • Node-specific (multivariate) attributes
    X_t = { X_{t,n} : n ∈ N }
  • Partially observable Y_{1:T} and X_{1:T}

7
The Challenge
  • Given the data abstraction and the goals of the
    analysis
  • Can we posit a rich class of models that is
    instrumental for thinking about the scientific
    problems we face? Amenable to theoretical
    analyses?

8
Modeling Ideas
  • Hierarchical Bayes
  • Latent variables encode semantic elements
  • Assume structure on observable-latent elements
  • Combination of two classes of models

1. Models of mixed membership
2. Network models (block models)

→ Stochastic block models of mixed membership
9
Graphical Model Representation
[Figure: graphical model, with its two components labeled Stochastic Blocks and Mixed Membership]
10
A Hierarchical Likelihood
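A hierarchical likelihood of this kind is commonly written as the hierarchy below. This is a reference sketch in assumed notation, not necessarily the form shown on the slide: α is a Dirichlet hyperparameter, π_n the mixed-membership vector of node n, z the pair-specific group indicators, and B the K × K matrix of block-to-block interaction probabilities.

```latex
\begin{align*}
\pi_n &\sim \mathrm{Dirichlet}(\alpha), & n &= 1, \dots, M \\
z_{n \to m} &\sim \mathrm{Multinomial}(1, \pi_n), &
z_{n \leftarrow m} &\sim \mathrm{Multinomial}(1, \pi_m) \\
Y_{n,m} &\sim \mathrm{Bernoulli}\!\left( z_{n \to m}^{\top} \, B \, z_{n \leftarrow m} \right), &
n, m &= 1, \dots, M
\end{align*}
```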
11
More Modeling Issues
  • Technical: Sparsity
  • Introduce a parameter that modulates the relative
    importance of ones and zeros (binary edges) in
    the cost function that drives the clustering
  • Biological: Ribosomes and Distress
  • Some protein complexes act like hubs because they
    are involved, e.g., in protein production or cell
    recovery (Y2H technology is invasive)
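One way such a sparsity parameter could enter the model is sketched below. It assumes, beyond what the slide states, that a parameter rho in [0, 1] scales the interaction probability by (1 - rho), so that some observed zeros are attributed to sparsity rather than to a lack of affinity between groups. The function name, rho, pi_n, pi_m and B are all illustrative.

```python
def edge_probability(pi_n, pi_m, B, rho=0.0):
    """P(Y_nm = 1) with the pair-level group indicators summed out.

    pi_n, pi_m: membership vectors (each sums to 1), assumed notation.
    B: K x K nested list of block-to-block interaction probabilities.
    rho: hypothetical sparsity parameter; rho = 0 recovers pi_n' B pi_m.
    """
    k = len(pi_n)
    base = sum(pi_n[g] * B[g][h] * pi_m[h] for g in range(k) for h in range(k))
    return (1.0 - rho) * base


B = [[0.9, 0.1],
     [0.2, 0.8]]
# A node fully in group 0 paired with a node fully in group 1.
print(edge_probability([1.0, 0.0], [0.0, 1.0], B, rho=0.5))
```

Raising rho lowers every edge probability by the same factor, which reduces the weight that absent edges carry when memberships are estimated.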

12
Large Scale Computation
  • Masses of data
  • 750K observations in a small problem (M = 871)
  • 2.5M observations with M = 1578
  • 3M expressions for 6K genes/proteins in Yeast
  • Variational inference (Jordan et al., 2001)
  • Naïve implementation does not work
  • We develop a novel nested variational algorithm
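The observation counts quoted above follow from M^2 ordered pairs of proteins. The short sketch below reproduces that arithmetic and adds a rough count of the pair-level quantities a naïve implementation would keep in memory. The assumption of two length-K vectors per ordered pair, and the value K = 10, are illustrative and not taken from the slides.

```python
def n_pair_observations(m):
    """Number of ordered pairs (entries of the M x M interaction matrix)."""
    return m * m


def naive_pair_parameters(m, k):
    """Pair-level values stored if every ordered pair keeps two K-vectors.

    Assumption for illustration: one sender-side and one receiver-side
    vector of length k per ordered pair.
    """
    return 2 * k * n_pair_observations(m)


for m in (871, 1578):
    print(m, n_pair_observations(m), naive_pair_parameters(m, k=10))
```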

13
Example: A Scientific Question
  • Do PPI contain information about functions?

[Figure: raw data, model, approximate posterior on membership vectors, and functional annotations; labeled protein: YLD014W]
14
Interactions in Yeast (MIPS)
  • Do PPI contain information about functions?

[Figure: interaction data; labeled protein: YLD014W]
15
Results: Identifiability
  • In this example we map latent groups to known
    functional categories

[Figure panels: Known Annotations; Unknown Annotations]
16
Results: Functional Annotations
17
Results: Mixed Membership
  • The estimated membership vectors support the
    mixed membership assumption

18
Results: Stochastic Block Model
19
General Bayesian Formulation
  • Assumptions for unipartite graphs
  • Population: existence of K sub-populations
  • Latent variable: mixed-membership vectors π_n ~ D_α
  • Subject: exchangeable edges given block memberships,
    Y_{n,m} ~ f( · | π_n' B π_m )
  • Sampling scheme: the graphs are IID
  • Additional data, e.g., attributes, annotations
  • Integrated model formulation
    (descriptive/predictive)

20
Variational Algorithms
  • Naïve algorithm
  • init (γ_i ∀i, φ_ij ∀ij)
  • while (log-lik increases): update (φ_ij ∀ij);
    update (γ_i ∀i)
  • Nested algorithm
  • init (γ_i ∀i)
  • while (log-lik increases): loop over ij
  • init φ_ij
  • while (log-lik increases): update φ_ij
  • partially update (γ_i, γ_j)
We trade space for time but
21
Variational Algorithms for MMSB
[Figure: comparison of the nested and naïve algorithms]
  • On a single machine, the nested algorithm
    empirically converged faster (offsetting its extra
    computation) and followed more stable paths to
    convergence.

22
Take Home Points
  • Bayesian formulation is integral to the biology
  • A novel class of models that combines mixed
    membership (MM) for soft clustering with network
    models for dependent data
  • Latent aspects: patterns that correlate with, and
    help predict, functional processes in the cell
  • Current implementation allows for fast inference
    on large matrices through variational
    approximation; there is considerable opportunity
    to improve both the computation and the efficiency
    of the approximation

23
  • Data and Problems: Gavin et al. (2002) Nature; Ho
    et al. (2002) Nature; Mewes et al. (2004) Nucleic
    Acids Research; Krogan et al. (2006) Nature.
  • Mixed Membership Models
  • Pritchard et al. (2000); Erosheva (2002);
    Rosenberg et al. (2002); Blei et al. (2003); Xing
    et al. (2003ab); Erosheva et al. (2004); Airoldi
    et al. (2005); Blei & Lafferty (2006); Xing et
    al. (2006)
  • Stochastic network models
  • Wasserman et al. (1980, 1994, 1996); Fienberg et
    al. (1985); Frank & Strauss (1986); Nowicki &
    Snijders (2001); Hoff et al. (2002); Airoldi et
    al. (2006)
  • More material on the Web at
    http://www.cs.cmu.edu/eairoldi/
  • ICML Workshop on Statistical Network Analysis:
    Models, Issues and New Directions, on June 29 at
    Carnegie Mellon, Pittsburgh, PA
    http://nlg.cs.cmu.edu/