Title: Nonparametric Latent Feature Models for Link Prediction
1Nonparametric Latent Feature Models for Link Prediction
- Kurt T. Miller, Thomas L. Griffiths, Michael I. Jordan
- NIPS 2009
- Presented by Minhua Chen, 06.04.2010.
2Problem Formulation
- Link prediction in a social network (binary matrix completion)
Y =
1 0 ? ?
0 1 ? 0
1 ? 1 ?
0 1 0 1
Yij = 1: person i is linked to person j.
Yij = 0: person i is not linked to person j.
Yij = ?: unobserved entry to be filled in.
- Linkage can stand for different relations, e.g., friends or not, colleagues or not.
- If the network is a directed graph, then Y can be asymmetric.
- Observed entries + auxiliary information (optional) → unobserved entries
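The matrix-completion view can be made concrete with a small sketch. It stores the 4 x 4 example matrix from this slide and marks each "?" with `np.nan`; the variable names (`observed`, `missing_pairs`) and the NaN encoding are illustrative assumptions, not something the slides specify.

```python
import numpy as np

# The 4 x 4 example matrix from the slide; np.nan stands in for an unobserved "?" entry.
Y = np.array([
    [1, 0, np.nan, np.nan],
    [0, 1, np.nan, 0],
    [1, np.nan, 1, np.nan],
    [0, 1, 0, 1],
])

# Boolean mask: True where a link or non-link was actually observed.
observed = ~np.isnan(Y)

# The entries a link-prediction model has to fill in, as 0-based (i, j) pairs.
missing_pairs = [(int(i), int(j)) for i, j in np.argwhere(~observed)]
n_missing = len(missing_pairs)

print(n_missing)
print(missing_pairs)
```

A model is then fitted on the entries where `observed` is True and scored on how well it predicts the remaining pairs.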
3Methods
- Class-based model
  - Entities are clustered into classes.
  - Linkage is determined by which classes they belong to.
  - Models: Infinite Relational Model (IRM), Mixed Membership Stochastic Blockmodel (MMSB)
  - Disadvantage: the clustering description is too coarse and not expressive.
- Latent-feature model
  - Interactions between latent features determine the linkage.
  - This paper extends it to a nonparametric model using the Indian Buffet Process (IBP).
  - The number of latent features can be inferred, as well as their interactions.
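To illustrate how an IBP prior leaves the number of latent features open, here is a minimal sketch of the standard IBP generative process (the "restaurant" construction). The function name `sample_ibp` and the parameter values are assumptions made for illustration; the sketch only draws from the prior and is not the inference procedure used in the paper.

```python
import numpy as np

def sample_ibp(n_people, alpha, rng):
    """Draw a binary feature matrix Z from the Indian Buffet Process prior."""
    counts = []  # counts[k] = number of people so far who hold feature k
    rows = []    # rows[i] = list of feature indices held by person i
    for i in range(1, n_people + 1):
        # Person i takes each existing feature k with probability counts[k] / i.
        taken = [k for k, m in enumerate(counts) if rng.random() < m / i]
        # Person i then adds Poisson(alpha / i) brand-new features.
        n_new = int(rng.poisson(alpha / i))
        new = list(range(len(counts), len(counts) + n_new))
        for k in taken:
            counts[k] += 1
        counts.extend([1] * n_new)
        rows.append(taken + new)
    # Assemble the N x K binary matrix; K is whatever the process produced.
    Z = np.zeros((n_people, len(counts)), dtype=int)
    for i, feats in enumerate(rows):
        for k in feats:
            Z[i, k] = 1
    return Z

rng = np.random.default_rng(0)
Z = sample_ibp(n_people=20, alpha=2.0, rng=rng)
print(Z.shape)  # the number of columns K is random, not fixed in advance
```

The number of columns of `Z` varies from draw to draw, which is the property that lets the model infer how many latent features the data need.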
4Model
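- Define Z to be a binary N × K matrix with N people and K latent features.
- Define W to be a K × K weighting matrix for the K latent features.
- The model is
  $\Pr(Y_{ij} = 1 \mid Z, W) = \sigma(Z_i W Z_j^\top)$, where $Z_i$ is row i of Z and $\sigma$ is the logistic function.
- Or, expressed in more detail,
  $\Pr(Y_{ij} = 1 \mid Z, W) = \sigma\left(\sum_{k} \sum_{k'} Z_{ik} Z_{jk'} W_{kk'}\right)$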
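As a rough illustration of how Z and W combine, the sketch below assumes the link probability is the logistic function applied to Zi W Zj^T, the form used in the Miller et al. paper. The function name `link_probabilities` and the toy values of Z and W are assumptions chosen only for the example.

```python
import numpy as np

def link_probabilities(Z, W):
    """Return the N x N matrix of Pr(Yij = 1 | Z, W) = sigmoid(Zi W Zj^T)."""
    logits = Z @ W @ Z.T  # entry (i, j) sums W[k, k'] over features k of i and k' of j
    return 1.0 / (1.0 + np.exp(-logits))

# Toy example: 3 people, 2 latent features (illustrative values, not from the paper).
Z = np.array([[1, 0],
              [1, 1],
              [0, 1]])
W = np.array([[ 2.0, -1.0],
              [-1.0,  3.0]])

P = link_probabilities(Z, W)
print(np.round(P, 3))
```

Here persons 0 and 2 share no feature and interact only through the negative weight W[0, 1], so their link probability falls below 0.5, while persons 0 and 1 share feature 0 and end up above 0.5.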
5Results on Synthetic Data
Figure panels: (c) ground truth of Z, (d) generated Y, (e) inferred Z.
- Although the missing values are imputed correctly, the inferred Z differs from the ground truth. This indicates that the model is not identifiable.
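One simple way to see why Z cannot be pinned down is that relabelling the features leaves every link probability unchanged. The sketch below demonstrates only this permutation effect, under the assumption that links depend on Z and W through Z W Z^T; the slides do not say which kind of ambiguity appeared in the synthetic experiment, and all values here are made up for illustration.

```python
import numpy as np

def link_logits(Z, W):
    """Pairwise scores Z W Z^T that feed the logistic link function."""
    return Z @ W @ Z.T

# Illustrative feature matrix (4 people, 3 features) and weights.
Z = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
W = np.array([[ 1.0, -2.0,  0.5],
              [-2.0,  3.0,  0.0],
              [ 0.5,  0.0, -1.0]])

# Relabel the features: reorder the columns of Z and, consistently,
# the rows and columns of W.
perm = [2, 0, 1]
Z_perm = Z[:, perm]
W_perm = W[np.ix_(perm, perm)]

same_logits = np.allclose(link_logits(Z, W), link_logits(Z_perm, W_perm))
different_Z = not np.array_equal(Z, Z_perm)
print(same_logits, different_Z)
```

Both parameter settings predict Y equally well, so a sampler may return either one; comparing an inferred Z with the ground truth column by column is therefore not a reliable check.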
6Results on Multi-Task Data
- The Countries data contains 54 relation matrices among 14 countries, along with 90 given covariates.
- The Alyawarra data contains 26 kinship relationship matrices of 104 people in the Alyawarra tribe in Central Australia.
- For each dataset, 80% of the data is used for training and the remaining 20% is used for testing.
- LFRM (the latent feature relational model proposed in the paper) outperforms IRM and MMSB with proper initialization.
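The 80/20 protocol can be sketched as a random hold-out over matrix entries. The slides do not say how the split was drawn, so sampling individual entries uniformly at random is an assumption, as are the function name `split_entries` and the seed; the 14 x 14 shape matches one Countries relation matrix.

```python
import numpy as np

def split_entries(shape, test_fraction, rng):
    """Return boolean (train_mask, test_mask) arrays over the entries of a matrix."""
    n_entries = int(np.prod(shape))
    n_test = int(round(test_fraction * n_entries))
    # Choose the held-out entries without replacement so the split is exact.
    test_idx = rng.choice(n_entries, size=n_test, replace=False)
    test_mask = np.zeros(n_entries, dtype=bool)
    test_mask[test_idx] = True
    test_mask = test_mask.reshape(shape)
    return ~test_mask, test_mask

rng = np.random.default_rng(0)
train_mask, test_mask = split_entries((14, 14), test_fraction=0.2, rng=rng)
print(int(train_mask.sum()), int(test_mask.sum()))
```

The model is fitted using only the entries under `train_mask`, and its predictions for the entries under `test_mask` are compared with the held-out values.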
7Results on Single-Task Data
AUC performance
Method        AUC
LFRM w/IRM    0.9509
LFRM rand     0.9466
IRM           0.8906
MMSB          0.8705
- The 234 authors who published with the most other people in NIPS 1-17 are used, and their coauthorship matrix is constructed.
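For reference, the AUC reported in the table can be computed from held-out entries as the probability that a randomly chosen linked pair receives a higher score than a randomly chosen unlinked pair. The sketch below is a plain NumPy version of that definition; the function name `auc_score` and the toy labels and scores are assumptions, not data from the paper.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC: chance that a random positive outscores a random negative (ties count half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Toy held-out entries: true links and predicted link probabilities.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.6, 0.1]
auc = auc_score(y_true, scores)
print(round(auc, 4))
```

The pairwise comparison uses memory proportional to the number of positives times the number of negatives, which is manageable for a matrix of this size; a rank-based formula would be preferable for much larger test sets.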