Title: On Link Privacy in Randomizing Social Networks
1. On Link Privacy in Randomizing Social Networks
- Xiaowei Ying, Xintao Wu
- Univ. of North Carolina at Charlotte
- PAKDD-09 April 28, Bangkok, Thailand
2. Motivation
- Privacy Preserving Social Network Publishing
- node-anonymization
- cannot guarantee identity/link privacy due to subgraph queries (Backstrom et al. WWW07, Hay et al. UMass TR07)
- edge randomization
- Random Add/Del
- Random Switch
- K-anonymity
- Hay et al. VLDB08, Liu & Terzi SIGMOD08, Zhou & Pei ICDE08
- Utility preserving randomization
- Spectral feature preserving (Ying & Wu SDM08)
- Real space feature preserving (Ying & Wu SDM09)
3. Problem Formalization
Add k, then delete k edges
Prior belief vs. posterior belief
Ying & Wu SDM08
This paper
similarity measure value between node i and j
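The add/delete randomization named on this slide can be sketched as follows. This is a minimal illustration, assuming that the k added edges are drawn uniformly from the non-edges of the original graph and the k deleted edges uniformly from its true edges; the function name `randomize_add_del` and the edge-set representation are assumptions made for this example, not taken from the slides.

```python
import itertools
import random


def randomize_add_del(n, edges, k, seed=None):
    """Add k false edges, then delete k true edges, uniformly at random.

    n     -- number of nodes, labelled 0 .. n-1
    edges -- iterable of undirected edges (u, v)
    k     -- number of edges to add and to delete
    Returns the randomized edge set as a set of sorted tuples.
    """
    rng = random.Random(seed)
    true_edges = {tuple(sorted(e)) for e in edges}
    all_pairs = set(itertools.combinations(range(n), 2))
    # Sort before sampling so that results are reproducible for a fixed seed.
    non_edges = sorted(all_pairs - true_edges)
    added = set(rng.sample(non_edges, k))
    deleted = set(rng.sample(sorted(true_edges), k))
    return (true_edges - deleted) | added
```

The number of edges is unchanged by construction: exactly k original edges are missing from the output and exactly k false edges are present.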
4. Polbooks network
Network of US political books (105 nodes, 441 edges, r8).
Books about US politics sold by Amazon.com.
Edges represent frequent co-purchasing of books by the same buyers.
Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative".
http://www-personal.umich.edu/~mejn/netdata/
5. Proportion of true edges vs. similarity
After randomly adding/deleting 200 edges (441 edges in total)
6. Similarity measures vs. Link prediction
- Similarity measures
- The number of common neighbors
- Adamic/Adar, the weighted number of common neighbors
- Katz, a weighted sum of the number of paths connecting two nodes
- Commute time, the expected steps of random walks from node i to j and back to i.
- Similarity measures have been exploited in the classic link prediction problem (Liben-Nowell & Kleinberg CIKM03).
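The four measures listed above can be computed from an adjacency matrix roughly as follows. This is a sketch using their standard textbook definitions, which may differ in detail from the variants used in the talk. The function name `similarity_measures` and the damping parameter `beta` are assumptions for this example; the Katz series converges only when `beta` is below the reciprocal of the largest eigenvalue of the adjacency matrix, and the commute-time formula assumes a connected graph.

```python
import numpy as np


def similarity_measures(A, beta=0.1):
    """Return (common, adamic_adar, katz, commute) matrices for a simple graph.

    A    -- symmetric 0/1 adjacency matrix with zero diagonal
    beta -- Katz damping factor (assumed smaller than 1 / largest eigenvalue)
    Only the off-diagonal entries are meaningful as pairwise scores.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)

    # Common neighbors: entry (i, j) counts nodes adjacent to both i and j.
    common = A @ A

    # Adamic/Adar: each common neighbor z contributes 1 / log(deg(z)).
    # A common neighbor always has degree >= 2, so the log is positive.
    w = np.zeros(n)
    mask = deg > 1
    w[mask] = 1.0 / np.log(deg[mask])
    adamic_adar = A @ np.diag(w) @ A

    # Katz: sum over l >= 1 of beta^l * (number of length-l paths),
    # which equals (I - beta * A)^{-1} - I when the series converges.
    katz = np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

    # Commute time: vol(G) * (L+_ii + L+_jj - 2 * L+_ij), where L+ is the
    # pseudoinverse of the graph Laplacian and vol(G) is the sum of degrees.
    L_pinv = np.linalg.pinv(np.diag(deg) - A)
    d = np.diag(L_pinv)
    commute = deg.sum() * (d[:, None] + d[None, :] - 2.0 * L_pinv)

    return common, adamic_adar, katz, commute
```

Commute time is a distance rather than a similarity: smaller values indicate closer nodes, so a ranking based on it would sort in ascending order.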
7. Proportion of true edges vs. similarity
After randomly adding/deleting 200 edges (441 edges in total)
8. Calculating posterior belief
Applying Bayes' theorem
The attacker does not know this value; what can he do?
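A Bayes-rule calculation of this kind can be sketched as follows. The formulas on the original slide did not survive extraction, so this is an illustration under stated assumptions, not the authors' derivation. It assumes that k false edges are added and k true edges are deleted uniformly at random, that the prior `rho` is the proportion of true edges among node pairs sharing a given similarity value, and that the observed link status depends on the similarity value only through the true link status. The function name `posterior_belief` is hypothetical.

```python
def posterior_belief(rho, n, m, k, observed_link=True):
    """Posterior probability that a node pair is a true edge.

    rho           -- prior probability that the pair is a true edge
    n, m          -- number of nodes and edges in the original graph
    k             -- number of edges added and deleted
    observed_link -- whether the pair is linked in the randomized graph
    """
    n_pairs = n * (n - 1) // 2
    p_del = k / m                # chance that a true edge was deleted
    p_add = k / (n_pairs - m)    # chance that a non-edge was added
    if observed_link:
        like_true, like_false = 1.0 - p_del, p_add
    else:
        like_true, like_false = p_del, 1.0 - p_add
    numerator = like_true * rho
    return numerator / (numerator + like_false * (1.0 - rho))
```

With the uninformative prior rho = m / (n(n-1)/2), the posterior for an observed link reduces to (m - k) / m, so for n = 105, m = 441, k = 200 it is 241/441, about 0.55. A prior that depends on the similarity value moves the posterior away from that baseline.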
9. MLE estimation
- Estimate based on randomized graph
Posterior belief can be calculated by attackers
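One way an attacker could estimate the unknown proportion of true edges from the randomized graph alone is sketched below. The estimator on the original slide was lost in extraction, so this is an assumed plug-in inversion rather than the authors' formula. It assumes that k false edges are added and k true edges are deleted uniformly at random, and that node pairs are grouped by their similarity value in the released graph. Under a binomial model for the observed links in a group, inverting the observed link rate this way coincides with the maximum-likelihood estimate. The function name `estimate_true_edge_proportion` is hypothetical.

```python
def estimate_true_edge_proportion(observed_links, n_pairs_in_bin, n, m, k):
    """Estimate the proportion of true edges in a group of node pairs.

    observed_links -- number of pairs in the group linked in the randomized graph
    n_pairs_in_bin -- total number of node pairs in the group
    n, m           -- number of nodes and edges (both known to the attacker)
    k              -- number of edges added and deleted
    """
    total_pairs = n * (n - 1) // 2
    p_del = k / m                    # chance that a true edge was deleted
    p_add = k / (total_pairs - m)    # chance that a non-edge was added
    observed_rate = observed_links / n_pairs_in_bin
    # Expected observed rate is (1 - p_del) * rho + p_add * (1 - rho);
    # solve for rho and clip to the valid range [0, 1].
    rho = (observed_rate - p_add) / (1.0 - p_del - p_add)
    return min(1.0, max(0.0, rho))
```

The estimate uses only quantities visible in the released graph plus n, m, and k, which is why the posterior belief is computable by the attacker.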
10. Comparison
11. Comparison
12. Empirical Evaluation
- Attacker's prediction strategy
- Calculate the posterior probability of all node pairs
- Choose the top t node pairs (those with the highest posterior probability) as predicted candidate links
For each t, the precision of predictions (k = 0.5m)
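The prediction strategy above can be sketched as a top-t ranking followed by a precision count. This is a minimal illustration; the function name `precision_at_t` and the dictionary representation of posterior probabilities are assumptions for this example.

```python
def precision_at_t(posterior, true_edges, t):
    """Precision of the t node pairs with the highest posterior probability.

    posterior  -- dict mapping node pair (u, v) to its posterior probability
    true_edges -- set of node pairs that are edges in the original graph
    t          -- number of candidate links to predict (1 <= t <= len(posterior))
    """
    # Rank all node pairs by posterior probability, highest first.
    ranked = sorted(posterior, key=posterior.get, reverse=True)
    top = ranked[:t]
    hits = sum(1 for pair in top if pair in true_edges)
    return hits / t
```

For example, with posteriors `{(0, 1): 0.9, (0, 2): 0.8, (1, 3): 0.7, (1, 2): 0.1}` and true edges `{(0, 1), (1, 3)}`, the precision is 1.0 at t = 1 and 0.5 at t = 2.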
13. Empirical Evaluation
Posterior beliefs that exploit similarity measures achieve higher precision than those that do not.
The measure that works best for one data set is not necessarily the best for another.
14. Determining k to guarantee privacy
Data Owner
15. Conclusion and Future Work
- We have shown that node proximity measures can be
exploited by attackers to breach link privacy in
edge add/del randomized networks
- How about other topological properties?
- How about other randomization strategies?
- Privacy vs. utility tradeoff
16. Thank You!
- Questions?
- Acknowledgments
- This work was supported in part by U.S. National
Science Foundation IIS-0546027 and CNS-0831204.
17. Utility preserving randomization
- Graph space
- G with the given degree seq.
- Examine the proportion of sample graphs in which a link exists between nodes i and j (Ying & Wu, SDM09)
Attacker's confidence on link (i, j)
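The idea of sampling the graph space with a fixed degree sequence and counting how often a link appears can be sketched as follows. This is an illustration under assumptions: it uses random edge switches, which preserve every node's degree but sample the graph space only approximately uniformly, and it may not match the sampling procedure of the cited work. The function names `switch_randomize` and `link_proportion` and their parameters are hypothetical.

```python
import random


def switch_randomize(edges, n_switches, rng):
    """Apply degree-preserving edge switches: (a,b),(c,d) -> (a,d),(c,b)."""
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_switches and attempts < 100 * n_switches:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject switches that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_set


def link_proportion(edges, i, j, n_samples=200, n_switches=None, seed=0):
    """Fraction of sampled graphs that contain a link between i and j."""
    rng = random.Random(seed)
    pair = tuple(sorted((i, j)))
    if n_switches is None:
        n_switches = 10 * len(edges)
    hits = sum(pair in switch_randomize(edges, n_switches, rng)
               for _ in range(n_samples))
    return hits / n_samples
```

The returned proportion serves as the attacker's confidence that the link (i, j) exists, given only the degree sequence of the released graph.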