Anonymized social networks - PowerPoint PPT Presentation

About This Presentation
Title:

Anonymized social networks

Description:

Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography. A social network occurs anywhere there is social ...

Slides: 26
Provided by: Kan116

Transcript and Presenter's Notes


1
Anonymized social networks
  • Wherefore Art Thou R3579X? Anonymized Social
    Networks, Hidden Patterns, and Structural
    Steganography

2
What is a social network?
  • A social network occurs anywhere there is social
    interaction between people.
  • Examples include email, instant messaging,
    Facebook, blog trackbacks, and coauthor networks.

3
Coauthor Network
4
Uses of mining social networks
  • The structure of social networks can be
    interesting

How are friendships usually structured? Are there
hubs, such as Heather, who connect separate
networks? How many degrees of Kevin Bacon? We
can investigate these questions if we have the
data to mine.
5
Email
  • For our examples, we will use a network of emails
    sent between users.
  • How do we protect users' privacy while still
    releasing the data for research?

[Figure: two vertices, John and Mary, joined by a directed edge]
6
Anonymization Techniques
  • Remove any identifiable information, such as name
    and other attributes.
  • Randomly rename the vertices

[Figure: the two vertices relabeled R3579X and R73313]
7
Anonymization Techniques
  • Convert directed edges to undirected edges.
    Dropping direction gives an attacker less
    information to work with, which makes the graph
    harder to attack.

[Figure: R3579X and R73313 joined by an undirected edge]
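
The two anonymization steps on slides 6 and 7 can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular data release: the function name `anonymize`, the label format, and the edge-list input are assumptions made for the example.

```python
import random

# Characters used for the made-up labels; the format is an assumption.
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ"

def anonymize(directed_edges, seed=None):
    """Naively anonymize an email graph given as (sender, recipient) pairs."""
    rng = random.Random(seed)
    names = sorted({v for edge in directed_edges for v in edge})
    # Draw one distinct random label per vertex, in the style of "R3579X".
    labels = set()
    while len(labels) < len(names):
        labels.add("R" + "".join(rng.choice(ALPHABET) for _ in range(5)))
    labels = sorted(labels)
    rng.shuffle(labels)
    rename = dict(zip(names, labels))
    # Drop direction: keep each edge as an unordered pair, skipping self-loops.
    undirected = {
        frozenset((rename[a], rename[b]))
        for a, b in directed_edges
        if a != b
    }
    return undirected, rename
```

The `rename` mapping is returned only so the example can be checked; a real release would discard it.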
8
Compromising privacy
  • Let's say you want to know whether two vertices
    are connected in the graph.
  • All the identifying information has been removed,
    so how do we do it?

9
Active Attacks!
  • An active attack involves the adversary creating
    vertices in the graph before the graph is
    released.
  • The adversary creates edges between these
    vertices in a pattern that it can recognize
    later, once the graph is released.

10
Walk-Based Attack
  • We create k new vertices, with k around 2 log n,
    where n is the total number of vertices.
  • We connect each new vertex to between d0 and d1
    of the other vertices in the graph.
  • Then we randomly create edges between the new
    vertices, each with independent probability 1/2.
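
The construction above can be sketched as follows. Several details are assumptions made for brevity and do not come from the slides: the helper name `plant_subgraph`, the default values of `d0` and `d1`, the use of a base-2 logarithm, and the simplification that target j is tied directly to new vertex j.

```python
import math
import random

def plant_subgraph(graph, targets, d0=2, d1=4, seed=None):
    """Add k new vertices to `graph` (dict: vertex -> set of neighbours).

    Hypothetical helper for illustration; assumes no existing vertex is
    already named "x0", "x1", ...
    """
    rng = random.Random(seed)
    existing = list(graph)
    # k is about 2 log n, and at least one new vertex per target.
    k = max(len(targets), math.ceil(2 * math.log2(len(existing))))
    new = [f"x{i}" for i in range(k)]
    for x in new:
        graph[x] = set()

    def connect(u, v):
        graph[u].add(v)
        graph[v].add(u)

    # Internal edges: each pair of new vertices is joined with probability 1/2.
    for i in range(k):
        for j in range(i + 1, k):
            if rng.random() < 0.5:
                connect(new[i], new[j])

    # External edges: each new vertex gets between d0 and d1 existing neighbours.
    for i, x in enumerate(new):
        want = rng.randint(d0, d1)
        chosen = {targets[i]} if i < len(targets) else set()
        others = [v for v in existing if v not in chosen]
        extra = max(0, min(want - len(chosen), len(others)))
        chosen.update(rng.sample(others, extra))
        for v in chosen:
            connect(x, v)
    return new
```

For a five-person network like the one in the later example slides, this rule gives k = 5 new vertices.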

11
Algorithm
  • Given the released graph, how do we find the
    subgraph that we created?
  • Build a search tree, pruning it based on the
    properties of our subgraph, such as the degrees
    of our new vertices.
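
A minimal sketch of such a pruned search is shown below. It is a plain backtracking search, not the paper's exact algorithm; the function name `find_subgraph` and the way the planted structure is passed in (`degrees` and `internal`) are assumptions for illustration.

```python
def find_subgraph(graph, degrees, internal):
    """Return every ordered tuple of vertices that matches our planted subgraph.

    graph:    dict vertex -> set of neighbours (the released, relabeled graph)
    degrees:  degrees[i] is the total degree we gave our i-th new vertex
    internal: set of frozenset({i, j}), one per edge we created between
              our i-th and j-th new vertices
    """
    k = len(degrees)
    matches = []

    def extend(partial):
        i = len(partial)
        if i == k:
            matches.append(tuple(partial))
            return
        for v in graph:
            # Prune: wrong degree, or vertex already used on this branch.
            if len(graph[v]) != degrees[i] or v in partial:
                continue
            # Prune: edges back to earlier vertices must match what we planted.
            if all(
                (partial[j] in graph[v]) == (frozenset((j, i)) in internal)
                for j in range(i)
            ):
                partial.append(v)
                extend(partial)
                partial.pop()

    extend([])
    return matches
```

If the function returns exactly one tuple, the planted vertices have been located; more than one result means the planted pattern was not distinctive enough.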

12
Are Mary and John connected?
[Figure: a network of five people: John, Mike, Mary, Zoe, and Tom]
13
Are Mary and John connected?
[Figure: the network of John, Mike, Mary, Zoe, and Tom with new vertices k1 to k5 added]
16
Graph is released
[Figure: the released graph, with every vertex relabeled: ZXCV, ASDF, WER, DFG, UYT, QWER, ASD, HGF, BNM, JKL]
17
We identify our subgraph
[Figure: the released graph with our subgraph found: k1 to k5 are identified, while ZXCV, ASDF, QWER, BNM, and JKL are still anonymous]
18
Yes, they're connected
[Figure: ZXCV is revealed as John and QWER as Mary; ASDF, BNM, and JKL remain anonymous]
19
Analysis
  • The paper proves that the search tree does not
    grow too large, so the algorithm runs
    efficiently.
  • It also proves that, with high probability, the
    subgraph is unique, so we don't identify the
    wrong subgraph.

20
Experimental attack
  • The authors simulate an attack on LiveJournal
    friendship links: they create the accounts on the
    website, make the connections, then crawl the
    site and anonymize the data.
  • The network has 4.4 million nodes and 77 million
    edges

21
Results
22
Cut-Based Attack
  • Needs only about sqrt(log n) new nodes to attack
    the graph.
  • However, it's much more computationally
    intensive, which makes it less practical in the
    real world.
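
To get a feel for the gap between the two bounds, the short calculation below plugs in the size of the LiveJournal graph from slide 20. The base-2 logarithm and the constant factors are assumptions; the bounds are asymptotic, so the outputs are rough orders of magnitude rather than the number of nodes a real attack would use.

```python
import math

def new_nodes_needed(n):
    """Rough sizes suggested by the two bounds (constants are assumptions)."""
    walk_based = math.ceil(2 * math.log2(n))         # about 2 log n
    cut_based = math.ceil(math.sqrt(math.log2(n)))   # about sqrt(log n)
    return walk_based, cut_based

walk, cut = new_nodes_needed(4_400_000)
print(walk, cut)
```

Under these assumptions the walk-based bound comes out at 45 new nodes and the cut-based bound at 5.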

23
Cut-based Results
24
Passive Attack
  • It's a lot like an active attack, except you
    don't create new nodes. Instead, you collaborate
    with your friends and find yourselves in the
    graph.
  • However, because you did not specifically target
    certain people, you may not be able to identify
    other people when you find yourselves.

25
Conclusion
  • We cannot rely on anonymization to ensure privacy
    in social networks.
  • Possible improvement: add noise to the data by
    adding or removing random edges.
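
One simple way to add such noise is to toggle randomly chosen vertex pairs before release, as in the sketch below. The function name `perturb_edges` and the `flips` parameter are hypothetical, and the slides do not say how much noise would be enough to defeat the attacks.

```python
import random

def perturb_edges(graph, flips, seed=None):
    """Return a copy of `graph` with `flips` random vertex pairs toggled.

    graph: dict vertex -> set of neighbours (undirected, at least 2 vertices).
    A toggled pair gains an edge if it had none and loses it otherwise.
    """
    rng = random.Random(seed)
    noisy = {v: set(nbrs) for v, nbrs in graph.items()}
    vertices = list(noisy)
    for _ in range(flips):
        u, v = rng.sample(vertices, 2)
        if v in noisy[u]:
            noisy[u].discard(v)
            noisy[v].discard(u)
        else:
            noisy[u].add(v)
            noisy[v].add(u)
    return noisy
```

More flips hide a planted pattern better but also distort the structure researchers want to study, so the amount of noise is a trade-off.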