1
Inferring Privacy Information from Social Networks
  • Presenter: Ieng-Fat Lam
  • Date: 2007/12/25

2
Paper to be presented
  • Jianming He (1), Wesley W. Chu (1), and Zhenyu (Victor) Liu (2), "Inferring Privacy Information from Social Networks." Lecture Notes in Computer Science, pages 154-165, Springer-Verlag Berlin Heidelberg, 2006.
  • (1) Computer Science Department, UCLA, Los Angeles, CA 90095, USA
  • (2) Google Inc., USA

3
Motivation
  • Online social network services
  • Have become popular
  • Privacy and confidentiality problems
  • Are increasingly challenging
  • And are urgent research issues
  • In building next-generation information systems
  • Existing techniques and policies, such as
  • Cryptography and security protocols
  • Government policies
  • Aim to block direct disclosure of sensitive personal information

4
Block Direct Disclosure
5
Motivation (cont.)
  • What about indirect disclosure?
  • It can be achieved from pieces of seemingly
  • Innocuous information
  • Or unrelated information
  • In a social network, e.g.
  • Same dance club
  • Same interest
  • Same office
  • Similar professions
  • Privacy can be indirectly disclosed
  • Through people's social relations

6
Indirect Disclosure
7
Problem
  • Study privacy disclosure in social networks
  • Can privacy be disclosed indirectly?
  • To what extent?
  • Under what conditions?

8
The Research
  • Perform privacy inference
  • Map Bayesian networks onto social networks
  • Model the causal relations among people
  • Discuss factors that might affect the inference
  • Prior probability
  • Influence strength
  • Society openness
  • Conduct extensive experiments
  • On a real online social network structure

9
Probability Rules for Consistent Reasoning
  • Cox's two axioms
  • First
  • If we specify how much we believe something is true
  • We must have specified (implicitly) how much we believe it is false
  • Sum rule: P(X|I) + P(¬X|I) = 1
  • Second
  • If we first specify how much we believe Y is true
  • And state how much we believe X is true given that Y is true
  • We must have specified how much we believe both X and Y are true
  • Product rule: P(X,Y|I) = P(X|Y,I) P(Y|I) (see the sketch below)
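A minimal sketch (not from the presentation) that checks both rules numerically on a small hand-made joint distribution:

```python
# Sketch: verify the sum and product rules on a tiny joint P(X, Y | I).
joint = {
    (True, True): 0.3, (True, False): 0.2,
    (False, True): 0.1, (False, False): 0.4,
}

def p_x(x):
    """Marginal P(X = x | I): sum Y out of the joint."""
    return sum(p for (xv, _), p in joint.items() if xv == x)

def p_y(y):
    """Marginal P(Y = y | I)."""
    return sum(p for (_, yv), p in joint.items() if yv == y)

def p_x_given_y(x, y):
    """Conditional P(X = x | Y = y, I), the product rule rearranged."""
    return joint[(x, y)] / p_y(y)

# Sum rule: P(X | I) + P(not-X | I) = 1
assert abs(p_x(True) + p_x(False) - 1.0) < 1e-12

# Product rule: P(X, Y | I) = P(X | Y, I) * P(Y | I)
assert abs(joint[(True, True)] - p_x_given_y(True, True) * p_y(True)) < 1e-12
print("sum and product rules verified")
```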

10
Probability Rules for Consistent Reasoning (cont.)
  • The given condition I
  • Denotes relevant background information
  • There is no such thing as an absolute probability
  • Although I is often omitted
  • We must never forget its existence

11
Bayes' Theorem
  • From the product rule
  • P(X,Y|I) = P(X|Y,I) P(Y|I)

Since P(X,Y|I) = P(Y,X|I) = P(Y|X,I) P(X|I), equating the two expansions gives Bayes' theorem: P(X|Y,I) = P(Y|X,I) P(X|I) / P(Y|I)
12
Bayes' Theorem (cont.)
  • Replace X and Y by hypothesis and data
  • P(hypothesis | I): prior probability
  • Our state of knowledge about the truth of the hypothesis before we have analyzed the current data
  • P(data | hypothesis, I): conditional probability
  • Of seeing the data given that the hypothesis is true
  • P(hypothesis | data, I): posterior probability
  • Our state of knowledge about the truth of the hypothesis in the light of the data
13
Bayes' Theorem (cont.)
  • P(data | I): marginal probability of the data
  • The probability of witnessing the data
  • Summed over all mutually exclusive hypotheses
  • Can be calculated as P(data | I) = Σ_hypotheses P(data | hypothesis, I) P(hypothesis | I)
  • The ratio P(data | hypothesis, I) / P(data | I)
  • Represents the impact that the data has on the belief in the hypothesis
  • Bayes' theorem measures how much the data should alter a belief in a hypothesis
  • E.g., use toss results to infer whether a coin is fair (see the sketch below)
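To make the coin example concrete, here is a small sketch with two mutually exclusive hypotheses; the biased coin's P(heads) = 0.8 is an assumed example value, not from the presentation:

```python
# Sketch: use toss results to infer whether a coin is fair.
# Hypotheses: "fair" (P(heads) = 0.5) and an assumed "biased" (P(heads) = 0.8).

def posterior_fair(n_heads, n_tails, prior_fair=0.5, biased_p=0.8):
    """P(fair | data, I) via Bayes' theorem."""
    like_fair = 0.5 ** (n_heads + n_tails)            # P(data | fair, I)
    like_biased = biased_p ** n_heads * (1 - biased_p) ** n_tails
    # Marginal P(data | I): sum over the mutually exclusive hypotheses.
    marginal = like_fair * prior_fair + like_biased * (1 - prior_fair)
    return like_fair * prior_fair / marginal

print(posterior_fair(5, 5))   # balanced tosses support fairness (~0.90)
print(posterior_fair(8, 2))   # mostly heads shifts belief toward bias (~0.13)
```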

14
Bayesian Networks
  • A Bayesian network is
  • A graphical representation of
  • A joint probability distribution (physical or Bayesian)
  • Over a set of variables
  • Including the consideration of the network structure
  • It consists of
  • A network structure
  • Conditional probability tables (CPTs)

15
Bayesian Networks (cont.)
  • Network structure
  • Represented as a Directed Acyclic Graph (DAG), a directed graph without cycles
  • Each node corresponds to a random variable
  • Associated with a CPT
  • Each edge indicates a dependence relationship
  • Between the connected variables
  • Capturing a causal relationship
  • Conditional probability tables (CPTs)
  • Enumerate the conditional probabilities of a node given its parents
  • Quantify the causal relationships

16
Bayesian Networks (cont.)
  • Detecting credit-card fraud (example only)
  • We want to compute the probability of fraud given the observed variables

(figure: example Bayesian network showing nodes, cause-to-effect relations, and CPTs; a toy sketch follows)
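To make the structure-plus-CPT idea concrete, here is a minimal two-node network in the spirit of the fraud example. The structure and all numbers are invented for illustration, not the actual model from the slides or from Heckerman [1]:

```python
# Sketch: a Bayesian network as structure + CPTs (illustrative values).
# Structure: Fraud -> Jewelry. Each node carries a CPT over {True, False}.
p_fraud = {True: 0.01, False: 0.99}        # prior P(Fraud)
p_jewelry_given_fraud = {                  # CPT P(Jewelry | Fraud)
    True: {True: 0.70, False: 0.30},
    False: {True: 0.05, False: 0.95},
}

def joint(fraud, jewelry):
    """The joint factorizes along the DAG: P(F) * P(J | F)."""
    return p_fraud[fraud] * p_jewelry_given_fraud[fraud][jewelry]

def posterior_fraud(jewelry):
    """P(Fraud = True | Jewelry = jewelry) by enumerating the joint."""
    num = joint(True, jewelry)
    return num / (num + joint(False, jewelry))

print(posterior_fraud(True))   # a jewelry purchase raises the fraud belief
```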
17
Bayesian Inference
  • Problem statement (indirect inference)
  • Is it possible to predict someone's attributes
  • By looking at their friends' attributes?
  • In the real world
  • People are acquainted via all types of relations
  • A personal attribute may be sensitive only to certain types of relations
  • To infer people's privacy from social relations
  • We must filter out the other types of relations
  • Investigate homogeneous societies

18
Bayesian Inference (cont.)
  • Homogeneous societies
  • Reflect small, closely related groups
  • Offices, classes, or clubs
  • Individuals are connected by a single type of social relation
  • In this case, friendship
  • The impact of every person on his or her friends is the same

19
Bayesian Inference (cont.)
  • To perform the inference
  • Model the causal relations among people
  • To infer attribute A for a person X
  • 1. Construct a Bayesian network from X's social network
  • 2. Analyze the Bayesian network
  • For the probability that X has attribute A
  • Inference performed
  • Single hop inference
  • Involves only direct friends
  • Multiple hop inference
  • Considers friends multiple hops away

20
Single hop Inference (method)
  • The case
  • We know the attributes of all of node X's direct friends
  • Define Yij as the jth friend of X at i hops away
  • If a friend can be reached via more than one route
  • Use the shortest path (the smaller i)
  • Let Yi be the set of Yij (1 ≤ j ≤ ni)
  • Where ni is the number of X's friends at i hops away
  • For instance, Y1 = {Y11, Y12, ..., Y1n1}
  • The direct friends, which are one hop away

21
Single hop Inference (cont.)
  • An example
  • Y11, Y12 and Y13 are direct friends of X
  • The attribute values of Y11, Y12 and Y13 are known (shaded nodes in the figure)

22
Single hop Inference (cont.)
  • Bayesian network construction
  • Two assumptions
  • Localization Assumption
  • Considering only direct friends is sufficient
  • Naive Bayesian Assumption
  • Removes relationships between friends

23
Single hop Inference (cont.)
  • Localization Assumption
  • Given the attribute values of X's direct friends Y1
  • Friends more than one hop away (i.e., Yi for i > 1)
  • Are conditionally independent of X
  • Inference then involves only direct friends
  • Y21 and Y31 are removed
  • Decide the DAG linking
  • No cycle -> obtain the Bayesian network immediately
  • Otherwise -> remove the cycles
  • By deleting the edges with the weakest relations (an approximate conversion)

24
Single hop Inference (cont.)
  • Reduction via Localization Assumption

25
Single hop Inference (cont.)
  • Naive Bayesian Assumption (for the DAG)
  • Given the attribute value of the query node X
  • The attribute values of the direct friends Y1
  • Are conditionally independent of each other
  • The final DAG is obtained by
  • Removing the connection between Y11 and Y12

26
Single hop Inference (cont.)
  • Reduction via Naive Bayesian Assumption

27
Single hop Inference (cont.)
  • Bayesian Inference
  • Use the Bayes decision rule
  • To predict the attribute of X
  • For a general Bayesian network with maximum depth i
  • Choose for X the attribute value x̂ with the maximum conditional probability (posterior probability)
  • Given the observed attribute values of the other nodes in the network

28
Single hop Inference (cont.)
  • Bayesian Inference (cont.)
  • Use only direct friends Y1 (localization)
  • P(X = x | Y1)
  • Y1 are independent of each other (naive Bayes)
  • P(Y11, Y12 | X = x) = P(Y11 | X = x) P(Y12 | X = x)
  • x and y1j are the attribute values of X and Y1j (1 ≤ j ≤ n1; x, y1j ∈ {t, f})

29
Single hop Inference (cont.)
  • Bayesian Inference (cont.)
  • Assume a homogeneous network
  • The CPT for each node is the same
  • P(Y1j = y1j | X = x) can be written as P(Y = y | X = x)
  • The posterior probability
  • Depends on N1t, the number of friends with attribute value t
  • P(X = x | N1t = n1t) represents P(X = x | Y1)
  • If N1t = n1t, we obtain the posterior (3), proportional to P(X = x) P(Y = t | X = x)^n1t P(Y = f | X = x)^(n1 - n1t); see the sketch below
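A minimal sketch of this single hop computation, assuming the proportional form of (3) above; the prior and influence-strength values are made-up examples:

```python
# Sketch: single hop naive-Bayes posterior for a homogeneous network.
# prior = P(X = t); p_t_given_* is the shared CPT entry P(Y = t | X = *).

def posterior_t(n1t, n1, prior=0.3, p_t_given_t=0.7, p_t_given_f=0.2):
    """P(X = t | n1t of n1 direct friends have attribute value t)."""
    def score(p_x, p_y_t):   # unnormalized P(X = x) * prod_j P(Y1j | X = x)
        return p_x * p_y_t ** n1t * (1 - p_y_t) ** (n1 - n1t)
    s_t = score(prior, p_t_given_t)
    s_f = score(1 - prior, p_t_given_f)
    return s_t / (s_t + s_f)

# Bayes decision rule: predict t when the posterior exceeds 0.5.
p = posterior_t(n1t=4, n1=5)
print(p, "predict t" if p > 0.5 else "predict f")
```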

30
Single hop Inference (cont.)
  • Bayesian Inference (cont.)
  • To compute (3) we need the conditional probability
  • P(Y = y | X = x)
  • We apply parameter estimation (4)
  • Substituting (4) and (3) into (1) yields the prediction x̂

31
Multiple hop Inference
  • In the real world
  • People may hide their information
  • The Localization Assumption is then not applicable
  • A generalized localization assumption is proposed
  • Generalized Localization Assumption
  • Given the attribute of the jth friend of X at i hops, Yij
  • The attribute of X is conditionally independent of
  • The descendants of Yij

32
Multiple hop Inference (cont.)
  • Generalized Localization Assumption (cont.)
  • If the attribute of X's direct friend Y1j is unknown
  • The attribute of X is conditionally dependent on
  • The attributes of the direct friends of Y1j
  • Continue until we reach a descendant of Y1j with a known attribute

33
Multiple hop Inference (cont.)
  • Generalized Localization Assumption (cont.)
  • Interpretation of this model
  • When we predict the attribute of X
  • We treat him/her as an egocentric person
  • Who influences his/her friends, but not vice versa
  • The attribute value of X is then reflected by the friends
  • We still apply the Bayes decision rule
  • The calculation of the posterior probability is more complicated
  • Use variable elimination (see the sketch below)
  • Adopt the same techniques to derive x̂ as in (1)
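A sketch of how such a posterior can be computed by recursive variable elimination on the egocentric tree. The tree structure, CPT values, and helper names are illustrative assumptions, not the paper's implementation; recursion stops at the first known descendant, per the generalized localization assumption:

```python
# Sketch: multiple hop inference on a tree of friends (homogeneous CPT;
# the structure and the numbers below are made-up examples).

PRIOR = 0.3                          # assumed P(X = t)
STRENGTH = {True: 0.7, False: 0.2}   # assumed shared CPT P(child = t | parent)

class Friend:
    def __init__(self, value=None, children=()):
        self.value = value           # True/False if released, None if hidden
        self.children = list(children)

def likelihood(node, parent_is_t):
    """P(observed attributes in this subtree | parent's attribute value)."""
    p_t = STRENGTH[parent_is_t]
    if node.value is not None:
        # Known node: generalized localization lets us stop recursing here.
        return p_t if node.value else (1 - p_t)
    # Hidden node: eliminate it by summing over t/f, recursing into friends.
    total = 0.0
    for v, p_v in ((True, p_t), (False, 1 - p_t)):
        prod = 1.0
        for child in node.children:
            prod *= likelihood(child, v)
        total += p_v * prod
    return total

def posterior_t(direct_friends):
    """P(X = t | observed friends), with hidden nodes summed out."""
    score = {}
    for x in (True, False):
        prod = PRIOR if x else 1 - PRIOR
        for f in direct_friends:
            prod *= likelihood(f, x)
        score[x] = prod
    return score[True] / (score[True] + score[False])

# X has one known friend and one hidden friend whose two friends are known.
friends = [Friend(True), Friend(None, [Friend(True), Friend(False)])]
print(posterior_t(friends))
```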

34
Experimental Study
  • The performance metric considered
  • Inference accuracy
  • The percentage of nodes predicted correctly by the inference
  • Three characteristics of a social network
  • Prior probability
  • Influence strength
  • Society openness
  • These might affect Bayesian inference

35
Experimental Study (cont.)
  • Prior Probability
  • P(X = t)
  • The probability that people in the social network have attribute A
  • Naive inference
  • If P(X = t) ≥ 0.5, we predict that every query node has value t
  • Otherwise value f
  • Average naive inference accuracy
  • max(P(X = t), 1 - P(X = t))
  • Used as a reference against which to compare Bayesian inference

36
Experimental Study (cont.)
  • Influence Strength
  • P(Y = t | X = t)
  • The conditional probability that
  • Y has attribute A given that direct friend X has the same attribute
  • Measures how X influences its friend Y
  • The higher the influence strength, the higher the probability that X and Y share attribute A
  • Society Openness
  • O(A)
  • The percentage of people in the society who release attribute A

37
Experimental Study (cont.)
  • Data Set
  • 66,766 profiles from LiveJournal (2.6 million active members)
  • 4,031,348 friend relations
  • Attribute assignment
  • For each member, assign a CPT
  • Determine the actual attribute value
  • Based on the parents' values and the assigned CPT
  • Start from the set of nodes whose in-degree is 0
  • Explore the rest of the network through friendship links
  • All members are assigned the same CPT
  • Use different CPTs to evaluate the inference performance (a sketch of the procedure follows)
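A sketch of this assignment procedure. The slides do not say how multiple parents' influences combine, so averaging them below is an explicit assumption, as are the example graph and the numbers:

```python
# Sketch: sample attributes along friendship links, starting from
# in-degree-0 nodes, using a shared CPT (illustrative values only).
import random
from collections import deque

PRIOR = 0.3                          # P(attr = t) for nodes with no parents
STRENGTH = {True: 0.7, False: 0.2}   # P(child = t | a parent's attribute)

def assign_attributes(parents):
    """parents: dict node -> list of parent nodes (a friendship DAG)."""
    indegree = {n: len(ps) for n, ps in parents.items()}
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    attr = {}
    queue = deque(n for n, d in indegree.items() if d == 0)
    while queue:                     # topological traversal of the network
        n = queue.popleft()
        ps = parents[n]
        # Assumption: average the parents' influences when there are several.
        p_t = PRIOR if not ps else sum(STRENGTH[attr[p]] for p in ps) / len(ps)
        attr[n] = random.random() < p_t
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return attr

print(assign_attributes({"a": [], "b": ["a"], "c": ["a", "b"]}))
```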

38
Experimental Study (cont.)
  • After attribute assignment
  • We obtain a social network
  • To run the inference for each individual
  • Build the corresponding Bayesian network
  • Conduct Bayesian inference

39
Experimental Result
  • Comparison of Bayesian and naive inference
  • Prior probability: 0.3
  • Influence strength: 0.1 to 0.9

40
Experimental Study (cont.)
  • Effect of Influence Strength and Prior Probability
  • Prior probability: 0.05, 0.1, 0.3 and 0.5
  • Influence strength: 0.1 to 0.9
  • The lowest accuracy occurs when
  • Influence strength = prior probability
  • Knowing the friend relations then provides no more information than knowing the prior probability
  • People's attributes are effectively independent
  • Bayesian inference does better when
  • There is a large difference between the influence strength and the prior probability
  • I.e., a stronger influence of parents on children (whether positive or negative)

41
Experimental Result
42
Experimental Study (cont.)
  • Society Openness
  • So far the society openness was assumed to be 100%
  • All friends' attribute values are known
  • Study the inference at different levels of openness
  • Randomly hide the attributes of a certain percentage of members (from 10% to 90%)
  • Setting
  • Prior probability P(X = t) = 0.3
  • Society openness O(A) = 10%, 50% and 90%

43
Experimental Result
44
Experimental Result
  • Inference accuracy
  • Decreases when more attributes are hidden
  • But the decrease is relatively small
  • Whereas one might expect it to drop drastically
  • Discussed under Society Openness below

45
Discussions on Society Openness
  • Single hop inference
  • The Bayesian network is a two-level tree
  • Derive the variation of the posterior probability
  • Due to the change of openness
  • N1t and N'1t are the numbers of friends with attribute value t
  • Before and after hiding h friends
  • Hiding is the same as removal for the inference
  • ΔP(X = t | N1t = n1t, N'1t = n'1t)
  • Result
  • In 70% to 90% of the cases, the variation is less than 0.1
  • The posterior is unlikely to vary greatly when nodes are hidden at random (a sketch follows)
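A quick way to see this effect: the sketch below reuses the single hop posterior with the same assumed parameters as before and treats hiding a friend as removing it:

```python
# Sketch: variation of the single hop posterior when h friends are hidden
# at random (prior and CPT values are the same assumed examples as before).
import random

def posterior_t(n1t, n1, prior=0.3, p_tt=0.7, p_tf=0.2):
    s_t = prior * p_tt ** n1t * (1 - p_tt) ** (n1 - n1t)
    s_f = (1 - prior) * p_tf ** n1t * (1 - p_tf) ** (n1 - n1t)
    return s_t / (s_t + s_f)

def delta_p(n1t, n1, h):
    """|P(X=t | all friends) - P(X=t | h friends hidden at random)|."""
    friends = [True] * n1t + [False] * (n1 - n1t)
    kept = random.sample(friends, n1 - h)   # hiding == removal here
    return abs(posterior_t(n1t, n1) - posterior_t(sum(kept), len(kept)))

print(delta_p(n1t=6, n1=10, h=3))
```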

46
Discussions on Society Openness (cont.)
47
Discussions on Society Openness (cont.)
  • Multiple hop inference
  • Use complete k-ary trees
  • All internal nodes have k children
  • Hide a node together with all of its ancestors
  • Check the variation of the posterior probability
  • Number of children: k
  • Maximum depth of hidden nodes: d
  • Prior probability: 0.3
  • Influence strength: 0.7
  • Result
  • When k = 1, the posterior probability varies significantly as more nodes are hidden
  • When k > 1, the posterior probability does not vary very much

48
Discussions on Society Openness (cont.)
  • Multiple hop inference (cont.)
  • k = 2 and d = 2 (Y11 and Y21 are hidden)

49
Discussions on Society Openness (cont.)
50
Conclusions
  • Privacy may be indirectly released via social relations
  • The inference accuracy of private information
  • Is closely related to the influence strength between friends
  • Even in a society where people hide their attributes, privacy can still be inferred through Bayesian inference
  • To protect privacy
  • Hide friendship relations
  • Or ask friends to hide their attributes

51
Thank you!
  • References
  • [1] David Heckerman, "A Tutorial on Learning with Bayesian Networks." Microsoft Research, Advanced Technology Division, 1996.
  • [2] D.S. Sivia, "Data Analysis: A Bayesian Tutorial." Oxford University Press, 1996.
  • Questions?
