Title: Inferring Privacy Information from Social Networks
1. Inferring Privacy Information from Social Networks
- Presenter: Ieng-Fat Lam
- Date: 2007/12/25
2. Paper to be presented
- Jianming He¹, Wesley W. Chu¹, and Zhenyu (Victor) Liu², "Inferring Privacy Information from Social Networks." Lecture Notes in Computer Science, pages 154-165, Springer-Verlag Berlin Heidelberg, 2006.
- ¹ Computer Science Department, UCLA, Los Angeles, CA 90095, USA
- ² Google Inc., USA
3. Motivation
- Online social network services
- Have become popular
- Privacy confidentiality problems
- Are increasingly challenging
- Urgent research issues
- In building next-generation information systems
- Existing techniques and policies, such as
- Cryptography and security protocols
- Government policies
- Aim to block direct disclosure of sensitive personal information
4. Block Direct Disclosure
5. Motivation (cont.)
- What about indirect disclosure?
- It can be achieved via pieces of seemingly
- Innocuous information
- Unrelated information
- In a social network:
- Same dance club
- Same interests
- Same office
- Similar professions
- Privacy can be indirectly disclosed
- Through social relations
6. Indirect Disclosure
7. Problem
- Study privacy disclosure in social networks
- Can privacy be disclosed indirectly?
- To what extent?
- Under what conditions?
8. The Research
- Perform privacy inference
- Map Bayesian networks to social networks
- Model the causal relations among people
- Discuss factors that might affect the inference
- Prior probability
- Influence strength
- Society openness
- Conduct extensive experiments
- On a real online social network structure
9. Probability Rules for Consistent Reasoning
- Cox's two axioms
- First
- If we specify how much we believe something is true
- We must have (implicitly) specified how much we believe it is false
- Sum rule: P(X|I) + P(¬X|I) = 1
- Second
- If we first specify how much we believe Y is true
- And state how much we believe X is true given that Y is true
- We must have specified how much we believe both X and Y are true
- Product rule: P(X,Y|I) = P(X|Y,I) P(Y|I)
- A numeric sanity check of both rules follows below
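A minimal numeric check of both rules; the joint table below is an invented toy distribution, not from the slides.

```python
# Toy joint distribution P(X, Y); the numbers are invented for illustration.
P_XY = {("t", "t"): 0.30, ("t", "f"): 0.20,
        ("f", "t"): 0.10, ("f", "f"): 0.40}

# Marginals obtained by summing out the other variable.
P_X = {x: sum(v for (a, _), v in P_XY.items() if a == x) for x in ("t", "f")}
P_Y = {y: sum(v for (_, b), v in P_XY.items() if b == y) for y in ("t", "f")}

# Sum rule: P(X|I) + P(not-X|I) = 1.
assert abs(P_X["t"] + P_X["f"] - 1.0) < 1e-12

# Product rule both ways: P(X|Y,I) P(Y|I) = P(X,Y|I) = P(Y|X,I) P(X|I).
for (x, y), pxy in P_XY.items():
    assert abs((pxy / P_Y[y]) * P_Y[y] - (pxy / P_X[x]) * P_X[x]) < 1e-12

print("sum and product rules hold on the toy table")
```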
10. Probability Rules for Consistent Reasoning (cont.)
- The given condition I
- Denotes relevant background information
- There is no such thing as an absolute probability
- Although I is often omitted
- We must never forget its existence
11. Bayes' Theorem
- From the product rule:
- P(X,Y|I) = P(X|Y,I) P(Y|I)
- P(X,Y|I) = P(Y,X|I) = P(Y|X,I) P(X|I)
- Equating the two expressions gives Bayes' theorem:
- P(X|Y,I) = P(Y|X,I) P(X|I) / P(Y|I)
12. Bayes' Theorem (cont.)
- Replace X and Y by hypothesis and data
- P(hypothesis | I): prior probability
- Our state of knowledge about the truth of the hypothesis before we have analyzed the current data
- P(data | hypothesis, I): conditional probability (likelihood)
- Of seeing the data given that the hypothesis is true
- P(hypothesis | data, I): posterior probability
- Our state of knowledge about the truth of the hypothesis in the light of the data
13. Bayes' Theorem (cont.)
- P(data | I): marginal probability of the data
- The probability of witnessing the data
- Summed over all mutually exclusive hypotheses
- Can be calculated as P(data | I) = Σ P(data | hypothesis, I) P(hypothesis | I)
- The ratio P(data | hypothesis, I) / P(data | I)
- Represents the impact that the data has on the belief in the hypothesis
- Bayes' theorem measures how much data should alter a belief in a hypothesis
- Example: use toss results to infer whether a coin is fair (see the sketch below)
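A worked sketch of the coin example above; the biased coin's heads rate (0.8) and the 50/50 prior over the two hypotheses are illustrative assumptions.

```python
# Two mutually exclusive hypotheses about the coin; rates are assumptions.
HEADS_PROB = {"fair": 0.5, "biased": 0.8}
PRIOR = {"fair": 0.5, "biased": 0.5}

def coin_posterior(tosses):
    """P(hypothesis | data, I) for a sequence of 'H'/'T' tosses."""
    n_heads = tosses.count("H")
    n_tails = len(tosses) - n_heads
    likelihood = {h: ph ** n_heads * (1 - ph) ** n_tails
                  for h, ph in HEADS_PROB.items()}
    # Marginal probability of the data: sum over mutually exclusive hypotheses.
    evidence = sum(likelihood[h] * PRIOR[h] for h in PRIOR)
    return {h: likelihood[h] * PRIOR[h] / evidence for h in PRIOR}

print(coin_posterior("HHTHHHHTHH"))   # 8 heads in 10 tosses favours "biased"
```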
14. Bayesian Networks
- A Bayesian network is
- A graph representation of
- A joint probability distribution (physical or Bayesian)
- Over a set of variables
- Including consideration of the network structure
- It consists of
- A network structure
- Conditional probability tables (CPTs)
15. Bayesian Networks (cont.)
- Network structure
- Represented as a Directed Acyclic Graph (DAG): a directed graph without cycles
- Each node corresponds to a random variable
- Associated with a CPT
- Each edge indicates a dependence relationship
- Between the connected variables
- Captures a causal relationship
- Conditional probability tables (CPTs)
- Enumerate the conditional probabilities of a node
- Quantify the causal relationships
16. Bayesian Networks (cont.)
- Detecting credit-card fraud (example only)
- We want to solve for the probability of fraud given the observations
- [Figure: example network showing the nodes, the cause-to-effect relations, and the CPTs; a runnable mini version follows below]
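A hedged mini version of the fraud example; the structure (Fraud -> Gas, Fraud -> Jewelry) and all CPT numbers are invented for illustration, not taken from the slides.

```python
# Root-node CPT and edge CPTs; every number here is an illustrative assumption.
P_FRAUD = 0.001                                      # P(Fraud = yes)
P_GAS_GIVEN_FRAUD = {True: 0.20, False: 0.01}        # P(Gas = yes | Fraud)
P_JEWELRY_GIVEN_FRAUD = {True: 0.05, False: 0.0001}  # P(Jewelry = yes | Fraud)

def p_fraud_given(gas, jewelry):
    """Query P(Fraud | evidence) by enumerating the hidden node."""
    def joint(fraud):
        # Product rule along the edges: P(Fraud) P(Gas|Fraud) P(Jewelry|Fraud).
        pf = P_FRAUD if fraud else 1 - P_FRAUD
        pg = P_GAS_GIVEN_FRAUD[fraud] if gas else 1 - P_GAS_GIVEN_FRAUD[fraud]
        pj = (P_JEWELRY_GIVEN_FRAUD[fraud] if jewelry
              else 1 - P_JEWELRY_GIVEN_FRAUD[fraud])
        return pf * pg * pj
    return joint(True) / (joint(True) + joint(False))

print(p_fraud_given(gas=True, jewelry=True))   # evidence raises P(Fraud)
```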
17. Bayesian Inference
- Problem statement (indirect inference)
- Is it possible to predict someone's attributes
- By looking at their friends' attributes?
- In the real world
- People are acquainted via all types of relations
- A personal attribute may be sensitive to only certain types of relations
- To infer people's privacy from social relations
- We must filter out the other types of relations
- Investigate homogeneous societies
18. Bayesian Inference (cont.)
- Homogeneous societies
- Reflect small, closely related groups
- Offices, classes, or clubs
- Individuals are connected by a single type of social relation
- In this case, friendship
- The impact of every person on his or her friends is the same
19. Bayesian Inference (cont.)
- To perform the inference
- Model the causal relations among people
- To infer attribute A for a person X:
- 1. Construct a Bayesian network from X's social network
- 2. Analyze the Bayesian network
- For the probability that X has attribute A
- Inference performed
- Single-hop inference
- Involves only direct friends
- Multiple-hop inference
- Considers friends multiple hops away
20. Single-Hop Inference (method)
- The case
- We know the attribute of every direct friend of node X
- Define Y_ij as the j-th friend of X at i hops away
- If a friend can be reached via more than one route
- Use the shortest path (the smaller i)
- Let Y_i be the set of Y_ij (1 ≤ j ≤ n_i)
- Where n_i is the number of X's friends i hops away
- For instance, Y_1 = {Y_11, Y_12, ..., Y_1n_1}
- The direct friends, which are one hop away
- A sketch of computing these hop sets follows below
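A short sketch of computing the hop sets Y_i by breadth-first search, so a friend reachable by several routes lands in the set for its smallest i; the adjacency-dict representation of the friendship graph is an assumption.

```python
from collections import deque

def hop_sets(graph, x):
    """Return {i: Y_i} for query node x; first BFS visit = shortest path."""
    dist, queue = {x: 0}, deque([x])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:              # unseen: record the smaller i
                dist[v] = dist[u] + 1
                queue.append(v)
    sets = {}
    for node, d in dist.items():
        if d > 0:                          # skip x itself
            sets.setdefault(d, []).append(node)
    return sets

# Toy graph: C is reachable via A and B, both routes of length 2.
graph = {"X": ["A", "B"], "A": ["X", "B", "C"],
         "B": ["X", "A", "C"], "C": ["A", "B"]}
print(hop_sets(graph, "X"))   # {1: ['A', 'B'], 2: ['C']}
```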
21. Single-Hop Inference (cont.)
- An example
- Y_11, Y_12, and Y_13 are direct friends of X
- The attribute values of Y_11, Y_12, and Y_13 are known (shaded nodes)
22. Single-Hop Inference (cont.)
- Bayesian network construction
- Two assumptions
- Localization assumption
- Considering only direct friends is sufficient
- Naive Bayesian assumption
- Remove the relationships between friends
23. Single-Hop Inference (cont.)
- Localization assumption
- Given the attribute values of X's direct friends Y_1
- Friends more than one hop away (i.e., Y_i for i > 1)
- Are conditionally independent of X
- Inference involves only direct friends
- Remove Y_21 and Y_31
- Decide the DAG linking
- No cycle -> obtain the Bayesian network immediately
- Otherwise -> remove the cycles
- By deleting the edges with the weakest relations (an approximate conversion)
24. Single-Hop Inference (cont.)
- Reduction via the localization assumption
25. Single-Hop Inference (cont.)
- Naive Bayesian assumption (for the DAG)
- Given the attribute value of the query node X
- The attribute values of the direct friends Y_1
- Are conditionally independent of each other
- The final DAG is obtained by
- Removing the connection between Y_11 and Y_12
26. Single-Hop Inference (cont.)
- Reduction via the naive Bayesian assumption
27. Single-Hop Inference (cont.)
- Bayesian inference
- Use the Bayes decision rule
- To predict the attribute of X
- For a general Bayesian network with maximum depth i
- For X, choose the attribute value x̂ with the maximum conditional probability (posterior probability)
- Given the observed attribute values of the other nodes in the network
28. Single-Hop Inference (cont.)
- Bayesian inference (cont.)
- Use only direct friends Y_1 (localization):
- P(X = x | Y_1) (equation (1))
- The Y_1 are independent of each other (naive Bayes):
- P(Y_11, Y_12 | X = x) = P(Y_11 | X = x) P(Y_12 | X = x)
- x and y_1j are the attribute values of X and Y_1j (1 ≤ j ≤ n_1; x, y_1j ∈ {t, f})
29. Single-Hop Inference (cont.)
- Bayesian inference (cont.)
- Assume a homogeneous network
- The CPT for each node is the same
- P(Y_1j = y_1j | X = x) can be written as P(Y = y | X = x)
- The posterior probability
- Depends on N_1t, the number of friends with attribute value t
- P(X = x | N_1t = n_1t) represents P(X = x | Y_1)
- If N_1t = n_1t, we obtain the posterior of equation (3); a sketch of its computation follows below
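A hedged sketch of the single-hop posterior of equation (3). The homogeneous CPT is parameterized by the prior p = P(X = t) and the influence strength s = P(Y = t | X = t); setting P(Y = t | X = f) = p(1 - s)/(1 - p), so the marginal P(Y = t) stays p, is an assumption, since the slides do not show the paper's exact CPT construction.

```python
def single_hop_posterior(p, s, n1, n1t):
    """P(X = t | n1t of the n1 direct friends have attribute value t)."""
    q = p * (1 - s) / (1 - p)                    # assumed P(Y = t | X = f)
    like_t = s ** n1t * (1 - s) ** (n1 - n1t)    # P(Y_1 | X = t)
    like_f = q ** n1t * (1 - q) ** (n1 - n1t)    # P(Y_1 | X = f)
    num_t = p * like_t
    return num_t / (num_t + (1 - p) * like_f)

# Bayes decision rule: predict t iff the posterior exceeds 0.5.
post = single_hop_posterior(p=0.3, s=0.7, n1=5, n1t=4)
print(post, "-> predict t" if post > 0.5 else "-> predict f")
```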
30. Single-Hop Inference (cont.)
- Bayesian inference (cont.)
- To compute (3) we need the conditional probability
- P(Y = y | X = x)
- We apply parameter estimation to obtain it, giving (4)
- Substituting (4) and (3) into (1) yields the prediction x̂
31. Multiple-Hop Inference
- In the real world
- People may hide their information
- The localization assumption is then not applicable
- Propose a generalized localization assumption
- Generalized localization assumption
- Given the attribute of Y_ij, the j-th friend of X at i hops
- The attribute of X is conditionally independent of
- The descendants of Y_ij
32. Multiple-Hop Inference (cont.)
- Generalized localization assumption (cont.)
- If the attribute of X's direct friend Y_1j is unknown
- The attribute of X is conditionally dependent on
- The attributes of the direct friends of Y_1j
- Continue until we reach a descendant of Y_1j with a known attribute
33. Multiple-Hop Inference (cont.)
- Generalized localization assumption (cont.)
- Interpretation of this model
- When we predict the attribute of X
- We treat him/her as an egocentric person
- Who influences his/her friends but not vice versa
- The attribute value of X is thus reflected by the friends
- We still apply the Bayes decision rule
- Calculation of the posterior probability is more complicated
- Use variable elimination (see the sketch below)
- Adopt the same techniques to derive the x̂ in (1)
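A hedged sketch of multi-hop inference by variable elimination on the tree rooted at X; hidden friends are summed out recursively until a descendant with a known attribute is reached. The CPT parameterization (p, s, and the assumed q = p(1-s)/(1-p)) matches the single-hop sketch above and is an assumption, not the paper's exact setup.

```python
from math import prod

def cpt(y, x, s, q):
    """P(Y = y | parent X = x): s if the parent has the attribute, else q."""
    pt = s if x else q
    return pt if y else 1.0 - pt

def message(node, x, children, observed, s, q):
    """P(evidence in node's subtree | parent value x), eliminating `node`."""
    if node in observed:               # known attribute: stop recursing
        return cpt(observed[node], x, s, q)
    return sum(                        # hidden: sum over both values
        cpt(y, x, s, q)
        * prod(message(c, y, children, observed, s, q)
               for c in children.get(node, []))
        for y in (True, False))

def multi_hop_posterior(p, s, children, observed, root="X"):
    """P(X = t | observed attributes) on a tree rooted at X."""
    q = p * (1 - s) / (1 - p)
    num_t = p * prod(message(c, True, children, observed, s, q)
                     for c in children.get(root, []))
    num_f = (1 - p) * prod(message(c, False, children, observed, s, q)
                           for c in children.get(root, []))
    return num_t / (num_t + num_f)

# X's direct friend Y11 hides its value; Y11's friend Y21 is known to be t.
children = {"X": ["Y11"], "Y11": ["Y21"]}
print(multi_hop_posterior(0.3, 0.7, children,
                          observed={"Y21": True}))   # ~0.53 > prior 0.3
```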
34. Experimental Study
- The performance metric we consider
- Inference accuracy
- The percentage of nodes predicted correctly by the inference
- Three characteristics of the social network
- Prior probability
- Influence strength
- Society openness
- Might affect Bayesian inference
35. Experimental Study (cont.)
- Prior probability
- P(X = t)
- The probability that people in the social network have attribute A
- Naive inference
- If P(X = t) ≥ 0.5, predict that every query node has value t
- Otherwise, predict value f
- Average naive inference accuracy:
- max(P(X = t), 1 - P(X = t))
- Used as a reference against which to compare Bayesian inference (see the helper below)
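A tiny helper for the naive baseline above.

```python
# The naive baseline predicts the majority value everywhere, so its average
# accuracy is max(P(X = t), 1 - P(X = t)).
def naive_accuracy(p):
    return max(p, 1.0 - p)

print(naive_accuracy(0.3))   # 0.7: the reference line for prior 0.3
```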
36. Experimental Study (cont.)
- Influence strength
- P(Y = t | X = t)
- The conditional probability
- That Y has attribute A, given that direct friend X has the same attribute
- Measures how strongly X influences its friend Y
- The higher the influence strength, the higher the probability that X and Y share attribute A
- Society openness
- O(A)
- The percentage of people in the society who release attribute A
37. Experimental Study (cont.)
- Data set
- 66,766 profiles from LiveJournal (2.6 million active members)
- 4,031,348 friend relations
- Attribute assignment (see the sketch below)
- For each member, assign a CPT
- Determine the actual attribute value
- Based on the parents' values and the assigned CPT
- Start from the set of nodes whose in-degree is 0
- Explore the rest of the network through friendship links
- All members are assigned the same CPT
- Use different CPTs to evaluate the inference performance
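A hedged sketch of this assignment procedure; treating only the friend who first reaches a member as its parent is a simplifying assumption of the sketch, and the CPT parameterization follows the earlier single-hop sketch.

```python
import random
from collections import deque

def assign_attributes(graph, roots, p, s):
    """Assign t/f values: roots from the prior, others from the parent's CPT row."""
    q = p * (1 - s) / (1 - p)              # assumed P(Y = t | parent = f)
    value, queue = {}, deque()
    for r in roots:                        # in-degree-0 seeds
        value[r] = random.random() < p     # sample from the prior
        queue.append(r)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):         # follow friendship links
            if v not in value:
                row = s if value[u] else q  # CPT row picked by parent's value
                value[v] = random.random() < row
                queue.append(v)
    return value

# Toy usage: a four-member network with one in-degree-0 seed.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(assign_attributes(graph, roots=["a"], p=0.3, s=0.7))
```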
38. Experimental Study (cont.)
- After the attribute assignment
- We obtain a social network
- To infer each individual
- Build the corresponding Bayesian network
- Conduct Bayesian inference
39. Experimental Result
- Comparison of Bayesian and naive inference
- Prior probability: 0.3
- Influence strength: 0.1 to 0.9
40. Experimental Study (cont.)
- Effect of influence strength and prior probability
- Prior probabilities: 0.05, 0.1, 0.3, and 0.5
- Influence strengths: 0.1 to 0.9
- The lowest accuracy occurs when
- Influence strength = prior probability
- Knowing the friend relations then provides no more information than knowing the prior probability
- People are actually independent in this case
- Bayesian inference does better when
- There is a large difference between the influence strength and the prior probability
- i.e., a stronger (positive or negative) influence of parents on children
41. Experimental Result
42. Experimental Study (cont.)
- Society openness
- So far we assumed the society openness is 100%
- All friends' attribute values are known
- Study the inference at different levels of openness
- Randomly hide the attributes of a certain percentage of members (from 10% to 90%)
- Settings
- Prior probability P(X = t) = 0.3
- Society openness O(A) = 10%, 50%, and 90%
43. Experimental Result
44. Experimental Result
- Inference accuracy
- Decreases when more attributes are hidden
- But the decrease is relatively small
- One might expect it to drop drastically
- This is discussed under Society Openness below
45. Discussions on Society Openness
- Single-hop inference
- The Bayesian network is a two-level tree
- Derive the variation of the posterior probability
- Due to the change of openness
- N_1t and N'_1t are the numbers of friends with attribute value t
- Before and after hiding h friends
- Hiding is the same as removing in the inference
- ΔP(X = t) = |P(X = t | N'_1t = n'_1t) - P(X = t | N_1t = n_1t)|
- Result
- In 70% to 90% of the cases, the variation is less than 0.1
- The posterior is unlikely to vary greatly when nodes are hidden randomly (see the sketch below)
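A hedged sketch of this study, reusing single_hop_posterior from the earlier sketch; the friend configuration below is a toy example, not the paper's data.

```python
import random

def posterior_variation(p, s, friend_values, h, trials=1000):
    """Average |ΔP| of the single-hop posterior after hiding h random friends."""
    n1, n1t = len(friend_values), sum(friend_values)
    before = single_hop_posterior(p, s, n1, n1t)
    total = 0.0
    for _ in range(trials):
        kept = random.sample(friend_values, n1 - h)   # hiding == removing
        total += abs(single_hop_posterior(p, s, len(kept), sum(kept)) - before)
    return total / trials

print(posterior_variation(p=0.3, s=0.7,
                          friend_values=[True] * 6 + [False] * 4, h=3))
```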
46. Discussions on Society Openness (cont.)
47. Discussions on Society Openness (cont.)
- Multiple-hop inference
- Use complete k-ary trees
- All internal nodes have k children
- Hide a node together with all of its ancestors
- Check the variation of the posterior probability
- Number of children: k
- Maximum depth of hidden nodes: d
- Prior probability: 0.3
- Influence strength: 0.7
- Result
- When k = 1, the posterior probability varies significantly as more nodes are hidden
- When k > 1, the posterior probability does not vary much
48. Discussions on Society Openness (cont.)
- Multiple-hop inference (cont.)
- k = 2 and d = 2 (Y_11 and Y_21 are hidden; see the sketch below)
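A hedged sketch of the k-ary-tree study, reusing multi_hop_posterior from the multi-hop sketch; the all-t observed configuration is a toy choice, and the parameter values follow the slides (prior 0.3, influence strength 0.7).

```python
def build_kary_tree(k, depth):
    """Complete k-ary tree below X; returns a children adjacency dict."""
    children, frontier, n = {"X": []}, ["X"], 0
    for _ in range(depth):
        nxt = []
        for u in frontier:
            kids = [f"N{n + i}" for i in range(k)]
            n += k
            children[u] = kids
            for c in kids:
                children[c] = []
            nxt += kids
        frontier = nxt
    return children

k, d = 2, 2
children = build_kary_tree(k, depth=3)
observed = {node: True for node in children if node != "X"}
before = multi_hop_posterior(0.3, 0.7, children, observed)

hidden_obs, node = dict(observed), "X"
for _ in range(d):          # hide the leftmost depth-d node and its ancestors
    node = children[node][0]
    hidden_obs.pop(node)
after = multi_hop_posterior(0.3, 0.7, children, hidden_obs)
print(before, after, abs(after - before))   # expected small for k > 1
```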
49. Discussions on Society Openness (cont.)
50. Conclusions
- Privacy may be indirectly released via social relations
- The inference accuracy of privacy information
- Is closely related to the influence strength between friends
- Even in a society where people hide their attributes, privacy can still be inferred by Bayesian inference
- To protect privacy
- Hide friendship relations
- Or ask friends to hide their attributes
51. Thank you!
- References
- [1] David Heckerman, "A Tutorial on Learning with Bayesian Networks." Microsoft Research, Advanced Technology Division, 1996.
- [2] D. S. Sivia, "Data Analysis: A Bayesian Tutorial." Oxford University Press, 1996.
- Questions?