1
Inferring Privacy Information from Social Networks
  • Presenter: Ieng-Fat Lam
  • Date: 2007/12/25

2
Paper to be presented
  • Jianming He (1), Wesley W. Chu (1), and Zhenyu (Victor) Liu (2), "Inferring Privacy Information from Social Networks." Lecture Notes in Computer Science, pages 154-165, Springer-Verlag Berlin Heidelberg, 2006.
  • (1) Computer Science Department, UCLA, Los Angeles, CA 90095, USA
  • (2) Google Inc., USA

3
Motivation
  • Online social network services
  • Have become popular
  • Privacy and confidentiality problems
  • Are increasingly challenging
  • And are urgent research issues
  • In building next-generation information systems
  • Existing techniques and policies, such as
  • Cryptography and security protocols
  • Government policies
  • Aim to block direct disclosure of sensitive personal information

4
Block Direct Disclosure
5
Motivation (cont.)
  • What about indirect disclosure?
  • It can be achieved from pieces of seemingly
  • Innocuous information
  • Or unrelated information
  • In a social network, e.g.
  • Same dance club
  • Same interest
  • Same office
  • Similar professions
  • Privacy can be indirectly disclosed
  • Through people's social relations

6
Indirect Disclosure
7
Problem
  • Study privacy disclosure in social networks
  • Can privacy be disclosed indirectly?
  • To what extent?
  • Under what conditions?

8
The Research
  • Perform privacy inference
  • Map Bayesian networks onto social networks
  • Model the causal relations among people
  • Discuss factors that might affect the inference
  • Prior probability
  • Influence strength
  • Society openness
  • Conduct extensive experiments
  • On a real online social network structure

9
Probability Rules for Consistent Reasoning
  • Cox's two axioms
  • First
  • If we specify how much we believe something is true
  • We must have specified (implicitly) how much we believe it is false
  • Sum rule: P(X|I) + P(¬X|I) = 1
  • Second
  • If we first specify how much we believe Y is true
  • And state how much we believe X is true given that Y is true
  • We must have specified how much we believe both X and Y are true
  • Product rule: P(X,Y|I) = P(X|Y,I) P(Y|I) (see the sketch below)
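A minimal sketch (not from the presentation) that checks both rules numerically on a small hand-made joint distribution:

```python
# Sketch: verify the sum and product rules on a tiny joint P(X, Y | I).
joint = {
    (True, True): 0.3, (True, False): 0.2,
    (False, True): 0.1, (False, False): 0.4,
}

def p_x(x):
    """Marginal P(X = x | I): sum Y out of the joint."""
    return sum(p for (xv, _), p in joint.items() if xv == x)

def p_y(y):
    """Marginal P(Y = y | I)."""
    return sum(p for (_, yv), p in joint.items() if yv == y)

def p_x_given_y(x, y):
    """Conditional P(X = x | Y = y, I), the product rule rearranged."""
    return joint[(x, y)] / p_y(y)

# Sum rule: P(X | I) + P(not-X | I) = 1
assert abs(p_x(True) + p_x(False) - 1.0) < 1e-12

# Product rule: P(X, Y | I) = P(X | Y, I) * P(Y | I)
assert abs(joint[(True, True)] - p_x_given_y(True, True) * p_y(True)) < 1e-12
print("sum and product rules verified")
```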

10
Probability Rules for Consistent Reasoning (cont.)
  • The given condition I
  • Denotes relevant background information
  • There is no such thing as an absolute probability
  • Although I is often omitted
  • We must never forget its existence

11
Bayes' Theorem
  • From the product rule
  • P(X,Y|I) = P(X|Y,I) P(Y|I)

Since P(X,Y|I) = P(Y,X|I) = P(Y|X,I) P(X|I), equating the two expansions gives Bayes' theorem: P(X|Y,I) = P(Y|X,I) P(X|I) / P(Y|I)
12
Bayes' Theorem (cont.)
  • Replace X and Y by hypothesis and data
  • P(hypothesis | I): prior probability
  • Our state of knowledge about the truth of the hypothesis before we have analyzed the current data
  • P(data | hypothesis, I): conditional probability
  • Of seeing the data given that the hypothesis is true
  • P(hypothesis | data, I): posterior probability
  • Our state of knowledge about the truth of the hypothesis in the light of the data
13
Bayes' Theorem (cont.)
  • P(data | I): marginal probability of the data
  • The probability of witnessing the data
  • Summed over all mutually exclusive hypotheses
  • Can be calculated as P(data | I) = Σ_hypotheses P(data | hypothesis, I) P(hypothesis | I)
  • The ratio P(data | hypothesis, I) / P(data | I)
  • Represents the impact that the data has on the belief in the hypothesis
  • Bayes' theorem measures how much the data should alter a belief in a hypothesis
  • E.g., use toss results to infer whether a coin is fair (see the sketch below)
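To make the coin example concrete, here is a small sketch with two mutually exclusive hypotheses; the biased coin's P(heads) = 0.8 is an assumed example value, not from the presentation:

```python
# Sketch: use toss results to infer whether a coin is fair.
# Hypotheses: "fair" (P(heads) = 0.5) and an assumed "biased" (P(heads) = 0.8).

def posterior_fair(n_heads, n_tails, prior_fair=0.5, biased_p=0.8):
    """P(fair | data, I) via Bayes' theorem."""
    like_fair = 0.5 ** (n_heads + n_tails)            # P(data | fair, I)
    like_biased = biased_p ** n_heads * (1 - biased_p) ** n_tails
    # Marginal P(data | I): sum over the mutually exclusive hypotheses.
    marginal = like_fair * prior_fair + like_biased * (1 - prior_fair)
    return like_fair * prior_fair / marginal

print(posterior_fair(5, 5))   # balanced tosses support fairness (~0.90)
print(posterior_fair(8, 2))   # mostly heads shifts belief toward bias (~0.13)
```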

14
Bayesian Networks
  • A Bayesian network is
  • A graphical representation of
  • A joint probability distribution (physical or Bayesian)
  • Over a set of variables
  • Including the consideration of the network structure
  • It consists of
  • A network structure
  • Conditional probability tables (CPTs)

15
Bayesian Networks (cont.)
  • Network structure
  • Represented as a Directed Acyclic Graph (DAG), a directed graph without cycles
  • Each node corresponds to a random variable
  • Associated with a CPT
  • Each edge indicates a dependence relationship
  • Between the connected variables
  • Capturing a causal relationship
  • Conditional probability tables (CPTs)
  • Enumerate the conditional probabilities of a node given its parents
  • Quantify the causal relationships

16
Bayesian Networks (cont.)
  • Detecting credit-card fraud (example only)
  • We want to compute the probability of fraud given the observed variables

(figure: example Bayesian network showing nodes, cause-to-effect relations, and CPTs; a toy sketch follows)
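To make the structure-plus-CPT idea concrete, here is a minimal two-node network in the spirit of the fraud example. The structure and all numbers are invented for illustration, not the actual model from the slides or from Heckerman [1]:

```python
# Sketch: a Bayesian network as structure + CPTs (illustrative values).
# Structure: Fraud -> Jewelry. Each node carries a CPT over {True, False}.
p_fraud = {True: 0.01, False: 0.99}        # prior P(Fraud)
p_jewelry_given_fraud = {                  # CPT P(Jewelry | Fraud)
    True: {True: 0.70, False: 0.30},
    False: {True: 0.05, False: 0.95},
}

def joint(fraud, jewelry):
    """The joint factorizes along the DAG: P(F) * P(J | F)."""
    return p_fraud[fraud] * p_jewelry_given_fraud[fraud][jewelry]

def posterior_fraud(jewelry):
    """P(Fraud = True | Jewelry = jewelry) by enumerating the joint."""
    num = joint(True, jewelry)
    return num / (num + joint(False, jewelry))

print(posterior_fraud(True))   # a jewelry purchase raises the fraud belief
```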
17
Bayesian Inference
  • Problem statement (indirect inference)
  • Is it possible to predict someone's attributes
  • By looking at their friends' attributes?
  • In the real world
  • People are acquainted via all types of relations
  • A personal attribute may be sensitive only to certain types of relations
  • To infer people's privacy from social relations
  • We must filter out the other types of relations
  • Investigate homogeneous societies

18
Bayesian Inference (cont.)
  • Homogeneous societies
  • Reflect small, closely related groups
  • Offices, classes, or clubs
  • Individuals are connected by a single type of social relation
  • In this case, friendship
  • The impact of every person on his or her friends is the same

19
Bayesian Inference (cont.)
  • To perform the inference
  • Model the causal relations among people
  • To infer attribute A for a person X
  • 1. Construct a Bayesian network from X's social network
  • 2. Analyze the Bayesian network
  • For the probability that X has attribute A
  • Inference performed
  • Single hop inference
  • Involves only direct friends
  • Multiple hop inference
  • Considers friends multiple hops away

20
Single hop Inference (method)
  • The case
  • We know the attributes of all of node X's direct friends
  • Define Yij as the jth friend of X at i hops away
  • If a friend can be reached via more than one route
  • Use the shortest path (the smaller i)
  • Let Yi be the set of Yij (1 ≤ j ≤ ni)
  • Where ni is the number of X's friends at i hops away
  • For instance, Y1 = {Y11, Y12, ..., Y1n1}
  • The direct friends, which are one hop away

21
Single hop Inference (cont.)
  • An example
  • Y11, Y12 and Y13 are direct friends of X
  • The attribute values of Y11, Y12 and Y13 are known (shaded nodes in the figure)

22
Single hop Inference (cont.)
  • Bayesian network construction
  • Two assumptions
  • Localization Assumption
  • Considering only direct friends is sufficient
  • Naive Bayesian Assumption
  • Removes relationships between friends

23
Single hop Inference (cont.)
  • Localization Assumption
  • Given the attribute values of X's direct friends Y1
  • Friends more than one hop away (i.e., Yi for i > 1)
  • Are conditionally independent of X
  • Inference then involves only direct friends
  • Y21 and Y31 are removed
  • Decide the DAG linking
  • No cycle -> obtain the Bayesian network immediately
  • Otherwise -> remove the cycles
  • By deleting the edges with the weakest relations (an approximate conversion)

24
Single hop Inference (cont.)
  • Reduction via Localization Assumption

25
Single hop Inference (cont.)
  • Naive Bayesian Assumption (for the DAG)
  • Given the attribute value of the query node X
  • The attribute values of the direct friends Y1
  • Are conditionally independent of each other
  • The final DAG is obtained by
  • Removing the connection between Y11 and Y12

26
Single hop Inference (cont.)
  • Reduction via Naive Bayesian Assumption

27
Single hop Inference (cont.)
  • Bayesian Inference
  • Use the Bayes decision rule
  • To predict the attribute of X
  • For a general Bayesian network with maximum depth i
  • Choose for X the attribute value x̂ with the maximum conditional probability (posterior probability)
  • Given the observed attribute values of the other nodes in the network

28
Single hop Inference (cont.)
  • Bayesian Inference (cont.)
  • Use only direct friends Y1 (localization)
  • P(X = x | Y1)
  • Y1 are independent of each other (naive Bayes)
  • P(Y11, Y12 | X = x) = P(Y11 | X = x) P(Y12 | X = x)
  • x and y1j are the attribute values of X and Y1j (1 ≤ j ≤ n1; x, y1j ∈ {t, f})

29
Single hop Inference (cont.)
  • Bayesian Inference (cont.)
  • Assume a homogeneous network
  • The CPT for each node is the same
  • P(Y1j = y1j | X = x) can be written as P(Y = y | X = x)
  • The posterior probability
  • Depends on N1t, the number of friends with attribute value t
  • P(X = x | N1t = n1t) represents P(X = x | Y1)
  • If N1t = n1t, we obtain the posterior (3), proportional to P(X = x) P(Y = t | X = x)^n1t P(Y = f | X = x)^(n1 - n1t); see the sketch below
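A minimal sketch of this single hop computation, assuming the proportional form of (3) above; the prior and influence-strength values are made-up examples:

```python
# Sketch: single hop naive-Bayes posterior for a homogeneous network.
# prior = P(X = t); p_t_given_* is the shared CPT entry P(Y = t | X = *).

def posterior_t(n1t, n1, prior=0.3, p_t_given_t=0.7, p_t_given_f=0.2):
    """P(X = t | n1t of n1 direct friends have attribute value t)."""
    def score(p_x, p_y_t):   # unnormalized P(X = x) * prod_j P(Y1j | X = x)
        return p_x * p_y_t ** n1t * (1 - p_y_t) ** (n1 - n1t)
    s_t = score(prior, p_t_given_t)
    s_f = score(1 - prior, p_t_given_f)
    return s_t / (s_t + s_f)

# Bayes decision rule: predict t when the posterior exceeds 0.5.
p = posterior_t(n1t=4, n1=5)
print(p, "predict t" if p > 0.5 else "predict f")
```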

30
Single hop Inference (cont.)
  • Bayesian Inference (cont.)
  • To compute (3) we need the conditional probability
  • P(Y = y | X = x)
  • We apply parameter estimation (4)
  • Substituting (4) and (3) into (1) yields the prediction x̂

31
Multiple hop Inference
  • In the real world
  • People may hide their information
  • The Localization Assumption is then not applicable
  • A generalized localization assumption is proposed
  • Generalized Localization Assumption
  • Given the attribute of the jth friend of X at i hops, Yij
  • The attribute of X is conditionally independent of
  • The descendants of Yij

32
Multiple hop Inference (cont.)
  • Generalized Localization Assumption (cont.)
  • If the attribute of X's direct friend Y1j is unknown
  • The attribute of X is conditionally dependent on
  • The attributes of the direct friends of Y1j
  • Continue until we reach a descendant of Y1j with a known attribute

33
Multiple hop Inference (cont.)
  • Generalized Localization Assumption (cont.)
  • Interpretation of this model
  • When we predict the attribute of X
  • We treat him/her as an egocentric person
  • Who influences his/her friends, but not vice versa
  • The attribute value of X is then reflected by the friends
  • We still apply the Bayes decision rule
  • The calculation of the posterior probability is more complicated
  • Use variable elimination (see the sketch below)
  • Adopt the same techniques to derive x̂ as in (1)
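A sketch of how such a posterior can be computed by recursive variable elimination on the egocentric tree. The tree structure, CPT values, and helper names are illustrative assumptions, not the paper's implementation; recursion stops at the first known descendant, per the generalized localization assumption:

```python
# Sketch: multiple hop inference on a tree of friends (homogeneous CPT;
# the structure and the numbers below are made-up examples).

PRIOR = 0.3                          # assumed P(X = t)
STRENGTH = {True: 0.7, False: 0.2}   # assumed shared CPT P(child = t | parent)

class Friend:
    def __init__(self, value=None, children=()):
        self.value = value           # True/False if released, None if hidden
        self.children = list(children)

def likelihood(node, parent_is_t):
    """P(observed attributes in this subtree | parent's attribute value)."""
    p_t = STRENGTH[parent_is_t]
    if node.value is not None:
        # Known node: generalized localization lets us stop recursing here.
        return p_t if node.value else (1 - p_t)
    # Hidden node: eliminate it by summing over t/f, recursing into friends.
    total = 0.0
    for v, p_v in ((True, p_t), (False, 1 - p_t)):
        prod = 1.0
        for child in node.children:
            prod *= likelihood(child, v)
        total += p_v * prod
    return total

def posterior_t(direct_friends):
    """P(X = t | observed friends), with hidden nodes summed out."""
    score = {}
    for x in (True, False):
        prod = PRIOR if x else 1 - PRIOR
        for f in direct_friends:
            prod *= likelihood(f, x)
        score[x] = prod
    return score[True] / (score[True] + score[False])

# X has one known friend and one hidden friend whose two friends are known.
friends = [Friend(True), Friend(None, [Friend(True), Friend(False)])]
print(posterior_t(friends))
```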

34
Experimental Study
  • The performance metric considered
  • Inference accuracy
  • The percentage of nodes predicted correctly by the inference
  • Three characteristics of a social network
  • Prior probability
  • Influence strength
  • Society openness
  • These might affect Bayesian inference

35
Experimental Study (cont.)
  • Prior Probability
  • P(X = t)
  • The probability that people in the social network have attribute A
  • Naive inference
  • If P(X = t) ≥ 0.5, we predict that every query node has value t
  • Otherwise value f
  • Average naive inference accuracy
  • max(P(X = t), 1 - P(X = t))
  • Used as a reference against which to compare Bayesian inference

36
Experimental Study (cont.)
  • Influence Strength
  • P(Y = t | X = t)
  • The conditional probability that
  • Y has attribute A given that direct friend X has the same attribute
  • Measures how X influences its friend Y
  • The higher the influence strength, the higher the probability that X and Y share attribute A
  • Society Openness
  • O(A)
  • The percentage of people in the society who release attribute A

37
Experimental Study (cont.)
  • Data Set
  • 66,766 profiles from LiveJournal (2.6 million active members)
  • 4,031,348 friend relations
  • Attribute assignment
  • For each member, assign a CPT
  • Determine the actual attribute value
  • Based on the parents' values and the assigned CPT
  • Start from the set of nodes whose in-degree is 0
  • Explore the rest of the network through friendship links
  • All members are assigned the same CPT
  • Use different CPTs to evaluate the inference performance (a sketch of the procedure follows)
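A sketch of this assignment procedure. The slides do not say how multiple parents' influences combine, so averaging them below is an explicit assumption, as are the example graph and the numbers:

```python
# Sketch: sample attributes along friendship links, starting from
# in-degree-0 nodes, using a shared CPT (illustrative values only).
import random
from collections import deque

PRIOR = 0.3                          # P(attr = t) for nodes with no parents
STRENGTH = {True: 0.7, False: 0.2}   # P(child = t | a parent's attribute)

def assign_attributes(parents):
    """parents: dict node -> list of parent nodes (a friendship DAG)."""
    indegree = {n: len(ps) for n, ps in parents.items()}
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    attr = {}
    queue = deque(n for n, d in indegree.items() if d == 0)
    while queue:                     # topological traversal of the network
        n = queue.popleft()
        ps = parents[n]
        # Assumption: average the parents' influences when there are several.
        p_t = PRIOR if not ps else sum(STRENGTH[attr[p]] for p in ps) / len(ps)
        attr[n] = random.random() < p_t
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return attr

print(assign_attributes({"a": [], "b": ["a"], "c": ["a", "b"]}))
```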

38
Experimental Study (cont.)
  • After attribute assignment
  • We obtain a social network
  • To run the inference for each individual
  • Build the corresponding Bayesian network
  • Conduct Bayesian inference

39
Experimental Result
  • Comparison of Bayesian and naive inference
  • Prior probability: 0.3
  • Influence strength: 0.1 to 0.9

40
Experimental Study (cont.)
  • Effect of Influence Strength and Prior Probability
  • Prior probability: 0.05, 0.1, 0.3 and 0.5
  • Influence strength: 0.1 to 0.9
  • The lowest accuracy occurs when
  • Influence strength = prior probability
  • Knowing the friend relations then provides no more information than knowing the prior probability
  • People's attributes are effectively independent
  • Bayesian inference does better when
  • There is a large difference between the influence strength and the prior probability
  • I.e., a stronger influence of parents on children (whether positive or negative)

41
Experimental Result
42
Experimental Study (cont.)
  • Society Openness
  • So far the society openness was assumed to be 100%
  • All friends' attribute values are known
  • Study the inference at different levels of openness
  • Randomly hide the attributes of a certain percentage of members (from 10% to 90%)
  • Setting
  • Prior probability P(X = t) = 0.3
  • Society openness O(A) = 10%, 50% and 90%

43
Experimental Result
44
Experimental Result
  • Inference accuracy
  • Decreases when more attributes are hidden
  • But the decrease is relatively small
  • Whereas one might expect it to drop drastically
  • Discussed under Society Openness below

45
Discussions on Society Openness
  • Single hop inference
  • The Bayesian network is a two-level tree
  • Derive the variation of the posterior probability
  • Due to the change of openness
  • N1t and N'1t are the numbers of friends with attribute value t
  • Before and after hiding h friends
  • Hiding is the same as removal for the inference
  • ΔP(X = t | N1t = n1t, N'1t = n'1t)
  • Result
  • In 70% to 90% of the cases, the variation is less than 0.1
  • The posterior is unlikely to vary greatly when nodes are hidden at random (a sketch follows)
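A quick way to see this effect: the sketch below reuses the single hop posterior with the same assumed parameters as before and treats hiding a friend as removing it:

```python
# Sketch: variation of the single hop posterior when h friends are hidden
# at random (prior and CPT values are the same assumed examples as before).
import random

def posterior_t(n1t, n1, prior=0.3, p_tt=0.7, p_tf=0.2):
    s_t = prior * p_tt ** n1t * (1 - p_tt) ** (n1 - n1t)
    s_f = (1 - prior) * p_tf ** n1t * (1 - p_tf) ** (n1 - n1t)
    return s_t / (s_t + s_f)

def delta_p(n1t, n1, h):
    """|P(X=t | all friends) - P(X=t | h friends hidden at random)|."""
    friends = [True] * n1t + [False] * (n1 - n1t)
    kept = random.sample(friends, n1 - h)   # hiding == removal here
    return abs(posterior_t(n1t, n1) - posterior_t(sum(kept), len(kept)))

print(delta_p(n1t=6, n1=10, h=3))
```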

46
Discussions on Society Openness (cont.)
47
Discussions on Society Openness (cont.)
  • Multiple hop inference
  • Use complete k-ary trees
  • All internal nodes have k children
  • Hide a node together with all of its ancestors
  • Check the variation of the posterior probability
  • Number of children: k
  • Maximum depth of hidden nodes: d
  • Prior probability: 0.3
  • Influence strength: 0.7
  • Result
  • When k = 1, the posterior probability varies significantly as more nodes are hidden
  • When k > 1, the posterior probability does not vary very much

48
Discussions on Society Openness (cont.)
  • Multiple hop inference (cont.)
  • k = 2 and d = 2 (Y11 and Y21 are hidden)

49
Discussions on Society Openness (cont.)
50
Conclusions
  • Privacy may be indirectly released via social relations
  • The inference accuracy of private information
  • Is closely related to the influence strength between friends
  • Even in a society where people hide their attributes, privacy can still be inferred through Bayesian inference
  • To protect privacy
  • Hide friendship relations
  • Or ask friends to hide their attributes

51
Thank you!
  • References
  • [1] David Heckerman, "A Tutorial on Learning with Bayesian Networks." Microsoft Research, Advanced Technology Division, 1996.
  • [2] D.S. Sivia, "Data Analysis: A Bayesian Tutorial." Oxford University Press, 1996.
  • Questions?
