Privacy Preserving Data Mining - PowerPoint PPT Presentation

About This Presentation
Title:

Privacy Preserving Data Mining

Description:

Secure Multiparty Computation: Multi-round Protocols. Li Xiong, CS573 Data Privacy and Security. PowerPoint presentation.

Slides: 25
Provided by: Yehu61

Transcript and Presenter's Notes



1
Secure Multiparty Computation: Multi-round
Protocols
Li Xiong, CS573 Data Privacy and Security
2
Secure multiparty computation
  • General circuit-based secure multiparty
    computation methods
  • Specialized secure multiparty computation
    protocols
      • Decision tree mining across horizontally
        partitioned data
      • Secure sum, secure union
      • Association rule mining across horizontally
        partitioned data
  • Most of them rely on cryptographic primitives and
    are still expensive
  • Multi-round protocols as an alternative

3
Multi-round protocols
  • Max/min, top k
      • k-th element protocol using secure comparison
        (Aggarwal 04)
      • Multi-round probabilistic protocols (Xiong 07)
  • OR (union)
      • Commutative encryption based
      • Multi-round probabilistic protocols (Bawa 03)

Secure computation of the k-ranked element, Aggarwal, 2004
Preserving Privacy for outsourcing aggregation services, Xiong, 2007
Privacy Preserving Indexing of Documents on the Network, Bawa, 2003
4
K-th element (Aggarwal 04)
  • Input
      • Databases $D_i$, $i = 1, 2, \ldots, s$
      • $k$
      • Range of data values $[\alpha, \beta]$
      • Size $n$ of the union of the databases
  • Output
      • The k-th ranked element in the union of the $D_i$

Secure computation of the k-ranked element,
Aggarwal, 2004
5
K-th element protocol
  • Initialize
      • Each party ranks its elements in ascending order.
        Initialize the current range $[a, b]$ to $[\alpha, \beta]$ and
        set $n = \sum_i |D_i|$
  • Repeat until done
      • Set $m = \lfloor (a + b)/2 \rfloor$
      • Each party computes $l_i$, the number of its elements less
        than $m$, and $g_i$, the number of its elements greater than $m$
      • If $\sum_i l_i \le k - 1$ and $\sum_i g_i \le n - k$, done ($m$ is the k-th element)
      • If $\sum_i l_i \ge k$, set $b = m - 1$, output 0
      • If $\sum_i g_i \ge n - k + 1$, set $a = m + 1$, output 1
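The binary search above can be simulated in plain Python. This is a minimal sketch under stated assumptions: data values are integers, and the secure sums and secure comparisons are replaced by ordinary arithmetic on the pooled counts, so it shows only the control flow and the round count, not the privacy guarantees. The function name `kth_element` is hypothetical.

```python
def kth_element(databases, k, alpha, beta):
    """Plaintext simulation of the k-th element search over [alpha, beta].

    Assumption: each round's two sums and two comparisons would be run with
    secure sum / secure comparison sub-protocols; here they are computed in
    the clear for illustration only.
    """
    n = sum(len(d) for d in databases)          # n = sum |D_i|
    a, b = alpha, beta
    rounds = 0
    while a <= b:
        rounds += 1
        m = (a + b) // 2
        # Each party i computes l_i and g_i locally; only the totals are used.
        l_sum = sum(sum(1 for x in d if x < m) for d in databases)
        g_sum = sum(sum(1 for x in d if x > m) for d in databases)
        if l_sum <= k - 1 and g_sum <= n - k:
            return m, rounds                     # m is the k-th ranked element
        if l_sum >= k:
            b = m - 1                            # "output 0": search lower half
        else:
            a = m + 1                            # "output 1": sum(g_i) >= n-k+1
    raise ValueError("k-th element not in [alpha, beta]")


if __name__ == "__main__":
    dbs = [[3, 9, 27], [1, 14], [8, 20, 22]]
    print(kth_element(dbs, 4, 0, 31))  # (9, 4)
```

Because the loop exits as soon as both counts are small enough, the returned $m$ is guaranteed to be an actual data value: at least one element equals $m$.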

6
Cost
  • Number of rounds: $\log M$, where $M$ is the range size
  • Each round requires two secure sums and two
    secure comparisons
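A worked instance of this cost, assuming integer data values (so that a range of size $M$ shrinks to at most $\lfloor M/2 \rfloor$ in each round) and an illustrative 32-bit value domain, $M = 2^{32}$:

$$\#\text{rounds} \le \lfloor \log_2 M \rfloor + 1 = 33, \qquad \#\text{secure sums} = \#\text{secure comparisons} \le 2 \times 33 = 66.$$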

7
Multi-round protocols
  • Can we get away from cryptographic primitives?
  • Multi-round protocols idea
      • Use randomization (randomized response)
      • Utilize the inherent network anonymity of multiple
        nodes
  • Multi-round protocols
      • May not be completely secure
      • May not be completely accurate

8
Multi-round protocols
  • Multi-round probabilistic protocols for max/min
    and top k (Xiong 07)
  • Multi-round OR (union) protocol (Bawa 03)

9
Protocol Structure
  • Randomized response (Warner 1965)
  • Multi-round randomized protocol
      • Randomized local computation
      • Multi-node anonymity
  • Assumption: semi-honest model

Preserving Privacy for outsourcing aggregation
services, Xiong, 2007
10
A Naïve Max/Min Protocol
$$g_i = \begin{cases} g_{i-1} & \text{if } g_{i-1} \ge v_i \\ v_i & \text{if } g_{i-1} < v_i \end{cases}$$
  • Add in randomization: how, when, and how much?
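To see why randomization is needed, here is a minimal sketch of the naive protocol, assuming the nodes are visited in a fixed order and the global value starts at 0 (the function name `naive_max` is hypothetical). Whenever the value a node passes on differs from the one it received, the passed value equals that node's own $v_i$ exactly, which is what the randomized versions on the following slides try to blur.

```python
def naive_max(values, start=0):
    """Naive protocol: g_i = g_{i-1} if g_{i-1} >= v_i, else v_i."""
    g = start
    passed = []                     # the value each node hands to its successor
    for v in values:
        g = g if g >= v else v
        passed.append(g)
    return g, passed


if __name__ == "__main__":
    result, passed = naive_max([30, 10, 32, 40])
    print(result, passed)  # 40 [30, 30, 32, 40]
```

In the trace, 30, 32 and 40 are each exposed verbatim to the next node; only 10 (which did not change the global value) stays hidden.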

11
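Max Protocol: Randomized Response
  • Randomized response at node i

$$g_i = \begin{cases} g_{i-1} & \text{if } g_{i-1} \ge v_i \\ \text{a random number with prob. } P_r, \text{ or } v_i \text{ with prob. } 1 - P_r & \text{if } g_{i-1} < v_i \end{cases}$$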
12
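Max Protocol: Multi-round Randomized Response
  • Multiple rounds
  • Randomization probability at round r: $P_r(r)$
  • Local algorithm at round r and node i

$$g_i(r) = \begin{cases} g_{i-1}(r) & \text{if } g_{i-1}(r) \ge v_i \\ \text{rand}[g_{i-1}(r), v_i) \text{ with prob. } P_r(r), \text{ or } v_i \text{ with prob. } 1 - P_r(r) & \text{if } g_{i-1}(r) < v_i \end{cases}$$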
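A minimal simulation of the multi-round local algorithm, under assumptions the slide does not state: nodes are visited in a fixed ring order, the global value carries over from one round to the next, and the randomization probability follows the schedule $P_r(r) = p_0 d^{r-1}$ (using the $p_0$ and $d$ that appear on the correctness and security slides). The defaults `p0=1.0`, `d=0.5`, `start=0.0` and all function names are hypothetical.

```python
import random


def randomization_prob(r, p0, d):
    # Assumed schedule: P_r(r) = p0 * d**(r - 1), decaying with the round number.
    return p0 * d ** (r - 1)


def local_step(g_prev, v, pr, rng):
    """Local algorithm of node i in one round."""
    if g_prev >= v:
        return g_prev                 # pass the global value on unchanged
    if rng.random() < pr:
        # Randomize: a value drawn uniformly between g_prev and v
        # (random.uniform may, through rounding, return v itself).
        return rng.uniform(g_prev, v)
    return v                          # otherwise pass the true local value


def multi_round_max(values, rounds, p0=1.0, d=0.5, start=0.0, seed=None):
    rng = random.Random(seed)
    g = start
    for r in range(1, rounds + 1):
        pr = randomization_prob(r, p0, d)
        for v in values:              # one pass around the ring per round
            g = local_step(g, v, pr, rng)
    return g


if __name__ == "__main__":
    data = [30, 10, 32, 40]
    print(multi_round_max(data, rounds=1, seed=1))   # at most 40, often below it
    print(multi_round_max(data, rounds=60, seed=1))  # expected to reach max(data)
```

The global value never exceeds the true maximum (every random value lies below some real $v_i$), and it only grows, so the result converges to the maximum once $P_r(r)$ becomes negligible.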
13
Max Protocol - Illustration
[Figure: the global value, starting at 0, passed among the nodes (D2, D3, D4, ...) over successive rounds]
14
PrivateTopK Protocol
  G'_i(r) = topk(G_{i-1}(r) ∪ V_i)
  V'_i = G'_i(r) - G_{i-1}(r);  m = |V'_i|
  if m = 0 then G_i(r) = G_{i-1}(r)
  else
      with probability 1 - P_r(r):
          G_i(r) = G'_i(r)
      with probability P_r(r):
          G_i(r)[1 : k-m] = G_{i-1}(r)[1 : k-m]
          G_i(r)[k-m+1 : k] = a sorted list of m random values generated from
              [min(G'_i(r)[k] - δ, G_{i-1}(r)[k-m+1]), G'_i(r)[k])
  end
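A sketch of the local PrivateTopK step in Python, following the pseudocode above under several assumptions: data values are distinct (so ∪ is treated as a set union), the global vector is a descending list of length k that starts as k distinct sentinel values below all data (`start`, `start - 1`, ...), the schedule is again assumed to be $P_r(r) = p_0 d^{r-1}$, and δ defaults to 1.0. Lists are 0-based in code, so slide index $j$ becomes `j - 1`. All names are hypothetical.

```python
import random


def topk(values, k):
    # Set union semantics (assumes distinct data values); descending order.
    return sorted(set(values), reverse=True)[:k]


def private_topk_step(g_prev, v_local, k, pr, delta, rng):
    """One node's update of the global top-k vector G (length k, descending)."""
    g_true = topk(g_prev + v_local, k)           # G'_i(r) = topk(G_{i-1}(r) ∪ V_i)
    m = len(set(g_true) - set(g_prev))           # m = |V'_i|: local values that entered
    if m == 0 or rng.random() >= pr:
        return g_true      # m = 0 (then G' equals G_{i-1}) or prob. 1 - P_r(r)
    # Prob. P_r(r): keep G_{i-1}(r)[1:k-m] and replace the last m slots with
    # sorted random values from [min(G'[k] - delta, G_{i-1}[k-m+1]), G'[k]).
    hi = g_true[k - 1]                           # G'_i(r)[k]
    lo = min(hi - delta, g_prev[k - m])          # G_{i-1}(r)[k-m+1]
    fakes = sorted((rng.uniform(lo, hi) for _ in range(m)), reverse=True)
    return g_prev[:k - m] + fakes


def private_topk(datasets, k, rounds, p0=1.0, d=0.5, delta=1.0, start=0.0, seed=None):
    rng = random.Random(seed)
    g = [start - j for j in range(k)]            # hypothetical distinct sentinels
    for r in range(1, rounds + 1):
        pr = p0 * d ** (r - 1)                   # assumed P_r(r) schedule
        for v_local in datasets:                 # one pass over the nodes per round
            g = private_topk_step(g, v_local, k, pr, delta, rng)
    return g


if __name__ == "__main__":
    data = [[30, 12], [10, 44], [32, 5], [40, 41]]
    print(private_topk(data, k=3, rounds=1, seed=7))   # heavily randomized
    print(private_topk(data, k=3, rounds=60, seed=7))  # should match the true top 3
```

The kept old values are all at least $G'_i(r)[k]$ while the random values lie below it, so the returned list stays sorted; the fake values can never outrank a genuine top-k value, which is why the vector settles on the true top-k once $P_r(r)$ is negligible.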
15
Min/Max Protocol - Correctness
  • Precision bound
  • Converges as r increases
  • Smaller $p_0$ and $d$ provide faster convergence

16
Min/Max Protocol - Cost
  • Communication cost
      • Single round: O(n)
  • Minimum number of rounds given a
    precision guarantee $(1 - \epsilon)$

17
Min/Max Protocol - Security
  • Probability/confidence based metric: $P(C \mid I_R, R)$
  • Different types of exposures based on the claim $C$
      • Data value: $v_i = a$
      • Data ownership: $V_i$ contains $a$
  • Loss of Privacy (LoP): $P(C \mid I_R, R) - P(C \mid R)$
  • Information entropy based metric
      • Loss of privacy as a measure of the randomness of
        information: $H(D \mid R) - H(D \mid I_R, R)$
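Both metrics can be made concrete on a toy case. A minimal sketch, assuming the relevant probabilities and distributions are already known (for example, as estimated by an adversary); the function names are hypothetical, and `entropy_loss` simplifies the conditional entropies to the entropy of a single prior distribution over D minus that of the posterior after observing $I_R$, rather than averaging over all possible observations.

```python
import math


def loss_of_privacy(p_claim_posterior, p_claim_prior):
    """LoP = P(C | I_R, R) - P(C | R) for a single claim C."""
    return p_claim_posterior - p_claim_prior


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def entropy_loss(prior, posterior):
    """Simplified H(D | R) - H(D | I_R, R): entropy before minus after observing I_R."""
    return entropy(prior) - entropy(posterior)


if __name__ == "__main__":
    # Claim C: "v_i = a". Prior 1/4 (four equally likely values), posterior 1/2.
    print(loss_of_privacy(0.5, 0.25))                       # 0.25
    # Uncertainty about v_i drops from 2 bits to 1 bit.
    print(entropy_loss([0.25] * 4, [0.5, 0.5, 0.0, 0.0]))  # 1.0
```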

18
Min/Max Protocol Security (Analysis)
  • Upper bound for average expected LoP
  • $\max_r \frac{1}{2^{r-1}} \left(1 - p_0 d^{r-1}\right)$
  • Larger $p_0$ and $d$ provide better privacy

19
Min/Max Protocol Security (Experiments)
  • Loss of privacy decreases as the number of nodes
    increases
  • The probabilistic protocol achieves better privacy
    (LoP close to 0)
  • When n is large, the anonymous protocol is actually
    okay!

20
Union
  • Commutative encryption based approach
      • Number of rounds: 2
      • Each round: encryption and decryption
  • Multi-round randomized-response approach?
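The commutative-encryption approach relies on $E_{e_1}(E_{e_2}(x)) = E_{e_2}(E_{e_1}(x))$. A toy sketch, assuming Pohlig-Hellman-style exponentiation $x^e \bmod p$ as the commutative cipher (a swapped-in example; the slide does not name one), items encoded as integers in $[1, p-1]$, and a single process playing every party, so it demonstrates only the algebra of the two rounds. The prime $2^{127} - 1$ and all function names are illustrative, not vetted security choices.

```python
import math
import random

P = 2**127 - 1  # a Mersenne prime; toy parameter only


def keygen(rng):
    """Return (e, d) with e*d = 1 mod (P - 1), so (x**e)**d = x mod P."""
    while True:
        e = rng.randrange(3, P - 1)
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)   # modular inverse (Python 3.8+)


def encrypt(x, e):
    return pow(x, e, P)


def decrypt(c, d):
    return pow(c, d, P)


def commutative_union(party_sets, seed=None):
    rng = random.Random(seed)
    keys = [keygen(rng) for _ in party_sets]
    # Round 1: every item is encrypted under every party's key. Encryption is
    # deterministic and commutative, so equal items held by different parties
    # yield equal ciphertexts and collapse in the set.
    encrypted = set()
    for items in party_sets:
        for x in items:
            c = x
            for e, _ in keys:
                c = encrypt(c, e)
            encrypted.add(c)
    # Round 2: each party strips its own layer; by commutativity the order
    # of decryption does not matter.
    union = set()
    for c in encrypted:
        for _, d in reversed(keys):
            c = decrypt(c, d)
        union.add(c)
    return union


if __name__ == "__main__":
    print(sorted(commutative_union([{3, 5}, {5, 7}, {7, 11}], seed=1)))  # [3, 5, 7, 11]
```

Decryption works because $e_j d_j \equiv 1 \pmod{p-1}$ and, by Fermat's little theorem, $x^{p-1} \equiv 1 \pmod p$ for $x$ coprime to $p$.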

21
Vector
[Figure: the boolean vectors of parties p1, p2, ..., pc are combined by OR into the group vector VG]
  • Each database has a boolean vector of the data
    items
  • Union vector is a logical OR of all vectors

Privacy Preserving Indexing of Documents on the
Network, Bawa, 2003
22
Group Vector Protocol

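Processing of VG at p_s in round r:
  P_ex = 1/2^r, P_in = 1 - P_ex
  for (i = 1; i <= L; i++)
      if (V_s[i] = 1 and VG[i] = 0): set VG[i] = 1 with prob. P_in
      if (V_s[i] = 0 and VG[i] = 1): set VG[i] = 0 with prob. P_ex
[Figure: VG passed among parties p1, p2, ..., pc]
  • r = 1: P_ex = 1/2, P_in = 1/2
  • r = 2: P_ex = 1/4, P_in = 3/4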
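The per-party processing rule translates directly into Python. A minimal simulation, assuming VG starts as all zeros, the parties process it in the fixed order p1, p2, ..., pc in every round, and the caller picks the number of rounds; the function names are hypothetical.

```python
import random


def process_vg(vg, vs, r, rng):
    """Party p_s updates the group vector VG with its own vector V_s in round r."""
    p_ex = 0.5 ** r          # P_ex = 1/2^r
    p_in = 1.0 - p_ex        # P_in = 1 - P_ex
    out = list(vg)
    for i in range(len(vs)):
        if vs[i] == 1 and out[i] == 0 and rng.random() < p_in:
            out[i] = 1       # insert own item with prob. P_in
        elif vs[i] == 0 and out[i] == 1 and rng.random() < p_ex:
            out[i] = 0       # clear a bit set by someone else with prob. P_ex
    return out


def group_vector_union(vectors, rounds, seed=None):
    rng = random.Random(seed)
    vg = [0] * len(vectors[0])
    for r in range(1, rounds + 1):
        for vs in vectors:   # VG passes p1 -> p2 -> ... -> pc
            vg = process_vg(vg, vs, r, rng)
    return vg


if __name__ == "__main__":
    vecs = [[1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
    print(group_vector_union(vecs, rounds=1, seed=3))   # may contain errors
    print(group_vector_union(vecs, rounds=60, seed=3))  # should equal the OR
```

Early rounds add noise in both directions (a party may fail to insert its own bit or may clear another party's bit), which hides who owns which item; as $r$ grows, $P_{ex} \to 0$ and $P_{in} \to 1$, so VG converges to the logical OR. A bit that no party holds is never set.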
23
Open issues
  • Tradeoff between accuracy, efficiency, and
    security
      • How to quantify security
      • How to design adjustable protocols
  • Can we generalize the algorithms to a set of
    operators based on their properties?
      • Operators: sum, union, max, min
      • Properties: commutative, associative, invertible,
        randomizable

24
Enjoy the spring break!