Privacy-preserving data mining 2 - PowerPoint PPT Presentation

Slides: 24

Provided by: keke9

Transcript and Presenter's Notes

1
Privacy-preserving data mining (2)
2
Outline

A brief introduction to association rule mining
Privacy preserving rule mining
Single party
Perturbation
Encryption
Distributed multiparty
Cryptographic protocols
Hiding sensitive rules

3
Association Rule Mining

Transactional datasets
A transaction t = {a, b, c, ...}
a, b, c, ... are called items
The length of a transaction = # of items
Transaction length can vary
Equivalent representation
The set of all items: I
A transaction t can be transformed into a boolean
vector of length |I|.
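
The equivalent representation can be sketched in a few lines of Python. The item list and the transaction below are made-up examples, and `to_boolean_vector` is a hypothetical helper name, not something from the slides.

```python
def to_boolean_vector(transaction, items):
    """Encode a transaction as a 0/1 vector over the ordered item list."""
    present = set(transaction)
    return [1 if item in present else 0 for item in items]

items = ["a", "b", "c", "d", "e"]      # the item set I (hypothetical)
t = {"a", "c", "d"}                    # one transaction of length 3
vector = to_boolean_vector(t, items)   # one position per item in I
```

The vector always has length |I|, and the number of 1s equals the transaction length.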

4
Association Rule Mining

Rule mining
Goal: find the frequent itemsets
Some itemset, e.g., {a, b, c}, appears frequently,
i.e., with support higher than a certain threshold.
Rules can be derived from the itemset
If {a, b, c} is frequent, then {a, b} → c, a → {b, c}, ...
Metrics
Support = # of occurrences of itemset / total # of
transactions
Confidence = # of occurrences of itemset / # of
occurrences of left(rule)
I.e., the conditional prob. Pr(right | left)

5
Example

E.g., {a, b} appears 100 times together, while {a, b, c}
appears 50 times together, in a total of 5000
transactions
Support of {a, b, c} = 50/5000 = 0.01
Confidence of {a, b} → c = 50/100 = 0.5
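
The two metrics can be checked with a small Python sketch. The dataset is synthetic, built only to reproduce the counts in this example (100 transactions with a and b, 50 of them also with c, 5000 in total). The function names are illustrative.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(left, right, transactions):
    """Estimate of Pr(right | left) for the rule left -> right."""
    both = frozenset(left) | frozenset(right)
    return count(both, transactions) / count(left, transactions)

# Synthetic data: 50 x {a,b,c}, 50 x {a,b}, 4900 unrelated transactions.
transactions = (
    [frozenset("abc")] * 50 + [frozenset("ab")] * 50 + [frozenset("d")] * 4900
)
sup_abc = support("abc", transactions)            # 50 / 5000
conf_ab_c = confidence("ab", "c", transactions)   # 50 / 100
```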

6
Algorithms

Apriori
Observation: if itemset A is a part of B, then
support(A) ≥ support(B)
Steps in finding frequent itemsets
Starting from single-item sets, prune the
itemsets that have support < threshold
When we have a set of frequent k-itemsets, we expand it to
(k+1)-itemsets, and check their supports.
Use data structures like a hash tree to speed up
the counting process
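
The level-wise steps above can be sketched as follows. This is a minimal illustration, not the original Apriori implementation: it counts supports with a plain scan instead of a hash tree, and the function name, the toy transactions and the threshold are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Plain scan; a hash tree would speed up this counting step.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 1
    while level:
        k += 1
        # Join: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent(candidates)
        result.update(level)
    return result

toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = apriori(toy, min_support=0.5)
```

On this toy data every single item and every pair is frequent, while {a, b, c} (2 of 5 transactions) falls below the threshold.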

7
Algorithm

Generating rules with confidence threshold
Confidence(A → B) = P(B | A)
= support(AB) / support(A)
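
Rule generation from already-mined itemsets can be sketched like this. The input is assumed to be a dictionary of support counts that contains every frequent itemset together with all of its subsets. The counts shown are hypothetical, and using counts instead of fractions leaves the confidence ratio unchanged.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """Return (left, right, confidence) for rules meeting the threshold."""
    rules = []
    for itemset, sup in supports.items():
        # Try every non-empty proper subset as the left-hand side.
        for r in range(1, len(itemset)):
            for left in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / supports[left]   # support(AB) / support(A)
                if conf >= min_confidence:
                    rules.append((left, itemset - left, conf))
    return rules

# Hypothetical support counts for a, b and {a, b}.
counts = {frozenset("a"): 4, frozenset("b"): 4, frozenset("ab"): 3}
rules = generate_rules(counts, min_confidence=0.7)
```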

8
Single party PPRM

Two methods
(Categorical data) Perturbation
Encryption

9
Perturbation

Papers 111, 112, 113
Basic ideas
Consider a transaction as a boolean bit vector
Perturb each bit with a certain rule
Paper 111: randomly select j items from t; then
each of the remaining items is selected with prob. p
Paper 112: each bit has prob. p of keeping its
original value and prob. 1-p of being flipped
Paper 113: unifies the methods with a perturbation
matrix
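
The bit-flipping idea attributed to paper 112 can be illustrated with a short sketch. It only shows independent per-bit flipping as described on this slide. It is not a reproduction of any paper's exact procedure, and the function name and sample vector are assumptions.

```python
import random

def perturb(bits, p, rng=random):
    """Keep each bit with probability p, flip it with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]

bits = [1, 0, 1, 1, 0]                          # a transaction as a bit vector
noisy = perturb(bits, 0.9, random.Random(0))    # seeded for repeatability
unchanged = perturb(bits, 1.0)                  # p = 1: no perturbation
flipped = perturb(bits, 0.0)                    # p = 0: every bit flipped
```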

10
The key is

After you perturb the data, you should still be
able to find the supported rules correctly.
Accuracy is traded off against the intensity of
perturbation (p)
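
For the simplest case, a single item whose bit is kept with probability p and flipped otherwise, the original support can be recovered with the standard randomized-response inversion. This sketch covers only that single-item case and is an illustration, not the estimator from the cited papers. The names and numbers are assumptions.

```python
def expected_observed(s, p):
    """Expected fraction of 1-bits after perturbation, for true support s."""
    return s * p + (1 - s) * (1 - p)

def estimate_support(observed, p):
    """Invert the expectation above to estimate the original support."""
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information about the bit")
    return (observed - (1 - p)) / (2 * p - 1)

s_true, p = 0.3, 0.9
s_seen = expected_observed(s_true, p)   # what the miner observes on average
s_hat = estimate_support(s_seen, p)     # recovers the true support
```

The division by (2p - 1) shows the trade-off: as p approaches 0.5 the data is better protected, but any sampling noise in the observed fraction is amplified by 1/|2p - 1|.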

11
Methods for discovering the original support

Paper 111: uses the correlation between partial
supports to find the original support
Concept of partial support
Prob. of the length change of the matched part

Note: we actually want the support for l = k
The size of t = m, the size of itemset A = k
12
Some results

Let s_i be sup_i(A) and s'_i be sup'_i(A)

1. 2.
The matrices P and D are defined using only p[l → l']
From 1, we can estimate the original support
From 2, we can estimate the reliability (variance) of
the support estimate (which is related to the
perturbation rate p)
13
Privacy

Given an itemset A in a perturbed transaction t',
what is the probability that an item a of A is really in
the original transaction t, i.e.,

14
Tradeoff between utility and privacy
Lowest discoverable support: distinguishable from
zero (consider the variance of the support
estimate)
15
Encryption method (paper 118)

Substitution encryption
1-1 substitution: a → 1, b → 2, ...
1-n substitution: a → {1, 10}, b → {2, 11, 12}, ...
Problem
1-1 substitution is weak
Arbitrary 1-n substitution does not work
Cannot recover the original rules from the rules mined on
the substituted items.

16
The basic idea

Fake items
Original n items, additional m fake items
Define admissible 1-n mapping
An arbitrary 1-n mapping may give irreversible
results
E.g., a → {1, 2}, b → {2}, c → {3}
If we find the frequent itemset {1, 2, 3} in the
substituted set, is the original itemset ac or abc?
Admissible 1-n mapping
For each mapping, there should be at least one
unique substitute item in the mapped result,
which does not appear in any other mapping
E.g., a → {1, 2}, b → {2}, c → {3} breaks the definition,
while a → {1, 2}, b → {2, 4}, c → {3} is admissible
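
The admissibility condition can be checked mechanically. The sketch below encodes the two example mappings from this slide as dictionaries. `is_admissible` is a hypothetical helper name.

```python
def is_admissible(mapping):
    """True if every item has a substitute that no other item's mapping uses."""
    for item, subs in mapping.items():
        others = set().union(
            *(s for other, s in mapping.items() if other != item)
        )
        if not (set(subs) - others):
            return False   # this item has no unique substitute
    return True

bad = {"a": {1, 2}, "b": {2}, "c": {3}}       # b has no unique substitute
good = {"a": {1, 2}, "b": {2, 4}, "c": {3}}   # 1, 4 and 3 are unique
bad_ok = is_admissible(bad)
good_ok = is_admissible(good)
```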

17
Recovering rules

When we use admissible mappings
We are able to reverse the rules discovered on the
substituted set.
E.g., if we find that {1, 2, 4} is a frequent set,
check all mappings:
{1, 2} → a, {2, 4} → b ⇒ {1, 2, 4} → ab
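
The reversal step in this example can be sketched as a lookup over all mappings. It assumes that each original item is replaced by its whole substitute set, which is how the example reads. The helper name is hypothetical.

```python
def recover_itemset(substituted, mapping):
    """Original items whose full substitute set lies inside `substituted`."""
    substituted = set(substituted)
    return {item for item, subs in mapping.items() if set(subs) <= substituted}

# The admissible mapping from the example: a -> {1, 2}, b -> {2, 4}, c -> {3}.
mapping = {"a": {1, 2}, "b": {2, 4}, "c": {3}}
original = recover_itemset({1, 2, 4}, mapping)   # matches a and b, not c
```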

18
Cost

Additional cost
Generating item mapping
Generating transaction transformation
significant
Cost of rule mining
Both the # of items and the average transaction
length are increased, thus the total cost
will be increased

19
Features of encryption method