Other Perturbation Techniques - PowerPoint PPT Presentation

Slides: 19
Provided by: Keke4
Transcript and Presenter's Notes

1
Other Perturbation Techniques
2
Outline
  • Randomized Responses
  • Sketch
  • Project ideas

3
Randomized Responses
  • Problem description
  • A provides the answer to B's question
  • A wants to preserve his/her privacy
  • Question/answer can be sensitive
  • The method
  • Assume the answer can be yes or no
  • A answers honestly with probability $\theta$ and, with
    probability $1 - \theta$, gives the opposite answer (the randomized response)
  • We can estimate the real probability of yes and
    no from the randomized responses

4
  • Notations
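  • O(yes): observed probability of yes from the
    randomized responses
  • = (# of yes) / (total # of responses)
  • P(yes): real probability of yes
  • Inference
  • $O(\mathrm{yes}) = P(\mathrm{yes})\,\theta + P(\mathrm{no})\,(1 - \theta)$
  • $= P(\mathrm{yes})\,\theta + (1 - P(\mathrm{yes}))\,(1 - \theta)$
  • $\Rightarrow P(\mathrm{yes}) = (O(\mathrm{yes}) + \theta - 1)/(2\theta - 1)$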
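
The inference on this slide can be checked with a small simulation. This is a minimal sketch, not the presenter's code: the names `randomize`, `estimate_p_yes` and `theta` (the probability of an honest answer), the true rate of 0.3 and the sample size are all assumptions chosen for illustration.

```python
import random

def randomize(answer: bool, theta: float, rng: random.Random) -> bool:
    """Report the true answer with probability theta, otherwise the opposite."""
    return answer if rng.random() < theta else not answer

def estimate_p_yes(responses, theta: float) -> float:
    """Invert O(yes) = P(yes)*theta + (1 - P(yes))*(1 - theta)."""
    if abs(2 * theta - 1) < 1e-12:
        raise ValueError("theta = 0.5 makes P(yes) unidentifiable")
    o_yes = sum(responses) / len(responses)   # observed fraction of yes
    return (o_yes + theta - 1) / (2 * theta - 1)

rng = random.Random(0)
theta = 0.8
truth = [rng.random() < 0.3 for _ in range(100_000)]   # true P(yes) is about 0.3
reported = [randomize(a, theta, rng) for a in truth]
p_hat = estimate_p_yes(reported, theta)
print(round(p_hat, 3))
```

The estimate is undefined at theta = 0.5, where the responses carry no information about the true answers, and it gets noisier as theta approaches 0.5.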

5
  • Extend to multiple categories
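  • The answer $c_i$ is changed to $c_j$ with probability $\theta_{ij}$
  • $O = (O(c_1), O(c_2), \dots, O(c_n))$: observed probabilities of the $c_i$
  • $P = (P(c_1), P(c_2), \dots, P(c_n))$: real probabilities of the $c_i$
  • The relationship between O and P: $O(c_j) = \sum_i \theta_{ij} P(c_i)$, i.e. $O = \Theta^{\top} P$ with $\Theta = [\theta_{ij}]$

Note: when $\Theta$ is invertible, use matrix inversion
to solve for P. Otherwise, use iterative
methods similar to those in Rakesh's paper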
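
A minimal sketch of the matrix-inversion case, assuming `theta[i][j]` is the probability that true category c_i is reported as c_j, so each observed probability is a weighted sum of the real ones. The 3x3 matrix, the true distribution and the helper `solve` are illustrative assumptions; in practice the observed vector would be estimated from the collected responses rather than computed exactly.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; use an iterative method instead")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# theta[i][j]: probability that true category c_i is reported as c_j (rows sum to 1)
theta = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
p_true = [0.5, 0.3, 0.2]

# Forward model: O(c_j) = sum_i theta[i][j] * P(c_i)
observed = [sum(theta[i][j] * p_true[i] for i in range(3)) for j in range(3)]

# Inverse: solve Theta^T P = O for P
theta_t = [[theta[i][j] for i in range(3)] for j in range(3)]
p_est = solve(theta_t, observed)
print([round(p, 6) for p in p_est])
```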
6
  • Different perturbation matrices can be used.
    Which one is the best?
  • Balance between privacy and utility?

No perturbation (identity matrix): zero privacy is preserved, while full data
utility is preserved
Uniform randomization: privacy is fully
preserved, while no data utility is left
7
Optimizing both privacy and utility
  • Read paper 33
  • Privacy: similar to previous discussion
  • Based on accuracy of estimation
  • A Bayes method
  • $C = (c_1, c_2, \dots, c_n)$
  • Y is the perturbed value, X is the original
    value, and $\hat{X}$ is the estimated value

Accuracy of estimation:
it can be calculated by checking the original
data, the perturbed data and the estimated data
8
  • Privacy
  • Average: 1 - (accuracy of estimation)
  • Worst case
  • Utility
  • $P(c_i)$ is the original prob, $O(c_i)$ the prob on
    perturbed data, and $\hat{P}(c_i)$ the estimated prob
  • Utility depends on the difference between the
    original prob and the estimated prob

9
Optimization algorithm
  • Find the perturbation that balances the two
    metrics
  • The evolutionary algorithm
  • Start with a set of initial RR matrices
  • Repeat the following steps in each iteration
  • Mating: select two RR matrices in the pool
  • Crossover: exchange several columns between the
    two RR matrices
  • Mutation: change some values in an RR matrix
  • Meet the privacy bound: filter the resulting
    matrices
  • Evaluate the fitness value for the new RR
    matrices.
  • Note: the fitness value is defined in terms of
    privacy and utility metrics
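
The loop above can be sketched as follows. Everything specific here is an assumption made for illustration and not the method of paper 33: matrices are stored as lists of columns with `cols[j][i] = P(Y=c_i | X=c_j)`, the privacy bound is taken to be a cap on the largest posterior P(X | Y), average privacy is 1 minus the accuracy of a Bayes guess, and utility is replaced by a crude proxy (the probability that an answer is left unchanged). All function names and the `weight`, `bound`, `pool_size` and `scale` parameters are hypothetical.

```python
import random

def bayes_accuracy(cols, prior):
    """Chance that the Bayes guess of X from Y is right; cols[j][i] = P(Y=c_i | X=c_j)."""
    n = len(prior)
    return sum(max(prior[j] * cols[j][i] for j in range(n)) for i in range(n))

def worst_posterior(cols, prior):
    """Largest posterior P(X=c_j | Y=c_i) over all i, j (assumed worst-case measure)."""
    n = len(prior)
    worst = 0.0
    for i in range(n):
        p_y = sum(prior[j] * cols[j][i] for j in range(n))
        if p_y > 0:
            worst = max(worst, max(prior[j] * cols[j][i] / p_y for j in range(n)))
    return worst

def fitness(cols, prior, weight=0.5):
    """Toy fitness (assumption): mix of average privacy and a crude utility proxy."""
    privacy = 1.0 - bayes_accuracy(cols, prior)
    utility = sum(prior[j] * cols[j][j] for j in range(len(prior)))  # P(answer unchanged)
    return weight * privacy + (1.0 - weight) * utility

def normalized(raw):
    total = sum(raw)
    return [v / total for v in raw]

def crossover(a, b, rng):
    """Exchange a random subset of columns between two RR matrices."""
    child_a, child_b = [c[:] for c in a], [c[:] for c in b]
    for j in range(len(a)):
        if rng.random() < 0.5:
            child_a[j], child_b[j] = child_b[j], child_a[j]
    return child_a, child_b

def mutate(cols, rng, scale=0.1):
    """Perturb one column, then renormalize so it stays a probability distribution."""
    out = [c[:] for c in cols]
    j = rng.randrange(len(out))
    out[j] = normalized([max(1e-9, v + rng.uniform(-scale, scale)) for v in out[j]])
    return out

def evolve(prior, bound, pool_size=20, iterations=300, seed=0):
    rng = random.Random(seed)
    n = len(prior)
    uniform = [[1.0 / n] * n for _ in range(n)]   # posterior equals prior under this matrix
    pool = [uniform] + [[normalized([rng.random() + 1e-9 for _ in range(n)])
                         for _ in range(n)] for _ in range(pool_size - 1)]
    pool = [m for m in pool if worst_posterior(m, prior) <= bound]
    if not pool:
        raise ValueError("no initial matrix meets the privacy bound")
    for _ in range(iterations):
        a, b = rng.choice(pool), rng.choice(pool)                  # mating
        children = [mutate(c, rng) for c in crossover(a, b, rng)]  # crossover, then mutation
        pool += [c for c in children if worst_posterior(c, prior) <= bound]  # privacy bound
        pool.sort(key=lambda m: fitness(m, prior), reverse=True)   # evaluate fitness
        del pool[pool_size:]                                       # keep the fittest
    return pool[0]

prior = [0.5, 0.3, 0.2]
best = evolve(prior, bound=0.7)
print(round(fitness(best, prior), 3), round(worst_posterior(best, prior), 3))
```

Exchanging whole columns keeps every child a valid RR matrix under this representation, since each column is a probability distribution on its own. Only the mutation step needs renormalizing.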

10
(No Transcript)
11
Summary
  • Randomized response is the basic technique for
    perturbing categorical data
  • Boolean
  • Multi-category

12
Sketch
  • Address the problem of high-dimensional sparse
    data
  • Multiplicative perturbation
  • Randomized responses
  • Market basket data
  • Bag of words

13
Definition of sketch
  • Similar to projection perturbation
  • Map d-dimensional data → r-dimensional data, r << d
  • Difference: for each record the mapping matrix is
    different
  • Definition
  • $X = (x_1, \dots, x_d)$, $S = (s_1, \dots, s_r)$

$s_j = \sum_{i=1}^{d} x_i r_{ij}$, where each $r_{ij}$ is randomly drawn from $\{-1, 1\}$
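
A minimal sketch of the definition, assuming each sketch component is the sum of the record's values multiplied by random signs from {-1, +1}. The function name `sketch`, the per-record `seed` parameter (standing in for "a different mapping per record") and the toy 1000-dimensional Boolean record are all illustrative assumptions.

```python
import random

def sketch(x, r, seed):
    """Return (s_1..s_r) with s_j = sum_i x_i * r_ij, each r_ij drawn from {-1, +1}."""
    rng = random.Random(seed)
    s = []
    for _ in range(r):
        signs = [rng.choice((-1, 1)) for _ in x]   # one random sign per dimension
        s.append(sum(xi * sign for xi, sign in zip(x, signs)))
    return s

# A sparse Boolean record: 4 non-zero items out of d = 1000
x = [0] * 1000
for k in (3, 57, 412, 900):
    x[k] = 1

s = sketch(x, r=8, seed=42)
print(len(s), s)
```

Each component here is a sum of four random signs, so the 1000-dimensional record is reduced to 8 small integers.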
14
Property
  • Dot product of the original data X and Y can be
    approximated with their sketches
  • Dot product is important in calculating Euclidean
    distances!

15
  • Accuracy of the dot product estimation

Large r → smaller variance → better quality;
however, → lower privacy
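
The effect of r on accuracy can be illustrated with a small experiment. This sketch assumes the two records being compared are sketched with the same random signs, since with independent signs the product of two sketches has expectation zero. The helper names, the two Boolean test vectors and the trial count are illustrative assumptions.

```python
import random

def sign_matrix(r, d, seed):
    """r rows of d random signs from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(r)]

def sketch(x, signs):
    return [sum(xi * sg for xi, sg in zip(x, row)) for row in signs]

def estimate_dot(sx, sy):
    """Average of component-wise products; unbiased when the signs are shared."""
    return sum(a * b for a, b in zip(sx, sy)) / len(sx)

def mean_abs_error(x, y, r, trials=100):
    """Average |estimate - true dot product| over independent sign matrices."""
    true_dot = sum(a * b for a, b in zip(x, y))
    total = 0.0
    for t in range(trials):
        signs = sign_matrix(r, len(x), seed=t)
        total += abs(estimate_dot(sketch(x, signs), sketch(y, signs)) - true_dot)
    return total / trials

d = 200
x = [1 if i % 10 == 0 else 0 for i in range(d)]   # 20 non-zero items
y = [1 if i % 15 == 0 else 0 for i in range(d)]   # 14 non-zero items, 7 shared with x

err_small = mean_abs_error(x, y, r=4)
err_large = mean_abs_error(x, y, r=64)
print(err_small, err_large)
```

The error for r = 64 should come out well below the error for r = 4, matching the variance argument on the slide.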
16
Privacy
  • Original data value can be estimated
  • Sparse data
  • Most are canceled in sketch
  • Estimate of xk
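
A minimal sketch of why sparse data is exposed, assuming the attacker knows the random signs used to build the sketch (an assumption, not stated on the slide). Multiplying each component by the sign for item k and averaging leaves x_k plus noise from the other non-zero items, and a sparse record has few of those. The estimator `estimate_xk` and the toy record are illustrative.

```python
import random

def sign_matrix(r, d, seed):
    """r rows of d random signs from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(r)]

def sketch(x, signs):
    return [sum(xi * sg for xi, sg in zip(x, row)) for row in signs]

def estimate_xk(s, signs, k):
    """Average of s_j * r_jk over the r components; its expectation is x_k."""
    return sum(sj * row[k] for sj, row in zip(s, signs)) / len(s)

d, r = 500, 400
x = [0] * d
for k in (7, 123, 321):          # only 3 non-zero items
    x[k] = 1

signs = sign_matrix(r, d, seed=3)
s = sketch(x, signs)
est_nonzero = estimate_xk(s, signs, 123)   # item that is present
est_zero = estimate_xk(s, signs, 200)      # item that is absent
print(round(est_nonzero, 2), round(est_zero, 2))
```

With only three non-zero items the two estimates separate cleanly around 1 and 0. A denser record or a smaller r would blur them.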

17
Privacy
  • δ-anonymity

Suppress the record if this condition is not
satisfied
Another concept: k-variance. See paper 29 for more
details.
18
  • Applications
  • Dot product estimation
  • Determine the length of a sparse transaction (# of
    non-zero items in a Boolean vector)
  • Determine Euclidean distance
  • Average of a set of records (centroid of a
    cluster)