Differential Privacy: Case Studies - PowerPoint PPT Presentation

About This Presentation
Title:

Differential Privacy: Case Studies

Description:

Windows Live / MSN Web Analytics data. Qualitative Case Study: Clinical Physicians Perspective ... Sampled Users Web Analytics Group ... – PowerPoint PPT presentation

Slides: 23
Provided by: dennygua
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: Differential Privacy: Case Studies


1
Differential Privacy: Case Studies
  • Denny Lee (dennyl_at_microsoft.com),
  • Microsoft SQLCAT Team Best Practices

2
Case Studies
  • Quantitative Case Study: Windows Live / MSN Web
    Analytics data
  • Qualitative Case Study: Clinical Physicians'
    Perspective
  • Future Study: OHSU/CORI data set to apply
    differential privacy to a healthcare setting

3
Sanitization Concept
  • Mask individuals within the data by creating a
    sanitization point between user interface and
    data.
  • The magnitude of the noise is given by the
    theorem. If many queries f1, f2, ... are to be
    made, noise proportional to Σi Δfi suffices. For
    many sequences, we can often use less noise than
    Σi Δfi. Note that Δ Histogram = 1, independent
    of the number of cells.
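The sensitivity claim for histograms can be checked with a small sketch. The records, the helper names, and the privacy parameter `epsilon` below are assumptions for illustration (the slides do not name a privacy parameter); the sketch only shows that removing one individual changes a histogram by 1 in total, however many cells it has, and how the per-query sensitivities add up for a sequence of queries.

```python
from collections import Counter

def histogram(records):
    """Count how many records fall into each cell."""
    return Counter(records)

def l1_distance(h1, h2):
    """Sum of absolute per-cell differences between two histograms."""
    cells = set(h1) | set(h2)
    return sum(abs(h1.get(c, 0) - h2.get(c, 0)) for c in cells)

def noise_scale(sensitivities, epsilon):
    """Noise magnitude that suffices for a sequence of queries:
    the sum of the per-query sensitivities divided by epsilon (assumed parameter)."""
    return sum(sensitivities) / epsilon

# Hypothetical records; each one lands in exactly one histogram cell.
records = ["A", "A", "B", "C", "C", "C"]
without_last = records[:-1]

# Removing one individual changes exactly one cell by 1.
sensitivity = l1_distance(histogram(records), histogram(without_last))
print(sensitivity)                   # 1
print(noise_scale([1, 1, 1], 0.5))   # 6.0
```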

4
Generating the noise
  • To generate the noise, a pseudo-random number
    generator will create a stream of numbers, e.g.
    0 0 1 1 1 1 0 0 0 0 1
  • The resulting translation of this stream is
    - . 2 1 . . . . 6
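The exact translation used on this slide is not recoverable from the transcript. As an illustrative stand-in, and not the presenter's method, one common way to turn a pseudo-random stream into noise is inverse-CDF sampling of a Laplace distribution. The function names, the seed, and the choice of Laplace noise are assumptions.

```python
import math
import random

def laplace_from_uniform(u, scale):
    """Map a uniform draw u in (0, 1) to Laplace(0, scale) noise via the inverse CDF."""
    centered = u - 0.5
    return -scale * math.copysign(1.0, centered) * math.log(1.0 - 2.0 * abs(centered))

def noise_stream(n, scale, seed):
    """Translate n pseudo-random draws into n noise values."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    values = []
    while len(values) < n:
        u = rng.random()
        if u > 0.0:  # skip the endpoint, where log(0) is undefined
            values.append(laplace_from_uniform(u, scale))
    return values

print(noise_stream(5, scale=1.0, seed=42))
```

A fixed seed is convenient for a demonstration, but a real deployment would presumably need an unpredictable source of randomness, since anyone who can reproduce the stream can subtract the noise.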
5
Adding noise
  • The stream of numbers above is applied to the
    result set.
  • While masking the individuals, it allows accurate
    percentages and trending.
  • Presuming the magnitude is small (i.e. small
    error), the numbers are themselves accurate
    within an acceptable margin.

Before noise:

Category  Value
A         36
B         22
...
N         102

After noise:

Category  Value
A         34
B         23
...
N         108
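Applying noise to a result set can be sketched as follows. This is a minimal illustration assuming Laplace noise with a histogram sensitivity of 1; `epsilon`, the seed, and the helper names are hypothetical, and the counts are the "before" values from the slide. The output will not reproduce the slide's "after" values, which came from the presenter's own noise stream.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_noise(counts, epsilon, rng):
    """Add independent noise to every cell and round to whole numbers."""
    scale = 1.0 / epsilon  # histogram sensitivity is 1
    return {cell: round(value + laplace(scale, rng)) for cell, value in counts.items()}

counts = {"A": 36, "B": 22, "N": 102}  # values from the slide
rng = random.Random(2024)              # hypothetical seed, for repeatability only
noisy = add_noise(counts, epsilon=0.5, rng=rng)
print(noisy)
```

Rounding happens after the noise is added, so it does not weaken the privacy guarantee; it only makes the reported values look like ordinary counts.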
6
Windows Live User Data
  • Our initial case study is based on Windows Live
    user data
  • 550 million Passport users
  • Passport has website visitor self-reported data:
    gender, birth date, occupation, country, zip
    code, etc.
  • Web data has IP address, pages viewed, page view
    duration, browser, operating system, etc.
  • Created two groups for this case study to study
    the acceptability / applicability of differential
    privacy within the WL reporting context
  • WL Sampled Users Web Analytics
  • Customer Churn Analytics

7
Windows Live Example Report
  • As per below, you can see the effect on the data

8
Sampled Users Web Analytics Group
  • New solution built on top of an existing Windows
    Live web analytics solution to provide a sample
    specific to Passport users.
  • Built on top of an OLAP database to allow
    analysts to view the data from multiple
    dimensions.
  • Built as well to showcase the privacy preserving
    histogram for various teams including Channels,
    Search, and Money.

9
Web Analytics Group Feedback
  • Feedback was negative because customers could not
    accept any amount of error.
  • This group had been using reporting systems for
    over two years that had perceived accuracy
    issues.
  • They were adamant that all of the totals match.
    The difference shown in the tables below was not
    acceptable, even though this data was not used
    for financial reconciliation.

Country        Visitors
United States  202
Canada         31

Country        Gender  Visitors
United States  Female  128
United States  Male    75
United States  Total   203
Canada         Female  15
Canada         Male    15
Canada         Total   30
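The mismatch this group objected to is simple arithmetic on the visitor counts from this slide: each cell is noised independently, so the by-gender report need not add up to the by-country report. The helper name below is hypothetical.

```python
# Visitor counts as reported in the two views on this slide.
by_country = {"United States": 202, "Canada": 31}
by_gender = {
    ("United States", "Female"): 128,
    ("United States", "Male"): 75,
    ("Canada", "Female"): 15,
    ("Canada", "Male"): 15,
}

def discrepancies(totals, breakdown):
    """Sum the breakdown per country and subtract the separately reported total."""
    sums = {}
    for (country, _gender), visitors in breakdown.items():
        sums[country] = sums.get(country, 0) + visitors
    return {country: sums.get(country, 0) - total for country, total in totals.items()}

print(discrepancies(by_country, by_gender))  # {'United States': 1, 'Canada': -1}
```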
10
Customer Churn Analysis Group
  • This reporting solution provided an OLAP cube,
    based on an existing targeted marketing system,
    to allow analysts to understand how services
    (Messenger, Mail, Search, Spaces, etc.) are being
    used.
  • A key difference between the groups is that this
    group did not have access to any reporting
    (though it was requested for many months).
  • Within a few weeks of their initial request, CCA
    customers received a working beta with which they
    were able to interact, validate the data, and
    provide feedback on its precision and accuracy.

11
Discussion
  • The collaborative effort led to the customer
    trusting the data, a key difference in comparison
    to the first group.
  • Because of this trust, the small amount of error
    introduced into the system to ensure customer
    privacy was well within a tolerable error margin.
  • The CCA group is in direct marketing and hence
    had to deal more regularly with customer privacy.

12
An important component of the acceptance of
privacy algorithms is the users' trust in the
data.
13
Clinical Researchers Perceptions
  • A pilot qualitative study on the perceptions of
    clinical researchers was recently completed.
  • It noted three categories of six themes:
  • Unaffected Statistics
  • Understanding the privacy algorithms
  • Can get back to the original data
  • Understanding the purpose of the privacy
    algorithms
  • Management ROI
  • Protecting Patient Privacy

14
Unaffected Statistics
  • The most important point: there is no point in
    applying privacy if we get faulty statistics.
  • The primary concern is that healthcare studies
    involve smaller numbers of patients than other
    studies.
  • We are currently planning to provide in the near
    future a healthcare template for the use of these
    algorithms.
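The concern about small patient counts can be made concrete with a rough sketch. It assumes Laplace noise with scale `sensitivity / epsilon`, for which the expected absolute noise equals the scale; `epsilon` and the example counts are hypothetical and do not come from the study.

```python
def expected_relative_error(true_count, epsilon, sensitivity=1.0):
    """Expected |noise| / true_count for Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon  # E|Laplace(0, scale)| equals scale
    return scale / true_count

# The same noise is a large fraction of a small cohort and negligible for a large one.
for n in (50, 500, 50000):
    print(n, expected_relative_error(n, epsilon=0.1))
```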

15
Understanding the privacy algorithms
  • As we have done in these slides, we have
    described the mathematics behind these algorithms
    only briefly.
  • But most clinical researchers are willing to
    accept the science behind them without
    necessarily understanding them.
  • While this is good, it does pose the problem that
    one may implement them without understanding
    them, and so incorrectly guarantee the privacy
    of patients.

16
Can get back to the original data
  • It is very important to get back to the original
    data set if so required.
  • Many existing privacy algorithms perturb the
    data, so while they guarantee the privacy of an
    individual, it is impossible to get back to the
    individual.
  • Healthcare research always requires the ability
    to get back to the original data to potentially
    inform patients of new outcomes.
  • The privacy preserving data analysis approach
    here will allow this ability.

17
Understand the purpose of the privacy algorithms
  • Most educated healthcare professionals understand
    the issues, and providing case studies such as
    the Gov. Weld case makes this more apparent.
  • But we will still want to provide well-worded
    text and/or confidence intervals below a chart or
    report that has privacy algorithms applied.
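One way to produce such a note is sketched below, assuming the noise is Laplace with a known scale, so that P(|noise| <= t) = 1 - exp(-t / scale). The wording of the caption, the function names, and the example scale are hypothetical; if reported values are also rounded, the interval is only approximate.

```python
import math

def noise_half_width(scale, confidence=0.95):
    """Half-width t with P(|noise| <= t) = confidence for Laplace(0, scale) noise."""
    return -scale * math.log(1.0 - confidence)

def caption(reported, scale, confidence=0.95):
    """Build a plain-text note to show beneath a noised value."""
    t = noise_half_width(scale, confidence)
    return (f"Reported value {reported}; noise was added to protect privacy. "
            f"With {confidence:.0%} confidence the true value is within +/-{t:.1f}.")

print(caption(108, scale=2.0))
```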

18
Management ROI
  • We should be limiting the number of users who
    need access to full data. So is there a good
    return-on-investment to provide this extra step
    if you can securely authorize the right people to
    access this data?
  • This is where standards from IRB, privacy
    security steering committees, and the government
    get involved.
  • Most importantly, the ability to share data.

19
Protecting Patient Privacy
For us to be able to analyze and mine medical
data so we can help patients as well as lower the
costs of healthcare, we must first ensure patient
privacy.
20
Future Collaboration
  • As noted above, we are currently working with
    OHSU to build a template for the application of
    these privacy algorithms to healthcare.
  • For more information and/or interest in
    participating in future application research,
    please email Denny Lee at dennyl_at_microsoft.com.

21
Thanks
  • Thanks to Sally Allwardt for helping implement
    the privacy preserving histogram algorithm used
    in this case study.
  • Thanks to Kristina Behr, Lead Marketing Manager,
    for all of her help and feedback with this case
    study.

22
Practical Privacy: The SuLQ Framework
  • Reference paper: Practical Privacy: The SuLQ
    Framework
  • Conceptually, this application of privacy can be
    applied to
  • Principal component analysis
  • k-means clustering
  • ID3 algorithm
  • Perceptron algorithm
  • Apparently, all algorithms in the statistical
    queries learning model.
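To illustrate how an algorithm in this list can be expressed through noisy queries, here is a hedged sketch of one k-means update in one dimension that touches the data only through noisy counts and noisy sums. It is not the construction from the reference paper: Laplace noise is swapped in for simplicity, points are assumed to lie in [0, 1], and `epsilon` and all names are hypothetical.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_kmeans_step(points, centers, epsilon, rng):
    """One 1-D k-means update computed from noisy per-cluster counts and sums."""
    k = len(centers)
    counts = [0] * k
    sums = [0.0] * k
    for x in points:
        nearest = min(range(k), key=lambda i: abs(x - centers[i]))
        counts[nearest] += 1
        sums[nearest] += x
    # One record changes one count by 1 and one sum by at most 1 (points in [0, 1]),
    # so the combined L1 sensitivity of all counts and sums is at most 2.
    scale = 2.0 / epsilon
    new_centers = []
    for j in range(k):
        noisy_count = counts[j] + laplace(scale, rng)
        noisy_sum = sums[j] + laplace(scale, rng)
        # Keep the old center when the noisy count is too small to divide by.
        new_centers.append(noisy_sum / noisy_count if noisy_count >= 1 else centers[j])
    return new_centers

points = [0.1, 0.2, 0.8, 0.9]  # hypothetical data in [0, 1]
print(noisy_kmeans_step(points, [0.0, 1.0], epsilon=1.0, rng=random.Random(7)))
```

With so few points the noise dominates the result; the sketch is only meant to show the shape of the computation, in which the raw records are never returned to the analyst.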