1
I'm in the Database, But Nobody Knows
  • Cynthia Dwork, Microsoft Research

2
Many Threats to Privacy of Electronic Data
  • Theft
  • Phishing
  • Viruses
  • Cryptanalysis
  • Changing Privacy Policies

3
This Talk: Privacy-Preserving Data Analysis
  • First-Tier Motivating Examples
      • Analysis of Census Data, Medical Outcomes Data,
        GWAS data, Epidemiology, Analysis of Vehicle
        Braking Records
  • Second-Tier Examples
      • Training an advertising classifier,
        Recommendation System, Netflix Challenge

4
Pure Privacy Problem
  • Difficult Even If
      • Curator is Angel
      • Data are in Vault

5
Typical Suggestions
  • Large Set Queries
      • How many MSFT employees have Sickle Cell Trait
        (CST)?
      • How many MSFT employees who are not female
        Distinguished Scientists with very curly hair
        have the CST?
  • Add Random Noise to True Answer
      • Average of responses to repeated queries
        converges to true answer
      • Can't simply detect repetition (undecidable)
  • Detect When Answering is Unsafe
      • Refusal can be disclosive
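
Why the first two suggestions fail can be seen in a toy simulation. The sketch below is not from the talk: it assumes a hypothetical five-row employee table (names and values invented for illustration) and a noise scale picked arbitrarily. It shows that subtracting two large-set counts isolates one person, and that averaging many independently noised answers to the same query washes the noise out.

```python
import random

# Hypothetical toy database: (name, has_trait). Invented for illustration.
EMPLOYEES = [
    ("alice", True),
    ("bob", False),
    ("carol", True),
    ("dave", False),
    ("erin", False),
]

def count_with_trait(rows, exclude=None):
    """Exact count of rows with the trait, optionally excluding one name."""
    return sum(1 for name, has_trait in rows if has_trait and name != exclude)

def differencing_attack(rows, target):
    """Two 'large set' queries whose difference reveals the target's bit."""
    return count_with_trait(rows) - count_with_trait(rows, exclude=target)

def noisy_count(rows, rng, scale=10.0):
    """True count plus fresh zero-mean Laplace noise on every call."""
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return count_with_trait(rows) + noise

def averaging_attack(rows, rng, repetitions=20000):
    """Average many noisy answers to the same query."""
    return sum(noisy_count(rows, rng) for _ in range(repetitions)) / repetitions

if __name__ == "__main__":
    rng = random.Random(0)
    print("alice has trait:", differencing_attack(EMPLOYEES, "alice") == 1)
    print("averaged noisy count:", averaging_attack(EMPLOYEES, rng))
```

A single noisy answer here is typically off by ten or more, yet the average over many repetitions lands close to the true count of 2, which is the reason fresh noise per query is not enough on its own.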

6
A Litany
7
AOL Search History Release (2006)
Name: Thelma Arnold
Age: 62, widow
Residence: Lilburn, GA
8
William Weld's Medical Record [Sweeney02]
  • HMO data only: ethnicity, visit date, diagnosis,
    procedure, medication, total charge
  • Voter registration data only: name, address, date
    registered, party affiliation, date last voted
  • In both: ZIP, birth date, sex
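
A linkage attack of this kind amounts to a join on the fields the two sources share. The sketch below is an illustration only, assuming the shared fields are ZIP, birth date, and sex; every record, name, and the `link` helper are hypothetical and invented for the example.

```python
# Hypothetical records, invented for illustration only.
HMO_RECORDS = [
    {"zip": "02138", "birth_date": "1950-01-02", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1962-07-15", "sex": "F", "diagnosis": "condition B"},
]
VOTER_ROLL = [
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-02", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birth_date": "1971-03-30", "sex": "F"},
]
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def link(medical, voters, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) for medical rows matching exactly one voter."""
    index = {}
    for voter in voters:
        index.setdefault(tuple(voter[k] for k in keys), []).append(voter["name"])
    matches = []
    for row in medical:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row["diagnosis"]))
    return matches

if __name__ == "__main__":
    print(link(HMO_RECORDS, VOTER_ROLL))
```

Neither table contains both a name and a diagnosis, yet the join produces the pair whenever the shared fields single out one voter.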
9
(No Transcript)
10
GWAS Membership [Homer et al. 08]
  • SNP: Single Nucleotide (A, C, G, T) Polymorphism
  • Reference Population: Major Allele (C) 94%,
    Minor Allele (T) 6%
  • Genome-Wide Association Study: Allele frequencies
    for many thousands of SNPs
11
Anonymized Social Networks [BackstromDK07]
  • Magic Step
  • Isolate lightly linked-in subgraphs from rest of
    graph
  • Special structure of subgraph permits finding A, B

12
Definitional Failures
  • Failure to Cope with Auxiliary Information
      • Existing and future databases, newspaper reports,
        Flickr, literature, etc.
  • Definitions are Syntactic and Ad Hoc
  • Dalenius's Ad Omnia Guarantee (1977)
      • Anything that can be learned about a
        respondent from the statistical database can be
        learned without access to the database
      • Unachievable

13
Parable: How Tall is Pamela Jones (Groklaw)?
  • An Admittedly Unreasonable Impossibility Proof
      • Database teaches average heights of population
        subgroups
      • PJ is 2 inches shorter than avg Swedish woman
      • PJ's height learnable with the DB, not learnable
        without.
      • PJ loses privacy whether or not she is in the
        database
  • Suggests new notion of privacy: risk incurred by
    joining DB
      • The outcome of any analysis is essentially
        equally likely, independent of whether any
        individual joins or refrains from joining the
        dataset. (The likelihood is over the choices
        made by the algorithm.)
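
One classical way to make an outcome "essentially equally likely" whatever an individual's data says is randomized response. It is a survey technique swapped in here purely for illustration, not taken from the slides. In the sketch below, the fair-coin probabilities are the textbook choice and all function names are hypothetical. Each reported answer is at most 3 times more likely under one true answer than under the other, yet the population fraction can still be estimated.

```python
import random

def randomized_response(truth, rng):
    """First coin heads: answer truthfully. Tails: answer with a second coin."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def prob_yes(truth):
    """Exact probability of reporting 'yes' given the true answer."""
    return 0.75 if truth else 0.25

def worst_case_ratio():
    """Largest ratio of outcome probabilities across the two possible truths."""
    ratios = []
    for outcome_is_yes in (True, False):
        p_true = prob_yes(True) if outcome_is_yes else 1 - prob_yes(True)
        p_false = prob_yes(False) if outcome_is_yes else 1 - prob_yes(False)
        ratios.append(max(p_true / p_false, p_false / p_true))
    return max(ratios)

def estimate_fraction(reports):
    """Invert the known bias: E[reported yes rate] = 1/4 + p/2."""
    observed = sum(reports) / len(reports)
    return 2 * (observed - 0.25)

if __name__ == "__main__":
    rng = random.Random(1)
    truths = [i % 10 < 3 for i in range(20000)]  # 30% true answers
    reports = [randomized_response(t, rng) for t in truths]
    print("worst-case ratio:", worst_case_ratio())
    print("estimated fraction:", estimate_fraction(reports))
```

The randomness belongs to the algorithm (the coins), not to the data, which matches the parenthetical remark that the likelihood is over the choices made by the algorithm.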

14
Differential Privacy [Dwork-McSherry-Nissim-Smith
2006]
M gives $\varepsilon$-differential privacy if for all
adjacent $D_1$ and $D_2$, and all $C \subseteq \mathrm{range}(M)$:
$$\Pr[M(D_1) \in C] \le e^{\varepsilon} \, \Pr[M(D_2) \in C]$$
Neutralizes all linkage attacks. Composes
unconditionally and automatically: $\sum_i \varepsilon_i$
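
A standard way to meet this definition for a counting query is the Laplace mechanism: add Laplace noise with scale 1/ε to the true count. The sketch below is a minimal illustration, assuming adjacency means adding or removing one row (so a count changes by at most 1). The function names are hypothetical, and the sampler is a toy: it ignores floating-point subtleties that a production implementation would have to handle.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(rows, predicate, epsilon, rng):
    """Counting query with Laplace noise of scale 1/epsilon.

    Adding or removing one row changes a count by at most 1 (sensitivity 1).
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1 / epsilon, rng)

def laplace_density(x, center, scale):
    """Density of Laplace(center, scale) at x."""
    return math.exp(-abs(x - center) / scale) / (2 * scale)

def density_ratio(x, count_d1, count_d2, epsilon):
    """Ratio of output densities at x for two databases' true counts."""
    scale = 1 / epsilon
    return laplace_density(x, count_d1, scale) / laplace_density(x, count_d2, scale)

if __name__ == "__main__":
    rng = random.Random(0)
    rows = [{"cst": i % 7 == 0} for i in range(100)]
    print(private_count(rows, lambda r: r["cst"], epsilon=0.5, rng=rng))
    # For true counts 10 and 11, the ratio never exceeds e^epsilon.
    print(max(density_ratio(x / 2, 10, 11, 0.5) for x in range(-40, 80)))
```

Because the density ratio is bounded by $e^{\varepsilon}$ at every output point, integrating over any set $C$ gives the inequality in the definition.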
15
Differential Privacy
  • Resilience to All Auxiliary Information
      • Past, present, future data sources and algorithms
  • Low-error high-privacy DP techniques exist for
    many problems
      • datamining tasks (association rules, decision
        trees, clustering, ...), contingency tables,
        histograms, synthetic data sets for query logs,
        machine learning (boosting, statistical queries
        learning model, SVMs, logistic regression),
        various statistical estimators, network trace
        analysis, recommendation systems, ...
  • Programming Platforms
      • http://research.microsoft.com/en-us/projects/PINQ/
      • http://userweb.cs.utexas.edu/shmat/shmat_nsdi10.pdf

Download today!
16
(No Transcript)
17
(No Transcript)
18
Can we store and share your information with
health officials and researchers? This
information can be very helpful in monitoring
regional health conditions, plan flu response,
and conduct health research. By allowing the
responses to the survey questions to be used for
public health, education and research purposes,
you can help your community.
19
Snow 1854
[Map showing cholera cases and the suspected pump]
20
https://h1n1.cloudapp.net/Privacy.aspx
Microsoft may also disclose information if
required to do so by law or in the good faith
belief that such action is necessary to: (a)
conform to the edicts of the law or comply with
legal process served on Microsoft or the Site;
(b) protect and defend the rights or property of
Microsoft and our family of Web sites; or (c) act
in urgent circumstances to protect the personal
safety of users of Microsoft products or members
of the public.
21
Mission Creep
Think of the children!
Never store the data!
22
Pan-Private Streaming Algorithms [DNPRY10]
  • Private inside and out
  • Completely hide the pattern of appearances of any
    individual
      • Presence, absence, frequency, etc.
  • Protect against mission creep, subpoena, intrusion
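
"Private inside" means that even the algorithm's stored state should reveal little about any one individual. The sketch below is a toy density estimator loosely modeled on that idea. The class name, the bias of ε/4, and the output noise scale are assumptions chosen for illustration, and no formal privacy guarantee is claimed for this code. Each user's stored bit is a fair coin until the user appears, and a slightly biased coin afterwards, redrawn on every appearance.

```python
import random

class PanPrivateDensity:
    """Toy density estimator whose stored state is itself randomized."""

    def __init__(self, universe, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        # Before any data arrives, every user's bit is a fair coin.
        self.bits = {user: rng.random() < 0.5 for user in universe}

    def observe(self, user):
        # Redraw on every appearance, so the bit does not encode how often
        # the user appeared; it is only slightly biased toward 1.
        self.bits[user] = self.rng.random() < 0.5 + self.epsilon / 4

    def estimate(self):
        """Estimate the fraction of users that appeared at least once."""
        n = len(self.bits)
        ones = sum(self.bits.values()) / n
        # Output noise: Laplace(0, 1/(epsilon * n)) on the fraction of ones.
        scale = 1 / (self.epsilon * n)
        noise = self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)
        # E[ones] = 1/2 + (epsilon / 4) * density, so invert that map.
        return 4 * (ones + noise - 0.5) / self.epsilon

if __name__ == "__main__":
    est = PanPrivateDensity(range(20000), epsilon=1.0, rng=random.Random(2))
    for user in range(6000):      # 30% of the universe appears...
        est.observe(user)
    for user in range(1000):      # ...some of them more than once
        est.observe(user)
    print(est.estimate())
```

Someone who seizes `bits` mid-stream sees, for each user, a coin that is nearly as likely to show either face whether or not that user ever appeared, which is the sense in which a subpoena or intrusion learns little.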

23
Differential Privacy: Limitations and Challenges
  • Can't study outliers
  • Privacy erosion over multiple analyses is
    cumulative
  • Privacy erosion over multiple databases is
    cumulative
      • Compare real world to one in which my info is
        everywhere deleted, looking at a lifetime of
        exposure against worst-case adversary/information/
        collection of databases
      • Formally capture reasonable worlds?
  • What are the right questions to ask about social
    networks?
      • Removing one person can affect data of many other
        people
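
Cumulative erosion is often managed in practice with a privacy budget: each analysis is charged its ε, and the charges add up under basic composition. The sketch below is a minimal bookkeeping illustration; the `PrivacyBudget` class and its method names are hypothetical and do not belong to any particular platform.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic (additive) composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one analysis; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total_epsilon - self.spent

if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    print(budget.charge(0.4))  # remaining budget after the first analysis
    print(budget.charge(0.4))  # remaining budget after the second
    try:
        budget.charge(0.4)     # a third analysis would exceed the total
    except RuntimeError as err:
        print("refused:", err)
```

The refusal here depends only on the ε values requested, never on the data, so the decision to refuse does not itself leak information about the database.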

24
Utility Implies Exposure to Harm
  • Database teaches that smoking causes cancer.
  • Smoker S's insurance premiums rise.
  • But learning that smoking causes cancer is the
    whole point.
  • Smoker S enrolls in a smoking cessation program.
  • May be fine for First-Tier Uses, but what about
    Second Tier?
  • Who decides, and how?

25
Pause
  • Ad Omnia definition of privacy
      • Composes automatically and obliviously,
        independent of aux info
  • Achievable, frequently with great accuracy
  • Usable
      • Can program using a privacy-preserving interface
  • Many questions remain
      • Is there a weaker ad omnia definition than
        differential privacy that also composes
        automatically?

26
Which Ad(s) Am I Charged For?
27
More Subtle Attack
  • A potentially embarrassing interest

    "In a Long Leg Cast": Anna models her cast on a
    stoop, wiggling/rubbing her casted toes.
    Length: 90 minutes
  • User clicks on ad targeted to wealthy, whale-loving
    LLC enthusiasts ⇒ user is a wealthy, whale-loving
    LLC enthusiast
  • User understands that interest in LLC is
    communicated to seller, but does not understand
    that wealth and love of whales are also
    communicated
  • Should privacy of these attributes be protected?
    What about race?
  • Fairness: should poor children see different ads
    than rich children?

28
Wall Street Journal 4/4/2010
  • User visits capitalone.com
  • Capital One uses tracking information via
    tracking network x1 to personalize offers
  • Danger: steering minorities into higher rates
  • In principle, law seems to allow credit outcomes
    based on browsing history, which may encode race
  • How can an ad network prevent steering?
  • What is fairness in classification, and how can
    we achieve it?

29
Work in Progress
D., Fiat, Hardt, Pitassi, Reingold, Zemel
30
Thank You!