Privacy - PowerPoint PPT Presentation

1
Privacy
  • Prof. Bhavani Thuraisingham
  • The University of Texas at Dallas
  • March 5, 2008
  • Lecture 18

2
What is Privacy
  • Medical Community
  • Privacy is about a patient determining what
    patient/medical information the doctor should
    release about him/her
  • Financial community
  • A bank customer determines what financial
    information the bank should release about him/her
  • Government community
  • The FBI collects information about US citizens.
    However, the FBI determines what information about
    a US citizen it can release to, say, the CIA

3
Some Privacy concerns
  • Medical and Healthcare
  • Employers, marketers, or others knowing of
    private medical concerns
  • Security
  • Allowing access to individuals' travel and
    spending data
  • Allowing access to web surfing behavior
  • Marketing, Sales, and Finance
  • Allowing access to individuals' purchases

4
Data Mining as a Threat to Privacy
  • Data mining gives us facts that are not obvious
    to human analysts of the data
  • Can general trends across individuals be
    determined without revealing information about
    individuals?
  • Possible threats
  • Combine collections of data and infer information
    that is private
  • Disease information from prescription data
  • Military action from pizza deliveries to the Pentagon
  • Need to protect the associations and correlations
    between the data that are sensitive or private

5
Some Privacy Problems and Potential Solutions
  • Problem: Privacy violations that result from
    data mining
  • Potential solution: Privacy-preserving data
    mining
  • Problem: Privacy violations that result from
    the inference problem
  • Inference is the process of deducing sensitive
    information from the legitimate responses
    received to user queries
  • Potential solution: Privacy constraint processing
  • Problem: Privacy violations due to unencrypted
    data
  • Potential solution: Encryption at different
    levels
  • Problem: Privacy violations due to poor system
    design
  • Potential solution: Develop a methodology for
    designing privacy-enhanced systems

6
Privacy Constraint Processing
  • Privacy constraint processing
  • Based on prior research in security constraint
    processing
  • Simple constraint: An attribute of a document is
    private
  • Content-based constraint: If a document contains
    information about X, then it is private
  • Association-based constraint: Two or more
    documents taken together are private; individually,
    each document is public
  • Release constraint: After X is released, Y becomes
    private
  • Augment a database system with a privacy
    controller for constraint processing
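
The four constraint types listed above can be sketched as a small privacy controller that sits in front of a document store. This is only an illustrative sketch: the class names (Document, PrivacyController), the way constraints are encoded, and the sample documents are assumptions made for the example, not the design described in the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    attributes: dict
    text: str = ""


@dataclass
class PrivacyController:
    """Hypothetical privacy controller; all fields are illustrative."""
    private_attributes: set = field(default_factory=set)  # simple constraints
    private_topics: set = field(default_factory=set)      # content-based constraints
    private_groups: list = field(default_factory=list)    # association-based constraints
    release_rules: dict = field(default_factory=dict)     # release constraints: X -> Y
    released: set = field(default_factory=set)            # names released so far

    def visible_attributes(self, doc):
        # Simple constraint: private attributes are never shown.
        return {k: v for k, v in doc.attributes.items()
                if k not in self.private_attributes}

    def may_release(self, doc):
        # Content-based constraint: documents about a private topic are private.
        if any(topic in doc.text.lower() for topic in self.private_topics):
            return False
        # Release constraint: once X has been released, Y becomes private.
        for x, y in self.release_rules.items():
            if x in self.released and y == doc.name:
                return False
        # Association-based constraint: a whole private group must never be out.
        after = self.released | {doc.name}
        if any(group <= after for group in self.private_groups):
            return False
        return True

    def release(self, doc):
        if not self.may_release(doc):
            return None
        self.released.add(doc.name)
        return self.visible_attributes(doc)


ctrl = PrivacyController(
    private_attributes={"ssn"},
    private_topics={"cancer"},
    private_groups=[{"itinerary", "hotel_bill"}],
    release_rules={"press_note": "draft_report"},
)
itinerary = Document("itinerary", {"owner": "John", "ssn": "000-00-0000"}, "flights to England")
hotel_bill = Document("hotel_bill", {"owner": "John"}, "two nights")
lab_result = Document("lab_result", {"owner": "John"}, "Cancer screening")
press_note = Document("press_note", {"owner": "clinic"}, "public notice")
draft_report = Document("draft_report", {"owner": "clinic"}, "internal draft")

r1 = ctrl.release(itinerary)     # released, with "ssn" hidden
r2 = ctrl.release(hotel_bill)    # blocked: private together with itinerary
r3 = ctrl.release(lab_result)    # blocked: content mentions a private topic
r4 = ctrl.release(press_note)    # released
r5 = ctrl.release(draft_report)  # blocked: press_note was already released
```

Keeping the history of released documents inside the controller is what makes the association-based and release constraints checkable; a stateless filter could only enforce the simple and content-based ones.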

7
Architecture for Privacy Constraint Processing
[Figure: architecture diagram. Components: User Interface Manager; Constraint Manager (privacy constraints); Database Design Tool (constraints during database design operation); Update Processor (constraints during update operation); Query Processor (constraints during query and release operations); DBMS; Database]

8
Semantic Model for Privacy Control
Dark lines/boxes contain private information
[Figure: semantic network. Nodes: Patient John, Cancer, Influenza, John's address, England. Link labels: Has disease, address, Travels frequently]

9
Privacy Preserving Data Mining
  • Prevent useful results from mining
  • Introduce cover stories to give false results
  • Only make a sample of data available so that an
    adversary is unable to come up with useful rules
    and predictive functions
  • Randomization
  • Introduce random values into the data and/or
    results
  • Challenge is to introduce random values without
    significantly affecting the data mining results
  • Give range of values for results instead of exact
    values
  • Secure Multi-party Computation
  • Each party knows its own inputs; encryption
    techniques are used to compute the final results
  • Rules, predictive functions
  • Approach: Only make a sample of data available
  • Limits ability to learn good classifier
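
The randomization idea above can be illustrated with a minimal sketch: zero-mean noise is added to each individual value, so single records are distorted while an aggregate such as the mean stays close to the original. The data set, the noise level (sigma=10) and the helper names perturb and to_range are assumptions chosen for the example, and the sketch makes no claim about how much privacy this particular noise level gives.

```python
import random
import statistics


def perturb(values, sigma, rng):
    """Add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def to_range(value, width):
    """Report a range (low, high) of the given width instead of the exact value."""
    low = (value // width) * width
    return (low, low + width)


rng = random.Random(42)                               # fixed seed for repeatability
ages = [rng.randint(20, 80) for _ in range(10_000)]   # synthetic data
noisy_ages = perturb(ages, sigma=10.0, rng=rng)

true_mean = statistics.mean(ages)
noisy_mean = statistics.mean(noisy_ages)              # close to true_mean

age_range = to_range(37, 10)                          # a range instead of the exact age
```

Because the noise has mean zero, its effect on the average shrinks as the number of records grows, which is the sense in which random values can be introduced without significantly affecting this particular mining result.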

10
Cryptographic Approaches for Privacy Preserving
Data Mining
  • Secure Multi-party Computation (SMC) for PPDM
  • Mainly used for distributed data mining.
  • Provably secure under some assumptions.
  • Learned models are accurate.
  • Efficient, problem-specific cryptographic solutions
    have been developed for many distributed data
    mining problems.
  • Mainly assumes the semi-honest model (i.e. parties
    follow the protocols).
  • The malicious model has also been explored recently
    (e.g. Kantarcioglu and Kardes paper in this workshop).
  • Many SMC-based PPDM algorithms share common
    sub-protocols (e.g. dot product, summation, etc.).
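
Summation is one of the common sub-protocols mentioned above. The sketch below simulates a secure sum based on additive secret sharing under the semi-honest assumption. It runs all parties inside one process, so it shows only the arithmetic; a real deployment would also need private channels between parties. The modulus and the function names are assumptions for illustration, and this is not presented as the specific protocol used in the cited work.

```python
import secrets

# Assumed public modulus; must be larger than any possible sum of inputs.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split a non-negative value into n random shares that add up to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    n = len(private_inputs)
    # share_matrix[i][j] is the share that party i sends to party j.
    share_matrix = [make_shares(x, n) for x in private_inputs]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(share_matrix[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # The published partial sums reveal the total but not any single input.
    return sum(partial_sums) % MODULUS


total = secure_sum([12, 7, 30])
```

Each individual share is uniformly random, so a party that follows the protocol learns nothing about another party's input beyond what the final total implies.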

11
Cryptographic Approaches for Privacy Preserving
Data Mining
  • Drawbacks
  • Still not efficient enough for very large
    datasets (e.g. petabyte-sized datasets?)
  • Semi-honest model may not be realistic
  • Malicious model is even slower
  • Possible new directions
  • New models that can trade off better between
    efficiency and security
  • Game theoretic / incentive issues in PPDM
  • Combining anonymization and cryptographic
    techniques for PPDM

12
Perturbation Based Approaches for Privacy
Preserving Data Mining
  • Goal: Distort data while still preserving some
    properties for data mining purposes.
  • Additive-based
  • Multiplicative-based
  • Condensation-based
  • Decomposition
  • Data Swapping
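
As an illustration of one item in this list, data swapping can be sketched as shuffling the values of a sensitive column across records. The link between a record and its sensitive value is distorted, while the column's overall distribution (one of the properties a miner may need) is preserved exactly. The record layout and the function name swap_column are assumptions for the example; practical data swapping schemes are usually more selective about which values they exchange.

```python
import random
from collections import Counter


def swap_column(records, column, rng):
    """Return copies of the records with one column's values shuffled among them."""
    values = [r[column] for r in records]
    rng.shuffle(values)
    return [{**r, column: v} for r, v in zip(records, values)]


# Synthetic records, invented for this example.
records = [
    {"zip": "75080", "diagnosis": "influenza"},
    {"zip": "75081", "diagnosis": "cancer"},
    {"zip": "75082", "diagnosis": "influenza"},
    {"zip": "75083", "diagnosis": "asthma"},
]

swapped = swap_column(records, "diagnosis", random.Random(7))

original_counts = Counter(r["diagnosis"] for r in records)
swapped_counts = Counter(r["diagnosis"] for r in swapped)
```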

13
Perturbation Based Approaches for Privacy
Preserving Data Mining
  • Goal: Achieve a high data mining accuracy with
    maximum privacy protection.

14
Perturbation Based Approaches for Privacy
Preserving Data Mining
  • Privacy is a personal choice, so approaches should
    be individually adaptable (Liu, Kantarcioglu and
    Thuraisingham, ICDM06)

15
Perturbation Based Approaches for Privacy
Preserving Data Mining
  • The trend is to make PPDM approaches fit
    reality
  • We investigated perturbation-based approaches
    with real-world data sets
  • We give an applicability study of the current
    approaches
  • Liu, Kantarcioglu and Thuraisingham, DKE 07
  • We found that:
  • Reconstructing the original distribution may
    not work well with real-world data sets
  • Distribution reconstruction is a hard problem and
    should not be used as an intermediate step
  • Try to modify perturbation techniques and adapt
    some data mining tools, e.g. Liu, Kantarcioglu
    and Thuraisingham, Novel decision tree, UTD
    technical report 06

16
CPT: Confidentiality, Privacy and Trust
  • Before I as a user of Organization A send data
    about me to organization B, I read the privacy
    policies enforced by organization B
  • If I agree to the privacy policies of
    organization B, then I will send data about me to
    organization B
  • If I do not agree with the policies of
    organization B, then I can negotiate with
    organization B
  • Even if the web site states that it will not
    share private information with others, do I trust
    the web site?
  • Note: while confidentiality is enforced by the
    organization, privacy is determined by the user.
    Therefore, for confidentiality, the organization
    will determine whether a user can have the data.
    If so, then the organization can further
    determine whether the user can be trusted

17
Platform for Privacy Preferences (P3P): What is
it?
  • P3P is an emerging industry standard that enables
    web sites to express their privacy practices in a
    standard format
  • The format of the policies can be automatically
    retrieved and understood by user agents
  • It is a product of the W3C (World Wide Web Consortium)
  • www.w3c.org
  • When a user enters a web site, the privacy
    policies of the web site are conveyed to the user.
    If the privacy policies are different from the user's
    preferences, the user is notified. The user can then
    decide how to proceed

18
Platform for Privacy Preferences (P3P):
Organizations
  • Several major corporations are working on P3P
    standards including
  • Microsoft
  • IBM
  • HP
  • NEC
  • Nokia
  • NCR
  • Web sites have also implemented P3P
  • Semantic web group has adopted P3P

19
Platform for Privacy Preferences (P3P):
Specifications
  • The initial version of P3P used RDF to specify
    policies. The recent version has migrated to XML
  • P3P policies use XML with namespaces for
    encoding policies
  • P3P has its own statements and data types
    expressed in XML. P3P schemas utilize XML schemas
  • The P3P specification released in January 2005 uses
    a catalog shopping example to explain concepts. P3P
    is an international standard and is an ongoing
    project
  • Example: Catalog shopping
  • Your name will not be given to a third party but
    your purchases will be given to a third party
  • <POLICIES xmlns="http://www.w3.org/2002/01/P3Pv1">
  • <POLICY name = - - - -
  • </POLICY>
  • </POLICIES>
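
To make the catalog shopping example concrete, the sketch below shows how a user agent might read a policy and compare it with a user's preferences. The policy fragment is a simplified illustration modeled on P3P 1.0 element names (STATEMENT, RECIPIENT, DATA-GROUP, DATA); it is not a complete or schema-validated policy. The policy name, the data references, the choice of which recipient values count as third parties, and the helper functions are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

P3P_NS = "http://www.w3.org/2002/01/P3Pv1"

# Simplified, illustrative policy: the name stays with the site ("ours"),
# while miscellaneous data such as purchases also goes to unrelated parties.
POLICY_XML = f"""
<POLICIES xmlns="{P3P_NS}">
  <POLICY name="catalog-example">
    <STATEMENT>
      <RECIPIENT><ours/></RECIPIENT>
      <DATA-GROUP><DATA ref="#user.name"/></DATA-GROUP>
    </STATEMENT>
    <STATEMENT>
      <RECIPIENT><ours/><unrelated/></RECIPIENT>
      <DATA-GROUP><DATA ref="#dynamic.miscdata"/></DATA-GROUP>
    </STATEMENT>
  </POLICY>
</POLICIES>
"""

# Assumption: these recipient values are treated as third parties.
THIRD_PARTY_RECIPIENTS = {"other-recipient", "unrelated", "public"}


def shared_with_third_parties(policy_xml):
    """Collect the data references that some statement gives to a third party."""
    ns = {"p3p": P3P_NS}
    root = ET.fromstring(policy_xml)
    shared = set()
    for stmt in root.iterfind(".//p3p:STATEMENT", ns):
        recipient = stmt.find("p3p:RECIPIENT", ns)
        recipients = set()
        if recipient is not None:
            # Tags look like "{namespace}ours"; keep only the local name.
            recipients = {child.tag.split("}")[1] for child in recipient}
        if recipients & THIRD_PARTY_RECIPIENTS:
            for data in stmt.iterfind(".//p3p:DATA", ns):
                shared.add(data.get("ref"))
    return shared


def conflicts(policy_xml, user_private_refs):
    """Data the user wants kept private but the policy shares with third parties."""
    return shared_with_third_parties(policy_xml) & set(user_private_refs)


name_conflicts = conflicts(POLICY_XML, {"#user.name"})
purchase_conflicts = conflicts(POLICY_XML, {"#dynamic.miscdata"})
```

A non-empty result is the point at which a user agent would notify the user, who can then decide how to proceed.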

20
P3P and Legal Issues
  • P3P does not replace laws
  • P3P works together with the law
  • What happens if the web sites do not honor their
    P3P policies?
  • Then appropriate legal actions will have to be
    taken
  • XML is the technology to specify P3P policies
  • Policy experts will have to specify the policies
  • Technologists will have to develop the
    specifications
  • Legal experts will have to take actions if the
    policies are violated

21
Privacy for Assured Information Sharing
[Figure: federation diagram. Components hold the Data/Policy for Agency A, Agency B and Agency C; each component exports data/policy to the Data/Policy for Federation]

22
Privacy Preserving Surveillance
[Figure: surveillance pipeline. Elements: Raw video surveillance data; Face Detection and Face Derecognizing system (suspicious people found; faces of trusted people derecognized to preserve privacy); Suspicious Event Detection System (suspicious events found); Comprehensive security report listing suspicious events and people detected; Manual inspection of video data; Report of security personnel]

23
Directions: Foundations of Privacy Preserving
Data Mining
  • We proved in 1990 that the inference problem in
    general was unsolvable, therefore the suggestion
    was to explore the solvability aspects of the
    problem.
  • Can we do something similar for privacy?
  • Is the general privacy problem solvable?
  • What are the complexity classes?
  • What is the storage and time complexity?
  • We need to explore the foundation of PPDM and
    related privacy solutions

24
Directions: Testbed Development and Application
Scenarios
  • There are numerous PPDM related algorithms. How
    do they compare with each other? We need a
    testbed with realistic parameters to test the
    algorithms
  • It is time to develop real world scenarios where
    these algorithms can be utilized
  • Is it feasible to develop realistic commercial
    products, or should each organization adapt
    products to suit its needs?

25
Key Points
  • 1. There is no universal definition of privacy;
    each organization must define what it means by
    privacy and develop appropriate privacy policies
  • 2. Technology alone is not sufficient for privacy.
    We need technologists, policy experts, legal
    experts and social scientists to work on privacy
  • 3. Some well-known people have said "Forget about
    privacy". Therefore, should we pursue research on
    privacy?
  • There are interesting research problems, so there is
    a need to continue with research
  • Something is better than nothing
  • Try to prevent privacy violations and, if
    violations occur, then prosecute
  • 4. We need to tackle privacy from all directions

26
Application Specific Privacy?
  • Examining privacy may make sense for healthcare
    and financial applications
  • Does privacy work for Defense and Intelligence
    applications?
  • Is it even meaningful to have privacy for
    surveillance and geospatial applications?
  • Once the image of my house is on Google Earth,
    then how much privacy can I have?
  • I may want my location to be private, but does it
    make sense if a camera can capture a picture of
    me?
  • If there are sensors all over the place, is it
    meaningful to have privacy preserving
    surveillance?
  • This suggests that we need application-specific
    privacy
  • It is not meaningful to examine PPDM for every
    data mining algorithm and for every application

27
Data Mining and Privacy: Friends or Foes?
  • They are neither friends nor foes
  • Need advances in both data mining and privacy
  • Need to design flexible systems
  • For some applications one may have to focus
    entirely on pure data mining while for some
    others there may be a need for privacy-preserving
    data mining
  • Need flexible data mining techniques that can
    adapt to the changing environments
  • Technologists, legal specialists, social
    scientists, policy makers and privacy advocates
    MUST work together