Title: Privacy
1Privacy
- Prof. Bhavani Thuraisingham
- The University of Texas at Dallas
- March 5, 2008
- Lecture 18
2What is Privacy
- Medical community
  - Privacy is about a patient determining what patient/medical information the doctor should release about him/her
- Financial community
  - A bank customer determines what financial information the bank should release about him/her
- Government community
  - The FBI would collect information about US citizens. However, the FBI determines what information about a US citizen it can release to, say, the CIA
3Some Privacy concerns
- Medical and Healthcare
  - Employers, marketers, or others knowing of private medical concerns
- Security
  - Allowing access to individuals' travel and spending data
  - Allowing access to web surfing behavior
- Marketing, Sales, and Finance
  - Allowing access to individuals' purchases
4Data Mining as a Threat to Privacy
- Data mining gives us facts that are not obvious to human analysts of the data
- Can general trends across individuals be determined without revealing information about individuals?
- Possible threats
  - Combine collections of data and infer information that is private
    - Disease information from prescription data
    - Military action from pizza delivery to the Pentagon
- Need to protect the associations and correlations between the data that are sensitive or private
5Some Privacy Problems and Potential Solutions
- Problem: Privacy violations that result from data mining
  - Potential solution: Privacy-preserving data mining
- Problem: Privacy violations that result from the inference problem
  - Inference is the process of deducing sensitive information from the legitimate responses received to user queries
  - Potential solution: Privacy constraint processing
- Problem: Privacy violations due to unencrypted data
  - Potential solution: Encryption at different levels
- Problem: Privacy violations due to poor system design
  - Potential solution: Develop a methodology for designing privacy-enhanced systems
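The inference problem described above can be made concrete with a small sketch. It assumes a hypothetical salary table and a query interface that only answers aggregate (SUM) queries; the names `RECORDS` and `sum_salary` are illustrative and do not come from the lecture. Each query looks legitimate on its own, yet the difference between two responses exposes one individual's value.

```python
# Hypothetical records: (name, department, salary). Salary is the sensitive value.
RECORDS = [
    ("alice", "radiology", 82000),
    ("bob", "radiology", 91000),
    ("carol", "radiology", 77000),
]


def sum_salary(predicate):
    """Answer an aggregate query; individual rows are never returned."""
    return sum(salary for name, dept, salary in RECORDS if predicate(name, dept))


# Two queries that are each legitimate on their own.
total_all = sum_salary(lambda name, dept: dept == "radiology")
total_without_bob = sum_salary(
    lambda name, dept: dept == "radiology" and name != "bob"
)

# Subtracting the two legitimate responses reveals one person's salary.
inferred_bob_salary = total_all - total_without_bob
print(inferred_bob_salary)  # prints 91000
```

A privacy controller would have to track the history of answered queries, not just each query in isolation, to block this kind of deduction.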
6Privacy Constraint Processing
- Privacy constraint processing
  - Based on prior research in security constraint processing
  - Simple constraint: an attribute of a document is private
  - Content-based constraint: if a document contains information about X, then it is private
  - Association-based constraint: two or more documents taken together are private; individually each document is public
  - Release constraint: after X is released, Y becomes private
- Augment a database system with a privacy controller for constraint processing
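The four constraint types listed above can be sketched as a small release filter. This is a minimal illustration under stated assumptions, not the lecture's implementation: the classes `Document` and `PrivacyController`, their fields, and the keyword-matching rule for content-based constraints are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    attributes: dict
    text: str = ""


@dataclass
class PrivacyController:
    private_attributes: set = field(default_factory=set)  # simple constraints
    private_topics: set = field(default_factory=set)      # content-based
    private_groups: list = field(default_factory=list)    # association-based
    release_rules: dict = field(default_factory=dict)     # release: X -> {Y, ...}
    released: set = field(default_factory=set)            # ids released so far

    def may_release(self, doc):
        # Content-based: the document mentions a private topic (keyword match).
        if any(topic in doc.text.lower() for topic in self.private_topics):
            return False
        # Release constraint: an earlier release made this document private.
        for earlier in self.released:
            if doc.doc_id in self.release_rules.get(earlier, set()):
                return False
        # Association-based: this release would complete a private combination.
        after = self.released | {doc.doc_id}
        if any(group <= after for group in self.private_groups):
            return False
        return True

    def release(self, doc):
        """Return the releasable view of doc, or None if it must be withheld."""
        if not self.may_release(doc):
            return None
        self.released.add(doc.doc_id)
        # Simple constraint: strip private attributes from the released view.
        return {k: v for k, v in doc.attributes.items()
                if k not in self.private_attributes}


controller = PrivacyController(
    private_attributes={"ssn"},
    private_topics={"cancer"},
    private_groups=[frozenset({"d1", "d2"})],
    release_rules={"d3": {"d4"}},
)
r1 = controller.release(Document("d1", {"name": "John", "ssn": "000-00-0000"}, "itinerary"))
r2 = controller.release(Document("d2", {"name": "John"}, "hotel booking"))
r3 = controller.release(Document("d3", {"title": "memo"}, "public memo"))
r4 = controller.release(Document("d4", {"title": "follow-up"}, "follow-up memo"))
r5 = controller.release(Document("d5", {"title": "chart"}, "Diagnosis: cancer"))
print(r1, r2, r3, r4, r5)
```

In this run only `d1` (without its `ssn`) and `d3` are released; `d2` is withheld by the association, `d4` by the release rule, and `d5` by its content.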
7Architecture for Privacy Constraint Processing
User Interface Manager
Privacy Constraints
Constraint Manager
Database Design Tool: constraints during database design operation
Update Processor: constraints during update operation
Query Processor: constraints during query and release operations
DBMS
Database
8Semantic Model for Privacy Control
Dark lines/boxes contain private information
Figure labels: Patient John; Has disease; Cancer; Influenza; address; John's address; Travels frequently; England
9Privacy Preserving Data Mining
- Prevent useful results from mining
  - Introduce cover stories to give false results
  - Only make a sample of data available so that an adversary is unable to come up with useful rules and predictive functions
- Randomization
  - Introduce random values into the data and/or results
  - Challenge is to introduce random values without significantly affecting the data mining results
  - Give a range of values for results instead of exact values
- Secure Multi-party Computation
  - Each party knows its own inputs; encryption techniques are used to compute final results
- Rules, predictive functions
- Approach: Only make a sample of data available
  - Limits the ability to learn a good classifier
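The randomization idea above, introducing random values without significantly affecting aggregate results, can be illustrated with randomized response, a classical technique that the lecture does not name and that is swapped in here as an example. The sketch also shows reporting a range instead of an exact value. The function names and the parameter `p_truth` are assumptions made for illustration.

```python
import random


def randomize(answers, p_truth, rng):
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    return [a if rng.random() < p_truth else (rng.random() < 0.5) for a in answers]


def estimate_proportion(reported, p_truth):
    """Invert the randomization: observed = p_truth * true + (1 - p_truth) * 0.5."""
    observed = sum(reported) / len(reported)
    return (observed - (1 - p_truth) * 0.5) / p_truth


def to_range(value, width):
    """Release a half-open bucket [low, low + width) instead of the exact value."""
    low = (value // width) * width
    return (low, low + width)


rng = random.Random(0)
truth = [i < 6000 for i in range(20000)]  # 30% hold the sensitive attribute
reported = randomize(truth, p_truth=0.5, rng=rng)
estimate = estimate_proportion(reported, p_truth=0.5)
print(round(estimate, 2), to_range(37, 10))
```

No individual reported answer can be trusted, yet the estimated proportion stays close to the true 30%, which is the trade-off the slide describes.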
10Cryptographic Approaches for Privacy Preserving
Data Mining
- Secure Multi-party Computation (SMC) for PPDM
  - Mainly used for distributed data mining.
  - Provably secure under some assumptions.
  - Learned models are accurate
  - Efficient/specific cryptographic solutions for many distributed data mining problems have been developed.
  - Mainly semi-honest assumption (i.e., parties follow the protocols)
  - Malicious model has also been explored recently (e.g., Kantarcioglu and Kardes paper in this workshop)
  - Many SMC-based PPDM algorithms share common sub-protocols (e.g., dot product, summation, etc.)
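The summation sub-protocol mentioned above can be sketched with additive secret sharing, one common construction for a secure sum in the semi-honest model. It is not necessarily the construction used in the cited work. The sketch simulates all parties in a single process; `MODULUS`, `make_shares`, and `secure_sum` are illustrative names, and a real deployment would need secure pairwise channels.

```python
import random

MODULUS = 2**61 - 1  # all arithmetic is done modulo this public value


def make_shares(secret, n_parties, rng):
    """Split secret into n random-looking shares that sum to it modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs, rng):
    """Compute the sum of all inputs without any party revealing its own input."""
    n = len(private_inputs)
    # shares_sent[i][j] is the share that party i sends to party j.
    shares_sent = [make_shares(x, n, rng) for x in private_inputs]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [
        sum(shares_sent[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


rng = random.SystemRandom()
total = secure_sum([12, 7, 30], rng)
print(total)  # prints 49
```

The result is exact, which matches the slide's point that models learned through SMC are accurate. The sketch assumes non-negative inputs whose sum stays below `MODULUS`.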
11Cryptographic Approaches for Privacy Preserving
Data Mining
- Drawbacks
  - Still not efficient enough for very large datasets (e.g., petabyte-sized datasets?)
  - Semi-honest model may not be realistic
  - Malicious model is even slower
- Possible new directions
  - New models that offer a better trade-off between efficiency and security
  - Game-theoretic / incentive issues in PPDM
  - Combining anonymization and cryptographic techniques for PPDM
12Perturbation Based Approaches for Privacy
Preserving Data Mining
- Goal: Distort data while still preserving some properties for data mining purposes.
- Additive Based
- Multiplicative Based
- Condensation based
- Decomposition
- Data Swapping
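Three of the techniques listed above (additive, multiplicative, and data swapping) can be sketched in a few lines. This is a minimal illustration under assumptions: Gaussian noise, the parameter `sigma`, and the function names are choices made here, not details from the lecture. Condensation and decomposition are not covered.

```python
import random
import statistics


def additive_perturb(values, sigma, rng):
    """Additive: add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def multiplicative_perturb(values, sigma, rng):
    """Multiplicative: scale every value by Gaussian noise centred on 1."""
    return [v * rng.gauss(1.0, sigma) for v in values]


def swap_column(rows, column, rng):
    """Data swapping: shuffle one column across rows.

    The column's marginal distribution is preserved exactly, but the link
    between a row and its original value is broken.
    """
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    return [{**row, column: value} for row, value in zip(rows, shuffled)]


rng = random.Random(42)
ages = [20 + (i % 50) for i in range(5000)]

noisy_add = additive_perturb(ages, sigma=10.0, rng=rng)
noisy_mul = multiplicative_perturb(ages, sigma=0.1, rng=rng)

rows = [{"id": i, "age": a} for i, a in enumerate(ages[:100])]
swapped = swap_column(rows, "age", rng)

# Individual values are distorted, but the mean survives approximately.
print(round(statistics.mean(ages), 1), round(statistics.mean(noisy_add), 1))
```

Each function keeps a different property: the noise-based variants keep aggregate statistics such as the mean approximately, while swapping keeps the set of column values exactly.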
13Perturbation Based Approaches for Privacy
Preserving Data Mining
- Goal: Achieve high data mining accuracy with maximum privacy protection.
14Perturbation Based Approaches for Privacy
Preserving Data Mining
- Privacy is a personal choice, so approaches should be individually adaptable (Liu, Kantarcioglu and Thuraisingham, ICDM 06)
15Perturbation Based Approaches for Privacy
Preserving Data Mining
- The trend is to make PPDM approaches fit reality
- We investigated perturbation-based approaches with real-world data sets
- We give an applicability study of the current approaches
  - Liu, Kantarcioglu and Thuraisingham, DKE 07
- We found that
  - Reconstructing the original distribution may not work well with real-world data sets
  - Estimating the distribution is a hard problem and should not be used as an intermediate step
  - Try to modify perturbation techniques and adapt some data mining tools, e.g. Liu, Kantarcioglu and Thuraisingham, Novel decision tree, UTD technical report 06
16CPT Confidentiality, Privacy and Trust
- Before I, as a user of organization A, send data about me to organization B, I read the privacy policies enforced by organization B
  - If I agree to the privacy policies of organization B, then I will send data about me to organization B
  - If I do not agree with the policies of organization B, then I can negotiate with organization B
- Even if the web site states that it will not share private information with others, do I trust the web site?
- Note: while confidentiality is enforced by the organization, privacy is determined by the user. Therefore, for confidentiality, the organization will determine whether a user can have the data. If so, then the organization can further determine whether the user can be trusted
17Platform for Privacy Preferences (P3P) What is
it?
- P3P is an emerging industry standard that enables web sites to express their privacy practices in a standard format
  - Policies in this format can be automatically retrieved and understood by user agents
- It is a product of the W3C (World Wide Web Consortium)
  - www.w3c.org
- When a user enters a web site, the privacy policies of the web site are conveyed to the user. If the privacy policies differ from the user's preferences, the user is notified. The user can then decide how to proceed
18Platform for Privacy Preferences (P3P)
Organizations
- Several major corporations are working on P3P standards, including
  - Microsoft
  - IBM
  - HP
  - NEC
  - Nokia
  - NCR
- Web sites have also implemented P3P
- Semantic web group has adopted P3P
19Platform for Privacy Preferences (P3P)
Specifications
- Initial version of P3P used RDF to specify policies; recent version has migrated to XML
- P3P policies use XML with namespaces for encoding policies
- P3P has its own statements and data types expressed in XML; P3P schemas utilize XML schemas
- P3P specification released in January 2005 uses a catalog shopping example to explain concepts. P3P is an international standard and is an ongoing project
- Example: Catalog shopping
  - Your name will not be given to a third party but your purchases will be given to a third party
  - <POLICIES xmlns="http://www.w3.org/2002/01/P3Pv1">
  - <POLICY name = - - - -
  - </POLICY>
  - </POLICIES>
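The catalog shopping example above, and the way a user agent compares a site's policy with user preferences, can be sketched as follows. The policy text is a simplified, illustrative fragment and not a complete valid P3P policy (it omits required elements such as purpose and retention). The policy name, the helper functions, and the choice of which recipient values count as third parties are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

P3P_NS = "http://www.w3.org/2002/01/P3Pv1"

# Simplified fragment: the name stays with the site, purchases go to others.
POLICY_XML = f"""
<POLICIES xmlns="{P3P_NS}">
  <POLICY name="catalog-example">
    <STATEMENT>
      <RECIPIENT><ours/></RECIPIENT>
      <DATA-GROUP><DATA ref="#user.name"/></DATA-GROUP>
    </STATEMENT>
    <STATEMENT>
      <RECIPIENT><unrelated/></RECIPIENT>
      <DATA-GROUP><DATA ref="#dynamic.miscdata"/></DATA-GROUP>
    </STATEMENT>
  </POLICY>
</POLICIES>
"""

# Assumption: these recipient values are treated as third-party sharing.
THIRD_PARTY = {"unrelated", "other-recipient", "public"}


def shared_data(policy_xml):
    """Map each data reference to the recipient tags named in its statement."""
    root = ET.fromstring(policy_xml.strip())
    ns = {"p3p": P3P_NS}
    result = {}
    for statement in root.iterfind(".//p3p:STATEMENT", ns):
        recipients = {
            child.tag.split("}")[1]  # strip the "{namespace}" prefix
            for recipient in statement.findall("p3p:RECIPIENT", ns)
            for child in recipient
        }
        for data in statement.iterfind(".//p3p:DATA", ns):
            result.setdefault(data.get("ref"), set()).update(recipients)
    return result


def conflicts(policy_xml, never_share):
    """Return the data references the user protects but the site shares."""
    sharing = shared_data(policy_xml)
    return sorted(ref for ref in never_share
                  if sharing.get(ref, set()) & THIRD_PARTY)


print(conflicts(POLICY_XML, {"#user.name"}))
print(conflicts(POLICY_XML, {"#user.name", "#dynamic.miscdata"}))
```

A user who only protects their name sees no conflict, while a user who also protects purchase data would be notified and could then decide how to proceed.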
20P3P and Legal Issues
- P3P does not replace laws
- P3P works together with the law
- What happens if web sites do not honor their P3P policies?
  - Then appropriate legal actions will have to be taken
- XML is the technology used to specify P3P policies
- Policy experts will have to specify the policies
- Technologists will have to develop the specifications
- Legal experts will have to take action if the policies are violated
21Privacy for Assured Information Sharing
Figure labels: Data/Policy for Federation; Export Data/Policy (three instances); Component Data/Policy for Agency A; Component Data/Policy for Agency B; Component Data/Policy for Agency C
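One way to read the federation figure is that each agency keeps its own component data and policy and exports only what that policy allows. The sketch below illustrates that reading under stated assumptions: the `Agency` class, the field-level export policy, and the sample records are hypothetical and are not taken from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Agency:
    name: str
    records: list       # component data held by the agency
    export_fields: set  # component policy: fields allowed to leave the agency

    def export(self):
        """Apply the agency's own policy before anything is shared."""
        return [
            {k: v for k, v in record.items() if k in self.export_fields}
            for record in self.records
        ]


def federate(agencies):
    """Build the federation view from each agency's exported data only."""
    return {agency.name: agency.export() for agency in agencies}


agency_a = Agency("A", [{"case": 1, "subject": "John", "city": "Dallas"}], {"case", "city"})
agency_b = Agency("B", [{"case": 7, "subject": "Mary", "city": "Austin"}], {"case"})
federation_view = federate([agency_a, agency_b])
print(federation_view)
```

The federation never sees the `subject` field, because no agency's export policy lists it.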
22Privacy Preserving Surveillance
Raw video surveillance data
Face Detection and Face Derecognizing system
Suspicious people found
Faces of trusted people derecognized to preserve privacy
Suspicious events found
Comprehensive security report listing suspicious events and people detected
Suspicious Event Detection System
Manual Inspection of video data
Report of security personnel
23Directions Foundations of Privacy Preserving
Data Mining
- We proved in 1990 that the inference problem in general was unsolvable; therefore the suggestion was to explore the solvability aspects of the problem.
- Can we do something similar for privacy?
  - Is the general privacy problem solvable?
  - What are the complexity classes?
  - What is the storage and time complexity?
- We need to explore the foundations of PPDM and related privacy solutions
24Directions Testbed Development and Application
Scenarios
- There are numerous PPDM-related algorithms. How do they compare with each other? We need a testbed with realistic parameters to test the algorithms
- It is time to develop real-world scenarios where these algorithms can be utilized
- Is it feasible to develop realistic commercial products, or should each organization adapt products to suit its needs?
25Key Points
- 1. There is no universal definition of privacy; each organization must define what it means by privacy and develop appropriate privacy policies
- 2. Technology alone is not sufficient for privacy. We need technologists, policy experts, legal experts and social scientists to work on privacy
- 3. Some well-known people have said "Forget about privacy." Therefore, should we pursue research on privacy?
  - There are interesting research problems, so there is a need to continue with research
  - Something is better than nothing
  - Try to prevent privacy violations, and if violations occur then prosecute
- 4. We need to tackle privacy from all directions
26Application Specific Privacy?
- Examining privacy may make sense for healthcare and financial applications
- Does privacy work for defense and intelligence applications?
- Is it even meaningful to have privacy for surveillance and geospatial applications?
  - Once the image of my house is on Google Earth, then how much privacy can I have?
  - I may want my location to be private, but does it make sense if a camera can capture a picture of me?
  - If there are sensors all over the place, is it meaningful to have privacy-preserving surveillance?
- This suggests that we need application-specific privacy
- It is not meaningful to examine PPDM for every data mining algorithm and for every application
27Data Mining and Privacy Friends or Foes?
- They are neither friends nor foes
- Need advances in both data mining and privacy
- Need to design flexible systems
  - For some applications one may have to focus entirely on pure data mining, while for others there may be a need for privacy-preserving data mining
  - Need flexible data mining techniques that can adapt to changing environments
- Technologists, legal specialists, social scientists, policy makers and privacy advocates MUST work together