Title: Poster Presentations by Students and Postdocs
- PORTIA Project Site Visit
- Stanford CA, May 12-13, 2005
- http://crypto.stanford.edu/portia/
2. Paper: Secure Computation of the k-th ranked element (Eurocrypt 2004)
PORTIA PI: Rajeev Motwani
Institution: Stanford University
Authors: Gagan Aggarwal, Nina Mishra, Benny Pinkas
Research Objectives
Different organizations with related datasets
want to compute the median (or the k-th largest
element) of the union of these datasets. We ask
how they can do so without revealing anything
about the datasets except what the median itself
reveals.
Significant Results
Notation: k = rank of the element to be computed; D = size of the domain.
Protocol overhead:
- Two-party case: $O(\log k \cdot \log D)$
- t-party case: $O(t \cdot (\log D)^2)$
- Earlier work: $O(\min\{k \log D,\ D\})$
Approach
We adopt the security framework of secure
multi-party computation. Any function (including
the median) can be computed with polynomial
overhead, using protocols developed by Yao and
others. However, these generic protocols are not
efficient for large datasets. We develop
protocols for computing the median that have
polylogarithmic overhead.
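The logarithmic number of rounds can be illustrated with a plain, non-secure sketch. It assumes integer values in a known domain [0, D) and counts rank from the smallest element. Each party answers only "how many of my values are at most this guess?", and a binary search over the domain narrows the answer in about log D rounds. In a real protocol the per-party counts would be summed and compared under secure computation; here they are handled in the clear, and the function names are hypothetical.

```python
from typing import List, Sequence


def local_count_at_most(dataset: Sequence[int], guess: int) -> int:
    """What one party computes locally in each round."""
    return sum(1 for x in dataset if x <= guess)


def kth_ranked(datasets: List[Sequence[int]], k: int, domain_size: int) -> int:
    """Return the k-th smallest value (1-indexed) in the union of the datasets.

    Plaintext stand-in: the sum and the comparison against k are the steps
    that a secure protocol would have to hide.
    """
    total = sum(len(d) for d in datasets)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the total number of elements")
    lo, hi = 0, domain_size - 1
    while lo < hi:  # about log2(domain_size) rounds
        mid = (lo + hi) // 2
        if sum(local_count_at_most(d, mid) for d in datasets) >= k:
            hi = mid  # the k-th element is at most mid
        else:
            lo = mid + 1  # the k-th element is larger than mid
    return lo
```

For example, with the two datasets [1, 5, 9] and [2, 3, 10] over a domain of size 16, the element of rank 3 is 3.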
Graphic: two parties ask, "What is the median of our datasets?"
Broader Impact
- Make system developers aware of multi-party protocols to compute functions securely.
- Build an efficient toolkit of privacy-preserving protocols for distributed applications.
Graphic (answer): "The median is 4254.34."
3. Paper: Enterprise privacy promises and enforcement (WITS 2005)
PORTIA PI: John C. Mitchell
Institution: Stanford University
Authors: Adam Barth, John C. Mitchell
Significant Results
- APPEL and XPref can express unsafe preferences, such as "Block providers who do not telemarket," which are not respected by the system as a whole.
- A P3P policy enforces its compact representation, allowing precise interpretation of compact policies.
- We give an algorithm for summarizing policies (e.g., translating an EPAL policy into a P3P policy).
Research Objectives
Allow an enterprise to determine whether its
detailed internal privacy policy meets its
published privacy promises. Provide an algorithm
for summarizing complex privacy policies as
simple privacy promises.
Approach
We propose a data-centric, unified model for the
semantics of several formal languages for privacy
policies (including P3P and EPAL) and equip the
model with a modal logic for reasoning about
permission inheritance across data hierarchies.
Policies are represented by Kripke structures,
and queries are represented by modal formulae.
The modalities reflect the differing perspectives
of consumers and service providers. Policies are
related by comparing the modal theory of their
models.
Broader Impact
- Promulgate an end-to-end approach to the study of privacy guarantees of complex systems.
- Improve understanding of the semantics of privacy policy languages, including the impact of differing perspectives on privacy.
4. Paper: Vision Paper: Enabling Privacy for the Paranoids (VLDB 2004)
PORTIA PIs: H. Garcia-Molina, R. Motwani
Institution: Stanford University
Authors: Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, et al.
Research Objectives
P3P and Hippocratic databases assume that organizations will implement privacy safeguards for individuals' information. Recent, well-publicized breaches have led people to mistrust organizations' use of these mechanisms. We seek solutions that restore control to individuals.
Significant Results
- Present information types as points in a 3D space (ownership, type, level of control).
- Present the TRIM (Traceability, Revocability, Isolation, Minimality) interaction model to enable individual control in information exchanges.
Approach
We propose that individuals release information to organizations in such a way that the released information is unusable for illegitimate tasks. We provide a small set of information types and a set of mechanisms to retain control. Our overall framework of types and mechanisms is called P4P: Paranoid Platform for Privacy Preferences.
Graphic: P4P (c) complements traditional security methods (a, b).
Broader Impact
- Demonstrate that an individual can retain control over his or her information even after its release to an organization.
- Initiate a radical re-think of modeling, release, and management of personal information.
Graphic: P4P (Example A, Example B)
5. Paper: Privacy-Preserving Indexing of Documents on the Network (VLDB 2003)
PORTIA PI: Hector Garcia-Molina
Institution: Stanford University
Authors: Mayank Bawa, Roberto J. Bayardo Jr., Rakesh Agrawal
Research Objectives
Individuals want to share content selectively but lack mechanisms to do so. We address the problem of privacy-preserving search over distributed access-controlled content.
Significant Results
- Define a centralized inverted-index data structure (the PPI) that guarantees Probable Innocence (in the Reiter-Rubin sense).
- Provide an efficient randomized algorithm that builds the PPI and guarantees Possible Innocence during construction.
Approach
We propose to use a centralized
privacy-preserving index (PPI) in conjunction
with a distributed access-control-enforcing
protocol. The new index provides strong and
quantifiable privacy guarantees that hold even if
the entire index is made public. The degree of
privacy provided by the index is tunable, and
search overhead is proportional to the degree of
privacy.
Graphic (labels: P4, Group Z, Group F)
Broader Impact
- Raise public awareness of and research interest in search tools for access-controlled content.
- Build privacy tools that give each provider complete control over how much information is shared, when it is shared, and with whom it is shared.
6. Paper: A network security game and sum of squares partition (SODA 2005)
PORTIA PI: Ravi Kannan
Institution: Yale University
Authors: Jim Aspnes, Kevin Chang, Aleksandr Yampolskiy
Significant Results
- A Nash Equilibrium can be much costlier than the optimum strategy: in the worst case, Nash costs a factor $\Theta(n)$ more than optimum.
- It is NP-hard to compute an optimum strategy, as well as the pure Nash Equilibria with highest and lowest costs.
- There is an $O(\log^2 n)$ approximation algorithm for finding optimum strategies. It solves a new graph-partitioning problem, the sum-of-squares partition, by recursively removing sparse cuts.
Research Objectives
The overall security of a network is very much
dependent on the choices made by individual
users. We ask whether the system is adversely
affected if users act selfishly.
Approach
We model network security as a graph in which the
nodes (users) choose whether or not to install
antivirus software (their strategies). The
ability of a virus to spread through the network
is determined by user strategies and the network
topology. We define notions of cost to each user
and cost to the entire network. We study the
game-theoretic properties of the system if user
strategies are chosen to be in a Nash
Equilibrium and give algorithms to find
near-optimum strategies in terms of cost to the
network.
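A small sketch may help make the network-wide cost concrete. It assumes one particular cost model that the slide does not spell out: each secure node pays an installation cost C, the virus starts at a uniformly random node, and it infects the whole connected component of insecure nodes it lands in, each at loss L. Under that assumption the expected network cost is C times the number of secure nodes plus (L/n) times the sum of squared insecure-component sizes, which is where a sum-of-squares partition objective would come from. All names below are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def insecure_component_sizes(graph: Graph, secure: Iterable[Hashable]) -> List[int]:
    """Sizes of the connected components left after removing secure nodes."""
    seen = set(secure)
    sizes = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sizes


def social_cost(graph: Graph, secure: Iterable[Hashable],
                install_cost: float, infection_loss: float) -> float:
    """Assumed cost model: C * |secure| + (L / n) * sum of squared component sizes."""
    secure_set = set(secure)
    n = len(graph)
    sizes = insecure_component_sizes(graph, secure_set)
    return install_cost * len(secure_set) + infection_loss / n * sum(s * s for s in sizes)


# A path 0-1-2-3-4: securing the middle node splits it into two components of size 2.
PATH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

On this path with C = 1 and L = 5, securing node 2 costs 1 + (5/5)(4 + 4) = 9, whereas securing nobody costs (5/5)(25) = 25.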
Graphic
Broader Impact
- Raise awareness in the security community of game-theoretic methods and their limitations.
- Provide algorithmic techniques to combat virus propagation.
7. Paper: Database Engines for Bioscience
PORTIA PI: Avi Silberschatz
Institution: Yale University
Authors: J. Corwin, A. Silberschatz, S. Yadlapalli
Research Objectives
Biomedical and Bioinformatics applications have
new requirements for persistent data storage. Our
work focuses on developing new algorithms and
tools to better support these applications.
Significant Results
We have created an extension to PostgreSQL, a modern, open-source database engine. Our system supports:
- New storage mechanisms for heterogeneous data and frequently modified schemas
- New syntax to access these features through SQL
Approach
In collaboration with Yale University's
neuroinformatics research group, we have observed
that bioscience data have characteristics that
complicate their storage in and retrieval from a
conventional database system, e.g.,
heterogeneous, sparse data and frequently
modified database schema. We introduce dynamic
tables, sparse attributes, and row-level
security, and we are working to implement these
features in a conventional relational-database
engine.
Broader Impact
- Develop tools capable of working with data not traditionally suited for relational databases.
- Improve the productivity of researchers working with bioinformatics data.
8. Paper: Compositional Analysis of Contract Signing Protocols (CSFW 2005)
PORTIA PI: John C. Mitchell
Institution: Stanford University
Authors: M. Backes, A. Datta, A. Derek, J. C. Mitchell, M. Turuani
Research Objectives
Contract-signing protocols allow two or more users to exchange signatures fairly, so that no party receives a contract unless all do. In this work, we seek a systematic method for proving correctness of such protocols.
Significant Results: general contract-signing protocol template
- Properties addressed are fairness, abuse-freeness, and trusted third-party accountability.
- Instantiates to the protocols of Asokan-Shoup-Waidner, Garay-Jakobsson-MacKenzie, Markowitch-Kremer, Markowitch-Saeednia, and Zhou-Deng-Bao.
Approach
We develop a method for reasoning about
contract-signing protocols using a specialized
protocol logic. Our proof method is
compositional: security properties (like
fairness) are proved for each protocol by
combining independent proofs of its three
sub-protocols. Proofs are carried out in a
template form, yielding reusable proofs that
may be instantiated for a number of different
protocols. Game-theoretic properties are proved
by demonstrating that the specific strategies
achieve their desired outcomes.
Broader Impact
- Develop compositional techniques for reasoning about security and privacy.
- Develop provably secure protocols for exchanging privacy-related contracts over the Internet.
9. Paper: On-line Negative Databases (ICARIS 2004)
PORTIA PI: Stephanie Forrest
Institution: University of New Mexico
Authors: F. Esponda, E. S. Ackley, S. Forrest, P. Helman
Research Objectives
- Create a representation of data that naturally protects privacy.
- Dynamically construct and maintain collections of private data.
Significant Results
- If an insider acquires a proper subset of the negative database, she acquires little useful information.
- If an insider obtains the entire negative database, it is an NP-hard problem for her to recover the original positive set of strings DB.
- A tractable query: "Is string x in DB?"
- An intractable query: "What strings matching x in fields F1...Fk are in DB?"
Approach
- Store the negative image of a set of data records rather than the records themselves.
- Logically divide the space of possible records (fixed-length binary strings) into two disjoint sets: DB and U-DB.
- Construct a compressed version of U-DB that can be efficiently created and updated.
- Negative databases (NDB) are defined over the {0, 1, *} alphabet, where * is the wildcard symbol and stands for both a 1 and a 0 at the position where it appears.
Graphic: Example of negative-database construction
DB     U-DB    NDB
000    001     0*1
101    010     01*
111    011     1*0
       100
       110
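The tractable membership query can be sketched in a few lines. The sketch assumes the NDB for this example consists of the wildcard patterns 0*1, 01*, and 1*0 (a reading of the example in which the wildcard characters are restored); a record is in DB exactly when no NDB pattern matches it. The function names are hypothetical.

```python
from itertools import product
from typing import List


def matches(pattern: str, record: str) -> bool:
    """True if the record agrees with the pattern at every non-wildcard position."""
    return len(pattern) == len(record) and all(
        p == "*" or p == r for p, r in zip(pattern, record)
    )


def in_db(ndb: List[str], record: str) -> bool:
    """Tractable query: a record is in DB iff no negative pattern covers it."""
    return not any(matches(p, record) for p in ndb)


NDB = ["0*1", "01*", "1*0"]  # assumed patterns for the 3-bit example

# Recovering all of DB this way means trying every possible string,
# which takes time exponential in the record length.
RECOVERED = sorted(
    "".join(bits) for bits in product("01", repeat=3) if in_db(NDB, "".join(bits))
)
```

Checking a single string costs one pass over the NDB, while the brute-force recovery at the end enumerates all 2^3 strings, which hints at why reconstructing DB from a large NDB is expensive.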
Broader Impact
- Privacy is an inherent property of the representation.
- Negative representations of information have many potential applications, e.g., set intersection, surveys, and secret sharing.
Best paper award.
10. Paper: Two Can Keep a Secret: A Distributed Architecture for Secure Database Services (CIDR 2005)
PORTIA PIs: H. Garcia-Molina, R. Motwani
Institution: Stanford University
Authors: G. Aggarwal, M. Bawa, P. Ganesan, et al.
Significant Results
- New privacy model based on compliance needs
- Proof-of-concept distributed architecture for secure database services
- Query optimization and database design heuristics for partitioned operation
Research Objectives
Organizations want to outsource database services
but are concerned about data privacy. In this
work, we ask how one can protect outsourced data
by exploiting multiple service providers.
Approach
We propose that data be partitioned across two
service providers in such a way that neither
provider has useful information that can breach
privacy. Queries are answered by formulating
appropriate sub-queries at each provider and
combining the results intelligently. By modeling
privacy constraints realistically, it is possible
to avoid encrypting data and to execute queries
efficiently while still preserving privacy.
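One way to picture the partitioning is a vertical split keyed by a shared tuple ID. The sketch below is an illustration under stated assumptions, not the architecture from the paper: the privacy constraint is taken to be "name and salary must never sit at the same provider", each provider evaluates only the predicate on its own columns, and the client intersects the returned tuple IDs. The table, column names, and functions are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]
Fragment = Dict[int, Row]  # tuple ID -> the columns held by one provider


def partition(rows: Sequence[Row], left_cols: Sequence[str],
              right_cols: Sequence[str]) -> Tuple[Fragment, Fragment]:
    """Split each row by columns; the tuple ID is the only link between halves."""
    left: Fragment = {}
    right: Fragment = {}
    for tuple_id, row in enumerate(rows):
        left[tuple_id] = {c: row[c] for c in left_cols}
        right[tuple_id] = {c: row[c] for c in right_cols}
    return left, right


def select(fragment: Fragment, predicate: Callable[[Row], bool]) -> Set[int]:
    """Sub-query run at one provider: IDs of rows satisfying a local predicate."""
    return {tid for tid, row in fragment.items() if predicate(row)}


def conjunctive_query(left: Fragment, right: Fragment,
                      left_pred: Callable[[Row], bool],
                      right_pred: Callable[[Row], bool]) -> List[Row]:
    """Client side: intersect the two ID sets and reassemble matching rows."""
    ids = select(left, left_pred) & select(right, right_pred)
    return [{**left[t], **right[t]} for t in sorted(ids)]


ROWS = [
    {"name": "Ann", "zip": "94305", "salary": 120000},
    {"name": "Bob", "zip": "94305", "salary": 80000},
    {"name": "Cy", "zip": "10003", "salary": 150000},
]
LEFT, RIGHT = partition(ROWS, ["name", "zip"], ["salary"])
RESULT = conjunctive_query(
    LEFT, RIGHT,
    lambda r: r["zip"] == "94305",
    lambda r: r["salary"] > 100000,
)
```

Neither fragment alone links a name to a salary, and no column is encrypted; only the client, which sees both answers, can join them.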
Broader Impact
- Introduce alternatives to encryption for enabling secure database services.
- Formulate open problems, in both theory and systems, in the area of distributed architecture for database outsourcing.
11. Paper: Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems (VLDB 2004)
PORTIA PI: Hector Garcia-Molina
Institution: Stanford University
Authors: Prasanna Ganesan, Mayank Bawa, Hector Garcia-Molina
Significant Results
- (Largest load / smallest load) ≤ 4.24.
- Amortized O(1) tuples are migrated per insertion or deletion.
- Fully distributed, asymptotically optimal solution
Research Objectives
We seek to support range queries in a large P2P
system while ensuring load balance. More
generally, we seek to enable SQL-style queries in
P2P systems and to automate physical design of
parallel databases.
Approach
In both P2P and parallel database systems, we
partition data into ranges and have each node
store data in one contiguous range. Online load
balancing is used to modify the partitions while
migrating as little data as possible.
Broader Impact
- Enlarge the class of powerful multi-dimensional queries that can be executed in P2P systems.
- Transfer successful P2P techniques to the realm of parallel databases.
Graphic: Two universal operations used for load balancing:
(a) NbrAdjust: data handed off to adjacent node
(b) Reorder: empty node reordered to split load
12. Paper: Evaluating 2-DNF Formulas on Ciphertexts (TCC 2005)
PORTIA PI: Dan Boneh
Institution: Stanford University
Authors: Dan Boneh, Eu-Jin Goh, Kobbi Nissim
Research Objectives
Encryption protects data from unauthorized access
but makes it hard to use. In this work, we seek
new encryption schemes that enable arbitrary
computation on ciphertexts.
Significant Results
We provide an encryption scheme that is both $\times$ homomorphic (up to a single $\times$) and $+$ homomorphic. Applications include:
- One-round, secure two-party evaluation of 2-DNFs and dot products
- Improved PIR and e-voting
Approach
An encryption scheme $E$ is homomorphic to function $f$ (or $f$ homomorphic) if, given $E(A)$ and $E(B)$, anyone can compute $E(f(A,B))$. All known efficient homomorphic encryption schemes are $\times$ homomorphic or $+$ homomorphic but not both. We seek a fully homomorphic encryption scheme, i.e., one that is homomorphic to a logically complete operation (e.g., NAND) or set of operations. A scheme that is both $\times$ and $+$ homomorphic is logically complete.
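Graphic: Evaluating 2-DNFs
- Alice holds $\varphi(x_1,\ldots,x_n) = \bigvee_{i=1}^{k} (y_{i,1} \wedge y_{i,2})$.
- $\Phi$ = arithmetization of $\varphi$: replace $\vee$ by $+$, $\wedge$ by $\times$, and $\neg x_i$ by $(1 - x_i)$.
- Bob holds $A = (a_1,\ldots,a_n)$. He invokes Keygen($\tau$), encrypts $A$, and sends $PK, E(a_1),\ldots,E(a_n)$ to Alice.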
Broader Impact
- Enable one-round, secure two-party computation.
- Improve applications such as grid computing on
sensitive data that use homomorphic encryption.
Graphic (continued): Alice evaluates $E(r \cdot \Phi(A))$ for random $r$ and sends it to Bob. Bob decrypts: if the result is 0, he emits 0; otherwise, he emits 1.
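The arithmetization step can be checked in the clear. The sketch below contains no encryption at all: it only shows that a 2-DNF, rewritten with + for OR, multiplication for AND, and (1 - x) for NOT, evaluates to a nonzero integer exactly when the formula is true, and that multiplying by a random nonzero r modulo a prime keeps that zero/nonzero distinction while hiding the count of satisfied clauses. The data representation, the choice of modulus, and all names are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Literal = Tuple[int, bool]       # (variable index, is_negated)
Clause = Tuple[Literal, Literal]  # conjunction of two literals

PRIME = 2**61 - 1  # assumed modulus; any prime larger than the clause count works


def literal_value(lit: Literal, assignment: Sequence[int]) -> int:
    index, negated = lit
    return 1 - assignment[index] if negated else assignment[index]


def arithmetize(formula: List[Clause], assignment: Sequence[int]) -> int:
    """Sum over clauses of the product of their two literals (a degree-2 polynomial)."""
    return sum(
        literal_value(a, assignment) * literal_value(b, assignment)
        for a, b in formula
    )


def blinded_value(formula: List[Clause], assignment: Sequence[int]) -> int:
    """Plaintext stand-in for what the decryptor would see: r * Phi(A) mod PRIME."""
    r = random.randrange(1, PRIME)
    return (r * arithmetize(formula, assignment)) % PRIME


def decode(value: int) -> int:
    """Emit 0 if the value is 0, and 1 otherwise."""
    return 0 if value == 0 else 1


def evaluate_boolean(formula: List[Clause], assignment: Sequence[int]) -> int:
    """Direct Boolean evaluation, used only to cross-check the arithmetization."""
    return int(any(
        literal_value(a, assignment) == 1 and literal_value(b, assignment) == 1
        for a, b in formula
    ))


# (x0 AND NOT x1) OR (x1 AND x2)
FORMULA = [((0, False), (1, True)), ((1, False), (2, False))]
```

Because the polynomial has degree 2, a scheme that supports additions and a single multiplication on ciphertexts is enough to evaluate it.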
13. Paper: Stronger Pwd. Auth. with Browser Extensions (USENIX Security 05)
PORTIA PIs: D. Boneh, J. Mitchell
Institution: Stanford University
Authors: B. Ross, C. Jackson, N. Miyake, D. Boneh, J. Mitchell
Research Objectives
Phishing sites and weak passwords have led to
Internet identity theft. We want to provide
increased security against these attacks with
minimal change for both the user and the server.
Significant Results
- Theft of a hashed password will not yield a password that can be used to log in to another site.
- Theft of the user's computer will not yield any passwords.
- An unobtrusive user interface provides security against password-field spoofing and other JavaScript attacks.
Approach
We have developed a web-browser extension, called
PwdHash, that applies a cryptographic hash
function to the user's password (using data
associated with the web site and an optional
global password as salt). The original password
is discarded, and the hashed password is sent to
the website instead. A web-based interface
provides an alternate mechanism for generating
passwords if the browser extension is not
available.
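The idea of a per-site hashed password can be sketched as follows. This is not PwdHash's actual construction: the choice of HMAC-SHA-256, the way the domain and optional global password are concatenated into the salt, the URL-safe Base64 encoding, and the 16-character truncation are all assumptions made for illustration, and the function name is hypothetical.

```python
import base64
import hashlib
import hmac


def site_password(password: str, domain: str,
                  global_password: str = "", length: int = 16) -> str:
    """Derive a site-specific password from the user's password.

    The site's domain (plus an optional global password) acts as the salt,
    so the same typed password yields unrelated strings on different sites.
    """
    salt = (domain.lower() + global_password).encode("utf-8")
    digest = hmac.new(password.encode("utf-8"), salt, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]
```

A phishing page served from a different domain would receive a different derived string, which is the property the first result above relies on.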
Graphic: Secure Authentication User Interface
Broader Impact
- Enable further innovation in authentication protocols by providing secure password entry into a web browser.
- Enable design of web browsers that prevent user-interface spoofing in JavaScript and Flash.
Graphic: Remote Authentication User Interface
14. Paper: Stably Computable Properties of Network Graphs (DCOSS 2005)
PORTIA PI: Joan Feigenbaum
Institution: Yale University
Authors: D. Angluin, J. Aspnes, M. Chan, M. J. Fischer, H. Jiang, R. Peralta
Research Objectives
Anonymous, finite-state sensing devices can be
deployed in an ad-hoc communication network of
arbitrary size and unknown topology. We seek to
characterize the computations that can be done by
these sensors.
Significant Results
- Presburger graph predicates are stably computable with stabilizing inputs in the family of complete interaction graphs.
- One can stably determine whether the interaction graph contains a fixed subgraph, a directed cycle, or a directed cycle of odd size.
- Nondeterministic transition functions do not increase the class of stably computable predicates.
Approach
We ask what properties of the network the sensors
can detect and how they can use the properties to
organize computation. We define the notion of
stabilizing inputs to such devices and consider
the class of predicates that are stably
computable in weakly connected networks. We also
study the effects of nondeterministic transition
functions.
Graphic
Broader Impact
- Explore new models of computation that arise from new technologies such as sensor nets.
- Improve understanding of the effects of anonymity and nondeterministic interaction on computational systems and their users.
15. Paper: Simulatable Auditing (PODS 2005)
PORTIA PI: Rajeev Motwani
Institution: Stanford University
Authors: K. Kenthapadi, N. Mishra, K. Nissim
Research Objectives
In online query auditing, one is given a sequence
of (query, answer) pairs, together with a new
query; one should refuse to answer the new query
if privacy may be breached and give the true
answer otherwise. We seek auditing methods in
which query denials provably do not leak
information.
Significant Results
- Expose weaknesses of earlier query auditors.
- Provide new definitions and models.
- Devise simulatable auditing algorithms for fundamental classes of queries.
Approach
In naïve query-auditing systems, denials may leak
information. We propose a model in which the
decision to deny can be made by either the
auditor or the user. We introduce a new
definition of privacy and construct simulatable
auditors for sum and max queries.
Broader Impact
- Broaden and deepen researchers' understanding of the notion of privacy in this context.
- Raise awareness of the fact that denials can leak information and, hence, that query auditing must be done carefully.
16. Paper: Anonymizing Tables (ICDT 2005)
PORTIA PI: Rajeev Motwani
Institution: Stanford University
Authors: Aggarwal, Feder, Kenthapadi, Motwani, Panigrahy, Thomas, Zhu
Research Objectives
A k-anonymous relational database containing
personal information provides individual privacy
and data integrity. In this work, we investigate
the computational complexity of the k-anonymity
problem.
Significant Results
- The k-anonymity problem is NP-hard.
- There is a polynomial-time O(k)-approximation algorithm for the general k-anonymity problem.
- There are polynomial-time 1.5-approximation and 2-approximation algorithms for 2-anonymity and 3-anonymity, respectively.
Approach
We seek to minimize the number of cells
suppressed while ensuring that, for each tuple in
the modified table, there are at least k-1 other
tuples in the modified table that are identical
to it in the columns that will be made public.
Although the general problem is NP-hard, we use a
graph representation to obtain good approximation
algorithms. We also show a matching lower bound
for any algorithm that uses only the graph
representation.
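The objective described above can be made concrete with a small checker. It verifies the k-anonymity condition on a table in which suppressed cells are written as "*" and counts how many cells were suppressed. It does not implement any of the approximation algorithms; the example table and the names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

SUPPRESSED = "*"
Table = List[Tuple[str, ...]]


def is_k_anonymous(table: Sequence[Tuple[str, ...]], k: int) -> bool:
    """Every row must be identical to at least k - 1 other rows."""
    counts = Counter(tuple(row) for row in table)
    return all(count >= k for count in counts.values())


def suppression_cost(original: Sequence[Tuple[str, ...]],
                     anonymized: Sequence[Tuple[str, ...]]) -> int:
    """Number of cells replaced by the suppression symbol (the quantity to minimize)."""
    return sum(
        1
        for row_o, row_a in zip(original, anonymized)
        for cell_o, cell_a in zip(row_o, row_a)
        if cell_a == SUPPRESSED and cell_o != SUPPRESSED
    )


ORIGINAL: Table = [
    ("M", "94305", "30"),
    ("M", "94305", "35"),
    ("F", "94301", "40"),
    ("F", "94301", "40"),
]
ANONYMIZED: Table = [
    ("M", "94305", SUPPRESSED),
    ("M", "94305", SUPPRESSED),
    ("F", "94301", "40"),
    ("F", "94301", "40"),
]
```

Here the original table is not 2-anonymous because the first two rows differ in the last column; suppressing those two cells fixes that at a cost of 2.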
Graphic: Original table
Broader Impact
- Improve understanding of the algorithmic properties of a popular approach to de-identification of personal data.
Graphic: 2-anonymized version of the table
17. Paper: Fast Monte Carlo Algorithms for Massive Matrices
PORTIA PI: Ravi Kannan
Institution: Yale University
Authors: P. Drineas, R. Kannan, M. W. Mahoney
Research Objectives
As the capacity of external storage devices has
increased enormously, random access to their
contents has become prohibitively slow. We seek
efficient algorithms for computations on massive
matrices that are stored externally.
Significant Results
We give pass-efficient algorithms for:
- Matrix multiplication, SVD, and CUR decomposition
- Feasibility testing for linear programs
- Gram-matrix approximation for statistical learning
Approach
We formulate the pass-efficient model of
computation. In this model, algorithm-design
goals are faster running time and fewer passes
over the matrices. To achieve these, we devise
approximation algorithms that use sampling
techniques. The algorithms compute not on whole
matrices but rather on a small sample of rows and
columns of the matrices.
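A minimal sketch of sampling-based matrix multiplication follows. It assumes the common scheme in which index k is drawn with probability proportional to the norm of column k of A times the norm of row k of B, and each sampled outer product is rescaled by 1/(c p_k) so that the estimate is unbiased. It is written in pure Python on nested lists for self-containment; the function name and parameters are hypothetical, and no claim is made that this matches the paper's algorithm in detail.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def approx_matmul(A: Matrix, B: Matrix, c: int, rng: random.Random) -> Matrix:
    """Estimate A @ B from c sampled column-row pairs.

    One pass over A and B is enough to compute the sampling weights;
    a second pass can pick up the sampled columns and rows.
    """
    m, n, p = len(A), len(B), len(B[0])
    weights = [
        math.sqrt(sum(A[i][k] ** 2 for i in range(m)))
        * math.sqrt(sum(B[k][j] ** 2 for j in range(p)))
        for k in range(n)
    ]
    total = sum(weights)
    result = [[0.0] * p for _ in range(m)]
    if total == 0:
        return result  # the product is the zero matrix
    probs = [w / total for w in weights]
    for k in rng.choices(range(n), weights=probs, k=c):
        scale = 1.0 / (c * probs[k])  # rescaling keeps the estimator unbiased
        for i in range(m):
            for j in range(p):
                result[i][j] += scale * A[i][k] * B[k][j]
    return result
```

When only one column of A is nonzero, every sample lands on the same index and the estimate equals the exact product; in general the error shrinks as the number of samples c grows.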
Broader Impact
- Stimulate research on massive-data-set algorithms in the field of computational linear algebra.
- Improve applications such as Latent Semantic Indexing that are based on matrix computation.
Graphic: Diagram of the SVD Algorithm
18. Paper: Secure Computation of Surveys (SMP 2004)
PORTIA PI: Joan Feigenbaum
Institution: Yale University
Authors: J. Feigenbaum, B. Pinkas, R. Ryger, F. Saint-Jean
Research Objectives
Many people and organizations decline to participate in surveys because of potential misuse of sensitive data. We investigate privacy-preserving surveys, using the Taulbee faculty-salary survey as a concrete example.
Significant Results
- User-friendly implementation of privacy-preserving data-mining protocols is an achievable goal.
- FairPlay is useful.
- There are significant barriers to adoption, including the need for privacy-preserving data cleaning and uncertainty about compliance with laws and policies.
Approach
Every function can be computed in a
privacy-preserving manner, using a
general-purpose secure, multiparty
function-evaluation (SMFE) protocol; however,
this is not efficient enough for practical salary
surveys. We use the run-time system of FairPlay,
a general-purpose SMFE system, and a
special-purpose compiler that generates an
odd-even sorting-network circuit for the
specified number and lengths of inputs.
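To show what a sorting-network generator produces, the sketch below emits the comparator list of Batcher's odd-even merge sort for a power-of-two number of inputs. Reading "odd-even sorting network" as Batcher's construction is an assumption, and the sketch is unrelated to the actual compiler or to FairPlay's circuit format. A sorting network is attractive for secure evaluation because its sequence of comparisons is fixed in advance and does not depend on the data.

```python
from typing import Iterator, List, Sequence, Tuple

Comparator = Tuple[int, int]


def oddeven_merge(lo: int, hi: int, r: int) -> Iterator[Comparator]:
    """Merge two sorted halves of positions lo..hi (inclusive) with stride r."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo: int, hi: int) -> Iterator[Comparator]:
    """Comparators that sort positions lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def sorting_network(n: int) -> List[Comparator]:
    """Comparator list for n inputs; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return list(oddeven_merge_sort_range(0, n - 1))


def apply_network(network: Sequence[Comparator], values: Sequence[int]) -> List[int]:
    """Run the fixed compare-and-swap sequence on a copy of the inputs."""
    out = list(values)
    for a, b in network:
        if out[a] > out[b]:
            out[a], out[b] = out[b], out[a]
    return out
```

For instance, `apply_network(sorting_network(4), [3, 1, 4, 2])` runs five compare-and-swap steps; once the salaries are sorted, order statistics such as the median can be read off fixed output positions.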
Broader Impact
- Raise awareness of barriers to adoption of privacy-preserving data mining.
- Release an open-source, privacy-preserving salary-survey package that can be used by other researchers.
Graphic: M input providers (outside) and N computation servers (inside)
19. Papers: Short Group Signatures (Crypto 2004, CCS 2004)
PORTIA PI: Dan Boneh
Institution: Stanford University
Authors: D. Boneh, X. Boyen, H. Shacham
Significant Results
- Short and efficient group signatures
- Suitable for private attestation in Trusted Computing and for vehicle-safety ad-hoc networks
- Verifier-Local Revocation (VLR) enables simple TPM revocation in Trusted Computing.
Research Objectives
Group signatures provide privacy for signers.
Signer privacy is important, but existing
group-signature schemes are impractical. We seek
to construct new, practical group signatures.
Approach
We construct short group signatures from a
zero-knowledge proof of knowledge for possession
of a Strong Diffie-Hellman tuple in bilinear
groups. Signing and verification are faster in
our scheme than in previous schemes, and
signatures are 20 times shorter. We also provide
a new revocation mechanism for group signatures,
Verifier-Local Revocation (VLR), that is better
suited for trusted-computing applications.
Graphic: Signature sizes
Signature scheme             Signature size
Ordinary RSA signatures      1,024 bits
ACJT00 group signatures      32,000 bits
BBS04 group signatures       1,536 bits
Broader Impact
- Group signatures are a powerful privacy mechanism.
- They can be applied to Trusted Computing and Vehicle Safety.
- We plan to release an open-source library implementing our group signatures.
Graphic: Private Attestation (Desktop Computer with TPM Support; Online Merchant)
20. Paper: Personal Information and the U.S. Census: A Model of Contextual Integrity
PORTIA PI: Helen Nissenbaum
Institution: New York University
Author: Timothy Weber
PORTIA Collaborator: Sam Hawala, US Census Bureau
Research Objectives
- Better understand the development of information-handling norms within a firmly established context.
- Use the theory of contextual integrity as a model with which to analyze current policies that regulate the flow of census information.
- Develop policy heuristics aimed at upholding a model of privacy as contextual integrity.
Significant Results
- Provide an opportunity for the user community (Statistical Research Division, U.S. Census Bureau) to reflect on its policies and purposes.
- Articulate historical transformations in census policies to provide better perspective on possible responses to new challenges, e.g., data mining that may increase the probability of reidentifying personal information.
Approach
- Examine the U.S. Census as a context in which significant attention has been focused on developing good information-handling policies.
- Trace the historical evolution of practices and policies regulating the flow of census data to understand the rationale underlying current policies.
Graphic
- Norms of appropriateness are brought to bear on categories of information, such as the questions asked on census schedules (e.g., race, income, age).
- Disclosure-avoidance techniques are applied to aggregated micro-data to prevent identification of individuals in small sets.
Broader Impact
- Highlight the importance of census policies regulating categories of information and access to identifiable data.
- Render explicit the reasons that certain data-handling practices matter.
Graphic (continued): Personal information. Norms of transmission are brought to bear on determining who has access to what information and under what conditions.
21. Paper: Privacy-Preserving Classification of Customer Data (SDM 2005)
PORTIA PI: Rebecca N. Wright
Institution: Stevens Institute of Technology
Authors: Z. Yang, S. Zhong, R. N. Wright
Graphic: Significant Results (for Naïve Bayes)
Graphic: Protocol in Fully Distributed Setting
Research Objectives
Many data-mining algorithms compute frequencies
of combinations of attributes. We ask how data
miners can learn frequencies in customer data
without compromising customer privacy.
Approach
We propose a privacy-preserving frequency-mining
protocol with which an online data miner can
compute the frequency of customers' data without
collecting the customers' actual data. Complex
data-mining models can be learned using this
protocol, e.g., naïve Bayes classifiers, ID3
trees, and association rules. The experimental
results show that the proposed solutions are very
efficient when learning data-mining models in the
fully distributed setting.
Broader Impact
- Explore a new direction for privacy-preserving data mining in the fully distributed setting.
- Show that both privacy and accuracy in data mining can be achieved using cryptographic techniques.
22. Paper: Graph Distances in the Streaming Model: the Value of Space (SODA 2005)
PORTIA PI: Joan Feigenbaum
Institution: Yale University
Authors: J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, J. Zhang
Research Objectives
Massive graphs arise naturally when large data
sets model interactions. In this setting,
traditional graph algorithms are not efficient.
We investigate efficient graph algorithms in the
streaming model.
Significant Results
- We present a streaming algorithm that, for a graph $G = (V,E)$, constructs a spanner in one pass and uses $|V| \cdot \mathrm{polylog}(|V|)$ space.
- We show a lower bound on the number of passes needed to find the vertices at distance d from a specific vertex.
Approach
There is a tradeoff between the accuracy of the
computation and the time-space efficiency of the
algorithm. We consider approximation algorithms
that give a result with bounded error but use
near-linear time and much less space than
traditional algorithms. In particular, we devise
streaming algorithms that construct graph
spanners and use the spanners to approximate
distances in the graph.
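The notion of building a spanner in one pass over an edge stream can be illustrated with the classical greedy rule, used here as a simpler stand-in for the paper's algorithm: keep an arriving edge only if its endpoints are currently more than t hops apart in the spanner built so far. Every discarded edge then has a detour of at most t hops, so distances in the kept subgraph exceed true distances by at most a factor of t. The sketch makes no claim about the space bound stated above, and the names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def within_distance(adj: Dict[Hashable, Set[Hashable]],
                    source: Hashable, target: Hashable, limit: int) -> bool:
    """Breadth-first search that gives up beyond `limit` hops."""
    if source == target:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == limit:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, ()):
            if nbr == target:
                return True
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return False


def stream_spanner(edges: Iterable[Edge], stretch: int) -> List[Edge]:
    """One pass over the stream; returns the edges kept in the spanner."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    kept: List[Edge] = []
    for u, v in edges:
        if not within_distance(adj, u, v, stretch):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            kept.append((u, v))
    return kept
```

On a triangle with stretch 2, the third edge is dropped because its endpoints are already two hops apart; with stretch 1, all three edges are kept.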
Broader Impact
- Initiate the study of massive-graph problems in the streaming model.
- Broaden and deepen understanding of distance computation and spanners in massive graphs.
23. Paper: Towards a Theory of Data Entanglement (ESORICS 2004)
PORTIA PI: Joan Feigenbaum
Institution: Yale University
Authors: J. Aspnes, J. Feigenbaum, A. Yampolskiy, S. Zhong
Research Objectives
Organizations can offer high-capacity, low-cost
storage services, but users typically do not
trust them. In this work, we ask how one can
protect data stored on an untrusted server.
Significant Results
                      Recovery algorithm
Adversary             Standard    Public    Private
Arbitrary             AONI        AONI      AONI
Entropy-Reducing      AONI        AONI      AONI
Approach
We propose that users cryptographically entangle their data in such a way that, if one user's data were corrupted, other users' data would also be corrupted. The strongest form of entanglement is called all-or-nothing integrity (AONI): either all users' data are fine, or nobody's are fine. We formalize the notion of entanglement in various models. All-or-nothing integrity, if feasible, would achieve our goals, because a well-known, visible storage service cannot risk offending all its users.
Broader Impact
- Raise public awareness of and research interest in cryptographically protected storage services.
- Improve understanding of entanglement and of systems (e.g., Dagster and Tangler) that use it.
24. Paper: Privacy-Enhancing k-Anonymization of Customer Data (PODS 2005)
PORTIA PI: Rebecca N. Wright
Institution: Stevens Institute of Technology
Authors: S. Zhong, Z. Yang, R. N. Wright
Graphic: Solution Overview
Research Objectives
The technique of k-anonymization is a powerful
tool for data de-identification. In this work, we
ask how one can k-anonymize customer data while
maintaining end-to-end privacy.
Significant Results
Solution                          Privacy guarantee               Computational overhead
Extraction of k-anonymous part    Ideal privacy                   $O(\text{customers})$
Distributed MW algorithm          Revealing row distances only    $O((\text{customers})^{2k})$
Approach
We consider two versions of the problem. For our
first version, we develop a protocol in which
the miner extracts the k-anonymous part of the
customer data. For our second version, we
present a distributed version of Meyerson and
Williams's algorithm (MW) for k-anonymization.
For both solutions, we rigorously define and
prove the security guarantee achieved.
Broader Impact
- Promote research interest in cryptographically secure protocols for databases.
- Extend the study of data de-identification to distributed settings.
25. Paper: Personal Information and the Design of Vehicle Safety Communication Technologies: An Application of Privacy as Contextual Integrity
PORTIA PI: Helen Nissenbaum
Institution: New York University
Author: Michael Zimmer
PORTIA Collaborator: Dan Boneh, Stanford
Research Objectives
- Determine how the design of VSC technologies might alter data flows of personal information.
- Raise awareness among the researchers and engineers designing and writing the standards for VSC applications of the privacy implications of their design decisions.
- Produce policy analyses and design heuristics to ensure that contextual integrity is preserved in the design of VSC technologies.
Significant Results
- VSC technologies have the potential to disrupt the contextual integrity of personal information flows in the context of highway travel.
- However, our acceptance into the VSC design community has been limited, threatening our ability to ensure that the value of privacy is a constitutive part of the technological design process.
- Successfully entering the design community remains an open research problem.
Graphic (before VSC): Cameras and police can visually record information under favorable conditions, and only if a vehicle happens to pass in front of a camera or police officer.
Approach
- Conceptualize the privacy of personal information flows in the context of highway travel in terms of the theory of privacy as contextual integrity.
- Enter the VSC design community to ensure that the value of privacy is a constitutive part of the design process, not merely retrofitted after completion.
Graphic (labels): No information shared. Visual information available: license plate, vehicle description, general location.
Graphic (with VSC): Widely dispersed receivers or police can digitally record data from all vehicles within 1000 meters.
Broader Impact
- Improve understanding of the theory of privacy as contextual integrity.
- Reveal how close attention to values can inform and guide the design of information technologies.