Lecturer: Moni Naor
1 Foundations of Privacy: Informal Lecture
Anti-Persistence or History Independent Data Structures
2 Why hide your history?
- Core dumps
- Losing your laptop
  - The entire memory representation of data structures is exposed
- Emailing files
  - The editing history may be exposed (e.g. Word)
- Maintaining lists of people
  - Sports teams, party invitees
3Election Day
Carol
Alice
Alice
Bob
- Elections for class president
- Each student whispers in Mr. Drews ear
- Mr. Drew writes down the votes
Carol
- ProblemMr. Drews notebook leaks sensitive
information - First student voted for Carol
- Second student voted for Alice
Alice
Alice
Bob
3
4 Learning from history only what's necessary
- A data structure has
  - A legitimate interface: the set of operations allowed to be performed on it
  - A memory representation
- The memory representation should reveal no information that cannot be obtained from the legitimate interface
5 History of history independence
- The issue has been dealt with in both the Cryptography and Data Structures communities
- Micciancio (1997): history independent trees
  - Motivation: incremental cryptography
  - Based on the shape of the data structure, not including the memory representation
  - Stronger performance model!
- Uniquely represented data structures
  - Treaps (Seidel, Aragon), uniquely represented dictionaries
  - Ordered hash tables (Amble, Knuth 1974)
6 More history
- Persistent data structures: possible to reconstruct all previous states of the data structure (Sarnak and Tarjan)
- We want the opposite: anti-persistence
- Oblivious RAM (Goldreich and Ostrovsky)
7 Overview
- Definitions
- History independent open addressing hashing
- History independent dynamic perfect hashing
- Memory Management
- (Union Find)
- Open problems
8 Precise definitions
- A data structure is
  - history independent if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on the memory representation
  - strongly history independent if, given any two sets of breakpoints along S1 and S2 such that corresponding points have identical contents, S1 and S2 induce the same probability distributions on the memory representation at those points
- Alternative definition: via transition probabilities
9 Relaxations
- Statistical closeness
- Computational indistinguishability
  - Example where helpful: erasing
- Allow some information to be leaked
  - Total number of operations
  - n-history independent: identical distributions if the last n operations were identical as well
- Under-defined data structures: the same query can yield several legitimate answers
  - e.g. an approximate priority queue
  - Define identical content: there is no suffix T such that the set of permitted results returned by S1∘T differs from the one returned by S2∘T
10 History independence is easy (sort of)
- If it is possible to decide the (lexicographically) first sequence of operations that produces a given content, just store the result of that sequence
- This gives a history independent version of a huge class of data structures
- Efficiency is the problem
11 Dictionaries
- Operations are insert(x), lookup(x) and possibly delete(x)
- The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted)
- Elements x ∈ U, some universe
- Size of table/memory: N
12 Goal
- Find a history independent implementation of dictionaries with good provable performance
- Develop general techniques for history independence
13 Approaches
- Unique representation
  - e.g. array in sorted order
  - Yields strong history independence
- Secret randomness
  - e.g. array in random order
  - Yields only history independence (not strong)
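The sorted-array approach can be sketched directly; the function names and the use of Python lists below are illustrative:

```python
import bisect

# Sketch of a uniquely represented dictionary: the content is kept as a
# sorted array, so the memory image depends only on the *set* stored,
# giving strong history independence (at the cost of O(n) updates).

def insert(arr, x):
    i = bisect.bisect_left(arr, x)
    if i == len(arr) or arr[i] != x:
        arr.insert(i, x)

def delete(arr, x):
    i = bisect.bisect_left(arr, x)
    if i < len(arr) and arr[i] == x:
        arr.pop(i)

# Two different operation histories with the same final content:
a, b = [], []
for x in [3, 1, 2]:
    insert(a, x)
for x in [2, 3, 1, 9]:
    insert(b, x)
delete(b, 9)
assert a == b == [1, 2, 3]  # same content, identical representation
```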
14 Open addressing: traditional version
- Each element x has a probe sequence h1(x), h2(x), h3(x), ...
  - Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2, ...
  - Double hashing
  - Uniform hashing
- An element is inserted into the first free cell in its probe sequence
- Search ends unsuccessfully at a free cell
- Efficient space utilization: almost all of the table can be full
15 Open addressing: traditional version
- Not history independent: later-inserted elements move further along in their probe sequences
[Figure: y probes a cell occupied by x; x arrived before y, so y moves on. No clash at the next cell, so insert y]
16 History independent version
- At each cell i, decide elements' priorities independently of insertion order
  - Call the priority function pi(x,y)
- If there is a clash, move the element of lower priority
- At each cell, priorities must form a total order
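A minimal sketch of this scheme in Python; the table size, the linear probe sequence, and the global rule "smaller key wins" are all illustrative assumptions (any per-cell total order works):

```python
N = 13  # table size (illustrative; assumes the table never fills up)

def probe(x, i):
    # Illustrative probe sequence: linear probing from cell x mod N.
    return (x + i) % N

def insert(table, x):
    i = 0  # x's position in its probe sequence
    while True:
        cell = probe(x, i)
        if table[cell] is None:
            table[cell] = (x, i)
            return
        y, j = table[cell]
        if x < y:  # assumed global priority: the smaller key wins the cell
            table[cell] = (x, i)
            x, i = y, j  # the evicted element resumes its own sequence
        i += 1

def lookup(table, x):
    i = 0
    while True:
        cell = probe(x, i)
        if table[cell] is None:
            return False  # unsuccessful search ends at a free cell
        y, _ = table[cell]
        if y == x:
            return True
        if x < y:  # x would have evicted y, so x cannot be further along
            return False
        i += 1

# The final configuration is independent of the insertion order:
t1, t2 = [None] * N, [None] * N
for x in [5, 18, 31]:
    insert(t1, x)
for x in [31, 5, 18]:
    insert(t2, x)
assert t1 == t2 and lookup(t1, 18) and not lookup(t1, 6)
```

The early exit in `lookup` is the unsuccessful-search shortcut from the Search slide: stopping at the first lower-priority element is sound precisely because the final configuration respects the priorities.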
17 Insertion
[Figure: y probes cell 2, which holds x; p2(x,y)? No, so x is evicted and moves along its probe sequence while y takes the cell]
18 Search
- Same as in the traditional algorithm
- In an unsuccessful search, can quit as soon as you find a lower-priority element

No deletions
- Problematic in open addressing
- A possible way out: clusters
19 Strong history independence
- Claim: for all hash functions and priority functions, the final configuration of the table is independent of the order of insertion
- Conclusion: strongly history independent
20 Proof of history independence
- A static insertion algorithm (clearly history independent): in each round, every unsettled element probes its next cell and the highest-priority element at each cell stays; gather up the rejects and restart
[Figure: a run of the static algorithm on x1..x6, round by round: p1(x2,x1), so insert x2; p3(x4,x5) and p3(x4,x6), so insert x4 and remove x5; insert x5; p6(x6,x4) and p6(x3,x6), so insert x3]
21 Proof of history independence
- Nothing moves further in the static algorithm than in the dynamic one
  - By induction on rounds of the static algorithm
- Vice versa
  - By induction on the steps of the dynamic algorithm
- Hence strongly history independent
- Alternative view: Blelloch-Golovin stable matching
22 Some priority functions
- Global
  - A single priority function, independent of the cell
- Random
  - Choose a random order at each cell
- Youth-rules
  - Call an element younger if it has moved less far along its probe sequence; younger elements get higher priority
23 Youth-rules
[Figure: x probes cell 2, which holds y; p2(x,y) because x has taken fewer steps than y, so x takes the cell and y moves on]
- Use a tie-breaker if the number of steps is the same
- This is a priority function
24 Specifying a scheme
- Priority rule
  - Choice of priority functions
  - In Youth-rules: determined by the probe sequence
- Probe functions
  - How are they chosen, maintained and computed?
25 Implementing Youth-rules
- Let each hi be chosen from a pairwise independent collection
  - For any two x and y, the random variables hi(x) and hi(y) are uniform and independent
- Let h1, h2, h3, ... be chosen independently
- Example: hi(x) = ((ai·x + bi) mod U) mod N, with U prime
  - Space: two elements per function
- Need only log N functions
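A sketch of such a family; the prime, the table size, and the number of functions are illustrative, and Python's `random` stands in for the secret coins:

```python
import random

P = 2**31 - 1  # a prime at least the size of the universe U (assumed)
N = 1024       # table size (illustrative)

def make_h():
    # Pairwise independent family h(x) = ((a*x + b) mod P) mod N;
    # each function costs only two stored values, a and b.
    a = random.randrange(1, P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % N

M = 10  # about log2(N) probe functions suffice for Youth-rules
hs = [make_h() for _ in range(M)]

assert all(0 <= h(x) < N for h in hs for x in (0, 12345, P - 1))
```

The final mod N makes the values only approximately uniform on [0, N), which is the usual trade-off for this two-word construction.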
26 Performance analysis
- Based on worst-case insertion sequences
- The important parameter is α, the fraction of the table that is used: αN elements
- Analysis of expected insertion time and search time (number of probes to the table)
- Have to distinguish successful and unsuccessful search
27 Analysis via the static algorithm
- For insertions, the total number of probes in the static and dynamic algorithms is identical
- Easier to analyze the static algorithm
- Key point for Youth-rules: in phase i, all unsettled elements are at the ith probe of their sequence
  - This assures fresh randomness of hi(x)
28 Performance
- For Youth-rules, implemented as specified:
  - For any sequence of insertions, the expected probe-time for insertion is at most 1/(1-α)
  - For any sequence of insertions, the expected probe-time for successful or unsuccessful search is at most 1/(1-α)
- Analysis based on the static algorithm
- α is the fraction of the table that is used
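As a heuristic sanity check (not the worst-case proof via the static algorithm): under idealized uniform probing, each probe hits an occupied cell independently with probability α, so the number of probes until a free cell is geometric:

```latex
\mathbb{E}[\text{probes}] \;=\; \sum_{k \ge 1} k\,(1-\alpha)\,\alpha^{k-1} \;=\; \frac{1}{1-\alpha}
```

The slide's claim is stronger: the same bound holds in expectation for any insertion sequence, with the pairwise independent functions above.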
29 Comparison to double hashing
- Analysis of double hashing with truly random functions: Guibas-Szemeredi, Lueker-Molodowitch
- These can be replaced by log n-wise independent functions: Schmidt-Siegel
- log n-wise independence is relatively expensive: either a lot of space or log n time
- Youth-rules is a simple and provably efficient scheme with very little extra storage
- An extra benefit of considering history independence
30 Other priority functions
- Amble-Knuth: log(1/(1-α)) for the global priority rule
  - With truly random hash functions
- Experiments show about log(1/(1-α)) for most priority functions tried
- Performance is for amortized search
31 Other types of data structures
- Memory management (dealing with pointers)
- Memory Allocation
- Other state-related issues
32 Dynamic perfect hashing: the FKS scheme, dynamized
- n elements to be inserted
- Top-level table: O(n) space, hashed by h
- Low-level tables: O(n) space total; the table for bucket i gets about si^2 cells for its si elements
[Figure: h maps elements x1, ..., x6 into buckets; bucket i, of size si, is resolved by its own function hi]
- The hi are perfect on their respective sets
- Rechoose h or some hi to maintain perfection and linear space
33 A subtle problem: the intersection bias problem
- Suppose we have
  - a set of states σ1, σ2, ...
  - a set of objects h1, h2, ...
  - a way to decide whether hi is good for σj
- Keep a current h as states change
  - Change h only if it is no longer good
  - When rechoosing, choose uniformly from the objects good for the current state σ
- Then this is not history independent
  - h is biased towards the intersection of the objects good for the current σ and those good for previous states
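The bias is easy to see in a toy simulation; the states, objects, and goodness predicate below are all invented for illustration:

```python
import random
from collections import Counter

OBJECTS = range(10)   # candidate objects h in {0, ..., 9}

def good(h, S):       # h is good for state S iff h is not in S
    return h not in S

def run(states):
    # Keep a current h; rechoose uniformly among good ones only when bad.
    h = random.choice([o for o in OBJECTS if good(o, states[0])])
    for S in states[1:]:
        if not good(h, S):
            h = random.choice([o for o in OBJECTS if good(o, S)])
    return h

random.seed(0)
S1, S2 = {0, 1, 2, 3, 4}, {5, 6, 7, 8}
trials = 50_000
direct = Counter(run([S2]) for _ in range(trials))      # reach S2 directly
via_S1 = Counter(run([S1, S2]) for _ in range(trials))  # pass through S1 first
# h = 9 is the only object good for *both* states; passing through S1
# roughly doubles its probability (1/3 instead of 1/6), leaking history.
assert via_S1[9] > 1.5 * direct[9]
```

Both runs end with the same content (state S2), yet the distribution of h differs, which is exactly the history independence failure.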
34 Dynamized FKS is not history independent
- It does not erase upon deletion
- It uses history-dependent memory allocation
- The hash functions (h, h1, h2, ...) are changed whenever they cease to be good
  - Hence they suffer from the intersection bias problem: they are biased towards functions that were good for previous sets of elements
  - Hence they leak information about past sets of elements
35 Making it history independent
- Use history independent memory allocation
- Upon deletion, erase the element and rechoose the appropriate hi; this solves the low-level intersection bias problem
- Some other minor changes
- Solve the top-level intersection bias problem...
36 Solving the top-level intersection bias problem
- Can't afford a top-level rehash on every deletion
- Generate two potential h's, φ1 and φ2, at the beginning
- Always use the first good one
- If neither is good, rehash at every deletion
- If not using φ1, keep a top-level table for it for easy goodness checking (likewise for φ2)
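A sketch of the two-candidate rule; `phi1`, `phi2` and `is_good` are abstract placeholders for the top-level functions and the goodness test:

```python
def top_level(phi1, phi2, is_good, S):
    # Candidates are drawn once at initialization; always use the first
    # good one, so the choice depends only on the current content S
    # (plus the fixed candidates), never on the path that produced S.
    if is_good(phi1, S):
        return phi1
    if is_good(phi2, S):
        return phi2
    return None  # neither is good: fall back to rehashing every operation

# Toy goodness predicate for illustration: phi is good iff phi not in S.
is_good = lambda phi, S: phi not in S
assert top_level(1, 2, is_good, {1}) == 2
assert top_level(1, 2, is_good, {3}) == 1
assert top_level(1, 2, is_good, {1, 2}) is None
```

Because the returned function is a deterministic function of the fixed candidates and the current set, it avoids the intersection bias of "keep the old one while it stays good".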
37 Proof of history independence
- The table's state is defined by
  - The current set of elements
  - The top-level hash function
    - Always the first good φi, or rechosen at each step
  - The low-level hash functions
    - Uniformly chosen from the perfect functions
  - The arrangement of sub-tables in memory
    - Use history-independent memory allocation
  - Some other history independent things
38 Performance
- Lookup takes two steps
- Insertion and deletion take expected amortized O(1) time
- There is a 1/poly chance that they will take more
39 SHI and unique representation
- Theorem (Hartline et al.): for a reversible data structure to be SHI, a canonical (unique) representation for each state must be determined during the data structure's initialization
40 SHI with deletions
- Blelloch and Golovin: a dictionary based on linear probing
- Goal: search in O(1) time (guaranteed)
- Each cluster is of size O(log n)
  - Can be obtained using 5-wise independence (Pagh et al., STOC 2007)
- Needs a random oracle for the high-level intersection bias
41 Open problems
- Better analysis for Youth-rules, as well as other priority functions, with no random oracles
- Efficient memory allocation: ours is O(s log s)
- Separations
  - Between strong and weak history independence (Buchbinder-Petrank)
  - Between history independent and traditional versions, e.g. for Union-Find
- Can persistence and (computational) history independence co-exist efficiently?
42 References
- Moni Naor and Vanessa Teague, Anti-persistence: History Independent Data Structures, STOC 2001
- Hartline, Hong, Mohr, Pentney and Rocke, Characterizing History Independent Data Structures, Algorithmica 2005
- Buchbinder and Petrank, Lower and Upper Bounds on Obtaining History Independence, Information and Computation 2006
- Guy Blelloch and Daniel Golovin, Strongly History-Independent Hashing with Applications, FOCS 2007
- Tal Moran, Moni Naor and Gil Segev, Deterministic History-Independent Strategies for Storing Information in Write-Once Memories, ICALP 2007