Title: Cryptographic hashes and data structures
1Cryptographic hashes and data structures
2Overview
- Cryptographic hash function
- Hashes are like pointers
- Hashes can name objects
- Tamper evident logging
3Part 1 Cryptographic hashes
- Security properties
- Talk about two widely used functions
- Applications
- Public key signatures
- Authentication
- S/Key
4Part 1 Cryptographic hashes
Hello -> 0x4aefdb91
World -> 0x6bd7916
- Security
- Collision resistant
- Second pre-image resistant
- Idealized as a random oracle
5Collision resistant
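[Diagram: two different inputs, x and y, hash to the same value v. Collision resistance means such a pair is infeasible to find.]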
6Second pre-image resistant
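[Diagram: Hello hashes to 0x4aefdb91. Second pre-image resistance means it is infeasible to find a different input y with that same hash.]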
7Idealized as a random oracle
- It deterministically maps every input to a random response uniformly distributed over its output range.
Hello -> 0x4aefdb9
World -> 0x6d7291
8Typical hash functions
- MD5: 128-bit output. Obsolete, broken.
- SHA-1: 160-bit output. Showing problems.
- Fast: 10 MB/s to 200 MB/s.
- New contest for developing a replacement
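To make the output sizes concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper name `digest_bits` is invented for illustration, and the digests it prints are real MD5 and SHA-1 values, so they will not match the illustrative hex strings drawn on the slides.

```python
import hashlib

def digest_bits(name: str, data: bytes) -> int:
    """Return the output size, in bits, of the named hash function."""
    return hashlib.new(name, data).digest_size * 8

# Hash the two example inputs used on the earlier slides.
for word in (b"Hello", b"World"):
    print(word, hashlib.md5(word).hexdigest(), hashlib.sha1(word).hexdigest())

print(digest_bits("md5", b"Hello"), digest_bits("sha1", b"Hello"))
```

Both functions are shown only because the slide names them; neither should be chosen for new designs that need collision resistance.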
9Cryptographic hashes
- What about a 32 bit hash function?
- Idealized as a random oracle
- Is it secure?
- Collision resistant
- Second pre-image resistant
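[Diagram: the collision picture (x and y both hashing to v) and the second pre-image picture (Hello hashing to 0x42afdb91, matched by some y), applied to a 32-bit hash.]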
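A short experiment suggests the answer to "Is it secure?". The sketch below assumes a stand-in 32-bit hash, `h32`, built by truncating SHA-256 to 4 bytes (the slides do not name a specific 32-bit function). By the birthday bound a collision is expected after roughly 2^16 inputs, so the search finishes quickly.

```python
import hashlib

def h32(data: bytes) -> bytes:
    """Stand-in 32-bit hash: the first 4 bytes of SHA-256 (an assumption)."""
    return hashlib.sha256(data).digest()[:4]

def find_collision(limit: int = 2_000_000):
    """Hash the strings "0", "1", "2", ... until two share a 32-bit hash."""
    seen = {}
    for i in range(limit):
        msg = str(i).encode()
        tag = h32(msg)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg
    return None

result = find_collision()
if result:
    a, b, tries = result
    print(a, b, h32(a).hex(), "after", tries, "inputs")
```

A second pre-image for a fixed input needs on the order of 2^32 attempts instead, which is slower but still well within reach of ordinary hardware, so a 32-bit hash fails both properties.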
10Applications
- Short hash acts as a proxy for a large object.
[Diagram: RSA applied to the whole contract, compared with RSA applied to the contract's short hash, 0x55adb35.]
11Applications Keyed MAC
- Shared secret S
- Bob proves to Alice that he holds the secret S
- Without disclosing it in the clear
Alice -> Bob: Nonce
Bob -> Alice: H(Nonce, S)
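The exchange above can be sketched as follows. This is an illustration under assumptions: the slide writes H(Nonce, S), while the code substitutes HMAC-SHA256 from Python's standard `hmac` module as the keyed hash, and the names `respond` and `check` are invented here.

```python
import hashlib
import hmac
import secrets

def respond(secret: bytes, nonce: bytes) -> bytes:
    """Bob's reply: a keyed hash of Alice's nonce (HMAC standing in for H)."""
    return hmac.new(secret, nonce, hashlib.sha256).digest()

def check(secret: bytes, nonce: bytes, reply: bytes) -> bool:
    """Alice recomputes the keyed hash and compares in constant time."""
    return hmac.compare_digest(respond(secret, nonce), reply)

S = b"shared secret"             # hypothetical shared secret
nonce = secrets.token_bytes(16)  # Alice's fresh challenge
reply = respond(S, nonce)        # what Bob sends back
print(check(S, nonce, reply))
```

Because the nonce is fresh for every run, a recorded reply is of no use to an eavesdropper in a later session.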
12Applications S/Key
- One-time-use passwords for login
- Don't store the password on the server
- User holds a small secret S
[Diagram: a chain of hash values (0x42af, 0xb823, 0x2ef7, 0x38cc) derived from S.]
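One common way to realize one-time passwords from a single secret is a Lamport-style hash chain, which is the idea S/Key builds on. The sketch below is a simplified illustration, not the S/Key wire format; the `Server` class, the chain length of 4, and the use of SHA-256 are all assumptions.

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def chain(secret: bytes, n: int) -> bytes:
    """Apply H to the secret n times."""
    x = secret
    for _ in range(n):
        x = H(x)
    return x

class Server:
    """Stores only the last accepted chain value, never the secret S."""
    def __init__(self, anchor: bytes):
        self.stored = anchor

    def login(self, otp: bytes) -> bool:
        if H(otp) == self.stored:
            self.stored = otp  # each password is accepted only once
            return True
        return False

S = b"small user secret"  # hypothetical
n = 4
server = Server(chain(S, n))
first = chain(S, n - 1)
print(server.login(first), server.login(first), server.login(chain(S, n - 2)))
```

Someone who steals the server's stored value would have to invert H to produce the next valid password.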
13Part 2 Hashes are like pointers
- Self-certifying
- Can verify the contents
- Contents can be
- Atoms
- Hashes (pointers)
- We can build data structures around hashes
- Applications
- Authenticated dictionaries
- Merkle trees
- Hash chain log
[Diagram: a node holding the atom Hello together with the hash values 0xdb91, 0x423c, and 0xe802.]
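The "hash as pointer" idea can be sketched as a content-addressed store: an object's name is the hash of its bytes, so whatever an untrusted store returns can be checked against the name used to request it. The `Store` class and the comma-separated encoding of an object holding two pointers are assumptions made for this example.

```python
import hashlib

class Store:
    """An untrusted key-value store where every key is the hash of its value."""
    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha256(data).hexdigest()
        self.blobs[name] = data
        return name

    def get(self, name: str) -> bytes:
        data = self.blobs[name]
        if hashlib.sha256(data).hexdigest() != name:
            raise ValueError("contents do not match their hash")
        return data

store = Store()
hello = store.put(b"Hello")                        # an atom
world = store.put(b"World")                        # another atom
pair = store.put((hello + "," + world).encode())   # an object made of pointers
left, right = store.get(pair).decode().split(",")
print(store.get(left), store.get(right))
```

Knowing only the hash of `pair` is enough to verify everything reachable from it.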
14Authenticated dictionaries
- Dictionary lookups
- Proofs of correct answer
- Security model
- Trusted author. Untrusted publisher.
- Only trust the author
- Don't trust the publisher doing lookups and returning results
- Result must be verified by the author's signature
- Applications
- Mapping from public keys to people/email addresses
- Someone who breaks in to the webserver can't fake results
15Authenticated dictionaries
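[Diagram: a hash-linked tree over the keys Always, Comp, Hard, Hello, Science, and World. Nodes are labelled with hash values (0xc4c7, 0x9690, 0xdb91, 0x84dd, 0xaf82, 0x03b6, 0x423c, 0xc3e1, 0xe802, 0x69be, 0x4a81, 0x1193), and an absent child is marked NULL.]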
17Proofs in an authenticated dictionary
[Diagram: the same hash-linked tree as on slide 15, over the keys Always, Comp, Hard, Hello, Science, and World.]
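A lookup proof in such a dictionary can be sketched as the search path from the root to the requested key. The code below is one possible construction under stated assumptions: a binary search tree whose node hash covers the key, the value, and both child hashes; an all-zero hash standing in for NULL; and placeholder values. The author's signature over the root hash is left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

NULL = b"\x00" * 32  # stands in for the NULL child pointer

def node_hash(key: str, value: str, left: bytes, right: bytes) -> bytes:
    h = hashlib.sha256()
    for part in (key.encode(), value.encode(), left, right):
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.digest()

@dataclass
class Node:
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def digest(self) -> bytes:
        l = self.left.digest() if self.left else NULL
        r = self.right.digest() if self.right else NULL
        return node_hash(self.key, self.value, l, r)

def build(items):
    """Build a balanced search tree from sorted (key, value) pairs."""
    if not items:
        return None
    mid = len(items) // 2
    k, v = items[mid]
    return Node(k, v, build(items[:mid]), build(items[mid + 1:]))

def prove(root, key):
    """Publisher: return the search path as (key, value, left, right) tuples."""
    path, node = [], root
    while node is not None:
        l = node.left.digest() if node.left else NULL
        r = node.right.digest() if node.right else NULL
        path.append((node.key, node.value, l, r))
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return path

def verify(root_hash: bytes, key: str, path):
    """Client: return the value if the path is valid and reaches key, else None."""
    expected = root_hash
    for k, v, l, r in path:
        if node_hash(k, v, l, r) != expected:
            return None
        if k == key:
            return v
        expected = l if key < k else r
    return None

keys = ["Always", "Comp", "Hard", "Hello", "Science", "World"]
entries = sorted((k, "value-of-" + k) for k in keys)  # placeholder values
root = build(entries)
root_hash = root.digest()  # the author would sign this value
proof = prove(root, "Comp")
print(verify(root_hash, "Comp", proof))
```

The client needs only the signed root hash; the publisher cannot substitute a different value without changing a hash somewhere along the path.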
18Simpler notation
[Diagram: the same tree drawn without hash values: root R over the nodes Comp, Always, Hello, Hard, World, and Science.]
19Merkle trees
- Public key cryptography is slow
- Can we batch?
- Do many signatures for each public key operation?
20Merkle trees
- Signing the root signs the full contents
[Diagram: a hash tree with root R over the leaves X1 through X5.]
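A minimal Merkle tree sketch shows why one signature on the root covers every leaf: each leaf can be tied to the root with a short list of sibling hashes. The details here are assumptions for illustration: SHA-256, separate "leaf" and "node" tags, and promoting an unpaired node unchanged when a level has odd length.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big"))
        h.update(p)
    return h.digest()

def merkle_levels(leaves):
    """Return all tree levels, from the leaf hashes up to the single root."""
    level = [H(b"leaf", x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(H(b"node", level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node moves up unchanged
        level = nxt
        levels.append(level)
    return levels

def prove(levels, index: int):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, proof) -> bool:
    h = H(b"leaf", leaf)
    for sibling_is_left, sibling in proof:
        h = H(b"node", sibling, h) if sibling_is_left else H(b"node", h, sibling)
    return h == root

leaves = [b"X1", b"X2", b"X3", b"X4", b"X5"]
levels = merkle_levels(leaves)
root = levels[-1][0]  # the only value that needs a public-key signature
print(all(verify(root, x, prove(levels, i)) for i, x in enumerate(leaves)))
```

One expensive signature on `root` then authenticates any number of leaves, each with a proof of logarithmic size.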
22Hash chain log
- Simple tamper-evident log
- Trusted clients inserting in a log with an untrusted logger
- On each update, logger commits to the past
- Tampering leaves evidence
- Notation
- Xn: events in the log
- Cn: commitment
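A hash chain log can be sketched in a few lines. The slides do not spell out the commitment formula, so the code assumes the usual construction Cn = H(Cn-1, Xn) with an all-zero starting value; `commit` and `commitments` are names invented for this example.

```python
import hashlib

def commit(prev: bytes, event: bytes) -> bytes:
    """Cn = H(Cn-1, Xn); the length prefix keeps the two fields unambiguous."""
    return hashlib.sha256(len(prev).to_bytes(4, "big") + prev + event).digest()

GENESIS = b"\x00" * 32  # assumed commitment for the empty log

def commitments(events):
    """Return the commitments C1..Cn for the given events."""
    out, c = [], GENESIS
    for x in events:
        c = commit(c, x)
        out.append(c)
    return out

honest = commitments([b"X1", b"X2", b"X3", b"X4"])
tampered = commitments([b"X1", b"X2 (edited)", b"X3", b"X4"])
# The first commitment is unaffected; every commitment from the edit onward differs.
print(honest[0] == tampered[0], honest[-1] == tampered[-1])
```

Each commitment depends on every earlier event, so rewriting history changes all later commitments that clients may already hold.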
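23Hash chain log
[Diagram: a chain over events Xn-5, Xn-4, and Xn-3, with commitment Cn-3 at its head.]
24Hash chain log
[Diagram: the chain extended with Xn-2; the head commitment is now Cn-2.]
25Hash chain log
[Diagram: the chain extended with Xn-1; the head commitment is now Cn-1.]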
26Timeline Entanglement
- Maniatis and Baker, 2003
- Agents in a distributed system each run a hash chain log and exchange commitments with each other
- Builds a history palimpsest, preventing anyone from tampering with the history without leaving evidence
- Can order events in remote timelines
27Timeline entanglement
[Diagram: Host A and Host B each keep a hash chain (events Xn-5, Xn-4), both at commitment Cn-4.]
28Timeline entanglement
[Diagram: Host A advances to commitment Cn-3; Host B is still at Cn-4.]
29Timeline entanglement
[Diagram: both hosts at commitment Cn-2, with event labels Xn-5 through Xn shown across the two timelines.]
30Timeline entanglement
[Diagram: Host A at Cn-2; Host B advances to Cn-1.]
31Timeline entanglement
[Diagram: Host A at Cn with event labels up to Xn+2; Host B at Cn+1.]
32Hash-based data structures
- Generic
- Can be arbitrary DAG data structures
- Can't handle loops
33Part 3 Hashes as names
- Hashes are
- Constant size
- Self-authenticating
- Can verify data is correct, from an untrusted host
- Detect data corruption
- Applications
- BitTorrent
- Problems?
- Hash Collisions
34BitTorrent
- A bunch of people all want the same file
- Breaks a file into 128 KB pieces
- Peers interested in a file
- Form a random graph
- Trade pieces they have for pieces they want
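35BitTorrent's .torrent file
[Diagram: the .torrent file lists a hash for each piece (Piece 1 hash, Piece 2 hash, Piece 3 hash, ..., Piece k hash), the file names and sizes (filename1,size1 through filename3,size3), and the trackers (Tracker1 through Tracker3, each as IP:Port). The label IH, presumably the infohash that names the torrent, appears beside the piece hashes.]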
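Piece hashing can be sketched as below. BitTorrent hashes each piece with SHA-1; the 128 KB piece size is taken from the previous slide, and the function names and sample data are made up for illustration. Parsing or writing a real .torrent file is not attempted.

```python
import hashlib

PIECE_LEN = 128 * 1024  # piece size from the slide

def piece_hashes(data: bytes, piece_len: int = PIECE_LEN):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [hashlib.sha1(data[i:i + piece_len]).digest()
            for i in range(0, len(data), piece_len)]

def verify_piece(index: int, piece: bytes, hashes) -> bool:
    """Check a piece from an untrusted peer against the trusted hash list."""
    return hashlib.sha1(piece).digest() == hashes[index]

data = bytes(range(256)) * 2048       # 512 KB of sample data
hashes = piece_hashes(data)
good = data[PIECE_LEN:2 * PIECE_LEN]  # the second piece
print(len(hashes), verify_piece(1, good, hashes), verify_piece(1, b"junk", hashes))
```

A peer that sends corrupted or malicious data is caught one piece at a time, without waiting for the whole download.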
36Hash collisions
- A 128-bit hash will have a collision, by chance, after about 2^64 (about 10^19) objects
- Birthday paradox
- Have a 10^-6 chance of collision with only about 10^16 objects
- Sounds big
- But only 3M objects each for 10B people
- How to reduce chances of collision?
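The figures above can be checked with the standard birthday approximation, p ≈ 1 - exp(-n^2 / (2 * 2^b)) for n objects and a b-bit hash. The sketch below is a rough calculation under that approximation; the slide's numbers are order-of-magnitude estimates, so exact agreement is not expected.

```python
import math

def collision_probability(n_objects: float, hash_bits: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n_objects ** 2) / (2.0 * 2.0 ** hash_bits))

for n in (1e16, 3e16, 2.0 ** 64):
    p = collision_probability(n, 128)
    print(f"{n:.3g} objects, 128-bit hash: p ~= {p:.3g}")

# A longer output is one way to reduce the chance of collision.
print(f"1e16 objects, 256-bit hash: p ~= {collision_probability(1e16, 256):.3g}")
```

With 3M objects for each of 10B people (3 × 10^16 objects) the estimate is a little over 10^-6, and doubling the output length to 256 bits makes it negligible.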
37Part 4 Secure logging
- What is tamper evidence?
- Threat model
- History tree
- Merkle aggregation
- Scaling the log
38Tamper evidence
- Log events to identify if they are later altered
- Requires looking at the log
- That evidence exists does not mean the evidence is efficiently findable
39Threat model
- Detect tampering, not prevent it
- Logger
- Stores events
- Never trusted
- Clients
- Little storage
- Create events to be logged
- Trusted only at time of event creation
- Auditors
- Little storage
- Trusted, at least one is honest
40Protocol for adding to log
Client -> Logger: Add event X
Logger -> Client: Event at index i, signed commitment Ci
41Classic approach
- Use a hash-chain log
- Sign the hash commitments
[Diagram: a hash chain over events Xn-5 through Xn-1, with head commitment Cn-1.]
42What if tampering occurred?
- Logger miscomputed hashes
- Can be found by asking for the raw event being
hashed
[Diagram: a hash chain over events Xn-5 through Xn-1, with head commitment Cn-1.]
43What if tampering occurred?
- Logger forks the history
- Rolls back the log and adds on different events
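- Attack requires that two commitments on different forks disagree on the contents of some event.
[Diagram: two forks of the log. One ends at commitment Cn-1 over events Xn-5 through Xn-1; the other ends at commitment Cn-3 and holds different events.]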
44Logger
- Generating a stream of log heads
- Each signs some log
- How to know if they are consistent with each other?
- If inconsistent, we have evidence of tampering
- But how to find?
- Have to look and audit
- Don't want to send the full log
- New paradigm
- Tamper evidence comes from auditing
[Diagram: a stream of log heads: Cn-5, Cn-4, Cn-2, Cn-1, Cn.]
45Two kinds of audits
- Membership auditing
- Prove the contents of a particular event in a particular log
- Look up an event
- Incremental auditing
- Prove that two logs, represented by their commitments, commit the same events they have in common
- Detect forking
46Membership auditing
Client/Auditor -> Logger: What is the event at index i in log Cj?
Logger -> Client/Auditor: Event is Xi, and here's a proof from Cj
47Incremental auditing
Auditor -> Logger: Prove that Ci and Cj are consistent
Logger -> Auditor: Proof
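48Historical integrity
- Definition
- If there is a verified incremental proof between commitments Cj and Ck (j < k), then for all events i at or before index j and all verifiable membership proofs that event i in log Cj is Xi and event i in log Ck is Xi', we must have Xi = Xi'.
[Diagram: two logs, one at commitment Cn-1 over events Xn-5 through Xn-1 and one at commitment Cn-3 over events Xn-5 through Xn-3.]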
49Historical integrity
- Any pair of commitments on different forks must fail an incremental audit
- Caveats
- Clients must share their received commitments with auditors
- Auditors must trade commitments with each other
- May probabilistically detect tampering by auditing a random subset of events
- Any tampering with an unaudited event will not be discovered
50Example Auditing a hash chain
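[Diagram: are commitments Cn-5 and Cn consistent? With a hash chain, the check has to walk every event between them: Xn-4, Xn-3, Xn-2, Xn-1, Xn.]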
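An incremental audit of a hash chain can be sketched as a replay: starting from the older commitment, re-hash every later event and compare the result with the newer commitment. The code assumes the commitment rule Cn = H(Cn-1, Xn) and an all-zero starting value, neither of which is fixed by the slides.

```python
import hashlib

def commit(prev: bytes, event: bytes) -> bytes:
    """Assumed rule: Cn = H(Cn-1, Xn), with a length prefix on the first field."""
    return hashlib.sha256(len(prev).to_bytes(4, "big") + prev + event).digest()

def audit(older: bytes, newer: bytes, events_between) -> bool:
    """Replay every event after the older commitment and compare the result."""
    c = older
    for x in events_between:
        c = commit(c, x)
    return c == newer

# Build a toy log of ten events, keeping every commitment.
events = [f"X{i}".encode() for i in range(1, 11)]
heads = [b"\x00" * 32]  # assumed commitment for the empty log
for x in events:
    heads.append(commit(heads[-1], x))

print(audit(heads[5], heads[10], events[5:]))   # honest log: consistent
fork = commit(heads[5], b"something else")      # logger rolls back and forks
print(audit(heads[5], fork, events[5:6]))       # fork does not match the event
```

The cost is linear in the distance between the two commitments, in both hashing and the data that must be sent.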
51Challenge
- How to make audits cheap
- CPU
- Communications complexity
- Storage
52History Tree
- Binary tree
- Events stored on leaves
- Logarithmic path length
- Random access
- Permits reconstruction of past version and past
commitments
53History Tree
[Diagram sequence, slides 53 to 59: the tree under root R grows one event at a time, from leaves X1 and X2 up to leaves X1 through X7.]
60Incremental proof from C3 to C7
- Auditor has commitments C3 and C7
- Demands proof P
- Pruned tree
- Proves that both commitments commit the same events
- Three trees
- Trees implicitly committed by C3 and C7
- Tree given in P
61Incremental proof from C3 to C7
[Diagram: the history tree over X1 through X7, showing commitments C3 and C7 and the pruned proof tree P.]
- Prove
- P has same events as C7
- P has same events as C3
- Therefore, C3 and C7 commit same events.
65Membership proof of event X3 in C7
[Diagram: the history tree over X1 through X7 with commitment C7 and a pruned tree P that proves X3 is in the log.]
66Pruned subtrees
[Diagram: the history tree over X1 through X7 with commitments C3 and C7; the subtrees pruned from proof P are highlighted.]
- Not sent to auditor
- Fixed by hashes above them
- Tampering is discovered on audit
68Merkle aggregation
- Annotate events with attributes
- Aggregated up the tree
- Used to perform queries
- Used to permit safe deletion
69Events are flagged
[Diagram: the history tree with root R over X1 through X7, with some events flagged.]
- Flags are incorporated into hashes
- Tamper-evident
- Tested for correct propagation during auditing
70Queries
[Diagram: the history tree with root R over X1 through X7, showing the part of the tree visited by a query.]
- Auditor wants all events matching flag
71Safe deletion
[Diagram: the history tree with root R over X1 through X7, with unimportant events deleted.]
- Unimportant events may be deleted
- When auditor requests deleted event
- Logger supplies proof that ancestor was not
important
72Other attributes
- Can use more than boolean flag
- Bank transactions
- Dollar value
- Aggregated with max
- Tagged events
- Tag with legal, finance, security, junk
- Aggregated with OR
- Marked with their expiration time
- Aggregated with max
73Generic aggregation
- (τ, ⊕, Γ)
- τ: type of attributes on each node in history
- ⊕: aggregation function
- Γ: maps an event to its attributes
- For any predicate P, as long as
- P(x) OR P(y) IMPLIES P(x ⊕ y)
- Then
- Can query for events matching P
- Can safe-delete events not matching P
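The query rule can be illustrated with the dollar-value example: attributes are integers, the aggregation function is max, and the predicate is "value >= threshold", which satisfies the implication above. The tree layout, the hashing details, and the sample values are assumptions for this sketch, not the exact history tree construction.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    attr: int                      # aggregated attribute: max value below here
    digest: bytes                  # hash covering the attribute and the children
    event: Optional[bytes] = None  # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def leaf(event: bytes, value: int) -> Node:
    d = hashlib.sha256(b"leaf" + value.to_bytes(8, "big") + event).digest()
    return Node(value, d, event=event)

def join(l: Node, r: Node) -> Node:
    attr = max(l.attr, r.attr)     # the aggregation function
    d = hashlib.sha256(
        b"node" + attr.to_bytes(8, "big") + l.digest + r.digest).digest()
    return Node(attr, d, left=l, right=r)

def build(nodes: List[Node]) -> Node:
    while len(nodes) > 1:
        nxt = [join(nodes[i], nodes[i + 1]) for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            nxt.append(nodes[-1])  # carry an unpaired node up unchanged
        nodes = nxt
    return nodes[0]

def query(node: Node, threshold: int, visited: list) -> List[bytes]:
    """Return events with value >= threshold, pruning subtrees that cannot match."""
    visited.append(node)
    if node.attr < threshold:
        return []                  # aggregate fails, so nothing below matches
    if node.event is not None:
        return [node.event]
    return (query(node.left, threshold, visited)
            + query(node.right, threshold, visited))

values = [5, 900, 12, 7, 3, 40, 1]  # hypothetical dollar values for X1..X7
root = build([leaf(f"X{i + 1}".encode(), v) for i, v in enumerate(values)])
visited = []
print(query(root, 100, visited), len(visited))
```

Because the attributes are part of each node's hash, a logger cannot understate an aggregate to hide a matching event without changing the commitment. The same pruning test identifies subtrees that hold nothing matching P, which is what makes safe deletion possible.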
74Scaling the log
- Storing events on secondary storage
- Increasing throughput
75Node write order
[Diagram: the history tree with root R over X1 through X7. The frozen nodes are numbered 1 to 11 in the order they are written.]
- Nodes are frozen (no longer ever change)
- In post-order traversal
- Static order
- Map into an array
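The write order can be sketched by listing which nodes freeze as each event arrives. The assumption here is that a node at height h freezes once all 2^h leaves beneath it exist; the (height, position) labels and function names are invented for illustration.

```python
def frozen_by(i: int):
    """Nodes that become frozen when the i-th event (1-based) is added.

    (0, i) is the leaf itself. A node at height h is assumed frozen once
    all 2**h leaves beneath it exist.
    """
    nodes = [(0, i)]
    height = 0
    while i % 2 == 0:
        i //= 2
        height += 1
        nodes.append((height, i))
    return nodes

def write_order(n_events: int):
    """Concatenate the frozen nodes in arrival order: a post-order layout."""
    order = []
    for i in range(1, n_events + 1):
        order.extend(frozen_by(i))
    return order

order = write_order(7)
leaf_slots = [order.index((0, i)) + 1 for i in range(1, 8)]
print(len(order), leaf_slots)
```

For seven events this yields 11 frozen nodes, matching the 1 to 11 numbering in the slide's diagram. Each node's array slot never changes once written, which suits append-only secondary storage.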
76Increasing throughput
- Public key signatures are slow
- Parallelize
77Parallelizing adds
- Processing pipeline
- Event arrives
- Assigned an index
- Serialization point
- Newly frozen interior nodes are computed
- O(log n) worst. O(1) typical. Dependencies
- New commitment Cn is generated
- O(log n). No dependencies
- Commitment Cn is signed and sent
- O(expensive). No dependencies except for Cn
78Parallelizing proof generation
- Processing pipeline
- Audits
- Request arrives
- Pruned tree built
- Requires read-only access to history. No dependencies.
- History is write-once, append-only
- May use mirrors
- Pruned tree is sent
79Tamper evident logs
- New paradigm
- Concise proofs
- Permit deletion
- Permits searching and safe deletion
- Scalable
80Summary
- Cryptographic hashes
- Data structures
- Naming
- New tamper-evident log design