Title: RANDOM GRAPHS IN CRYPTOGRAPHY
Slide 1: Random Graphs in Cryptography
- Adi Shamir
- The Weizmann Institute, Israel
May 15, 2007, 7th Haifa Workshop on Interdisciplinary Applications of Graph Theory, Combinatorics and Algorithms
Slide 2: Random Graphs in Cryptography
In this talk I will concentrate on some particular algorithmic issues related to random graphs which are motivated by cryptanalytic applications.
Many of the results I will describe are either unpublished or little known in our community.
Note that in cryptanalysis, constants are important!
Slide 3: Cryptography and Randomness
Cryptography deals with many types of randomness:
- random strings
- random variables
- random functions
- random permutations
- random walks
- ...
Slide 4: Cryptography and Randomness
The notion of random functions (oracles):
- truly random when applied to fresh inputs
- consistent when applied to previously used inputs
Example table:
- f(0) = 37
- f(1) = 92
- f(2) = 78
- f(3) = 51
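To make this concrete, here is a minimal Python sketch (my own, not from the talk) of a lazily sampled random oracle on {0, ..., n-1}: fresh inputs get truly random answers, repeated inputs get consistent ones.

    import random

    class RandomOracle:
        # Lazily sampled random function on {0, ..., n-1}.
        def __init__(self, n, seed=None):
            self.n = n
            self.table = {}                  # remembered answers, for consistency
            self.rng = random.Random(seed)

        def __call__(self, x):
            if x not in self.table:          # fresh input: truly random answer
                self.table[x] = self.rng.randrange(self.n)
            return self.table[x]             # used input: the same answer as before

    f = RandomOracle(100, seed=1)
    assert f(0) == f(0)                      # consistent on repeated queries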
Slide 5: Cryptography and Randomness
This tabular description gives us a local view, which is not very informative.
To see the big picture, we define the random graph G associated with the random function f: each vertex x has a single outgoing edge pointing to f(x).
Slide 6: Cryptography and Randomness
When the function f is a permutation, its associated graph G is very simple: a disjoint union of cycles.
Slide 7: Cryptography and Randomness
However, when the function f is a random function rather than a random permutation, we get a very rich and interesting structure.
Slides 8-9: Random Graph 1, Random Graph 2 (pictures of two sample random functional graphs)
Slide 10: Cryptography and Randomness
There is a huge literature on the structure and combinatorial properties of such random graphs: the distribution of component sizes, tree sizes, cycle sizes, vertex in-degrees, number of predecessors, etc.
Slide 11: Cryptography and Randomness
In many applications we are interested in the behavior of the random function f under iteration. Examples:
- pseudo-random generators
- stream ciphers
- iterated block ciphers and hash functions
- time/memory tradeoff attacks
- randomized iterates
In this case, we are interested in a single path starting at a random vertex within the random graph.
Slides 12-14: A random path in a random graph (animation)
Slide 15: Cryptography and Randomness
Such a path always starts with a tail, and ends with a cycle.
The expected length of both the tail and the cycle is about the square root of the number of vertices.
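This is easy to check empirically. The following Python sketch (my own; the sizes are arbitrary) samples random functions on n = 2^14 points and measures the average tail-plus-cycle length of a random path, which should be close to sqrt(pi*n/2), about 160 here:

    import random

    def rho_length(f, x0):
        # Walk from x0, remembering when each value was first seen.
        seen, x, i = {}, x0, 0
        while x not in seen:
            seen[x] = i
            x, i = f(x), i + 1
        return seen[x], i - seen[x]                  # (tail length, cycle length)

    n, trials, rng = 2**14, 100, random.Random(0)
    avg = 0.0
    for _ in range(trials):
        tab = [rng.randrange(n) for _ in range(n)]   # a fresh random function
        tail, cyc = rho_length(lambda x: tab[x], rng.randrange(n))
        avg += (tail + cyc) / trials
    print(avg, (3.14159 * n / 2) ** 0.5)             # both close to sqrt(pi*n/2)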
Slides 16-20: Interesting algorithmic problems on paths
Assuming that we can only move forwards along edges:
- Find some point on the cycle
- Find the same point a second time
- Find the length of the cycle
- Find the cycle entry point
Slide 21: Interesting algorithmic problems on paths
Why are we interested in these algorithms?
- Pollard's rho algorithm: the cycle length l can be used to find small factors of large numbers, and requires only negligible memory (see the sketch below)
- Finding collisions in hash functions: the cycle entry point can represent a hash function collision
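As an aside, here is a minimal Python sketch of Pollard's rho factoring method, running Floyd's two fingers on the iteration f(x) = x^2 + c mod n (the constants below are the classic textbook choices, not from the talk):

    from math import gcd

    def pollard_rho(n, c=1, x0=2):
        # A collision mod an unknown prime factor p of n shows up
        # as gcd(|x - y|, n) > 1 long before a collision mod n.
        f = lambda x: (x * x + c) % n
        x, y, d = x0, x0, 1
        while d == 1:
            x = f(x)                   # normal speed
            y = f(f(y))                # double speed
            d = gcd(abs(x - y), n)
        return d if d != n else None   # d == n: retry with a different c or x0

    print(pollard_rho(8051))           # 8051 = 83 * 97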
Slide 22: How to find a collision in a given hash function H?
- Exhaustive search: requires 2^n time and no space
- Birthday paradox: construct a large table of 2^{n/2} random hash values, sort it, and look for consecutive equal values; requires both time and space 2^{n/2}
- Random path algorithm: iterate the hash function until you find the entry point into a cycle; requires 2^{n/2} time and very little space (see the sketch below)
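A minimal sketch of the random path idea in Python, using a toy 24-bit hash built by truncating SHA-256 (my own construction, for illustration only): the two distinct predecessors of the cycle entry point form a collision.

    import hashlib

    def h(x, n_bytes=3):
        # Toy hash: the first n_bytes of SHA-256 of x.
        return hashlib.sha256(x.to_bytes(8, 'big')).digest()[:n_bytes]

    def rho_collision(x0=0):
        g = lambda x: int.from_bytes(h(x), 'big')   # iterate on the digest
        slow, fast = g(x0), g(g(x0))                # Floyd phase 1: meeting point
        while slow != fast:
            slow, fast = g(slow), g(g(fast))
        a, b = x0, slow                             # phase 2: walk to the entry point
        if a == b:
            return None                             # x0 was on the cycle; retry
        while g(a) != g(b):
            a, b = g(a), g(b)
        return a, b                                 # a != b but h(a) == h(b)

    print(rho_collision())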
Slide 23: Cycle detection is a very well studied problem
- Floyd
- Pollard
- Brent
- Yao
- Quisquater
- ...
And yet there are new surprising ideas!
Slide 24: The best known technique: Floyd's two-finger algorithm
- Keep two pointers ('fingers')
- Run one of them at normal speed, and the other at double speed, until they collide
(Slides 25-33: animation of the two fingers until they collide.)
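In Python, the two-finger search is only a few lines (a minimal sketch; names are mine):

    def floyd_meet(f, x0):
        # Run a slow and a fast finger until they collide inside the cycle.
        slow, fast = f(x0), f(f(x0))
        while slow != fast:
            slow = f(slow)           # normal speed
            fast = f(f(fast))        # double speed
        return slow                  # some point on the cycle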
Slides 34-39: Can we use Floyd's algorithm to find the entry point into the cycle?
- First find the meeting point
- Move one of the fingers back to the beginning
- Move the two fingers at equal speed
Slides 40-45: Why does it work?
- Denote by d the distance from the beginning to the meeting point
- The fast finger ran another d steps and reached the same point, so d is some (unknown) multiple of the cycle length
- Therefore, running the two marked fingers another d steps reaches the same point again
- So the two fingers meet for the first time at the entrance to the cycle, and then travel together
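Continuing the sketch above, the entry point, tail length and cycle length follow directly (again my own illustrative code):

    def floyd_entry(f, x0):
        meet = floyd_meet(f, x0)     # phase 1: a meeting point on the cycle
        a, b, tail = x0, meet, 0
        while a != b:                # phase 2: equal speed from start and meet
            a, b, tail = f(a), f(b), tail + 1
        entry = a                    # first coincidence = cycle entry point
        length, c = 1, f(entry)
        while c != entry:            # walk once around to measure the cycle
            c, length = f(c), length + 1
        return entry, tail, length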
Slides 46-48: Is this the most efficient cycle detection algorithm?
- When the path has n vertices and the tail is short, Floyd's algorithm requires about 3n steps, and its extension requires up to 5n steps
- When the cycle is short, the fast finger can traverse it many times without noticing
Slide 49: A better idea
- Place checkpoints at fixed intervals
- Update the checkpoints periodically
Slides 50-52: Problems
- Too few checkpoints can miss small cycles
- Too many checkpoints are wasteful
- You do not usually know in which case you are!
Slide 53: Examples of unusually short cycles
- cellular automata (e.g., when simulating the Game of Life)
- stream ciphers (e.g., when one of the LFSRs is stuck at 0)
Slide 54: A very elegant solution, published by Nivasch in 2004
Slide 55: Properties of the Nivasch algorithm
- Uses a single finger
- Uses a negligible amount of memory
- Stops almost immediately after recycling
- Efficient for all possible lengths of cycle and tail
- Ideal for fast hardware implementations
Slide 56: The basic idea of the algorithm
- Maintain a stack of values, which is initially empty
- Insert each new value at the top of the stack
- Force the values in the stack to be monotonically increasing
(Example iteration sequence used in the animation: 4, 3, 6, 7, 9, 5, 0, 8, 2, 1, ...)
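A minimal Python sketch of the stack algorithm (my own code; it returns the repeated value and the two step indices at which it was seen, whose difference is the cycle length):

    def nivasch(f, x0):
        stack = []                           # (value, step) pairs, increasing in value
        x, i = x0, 0
        while True:
            while stack and stack[-1][0] > x:
                stack.pop()                  # keep the stack monotonically increasing
            if stack and stack[-1][0] == x:
                return x, stack[-1][1], i    # value repeated: i - j = cycle length
            stack.append((x, i))
            x, i = f(x), i + 1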
(Slides 57-82: step-by-step animation of the stack algorithm over the example sequence.)
Slide 83: Stop when two identical values appear at the top of the stack.
Slide 84: Claim: the maximal size of the stack is expected to be only logarithmic in the path length, requiring negligible memory.
Slide 85: Claim: the stack algorithm always stops during the second cycle, regardless of the length of the cycle or its tail.
Slide 86: Proof: the smallest value on the cycle cannot be eliminated by any later value. Its second occurrence will eliminate all the higher values separating them on the stack.
Slide 87: The smallest value in the cycle is located at a random position, so we expect to go through the cycle at least once and at most twice (1.5 times on average).
Slide 88: Improvement: partition the values into k types, and use a different stack for each type. Stop the algorithm when a repetition is found in some stack (see the sketch below).
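A sketch of the k-stack variant (my own code, partitioning values by x mod k as one simple choice of 'type'):

    def nivasch_k(f, x0, k):
        stacks = [[] for _ in range(k)]
        x, i = x0, 0
        while True:
            s = stacks[x % k]                # the stack responsible for this type
            while s and s[-1][0] > x:
                s.pop()
            if s and s[-1][0] == x:
                return x, s[-1][1], i        # repetition found in some stack
            s.append((x, i))
            x, i = f(x), i + 1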
Slide 89: The new expected running time is (1 + 1/k)n. Note that n is the minimum possible running time of any cycle detection algorithm, and for k = 100 we exceed it by only 1%.
Slide 90: Unlike Floyd's algorithm, the Nivasch algorithm provides excellent approximations of the lengths of the tail and the cycle as soon as we find a repeated value, with no extra work.
Slide 91: Note that when we stop, the bottom value in each stack contains the smallest value of that type, and that these k values are uniformly distributed along the tail and cycle.
Slide 92: Adding two special points to the k stack bottoms, at least one point must be in the tail and at least one must be in the cycle, regardless of their sizes.
Slide 93: We can now find the two closest points (e.g., 0 and 2) which are just behind the collision point. We can thus find the collision after a short synchronized walk.
Slide 94: The Fundamental Problem of Cryptanalysis
- Given a ciphertext, find the corresponding key
- Given a hash value, find a first or second preimage
Both amount to inverting the easily computed random function f, where f(x) = E_x(0) or f(x) = H(x).
Slide 95: The Random Graph Defined by f
Goal: go backwards. Means: going forwards.
Slide 96: Possible solutions
Method 1, exhaustive search: time complexity T = N, memory complexity M = 1.
Method 2, exhaustive table: time complexity T = 1, memory complexity M = N.
Time/memory tradeoffs find a compromise between the two extremes, i.e., M << N and T << N, by using a free preprocessing stage.
Slide 97: Hellman's T/M Tradeoff (1979)
Preprocessing phase: choose m random startpoints and evaluate chains of length t. Store only the pairs (startpoint, endpoint), sorted by endpoint.
Online phase: from the given y = f(x), complete the chain; then find x by re-calculating the chain from its startpoint.
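A minimal sketch of both phases in Python (my own code; a real implementation would use a sorted table on disk rather than a dict, and many tables rather than one):

    import random

    def hellman_preprocess(f, n, m, t, seed=0):
        # Preprocessing: m chains of length t; keep only endpoint -> startpoint.
        rng = random.Random(seed)
        table = {}
        for _ in range(m):
            start = rng.randrange(n)
            x = start
            for _ in range(t):
                x = f(x)
            table[x] = start
        return table

    def hellman_online(f, t, table, y):
        # Online phase: walk forward from y = f(x); when a stored endpoint is
        # hit, replay its chain from the startpoint to look for a preimage of y.
        x = y
        for _ in range(t + 1):
            if x in table:
                z = table[x]
                for _ in range(t):
                    if f(z) == y:
                        return z        # a preimage of y
                    z = f(z)            # replay failed: it was a false alarm
            x = f(x)
        return None                     # y is not covered by this table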
Slide 98: How can we cover this graph by chains?
The main problem: long chains converge.
Slide 99: Problem: it is hard to cover more than N/t images per table. Why? A new path of length t is likely to collide with the N/t images already covered by the table, due to the birthday paradox.
Hellman's solution: use t independent tables built from t related functions f_i(x) = f(x + i mod N); note that an inversion of f_i yields an inversion of f.
Slide 100: Are these graphs really independent?
- Local properties are preserved, while global properties are modified
- A point which had k predecessors in f will also have k predecessors in f_i, but their identities will change
- In particular, all the graphs will have exactly the same set of leaves, and values which are not in the range of f will not be covered by any path in any table
- On the other hand, the number of components, the sizes of the cycles, and the structure of the trees hanging off the cycles can be very different
Slide 101: Are these graphs really independent?
Hellman's trick is theoretically unfounded, but works very well in practice. To invert a given image, try each of the t functions separately, so both time and space grow by a factor of t: T = t^2, M = mt. By the birthday paradox, the maximum possible values of t and m in a single table satisfy mt^2 = N, which gives the T/M tradeoff curve TM^2 = N^2 (indeed, TM^2 = t^2 * (mt)^2 = (mt^2)^2 = N^2).
Slide 102: A typical choice of parameters
Let c = N^{1/3}. Use c tables, each with m = c paths, each path of length about t = c (stopping each path at the first 'distinguished point' rather than at a fixed length). Together they cover most of the c^3 = N vertices. Total time T = c^2 = N^{2/3}, total space M = c^2 = N^{2/3}.
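A common way to define distinguished points (an assumption on my part; the talk does not spell it out) is to require a fixed number of low-order zero bits, so that a random chain ends after about 2^bits steps on average:

    def is_distinguished(x, bits=20):
        # x is distinguished iff its low `bits` bits are all zero.
        return x & ((1 << bits) - 1) == 0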
Slide 103: Are such tradeoffs practically interesting?
This can be the best approach for cryptosystems with about 80-bit keys, and for cryptosystems whose keys are derived from passwords with up to 16 characters.
Slide 104: A new optimization of Hellman's scheme
If each value has b bits, straightforward implementations require 2b bits of memory per path. I will now show that only b/3 bits are needed. This is a 6-fold saving in memory, which is equivalent to a 36-fold saving in time due to the T/M tradeoff curve TM^2 = N^2.
Slide 105: The new optimization
According to an old Chinese philosopher: "Paths in random graphs are like people: they are born at uniformly distributed startpoints, but die at very unequally distributed endpoints!"
Slide 106: The unequal distribution of endpoints
Distinguished points are much more likely to be near the leaves than deep in this graph, so very few of them will be chosen as endpoints.
Slide 107: The new optimization: forget the endpoints!
Note that the startpoints are arbitrary, so for each one of the c tables we can choose a different interval of c consecutive values as startpoints. Since c = N^{1/3}, only b/3 bits are needed per startpoint. Since we do not store endpoints, this is all we need!
Slide 108: Divide all the c^2 possible distinguished points into about c large regions
- During preprocessing, make sure that each region has at most one path ending at one of its distinguished points
- For each region, memorize the startpoint of this path (if it exists), but not the value of the corresponding endpoint
Slide 109: Problem: this can lead to too many false alarms
A false alarm happens when a stored endpoint is found, but its corresponding startpoint does not lead back to the initial value. In Hellman's scheme, false alarms happen in about half the tables we try. This wastes time, but is hard to avoid.
Slides 110-111: There are two types of false alarms here
- An old false alarm happens when the path from the initial value joins one of the precomputed paths
- A new false alarm happens when the path from the initial value enters a new endpoint
Slide 112: A surprisingly small number of new false alarms are created by forgetting the endpoints
Endpoints which are likely to be chosen by the online phase were also likely to be chosen by the preprocessing phase. Since the Hellman parameters were chosen maximally, with high probability each new path is likely to end in one of the marked endpoints (otherwise we could add more paths to increase our cover!).
Slide 113: The bottom line
Simulations show that the total running time is increased only by a few percent due to new false alarms, and thus it is a complete waste to memorize the endpoints!
Slide 114: Oechslin's Rainbow Tables (2003)
Slide 115: There are many other possible tradeoff schemes
Use a different sequence of functions along each path, such as 1,1,1,2,2,2,3,3,3 or 1,2,3,1,2,3,1,2,3 or pseudorandom, e.g., 1,2,2,1,2,1,1. Or make the choice of the next function depend on previous values.
Slide 116: What kind of random graph are we working with in such schemes?
There was already a slight problem with the multiple graphs of Hellman's scheme, and Oechslin's scheme is even weirder. It's time to define a new notion of a random graph!
Slide 117: Barkan, Biham, and Shamir (Crypto 2006)
We introduced a new type of graph called a Stateful Random Graph, and proved rigorous bounds on the achievable time/memory tradeoffs of any scheme which is based on such graphs, including Hellman, Oechslin, and all their many variants and possible extensions.
Slide 118: The Stateful Random Graph Model
(Diagram: y0 -U-> x1 -f-> y1 -U-> x2 -f-> y2 -U-> ..., carrying hidden states s0, s1, s2, ...)
- The nodes in the graph are pairs (y_i, s_i), with N possible images y_i and S possible states s_i
- The scheme designer can choose any U; then a random f is given
- The increased number of nodes (N*S) can reduce the probability of collisions, and a good U can create more structured graphs
- Examples of states: the table index in Hellman, the column index in Oechslin. We call it a hidden state, since its value is unknown to the attacker when he tries to invert an image y
Slide 119: The Stateful Random Graph Model (cont.)
U in Hellman: x_i = y_{i-1} + s_{i-1} mod N, and s_i = s_{i-1}.
Slide 120: The Stateful Random Graph Model (cont.)
U in Rainbow: x_i = y_{i-1} + s_{i-1} mod N, and s_i = s_{i-1} + 1 mod S.
Slide 121: The Stateful Random Graph Model (cont.)
U in exhaustive search: x_i = s_{i-1}, and s_i = s_{i-1} + 1 mod N, which goes over all the preimages of f in a single cycle.
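The three choices of U translate directly into code; a minimal sketch (the sizes and names are mine):

    N, S = 2**20, 2**10                  # illustrative sizes

    def U_hellman(y, s):
        return (y + s) % N, s            # state = table index, never changes

    def U_rainbow(y, s):
        return (y + s) % N, (s + 1) % S  # state = column index, increments

    def U_exhaustive(y, s):
        return s, (s + 1) % N            # ignore y; enumerate all preimages

    def walk(f, U, y0, s0, steps):
        # One path in the stateful graph: (y, s) -U-> x -f-> (f(x), s').
        y, s = y0, s0
        for _ in range(steps):
            x, s = U(y, s)
            y = f(x)
        return y, s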
Slide 122: Coverage and Collision of Paths
Net coverage: the set of images y_i covered by the M paths.
Gross coverage: the set of nodes (y_i, s_i) covered by the M paths.
Definition: two paths collide if both y_i = y_j and s_i = s_j, i.e., two distinct predecessor nodes (y_{i-1}, s_{i-1}) and (y_{j-1}, s_{j-1}) lead to the same node.
Slide 123: The rigorously proven Coverage Theorem
Let M = N^a for any 0 < a < 1, and let A be the bound defined in the paper. For any U with S hidden states, with overwhelming probability over random f's, the net coverage of any collection of M paths of any length in the stateful random graph is bounded from above by 2A.
Slide 124: Reduction of Best Case to Average Case
For a given U, consider a huge table W indexed by all the possible functions and all the subsets of M startpoints.
W_{i,j} = 1 if the net coverage of f_i and M_j is larger than 2A (0 otherwise).
We want to prove that almost all the rows contain only zeroes, by proving that there are fewer 1's than rows in W.
Slide 125: Upper Bounding Prob(W_{i,j} = 1)
Method:
- Construct an algorithm that counts the net coverage for f_i and M_j
- Analyze the probability that the counted coverage exceeds 2A, i.e., Prob(W_{i,j} = 1), over a random and uniform choice of the startpoints M_j and the function f_i
The combinatorial heart of the proof:
- Define the notion of a coin toss with fixed success probability q
- Define the notion of a 'miracle': many coin tosses with few successes
- Prove that the probability of a miracle is negligible
Slide 126: Bounding Prob(W_{i,j} = 1): Basic Idea
The algorithm traverses the chains, stops each chain at its first collision, and counts the net coverage.
We want to treat each output of f as a truly random number. However, this view is justified only the first time f is applied to an input (a fresh value); otherwise, the output of f is already known. Recall: a collision requires (y_i, s_i) = (y_j, s_j).
Slides 127-134: Bounding Prob(W_{i,j} = 1): Basic Idea (animation)
The counting algorithm maintains S 'fresh buckets', one per hidden state, holding the fresh values produced so far while the chains are traversed.
- A value obtained by applying f to a previously used input is not fresh, even if it lands in another bucket: if we already know f(2) = 7, and 7 is already covered by bucket 1, there is no need to add it to fresh bucket 4.
- A chain must end when a freshly created value, such as f(3), collides with a value already in its fresh bucket.
Clearly NetCoverage <= the sum of the fresh bucket sizes.
Slide 135: Bounding Prob(W_{i,j} = 1): Analysis
What is the probability of a collision between a fresh image y_i = f(x_i) and the values in the fresh bucket? Exactly |FreshBucket|/N, as y_i = f(x_i) is truly random and independent of previous events.
Problem: |FreshBucket| depends on previous probabilistic events and is difficult to analyze.
Slide 136: Bounding Prob(W_{i,j} = 1): Coin Toss
Set a threshold of A/S in each bucket, dividing it into a lower and an upper part.
A 'coin toss' occurs when x_i is fresh and the lower part of its bucket is full. Hence |UpperBuckets| <= #CoinTosses, and NetCoverage <= sum of fresh bucket sizes <= A + |UpperBuckets| <= A + #CoinTosses.
A 'successful' coin toss is one in which y_i collides with the lower part of the bucket. The probability that a coin toss is successful is exactly q = A/(SN), independent of previous events! A successful coin toss implies a collision, so there are at most M successful coin tosses.
Slide 137: Miracles happen with very low probability
A 'miracle' is NetCoverage > 2A. A miracle implies that after A coin tosses there are fewer than M successes, i.e., Prob(Miracle) <= Prob(B(A,q) < M), where B(A,q) is a binomial random variable with q = A/(SN).
Concluding the proof: Prob(W_{i,j} = 1) is so small that the number of 1's in the table is much smaller than the number of rows, so for any tradeoff scheme U with S hidden states, almost all functions f cannot be covered well even by the best subset of M startpoints. QED.
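For intuition (with toy parameters of my own choosing; the proof itself uses asymptotic tail bounds, since A is huge), the binomial tail Prob(B(A,q) < M) can be computed exactly:

    from math import comb

    def prob_binomial_below(A, q, M):
        # Prob(B(A, q) < M), summed exactly term by term.
        return sum(comb(A, i) * q**i * (1 - q)**(A - i) for i in range(M))

    print(prob_binomial_below(A=1000, q=0.05, M=10))   # a tiny tail probability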
Slide 138: Rigorous Lower Bound on the Number of Hidden States S
Requiring a net coverage of at least N/2, the coverage theorem implies a lower bound that the number of hidden states S must satisfy.
Slide 139: Corollaries
To cover most of the vertices of any stateful random graph, you have to use a sufficiently large number of hidden states, which determines the minimal possible running time of the online phase of the attack. This rigorously proven lower bound is applicable to Hellman's scheme, the Rainbow scheme, and any other scheme which can be described by stateful random graphs.
Slide 140: Conclusion
Random graphs are wonderful objects to study. Understanding their structure can lead to many cryptographic and cryptanalytic optimizations. In this talk I gave only a small sample of the published and folklore results at the interface between cryptography and random graph theory.