Title: Fast Dictionary Attack
1Fast Dictionary Attack
- 22C196 Security in Distributed System
- Dat Tien Nguyen
- Mar 29 2007
2Quick notes
- First, I'm very sorry for my English
- If you have any questions or comments, feel free to ask.
- Hopefully I won't make you fall asleep
3Dictionary Attack
4Definition
- (1) A method used to break systems by testing all possible passwords beginning with words that have a higher possibility of being used
- (from Webopedia)
5Definition (cont.)
- (2) An e-mail spamming technique in which the spammer sends out millions of e-mails with randomly generated addresses using combinations of letters added to known domain names
- (from Webopedia)
6Dictionary attack vs. brute-force attack
7Example
- Break Uday's password
- What dictionary should we try?
- An English dictionary
- An Indian dictionary
- His girlfriends' list
- Do you want to try a Vietnamese dictionary?
8Some naïve improvement
- Use a larger dictionary or more dictionaries
- Apply string manipulation to the dictionary words, for example to the word:
- password
- Its reverse: drowssap
- Number-letter replacement: p4ssw0rd
- Capitalization: PasSwoRd
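The string manipulations listed above can be sketched in a few lines of Python. This is only an illustration: the function name `mangle` and the particular replacement table are assumptions, and it only produces simple capitalizations, not arbitrary mixed-case forms such as PasSwoRd.

```python
def mangle(word):
    """Return a set of simple variants of one dictionary word."""
    # Hypothetical number-letter replacement table (an assumption).
    leet = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0"})
    return {
        word,
        word[::-1],            # backward: password -> drowssap
        word.translate(leet),  # number-letter replacement: p4ssw0rd
        word.capitalize(),     # first letter upper case
        word.upper(),          # all upper case
    }


candidates = mangle("password")
```

Each dictionary word grows into a handful of candidates, so the dictionary size is multiplied by a small constant rather than exploding to the full key space.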
9Advanced improvement
- Rainbow Table
- Markovian/Deterministic Finite Automata Dictionary
10My presentation
- [1] Philippe Oechslin, Making a Faster Cryptanalytic Time-Memory Trade-Off
- [NS05] or [2] A. Narayanan and V. Shmatikov, Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff
- Fun facts about RainbowCrack
11Rainbow Table
12The problem
- Given a fixed plaintext $P_0$ and the corresponding ciphertext $C_0$, where the encryption function is $S$
- Find the key $k \in N$ such that
- $C_0 = S_k(P_0)$
13The original method
- Try to generate all possible ciphertexts by encrypting P0 with all N possible keys.
- The ciphertexts are organized in chains so that only the first and the last elements of each chain need to be stored
14The original method (cont.)
- The chains are created by using a reduction function R, which creates a key from a ciphertext.
- $R(S_k(P_0))$ is written as $f(k)$
15The original method (cont.)
- m chains of length t are created. Their first and last elements are stored.
- To find the key for a given ciphertext C:
- Generate a chain of keys starting with R(C), up to length t
- If a key in this chain matches an endpoint stored in the table, the key for C is probably in the corresponding table chain
- Regenerate that chain from its stored first key; the key we want is the one that comes just before R(C)
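The table construction and lookup just described can be sketched as follows. This is a toy illustration under stated assumptions: the key space is the integers 0..N-1, SHA-256 over key and plaintext stands in for the cipher $S_k(P_0)$, and the names `build_table` and `lookup` are hypothetical.

```python
import hashlib

N = 2 ** 12                # toy key space: keys are integers 0..N-1 (assumption)
P0 = b"fixed plaintext"    # the fixed plaintext


def S(k):
    """Stand-in for the cipher S_k(P0): a hash of key and plaintext."""
    return hashlib.sha256(k.to_bytes(4, "big") + P0).digest()


def R(c):
    """Reduction function: map a ciphertext back into the key space."""
    return int.from_bytes(c[:4], "big") % N


def f(k):
    return R(S(k))


def build_table(m, t):
    """Build m chains of t steps, storing only endpoint -> start points."""
    table = {}
    for start in range(m):
        k = start
        for _ in range(t):
            k = f(k)
        table.setdefault(k, []).append(start)
    return table


def lookup(table, c, t):
    """Return a key k with S(k) == c, or None if the table does not cover it."""
    y = R(c)
    for _ in range(t):
        for start in table.get(y, []):
            # Regenerate the chain; the key sits just before R(c).
            k = start
            for _ in range(t):
                if S(k) == c:
                    return k
                k = f(k)
            # Endpoint matched but the key was not in this chain: false alarm.
        y = f(y)
    return None
```

Only m (start, endpoint) pairs are kept in memory, while up to m * t keys are covered; the price is the chain regeneration at lookup time.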
16Disadvantages
- Chains starting at different keys can collide and merge, which reduces the number of distinct keys covered by a table
- Finding a matching endpoint does not imply that the key is in the table (false alarm)
- The key may be in a chain that has the same endpoint but is not in the table
- The chain containing the key may be in the table but merged with other chains, so all chains that have the same endpoint must be searched until the key is found.
17Some improvement
- Use more tables to obtain a higher probability of success. Each table has a different reduction function.
- If one table succeeds with probability $P_{table}$, using $n$ tables increases the probability of success to
- $1 - (1 - P_{table})^n$
18Some improvement (cont.)
- Use distinguished points (DP) as endpoints [3]
- Distinguished points are points for which a simple criterion holds
- Example: the first ten bits are zero
- Chains are generated until a distinguished point appears
- Given a ciphertext, generate a chain until a distinguished point is found, and only then look it up in memory
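A minimal sketch of chain generation with distinguished points, assuming a toy 20-bit key space and a hash-based step function `f`. All names and constants here are assumptions for illustration; the sketch uses 6 zero bits instead of the ten from the example so that chains stay short.

```python
import hashlib

KEY_BITS = 20    # toy key size (assumption)
DP_BITS = 6      # a key is distinguished if its top 6 bits are zero
MAX_LEN = 2000   # give up on chains that never reach a distinguished point


def f(k):
    """Toy step function standing in for R(S_k(P0))."""
    digest = hashlib.sha256(k.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << KEY_BITS)


def is_distinguished(k):
    return k >> (KEY_BITS - DP_BITS) == 0


def chain_to_dp(start):
    """Walk f from start until a distinguished point is reached.

    Returns (distinguished_point, chain_length), or None when no
    distinguished point appears within MAX_LEN steps (probably a loop).
    """
    k = start
    for length in range(1, MAX_LEN + 1):
        k = f(k)
        if is_distinguished(k):
            return k, length
    return None
```

The `MAX_LEN` cut-off is what provides loop detection: a chain that cycles without ever hitting a distinguished point is simply discarded.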
19Some improvement (cont.)
- Advantages of using distinguished endpoints
- Loop detection
- Merge detection
- Disadvantages
- Chain lengths are not fixed
- False alarm cases
- What happens if the chain starting with R(C) itself contains a loop?
- What happens if the chain starting with R(C) contains a chain in the table?
20Rainbow Table
- Use a different reduction function for each successive point in the chain.
21Rainbow Table (cont.)
- Classical table vs. Rainbow table
22Rainbow Table (cont.)
- To look up a key:
- Apply $R_{n-1}$ to the ciphertext
- Look up the result among the endpoints of the table
- If it is not found, apply $R_{n-2}$ and then $f_{n-1}$ to check whether the key was in the second-to-last column
- And so on
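The column-by-column lookup can be sketched as below. The setting is an assumed toy one: integer keys, SHA-256 as a stand-in for $S_k(P_0)$, and a family of reduction functions $R_i$ obtained by adding the column index. The function names are hypothetical.

```python
import hashlib

N = 2 ** 12                # toy key space (assumption)
P0 = b"fixed plaintext"


def S(k):
    """Stand-in for the cipher S_k(P0)."""
    return hashlib.sha256(k.to_bytes(4, "big") + P0).digest()


def R(i, c):
    """Reduction function for column i; a different function per column."""
    return (int.from_bytes(c[:4], "big") + i) % N


def f(i, k):
    return R(i, S(k))


def build_table(m, n):
    """Chains have n columns k_1..k_n with k_{i+1} = f_i(k_i)."""
    table = {}
    for start in range(m):
        k = start
        for i in range(1, n):
            k = f(i, k)
        table.setdefault(k, []).append(start)
    return table


def lookup(table, c, n):
    """Assume the key is in column j, trying j = n-1, n-2, ..., 1."""
    for j in range(n - 1, 0, -1):
        y = R(j, c)
        for i in range(j + 1, n):
            y = f(i, y)
        for start in table.get(y, []):
            k = start
            for i in range(1, j):
                k = f(i, k)
            if S(k) == c:
                return k       # otherwise it was a false alarm
    return None
```

Two chains merge only if they collide in the same column, which is why merges always show up as identical endpoints and loops cannot occur.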
23Rainbow Table (cont.)
- Advantages
- Reduces the number of table look-ups and chain calculations
- Classical tables: $t^2$
- Rainbow table: $t(t-1)/2$
- Merge detection
- No loops
- No time is spent detecting and rejecting looping chains
- Fixed-length chains
- False alarm cases
24Rainbow Table (cont.)
- Improvement
- To get a higher success rate:
- Increase the size of the table
- Use more tables
- People prefer using more tables because
- It is easy to calculate the success rate
- One table: 80%
- Five tables: $1 - (1 - 0.8)^5 = 0.99968$
- It is easy to calculate the number of tables needed
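Both calculations are short enough to write down. The sketch below assumes the tables succeed independently with the same probability; the function names are hypothetical.

```python
import math


def success_probability(p_table, n_tables):
    """Probability that at least one of n independent tables succeeds."""
    return 1 - (1 - p_table) ** n_tables


def tables_needed(p_table, target):
    """Smallest n such that 1 - (1 - p_table)^n >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_table))
```

For instance, `success_probability(0.8, 5)` reproduces the five-table figure, and `tables_needed(0.8, 0.999)` gives the number of 80% tables required for a 99.9% success rate.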
25Hybrid Markovian/DFA dictionary
26Why use Markov models?
- Passwords are chosen by people, so they are unlikely to be uniformly distributed over the space of character sequences
27Why use Markov models?
- Markov models are commonly used in natural language processing
- A Markov model defines a probability distribution over sequences of symbols
- Used at LANL in the late 1980s to generate random, yet pronounceable passwords
- Very effective at guessing passwords generated by users
28Markovian filtering
- Zero-order Markov model
- Each character is generated according to the underlying probability distribution, independently of the previously generated characters
- $P(\alpha) = \prod_{x \in \alpha} \nu(x)$
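A minimal sketch of the zero-order model: the probability of a string is the product of its character frequencies, and a string is kept only if that product reaches a threshold. The toy distribution `nu`, the threshold parameter `theta`, and the function names are assumptions for illustration.

```python
import math


def zero_order_probability(word, nu):
    """P(word) = product of nu[x] over the characters x of word."""
    return math.prod(nu[x] for x in word)


def in_zero_order_dictionary(word, nu, theta):
    """Keep the word only if its probability reaches the threshold."""
    return zero_order_probability(word, nu) >= theta


# Hypothetical toy distribution over a three-letter alphabet.
nu = {"a": 0.5, "b": 0.3, "z": 0.2}
```

With this distribution and a threshold of 0.1, "aa" (probability 0.25) is kept while "zz" (probability 0.04) is filtered out.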
29Markovian filtering (cont.)
- First-order model
- Each digram (ordered pair) of characters is assigned a probability, and each character is generated by looking at the previous character.
- $P(x_1 x_2 \dots x_n) = \nu(x_1) \prod_{i=1}^{n-1} \nu(x_{i+1} \mid x_i)$
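The first-order product can be computed by walking over consecutive character pairs. The two toy tables below (`nu_first` for the first character, `nu_pair` for transitions) are hypothetical values chosen only to show the effect; the sketch assumes a non-empty word.

```python
def first_order_probability(word, nu_first, nu_pair):
    """P(x1..xn) = nu(x1) * product over i of nu(x_{i+1} | x_i)."""
    p = nu_first[word[0]]
    for prev, cur in zip(word, word[1:]):
        p *= nu_pair[prev][cur]
    return p


# Hypothetical toy model over the alphabet {q, u}.
nu_first = {"q": 0.1, "u": 0.9}
nu_pair = {
    "q": {"q": 0.1, "u": 0.9},   # after q, u is far more likely
    "u": {"q": 0.2, "u": 0.8},
}
```

Under this model "qu" is nine times more likely than "qq", which a zero-order model with the same single-character frequencies could not express.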
30Markovian dictionary
- Zero-order dictionary
- First-order dictionary
31Markovian dictionary (cont.)
- The models can drastically reduce the size of the plausible password space
- Example: consider 8-character sequences
- If 85% of sequences are ignored, the size is only about 1/7 of the key space
- The zero-order dictionary contains 90% of the dictionary
- The first-order dictionary can do even better (more than 95%)
32Markovian dictionary (cont.)
- The distribution of letter frequencies used in Markovian filtering is language-specific
- Two ways to deal with unknown languages:
- Combine the keyspaces for two or more languages
- Come up with a distribution that works reasonably well for multiple languages
33Filtering using Finite Automaton
- A search space that contains only alphabetic sequences is unlikely to have good coverage of the plausible password space
- People often mix upper- and lower-case characters and numbers
- Systems often require users to put special characters in their passwords
- Even so, the distribution of the resulting passwords is not random
- Numbers are likely to be at the end
- The first character is more likely to be upper case
- Passwords contain mostly lower-case characters
34Filtering using Finite Automaton
- Deterministic finite automata are ideal for expressing such properties
- Specify a set of common regular expressions
- All lower case
- One upper case followed by all lower case
- ...
- Dictionary: the set of sequences that pass the Markovian filter and are accepted by at least one DFA corresponding to one of the regular expressions
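The "accepted by at least one regular expression" test can be sketched with Python's `re` module. Python's engine is a backtracking matcher rather than a DFA, but for patterns this simple it accepts exactly the same strings. The third pattern is a hypothetical addition, motivated by the earlier remark that numbers tend to appear at the end.

```python
import re

PATTERNS = [
    re.compile(r"[a-z]+"),               # all lower case
    re.compile(r"[A-Z][a-z]+"),          # one upper case followed by all lower case
    re.compile(r"[A-Z]?[a-z]+[0-9]+"),   # hypothetical: letters, then digits at the end
]


def accepted(word):
    """True if the whole word matches at least one of the patterns."""
    return any(p.fullmatch(word) for p in PATTERNS)
```

A candidate would enter the hybrid dictionary only if `accepted(word)` holds and the word also passes the Markovian threshold test.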
35Filtering using Finite Automaton
- The complete alphabet
- Lower-case characters: 26
- Upper-case characters: 26
- Numerals: 10
- Special characters: space, hyphen, underscore, period and comma (5)
- Total: 67 characters. The associated keyspace of 8-character sequences has $67^8 \approx 4 \times 10^{14}$ elements (on the order of $10^{15}$), which is impractical for a brute-force attack
- Divide this set into 4 categories: a (lower case), A (upper case), n (numerals), s (special).
- The input alphabet of the automata consists of just these 4 symbols.
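The four-symbol input alphabet and the keyspace size can be checked with a short sketch. The helper names `category` and `shape` are hypothetical; the special-character set is the one listed above.

```python
import string

SPECIAL = " -_.,"   # space, hyphen, underscore, period, comma


def category(ch):
    """Map a character to one of the four automaton symbols a, A, n, s."""
    if ch in string.ascii_lowercase:
        return "a"
    if ch in string.ascii_uppercase:
        return "A"
    if ch in string.digits:
        return "n"
    if ch in SPECIAL:
        return "s"
    raise ValueError(f"character {ch!r} is outside the 67-character alphabet")


def shape(word):
    """Translate a password into the string the automaton actually reads."""
    return "".join(category(ch) for ch in word)


ALPHABET_SIZE = 26 + 26 + 10 + len(SPECIAL)   # 67
KEYSPACE_8 = ALPHABET_SIZE ** 8               # number of 8-character sequences
```

Because the automaton only sees the shape, a single transition on the symbol a stands for 26 concrete characters, which keeps the automata small.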
36Indexing Algorithms (IA)
- With Markovian filtering and/or DFA filtering we have a compressed dictionary
- We need to index the compressed dictionary
- This requires an efficient enumeration algorithm that takes an index i as input and outputs the i-th element of the dictionary.
37Indexing Algorithms (cont.)
- Some changes
- Modified dictionary
- Discretize the probability distribution
- Work with logarithms: $\mu(x) = \log \nu(x)$, and the threshold $\theta$ is replaced by $\log \theta$
- Round each value of $\mu$ to the nearest multiple of $\mu_0$
- If $\mu_0$ is large, memory use is low, but accuracy is low as well
38IA for Zero-order Markovian dic
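- For any string prefix $\alpha$
- Partial dictionary: all sequences $\beta$ such that $\alpha\beta$ satisfies the Markovian property
- Note that $D_{\nu,\theta,\ell,\theta',\ell'}$ is well-defined because
- $\prod_{x \in \alpha\beta} \nu(x) = \prod_{x \in \alpha} \nu(x) \prod_{x \in \beta} \nu(x) = \theta' \prod_{x \in \beta} \nu(x)$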
39IA for Zero-order Markovian dic
- The recursive algorithm to compute the size of a partial dictionary
partial_size1(current_length, level)
    if level > threshold: return 0
    if total_length = current_length: return 1
    sum = 0
    for each char in alphabet
        sum = sum + partial_size1(current_length+1, level+mu(char))
    return sum
40IA for Zero-order Markovian dic
- The function takes an index as input and returns the corresponding key
get_key1(current_length, index, level)
    if total_length = current_length: return ""
    sum = 0
    for each char in alphabet
        new_level = level + mu(char)
        // looked up from precomputed array
        size = partial_size1[current_length+1][new_level]
        if sum + size > index
            // + refers to string concatenation
            return char + get_key1(current_length+1, index-sum, new_level)
        sum = sum + size
    // control cannot reach here
    print "index larger than keyspace size"; exit
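The two routines for the zero-order dictionary can be made runnable as follows. This is a sketch under stated assumptions: the alphabet, the discretized values in `MU`, the total length and the threshold are hypothetical toy values, and `MU` is treated as a non-negative cost (larger means less likely) so that the `level > threshold` test prunes improbable strings. Memoization with `lru_cache` plays the role of the precomputed array.

```python
from functools import lru_cache

ALPHABET = "abc"
MU = {"a": 1, "b": 2, "c": 3}   # hypothetical discretized costs per character
TOTAL_LENGTH = 3
THRESHOLD = 6


@lru_cache(maxsize=None)
def partial_size1(current_length, level):
    """Number of ways to complete a prefix with the given length and level."""
    if level > THRESHOLD:
        return 0
    if current_length == TOTAL_LENGTH:
        return 1
    return sum(partial_size1(current_length + 1, level + MU[ch]) for ch in ALPHABET)


def get_key1(current_length, index, level):
    """Return the index-th completion of a prefix, in alphabet order."""
    if current_length == TOTAL_LENGTH:
        return ""
    total = 0
    for ch in ALPHABET:
        new_level = level + MU[ch]
        size = partial_size1(current_length + 1, new_level)
        if total + size > index:
            return ch + get_key1(current_length + 1, index - total, new_level)
        total += size
    raise IndexError("index larger than keyspace size")


DICTIONARY_SIZE = partial_size1(0, 0)
```

Calling `get_key1(0, i, 0)` for every i below `DICTIONARY_SIZE` enumerates the whole dictionary without ever storing it, which is what lets the compressed dictionary be used as the key space of a time-space tradeoff.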
41IAs for other dictionaries
- First-order Markovian dictionary
- Partial_size2(cur_len,prev_char,level)
- Get_key2(cur_len,index,prev_char,level)
- DFA dictionary
- Partial_size3(cur_len,state)
- Get_key3(cur_len,index,state)
- Any keyspace
- Compute_bins(t)
- Get_key(index)
- Hybrid Markovian/DFA dictionary
- Get_key5(index)
- Multiple keyspaces
- Get_key6(K1, K2, ..., Kn, index)
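For the DFA dictionary, the same counting-and-unranking idea applies with the automaton state in place of the Markovian level. The sketch below is an assumption about how such routines might look, not the paper's exact algorithm: it uses a hypothetical two-state automaton for "one upper case followed by all lower case" over the category symbols A and a, and a fixed length of 3.

```python
from functools import lru_cache
import string

CLASSES = {"a": string.ascii_lowercase, "A": string.ascii_uppercase}
# Hypothetical DFA: state 0 --A--> state 1 --a--> state 1; state 1 accepts.
TRANSITIONS = {(0, "A"): 1, (1, "a"): 1}
ACCEPTING = {1}
TOTAL_LENGTH = 3


@lru_cache(maxsize=None)
def partial_size3(cur_len, state):
    """Number of accepted completions from `state` after cur_len characters."""
    if cur_len == TOTAL_LENGTH:
        return 1 if state in ACCEPTING else 0
    total = 0
    for (src, symbol), dst in TRANSITIONS.items():
        if src == state:
            total += len(CLASSES[symbol]) * partial_size3(cur_len + 1, dst)
    return total


def get_key3(cur_len, index, state):
    """Return the index-th accepted completion from `state`."""
    if cur_len == TOTAL_LENGTH:
        return ""
    for (src, symbol), dst in TRANSITIONS.items():
        if src != state:
            continue
        sub = partial_size3(cur_len + 1, dst)
        block = len(CLASSES[symbol]) * sub
        if index < block:
            ch = CLASSES[symbol][index // sub]
            return ch + get_key3(cur_len + 1, index % sub, dst)
        index -= block
    raise IndexError("index larger than keyspace size")
```

Here `partial_size3(0, 0)` counts all 26 * 26 * 26 strings of the form upper-lower-lower, and `get_key3(0, i, 0)` maps each index in that range to a distinct string.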
42Possible optimizations
- The hybrid algorithm should use 50-100 table lookups, and the table must fit into the cache
- Get_key6() can be accelerated by pre-computation
- Get_key1() can be sped up by reordering the characters so that the more frequent ones come first. This strategy works for Get_key2() as well.
43Experiment
- Rainbow vs. Markovian/DFA (paper [2])
- 142 real passwords from Passware
- Search space: 6-character alphanumeric sequences
44Conclusion
45Fun facts about Rainbow Crack
- All the configurations are for a 666 MHz CPU with 256 MB RAM
- Nowadays, with a 3.0 GHz CPU and 4 GB RAM, the time should be about 5 times less than their results
- 5 rainbow tables
- Cracks the Windows LM hash function, but it can be configured for other hash algorithms such as MD5 or SHA1
46Fun facts about Rainbow Crack
47Fun facts about Rainbow Crack
48Fun facts about Rainbow Crack
49
- Tables: 5.2 TB
- 1 TB is about the same amount of information as all of the books in a large library (http://kb.iu.edu/data/ackw.html)
- Supported algorithms: 13 (CiscoPIX, LanManager, MD4, MD5, NTLM, MySQL123, MySQLSHA1, SHA1, HalfLMChallenge, LMChallenge, NTLMChallenge, MSCache, Oracle)
- It took them 3 years to generate all tables (97 years for one computer)
- Success rate: 100%
- http://www.rainbowcrack-online.com/
50References
- [1] Philippe Oechslin, Making a Faster Cryptanalytic Time-Memory Trade-Off
- [2] A. Narayanan and V. Shmatikov, Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff
- [3] D.E. Denning, Cryptography and Data Security, page 100. Addison-Wesley, 1982.
51Thank you for your attention!