Automated Cryptanalysis of XOR Plaintext Strings

1
Automated Cryptanalysis of XOR Plaintext Strings
  • Research by E. Dawson and L. Nielsen of the
  • Information Security Research Center in Australia
  • Presented by Chris Brown

2
Introduction
  • One-time pads
  • Unbreakable if keystring is never reused
  • Vulnerable to decryption after reuse
  • Example: the Russian VENONA cipher
  • Used in WW2; after running out of cipher material, pads were reused
  • Cracked by manually comparing messages
  • Many, but not all, messages deciphered

3
Research Goal
  • Manually decrypting one-time pads is slow
  • Computers can do most of the work
  • Goal: decrypt reused pads automatically
  • Message content presumed unknown
  • Only use letter-frequency attacks

4
The Message Source
  • 600,000 character section of The Bible
  • All letters converted to uppercase
  • All non-letters removed except spaces
  • All consecutive spaces removed
  • Frequency analysis of the plaintext
  • Distribution of the top 100 words, 2 to 11 characters long
  • Listings of impossible 2-tuples and 3-tuples
  • Sequences of letters that never occurred
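The preprocessing and frequency tables listed above might be built roughly as follows. This is a minimal sketch, not the authors' code: the helper names `normalize`, `impossible_bigrams` and `top_words_by_length` are hypothetical, treating line breaks and tabs as spaces before stripping other non-letters is an assumption, and reading "top 100" as "top 100 per word length" is one possible interpretation.

```python
import re
from collections import Counter
from itertools import product
from string import ascii_uppercase

ALPHABET = ascii_uppercase + " "

def normalize(text: str) -> str:
    """Uppercase, keep only A-Z and spaces, collapse runs of spaces."""
    text = re.sub(r"\s+", " ", text.upper())  # assumption: newlines/tabs become spaces
    text = re.sub(r"[^A-Z ]", "", text)       # drop every other non-letter
    return re.sub(r" {2,}", " ", text)        # remove consecutive spaces

def impossible_bigrams(text: str) -> set[str]:
    """2-tuples over A-Z and space that never occur in the corpus."""
    seen = {text[i:i + 2] for i in range(len(text) - 1)}
    return {a + b for a, b in product(ALPHABET, repeat=2)} - seen

def top_words_by_length(text: str, n: int = 100) -> dict[int, list[str]]:
    """Most frequent words of each length 2..11, most common first."""
    counts = Counter(text.split())
    return {length: [w for w, _ in counts.most_common() if len(w) == length][:n]
            for length in range(2, 12)}
```

Impossible 3-tuples can be listed the same way with `repeat=3` and slices of length 3.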

5
Removing the Key
  • Two messages XORed with same key
  • $(P_0 \oplus K) \oplus (P_1 \oplus K) = P_0 \oplus P_1$
  • Thus, if we know two messages had the same key,
    we can XOR them together and ignore the key
    completely.
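A quick check of the key-cancellation identity, using byte strings and `os.urandom` as a stand-in for a pad that is (wrongly) used twice. The sample plaintexts are illustrative only.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))

p0 = b"HE SAID UNTO THEM"
p1 = b"AND GOD SAW THE L"
key = os.urandom(len(p0))   # one pad, reused for both messages

c0 = xor_bytes(p0, key)
c1 = xor_bytes(p1, key)

# The key cancels: (P0 ^ K) ^ (P1 ^ K) == P0 ^ P1, whatever K was.
assert xor_bytes(c0, c1) == xor_bytes(p0, p1)
```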

6
The Ciphertext
  • Plaintexts P0 and P1
  • Two randomly chosen 12,000-character strings from
    our 600,000-character section of The Bible
  • Our goal is to extract P0 and P1 from C
  • C is constructed by
  • $C(i) = P_0(i) \oplus P_1(i)$, $i = 0, \dots, 11999$
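The test setup can be sketched as below, assuming the normalized corpus is plain ASCII text. `make_test_pair` is a hypothetical helper; the slides do not say whether the two windows may overlap, and this sketch allows it.

```python
import random

def make_test_pair(corpus: str, length: int = 12_000, seed=None):
    """Pick two random windows P0, P1 of the corpus and return (P0, P1, C)."""
    rng = random.Random(seed)
    start0 = rng.randrange(len(corpus) - length + 1)
    start1 = rng.randrange(len(corpus) - length + 1)
    p0 = corpus[start0:start0 + length].encode("ascii")
    p1 = corpus[start1:start1 + length].encode("ascii")
    c = bytes(a ^ b for a, b in zip(p0, p1))   # C(i) = P0(i) XOR P1(i)
    return p0, p1, c
```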

7
Decryption Routines
  • Multiple decryption routines
  • Word Positioning
  • Identifying Two Words
  • Securing Three Words
  • Determining Four Words
  • Isolating Short Words
  • Fitting Words (Optional)
  • Run successively in this order

8
First, about Word Placement
  • Decrypting gives us two characters
  • No way of knowing which file they belong in
  • Consecutive tuples overlap
  • Matched to output files P0 or P1
  • Uses first/last characters of decoded tuple
  • As more is decoded, these groupings grow
  • This grouping information is kept in a sequence
    file

9
D1 Word Positioning
  • Decrypts every character of C formed by XORing a space (_) with a letter
  • Approximately 40% of all characters in C
  • Places all spaces into P0 and letters into P1
  • Consecutive spaces put into P1
  • This identifies the placement of most words
  • 32.7% of the 24,000 characters decoded
  • 0 errors
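Why space/letter positions are easy to spot can be shown under one assumption the slides do not state: characters are 8-bit ASCII, so a space is 0x20 and A-Z are 0x41-0x5A. Then space XOR letter always lands in 0x61-0x7A, while letter XOR letter stays in 0x00-0x1F. The helper names below are hypothetical, and the "spaces to P0, letters to P1" split is only the provisional placement described above.

```python
SPACE = 0x20

def d1_space_letter(c: bytes) -> dict[int, str]:
    """Positions where C(i) is a space XORed with an uppercase letter."""
    found = {}
    for i, v in enumerate(c):
        # ' ' ^ 'A'..'Z' lands in 0x61..0x7A; letter ^ letter stays in 0x00..0x1F
        if 0x61 <= v <= 0x7A:
            found[i] = chr(v ^ SPACE)   # flipping bit 0x20 recovers the letter
    return found

def d1_initial_guess(c: bytes) -> tuple[str, str]:
    """Spaces go to P0, letters to P1; '?' marks undecoded positions."""
    p0, p1 = ["?"] * len(c), ["?"] * len(c)
    for i, letter in d1_space_letter(c).items():
        p0[i], p1[i] = "_", letter
    return "".join(p0), "".join(p1)
```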

10
D1 Word Positioning Results
  • One decoded tuple pair is
  • _ B ? ? ? _ M
  • L _ ? ? ? U _
  • Consecutive spaces might go in either file
  • Two possibilities here
  • 4-letter word starting with B
  • 5-letter word starting with B and ending with U,
    and a 3-letter word.
  • The next 3 routines will try to fill in gaps.

11
D2 Identifying Two Words
  • Chooses tuple pairs with _c in one plaintext and d_ in the other (c, d known letters)
  • _ W ? ? _ W
  • D _ ? ? E _
  • Searches the most frequent word file
  • Matches words meeting the known characters
  • 3 letter word starting with W
  • 3 letter word ending with E
  • The unknown letters must XOR to the corresponding values in C
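The word search can be sketched as follows: split each partially decoded segment at spaces (`_`), collect dictionary words that fit every token still containing `?`, and keep only combinations whose characters XOR to C at every position. The helper names are hypothetical, a small list stands in for the most-frequent-word file, and the likelihood ranking used to choose between several valid combinations is not modelled. Because it accepts any number of `?` tokens per segment, the same sketch also handles the three-word patterns used by D3.

```python
import re
from itertools import product

def word_slots(pattern: str, words: list[str]):
    """For every '_'-delimited token that still contains '?', list fitting words."""
    slots = []
    for m in re.finditer(r"[^_]+", pattern):
        token = m.group()
        if "?" in token:
            fitting = [w for w in words if len(w) == len(token)
                       and all(t in ("?", ch) for ch, t in zip(w, token))]
            slots.append((m.span(), fitting))
    return slots

def fill(pattern: str, slots, choice) -> str:
    """Write the chosen words into their token positions."""
    chars = list(pattern)
    for ((start, end), _), word in zip(slots, choice):
        chars[start:end] = word
    return "".join(chars)

def consistent(f0: str, f1: str, c: bytes) -> bool:
    """Every filled character pair must XOR to the ciphertext value."""
    return all((ord(a) ^ ord(b)) == v
               for a, b, v in zip(f0.replace("_", " "), f1.replace("_", " "), c))

def word_candidates(pat0: str, pat1: str, c: bytes, words: list[str]):
    """All (P0, P1) fillings built from the word list that agree with C."""
    slots0, slots1 = word_slots(pat0, words), word_slots(pat1, words)
    results = []
    for choice0 in product(*(fitting for _, fitting in slots0)):
        f0 = fill(pat0, slots0, choice0)
        for choice1 in product(*(fitting for _, fitting in slots1)):
            f1 = fill(pat1, slots1, choice1)
            if consistent(f0, f1, c):
                results.append((f0, f1))
    return results
```

For the slide's example, `word_candidates("_W??_W", "D_??E_", c, ["WAS", "WHO", "THE", "SHE"])` keeps only `("_WAS_W", "D_THE_")`, since the other pairings do not reproduce C.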

12
D2 Identifying Two Words Results
  • _ W ? ? _ W → _ W A S _ W
  • D _ ? ? E _ → D _ T H E _
  • 5.7% more characters decoded correctly
  • 38.4% total so far
  • 2 of the 24,000 characters incorrect
  • Sometimes multiple combinations of the letters
    could make the value in C
  • If one combination is more likely, it is chosen
  • This is not always correct, though

13
D3 Securing Three Words
  • Same as D2, with one more constraint
  • Must be a _ paired with a letter in the tuples
  • _ H ? ? ? R ? ? _ S
  • F _ ? ? ? _ ? ? R _
  • _ H U N D R E D _ S
  • F _ T H E _ A I R _

14
D3 Securing Three Words Results
  • Again, only words in the most frequent word list
    are used
  • 40.7% decoded correctly so far
  • No additional errors from this phase

15
D4 Determining Four Words
  • Same as D2, with one more constraint
  • One of the characters must be the same in both
    plaintexts
  • This character is decoded to a _
  • _ A ? ? ? ? ? _ E
  • D _ ? ? ? ? ? D _
  • _ A N D _ O F _ E
  • D _ H E _ D I D _
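The extra constraint D4 adds relies on a simple fact: C(i) = 0 exactly when P0(i) = P1(i). A minimal sketch of that step, with hypothetical helper names, marks each still-unknown identical position as a space in both plaintexts; the word search then proceeds as in D2. Guessing "space" is only a heuristic, since identical letters also XOR to zero, which is one source of the errors reported for this routine.

```python
def identical_positions(c: bytes) -> list[int]:
    """C(i) == 0 exactly when P0(i) == P1(i)."""
    return [i for i, v in enumerate(c) if v == 0]

def mark_shared_spaces(p0: str, p1: str, c: bytes) -> tuple[str, str]:
    """Decode still-unknown identical characters as a space in both plaintexts."""
    a, b = list(p0), list(p1)
    for i in identical_positions(c):
        if a[i] == "?" and b[i] == "?":
            a[i] = b[i] = "_"
    return "".join(a), "".join(b)
```

Using the decoded pair from this slide, the only zero in C sits at the shared space between AND/OF and HE/DID.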

16
D4 Det. Four Words Results
  • 41.4% of the 24,000 characters decoded correctly
  • Errors increased by one, to three.
  • The text is now somewhat readable in places

17
D5 Isolating Short Words
  • So far, most of the spaces and many common words
    are decoded
  • Short words are very common
  • D5 decrypts some identical characters into spaces
    to make short words
  • ?????? - ???_???
  • ????? - ???_??
  • ????? - ??_???

18
D5 Isolating Short Words Results
  • Decrypts 2.2% additional characters to spaces
  • Current total: 43.6%
  • Errors up to 0.1%
  • Many more word positions identified

19
D6 Fitting Words
  • The previous approaches are running out
  • Diminishing returns
  • New goal: attempt to fit words into place
  • _THE_ could be fitted into _T??_, _?H?_, or
    _??E_
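Fitting a word into a gap can be sketched like this: check the word against the known letters of a space-bounded gap, then derive what the other plaintext must contain at those positions from C. The helper names are hypothetical, and the "only A-Z or space" filter is an assumed plausibility check; the routine described here additionally limits itself to the top 8 words per length, which is not modelled.

```python
def fits(word: str, gap: str) -> bool:
    """True if _word_ agrees with every known letter of a gap such as '_T??_'."""
    inner = gap[1:-1]
    return (gap[0] == "_" and gap[-1] == "_" and len(inner) == len(word)
            and all(g in ("?", w) for w, g in zip(word, inner)))

def implied_partner(word: str, gap: str, c: bytes):
    """Fit word into gap; return the other plaintext implied by C, or None."""
    if not fits(word, gap):
        return None
    filled = " " + word + " "
    partner = "".join(chr(ord(ch) ^ v) for ch, v in zip(filled, c))
    # Reject fits that force characters outside A-Z and space into the partner.
    if all(ch == " " or "A" <= ch <= "Z" for ch in partner):
        return partner
    return None
```

With C taken from " THE " and "LORDS", fitting THE into `_T??_` yields "LORDS", but TWO also passes and yields "LOMNS". The character filter alone is weak, which is why word frequency and the impossible-tuple lists matter when ranking fits.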

20
D6 Fitting Words (cont.)
  • Choosing the top 8 words worked well
  • 4.2% more decoded (47.8% total), 0.8% errors
  • For two plaintexts, this produced the
    best-quality text
  • Lowest error ratio

21
Maximum Decryption
  • Very loose setting for D6
  • Turn ??? into _???_
  • Then guess 3 letter words
  • _???_ could be _THE_
  • Repeat for words up to length 11, using the 8
    most common words
  • 62.7% correct, 17.8% incorrect
  • Text was more readable in most places

22
Multiple Reuse of Keystream
  • The more the keystream is reused, the more we can
    decode
  • With three plaintexts under one key, we get three XORed files to decode
  • $P_0 \oplus P_1$
  • $P_0 \oplus P_2$
  • $P_1 \oplus P_2$
  • Testing used the Maximum Decryption setting
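One reason extra reuse helps can be checked directly: the three XOR files are linked, since $(P_0 \oplus P_1) \oplus (P_1 \oplus P_2) = P_0 \oplus P_2$, so a character recovered in any one plaintext fixes that position in all three. This is an illustration of the algebra, not the authors' routine; `recover_from_p1` and the sample strings are hypothetical.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))

p0, p1, p2 = b"AND GOD", b"HE SAID", b"THE LAN"
x01, x02, x12 = xor(p0, p1), xor(p0, p2), xor(p1, p2)

def recover_from_p1(i: int, p1_char: int, x01: bytes, x12: bytes) -> tuple[int, int]:
    """Given P1(i), return (P0(i), P2(i)) from P0^P1 and P1^P2."""
    return x01[i] ^ p1_char, x12[i] ^ p1_char
```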

23
Using the Extra Files
  • Compare and sort the three files
  • $P_0 \oplus P_1$ is A, B
  • $P_0 \oplus P_2$ is A, C
  • $P_1 \oplus P_2$ is B, C
  • P0 must be A, P1 must be B, P2 must be C
  • Accuracy remained around 62%
  • Errors decreased to 5%
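The sorting rule above can be sketched per position: each XOR file yields an unordered pair of decoded characters, and each plaintext's character is the one shared by the two pairs it appears in. `assign_pairs` is a hypothetical helper; returning `None` when the intersections are not single characters (for example, when two plaintexts share a letter) is an assumed way to leave such positions undecided.

```python
def assign_pairs(pair01: set[str], pair02: set[str], pair12: set[str]):
    """Return (P0, P1, P2) characters from three unordered pairs, or None."""
    p0 = pair01 & pair02   # P0 appears in both P0^P1 and P0^P2
    p1 = pair01 & pair12   # P1 appears in both P0^P1 and P1^P2
    p2 = pair02 & pair12   # P2 appears in both P0^P2 and P1^P2
    if all(len(s) == 1 for s in (p0, p1, p2)):
        return p0.pop(), p1.pop(), p2.pop()
    return None            # ambiguous position
```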

24
Using the Extra Files (cont.)
  • Selects tuples beginning and ending with _
  • Word fits between these spaces, like in D6
  • Average of 76% accuracy, 12% errors

25
Summary
  • Decrypts 62% from just two ciphertexts
  • Decrypts 76% from three ciphertexts
  • Message is readable at 76%
  • This program takes one hour on a 486-66
  • It is practical and easy to break a Vernam cipher
    with a reused one-time pad
  • Attack based solely on plaintext statistics

26
Future Work
  • Much room for improvement in current algorithms
  • Many errors visible that humans would easily
    correct
  • More analysis will yield better algorithms
  • Add support for known plaintext attacks
  • Questions?