Title: STATISTICAL AND PERFORMANCE ANALYSIS OF SHA-3 HASH CANDIDATES
- Ashok V Karunakaran
- Department of Computer Science
- Rochester Institute of Technology
- Committee Chair: Prof. Stanislaw Radziszowski
- Reader: Prof. Peter Bajorski
- Observer: Prof. Christopher Homan
Project Abstract
- Randomness: A good hash function should behave as close to a random function as possible. Statistical tests help determine the randomness of a hash function, and NIST provides a series of such tests in its Statistical Test Suite. This suite has been used to analyze the randomness of the final five hash functions.
- Performance: Performance is the second most important factor in determining a good hash function. The performance of all fourteen Round 2 candidates was measured for small messages, using Java as the programming language on Sun platform machines.
- Security: Security is the most important criterion for a hash function. The architecture, design and security features of Grøstl, one of the final five candidates, have been studied in detail, and some successful attacks on reduced-round versions are explained. The lesser-known Round 2 candidates Fugue and ECHO have also been studied.
Hash function
- Input: a string of arbitrary size.
- Output: a string of predetermined, fixed size.
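A quick way to see the fixed-size property is to hash inputs of very different lengths and compare digest sizes. This sketch uses Python's `hashlib.sha3_256` (the standardized SHA-3, which differs in padding from the Keccak submission) purely as a convenient example; it is not one of the candidate implementations measured in this project.

```python
import hashlib


def digest_bits(message: bytes) -> int:
    """Return the length in bits of the SHA3-256 digest of `message`."""
    return len(hashlib.sha3_256(message).digest()) * 8


# Input length varies from 0 bytes to 10,000 bytes; output length does not.
for message in [b"", b"abc", b"a" * 10_000]:
    print(len(message), "bytes in ->", digest_bits(message), "bits out")
```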
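Hash function requirements
- Pre-image, second pre-image and collision resistance.
- Collision: a pair $x \neq y$ such that $h(x) = h(y)$.
- Birthday paradox: gives the cost of the generic (brute-force) collision attack. With $M$ possible values, about $q \approx 1.17\sqrt{M}$ random samples give a collision with probability $\varepsilon = 1/2$ (for $M = 365$, $q = 23$).
- The birthday bound for an $n$-bit digest is $2^{n/2}$.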
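The 1.17 factor comes from the standard approximation $q \approx \sqrt{2M \ln(1/(1-\varepsilon))}$, which gives $\sqrt{2 \ln 2} \approx 1.1774$ for $\varepsilon = 1/2$. The sketch below compares that approximation with the exact collision probability for the 365-day example; the function names are illustrative.

```python
import math


def collision_probability(q: int, num_values: int) -> float:
    """Exact probability that q uniform draws from num_values values contain a repeat."""
    p_all_distinct = 1.0
    for i in range(q):
        p_all_distinct *= (num_values - i) / num_values
    return 1.0 - p_all_distinct


def birthday_estimate(num_values: int, eps: float = 0.5) -> float:
    """Approximate number of draws needed for a collision with probability eps."""
    return math.sqrt(2 * num_values * math.log(1 / (1 - eps)))


print(birthday_estimate(365))          # about 22.49, i.e. 1.1774 * sqrt(365)
print(collision_probability(22, 365))  # about 0.476, still below 1/2
print(collision_probability(23, 365))  # about 0.507, first value above 1/2
```

For an $n$-bit digest there are $M = 2^n$ possible outputs, so the same estimate gives roughly $2^{n/2}$ hash evaluations, e.g. about $2^{128}$ for a 256-bit digest.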
The need for a new hash function
- Widely used hash functions have been broken or weakened:
- Collisions in MD5 and SHA-0.
- Security flaws in SHA-1.
- Increasing hardware power and parallelization
capabilities.
SHA-3 Competition
- Organized by NIST.
- Started on Nov. 2, 2007.
- Received 64 entries.
- 51 met the minimum requirements.
- Round 1
- First candidate conference at KU Leuven, Belgium, on Feb. 25-28, 2009.
- 14 candidates selected for Round 2 on July 24, 2009.
Round 2 and 3
- Round 2
- Second candidate conference at Santa Barbara, CA, on August 23-24, 2010.
- 5 finalists selected on Dec. 9, 2010.
- Round 3 / Final Round
- Final conference in Spring 2012.
- Select a winner later in 2012.
Round 2 and 3 Candidates
- BLAKE
- BMW
- CubeHash
- ECHO
- Fugue
- Grøstl
- Hamsi
- JH
- Keccak
- Luffa
- Shabal
- SHAvite-3
- SIMD
- Skein
Randomness and Statistics
- A hash function should behave indistinguishably from a random function.
- Patterns in the output must be avoided, because they can lead to collisions.
- Statistical randomness tests help determine how random a hash function is.
- Pseudo-randomness is sufficient.
Statistical Tests
- Motivation: decide whether a particular statement or claim is correct.
- Null hypothesis: the output of a hash function is random, irrespective of the input.
- Alternative hypothesis: the output is not random.
- Test statistic: computed from sample data; used to decide whether to reject or accept the null hypothesis.
NIST Test Suite
- A statistical test suite for random and pseudo-random number generators for cryptographic applications.
- Helpful in detecting deviations of a binary sequence from randomness.
- Total of 15 tests.
- E.g., the Frequency Test and the Longest Run of Ones in a Block test.
P-value and Significance level
- The P-value is calculated from the test statistic.
- It is the probability that a perfect random number generator would have produced a sequence less random than the sequence that was tested.
- A P-value of 1 indicates that the sequence appears perfectly random.
- A P-value of 0 indicates that the sequence appears completely non-random.
P-value and Significance level (cont.)
- The significance level ($\alpha$) is the probability of a Type I error.
- A Type I error (false positive) occurs when a statistical test rejects a true null hypothesis.
- If P-value $\geq \alpha$, the null hypothesis is accepted, meaning the sequence appears to be random.
- If P-value $< \alpha$, the null hypothesis is rejected.
P-value and Significance level (cont.)
- For the project, $\alpha = 0.01$.
- One would expect 1 in 100 truly random sequences to be rejected.
- A P-value $\geq 0.01$ indicates that the sequence would be considered random with a confidence of 99%.
- A P-value $< 0.01$ indicates that the sequence is considered non-random with a confidence of 99%.
Frequency Test
- Tests the proportion of zeros and ones in the sequence.
- For a random sequence, the numbers of zeros and ones should be about the same.
- Test description:
- Convert each bit $\varepsilon_i$ to $X_i = 2\varepsilon_i - 1$ (i.e., -1 or +1) and add them: $S_n = X_1 + X_2 + \cdots + X_n$.
- For example, if $\varepsilon = 1011010101$, then $n = 10$ and $S_n = 2$.
Frequency Test (cont.)
- Compute the test statistic $s_{obs} = |S_n| / \sqrt{n}$.
- $s_{obs} = 2 / \sqrt{10} \approx 0.632456$.
- Compute P-value $= \mathrm{erfc}(s_{obs} / \sqrt{2})$.
- P-value $= \mathrm{erfc}(0.632456 / \sqrt{2}) \approx 0.527089$.
- Decision: P-value $> 0.01$, so accept the sequence as random.
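The frequency test steps above translate almost directly into code. This is a minimal sketch of the NIST SP 800-22 Frequency (Monobit) test using Python's `math.erfc`; it assumes the bit sequence is given as a string of '0' and '1' characters, and the helper names are hypothetical.

```python
import math

ALPHA = 0.01  # significance level used in the project


def frequency_test(bits: str) -> float:
    """Frequency (Monobit) test from NIST SP 800-22; returns the P-value."""
    n = len(bits)
    s_n = sum(1 if b == "1" else -1 for b in bits)  # X_i = 2*eps_i - 1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def frequency_test_from_counts(ones: int, n: int) -> float:
    """Same P-value computed from the number of ones in an n-bit sequence."""
    s_n = ones - (n - ones)
    return math.erfc(abs(s_n) / math.sqrt(n) / math.sqrt(2))


p = frequency_test("1011010101")
print(round(p, 6), "random" if p >= ALPHA else "non-random")  # 0.527089 random
```

Because the statistic depends only on the counts, plugging the BLAKE-256 Numbers counts from the results table (512,667 ones out of 1,024,000 bits) into `frequency_test_from_counts` gives a P-value of about 0.1874, in line with the Frequency entry reported for that input.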
Longest Runs of Ones in a Block
- Tests the longest run of ones within M-bit blocks.
- The longest run should be similar to what is expected for a random sequence.
- Test description:
- Input: 11001100000101010110110001001100111000000000001001001101010100010001001111010110100000001101011111001100111001101101100010110010
- Input length: $n = 128$ bits.
- Divide the input into M-bit blocks, with $M = 8$.
Longest Runs of Ones in a Block (cont.)
- The longest run of ones in each subblock is noted.
- Count the frequencies of the longest runs: $\nu_0 = 4$ (run $\leq 1$), $\nu_1 = 9$ (run $= 2$), $\nu_2 = 3$ (run $= 3$), $\nu_3 = 0$ (run $\geq 4$).
- Compute $\chi^2(obs)$, a measure of how well the observed longest-run lengths match the expected longest-run lengths within M-bit blocks.

| Subblock | Max-Run | Subblock | Max-Run |
|---|---|---|---|
| 11001100 | 2 | 00010101 | 1 |
| 01101100 | 2 | 01001100 | 2 |
Longest Runs of Ones in a Block (cont.)
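The remaining steps of the Longest Run test, computing $\chi^2(obs)$ and the P-value, can be sketched as follows for the M = 8 case used in the example. The category probabilities are the values NIST SP 800-22 tabulates for M = 8; the P-value uses the closed form of the regularized upper incomplete gamma function for $K/2 = 3/2$, so no external library is needed. Function names are illustrative.

```python
import math

# NIST SP 800-22 category probabilities for M = 8:
# longest run <= 1, = 2, = 3, >= 4.
PI_M8 = [0.2148, 0.3672, 0.2305, 0.1875]


def longest_run_of_ones(block: str) -> int:
    return max(len(run) for run in block.split("0"))


def longest_run_test_m8(bits: str) -> tuple[list[int], float, float]:
    """Return (nu, chi2_obs, p_value) for 8-bit blocks (K = 3)."""
    m = 8
    blocks = [bits[i:i + m] for i in range(0, len(bits) - m + 1, m)]
    nu = [0, 0, 0, 0]
    for block in blocks:
        run = longest_run_of_ones(block)
        nu[min(max(run, 1), 4) - 1] += 1  # clamp to categories <=1, 2, 3, >=4
    n_blocks = len(blocks)
    chi2 = sum((v - n_blocks * p) ** 2 / (n_blocks * p) for v, p in zip(nu, PI_M8))
    # P-value = igamc(3/2, chi2/2), where Q(3/2, x) = erfc(sqrt(x)) + 2*sqrt(x/pi)*exp(-x).
    x = chi2 / 2
    p_value = math.erfc(math.sqrt(x)) + 2 * math.sqrt(x / math.pi) * math.exp(-x)
    return nu, chi2, p_value


# The 128-bit example input, written as its sixteen 8-bit subblocks.
EXAMPLE = "".join([
    "11001100", "00010101", "01101100", "01001100",
    "11100000", "00000010", "01001101", "01010001",
    "00010011", "11010110", "10000000", "11010111",
    "11001100", "11100110", "11011000", "10110010",
])

nu, chi2, p = longest_run_test_m8(EXAMPLE)
print(nu, round(chi2, 4), round(p, 4))  # [4, 9, 3, 0] 4.8826 0.1806
```

Since the P-value (about 0.18) is above $\alpha = 0.01$, the example sequence would be accepted as random.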
Inputs for the experiment
- Numbers: hashes of the numbers 0-3999.
- The tests require an input length of at least $10^6$ bits.
- For 256-bit output: $256 \times 4000 = 1{,}024{,}000$ bits.
- KAT inputs: 2048 hexadecimal inputs from the official candidate documentation.
Inputs for the experiment (cont.)
- From file: the NIST document on the statistical test suite.
- Every 10Kb: each input block has 10Kb. The first input is the first 10Kb; the second input skips the first 1Kb and takes the next 10Kb, and so on.
- Every 100Kb: each input block has 100Kb. In this case, 100 bytes are skipped before the next input block.
- This ensures there is both overlapping and non-overlapping data in the input blocks.
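The three kinds of input can be generated along these lines. This sketch makes several assumptions that the slides do not state: numbers are hashed as their decimal ASCII strings, 1Kb means 1024 bytes, and `hashlib.sha256` stands in for the candidate hash (BLAKE-256 and the other candidates are not in Python's standard library). The helper names are hypothetical.

```python
import hashlib


def number_inputs(count: int = 4000) -> list[bytes]:
    """Assumption: each number 0..count-1 is hashed as its decimal ASCII string."""
    return [str(i).encode("ascii") for i in range(count)]


def sliding_blocks(data: bytes, block_size: int, step: int) -> list[bytes]:
    """Overlapping windows of block_size bytes, each starting step bytes after the last."""
    return [data[i:i + block_size] for i in range(0, len(data) - block_size + 1, step)]


def digest_bitstream(messages: list[bytes], hash_fn=hashlib.sha256) -> str:
    """Concatenate all digests into one '0'/'1' string for the test suite.

    hashlib.sha256 is only a stand-in for the candidate being tested.
    """
    return "".join(format(byte, "08b") for m in messages for byte in hash_fn(m).digest())


bits = digest_bitstream(number_inputs())
print(len(bits))  # 4000 digests x 256 bits = 1024000

# Windows for the file-based inputs (data would be the bytes of the NIST document):
#   every 10Kb:  sliding_blocks(data, 10 * 1024, 1024)
#   every 100Kb: sliding_blocks(data, 100 * 1024, 100)
```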
Output for BLAKE-256

| Test (P-values) | Numbers | KAT | 10Kb | 100Kb |
|---|---|---|---|---|
| Approximate Entropy | 0.531403 | 0.132928 | 0.365077 | 0.476437 |
| Block Frequency | 0.550332 | 0.999349 | 0.105159 | 0.634999 |
| Cumulative Sums | 0.324573, 0.201009 | 0.988702, 0.943249 | 0.000432, 0.001383 | 0.129711, 0.221312 |
| FFT | 0.204233 | 0.655976 | 0.255107 | 0.617123 |
| Frequency | 0.187412 | 0.765466 | 0.000966 | 0.127740 |
| Linear Complexity | 0.867403 | 0.312439 | 0.551978 | 0.693519 |
| Longest Run | 0.095483 | 0.382246 | 0.697027 | 0.936944 |
| Overlapping Template | 0.099496 | 0.718846 | 0.180799 | 0.214866 |
| Rank | 0.077948 | 0.162680 | 0.946797 | 0.843130 |
Output for BLAKE-256 (cont.)

| Test (P-values) | Numbers | KAT | 10Kb | 100Kb |
|---|---|---|---|---|
| Runs | 0.753526 | 0.978062 | 0.863215 | 0.048920 |
| Serial | 0.876547, 0.838931 | 0.252703, 0.520978 | 0.625307, 0.854685 | 0.988346, 0.986553 |
| Universal | 0.861028 | 0.057151 | 0.382927 | 0.833105 |
| Non-overlapping Template | 0.272553, 0.156433 | 0.748985, 0.001491 | 0.013372, 0.593525 | 0.376109, 0.329376 |
| Random Excursions | 0.560459, 0.148643 | 0.997930, 0.945050 | 0.000000, 0.000000 | 0.381784, 0.935452 |
| Random Excursions Variant | 0.612882, 0.582494 | 0.163078, 0.205123 | 0.000000, 0.000000 | 0.219435, 0.393705 |
| Total bits | 1024000 | 524288 | 1677056 | 16936192 |
| No. of 0s | 511333 | 262036 | 840665 | 8464962 |
| No. of 1s | 512667 | 262252 | 836391 | 8471230 |
Results and Conclusions
- P-values of 0.000000 do not indicate failed tests; they indicate tests that were not applicable to the input.
- All hash functions tested behave as random.
- Failed results are outliers rather than the norm.
- They are not enough to classify a hash function as non-random.
- The areas where tests failed can be explored further.
Performance
- The second most important criterion.
- Most prior work has used C as the programming language.
- The following combination has not been studied comprehensively before:
- Language: Java
- Platform: Sun
- Message size: small
Specification
- Machine: Sun Microsystems Ultra 20.
- Configuration: AMD 2.2 GHz processor.
- OS: SunOS 5.10 (Solaris 10).
- Small messages: size < 8192 bytes.
- Java code: sphlib, a library of hash function implementations in C and Java.
| Candidate (1024-byte input) | 256-bit Mbytes/s | 256-bit cycles/byte | 512-bit Mbytes/s | 512-bit cycles/byte |
|---|---|---|---|---|
| SHA-2 | 57.90 | 38 | 19.69 | 111.73 |
| BLAKE | 45.5 | 48.35 | 27.48 | 80.06 |
| Grøstl | 11.56 | 190.31 | 6.87 | 320.23 |
| JH | 8.33 | 264.11 | 8.33 | 264.11 |
| Keccak | 12.63 | 174.19 | 6.89 | 319.3 |
| Skein | 38.24 | 57.53 | 30.11 | 73.07 |
| Hamsi | 18.50 | 118.92 | 7.12 | 308.99 |
| BMW | 42.89 | 51.29 | 36.84 | 59.72 |
| CubeHash | 23.75 | 92.63 | 23.87 | 92.17 |
| ECHO | 11.24 | 195.73 | 5.75 | 382.61 |
| Fugue | 22.69 | 96.96 | 11.62 | 189.33 |
| Luffa | 33.26 | 66.15 | 18.97 | 115.97 |
| Shabal | 104.37 | 21.08 | 103.36 | 21.28 |
| SHAvite | 24.11 | 91.25 | 13.97 | 157.48 |
| SIMD | 12.10 | 181.82 | 0.75 | 2933.33 |
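The cycles/byte columns appear to follow directly from the throughput columns and the 2.2 GHz clock: cycles/byte = clock rate / throughput, with Mbytes taken as $10^6$ bytes. For example, $2.2 \times 10^9 / (57.90 \times 10^6) \approx 38$ for SHA-2. This conversion is an inference from the numbers rather than something the slides state; the sketch below reproduces a few table entries.

```python
CLOCK_HZ = 2.2e9  # AMD 2.2 GHz processor in the Sun Ultra 20


def cycles_per_byte(mbytes_per_s: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a throughput in Mbytes/s (10**6 bytes per second) to CPU cycles per byte."""
    return clock_hz / (mbytes_per_s * 1e6)


# Prints 38.0, 21.08 and 2933.33, matching the corresponding table entries.
for name, mbytes_per_s in [("SHA-2 (256)", 57.90), ("Shabal (256)", 104.37), ("SIMD (512)", 0.75)]:
    print(name, round(cycles_per_byte(mbytes_per_s), 2))
```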
Figure: 256 output bits
Figure: 512 output bits
Performance and Message length
- Most candidates claim better performance than SHA-2.
- It is interesting to see how performance is affected by message length.
- For the final five candidates, 16-byte and 4096-byte inputs were hashed.
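Performance and Message length (cont.)

| Candidate (Mbytes/s) | 16 B, 256-bit | 4096 B, 256-bit | 16 B, 512-bit | 4096 B, 512-bit |
|---|---|---|---|---|
| SHA-2 | 11.89 | 61.43 | 2.39 | 21.93 |
| BLAKE | 10.93 | 47.68 | 3.47 | 29.99 |
| Grøstl | 2.8 | 12.38 | 0.67 | 7.74 |
| JH | 1.8 | 8.75 | 1.7 | 8.64 |
| Keccak | 1.52 | 13.7 | 1.56 | 7.26 |
| Skein | 9.18 | 38.77 | 3.78 | 31.76 |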
Performance and Message length (cont.)
- Increase in hashing rate from 16-byte to 4096-byte inputs:
- Keccak-256 > SHA-256 (about 9.0x vs. 5.2x).
- Grøstl-512 > SHA-512 (about 11.6x vs. 9.2x).
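The comparison of hashing rates can be reproduced from the 16-byte and 4096-byte throughput figures by taking the ratio of the two. Reading "rate of hashing" as this speed-up with message length is an interpretation of the slide; the function name is illustrative.

```python
# Throughput (Mbytes/s) for 16-byte and 4096-byte inputs, taken from the table above.
RATES = {
    "SHA-256": (11.89, 61.43),
    "Keccak-256": (1.52, 13.7),
    "SHA-512": (2.39, 21.93),
    "Grøstl-512": (0.67, 7.74),
}


def speedup(short_rate: float, long_rate: float) -> float:
    """How many times faster hashing is for 4096-byte inputs than for 16-byte inputs."""
    return long_rate / short_rate


for name, (short_rate, long_rate) in RATES.items():
    print(f"{name}: {speedup(short_rate, long_rate):.1f}x")  # 5.2x, 9.0x, 9.2x, 11.6x
```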
Performance and Block size
- For JH, the performance is the same for the 256 and 512 versions.
- JH has only one large internal state of 1024 bits for both.
- For BLAKE and Keccak, the 256 version is almost twice as fast as the 512 version.
- BLAKE-256 uses a 512-bit block whereas BLAKE-512 uses a 1024-bit block; Keccak-512 absorbs only 576 bits per permutation call versus 1088 bits for Keccak-256.
Hardware vs Software implementation
- The study "Visualizing area-time tradeoffs for SHA-3" reports hardware implementations of the candidates.
Hardware vs Software implementation

| Rank | Hardware | Software |
|---|---|---|
| 1 | Keccak | Shabal |
| 2 | CubeHash | Skein |
| 3 | JH | BLAKE |
| 4 | Shabal | CubeHash |
| 5 | Skein | Luffa |
| 6 | Fugue | SHAvite-3 |
| 7 | Luffa | Fugue |
| 8 | BLAKE | JH |
| 9 | Hamsi | Hamsi |
| 10 | SHAvite-3 | Keccak |
| 11 | Grøstl | Grøstl |
Hardware vs Software implementation (cont.)
- Among the final five candidates:
- Grøstl remains last in both implementations.
- Keccak has the biggest difference in position.
- JH and BLAKE swap positions, with BLAKE performing better in software.
- Skein is the only one to perform reasonably well in both.
Security of Grøstl
- One of the final five candidates.
- Developed at the Technical University of Denmark.
- What makes Grøstl interesting?
- Unlike the SHA family, it does not use a block cipher in its compression function.
- It is based on a few individual permutations.
- It borrows components from AES, such as the S-box.
Hash Function Construction
- The message $M$ is padded and split into $\ell$-bit message blocks $m_1, \ldots, m_t$.
- If the digest size $n \leq 256$, then $\ell = 512$; otherwise $\ell = 1024$.
- The compression function $f$ is iterated as follows:
- $h_i \leftarrow f(h_{i-1}, m_i)$ for $i = 1, \ldots, t$.
- The initial value $h_0 = iv$ is predefined.
- The final value $h_t$ is passed to the output transformation $\Omega$: $H(M) = \Omega(h_t)$.
Compression Function
- Based on two permutations, P and Q.
- Defined as
- $f(h, m) = P(h \oplus m) \oplus Q(m) \oplus h$
- Design of P and Q:
- Inspired by Rijndael.
- Each consists of r rounds, and each round consists of a number of round transformations.
Design of P and Q (cont.)
- The four round transformations:
- AddRoundConstant
- SubBytes
- ShiftBytes
- MixBytes
- One round applies them in the following order (AddRoundConstant first): $R = \mathrm{MixBytes} \circ \mathrm{ShiftBytes} \circ \mathrm{SubBytes} \circ \mathrm{AddRoundConstant}$.
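Byte Sequence to State Matrix
- The mapping is done column by column, in a similar way to Rijndael.
- The 64-byte sequence 00 01 02 ... 3f is mapped to an 8x8 matrix: byte $k$ goes to row $k \bmod 8$ and column $\lfloor k/8 \rfloor$, so the first column holds 00 ... 07 and the last column holds 38 ... 3f.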
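The column-by-column mapping can be checked with a few lines of code. This sketch represents the state as a list of rows; the function name is illustrative, and the same code produces the 8x16 state of the larger variant when given 128 bytes.

```python
def bytes_to_state(data: bytes, rows: int = 8) -> list[list[int]]:
    """Fill a rows x (len(data) // rows) matrix column by column, as in Rijndael."""
    cols = len(data) // rows
    return [[data[rows * j + i] for j in range(cols)] for i in range(rows)]


state = bytes_to_state(bytes(range(64)))
for row in state:
    print(" ".join(f"{b:02x}" for b in row))  # first row: 00 08 10 18 20 28 30 38
```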
AddRoundConstant
- XORs a round-dependent constant into the matrix.
- The transformation in round $i$ is defined as
- $A \leftarrow A \oplus C_i$
SubBytes
- Each byte in the matrix is substituted with the corresponding value from the S-box.
- The S-box is the same as the one used in Rijndael.
- The transformation is as follows:
- $a_{i,j} \leftarrow S(a_{i,j}),\ 0 \leq i < 8,\ 0 \leq j < v$.
- $a_{i,j}$ is the element in row $i$ and column $j$; $v$ is the number of columns.
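Because the S-box is the Rijndael S-box, it can be generated rather than tabulated: take the multiplicative inverse in GF(2^8) modulo $x^8 + x^4 + x^3 + x + 1$ (with 0 mapped to 0) and apply the AES affine transformation. The sketch below does that, applies it byte-wise to a state stored as a list of rows, and also finds the largest entry of the S-box's difference distribution table, which corresponds to the maximum differential probability $4/256 = 2^{-6}$ quoted in the cryptanalysis slides. It is a slow, readable illustration, not a constant-time implementation.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Multiplicative inverse as a**254 in GF(2^8); maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x: int, k: int) -> int:
    return ((x << k) | (x >> (8 - k))) & 0xFF


def build_sbox() -> list[int]:
    """Rijndael S-box: GF(2^8) inverse followed by the AES affine transformation."""
    sbox = []
    for x in range(256):
        b = gf_inverse(x)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_bytes(state: list[list[int]], sbox: list[int] = SBOX) -> list[list[int]]:
    """a[i][j] <- S(a[i][j]) for every byte of the state."""
    return [[sbox[a] for a in row] for row in state]


def max_ddt_entry(sbox: list[int]) -> int:
    """Largest number of inputs x with S(x) ^ S(x ^ dx) == dy, over all dx != 0."""
    best = 0
    for dx in range(1, 256):
        counts = [0] * 256
        for x in range(256):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        best = max(best, max(counts))
    return best


print(hex(SBOX[0x00]), hex(SBOX[0x53]))  # 0x63 0xed
print(max_ddt_entry(SBOX))                # 4, i.e. probability 4/256 = 2**-6
```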
ShiftBytes
- Cyclically shifts the bytes within a row to the left by a number of positions.
- All bytes in row $i$ are shifted $\sigma_i$ positions to the left, with $\sigma = [0, 1, 2, 3, 4, 5, 6, 7]$ for the 8x8 state.
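A sketch of ShiftBytes on a state stored as a list of rows, with $\sigma = [0, 1, \ldots, 7]$. The second helper illustrates the property used in the cryptanalysis slides: the eight bytes of any one column land in eight different columns. Function names are illustrative.

```python
SIGMA = [0, 1, 2, 3, 4, 5, 6, 7]


def shift_bytes(state: list[list[int]], sigma: list[int] = SIGMA) -> list[list[int]]:
    """Cyclically rotate row i of the state left by sigma[i] positions."""
    return [row[s:] + row[:s] for row, s in zip(state, sigma)]


def destination_columns(column: int, sigma: list[int] = SIGMA, width: int = 8) -> list[int]:
    """Column that each byte of `column` moves to (row i shifts left by sigma[i])."""
    return [(column - s) % width for s in sigma]


state = [[8 * i + j for j in range(8)] for i in range(8)]
print(shift_bytes(state)[1])   # row 1 rotated left by one: [9, 10, ..., 15, 8]
print(destination_columns(0))  # [0, 7, 6, 5, 4, 3, 2, 1]: eight distinct columns
```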
MixBytes
- Each column in the matrix is multiplied by a constant 8x8 matrix $B$.
- The transformation is defined as
- $A \leftarrow B \times A$.
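A sketch of MixBytes, assuming the circulant matrix $B = \mathrm{circ}(02, 02, 03, 04, 05, 03, 05, 07)$ and the field polynomial $x^8 + x^4 + x^3 + x + 1$ from the Grøstl specification (worth double-checking against the specification before reuse). The final line relates to the branch-number argument in the cryptanalysis slides: because every entry of $B$ is nonzero, a single active input byte in a column always yields 8 active output bytes, i.e. 9 - 1.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


B_FIRST_ROW = [0x02, 0x02, 0x03, 0x04, 0x05, 0x03, 0x05, 0x07]
# circ(...): row i is the first row rotated right by i positions.
B = [[B_FIRST_ROW[(j - i) % 8] for j in range(8)] for i in range(8)]


def mix_column(column: list[int]) -> list[int]:
    """Multiply one 8-byte column by B over GF(2^8)."""
    out = []
    for row in B:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


def mix_bytes(state: list[list[int]]) -> list[list[int]]:
    """A <- B x A, applied to every column of a state stored as a list of rows."""
    columns = [mix_column([row[j] for row in state]) for j in range(len(state[0]))]
    return [[columns[j][i] for j in range(len(columns))] for i in range(8)]


print(mix_column([0x01, 0, 0, 0, 0, 0, 0, 0]))  # 8 nonzero bytes: 1 active in, 8 out
```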
Output Transformation
- Defined as
- $\Omega(x) = \mathrm{trunc}_n(P(x) \oplus x)$
- $\mathrm{trunc}_n(x)$ discards all but the trailing $n$ bits of $x$.
- $n$ is the length of the message digest.
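To show how padding, the iterated compression function $f(h, m) = P(h \oplus m) \oplus Q(m) \oplus h$ and the output transformation $\Omega$ fit together, here is a toy sketch. The permutations `toy_p` and `toy_q` are hypothetical stand-ins (simple invertible byte mixing), not Grøstl's P and Q; the all-zero iv is a placeholder for Grøstl's predefined one; and the padding is a byte-level simplification. The result has no security whatsoever.

```python
STATE_BYTES = 64  # l = 512 bits, the block size used for digests of up to 256 bits


def toy_perm(x: bytes, const: int, rounds: int = 4) -> bytes:
    """Hypothetical invertible byte mixing; NOT Grøstl's P or Q."""
    s = list(x)
    for r in range(rounds):
        s[0] = (s[0] + const + r) % 256
        for i in range(1, len(s)):
            s[i] = (s[i] + s[i - 1]) % 256  # running sum spreads changes forward
        s.reverse()                         # next round spreads them backward
    return bytes(s)


def toy_p(x: bytes) -> bytes:
    return toy_perm(x, 0x01)


def toy_q(x: bytes) -> bytes:
    return toy_perm(x, 0x80)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))


def compress(h: bytes, m: bytes) -> bytes:
    """f(h, m) = P(h xor m) xor Q(m) xor h."""
    return xor(xor(toy_p(xor(h, m)), toy_q(m)), h)


def output_transform(x: bytes, n_bits: int) -> bytes:
    """Omega(x) = trunc_n(P(x) xor x): keep only the trailing n bits."""
    return xor(toy_p(x), x)[-(n_bits // 8):]


def pad(message: bytes) -> bytes:
    """Simplified padding: 0x80, zero bytes, then the block count as 8 big-endian bytes."""
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % STATE_BYTES)
    blocks = (len(padded) + 8) // STATE_BYTES
    return padded + blocks.to_bytes(8, "big")


def toy_hash(message: bytes, n_bits: int = 256) -> bytes:
    data = pad(message)
    h = bytes(STATE_BYTES)  # h_0 = iv (placeholder)
    for i in range(0, len(data), STATE_BYTES):
        h = compress(h, data[i:i + STATE_BYTES])  # h_i = f(h_{i-1}, m_i)
    return output_transform(h, n_bits)            # H(M) = Omega(h_t)


print(toy_hash(b"abc").hex())
```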
Cryptanalysis
- Differential cryptanalysis:
- There are at least 92 active S-boxes in a 4-round differential trail.
- MixBytes ensures a branch number of 9: a difference in $k > 0$ bytes of a column results in a difference in at least $9 - k$ bytes after one MixBytes operation.
- ShiftBytes moves the bytes of one column to 8 different columns.
- The maximum differential propagation probability of the S-box is $2^{-6}$.
Cryptanalysis (cont.)
- Linear cryptanalysis:
- Linear trails propagate similarly to differential trails.
- The maximum linear propagation (correlation) of the S-box is $2^{-3}$.
- Integrals:
- Sets of plaintexts are chosen with one part held constant while the other part varies through all possibilities.
- For example, an attack may choose 256 plaintexts that are identical in all but 8 of their bits and take all 256 values in those 8 bits.
- Such a set has an XOR sum of 0.
- The XOR sums of the corresponding ciphertexts provide information about the cipher's operation.
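The XOR-sum property of such a set is easy to check. The sketch below builds 256 16-byte texts that differ only in their first byte, confirms that their XOR sum is zero, and shows that the property survives a byte-wise bijection; `toy_sbox` is a hypothetical stand-in for an S-box layer, not the AES S-box.

```python
from functools import reduce


def xor_sum(texts: list[bytes]) -> bytes:
    """XOR all texts together byte by byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), texts)


def toy_sbox(byte: int) -> int:
    """Hypothetical byte permutation (7 is odd, so this is a bijection mod 256)."""
    return (7 * byte + 3) % 256


fixed = bytes(range(1, 16))                       # 15 constant bytes
texts = [bytes([v]) + fixed for v in range(256)]  # first byte takes all 256 values
after_sbox = [bytes(toy_sbox(b) for b in t) for t in texts]

print(xor_sum(texts).hex())       # all zeros
print(xor_sum(after_sbox).hex())  # still all zeros
```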
Integrals (cont.)
- Similar to integrals on AES.
- Grøstl-256:
- $2^{120}$ texts for 6 and 7 rounds.
- The texts are balanced in every byte of input and output.
- Grøstl-512:
- $2^{704}$ texts for 8 and 9 rounds.
- For 8 rounds, the texts are balanced in every byte of input and output.
- For 9 rounds, every byte of input and every bit of output is balanced.
- Conclusion: integrals cannot expose non-random behavior in Grøstl.
Cryptanalysis (cont.)
- Algebraic cryptanalysis:
- Targets the algebraic structure of the AES S-box, which Grøstl also uses.
- There are 200 S-box applications in one AES-128 encryption (including the key schedule), which give 8000 quadratic equations in 1600 variables (the solution yields the key).
- The time complexity of solving such a system is unknown.
- Grøstl-256 and Grøstl-512 have 1280 and 3584 S-box applications per compression function call, respectively.
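The S-box counts follow from the round structure: every state byte passes through the S-box once per round, in both P and Q. The sketch below assumes the round counts from the Grøstl specification (10 rounds on the 8x8 state for Grøstl-256, 14 rounds on the 8x16 state for Grøstl-512) and, for comparison, counts AES-128's 160 round S-boxes plus 40 in the key schedule. Function names are illustrative.

```python
def grostl_sbox_applications(rounds: int, state_bytes: int) -> int:
    """S-box calls per compression-function call: each byte, each round, in P and in Q."""
    return 2 * rounds * state_bytes


def aes128_sbox_applications() -> int:
    """10 rounds x 16 state bytes, plus 4 S-boxes for each of the 10 round keys."""
    return 10 * 16 + 10 * 4


print(grostl_sbox_applications(10, 64))   # Grøstl-256: 1280
print(grostl_sbox_applications(14, 128))  # Grøstl-512: 3584
print(aes128_sbox_applications())         # AES-128: 200
```

The quoted 8000 equations in 1600 variables for AES-128 work out to 40 equations and 8 variables per S-box application.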
Rebound Attack
- Can be applied to hash functions based on block ciphers or permutations.
- Consists of two phases:
- Inbound phase: a meet-in-the-middle step ($E_{in}$) that exploits the available degrees of freedom.
Rebound Attack (cont.)
- Outbound phase: use the values obtained in the inbound phase to compute in the forward ($E_{fw}$) and backward ($E_{bw}$) directions to find collisions.
- Collisions found on reduced-round Grøstl:
- Grøstl-256: 4 out of 10 rounds.
- Grøstl-512: 5 out of 14 rounds.
Internal Differential Attack
- Exploits differential trails between parallel computations that are not distinct enough.
- The idea is to devise a differential path that represents the difference between the two computation paths, rather than the difference between the inputs.
- Grøstl has two permutations, P and Q, which are very similar to each other.
Internal Differential Attack (cont.)
- Compute two internal states, A and B:
- $A \oplus B = \Delta_{in}$
- $P(A) \oplus Q(B) = \Delta_{out}$
- Collisions found:
- Grøstl-256: 5 rounds, $2^{79}$ computations and $2^{64}$ memory.
- Grøstl-512: 6 rounds, $2^{177}$ computations and $2^{64}$ memory.
- P and Q were modified in the final round to make them more different.
Conclusion
- Frontrunners among the five:
- Performance:
- Good: BLAKE and Skein.
- Bad: Keccak.
- Ugly: Grøstl and JH.
- Randomness tests: the weakest is BLAKE.
- Novel algorithms: Skein and Keccak.
- Potential winners: Skein or Keccak.
Thank You.