Title: Remote Timing Attacks are Practical
1Remote Timing Attacks are Practical
- An overview by Rahul Deshpande
2What are Timing Attacks
- Extracting secrets by observing the time taken to respond to various queries.
- E.g., Kocher designed a timing attack to expose secret keys used for RSA.
3Timing Attacks
- Usually used to attack weak computing devices such as smart cards.
- Also applicable to general software systems.
- Practical against network servers.
4Common Assumptions
- The attack is only applicable to hardware security devices.
- The attack cannot be used against general-purpose servers, since decryption times are masked by the many concurrent processes running on the system.
5Challenging the Assumptions
- A remote timing attack against OpenSSL was developed.
- OpenSSL is an SSL library commonly used in web servers and other SSL applications.
- The attacking client measures the time an OpenSSL server takes to respond to decryption queries.
- The client is able to extract the private key stored on the server.
6Environments in which attack is applicable
- Network: between two machines in different buildings, with multiple routers and switches between them.
- Interprocess: between two processes running on the same machine.
- Virtual machines: extracting an RSA private key from a secure virtual machine (VM), invalidating the isolation provided by the virtual machine monitor (VMM).
7OpenSSL Decryption
- RSA decryption is done using modular exponentiation: $m = c^d \bmod N$, where $N = pq$ is the RSA modulus.
- OpenSSL uses the Chinese Remainder Theorem (CRT) to perform the exponentiation.
- CRT computes the exponentiation in two steps, computing $m_1$ and $m_2$ and then combining the two to get $m$.
- Decryption with CRT gives up to a factor of four speedup.
- The timing attack can expose the factors of $N$ used in CRT.
8The Chinese Remainder Theorem
- It is possible to reconstruct integers in a certain range from their residues modulo a set of pairwise relatively prime moduli.
- E.g., the 10 integers in $Z_{10}$ (0, 1, ..., 9) can be reconstructed from their two residues modulo 2 and 5 (the relatively prime factors of 10).
- Provides a way to manipulate large numbers mod $M$ in terms of tuples of smaller numbers.
- CRT can be formulated as $M = \prod_{i=1}^{k} m_i$, where the $m_i$ are pairwise relatively prime.
- Any integer in $Z_M$ can be represented by a k-tuple whose elements are in $Z_{m_i}$, using the correspondence $A \leftrightarrow (a_1, a_2, \ldots, a_k)$, where $a_i = A \bmod m_i$.
9The Chinese Remainder Theorem
- Let $n = n_1 n_2 \cdots n_k$ with $\gcd(n_i, n_j) = 1$ when $i \neq j$.
- The system of congruences $x \equiv x_1 \pmod{n_1}, \ldots, x \equiv x_k \pmod{n_k}$ has a simultaneous solution $x$, and there exists exactly one solution $x$ between 0 and $n-1$.
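The reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function name `crt_reconstruct` is hypothetical, and it assumes Python 3.8 or later for `math.prod` and for modular inverses via `pow(x, -1, m)`.

```python
from math import prod


def crt_reconstruct(residues, moduli):
    """Rebuild the unique x in [0, M) from its residues x mod m_i.

    Assumes the moduli are pairwise relatively prime.
    """
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i                 # product of all the other moduli
        inv = pow(M_i, -1, m_i)        # inverse of M_i modulo m_i
        x += a_i * M_i * inv
    return x % M


# Every integer in Z_10 is recovered from its residues modulo 2 and 5.
for A in range(10):
    assert crt_reconstruct([A % 2, A % 5], [2, 5]) == A
```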
10Speedup RSA with CRT
- Any message $M < N$ is uniquely represented by the tuple $(M_P, M_Q)$, where $M_P = M \bmod P$ and $M_Q = M \bmod Q$.
- $C_P = C \bmod P$ and $C_Q = C \bmod Q$.
- $D_P = D \bmod (P-1)$ and $D_Q = D \bmod (Q-1)$.
- $R_P = Q^{P-1} \bmod N$ and $R_Q = P^{Q-1} \bmod N$.
- $M_P = C_P^{D_P} \bmod P$ and $M_Q = C_Q^{D_Q} \bmod Q$.
- $S_P = M_P R_P \bmod N$ and $S_Q = M_Q R_Q \bmod N$.
- $M = S_P + S_Q$. If $M > N$, then compute $M = M - N$.
- Reference: Johann Großschädl, The Chinese Remainder Theorem and its Application in a High-Speed RSA Crypto Chip.
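The steps on this slide can be traced with a small Python sketch. It follows the slide's symbols (C, D, P, Q and the derived values), but the function name `rsa_crt_decrypt` and the toy key (P = 61, Q = 53, e = 17, D = 2753) are illustrative assumptions; real keys are far larger and real implementations precompute the key-dependent values.

```python
def rsa_crt_decrypt(C, D, P, Q):
    """Decrypt C with private exponent D using the two-prime CRT recipe."""
    N = P * Q
    C_P, C_Q = C % P, C % Q
    D_P, D_Q = D % (P - 1), D % (Q - 1)
    # R_P is 1 mod P and 0 mod Q; R_Q is 0 mod P and 1 mod Q.
    R_P = pow(Q, P - 1, N)
    R_Q = pow(P, Q - 1, N)
    # Two half-size exponentiations replace one full-size exponentiation.
    M_P = pow(C_P, D_P, P)
    M_Q = pow(C_Q, D_Q, Q)
    S_P = (M_P * R_P) % N
    S_Q = (M_Q * R_Q) % N
    M = S_P + S_Q
    if M >= N:
        M -= N
    return M


# Toy key, for illustration only.
P, Q, e, D = 61, 53, 17, 2753
N = P * Q
assert rsa_crt_decrypt(pow(65, e, N), D, P, Q) == 65
```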
11Exponentiation
- The simplest algorithm to compute $g^d \bmod q$ is square-and-multiply.
- OpenSSL uses an optimization of square-and-multiply called sliding window exponentiation.
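For reference, square-and-multiply can be sketched as follows. This is the textbook left-to-right form written for illustration; the function name is hypothetical and this is not OpenSSL's code.

```python
def square_and_multiply(g, d, q):
    """Compute g**d mod q by scanning the bits of d from the top."""
    result = 1
    for bit in bin(d)[2:]:
        result = (result * result) % q      # always square
        if bit == "1":
            result = (result * g) % q       # multiply only on 1 bits
    return result


assert square_and_multiply(7, 560, 561) == pow(7, 560, 561)
```

Each 1 bit of the exponent costs one multiplication by g, which is why the bits of d influence the running time.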
12Sliding Window Exponentiation
- A block of bits (window) of d is processed at each iteration.
- Requires precomputing a multiplication table, taking time proportional to $2^{w-1} + 1$ for a window size of $w$.
- For a 1024-bit modulus, OpenSSL uses a window size of five.
- Attack: by querying on many inputs g, the attacker exposes information about bits of the factor q.
- The attack on sliding windows is harder than on square-and-multiply because there are fewer multiplications.
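A sliding window exponentiation can be sketched as below. This is one common textbook variant, written for illustration and not taken from OpenSSL; the function name and the window-selection details are assumptions. The table holds only the odd powers of g, which is where the roughly $2^{w-1}$ precomputation cost comes from.

```python
def sliding_window_pow(g, d, q, w=5):
    """Compute g**d mod q, consuming up to w exponent bits per multiply."""
    # Precompute the odd powers g^1, g^3, ..., g^(2^w - 1) modulo q.
    g_squared = (g * g) % q
    table = {1: g % q}
    for k in range(3, 1 << w, 2):
        table[k] = (table[k - 2] * g_squared) % q

    bits = bin(d)[2:]
    result = 1
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            result = (result * result) % q
            i += 1
            continue
        # Take the longest window of at most w bits that ends in a 1 bit.
        j = min(i + w, len(bits))
        while bits[j - 1] == "0":
            j -= 1
        for _ in range(j - i):
            result = (result * result) % q
        result = (result * table[int(bits[i:j], 2)]) % q   # one multiply per window
        i = j
    return result


assert sliding_window_pow(3, 2**70 + 12345, 1000003) == pow(3, 2**70 + 12345, 1000003)
```

Because one table multiplication covers a whole window, there are far fewer multiplications per exponent than in square-and-multiply, so each query gives the attacker less timing signal.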
13Montgomery Reduction
- A reduction modulo q done via multiprecision division, returning the remainder, is expensive.
- Montgomery proposed a method for implementing reduction modulo q using a series of operations that are efficient in hardware and software.
- Montgomery reduction transforms a reduction modulo q into a reduction modulo some power of two, denoted by R.
- Reduction modulo a power of 2 is faster since it is easily implemented in hardware.
- All variables must be put into Montgomery form (the Montgomery form of x is $xR \bmod q$).
14Montgomery Reduction
- At the end of the reduction, the output cR is checked to see whether it is greater than q.
- If $cR > q$, q is subtracted from the output to keep cR in the range $[0, q)$. This extra step is called an extra reduction.
- The extra reduction causes a timing difference for different inputs.
- Detecting timing differences from extra reductions tells how close g is to a multiple of one of the factors.
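A minimal sketch of Montgomery reduction makes the extra reduction visible. The helper names `montgomery_setup` and `montgomery_reduce` are hypothetical, the code works on Python integers and not on machine words, and it returns a flag only so that the data-dependent final subtraction can be observed.

```python
def montgomery_setup(q, bits):
    """Return R = 2**bits and -q^(-1) mod R. Assumes q is odd and q < R."""
    R = 1 << bits
    q_neg_inv = (-pow(q, -1, R)) % R
    return R, q_neg_inv


def montgomery_reduce(T, q, R, q_neg_inv):
    """Return (T * R^(-1) mod q, extra_reduction) for 0 <= T < q * R."""
    m = ((T & (R - 1)) * q_neg_inv) & (R - 1)   # reduction mod R is a bit mask
    t = (T + m * q) >> (R.bit_length() - 1)     # exact division by R is a shift
    if t >= q:
        return t - q, True      # the data-dependent extra reduction
    return t, False


q = 97
R, q_neg_inv = montgomery_setup(q, 7)           # R = 128
a_m, b_m = (5 * R) % q, (11 * R) % q            # operands in Montgomery form
product_m, extra = montgomery_reduce(a_m * b_m, q, R, q_neg_inv)
assert product_m == (5 * 11 * R) % q            # result stays in Montgomery form
```

Whether the final subtraction runs depends on the operands, which is the timing signal the attack measures.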
15Multiplication Routines
- RSA operations make use of a multi-precision integer multiplication routine.
- OpenSSL implements two multiplication routines: Karatsuba and normal.
- Karatsuba is used when multiplying two numbers with an equal number of words. It takes time $O(n^{1.58})$.
- Normal multiplication is used when multiplying two numbers with unequal word sizes n and m. It takes time $O(nm)$.
16Multiplication Routines
- Normal multiplication takes quadratic time for numbers of approximately the same size.
- Multiplying two numbers of unequal word size takes longer than multiplying two numbers of equal word size.
- This fact is used in the timing attack on OpenSSL.
- The underlying word multiplication algorithm dominates the total time for a decryption.
- In OpenSSL, it takes 30-40% of the total running time.
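The choice between the two routines can be illustrated with a short sketch. Both functions are hypothetical: `karatsuba` is a textbook recursion on non-negative Python integers, and `choose_routine` only mimics the rule stated on the slides (equal word counts select Karatsuba) under an assumed 32-bit word size. Neither reproduces OpenSSL's internals.

```python
def karatsuba(x, y, threshold_bits=64):
    """Multiply non-negative integers with three recursive half-size products."""
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y                              # small operands: multiply directly
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z0 = karatsuba(x_lo, y_lo, threshold_bits)
    z2 = karatsuba(x_hi, y_hi, threshold_bits)
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, threshold_bits) - z0 - z2
    return (z2 << (2 * half)) + (z1 << half) + z0


def choose_routine(x, y, word_bits=32):
    """Name the routine the slide's rule would select for operands x and y."""
    words_x = (x.bit_length() + word_bits - 1) // word_bits
    words_y = (y.bit_length() + word_bits - 1) // word_bits
    return "karatsuba" if words_x == words_y else "normal"


assert karatsuba(2**200 + 12345, 3**150) == (2**200 + 12345) * 3**150
```

In the attack, a guess g slightly above q reduces to a small value modulo q, so the operands have unequal word counts and the slower normal routine is selected.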
17Comparison of Timing Differences
- Two algorithmic data dependencies in OpenSSL cause time variance in RSA decryption:
- 1. The number of extra reductions in Montgomery reduction.
- 2. The choice of multiplication routine.
- The effects of these optimizations counteract each other.
- Because of the choice of multiplication routine (Karatsuba vs. normal), decryption of $g < q$ is faster than decryption of $g > q$; Montgomery reduction has the opposite effect.
18A Timing Attack on OpenSSL
- Exposes the factorization of the RSA modulus.
- Approximations of q are built that get progressively closer as the attack proceeds.
- Can be viewed as a binary search for q.
- After recovering the most significant half of the bits of q, Coppersmith's algorithm is used to retrieve the complete factorization.
- The value of the decryption is not needed, only the time required for the decryption.
19Timing Attack on OpenSSL
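- g is an integer that has the same top i-1 bits as q, and the remaining bits of g are 0.
- $g_{hi}$ is the same as g, with the i-th bit set to 1. If bit i of q is 1, then $g < g_{hi} < q$; otherwise $g < q < g_{hi}$.
- Measure the time to decrypt both $u_g$ and $u_{g_{hi}}$, represented as $t_1$ and $t_2$, where $u_g = g R^{-1} \bmod N$ and $u_{g_{hi}} = g_{hi} R^{-1} \bmod N$, so that conversion to Montgomery form yields g and $g_{hi}$.
- Calculate the timing difference $t_d = |t_1 - t_2|$.
- If bit i of q is 0, then $t_d$ is large.
- If bit i of q is 1, then $t_d$ is small.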
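The bit-by-bit search can be sketched against a simulated timing oracle. Everything here is a toy under loud assumptions: `simulated_decrypt_time` is a made-up, noise-free model in which inputs up to q cost one extra time unit, the timing difference is taken as an absolute value, the Montgomery conversion of the guesses is skipped, and all bits are recovered, whereas the real attack stops at the top half. The names `recover_q` and `threshold` are hypothetical.

```python
def simulated_decrypt_time(g, q):
    """Toy model (assumption): guesses not exceeding q take one extra time unit."""
    return 10.0 + (1.0 if g <= q else 0.0)


def recover_q(q_bits, oracle, threshold=0.5):
    """Recover a q_bits-bit value from timing differences, top bit first."""
    g = 1 << (q_bits - 1)              # the top bit of q is known to be 1
    for i in range(q_bits - 2, -1, -1):
        g_hi = g | (1 << i)            # same as g, with bit i set
        t_d = abs(oracle(g) - oracle(g_hi))
        if t_d < threshold:            # small difference: g and g_hi on the same side of q
            g = g_hi                   # so bit i of q is 1
        # large difference: q lies between g and g_hi, so bit i of q is 0
    return g


secret_q = 65521                       # toy 16-bit prime
assert recover_q(16, lambda g: simulated_decrypt_time(g, secret_q)) == secret_q
```

Against a real server the difference is noisy, so each comparison would rely on repeated samples, not a single pair of measurements.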
20Real World Scenarios
- The timing attack applies to SSL applications such as stunnel and the Apache web server with mod_SSL, and to trusted computing projects such as Microsoft's NGSCB.
- RSA applications using a hardware crypto accelerator are not vulnerable.
- The attack applies only to software-based RSA implementations.
21Example of an Attack on SSL server
- In a standard full SSL handshake, the SSL server performs an RSA decryption using its private key.
- The CLIENT-KEY-EXCHANGE message is composed by encrypting PKCS #1 padded random bytes with the server's public key.
- In the attack, the client substitutes the guess g for the properly formatted CLIENT-KEY-EXCHANGE message.
- The server generates an ALERT message.
- The client computes the time difference and repeats for various values.
22Experiments
- Show that the factorization of the RSA modulus N is vulnerable.
- Test the effects of increasing the number of decryption requests.
- Compare effectiveness across different keys.
- Compare effectiveness across machine architectures and common compile-time optimizations.
- Compare effectiveness across source-based optimizations.
- Compare inter-process vs. local network attacks.
- Compare effectiveness against two common SSL applications: the Apache web server with mod_SSL, and stunnel.
23Experiment Setup
- The attack was performed against OpenSSL 0.9.7, which does not blind RSA operations by default.
- A simple TCP server was implemented that reads an ASCII string.
- The server converts the string to OpenSSL's internal multi-precision representation.
- The RSA decryption is then performed.
- Decryption time is measured from writing the ciphertext over the socket to receiving the reply.
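The client-side measurement can be sketched as below. This is an illustration under assumptions: the function name `time_query` is hypothetical, it takes an already-connected socket, and it uses Python's `time.perf_counter_ns`, which is much coarser than a CPU cycle counter.

```python
import socket
import time


def time_query(sock, ciphertext):
    """Return (elapsed_ns, reply) for one request on a connected socket."""
    start = time.perf_counter_ns()
    sock.sendall(ciphertext)           # write the guess to the server
    reply = sock.recv(4096)            # block until the server answers
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns, reply
```

A single measurement is dominated by noise, so in practice the same query would be timed repeatedly and the samples combined.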
24Experiment 1- Number of Ciphertexts
- Two parameters determine the number of queries needed to expose a single bit of an RSA factor:
- Neighborhood size: for every bit of q, measure the decryption time for a neighborhood of values g, g+1, g+2, ..., g+n. The neighborhood size is denoted by n.
- Sample size: for each value g+i, sample the decryption time multiple times and compute the mean decryption time. The number of times each value g+i is queried is denoted by s.
- The total number of queries needed to compute $T_g$ is $s \cdot n$.
25Continued..
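- Zero-one gap: the gap between the timing difference observed when a bit of q is 0 and when it is 1.
- The larger the gap, the stronger the indicator that the bit is 0 and the smaller the chance of error.
- Increasing the neighborhood size increases the zero-one gap when the bit is 0, but the gap is steady when the bit is 1.
- Total number of queries to recover a factor: $2ns \cdot \frac{\log_2 N}{4}$, where N is the RSA public modulus. Each bit needs $T_g$ and $T_{g_{hi}}$, and only the top half of the bits of q (a quarter of the bits of N) must be recovered.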
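The query budget can be made concrete with a short sketch. The names are hypothetical, `sample_time` stands in for whatever measurement routine is used, the neighborhood is taken as exactly n values so that the count is s times n, and the total is read as 2ns queries per bit times one quarter of the bit length of N.

```python
from statistics import mean


def neighborhood_time(sample_time, g, n, s):
    """T_g: sum over n neighbors of the mean of s timing samples each."""
    total = 0.0
    for i in range(n):
        samples = [sample_time(g + i) for _ in range(s)]
        total += mean(samples)
    return total


def total_queries(n, s, modulus_bits):
    """Queries to recover the top half of q: 2*n*s per bit, modulus_bits/4 bits."""
    bits_to_recover = modulus_bits // 4
    return 2 * n * s * bits_to_recover


# Illustrative values: n = 400, s = 7, 1024-bit modulus.
assert total_queries(400, 7, 1024) == 1_433_600
```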
26Experiment 2- Different Keys
- Several 1024-bit keys were attacked to determine the ease of breaking different moduli.
- The zero-one gap is positive for the first 32 bits due to Montgomery reductions.
- Normally, shifts in the resulting zero-one gap occur around multiples of the machine word size.
- The attacker must be aware that the zero-one gap may flip sign when guessing bits that are around multiples of the machine word size.
- If hard-to-guess bits are encountered, the neighborhood size can be increased to increase the zero-one gap.
27Experiment 3- Architecture and Compile-Time
effects
- Computer architecture and compile-time optimizations affect the zero-one gap.
- Effect of architecture: programs with similar retirement counts may have different execution profiles.
- This is due to different run-time factors such as branch prediction, pipeline throughput, and L1 and L2 cache behavior.
- Compile-time optimizations change the number of instructions and how efficiently instructions are executed on the hardware.
28Continued
- The effects of compile-time optimizations were tested by compiling OpenSSL in three different ways:
- Optimized
- No Pentium flag
- Unoptimized
- Each set of compile-time optimizations changes the zero-one gap.
29Experiment 4 Source-Based Optimizations
- Patches can change the code profile of RSA libraries, resulting in a timing vulnerability.
- After a CRT decryption, OpenSSL re-encrypts the result to verify that it is identical to the original ciphertext.
- OpenSSL calculates both Montgomery parameters on every decryption.
- A patch allows OpenSSL to cache both values between decryptions with the same key.
- This shifts the zero-one gap, since the resulting code has a different execution profile.
- Patches may increase the zero-one gap, making the code more vulnerable to timing attacks.
30Experiment 5 Interprocess vs. Local Network
Attacks
- Noise from the network is eliminated by repeated sampling, giving a zero-one gap similar to the inter-process case.
- Networks with less than 1 ms of variance are vulnerable.
- The attacker can take advantage of higher CPU speeds to increase the accuracy of timing measurements.
31Experiment 6 Attacking SSL Applications on the
Local Network
- Apache+mod_SSL is a commonly used secure web server.
- Stunnel allows TCP/IP connections to be tunneled through SSL.
- Servers connected by a single switch are vulnerable to the attack when the attacker has access to a machine near the OpenSSL-based server.
- Timing attacks also work in larger networks where the client and web server are separated by multiple routers and switches on the network backbone.
- Run-time differences result in different zero-one gaps.
- The experiment highlights the difficulty of determining the minimum number of queries for a successful attack.
32Defenses
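- Three possible defenses:
- 1. RSA blinding.
- Calculates $x = r^e g \bmod N$ before decryption, where r is random, e is the RSA encryption exponent, and g is the ciphertext.
- x is then decrypted as normal, followed by division by r.
- Since r is random, x is random, and timing the decryption does not reveal information about the key.
- Performance penalty of 2-10%.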
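Blinding can be sketched in a few lines. This is an illustration with a toy key, not OpenSSL's implementation: the function name `blinded_decrypt` is hypothetical, and the blinding value r is drawn with Python's `secrets` module.

```python
import secrets
from math import gcd


def blinded_decrypt(g, d, e, N):
    """Decrypt ciphertext g while hiding it from the exponentiation."""
    while True:
        r = secrets.randbelow(N - 2) + 2      # random r in [2, N)
        if gcd(r, N) == 1:                    # r must be invertible mod N
            break
    x = (pow(r, e, N) * g) % N                # blind: x = r^e * g mod N
    y = pow(x, d, N)                          # decrypt as normal: y = r * g^d mod N
    return (y * pow(r, -1, N)) % N            # unblind: divide by r


# Toy key, for illustration only.
N, e, d = 3233, 17, 2753
assert blinded_decrypt(pow(65, e, N), d, e, N) == 65
```

The exponentiation only ever sees x, which changes with every fresh r, so its running time is no longer correlated with the attacker's chosen g.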
33Continued
- 2. Try to make the time of all RSA decryptions independent of the input ciphertext.
- It is harder to create and maintain code whose decryption time does not depend on the ciphertext.
- 3. Require all RSA computations to be quantized, i.e., to always take a multiple of some predefined time quantum.
- The preferred method is blinding.
- Its drawbacks are that it requires a good source of randomness to prevent attacks on the blinding factor, and that it causes a small performance degradation.
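The third defense, quantization, can be sketched as a wrapper that pads every call up to the next multiple of a time quantum. The name `quantized_call` and the use of `time.sleep` for padding are assumptions made for illustration; a production version would need a far more careful timing source.

```python
import math
import time


def quantized_call(fn, quantum_s, *args):
    """Run fn(*args) and return only after a whole number of quanta has elapsed."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    quanta = max(1, math.ceil(elapsed / quantum_s))   # round up to the next quantum
    remaining = quanta * quantum_s - (time.perf_counter() - start)
    if remaining > 0:
        time.sleep(remaining)                         # pad out the difference
    return result


# Example: a toy decryption padded to a 20 ms quantum.
plaintext = quantized_call(pow, 0.02, 2790, 2753, 3233)
```

The quantum must cover the slowest decryption for this to hide all timing variation, which means every decryption pays the worst-case cost.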
34Conclusion
- Experiments show that timing attacks are effective when carried out between machines separated by multiple routers.
- Timing attacks are also effective between two processes on the same computer.
- Several crypto libraries, including OpenSSL, now implement blinding by default to prevent timing attacks.