Cryptographic Algorithms Implemented on FPGAs - PowerPoint PPT Presentation

Slides: 81
Provided by: deltaCsCi

Transcript and Presenter's Notes

1
Cryptographic Algorithms Implemented on FPGAs
2
Why Secure Hardware?
  • Embedded systems now common in the industry
  • Hardware tokens, smartcards, crypto accelerators,
    internet appliances
  • Detailed analysis and reverse-engineering
    techniques are available to all
  • Increase difficulty of attack
  • The means exist

3
Attacker resources and methods vary greatly
Resource       Teenager    Academic    Org. Crime   Govt
Time           Limited     Moderate    Large        Large
Budget ($)     <1000       10K-100K    100K         Unknown
Creativity     Varies      High        Varies       Varies
Detectability  High        High        Low          Low
Target         Challenge   Publicity   Money        Varies
Number         Many        Moderate    Few          Unknown
Organized      No          No          Yes          Yes
Spread info?   Yes         Yes         Varies       No
Source: Cryptography Research, Inc. 1999, "Crypto Due Diligence"
4
Minimal key lengths for symmetric ciphers
Source: Blaze/Diffie/Rivest/Schneier/Shimomura/Thompson/Wiener, www.bsa.org/policy/encryption
Time and cost per key recovered are given for 40-bit and 56-bit keys.
Type of attacker      Budget    Tool                     40 bits              56 bits            Length needed for protection in late 1995
Pedestrian Hacker     tiny      scavenged computer time  1 week               infeasible         45
                      $400      FPGA                     5 hours ($0.08)      38 years ($5,000)  50
Small Business        $10,000   FPGA                     12 min ($0.08)       556 days ($5,000)  55
Corporate Department  $300K     FPGA                     24 sec ($0.08)       19 days ($5,000)   60
                                ASIC                     0.18 sec ($0.001)    3 hours ($38)
Big Company           $10M      FPGA                     0.7 sec ($0.08)      13 hours ($5,000)  70
                                ASIC                     0.005 sec ($0.001)   6 min ($38)
Intelligence Agency   $300M     ASIC                     0.0002 sec ($0.001)  12 sec ($38)       75
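The 40-bit and 56-bit columns are related by the size of the key space: each extra key bit doubles the exhaustive-search work, so 16 extra bits cost a factor of 2^16 on the same hardware. A minimal sketch of that arithmetic, using the 5-hour FPGA figure from the table (the function name and the hours-per-year constant are illustrative assumptions):

```python
# Exhaustive key search: each additional key bit doubles the expected work.
def scale_search_time(time_at_40_bits: float, key_bits: int) -> float:
    """Scale a 40-bit search time to a longer key on the same hardware."""
    return time_at_40_bits * 2 ** (key_bits - 40)

HOURS_PER_YEAR = 24 * 365.25  # assumption: average year length

# $400 FPGA attacker: 5 hours per 40-bit key (figure taken from the table).
hours_56 = scale_search_time(5.0, 56)
years_56 = hours_56 / HOURS_PER_YEAR
print(f"56-bit key on the same FPGA: roughly {years_56:.1f} years")
```

The result lands close to the 38 years quoted for the same attacker, which suggests the table's 56-bit column was obtained by this kind of scaling.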
5
Reconfigurable Hardware
  • In commercial applications, Reconfigurable Hardware (RCHW)
    mostly means
  • Field Programmable Gate Arrays (FPGAs)
  • Erasable Programmable Logic Devices (EPLDs).

6
Field Programmable Gate Arrays
  • can realize a variety of circuits,
  • can be reprogrammed in-system,
  • consist of Boolean and storage elements,
  • can realize fairly large circuits (> 100,000 gates).

7
Reconfigurable Computing - Characteristics
  • RC is the middle ground between ASICs and
    microprocessors. ASICs are the ultimate in speed
    but lack flexibility while processors have the
    ultimate in flexibility but lack speed.
  • Its key feature is the ability to perform
    computations in hardware to increase performance,
    while retaining much of the flexibility of a
    software solution.

8
Choosing a Platform
  • Choice of implementation is driven by
      • Algorithm performance
      • Cost: per-unit cost, development cost
      • Power consumption (wireless devices!)
      • Flexibility: parameter change, key agility,
        algorithm agility
      • Physical security

9
Platform Implementation for Cryptographic Algorithms
  • Classic hardware: VLSI ASIC chips
  • Reconfigurable HW: FPGAs
  • Software: general-purpose µProcs, embedded µProcs, etc.
10
Reconfigurable Computing - defined
[Figure: ASIC, processor, and reconfigurable hardware compared by performance, flexibility, unit cost, and development cost]
11
Why Crypto-algorithms in Hardware?
  • Two main reasons
  • Software implementations are too slow for some
    applications (symmetric algorithms: encryption rates
    of 100 Mbit/sec; public-key algorithms: > 10 msec)
  • Hardware implementations are intrinsically more
    physically secure: key access and algorithm
    modification are considerably harder.

12
But why reconfigurable hardware?
  • Potential advantages of crypto algorithms
    implemented on reconfigurable platforms
  • Algorithm Agility
  • Algorithm Upgrade
  • Architecture Efficiency
  • Resource Efficiency
  • Algorithm Modification
  • (Throughput relative to software)
  • (Cost Efficiency relative to ASICs)

13
Crypto and FPGAs: Algorithm Agility
  • Observation: modern security protocols are
    defined to be algorithm independent
  • The encryption algorithm is negotiated on a
    per-session basis.
  • A wide variety of ciphers can be required. Example:
    IPsec-allowed algorithms are DES, 3DES, Blowfish,
    CAST, IDEA, RC4 and RC6, plus future extensions.
  • The same holds for public-key algorithms, e.g.,
    Diffie-Hellman and ECDH.
  • Recall that ASIC solutions can provide
    algorithm agility only at high cost.

14
Crypto and FPGAs: Algorithm Upgrade
  • Applications may need an upgrade to a new algorithm
    because
  • The current algorithm was broken (DES)
  • The standard expired (again DES)
  • A new standard was created (AES)
  • The algorithm list of an algorithm-independent protocol
    was extended
  • Upgrading an ASIC-implemented algorithm is
    practically infeasible if many devices are affected,
    or in applications such as satellite communications.

15
Crypto and FPGAs: Architecture Efficiency
  • In certain cases a hardware architecture can be
    much more efficient if it is designed for a
    specific set of parameters. Parameters for
    cryptographic algorithms can be, for example, the
    key, the underlying finite field, the coefficients
    used (e.g., the specific curve of an ECC system),
    and so on. Generally speaking, the more specifically
    an algorithm is implemented, the more efficient it
    can become.

16
Crypto and FPGAs: Resource Efficiency
  • Observation: the majority of security protocols
    use private-key as well as public-key algorithms
    during one session, but not simultaneously.
  • The same FPGA device can be used for both through
    run-time reconfiguration.

17
Crypto and FPGAs: Algorithm Modification
  • Some applications require public algorithms (such
    as AES candidates) with proprietary modules,
    e.g., proprietary S-boxes or permutations.
  • Change of modes of operation (feedback modes,
    counter mode, etc.)
  • Cryptanalytic implementations, such as
    key-search machines, may use slightly altered
    versions of the algorithms.
  • With FPGAs, these changes can readily be
    implemented.

18
Motivation
19
  • Motivation(1) FPGAs
  • potential features

20
  • Motivation(1) FPGAs
  • CLB

[Figure: Configurable Logic Block in logic mode: two 4-input combinational logic blocks, each followed by a 1-bit register]
21
  • Motivation(1)
  • High density built-in modules

Virtex-II Pro
Feature/Product XC2VP2 XC2VP4 XC2VP7 XC2VP20 XC2VP30 XC2VP40 XC2VP50 XC2VP70 XC2VP100 XC2VP125
EasyPath cost reduction - - - - XCE2VP30 XCE2VP40 XCE2VP50 XCE2VP70 XCE2VP100 XCE2VP125
Logic Cells 3,168 6,768 11,088 20,880 30,816 43,632 53,136 74,448 99,216 125,136
Slices 1,408 3,008 4,928 9,280 13,696 19,392 23,616 33,088 44,096 55,616
BRAM (Kbits) 216 504 792 1,584 2,448 3,456 4,176 5,904 7,992 10,008
18x18 Multipliers 12 28 44 88 136 192 232 328 444 556
Digital Clock Management Blocks 4 4 4 8 8 8 8 8 12 12
Config (Mbits) 1.31 3.01 4.49 8.21 11.36 15.56 19.02 25.6 33.65 42.78
PowerPC Processors 0 1 1 2 2 2 2 2 2 4
Max Available Multi-Gigabit Transceivers 4 4 8 8 8 12 16 20 20 24
Max Available User I/O 204 348 396 564 644 804 852 996 1164 1200
1 Logic Cell = (1) 4-input LUT + (1) FF + (1) Carry Logic
1 CLB = (4) Slices
http://www.xilinx.com/products/tables/fpga.htm#v2p
22
Motivation(2) Cryptographic algorithms ?
Basic primitives
Survey by Stephen et al, LNCS 1482, Sep. 98
23
Motivation (1 & 2): Cryptographic algorithms on FPGAs
  • Cryptographic algorithms
      • Simple logical operations at the bit level
      • Replicated blocks
      • Block length is high
  • FPGAs
      • FPGAs natively handle bit-level operations
      • Blocks can simply be copied
      • Parallelism is possible (high number of I/Os)
      • More physical security
      • Flexibility
      • High density

24
  • Motivation(3)
  • High Performance

25
  • Motivation(4)
  • Smart card applications

26
Case Study: Modular Exponentiation
27
But why are we interested in modular
exponentiation in the first place?
28
RSA cryptosystem by layers
  • Protocols and applications: SSL, TLS, WTLS, WAP,
    etc.
  • PKCS user functions: PKCS1_OAEP_Encrypt,
    PKCS1_OAEP_Decrypt, PKCS1_v15_Sign,
  • PKCS primitives: PKCS1_OAEP_Encode,
    PKCS1_OAEP_Decode, etc.
  • RSA primitive operations: encryption C = M^e mod n,
    decryption M = C^d mod n.
  • F_p finite field operations: addition, squaring,
    multiplication, inversion and exponentiation
29
Public-Key Cryptography
30
Public-Key Cryptography
31
Modern Cryptosystems: A Top-Down Model
  • Applications: e-commerce, smart cards, digital
    money, secure communications, etc.
  • Crypto-protocols: Diffie-Hellman, authentication
    protocols, etc.
  • Top-level crypto-primitives: key-pair generation,
    signing and verification
  • Low-level crypto-primitives: addition, doubling,
    scalar multiplication
  • F_{2^m} finite field operations: addition, squaring,
    multiplication and inversion
32
AES (Rijndael) Algorithm Implementation
33
AES: Advanced Encryption Standard (Rijndael)
[Figure: AES block with a 128-bit plain text input, a 128-bit key input, selection of rounds, and a 128-bit cipher text output]
  • AES Processes
  • Key Scheduling
  • Encryption
  • Decryption
34
AES: Advanced Encryption Standard
Input: 128 bits = 16 bytes
35
Key Scheduling
[Figure: the user key is expanded into the generated keys Round Key 0, Round Key 1, ..., Round Key 10]
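The key-scheduling step can be sketched in software as below. This is a minimal illustration of the standard AES-128 key expansion (11 round keys of 16 bytes each), not the hardware KGEN block of the slides; the function names are illustrative, and the S-box is computed on the fly rather than stored.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox() -> list:
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # XOR with the inverse rotated left by 1..4
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

def expand_key_128(key: bytes) -> list:
    """Return the 11 round keys (Round Key 0 .. Round Key 10) for a 16-byte key."""
    sbox = build_sbox()
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # rotate the word by one byte
            temp = [sbox[b] for b in temp]      # substitute each byte
            temp[0] ^= rcon                     # add the round constant
            rcon = gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]

# Example key from the FIPS-197 key expansion appendix.
round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(round_keys[1].hex())
```

Round Key 0 is the user key itself; every later round key depends only on the previous one, which is why a hardware design can either precompute all of them or generate them on the fly alongside the rounds.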
36
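AES Encryption Algorithm Flow
[Figure: encryption flow from IN to OUT: an initial ARK with the user key, then (ROUND-1) iterations of BS, SR, MC, ARK with a sub key, then a final BS, SR, ARK]
BS: Byte Substitution, SR: Shift Rows, MC: Mix Column, ARK: Add Round Key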
37
1. Byte Substitution (BS)
[Figure: each byte a(i,j) of the state matrix is replaced by b(i,j) through the 16x16 S-BOX]
State matrix in      State matrix out
a0,0 a0,1 a0,2 a0,3  b0,0 b0,1 b0,2 b0,3
a1,0 a1,1 a1,2 a1,3  b1,0 b1,1 b1,2 b1,3
a2,0 a2,1 a2,2 a2,3  b2,0 b2,1 b2,2 b2,3
a3,0 a3,1 a3,2 a3,3  b3,0 b3,1 b3,2 b3,3
38
2. ShiftRow (SR)
Each row of the state is rotated by its offset. The inverse ShiftRow maps the shifted state back to the original.
Before SR   After SR
a b c d     a b c d    (offset 0)
e f g h     f g h e    (offset 1)
i j k l     k l i j    (offset 2)
m n o p     p m n o    (offset 3)
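The row offsets above can be sketched in a few lines. This is a minimal software illustration using strings for the rows (the function names are illustrative); in hardware the same step is pure wiring.

```python
def shift_rows(state):
    """Rotate row r of the state left by r positions (offsets 0, 1, 2, 3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state):
    """Undo shift_rows by rotating row r right by r positions."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(state)]

state = ["abcd", "efgh", "ijkl", "mnop"]
shifted = shift_rows(state)
print(shifted)
print(inv_shift_rows(shifted) == state)
```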
39
3. MixColumn (MC) and Inv MixColumn (IMC)
[Figure: MC and IMC matrix equations, applied to each column i = 0, 1, 2, 3 of the state]
Every entry is represented in GF(2^8)
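Since the matrix equations did not survive in the transcript, the following sketch shows the standard AES MixColumn on a single column, using the usual coefficient rows (02 03 01 01 and its rotations). The helper name `xtime` denotes multiplication by 02 in GF(2^8); `mix_column` is an illustrative name, not taken from the slides.

```python
def xtime(v: int) -> int:
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    v <<= 1
    return (v ^ 0x11B) if v & 0x100 else v

def mix_column(col):
    """Apply the AES MixColumn matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # 02*a0 + 03*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # a0 + 02*a1 + 03*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # a0 + a1 + 02*a2 + 03*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # 03*a0 + a1 + a2 + 02*a3
    ]

print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Only shifts and XORs are needed, because 03·v is computed as 02·v XOR v. That is why MC is cheap in FPGA logic.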
40
4. AddRoundKey (ARK)
[Figure: the round key is XORed with the state]
41
AES Implementation Strategies
The commonly used architectures are
  • Iterative looping (repeated n times)
  • Inner-round pipelining (one round)
  • Loop unrolling
42
AES Implementation Strategies
Metrics to measure performance?
1
2
  • FPGA resources used
      • CLB slices
      • BRAMs
      • etc.

43
  • Design 1: Encryptor Core
  • Sequential vs. Pipelined Architecture

44
AES Algorithm Implementation: Sequential Approach
[Figure: sequential architecture. Data path: PLAIN TEXT, RND 0, RND 1-9 looped through a LATCH (clock CLK, select S), RND 10, CIPHER TEXT. Key path: USER KEY, KGEN with RCON and a LATCH (clock CLK, select S), ROUND KEY]
45
AES Algorithm Implementation: Sequential Approach
Byte Substitution (BS): look-up table method
[Figure: each of the 16 state bytes B1, B2, ..., B15, B16 passes through its own S-Box (256 x 8); the CLB is used in memory mode (two 16x1 RAMs, each with a 1-bit register)]
46
AES Algorithm Implementation: Sequential Approach
SR
IN (4 bytes):  a b c d
OUT (4 bytes): b c d a
Only the wiring changes; no logic resources are occupied.
47
AES Algorithm Implementation: Sequential Approach
AddRoundKey
Key
Here xtime(v) represents 02·v.
48
Performance results
  • Target device: VirtexE XCV812
  • Tools used: Xilinx Foundation Tool F4.1i
  • CLB slices: 2744 (22%)
  • BRAMs: not used
  • I/Os: 385 (95%)
  • Achieved frequency: 20.192 MHz
  • Throughput: 258.5 Mbits/s
  • Throughput/Area: 0.09
49
AES Algorithm Implementation: Pipelined Approach
[Figure: pipelined architecture. Data path: IN, IN REG, RND 0, RND 1, ..., RND 10, OUT. Key path: USER-KEY, IN REG, a chain of eleven KGEN blocks supplying RK 0, RK 1, ..., RK 10 to the corresponding rounds]
50
Performance results
  • Target device: VirtexE XCV812
  • Tools used: Xilinx Foundation Tool F4.1i
  • CLB slices: 2136 (18%)
  • BRAMs: 100
  • I/Os: 385 (95%)
  • Achieved frequency: 22.41 MHz
  • Throughput: 2868 Mbits/s
  • Throughput/Area: 1.29
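The throughput figures of the sequential and pipelined designs can be related to their clock frequencies with a short calculation. The sketch below assumes throughput = block size × frequency / cycles per block, with 10 cycles per block for the sequential core and 1 cycle per block for the full pipeline; those cycle counts are assumptions chosen because they reproduce the reported numbers, not values stated in the slides.

```python
def throughput_mbps(freq_mhz: float, cycles_per_block: int, block_bits: int = 128) -> float:
    """Throughput in Mbits/s for a core that finishes one block every cycles_per_block cycles."""
    return block_bits * freq_mhz / cycles_per_block

# Sequential design: 20.192 MHz, assumed 10 clock cycles per 128-bit block.
sequential = throughput_mbps(20.192, 10)
# Pipelined design: 22.41 MHz, assumed one block per clock cycle once the pipeline is full.
pipelined = throughput_mbps(22.41, 1)

# Throughput/Area for the sequential design, with area measured in CLB slices.
sequential_per_slice = sequential / 2744

print(f"sequential: {sequential:.1f} Mbits/s, {sequential_per_slice:.2f} per slice")
print(f"pipelined:  {pipelined:.0f} Mbits/s")
```

The two clock frequencies are similar, so the roughly tenfold throughput gain comes from the pipeline accepting a new block every cycle.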
51
  • Design 2: Encryptor/Decryptor Core
  • MixColumn and Inv. MixColumn Modified

52
MixColumn (MC) and Inv MixColumn (IMC) Revisited
[Figure: MC and IMC matrix equations]
Every entry is represented in GF(2^8)
53
MixColumn (MC) and Inv MixColumn (IMC), cont.
[Equations: MC and IMC expressed through the multiples 02(x), 04(x), and 08(x)]
  • The coefficients for IMC have a higher Hamming
    weight, so IMC is the costlier operation.

54
MixColumn (MC) and Inv MixColumn (IMC), cont.
We observe that
[Equations (1) and (2)]
The largest coefficient in Eq. 2 is 05.
Eq. 1 is already available, so the Eq. 2 calculation can be
made before Eq. 1.
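The equations themselves are missing from the transcript, but the remark about a largest coefficient of 05 is consistent with a well-known decomposition: the IMC matrix equals the MC matrix multiplied by a circulant matrix with first row (05, 00, 04, 00). The sketch below checks that identity numerically; treating it as the slide's Eq. 1 and Eq. 2 is an assumption, and all function names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def circulant_apply(first_row, col):
    """Multiply a column by the circulant matrix whose row i is first_row rotated right by i."""
    out = []
    for i in range(4):
        acc = 0
        for j in range(4):
            acc ^= gf_mul(first_row[(j - i) % 4], col[j])
        out.append(acc)
    return out

MC = [0x02, 0x03, 0x01, 0x01]    # MixColumn coefficients
IMC = [0x0E, 0x0B, 0x0D, 0x09]   # Inv MixColumn coefficients
PRE = [0x05, 0x00, 0x04, 0x00]   # small-coefficient pre-multiplication (assumed form of Eq. 2)

def inv_mix_column_direct(col):
    return circulant_apply(IMC, col)

def inv_mix_column_shared(col):
    """IMC computed as the cheap pre-multiplication followed by the ordinary MC."""
    return circulant_apply(MC, circulant_apply(PRE, col))

col = [0x8E, 0x4D, 0xA1, 0xBC]
print(inv_mix_column_direct(col) == inv_mix_column_shared(col))
```

With this split, a combined encryptor/decryptor can reuse the MC logic for decryption and add only the low-weight 04/05 stage, rather than building a separate IMC with coefficients 09, 0B, 0D, and 0E.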
55
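Data Path for Encryption/Decryption
[Figure: two versions of the combined data path from IN to OUT, with E/D multiplexers selecting between the encryption blocks (MI, AF, SR, MC, ARK) and the decryption blocks (ISR, IAF, MI, IMC or ModM, IARK or ARK)]
Encryption: MI, AF, SR, MC, ARK
Decryption: ISR, IAF, MI, ModM, MC, ARK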
56
Performance results
  • Target device: VirtexE XCV2600
  • Tools used: Xilinx Foundation Tool F4.1i
  • CLB slices: 5677 (22.3%)
  • BRAMs: 80 (43%)
  • I/Os: 386 (48%)
  • Achieved frequency: 34.2 MHz
  • Throughput: 4121 Mbits/s
  • Throughput/Area: 0.73
57
  • Design 3: Encryptor/Decryptor Core
  • S-Box and Inv. S-Box

58
Byte Substitution (Revisited)
[Figure: each byte a(i,j) of the state matrix is replaced by b(i,j) through the S-BOX (256 x 8)]
State matrix in      State matrix out
a0,0 a0,1 a0,2 a0,3  b0,0 b0,1 b0,2 b0,3
a1,0 a1,1 a1,2 a1,3  b1,0 b1,1 b1,2 b1,3
a2,0 a2,1 a2,2 a2,3  b2,0 b2,1 b2,2 b2,3
a3,0 a3,1 a3,2 a3,3  b3,0 b3,1 b3,2 b3,3
59
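BS and Inverse BS
  • S-BOX: IN, MI, AF
  • INV S-BOX: IN, IAF, MI
  • MI is the multiplicative inverse in GF(2^8)
[Figure: combined data path in which an E/D select shares a single MI block between the S-BOX (MI then AF) and the INV S-BOX (IAF then MI)]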
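The decomposition of the S-box into a multiplicative inverse (MI) followed by an affine transform (AF), and of the inverse S-box into the inverse affine transform (IAF) followed by the same MI, can be checked in software. The sketch below uses the standard AES affine constants; the function names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def rotl8(b: int, k: int) -> int:
    """Rotate a byte left by k bits."""
    return ((b << k) | (b >> (8 - k))) & 0xFF

def mi(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def af(b: int) -> int:
    """AES affine transform."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def iaf(s: int) -> int:
    """Inverse of the AES affine transform."""
    return rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05

def sbox(x: int) -> int:
    return af(mi(x))        # encryption path: MI then AF

def inv_sbox(y: int) -> int:
    return mi(iaf(y))       # decryption path: IAF then the same MI

print(hex(sbox(0x53)), hex(inv_sbox(sbox(0x53))))
```

Because both directions share `mi`, only the small affine stages differ. This is the property the combined encryptor/decryptor exploits to store a single MI table.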
60
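MI: 1st Approach
[Figure: combined data path from IN to OUT with E/D multiplexers around a shared MI block; encryption uses AF, SR, MC, ARK and decryption uses ISR, IAF, IMC, IARK]
  • MI with look-up table
  • Same S-Box (MI) for encryption and decryption
  • Memory requirements are halved
  • BRAMs are used for storing the MI values.
  • No initial time is needed to prepare them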

61
Performance results
  • Target device: VirtexE XCV2600
  • Tools used: Xilinx Foundation Tool F4.1i
  • CLB slices: 6677 (26.3%)
  • BRAMs: 80 (43%)
  • I/Os: 386 (48%)
  • Achieved frequency: 30 MHz
  • Throughput: 3840 Mbits/s
  • Throughput/Area: 0.58
62
MI: 2nd Approach
[Figure: three stages. 1st transformation (GF(2^8) to field F, matrix M), MI manipulation in GF(2^4), 2nd transformation (field F to GF(2^8), matrix M^-1)]
MI three-stage strategy (S. Morioka and A. Satoh, CHES 2002)
  • MI with composite fields GF((2^2)^2), GF((2^4)^2)
  • Map the element A ∈ GF(2^8) to a composite field F
  • Compute the multiplicative inverse over the field
    F
  • Map back from field F to GF(2^8)

63
MI Implementation
[Figure: MI data path: mapping from GF(2^8) to GF(2^4), three 4x4 multipliers, the terms λ·A_H^2, A_L·A^16, and A^17, an inverter X^-1, and the mapping from GF(2^4) back to GF(2^8) producing A^-1]
Let $A \in F$ and $A = A_H y + A_L$; then it can be shown that
$$A^{16} = A_H y + (A_H + A_L)$$
$$A^{17} = A \cdot A^{16} = \lambda A_H^2 + (A_H + A_L) A_L$$
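The reason A^16 and A^17 appear is the identity A^-1 = (A^17)^-1 · A^16: the element A^17 always lies in the 16-element subfield, so only a 4-bit inversion is needed. The sketch below verifies this identity directly in the AES representation of GF(2^8). It is a check of the algebra only and does not model the composite-field mapping M or the constant λ of the hardware design; the function names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def inverse_via_a17(a: int) -> int:
    """Compute a^-1 as a^16 * (a^17)^-1, inverting only the subfield element a^17."""
    a16 = gf_pow(a, 16)
    a17 = gf_mul(a16, a)
    # a17 satisfies x^15 = 1 (for a != 0), so its inverse is x^14.
    a17_inv = gf_pow(a17, 14)
    return gf_mul(a16, a17_inv)

print(all(gf_mul(a, inverse_via_a17(a)) == 1 for a in range(1, 256)))
```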
64
Performance results
  • Target device: VirtexE XCV2600
  • Tools used: Xilinx Foundation Tool F4.1i
  • CLB slices: 13416 (52%)
  • BRAMs: none
  • I/Os: 386 (48%)
  • Achieved frequency: 24.5 MHz
  • Throughput: 3136 Mbits/s
  • Throughput/Area: 0.24
65
AES Algorithm Implementations
Results Comparison
66
Sequential Vs Pipeline design
Sequential Design
Pipeline Design
67
MixColumn vs Inv MixColumn
Design          Device    BRAMs  CLB slices (S)  Throughput, Mbits/s (T)  T/S
McLoone et al.  XCV3200E  102    7576            3239                     0.43
This design     XCV2600E  80     5677            4121                     0.73
  • Two approaches for MC/IMC
  • Fewer BRAMs
  • Fewer slices
  • Highest throughput reported to date

68
S-Box vs Inv S-Box
Design        Device    BRAMs     CLB slices (S)  Throughput, Mbits/s (T)  T/S
McLoone       XCV3200E  102       7576            3239                     0.43
E/D GF(2^8)   XCV2600E  80        6676            3840                     0.58
E/D GF(2^4)   XCV2600E  No BRAMs  13416           3136                     0.24
  • Two approaches for MI
  • Key scheduling included
  • No initial delay
  • The first design uses a look-up table for MI:
    fast, but with high memory requirements
  • The second design uses the composite field approach
    for MI: slower, with lower memory requirements.
  • Both are efficient compared to the reported design

69
Modular Exponentiation: Binary Method Variations
70
Side Channel Attacks
Algorithm: Binary exponentiation
Input: a in G, exponent d = (d_k, d_{k-1}, ..., d_0) (d_k is the most significant bit)
Output: c = a^d in G
  1. c = a
  2. For i = k-1 down to 0
  3.   c = c^2
  4.   If d_i = 1 then c = c·a
  5. Return c
The time or the power needed to execute c^2 and c·a are
different (side channel information).
Algorithm: Coron's exponentiation
Input: a in G, exponent d = (d_k, d_{k-1}, ..., d_0)
Output: c = a^d in G
  1. c_0 = 1
  2. For i = k-1 down to 0
  3.   c_0 = c_0^2
  4.   c_1 = c_0·a
  5.   c_0 = c_{d_i}
  6. Return c_0
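Both algorithms can be sketched for the group of integers modulo n. In this illustration the second function starts from 1 and processes every exponent bit, including the most significant one, which is an assumption about the intended loop bounds; Python integer arithmetic is not constant-time, so the sketch only shows the sequence of operations, not a hardened implementation. Function names are illustrative.

```python
def binary_exp(a: int, d: int, n: int) -> int:
    """MSB-first square-and-multiply; the multiply happens only for 1 bits (d >= 1)."""
    bits = bin(d)[2:]
    c = a % n                      # the leading 1 bit is consumed by c = a
    for bit in bits[1:]:
        c = (c * c) % n
        if bit == "1":
            c = (c * a) % n
    return c

def square_and_multiply_always(a: int, d: int, n: int) -> int:
    """Coron-style variant: a square and a multiply are executed for every bit."""
    c = [1 % n, 0]
    for bit in bin(d)[2:]:
        c[0] = (c[0] * c[0]) % n
        c[1] = (c[0] * a) % n      # always computed, used only when the bit is 1
        c[0] = c[int(bit)]
    return c[0]

print(binary_exp(7, 250, 1009), square_and_multiply_always(7, 250, 1009))
```

In the second variant the operation sequence no longer depends on the exponent bits, at the cost of one extra multiplication for every 0 bit.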
71
Mod. Exponentiation LSB-First Binary
  • Let k be the number of bits of e.
  • Input: M, e, n.
  • Output: C = M^e mod n
  • 1. R = 1, C = M
  • 2. For i = 0 to k-1
  • 3.   If e_i = 1 then R = R·C mod n
  • 4.   C = C^2 mod n
  • 5. Return R

72
Modular Exponentiation: LSB-First Binary
  • Example: e = 250 = (11111010), thus k = 8

i  e_i  Step 3 (R)              Step 4 (C)
0  0    1                       M^2
1  1    1 · M^2 = M^2           (M^2)^2 = M^4
2  0    M^2                     (M^4)^2 = M^8
3  1    M^2 · M^8 = M^10        (M^8)^2 = M^16
4  1    M^10 · M^16 = M^26      (M^16)^2 = M^32
5  1    M^26 · M^32 = M^58      (M^32)^2 = M^64
6  1    M^58 · M^64 = M^122     (M^64)^2 = M^128
7  1    M^122 · M^128 = M^250   (M^128)^2 = M^256
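The LSB-first method and the example for e = 250 can be reproduced with a short sketch. The first function performs the modular computation; the second tracks only the exponents held in R and C, mirroring the two result columns of the example. Function names are illustrative.

```python
def lsb_first_exp(m: int, e: int, n: int) -> int:
    """LSB-first binary exponentiation: returns m^e mod n."""
    r, c = 1 % n, m % n
    for i in range(e.bit_length()):
        if (e >> i) & 1:
            r = (r * c) % n        # step 3: multiply when bit e_i is 1
        c = (c * c) % n            # step 4: square on every iteration
    return r

def exponent_trace(e: int):
    """Rows (i, e_i, exponent in R, exponent in C) after each iteration."""
    r_exp, c_exp = 0, 1
    rows = []
    for i in range(e.bit_length()):
        bit = (e >> i) & 1
        if bit:
            r_exp += c_exp
        c_exp *= 2
        rows.append((i, bit, r_exp, c_exp))
    return rows

for row in exponent_trace(250):
    print(row)
print(lsb_first_exp(7, 250, 1009) == pow(7, 250, 1009))
```

Steps 3 and 4 of the same iteration both read the old value of C and write different registers, so a hardware implementation can run the multiplier and the squarer side by side.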
73
Modular Exponentiation LSB First Binary
  • The LSB-first binary method requires
  • Squarings: k-1
  • Multiplications: the number of 1s in the binary
    expansion of e, excluding the MSB.
  • The total number of operations
  • Maximum: (k-1) + (k-1) = 2(k-1)
  • Minimum: (k-1) + 0 = k-1
  • Average: (k-1) + 1/2 (k-1) = 1.5(k-1)
  • Same as before, but here we can compute the
    multiplication in parallel with the squaring!
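These counts can be checked with a small sketch that applies the formulas above to a given exponent and averages the multiplication count over all k-bit exponents. The function name is illustrative.

```python
def lsb_first_op_counts(e: int):
    """Return (squarings, multiplications) following the counts above for exponent e >= 1."""
    k = e.bit_length()
    squarings = k - 1
    multiplications = bin(e).count("1") - 1   # 1 bits, excluding the MSB
    return squarings, multiplications

print(lsb_first_op_counts(250))

# Average number of multiplications over all 8-bit exponents (MSB set).
k = 8
exponents = range(2 ** (k - 1), 2 ** k)
average_mults = sum(lsb_first_op_counts(e)[1] for e in exponents) / len(exponents)
print(average_mults, 0.5 * (k - 1))
```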

74
Multiplier Architecture (Mario García et al., ENC03)
75
Development (q-ary Method)
76
Development (q-ary Method)
  • Precomputation of W.
  • Size of q.
  • Computation of d 2p q

77
Development (Analysis)
  • Memory size and execution time of the
    precomputation of W.
  • Number of multiplications and squarings
    for the q-ary method.

78
Execution Time vs. Number of Processors
79
Memory Size
80
First Layer: Field Multiplication
  • Preliminary results yield a time delay of 50-70
    µSec and about 9K slices of hardware resource
    utilization.