Title: An Improved AES Design
1An Improved AES Design
Chia-Lung Horng Cheng-Wen Wu
- Laboratory of Reliable Computing
- Department of Electrical Engineering
- National Tsing Hua University
- Hsinchu, Taiwan
2Outline
- Introduction
- AES algorithm
- On-the-fly key scheduler
- Architecture of improved AES
- Experimental results
- Conclusions and future work
3Outline
- Introduction
- A brief overview of AES development
- Previous works
- Proposed efficient AES design
- AES algorithm
4Introduction (1)
- With the growth of wireless communication and the Internet, network security has become increasingly important.
- A conception of network security
- Ex: symmetric-key and asymmetric-key cryptography.
5Introduction (2)
- Symmetric-key cryptography is also called private-key cryptography.
- It uses substitution and permutation to achieve confusion and diffusion.
- So, a set of operations is executed iteratively in the symmetric-key cryptosystem.
- It is simple to implement and has a high throughput.
- DES, 3DES, and AES are examples of symmetric-key ciphers.
6History of AES Development
- 1977: Data Encryption Standard (DES) was published by NBS
- 1997: NIST announced the initiation of the AES development
- 1999: Five algorithms (MARS, RC6, Rijndael, Serpent, and Twofish) were selected by NIST
- 2000: NIST announced that it had selected Rijndael to propose for the AES
- 2001: The final AES standard (FIPS PUB 197) was published by NIST
7Previous Works
- Because high throughput is needed, software implementation is not enough.
- Many papers on the AES algorithm have been published.
- They focused mainly on the implementation of the SubByte() transformation (the non-linear operation).
- They assumed the Round Keys were stored before encryption and decryption.
8Proposed Efficient AES Design
- We present a more efficient hardware design for the AES algorithm.
- By loop unrolling and scheduling, the architecture is modified to reduce the critical path.
- An on-the-fly key scheduler is used instead of a storage-based key scheduler to reduce power dissipation.
9Outline
- Introduction
- AES algorithm
- Algorithm specification
- Cipher
- Key schedule
- Inverse cipher
- On-the-fly key scheduler
10AES Specification (1)
- The length of the input block and the output block is 128 bits.
- The key length can be 128, 192, or 256 bits, with 10, 12, or 14 rounds respectively.
11AES Specification (2)
- Using a round function for both its Cipher and Inverse Cipher.
- A round function is composed of four different byte-oriented transformations
- Non-linear transformation
- byte substitution (SubByte())
- Linear transformation
- shifting rows of the State array (ShiftRow())
- mixing the data within each column (MixColumn())
- adding a Round Key to the State (AddRoundKey())
12Cipher
13SubByte() (1)
- Byte substitution using a substitution table (S-Box).
- The S-Box, which is invertible, is composed of two components
- a multiplicative inverse in $GF(2^8)$ with irreducible polynomial $m(x) = x^8 + x^4 + x^3 + x + 1$ (hex 11B)
- a multiplication with the constant 1F (hex) over $GF(2)$, modulo $x^8 + 1$, followed by an XOR with 63 (hex)
14SubByte() (2)
Ex: if the input byte xy is 66 (hex), the output of SubByte() is 33 (hex).
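The two S-Box components described above can be sketched in Python. This is a minimal illustration, not the hardware implementation of the design; the function names `gf_mul`, `gf_inv`, `affine`, and `sub_byte` are chosen here for illustration. The inverse is computed by brute-force exponentiation for clarity only.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # degree 8 reached: reduce by m(x)
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    # The multiplicative group has order 255, so a^254 = a^(-1).
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(b: int) -> int:
    """Multiply by 0x1F modulo x^8 + 1 over GF(2), then XOR with 0x63."""
    out = 0
    for shift in range(5):  # 0x1F = x^4 + x^3 + x^2 + x + 1
        # Multiplying by x^shift modulo x^8 + 1 is a left rotation by `shift` bits.
        out ^= ((b << shift) | (b >> (8 - shift))) & 0xFF
    return out ^ 0x63


def sub_byte(b: int) -> int:
    """SubByte() on a single byte: inverse in GF(2^8), then affine transform."""
    return affine(gf_inv(b))


if __name__ == "__main__":
    print(f"{sub_byte(0x66):02x}")  # the slide's example input
```

In hardware, a 256-entry lookup table or a composite-field inverter would normally replace the exponentiation loop.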
15ShiftRow()
- Shifting each row of the State cyclically by a different offset.
- The offset is a function of the row number r and Nb, f(r, Nb).
- ShiftRow() cyclically shifts the last three rows of the State.
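As a small illustration of the row shifting, the sketch below assumes the AES case Nb = 4, where row r is rotated left by r bytes. The State is modelled as a list of four rows; `shift_rows` and `inv_shift_rows` are illustrative names.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 State left by r byte positions (Nb = 4)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r right by r byte positions, undoing shift_rows."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(state)]


if __name__ == "__main__":
    state = [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15],
    ]
    for row in shift_rows(state):
        print(row)
```

The first row is left unchanged, so only the last three rows move.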
16MixColumn() (1)
- A linear operation on each column (32-bit word) of the State.
- Each column is viewed as a polynomial over $GF(2^8)$ and multiplied modulo $x^4 + 1$ with a fixed polynomial $k(x)$.
- If the input is $[S_3, S_2, S_1, S_0]^T$, the output of MixColumn() is $[S'_3, S'_2, S'_1, S'_0]^T$
- $S(x) = S_3 x^3 + S_2 x^2 + S_1 x + S_0$
- $k(x) = \{03\} x^3 + \{01\} x^2 + \{01\} x + \{02\}$
- $S'(x) = S'_3 x^3 + S'_2 x^2 + S'_1 x + S'_0$
- $S'(x) = S(x) \otimes k(x) \bmod (x^4 + 1)$ over $GF(2^8)$
17MixColumn() (2)
- Written in matrix form
- $k(x)$ is coprime to $x^4 + 1$ and so is invertible.
- Ex: $S(x) = S'(x) \otimes k^{-1}(x) \bmod (x^4 + 1)$
- $k^{-1}(x) = \{0B\} x^3 + \{0D\} x^2 + \{09\} x + \{0E\}$
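The polynomial view of MixColumn() and its inverse can be checked with a short Python sketch. It is only an illustration under the definitions above; `gf_mul`, `poly_mul_mod`, `K`, and `K_INV` are names introduced here, and coefficients are listed lowest degree first, i.e. `[S0, S1, S2, S3]`.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return result


def poly_mul_mod(s, k):
    """Multiply two degree-3 polynomials over GF(2^8) modulo x^4 + 1."""
    out = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # x^4 = 1 modulo x^4 + 1, so exponents wrap around modulo 4.
            out[(i + j) % 4] ^= gf_mul(s[i], k[j])
    return out


K = [0x02, 0x01, 0x01, 0x03]      # k(x) = 03 x^3 + 01 x^2 + 01 x + 02
K_INV = [0x0E, 0x09, 0x0D, 0x0B]  # k^-1(x) = 0B x^3 + 0D x^2 + 09 x + 0E

if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    mixed = poly_mul_mod(column, K)
    print([f"{b:02x}" for b in mixed])
    print(poly_mul_mod(mixed, K_INV) == column)
```

Because the coefficients of $k(x)$ are only 01, 02, and 03, the forward direction needs just shifts and XORs in hardware, while the inverse uses the larger constants 09, 0B, 0D, and 0E.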
18AddRoundKey()
- Adding the Round Key to the State over GF(2).
- AddRoundKey() is its own inverse.
19Key Schedule
- Generating the round key from the cipher key.
- A total of Nb(Nr+1) round-key words are generated.
- Key schedule consists of two components.
- Key expansion
- generates the expanded key from the cipher key.
- Round key selection
- decides the round key from the expanded key.
20Pseudo Code of Key Expansion
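A minimal Python sketch of key expansion, following the KeyExpansion routine of FIPS PUB 197. It is an illustration, not necessarily identical to the pseudo code of this slide; the names `gf_mul`, `SBOX`, `rot_word`, `sub_word`, and `key_expansion` are introduced here, and the S-Box is built by brute force for brevity.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def _sbox_entry(x):
    # Inverse in GF(2^8) by exhaustive search (0 maps to 0), then the affine step.
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    out = 0x63
    for s in range(5):
        out ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return out


SBOX = [_sbox_entry(x) for x in range(256)]


def rot_word(w):
    """Rotate a 32-bit word left by one byte."""
    return ((w << 8) & 0xFFFFFFFF) | (w >> 24)


def sub_word(w):
    """Apply the S-Box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def key_expansion(key: bytes):
    """Return the Nb*(Nr+1) expanded-key words for a 16-, 24-, or 32-byte key."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192, or 256 bits")
    nk = len(key) // 4
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 2)   # next round constant
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)    # extra SubWord for 256-bit keys only
        w.append(w[i - nk] ^ temp)
    return w


if __name__ == "__main__":
    words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
    print(len(words), f"{words[4]:08x}")
```

Round key selection then takes four consecutive words per round: round i uses `w[4*i : 4*i + 4]`.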
21Inverse Cipher
22Outline
- AES algorithm
- On-the-fly key scheduler
- Key scheduling for different key length
- The efficient 3-in-1 construction
- Architecture of improved AES
23On-the-Fly Key Scheduler
- Unlike the pre-computation method, it does not need storage elements (e.g., RAM, registers).
- It consumes less power than a storage-based key scheduler.
- The round key is generated when it is needed (when that round function is executed).
- A 128-bit round key is generated in each round function.
- We propose the efficient 3-in-1 construction.
24Key Scheduling for 128-bit Key
- The key scheduling of 128-bit key for encryption.
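For the 128-bit case, the on-the-fly idea can be sketched in Python: each round key is derived from the previous one, and the step can also be run backwards from the final round key, so no expanded-key storage is needed. This is a software illustration under FIPS PUB 197 definitions, not the 3-in-1 datapath of this design; all function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def _sbox_entry(x):
    # Brute-force inverse (0 maps to 0) followed by the affine transform.
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    out = 0x63
    for s in range(5):
        out ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return out


SBOX = [_sbox_entry(x) for x in range(256)]
RCON = [1]
for _ in range(9):
    RCON.append(gf_mul(RCON[-1], 2))  # ten round constants for a 128-bit key


def g(word, rcon):
    """RotWord, SubWord, then XOR the round constant into the top byte."""
    rot = ((word << 8) & 0xFFFFFFFF) | (word >> 24)
    sub = int.from_bytes(bytes(SBOX[b] for b in rot.to_bytes(4, "big")), "big")
    return sub ^ (rcon << 24)


def next_round_key(rk, rcon):
    """Derive round key i+1 from round key i (four 32-bit words)."""
    w0 = rk[0] ^ g(rk[3], rcon)
    w1 = rk[1] ^ w0
    w2 = rk[2] ^ w1
    w3 = rk[3] ^ w2
    return [w0, w1, w2, w3]


def prev_round_key(rk, rcon):
    """Recover round key i from round key i+1 (used for decryption)."""
    w3 = rk[3] ^ rk[2]
    w2 = rk[2] ^ rk[1]
    w1 = rk[1] ^ rk[0]
    w0 = rk[0] ^ g(w3, rcon)
    return [w0, w1, w2, w3]


def forward_round_keys(cipher_key):
    """Yield the 11 round keys in encryption order, one per round."""
    rk = list(cipher_key)
    yield rk
    for rcon in RCON:
        rk = next_round_key(rk, rcon)
        yield rk


def backward_round_keys(final_key):
    """Yield the 11 round keys in decryption order, starting from the final key."""
    rk = list(final_key)
    yield rk
    for rcon in reversed(RCON):
        rk = prev_round_key(rk, rcon)
        yield rk
```

Only one round key (128 bits) is live at any time, which is the property that removes the need for a key RAM. The backward generator needs the final round key as its starting point.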
25Key Scheduling for 192-bit Key
- The key scheduling of 192-bit key for encryption.
26Rearrangement for 192-bit Key
27Key Scheduling for 256-bit Key
- The key scheduling of 256-bit key for encryption.
28Rearrangement for 256-bit Key
29The Efficient 3-in-1 Construction
- One architecture supports key scheduling for the different key lengths.
- By properly shuffling the expanded key, one 128-bit round key is generated at a time.
- We assume that all operations in a round function are executed in one clock cycle.
- The complex operations, like SubWord(), are processed separately to reduce the critical path.
30The Data Path of The Construction
31Outline
- On-the-fly key scheduler
- Architecture of improved AES
- Design consideration
- Hardware architecture
- Low power consideration
- Experimental results
32Design Specification
- Supporting encryption and decryption.
- Supporting different key lengths.
- Supporting different operating modes.
- Such as ECB, CBC, and CTR modes
- Using an AMBA interface for easy integration into an SoC.
- Higher performance, smaller area, and lower power are expected. (Of course, it is impossible to achieve all of these at once.)
33Design Consideration
- According to published papers, the composite field is a better way to compute the multiplicative inverse.
- Instead of storage elements, the on-the-fly key scheduler is used.
- By loop unrolling and scheduling, a modified round function is proposed.
- For simplicity, a round function is separated into two components
- Non-linear part
- Linear part
34Pseudo Code of Modified Cipher
35Hardware Architecture
- Our AES design includes three main parts.
- Controller
- I/O control, main control, and key control
- Datapath
- En/De datapath, and key scheduler
- Storage elements
- input/output buffer, keys, and initial vectors
36The Architecture of Our AES Design
37FSM of Main Controller
- The key schedule is run to generate the final key before any encryption or decryption.
- No idle cycles between consecutive encryptions and decryptions.
38The AES Datapath
- Mapping before and after the round function.
- The round function includes non-linear and linear
parts.
39Low Power Issue
- Mainly, reduce unnecessary power dissipation.
- Minimizing the number of operations.
- Resource sharing
- Reducing the operation complexity
- Minimizing the transition activity.
- Using encoding techniques (e.g., Gray code, one-hot code) to reduce control signal transitions
- Using an LFSR instead of a counter
- Balancing different fan-in paths, and holding signals when idle to avoid unnecessary transition activity
- Using clock gating to reduce clock power.
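The effect of state encoding on transition activity can be illustrated with a small Python sketch. It compares a 4-bit binary counter with a Gray-code sequence and steps a 4-bit LFSR. The 4-bit width and the feedback polynomial $x^4 + x^3 + 1$ are assumptions made for the example, not parameters of this design.

```python
def bit_flips(a: int, b: int) -> int:
    """Number of bit positions that toggle when the state goes from a to b."""
    return bin(a ^ b).count("1")


def total_flips(seq):
    """Total toggles over one full cycle of the sequence, including wrap-around."""
    n = len(seq)
    return sum(bit_flips(seq[i], seq[(i + 1) % n]) for i in range(n))


def lfsr_states(seed: int = 1):
    """4-bit Fibonacci LFSR with feedback from bits 3 and 2 (x^4 + x^3 + 1)."""
    states, state = [], seed
    while True:
        states.append(state)
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF
        if state == seed:
            return states


BINARY = list(range(16))
GRAY = [n ^ (n >> 1) for n in range(16)]  # adjacent codes differ in one bit

if __name__ == "__main__":
    print("binary counter toggles per cycle:", total_flips(BINARY))
    print("Gray code toggles per cycle:     ", total_flips(GRAY))
    print("LFSR period:", len(lfsr_states()))
```

A Gray-coded state register toggles exactly one bit per step, while the LFSR replaces the counter's carry chain with a single XOR gate at the cost of skipping the all-zero state.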
40Outline
- Architecture of improved AES
- Experimental results
- simulation and verification
- result analysis
- Conclusions and future work
41Simulation and Verification
- Simulation
- RTL simulation and gate-level simulation are done
- The simulation results are correct
- Verification
- FPGA verification
- Not complete
42Implementation Results (1)
- Process technology: UMC 0.18 µm
- Total cell area: 665.7K (gate count: 67.9K)
- Clock rate: 125 MHz under the worst-case condition
- Throughput
- 1.6 Gb/s for 128-bit keys
- 1.33 Gb/s for 192-bit keys
- 1.14 Gb/s for 256-bit keys
- Fault coverage
- 98.02% using 238 scan test patterns
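The three throughput figures are consistent with one round per clock cycle, so that a 128-bit block takes $N_r$ cycles (10, 12, or 14). The worked arithmetic below rests on that assumption and ignores any I/O or key-setup overhead:

```latex
\begin{align*}
\text{throughput} &= \frac{128 \times f_{\text{clk}}}{N_r},
  \qquad f_{\text{clk}} = 125\ \text{MHz} \\
N_r = 10:\quad & \frac{128 \times 125 \times 10^{6}}{10} = 1.6 \times 10^{9}\ \text{b/s} \\
N_r = 12:\quad & \frac{128 \times 125 \times 10^{6}}{12} \approx 1.33 \times 10^{9}\ \text{b/s} \\
N_r = 14:\quad & \frac{128 \times 125 \times 10^{6}}{14} \approx 1.14 \times 10^{9}\ \text{b/s}
\end{align*}
```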
43Implementation Results (2)
- Power analysis under the worst-case condition (fast type), at a frequency of 111 MHz.
- Using the PrimePower tool.
44Implementation Results (3)
45Outline
- Experimental results
- Conclusions and future work
- conclusions
- future work
46Conclusions
- We present an improved implementation of the AES algorithm.
- By loop unrolling and scheduling, the critical path is reduced.
- An on-the-fly key scheduler is used to reduce power dissipation.
- The maximum throughput is about 1.6 Gb/s and the area is about 67.9K gates.
47Future Work
- High throughput and low power are the trend in future security applications.
- The datapath can be pipelined to increase the throughput.
- Our design is suitable as an IP core and can be integrated into other chips.