Dynamic High-Performance Multi-Mode Architectures for AES Encryption

Provided by: klabsOrgma9
Learn more at: http://klabs.org

1
Dynamic High-Performance Multi-Mode Architectures
for AES Encryption
  • Eric Swankoski
  • Naval Research Lab
  • Vijay Narayanan
  • Penn State University

2
Background & Motivation
  • Bandwidth and throughput capabilities of modern
    optical networks are skyrocketing
  • Protecting transmitted data is becoming more and
    more critical
  • Current encryption architectures generally aren't
    capable of keeping up with high-speed
    environments
  • Single-event upset (SEU) effects are rarely, if
    ever, considered

3
Plan of Attack: FPGA Encryption
  • Algorithm: Advanced Encryption Standard (AES)
      • Supports multiple key lengths
      • Supports multiple encryption modes
      • Supports multiple levels of pipelining
  • Target Architecture: Xilinx FPGAs
      • Can be adapted to ASIC devices
      • Virtex-II, Virtex-4
  • Target Performance: 60 gigabits per second
      • Requires both inner-round and outer-round
        pipelining

4
The AES Algorithm
  • 10 rounds of encryption for 128-bit keys and
    128-bit data blocks
  • Four basic operations
      • SubBytes
          • 8-bit substitution (16 parallel operations per
            round)
      • ShiftRows
          • Byte reordering and rotation (4 parallel
            operations per round)
      • MixColumns
          • Polynomial multiplication (4 parallel operations
            per round)
      • AddRoundKey
          • Simple 128-bit XOR
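
The three non-substitution operations listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the column-major 16-byte state layout of the AES standard; the function names are chosen here for illustration and are not taken from the presentation.

```python
# Sketch of ShiftRows, MixColumns and AddRoundKey on a 16-byte state.
# Assumption: state[r + 4*c] holds row r, column c (column-major layout).

def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def shift_rows(state):
    """Rotate row r of the 4x4 state left by r byte positions."""
    return bytes(state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4))

def mix_single_column(col):
    """Multiply one 4-byte column by the fixed MixColumns matrix."""
    a0, a1, a2, a3 = col
    return bytes([
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ])

def mix_columns(state):
    """Apply the column transform to each of the four columns independently."""
    return b"".join(mix_single_column(state[4 * c:4 * c + 4]) for c in range(4))

def add_round_key(state, round_key):
    """XOR the 128-bit state with the 128-bit round key."""
    return bytes(s ^ k for s, k in zip(state, round_key))

# A widely published MixColumns test column.
print(mix_single_column(bytes([0xDB, 0x13, 0x53, 0x45])).hex())  # 8e4da1bc
```

Each column (and each row rotation) is computed without reference to the others, which is the parallelism the slide counts as 4 operations per round.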

5
Optimizing for Performance
  • Exploit all possible parallelism
  • Alternative byte substitution methods
      • 1 cycle for a lookup-based substitution
      • 5 cycles for a mathematical transformation
  • Utilize pipelining
      • Outer-round: 1 cycle per round
      • Inner-round:
          • 4 cycles per round (lookup-based byte
            substitution)
          • 8 cycles per round (pipelined byte substitution)

6
Combinatorial Byte Substitution
  • Actual mathematical transformation
  • Conventional implementation cannot be pipelined
      • Simple (atomic) 8x8 lookup table
  • Smaller than the lookup table
  • Faster than the lookup table
  • Utilizes a five-stage pipeline
  • All internal operands are four bits wide
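
To make the contrast between the two substitution methods concrete, the sketch below computes the AES S-box mathematically (multiplicative inverse in GF(2^8) followed by the standard affine transform) and then freezes the result into a 256-entry lookup table. It is a software illustration only: it uses plain GF(2^8) arithmetic, not the four-bit-operand decomposition the slide describes, and the function names are assumptions made for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result

def gf_inverse(a):
    """Multiplicative inverse computed as a^254; 0 maps to 0 by AES convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox_math(byte):
    """S-box as a computation: field inverse, then the affine transform."""
    inv = gf_inverse(byte)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

# S-box as an "atomic" lookup: one table access per byte.
SBOX_TABLE = [sbox_math(b) for b in range(256)]

def sbox_lookup(byte):
    return SBOX_TABLE[byte]

print(hex(sbox_math(0x00)), hex(sbox_math(0x53)))  # 0x63 0xed
```

The table access is a single indivisible step, whereas the computed form breaks into smaller arithmetic steps that registers can be placed between, which is what makes a pipelined hardware version possible.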

7
Encryption Round Diagram
  • Atomic S-Box
      • 40 Pipeline Stages
  • Combinatorial S-Box
      • 76 Pipeline Stages
      • Needs a constant stream to be effective
  • Parallel Key Scheduling
      • No performance penalty
  • Offline Key Scheduling
      • Precomputed keys can be stored in registers

8
Counter (CTR) Mode
  • Effectively converts AES into a stream cipher
  • High security, similar to CBC
  • Supports inner-round and outer-round pipelining
  • No error propagation: errors are completely
    isolated
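
The stream-cipher behaviour and the error isolation claimed above can be sketched as follows. Since the Python standard library has no AES, the block function here is a stand-in built from SHA-256; that substitution, the key, and the nonce layout (8-byte nonce plus 8-byte counter) are all assumptions made for illustration.

```python
import hashlib

BLOCK = 16  # bytes per block (128 bits)

def block_function(key, block):
    """Stand-in for AES block encryption: a keyed 16-byte PRF, NOT real AES."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def keystream_block(key, nonce, index):
    """Encrypt nonce || counter; each call is independent of every other block."""
    return block_function(key, nonce + index.to_bytes(8, "big"))

def ctr_xor(key, nonce, data):
    """XOR data with the keystream; the same routine encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, offset // BLOCK)
        out.extend(d ^ k for d, k in zip(data[offset:offset + BLOCK], ks))
    return bytes(out)

key = b"demo key 0123456"      # hypothetical key
nonce = b"\x00" * 8            # hypothetical 8-byte nonce
plaintext = bytes(range(64))   # four 16-byte blocks

ciphertext = ctr_xor(key, nonce, plaintext)

# Model a single-bit upset in ciphertext byte 20 (inside block 1).
upset = bytearray(ciphertext)
upset[20] ^= 0x04
recovered = ctr_xor(key, nonce, bytes(upset))

damaged = [i for i in range(len(plaintext)) if recovered[i] != plaintext[i]]
print(damaged)  # [20]
```

Because every keystream block depends only on the counter value, successive counters can be fed into a pipeline back to back, and a flipped ciphertext bit disturbs only the matching plaintext bit.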

9
Cipher Block Chaining (CBC) Mode
  • Most secure: no patterns are observed
  • Cannot be pipelined
  • 100% downstream corruption resulting from data
    loss or single-event upsets (SEUs) during
    encryption
  • Errors are isolated during decryption
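
The chaining dependency behind these points can be illustrated with a small sketch. CBC decryption needs an invertible block cipher, so the example uses a toy four-round Feistel network built on SHA-256 as a stand-in for AES; the toy cipher, the key, and the way the upset is modelled (one flipped bit at the input of block 2) are assumptions for illustration only.

```python
import hashlib

BLOCK, HALF, ROUNDS = 16, 8, 4

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, i, half):
    """Feistel round function (toy): keyed hash truncated to half a block."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]

def toy_encrypt(key, block):
    """Toy invertible 128-bit block cipher, NOT AES and not secure."""
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def toy_decrypt(key, block):
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key, iv, blocks):
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt(key, _xor(p, prev))  # needs the previous ciphertext
        out.append(prev)
    return out

def cbc_decrypt(key, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(_xor(toy_decrypt(key, c), prev))
        prev = c
    return out

key, iv = b"demo key", bytes(BLOCK)          # hypothetical key and IV
plain = [bytes([i]) * BLOCK for i in range(8)]
cipher = cbc_encrypt(key, iv, plain)
flip = b"\x01" + bytes(BLOCK - 1)            # single-bit error pattern

# Upset on the encryption side: the input of block 2 is disturbed.
bad_plain = list(plain)
bad_plain[2] = _xor(plain[2], flip)
bad_cipher = cbc_encrypt(key, iv, bad_plain)
changed_cipher = [i for i in range(8) if bad_cipher[i] != cipher[i]]

# Upset after encryption: ciphertext block 2 is disturbed before decryption.
hit = list(cipher)
hit[2] = _xor(cipher[2], flip)
decoded = cbc_decrypt(key, iv, hit)
changed_plain = [i for i in range(8) if decoded[i] != plain[i]]

print(changed_cipher)  # [2, 3, 4, 5, 6, 7]
print(changed_plain)   # [2, 3]
```

The encryption loop cannot start block n until block n-1 has finished, which is why the mode resists pipelining, and the same dependency carries an encryption-side disturbance into every later ciphertext block. On the decryption side the damage stays confined to the hit block and its successor.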

10
Electronic Codebook (ECB) Mode
  • Supports full pipelining
  • No error propagation: errors are completely
    isolated
  • Least secure: identical input gives identical
    output
  • Patterns observable in video and image data
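
A short sketch shows why ECB leaks patterns. The block function is again a SHA-256-based stand-in rather than AES (an assumption for illustration; only the forward direction is needed here), and the key and sample data are hypothetical.

```python
import hashlib

BLOCK = 16  # bytes per block (128 bits)

def block_function(key, block):
    """Stand-in for AES block encryption: deterministic and keyed, NOT real AES."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ecb_encrypt(key, data):
    """Encrypt every block independently with the same key."""
    return [block_function(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

key = b"demo key"                         # hypothetical key
flat_region = b"\xFF" * BLOCK             # e.g. a run of identical pixels
detail = bytes(range(BLOCK))
image = flat_region * 3 + detail + flat_region

cipher_blocks = ecb_encrypt(key, image)

# Identical plaintext blocks produce identical ciphertext blocks.
print(cipher_blocks[0] == cipher_blocks[1] == cipher_blocks[2] == cipher_blocks[4])  # True
print(len(set(cipher_blocks)))
```

The independence of the blocks is what allows full pipelining and isolates errors, and it is also what lets large uniform areas of an image show through the ciphertext.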

11
Staggered CBC Mode
  • Pipelined with Output Feedback
  • Each encrypted block n depends on itself and on
    block (n - x), where x is the latency of the
    pipeline
  • Maintains security while mitigating some error
    propagation problems
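
One way to read the staggered scheme is as x interleaved CBC chains, where block n is chained to ciphertext block n - x instead of n - 1. The sketch below follows that reading, which is an interpretation of the slide rather than the authors' documented design. The block function is a SHA-256-based stand-in for AES, and the use of one IV per chain is likewise an assumption.

```python
import hashlib

BLOCK = 16  # bytes per block (128 bits)

def block_function(key, block):
    """Stand-in for AES block encryption: a keyed 16-byte PRF, NOT real AES."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor_bytes(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def staggered_cbc_encrypt(key, ivs, blocks):
    """Chain block n to ciphertext block n - x, where x = len(ivs)."""
    x = len(ivs)  # assumed equal to the pipeline latency in blocks
    out = []
    for n, p in enumerate(blocks):
        feedback = ivs[n] if n < x else out[n - x]
        out.append(block_function(key, xor_bytes(p, feedback)))
    return out

key = b"demo key"                                      # hypothetical key
x = 4                                                  # hypothetical latency
ivs = [bytes([0xA0 + i]) * BLOCK for i in range(x)]    # one IV per chain
plain = [bytes([i]) * BLOCK for i in range(12)]
cipher = staggered_cbc_encrypt(key, ivs, plain)

# Disturb the input of block 1 and see which ciphertext blocks change.
bad_plain = list(plain)
bad_plain[1] = xor_bytes(plain[1], b"\x01" + bytes(BLOCK - 1))
bad_cipher = staggered_cbc_encrypt(key, ivs, bad_plain)
changed = [n for n in range(len(plain)) if bad_cipher[n] != cipher[n]]
print(changed)
```

By the time block n enters the pipeline, block n - x has just left it, so the feedback value is always available and the pipeline stays full. A disturbance also stays within its own chain (every x-th block) instead of reaching every later block.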

12
More Challenges
  • Error-Tolerant Encryption
  • Maintaining High Security
  • Maintaining High Performance

13
Error-Tolerant Encryption
  • Are errors acceptable?
  • Possibly, but better to assume not
  • How do the multiple modes of encryption deal with
    upsets?
  • Is there a benefit to triple modular redundancy
    (TMR)?
  • Is it what we expect?

14
Error-Tolerant Encryption
  • CTR and ECB encryption isolate errors
  • Transmission integrity largely preserved even
    without SEU mitigation
  • TMR can ensure 100% transmission integrity
  • TMR REQUIRED for CBC encryption
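
The voting step at the heart of TMR is small enough to sketch directly. The example below models three copies of a 128-bit value as Python integers and applies a bitwise majority vote; the variable names and the upset positions are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)

good = 0x00112233445566778899AABBCCDDEEFF   # value all three copies should hold

# One copy suffers a single-bit upset: the vote masks it.
single = majority_vote(good, good ^ (1 << 17), good)

# Two copies are upset in different bit positions: still masked.
two_distinct = majority_vote(good ^ (1 << 3), good ^ (1 << 90), good)

# Two copies are upset in the same bit position: the vote is wrong.
two_same = majority_vote(good ^ (1 << 5), good ^ (1 << 5), good)

print(single == good, two_distinct == good, two_same == good)  # True True False
```

For CBC this masking matters most, because the voted value is what gets fed back into the chain.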

15
Error-Tolerant Encryption
  • Image 1: Error-Free Plaintext Image
      • Before Encryption / After Decryption
      • CTR, ECB, or CBC with mitigation
  • Image 2: Decrypted Plaintext Image
      • One corrupted block
      • CTR or ECB without mitigation
  • Image 3: Decrypted Plaintext Image
      • One block corrupted during encryption
      • CBC without mitigation

16
Maintaining High Security
  • How do the multiple modes of encryption affect
    security?
  • Is physical protection of the key necessary?
  • Depends on the environment
  • How is throughput affected by increased security?
  • Hopefully, not at all

17
Maintaining High Security
  • ECB-encrypted image has observable patterns
  • CTR/CBC/SCBC encryption looks like random noise

18
Maintaining High Security
  • Physical Key Protection
      • Not required in aerospace applications
  • Power Analysis / Soft Attacks
      • Countermeasures are not mode-specific
  • Throughput Effects
      • ECB and CTR far outperform CBC
      • Why is CBC an official mode?

19
System-Level Diagram
  • Supports ECB, CTR, CBC, and SCBC modes
  • Supports two types of TMR
      • System: triplicates all control, key hardware,
        and mode logic
      • Encryption: triplicates only encryption and key
        scheduling hardware

20
Performance Results: Virtex-4
Byte Substitution  Key Scheduling  Area   Frequency  Throughput (CTR, ECB, SCBC)  Throughput (CBC)
ROM                Online          3588   339.5 MHz  43.5 Gbps                    1.088 Gbps
ROM                Offline         2827   446.8 MHz  57.2 Gbps                    1.430 Gbps
Combinatorial      Online          13651  519.2 MHz  66.5 Gbps                    700.0 Mbps
Combinatorial      Offline         10912  519.2 MHz  66.5 Gbps                    700.0 Mbps
  • Key Scheduling
      • Offline uses precomputed and stored keys (compile
        or design time)
      • Online uses dynamically computed keys (run time)
  • Significant performance improvement for
    combinatorial byte substitution in pipelined mode
  • Virtex-II Pro performs better with ROM
    implementation (56.42 to 60.35 Gbps)
  • Better CBC performance achieved through other
    architectures
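
The pipelined throughput figures in the table are consistent with one 128-bit block completing per clock cycle, which the sketch below checks. The CBC estimate assumes one block per full pass through a 40-stage pipeline (the stage count given for the atomic S-box design); that model is an assumption made here, and it reproduces the ROM rows only approximately.

```python
BLOCK_BITS = 128

def pipelined_gbps(freq_mhz):
    """Throughput with one block completed per clock cycle (full pipeline)."""
    return freq_mhz * 1e6 * BLOCK_BITS / 1e9

def feedback_gbps(freq_mhz, latency_cycles):
    """Throughput when each block must wait for the previous one to finish."""
    return pipelined_gbps(freq_mhz) / latency_cycles

# Clock frequencies in MHz, taken from the table.
for label, freq in [("ROM, online", 339.5), ("ROM, offline", 446.8), ("Combinatorial", 519.2)]:
    print(f"{label}: {pipelined_gbps(freq):.1f} Gbps pipelined")

# Assumed 40-cycle latency for the ROM-based design in CBC mode.
print(f"ROM, online, CBC: {feedback_gbps(339.5, 40):.3f} Gbps")
print(f"ROM, offline, CBC: {feedback_gbps(446.8, 40):.3f} Gbps")
```

The gap between the two columns follows from this division: a feedback mode leaves most pipeline stages idle at any moment, so raising the clock frequency with a deeper pipeline does not help CBC.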

21
Lessons Learned
  • Don't try to over-optimize FPGA code
      • Returns diminish quickly
      • Sometimes less is more
  • Know your synthesis tool
      • Now why did it do THAT?
  • Check your system's memory
      • RAM does fail at inopportune times
      • ESPECIALLY if it has a lifetime warranty

22
Lessons Learned
  • Over-optimization
      • In a highly pipelined FPGA design, routing plays
        a MAJOR role in the clock frequency
          • 70-80% of the total delay
      • What would work in an ASIC (or in theory, or on
        paper) might actually make things worse
      • Manual floorplanning and place-and-route (P&R)
        might help, but usually provide minimal (if any)
        improvement
  • Moral? Try reducing the pipeline depth as well
    as increasing it; it just might help!