Title: Design of a Reconfigurable Hardware
1Design of a Reconfigurable Hardware
- For Efficient Implementation of Secret Key and
Public Key Cryptography
2Presentation Outline
- Introduction and Motivation
- Related Work
- Design Methodology
- Design Description
- Algorithm Implementations
- Comparison with other Work
- Programming Paradigm
- Conclusion/Work in Progress
3Motivating Factors
- Need for high speed cryptography
- Need for algorithm independence
- Need for more secure implementations
- Need for implementing both Symmetric and
Asymmetric key encryption
4Need for High Speed Implementations
- Software implementations cannot provide real-time
rates
- Hardware implementations essential for
- IPSec end points
- SSL servers
- VPN at rates exceeding ATM
- Algorithm implementation must be able to sustain
the network bandwidth
5Need for Algorithm Independence
- IPSec
- Cipher Algorithm Specified in Security
Association (SA)
- SSL Transactions
- Algorithm Negotiable for both Key Exchange and
Encryption
- Need for Both Secret Key and Public Key
Encryption
- Session establishment
- Large Number of transactions
- Dedicated hardware not cheap!
6Hardware Implementation Benefits
- More secure implementations
- Implementing both algorithms in hardware removes the
bottleneck associated with slow computations in
key establishment
- A single hardware implementation supporting both
algorithms reduces the cost of separate hardware
7Advantages of Reconfigurable Hardware
Implementations
- Algorithm Agility
- Algorithm Upload/Modification
- Architecture Efficiency/Throughput
- Cost Efficiency
8Comparison of Different Approaches
9FPGAs?
- Post Fabrication Customization
- Low Cost Design Cycle
- Fast turnaround time
- Potential for Parallelism
- Instruction-level: multiple operations
- Data-level: multiple blocks of data
- Task-level: parallel tasks (e.g. secret key)
10FPGA: The Basics
- General purpose logic elements (LUTs)
- Very flexible interconnect
- Basically fine grained to support both data paths
and random logic
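As an illustration of what "general purpose logic element" means, a LUT can be modeled as a small truth table indexed by its input bits. The sketch below is a simplified software model, not a description of any particular FPGA family; the helper name make_lut and the example tables are assumptions chosen for illustration.

```python
def make_lut(truth_table):
    """Model a k-input LUT as a 2**k-entry truth table (illustrative only)."""
    def lut(*inputs):
        index = 0
        for bit in inputs:
            # The first input becomes the most significant index bit.
            index = (index << 1) | (bit & 1)
        return truth_table[index]
    return lut

# 2-input XOR: table entries for inputs 00, 01, 10, 11.
xor2 = make_lut([0, 1, 1, 0])

# 3-input majority, i.e. the carry output of a full adder.
maj3 = make_lut([0, 0, 0, 1, 0, 1, 1, 1])

print(xor2(1, 0), maj3(1, 1, 0))
```

Any function of k bits fits in one such table, which is where the flexibility comes from; wide data-path operations need many of them plus the routing between them.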
11FPGA Disadvantages
- Too much flexibility: inefficiencies
- Too fine grained: again, inefficiencies
- Block ciphers, primarily data flow oriented, end up
implemented using a large number of small
elements
- Ciphers have a well defined data flow; general
purpose interconnect ends up being slow and
overkill in terms of area
12FPGA vs. Specialized Reconfigurable Logic
- Coarse grained vs. Fine grained
- Specialized interconnect vs. generic interconnect
- Reduced reconfiguration times
- End result
- Faster performance with reduced area while
maintaining enough flexibility to support the
application domain
13Issues in Reconfigurable Hardware Designs
- How much of what to support?
- How many functional units?
- What kinds of functional units?
- How much support for random logic?
- How much interconnect flexibility to allow?
- Programming/CAD tools
- What kind of programming model to target
- How to design efficient automated tools
14Custom Reconfigurable Hardware Design: What's
Involved?
- Looking for commonalities/overlaps as well as
disjoint elements
- Identify crucial components
- Utilize potential overlap or partial reuse
- Generic enough but fast components
- Minimizing the differences in component types
- Balancing the resources
- Upper bounds/Lower bounds
- Logic units vs. memory blocks
- Determining exact number of each type of unit
- Make the common case fast: IMPORTANT, ALWAYS!
15Related Work
- Cavium Networks SSL/IPsec Protocol Aware
Security Processor
- USC Mark II's Advanced Cryptographic Engine for
IPsec
- Worcester Polytechnic Institute's COBRA
Architecture
16SSL/IPsec Security Processor
- Support for both public key and secret key
encryption
- Not Reconfigurable
- Dedicated hardware blocks for each operation
17Advanced Cryptographic Engine (ACE)
- Designed to implement flexible cipher needs of
IPsec
- Only supports block ciphers
- Support for any algorithm through a library of
general purpose FPGA implementations
18COBRA Architecture
- Custom Reconfigurable Hardware for block ciphers
- Each RCE (Reconfigurable Cryptographic Element) is a
macro block supporting various component operations
- Configured using VLIW instructions
19Design Methodology
- Literature Survey
- Block cipher implementations
- Public key cipher implementations
- Identifying essential components of efficient
implementations
- Iterative Development of Architecture
- Validation by mapping several representative
algorithms
- Identification of Programming Methodology
20Categorizing Implementation Requirements
- Essential step to handle the design complexity
- Logic Requirements
- Interconnection Requirements
- Memory (RAM/ROM) Requirements
- Area and Performance directly affected by these
21Prioritizing Support
- Ordered by importance and then by relative
hardware complexity
- AES (Rijndael)
- DES
- Modular Exponentiation (RSA)
- Serpent
- Twofish
- RC6, MARS, and others
22Block Ciphers: Key Elements
- Bitwise XOR, AND, OR.
- Addition or subtraction modulo 2^n
- Shift or rotation by a constant number of bits.
- Data-dependent rotation by a variable number of
bits.
- Multiplication modulo the table entry value.
- Multiplication in the Galois field specified by
the table entry value.
- Inversion modulo the table entry value.
- Look-up-table substitution
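Several of the primitives listed above can be sketched in a few lines of software. This is only an illustration of what each operation computes, assuming 32-bit words and, for the Galois field case, GF(2^8) with the AES reduction polynomial 0x11B; the function names are hypothetical and say nothing about how the hardware realizes them.

```python
WORD_BITS = 32                      # assumed word size for this sketch
MASK = (1 << WORD_BITS) - 1

def add_mod(a, b):
    """Addition modulo 2^n (n = WORD_BITS)."""
    return (a + b) & MASK

def sub_mod(a, b):
    """Subtraction modulo 2^n."""
    return (a - b) & MASK

def rotl(x, r):
    """Rotate a word left by r bit positions."""
    r %= WORD_BITS
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK

def data_dependent_rotl(x, y):
    """Rotate x by an amount taken from the low bits of another word y."""
    return rotl(x, y & (WORD_BITS - 1))

def gf256_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8); 0x11B is the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a             # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly               # reduce when the degree reaches 8
        b >>= 1
    return result

print(hex(rotl(0x80000001, 1)), hex(gf256_mul(0x57, 0x83)))
```

Look-up-table substitution is plain indexing into a precomputed table, so it is omitted here.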
23Block Cipher Core Operations
24Modular Multiplication and Exponentiation
- Modular Exponentiation implemented with the
square-and-multiply algorithm
- Montgomery Multiplication algorithm the most
popular for modular multiplication
- Various Approaches for Implementation
- Systolic Array
- Word Based
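The relationship between the two algorithms named above can be sketched as follows: left-to-right square-and-multiply drives the exponentiation, and every square or multiply step is a Montgomery multiplication. This is a minimal software sketch using Python big integers, assuming an odd modulus and Python 3.8 or later (for the modular inverse via pow); it does not model the systolic or word-based hardware organizations, and all function names are hypothetical.

```python
def montgomery_setup(n, k):
    """Precompute n' with n * n' = -1 (mod R), where R = 2**k > n and n is odd."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime

def mont_mul(a, b, n, k, n_prime):
    """Montgomery product: a * b * R^-1 mod n, for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k            # exact division by R
    return u - n if u >= n else u   # single conditional subtraction

def mod_exp(base, exponent, n):
    """Left-to-right square-and-multiply using Montgomery products."""
    k = n.bit_length()
    r, n_prime = montgomery_setup(n, k)
    x = (base * r) % n              # operand in Montgomery form
    acc = r % n                     # Montgomery form of 1
    for bit in bin(exponent)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)      # square
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)    # multiply
    return mont_mul(acc, 1, n, k, n_prime)           # leave Montgomery form

print(mod_exp(4, 13, 497), pow(4, 13, 497))
```

The point of the Montgomery form is that the reduction step only needs masking and shifting by k bits, so the hardware cost is dominated by additions.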
25Modular Exponentiation (ME) and Modular Multiplication (MM)
- ME primarily requires fast adders
- CSA (carry save adder) based implementation most common
- The highest throughput implementation used
redundant representation with carry save adders
for computation of partial results
- The same implementation style thus selected for
ME
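The redundant representation mentioned above keeps a running total as a (sum, carry) pair so that no carry has to ripple across the word at each step. A minimal sketch, assuming Python integers stand in for hardware words and using hypothetical function names:

```python
def carry_save_add(a, b, c):
    """3:2 compression: returns (s, cy) such that a + b + c == s + cy."""
    sum_word = a ^ b ^ c                                # per-bit sum, no carry chain
    carry_word = ((a & b) | (a & c) | (b & c)) << 1     # per-bit carry, moved up one place
    return sum_word, carry_word

def add_many(operands):
    """Accumulate operands in carry-save form, then resolve carries once."""
    s, cy = 0, 0
    for x in operands:
        s, cy = carry_save_add(s, cy, x)
    return s + cy                                       # the only carry-propagate addition

print(carry_save_add(3, 5, 6), add_many([5, 9, 14, 27]))
```

Each carry_save_add step has a delay of one full adder regardless of word width, which is why the style suits the long operands of modular multiplication.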
26Our Design: Key Insight
- CSA cell (a full adder) made up of 2 half adders with 1 OR gate
- Each half adder itself 1 XOR + 1 AND
- Add some configurability to the basic CSA
- Result: a fast basic element with support for
most of the primitive operations
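The decomposition above can be written out directly: a full adder built from two half adders and an OR gate already contains XOR, AND, and OR, so exposing those gates yields the bitwise primitives. The configurable_cell function and its mode names below are purely hypothetical, meant only to illustrate the idea of adding configurability; they are not the actual cell design.

```python
def half_adder(a, b):
    """One XOR and one AND gate: returns (sum, carry)."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Two half adders plus one OR gate: returns (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

def configurable_cell(mode, a, b, cin=0):
    """Hypothetical 1-bit cell that reuses the adder's gates; returns (result, carry)."""
    if mode == "add":
        return full_adder(a, b, cin)
    if mode == "xor":
        return a ^ b, 0
    if mode == "and":
        return a & b, 0
    if mode == "or":
        return a | b, 0
    raise ValueError(f"unknown mode: {mode}")

print(full_adder(1, 1, 1), configurable_cell("xor", 1, 0))
```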
27So What Else is Needed?
- Shifts between rounds of addition (for modular
exponentiation)
- Support for fixed length shifts, rotates, and
arbitrary permutes of 32-bit operands (for
symmetric key)
- Solution: a Permutation Unit!
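One way to see why a single permutation unit covers all three needs: fixed shifts and rotates are special cases of a bit-routing table. The sketch below models such a unit in software, assuming 32-bit operands and a table that names the source bit for each output bit (None meaning a zero fill). The table format and function names are assumptions for illustration, not the unit's actual configuration encoding.

```python
def permute_bits(x, source_index):
    """Output bit i takes input bit source_index[i]; None yields a 0 bit."""
    out = 0
    for i, src in enumerate(source_index):
        if src is not None:
            out |= ((x >> src) & 1) << i
    return out

def rotl_table(r, width=32):
    """Routing table for a left rotate by r."""
    return [(i - r) % width for i in range(width)]

def shl_table(r, width=32):
    """Routing table for a left shift by r, with zero fill."""
    return [i - r if i >= r else None for i in range(width)]

def byte_swap_table():
    """Routing table that reverses the four bytes of a 32-bit word."""
    return [(3 - i // 8) * 8 + i % 8 for i in range(32)]

print(hex(permute_bits(0x80000001, rotl_table(1))),
      hex(permute_bits(0x11223344, byte_swap_table())))
```

In hardware the table is fixed by configuration, so a permutation costs only wiring and multiplexers rather than logic depth.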
28Structure of Proposed Design
- Final Design arrived at by iterative refinement
- Hierarchical Design
- Cell
- Block/Cluster
- Groups
- Top of Hierarchy
29The Cell
30The Block/Cluster
31Group
32Interconnects In a Group
33Overall Structure
34Random Logic Support