Design of a Reconfigurable Hardware

Provided by: MustafaI7


1
Design of a Reconfigurable Hardware
  • For Efficient Implementation of Secret Key and
    Public Key Cryptography

2
Presentation Outline
  • Introduction and Motivation
  • Related Work
  • Design Methodology
  • Design Description
  • Algorithm Implementations
  • Comparison with other Work
  • Programming Paradigm
  • Conclusion/Work in Progress

3
Motivating Factors
  • Need for high speed cryptography
  • Need for algorithm independence
  • Need for more secure implementations
  • Need for implementing both Symmetric and
    Asymmetric key encryption

4
Need for High Speed Implementations
  • Software implementations cannot provide real time
    rates
  • Hardware implementations essential for
    • IPsec end points
    • SSL servers
    • VPN at rates exceeding ATM
  • Algorithm implementation must be able to sustain
    the network bandwidth

5
Need for Algorithm Independence
  • IPsec
    • Cipher algorithm specified in the Security
      Association (SA)
  • SSL Transactions
    • Algorithm negotiable for both key exchange and
      encryption
  • Need for Both Secret Key and Public Key
    Encryption
    • Session establishment: large number of
      transactions
  • Dedicated hardware not cheap!

6
Hardware Implementation Benefits
  • More secure implementations
  • Implementing both algorithms in hardware removes
    bottleneck associated with slow computations in
    key establishment
  • Single hardware implementation supporting both
    algorithms avoids the cost of separate hardware

7
Advantages of Reconfigurable Hardware
Implementations
  • Algorithm Agility
  • Algorithm Upload/Modification
  • Architecture Efficiency/Throughput
  • Cost Efficiency

8
Comparison of Different Approaches

9
FPGAs?
  • Post Fabrication Customization
  • Low Cost Design Cycle
  • Fast turnaround time
  • Potential for Parallelism
    • Instruction-level: multiple operations
    • Data-level: multiple blocks of data
    • Task-level: parallel tasks (e.g. secret key)

10
FPGA: The Basics
  • General purpose logic elements (LUTs)
  • Very flexible interconnect
  • Basically fine grained to support both data paths
    and random logic

11
FPGA Disadvantages
  • Too flexible: inefficiencies
  • Too fine grained: again, inefficiencies
  • Block ciphers are primarily data-flow oriented,
    yet are implemented using a large number of small
    elements
  • Ciphers have a well-defined data flow; general
    purpose interconnect ends up being slow and
    overkill in terms of area

12
FPGA vs. Specialized Reconfigurable Logic
  • Coarse grained vs. Fine grained
  • Specialized interconnect vs. generic interconnect
  • Reduced reconfiguration times
  • End result
    • Faster performance with reduced area while
      maintaining enough flexibility to support the
      application domain

13
Issues in Reconfigurable Hardware Designs
  • How much of what to support?
    • How many functional units?
    • What kinds of functional units?
    • How much support for random logic?
    • How much interconnect flexibility to allow?
  • Programming/CAD tools
    • What kind of programming model to target
    • How to design efficient automated tools

14
Custom Reconfigurable Hardware Design: What's
Involved?
  • Looking for commonalities/overlaps as well as
    disjoint elements
  • Identify crucial components
  • Utilize potential overlap or partial reuse
  • Generic enough but fast components
  • Minimizing the differences in component types
  • Balancing the resources
  • Upper bounds/Lower bounds
  • Logic units vs. memory blocks
  • Determining exact number of each type of unit
  • Make the common case fast: IMPORTANT, ALWAYS!

15
Related Work
  • Cavium Networks' SSL/IPsec Protocol-Aware
    Security Processor
  • USC Mark II's Advanced Cryptographic Engine for
    IPsec
  • Worcester Polytechnic Institute's COBRA
    Architecture

16
SSL/IPsec Security Processor
  • Support for both public key and secret key
    encryption
  • Not Reconfigurable
  • Dedicated hardware blocks for each operation

17
Advanced Cryptographic Engine (ACE)
  • Designed to implement flexible cipher needs of
    IPsec
  • Only supports block ciphers
  • Support for any algorithm through a library of
    general purpose FPGA implementations

18
COBRA Architecture
  • Custom Reconfigurable Hardware for block ciphers
  • Each RCE is a macro block supporting various
    component operations
  • Configured using VLIW instructions

19
Design Methodology
  • Literature Survey
  • Block cipher implementations
  • Public key cipher implementations
  • Identifying essential components of efficient
    implementations
  • Iterative Development of Architecture
  • Validation by mapping several representative
    algorithms
  • Identification of Programming Methodology

20
Categorizing Implementation Requirements
  • Essential step to handle the design complexity
  • Logic Requirements
  • Interconnection Requirements
  • Memory (RAM/ROM) Requirements
  • Area and performance are directly affected by these

21
Prioritizing Support
  • Ordered by importance and then by relative
    hardware complexity
  • AES (Rijndael)
  • DES
  • Modular Exponentiation (RSA)
  • Serpent
  • Twofish
  • RC6, MARS, and others

22
Block Ciphers: Key Elements
  • Bitwise XOR, AND, OR.
  • Addition or subtraction modulo 2^n
  • Shift or rotation by a constant number of bits.
  • Data-dependent rotation by a variable number of
    bits.
  • Multiplication modulo the table entry value.
  • Multiplication in the Galois field specified by
    the table entry value.
  • Inversion modulo the table entry value.
  • Look-up-table substitution

23
Block Cipher Core Operations

24
Modular Multiplication and Exponentiation
  • Modular exponentiation is implemented with the
    square-and-multiply algorithm
  • Montgomery multiplication is the most popular
    algorithm for modular multiplication
  • Various Approaches for Implementation
    • Systolic Array

25
ME and MM
  • ME primarily requires fast adders
  • CSA-based implementation is most common
  • The highest throughput implementation used
    redundant representation with carry save adders
    for computation of partial results
  • The same implementation style was thus selected
    for ME
26
Our Design: Key Insight
  • CSA made up of 2 half adders with 1 OR gate
  • Each half adder is itself 1 XOR and 1 AND
  • Add some configurability to the basic CSA
  • Result: a fast basic element with support for
    most of the primitive operations
27
So What Else Is Needed?
  • Shifts between rounds of addition (for modular
    exponentiation)
  • Support for fixed-length shifts, rotates, and
    arbitrary permutes of 32-bit operands (for
    symmetric key)
  • Solution: a Permutation Unit!
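
A software model shows why one permutation unit can cover shifts, rotates and arbitrary permutes. The sketch below assumes a table-driven formulation in which each output bit names the input bit it takes, with `None` meaning a constant zero (needed for shifts). The table format and function names are hypothetical and are not taken from the presentation.

```python
def permute32(x, src):
    """Output bit i takes input bit src[i]; a None entry yields 0."""
    out = 0
    for i, j in enumerate(src):
        if j is not None:
            out |= ((x >> j) & 1) << i
    return out


def rotl_table(r):
    """Configuration for a fixed rotate left by r bits."""
    return [(i - r) % 32 for i in range(32)]


def shl_table(r):
    """Configuration for a fixed shift left by r bits (zero fill)."""
    return [i - r if i >= r else None for i in range(32)]


def bit_reverse_table():
    """Configuration for an arbitrary permutation: full bit reversal."""
    return [31 - i for i in range(32)]
```

For instance, `permute32(0x80000001, rotl_table(1))` gives `0x00000003`, while the same input with `shl_table(1)` gives `0x00000002`. Only the configuration table differs between the operations.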

28
Structure of Proposed Design
  • Final design arrived at by iterative refinement
  • Hierarchical Design
    • Cell
    • Block/Cluster
    • Groups
    • Top of Hierarchy

29
The Cell

30
The Block/Cluster

31
Group

32
Interconnects In a Group

33
Overall Structure

34
Random Logic Support