Title: Design Flow for HW/SW Acceleration Transparency in the ThumbPod Embedded System
1. Design Flow for HW/SW Acceleration Transparency in the ThumbPod Embedded System
- David Hwang, Patrick Schaumont, Yi Fan, Alireza Hodjat, Bo-Cheng Lai, Kazuo Sakiyama, Shenglin Yang, Ingrid Verbauwhede
- 2003 DAC/ISSCC Student Design Contest
- UCLA Electrical Engineering Department
2. Outline: Case Study
- Introduction to ThumbPod
- Design flow: HW/SW Acceleration Transparency
- Interface Design
- Interface Overhead
- Implementation
- Software and Simulation Flow
- FPGA Prototype
- Conclusions and Future work
3. Introduction to ThumbPod
- Intelligent secure keychain device that recognizes its owner biometrically
- Components
  - Microcontroller
  - Fingerprint sensor
  - Biometric signal processing
  - Crypto-processor
  - LCD display
  - Memory
  - Communication: IR and USB
- Applications
  - Secure credit cards, secure memory, access control, etc.
4. Complete secure system design!
- Design a complete system at all levels of abstraction
  - Protocol: wireless authentication protocol design
  - Algorithm: embedded fingerprint matching algorithms
  - Architecture: embedded software stack of Java, JVM / KVM, C
  - Micro-Architecture: crypto-processor and DFT co-processor design
  - Circuit: circuit techniques to combat power analysis attacks (not on prototype)
[Figure: abstraction-level stack. Labels: Confidentiality, Integrity, Identification; SIM; Cipher Design, Biometrics; Java, JCA, JVM, KVM; CPU, Crypto, MEM; Vcc, D, Q, CLK]
5. Fingerprint Verification Protocol: Design and Challenges
- Verify identity of user of ThumbPod remotely
- Strong user-ThumbPod identity tie prevents fraud
- Complex computations
  - Cryptography
  - Fingerprint extraction signal processing
- On a constrained device
  - Energy efficiency
  - Performance considerations
  - Form factor
- Example: extraction of 2800-b fingerprint template and key generation: 200 million cycles
- Java used for portability and security
6. Design Flow
[Figure: design flow from IDEA to FPGA prototype for concept demonstration (DAC University Booth)]
7. Conventional Design Flows
- ASIC-based
  - GOOD: high performance
  - BAD: design time, power (FPGA)
- Microprocessor-based
  - GOOD: design time
  - BAD: low performance
[Figure: the two flows separated by the HW / SW WALL. Input labels: C / Matlab, C]
8. Microprocessor-based Design
- Bridge the HW/SW wall
  - Add hardware modules to processor
  - Modify compiler for new instructions
[Figure: two variants, "Hardware acceleration" and "Improved compiler". Labels: C, COMPILE-os2, COMPILE, µPROC]
9. Bridging the Wall
- ASIC-based
  - GOOD: high performance
  - BAD: design time
- Microprocessor-based
  - GOOD: design time
  - BAD: low performance
[Figure: the two flows separated by the HW / SW WALL. Labels: C / Matlab, C, VHDL]
10. Our Flow: Functional HW / SW Acceleration Transparency
[Figure: Java -> COMPILE -> Bytecode -> KVM -> µPROC]
11. From Another Angle
[Figure: cycle breakdown of a Java hash( ) call. Labels: Java cycles, C cycles, Co-Proc. cycles]
12. Advantages and Disadvantages
- Advantages
  - Performance
  - Recode only necessary functions
  - Design over multiple abstraction levels (Java, C, VHDL)
  - Original Java code changes minimally
  - Java function signature remains the same
  - Smooth migration from workstation to embedded platform
  - Incremental refinement
- Disadvantages
  - Interface design
  - Interface overhead (cycle count)
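The "signature remains the same" point can be illustrated one level down, in C. The following is only a sketch of the idea, not code from the ThumbPod project: the names `checksum_reference`, `checksum_refined`, and `application_code` are hypothetical, and the toy checksum stands in for whatever function is being refined.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of the function being refined. */
typedef uint32_t (*checksum_fn)(const uint8_t *data, size_t len);

/* Reference version: straightforward and easy to verify. */
static uint32_t checksum_reference(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + data[i];
    return sum;
}

/* Stand-in for a refined version (hand-tuned C or a co-processor driver).
   Here it only unrolls the loop; the result must match the reference. */
static uint32_t checksum_refined(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 2 <= len; i += 2)
        sum = (sum * 31u + data[i]) * 31u + data[i + 1];
    if (i < len)
        sum = sum * 31u + data[i];
    return sum;
}

/* The binding the rest of the program sees; swapping it changes no caller. */
static checksum_fn checksum = checksum_reference;

/* Hypothetical caller: identical before and after the refinement. */
static uint32_t application_code(const uint8_t *msg, size_t len)
{
    return checksum(msg, len);
}
```

Reassigning `checksum = checksum_refined;` is the only change needed to move to the refined version, and comparing both results on the same input gives a quick regression check at each refinement step.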
13. Interface Design and Overhead: AES Example
- Interface Design
  - Java to C interface: Java/K Native Interface (JNI / KNI)
  - C to HW interface: GEZEL design environment
- Interface Overhead
  - Cycles required for moving to lower abstraction layers
  - Overhead should not negate performance gains
- AES Co-Processor Example
  - Advanced Encryption Standard (Rijndael)
  - 128-b data, 128-b key symmetric key cipher
  - Various cryptographic functions over GF(2^8)
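As a reminder of what arithmetic over GF(2^8) looks like in software, here is a minimal C sketch of field multiplication using the AES reduction polynomial x^8 + x^4 + x^3 + x + 1. It is a textbook shift-and-add routine, not the ThumbPod co-processor's datapath; the function names are illustrative.

```c
#include <stdint.h>

/* Multiply by x (i.e. {02}) in GF(2^8), reducing modulo the AES
   polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). */
static uint8_t gf256_xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Shift-and-add multiplication of two field elements:
   addition in GF(2^8) is XOR, doubling is gf256_xtime. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t product = 0;
    while (b != 0) {
        if (b & 1)
            product ^= a;
        a = gf256_xtime(a);
        b >>= 1;
    }
    return product;
}
```

With these definitions, `gf256_mul(0x57, 0x83)` evaluates to `0xC1`, which matches the worked example in the AES specification. Bit-serial loops like this are one reason a dedicated hardware datapath pays off.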
14. AES Example: Interface Design

JAVA
    public final class RijndaelAlgorithm {
        static native int[] rijndael(int[] din, int[] key);
        public static void main(String[] args) {
            dout = rijndael(key, din);
            . . .
        }
    }

Java / C Interface (C)
    void Java_RijndaelAlgorithm_rijndael(void) {
        . . .
        i1 = popStackAsType(ARRAY);     /* arguments come off the KVM operand stack */
        i2 = popStackAsType(ARRAY);
        rijndael(i1->data, i2->data, result->data);
        pushStackAsType(ARRAY, result); /* return value goes back on the stack */
        . . .
    }

C / Co-proc. Interface (ASM)
    void rijndael(int din[4], int key[4], int dout[4]) {
        asm(" mov %0, %%l0" : : "r" (key));
        asm(" ldd [%l0], %c0              ! upper word key
              ldd [%l0+8], %c2            ! lower word key
              cpop1 load_key %c0, %c2     ! load the key");
        asm(" cpop1 encrypt_ecb %c0, %c2  ! encrypt AES-ECB
              cpop1 read_output %c4, %c6  ! get output data");
        . . .
    }

Co-proc. Instructions
15. AES Example: Interface Overhead

                     Java       C          Co-processor
  AES cycles         301,034    44,063     11
  Interface cycles   -          367        892
  Total cycles       301,034    44,430     903
  Improvement        -          6.8X       333X

- Interface overhead for the co-processor consumes cycles, but the result is still a 333X improvement
- Better improvement if data flow and control flow are separated
  - Currently, data flow and control flow are merged
  - Co-processors with direct memory access would reduce interface overhead
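The totals and improvement factors above follow directly from the per-layer cycle counts. The C sketch below redoes that arithmetic; the struct and function names are illustrative, and the zero interface cost for the pure Java case is an assumption (the slide lists no interface term for it).

```c
#include <stdint.h>

/* Cycle counts for one AES encryption at a given abstraction level. */
struct aes_cost {
    uint32_t compute_cycles;   /* cycles spent in the AES routine itself */
    uint32_t interface_cycles; /* cycles spent crossing to lower layers */
};

static uint32_t total_cycles(struct aes_cost c)
{
    return c.compute_cycles + c.interface_cycles;
}

/* Speedup over the baseline, scaled by 10 and rounded to nearest,
   so 68 means 6.8X. Integer math keeps the sketch dependency-free. */
static uint32_t speedup_x10(struct aes_cost baseline, struct aes_cost accelerated)
{
    uint32_t base = total_cycles(baseline);
    uint32_t acc = total_cycles(accelerated);
    return (uint32_t)((10ull * base + acc / 2) / acc);
}

/* Share of the total spent in the interface, in percent, rounded. */
static uint32_t interface_share_percent(struct aes_cost c)
{
    uint32_t total = total_cycles(c);
    return (100u * c.interface_cycles + total / 2) / total;
}

/* Figures from the slide; interface cost of 0 for Java is assumed. */
static const struct aes_cost AES_JAVA   = { 301034, 0 };
static const struct aes_cost AES_C      = { 44063, 367 };
static const struct aes_cost AES_COPROC = { 11, 892 };
```

`speedup_x10(AES_JAVA, AES_C)` gives 68 and `speedup_x10(AES_JAVA, AES_COPROC)` gives 3334, consistent with the 6.8X and 333X on the slide. `interface_share_percent(AES_COPROC)` gives 99: nearly all of the co-processor path is interface, which is why separating data flow from control flow looks attractive.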
16. System Improvements: Cycle Counts
- CYCLES: 241,948,800
- CYCLES: 159,259,730 (34% FEWER)
- CYCLES: 127,846,828 (47% FEWER)
- Other applications have different cycle profiles (e.g. streaming)
- Functional acceleration must be taken in context of the ENTIRE system HW/SW model
- Profiling is important
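To put the system-level numbers next to the 333X function-level gain, the following C sketch converts the three cycle counts into percentage reductions and whole-system speedups. The helper names are illustrative; only the cycle counts come from the slide.

```c
#include <stdint.h>

/* Percentage of cycles removed relative to a baseline, rounded to nearest. */
static uint32_t percent_fewer(uint64_t baseline, uint64_t reduced)
{
    uint64_t saved = baseline - reduced;
    return (uint32_t)((100 * saved + baseline / 2) / baseline);
}

/* Whole-system speedup scaled by 100 and rounded, so 189 means 1.89X. */
static uint32_t system_speedup_x100(uint64_t baseline, uint64_t reduced)
{
    return (uint32_t)((100 * baseline + reduced / 2) / reduced);
}
```

`percent_fewer(241948800, 159259730)` and `percent_fewer(241948800, 127846828)` return 34 and 47, matching the slide, while `system_speedup_x100(241948800, 127846828)` returns 189. A 333X gain on one function therefore shows up as roughly 1.9X for the whole system, which is the reason profiling the entire HW/SW model matters.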
17. Software and Simulation Flow
- GEZEL is an interface mechanism to combine C and hardware
- Three simulation platforms of the same system
- Each platform corresponds to the addition of an abstraction level
[Figure: simulation flow. Label: KVM]
18. FPGA Hardware Prototype
- Xilinx Virtex-II FPGA
- Embedded LEON 32-b Sparc processor
- Memory-mapped co-processors on the AMBA APB bus
- Two UARTs
- Communication with server
- Authentec CMOS fingerprint sensor
[Figure: prototype block diagram. Labels: 32 MB SRAM, Xilinx Virtex-II FPGA, Mem. Controller, Boot PROM, AMBA AHB, APB Bridge, LEON 32-b Sparc Proc., UART, APB, DFT Co-Proc., AES Co-Proc., Authentec AF-2, Server]
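For readers unfamiliar with memory-mapped co-processors, the C sketch below shows the usual driver pattern: write operands to registers, set a start bit, poll a status bit, read the result. The register layout, bit definitions, and function name are hypothetical; the slides do not give the actual ThumbPod register map or APB addresses.

```c
#include <stdint.h>

/* Hypothetical register layout for a memory-mapped AES co-processor.
   The real ThumbPod register map is not given in the slides. */
struct aes_regs {
    volatile uint32_t key[4];  /* 128-bit key */
    volatile uint32_t din[4];  /* 128-bit input block */
    volatile uint32_t control; /* write AES_CTRL_START to begin */
    volatile uint32_t status;  /* AES_STATUS_DONE set when dout is valid */
    volatile uint32_t dout[4]; /* 128-bit output block */
};

#define AES_CTRL_START  0x1u  /* assumed bit assignment */
#define AES_STATUS_DONE 0x1u  /* assumed bit assignment */

/* Drive one encryption through the register block. Returns 0 on success,
   -1 if the device does not report completion within max_polls reads. */
static int aes_mmio_encrypt(struct aes_regs *regs, const uint32_t key[4],
                            const uint32_t din[4], uint32_t dout[4],
                            unsigned max_polls)
{
    for (int i = 0; i < 4; i++) {
        regs->key[i] = key[i];
        regs->din[i] = din[i];
    }
    regs->control = AES_CTRL_START;
    while ((regs->status & AES_STATUS_DONE) == 0) {
        if (max_polls-- == 0)
            return -1;
    }
    for (int i = 0; i < 4; i++)
        dout[i] = regs->dout[i];
    return 0;
}
```

On real hardware `regs` would point at the co-processor's bus address; passing a pointer instead of hard-coding one lets the same routine run against an ordinary struct in a host-side simulation.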
19. Prototype Setup
- Working demo on an FPGA board (two ThumbPods shown) and PC connected over RS-232
- Fingerprint algorithm has 0.5% false reject rate and 0.01% false acceptance rate
20. Conclusions and Future Work
- ThumbPod is a secure embedded system with a user identity tie
- HW/SW acceleration transparency used as embedded design flow
  - Interface-based design
  - AES co-processor performance gain of 333X
  - System cycle reduction of 47%
- Seven graduate students in eight months from concept to demo
- Future work
  - Build an actual ThumbPod
  - Low-power issues
  - Generalizing design flow