Title: Assistant Professor at GMU since Fall 1998
Slide 1: Kris Gaj
Assistant Professor at GMU since Fall 1998
- Research and teaching interests
  - cryptography
  - computer arithmetic
  - VLSI design and testing
- Contact
  - Science Technology II, room 223
  - kgaj_at_gmu.edu, (703) 993-1575
Office hours: Monday, 6:00-7:00 PM, and after class
Tuesday, 7:30-8:30 PM
Slide 2: Computer Arithmetic
Spring 2000
ECE 699:001 Advanced Topics in Electrical and Computer Engineering: Computer Arithmetic
Since Fall 2000
ECE 645 Computer Arithmetic: Implementations in Hardware and Software
Slide 3: ECE 645
Part of
- MS in CpE
  - Digital Systems Design (required course)
  - Network and System Security
- MS in EE
- Certificate in VLSI Design/Manufacturing
- PhD in IT
- PhD in ECE
Slide 4: Courses
[Figure: courses arranged by design level]
Design levels: algorithmic, register-transfer, gate, transistor, layout, devices
Courses: Computer Arithmetic (ECE 645), Introduction to VHDL (ECE 545), VLSI Test Concepts, VLSI Design Automation, Digital Integrated Circuits, Physical VLSI Design, MOS Device Electronics
Other course numbers shown: ECE 586, ECE 680, ECE 681, ECE 682, ECE 684
Slide 5: Course web page
ECE web page → Courses → Course web pages → ECE 645
http://teal.gmu.edu/courses/ECE645/index.htm
Contains basic information about the course
To be extended in the future
Slide 6: Computer Arithmetic
Lecture
- Homework: 15%
- Midterm exam 1 (in class): 20%
- Midterm exam 2 (take-home): 15%
Project
- Project 1: 20%
- Project 2: 30%
Slide 7: Advanced digital circuit design course covering
Efficient
- addition and subtraction
- multiplication
- division and modular reduction
- exponentiation
of
- Integers, unsigned and signed
- Real numbers
  - fixed point
  - floating point, single and double precision
- Elements of the Galois field $GF(2^n)$
  - polynomial base
Slide 8: Lecture topics (1)
INTRODUCTION
1. Applications of computer arithmetic algorithms
2. Number representation
- Unsigned Numbers
- Signed Numbers
Slide 9: ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting
2. Carry-lookahead adders
3. Adders based on Parallel Prefix Networks
Slide 10: MULTIOPERAND ADDITION
1. Carry-save adders
2. Wallace and Dadda Trees
3. Adding multiple signed numbers
Slide 11: MULTIPLICATION
In hardware
1. Basic hardware multipliers
2. High-radix multipliers
3. Tree multipliers
4. Array multipliers
5. Multiplication of signed numbers and squaring
In software
6. Survey of software multiplication algorithms
Slide 12: DIVISION
In hardware
1. Basic hardware dividers
2. High-radix dividers
3. Array dividers
In software
4. Survey of algorithms for division, modular reduction, and modular exponentiation
Slide 13: FLOATING POINT ARITHMETIC
1. Floating-point number representations
2. Floating-point operations
GALOIS FIELD ARITHMETIC
1. Representations of elements of the Galois Field
2. Galois Field operations
Slide 14: Similar courses at other universities
- University of California, Santa Barbara, Behrooz Parhami: ECE252B Computer Arithmetic
- University of Massachusetts, Amherst, Israel Koren: ECE666 Digital Computer Arithmetic
- Lehigh University, Michael Schulte: ECE496 High-Speed Computer Arithmetic
- Worcester Polytechnic Institute, Berk Sunar: EE-579 V Computer Arithmetic Circuits
- Stanford University, Michael Flynn: EE486 Advanced Computer Arithmetic
- University of California, Davis, Vojin Oklobdzija: ECE278 Computer Arithmetic for Digital Implementation
Slide 15: New in this course
- hardware vs. software algorithms
- real-life project based on VHDL or Verilog HDL
- operations in the Galois Field (communications)
Slide 16: Possible follow-up course
Advanced Computer Arithmetic
- Square root
- Exponential and logarithm functions
- Trigonometric functions
- Hyperbolic functions
- Fault tolerant arithmetic
- Low power arithmetic
Slide 17: Literature (1)
Required textbooks
- Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 1999.
Recommended textbooks
- Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.
- Milos D. Ercegovac and Tomas Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
Slide 18: Literature (2)
Supplementary books
1. M. Lu, Arithmetic and Logic in Computer Systems, Wiley-Interscience, 2004.
2. I. Koren, Computer Arithmetic Algorithms, Brookside Court Publishers, 1998.
3. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press, 1990.
4. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, Chapter 14, Efficient Implementation, CRC Press, Inc., Boca Raton, 1998.
5. Christof Paar, Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields, VDI Verlag, 1994.
Slide 19: Literature (3)
Proceedings of conferences
- ARITH - International Symposium on Computer Arithmetic
- ASIL - Asilomar Conference on Signals, Systems, and Computers
- ICCD - International Conference on Computer Design
- CHES - Workshop on Cryptographic Hardware and Embedded Systems
Journals and periodicals
- IEEE Transactions on Computers, in particular special issues on computer arithmetic: 8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94
- IEEE Transactions on Circuits and Systems
- IEEE Transactions on Very Large Scale Integration
- IEE Proceedings: Computer and Digital Techniques
- Journal of VLSI Signal Processing
Slide 20: Homework
- reading assignments (main textbook + articles)
- analysis of hardware and software algorithms and implementations
- design of small hardware units using VHDL
Optional assignments
Possibility of trading: analysis vs. design, software vs. hardware
Slide 21: Midterm exams
Exam 1 - 2 hrs 30 minutes, in class: multiple choice + short problems
Exam 2 - 48 hrs, take-home: analysis and design of arithmetic units using VHDL
Practice exams on the web
Tentative days of exams
Exam 1 - March 28
Exam 2 - May 7-8
Slide 22: Project (1)
Project I (20% of grade)
Design and comparative analysis of fast adders (several hundred bits long)
- Optimization criteria
  - minimum latency
  - maximum throughput
  - minimum area
  - minimum product latency × area
  - maximum ratio throughput/area
  - scalability
Similar for all students
Done individually
Final report due Monday, March 21
Slide 23: Project (2)
Project II (30% of grade)
Long unsigned or signed integers
- Fast
  - multiplication
  - squaring
  - division
  - modular reduction, or
  - modular exponentiation
or
Floating-point numbers
- Fast
  - addition or
  - multiplication
Slide 24: Project II (rules)
- Real life application
- Requirements derived from the analysis of the application
- Typically both hardware and software design
- Several project topics proposed on the web
- You can choose project topic by yourself
- Can be done in a group of 1-3 students
Written report + oral presentation: Thursday, May 12
Slide 25: Project II (rules)
- Every team works on a slightly different problem
- Project topics should be more complex for larger teams
- Cooperation (but not exchange of code) between teams is encouraged
Slide 26: Project
Hardware
- VHDL (or Verilog) code
- Latency and/or throughput
- Area
- Scalability
Software
- High level language (C preferred)
- Execution time
- Memory requirements
- Scalability
Slide 27: Prerequisites
ECE 545 Introduction to VHDL
or
Permission of the instructor, granted assuming that you know
- VHDL or Verilog
- a high level language
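Slide 28: Degrees of freedom and possible trade-offs
[Figure: trade-offs among speed, area, power, and testability, annotated with the courses ECE 645, ECE 682, and ECE 586, 681]
Slide 29: Degrees of freedom and possible trade-offs
[Figure: trade-offs among speed (latency, throughput) and area]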
Slide 30: Timing parameters
parameter | definition | units | pipelining
delay | time point → point | ns |
latency | time input → output | ns | bad
throughput | output bits / time unit | Mbits/s | good
clock period | rising edge → rising edge of clock | ns | good
clock frequency | 1 / clock period | MHz | good
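A worked example may make the "bad" and "good" entries concrete. The numbers below are purely hypothetical (a 64-bit unit with 32 ns of combinational delay, split into 4 pipeline stages with a 10 ns clock period that includes register overhead); they only illustrate how the definitions in the table combine.

```latex
\begin{align*}
\text{Non-pipelined:}\quad
  \text{latency} &= 32\ \text{ns}, \\
  \text{throughput} &= \frac{64\ \text{bits}}{32\ \text{ns}} = 2000\ \text{Mbits/s} \\[4pt]
\text{4-stage pipeline:}\quad
  \text{latency} &= 4 \cdot 10\ \text{ns} = 40\ \text{ns}, \\
  \text{throughput} &= \frac{64\ \text{bits}}{10\ \text{ns}} = 6400\ \text{Mbits/s}, \\
  \text{clock frequency} &= \frac{1}{10\ \text{ns}} = 100\ \text{MHz}
\end{align*}
```

Under these assumptions pipelining lengthens the latency slightly (40 ns vs. 32 ns) while more than tripling the throughput, which matches the qualitative entries in the table.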
Slide 31: Project technologies
- semi-custom Application Specific Integrated Circuits
- Field Programmable Gate Arrays
Slide 32: Levels of design description
- Algorithmic level
- Register Transfer Level (level of description most suitable for synthesis)
- Logic (gate) level
- Circuit (transistor) level
- Physical (layout) level
Slide 33: Register Transfer Logic (RTL) Design Description
[Figure: registers separated by blocks of combinational logic, all registers driven by a common clock]
Slide 34: RTL Block Synthesis
Estimated Area
Estimated Timing
Simplified design flow
Slide 35: Design Process for ASICs (1)
- VHDL code
- VHDL simulator: functional verification
- Logic synthesis, using a library of standard cells
  - speed without routing
  - area without routing
- Netlist
Slide 36: Design Process (2)
- Netlist
- Placing and routing, using a library of standard cells
  - area with routing
  - speed with routing
- Layout
Slide 37: Design process for FPGAs (1)
Specification (Lab Experiments)
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in Experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds.
VHDL description (Your Source Files)
    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
        port(
            clock, reset, encr_decr : in  std_logic;
            data_input  : in  std_logic_vector(31 downto 0);
            data_output : out std_logic_vector(31 downto 0);
            out_full    : in  std_logic;
            key_input   : in  std_logic_vector(31 downto 0);
            key_read    : out std_logic
        );
    end RC5_core;
Functional simulation
Synthesis
Post-synthesis simulation
Slide 38: Design process for FPGAs (2)
Implementation
Timing simulation
Configuration
On chip testing
Slide 39: CAD software available at GMU (1)
VHDL simulators
- ModelSim (under Unix)
  - available from all PCs in the ECE educational labs using an X-terminal emulator
  - available remotely from home using a fast Internet connection
- Aldec Active-HDL (under Windows)
  - available in the FPGA Lab, ST II, room 203
Slide 40: CAD software available at GMU (2)
Tools used for logic synthesis
- Synopsys Design Compiler (under Unix)
  - available from all PCs in the ECE educational labs using an X-terminal emulator
  - available remotely from home using a fast Internet connection
- Synplicity Synplify Pro (under Windows)
  - available in the FPGA Lab, ST II, room 203
Slide 41: CAD software available at GMU (3)
Tools used for implementation in the FPGA technology
- Xilinx ISE (under Windows)
  - available in the FPGA Lab, ST II, room 203
Slide 42: How to learn CAD software available at GMU?
- ModelSim (under Unix)
  - covered in ECE 545 Introduction to VHDL
  - MGC Mega Tutorial available at http://cpe.gmu.edu/mgc.htm
- Synopsys Design Compiler (under Unix)
  - introduction covered in ECE 545 Introduction to VHDL (F04)
  - additional hands-on session at the end of February
- Aldec Active-HDL (under Windows)
- Synplicity Synplify Pro (under Windows)
- Xilinx ISE (under Windows)
  - covered in ECE 545 Introduction to VHDL (F04)
Slide 43: VHDL for Specification
VHDL for Simulation
VHDL for Synthesis
VHDL for Synthesis of Arithmetic Circuits
Slide 44: VHDL Design Styles
- data-flow
  - Concurrent statements
- structural
  - Components and interconnects
- behavioral (algorithmic)
  - Sequential statements
    - Registers
    - State machines
    - Test benches
Subset most suitable for synthesis
Slide 46: Data-flow VHDL Example
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY fulladd IS
        PORT ( Cin, x, y : IN  STD_LOGIC;
               s, Cout   : OUT STD_LOGIC );
    END fulladd;

    ARCHITECTURE LogicFunc OF fulladd IS
    BEGIN
        s    <= x XOR y XOR Cin;
        Cout <= (x AND y) OR (Cin AND x) OR (Cin AND y);
    END LogicFunc;
Slide 47: Data-flow VHDL
Major instructions
Concurrent statements
- concurrent signal assignment (<=)
- conditional concurrent signal assignment (when-else)
- selected concurrent signal assignment (with-select-when)
- generate scheme for equations (for-generate)
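The four concurrent statements listed above can be seen side by side in a small sketch. The entity name `concurrent_demo` and all port names are hypothetical and chosen only for illustration; the code assumes nothing beyond the standard `ieee.std_logic_1164` package.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity: each output is driven by one style
-- of concurrent statement.
entity concurrent_demo is
    port ( a, b  : in  std_logic_vector(3 downto 0);
           sel   : in  std_logic_vector(1 downto 0);
           y_and : out std_logic_vector(3 downto 0);
           y_we  : out std_logic_vector(3 downto 0);
           y_ws  : out std_logic_vector(3 downto 0);
           y_gen : out std_logic_vector(3 downto 0) );
end concurrent_demo;

architecture dataflow of concurrent_demo is
begin
    -- concurrent signal assignment (<=)
    y_and <= a and b;

    -- conditional concurrent signal assignment (when-else)
    y_we <= a when sel = "00" else
            b when sel = "01" else
            "0000";

    -- selected concurrent signal assignment (with-select-when)
    with sel select
        y_ws <= a       when "00",
                b       when "01",
                a and b when "10",
                "0000"  when others;

    -- generate scheme for equations (for-generate):
    -- one XOR equation per bit position
    gen_xor: for i in 0 to 3 generate
        y_gen(i) <= a(i) xor b(i);
    end generate;
end dataflow;
```

All four statements execute concurrently, so their order in the architecture body does not matter.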
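Slide 48: Structural VHDL Example
[Figure: 4-bit ripple-carry adder built from four full adders (FA), with inputs x3..x0 and y3..y0, carry-in c_in, internal carries c1, c2, c3, carry-out c_out, and sum outputs s3..s0; bit 3 is the MSB position, bit 0 is the LSB position]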
Slide 49: Structural VHDL Example
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY adder4 IS
        PORT ( Cin            : IN  STD_LOGIC;
               x3, x2, x1, x0 : IN  STD_LOGIC;
               y3, y2, y1, y0 : IN  STD_LOGIC;
               s3, s2, s1, s0 : OUT STD_LOGIC;
               Cout           : OUT STD_LOGIC );
    END adder4;

    ARCHITECTURE Structure OF adder4 IS
        SIGNAL c1, c2, c3 : STD_LOGIC;
        COMPONENT fulladd
            PORT ( Cin, x, y : IN  STD_LOGIC;
                   s, Cout   : OUT STD_LOGIC );
        END COMPONENT;
    BEGIN
        stage0: fulladd PORT MAP ( Cin, x0, y0, s0, c1 );
        stage1: fulladd PORT MAP ( c1, x1, y1, s1, c2 );
        stage2: fulladd PORT MAP ( c2, x2, y2, s2, c3 );
        stage3: fulladd PORT MAP (
            Cin => c3, Cout => Cout, x => x3, y => y3, s => s3 );
    END Structure;
Slide 50: Structural VHDL
Major instructions
- component instantiation (port map)
- component instantiation with generic (generic map, port map)
- generate scheme for component instantiations (for-generate)
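As a sketch of the generate scheme for component instantiations, the 4-bit ripple-carry adder can be generalized to N bits. The entity name `adder_n` and the generic `N` are assumptions made for this illustration; the `fulladd` entity is repeated from the data-flow example so that the listing compiles on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Full adder, as in the data-flow example
entity fulladd is
    port ( Cin, x, y : in  std_logic;
           s, Cout   : out std_logic );
end fulladd;

architecture LogicFunc of fulladd is
begin
    s    <= x xor y xor Cin;
    Cout <= (x and y) or (Cin and x) or (Cin and y);
end LogicFunc;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical N-bit ripple-carry adder (name and generic are illustrative)
entity adder_n is
    generic ( N : positive := 4 );
    port ( Cin  : in  std_logic;
           x, y : in  std_logic_vector(N-1 downto 0);
           s    : out std_logic_vector(N-1 downto 0);
           Cout : out std_logic );
end adder_n;

architecture Structure of adder_n is
    component fulladd
        port ( Cin, x, y : in  std_logic;
               s, Cout   : out std_logic );
    end component;
    -- c(i) is the carry into bit position i; c(N) is the carry out
    signal c : std_logic_vector(N downto 0);
begin
    c(0) <= Cin;

    -- generate scheme: one full adder per bit position
    stages: for i in 0 to N-1 generate
        stage: fulladd port map (
            Cin => c(i), x => x(i), y => y(i), s => s(i), Cout => c(i+1) );
    end generate;

    Cout <= c(N);
end Structure;
```

A parent design would select the width through a generic map, for example `generic map ( N => 8 )`, when instantiating `adder_n`.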
Slide 51: Behavioral VHDL (subset)
Major instructions
Sequential statements
General
- process statement (process)
- sequential signal assignment (<=)
Registers
State machines
Testbenches
- loops (for-loop, while-loop)
Slide 52: Behavioral VHDL Example
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY reg8 IS
        PORT ( D             : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
               Resetn, Clock : IN  STD_LOGIC;
               Q             : OUT STD_LOGIC_VECTOR(7 DOWNTO 0) );
    END reg8;

    ARCHITECTURE Behavior OF reg8 IS
    BEGIN
        PROCESS ( Resetn, Clock )
        BEGIN
            IF Resetn = '0' THEN
                Q <= "00000000";
            ELSIF Clock'EVENT AND Clock = '1' THEN
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior;
Slide 53: Processes in VHDL
- Processes describe sequential behavior
- Processes in VHDL are very powerful statements
  - They allow defining arbitrary behavior that may be difficult to represent by a real circuit
  - Not every process can be synthesized
- Use processes with caution in the code to be synthesized
- Use processes freely in testbenches
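To illustrate the free use of processes in testbenches, here is a minimal, non-synthesizable testbench sketch for the `reg8` register from the behavioral example. The names `reg8_tb`, `PERIOD`, and `done`, as well as the stimulus values, are assumptions made for this illustration; `reg8` is repeated so that the listing compiles on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Design under test, as in the behavioral example
entity reg8 is
    port ( D             : in  std_logic_vector(7 downto 0);
           Resetn, Clock : in  std_logic;
           Q             : out std_logic_vector(7 downto 0) );
end reg8;

architecture Behavior of reg8 is
begin
    process ( Resetn, Clock )
    begin
        if Resetn = '0' then
            Q <= "00000000";
        elsif Clock'event and Clock = '1' then
            Q <= D;
        end if;
    end process;
end Behavior;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: no ports, not intended for synthesis
entity reg8_tb is
end reg8_tb;

architecture test of reg8_tb is
    component reg8
        port ( D             : in  std_logic_vector(7 downto 0);
               Resetn, Clock : in  std_logic;
               Q             : out std_logic_vector(7 downto 0) );
    end component;
    constant PERIOD : time := 10 ns;   -- assumed clock period
    signal D      : std_logic_vector(7 downto 0) := (others => '0');
    signal Q      : std_logic_vector(7 downto 0);
    signal Resetn : std_logic := '0';  -- start with reset asserted
    signal Clock  : std_logic := '0';
    signal done   : boolean := false;
begin
    uut: reg8 port map ( D => D, Resetn => Resetn, Clock => Clock, Q => Q );

    -- Clock generator: uses wait statements and a while-loop
    clk_gen: process
    begin
        while not done loop
            Clock <= '0'; wait for PERIOD / 2;
            Clock <= '1'; wait for PERIOD / 2;
        end loop;
        wait;  -- stop the clock so the simulation can end
    end process;

    -- Stimulus and checks
    stimulus: process
    begin
        wait for PERIOD;
        assert Q = "00000000" report "reset failed" severity error;
        Resetn <= '1';
        D <= "10100101";
        wait until rising_edge(Clock);
        wait for 1 ns;  -- let the register output settle
        assert Q = "10100101" report "load failed" severity error;
        done <= true;
        wait;
    end process;
end test;
```

The `wait for` statements, the unbounded clock loop, and the `assert` checks have no direct hardware equivalent, which is acceptable here because only the design entity, not the testbench, is synthesized.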
Slide 54: How to learn VHDL for synthesis?
- Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.
- Lecture slides for ECE 545 from Fall 2004
- Tutorials available at the Programmable Logic Jump Station, http://www.optimagic.com/tutorials.html
- Practice, Practice, Practice!!!
Slide 55: Testbench
[Figure: a non-synthesizable testbench surrounding a synthesizable design entity, which may have several architectures: Architecture 1, Architecture 2, ..., Architecture N]
Slide 56: Design Environment
HDL Design (VHDL or Verilog)
Testbench (Analyzer in C or HDL)
Testbench (Generator in C or HDL)
Reference Model (in C)
Slide 57: Primary applications (1)
Execution units of general purpose microprocessors
- Integer units: integers (8, 16, 32, 64 bits)
- Floating point units: real numbers (32, 64 bits)
Slide 58: Primary applications (2)
Digital signal and digital image processing
- e.g., digital filters, Discrete Fourier Transform, Discrete Hilbert Transform
- General purpose DSP processors
- Specialized circuits
Real numbers
Slide 59: Primary applications (3)
Coding
- Error detection codes
- Error correcting codes
Elements of the Galois field $GF(2^n)$ (4-64 bits)
Slide 60: Secret-key (Symmetric) Cryptosystems
[Figure: Alice encrypts with the key of Alice and Bob, $K_{AB}$, and sends the result over the network; Bob decrypts with the same key $K_{AB}$]
Slide 61: Primary applications (4)
Cryptography
Secret key cryptography
- IDEA, RC6, Mars, Twofish, Rijndael
Elements of the Galois field $GF(2^n)$ (4, 8 bits)
Integers (16, 32 bits)
Slide 62
cipher | main operations | auxiliary operations
RC6 | 2 x SQR32, 2 x ROL32 | XOR, ADD/SUB32
MARS | MUL32, 2 x ROL32, S-box 9x32 | XOR, ADD/SUB32
Twofish | 96 S-box 4x4, 24 MUL $GF(2^8)$ | XOR, ADD32
Rijndael | 16 S-box 8x8, 24 MUL $GF(2^8)$ | XOR
Serpent | 8 x 32 S-box 4x4 | XOR
Slide 63: Public Key (Asymmetric) Cryptosystems
[Figure: Alice encrypts with the public key of Bob, $K_B$, and sends the result over the network; Bob decrypts with the private key of Bob, $k_B$]
Slide 64: RSA as a trap-door one-way function
PUBLIC KEY: $C = f(M) = M^e \bmod N$
PRIVATE KEY: $M = f^{-1}(C) = C^d \bmod N$
$N = P \cdot Q$
P, Q - large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
Slide 65: RSA keys
PUBLIC KEY: e, N
PRIVATE KEY: d, P, Q
$N = P \cdot Q$
P, Q - large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
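A toy numerical example may help to check the RSA relations. The primes and the message below are illustrative assumptions, far too small for any real use; they only show that the key equations and the two exponentiations fit together.

```latex
\begin{align*}
P &= 61, \quad Q = 53, \quad N = P \cdot Q = 3233 \\
(P-1)(Q-1) &= 60 \cdot 52 = 3120 \\
e &= 17, \quad d = 2753, \quad
  e \cdot d = 46801 = 15 \cdot 3120 + 1 \equiv 1 \pmod{3120} \\
C &= M^e \bmod N = 65^{17} \bmod 3233 = 2790 \\
M &= C^d \bmod N = 2790^{2753} \bmod 3233 = 65
\end{align*}
```

Here the public key is (e, N) = (17, 3233) and the private key is (d, P, Q) = (2753, 61, 53); decryption recovers the original message M = 65.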
Slide 66: Primary applications (5)
Cryptography
Public key cryptography
- RSA, DSS, Diffie-Hellman: long integers (1000-2000 bits)
- Elliptic Curve Cryptosystems: elements of the Galois field $GF(2^n)$ (150-250 bits)
Slide 67: Topic 1
$C = A \cdot B \bmod 2^{32}$, $C = A^2 \bmod 2^{32}$
Function: 32-bit unsigned multiplication and squaring modulo $2^{32}$
Application: modern secret-key ciphers, candidates to the new Advanced Encryption Standard (AES)
- MARS, developed by IBM
- RC6, developed at MIT
Environment: hardware, software for 8-bit processors
Optimization
- maximum throughput
- minimum latency
- minimum area
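As a starting point for Topic 1, a purely behavioral sketch of multiplication modulo $2^{32}$ is shown below. It assumes the `ieee.numeric_std` package and leaves the multiplier architecture entirely to the synthesis tool; the entity and port names are hypothetical. Reduction modulo $2^{32}$ amounts to keeping the 32 least significant bits of the 64-bit product.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: C = A * B mod 2^32 for 32-bit unsigned operands
entity mul_mod32 is
    port ( a, b : in  unsigned(31 downto 0);
           c    : out unsigned(31 downto 0) );
end mul_mod32;

architecture dataflow of mul_mod32 is
    signal full : unsigned(63 downto 0);  -- full 64-bit product
begin
    full <= a * b;
    -- reduction modulo 2^32: discard the upper 32 bits
    c <= full(31 downto 0);
end dataflow;
```

Squaring modulo $2^{32}$ follows by connecting the same signal to both `a` and `b`; a dedicated squarer could exploit the symmetry of the partial products, which this sketch does not attempt.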
Slide 68: Topic 2
$C = \sum_{i=1}^{256} A_i \cdot B_i$
Function: 64-bit signed multiplier-accumulator (MAC) accumulating at least 256 partial products
Application: digital filters
Environment: hardware, software for a general purpose DSP or microprocessor
Optimization
- Hardware: maximum throughput, limited area
- Software: minimum execution time, limited memory
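A behavioral sketch of a multiplier-accumulator is given below as one possible starting point for Topic 2. It assumes that "64-bit" refers to the operand width, uses the `ieee.numeric_std` package, and adds 8 guard bits to the accumulator so that 256 products can be summed without overflow. The entity name, the generics `W` and `GUARD`, and the synchronous `clear` input are all assumptions made for this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical MAC: acc <= acc + a * b on every rising clock edge
entity mac is
    generic ( W     : positive := 64;   -- assumed operand width
              GUARD : natural  := 8 );  -- 2^8 = 256 accumulated products
    port ( clk, clear : in  std_logic;
           a, b       : in  signed(W-1 downto 0);
           acc        : out signed(2*W+GUARD-1 downto 0) );
end mac;

architecture rtl of mac is
    signal sum : signed(2*W+GUARD-1 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if clear = '1' then
                sum <= (others => '0');
            else
                -- a * b is 2*W bits wide; resize sign-extends it
                -- to the accumulator width before the addition
                sum <= sum + resize(a * b, sum'length);
            end if;
        end if;
    end process;

    acc <= sum;
end rtl;
```

The sketch leaves the multiplier and adder structure to the synthesis tool, so it says nothing about the throughput and area trade-offs that the project is meant to explore.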
Slide 69: Topic 3
$C = A \cdot B$, $C = A / B$
Function:
- multiplication of two 64-bit signed numbers
- division of a 128-bit number by a 64-bit number
Application: general purpose microprocessor
Environment: hardware, software for a 64-bit processor without multiplication and division built in
Optimization
- Hardware: minimum latency, maximum throughput, limited area
- Software: minimum execution time, limited memory
Slide 70: Topic 4
$C = A^E \bmod N$
Function: modular exponentiation $C = M^E \bmod N$; M, N arbitrary 768-bit numbers, $E = 2^{16} + 1$
Application: modern public-key ciphers
- RSA
- Diffie-Hellman
- Elliptic Curve Cryptosystems
Environment: hardware, software for 32-bit or 8-bit processors
Optimization
- Hardware: minimum latency, limited area
- Software: minimum execution time, limited memory
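The choice of exponent in Topic 4 can be motivated by a short derivation, assuming the standard binary (square-and-multiply) method of exponentiation; other exponentiation methods are possible and are not ruled out by the topic.

```latex
\begin{align*}
E &= 2^{16} + 1 = 65537 = (1\,\underbrace{0 \ldots 0}_{15}\,1)_2 \\
C &= M^{2^{16}+1} \bmod N = \left( M^{2^{16}} \cdot M \right) \bmod N \\
M^{2^{16}} \bmod N &= \bigl(\cdots\bigl((M^2)^2\bigr)\cdots\bigr)^2 \bmod N
  \quad \text{(16 successive modular squarings)}
\end{align*}
```

With this exponent the whole operation takes 16 modular squarings and 1 modular multiplication, whereas a random 768-bit exponent would need roughly 768 squarings and, on average, about half as many multiplications with the same method.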
Slide 71: Topic 5
$Z = X + Y$, $Z = X \cdot Y$
Function: floating point addition and multiplication according to ANSI/IEEE 754
Application: general purpose microprocessor or digital signal processor
Environment: hardware, software for a 32-bit processor without floating point operations
Optimization
- Hardware: minimum latency, maximum throughput, limited area
- Software: minimum execution time, limited memory