Title: EASTERN MEDITERRANEAN UNIVERSITY
EASTERN MEDITERRANEAN UNIVERSITY
DEPARTMENT OF COMPUTER ENGINEERING
Evgeny Dukhnich
PRESENTATION
FAST HARDWARE-ORIENTED ALGORITHMS for DSP
1. Introduction
- The very fast growth of modern VLSI complexity makes it possible to realize an ever-growing share of mathematical operations in hardware, which substantially raises computer performance.
- However, familiar computational algorithms for signal processing are not hardware-oriented. The exceptions are the famous CORDIC algorithms and some FFT algorithms.
MODERN SIGNAL PROCESSING
- Applications
  - image processing, computer vision, speech, radar, and so on
- Algorithms
  - matrix transforms, convolution, filtering, positioning, etc.
- Main problems
  - linear systems, eigenvalues, singular values, least squares, and so on
- Candidate solutions
  - QR decomposition by Givens rotations or Householder reflections
AIMS OF THE RESEARCH
- Increase the effectiveness of DSP hardware
- Design new hardware-oriented algorithms for DSP
- Implement the algorithms by designing new VLSI chips
Figure: software (a) and hardware (b) realization of the algorithm f = ab/c d
VLSI-technology requirements
- algorithms must have guaranteed accuracy and converge after a fixed number of steps;
- every step of the algorithm must consist of a limited set of simple operations (add, shift, etc.) with the same short execution time;
- algorithms must be decomposable into equal parts of a limited number of types;
- algorithms must realize the most typical computing procedures, which are frequently found in signal processing methods.
What is the CORDIC Algorithm?
- CORDIC: COordinate Rotation DIgital Computer
- Introduced by J. Volder in 1959 [1].
- Aim: to build a special-purpose digital computer for airborne navigation, capable of
  - performing rotations,
  - computing sine, cosine and arctangent,
  - multiplying or dividing numbers using only shift-and-add elementary steps.
- In 1971, J. Walther [2] generalized the algorithm:
  - it also computes logarithms, exponentials and square roots.
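The shift-and-add principle can be illustrated with linear-mode CORDIC, the multiplication variant covered by Walther's generalization. The sketch below is a minimal Python illustration under stated assumptions, not taken from the slides: `cordic_multiply` is a hypothetical name, both operands are fixed-point integers with `frac_bits` fractional bits, and |z| < 2 is assumed so that the series of halving steps can reach it.

```python
def cordic_multiply(x: int, z: int, frac_bits: int = 16) -> int:
    """Linear-mode CORDIC: returns approximately x * z in the same fixed-point format.

    Hypothetical helper for illustration. Only additions, subtractions and
    arithmetic right shifts are used; |z| must be below 2.0 (in fixed point).
    """
    one = 1 << frac_bits
    y = 0
    for i in range(frac_bits + 1):
        step_x = x >> i       # x * 2^-i (arithmetic shift, floors for negative x)
        step_z = one >> i     # 2^-i in the same fixed-point format
        if z >= 0:            # drive the residual multiplier z toward 0
            y += step_x
            z -= step_z
        else:
            y -= step_x
            z += step_z
    return y
```

For example, with 16 fractional bits, 1.5 × 0.75 is `cordic_multiply(98304, 49152)`, which lands within a few units in the last place of 73728 (= 1.125); the small error comes from truncating the shifted operands.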
What is the CORDIC Algorithm? (continued)
- Used in
  - HP-35 (pocket calculator), June 1972
  - Intel 8087 (arithmetic coprocessor), 1983
  - solving linear systems, 1982
  - signal processing applications, July 1995
  - filtering, 1984
  - singular value decomposition (SVD), June 1993
  - complex SVD, June 2000
CORDIC Algorithm: Key Ideas
If we have a computationally efficient way of rotating a vector, we can evaluate the cos, sin, and tan^{-1} functions. Rotation by an arbitrary angle is difficult, so we perform pseudorotations, using special angles to synthesize a desired angle z:
$$z = \alpha(1) + \alpha(2) + \dots + \alpha(m)$$
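2-D CORDIC Algorithm
Figure: vector R = V(X, Y) in the X-Y plane, rotated from its initial angle θ by an angle φ.
Vector R will be rotated from its initial angle θ by an angle φ:
$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$
This requires multiplications. Using a series of smaller rotation angles $\varphi_i$, we can avoid the multiplications:
$$\varphi = \sum_{i=0}^{n} \xi_i \varphi_i, \qquad \xi_i \in \{-1, 1\}, \qquad \varphi_i = \arctan(2^{-i})$$
An elementary rotation matrix $P_i$ can then be derived whose elements are ones and signed integer powers of two:
$$P_i = \begin{pmatrix} 1 & -\xi_i 2^{-i} \\ \xi_i 2^{-i} & 1 \end{pmatrix}$$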
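A quick numerical check of this idea, as a sketch under assumptions: the helper names are hypothetical and the sign sequence used in the check is arbitrary. Each $P_i$ equals $\sqrt{1 + 2^{-2i}}$ times an exact rotation by $\xi_i \arctan(2^{-i})$, so a product of n elementary matrices is an exact rotation scaled by the constant $k = \prod_i \sqrt{1 + 2^{-2i}}$.

```python
import math


def elementary_matrix(i, xi):
    """P_i = [[1, -xi*2^-i], [xi*2^-i, 1]] with xi in {-1, +1}."""
    t = xi * 2.0 ** -i
    return [[1.0, -t], [t, 1.0]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[r][q] * b[q][c] for q in range(2)) for c in range(2)]
            for r in range(2)]


def product_of_pseudorotations(signs):
    """Return P_{n-1} ... P_1 P_0 for the sign sequence xi_0 .. xi_{n-1}."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for i, xi in enumerate(signs):
        m = matmul2(elementary_matrix(i, xi), m)
    return m


def scale_factor(n):
    """k = prod_{i<n} sqrt(1 + 2^-2i): length growth caused by n pseudorotations."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))
```

Dividing the product by k leaves an exact rotation by $\varphi = \sum_i \xi_i \arctan(2^{-i})$, whatever the signs are; k itself tends to about 1.6468 as n grows, which is why it can be precomputed once.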
CORDIC Algorithm: Key Ideas (continued)
Rotate the vector OE(i) with end point at (x(i), y(i)) by α(i):
$$x(i+1) = x(i)\cos\alpha(i) - y(i)\sin\alpha(i) = \frac{x(i) - y(i)\tan\alpha(i)}{(1 + \tan^2\alpha(i))^{1/2}}$$
$$y(i+1) = y(i)\cos\alpha(i) + x(i)\sin\alpha(i) = \frac{y(i) + x(i)\tan\alpha(i)}{(1 + \tan^2\alpha(i))^{1/2}}$$
$$z(i+1) = z(i) - \alpha(i)$$
Goal: eliminate the divisions by $(1 + \tan^2\alpha(i))^{1/2}$ and choose α(i) so that tan α(i) is a power of 2.
Basic CORDIC Iterations
Pick α(i) such that $\tan\alpha(i) = d_i 2^{-i}$, $d_i \in \{-1, 1\}$. The CORDIC iteration is
$$x(i+1) = x(i) - d_i\, y(i)\, 2^{-i}$$
$$y(i+1) = y(i) + d_i\, x(i)\, 2^{-i}$$
$$z(i+1) = z(i) - d_i \tan^{-1} 2^{-i}$$
If we always pseudorotate by the same set of angles (with + or - signs), then the expansion factor K is a constant that can be precomputed.
Example: pseudorotation for 30 degrees, using the angles $e(i) = \tan^{-1} 2^{-i}$:
30.0 ≈ 45.0 - 26.6 + 14.0 - 7.1 + 3.6 + 1.8 - 0.9 + 0.4 - 0.2 + 0.1 = 30.1
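The iteration and the 30-degree example can be reproduced with a short floating-point sketch; this is an illustration, not the presenter's implementation. The name `cordic_sin_cos` and the choice of 32 iterations are assumptions. Choosing $d_i = \operatorname{sign} z(i)$ greedily gives the signs +, -, +, -, +, +, -, +, - for the first nine angles of the 30-degree example, and starting from x = 1/K keeps the final vector at unit length, so x and y approach cos z and sin z.

```python
import math


def cordic_sin_cos(angle_deg, n=32):
    """Rotation-mode CORDIC: returns (cos, sin, signs) for |angle_deg| below ~99.88.

    Hypothetical helper. Multiplications by d and 2^-i stand for sign
    changes and shifts in hardware.
    """
    e = [math.degrees(math.atan(2.0 ** -i)) for i in range(n)]   # e(i) in degrees
    k = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))  # expansion factor K
    x, y, z = 1.0 / k, 0.0, angle_deg   # start at 1/K so the final length is 1
    signs = []
    for i in range(n):
        d = 1 if z >= 0 else -1          # rotate toward the remaining angle z
        signs.append(d)
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * e[i]
    return x, y, signs
```

Usage: `c, s, signs = cordic_sin_cos(30.0)` gives c ≈ 0.8660 and s ≈ 0.5. The greedy rule differs from the example above only in the tenth sign, which just picks the closer side of 30 degrees.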
Basic CORDIC Iteration
CORDIC Rotation Mode
CORDIC Vectoring Mode
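In vectoring mode the same iteration is driven by the sign of y instead of z: y is forced toward 0, x converges to K times the vector length, and z accumulates the rotated angle. A minimal sketch under the same assumptions (hypothetical function name, floating point instead of fixed point, and x > 0 so the angle lies inside the roughly ±99.9 degree convergence range):

```python
import math


def cordic_vectoring(x, y, n=32):
    """Vectoring-mode CORDIC: returns (magnitude, angle in radians) of (x, y).

    Hypothetical helper; assumes x > 0. The result approximates
    (hypot(x, y), atan2(y, x)).
    """
    k = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))  # expansion factor K
    z = 0.0
    for i in range(n):
        d = -1 if y >= 0 else 1          # rotate so that y moves toward 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)    # accumulate the angle rotated away
    return x / k, z
```

Usage: `cordic_vectoring(3.0, 4.0)` returns approximately (5.0, 0.9273), i.e. the length and atan2(4, 3).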
Generalized CORDIC
Rotation Modes
CORDIC Rotation/Vectoring Modes
CORDIC processor
Iterative CORDIC Structure
Taken from "A Survey of CORDIC Algorithms for FPGA Based Computers", R. Andraka, FPGA '98.
Publications
1. E. Doukhnitch, Highly parallel multidimensional CORDIC-like algorithms, Artificial Intelligence, No. 3, pp. 284-293, Ukraine, 2001.
2. E. Doukhnitch, One way to execute digital linear transform, Kibernetica (Cybernetics and Systems Analysis, ISSN 1060-0396), No. 5, pp. 96-98, Kiev, May 1982.
3. E. Doukhnitch, Multidimensional CORDIC-like Algorithms for DSP, in Proc. of the Sixteenth Intern. Symp. on Computer and Information Sciences, ISCIS XVI, pp. 368-375, Antalya, Turkey, Nov. 2001.
4. E. Doukhnitch, Octonion CORDIC Algorithms for DSP, in Proc. of the 6th Symp. on Signal Processing, DSPCS2002, pp. 158-163, Sydney, Australia, Jan. 2002.
5. E. Doukhnitch, Hardware-oriented Algorithms for Fast Householder Transform, in Proc. of the First Intern. Conference on Signal Processing and Applications, DSPA-98, v.2e, pp. 129-132, Moscow, July 1998.
Transforming any given matrix M into triangular form
Figure: (orthogonal matrix) × M = (upper triangular matrix), where in general all elements of M are nonzero.
WHAT IS GIVENS ROTATION?
- A Givens transformation is an orthogonal matrix used to zero a selected entry of a matrix.
- That is, Givens rotations introduce zeros one at a time.
- The Givens rotation is realized by the well-known hardware-oriented CORDIC algorithm.
GIVENS ROTATION
2-D CORDIC Algorithm
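- $kV' = \prod_{i=0}^{N} P_i\, V$, where V' is the rotated vector and k is the constant expansion factor of the pseudorotations
- N is the number of iterations (approximately the number of bits used to represent the result)
- To zero the element Y, the rotation direction at each step (the sign in $P_i$) can be taken as the opposite of sign(X·Y) for the current coordinates. So the elements of the matrix P do not have to be computed, and a parallel implementation of the rotation on many vectors by the same angle is possible.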
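The reuse of rotation directions can be sketched as follows: the direction at each step is decided only from the pivot column (X, Y), and the same shift-and-add step is then applied to every column of the two rows, which is what a Givens rotation of two matrix rows needs. The function name and iteration count are assumptions; the final division by the constant k stands in for whatever scaling a hardware design would apply.

```python
import math


def cordic_givens(row_x, row_y, n=40):
    """Rotate two matrix rows so that row_y[0] becomes 0, using CORDIC steps.

    Hypothetical helper. The direction of every pseudorotation is decided from
    the pivot column (row_x[0], row_y[0]) only, then the same shift-and-add
    step is applied to all columns, so no rotation-matrix elements are computed.
    """
    rx, ry = list(row_x), list(row_y)
    for i in range(n):
        xi = -1 if rx[0] * ry[0] >= 0 else 1    # opposite of sign(X * Y)
        t = xi * 2.0 ** -i                      # a shift plus a sign in hardware
        for j in range(len(rx)):
            rx[j], ry[j] = rx[j] - t * ry[j], ry[j] + t * rx[j]
    k = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))
    return [v / k for v in rx], [v / k for v in ry]   # remove the constant growth
```

Usage: for rows (3, 1, 2) and (4, 5, 6), the pivot column (3, 4) becomes (5, 0) and the other columns are rotated by the same angle, giving approximately (5, 4.6, 6) and (0, 2.2, 2).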
CORDIC module array for 4×4 matrix triangularization
CORDIC ALGORITHM OF 2-D PLANE ROTATION
4-D QUATERNION CORDIC ALGORITHM
- An extension of the original 2-D CORDIC algorithm to a 4-D Euclidean algorithm
Publications
1. E. Doukhnitch, Synthesis for the Discrete Quaternion Transform Algorithms Class, in Proc. of Intern. Conf. on Intelligent Multiprocessor Systems, pp. 44-48, Taganrog, Russia, 1999.
2. E. Doukhnitch, O. Strelnikov, A. Andreev, Application of Kronecker Matrix Product for the Synthesis of Hardware-oriented DSP Algorithms, in Proc. of Intern. Conf. on Signal Proc. DSPA-99, pp. 78-83, Moscow, Sept. 1999.
CONTROL SIGNS are either 1 or -1
- The rotation parameters are $t_k = 2^{-f(k)}$, where f(k) is a set of non-decreasing positive integers
Architecture of a 4-D quaternion processor
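HOUSEHOLDER TRANSFORMATION
- A Householder matrix P is a symmetric and orthogonal matrix of the form $P = I - ww^T$, with unit matrix I and real vector w ($w^Tw = 2$).
- If $a_1$ is the first column of matrix A, then
$$w = \mu u, \qquad u = (a_{11} - s,\ a_{21},\ \dots,\ a_{m1})^T, \qquad s = (a_1^T a_1)^{1/2}, \qquad \mu = (s^2 - a_{11}s)^{-1/2}$$
- Then the result of the Householder transformation is
$$PA = \begin{pmatrix} s & a_{12} & \cdots & a_{1m} \\ 0 & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & a_{m2} & \cdots & a_{mm} \end{pmatrix}$$
Repeating these macro-operations with vectors $w_j$ (j = 2, 3, ..., m) produces an upper triangular matrix A.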
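The macro-operation can be sketched directly from the formulas for s, u and μ (so that $w^Tw = 2$ and $P = I - ww^T$). This is a plain floating-point illustration with a hypothetical function name, not a hardware algorithm; production codes usually choose the sign of s to avoid cancellation in $a_{11} - s$, which this sketch deliberately does not do in order to follow the formulas as written.

```python
import math


def householder_triangularize(a):
    """Upper-triangularize a square matrix with Householder reflections.

    Hypothetical helper. Uses w = mu * u with u = (a11 - s, a21, ..., am1),
    s = |a1| and mu = (s^2 - a11*s)^(-1/2), so that w^T w = 2 and each
    reflection P = I - w w^T maps the current column to (s, 0, ..., 0).
    """
    a = [row[:] for row in a]
    m = len(a)
    for j in range(m - 1):
        col = [a[r][j] for r in range(j, m)]
        s = math.sqrt(sum(v * v for v in col))
        denom = s * s - col[0] * s
        if denom == 0.0:
            continue  # column already has zeros below the diagonal
        mu = 1.0 / math.sqrt(denom)
        w = [mu * (col[0] - s)] + [mu * v for v in col[1:]]
        for c in range(j, m):                    # apply P = I - w w^T to column c
            dot = sum(w[r - j] * a[r][c] for r in range(j, m))
            for r in range(j, m):
                a[r][c] -= w[r - j] * dot
    return a
```

For a 3×3 matrix this performs the two macro-operations needed; the first diagonal entry is $s = |a_1|$, and the product of the diagonal equals det A up to sign.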
HOUSEHOLDER TRANSFORMATION Example
From $a^T = (1, 4, 7)$, with $s = \sqrt{66} \approx 8.124$, construct the auxiliary vector $u^T = (-7.124, 4, 7)$. Normalize it to unit length: $w^T = (-0.662, 0.372, 0.651)$, so that $P_1 = I - 2ww^T$.
Treat the lower 2×2 submatrix of $P_1A$. From $a^T = (4.602, -0.696)$ construct a 2×2 Householder matrix, then add a trivial first row and column to promote it to a 3×3 matrix.
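The first step of the example can be checked numerically (a sketch; the variable names are illustrative). Here w is scaled to unit length, so the reflection is written $P_1 = I - 2ww^T$; with the $w^Tw = 2$ scaling of the general form, w would simply be √2 times larger.

```python
import math

a = [1.0, 4.0, 7.0]
s = math.sqrt(sum(v * v for v in a))            # s = sqrt(66), about 8.124
u = [a[0] - s] + a[1:]                           # auxiliary vector u
norm_u = math.sqrt(sum(v * v for v in u))
w = [v / norm_u for v in u]                      # unit-length w, so P1 = I - 2 w w^T

# Apply P1 = I - 2 w w^T to a: the result should be (s, 0, 0).
dot = sum(wi * ai for wi, ai in zip(w, a))
p1_a = [ai - 2.0 * wi * dot for wi, ai in zip(w, a)]

print([round(v, 3) for v in u])    # [-7.124, 4.0, 7.0]
print([round(v, 3) for v in w])    # [-0.662, 0.372, 0.651]
```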
FAST HOUSEHOLDER TRANSFORM (1985)
- Method with simplified operations: shifts and adds
- Suitable for VLSI design
Factorization of matrix P: $P \approx \prod_{i=0}^{n} P_i$. Iterative process for the PA transformation:
$$A_{i+1} = P_i A_i, \qquad i = 0, 1, 2, \dots, n, \qquad A_0 = A \tag{1}$$
The result is $A_n \to PA$ as $n \to \infty$. The process is represented as a sequence of elementary reflections; for m = 2 they use the vector $w_i^T = (-2^{-i}c_1(i),\ c_2(i))$, where $c_1(i)$, $c_2(i)$ give the direction (sign) for each coordinate.
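FAST HOUSEHOLDER TRANSFORM (Continued)
- In this case $w_i^Tw_i = k_i = 1 + 2^{-2i}$, which in general is not 2, therefore
$$P_i = k_i^{-1}(k_iI - 2w_iw_i^T) = k_i^{-1}T_i,$$
  where
$$T_i = \begin{pmatrix} 1 - 2^{-2i} & 2^{-i+1}c_1(i)c_2(i) \\ 2^{-i+1}c_1(i)c_2(i) & 2^{-2i} - 1 \end{pmatrix}$$
- FHT algorithm for m = 2:
$$A_{i+1} = k_i^{-1}T_iA_i, \quad c_1(i) = \operatorname{sgn} a_{11}(i), \quad c_2(i) = \operatorname{sgn} a_{21}(i), \quad i = 0, 1, \dots, n, \quad A_0 = A$$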
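A floating-point sketch of the m = 2 iteration, assuming sgn 0 = +1 and a nonzero $a_{11}$ (both assumptions made for this sketch). $T_i$ is formed as $k_iI - 2w_iw_i^T$ from $w_i = (-2^{-i}c_1, c_2)$, which gives off-diagonal entries $2^{1-i}c_1c_2$. No angle table is needed: each step uses only the signs of the first column and powers of two; the division by $k_i$ is kept only so the result is directly comparable with an exact triangularization. The function name is hypothetical.

```python
def fht2(a, n=40):
    """FHT for m = 2: A_{i+1} = k_i^{-1} T_i A_i, i = 0 .. n-1.

    Hypothetical helper. a is a 2 x m matrix given as a list of two rows;
    the signs c1, c2 come from the first column, with sgn(0) taken as +1.
    """
    a = [row[:] for row in a]
    for i in range(n):
        c1 = 1.0 if a[0][0] >= 0 else -1.0
        c2 = 1.0 if a[1][0] >= 0 else -1.0
        e = 2.0 ** -i
        k = 1.0 + e * e                                   # k_i = w_i^T w_i
        t11, t12, t22 = 1.0 - e * e, 2.0 * e * c1 * c2, e * e - 1.0
        for j in range(len(a[0])):                        # same T_i for every column
            x, y = a[0][j], a[1][j]
            a[0][j] = (t11 * x + t12 * y) / k
            a[1][j] = (t12 * x + t22 * y) / k
    return a
```

For A = [[3, 1], [4, 2]] the first column converges to (5, 0) and the second to (2.2, ±0.4), the same result an exact orthogonal triangularization gives.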
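ITERATIVE REFLECTIONS
Figure: successive reflection vectors w0, w1, w2 (steps i = 0, 1, 2) in the (a11, a21) plane, with components -2^{-i} and sign a21(i); after n steps (i = n), a21 is driven to 0.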
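FHT ALGORITHM FOR m = 3
$$w_i^T = (-2^{-i}c_1(i),\ c_2(i),\ c_3(i)), \qquad k_i = 2^{-2i} + 2, \qquad P_i = k_i^{-1}(k_iI - 2w_iw_i^T) = k_i^{-1}T_i,$$
where
$$T_i = \begin{pmatrix} 2 - 2^{-2i} & 2^{-i+1}c_1(i)c_2(i) & 2^{-i+1}c_1(i)c_3(i) \\ 2^{-i+1}c_1(i)c_2(i) & 2^{-2i} & -2c_2(i)c_3(i) \\ 2^{-i+1}c_1(i)c_3(i) & -2c_2(i)c_3(i) & 2^{-2i} \end{pmatrix}$$
$$A_{i+1} = k_i^{-1}T_iA_i, \quad c_1(i) = \operatorname{sign} a_{11}(i), \quad c_2(i) = \operatorname{sign} a_{21}(i), \quad c_3(i) = \operatorname{sign} a_{31}(i), \quad i = 0, 1, \dots, n, \quad A_0 = A$$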
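The m = 3 version follows the same pattern with the 3×3 matrix $T_i$ and $k_i = 2 + 2^{-2i}$. Again a sketch with an assumed function name and sgn 0 = +1; the check uses a single simple column, (1, 0.5, 0.5), whose length √1.5 should end up in the first position.

```python
def fht3(a, n=40):
    """FHT for m = 3: A_{i+1} = k_i^{-1} T_i A_i with k_i = 2 + 2^-2i.

    Hypothetical helper. a is a 3 x m matrix given as a list of three rows;
    the signs c1, c2, c3 come from the first column, with sgn(0) taken as +1.
    """
    a = [row[:] for row in a]
    for i in range(n):
        c1, c2, c3 = (1.0 if a[r][0] >= 0 else -1.0 for r in range(3))
        e = 2.0 ** -i
        k = 2.0 + e * e
        t = [
            [2.0 - e * e, 2.0 * e * c1 * c2, 2.0 * e * c1 * c3],
            [2.0 * e * c1 * c2, e * e, -2.0 * c2 * c3],
            [2.0 * e * c1 * c3, -2.0 * c2 * c3, e * e],
        ]
        # A_{i+1} = T_i A_i / k_i, computed for every column at once
        a = [[sum(t[r][q] * a[q][j] for q in range(3)) / k for j in range(len(a[0]))]
             for r in range(3)]
    return a
```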
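DESIGN OF CHIP
Let us represent the matrix $T_i$, divided by 2, as
$$T_i = \begin{pmatrix} 1 - 2^{-(2i+1)} & 2^{-i}c_1(i)c_2(i) & 2^{-i}c_1(i)c_3(i) \\ 2^{-i}c_1(i)c_2(i) & 2^{-(2i+1)} & -c_2(i)c_3(i) \\ 2^{-i}c_1(i)c_3(i) & -c_2(i)c_3(i) & 2^{-(2i+1)} \end{pmatrix}, \qquad a_{i+1} = T_i a_i$$
For n = 8, the coefficient will be
$$K = \prod_{i=0}^{7} \frac{k_i}{2}, \qquad \frac{k_i}{2} = 2^{-(2i+1)} + 1$$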
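Halving $T_i$ makes its entries $1 - 2^{-(2i+1)}$, $2^{-i}$, $2^{-(2i+1)}$ and ±1 (times sign products), so every product is a shift, and the accumulated length growth is the single constant K. A sketch (function name assumed, floating point standing in for shifts) that computes K for n = 8 and confirms that dividing the chip-form result by K restores the original length:

```python
import math


def chip_steps(col, n=8):
    """Apply the halved matrices T_i (chip form) to a 3-element column.

    Hypothetical helper. Each step multiplies the length by k_i / 2, so the
    result equals K times the exactly normalized FHT result.
    """
    x = list(col)
    for i in range(n):
        c1, c2, c3 = (1.0 if v >= 0 else -1.0 for v in x)
        h, q = 2.0 ** -(2 * i + 1), 2.0 ** -i
        x = [
            (1.0 - h) * x[0] + q * c1 * c2 * x[1] + q * c1 * c3 * x[2],
            q * c1 * c2 * x[0] + h * x[1] - c2 * c3 * x[2],
            q * c1 * c3 * x[0] - c2 * c3 * x[1] + h * x[2],
        ]
    return x


K = math.prod(1.0 + 2.0 ** -(2 * i + 1) for i in range(8))   # K = prod k_i / 2
```

K comes out at about 1.7584, a constant that can be compensated once rather than at every step.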
DESIGN OF CHIP (continued)
$a_1$ is the first column of matrix A, and $a_{11}(i) = X(i)$, $a_{21}(i) = Y(i)$, $a_{31}(i) = Z(i)$.
UNITS FOR STEP i
Each unit performs the FHT algorithm for m = 3.
Figure: three units u1(i), u2(i), u3(i), each with sign inputs c1, c2, c3 and inputs a1, a2, a3; unit uj(i) updates the row elements aj1, aj2, aj3 from step i to step i+1. The target result is
$$PA = \begin{pmatrix} s & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{pmatrix}$$
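m-D Householder CORDIC Algorithm
- The rotational matrix $R_{m,i}$ is
$$R_{m,i} = \frac{1}{1 + (m-1)t_i^2}(I + S_i),$$
$$I + S_i = \begin{pmatrix} 1 - (m-1)t_i^2 & 2t_i & 2t_i & \cdots & 2t_i \\ -2t_i & 1 + (m-3)t_i^2 & -2t_i^2 & \cdots & -2t_i^2 \\ -2t_i & -2t_i^2 & 1 + (m-3)t_i^2 & \cdots & -2t_i^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -2t_i & -2t_i^2 & -2t_i^2 & \cdots & 1 + (m-3)t_i^2 \end{pmatrix}$$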
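Although the normalization $1/(1 + (m-1)t_i^2)$ is not a power of two, the matrix is exactly orthogonal for every $t_i$, and it equals I at $t_i = 0$, so it is a proper rotation. A small sketch (hypothetical helper names) builds $R_{m,i}$ from the formula and checks $RR^T = I$ numerically:

```python
def householder_cordic_matrix(m, t):
    """R_{m,i} = (I + S_i) / (1 + (m-1) t^2) for the parameter t = t_i (hypothetical helper)."""
    scale = 1.0 + (m - 1) * t * t
    r = [[0.0] * m for _ in range(m)]
    r[0][0] = 1.0 - (m - 1) * t * t
    for j in range(1, m):
        r[0][j] = 2.0 * t          # first row: 2 t_i
        r[j][0] = -2.0 * t         # first column: -2 t_i
        for c in range(1, m):
            r[j][c] = 1.0 + (m - 3) * t * t if j == c else -2.0 * t * t
    return [[v / scale for v in row] for row in r]


def is_orthogonal(r, tol=1e-12):
    """True if the rows of r are orthonormal, i.e. R R^T = I within tol."""
    m = len(r)
    return all(
        abs(sum(r[p][q] * r[s][q] for q in range(m)) - (1.0 if p == s else 0.0)) < tol
        for p in range(m) for s in range(m)
    )
```

With $t_i = 2^{-f(i)}$, the entries $2t_i$ and $2t_i^2$ of $I + S_i$ are pure shifts, so only the common scale factor needs separate treatment.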
Octonion CORDIC Algorithm (2002)
- $R_{8,i}$ is an 8×8 matrix with unit diagonal whose off-diagonal entries are $\pm t_i$, each multiplied by one of the control signs. The construction uses a matrix E that differs from a unit matrix by $e_{11} = -1$.
- Control signs: for each coordinate $k = 2, \dots, 8$, the control sign is $f_i \cdot \operatorname{sign}(y_{k,i})$, where $f_i = \operatorname{sign}(y_{1,i})$.
- $t_i = 2^{-f(i)}$, where f(i) is the shift sequence 0, 1, 2, 3, 4, 5, 6, 7, 8, ..., 13, 14, ..., 32, ...
Figure: scaling factor.
Matrix Triangularization by Octonion CORDIC Algorithm
8×8 matrix triangularization: the triangular matrix is formed in 8 steps, and the result of each step is saved.
Some Anticipated Questions
- Why not higher dimensions, like 16-D?
  - Because the Cayley numbers (octonions) are the end point of a very interesting sequence of algebras: a normed division algebra of dimension greater than 8 is not possible.
- Since Householder exists for m-D, why would we need Octonion CORDIC?
  - There is no shift sequence for which convergence is proved for 8-D Householder.
  - Octonion CORDIC generalizes Quaternion CORDIC, of which the original CORDIC algorithm is a particular case.
  - The hardware complexity of Octonion CORDIC will be lower than that of Householder, and it will hopefully run faster.
Some patents
- E. Doukhnitch, O. Strelnikov, Special-purpose processor for eigenvalue decomposition, Patent 2168760, published in Russian Patent Bulletin No. 16, 2001.
- E. Doukhnitch, S. Derevenskov, Matrix processor, Patent 2079879, published in Russian Patent Bulletin No. 14, 1997.
- E. Doukhnitch, Unit for m-D coordinate transformation, Patent 2029356, published in Russian Patent Bulletin No. 5, 1995.
Fast Hardware-Oriented Algorithm for Cellular Mobiles Positioning (2004)
- Finding the location of a cellular mobile phone is one of the important features of 3G wireless communication systems, and many valuable location-based services can be enabled by it. All location determination techniques based on cellular system signals and the Global Positioning System (GPS) use standard, computationally complex trigonometric methods that are usually implemented in software.
Publications
1. E. Doukhnitch, Muhammed Salamah, Deniz Devrim, A fast hardware-oriented algorithm for cellular mobiles positioning, LNCS, Vol. 3280, Springer-Verlag (2004), pp. 267-277, ISSN 0302-9743.
2. E. Doukhnitch, Muhammed Salamah, Fast hardware-oriented algorithm for 2-D positioning, Artificial Intelligence, No. 4, pp. 69-78, Ukraine, 2004, ISSN 1561-5359.
Time of arrival (TOA) position determination method
Figure: TOA position determination for a mobile station (MS).
The traditional algorithm
Positions of Base Stations
The SDD algorithm
Idea of Positioning
Parallel rotations of vectors
    While (x_i - x_{i+1} > d)
        Rotate d1
        While (y_i > y_{i+1})
            Rotate d2
        End while
    End while
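FAST ROTATIONS
- The rotation matrix M is as follows:
$$M = \begin{pmatrix} \cos A & \sin A \\ -\sin A & \cos A \end{pmatrix}$$
- We took the sine function as
$$\sin A = 2^{-k}$$
- and we approximate the cosine function, using the first-order expansion of $\sqrt{1 - 2^{-2k}}$, as
$$\cos A \approx 1 - 2^{-(2k+1)}$$
- Therefore, the BS or vector coordinates are recursively rotated as follows:
$$x_{i+1} = x_i - x_i a + y_i b$$
$$y_{i+1} = y_i - y_i a - x_i b$$
- where $a = 1 - \cos A \approx 2^{-(2k+1)}$ and $b = \sin A = 2^{-k}$.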
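A sketch of one fast rotation step on fixed-point integers, where every product becomes an arithmetic shift. It assumes the reading under which the recursion is a rotation by -A, namely $a = 2^{-(2k+1)} \approx 1 - \cos A$ and $b = \sin A = 2^{-k}$; the function name is hypothetical. Because $(1 - a)^2 + b^2 = 1 + 2^{-(4k+2)}$, each step lengthens the vector only by the factor $\sqrt{1 + 2^{-(4k+2)}}$.

```python
def fast_rotate(x: int, y: int, k: int) -> tuple[int, int]:
    """One shift-and-add rotation by -A with sin A = 2^-k, cos A ~ 1 - 2^-(2k+1).

    Hypothetical helper. x, y are fixed-point integers; >> is an arithmetic
    shift. Implements x' = x - a*x + b*y and y' = y - a*y - b*x with
    a = 2^-(2k+1) and b = 2^-k.
    """
    return (x - (x >> (2 * k + 1)) + (y >> k),
            y - (y >> (2 * k + 1)) - (x >> k))
```

For example, `fast_rotate(1 << 20, 0, 2)` rotates the unit vector (in 20-bit fixed point) by about -14.5 degrees, and its squared length grows by exactly $1 + 2^{-10}$.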
Weights of the Operations
Number of Operations Versus sin(s) for the Traditional Algorithm and Ours
Conclusion
A fast hardware-oriented algorithm for locating a mobile in a cellular network with high accuracy has been presented. Its main benefit is that it avoids the calculation of trigonometric functions. The calculations are based on simple add, shift, and compare operations, so the algorithm can be implemented easily in hardware. In addition, it has been shown that the number of operations involved is lower than that of the traditional algorithm, which implies less computation time.