Title: Decimal Multiplication with Efficient Partial Product Generation
1. Decimal Multiplication with Efficient Partial Product Generation
- Mark Erle, Eric Schwarz
- Server Technology Group
- IBM
- Mike Schulte
- Dept. of Electrical and Computer Engineering
- University of Wisconsin at Madison
2. Outline
- Introduction and motivation
- Decimal multiplication challenges
- Novel aspects of algorithm
- Algorithm components
- Operand recode
- Digit-by-digit multiplication
- Partial product generation
- Overlap removal encoding
- Partial product accumulation
- Final product correction
- Summary
3. Introduction and Motivation
- Preponderance of business data in decimal form
- Inexact mapping between decimal and binary
- Decimal arithmetic used (required) in banking, finance, insurance, accounting
- Increasing support in arithmetic community (revising IEEE 754/854)
- Significant speedup achievable in hardware
- Multiplication a key function
4.
- By the way, we're about 20% through the talk
- $0.20_{10} \approx 0.00110011_2$
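The binary expansion on this slide can be checked with exact rational arithmetic. The following is an illustrative sketch only; it uses Python's standard `fractions` module, and the helper name `binary_fraction` is hypothetical.

```python
from fractions import Fraction

def binary_fraction(bits: str) -> Fraction:
    """Exact value of a binary fraction, given its digits after the point."""
    return sum(Fraction(int(b), 2 ** (k + 1)) for k, b in enumerate(bits))

truncated = binary_fraction("00110011")   # the eight bits shown on the slide
print(float(truncated))                   # 0.19921875
print(truncated == Fraction(1, 5))        # False
```

The eight-bit binary fraction falls short of 0.20, and adding more bits of the repeating pattern 0011 narrows the gap without ever closing it.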
5. Decimal Multiplication Challenges
- Greater number of multiplicand tuples
- Complicates partial product generation
- Representing decimal values with two-state devices
- Complicates partial product generation
- Complicates partial product accumulation
- Inability to use binary arithmetic techniques
directly
6. Novel Aspects of Algorithm
- Recode operands
- Simplify partial product generation
- Improve latency of partial product generation
- Restrict magnitude range of partial product digits
- Simplify partial product accumulation
- Improve latency of partial product accumulation
7. Key Aspect of Algorithm
- Generate partial products as needed, not a priori
- Benefits
- Reduces cycles to generate tuples
- Reduces wiring to distribute tuples
- Eliminates registers needed to store tuples
- Cost can be delay during iterative portion of algorithm
- Reduce cost via pipelining
- Generate partial product in cycle i
- Accumulate partial product in cycle i+1
8. Operand Recode - Complexity of Digit-by-digit Products
9. Operand Recode - Mechanism
- Need signed digits to restrict range
- E.g., 2 5 6 is recoded into 3 -4 -4
- $a_i^S \in \{-5, -4, \ldots, 0, \ldots, 4, 5\}$
- Recode in parallel all digits $\ge 5$
- Four cases: $a_i \ge 5$?, $a_{i-1} \ge 5$?
- Need three operations
- Do nothing
- Increment
- Radix complement
- Diminished radix complement
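The four cases can be sketched in software as follows. This is a minimal illustration, not the hardware described in the talk: the function names `recode` and `value` are hypothetical, digits are held as Python integers with the most significant digit first, and the extra leading digit that absorbs the carry out of the original MSD is an assumption of this sketch.

```python
def recode(digits):
    """Recode BCD digits (MSD first) into signed digits in [-5, 5].

    Returns n + 1 signed digits, MSD first. The extra leading digit
    absorbs the carry out of the original MSD (assumption of this sketch).
    """
    n = len(digits)
    out = []
    for i in range(n):
        d = digits[i]
        lower = digits[i + 1] if i + 1 < n else 0   # next less significant digit
        if d >= 5 and lower >= 5:
            out.append(d - 9)     # diminished radix complement, taken as negative
        elif d >= 5:
            out.append(d - 10)    # radix complement, taken as negative
        elif lower >= 5:
            out.append(d + 1)     # increment
        else:
            out.append(d)         # do nothing
    lead = 1 if digits and digits[0] >= 5 else 0
    return [lead] + out

def value(signed_digits):
    """Integer value of a signed-digit string, MSD first."""
    v = 0
    for d in signed_digits:
        v = 10 * v + d
    return v

print(recode([2, 5, 6]))   # [0, 3, -4, -4]
```

Each output digit depends only on the digit itself and its less significant neighbor, which is why all digits can be recoded in parallel.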
10. Operand Recode - Implementation
- Recode entire multiplicand, recode multiplier digit by digit
- Fig. a: single digit
- Fig. b: n-digit
11. Digit-by-digit Product - Mechanism
- Restrict digits to yield only 16 combinations
- Magnitude: $\{0, \ldots, 9\} \to \{-5, \ldots, 5\}$ (100)
- Absolute value: $\{-5, \ldots, 5\} \to \{0, \ldots, 5\}$ (36)
- Zero and identity: $\{0, \ldots, 5\} \to \{2, \ldots, 5\}$ (16)
- Lookup table or combinational logic
- Product characteristics
- Absolute value → sign correction
- $\{0, \ldots, 25\}$, i.e., two digits → overlap removal
- Restrict LSD magnitude to at most 5 → signed-digit addition
- LSD magnitude restriction eases
- Overlap removal
- Partial product accumulation
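A software sketch of the digit-by-digit product may help make the restrictions concrete. The function name `digit_product` and the table `TABLE` are hypothetical, and the particular two-digit split used when the units digit exceeds 5 (for example 16 as 2, -4) is an assumption of this sketch rather than a detail taken from the slides.

```python
def digit_product(a, b):
    """Multiply two signed digits in [-5, 5].

    Returns (msd, lsd) with 10 * msd + lsd == a * b,
    |lsd| <= 5 and |msd| <= 2.
    """
    mag = abs(a) * abs(b)               # magnitude in {0, ..., 25}
    msd, lsd = divmod(mag, 10)
    if lsd > 5:                         # e.g. 16 -> (2, -4) instead of (1, 6)
        msd, lsd = msd + 1, lsd - 10
    if (a < 0) != (b < 0):              # sign correction
        msd, lsd = -msd, -lsd
    return msd, lsd

# The 16 nontrivial magnitude combinations: both digits in {2, ..., 5}.
TABLE = {(a, b): digit_product(a, b) for a in range(2, 6) for b in range(2, 6)}

print(len(TABLE))             # 16
print(TABLE[(4, 4)])          # (2, -4)
print(digit_product(-3, 5))   # (-1, -5)
```

Products involving 0 or 1 need no table entry, since the result is either zero or the other operand, which is what reduces the 36 magnitude pairs to 16.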
12. Partial Product - Implementation
- LSD mux selects
- $a_0^S$ or $b_i^S = 0$
- $a_0^S = 1$
- $b_i^S = 1$
- $a_0^S$ and $b_i^S > 1$
- MSD mux selects
- $a_0^S$ and $b_i^S < 2$
- $a_0^S$ and $b_i^S > 1$
- Fig. a: single digit
- Fig. b: (n+1)-digit
13. Overlap Removal Encoding
- Partial products are sign-corrected, signed-magnitude digits in overlapped form
- In each digit position
- Four-bit, signed-magnitude digit $\{-5, \ldots, 5\}$
- Three-bit, signed-magnitude digit $\{-2, \ldots, 2\}$
- Prepare for partial product accumulation via Svoboda signed-digit adder
- Use combinational circuit to
- remove the overlap
- produce Svoboda-encoded signed digits
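The overlapped form can be illustrated in software: each multiplicand digit yields a two-digit product, and the high digit of one position shares a weight with the low digit of the next. The sketch below is only a model under stated assumptions. The names `digit_product`, `partial_product_overlapped`, `remove_overlap`, and `value` are hypothetical, plain integer addition stands in for the combinational circuit, and the Svoboda encoding itself is not modeled.

```python
def digit_product(a, b):
    """Signed digits in [-5, 5] -> (msd, lsd), |lsd| <= 5, |msd| <= 2."""
    msd, lsd = divmod(abs(a) * abs(b), 10)
    if lsd > 5:
        msd, lsd = msd + 1, lsd - 10
    if (a < 0) != (b < 0):
        msd, lsd = -msd, -lsd
    return msd, lsd

def partial_product_overlapped(multiplicand, b):
    """multiplicand: signed digits, LSD first. One (msd, lsd) pair per position."""
    return [digit_product(a, b) for a in multiplicand]

def remove_overlap(pairs):
    """Merge the lsd at position j with the msd from position j - 1.

    Returns signed digits, LSD first, one digit longer than the input.
    In this naive sketch the merged digits have magnitude at most 7.
    """
    digits = []
    from_below = 0                  # msd produced one position lower
    for msd, lsd in pairs:
        digits.append(lsd + from_below)
        from_below = msd
    digits.append(from_below)
    return digits

def value(digits_lsd_first):
    return sum(d * 10 ** j for j, d in enumerate(digits_lsd_first))

a = [-4, -4, 3]                     # 256 recoded as 3 -4 -4, stored LSD first
row = remove_overlap(partial_product_overlapped(a, 5))
print(row, value(row))              # [0, -2, 3, 1] 1280
```

No carry propagates beyond the adjacent position, so every merged digit can be computed independently of the others.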
14. Partial Product Accumulation
- Addition with signed digits eliminates carry propagation
- Use Svoboda signed-digit adder to accumulate
- Partial product in encoded form
- Shifted intermediate product (previous iteration)
- One final product digit converted to BCD each cycle
- Four cases: $IP_0^{i} \ge 0$?, $IP_0^{i-1} \ge 0$?
- Need four operations
- Convert to BCD
- Convert to BCD and decrement
- Convert additive inverse to BCD and radix complement
- Convert additive inverse to BCD, radix complement, and decrement
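The four conversion operations correspond to a standard signed-digit to BCD conversion that retires one digit at a time, least significant first. The sketch below is a generic software version, not the circuit from the talk: the function name `signed_digits_to_bcd` is hypothetical, and tracking an explicit borrow flag (rather than the sign of the previously retired digit) is an assumption made so that a zero digit with an incoming borrow is handled correctly.

```python
def signed_digits_to_bcd(digits):
    """Convert signed digits in [-9, 9] (LSD first) to BCD digits (LSD first).

    The represented value must be non-negative.
    """
    bcd = []
    borrow = 0                      # 1 if the digit below needed a borrow
    for d in digits:
        t = d - borrow              # "decrement" when a borrow is pending
        if t < 0:
            t += 10                 # radix complement of the magnitude
            borrow = 1
        else:
            borrow = 0
        bcd.append(t)
    if borrow:
        raise ValueError("signed-digit value is negative")
    return bcd

print(signed_digits_to_bcd([-4, -4, 3]))   # [6, 5, 2], i.e. 256
```

Because each output digit needs only the current digit and a one-bit borrow, a single BCD digit can be produced per cycle as the intermediate product is shifted.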
15. Cycle By Cycle
16. Block Diagram - Top
17. Block Diagram - Bottom
18. Summary
- Algorithm utilizes restricted-range, signed digits throughout
- Original aspects include
- Recoding operands into restricted-range, signed digits
- Generating non-overlapping, sign-corrected partial products from recoded operands
- Recoding partial products for entry into signed-digit adder
- Algorithm achieves n+5 latency
- Extendable to floating-point multiplication
19. Questions and Perhaps Some Answers