Title: A Synthesizable Datapath-Oriented Programmable Logic Core
1. A Synthesizable Datapath-Oriented Programmable Logic Core
- Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton
- University of British Columbia and Imperial College
2. Embedded Programmable Logic Cores
- Embed a small amount of programmable logic onto an ASIC
- Postpone some decisions until late in the design cycle
- Fast upgrade path for products
- Embedded debug
3-4. Soft Programmable Logic Cores
- Advantages
- Easy to integrate, reduces design time
- Very flexible, can create the exact required core
- Easy to migrate to smaller technologies
- Disadvantages
- Inefficient compared to hard cores
- Our thought
- A soft core makes sense if you only want a small core (a few hundred gates)
5. This talk
- A new architecture for a synthesizable programmable logic core that supports datapath (bus-based) circuits
6. Previous Synthesizable PLCs
- Kim Bozman and Noha Kafafi
- LUT-Based
- Unique Directional Routing Fabric
7. Synthesizable Cores
- Observation 1: To make it truly synthesizable, we must avoid combinational loops in the unprogrammed fabric
- Observation 2: Each tile need not be identical
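Observation 1 can be illustrated with a small check. The sketch below models a fabric as a directed graph of blocks in which every possible connection is present at once, as it would be in the unprogrammed netlist a synthesis tool sees. The graph representation and the function names are assumptions made for illustration, not the authors' tool flow. A directional fabric, where each block reads only from blocks to its left, contains no cycle; one right-to-left wire is enough to create one.

```python
from graphlib import TopologicalSorter, CycleError

def has_combinational_loop(fanin):
    """fanin maps each block to the set of blocks it can read from."""
    try:
        # Iterating static_order() raises CycleError if the graph has a cycle.
        tuple(TopologicalSorter(fanin).static_order())
        return False
    except CycleError:
        return True

def directional_fabric(num_blocks):
    """Hypothetical fabric: block i may read from every block j < i."""
    return {i: set(range(i)) for i in range(num_blocks)}

forward_only = directional_fabric(4)
with_feedback = directional_fabric(4)
with_feedback[0].add(3)  # a right-to-left wire closes a loop

print(has_combinational_loop(forward_only))   # False
print(has_combinational_loop(with_feedback))  # True
```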
8. Previous Synthesizable PLCs
- Andy Yan
- Product-term Based Logic Block
- Unique Directional Routing Fabric
- Supported Sequential Circuits
9. Our Architecture
- Use it when the PLC is connected to a bus
[Figure: PLC connected to buses]
Observation: These connections are permanently tied to the bus signals, and we know this when the ASIC is designed
10-11. Logic Architecture
- Key point: All bitblocks within a wordblock share the same set of configuration bits
- This means all bitblocks implement the same function
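A minimal sketch of the shared-configuration idea follows. It assumes each bitblock is a 2-input lookup table driven by a 4-bit truth table; the slides do not state the bitblock's function set, so that choice, the `wordblock` function, and the two example configurations are hypothetical.

```python
def wordblock(config, a, b, width):
    """Apply one shared 4-bit truth table to every bit position.

    config is a truth table indexed by (a_bit << 1) | b_bit.
    The 2-input LUT bitblock is an assumption made for illustration.
    """
    out = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        # Every bitblock reads the same config bits, so all bit
        # positions compute the same function.
        out_bit = (config >> ((a_bit << 1) | b_bit)) & 1
        out |= out_bit << i
    return out

AND_CFG = 0b1000  # only the entry for a=1, b=1 is set
XOR_CFG = 0b0110  # entries for (a=0, b=1) and (a=1, b=0) are set

print(bin(wordblock(AND_CFG, 0b1100, 0b1010, 4)))  # 0b1000
print(bin(wordblock(XOR_CFG, 0b1100, 0b1010, 4)))  # 0b110
```

Under this assumption a wordblock of any width needs 4 function-select bits, where independently configured bitblocks would need 4 per bit.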
12. Routing Architecture
- Key point: Signals are routed as buses
13. Routing Architecture
- Key point: Linear array of wordblocks
- - Buses get wider as we go to the right
15. Routing Architecture
- Key point: Linear array of wordblocks
- - Number of buses goes up as we go to the right
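One way to picture the growing bus count is the sketch below. It assumes a simplified model that the slides do not spell out: each block can select from the input buses plus every bus driven by a block to its left, an ordinary wordblock drives one new bus, and a multiplier drives two. Feedback paths and constant registers are ignored, and `buses_visible` is a hypothetical helper.

```python
def buses_visible(num_input_buses, blocks):
    """Return the number of buses each block can select from.

    blocks is a left-to-right list of "word" or "mult". Assumed
    model: a wordblock drives one new bus, a multiplier drives two.
    """
    drives = {"word": 1, "mult": 2}
    visible = []
    count = num_input_buses
    for kind in blocks:
        visible.append(count)   # buses available at this position
        count += drives[kind]   # buses this block adds for later blocks
    return visible

print(buses_visible(2, ["word", "word", "mult", "word"]))  # [2, 3, 4, 6]
```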
16. Datapath Architecture
17. Multipliers
Two output buses (MSB, LSB)
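The need for two output buses follows from the product of two N-bit words being up to 2N bits wide. A minimal sketch, assuming unsigned operands (the slides do not say whether the multipliers are signed); `multiply_to_buses` is a hypothetical name:

```python
def multiply_to_buses(a, b, width):
    """Multiply two width-bit unsigned words; return (msb_bus, lsb_bus)."""
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)   # up to 2 * width bits
    return (product >> width) & mask, product & mask

msb, lsb = multiply_to_buses(0xFF, 0xFF, 8)
print(hex(msb), hex(lsb))  # 0xfe 0x1
```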
18. Add a Control Block
Control block is based on the P-term fine-grained synthesizable core
19. Example Mapping
- Monitor two buses
- - Count the number of times each bus matches a mask (the mask includes don't-care bits)
- - Count the number of times both buses match the mask at the same time
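The behaviour of this example circuit can be sketched in software. The code below is a behavioural model only, not the mapping onto wordblocks; it assumes one mask shared by both buses, expressed as a `pattern` plus a `care` word whose zero bits are don't-cares. These names and the sample values are hypothetical.

```python
def matches(value, pattern, care):
    """True if value equals pattern on every bit where care is 1."""
    return ((value ^ pattern) & care) == 0

def monitor(samples, pattern, care):
    """samples is a list of (bus_a, bus_b) values, one pair per cycle."""
    count_a = count_b = count_both = 0
    for bus_a, bus_b in samples:
        hit_a = matches(bus_a, pattern, care)
        hit_b = matches(bus_b, pattern, care)
        count_a += hit_a
        count_b += hit_b
        count_both += hit_a and hit_b
    return count_a, count_b, count_both

# Pattern 0b1010 with the low two bits marked as don't-care.
samples = [(0b1010, 0b1011), (0b1000, 0b0000), (0b0110, 0b1001)]
print(monitor(samples, pattern=0b1010, care=0b1100))  # (2, 2, 1)
```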
20. Interesting Questions
- 1. How do the various architectural parameters affect density?
- 2. How does this compare to a fine-grained architecture?
21. Architectural Parameters
- D: Number of wordblocks (incl. multipliers)
- N: Bit width
- M: Number of input buses
- R: Number of output buses
- F: Number of feedback paths
- C: Number of constant registers
- A: Number of multipliers
- P: Number of product-term blocks
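For experimentation, the eight parameters can be gathered into one record. This is a sketch: the class name `PLCParams`, the validation rules, and the example values are hypothetical and do not come from the slides, apart from the fact that D counts multipliers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PLCParams:
    D: int  # number of wordblocks (including multipliers)
    N: int  # bit width
    M: int  # number of input buses
    R: int  # number of output buses
    F: int  # number of feedback paths
    C: int  # number of constant registers
    A: int  # number of multipliers
    P: int  # number of product-term blocks

    def __post_init__(self):
        # D counts multipliers, so A can never exceed D.
        if self.A > self.D:
            raise ValueError("A (multipliers) cannot exceed D (wordblocks)")
        # Assumed sanity check: a usable core needs at least one of each.
        if min(self.D, self.N, self.M, self.R) < 1:
            raise ValueError("D, N, M and R must be at least 1")

params = PLCParams(D=8, N=16, M=2, R=1, F=1, C=2, A=1, P=4)
print(params.D - params.A)  # 7 ordinary wordblocks
```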
22. Impact of Number of Wordblocks and Bit-Width
- Key result: Both bit-width and number of wordblocks have a significant impact on area.
23. Impact of the Number of Multipliers
- Key result: Area increases due to more buses in the routing
24. Impact of the Size of the Control Block
- Key result: The control block can dominate if it becomes too big
25. Benchmark Results (parameters optimized per benchmark)
Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
fbly                68,190         132,339,335   9,300                 1940           7.33
dotv3               34,119          65,534,780   6,575                 1921           5.19
dscg                72,178         116,271,968   9,473                 1611           7.62
fir4                76,213         130,971,120   9,843                 1718           7.74
egcd             1,225,231          22,776,474  10,420                 18.6            117
momul              294,135          11,448,589   7,097                 38.9             41
median             142,172          10,733,962   4,420                 75.5             32
debug1              87,265           1,302,928   3,484                 14.9             25
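The two ratio columns follow directly from the first three. The short script below recomputes them from the figures in the table as a sanity check; the `ratios` helper is a hypothetical name. The recomputed values agree with the table to within rounding.

```python
# (Datapath, Fine-Grain, ASIC) figures copied from the table above.
rows = {
    "fbly":   (68_190, 132_339_335, 9_300),
    "dotv3":  (34_119, 65_534_780, 6_575),
    "dscg":   (72_178, 116_271_968, 9_473),
    "fir4":   (76_213, 130_971_120, 9_843),
    "egcd":   (1_225_231, 22_776_474, 10_420),
    "momul":  (294_135, 11_448_589, 7_097),
    "median": (142_172, 10_733_962, 4_420),
    "debug1": (87_265, 1_302_928, 3_484),
}

def ratios(datapath, fine_grain, asic):
    """Return (Fine-Grain/Datapath, Datapath/ASIC)."""
    return fine_grain / datapath, datapath / asic

for name, figures in rows.items():
    fg_over_dp, dp_over_asic = ratios(*figures)
    print(f"{name:7s} {fg_over_dp:8.1f} {dp_over_asic:7.2f}")
```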
Key result 1: Significantly better than the fine-grained architecture
Key result 2: Overhead is roughly the same as FPGA/ASIC
28. But these results aren't fair
- - For each benchmark, we found the optimum set of architectural parameters.
- - We need an architecture that works for a variety of circuits
29. Architecture Construction
- Our thought
- - The number of inputs/outputs is fixed by the SoC
- - The designer has an idea of the size of the programmable logic (number of wordblocks)
- Fix all other parameters (as a function of the number of wordblocks)
- - e.g. fixed ratio between number of multipliers and wordblocks, fixed ratio between control logic and datapath logic, etc.
- We arbitrarily chose fixed ratios based on our experience
- - A full architecture study is left as future work!
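The construction rule can be sketched as a function that takes the quantities the designer knows and derives the rest from fixed ratios. The slides do not give the ratios the authors chose, so the two constants below are placeholders invented for illustration, as is the choice to treat bit width as a designer-supplied input.

```python
import math

# Placeholder ratios, NOT the authors' values.
MULTIPLIERS_PER_WORDBLOCK = 1 / 8
PTERM_BLOCKS_PER_WORDBLOCK = 1 / 2

def derive_parameters(num_wordblocks, num_input_buses, num_output_buses, bit_width):
    """Derive multiplier and control-block counts from the wordblock count."""
    return {
        "D": num_wordblocks,     # chosen by the designer
        "N": bit_width,          # assumed to be set by the bus width
        "M": num_input_buses,    # fixed by the SoC
        "R": num_output_buses,   # fixed by the SoC
        "A": math.ceil(num_wordblocks * MULTIPLIERS_PER_WORDBLOCK),
        "P": math.ceil(num_wordblocks * PTERM_BLOCKS_PER_WORDBLOCK),
    }

print(derive_parameters(16, 2, 1, 32))
```

With these placeholder ratios, a 16-wordblock core gets 2 multipliers and 8 product-term blocks.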
30. Benchmark Results (fixed architecture)
Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
fbly               332,091         132,339,335   9,300                  399           35.7
dotv3              225,518          65,534,780   6,575                  291           34.3
dscg               325,029         116,271,968   9,473                  358           34.3
fir4               307,154         130,971,120   9,843                  426           31.2
egcd             3,778,611          22,776,474  10,420                 6.02            363
momul              486,654          11,448,589   7,097                 23.5           68.5
median             194,654          10,733,962   4,420                 55.1             44
debug1             119,286           1,302,928   3,484                 10.9             34
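Comparing the Datapath (ours) column here with the same column in the per-benchmark results shows what a single fixed architecture costs. The script below divides one by the other using only figures from the two tables; the variable names are hypothetical. The growth factor ranges from roughly 1.4x (median, debug1) to roughly 6.6x (dotv3).

```python
# Datapath (ours) figures with parameters tuned per benchmark.
optimized = {
    "fbly": 68_190, "dotv3": 34_119, "dscg": 72_178, "fir4": 76_213,
    "egcd": 1_225_231, "momul": 294_135, "median": 142_172, "debug1": 87_265,
}
# Datapath (ours) figures for the fixed-ratio architecture.
fixed = {
    "fbly": 332_091, "dotv3": 225_518, "dscg": 325_029, "fir4": 307_154,
    "egcd": 3_778_611, "momul": 486_654, "median": 194_654, "debug1": 119_286,
}

growth = {name: fixed[name] / optimized[name] for name in optimized}

# Print benchmarks from least to most affected.
for name, factor in sorted(growth.items(), key=lambda kv: kv[1]):
    print(f"{name:7s} {factor:5.2f}x")
```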
Key result 1: Significantly better than the fine-grained architecture
Key result 2: Overhead is roughly the same as FPGA/ASIC
33. [Figure; dimension labels: 625mm, 625mm]
34. Conclusions
- Our architecture is 6 to 426x more efficient than a fine-grained architecture
- But this is only for datapath-oriented circuits.
- However, this is OK
- - In an SoC, we know, when the chip is designed, whether the inputs are buses or bits
- - If there are buses, use this architecture
- - If there are no buses, use Andy's PTerm architecture
- Final thought: using this architecture, the overhead is similar to that of a normal FPGA. People already accept this!