A Synthesizable Datapath-Oriented Programmable Logic Core

1
A Synthesizable Datapath-Oriented Programmable
Logic Core
  • Steven J.E. Wilton, Chun Hok Ho, Philip Leong,
    Wayne Luk, Brad Quinton
  • University of British Columbia and Imperial
    College

2
Embedded Programmable Logic Cores
  • Embed a small amount of programmable logic onto
    an ASIC
  • Postpone some decisions until late in design
    cycle
  • Fast upgrade path for products
  • Embedded Debug

3
Soft Programmable Logic Cores

4
Soft Programmable Logic Cores
  • Advantages
    - Easy to integrate, reduces design time
    - Very flexible, can create the exact required core
    - Easy to migrate to smaller technologies
  • Disadvantages
    - Inefficient compared to hard cores
  • Our thought
    - Makes sense if you only want a small core (a few
      hundred gates)

5
  • This talk
    - A new architecture for a synthesizable
      programmable logic core that supports
      datapath (bus-based) circuits

6
Previous Synthesizable PLCs
  • Kim Bozman and Noha Kafafi
  • LUT-Based
  • Unique Directional Routing Fabric

7
Synthesizable Cores
  • Observation 1: To make it truly synthesizable, we
    must avoid combinational loops in the
    unprogrammed fabric
  • Observation 2: Each tile need not be identical

8
Previous Synthesizable PLCs
  • Andy Yan
  • Product-term Based Logic Block
  • Unique Directional Routing Fabric
  • Supported Sequential Circuits

9
Our Architecture
  • Use it when the PLC is connected to a bus

Observation: These connections are permanently
tied to the bus signals, and we know this
when the ASIC is designed

10
Logic Architecture

11
Logic Architecture

  • Key point
    - All bitblocks within a wordblock share the
      same set of configuration bits
    - This means all bitblocks implement the same
      function

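The shared-configuration idea can be sketched in a few lines of Python. This is only an illustration: modeling a bitblock as a 2-input lookup table is an assumption made here (the slides do not describe the bitblock internals), and the names `bitblock`, `wordblock`, `XOR_CONFIG` and `AND_CONFIG` are hypothetical.

```python
from typing import List


def bitblock(config: List[int], a: int, b: int) -> int:
    """One bitblock, modeled (as an assumption) as a 2-input lookup table."""
    return config[(a << 1) | b]


def wordblock(config: List[int], word_a: int, word_b: int, width: int) -> int:
    """Apply the same configuration bits to every bit position of two words."""
    result = 0
    for i in range(width):
        a = (word_a >> i) & 1
        b = (word_b >> i) & 1
        # Every bitblock reads the one shared `config`, so every bit
        # position implements the same function.
        result |= bitblock(config, a, b) << i
    return result


# Truth tables indexed by (a << 1) | b.
XOR_CONFIG = [0, 1, 1, 0]
AND_CONFIG = [0, 0, 0, 1]

print(bin(wordblock(XOR_CONFIG, 0b1100, 0b1010, width=4)))
print(bin(wordblock(AND_CONFIG, 0b1100, 0b1010, width=4)))
```

In this model a 4-bit wordblock needs one 4-entry configuration rather than four, which is the saving the key point describes.
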
12
Routing Architecture
  • Key point: Signals are routed as buses

13
Routing Architecture
  • Key point
    - Linear array of wordblocks
    - Buses get wider as we go to the right

15
Routing Architecture
  • Key point
    - Linear array of wordblocks
    - Number of buses goes up as we go to the right
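One way to read the linear-array idea is sketched below in Python. It rests on an assumption the slides do not spell out: each wordblock may select its operands only from the core's input buses and from the outputs of wordblocks to its left, so the number of selectable buses grows by one per position and no combinational loop can form. The function name `evaluate_array` and the tuple layout are hypothetical.

```python
import operator
from typing import Callable, List, Tuple

Op = Callable[[int, int], int]


def evaluate_array(inputs: List[int],
                   blocks: List[Tuple[Op, int, int]],
                   width: int) -> List[int]:
    """Evaluate wordblocks left to right; return every bus value."""
    mask = (1 << width) - 1
    buses = list(inputs)
    for position, (op, src_a, src_b) in enumerate(blocks):
        # Assumed rule: block `position` sees the input buses plus the
        # outputs of the `position` blocks to its left.
        visible = len(inputs) + position
        if not (0 <= src_a < visible and 0 <= src_b < visible):
            raise ValueError(f"block {position} reads a bus to its right")
        buses.append(op(buses[src_a], buses[src_b]) & mask)
    return buses


# Two input buses; block 0 adds them, block 1 XORs that sum with input 0.
print(evaluate_array([3, 5], [(operator.add, 0, 1), (operator.xor, 2, 0)], width=8))
```

Because sources always lie to the left, a single left-to-right pass evaluates the whole array, whatever the configuration.
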

16
Datapath Architecture

17
Multipliers

Two output buses (MSB, LSB)
18
Add a Control Block

The control block is based on the product-term
(P-term) fine-grained synthesizable core
19
Example Mapping
  • Monitor two buses
    - Count the number of times each bus matches a
      mask (the mask includes don't-care bits)
    - Count the number of times both buses match the
      mask at the same time
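The behaviour of this example circuit can be modeled in software as follows. This is a behavioural sketch only, not the mapping onto the core; it assumes both buses are compared against one shared mask, and the names `matches` and `BusMonitor` are hypothetical.

```python
from dataclasses import dataclass


def matches(value: int, pattern: int, care: int) -> bool:
    """True when `value` equals `pattern` on every bit set in `care`.

    Bits cleared in `care` are don't-care bits.
    """
    return (value & care) == (pattern & care)


@dataclass
class BusMonitor:
    pattern: int
    care: int
    count_a: int = 0
    count_b: int = 0
    count_both: int = 0

    def step(self, bus_a: int, bus_b: int) -> None:
        """Process one sample of both buses and update the three counters."""
        hit_a = matches(bus_a, self.pattern, self.care)
        hit_b = matches(bus_b, self.pattern, self.care)
        self.count_a += hit_a
        self.count_b += hit_b
        self.count_both += hit_a and hit_b


# Match 0b101x: the least significant bit is a don't-care.
monitor = BusMonitor(pattern=0b1010, care=0b1110)
for a, b in [(0b1010, 0b0000), (0b1011, 0b1011), (0b0000, 0b1010)]:
    monitor.step(a, b)
print(monitor.count_a, monitor.count_b, monitor.count_both)
```
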

20
  • Interesting Questions
  • 1. How do the various architectural parameters
    affect density?
  • 2. How does this compare to a fine-grained
    architecture?

21
Architectural Parameters
  • D: Number of wordblocks (incl. multipliers)
  • N: Bit width
  • M: Number of input buses
  • R: Number of output buses
  • F: Number of feedback paths
  • C: Number of constant registers
  • A: Number of multipliers
  • P: Number of product-term blocks

22
Impact of Number of Wordblocks and Bit Width
  • Key result: Both the bit width and the number of
    wordblocks have a significant impact on area

23
Impact of the Number of Multipliers
  • Key result: Area increases due to more buses in
    the routing

24
Impact of the Size of the Control Block
  • Key result: The control block can dominate if it
    becomes too big

25
  Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
  fbly                68,190         132,339,335   9,300                 1940           7.33
  dotv3               34,119          65,534,780   6,575                 1921           5.19
  dscg                72,178         116,271,968   9,473                 1611           7.62
  fir4                76,213         130,971,120   9,843                 1718           7.74
  egcd             1,225,231          22,776,474  10,420                 18.6            117
  momul              294,135          11,448,589   7,097                 38.9             41
  median             142,172          10,733,962   4,420                 75.5             32
  debug1              87,265           1,302,928   3,484                 14.9             25

26

Key result 1: Significantly better than
fine-grained architecture
27
Key result 2: Overhead roughly the same as
FPGA/ASIC
28
  • But these results aren't fair
    - For each benchmark, we found the optimum set
      of architectural parameters
    - We need an architecture that works for a
      variety of circuits

29
Architecture Construction
  • Our thought
    - The number of inputs/outputs is fixed by the
      SoC
    - The designer has an idea of the size of the
      programmable logic (number of wordblocks)
  • Fix all other parameters (as a function of the
    number of wordblocks)
    - e.g. a fixed ratio between the number of
      multipliers and wordblocks, a fixed ratio
      between control logic and datapath logic, etc.
  • We arbitrarily chose fixed ratios based on our
    experience
    - A full architecture study is left as future
      work!
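The construction rule above can be sketched as a small Python helper. The ratio values below are placeholders invented for illustration: the slides say only that fixed ratios were chosen and do not state them. The names `PlcParams` and `derive_params` are likewise hypothetical; the field letters follow the architectural parameter list on the earlier slide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlcParams:
    D: int  # number of wordblocks (incl. multipliers)
    N: int  # bit width
    M: int  # number of input buses
    R: int  # number of output buses
    F: int  # number of feedback paths
    C: int  # number of constant registers
    A: int  # number of multipliers
    P: int  # number of product-term blocks


def derive_params(num_wordblocks: int, bit_width: int,
                  num_inputs: int, num_outputs: int,
                  mult_ratio: float = 0.25,      # placeholder value
                  feedback_ratio: float = 0.25,  # placeholder value
                  const_ratio: float = 0.25,     # placeholder value
                  pterm_ratio: float = 0.5       # placeholder value
                  ) -> PlcParams:
    """Fix the free parameters as a function of the wordblock count."""
    def scaled(ratio: float) -> int:
        return max(1, math.ceil(num_wordblocks * ratio))

    return PlcParams(D=num_wordblocks, N=bit_width,
                     M=num_inputs, R=num_outputs,
                     F=scaled(feedback_ratio), C=scaled(const_ratio),
                     A=scaled(mult_ratio), P=scaled(pterm_ratio))


print(derive_params(16, 32, num_inputs=2, num_outputs=1))
```

With this shape, the SoC fixes `M`, `R` and `N`, the designer picks `D`, and everything else follows from the ratios.
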

30
  Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
  fbly               332,091         132,339,335   9,300                  399           35.7
  dotv3              225,518          65,534,780   6,575                  291           34.3
  dscg               325,029         116,271,968   9,473                  358           34.3
  fir4               307,154         130,971,120   9,843                  426           31.2
  egcd             3,778,611          22,776,474  10,420                 6.02            363
  momul              486,654          11,448,589   7,097                 23.5           68.5
  median             194,654          10,733,962   4,420                 55.1             44
  debug1             119,286           1,302,928   3,484                 10.9             34
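The two ratio columns can be recomputed from the three raw columns, as in the Python sketch below. The figures are transcribed from the table; the slide does not state their units, and the names `AREAS` and `ratios` are hypothetical. The recomputed values agree with the table to within rounding in the last digit.

```python
# benchmark -> (datapath core, fine-grained PTerm core, ASIC)
# Units are not stated on the slide.
AREAS = {
    "fbly":   (332_091, 132_339_335, 9_300),
    "dotv3":  (225_518, 65_534_780, 6_575),
    "dscg":   (325_029, 116_271_968, 9_473),
    "fir4":   (307_154, 130_971_120, 9_843),
    "egcd":   (3_778_611, 22_776_474, 10_420),
    "momul":  (486_654, 11_448_589, 7_097),
    "median": (194_654, 10_733_962, 4_420),
    "debug1": (119_286, 1_302_928, 3_484),
}


def ratios(datapath: int, fine_grain: int, asic: int) -> tuple:
    """Return (fine-grain / datapath, datapath / ASIC)."""
    return fine_grain / datapath, datapath / asic


table = {name: ratios(*values) for name, values in AREAS.items()}

for name, (fg_over_dp, dp_over_asic) in table.items():
    print(f"{name:<7} {fg_over_dp:8.2f} {dp_over_asic:8.2f}")

improvements = [fg_over_dp for fg_over_dp, _ in table.values()]
print(f"improvement range: {min(improvements):.1f}x to {max(improvements):.1f}x")
```

The smallest and largest values in the first ratio column are the endpoints of the improvement range over the fine-grained architecture.
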

32

Key result 1: Significantly better than
fine-grained architecture
Key result 2: Overhead roughly the same as
FPGA/ASIC
33
[Figure: both dimensions labeled 625mm]

34
Conclusions
  • Our architecture is 6x to 426x more efficient
    than the fine-grained architecture
  • But this is only for datapath-oriented circuits
  • However, this is OK
    - In an SoC, we know, when the chip is designed,
      whether the inputs are buses or bits
    - If there are buses, use this architecture
    - If there are no buses, use Andy's PTerm
      architecture
  • Final thought: using this architecture, the
    overhead is similar to that of a normal FPGA.
    People already accept this!