High-level Synthesis: An Essential Ingredient for Designing Complex ASICs


1
  • High-level Synthesis: An Essential Ingredient
    for Designing Complex ASICs
  • Arvind, Rishiyur S. Nikhil, Daniel L.
    Rosenband, Nirav Dave
  • Massachusetts Institute of Technology, CSAIL
  • Bluespec Inc.
  • ICCAD 2004
  • November 10, 2004

2
The Designer's Dilemma
  • Chips are becoming larger and larger: more and more
    gates to design in constant time
  • The architect creates a less precise spec. that
    describes larger blocks and interfaces, e.g.
    "This block takes a 32b address and returns the
    longest prefix match with the SRAM table."
  • The designer is responsible for micro-architecture
    and intra-block interfaces
  • Without design exploration, the designer makes an
    educated guess, which leads to sub-optimal
    implementations
3
High-level Synthesis to the Rescue?
  • High-level synthesis promises
  • Higher level of abstraction than RTL
  • Faster design time
  • More code reuse
  • Why hasn't high-level synthesis worked?
  • Tools have attempted to derive micro-architectures
    automatically
  • Ignores designer's ingenuity
  • Unpredictable results
  • Poor synthesis results (area, timing, power)

Our high-level synthesis flow avoids these
pitfalls by presenting a high level of
abstraction while allowing the designer to
specify the micro-architecture.
4
Outline
  • The IP lookup problem
  • Three different implementations and their
    synthesis results
  • Additional case studies
  • Why did Bluespec help?

5
IP Lookup block in a router
Block diagram (labels only): Line Card (LC), Arbitration,
Packet Processor, IP Lookup, SRAM (lookup table),
Queue Manager, Exit functions, Control Processor, Switch
6
The IP lookup problem
  • Packets are routed (at line rate: 15 Mpps for
    10GE) based on the Longest Prefix Match (LPM)
    of the packet's IP address (32b) with entries in a
    routing table
  • Variable number of memory lookups required
  • Packets must be output in order of arrival
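
As a reference point for the problem statement above, here is a minimal software sketch of longest prefix match. The routing table entries and next-hop names are hypothetical, chosen only for illustration, and the linear scan is not how a router does it in hardware. The last line works out the time budget per packet implied by the 15 Mpps line rate.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop) pairs for illustration only.
ROUTES = [
    ("10.0.0.0/8", "A"),
    ("10.18.0.0/16", "B"),
    ("10.18.200.0/24", "C"),
    ("0.0.0.0/0", "default"),
]

def longest_prefix_match(addr, routes=ROUTES):
    """Return the next hop of the most specific prefix that contains addr."""
    ip = ipaddress.ip_address(addr)
    best_len, best_hop = -1, None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop

# Time available per packet at 15 million packets per second, in nanoseconds.
NS_PER_PACKET = 1e9 / 15e6
```

With this table, 10.18.200.5 matches three prefixes and the /24 entry wins. At 15 Mpps a new packet arrives roughly every 67 ns.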

Figure: example lookups in the IP lookup table
7
Sparse tree representation
Figure: tree diagram of the lookup table
8
Table representation issues
  • Real-world lookup algorithms are more complex, but
    most follow a sequence of dependent memory
    references. Major challenges:
  • small memory footprint
  • conserving memory bandwidth
  • reasonable latency
  • table updates must be possible
  • Constraint: results must be returned in
    order
  • Given a lookup algorithm, the designer still
    faces many micro-architectural choices
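
The "sequence of dependent memory references" can be sketched in software as a multi-level table walk. Everything here is an assumption for illustration: the 16/8/8 split of the address bits, the `Sram` class, and the ("leaf", ...) / ("node", ...) entry encoding do not come from the slides. The point is only that each read address depends on the previous read's result, so a lookup takes between one and three accesses.

```python
class Sram:
    """Simulated SRAM: maps an address to ("leaf", next_hop) or ("node", base)."""

    def __init__(self):
        self.mem = {}
        self.reads = 0
        self.next_free = 1 << 16  # first-level table uses addresses 0..65535

    def read(self, addr):
        self.reads += 1
        # Unpopulated entries behave as leaves with no route.
        return self.mem.get(addr, ("leaf", None))

    def alloc_table(self):
        """Reserve a 256-entry child table and return its base address."""
        base = self.next_free
        self.next_free += 256
        return base

def lookup(sram, ip):
    """Walk the table; each read address depends on the previous result."""
    indices = [(ip >> 16) & 0xFFFF, (ip >> 8) & 0xFF, ip & 0xFF]
    base = 0
    for idx in indices:
        kind, value = sram.read(base + idx)
        if kind == "leaf":
            return value
        base = value  # value is the base address of the next-level table
    raise ValueError("third-level entry must be a leaf")
```

A lookup that hits a leaf in the first-level table costs one read. One that descends to a child table costs two or three, which is why the number of memory accesses per packet varies.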

9
Outline
  • The IP lookup problem
  • Three different implementations and their
    synthesis results
  • Additional case studies
  • Why did Bluespec help?

10
Longest Prefix Match for IP lookup: 3 possible
implementation architectures
Circular pipeline: efficient memory with most complex control
Designer's ranking
Which is best?
11
1. Rigid Static Scheduling
  • Assume the SRAM containing the table has n-cycle
    latency, statically schedule memory accesses to
    avoid conflicts
  • Issues:
  • Since an LPM may take 1-3 memory accesses, unused
    slots may be left idle
  • May have to reschedule the pipeline for a
    different memory latency
  • Very difficult to plan if memory is also to be
    used for some unrelated task.
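
The idle-slot issue can be put in numbers with a short sketch. It assumes, as a simplification not stated on the slide, that the static schedule reserves the worst-case three memory slots for every lookup. The function name and the sample access counts are hypothetical.

```python
def static_slot_utilization(accesses_per_lookup, reserved_slots=3):
    """Fraction of reserved memory slots that lookups actually use."""
    used = sum(accesses_per_lookup)
    reserved = reserved_slots * len(accesses_per_lookup)
    return used / reserved

# Example: a mix of lookups needing 1, 2 and 3 accesses uses 6 of 9 slots.
example = static_slot_utilization([1, 2, 3])
```

Under that assumption, a workload dominated by one-access lookups leaves about two thirds of the memory slots idle.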

12
2. Adaptive Linear Pipeline
Figure (labels only): RAM holding the IP address table,
port replicator, rom0, rom1, rom2, fifo0, fifo1, fifo2, ofifo.
Stage annotations: start lookup 1; finish lookup 1 (start
lookup 2); finish lookup 2 (start lookup 3); finish lookup 3.
Caption: memory is used efficiently
  • Each pipeline stage accesses the memory only if
    required
  • Advantages: better memory utilization, easy
    design, robust to changes in memory latencies
  • Issues: FIFO sizing, FIFO area, latency

13
3. Flexible Circular Pipeline
Figure (labels only): lpmReq, lpmResp, Completion buffer
(getTicket, done), RAM holding the IP address table, tf, ops,
node, leaf, Enter, Move, Circulate, Complete
  • Completion buffer
  • gives out tokens to control the entry into the
    circular pipeline
  • ensures that departures take place in order even
    if lookups complete out-of-order
  • Advantages: robust to changes in memory latency,
    easy to alter lookup algorithm
  • Disadvantages: more complicated control,
    gate-count?
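
The completion buffer's two jobs described above (handing out tokens and restoring arrival order) can be sketched as a small ring buffer. This is a behavioral illustration in software, not the BSV module from the talk. The class and method names are assumptions, loosely following the getTicket and done labels on the slide.

```python
class CompletionBuffer:
    """Ring buffer that hands out tickets and releases results in ticket order."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size    # results, indexed by ticket
        self.valid = [False] * size   # True once a ticket's result has arrived
        self.head = 0                 # oldest outstanding ticket
        self.tail = 0                 # next ticket to hand out
        self.count = 0                # number of outstanding tickets

    def get_ticket(self):
        """Reserve a slot; return None if the buffer is full (entry stalls)."""
        if self.count == self.size:
            return None
        ticket = self.tail
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return ticket

    def complete(self, ticket, result):
        """Record a finished lookup; completions may arrive in any order."""
        self.slots[ticket] = result
        self.valid[ticket] = True

    def drain(self):
        """Return the oldest result if it is ready, otherwise None."""
        if self.count == 0 or not self.valid[self.head]:
            return None
        result = self.slots[self.head]
        self.valid[self.head] = False
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return result
```

If the second lookup finishes before the first, `drain` keeps returning None until the first result arrives, so departures stay in arrival order. A full buffer refuses new tickets, which is what throttles entry into the circular pipeline.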

14
Experimental setup
Figure (labels only): Routing table (real, or random);
compiler; Forwarding table; Targeted test data (IP addrs,
expected routes); Testbench: Load SRAM, Generate packets
(random IP addr), Check against expected result (Pass / Fail);
SRAM (lookup table): SRAM address, SRAM data; IP address;
Routing info; Expected routes
  • Implemented the three architectures in BSV
    (Bluespec System Verilog) and Verilog (RTL).
  • Synthesized designs to compare gate count and
    timing
  • Used the common test infrastructure to verify all
    6 implementations
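
The idea of one test infrastructure shared by every implementation can be sketched as a harness that takes the lookup under test as a parameter. The function names, the two stand-in implementations and the test vectors are all hypothetical. The real testbench drove hardware designs, not Python functions.

```python
def run_testbench(lookup_fn, expected_routes):
    """Return ("Pass" or "Fail", list of mismatches) for one implementation."""
    mismatches = []
    for addr, want in expected_routes:
        got = lookup_fn(addr)
        if got != want:
            mismatches.append((addr, want, got))
    return ("Pass" if not mismatches else "Fail"), mismatches

# Hypothetical stand-ins for two implementations of the same lookup.
def impl_good(addr):
    return "A" if addr >> 24 == 10 else "default"

def impl_buggy(addr):
    return "A"  # ignores the address entirely

# Hypothetical targeted test data: (IP address, expected route).
VECTORS = [(0x0A000001, "A"), (0xC0A80001, "default")]
```

Because the vectors and the checker are independent of the design under test, the same expected routes can be replayed against each implementation.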

15
Synthesis results
Synthesized to TSMC 0.18 µm library
  • V: Verilog
  • BSV: Bluespec System Verilog

Bluespec and Verilog synthesis results are nearly
identical
16
Static pipeline exploration: one spec., two
designers, two results
Each packet is processed by one FSM
Shared FSM
17
Static pipeline exploration: Bluespec
  • BSV Data Alignment
  • Automatically packs complex data-types into bits
  • Simplifies design process but is not always
    optimal for usage
  • BSV Type System
  • Types provide a level of safety
    (correct-by-construction design) not found in
    other hardware languages
  • Conversions from one type to another can
    introduce extra logic
  • Bluespec Inc. has since solved this problem

18
Static pipeline exploration: summary
  • Variations within a design
  • Implementation choice has dramatic impact on
    performance
  • Bluespec results can match carefully coded
    Verilog
  • Thinking about micro-architecture is
    important!
  • Much more important than language differences!!

These issues are more serious for larger designs
19
Outline
  • The IP lookup problem
  • Three different implementations and their
    synthesis results
  • Additional case studies
  • Why did Bluespec help?

20
Beyond the paper ...
  • Is BSV (Bluespec in System Verilog) for real?
  • More examples to compare the quality of results:
    BSV vs. Verilog
  • Bigger examples to showcase the productivity
  • Usual caveats
  • mileage varies from designer to designer
  • controlled experiments to measure productivity,
    especially for large designs, are difficult

21
Case Study: Pkt
  • In an apples-to-apples comparison with a product
    ASIC coded in Verilog, the in-house Bluespec team
    demonstrated:
  • 4 man-months to complete 1.5M gates
  • Passed full regression test suite
  • 13x reduction in source code (from a 66K-line
    Verilog design to 4.7K lines of
    code)
  • 66% reduction in verification bugs
  • Matched performance (clock speed, area)
  • Enabled major design space explorations within
    time budgets

200 MHz, 1.5M gates, 0.18 µm
22
Case Study: MPEG4 design blocks
Figure (labels only): MPEG4 Stream, Video Bitstream Decoder,
Inverse Scan, Inverse AC/DC Prediction, Inverse Quantization,
IDCT, Motion Compensation, YUV data
  • Inverse Discrete Cosine Transform (IDCT)
  • Lines of code: 2716 (Verilog), 723 (BSV)
  • Total effort: 4 man-weeks (Verilog), 2.5
    man-weeks (BSV)
  • Area (gates): 52K (Verilog), 48K (BSV)
  • Motion compensation decoding
  • 1184 lines of BSV
  • Architecture: 6 weeks; coding: 3 weeks;
    verification: 5 weeks
  • 180 nm TSMC, 10 ns cycle time (7.52 ns slack)
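
The IDCT figures above can be turned into ratios with a few lines of arithmetic. The input numbers are taken from the slide. The critical-path estimate assumes the quoted slack is measured against the 10 ns cycle constraint and ignores setup time and clock uncertainty, so treat it as a rough reading, not a reported result.

```python
# (Verilog, BSV) figures for the IDCT block, as quoted on the slide.
idct = {
    "lines_of_code": (2716, 723),
    "effort_man_weeks": (4.0, 2.5),
    "area_kgates": (52, 48),
}

# Verilog-to-BSV ratio for each metric (values above 1 favor BSV).
ratios = {name: verilog / bsv for name, (verilog, bsv) in idct.items()}

# Motion compensation block: cycle time minus slack approximates the critical path.
cycle_ns, slack_ns = 10.0, 7.52
critical_path_ns = cycle_ns - slack_ns
```

This gives roughly 3.76x fewer lines, 1.6x less effort and about 8% less area for BSV on the IDCT, and a critical path of about 2.48 ns for motion compensation under the stated assumption.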

23
Outline
  • The IP lookup problem
  • Three different implementations and their
    synthesis results
  • Additional case studies
  • Why did Bluespec help?

24
Bluespec significantly reduces lines of code
Chart: design examples, lines of code
Bugs
Note: as a typical RTL design metric, there are
approximately 1-2 bugs for every 100 lines of code
  • Depends on factors like
  • Designer
  • Architectural Complexity
  • Quality of static verification
  • Level of correct-by-construction
  • Re-use
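
To see what the 1-2 bugs per 100 lines rule of thumb implies, here is a small calculation using the line counts quoted earlier for the Pkt case study (66K lines of Verilog, 4.7K lines of BSV). Applying the same RTL bug density to BSV source is an assumption made only for illustration, since the factors listed above all shift the real number. The function name is hypothetical.

```python
def expected_bugs(lines, low_per_100=1, high_per_100=2):
    """Return the (low, high) bug estimate for a given number of source lines."""
    return lines * low_per_100 / 100, lines * high_per_100 / 100

verilog_range = expected_bugs(66_000)  # 66K lines of Verilog
bsv_range = expected_bugs(4_700)       # 4.7K lines of BSV, same density assumed
```

Under that assumption the estimate drops from 660-1320 bugs to 47-94, which shows why source size matters even before type checking or correct-by-construction effects are considered.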

25
So how did Bluespec help?
  • Module interfaces are more than wires
  • capture protocol
  • strongly typed
  • FIFO, Port Replicator, Completion Buffer, ...
  • Automatic rule scheduling
  • Rule atomicity helps in identifying conflicts
  • Eliminates subtle bugs in control logic
  • very important in some designs
  • Rich high-level, two-level, modern language
  • Data structures and Polymorphic Type system
  • Rules, actions, modules, ... are all first class
    objects in the language
  • exploitation of these features requires training
    in modern high-level programming
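
The notion of rule atomicity can be illustrated with a toy software model of guarded atomic actions. This is a sketch of the one-rule-at-a-time view only, assuming a fixed priority order among rules. It does not model how the Bluespec compiler schedules several non-conflicting rules in the same clock cycle, and the `Rule` class, `step` function and two-queue example are all hypothetical.

```python
class Rule:
    """A guarded atomic action: the action may run only when the guard holds."""

    def __init__(self, name, guard, action):
        self.name, self.guard, self.action = name, guard, action

def step(state, rules):
    """Fire the first enabled rule as one atomic update; return its name or None."""
    for rule in rules:
        if rule.guard(state):
            rule.action(state)
            return rule.name
    return None

# Toy design: items move from inq through a one-entry outq into done.
state = {"inq": [1, 2], "outq": [], "done": []}
rules = [
    Rule("drain",
         lambda s: len(s["outq"]) > 0,
         lambda s: s["done"].append(s["outq"].pop(0))),
    Rule("move",
         lambda s: len(s["inq"]) > 0 and len(s["outq"]) < 1,
         lambda s: s["outq"].append(s["inq"].pop(0))),
]

# Run until no rule is enabled, recording which rule fired at each step.
trace = []
while (fired := step(state, rules)) is not None:
    trace.append(fired)
```

Each rule sees a consistent state and updates it completely before any other rule runs, so the one-entry queue can never be written while full. In a hand-written RTL control path that guarantee has to be enforced manually.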

26
Conclusion
  • Micro-architecture exploration is important to
    the design process
  • High-level synthesis is important for rapid
    design exploration
  • without RTL it is difficult to optimize the
    architecture for area, time or power
  • High-level synthesis must allow the designer to
    express micro-architectures
  • High-level synthesis (BSV-style) produces
    comparable timing and area results to Verilog

27
Thank you!