A SOC DSP Design Methodology Case Study





1
A SOC DSP Design Methodology Case Study
  • Joseph Williams
  • Room 4e-525
  • 101 Crawfords Corner Rd.
  • Holmdel, NJ 07733
  • joew_at_lucent.com

2
Outline
  • Testchip 1 architecture and methodology summary
  • Testchip 1 design review
  • Testchip 2 architecture and methodology summary
  • Testchip 2 design review
  • Results comparison

3
Daytona testchip 1
(Block diagram; labels recovered from the figure)
  • 32-bit RISC, 64-bit SIMD (appears four times)
  • Hardware Debug (appears four times)
  • L1 Cache (appears four times)
  • PE Controller (appears four times)
  • 128-bit Split Transaction Bus
  • I/O Subsystem
  • Memory Controller
  • SRAM
  • Arbiters, Semaphores
  • Transaction Manager, DMA
  • Host Interface
  • Host I/O
4
Vanilla Flow
(Flow diagram; labels recovered from the figure: Restructure RTL, Implement RTL, RTL, Cell Library, Standard Wire Load, Reoptimize, Synopsys RTL Synthesis, Netlist, Avanti Place & Route, Final Netlist, Layout Parasitics)
5
Chip implementation specifics
6
(No Transcript)
7
Testchip 1 back-end design debriefing
  • Back-end process slipped schedule endlessly
  • Back-end process required 9 months
  • Well over 18 man-months of effort
  • The die size was several times larger than
    predicted
  • 0.35 µm implementation abandoned for 0.25 µm due to
    congestion
  • Die utilization for final design was below 25%
  • Timing closure was a nightmare
  • Initial target of 150 MHz was abandoned
  • Could not achieve timing closure on most blocks
    without several time-consuming iterations
  • Inter-block routing required many manual fixes
  • Tools required hours and days to produce results
    and crashed regularly

8
What the hell happened?!!
  • The design had many characteristics which make
    back-end difficult
  • Many wide busses with large fanin and fanout
  • Centralized state machines controlling vast
    regions of logic
  • Timing paths which span multiple blocks
  • The design methodology was not sufficient to
    handle a design of this complexity
  • Pre-layout estimates of parasitics were very
    inaccurate
  • No mechanisms existed to predict and manage
    congestion
  • Large design database required extensive tool
    run-times
  • Separate designers did not understand the
    implication of connections to inter-block routing
    resources

9
Modify the methodology to handle large SOC designs
  • Invest time in the redesign of the architecture
    to make the logical and physical hierarchy
    similar
  • Partition the physical design early in the
    synthesis process
  • Define groupings of cells small enough to be
    timed accurately with wire load models (local
    nets)
  • Identify nets which cross group boundaries
    (global nets)
  • Invest time in multiple stages of floorplanning
  • Use a multipass synthesis strategy with
    successive refinement of wire parasitics from
    floorplan
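
The split into local and global nets described above can be sketched in a few lines. This is only an illustration, not the flow used on the testchips: the netlist, the cell names, and the group names (`grpA`, `grpB`) are hypothetical, and `classify_nets` is a helper invented for this example. A net counts as local when all of its cells sit in one physical group and as global when it crosses a group boundary.

```python
from typing import Dict, List, Tuple


def classify_nets(
    net_cells: Dict[str, List[str]],
    cell_group: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Split nets into local (one group) and global (crosses group boundaries)."""
    local_nets: List[str] = []
    global_nets: List[str] = []
    for net, cells in net_cells.items():
        # Collect the distinct physical groups touched by this net.
        groups = {cell_group[cell] for cell in cells}
        if len(groups) <= 1:
            local_nets.append(net)
        else:
            global_nets.append(net)
    return local_nets, global_nets


# Hypothetical netlist: each net lists the cells it connects.
net_cells = {
    "n1": ["u1", "u2"],
    "n2": ["u2", "u3"],
    "n3": ["u3", "u4", "u1"],
}
# Hypothetical physical grouping of cells.
cell_group = {"u1": "grpA", "u2": "grpA", "u3": "grpB", "u4": "grpB"}

local_nets, global_nets = classify_nets(net_cells, cell_group)
print("local:", local_nets)
print("global:", global_nets)
```

In this toy netlist only `n1` stays inside one group; `n2` and `n3` cross from `grpA` to `grpB` and would need parasitics taken from the floorplan.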

10
Hierarchical Wire Load Models
(Diagram; labels recovered from the figure)
  • Block D: 100K_WLM
  • Block C: 50K_WLM
  • Block A: 10K_WLM
  • Block B: 10K_WLM
11
Table Format Wire Load Model
wire_load_table(10K_WLM)
  fanout_length( 1, 0.002)
  fanout_length( 2, 0.005)
  fanout_length( 3, 0.013)
  fanout_length( 4, 0.022)
  fanout_length( 7, 0.033)
  fanout_length( 11, 0.054)
  fanout_capacitance( 1, 0.002)
  fanout_capacitance( 2, 0.005)
  fanout_capacitance( 3, 0.013)
  fanout_capacitance( 4, 0.022)
  fanout_capacitance( 7, 0.033)
  fanout_capacitance( 11, 0.054)
  fanout_resistance( 1, 0.005)
  fanout_resistance( 2, 0.005)
  fanout_resistance( 3, 0.139)
  fanout_resistance( 4, 0.276)
  fanout_resistance( 7, 0.550)
  fanout_resistance( 11, 0.785)
  fanout_area( 1, 0.5)
  fanout_area( 2, 1.0)
  fanout_area( 3, 1.5)
  fanout_area( 4, 2)
  fanout_area( 7, 3.5)
  fanout_area( 11, 5.5)
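
To make the table concrete, the sketch below looks up a parasitic value for a given fanout. The capacitance and resistance entries are copied from the 10K_WLM table above; the slide gives no units. How a real synthesis tool treats fanouts that fall between or beyond the listed entries is not stated in the presentation, so the linear interpolation and the clamping at the table ends are assumptions made for this example, and `lookup` is a hypothetical helper.

```python
from bisect import bisect_left

# Values copied from the 10K_WLM table above; the slide does not state units.
CAPACITANCE_10K = [(1, 0.002), (2, 0.005), (3, 0.013),
                   (4, 0.022), (7, 0.033), (11, 0.054)]
RESISTANCE_10K = [(1, 0.005), (2, 0.005), (3, 0.139),
                  (4, 0.276), (7, 0.550), (11, 0.785)]


def lookup(table, fanout):
    """Return the table value for a fanout.

    Assumption for this sketch: interpolate linearly between entries and
    clamp to the first or last entry outside the table range.
    """
    fanouts = [f for f, _ in table]
    if fanout <= fanouts[0]:
        return table[0][1]
    if fanout >= fanouts[-1]:
        return table[-1][1]
    i = bisect_left(fanouts, fanout)
    f1, v1 = table[i]
    if f1 == fanout:
        return v1  # exact table entry, no interpolation needed
    f0, v0 = table[i - 1]
    return v0 + (fanout - f0) / (f1 - f0) * (v1 - v0)


cap_at_4 = lookup(CAPACITANCE_10K, 4)    # exact entry
cap_at_5 = lookup(CAPACITANCE_10K, 5)    # between the entries for 4 and 7
res_at_20 = lookup(RESISTANCE_10K, 20)   # beyond the table, clamped
print(cap_at_4, cap_at_5, res_at_20)
```

The key point is that the only input is the fanout count: nothing about where the cells are placed enters the estimate.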
12
Wire Load Model Limitations
Block A encloses 50,000 standard cells and uses
50K_WLM
(Figure labels: Wire A0, Wire A1, Wire A2)
The wire load model assumes that all single-fanout
nets have the same capacitance, resistance, and
area
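
A small sketch can illustrate this limitation. The pin coordinates for the three wires are hypothetical, and the single-fanout wire load value is borrowed from the 10K_WLM table purely as a placeholder, since the slides do not list the 50K_WLM entries. The placement-based estimate uses half-perimeter wirelength, a common stand-in for routed length that the presentation itself does not mention. The two columns are in different arbitrary units; what matters is that one is constant and the other is not.

```python
# Placeholder single-fanout length; the real 50K_WLM value is not given in the slides.
WLM_LENGTH_FANOUT_1 = 0.002


def half_perimeter(pins):
    """Half-perimeter wirelength of the bounding box around a net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Hypothetical pin locations for three single-fanout nets inside one large block.
nets = {
    "A0": [(0, 0), (1, 0)],        # neighbouring cells
    "A1": [(0, 0), (40, 25)],      # moderate distance
    "A2": [(5, 5), (200, 180)],    # spans most of the block
}

# The wire load model sees only the fanout, so every net gets the same value.
wlm_estimates = {name: WLM_LENGTH_FANOUT_1 for name in nets}
# A placement-aware estimate differs for each net.
placed_estimates = {name: half_perimeter(pins) for name, pins in nets.items()}

for name in nets:
    print(name, wlm_estimates[name], placed_estimates[name])
```

The larger the block that shares one wire load model, the wider the spread between the shortest and longest single-fanout net that the model cannot see.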
13
Partition nets into two classes
  • Local nets
  • Typically over 99% of nets fit in this class
  • Wire load models give reasonably accurate
    parasitics
  • Global nets
  • Typically less than 1% of nets fit in this class
  • Wire load models are unreasonably inaccurate for
    large designs; floorplan data must be used instead

(Figure: SIMD Datapath Floorplan, with Local Nets and Global Nets marked)
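
One way to picture the two-class treatment is a lookup that prefers floorplan data when it exists and falls back to the wire load model otherwise. This is a hedged sketch rather than the actual tool flow: the net names, fanouts, and annotated capacitance values are hypothetical, `net_capacitance` is a helper invented here, and the wire load entries are the 10K_WLM capacitance values from the table slide (units not stated there).

```python
# Capacitance by fanout, copied from the 10K_WLM table slide (units not stated).
WLM_CAPACITANCE = {1: 0.002, 2: 0.005, 3: 0.013, 4: 0.022, 7: 0.033, 11: 0.054}

# Hypothetical capacitances annotated from the floorplan for global nets only.
annotated_global_caps = {"bus_data[0]": 0.410, "pe_ctrl_req": 0.275}


def net_capacitance(net, fanout, annotated, wlm):
    """Use floorplan-annotated data for global nets, the wire load model otherwise."""
    if net in annotated:
        return annotated[net]   # global net: value comes from the floorplan
    return wlm[fanout]          # local net: fanout-based estimate is good enough


# Hypothetical nets with their fanouts.
nets = {"alu_sum[3]": 2, "bus_data[0]": 4, "pe_ctrl_req": 1, "dec_sel": 7}

caps = {
    net: net_capacitance(net, fanout, annotated_global_caps, WLM_CAPACITANCE)
    for net, fanout in nets.items()
}
for net, cap in caps.items():
    print(net, cap)
```

Because global nets are a small fraction of the total, only a small annotation file is needed while the bulk of the nets keep the cheap fanout-based estimate.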
14
Daytona testchip 2
15
Improved Flow Part 1
(Flow diagram; labels recovered from the figure: Restructure RTL, Modify Physical Groupings, Implement RTL, RTL, Cell Library, Standard Wire Load, Synopsys Pass1 RTL Synthesis, Netlist, Cell Clusters, Net Groups, Avanti Floorplan, Layout Parasitics, Cell Placement)
16
Improved Flow Part 2
(Flow diagram; labels recovered from the figure: Avanti Floorplan, Cell Placement, Annotated Global Nets, Custom Wire Load, Inplace Optimization, Synopsys Pass2 Resynthesis, Netlist, Cell Clusters, Net Groups, Avanti Place & Route, Layout Parasitics, Final Netlist)
17
Chip implementation specifics
18
Testchip 2 back-end design debriefing
  • Back-end process significantly accelerated
  • Back-end process required 2 months
  • Under 2 man-months of effort
  • The final die size matched the predicted die size
  • Physical cell grouping and early floorplanning
    resulted in no routing congestion in the final
    layout
  • Die utilization for final design exceeded 95%
  • Average cell size 2-3x smaller due to average
    reduction in net size
  • Timing closure achieved with only two iterations
    per pass
  • No manual fixes required post-layout
  • Tool runtimes 5-20x faster, few crashes

19
Daytona testchip 1 and 2 comparison