Title: A SOC DSP Design Methodology Case Study
1A SOC DSP Design Methodology Case Study
- Joseph Williams
- Room 4e-525
- 101 Crawfords Corner Rd.
- Holmdel, NJ 07733
- joew_at_lucent.com
2Outline
- Testchip 1 architecture and methodology summary
- Testchip 1 design review
- Testchip 2 architecture and methodology summary
- Testchip 2 design review
- Results comparison
3Daytona testchip 1
[Block diagram. The labels suggest four processing elements (PEs), each with a 32-bit RISC and 64-bit SIMD core, Hardware Debug, an L1 Cache, and a PE Controller, connected by a 128-bit Split Transaction Bus. Other labeled blocks: I/O Subsystem, Memory Controller, SRAM, Arbiters, Semaphores, Transaction Manager, DMA, Host Interface, Host I/O.]
4Vanilla Flow
[Flow diagram. Labels: RTL, Cell Library, Standard Wire Load, Implement RTL, Synopsys RTL Synthesis, Netlist, Avanti Place & Route, Layout Parasitics, Final Netlist, Reoptimize, Restructure RTL.]
5Chip implementation specifics
7Testchip 1 back-end design debriefing
- Back-end process slipped schedule endlessly
- Back-end process required 9 months
- Well over 18 man-months of effort
- The die size was several times larger than predicted
- 0.35u implementation abandoned for 0.25u due to congestion
- Die utilization for final design was below 25%
- Timing closure was a nightmare
- Initial target of 150 MHz was abandoned
- Could not achieve timing closure on most blocks without several time-consuming iterations
- Inter-block routing required many manual fixes
- Tools required hours and days to produce results and crashed regularly
8What the hell happened?!!
- The design had many characteristics which make back-end difficult
- Many wide busses with large fanin and fanout
- Centralized state machines controlling vast regions of logic
- Timing paths which span multiple blocks
- The design methodology was not sufficient to handle a design of this complexity
- Pre-layout estimates of parasitics were very inaccurate
- No mechanisms existed to predict and manage congestion
- Large design database required extensive tool run-times
- Separate designers did not understand the implication of connections to inter-block routing resources
9Modify the methodology to handle large SOC designs
- Invest time in the redesign of the architecture to make the logical and physical hierarchy similar
- Partition the physical design early in the synthesis process
- Define groupings of cells small enough to be timed accurately with wire load models (local nets)
- Identify nets which cross group boundaries (global nets)
- Invest time in multiple stages of floorplanning
- Use a multipass synthesis strategy with successive refinement of wire parasitics from floorplan
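The local/global split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cell names, group names, and the `classify_nets` helper are hypothetical and do not come from the Daytona flow. A net counts as global when its cells belong to more than one physical group.

```python
from typing import Dict, List, Set, Tuple


def classify_nets(
    cell_group: Dict[str, str], nets: Dict[str, List[str]]
) -> Tuple[Set[str], Set[str]]:
    """Split nets into (local, global) by the physical groups they touch."""
    local_nets: Set[str] = set()
    global_nets: Set[str] = set()
    for net, cells in nets.items():
        # Collect the distinct groups of every cell connected to this net.
        groups = {cell_group[cell] for cell in cells}
        if len(groups) <= 1:
            local_nets.add(net)    # stays inside one group: wire load model is usable
        else:
            global_nets.add(net)   # crosses a group boundary: needs floorplan data
    return local_nets, global_nets


# Hypothetical example data: four cells in two groups, three nets.
cell_group = {"u1": "alu", "u2": "alu", "u3": "regfile", "u4": "regfile"}
nets = {"n1": ["u1", "u2"], "n2": ["u3", "u4"], "n3": ["u2", "u3"]}

local_nets, global_nets = classify_nets(cell_group, nets)
print("local:", sorted(local_nets))
print("global:", sorted(global_nets))
```

Here only `n3` connects cells from both groups, so it is the one net that would be timed from floorplan data rather than from a wire load model.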
10Hierarchical Wire Load Models
[Diagram of blocks and their wire load models]
Block D: 100K_WLM
Block C: 50K_WLM
Block A: 10K_WLM
Block B: 10K_WLM
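One way to read the diagram is that a net is timed with the wire load model of the smallest block that encloses all of its pins. The Python sketch below illustrates that selection rule. The nesting (Block D contains Block C, which contains Blocks A and B) is an assumption, since the transcript preserves only the labels, and the `PARENT` table and helper names are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed hierarchy: D encloses C, and C encloses A and B.
PARENT: Dict[str, Optional[str]] = {"A": "C", "B": "C", "C": "D", "D": None}
WLM: Dict[str, str] = {
    "A": "10K_WLM",
    "B": "10K_WLM",
    "C": "50K_WLM",
    "D": "100K_WLM",
}


def ancestors(block: str) -> List[str]:
    """Return the block followed by each enclosing block, innermost first."""
    chain: List[str] = []
    current: Optional[str] = block
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def enclosing_block(pin_blocks: List[str]) -> str:
    """Smallest block that encloses every block listed in pin_blocks."""
    common = ancestors(pin_blocks[0])
    for block in pin_blocks[1:]:
        others = set(ancestors(block))
        common = [b for b in common if b in others]
    return common[0]  # innermost shared block


def wlm_for_net(pin_blocks: List[str]) -> str:
    """Wire load model for a net whose pins sit in the given blocks."""
    return WLM[enclosing_block(pin_blocks)]


print(wlm_for_net(["A", "A"]))  # net entirely inside Block A
print(wlm_for_net(["A", "B"]))  # net crossing from A to B
print(wlm_for_net(["B", "D"]))  # net with a pin directly in Block D
```

Under these assumptions a net inside Block A uses `10K_WLM`, a net between A and B uses `50K_WLM`, and a net that reaches a pin directly in Block D uses `100K_WLM`.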
11Table Format Wire Load Model
wire_load_table(10K_WLM) {
  fanout_length(1, 0.002);
  fanout_length(2, 0.005);
  fanout_length(3, 0.013);
  fanout_length(4, 0.022);
  fanout_length(7, 0.033);
  fanout_length(11, 0.054);
  fanout_capacitance(1, 0.002);
  fanout_capacitance(2, 0.005);
  fanout_capacitance(3, 0.013);
  fanout_capacitance(4, 0.022);
  fanout_capacitance(7, 0.033);
  fanout_capacitance(11, 0.054);
  fanout_resistance(1, 0.005);
  fanout_resistance(2, 0.005);
  fanout_resistance(3, 0.139);
  fanout_resistance(4, 0.276);
  fanout_resistance(7, 0.550);
  fanout_resistance(11, 0.785);
  fanout_area(1, 0.5);
  fanout_area(2, 1.0);
  fanout_area(3, 1.5);
  fanout_area(4, 2);
  fanout_area(7, 3.5);
  fanout_area(11, 5.5);
}
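To show how such a table might be consulted, here is a minimal Python sketch that looks up the capacitance for a given fanout using the `fanout_capacitance` entries from the slide. Linear interpolation between listed fanouts and clamping outside the listed range are simplifying assumptions made for this illustration, not a description of what any particular synthesis tool does. The `lookup` function name is hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# (fanout, capacitance) pairs copied from the 10K_WLM table on the slide.
FANOUT_CAPACITANCE: List[Tuple[int, float]] = [
    (1, 0.002), (2, 0.005), (3, 0.013), (4, 0.022), (7, 0.033), (11, 0.054),
]


def lookup(table: List[Tuple[int, float]], fanout: int) -> float:
    """Return the table value for a fanout, interpolating between entries."""
    fanouts = [f for f, _ in table]
    # Assumption: clamp to the first/last entry outside the listed range.
    if fanout <= fanouts[0]:
        return table[0][1]
    if fanout >= fanouts[-1]:
        return table[-1][1]
    i = bisect_left(fanouts, fanout)
    if fanouts[i] == fanout:
        return table[i][1]  # exact entry in the table
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    # Assumption: linear interpolation between the two neighbouring entries.
    return y0 + (y1 - y0) * (fanout - x0) / (x1 - x0)


for fanout in (1, 3, 5, 20):
    print(fanout, round(lookup(FANOUT_CAPACITANCE, fanout), 6))
```

A fanout of 5 is not listed, so the sketch interpolates between the entries for fanouts 4 and 7.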
12Wire Load Model Limitations
Block A encloses 50,000 standard cells and uses 50K_WLM.
[Figure: Block A with wires labeled Wire A0, Wire A1, and Wire A2.]
The wire load model assumes that all single-fanout nets have the same capacitance, resistance, and area.
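The limitation can be made concrete with a small Python sketch that compares a single wire-load length against a Manhattan-distance estimate for three single-fanout wires. Every number here is hypothetical: the pin coordinates, the units, and `WLM_LENGTH_FANOUT_1` were invented for illustration and are not taken from the design.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def manhattan(p: Point, q: Point) -> float:
    """Manhattan distance between two pins, a simple length estimate."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


# Hypothetical driver/load pin positions for three single-fanout wires.
wires: Dict[str, Tuple[Point, Point]] = {
    "A0": ((0, 0), (10, 5)),
    "A1": ((0, 0), (400, 300)),
    "A2": ((100, 100), (1500, 900)),
}

# Hypothetical length the wire load model assigns to every fanout-1 net.
WLM_LENGTH_FANOUT_1 = 100.0

# Ratio of placed length to modeled length; 1.0 would be a perfect estimate.
ratios = {
    name: manhattan(driver, load) / WLM_LENGTH_FANOUT_1
    for name, (driver, load) in wires.items()
}

for name, ratio in ratios.items():
    print(f"Wire {name}: placed length is {ratio:.2f}x the modeled length")
```

With these made-up positions the same model value overestimates the short wire and badly underestimates the long ones, which is the kind of error a single per-fanout value cannot avoid.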
13Partition nets into two classes
- Local nets
  - Typically over 99% of nets fit in this class
  - Wire load models give reasonably accurate parasitics
- Global nets
  - Typically less than 1% of nets fit in this class
  - Wire load models are unreasonably inaccurate for large designs; floorplan data must be used instead
[Figure: SIMD Datapath Floorplan, with Local Nets and Global Nets labeled.]
14Daytona testchip 2
15Improved Flow Part 1
[Flow diagram. Labels: RTL, Cell Library, Standard Wire Load, Implement RTL, Synopsys Pass1 RTL Synthesis, Netlist, Cell Clusters, Net Groups, Avanti Floorplan, Layout Parasitics, Cell Placement, Modify Physical Groupings, Restructure RTL.]
16Improved Flow Part 2
[Flow diagram. Labels: Avanti Floorplan, Cell Placement, Annotated Global Nets, Custom Wire Load, Synopsys Pass2 Resynthesis, Inplace Optimization, Netlist, Cell Clusters, Net Groups, Avanti Place & Route, Layout Parasitics, Final Netlist.]
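The "Custom Wire Load" step in the flow could be produced in many ways. One simple possibility, sketched below in Python, is to average the placed length of local nets per fanout and emit the result as table entries. The averaging rule, the sample net data, and the function names are all assumptions made for illustration. The slides do not say how the custom models were actually derived.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_custom_table(nets: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Average the placed length of nets per fanout.

    nets is a list of (fanout, placed_length) pairs for local nets only.
    Global nets are assumed to be annotated individually from the floorplan.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for fanout, length in nets:
        buckets[fanout].append(length)
    return sorted((f, sum(v) / len(v)) for f, v in buckets.items())


def format_entries(table: List[Tuple[int, float]]) -> List[str]:
    """Render the table as fanout_length entries, one per line."""
    return [f"fanout_length({fanout}, {length:.3f});" for fanout, length in table]


# Hypothetical placed lengths for five local nets.
placed_nets = [(1, 10.0), (1, 14.0), (2, 30.0), (3, 45.0), (3, 55.0)]

custom_table = build_custom_table(placed_nets)
for line in format_entries(custom_table):
    print(line)
```

Repeating this after each floorplanning pass would give the successive refinement of wire parasitics that the methodology calls for.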
17Chip implementation specifics
18Testchip 2 back-end design debriefing
- Back-end process significantly accelerated
- Back-end process required 2 months
- Under 2 man-months of effort
- The final die size matched the predicted die size
- Physical cell grouping and early floorplanning resulted in no routing congestion in the final layout
- Die utilization for final design exceeded 95%
- Average cell size 2-3x smaller due to average reduction in net size
- Timing closure achieved with only two iterations per pass
- No manual fixes required post-layout
- Tool runtimes 5-20x faster, few crashes
19Daytona testchip 1 and 2 comparison
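The figures quoted in the two debriefing slides can be turned into rough ratios. The Python sketch below uses only numbers stated in those slides, reading the die utilization figures as percentages. Because the slides give bounds ("well over 18", "under 2", "below 25", "exceeded 95"), the effort and utilization results are lower bounds rather than exact values. The variable names are illustrative.

```python
# Back-end schedule in months, as stated for each testchip.
months_tc1, months_tc2 = 9, 2

# Effort in man-months: "well over 18" for testchip 1, "under 2" for testchip 2.
effort_tc1_at_least, effort_tc2_at_most = 18, 2

# Die utilization in percent: "below 25" for testchip 1, "exceeded 95" for testchip 2.
util_tc1_at_most, util_tc2_at_least = 25, 95

schedule_ratio = months_tc1 / months_tc2
effort_ratio_lower_bound = effort_tc1_at_least / effort_tc2_at_most
util_ratio_lower_bound = util_tc2_at_least / util_tc1_at_most

print(f"Schedule: {schedule_ratio:.1f}x shorter")
print(f"Effort: more than {effort_ratio_lower_bound:.0f}x lower")
print(f"Die utilization: more than {util_ratio_lower_bound:.1f}x higher")
```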