Design Optimization in Cell-Based Design Environment

Provided by: Hidetosh8
1
Design Optimization in Cell-Based Design
Environment
  • Hidetoshi Onodera
  • Department of Communications and Computer
    Engineering
  • Kyoto University

2
Design Optimization in Cell-Based Design
Environment
  • Design Optimization with On-Demand Library
    Generation
  • Overview: Background and Objectives
  • On-Demand Library Generation
  • Design Experiment
  • Post-Layout Optimization
  • Cell-Based Data-Path Design with On-Demand
    Library Generation

3
Design Optimization with On-Demand Library
Generation
  • Conventional Cell-Based Design
  • Cell library is supplied by a fab. or library
    vendor
  • Designer uses the library as a black box.
  • Pros: High reliability
  • Cons: Excessive safety margin, moderate performance
  • On-Demand Library Generation
  • Library design is part of the total design cycle
  • Library can be optimized according to the spec.,
    process, circuit structure, etc.
  • Pros: Performance enhancement
  • Cons: Needs automatic library generation/characterization

4
Design Optimization with On-Demand Library
Generation: Objective
Short TAT Design of High-Performance and
Cost-Effective SoCs
  • Custom (Optimized) Design in Cell-Based (ASIC)
    Design Environment
  • Transistor-level optimization in ASIC design
    environment
  • Enhancement of IP Re-Usability
  • Performance adjustability under various process
    technologies
  • Short TAT Design by On-Demand Library Generation
  • Management of UDSM Effects
  • Interconnect delay, cross talk, performance
    variability, etc.

5
Design Optimization with On-Demand Library
Generation: Overview
[Figure: design flow diagram. Labels: RTL, Performance Estimation, Logic Synth., Layout Synth., ASIC/SoC]
  • On-Demand Library Generation: inputs are Spec., Circuit, Process; output is an Optimized Library (tuning of Tr. sizes, variety, strength)
  • Post-Layout Optimization: delay, power, noise optimization by Tr. sizing, based on detail-routed layout
6
Design Optimization in Cell-Based Design
Environment
  • Design Optimization with On-Demand Library
    Generation
  • Overview: Background and Objectives
  • On-Demand Library Generation
  • Design of library structure
  • Cell layout generation
  • Design Experiment
  • Post-Layout Optimization
  • Cell-Based Data-Path Design with On-Demand
    Library Generation

7
Design of Library Structure
  • Effect of library structure (variety in logic
    and strength) is experimentally examined.
  • Variety in logic: Compact set is OK.
  • Basic logics (nand, nor, xor)
  • Simple complex logics (aoi, oai)
  • Positive logics (and, ao, etc.)
  • Driving strength: Wide variety is necessary.
  • Small and intermediate strength for power
    reduction
  • Large strength for high speed

Cell Library with Variable Driving Strength
8
Generation of Cell Layout with Variable Driving
Strength
Requirements
Symbolic layout
Real layout
  • Process independence
  • Dense layout
  • Variable driving strength
  • cell height
  • Tr. width inside cell
  • Coping with
  • phase shift mask
  • mismatch of Tr. and wire pitches

Applied to 0.35, 0.18, 0.13 µm
9
Example of Variable Driving Strength
Standard size
Half size
Fixed height
Adjustable Tr. width
Fixed pin locations
10
Features of Symbolic Layout
  • Hierarchically defined virtual grid for
  • the adaptability of design rules
  • the flexibility in transistor width

11
Examples of Generated Layout
9-pitch, max width.
9-pitch, different width
11-pitch, max width
12
Comparison with Fixed Cells Used for Mass
Production (0.18 µm)
Similar Performance
13
Comparison with Fixed Cells Used for Mass
Production (0.18 µm)
Small Area Penalty
14
Design Optimization in Cell-Based Design
Environment
  • Design Optimization with On-Demand Library
    Generation
  • Overview: Background and Objectives
  • On-Demand Library Generation
  • Design Experiment
  • Real Chip Example: DSP for Moving Picture
    Compression
  • 32-bit RISC Core
  • Post-Layout Optimization
  • Cell-Based Data-Path Design with On-Demand
    Library Generation

15
DSP for Moving Picture Compression
  • DLX-based RISC Processor
  • 10 bit x 16 parallel SIMD operation
  • 15 k cells
  • 0.35 µm 3-Metal Process
  • 2 Cores with Different Libraries
  • Fixed Library: Process-Specific Library
  • On-Demand Library

16
Result
Routing Resource Limited
Fixed Lib.
Core area 8% less
(Cell area 17% less)
4.9 mm
Power Dissipation (Measured at 25 MHz, 1.6 V): 10%
less (21% less with optimized Flip-Flops)
On-Demand Lib.
17
Design Experiments
  • 32 bit RISC Core
  • 9 k Cells
  • 0.35 µm 3-Metal Process
  • Design Specifications (Clock Freq.)
  • 100 MHz, 120 MHz, 130 MHz
  • Libraries under Comparison
  • Fixed Library: Process-Specific Library
  • On-Demand Library

18
Timing Closure
Timing failed
Timing failed
19
Design Results with On-Demand Lib.
(a) 100 MHz
(b) 120 MHz
(c) 130 MHz
9-pitch cell
11-pitch cell
13-pitch cell
20
Area-Delay Trade-off Characteristics
21
Design Optimization in Cell-Based Design
Environment
  • Design Optimization with On-Demand Library
    Generation
  • Overview: Background and Objectives
  • On-Demand Library Generation
  • Design Experiment
  • Post-Layout Optimization
  • Cell-Based Data-Path Design with On-Demand
    Library Generation

22
Post-Layout Transistor Sizing for Power and
Crosstalk Reduction
Standard Size
After Optimization
Fixed Height
Width is tunable.
Pin Location is fixed.
Interconnect is preserved while tuning.
23
Results of Power Optimization
  • Constraints
  • Minimum Delay
  • Max Transition: 0.5 ns
  • Noise Margin: 0.25 Vdd
  • 0.35 µm Process
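The constraints listed above can be read as a per-net feasibility check applied to each candidate sizing. The sketch below is only an illustration under that reading: the function name, its parameters, and the 3.3 V supply in the usage lines are hypothetical and do not come from the slides.

```python
def meets_constraints(transition_ns, noise_v, vdd_v,
                      max_transition_ns=0.5, noise_fraction=0.25):
    """Return True if a net satisfies both limits from the slide.

    transition_ns: signal transition time on the net, in ns
    noise_v:       peak noise seen on the net, in volts
    vdd_v:         supply voltage, in volts (hypothetical input)
    """
    transition_ok = transition_ns <= max_transition_ns
    noise_ok = noise_v <= noise_fraction * vdd_v
    return transition_ok and noise_ok


# Hypothetical nets checked against an assumed 3.3 V supply.
ok_net = meets_constraints(transition_ns=0.4, noise_v=0.5, vdd_v=3.3)
slow_net = meets_constraints(transition_ns=0.6, noise_v=0.5, vdd_v=3.3)
noisy_net = meets_constraints(transition_ns=0.4, noise_v=1.0, vdd_v=3.3)
```

A sizing step would keep only transistor widths for which every net passes such a check.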

60% Power Reduction
Power is evaluated by PowerMill.
24
An Example of Power Reduction (des)
  • Initial Circuit (x1, x2, x3, x4, ...): 14.7 mW
  • Discrete Opt. (. x0.15, x0.5): 11.3 mW (-23%)
  • Continuous Opt.: 6.4 mW (-56%)
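The bracketed figures on this slide are consistent with percentage changes relative to the initial circuit. A short check, using only the three power values quoted above (the helper name is an illustrative choice):

```python
def percent_change(before_mw, after_mw):
    """Relative change from before_mw to after_mw, in percent."""
    return 100.0 * (after_mw - before_mw) / before_mw


initial_mw = 14.7      # initial circuit
discrete_mw = 11.3     # after discrete optimization
continuous_mw = 6.4    # after continuous optimization

discrete_change = percent_change(initial_mw, discrete_mw)
continuous_change = percent_change(initial_mw, continuous_mw)
print(f"discrete: {discrete_change:.1f}%, continuous: {continuous_change:.1f}%")
```

Rounded to whole percent, the two results match the -23 and -56 shown on the slide.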

25
Initial and Optimized Layouts
Initial
Optimized
26
Peak Current Reduction
66% reduction
Flattened current
Reduction of IR-drop, di/dt noise, and
electromigration: reliability is enhanced.
des, fastest, Max transition 0.5ns
27
Results of Crosstalk Noise Reduction
Crosstalk noise is reduced while delay is kept
constant.
28
Design Optimization with On-Demand Library
Generation
  • Custom Design Quality in Cell-Based Design
    Environment by
  • On-Demand Library Generation
  • Post-Layout Transistor Sizing
  • Large Power Savings
  • Cross-Talk Noise Reduction
  • P.S. Generated libraries are used in a Japanese MPC
    service for academia (similar to MOSIS).

29
Design Optimization in Cell-Based Design
Environment
  • Design Optimization with On-Demand Library
    Generation
  • Overview: Background and Objectives
  • On-Demand Library Generation
  • Design Experiment
  • Post-Layout Optimization
  • Cell-Based Data-Path Design with On-Demand
    Library Generation

30
Data-Path Design in Cell-Based Design Environment
Data-Path Circuits
Regular flow of signals (Bit-slice layout)
Transistor-level Performance Opt.
Evaluate the impact of the above operations
31
Three layout (placement) procedures
  • Manual placement of cells and I/Os considering
    regularity of signal flow. Automatic routing
  • Manual placement of I/Os considering regularity
    of signal flow. Automatic placement and
    routing.
  • Automatic placement and routing of I/Os and cells

Lay out 3 circuits in the same area and compare
wire length and delay
32
Test circuits
  • Carry select adder (8-bit and 32-bit)
  • 16-bit tree-style multiplier
  • 0.35 µm technology with three metal layers

carry select adder
multiplier
33
Signal flow
4-2 adder
partial product
16-bit multiplier using 4-2 adder
34
Signal flow
folded 16-bit multiplier using 4-2 adder
35
Design Time
  • Manual placement of I/Os and cells
  • 8-bit carry select adder 3 hours
  • 32-bit carry select adder 4 hours
  • 16-bit tree-style multiplier 5 hours

36
Design Results (Total Wire Length)
Total wire length decreased by Max. 63%, Ave. 22%
Not much difference between Manual and Semi-Auto
(Manual I/Os, Automatic cells)
[Chart labels: -8%, -18%, -63%, -20%, -58%, -15%]
37
Design Results (Delay)
  • Critical path delay evaluated by PathMill

Delay decreased by Max. 12%, Ave. 4%
Not much difference between Manual and Semi-Auto
(Manual I/Os, Automatic cells)
[Chart labels: -5%, -5%, -12%, -11%, -3%, -3%]
38
Issues in Manual Cell Placement
  • Longer design time
  • Difficulty in achieving compact layout while
    keeping regularity

Manually placed 16-bit multiplier using 4-2 adder
39
Area Reduction by Automatic Placement
  • Automatic placement reduces dead space as far as
    routability allows.
  • Proper placement of I/Os ensures little
    degradation of performance

40
Design Results (Total Wire Length)
  • Core ratio, from 0.65 to 0.96
  • Decrease in total wire length

41
Design Results (Delay)
  • Constant delay around 7.6ns

Manual placement: I/Os. Automatic
placement: Cells
42
Data-Path Design in Cell-Based Design Environment
Data-Path Circuits
Regular flow of signals (Bit-slice layout)
Transistor-level Performance Opt.
Cell-Based Design Environment
Layout design with regular flow of signals
Transistor sizing inside leaf-cell
Evaluate the impact of the above operations
43
Experiment: Tr. sizing (1/2)
  • Evaluate performance improvement by Tr. sizing
  • Longest path delay
  • Power dissipation
  • Tr. Sizing strategy
  • 1. Minimize longest path delay
  • 2. Minimize sum of Tr. sizes within 3% delay
    increase
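The two-step strategy above can be sketched on a toy problem. This is not the flow used in the experiment, which ran a non-linear optimizer; here an exhaustive search over a small discrete grid stands in for it, and the delay model, the width grid, and all names are hypothetical.

```python
from itertools import product


def path_delay(widths, load=8.0, driver_r=0.5):
    """Toy delay of a gate chain (hypothetical model, arbitrary units)."""
    delay = driver_r * widths[0]      # upstream driver charging the first gate
    for i, w in enumerate(widths):
        next_cap = widths[i + 1] if i + 1 < len(widths) else load
        delay += 1.0 + next_cap / w   # intrinsic delay + load / drive strength
    return delay


def two_step_sizing(choices, n_stages, slack=0.03):
    """Step 1: minimize delay. Step 2: minimize total size within the slack."""
    candidates = list(product(choices, repeat=n_stages))
    delay_opt = min(candidates, key=path_delay)
    budget = (1.0 + slack) * path_delay(delay_opt)
    feasible = [w for w in candidates if path_delay(w) <= budget]
    # Smallest total width wins; ties are broken by the shorter delay.
    area_opt = min(feasible, key=lambda w: (sum(w), path_delay(w)))
    return delay_opt, area_opt


delay_opt, area_opt = two_step_sizing(choices=(1.0, 2.0, 3.0, 4.0), n_stages=3)
print("delay-optimal:", delay_opt, path_delay(delay_opt))
print("size-optimal within 3%:", area_opt, path_delay(area_opt))
```

The second step can never return a larger total size than the first, because the delay-optimal sizing always lies inside its own 3% budget.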

44
Experiment: Tr. sizing (2/2)
  • Limited implementation
  • Non-linear optimizer with PathMill
  • Same logic cells have same structure.
  • 52 variables in 32b-CSA
  • (28 Tr. in FA, 12 Tr. in MUX, 12 Tr. in MUX)
  • 34 variables in 16b-multiplier
  • (28 Tr. in FA, 6 Tr. in AND2)
  • Optimized 32b-CSA is imported into 16b-multiplier
    as CPA.
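The variable counts above follow from counting transistors per distinct cell type rather than per cell instance. The sketch below assumes that "same logic cells have same structure" means all instances of a cell type share one set of transistor sizes; the dictionary keys, including the "_a" and "_b" suffixes for the two MUX entries, are hypothetical labels.

```python
# Transistors per distinct cell type, as listed on the slide.
# The two MUX entries are kept separate because the slide lists them twice.
csa_cells = {"FA": 28, "MUX_a": 12, "MUX_b": 12}
mult_cells = {"FA": 28, "AND2": 6}


def variable_count(cell_types):
    """One sizing variable per transistor of each distinct cell type."""
    return sum(cell_types.values())


csa_vars = variable_count(csa_cells)
mult_vars = variable_count(mult_cells)
print(csa_vars, mult_vars)
```

Sharing sizes across instances keeps the optimization small regardless of how many cells the data path contains.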

45
CPU Time
  • 8-bit Carry Select Adder 4 hours
  • 32-bit Carry Select Adder 7 hours
  • 16-bit Tree-style multiplier 5 days

46
Tr-Sizing Results (Delay)
Max. 20%, Ave. 15% decrease
[Chart labels: -14%, -11%, -20%, -15%, -19%, -13%]
Evaluated by PathMill
47
Tr-Sizing Results (Power)
Max. 58%, Ave. 43% decrease
[Chart labels: -38%, -41%, -49%, -36%, -58%, -36%]
Evaluated by PowerMill with 100 random patterns in
10 ns cycles
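The maximum and average figures on the two Tr-sizing result slides can be checked against the six chart labels on each slide. The sketch assumes that each label is the percentage decrease for one test design; the variable and function names are illustrative.

```python
# Chart labels from the delay and power result slides, as percent decreases.
delay_decrease = [14, 11, 20, 15, 19, 13]
power_decrease = [38, 41, 49, 36, 58, 36]


def summarize(values):
    """Return (maximum, arithmetic mean) of a list of percentages."""
    return max(values), sum(values) / len(values)


delay_max, delay_avg = summarize(delay_decrease)
power_max, power_avg = summarize(power_decrease)
print(f"delay: max {delay_max}%, avg {delay_avg:.1f}%")
print(f"power: max {power_max}%, avg {power_avg:.1f}%")
```

Rounded to whole percent, both summaries agree with the headline numbers on the slides.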
48
Cell-Based Data-Path Design with On-Demand
Library Generation
  • Experimentally evaluates effectiveness of
  • Bit-slice layout that maintains regularity and
    signal flow
  • Transistor sizing
  • Automatic cell placement realizes equal quality
    in arithmetic unit design
  • Transistor sizing is very effective both in delay
    and power

49
Design Optimization in Cell-Based Design
Environment
  • Design Optimization with On-Demand Library
    Generation
  • Custom Design Quality in Cell-Based Design
    Environment by
  • On-Demand Library Generation
  • Post-Layout Transistor Sizing
  • Large Power Savings
  • Cross-Talk Noise Reduction
  • Cell-Based Data-Path Design with On-Demand
    Library Generation
  • Manual I/Os, Automatic Cells works well
  • Transistor sizing is effective both in delay and
    power