Title: Mozammel Hossain
1. Ph.D. Preliminary Exam
- Mozammel Hossain
- Colorado State University
- Department of Electrical and Computer Engineering
- Nest Circuit Lead, IBM, Austin, TX
- Advisor
  - Prof. Tom W. Chen
- Committee Members
  - Prof. Yashwant Malaiya
  - Dr. Sudeep Pasricha
  - Dr. Ali Pezeshki
2. Research Area
- Synthesis-based design and implementation methodology for:
  - High-speed, high-performance units (LBS)
  - Sync-async interface timing
  - Arrays with clock gating, to convert them to synthesizable macros
3. Outline
- Introduction
- Overview of Present Synthesis Methodology
- Future Research and Innovation in Synthesis Methodology: Problem Definitions
  - Large Block Synthesis (LBS): L2 Cache Unit
  - Sync-Async Interface Timing
  - Clock Gating Support for Array Design
- Approaches
- Preliminary Results
- Conclusion and Future Work
- Acknowledgement
4. Introduction
- The technology market demands faster turnaround of IC design, and designers struggle to meet performance requirements.
- Costs for design, validation, and time to market keep increasing.
- Past generations of microprocessors relied more on custom circuit design to win the battle for tighter cycle times.
- The industry is moving toward a common synthesizable design methodology, in most cases sacrificing the desired chip speed in favor of new functionality and time to market.
5. Introduction: Design Methodology
6. Introduction: Macro Design Spectrum
(Figure: levels of design customization plotted against design effort)
5) Custom design (conventional)
4) Custom prerouting
3) Embedded custom components
2) Pre-placed LCBs/latches
1) VHDL structuring, parm customization, e.g.
   ATTRIBUTE BLOCK_DATA of add64 : label is "LOGIC_STYLE/xxxx/";
0) Vanilla synthesis
7. Introduction: Trend of Design Methodology over the Last 16 Years
8. Overview of Present Synthesis Methodology
- Synthesis flow:
  - VHDL: compile VHDL
  - PDSRTL: front-end synthesis
  - PDSEMPAD: early-mode padding
  - MAR: routing
  - RAPIDS: post-routing optimization
  - PROMOTE: promote the routed design
- Run all backend tools (PDV, extraction, timing)
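The flow above is a fixed sequence in which each step consumes the previous step's output. The sketch below shows that ordering as data, with a driver that stops at the first failing step. It rests on several assumptions. Each `run` hook is a hypothetical stand-in for launching the real IBM tool, and no real command lines are implied. Grouping PDV, extraction, and timing into one "BACKEND" step is also only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    name: str                 # tool name as listed on the slide
    purpose: str              # what the step does
    run: Callable[[], bool]   # hypothetical hook: launches the tool, True on success


def run_flow(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run the steps in order and stop at the first failure.

    Returns the names of the completed steps and the failing step (None if all passed).
    """
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            return completed, stage.name
        completed.append(stage.name)
    return completed, None


def stub() -> bool:
    """Placeholder for a real tool invocation (assumption)."""
    return True


FLOW = [
    Stage("VHDL", "compile VHDL", stub),
    Stage("PDSRTL", "front-end synthesis", stub),
    Stage("PDSEMPAD", "early-mode padding", stub),
    Stage("MAR", "routing", stub),
    Stage("RAPIDS", "post-routing optimization", stub),
    Stage("PROMOTE", "promote routed design", stub),
    Stage("BACKEND", "PDV, extraction, timing", stub),  # hypothetical grouping
]

if __name__ == "__main__":
    done, failed = run_flow(FLOW)
    print("completed:", done, "failed at:", failed)
```

Stopping at the first failure keeps later steps from running on a stale or broken database. For example, RAPIDS should not optimize a design that MAR failed to route.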
9. Overview of Present Synthesis Methodology
(Figure: flow diagram with MAR/RAPIDS, Cadence space, and backend tools: PDV, RLMB)
10. Overview of Present Synthesis Methodology: Slack Sharing Example
(Figure: a broken path next to a path that has margin to share)
- Look at timing across multiple latches.
- Consider sharing positive slack.
11. Overview of Present Synthesis Methodology: Slack Sharing Example
(Figure: both paths with balanced slack)
- Delayed the first clock by 17 ps.
- Balanced slack of 3 ps across 2 latches.
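The arithmetic behind this example works as follows. Delaying a latch's clock by d gives the path into the latch d more time and takes d from the path out of it. Setting s_in + d = s_out - d gives d = (s_out - s_in)/2, and both paths then have slack (s_in + s_out)/2. The input slacks below (-14 ps and +20 ps) are hypothetical. They are one pair consistent with the slide's 17 ps delay and 3 ps balanced slack, since the figure's actual values are not in the text.

```python
from typing import Tuple


def balance_slack(s_in_ps: float, s_out_ps: float) -> Tuple[float, float]:
    """Return (clock_delay_ps, balanced_slack_ps) for one latch.

    s_in_ps:  slack of the path that ends at the latch (capture side)
    s_out_ps: slack of the path that starts at the latch (launch side)
    Delaying the latch clock by d adds d to s_in and removes d from s_out.
    A negative result means the clock should be advanced instead.
    """
    delay = (s_out_ps - s_in_ps) / 2.0
    balanced = s_in_ps + delay  # equals s_out_ps - delay
    return delay, balanced


# Hypothetical slacks: -14 ps into the latch, +20 ps out of it.
delay, balanced = balance_slack(-14.0, 20.0)
print(f"delay clock by {delay:g} ps -> {balanced:g} ps slack on both paths")
```

This prints `delay clock by 17 ps -> 3 ps slack on both paths`. A real tool would also limit d by what the clock network can deliver. That limit is not modeled here.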
12. Overview of Present Synthesis Methodology
- Works very well on:
  - Traditional control macros with 2.5-5M transistors, or about 20K-40K latches
  - Timing-noncritical macros
  - Macros without embedded IP
  - Macros without parent blockages (unit buffers, latches, clock blockages)
  - Slack sharing within a synchronous clock domain
  - Designs without clock gating after the Local Clock Buffer (LCB)
13. Future Research and Innovation in Synthesis Methodology
- 1. Problem definition: Large Block Synthesis (LBS)
  - The current methodology does not work well for much bigger designs such as the L2 Cache Unit (20M transistors).
  - Need techniques such as IP pre-placement, dataflow structuring, and hierarchical embedded synthesis.
  - Need techniques for wire traits, soft hierarchy, and interior pins.
  - Need congestion analysis in critical timing and wiring areas.
  - Develop a synthesis methodology to support:
    - Significantly shorter design cycles
    - Significant reduction in physical design resources
    - Potential area reduction
14. Future Research and Innovation in Synthesis Methodology
- LBS test case to develop the methodology
- Why the L2 Cache Unit?
  - Area-challenged unit
  - Has both 1:1 and 2:1 clocking methodologies
    - 1:1 clocking runs at the same clock speed as the core clock
    - Paths on 1:1 clocking are highly timing-challenged
  - Requires dual-voltage routing and clock gating
  - Combination of dataflow and control macros
  - Big unit that challenges tool-flow run time and data management
15. Why L2 Unit as Test Case?
(Figure: chip floorplan with cores C0-C11, highlighting the Core, L2 Unit, and L3 Unit)
16. LBS: Why L2 Unit as Test Case?
- Total cache size: 512 KB
- >4 GHz for the core interface, control, and dataflow interface
- >2 GHz for the cache, directory, address, L3, and fabric interfaces
- Unit size >4.0 sq mm in 22nm; total black boxes: 82
- Transistors including cache: 44M
- Synthesizable transistors: 19M
17. Future Research and Innovation in Synthesis Methodology
LBS Physical Design Resource Comparison with Proposed Methodology

Physical Design Resource | Traditional Approach (man-months) | Synthesizable Unit Approach (man-months)
Ckt. Designer            | 18                                | 0
Unit Timer               | 6                                 | 0
Unit Integrator          | 6                                 | 0
Unit Ckt. Lead           | 6                                 | 12
Total Resources          | 36                                | 12
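The totals row and the overall change can be checked directly from the per-role rows. The sketch below uses only the man-month figures from the table.

```python
# Man-month figures copied from the resource comparison table.
traditional = {"Ckt. Designer": 18, "Unit Timer": 6, "Unit Integrator": 6, "Unit Ckt. Lead": 6}
synthesizable = {"Ckt. Designer": 0, "Unit Timer": 0, "Unit Integrator": 0, "Unit Ckt. Lead": 12}

total_trad = sum(traditional.values())     # traditional approach total
total_synth = sum(synthesizable.values())  # synthesizable unit approach total
reduction = 1 - total_synth / total_trad   # fraction of man-months saved

print(f"{total_trad} -> {total_synth} man-months ({reduction:.0%} fewer)")
```

This prints `36 -> 12 man-months (67% fewer)`. The only role that grows is the unit circuit lead, which absorbs the timing and integration work.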
18. Future Research and Innovation in Synthesis Methodology
- 2. Problem definition: Synthesis timing methodology for the sync-async interface
  - Slack sharing cannot be done at the sync-async interface, because it can result in a metastable condition.
  - A methodology needs to be developed:
    - To handle slack sharing in the synthesis and timing environment
    - To identify the latches involved and turn off slack sharing for them
    - For design automation
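Identifying the latches involved could be automated along the following lines. This is a minimal sketch under stated assumptions, and the netlist model is hypothetical. Each latch has a clock-domain name and a list of fan-in latches. Domains derived from one source, such as 1:1 and 2:1 core clocks, are grouped as synchronous. The sketch conservatively flags both latches on any path that crosses between groups.

```python
from typing import Dict, List, Set


def sync_group(domain: str, groups: List[Set[str]]) -> int:
    """Index of the synchronous group that contains `domain`."""
    for i, group in enumerate(groups):
        if domain in group:
            return i
    raise KeyError(f"unknown clock domain: {domain}")


def find_async_crossings(domain_of: Dict[str, str],
                         fanin: Dict[str, List[str]],
                         groups: List[Set[str]]) -> Set[str]:
    """Return the latches on either side of a sync-async crossing.

    domain_of: latch name -> clock domain name (hypothetical model)
    fanin:     latch name -> latches that launch data into it
    groups:    sets of mutually synchronous clock domains
    Slack sharing would be turned off for every latch returned.
    """
    flagged: Set[str] = set()
    for latch, sources in fanin.items():
        capture_group = sync_group(domain_of[latch], groups)
        for src in sources:
            if sync_group(domain_of[src], groups) != capture_group:
                flagged.update((latch, src))  # conservative: flag both ends
    return flagged


if __name__ == "__main__":
    groups = [{"core_1to1", "core_2to1"}, {"async_if"}]  # hypothetical domain names
    domain_of = {"a": "core_1to1", "b": "core_2to1", "s": "async_if", "c": "core_1to1"}
    fanin = {"b": ["a"], "c": ["s"], "s": []}
    print(sorted(find_async_crossings(domain_of, fanin, groups)))
```

In the usage example, a to b stays inside one synchronous group, while s to c crosses into an asynchronous domain, so only `c` and `s` are flagged.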
19. Future Research and Innovation in Synthesis Methodology
(Figure: slack sharing cannot be done at the sync-async interface)
20. Future Research and Innovation in Synthesis Methodology
Slack sharing at the sync-async interface can result in a metastability condition.
(Figure: metastability at the latch point)
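The standard synchronizer model helps show why moving time away from a synchronizing latch is risky: MTBF = exp(t_r / tau) / (T0 * f_clk * f_data). Here t_r is the resolution time the latch gets before its output is used, tau is the latch's regeneration time constant, and T0 is its metastability window parameter. Slack sharing that borrows time from the synchronizer shrinks t_r, and MTBF falls exponentially. All parameter values below are placeholders chosen for illustration, not data for any IBM technology.

```python
import math


def synchronizer_mtbf(t_r: float, tau: float, t0: float,
                      f_clk: float, f_data: float) -> float:
    """Mean time between failures (seconds) of a synchronizing latch.

    t_r: resolution time available (s); tau: regeneration time constant (s)
    t0: metastability window parameter (s); f_clk, f_data: clock and data rates (Hz)
    """
    return math.exp(t_r / tau) / (t0 * f_clk * f_data)


# Placeholder values (assumptions, not from the slides).
TAU, T0, F_CLK, F_DATA = 10e-12, 20e-12, 2e9, 0.2e9
full = synchronizer_mtbf(200e-12, TAU, T0, F_CLK, F_DATA)
shared = synchronizer_mtbf(183e-12, TAU, T0, F_CLK, F_DATA)  # 17 ps borrowed away

print(f"MTBF drops by a factor of {full / shared:.1f}")
```

The ratio is exp(17 ps / 10 ps) = exp(1.7), about 5.5. Because the dependence is exponential, even small amounts of borrowed time matter at this interface.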
21. Future Research and Innovation in Synthesis Methodology
- 3. Problem definition: Clock gating support for array design in the synthesis methodology
  - The compilable array offers a fixed menu with limited read/write ports.
  - It does not support clock gating.
  - The current methodology does not allow any gates between the LCB (Local Clock Buffer) and the latch, to prevent electrical rule violations.
  - Wiring, gate placement, and timing constraints need to be developed.
  - Minimum custom design: only the array column
- Potential benefits
  - Around 20% reduction in physical design resources
  - Significantly shorter design cycle
  - Apply the learning to other array designs for more savings
  - Potential area savings in the synthesis flow
22. Proposed Array Design in Synthesis Methodology
(Figure: proposed array design)
- LCB: Local Clock Buffer
- Generates the clock (CLK) for the master-slave (MS) latch
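Any gate placed between the LCB and the latch must not chop or create clock pulses. A common, generic way to gate a clock without glitches is a latch-based clock gate. The enable is captured by a latch that is transparent while the clock is low, and the gated clock is the AND of the clock and the latched enable. The level-by-level model below is a sketch of that textbook technique, not a description of IBM's LCB circuit.

```python
from typing import List, Tuple


def gated_clock(samples: List[Tuple[int, int]]) -> List[int]:
    """Latch-based (glitch-free) clock gating, evaluated sample by sample.

    samples: (clk, enable) logic levels over time.
    The enable latch is transparent while clk == 0 and holds while clk == 1,
    so an enable change during the high phase cannot shorten or create a pulse.
    """
    latched_en = 0
    out: List[int] = []
    for clk, en in samples:
        if clk == 0:
            latched_en = en      # transparent phase: follow the enable
        out.append(clk & latched_en)
    return out


if __name__ == "__main__":
    clk = [0, 1, 1, 0, 1, 1, 0]
    en = [1, 1, 0, 0, 0, 0, 0]   # enable drops in the middle of a high phase
    print(gated_clock(list(zip(clk, en))))
```

With a plain AND gate, the enable dropping at the third sample would cut the first pulse short. The latched version keeps that pulse whole and suppresses the next one.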
23. Approaches
- Pre-placing hard IP in LBS
  - Pseudo algorithm:
      begin_place
      place <inst_name> xloc <> yloc <> <rot> movetype=fixed
      end_place
- Wire trait example in LBS
  - Pseudo parms file (<Flow> <wire_code> <time gain> <routing layers>):
      synthesis_layer_traits W20S10L15 3 3 M2 X3
      fine_opt_layer_traits W20S10L15 3 3 M2 X3
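Pre-placement commands for many hard IP blocks are easier to generate from a table than to write by hand. The sketch below emits a block in the pseudo-algorithm's shape. The exact keyword syntax (such as `movetype=fixed`), the unit of the coordinates, and the rotation code `R0` are assumptions, and the instance name is hypothetical.

```python
from typing import Iterable, NamedTuple


class Placement(NamedTuple):
    inst_name: str  # hard IP instance to pin down
    x: float        # xloc, assumed to be in microns
    y: float        # yloc, assumed to be in microns
    rot: str        # rotation code (hypothetical value, e.g. "R0")


def place_block(items: Iterable[Placement]) -> str:
    """Emit a begin_place/end_place block that fixes each instance in place."""
    lines = ["begin_place"]
    for p in items:
        lines.append(f"place {p.inst_name} xloc {p.x:g} yloc {p.y:g} {p.rot} movetype=fixed")
    lines.append("end_place")
    return "\n".join(lines)


if __name__ == "__main__":
    print(place_block([Placement("l2_array0", 120.0, 45.5, "R0")]))
```

Fixing the IP first lets synthesis place the surrounding logic around known blockages instead of treating the IP as movable.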
24. Approaches: Soft Hierarchy in LBS
- Algorithm:
    inst_name=rlctl prefix=l2rlctl xlow=<> ylow=<> width height
- where
  - <inst_name>: user-specified name used to recognize the gates
  - prefix: name of the logic gates used in the VHDL
  - xlow, ylow: lower-left coordinate
  - width, height: width and height of the macro, in microns
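The soft-hierarchy statement ties a group of gates, found by name prefix, to a rectangular region. The sketch below shows that membership rule and a check for group members placed outside the region. Matching gates with a plain string prefix is an assumption, and the gate names and coordinates in the usage example are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class SoftRegion(NamedTuple):
    inst_name: str  # user-specified group name, e.g. "rlctl"
    prefix: str     # VHDL gate-name prefix, e.g. "l2rlctl"
    xlow: float     # lower-left x, microns
    ylow: float     # lower-left y, microns
    width: float    # microns
    height: float   # microns


def members(region: SoftRegion, gate_names: List[str]) -> List[str]:
    """Gates that belong to the soft hierarchy (names starting with the prefix)."""
    return [g for g in gate_names if g.startswith(region.prefix)]


def outside(region: SoftRegion, locs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Member gates whose placed (x, y) lies outside the region's box."""
    x_hi = region.xlow + region.width
    y_hi = region.ylow + region.height
    return [g for g in members(region, list(locs))
            if not (region.xlow <= locs[g][0] <= x_hi and region.ylow <= locs[g][1] <= y_hi)]


if __name__ == "__main__":
    region = SoftRegion("rlctl", "l2rlctl", 100.0, 200.0, 50.0, 40.0)
    locs = {"l2rlctl_and1": (110.0, 210.0), "l2rlctl_or3": (170.0, 210.0), "l2dir_x": (0.0, 0.0)}
    print(outside(region, locs))
```

Unlike a hard hierarchy, the region only guides placement. Gates keep their flat netlist names, so a check like `outside` is how a stray member would be noticed.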
25. Approaches: Synthesis Parms in LBS
- VT upgrade
  - user_native_vt 1
  - user_alternate_vt 2 3
- Interior PIN
  - pds_assign_interior_pins true
  - pds_pin_spec <metal layer> <width> <height>
  - pds_horizontal_pin_spacing <metal layer> <spacing>
  - pds_vertical_pin_spacing <metal layer> <spacing>
- Rapids
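Each parm above is a name followed by one or more values. A minimal reader for that shape is sketched below. It assumes one parm per line with whitespace-separated values, and it does not claim to reproduce the real tool's parms grammar. The sample uses only parm lines shown on the slide.

```python
from typing import Dict, List


def parse_parms(text: str) -> Dict[str, List[str]]:
    """Parse 'name value [value ...]' lines into a dict; blank lines are skipped.

    Assumes one parm per line with whitespace-separated values; a repeated
    name overwrites the earlier setting.
    """
    parms: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, *values = line.split()
        parms[name] = values
    return parms


SAMPLE = """
user_native_vt 1
user_alternate_vt 2 3
pds_assign_interior_pins true
"""

if __name__ == "__main__":
    for name, values in parse_parms(SAMPLE).items():
        print(name, values)
```

Here `user_alternate_vt` keeps both of its values (`2` and `3`) as a list, which makes the alternate VT choices available in order.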
26. Approaches: Congestion Analysis
- Routing resource allocation at the top level
- Negotiate routing resources with the macro (IP)
- Negotiate PIN placement with the macro (IP)
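A common way to quantify congestion is to compare routing demand with capacity on a global-routing grid and report the overflowing tiles. The sketch below uses that generic demand-versus-capacity metric and is not the specific analysis in the IBM flow. Negotiating with a macro then amounts to raising capacity in a hot tile, for example by reducing the macro's blockage there. The grid values in the usage example are made up.

```python
from typing import Dict, List, Tuple


def congestion_report(demand: List[List[int]],
                      capacity: List[List[int]]) -> Dict[str, object]:
    """Per-tile routing overflow on a global-routing grid.

    demand[r][c]:   routing tracks requested in tile (r, c)
    capacity[r][c]: tracks left after macro (IP) blockages are removed
    Returns the total overflow and the list of overflowing (hot) tiles.
    """
    hot_tiles: List[Tuple[int, int]] = []
    total_overflow = 0
    for r, (d_row, c_row) in enumerate(zip(demand, capacity)):
        for c, (d, cap) in enumerate(zip(d_row, c_row)):
            if d > cap:
                hot_tiles.append((r, c))
                total_overflow += d - cap
    return {"total_overflow": total_overflow, "hot_tiles": hot_tiles}


if __name__ == "__main__":
    demand = [[8, 12], [5, 9]]
    capacity = [[10, 10], [10, 6]]   # tile (1, 1) is shrunk by a macro blockage
    print(congestion_report(demand, capacity))
```

With these numbers, tiles (0, 1) and (1, 1) overflow by 2 and 3 tracks. Those are the tiles where routing resources or pin placement would be negotiated with the macro.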
27. Approaches: Synthesis Methodology at Sync-Async Interface
(Figure: application of the sync-async latch)
28. Approaches: Synthesis Methodology at Sync-Async Interface
(Figure: pseudo algorithm to exclude sync-async latches from slack borrowing)
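The slide's pseudo algorithm itself is only in the figure. The sketch below shows one way such an exclusion could look, under stated assumptions. Each latch gets the clock shift that balances its input and output slack, clamped to a maximum shift. The maximum shift is a hypothetical user-controlled parm, and latches listed as sync-async keep a shift of zero. Latches are treated independently, which is a simplification, because shifting one latch also changes its neighbors' paths.

```python
from typing import Dict, Set, Tuple


def plan_clock_shifts(latches: Dict[str, Tuple[float, float]],
                      async_latches: Set[str],
                      max_shift_ps: float) -> Dict[str, float]:
    """Choose a clock delay (ps) per latch for slack borrowing, skipping sync-async latches.

    latches:       name -> (slack of path into latch, slack of path out of latch), in ps
    async_latches: latches at a sync-async interface; their clocks are never moved
    max_shift_ps:  largest delay/advance allowed (hypothetical user-controlled parm)
    """
    shifts: Dict[str, float] = {}
    for name, (s_in, s_out) in latches.items():
        if name in async_latches:
            shifts[name] = 0.0                 # slack borrowing turned off here
            continue
        ideal = (s_out - s_in) / 2.0           # shift that balances both paths
        shifts[name] = max(-max_shift_ps, min(max_shift_ps, ideal))
    return shifts


if __name__ == "__main__":
    latches = {"l1": (-14.0, 20.0), "sync0": (-10.0, 30.0), "l2": (-40.0, 60.0)}
    print(plan_clock_shifts(latches, {"sync0"}, max_shift_ps=25.0))
```

In the example, `sync0` would benefit from borrowing but is left alone because it sits at the interface. `l2` wants 50 ps and is clamped to the 25 ps limit.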
29. Preliminary Results: Placed and Timed Gates of L2
- Transistors including cache: 44M
- Synthesizable transistors: 19M
30. Preliminary Results: Slack Take-Down of L2
31. Preliminary Results: Clock Gating at Array Interface
- Clock gating is not working.
- Red shapes/lines: current routing and placement
  - Violates timing at the array cell and fails electrical checks
- Blue shapes/lines: desired routing and placement
(Figure: LCBs driving latches (L))
32. Preliminary Results: Clock Gating at Array Interface
- Clock gating is not working.
33. Conclusion and Future Work
- With robust tool sets, the newly proposed synthesis methodology, and design guidelines, the L2 cache unit can be designed with almost 50% fewer resources, even without dedicated unit timing and integration resources.
- Preliminary data is very promising.
- Further experiments will target 10% less unit area once the design is closed.
- A methodology for timing at the sync-async interface in the synthesis flow is being developed with user-controlled parms.
- Clock-gating work is in progress in collaboration with the IBM tool development team.
  - Saves 20% of design effort in the present application in RF design
  - Could lead to more physical design effort savings in all types of array design, e.g., SRAM, CAM, DRAM
34. Acknowledgement
- Advisor: Prof. Tom W. Chen
- Committee Members
  - Prof. Yashwant Malaiya
  - Dr. Sudeep Pasricha
  - Dr. Ali Pezeshki
- IBM
  - Joshua Friedrich
  - Dr. Vikas Agarwal
  - Chirag Desai
  - John Badar
35. Backups
36. Personal Background
- Educational
  - BS in Electrical Engineering, BUET, Dhaka, Bangladesh
  - ME in Electrical Engineering, CUNY, New York
- Professional
  - Product Development Engineer, Advanced Micro Devices (AMD), TX, 1994-1997
    - Circuit design, critical timing path analysis, and layout for the K5 development team
  - Hardware Development Engineer, Mentor Graphics Corporation, NJ, 1997-1999
    - Test chip, data path design, Verilog models for ROM/RAM
  - Member of Technical Staff, Hewlett-Packard (HP), CO, 1999-2002
    - Circuit design for FPU and high-speed IO driver, place and route, timing analysis
  - Senior Engineer, International Business Machines (IBM), TX, 2003-present
    - Fabric Unit interim/co-Circuit Lead, P6
    - GX, TP, CLIB, PC Unit Circuit Lead, P6 DD1
    - L2, L3, NCU Circuit Lead, P6 DD2
    - L2, NCU Circuit Lead, P7 DD1, DD2
    - Nest Circuit Lead for P8, P9