Mozammel Hossain - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Mozammel Hossain


1
Ph.D. Preliminary Exam
  • Mozammel Hossain
  • Colorado State University
  • Department of Electrical and Computer Engineering
  • Nest Circuit Lead, IBM, Austin, TX
  • Advisor
  • Prof. Tom W. Chen
  • Committee Members
  • Prof. Yashwant Malaiya
  • Dr. Sudeep Pasricha
  • Dr. Ali Pezeshki

2
Research Area
  • Synthesis Based Design and Implementation
    Methodology of
  • High Speed, High Performing Unit (LBS)
  • Sync-Async Interface timing
  • Arrays with clock gating
  • To convert to synthesizable macro

3
Outline
  • Introduction
  • Overview of Present Synthesis Methodology
  • Future Research and innovation in Synthesis
    Methodology
  • Problem definitions
  • Large Block Synthesis (LBS) L2 Cache Unit
  • Sync-Async Interface timing
  • Clock Gating support for Array Design
  • Approaches
  • Preliminary results
  • Conclusion and Future Work
  • Acknowledgement

4
Introduction
  • The technology market demands faster turnaround of
    IC designs, and designers struggle to meet
    performance requirements.
  • Costs for design, validation, and time to market
    keep increasing.
  • Past generations of microprocessors relied more on
    custom circuit design to win the battle for tighter
    cycle times.
  • The industry is moving towards a common synthesizable
    design methodology, in most cases sacrificing the
    desired speed of the chip in favor of new functionality
    and time to market.

5
Introduction: Design Methodology
6
Introduction: Macro Design Spectrum
Figure axes: Design Customization vs. Design Effort. Levels shown:
  • 0) Vanilla synthesis
  • 1) VHDL structuring, parm customization, e.g.
    ATTRIBUTE BLOCK_DATA of add64 : label is
    "LOGIC_STYLE/xxxx/"
  • 2) Preplace LCB/latches
  • 3) Embed custom components
  • 4) Custom prerouting
  • 5) Custom design (conventional)

7
Introduction: Trend of Design Methodology for the
Last 16 Years
8
Overview of Present Synthesis Methodology
  • Synthesis
  • VHDL: compile VHDL
  • PDSRTL: front-end synthesis
  • PDSEMPAD: early-mode padding
  • MAR: routing
  • RAPIDS: post-routing optimization
  • PROMOTE: promote routed design
  • Run all backend tools (PDV, extraction, timing)

9
Overview of Present Synthesis Methodology
Flow diagram (image not in transcript); labels: MAR/Rapids, Cadence Space, backend tools, PDV, RLMB
10
Overview of Present Synthesis Methodology:
Slack Sharing Example
Figure labels: "Broken path", "Has margin to share"
  • Look at timing across multiple latches
  • Consider sharing positive slack

11
Overview of Present Synthesis Methodology:
Slack Sharing Example
Figure label: "Balanced slack"
  • Delayed the 1st clock by 17 ps
  • Balanced slack of 3 ps across 2 latches
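
The balancing step in this slack sharing example can be sketched numerically. The slides give only the clock delay (17 ps) and the balanced result (3 ps); the starting slacks of -14 ps and +20 ps used below are back-computed assumptions, and `balance_slack` is a hypothetical helper, not part of any tool named in the talk.

```python
def balance_slack(slack_in_ps, slack_out_ps):
    """Return (clock_delay_ps, balanced_slack_ps) for one latch.

    Delaying the latch clock by d adds d to the slack of the path that
    ends at the latch and removes d from the path that starts at it.
    Setting both slacks equal gives d = (slack_out - slack_in) / 2.
    """
    delay = (slack_out_ps - slack_in_ps) / 2
    return delay, slack_in_ps + delay


# Assumed starting point: broken path at -14 ps, next path with +20 ps margin.
delay, balanced = balance_slack(-14, 20)
print(delay, balanced)
```

With these assumed inputs the helper reproduces the slide's numbers: a 17 ps clock delay leaves 3 ps of slack on both sides of the latch.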

12
Overview of Present Synthesis Methodology
  • Works very well on
  • Traditional control macros with 2.5-5M transistors,
    or about 20K-40K latches
  • Timing non-critical macros
  • Macros without embedded IP
  • Macros without parent blockages
  • Unit buffer, latch, and clock blockages
  • Slack sharing within a synchronous clock domain
  • Designs without clock gating after the Local Clock
    Buffer (LCB)

13
Future Research and Innovation in Synthesis
Methodology
  • 1. Problem definition: Large Block Synthesis
    (LBS)
  • The current methodology does not work well for a much
    bigger design such as the L2 Cache Unit (20M transistors)
  • Needs techniques such as IP pre-placement,
    dataflow structuring, and hierarchical embedded
    synthesis.
  • Needs techniques for wire traits, soft hierarchy,
    and interior pins
  • Needs congestion analysis in areas critical for timing
    and wiring.
  • Develop a synthesis methodology to support
  • Significantly shorter design cycle
  • Significant reduction in physical design resources
  • Potential area reduction

14
Future Research and Innovation in Synthesis
Methodology
  • LBS test case to develop the methodology
  • Why the L2 Cache Unit?
  • Area-challenged unit
  • Has both 1:1 and 2:1 clocking methodologies
  • 1:1 clocking runs at the same clock speed as the core clock
  • Paths on 1:1 clocking are highly timing
    challenged
  • Requires dual-voltage routing and clock gating
  • Combination of data flow and control macros
  • Big unit that challenges tool flow run time and data
    management

15
Why L2 unit as test case?
Chip floorplan figure (image not in transcript); labels: C0-C11, Core, L2 Unit, L3 Unit
16
LBS: Why L2 unit as test case?
Future Research and Innovation in Synthesis
Methodology
  • Total cache size: 512 KByte
  • >4 GHz with core interface, control, and data
    flow interface
  • >2 GHz with cache, directory, address, L3, and fabric
    interface
  • Unit size: >4.0 sq mm in 22 nm; total black boxes:
    82
  • # of transistors including cache: 44M; # of
    synthesizable transistors: 19M

17
Future Research and Innovation in Synthesis
Methodology
LBS: Physical Design Resource Comparison with
Proposed Methodology
Physical Design Resources   Traditional Approach   Synthesizable Unit Approach
                            (man-months)           (man-months)
Ckt. Designer               18                     0
Unit Timer                  6                      0
Unit Integrator             6                      0
Unit Ckt. Lead              6                      12
Total Resources             36                     12
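
As a quick arithmetic check of the comparison above, the totals and the relative reduction can be recomputed from the per-role figures. This is only a sketch over the numbers in the table; the dictionary layout and the `totals` helper are illustrative names, not something from the talk.

```python
# (traditional, synthesizable-unit) effort in man-months, copied from the table
resources = {
    "Ckt. Designer": (18, 0),
    "Unit Timer": (6, 0),
    "Unit Integrator": (6, 0),
    "Unit Ckt. Lead": (6, 12),
}


def totals(table):
    """Sum each approach's effort across all roles."""
    traditional = sum(t for t, _ in table.values())
    proposed = sum(p for _, p in table.values())
    return traditional, proposed


traditional, proposed = totals(resources)
reduction = 1 - proposed / traditional
print(traditional, proposed, f"{reduction:.0%}")
```
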
18
Future Research and Innovation in Synthesis
Methodology
  • 2. Problem definition: synthesis timing
    methodology for the Sync-Async interface.
  • Slack sharing cannot be done at the Sync-Async
    interface because it can result in a metastable
    condition. A methodology needs to be developed
  • To handle slack sharing in the synthesis and timing
    environment
  • To identify the latches involved and turn off slack
    sharing for them
  • For design automation

19
Future Research and Innovation in Synthesis
Methodology
Slack sharing cannot be done at the Sync-Async
interface
20
Future Research and Innovation in Synthesis
Methodology
Slack sharing at the Sync-Async interface can result
in a metastability condition
Metastability at the latch point
21
Future Research and Innovation in Synthesis
Methodology
  • 3. Problem definition: clock gating support for
    array design in the synthesis methodology.
  • The compilable array offers a fixed menu with limited
    read/write ports.
  • It does not support clock gating.
  • The current methodology does not allow any gates
    between the LCB (Local Clock Buffer) and the latch, to
    prevent electrical rule violations.
  • Wiring, gate placement, and timing constraints need
    to be developed.
  • Minimum custom design: only the array column
  • Potential benefits
  • Around 20% reduction in physical design resources.
  • Significantly shorter design cycle
  • Apply the learning to other array designs for more
    savings.
  • Potential area saving in the synthesis flow.

22
Proposed Array Design in Synthesis Methodology
Future Research and Innovation in Synthesis
Methodology
  • LCB: Local Clock Buffer
  • Generates CLK for the MS latch

23
Approaches
  • Pre-placing hard IP in LBS
  • Pseudo algorithm:
  • begin_place
  • place <inst_name> xloc <> yloc <> <rot> movetype=fixed
  • end_place
  • Wire trait example in LBS
  • Pseudo parms file:
  • <Flow> <wire_code> <time gain> <routing
    layers>
  • synthesis_layer_traits W20S10L15 3 3 M2 X3
  • fine_opt_layer_traits W20S10L15 3 3 M2 X3
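
The pre-placement template above lends itself to scripted generation when a unit contains many hard IP blocks. The sketch below emits one `place` line per instance between `begin_place` and `end_place`. The `HardIP` class, the instance name, the coordinates, the rotation code `R0`, and the `movetype=fixed` spelling (an equals sign is assumed to have been lost in the transcript) are all illustrative assumptions rather than the actual tool syntax.

```python
from dataclasses import dataclass


@dataclass
class HardIP:
    name: str   # instance name of the hard IP block (hypothetical)
    x: float    # x location
    y: float    # y location
    rot: str    # rotation code, e.g. "R0" (assumed notation)


def placement_block(instances):
    """Build a begin_place/end_place block with one fixed placement per IP."""
    lines = ["begin_place"]
    for ip in instances:
        lines.append(
            f"place {ip.name} xloc {ip.x} yloc {ip.y} {ip.rot} movetype=fixed"
        )
    lines.append("end_place")
    return "\n".join(lines)


print(placement_block([HardIP("l2_dir_array0", 120.0, 45.5, "R0")]))
```

Keeping the placements in a data structure rather than a hand-edited file makes it easier to regenerate the block when the floorplan changes.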

24
Approaches: Soft Hierarchy in LBS
Algorithm: inst_name=rlctl prefix=l2rlctl xlow=<>
ylow=<> width= height=
where
  • <inst_name>: user-specified name to recognize gates
  • prefix: the name of the logic gates used in VHDL
  • xlow, ylow: lower-left coordinate
  • width, height: width and height of the macro in microns
25
Approaches: Synthesis Parms in LBS
  • VT upgrade
  • user_native_vt 1
  • user_alternate_vt 2 3
  • Interior pins
  • pds_assign_interior_pins true
  • pds_pin_spec <metal layer> <width>
    <height>
  • pds_horizontal_pin_spacing <metal layer>
    <Spacing>
  • pds_vertical_pin_spacing <metal layer>
    <Spacing>
  • Rapids

26
Approaches: Congestion Analysis
  • Routing resource allocation at top level
  • Negotiate routing resources with macro (IP)
  • Negotiate PIN placement with macro (IP)

27
Approaches: Synthesis Methodology at Sync-Async
Interface
Application of the Sync-Async latch
28
Approaches: Synthesis Methodology at Sync-Async
Interface
Pseudo algorithm to exclude Sync-Async latches from
slack borrowing
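
The slide's own pseudo algorithm is a figure that did not survive in the transcript. As a stand-in, here is a minimal sketch of the idea stated earlier in the talk (identify the latches involved, then turn off slack sharing for them). The `Latch` record, its clock-domain fields, and the domain names are hypothetical; a real flow would take this information from the timing environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latch:
    name: str
    launch_domain: str   # clock domain of the logic driving the latch
    capture_domain: str  # clock domain of the latch's own clock


def split_for_slack_sharing(latches):
    """Partition latch names into (shareable, excluded).

    A latch whose data crosses from one clock domain into another is
    treated as a Sync-Async latch and excluded from slack borrowing.
    """
    shareable, excluded = [], []
    for latch in latches:
        if latch.launch_domain == latch.capture_domain:
            shareable.append(latch.name)
        else:
            excluded.append(latch.name)
    return shareable, excluded


latches = [
    Latch("ctl_q0", "core_clk", "core_clk"),
    Latch("sync_q0", "fabric_clk", "core_clk"),
]
print(split_for_slack_sharing(latches))
```
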
29
Preliminary Results: Placed and Timed Gates of L2
# of transistors including cache: 44M; # of
synthesizable transistors: 19M
30
Preliminary Results: Slack Take-Down of L2
31
Preliminary Results: Clock Gating at Array
Interface
  • Clock gating is not working
  • Red shape/line: current routing and placement
  • Violates timing at the array cell and the electrical check
  • Blue shape/line: desired routing and placement

Figure labels: L, LCB
32
Preliminary Results: Clock Gating at Array
Interface
  • Clock gating is not working

33
Conclusion and Future Work
  • With robust tool sets, the newly proposed synthesis
    methodology, and design guidelines, the L2 cache unit
    can be designed with almost 50% fewer resources,
    even without dedicated unit timing and
    integration resources.
  • Preliminary data is very promising.
  • Further experiments will target 10% less unit area once
    the design is closed.
  • A timing methodology for the Sync-Async interface in the
    synthesis flow is being developed with
    user-controlled parms.
  • Clock-gating work is in progress in collaboration
    with the tool development team at IBM.
  • Saves 20% of design effort in the present application
    in RF design
  • Could lead to more physical design effort
    savings in all types of array design, e.g. SRAM,
    CAM, DRAM

34
Acknowledgement
  • Advisor: Prof. Tom W. Chen
  • Committee Members
  • Prof. Yashwant Malaiya
  • Dr. Sudeep Pasricha
  • Dr. Ali Pezeshki
  • IBM
  • Joshua Friedrich
  • Dr. Vikas Agarwal
  • Chirag Desai
  • John Badar

35
Back-ups
36
Personal Background
  • Educational
  • BS in Electrical Engineering, BUET, Dhaka,
    Bangladesh
  • ME in Electrical Engineering, CUNY, New York
  • Professional
  • Product Development Engineer, Advanced Micro
    Devices (AMD), TX, 1994-1997
  • Circuit design, critical timing path analysis,
    layout for the K5 development team
  • Hardware Development Engineer, Mentor Graphics
    Corporation, NJ, 1997-1999
  • Test chip, data path design, Verilog model for
    ROM/RAM
  • Member of Technical Staff, Hewlett Packard (HP),
    CO, 1999-2002
  • Circuit design for FPU, high-speed IO driver,
    place and route, timing analysis
  • Senior Engineer, International Business Machines
    (IBM), TX, 2003-present
  • Fabric Unit interim/co-Circuit Lead, P6
  • GX, TP, CLIB, PC Unit Circuit Lead, P6 DD1
  • L2, L3, NCU Circuit Lead, P6 DD2
  • L2, NCU Circuit Lead, P7 DD1, DD2
  • Nest Circuit Lead for P8, P9