Title: Low Power Clocking
1Low Power Clocking
- Through the Use of Dual Edge Triggered Flip-Flops
- Gabriel Ricardo
- Theresa Holliday
2Outline
- Dual Edge Flip-Flops overview
- Standard Cell Characterization
- LEON Synthesis for SET design
- LEON Synthesis for DET design
- Issues with including dual-edge CSEs in the synthesis flow
- Preliminary comparisons
- Conclusions and Future Work
- Questions
4Symmetric Pulse Generator Flip-Flop (SPGFF)
- The first stage (nodes X and Y) is dynamic; the second stage is a static NAND
- Results in small delay
- Can be sized to trade some delay for power
5Operation of SPGFF
- The transparency window created by CLK and CLK3 for stage 1 (CLK1 and CLK4 for stage 2) allows X (Y) to evaluate conditionally based on input D.
- The output-stage NAND passes X and Y to the output based on the clock value, without the need for a latch.
6Transmission Gate Master Slave (TGMS)
7Comparison between SPGFF and TGMS in 0.18um
8Advantages of SPGFF
- Lowest clock energy among DET-CSEs, resulting in higher clock power savings
- Energy-delay product comparable to high-performance single-edge-triggered clocked storage elements
10Characterization Methodology - Generating synthesis views
- Created an automated process for generating Synopsys Liberty format (.lib) synthesis models.
- Uses Perl scripts and gspice (a SPICE pre/post-processor).
- Characterized for timing and energy.
- Can easily be extended to generate Cadence synthesis models (.tlf).
11Characterization Methodology - Trip-points
- Used the same trip-points as those in the technology library.
- Nominal conditions: 25 °C, 1.8 V supply.
- Can easily generate best- and worst-case corner models (over temperature and supply variation).
- Cell delay is defined from the 50% point of the clock rise/fall to the 50% point of the output (Q or QN) rise/fall.
- Transition time: 10%-90% for rise time, 90%-10% for fall time.
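The trip-point definitions above can be illustrated with a small measurement sketch. This is only an assumption about how such a post-processing step might look, not the authors' Perl/gspice scripts; the function names and the piecewise-linear waveform representation are hypothetical.

```python
def crossing_time(times, volts, level):
    """First time a piecewise-linear waveform crosses `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        t0, v0 = times[i], volts[i]
        t1, v1 = times[i + 1], volts[i + 1]
        if v0 == level:
            return t0
        # A sign change (or touching the level at the segment end) marks a crossing.
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def cell_delay(clk_t, clk_v, out_t, out_v, vdd=1.8):
    """Clock 50% point to output 50% point."""
    return crossing_time(out_t, out_v, 0.5 * vdd) - crossing_time(clk_t, clk_v, 0.5 * vdd)


def transition_time(times, volts, vdd=1.8, rising=True):
    """10%-90% time for a rising edge, 90%-10% time for a falling edge."""
    t10 = crossing_time(times, volts, 0.1 * vdd)
    t90 = crossing_time(times, volts, 0.9 * vdd)
    return t90 - t10 if rising else t10 - t90


# Hypothetical waveforms (ns, V): a clock ramp and a delayed output ramp.
clk_t, clk_v = [0.0, 0.1], [0.0, 1.8]
q_t, q_v = [0.0, 0.2, 0.4], [0.0, 0.0, 1.8]
delay_ns = cell_delay(clk_t, clk_v, q_t, q_v)
rise_ns = transition_time(q_t, q_v, rising=True)
```

The same functions apply to falling edges by passing `rising=False`; the 1.8 V default matches the nominal supply stated on the slide.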
12Trip-points - Falling
13Trip-points - Rising
14Characterization Methodology - Drive Characteristics
- Build a 5x5 non-linear delay table.
- Clock slope values (ns): 0.03, 0.1, 0.4, 1.5, 3
- Output load values (fF): 0.35, 21, 38.5, 147, 311
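To show how a 5x5 non-linear delay table is typically consumed, here is a minimal lookup sketch using bilinear interpolation between the four surrounding table points. The axis values are the ones listed above; the delay entries are placeholders generated from a made-up formula, not characterization results, and the function names are hypothetical.

```python
from bisect import bisect_right

CLOCK_SLOPES_NS = [0.03, 0.1, 0.4, 1.5, 3.0]          # from the slide
OUTPUT_LOADS_FF = [0.35, 21.0, 38.5, 147.0, 311.0]    # from the slide


def placeholder_delay_ns(slope_ns, load_ff):
    """Made-up stand-in for a characterized delay; NOT measured data."""
    return 0.1 + 0.2 * slope_ns + 0.002 * load_ff


# 5x5 table indexed as [slope index][load index].
DELAY_TABLE_NS = [[placeholder_delay_ns(s, c) for c in OUTPUT_LOADS_FF]
                  for s in CLOCK_SLOPES_NS]


def _lower_index(axis, x):
    """Index i so that axis[i] and axis[i + 1] bracket x, clamped to the table edges."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay_ns(slope_ns, load_ff, table=DELAY_TABLE_NS):
    """Bilinear interpolation; points outside the axes are linearly extrapolated."""
    i = _lower_index(CLOCK_SLOPES_NS, slope_ns)
    j = _lower_index(OUTPUT_LOADS_FF, load_ff)
    s0, s1 = CLOCK_SLOPES_NS[i], CLOCK_SLOPES_NS[i + 1]
    c0, c1 = OUTPUT_LOADS_FF[j], OUTPUT_LOADS_FF[j + 1]
    u = (slope_ns - s0) / (s1 - s0)
    v = (load_ff - c0) / (c1 - c0)
    return ((1 - u) * (1 - v) * table[i][j] + (1 - u) * v * table[i][j + 1]
            + u * (1 - v) * table[i + 1][j] + u * v * table[i + 1][j + 1])
```

The non-uniform spacing of the axis points concentrates table entries at small slopes and loads, where the delay changes fastest relative to its value.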
15Characterization Methodology - Trip-points
- Setup time: sweep the input transition towards the active edge until the clock-to-output delay increases by 10%.
- Hold time: sweep the input transition away from the active edge until the clock-to-output delay increases by 10%.
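The setup-time sweep can be sketched as a simple search loop. In a real flow each measurement would be a SPICE simulation; here `clk_to_q_ps` is a hypothetical analytic stand-in, and the 3 ps step and 500 ps starting point are assumptions chosen only for illustration.

```python
import math


def clk_to_q_ps(data_lead_ps):
    """Hypothetical stand-in for a SPICE run: clock-to-output delay (ps)
    as a function of how far the data transition leads the active clock edge."""
    return 200.0 + 150.0 * math.exp(-data_lead_ps / 20.0)


def find_setup_time_ps(measure, start_ps=500, stop_ps=-500, step_ps=3, degradation=0.10):
    """Move the data transition towards the clock edge until the clock-to-output
    delay exceeds the nominal delay by `degradation`; return the last passing lead."""
    nominal = measure(start_ps)              # data arrives very early: nominal delay
    limit = nominal * (1.0 + degradation)
    lead = start_ps
    while lead - step_ps >= stop_ps and measure(lead - step_ps) <= limit:
        lead -= step_ps
    return lead


setup_ps = find_setup_time_ps(clk_to_q_ps)
```

The hold-time search follows the same pattern, with the data transition placed after the active edge and the pass/fail criterion unchanged.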
16Characterization Methodology - Setup-hold
17Characterization Methodology - Setup and Hold
- Build a 3x2 non-linear delay table (3 ps accuracy).
- Clock slope values (ns): 0.03, 3
- Data slope values (ns): 0.03, 0.9, 3
18Characterization Methodology - Internal energy
- Internal energy characterized over the same data points as the drive characteristics (5x5 lookup table).
- Data-pin and clock-pin energy tables generated (1x5 lookup table).
19Characterization Results - single vs dual-edge D to Q delay
[Plot: TGMS and SPGFF]
20What is typical output load?
- Extracted output loading from the netlist for all CSEs.
- Average load: 24 fF (6.8 minimum-sized inverters)
- 90% of CSEs have a load of less than 60 fF (17 minimum-sized inverters)
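As a quick consistency check on the two inverter-equivalent figures, the following sketch derives the input capacitance of one minimum-sized inverter implied by each pair of numbers. The function name is hypothetical; only the four constants come from the slide.

```python
AVERAGE_LOAD_FF = 24.0
AVERAGE_LOAD_INVERTERS = 6.8
P90_LOAD_FF = 60.0
P90_LOAD_INVERTERS = 17.0


def implied_inverter_cap_ff(load_ff, n_inverters):
    """Capacitance of one minimum-sized inverter implied by a load and its inverter count."""
    return load_ff / n_inverters


cap_from_average = implied_inverter_cap_ff(AVERAGE_LOAD_FF, AVERAGE_LOAD_INVERTERS)
cap_from_p90 = implied_inverter_cap_ff(P90_LOAD_FF, P90_LOAD_INVERTERS)
```

Both pairs imply roughly the same per-inverter capacitance, about 3.5 fF, so the two statistics use a consistent unit load.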
21Netlist extracted CSE output loading statistics
22Characterization Results - single vs dual-edge delay
[Plot: TGMS and SPGFF]
23Characterization Results (zoomed in) - single vs dual-edge delay
[Plot: TGMS and SPGFF]
24Characterization Results - single vs dual-edge energy-delay product
[Plot: TGMS and SPGFF]
26Leon SPARC core configuration
27Leon SPARC synthesis
- Synthesized using the TSMC 0.18um standard cell library.
- Target frequency of 200 MHz.
- Limited to the use of a single-size D-FF.
28SET- Synthesis flow
29SET-CSE synthesis summary
Area and Power
30Core summary
Approximately 20k-gates
31Clock tree loading
- based on library wire-load model
32Clock tree power estimation
- High-fanout nets are beyond the interpolation range of the library's wire-load models.
- Wire-load models are not meant for estimating balanced distribution nets such as clock nets.
- Using library wire-load models for the clock tree is not valid.
- Use an H-tree estimation equation to obtain a ballpark number.
33H-tree estimation equation
- Equation developed by ACSEL lab member Nikola Nedovic.
- Recursively calculates H-tree loading for a given area, number of CSEs in the design, and number of H-tree levels.
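The slides do not reproduce the estimation equation itself, so the following is only a generic sketch of a recursive H-tree load estimate and should not be read as the equation from the dissertation. It assumes an idealized binary H-tree over a square die, ignores clock buffers, and uses hypothetical parameter names and values throughout.

```python
def htree_wire_length(side, levels):
    """Total wire length of an idealized H-tree over a square of width `side`.
    Level k has 2**k branches; branch length halves every two levels,
    starting at side/4 for levels 1 and 2."""
    def remaining_length(level):
        if level > levels:
            return 0.0
        branches = 2 ** level
        branch_len = side / 2 ** ((level + 1) // 2 + 1)
        return branches * branch_len + remaining_length(level + 1)
    return remaining_length(1)


def htree_clock_power_w(side_mm, levels, n_cse, wire_cap_ff_per_mm,
                        cse_clk_cap_ff, vdd, clock_hz):
    """Switching power of wire plus CSE clock-pin capacitance: C * Vdd^2 * f.
    Buffer capacitance and short-circuit power are ignored in this sketch."""
    wire_ff = wire_cap_ff_per_mm * htree_wire_length(side_mm, levels)
    total_cap_f = (wire_ff + n_cse * cse_clk_cap_ff) * 1e-15
    return total_cap_f * vdd ** 2 * clock_hz


# Hypothetical inputs, for illustration only.
example_power_w = htree_clock_power_w(side_mm=1.0, levels=2, n_cse=1000,
                                      wire_cap_ff_per_mm=200.0, cse_clk_cap_ff=2.0,
                                      vdd=1.8, clock_hz=200e6)
```

The inputs mirror the ones named on the slide (area via the die side, number of CSEs, number of H-tree levels); every numeric value in the example call is an assumption.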
34H-tree estimation method
35H-tree estimation method
Table taken from Nedovic, Nikola, Ph.D.
Dissertation, UCD, CLOCKED STORAGE ELEMENTS FOR
HIGH-PERFORMANCE APPLICATIONS
36H-tree estimation method
37Total H-tree power
38SET-CSE synthesis summary with H-tree estimate
Area and Power
39SET-CSE power profile with H-tree estimate
40SET-CSE Core power profile
42Modeling DET-CSEs for Synthesis
- Need to model the timing parameters for both
edges.
43Modeling DET-CSEs for Synthesis
- Can model complex timing relationships for
synthesis.
44Modeling DET-CSEs for Synthesis
- Synthesis tool will time, and (try to) meet
constraints for the dual-edge triggered
synchronous system.
45Modeling DET-CSEs for Synthesis
- Synthesis tool will use the worst timing arc
relationship for critical path constraint.
46Modeling DET-CSEs for Synthesis
- Synthesis tools are not capable of inferring a dual-edge triggered device from HDL code.
- For meeting timing we only care about the strictest constraint anyway (i.e. for one pair of launch and capture edges).
- Unnecessary to model a complex timing device.
47Modeling DET-CSEs for Synthesis
- Simply model DET-CSE as a SET-CSE with worst-edge
timing parameters.
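One way to build such a worst-edge model is to take, entry by entry, the larger of the rising-edge and falling-edge values for each characterized table. This is a sketch under that assumption, not necessarily how the authors merged their data; the function name and the example numbers are hypothetical.

```python
def worst_edge_table(rising_edge, falling_edge):
    """Element-wise maximum of two equally shaped timing tables (lists of rows)."""
    if len(rising_edge) != len(falling_edge):
        raise ValueError("tables must have the same number of rows")
    merged = []
    for rise_row, fall_row in zip(rising_edge, falling_edge):
        if len(rise_row) != len(fall_row):
            raise ValueError("rows must have the same length")
        merged.append([max(r, f) for r, f in zip(rise_row, fall_row)])
    return merged


# Hypothetical 2x2 clock-to-Q tables (ns) for the two active edges.
clk_q_rising = [[0.20, 0.35], [0.28, 0.44]]
clk_q_falling = [[0.22, 0.33], [0.27, 0.47]]
clk_q_worst = worst_edge_table(clk_q_rising, clk_q_falling)
```

The same merge applies to setup-time tables, so the single-edge model seen by the synthesis tool is never more optimistic than either real edge.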
48Synthesis flow for DET-CSEs
49Synthesis flow for DET-CSEs
- Use synthesis directives to force use of the DET-CSE modeled device.
- Synthesize for target throughput, not frequency.
- Use worst-case models for meeting critical-path timing constraints.
- Generate a worst-case hold model to verify the race path.
- Fastest clk-Q with worst-case hold time.
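"Target throughput, not frequency" can be made concrete with a little arithmetic: a dual-edge design captures data twice per clock cycle, so it needs half the clock frequency for the same throughput. The sketch below assumes the simple C * Vdd^2 * f model for clock-net switching power and an equal clock capacitance in both designs, which a real DET-CSE design may not have; the capacitance value is hypothetical.

```python
def required_clock_hz(target_throughput_hz, capture_edges_per_cycle):
    """Clock frequency needed to reach a given data rate."""
    return target_throughput_hz / capture_edges_per_cycle


def clock_net_power_w(cap_f, vdd, clock_hz):
    """Switching power of a net that is charged and discharged once per cycle."""
    return cap_f * vdd ** 2 * clock_hz


TARGET_THROUGHPUT_HZ = 200e6      # 200 MHz target from the synthesis setup
CLOCK_CAP_F = 10e-12              # hypothetical total clock capacitance
VDD = 1.8

set_clock_hz = required_clock_hz(TARGET_THROUGHPUT_HZ, capture_edges_per_cycle=1)
det_clock_hz = required_clock_hz(TARGET_THROUGHPUT_HZ, capture_edges_per_cycle=2)
set_clock_power_w = clock_net_power_w(CLOCK_CAP_F, VDD, set_clock_hz)
det_clock_power_w = clock_net_power_w(CLOCK_CAP_F, VDD, det_clock_hz)
```

Under these assumptions the clock-net power halves; any extra clock-pin capacitance in the DET-CSEs reduces that saving.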
50Modeling DET-CSEs for Synthesis
51DET-CSE synthesis summary with H-tree estimate
Area and Power
52DET-CSE power profile
53DET Core summary
Approximately 20k-gates (based on nand4)
54DET-CSE power profile
56Issues with DET-CSE integration
- Memory blocks are single-edge triggered and must be clocked at twice the core clock rate.
- Currently using a dual-edge triggered VHDL behavioral model for memory blocks for netlist simulations.
- Possible solutions:
- Clock the memory blocks at 2x nominal.
- Modify the memory address and data latch to be dual-edge triggered.
58Power Comparison of two design netlists
[Power profile charts: SPGFF and TGMS]
- Core Total: 92.46 mW
- Core Total: 106.8 mW
- Total: 111 mW
- Total: 84.2 mW
- 27 mW savings
- 24% power savings in core
59Summary of comparison
- 24% savings in core power.
- Estimated 28% increase in sequential cell area (17% increase in core area).
- Both meet the specified performance at 200 MHz (report zero slack).
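The headline savings can be reproduced from the two "Total" values on the power comparison slide, assuming (the slides do not state this explicitly) that the 27 mW and 24% figures are derived from the 111 mW and 84.2 mW totals. The variable and function names are hypothetical.

```python
BASELINE_TOTAL_MW = 111.0   # larger "Total" value on the comparison slide
REDUCED_TOTAL_MW = 84.2     # smaller "Total" value on the comparison slide


def power_savings(baseline_mw, reduced_mw):
    """Return (absolute savings in mW, savings as a percentage of the baseline)."""
    saved_mw = baseline_mw - reduced_mw
    return saved_mw, 100.0 * saved_mw / baseline_mw


saved_mw, saved_percent = power_savings(BASELINE_TOTAL_MW, REDUCED_TOTAL_MW)
```

Rounded, the result matches the 27 mW and 24% quoted on the slides.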
61Summary
- Established methods for automated cell characterization.
- Developed a design flow for DET-CSE integration.
- Demonstrated pre-layout results.
- Obtained a functional DET-CSE netlist.
- Investigated functionally enhanced DET-CSEs (scan, reset).
62Future work
- Expand the family of DET-CSEs (i.e. sizings, functionalities).
- Obtain more accurate clock tree loading.
- Perform layout of cells for a more accurate comparison.
63Functionally enhanced Dual-Edge Triggered
Flip-Flops
- Need to show that functions such as reset, set, and scan can be added to DET-CSEs.
- Need to analyze the power and performance impact of the added functionality.
- Do DET-CSEs still result in practical power savings?
64Scan in SPGFF
65Scan in DFF
Functional Schematic of DFF with Scan
66Clear in SPGFF
67Clear in DFF
68Preliminary Results of Adding Functionalities