Title: FIFO Chip Design Example
1FIFO Chip Design Example
2FIFO Example
- We will now try to put together the concepts of
- Cell based design
- Super Buffer
- Clock trees
- IP reuse
- Getting a chip into a Pad frame
- FIFO
- Simple, Regular
3Getting Started
- The first thing we must do is decide the pins in an actual pad frame with the package.
- This will give us the context we need to make intelligent decisions about routing.
4MOSIS Pad frame
- The standard tiny chip from MOSIS can support 40 pins.
- You need to start with the pinout of the actual packaged chip to make the part usable and testable.
- We will use pin 1 as VDD and pin 21 as GND as a standard. The inputs will come in the top (2-20) and the outputs in general will go out the bottom (21-40).
- We will choose pin 2 for CK, 3 for NPRE and 4 for NCLR.
- A0-A15 will map to pins 5-20.
- Y0-Y15 will map to pins 40-25.
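The pin assignment above can be written down as a small table and checked mechanically. This is only a sketch in Python; the function name `build_pinout` is made up for illustration, and the bit ordering (A0 on pin 5 counting up, Y0 on pin 40 counting down) is read directly from the ranges on the slide.

```python
def build_pinout():
    """Return {pin number: signal name} for the 40-pin package."""
    pinout = {1: "VDD", 2: "CK", 3: "NPRE", 4: "NCLR", 21: "GND"}
    # A0-A15 map to pins 5-20 (ascending pin numbers).
    for bit in range(16):
        pinout[5 + bit] = f"A{bit}"
    # Y0-Y15 map to pins 40-25 (descending pin numbers).
    for bit in range(16):
        pinout[40 - bit] = f"Y{bit}"
    return pinout


pinout = build_pinout()
# Any pin of the 40 that has no signal is left over.
unused = [pin for pin in range(1, 41) if pin not in pinout]
print(unused)  # [22, 23, 24]
```

A table like this makes it easy to see which pins are left over before the pad frame is edited.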
5Packaged Part
Note: I will not fab a part without the pin list!
6Sample Pad frame
Area inside is 895 µm by 895 µm.
pin 21
pin 1
You can get more area by using fewer pins. (Read data in serially?)
You can have larger circuits, but they use up more MOSIS money.
7Bonding Diagram
This goes in the package.
8How big a FIFO can we make?
- Our DFF is 72 µm x 36 µm in area.
- A MOSIS tiny chip gives you about 900 µm x 900 µm of space. Assume that we can only use ½ the space.
- This can be increased if you use fewer than 40 pads.
- Number of rows: 450/36 gives 12
- Number of columns: 900/72 gives 12
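The row and column counts are just floor divisions of the available space by the cell size. A minimal Python sketch of that arithmetic, using the numbers from the slide (the helper name `cells_that_fit` is hypothetical):

```python
def cells_that_fit(available_um, cell_um):
    """Whole cells that fit side by side; any partial cell is dropped."""
    return int(available_um // cell_um)


CORE_UM = 900                    # usable side of the tiny chip, in microns
USABLE_HEIGHT_UM = CORE_UM / 2   # assume only half the space can be used
DFF_WIDTH_UM, DFF_HEIGHT_UM = 72, 36

rows = cells_that_fit(USABLE_HEIGHT_UM, DFF_HEIGHT_UM)
cols = cells_that_fit(CORE_UM, DFF_WIDTH_UM)
print(rows, cols)  # 12 12
```

Both divisions give 12.5, so half a cell is lost in each direction.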
9Saving Space
We could get rid of not clock by adding an inverter and save 3 µm.
We could overlay the clock and reset signals and save 10 µm.
We could overlay the ground wires and save 3 µm.
10Trade offs
- Replacing not clock with an inverter.
- New Cell Height 33 (450/33 gives 13)
- New Cell Width 72+8 (900/80 gives 11)
- Routing is easier
- Do not have to worry about skew between not clock and clock
- Will the power go up?
- Maybe. With a global not clock you would need another super buffer to drive it. In this case you only need one.
11Trade offs
- Overlay the reset and clock signals
- New average Cell Height 31 (450/31 gives 14)
- No New Cell Width
- Need two DFF parts: one flipped, with different wiring to the global signals, and one unchanged
- We already need two types of FF, one with D and not D and the other with D input only. This would make 4 different FFs!
12Trade offs
- Overlay the ground signals
- New average Cell Height 34.5 (450/34.5 gives 13)
- No New Cell Width
- Electromigration?
- Nothing works!
- We have to try it all!
- Still only 15 wide!
- We could shrink the height by 3 µm, which would give us 16 bits wide, but then the AOI logic would not fit into the cell height.
- We beg the senior engineer for 50 µm more space.
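The three individual trade-offs can be compared side by side against the 16 rows the FIFO needs. This Python sketch uses only the cell heights quoted on the trade-off slides and the 450 µm usable height; the option labels are informal names, not cell names from the design.

```python
import math

USABLE_HEIGHT_UM = 450
TARGET_ROWS = 16  # one row per data bit, A0-A15

# Cell height in microns for each option, taken from the slides.
options = {
    "baseline DFF": 36.0,
    "not clock replaced by an inverter": 33.0,
    "reset and clock overlaid": 31.0,
    "ground wires overlaid": 34.5,
}

rows = {name: math.floor(USABLE_HEIGHT_UM / h) for name, h in options.items()}
for name, n in rows.items():
    print(f"{name}: {n} rows, short by {TARGET_ROWS - n}")
```

No single change reaches 16 rows on its own, which is why the slides go on to combine them and ask for more space.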
13 4 DFFs
20 min
40 min
5 min
7 min
- All the FFs need to have the not clock removed!
- We need to verify 4 new parts derived from one old part!
- This will take some time!
- No choice.
- New average cell height is 31.375. 16 bits high will give less than 500 microns, so it will fit in the expanded space.
14Derivative DFF Design
It also helped that my NAND3 was designed to
have flexible routing, rather than minimum area.
I really saved some time by reusing the same
template.
15Not D Internal Routing Up
- Not CK is provided by inverters, to be added as required. Not D is generated by the NAND2 from D. Since we will not be operating at less than 1 ns, the increased setup time will not matter.
16DFF_DI_RU
NPRE
NCLR
CK
Q QN
Use the nand as an inverter.
D
17DFF_DI_RD
CK NPRE NCLR
18DFF_DE_RU
19DFF_DE_RU
D ND
20DFF_DE_RD
Q QN
D ND
21DFF_INV
22Design Review
- After looking at the parts so far, it looks like there could be an electromigration problem where VDD is brought into the circuit.
Since all the FFs use the same basic parts, we just have to fix it once in each cell. You can even edit it in place! I had to flatten the NTAP to do this. I had to add some nwell due to a DRC error.
23New DFF Structure
24Back to the FIFO
- We can fit 16 bits high within 500 microns
- We can fit 900/80 cells long (11)
- We can do a FIFO 10 bits deep.
- We will use 16 x 10 DFF (160)
- 8 DFF_DI_RU
- 8 DFF_DI_RD
- 72 DFF_DE_RU
- 72 DFF_DE_RD
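The cell counts above can be reproduced from a simple placement rule. The rule itself is an assumption, since the slides do not spell it out: here the first column uses the DI cells (not D generated internally), the remaining nine columns use the DE cells, and rows alternate between the routing-up (RU) and routing-down (RD) variants. The function name `cell_for` is hypothetical.

```python
from collections import Counter

ROWS, COLS = 16, 10  # 16 bits high, 10 stages deep


def cell_for(row, col):
    """Pick a DFF variant for one grid position (assumed placement rule)."""
    d_style = "DI" if col == 0 else "DE"       # first stage only sees D
    routing = "RU" if row % 2 == 0 else "RD"   # alternate rows are flipped
    return f"DFF_{d_style}_{routing}"


counts = Counter(cell_for(r, c) for r in range(ROWS) for c in range(COLS))
for name in sorted(counts):
    print(name, counts[name])
```

Under that rule the totals come out to 8, 8, 72 and 72, matching the list on the slide.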
25Gut check on power
- 160 DFF
- Each one has 21 NMOS and 21 PMOS
- This is like having 21 inverters
- Total number of inverters is 3360 (6720 transistors)
- The power is estimated for one inverter at 30 MHz and then scaled up.
For an alpha (activity factor) of one and 3360 inverters the power is 58 mW. We know that not all transistors switch every clock cycle, so this is an upper bound.
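The gut check can be redone in a few lines of Python. The per-inverter figure below is backed out of the 58 mW total quoted on the slide, not taken from the original per-inverter equation, and the linear scaling with the activity factor alpha is an assumption based on the usual dynamic power relation. The function name is hypothetical.

```python
DFF_COUNT = 160
INVERTERS_PER_DFF = 21           # 21 NMOS + 21 PMOS per DFF
TOTAL_POWER_ALPHA_1_W = 58e-3    # slide figure for alpha = 1 at 30 MHz

inverters = DFF_COUNT * INVERTERS_PER_DFF
transistors = 2 * inverters
# Implied power per inverter, derived from the slide total.
per_inverter_w = TOTAL_POWER_ALPHA_1_W / inverters


def estimated_power_w(alpha):
    """Scale the upper bound by the fraction of inverters that switch."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * inverters * per_inverter_w


print(inverters, transistors)                 # 3360 6720
print(f"{per_inverter_w * 1e6:.1f} uW per inverter")
print(f"{estimated_power_w(0.5) * 1e3:.1f} mW at alpha = 0.5")
```

The alpha = 1 case is the 58 mW upper bound; any realistic activity factor gives less.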
26 30 MHz! What happened to 200 MHz?
- With no PLL and the data coming from off chip, the maximum clock rate and off-chip speed is about 30 MHz! One could design special output buffers, but these are tricky and would use more power!
- We will continue to test at a higher speed because the simulation will go faster!
- For the final pin-to-pin simulation we will have to simulate at 30 MHz for at least 20 clock periods.
27FIFO Schematic
Start off with the basic structure that can be
copied and pasted.
28FIFO Schematic
Hard to see!
29FIFO Schematic Complete
30FIFO Symbol
31Verilog takes less than a second to verify.
32Verilog Test bench
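As a rough stand-in for what the test bench checks, here is a behavioural sketch in Python rather than Verilog. It assumes the FIFO behaves as a 16-bit wide, 10-stage shift register, that NCLR (active low) clears every stage to 0 and NPRE (active low) sets every stage to 1. The class name `ShiftFifo` is hypothetical, and clear and preset are applied at the call instead of being modelled as asynchronous inputs.

```python
class ShiftFifo:
    """Behavioural model: words move one stage per clock."""

    def __init__(self, width=16, depth=10):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.stages = [0] * depth

    def clock(self, a, npre=1, nclr=1):
        """Apply one clock with input word `a`; return the output word Y."""
        if nclr == 0:                      # active-low clear wins
            self.stages = [0] * self.depth
        elif npre == 0:                    # active-low preset
            self.stages = [self.mask] * self.depth
        else:                              # normal shift
            self.stages = [a & self.mask] + self.stages[:-1]
        return self.stages[-1]


fifo = ShiftFifo()
fifo.clock(0, nclr=0)                          # reset first
outputs = [fifo.clock(word) for word in range(1, 13)]
print(outputs)  # the first input word appears on the 10th clock
```

A model like this gives the expected Y values to compare against, but like the Verilog run it says nothing about timing.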
33Spice Test Bench
34Input Vectors
35Input Vectors
36Input Vectors
37Output
38Output
39Spice Summary
- The circuit has been validated
- The simulation took about 10 minutes to run!
40Layout
Pre-route CK, NPRE and NCLR.
Set up the first 4 FFs.
Then make it 10 across.
41Layout with only cells
42Route VDD and Ground
43Final FIFO Layout
GND
VDD
DATA FLOW
44Final Layout Verification
45Post Extraction Simulation
46Modify a Pad frame
- The parts we need are
- input buffer (padinc)
- output buffer (padio)
- corners (fc)
- VDD pad (padvdd)
- GND pad (padgnd)
- You can FTP a sample pad frame from MOSIS
- http://www.mosis.org/Technical/Designsupport/pad-library-scmos.html
- The docs are there as well.
47Sample Padframe
Load in a sample padframe. To change a pin, just select it, press q to edit, and then change the name to what you want.
48Change Pin 21 from padinc to padgnd
padinc to padgnd
49Make sure pads abut.
correct
Butt Metal 1 together and make sure the PSEL line is on the horizontal axis.
Not correct!
Pin 26
50Change pins 22-24 to unused
51FIFO_PF
- After you make a padframe, open up a new cell and add an instance of your padframe.
- Then add pins.
52Create Pin Names
Finished pin
Do not use a global variable for VDD and GND!
Use metal 3 input/output
Use a width of 50
53Pin to Pin Test Bench
- VDD and GND cannot be global; they have to be direct pins.
- The test bench and symbol are almost identical to the FIFO part.
- Copy the FIFO to a new cell called FIFO_PF.
- Edit the symbol to add two new pins, VDD and GND, which are in/out.
- Change the schematic accordingly.
54FIFO_PF Layout
- Parts
- Super buffer for clock
- FIFO
- Padframe
- Stamp them down and wire them up!
55PAD I/O
OEN: output enable
DO: sends output to the pad
DB: inverts data coming in from the pad
DI: does not invert data coming in from the pad
DB DI
OEN DO
56PAD VDD/GND
GND
57Connect a metal 2 path 2.7u wide to DIB
58DRC
- The pads do not pass DRC. Pads are an exception to the rule, and these pads have proven themselves in the field.
- We have to do DRC on everything else.
- Draw a Do not do DRC layer around the pads with the edge of the pads just over the metal 2 connection. (We want to make sure we are connected, right?)
59Do not DRC Layer
Put it just over the metal 2
60FIFO_PF Symbol
Rearrange ports to match chip.
Note: Delete the schematic view of the FIFO_PF.
61FIFO_PF_TB
62Simulation Trouble
- I ran into some major trouble that took two days to fix.
- The pad frame I had built was off a little bit.
- It takes 3 minutes to extract and almost 10 minutes to simulate. (Then you see an error.)
- That is almost 15 minutes to try each solution.
- Also, the pad frame kept moving on me: I would inadvertently move it when I was zoomed in on a small feature.
63Solutions
- Start off with just the pad frame, wire the inputs and outputs together, and see if it acts like a buffer.
- Build the pad frame in the topmost cell.
- Put all the parts in the center.
- Select everything inside the pad frame and make it a cell.
- Edit the cell in place.
- Do the shortest simulation possible to make sure everything is connected. (Like a reset or set operation.)
- You can probe the extracted wires by descend editing to the extracted view and selecting the wire you want.
- Get the pad frame working on its own, in parallel with the circuit design.
64The rise time when NPRE goes low is 4 ns! The output Y seems OK, and since the circuit is supposed to work at 30 MHz I will not try to buffer the signal. This simulation ran for 8 min 40 s to get 20 ns.
65 A B
C
66Analysis
- Point A is when NCLR goes low, thus changing the state of every FF. (Notice the power surge.)
- Point B is when NCLR goes high and the FIFO begins to fill.
- Point C is when all the FFs are turning on at the same time. (Note the power surge.)
- We will take the average power from 600 to 700 ns, and average over 4 clock cycles.
- Average Power: 305 mW/4 gives 76 mW
67Outputs Y15-Y8
A
B
10 Clock Cycles for Y to get A after NCLR goes
high.
68Outputs Y7-Y0
69Inputs A15-A8
70Inputs A7-A0
71Final FIFO Layout
See how much area is taken up by routing!
Super Buffer
A0
NCLR
DATA FLOW
NPRE
CK
A15
VDD
Y0
GND
Y15
72Statistics (Working alone in the middle of the
summer)
- FIFO_PF
- DRC: 55 s
- Extract: 190 s
- Quick simulation (20 ns): 520 s
- Long 30 MHz simulation (500 ns): XXX s
- Whole chip: 48 man-hours.
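These step times add up to the cost of trying one fix on the full chip. A small Python sketch of that sum, using only the three measured times above; the 8-hour working day and the function name are assumptions for illustration.

```python
# Measured step times for FIFO_PF, in seconds.
STEP_SECONDS = {
    "DRC": 55,
    "extract": 190,
    "quick simulation (20 ns)": 520,
}

total_s = sum(STEP_SECONDS.values())
total_min = total_s / 60


def iterations_per_day(hours=8):
    """Full DRC/extract/simulate loops that fit in a working day (assumed 8 h)."""
    return int(hours * 3600 // total_s)


print(f"{total_s} s per iteration ({total_min:.2f} min)")  # 765 s (12.75 min)
print(iterations_per_day())                                 # 37
```

At close to 13 minutes per loop, every avoidable iteration is worth planning around.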
73Design Statistics
- This does not include the time to write the documentation.
- The total project took somewhere between 40 and 60 hours.
- This means that documentation can take a lot of time.
- The numbers do not add up, but one can easily see that the time required to complete a step goes up at best linearly with the number of transistors and at worst exponentially!
- You need to plan accordingly!
74DRC and Extract time vs. gate count
75Simulation time vs. gate count
76Design time vs. gate count
77How fast do I work?
Can you measure your output in transistors per
hour?
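One way to put a number on this is to divide the transistor count by the hours spent. The Python sketch below uses the 6720 transistors in the 160 DFFs and the 48 man-hours quoted for the whole chip. It leaves out the super buffer and pad transistors, so it is only a rough figure, and the function name is hypothetical.

```python
FIFO_TRANSISTORS = 6720   # 160 DFFs x 42 transistors each
CHIP_MAN_HOURS = 48       # whole chip, from the statistics slide


def transistors_per_hour(transistors, hours):
    """Average design throughput in transistors per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return transistors / hours


rate = transistors_per_hour(FIFO_TRANSISTORS, CHIP_MAN_HOURS)
print(rate)  # 140.0
# The same count over the 40 to 60 hour range quoted for the project:
print(transistors_per_hour(FIFO_TRANSISTORS, 60),
      transistors_per_hour(FIFO_TRANSISTORS, 40))  # 112.0 168.0
```

Because most of those transistors come from reused cells, the figure says more about design re-use than about hand layout speed.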
78Design Review
- Clock Rate: 30 MHz
- Power: 76 mW
- Area: 1500 µm x 1500 µm = 2.25x10^6 µm²
- Power Density: 76 mW/Area gives 0.33x10^-7 W/µm². No cooling required!
- 2.3x10^-7 W/µm²: no cooling
- 1.0x10^-6 W/µm²: with expensive cooling
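The power density check is a single division followed by a comparison. In this Python sketch the two limits are read as "up to this density needs no cooling" and "up to this density is possible with expensive cooling", which is an interpretation of the slide, and the function name `cooling_class` is made up for illustration.

```python
POWER_W = 76e-3
DIE_SIDE_UM = 1500

NO_COOLING_LIMIT = 2.3e-7          # W/um^2, slide figure
EXPENSIVE_COOLING_LIMIT = 1.0e-6   # W/um^2, slide figure

area_um2 = DIE_SIDE_UM * DIE_SIDE_UM
density = POWER_W / area_um2       # W/um^2


def cooling_class(d):
    """Classify a power density against the two limits (assumed reading)."""
    if d <= NO_COOLING_LIMIT:
        return "no cooling"
    if d <= EXPENSIVE_COOLING_LIMIT:
        return "expensive cooling"
    return "over the limit"


print(f"{area_um2:.3g} um^2, {density:.2e} W/um^2: {cooling_class(density)}")
```

The chip comes in at roughly a seventh of the no-cooling limit.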
79Lessons Learned
- Design re-use is a faster method of design.
- Getting the circuit into a pad frame can take a large amount of time.
- Get the pad frame done before you need it.
- Verilog simulations are very fast but give no timing data unless it is built in.
- The project will always take longer than expected! (Even if you plan for it!)