Module 2 - PowerPoint PPT Presentation

Slides: 82
Provided by: berni4

Transcript and Presenter's Notes

Title: Module 2


1
Designing for 100 MHz
2
1999 Designs Demand...
  • Higher system speed
  • Higher integration
  • smaller size, less power, better reliability
  • Lower cost
  • Shorter development time
  • Better product differentiation

3
Traditional Multi-Chip Boards
  • Discrete design components
  • CPU, memory
  • bus transceivers, PCI controller, FIFOs
  • Ethernet controller, Graphics accelerator, MPEG,
    DSP, etc.
  • programmable logic as glue and custom function
  • Advantages
  • well-documented sophisticated functions
  • readily available as IP in silicon

4
Multi-Chip Board Problems
  • Physical size
  • Power consumption and reliability
  • PC board signal integrity
  • Limited flexibility
  • prevents design modifications and upgrades
  • prevents product diversification
  • prevents product customization
  • Poor product differentiation
  • standard parts, standard architecture

5
FPGA Advantages
  • Smaller size
  • Lower power consumption
  • Better signal integrity
  • fewer PC-board issues
  • Enhanced flexibility
  • easy modifications, upgrades, etc.
  • Enhanced product differentiation
  • proprietary architectures

6
FPGA Users Want...
  • System clock rate of 100 MHz
  • >100,000 gates
  • Efficient design methodologies
  • Availability of well-documented Cores
  • Reasonable cost

7
The FPGA Solution
4th Generation FPGA: Logic, Memory, Routing
Delay-Locked Loop for Fast Clock and I/O
3.3 ns Synchronous Dual-Port SRAM
Multi-Standard Select I/O
500 Mbps SelectMAP Configuration
Temperature Sensing
8
Now the Challenge...
Design a 100 MHz system
  • Together, we can do it...
  • we'll supply the ingredients...
  • you use them intelligently
  • But don't forget...
  • the clock period is less than 10 ns!

9
Designing for 100 MHz.
  • Volts, Amps, and Watts
  • PCB signal distribution
  • chip inputs and outputs
  • power and thermal considerations
  • Ones and zeros
  • logic emulation
  • Bits and bytes
  • memory hierarchy

10
Moore Meets Einstein
[Chart: clock frequency (MHz) and trace length (inches per 1/4 clock period) on a doubling scale from 1 to 2048, plotted by year from 1965 to 2010]
  • Speed Doubles Every 5 Years ...But the speed of
    light never changes

11
Volts, Amps, and Watts
  • PCB design issues
  • capacitive loading
  • transmission lines and termination
  • Chip inputs and outputs
  • clock distribution and DLLs
  • I/O standards
  • Power and thermal considerations
  • temperature sensing diode
  • power supply decoupling
  • Configuration
  • new SelectMAP mode

12
Capacitive Loading
  • Capacitance slows outputs and increases power
  • output delay increase
  • 25 ps per pF of additional loading
  • output power dissipation increase
  • 11 µW per MHz per pF with 3.3-V swing
  • Sources of capacitance
  • 10 pF max for each device pin
  • 2 pF per inch for narrow traces (0.8 pF/cm)
  • 130 pF per square inch for copper areas (20 pF/cm²)
  • IBIS files provide output impedance details
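
The loading rules of thumb above can be turned into a quick estimate. The sketch below is only an illustration: the function names and the example load (two device pins, 3 inches of narrow trace, 50 MHz toggle rate) are assumptions, not figures from the presentation.

```python
# Rules of thumb quoted on the slide (3.3-V swing).
DELAY_PS_PER_PF = 25.0           # added output delay per pF of load
POWER_UW_PER_MHZ_PER_PF = 11.0   # added output power per MHz per pF


def load_capacitance_pf(pins: int, trace_inches: float) -> float:
    """Worst-case load: 10 pF per device pin plus 2 pF per inch of narrow trace."""
    return 10.0 * pins + 2.0 * trace_inches


def added_delay_ps(c_pf: float) -> float:
    """Extra output delay caused by c_pf of load capacitance."""
    return DELAY_PS_PER_PF * c_pf


def added_power_uw(c_pf: float, toggle_mhz: float) -> float:
    """Extra output power for a pin toggling at toggle_mhz into c_pf."""
    return POWER_UW_PER_MHZ_PER_PF * toggle_mhz * c_pf


# Hypothetical example load, chosen only to exercise the functions.
c = load_capacitance_pf(pins=2, trace_inches=3.0)
print(c, added_delay_ps(c), added_power_uw(c, toggle_mhz=50.0))
```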

13
Transmission Lines
  • Some traces must be treated as transmission lines
    to minimize ringing
  • transmission line if round trip > transition time
  • lumped-capacitance if round trip < transition
    time
  • Signal delay on a PCB
  • 140 to 180 ps per inch ( 50 to 70 ps/cm)
  • Lumped-capacitance trace length
  • 3 inches max for a 1-ns transition time (7.5 cm)
  • 6 inches max for a 2-ns transition time (15 cm)
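
The round-trip rule can be checked numerically. This is a minimal sketch that assumes a propagation delay of 160 ps per inch, the midpoint of the 140 to 180 ps range quoted above; the function names are illustrative. At the slow end of the range (180 ps per inch) a 3-inch trace has a 1080 ps round trip, so the 3-inch limit is a rule of thumb rather than a hard bound.

```python
def round_trip_ps(length_inches: float, delay_ps_per_inch: float = 160.0) -> float:
    """Out-and-back propagation time along a trace."""
    return 2.0 * length_inches * delay_ps_per_inch


def is_transmission_line(length_inches: float, transition_ps: float,
                         delay_ps_per_inch: float = 160.0) -> bool:
    """True when the round trip exceeds the signal transition time."""
    return round_trip_ps(length_inches, delay_ps_per_inch) > transition_ps


for inches, transition in [(3, 1000), (6, 2000), (4, 1000)]:
    print(inches, transition, is_transmission_line(inches, transition))
```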

14
Terminated Transmission Lines: Reflections and
Ringing
  • Traditional Thevenin termination at the end
  • 100 Ω to VCC and 100 Ω to ground on a 50 Ω line
  • Dynamic termination at the end is better and
    saves power
  • 50 Ω in series with 100 pF on a 50 Ω line
  • Series termination at the source is best, for a
    single source and destination only!
  • 22 Ω to 27 Ω series resistor on a 50 Ω line
    (50 Ω total)
15
On-Chip Clock Distribution
[Diagram: clock and data paths to the CLB and IOB]
  • Clock distribution introduces delay
  • larger chips suffer more clock delay

16
Clock Delay Problems
  • Clock delay increases clock-to-output times
  • Clock delay leads to unacceptable input hold time
  • set-up time is negative
  • Additional data delay can eliminate the hold time
  • set-up time becomes positive
  • but tolerance build-up widens the data-valid
    window

[Diagram: IOB flip-flop (D, Q) with the clock distribution delay on the clock path and an optional delay on the data path; required data-valid windows relative to the clock are shown without and with the data delay]
17
DLLs Maximize I/O Speed
  • Clock-to-output time plus set-up time
    determines the I/O speed and data bandwidth
  • min clock period = max clock-to-out + max set-up
  • Traditional solution
  • use highly buffered, balanced clock trees
  • needed to reduce internal clock skew
  • cannot totally eliminate the delay
  • The Virtex solution
  • use a Delay-Locked-Loop ( DLL )
  • aligns the internal and external clocks
  • effectively eliminates the clock-distribution
    delay

18
Virtex Has 4 Independent DLLs
[Diagram: DLL with clock input, error comparator, and adjustable delay driving the clock to the CLB and IOB data paths]
  • DLLs adjust clock delay to align internal and
    external clocks
  • digital closed-loop control
  • 25 to 200-MHz range, 35-picosecond resolution

19
Fast Clock-to-Out With DLL
  • 160 MHz inter-chip data rate
  • 16-mA LVTTL
  • IOB register to IOB register

[Diagram: two Virtex FPGAs, each clocked through a DLL, transferring data from IOB register (D, Q) to IOB register; labeled delays 3.8 ns, 1.9 ns, and 0.5 ns]
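
As a rough cross-check of the 160 MHz figure, the three delays shown for this slide (3.8 ns, 1.9 ns, and 0.5 ns) can be summed and inverted. The sketch assumes the delays simply add along the register-to-register path; the slide does not say which delay is which, and the function name is illustrative. The same arithmetic applied to the SSTL figures given later (2.8 ns, 1.9 ns, 0.3 ns) lands on 200 MHz.

```python
def max_data_rate_mhz(delays_ns):
    """Highest clock rate whose period covers the summed path delays."""
    return 1000.0 / sum(delays_ns)


lvttl = max_data_rate_mhz([3.8, 1.9, 0.5])   # LVTTL slide figures
sstl = max_data_rate_mhz([2.8, 1.9, 0.3])    # SSTL slide figures
print(round(lvttl, 1), round(sstl, 1))
```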
20
LVTTL Data Rate with DLL
  • 1.4 ns measured clock-to-output delay

Output standard: LVTTL Fast 16 mA (OBUF_F_16)
Conditions: Temp = 100°C, Vdd = 2.375 V, Vcco = 3.3 V
Waveforms: 1 = CLKIN, 2 = DATA OUT (no DLL), 3 = DATA OUT (DLL deskewed)
Timing: without DLL, r->r 3.9 ns and r->f 3.9 ns; with DLL, r->r 1.4 ns and r->f 1.4 ns
21
Other DLL Functions
  • Double the incoming clock frequency
  • fast internal operation, slow external clock
  • Clock mirroring to the PCB
  • Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
  • Adjust clock duty cycle to 50-50
  • Create four quadrature clock phases
  • input four sequential bits per clock period

22
Duty Cycle Correction
  • 25% duty cycle in, 50% duty cycle out

[Diagram: Virtex FPGA with a 1X DLL; 25 MHz clock with 25% duty cycle in, 25 MHz clock with 50% duty cycle out]
23
Clock Doubling and Mirroring
  • Clock mirror with less than 100 ps skew
  • simplifies PCB clock distribution

[Diagram: actual HDTV customer example. A 37 MHz system clock (1 input load) feeds DLL 1 and DLL 2 inside the Virtex FPGA; the DLLs produce exactly aligned 74 MHz clocks for the SDRAM, plus 74 MHz and 37 MHz internal clocks through zero-delay internal clock buffers]
24
Precise Clock Mirroring
  • 2x system clock for board use

[Diagram: Virtex FPGA with a 2X DLL; 66 MHz clock in, 132 MHz clock out]
25
Clock Division
  • Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
  • maintain synchronous edges

[Waveforms: CLKIn 200 MHz, CLKout 200 MHz, CLKDV 12.5 MHz (divide by 16)]
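
The divider and doubler options reduce to simple arithmetic. The sketch below tabulates the CLKDV output for each divisor listed on the slide and the result of cascading 2X DLLs; the function names are illustrative, not part of any Xilinx tool.

```python
DIVISORS = (1.5, 2, 2.5, 3, 4, 5, 8, 16)   # CLKDV options listed on the slide


def clkdv_outputs_mhz(clkin_mhz):
    """Divided clock frequency for every supported divisor."""
    return {d: clkin_mhz / d for d in DIVISORS}


def doubled_mhz(clkin_mhz, stages=1):
    """Frequency after cascading `stages` 2X DLLs."""
    return clkin_mhz * 2 ** stages


print(clkdv_outputs_mhz(200)[16])   # divide-by-16 case from the waveform
print(doubled_mhz(66), doubled_mhz(45, stages=2))
```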
26
Multi-Standard SelectI/O
[Diagram: Virtex SelectI/O connecting to GTL, 2.5 V SSTL, a microprocessor, SRAM, 1.8 V devices, SDRAM, 5 V tolerant inputs, FLASH, 5 V mixed signal, 3.3 V LVTTL, a DSP, and busses/backplanes (3/5 V PCI, ISA, GTL)]
27
Mix & Match Output Standards
  • User-supplied voltages determine output swing
  • 3.3 V, 2.5 V, 1.5 V
  • one voltage per bank
  • a bank is half of a chip edge
  • Output characteristics are programmable on a
    per-pin basis
  • push-pull or open-drain
  • LVTTL drive strength
  • 2-mA to 24-mA sink and source current
  • LVTTL Slew rate

28
Mix & Match Input Standards
  • Internal or user-supplied threshold voltage
  • selectable on a per-pin basis
  • one user-supplied threshold voltage per bank
  • Programmable over-voltage protection
  • 5-V tolerant or diode clamp to VCCO
  • selectable on a per-pin basis

[Diagram: a bank of inputs sharing VREF pins, with an internal reference as the alternative threshold source]
29
SSTL Clock-to-Out With DLL
  • 200 MHz inter-chip data rate
  • SSTL 3, Class II
  • IOB register to IOB register

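[Diagram: two Virtex FPGAs, each clocked through a DLL, transferring data from IOB register (D, Q) to IOB register; labeled delays 2.8 ns, 1.9 ns, and 0.3 ns]
(SSTL: Stub Series Terminated Logic)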
30
SSTL Data Rate with DLL
  • 1.3 ns measured clock-to-output delay
  • much lower noise than LVTTL

Output standard: SSTL 3 Class 2 (OBUF_SSTL3_II)
Conditions: Temp = 100°C, Vdd = 2.375 V, Vcco = 3.3 V, Vtt = 1.5 V
Waveforms: 1 = CLKIN, 2 = DATA OUT (no DLL), 3 = DATA OUT (DLL deskewed)
Timing: without DLL, r->r 3.5 ns and r->f 3.8 ns; with DLL, r->r 1.1 ns and r->f 1.3 ns
31
From FPGA to System Component: Redefining the
FPGA
[Diagram: Virtex FPGA linking cache SRAM (Mbytes), SDRAM (133 MHz), a low-voltage CPU, and a high-speed system backplane, using LVCMOS, SSTL3, LVTTL, and GTL with x1 and x2 clocks]
"Virtex moves FPGAs from glue to system
component" - Ron Neale, EE
32
Power and Thermal Issues
  • Power and heat are serious concerns
  • All CMOS power consumption is dynamic
  • proportional to VCC²
  • proportional to capacitance
  • proportional to frequency
  • Virtex conserves power
  • 2.5-V supply voltage
  • small geometries and short interconnects reduce
    capacitance
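
The three proportionalities combine into the usual dynamic power relation P = C x VCC² x f. The sketch below uses that relation to compare a 2.5-V supply with a 3.3-V supply at equal capacitance and frequency; the function name and the 1 pF, 1 MHz operating point are assumptions chosen for illustration. The same relation reproduces the roughly 11 µW per MHz per pF figure quoted earlier for a 3.3-V swing.

```python
def dynamic_power_w(c_farads, vcc_volts, f_hz):
    """CMOS dynamic power: capacitance times voltage squared times frequency."""
    return c_farads * vcc_volts ** 2 * f_hz


p_33 = dynamic_power_w(1e-12, 3.3, 1e6)   # 1 pF switched at 1 MHz, 3.3 V
p_25 = dynamic_power_w(1e-12, 2.5, 1e6)   # same load at 2.5 V
print(p_33 * 1e6, "uW at 3.3 V")
print(round(p_25 / p_33, 3), "relative power at 2.5 V")
```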

33
Virtex Power Consumption
  • Virtex is designed to conserve power
  • 100 MHz 16-bit counters
  • 12.5 MHz average transition rate
  • 6.5 mW per counter including clock distribution
  • 100 MHz 8-bit counters
  • 25 MHz average transition rate
  • 5 mW per counter including clock distribution
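
The quoted average transition rates are consistent with a free-running binary counter, in which each bit toggles half as often as the bit below it. The sketch below is an assumption about how the figures were derived, not a statement from the presentation; the function name is illustrative.

```python
def average_transition_rate_mhz(clock_mhz, bits):
    """Mean toggle rate per bit of a free-running binary counter.

    Bit i (LSB = 0) changes state once every 2**i clock cycles.
    """
    total = sum(clock_mhz / 2 ** i for i in range(bits))
    return total / bits


print(round(average_transition_rate_mhz(100, 16), 2))   # close to 12.5 MHz
print(round(average_transition_rate_mhz(100, 8), 2))    # close to 25 MHz
```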

34
Thermal Management
  • Temperature-sensing diode
  • matched to the Maxim MAX1617 A/D
  • programmable alarms
  • similar to the Pentium II solution

[Diagram: Virtex FPGA diode pins DXP and DXN wired to a Maxim MAX1617, which provides SMBCLK, SMBDATA, and ALERT]
35
Power Supply Decoupling
  • CMOS power-supply current is dynamic
  • current pulse every active clock edge
  • Peak current can be 5x the average current
  • instantaneous current peaks can only be supplied
    by decoupling capacitors
  • Use one 0.1 µF ceramic chip capacitor for each
    power-supply pin
  • low L and R are more important than high C
  • double up for lower L and R if necessary
  • use direct vias to the supply planes, close to
    the power-supply pins

36
Virtex Configuration
  • New byte-wide SelectMAP mode
  • up to 528 Mbps at 66 MHz
  • simple handshake protocol
  • up to 400 Mbps at 50 MHz
  • no handshake required
  • Configuration bit-stream length
  • 0.5 Mbits to 6.1 Mbits

[Diagram: control logic (EPLD) addressing a configuration EPROM that supplies data to the Virtex FPGA, with Busy, CS, and WE signals]
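
The SelectMAP throughput figures follow from the byte-wide bus: 8 bits per clock. The sketch below derives the rates and estimates configuration time for the bit-stream lengths quoted above. It assumes one byte transfers on every clock with no stalls; the function names are illustrative.

```python
def selectmap_rate_mbps(clock_mhz, bus_bits=8):
    """Peak configuration throughput of a byte-wide port."""
    return clock_mhz * bus_bits


def config_time_ms(bitstream_mbits, clock_mhz):
    """Time to load a bit-stream, assuming one byte per clock with no stalls."""
    return 1000.0 * bitstream_mbits / selectmap_rate_mbps(clock_mhz)


print(selectmap_rate_mbps(66), selectmap_rate_mbps(50))
print(round(config_time_ms(0.5, 66), 2), round(config_time_ms(6.1, 66), 2))
```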
37
Volts, Amps, and Watts Recap
  • PCB design issues
  • minimize capacitance for higher speed
  • terminate transmission lines to reduce ringing
  • Chip inputs and outputs
  • use DLLs to maximize I/O bandwidth
  • use SelectI/O to interface with different
    standards
  • Power and thermal considerations
  • use the sensing diode to manage chip temperature
  • decouple the power supply well
  • Configuration
  • configure faster with the SelectMAP mode

38
Designing for 100 MHz.
  • Volts, Amps, and Watts
  • PCB signal distribution
  • chip inputs and outputs
  • power and thermal considerations
  • Ones and zeros
  • logic emulation
  • Bits and bytes
  • memory hierarchy

39
Spending the 10 ns Budget
  • Fast logic requires fast function generators
  • signals often pass through several function
    generators
  • Routing delays must also be kept short
  • there are routing delays between every function
    generator
  • Arithmetic delays are important
  • carry chains often create critical paths

40
You Don't Have To Be An Expert
  • You don't have to be an FPGA architecture expert
    to implement high-performance designs
  • the benefits of a good architecture are automatic
  • all the logic goes faster
  • software provides easy access to the features
  • You can achieve high-performance only with a good
    FPGA architecture
  • a good FPGA empowers its users
  • You'll design better if you know the architecture
  • matching your design style to the available
    features increases performance and/or lowers cost

41
Virtex CLB
  • Logic and arithmetic delay reduction demands
    improvements in the CLB
  • Virtex CLB is divided into two slices, each with
  • 2 function generators
  • 2 flip-flops
  • 2 bits of carry logic

[Diagram: CLB with two slices, each containing two function generators (Fnct Gen) with carry logic]
42
Fast Function Generators
  • Each function generator emulates 2 to 3 levels
    of logic
  • a 10-level logic path typically requires 3 to 5
    Function Generators in series
  • at 100 MHz, they must be less than 2 ns each
    including the routing
  • Virtex has 0.6-ns function generators
  • leaves 1.4 ns for each route
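
The budget arithmetic on this slide can be written out directly. The sketch divides the clock period evenly across the function generators in series and subtracts the function generator delay. It ignores flip-flop clock-to-output and set-up time, which is a simplifying assumption; the function name is illustrative.

```python
def routing_budget_ns(period_ns, stages, function_generator_ns):
    """Routing delay left per stage after the function generator delay."""
    per_stage = period_ns / stages
    return per_stage - function_generator_ns


# 100 MHz clock, five function generators in series, 0.6 ns each.
print(round(routing_budget_ns(10.0, 5, 0.6), 2))
```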

43
Connecting Function Generators
  • Some functions need several function generators
  • F5 MUXs connect pairs of function generators
  • functions with 5 to 9 inputs
  • F6 MUXs connect all 4 function generators
  • functions with 6 to 17 inputs

[Diagram: four function generators combined in pairs by two F5 MUXs, which feed one F6 MUX]
44
Fast Local Routing
  • Local routing provides fast interconnects
  • in a CLB, Function Generators connect with
    minimal routing delays
  • fast paths between adjacent CLBs increase
    flexibility

[Diagram: two adjacent CLBs, each with four function generators and carry logic, joined by fast local routing]
45
Use Pipelining for Speed
  • Shorter clock periods mean doing less each
    period
  • create a pipeline structure
  • pipeline stages operate concurrently
  • more functions are done at the same time
  • throughput increases
  • All function generators have output flip-flops
  • most pipeline support is free

46
16-Bit Pipeline in One LUT
  • In directly cascaded pipelines the flip-flops are
    not free
  • One SRLUT can implement up to 16 bits of delay
  • shift data in and select the appropriate tap

[Diagram: 16-bit shift register with a data input, a delay-select address, and an output]
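
The "shift data in and select the appropriate tap" behavior can be modeled in a few lines. This is a behavioral sketch only: the class and method names are illustrative and it does not describe the actual SRLUT primitive interface.

```python
class ShiftRegister16:
    """Behavioral model of a 16-bit shift register with a selectable tap."""

    def __init__(self):
        self.bits = [0] * 16           # bits[0] is the most recent input

    def shift(self, bit):
        """Clock one new bit in; the oldest bit falls off the end."""
        self.bits = [bit] + self.bits[:-1]

    def read(self, delay_select):
        """Return the bit shifted in delay_select + 1 clocks ago (0..15)."""
        return self.bits[delay_select]


srl = ShiftRegister16()
for b in [1, 0, 0, 0]:
    srl.shift(b)
print(srl.read(3))   # the 1 that entered four clocks ago
```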
47
Fast Logic Needs Fast Routing
  • Our typical design with 3 to 5 CLBs needed an
    average routing delay of 1.4 ns or less
  • the Virtex routing architecture delivers this
    performance
  • Delay is independent of direction
  • dependably short delays

48
Go Farther, Faster
  • Virtex achieves its speed through a hierarchy of
    highly buffered routing resources
  • wires span 1, 2, or 6 CLBs
  • The Virtex routing architecture is designed for
    large arrays
  • today's FPGAs are big but tomorrow's will be
    even bigger
  • Virtex is designed to maintain its performance
    even in very large arrays

49
No Routing Congestion
  • For high-speed applications, routing must be
    dependably fast
  • not just capable of being fast
  • In the past, high device utilization has caused
    routing congestion
  • critical nets might be forced to meander
  • Virtex minimizes these problems
  • abundant resources prevent congestion

If it needs to be fast, it will be fast
automatically!
50
Built-in Tri-State Busses
  • Bi-directional busses are supported directly by
    tri-state buffers built into each CLB
  • two drivers per CLB
  • segmentable every four CLB columns

[Diagram: a row of CLBs driving a shared tri-state bus]
51
Arithmetic: A Special Case
  • Adders, accumulators, counters, and comparators
    all depend on carry chains
  • Carry-chain logic is usually much deeper than the
    rest of the design
  • 32 levels for a 16-bit ripple adder
  • too deep to use function generators at 100 MHz
  • arithmetic delays would limit performance
  • Dedicated carry logic provides the desired speed
  • 16-bit adders can operate at up to 200 MHz
    register-to-register

52
Wide Arithmetic
  • 64-bit adders would require 128 levels of logic
  • expensive complex carry schemes would be needed
    to preserve performance
  • Virtex minimizes the carry propagation delay
  • 100 ps per bit pair
  • zero routing delay between CLBs
  • Minimal performance loss for each extra bit

16-bit adders operate at up to 200 MHz
64-bit adders operate at up to 135 MHz
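
The two adder speeds are consistent with the 100 ps per bit pair carry delay. The back-of-envelope sketch below starts from the 16-bit, 200 MHz figure and adds only carry propagation for the extra bits; treating carry delay as the sole difference is an assumption, and the function names are illustrative.

```python
CARRY_NS_PER_BIT_PAIR = 0.1   # 100 ps per bit pair, from the slide


def ripple_logic_levels(bits):
    """Logic depth of a plain ripple adder: two levels per bit."""
    return 2 * bits


def adder_fmax_mhz(bits, ref_bits=16, ref_fmax_mhz=200.0):
    """Scale the reference adder speed by the extra carry-chain delay."""
    ref_period_ns = 1000.0 / ref_fmax_mhz
    extra_pairs = (bits - ref_bits) / 2
    return 1000.0 / (ref_period_ns + extra_pairs * CARRY_NS_PER_BIT_PAIR)


print(ripple_logic_levels(16), ripple_logic_levels(64))
print(round(adder_fmax_mhz(64), 1))
```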
53
Efficient Virtex Multipliers
  • Cascade vs. tree structure
  • cascade is simpler and smaller
  • tree is faster
  • Virtex gives the best of both worlds
  • as fast as a tree
  • smaller than a cascade
  • 160 MHz clock rate for pipelined 16 x 16
    multiplier

[Charts: delay and number of CLBs for cascade, tree, and Virtex tree multipliers at 4 x 4, 8 x 8, and 16 x 16]
54
Fast Address Decoders
  • Wide address decoders could slow operation
  • wide AND gates with invertible inputs
  • Virtex carry-chain MUXs can act as AND gates
  • combine function generator ANDs
  • 64-bit decoders operate at up to 155 MHz

[Diagram: carry-chain MUXs combining function generator outputs into one wide AND]
55
Speed Is Never Wasted
  • You can never have too much performance
  • excess performance can always be traded for size
    and cost reduction
  • Replace single-cycle functions with smaller
    multi-cycle versions
  • a 2-cycle multiplier is half the cost of a
    single-cycle multiplier

Reduce costs by designing down to the performance
you need
56
Creating a High-Speed Clock
  • Logic sometimes needs to operate faster than the
    available clock
  • multiple RAM accesses in a single cycle
  • low-speed PCB clock distribution for power or
    noise reduction
  • Virtex DLLs can double and redouble incoming
    clocks

[Diagram: 45 MHz clock in, 2X DLL1 gives 90 MHz, 2X DLL2 gives 180 MHz]
57
Optimized for the Future
  • Deep sub-micron technology permits larger and
    larger array sizes
  • poses new circuit-design challenges
  • changes the rules of FPGA architecture
  • Across-chip routing is the most vulnerable
  • could easily limit design performance
  • Virtex is designed for long-term growth
  • even long, across-chip routes will remain fast
  • Virtex is tomorrow's FPGA today!

58
10 ns is Long Enough
  • Virtex CLBs can implement relatively complex
    functions in 10 ns
  • 0.6 ns per 4-input function generator
  • Virtex offers fast interconnections
  • even across-chip when fully utilized
  • fast tri-state buses
  • Support for very fast arithmetic operations
  • 16-bit adders at 200MHz

59
Implement Designs Automatically
  • You don't have to be an FPGA wizard to use Virtex
  • Virtex is optimized for automated implementation
  • uniform structure
  • efficient mapping/synthesis
  • ample routing
  • simple placement and no congestion
  • predictable performance
  • effective synthesis
  • IP cores speed design even more
  • validated functionality with guaranteed
    performance

60
Designing for 100 MHz
  • Volts, Amps, and Watts
  • PCB signal distribution
  • chip inputs and outputs
  • power and thermal considerations
  • Ones and zeros
  • logic emulation
  • Bits and bytes
  • memory hierarchy

61
100 MHz Memory
  • Virtex memory operates up to 200 MHz
  • High-speed memory has two benefits
  • data storage
  • work-in-progress
  • input/output buffers, FIFOs
  • accelerating complex functions
  • store pre-computed values in look-up tables

62
Data Storage Hierarchy
  • Virtex supports 3 levels of memory hierarchy
  • On-chip SelectRAM
  • small-to-medium memories
  • 0.6-ns read access time
  • On-chip Block SelectRAM
  • larger memories
  • true dual-ported operation
  • 3.3-ns read access time
  • Fast SelectI/O interfaces to external RAM
  • DLL boosts memory bandwidth

63
SelectRAM
  • SelectRAM uses CLB LUTs as user memory
  • 16-deep RAMs
  • 32-deep RAMs
  • 16-deep dual-ported RAMs
  • 16-deep shift registers
  • Cascadable for larger memories
  • 128 or more words deep
  • uses logic resources for expansion

64
Block SelectRAM
  • Up to 32 dual-ported 4096-bit RAM Blocks
  • synchronous read and write
  • True dual-port memory
  • each port has full read and write capability
  • different clocks for each port
  • Configurable aspect ratio
  • trade width for depth
  • 4096 x 1 bit to 256 x 16 bits
  • separate configurations for each port
  • Dedicated routing for memory expansion
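
Trading width for depth keeps the block at 4096 bits. The sketch below lists the depth for each port width; the slide gives only the two end points (4096 x 1 and 256 x 16), so the intermediate widths of 2, 4, and 8 bits are an assumption, as are the function names.

```python
BLOCK_BITS = 4096
PORT_WIDTHS = (1, 2, 4, 8, 16)   # intermediate widths are assumed


def depth_for_width(width):
    """Number of addressable words when a port is `width` bits wide."""
    if width not in PORT_WIDTHS:
        raise ValueError(f"unsupported port width: {width}")
    return BLOCK_BITS // width


def reads_per_write(write_width, read_width):
    """Narrow reads needed to drain one wide write (width conversion)."""
    return depth_for_width(read_width) // depth_for_width(write_width)


print({w: depth_for_width(w) for w in PORT_WIDTHS})
print(reads_per_write(write_width=16, read_width=8))
```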

65
High-Speed Memory Interfaces
  • SelectI/O and DLLs together provide fast access to
    many types of external memory
  • Xilinx currently offers two reference designs
  • fully synthesized
  • automatic placement and routing
  • SDRAM up to 125 MHz
  • ZBTRAM up to 143 MHz

(Zero Bus-Turn-around)
66
Input/Output Data Buffers
  • High-performance systems need data buffers to
    decouple internal operation from I/O activity
  • I/O may be sporadic (burst-mode busses)
  • I/O may be faster or slower
  • I/O may be wider or narrower
  • I/O buffers can take several forms
  • dual-ported RAMs
  • ping-pong buffers
  • FIFOs

67
Dual-ported I/O Buffers
  • Block SelectRAM is ideal for I/O buffers
  • dual-ported operation
  • independent clocks and controls
  • bridges between clock domains
  • simultaneous read and write
  • port-specific aspect-ratio control
  • built-in rate/width conversions
  • SelectRAM provides similar benefits on a
    smaller scale

68
Ping Pong Buffers
  • Ping-pong buffers are pairs of blocks that
    alternate between input and processing
  • SRLUT for small buffers
  • self-addressing input
  • 0.6-ns read access
  • Larger buffers can use the dual-ported Block RAM
  • one address bit alternates read/write areas
  • 3.3-ns read access


[Diagram: two 16-bit shift registers sharing an input and a read address, with a select choosing which one drives the output]
69
Small FIFOs in SRLUTs
  • Small FIFOs can be implemented in SRLUTs
  • word count addresses the output data
  • increment and enable SRLUT to Push
  • decrement to Pop
  • enable only for both
  • 16-Byte FIFO in 4 CLBs
  • 16 x 16 in 6 CLBs
  • 200 MHz
  • Expandable for deeper FIFOs

[Diagram: up/down word counter driven by Push and Pop, addressing the output of a 16-bit shift register that takes the input data]
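
The push and pop rules above can be modeled behaviorally. In this sketch the word counter addresses the oldest stored word; a push shifts data in and counts up, a pop counts down, and a simultaneous push and pop shifts without changing the count. The class name, the error handling, and the use of Python objects as data words are illustrative assumptions.

```python
class SrlFifo:
    """Behavioral model of a small FIFO: shift register plus word counter."""

    DEPTH = 16

    def __init__(self):
        self.regs = [None] * self.DEPTH   # regs[0] holds the newest word
        self.count = 0                    # number of words stored

    def clock(self, push=False, pop=False, data=None):
        """One clock edge. Returns the popped word, or None when not popping."""
        if pop and self.count == 0:
            raise IndexError("pop from empty FIFO")
        if push and not pop and self.count == self.DEPTH:
            raise OverflowError("push to full FIFO")
        out = self.regs[self.count - 1] if pop else None   # oldest word
        if push:
            self.regs = [data] + self.regs[:-1]   # enable the shift register
        self.count += int(push) - int(pop)        # up, down, or unchanged
        return out


fifo = SrlFifo()
fifo.clock(push=True, data="a")
fifo.clock(push=True, data="b")
print(fifo.clock(pop=True), fifo.clock(pop=True))
```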
70
Large FIFOs in Block RAM
  • Large FIFOs can use the dual-ported block RAM
  • add read and write address counters
  • Asynchronous push and pop
  • Different port sizes give rate-for-width
    conversion
  • Block RAM FIFOs can operate at up to 170 MHz
    including flag logic

[Diagram: Block SelectRAM with input and output data ports, write and read address counters with enables, and control logic that generates WE, Full, and Empty from Push and Pop]
71
Pre-computing for Speed
  • Some functions are too complex for 10-ns logic
    implementation
  • pipelining is not always possible
  • An alternative is to pre-compute all the possible
    results and store them in memory
  • select a result according to the inputs
  • Function time is independent of complexity
  • 0.6 ns SelectRAM access time
  • 3.3 ns Block SelectRAM access time
  • The function table can be smaller than the logic

72
Multiplication By A Constant
  • Sometimes, data has to be scaled
  • multiplied by a constant value
  • A full multiplier is too expensive
  • it can multiply by a variable
  • unnecessarily general and too complex
  • Storing all multiples of the constant is a
    better alternative
  • smaller and much faster

[Diagram: a multiplier array taking the input and a constant to produce scaled data, compared with a product table taking only the input to produce scaled data]
73
16-bit Scaler
  • A 2^16-word product table is impractical
  • partition the input into nibbles
  • use 16-word LUTs for nibble products
  • combine the partial products in adders
  • Roughly half the CLBs of a full multiplier
  • for a 16-bit coefficient: 36 CLBs vs. 62 CLBs
  • Pipeline the adders for extra speed

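[Diagram: the input is split into four nibbles, each addressing a LUT; the LUT outputs are weighted x4096, x256, x16, and x1 and summed into the scaled data]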
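
The nibble-partitioned scaler can be sketched in software. The table holds the 16 multiples of the constant, each input nibble looks up its partial product, and the partial products are added with weights 1, 16, 256, and 4096. The function names and the example constant are illustrative assumptions; in hardware each nibble would have its own copy of the table.

```python
def build_table(constant):
    """All 16 multiples of the constant, built by repeated addition."""
    table, acc = [], 0
    for _ in range(16):
        table.append(acc)
        acc += constant
    return table


def scale(x, table):
    """Multiply a 16-bit input by the constant using four nibble look-ups."""
    result = 0
    for nibble_index in range(4):
        nibble = (x >> (4 * nibble_index)) & 0xF
        result += table[nibble] << (4 * nibble_index)   # weights 1, 16, 256, 4096
    return result


table = build_table(12345)            # hypothetical 16-bit coefficient
print(scale(0xABCD, table) == 12345 * 0xABCD)
```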
74
Changing the Constant
  • The SRLUT mode can be used to update the table
  • push-only stack
  • last 16 bits loaded define the table
  • A simple accumulator computes all products of a
    new constant


[Diagram: an accumulator built from registers with Clear and Load, fed by the constant, writes its output into the 16-bit shift register when Change Constant is asserted]
75
Large Function Tables
  • Larger functions can be implemented in the Block
    SelectRAM
  • 12-input functions
  • micro-coded state machines
  • Data tables can also be implemented
  • sine/cosine tables for DSP, for example
  • dual-ported access gives the sine and cosine
    simultaneously
  • a simple address offset gives 90° phase shift for
    accessing sine and cosine from a single table
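
The address-offset trick relies on cos(x) = sin(x + 90°), so a quarter-table offset on the second port returns the cosine. The sketch below assumes a 256-entry table (matching the 256 x 16 configuration) and stores floating-point values rather than 16-bit fixed-point words; both choices and the names are illustrative.

```python
import math

N = 256   # assumed table depth, one full sine period
SINE = [math.sin(2 * math.pi * i / N) for i in range(N)]


def sin_cos(addr):
    """Read sine at addr and cosine at addr plus a quarter period."""
    sine = SINE[addr % N]                 # port A
    cosine = SINE[(addr + N // 4) % N]    # port B, offset by 90 degrees
    return sine, cosine


print(sin_cos(0))
print(sin_cos(N // 8))   # 45 degrees: sine and cosine are equal
```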

76
Block RAM/ROM Creation
  • CORE Generator software creates RAMs and ROMs
  • simple GUI interface
  • Initialization file is loaded into RAMs and
    ROMs at configuration time

77
Memory Summary
  • Virtex has two kinds of internal memory
  • distributed SelectRAM for small RAMs
  • Block SelectRAM for larger RAMs
  • SelectRAM
  • 0.6 ns read access time
  • 16- and 32-word RAMs
  • 16-word dual-ported RAMs
  • 16-word shift registers
  • sequential write/random-access read
  • FIFOs, pipelining, LUT functions, etc...

78
Memory Summary
  • Dual-ported 4096-bit Block SelectRAM
  • 3.3 ns read access time
  • true dual-ported operation
  • both ports are read/write
  • ports can be clocked asynchronously
  • configurable aspect ratio
  • 4096 x 1 bit to 256 x 16 bits
  • configure ports differently for width/rate
    conversion
  • High-speed SelectI/O access to external RAM

79
Designing for 100 MHz
  • Volts, Amps, and Watts
  • DLLs and flexible I/O standards
  • fast inter-chip communication
  • simple rules for good signal integrity
  • Ones and zeros
  • fast logic and fast interconnect
  • dependable high performance
  • Bits and bytes
  • distributed SelectRAM
  • dual-ported Block SelectRAM

80
The Virtex Family
  • The complete Virtex Data Sheet is on your AppLinx
    CD-ROM and at www.xilinx.com/partinfo/virtex.pdf

81
Designing for 100 MHz