Annapolis Wildstar FPGA Board - PowerPoint PPT Presentation

1
Annapolis Wildstar FPGA Board
  • Charles Ross
  • Monica Chawathe

2
Wildstar Board
3
Starfire Board
4
WildStar Board (Simplified)
[Diagram: PE0, PE1 and PE2 (Virtex 2000E) connected to the Host via the LAD Bus, with eight 2 MB and four 1 MB memories]
3 Virtex 2000E FPGAs, 12 Memories (20 MB)
5
[Diagram: the Host connected to the board via the LAD Bus]
6
StarFire Board (Simplified)
[Diagram: PE1 (Virtex 1000) connected to the Host via the LAD Bus, with six 1 MB memories]
1 Virtex 1000 FPGA, 6 Memories (6 MB)
7
Memory Layout
  • Local
  • Always 32-bit words
  • Two on PE 1
  • Two on PE 2
  • Mezzanine
  • 32- or 64-bit, depending on source (PEx / PE0)
  • Both address and word size
  • 4 between PE 1 & 0
  • 4 between PE 2 & 0
  • Latency: 4 cycles

8
Mezzanine Memory
  • 32 vs 64 (Same memory)
  • Switch Modes
  • 00 Straight
  • 01 Crossed
  • 10 Lo Thru
  • 11 Hi Thru

[Diagram: mezzanine memories shared by PEx and PE0, with a 64-bit port and a 32-bit port]
9
PEx (1 and 2)
[Diagram: PEx and its connections: Left Local, Right Local, Left Mezz, Right Mezz, Left, Right and LAD; user logic shown as "STUFF"]
10
PE0
[Diagram: PE0 and its connections: PE1 Left/Right Mezz, PE2 Left/Right Mezz, Left, Right and LAD; user logic shown as "STUFF"]
11
Clocks: 4 of them!?
  • K, M, P, U
  • KClock: LAD Transactions (K?)
  • MClock: Memory Transactions
  • PClock: Processing Clock
  • UClock: User Clock
  • Okay, but why? What are they?

12
KClock: LAD
  • PE ↔ Host
  • 33MHz or 66MHz
  • 33MHz: Easy to Place and Route
  • 66MHz: 2x Host Bandwidth
  • Host and Chip must agree!!
  • Set in VHDL and Host Code
  • Clock is actually based on the PCI Clock
  • Varies per host
  • Ours is approx. 33.23MHz / 66.46MHz
  • Asynchronous to all other clocks

13
MClock: Memory
  • Speed of Memory IO
  • Both Local & Mezzanine
  • User Selectable
  • 25MHz to 133MHz (Wildstar)
  • 25MHz to 100MHz (Starfire)

14
PClock: Processing
  • Based on MClock
  • Divisor between 1 and 16
  • Slower than (or equal to) MClock
  • Can speed up Memory I/O
  • Decoupling may allow different speeds
  • Increase M, increase the Divisor
  • Ex: Slow component in application (30MHz)
  • M = 30MHz, Divisor 1 → P = 30MHz
  • M = 60MHz, Divisor 2 → P = 30MHz
  • 2 memory accesses per PClock

15
PClock: Processing (More)
  • Optional
  • We normally don't use it, for ease
  • MClock is used directly
  • Less logic than P = M/1
  • No need to cross clock boundaries
  • Chip must either
  • Not care what the ratio is
  • Know at compile time what the ratio will be

16
UClock: User Clock
  • User Selectable
  • 0.32MHz to 133MHz (Wildstar)
  • 0.32MHz to 100MHz (Starfire)
  • We have never used it
  • 3 is plenty, isn't it?
  • Asynchronous to all other clocks

17
Hardware Components
  • Roll your own
  • Manual LAD addressing (33/66 differ)
  • Manual memory use & contention
  • Manual EVERYTHING!
  • CAN be very fast (~140 MHz)
  • Annapolis Supplied Components
  • MUCH Easier
  • Slower (Approx. 40-60 MHz)

18
LAD Bus
  • 33MHz / 66MHz Selectable
  • Changes the communication protocol
  • Amount of latency, etc.
  • Component Addressing scheme
  • 0x0000-0x7FFF: Component within PE
  • Higher Bits Address Board and PE
  • Ignore them
  • unless you roll your own LAD code

19
LAD Bus (More)
  • The Addressing of the LAD bus
  • A lot like subnet masks in IP Networking
  • MASK
  • Which bits address the component
  • Which bits are intra-component
  • BASE
  • Where does this component begin
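  • (ADDR & MASK) == BASE: Are you talkin' to ME?
  • ADDR & (~MASK): What address in me?
  • Examples
  • B 0x4800, M 0x7F00 → 0x4800 to 0x48FF
  • B 0x3400, M 0x7C00 → 0x3400 to 0x37FF (BASE must be aligned to the mask: 0x3200 & 0x7C00 = 0x3000, so B 0x3200 would never match)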

20
Inside the Chips
[Diagram: blocks inside a PE: Your Application, RegFile, Reset, Clocks, LAD Mux, two LAD-Mem Bridges, two Mem Muxes with their memories, and the LAD bus; a legend distinguishes Annapolis Provided from User Provided blocks]
21
LAD-MUX
[Diagram: the "Inside the Chips" block diagram, repeated]
22
LAD-MUX
  • Gives LAD access to components
  • Bridges gap between IO Pins and Logical LAD
  • Handles Protocols for you
  • 66MHz and 33MHz
  • ONE per chip

23
Reset
[Diagram: the "Inside the Chips" block diagram, repeated]
24
Reset
  • Allows Host to RESET the Chip
  • Causes clocks to destabilize momentarily
  • Causes chip to return to known init state
  • (If you write your VHDL right)
  • All Annapolis components are written right

25
Clocks
[Diagram: the "Inside the Chips" block diagram, repeated]
26
Clocks
  • Provides user access to
  • All 4 Clocks (or Clock x2)
  • When clocks are stable
  • DLL locked Signals
  • Clocks on a Virtex use DLLs
  • Delay-Locked Loop
  • not Dynamic Link Library
  • Shame on you, Windows users!

27
Register File
[Diagram: the "Inside the Chips" block diagram, repeated]
28
Register File
  • Provides host access to a 1-D array of 32-bit registers
  • Size must be a power of 2
  • Can be used for
  • Ready: The host says "I can go now"
  • Done: "Hey Host, I am done!"
  • Small 32-bit IO: "The answer is 42!"
  • Run-time parameters: "Threshold is 63"

29
LAD to Mem Bridge
[Diagram: the "Inside the Chips" block diagram, repeated]
30
LAD to Mem Bridge
  • Provides host with access to the memories
  • Mezzanine or Local Memories
  • 2 Kinds: 32-bit and 64-bit
  • Transfers happen in bursts
  • 256 DWORDS for 32-bit memories
  • 512 DWORDS for 64-bit memories
  • (It's all transparent to the user, though)

31
Memory-Mux
[Diagram: the "Inside the Chips" block diagram, repeated]
32
Memory-Mux
  • Provides multiple clients with access to the memories
  • Arbitrates between clients
  • Priority
  • Number of the client decides priority
  • Maximum utilization
  • Might starve some clients
  • Fair
  • Round Robin
  • Wastes some cycles
  • Each Client gets 1/n

33
Memory Access
  • Address: of DWORD or QWORD
  • Data_Out: To Memory
  • Data_In: From Memory
  • Write: Direction of Request
  • Request: "I want memory"
  • Acknowledge: "Okay!"
  • Data_Valid: Ack delayed 4/5 cycles (see Bugs, later)
  • 32-bit memories only
  • Low/High_Enable: "This half is useful"
  • 64-bit memories only
  • High/Low_Data_Valid: (Ack & Low/High_Enable) delayed 4/5 cycles
  • 64-bit memories only

34
32-bit Memory Read
35
64-bit Memory Read
36
32-bit Memory Write
37
64-bit Memory Write
38
Others: Useful
  • RAM Blocks
  • Host and Client Access to on-chip memories
  • 256 32-bit words
  • Interrupts to host
  • Systolic Buses
  • 2 36-bit buses between PE1 and PE2
  • top and bottom
  • Bi-directional
  • Tri-state
  • PE0 Standard Buses
  • 2 2-bit buses between PE0 and PEx
  • Bi-directional
  • Tri-state

39
Others: Useless
  • LED (there are 2 LEDs per Chip)
  • Red and Green
  • Can't see them
  • IO Card
  • 114-bit IO
  • We don't have one
  • Test Pins
  • 18 bits
  • No testing our board, please! :)

40
Software API
  • Annapolis Supplied
  • Driver Functions
  • Open, Close, Set Clocks, DMA, Read, Write,
    Download Configurations, Interrupt, Readback,
    etc..
  • Convenience Functions
  • Interface code to the LAD to Memory Bridges

41
Open/Close
  • Grabs the board exclusively
  • Uses kernel mutex
  • CAN do it in shared mode, but DON'T
  • Can set LAD Speed as well
  • See Bugs, later

42
Chip Configuration
  • Programs a PE from a memory array containing the
    bitstream
  • .x86 files
  • Can de-program as well
  • Why bother?
  • As long as everyone plays nice
  • BE CAREFUL WHAT YOU PROGRAM!
  • If you program a PE with a bitstream that is corrupted, not for the correct chip, or mangled in some way, you can release the magic smoke from the chips!
  • $40,000 board!

43
Set Clock Speeds
  • UClock speed
  • MClock speed
  • and PClock divisor

44
Register IO
  • Reads/Writes to the LAD Address space
  • to communicate with anything plugged into a LAD
    MUX
  • Reset
  • Register Files
  • Etc.

45
Memory IO
  • for LAD to MEM Bridges
  • Abstracts the IO Bursts, addressing, etc.
  • Create Memory Objects
  • Read/Write/Copy/Set
  • Release

46
Others You Won't Need
  • Display (4 Char LCD on the board)
  • Interrupts
  • Temperature / Power
  • Readback / Singleshot
  • DMA
  • Versions / Hardware Config
  • Etc..

47
Tools
  • You write Host code (in C)
  • Compile with gcc, etc.
  • Link in the libraries and such
  • You write Chip code (in VHDL)
  • Simulate and Verify with ModelSim
  • Synthesize with Synplify
  • Linux / Solaris / WinNT
  • Place and Route with Xilinx Foundation tools
  • WinNT / Linux (with Wine)

48
ModelSim
  • VHDL Simulation tool
  • Annapolis provides
  • Host simulation components
  • VHDL Description of the WHOLE board
  • LAD
  • Memories (Local & Mezzanine)
  • Busses
  • Etc
  • You provide
  • VHDL to run inside the chip (May contain
    Annapolis components as well)
  • Talk to me if you want to use ModelSim to debug!

49
Synplify
  • Synplicity Inc.
  • Converts VHDL (or Verilog) into EDIF
  • EDIF: description of your program in terms of Virtex parts (4-input LUTs, flip-flops, RAM blocks, etc.)
  • Fast
  • 1-30 minutes

50
Place and Route
  • Maps to lower level components
  • Lays them out
  • Routes between them
  • Slow
  • 10 minutes to 2 days
  • Provides a bitstream (.bit file)
  • directly converted to .x86 for config

51
Paths & Environment
  • Need environment variables and path additions
  • Add this to the end of your .cshrc: source cs670/WildExamples/cshrc_additions
  • If you use bash, sh, zsh, etc.
  • You're on your own!
  • Look at the file, figure it out!
  • OR
  • Use csh or tcsh!

52
Examples
  • cs670/WildExamples/csu_example
  • Basic CSU made example using only PE1
  • Copies 1MB from Left → Right Local Mem
  • cs670/WildExamples/annap_example
  • All the Annapolis supplied examples
  • May need path adjusting, etc..
  • Not meant to work as is
  • Useful to get a feel for other stuff

53
Hints
  • Timing
  • Count MClock cycles, and put the count in a RegFile
  • Cycles / Freq = Time
  • Host timing is too coarse
  • Start / Stop and Working / Done
  • Use a RegFile: easier than interrupts
  • (Haven't gotten interrupts to work with the LAD Mux)

54
Manuals
  • Ask Sanjay! :)
  • 1 copy of our HUGE Starfire / Wildstar manuals
  • I have the original
  • You may use it near my desk
  • If it wanders from my cube
  • Broken Legs

55
HELP!
  • Bugs? 99% correct is 100% wrong
  • 1. Reread your VHDL and host code
  • Silly bugs are easy to make, and to spot
  • 2. Simulate it
  • You can see the signals. It almost always agrees with the actual hardware
  • 3. Simulate again
  • No, really: simulate it!
  • 4. Look in the manuals
  • Helpful sometimes
  • 5. rossc_at_cs.colostate.edu

56
BUGS!!!!
  • Querying the LAD bus speed in host code will
    return 66MHz if the LAD Bus was EVER at 66MHz
    since last reboot even if it is CURRENTLY at
    33MHz!
  • DON'T USE IT, EVER!
  • The Data_Valid signals are WRONG! On the real hardware they appear to be delayed 5 cycles instead of 4. They are correct in simulation.
  • Use a 4-cycle delay on (Req AND Ack) instead!
  • Use the simulation to ensure your delayed signal matches

57
Let's Look at It!
  • Lemme open Emacs
  • VHDL
  • Host Code
  • Execution
  • Simulation
  • Little Wiggly Green Wires!