Title: An FPGA-based SDRAM Controller with Complex QoS Requirements
1 An FPGA-based SDRAM Controller with Complex QoS Requirements
2 Overview
- Introduction
- motivation - the FlexFilm project
- problem definition
- related work
- A prioritized SDRAM controller with traffic shaping
- Simulation Setup and Results
- Conclusion
3 Motivation: The FlexFilm Project
- Flexible high-end digital film processing
- up to 4K-Resolution (8.4 GBit/s) real-time
- 4096 x 3112 x 48 bit/pixel x 24 FPS
- multiple streams at once
- multiple algorithms on a single platform
- Solution: FPGA-based reconfigurable platform
[Figure: FlexFilm platform. Labels: Switch; Interface: PCI-Express; Processing Unit; External RAM: 2x 128 MB, 64 bit DDR, 125 MHz; FlexWAFE FPGA; RAM; Bridge FPGA; Image Processing FPGA incl. CPU: Xilinx Virtex II; Interface: Host; I/O]
4 FlexWAFE Engine
- Data Path
- reconfigurable
- stream processing
- high bandwidth
- real-time
- PowerPC
- host control interface
- less computation intensive tasks
- parameter calculation
[Figure labels: FPGA (Xilinx Virtex-II Pro); Image Processing Data Path]
5 Memory Access
- External RAM
- onboard FPGA memory too small
- one 4K image: 80 MByte
- FPGA: 1 MByte
- SDRAM preferred
- size
- cost
[Figure labels: external DDR-SDRAM; Xilinx Virtex-II Pro; Image Processing Data Path]
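The size argument above can be checked with a few lines of arithmetic. This is only a sketch: the frame geometry and the 1 MByte on-chip figure are taken from the slides, the variable names are illustrative, and decimal megabytes are assumed.

```python
# Frame geometry as quoted on the slides; the rest is plain arithmetic.
WIDTH, HEIGHT = 4096, 3112       # 4K film frame
BITS_PER_PIXEL = 48

frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
frame_mbyte = frame_bytes / 1e6  # decimal megabytes (assumption)

FPGA_ONCHIP_MBYTE = 1.0          # rough on-chip capacity quoted on the slide

print(f"one 4K frame: {frame_mbyte:.1f} MByte")
print(f"fraction of a frame that fits on chip: {FPGA_ONCHIP_MBYTE / frame_mbyte:.3f}")
```

The result is about 76.5 MByte per frame, in line with the roughly 80 MByte quoted above, so only about 1% of a single frame would fit into on-chip memory.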
6 Memory Control
- SDRAM Controller
- high bandwidth required
- multiple channels
- Data Path and CPU access same memory
- different access patterns
[Figure labels: external DDR-SDRAM; Xilinx Virtex-II Pro; Scheduling SDRAM-Controller; Image Processing Data Path]
7 Memory Control
- External SDRAM
- internal FPGA memory too small
- high bandwidth required
- multiple channels
- data path and CPU access same memory
- Different access patterns!
[Figure labels: external DDR-SDRAM; Xilinx Virtex-II Pro; SDRAM-Controller; Image Processing Data Path]
8 Different Memory Access Patterns
- Data Path
- regular access patterns
- prefetch possible
- long latency allowed
- compensated by buffers
- real-time operation: latency must be bounded
- QoS requirement: guaranteed minimum throughput at guaranteed maximum latency
- CPU
- irregular access patterns
- prefetch not possible
- stalls at memory access
- system performance loss
- buffers don't help
- short latency needed
- every nanosecond counts
- in real-time operation, latency must be bounded
- QoS requirement: minimum possible latency
9 Related Work
- Streams and Known Access Patterns
- Imagine (Dally, Rixner, Kapasi et al.), Prophid (Meerbergen et al.), Prabhat Mishra
- do not support different memory access patterns
- Different Access Patterns
- Sonics MemMax Controller (Sonics Inc., Weber), MediaTek Corp. (Lee, Lin) / Chiao-Tung University (Jen)
- QoS: different service levels
- similar architectures, complex ASIC-based approach
10 Our Approach
- Previous Work
- access prioritization
- different scheduler implementations
- SIPS 2003, Seoul
- DAC 2005
- additional flow control unit
- different flow control types
11 Example
- 2 Access Streams
- Bursty CPU access
- 2 bursts, burst length 2 (2 accesses per cacheline fill)
- Periodic Video Access
- every 2 access cycles, maximum delay 2 cycles
[Figure: timing diagram over access cycles 1 to 7; rows: bursty CPU access, periodic video access]
12 Example
- No QoS scheduling
- Result
- CPU access extended by 3 cycles
[Figure: timing diagram over access cycles 1 to 7; rows: bursty CPU access (annotated: 3 cycles), periodic video access]
13 Example
- Access Prioritization
- CPU requests are executed first
- Result
- no CPU access delay, but
- video access misses deadline
[Figure: timing diagram over access cycles 1 to 7; rows: bursty CPU access, periodic video access, periodic video access (original)]
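The effect of strict prioritization can be sketched with a tiny cycle-based model. The arrival cycles below are assumptions chosen for illustration (the original timing diagram is not reproduced here), the function name is hypothetical, and the memory is assumed to serve exactly one access per cycle.

```python
from collections import deque


def priority_schedule(cpu_arrivals, video_arrivals, cycles):
    """Serve one access per cycle; a pending CPU request always wins."""
    cpu = deque(sorted(cpu_arrivals))
    video = deque(sorted(video_arrivals))
    served = {"cpu": [], "video": []}   # (arrival cycle, service cycle) pairs
    for now in range(1, cycles + 1):
        if cpu and cpu[0] <= now:
            served["cpu"].append((cpu.popleft(), now))
        elif video and video[0] <= now:
            served["video"].append((video.popleft(), now))
    return served


# Assumed arrival pattern, not taken from the original diagram:
CPU_ARRIVALS = [1, 2, 4, 5]     # two bursts of length 2
VIDEO_ARRIVALS = [1, 3, 5]      # one request every 2 cycles
MAX_VIDEO_DELAY = 2             # allowed cycles between arrival and service

result = priority_schedule(CPU_ARRIVALS, VIDEO_ARRIVALS, cycles=8)
cpu_delays = [s - a for a, s in result["cpu"]]
video_delays = [s - a for a, s in result["video"]]
missed = [a for a, s in result["video"] if s - a > MAX_VIDEO_DELAY]

print("CPU delays:", cpu_delays)
print("video delays:", video_delays)
print("video arrivals that missed the deadline:", missed)
```

Under these assumptions the CPU sees no delay at all, while the video request arriving in cycle 3 is served one cycle too late.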
14 Example
- Simple Flow Control
- 1 CPU access every 2 cycles
- Result
- CPU access delayed by 2 cycles
- video access meets deadline
[Figure: timing diagram over access cycles 1 to 7; rows: bursty CPU access (annotated: 2), periodic video access, periodic video access (original)]
15 Example
- Complex Traffic Shaping
- allow 2 CPU accesses every 4 cycles
- known as Leaky Bucket in networking applications
- Result
- CPU access delayed by 1 cycle
- video access meets deadline
[Figure: timing diagram over access cycles 1 to 7; rows: bursty CPU access, periodic video access, periodic video access (original)]
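Both flow control variants ("1 access every 2 cycles" and "2 accesses every 4 cycles") can be read as the same rule with different parameters. The sketch below assumes a fixed-window credit counter: n credits are refilled every T cycles and unused credits are not carried over. This is a simplified relative of the leaky bucket and not necessarily the mechanism used in the controller; the function name and the CPU arrival cycles are assumptions.

```python
def shape(arrivals, n, t, first_cycle=1):
    """Return the cycle in which each request is released.

    Credits are refilled to n at the start of every window of t cycles,
    and at most one request is released per cycle.
    """
    if n < 1 or t < 1:
        raise ValueError("n and t must be positive")
    pending = sorted(arrivals)
    releases = []
    credits = 0
    now = first_cycle
    while pending:
        if (now - first_cycle) % t == 0:
            credits = n                 # new window: refill credits
        if pending[0] <= now and credits > 0:
            pending.pop(0)
            credits -= 1
            releases.append(now)
        now += 1
    return releases


CPU_ARRIVALS = [1, 2, 4, 5]             # assumed: two bursts of length 2

print(shape(CPU_ARRIVALS, n=1, t=2))    # [1, 3, 5, 7]
print(shape(CPU_ARRIVALS, n=2, t=4))    # [1, 2, 5, 6]
```

With the assumed arrivals, the last CPU access finishes in cycle 7 for n = 1, T = 2 and in cycle 6 for n = 2, T = 4, compared with cycle 5 without shaping. The larger burst allowance costs the CPU one cycle instead of two.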
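16 SDRAM Controller Block Diagram
[Figure: SDRAM controller block diagram. Blocks: Prioritized Memory Access Scheduler; Flow Control; High Priority ports (Read, Write); Std Priority ports (Read, Write); Data I/O; DDR-SDRAM. Signals: STOP. Buses: read data bus, write data bus. Legend: control flow, data flow]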
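How the blocks in the diagram could interact is sketched below as a toy model: a flow control unit raises STOP for the high-priority port once its budget is used up, and the scheduler then falls back to the standard-priority port. The class and function names, the fixed-window budget rule, the arrival cycles, and the one-access-per-cycle memory are all assumptions made for illustration, not the controller's actual design.

```python
class FlowControl:
    """Raises STOP once n accesses were granted in the current window
    of t cycles (assumed shaping rule)."""

    def __init__(self, n, t):
        self.n, self.t = n, t
        self.credits = 0

    def tick(self, cycle):
        if (cycle - 1) % self.t == 0:   # windows start at cycles 1, 1+t, ...
            self.credits = self.n

    @property
    def stop(self):
        return self.credits == 0

    def grant(self):
        self.credits -= 1


def run(high, std, flow, cycles):
    """Return (cycle, port, arrival) for every access sent to the SDRAM."""
    high, std = sorted(high), sorted(std)
    issued = []
    for cycle in range(1, cycles + 1):
        flow.tick(cycle)
        if high and high[0] <= cycle and not flow.stop:
            flow.grant()
            issued.append((cycle, "high", high.pop(0)))
        elif std and std[0] <= cycle:
            issued.append((cycle, "std", std.pop(0)))
    return issued


def worst_delay(issued, port):
    return max(cycle - arrival for cycle, p, arrival in issued if p == port)


CPU = [1, 2, 4, 5]      # assumed high-priority (CPU) arrivals
VIDEO = [1, 3, 5]       # assumed standard-priority (video) arrivals

for n, t in [(8, 8), (1, 2), (2, 4)]:   # (8, 8) effectively disables shaping
    issued = run(CPU, VIDEO, FlowControl(n, t), cycles=8)
    print(f"n={n} T={t}: worst video delay {worst_delay(issued, 'std')}, "
          f"worst CPU delay {worst_delay(issued, 'high')}")
```

In this model the unshaped case lets a video request wait 3 cycles, while both shaped cases keep the video delay at 2 cycles or less, at the price of a CPU delay of 2 cycles (n = 1, T = 2) or 1 cycle (n = 2, T = 4).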
17 Simulation Environment
- CPU: PowerPC, ARM (pegwit decode)
- Image data path: 2048 x 1556 x 24 FPS, 16 bit grayscale
- image input and output
- 3-level Discrete Wavelet Transformation (DWT)
[Figure: simulation setup. Blocks: CPU; Caches; Flow Control; SDRAM Controller; DDR-SDRAM; Image Datapath (DWT); I/O]
18 Simulation Results
Traffic shaping: n = number of consecutive requests; T_avg = T / n = avg. clock cycles between two requests
- Evaluation
- Traffic shaping very efficient
- n > 1 hardly more efficient than n = 1; reason: blocking read cache transactions
Required memory accesses per cacheline fill: PPC 1, ARM 4
19 Conclusion
- CPU and data path memory access
- show different access patterns
- result in complex quality of service (QoS) requirements for the memory controller
- SDRAM controller architecture
- supports QoS by
- request prioritization
- flow control
- Result: overall system speedup
Thank you very much!
20 SDRAM Controller Resources
- Configuration
- application ports
- high priority: 1 read, 1 write (32 bit)
- standard priority: 2 read, 5 write (32 bit)
- 32 bit DDR-SDRAM, 4 banks
- flow control for high priority ports
- 125 MHz
- Resources for Virtex-II Pro 50
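For orientation, the theoretical peak bandwidth of this configuration follows directly from bus width and clock. The sketch below ignores refresh, row activation, and read/write turnaround, so real throughput is lower. Comparing it with the 2048 x 1556 image stream from the simulation slide is an assumption; the simulation may have used a different memory configuration.

```python
BUS_WIDTH_BITS = 32        # from the configuration above
CLOCK_HZ = 125e6           # from the configuration above
TRANSFERS_PER_CLOCK = 2    # DDR: data on both clock edges

peak_bit_s = BUS_WIDTH_BITS * TRANSFERS_PER_CLOCK * CLOCK_HZ
peak_mbyte_s = peak_bit_s / 8 / 1e6

# One 2048 x 1556, 16 bit, 24 FPS image stream, counted in one direction only.
stream_bit_s = 2048 * 1556 * 16 * 24

print(f"peak: {peak_bit_s / 1e9:.1f} Gbit/s ({peak_mbyte_s:.0f} MByte/s)")
print(f"one image stream: {stream_bit_s / 1e9:.2f} Gbit/s "
      f"({100 * stream_bit_s / peak_bit_s:.1f}% of peak)")
```

This gives a peak of 8 Gbit/s (1000 MByte/s), of which a single pass over one such image stream would take about 15%.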
21 FlexFilm Processing Unit