The Microarchitecture of FPGA-Based Soft Processors

1
The Microarchitecture of FPGA-Based Soft Processors
  • Peter Yiannacouras
  • Jonathan Rose
  • Greg Steffan
  • University of Toronto
  • Electrical and Computer Engineering

2
Processors and FPGAs
  • Processors present in many digital systems

Processor
Custom Logic
  • Soft processors - implemented in FPGA fabric

3
Motivation for Understanding Soft Processor Architecture
  • Soft processors are popular
  • 16% of FPGA designs use a soft processor
  • FPGA Journal, November 2003
  • This number has increased and will continue to increase
  • Soft processors are end-user customizable
  • Application-specific architectural tradeoffs
  • Can be tuned by designers

4
Don't We Already Understand Processor Architecture?
  • Not accurately/completely
  • Accurate cycle-to-cycle behaviour
  • Estimated area/power
  • No clock frequency impact
  • Not in FPGA domain
  • Lookup tables vs transistors
  • Dedicated RAMs and multipliers are fast

5
Research Goals
  • Generate soft processor implementations
  • System for generating RTL
  • Develop measurement methodology
  • Metrics for comparing soft processors
  • Develop understanding of architectural tradeoffs
  • Analyze area/performance/power space

6
Soft Processor Rapid Exploration Environment (SPREE)

7
Input Instruction Set Architecture (ISA) Description
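  • Graph of Generic Operations (GENOPs)
  • Edges indicate flow of data

[Figure: the ISA and datapath descriptions are the inputs to SPREE. Example GENOP graph for the MIPS ADD instruction (add rd, rs, rt), with nodes FETCH, RFREAD, RFREAD, ADD, RFWRITE.]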
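
As an illustration of the idea on this slide, the MIPS ADD example can be written down as a small graph of GENOPs whose edges carry data. This is only a sketch: the node names come from the slide, but the dictionary layout, the `_rs`/`_rt` suffixes, the exact edges, and the `topological_order` helper are assumptions made for illustration and are not SPREE's actual input format.

```python
# Hypothetical encoding of the slide's MIPS ADD example as a GENOP graph.
# Node names follow the slide; the layout and the edge list are assumptions.
ADD_GRAPH = {
    "nodes": ["FETCH", "RFREAD_rs", "RFREAD_rt", "ADD", "RFWRITE"],
    # Each edge is (producer, consumer): data flows from producer to consumer.
    "edges": [
        ("FETCH", "RFREAD_rs"),
        ("FETCH", "RFREAD_rt"),
        ("RFREAD_rs", "ADD"),
        ("RFREAD_rt", "ADD"),
        ("ADD", "RFWRITE"),
    ],
}


def topological_order(graph):
    """Return the nodes so that every producer precedes its consumers."""
    indegree = {node: 0 for node in graph["nodes"]}
    for _, dst in graph["edges"]:
        indegree[dst] += 1
    ready = [node for node in graph["nodes"] if indegree[node] == 0]
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for src, dst in graph["edges"]:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    if len(order) != len(graph["nodes"]):
        raise ValueError("graph contains a cycle")
    return order


print(topological_order(ADD_GRAPH))
```

Walking the graph in data-flow order gives one valid sequence in which the operations of the instruction could be carried out.
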
8
Input Datapath Description
  • Interconnection of hand-coded components
  • Allows efficient synthesis
  • Described using C

[Figure: a datapath assembled from the SPREE Component Library (Ifetch, Reg File, Mul, ALU, Shifter, Data Mem, Write Back) and supplied to SPREE alongside the ISA description.]
9
Step 1. ISA vs Datapath Verification
  • Components described using GENOPs

[Figure: SPREE verifies the datapath against the ISA's GENOP graphs (example nodes: FETCH, RFREAD, RFREAD, ADD, RFWRITE).]
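
A much-simplified sketch of this check, under stated assumptions: each instruction is reduced to the list of GENOPs it needs, each component to the list of GENOPs it provides, and the check only asks whether every needed GENOP is provided somewhere. The function name, the data layout, and the `MUL` GENOP are hypothetical. The slide does not say how SPREE performs the check, and a real check would presumably also have to look at how components are connected.

```python
def missing_genops(isa, components):
    """Return, per instruction, the GENOPs that no datapath component provides."""
    provided = set()
    for ops in components.values():
        provided |= set(ops)
    missing = {}
    for instr, ops in isa.items():
        gap = sorted(set(ops) - provided)
        if gap:
            missing[instr] = gap
    return missing


# Hypothetical ISA and datapath descriptions (names are illustrative only).
ISA = {
    "add": ["FETCH", "RFREAD", "RFREAD", "ADD", "RFWRITE"],
    "mul": ["FETCH", "RFREAD", "RFREAD", "MUL", "RFWRITE"],
}
DATAPATH = {
    "ifetch": ["FETCH"],
    "reg_file": ["RFREAD", "RFWRITE"],
    "alu": ["ADD"],
}

print(missing_genops(ISA, DATAPATH))
```

In this example the datapath has no multiplier, so the `mul` instruction is reported as unsupported.
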
10
Step 2. Datapath Instantiation
  • Multiplexer insertion
  • Unused connection/component removal

[Figure: the ISA and datapath descriptions feeding SPREE.]
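
The two bullets above can be sketched as follows. This is an illustrative model, not SPREE's algorithm: the `instantiate` function, the "component.port" naming, and the example netlist are all assumptions. Components that no instruction needs are dropped together with their wires, and any input port left with more than one driver gets a multiplexer.

```python
from collections import defaultdict


def instantiate(components, connections, used):
    """Drop unused components, then find inputs that need a multiplexer.

    components:  list of component names
    connections: list of (source_component, "dest_component.port") pairs
    used:        set of component names that some instruction needs
    """
    kept = [c for c in components if c in used]
    # Keep only wires whose source and destination components both survive.
    wires = [(src, dst) for src, dst in connections
             if src in used and dst.split(".")[0] in used]
    drivers = defaultdict(list)
    for src, dst in wires:
        drivers[dst].append(src)
    # An input driven by several sources needs a mux to select between them.
    muxes = {dst: srcs for dst, srcs in drivers.items() if len(srcs) > 1}
    return kept, wires, muxes


# Hypothetical datapath in which no instruction uses the multiplier.
COMPONENTS = ["ifetch", "reg_file", "alu", "shifter", "mul", "write_back"]
CONNECTIONS = [
    ("reg_file", "alu.a"),
    ("reg_file", "shifter.a"),
    ("alu", "write_back.in"),
    ("shifter", "write_back.in"),
    ("mul", "write_back.in"),
]
USED = {"ifetch", "reg_file", "alu", "shifter", "write_back"}

kept, wires, muxes = instantiate(COMPONENTS, CONNECTIONS, USED)
print(kept, muxes)
```
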
11
Step 3. Control Generation

[Figure: SPREE adds generated control logic to the datapath components (Ifetch, Reg File, Mul, ALU, Data Mem, Write Back).]
12
Output Verilog RTL Description

[Figure: from the ISA and datapath descriptions, SPREE emits Verilog RTL for the datapath components (Ifetch, Reg File, Mul, ALU, Data Mem, Write Back) together with the generated control.]
13
Back-end Infrastructure
  • Benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc)
  • Modelsim RTL Simulator
  • Quartus II 4.2 CAD Software
  • Stratix 1S40
  • Measured: 1. Cycle Count 2. Resource Usage 3. Clock Frequency 4. Power

14
Metrics for Measurement
  • Area: Equivalent Stratix Logic Elements (LEs)
  • Relative silicon areas used for RAMs/Multipliers
  • Performance: Wall-clock time
  • Cycle count ÷ clock frequency
  • Arithmetic mean across benchmark set
  • Energy: Dynamic energy (e.g. nJ/instr)
  • Excluding I/O
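
The performance metric can be sketched as a short calculation: wall-clock time is the cycle count divided by the clock frequency, averaged arithmetically over the benchmark set. The function names, benchmark names, and numbers below are made up for illustration and are not measurements from the presentation.

```python
def wall_clock_seconds(cycle_count, clock_mhz):
    """Wall-clock time = cycle count / clock frequency."""
    return cycle_count / (clock_mhz * 1e6)


def mean_runtime(cycle_counts, clock_mhz):
    """Arithmetic mean of wall-clock time across a benchmark set."""
    times = [wall_clock_seconds(c, clock_mhz) for c in cycle_counts.values()]
    return sum(times) / len(times)


# Hypothetical cycle counts for two benchmarks on a 100 MHz processor.
cycles = {"bench_a": 1_000_000, "bench_b": 3_000_000}
print(mean_runtime(cycles, clock_mhz=100.0))
```

Because both cycle count and clock frequency enter the metric, a design that lowers the cycle count but also lowers the achievable clock frequency is not automatically faster.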

15
Trace-Based Verification
  • Ensure SPREE generates functional processors

[Figure: each benchmark application is run both on the generated RTL in Modelsim (RTL simulator) and on MINT (instruction-set simulator); the two resulting traces are compared.]
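
The comparison step can be sketched as below, assuming each simulator's trace has already been reduced to a list of comparable entries (here, bit strings like those shown on the slide). The function name and the trace format are assumptions. The slide does not say what the real traces record.

```python
from itertools import zip_longest


def first_mismatch(rtl_trace, iss_trace):
    """Return (index, rtl_entry, iss_entry) at the first difference, else None.

    A trace that ends early shows up as None on the shorter side.
    """
    for index, (rtl, iss) in enumerate(zip_longest(rtl_trace, iss_trace)):
        if rtl != iss:
            return index, rtl, iss
    return None


# Hypothetical traces from the RTL simulator and the instruction-set simulator.
rtl = ["110100", "101011", "111101"]
iss = ["110100", "101011", "111100"]
print(first_mismatch(rtl, iss))
```

Reporting the first point of divergence, rather than a plain pass or fail, points directly at the instruction where the generated processor first disagrees with the reference simulator.
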
16
Architectural Exploration Results
17
Architectural Features Explored
  • Hardware vs software multiplication
  • Shifter implementation
  • Pipelining
  • Depth
  • Organization
  • Forwarding

18
Validation of SPREE Through Comparison to Altera's Nios II
  • Has three variations
  • Nios II/e: unpipelined, no HW multiplier
  • Nios II/s: 5-stage, with HW multiplier
  • Nios II/f: 6-stage, dynamic branch prediction
  • Caveats: not a completely fair comparison
  • Very similar but tweaked ISA
  • Nios II supports exceptions, OS, and caches
  • SPREE processors do not, and so save on the hardware costs

19
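SPREE vs Nios II

[Figure: performance vs area plot comparing SPREE processors with the Nios II variations; axis directions marked "faster" and "smaller". Annotated SPREE design point:]
  • 3-stage pipe
  • HW multiply
  • Multiply-based shifter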
20
Architectural Features Explored
  • Hardware vs software multiplication
  • Shifter implementation
  • Pipelining
  • Depth
  • Organization
  • Forwarding

21
Hardware vs Software Multiplication
  • Hardware multiply is fast but not always needed
  • Wastes area (220 LEs) and can waste energy
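
For context, software multiplication means building the product out of instructions the processor already has. One common way to do that is shift-and-add, sketched below. This is an assumption about what such a routine could look like, not the routine used in the presented experiments.

```python
MASK32 = 0xFFFFFFFF


def soft_multiply(a, b):
    """32-bit unsigned multiply using only shifts, adds, and bit tests."""
    a &= MASK32
    b &= MASK32
    result = 0
    while b:
        if b & 1:                      # lowest multiplier bit set: add in the multiplicand
            result = (result + a) & MASK32
        a = (a << 1) & MASK32          # next bit is worth twice as much
        b >>= 1
    return result


print(soft_multiply(6, 7))
```

The loop runs once per significant bit of the multiplier, so a software multiply costs many cycles where a hardware multiplier needs only a few. That cost only matters if the application multiplies often.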

22
Shifter Implementation
  • Shifters are expensive in FPGAs
  • We explore three implementations
  • Serial shifter (shift register)
  • Multiplier-based barrel shifter (hard multiplier)
  • LUT-based barrel shifter (multiplexer tree)
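
The multiplier-based option relies on a shift being a multiplication by a power of two. The sketch below shows the arithmetic for 32-bit values. The function names are illustrative, and treating a right shift as the upper half of a 64-bit product is one standard way to do it, not necessarily how the presented design wires the hard multiplier.

```python
MASK32 = 0xFFFFFFFF


def shift_left_via_multiply(x, n):
    """Logical left shift: the low 32 bits of x * 2**n."""
    return (x * (1 << n)) & MASK32


def shift_right_via_multiply(x, n):
    """Logical right shift: the high 32 bits of x * 2**(32 - n)."""
    product = x * (1 << (32 - n))
    return (product >> 32) & MASK32


value = 0xDEADBEEF
print(hex(shift_left_via_multiply(value, 4)), hex(shift_right_via_multiply(value, 4)))
```

The only extra logic needed is a decoder that turns the shift amount into a power of two, which helps explain why reusing a multiplier already on the chip can be cheaper than a multiplexer tree built from LUTs.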

23
Performance-Area of Different Shifter Implementations

[Figure: performance vs area plot of the shifter implementations; axis directions marked "faster" and "smaller".]
24
Pipeline Depth
  • Explored between 2 and 7 stages
  • 1-stage and 6-stage pipelines not interesting
  • 2-stage: F/D/R/EX/M | WB
  • 3-stage: F/D | R/EX/M | WB
  • 4-stage: F | D | R/EX/M | WB
  • 5-stage: F | D | R/EX | EX/M | WB
  • (new) 7-stage: F | D | R | EX | EX | EX/M | WB
25
Pipeline Depth and Performance
26
Pipeline Organization Tradeoff
  • 4-stage (A): F | D | R/EX/M | WB
  • 4-stage (B): F/D | R/EX | EX/M | WB
27
Pipeline Forwarding
  • Pipeline stages: F | D/R | EX | M | WB
  • Prevent stalls when data hazards occur
  • MIPS has two source operands (rs and rt)
  • Four forwarding configurations are possible
  • No forwarding
  • Forward rs
  • Forward rt
  • Forward both rs and rt
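
A toy model can show how the four configurations differ. Every detail below is an assumption made for illustration: instructions are (dest, rs, rt) triples, a forwarded operand can be used by the very next instruction, and an operand that is not forwarded becomes usable `writeback_delay` cycles later. The delay value and the example program are not taken from the presentation.

```python
def count_stalls(program, forward_rs, forward_rt, writeback_delay=2):
    """Count stall cycles for a list of (dest, rs, rt) register triples."""
    produced = {}   # register name -> cycle in which its producer issued
    cycle = 0
    stalls = 0
    for dest, rs, rt in program:
        earliest = cycle
        for reg, forwarded in ((rs, forward_rs), (rt, forward_rt)):
            if reg in produced:
                extra = 0 if forwarded else writeback_delay
                earliest = max(earliest, produced[reg] + 1 + extra)
        stalls += earliest - cycle   # cycles spent waiting for operands
        cycle = earliest
        if dest is not None:
            produced[dest] = cycle
        cycle += 1
    return stalls


# Hypothetical program: r1 feeds the next rs, then r4 feeds the next rt.
PROGRAM = [("r1", "r2", "r3"), ("r4", "r1", "r5"), ("r6", "r7", "r4")]
for fwd_rs, fwd_rt in [(False, False), (True, False), (False, True), (True, True)]:
    print(fwd_rs, fwd_rt, count_stalls(PROGRAM, fwd_rs, fwd_rt))
```

In this made-up program each operand carries one dependence, so forwarding either one removes half the stalls. In real code the benefit depends on which operand more often holds a freshly produced value.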

28
Pipeline Forwarding
29
Summary of Presented Architectural Conclusions
  • Hardware multiplication can be wasteful
  • Multiplier-based shifter is a sweet spot
  • 3-stage pipelines are attractive
  • Tradeoffs exist within pipeline organization
  • Forwarding
  • Improves performance by 20%
  • Favours the rs operand

30
Future Work
  • Explore other exciting architectural axes
  • Branch prediction, aggressive forwarding
  • ISA changes
  • VLIW datapaths
  • Caches and memory hierarchy
  • Compiler optimizations
  • Port to other devices
  • Explore aggressive customization
  • Add exceptions and OS support