Title: eMIPS Project Overview, i.e. What if your next computer was an FPGA?
1eMIPS Project Overview, i.e. What if your next computer was an FPGA?
- CPSC 462 Lab
- Gabe Knezek
- Texas A&M University
- Parts by
- Alessandro Forin
- Microsoft Research, Redmond
2The eMIPS Workstation
- Motherboard: Xilinx ML401 evaluation board for the Virtex-4 FPGA
3Outline
- Quick review of FPGA
- Introduction to reconfigurable FPGAs
- Soft processors on FPGAs
- Quick review of MIPS processor
- Putting the two together
- eMIPS
- Some things it lets you do
4FPGA reconfiguration
- Recent FPGAs from Xilinx let you reconfigure part
of the FPGA while the rest stays intact and
running
5FPGA reconfiguration
6FPGA Soft Processors
- Take a processor as described in RTL (Verilog, VHDL)
- Synthesize it to an FPGA
- Similar to what you did in Computer Architecture
7Advantages
- Extremely customizable and flexible
- Can be upgraded and changed in the field
- Can be bundled with other hardware on the FPGA to accelerate processing certain operations
  - Ex: a DSP
8Disadvantages
- More expensive than a dedicated ASIC
- Usually slower due to the FPGA's non-optimized routing
- For applications requiring use of the FPGA for other operations, can take up lots of space
9Example CPUs
- Usually RISC (e.g. MIPS), as its instructions are much simpler than x86's to implement fast
- Otherwise, almost always ARM or PowerPC (but usually embedded as a CPU + FPGA)
- Ex: our Spartan-3 has 1920 slices
- Xilinx MicroBlaze
  - Fast, 32-bit, MIPS-like RISC, optional MMU / 1000-2500 slices
- Xilinx PicoBlaze
  - 8-bit, designed to be small / 100 slices
- Altera Nios II
10Review: Simple MIPS
(Figure: single-cycle MIPS datapath. Blocks: Instruction Fetch Unit, Main Control, ALU Control, 32 32-bit Registers (Ra, Rb, Rw, busA, busB, busW), Extender, ALU, Data Memory. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUop, ALUctr, MemWr, MemtoReg. Instruction fields: op = Instr<31:26>, Rs = Instr<25:21>, Rt = Instr<20:16>, Rd = Instr<15:11>, Imm16 = Instr<15:0>, func = Instr<5:0>.)
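The instruction fields labelled in the datapath figure (op, Rs, Rt, Rd, func, Imm16) can be pulled out of a 32-bit instruction word with shifts and masks. The following is a minimal sketch of that decoding, plus the sign extension that the Extender block would perform; the function names `decode` and `sign_extend16` are made up for this illustration and are not part of eMIPS.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields named on the slide."""
    return {
        "op": (instr >> 26) & 0x3F,    # Instr<31:26>, feeds Main Control
        "rs": (instr >> 21) & 0x1F,    # Instr<25:21>
        "rt": (instr >> 16) & 0x1F,    # Instr<20:16>
        "rd": (instr >> 11) & 0x1F,    # Instr<15:11>
        "func": instr & 0x3F,          # Instr<5:0>, feeds ALU Control
        "imm16": instr & 0xFFFF,       # Instr<15:0>, feeds the Extender
    }


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit immediate as a signed integer (sign-extending case)."""
    return value - 0x10000 if value & 0x8000 else value


# 0x00221820 encodes "add $3, $1, $2" (an R-type instruction, op = 0).
fields = decode(0x00221820)
```

For an R-type instruction the imm16 field overlaps rd and func, so only one of the two interpretations is meaningful for a given opcode.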
11Pipelining
(Figure: laundry timeline from 6 PM to midnight. Four loads run one after another, each with stages of 30, 40, and 20 minutes.)
- Sequential laundry takes 6 hours for 4 loads
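The 6-hour figure can be checked with a little arithmetic, and the same model shows what overlapping the stages would buy. This is a rough sketch: the stage times (30, 40, 20 minutes) are read off the timeline figure, and the pipelined model assumes each stage handles one load at a time and that a finished load can wait between stages. The function names are illustrative only.

```python
from typing import List


def sequential_minutes(stages: List[int], loads: int) -> int:
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(stages: List[int], loads: int) -> int:
    """Overlap the stages: a load enters a stage once it has left the
    previous stage and the stage has finished the load before it."""
    finish = [0] * len(stages)  # finish[s]: when stage s finished its latest load
    for _ in range(loads):
        ready = 0  # when the current load left the previous stage
        for s, duration in enumerate(stages):
            start = max(ready, finish[s])
            finish[s] = start + duration
            ready = finish[s]
    return finish[-1]


stages = [30, 40, 20]
seq = sequential_minutes(stages, 4)   # 360 minutes, i.e. 6 hours
pipe = pipelined_minutes(stages, 4)   # 210 minutes, i.e. 3.5 hours
```

Under these assumptions the slowest stage (40 minutes) sets the rate at which loads complete once the pipeline is full.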
12Pipelining
13Cool but?
- So you can synthesize a CPU to an FPGA.
- You can reconfigure parts of an FPGA on the fly.
- Now what?
- You do custom on-the-fly hardware parallelism and
acceleration!
15Dynamically Extensible Processors
- Using an FPGA, we have realized a (MIPS) processor that extends itself at runtime, using Extensions that are safe for multi-user operating systems
- Applications
  - Speed up execution using an Application-Specific CPU (M2V)
  - Unobtrusively monitor (real-time) software (P2V)
  - Loadable software debugging support (eBug)
  - Load/Unload peripherals at runtime, minimizing chip area
  - Load/Unload processor cores on demand
- First release now available for non-commercial use
16Optimizing the ISA with M2V
(Figure: M2V flow diagram. Labels, in slide order:)
- Hardware Designers
- Software Developers
- Profiling
- Compiled Code
- Top one or two Basic Blocks
- Basic Blocks Implemented as Hardware Extensions
- Original Binaries Modified to Utilize Hardware Acceleration
- Same speed, half the area of hand-generated Verilog code
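The "top one or two basic blocks" step can be pictured as a ranking over profile data. The sketch below is only an illustration of that idea, not the actual M2V tooling: the profile format (execution count and instruction count per block), the weighting, and the name `top_basic_blocks` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def top_basic_blocks(profile: Dict[str, Tuple[int, int]], k: int = 2) -> List[str]:
    """Return the k blocks with the largest (execution count * block length).

    profile maps a block label to (execution count, instructions in block).
    """
    def weight(label: str) -> int:
        count, length = profile[label]
        return count * length

    return sorted(profile, key=weight, reverse=True)[:k]


# Hypothetical profile of four basic blocks.
profile = {
    "bb_a": (1000, 4),
    "bb_b": (50, 20),
    "bb_c": (900, 10),
    "bb_d": (10, 3),
}
candidates = top_basic_blocks(profile)
```

A real flow would likely also weigh the area cost of each candidate, which this sketch ignores.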
17Application speedups (worst case)
- Binaries with Hardware Acceleration
- Extended instructions are inserted into the binaries. If the HW Extension is loaded, the instruction executes and skips the basic block. Otherwise, eMIPS interprets the instruction as a NOP and executes the block.
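(Figure: application speedups for Video Games, Spec2000, and Real-Time workloads)
(Figure: code listing with the new instruction placed ahead of the original basic block)
    Other code
    Op78  sp,ra,10      <- New Instruction
    Lw    ra,10(sp)     <- Original Basic Block (three instructions)
    Jr    ra
    Addiu sp,sp,18
    Other code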
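The fallback behaviour described on this slide (execute the extension and skip the block if it is loaded, otherwise treat the new instruction as a NOP and run the original block) can be modelled with a toy interpreter. This is a simplified sketch, not eMIPS itself: instructions are plain Python callables, and `ExtendedInstr`, `Machine`, and the `"op78"` key are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class ExtendedInstr:
    name: str        # key used to look up a loaded extension
    block_len: int   # number of original instructions it stands in for


@dataclass
class Machine:
    extensions: Dict[str, Callable[[State], None]] = field(default_factory=dict)

    def run(self, program: list, state: State) -> List[str]:
        trace, pc = [], 0
        while pc < len(program):
            instr = program[pc]
            if isinstance(instr, ExtendedInstr):
                hw = self.extensions.get(instr.name)
                if hw is not None:
                    hw(state)                  # extension does the block's work
                    trace.append("ext")
                    pc += 1 + instr.block_len  # skip the original basic block
                else:
                    trace.append("nop")        # not loaded: behaves as a NOP
                    pc += 1
            else:
                instr(state)                   # ordinary software instruction
                trace.append("sw")
                pc += 1
        return trace


def inc(s: State) -> None: s["x"] += 1
def dbl(s: State) -> None: s["x"] *= 2
def add3(s: State) -> None: s["y"] = s["x"] + 3


def block_hw(s: State) -> None:
    """Stand-in for the hardware version of inc; dbl; add3."""
    s["x"] = (s["x"] + 1) * 2
    s["y"] = s["x"] + 3


program = [ExtendedInstr("op78", 3), inc, dbl, add3]
s_plain, s_accel = {"x": 1, "y": 0}, {"x": 1, "y": 0}
t_plain = Machine().run(program, s_plain)
t_accel = Machine({"op78": block_hw}).run(program, s_accel)
```

Both runs end in the same state; only the path taken differs, which is why the same binary can run with or without the extension loaded.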
18Assertion Based Verification with P2V
- Use the IEEE-standard hardware Property Specification Language (PSL) to verify C (real-time) programs
- Implement it using a simulator, or in reconfigurable hardware
- PSL-to-Verilog compiler creates Extensions from PSL code
- Zero instrumentation code and zero overhead!
19Applications
- Security - Stack Smashing
- Implement property: never (writing return)
- Undefeatable by software
- (unlike canaries)
- No slowdown of software
- Can afford to protect frequently called functions
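To make the "never (writing return)" property concrete, here is a software model of the kind of check such a monitor would perform. It is only a sketch under stated assumptions: the event format (`call`, `store`, `ret` with an address) and the function name are invented for this example, and the real P2V approach compiles PSL into a hardware Extension rather than scanning a trace.

```python
from typing import Iterable, List, Tuple

# ("call", slot) : a return address was saved at stack address `slot`
# ("store", addr): the program wrote to memory address `addr`
# ("ret", slot)  : the function returned; `slot` is no longer protected
Event = Tuple[str, int]


def return_slot_violations(events: Iterable[Event]) -> List[int]:
    """Return the indices of stores that hit a live return-address slot."""
    protected = set()
    violations = []
    for i, (kind, addr) in enumerate(events):
        if kind == "call":
            protected.add(addr)
        elif kind == "ret":
            protected.discard(addr)
        elif kind == "store" and addr in protected:
            violations.append(i)
    return violations


trace = [
    ("call", 0x7FFC),
    ("store", 0x7FF0),   # ordinary local write
    ("store", 0x7FFC),   # overwrites the saved return address
    ("ret", 0x7FFC),
    ("store", 0x7FFC),   # slot is free again, not a violation
]
bad = return_slot_violations(trace)
```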
20Stack smashing example
21Applications
- Debugging Threading Issues
- Ensure resources are only modified by the thread that owns the lock
- No timing change -> no debugger-induced timing bugs
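The lock-ownership check can be modelled the same way, as a scan over a trace of events for one resource. Again this is a hypothetical software sketch (the event names and `unowned_writes` are assumptions); the slide's point is that the hardware monitor observes these events without changing program timing.

```python
from typing import Iterable, List, Optional, Tuple

# ("acquire", tid), ("release", tid), ("write", tid) for a single resource
Event = Tuple[str, int]


def unowned_writes(events: Iterable[Event]) -> List[int]:
    """Return indices of writes made by a thread that does not hold the lock."""
    owner: Optional[int] = None
    bad = []
    for i, (kind, tid) in enumerate(events):
        if kind == "acquire" and owner is None:
            owner = tid
        elif kind == "release" and owner == tid:
            owner = None
        elif kind == "write" and owner != tid:
            bad.append(i)
    return bad


trace = [
    ("acquire", 1),
    ("write", 1),    # fine: thread 1 holds the lock
    ("write", 2),    # thread 2 writes without the lock
    ("release", 1),
    ("write", 1),    # thread 1 writes after releasing
]
flagged = unowned_writes(trace)
```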
22Extensible Peripherals
- Use the eMIPS extension slot for I/O peripherals
- Safely load/unload peripherals on demand
- Saves area, forward-compatible, bug fixes,
- Flexible interface solves performance and atomicity issues
23eBug: the extensible debugger
- Safe, in-process, JTAG-style software debugger
- Extensible in hw (watchpoints)
- Extensible in sw (communication protocols)
- Use P2V as a trigger