Memory Optimizations In High Level Synthesis

1
Memory Optimizations In High Level Synthesis
  • By
  • Nitin K. Agarwal
  • Under the Supervision of
  • Dr. Preeti Ranjan Panda
  • Department of Computer Science, I.I.T. Delhi

2
Overview
  • Motivation and Objective.
  • Related Work.
  • Behavioral VHDL generation.
  • Structural VHDL generation.
  • Partitioning Algorithms.
  • References.

3
ASSET Flow
Application Specification in C
HW/SW Partitioner
Functions mapped on Hardware
Part of My Project
Functions mapped to Software
C to VHDL
Compiler
Silicon Compiler
ASIC
ASIC
Processor
Memory
4
Input/output of C to VHDL Converter.
Functional Specification In C
Component Lib.
C to VHDL Converter (Structural RTL)
C to VHDL Converter (Behavioral)
Structural RTL VHDL
Behavioral VHDL
5
Assumption on Input C
  • No pointers.
  • No global variable should be used.
  • No aggregate data type is supported.
  • Only Int and Bool types are supported (for
    behavioral generation).

6
Objective
  • Update the existing CtoVHDL converter from SUIF1
    to SUIF2 for behavioral VHDL generation.
  • Partition the floating registers into register
    files.
  • Generation of Structural VHDL Code.
  • Perform case studies to observe the gain.

7
Partitioning of Registers
RF 1: R1, R5, R3
RF 2: R2, R8
RF 3: R4, R6, R7
8
Before Partitioning
R1
R2
R3
R4
9
After Partitioning
RF with 2 read ports and 1 write port
10
Related Work - SiliconC: A Hardware Back End for SUIF
C Code
SUIF IR
C to SUIF
EXIT 1
PORKY
Structural VHDL Code
SSA
BITSIZE
VGEN
11
SiliconC Different Stages.
  • EXIT 1
    • Creates single-entry, single-exit functions.
    • This simplifies the VHDL-generation stage that
      synthesizes the function output port.
  • PORKY (optimization pass)
    • Performs classical compiler optimizations.
  • Static Single Assignment (SSA)
    • Each variable is assigned only once.
  • BITSIZE
    • Tries to find the bit width actually used by
      each operand.
  • VGEN
    • Generates structural VHDL code.
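The SSA property ("each variable is assigned only once") can be illustrated with a minimal sketch. It assumes straight-line code with no branches, so no phi functions are needed. The tuple format `(dest, op, src1, src2)` and the `name_version` naming scheme are illustrative assumptions, not SiliconC's actual representation.

```python
def to_ssa(instrs):
    """Rename straight-line (dest, op, src1, src2) tuples so every name is assigned once.

    Assumption: no control flow, so phi functions are not required.
    """
    version = {}  # latest version number of each variable assigned so far
    result = []
    for dest, op, src1, src2 in instrs:
        # Sources use the most recent version; never-assigned inputs keep their name.
        s1 = f"{src1}_{version[src1]}" if src1 in version else src1
        s2 = f"{src2}_{version[src2]}" if src2 in version else src2
        # Every assignment creates a fresh version of the destination.
        version[dest] = version.get(dest, 0) + 1
        result.append((f"{dest}_{version[dest]}", op, s1, s2))
    return result


if __name__ == "__main__":
    code = [("x", "add", "a", "b"), ("x", "mul", "x", "c"), ("y", "sub", "x", "a")]
    for instr in to_ssa(code):
        print(instr)
```

After renaming, the two assignments to `x` become distinct values, which is what makes partitioning on values (rather than on registers) possible.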

12
Interface of Core Co-proc
CORE ASIC
Data Bus
Processor
Output Ready
Signals for a particular protocol
Input Ready
Start
Wrapper
13
Behavioral Code Generation
C to SUIF 1 IR
PORKY
SUIF1 to SUIF2
C Spec.
Identify Bool Var.
SUIF2 to VHDL
VHDL Code
14
Partitioning
  • Input is the CDFG on which
    • Scheduling is done
    • FU binding is done
  • Output is the CDFG in which
    • Each register is allocated to some RF
  • Assumption
    • Partitioning does not change the schedule
  • Benefit
    • Partitioning reduces the number of
      point-to-point connections and the width of
      the multiplexers.

15
Need for Writing a Synthesizer
C to VHDL (Behavioral Code)
C Spec.
Behavioral Compiler
Net List
16
Complete Flow
C to MachSuif
Opcode Splitting
List Scheduling
C Spec.
Comp. Lib.
Partitioning and Port Binding
Data Path Generation
Structural VHDL Code
FU Binding
Register Allocation
Control Part Synthesis
Implemented By me
17
Opcode Splitting
  • Input is the (Machine SUIF) IR in which
    • Each opcode belongs to the SUIF virtual machine
  • Output is the IR in which
    • Each opcode belongs to the Hw virtual machine
  • It splits each opcode of the SUIF virtual machine
    into the primitive opcodes of the Hw virtual
    machine.
  • One-to-many mapping is possible (not implemented
    yet).

18
Component Library
  • The component library specifies
    • Resource allocation
      • Interface of each component.
      • Number of components.
      • Architecture of each component.
    • Module selection
      • Mapping of FUs to opcodes of the Hw virtual
        machine.
  • The component library is specified using MDES.
  • It is parsed using MQS.
  • The extracted information is put into the IR for
    the later passes.

19
Partitioning
  • Input is the IR on which
    • Scheduling is done
    • FU binding is done
  • Output is the IR in which
    • Each register is allocated to some RF
  • Assumption
    • Partitioning does not change the schedule
  • Benefit
    • Partitioning reduces the number of
      point-to-point connections and the width of
      the multiplexers.

20
Partitioning On Values
  • Pros
    • Fewer interconnections
  • Cons
    • More registers
    • Scheduling and FU binding must be done
      beforehand.
    • The IR has to be converted into SSA form.

Flow: Partitioning on Values -> Port Mapping -> Register Allocation Within RF
21
Partitioning On Registers
  • Pros
    • Fewer registers
  • Cons
    • More interconnections
    • Scheduling, FU binding, and register allocation
      must be done beforehand.

Flow: Global Register Allocation -> Partitioning of Registers into RFs -> Port Mapping
22
Example
  • Add V1, V2 -> V5
  • Mul V5, V6 -> V3
  • ..
  • ..
  • Sub V3, V4 -> V7
  • Mul V7, V8 -> V2

Control Step i
Control Step i+1
RF has 2 read ports and 1 write port
23
Partitioning On Values
V1 V2 V3 V4
V5 V6 V7 V8
8 registers, 8 point-to-point connections
24
Partitioning On Registers
R1 R2
R3 R4
Register allocation:
V1, V7 -> R1
V2, V8 -> R2
V3, V5 -> R3
V4, V6 -> R4
4 registers, 12 point-to-point connections
25
Partitioning Algorithms
  • Assumptions
    • The schedule is not changed.
    • Scheduling and FU binding are already done.
  • Iterative version
    • Put as many registers as possible into a single
      register file (using ILP).
    • Repeat the process until all registers are
      allocated.
  • Concurrent version
    • Tries to allocate all registers to register
      files simultaneously.
    • Objective: minimize the number of RFs.
    • More complex, but gives better results.
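The iterative version can be sketched as follows. The slides fill each register file using ILP; this sketch swaps in a plain greedy fill instead, so it shows the outer "fill one RF, then repeat" loop rather than the optimal packing. It also assumes a simplified port model in which each RF serves at most `ports` register accesses per control step (3 here, standing in for 2 read + 1 write ports). The names `fits` and `partition_iterative` are hypothetical.

```python
def fits(reg, members, accesses, ports):
    """True if adding reg keeps the RF within `ports` accesses in every control step."""
    return all(len(members & regs) < ports
               for regs in accesses.values() if reg in regs)


def partition_iterative(registers, accesses, ports=3):
    """Greedy stand-in for the ILP step: fill one RF as far as possible, then repeat."""
    remaining = list(registers)
    reg_files = []
    while remaining:
        members = set()
        for reg in remaining:
            if fits(reg, members, accesses, ports):
                members.add(reg)
        reg_files.append(members)
        # Registers placed in this RF are removed before the next round.
        remaining = [reg for reg in remaining if reg not in members]
    return reg_files


if __name__ == "__main__":
    # Control-step access lists taken from the example slide later in the deck.
    accesses = {
        "C1": {1, 2, 5, 4, 7, 8},
        "C2": {5, 6, 2, 7, 9},
        "C3": {1, 4, 6, 3, 8, 10},
    }
    print(partition_iterative(range(1, 11), accesses))
```

Because the first remaining register always fits into an empty RF (for `ports >= 1`), every round places at least one register and the loop terminates.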

26
Port Mapping Algorithms
  • Assumptions
    • Scheduling, FU binding, and partitioning are
      already done.
    • Port mapping can be done on values (before
      register allocation).
  • Iterative version
    • Takes each control step in turn and performs
      port mapping.
    • Objective: minimize the interconnections.
  • Concurrent version
    • Takes all control steps simultaneously.
    • Objective: minimize the interconnections.
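One possible reading of the iterative version is sketched below: control steps are processed one at a time, and each read access is placed on a free read port, preferring a port that is already wired to the same FU input so that no new connection is needed. The data layout (a list of `(rf, fu_input)` pairs per control step), the port count, and the function name `map_ports` are assumptions for illustration; the slides do not give the actual cost function.

```python
def map_ports(csteps, read_ports=2):
    """Assign each (rf, fu_input) read to an RF read port, one control step at a time.

    Returns (per-step mapping, set of (rf, port, fu_input) point-to-point wires).
    """
    connections = set()
    mapping = []
    for reads in csteps:
        busy = set()  # (rf, port) pairs already used in this control step
        step_map = {}
        for rf, fu_input in reads:
            free = [p for p in range(read_ports) if (rf, p) not in busy]
            if not free:
                raise ValueError(f"no free read port on {rf}")
            # Prefer a port that already has a wire to this FU input.
            reuse = [p for p in free if (rf, p, fu_input) in connections]
            port = reuse[0] if reuse else free[0]
            busy.add((rf, port))
            connections.add((rf, port, fu_input))
            step_map[(rf, fu_input)] = port
        mapping.append(step_map)
    return mapping, connections


if __name__ == "__main__":
    steps = [
        [("RF1", "ALU.a"), ("RF1", "ALU.b")],
        [("RF1", "ALU.b"), ("RF1", "MUL.a")],
    ]
    mapping, wires = map_ports(steps)
    print(mapping)
    print(len(wires), "point-to-point connections")
```

In this example the second step reuses the existing wire from port 1 to `ALU.b`, giving 3 connections; always taking the first free port would create 4.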

27
Partitioning Algorithm
  • It tries to allocate all the registers to
    register files simultaneously.
  • Pick each register and find its list of candidate
    RFs.
  • Choose one RF using the interconnection cost
    metric.

28
Algorithm.
  • Partitioning_concurrent_heuristic
  • For each register (R1) do
  •     reg_file_list = genRegFileList(R1)
  •     If (reg_file_list != Empty) then
  •         reg_file = pickBestOne(reg_file_list)
  •     else
  •         reg_file = create new reg_file
  •     allocate(R1, reg_file)
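The pseudocode above can be turned into a small runnable sketch. Two simplifications are assumptions of this sketch, not part of the slides: each RF is limited to `PORTS_PER_RF` register accesses per control step (3, standing in for 2 read + 1 write ports), and `pick_best_one` prefers the fullest candidate RF as a placeholder for the interconnection cost metric, which the slides do not define in detail.

```python
PORTS_PER_RF = 3  # assumption: 2 read + 1 write ports treated as 3 generic ports


def gen_reg_file_list(reg, accesses, reg_files):
    """Indices of RFs that can host reg without exceeding the port limit in any cstep."""
    candidates = []
    for idx, members in enumerate(reg_files):
        if all(len(members & regs) < PORTS_PER_RF
               for regs in accesses.values() if reg in regs):
            candidates.append(idx)
    return candidates


def pick_best_one(candidates, reg_files):
    """Placeholder cost: fullest RF first, lowest index on ties (not the slides' metric)."""
    return max(candidates, key=lambda idx: (len(reg_files[idx]), -idx))


def partition_concurrent(registers, accesses):
    reg_files = []
    for reg in registers:
        candidates = gen_reg_file_list(reg, accesses, reg_files)
        if candidates:
            rf = pick_best_one(candidates, reg_files)
        else:
            reg_files.append(set())  # create a new register file
            rf = len(reg_files) - 1
        reg_files[rf].add(reg)  # allocate(R1, reg_file)
    return reg_files


if __name__ == "__main__":
    # Control-step access lists from the example on slide 30.
    accesses = {
        "C1": {1, 2, 5, 4, 7, 8},
        "C2": {5, 6, 2, 7, 9},
        "C3": {1, 4, 6, 3, 8, 10},
    }
    print(partition_concurrent(range(1, 11), accesses))
```

With six registers accessed in C1 and C3 and three ports per RF, at least two register files are needed, and this sketch finds a two-RF partition for that example.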

29
Generate the Candidate RF List for Register
  • genRegFileList(Register R)
  •     ret_list = all available register files
  •     For each cstep (C1) in which R is accessed
  •         Prune the ret_list according to the
            accesses in C1
  •     return ret_list
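A minimal sketch of this pruning step, assuming each register file has 2 read ports and 1 write port and that every control step is described by a pair of sets (registers read, registers written). That data layout and the constants `READ_PORTS` and `WRITE_PORTS` are assumptions made for illustration.

```python
READ_PORTS = 2   # assumed read ports per register file
WRITE_PORTS = 1  # assumed write ports per register file


def gen_reg_file_list(reg, reg_files, csteps):
    """Start from all register files and prune those that reg would overload.

    reg_files: list of sets of registers already allocated to each RF.
    csteps: list of (reads, writes) set pairs, one per control step.
    """
    ret_list = list(range(len(reg_files)))
    for reads, writes in csteps:
        if reg not in reads and reg not in writes:
            continue  # reg is not accessed in this control step
        kept = []
        for rf in ret_list:
            members = reg_files[rf] | {reg}
            # Keep the RF only if it still has enough ports in this control step.
            if (len(members & reads) <= READ_PORTS
                    and len(members & writes) <= WRITE_PORTS):
                kept.append(rf)
        ret_list = kept
    return ret_list


if __name__ == "__main__":
    reg_files = [{"R1", "R2"}, {"R3"}]
    csteps = [({"R1", "R2", "R4"}, {"R3"})]
    print(gen_reg_file_list("R4", reg_files, csteps))
```

Here `R4` cannot join the first RF, because `R1`, `R2`, and `R4` would need three read ports in the same control step, so only the second RF remains a candidate.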

30
Example
Registers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
C1: 1, 2, 5, 4, 7, 8
C2: 5, 6, 2, 7, 9
C3: 1, 4, 6, 3, 8, 10
[Figure: registers R1 to R10 from the example above]
31
Comments
  • Time complexity
    • O(KN^2), where K is the number of control steps
      and N is the number of registers.
  • Heuristics can be applied in two places
    • The sequence in which individual registers are
      taken for allocation.
    • The choice of register file when multiple
      register files are available for a single
      register.
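As an illustration of the first heuristic point, one hypothetical ordering takes the most frequently accessed registers first, since they are the hardest to place. This particular ordering is an assumption for illustration and is not stated in the slides.

```python
def order_by_access_count(registers, accesses):
    """Sort registers by the number of control steps that access them, busiest first."""
    count = {reg: sum(reg in regs for regs in accesses.values()) for reg in registers}
    # Ties are broken by register number so the order is deterministic.
    return sorted(registers, key=lambda reg: (-count[reg], reg))


if __name__ == "__main__":
    # Control-step access lists from the example slide.
    accesses = {
        "C1": {1, 2, 5, 4, 7, 8},
        "C2": {5, 6, 2, 7, 9},
        "C3": {1, 4, 6, 3, 8, 10},
    }
    print(order_by_access_count(list(range(1, 11)), accesses))
```

Registers 3, 9, and 10 are each accessed in only one control step, so they end up last in the allocation order.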

32
References
  • SiliconC, C. Scott Ananian
    • Hardware back end for SUIF
    • http://www.flex-compiler.lcs.mit.edu/SiliconC/
  • Embedded Systems Group, I.I.T. Delhi
    • http://www.iitd.ernet.in/esproject
  • Stanford Compiler Group
    • http://www.suif.stanford.edu/
  • M. Balakrishnan et al., "Allocation of Multiport
    Memories in Data Path Synthesis," IEEE Trans.
    on Computer-Aided Design, 1988.

33
  • Thank You