DFL Language Training - PowerPoint PPT Presentation

About This Presentation
Title:

DFL Language Training

Description:

Software Version v2.2 Electronic Product Design Design Flow Time-to-Market Raising the abstraction level Code compactness Algorithmic description FIR filter 100 lines ... – PowerPoint PPT presentation

Slides: 152
Provided by: ddon7
Tags: dfl | dect | language | training


Transcript and Presenter's Notes

Title: DFL Language Training


1
Training Software Version v2.2
2
Training Overview
  • Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Map to Architecture
  • Schedule Operations
  • Build the RT-Level
  • Verify the Design
  • Create and Use a User Library
  • Supported C subset

3
Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Map to Architecture
  • Schedule Operations
  • Build the RT-Level
  • Verify the Design
  • Create and Use a User Library
  • Supported C subset

4
Electronic Product Design
High-Complexity Applications
Time-2-Market
Time-2-Profit
Power-Efficient, High-Performance,
Cost-Effective, Flexible Architectures
Low-Cost
Low-Power
Deep-Sub-Micron Silicon Assembly
5
Design Flow
Figure: design flow (Algorithm, Architecture, RT-level Synthesis) set against the abstraction levels (architecture, gates, layout)
6
Time-to-Market
  • Raising the abstraction level
  • Code compactness
  • Algorithmic description of an FIR filter: 100 lines of C code
  • RT-level description of an FIR filter: 5,200 lines of HDL
  • Blackbox
  • Better simulation performance
  • Easier design transfer and re-use

BEHAVIOR (C SUBSET) → RT-LEVEL
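To give a feel for the algorithmic side of that comparison, here is a minimal direct-form FIR filter written as a frame-based C function: each call consumes one input sample and produces one output sample. It is a sketch, not the 100-line filter the slide refers to; the tap count, the coefficients and the name `fir_frame` are hypothetical, and plain `int` stands in for the ART Library fixed-point types.

```c
#define NTAPS 4  /* hypothetical filter length */

/* Hypothetical coefficients; a real design would use fixed-point types. */
static const int coeff[NTAPS] = {1, 2, 2, 1};

/* Delay line holding the most recent NTAPS input samples (starts at 0). */
static int delay[NTAPS];

/* One frame: consume one input sample, produce one output sample. */
int fir_frame(int in)
{
    int acc = 0;

    /* Shift the delay line by one position; newest sample at index 0. */
    for (int k = NTAPS - 1; k > 0; k--)
        delay[k] = delay[k - 1];
    delay[0] = in;

    /* Multiply-accumulate over all taps. */
    for (int k = 0; k < NTAPS; k++)
        acc += coeff[k] * delay[k];

    return acc;
}
```

Feeding an impulse (1, 0, 0, ...) returns the coefficients one frame at a time.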
7
Flexibility
  • Optimal area for application
  • Low-power design
  • More processing power/throughput
  • Same starting point
  • FPGA
  • ASIC
  • Cheaper custom solution

BEHAVIOR (C SUBSET) → RT-LEVEL
8
Flexibility Example
RT-level synthesis
Behavioral synthesis
Reduction 50
Reduction 13
9
Application Area
  • Data path elements are shared over clock cycles
  • Moderate decision making is involved

Figure: controller FSM exchanging control signals and flags with a data path of cores and register files, plus RAM/ROM with address/data registers connected over address/data lines
10
Typical Applications
  • ASSP (Application Specific Standard Product)
  • Relatively complex data/signal processing
  • GSM, DECT, wireless LAN
  • Speech recognition, compression, processing
  • JPEG, image processing
  • Portable medical electronics
  • ...

11
Design Constraints
  • Design considerations
  • Algorithm level
  • Frame rate
  • Frame = 1 execution of your algorithm
  • 1 frame consumes 1 value for each input and produces 1 value for each output
  • e.g. GSM LTP: 1 data frame (160 samples) every 20 ms
  • Maximal latency: delay on the signal caused by the algorithm
  • RT-level
  • Clock rate
  • e.g. 50 MHz clock
  • Cycle budget = clock rate / frame rate
  • The number of clock cycles available to execute one frame
  • e.g. for GSM LTP: 4000 cycles
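The cycle budget is simple arithmetic; a small helper makes the relationship explicit. The function name is hypothetical, and frame rates are assumed to be whole frames per second. With a 50 MHz clock and one frame every 20 ms (50 frames per second), for instance, the budget is 50,000,000 / 50 = 1,000,000 cycles per frame.

```c
/* Cycle budget = clock rate / frame rate: the number of clock cycles
 * available to execute one frame of the algorithm.
 * Hypothetical helper; integer division drops any partial cycle. */
long cycle_budget(long clock_hz, long frames_per_second)
{
    return clock_hz / frames_per_second;
}
```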

12
Target Processor Architecture
branch logic
ALU
MULT
IN
OUT
RAM
ROM
13
Structure of a Cluster
14
Internal Design Flow
15
Internal Design Flow (2)
16
Defaults, Options and Pragmas
  • Increasing order of priority
  • Tool defaults
  • Option settings (if any)
  • Pragmas for specific cases

17
Hardware Libraries
  • Default library
  • Supplied by Frontier
  • Two versions: one for the Xilinx FPGA flow, one for the ASIC flow
  • Sufficient to map all supported C operators
  • User libraries
  • Existing hardware blocks
  • Custom hardware blocks for better
    speed/area/power trade-off

18
Project organization
artd_cache
19
  • Key Concepts
  • Create Architecture
  • Map to Architecture
  • Schedule Operations
  • Build the RT-Level
  • Verify the Design
  • Create and Use a User Library
  • Supported C subset

Edit and Compile Source
20
Key Concept
  • In a first step, ART Designer converts the behavioral description of your algorithm into an internal representation, in an intelligent way
  • Intelligent: it checks whether the code is C/C++ compliant and whether non-synthesizable constructs are present
  • You can describe your algorithm in C/C++, optionally enriched with ART Library fixed-point types in C style or SystemC style
  • To use ART Library types
  • #include <fxp.h> /* C/C++ version */
  • #include <sc_fxp.h> /* SystemC version */

21
C Compiler optimizations
  • Dead code elimination
  • Constant propagation
  • only for temporary expressions with constants
  • b = a + 2 + 3 → b = a + 5
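Written out as C, the two optimizations turn the first function below into the second. This is a hand-made before/after pair for illustration, not actual compiler output; the function names and the `debug` flag are hypothetical.

```c
/* Source as written. */
int before_opt(int a)
{
    int b;
    const int debug = 0;
    b = a + 2 + 3;   /* temporary expression with constants */
    if (debug)       /* never true: dead code */
        b = -1;
    return b;
}

/* Equivalent code after dead code elimination and constant propagation. */
int after_opt(int a)
{
    int b;
    b = a + 5;       /* 2 + 3 folded into a single constant */
    return b;
}
```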

22
C Compiler Options (1)
  • Include search path: multiple entries are separated by semicolons; each entry is relative to the project subdirectory
  • Example: /home/john/include..MY_INCLUDES/include
  • Defines: macros to be defined/undefined, separated by semicolons
  • Example for Defines: FXPTRACE;MY_DEFINE1
  • Enables C test bench generation; I/O can be read in binary or decimal format
  • Saves the source file obtained after CPP (C preprocessor) processing
  • Enables strict ANSI C compliance
23
C Compiler Options (2)
  • Data flow analysis
  • Identifies and accurately represents the parallelism of the C code by determining the exact data dependencies between the variables
  • Goal: better performance and optimal use of the target processor

24
Data Flow Analysis
void calc_address(const T_AD i, const T_AD j, T_AD &address)
{
  address = const1*i + const2*j;
}

void mydesign()
{
  ...
  for (i=0; i<16; i++)
    for (j=0; j<16; j++) {
      calc_address(i, j, address);
      a = A[address];
      ..  // calculation of b
      A[address] = b;
    }
}

DFA will check, for every iteration, whether the write address differs from the read address! This determines how much loop folding can be performed.
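The question DFA answers for this loop can be sketched by brute force: assuming `calc_address` computes `const1*i + const2*j`, check whether two different (i, j) iterations ever use the same element of A. If they never do, no value written in one iteration is read in another, and the iterations can overlap freely. The real analysis is presumably symbolic rather than exhaustive; the constant values and function names here are hypothetical.

```c
#include <stdbool.h>

#define N 16  /* loop bounds from the slide: i and j run from 0 to 15 */

/* Address function of calc_address(), with the constants as parameters. */
static int calc_address(int i, int j, int const1, int const2)
{
    return const1 * i + const2 * j;
}

/* Returns true if two different (i, j) iterations touch the same element
 * of A, i.e. a value written in one iteration may be read in another. */
bool has_cross_iteration_dependency(int const1, int const2)
{
    for (int i1 = 0; i1 < N; i1++)
        for (int j1 = 0; j1 < N; j1++)
            for (int i2 = 0; i2 < N; i2++)
                for (int j2 = 0; j2 < N; j2++) {
                    if (i1 == i2 && j1 == j2)
                        continue;  /* same iteration: read then write is fine */
                    if (calc_address(i1, j1, const1, const2) ==
                        calc_address(i2, j2, const1, const2))
                        return true;
                }
    return false;
}
```

With const1 = 16 and const2 = 1 every iteration gets its own address, so the loop can be folded aggressively; with const1 = const2 = 1, iterations (0,1) and (1,0) share address 1 and must stay ordered.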
25
Pragmas in C Source
  • #pragma OUT <var_name_1> <var_name_2>
  • Used to indicate function arguments that are strictly outputs
  • This is not checked by the compiler!
  • Example
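The slide's own example did not survive in this transcript. As a minimal sketch, the hypothetical function below marks its two pointer arguments as pure outputs. Placing the pragma at the top of the body follows the addsub example in the user-library section, and pointers stand in for the C++ reference arguments used there. A standard C compiler ignores a pragma it does not recognize, so the code still builds and runs outside ART Designer.

```c
/* Hypothetical function: a and b are pure inputs, lo and hi pure outputs. */
void minmax(int a, int b, int *lo, int *hi)
{
#pragma OUT lo hi
    /* The pragma only informs ART Designer; nothing checks that lo and hi
     * are never read, so the body must only write them. */
    if (a < b) {
        *lo = a;
        *hi = b;
    } else {
        *lo = b;
        *hi = a;
    }
}
```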

26
  • Key Concepts
  • Edit and Compile Source
  • Map to Architecture
  • Schedule Operations
  • Build the RT-Level
  • Verify the Design
  • Create and Use a User Library
  • Supported C subset

Create Architecture
27
Key Concept
  • In this step, you instantiate the hardware
    resources that you need to define the target
    architecture you want to use
  • You only have to instantiate the central elements
    of hardware clusters (auxiliary resources like
    register files, muxes and tristate buffers are
    automatically generated at a later step)
  • Cores (ALU, MULT, ...)
  • Memories (RAM, ROM, ...)
  • Ports (INPORT, OUTPORT)
  • You also instantiate one type of controller

28
Architecture Model

29
Instantiating Resources
  • Resources can be instantiated from
  • The default library artd_library (for the ASIC flow) or artd_xilinx_library (for the Xilinx FPGA flow)
  • A user library
  • The libraries must have been selected in the
    Create Architecture options

30
Resources in the Default Library (1)
  • Cores
  • alu, alusat,
  • mult, multp, mac2, mac3
  • acu
  • Memories
  • rom, ram
  • romctrl
  • dpram_r_w, dpram_r_rw, dpram_w_rw, dpram_rw_rw
  • dprom, dpromctrl

31
Resources in the Default Library (2)
  • Ports
  • inport, inport_nohs, inport_noaddr,
    inport_noaddr_nohs
  • outport, outport_nohs , outport_noaddr,
    outport_noaddr_nohs
  • Controllers
  • mbc_11, mbc_12, mbc_22, mbc_23

32
Pragma Syntax Table
  • I: integer (e.g. 10)
  • IL: integer list (e.g. 10,20,6)
  • IW: integer or wildcard (e.g. 10 or _)
  • C: quoted string (e.g. "acu")
  • CL: quoted string list (e.g. "in18","in210")
  • EXPR: expression (e.g. _*_)

33
Pragmas (1)
  • instantiate(C, C, C)
  • instantiate(libraryName, resourceName,
    instanceName)
  • This pragma instantiates a resource defined in a
    library
  • The default library is called artd_library or
    artd_xilinx_library
  • Multiple instances of the same resource can be
    created
  • EXAMPLE
  • instantiate("artd_xilinx_library","multp","multp_1")
  • instantiate("artd_library","mbc_12","ctrl")
  • instantiate("my_own_library","multiplier","mymult")

34
Pragmas (2)
  • instantiate_function(C, C)
  • instantiate_function(functionName,
    instanceName)
  • This pragma instantiates a virtual resource, not
    defined in a library
  • All calls to the named function will be mapped on
    this virtual resource as single-cycle operations
  • Only a single function can be associated with a
    virtual resource
  • Allows design exploration without actually having
    to create a library element
  • EXAMPLE
  • instantiate_function("cordic","cordic_1")

35
Pragmas (3)
  • merge_regfiles(CL, C)
  • merge_regfiles (registerfileName,
    newRegisterfileName)
  • Merge a list of register files into a new
    register file with the specified name
  • May lead to fewer registers, but possibly a longer schedule
  • EXAMPLE
  • merge_regfiles("reg_a_ram_1","reg_dx_acu_1","addr_reg")

Figure: register files of ram_1 and acu_1 before and after merging into addr_reg
36
Pragmas (4)
  • set_regfileports(C,IN,OUT, I)
  • set_regfileports(regFileName,INOUT, nrports)
  • This pragma allows you to generate multiport
    register files
  • This pragma overrules the default register file
    settings of one input port and one output port
  • EXAMPLE
  • set_regfileports("merged_reg",IN,2)
  • set_regfileports("merged_reg",OUT,2)
  • This will result in a multiport register file
    called merged_reg with two input ports and two
    output ports

37
Pragmas (5)
  • connect_bus(C, CL, CL)
  • connect_bus(busName, writer, readers)
  • Allows you to define a bus and its connections
  • With this pragma you can prevent resources from writing to specific buses, or you can merge a number of buses into one single bus
  • By using multiple connect_bus pragmas you can define a partial or a complete bus network. An output port of a resource that still has no bus connection after the last connect_bus pragma automatically receives a private bus
  • EXAMPLE
  • connect_bus( ram2_bus,acu_2dout,reg_a_ram_2d0,reg_dx_acu_2d0)
  • Defines a bus called ram2_bus that is
    written to by the output of acu_2 and read by the
    address port of ram_2 and the first input port of
    acu_2

38
Pragmas (6)
  • no_connection(C, CL)
  • no_connection(writer, readers)
  • With this pragma you can restrict connections
    between one output of a resource (defined by the
    first argument!) and a list of inputs.
  • EXAMPLE
  • no_connection( romctrl_1dout,reg_a_ram_2d0,
    reg_dx_acu_2d0)
  • Using this pragma, no connection will be
    present between the output of romctrl_1 and the
    address register of ram_2 and the first input of
    acu_2

39
Default Architecture
  • The following resources from the (ASIC) default library are automatically instantiated when a new project is created
  • alu, mult
  • acu
  • romctrl
  • ram, rom
  • inport, outport
  • mbc_23

40
Example Pragma File
//INPORT and OUTPORT without address generation
instantiate("artd_library","inport_noaddr","inport_1")
instantiate("artd_library","outport_noaddr","outport_1")
//ACU and ROMCTRL for RAM and ROM addressing
instantiate("artd_library","acu","acu_ram")
instantiate("artd_library","acu","acu_rom")
instantiate("artd_library","romctrl","romctrl_ram")
instantiate("artd_library","romctrl","romctrl_rom")
//Cores and Memories
instantiate("my_library","mac","my_mac")
instantiate("artd_library","rom","rom_1")
instantiate("artd_library","ram","ram_1")
//Controller
instantiate("artd_library","mbc_23","ctrl")
//dedicated address generation cluster
connect_bus(bus_romctrl_rom, romctrl_romdout,reg__acu_romd0)
connect_bus(bus_dout_acu_rom,acu_romdout,reg__acu_romd0, reg_a_acu_romd0)
no_connection(acu_ramdout,reg_a_rom_1)
no_connection(romctrl_ramdout,reg__acu_rom,reg_a_rom_1)
41
Views
Architecture view
42

Views
  • Architecture view
  • Graphical representation of the selected
    architecture
  • In this view you can select and highlight
    individual components and resources. You can also
    jump to the architecture report for a detailed
    textual overview

43
Reports (1)
  • Architecture Report

44

Reports (2)
  • Architecture report
  • Lists all selected resource instances and their registers
  • Lists for each instance/register
  • input ports and connected register files/muxes
  • output ports and connected buses
  • Resources from the default library are listed with unspecified types and with their complete instruction set
  • Resources from user libraries are listed with types and instruction list as specified in the library

45
  • Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Schedule Operations
  • Build the RT-Level
  • Verify the Design
  • Create and Use a User Library
  • Supported C subset

Map to Architecture
46
Key Concepts
  • In the mapping step, the following tasks are performed
  • Memory management: variables and temporary variables (introduced by the compilation step) are allocated to the available memory resources
  • Core resource assignment: operations from the design are assigned to corresponding core resources and translated into RTs (register transfers)
  • Multiplexer introduction: muxes are introduced if more than one bus is connected to the input of a register, or if two or more variables with different types are transferred to that input over a bus connected to it

47
Memory Management
Figure: memory resources (RAM and ROM holding arrays, INPORT/OUTPORT, storage addressed by the data path) compared by access speed and area per memory location
48
Core resource assignment
  • Resource assignment is completely determined by a set of internal mapping rules and by user pragmas
  • The rules are divided into two groups
  • The first set applies to the mapping of the core resources in the default library. These rules are transparent to the user but not accessible
  • The second set applies to the mapping of operations on user-defined resources and is an essential part of the pragmas of the corresponding user-defined library

49
Mapping rules
  • Operations or instructions on resources from the standard library are handled as taking one clock cycle. Exception: MAC (has a pipeline register)
  • By default, operations and implicit operations are mapped to the first instance of a resource that can execute the operation
  • "First" means first instantiated in the pragma file of the previous step
  • Implicit operations
  • - ROM/RAM addressing: initialize address, compute next address
  • - FOR loops: initialize loop counter, update, test
  • - Implicit constants for all instances
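To make the implicit operations visible, the sketch below writes a simple array loop twice: once as it would appear in the source, and once with the address initialization, next-address computation and loop-counter initialize/update/test spelled out. The down-counter mirrors the note under assign_loopcounter that loop-counter operations are mostly decrements; exactly which operations the tool generates is an assumption here, and the names are hypothetical.

```c
#define LEN 8  /* hypothetical array length */

/* Loop as written in the source: addressing and loop control are implicit. */
int sum_implicit(const int A[LEN])
{
    int sum = 0;
    for (int i = 0; i < LEN; i++)
        sum += A[i];
    return sum;
}

/* Same loop with the implicit operations written out. */
int sum_explicit(const int A[LEN])
{
    int sum = 0;
    int addr = 0;            /* ROM/RAM addressing: initialize address   */
    int count = LEN;         /* FOR loop: initialize loop counter        */

    while (count != 0) {     /* FOR loop: test loop counter              */
        sum += A[addr];      /* the only operation visible in the source */
        addr = addr + 1;     /* ROM/RAM addressing: compute next address */
        count = count - 1;   /* FOR loop: update (decrement) counter     */
    }
    return sum;
}
```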

50
Multiplexer introduction
  • In the last stage of the mapping step, muxes are introduced where needed
  • Their function is threefold
  • bus selection
  • data alignment
  • type manipulation, as needed for the cast operations in the code

51
Pragmas (1)
  • assign_expression(C, EXPR, C)
  • assign_expression(scopeName, expression,
    instanceName)
  • This pragma forces the mapping of operations,
    indicated with an expression, onto a particular
    instance
  • The action of the pragma is restricted to the
    scope indicated by scopeName
  • EXAMPLE
  • assign_expression("/top",_*_,"mult_2")
  • All multiplications in top will be mapped onto mult_2

52
Pragmas (2)
  • assign_operation(C, C)
  • assign_operation(operationName,
    instanceName)
  • This pragma forces the mapping of operations,
    indicated with the hierarchical name or source
    label, on a particular instance
  • The label needs to be specified using its full
    pathname
  • Wildcards can be used for hierarchy levels (as in /.../ below) and in labels
  • EXAMPLE
  • assign_operation("/.../incri","acu_2")

void cordic()
{
  ...
  Uint<4> tmp_i = 0;
  loopi: for (int i=0; i<14; i++) {
    incri: tmp_i++;
    ...
  }
}
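To show where a loop label such as `loopi` (referenced by pragmas as `/cordic/loopi`) sits in real code, here is a self-contained CORDIC rotation in plain C with a 14-iteration loop labeled `loopi`. The slides never show the body of cordic(), so the algorithm, the Q14 fixed-point format, the table values and the output arguments are all assumptions; plain `int` replaces ART Library types such as `Uint<4>`.

```c
#define ITER 14  /* matches the 14-iteration loop "loopi" on the slides */

/* atan(2^-i) in Q14 radians (1.0 == 16384), rounded. */
static const int atan_tab[ITER] = {
    12868, 7596, 4014, 2037, 1023, 512, 256,
    128, 64, 32, 16, 8, 4, 2
};

/* Rotation-mode CORDIC: for an angle z in Q14 radians (|z| <= pi/2),
 * compute cos(z) and sin(z) in Q14. Right shifts of negative values
 * assume an arithmetic shift, as on common compilers. */
void cordic(int z, int *cos_out, int *sin_out)
{
    int x = 9949;  /* 0.60725 * 16384: pre-compensates the CORDIC gain */
    int y = 0;

loopi:
    for (int i = 0; i < ITER; i++) {
        int dx = y >> i;
        int dy = x >> i;
        if (z >= 0) {        /* rotate counter-clockwise */
            x -= dx;
            y += dy;
            z -= atan_tab[i];
        } else {             /* rotate clockwise */
            x += dx;
            y -= dy;
            z += atan_tab[i];
        }
    }
    *cos_out = x;
    *sin_out = y;
}
```

For z = 8579 (about pi/6 in Q14) the results land within a few units of 14189 and 8192, i.e. cos and sin of 30 degrees scaled by 16384.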
53
Pragmas (3)
  • assign_variable(C, C)
  • assign_variable(variableName, instanceName)
  • This pragma allows
  • the mapping of a scalar or array variable onto a
    specific memory (RAM, ROM) or port (INPORT,
    OUTPORT)
  • the mapping of constant variables onto a specific
    ROMCTRL memory
  • The variable needs to be specified using its full
    pathname
  • EXAMPLE
  • assign_variable("/top/AX","ram_2")
  • assign_variable("/c4","romctrl_3")
  • assign_variable("/cordic/Q_in","inport_2")

54
Pragmas (4)
  • assign_address(C, C)
  • assign_address(variableName, instanceName)
  • This pragma forces the address computation of a specific variable to be performed on a particular instance
  • If one of the operations needed for address
    computation cannot be performed on the given
    instance, the default (acu_1) is used instead
  • EXAMPLE
  • assign_address("/cordic/A","acu_2")

55
Pragmas (5)
  • assign_loopcounter(C, C)
  • assign_loopcounter(iteratorName,
    instanceName)
  • This pragma forces the operations for the specified loop counter (mostly decrement operations) to be performed on the given instance
  • The default resource is the first instantiated
    ACU
  • EXAMPLE
  • assign_loopcounter("/cordic/loopi/i","acu_2")

void cordic()
{
  ...
  loopi: for (int i=0; i<14; i++) {
    ...
  }
}
56
Pragmas (7)
  • unroll(C, IW, IW)
  • unroll(loopName, firstIterationsToUnroll,
    lastIterationsToUnroll)
  • This pragma unrolls loops or parts of loops
  • Whole loop is unrolled when using the wildcard
    _
  • EXAMPLE
  • unroll("/cordic/loopi",_,_) // unrolls the whole loop
  • unroll("/cordic/loopi",3,0) // unrolls the first 3 iterations
  • unroll("/cordic/loopi",2,4) // unrolls the first 2 and last 4 iterations

void cordic()
{
  ...
  loopi: for (int i=0; i<14; i++) {
    ...
  }
}
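Our reading of unroll("/cordic/loopi",3,0) is that the first three iterations are copied out as straight-line code and the remaining eleven stay in a loop. The pair of functions below shows that transformation on a hypothetical loop body; unroll(...,2,4) would likewise peel two iterations off the front and four off the back. The body `acc = 2 * acc + v[i]` and the function names are assumptions, chosen so that iteration order matters.

```c
#define ITER 14  /* same trip count as loopi */

/* Loop as written: 14 iterations of a hypothetical body. */
int accumulate(const int v[ITER])
{
    int acc = 0;
    for (int i = 0; i < ITER; i++)
        acc = 2 * acc + v[i];
    return acc;
}

/* Same computation with the first 3 iterations unrolled,
 * the effect assumed for unroll(loopName,3,0). */
int accumulate_unrolled_3_0(const int v[ITER])
{
    int acc = 0;
    acc = 2 * acc + v[0];            /* iteration 0, unrolled */
    acc = 2 * acc + v[1];            /* iteration 1, unrolled */
    acc = 2 * acc + v[2];            /* iteration 2, unrolled */
    for (int i = 3; i < ITER; i++)   /* remaining 11 iterations stay a loop */
        acc = 2 * acc + v[i];
    return acc;
}
```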
57
Pragmas (8)
  • assign_variable_to_port(C, C, C)
  • assign_variable_to_port(variableName,
    instanceName, accessPortName)
  • This pragma assigns all read/write operations of
    a variable to a specific port of a dual-port
    memory
  • EXAMPLE
  • assign_variable_to_port("/top/A","ram_1","ram_1_access2")
  • Assigns all accesses from/to array A to port 2 of
    ram_1

58
Pragmas (9)
  • assign_operation_to_port(C, C, C)
  • assign_operation_to_port(operationName,
    instanceName, accessPortName)
  • This pragma assigns an operation to a specific
    port of a dual-port memory
  • EXAMPLE
  • assign_operation_to_port("/top/loop/Aread","ram_1","ram_1_access2")
  • the read operation labeled Aread is done via
    access port 2 of ram_1

59
Reports
  • Architecture report
  • List of resource instances is reduced to those
    that are really used
  • Instances from the default library
  • types have been set to the maximal types that are
    used in the source
  • instruction list has been reduced to those
    instructions that are used
  • Instances from a user library
  • actual types and rules have been checked versus
    the types and rules specified in the library
  • The controller sizes are still unknown

60
Reports
  • Memory map
  • Detailed information on all
  • present memory instances
  • - RAM, ROM, ROMCTRL, INPORT, OUTPORT

61
Reports
  • Mux Report
  • Summary and detailed information
  • on all multiplexers
  • Type Report
  • For all muxes
  • - Output buses they are connected to
  • For each bus: variables, types, bitwise connection and alignment

Two-way cross-highlighting with source!
62
  • Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Map to Architecture
  • Build the RT-Level
  • Verify the Design
  • Create and Use a User Library
  • Supported C subset

Schedule Operations
63
Key Concept
  • In this fourth step, two tasks are performed
  • Scheduling of the operations: the resulting RT graph is ordered along a time axis in as few machine cycles as possible, taking data and hardware constraints into account
  • Register assignment: variables are assigned to fields of register files in such a way that the overall size of the register files is kept to a minimum

64
List Scheduling
Figure: list scheduling, showing the candidate list, conflict and priority computation, and the scheduled operation for a graph with INPORT, MULT, ALU and OUTPORT operations between INPUT and OUTPUT
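ART Designer's scheduler internals are not documented here, so the sketch below is a textbook resource-constrained list scheduler rather than the tool's own code. Every cycle it builds a candidate list of operations whose predecessors have finished, resolves resource conflicts by priority, and schedules the winners. The priority function (longest path to the end of the graph), the one-instance-per-resource setup and all names are assumptions; operations are single-cycle, as the default-library mapping rules state.

```c
#define MAX_PREDS 4

/* Resource types of the example graph; one instance of each is assumed. */
enum resource { INPORT, MULT, ALU, OUTPORT, NUM_RESOURCES };

typedef struct {
    enum resource res;     /* resource that executes the operation */
    int npreds;            /* number of predecessors */
    int preds[MAX_PREDS];  /* indices of operations it depends on */
} op_t;

/* Priority: length of the longest path from operation k to the end of
 * the graph (an assumed, commonly used priority function). */
static int priority(const op_t *ops, int n, int k)
{
    int best = 0;
    for (int s = 0; s < n; s++)
        for (int p = 0; p < ops[s].npreds; p++)
            if (ops[s].preds[p] == k) {
                int d = 1 + priority(ops, n, s);
                if (d > best)
                    best = d;
            }
    return best;
}

/* Resource-constrained list scheduling with single-cycle operations.
 * The graph must be acyclic. cycle[k] receives the cycle of operation k;
 * the return value is the schedule length in cycles. */
int list_schedule(const op_t *ops, int n, int cycle[])
{
    int done = 0, t;

    for (int k = 0; k < n; k++)
        cycle[k] = -1;                      /* -1: not scheduled yet */

    for (t = 0; done < n; t++) {
        int busy[NUM_RESOURCES] = {0};      /* resources used in cycle t */

        for (;;) {
            int pick = -1, pick_prio = -1;

            /* Candidates: unscheduled operations whose predecessors
             * finished in an earlier cycle and whose resource is free. */
            for (int k = 0; k < n; k++) {
                int ready = (cycle[k] < 0) && !busy[ops[k].res];
                for (int p = 0; ready && p < ops[k].npreds; p++) {
                    int q = ops[k].preds[p];
                    if (cycle[q] < 0 || cycle[q] >= t)
                        ready = 0;
                }
                if (ready && priority(ops, n, k) > pick_prio) {
                    pick = k;
                    pick_prio = priority(ops, n, k);
                }
            }
            if (pick < 0)
                break;                      /* nothing more fits in cycle t */

            cycle[pick] = t;
            busy[ops[pick].res] = 1;        /* resource now taken this cycle */
            done++;
        }
    }
    return t;
}
```

In a graph where two reads share the single INPORT, the reads end up in consecutive cycles: exactly the kind of resource conflict the candidate list resolves by priority.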
65
ALAP and ALAP Greedy Scheduler
Figure: ALAP and ALAP Greedy schedules of the same small graph (in, incx, decx, out); annotation: parallel path optimizer, fewer registers
66
Loop Folding (1)
Figure: loop folding of operations 1, 2 and 3 on CORE1 and CORE2; annotations: 1 cycle per iteration since there is no dependency; 21 cycles
67
Loop Folding (2)
  • Performed automatically
  • Equivalent to pipelining
  • Advantage
  • Faster schedule through more parallelism
  • Disadvantages
  • Larger controller
  • Larger register files

68
Register Assignment
  • For a particular register file: assignment of variables to specific register fields

Figure: lifetimes of variables a to f over cycles 0 to 3, packed into three register fields (field 1: a, d; field 2: b, e; field 3: c, f)
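A classic way to pack lifetimes into as few register fields as possible is the left-edge algorithm: sort variables by the cycle in which they are written and put each one in the first field that is free again. The slides do not say which method ART Designer uses, so this is an illustrative stand-in; the half-open lifetime intervals, the size limit and the names are assumptions.

```c
#define MAX_VARS 64  /* assumption: small example sizes */

typedef struct {
    int start;  /* cycle in which the variable is written */
    int end;    /* cycle after its last read (half-open interval) */
} lifetime_t;

/* Left-edge register assignment: returns the number of fields used and
 * stores the chosen field index for every variable in field[]. */
int assign_fields(const lifetime_t *v, int n, int field[])
{
    int order[MAX_VARS];
    int field_end[MAX_VARS];
    int nfields = 0;

    /* Sort variable indices by start cycle (stable insertion sort). */
    for (int k = 0; k < n; k++) {
        int m = k;
        while (m > 0 && v[order[m - 1]].start > v[k].start) {
            order[m] = order[m - 1];
            m--;
        }
        order[m] = k;
    }

    /* Put each variable in the first field whose last lifetime has ended. */
    for (int s = 0; s < n; s++) {
        int k = order[s];
        int f = 0;
        while (f < nfields && field_end[f] > v[k].start)
            f++;
        if (f == nfields)
            nfields++;  /* no free field: open a new one */
        field[k] = f;
        field_end[f] = v[k].end;
    }
    return nfields;
}
```

Because lifetimes are intervals, this greedy order needs only as many fields as the maximum number of variables alive at the same time.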
69
Scheduler Options
  • Scheduler algorithm
  • ASAP (default): as soon as possible
  • ALAP: as late as possible
  • ALAP Greedy: complete paths are scheduled
  • All: tests all of them for every level of hierarchy and takes the one that results in the smallest cycle count
  • Unconstrained folding
  • General option for all loops
  • Default on: the scheduler tries to reduce the total machine-cycle count of a for loop by increasing the available parallelism within every iteration
  • Can be overridden by pragma for a specific loop
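The difference between the ASAP and ALAP options shows up on an unconstrained graph: ASAP starts every operation as early as its inputs allow, ALAP as late as possible without lengthening the schedule, and an operation whose two start cycles differ has slack. The sketch ignores resource limits and greedy path handling, so it only illustrates the two basic options; the names are hypothetical and operations are assumed single-cycle.

```c
#define MAX_PREDS 4

typedef struct {
    int npreds;
    int preds[MAX_PREDS];  /* indices of operations this one depends on */
} node_t;

/* Operations must be listed in topological order (predecessors first).
 * Every operation takes one cycle and resources are unlimited.
 * Returns the schedule length; asap[] and alap[] receive start cycles. */
int asap_alap(const node_t *g, int n, int asap[], int alap[])
{
    int length = 0;

    /* ASAP: start as soon as all predecessors have finished. */
    for (int k = 0; k < n; k++) {
        asap[k] = 0;
        for (int p = 0; p < g[k].npreds; p++)
            if (asap[g[k].preds[p]] + 1 > asap[k])
                asap[k] = asap[g[k].preds[p]] + 1;
        if (asap[k] + 1 > length)
            length = asap[k] + 1;
    }

    /* ALAP: start as late as possible without stretching the schedule.
     * Walking backwards, all successors of k are final before k is used. */
    for (int k = 0; k < n; k++)
        alap[k] = length - 1;
    for (int k = n - 1; k >= 0; k--)
        for (int p = 0; p < g[k].npreds; p++) {
            int q = g[k].preds[p];
            if (alap[k] - 1 < alap[q])
                alap[q] = alap[k] - 1;
        }
    return length;
}
```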

70
Pragmas (1)
  • fold(C, I)
  • fold(loopName, reductionLimit)
  • For a specific loop, this pragma specifies the
    maximum number of iterations by which the
    original number of iterations may be reduced
  • EXAMPLE
  • fold("/cordic/loopi", 3)
  • Iteration reduction of loop "loopi" in function "cordic" due to folding is at most 3. Suppose the original number of iterations is 14. After folding, the resulting number of iterations is at least 11.

71
Pragmas (2)
  • max_cycles(C, I)
  • max_cycles(scopeName, nrOfCycles)
  • Only used for the calculation of the cycle count!
  • Suppose scope C in the source hierarchy contains a conditional statement or a non-manifest loop. You can then specify a maximum cycle count value for scope C
  • When the exact number of cycles needed for C
    can be computed, this pragma is ignored
  • EXAMPLE
  • max_cycles("/top/block1", 10)
  • If the number of cycles needed to execute
    "/top/block1" cannot be computed exactly, a value
    of 10 will be assumed to calculate the cycle count

72
Views
Load view
73
Views
  • Load view
  • The top part shows the loop structure of the
    resulting schedule
  • Height of vertical bars represents number of loop
    iterations
  • The bottom part shows activity of cores, register
    files and/or buses as a function of the schedule
    (program counter)
  • Dashed vertical lines represent loop and
    condition boundaries
  • Colored vertical line shows split between init
    section and run section
  • One-way cross-referencing to Schedule Report!
  • Brings you to the potential, not to the exact RT

74
Views
  • Lifetime view

Figure: lifetime of each variable, from the cycle in which it is made to the cycle in which it is consumed

75
Views
  • Lifetime view
  • The top part shows for selected register file(s),
    the lifetime of all variables stored in these
    files as a function of the schedule (program
    counter)
  • The bottom shows for selected register file(s),
    the required number of fields as a function of
    the schedule
  • One-way cross-referencing to Schedule Report!

76
Views
  • RAM view

77
Views
  • RAM view
  • The lifetimes of variables stored in the selected RAM(s) are displayed against the RAM address space

78
Reports
  • Schedule report

79
Reports
  • Schedule report
  • Detailed listing of all RTs and their relative
    sequence
  • Two-way cross-referencing with source!
  • Information on all I/O performed by the processor
    via INPORT and OUTPORT resources

80
Reports
  • Cycle Count Report

81
Reports
  • Cycle Count Report
  • Information on number of cycles
  • Exact number for manifest descriptions
  • C program for non-manifest descriptions (except
    when using pragma max_cycles)
  • Information on loop structure of the input
    description

82
Reports
  • Register report

83
Reports
  • Register report
  • Summary overview of register usage
  • Detailed listing for all register files and all
    fields
  • Variables stored in the register field
  • For each variable, potentials at which writes and
    reads occur

84
  • Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Map to Architecture
  • Schedule Operations
  • Verify the Design
  • Create and Use a User Library
  • Supported C subset

Build the RT-Level
85
Key Concepts
  • In this last step, the following is performed
  • Controller generation
  • ROM optimizations
  • HDL generation

86
Controller-based versus Hardwired
Algorithm a x y - zc Controller-based
implementation (e.g. with microprogram)
microprogram 1. zctmp1 2 y-tmp1tmp2 3.
Xtmp2a
Hardwired implementation
Figure: dedicated operators (including a subtractor) computing a directly from x, y, z and c
87
Multi-branch versus Single-Branch
Single-branch code:
if (in > 8) result = 3;
else if (in > 4) result = 2;
else if (in > 2) result = 1;
else result = 0;
Consecutive conditional jumps

Multiple-branch code:
switch (in) {
  case large:  result = 3; break;
  case medium: result = 2; break;
  case low:    result = 1; break;
  default:     result = 0;
}
Offers a large number of possible next addresses
88
Controller Generation
  • Generates the control bits sent to the different resources of the data path
  • Evaluates Boolean combinations of its inputs (status flags); output: condition code
  • Decodes the condition code to decide whether the PC has to branch to a specific address; output: encoded jump address
89
Controller Alternatives (1)
  • mbc_23 (default)

90
Controller Alternatives (2)
  • mbc_22
  • This is a variation on the default controller
  • Difference: a pipeline stage (status) has been removed

91
Controller Alternatives (3)
  • FSM-based: the microcode is replaced with a synthesizable HDL model
  • mbc_11

92
Controller Alternatives (4)
  • mbc_12: control delay of 2

93
Optimizations
  • Optimization of micro-ROM
  • Removal of columns: constant columns and duplicate columns are removed

Figure: micro-ROM contents before and after optimization, with columns driving Res. 1 and Res. 2 and constant columns tied to GND and VDD
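A sketch of the column removal described above: a column that holds the same bit in every word can be tied to GND or VDD, and a column identical to an earlier kept column can share that column's output. The row-major bit layout and the function name are hypothetical; the tool's actual micro-ROM format is not described in the slides.

```c
/* Marks which micro-ROM columns must be kept.
 * rom is stored row-major: rom[r * cols + c] is bit c of word r.
 * keep[c] becomes 0 for a column that is constant (tie to GND or VDD)
 * or duplicates an earlier kept column (share it), and 1 otherwise.
 * Returns the number of kept columns. */
int optimize_columns(const unsigned char *rom, int rows, int cols, int keep[])
{
    int kept = 0;

    for (int c = 0; c < cols; c++) {
        /* Constant column: every word has the same bit as word 0. */
        int constant = 1;
        for (int r = 1; r < rows; r++)
            if (rom[r * cols + c] != rom[c]) {
                constant = 0;
                break;
            }

        /* Duplicate column: identical to an earlier kept column. */
        int duplicate = 0;
        for (int d = 0; d < c && !constant && !duplicate; d++) {
            if (!keep[d])
                continue;
            int same = 1;
            for (int r = 0; r < rows; r++)
                if (rom[r * cols + c] != rom[r * cols + d]) {
                    same = 0;
                    break;
                }
            duplicate = same;
        }

        keep[c] = !constant && !duplicate;
        kept += keep[c];
    }
    return kept;
}
```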
94
Optimizations
  • Optimization of ROMCTRLs
  • Constants are put in micro-ROM if the number of
    micro-ROM columns is not increasing

95
HDL Netlist Structure
  • Hierarchical design

Figure: netlist hierarchy with the blocks artd_<design>, artd_<design>_microrom, artd_rom, Controller, StatusLogic, BranchLogic, ir, cores, ports and auxiliary resources (e.g. alu_1), and buses
96
Options
  • Netlist manipulations
  • Processor init: choose to have it internal or external
  • Optimize dataROM
  • Generate Hold pin: to freeze the processor state
  • Separate files: generates separate HDL files instead of one large HDL file

97
Pragmas (1)
  • optimize_dataroms(CL, ON, OFF)
  • optimize_dataroms(romOrRomctrlInstanceName, ON, OFF)
  • Overrules the Options setting for a specific ROM
    or ROMCTRL
  • EXAMPLE
  • optimize_dataroms("rom_", ON)
  • Optimizes contents of all ROM instances whose
    names start with rom_
  • optimize_dataroms("romctrl_1","rom_1", OFF)
  • No optimization of contents of romctrl_1 and rom_1

98
Pragmas (2)
  • define_vhdl_generic(C, C, C, C)
  • define_vhdl_generic(vhdlinstancePathName,
    genericName, genericType, genericValue)
  • Suppose you have your own VHDL core with
    generics. This pragma allows you to supply the
    additional information needed to instantiate the
    core in the VHDL netlist
  • You have to specify such a pragma for every
    generic
  • EXAMPLE
  • define_vhdl_generic("mymult_1", "width1", "integer", "8")
  • The generic width1 of type "integer" of the instance "mymult_1" will be set to the value "8"

99
Pragmas (3)
  • define_verilog_parameters(C, C)
  • define_verilog_parameters(veriloginstancePathName, parameterArgumentValues)
  • Suppose you have your own Verilog core with parameters
  • This pragma allows you to specify the values of these parameters for an instance of the core in the Verilog netlist
  • EXAMPLE
  • define_verilog_parameters("mymult_1", "8, 7")
  • The parameter string "8, 7" will be used for the instantiation of mymult_1 in the Verilog netlist

100
Pragmas (4)
  • make_external(CL)
  • make_external(componentName)
  • The full instance pathname has to be specified (see examples)
  • EXAMPLES
  • make_external("rom")
  • makes all instances external whose names start with rom
  • make_external("reg_dy_alu_1")
  • makes the register file for the Y input of the alu_1 instance external
  • make_external("mult_1")
  • makes mult_1 and its associated registers and multiplexers external

101
Pragmas (5)
  • inport_benchmode(C, READONCE,READONCEPERFRAME)
  • For a specific INPORT, provides more flexibility
    for reading input values within the test bench
  • Address generation on
  • By default, each stimuli file is read once per frame
  • By selecting READONCE, the stimuli file is only read in the first frame
  • The read value(s) will be reused in the next frames
  • E.g. useful for constant parameter values
  • Address generation off
  • By default, the stimuli file is read every time a read operation is encountered
  • By selecting READONCEPERFRAME, the stimuli file will only be read once per frame
  • Only supported if only one variable is mapped on the INPORT instance
  • EXAMPLE inport_benchmode("inport_1",READONCE)

102
Pragmas (6)
  • map_rom_on_lutram(romInstanceName)
  • map_ram_on_lutram(ramInstancename)
  • Indicates whether the ROM/RAM will be mapped on a network of LUT RAMs
  • This pragma is only effective if you chose the Xilinx FPGA flow in the Create Architecture step
  • EXAMPLE map_ram_on_lutram(ram_1)

103
Pragmas (7)
  • map_rom_on_blockram(C,I)
  • map_ram_on_blockram(C,I)
  • map_rom_on_blockram(romInstanceName, max_nr_blockrams)
  • map_ram_on_blockram(ramInstanceName, max_nr_blockrams)
  • Indicates whether the ROM/RAM will be mapped on a network of block RAMs and LUT RAMs
  • This pragma is only effective if you chose the Xilinx Virtex or Spartan II flow in the Create Architecture step
  • EXAMPLE map_ram_on_blockram(ram_1,4)
  • Maps ram_1 on a combination of block RAM and LUT RAM. At most 4 block RAMs may be used; LUT RAMs are only used if there are not enough block RAMs.

104
Reports
  • Architecture report
  • Controller dimensions are now filled in

105
  • Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Map to Architecture
  • Schedule Operations
  • Build the RT-Level
  • Create and Use a User Library
  • Supported C subset

Verify the Design
106
Fetching Inputs with INPORT
Ports generated on the processor for every INPORT (between the ART Designer processor and an external memory device)
  • <inport_name>_address (not for inport_noaddr)
  • <inport_name>_data
  • <inport_name>_dreq
  • <inport_name>_davail
107
Writing Outputs with OUTPORT
Ports generated on the processor for every OUTPORT (between the ART Designer processor and an external memory device)
  • <outport_name>_address (not for outport_noaddr)
  • <outport_name>_data
  • <outport_name>_dready
  • <outport_name>_daccept
108
Processor Control Pins
Figure: control pins of the ART Designer processor: clk, rst, start and ready
109
Processor Startup Sequence
110
Timing of Processor I/O
Meaning of the ready flag
Timing of Input Signals
Timing of Output Signals
111
Generated HDL Test Bench
112
Verifying the Generated HDL
  • Example for vsim and Unix
  • go to the artd_vhd or artd_v subdirectory
    of your project
  • vlib work
  • vcom artd_design.vhd artd_bench.vhd
  • vsim artd_bench -c -do "run -all; quit -f"

113
  • Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Map to Architecture
  • Schedule Operations
  • Build the RT-Level
  • Verify the Design
  • Supported C subset

User Libraries
114
Using your own Data Path Resources
void addsub(const Int<16> in1, const Int<16> in2, const Uint<1> mode, Int<16> &out)
{
  #pragma OUT out
  if (mode == 0) out = in1 + in2;
  if (mode == 1) out = in1 - in2;
}

You need to define: I/O, instruction set, timing

Figure: AddSub resource with 16-bit inputs in1 and in2, a 1-bit mode input and a 16-bit output out, shown along a time axis
115
Constraints on User Library Resources
  • You can only create user-defined resources that
    perform arithmetic or logical operations on their
    inputs
  • The types of all arguments have to be determined
  • The latency (number of cycles) has to be
    manifest, i.e. fixed and known in advance

116
Library Data Organization
Contains info about the contents and the
parameters of the library
For every resource you need a pragma file with
the ART Designer model
Optional HDL description of every resource for
simulation/synthesis
117
Declaring a User Library
  • Use the Create Architecture options
  • Library Name
  • Symbolic name to be used in pragmas
  • Library Path
  • Actual directory for library data

118
Creating a User Library
  • To create a user library, ART Designer is
    equipped with a Library Manager
  • Tools > Library Manager

119
Adding resources
  • Once you have created the user library, you have
    to add resources to it
  • Resource > New

Resource Name
Origin: ART Builder or user-supplied
Pin availability and their names
Model availability and names
120
Necessary pragmas for every resource

121
Pragmas (1)
  • define_view(CORE, C)
  • define_view(CORE,resourceName)
  • Defines a name for the user-defined resource
  • EXAMPLE
  • define_view(CORE, "myMult")

122
Pragmas (2)
  • define_inputs(C, CL)
  • define_inputs(resourceName, inputPort)
  • Defines the name and the width of each input.
  • Control input (command bus) can also be specified
    here
  • Up to two command busses can be present
  • EXAMPLE
  • define_inputs("cordiccore", "I_in8",
    "Q_in8")
  • define_inputs("myBlock", "in124", "in224",
    "Cbus3")

Command bus
123
Pragmas (3)
  • define_outputs(C, CL)
  • define_outputs(resourceName, outputPort)
  • For a specific resource, this pragma defines the
    name and the width of each data output
  • Flag outputs can also be specified here
  • EXAMPLE
  • define_outputs("myBlock", "xi16", "xq16",
    "xp16")
  • define_outputs("myOwnAlu", "out16", "flag11",
    "flag21")

124
Pragmas (4)
  • define_instruction(C, C, CL)
  • define_instruction(resourceName,
    instructionName, control)
  • This pragma allows you to define an instruction,
    along with its control bits
  • A default instruction must always be defined
  • It is applied in the clock cycles when the
    resource is not active
  • Important for resources with internal state
  • EXAMPLE
  • define_instruction("myAlu", "add", "CbusFTT")
  • define_instruction("myBlock", "default", "")
  • /* single-mode block without internal state */

125
Pragmas (5)
  • define_singlecycle_reservationGraph(C, C) or
  • define_pipelined_reservationGraph(C, C) or
  • define_multicycle_reservationGraph(C, C)
  • These pragmas define different reservation graphs
    that can be applied to specific instructions
  • There are 3 predefined reservation graphs:
    single cycle, pipelined and multicycle
  • You can also create your own reservation graph
    using the define_reservationGraph pragma in
    combination with other optional pragmas
  • EXAMPLE
  • define_pipelined_reservationGraph("myMult",
    "onepipelinestage")

126
Pragmas (6)
  • map_function(C, C, C, CL, CL, CL)
  • map_function(funcName, resourceName,
    reservationGraph, inputMapping, outputMapping,
    ctrlMapping)
  • This pragma defines how a function is mapped on a
    resource: calls to the function are mapped onto
    the first instance of this type of resource.
  • EXAMPLE
  • define_rule("mult", "mymult", "onepipelinestage",
    "in1", "in2", "out1", "FlagZ",
    modedefault)

127
Optional Pragmas
  • map_expression(EXPR, C, C, CL, CL, CL)
  • map_expression(expression, resourceName,
    reservationGraph, inputMapping,
    outputMapping, ctrlMapping)
  • This pragma allows you to map an expression,
    written with explicit type definitions, on a
    user-defined resource
  • EXAMPLE
  • map_expression(Int<32>(Int<32>(_) * Int<32>(_)),
    "myMult", "resgraph", "in1", "in2",
    "out", "flagZ", modedefault)

128
Optional Pragmas
  • define_pipeline(C ,I)
  • define_pipeline(timeshapeName, nrOfPipeRegs)
  • This pragma defines a pipelined timeshape that
    can be applied to a resource with pipeline
    registers
  • The instruction with this timeshape and its
    inputs are applied in one specific cycle; the
    corresponding output appears after the specified
    number of pipeline registers + 1 cycles
  • EXAMPLE
  • define_pipeline("multTimeshape", 1)

129
Optional Pragmas
  • define_sideeffect(C ,IL)
  • define_sideeffect(resourceName,instruction)
  • For resources with internal state, this pragma
    defines which instructions change the state
  • Used by the speculation and loop-invariant code
    optimizers
  • EXAMPLE
  • define_sideeffect(userMac2, mac, mpy)

130
Optional Pragmas
  • define_alignment(resourceName ,LSB,MSB)
  • This pragma defines how all operands and
    operations are aligned on the ports of the
    resource
  • Mixed alignment is not allowed. LSB alignment is
    the default.
  • EXAMPLE
  • define_alignment("myMult", MSB)

131
Example Pragma Set (1)
define_view(CORE,"acs")
define_inputs("acs","in116","in216","in316","mode2")
define_outputs("acs","out116","out216","out31","out41")
define_instruction("acs","default","modeFF")
define_instruction("acs","nMax1","modeFT")
define_instruction("acs","pMax2","modeTF")
define_instruction("acs","nMax2","modeTT")
define_singlecycle_reservationGraph(acs,graph)
define_rule("pMax1",acs,graph,"in1","in2","in3","out1","out2","out3","out4",modedefault)
define_rule("nMax1",acs,graph,"in1","in2","in3","out1","out2","out3","out4",modenMax1)
define_rule("pMax2",acs,graph,"in1","in2","in3","out1","out2","out3","out4",modepMax2)
define_rule("nMax2",acs,graph,"in1","in2","in3","out1","out2","out3","out4",modenMax2)
132
Example Pragma Set (2)
  • Defines a resource acs with
  • 3 data inputs
  • 1 control input
  • 4 data outputs
  • Defines 4 instructions that can be mapped on this
    resource
  • All instructions take a single cycle to execute

133
User-defined Libraries
  • Running ART Designer
  • Needed: a pragma file that models I/O,
    instructions and timing
  • Performing C simulation of your algorithm
  • Needed: either a behavioral model in C for the
    complete block, or a separate behavioral model
    in C for each instruction (recommended)
  • Performing RTL HDL simulation and/or synthesis
  • Needed: an HDL model

134
  • Key Concepts
  • Edit and Compile Source
  • Create Architecture
  • Map to Architecture
  • Schedule Operations
  • Build the RT-Level
  • Verify the Design
  • Create and Use a User Library

Supported C subset
135
Constant Definition
  • A warning is generated when an overflow or
    quantization occurs during compilation
  • Example
  • Fix<12,9> coef = -1.70171875; generates the
    following warning: Quantization occurred when
    casting the constant "-1.701718749999999900e+00"
    to the type "fix<12,9>". Result is
    "0bt110.010011000" ("-1.703125")
  • String literals are only supported for the
    initialization of ART Library type
    variables. Example: Ufix<5,2> c = "0bu011.01";
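The quantization warning can be reproduced by hand. Fix<12,9> has 12 bits, 9 of them fractional, as the result string 110.010011000 shows. Truncation keeps floor(-1.70171875 × 2^9) = -872, and -872 × 2^-9 = -1.703125. The plain-C sketch below does this arithmetic. The function names are hypothetical, it assumes truncation means rounding toward minus infinity (the default TRUNCATED characteristic), and it does not check for overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Raw integer of a signed fixed-point value with `frac` fractional
   bits, quantized by truncation: round toward minus infinity, which is
   what dropping low bits of a two's complement number does. */
int32_t quantize_trunc(double value, int frac)
{
    assert(frac >= 0 && frac <= 30);
    double scaled = value * (double)(1L << frac);
    int32_t raw = (int32_t)scaled;   /* C conversion truncates toward zero */
    if ((double)raw > scaled)        /* negative non-integer input: */
        raw -= 1;                    /* step down to the floor */
    return raw;
}

/* Real value represented by a raw fixed-point integer. */
double fix_value(int32_t raw, int frac)
{
    return (double)raw / (double)(1L << frac);
}

/* Two's complement bit pattern of `raw` in a `width`-bit word. */
uint32_t fix_bits(int32_t raw, int width)
{
    assert(width > 0 && width < 32);
    return (uint32_t)raw & ((1u << width) - 1u);
}
```

For the slide's constant, `quantize_trunc(-1.70171875, 9)` gives -872, `fix_value` turns it back into -1.703125, and `fix_bits(-872, 12)` gives 0xC98 = 1100 1001 1000, the same bits as "0bt110.010011000".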

136
Enumeration Constants
  • Identifiers declared as enumerators are treated
    as integer constants.
  • Renumbering is possible.
  • ART Designer will use the smallest possible type
    to represent the enumeration.
  • Example: enum State {INIT, EXEC, OUTPUT}; State
    states;
  • => Internally, the variable states will be
    represented as a 2-bit variable
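The 2-bit result follows from the largest enumerator value: OUTPUT is 2, and values 0 to 2 need 2 bits. The sketch below computes that width. It assumes non-negative enumerators (a negative, renumbered enumerator would also need a sign bit, which it ignores). `enum_bits` and `enum Renumbered` are hypothetical.

```c
#include <assert.h>

/* Smallest number of bits that holds every value from 0 up to
   max_value, i.e. the width of an unsigned encoding of an enum whose
   largest enumerator is max_value. */
int enum_bits(unsigned max_value)
{
    int bits = 1;                 /* even a one-value enum needs a bit */
    while (max_value > 1u) {
        max_value >>= 1;
        bits++;
    }
    return bits;
}

enum State { INIT, EXEC, OUTPUT };        /* values 0, 1, 2 */
enum Renumbered { IDLE = 0, BUSY = 5 };   /* renumbering widens the type */
```

`enum_bits(OUTPUT)` gives 2, matching the slide. Renumbering BUSY to 5 raises the requirement to 3 bits.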

137
Data-Types (1)
  • The standard C types are mapped into ART Library
    types before being mapped into an internal
    representation

C Type                 ART Library Type
signed char            Fix<8,0>
unsigned char          Ufix<8,0>
signed short int       Fix<16,0>
unsigned short int     Ufix<16,0>
signed int             Fix<32,0>
unsigned int           Ufix<32,0>
signed long int        Fix<32,0>
unsigned long int      Ufix<32,0>
bool                   Uint<1>
void                   Not mapped
138
Data-Types (2)
  • The C types float, double and long double are NOT
    supported
  • Pointers and pointer arithmetic are NOT
    supported; use arrays instead
  • Arrays with incomplete type descriptions are not
    supported: Int<10> A5 // is erroneous
  • A structure is expanded into its member
    values. Example:
    struct algoState { long buffer; char offset; };
    struct algoState s1;
    The corresponding HDL variables will
    be s1xbuffer and s1xoffset

Consequence: an array of a structure with N
members is represented as N
different arrays.
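The consequence above is the familiar array-of-structures to structure-of-arrays conversion. The plain-C sketch below writes it out for `struct algoState`, which has two members and therefore expands into two arrays. `N_STATES`, `struct expanded` and `expand` are hypothetical names, and the real HDL naming follows the s1xbuffer pattern rather than these identifiers.

```c
#include <assert.h>

#define N_STATES 4

struct algoState { long buffer; char offset; };

/* Expanded form: one array per structure member. */
struct expanded {
    long s_buffer[N_STATES];
    char s_offset[N_STATES];
};

/* Copy an array of structures into the per-member arrays. */
void expand(const struct algoState s[N_STATES], struct expanded *e)
{
    for (int i = 0; i < N_STATES; i++) {
        e->s_buffer[i] = s[i].buffer;
        e->s_offset[i] = s[i].offset;
    }
}
```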
139
Data-Types (3)
  • If structures are used as input or output to the
    top function or an ASU, you will get extra inputs
    or outputs in the generated HDL.
  • The bit information for bit-field structure
    members is ignored. Example:
    struct PPN {
      unsigned int PFN : 22;
      int : 4; // unused
      unsigned int CCA : 3;
      bool dirty : 1;
      bool valid : 1;
      bool global : 1;
    };
  • Union types are NOT supported. Example:
    union value { char s; int i; };

140
Expressions
  • All C operations mapped on resources out of the
    default library will get the quantization
    characteristic TRUNCATED and the overflow
    characteristic WRAPPED.
  • This corresponds to the default behavior of
    ART Library.
  • ART Designer only supports the following
    non-default characteristics:
  • +, - with saturate
  • The division (/) and modulo (%) operators are NOT
    supported.
  • Special modulo can be performed on the ACU by
    using the predefined functions of <artd_acu.h>
  • A shift operation with a negative shift value
    shifts in the opposite direction.
  • Recursive function calls are NOT supported
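The difference between the default WRAPPED behavior and the supported saturating +/- is easy to see at the edge of the range. The sketch below also shows the rule for negative shift amounts. It is a plain-C sketch, not ART Library code: `int16_t` stands in for `Int<16>`, the function names are hypothetical, and the shift uses an unsigned operand to stay clear of undefined behavior on negative values.

```c
#include <assert.h>
#include <stdint.h>

/* WRAPPED (default): keep the low 16 bits of the exact sum. */
int16_t add16_wrapped(int16_t a, int16_t b)
{
    int32_t v = ((int32_t)a + b) & 0xFFFF;
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

/* Saturated: clamp the exact sum to the representable range. */
int16_t add16_saturated(int16_t a, int16_t b)
{
    int32_t v = (int32_t)a + b;
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Left shift with a signed amount: a negative amount shifts right. */
uint32_t shift_left(uint32_t x, int amount)
{
    assert(amount > -32 && amount < 32);
    return amount >= 0 ? x << amount : x >> -amount;
}
```

Adding 1 to 32767 gives -32768 when wrapped but stays at 32767 when saturated. `shift_left(8, -3)` shifts right and returns 1.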

141
Declarations and Initialization
  • Declarations
  • The volatile type qualifier is ignored
  • The register storage class specifier is ignored
  • Initialization
  • Initialization of static variables by using a
    function argument is not supported:
    Int<16> func(Int<8> in) {
      static Int<8> temp = in; // erroneous
      ...
    }
  • Non-initialized static fixed-point variables get
    a don't-care value.
  • Static and global variables of a supported C
    type that are not explicitly initialized are set
    to zero, following C semantics; ART Designer
    initializes such variables to zero automatically.
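If the intent of the erroneous example is to capture the first input, one way to express it in plain C is to rely on the zero-initialization of statics described above and assign on the first call. This is only a sketch: `int8_t`/`int16_t` stand in for `Int<8>`/`Int<16>`, the returned sum is an invented function body, and the slides do not say whether ART Designer maps this pattern.

```c
#include <assert.h>
#include <stdint.h>

int16_t func(int8_t in)
{
    static int8_t temp;      /* supported C type: zero-initialized */
    static int initialized;  /* also starts at 0 */
    if (!initialized) {      /* capture the argument on the first call */
        temp = in;
        initialized = 1;
    }
    return (int16_t)(temp + in);  /* illustrative body only */
}
```

The first call `func(3)` stores 3 and returns 6. A later call `func(5)` still sees temp == 3 and returns 8.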

142
Function Declarations (1)
  • Defining the I/O arguments (for the top function
    and for functions mapped to resources)
  • return argument (if present): output
  • non-pointer type argument: input
  • reference or array type arguments are mapped on
    an input and an output argument. However, either
    of these two directions can be removed:
  • Input only: use the const qualifier
  • const Int<8> sample[3]
  • Output only: use the pragma precompiler
    directive
  • #ifdef __SYNTHESIS__
  • #pragma OUT <argument_name_1> <argument_name_2>
  • #endif

143
Function Declarations (2)
  • Example

#include <fxp.h>
void adder(const Int<4> a, // const is optional
           const Int<4> b, // const is optional
           Int<4>& c)
{
#pragma OUT c
  c = a + b;
}

entity adder is
  port ( a : in  std_logic_vector(3 downto 0);
         b : in  std_logic_vector(3 downto 0);
         c : out std_logic_vector(3 downto 0) );
end adder;

Inputs a and b, output c
144
Linkage and Preprocessing
  • A linking step is NOT supported by ART Designer;
    the consequences are:
  • The input may be kept in several files, but they
    have to be included in the file containing the
    top function
  • External functions are not supported
  • External variables are not supported (extern
    specifier)
  • Preprocessing: the preprocessor variable
    __SYNTHESIS__ is automatically set by ART
    Designer and can be used to exclude some parts of
    the C specification from the mapping. Example:
    #ifndef __SYNTHESIS__
    printf("I am debugging\n"); // NOT mapped by ART Designer
    #endif

145
Bit Operations
  • All bit operations in ART Library are supported
  • concatenation
  • z = concat(x,y)
  • bit select or bit set
  • z = bit(x,pos)
  • bit(z,pos,x)
  • slice select or slice set
  • z = slice(x,pos1,pos2)
  • slice(z,pos1,pos2,x)
  • They may result in non-optimal hardware since
    they are mapped into basic operations
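Because these operations are mapped into basic operations, it helps to see what they reduce to. The sketch below writes each one with shifts and masks in plain C. The conventions are assumptions, not the ART Library's: widths are passed explicitly, `hi`/`lo` are inclusive bit positions with hi >= lo, values fit in 32 bits, and every function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the `bits` lowest bits set. */
uint32_t mask(int bits) { return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u); }

/* concat: x in the upper bits, the y_width-bit value y below it. */
uint32_t concat_bits(uint32_t x, uint32_t y, int y_width)
{
    assert(y_width >= 0 && y_width < 32);
    return (x << y_width) | (y & mask(y_width));
}

/* bit select and bit set. */
uint32_t bit_select(uint32_t x, int pos)
{
    assert(pos >= 0 && pos < 32);
    return (x >> pos) & 1u;
}

uint32_t bit_set(uint32_t z, int pos, uint32_t b)
{
    assert(pos >= 0 && pos < 32);
    return (z & ~(1u << pos)) | ((b & 1u) << pos);
}

/* slice select and slice set over bits hi..lo (inclusive). */
uint32_t slice_select(uint32_t x, int hi, int lo)
{
    assert(lo >= 0 && hi >= lo && hi < 32);
    return (x >> lo) & mask(hi - lo + 1);
}

uint32_t slice_set(uint32_t z, int hi, int lo, uint32_t v)
{
    assert(lo >= 0 && hi >= lo && hi < 32);
    uint32_t m = mask(hi - lo + 1) << lo;
    return (z & ~m) | ((v << lo) & m);
}
```

For example, `slice_select(0xABCD, 11, 4)` yields 0xBC and `slice_set(0xABCD, 11, 4, 0x12)` yields 0xA12D. Each call costs a shift, an AND and possibly an OR, which is why heavy use of these operations can lead to non-optimal hardware.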

146
Other Constructs
  • The goto statement is NOT supported
  • The standard library is NOT supported
  • stdio.h in C
  • i