Hardware-Software Co-synthesis for Digital Systems

Slides: 58
Provided by: Adm952

1
Hardware-Software Co-synthesis for Digital Systems
  • The study of embedded computing system design
  • Gupta & De Micheli,
  • IEEE Design & Test of Computers 10, no. 3 (Sept.
    1993): 29-41
  • P5 of HW/SW Co-design Book
  • Prepared by Dr. Kocan

2
The Problems in HW/SW Co-design
  • Co-specification
  • Creating specifications that describe
  • Hardware elements
  • Software elements
  • The relationships between the elements
  • Co-synthesis
  • Automatic or semi-automatic design of hardware
    and software to meet a specification
  • Co-simulation
  • Simultaneous simulation of hardware and software
    elements (often at different levels of
    abstraction)

3
Co-synthesis Phases
  • Scheduling: choosing the times at which computations
    occur
  • Allocation: determining the processing elements
    (PEs) on which computations occur
  • Partitioning: dividing up the functionality into
    units of computation
  • Mapping: choosing particular component types for
    the allocated units

These phases are related.
4
HW-SW Co-synthesis for Digital Systems
  • Embedded System Applications
  • General-purpose processors, ASICs, memory
  • Application-specific: the relative timing of
    their actions
  • Real-time embedded systems

5
Challenges in Real-time ESD
  • Performance estimation
  • Selection of appropriate parts for system
    implementation
  • Verification of temporal and functional
    properties of the system

7
Synthesis-oriented approach
11
Capturing specification of system
  • Capture system functionality using an HDL, e.g.,
    HardwareC, Verilog, VHDL
  • HardwareC: a programming language with correct,
    unambiguous hardware modeling
  • HardwareC description: a set of interacting
    concurrent processes
  • A process restarts itself on completion
  • Nested concurrent and sequential operations in the
    body

12
Example HDL functionality specification
  • Two data input operations,
  • a conditional operation to generate counter seed
    z
  • a while-loop to implement a down-counter
  • A graph-based representation captures this spec

14
System model
  • A system model consists of a set of
    hierarchically related sequencing graphs
  • Vertices represent language-level operations
  • Edges represent dependencies between operations
  • Advantages of graph representation
  • Makes explicit the concurrency inherent in the
    input specification
  • Makes it easier to reason about properties of the
    input description
  • Allows analysis of timing properties of the input
    description
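
The graph representation above can be illustrated with a minimal sketch. The class `SequencingGraph`, its methods, and the operation names are hypothetical, chosen only for illustration; they are not taken from the paper or from HardwareC. The sketch adds no-operation source and sink vertices and computes a level for each operation, so that independent operations on the same level expose the concurrency in the specification.

```python
from collections import defaultdict, deque


class SequencingGraph:
    """Hypothetical minimal sequencing graph (names are illustrative)."""

    def __init__(self):
        self.vertices = {"source", "sink"}   # no-operation endpoints
        self.succ = defaultdict(set)         # operation -> dependent operations

    def add_dependency(self, before, after):
        self.vertices.update((before, after))
        self.succ[before].add(after)

    def close(self):
        """Hook operations without predecessors/successors to source/sink."""
        has_pred = {w for targets in self.succ.values() for w in targets}
        for v in sorted(self.vertices - {"source", "sink"}):
            if v not in has_pred:
                self.succ["source"].add(v)
            if not self.succ[v]:
                self.succ[v].add("sink")

    def levels(self):
        """Longest distance (in edges) from the source to each vertex."""
        indegree = {v: 0 for v in self.vertices}
        for targets in list(self.succ.values()):
            for w in targets:
                indegree[w] += 1
        level = {v: 0 for v in self.vertices}
        ready = deque(v for v in sorted(self.vertices) if indegree[v] == 0)
        while ready:
            u = ready.popleft()
            for w in sorted(self.succ[u]):
                level[w] = max(level[w], level[u] + 1)
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
        return level


# Two input reads feed a conditional, which feeds a while-loop down-counter.
graph = SequencingGraph()
graph.add_dependency("read_a", "cond_z")
graph.add_dependency("read_b", "cond_z")
graph.add_dependency("cond_z", "while_loop")
graph.close()
levels = graph.levels()
```

In this example the two reads land on the same level, which is the concurrency the graph makes explicit.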

15
Graph Model Properties
  • Sink / source vertices that represent
    no-operations
  • A set of variables defines the shared memory
    between operations in the graph
  • Storage common to the operations
  • Facilitates communication between operations
  • Exactly one execution of an operation with
    respect to each execution of any other operation:
    single-rate execution of operations in a graph

16
Multiple Graph Models
  • Operations across graph models follow multi-rate
    execution semantics
  • A variable number of executions of an operation
    for each execution of an operation in another graph model
  • Use message-passing primitives (send/receive) to
    implement communication across graph models
  • Specification of inter-model communication made
    simple

18
Modeling Heterogeneous Systems
  • Use a multirate specification
  • E.g., the ASIC and the processor run on clocks of
    different speeds

19
Nondeterministic Delay (ND) operations
  • Operations to represent synchronization to
    external events
  • E.g. receive() operation
  • Data-dependent loop operations
  • ND: unknown execution delays
  • Modeling ND operations is vital for reactive
    embedded system descriptions

Double circles denote ND operations.
20
Many possible implementations per system model
  • Timing constraints for defining performance
    requirements of the desired implementation
  • Two types of timing constraints
  • Min/max delay constraint
  • Execution rate constraint

22
Timing constraints
  • Min/max delay constraints
  • Execution rate constraints
  • Sufficient to capture constraints needed by most
    real-time systems

23
Modeling of delay constraints
Figure labels: rate constraint; min/max delay constraint.
Edge weight: delay of the source operation.
Backward edges: maximum delay constraints.
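
As a rough illustration of this modeling, the sketch below builds a weighted constraint graph. It assumes the common convention that an edge (u, v, w) means start(v) >= start(u) + w, so a maximum delay constraint of m from u to v becomes a backward edge (v, u) with weight -m. The function name, the dictionary layout, and the numbers are hypothetical.

```python
def build_constraint_graph(delays, dependencies, min_constraints, max_constraints):
    """Return edges (u, v, w), each meaning start(v) >= start(u) + w."""
    edges = []
    for u, v in dependencies:
        edges.append((u, v, delays[u]))      # forward edge: delay of the source op
    for (u, v), lower in min_constraints.items():
        edges.append((u, v, lower))          # min delay: forward edge
    for (u, v), upper in max_constraints.items():
        edges.append((v, u, -upper))         # max delay: backward edge
    return edges


# Hypothetical example: a -> b -> c with a maximum delay of 6 from a to c.
example_edges = build_constraint_graph(
    delays={"a": 2, "b": 3, "c": 1},
    dependencies=[("a", "b"), ("b", "c")],
    min_constraints={},
    max_constraints={("a", "c"): 6},
)
```

The backward edge closes a cycle a, b, c, a whose total weight is 2 + 3 - 6 = -1.
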
24
Model Analysis
  • Estimate system performance
  • Verify the consistency of specified constraints
  • Performance measures
  • Estimation of operation delays
  • Separate estimations for hardware and software
    implementations
  • Based on the processor to run the software
  • Based on the type of the hardware to be used

25
Processor cost model
  • Execution delay function for basic set of
    processor operations
  • Memory address calculation function
  • Memory access time
  • Processor interruption response time
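
A minimal sketch of how the four cost-model components might combine into a software delay estimate. The class, its field names, the additive formula, and all cycle counts are assumptions made for illustration; the slides do not give the actual delay function.

```python
from dataclasses import dataclass


@dataclass
class ProcessorCostModel:
    op_delay: dict            # cycles per basic processor operation (assumed values)
    address_calc: int         # cycles to compute one memory address
    memory_access: int        # cycles per memory access
    interrupt_response: int   # cycles to respond to an interrupt

    def operation_delay(self, op, memory_operands=0):
        """Delay of one operation plus the cost of fetching its memory operands."""
        per_operand = self.address_calc + self.memory_access
        return self.op_delay[op] + memory_operands * per_operand

    def sequence_delay(self, ops):
        """Delay of a linear sequence of (operation, memory_operands) pairs."""
        return sum(self.operation_delay(op, n) for op, n in ops)

    def interrupt_driven_delay(self, ops):
        """Delay when the sequence is started by an interrupt."""
        return self.interrupt_response + self.sequence_delay(ops)


model = ProcessorCostModel(
    op_delay={"add": 1, "mul": 4, "branch": 2},
    address_calc=1,
    memory_access=2,
    interrupt_response=10,
)
sequence = [("add", 2), ("mul", 1), ("branch", 0)]
plain_delay = model.sequence_delay(sequence)
interrupt_delay = model.interrupt_driven_delay(sequence)
```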

26
Timing constraint analysis
  • Can imposed constraints be satisfied for a given
    implementation?
  • Assign appropriate delays to the operations with
    known delays in the graph model
  • CONSTRAINT SATISFIABILITY
  • Relating structure, actual delay and constraint
    values
  • Some structural properties of graphs may make a
    constraint unsatisfiable (ND operations)
  • Some constraints may be mutually inconsistent
  • E.g. maximum delay constraint between two
    operations that also have a larger minimum delay
    constraint
  • No assignment of nonnegative operation delay
    values can satisfy such constraints
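
A worked instance of such an inconsistent pair, assuming $t_i$ and $t_j$ denote the start times of the two operations and $l_{ij}$, $u_{ij}$ the minimum and maximum delay bounds (symbols introduced here for illustration only):

```latex
\[
  l_{ij} \;\le\; t_j - t_i \;\le\; u_{ij}
  \quad\Longrightarrow\quad l_{ij} \le u_{ij}.
\]
\[
  \text{With } l_{ij} = 5,\ u_{ij} = 3:\qquad
  5 \le t_j - t_i \le 3 \quad \text{has no solution.}
\]
```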

27
Presence of ND operations
  • A timing constraint is satisfiable if it is
    satisfied for all possible (possibly infinite)
    delay values of the ND operations
  • A timing constraint is marginally satisfiable if
    it can be satisfied for all possible values
    within the specified bounds on the delay of the
    ND operations
  • Some implementation assumptions (acceptable
    bounds on ND operation delays)
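
The distinction can be sketched for a single maximum delay constraint over one path. The function name, its parameters, and the returned labels are hypothetical. The sketch assumes the path delay is the sum of the known delays plus the ND delay, so checking the assumed upper bound covers every value within the bounds.

```python
def classify_max_constraint(fixed_delay, max_delay, nd_upper_bound=None):
    """Classify a max-delay constraint over a path.

    fixed_delay: sum of the known operation delays on the path
    nd_upper_bound: None if the path has no ND operation, otherwise the
        assumed upper bound on the total delay of its ND operations
    """
    if nd_upper_bound is None:
        return "satisfiable" if fixed_delay <= max_delay else "unsatisfiable"
    # The path delay grows with the ND delay, so the bound is the worst case.
    if fixed_delay + nd_upper_bound <= max_delay:
        return "marginally satisfiable"
    return "violated within the assumed bounds"


no_nd = classify_max_constraint(fixed_delay=4, max_delay=10)
bounded_ok = classify_max_constraint(fixed_delay=4, max_delay=10, nd_upper_bound=5)
bounded_bad = classify_max_constraint(fixed_delay=4, max_delay=10, nd_upper_bound=7)
```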

28
Timing Analysis by graph analysis
  • (1) No ND operations in the graph
  • Edges with finite/known weight
  • A min/max delay constraint cannot be satisfied if a
    positive cycle exists in the graph model
  • (the sum of the weights on the cycle is positive)
  • (2) ND operations exist
  • Satisfiable if no cycle contains ND operations
  • If a cycle contains ND operations, the
    satisfiability of timing constraints cannot be
    determined; only marginal satisfiability can be guaranteed
  • Cycle breaking by graph transformation
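
For case (1), a positive-cycle check can be sketched with Bellman-Ford-style longest-path relaxation. This is a standard technique swapped in for illustration, not necessarily the procedure used in the paper. Edges (u, v, w) are assumed to mean start(v) >= start(u) + w, with maximum delay constraints encoded as negative-weight backward edges; all names and numbers are hypothetical.

```python
def find_schedule(vertices, edges):
    """Return (has_positive_cycle, start_times).

    Repeatedly relaxes start(v) >= start(u) + w. If values still change
    after len(vertices) passes, some cycle has positive total weight and
    the constraints cannot all hold.
    """
    start = {v: 0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, w in edges:
            if start[u] + w > start[v]:
                start[v] = start[u] + w
                changed = True
        if not changed:
            return False, start
    return True, None


ops = ["a", "b", "c"]
forward = [("a", "b", 2), ("b", "c", 3)]
# Maximum delay of 6 from a to c: cycle weight 2 + 3 - 6 = -1.
feasible = find_schedule(ops, forward + [("c", "a", -6)])
# Maximum delay of 4 from a to c: cycle weight 2 + 3 - 4 = +1.
infeasible = find_schedule(ops, forward + [("c", "a", -4)])
```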

29
Timing analysis
  • Nonpipelined implementations
  • Rate constraints can be expressed as min/max delay
    constraints between the corresponding source and sink
    operations of the graph model
  • Apply min/max constraint satisfiability criterion
    to the analysis of rate constraints
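
One way to write this reduction, assuming $T$ is the delay from source to sink (the interval between successive executions in a nonpipelined implementation) and $\rho_{\min} > 0$, $\rho_{\max}$ are the bounds of the rate constraint (symbols introduced here for illustration):

```latex
\[
  \rho_{\min} \le \frac{1}{T} \le \rho_{\max}
  \quad\Longleftrightarrow\quad
  \frac{1}{\rho_{\max}} \le T \le \frac{1}{\rho_{\min}}
\]
```

The rate constraint then corresponds to a minimum delay of $1/\rho_{\max}$ and a maximum delay of $1/\rho_{\min}$ between the source and sink operations.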

30
Example Rate constraints (graphs with ND ops)
  • process test(p, ...)
  • in port p[SIZE]
  • Boolean v[INT-SIZE]
  • v = read(p)
  • while (v > 0)
  • <loop-body>
  • v = v - 1

Rate constraint on the read operation. Unbounded while
operation → ND operation.
v: Boolean array representing an integer.
31
  • The overall execution time of the while loop
    determines the interval between successive
    executions of the read operation
  • Because the while loop is a variable-delay
    operation, the input rate at port p is variable
  • The required rate constraint cannot always be
    guaranteed
  • Ensure marginal satisfiability of the rate constraint
    by graph transformation and by using a
    finite-size buffer
32
P transformed into fragments Q and R
Rate constraint from sink to source
33
Software Implementation of Ex. A
  • Two threads: for each execution of T1, T2
    executes v times
  • Thread T1: read v; detach
  • Thread T2: loop synch; <loop_body>; v = v - 1; detach

34
Process P with ND operation
  • ND operation due to an unbounded loop
  • ND operation induces a bipartition of the calling
    process P
  • P = F ∪ B
  • F: e.g., the read operation
  • The set of operations in F must be performed
    before invoking the loop body
  • The set of operations in B can only be performed
    after completing executions of the loop body
  • Functional pipeline of F, B, and the loop to improve the
    reaction rate of P
  • Note: we assume nonpipelined hardware, therefore
    the pipelining is done in software

35
Constraint Analysis and Software
  • Linear execution semantics imposed by software
    running on single-processor
  • Complicates constraint analysis for software
    implementation of graph model
  • Complete order of operations necessary to perform
    delay analysis
  • Complete ordering may create unbounded cycles, which
    make constraints unsatisfiable

36
Example of Completely Ordering Operations
37
Communication Ops in SW
  • Computation ops must be performed serially
  • Communication ops can proceed concurrently
  • Overlap execution of ND ops (wait for
    synchronization or communication) with some
    (unrelated) computation
  • Requires dynamic software scheduling
  • Simultaneous active ND operations may complete in
    orders that cannot be determined statically

38
Software model: a set of fixed-latency concurrent
threads
Delay overheads of dynamic scheduling
39
Thread
  • A linearized set of operations
  • May or may not begin with ND operation (indicated
    by a circle)
  • A thread does not contain any ND operation (other
    than beginning with one)
  • The delay of the initial ND operation is part of
    the scheduling delay (not included in the latency
    of the thread)
  • Multiple threads avoid the complete serialization of
    all operations, which may create unbounded cycles
  • SW model enables checking of marginal
    satisfiability of constraints on operations
    belonging to different threads
  • Assume fixed and known delay of scheduling
    operations associated with ND operations

40
System Partitioning
  • system-level partitioning problem
  • The assignments of operations to hardware or
    software
  • Assignment determines the delay of the ops
  • Communication overheads due to ASIC or processor
    assignment
  • Minimize communication delay
  • Increase ops in SW to increase the processor
    utilization
  • Overall System Performance
  • The effect of HW/SW partition on the utilization
    of processor AND the bandwidth of the bus between
    the processor and ASIC hardware.
  • Devise a partitioning cost function
  • Sizes of hw/sw parts
  • Timing behavior: capture the timing performance
    during the partitioning
  • Hard to capture timing behavior (use
    approximation techniques)

41
Hardware/Program partitioning
  • HW partitioning: divides circuits that implement
    scheduled operations
  • Program-level partitioning: addresses operations
    that are scheduled at runtime
  • Use statistical timing properties to drive
    partitioning algorithms

42
Use of timing properties in partition cost
function
Use deterministic bounds on timing properties
that are incrementally computable in the
partition cost function
43
Characterization of SW
  • Thread latency (L): execution delay of a program
    thread
  • Thread execution rate (R): the invocation rate of
    the thread
  • Processor utilization: $P = \sum_i L_i R_i$, summed
    over the program threads
  • Bus utilization (B): total amount of
    communication between the HW and SW
  • To transfer m variables: $B = \sum_{j=1}^{m} r_j$
  • $r_j$: the inverse of the minimum time interval
    between two consecutive samples for variable j
  • Calculate static bounds on SW performance with
    L, R, B, P
  • These bounds overestimate the performance parameters. Why?
  • The distribution of thread invocations and
    communications depends on actual data values
44
Hardware Size, Interface Characterization
  • Hardware size: sum of the size estimates of the
    resources implementing the operations
  • Assign ports for communication between HW and SW:
    one port per variable
  • Bus bandwidth captures the overhead of
    communication

45
Partitioning a specification into HW and SW
implementations
  • Given the cost model for software, hardware, and
    interface
  • Given a set of sequencing graph models and timing
    constraints between operations, create two sets
    of sequencing graph models s. t. one can be
    implemented in hardware and the other in software

46
Constraints after Partitioning
  • Timing constraints are satisfied for the two sets
    of graph models
  • Processor utilization: $P < 1$
  • Bus utilization below the available bus bandwidth: $B < \bar{B}$
  • A partition cost function
  • $\min f(\mathrm{Size}_{HW}, B, P^{-1}, m)$
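
A sketch of how the feasibility conditions and the cost function might be evaluated. The slides leave the form of f open; the weighted sum below, the weights, and all names are assumptions made purely for illustration. The inverse of P appears because minimizing it rewards higher processor utilization.

```python
def is_feasible(P, B, bus_bandwidth):
    """Utilization constraints: processor below 1, bus below its bandwidth."""
    return P < 1 and B < bus_bandwidth


def partition_cost(size_hw, B, P, m, weights=(1.0, 1.0, 1.0, 1.0)):
    """Assumed linear form of f(Size_HW, B, 1/P, m); requires P > 0."""
    w_size, w_bus, w_proc, w_vars = weights
    return w_size * size_hw + w_bus * B + w_proc * (1.0 / P) + w_vars * m


feasible_partition = is_feasible(P=0.5, B=150.0, bus_bandwidth=1000.0)
overloaded_partition = is_feasible(P=1.2, B=150.0, bus_bandwidth=1000.0)
cost = partition_cost(size_hw=100, B=150.0, P=0.5, m=2,
                      weights=(1.0, 0.5, 10.0, 5.0))
```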

47
Intuitive features of partitioning algorithm
  • Identify operations that can be implemented in
    software such that
  • Constraint graph implementation can be satisfied
  • The resulting software meets rate constraints on
    its inputs/outputs.
  • Initial partition
  • ND ops of the data dependent loop operations
    define the beginning of the program threads
  • All other operations in HW
  • Compute the reaction rates of the threads
  • Maximum reaction rate of a thread: the inverse of its
    latency
  • Latency of a program thread is computed from the
    processor delay cost model and a fixed scheduling
    overhead delay
  • Iterative improvement
  • Migrate an operation (affects (1) execution
    delay, (2) latency, (3) reaction rate of the
    thread into which the operation is moved)
  • Compute its effect on processor and bus bandwidth
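
The iterative-improvement idea can be sketched as a simple greedy loop. This is a simplified stand-in, not the authors' algorithm: it assumes a single software thread, starts with every operation in hardware, ignores bus bandwidth and timing-constraint checks, and accepts a migration only if processor utilization P = L x R stays below the bound. All names and numbers are hypothetical.

```python
def improve_partition(ops, rate, max_utilization=1.0):
    """ops maps name -> (hw_size, sw_delay_s); rate is the thread rate in Hz.

    Returns (ops moved to software, remaining hardware size, utilization).
    """
    in_sw = set()
    latency = 0.0  # latency of the single software thread so far
    # Try first the ops that free the most hardware per unit of software delay.
    order = sorted(ops, key=lambda n: ops[n][0] / ops[n][1], reverse=True)
    for name in order:
        _, sw_delay = ops[name]
        new_latency = latency + sw_delay
        if new_latency * rate < max_utilization:  # keep P = L x R below bound
            in_sw.add(name)
            latency = new_latency
    hw_size = sum(size for n, (size, _) in ops.items() if n not in in_sw)
    return in_sw, hw_size, latency * rate


candidate_ops = {
    "add": (40, 0.001),
    "mul": (200, 0.004),
    "cmp": (10, 0.002),
}
moved, remaining_hw, utilization = improve_partition(candidate_ops, rate=150.0)
```

Here `mul` and `add` migrate to software, while moving `cmp` as well would push the utilization above 1, so it stays in hardware.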

48
System Synthesis
  • Synthesize individual HW and SW components
  • Here: generation of the interface and software from
    partitioned models
  • We know the program threads
  • Use coroutine scheme for program generation
  • Limit all external dependencies to the first and
    last statements of the threads to have convex
    threads
  • Concurrency might be reduced!
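
A coroutine scheme can be sketched with Python generators standing in for program threads. The sketch assumes a calling thread T1 that reads a value v and a looping thread T2 that runs once per loop iteration, with each `yield` playing the role of a detach. The function names and the shared dictionary are hypothetical, not generated code from the paper.

```python
def thread_t1(inputs, shared):
    """Calling thread: read v, then detach."""
    for value in inputs:
        shared["v"] = value
        yield "T1"


def thread_t2(shared):
    """Looping thread: one loop-body execution per activation."""
    while True:
        if shared["v"] > 0:
            shared["v"] -= 1      # <loop_body>; v = v - 1
            yield "T2"
        else:
            yield None            # nothing to do


def schedule(inputs):
    """Run T1 once per input and T2 until the counter reaches zero."""
    shared = {"v": 0}
    t1 = thread_t1(inputs, shared)
    t2 = thread_t2(shared)
    trace = []
    for tag in t1:
        trace.append(tag)
        while shared["v"] > 0:
            trace.append(next(t2))
    return trace


trace = schedule([2, 1])
```

Because all communication goes through `shared` at the points where a thread starts or yields, each thread body stays a straight-line unit, at the price of the reduced concurrency noted above.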

49
Rate Constraints and Software
  • The presence of dependencies on ND operations
  • SW implementation may not meet the data rate
    constraints on its I/O ports
  • Synchronization-related ND operations
  • Assign a context-switch delay to the respective
    wait operations
  • Check for marginal satisfiability of timing
    constraints
  • Unbounded loop-related ND operations
  • Estimate loop index values for marginal
    satisfiability analysis

50
Example C
  • To obtain a deterministic bound on the reaction
    rate of the calling thread T1.
  • Unroll the looping thread into a variable number
    of program threads
  • Scheduling overhead per new thread
  • Dynamic creation of the threads may lead to
    violation of processor utilization constraint
  • Overlap execution of T1 and T2 to ensure marginal
    timing constraint satisfiability
  • Remove the wait2 operation if T2 does not modify a
    common variable; use a buffer to maintain the
    reaction rate

51
No unbounded-delay operations
  • Simplify a SW component into one single program
    thread and a single data channel
  • All data transfers are serialized
  • Disadvantage of the approach: no support for
    reordering or branching

52
Example D
  • HW/SW interface
  • Data queues on each channel
  • A control FIFO
  • (holds the thread_ids in the order in which their
    input data arrives)
  • FIFO depth: the number of threads of execution
  • Nonpreemptive, priority-based scheduling with
    FIFO control
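
The control-FIFO idea can be sketched as follows. The class and method names are hypothetical, and the sketch models only the FIFO ordering: the hardware side enqueues a data item and the owning thread_id, and the software side runs threads to completion in arrival order. Priorities and the bounded FIFO depth are left out.

```python
from collections import deque


class ControlFifoScheduler:
    """Hypothetical nonpreemptive scheduler driven by a control FIFO."""

    def __init__(self, handlers):
        self.handlers = handlers                        # thread_id -> function
        self.data = {tid: deque() for tid in handlers}  # one data queue per channel
        self.control = deque()                          # thread_ids in arrival order

    def data_arrived(self, thread_id, item):
        """Hardware side: queue the data and record which thread it is for."""
        self.data[thread_id].append(item)
        self.control.append(thread_id)

    def run_next(self):
        """Software side: run the thread whose data arrived first."""
        if not self.control:
            return None
        thread_id = self.control.popleft()
        item = self.data[thread_id].popleft()
        return thread_id, self.handlers[thread_id](item)


scheduler = ControlFifoScheduler({
    "line": lambda length: f"line of length {length}",
    "circle": lambda radius: f"circle of radius {radius}",
})
scheduler.data_arrived("circle", 5)
scheduler.data_arrived("line", 1)
scheduler.data_arrived("circle", 7)
results = [scheduler.run_next() for _ in range(4)]
```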

53
Example E
  • Actual interconnection schematic between HW and
    SW for single data queue
  • Implement ControlFIFO and associated control
    logic as a part of the ASIC or in software

54
Marginal timing satisfiability analysis
The input rate at port p is variable, so the
reaction rate of T1 cannot be guaranteed.
55
Hardware-software interface
  • Data transfer from HW to SW must be explicitly
    synchronized
  • Polling strategy
  • Accommodation of different rates of execution
    among HW and SW components (and due to
    unbounded-delay ops)
  • A dynamic scheduling of different threads of
    execution
  • Use Control FIFO for scheduling
  • Data items are consumed in the order in which
    they are produced

56
Interface Protocol for Graphics Controller
Two threads generate line and circle
coordinates in software. The control FIFO holds the
IDs of the threads.