Other Applications of Dependence - PowerPoint PPT Presentation

About This Presentation
Title:

Other Applications of Dependence

Description:

Multi-valued logic: 0, 1, x, z. x = unknown state, z = conflict ... Connectivity. Continuous passing of information. Input port and output port ... – PowerPoint PPT presentation

Slides: 48
Provided by: fph7
Learn more at: https://www.cs.rice.edu
Transcript and Presenter's Notes

Title: Other Applications of Dependence


1
Other Applications of Dependence
  • Allen and Kennedy, Chapter 12

2
Overview
  • So far, we've discussed dependence analysis in
    Fortran
  • Dependence analysis can be applied to any
    language and translation context where arrays and
    loops are useful
  • Application to C and C++
  • Application to hardware design

3
Problems of C
  • C as typed assembly language versus Fortran as
    high performance language
  • C focuses more on ease of use and hardware
    operations
  • Post-increments, Pre-increments, Register
    variable
  • Fortran focus is on ease of optimization

4
Problems of C
  • In many cases, optimization is not desired
  • while (!(t = *p));
  • An optimizer would move the load of *p outside
    the loop
  • C as well as other new languages focus more on
    simplified software development, at the expense
    of optimizability
  • Use of new languages has expanded into areas
    where optimization is required
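
The polling loop on this slide is the kind of code where hoisting the load changes the meaning. In C, the `volatile` qualifier is the standard way to tell the compiler that every read must really happen. The following is a minimal sketch, not taken from the slides; `wait_for_status` is a hypothetical name and the "device register" is assumed to be an ordinary memory location.

```c
/* Spin until the location *p holds a nonzero value, then return it.
 * Because p points to a volatile object, the compiler must reload *p on
 * every iteration and may not move the load out of the loop. */
int wait_for_status(volatile int *p)
{
    int t;
    while (!(t = *p))
        ;   /* busy-wait: the body is intentionally empty */
    return t;
}
```

Without `volatile`, a compiler that sees no store to `*p` inside the loop is free to load it once, which would turn the wait into an infinite loop or a no-op.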

5
Problems of C
  • Pointers
  • Memory locations accessed by pointers are not
    clear
  • Aliasing
  • C does not guarantee that arrays passed into a
    subroutine do not overlap
  • Side-effect operators
  • Operators such as pre and post increment
    encourage a style where array operations are
    strength-reduced by the programmers
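
The aliasing problem can be made concrete with a small sketch. The function name `vadd` is hypothetical; the point is only that the same loop behaves differently when the two array arguments overlap, so a compiler that cannot rule out overlap must assume a loop-carried dependence.

```c
/* Adds b[i] into a[i] for n elements. If a and b may overlap, iteration i
 * can read a value written by an earlier iteration, so the loop carries a
 * dependence and cannot be safely vectorized without more information. */
void vadd(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i] + b[i];
}
```

With `int buf[4] = {1, 1, 1, 1}`, the call `vadd(buf + 1, buf, 3)` leaves `buf` as `{1, 2, 3, 4}` when run sequentially. A version that loaded all of `b` before storing anything would produce `{1, 2, 2, 2}` instead.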

6
Problems of C
  • Loops
  • Fortran loops provide values and restrictions
    that simplify optimization

7
Pointers
  • Two fundamental problems
  • A pointer variable can point to different memory
    locations during its use
  • A memory location can be accessed by more than
    one pointer variable at any given time, which
    produces aliases for the location
  • The result is much more difficult and expensive
    dependence testing

8
Pointers
  • Without knowledge of all possible references to
    an array, compilers must assume dependence
  • Analyzing the entire program to find dependences
    is possible, but still unsatisfactory
  • This leads to the use of compiler options / pragmas
  • Safe parameters
  • All pointer parameters to a function point to
    independent storage
  • Safe pointers
  • All pointer variables (parameter, local, global)
    point to independent storage
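
The slides describe "safe parameters" as a compiler option. The C99 `restrict` qualifier lets a programmer state a similar promise in the source, one pointer at a time. The sketch below is an illustration under that assumption, not the mechanism the slides refer to; `scale_add` is a hypothetical name.

```c
/* y[i] += k * x[i]. The restrict qualifiers promise the compiler that,
 * within this function, the storage reached through y is not also reached
 * through x, so iterations can be treated as independent. Calling this
 * with overlapping arrays is undefined behavior. */
void scale_add(float *restrict y, const float *restrict x, float k, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = y[i] + k * x[i];
}
```

The promise is unchecked: as with the compiler options on this slide, correctness becomes the caller's responsibility.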

9
Naming and Structures
  • In Fortran, a block of storage can be uniquely
    identified by a single name
  • Consider these constructs
  • p
  • *p
  • **p
  • *(p+4)
  • *(&p+4)

10
Naming and Structures
  • Troublesome structures, such as unions
  • Naming problem
  • What is the name of a.b?
  • Objects of different sizes can overlap the same
    storage
  • Reduce references to a common unit: the smallest
    unit of storage possible
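
A small sketch of why unions complicate naming: a store through one member changes the value seen through another member of a different size. The type `union word` and the function name are hypothetical, and the sketch assumes a platform that provides `uint32_t`.

```c
#include <stdint.h>

union word {
    uint32_t u32;     /* one 4-byte object ... */
    uint8_t  u8[4];   /* ... overlapping four 1-byte objects */
};

/* Returns 1 if a store to byte k of the union changes the value read
 * back through the 32-bit member. Dependence analysis has to treat the
 * two names as references to the same storage. */
int byte_store_changes_word(unsigned k)
{
    union word w;
    w.u32 = 0;
    w.u8[k] = 0xFF;
    return w.u32 != 0;
}
```

Reducing both references to byte granularity, as the last bullet suggests, makes the overlap explicit: `u32` covers bytes 0 through 3 and `u8[k]` covers byte `k`.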

11
Loops
  • Lack of constraints in C
  • Jumping into loop body is permitted
  • Induction variable (if there is one) can be
    modified in the body of the loop
  • Loop increment value may also be changed
  • Conditions controlling the initiation, increment,
    and termination of the loop have no constraints
    on their form

12
Loops
  • A C loop can be rewritten as a DO loop if
  • It has exactly one induction variable
  • That variable is initialized with the same
    value on all paths into the loop
  • The variable has one and only one increment
    within the loop
  • The increment is executed on every iteration
  • The termination condition matches the form a DO
    loop expects
  • There are no jumps into the loop body from outside
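
As an illustration of these conditions, here is a `while` loop that satisfies all of them next to the counted loop it can be rewritten as. Both function names are hypothetical, and the counted `for` loop stands in for a Fortran DO loop.

```c
/* Candidate loop: one induction variable i, initialized once before the
 * loop, incremented exactly once on every iteration, a simple bound test,
 * and no jumps into the body. */
long sum_while(const int *a, int n)
{
    long s = 0;
    int i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Equivalent counted loop, the C analogue of DO I = 0, N-1. */
long sum_counted(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

A loop that, say, incremented `i` only under a condition, or modified `n` in the body, would fail the checks and have to be left alone.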

13
Scoping and Statics
  • Create unique symbols for variables with same
    name but different scopes
  • Static variables
  • Which procedures have access to the variable can
    be determined from the scope information
  • If it contains an address, then the contents of
    that address can be modified by any other
    procedure

14
Problematic C Dialects
  • Use of pointers rather than arrays
  • Use of side effect operators
  • Complicates the work of optimizers
  • Need to be removed
  • Use of address and dereference operators

15
Problematic C Dialects
  • Requires enhancements in some transformations
  • Constant propagation
  • Treat address operators as constants and
    propagate them where essential
  • Replace a generic pointer inside a dereference
    with the actual address
  • Expression simplification and recognition
  • Need stronger recognition within expressions of
    which variable is actually the base variable

16
Problematic C Dialects
  • Conversion into array references
  • Useful to convert pointers into array references
  • Induction variable substitution
  • Problem with strength reduction of array
    references
  • Expanding side-effect operators also requires
    changes
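
The conversion can be sketched on a simple copy loop. The first function is written in the pointer style the slides describe; the second is what expanding the side-effect operators and substituting the induction variable would recover. Both names are hypothetical.

```c
/* Pointer style: the programmer has already strength-reduced the
 * subscripts into post-incremented pointers. */
void copy_ptr(int *dst, const int *src, int n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Array style: every access is base + i for a single induction
 * variable i, which is the form dependence tests expect. */
void copy_idx(int *dst, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}
```

In the second form, the subscripts `dst[i]` and `src[i]` can be compared directly by a dependence test, provided the two base arrays are known not to overlap.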

17
C Miscellaneous
  • Volatile variables
  • Functions with these variables are best left
    without optimization
  • Setjmp and Longjmp
  • Commonly used for error handling
  • Storing and loading current state of computation
    which is complex when optimization is performed
    and variables are allocated to registers
  • No optimization
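
A short sketch of why `setjmp`/`longjmp` interact badly with register allocation. The C standard leaves the value of a non-volatile local indeterminate after `longjmp` if it was modified after the `setjmp` call, so the counter below is declared `volatile`. The names `recover_point`, `fail`, and `attempts_before_success` are hypothetical.

```c
#include <setjmp.h>

static jmp_buf recover_point;

/* Simulates an error deep in a computation by jumping back to the
 * saved state. */
void fail(void)
{
    longjmp(recover_point, 1);
}

/* Retries until the third attempt and returns the attempt count. */
int attempts_before_success(void)
{
    /* volatile keeps the counter in memory, so its value survives the
     * longjmp; a register copy could be rolled back to a stale value. */
    volatile int attempts = 0;

    if (setjmp(recover_point) != 0) {
        /* control arrives here again after each longjmp */
    }
    attempts++;
    if (attempts < 3)
        fail();
    return attempts;
}
```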

18
C Miscellaneous
  • varargs and stdarg
  • Variable number of arguments
  • No optimization

19
Hardware Design Overview
  • Today, most hardware design is language-based
  • Textual description of hardware in languages that
    are similar to those used to develop software
  • Level of abstraction is moving from low-level
    detailed implementation toward high-level
    behavioral specification
  • Key factor: compiler technology

20
Hardware Design Overview
  • Four levels of abstraction
  • Circuit / Physical level
  • Diagrams of electronic components
  • Logic level
  • Boolean equations
  • Register transfer level (RTL)
  • Control state transitions and data transfers,
    timing
  • Synthesis: conversion from RTL to its
    implementation
  • System level
  • Concentrate on behavior
  • Behavioral synthesis

21
Hardware Design
  • Behavior Synthesis is really a compilation
    problem
  • Two fundamental tasks
  • Verification
  • Implementation
  • Simulation of hardware is slow

22
Hardware Description Languages
  • Verilog and VHDL
  • Extensions in Verilog
  • Multi-valued logic: 0, 1, x, z
  • x = unknown state, z = conflict
  • E.g. division by zero produces the x state
  • Operations with x result in the x state -> cannot
    be executed directly
  • Reactivity
  • Propagation of changes automatically
  • always statement -> continuous execution
  • @ operator -> blocks execution until one of the
    operands changes in value
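
The reason x values cannot be executed directly is that native integer arithmetic has no unknown state, so a simulator has to carry it alongside the value. A minimal sketch of that idea in C follows; the type `xint` and its helpers are hypothetical and model only a whole-value unknown, not per-bit x or z.

```c
typedef struct {
    int val;    /* meaningful only when is_x == 0 */
    int is_x;   /* nonzero: the value is in the unknown (x) state */
} xint;

xint xint_known(int v)   { xint r = { v, 0 }; return r; }
xint xint_unknown(void)  { xint r = { 0, 1 }; return r; }

/* Arithmetic on an unknown operand yields an unknown result. */
xint xint_add(xint a, xint b)
{
    if (a.is_x || b.is_x)
        return xint_unknown();
    return xint_known(a.val + b.val);
}

/* Division by zero produces the x state instead of trapping. */
xint xint_div(xint a, xint b)
{
    if (a.is_x || b.is_x || b.val == 0)
        return xint_unknown();
    return xint_known(a.val / b.val);
}
```

Every operation pays for the extra test, which is the overhead the later two-state versus four-state slides try to avoid.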

23
Verilog
  • Reactivity
  • always @(b or c)
  • a = b + c;
  • Objects
  • Specific area of silicon
  • Completely separate area on the chip
  • Connectivity
  • Continuous passing of information
  • Input port and output port

24
Verilog
  • Connectivity
  • module add(a,b,c);
  • output a;
  • input b, c;
  • integer a, b, c;
  • always @(b or c)
  • a = b + c;
  • endmodule

25
Verilog
  • Instantiation
  • Verilog only allows static instantiation
  • integer x, y, z;
  • add adder1(x,y,z);
  • Vector operations
  • Viewing other data structures as vector of scalars

26
Verilog
  • Advantages
  • No aliasing
  • Restriction of form of subscripts
  • Entire hardware design given to compilers at one
    time

27
Verilog
  • Disadvantages
  • Non-procedural continuation semantics
  • Lack of loops
  • Loops are implicitly represented by always blocks
    and the scheduler
  • Size

28
Optimizing simulation
  • Philosophy
  • Increases level of abstraction
  • Opts for less details
  • Inlining modules
  • HDLs have two properties that make module
    inlining simpler
  • Whole design is reachable at one time
  • Recursion is not permitted

29
Optimizing simulation
  • Execution ordering
  • The order in which statements are executed can
    have a dramatic effect on efficiency
  • Fast in hardware, but not in software
  • Grouping increases performance
  • Execute blocks in topological order based on the
    dependence graph of individual array elements
  • No memory overhead

30
Dynamic versus Static Scheduling
  • Dynamic scheduling
  • Dynamically track changes in values and propagate
    them
  • Mimics hardware
  • Overhead of change checks
  • Static scheduling
  • Blindly sweeps through all values for all objects
    regardless of any changes
  • No need for change checks

31
Dynamic versus Static Scheduling
  • If the circuit is highly active, static
    scheduling is more suitable
  • In general, using dynamic scheduling guided by
    static analysis provides the best results
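
The trade-off can be sketched on a two-block circuit, `a = b + c` feeding `d = a * 2`. The `circuit` struct, the two step functions, and the `evals` counter are all hypothetical; the sketch assumes the state starts out consistent (all zeros).

```c
typedef struct {
    int b, c;     /* inputs */
    int a, d;     /* a = b + c, d = a * 2 */
    int evals;    /* number of block evaluations performed so far */
} circuit;

/* Static schedule: evaluate every block in topological order on every
 * step, with no change checks. */
void step_static(circuit *s, int b, int c)
{
    s->b = b;
    s->c = c;
    s->a = s->b + s->c;  s->evals++;
    s->d = s->a * 2;     s->evals++;
}

/* Dynamic schedule: evaluate a block only when one of its inputs has
 * changed, at the cost of a comparison per block. */
void step_dynamic(circuit *s, int b, int c)
{
    int inputs_changed = (b != s->b) || (c != s->c);   /* change check */
    s->b = b;
    s->c = c;
    if (inputs_changed) {
        int old_a = s->a;
        s->a = s->b + s->c;  s->evals++;
        if (s->a != old_a) {                           /* change check */
            s->d = s->a * 2; s->evals++;
        }
    }
}
```

For the input sequence (1,2), (1,2), (2,1), (3,3), both versions end with `a == 6` and `d == 12`; the static version performs 8 block evaluations and the dynamic one 5. When nearly every input changes on every step, the checks stop paying for themselves.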

32
Fusing always blocks
  • High cost of change checks motivates fusing
    always blocks
  • Output of a design may change

33
Vectorizing always block
  • Regroup low-level operations to recover
    higher-level abstractions
  • Vectorizing the bit operations
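
A minimal sketch of what vectorizing bit operations buys a simulator: eight 1-bit AND gates described separately versus the single byte-wide operation they can be regrouped into. The function names are hypothetical.

```c
#include <stdint.h>

/* Bit-level form: eight separate 1-bit AND gates, one per iteration. */
uint8_t and_bitwise(uint8_t x, uint8_t y)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned xi = (x >> i) & 1u;
        unsigned yi = (y >> i) & 1u;
        r |= (uint8_t)((xi & yi) << i);
    }
    return r;
}

/* Vectorized form: the same eight gates as one byte-wide operation. */
uint8_t and_vector(uint8_t x, uint8_t y)
{
    return x & y;
}
```

The regrouping is legal because no bit position depends on any other, which is the same independence condition that loop vectorization checks.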

34
Two state versus four state
  • Extra overhead in four state hardware
  • Few people like hardware that enters unknown
    states
  • Two state logic can be 3-5x faster
  • Use two-valued logic wherever possible
  • Finding which parts can execute in two-state
    logic is difficult
  • Use interprocedural analysis

35
Two state versus four state
  • Testing for unknowns is cheap: 2-3 instructions
  • Check for unknowns, but default quickly to
    two-state execution
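
One way to picture "check for unknowns but default quickly" is a four-state vector stored as two bit planes, with a fast path taken when no unknown bits are present. The encoding below (`val` plus an `unk` mask) is an assumption made for illustration, not the representation used by any particular simulator. The slow path follows Verilog's bitwise AND table, where a known 0 forces the result to 0 even against x.

```c
#include <stdint.h>

typedef struct {
    uint32_t val;   /* bit value where the matching unk bit is 0 */
    uint32_t unk;   /* 1 bits mark positions holding x or z */
} vec4;

vec4 and_vec4(vec4 a, vec4 b)
{
    vec4 r;
    if ((a.unk | b.unk) == 0) {
        /* Fast path: one OR and one compare guard plain two-state AND. */
        r.val = a.val & b.val;
        r.unk = 0;
        return r;
    }
    /* Slow path: full four-state evaluation, bit by bit in parallel. */
    uint32_t zero = (~a.val & ~a.unk) | (~b.val & ~b.unk); /* either is a known 0 */
    uint32_t one  = (a.val & ~a.unk) & (b.val & ~b.unk);   /* both are known 1 */
    r.val = one;
    r.unk = ~(zero | one);                                 /* everything else is x */
    return r;
}
```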

36
Rewriting block conditions
  • always @(posedge(clk)) begin
  • sum = op1 ^ op2 ^ c_in;
  • c_out = (op1 & op2) | (op2 & c_in) | (c_in & op1);
  • end
  • always @(op1 or op2 or c_in) begin
  • t_sum = op1 ^ op2 ^ c_in;
  • t_c_out = (op1 & op2) | (op2 & c_in) | (c_in & op1);
  • end
  • always @(posedge(clk)) begin
  • sum = t_sum;
  • c_out = t_c_out;
  • end

37
Basic Optimizations
  • Raise level of abstraction
  • Constant propagation and dead code elimination
  • Common subexpression elimination

38
Synthesis Optimization
  • Goal is to insert the details
  • Analogous to standard compilers
  • Harder than standard compilers
  • Not targeted towards a fixed target
  • No single goal. Minimize cycle time, area, power
    consumption

39
Basic Framework
  • Selection outweighs scheduling
  • Analogous to CISC
  • Body of tree matching algorithms
  • Needs constraints

40
Loop Transformations
  • for(i=0; i<100; i++) {
  •   t[i] = 0;
  •   for(j=0; j<3; j++)
  •     t[i] = t[i] + (a[i-j]>>2);
  • }
  • for(i=0; i<100; i++) {
  •   o[i] = 0;
  •   for(j=0; j<100; j++)
  •     o[i] = o[i] + m[i][j] * t[j];
  • }

41
Loop Transformations
  • for(i=0; i<100; i++)
  •   t[i] = 0;
  • for(i=0; i<100; i++)
  •   o[i] = 0;
  • for(i=0; i<100; i++)
  •   for(j=0; j<3; j++)
  •     t[i] = t[i] + (a[i-j] >> 2);
  • for(i=0; i<100; i++)
  •   for(j=0; j<100; j++)
  •     o[i] = o[i] + m[i][j] * t[j];

42
Loop Transformations
  • for(i=0; i<100; i++)
  •   o[i] = 0;
  • for(i=0; i<100; i++) {
  •   t[i] = 0;
  •   for(j=0; j<3; j++)
  •     t[i] = t[i] + (a[i-j] >> 2);
  •   for(j=0; j<100; j++)
  •     o[j] = o[j] + m[j][i] * t[i];
  • }
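
The sequence on these slides (distribution, then interchange and fusion) can be checked by running the first and last versions side by side. The sketch below is a hedged reconstruction, not the authors' code: the function names are hypothetical, `N` and `TAPS` stand for the bounds 100 and 3 as they appear in the transcript, and `a` is assumed to point into a buffer with at least `TAPS - 1` valid elements before index 0. Inputs are assumed non-negative so that `>>` is well defined.

```c
#define N    100
#define TAPS 3

/* Original form: a filter loop nest followed by a matrix-vector product. */
void filter_original(const int *a, int m[N][N], int o[N])
{
    int t[N];
    for (int i = 0; i < N; i++) {
        t[i] = 0;
        for (int j = 0; j < TAPS; j++)
            t[i] = t[i] + (a[i - j] >> 2);
    }
    for (int i = 0; i < N; i++) {
        o[i] = 0;
        for (int j = 0; j < N; j++)
            o[i] = o[i] + m[i][j] * t[j];
    }
}

/* After distribution, interchange of the second nest, and fusion: each
 * t[i] is consumed in the same outer iteration that produces it. */
void filter_fused(const int *a, int m[N][N], int o[N])
{
    int t[N];
    for (int i = 0; i < N; i++)
        o[i] = 0;
    for (int i = 0; i < N; i++) {
        t[i] = 0;
        for (int j = 0; j < TAPS; j++)
            t[i] = t[i] + (a[i - j] >> 2);
        for (int j = 0; j < N; j++)
            o[j] = o[j] + m[j][i] * t[i];
    }
}
```

Because `t[i]` is now produced and consumed in one iteration, the array `t` no longer needs to live in memory between loops, which is what makes the transformation attractive for synthesized hardware.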

43
Loop Transformation
  • for(i=0; i<100; i++)
  •   o[i] = 0;
  • a0 = a[0];
  • a1 = a[-1];
  • a2 = a[-2];
  • a3 = a[-3];
  • for(i=0; i<100; i++) {
  •   t = 0;
  •   t = t + (a0>>2) + (a1>>2) + (a2>>2)
      + (a3>>2);
  •   a3 = a2; a2 = a1; a1 = a0; a0 = a[i+1];
  •   for(j=0; j<100; j++)
  •     o[j] = o[j] + m[j][i] * t;
  • }
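
The last step replaces the array reads with a sliding window of scalars, which in hardware become registers. The sketch below isolates that step. It assumes four taps, matching the four scalars a0 through a3 on this slide; the function names and `LEN` are hypothetical, and `a` is assumed to point into a buffer with three valid elements before index 0 and one after index `LEN - 1`.

```c
#define LEN 100

/* Reference form: four array reads per output element. */
void taps_array(const int *a, int t[LEN])
{
    for (int i = 0; i < LEN; i++) {
        t[i] = 0;
        for (int j = 0; j < 4; j++)
            t[i] += a[i - j] >> 2;
    }
}

/* Scalar-replaced form: one new array read per iteration; the other
 * three values are carried forward in scalars. */
void taps_scalar(const int *a, int t[LEN])
{
    int a0 = a[0], a1 = a[-1], a2 = a[-2], a3 = a[-3];
    for (int i = 0; i < LEN; i++) {
        t[i] = (a0 >> 2) + (a1 >> 2) + (a2 >> 2) + (a3 >> 2);
        a3 = a2;
        a2 = a1;
        a1 = a0;
        a0 = a[i + 1];   /* reads a[LEN] on the final iteration */
    }
}
```

Memory traffic on `a` drops from four reads per iteration to one, which is the point of the memory reduction techniques listed later.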

44
Control and Data Flow
  • Von Neumann architecture
  • Data movement among memory and registers
  • Control flow encapsulated in the program counter
    and effected with branches
  • Synthesized hardware
  • Data movement among functional units
  • Control flow is which functional unit should be
    active on what data at which time steps

45
Control and Data Flow
  • Wires
  • Immediate transfer
  • Latches
  • Values held throughout one clock cycle
  • Registers
  • Static variables in C
  • Held for one or more clock cycles
  • Memories

46
Memory Reduction
  • Memory access is slow compared to unit access
  • Application of techniques
  • Loop interchange
  • Loop fusion
  • Scalar replacement
  • Strip mining
  • Unroll and jam
  • Prefetching

47
Summary
  • Not limited to Fortran
  • Have other applications
  • Early stage of research