A Genetic/Local Search Hybrid Architecture for VLSI Circuit Partitioning

By Shawki Areibi, School of Engineering, University of Guelph, Guelph, Ontario, Canada

1
A Genetic/Local Search Hybrid Architecture for
VLSI Circuit Partitioning
  • By Shawki Areibi
  • University of Guelph
  • School of Engineering
  • Guelph, Ontario, Canada

2
Outline
  • Introduction
  • Circuit Layout
  • Motivation
  • Background
  • Methodology
  • Design Challenges
  • Hardware Approach
  • Results
  • Future Work

3
VLSI Design Circuit Layout
Specification
Physical Design Cycle
Partitioning
Divide a circuit into smaller parts.
Placement
Place modules on a chip
Routing
Determine how the wires connect the modules
Extraction Verification
Fabrication
4
Motivation: Interconnect Delay
  • Prior to 1.0 µ: gate delay dominates
  • More than 10 million transistors
  • After 1.0 µ: interconnect delay dominates

[Figure: typical gate delay and interconnect delay in ns (axis marks 0.1, 1.0, 10) plotted against minimum feature size from 2.0 µ down to 0.35 µ]
5
Motivation Hardware Accelerators
  • Complexity and size of circuits are rapidly
    increasing (3 billion transistors by 2008!)
  • This places a demand on EDA for faster and more
    efficient techniques for physical design
    automation
  • It will be nearly impossible for even the
    fastest computers to solve these problems
    effectively within an acceptable time frame
  • One possible solution is in the form of hardware
    accelerators
  • This research investigates the development of a
    hardware-accelerated Memetic Algorithm for
    circuit partitioning

6
Circuit Partitioning
[Figure: example circuit of six modules (M0 to M5) joined by five nets (Net 1 to Net 5) and split into Block 0 and Block 1, shown with a 0/1 table of modules against nets and the resulting objective value (uncut nets)]
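The objective named on this slide can be sketched in a few lines. The netlist below is a hypothetical example, not the one drawn on the slide, and the function name `count_uncut_nets` is an assumption made for illustration. A net counts as uncut when all of its modules lie in the same block.

```python
from typing import Dict, List


def count_uncut_nets(nets: Dict[str, List[int]], assignment: List[int]) -> int:
    """Count the nets whose modules all sit in the same block."""
    uncut = 0
    for modules in nets.values():
        blocks = {assignment[m] for m in modules}
        if len(blocks) == 1:
            uncut += 1
    return uncut


# Hypothetical netlist (NOT the slide's): net name -> modules on that net.
example_nets = {
    "net1": [0, 1],
    "net2": [1, 2, 3],
    "net3": [0, 2],
    "net4": [3, 4],
    "net5": [4, 5],
}

# example_assignment[i] is the block (0 or 1) holding module i.
example_assignment = [0, 0, 0, 1, 1, 1]

uncut = count_uncut_nets(example_nets, example_assignment)
cut = len(example_nets) - uncut
print(f"uncut nets: {uncut}, cut nets: {cut}")
```

In this example only net2 spans both blocks, so four of the five nets are uncut.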
7
Heuristic/Meta Heuristic Techniques
  • Local Search
  • Single point based heuristic
  • Swap/Move based technique
  • Iteratively improves solution
  • Exploits the solution space
  • Genetic Algorithm
  • Population based heuristic
  • Based on biological reproduction
  • survival of the fittest
  • Explores solution space

8
Genetic vs Local Search
[Figure: two panels, Genetic Algorithm and Local Search, each marking points P1, P2, and C1; labels include "Crossover" and "Not Global Minimum"]
9
Research Goals
  • Hardware Implementation of GA
  • Hardware Implementation of Local Search
  • Achieve Speedup
  • Investigate Hybrid Algorithms Techniques
  • Improve Performance
  • Investigate the suitability of high-level
    languages, e.g. Handel-C, for designing systems

10
Design Restrictions
  • Architecture must
  • Fit common FPGA devices
  • Adapt to other optimization problems
  • TSP
  • 0-1 Knapsack problem
  • Handle large circuits
  • Have user programmable parameters from host PC

11
MCNC Benchmark Suite
[Table: MCNC benchmark circuits; only the values 25,114 and 125 survive from the original table]
12
Celoxica Handel-C
  • High-level language based on ISO/ANSI-C
  • Eliminates need for retraining software engineers
  • Generates VHDL or EDIF code
  • Supports most FPGA devices
  • Optimizes second-party PAR programs

13
Approach
  • Explore the most efficient design
  • Achieve increased performance
  • pipelining and parallelization
  • Divide the tasks into separate but concurrent
    components

FPGA Chip
Different Tasks of algorithm
14
Hardware Parallelism
  • Different sections of design operate concurrently
  • Multiple tasks completed within a single clock
    cycle

F(x) = (2 × 3) + (2 + 5) + (4 / 2)
[Figure: Basic Computers compared with Hardware Design, with the operations labelled Multiplication, Addition, Division, Addition]
15
Hardware Pipelining
  • Assembly line
  • Different stages processed at the same time
  • Number of stages determines throughput

[Figure: Basic Sequential Computers run the stages of Task 1 and Task 2 one after another; the Pipelined Hardware Design overlaps the stages of Tasks 1 to 4 and is labelled "Three times the throughput"]
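The throughput claim can be checked with a short worked calculation. This is a sketch under stated assumptions: every stage takes exactly one clock cycle, and each task passes through every stage. The function names are illustrative and do not come from the presentation.

```python
def sequential_cycles(num_tasks: int, num_stages: int) -> int:
    """Cycles needed when each task finishes all stages before the next starts."""
    return num_tasks * num_stages


def pipelined_cycles(num_tasks: int, num_stages: int) -> int:
    """Cycles needed when a new task enters the pipeline every cycle."""
    # The first task needs num_stages cycles; each later task adds one more.
    return num_stages + (num_tasks - 1)


def speedup(num_tasks: int, num_stages: int) -> float:
    """Ratio of sequential to pipelined cycle counts."""
    return sequential_cycles(num_tasks, num_stages) / pipelined_cycles(num_tasks, num_stages)


for tasks in (4, 1000):
    print(tasks, "tasks, 3 stages:", round(speedup(tasks, 3), 3))
```

With three stages the speedup is 2.0 for four tasks and approaches 3 as the number of tasks grows, which matches the "three times the throughput" label for a long stream of tasks.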
16
Bitwise Representation
  • To improve timing, each bit of the word
    represents a cell within the solution
  • Multiple cells manipulated in a single cycle

[Figure: bit strings for Parent 0 and Parent 1 (0 = cell in partition 0, 1 = cell in partition 1) combined through a Uniform Mask to give the Offspring]
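The bitwise representation makes uniform crossover a pair of word-wide logic operations. The sketch below is an illustration in Python, not the Handel-C source. It assumes that a mask bit of 1 selects the bit from Parent 1 and a mask bit of 0 selects the bit from Parent 0; the slides do not state which convention the hardware uses. The bit patterns are made-up examples.

```python
from typing import Tuple


def uniform_crossover(parent0: int, parent1: int, mask: int, width: int) -> Tuple[int, int]:
    """Return two children built from two parents and a uniform mask.

    child0 takes parent1's bit where the mask is 1 and parent0's bit elsewhere;
    child1 is the complementary choice. All cells in the word are processed
    by the same handful of bitwise operations.
    """
    word = (1 << width) - 1            # keep results within `width` bits
    inv_mask = ~mask & word
    child0 = (parent0 & inv_mask) | (parent1 & mask)
    child1 = (parent1 & inv_mask) | (parent0 & mask)
    return child0, child1


# Made-up 5-cell example (bit i = partition of cell i).
p0 = 0b01100
p1 = 0b10110
m = 0b01101
c0, c1 = uniform_crossover(p0, p1, m, width=5)
print(format(c0, "05b"), format(c1, "05b"))
```

In hardware the same expression maps to one layer of AND/OR gates per bit, which is why a whole word of cells can be handled in a single cycle.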
17
Aim of Genetic Algorithm in Hardware
Child 1
Pipelined Flow
Child 0
18
Memory Issues
  • External ram limits the architecture
  • Semaphores add one clock cycle to memory access
  • Memory intensive routines execute sequentially

[Figure: the Fitness Calculation, Crossover Routine, and Repair Routine reach External Memory through a Switch]
19
Memory Problems
[Figure: timing diagram of five successive memory requests, with rows for Previous clock, Semaphore, Memory Access, and Total clock]
20
Memory Solution 1
[Figure: timing diagram of two memory requests, with rows for Previous clock, Semaphore, Memory Access, and Total clock]
21
Fitness Memory
  • Problem with parallelization and memory
  • Limits parallelization

[Figure: two Fitness Modules, each with its own Fitness Calculation, connected to one Benchmark Memory]
22
Genetic Algorithm Timing Results
Results gathered from 5 trial runs
Areibi Software: Sun Blade 3000, 900 MHz UltraSPARC III
Bitwise Software: HP Workstation 2100, 2.4 GHz Intel Pentium 4
23
Genetic Algorithm Solution Results
Results gathered from 5 trial runs
Areibi Software: Sun Blade 3000, 900 MHz UltraSPARC III
24
Genetic Algorithm Discussion
  • Execution Speed Limitation
  • Sequential Fitness Calculation
  • Operating at ¼ of the external clock rate
  • Handel-C's use of semaphores
  • Potential Design Solution Quality Limitation
  • Difference in Crossover techniques
  • Handel-C (Uniform)
  • Areibi Software (2-Point)
  • Effect of Random Number Generator

25
  • Question: Is it possible to improve the solution
    generated by the Genetic Algorithm?
  • Answer: YES!
  • Genetic Algorithms are known to be good at
    exploring the solution space but are weak at
    fine-tuning the solutions
  • Solution: Insert a simple hill-climbing Local
    Search to improve upon the solutions

26
Local Search Algorithm
  • The proposed Local Search forces each net to be
    contained exclusively within one partition

[Figure: modules M0, M1, M2, and M5 with Net 1 to Net 5 divided between Partition 0 and Partition 1, the objective value (uncut nets), and 0/1 tables of cell data for Partition 0 and Partition 1]
27
Update Partition Data
  • Backup Data and Apply Next Move
  • Determine Modules Connected to Net
  • Determine Which Other Nets are Connected to this
    Module
  • Determine Status of these Nets

[Figure: netlist matrix of modules M0 to M5 against Net1 to Net5]
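The update steps above can be sketched as a single move function. This is a minimal illustration under assumptions, not the presented architecture: the acceptance rule (keep the move only if fewer of the affected nets are cut and the partition sizes stay within `max_imbalance`), the function names, and the netlist are all hypothetical.

```python
from typing import Dict, List


def net_is_cut(modules: List[int], partition: List[int]) -> bool:
    """A net is cut when its modules sit in more than one partition."""
    return len({partition[m] for m in modules}) > 1


def try_force_net(nets: Dict[str, List[int]], partition: List[int],
                  net: str, target: int, max_imbalance: int) -> bool:
    """Try to pull every module on `net` into partition `target`."""
    backup = list(partition)                                   # backup data
    moved = [m for m in nets[net] if partition[m] != target]   # modules on the net
    # Other nets connected to the moved modules (includes `net` itself).
    affected = [n for n, mods in nets.items() if any(m in moved for m in mods)]
    before = sum(net_is_cut(nets[n], partition) for n in affected)
    for m in moved:                                            # apply the move
        partition[m] = target
    after = sum(net_is_cut(nets[n], partition) for n in affected)  # net status
    imbalance = abs(len(partition) - 2 * sum(partition))       # |size0 - size1|
    if after < before and imbalance <= max_imbalance:
        return True
    partition[:] = backup                                      # undo the move
    return False


# Hypothetical netlist and starting partition.
nets = {"net1": [0, 1], "net2": [1, 2, 3], "net3": [0, 2],
        "net4": [3, 4], "net5": [4, 5]}
part = [0, 0, 1, 1, 1, 1]
accepted = try_force_net(nets, part, "net3", target=0, max_imbalance=2)
rejected = try_force_net(nets, part, "net2", target=0, max_imbalance=2)
print(accepted, rejected, part)
```

Only the nets touching the moved modules are re-examined, which is the point of the "determine which other nets are connected" step: the rest of the netlist cannot change status.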
28
Sequential issues
Select Next Move
Copy Solution
Update Net Info
Block Ram
Block Ram
Block Ram
Block Ram
29
Handel-C Local Search Timing Results
Results gathered from 5 trial runs
Areibi Software: Sun Blade 3000, 900 MHz UltraSPARC III
Bitwise Software: HP Workstation 2100, 2.4 GHz Intel Pentium 4
30
Local Search Discussion
  • Performance Improvement
  • Handel-C achieves a 2.1 times speedup over software
  • Improvement comes from handling the balancing
    criteria in parallel
  • which accounts for 85% of the total software
    execution time
  • Cause of bottlenecks
  • Creating backup copies of the original data
  • Limitation due to memory

31
Memetic Algorithm
  • Two Memetic Algorithms are developed from the
    Genetic Algorithm and Local Search architectures
  • Exhaustive Memetic
  • Applies the Local Search to a random pool of
    individuals from the final Genetic Algorithm
    population
  • Forces the individuals to local maxima
  • Intermediate Memetic
  • Applies the Local Search to a few individuals
    after every X generations of the Genetic
    Algorithm
  • Attempts to steer the population towards fitter
    solutions
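The two variants differ only in when the Local Search runs, which the following sketch tries to make visible. Everything beyond that is an assumption for illustration: the fitness function (uncut nets minus an imbalance penalty), the single-bit-flip hill climber standing in for the Local Search, the tournament selection, and all parameter names and defaults.

```python
import random
from typing import List

Nets = List[List[int]]


def fitness(ind: List[int], nets: Nets) -> int:
    """Uncut nets minus an imbalance penalty (the penalty is an assumption)."""
    uncut = sum(len({ind[m] for m in mods}) == 1 for mods in nets)
    return uncut - abs(len(ind) - 2 * sum(ind))


def hill_climb(ind: List[int], nets: Nets) -> List[int]:
    """Flip single cells in place while any flip improves fitness."""
    best = fitness(ind, nets)
    improved = True
    while improved:
        improved = False
        for i in range(len(ind)):
            ind[i] ^= 1
            f = fitness(ind, nets)
            if f > best:
                best, improved = f, True
            else:
                ind[i] ^= 1            # undo a non-improving flip
    return ind


def ga_generation(pop: List[List[int]], nets: Nets, rng: random.Random) -> List[List[int]]:
    """Elitism, binary tournaments, uniform crossover, occasional mutation."""
    new_pop = [max(pop, key=lambda p: fitness(p, nets))[:]]
    while len(new_pop) < len(pop):
        p0 = max(rng.sample(pop, 2), key=lambda p: fitness(p, nets))
        p1 = max(rng.sample(pop, 2), key=lambda p: fitness(p, nets))
        child = [a if rng.random() < 0.5 else b for a, b in zip(p0, p1)]
        if rng.random() < 0.1:
            child[rng.randrange(len(child))] ^= 1
        new_pop.append(child)
    return new_pop


def memetic(nets: Nets, n_cells: int, mode: str, generations: int = 30,
            pop_size: int = 10, every: int = 5, pool: int = 3, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_cells)] for _ in range(pop_size)]
    for g in range(1, generations + 1):
        pop = ga_generation(pop, nets, rng)
        # Intermediate: refine a few individuals after every `every` generations.
        if mode == "intermediate" and g % every == 0:
            for ind in rng.sample(pop, pool):
                hill_climb(ind, nets)
    # Exhaustive: refine a random pool taken from the final population.
    if mode == "exhaustive":
        for ind in rng.sample(pop, pool):
            hill_climb(ind, nets)
    return max(pop, key=lambda p: fitness(p, nets))


example_nets = [[0, 1], [1, 2, 3], [0, 2], [3, 4], [4, 5]]
for mode in ("exhaustive", "intermediate"):
    best = memetic(example_nets, 6, mode)
    print(mode, best, fitness(best, example_nets))
```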

32
Algorithm Solution Quality
33
Algorithm Timing Results
34
Memetic Algorithm Discussion
  • Faster than software GA
  • Solution quality not equal to that of the software GA
  • Exhaustive Memetic Algorithm
  • Equal solution quality to Local Search
  • nearly twice the execution time.
  • Intermediate Memetic architecture
  • weaker results than both Local Search and
    Exhaustive Memetic
  • significantly more execution time.

35
CAD Algorithm Results
  • Genetic Algorithm
  • five times faster than traditional software
  • 85% of the solution quality of traditional software
  • 2 times slower than bitwise software
  • Local Search
  • 2.1 times faster than bitwise LS software
  • Memetic Algorithms
  • Slightly weaker results than traditional GA
    software
  • Solution qualities equal to pure Local Search
    architecture

36
Current Work
  • Handel-C Local Search
  • Handel-C Genetic Algorithm
  • Handel-C Memetic Algorithm
  • Investigated Handel-C vs VHDL
Future Work
  • Implement the Genetic Algorithm to further
    investigate findings
  • Investigate effects of Crossover and RNG
  • Improve Fitness Calculation (Pipeline/Parallelism)
  • Incorporate Memory to eliminate repetitive
    searching
37
Conclusion
  • Development of a Handel-C implementation of a
    Memetic Algorithm to incorporate a new local
    search methodology for circuit partitioning
  • Development of a VHDL and Handel-C implementation
    of a Local Search algorithm for circuit
    partitioning including a detailed comparison
    between the two approaches
  • Comparison of the true speed performance of
    Handel-C architectures with software implementations

38
  • Thank You

39
Why Develop in Handel-C vs VHDL
  • Celoxica's highlights for Handel-C
  • Software engineers design hardware without
    retraining
  • Rapid development of multi-million gate FPGAs
  • Predictable and controllable hardware behavior
  • Enables efficient use of available hardware
  • Decrease in development time (factor of 3 to 4)
  • Questions
  • Are these claims accurate? And is it practical
    for designing hardware?

40
Handel-C Findings
  • Advantages
  • Development time: Handel-C (1 week, 1,400 lines of
    code) vs VHDL (5 weeks, 8,000 lines of code)
  • Ease of learning language
  • Disadvantages
  • Memory management runs at ¼ of the external clock rate
  • Resources used: 1.2 times the un-optimized VHDL
    design
  • Slower execution time than the VHDL design

41
High vs Low level Languages
Performance results of the Local Search
Architectures
  • VHDL requires 55% of the execution time needed by
    Handel-C while operating at half the frequency
  • Reasons for improvement
  • No Semaphores
  • VHDL operates at the full clock rate
  • Architecture is optimized to perform specific task

42
Handel-C Discussion
  • Handel-C
  • There are three areas of concern in most hardware
    designs
  • Minimize development time
  • Increase execution Performance
  • Decrease size of design
  • The only benefit to the Handel-C language is
    minimizing development time (designing and
    debugging)

43
Random Number Generator
Add and Mult
LFSR
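The slide names two generator types, add-and-multiply and LFSR, without giving their parameters. As an illustration of the second, here is a 16-bit Fibonacci LFSR. The width, the tap positions (16, 14, 13, 11), and the seed are assumptions chosen because this configuration is a well-known maximal-length example; they are not taken from the presentation.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step (taps 16, 14, 13, 11)."""
    # XOR the tapped bits to form the new bit, then shift it in at the top.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed: int = 0xACE1) -> int:
    """Count the steps until the register returns to its seed."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


print(lfsr16_period())
```

A shift and a few XOR gates are all the hardware this needs, which makes an LFSR a natural fit for an FPGA. The all-zero state is a fixed point, so the seed must be non-zero.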
44
Initial Genetic Algorithm Fitness values (prim2)
Hardware
Software
  • Initial Population
  • Mean 1050.23
  • SD 23.69
  • Best 1111.6
  • Worst 996.4
  • Final Value
  • Mean 1742.9
  • SD 6.927
  • Best 1753.6
  • Worst 1722.6

  • Initial Population
  • Mean 1085.06
  • SD 26.32
  • Best 1147
  • Worst 1011.0
  • Final Value
  • Mean 2536.7
  • SD 15.114
  • Best 2574.4
  • Worst 2493.6
45
Reconfigurable memory
  • Cause a break in the pipeline
  • Perform the same task as the current sequential
    fitness calculation except it would allow for
    larger benchmarks
  • Beneficial if configuring block rams (eliminate
    the need for the 4 clock read/write)

46
Other Optimization Problems to Adapt
Repair Module
Fitness Module
Replacement
Repair Module
Fitness Module
47
Future Work
  • Investigate the difference between the Genetic
    Algorithm software (Areibi01) and the Handel-C
    architecture
  • Optimize the Genetic Algorithm by implementing in
    VHDL
  • Adapt the current design to perform two fitness
    calculations using a single memory read
  • Divide the Fitness Calculation into numerous
    pipeline stages to increase throughput
  • Incorporate memory into the Local Search to
    eliminate repetitive searching

48
FPGAs
  • Re-programmable hardware to perform different
    tasks
  • Inexpensive hardware development
  • Exploits qualities of hardware
  • Parallelism
  • Pipelining

49
VHDL
  • Describes the behavior of circuit
  • Programmed in concurrent manner
  • Complex sequential algorithms
  • Lengthy debugging process