Title: VLSI Design Abstraction Levels
1. Introduction
- VLSI Design Abstraction Levels: Idea → Architectural Design → Logical Design → Physical Design → Fabrication → New Chip
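2. Introduction (VLSI Placement Problem Definition)
- Informal definition and objective
- $M = \{m_1, m_2, \ldots, m_n\}$ is the set of modules (cells) and $S = \{s_1, s_2, \ldots, s_k\}$ is the set of signals (nets)
- Each $m_i$ is associated with a set of signals $S_{m_i} \subseteq S$
- Each $s_i$ is associated with a set of modules $M_{s_i} = \{m_j \mid s_i \in S_{m_j}\}$
- $L = \{L_1, L_2, \ldots, L_p\}$ is the set of locations, with $p \ge n$
- With $p = n$ there are $n!$ possible placements; the problem is NP-hard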
3. Introduction (VLSI Placement Styles)
- Full Custom Layout
- Gate Array Methodology
- Macro Cell
- Standard Cell Methodology
[Figure: standard-cell layout showing cell rows separated by routing channels, pads, a feedthrough cell and wasted space]
4. Introduction (Heuristics Applied to VLSI Placement)
- Deterministic vs. Stochastic Heuristics
- Constructive vs. Iterative Heuristics
- Constructive Deterministic
  - Linear Placement Alg., Min-Cut Placement Alg., Force Directed Alg.
- Iterative Stochastic
  - Simulated Annealing, Genetic Alg., Simulated Evolution, Tabu Search
5. Multi-Objective Placement Problem
- Multiple Objectives and Constraints
- Conflicting Objectives
- Interconnection Length
- Area
- Critical Path Delay
- Overall Solution Quality Evaluation
6. Multi-Objective Placement (Interconnection Length)
- Significance
- Net Definition and Estimation Techniques
- Steiner Tree Approximation
[Figure: Steiner tree approximation example; pins labeled 1 to 4]
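The slide names the Steiner tree approximation without giving its details. A minimal sketch, assuming a single-trunk variant: the trunk runs along the longer side of the net's bounding box at the median pin coordinate, and every pin joins it with a perpendicular branch. The function names and the median trunk position are assumptions, not necessarily the estimator used in this work.

```python
from statistics import median
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def steiner_estimate(pins: List[Point]) -> float:
    """Single-trunk Steiner approximation of one net's wire length (assumed variant)."""
    if len(pins) < 2:
        return 0.0
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    if width >= height:
        # Horizontal trunk across the bounding box; vertical branches to it.
        trunk_y = median(ys)  # the median minimizes the total branch length
        return width + sum(abs(y - trunk_y) for y in ys)
    # Vertical trunk across the bounding box; horizontal branches to it.
    trunk_x = median(xs)
    return height + sum(abs(x - trunk_x) for x in xs)

def total_wire_length(nets: Dict[str, List[Point]]) -> float:
    """Placement wire length: sum of the per-net estimates."""
    return sum(steiner_estimate(pins) for pins in nets.values())

print(steiner_estimate([(0, 0), (4, 0), (2, 3)]))  # 7
```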
7. Multi-Objective Placement (Area)
- Standard cell placement uses fixed-height cells
- Fixed-height channels are assumed → only the width varies from one solution to another
- The width is given by the longest row
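8. Multi-Objective Placement (Critical Path Delay)
- A VLSI circuit is a collection of paths
- Path and critical path definitions
- Given a path $\pi$, let $v_1, v_2, \ldots, v_k$ be the nets belonging to $\pi$. The delay of the path $\pi$ is given by $T_\pi = \sum_{i=1}^{k} (CD_{v_i} + ID_{v_i})$, where $CD_{v_i}$ is the switching delay of the cell driving net $v_i$ and $ID_{v_i}$ is the interconnect delay of net $v_i$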
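A minimal sketch of the path delay computation, assuming the common formulation $T_\pi = \sum_{i=1}^{k} (CD_{v_i} + ID_{v_i})$ (driving-cell delay plus interconnect delay of each net on the path) and that both delays are already available per net. The dictionary inputs and function names are hypothetical; the critical path is taken as the path with the largest delay.

```python
from typing import Dict, List, Tuple

def path_delay(path: List[str], cd: Dict[str, float], id_: Dict[str, float]) -> float:
    """T_pi = sum over the nets of the path of (cell delay + interconnect delay)."""
    return sum(cd[v] + id_[v] for v in path)

def critical_path(paths: List[List[str]], cd: Dict[str, float],
                  id_: Dict[str, float]) -> Tuple[List[str], float]:
    """Return the path with the largest delay, together with that delay."""
    delays = [(p, path_delay(p, cd, id_)) for p in paths]
    return max(delays, key=lambda item: item[1])

# Illustrative (made-up) per-net delays.
cd = {"v1": 1.0, "v2": 2.0, "v3": 0.5}
id_ = {"v1": 0.3, "v2": 0.1, "v3": 0.4}
print(critical_path([["v1", "v2"], ["v1", "v3"]], cd, id_))
```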
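9. Multi-Objective Placement (Overall Solution Evaluation)
- Multi-objective → the cost is a vector quantity
- Weighted Sum vs. Fuzzy Logic
- Fuzzy Goal Based Cost Measure
- Cost vector: $C(x) = (C_1(x), C_2(x), \ldots, C_p(x))$
- Lower bounds: $O = (O_1, O_2, \ldots, O_p)$, with $O_i \le C_i(x)\ \forall i, \forall x \in K$, where $K$ is the set of feasible solutions
- Goal vector: $G = (g_1, g_2, \ldots, g_p)$
- $x$ is acceptable if $C_i(x) \le g_i \times O_i$ for all $i$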
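The acceptance test $C_i(x) \le g_i \times O_i$ for every objective $i$ translates directly into code. A minimal sketch; the function name is hypothetical and the numbers in the usage line are illustrative only.

```python
from typing import Sequence

def is_acceptable(costs: Sequence[float], lower_bounds: Sequence[float],
                  goals: Sequence[float]) -> bool:
    """x is acceptable if C_i(x) <= g_i * O_i for every objective i."""
    if not (len(costs) == len(lower_bounds) == len(goals)):
        raise ValueError("cost, lower-bound and goal vectors must have the same length")
    return all(c <= g * o for c, o, g in zip(costs, lower_bounds, goals))

# (wire length, delay, width) with illustrative values: width 50 exceeds 1.1 * 45.
print(is_acceptable([120, 9.0, 50], [100, 8.0, 45], [1.3, 1.2, 1.1]))  # False
```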
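10. Multi-Objective Placement (Acceptable Solutions)
[Figure: cost space with axes $C_{wl}(x)$, $C_{delay}(x)$ and $C_{width}(x)$; the lower bounds $O_{wl}$, $O_{delay}$, $O_{width}$ separate impossible solutions from the rest, and acceptable solutions lie between $O_i$ and $g_i \times O_i$ on every axis]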
11. Multi-Objective Placement (Membership Rule)
- The rule for determining membership in the fuzzy set is: IF the solution is within acceptable wire length AND within acceptable circuit delay AND within acceptable width, THEN it is acceptable.
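The fuzzy AND in this rule needs a concrete operator. A minimal sketch, assuming an ordered weighted averaging (OWA) style operator, $\mu(x) = \beta \min_i \mu_i(x) + (1 - \beta) \frac{1}{p} \sum_i \mu_i(x)$, which blends the strict minimum with the average. The slide does not state which operator or which $\beta$ is used; the default `beta = 0.7` is a hypothetical value.

```python
from typing import Sequence

def fuzzy_and_owa(memberships: Sequence[float], beta: float = 0.7) -> float:
    """OWA-style fuzzy AND: beta * min(mu_i) + (1 - beta) * mean(mu_i).

    beta = 1 gives the pure minimum (strict AND); beta = 0 gives the average.
    """
    if not memberships:
        raise ValueError("need at least one membership value")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    mu_min = min(memberships)
    mu_avg = sum(memberships) / len(memberships)
    return beta * mu_min + (1.0 - beta) * mu_avg

# Memberships for (wire length, delay, width).
print(fuzzy_and_owa([0.9, 0.6, 0.3], beta=1.0))  # 0.3
```

A larger beta makes the combined value track the worst objective, which is closest to the literal AND of the rule.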
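12. Multi-Objective Placement (Membership Function)
[Figure: membership function $\mu_i^c(x)$ plotted against $C_i(x)/O_i$; it equals 1.0 at $C_i(x)/O_i = 1.0$ and decreases to 0 at $C_i(x)/O_i = g_i$]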
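A minimal sketch of the membership function, assuming it is piecewise linear: 1.0 while $C_i(x)/O_i \le 1$, falling linearly to 0 at $C_i(x)/O_i = g_i$, and 0 beyond. The linear shape is an assumption read from the figure labels; the function name is hypothetical.

```python
def membership(cost: float, lower_bound: float, goal: float) -> float:
    """Piecewise-linear membership mu_i^c(x) of C_i(x) (assumed shape).

    Returns 1.0 when C_i(x)/O_i <= 1, decreases linearly to 0 at
    C_i(x)/O_i = g_i, and stays 0 beyond g_i.
    """
    if lower_bound <= 0 or goal <= 1.0:
        raise ValueError("expects O_i > 0 and g_i > 1")
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0
    if ratio >= goal:
        return 0.0
    return (goal - ratio) / (goal - 1.0)

print(membership(125, 100, 1.5))  # 0.5
```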
13. Multi-Objective Placement (Lower Bounds)
- Optimum values for wire length, delay and width
are computed according to
14. Tabu Search
- Definition, Key feature and Memory
- TS vs. Local Search
- Basic TS Algorithm:
  1. Start with an initial solution s; initialize TL (Tabu List), AL (Aspiration Level) and Counter
  2. Investigate a subset V of the neighborhood of s and find the best solution s* in V
  3. If the move (s, s*) is not tabu, set s ← s*; if it is tabu, set s ← s* only if C(s*) < AL
  4. Update TL and AL; Counter ← Counter + 1
  5. If Counter = N_Iterations, report the best solution; otherwise, go to step 2
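The basic TS loop can be sketched in Python. A minimal sketch, assuming solutions are permutations of cell numbers, a move swaps two positions, the move attribute is the pair of swapped cell numbers, and AL is the best cost found so far (aspiration by objective). `tabu_search`, `v_size`, `tenure` and the toy cost `displacement` are hypothetical names, not the authors' code.

```python
import random
from collections import deque
from typing import Callable, List, Tuple

def tabu_search(initial: List[int], cost: Callable[[List[int]], float],
                n_iterations: int = 200, v_size: int = 20, tenure: int = 7,
                seed: int = 0) -> Tuple[List[int], float]:
    """Basic TS: random subset V of swap moves, tabu list, aspiration by objective."""
    if len(initial) < 2:
        return list(initial), cost(list(initial))
    rng = random.Random(seed)
    s = list(initial)
    best, best_cost = list(s), cost(s)
    tl = deque(maxlen=tenure)  # TL: attributes (swapped cell pairs) of recent moves
    al = best_cost             # AL: best cost found so far
    for _ in range(n_iterations):  # Counter runs up to N_Iterations
        # Investigate a random subset V of the neighborhood of s; keep the best s*.
        star = None
        for _ in range(v_size):
            i, j = rng.sample(range(len(s)), 2)
            cand = s.copy()
            cand[i], cand[j] = cand[j], cand[i]
            c = cost(cand)
            if star is None or c < star[0]:
                star = (c, cand, frozenset((s[i], s[j])))
        c_star, s_star, attr = star
        # Accept s* if the move is not tabu, or if it is tabu but C(s*) < AL.
        if attr not in tl or c_star < al:
            s = s_star
            tl.append(attr)
            if c_star < best_cost:
                best, best_cost = s_star.copy(), c_star
        al = best_cost  # update AL
    return best, best_cost

def displacement(p: List[int]) -> int:
    """Hypothetical toy cost: total distance of each cell from its sorted slot."""
    return sum(abs(v - i) for i, v in enumerate(p))

best, c = tabu_search(list(range(9, -1, -1)), displacement)
```

Following the flowchart, a tabu move that fails the aspiration test is simply rejected for that iteration rather than replaced by the next-best candidate.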
15. Tabu Search Parameters (CL)
- Candidate List
- Why a candidate list?
- Construction Strategies
- Aspiration Plus
- Elite Candidate List
- Successive Filter Strategy
- Sequential Fan Candidate List
- Bounded Change Candidate List
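As one example of these strategies, Aspiration Plus scans candidate moves until one reaches an aspiration threshold, then examines `plus` further moves, always examining at least `min_moves` and at most `max_moves`, and returns the best move seen. A minimal sketch under that reading; the parameter names are hypothetical, and lower values are taken to be better.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

M = TypeVar("M")

def aspiration_plus(moves: Iterable[M], evaluate: Callable[[M], float],
                    threshold: float, plus: int, min_moves: int,
                    max_moves: int) -> Tuple[Optional[M], float]:
    """Return the best move examined under the Aspiration Plus stopping rule."""
    best_move: Optional[M] = None
    best_val = float("inf")
    first_hit: Optional[int] = None  # moves examined when the threshold was first met
    examined = 0
    for move in moves:
        if examined >= max_moves:
            break
        value = evaluate(move)
        examined += 1
        if value < best_val:
            best_move, best_val = move, value
        if first_hit is None and value <= threshold:
            first_hit = examined
        # Stop `plus` moves after the first hit, but never before min_moves.
        if (first_hit is not None and examined >= min_moves
                and examined >= first_hit + plus):
            break
    return best_move, best_val

vals = [5, 4, 1, 3, 0, 9]
print(aspiration_plus(range(6), vals.__getitem__, 2, 1, 2, 6))  # (2, 1)
```

In the usage line, the move with value 0 is never examined: the threshold is met at the third move and the scan stops one move later.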
16. Tabu Search Parameters (Moves)
- Moves and Move Attributes
- What is a move?
- Why move attributes?
- Examples of move attributes:
  - Complementing a binary variable in the solution
  - A change of $C(s') - C(s)$ in the cost
  - A similar change in another problem-based function
  - Any combination of the above
17. Tabu Search Parameters (Cost, TL)
- Evaluation Function
- Implementation vs. operation cost
- Single or multiple objectives
- Tabu List
- Local search, intensification and diversification
- Tabu Tenure
- Problem size
- Search objective
18. Tabu Search Parameters (AC)
- Aspiration Criteria
- What and Why?
- Global Aspiration by Objective
- Regional Aspiration by Objective
- Aspiration by Search Direction
- Aspiration by Influence
19. Tabu Search Classes
- Short Term Memory
- Intermediate Term Memory
- Long Term Memory
20. Literature Review (Heuristics Applied to VLSI Placement)
- Linear Placement Algorithm (CD)
- Min-Cut Placement (CD)
- Force Directed Placement (ID)
- Simulated Annealing (IS)
- Genetic Algorithm (IS)
- Simulated Evolution (IS)
- Tabu Search (IS)
- CD: constructive deterministic; ID: iterative deterministic; IS: iterative stochastic
21. Literature Review (Tabu Search in VLSI Placement)
- Lim, Chee and Wu [92]
  - Macro-cell placement with global routing
  - Quad partitioning using TS to minimize delay
  - Interconnect and cell delay (weighted sum)
- Lin and Du [90]
  - Capacitor placement in radial distribution systems
  - Minimize energy loss; short-term TS with a random CL
  - Improvement in quality and time over SA
22. Literature Review (Tabu Search in VLSI Placement)
- Handa and Kuga [95]
  - Placement for analog LSI chip designs
  - Wire length, area and ease of routing
  - TS performed better when combined with a GA
- Mackey and Carothers [96]
  - Quad-partitioning VLSI macro-cell placement
  - Wire length minimization using a fuzzy cost
  - Significant improvement over Lim, Chee and Wu
23. Literature Review (Tabu Search Parallelization)
- Purposes
- How to parallelize?
- One search for time t, or p searches for time t/p
24. Literature Review (Tabu Search Parallelization)
- Taillard [90]
  - Neighborhood examination for the FSSP
  - A master broadcasts an initial solution
  - Slaves send the best neighbors they find
  - The master picks the overall best move
  - It continues for a fixed number of iterations or until no improvement is observed
25. Literature Review (Tabu Search Parallelization)
- Garcia, Potvin and Rousseau [94]
  - Parallel TS for vehicle routing
  - A master and slaves investigate the neighborhood
  - Each process sends its best move
  - The master broadcasts a set of best moves
- De Falco et al. [94]
  - Evolution principles were included in PTS (parallel TS)
  - Neighboring machines exchange best solutions
  - If the incoming best is better, it replaces the local one
26. Literature Review (Tabu Search Parallelization)
- Nair and Freville [97]
  - PTS for the 0-1 multi-knapsack problem
  - A master generates initial solutions and strategies
- Mori and Hayashi [98]
  - PTS for voltage and reactive power control
  - Two schemes:
    - Neighborhood investigation
    - Search replication with different tabu tenures
27. Proposed Algorithm (Basic Proposed TS Algorithm)
- Reads a randomly generated initial solution and initializes its data structures
- The algorithm runs short-term memory TS (STMTS) for a fixed number of iterations
- The CL is constructed using random moves
- A move is a swap of two cells
- A move attribute is the pair of swapped cell numbers
- A compound move can be made with depth d, examining Nv neighbors at each step
- The tabu tenure used depends on the circuit size
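A compound move of depth d with Nv neighbors per step can be sketched as a sequence of swaps where each step keeps the best of Nv random trials, even if it degrades the cost. A minimal sketch, assuming a solution is a list mapping slots to cell numbers; `compound_move` and its parameters are hypothetical names.

```python
import random
from typing import Callable, List, Tuple

def compound_move(sol: List[int], cost: Callable[[List[int]], float],
                  depth: int, nv: int, rng: random.Random
                  ) -> Tuple[List[int], float, List[Tuple[int, int]]]:
    """Apply `depth` swaps; each step keeps the best of `nv` random swaps.

    Returns the new solution, its cost, and the move attributes: one
    (smaller, larger) pair of swapped cell numbers per step.
    """
    if nv < 1 or len(sol) < 2:
        raise ValueError("need nv >= 1 and at least two cells")
    cur = list(sol)
    cur_cost = cost(cur)
    attributes: List[Tuple[int, int]] = []
    for _ in range(depth):
        best = None
        for _ in range(nv):
            i, j = rng.sample(range(len(cur)), 2)
            cand = cur.copy()
            cand[i], cand[j] = cand[j], cand[i]
            c = cost(cand)
            if best is None or c < best[0]:
                best = (c, cand, (min(cur[i], cur[j]), max(cur[i], cur[j])))
        cur_cost, cur, pair = best  # continue the fan from the chosen neighbor
        attributes.append(pair)
    return cur, cur_cost, attributes
```

Recording all d pairs as the move's attributes lets the tabu list forbid any part of a compound move, not only its last swap.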
28. Proposed Algorithm (Basic Proposed TS Algorithm)
- Aspiration by objective
- The cost function used is the same as the one proposed by Ali in his MS thesis
- Wire length, delay and width are computed
- It uses the Fuzzy Goal Based Cost Measure
29. Parallelization of the Proposed Algorithm
- Parallelized on a NOW (network of workstations) using PVM
- Why?
- Two levels of parallelization
  - Candidate list construction
  - Tabu search replication
[Figure: hierarchy with a TS Master controlling several TS Workers, each of which controls its own Candidate List Workers (CLWs)]
30. Parallelization of the Proposed Algorithm (Scenario)
- Tabu Search Master (TSM)
  - Initialize data structures and read the initial solution
  - Spawn TSWs and pass them the arguments
  - Repeat for the number of global iterations:
    - Send the current solution to the TSWs
    - Get the best cost from all TSWs and take the corresponding solution as the overall best
- Tabu Search Worker (TSW)
  - Receive arguments from the TSM
  - Receive the initial solution from the TSM
  - Perform a diversification step
  - Spawn CLWs and pass them their arguments
  - Repeat for the number of local iterations:
    - Send the current solution to the CLWs
    - Get the best cost from all CLWs and take the corresponding solution as the overall best
    - If the move isn't tabu or satisfies the AC, accept it; otherwise, reject it
  - Send the best cost and best solution if the TSM asks for it
- Candidate List Worker (CLW)
  - Receive arguments from the TSW
  - Receive the initial solution from the TSW
  - Investigate the neighborhood and find the best move
  - Send the best cost and best solution if the TSW asks for it
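The TSW and CLW roles in a local iteration can be sketched on one machine. A minimal sketch, assuming Python threads stand in for PVM processes, each CLW owns a contiguous range of slot positions and takes the first cell of every trial swap from it, and the TSW applies the tabu and aspiration test to the best CLW result. `clw`, `tsw_local_iteration` and their parameters are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

Result = Tuple[float, List[int], frozenset]

def clw(sol: List[int], cost: Callable[[List[int]], float],
        lo: int, hi: int, nv: int, seed: int) -> Optional[Result]:
    """One CLW: try nv random swaps whose first position lies in [lo, hi)."""
    rng = random.Random(seed)
    best: Optional[Result] = None
    for _ in range(nv):
        i = rng.randrange(lo, hi)    # first cell from this CLW's own range
        j = rng.randrange(len(sol))  # second cell from anywhere
        if i == j:
            continue
        cand = sol.copy()
        cand[i], cand[j] = cand[j], cand[i]
        c = cost(cand)
        if best is None or c < best[0]:
            best = (c, cand, frozenset((sol[i], sol[j])))
    return best

def tsw_local_iteration(sol, cost, n_clw, nv, tabu, aspiration, seed=0):
    """CLWs search in parallel; the TSW keeps the best result and accepts it
    if it is not tabu or beats the aspiration level."""
    if not 1 <= n_clw <= len(sol):
        raise ValueError("need 1 <= n_clw <= number of cells")
    n = len(sol)
    bounds = [(w * n // n_clw, (w + 1) * n // n_clw) for w in range(n_clw)]
    with ThreadPoolExecutor(max_workers=n_clw) as pool:
        futures = [pool.submit(clw, sol, cost, lo, hi, nv, seed + w)
                   for w, (lo, hi) in enumerate(bounds)]
        results = [r for r in (f.result() for f in futures) if r is not None]
    if not results:
        return sol, None
    c, cand, attr = min(results, key=lambda r: r[0])
    if attr not in tabu or c < aspiration:
        return cand, attr
    return sol, None  # move rejected: keep the current solution
```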
31. Parallelization of the Proposed Algorithm (Cell Selection)
- When a CLW chooses two cells for a swap, one of them has to be from its own range
- Without this restriction, the probability that 2 CLWs pick the same two cells is $(2/n)^2$, and the probability that k CLWs pick the same two cells is $(2/n)^k$
- With the restriction, the probability that 2 CLWs pick the same 2 cells is $(1/n)^2$, and the probability that more than 2 CLWs pick the same cells is 0, because a pair of cells spans at most two ranges
- → The probability that 2 CLWs pick the same cells is reduced by a factor of 4, and the probability that k > 2 CLWs pick the same cells is eliminated
- CLWs make compound moves → Sequential Fan CL strategy
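The claim that no pair can be picked by more than two CLWs can be checked by enumeration. A minimal sketch, assuming n cells split into k equal, contiguous ranges (a hypothetical partition) and that a CLW may pick any pair with at least one cell in its own range; `clw_coverage` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

def clw_coverage(n: int, k: int) -> Counter:
    """For every unordered cell pair, count how many of the k CLWs may pick it
    when each CLW must take at least one cell from its own range."""
    if n % k:
        raise ValueError("this sketch assumes k divides n")
    size = n // k
    counts: Counter = Counter()
    for w in range(k):
        own = range(w * size, (w + 1) * size)  # CLW w's contiguous cell range
        for a, b in combinations(range(n), 2):
            if a in own or b in own:
                counts[(a, b)] += 1
    return counts

cov = clw_coverage(100, 4)
print(max(cov.values()))  # 2
```

The enumeration also shows that, taken together, the CLWs can still reach every one of the $\binom{n}{2}$ pairs.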
32. Parallelization of the Proposed Algorithm (Number of Choices)
- If CLWs have no restriction in choosing cells → $\binom{n}{2}$ choices
- The probability that 2 cells are taken from the same range is $k(k-1)/n^2$
- If each of the k CLWs has to choose both cells from its own range → $k \times \binom{n/k}{2}$ choices
- In this case, with 100 cells and 4 CLWs, there are normally 4950 choices but only 1200 with the restriction → 75.8% of the neighborhood is ignored
- In our case, the probability that the 2 cells are taken from the same range is $1 \times (k-1)/n = (k-1)/n$ → the probability of choosing cells from the same range is multiplied by n/k → the probability of choosing 1 cell from outside the range is reduced by n/k
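The 4950 and 1200 figures follow from $\binom{n}{2}$ and $k \times \binom{n/k}{2}$. A minimal sketch that reproduces them, assuming k divides n; the function name is hypothetical.

```python
from math import comb

def neighborhood_sizes(n: int, k: int):
    """Unrestricted vs. range-restricted swap choices, and the fraction ignored."""
    unrestricted = comb(n, 2)           # any two of the n cells
    restricted = k * comb(n // k, 2)    # both cells inside one CLW's range
    ignored = 1 - restricted / unrestricted
    return unrestricted, restricted, ignored

u, r, ignored = neighborhood_sizes(100, 4)
print(u, r, f"{ignored:.1%}")  # 4950 1200 75.8%
```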
33. Applying the Algorithm in a Heterogeneous Environment
- A NOW is normally heterogeneous in:
  - Machine architecture, data format, computational speed, network type, machine load and network load
- PVM takes care of the first 2
- In our implementation, we account for the others
- The master gets the best solution from any TSW that has finished its local iterations (LI)
- Once the finished TSWs make up half the total, the master asks the others for their best solutions
- TSWs check for such a message every 10 iterations
- Once they receive it, they kill the currently running CLWs and report their best solutions to the master
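The "half finished, then ask the rest" rule can be sketched with threads. A minimal sketch, assuming a shared `threading.Event` stands in for the master's PVM message, each toy TSW polls it every 10 iterations, and a random number stands in for one TS iteration; killing the CLWs is not modeled. `toy_tsw` and `toy_master` are hypothetical names.

```python
import random
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

CHECK_EVERY = 10  # TSWs poll for the master's request every 10 iterations

def toy_tsw(worker_id, local_iters, step_time, stop):
    """Run local iterations, polling `stop` every CHECK_EVERY iterations."""
    rng = random.Random(worker_id)
    best = float("inf")
    for it in range(1, local_iters + 1):
        best = min(best, rng.random())  # stands in for one TS iteration
        time.sleep(step_time)           # slower machines take longer per step
        if it % CHECK_EVERY == 0 and stop.is_set():
            return worker_id, best, False  # asked early: report best so far
    return worker_id, best, True           # finished all local iterations

def toy_master(step_times, local_iters=100):
    """Collect results; once half the TSWs have finished, ask the rest."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=len(step_times)) as pool:
        futures = [pool.submit(toy_tsw, w, local_iters, t, stop)
                   for w, t in enumerate(step_times)]
        pending, finished = set(futures), 0
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            finished += sum(1 for f in done if f.result()[2])
            if 2 * finished >= len(futures):
                stop.set()  # the master asks the remaining TSWs
        results = [f.result() for f in futures]
    return min(results, key=lambda r: r[1]), results
```

For example, `toy_master([0.0, 0.0, 0.005, 0.005])` models two fast and two slow machines; the slow ones stop shortly after the fast ones finish instead of running all local iterations.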
34. Applying the Algorithm in a Heterogeneous Environment
- The same principle applies between TSWs and CLWs
- CLWs check frequently for a message that either kills them or asks them for their best solution
- By that, we account for machine load, machine speed and network load heterogeneity
- Experiments were run on PX/SPARC, Sparc-Station 10, LX/SPARC and UltraSparc 1 machines
- All have the same OS (Solaris 2.5)
35. Diversification of the Search Process
- What and why?
- Penalization of frequent moves
- Kelly et al. [94] proposed the following strategy for the QAP:
  - Let the most recent minimum be $\pi_{min} = (\pi_{min}(1), \pi_{min}(2), \ldots, \pi_{min}(n))$ and the current solution be $\pi_{cur} = (\pi_{cur}(1), \pi_{cur}(2), \ldots, \pi_{cur}(n))$. Then all swaps $\pi_{cur}(x) \leftrightarrow \pi_{cur}(y)$ such that $\pi_{cur}(x) = \pi_{min}(x)$ or $\pi_{cur}(y) = \pi_{min}(y)$ are considered. The swap with the highest improvement or least degradation is performed. Such moves are made until no more moves are available
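A minimal sketch of this diversification rule, assuming both solutions are permutations of the same items and lower cost is better; `kelly_diversify` is a hypothetical name. The loop terminates because a qualifying swap never places an item back at its $\pi_{min}$ position, so the number of matching positions strictly decreases.

```python
from typing import Callable, List

def kelly_diversify(cur: List[int], pmin: List[int],
                    cost: Callable[[List[int]], float]) -> List[int]:
    """Swap items that still sit at their positions in the recent minimum pmin.

    Only swaps (x, y) with cur[x] == pmin[x] or cur[y] == pmin[y] qualify; the
    one giving the lowest resulting cost (highest improvement or least
    degradation) is made, until no qualifying swap remains.
    """
    cur = list(cur)
    n = len(cur)
    while True:
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                if cur[x] == pmin[x] or cur[y] == pmin[y]:
                    cand = cur.copy()
                    cand[x], cand[y] = cand[y], cand[x]
                    c = cost(cand)
                    if best is None or c < best[0]:
                        best = (c, cand)
        if best is None:
            return cur
        cur = best[1]
```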
36. Diversification of the Search Process
- In our work, the given scheme is modified to make every TSW investigate a different space
- Every time a TSW gets a solution from the TSM, it diversifies within its assigned range. It performs swaps to a predetermined depth; at every swap, it makes Nv trials and accepts the best
- The 1st cell has to be from the range → the probability that 2 CLWs make the same move is reduced by a factor of 4, and the probability that k > 2 CLWs make the same move is eliminated
- A condition for the swap is that the new locations have to be different from the original ones
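A minimal sketch of the modified scheme, assuming a TSW's range is a contiguous block of slot positions `[lo, hi)`, a solution maps slots to cells, and lower cost is better. `diversify_in_range` and its parameters are hypothetical names.

```python
import random
from typing import Callable, List

def diversify_in_range(sol: List[int], cost: Callable[[List[int]], float],
                       lo: int, hi: int, depth: int, nv: int,
                       seed: int = 0) -> List[int]:
    """Perform `depth` swaps whose first position lies in [lo, hi).

    At every swap, nv random trials are made and the best valid one is
    accepted. A trial is skipped if it would put a cell back at the
    location it had in the original solution.
    """
    if len(sol) < 2 or not (0 <= lo < hi <= len(sol)):
        raise ValueError("need a non-empty range inside a solution of >= 2 cells")
    rng = random.Random(seed)
    original = list(sol)
    cur = list(sol)
    for _ in range(depth):
        best = None
        for _ in range(nv):
            x = rng.randrange(lo, hi)  # first cell from this TSW's range
            y = rng.randrange(len(cur))
            if x == y or cur[y] == original[x] or cur[x] == original[y]:
                continue               # would restore an original location
            cand = cur.copy()
            cand[x], cand[y] = cand[y], cand[x]
            c = cost(cand)
            if best is None or c < best[0]:
                best = (c, cand)
        if best is not None:           # no valid trial: skip this step
            cur = best[1]
    return cur
```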
37. Experiments and Results
- Effect of Low-level parallelization degree
- Effect of High-level parallelization degree
- Effect of Accounting for Heterogeneity
- Effect of Diversification
- Weighted Sum vs. Fuzzy Evaluation
- Comparison with Previous Results
38. Experiments and Results (Benchmark Characteristics)
39. Experiments and Results (Experiment Parameters)
40. Experiments and Results (Effect of Number of CLWs)
- The number of CLWs is varied from 1 to 4, with the number of TSWs fixed at 4
- 12 machines are included in the PVM
41. Experiments and Results (Effect of CLWs on Quality)
42. Experiments and Results (Runtime of CLWs)
43. Experiments and Results (Speedup of CLWs)
44. Experiments and Results (Efficiency of CLWs)
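Speedup and efficiency are conventionally defined as $S_p = T_1 / T_p$ and $E_p = S_p / p$. A minimal sketch of that computation, assuming p counts the CLWs per TSW and $T_p$ is the measured runtime with p CLWs; the function name is hypothetical and no measured values are reproduced here.

```python
from typing import Dict, Tuple

def speedup_efficiency(runtimes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map p -> (S_p, E_p) with S_p = T_1 / T_p and E_p = S_p / p."""
    if 1 not in runtimes:
        raise ValueError("the 1-CLW runtime T_1 is required as the baseline")
    t1 = runtimes[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(runtimes.items())}
```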