The Benefit of Concurrent Model Checking

1
The Benefit of Concurrent Model Checking
  • BVSRC: Berkeley Verification and Synthesis Research Center
  • Baruch Sterin, A. Mishchenko, N. Een, Robert Brayton
  • BVSRC, UC Berkeley
  • Thanks to NSF, SRC, NSA, and industrial sponsors:
    IBM, Intel, Synopsys, Mentor, Magma, Altera, Atrenta, Microsemi,
    Jasper, Oasys, Real Intent, Tabula, Verific

2
Overview
  • Overview
  • Model checking engines
  • Example
  • Non-concurrent
  • Hybrid approach
  • Concurrent verify and refine.
  • Flow
  • Example
  • Why more powerful
  • Questions and objections addressed
  • Future work

3
Concurrent Model Checking
  • Overview
  • Employ multiple MC engines using hybrid
    concurrency on a multi-core server
  • Benefits
  • Faster
  • almost linear speedup
  • plus does not waste time making a wrong decision.
  • More powerful
  • can solve harder problems
  • Makes sequential approach obsolete
  • No reason not to use concurrency
  • even for 1 core
  • simpler
  • Concurrency controlled by Python front end.
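
The idea of racing several engines and keeping the first definitive answer can be sketched as follows. This is a minimal illustration, not the authors' front end: `make_engine` and `race` are hypothetical names, the engines are timed stand-ins rather than real model checkers, and threads with a cooperative stop flag take the place of separate processes that a real flow would kill.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_engine(name, seconds, verdict):
    """Build a stand-in engine (hypothetical): it 'works' for `seconds`
    and then reports `verdict`, unless told to stop earlier."""
    def engine(stop):
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            if stop.is_set():          # another engine already answered
                return name, "undecided"
            time.sleep(0.005)
        return name, verdict
    return engine


def race(engines):
    """Run all engines at once and return the first definitive
    (name, verdict) pair, where verdict is 'SAT' or 'UNSAT'."""
    stop = threading.Event()
    winner = (None, "undecided")
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, stop) for engine in engines]
        for future in as_completed(futures):
            name, verdict = future.result()
            if verdict in ("SAT", "UNSAT"):
                winner = (name, verdict)
                stop.set()             # ask the losing engines to quit
                break
    return winner


engines = [
    make_engine("BMC", 0.30, "undecided"),   # incomplete engine, no answer
    make_engine("INTRP", 0.20, "UNSAT"),
    make_engine("PDR", 0.05, "UNSAT"),
]
print(race(engines))
```

With these stand-in timings the PDR placeholder answers first, so the slower engines never need to finish. No up-front decision about which engine to run is required, which is the point made in the bullets above.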

4
Model Checking Engines
  • Engines
    1. Random simulation
    2. Semi-formal simulation
    3. Bounded model checking (BMC) [15]
    4. BDD-based reachability [7, 25]
    5. Property directed reachability (PDR) [4]
    6. Interpolation [14]
    7. Synthesis
       • rewriting [10]
       • retiming [13]
       • sequential signal correspondence [26]
         • with constraint extraction
       • phase abstraction [27]
       • temporal decomposition [23]
    8. Abstraction [8]
       • counterexample-based (CB) [19]
       • proof-based (PB) [20, 21]
    9. Speculation [23]
  • Verification engines
    • 1-3: incomplete
    • 4-6: complete
  • Transformation engines
    • 7: equivalence preserving
    • 8-9: abstracting

5
Example of non-concurrent MC
Read_file test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
PIs = 532, POs = 1, FF = 2389, ANDs = 12049
prove
quick_verify (try many engines to see if one can prove)
Simplifying
Number of constraints = 3
Forward retiming, quick_simp, scorr_constr, trm: PIs = 532, POs = 1, FF = 2342, ANDs = 11054
Simplify: PIs = 532, POs = 1, FF = 2335, ANDs = 10607
Phase abstraction: PIs = 283, POs = 2, FF = 1460, ANDs = 8911
quick_verify (try many engines to see if one can prove)
Abstracting
Initial abstraction: PIs = 1624, POs = 2, FF = 119, ANDs = 1716, max depth = 39
Testing with BMC
bmc3 -C 100000 -T 50 -F 78
No CEX found in 51 frames
Latches reduced from 1460 to 119
Simplify: PIs = 1624, POs = 2, FF = 119, ANDs = 1687, max depth = 51
Trimming: PIs = 158, POs = 2, FF = 119, ANDs = 734, max depth = 51
Simplify: PIs = 158, POs = 2, FF = 119, ANDs = 731, max depth = 51
quick_verify (try many engines to see if one can prove)
Speculating
Initial speculation: PIs = 158, POs = 26, FF = 119, ANDs = 578, max depth = 51
Fast interpolation reduced POs to 24
Testing with BMC
bmc3 -C 150000 -T 75
No CEX found in 1999 frames
PIs = 158, POs = 24, FF = 119, ANDs = 578, max depth = 1999
Simplify: PIs = 158, POs = 24, FF = 119, ANDs = 535, max depth = 1999
Trimming: PIs = 86, POs = 24, FF = 119, ANDs = 513, max depth = 1999
Verifying (try many engines to see if one can prove)
Running reach -v -B 1000000 -F 10000 -T 75
BDD reachability aborted
RUNNING interpolation with 20000 conflicts, 50 sec, max 100 frames
'UNSAT'
Elapsed time: 457.87 seconds, total: 458.52 seconds
6
  • NOTES
  • The file IE1.aig is first read in and its
    statistics are reported as 532 primary inputs, 1
    output, 2389 flip-flops, and 12049 AIG nodes.
  • 3 implicit constraints were found, but they were
    only mildly useful in simplifying the problem.
  • Phase abstraction found a cycle of length 2 and
    this was useful for simplifying the problem to
    1460 FF from 2335 FF. Note that the number of
    outputs increased to 2 because the problem was
    unrolled 2 time frames.
  • Abstraction was very successful in reducing the
    FF count to 119. This was proved valid out to 39
    time frames.
  • BMC verified that the abstraction produced is
    actually valid at least to 51 frames, which gives
    us good confidence that the abstraction is valid
    for all time.
  • Trimming reduced the inputs relevant to the
    abstraction from 1624 to 158 and simplify reduced
    the number of AIG nodes to 731.
  • Speculate produced a speculative reduced model
    (SRM) with 24 new outputs to be proved and low
    resource interpolation proved 2 of them. The SRM
    model is simpler and has only 578 AIG nodes. The
    SRM was tested with BMC and proved valid out to
    1999 frames.
  • Subsequent trimming and simplification reduced
    the PIs to 86 and the AIG nodes to 513.
  • The final verification step first tried BDD
    reachability, allowing it 75 sec. and up to
    1M BDD nodes. It could not converge within
    these resources, so it was aborted. Then
    interpolation was able to prove UNSAT, and hence
    all 24 outputs are proved.
  • Although quick_verify was applied between
    simplification and abstraction, and between
    abstraction and speculation, it was not able to
    prove anything, so its output is not shown.
  • The total time for this proof was 457 sec. run on
    a Lenovo X301 laptop.

7
Same example with concurrent MC, without PDR
test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2.aig
PIs=532,POs=1,FF=2389,ANDs=12049
Executing super_prove
'INTRP', 'BMC', 'pre_simp'
For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
Number of constraints = 2, frames = 1
PIs=529,POs=1,FF=2342,ANDs=10611
Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
Trying temporal decomposition - for max 15.0 sec.
No reduction
Trying phase abstraction - Max phase = 2
1, 2
Reparam: PIs 1056 => 264
Simplify with 2 phases: PIs=264,POs=2,FF=1462,ANDs=8319
Method pre_simp ended first in 89 sec.
PIs=264,POs=2,FF=1462,ANDs=8319
Running abstract
'INTRP', 'BMC3', 'initial_abstract'
Method initial_abstract ended first in 106 sec.
Initial abstraction: PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
Iterating abstraction refinement
PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
Latches reduced from 1462 to 105
Running pre_simp
Reparam: PIs 330 => 328
PIs=328,POs=2,FF=105,ANDs=1184,max depth=42
Min_Retime: PIs=328,POs=2,FF=98,ANDs=1164,max depth=42
Reparam: PIs 328 => 299
Simplify: PIs=299,POs=2,FF=98,ANDs=1064,max depth=42
Reparam: PIs 299 => 266
Trying temporal decomposition - for max 15.0 sec.
No reduction
Reparam: PIs 266 => 261
Running speculate
'INTRP', 'BMC3', 'initial_speculate'
Method initial_speculate ended first in 38 sec.
Initial speculation: PIs=261,POs=38,FF=96,ANDs=833,max depth=42
Iterating speculation refinement
BMC3 -- cex in 0.17 sec. at depth 22 => PIs=261,POs=37,FF=96,ANDs=830,max depth=42
INTRP: UNSAT in 1.4 sec.
Total clock time taken by super_prove = 366.549089 sec.
8
Same example with concurrent MC, but with PDR
test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
PIs=532,POs=1,FF=2389,ANDs=12049
Executing super_prove
'PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp'
PIs=532,POs=1,FF=2389,ANDs=12049
For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
Number of constraints = 2, frames = 1
Reparam: PIs 532 => 529
PIs=529,POs=1,FF=2342,ANDs=10611
Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
PDRm proved UNSAT in 42 sec.
Total clock time taken by super_prove = 42.384159 sec.
9
Hybrid Approach
[Diagram: c_verify, c_refine, and refine blocks. REACH and REACHm are optional, depending on problem size (PIs, FFs).]
10
c_prove
11
Concurrent Prover Flow - hybrid
[Flow diagram of c_prove, from "Start" to "End with a definitive answer". Each stage returns UNSAT, SAT, or undecided. Other labels: backup, kill, pause, CEX, c_refine, SAT (c_prove outputk). A legend marks the stages that run concurrently.]
12
Multiple output variation on c_refine
  • If there are more than X outputs:
    • group outputs and use poor man's concurrency (PMC)
    • repeatedly take a group of X outputs at a time
    • start with a time-out of 2 sec.
    • after all output groups are done, double the time-out and repeat
  • If a cex is found:
    • refine, then restart at the last time-out value and
    • with the last group of X outputs where the cex was found.
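
The grouping and time-out doubling described above can be sketched as a small scheduling loop. This is only an illustration under stated assumptions: `pmc_run`, `try_group`, and the toy cost table are hypothetical names, and `try_group` stands in for running the engines concurrently on one group of outputs with a given time-out.

```python
def pmc_run(outputs, try_group, group_size, start_timeout=2.0,
            max_timeout=64.0, first_group=0):
    """Sweep groups of outputs with a doubling time-out.

    try_group(group, timeout) must return a dict mapping each output to
    'UNSAT', 'SAT', or 'undecided'. Returns (status, output, group, timeout);
    status is 'cex', 'proved', or 'undecided'.
    """
    groups = [outputs[i:i + group_size]
              for i in range(0, len(outputs), group_size)]
    pending = {gi: list(g) for gi, g in enumerate(groups)}
    # Start with the group where the last cex was found, then wrap around.
    order = list(range(first_group, len(groups))) + list(range(first_group))
    timeout = start_timeout
    while timeout <= max_timeout:
        for gi in order:
            if gi not in pending:
                continue
            results = try_group(pending[gi], timeout)
            for out in pending[gi]:
                if results.get(out) == "SAT":
                    # The caller refines, then calls pmc_run again with
                    # start_timeout=timeout and first_group=gi.
                    return ("cex", out, gi, timeout)
            pending[gi] = [o for o in pending[gi]
                           if results.get(o) != "UNSAT"]
            if not pending[gi]:
                del pending[gi]
        if not pending:
            return ("proved", None, None, timeout)
        timeout *= 2               # all groups done: double and repeat
    return ("undecided", None, None, max_timeout)


# Toy model: each output needs `cost` seconds before its verdict is known.
costs = {"o0": (1, "UNSAT"), "o1": (3, "UNSAT"),
         "o2": (5, "SAT"), "o3": (1, "UNSAT")}


def toy_try_group(group, timeout):
    return {o: (costs[o][1] if costs[o][0] <= timeout else "undecided")
            for o in group}


print(pmc_run(["o0", "o1", "o2", "o3"], toy_try_group, group_size=2))
```

In the toy run, the cheap outputs are proved at the 2 and 4 sec. time-outs, and the counterexample on `o2` appears once the time-out reaches 8 sec. Restarting from that time-out and that group reflects the heuristic on the slide rather than any guarantee.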

13
Example of Concurrent Flow
l2snfsm_prop11_fixed2
PIs=38,POs=1,FF=372,ANDs=2150
Executing super_prove
Initial: PIs=38,POs=1,FF=372,ANDs=2150
Running Simplification
'PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp'   (these run in parallel)
PIs=38,POs=1,FF=371,ANDs=2150
Fwd_Retime: PIs=38,POs=1,FF=349,ANDs=2056
No constraints found
Simplify: PIs=38,POs=1,FF=336,ANDs=1951
Trying temporal decomposition - for max 15.0 sec.
No reduction
Method pre_simp ended first in 9 sec.
PIs=38,POs=1,FF=336,ANDs=1951
14
Running abstract
Start: PIs=38,POs=1,FF=336,ANDs=1951
'PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_abstract'
Running initial_abstract with bob=10,stable=6,time=100,depth=20
Method initial_abstract ended first in 103 sec.
PIs=38,POs=1,FF=336,ANDs=1951,max depth=11
Initial abstraction: PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
Iterating abstraction refinement
Verify time set to 125
PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
Reparam: PIs 116 => 59   (changes inputs to be smaller number)
...   (many iterations here)
SIM -- cex in 41.48 sec. at depth 104 => cex_po = 0
PIs=45,POs=1,FF=329,ANDs=1925,max depth=11
Reparam: PIs 45 => 39
Latches reduced from 336 to 329
simplify: PIs=39,POs=1,FF=329,ANDs=1924,max depth=11
Min_Retime: PIs=39,POs=1,FF=329,ANDs=1914,max depth=11
No constraints found
Simplify: PIs=39,POs=1,FF=328,ANDs=1900,max depth=11
Trying temporal decomposition - for max 15.0 sec.
No reduction
15
Running speculate
'PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_speculate'
Method initial_speculate ended first in 39 sec.
Initial speculation: PIs=39,POs=241,FF=178,ANDs=1335,max depth=11
Iterating speculation refinement
PDRM -- cex in 5.64 sec. at depth 40 => PIs=39,POs=239,FF=178,ANDs=1332,max depth=11
BMC3 -- cex in 1.84 sec. at depth 22 => PIs=39,POs=235,FF=178,ANDs=1326,max depth=22
...   (many iterations here)
BMC3 -- cex in 11.91 sec. at depth 25 => PIs=39,POs=204,FF=191,ANDs=1350,max depth=25
BMC3 -- cex in 17.77 sec. at depth 25 => PIs=39,POs=203,FF=195,ANDs=1381,max depth=25
BMC -- cex in 29.44 sec. at depth 25 => PIs=39,POs=204,FF=195,ANDs=1390,max depth=25
BMC -- cex in 37.03 sec. at depth 26 => PIs=39,POs=203,FF=195,ANDs=1389,max depth=25
Find_cex_par turned on   (poor man's concurrency turned on here)
Verify time set to 148
Number of POs: 203 => 69
t_poor = 2
PDRM: UNSAT in 0.08 sec.
PDRM: UNSAT in 0.07 sec.
...   (many iterations here)
PDR: UNSAT in 0.25 sec.
PDRM: UNSAT in 0.02 sec.
all outputs processed => 69 outputs proved
Number of POs reduced to 0
Total clock time taken by super_prove = 483.238051 sec.
Out[7]: 'UNSAT'
16
Why is concurrent more powerful?
  • Example of Iterating speculation refinement
  • verify time set to 50
  • Initial size: PIs=171,POs=41,FF=255,ANDs=2275
  • SIMULATION cex 4.268283 sec, frame 911
  • SIMULATION cex 0.096659 sec, frame 17
  • BMC cex 6.534474 sec, frame 17
  • SIMULATION cex 0.726484 sec, frame 1363
  • SIMULATION cex 5.740357 sec, frame 391
  • BMC cex 9.506526 sec, frame 17
  • SIMULATION cex 6.436064 sec, frame 984
  • SIMULATION cex 1.212145 sec, frame 444
  • PDRM cex 4.335237 sec, frame 18
  • BMC cex 9.853237 sec, frame 17
  • SIMULATION cex 6.335866 sec, frame 81
  • SIMULATION cex 4.595637 sec, frame 22
  • SIMULATION cex 4.594522 sec, frame 40
  • SIMULATION cex 9.182059 sec, frame 58
  • PDRM cex 5.637425 sec, frame 20
  • BMC cex 9.861210 sec, frame 17
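
One way to read the list above is to tally which engine produced each counterexample and how deep its counterexamples were. The short script below does that with the sixteen entries copied from this slide; the variable names are illustrative only, and the interpretation (that each refinement is triggered by whichever engine finds a cex first) is an assumption about how the log was produced.

```python
from collections import Counter

# (engine, seconds, frame) for each counterexample, copied from the slide.
cex_log = [
    ("SIMULATION", 4.268283, 911), ("SIMULATION", 0.096659, 17),
    ("BMC", 6.534474, 17), ("SIMULATION", 0.726484, 1363),
    ("SIMULATION", 5.740357, 391), ("BMC", 9.506526, 17),
    ("SIMULATION", 6.436064, 984), ("SIMULATION", 1.212145, 444),
    ("PDRM", 4.335237, 18), ("BMC", 9.853237, 17),
    ("SIMULATION", 6.335866, 81), ("SIMULATION", 4.595637, 22),
    ("SIMULATION", 4.594522, 40), ("SIMULATION", 9.182059, 58),
    ("PDRM", 5.637425, 20), ("BMC", 9.861210, 17),
]

# How many refinement steps each engine triggered.
wins = Counter(engine for engine, _, _ in cex_log)

# Deepest frame at which each engine found a counterexample.
deepest = {}
for engine, _, frame in cex_log:
    deepest[engine] = max(frame, deepest.get(engine, 0))

print(wins)      # SIMULATION: 10, BMC: 4, PDRM: 2
print(deepest)   # SIMULATION: 1363, BMC: 17, PDRM: 20
```

In this log no single engine supplies every counterexample, and the simulation hits reach frames far beyond those reported for BMC and PDRM, which is consistent with the claim that running engines together is more powerful than committing to one.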

17
Why is concurrent more powerful?
[Figure: a chain of ten cex/refine steps leading from the initial abstraction/speculation to the final abstraction/speculation.]
18
Hard examples - academic
Hard HWMCC10 Examples
Name           Prim. Inputs  Flip flops  And nodes  Result  Time (sec.)
bobsmhdlc0               61         291       1647  Unsat           434
bobsmhdlc10              61         290       1628  Unsat           450
bobsmhdlc20              61         289       1612  Unsat          1002
bobsmhdlc30              61         300       1574  Unsat          1245
Pdtrod6x8p21              9          84       4318  Unsat          1224
Pdtpmsudc122             16          36        553  Unsat            48
Bobpcihm0               304        1422       9627  none              -
Bobsminiuart0            16         114        571  none              -
Bobsmcodic0              34        1850      18762  none              -
Nusmvqueue1              82          84       2376  none              -
Pdtpmsudc161             20          48        741  none              -
Notes: 0: not solved by anyone; 1: solved only by pdtrav; 2: solved only by pdtrav and ABC
19
Hard examples - Industrial
A subset of the IBM benchmarks, not solved by SixthSense using its default Expert System flow in two hours.
At the time, the IBM SixthSense program did not have a PDR engine, so we eliminated those problems that were made easier because of PDR in our code.
Name          Primary Inputs  Flip flops  And nodes  Result  Time (sec)
bypass33                 856         781      11945  Unsat           84
GCT_38                   266         607      14308  Unsat          188
pmu_wr_11                 74        1072       7155  Unsat          875
tp_p_w_0                  35         208       1228  Unsat          601
KML_M_21                 155        3795      20098  Unsat          353
test_hit_4              1570        3107      16701  Unsat          153
two_back62               144        1660      13411  Sat            173
bypass_28_0              156          68       3504  Unsat            9
MCS_MCS_13               247        2654       9985  Unsat           30
sc_sc_0                  249        5609      31029  none             -
DA_DA_11                 168         429       4771  Unsat           37
p3_d_n_0                  17         197       1355  Sat            180
pclem_0                   77        1564       9460  Unsat          193
assert_p_7_0             207         157       3549  Unsat          396
MCA_MCA_0                131        1718       6615  Unsat           24
MCS_rand5                144        2707      10239  Unsat          441
mcx_z_10                   4        2269       9974  none             -
sc_ver2_0                 19         959       3274  Sat            433
symm_0                    34         815       4101  Sat             56
Erat_0                    86         396       3016  Unsat          720
Note: Had multiple outputs; all but the first were folded in as constraints.
20
Multiple output variation on c_refine
  • How long does it take?
    • Let O = number of POs, E = number of MC engines used concurrently,
      C = number of cores, T = final time-out, X = number of outputs
      grouped together
    • Final sweep (with no cexs and assuming no memory conflicts):
      • with full concurrency: time = T(OE)/C
      • with grouping and full concurrency: time = T(O/X)(XE)/C = T(OE)/C
      • with grouping and PMC: time = 2T(O/X)(XE)/C = 2T(OE)/C
  • Why not do full concurrency and no grouping?
    • Grouping is done to lessen memory conflicts.
      • at most XE processes are concurrent on the server
      • choose X so that there is little memory conflict (why not
        choose X = C/E?)
    • PMC is done to find cexs early when doing grouping.
      • easy cexs across all outputs are found early
  • When cexs are found (some heuristics):
    • refine and start PMC at the last time-out value
      (instead of 2 sec.)
      • heuristic that expects the next cex will take at
        least that time to find
    • first try the last set of X where a cex was found.
      • heuristic that expects that the last group where a cex
        was found is most likely to yield the next cex.
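
The three time estimates above can be checked with a little arithmetic. The sketch below is an assumption-laden reading of the slide, not the authors' derivation: it takes the PMC cost to be the sum of grouped sweeps over the time-outs 2, 4, 8, ..., T, which stays below twice the cost of the final sweep. The function names and the example numbers are hypothetical.

```python
def full_concurrency_time(T, O, E, C):
    """Final sweep with every (output, engine) pair run concurrently."""
    return T * (O * E) / C


def grouped_time(T, O, E, C, X):
    """Final sweep taking X outputs at a time; the X cancels out."""
    return T * (O / X) * (X * E) / C


def pmc_time(T, O, E, C, X, first_timeout=2):
    """Sum grouped sweeps over the doubling time-outs up to T
    (assumed reading of the factor 2 on the slide)."""
    total = 0.0
    t = first_timeout
    while t <= T:
        total += grouped_time(t, O, E, C, X)
        t *= 2
    return total


# Hypothetical numbers: 200 outputs, 4 engines, 8 cores, groups of 2,
# final time-out 64 sec.
T, O, E, C, X = 64, 200, 4, 8, 2
print(full_concurrency_time(T, O, E, C))   # 6400.0
print(grouped_time(T, O, E, C, X))         # 6400.0
print(pmc_time(T, O, E, C, X))             # 12600.0, below 2 * 6400
```

Under this reading, grouping alone costs nothing in the no-cex final sweep, and PMC at most doubles the time, in exchange for finding easy counterexamples early.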

Number of concurrent engines running per coren
21
Questions addressed
  • Memory Use and Conflicts?
  • experiments run on a server with 2 processors (4 cores each),
    24 Gb, 64K L1, 256K L2, 4 Mb
  • grouping designed to alleviate severe memory
    conflicts.
  • did not observe slowdown due to memory conflicts,
    but more experiments need to be done
  • Run-time speedup?
  • linear up to the number of cores
  • concurrency alleviates wasting time due to wrong
    decisions
  • solving problems not solved by sequential flow
  • Wasting processor power trying many things but
    throw away all but one?
  • wastage if some cores sitting idle
  • alternative is to run wrong engine for a longer
    time
  • Use SOTA algorithm?
  • too many MC algorithms
  • expert system proposed which learns which
    algorithms are best for a given design project
    (Z. Nevo - IBM)

22
Future Work
  • More and better engines
  • Improved BDD reachability engine (we hope)
  • We have 4
  • We had a quite weak one (HWMCC'08) in '08
  • Now have two reasonably good ones.
  • May have a much better one in a few months.
  • Improved circuit-based SAT solver
  • Currently used in signal correspondence to
    simplify larger circuits
  • Faster but sometimes limited quality
  • Will be improved to see if it can compete with
    MiniSat 1.14c
  • New specialized techniques for SEC
  • More use of concurrency
  • e.g. exchange information between engines.
  • will not work on parallelizing individual engines

23
To Learn More
  • Recent papers: http://www.eecs.berkeley.edu/~alanmi/publications
  • IWLS
    • N. Een, A. Mishchenko, and R. Brayton, "Efficient
      implementation of property directed reachability". IWLS'11.
    • B. Sterin, N. Een, A. Mishchenko, and R. Brayton,
      "The Benefit of Concurrency in Model Checking". IWLS'11.
    • S. Ray and R. Brayton, "Proving Stabilization
      Using Liveness-to-Safety Conversion". IWLS'11.
  • Other
    • R. Brayton and A. Mishchenko, "ABC: An academic
      industrial-strength verification tool". Proc.
      CAV'10, LNCS 6174, pp. 24-40.
    • N. Een, A. Mishchenko, and N. Amla, "A
      single-instance incremental SAT formulation of
      proof- and counterexample-based abstraction".
      Proc. FMCAD'10.
    • H. Savoj, D. Berthelot, A. Mishchenko, and R.
      Brayton, "Combinational techniques for sequential
      equivalence checking". Proc. FMCAD'10, pp. 158-162.
  • Send email
    • alanmi_at_eecs.berkeley.edu
    • brayton_at_eecs.berkeley.edu
    • een_at_eecs.berkeley.edu
  • Visit BVSRC webpage: www.bvsrc.org

25
end
26
Why is concurrent more powerful?
  • Iterating speculation refinement
  • verify time set to 50
  • SIMULATION cex 4.26 sec, frame 911 => PIs=171,POs=41,FF=255,ANDs=2275,max depth=28
  • SIMULATION cex 0.09 sec, frame 17 => PIs=171,POs=43,FF=255,ANDs=2280,max depth=28
  • BMC cex 9.50 sec, frame 17 => PIs=171,POs=43,FF=255,ANDs=2282,max depth=28
  • SIMULATION cex 6.43 sec, frame 984 => PIs=171,POs=47,FF=255,ANDs=2292,max depth=28
  • SIMULATION cex 1.21 sec, frame 444 => PIs=171,POs=49,FF=255,ANDs=2302,max depth=28
  • PDRM cex 4.33 sec, frame 18 => PIs=171,POs=48,FF=255,ANDs=2304,max depth=28
  • BMC cex 9.85 sec, frame 17 => PIs=171,POs=55,FF=256,ANDs=2346,max depth=28
  • SIMULATION cex 6.33 sec, frame 81 => PIs=171,POs=55,FF=256,ANDs=2347,max depth=28
  • SIMULATION cex 4.59 sec, frame 22 => PIs=171,POs=55,FF=257,ANDs=2366,max depth=28
  • SIMULATION cex 4.59 sec, frame 40 => PIs=171,POs=54,FF=257,ANDs=2363,max depth=28
  • BMC cex 6.96 sec, frame 17 => PIs=171,POs=51,FF=258,ANDs=2377,max depth=28
  • PDRM cex 5.84 sec, frame 22 => PIs=171,POs=51,FF=259,ANDs=2385,max depth=28
  • BMC cex 7.11 sec, frame 17 => PIs=171,POs=47,FF=259,ANDs=2377,max depth=28
  • PDRM cex 3.58 sec, frame 19 => PIs=171,POs=46,FF=259,ANDs=2374,max depth=28
  • PDRM cex 6.04 sec, frame 19 => PIs=171,POs=45,FF=259,ANDs=2371,max depth=28
  • PDRM cex 8.89 sec, frame 20 => PIs=171,POs=44,FF=259,ANDs=2372,max depth=28
  • BMC cex 7.50 sec, frame 17 => PIs=171,POs=41,FF=260,ANDs=2366,max depth=28