Title: A Billion Cycles a Day: Industrial Verification
1. A Billion Cycles a Day: Industrial Verification
- Matthew Heath
- Presentation to Synthesis Verification Class
- May 6, 2003
Based on "Validating the Intel Pentium 4 Microprocessor" by Bob Bentley, DAC 2001
2. How do you verify a design with...
- 42 million transistors
- 1 million lines of RTL code
- 600-1000 people working on it
- A 3-year design time
- Daily design changes
3. How do you verify a design which has bugs like this?
- The FMUL instruction, when the rounding mode is set to round up, incorrectly sets the sticky bit when the source operands are $\mathrm{src1}[67{:}0] = X \cdot 2^{i+15} + 1 \cdot 2^{i}$ and $\mathrm{src2}[67{:}0] = Y \cdot 2^{j+15} + 1 \cdot 2^{j}$, where $i + j = 54$ and $X$, $Y$ are integers
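To give a feel for how narrow this corner case is, here is a minimal Python sketch that builds one operand pair of the form described on this slide, assuming the pattern reads src1[67:0] = X*2^(i+15) + 2^i and src2[67:0] = Y*2^(j+15) + 2^j with i + j = 54. The helper name `fmul_corner_operands` and the 68-bit overflow check are illustrative assumptions, not part of the Pentium 4 validation flow.

```python
WIDTH = 68  # operand width implied by the [67:0] bit range


def fmul_corner_operands(i, x, y):
    """Return (src1, src2) for the assumed pattern, or None if out of range.

    Hypothetical helper: it only constructs operands of the stated form,
    it does not model the FMUL datapath or the sticky bit.
    """
    j = 54 - i  # the slide's constraint: i + j = 54
    if i < 0 or j < 0 or x < 0 or y < 0:
        return None
    src1 = (x << (i + 15)) + (1 << i)
    src2 = (y << (j + 15)) + (1 << j)
    # Reject operands that do not fit in the 68-bit source field.
    if src1 >= (1 << WIDTH) or src2 >= (1 << WIDTH):
        return None
    return src1, src2


pair = fmul_corner_operands(i=20, x=3, y=5)
print(pair)
```

Only operands with this exact bit structure match the pattern, which is why random or hand-written tests are unlikely to reach it within a 2^136 input space for two 68-bit sources.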
4. And the answer is...
- Hire 70 validation engineers
- Buy several thousand compute servers
- Write 12,000 validation tests
- Run up to 1 billion simulation cycles per day for 200 days
- Check 2,750,000 manually-defined properties
- Find, diagnose, track, and resolve 7,855 bugs
- Apply formal verification with 10,000 proofs to the instruction decoder and FP units
- This found that obscure FMUL bug!
5. We know why validation is hard for tools. Why is it hard for the people who run them?
- To meet an aggressive tapeout schedule, design and validation must occur in parallel without one blocking the other.
- Validation starts before the design is done
- Design changes occur while validation tests are running
- Both design and validation must continue in the presence of known, unfixed bugs
6. The design team
- 300 designers write RTL code
- Refer to architectural spec, textbooks, research papers, conversations
- Start with basic functionality and progressively add features according to project staging plan
- Do simple self-checks along the way
7. The validation team
- 100 validators write RTL tests
- Refer to same sources as designers, plus the RTL implementation itself
- Write functional tests to exercise features as they're implemented
- Run tests on RTL simulator
- Diagnose failures
- File bug reports in central database
8. The management
- Collect and analyze data
- Pass/fail status of tests
- Bug database statistics (counts, priority, age, discovery rate, fix rate, etc.)
- RTL feature implementation progress
- Compare trends with project schedule
- Respond if necessary
- Re-allocate resources to high-risk areas
- Prioritize work
9. SRTL: Structural RTL
- Boolean equations; no behavioral syntax
- State-accurate
- RTL state maps directly to schematic state
- High-level constructs supported
- Macros, constants, loops, vectors
- Design hierarchy
- Full-chip has 6 clusters
- Each cluster has several units
- Each unit has tens of functional blocks
- Each block has $O(10^4)$ transistors
- Each designer owns several functional blocks
10. SRTL models
- Cluster and full-chip level
- Full-chip models consume 1GB of disk space
- Compiled, executable SRTL code
- Source code
- Test environments
- Include emulation of external logic
- Direct control over interface signals
- Pre-defined sets of signals commonly selected for tracing during test debug
- Like an Awaves configuration
- Library of useful test fragments
11. Most design work at cluster level
- Decouples cluster and full-chip validation
- Designers graft to latest cluster models
- Check-out and edit selected source files
- Incremental model build
- Run validation tests
- Revision control system
- Designers check-in edited source files
- Log messages include change descriptions, author,
timestamp
12. Cluster model release process
- Designers periodically turn in selected checked-in versions of source files
- Cluster model builders process turn-ins
- Merge changes from different versions of the same source file included in multiple turn-ins
- Coordinated turn-ins sometimes necessary
- Compile an executable cluster SRTL model
- Run tests provided by the validators
- Report test failures to validators and designers for debug
- Acceptable models released to design team for future grafts
13. Full-chip model release process
- Same process, different hierarchy
- Cluster model builders don the designer's hat
- Graft to full-chip model
- Edit based on changes to recent cluster models
- Incremental full-chip model build
- Run full-chip validation tests
- Debug failures, full-chip turn-in
- Now full-chip model builders take over...
- Process turn-ins from all clusters
- Run full-chip validation tests again!
- Release full-chip models to design team
14. Netbatch
- $10^9$ simulation cycles/day = 10 Hz $\times$ $10^5$ sec/day $\times$ $10^3$ computers
- Netbatch manages compute server workload
- For a given SRTL model and set of tests, create a job file and send it to netbatch
- Each sub-team has a netbatch allocation
- Jobs exceeding allocation enter wait queue
- Wait times of 24 hrs not uncommon
- Test results
- Pass/fail statistics
- Failure time and meaningful error message
- Traces of user-selected system state
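The billion-cycles figure on this slide is a back-of-the-envelope product of simulator speed, seconds in a day, and machine count. A small sketch of that arithmetic follows; the function name `cycles_per_day` is illustrative, and the 10 Hz and 1,000-machine values are the slide's round numbers rather than measured data.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400, which the slide rounds up to 10^5


def cycles_per_day(sim_hz, machines, seconds=SECONDS_PER_DAY):
    """Total simulated cycles per day across a farm of identical machines."""
    return sim_hz * seconds * machines


exact = cycles_per_day(sim_hz=10, machines=1_000)
rounded = cycles_per_day(sim_hz=10, machines=1_000, seconds=10**5)
print(exact, rounded)
```

With the exact day length the product is a little under a billion, so the headline number assumes every machine stays busy around the clock; queue waits and failed jobs would lower it.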
15. Efficiency improvements
- A SRTL change made by a designer...
- Appears in a cluster model 1 week later
- Appears in a full-chip model 2 weeks later
- Validators find bugs in released models which the designer has already fixed
- Onion peeling vs. whack-a-mole debug
- Temporarily disabling failing properties
- Releasing models which fail some tests
- System state capture and restore
16. Central bug database
- Released model version
- Failing validation test symptoms
- Root cause
- Requested design change
- Priority
- Log of discussion among designers, validators, and managers
- Status / disposition
- New, ETA, test fixed, design fixed (version), validated, dropped
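The fields listed on this slide map naturally onto a single record type. The sketch below is one possible shape in Python; the class and field names (`BugRecord`, `model_version`, `fixed_in_version`, and so on) are hypothetical, and the rule that only "validated" and "dropped" close a bug is an assumption, since the slides do not define the status lifecycle.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Status values taken from the slide's list.
    NEW = "new"
    ETA = "ETA"
    TEST_FIXED = "test fixed"
    DESIGN_FIXED = "design fixed"
    VALIDATED = "validated"
    DROPPED = "dropped"


@dataclass
class BugRecord:
    model_version: str            # released model in which the bug was seen
    symptoms: str                 # failing validation test symptoms
    root_cause: str = ""
    requested_change: str = ""
    priority: int = 0
    discussion: list = field(default_factory=list)  # log of comments
    status: Status = Status.NEW
    fixed_in_version: str = ""    # filled in when status is DESIGN_FIXED

    def is_open(self):
        # Assumption: only "validated" and "dropped" end a bug's life.
        return self.status not in (Status.VALIDATED, Status.DROPPED)


bug = BugRecord(model_version="cluster-model-A", symptoms="test fails at cycle 1200")
bug.discussion.append("validator: reproduced on latest release")
bug.status = Status.DESIGN_FIXED
print(bug.is_open())
```

Keeping the released model version on every record is what lets validators tell a new bug from one the designer has already fixed in a later model.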
17. Bug root causes
18. Schematic formal verification
- Use formal techniques because schematic simulation takes too long
- Schematic design starts long before SRTL design is done
- Bottom-up
- Verify SRTL macros vs. library cells first
- Black-box macrocells, then verify block
- Because SRTL is state-accurate, verification is combinational only!
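The last point can be illustrated with a toy example: when every SRTL state element corresponds to a schematic state element, it is enough to compare the next-state logic of the two descriptions as pure Boolean functions. The sketch below does this by exhaustive enumeration in Python. Both circuits and all function names are hypothetical, and enumeration stands in for whatever equivalence engine the team actually used, which the slides do not name.

```python
from itertools import product


def srtl_next_state(x, y, w):
    # Hypothetical SRTL block: next state is (x AND y) OR w.
    return (x & y) | w


def schematic_next_state(x, y, w):
    # Hypothetical gate-level version: NAND(NAND(x, y), NOT w).
    nand_xy = 1 - (x & y)
    not_w = 1 - w
    return 1 - (nand_xy & not_w)


def buggy_schematic_next_state(x, y, w):
    # Same structure with the inverter on w left out.
    nand_xy = 1 - (x & y)
    return 1 - (nand_xy & w)


def combinationally_equivalent(f, g, n_inputs):
    """True if f and g agree on every combination of n_inputs bits."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_inputs))


print(combinationally_equivalent(srtl_next_state, schematic_next_state, 3))
print(combinationally_equivalent(srtl_next_state, buggy_schematic_next_state, 3))
```

Because the state elements line up, no sequence of clock cycles has to be explored; the check covers each block's inputs and current state once. Enumeration only scales to a handful of inputs, so blocks of realistic size need symbolic methods.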
19. One SRTL state may map to multiple functionally equivalent schem states
[Figure: SRTL fragment "Z X Y MSFF (Z, W, CLK)" with signals X, Y, W, Z and CLK, shown beside schematics with state elements (D, Q) labeled W1/Z1 and W2/Z2 on a shared CLK.]
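20. Retiming must be back-annotated into SRTL
[Figure: two diagrams. The first shows SRTL fragment "Z X Y MSFF (Z, W, CLK)" with signals X, Y, W, Z1, Z2 and CLK; the second shows SRTL fragment "MSFF (X, Y, CLK) Z Y" with signals X, Y, Z and CLK.]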
21. Conclusion
- Efficient verification of large-scale designs is a daunting management challenge
- Design and validation are concurrent, not iterative
- Possible with adequate resources and powerful tools to use the resources efficiently
- Methodology constraints keep the problem tractable
- Clear communication among team
- Careful documentation
- Progress tracking is key to staying on schedule
- Motto: "If it hasn't been verified, it doesn't work."
22. How NOT to do verification...
- "Arnold was unhappily aware that the complete Jurassic Park program contained more than half a million lines of code, most of it undocumented, without explanation..."
- "What are you doing, John?"
- "Checking the code."
- "By inspection? That'll take forever."
- Michael Crichton, Jurassic Park