A Billion Cycles a Day: Industrial Verification

1
A Billion Cycles a Day: Industrial Verification
  • Matthew Heath
  • Presentation to Synthesis Verification Class
  • May 6, 2003

Based on "Validating the Intel Pentium 4
Microprocessor" by Bob Bentley, DAC 2001
2
How do you verify a design with...
  • 42 million transistors
  • 1 million lines of RTL code
  • 600 to 1000 people working on it
  • A 3-year design time
  • Daily design changes

3
How do you verify a design which has bugs like
this?
  • The FMUL instruction, when the rounding mode is
    set to round up, incorrectly sets the sticky
    bit when the source operands are
    $\mathrm{src1}[67{:}0] = X \cdot 2^{i+15} + 1 \cdot 2^{i}$ and
    $\mathrm{src2}[67{:}0] = Y \cdot 2^{j+15} + 1 \cdot 2^{j}$,
    where $i + j = 54$ and $X$, $Y$ are integers
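
To get a feel for how narrow this trigger condition is, the sketch below enumerates operand pairs of the stated form. It is only an illustration: the choice of small positive X and Y and the requirement that each operand fit in the 68-bit field are assumptions, and the helper names are hypothetical.

```python
# Sketch: enumerate operand pairs matching the bug's trigger pattern.
# Assumptions (not from the slides): X and Y are small positive integers,
# and each operand must fit in the 68-bit field src[67:0].

WIDTH = 68  # src1[67:0] and src2[67:0] are 68-bit fields


def make_operand(mult: int, shift: int) -> int:
    """Return mult * 2^(shift+15) + 1 * 2^shift."""
    return (mult << (shift + 15)) + (1 << shift)


def trigger_pairs(x: int, y: int):
    """Yield (i, j, src1, src2) with i + j == 54 that fit in WIDTH bits."""
    for i in range(55):
        j = 54 - i
        src1 = make_operand(x, i)
        src2 = make_operand(y, j)
        if src1 < (1 << WIDTH) and src2 < (1 << WIDTH):
            yield i, j, src1, src2


pairs = list(trigger_pairs(x=3, y=5))
```

For any fixed X and Y there are at most 55 candidate shifts, against roughly 2^136 possible pairs of 68-bit operands, so random operands are very unlikely to land on the pattern.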

4
And the answer is...
  • Hire 70 validation engineers
  • Buy several thousand compute servers
  • Write 12,000 validation tests
  • Run up to 1 billion simulation cycles per day for
    200 days
  • Check 2,750,000 manually-defined properties
  • Find, diagnose, track, and resolve 7,855 bugs
  • Apply formal verification with 10,000 proofs to
    the instruction decoder and FP units
  • This found that obscure FMUL bug!

5
We know why validation is hard for tools. Why is
it hard for the people who run them?
  • To meet an aggressive tapeout schedule, design
    and validation must occur in parallel without one
    blocking the other.
  • Validation starts before the design is done
  • Design changes occur while validation tests are
    running
  • Both design and validation must continue in the
    presence of known, unfixed bugs

6
The design team
  • 300 designers write RTL code
  • Refer to architectural spec, textbooks, research
    papers, conversations
  • Start with basic functionality and progressively
    add features according to project staging plan
  • Do simple self-checks along the way

7
The validation team
  • 100 validators write RTL tests
  • Refer to same sources as designers, plus the RTL
    implementation itself
  • Write functional tests to exercise features as
    they're implemented
  • Run tests on RTL simulator
  • Diagnose failures
  • File bug reports in central database

8
The management
  • Collect and analyze data
  • Pass/fail status of tests
  • Bug database statistics (counts, priority, age,
    discovery rate, fix rate, etc.)
  • RTL feature implementation progress
  • Compare trends with project schedule
  • Respond if necessary
  • Re-allocate resources to high-risk areas
  • Prioritize work
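
The bug database statistics listed above (discovery rate, fix rate, open count) could be derived roughly as follows. This is a minimal sketch over made-up records; the record layout and dates are hypothetical, not taken from the project's database.

```python
from collections import Counter
from datetime import date

# Hypothetical bug records: (date filed, date fixed or None if still open).
bugs = [
    (date(2000, 1, 3), date(2000, 1, 10)),
    (date(2000, 1, 4), None),
    (date(2000, 1, 11), date(2000, 1, 12)),
    (date(2000, 1, 12), None),
]


def week_of(d: date) -> tuple:
    """Return (ISO year, ISO week) so records can be grouped by week."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


# Discovery rate and fix rate, expressed as counts per ISO week.
found_per_week = Counter(week_of(filed) for filed, _ in bugs)
fixed_per_week = Counter(week_of(fixed) for _, fixed in bugs if fixed)
open_count = sum(1 for _, fixed in bugs if fixed is None)
```

Plotting the two weekly counters against the project schedule gives the kind of trend comparison the slide describes.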

9
SRTL: Structural RTL
  • Boolean equations, no behavioral syntax
  • State-accurate
  • RTL state maps directly to schematic state
  • High-level constructs supported
  • Macros, constants, loops, vectors
  • Design hierarchy
  • Full-chip has 6 clusters
  • Each cluster has several units
  • Each unit has tens of functional blocks
  • Each block has $O(10^4)$ transistors
  • Each designer owns several functional blocks

10
SRTL models
  • Cluster and full-chip level
  • Full-chip models consume 1GB of disk space
  • Compiled, executable SRTL code
  • Source code
  • Test environments
  • Include emulation of external logic
  • Direct control over interface signals
  • Pre-defined sets of signals commonly selected for
    tracing during test debug
  • Like an Awaves configuration
  • Library of useful test fragments

11
Most design work at cluster level
  • Decouples cluster and full-chip validation
  • Designers graft to latest cluster models
  • Check-out and edit selected source files
  • Incremental model build
  • Run validation tests
  • Revision control system
  • Designers check-in edited source files
  • Log messages include change descriptions, author,
    timestamp

12
Cluster model release process
  • Designers periodically turn-in selected
    checked-in versions of source files
  • Cluster model builders process turn-ins
  • Merge changes from different versions of the same
    source file included in multiple turn-ins
  • Coordinated turn-ins sometimes necessary
  • Compile an executable cluster SRTL model
  • Run tests provided by the validators
  • Report test failures to validators and designers
    for debug
  • Acceptable models released to design team for
    future grafts

13
Full-chip model release process
  • Same process, different hierarchy
  • Cluster model builders don the designer's hat
  • Graft to full-chip model
  • Edit based on changes to recent cluster models
  • Incremental full-chip model build
  • Run full-chip validation tests
  • Debug failures, full-chip turn-in
  • Now full-chip model builders take over...
  • Process turn-ins from all clusters
  • Run full-chip validation tests again!
  • Release full-chip models to design team

14
Netbatch
  • $10^9$ simulation cycles/day $= 10$ Hz $\times\ 10^5$ sec/day
    $\times\ 10^3$ computers
  • Netbatch manages compute server workload
  • For a given SRTL model and set of tests, create a
    job file and send it to netbatch
  • Each sub-team has a netbatch allocation
  • Jobs exceeding allocation enter wait queue
  • Wait times of 24 hrs not uncommon
  • Test results
  • Pass/fail statistics
  • Failure time and meaningful error message
  • Traces of user-selected system state
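
The throughput figure on this slide is a back-of-the-envelope product, which the following sketch reproduces. The constant names are illustrative; the 200-day figure comes from the earlier "And the answer is..." slide, and the exact seconds-per-day value is added only for comparison.

```python
CYCLES_PER_SEC_PER_MACHINE = 10   # roughly 10 Hz simulation speed per server
SECONDS_PER_DAY_APPROX = 10**5    # the slide's round figure; exact is 86,400
MACHINES = 10**3                  # roughly a thousand compute servers

# Daily throughput using the slide's rounded numbers.
cycles_per_day = CYCLES_PER_SEC_PER_MACHINE * SECONDS_PER_DAY_APPROX * MACHINES

# Upper bound over the campaign: up to a billion cycles a day for 200 days.
total_cycles = cycles_per_day * 200

# Same estimate with the exact number of seconds in a day.
exact_cycles_per_day = CYCLES_PER_SEC_PER_MACHINE * 86_400 * MACHINES
```

Even at a billion cycles a day, the whole campaign simulates on the order of 2 × 10^11 cycles, which a chip clocked in the gigahertz range would execute in a few minutes.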

15
Efficiency improvements
  • A SRTL change made by a designer...
  • Appears in a cluster model 1 week later
  • Appears in a full-chip model 2 weeks later
  • Validators find bugs in released models which the
    designer has already fixed
  • Onion peeling vs. whack-a-mole debug
  • Temporarily disabling failing properties
  • Releasing models which fail some tests
  • System state capture and restore
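
One way to picture "temporarily disabling failing properties" is a checker that still evaluates every property but separates failures already tied to a known bug from new ones. This is a hypothetical sketch; the property names, state fields, and bug ID are invented for illustration.

```python
# Hypothetical property checker: each property is a named predicate over a
# simulation state (a dict). Known-failing properties are disabled by name
# and tied to a bug ID so they can be re-enabled once the fix is released.

properties = {
    "no_double_grant": lambda s: not (s["grant_a"] and s["grant_b"]),
    "valid_implies_ready": lambda s: (not s["valid"]) or s["ready"],
}

disabled = {"valid_implies_ready": "BUG-0001"}  # illustrative bug ID


def check(state):
    """Return (new_failures, masked_failures) for one simulation state."""
    new, masked = [], []
    for name, prop in properties.items():
        if prop(state):
            continue  # property holds, nothing to report
        (masked if name in disabled else new).append(name)
    return new, masked


state = {"grant_a": True, "grant_b": True, "valid": True, "ready": False}
new_failures, masked_failures = check(state)
```

Masking rather than deleting the property keeps the known failure visible in the results while letting new failures stand out.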

16
Central bug database
  • Released model version
  • Failing validation test symptoms
  • Root cause
  • Requested design change
  • Priority
  • Log of discussion among designers, validators,
    and managers
  • Status / disposition
  • New, ETA, test fixed, design fixed (version),
    validated, dropped
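
The fields on this slide map naturally onto a record type. The sketch below is one possible shape, assuming field names, types, and version strings that the slides do not specify.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    """Status values as listed on the slide."""
    NEW = "new"
    ETA = "ETA"
    TEST_FIXED = "test fixed"
    DESIGN_FIXED = "design fixed"
    VALIDATED = "validated"
    DROPPED = "dropped"


@dataclass
class BugReport:
    model_version: str                      # released model that showed the failure
    symptoms: str                           # failing validation test symptoms
    priority: int
    root_cause: Optional[str] = None
    requested_change: Optional[str] = None
    discussion: List[str] = field(default_factory=list)
    status: Status = Status.NEW
    fixed_in_version: Optional[str] = None  # set once the design is fixed

    def mark_design_fixed(self, version: str) -> None:
        """Record the fix and the model version that carries it."""
        self.status = Status.DESIGN_FIXED
        self.fixed_in_version = version
        self.discussion.append(f"design fixed in {version}")


# Illustrative usage with made-up version strings.
bug = BugReport(model_version="cluster-0042", symptoms="sticky bit mismatch", priority=1)
bug.mark_design_fixed("cluster-0043")
```

Recording the fixing version alongside the status lets a validator check whether a failure seen in an older released model has already been fixed.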

17
Bug root causes
18
Schematic formal verification
  • Use formal techniques because schematic
    simulation takes too long
  • Schematic design starts long before SRTL design
    is done
  • Bottom-up
  • Verify SRTL macros vs. library cells first
  • Black-box the macrocells, then verify the block
  • Because SRTL is state-accurate, verification is
    combinational only!
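
When every SRTL state element has a matching schematic state element, equivalence reduces to comparing the combinational logic that feeds each pair. The toy sketch below compares two hypothetical descriptions of a 2:1 mux by exhaustive enumeration. That is practical only for a handful of inputs; the slides do not say which formal engine was used, and industrial equivalence checkers typically rely on symbolic methods such as BDDs or SAT instead.

```python
from itertools import product


def srtl_next_state(x: bool, y: bool, sel: bool) -> bool:
    """Hypothetical SRTL equation for one flop's input: a 2:1 mux."""
    return y if sel else x


def schematic_next_state(x: bool, y: bool, sel: bool) -> bool:
    """Hypothetical gate-level version of the same mux (AND-OR form)."""
    return (sel and y) or ((not sel) and x)


def first_mismatch(f, g, n_inputs: int):
    """Return the first input vector where f and g differ, or None."""
    for vector in product([False, True], repeat=n_inputs):
        if f(*vector) != g(*vector):
            return vector
    return None


counterexample = first_mismatch(srtl_next_state, schematic_next_state, 3)
```

A result of `None` means the two functions agree on every input vector; any other result is a concrete counterexample to hand back for debug.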

19
One SRTL state may map to multiple functionally
equivalent schem states
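[Figure: schematic diagram, not reproduced in this transcript. Labels: flip-flops with D/Q pins clocked by CLK; signals X, Y, Z, W; duplicated nets W1/W2 and Z1/Z2; SRTL fragment "MSFF (Z, W, CLK)".]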
20
Retiming must be back-annotated into SRTL
  • Exception: inverters

[Figure: schematic diagram, not reproduced in this transcript. Labels: signals X, Y, Z, W, Z1, Z2, CLK; SRTL fragments "MSFF (Z, W, CLK)" and "MSFF (X, Y, CLK)".]
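
The reason retiming matters for a state-accurate flow can be seen in a small simulation. The logic function in the slide's figure is not recoverable from this transcript, so the sketch assumes an AND gate and an all-zero reset state; function names are hypothetical.

```python
def simulate_original(xs, ys):
    """Z = X AND Y feeds one flop; output W is Z delayed by a cycle."""
    w, outputs, states = False, [], []
    for x, y in zip(xs, ys):
        outputs.append(w)
        states.append((w,))
        w = x and y          # the single flop captures Z at the clock edge
    return outputs, states


def simulate_retimed(xs, ys):
    """Flop moved back across the AND: X and Y are each registered."""
    xq, yq = False, False
    outputs, states = [], []
    for x, y in zip(xs, ys):
        outputs.append(xq and yq)
        states.append((xq, yq))
        xq, yq = x, y        # two flops capture X and Y at the clock edge
    return outputs, states


xs = [True, True, False, True]
ys = [True, False, True, True]
out_a, states_a = simulate_original(xs, ys)
out_b, states_b = simulate_retimed(xs, ys)
```

Under these assumptions the two circuits produce the same output sequence, but one has a single state bit and the other has two, so a flop-by-flop combinational comparison has nothing to line up unless the SRTL is updated to match the retimed schematic.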
21
Conclusion
  • Efficient verification of large-scale designs is
    a daunting management challenge
  • Design and validation are concurrent, not
    iterative
  • Possible with adequate resources and powerful
    tools to use the resources efficiently
  • Methodology constraints keep the problem
    tractable
  • Clear communication among team
  • Careful documentation
  • Progress tracking is key to staying on schedule
  • Motto: If it hasn't been verified, it doesn't
    work.

22
How NOT to do verification...
  • Arnold was unhappily aware that the complete
    Jurassic Park program contained more than half a
    million lines of code, most of it undocumented,
    without explanation...
  • "What are you doing, John?"
  • "Checking the code."
  • "By inspection? That'll take forever."

- Michael Crichton, Jurassic Park