ChingTsun Chou March 2003 Slide 0 - PowerPoint PPT Presentation


1
Applying Formal Methods to Protocol
Specifications and System Architecture
Ching-Tsun Chou
Multi-Processor Architecture, Enterprise Platforms Group, Intel
Corporation
2
Disclaimer

The views expressed in this talk are the
presenter's alone and not necessarily those of
Intel Corporation

3
Why formal methods?

Architectural specifications contain complex
distributed protocols whose correctness is
nontrivial to establish
Examples: Directory-based cache coherence
protocols, forward-progress mechanisms,
variations of sliding window protocols
Goal: Get protocol specifications correct before
implementations commence
The earlier a bug is found, the easier it is to
fix it, and the more flexibility there is in
possible fixes
Formal verification (FV) is a body of powerful
techniques for achieving this goal
Formal modeling promotes clear thinking and
minimizes misunderstanding and misinterpretation
of specifications
In the early stages of protocol design, more bugs
are found during formal modeling than by model
checking
Protocol design and formal modeling should go
hand in hand
Formal modeling produces unambiguous golden
models of at least some aspects of complex
protocols
Executable reference models can be generated from
formal models
Experience shows that formal methods work
Already a standard industry practice: Intel, Sun,
IBM, Compaq, SGI, ...
As this talk hopes to demonstrate

4
Formal vs simulation-based verification

Simulation-based verification
Check a small fraction of all possible behaviors
of a large model
Very large and relatively complete model
Model need not be simplified or abstracted
Only a very small part of the state space is
explored
Need to generate tests and collect coverage
feedback
=> Results only as good as your tests and checkers
Formal verification
Check all possible behaviors of a small model
All states are exhaustively explored
No tests are needed and coverage is 100%
Only very small models can be handled
Often need drastic simplification and
abstraction
=> Results only as good as your models and
properties

Moral: There is no free lunch!
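The contrast on this slide can be sketched in a few lines of Python. The model below is a hypothetical two-cache MSI toy, not the SP protocol; the names `successors`, `explore_all`, `simulate`, and `coherent` are illustrative assumptions. Exhaustive exploration visits every reachable state, while a random simulation run of the same model visits only the states its walk happens to hit.

```python
import random
from collections import deque

def successors(state):
    """Next states of a toy 2-cache MSI model (hypothetical, far simpler than SP)."""
    result = set()
    for i in (0, 1):
        j = 1 - i
        if state[i] == "I":
            # Read miss: cache i gets S; a modified peer is downgraded to S.
            nxt = list(state)
            nxt[i] = "S"
            if nxt[j] == "M":
                nxt[j] = "S"
            result.add(tuple(nxt))
        if state[i] != "M":
            # Write: cache i gets M; the peer is invalidated.
            nxt = list(state)
            nxt[i], nxt[j] = "M", "I"
            result.add(tuple(nxt))
        if state[i] != "I":
            # Eviction: cache i drops the line.
            nxt = list(state)
            nxt[i] = "I"
            result.add(tuple(nxt))
    return result

def explore_all(init):
    """Formal-style check: breadth-first search over every reachable state."""
    seen, queue = {init}, deque([init])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def simulate(init, steps, rng):
    """Simulation-style check: one random walk, so only part of the space is seen."""
    seen, state = {init}, init
    for _ in range(steps):
        state = rng.choice(sorted(successors(state)))
        seen.add(state)
    return seen

def coherent(state):
    """If one cache holds the line in M, the other must be in I."""
    return all(state[1 - i] == "I" for i in (0, 1) if state[i] == "M")

init = ("I", "I")
all_states = explore_all(init)
sampled = simulate(init, steps=3, rng=random.Random(0))
print(len(all_states), all(coherent(s) for s in all_states), len(sampled))
```

The exhaustive search is feasible here only because the model is tiny, which is the trade-off the slide describes.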
5
Overview of Intel's Scalability Port (SP)
architecture

Designed for mid-range shared memory
multiprocessors
Employ high-speed point-to-point interconnect
that provides good scalability for mid-range to
high-end systems
Shared buses are neither cost-effective nor
scalable beyond a limited number of processors due
to signaling, thermal, mechanical, and other
challenges
Support flexible system architecture
Enable cost-optimal small systems to scalable
high-end systems
Enable system vendors with proprietary system
interconnects and components to use Intel
building blocks
An instance of SP was implemented in Intel's
E8870 chipset

6
Overview of SP cache coherence protocol

Make no assumption whatsoever about the relative
timing of events or the ordering of messages
Completely asynchronous, event-driven
specification
Directory-based, though the directory
is optional (no directory = null directory)
may or may not be physically distributed
A generalization of the invalidation-based MESI
protocol, but different caching agents may do
MESI-state transitions at different times
Employ mechanisms to resolve conflicts
(collisions of requests from different requesters
to the same cache line) in a distributed manner
Table-based specification with 1,000 rows in all
tables
>20 transaction types, each of which has a
different behavior by itself and can interact
with every other transaction type

7
SP cache coherence protocol validation flow
[Flow diagram; recoverable labels: Protocol specification, Boolean rules, Non-table-based code, Extracted p-tables, Generated p-tables, "?" (comparison), Formal verification model, C reference model, Model checking, Simulation]
Find easy bugs in protocol spec
Find hard bugs in protocol spec
Find bugs in implementations
8
Properties verified

Data consistency
If a cache's state is valid (i.e., S, E, or M),
then its data is up to date
Cache and directory state consistency
If any cache is in state E or M, the other caches
must be in I
If a presence bit in directory is 0, the
corresponding cache must be in I
If the directory state is I, all presence bits
are 0 (and hence all caches are I)
If the directory state is S, the caches whose
presence bits are 1 are in I or S
If the directory state is E, there is exactly one
presence bit being 1
Weak liveness properties: AG EF (cs = CS), for
each control state cs and each possible value
CS of cs
Excellent guard against missing rows in protocol
tables and other unexpected cases
Detect both global and local deadlocks, but not
livelock or starvation
Do not rely on fairness assumptions
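As a rough illustration of the properties on this slide, the sketch below writes the cache and directory consistency conditions as a predicate and checks a weak liveness property of the form AG EF over an explicit state graph. The encoding (lists of MESI letters, 0/1 presence bits) and the names `state_consistent` and `ag_ef` are assumptions made for this example; the actual SP models were checked with a model checker, not with code like this.

```python
def state_consistent(caches, dir_state, presence):
    """Check the cache/directory consistency conditions for one global state."""
    exclusive = all(
        other == "I"
        for i, c in enumerate(caches) if c in ("E", "M")
        for j, other in enumerate(caches) if j != i
    )
    absent_is_invalid = all(c == "I" for c, p in zip(caches, presence) if p == 0)
    dir_i = dir_state != "I" or not any(presence)
    dir_s = dir_state != "S" or all(
        c in ("I", "S") for c, p in zip(caches, presence) if p == 1
    )
    dir_e = dir_state != "E" or sum(presence) == 1
    return exclusive and absent_is_invalid and dir_i and dir_s and dir_e

def ag_ef(init, successors, target):
    """AG EF target: from every reachable state, some target state is reachable."""
    reach, stack, preds = {init}, [init], {}
    while stack:                          # forward reachability from init
        s = stack.pop()
        for t in successors(s):
            preds.setdefault(t, set()).add(s)
            if t not in reach:
                reach.add(t)
                stack.append(t)
    can_reach = {s for s in reach if target(s)}
    stack = list(can_reach)
    while stack:                          # backward reachability from targets
        t = stack.pop()
        for s in preds.get(t, ()):
            if s not in can_reach:
                can_reach.add(s)
                stack.append(s)
    return reach <= can_reach

# A hypothetical controller with a missing case: "stuck" can never return to "idle".
graph = {"idle": ["busy"], "busy": ["idle", "stuck"], "stuck": ["stuck"]}
print(ag_ef("idle", lambda s: graph[s], lambda s: s == "idle"))
```

In the example graph the property fails because of the `stuck` state, which is the kind of missing-row or deadlock situation the slide says AG EF guards against. A state that keeps moving without making useful progress would not be flagged, which matches the slide's remark about livelock and starvation.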

9
Results of SP cache coherence protocol FV

An SP cache coherence protocol has >10^33
reachable states for a configuration containing 1
cache-line address, 1 home node, 2 caching nodes,
and all >20 transaction types
Each property takes (on the average) 45 hours to
model-check on a 700 MHz Pentium III Xeon machine
with 4 GB of physical memory
Many interesting bugs were found in successive
versions of SP cache coherence protocol by both
formal modeling and model checking
In fact, more bugs were found by the former than
by the latter in the early phase of SP protocol
design
Not surprisingly, most problems were found when
SP was first designed and during major revisions
(e.g., when new transaction types were added)
But even minor revisions could introduce problems
Moral: As far as cache coherence protocols are
concerned, unaided human reasoning should not be
trusted

10
Rule-based table checking flow
[Flow diagram; recoverable labels: Specification document in word processor (e.g., FrameMaker), Convert, Specification document in HTML, Extract & flatten, Pre-processed table, Post-process, Post-processed table, Rules, Generate, Generated table, "?" (comparison)]
11
Why rule-based table checking works

Tables and rules take two fundamentally different
but complementary views
Tables are row-centric: an enumeration of cases
(row = case)
Rules are column-centric: an expression of
relationships between columns
By comparing the two views against each other,
the chance of a bug escaping is minimized
Ideally, tables and rules should be constructed
by two different persons
Expression of complex relationships between
visible columns is simplified by means of hidden
columns
Cause-and-effect metaphor: Hidden columns are
the ultimate but invisible causes of visible
columns
Hidden columns are hidden by existentially
quantifying them away
Hidden columns are used to further increase the
difference between the two views
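A minimal sketch of the idea, assuming a made-up four-column table rather than any real SP table: the rule is a boolean expression over enumerated column types, the hidden column `own` is existentially quantified away by projecting it out, and the generated rows are compared with the hand-written rows. Plain enumeration with `itertools.product` stands in for the BDD machinery mentioned elsewhere in the talk; all column names and values are hypothetical.

```python
from itertools import product

# Enumerated types of the visible columns, plus one hidden column ("own").
DOMAINS = {
    "state": ("I", "S", "M"),
    "req": ("read", "write"),
    "next": ("S", "M"),
    "inval": (False, True),
    "own": (False, True),   # hidden: does the request need ownership?
}
VISIBLE = ("state", "req", "next", "inval")

def rule(state, req, next, inval, own):
    """Column-centric view: relationships between columns, stated via 'own'."""
    return (
        own == (req == "write")
        and next == ("M" if own or state == "M" else "S")
        and inval == (own and state != "M")
    )

def generate_table():
    """Enumerate satisfying assignments and project away the hidden column."""
    names = tuple(DOMAINS)
    rows = set()
    for values in product(*DOMAINS.values()):
        row = dict(zip(names, values))
        if rule(**row):
            rows.add(tuple(row[name] for name in VISIBLE))
    return rows

def check(table):
    """Return (rows the rules require but the table lacks, rows the rules forbid)."""
    generated = generate_table()
    return generated - set(table), set(table) - generated

# Row-centric view: a hand-written table with one deliberate typo (last column of row 4).
table = [
    ("I", "read", "S", False),
    ("I", "write", "M", True),
    ("S", "read", "S", False),
    ("S", "write", "M", False),   # should be True
    ("M", "read", "M", False),
    ("M", "write", "M", False),
]
missing, extra = check(table)
print(missing, extra)
```

Because the table and the rule were written from different viewpoints, the typo shows up as one missing row and one extra row.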

12
Results of rule-based table checking

Coded boolean rules for SP protocol tables and
checked them against each other
Typically dozens of errors were found before
tables and rules agreed
Most errors were trivial (e.g., typos), but some
were more serious (e.g., missing cases or
systematic misunderstanding)
Maintained the agreement between tables and rules
over 2 years and tens of major and minor protocol
revisions
Changing rules to keep up with tables almost
never required more effort than changing tables
themselves
Rule-based table checking is our first line of
defense, flushes out virtually all easy bugs,
and has very low computational overhead
It takes < 5 minutes to extract and verify by
rules all SP protocol tables
We are not advocating that code review of
tables be eliminated
Code review is still a must at the beginning
We do advocate that insights from code review
be captured and codified by rules and re-used
later when tables are changed

13
Novel applications of binary decision diagrams

Rule-based table generation and checking
Boils down to enumerating satisfying truth
assignments of boolean expressions over
enumerated types
Search for minimal deadlock-free wormhole routing
scheme
A wormhole routing scheme is deadlock-free <=> Its
channel dependency graph is acyclic <=> The
transitive closure of the graph contains no
self-loop
Hence reducible to BDD fixpoint computation
Details in our FMSD paper
Search for fault-tolerant link initialization
sequences
Details in our FMSD paper
Observations
Formal methods thinking leads to new ways of
looking at old problems
A little BDD goes a long way
BDD is an efficient representation of boolean
rules (1 above)
BDD supports exhaustive search of a solution
space (2 and 3 above)
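The deadlock-freedom criterion for wormhole routing stated above can be illustrated with explicit edge sets in place of BDDs. This is only a sketch of the fixpoint idea: the channel names and the functions `transitive_closure` and `deadlock_free` are hypothetical, and the BDD-based search itself is not reproduced here.

```python
def transitive_closure(edges):
    """Least fixpoint: keep adding (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def deadlock_free(dependencies):
    """Acyclic channel dependency graph <=> no self-loop in its transitive closure."""
    return all(a != b for a, b in transitive_closure(dependencies))

# Hypothetical channel dependencies: a 4-channel ring, then the same ring with one edge cut.
ring = {("c0", "c1"), ("c1", "c2"), ("c2", "c3"), ("c3", "c0")}
cut = ring - {("c3", "c0")}
print(deadlock_free(ring), deadlock_free(cut))
```

The loop terminates because the closure can only grow and is bounded by the finite set of channel pairs.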

14
Lessons learned

Formal modeling steered us toward more precise
and concrete protocol specifications than we
would have written without it
Even an abstract formal model requires one to
spell out what exactly one means by each protocol
structure and action
Formal modeling also turned out to be an
excellent way to help architects articulate their
ideas
Formal verification gave us much higher
confidence in the correctness of our protocol
specifications than we would have without it
Certain distributed protocols (e.g.,
directory-based cache coherence protocols) are
too complex for unaided human reasoning alone to
get correct
Formal verification makes it less risky to modify
protocol specifications
Architecture definition affords a rich and
fruitful area for the application of formal
methods
Avoid state explosion with a high level of
abstraction
Get to bugs at the earliest possible stage
Encourage architects to choose more
validation-friendly schemes
Applying formal methods to a specification
enables the exploration of design spaces that are
beyond the scope of any particular implementation
Especially important for Intel, which defines
architectures that will be implemented by
multiple vendors over multiple product generations

15
Acknowledgements

Mani Azimi, Jay Jayasimha, Akhilesh Kumar, Victor
W. Lee, Phanindra K. Mannava, Seungjoon Park, and
Aniruddha Vaidya all contributed to the work
described above.
