Title: Dynamic Verification of Sequential Consistency
Albert Meixner, Dept. of Computer Science, Duke University
Daniel J. Sorin, Dept. of Electrical and Computer Engineering, Duke University
Introduction
- Multithreaded systems are becoming ubiquitous
  - Commercial workloads rely heavily on parallel machines
  - Reliability and availability are crucial
- Backward error recovery can provide high availability
  - Recover to a known good state upon error
  - But can only recover from errors that are detected in time
- The memory system is of special interest
  - Complex: many components, large transistor count
  - Numerous error hazards
Memory System Error Detection
- Must cover all memory system components
  - DRAMs, caches, controllers, interconnect, and write buffers
- Mechanisms for individual components exist
  - Storage structures: ECC
  - Interconnect: checksums, sequence numbering
  - Cache and memory controllers: replication
- Adding detection to all components is hard
  - Complicates the design of every component
  - Requires good intuition about interactions and possible errors
- → Want comprehensive, end-to-end error detection
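As an illustration of one per-component mechanism from the list above, here is a minimal sketch of a link receiver that checks a CRC-32 checksum and a per-link sequence number on every message. The message layout and the names `Message`, `make_message`, and `LinkReceiver` are hypothetical; the slides do not specify a format. It shows why such checks are local: they catch corruption, loss, or duplication on one link, but say nothing about whether the memory system as a whole behaves correctly.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Message:
    seq: int       # per-link sequence number assigned by the sender
    payload: bytes
    crc: int       # CRC-32 of the payload computed by the sender


def make_message(seq: int, payload: bytes) -> Message:
    return Message(seq, payload, zlib.crc32(payload))


class LinkReceiver:
    """Detects corrupted, dropped, duplicated, or reordered messages on one link."""

    def __init__(self) -> None:
        self.expected_seq = 0

    def check_message(self, msg: Message) -> str:
        if zlib.crc32(msg.payload) != msg.crc:
            return "checksum error"
        if msg.seq != self.expected_seq:
            # A gap suggests a lost message; a smaller number a duplicate or reorder.
            return f"sequence error: expected {self.expected_seq}, got {msg.seq}"
        self.expected_seq += 1
        return "ok"


rx = LinkReceiver()
first = rx.check_message(make_message(0, b"data"))
corrupted = make_message(1, b"data")
corrupted.payload = b"dbta"  # simulate a bit flip in transit
second = rx.check_message(corrupted)
third = rx.check_message(make_message(2, b"x"))  # message 1 never arrived intact
```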
Dynamic Verification
- Dynamic verification
  - Correct system operation is constantly monitored at runtime
  - End-to-end scheme
  - Detects transient errors, design bugs, and manufacturing errors
  - Differs from statically verifying that the design is bug-free
- High-level invariants are checked instead of individual components
  - Simplified design of system components
  - Can detect any low-level error that violates an invariant
Memory Consistency
- Memory consistency model
  - Formal specification of memory system behavior in a multithreaded system
  - Defines the order in which memory accesses from different CPUs can become globally visible
  - Many consistency models exist; we focus on one
- Verifying memory consistency amounts to verifying correctness of the memory system
  - Ideal invariant for dynamic verification
Sequential Consistency (SC)
- Requires the appearance of a total global order of all loads and stores in the system
  - Each load must receive the value of the most recent store to the same address in the total order
  - The program order of every processor is preserved in the total order
- SC is the most intuitive consistency model
  - Good for programmers
  - Speculation can make SC almost as fast as more relaxed models
- Our contribution: Dynamic Verification of Sequential Consistency (DVSC)
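The two SC requirements above can be checked mechanically once a candidate total order is known. The sketch below assumes a hypothetical trace format: `(cpu, kind, address, value)` tuples listed in the proposed global order, each CPU's program order as a list of the same tuples, and an initial memory state. None of these names come from the paper.

```python
from typing import Dict, List, Tuple

Op = Tuple[int, str, str, int]  # (cpu, "LD" or "ST", address, value)


def is_sc_execution(total_order: List[Op],
                    program_orders: Dict[int, List[Op]],
                    initial: Dict[str, int]) -> bool:
    """Check a proposed total order against the two SC rules."""
    # Rule 1: the total order must contain each CPU's operations in program order.
    per_cpu: Dict[int, List[Op]] = {cpu: [] for cpu in program_orders}
    for op in total_order:
        per_cpu.setdefault(op[0], []).append(op)
    if per_cpu != program_orders:
        return False
    # Rule 2: every load returns the value of the most recent store to the same
    # address earlier in the total order (or the initial value if there is none).
    memory = dict(initial)
    for _cpu, kind, addr, value in total_order:
        if kind == "ST":
            memory[addr] = value
        elif memory.get(addr) != value:
            return False
    return True
```

For example, with program orders {ST A=1; LD B=0} on CPU 1 and {ST B=1; LD A=0} on CPU 2 (the classic store-buffering outcome), no candidate order passes, because whichever load comes last must see the other CPU's store.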
Outline
- Introduction
- DVSC-Direct
- DVSC-Indirect
- Results
- Conclusion
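DVSC-Direct
[Figure: DVSC-Direct example. CPU 1 program order: LD A=1, ST B=2, LD A=2. CPU 2 program order: LD C=1, ST A=2, LD C=1. The operations of both CPUs are shown in a global order and sent to a verifier; the slide animation is only partially recoverable.]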
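Conceptually, what is being established for this example is that some total order consistent with both CPUs' program orders explains every load value. For a trace this small, brute-force depth-first search over interleavings finds such an order. This sketch assumes initial values A = 1 and C = 1 (the loads suggest them, but the slide does not state them), and the name `find_sc_order` is hypothetical; exhaustive search is not something hardware could afford, so it only illustrates the property being verified.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, int]  # ("LD" or "ST", address, value)


def find_sc_order(programs: List[List[Op]],
                  initial: Dict[str, int]) -> Optional[List[Tuple[int, Op]]]:
    """Search for an interleaving in which every load sees the latest store."""

    def search(positions: Tuple[int, ...], memory: Dict[str, int],
               order: List[Tuple[int, Op]]) -> Optional[List[Tuple[int, Op]]]:
        if all(p == len(prog) for p, prog in zip(positions, programs)):
            return order  # every operation placed: a valid SC order exists
        for cpu, prog in enumerate(programs):
            p = positions[cpu]
            if p == len(prog):
                continue
            kind, addr, value = prog[p]
            if kind == "LD" and memory.get(addr) != value:
                continue  # this load cannot be performed next
            new_memory = dict(memory)
            if kind == "ST":
                new_memory[addr] = value
            next_positions = positions[:cpu] + (p + 1,) + positions[cpu + 1:]
            result = search(next_positions, new_memory, order + [(cpu + 1, prog[p])])
            if result is not None:
                return result
        return None

    return search(tuple(0 for _ in programs), dict(initial), [])


cpu1 = [("LD", "A", 1), ("ST", "B", 2), ("LD", "A", 2)]
cpu2 = [("LD", "C", 1), ("ST", "A", 2), ("LD", "C", 1)]
order = find_sc_order([cpu1, cpu2], {"A": 1, "B": 0, "C": 1})
```

`order` lists `(cpu, operation)` pairs; a result of `None` would mean no SC order explains the observed values.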
DVSC-Indirect Idea
- Verify conditions sufficient for sequential consistency
  - In-order performance of memory operations
  - Cache coherence
  - Conditions formally defined and proven by Plakal et al., SPAA 1998
- Two mechanisms
  - On-chip checker for in-order performance
  - Distributed checker for cache coherence
In-Order Performance Verification
- A load of block B receives the value of either
  - the most recent local store to B, or
  - the most recent global store to B performed after all local stores
- Trivially observed on an in-order processor with coherent caches
- Modern processors execute out of order
  - Results of out-of-order execution are considered speculative until in-order re-execution and verification
  - DVSC-Indirect uses the DIVA checker core (Austin, MICRO 1999)
  - Could substitute other mechanisms
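The load rule above can be illustrated with a simplified in-order replay model. This is not DIVA itself: the hypothetical `InOrderChecker` keeps a FIFO write buffer of the CPU's own stores that are not yet globally performed, and a dict stands in for the coherent cache. On replay, a load must match the youngest buffered local store to its address or, if there is none, the current cache value, which already reflects any global stores performed after the local ones drained.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class InOrderChecker:
    """Replays committed loads and stores in program order (simplified model)."""

    def __init__(self, cache: Dict[str, int]) -> None:
        self.cache = cache  # stands in for the coherent cache
        self.write_buffer: Deque[Tuple[str, int]] = deque()  # local stores not yet performed

    def store(self, addr: str, value: int) -> None:
        self.write_buffer.append((addr, value))

    def drain_one(self) -> None:
        """Perform the oldest buffered store, making it globally visible."""
        addr, value = self.write_buffer.popleft()
        self.cache[addr] = value

    def check_load(self, addr: str, speculative_value: int) -> bool:
        """True if the value the out-of-order core used matches in-order semantics."""
        for buffered_addr, value in reversed(self.write_buffer):
            if buffered_addr == addr:
                return speculative_value == value  # most recent local store
        return speculative_value == self.cache.get(addr)
```

A mismatch returned by `check_load` would be treated as a detected error, which in the system described here would trigger backward error recovery.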
Cache Coherence
- All processors observe the same order of stores to a given memory location
  - Difficult because the same memory location can exist in different caches
- Maintained by a coherence protocol
  - Different protocols: MOSI, MSI, MOESI, Token Coherence, …
  - Different maintenance mechanisms: directory, snooping
- Verification uses divide and conquer
  - Verify conditions provably sufficient for cache coherence
  - Initially defined for a proof of sequential consistency by Plakal et al., SPAA 1998
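To make the definition in the first bullet concrete: suppose each processor logs the sequence of stores it observes to one location, and every store carries a unique identifier (an assumption for illustration). The processors agree on a single store order exactly when the union of their "observed before" edges is acyclic, which a topological sort can test. The function name `writes_agree` is hypothetical, and the sketch uses Python's `graphlib` (Python 3.9+). It checks the definition directly, whereas the approach described here verifies epoch conditions that are sufficient for coherence.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def writes_agree(observed: List[List[str]]) -> bool:
    """True if all per-processor store sequences fit one common total order."""
    predecessors: Dict[str, Set[str]] = {}
    for sequence in observed:
        for write in sequence:
            predecessors.setdefault(write, set())
        for earlier, later in zip(sequence, sequence[1:]):
            predecessors[later].add(earlier)  # 'earlier' must precede 'later'
    try:
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False
```

A processor may miss some stores entirely (its sequence is then a subsequence of the global order); only contradictory orderings make the check fail.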
Cache Coherence Verification
- Coherence conditions
  - Cache accesses are contained in an epoch
    - Stores in read-write epochs
    - Loads in read-write or read-only epochs
  - Read-write epochs do not overlap other epochs
  - Block data at the beginning of an epoch equals block data at the end of the last read-write epoch
- Verification
  - Check whether accesses are in an appropriate epoch during DIVA replay
  - Collect epoch information at every node and send it to the verifier
  - Verifier checks the epoch history for overlaps and data propagation
Epoch: the time interval between obtaining and losing permissions on a block.
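One way to state the three coherence conditions in symbols for a single block B. The notation is introduced here for illustration and is not taken from the paper: each epoch e has a type τ(e) ∈ {RO, RW}, a begin time b(e), an end time t(e), and block data D_b(e) at its beginning and D_t(e) at its end. If no earlier read-write epoch exists, D_t(e*) is read as the block's initial value.

```latex
\begin{aligned}
&\text{(1) Containment: a store to } B \text{ by node } n \text{ at time } s \text{ requires an epoch } e \text{ of } n
  \text{ with } \tau(e)=\mathrm{RW},\ b(e)\le s\le t(e);\\
&\qquad \text{a load only requires } \tau(e)\in\{\mathrm{RO},\mathrm{RW}\}.\\
&\text{(2) No overlap: } \tau(e)=\mathrm{RW},\ f\neq e \;\Rightarrow\; t(e)\le b(f)\ \lor\ t(f)\le b(e).\\
&\text{(3) Propagation: } D_b(e)=D_t(e^{*}),\qquad
  e^{*}=\operatorname*{arg\,max}_{f:\ \tau(f)=\mathrm{RW},\ t(f)\le b(e)} t(f).
\end{aligned}
```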
Implementation Overview
[Figure: three nodes, each with a CPU core, a DIVA checker, and a cache that records epochs, connected through the interconnect to three memory controllers, each of which collects epochs, verifies them, and keeps an epoch history.]
At the Cache Controller
- All caches keep track of active epochs in the Cache Epoch Table (CET)
  - An Epoch Inform is sent to the memory controller when an epoch ends
  - Begin and end data are hashed
- Every DIVA cache access checks the CET for an active epoch
  - Ensures the access is contained in an epoch
- Verification is off the critical path
  - Second-order performance effect from bandwidth usage
Epoch Inform fields:
- Type (read-write or read-only), kept in the CET
- Begin time, kept in the CET
- Begin data (hashed), kept in the CET
- End time
- End data (hashed)
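The cache-side bookkeeping can be sketched as a small software model. Assumptions: the names `CacheEpochTable`, `CetEntry`, and `EpochInform` are hypothetical, time is a logical timestamp supplied by the caller, only one active epoch per block is tracked, and a truncated SHA-256 digest from Python's `hashlib` stands in for whatever hardware hash the design uses to compress block data.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


def block_hash(data: bytes) -> str:
    # Truncated digest standing in for a hardware hash of the block contents.
    return hashlib.sha256(data).hexdigest()[:16]


@dataclass
class CetEntry:
    epoch_type: str  # "RW" (read-write) or "RO" (read-only)
    begin_time: int
    begin_hash: str


@dataclass
class EpochInform:
    block: int
    epoch_type: str
    begin_time: int
    begin_hash: str
    end_time: int
    end_hash: str


class CacheEpochTable:
    def __init__(self) -> None:
        self.active: Dict[int, CetEntry] = {}
        self.outbox: List[EpochInform] = []  # informs queued for the memory controller

    def begin_epoch(self, block: int, epoch_type: str, now: int, data: bytes) -> None:
        """Called when the cache obtains read-only or read-write permission."""
        self.active[block] = CetEntry(epoch_type, now, block_hash(data))

    def end_epoch(self, block: int, now: int, data: bytes) -> None:
        """Called when permission is lost: retire the entry and emit an inform."""
        entry = self.active.pop(block)
        self.outbox.append(EpochInform(block, entry.epoch_type, entry.begin_time,
                                       entry.begin_hash, now, block_hash(data)))

    def check_access(self, block: int, is_store: bool) -> bool:
        """Called on each DIVA replay access: it must fall inside a suitable epoch."""
        entry = self.active.get(block)
        if entry is None:
            return False
        return entry.epoch_type == "RW" or not is_store
```

Sending the inform only at epoch end keeps the check off the critical path: the cache never waits for the memory controller's verdict.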
At the Memory Controller
- Check for epoch overlaps and correct value propagation
  - Generally requires the entire block history → O(N) space
- If epoch informs are processed in order
  - Need end value of the last read-write epoch for the propagation check
  - Need end time of the last read-write and last read-only epoch for the overlap check
  - → O(1) space
- Epochs arrive almost in order
  - Fix remaining reorderings in a priority queue before verification
- Epoch state is kept in the Memory Epoch Table (MET)
  - End times of the last read-only and last read-write epoch, last value
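The O(1)-state check and the reordering step can be sketched as follows. Assumptions, all hypothetical: informs are ordered by epoch end time (the slides say "in order" without naming the key), every inform reaches the memory controller at most `window` time units after its epoch ends, and `MemoryEpochTable`, `Inform`, and `MetEntry` are illustrative names. Under these assumptions a min-heap (Python's `heapq`) can safely release any inform whose end time is more than `window` in the past, since nothing earlier can still arrive.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Inform:
    block: int
    rw: bool         # True for a read-write epoch, False for read-only
    begin: int
    end: int
    begin_data: str  # hash of the block data when the epoch began
    end_data: str    # hash of the block data when the epoch ended


@dataclass
class MetEntry:
    last_rw_end: int = 0
    last_ro_end: int = 0
    last_value: str = ""  # end data of the last read-write epoch


class MemoryEpochTable:
    def __init__(self, initial_value: str, window: int) -> None:
        self.initial_value = initial_value  # hash of the block's initial contents
        self.window = window                # assumed bound on inform delivery delay
        self.table: Dict[int, MetEntry] = {}
        self.pending: List[Tuple[int, int, Inform]] = []  # min-heap keyed by end time
        self.errors: List[str] = []
        self._seq = 0  # tie-breaker so the heap never compares Inform objects

    def receive(self, inform: Inform, now: int) -> None:
        heapq.heappush(self.pending, (inform.end, self._seq, inform))
        self._seq += 1
        # An inform that ended before now - window can no longer be overtaken.
        while self.pending and self.pending[0][0] < now - self.window:
            self._verify(heapq.heappop(self.pending)[2])

    def flush(self) -> None:
        while self.pending:
            self._verify(heapq.heappop(self.pending)[2])

    def _verify(self, e: Inform) -> None:
        met = self.table.setdefault(e.block, MetEntry(last_value=self.initial_value))
        # Overlap: no epoch may begin before the last read-write epoch ended, and a
        # read-write epoch may not begin before the last read-only epoch ended.
        if e.begin < met.last_rw_end or (e.rw and e.begin < met.last_ro_end):
            self.errors.append(f"block {e.block}: epoch [{e.begin}, {e.end}] overlaps")
        # Propagation: begin data must equal the end data of the last RW epoch.
        if e.begin_data != met.last_value:
            self.errors.append(f"block {e.block}: wrong data at begin time {e.begin}")
        if e.rw:
            met.last_rw_end, met.last_value = e.end, e.end_data
        else:
            met.last_ro_end = e.end
```

Per block, only `MetEntry` persists after verification; the heap holds just the few informs still inside the reordering window.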
Experimental Evaluation
- Empirically determine error detection capability
  - Error injection into caches, controllers, interconnect, switches, etc.
- Quantify error-free overhead
  - Increase in interconnect bandwidth consumption
  - Potential decrease in application performance
Simulation Methodology
- Full-system simulation of an 8-CPU UltraSPARC SMP
  - Simics functional simulation
  - GEMS-based timing simulation
- 2 GB RAM, 4-way 32 KB I/D L1, 4-way 1 MB L2
- SafetyNet for backward error recovery
- MOSI-Directory and MOSI-Snooping protocols
Benchmarks:
- Apache 2: static web server
- SpecJBB: 3-tier Java system
- OLTP: online transaction system with DB2
- Slashcode: dynamic website with Perl and MySQL
- Barnes: Barnes-Hut from SPLASH-2
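Bottleneck Link Bandwidth (Directory)
[Figure: bottleneck link bandwidth with the directory protocol; chart not recoverable]
Error-Free Runtime (Directory)
[Figure: error-free runtime with the directory protocol, with one axis direction labeled "slower"; chart not recoverable]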
Conclusions
- DVSC-Direct and DVSC-Indirect enable end-to-end verification of the memory system
- DVSC-Indirect imposes acceptable hardware and performance overhead
- An extension of DVSC-Indirect to relaxed consistency is currently under development
Questions?