Title: Efficient Decentralized Monitoring of Safety in Distributed Systems
1. Efficient Decentralized Monitoring of Safety in Distributed Systems
- Koushik Sen
- Abhay Vardhan
- Gul Agha
- Grigore Rosu
University of Illinois at Urbana-Champaign, USA
2. Software Reliability
- Software Validation
- Rigorous and Complete Methods
- Model Checking
- Theorem Proving
- Infeasible for large-scale open distributed systems
- Non-determinism and asynchrony
- Testing
- Widely used
- Ad-Hoc
- Good Test Coverage Required
- Runtime Monitoring
- Adds rigor to Testing
3. Centralized Monitoring Approach
- Monitoring: use formal methods in testing
- Synthesize lightweight monitors from the specification
- Automata, rewriting-based monitors, state machines
- Instrument code to insert monitors
- Execute instrumented code
- Distributed System Monitoring
- Global state is distributed
- For every state update, send the state to a central monitor
- The central monitor assembles the states into consistent execution traces (using vector clocks)
- Sequence of global states
- Monitor execution traces
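The vector-clock bookkeeping that a central monitor relies on to order reported states can be sketched as follows. This is a minimal illustration and not the authors' implementation; the names `tick`, `merge`, and `happened_before` are hypothetical, and clocks are modeled as plain dictionaries.

```python
from typing import Dict

# A clock maps a process name to the number of events known from that process.
Clock = Dict[str, int]


def tick(clock: Clock, pid: str) -> Clock:
    """Return a copy of `clock` with the component of `pid` advanced by one."""
    new = dict(clock)
    new[pid] = new.get(pid, 0) + 1
    return new


def merge(local: Clock, received: Clock) -> Clock:
    """Component-wise maximum, applied when a message is received."""
    keys = set(local) | set(received)
    return {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}


def happened_before(c1: Clock, c2: Clock) -> bool:
    """True if c1 <= c2 in every component and c1 < c2 in at least one."""
    keys = set(c1) | set(c2)
    le = all(c1.get(k, 0) <= c2.get(k, 0) for k in keys)
    lt = any(c1.get(k, 0) < c2.get(k, 0) for k in keys)
    return le and lt


# Process a sends a request; process b receives it and then performs an event.
a_req = tick({}, "a")
b_recv = tick(merge({}, a_req), "b")
# A later local event at a that b has not heard about.
a_later = tick(a_req, "a")
```

Here `a_req` is ordered before `b_recv`, while `a_later` and `b_recv` are unordered, so a monitor may place them in either order when it assembles a consistent trace.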
4. An Example
- Mobile node a requests certain value from node b
- b computes the value and sends it to a
- Property: no node receives a value from another node to which it had not sent a request
5. Centralized Monitoring Example
If a receives a value from b, then b calculated the value after receiving a request from a:
valRcv → ⟐(valComputed ∧ ⟐valReq)
[Figure: processes a (events valReq, valRcv) and b (event valComputed) report to a central Monitor; the diagram is annotated with the subformulas ⟐valReq, valComputed ∧ ⟐valReq, and ⟐(valComputed ∧ ⟐valReq).]
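A central monitor for the property valRcv → ⟐(valComputed ∧ ⟐valReq), where ⟐ reads "at some point in the past", might look like the sketch below. It assumes that each global state is given as the set of propositions that hold in it, and it keeps one boolean per past-time subformula. The class name `CentralMonitor` and the helper `check` are hypothetical.

```python
from typing import Iterable, List, Set


class CentralMonitor:
    """Checks valRcv -> once(valComputed and once(valReq)) state by state."""

    def __init__(self) -> None:
        self.req_seen = False            # value of "once valReq" so far
        self.computed_after_req = False  # value of "once (valComputed and once valReq)"

    def step(self, state: Set[str]) -> bool:
        """Consume one global state; return False if the property is violated."""
        # "once valReq" holds now if valReq holds now or it held earlier.
        self.req_seen = self.req_seen or ("valReq" in state)
        # Same recurrence, applied to the conjunction.
        self.computed_after_req = self.computed_after_req or (
            "valComputed" in state and self.req_seen
        )
        # The implication fails only if valRcv holds without the past obligation.
        return ("valRcv" not in state) or self.computed_after_req


def check(trace: Iterable[Set[str]]) -> List[bool]:
    """Run a fresh monitor over a whole trace and collect the verdicts."""
    monitor = CentralMonitor()
    return [monitor.step(state) for state in trace]


good = check([{"valReq"}, {"valComputed"}, {"valRcv"}])
bad = check([{"valComputed"}, {"valRcv"}])  # the value was never requested
```

The monitor needs constant memory regardless of trace length, which is what makes this style of monitor lightweight; the cost of the centralized approach lies in shipping every state update to it.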
6. Decentralized Monitoring Approach
If a receives a value from b, then b calculated the value after receiving a request from a:
valRcv → @b(⟐(valComputed ∧ @a(⟐valReq)))
[Figure: processes a (events valReq, valRcv) and b (event valComputed), with no central monitor; the diagram is annotated with the subformulas ⟐valReq, @a(⟐valReq), valComputed ∧ @a(⟐valReq), and ⟐(valComputed ∧ @a(⟐valReq)).]
7. Past-Time Distributed Temporal Logic (pt-DTL)
- Past-time linear temporal logic [Pnueli]
- Extended with an operator from epistemic logic (@)
- [Aumann 76], [Meenakshi et al. 00]
- Properties are stated with respect to a process, say p
- Interpreted over the sequence of knowledge that p has about the global state
8. Remote Formulas in pt-DTL
- @a F at process b
- @ makes the remote formula F at process a local to process b
- "Alarm at process b implies that there was a fire at a"
- alarm → @a fire
- a formula with respect to process b
9. Remote Expressions in pt-DTL
- Remote expressions: arbitrary expressions related to the state of a remote process
- Propositions are constructed from remote and local expressions
- "If my alarm is set, then at some point in the past the difference between my temperature and the temperature at process b exceeded the allowed value"
- alarm → ⟐((myTemp − @b temp) > allowed)
10. Safety in Airplane Landing
- "If my airplane is landing, then the runway that the airport has allocated matches the one that I am planning to use"
- landing → (runway = @airport allocRunway)
11. Leader Election Example
- "If a leader is elected, then if the current process is a leader then, to its knowledge, none of the other processes is a leader"
- elected → (state = leader → ⋀_{j≠i} @j(state ≠ leader))
12. pt-DTL Syntax
- Fi ::= true | false | P(Ei) | ¬Fi | Fi ∧ Fi   (propositional)
- | ⊙Fi | ⊡Fi | ⟐Fi | Fi S Fi   (temporal)
- | @j Fj   (epistemic)
- Ei ::= c | vi ∈ Vi | f(Ei)   (functional)
- | @j Ej   (epistemic)
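One way to hold such formulas in memory is a small abstract syntax tree, sketched below with Python dataclasses. All class names, the `kind` strings, and the helpers `implies` and `remote_processes` are hypothetical and only mirror the grammar of this slide. Implication is encoded through ¬ and ∧ because the grammar lists only those two connectives.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


# --- Expressions E_i ---
@dataclass(frozen=True)
class Const:  # c
    value: float

@dataclass(frozen=True)
class Var:  # v_i in V_i
    name: str

@dataclass(frozen=True)
class Fn:  # f(E_i, ...)
    op: str
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class AtE:  # @j E_j
    process: str
    expr: "Expr"

Expr = Union[Const, Var, Fn, AtE]


# --- Formulas F_i (true and false are plain Python bools) ---
@dataclass(frozen=True)
class Pred:  # P(E_i, ...)
    op: str
    args: Tuple[Expr, ...]

@dataclass(frozen=True)
class Not:
    f: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Past:  # unary past-time operator; kind is "prev", "once" or "always"
    kind: str
    f: "Formula"

@dataclass(frozen=True)
class Since:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class AtF:  # @j F_j
    process: str
    f: "Formula"

Formula = Union[bool, Pred, Not, And, Past, Since, AtF]


def implies(a: Formula, b: Formula) -> Formula:
    """a -> b, written with the grammar's own connectives."""
    return Not(And(a, Not(b)))


def remote_processes(node) -> Set[str]:
    """Collect every process name that appears under an @ operator."""
    if isinstance(node, AtE):
        return {node.process} | remote_processes(node.expr)
    if isinstance(node, AtF):
        return {node.process} | remote_processes(node.f)
    if isinstance(node, (Fn, Pred)):
        found: Set[str] = set()
        for arg in node.args:
            found |= remote_processes(arg)
        return found
    if isinstance(node, (Not, Past)):
        return remote_processes(node.f)
    if isinstance(node, (And, Since)):
        return remote_processes(node.left) | remote_processes(node.right)
    return set()  # Const, Var, bool


# alarm -> once((myTemp - @b temp) > allowed)
diff = Fn("-", (Var("myTemp"), AtE("b", Var("temp"))))
prop = implies(Pred("alarm", ()), Past("once", Pred(">", (diff, Var("allowed")))))
```

A traversal like `remote_processes` tells a monitor which remote processes a formula mentions, and therefore which processes it needs to track knowledge about.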
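13. Interpretation of @j Ej at process i
[Figure: space-time diagram of processes p1, p2, and p3 exchanging messages m1 to m4; p1 carries the labels x=7 and x=9, and the diagram is annotated with @1(x=9).]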
14. Monitoring Algorithm
- Requirements
- Should be fast so that online monitoring is possible
- Little memory overhead
- Additional messages sent should be minimal, ideally zero
15. KnowledgeVector
- Let KV be a vector
- One entry for each process appearing in the formula
- KV[j] denotes the entry for process j
- KV[j].seq is the sequence number of the last event seen at process j
- KV[j].values stores the values of j-expressions and j-formulae
16. Monitoring using KnowledgeVector
- Maintain a KnowledgeVector about the global state at each process
- Attach the KnowledgeVector to outgoing messages
- Update the KnowledgeVector with incoming messages
- At each process, monitor the local KnowledgeVector
17. KnowledgeVector Algorithm
- internal event (at process i)
  - store eval(Ei, si) and eval(Fi, si) for each @i Ei and @i Fi in KVi[i].values
- send m
  - KVi[i].seq ← KVi[i].seq + 1
  - send KVi with m as KVm
- receive m
  - for all j, if KVm[j].seq > KVi[j].seq then
    - KVi[j].seq ← KVm[j].seq
    - KVi[j].values ← KVm[j].values
  - store eval(Ei, si) and eval(Fi, si) for each @i Ei and @i Fi in KVi[i].values
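The three handlers above (internal event, send, receive) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `Entry` and `Process` are hypothetical names, `local_exprs` stands in for the expressions and formulas that other processes reference through @i, and the evaluation of temporal operators is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Entry:
    seq: int = 0                                           # last event seen at that process
    values: Dict[str, Any] = field(default_factory=dict)   # its expression/formula values


class Process:
    def __init__(self, pid: str, peers: List[str],
                 local_exprs: Dict[str, Callable[[Dict[str, Any]], Any]]) -> None:
        self.pid = pid
        self.state: Dict[str, Any] = {}
        self.local_exprs = local_exprs               # assumption: name -> function of local state
        self.kv = {p: Entry() for p in peers}        # one entry per process in the formula

    def _store_own_values(self) -> None:
        """Store eval(E_i, s_i) and eval(F_i, s_i) in KV_i[i].values."""
        self.kv[self.pid].values = {
            name: fn(self.state) for name, fn in self.local_exprs.items()
        }

    def internal(self, updates: Dict[str, Any]) -> None:
        self.state.update(updates)
        self._store_own_values()

    def send(self) -> Dict[str, Entry]:
        """Increment the own sequence number and return the vector to piggyback."""
        self.kv[self.pid].seq += 1
        # Copy so that later local updates do not change a message in flight.
        return {p: Entry(e.seq, dict(e.values)) for p, e in self.kv.items()}

    def receive(self, kv_m: Dict[str, Entry]) -> None:
        for j, entry in kv_m.items():
            if entry.seq > self.kv[j].seq:           # the message carries newer knowledge of j
                self.kv[j] = Entry(entry.seq, dict(entry.values))
        self._store_own_values()


# p2 learns the value of x at p1 only through the vectors attached to messages.
p1 = Process("p1", ["p1", "p2"], {"x": lambda s: s["x"]})
p2 = Process("p2", ["p1", "p2"], {})
p1.internal({"x": 5})
m1 = p1.send()
p1.internal({"x": 9})      # not yet known to p2
p2.receive(m1)
seen_first = p2.kv["p1"].values["x"]
m2 = p1.send()
p2.receive(m2)
p2.receive(m1)             # a stale vector is ignored
seen_last = p2.kv["p1"].values["x"]
```

The vector travels only on messages the application already sends, so the sketch adds no messages of its own; the price is that a process's view of a remote value can lag behind the remote process's current state.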
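18. Example
[Figure: space-time diagram of processes p1, p2, and p3; p1 carries the labels X5, X9, and X6, and p2 carries the labels Y7 and Y3; a property over Y and @1 X is monitored at p2 using KV[1].seq and KV[1].values, and one point is marked as a violation.]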
19. DIANA Architecture
[Figure: DIANA architecture diagram, including the pt-DTL Monitor component.]
20. Conclusion
- pt-DTL can express interesting and useful safety properties of distributed systems
- Decentralized technique to effectively verify distributed systems at runtime
- No extra message overhead for monitoring
- KnowledgeVector as monitors