Title: Property Assurance in Middleware for Distributed Real-Time Systems*
1. Property Assurance in Middleware for Distributed Real-Time Systems
- Christopher Gill
- cdgill_at_cse.wustl.edu
- Department of Computer Science and Engineering
- Washington University, St. Louis, MO
Seminar at the Coordinated Sciences Laboratory
University of Illinois Urbana-Champaign
Thursday, May 24, 2007
Research supported in part by NSF grants
CCF-0615341 (EHS) and CCF-0448562 (CAREER)
2. A Motivating Example: Real-Time Image Transmission
- Chains of end-to-end tasks
  - E.g., compress, transmit, decompress, analyze, and then display images
- Property assurance is crucial
  - Soft real-time constraints
  - Deadlock freedom
- Many applications have similar needs
  - Is correct reuse of middleware possible?
[Figure: image transmission system with Camera and Console endpoints]
Gill et al., Integrated Adaptive QoS Management in Middleware: An Empirical Case Study (RTAS 04)
Wang et al., CAMRIT: Control-based Adaptive Middleware for Real-time Image Transmission (RTAS 04)
3. Middleware for Distributed Real-Time Systems
- Layered stacks of mechanisms
  - thread, port, socket, timer
  - reactor, monitor
  - client, server, gateway, ORB
- Task chains span multiple hosts
  - may be initiated asynchronously
- Limited host resources
  - used by multiple task chains
[Figure: A Distributed System Software Stack]
4. One Widely Used Mechanism: Reactor
[Figure: Reactor interaction. Participants: Application, Reactor, Socket, Event Handlers, Read Handle Set. Steps shown: data arrival, select(), handle_input(), read()]
Reactor abstraction has many variations:
- select() vs. WaitForMultipleObjects()
- single thread vs. thread pool
- unsynchronized vs. mutex vs. readers-writer
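The interaction above can be sketched in a few lines. This is a minimal illustration, not ACE code: it assumes the single-threaded, unsynchronized, select()-based variation, and the class names `Reactor` and `RecordingHandler` and the function `demo` are hypothetical.

```python
import select
import socket


class Reactor:
    """Minimal single-threaded reactor built on select() (illustrative only)."""

    def __init__(self):
        self._handlers = {}  # read handle set: socket -> event handler

    def register(self, sock, handler):
        self._handlers[sock] = handler

    def handle_events(self, timeout=1.0):
        """Wait once on the read handle set and upcall each ready handler."""
        if not self._handlers:
            return 0
        ready, _, _ = select.select(list(self._handlers), [], [], timeout)
        for sock in ready:
            self._handlers[sock].handle_input(sock)  # upcall into the application
        return len(ready)


class RecordingHandler:
    """Hypothetical application event handler: reads the data and keeps it."""

    def __init__(self):
        self.received = []

    def handle_input(self, sock):
        self.received.append(sock.recv(4096))  # read() step in the figure


def demo():
    producer, consumer = socket.socketpair()
    reactor, handler = Reactor(), RecordingHandler()
    reactor.register(consumer, handler)
    producer.sendall(b"frame-1")          # data arrival
    dispatched = reactor.handle_events()  # select(), then handle_input()
    producer.close()
    consumer.close()
    return dispatched, handler.received
```

Swapping select() for another demultiplexer or adding a thread pool changes only `handle_events`, which is why the variations listed above matter for analysis.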
5. An Illustration of Inherent Complexity
- Wait-on-Reactor
  - Handler waits in reactor for reply
    - E.g., set read_mask, call select() again
  - Other requests can be processed while replies are still pending
  - For efficiency, call stack remembers handler continuation
  - Intervening requests may delay reply processing (LIFO semantics)
- Wait-on-Connection
  - Handler waits on socket connection for the reply
    - Blocking call to recv()
  - One less thread listening on the Reactor for new requests
  - Exclusive handling of the reply
  - However, may cause deadlocks if reactor upcalls are nested
Two essential research questions:
- How can we represent and analyze such diverse behavior?
- How can we enforce properties that span hosts, efficiently?
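The LIFO effect of Wait-on-Reactor can be reproduced with a small simulation. This sketch assumes a single thread and models "waiting in the reactor" as a nested call to the event loop; the event encoding and the function name `run_wait_on_reactor` are hypothetical.

```python
from collections import deque


def run_wait_on_reactor(events):
    """Replay (kind, name) events through a reactor whose handlers wait on the
    reactor itself, and return the order in which handlers complete."""
    pending = deque(events)
    arrived = set()   # replies that have been read from the network
    completed = []    # handlers whose continuation has resumed

    def event_loop(waiting_for=None):
        # A nested call models a handler that re-entered the reactor to wait.
        while pending and waiting_for not in arrived:
            kind, name = pending.popleft()
            if kind == "reply":
                arrived.add(name)
            else:
                handle_request(name)  # new upcall stacked on top of the waiter

    def handle_request(name):
        event_loop(waiting_for=name)  # wait on the reactor for our reply
        if name in arrived:
            completed.append(name)    # continuation resumes only now

    event_loop()
    return completed


# The reply for A arrives first, but A's continuation sits below B's on the stack.
interleaved = [("request", "A"), ("request", "B"), ("reply", "A"), ("reply", "B")]
```

With the `interleaved` sequence, B completes before A even though A's reply arrived earlier, which is the delay the slide attributes to intervening requests.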
6. Essential Technical Objectives
- A principled basis for middleware verification
  - Model each mechanism's inherent complexity accurately
  - Remove unnecessary complexity through abstraction
  - Compose models tractably and with high fidelity to system itself
- New protocols and mechanisms for property enforcement
  - Exploit call graph structure and other domain-specific information
  - Develop efficient local mechanisms for end-to-end enforcement
  - Design frameworks to support entire families of related protocols
- Practical extensions to preemption and control semantics
  - Leverage existing theory to address part of the problem space
  - Identify and exploit domain-specific problem structure
  - Develop decidable and tractable representations of other behavior
7. Model Architecture in IF for ACE
- Network/OS layer: inter-process communication abstractions
- Middleware layer: ACE pattern-oriented abstractions
- Application layer: application-specific semantics within ACE event handlers
8. Modeling Threads
- Challenge
  - No native constructs for threads in model checkers that currently support timed automata
- Option 1: model all thread actions as a single automaton
  - Suitable for high level modeling of application semantics
- Option 2: model a thread as multiple interacting automata
  - Interactions model the flow of control
  - This option better abstracts the nuances of ACE-level mechanisms
[Figure: automata Foo and Bar interacting through signals: input/output method_request, input/output method_result]
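Option 2 can be illustrated with two small automata that pass control back and forth. This is a sketch in Python rather than IF; the `Automaton` class, the state names, the `start` signal, and the assumption that Foo is the caller and Bar the callee are all hypothetical.

```python
from collections import deque


class Automaton:
    """An automaton with a current state, an input queue, and a transition table."""

    def __init__(self, name, initial, transitions):
        self.name = name
        self.state = initial
        self.inbox = deque()
        # (state, input signal) -> (next state, output signal, target automaton)
        self.transitions = transitions

    def enabled(self):
        return bool(self.inbox) and (self.state, self.inbox[0]) in self.transitions

    def step(self, automata, trace):
        signal = self.inbox.popleft()
        self.state, output, target = self.transitions[(self.state, signal)]
        trace.append((self.name, signal, self.state))
        if output is not None:
            automata[target].inbox.append(output)  # flow of control moves on


def run_thread():
    foo = Automaton("Foo", "idle", {
        ("idle", "start"): ("waiting", "method_request", "Bar"),
        ("waiting", "method_result"): ("done", None, None),
    })
    bar = Automaton("Bar", "idle", {
        ("idle", "method_request"): ("idle", "method_result", "Foo"),
    })
    automata = {"Foo": foo, "Bar": bar}
    foo.inbox.append("start")
    trace = []
    while any(a.enabled() for a in automata.values()):
        next(a for a in automata.values() if a.enabled()).step(automata, trace)
    return trace
```

The single thread of control is visible only in the trace: at any moment exactly one of the two automata is enabled.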
9. Modeling Thread Scheduling Semantics (1/4)
- Easy to achieve with one automaton per thread
  - Specify to model checker directly
  - E.g., using IF priority rules
- More difficult with more than one automaton per thread
  - Thread of control spans interactions among automata
  - Need to express thread scheduling in terms of execution control primitives provided by the model checker
[Figure: 1 automaton per thread. Labels: Activity1, Activity2, Update Display, Control Flow Rate]
prio_rule: pid1 < pid2 if pid1 instanceof Activity1 and pid2 instanceof Activity2
[Figure: more than one automaton per thread. Foo and Bar exchange m_req and m_result signals (input and output on each side)]
10. Modeling Thread Scheduling Semantics (2/4)
- Solution
  - Introduce a thread id that is propagated along automata interactions
  - Thread id acts as an index to a storage area which holds each thread's scheduling parameters
[Figure: Resulting Behavior. Automata Foo1/Bar1 and Foo2/Bar2 tagged with thread ids 1 and 2; thread storage entries Prio=5 and Prio=8]
thread_schedule: pid1 < pid2 if pid1 instanceof Foo1 and pid2 instanceof Bar1
    and ({Foo1}pid1).threadid <> ({Bar1}pid2).threadid
    and ({Thread}(({Foo1}pid1).threadid)).prio < ({Thread}(({Bar1}pid2).threadid)).prio
Hint to the model checker:
Give higher preference to the automaton whose thread (pointed to by thread id) has higher priority
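The same hint can be written outside IF as a filter over the enabled automata. This is a sketch under stated assumptions: a larger number means higher priority, thread 1 has priority 5 and thread 2 has priority 8 (the slide does not say which thread holds which value), and the names `ThreadInfo`, `AutomatonInstance`, and `preferred` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadInfo:
    prio: int  # scheduling parameter stored per thread


@dataclass(frozen=True)
class AutomatonInstance:
    name: str
    thread_id: int  # propagated along automata interactions


def preferred(enabled, threads):
    """Keep only the enabled automata whose thread has the highest priority."""
    top = max(threads[a.thread_id].prio for a in enabled)
    return [a for a in enabled if threads[a.thread_id].prio == top]


# Assumed assignment of the two priorities shown on the slide.
THREADS = {1: ThreadInfo(prio=5), 2: ThreadInfo(prio=8)}
ENABLED = [AutomatonInstance("Foo1", 1), AutomatonInstance("Bar2", 2)]
```

Because the priority lives in the thread storage area and not in the automaton, any automaton that receives thread id 2 inherits the preference.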
11. Modeling Thread Scheduling Semantics (3/4)
- What if two threads have the same priority?
  - In an actual implementation, run-to-completion (SCHED_FIFO) may control the possible interleavings
  - How can we model run-to-completion?
[Figure: tree of possible interleavings of automata Foo1, Foo2, Foo3 and Bar1, Bar2, Bar3]
How do we prune out this space?
12. Modeling Thread Scheduling Semantics (4/4)
- Solution
  - Record id of currently executing thread
  - Update when executing actions in each automaton
[Figure: interleaving tree annotated with the recorded thread id: Current=nil at the root, then Current=1 or Current=2 as the Foo and Bar automata take steps]
Hint to the model checker:
- Give higher preference to the automaton whose thread is the currently running thread.
- Non-deterministic choice if Current is nil
13. Problem: Over-constraining Concurrency
Hint to the model checker:
- Give higher preference to the automaton whose thread is the currently running thread.
- Non-deterministic choice if Current is nil
[Figure: the same interleaving tree. With Current=2, Bar3 is always chosen to run, even after time progresses and Foo3 is also able to run]
14. Solution: Idle Catcher Automaton
- Key idea: lowest priority catcher runs when all others are blocked
  - E.g., catcher thread in middleware group scheduling (RTAS 05)
- Here, idle catcher automaton
  - runs when all other automata are idle (not enabled), but before time progresses
  - Resets value of current id to nil
[Figure: with Current=2, Foo3 and Bar3 are blocked at this point; the idle catcher runs and sets Current=nil; time progresses; Foo3 or Bar3 could be chosen to run. Over-constraining eliminated]
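The run-to-completion hint and the idle catcher fit in two small functions. This is an illustrative sketch, not the IF model: automata are (name, thread id) pairs, `None` stands for nil, and the function names and the thread ids given to Foo3 and Bar3 are assumptions.

```python
def candidates(enabled, current):
    """Scheduling hint: prefer automata that belong to the current thread.
    If Current is nil, or its thread has nothing enabled, any enabled
    automaton may be chosen (non-deterministic choice)."""
    preferred = [a for a in enabled if a[1] == current]
    return preferred if preferred else list(enabled)


def idle_catcher(enabled, current):
    """Lowest-priority automaton: when nothing else is enabled, reset Current
    to nil before time progresses. Otherwise leave it unchanged."""
    return current if enabled else None


# Assumed thread ids: Foo3 belongs to thread 1, Bar3 to thread 2.
AFTER_DELAY = [("Foo3", 1), ("Bar3", 2)]

# Without the catcher, Current is still 2 after the delay: only Bar3 can run.
without_catcher = candidates(AFTER_DELAY, current=2)

# With the catcher, both automata were blocked, so Current was reset first.
with_catcher = candidates(AFTER_DELAY, current=idle_catcher([], current=2))
```

The reset only happens at points where every automaton is blocked, so run-to-completion within a burst of activity is still enforced.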
15. Problem: Tractability
[Figure labels: right away, in a minute, get coffee, go for an espresso, maybe tomorrow?]
- Model checking can suffer from state space explosion
  - State space reduction, live variable analysis can help
  - But even good model checkers don't fully solve this
- Need to think of modeling as a design issue, too
  - Does the model represent what it needs to represent?
  - Can the model be re-factored to help the checker?
  - Can domain specific information help avoid unnecessary checking?
16. Optimization 1: Leader Election
- Leader/Followers concurrency
  - Threads in a reactor thread pool take turns waiting on the reactor
  - One thread gets the token to access the reactor: leader
  - All other threads wait for the token: followers
- It does not matter which thread gets selected as leader in a threadpool
- Model checker not aware of this domain specific semantics
- For BASIC-P protocol example, saved factor of 50 in state space, and factor of 20 in time
[Figure: when the token to access the reactor is available, any of T1, T2, T3 could take it; the branches that differ only in which thread is chosen are pruned out]
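The effect of ignoring which thread becomes leader can be seen on a toy model. This sketch is not the IF optimization itself: it substitutes a generic symmetry reduction (sorting the per-thread states into a canonical form), and the three-state thread model and all names are hypothetical.

```python
from collections import deque


def successors(state):
    """Toy leader/followers model: a follower may take the token when no
    thread is leader; the leader then finishes and releases the token."""
    token_free = "leader" not in state
    for i, s in enumerate(state):
        if s == "follower" and token_free:
            yield state[:i] + ("leader",) + state[i + 1:]
        elif s == "leader":
            yield state[:i] + ("done",) + state[i + 1:]


def count_states(n_threads, canonical=lambda s: s):
    """Breadth-first exploration, counting distinct (canonical) states."""
    start = canonical(("follower",) * n_threads)
    seen, frontier = {start}, deque([start])
    while frontier:
        for nxt in successors(frontier.popleft()):
            nxt = canonical(nxt)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


def ignore_identity(state):
    """Threads in the pool are interchangeable, so only the multiset matters."""
    return tuple(sorted(state))
```

For three threads this toy model has 20 reachable states, and 7 once thread identity is ignored. The factors reported on the slide come from the much larger BASIC-P model.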
17. Optimization 2: System Initialization
- Similar idea, but different technique
- Iff ok to establish initial object relations in any order, can optimize away
  - E.g., 2 server automata, each of which creates a reactor automaton
- Useful when modeling object systems in model checkers with dynamic automaton creation capability (e.g., IF)
- State space reduction depends on application
  - Factor of 250 for a deadlock scenario with 2 reactors and 3 threads in each reactor
[Figure: two creation orders, "S1 creates R, then S2 creates R" and "S2 creates R, then S1 creates R", lead to the same configuration of servers and reactors; one of the two orders is pruned out]
18. Verification of a Real-Time Gateway
[Figure: Supplier1 and Supplier2 connect through the Gateway to Consumer1, Consumer2, Consumer3, and Consumer4]
- An exemplar of many realistic ACE-based applications
- We modified the Gateway example to add new capabilities
  - E.g., Real time, Reliability, Control-Push-Data-Pull
  - Value added service in Gateway before forwarding to a consumer
    - E.g., using consumer specific information to customize data stream
- Different design, configuration choices become important
  - E.g., number of threads, dispatch lanes, reply wait strategies
19. Model Checking/Experiment Configuration
[Figure: suppliers S1 (period 100ms) and S2 (period 50ms) feed the Gateway. Value-added execution cost is 20 for C1 and C2 and 10 for C3 and C4. Relative deadline is 100ms for C1 and C2 and 50ms for C3 and C4]
- Gateway is theoretically schedulable under RMA
  - Utilization = 80%
  - Schedulable utilization = 100% for harmonic periods
  - Assumption: messages from the 50ms supplier are given higher preference than messages from the 100ms supplier
- ACE models let us verify scheduling enforcement
  - IN THE ACTUAL SYSTEM IMPLEMENTATION

      Deadline  Exec time
C1    100ms     20ms
C2    100ms     20ms
C3    50ms      10ms
C4    50ms      10ms
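The 80% figure follows directly from the table. The sketch below recomputes it, assuming each consumer's period equals its relative deadline; the names `GATEWAY_TASKS`, `utilization`, and `harmonic` are hypothetical.

```python
from fractions import Fraction

# consumer -> (period in ms, value-added execution time in ms)
GATEWAY_TASKS = {"C1": (100, 20), "C2": (100, 20), "C3": (50, 10), "C4": (50, 10)}


def utilization(tasks):
    """Sum of execution time over period for every task."""
    return sum(Fraction(cost, period) for period, cost in tasks.values())


def harmonic(tasks):
    """True if every period divides each longer period."""
    periods = sorted({period for period, _ in tasks.values()})
    return all(longer % shorter == 0 for shorter, longer in zip(periods, periods[1:]))
```

Here `utilization(GATEWAY_TASKS)` is 2(20/100) + 2(10/50) = 4/5, and the periods 50ms and 100ms are harmonic, so the task set falls under the 100% bound cited on the slide.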
20. Real-time Gateway: Single Thread
[Figure: Supplier and Consumer endpoints connect to the Gateway, whose SupplierHandlers and ConsumerHandlers are served by one Reactor]
- Single reactor thread dispatches incoming events
- I/O (reactor) thread same as dispatch thread
- I/O thread responsible for value added service
21. Real-time Gateway: Dispatch Lanes
[Figure: Supplier and Consumer endpoints connect to the Gateway, whose SupplierHandlers and ConsumerHandlers are served by one Reactor]
- Single reactor thread again dispatches events to gateway handlers
- I/O (reactor) thread puts message into dispatch lanes
- Lane threads perform value added service, dispatch to consumers
- DO QUEUES HELP OR HURT TIMING PREDICTABILITY?
22. Model/Actual Traces for Real-time Gateway
[Figure: model and actual execution traces over time (ms) for two configurations. Single threaded Gateway: execution in the context of reactor thread. Gateway with dispatch lanes: execution in the context of lane threads. Supplier arrivals are marked S1,S2 and then S2. Annotation: deadline miss for Consumer4 because of blocking delay at reactor]
Expected execution timeline with RMS

      Period  Exec time  Deadline
C1    100ms   20ms       100ms
C2    100ms   20ms       100ms
C3    50ms    10ms       50ms
C4    50ms    10ms       50ms

[Figure: expected timeline from 0 to 100ms. Releases: C1, C2, C3, C4, then C3, C4. Execution order: C3, C4, C1, C2, C3, C4, C2]
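The expected RMS timeline can be reproduced with a small fixed-step simulation. This is a sketch under assumptions: all consumers are released together at time 0, time advances in 10ms steps, ties between equal periods are broken by listing order, and the names `rms_timeline` and `collapse` are hypothetical.

```python
CONSUMERS = [("C1", 100, 20), ("C2", 100, 20), ("C3", 50, 10), ("C4", 50, 10)]


def rms_timeline(tasks, horizon, step=10):
    """Preemptive rate-monotonic schedule, one entry per time step.
    tasks: (name, period, exec_time) with all values multiples of step."""
    by_priority = sorted(tasks, key=lambda t: t[1])  # shorter period first
    remaining = {name: 0 for name, _, _ in tasks}
    timeline = []
    for now in range(0, horizon, step):
        for name, period, cost in tasks:
            if now % period == 0:
                remaining[name] = cost  # release a new job
        running = next((n for n, _, _ in by_priority if remaining[n] > 0), None)
        if running is not None:
            remaining[running] -= step
        timeline.append(running)  # None marks an idle step
    return timeline


def collapse(timeline):
    """Drop idle steps and merge adjacent steps of the same task."""
    order = []
    for name in timeline:
        if name is not None and (not order or order[-1] != name):
            order.append(name)
    return order
```

`collapse(rms_timeline(CONSUMERS, 100))` yields C3, C4, C1, C2, C3, C4, C2: C2 is preempted at 50ms by the second release of C3 and C4, then finishes by 80ms.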
23. Essential Technical Objectives
- A principled basis for middleware verification
  - Model each mechanism's inherent complexity accurately
  - Remove unnecessary complexity through abstraction
  - Compose models tractably and with high fidelity to system itself
- New protocols and mechanisms for property enforcement
  - Exploit call graph structure and other domain-specific information
  - Develop efficient local mechanisms for end-to-end enforcement
  - Design frameworks to support entire families of related protocols
- Practical extensions to preemption and control semantics
  - Leverage existing theory to address part of the problem space
  - Identify and exploit domain-specific problem structure
  - Develop decidable and tractable representations of other behavior
24. Properties, Protocols, and Call Graphs
- Many real-time systems have static call graphs
  - even distributed ones
  - helps feasibility analysis
  - intuitive to program
- Exploit this to design efficient protocols
  - pre-parse graph and assign static attributes to its nodes
    - Resource dependence, prioritization
  - maintain local state about use
  - enforce properties according to (static) attributes and local state
    - Guard: a(fi) < tRj
    - Decrement, increment tRj
[Figure: call graph over Reactor 1 (tR1 = 2) and Reactor 2 (tR2 = 1) with nodes f1 to f4 annotated a(f1) = 1, a(f2) = 0, a(f3) = 0, a(f4) = 0]
Subramonian et al., HICSS04; Sanchez et al., FORTE05, IPDPS06, EMSOFT06, OPODIS06
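The guard and the decrement/increment steps can be sketched as a per-reactor counter. This is a simplified illustration and not the published protocol code: it uses a non-blocking check where an implementation would delay the upcall until the guard holds, and the class and method names are hypothetical.

```python
import threading


class BasicPReactor:
    """Local enforcement state for one reactor: t_R counts available threads."""

    def __init__(self, threads):
        self._available = threads      # t_R
        self._lock = threading.Lock()  # protects the local state

    def try_enter(self, annotation):
        """Before the upcall: admit only if a(f) < t_R, then decrement t_R."""
        with self._lock:
            if annotation < self._available:
                self._available -= 1
                return True
            return False  # the upcall would be delayed

    def leave(self):
        """After the upcall returns: increment t_R."""
        with self._lock:
            self._available += 1


def demo():
    reactor = BasicPReactor(threads=2)   # t_R = 2, as for Reactor 1 in the figure
    results = [
        reactor.try_enter(annotation=1),  # 1 < 2: admitted, t_R becomes 1
        reactor.try_enter(annotation=1),  # 1 < 1 fails: delayed
        reactor.try_enter(annotation=0),  # 0 < 1: admitted, t_R becomes 0
    ]
    reactor.leave()                       # one upcall completes, t_R becomes 1
    results.append(reactor.try_enter(annotation=0))  # 0 < 1: admitted
    return results
```

An upcall with annotation 1 is only admitted while a second thread remains, which keeps one thread free for annotation-0 upcalls. The decision uses only the static annotation and the reactor's local counter, with no communication across hosts.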
25. Property Enforcement Mechanisms
- Protocol enforcement has a common structure
  - pre-invocation method
  - invocation up-call
  - post-invocation method
- Specialized strategies implement each protocol
  - BASIC-P
    - annotation variable
  - k-EFFICIENT-P
    - annotation array
  - LIVE-P
    - annotation
    - balanced binary tree
- All of these protocols work by delaying upcalls
  - Constitutes a side effect that model checker should evaluate
26. Timing Traces for BASIC-P Protocol
[Figure: timing traces on reactors R1 and R2 for three flows. Flow1: EH11, EH21, EH31. Flow2: EH12, EH22, EH32. Flow3: EH13, EH23, EH33]
Model checking and actual timing traces show the BASIC-P protocol's regulation of threads' use of resources (no deadlock)
27. BASIC-P Blocking Delay Comparison
[Figure: Actual Execution vs. Model Execution, marking the blocking delay for Client2 and the blocking delay for Client3]
28. Overhead of ACE TP/DA reactor with BASIC-P
- Negligible overhead with no DA protocol
- Overhead increases linearly with number of event handlers due to suspend/resume actions on handlers at BASIC-P entry/exit
29. Essential Technical Objectives
- A principled basis for middleware verification
  - Model each mechanism's inherent complexity accurately
  - Remove unnecessary complexity through abstraction
  - Compose models tractably and with high fidelity to system itself
- New protocols and mechanisms for property enforcement
  - Exploit call graph structure and other domain-specific information
  - Develop efficient local mechanisms for end-to-end enforcement
  - Design frameworks to support entire families of related protocols
- Practical extensions to preemption and control semantics
  - Leverage existing theory to address part of the problem space
  - Identify and exploit domain-specific problem structure
  - Develop decidable and tractable representations of other behavior
30. Concurrency, Resources, and the System State Space
Aswathanarayana et al., RTAS05
- Example concurrency architecture: processing pipelines
  - Thread per pipeline stage (image codec, filtering, analysis, etc.)
- Resource itself constrains timed state space
  - Pipelines' progress can only diverge within total resource bound
- Even off-the-shelf scheduling further limits state space
  - Interleaving of resource allocation further bounds divergence
31. Scheduling and Preemption
Joint work with Douglas Niehaus and Noah Watkins (University of Kansas) and Venkita Subramonian (AT&T Labs)
- Compare to classic static scheduling policies like RMS
  - In those approaches preemption occurs at well defined times
  - Even if release times are out of phase, can tag and bound early
- Yet, more nuanced policies are often needed in practice
  - E.g., fair-progress based, with variable execution time per job
  - More difficult to bound, though similarly quasi-cyclic
32. Model Composition and State Space Exploration
- Problems
  - No common framework for checking models of timed component interactions with preemption and alternative concurrency semantics
  - Decidability/tractability of models with preemption
- Solution approaches
  - Develop new component-based modeling semantics
    - Huang-Ming Huang: component automata model, algorithm, checker
  - Reduce state space using domain-specific information
    - Terry Tidwell: exploit scheduling-induced quasi-cyclic structure
33. Solution Approach: Timed ? Time Domain Automata
- Exploit scheduler's enforcement of fairness
- Can parameterize time and state
- Likely to result in a quasi-cyclic structure
Tidwell et al., WUSTL CSE Technical Report 2007-34
34. Solution Approach: Time/Progress Bounds
- Bounded fairness gives a particularly nice case
  - Captures behavior of fair-progress scheduled systems
  - Leads to a quasi-cyclic state space, allows analysis
  - Notice convergence in the limit to a common state
Tidwell et al., WUSTL CSE Technical Report 2007-34
35. A Brief Survey of Closely Related Work
- Vanderbilt University and UC Irvine
  - GME, CoSMIC, PICML, Semantic Mapping
- UC Irvine
  - DREAM
- UC Santa Cruz
  - Code Aware Resource Management
- UC Berkeley
  - Ptolemy, E-machine, Giotto
- Kansas State University and University of Nebraska
  - Bogor, Cadena, Kiasan
36. Concluding Remarks
- Timed automata models of middleware building blocks
  - Are useful to verify middleware concurrency and timing semantics
- Domain specific model checking refinements
  - Help improve fidelity of the models (run to completion, priorities)
  - Can achieve significant reductions in state space
  - Can make otherwise intractable problems checkable
- Property protocols
  - Reduce what must be checked by provable enforcement
  - Also benefit from model checking (due to side effects)
- Future work: extend modeling approach beyond real-time concerns to cyber-physical system concerns
  - Model dependence, interference, faults and failure modes
  - Potential solution approach: linear hybrid automata augmented with new domain-aware techniques for constraining complexity