Title: RTES/CMS Potential Collaboration
1. RTES/CMS Potential Collaboration
- RTES Group
- Vanderbilt University
- University of Illinois at Urbana-Champaign
- University of Pittsburgh
- Syracuse University
- Fermilab
(NSF ITR grant ACI-0121658)
2. Outline
- RTES Overview
  - Goals, Team, Deliverables
- Tool Approach
  - Modeling, ARMORs, VLA
- Demo Description
- Potential Collaborations
  - System Configuration
  - Run-Control
  - Fault Mitigation
  - GUI
3. RTES Team
- The Real Time Embedded System Group
- A collaboration of five institutions:
  - University of Illinois
  - University of Pittsburgh
  - Syracuse University
  - Vanderbilt University (PI)
  - Fermilab
- Physicists and computer scientists/electrical engineers with expertise in
  - High-performance, real-time system software and hardware
  - Reliability and fault tolerance
  - System specification, generation, and modeling tools
- NSF ITR grant ACI-0121658
4. RTES Goals
- High availability
  - Fault handling infrastructure capable of
    - Accurately identifying problems (where, what, and why)
    - Compensating for problems (shifting the load, changing thresholds)
    - Automated recovery procedures (restart/reconfiguration)
    - Accurate accounting
    - Extensibility (capturing new detection/recovery procedures)
    - Policy-driven monitoring and control
- Dynamic reconfiguration
  - Adjust to potentially changing resources
5. RTES Goals (continued)
- Faults must be detected/corrected ASAP
  - Semi-autonomously
    - With as little human intervention as possible
  - Distributed and hierarchical monitoring and control
- Life-cycle maintainability and evolvability
  - To deal with new algorithms, new hardware, and new versions of the OS
- User-defined actions
  - Customized to application/users
6. The RTES Solution (for BTeV)
[Figure: RTES solution overview. Recovered labels: Modeling; Analysis; Resource; Reconfigure; Performance, Diagnosability, Reliability; Synthesis; Design and Analysis; Fault Behavior; Feedback; Algorithms; Runtime; Region Operations Mgr; Experiment Control Interface; L1; L2/3; Soft Real Time; Hard; High Level; Low Level; Hierarchical fault management.]
7. RTES Concepts
- A hierarchical fault management system and toolkit
  - Model Integrated Computing
    - GME (Generic Modeling Environment) system modeling tools
  - ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability)
    - Robust framework for detection and reaction to faults in processes
  - VLAs (Very Lightweight Agents for limited resource environments)
    - Sensors/actuators to monitor/mitigate at every level
8. Configuration through Modeling
- Multi-aspect tool, separate views of
  - Hardware components and physical connectivity
  - Executables configuration and logical connectivity
  - Fault handling behavior using hierarchical state machines
- Model interpreters can generate the system
  - At the code fragment level (for fault handling)
  - Download scripts and configurations
- Modeling languages are application specific
  - Shapes, properties, associations, constraints
  - Appropriate for application/context
    - System model
    - Messaging
    - Fault mitigation
    - GUI, etc.
9. Modeling Environment: GME
- Fault handling
- Process dataflow
- HW configuration
GME is an open-source, meta-configurable, multi-aspect graphical modeling tool.
10. System Integration Modeling Language (SIML)
- Model component hierarchy and interactions
- Loosely specified model of computation
- Model information relevant for system configuration
- Links to other narrowly focused modeling languages
  - Provides overall picture and access to models in other languages
- Overall deployment view
11. System Architecture expressed with SIML
- RunControl Manager
- Router information
- How many regions?
- How many worker nodes inside the region?
- Node identification information
12. SIML - Generation
- Configuration files
- Build Scripts
- Deployment Scripts
- Router Configurations
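As a rough illustration of this generation step, the sketch below walks a tiny in-memory model (regions, each with a number of worker nodes) and emits an XML configuration string. It is not the SIML interpreter: the class `ConfigGeneratorSketch`, the `Region` type, the element names, and the node naming scheme are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigGeneratorSketch {
    /** Hypothetical model element: a named region with a number of worker nodes. */
    static final class Region {
        final String name;
        final int workerNodes;

        Region(String name, int workerNodes) {
            this.name = name;
            this.workerNodes = workerNodes;
        }
    }

    /** Emits one <Region> element per region and one <Node> element per worker node. */
    static String generate(List<Region> regions) {
        StringBuilder xml = new StringBuilder("<System>\n");
        for (Region r : regions) {
            xml.append("  <Region name=\"").append(r.name).append("\">\n");
            for (int i = 0; i < r.workerNodes; i++) {
                // Node ids are derived from the region name; this scheme is an assumption.
                xml.append("    <Node id=\"").append(r.name).append("-").append(i).append("\"/>\n");
            }
            xml.append("  </Region>\n");
        }
        return xml.append("</System>\n").toString();
    }

    public static void main(String[] args) {
        List<Region> model = new ArrayList<>();
        model.add(new Region("regionA", 2));
        model.add(new Region("regionB", 1));
        System.out.print(generate(model));
    }
}
```

The same traversal could in principle feed several templates (build scripts, deployment scripts, router configurations), which is the appeal of generating all artifacts from one model.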
13. Data Type Modeling Language (DTML)
- Modeling of data types and structures
- Auto-generate marshalling/demarshalling interfaces for communication
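To make the marshalling/demarshalling idea concrete, here is a minimal hand-written sketch of the kind of code such a generator might produce for one composite type. The `NodeStatus` type, its fields, and the wire layout (big-endian int, double, length-prefixed UTF-8 string) are assumptions for illustration, not DTML output.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class MarshalSketch {
    /** Hypothetical composite type: a node id, a measured rate, and a short label. */
    static final class NodeStatus {
        final int nodeId;
        final double rate;
        final String label;

        NodeStatus(int nodeId, double rate, String label) {
            this.nodeId = nodeId;
            this.rate = rate;
            this.label = label;
        }
    }

    /** Writes the fields in a fixed order; the string is prefixed with its byte length. */
    static byte[] marshal(NodeStatus s) {
        byte[] label = s.label.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + label.length)
                                   .order(ByteOrder.BIG_ENDIAN);
        buf.putInt(s.nodeId).putDouble(s.rate).putInt(label.length).put(label);
        return buf.array();
    }

    /** Reads the fields back in the same order they were written. */
    static NodeStatus demarshal(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        int nodeId = buf.getInt();
        double rate = buf.getDouble();
        byte[] label = new byte[buf.getInt()];
        buf.get(label);
        return new NodeStatus(nodeId, rate, new String(label, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        NodeStatus copy = demarshal(marshal(new NodeStatus(7, 12.5, "worker")));
        System.out.println(copy.nodeId + " " + copy.rate + " " + copy.label);
    }
}
```

Because both directions are derived from one type description, sender and receiver cannot drift apart on field order or width, which is the main argument for generating this code rather than writing it by hand.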
14. Fault Mitigation Modeling Language (FMML)
- Specification of fault mitigation behavior using hierarchical finite state machines (A)
- Configuration and instantiation of FM behaviors as ARMORs (B)
- Specification of FM triggering communication (C)
[Figure: FMML model panels labeled A, B, and C]
15. FMML Generation
- Model translator generates fault-tolerant strategies and communication flow strategy from FMML models
- Strategies are plugged into the ARMOR infrastructure as ARMOR elements
- ARMOR infrastructure uses these custom elements to provide customized fault-tolerant protection to the application
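The hierarchical finite state machines behind these strategies can be sketched as follows. This is a generic illustration, not generated FMML code and not the ARMOR API: the class, the state names, and the event names are hypothetical. The defining feature is that an event the current state does not handle is offered to each enclosing state in turn.

```java
import java.util.HashMap;
import java.util.Map;

public class HierarchicalFsmSketch {
    /** A state with an optional parent; unhandled events bubble up to the parent. */
    static final class State {
        final String name;
        final State parent;
        final Map<String, State> transitions = new HashMap<>();

        State(String name, State parent) {
            this.name = name;
            this.parent = parent;
        }

        State on(String event, State target) {
            transitions.put(event, target);
            return this;
        }
    }

    private State current;

    HierarchicalFsmSketch(State initial) {
        current = initial;
    }

    /** Looks for a transition in the current state, then in each enclosing state. */
    boolean fire(String event) {
        for (State s = current; s != null; s = s.parent) {
            State target = s.transitions.get(event);
            if (target != null) {
                current = target;
                return true;
            }
        }
        return false; // no state in the hierarchy handles this event
    }

    String currentName() {
        return current.name;
    }

    public static void main(String[] args) {
        State operational = new State("Operational", null);
        State running = new State("Running", operational);
        State degraded = new State("Degraded", operational);
        State recovering = new State("Recovering", null);
        running.on("SLOW_NODE", degraded);           // handled locally
        operational.on("PROCESS_CRASH", recovering); // handled for all sub-states
        recovering.on("RESTART_OK", running);

        HierarchicalFsmSketch fsm = new HierarchicalFsmSketch(running);
        fsm.fire("SLOW_NODE");
        fsm.fire("PROCESS_CRASH");
        System.out.println(fsm.currentName()); // prints "Recovering"
    }
}
```

Placing the crash transition on the parent state means every sub-state inherits the same recovery behavior without repeating it, which keeps mitigation models small as the number of states grows.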
16. User Interface Modeling Language
- Enables reconfiguration of user interfaces
- Structural and data flow code generated from models
- User interface produced by running the generated code
17. User Interface Generation
[Figure: user interface generation; recovered label: Generator]
18. RTES Demonstration at IEEE RTAS05
- Used tools and models to generate a family of demos
  - 4, 16, 32, and 64 processor systems
  - Demonstrates fault mitigation in an L2/L3 trigger prototype for BTeV
- GUI: Matlab-based
  - GUI design specified by GME models (GUIML)
- Network/Messaging: Elvin publish/subscribe
  - Messages defined by GME models (DTML)
- RunControl (RC): state machines
  - Defined by GME models (SIML)
- Infrastructure: ARMORs
  - Custom fault mitigation elements defined by models (FMML)
- Application: L2/3 FilterApp, DataSource
  - Actual physics trigger code
  - File-reader supplies physics/simulation data to the FilterApp
19. (No Transcript)
20. Potential RTES contributions to CMS
- Graphical modeling tools for
  - Specifying FunctionManager state machines with FMML
  - Specifying communication messages at a higher level of abstraction with DTML
    - Can synthesize serialization/deserialization code for the specific implementation technology, such as SOAP
  - Designing GUIs independent of the implementation technology with GUIML
    - Can synthesize Java applet code for rendering and communication over SOAP
  - Designing system configurations with SIML, à la DuckCAD
    - Can synthesize artifacts in addition to XML configuration files
- Fault tolerance approach and concepts
  - Hierarchical fault mitigation via collaborating/coordinating FM managers
  - Custom fault-mitigation behavior specification as hierarchical finite state machines
  - ARMORs and VLAs
21. Potential RTES/RCMS Mapping
[Figure: mapping of RTES modeling languages onto RCMS. Recovered labels: SIML; DTML; Configures; FMML; GML; And others.]
22. Modeling Configurations with SIML
XDAQ ptMAZE example
Defining a partition or region
<?xml version='1.0'?>
<Partition>
  <Definitions>
    <ClassDef id="15">ptMAZE</ClassDef>
    <ClassDef id="11">RoundTrip</ClassDef>
  </Definitions>
  <Host id="0" url="http://host1:40000">
    <Address type="ptMAZE"
             port="56"
             boardId="0"
             service="maze_service_immediate"
             switch="MAZE_SWITCH_M3E128"/>
    <Application class="RoundTrip"
                 targetAddr="auto"
                 instance="0" network="ptMAZE">
      <DefaultParameters>
        <Parameter name="samples"
                   type="unsigned long">
          1000000
Host or application attributes
23. Modeling Configurations with SIML
XDAQ ptMAZE example
Defining communications
    <Transport class="ptMAZE"
               targetAddr="auto" instance="0">
      <DefaultParameters>
        <Parameter name="pollingMode" type="bool">
          false
        </Parameter>
        <Parameter name="mtuSize" type="int">
          4096
        </Parameter>
        . . .
      </DefaultParameters>
    </Transport>
    <urlTransport>
      //linux/x86/libptMAZE.so
    </urlTransport>
  </Host>
  <Host id="1" url="http://host2:40000">
    <Address type="ptMAZE"
Protocol attributes
Defining an application
24. Function Manager StateMachine Artifacts
- StateMachine.java
  - Sets up the state machine.
- States.java
  - Set of possible states.
- Inputs.java
  - Set of possible triggers.
- TransitionActions.java
  - Executed during a state transition.
- TransitionFailedAction.java
  - Executed when a transition action has failed.
- StateChangedAction.java
  - Executed when the state has changed.
- FailureAction.java
  - Executed when a transition has failed.
25. StateMachine.java
public class HelloStateMachine extends UserStateMachine
{
    ...
    StateMachineDefinition fsmdef = new StateMachineDefinition();

    // Inputs (Commands)
    fsmdef.addInput(HelloInputs.GOTOHELLO);
    fsmdef.addInput(HelloInputs.GOTOINIT);
    // Initial state
    fsmdef.setInitialState(HelloStates.INITIAL);
    // States
    fsmdef.addState(HelloStates.INITIAL);
    fsmdef.addState(HelloStates.HELLO);
    fsmdef.addTransition(
        HelloInputs.GOTOHELLO,
        HelloStates.INITIAL,
As expressed in FMML models
26. States.java
public final class HelloStates
{
    public static final State INITIAL = new State("Initial");
    public static final State HELLO = new State("Hello");
    public static final State ERROR = new State("Error");
}
27. Inputs.java
public class HelloInputs
{
    public static final Input GOTOHELLO = new Input("GoToHello");
    public static final Input GOTOINIT = new Input("GoToInit");
}
28. TransitionActions.java
public class HelloTransitionActions
    extends UserTransitionActions
{

    public void helloAction()
        throws UserActionException
    {
        System.out.println("helloAction Executed");
        logger.info("helloAction Executed");
    }
}
29. TransitionFailedAction.java
public class HelloTransitionFailedActions
    extends UserTransitionFailedActions
{

    public void helloFailedAction()
        throws UserActionException
    {
        logger.info("Executing helloFailedAction");

        getUserStateMachine().setState(DaqkitStates.ERROR);
        logger.info("helloFailedAction Executed");
    }
}
(This requires extensions to the modeling language)
30. SOAP Messages - Client-Server Example
Serializing
SOAPName commandName = envelope.createName("increment");
SOAPName originator = envelope.createName("originator");
SOAPName targetAddr = envelope.createName("targetAddr");
SOAPBody body = envelope.getBody();
SOAPElement command = body.addBodyElement(commandName);
..
We can provide an abstract API call for creating messages, so that user code need not have any understanding of the underlying SOAP calls.
31. Deserializing SOAP Messages
SOAPBody body = reply.getSOAPPart().getEnvelope().getBody();
if (body.hasFault())
{
    SOAPFault fault = body.getFault();
    string msg = "Server error ";
    msg += fault.getFaultString();
    XDAQ_RAISE(xdaqException, msg);
}
else
{
    SOAPName counterTag("Counter", "", "");
    vector<SOAPElement> content = body.getChildElements();
    for (int i = 0; i < content.size(); i++)
    {
        vector<SOAPElement> c = content[i].getChildElements(counterTag);
        for (int j = 0; j < c.size(); j++)
        {
            if (c[0].getElementName() == counterTag)
            {
                cout << "The server replied with counter ";
                cout << c[0].getValue() << endl;
            }
        }
    }
}
Deserializing the reply to the message
32. I2O Message: RU Builder example
- The DTML language allows both simple and composite types.
- Floats and integers, signed and unsigned, can be specified.
- Corresponding marshall/demarshall code can be generated from models.
33. Discussions
- Relevancy/Interest?
  - Tuning the concepts.
- Next steps?
  - More documentation: CMS/XDAQ
    - GUI, SOAP messages, I2O messages, state machines, data monitor
  - XDAQ examples
    - Full, larger-scale applications
- How can we contribute?
- Visit? June 4th
  - Goals, preparations?
34. Backup Slides
35. ARMOR: Adaptive, Reconfigurable, and Mobile Objects for Reliability
36. Very Lightweight Agents
- Minimal footprint
- Platform independence
- Employable everywhere in the system!
- Monitors hardware and software
- Handles fault detection communications with higher-level entities
37. L2/3 Prototype Farm Setup
38. The Demonstration System Architecture
[Figure: demonstration system architecture; recovered labels: public, private]
39. Matlab GUI
- Monitoring/display of node and region health and performance
- Command interface for starting/stopping the system
- Debug interface for injecting faults