Title: An Application Monitoring System in Grid
1. An Application Monitoring System in Grid
- Bartosz Balis
- Tomasz Szepieniec
2. AGENDA
- PART I: Monitoring introduction (B. Balis)
- PART II: Concepts and problems in Grid application monitoring (T. Szepieniec)
3. PART I - OUTLINE
- Application Monitoring
- OMIS
- Grid Monitoring System OCM-G
- Instrumentation
4. Application monitoring
- Monitor: obtain information on, or manipulate, the target application
- e.g. read status of application processes, suspend application, read / write memory, etc.
- Monitoring module needed by tools
- Debuggers
- Performance analyzers
- Visualizers
- ...
5. Monitoring: integrated module
- Monitoring module integrated with GUI
- Usual case
6. Monitoring: autonomous system
- Separate monitoring system
- Tool / monitor interface: OMIS
7. Monitoring system benefits
- Modularity
- GUI development separated from monitoring module development
- Single monitoring system
- multiple tools do not conflict with each other
- coordination of access to shared objects
- enables tool interoperability
8. Interoperability
- Multiple tools
- Common monitoring system
- Interoperability
- cooperation
- e.g. debugger + visualizer
9. OMIS
- Universal tool / monitoring system interface
- Target system view
- hierarchical set of objects
- nodes, processes, threads
- Grid: additional objects (sites)
- objects identified by tokens, e.g. n_1, p_1, etc.
- Three types of services
- information services
- manipulation services
- event services
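The hierarchical, token-based view of the target system can be pictured with a small sketch. This is only an illustration, not the OMIS data model: the class names, the `expand` helper, and the `s` prefix for site tokens are assumptions; only the `n_`, `p_` and `t_` token styles appear in the slides.

```python
from dataclasses import dataclass, field
from itertools import count

# Token prefixes: n_, p_, t_ follow the slides; "s" for sites is an assumption.
PREFIX = {"site": "s", "node": "n", "process": "p", "thread": "t"}


@dataclass
class MonitoredObject:
    kind: str
    token: str
    children: list = field(default_factory=list)


class TargetSystem:
    """Hypothetical registry of monitored objects, keyed by token."""

    def __init__(self):
        self._counters = {kind: count(1) for kind in PREFIX}
        self.by_token = {}

    def add(self, kind, parent=None):
        # Tokens are numbered per kind: n_1, n_2, ..., p_1, ...
        token = f"{PREFIX[kind]}_{next(self._counters[kind])}"
        obj = MonitoredObject(kind, token)
        self.by_token[token] = obj
        if parent is not None:
            parent.children.append(obj)
        return obj

    def expand(self, token, kind):
        """Return the tokens of all descendants of `token` that have `kind`."""
        found = []
        queue = list(self.by_token[token].children)
        while queue:
            obj = queue.pop(0)
            if obj.kind == kind:
                found.append(obj.token)
            queue.extend(obj.children)
        return found


system = TargetSystem()
site = system.add("site")
node = system.add("node", site)
proc = system.add("process", node)
system.add("thread", proc)
system.add("thread", proc)
print(system.expand(site.token, "thread"))  # ['t_1', 't_2']
```

Expanding a higher-level token into the lower-level objects it contains is one way a request addressed to a site or node could reach individual threads.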
10. OMIS services
- Information services
- obtain information on target system
- e.g. node_get_info: obtain information on nodes in the target system
- Manipulation services
- perform manipulations on the target system
- e.g. thread_stop: stop specified threads
- Event services
- detect events in the target system
- e.g. thread_started_libcall: detect invocations of specified functions
11. OMIS requests
- Services can be combined into monitoring requests
- Two types of requests
- Unconditional requests
- to be executed immediately
- executed only once
- Conditional requests
- to execute actions whenever event occurs
- actions can be executed multiple times
12. OMIS unconditional requests
- Request parts: actions and their operands
- Example: stop thread t_1
13. OMIS conditional requests
- Event with operands: thread_started_libcall(t_1, MPI_Send)
- Action with operands: counter_inc(c_1)
- Meaning: whenever thread t_1 invokes MPI_Send, increment counter c_1
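The difference between the two request types can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the OMIS API: `MiniMonitor`, `unconditional`, `conditional` and `raise_event` are hypothetical names; only `thread_stop`, `counter_inc` and `thread_started_libcall` are taken from the slides.

```python
from collections import defaultdict


class MiniMonitor:
    """Toy stand-in for a monitoring system (hypothetical, not OMIS itself)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.stopped = set()
        self._conditional = defaultdict(list)  # event key -> registered actions

    # Actions (names mirror the slides).
    def thread_stop(self, thread):
        self.stopped.add(thread)

    def counter_inc(self, counter):
        self.counters[counter] += 1

    # Requests.
    def unconditional(self, action, *operands):
        """Execute the action immediately, exactly once."""
        action(*operands)

    def conditional(self, event, event_operands, action, *operands):
        """Register an action to run every time the event occurs."""
        self._conditional[(event, tuple(event_operands))].append((action, operands))

    def raise_event(self, event, *event_operands):
        """Called by instrumentation when an event is detected."""
        for action, operands in self._conditional[(event, event_operands)]:
            action(*operands)


mon = MiniMonitor()

# Unconditional request: stop thread t_1 now.
mon.unconditional(mon.thread_stop, "t_1")

# Conditional request: whenever t_1 invokes MPI_Send, increment c_1.
mon.conditional("thread_started_libcall", ("t_1", "MPI_Send"), mon.counter_inc, "c_1")

for _ in range(3):
    mon.raise_event("thread_started_libcall", "t_1", "MPI_Send")
mon.raise_event("thread_started_libcall", "t_2", "MPI_Send")  # other thread, ignored

print(mon.stopped, mon.counters["c_1"])  # {'t_1'} 3
```

The conditional request stays registered after it fires, which is why its action runs three times while the unconditional one runs once.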
14. Grid-enabled OMIS-compliant Monitoring System OCM-G
- Scalable
- distributed
- decentralized
- Efficient
- local buffers
- Three types of components
- local monitors (LM)
- service managers (SM)
- application monitors (AM)
15. Service Managers and Local Monitors
- Service Managers
- one or more in the system
- request distribution
- reply collection
- Local Monitors
- one per node
- handle local objects
- actual execution of requests
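The division of labour between a Service Manager and its Local Monitors might look roughly like the sketch below. It assumes a single SM and a flat list of LMs; the class names, the `execute` and `request` methods, and the string thread states are all illustrative and not taken from OCM-G.

```python
class LocalMonitor:
    """One per node; executes requests on the objects it owns (hypothetical)."""

    def __init__(self, node, threads):
        self.node = node
        self.threads = dict.fromkeys(threads, "running")

    def execute(self, service, tokens):
        replies = {}
        for token in tokens:
            if service == "thread_stop":
                self.threads[token] = "stopped"
            replies[token] = self.threads[token]
        return replies


class ServiceManager:
    """Distributes a request to the LMs that own the objects, then merges replies."""

    def __init__(self, local_monitors):
        self.local_monitors = local_monitors

    def request(self, service, tokens):
        merged = {}
        for lm in self.local_monitors:
            # Request distribution: pick the tokens this LM is responsible for.
            local = [t for t in tokens if t in lm.threads]
            if local:
                # Reply collection: fold the partial reply into a single answer.
                merged.update(lm.execute(service, local))
        return merged


sm = ServiceManager([
    LocalMonitor("n_1", ["t_1", "t_2"]),
    LocalMonitor("n_2", ["t_3"]),
])
reply = sm.request("thread_stop", ["t_1", "t_3"])
print(reply)  # {'t_1': 'stopped', 't_3': 'stopped'}
```

The tool sees one reply even though two nodes were involved, which is the point of placing the SM between tools and LMs.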
16. Application monitors
- Embedded in applications
- Handle some actions locally
- buffering data
- filtering of instrumentation
- monitoring requests
- e.g. REQ: read variable a, REP: value of a
- asynchronous
- no OS mechanisms involved
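Local buffering and filtering inside the application could be sketched as follows. This is an assumption-laden illustration: the `ApplicationMonitor` class, its `probe` and `flush` methods, and the fixed buffer capacity are hypothetical and not part of OCM-G.

```python
class ApplicationMonitor:
    """Hypothetical in-process monitor: filters probes and batches their data."""

    def __init__(self, send, capacity=4):
        self.send = send          # callable that ships a batch to the local monitor
        self.capacity = capacity
        self.enabled = set()      # probe names currently requested by tools
        self.buffer = []

    def probe(self, name, payload):
        # Filtering: a disabled probe costs only a set lookup.
        if name not in self.enabled:
            return
        # Buffering: collect records locally and ship them in batches.
        self.buffer.append((name, payload))
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()


batches = []
am = ApplicationMonitor(batches.append, capacity=2)
am.enabled.add("MPI_Send")

am.probe("MPI_Recv", 1)   # filtered out, never buffered
am.probe("MPI_Send", 2)
am.probe("MPI_Send", 3)   # buffer full, one batch is sent
am.probe("MPI_Send", 4)   # stays in the buffer until the next flush

print(batches)     # [[('MPI_Send', 2), ('MPI_Send', 3)]]
print(am.buffer)   # [('MPI_Send', 4)]
```

Because everything happens inside the application process, no operating-system mechanism is needed to collect the data.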
17. PART II - OUTLINE
- Concepts and problems in Grid application monitoring
- OCM-G as Grid service
- OCM-G component discovery
- Handling Multiple Applications
- OMIS message routing
- Application start-up
18. OCM-G as Grid service
- Permanently running in Grid
- Global service
- multiple applications
- multiple users
- based on Globus users
- multiple tools
19. OCM-G component discovery
- Use cases
- Assemble individual parts into a single monitoring system
- Enabling attachment of application processes and tools
- External mechanisms based on
- local file system
- MDS
- R-GMA
- ...
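Of the mechanisms listed, the local file system is the simplest to illustrate. The sketch below assumes each component writes a small JSON file with its token and contact address into a shared directory; the file layout, the `register` and `discover` helpers, and the example addresses are hypothetical and not the actual OCM-G scheme.

```python
import json
import tempfile
from pathlib import Path


def register(registry_dir, token, address):
    """Publish a component by writing <token>.json into the shared directory."""
    path = Path(registry_dir) / f"{token}.json"
    path.write_text(json.dumps({"token": token, "address": address}))


def discover(registry_dir, prefix):
    """Return {token: address} for every registered component of one type."""
    found = {}
    for path in sorted(Path(registry_dir).glob(f"{prefix}_*.json")):
        entry = json.loads(path.read_text())
        found[entry["token"]] = entry["address"]
    return found


with tempfile.TemporaryDirectory() as registry:
    # Addresses below are placeholders for illustration only.
    register(registry, "sm_1", "host-a.example.org:7000")
    register(registry, "sm_2", "host-b.example.org:7000")
    register(registry, "lm_1", "host-a.example.org:7100")
    sms = discover(registry, "sm")

print(sms)
```

A directory service such as MDS or R-GMA would play the same role as the shared directory, with the lookup done over the network instead of through the file system.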
20. Handling Multiple Applications
- Create virtual monitoring system
- One SM becomes MainSM for each application
- which one?
- Expansion and localization mechanisms work in virtual mode
21. OMIS message routing
- Two options
- route via MainSM (option 1)
- SMs communicate directly (option 2)
- Open question: where should localization info be stored, and how should it be updated?
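The two routing options can be compared by the path a message takes between Service Managers. The `route` function and the SM tokens below are hypothetical and only illustrate the trade-off; they say nothing about how OCM-G implements routing.

```python
def route(src, dst, main_sm, direct):
    """Return the list of SMs a message passes through, source first."""
    if src == dst:
        return [src]
    if direct or main_sm in (src, dst):
        # Option 2, or the MainSM is already one of the endpoints.
        return [src, dst]
    # Option 1: relay through the MainSM.
    return [src, main_sm, dst]


via_main = route("sm_2", "sm_3", main_sm="sm_1", direct=False)
direct = route("sm_2", "sm_3", main_sm="sm_1", direct=True)

print(via_main)  # ['sm_2', 'sm_1', 'sm_3']
print(direct)    # ['sm_2', 'sm_3']
```

Option 1 costs an extra hop but only the MainSM needs to know where objects live; option 2 saves the hop but requires every SM to hold localization info, which is where the storage and update question arises.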
22. Application start-up requirements
- Application started as usual (e.g. PORTAL, globusrun)
- Independent of the communication protocol the application uses
- Use GRAM to place application processes where OCM-G is enabled
23. Application start-up HOWTO
- Monitoring parameters are put on the command line
- Initial OMIS requests can also be put on the command line (e.g. breakpoints)
- Obligatory starting parameters
- application token, chosen a priori
- MainSM token, which must be known a priori
- (we are working on how to avoid this)
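Passing monitoring parameters on the command line could be handled along these lines. The option names `--ocmg-app-token`, `--ocmg-mainsm` and `--ocmg-request` are invented for this sketch and are not the real OCM-G flags; the idea is only that monitoring options are stripped off before the application sees its own arguments.

```python
import argparse


def split_monitoring_args(argv):
    """Separate (hypothetical) monitoring options from the application's own arguments."""
    parser = argparse.ArgumentParser(add_help=False)
    # Obligatory starting parameters.
    parser.add_argument("--ocmg-app-token", required=True)
    parser.add_argument("--ocmg-mainsm", required=True)
    # Optional initial requests, e.g. a breakpoint; may be given several times.
    parser.add_argument("--ocmg-request", action="append", default=[])
    # Unrecognised arguments are returned untouched for the application.
    monitoring, app_args = parser.parse_known_args(argv)
    return monitoring, app_args


monitoring, app_args = split_monitoring_args([
    "--ocmg-app-token", "app_1",
    "--ocmg-mainsm", "sm_1",
    "--ocmg-request", "thread_stop(t_1)",
    "input.dat",
])

print(monitoring.ocmg_app_token, monitoring.ocmg_mainsm)
print(monitoring.ocmg_request)
print(app_args)
```

Because the extra options travel with the normal command line, the application can still be launched with whatever tool is normally used to start it.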
24. Conclusion
- OMIS as communication protocol
- Autonomous application monitoring system
- OCM-G uses Globus services
- Normal application startup can be achieved
- Some problems are still to be addressed...