Title: Investigating Survivability Strategies for Ultra-Large Scale (ULS) Systems
1. Investigating Survivability Strategies for Ultra-Large Scale (ULS) Systems
Jaiganesh Balasubramanian, jai_at_dre.vanderbilt.edu, www.dre.vanderbilt.edu/jai
Dr. Aniruddha Gokhale, gokhale_at_dre.vanderbilt.edu, www.dre.vanderbilt.edu/gokhale
Dr. Douglas C. Schmidt, schmidt_at_dre.vanderbilt.edu, www.dre.vanderbilt.edu/schmidt
Dr. Sherif Abdelwahed, sherif_at_isis.vanderbilt.edu, www.isis.vanderbilt.edu/sherif
2. Ultra-Large Scale (ULS) System Characteristics
- Key characteristics of the problem space
  - Network-centric, dynamic, very large-scale systems of systems
  - Stringent simultaneous QoS demands, e.g., never die, time-critical, etc.
  - Highly diverse, complex, increasingly integrated/autonomous application domains
3. Motivating Scenario for ULS
- Impact of Service-Oriented Architectures on enterprise distributed real-time and embedded (DRE) ULS systems
  - Applications are composed of an operational string of services
  - A service is an assembly of components
  - Dynamic (re)deployment of services into operational strings is necessary
  - Performability (performance + survivability) requirements
- Key challenges
  - Regulating and adapting to (dis)continuous changes in runtime environments
    - e.g., online prognostics, dependable upgrades
  - Satisfying tradeoffs between multiple (often conflicting) QoS demands
    - e.g., secure, real-time, reliable, etc.
  - Satisfying QoS demands in the face of fluctuating and/or insufficient resources
    - e.g., mobile ad hoc networks (MANETs)
4. Some Performability Challenges for ULS Systems
- Performability challenges in dynamic provisioning of operational strings and services
  - Service workloads and resource capacity issues: service placement depends on workloads and available resources
  - Service accessibility patterns: service survivability depends on its sharing degree
  - Differentiated levels of QoS: affects resource provisioning and survivability strategies
  - Operational string and service failover: different failover possibilities, e.g., as a whole or partial operational string, or one service at a time
  - No one-size-fits-all dependability strategy: cannot dictate one survivability strategy for all services and operational strings
Application performability is addressed by resolving the service placement and survivability problems
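The failover possibilities listed on this slide can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `FailoverUnit` names, the `services_to_fail_over` helper, and in particular the reading of "partial operational string" as "the failed service plus everything after it in the string" are assumptions made for illustration, not definitions taken from the slides.

```python
from enum import Enum


class FailoverUnit(Enum):
    WHOLE_STRING = "whole"      # fail over every service in the string
    PARTIAL_STRING = "partial"  # assumed: the failed service and all services after it
    SINGLE_SERVICE = "single"   # fail over only the failed service


def services_to_fail_over(operational_string, failed_service, unit):
    """Return the services that must switch to replicas for the given failover unit."""
    if failed_service not in operational_string:
        raise ValueError(f"{failed_service!r} is not part of the operational string")
    index = operational_string.index(failed_service)
    if unit is FailoverUnit.WHOLE_STRING:
        return list(operational_string)
    if unit is FailoverUnit.PARTIAL_STRING:
        return list(operational_string[index:])
    return [failed_service]


# Hypothetical four-service operational string in which "filter" fails.
string = ["sensor", "filter", "planner", "effector"]
whole = services_to_fail_over(string, "filter", FailoverUnit.WHOLE_STRING)
partial = services_to_fail_over(string, "filter", FailoverUnit.PARTIAL_STRING)
single = services_to_fail_over(string, "filter", FailoverUnit.SINGLE_SERVICE)
```

The larger the failover unit, the more replica resources must be provisioned in advance, which is one way the choice of unit feeds back into service placement.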
5. Model of Approach
- Model addresses various concerns
  - Per-service concern: choice of implementation
    - Depends on resources, compatibility with other components in assembly
  - Coupling concern: choice of invocation and communication mechanism used
  - Sharing concern: shared services will need proactive survivability, since a shared service affects several services simultaneously
  - Failure recovery concern: what is the unit of failover?
  - Availability concerns: what is the degree of redundancy? What replication styles to use? Does it apply to the whole assembly?
  - Deployment concerns: how to select resources? How much sharing?
  - Assembly concerns: what components to assemble dynamically? Configurations and optimizations for end-to-end performability?
Service placement and service survivability strategies address these concerns
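One way to picture the concerns above is as a per-service record that a placement or survivability strategy could consult. The sketch below is only an illustration: the `ServiceConcerns` fields, the example values, and the rule that "shared" means "used by more than one operational string" are assumptions, not part of the model described on the slide.

```python
from dataclasses import dataclass


@dataclass
class ServiceConcerns:
    name: str
    implementation: str     # per-service concern: chosen implementation
    invocation: str         # coupling concern, e.g. "synchronous" or "asynchronous"
    shared_by: int          # sharing concern: number of operational strings using the service
    failover_unit: str      # failure recovery concern
    replicas: int           # availability concern: degree of redundancy
    replication_style: str  # availability concern, e.g. "active" or "passive"


def needs_proactive_survivability(service: ServiceConcerns) -> bool:
    """Assumed rule: a service used by more than one string is treated proactively."""
    return service.shared_by > 1


# Hypothetical services: one shared by three strings, one private to a single string.
naming = ServiceConcerns("naming", "small-footprint", "synchronous", 3,
                         "single service", 2, "active")
logger = ServiceConcerns("logger", "default", "asynchronous", 1,
                         "single service", 0, "passive")
proactive = [s.name for s in (naming, logger) if needs_proactive_survivability(s)]
```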
6. Addressing the Service Placement Problem
- Service placement problem must consider
  - Set of computation nodes attributed by
    - Processing index or capacity
    - Memory index or capacity
    - Survivability index
  - Set of communication links attributed by
    - Bandwidth index
    - Survivability index
  - Set of components attributed by
    - Different implementations offering performance tradeoffs across quality dimensions
    - Different implementations consuming various amounts of resources
    - Constraints on being deployed as an assembly to offer a complete service
- Replica placement issues involve
  - Different availability requirements for different assemblies of components
  - Multiple replicas needed; tolerate non-availability of replicas based on importance of assemblies
  - Replica resource provisioning depending on replication schemes used
  - Load balancing of replicas if resources are available, but this introduces run-time consistency problems
Service placement algorithms must consider tradeoffs between providing performance to applications and providing survivability to applications, i.e., allocating resources either to primaries or replicas
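The primary-versus-replica tradeoff can be made concrete with a small greedy placement sketch. It is not the placement algorithm of this work: the `Node` and `Component` types, the 0..1 survivability scale, and the greedy rule are all assumptions, and communication links, alternative implementations, and assembly constraints are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu: float            # remaining processing capacity
    memory: float         # remaining memory capacity
    survivability: float  # survivability index, higher is better (assumed 0..1 scale)


@dataclass
class Component:
    name: str
    cpu: float
    memory: float
    replicas: int = 0     # replicas wanted in addition to the primary


def place(components, nodes):
    """Greedy sketch: the primary and each replica go to distinct nodes, taking
    the most survivable nodes that still have enough CPU and memory left."""
    placement = {}
    for comp in components:
        chosen = []
        for node in sorted(nodes, key=lambda n: n.survivability, reverse=True):
            if len(chosen) == comp.replicas + 1:
                break
            if node.cpu >= comp.cpu and node.memory >= comp.memory:
                node.cpu -= comp.cpu        # reserve capacity on the node
                node.memory -= comp.memory
                chosen.append(node.name)
        # First entry is the primary; the list may be shorter than requested.
        placement[comp.name] = chosen
    return placement


nodes = [Node("n1", 4, 8, 0.9), Node("n2", 2, 4, 0.7), Node("n3", 4, 8, 0.5)]
components = [Component("tracker", 2, 4, replicas=1),
              Component("planner", 3, 4, replicas=1)]
result = place(components, nodes)
```

In this hypothetical run the tracker's replica consumes the capacity of `n2`, so the planner gets a primary on the least survivable node and no replica at all, which is the kind of tradeoff a real placement algorithm would have to weigh explicitly.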
7. Addressing the Survivability Problem
- A configurable approach to survivability including micro- (infrastructure) and macro- (assembly and operational string) level strategies
- Micro-level strategies monitor infrastructure state to make proactive decisions at
  - Component level (swapping, migration)
  - Middleware level (configurations)
  - Component server level (process resource allocations)
  - Node level (multiple components)
- Macro-level strategies monitor assembly health to make failover decisions
  - Failover based on type of failover unit
  - Affects service placement decisions
  - May involve load balancing
  - State synchronization issues
  - Replication styles (hidden by FT strategies)
- Initial prototype developed using Component-Integrated ACE ORB (CIAO) and Deployment and Configuration Engine (DAnCE) (www.dre.vanderbilt.edu)
- Future work on Data Distribution Service (DDS) and Distributed Real-time Specification for Java (DRTSJ)
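As an illustration of the macro-level idea of monitoring assembly health to make failover decisions, the sketch below uses heartbeat timestamps. The slides do not say how health is monitored, so the heartbeat mechanism, the function names, and the 2-second timeout are all assumptions; this is not the CIAO/DAnCE prototype's API.

```python
def unhealthy_components(last_heartbeat, now, timeout):
    """Return components whose latest heartbeat is older than `timeout` seconds."""
    return sorted(name for name, seen in last_heartbeat.items()
                  if now - seen > timeout)


def failover_decision(last_heartbeat, now, timeout=2.0):
    """Decide whether the assembly should fail over, and because of which components."""
    failed = unhealthy_components(last_heartbeat, now, timeout)
    return {"fail_over": bool(failed), "failed_components": failed}


# Hypothetical assembly: time (in seconds) of each component's latest heartbeat.
heartbeats = {"sensor": 10.0, "filter": 7.5, "planner": 9.2}
decision = failover_decision(heartbeats, now=10.5)
```

Which services actually switch to replicas after such a decision would then depend on the type of failover unit configured for the assembly.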