Transparent Cross-Border Migration of Parallel Multi Node Applications

1
Transparent Cross-Border Migration of Parallel
Multi Node Applications

Dominic Battré, Matthias Hovestadt, Odej Kao,
Axel Keller, Kerstin Voss
Cracow Grid Workshop 2007

2
Outline

Motivation
The Software Stack
Cross-Border Migration
Summary

3
The Gap between Grid and RMS

User asks for SLA
Grid Middleware realizes job by means of local
RMS
BUT RMS offer Best Effort
Need SLA-aware RMS

4
HPC4U: Highly Predictable Clusters for
Internet-Grids

Objective
Software-only solution for an SLA-aware, fault
tolerant infrastructure, offering reliability and
QoS, and acting as active Grid component
Key Features
System level checkpointing
Job migration
Job types: sequential and MPI-parallel
Planning based scheduling

5
HPC4U Planning Based Scheduling
                                   queuing systems    planning systems
planned time frame                 present            present and future
new job requests                   insert in queues   re-planning
assignment of planned start time   no                 all requests
runtime estimation                 not necessary      mandatory
backfilling                        optional           yes, implicit
advance reservations               not possible       yes, trivial
[Figure labels: queues, new jobs, Machine, time]
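
The planning column of the comparison above can be illustrated with a minimal sketch. It is not the HPC4U scheduler: the `Request` fields, the `plan` function and the first-fit placement rule are assumptions made for illustration. It shows why a runtime estimate is mandatory, how every request gets a planned start time, and how backfilling and advance reservations fall out of the same placement step.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    nodes: int         # number of nodes requested
    runtime: int       # estimated runtime in hours (mandatory for planning)
    earliest: int = 0  # earliest allowed start; > 0 models an advance reservation


def _fits(placed, start, end, nodes, total_nodes):
    """Check that `nodes` are free during the whole interval [start, end)."""
    # Usage only rises at start times of planned requests, so it is enough
    # to check at `start` and at every planned start inside the interval.
    points = [start] + [s for s, _, _ in placed if start < s < end]
    for t in points:
        used = sum(n for s, e, n in placed if s <= t < e)
        if used + nodes > total_nodes:
            return False
    return True


def plan(requests, total_nodes):
    """Assign a planned start time to every request, in arrival order."""
    placed = []    # (start, end, nodes) of requests already in the plan
    schedule = {}  # request name -> planned start time
    for req in requests:
        if req.nodes > total_nodes:
            raise ValueError(f"{req.name} needs more nodes than the machine has")
        # Candidate starts: the earliest allowed time and every later time
        # at which an already planned request ends.
        candidates = sorted(
            {req.earliest} | {end for _, end, _ in placed if end > req.earliest}
        )
        for start in candidates:
            end = start + req.runtime
            if _fits(placed, start, end, req.nodes, total_nodes):
                placed.append((start, end, req.nodes))
                schedule[req.name] = start
                break
    return schedule


requests = [
    Request("A", nodes=2, runtime=4),
    Request("R", nodes=4, runtime=2, earliest=6),  # advance reservation
    Request("B", nodes=4, runtime=3),
    Request("C", nodes=2, runtime=2),
]
schedule = plan(requests, total_nodes=4)
print(schedule)
```

In this run request C arrives last but is planned next to A at time 0, ahead of B, which is the implicit backfilling named in the table.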
6
HPC4U Software Stack
[Diagram labels, in order: User/Broker interface, CLI, Negotiation, RMS, Scheduler, SSC, Process, Network, Storage, Cluster]
7
HPC4U Checkpointing Cycle
[Diagram: checkpointing cycle involving RMS, Process, Network and Storage. Steps legible in the transcript:]
2. In-transit packets
4. Snapshot!
5. Link to snapshot
7. Job running again
8
Cross Border Migration: Intra Domain
[Diagram: two instances of the software stack, each with User/Broker interface, CLI, CRM, Negotiation, PP, RMS, Scheduler, SSC, Process, Network, Storage and Cluster]
9
Cross Border Migration: Target Retrieval
[Diagram: same two software stacks as on slide 8]
10
Cross Border Migration: Checkpoint Migration
[Diagram: same two software stacks as on slide 8]
11
Cross Border Migration: Remote Execution
[Diagram: same two software stacks as on slide 8]
12
Cross Border Migration: Result Migration
[Diagram: same two software stacks as on slide 8]
13
Cross-Border Migration Using Globus
[Diagram labels: User/Broker interface, CLI, WS-AG, CRM, Broker, Negotiation, PP, RMS, Scheduler, SSC, Process, Network, Storage, Cluster]

WS-AG implementation based on GT4
Developed in EU project AssessGrid
Source specifies SLA / file staging parameters
Subset of JSDL (POSIX Jobs)
Resource determination via broker
Source directly contacts destination
Destination pulls migration data via GridFTP
Destination pushes result data back to source
Source uses WSRF event notification
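
The sequence listed on this slide can be walked through with a small in-memory simulation. This is only a sketch under stated assumptions: `Site`, `Broker` and `migrate` are hypothetical names, dictionaries stand in for file transfer, and a plain callback stands in for event notification. No GT4, WS-Agreement, GridFTP or WSRF API is used or implied.

```python
class Site:
    """Hypothetical cluster site standing in for one provider."""

    def __init__(self, name, free_nodes):
        self.name = name
        self.free_nodes = free_nodes
        self.storage = {}  # file name -> content
        self.log = []

    def accepts(self, sla):
        # Stand-in for SLA negotiation: only the node count is checked.
        return self.free_nodes >= sla["nodes"]

    def pull(self, source, filename):
        # The destination fetches the migration data itself.
        self.storage[filename] = source.storage[filename]
        self.log.append(f"pulled {filename} from {source.name}")

    def resume(self, checkpoint_file, result_file):
        # Placeholder for restarting the job from its checkpoint.
        self.storage[result_file] = "result of " + self.storage[checkpoint_file]
        self.log.append(f"resumed job from {checkpoint_file}")

    def push(self, target, filename):
        # The destination sends the result data back to the source.
        target.storage[filename] = self.storage[filename]
        self.log.append(f"pushed {filename} to {target.name}")


class Broker:
    """Hypothetical broker that knows a list of candidate sites."""

    def __init__(self, sites):
        self.sites = sites

    def find_destination(self, sla, exclude):
        for site in self.sites:
            if site is not exclude and site.accepts(sla):
                return site
        return None


def migrate(source, broker, sla, checkpoint_file, result_file, on_finished):
    # 1. Resource determination via broker.
    destination = broker.find_destination(sla, exclude=source)
    if destination is None:
        return None
    # 2. Source contacts destination directly; destination pulls the data.
    destination.pull(source, checkpoint_file)
    # 3. Remote execution from the checkpoint.
    destination.resume(checkpoint_file, result_file)
    # 4. Destination pushes the result back; source is notified.
    destination.push(source, result_file)
    on_finished(destination.name)
    return destination


source = Site("site-a", free_nodes=0)
source.storage["job.ckpt"] = "checkpoint data"
broker = Broker([source, Site("site-b", free_nodes=2), Site("site-c", free_nodes=16)])
notifications = []
target = migrate(source, broker, {"nodes": 8}, "job.ckpt", "job.out", notifications.append)
print(target.name, target.log, source.storage["job.out"])
```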
14
Ongoing Work: Introducing Risk Management
[Diagram labels: User/Broker interface, CLI, CRM, Broker, WS-AG, Risk Assessor, Negotiation, PP, RMS, Scheduler, Consultant Service, SSC, Monitoring, Process, Network, Storage, Cluster]

Topic of EU project AssessGrid
Incorporated in SLA
Provider
Estimates risk for agreeing to an SLA
Considers probability of failure in schedule
Assessment based on historical data
15
Summary: Best Effort is not Enough
Cross border migration and Risk assessment
provide new means to increase the reliability
of Grid Computing.
16
More information

Read the paper
AssessGrid www.assessgrid.eu
HPC4U www.hpc4u.eu
OpenCCS www.openccs.eu

Thanks for your attention!
17
Contents

BACKUP

18
Scheduling Aspects

Execution Time
Exact start time
Earliest start time, latest finish time
User provides stage-in files by time X
Provider keeps stage-out files until time Y
Provisional Reservations
Job Priorities
Job Suspension

19
HPC4U Planning Based Scheduler

Space-Sharing

Run-time estimation → Start time assignment

[Figure: plan of CPUs (axis marks 16, 32) over Time (axis marks 2 to 16) with reservation blocks, including a reservation for a Grid job according to its SLA]
20

HPC4U

21
Motivation: Fault Tolerance

Commercial Grid users need SLAs
Providers cautious on adoption
Reason: Business case risk
Missed deadlines due to system failures
→ Penalties to be paid
Solution: Prevention with Fault Tolerance
Fault tolerance mechanisms available, but
Application modification mandatory
Overall solution (System software, process,
storage, file system, network) required
Combination with Grid migration missing

22
HPC4U Objective

Software-only solution for an SLA-aware, fault
tolerant infrastructure, offering reliability and
QoS, acting as active Grid component
Key features
Definition and implementation of SLAs
Resource reservation for guaranteed QoS
Application-transparent fault tolerance

23
HPC4U Concept

SLA negotiation as an explicit statement of
expectations and obligations in a business
relationship between provider and customer
Reservation of CPU, storage and network for
desired time interval
Job start in checkpointing environment
In case of system failure
→ Job migration / restart with respect to SLA

24
HPC4U Project Outcomes
25
Phases of Operation

Negotiation of SLA
Pre-Runtime: Configuration of Resources
e.g. network, storage, compute nodes
Runtime: Stage-In, Computation, Stage-Out
Post-Runtime: Re-configuration

26
Phase: Pre-Runtime

Task of Pre-Runtime Phase
Configuration of all allocated resources
Goal: Fulfill requirements of SLA
Reconfiguration affects all HPC4U elements
Resource Management System
e.g. configuration of assigned compute nodes
Storage Subsystem
e.g. initialization of a new data partition
Network Subsystem
e.g. configuration of network infrastructure

27
Phase: Runtime

Runtime Phase: lifetime of job in system
adherence to SLA has to be assured
FT mechanisms have to be utilized
Phase consists of three distinct steps
Stage-In
transmission of required input data from Grid
customer to compute resource
Computation
execution of application
Stage-Out
transmission of generated output data
from compute resource back to Grid customer

28
Phase: Post-Runtime

Task of Post-Runtime Phase
Re-Configuration of all resources
e.g. re-configuration of network
e.g. deletion of checkpoint datasets
e.g. deletion of temporary data
Counterpart to Pre-Runtime Phase
Allocation of resources ends
Update of schedules in RMS and storage
Resources are available for new jobs
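
The phases described on the preceding slides form a fixed sequence, which can be written down as a small state machine. The sketch below is illustrative only; the `Phase` enum, the `DONE` end state and the helper names are assumptions, not part of HPC4U.

```python
from enum import Enum


class Phase(Enum):
    NEGOTIATION = "Negotiation of SLA"
    PRE_RUNTIME = "Pre-Runtime"            # configuration of allocated resources
    STAGE_IN = "Runtime: Stage-In"         # input data to compute resource
    COMPUTATION = "Runtime: Computation"   # execution of application
    STAGE_OUT = "Runtime: Stage-Out"       # output data back to customer
    POST_RUNTIME = "Post-Runtime"          # re-configuration, cleanup
    DONE = "Done"                          # hypothetical end state


ORDER = list(Phase)  # Enum iteration follows definition order
RUNTIME_STEPS = (Phase.STAGE_IN, Phase.COMPUTATION, Phase.STAGE_OUT)


def next_phase(phase):
    """Return the phase that follows `phase` in the job lifecycle."""
    index = ORDER.index(phase)
    if index == len(ORDER) - 1:
        raise ValueError("job lifecycle already finished")
    return ORDER[index + 1]


def is_runtime(phase):
    """True while the job counts as being in the Runtime phase."""
    return phase in RUNTIME_STEPS


phase = Phase.NEGOTIATION
visited = [phase]
while phase is not Phase.DONE:
    phase = next_phase(phase)
    visited.append(phase)
print([p.value for p in visited])
```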

29
Motivation: Cross Border Migration
[Diagram labels: Customer, HPC4U]
30

PROCESS

31
Subsystems

Process Subsystem
checkpointing of network
cooperative checkpointing protocol (CCP)
Network Subsystem
checkpoint network state
Storage Subsystem
provision of storage
provision of snapshot

32
Metacluster Checkpointing Subsystem

Virtualization of Resources
Capture of full application context
resources, states, process hierarchy
Non-intrusive
→ Virtual Bubble

33

STORAGE

34
Storage subsystem
Virtual Storage Manager

Functionalities
Negotiates the storage part of the SLA
Provides storage capacity at a given QoS level
Provides FT mechanisms
Requirement: manage multiple jobs running on the
same SR

35
Data Container concept

Idea
create storage environment for applications at a
desired QoS level with abstraction of physical
devices
Components

File I/O (read, write, open, ...)
Data Container
Block I/O (read, write, ioctl)
Logical space
Block I/O
Storage Resource
36
Data container properties

Storage part of the SLA
Data container section
Size
File system type
Number of nodes that need to access the data
container (private/shared)
Performance section
Application I/O profile → Benchmark
Bandwidth (in MB/s or IO/s)
Or: default configuration
Dependability section
Data redundancy type (within a cluster)
Snapshot needed or not
Data replication or not (between clusters)
Job specific section
Job's time to schedule and time to finish
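
The four sections listed above map naturally onto a record type. The following sketch shows one possible shape; all field names, units and the consistency checks in `problems` are assumptions for illustration and do not reproduce the actual SLA schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataContainerRequest:
    # Data container section
    size_gb: int
    filesystem: str
    nodes: int            # nodes that need to access the container
    shared: bool          # False = private container
    # Performance section (None = default configuration)
    bandwidth_mb_s: Optional[float] = None
    # Dependability section
    redundancy: str = "none"   # redundancy type within a cluster
    snapshot: bool = False
    replication: bool = False  # replication between clusters
    # Job specific section, in hours from now
    time_to_schedule_h: float = 0.0
    time_to_finish_h: float = 0.0

    def problems(self):
        """Return a list of human-readable inconsistencies (empty if none)."""
        issues = []
        if self.size_gb <= 0:
            issues.append("size must be positive")
        if self.nodes < 1:
            issues.append("at least one node must access the container")
        if self.nodes > 1 and not self.shared:
            issues.append("a private container can serve only one node")
        if self.bandwidth_mb_s is not None and self.bandwidth_mb_s <= 0:
            issues.append("bandwidth must be positive")
        if self.time_to_finish_h <= self.time_to_schedule_h:
            issues.append("finish time must be after schedule time")
        return issues


ok = DataContainerRequest(size_gb=200, filesystem="ext3", nodes=1, shared=False,
                          snapshot=True, time_to_schedule_h=2, time_to_finish_h=14)
bad = DataContainerRequest(size_gb=200, filesystem="ext3", nodes=8, shared=False,
                           time_to_schedule_h=5, time_to_finish_h=5)
print(ok.problems())
print(bad.problems())
```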

37
Fault Tolerance Mechanisms

RAID
Tolerate the failure of one or more disks
RAIN
Tolerate the failure of one or more nodes
Implementation
Hardware
Software
Storage FT mechanisms rely on special data
layouts
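
One well-known example of such a data layout is XOR parity, as used in RAID 5. The slides do not say which layout the storage subsystem uses, so the sketch below is only an illustration of the general idea: with one parity block per stripe, any single lost block can be rebuilt from the survivors. The helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks on three disks plus one parity block.
data = [b"node", b"fail", b"safe"]
parity = xor_blocks(data)

# The disk holding data[1] fails; XOR of the survivors restores its block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```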

Software

Storage Snapshot

38
Data container snapshot

Provide instantaneous copy of data containers
Technique used: Copy-On-Write (COW)
create multiple copies of data without
duplicating all the data blocks
With checkpoint, it allows application restart
from a previous running stage
Impact on SR performance
Taken into account at negotiation time
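
The copy-on-write idea can be shown with a toy block store. This is a minimal sketch, not the storage subsystem's implementation; the `DataContainer` class and its methods are hypothetical. A snapshot starts empty and only receives the old content of a block at the moment that block is first overwritten.

```python
class DataContainer:
    """Toy block device with copy-on-write snapshots."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.snapshots = []  # one dict per snapshot: block index -> old content

    def snapshot(self):
        """Take a snapshot; no data is copied at this point."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Copy on write: before overwriting, preserve the old block for
        # every snapshot that has not saved this block yet.
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]

    def read_snapshot(self, snapshot_id, index):
        # Unchanged blocks are shared with the live container.
        saved = self.snapshots[snapshot_id]
        return saved.get(index, self.blocks[index])

    def copied_blocks(self, snapshot_id):
        return len(self.snapshots[snapshot_id])


container = DataContainer(num_blocks=4)
container.write(0, b"step-1")
container.write(1, b"input")
snap = container.snapshot()    # taken together with a checkpoint
container.write(0, b"step-2")  # job continues after the checkpoint

print(container.read(0), container.read_snapshot(snap, 0), container.copied_blocks(snap))
```

Only the overwritten block is duplicated, which is why the performance impact grows with the write activity after the snapshot.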

39
Snapshot: single node job restart after node
failure

Characteristics
The job is running on a single node
The data container is private to that node
Data container snapshot resides on the same
storage resource

40
Interfaces with other components
RMS
Interface VSM - RMS
VSM
Interface VSM - SR
Storage Resource (SR)
Storage Subsystem
Network (socket, RDMA, ...)
41

ASSESSGRID

42
Motivation: AssessGrid
Checkpoint
Accept this job?
Node crashes
Restart
43
Grid Fabric Layer with Risk Assessor

NegotiationManager
Agr./Agr.Fact. WS
checks whether offer complies with template
initiation of file transfers
Scheduler
creates tentative schedules for offers
Risk Assessor
Consultant Service
records data
Monitoring
runtime behavior

44
Motivation: AssessGrid

Aim of AssessGrid
Introduce risk awareness in Grid technology
Risk awareness incorporated across three layers
End-user
Broker
Service Provider

45
AssessGrid - Architectural Overview

End-user
Portal
Broker
Risk Assessor
Confidence Service
Workflow Assessor
Provider
Negotiator
Scheduler
Risk Assessor
Consultant Service

46
Precautionary Fault-Tolerance

Use of planning based scheduler

How many
spare
resources are
available at
execution time?

47
Estimating Risk for a Job Execution

Use of planning based scheduler
How much slack time is available for fault
tolerance?
How much effort do I undertake for fault
tolerance?
What is the considered risk of resource failure?

[Diagram labels: Earliest Start Time, Execution Time, Slack Time, Latest Finish Time]
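
The relation between the four quantities in the diagram can be made concrete. The sketch below assumes that slack is whatever remains of the window between earliest start and latest finish after the execution time is subtracted, and that each failure costs at most one checkpoint interval of lost work plus a restart overhead. The function names, the cost model and the example numbers are assumptions; checkpointing overhead during normal operation is ignored.

```python
def slack_time(earliest_start, latest_finish, execution_time):
    """Time left in the SLA window after the planned execution."""
    slack = latest_finish - earliest_start - execution_time
    if slack < 0:
        raise ValueError("job does not fit into the agreed time window")
    return slack


def tolerable_failures(slack, checkpoint_interval, restart_overhead):
    """Failures that can be absorbed if each costs one interval plus a restart."""
    worst_case_cost = checkpoint_interval + restart_overhead
    return int(slack // worst_case_cost)


# Hypothetical job: window from hour 8 to hour 20, 9 hours of execution.
slack = slack_time(earliest_start=8, latest_finish=20, execution_time=9)
failures = tolerable_failures(slack, checkpoint_interval=1.0, restart_overhead=0.5)
print(slack, failures)
```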
48
Risk Assessment

Estimate risk for agreeing to an SLA
consider risk of resource failure
estimate risk for a job execution
initiate precautionary FT mechanisms

[Scale: low risk, middle risk, high risk]
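
A simple way to turn historical failure data into such a risk class is sketched below. The slides do not describe the model used by the Risk Assessor, so everything here is an assumption: independent nodes with a constant failure rate (exponential model), a rate estimated as failures per node hour, and arbitrary thresholds of 5% and 25% for the three classes.

```python
import math


def failure_rate(observed_failures, observed_node_hours):
    """Failures per node hour, estimated from historical monitoring data."""
    return observed_failures / observed_node_hours


def job_failure_probability(rate, nodes, hours):
    """Probability that at least one of the job's nodes fails during the run.

    Assumes independent nodes with a constant failure rate.
    """
    return 1.0 - math.exp(-rate * nodes * hours)


def risk_class(probability, low=0.05, high=0.25):
    """Map a probability to one of three classes; thresholds are hypothetical."""
    if probability < low:
        return "low risk"
    if probability < high:
        return "middle risk"
    return "high risk"


# Hypothetical history: 12 node failures in 100000 node hours.
rate = failure_rate(12, 100_000)
small_job = job_failure_probability(rate, nodes=32, hours=10)
large_job = job_failure_probability(rate, nodes=64, hours=48)
print(risk_class(small_job), risk_class(large_job))
```

A provider could use the resulting class to decide whether to accept the SLA or to initiate precautionary fault tolerance measures first.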
49
Risk Management at Job Execution
Events
Risk Management
Decisions, Actions
Risk Assessment
Business Model (price, penalty)
Weekend/Holiday/Workday
Schedule (SLAs, best effort)
Redundancy Measures
50
Detection of Bottlenecks