Transparent Cross-Border Migration of Parallel Multi Node Applications

1
Transparent Cross-Border Migration of Parallel
Multi Node Applications
  • Dominic Battré, Matthias Hovestadt, Odej Kao,
    Axel Keller, Kerstin Voss
  • Cracow Grid Workshop 2007

2
Outline
  • Motivation
  • The Software Stack
  • Cross-Border Migration
  • Summary

3
The Gap between Grid and RMS
  • User asks for SLA
  • Grid Middleware realizes job by means of local
    RMS
  • BUT RMS offer Best Effort
  • Need SLA-aware RMS

4
HPC4U: Highly Predictable Clusters for
Internet-Grids
  • Objective
  • Software-only solution for an SLA-aware, fault
    tolerant infrastructure, offering reliability and
    QoS, and acting as active Grid component
  • Key Features
  • System level checkpointing
  • Job migration
  • Job types: sequential and MPI-parallel
  • Planning based scheduling

5
HPC4U Planning Based Scheduling
                                   queuing systems     planning systems
planned time frame                 present             present and future
new job requests                   insert in queues    re-planning
assignment of planned start time   no                  all requests
runtime estimation                 not necessary       mandatory
backfilling                        optional            yes, implicit
advance reservations               not possible        yes, trivial

[Figure labels: queues, new jobs, machine, time]
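
The planning column of the comparison above can be illustrated with a minimal sketch. It assumes a one-hour time grid and a simple first-fit placement; the `Request` class, the `plan` function and the sample jobs are hypothetical and are not taken from the HPC4U scheduler. The point is only that every request gets a planned start time, that a runtime estimate is mandatory, and that backfilling falls out of the placement implicitly.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    cpus: int
    runtime: int             # estimated runtime in hours (mandatory for planning)
    earliest_start: int = 0  # an advance reservation simply starts later


def plan(requests, total_cpus, horizon=48):
    """Assign a planned start time to every request (first fit on an hourly grid)."""
    free = [total_cpus] * horizon  # free CPUs per one-hour slot
    schedule = {}
    for req in requests:
        for start in range(req.earliest_start, horizon - req.runtime + 1):
            window = range(start, start + req.runtime)
            if all(free[t] >= req.cpus for t in window):
                for t in window:
                    free[t] -= req.cpus
                schedule[req.name] = start
                break
        else:
            raise ValueError(f"no slot for {req.name} within the horizon")
    return schedule


requests = [
    Request("A", cpus=16, runtime=4),
    Request("reservation", cpus=32, runtime=2, earliest_start=6),
    Request("B", cpus=32, runtime=3),
    Request("C", cpus=16, runtime=2),  # fits next to A, so it is backfilled
]
schedule = plan(requests, total_cpus=32)
print(schedule)
```

Job C is placed alongside A at hour 0 although it was submitted last, which is the implicit backfilling named in the table.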
6
HPC4U Software Stack
[Diagram labels: User-/Broker-Interface, CLI, Negotiation, RMS, Scheduler, SSC, Process, Network, Storage, Cluster]
7
HPC4U Checkpointing Cycle
[Diagram: RMS with the Process, Network and Storage subsystems. Recoverable step labels: 2. In-transit packets; 4. Snapshot; 5. Link to snapshot; 7. Job running again]
8
Cross Border Migration: Intra Domain
[Diagram: two HPC4U software stacks side by side, each labeled User-/Broker-Interface, CLI, CRM, Negotiation, PP, RMS, Scheduler, SSC, Process, Network, Storage, Cluster]
9
Cross Border Migration: Target Retrieval
[Diagram: two HPC4U software stacks side by side, as on the intra-domain slide]
10
Cross Border Migration: Checkpoint Migration
[Diagram: two HPC4U software stacks side by side, as on the intra-domain slide]
11
Cross Border Migration: Remote Execution
[Diagram: two HPC4U software stacks side by side, as on the intra-domain slide]
12
Cross Border Migration: Result Migration
[Diagram: two HPC4U software stacks side by side, as on the intra-domain slide]
13
Cross-Border Migration Using Globus
  • WS-AG implementation based on GT4
  • Developed in EU project AssessGrid
  • Source specifies SLA / file staging parameters
  • Subset of JSDL (POSIX Jobs)
  • Resource determination via broker
  • Source directly contacts destination
  • Destination pulls migration data via GridFTP
  • Destination pushes result data back to source
  • Source uses WSRF event notification

[Diagram labels: User-/Broker-Interface, CLI, WS-AG, CRM, Broker, Negotiation, PP, RMS, Scheduler, SSC, Process, Network, Storage, Cluster]
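
The order of the data movements on this slide can be sketched as follows. This is only an in-memory walk-through under stated assumptions: the `Site` class and `migrate` function are hypothetical, plain dictionaries stand in for GridFTP transfers, and a log entry stands in for the WSRF notification. No Globus API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    files: dict = field(default_factory=dict)  # stand-in for a GridFTP-accessible store
    log: list = field(default_factory=list)


def migrate(source, destination, job_id):
    """Walk through the migration steps using in-memory stand-ins."""
    checkpoint = f"{job_id}.ckpt"
    # 1. Source has agreed an SLA with the destination (negotiation omitted).
    source.log.append(f"SLA agreed with {destination.name}")
    # 2. Destination pulls the migration data (GridFTP on the slide).
    destination.files[checkpoint] = source.files[checkpoint]
    destination.log.append(f"pulled {checkpoint}")
    # 3. Destination resumes the job from the checkpoint and produces a result.
    result = f"{job_id}.out"
    destination.files[result] = f"result of {destination.files[checkpoint]}"
    # 4. Destination pushes the result data back to the source.
    source.files[result] = destination.files[result]
    # 5. Source learns about completion (WSRF event notification on the slide).
    source.log.append(f"notified: {job_id} finished")
    return result


src = Site("source", files={"job42.ckpt": "checkpoint-data"})
dst = Site("destination")
result_name = migrate(src, dst, "job42")
print(result_name, src.log)
```

The asymmetry is deliberate: the checkpoint is pulled by the destination, while the result is pushed back by it, so the destination initiates both transfers.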
14
Ongoing Work: Introducing Risk Management
  • Topic of EU project AssessGrid
  • Incorporated in SLA
  • Provider
  • Estimates risk for agreeing to an SLA
  • Considers probability of failure in schedule
  • Assessment based on historical data

[Diagram labels: User-/Broker-Interface, CLI, CRM, Broker, WS-AG, Risk Assessor, Negotiation, PP, RMS, Scheduler, Consultant Service, SSC, Monitoring, Process, Network, Storage, Cluster]
15
Summary: Best Effort is not Enough
Cross-border migration and risk assessment
provide new means to increase the reliability
of Grid Computing.
16
More information
  • Read the paper
  • AssessGrid: www.assessgrid.eu
  • HPC4U: www.hpc4u.eu
  • OpenCCS: www.openccs.eu

Thanks for your attention!
17
Contents
  • BACKUP

18
Scheduling Aspects
  • Execution Time
  • Exact start time
  • Earliest start time, latest finish time
  • User provides stage-in files by time X
  • Provider keeps stage-out files until time Y
  • Provisional Reservations
  • Job Priorities
  • Job Suspension

19
HPC4U Planning Based Scheduler
  • Space-Sharing
  • Run-time estimation → Start time assignment

[Figure: plan of CPUs (axis ticks 16, 32) over time (axis ticks 2 to 16), showing reservations, among them a "Reservation for Grid Job according to SLA"]
20
  • HPC4U

21
Motivation: Fault Tolerance
  • Commercial Grid users need SLAs
  • Providers cautious on adoption
  • Reason: Business case risk
  • Missed deadlines due to system failures
  • → Penalties to be paid
  • Solution: Prevention with Fault Tolerance
  • Fault tolerance mechanisms available, but
  • Application modification mandatory
  • Overall solution (System software, process,
    storage, file system, network) required
  • Combination with Grid migration missing

22
HPC4U Objective
  • Software-only solution for an SLA-aware, fault
    tolerant infrastructure, offering reliability and
    QoS, acting as active Grid component
  • Key features
  • Definition and implementation of SLAs
  • Resource reservation for guaranteed QoS
  • Application-transparent fault tolerance

23
HPC4U Concept
  • SLA negotiation as an explicit statement of
    expectations and obligations in a business
    relationship between provider and customer
  • Reservation of CPU, storage and network for
    desired time interval
  • Job start in checkpointing environment
  • In case of system failure
  • → Job migration / restart with respect to SLA

24
HPC4U Project Outcomes
25
Phases of Operation
  • Negotiation of SLA
  • Pre-Runtime: Configuration of Resources
  • e.g. network, storage, compute nodes
  • Runtime: Stage-In, Computation, Stage-Out
  • Post-Runtime: Re-configuration

26
Phase: Pre-Runtime
  • Task of Pre-Runtime Phase
  • Configuration of all allocated resources
  • Goal: Fulfill requirements of SLA
  • Reconfiguration affects all HPC4U elements
  • Resource Management System
  • e.g. configuration of assigned compute nodes
  • Storage Subsystem
  • e.g. initialization of a new data partition
  • Network Subsystem
  • e.g. configuration of network infrastructure

27
Phase: Runtime
  • Runtime Phase: lifetime of job in system
  • adherence to SLA has to be assured
  • FT mechanisms have to be utilized
  • Phase consists of three distinct steps
  • Stage-In
  • transmission of required input data from Grid
    customer to compute resource
  • Computation
  • execution of application
  • Stage-Out
  • transmission of generated output data
    from compute resource back to Grid customer

28
Phase: Post-Runtime
  • Task of Post-Runtime Phase
  • Re-Configuration of all resources
  • e.g. re-configuration of network
  • e.g. deletion of checkpoint datasets
  • e.g. deletion of temporary data
  • Counterpart to Pre-Runtime Phase
  • Allocation of resources ends
  • Update of schedules in RMS and storage
  • Resources are available for new jobs
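
The phases of operation described on the preceding slides form a fixed sequence, which can be written down as a small sketch. The `Phase` enum and the helper functions are hypothetical names chosen for illustration; only the phase names and their order come from the slides.

```python
from enum import Enum


class Phase(Enum):
    NEGOTIATION = "Negotiation of SLA"
    PRE_RUNTIME = "Pre-Runtime: configuration of resources"
    STAGE_IN = "Runtime: Stage-In"
    COMPUTATION = "Runtime: Computation"
    STAGE_OUT = "Runtime: Stage-Out"
    POST_RUNTIME = "Post-Runtime: re-configuration"


ORDER = list(Phase)  # Enum members iterate in definition order


def next_phase(current):
    """Return the phase that follows `current`, or None after Post-Runtime."""
    index = ORDER.index(current)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


def run_job():
    """Yield the phases a job passes through, in order."""
    phase = ORDER[0]
    while phase is not None:
        yield phase
        phase = next_phase(phase)


visited = [p.name for p in run_job()]
print(visited)
```

Post-Runtime is the counterpart of Pre-Runtime: after it, the allocation ends and `next_phase` returns `None`.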

29
Motivation: Cross Border Migration
[Diagram labels: Customer, HPC4U]
30
  • PROCESS

31
Subsystems
  • Process Subsystem
  • checkpointing of network
  • cooperative checkpointing protocol (CCP)
  • Network Subsystem
  • checkpoint network state
  • Storage Subsystem
  • provision of storage
  • provision of snapshot

32
Metacluster Checkpointing Subsystem
  • Virtualization of Resources
  • Capture of full application context
  • resources, states, process hierarchy
  • Non-intrusive
  • → Virtual Bubble

33
  • STORAGE

34
Storage subsystem
Virtual Storage Manager
  • Functionalities
  • Negotiates the storage part of the SLA
  • Provides storage capacity at a given QoS level
  • Provides FT mechanisms
  • Requirement: manage multiple jobs running on the
    same SR

35
Data Container concept
  • Idea
  • create storage environment for applications at a
    desired QoS level with abstraction of physical
    devices
  • Components

File I/O (read, write, open, ...)
Data Container
Block I/O (read, write, ioctl)
Logical space
Block I/O
Storage Resource
36
Data container properties
  • Storage part of the SLA
  • Data container section
  • Size
  • File system type
  • Number of nodes that need to access the data
    container (private/shared)
  • Performance section
  • Application I/O profile → Benchmark
  • Bandwidth (in MB/s or IO/s)
  • Or: default configuration
  • Dependability section
  • Data redundancy type (within a cluster)
  • Snapshot needed or not
  • Data replication or not (between clusters)
  • Job specific section
  • Job's time to schedule and time to finish
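
The sections listed above could be captured in a small record type. This is a sketch only: the field names, units and the rule that a private container is accessed by exactly one node are assumptions made for illustration, not the HPC4U SLA schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataContainerSpec:
    # Data container section
    size_gb: int
    filesystem: str
    shared: bool                 # private (False) or shared (True)
    nodes: int = 1               # nodes that need to access the container
    # Performance section (None means default configuration)
    bandwidth_mb_s: Optional[float] = None
    # Dependability section
    redundancy: str = "none"     # data redundancy type within a cluster
    snapshot: bool = False
    replication: bool = False    # replication between clusters

    def validate(self):
        """Return a list of problems; an empty list means the spec is consistent."""
        problems = []
        if self.size_gb <= 0:
            problems.append("size must be positive")
        if not self.shared and self.nodes != 1:
            problems.append("a private container is accessed by exactly one node")
        if self.bandwidth_mb_s is not None and self.bandwidth_mb_s <= 0:
            problems.append("bandwidth must be positive")
        return problems


spec = DataContainerSpec(size_gb=100, filesystem="ext3", shared=False, nodes=4, snapshot=True)
print(spec.validate())
```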

37
Fault Tolerance Mechanisms
  • RAID
  • Tolerate the failure of one or more disks
  • RAIN
  • Tolerate the failure of one or more nodes
  • Implementation
  • Hardware
  • Software
  • Storage FT mechanisms rely on special data
    layouts

Software
  • Storage Snapshot

38
Data container snapshot
  • Provide instantaneous copy of data containers
  • Technique used: Copy-On-Write (COW)
  • create multiple copies of data without
    duplicating all the data blocks
  • With checkpoint, it allows application restart
    from a previous running stage
  • Impact on SR performance
  • Taken into account at negotiation time

39
Snapshot: single node job restart after node
failure
  • Characteristics
  • The job is running on a single node
  • The data container is private to that node
  • Data container snapshot resides on the same
    storage resource

40
Interfaces with other components
RMS
Interface VSM - RMS
VSM
Interface VSM - SR
Storage Resource (SR)
Storage Subsystem
Network (socket, RDMA, ...)
41
  • ASSESSGRID

42
Motivation: AssessGrid
Checkpoint
Accept this job?
Node crashes
Restart
43
Grid Fabric Layer with Risk Assessor
  • NegotiationManager
  • Agreement / Agreement Factory WS
  • checks whether offer complies with template
  • initiation of file transfers
  • Scheduler
  • creates tentative schedules for offers
  • Risk Assessor
  • Consultant Service
  • records data
  • Monitoring
  • runtime behavior

44
Motivation: AssessGrid
  • Aim of AssessGrid
  • Introduce risk awareness in Grid technology
  • Risk awareness incorporated across three layers
  • End-user
  • Broker
  • Service Provider

45
AssessGrid - Architectural Overview
  • End-user
  • Portal
  • Broker
  • Risk Assessor
  • Confidence Service
  • Workflow Assessor
  • Provider
  • Negotiator
  • Scheduler
  • Risk Assessor
  • Consultant Service

46
Precautionary Fault-Tolerance
  • Use of planning based scheduler
  • How many spare resources are available at
    execution time?

47
Estimating Risk for a Job Execution
  • Use of planning based scheduler
  • How much slack time is available for fault
    tolerance?
  • How much effort do I undertake for fault
    tolerance?
  • What is the considered risk of resource failure?

[Figure: timeline from Earliest Start Time to Latest Finish Time, divided into Execution Time and Slack Time]
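
The slack time shown in the figure follows directly from the three SLA values: it is the agreed window minus the execution time. The sketch below computes it and, under an assumed loss model that is not from the slides (each failure costs at most one checkpoint interval of lost work plus a fixed restart overhead), estimates how many failures the slack can absorb. Function and parameter names are hypothetical.

```python
def slack_time(earliest_start, latest_finish, execution_time):
    """Slack = (latest finish - earliest start) - execution time, all in hours."""
    window = latest_finish - earliest_start
    if execution_time > window:
        raise ValueError("execution time does not fit into the agreed window")
    return window - execution_time


def restarts_that_fit(slack, checkpoint_interval, restart_overhead):
    """Failures that can be absorbed if each one costs at most one
    checkpoint interval of lost work plus the restart overhead (assumed model)."""
    return int(slack // (checkpoint_interval + restart_overhead))


slack = slack_time(earliest_start=8, latest_finish=20, execution_time=9)
absorbed = restarts_that_fit(slack, checkpoint_interval=1.0, restart_overhead=0.5)
print(slack, absorbed)
```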
48
Risk Assessment
  • Estimate risk for agreeing an SLA
  • consider risk of resource failure
  • estimate risk for a job execution
  • initiate precautionary FT mechanisms

low risk middle risk high risk
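
One simple way to turn historical data into a risk estimate is sketched below. The model is an assumption made for illustration, not the AssessGrid risk assessor: node failures are treated as independent, with a constant per-node, per-hour failure probability, and any single node failure is taken to fail the job. All function names and numbers are hypothetical.

```python
def node_failure_probability(failures, node_hours):
    """Per-node, per-hour failure probability estimated from historical counts."""
    return failures / node_hours


def job_failure_probability(p_hour, nodes, hours):
    """Probability that at least one of `nodes` nodes fails within `hours`,
    assuming independent failures."""
    return 1.0 - (1.0 - p_hour) ** (nodes * hours)


def can_offer(pof, pof_upper_bound):
    """An SLA is only offered if the estimated PoF stays within the requested bound."""
    return pof <= pof_upper_bound


p_hour = node_failure_probability(failures=5, node_hours=100_000)
pof = job_failure_probability(p_hour, nodes=16, hours=10)
print(round(pof, 4), can_offer(pof, 0.01), can_offer(pof, 0.005))
```

If the estimate exceeds the bound, precautionary fault tolerance mechanisms (checkpointing, spare resources) would be the lever to bring the effective risk down before agreeing.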
49
Risk Management at Job Execution
[Diagram labels: Events, Risk Management, Decisions / Actions]
  • Risk Assessment
  • Business Model (price, penalty)
  • Weekend/Holiday/Workday
  • Schedule (SLAs, best effort)
  • Redundancy Measures
50
Detection of Bottlenecks
  • Consultant Service
  • Analysis of SLA violation
  • Estimated risk for the job
  • Planned FT mechanisms
  • Monitoring Information
  • Job
  • Resources
  • Data Mining
  • Find connections between SLA violations
  • Detect weak points in the provider's
    infrastructure

51
  • WS-AG

52
Components
53
Implementation with Globus Toolkit 4
  • Why Globus?
  • Utility: Authentication, Authorization,
    Delegation, RFT, MDS, WS-Notification
  • Impact
  • Problem 1: GRAM (Grid Resource Allocation and
    Management)
  • State machine, incl. File-Staging, Delegation of
    Credentials, RSL
  • Cannot use it: written for batch schedulers, not
    for planning schedulers
  • Problem 2: Deviations from WS-AG spec.
  • Different Namespaces: WS-A, WS-RF
54
Implementation with Globus Toolkit 4
  • Technical Challenges
  • xs:anyType
  • Wrote custom serializers/deserializers
  • Substitution groups
  • Used in ItemConstraint (Creation Constraints)
  • Cannot be mapped to Java by Axis
  • Replaced by xs:anyType; use as DOM tree
  • CreationConstraints
  • Namespace prefixes in XPaths meaningless
  • Need for WSDL and interpretation for xs:all,
    xs:choice, and friends

55
Context
  <wsag:Context>
    <wsag:AgreementInitiator>
      <AG:DistinguishedName>
        /C=DE/O=...
      </AG:DistinguishedName>
    </wsag:AgreementInitiator>
    <wsag:AgreementResponder>EPR</...>
    <AG:ServiceUsers>
      <AG:ServiceUser>DN</...>
    </AG:ServiceUsers>
  </wsag:Context>

Context
Terms
Creation Constraints
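
A context element like the one above can be read with a namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URIs and the distinguished name are placeholders invented for the example, not the real WS-Agreement or AssessGrid values. It also shows that the prefix chosen in a document is irrelevant: only the namespace URI it is bound to matters.

```python
import xml.etree.ElementTree as ET

WSAG = "urn:example:wsag"      # hypothetical namespace URI
AG = "urn:example:assessgrid"  # hypothetical namespace URI

doc_a = f"""<wsag:Context xmlns:wsag="{WSAG}" xmlns:AG="{AG}">
  <wsag:AgreementInitiator>
    <AG:DistinguishedName>/C=DE/O=Example</AG:DistinguishedName>
  </wsag:AgreementInitiator>
</wsag:Context>"""

# Same document, but with different (arbitrary) prefixes.
doc_b = f"""<x:Context xmlns:x="{WSAG}" xmlns:y="{AG}">
  <x:AgreementInitiator>
    <y:DistinguishedName>/C=DE/O=Example</y:DistinguishedName>
  </x:AgreementInitiator>
</x:Context>"""


def initiator_dn(xml_text):
    """Return the initiator's distinguished name, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # The prefixes used in the query are local to this mapping.
    ns = {"w": WSAG, "a": AG}
    node = root.find("w:AgreementInitiator/a:DistinguishedName", ns)
    return node.text.strip() if node is not None else None


print(initiator_dn(doc_a), initiator_dn(doc_b))
```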
56
Terms, SDTs
  • Conjunction of terms
  • Common structure of templates
  • WS-AG too powerful/difficult to fully support
  • Service Description Term (one)
  • assessgrid:ServiceDescription (extension of
    abstract ServiceTermType)
  • jsdl:POSIXExecutable (executable, arguments,
    environment)
  • jsdl:Application (mis-)used for libraries
  • jsdl:Resources
  • jsdl:DataStaging
  • assessgrid:PoF (upper bound)

57
Terms, GuaranteeTerms
  • No hierarchy but two meta guarantees
  • ProviderFulfillsAllObligations
  • e.g. Reward 1000 EUR, Penalty 1000 EUR
  • ConsumerFulfillsAllObligations
  • e.g. Reward 0 EUR, Penalty 1000 EUR
  • First violation is responsible for failure
  • No hardware problem, then User fault
  • Other Guarantees
  • Execution Time
  • Any start time (best effort)
  • Exact start time
  • Earliest start time, latest finish time
  • User provides StageIn files by time X
  • Provider keeps StageOut files until time Y

No timely execution
No stage-out
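
The rule that the first violation is responsible for the failure can be expressed in a few lines. This is a sketch with hypothetical names (`Violation`, `responsible_party`); how rewards and penalties are then settled is left out, since the slide only gives example amounts.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    time: float   # when the violation occurred, in hours since agreement
    party: str    # "provider" or "consumer"
    reason: str


def responsible_party(violations):
    """Return the party behind the earliest violation, or None if there was none."""
    if not violations:
        return None
    first = min(violations, key=lambda v: v.time)
    return first.party


violations = [
    Violation(12.0, "provider", "no timely execution"),
    Violation(9.5, "consumer", "stage-in files provided late"),
]
print(responsible_party(violations))
```

Here the late stage-in precedes the missed deadline, so the consumer is held responsible even though the provider also failed to execute on time.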
58
Terms
  • SLA does not contain requirements of fault
    tolerance mechanisms
  • Covered by asserted PoF, penalty and loss of
    reputation
  • Compulsory Assessment Intervals not really useful
    for us
  • How often do you assess that job was allocated
    for asserted time?
  • Preferences too complicated

59
CreationConstraints
  • Difficult to support Namespaces
  • //wsag/assessgrid - prefixes are just strings
  • Very difficult to support structural information
  • xs:group, xs:all, xs:choice, xs:sequence
  • Possible but difficult to support xs:restriction
  • xssimple
  • Check for enumeration (xs:restriction of
    xs:string)
  • Check for valid dates (xs:restriction of xs:date)
  • Everything else close to impossible
  • minInclusive, maxInclusive, minExclusive, maxExclusive
  • totalDigits, fractionDigits, length, probably
    useless
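
The two checks the slide calls feasible, enumeration and valid dates, are simple to state in code. The sketch below is illustrative only: the function names are hypothetical, and the date check covers plain `YYYY-MM-DD` values rather than the full `xs:date` lexical space (which also allows a time zone suffix).

```python
from datetime import date


def check_enumeration(value, allowed):
    """True if `value` is one of the enumerated strings."""
    return value in allowed


def check_date(value):
    """True if `value` is a valid calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


filesystems = {"ext3", "xfs"}  # hypothetical enumeration from a template
print(check_enumeration("ext3", filesystems), check_date("2007-10-15"), check_date("2007-13-01"))
```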
