Title: Cluster Resources Training
1. Cluster Resources Training
- February 2006
2. Presentation Protocols
- For problems or questions, send email to training@clusterresources.com
- We will pause for questions at the end of each section
- Please remain on mute except for questions
- Please do not put your call on hold (the entire group will hear your music)
- Please be considerate of the other people attending the conference
- You can also submit questions during the training to the AOL Instant Messenger screen name "CRI Web Training"
3. Session 2
- 6. Reporting and Monitoring
- 7. Grids
- 8. Utility Computing
- 9. TORQUE
- 10. Future
4. Section 6: Accounting and Statistics
- Job and System Statistics
- Event Log
- Fairshare Stats
- Client Statistic Reports
- Realtime and Historical Charts with Moab Cluster Manager
- Native Resource Manager
- GMetrics
- GEvents
5. Accounting Overview
- Job and Reservation Accounting
- Resource Accounting
- Credential Accounting

# moab.cfg
USERCFG[DEFAULT] ENABLEPROFILING=TRUE
6. Job and System Statistics
- Determining cumulative cluster performance over a fixed timeframe
- Graphing changes in cluster utilization and responsiveness over time
- Identifying which compute resources are most heavily used
- Charting resource usage distribution among users, groups, projects, and classes
- Determining allocated resources, responsiveness, and/or failure conditions for jobs completed in the past
- Providing real-time statistics updates to external accounting systems (a command-line sketch follows)
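A quick way to review these cumulative statistics from the command line is Moab's showstats client. The flags below are common ones; check the documentation of your installed version for the full set.

# summarize overall scheduler and cluster statistics
> showstats

# per-user and per-group usage summaries
> showstats -u
> showstats -g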
7. Event Log
- Reports trace, state, and utilization records at events:
  - Scheduler start, stop, and failure
  - Job create, start, end, cancel, migrate, failure
  - Reservation create, start, stop, failure
- Configurable with RECORDEVENTLIST (see the sketch below)
- Can be exported to external systems
- http://clusterresources.com/moabdocs/a.fparameters.shtml#recordeventlist
- http://clusterresources.com/moabdocs/14.2logging.shtml#logevent
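A minimal sketch of enabling event records in moab.cfg; the event-type names shown are illustrative and should be checked against the RECORDEVENTLIST documentation linked above.

# moab.cfg (illustrative)
RECORDEVENTLIST JOBSTART,JOBCOMPLETE,JOBCANCEL,SCHEDSTART,SCHEDSTOP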
8. Fairshare Stats
- Provides credential-based usage distributions over time
- mdiag -f
- Maintained for all credentials
- Stored in stats/FS.<EPOCHTIME>
- Shows detailed time-distributed usage by fairshare metric (a configuration sketch follows)
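For context, a minimal fairshare configuration sketch in moab.cfg; the parameter values here are illustrative defaults, not recommendations.

# moab.cfg (illustrative)
FSPOLICY      DEDICATEDPS     # charge by dedicated processor-seconds
FSDEPTH       7               # number of fairshare windows to consider
FSINTERVAL    24:00:00        # length of each fairshare window
FSDECAY       0.80            # decay factor applied to older windows
USERCFG[DEFAULT] FSTARGET=10.0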
9. Client Statistic Reports
- In-memory reports available for nodes and credentials
- Node categorization allows fine-grained, localized usage tracking
10. Realtime and Historical Charts with Moab Cluster Manager
- Reports on nodes and all credentials
- Allows arbitrary querying of historical timeframes with arbitrary correlations
11. Service Monitoring and Management
12. Real-Time Performance Accounting Analysis
13. Search for Specific Jobs, Track Job Status
14. View and Manage Fairshare Settings
15. Manage Priority Across the Largest Set of Options
16. View the Cluster by Virtually Any Attribute
17. Improving Resource Monitoring and Reporting
- Native Resource Manager Interface
- Generic Features/Consumable Resources
- Generic Metrics
- Generic Events
18. Native Resource Manager Interface
- Everything you've ever wanted to do with Moab -- an interface that allows sites to replace or augment their existing resource managers with information from sources such as:
  - Arbitrary scripts
  - Ganglia
  - FlexLM
  - MySQL
- Example usage follows on the next slide
- http://clusterresources.com/moabdocs/13.5nativerm.shtml
- http://clusterresources.com/moabdocs/13.7licensemanagement.shtml
19. Native Resource Manager Example

# moab.cfg
# interface w/ TORQUE
RMCFG[torque]  TYPE=PBS

# interface w/ FlexLM
RMCFG[flexLM]  TYPE=NATIVE RTYPE=license
RMCFG[flexLM]  CLUSTERQUERYURL=exec://$HOME/tools/license.mon.flexlm.pl

# integrate local node health check script data
RMCFG[local]   TYPE=NATIVE
RMCFG[local]   CLUSTERQUERYURL=file:///opt/moab/localtools/healthcheck.dat
20. Generic Features / Consumable Resources
- Node Features
  - Opaque string tags associated with compute resources
  - Can be requested by jobs, reservations, etc.
- Generic Consumable Resources
  - Can specify any arbitrary consumable resource
  - Can be requested by jobs, reservations, etc. (a submission sketch follows)
  - Reserved by advance reservations

# moab.cfg
NODECFG[node1]   FEATURES=fast,bigmem
NODECFG[node2]   FEATURES=bigmem
NODECFG[DEFAULT] GRES=matlab:4

- http://clusterresources.com/moabdocs/12.2nodeattributes.shtml#features
- http://clusterresources.com/moabdocs/12.4consumablegres.shtml
- http://clusterresources.com/moabdocs/9.2accounting.shtml#gevents
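As a hedged illustration of the job-side request: with TORQUE/PBS as the resource manager, Moab's resource-manager-extension syntax can carry a generic-resource request that matches the GRES name configured above. Exact syntax (and count notation) varies by version, so treat this as a sketch.

# request the matlab generic resource via the RM extension
> qsub -l nodes=1,walltime=1:00:00 -W x=GRES:matlab job.sh

# or via msub
> msub -l nodes=1,walltime=1:00:00,gres=matlab job.sh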
21. Generic Metrics
- Moab allows organizations to enable generic performance metrics. These metrics allow decisions to be made and reports to be generated based on site-specific environmental factors. This increases Moab's awareness of what is occurring within a given cluster environment and allows arbitrary information to be associated with resources and workload within the cluster. Uses of these metrics are widespread and can cover anything from tracking node temperature, to memory faults, to application effectiveness.
- Execute triggers when specified thresholds are reached
- Modify node allocation affinity for specific jobs
- Initiate automated notifications when thresholds are reached
- Display current, average, maximum, and minimum metric values in reports and charts within Moab Cluster Manager
- http://clusterresources.com/moabdocs/9.2accounting.shtml#gmetrics
22. Generic Metric Example

# moab.cfg
RMCFG[native] TYPE=NATIVE
RMCFG[native] CLUSTERQUERYURL=file://$HOME/tools/temp.txt
NODECFG[n01]  TRIGGER=atype=exec,action="/bin/drain.pl $OID",etype=threshold,threshold=GMetric[temp]>150

# example temp.txt (temperature output)
node001 GMETRIC[temp]=113
node002 GMETRIC[temp]=107
node003 GMETRIC[temp]=83
node004 GMETRIC[temp]=85
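The temp.txt file above has to be produced by something site-specific. A minimal shell sketch that could generate it is shown below; the node list and the temperature-collection command are entirely hypothetical and would be replaced by local tooling (ipmitool, lm-sensors, Ganglia, etc.).

#!/bin/sh
# Hypothetical sketch: collect a per-node temperature and write the
# cluster-query file that Moab reads (path matches the config above).
OUT=$HOME/tools/temp.txt
: > "$OUT"                                   # truncate the previous report
for NODE in node001 node002 node003 node004; do
  # placeholder collection command; adapt to local monitoring tools
  TEMP=$(ssh "$NODE" "cat /tmp/node_temp 2>/dev/null" 2>/dev/null)
  [ -n "$TEMP" ] && echo "$NODE GMETRIC[temp]=$TEMP" >> "$OUT"
done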
23. Generic Events
- Generic Events
  - Can report arbitrary events and failures
  - Can associate human-readable messages with an event
  - Events viewable via Moab clients

# moab.cfg
RMCFG[native] TYPE=NATIVE
RMCFG[native] CLUSTERQUERYURL=file://$HOME/tools/healthcheck.txt
GEVENTCFG[diskfull]   ACTION=notify,exec:/opt/moab/nodepurge.sh
GEVENTCFG[power]      ACTION=avoid,record,notify
GEVENTCFG[cpufailure] ACTION=reserve,disable,record,notify

# example healthcheck.txt
node017 GEVENT[cpufailure]='CPU2 Down'
node135 GEVENT[diskfull]='/var/tmp Full'
node139 GEVENT[diskfull]='/home Full'
node407 GEVENT[power]='Transient Power Supply Failure'

- http://clusterresources.com/moabdocs/9.2accounting.shtml#gmetrics
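As with the metric file, healthcheck.txt would be produced by a site-specific probe. A minimal hypothetical sketch for a single host's disk check (thresholds and filesystems are assumptions):

#!/bin/sh
# Hypothetical sketch: report a GEVENT when a filesystem crosses 95% usage.
OUT=$HOME/tools/healthcheck.txt
: > "$OUT"
for FS in /var/tmp /home; do
  PCT=$(df -P "$FS" | awk 'NR==2 {gsub("%","",$5); print $5}')
  if [ "$PCT" -ge 95 ]; then
    echo "$(hostname) GEVENT[diskfull]='$FS Full'" >> "$OUT"
  fi
done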
25. Section 7: Peer to Peer (Grids)
- Cluster Stack / Framework
- Moab P2P Grid
- Peer Configuration
- Resource Control Overview
- Data Management
- Security
http://clusterresources.com/moabdocs/17.0peertopeer.shtml
26. Section 7 (cont.)
- Peer Flow
- Resource Affinity
- Management
- Grids and Globus GRAM
- Grid Troubleshooting
- Data Staging
- Information Services
27. Cluster Stack / Framework

[Diagram: cluster stack layers -- a Grid Workload Manager (scheduler, policy manager, integration platform) above a Cluster Workload Manager (scheduler, policy manager, integration platform), sitting over the Resource Manager; serial, parallel, and message-passing applications; portal, GUI, and CLI access points for users and admins; security; the operating system; and the hardware (cluster or SMP).]
28Grid Types
Local Area Grid (LAG)
Wide Area Grid (WAG)
A Local Area Grid uses one instance of Moab
within an environment that shares a user and data
space across multiple clusters, that may or may
not have multiple hardware types, operating
systems and compute resource managers (e.g.
LoadLeveler, TORQUE, LSF, PBS Pro, etc.)
A Wide Area Grid uses multiple Moab instances
working together within an environment that can
have multiple user and data spaces across
multiple clusters, that may or may not have
multiple hardware types, operating systems and
compute resource managers (e.g. LoadLeveler,
TORQUE, LSF, PBS Pro, etc.). Wide Area Grid
management rules can be centralized, locally
controlled or mixed.
[Diagram: a Local Area Grid shows a single Moab (Master) over Clusters A, B, and C with a shared user space and shared data space; a Wide Area Grid shows separate Moab instances on Clusters A, B, and C with multiple user and data spaces.]

Grid Management Scenarios

[Diagram: three management models -- Centralized Management (a Moab grid head node holding all grid rules over per-cluster Moab instances), Centralized + Local Management (a grid head node with shared grid rules plus local grid rules at each cluster), and Local Management / Peer to Peer (per-cluster Moab instances on Clusters A, B, and C, each with its own local grid rules).]
29Grid Benefits
- Scalability
- Resource Access
- Load-Balancing
- Single System Image (SSI)
- High Availability
30. Drawbacks of a Layered Approach
- Stability
  - Additional failure layer
  - Centralized grid management (single point of failure)
- Optimization
  - Limited local information and control
- Admin Experience
  - Additional tool to learn/configure
  - Policy duplication and conflicts
  - Additional tool to manage/troubleshoot
- User Experience
  - Additional submission language/environment
  - Additional tool to track and manage workload
- http://clusterresources.com/moabdocs/17.12p2pgrid.shtml
31Moab P2P Approach
- Little to no user training
- Little to no admin training
- Single Policy set
- Transparent Grid
http://clusterresources.com/moabdocs/17.0peertopeer.shtml
32Integrated Moab P2P/Grid Capabilities
- Distributed Resource Management
- Distributed Job Management
- Grid Information Management
- Resource and Job Views
- Credential Management and Mapping
- Distributed Accounting
- Data Management
33. Grid Relationship Combinations

Moab is able to facilitate virtually any grid relationship:
1. Join local area grids into wide area grids
2. Join wide area grids to other wide area grids (whether they are managed centrally, locally/peer-to-peer, or mixed)
3. Share resources in one direction, for use with hosting centers or to bill out resources to other sites
4. Build multiple levels of grid relationships (e.g. conglomerates within conglomerates within conglomerates)
[Diagram: the four relationship types annotated on one example topology -- local area grids with a shared user and data space (Clusters A, B, C) joined into wide area grids; wide area grids joined to other wide area grids through a Moab grid head node with shared grid rules across multiple user and data spaces; one-directional sharing with a hosting site; and nested grid relationships spanning Clusters D through H, each running Moab with local grid rules.]
34. Basic P2P Example

# moab.cfg for Cluster A
SCHEDCFG[ClusterA]
RMCFG[ClusterB]         TYPE=MOAB SERVER=node03:41000
RMCFG[ClusterB.INBOUND] FLAGS=CLIENT CLIENT=ClusterB

# moab-private.cfg for Cluster A
CLIENTCFG[RM:ClusterB]  KEY=fetwl02 AUTH=admin1

# moab.cfg for Cluster B
SCHEDCFG[ClusterB]
RMCFG[ClusterA]         TYPE=MOAB SERVER=node01:41000
RMCFG[ClusterA.INBOUND] FLAGS=CLIENT CLIENT=ClusterA

# moab-private.cfg for Cluster B
CLIENTCFG[RM:ClusterA]  KEY=fetwl02 AUTH=admin1
35Peer Configuration
- Resource Reporting
- Credential Config
- Data Config
- Usage Limits
- Bi-Directional Job Flow
# moab.cfg (server 1)
SCHEDCFG[server1]   SERVER=server1.omc.com:42005 MODE=NORMAL
RMCFG[server2-out]  TYPE=MOAB SERVER=server2.omc.com:42005 CLIENT=server2
RMCFG[server2-in]   FLAGS=client CLIENT=server2

# moab-private.cfg (server 1)
CLIENTCFG[server2]  KEY=443db-writ4
36. Jobs
- Submitting Jobs to the Grid
  - msub (see the sketch below)
  - Takes the resource manager's submission language and translates it to msub
- Viewing Node and Job Information
  - Each destination Moab server reports all compute nodes it finds back to the source Moab server
  - Nodes show as local nodes, each within a partition associated with the resource manager reporting them
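A minimal msub submission sketch; the resource requests are standard msub options, and the job script name is hypothetical.

# submit a 2-node, 10-minute job to the grid through the local Moab peer
> msub -l nodes=2,walltime=00:10:00 myjob.sh

# or pipe a command script on stdin
> echo "sleep 60" | msub -l nodes=1,walltime=00:02:00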
37. Resource Control Overview
- Full resource information
  - Nodes appear with complete remote hostnames and full attribute information
- Remapped resource information
  - Nodes appear with remapped local hostnames and full attribute information
- Grid mode
  - Information regarding nodes reported from a remote peer is aggregated and transformed into one or more SMP-like large pseudo-nodes
38. Controlling Resource Information
- Direct
  - Nodes are reported to remote clusters exactly as they appear in the local cluster
- Mapped
  - Nodes are reported as individual nodes, but node names are mapped to a unique name when imported into the remote cluster
- Grid
  - Node information is aggregated into a single large SMP-like pseudo-node before it is reported to the remote cluster
39Grid Sandbox
- Constrains external resource access and limits
which resources are reported to other peers
# moab.cfg
SRCFG[sandbox1] PERIOD=INFINITY HOSTLIST=node01,node02,node03
SRCFG[sandbox1] CLUSTERLIST=ALL FLAGS=ALLOWGRID
40Access Controls
- Granting Access to Local Jobs
- Peer Access Control
# moab.cfg
SRCFG[sandbox2] PERIOD=INFINITY HOSTLIST=node04,node05,node06
SRCFG[sandbox2] FLAGS=ALLOWGRID QOSLIST=high GROUPLIST=engineer

# moab.cfg (Cluster 1)
SRCFG[sandbox1] PERIOD=INFINITY HOSTLIST=node01,node02,node03,node04,node05
SRCFG[sandbox1] FLAGS=ALLOWGRID CLUSTERLIST=ClusterB
SRCFG[sandbox2] PERIOD=INFINITY HOSTLIST=node6 FLAGS=ALLOWGRID
SRCFG[sandbox2] CLUSTERLIST=ClusterB,ClusterC,ClusterD USERLIST=ALL
41. Controlling Peer Workload Information
- Local workload exporting
  - Helps simplify administration of different clusters by centralizing monitoring and management of jobs at one peer, without forcing each peer to the type SLAVE

# moab.cfg (ClusterB - destination peer)
RMCFG[ClusterA] FLAGS=CLIENT,LOCALWORKLOADEXPORT   # source peer
42Data Management Configuration
- Global file systems
- Replicated data servers
- Need based direct input
- Output data migration
# moab.cfg (NFS data server)
RMCFG[storage] TYPE=native SERVER=omc.omc13.com:42004 RTYPE=STORAGE
RMCFG[storage] SYSTEMMODIFYURL=exec://$HOME/tools/storage.ctl.nfs.pl
RMCFG[storage] SYSTEMQUERYURL=exec://$HOME/tools/storage.query.nfs.pl

# moab.cfg (SCP data server)
RMCFG[storage] TYPE=native SERVER=omc.omc13.com:42004 RTYPE=STORAGE
RMCFG[storage] SYSTEMMODIFYURL=exec://$HOME/tools/storage.ctl.scp.pl
RMCFG[storage] SYSTEMQUERYURL=exec://$HOME/tools/storage.query.scp.pl
43. Security
- Secret-key based security is enabled via the moab-private.cfg file
- Globus credential-based server authentication (4.2.4)
44Credential Management
- Peer Credential Mapping
- Source and Destination Side Credential Mapping
# moab.cfg
SCHEDCFG[master1] MODE=normal
RMCFG[slave1]     OMAP=file:///opt/moab/omap.dat

# /opt/moab/omap.dat (source object map file)
user:joe,jsmith
user:steve,sjohnson
group:test,staff
class:batch,serial
user:*,grid
45. Credential Management (cont.)
- Preventing User Space Collisions

# moab.cfg
SCHEDCFG[master1] MODE=normal
RMCFG[slave1]     OMAP=file:///opt/moab/omap.dat FLAGS=client

# /opt/moab/omap.dat (source object map file)
user:*,c1_*
group:*,grid
account:*,temp_*

- Interfacing with Globus GRAM

# moab.cfg
SCHEDCFG[c1] SERVER=head.c1.hpc.org
RMCFG[c2]    SERVER=head.c2.hpc.org TYPE=moab JOBSTAGEMETHOD=globus
46. Credential Management (cont.)
- Limiting Access To Peers
- Limiting Access From Peers

# moab.cfg
SCHEDCFG[c1] SERVER=c1.hpc.org
# only allow staff, or members of the research and demo accounts, to use remote resources on c2
RMCFG[c2] SERVER=head.c2.hpc.org TYPE=moab
RMCFG[c2] AUTHGLIST=staff AUTHALIST=research,demo

# moab.cfg
SCHEDCFG[c1] SERVER=c1.hpc.org FLAGS=client
# only allow jobs from remote cluster c1 with group credential staff, or account research or demo, to use local resources
RMCFG[c2] SERVER=head.c2.hpc.org TYPE=moab
RMCFG[c2] AUTHGLIST=staff AUTHALIST=research,demo
47. Utilizing Multiple Resource Managers
- Migrate jobs between resource managers
- Aggregate information into a cohesive node view

# moab.cfg
RESOURCELIST    node01,node02 ...
RMCFG[base]     TYPE=PBS
RMCFG[network]  TYPE=NATIVE:AGFULL
RMCFG[network]  CLUSTERQUERYURL=/tmp/network.sh
RMCFG[fs]       TYPE=NATIVE:AGFULL
RMCFG[fs]       CLUSTERQUERYURL=/tmp/fs.sh

# sample network script
#!/bin/sh
_RX=`/sbin/ifconfig eth0 | grep "RX by" | cut -d: -f2 | cut -d' ' -f1`
_TX=`/sbin/ifconfig eth0 | grep "TX by" | cut -d: -f3 | cut -d' ' -f1`
echo `hostname` NETUSAGE=`echo "$_RX + $_TX" | bc`
48. P2P Resource Affinity
- Certain compute architectures are able to execute certain compute jobs more effectively than others
- From a given location, staging jobs to various clusters may require more expensive allocations, more data and network resources, and more use of system services
- Certain compute resources are owned by external organizations and should be utilized sparingly
- Moab allows the use of peer resource affinity to guide jobs to the clusters that make the best fit according to a number of criteria
49. Management and Troubleshooting
- Peer Management Overview
  - Use 'mdiag -R' to view interface health and performance/usage statistics
  - Use 'mrmctl' to enable/disable peer interfaces
  - Use 'mrmctl -m' to dynamically modify/configure peer interfaces
- Peer Troubleshooting Overview
  - Use 'mdiag -R' to diagnose general RM interfaces
  - Use 'mdiag -S' to diagnose general scheduler health
  - Use 'mdiag -R <RMID> --flags=submit-check' to diagnose peer-to-peer job migration
50. Sovereignty: Local vs. Centralized Management Policies

The local admin can apply policies to manage:
1. Local user access to local cluster resources
2. Local user access to grid resources
3. Outside grid user access to local cluster resources (general or specific policies)

The grid administration body can apply policies to manage:
1. General grid policies (sharing, priority, limits, etc.)

Users can submit to either:
- The local cluster
- Specified cluster(s) in the grid
- Generically to the grid

[Diagram: each admin manages their own cluster; the local admin manages Local Cluster A resources, a portion of which is allocated to the grid; the grid administration body manages the grid-allocated resources; local users and outside grid users submit against the corresponding portions.]
51. Grids and Globus
- Globus authentication credentials are used to determine trust between Moab peers and/or grid users
- Trusted peers/users may use the Globus GRAM service to submit grid workload
- Trusted peers/users may use the Globus GridFTP service to stage data
52Advanced P2P Example
53Data Staging
- Data Staging
- Data Staging Models
- Interface Scripts for a Storage Resource Manager
54. Data Staging
- Manages intra-cluster and inter-cluster job data staging requirements so as to minimize resource inefficiencies and maximize system utilization
- Prevents the loss of compute resources due to data blocking and can significantly improve cluster performance
55. Data Management: Increasing Efficiency

Data staging levels of efficiency and control:
0. No data staging.
1. Non-Verified Data Staging is the traditional use of data staging, where CPU requests and data staging requests are not coordinated, leaving the CPU request to cause blocking on the compute node when the data is not available to process.
2. Verified Data Staging adds the intelligence to have the workload manager verify that the data has arrived at the needed location prior to launching the job, in order to avoid workload blocking.
3. Prioritized Data Staging uses the capabilities of Verified Data Staging, but adds the ability to intercept data staging requests and submit them in an order of priority that matches that of the corresponding jobs.
4. Fully Scheduled Data Staging uses all of the capabilities of Prioritized Data Staging, but adds the ability to estimate staging periods, thus allowing workload to be scheduled more intelligently around data staging conditions. Unlike the others, which apply only to external storage, this capability can be applied to both external and internal storage scenarios.

[Diagram: a ladder from level 0 (No Data Staging, traditional) through Non-Verified, Verified, and Prioritized Data Staging up to level 4 (Fully Scheduled, optimized data staging).]
56. Optimized Data Staging
- Automatically pre-stages input data and stages back output data with event policies
- Coordinates data stage time with compute resource allocation
- Uses GASS, GridFTP, and scp for data management
- Reserves network resources to guarantee data staging and inter-process communication

[Diagram: with the traditional, inefficient method, the CPU reservation spans prestage, processing, and stage-back, so compute resources are wasted/blocked during data staging; with optimized data staging, the CPU reservation covers only processing, and compute resources are available to other workload during data staging.]
57. Efficiencies from Optimized Data Staging

[Diagram: timeline comparison starting from the same processor start time -- with the traditional, inefficient method each reservation holds the CPUs through prestage, processing, and stage-back; with intelligent, event-based data staging the prestage and stage-back phases are driven by events outside the CPU reservations, completing 7.5 jobs in the same window with efficient use of both CPU and network.]
58Data Staging Models
- Verified Data Staging
- Prioritized Data Staging
- Fully-Scheduled Data Staging
- Data Staging to Allocated Nodes
Attribute        Description
TYPE             must be NATIVE in all cases
RESOURCETYPE     must be set to STORAGE in all cases
SYSTEMQUERYURL   specifies the method of determining file attributes such as size, ownership, etc.
CLUSTERQUERYURL  specifies the method of determining current and configured storage manager resources such as available disk space, etc.
SYSTEMMODIFYURL  specifies the method of initiating file creation, file deletion, and data migration
59. Verified Data Staging

Verified Data Staging (start the job only after the file is verified to be in the right location). Used to prevent job blocking caused by jobs whose data has not finished staging, when all data staging is controlled via external data managers and no methods exist to control what is staged or in what order.

1. The user submits jobs via a portal or job-script-like mechanism. Data staging needs are communicated to a data manager mechanism (HSM manager, staging tool, script, command, etc.). Job consideration requests are sent to Moab in order to decide how and when to run.
2. Moab periodically queries the storage system (SAN, NAS, storage nodes) to see if the file is there yet.
3. The data manager moves the data to the desired location when it is able.
4. Moab verifies that the file is there, then releases the job for submission as long as it satisfies established policies.

Benefits: prevents non-staged jobs from blocking usage of nodes.
Drawbacks: no job-centric prioritization takes place in the order in which data gets staged.

[Diagram: job submission flows to Moab on Cluster A (local grid rules); the data manager and storage system sit alongside, with the numbered steps annotated.]
60. Prioritized Data Staging

Prioritized Data Staging (priority ordering of data staging). Used when Moab intercepts data staging requests and submits them through a data manager in priority order.

1. The user submits jobs via a portal or job-script-like mechanism. Data staging needs and job consideration requests are sent to Moab in order to decide how and when to run, and to decide the priority order for submitting data staging requests.
2. Moab evaluates priority, reservations, and other factors, and then submits data staging requests to a data manager mechanism (HSM manager, staging tool, script, command, etc.) in the best order to match established policies.
3. Moab periodically queries the storage system (SAN, NAS, storage nodes) to see if the file is there yet.
4. The data manager moves the data to the desired location when it is able.
5. Moab verifies that the file is there, then releases the job for submission as long as it satisfies established policies.

Benefits: prevents non-staged jobs from blocking usage of nodes; provides soft prioritization of data staging requests.
Drawbacks: prioritization is only softly provided; insufficient information for informed CPU reservations to take place.

[Diagram: job submission flows to Moab on Cluster A; Moab submits "priority jobs first" to the data manager, which stages into the storage system, with the numbered steps annotated.]
61. Fully Scheduled Data Staging: External Storage

Fully Scheduled Data Staging (priority ordering of data staging plus data-staging-centric scheduling). Used when Moab intercepts data staging requests to manage staging order and reserves CPU and other resources based on estimates of data staging periods.

1. The user submits jobs via a portal or job-script-like mechanism. Data staging needs and job consideration requests are sent to Moab in order to decide how and when to run, and to decide the priority order for submitting data staging requests.
2. Moab evaluates data size and network speeds to estimate data staging duration, then uses this estimate to manage submission of data staging requests and reservations of CPUs and other resources.
3. Moab evaluates priority, reservations, and other factors, and then submits data staging requests to a data manager mechanism (HSM manager, staging tool, script, command, etc.) in the best order to match established policies.
4. Moab periodically queries the storage system (SAN, NAS, storage nodes) to see if the file is there yet.
5. The data manager moves the data to the desired location when it is able.
6. Moab verifies that the file is there, then releases the job for submission as long as it satisfies established policies.

Benefits: prevents non-staged jobs from blocking usage of nodes; provides soft prioritization of data staging requests; intelligently schedules resources based on data staging information.
Drawbacks: prioritization is only softly provided.

[Diagram: job submission flows to Moab on Cluster A; Moab estimates staging time, submits "priority jobs first" to the data manager, which stages into the external storage system, with the numbered steps annotated.]
62. Fully Scheduled Data Staging: Local Storage

The flow is the same as Fully Scheduled Data Staging with external storage (previous slide), except that the storage is on the local compute nodes rather than an external storage system.

Benefits: prevents non-staged jobs from blocking usage of nodes; provides soft prioritization of data staging requests; intelligently reserves resources based on data staging information.
Drawbacks: prioritization is only softly provided.

[Diagram: job submission flows to Moab on Cluster A; Moab submits "priority jobs first" to the data manager, which stages data directly to storage on the local compute nodes, with the numbered steps annotated.]
63. Data Staging Diagnostics
- checkjob
  - Stage type: input or output
  - File name: reports destination file only
  - Status: pending, active, or complete
  - File size: size of file to transfer
  - Data transferred: for active transfers, reports the number of bytes already transferred
- checknode
  - Active and max storage manager data staging operations
  - Dedicated and max storage manager disk usage
  - File name: reports destination file only
  - Status: pending, active, or complete
  - File size: size of file to transfer
  - Data transferred: for active transfers, reports the number of bytes already transferred
64. Interface Scripts for a Storage Resource Manager
- Moab's data staging capabilities can utilize up to three different native resource manager interfaces:
  - Cluster Query Interface
  - System Query Interface
  - System Modify Interface
65. Prioritized Data Staging Example

# moab.cfg
RMCFG[data] TYPE=NATIVE RESOURCETYPE=STORAGE
RMCFG[data] SYSTEMQUERYURL=exec:///opt/moab/tools/dstage.systemquery.pl
RMCFG[data] CLUSTERQUERYURL=exec:///opt/moab/tools/dstage.clusterquery.pl
RMCFG[data] SYSTEMMODIFYURL=exec:///opt/moab/tools/dstage.systemmodify.pl
66. Information Services
- Monitoring performance statistics of multiple independent clusters
- Detecting and diagnosing failures from geographically distributed clusters
- Tracking cluster, storage, network, service, and application resources
- Generating load-balancing and resource state information for users and middleware services
67. Cluster Resources Training
- We will reconvene at 1:00 pm EST
69. Section 8: Utility Computing
- Utility Computing Overview
- Configuration
- Resource Monitoring
- Virtual Private Clusters
- Resource Access
- Accounting
- Setting Up a Test Center
70. The Utility Computing Vision
- Customer Point of View
  - Creating a compute resource which dynamically grows and shrinks with load on demand (with or without local resources)
  - Creating a compute resource which can acquire specialized resources as needed
  - Creating a compute resource which automatically replaces failed resources, whether they be compute nodes, network, storage, or other components
- Provider Point of View
  - Creating a compute resource which dynamically customizes itself to user needs
  - Creating a compute resource which can guarantee service levels
  - Creating a compute resource with tight integration and transparent usage
- http://www.clusterresources.com/products/mwm/moabdocs/19.0utilitycomputing.shtml
71. What is Utility Computing?
- Allows an organization to provide custom-tailored resources or services to customers
- A hosting center requires one or more of the following:
  - Secure remote access
  - Guaranteed resource availability at a fixed time or series of times
  - Integrated auditing/accounting/billing services
  - Tiered service level (QoS/SLA) based resource access
  - Dynamic compute node provisioning
  - Full environment management over compute, network, storage, and application/service based resources
  - Intelligent workload optimization
  - High availability, failure recovery, and automated re-allocation
- http://www.clusterresources.com/products/mwm/moabdocs/19.0utilitycomputing.shtml
72Utility Computing
- Moab enables true utility computing by allowing
compute resources to be reserved, allocated, and
dynamically provisioned to meet the needs of
internal or external workload.
73. Usage Models
- Manual
  - As easy as going to a web site, specifying what is needed, selecting one of the available options, and logging in when the virtual cluster is activated
- Automatic
  - The user simply submits jobs to the local cluster and is never aware a hosting center exists
74Creating A Utility Computing Hosting Center
- Define Hosting Center Objectives
- Determine Customer Environment Needs
- Determine Resource Integration Methodology
- Determine Customer Service Agreement Needs
- Identify Resource Monitoring Requirements
- Identify Resource Provisioning Requirements
- Identify Complete Virtual Cluster Packages
75Initial Configuration
- Enable Resource Monitoring
- Enable Resource Provisioning
- Identify Initial Virtual Cluster Packages
76Advanced Configuration
- Identify Complete Virtual Cluster Packages
- Provide User Interface
- Enable Customer Registration
- Enable Self-Service Web Site
- Enable Email Notifications/Alerts
- Enable Service Policies
- Automate Customer Management
77. Resource Monitoring
- You need to configure Moab to be aware of what resources are available (a script sketch follows the sample output)

Sample output:
node001 STATE=Idle CPROC=2 CMEM=512
node002 STATE=Idle CPROC=2 CMEM=512
node010 STATE=Down CPROC=2 CMEM=1024
node011 STATE=Idle CPROC=2 CMEM=1024
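Output like this typically comes from a native resource manager cluster-query script. A minimal hypothetical sketch that emits the format shown above (node list, probe method, and attribute values are assumptions):

#!/bin/sh
# Hypothetical cluster-query sketch: report state, processor count, and
# memory (MB) per node in the format Moab's native interface expects.
for NODE in node001 node002 node010 node011; do
  if ping -c 1 -w 2 "$NODE" > /dev/null 2>&1; then
    STATE=Idle
  else
    STATE=Down
  fi
  echo "$NODE STATE=$STATE CPROC=2 CMEM=512"
done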
78. Provisioning Resources and Managing Dynamic Security
- Moab allocates the compute nodes before the specified timeframe, and also allocates the resources required to provision and customize the nodes
- Provisioned, customized, and secured
- To properly schedule a request, Moab makes certain that all needed resources are available at the appropriate times, whether used by the requester directly or indirectly, or used by the system to create, customize, or manage the resource
- Virtual private cluster
79. Virtual Private Cluster
- Configuring Virtual Cluster Profiles
  - VCPROFILE
  - Sample attributes: DESCRIPTION, NODESETLIST
- Requesting Resources
  - mshow -a
  - Can specify XML format
- Creating VPCs
  - -c vpc argument of mschedctl

# VPC creation
> mshow -a -p pkgA -w minprocs=4,duration=100000

Partition  Tasks  Nodes  Duration  StartOffset  StartDate
---------  -----  -----  --------  -----------  ---------
ALL            4      4    100000       000000  13:28:09_04/27  TID=4 ReqID=0
ALL            4      4    100000       100000  17:14:48_04/28  TID=5 ReqID=0
ALL            4      4    100000       200000  21:01:27_04/29  TID=6 ReqID=0

> mschedctl -c vpc -a resources=5 -a package=pkgA
vpc.721

- http://clusterresources.com/moabdocs/commands/mshowa.shtml
80. VPCs (cont.)
- Listing VPCs
  - mschedctl -l vpc
- Modifying VPCs
  - mschedctl -m vpc:<VPCID>
- Destroying VPCs
  - mschedctl -d vpc:<VPCID>
81. Service Level Agreements and QoS Guarantees
- Guaranteed level of resource delivery per unit time
- Dedicated resources during specified timeframes
- Guaranteed resource availability response time
82. Tight Integration with Customer Resources
- Integration between customer and utility computing batch system tools
- Customization of utility computing resources to provide a similar batch environment
  - Queues, node features, policies, node ownership, etc.
- Customization of utility computing resources to provide a similar execution environment
  - Operating system, applications, directory structure, environment variables, etc.
- Creation of compatible user and group credentials
- Automated job migration to the utility computing environment
- Automated data migration between the customer and the utility computing hosting center
- Automated export of utility computing job and resource status information
83. Enabling Manual Resource Access Requests
- Customers must register with the hosting service
  - Accomplished directly in Moab, or Moab can extract it from another database
- Provide a user interface, which:
  - may provide detailed information regarding resources which can be made available
  - may provide per-customer views of available resources, where each customer only sees resources available to him or to his class of service
  - may provide no general information regarding resource availability and instead only reply to explicit requests for resource availability
84. Automating Resource Access Requests
- Load-Based Allocation
  - Adjust available resources based on the nature and quantity of queued workload
- Time-Based Allocation
  - Specification of the period and timeframe for allocation of resources
- Failure-Based Allocation
  - A trigger launches provisioning of resources to handle the workload
85. Standby Resources
- Allows an organization to reflect its ability to dynamically allocate and provision utility resources to its end users, indicating both which resources can be allocated and how quickly they can be available
- Assists in better planning
- Provides a consistent picture of resource availability
86Accounting, Costing, and Automated Billing
- Automating Accounting
- Automating Billing
- Monitoring Customer Usage
- Evaluating Center Effectiveness
87Setting Up a Test Center
89. Section 9: TORQUE
- Installation
- Configuration
- Job Administration
- Cluster Administration
- Troubleshooting and Diagnostics
- Upgrading TORQUE Versions
- Integrating with Moab
90. Resource Manager Responsibilities
- Role of a Resource Manager
  - Provides a job queuing facility
  - Monitors resource configuration, utilization, and health
  - Provides remote job execution and job management facilities
  - Reports information to the cluster scheduler
  - Receives direction from the cluster scheduler
  - Handles user client requests
91. TORQUE Basics
- pbs_server
  - Manages the queue
  - Collects information from MOM daemons
  - Routes job management requests to the MOMs
  - Reports to the scheduler
  - Supports user client commands
- pbs_mom
  - Locally monitors individual compute hosts
  - Reports to the server
  - Performs low-level job management functions as directed
  - Coordinates activities across resources allocated to a parallel job
- http://clusterresources.com/torquedocs20
92. Installation
- Extract and build the distribution on the machine that will act as the TORQUE server

> tar -xzvf torqueXXX.tar.gz
> cd torqueXXX
> ./configure
> make
> make install

http://www.clusterresources.com/wiki/doku.php?id=torque1.1_installation
93. Compute Node Installation
- Create the self-extracting, distributable packages with 'make packages'
- Use the parallel shell command from your cluster management suite to copy and execute the package on all nodes (see the sketch below)
- Run pbs_mom on all compute nodes
- http://clusterresources.com/torquedocs20/a.ltorquequickstart.shtml
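A minimal sketch of that distribution step, assuming pdsh/pdcp as the parallel shell and a hypothetical node range; the generated package name follows the torque-package-mom-linux-<arch>.sh pattern and will vary by platform.

# build the self-extracting packages on the server
> make packages

# copy and install the MOM package on all compute nodes (pdsh tools and node range are assumptions)
> pdcp -w node[001-064] torque-package-mom-linux-x86_64.sh /tmp
> pdsh -w node[001-064] /tmp/torque-package-mom-linux-x86_64.sh --install

# start the MOM daemon everywhere
> pdsh -w node[001-064] /usr/local/sbin/pbs_mom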
94. Configuration
- Run torque.setup <USER>, or pbs_server -t create
- Compute nodes
  - The MOM must be configured to trust the pbs_server daemon (see the sketch below)
  - (TORQUECFG)/mom_priv/config
    - $pbsserver parameter
  - (TORQUECFG)/server_name
    - server hostname
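A minimal sketch of the two files, with a hypothetical head-node hostname:

# (TORQUECFG)/mom_priv/config -- tell the MOM which host to trust and report to
$pbsserver   headnode.cluster.org
$logevent    255

# (TORQUECFG)/server_name -- hostname used by client commands and the MOM
headnode.cluster.org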
95. Configuration (cont.)
- On the TORQUE server, append the list of newly configured compute nodes to (TORQUECFG)/server_priv/nodes

# server_priv/nodes
computenode001.cluster.org
computenode002.cluster.org
computenode003.cluster.org
96. Testing the Installation
- Shut down the server
  - qterm
- Start the server
  - pbs_server
- Verify all queues are properly configured
  - qstat -q
- View additional server configuration
  - qmgr -c 'p s'
- Verify all nodes are correctly reporting
  - pbsnodes -a
- Submit a basic job
  - echo "sleep 30" | qsub
- Verify the job displays
  - qstat
97Running Jobs in TORQUE
- Start the default TORQUE scheduler
- FIFO
- Or
- Integrate with an external scheduler
- Moab Workload Manager
- Maui Scheduler
98. Advanced Configuration
- Customizing the install
  - Most recommended configure options are selected by default; the few exceptions include --with-scp and possibly --enable-syslog
- Configuring job submission hosts
  - Use acl_hosts
  - Use torque.cfg (submithosts, allowcomputehosts) -- see the sketch below
- Configuring TORQUE on a multi-homed server
- Specifying non-root administrators

> qmgr
Qmgr: set server managers += josh@*.fsc.com
Qmgr: set server operators += josh@*.fsc.com
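A hedged sketch of the torque.cfg entries mentioned above. The parameter names follow the slide; hostnames are hypothetical, and both names and accepted values should be verified against your TORQUE version's torque.cfg documentation before use.

# torque.cfg (illustrative; verify parameter names for your version)
SUBMITHOSTS        login1.cluster.org login2.cluster.org
ALLOWCOMPUTEHOSTS  true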
99. Job Submission
- qsub
- Batch and interactive
- Requesting resources
- Examples
  - To ask for 2 processors on each of four nodes:
    - qsub -l nodes=4:ppn=2
  - The following job will wait until node01 is free with 200 MB of available memory:
    - qsub -l nodes=node01,mem=200mb /home/user/script.sh
- Directives can be embedded into the job script (example on next page)
- http://clusterresources.com/torquedocs20/commands/qsub.shtml
100. Example Job Script

#!/bin/sh
#PBS -N ds14FeedbackDefaults
#PBS -S /bin/sh
#PBS -l nodes=1:ppn=2,walltime=240:00:00
#PBS -M user@mydomain.com
#PBS -m ae
source ~/.bashrc
cat $PBS_NODEFILE
echo $PBS_O_JOBID
101. Monitoring Jobs

> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
4807             scatter          user01           12:56:34 R batch
102. Canceling Jobs

> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
4807             scatter          user01           12:56:34 R batch
...
> qdel -m "hey! Stop abusing the NFS servers" 4807
103. Job Preemption
- OS-level preemption
  - Supports job cancel, requeue, suspend/resume, and system-initiated application-level checkpointing
  - Supports custom suspend/resume and checkpoint signals via external scheduler configuration
- Using a custom checkpoint script
  - Can be used to specify a particular action to take when checkpointing a job
104. Adding/Removing Nodes
- Dynamic configuration with qmgr
- Or manually edit the nodes file
  - $TORQUE_HOME/server_priv/nodes
  - Restart the pbs_server daemon after the change

> qmgr -c "create node node003"
105. Setting Node Properties
- Node property attributes
  - Can apply multiple properties per node
  - Properties are opaque
  - Each property can be applied to multiple nodes
  - Properties cannot be consumed
- Dynamically with qmgr, or manually edit the nodes file
  - $TORQUE_HOME/server_priv/nodes
  - FORMAT: <NODEID> <PROPERTY>
  - Restart pbs_server after the change

> qmgr -c "set node node001 properties = bigmem"
> qmgr -c "set node node001 properties += dualcore"
106. Node States
- States
  - down (down)
  - offline (drained)
  - job-exclusive (busy)
  - free (idle/running)
- Changing node state
  - Offline: pbsnodes -o <nodename>
  - Online: pbsnodes -c <nodename>
- Viewing nodes in a particular state
  - pbsnodes -l
107. Queue Configuration
- Configure queue attributes (see the qmgr sketch below)
  - e.g. enabled, max_queuable, kill_delay
- http://clusterresources.com/torquedocs20/4.1queueconfig.shtml
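A minimal sketch of creating and enabling an execution queue with qmgr; the queue name and limit values are illustrative.

> qmgr -c "create queue batch queue_type=execution"
> qmgr -c "set queue batch enabled = true"
> qmgr -c "set queue batch started = true"
> qmgr -c "set queue batch max_queuable = 200"
> qmgr -c "set queue batch kill_delay = 30"
> qmgr -c "set server default_queue = batch"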
108. Configuring Data Management
- For shared file systems, use usecp (see the sketch below)
- For distributed file systems, use scp
- See section 6 of the online TORQUE documentation for details on configuration and troubleshooting
- http://clusterresources.com/torquedocs20/6.1scpsetup.shtml
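A minimal $usecp sketch for MOMs whose /home is NFS-mounted from the head node; the hostname is hypothetical.

# mom_priv/config
# map paths under headnode.cluster.org:/home to the locally mounted /home,
# so file staging uses a local copy instead of scp/rcp
$usecp  headnode.cluster.org:/home  /home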
109. Monitoring Resources
- TORQUE reports a number of attributes, broken into three major categories (see the MOM config sketch below):
- Configuration
  - Includes both detected hardware configuration and specified batch attributes
  - Can report static generic resources via specification in the MOM config file
- Utilization
  - Includes information regarding the amount of node resources currently available (in use), as well as information about who or what is consuming it
  - Can report dynamic generic resources via specification of a monitor script in the MOM config file
- State
  - Includes administrative status, general node health information, and general usage status
- http://clusterresources.com/torquedocs20/a.cmomconfig.shtml
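A hedged sketch of the MOM config entries these bullets refer to: in TORQUE's MOM config an unrecognized name is treated as a static resource value, and prefixing a shell command with '!' makes it a dynamic resource. The resource names and script path here are hypothetical.

# mom_priv/config (illustrative)
# static generic resource: this node has 4 matlab tokens
matlab  4

# dynamic generic resource: value is taken from the script's output each poll
scratchspace  !/opt/tools/report_scratch_mb.sh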
110. Accounting Records
- TORQUE maintains accounting records for batch jobs in the directory $TORQUE_HOME/server_priv/accounting/<TIMESTAMP>

Record Marker  Record Type  Description
D              delete       job has been deleted
E              exit         job has exited (either successfully or unsuccessfully)
Q              queue        job has been submitted/queued
S              start        an attempt to start the job has been made (if the job fails to properly start, it may have multiple job start records)
111. Troubleshooting
- TORQUE log files
  - <TORQUE_HOME_DIR>/server_logs/
  - pbs_mom logs: <TORQUE_HOME_DIR>/mom_logs/
  - Use $loglevel (MOM), or qmgr -c 'set server log_level = <X>' (server)
- momctl
  - momctl -d 3 -h <HOSTNAME>
- tracejob
  - tracejob -n <DAYS> <JOBID>
- External scheduler diagnostics
112. Compute Node Health Check
- Configured via the pbs_mom config file using the parameters (see the config sketch after the example):
  - node_check_script
  - node_check_interval
- Example health check script:

#!/bin/sh
/bin/mount | grep global
if [ $? != "0" ]
then
    echo "ERROR cannot locate filesystem global"
fi
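And a minimal sketch of wiring the script into the MOM config; the script path and interval value are illustrative.

# mom_priv/config (illustrative)
$node_check_script    /opt/tools/nodecheck.sh
$node_check_interval  5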
113. Considerations Before Upgrading
- If upgrading from OpenPBS, PBS Pro, or TORQUE 1.0.3 or earlier, queued jobs, whether active or idle, will be lost. In such situations, job queues should be completely drained of all jobs.
- If not using the pbs_mom -r or -p flag, running jobs may be lost. In such cases, running jobs should be allowed to complete or should be requeued before upgrading TORQUE.
- pbs_mom and pbs_server daemons of differing versions may be run together. However, not all combinations have been tested and unexpected failures may occur.
114. Upgrade Steps
- Build the new release (do not install) -- see the TORQUE Quick Start Guide
- Stop all TORQUE daemons -- see qterm and momctl -s
- Install the new TORQUE -- use make install
- Start all TORQUE daemons -- see sections 7 and 8 of the TORQUE Quick Start Guide
115. Integrating with Moab
- Auto version detection
- Auto import of TORQUE config
- Auto configuration of the interface
- It just works (see the one-line sketch below)
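For reference, the Moab side of the integration is typically just the single resource manager definition shown earlier in the Native Resource Manager example:

# moab.cfg
RMCFG[torque] TYPE=PBS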
117. Section 10: A Look To The Future
- Hosting Centers
- Virtual Private Clusters
- Hierarchical Grids
- Moab Learning (Job Templates, Feedback Loops)
- Dynamic Job Enhancements
- Service Management
- MPI Jobs
118Future cont.
- RM Virtualization (Job Translation/Migration)
- More Integrated MCM Wizards
- Enhanced Information Service
- Input and Output
- Grid based Web Services (SOAP)
- Continued Scaling to 100K jobs/100K nodes
- Single-Point Workload Management