Title: Balancing Batch Workloads and CPU Activity in a Parallel Sysplex Environment
1. Balancing Batch Workloads and CPU Activity in a Parallel Sysplex Environment
- Prepared by Kevin Martin
- McKesson
- For CMG Canada
- Spring Seminar 2006
2. Introduction
- Pharma applications run in a data center in California. Application support is in San Francisco and Dallas.
- We implemented parallel sysplex environments last July to improve availability.
- We also installed a 2086-350 and a 2086-250. The CPU engines have the same speed, which simplifies reporting and workload balancing.
3. (Chart slide, no text)
4. Z890-350 CPU Utilization by LPAR (chart: CPU Busy)
5. DDCA Processor Utilization by Workload (chart: CPU Busy)
6. Z890-250 CPU Utilization by LPAR (chart: CPU Busy)
7. DDCO Processor Utilization by Workload (chart: CPU Busy)
8. Reasons for Imbalanced CPU Activity
- Originally the Pharma application ran on one production LPAR. It was hard to decide how to split processing while maintaining data integrity.
- Software licenses: IMS and COMPAREX are licensed only on the 350, and SAS only on the 250.
- System tasks: the TWS controller (job scheduling) runs on the 350, and DFHSM migration and backups run on the 250.
- Other restrictions stem from past problems and data integrity concerns.
9. Job Routing
- Our goal was to avoid modifying JCL.
- We used WLM scheduling environments, plus a tool that assigns programs or jobs to the scheduling environments.
10. WLM Scheduling Environments
- DDCANY: jobs that run on either DDCA or DDCO
- DDCA: DDCA jobs
- DDCOJOBS: DDCO jobs
- SAS: SAS programs
- DDCO: jobs that run on DDCO using class 6
- EDE: jobs with an EDICKP DD statement
- MQSERIES: MQSERIES jobs
- REEL: jobs using 3420 tapes
- EDETEST: EDE test jobs (DM99Txxx)
- DDCSPECL: programs that run on the 350
11. SDSF Resource Display
  RESOURCE  DDCA  DDCO
  DDCANY    ON    ON
  DDCO      OFF   ON
  DDCOJOBS  OFF   ON
  DDCSPECL  ON    OFF
  DDNAMES   ON    OFF
  EDE       ON    ON
  EDETEST   ON    ON
  IMSTEST   ON    ON
  MQSERIES  ON    OFF
  REEL      ON    OFF
  SAS       OFF   ON
12. WLM and JES Mode Initiators
- For each job class you can specify MODE=WLM or MODE=JES in the JES2 parameters.
- WLM mode initiators start dynamically on any LPAR.
- JES mode initiators are defined as permanent initiators on each LPAR.
- WLM mode and JES mode classes can run at the same time; however, ensure that there are always enough JES mode initiators.
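As a sketch, the split between WLM-managed and JES-managed classes is set in the JES2 initialization parameters. The class letters below come from this deck, but the exact statements and counts in the site's JES2 parm member are not shown here, so treat this as illustrative:

```
/* JES2 initialization parameters (illustrative sketch)          */
/* JOBCLASS(c) MODE= selects WLM- or JES-managed initiators      */
JOBCLASS(H)  MODE=WLM           /* class H uses WLM initiators   */
JOBCLASS(5)  MODE=JES           /* class 5 uses JES initiators   */
INITDEF      PARTNUM=20         /* number of JES2 initiators     */
INIT(1)      CLASS=5,START=YES  /* a permanent class 5 initiator */
```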
13. WLM and JES Mode Initiators
  CLASS  Status   Mode  Wait-Cnt  Xeq-Cnt  Hold-Cnt  JCLim
  H      NOTHELD  WLM   3                            100
  L      NOTHELD  WLM   1                            100
  M      NOTHELD  WLM   1                            100
  N      NOTHELD  WLM                                100
  O      NOTHELD  WLM                                100
  1      NOTHELD  WLM                                100
  2      NOTHELD  JES                                100
  3      NOTHELD  JES   7                            100
  4      NOTHELD  WLM                                100
  5      NOTHELD  JES                                100
  6      NOTHELD  JES                                100
14. Problem 1: slower turnaround on the LPAR with more jobs running
- The TWS controller is on DDCA. When a job is released, a WLM initiator becomes available on that same LPAR first.
- For example, there could be 15 jobs on DDCA and only 5 jobs on DDCO, so the jobs on DDCA get slower turnaround than the ones on DDCO.
- This gets worse when high priority jobs are running on the busy LPAR; the low priority jobs run very slowly.
- We checked DASD response and tuned the JES MAS parameters.
- We routed several large priority jobs to DDCO by assigning specific job names to a scheduling environment named DDCOJOBS.
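For illustration, a job can be tied to a scheduling environment with the SCHENV keyword on its JOB statement (the job name and account below are hypothetical; in our case a routing tool made the assignment so the JCL itself did not change):

```
//DM99JOB1 JOB (ACCT),'ROUTE TO DDCO',CLASS=H,
//         SCHENV=DDCOJOBS       RUNS ONLY WHERE DDCOJOBS IS ON
//S1       EXEC PGM=IEFBR14
```

With SCHENV=DDCOJOBS, the job is eligible to start only on a system where the DDCOJOBS scheduling environment is ON, which per the SDSF display above means DDCO.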
15. Problem 2: releasing many jobs at the same time
- 8 or 16 large jobs are released at once. They are on the critical path for a schedule and they have a high priority.
- With WLM mode initiators, most of the jobs could start on one LPAR because that LPAR was not busy at the time the jobs were released.
- For example, DDCA could get 2 jobs and DDCO could get 6 jobs. The jobs on DDCA would finish earlier, and then DDCA would sit idle while DDCO was still busy.
- We assigned these groups of large priority jobs to JES mode job classes to balance the LPAR activity better. We defined four class 5 initiators on DDCA and four class 5 initiators on DDCO, and assigned the DY65 jobs to class 5.
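A sketch of the initiator definitions this implies. The initiator numbers are illustrative, and the actual parm statements are not shown in this deck; each MAS member starts its own set, giving four class 5 initiators per LPAR:

```
/* JES2 parms: permanent (JES mode) initiators for class 5 */
/* (illustrative numbering; four started on each member)   */
INIT(11)  CLASS=5,START=YES
INIT(12)  CLASS=5,START=YES
INIT(13)  CLASS=5,START=YES
INIT(14)  CLASS=5,START=YES
```

Because each LPAR has exactly four fixed class 5 initiators, a burst of eight released jobs splits four and four rather than landing mostly on the idle LPAR.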
16. Problem 3: WLM initiators and jobs on the input queue
- Priority jobs would start, but lower priority jobs would wait on the input queue.
- With over 10,000 jobs running per day, we found some jobs that were incorrectly classified.
- We defined a WLM policy override to change the BATLOW service class to importance level 3, the same importance level as the higher priority batch. After the FIXINPUT policy override was activated, the jobs on the input queue would start. Sometimes it would take 10 minutes to start all of the jobs. Afterwards the regular policy was activated again.
17. How to make WLM policy overrides
- On the WLM service policy selection list, specify action code 2 (Copy) to copy the base policy to a new policy named FIXINPUT.
- Then specify action code 7 (Override Service Classes) to modify the service class goals for FIXINPUT.
- Then specify action code 3 (Override Service Class) to modify the goals for specific service classes in the policy override.
- To activate the policy, enter V WLM,POLICY=FIXINPUT
- To display the WLM policy, enter D WLM
18. Jobs on the input queue
- APAR UA21235 on z/OS 1.4 systems. The correction was released in October 2005.
- Currently WLM does not start additional initiators for local batch work with system affinities when idle initiators exist on other systems in the sysplex. This can lead to situations where local batch jobs are delayed for a significant period of time because a local shortage of initiators exists. The situation is most visible on large sysplex environments with batch work having system affinities to only a few systems. WLM was improved to start initiators by looking more closely at the number of initiators which can really handle the affinity work.
19. Summary
- Balance LPAR activity in order to optimize capacity in a parallel sysplex environment.
- WLM mode initiators work well in most cases. It is essential that the correction for APAR UA21235 is installed.
- It is OK to mix WLM mode and JES mode job classes, provided that there are always enough fixed initiators for each JES mode job class.
20. (Chart slide, no text)
21. Changes in CPU Utilization
- Overall CPU activity decreased from September to January due to tuning.
- DDCA decreased due to tuning improvements.
- DDCO increased in August and then remained at the same utilization due to better workload balancing.
- The following graphs show how the LPAR activity became more balanced.
22. (Graph slide, no text)
23. (Graph slide, no text)
24. (Graph slide, no text)