Balancing Batch Workloads and CPU Activity in a Parallel Sysplex Environment

1
Balancing Batch Workloads and CPU Activity in a
Parallel Sysplex Environment
  • Prepared by Kevin Martin
  • McKesson
  • For CMG Canada
  • Spring Seminar 2006

2
Introduction
  • Pharma applications run in a data center in
    California. Application support is in San
    Francisco and Dallas.
  • We implemented parallel sysplex environments last
    July to improve availability.
  • We also installed a 2086-350 and 2086-250. The
    CPU engines have the same speed, facilitating
    reporting and workload balancing.

3
(No Transcript)
4
Z890-350 CPU Utilization by LPAR
CPU Busy
5
DDCA Processor Utilization by Workload
CPU Busy
6
Z890-250 CPU Utilization by LPAR
CPU Busy
7
DDCO Processor Utilization by Workload
CPU Busy
8
Reasons for Imbalanced CPU Activity
  • Originally the Pharma application ran on one
    production LPAR. Hard to decide how to split
    processing and maintain data integrity.
  • Software licenses: IMS and COMPAREX only on the
    350, and SAS only on the 250
  • System tasks: TWS controller (job scheduling) on
    the 350, and DFHSM migrations and backups on the 250
  • Other restrictions due to problems and data
    integrity concerns

9
Job Routing
  • Our goal was to avoid modifying JCL
  • We used WLM scheduling environments, and a tool
    to assign programs or jobs to the scheduling
    environments

10
WLM Scheduling Environments
  • DDCANY: run on DDCA or DDCO
  • DDCA: DDCA jobs
  • DDCOJOBS: DDCO jobs
  • SAS: SAS programs
  • DDCO: jobs that run on DDCO using class 6
  • EDE: EDICKP DD statement
  • MQSERIES: MQSERIES
  • REEL: 3420 tapes
  • EDETEST: EDE test jobs DM99Txxx
  • DDCSPECL: programs that run on the 350

11
SDSF Resource Display
    RESOURCE   DDCA   DDCO
    DDCANY     ON     ON
    DDCO       OFF    ON
    DDCOJOBS   OFF    ON
    DDCSPECL   ON     OFF
    DDNAMES    ON     OFF
    EDE        ON     ON
    EDETEST    ON     ON
    IMSTEST    ON     ON
    MQSERIES   ON     OFF
    REEL       ON     OFF
    SAS        OFF    ON
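
The resource display above can be read as a lookup table: a job that names a scheduling environment can run only on the LPARs where that environment is ON. A minimal Python sketch of that lookup, using the states from the display. The function name `eligible_lpars` is hypothetical, and treating each scheduling environment as a single resource of the same name is a simplifying assumption.

```python
# Resource states per LPAR, transcribed from the SDSF display above.
RESOURCE_STATES = {
    "DDCANY":   {"DDCA": "ON",  "DDCO": "ON"},
    "DDCO":     {"DDCA": "OFF", "DDCO": "ON"},
    "DDCOJOBS": {"DDCA": "OFF", "DDCO": "ON"},
    "DDCSPECL": {"DDCA": "ON",  "DDCO": "OFF"},
    "DDNAMES":  {"DDCA": "ON",  "DDCO": "OFF"},
    "EDE":      {"DDCA": "ON",  "DDCO": "ON"},
    "EDETEST":  {"DDCA": "ON",  "DDCO": "ON"},
    "IMSTEST":  {"DDCA": "ON",  "DDCO": "ON"},
    "MQSERIES": {"DDCA": "ON",  "DDCO": "OFF"},
    "REEL":     {"DDCA": "ON",  "DDCO": "OFF"},
    "SAS":      {"DDCA": "OFF", "DDCO": "ON"},
}


def eligible_lpars(environment):
    """Return the LPARs where the named environment is ON (hypothetical helper)."""
    states = RESOURCE_STATES[environment]
    return sorted(lpar for lpar, state in states.items() if state == "ON")


print(eligible_lpars("SAS"))     # ['DDCO']
print(eligible_lpars("DDCANY"))  # ['DDCA', 'DDCO']
```

Environments that are ON for only one LPAR pin work to that LPAR, which is one source of the imbalance described earlier.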

12
WLM and JES Mode Initiators
  • For each job class you can specify MODE=WLM or
    MODE=JES in the JES2 parameters
  • WLM mode initiators can start dynamically on any
    LPAR
  • JES mode initiators are defined for each LPAR as
    permanent initiators
  • WLM and JES mode classes can run at the same
    time. However, ensure that there are enough JES
    mode initiators.

13
WLM and JES Mode Initiators
  • CLASS Status Mode Wait-Cnt Xeq-Cnt Hold-Cnt JCLim
  • H NOTHELD WLM 3 100
  • L NOTHELD WLM 1 100
  • M NOTHELD WLM 1 100
  • N NOTHELD WLM 100
  • O NOTHELD WLM 100
  • 1 NOTHELD WLM 100
  • 2 NOTHELD JES 100
  • 3 NOTHELD JES 7 100
  • 4 NOTHELD WLM 100
  • 5 NOTHELD JES 100
  • 6 NOTHELD JES 100
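
Because JES mode classes depend on permanent initiators, it can help to check that every JES mode class has at least one initiator defined somewhere. A rough Python sketch of such a check follows. The class modes come from the display above; the initiator counts in `FIXED_INITIATORS` and the function name are hypothetical and only illustrate the idea.

```python
# Class modes transcribed from the job class display above.
CLASS_MODE = {
    "H": "WLM", "L": "WLM", "M": "WLM", "N": "WLM", "O": "WLM",
    "1": "WLM", "2": "JES", "3": "JES", "4": "WLM", "5": "JES", "6": "JES",
}

# Hypothetical permanent initiator counts per LPAR and class (not from the slides).
FIXED_INITIATORS = {
    "DDCA": {"2": 2, "3": 4, "5": 4},
    "DDCO": {"2": 2, "3": 4, "5": 4, "6": 3},
}


def jes_classes_without_initiators(class_mode, fixed_initiators):
    """List JES mode classes that have no permanent initiator on any LPAR."""
    missing = []
    for job_class, mode in class_mode.items():
        if mode != "JES":
            continue  # WLM mode classes get initiators dynamically
        total = sum(per_lpar.get(job_class, 0)
                    for per_lpar in fixed_initiators.values())
        if total == 0:
            missing.append(job_class)
    return missing


print(jes_classes_without_initiators(CLASS_MODE, FIXED_INITIATORS))  # []
```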

14
Problem 1: Slower turnaround on the LPAR with more
jobs running
  • TWS controller is on DDCA. When a job is
    released, a WLM initiator is available on the
    same LPAR first.
  • For example, there could be 15 jobs on DDCA and
    only 5 jobs on DDCO. So the jobs on DDCA get
    slower turnaround than the ones on DDCO.
  • This gets worse if high priority jobs are running
    on the busy LPAR. The low priority jobs will run
    very slowly.
  • We checked DASD response times and tuned the JES2
    MAS parameters.
  • We routed several large, high-priority jobs to
    DDCO by assigning specific job names to a
    scheduling environment named DDCOJOBS.
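
The 15-versus-5 example can be put into rough numbers. The sketch below assumes, purely for illustration, that both LPARs have the same capacity and that running jobs share it equally; neither assumption comes from the slides, and the function name is hypothetical.

```python
from fractions import Fraction


def per_job_share(running_jobs, capacity=Fraction(1)):
    """Capacity each job receives under equal sharing (simplified model)."""
    return capacity / running_jobs


ddca_share = per_job_share(15)  # 15 jobs on DDCA
ddco_share = per_job_share(5)   # 5 jobs on DDCO

print(ddca_share, ddco_share, ddco_share / ddca_share)  # 1/15 1/5 3
```

Under these assumptions a job on DDCO gets three times the capacity of a job on DDCA, which is why moving a few large jobs to DDCO helps turnaround on both sides.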

15
Problem 2: Releasing many jobs at the same time
  • 8 or 16 large jobs are released at once. They
    are on the critical path for a schedule and they
    have a high priority.
  • With WLM mode initiators most of the jobs could
    start on one LPAR because that LPAR was not busy
    at the time that the jobs were released.
  • For example, DDCA could get 2 jobs and DDCO could
    get 6 jobs. The jobs on DDCA would finish
    earlier, and then DDCA would be idle while DDCO
    was still busy.
  • We assigned these groups of large, high-priority
    jobs to JES mode job classes to balance the LPAR
    activity better. We defined four class 5
    initiators on DDCA and four class 5 initiators
    on DDCO, and assigned the DY65 jobs to class 5.
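
The effect of the 2/6 split versus an even 4/4 split can be sketched with simple arithmetic. The model below assumes identical jobs, equal LPAR capacity, and equal sharing among running jobs; the 30-minute job length and the function name are hypothetical.

```python
def finish_times(jobs_per_lpar, work_per_job=30, capacity=1):
    """Minutes until each LPAR finishes its jobs (simplified sharing model).

    With identical jobs sharing capacity equally, n jobs on one LPAR all
    finish together after n * work_per_job / capacity minutes.
    """
    return {lpar: n * work_per_job / capacity
            for lpar, n in jobs_per_lpar.items()}


uneven = finish_times({"DDCA": 2, "DDCO": 6})    # WLM mode example from the slide
balanced = finish_times({"DDCA": 4, "DDCO": 4})  # four class 5 initiators each

print(uneven, max(uneven.values()))      # {'DDCA': 60.0, 'DDCO': 180.0} 180.0
print(balanced, max(balanced.values()))  # {'DDCA': 120.0, 'DDCO': 120.0} 120.0
```

In this model the uneven split leaves DDCA idle for two hours while DDCO is still working, and the even split shortens the elapsed time for the whole group.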

16
Problem 3: WLM initiators and jobs on the input
queue
  • Priority jobs would start, but lower priority
    jobs would wait on the input queue
  • With over 10,000 jobs running per day, we found
    some jobs that were incorrectly classified.
  • We defined a WLM policy override to change the
    BATLOW service class to importance level 3, the
    same importance level as the higher priority
    batch. After the FIXINPUT policy override was
    activated, the jobs on the input queue would
    start. Sometimes it would take 10 minutes to
    start all of the jobs. Afterwards the regular
    policy was activated again.
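
Conceptually, a policy override is the base policy with a few service class goals replaced. A small Python sketch of that idea follows. The service class name `BATHIGH` and the base importance of 4 for BATLOW are hypothetical; the slides state only that the override sets BATLOW to importance 3, the same as the higher priority batch.

```python
# Importance by service class in the base policy.
# BATHIGH and the value 4 for BATLOW are hypothetical placeholders.
BASE_POLICY = {"BATHIGH": 3, "BATLOW": 4}

# The FIXINPUT override raises BATLOW to importance 3 (from the slide).
FIXINPUT_OVERRIDES = {"BATLOW": 3}


def effective_policy(base, overrides):
    """Return the base goals with the override values applied on top."""
    merged = dict(base)
    merged.update(overrides)
    return merged


print(effective_policy(BASE_POLICY, FIXINPUT_OVERRIDES))
# {'BATHIGH': 3, 'BATLOW': 3}
```

Reactivating the regular policy corresponds to dropping the overrides, so the base values apply again.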

17
How to make WLM policy overrides
  • On the WLM service policy selection list, specify
    action code 2=Copy to copy the base policy to a
    new policy named FIXINPUT.
  • Then specify action code 7=Override Service
    Classes to modify the service class goals for
    FIXINPUT.
  • Then specify action code 3=Override Service Class
    to modify the goals for specific service classes
    in the policy override.
  • To activate the policy, enter V
    WLM,POLICY=FIXINPUT
  • To display the WLM policy, enter D WLM

18
Jobs on the input queue
  • APAR UA21235 on z/OS 1.4 systems.
  • The correction was released in October 2005.
  • Currently WLM does not start additional
    initiators for local batch work with system
    affinities when idle initiators exist on other
    systems in the sysplex. This can lead to
    situations where local batch jobs are delayed
    for a significant period of time because a local
    shortage of initiators exists. The situation is
    most visible in large sysplex environments with
    batch work having system affinities to only a few
    systems. With the correction, WLM starts
    initiators by looking more closely at the number
    of initiators that can actually handle the
    affinity work.
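
The behavior change described above can be illustrated with a toy decision function. This is only a simplified model of the description, not WLM's actual algorithm; the function name, parameters, and counts are hypothetical.

```python
def should_start_initiators(queued_jobs, idle_initiators, affinity_systems,
                            affinity_aware):
    """Decide whether more initiators are needed (toy model, not WLM logic).

    idle_initiators maps system name to its count of idle initiators.
    affinity_systems lists the systems where the queued jobs may run.
    """
    if queued_jobs == 0:
        return False
    if affinity_aware:
        # After the correction: count only initiators that can take the work.
        usable = sum(idle_initiators.get(s, 0) for s in affinity_systems)
    else:
        # Before the correction: idle initiators anywhere in the sysplex count.
        usable = sum(idle_initiators.values())
    return usable < queued_jobs


idle = {"DDCA": 0, "DDCO": 4}  # hypothetical counts
print(should_start_initiators(3, idle, ["DDCA"], affinity_aware=False))  # False
print(should_start_initiators(3, idle, ["DDCA"], affinity_aware=True))   # True
```

In the first call the idle initiators on DDCO hide the shortage on DDCA, so the jobs with DDCA affinity wait on the input queue.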

19
Summary
  • Balance LPAR activity in order to optimize
    capacity in a parallel sysplex environment.
  • WLM mode initiators work well in most cases.
    It is essential that the correction for UA21235
    is installed.
  • It is OK to mix WLM mode and JES mode job
    classes, provided that there are always enough
    fixed initiators for each JES mode job class.

20
(No Transcript)
21
Changes in CPU utilization
  • Overall CPU activity decreased from September to
    January due to tuning.
  • DDCA decreased due to tuning improvements.
  • DDCO increased in August and then remained at the
    same utilization due to better workload
    balancing.
  • The following graphs show how the LPAR activity
    became more balanced.

22
(No Transcript)
23
(No Transcript)
24
(No Transcript)