Title: FORTIS DATA SHARING Implementation
1 FORTIS DATA SHARING Implementation
- IMS-DB2 GUIDE - 28/03/2002
- Frans De Brabanter / Marc Piron
2 CONTENT
- FORTIS history: WHY //-sysplex
- Data sharing theory
- Implementation phases and experiences
- Near-term future
- Contacts
- marc.piron@fortisbank.com
- franciscus.debrabanter@fortisbank.com
- and many others ...
3 FORTIS history
- 1998: merger ASLK-CGER / GENERALE BANK
- IS target: by mid-September 2001, merger of the 2 data centers
- choice of the target platform: ex-A becomes the Fortis platform
- choice of the target applications: the target is a mix of ex-A and ex-G applications (target ex-G applications to be moved to the FORTIS platform)
- move split up into 10 phases, taking into account functional dependencies
- during the move technical changes (to adapt to Fortis naming conventions, security, ...), but NO logical changes
- target: by begin 2001, all target applications running on the Fortis platform, with the ex-A data
- mid-September: merge ex-A and ex-G data, kill the ex-G platform
4 FORTIS migration plan
- Ex-A Production Environment
- SYSA: 1 IMS/DB2/MQS
- SYSC - infocenter: 1 DB2 (pure non-IMS)
- Ex-G Production Environment
- SI40: normal production, 1 IMS/DB2/MQS
- SI70: non-brick production (frontend), 1 IMS/DB2/MQS
- SI90: infocenter production (IMS + non-IMS), 1 IMS/DB2
- Target: move SIxx to SYSA
- SI40 by mid-September 2001
- SI70/SI90 by mid-2002
5 FORTIS migration plan
- Limited sysplex experience in ex-G
- Decision for 2-way //-sysplex for reasons of CAPACITY
- IMS WADS? cfr next foil
- Logging?
- Limit on the number of MIPS?
- Limit on memory?
- BIG MOVE is a one-shot, not a gradual process: the risk of a WAIT AND SEE approach is not acceptable
- after integrating SI40, still the integration of SI70 and SI90 to do
- trend of workload in general: always UP
6 FORTIS migration plan
7 FORTIS migration plan
- Decision for 2-way //-sysplex for reasons of AVAILABILITY
- in case of PROBLEMS/DISASTER on one system, the other system should continue to work: presuming there is good session balancing, only half of the connections are impacted
- rolling implementation of maintenance and IPLs
- still a potential problem if an outage takes too long and all the workload has to be taken over by one system
8 FORTIS migration plan
- Which kind of //-sysplex: Partial or Full?
- Partial (vertical splitting)
- effort to isolate clusters of applications and make each cluster run on one system in the sharing group
- less access to the coupling facility, less overhead --> attractive
- BUT
- is clustering stable in time?
- how to choose and document?
- what if machine power / cluster load changes: rebalance?
- users need access to multiple clusters (and IMS systems)
- if one system is down, a whole application is down
9 FORTIS migration plan
- Which kind of //-sysplex: Partial or Full?
- Full (horizontal splitting)
- technically possible: IMS V6 offers VTAM Generic Resources
- transparent: the application runs where the power is (WLM)
- simple approach
- only ONE question: is it feasible? (not that many references)
- Decision: FULL (although with a limited set of affinities)
- preferred half of the users to be out instead of half the applications when a system fails
- links at inter-application level everywhere: extremely difficult to isolate clean and well-defined clusters of applications
10 DB2 - sharing overview (cfr IBM)
11 DB2 Locking overview
- Local Locks: business as usual
- Global Locks: data sharing locks, saved in the CF
- ALWAYS put in the CF: PARENT LOCKS (on tablespace/partition level)
- SOMETIMES put in the CF: CHILD LOCKS
- table (for segmented tablespaces)
- page
- record
12 DB2 Locking overview
- Logical Locks
- control concurrency
- can be local or global
- PARENT locks: always global
- CHILD locks: global or local
- are associated with programs
- Physical Locks
- control inter-DB2 coherency and consistency of pages
- always global
- are associated with DB2 subsystems
13 DB2 Locking: logical/physical
14 DB2 L-Locking overview
- If those Parent Locks are held in the CF for a particular tablespace or partition (DB2A / DB2B):
- S / S
- S / X
- X / X
- then those Child Locks will be propagated to the CF (DB2A / DB2B):
- None / None
- All / X
- All / All
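To make the table above concrete: a minimal Python lookup sketch of the same propagation rule. The S/X states are the slide's simplified scale (roughly IS/IX interest), not the full set of DB2 lock modes, and the (X, S) row is assumed symmetric.

PROPAGATION = {
    # (parent DB2A, parent DB2B) -> (child locks DB2A, child locks DB2B)
    ("S", "S"): ("None", "None"),  # read/read: no child propagation needed
    ("S", "X"): ("All", "X"),      # per the slide's second row
    ("X", "X"): ("All", "All"),    # update/update: all child locks go global
}

def child_propagation(parent_a: str, parent_b: str) -> tuple[str, str]:
    """Which child locks each member must propagate to the CF."""
    if (parent_a, parent_b) in PROPAGATION:
        return PROPAGATION[(parent_a, parent_b)]
    a, b = PROPAGATION[(parent_b, parent_a)]  # assumed symmetric
    return (b, a)

for pair in [("S", "S"), ("S", "X"), ("X", "S"), ("X", "X")]:
    print(pair, "->", child_propagation(*pair))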
15 DB2 L-Locking overview (cfr IBM)
16 DB2 L-Locking overview
(diagram: lock table entry layout: an exclusive-owner field (00 = no owner, 01/10 = an owning member) plus a shared-lock status bit string with one bit per member, e.g. 10000000 versus 00000000)
17 DB2 L-Locking overview
- Types of L-Lock Contention
- False Contention: 2 DB2s request incompatible locks on two different database objects belonging to the same hash class
- XES Contention: 2 DB2s request f.i. an IS- and an IX-lock, which are changed to an S- and an X-lock respectively by XES
- Real (IRLM) Contention: 2 DB2s request incompatible locks on the same database object
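A toy Python illustration of how the three contention types differ; the hash function, the lock-table size and the compatibility sets are drastically simplified assumptions, not DB2's real implementation:

LOCK_TABLE_ENTRIES = 8   # tiny on purpose; real lock tables are far larger

XES_MODE = {"IS": "S", "IX": "X", "S": "S", "X": "X"}  # XES only knows S and X

IRLM_COMPAT = {("IS", "IS"), ("IS", "IX"), ("IX", "IS"),
               ("IX", "IX"), ("IS", "S"), ("S", "IS"), ("S", "S")}

def hash_class(resource: str) -> int:
    # deterministic stand-in for the lock table hash
    return sum(map(ord, resource)) % LOCK_TABLE_ENTRIES

def classify(res_a: str, mode_a: str, res_b: str, mode_b: str) -> str:
    xes = (XES_MODE[mode_a], XES_MODE[mode_b])
    if res_a != res_b:
        if hash_class(res_a) != hash_class(res_b) or xes == ("S", "S"):
            return "no contention"
        return "false contention (different objects, same hash class)"
    if (mode_a, mode_b) in IRLM_COMPAT:
        if xes != ("S", "S"):
            return "XES contention (e.g. IS+IX looks like S+X to XES)"
        return "no contention"
    return "real (IRLM) contention"

print(classify("TS1", "IX", "TS1", "IS"))  # XES contention
print(classify("TS1", "X",  "TS1", "X"))   # real (IRLM) contention
print(classify("AB",  "X",  "BA",  "S"))   # false contention (same hash class)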
18 DB2 L-Locking overview
- Resolution of L-Lock Contention
- False Contention: solved by XES informing the IRLMs to grant the lock
- XES Contention: solved by XES, by giving control to the IRLM contention exit of the Global Lock Manager
- Real Contention: wait ... until the other process terminates
19 DB2 P-Locking overview
- Depends on the OPEN/CLOSE status of the tablespace/indexspace
- At Open page set: from None --> Read Only (_ --> IS)
- At First update: from Read Only --> Read/Write (IS --> IX)
- At Pseudo Close (no update for PCLOSEN/PCLOSET): from Read/Write --> Read Only (IX --> IS)
- At Physical Close (no activity for PCLOSEN/PCLOSET, CLOSE YES, pseudo-closed): from Any --> None (IS/IX --> _)
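The same life cycle as a small Python state machine; the states and events are taken directly from the bullets above:

TRANSITIONS = {
    ("None", "open"): "IS",             # open page set: read-only interest
    ("IS",   "first_update"): "IX",     # first update: read/write interest
    ("IX",   "pseudo_close"): "IS",     # no update for PCLOSEN/PCLOSET
    ("IS",   "physical_close"): "None", # no activity at all: release P-lock
    ("IX",   "physical_close"): "None",
}

def next_plock(state: str, event: str) -> str:
    return TRANSITIONS.get((state, event), state)

state = "None"
for event in ["open", "first_update", "pseudo_close", "physical_close"]:
    state = next_plock(state, event)
    print(f"after {event}: P-lock = {state}")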
20 DB2 P-Locking overview
- P-locks are kept by each individual IRLM
- Are promoted to the CF
- when promoted to the CF, conflicting P-locking can be detected
- solved by a process of negotiation: P-lock mode changes
- result of negotiation: triggering/stopping the Inter-DB2 Read/Write (IDRW) status (the page set is becoming Group Buffer Pool dependent)
- GBP behaviour: FORCE AT COMMIT
21 DB2 P-Locking and GBPools
22 Example flow in GBP (cfr IBM)
23 DB2 Group Buffer Pools (cfr IBM)
24 DB2 Group Buffer Pools (cfr IBM)
25 DB2 Group Buffer Pools (cfr IBM)
26 DB2 Group Buffer Pools (cfr IBM)
27 DB2 Group Buffer Pools
- Are divided in 2 parts: directory + data
- Cross-invalidation (XI) reasons
- the CF receives an updated version of a page from DB2A: that same page must be marked invalid in the local buffer pool of DB2B (if DB2B wants a fresh copy, it must be read from the CF)
- the number of directory entries is too small
- when all directory entries are exhausted and a new page must be registered in the CF
- one of the existing directory entries (with clean pages) is chosen
- the DB2s having the page referenced by that entry receive an XI signal for that page
- the new page is registered in the freed directory entry
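A toy Python simulation of the second XI cause: a full directory forces a reclaim, and every member caching the reclaimed page is cross-invalidated even though nobody updated it. The sizes and the victim choice are simplifications.

class GroupBufferPool:
    def __init__(self, directory_entries: int):
        self.capacity = directory_entries
        self.directory = {}  # page -> set of members that registered it

    def register(self, page: str, member: str) -> None:
        if page not in self.directory and len(self.directory) >= self.capacity:
            # reclaim the oldest entry (stand-in for "an entry with clean pages")
            victim, members = next(iter(self.directory.items()))
            del self.directory[victim]
            for m in members:
                print(f"XI to {m}: local copy of {victim} invalidated (reclaim)")
        self.directory.setdefault(page, set()).add(member)

gbp = GroupBufferPool(directory_entries=2)
gbp.register("page1", "DB2A")
gbp.register("page2", "DB2B")
gbp.register("page3", "DB2A")   # directory full -> reclaim -> XI for page1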
28 P-locks and Row Locking (cfr IBM)
29 DB2 sharing managing
- Managing roles accomplished by each DB2
- the Global Lock Manager is the DB2 subsystem generating the first Update lock on the resource; it resolves conflicting lock situations
- the GBP Structure Owner is the DB2 subsystem which first connects to the GBP; it monitors the GBP level threshold, constructs the list of page names to be cast out, and triggers the GBP checkpoint
- the Pageset Castout Owner is the DB2 subsystem which first updates the pageset; ownership is reassigned to another updating DB2 at pseudo/physical close
- castout goes via the private buffers of the pageset castout owner
30 IMS sharing overview (cfr IBM)
31 IMS/DB2 use of Cache in the CF
- STORE-IN CACHE
- DEDB/VSO: a read of a VSO CI ALWAYS brings the CI into local storage from DASD and caches it in the CF; PRELOAD
- DB2: pages read are SOMETIMES promoted to the CF (update on a GBP-dependent TS); GBPCACHE ALL
- STORE-THROUGH CACHE
- OSAM - not activated
- DIRECTORY-ONLY CACHE
- OSAM + VSAM
32 IMS sharing behaviour
- IMS
- IMS V6 installed --> possibility of DEDB/VSO
- impact on the cache structures
- a couple of heavily used DBs: pseudo-SPA / branch protocol
- use of IRLM will trigger BLOCK LOCKs
- used to serialize updates to a block (CI) by different IMS systems, for Full Function (will not replace existing database record locks)
- --> additional overhead, impact on the lock structures
- IBM message: do not panic, but watch out for BAD applications - they can only get WORSE; pay special attention to deadlocks
- ONE advantage: by putting SHRLVL 3 on the datasets, IMS starts notifying ALL the locks to the Lock Structure in the CF --> the effect on the Lock Structure can easily be measured in the starting phase
33 IMS sharing behaviour
- IMS - potential reasons for WORSE behaviour of the Full Function applications
- for Full Function DB updates a new type of lock: Block Locks (OSAM, VSAM ESDS, VSAM KSDS), also for pointer updates
- Block Locks always kept until sync point: always private attribute
- a Block Lock always prohibits concurrent updates from 2 IMSs to the same CI (<--> DB record lock in a non-sharing environment)
- Block Locks even in 1 IMS system: concurrent erase and insert/update on a KSDS CI is not possible anymore
- CI split: Block Lock on the CI
- TWIN ROOT pointers in HIDAM: maintenance of the bi-directional pointers implies extra locking on the neighbour
- Dataset Busy Locks in combination with Block Locks: deadlocks!
34 Ex-A and Ex-G environments
- A-side
- (system testing)
- Development
- Test
- Quality Assurance (test cases to be maintained by Development Teams)
- Production
- G-side
- (system testing)
- Development
- Acceptance (regular refreshes from Production)
- Production
35 FORTIS environments
- (system testing)
- sharing OK at T-16 months - phase 1 --> PLXT
- Development
- Test
- Acceptance (regular refreshes from Production)
- sharing OK at T-10 months - phase 2 --> PLXB
- Production
- sharing OK at T-7 months - phase 3 --> PLXA
- Quick Fix (for testing emergency fixes in Prod)
36 Starting situation
- IMS V6
- block-level data sharing possible for DEDB VSO
- block-level data sharing possible for DEDBs using SDEP segments
- makes VTAM Generic Resources possible
- NO use of shared message and Fast Path EMH queues in the first phase
- DB2 V5
- very stable
- is V6 stable enough?
- V5 functionality is OK
37 Starting: what to expect?
- DB2
- use of the Group Buffer Pool is a general option
- FULL sharing --> every table is at every moment potentially GBP-dependent
- difficult to foresee what will happen in real life: wait and see?
- locking: DB2 tries to be more intelligent than IMS
- only lock registration in the CF if really necessary
- is nice, BUT what can we expect? wait and see?
- IMS and DB2 figures as of March 2001
- transaction load/day, pure IMS / IMS+DB2: 3,750,000 / 1,250,000
- space IMS + DB2: > 1 TB
- mid-September 2001: trx load expected to double / 70% space increase
38 Starting: what to expect?
- Important mix of ex-A + ex-G applications
- Effort on the merger - not on the tuning of the applications
- IMS gets higher priority than DB2 (75% in IMS)
- Decision to take for IMS DEDB VSO: which DBs, and to which extent stored in the CF
- private pool for each AREA in the CF: better follow-up and tuning
- 4 DEDB VSOs in the CF
- 2 PRELOAD - small DBs (< 200K)
- 1 LOOKASIDE - protocol DB for communication with the branches
- 1 pseudo-SPA
39 Starting: first experiences
- IMS - March 2001
- on PLXB set up the batch workload (Scheduling Environments OK)
- start with BMP balancing
- scripts for TPNS stress testing for a selected set of transactions, concurrent with selected BMPs
- pushing the system to see what happens
- not realistic: too high a transaction load simulated, with too few different transactions
- nevertheless: IMS SYSTEMS BLOCKED
- --> PTFs to be applied
40 Starting - first experiences
- IMS - April 2001
- putting SHRLVL 3 on IMS datasets, on PLXA: CF I/Os peaking at 17K per sec during online
- CF capacity: 45K-50K I/Os per sec?
- --> action needed to limit the usage of the CF structures
- meanwhile SHRLVL 3 deactivated
- action started to understand the reason for the high lock numbers
- IBM contacts: pay attention to the number of daily deadlocks
- FORTIS: 200-300 deadlocks/day, considered to be (too) high
- in a data sharing environment a certain multiplication factor may be expected
- SHRLVL 3 reactivated for a limited set of DBs
41 Starting - first experiences
- IMS - May 2001
- working in 2 directions in parallel
- find out the highest lock generators
- understand the deadlock reports
- identify PSBs where PROCOPT GOT would make sense
- investigate the potential gains of removing TWIN ROOT pointers on HIDAM root segments
- started developing a tool to extract info from IMS lock traces
- some people are becoming specialists in the locking behaviour of IMS
42 Starting - actions / follow-up
- IMS - May 2001 - actions / results
- lock trace showed 25 DBs responsible for 90% of locks
- 2 DBs responsible for 40% of deadlocks (total of 300-400 deadlocks per day)
- related to high overflow usage and contention on IOVF
- compression solved the problem for the 2 DBs
- highly referenced DBs with low-level update: lots of PCBs found with PROCOPT A
- automated change by DBA to PROCOPT GO without application changes
- savings: 600K locks per day
- starting removing TWIN ROOT pointers for a selected list of HIDAMs
43 Starting - actions / follow-up
- IMS - June 2001
- detailed analysis of lock traces
- lock peaks for particular DBs identified; processes active at the peak time identified and investigated; badly-coded calls adapted
- Development teams contacted to change PCB PROCOPT to GOT where a mass change by DBA was not feasible (matching IMS LOGS <--> PCB definitions)
- by the end of June: online lock registration at 8K I/Os per sec, with peaks of 10K-11K I/Os per sec (50% gain compared to start)
- overnight rate up to 27K I/Os per sec (mid-June: BMPs in sysplex)
- deadlock rate down to 100 deadlocks per day (from 400)
- further automation of the deadlock detection: automated daily list of involved DBs / transactions / jobs, mailed to DBA (see the sketch below)
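A sketch in Python of that daily automation; the input line layout is hypothetical (real IMS deadlock/lock-trace output looks different), the point is the per-DB / per-transaction / per-job aggregation that was mailed to the DBAs:

from collections import Counter

def summarize(report_lines):
    dbs, trxs, jobs = Counter(), Counter(), Counter()
    for line in report_lines:
        # hypothetical layout: "DEADLOCK db=XXX trx=YYY job=ZZZ"
        fields = dict(f.split("=") for f in line.split()[1:])
        dbs[fields["db"]] += 1
        trxs[fields["trx"]] += 1
        jobs[fields["job"]] += 1
    return dbs, trxs, jobs

sample = ["DEADLOCK db=DB1 trx=TRXA job=JOB1",
          "DEADLOCK db=DB1 trx=TRXB job=JOB2"]
for name, counter in zip(("DBs", "transactions", "jobs"), summarize(sample)):
    print(name, counter.most_common())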
44 Starting - actions / follow-up
- IMS - July 2001
- detailed analysis of overnight lock traces started
- a number of BMPs identified as to be examined
- majority: high GU/GN activity (> 1,000,000)
- PROCOPT changed to GOT where possible
- PCB added --> 2 PCBs (1 for read, 1 for update)
- deadlocks continued to require a lot of attention; possible actions:
- compression of Fast Path data (less in IOVF)
- higher checkpoint frequency (at most 1 second during online)
- use of GOT wherever possible
- adjustment of CI sizes/freespace on certain highly-accessed indexes
- (release lock by using an extra call with a high key) - rarely applied
- end of July: 70-150 deadlocks/day across both systems in the sysplex
45 Starting - actions / follow-up
- IMS - August 2001
- during the first half of August the online network was gradually opened
- --> steep increase in deadlocks across a number of related transactions
- every case involved was already known, but the rate was now sometimes 100-fold
- mostly due to hotspot indexes
- Fortis has a relatively high number of secondary indexes (dixit IBM), thus greater potential for hotspots
- Development Teams responsive, but unfortunately, for a couple of deadlocks the most obvious (and only) solution was creating affinities via MSC, for MPPs and BMPs, in order to force them to run on the same system
46 Starting - actions / follow-up
- IMS - September 2001
- beginning of September, before the BIG MOVE: between 3K (daytime) and 6K (overnight) I/Os per sec on the lock structure, on average
- after the BIG MOVE: stabilisation at 250-300 deadlocks per day
- higher than we liked to see, but less than feared
47 Starting - actions / follow-up
- DB2 - June 2001
- open question regarding locking: do we need to do the same effort as for IMS, by pushing ISOLATION(UR)?
- default: all Plans/Packages bound with
- ISOLATION(CS) - sometimes UR on individual SQL statements
- RELEASE(COMMIT)
- CURRENTDATA(NO)
- aware of potentially less Lock Avoidance (GCLSN instead of CLSN)
- a limited number of heavily executed SQLs: WITH UR added
- decision: do not change the BIND parameters and count on DB2 for optimization of lock registration in the CF --> a REBIND of a BMP with RELEASE(DEALLOCATE) can be done very quickly in case of too many Global Locks
48 Starting - actions / follow-up
- DB2 - August 2001
- Estimating the size of the Group Buffer Pools
- based on the number of Buffers Written (versus Buffers Updated) and the elapsed time we want the page to be available in the Group Buffer Pool for reuse by another DB2
- f.e. in peak: 200,000 buffers written in 10 minutes; to be able to keep up for 5 minutes, a Group Buffer Pool of 100,000 x 4K is needed (worked out in the sketch below)
- Avoid XI caused by too few directory entries
- take the sum of all pages in the local virtual buffer pools + hiperpools + Group Buffer Pool = total pages
- choose the ratio directory entries / data entries such that directory entries > total pages
- --> never an XI due to insufficient directory entries
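The same sizing arithmetic in a few lines of Python; the 200,000-buffers/10-minutes peak and the 5-minute retention are the slide's figures, the local-pool page count is an illustrative assumption:

pages_written = 200_000        # buffers written during the 10-minute peak
window_min = 10
retention_min = 5              # how long a page should stay reusable
page_size_kb = 4

data_pages = pages_written * retention_min // window_min   # 100,000 pages
print(f"GBP data part: {data_pages * page_size_kb / 1024:.0f} MB")

# Directory sizing: one entry per page that can live anywhere in the group
# (local virtual pools + hiperpools + GBP), so a reclaim never causes an XI.
local_pool_pages = 1_500_000   # illustrative assumption, not from the slide
total_pages = local_pool_pages + data_pages
print(f"directory/data ratio to request: >= {total_pages / data_pages:.0f}")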
49 Starting - actions / follow-up
- DB2 - August 2001
- Avoid a flood of committed updated pages to the Group Buffer Pool when a transaction/BMP commits
- trying to have a smooth process of emptying the Virtual Buffer Pool
- VDWQT set to 64 pages - makes the transfer of updated pages to the Group Buffer Pool asynchronous
- Avoid filling up the Group Buffer Pool; avoid an insufficient number of CASTOUT engines
- class castout threshold: 1% (minimum)
- GBP castout threshold: 10%
- GBP checkpoint: 30 minutes
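A small Python reading of those two castout thresholds; we assume here that both are expressed as a percentage of the GBP's data pages (1% per castout class, 10% overall), which matches how we sized them, but the exact semantics depend on the DB2 release:

def castout_needed(class_changed: int, total_changed: int, gbp_pages: int) -> list[str]:
    """Return which castout triggers have fired (assumed semantics)."""
    reasons = []
    if class_changed / gbp_pages >= 0.01:   # class castout threshold: 1%
        reasons.append("cast out this castout class")
    if total_changed / gbp_pages >= 0.10:   # GBP castout threshold: 10%
        reasons.append("cast out pages from the largest classes")
    return reasons

print(castout_needed(class_changed=1_500, total_changed=9_000, gbp_pages=100_000))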
50 Miscellaneous actions
- Avoid U3307 abends (CF lock table full) - idea sketched below
- IMS: LOCKMAX parameter adapted downwards; new default is 15K
- DB2: via a combination of NUMLKUS + NUMLKTS/LOCKSIZE/LOCKMAX
- Ultimate solution for bad application behaviour
- affinity on one system (via MSC), eventually serialisation
- affinities to be avoided as much as possible: not in line with the sysplex philosophy
- Review the randomizing module to avoid synonyms pointing to the same RAP
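The idea behind those limits, sketched in Python: cap the locks one unit of work may hold, so a runaway application abends on its own instead of filling the CF lock table for everybody. The enforcement shown here is a simplification; only the 15K ceiling comes from the slide.

LOCK_CEILING = 15_000   # the lowered IMS LOCKMAX default from the slide

class TooManyLocks(Exception):
    pass

class UnitOfWork:
    def __init__(self) -> None:
        self.locks_held = 0

    def acquire_lock(self) -> None:
        # abend the one bad application, not the whole data sharing group
        if self.locks_held >= LOCK_CEILING:
            raise TooManyLocks("lock ceiling reached (cfr U3307 avoidance)")
        self.locks_held += 1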
51 Miscellaneous actions
- Passing commands: always scope GLOBAL
- check the correct execution of the command on both systems
- if OK, continue with the next step
- if NOK on at least one system, roll back on all systems (see the sketch after this list)
- IMS: AOI (local) + APPC (remote) implementation
- if one IMS/DB2 does not respond: job in abend
- possibility to bypass this abend (in case of maintenance)
- Resynchronization of IMS DB status and TRX classes on IMS startup
- based on an extract of the last system checkpoint of the last stopping IMS system
- based on the active IMS if one is still active
52 Miscellaneous actions
- Affinities defined for reasons of
- external connections: BCNL, MERVA, CSFI, ...
- deadlock avoidance
- serial definition of transactions
- protocol reasons (the input device is not a human being, f.e.)
- performance problems (only 3 instances)
- all trxs and BMPs processing (GET) MQSeries messages: queues defined on one system (PUT is no problem due to the MQ clustering implementation)
53 (No Transcript)
54 Used Structures (1/2)
- structure / size / max I/O per sec
- DB2 Group Buffer Pools (duplex): 1.2 GB / 4,250
- DB2 Lock structure: 128 MB / 1,200
- DB2 Shared Communication Area: 49 MB / 10
- IMS VSAM structure: 20 MB / 1,700
- IMS OSAM structure: 6 MB / 3,200
- IMS Lock structure: 64 MB / 25,000
- IMS Fast Path VSO DBs: 144 MB / 1,250
- IMS Shared Queues (test)
- XCF signalling paths: 28 MB / 2,400
- GRS star: 8 MB / 100
- RACF: 85 MB
- DFSMS/HSM Record Level Sharing (test)
55 Used Structures (2/2)
- structure / size / max I/O per sec
- JES2 primary checkpoint: 20 MB / 100
- LOGREC (test)
- OPERLOG: 16 MB / 1,750
- MIM: 22 MB / 3,000
- MQ Shared Queues (test)
- Resource Recovery Services (test)
- Enhanced Catalog Sharing: 1 MB
- XBM: 35 MB
- VTAM Generic Resources: 9 MB / 300
56 Workload balancing
- BATCH: WLM Scheduling Environments + WLM-managed initiators
- ONLINE: VTAM Generic Resources
57 WLM Scheduling Environments
//JOBNAME JOB ...,SCHENV=IMP1
//STEP1   EXEC IMSBATCH
58 WLM Scheduling Environments
//JOBNAME JOB ...,SCHENV=IMP2
//STEP1   EXEC IMSBATCH
59 WLM Scheduling Environments
//JOBNAME JOB ...,SCHENV=IMP
//STEP1   EXEC IMSBATCH
60 IMSBATCH procedure
61 IMSBATCH procedure
- IMASP1:
... // SET IMULOAD='I10.IM.CA.IMSP1.USERLIB',
//        IMSID='IMSP1',
...
- IMASP:
... // SET IMULOAD='I10.IM.CA.IMSP.USERLIB',
//        IMSID='IMSP',
...
- IMASP2:
... // SET IMULOAD='I10.IM.CA.IMSP2.USERLIB',
//        IMSID='IMSP2',
...
... // INCLUDE MEMBER=IMASENV
//G       EXEC PGM=DFSRRC00,REGION=&RGN,
//        PARM=(BMP,&MBR,&PSB,&IN,&OUT,
//        &OPT&SPIE&TEST&DIRCA,&PRLD, ...
//        DD DSN=&IMULOAD,DISP=SHR
...
62 (No Transcript)
63 BMP restart - JES2 exit 4
- PROBLEM!
- A BMP that abends has to be restarted on the same IMS subsystem where it was running.
64 BMP restart - JES2 exit 4
//JOB1   JOB ...,SCHENV=IMP
//STEP1  EXEC IMSBATCH
65 BMP restart - JES2 exit 4
//JOB1   JOB ...,SCHENV=IMP
//STEP1  EXEC IMSBATCH
66 BMP restart - JES2 exit 4
When the job starts and the flag exists --> RESTART
//JOB1   JOB ...,SCHENV=IMP
//STEP1  EXEC IMSBATCH
67 BMP restart - JES2 exit 4
When the job starts and the flag exists --> RESTART
//JOB1   JOB ...,SCHENV=IMP1
//STEP1  EXEC IMSBATCH
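The exit's logic, sketched in Python (the real JES2 exit 4 is assembler; the names here are illustrative): on an abend, remember the member the BMP ran on; when the job is resubmitted, overwrite its generic SCHENV with the remembered one so the restart lands on the same IMS.

restart_flags: dict[str, str] = {}   # jobname -> pinned scheduling environment

def on_abend(jobname: str, schenv_at_abend: str) -> None:
    restart_flags[jobname] = schenv_at_abend      # e.g. "IMP1"

def on_job_start(jobname: str, schenv: str) -> str:
    # generic SCHENV=IMP may run anywhere; a pending restart flag pins the
    # job to the member that holds its backout/restart data
    return restart_flags.pop(jobname, schenv)

on_abend("JOB1", "IMP1")
print(on_job_start("JOB1", "IMP"))   # -> IMP1: restart on the same IMS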
68 (No Transcript)
69 Status today
- Daily > 8,000,000 transactions
- 6,000,000 IMS-only
- 2,000,000 IMS + DB2 (+/- 250,000,000 SQLs/day)
- peak up to 400 trxs/sec
- branches equally split amongst the 2 IMS systems
- BUT WLM still favors SYA1 (especially for BMPs)
- f.e. total SQL statements during online hours: 85% on SYA1 / 15% on SYA2 (85,000,000 on SYA1 / 15,000,000 on SYA2)
- Locking
- IMS: 0.3% true versus 0.1% false contention
- DB2: 2-3% true versus 0.5% false contention (no real effort done --> still optimization to do)
70 Second half of 2002: 4-way
(diagram: IMS members IMP1, IMP2, IMP3, IMP4 and DB2 members DBP1, DBP2, DBP3, DBP4, DBPI)
71 In the Pipeline
- Installing MQSeries 5.2
- non-persistent messages in the Coupling Facility
- IMS Shared Queues
- testing on PLXT
- MQSeries 5.3
- persistent messages in the Coupling Facility - nothing planned yet