Title: OpenVMS Performance Tips
1OpenVMS Performance Tips & Tricks
Guy Peleg, President, Maklee Engineering
guy.peleg_at_maklee.com
2Performance: Why should you care?
Application Tuning
Oracle Tuning
System Tuning
Java Tuning
3The Golden Rules
Source: OpenVMS Information Desk, October 2004
- The best performing code is the code not being executed
- The fastest I/Os are those avoided
- Idle CPUs are the fastest CPUs
- Look at your code... be ready to be surprised
4RMS
- RMS holds great potential for improving performance
- The C RTL uses RMS
- Most C applications would benefit from RMS tuning
5RMS
- RMS parameters related to performance
- FAB/RAB parameters (should you have access to the code)
- ASY, RAH, WBH, DFW, SQO
- ALQ & DEQ
- MBC & MBF
- NOSHR, NQL, NLK
- SET RMS
- /SYSTEM /PROCESS
- /BUFFER_COUNT=n
- /BLOCK_COUNT=n
- SYSGEN> SET RMS_SEQFILE_WBH 1
- Don't be afraid of Global Buffers
6FTP Performance: Simple RMS Tuning
- FTP into IT13 and transfer the file
- Brutel> ftp it13
- 220 IT13.bruclass.com FTP Server (Version 5.6) Ready.
- Connected to ALPH13.BRUCLASS.COM.
- Name (ALPH13.BRUCLASS.COM:bru_guy): peleg
- 331 Username peleg requires a Password
- Password:
- 230 User logged in.
- FTP> cd $1$dga7030:[000000]
- 250-CWD command successful.
- 250 New default directory is $1$DGA7030:[000000]
- FTP> put HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
- 200 TYPE set to IMAGE.
- 200 PORT command successful.
- 150 Opening data connection for $1$DGA7030:[000000]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE (192.168.1.7,49428)
- 226 Transfer complete.
- local: SYS$SYSDEVICE:[BRU_GUY]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE;1 remote: HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
7FTP Performance: Simple RMS Tuning
- $ set rms/sys/exte=60000/seq/block=127/buf=8
- $ mc sysgen
- SYSGEN> SET RMS_SEQ 1
- SYSGEN> W A
- SYSGEN> Exit
- Throughput increased by more than 50%
- FTP> put HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
- 200 TYPE set to IMAGE.
- 200 PORT command successful.
- 150 Opening data connection for $1$DGA7030:[000000]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE (192.168.1.7,49432)
- 226 Transfer complete.
- local: SYS$SYSDEVICE:[BRU_GUY]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE;1 remote: HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
- 286026004 bytes sent in 00:00:31.83 seconds (8773.78 Kbytes/s)
- 200 TYPE set to ASCII.
8gZIP & RMS
- gZIP is written in C; its I/Os eventually reach RMS
- 1.6 GHz rx2600, MSA30, OpenVMS V8.3
- Test 1
- Compress 5.67 GB saveset
- Decompress 2.74 gZIP archive
- Default O/S RMS settings
- Test 2
- Compress 5.67 GB saveset
- Decompress 2.74 gZIP archive
- SET RMS/BLOCK=127/EXTEN=60000/BUFFER=8, RMS_SEQFILE_WBH=1
9gZIP & RMS
Elapsed Time in Minutes (less is better)
10Smaller MBC for Random Access
- Times to read 1,000,000 records randomly (same sequence of records), where MBC is passed as the first parameter
frand 1      Elapsed time: 31205ms
frand 1      Elapsed time: 31233ms
frand 2      Elapsed time: 31680ms
frand 4      Elapsed time: 32607ms
frand 8      Elapsed time: 33698ms
frand 16     Elapsed time: 36101ms
frand 32     Elapsed time: 42823ms
frand 64     Elapsed time: 54761ms
frand 96     Elapsed time: 66343ms
frand 124    Elapsed time: 80122ms
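To make the trend easier to read, the measurements on this slide can be expressed relative to the MBC=1 run. The sketch below only restates the slide's numbers; the choice of the first MBC=1 run as the baseline is an assumption made here for illustration.

```python
# Elapsed times (ms) copied from the slide, keyed by MBC value.
# The slide lists two runs for MBC=1; the first one is used as the baseline.
elapsed_ms = {1: 31205, 2: 31680, 4: 32607, 8: 33698, 16: 36101,
              32: 42823, 64: 54761, 96: 66343, 124: 80122}


def slowdown(mbc, baseline=1):
    """Return the elapsed time for `mbc` relative to the baseline MBC."""
    return elapsed_ms[mbc] / elapsed_ms[baseline]


for mbc in sorted(elapsed_ms):
    print(f"MBC {mbc:>3}: {slowdown(mbc):.2f}x the MBC=1 time")
```

For this random-access workload the largest MBC takes roughly two and a half times as long as MBC=1, which is the point of the slide.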
11RMS & fsync()
- Writing small amounts of data?
- Using fsync() ?
- Slow !
- Setting MBC & MBF to 1 is (almost!) identical
- Still need to take care of EOF
12Sequential Writes
- Frequent file expansions are expensive
- Typically seen with
- BACKUP savesets
- Database Imports
- FTPing large files
- The significant amount of time spent expanding files impacts performance
- If possible, pre-allocate files (container files)
- Limit the number of expansions on a volume
- SET VOLUME/EXTEND=65535
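A rough way to see why the extend quantity matters is to count how many times a sequentially written file has to be extended. The sketch below assumes 512-byte disk blocks and one extend operation per extend quantity; the value 100 is a hypothetical small extend quantity chosen only for contrast, and real behaviour also depends on cluster size and fragmentation.

```python
BLOCK_BYTES = 512  # OpenVMS disk block size


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def extensions_needed(file_bytes, extend_blocks, preallocated_blocks=0):
    """Estimate how many extend operations a sequential write requires."""
    total_blocks = ceil_div(file_bytes, BLOCK_BYTES)
    remaining = max(0, total_blocks - preallocated_blocks)
    return ceil_div(remaining, extend_blocks)


size = 286026004  # size of the file used in the FTP example
for extend in (100, 60000, 65535):  # 100 is a hypothetical small quantity
    print(f"extend={extend:>5}: {extensions_needed(size, extend)} extensions")

# A fully pre-allocated (container) file needs no extensions at all.
print(extensions_needed(size, 65535, preallocated_blocks=558645))
```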
13Black Magic
- What would you say about improving system performance by 5 - 20%?
- A typical response would be "What does it take?"
- Nothing! Just a small change to one SYSGEN parameter
- ...and some physical memory
- Sounds interesting?
14Introducing the VHPT
- Each CPU contains a translation buffer
- Special cache to hold recent translations of virtual memory address to physical address
- When a TB miss occurs the O/S has to resolve the translation by walking the page tables
- Itanium provides an extra layer for resolving addresses: the Virtual Hash Page Table (VHPT)
- VHPT: linear array of 32 byte entries
- Created by OpenVMS at boot time but not accessed by it
15VHPT
- Order of use
- CPU TB cache
- VHPT
- OpenVMS performs 3 level address translation, walks the page tables.
- The VHPT is sized by a system parameter - VHPT_SIZE
- Default value of 1 means allocate 32KB per CPU for the VHPT
16VHPT
- Default VHPT settings should be sufficient for small applications (up to 8MB of virtual address space).
- Large applications with poor locality would benefit from increasing the VHPT.
- Generally speaking an application that benefits from enabling HT would benefit from an increase to the VHPT.
- YMMV !!
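The 8MB figure can be reproduced from the numbers in the deck: a 32KB VHPT made of 32-byte entries holds 1024 entries. The sketch below assumes an 8KB page size and one page translation per entry with no hash collisions; both are simplifying assumptions made here, not statements from the slides.

```python
VHPT_ENTRY_BYTES = 32            # from the deck: 32-byte VHPT entries
DEFAULT_VHPT_BYTES = 32 * 1024   # from the deck: 32KB per CPU by default
PAGE_BYTES = 8 * 1024            # assumption: 8KB pages


def vhpt_entries(vhpt_bytes):
    """Number of translations a VHPT of the given size can hold."""
    return vhpt_bytes // VHPT_ENTRY_BYTES


def vhpt_coverage_bytes(vhpt_bytes, page_bytes=PAGE_BYTES):
    """Virtual address space covered if every entry maps a distinct page."""
    return vhpt_entries(vhpt_bytes) * page_bytes


print(vhpt_entries(DEFAULT_VHPT_BYTES))                        # entries per CPU
print(vhpt_coverage_bytes(DEFAULT_VHPT_BYTES) // (1024 * 1024), "MB covered")
```

An application whose working set of pages is much larger than this coverage will miss in the VHPT more often, which is why larger applications with poor locality are the ones expected to benefit.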
17VHPT Benchmark
- The following charts illustrate the impact that increasing the VHPT had on Oracle batch jobs
- rx6600 8 cores
- OpenVMS V8.3-1H1
- EVA8000
- Oracle 10gR2
- HyperThreads Enabled
- 64 GB of physical memory
- With VHPT 10000, 2.5GB of physical memory is
allocated for the VHPT.
18Oracle Batch job A
23% performance increase
Elapsed Time in Minutes (less is better)
19Oracle Batch job B
22% performance increase
Elapsed Time in Minutes (less is better)
20CPU Power Management (IA64 only)
- CPUs may be placed in a lower power mode when idle.
- Reduces energy costs for the system.
- SYSGEN parameter CPU_POWER_MGMT turns this feature on/off.
- May impact performance.
- In a recent engagement we noted a 30% performance improvement on an rx6600 by turning power management off (set CPU_POWER_MGMT=0)
21Shadowed RAM disk
- Shadowed RAM disk for applications that frequently read data from disk.
- The Shadow server will read from memory and will write to both devices.
- Forces data to remain resident in memory
- Significantly boosts performance when files are opened cluster wide by multiple users
- XFC will not help
- Beneficial if file update rate is low compared to the read rate
- Included in the EOE & MCOE packages
22Physical Disk Vs. RAM disk
- C application that processes records read from a sequential file
- Each I/O is 124 blocks
- RX2600, OpenVMS V8.3, HSG80
Elapsed time to read 250MB file (less is better)
23V8.3-1H1
- When possible upgrade to V8.3-1H1
- Performance improvements
- Always aspire to stay current with the O/S version
- Relink applications using the V8.3-1H1 Linker
- The new linker produces smaller images
- Reduction between 2% - 18%
- 0% is also possible
- Montvale based systems: there is more than meets the eye
24V8.3-1H1 Addendum kit
- EFICHK operation is performed during the patch installation
- Performance improvements
The following product will be installed to destination:
    HP I64VMS VMS831H1I_ADDENDUM V1.0    DISK$SYS831H1:[VMS$COMMON.]
Portion done: 0%...10%...20%...30%...40%...50%...70%...80%...90%
%MOUNT-I-FATCHECK, volume created by EFICP version V5.2-5
    checking for errors, repairing, and updating FAT information.
%EFICP-W-BADCCNT, FS0:\EFI\VMS\TOOLS\ACPIDUMP.EFI actual cluster count of 126 does not match the file allocation of 127.
    Filesize of 258232 bytes, requires 508 blocks (rounded to the cluster factor of 4)
    508 blocks shown allocated, but 126 actual clusters (504 blocks) counted in file
    The disk storage (258048 bytes) is smaller than the file size (258232 bytes)
    Truncating file! CHECK CONTENTS FOR VALIDITY
%EFICP-I-FATCHECK, 1 errors found, 1 fixed. 18 files in 4 folders checked, 12095166 total bytes in 5913 clusters
%EFICP-I-FATCHECK, Updating the FAT EFICP version information to V6.0-1, FAT version 1
%EFI-I-COPIED, copied FS0:\EFI\VMS\IPB.EXE to PCSI$DESTINATION:[SYSEXE]FLAG_IPB.EXE
%EFI-I-COPIED, copied PCSI$DESTINATION:[SYSEXE]IPB.EXE to FS0:\EFI\VMS\
%EFI-I-COPIED, copied FS0:\EFI\VMS\IPB.EXE to PCSI$DESTINATION:[SYSEXE]CHECK_IPB.EXE
...100%
%EFI-I-COPIED, copied FS0:\EFI\VMS\VMS_LOADER.EFI to PCSI$DESTINATION:[SYSEXE]CHECK_VMS_LOADER.EFI
25Resident Images a mystery
AlphaServer GS1280 7/1150
Elapsed time to execute a program (less is better)
26Resident Images
AlphaServer GS1280 7/1150
Elapsed time to execute a program (less is better)
27Resident Images
rx6600 4P/8C 1.6 Ghz
Elapsed time to execute a program (less is better)
28Resident Images
- Alpha
- the image activator has to apply the relocations
- pagefaults
- Link using /section=code
- Avoid /section=data
- IA64
- relocations are mapped into memory (the dynamic segment stays in paged pool)
29SORTing
- HYPERSORT
- Multi-threaded
- $ define sortshr sys$library:hypersort.exe
- Spread work files among disks/controllers/adaptors
- Apart from input/output disks
- No problem to have input and output on same disk
30Sort 100,000,000 Records
- 100 bytes each
- 19,531,250 blocks
- 3 work files
- 618,000 IO Sort32
- 922,000 IO HyperSort
- No XFC file caching of input, output or work
- HyperSort: Elapsed < CPU
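The block count on this slide follows directly from the record count and record size. A minimal check, assuming 512-byte disk blocks and no per-record overhead:

```python
RECORDS = 100_000_000
RECORD_BYTES = 100
BLOCK_BYTES = 512  # OpenVMS disk block size

# Total input size in blocks, ignoring any record overhead.
input_blocks = RECORDS * RECORD_BYTES // BLOCK_BYTES
print(f"{input_blocks:,} blocks")
```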
31PEDRIVER Data Compression
- OpenVMS V8.3
- Reduces traffic between nodes
- May be beneficial for Shadow copy and MSCP traffic
- Can be enabled system wide or per VC
32Turn on compression for one VC
- SCACP> set vc it14/comp
- SCACP> sh vc
IT13 PEA0 VC Summary 30-JAN-2007 07:43:28.02
Remote  VC     Total            Channels  ECS  MaxPkt  ReXmt      --XmtWindow--  Xmt      Total     ----------- Most Recent -----------
Node    State  Errors  XmtTMO   Open ECS  Pri  Size    TMO(uSec)  Cur  Max  Mgt  Options  Pkts(SR)  VC Opened Time      VC Closed Time
ALPH50  Open   4       115444   2    2    0    1426    672330.3   33   64   0             889107    21-JAN 13:34:25.78  (No time)
ALPH40  Open   0       Infinite 2    2    0    1426    516452.3   16   32   0             803545    21-JAN 13:34:25.72  (No time)
IT14    Open   1       790292   2    2    0    1426    223273.5   32   64   0    CMP      1242954   21-JAN 13:34:25.93  (No time)
IT13    Open   0       Infinite 1    1    0    1426    3000000.0  1    8    0             5         21-JAN 13:34:23.05  (No time)
33PEDRIVER Data Compression
- Copy 250MB file to MSCP served SCSI disk
- Both systems are rx2600, running OpenVMS V8.3
Elapsed time to copy 250MB file (less is better)
34Alignment Faults
- No performance talk is complete without mentioning Alignment Faults
- Alignment faults on Itanium will have serious impact on performance
- May be a (performance) issue on Alpha as well
35What is an Alignment Fault?
- When an attempted
- Longword memory access is not aligned on a memory boundary that is divisible by 4
- Quadword memory access is not aligned on a memory boundary that is divisible by 8
- Word memory access is not aligned on a boundary that is divisible by 2
- An alignment fault is generated and control is transferred to code that will complete the load/store through shifting, masking and setting bits.
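The three rules above reduce to a single divisibility test. The following sketch is illustrative only; the function and table names are invented here, and the sample addresses are the virtual addresses that appear in the alignment trap output later in the deck.

```python
# Access sizes in bytes, as defined on this slide.
ACCESS_BYTES = {"word": 2, "longword": 4, "quadword": 8}


def is_aligned(address, access):
    """True if `address` is naturally aligned for the given access type."""
    return address % ACCESS_BYTES[access] == 0


# Virtual addresses reported by the alignment traps shown later in the deck.
for addr in (0x10001, 0x10006, 0x1000B, 0x10015):
    status = "aligned" if is_aligned(addr, "longword") else "alignment fault"
    print(f"longword access at {addr:#x}: {status}")
```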
36Why Worry?
OpenVMS Monitor Utility
ALIGNMENT FAULT STATISTICS
on node DWARF
3-MAY-2007 14:26:56.27
                          CUR        AVE        MIN        MAX
Kernel Fault Rate        0.00       0.66       0.00       1.33
Exec Fault Rate          0.00       0.00       0.00       0.00
Super Fault Rate         0.00       0.00       0.00       0.00
User Fault Rate     640253.31  662505.00  640253.31  684756.68
Total Fault Rate    640253.31  662505.83  640253.31  684758.31
37Why Worry?
TIME IN PROCESSOR MODES (CUR)
on node DWARF
3-MAY-2007 14:26:59.27
Combined for 2 CPUs (scale 0 to 200)
Interrupt State
MP Synchronization     9
Kernel Mode          172
Executive Mode
Supervisor Mode
User Mode             19
Compatibility Mode
Idle Time
38Let the Compiler Warn You in Advance
- $ cc/nomember/warning=enable=alignment align_test
- int x
- ................
- %CC-I-MISALGNDMEM, This member is at offset 1, which is not a multiple of the member's alignment of longword. Consider padding before this member, rearranging the order of member declarations, or using #pragma member_alignment.
- at line number 10 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7
- int x
- ................
- %CC-I-MISALGNDSTRCT, This member requires longword alignment for efficient access, but is contained in a struct containing byte alignment. Consider using #pragma nomember_alignment longword.
- at line number 10 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7
- sub(zi.x,zi.a)
- ....................
- %CC-W-ALIGNCONFLICT, In this statement, the address "zi.x" has alignment of byte which is less than the alignment requirements of the destination pointer. Dereferencing the destination pointer may cause an alignment fault.
- at line number 22 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7
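The "offset 1" in the first diagnostic is what a byte-packed layout produces when a one-byte member precedes a longword. The sketch below models member offsets for a packed layout and for natural alignment. The struct (a one-byte member `a` followed by a four-byte member `x`) is a hypothetical stand-in, since the source of ALIGN_TEST.C is not shown in the deck, and the model only handles scalar members whose alignment equals their size.

```python
def layout(members, packed):
    """Compute member offsets and total size for a struct of scalars.

    members: list of (name, size_in_bytes)
    packed:  True models byte packing (no padding); False models natural
             alignment, where each scalar is aligned to its own size.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in members:
        align = 1 if packed else size
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, total


members = [("a", 1), ("x", 4)]  # hypothetical: char a; int x;
print("packed :", layout(members, packed=True))
print("natural:", layout(members, packed=False))
```

With packing, `x` lands at offset 1 and every element of an array of such structs is 5 bytes apart, so longword accesses to `x` are misaligned. With natural alignment the compiler inserts three bytes of padding and `x` sits at offset 4.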
39Reporting Alignment Faults
- Analyze alignment faults on Alpha prior to a port
- Only works on current process
- sys$perm_report_align_fault
- sys$perm_dis_align_fault_report
- $ r align_test
Address of x 10001
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010001, function=00000000, PC=000000001DCF0202, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010001, function=00000001, PC=000000001DCF0212, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010006, function=00000000, PC=000000001DCF0202, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010006, function=00000001, PC=000000001DCF0212, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=000000000001000B, function=00000000, PC=000000001DCF0202, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=000000000001000B, function=00000001, PC=000000001DCF0212, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010015, function=00000000, PC=000000001DCF0202, PS=0000001B
41Process Affinity
- Running on a large system with a low load?
- Running on a large system with heavy load?
- Better utilize the CPU caches (data cache, instruction cache & TB) by affinitizing your process to a set of CPUs
- In HT environment affinitize to one core
- Up to 25% performance increase
42Generating Primes GS 1280 7/1150
EV7 has EV68 core
43Free Hot File Tracking Utility
- $ sh mem/cache=(volume,topqio)
System Memory Resources on 26-APR-2007 01:39:15.03
Extended File Cache Top QIO File Statistics:
_$1$DGA642: (DISK$ES40), Caching mode is VIOC Compatible
_$1$DGA642:[VMS$COMMON.SYSEXE]RIGHTSLIST.DAT;1 (open)
  Caching is enabled, active caching mode is Write Through
  Allocated pages        9    Total QIOs         107
  Read hits             92    Virtual reads      107
  Virtual writes         0    Hit rate            85 %
  Read aheads            0    Read throughs      107
  Write throughs         0    Read arounds         0
  Write arounds          0
_$1$DGA642:[VMS$COMMON.SYSEXE]VMS$OBJECTS.DAT;2 (open)
  Caching is enabled, active caching mode is Write Through
  Allocated pages        0    Total QIOs           9
44Free Hot File Tracking Utility
_$1$DGA242: (DISK$ITANIUMVMS), Caching mode is VIOC Compatible
_$1$DGA242:[VMS$COMMON.SYSLIB]DECC$SHR.EXE;1 (open)
  Caching is enabled, active caching mode is Write Through
  Allocated pages      303    Total QIOs        1646
  Read hits           1561    Virtual reads     1646
  Virtual writes         0    Hit rate            94 %
  Read aheads            0    Read throughs     1642
  Write throughs         0    Read arounds         4
  Write arounds          0
_$1$DGA242:[VMS$COMMON.SYSLIB]LIBRTL.EXE;1 (open)
  Caching is enabled, active caching mode is Write Through
  Allocated pages      143    Total QIOs        1165
  Read hits           1123    Virtual reads     1165
  Virtual writes         0    Hit rate            96 %
  Read aheads            0    Read throughs     1164
  Write throughs         0    Read arounds         1
  Write arounds          0
Avoid caching files that pollute the cache
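The hit rate in these displays is consistent with read hits divided by virtual reads, truncated to a whole percent. That truncation is inferred from the three samples in the deck rather than taken from documentation, so treat the helper below as a sketch for ranking files from collected counters.

```python
def hit_rate_percent(read_hits, virtual_reads):
    """Whole-percent read hit rate, truncated (not rounded)."""
    if virtual_reads == 0:
        return 0
    return read_hits * 100 // virtual_reads


# (file, read hits, virtual reads) taken from the displays in the deck.
samples = [
    ("RIGHTSLIST.DAT", 92, 107),
    ("DECC$SHR.EXE", 1561, 1646),
    ("LIBRTL.EXE", 1123, 1165),
]

# Files with many reads but a low hit rate are the ones to look at first.
for name, hits, reads in sorted(samples, key=lambda s: hit_rate_percent(s[1], s[2])):
    print(f"{name:<16} {hit_rate_percent(hits, reads):>3}%  ({reads} reads)")
```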
45Elapsed time for I/Os
- SDA> xfc show volume/brief
Summary of XFC Cached Volumes (CVBs)
                                    Open   Closed  Total     Read      Read      Write    ... Response (Milliseconds) ...
Volume Name      CVB                Files  Files   I/Os      Hits      Count     Count    Hits     disk     Average
DISK$CARFAX      FFFFFFFEE01895E0   0      0       0         0         0         0        (N/A)    (N/A)    (N/A)
DISK$UP          FFFFFFFEE0189380   0      0       0         0         0         0        (N/A)    (N/A)    (N/A)
DISK$ORADAT      FFFFFFFEE0189120   26     3       1872255   0         0         1872255  (N/A)    0.0000   0.0000
DISK$ORADSK      FFFFFFFEE0188EC0   73     177     22015701  14108183  21116834  898891   0.0232   0.5811   0.2236
DISK$IA64_V82    FFFFFFFEE0188C60   0      0       0         0         0         0        (N/A)    (N/A)    (N/A)
DISK$82SOURCE    FFFFFFFEE0188A00   0      0       1         0         1         0        (N/A)    (N/A)    (N/A)
DISK$IT14_10292  FFFFFFFEE01887A0   2      0       0         0         0         0        (N/A)    (N/A)    (N/A)
DISK$ES40        FFFFFFFEE0188540   4      3       27676052  27667501  27674665  1387     0.0118   0.4007   0.0120
DISK$IT14_DOSD   FFFFFFFEE01882E0   0      0       0         0         0         0        (N/A)    (N/A)    (N/A)
DISK$SYS831H1    FFFFFFFEE0188080   313    183     2736618   2668894   2713025   23594    0.0179   0.5425   0.0308
46The XFC overhead
Rdb users: consider disabling caching of .RDA files
Elapsed time to copy 150MB file, rx2600, HSG80,
OpenVMS V8.3
47IBM MQ series
- MQ is a heavy user of pthreads
- Set MULTITHREAD to 1
- Thread manager upcalls are enabled; the creation of multiple kernel threads is disabled
48Sizing Working Sets
- Respect AUTOGEN but don't trust it blindly
- AlphaServer ES47, 16GB RAM
- maximum process count of 2500 processes
- AUTOGEN will set PQL_MWSDEFAULT to 17.38MB
- 17.38MB x 2500 = 43.45GB RAM
- Exceeds physical memory by almost 3 times
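The arithmetic behind this example is easy to reproduce. The sketch uses the deck's numbers and decimal units (1GB = 1000MB), which is what the 43.45GB figure implies.

```python
WSDEFAULT_MB = 17.38     # PQL_MWSDEFAULT chosen by AUTOGEN, per the deck
MAX_PROCESSES = 2500
PHYSICAL_GB = 16

# Worst case: every process grows to the default working set size.
demand_gb = WSDEFAULT_MB * MAX_PROCESSES / 1000
ratio = demand_gb / PHYSICAL_GB

print(f"worst-case working set demand: {demand_gb:.2f} GB")
print(f"that is {ratio:.1f}x physical memory")
```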
49Sizing Working Sets
- It's not 1980 any more
- Determine the size of XFC cache and MPW_HILIMIT
- Subtract the sum from the number of fluid pages on the system (MMG$GQ_FLUID_PGCNT)
- Divide by the maximum number of processes that have ever been running on the system (PMS$GL_PROCCNTMAX)
- Multiply the result by 16 to translate from pages to pagelets
- If you are conservative, take 70% of the result and set working set limit and quota to this value
- Working set extent should be 3 times the result
- Make sure PGFLQUOTA is properly sized
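The steps above can be sketched as a small calculation. The function name, the input values in the usage example, and the reading that the extent is three times the unscaled per-process result (rather than three times the 70% value) are assumptions made here; the deck does not say which is intended.

```python
PAGELETS_PER_PAGE = 16  # from the deck: multiply pages by 16 to get pagelets


def size_working_sets(fluid_pages, xfc_pages, mpw_hilimit, max_processes):
    """Apply the sizing steps from the slide; all inputs are in pages."""
    available = fluid_pages - (xfc_pages + mpw_hilimit)
    pages_per_process = available // max_processes
    pagelets = pages_per_process * PAGELETS_PER_PAGE
    conservative = pagelets * 70 // 100  # the "conservative" 70% value
    return {
        "limit": conservative,    # working set limit, in pagelets
        "quota": conservative,    # working set quota, in pagelets
        "extent": 3 * pagelets,   # working set extent, in pagelets
    }


# Hypothetical input values, chosen only to exercise the arithmetic.
print(size_working_sets(fluid_pages=2_000_000, xfc_pages=500_000,
                        mpw_hilimit=100_000, max_processes=2500))
```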
50TCP/IP Gigabit Ethernet
- Using Gigabit Ethernet?
- Turn on Jumbo frames
- Frames larger than 1518 bytes, more data per frame -> fewer frames -> fewer interrupts -> better performance
- Must be supported by the switch
- Must be configured before TCP/IP is started
- $ mc lancp set dev ewa/jumbo
- Bit 6 in SYSGEN parameter LAN_FLAGS
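A quick estimate shows how much the frame count drops. The sketch assumes 1500 bytes of payload per standard frame, a jumbo payload of 9000 bytes (a common value, but an assumption here since the deck does not state one), and it ignores protocol header overhead. It also shows the mask value that corresponds to bit 6 of LAN_FLAGS.

```python
STANDARD_PAYLOAD = 1500  # payload of a standard 1518-byte Ethernet frame
JUMBO_PAYLOAD = 9000     # assumption: a common jumbo frame payload size


def frames_needed(payload_bytes, payload_per_frame):
    """Frames required to carry the payload (ceiling division)."""
    return -(-payload_bytes // payload_per_frame)


transfer = 250 * 1024 * 1024  # a 250MB transfer
standard = frames_needed(transfer, STANDARD_PAYLOAD)
jumbo = frames_needed(transfer, JUMBO_PAYLOAD)
print(f"standard frames: {standard:,}")
print(f"jumbo frames:    {jumbo:,}")

LAN_FLAGS_JUMBO = 1 << 6  # bit 6 of LAN_FLAGS as a mask value
print(f"LAN_FLAGS bit 6 mask: {LAN_FLAGS_JUMBO}")
```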
51Toolbox Overview
- Collection of highly valuable, undocumented & unsupported tools, subject to change without notice
- Implemented as SDA extensions
- Use hooks in the VMS executive
- May be loaded and unloaded on the fly
- No reboot required
- Trace data is stored in ring buffer in S2 space
- May be viewed from a crash dump
52Toolbox Overview
- Extension, description, first shipped in
- CNX connection manager tracing V7.2-2
- EXC exception tracing V8.2
- FC Fibrechannel debug and tracing V7.2-2
- FLT alignment fault tracing V8.1
- IO buffered and direct I/O tracing V7.3-2
- LCK lock manager tracing V7.2-2
- LNM logical name tracing V7.3-1
- MTX mutex tracing V7.3
- PCS PC sampling V7.3-2
- PRF performance utility V8.2
- PSH pshared debug utility V8.2-1
53Toolbox Overview
- Extension, description, first shipped in
- RDB Rdb lock decoding and tracing V7.3-2
- RMS indexed file tracing V8.2-1
- SPL spinlock tracing V7.2-1H1
- TQE timer entry tracing V7.3-1
- TR debug and trace prints V7.3
- XFC eXtended File Cache diagnostics V7.3
54Toolbox Overview
- Common commands
- SDA> xxx ! Displays brief command help
- SDA> xxx LOAD
- SDA> xxx START TRACE /BUFFER=3000
- SDA> xxx SHOW TRACE
- SDA> xxx STOP TRACE
- SDA> xxx UNLOAD
- SDA> READ /EXEC /NOLOG
55PRF
- PRF is a highly powerful SDA extension for monitoring various performance counters at the processor level.
- May be used for PC sampling.
- Highlights areas in the application that require performance enhancements.
56PRF
- SDA> prf load
- PRFDEBUG load status = 00000001
- SDA> prf start pc/ind=21E004DA
- PC Sampling started...
- SDA> prf start collect
- SDA>
- Now run the application
- $ r prime
- ELAPSED: 0 00:00:24.16 CPU: 0:00:24.06 BUFIO: 0 DIRIO: 0 FAULTS: 0
- To look at the collected data
- SDA> prf show collect
57PRF SHOW COLLECT
Start VA           End VA             Image                        Count   Percent
-----------------  -----------------  ---------------------------  ------  -------
FFFFF802.11F00000  FFFFF802.11F01FFF  PRIME                        305113  99.85
FFFFF802.A1000000  FFFFF802.A1015FFF  Kernel Promote VA            1       0.00
FFFFFFFF.80000000  FFFFFFFF.800000FF  SYS$PUBLIC_VECTORS           2       0.00
FFFFFFFF.80000100  FFFFFFFF.800111FF  SYS$BASE_IMAGE               2       0.00
FFFFFFFF.80011200  FFFFFFFF.800651FF  SYS$PLATFORM_SUPPORT         258     0.08
FFFFFFFF.800A0000  FFFFFFFF.801DD6FF  SYSTEM_PRIMITIVES            88      0.03
FFFFFFFF.801DD700  FFFFFFFF.80243BFF  SYSTEM_SYNCHRONIZATION_MIN   9       0.00
FFFFFFFF.80254600  FFFFFFFF.8026EFFF  SYS$EIDRIVER.EXE             5       0.00
FFFFFFFF.8026F000  FFFFFFFF.802895FF  SYS$LAN.EXE                  2       0.00
FFFFFFFF.80289600  FFFFFFFF.802BA1FF  SYS$LAN_CSMACD.EXE           2       0.00
FFFFFFFF.80440E00  FFFFFFFF.8052B2FF  IO_ROUTINES                  1       0.00
FFFFFFFF.8053A600  FFFFFFFF.80670DFF  PROCESS_MANAGEMENT           7       0.00
FFFFFFFF.80670E00  FFFFFFFF.807759FF  SYS$VM                       11      0.00
FFFFFFFF.80779500  FFFFFFFF.807C76FF  LOCKING                      1       0.00
FFFFFFFF.807C7700  FFFFFFFF.807F9CFF  MESSAGE_ROUTINES             1       0.00
58PRF SHOW COLLECT
- SDA> prf show coll/thresh=2
PC                 Count  Rate   Symbolization  Module  Offset
-----------------  -----  -----  -------------  ------  --------
FFFFF802.11F00170  63410  20.07  PRIME+10170    PRIME   00010170
    GENERATE_PRIME+00000170 / GENERATE_PRIME+00000170
FFFFF802.11F00190  6138   2.01   PRIME+10190    PRIME   00010190
    GENERATE_PRIME+00000190 / GENERATE_PRIME+00000190
FFFFF802.11F001A0  6761   2.21   PRIME+101A0    PRIME   000101A0
    GENERATE_PRIME+000001A0 / GENERATE_PRIME+000001A0
FFFFF802.11F00200  6296   2.06   PRIME+10200    PRIME   00010200
    GENERATE_PRIME+00000200 / GENERATE_PRIME+00000200
FFFFF802.11F00220  8102   2.65   PRIME+10220    PRIME   00010220
    GENERATE_PRIME+00000220 / GENERATE_PRIME+00000220
FFFFF802.11F00290  6804   2.23   PRIME+10290    PRIME   00010290
59Montecito
Source Wikipedia
60Hyperthreading with Stalls vs Hyperthreading with
No Stalls
61Two Cores vs Hyperthreading (NoStalls)
62HyperThreads Impact on Oracle Jobs
Elapsed time (minutes) to execute 7 jobs Less is
better
63HyperThreads
- HyperThreads have the potential of improving performance
- Application has to meet the following criteria
- COM Queue
- Poor locality (L2/L3 misses)
- No pagefaulting
- PRF may be used to track L2 misses
- PRF START PROFILE/CPU=n/CACHE=L2/INDEX=PID
- PRF START COLLECT
64L2 Cache Misses on TC_CF (13.2% improvement)
                                                          I-Cache Misses     D-Cache Misses     Branch Trace Buf
Start VA           End VA             Image               Latency  Percent   Latency  Percent   Count   Percent
-----------------  -----------------  ------------------  -------  -------   -------  -------   ------  -------
00000000.00000000  00000000.7ADCBFFF  Process Space       17062    1.73      6072893  96.52     244963  8.62
00000000.7ADCC000  00000000.7AEF7FFF  DCL                 101      0.01      0        0.00      242     0.01
FFFFF802.0806C000  FFFFF802.0825DFFF  LIBRTL              4104     0.42      1217     0.02      21753   0.77
FFFFF802.0825E000  FFFFF802.08283FFF  LIBOTS              2150     0.22      123      0.00      240662  8.47
FFFFF802.082E8000  FFFFF802.0837FFFF  SMG$SHR             52       0.01      10       0.00      211     0.01
FFFFF802.08404000  FFFFF802.0840DFFF  CMA$TIS_SHR         281      0.03      0        0.00      1504    0.05
FFFFF802.08444000  FFFFF802.084F7FFF  DPML$SHR            5        0.00      0        0.00      1       0.00
FFFFF802.084F8000  FFFFF802.085A9FFF  PTHREAD$RTL         2657     0.27      294      0.00      6315    0.22
FFFFF802.085AA000  FFFFF802.090B3FFF  DECC$SHR            24027    2.43      6258     0.10      369765  13.02
FFFFF804.0E000000  FFFFF804.0E015FFF  Kernel Promote VA   2232     0.23      0        0.00      5191    0.18
FFFFFFFF.80000000  FFFFFFFF.800000FF  SYS$PUBLIC_VECTORS  403      0.04
65L2 Cache Misses on PRIMES_1 (Slight Degradation)
                                                     I-Cache Misses     D-Cache Misses     Branch Trace Buf
Start VA           End VA             Image          Latency  Percent   Latency  Percent   Count  Percent
-----------------  -----------------  -------------  -------  -------   -------  -------   -----  -------
00000000.00000000  00000000.7ADCBFFF  Process Space  5077     2.77      29968    52.88     26607  5.27
00000000.7ADCC000  00000000.7AEF7FFF  DCL            19       0.01      0        0.00      22     0.00
FFFFF802.0806C000  FFFFF802.0825DFFF  LIBRTL         949      0.52      570      1.01      3816   0.76
FFFFF802.0825E000  FFFFF802.08283FFF  LIBOTS         63       0.03      0        0.00      201    0.04
FFFFF802.082E8000  FFFFF802.0837FFFF  SMG$SHR        20       0.01      0        0.00      46     0.01
FFFFF802.08404000  FFFFF802.0840DFFF  CMA$TIS_SHR    0        0.00      0        0.00      6      0.00
66LNM
- The LNM extension allows tracking logical name translations.
- Logical name translations are expensive from a performance point of view and should be avoided when possible.
- MONITOR IO displays the total number of logical name translations per second
67LNM Example
- SDA> lnm show collect
Logical Name Trace Information
-------------------------------
Count         Logical Name
------------  -------------------------------
5000          SYS$SCRATCH     ! SYS$SCRATCH is being translated 5000 times
10            SYS$SHARE
10            SYS$SYSROOT
5             GBLINS8DDE9730
5             SYS$COMMON
4             GBLINS8DDAE310
4             SYS$OUTPUT
3             GBLINS8DDC20D0
3             GBLINS8DDD1A60
3             IPCACP_NETMBX
2             CMA$TIS_SHR
2             DPML$SHR
2             LIBOTS
2             LIBRTL
68LNM Example
- SDA> lnm show trace
Logical Name Trace Information
-------------------------------
Timestamp               CPU  EPID      Main Image    CallerPC                                   Logical Name
----------------------  ---  --------  ------------  -----------------------------------------  -------------
25-JAN 06:22:15.530026  01   21E0040E  IPCACP        FFFFFFFF.80514560 IOC$TRANDEVNAM_C+007C0   IPCACP_NETMBX
25-JAN 06:22:05.530027  01   21E0040E  IPCACP        FFFFFFFF.80514560 IOC$TRANDEVNAM_C+007C0   IPCACP_NETMBX
25-JAN 06:21:30.440094  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$OUTPUT
25-JAN 06:21:30.440010  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          PAS$OUTPUT
25-JAN 06:21:30.439846  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439835  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439825  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439814  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439803  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439792  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439782  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439771  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439760  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
25-JAN 06:21:30.439750  00   21E004DA  MANY_TRNLNMS  00000000.00000000                          SYS$SCRATCH
69LNM Cobol
- Do you have an application written in Cobol?
- COB5644
70Decoding PCs
- New routine to decode PC into module and routine names with offsets (IA64 only)
- tfget_mod_rtn in module TRACE_ELF in SYS$SHARE:VMS$VOLATILE_PRIVATE_INTERFACES.OLB
- tfget_mod_rtn ( entry->spltreq_pc, mod_name, rtn_name, mod_rel_pc, rtn_rel_pc )
71Questions?
- See us at www.maklee.com for
- Performance improvements
- Oracle Tuning
- Platform Migration
- Custom Engineering solutions
- Custom Training