1
Tuning the Java Virtual Machine for Stability and
Speed

John C. Calvin
Senior Systems Architect
University of Toronto

2
Toronto, Ontario CANADA (YYZ)

population 5.6 million
about the population of Chicago
area 2,751 mi² (7,125 km²)
4 universities, 7 colleges

3
University of Toronto

3 Campuses in the Greater Toronto Area
Downtown 52,296 students 175 acres
Eastern 10,465 students 300 acres
Western 10,924 students 200 acres
$1.4 billion annual operating budget
Degree Programs
840 undergraduate
520 graduate
75 doctoral
14,500 annual undergraduate intake
10,000 students in residence
247 buildings
200 acres net assignable space
7,591 parking spaces

4
University of Toronto

Students
55,352 (46,940 FTEs) undergraduate degree-seeking
students
13,702 (12,499 FTEs) graduate degree-seeking
students
2,038 (902 FTEs) certificate, diploma and special
students
2,593 residents and post-graduate medical
students
International students
5,182 undergraduate degree-seeking students
1,579 graduate degree-seeking students
326 certificate, diploma and special students
779 residents and post-graduate medical students
Faculty
2,260 (2,185 FTEs) Professorial
432 (378 FTEs) Teaching Stream
1,079 (216 FTEs) Term-limited Sessional and
Stipendiary
3,913 (2,351 FTEs) Clinical
2,707 (1,071 FTEs) Other

5
Agenda

Opening Remarks
System Operational Goals
The Java Runtime Environment
Generations and Memory Spaces
Garbage Collection Times and Algorithms
fork() and exec() under Linux and Solaris

6
My bias stated clearly and up-front

Operating system virtualization multiplies
operating system overhead.
Tomcat clustering multiplies Java Virtual Machine
(JVM) overhead.
Both OS virtualization and Tomcat clustering
multiply the configuration complexity.

7
The Basis of the Work

A fast JVM is only good if it's stable.
A big JVM is only good if it's stable.
Big fast JVMs can be stable.
The Sun 64-bit JVM v1.6.x is very stable.
Blackboard is a large Java application.
A big, fast JVM must be good for Blackboard.
Our JVMs were not stable.

8
System Operational Goals

Survive the worst-case single-user demand placed
on a single JVM by any administrative, faculty,
or student use case, without JVM failure.
Increase the stability of the JVM.
Increase the performance of the JVM.
Reduce overall configuration complexity.
Fully exploit application server resources.
Reduce the cost of Blackboard hardware.

9
The Results

One 64-bit JVM can serve 60,000 actives.
One 64-bit JVM can serve 2000 users.
One 64-bit JVM can serve 570,000 pages/hr.
One 64-bit JVM can run 900 threads.
One application server was enough.
Two servers offer hardware redundancy.

11
Java Runtime Environment

The JRE is an application running under a host
operating system (Windows, Linux, Solaris, etc.).
The JRE is written in C and C++ and has been compiled
specifically for each operating system.
The JRE must interpret or compile the Java code
in Blackboard in order to run it.
The JRE demands memory from the host operating
system, and manages that memory as its heap
memory spaces, where it stores Java objects
created as the running JVM threads instantiate
them.

12
JVM Heap Size

Upper Bound (e.g. 16GB)
bbconfig.max.heapsize.tomcat=16g
A.K.A. JVM command-line option -Xmx16g
The size above which the adaptive JVM memory sizing
algorithm will not increase the size of the heap.
Lower Bound (e.g. 4GB)
bbconfig.min.heapsize.tomcat=4g
A.K.A. JVM command-line option -Xms4g
The size below which the adaptive JVM memory
sizing algorithm will not reduce the size of the
heap.
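
One way to confirm which bounds a running JVM actually received is to ask the standard java.lang.management API. This is a minimal sketch, not part of the original slides: the class name HeapBounds is hypothetical, and treating "init" as the -Xms value and "max" as the -Xmx value is an approximation, since the JVM may round or adjust both.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Reports the heap bounds that the running JVM actually received.
// HeapBounds is a hypothetical name used only for this illustration.
class HeapBounds {
    static final long MB = 1024L * 1024L;

    // Bytes to whole megabytes; the API's "undefined" value (-1) is passed through.
    static long toMb(long bytes) {
        return bytes < 0 ? -1 : bytes / MB;
    }

    // Snapshot of current heap usage from the standard management API.
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // "init" approximates the lower bound (-Xms), "max" the upper bound (-Xmx).
        System.out.println("init      : " + toMb(heap.getInit()) + " MB");
        System.out.println("committed : " + toMb(heap.getCommitted()) + " MB");
        System.out.println("max       : " + toMb(heap.getMax()) + " MB");
    }
}
```

Running it with different -Xms and -Xmx values shows how the committed size sits between the two bounds.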

13
Downward Pressures on JVM max.heapsize

Amount of installed physical memory
Platform choice
AMD, Intel, SPARC, speed, number of cores, etc.
Operating System choice
Windows, Linux, Solaris
JVM choice
32-bit vs. 64-bit, 1.5.x or 1.6.x
Garbage Collection (GC) algorithm choice
Concurrent Mark Sweep (CMS) vs. ParallelOld
Max. tolerable Stop-the-World times
Headroom for other applications, OS, etc.
fork()/exec() swap-space/memory demands

14
Upward Pressures on JVM max.heapsize

Worst-case application use cases
Largest GradeCenter (# of columns × enrollment)
Unqualified searches (course catalog, users)
Desired number of concurrent users
MaxThreads × 10MB (approx., measured)
Idle JVM Heap low-water mark after GC
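
A rough sizing sketch for the last two items, reading the slide's figure as about 10MB of heap per thread. Adding the idle low-water mark to the per-thread demand is an assumption made here for illustration, and the 2048MB low-water value is hypothetical; the thread counts 200, 800 and 900 are the ones that appear elsewhere in this presentation.

```java
// Back-of-envelope heap demand: idle low-water mark plus ~10MB per thread.
// HeapDemandEstimate and the 2048MB low-water figure are hypothetical.
class HeapDemandEstimate {
    // Approximate measured heap demand per thread, from the slide.
    static final long MB_PER_THREAD = 10;

    // Heap demand attributable to request threads alone.
    static long threadDemandMb(int maxThreads) {
        return maxThreads * MB_PER_THREAD;
    }

    // Assumed model: heap needed = what stays live when idle + per-thread demand.
    static long requiredHeapMb(int maxThreads, long idleLowWaterMb) {
        return idleLowWaterMb + threadDemandMb(maxThreads);
    }

    public static void main(String[] args) {
        long idleLowWaterMb = 2048; // hypothetical value; measure it after a GC
        for (int threads : new int[] {200, 800, 900}) {
            System.out.println(threads + " threads -> "
                    + requiredHeapMb(threads, idleLowWaterMb) + " MB");
        }
    }
}
```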

15
What Happens When

common use-cases exhaust the heap?
admin use-cases exhaust the heap?
a spike in demand exhausts max.threads?
the required number of concurrent users cannot
be supported by the max.heapsize?

16
Solving Memory Problems: Solution 1

Reduce the application's demand
Reduce the size of organizations
Reduce course enrolment sizes
Reduce number of columns in GradeCenter
Reduce the size of course catalog
Reduce the number of user accounts

17
Solving Memory Problems: Solution 2

Reduce the number of concurrent users
Load-balance across multiple servers
Add more physical application servers
More pizza-boxes, power, A/C
Virtualize the application servers
Solaris Zones, Logical Domains, VMWare, etc.
Network load-balance between them
Hardware, Round-robin DNS, etc.
Load-balance across multiple JVMs
Tomcat Clustering

18
Solving Memory Problems: Solution 3

Increase the JVM's heap size.
Install more physical memory
Too much memory has rarely caused a big problem
Run a 64-bit JVM and configure a large heap
2GB or larger heap requires 64-bit JVM
Tune Garbage Collection algorithms
JVM 1.6.0_13 required
-XX:+AlwaysPreTouch
Eliminate use of the Java Runtime.exec() method
Some Blackboard code changes required

19
Reasons to Avoid a Large Heap

Because it's a bad idea.
Because nobody else is doing it.
Because Blackboard doesn't support it.
Because you can't run a 64-bit JVM.
Because you don't know how to do it.
Because you haven't enough memory.
Because you aren't using the CMS collector.
Because you just don't want to break it.

20
Reasons to Embrace a Large Heap

Because you have infinite GC loops.
Because you get out-of-memory errors.
Because JVMs are dying for no reason.
Because you're already running a 64-bit JVM.
Because it just feels like the right solution.
Because you've got to try something else.
Because you're lazy like me and you want a
simpler solution than Tomcat Clustering, Solaris
Zones, and Logical Domains.
Because RAM is the cheapest upgrade.

21
Non-Java Memory Demands

Java isn't the only thing that is going to demand
memory!
Apache will need 10MB × http.max.clients.
ModPerl needs memory.
NFS needs memory.
The operating system needs memory.

22
Generations and Memory Spaces

New (Young) Generation (heap)
Eden Space
Survivor Space
Old Generation (heap)
Tenured Space
Permanent Generation (non-heap)
Code Space (non-heap)
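
The spaces listed above can be observed on a live JVM through the standard memory-pool beans. This is a minimal sketch with a hypothetical class name; the pool names printed depend on the JVM version and the collector in use, so they will not necessarily match the names on this slide.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Lists every memory pool of the running JVM with its heap / non-heap type.
// MemoryPools is a hypothetical name used only for this illustration.
class MemoryPools {
    // One line per pool: the pool type followed by the pool name.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + ": " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```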

23
Default JVM Generational Model
[Diagram: Young Generation (Eden Space, Survivor Space) and Old Generation (Tenured Space)]
24
Garbage Collection Basics
[Diagram: Young Generation and Old Generation]
25
Garbage Collection Algorithms

-XX:+UseParallelOldGC (the Throughput
collector)
-XX:+UseParallelGC
Invokes the multi-threaded New Generation
collector
-XX:-UseParallelGC
Invokes the single-threaded New Generation
collector
-XX:+UseConcMarkSweepGC (the Low-pause
collector)
-XX:+UseParNewGC
Invokes the multi-threaded New Generation
collector
-XX:-UseParNewGC
Invokes the single-threaded New Generation
collector
-XX:+UseG1GC (the Garbage First collector)
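
To check which collectors a given set of flags actually selected, the running JVM can be asked for its garbage-collector beans. A minimal sketch, assuming only the standard java.lang.management API; the class name is hypothetical, and the collector names reported vary by JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Shows the collectors active in this JVM and their cumulative counters.
// ActiveCollectors is a hypothetical name used only for this illustration.
class ActiveCollectors {
    // Names of the collectors the JVM registered for this run.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Counts and times are cumulative since JVM start; time is in milliseconds.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```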

26
GC Terminology

Minor collection (YGC)
A Garbage Collection event that clears the Eden
Space into the Survivor Space and promotes old
survivors into the Tenured Space; it is a
stop-the-world collection.
Major collection (CMS)
A Garbage Collection event that clears dead
objects from the Tenured Space and coalesces the
JVM's free lists into larger memory pages,
concurrently with the execution of the
application; small portions are stop-the-world
events.
Full collection (Compacting Collector)
A Garbage Collection event that clears and
compacts the heap memory spaces; it is a
stop-the-world collection.

27
Single Server Stop-the-World Times
28
GC Terminology

Memory Allocation Rate
The rate of consumption of the Eden Space in MB/s
(GB/Hr.)
Survival Rate
Percentage of Eden Space (or Survivor Space) that
must be copied with the next young generation
garbage collection
Infant Mortality Rate
Percentage of nascent objects that die before
their first YGC
Promotion Rate
Percentage of the New Generation that must be
copied into the Old Generation with the next young
generation collection
Collection Rate (Young or Old)
Amount of memory reclaimed by minor and major
garbage collections measured in MB/s (GB/Hr.)

29
UofT Blackboard JVM Heap
30
Heap Memory Allocation Rate

YGC/Hour × Eden Size
Example Memory Allocation Rate
4GB Eden Space with a 15-second YGC interval
240 YGC/hour × 4GB = 960GB per hour
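
The arithmetic in the example above can be sketched as follows. The class and method names are hypothetical, and the calculation assumes the Eden Space fills completely between consecutive young collections.

```java
// Memory allocation rate = (young collections per hour) x (Eden size).
// AllocationRate is a hypothetical name used only for this illustration.
class AllocationRate {
    // Number of young collections per hour for a given interval between them.
    static double ygcPerHour(double ygcIntervalSeconds) {
        return 3600.0 / ygcIntervalSeconds;
    }

    // Allocation rate in GB/hour, assuming Eden is full at every collection.
    static double gbPerHour(double edenGb, double ygcIntervalSeconds) {
        return ygcPerHour(ygcIntervalSeconds) * edenGb;
    }

    // The same rate expressed in MB/s.
    static double mbPerSecond(double edenGb, double ygcIntervalSeconds) {
        return edenGb * 1024.0 / ygcIntervalSeconds;
    }

    public static void main(String[] args) {
        System.out.println(gbPerHour(4.0, 15.0) + " GB/hour");   // 960.0 GB/hour
        System.out.println(mbPerSecond(4.0, 15.0) + " MB/s");
    }
}
```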

31
/usr/java/bin/jmap -J-d64 -heap <pid>

/usr/java/bin/jmap -J-d64 -heap 29259
Attaching to process ID 29259, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 1.5.0_14-b03
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 21474836480 (20480.0MB)
   NewSize          = 4294967296 (4096.0MB)
   MaxNewSize       = 4294967296 (4096.0MB)
   OldSize          = 17179869184 (16384.0MB)
   NewRatio         = 4
   SurvivorRatio    = 2048

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 4292935680 (4094.0625MB)
   used     = 3224075136 (3074.7176513671875MB)
   free     = 1068860544 (1019.3448486328125MB)
   75.10187378348981% used
Eden Space:
   capacity = 4290904064 (4092.125MB)
   used     = 3224075136 (3074.7176513671875MB)
   free     = 1066828928 (1017.4073486328125MB)
   75.13743229659865% used
From Space:
   capacity = 2031616 (1.9375MB)
   used     = 0 (0.0MB)
   free     = 2031616 (1.9375MB)
   0.0% used
To Space:
   capacity = 2031616 (1.9375MB)
   used     = 0 (0.0MB)

32
/usr/java/bin/jstat -gccause <pid>

/usr/java/bin/jstat -gccause 2553 60s
  S0     S1     E      O      P     YGC   YGCT    FGC   FGCT      GCT    LGCC             GCC
  0.00   0.00  94.51  13.16  34.54   40  23.979    32   47.625   71.605  unknown GCCause  No GC
  0.00   0.00  63.01  13.64  34.79   41  24.211    33   51.549   75.760  unknown GCCause  No GC
  0.00   0.00  20.98  13.97  34.87   42  24.414    33   51.549   75.963  unknown GCCause  No GC
  0.00   0.00  97.00  13.15  34.98   42  24.414    34   55.990   80.404  unknown GCCause  No GC
  0.00   0.00  65.48  13.48  35.01   43  24.596    35   62.001   86.597  unknown GCCause  No GC
  0.00   0.00  58.29  13.91  35.05   44  24.816    35   62.001   86.817  unknown GCCause  No GC
  0.00   0.00  40.44  14.40  35.06   45  25.126    35   62.001   87.127  unknown GCCause  No GC
  0.00   0.00  87.47  13.83  35.07   45  25.126    36   67.605   92.732  unknown GCCause  No GC
  0.00   0.00  35.01  14.42  35.09   46  25.379    37   67.605   92.984  unknown GCCause  No GC
  0.00   0.00  84.45  14.42  35.11   46  25.379    37   80.853  106.232  unknown GCCause  No GC
  0.00   0.00  37.64  14.80  35.12   47  25.655    37   80.853  106.508  unknown GCCause  No GC
  0.00   0.00  88.85  13.45  35.12   47  25.655    38   86.310  111.965  unknown GCCause  No GC
  0.00   0.00  45.96  13.96  37.68   48  25.913    39   93.301  119.214  unknown GCCause  No GC
  0.00   0.00  36.45  14.50  37.69   49  26.227    39   93.301  119.528  unknown GCCause  No GC
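
Because YGCT and FGCT are cumulative, the average time per collection over an observation window can be estimated by differencing two samples. The sketch below uses the first and last samples shown above; the class name is hypothetical, and treating each FGC increment as one old-generation collection event is an assumption, since how the CMS phases are counted depends on the JVM version.

```java
// Average seconds per collection between two jstat samples.
// JstatDeltas is a hypothetical name used only for this illustration.
class JstatDeltas {
    // (count, cumulative seconds) at the start and end of the window.
    static double avgSecondsPerCollection(long count0, double time0,
                                          long count1, double time1) {
        long events = count1 - count0;
        if (events <= 0) {
            return 0.0; // no collections happened in the window
        }
        return (time1 - time0) / events;
    }

    public static void main(String[] args) {
        // First and last samples from the jstat output above.
        double young = avgSecondsPerCollection(40, 23.979, 49, 26.227);
        double old = avgSecondsPerCollection(32, 47.625, 39, 93.301);
        System.out.printf("avg YGC: %.3f s%n", young); // about a quarter of a second
        System.out.printf("avg FGC: %.3f s%n", old);   // about six and a half seconds
    }
}
```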

33
Some Logic About GC Times

More users on an application server
→ More threads creating objects
→ Increasing memory allocation rate
→ Decreasing YGC Interval
→ Less time for nascent objects to die
→ Higher promotion/survival rate
→ More physical memory to copy
→ Longer Young Generation Collections
→ Longer Stop-the-world events
→ Longer server response times

34
Sun Default 64-bit JVM Heap
35
The New Ratio

Ratio of Old Generation to Young Generation
A New Ratio of 3 means that the Old Generation
is 3 times the size of the Young Generation.
Example with a 16GB heap
12GB Old Generation
4GB Young Generation
Example with a 20GB heap
15GB Old Generation
5GB Young Generation
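
The split in the two examples above follows from dividing the heap into NewRatio + 1 equal parts, one of which goes to the Young Generation. A minimal sketch with hypothetical names; the real JVM also applies alignment and any explicit NewSize/MaxNewSize settings, which this ignores.

```java
// Heap split implied by NewRatio: old = NewRatio x young, old + young = heap.
// NewRatioSplit is a hypothetical name used only for this illustration.
class NewRatioSplit {
    // Young Generation size for a given total heap and NewRatio.
    static double youngGb(double heapGb, int newRatio) {
        return heapGb / (newRatio + 1);
    }

    // Old Generation size is whatever remains.
    static double oldGb(double heapGb, int newRatio) {
        return heapGb - youngGb(heapGb, newRatio);
    }

    public static void main(String[] args) {
        for (double heap : new double[] {16.0, 20.0}) {
            System.out.println(heap + "GB heap, NewRatio 3: young="
                    + youngGb(heap, 3) + "GB, old=" + oldGb(heap, 3) + "GB");
        }
    }
}
```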

36
JVM Memory Spaces
37
UofT Heap vs. Default Heap
38
Understanding the Young Generation
[Diagram: Eden Space and Survivor Space (From Space, To Space)]
39
Tenuring Threshold
[Diagram: Survivor Space (From Space, To Space) and Tenured Space]
40
GC Times Can Vary

Young GC Events
The number of objects
Size of the surviving objects
Old GC Events
Occupancy of Young Generation
Fragmentation of the Old Generation
Configuration
Size of the Eden Space
Size of the Survivor Spaces
Tenuring Threshold
Parallelism in the collectors

41
If You Can't Measure the Application, Measure the
Machine.

Tune the JVM for the largest heap that the
hardware, operating system, and GC algorithm will
support.
A larger heap means fewer stop-the-world events.
There is no harm having a larger heap, if the
upper-bound on the Stop-the-World times is known,
fixed, and acceptable for the application.
For a fixed number of threads, both CPU and
memory demands can be reduced by running a single
JVM, instead of clustering.
More users means more threads; more threads means
more objects; more objects means more heap.

42
Xms and Xmx Misconceptions

When min.heap = max.heap, all of the JVM's heap is
allocated at startup.
FALSE
Only the Eden Space is allocated at startup,
regardless of the min.heap setting.
If adaptive sizing is used, the JVM will not
reduce the total size of the JVM heap below
min.heap.
The heap size will not be reduced until it first
exceeds min.heap.

43
JVM Non-heap Size Options

-XX:PermSize=256m
-XX:MaxPermSize=256m
-XX:InitialCodeCacheSize=128m
-XX:ReservedCodeCacheSize=128m

44
JVM Heap Size Options

-Xms20g
-Xmx20g
-XX:OldSize=16g
-XX:NewSize=4g
-XX:MaxNewSize=4g
-XX:NewRatio=4
-XX:SurvivorRatio=4096

45
Gross JVM Tuning

-Xss256k
-XX:+UseTLAB
-XX:+MaxFDLimit
-XX:+AlwaysPreTouch
-XX:+DisableExplicitGC
-XX:MaxTenuringThreshold=0

46
CMS Collector Tuning Options

-XX:+UseConcMarkSweepGC
-XX:ParallelGCThreads=16
-XX:ParallelCMSThreads=4
-XX:+ParallelRefProcEnabled
-XX:CMSMarkStackSize=8M
-XX:CMSMarkStackSizeMax=8M
-XX:CMSInitiatingOccupancyFraction=60
-XX:+CMSScavengeBeforeRemark
-XX:+CMSParallelRemarkEnabled
-XX:+CMSPermGenSweepingEnabled
-XX:+CMSClassUnloadingEnabled

48
CMS Collection Close-up
49
Full CMS Cycle
50
An Evening of CMS Collections
51
The YGC Tipping Point

YGC Events occurring too frequently
→ Too many objects surviving
→ Too much memory being promoted
→ Old Generation Collections too frequent
→ Objects survive too many epochs
→ Old Generation collection rate too low
→ Old Generation runs out of memory
→ Infinite GC loops are inevitable

52
Some Facts About Memory

A process address space is usually larger than
the total amount of physical memory (RAM)
installed.
Any program might demand more memory than is
currently uncommitted and available.
When the OS is starved for physical memory it
uses swap space (disk storage) to hold older
pages.
Physical memory is 500k times faster than disk.
Swapping to disk is memory starvation.
Swapping to disk is evil!

53
The fork() and exec() Problem in Solaris and Linux

When a process forks, it creates a copy of its
address space (not its memory footprint).
As the forked process runs, it modifies the
virtual memory in its own address space, and the
OS must allocate physical memory to hold the
modified pages (copy-on-write).
The exec() causes a lot of copy-on-write faults
to occur very rapidly, demanding large amounts of
memory from the OS.

54
Some Solaris Facts About Memory

If there is insufficient virtual memory to
back-stop a forking process, the fork() call will
fail.
This is a fail-safe fork. It prevents the OS
from over-committing virtual memory.
It can be annoying, because it means you might
need a huge swap-partition (2xRAM) that may get
committed but is never touched.

55
More Xms and Xmx Misconceptions

As the JVM forks, at least max.heapsize swap
space must be available or the fork() will fail.
FALSE
Virtual memory = Physical memory + Swap space
Virtual memory equal to the resident set size in
use by the parent process, at the time of the
fork(), will be allocated.
Linux will over-commit virtual memory (kernel 2.6
default).
A JVM call to malloc() succeeds and the JVM is
promised memory; later, the OS may not deliver on
that promise!
This can lead to out of memory errors.
Solaris will not over-commit virtual memory.
As fails the malloc(), so fails the fork().

56
More Xms and Xmx Misconceptions

When the JVM forks a new process, the forked
process will use max.heapsize.
FALSE
fork1() uses a copy-on-write technique to reduce the
time taken to fork. Physical memory pages from
the OS are mapped only as virtual memory pages in the
forked process are modified.

57
Some Bad Examples

The machine has 8GB of RAM.
The machine has an 8GB swap partition.
The application has a 6GB footprint.
There is 9GB of virtual memory available.
The application does a fork() and then exec().
Exec() zeros 6GB of memory in the forked process,
demanding 6GB of RAM from the OS.
The OS starts swapping memory pages to disk!!

58
Some Bad Examples

The machine has 16GB of RAM.
The machine has a 4GB swap partition.
The application has an 8GB footprint.
There is 6GB of virtual memory available.
The application tries to fork() and exec().
Solaris: The fork fails (out of memory).
Linux: The fork succeeds, the exec fails.

59
A Good Example

The machine has 64GB of RAM.
The machine has a 64GB swap partition.
The application has a 20GB footprint.
There is 86GB of virtual memory available.
The application does a fork() and exec().
Solaris and Linux both work fine.
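
The three examples can be compared with a deliberately simplified model. Both rules below are assumptions made for illustration, not a description of either kernel: a strict (Solaris-style) fork is taken to succeed only if the available virtual memory covers the parent's footprint, and swapping is taken to be a risk when the parent plus a fully touched copy of it would exceed physical RAM. Linux over-commit and other processes are ignored, and all names are hypothetical.

```java
// Simplified fork()/exec() memory check for the examples in this section.
// ForkCheck and both rules are illustrative assumptions, not kernel behavior.
class ForkCheck {
    // Strict accounting: the fork must be able to reserve the parent's footprint.
    static boolean strictForkSucceeds(double availableVirtualGb, double footprintGb) {
        return availableVirtualGb >= footprintGb;
    }

    // Rough swapping test: parent plus a fully touched child copy must fit in RAM.
    static boolean fitsInRam(double ramGb, double footprintGb) {
        return 2 * footprintGb <= ramGb;
    }

    public static void main(String[] args) {
        // {RAM, available virtual memory, application footprint} in GB.
        double[][] examples = { {8, 9, 6}, {16, 6, 8}, {64, 86, 20} };
        for (double[] e : examples) {
            System.out.println("RAM=" + e[0] + "GB footprint=" + e[2] + "GB:"
                    + " strict fork ok=" + strictForkSucceeds(e[1], e[2])
                    + ", fits in RAM=" + fitsInRam(e[0], e[2]));
        }
    }
}
```

Under these assumptions the first example forks but swaps, the second fails the strict fork, and the third passes both checks.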

60
Tomcat Connector Settings

maxThreads
Maximum number of threads that can be open in the
pool.
800 (200 default)
minSpareThreads
Minimum number of idle threads to leave in the
pool.
16 (4 default, possibly 10)
maxSpareThreads
Maximum number of idle threads to leave in the
pool.
100 (50 default)
acceptCount
Maximum number of requests to queue if no threads
are free.
50 (10 default)

61
Reasons Not to Use Tomcat Clustering

A 32-bit JVM is too small anyway: < 200 threads max
4 JVMs of 2GB each = 8GB
That's 4 JVMs, each of which still has an overhead of 60
threads and 256MB of executable and non-heap
memory!
→ 180 fewer threads available and 768MB more
non-heap memory
→ 1/3 less work for the same resource
commitment
→ further constrains the worst-case application
use cases
(course enrollment × Grade-book columns)
Each JVM has only ¼ the memory available to an
8GB JVM
→ can support only ¼ of the peak threads of an
8GB JVM
→ can support less than ¼ of the peak users of an
8GB JVM
→ more likely to enter infinite GC loops.
CPU time lost to process-switching over
thread-switching
Blackboard configuration is more complicated than
with a single JVM.
SNMP monitoring and metering of clustered JVMs is
more complicated.
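
The overhead figures quoted above (180 threads, 768MB) are the cost of the three additional JVMs relative to a single one. A minimal sketch of that arithmetic, using the slide's per-JVM overhead of 60 threads and 256MB; the class and method names are hypothetical.

```java
// Extra fixed overhead of running N clustered JVMs instead of one.
// ClusterOverhead is a hypothetical name used only for this illustration.
class ClusterOverhead {
    // Per-JVM overhead figures taken from the slide.
    static final int OVERHEAD_THREADS_PER_JVM = 60;
    static final int NON_HEAP_MB_PER_JVM = 256;

    // Threads lost to overhead beyond what a single JVM would already spend.
    static int extraOverheadThreads(int jvmCount) {
        return (jvmCount - 1) * OVERHEAD_THREADS_PER_JVM;
    }

    // Additional executable and non-heap memory beyond a single JVM's share.
    static int extraNonHeapMb(int jvmCount) {
        return (jvmCount - 1) * NON_HEAP_MB_PER_JVM;
    }

    public static void main(String[] args) {
        int jvms = 4;
        System.out.println(jvms + " JVMs: " + extraOverheadThreads(jvms)
                + " fewer threads, " + extraNonHeapMb(jvms) + "MB more non-heap memory");
    }
}
```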

62
Computing is All About Memory

A memory upgrade is often the most cost-effective
hardware upgrade that one can perform.
Not CPU-bound? Not network-bound? Not database
bound? Then what is the problem?
4 machines with too little memory cost far more
than does one machine with too much.
A slow machine with lots of memory is faster than
a fast machine that's out of memory.
Increase the heap and use the CMS collector!

63
Thank you for your time.