Title: John C. Calvin
1. Tuning the Java Virtual Machine for Stability and Speed
- John C. Calvin
- Senior Systems Architect
- University of Toronto
2. Toronto, Ontario, Canada (YYZ)
- population 5.6 million
- about the population of Chicago
- area 2,751 mi² (7,125 km²)
- 4 universities, 7 colleges
3. University of Toronto
- 3 Campuses in the Greater Toronto Area
- Downtown: 52,296 students, 175 acres
- Eastern: 10,465 students, 300 acres
- Western: 10,924 students, 200 acres
- 1.4 Billion annual operating budget
- Degree Programs
- 840 undergraduate
- 520 graduate
- 75 doctoral
- 14,500 annual undergraduate intake
- 10,000 students in residence
- 247 buildings
- 200 acres net assignable space
- 7,591 parking spaces
4. University of Toronto
- Students
- 55,352 (46,940 FTEs) undergraduate degree-seeking students
- 13,702 (12,499 FTEs) graduate degree-seeking students
- 2,038 (902 FTEs) certificate, diploma and special students
- 2,593 residents and post-graduate medical students
- International students
- 5,182 undergraduate degree-seeking students
- 1,579 graduate degree-seeking students
- 326 certificate, diploma and special students
- 779 residents and post-graduate medical students
- Faculty
- 2,260 (2,185 FTEs) Professorial
- 432 (378 FTEs) Teaching Stream
- 1,079 (216 FTEs) Term-limited Sessional and Stipendiary
- 3,913 (2,351 FTEs) Clinical
- 2,707 (1,071 FTEs) Other
5. Agenda
- Opening Remarks
- System Operational Goals
- The Java Runtime Environment
- Generations and Memory Spaces
- Garbage Collection Times and Algorithms
- fork() and exec() under Linux and Solaris
6. My bias stated clearly and up-front
- Operating system virtualization multiplies operating system overhead.
- Tomcat clustering multiplies Java Virtual Machine (JVM) overhead.
- Both OS virtualization and Tomcat clustering multiply the configuration complexity.
7. The Basis of the Work
- A fast JVM is only good if it's stable.
- A big JVM is only good if it's stable.
- Big, fast JVMs can be stable.
- The Sun 64-bit JVM v1.6.x is very stable.
- Blackboard is a large Java application.
- A big, fast JVM must be good for Blackboard.
- Our JVMs were not stable.
8. System Operational Goals
- Survive the worst-case single-user demand placed on a single JVM by any administrative, faculty, or student use case, without JVM failure.
- Increase the stability of the JVM.
- Increase the performance of the JVM.
- Reduce overall configuration complexity.
- Fully exploit application server resources.
- Reduce the cost of Blackboard hardware.
9. The Results
- One 64-bit JVM can serve 60,000 actives.
- One 64-bit JVM can serve 2,000 users.
- One 64-bit JVM can serve 570,000 pages/hr.
- One 64-bit JVM can run 900 threads.
- One application server was enough.
- Two servers offer hardware redundancy.
11. Java Runtime Environment
- The JRE is an application running under a host operating system (Windows, Linux, Solaris, etc.).
- The JRE is written in C and C++ and has been compiled specifically for each operating system.
- The JRE must interpret or compile the Java code in Blackboard in order to run it.
- The JRE demands memory from the host operating system and manages that memory as its heap memory spaces, where it stores the Java objects that the running JVM threads instantiate.
12. JVM Heap Size
- Upper Bound (e.g. 16GB)
- bbconfig.max.heapsize.tomcat=16g
- A.K.A. JVM command-line option -Xmx16g
- The size beyond which the adaptive JVM memory sizing algorithm will not increase the size of the heap.
- Lower Bound (e.g. 4GB)
- bbconfig.min.heapsize.tomcat=4g
- A.K.A. JVM command-line option -Xms4g
- The size below which the adaptive JVM memory sizing algorithm will not reduce the size of the heap.
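A minimal sketch of how the two bounds can be observed from inside a running JVM, using only the standard `Runtime` API. The class name `HeapBounds` and the helper `toMiB` are illustrative assumptions, not part of Blackboard or Tomcat.

```java
// Sketch: report the heap bounds that the running JVM actually sees.
class HeapBounds {
    static final long MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the upper bound (-Xmx); totalMemory() is the
        // heap currently committed, which starts out near the lower bound (-Xms).
        System.out.println("max heap (MiB):          " + toMiB(rt.maxMemory()));
        System.out.println("committed heap (MiB):    " + toMiB(rt.totalMemory()));
        System.out.println("free in committed (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running it with different -Xms and -Xmx values is a quick way to confirm that the configured bounds reached the JVM.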
13. Downward Pressures on JVM max.heapsize
- Amount of installed physical memory
- Platform choice
- AMD, Intel, SPARC, speed, # of cores, etc.
- Operating System choice
- Windows, Linux, Solaris
- JVM choice
- 32-bit vs. 64-bit, 1.5.x or 1.6.x
- Garbage Collection (GC) algorithm choice
- Concurrent Mark Sweep (CMS) vs. ParallelOld
- Max. tolerable Stop-the-World times
- Headroom for other applications, OS, etc.
- fork()/exec() swap-space/memory demands
14. Upward Pressures on JVM max.heapsize
- Worst-case application use cases
- Largest GradeCenter (# of columns × enrollment)
- Unqualified searches (course catalog, users)
- Desired number of concurrent users
- MaxThreads × 10MB (approx. measured)
- Idle JVM Heap low-water mark after GC
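The per-thread figure turns into a heap estimate with one multiplication. A small sketch, assuming the roughly 10MB per thread quoted above and the maxThreads value of 800 from the Tomcat connector slide; the class and method names are illustrative.

```java
// Sketch: heap demand attributable to request threads alone.
class ThreadHeapEstimate {
    /** Heap demand in MB for maxThreads threads at mbPerThread each. */
    static long threadHeapMB(int maxThreads, int mbPerThread) {
        return (long) maxThreads * mbPerThread;
    }

    public static void main(String[] args) {
        int maxThreads = 800;   // value from the Tomcat connector settings slide
        int mbPerThread = 10;   // approximate measured figure from the slide
        System.out.println(maxThreads + " threads x " + mbPerThread + "MB = "
                + threadHeapMB(maxThreads, mbPerThread) + "MB of heap");
    }
}
```

The idle low-water mark and the worst-case use cases come on top of this figure.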
15. What Happens When...
- common use-cases exhaust the heap?
- admin use-cases exhaust the heap?
- a spike in demand exhausts max.threads?
- the required number of concurrent users cannot
be supported by the max.heapsize?
16. Solving Memory Problems: Solution 1
- Reduce the application's demand
- Reduce the size of organizations
- Reduce course enrollment sizes
- Reduce the number of columns in GradeCenter
- Reduce the size of the course catalog
- Reduce the number of user accounts
17. Solving Memory Problems: Solution 2
- Reduce the number of concurrent users
- Load-balance across multiple servers
- Add more physical application servers
- More pizza-boxes, power, A/C
- Virtualize the application servers
- Solaris Zones, Logical Domains, VMware, etc.
- Network load-balance between them
- Hardware, Round-robin DNS, etc.
- Load-balance across multiple JVMs
- Tomcat Clustering
18. Solving Memory Problems: Solution 3
- Increase the JVM's heap size.
- Install more physical memory
- Too much memory has rarely caused a big problem
- Run a 64-bit JVM and configure a large heap
- A 2GB or larger heap requires a 64-bit JVM
- Tune Garbage Collection algorithms
- JVM 1.6.0_13 required
- -XX:+AlwaysPreTouch
- Eliminate the Java Runtime.exec method
- Some Blackboard code changes required
19. Reasons to Avoid a Large Heap
- Because it's a bad idea.
- Because nobody else is doing it.
- Because Blackboard doesn't support it.
- Because you can't run a 64-bit JVM.
- Because you don't know how to do it.
- Because you haven't enough memory.
- Because you aren't using the CMS collector.
- Because you just don't want to break it.
20. Reasons to Embrace a Large Heap
- Because you have infinite GC loops.
- Because you get out-of-memory errors.
- Because JVMs are dying for no reason.
- Because you're already running a 64-bit JVM.
- Because it just feels like the right solution.
- Because you've got to try something else.
- Because you're lazy like me and you want a simpler solution than Tomcat Clustering, Solaris Zones, and Logical Domains.
- Because RAM is the cheapest upgrade.
21. Non-Java Memory Demands
- Java isn't the only thing that is going to demand memory!
- Apache will need 10MB × http.max.clients.
- mod_perl needs memory.
- NFS needs memory.
- The operating system needs memory.
22. Generations and Memory Spaces
- New (Young) Generation (heap)
- Eden Space
- Survivor Space
- Old Generation (heap)
- Tenured Space
- Permanent Generation (non-heap)
- Code Space (non-heap)
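The spaces listed above can be enumerated from inside the JVM through the standard `java.lang.management` API. A minimal sketch follows; the class name `MemoryPools` is an assumption, and the pool names printed depend on the JVM version and the collector in use (for example, newer JVMs report Metaspace rather than a Permanent Generation).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Sketch: list every memory pool with its type (heap or non-heap) and usage.
class MemoryPools {
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            lines.add(pool.getName() + " [" + pool.getType() + "] used="
                    + usage.getUsed() / (1024L * 1024L) + " MiB");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```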
23. Default JVM Generational Model
(Diagram labels: Young Generation, Old Generation, Eden Space, Survivor Space, Tenured Space)
24. Garbage Collection Basics
(Diagram labels: Young Generation, Old Generation)
25. Garbage Collection Algorithms
- -XX:+UseParallelOldGC (the Throughput collector)
- -XX:+UseParallelGC
- Invokes the multi-threaded New Generation collector
- -XX:-UseParallelGC
- Invokes the single-threaded New Generation collector
- -XX:+UseConcMarkSweepGC (the Low-pause collector)
- -XX:+UseParNewGC
- Invokes the multi-threaded New Generation collector
- -XX:-UseParNewGC
- Invokes the single-threaded New Generation collector
- -XX:+UseG1GC (the Garbage First collector)
26. GC Terminology
- Minor collection (YGC)
- A Garbage Collection event that clears the Eden Space into the Survivor Space and promotes old survivors into the Tenured Space; it is a stop-the-world collection.
- Major collection (CMS)
- A Garbage Collection event that clears dead objects from the Tenured Space and coalesces the JVM's free-lists into larger memory pages, concurrently with the execution of the application; small portions are stop-the-world events.
- Full collection (Compacting Collector)
- A Garbage Collection event that clears and compacts the heap memory spaces; it is a stop-the-world collection.
27. Single Server Stop-the-World Times
28. GC Terminology
- Memory Allocation Rate
- The rate of consumption of the Eden Space in MB/s (GB/Hr.)
- Survival Rate
- Percentage of the Eden Space (or Survivor Space) that must be copied with the next young generation garbage collection
- Infant Mortality Rate
- Percentage of nascent objects that die before their first YGC
- Promotion Rate
- Percentage of the New Generation that must be copied into the Old Generation with the next young generation collection
- Collection Rate (Young or Old)
- Amount of memory reclaimed by minor and major garbage collections, measured in MB/s (GB/Hr.)
29. UofT Blackboard JVM Heap
30. Heap Memory Allocation Rate
- YGC/Hour × Eden Size
- Example Memory Allocation Rate
- 4GB Eden Space with 15-second YGC interval
- 240 YGC/hour × 4GB = 960GB per hour
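The example above can be written out as a small calculation. This is a sketch that assumes the Eden Space fills completely between consecutive young collections; the class and method names are illustrative.

```java
// Sketch: memory allocation rate = (young collections per hour) x (Eden size).
class AllocationRate {
    /** Allocation rate in GB/hour, given the Eden size and the interval between YGCs. */
    static double gbPerHour(double edenGB, double ygcIntervalSeconds) {
        double ygcPerHour = 3600.0 / ygcIntervalSeconds;
        return ygcPerHour * edenGB;
    }

    public static void main(String[] args) {
        // 4GB Eden Space collected every 15 seconds, as in the slide.
        System.out.println(gbPerHour(4.0, 15.0) + " GB/hour");
    }
}
```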
31. /usr/java/bin/jmap -J-d64 -heap <pid>
- /usr/java/bin/jmap -J-d64 -heap 29259
- Attaching to process ID 29259, please wait...
- Debugger attached successfully.
- Server compiler detected.
- JVM version is 1.5.0_14-b03
- using parallel threads in the new generation.
- using thread-local object allocation.
- Concurrent Mark-Sweep GC
- Heap Configuration:
- MinHeapFreeRatio = 40
- MaxHeapFreeRatio = 70
- MaxHeapSize = 21474836480 (20480.0MB)
- NewSize = 4294967296 (4096.0MB)
- MaxNewSize = 4294967296 (4096.0MB)
- OldSize = 17179869184 (16384.0MB)
- NewRatio = 4
- SurvivorRatio = 2048
- Heap Usage:
- New Generation (Eden + 1 Survivor Space):
- capacity = 4292935680 (4094.0625MB)
- used = 3224075136 (3074.7176513671875MB)
- free = 1068860544 (1019.3448486328125MB)
- 75.10187378348981% used
- Eden Space:
- capacity = 4290904064 (4092.125MB)
- used = 3224075136 (3074.7176513671875MB)
- free = 1066828928 (1017.4073486328125MB)
- 75.13743229659865% used
- From Space:
- capacity = 2031616 (1.9375MB)
- used = 0 (0.0MB)
- free = 2031616 (1.9375MB)
- 0.0% used
- To Space:
- capacity = 2031616 (1.9375MB)
- used = 0 (0.0MB)
32. /usr/java/bin/jstat -gccause <pid>
- /usr/java/bin/jstat -gccause 2553 60s
      S0     S1      E      O      P   YGC     YGCT   FGC     FGCT      GCT  LGCC             GCC
    0.00   0.00  94.51  13.16  34.54    40   23.979    32   47.625   71.605  unknown GCCause  No GC
    0.00   0.00  63.01  13.64  34.79    41   24.211    33   51.549   75.760  unknown GCCause  No GC
    0.00   0.00  20.98  13.97  34.87    42   24.414    33   51.549   75.963  unknown GCCause  No GC
    0.00   0.00  97.00  13.15  34.98    42   24.414    34   55.990   80.404  unknown GCCause  No GC
    0.00   0.00  65.48  13.48  35.01    43   24.596    35   62.001   86.597  unknown GCCause  No GC
    0.00   0.00  58.29  13.91  35.05    44   24.816    35   62.001   86.817  unknown GCCause  No GC
    0.00   0.00  40.44  14.40  35.06    45   25.126    35   62.001   87.127  unknown GCCause  No GC
    0.00   0.00  87.47  13.83  35.07    45   25.126    36   67.605   92.732  unknown GCCause  No GC
    0.00   0.00  35.01  14.42  35.09    46   25.379    37   67.605   92.984  unknown GCCause  No GC
    0.00   0.00  84.45  14.42  35.11    46   25.379    37   80.853  106.232  unknown GCCause  No GC
    0.00   0.00  37.64  14.80  35.12    47   25.655    37   80.853  106.508  unknown GCCause  No GC
    0.00   0.00  88.85  13.45  35.12    47   25.655    38   86.310  111.965  unknown GCCause  No GC
    0.00   0.00  45.96  13.96  37.68    48   25.913    39   93.301  119.214  unknown GCCause  No GC
    0.00   0.00  36.45  14.50  37.69    49   26.227    39   93.301  119.528  unknown GCCause  No GC
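The cumulative counters in this output (YGC, YGCT, FGC, FGCT) can be turned into average pause times. A minimal sketch, assuming the column order S0 S1 E O P YGC YGCT FGC FGCT GCT shown in the header; the class `JstatLine` is illustrative and not part of any JDK tool, and what FGC counts depends on the collector in use.

```java
// Sketch: parse one data row of "jstat -gccause" output and derive average pauses.
class JstatLine {
    final long ygc;     // number of young generation collections
    final double ygct;  // cumulative young collection time, seconds
    final long fgc;     // number of full GC events
    final double fgct;  // cumulative full GC time, seconds

    JstatLine(String line) {
        String[] f = line.trim().split("\\s+");
        // Assumed column order: S0 S1 E O P YGC YGCT FGC FGCT GCT ...
        ygc = Long.parseLong(f[5]);
        ygct = Double.parseDouble(f[6]);
        fgc = Long.parseLong(f[7]);
        fgct = Double.parseDouble(f[8]);
    }

    double avgYoungPause() { return ygc == 0 ? 0.0 : ygct / ygc; }

    double avgFullPause() { return fgc == 0 ? 0.0 : fgct / fgc; }

    public static void main(String[] args) {
        // First data row from the sample output.
        JstatLine row = new JstatLine(
                "0.00 0.00 94.51 13.16 34.54 40 23.979 32 47.625 71.605 unknown GCCause No GC");
        System.out.println("avg young pause (s): " + row.avgYoungPause());
        System.out.println("avg full pause (s):  " + row.avgFullPause());
    }
}
```

Differencing two successive rows the same way gives the cost of the collections that ran within one sampling interval.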
33. Some Logic About GC Times
- More users on an application server
- → More threads creating objects
- → Increasing memory allocation rate
- → Decreasing YGC Interval
- → Less time for nascent objects to die
- → Higher promotion/survival rate
- → More physical memory to copy
- → Longer Young Generation Collections
- → Longer Stop-the-world events
- → Longer server response times
34. Sun Default 64-bit JVM Heap
35. The New Ratio
- Ratio of the Old Generation to the Young Generation
- A New Ratio of 3 means that the Old Generation is 3 times the size of the Young Generation.
- Example with a 16GB heap
- 12GB Old Generation
- 4GB Young Generation
- Example with a 20GB heap
- 15GB Old Generation
- 5GB Young Generation
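Both examples follow from splitting the heap into (New Ratio + 1) equal parts, one of which is the Young Generation. A short sketch with illustrative names; real JVMs also round generation sizes to alignment boundaries, which this ignores.

```java
// Sketch: generation sizes implied by a heap size and a New Ratio.
class NewRatioSizing {
    /** Young Generation size: one part out of (newRatio + 1). */
    static double youngGB(double heapGB, int newRatio) {
        return heapGB / (newRatio + 1);
    }

    /** Old Generation size: the remaining newRatio parts. */
    static double oldGB(double heapGB, int newRatio) {
        return heapGB - youngGB(heapGB, newRatio);
    }

    public static void main(String[] args) {
        for (double heap : new double[] {16.0, 20.0}) {
            System.out.println(heap + "GB heap, New Ratio 3: young="
                    + youngGB(heap, 3) + "GB, old=" + oldGB(heap, 3) + "GB");
        }
    }
}
```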
36. JVM Memory Spaces
37. UofT Heap vs. Default Heap
38. Understanding the Young Generation
(Diagram labels: Survivor Space, Eden Space, From Space, To Space)
39. Tenuring Threshold
(Diagram labels: Survivor Space, Tenured Space, From Space, To Space)
40. GC Times Can Vary
- Young GC Events
- The number of objects
- Size of the surviving objects
- Old GC Events
- Occupancy of Young Generation
- Fragmentation of the Old Generation
- Configuration
- Size of the Eden Space
- Size of the Survivor Spaces
- Tenuring Threshold
- Parallelism in the collectors
41. If You Can't Measure the Application, Measure the Machine.
- Tune the JVM for the largest heap that the hardware, operating system, and GC algorithm will support.
- A larger heap means fewer stop-the-world events.
- There is no harm in having a larger heap, if the upper bound on the Stop-the-World times is known, fixed, and acceptable for the application.
- For a fixed number of threads, both CPU and memory demands can be reduced by running a single JVM, instead of clustering.
- More users means more threads; more threads means more objects; more objects means more heap.
42. Xms and Xmx Misconceptions
- When min.heap = max.heap, all of the JVM's heap is allocated at startup.
- FALSE
- Only the Eden Space is allocated at startup, regardless of the min.heap setting.
- If adaptive sizing is used, the JVM will not reduce the total size of the JVM heap below min.heap.
- The heap size will not be reduced until it first exceeds min.heap.
43. JVM Non-heap Size Options
- -XX:PermSize=256m
- -XX:MaxPermSize=256m
- -XX:InitialCodeCacheSize=128m
- -XX:ReservedCodeCacheSize=128m
44. JVM Heap Size Options
- -Xms20g
- -Xmx20g
- -XX:OldSize=16g
- -XX:NewSize=4g
- -XX:MaxNewSize=4g
- -XX:NewRatio=4
- -XX:SurvivorRatio=4096
45. Gross JVM Tuning
- -Xss256k
- -XX:+UseTLAB
- -XX:+MaxFDLimit
- -XX:+AlwaysPreTouch
- -XX:+DisableExplicitGC
- -XX:MaxTenuringThreshold=0
46. CMS Collector Tuning Options
- -XX:+UseConcMarkSweepGC
- -XX:ParallelGCThreads=16
- -XX:ParallelCMSThreads=4
- -XX:+ParallelRefProcEnabled
- -XX:CMSMarkStackSize=8M
- -XX:CMSMarkStackSizeMax=8M
- -XX:CMSInitiatingOccupancyFraction=60
- -XX:+CMSScavengeBeforeRemark
- -XX:+CMSParallelRemarkEnabled
- -XX:+CMSPermGenSweepingEnabled
- -XX:+CMSClassUnloadingEnabled
48. CMS Collection Close-up
49. Full CMS Cycle
50. An Evening of CMS Collections
51. The YGC Tipping Point
- YGC Events occurring too frequently
- → Too many objects surviving
- → Too much memory being promoted
- → Old Generation Collections too frequent
- → Objects survive too many epochs
- → Old Generation collection rate too low
- → Old Generation runs out of memory
- → Infinite GC loops are inevitable
52. Some Facts About Memory
- A process address space is usually larger than the total amount of physical memory (RAM) installed.
- Any program might demand more memory than is currently uncommitted and available.
- When the OS is starved for physical memory, it uses swap space (disk storage) to hold older pages.
- Physical memory is 500k times faster than disk.
- Swapping to disk is memory starvation.
- Swapping to disk is evil!
53. The fork() and exec() Problem in Solaris and Linux
- When a process forks, it creates a copy of its address space (not its memory footprint).
- As the forked process runs, it modifies the virtual memory in its own address space, and the OS must allocate physical memory to hold the modified pages (copy-on-write).
- The exec() causes a lot of copy-on-write faults to occur very rapidly, demanding large amounts of memory from the OS.
54. Some Solaris Facts About Memory
- If there is insufficient virtual memory to back-stop a forking process, the fork() call will fail.
- This is a fail-safe fork. It prevents the OS from over-committing virtual memory.
- It can be annoying, because it means you might need a huge swap partition (2×RAM) that may get committed but is never touched.
55. More Xms and Xmx Misconceptions
- As the JVM forks, at least max.heapsize swap space must be available or the fork() will fail.
- FALSE
- Virtual memory = Physical Memory + Swap Space
- Virtual memory equal to the resident set size in use by the parent process, at the time of the fork(), will be allocated.
- Linux will over-commit virtual memory (kernel 2.6 default).
- A JVM call to malloc() succeeds and the JVM is promised memory; later, the OS may not deliver on that promise!
- This can lead to out-of-memory errors.
- Solaris will not over-commit virtual memory.
- As fails the malloc(), so fails the fork().
56. More Xms and Xmx Misconceptions
- When the JVM forks a new process, the forked process will use max.heapsize.
- FALSE
- fork1() uses a copy-on-write technique to reduce the time taken to fork. Physical memory pages from the OS are mapped only as virtual memory pages in the forked process are modified.
57. Some Bad Examples
- The machine has 8GB of RAM.
- The machine has an 8GB swap partition.
- The application has a 6GB footprint.
- There is 9GB of virtual memory available.
- The application does a fork() and then exec().
- exec() zeros 6GB of memory in the forked process, demanding 6GB of RAM from the OS.
- The OS starts swapping memory pages to disk!!
58. Some Bad Examples
- The machine has 16GB of RAM.
- The machine has a 4GB swap partition.
- The application has an 8GB footprint.
- There is 6GB of virtual memory available.
- The application tries to fork() and exec().
- Solaris: The fork fails (out of memory).
- Linux: The fork succeeds, the exec fails.
59. A Good Example
- The machine has 64GB of RAM.
- The machine has a 64GB swap partition
- The application has a 20GB footprint.
- There is 86GB of virtual memory available.
- The application does a fork() and exec().
- Solaris and Linux both work fine.
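The three examples (slides 57 to 59) can be compared with a deliberately simplified model. It assumes that a strict, non-over-committing OS lets a fork() proceed only if the parent's whole footprint can be reserved from the available virtual memory, and that in the worst case the child touches every page, so the parent plus a full copy must fit in RAM to avoid swapping. The class `ForkModel` and both checks are illustrative assumptions drawn from the slides' own reasoning, not how any kernel actually accounts for memory.

```java
// Sketch: the slides' reasoning about fork()/exec() as two simple checks.
class ForkModel {
    /** Strict (no over-commit) accounting: the whole footprint must be reservable. */
    static boolean strictForkSucceeds(double availableVirtualGB, double footprintGB) {
        return availableVirtualGB >= footprintGB;
    }

    /** Worst case: the child touches every page, so parent + copy must fit in RAM. */
    static boolean fitsInRamWorstCase(double ramGB, double footprintGB) {
        return 2 * footprintGB <= ramGB;
    }

    public static void main(String[] args) {
        // {RAM, available virtual memory, application footprint} in GB, per example.
        double[][] examples = { {8, 9, 6}, {16, 6, 8}, {64, 86, 20} };
        for (double[] e : examples) {
            System.out.println("RAM=" + e[0] + "GB footprint=" + e[2] + "GB"
                    + " strictFork=" + strictForkSucceeds(e[1], e[2])
                    + " fitsInRam=" + fitsInRamWorstCase(e[0], e[2]));
        }
    }
}
```

Under this model the first example passes the reservation check but cannot stay in RAM, the second fails the reservation check, and the third passes both.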
60. Tomcat Connector Settings
- maxThreads
- Maximum number of threads that can be open in the pool.
- 800 (200 default)
- minSpareThreads
- Minimum number of idle threads to leave in the pool.
- 16 (4 default, possibly 10)
- maxSpareThreads
- Maximum number of idle threads to leave in the pool.
- 100 (50 default)
- acceptCount
- Maximum number of requests to queue if no threads are free.
- 50 (10 default)
61. Reasons Not to Use Tomcat Clustering
- A 32-bit JVM is too small anyway: < 200 threads max
- 4 JVMs of 2GB each = 8GB
- That's 4 JVMs, each of which still has an overhead of 60 threads and 256MB of executable and non-heap memory!
- → 180 fewer threads available and 768MB more non-heap memory
- → 1/3 less work for the same resource commitment
- → further constrains the worst-case application use cases
- (course enrollment × Grade-book columns)
- Each JVM has only ¼ the memory available to an 8GB JVM
- → can support only ¼ of the peak threads of an 8GB JVM
- → can support less than ¼ of the peak users of an 8GB JVM
- → more likely to enter infinite GC loops.
- CPU time lost to process-switching over thread-switching
- Blackboard configuration is more complicated than with a single JVM.
- SNMP monitoring and metering of clustered JVMs is more complicated.
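The 180-thread and 768MB figures are the fixed per-JVM overhead multiplied by the number of additional JVMs. A sketch of that arithmetic, using the per-JVM overhead quoted on the slide (60 threads, 256MB); the class and method names are illustrative.

```java
// Sketch: extra fixed overhead of running N clustered JVMs instead of one.
class ClusterOverhead {
    /** Threads consumed by overhead beyond what a single JVM would need. */
    static int extraThreads(int jvms, int overheadThreadsPerJvm) {
        return (jvms - 1) * overheadThreadsPerJvm;
    }

    /** Executable and non-heap memory (MB) beyond what a single JVM would need. */
    static int extraNonHeapMB(int jvms, int nonHeapMBPerJvm) {
        return (jvms - 1) * nonHeapMBPerJvm;
    }

    public static void main(String[] args) {
        System.out.println("extra threads:     " + extraThreads(4, 60));
        System.out.println("extra non-heap MB: " + extraNonHeapMB(4, 256));
    }
}
```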
62. Computing is All About Memory
- A memory upgrade is often the most cost-effective hardware upgrade that one can perform.
- Not CPU-bound? Not network-bound? Not database-bound? Then what is the problem?
- 4 machines with too little memory cost far more than does one machine with too much.
- A slow machine with lots of memory is faster than a fast machine that's out of memory.
- Increase the heap and use the CMS collector!
63. Thank you for your time.
- John C. Calvin
- Senior Systems Architect
- Information Technology Services
- University of Toronto
- john.calvin_at_utoronto.ca