Title: VMware Virtualization of Oracle and Java
1. VMware Virtualization of Oracle and Java
- Scott B. Drummonds
- Tim Harris
2. Agenda
- VMware Virtualization Overview
- Architecture, performance, and overheads
- Best Practices
- Oracle, Java
- Performance Data
- Scaling, large memory pages, AMD RVI
- Conclusions
3. VMware Virtualization of Oracle and Java
4. VMware Infrastructure 3 Architecture
5. VMware ESX Virtualization Architecture
- CPU resource is controlled by the scheduler and virtualized by the monitor
- Memory is allocated by the VMkernel and virtualized by the monitor
- Network and I/O devices are emulated and proxied through native device drivers
[Diagram: two guest VMs (file system, TCP/IP stacks) run on monitors atop the VMkernel, which contains the virtual NIC, virtual SCSI, memory allocator, scheduler, virtual switch, file system, and NIC/I/O drivers, all on physical hardware]
6. Speeding Up Virtualization
- Where are the various virtualization performance hits?
7. Multi-Mode Monitors
- There are different types of monitors for different workloads and CPU types
- VMware ESX provides a dynamic framework to allow the best monitor for the workload
[Diagram: three guests, each running on a monitor using binary translation, para-virtualization, or hardware assist, atop the VMkernel (virtual NIC, virtual SCSI, memory allocator, scheduler, virtual switch, file system, NIC/I/O drivers) and physical hardware]
8. MMU Virtualization Without Hardware Support
- Guest maintains sets of page tables
- In native execution, they would have been used for address translation
- Virtual Machine Monitor (VMM) maintains a set of shadow page tables
- There is one shadow page table for each guest page table
- VMM sets CR3 to point to the shadow page tables
- Translation happens using the shadow page tables
- Guest page tables contain VPN-to-PPN translations, while shadows contain VPN-to-MPN translations
[Diagram: CR3 points to the shadow page tables (VPN-to-MPN translations), which mirror the guest page tables (VPN-to-PPN translations)]
9. Rapid Virtualization Indexing
- RVI provides a mechanism to avoid shadow page tables
- Provides a second layer of page tables
- These contain physical-to-machine address translations
- The hypervisor maintains them
- So, with RVI:
- Guest page tables contain the virtual-to-physical translation
- Second-level tables contain the physical-to-machine translation
- Using the two tables, hardware converts a virtual address to a machine address
[Diagram: guest CR3 points to the guest page tables (VPN-to-PPN mapping); nested CR3 points to the hypervisor page tables (PPN-to-MPN mapping)]
10. What is Page Sharing?
- Content-based
- Hint (hash of page content) generated for 4K pages
- Hint is used for a match
- If matched, perform a bit-by-bit comparison
- COW (Copy-on-Write)
- Shared pages are marked read-only
- A write to the page breaks sharing
[Diagram: VM 1, VM 2, and VM 3 before and after sharing, with identical pages collapsed by the hypervisor to a single copy]
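The hash-then-verify matching described above can be illustrated with a small sketch. This is purely didactic, not ESX internals: it treats two 4 KB files as "pages", uses md5sum as the hint, and confirms a match bit by bit before declaring the pages shareable.

```shell
# Illustrative sketch only (not ESX internals): content-based matching
# as hash-compare-then-verify on two 4 KB "pages".
dd if=/dev/zero of=page_a bs=4096 count=1 2>/dev/null
dd if=/dev/zero of=page_b bs=4096 count=1 2>/dev/null
hint_a=$(md5sum page_a | cut -d' ' -f1)   # "hint": hash of page content
hint_b=$(md5sum page_b | cut -d' ' -f1)
# Hints match -> do the full bit-by-bit comparison before sharing
if [ "$hint_a" = "$hint_b" ] && cmp -s page_a page_b; then
  echo "pages match: map both to one read-only (COW) copy"
fi
```

A real implementation hashes machine pages in memory, of course; the two-step check avoids sharing on a hash collision.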
11. ESX Server Memory Ballooning
- Guest OS has better information than the VMkernel
- Which pages are stale
- Which pages are unused
- Guest balloon driver installed with VMware Tools
- Artificially induces memory pressure
- VMkernel decides how much memory to reclaim, but the guest OS gets to choose the particular pages
[Diagram: in a VM with VMware Tools installed, the balloon expands to borrow pages (guest may page content out to virtual disk) and shrinks to lend pages back (guest may bring content in from virtual disk)]
12. Oracle Databases on VI3
13. Oracle Database Characteristics
- What it's not:
- It's not a huge I/O consumer
- Most common Oracle databases have modest I/O profiles
- It does not have a small memory footprint
- A largish memory footprint and modest I/O are most common
- Tuning a DB for virtualization is not unique rocket science
- Many standard tuning activities benefit virtualized DBs substantially
14. Capacity Planner Data for Oracle Databases
- Out of 13K physical Oracle DBs considered:
- 65% of systems run on 2-core systems, averaging 5% CPU utilization
- Roughly 4% of systems fully consume more than 2 cores
- Most consume between 2 and 4 GB of RAM
- Static RAM consumption points to a fixed SGA with a fixed number of PGAs
15. Oracle DB Workload Characteristics
- Memory
- Large in-memory footprint (SGA)
- Ensure a good cache hit ratio
- Target 98% or higher
- Small number of processes to protect against TLB misses
- I/O
- Lower than generally assumed (50 IOPS average)
- Depends on the quality of SQL execution plans
- Overall privileged instructions
- I/O, context switches (TLB misses)
16. Making Your DB Ready to Virtualize
- A DB well tuned for physical is good for virtual
- Poor execution plans are even worse in virtual
- Cause poor cache re-use
- Cause additional I/O and hence CPU overhead
- Cause additional impact on storage
- Minimize full table scans
- Keep statistics up to date for the cost-based optimizer (CBO)
- Small number of "db file scattered read" events
- Tune SQL with a high number of physical reads per execute
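Keeping CBO statistics current is a one-call operation in Oracle. As a hedged sketch (the APPUSER schema name is hypothetical; adapt the options to your database), the PL/SQL might look like the script written out below, to be run via SQL*Plus:

```shell
# Hedged sketch: PL/SQL to refresh optimizer statistics for the CBO.
# APPUSER is a hypothetical schema name; run the generated script against
# your database, e.g.:  sqlplus / as sysdba @gather_stats.sql
cat > gather_stats.sql <<'EOF'
BEGIN
  -- DBMS_STATS is Oracle's supported API for optimizer statistics
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'APPUSER',
    cascade => TRUE);   -- also gather statistics on the schema's indexes
END;
/
EOF
```

Fresh statistics directly target the "poor execution plans" problem above, since the CBO can only pick good plans from accurate row counts and histograms.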
17. Virtualization Overhead for Oracle DBs
- Well-tuned DB
- Typically 10% to 20% additional CPU required over physical
- Poorly tuned DB
- Maybe 20% to 30%, or even more
- Depends on SQL execution plans
- User impact of the additional CPU requirements
- Allocate additional CPU per VM to cover the overhead
- Results in minimal impact to user response time
- If the VM's CPU is pegged, expect a substantial impact on users
18. Oracle Physical-to-Virtual Conversion Process
- All but the hungriest DBs will fit in a VM
- Use the smallest VM that will suffice
- I.e., a 1-vCPU VM is more efficient than 2 vCPUs, if it fits
- Current limit of 4 vCPUs per VM
- Will not exist for long
- Limits virtualization of DBs that consume more than 3 physical cores
- Appears to be a small fraction of all DBs
- Less than 5% in our surveys
19. General Best Practices for Virtualizing DBs
- Characterize DBs into three rough groups
- Green DBs: typically 70%
- Ideal candidates for virtualization
- Well tuned, with modest CPU consumption
- Yellow DBs: typically 25%
- Likely candidates for virtualization
- May need some SQL tuning and monitoring to understand CPU and I/O requirements
- Red DBs: typically 5%
- Unlikely candidates until larger VMs are available
- Consume 4 or more physical cores
- Not a lot of SQL tuning left to be done
20. OLTP vs. DSS Oracle Workloads and Virtualization
- OLTP workloads
- Assume frequent small queries
- Should hit an efficient index almost all the time
- Basic diagnostics with an AWR report
- Need few physical reads per execute and no full table scans
- DSS workloads
- Should hit summary tables rather than base tables as much as possible
- Use materialized views to roll up as batch jobs at night
- Daytime load should be index lookups
- May summarize the delta from the summary in real time when necessary
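The nightly roll-up mentioned above is what Oracle materialized views with scheduled complete refresh provide. A hedged sketch follows: all table and column names are illustrative, and the refresh clause is one of several options Oracle supports.

```shell
# Hedged sketch: a nightly roll-up materialized view for a DSS workload.
# Table/column names (orders, product_id, amount) are hypothetical.
cat > rollup_mv.sql <<'EOF'
CREATE MATERIALIZED VIEW daily_sales_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE START WITH SYSDATE NEXT SYSDATE + 1  -- refresh once a day
AS
  SELECT product_id,
         TRUNC(order_date) AS sale_day,
         SUM(amount)       AS total_amount
  FROM   orders
  GROUP  BY product_id, TRUNC(order_date);
EOF
```

Daytime DSS queries can then hit daily_sales_mv (index lookups on a small summary) instead of scanning the base orders table.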
21. Direct I/O
- Guest-OS-level option for bypassing the guest cache
- Uncached access avoids multiple copies of data in memory
- Avoids read/modify/write modulo the file system block size
- Bypasses many file-system-level locks
- Enabling direct I/O on Linux:
  vi init.ora
    filesystemio_options=setall
  Check: iostat 3 (check for I/O sizes matching the DB block size)
22. Asynchronous I/O
- An API for a single-threaded process to launch multiple outstanding I/Os
- Multi-threaded programs could just use multiple threads
- Oracle databases use this extensively
- See aio_read(), aio_write(), etc.
- Enabling AIO on Linux:
  rpm -Uvh aio.rpm
  vi init.ora
    filesystemio_options=setall
  Check: ps -aef | grep dbwr
         strace -p <pid> (look for io_submit() in the syscall trace)
23. Use Large Pages
- Guest-OS-level option to use large MMU pages
- Maps the large SGA region with fewer TLB entries
- Reduces MMU overheads
- Enabling large pages on Linux:
  vi /etc/sysctl.conf (add the following lines)
    vm/nr_hugepages=2048
    vm/hugetlb_shm_group=55
  cat /proc/meminfo | grep Huge
    HugePages_Total:  1024
    HugePages_Free:    940
    Hugepagesize:     2048 kB
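Sizing nr_hugepages follows directly from the SGA size and the huge page size reported in /proc/meminfo. A minimal sketch, assuming the 3 GB SGA from the SwingBench configuration and the 2048 kB huge pages shown above:

```shell
# Hedged sketch: derive vm/nr_hugepages from the SGA size.
# Assumes a 3 GB SGA and 2 MB (2048 kB) huge pages, as in the slides.
sga_mb=3072          # SGA target in MB (3 GB)
hugepage_kb=2048     # from "Hugepagesize: 2048 kB" in /proc/meminfo
pages=$(( sga_mb * 1024 / hugepage_kb ))
echo "vm/nr_hugepages=${pages}"
```

In practice a small cushion above the exact quotient is common, since the SGA must fit entirely in the huge page pool for Oracle to use it.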
24. Linux Versions
- Some older Linux versions use a 1 kHz timer to optimize desktop-style applications
- There is no reason to use such a high timer rate for server-class applications
- The timer interrupt rate on 4-vCPU Linux guests is over 70,000 per second!
- Use RHEL 5.1
- Install the 2.6.18-53.1.4 kernel or later
- Put divider=10 on the end of the kernel line in grub.conf and reboot
- All the RHEL clones (CentOS, Oracle EL, etc.) work the same way
25. Page Sharing and Large Memory Pages
- Large pages in Oracle
- Can increase the efficiency of memory management
- Large pages are not shared
- Expect less reduction in memory consumption with large pages
- Hardware-assisted memory management
- Benefits from the use of large pages
- No other hypervisor uses large pages today
- Expect AMD RVI and Intel EPT to work well with VMware Infrastructure
- And likely not with hypervisors that don't support large pages
26. Page Sharing with Oracle DBs
- Page sharing in VMware Infrastructure
- Reduces memory consumption by sharing common pages
- Common pages include:
- OS-related pages
- Executable-related pages
- I.e., the Oracle executables in each VM running Oracle
- Allows a larger SGA with an overall reduction in memory consumption
27. Oracle Performance Study: SwingBench
- TPC-like transaction processing benchmark
- Order-entry benchmark: order and product processing
- Java client load generator with an Oracle back end
28. SwingBench Configuration
- Database
- # of users: 5,011,872
- # of products: 5,011,872
- DB size: 5.11 GB
- Test
- Test duration: 10 mins
- # of users per run: 30
- Think time: 0
- Transaction mix: new customer 11%, browse products 28%, order products 28%, process orders 5%, browse orders 28%
- Software
- Oracle 11g (11.1.0) on RHEL4 x86_64: SGA 3 GB, PGA 1 GB
- RHEL5 U1, 64-bit, kernel 2.6.18-53.1.13.el5
- Hardware
- Dell PowerEdge 2950, 8 GB memory
- Two dual-core Intel(R) Xeon(R) 5160 processors @ 3.00 GHz, 4 MB cache
- Storage: EMC CLARiiON CX3-40 (30 disks)
- Virtualization
- ESX version 3.5, build 60217
- Number of VMs: 1; 4 vCPUs, 6 GB memory, 16 GB vDisk, 1 vNIC
29. Measuring the Performance of DB Virtualization
30. SwingBench: Oracle Single-DB Scaling
[Chart: Oracle 10gR2 SwingBench throughput (txns/sec) vs. number of virtual CPUs in the database guest]
31. Study: The Oracle DVD-Store Benchmark
- Simulates a large multi-tier application with Oracle as the back-end database
- Simulates DVD store transactions
- Java client tier
- Oracle database
[Diagram: Sun 16-core x4600 M2 running VMware ESX 3.5, with Oracle 10gR2 on RHEL4 Update 4 (64-bit); storage is an EMC CLARiiON CX3-40 with 30 x 15k spindles]
32. Many Large Databases: Scaling Out
- What happens when we consolidate more than one large database per host?
- Increase the number of large databases and measure performance
- Key criteria: throughput and response time
- Scale the DVD-Store benchmark
- From 1 to 7 databases, each in its own VM
- From 2 to 16 physical CPU cores
- From 32 to 256 GB of RAM
33. Large Database Consolidation Study
34. Oracle Performance (Response Time)
35. Java on VI3
36. Java Workload Characteristics
- CPU
- Intensive; threads, not processes
- Memory
- Heavy
- Network
- Tends to be light
- Storage
- Tends to be light
37. Page Sharing and Java
- Pages common to the OS
- Pages common to the JVM
- Common application pages: only where the apps are identical?
- Garbage collection
- Fewer zero pages
- Tends to fill up the assigned memory
- Configurable through the JVM?
38. Page Sharing and Large Memory Pages
- Beware of the combination of large pages and memory over-commitment
- Large pages are not shared
- When sharing is needed, large pages are backed by normal (4K) pages
39. VM Memory Over-commitment with Java
- The JVM is a VM within the OS
- If the balloon driver takes memory from the JVM, access to the JVM heap will force guest swapping
- This is particularly bad because JVM heap access tends to be random, with no locality
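One common mitigation is to size the VM's memory reservation so the balloon driver never needs to reclaim pages backing the heap. A minimal sketch with illustrative numbers (the heap, JVM overhead, and guest OS figures below are assumptions, not recommendations):

```shell
# Hedged sketch: size a VM memory reservation for a Java guest so
# ballooning cannot touch the heap. All figures are illustrative.
heap_mb=2048        # JVM heap (-Xmx)
jvm_overhead_mb=512 # code cache, thread stacks, GC structures (assumed)
os_mb=512           # guest OS working set (assumed)
reservation_mb=$(( heap_mb + jvm_overhead_mb + os_mb ))
echo "Set the VM memory reservation to at least ${reservation_mb} MB"
```

With the reservation covering heap plus overhead, the hypervisor satisfies memory pressure from other VMs' unreserved memory instead of forcing random-access heap pages out to the guest swap device.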
40. Balloon and Swap Interaction
41. Java Config
- Multiple JVMs are known to outperform a single large JVM
- Requires an app with a scale-out model
- Scaling out VMs is a better idea
- Enables DRS to balance the VMs across hosts
42. Java Tuning
- Understand:
- Objects are created and placed in Eden
- After a certain lifetime, they are promoted to the long-lived (tenured) area
- GC sweeps Eden aggressively, and the long-lived area less so
- So:
- Eden sizing impacts memory access
- Increasing the GC thread count raises the memory access profile and the virtualization overhead
- This is another reason to use our model of multiple VMs, each with its own JVM
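The sizing knobs above map to standard HotSpot flags: -Xmn sizes the young generation (Eden plus survivor spaces) and -XX:ParallelGCThreads caps the GC thread count. A hedged sketch follows; the values are illustrative, not recommendations.

```shell
# Hedged sketch: HotSpot flags matching the tuning points above.
# All sizes and counts are illustrative, not recommendations.
JAVA_OPTS="-Xms2048m -Xmx2048m \
  -Xmn512m \
  -XX:ParallelGCThreads=2"
# -Xms/-Xmx equal: fixed heap, no growth-driven paging surprises
# -Xmn: explicit young-generation (Eden + survivors) size
# -XX:ParallelGCThreads: cap GC threads to limit the memory access burst
echo "$JAVA_OPTS"
```

Capping GC threads matters most in the multiple-small-VM model, where each VM has only a few vCPUs and a GC thread per physical core would oversubscribe them.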
43. JRockit
- BEA's JVM
- OS-less
- Optimal out of the box
44. Common App Tuning
- Linux kernel 2.6.22.16 (check)
- RHEL
- SUSE: 250 Hz
- Others
45. Performance Data
46. Java Scalability
- Will Java scale to 16 cores?
47. SwingBench: Oracle Single-DB Scaling
[Chart: Oracle 10gR2 SwingBench throughput (txns/sec) vs. number of virtual CPUs in the database guest]
48. Large Database Consolidation Study
- Scaling to 16 cores and 256 GB of RAM!
49. Oracle Performance (Response Time)
50. Storage Protocols: Sequential Read Throughput
51. Storage Protocols: Sequential Write Throughput
52. VMFS Performance: VMFS versus RDM