Title: Virtual Hierarchies to Support Server Consolidation
1. Virtual Hierarchies to Support Server Consolidation
- Michael Marty and Mark Hill
- University of Wisconsin - Madison
2. What is Server Consolidation?
- Multiple server applications are deployed onto Virtual Machines (VMs) running on a single, more powerful server.
- Feasibility
  - Virtualization Technology (VT): hardware and software support
  - Many-core CMPs: Sun's Niagara (32 threads), Intel's Tera-scale project (100s of tiles)
3. CMP Running Consolidated Servers
4. Characteristics
- Isolating the function of VMs
- Isolating the performance of consolidated servers
- Facilitating dynamic reassignment of VM resources (processors, memory)
- Supporting inter-VM memory sharing (content-based page sharing)
5. How Should the Memory System Be Optimized?
- Minimize average memory access time (AMAT, defined below) by servicing misses within a VM
- Minimize interference among separate VMs to isolate performance
- Facilitate dynamic reassignment of cores, caches, and memory to VMs
- Support inter-VM page sharing
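For reference, AMAT follows the standard definition:

    AMAT = hit time + miss rate × miss penalty

Servicing misses at a tile inside the VM shrinks the miss-penalty term, compared with crossing the chip to a global directory.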
6. Current CMP Memory Systems
- Global broadcast: not viable for such a large number of tiles
- Global directory: forces memory accesses to cross the chip, failing to minimize AMAT or isolate performance
- Statically distributing the directory among tiles: better, but complicates memory allocation, VM reassignment, and scheduling, and limits sharing opportunities
7. DRAM Directory with Directory Cache (DRAM-DIR)
- Main directory in DRAM; directory cache in the memory controller
- Each tile is a potential sharer of the data
- Any miss issues a request to the directory
- Problem 1: fails to minimize AMAT; significant latency to reach the directory, even when the data is nearby
- Problem 2: allows the performance of one VM to affect others, due to interconnect and directory contention
8. Duplicate Tag Directory (TAG-DIR)
- Centrally located
- Fails to minimize AMAT
- Directory contention
- Duplicating tags becomes challenging as the number of cores increases (64 cores with 16-way caches → a 1024-way tag structure)
9. Static Cache Bank Directory (STATIC-BANK-DIR)
- Home tile (determined by block address or page frame number; see the sketch after this list)
- Home tile maintains sharer state
- A local miss queries the home tile
- A replacement from the home tile invalidates all copies
- Fails to minimize AMAT or to isolate VMs (even worse, due to the invalidations)
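To make the mapping concrete, here is a minimal C sketch of static home-tile selection; the 64-tile count and function names are illustrative assumptions, and the two interleavings mirror the slide's block-address and page-frame options.

    /* Static home-tile selection: the home is fixed by address bits,
       independent of which VM the requesting core belongs to. */
    #define NUM_TILES 64

    /* Interleave homes by cache-block address. */
    static unsigned home_tile_by_block(unsigned long long block_addr)
    {
        return (unsigned)(block_addr % NUM_TILES);
    }

    /* Alternative: interleave by page frame number, letting the OS or
       hypervisor influence placement through page allocation. */
    static unsigned home_tile_by_page(unsigned long long paddr,
                                      unsigned page_shift)
    {
        return (unsigned)((paddr >> page_shift) % NUM_TILES);
    }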
10. Solution: Two-Level Virtual Hierarchy
- Level-one directory for intra-VM coherence
  - Minimizes memory access time
  - Isolates performance
- Two alternative global level-two protocols for inter-VM coherence
  - Allow inter-VM sharing due to migration, reconfiguration, and page sharing
  - VHA and VHB
11. Level One: Intra-VM Directory Protocol
- Home tile within the VM
- Who is the home?
  - Not necessarily a power-of-2 number of tiles per VM
  - Dynamic reassignment
- Dynamic home tiles via a VM Config Table (64 entries; see the sketch after this list)
- 64-bit sharer vector for each directory entry
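A minimal sketch of how the 64-entry VM Config Table could choose a dynamic home tile; the struct layout and names are assumptions, not the paper's hardware.

    #include <stdint.h>

    #define CONFIG_ENTRIES 64   /* indexed by 6 bits of the block address */

    /* Per-tile VM Config Table: each entry names a tile of this VM that
       can act as a home. A VM smaller than 64 tiles repeats entries, so
       its size need not be a power of 2; the hypervisor rewrites the
       table on dynamic reassignment. */
    typedef struct {
        uint8_t home[CONFIG_ENTRIES];   /* tile IDs */
    } vm_config_table;

    /* All tiles of a VM hold identical tables, so they agree on homes. */
    static unsigned intra_vm_home(const vm_config_table *t,
                                  uint64_t block_addr)
    {
        return t->home[block_addr % CONFIG_ENTRIES];
    }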
12. Level Two, Option 1: VHA
- Directory in DRAM and directory cache in the memory controller
- Each entry contains a full 64-bit sharer vector (illustrated below)
- Why not a home tile ID? Likely because copies can reside outside the current VM after reassignment or page sharing, so the global directory tracks individual tiles.
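An illustrative C layout for a VHA level-two directory entry; the field names are assumptions, while the full 64-bit vector is from the slide.

    #include <stdint.h>

    /* VHA level-two (global) directory entry for a 64-tile CMP.
       Tracking individual tiles, rather than per-VM home tiles, keeps
       the entry valid when VMs are reassigned or share pages. */
    typedef struct {
        uint64_t sharers;   /* bit i set => tile i may hold a copy */
        uint8_t  state;     /* coherence state, e.g., shared or owned */
    } vha_dir_entry;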
13. Brief Summary
- The level-one intra-VM protocol handles most of the coherence
- The level-two protocol is used only for inter-VM sharing and dynamic reconfiguration of VMs
- Can we reduce the complexity of the level-two protocol?
14. Level Two, Option 2: VHB
- A single bit per memory block tracks whether the block has any cached copies
- A miss requiring inter-VM sharing broadcasts only if the bit is set (see the sketch below)
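A minimal sketch of the VHB level-two decision, assuming one byte of state per memory block and hypothetical helper functions standing in for the interconnect and memory actions:

    #include <stdint.h>

    /* Hypothetical helpers for interconnect/memory actions. */
    void broadcast_to_all_tiles(uint64_t block);
    void respond_from_memory(uint64_t block);

    /* VHB keeps a single "may be cached somewhere" bit per block.
       A level-two miss broadcasts only when the bit is set; otherwise
       memory answers directly and the bit is set for the new copy. */
    void vhb_level2_miss(uint64_t block, uint8_t *cached_bits)
    {
        if (cached_bits[block]) {
            broadcast_to_all_tiles(block);
        } else {
            respond_from_memory(block);
            cached_bits[block] = 1;
        }
    }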
15. Advantages of Level-Two Broadcast
- Reduces protocol complexity: eliminates many transient states
- Enables the level-one protocol to be inexact (see the sketch after this list)
  - Use a limited or coarse-grain sharer vector
  - Or keep no sharer state at all, broadcasting within the VM
  - No home-tile tag needed for private data
  - Victimize a tag without invalidating sharers
  - Access memory on a prediction, without first checking the home tile
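As one example of inexact level-one state, a coarse-grain sharer vector might track groups of four tiles per bit. This encoding is my assumption; it is safe because over-approximation only costs extra invalidations, and the level-two broadcast covers anything the state misses entirely.

    #include <stdint.h>

    #define GROUP_SIZE 4    /* one bit covers 4 tiles: 16 bits for 64 tiles */

    /* Mark a tile's whole group as a possible sharer. */
    static uint16_t mark_sharer(uint16_t vec, unsigned tile)
    {
        return (uint16_t)(vec | (1u << (tile / GROUP_SIZE)));
    }

    /* Conservatively report whether a tile may hold a copy. */
    static int may_share(uint16_t vec, unsigned tile)
    {
        return (vec >> (tile / GROUP_SIZE)) & 1u;
    }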
16. Uncontended L1-to-L1 Sharing Latency
17. Normalized Runtime: Homogeneous
- STATIC-BANK-DIR and VHA consume tag space in static or dynamic home tiles
- VHB: no home-tile tags for private data
18. Memory System Stall Cycles
19. Cycles per Transaction: Mixed Workloads
- VHB: best overall performance, lowest cycles per transaction
- DRAM-DIR: 45-55% hit rate in the 8 MB directory cache (no partitioning)
- STATIC-BANK-DIR: slightly better for OLTP but worse for JBB in mixed1; it allows interference, letting OLTP use other VMs' resources
20. Conclusion
- Future memory systems should be optimized for workload consolidation as well as single workloads
  - Maximize shared-memory accesses serviced within a VM
  - Minimize interference among separate VMs
  - Facilitate dynamic reassignment of resources