Title: Improving the Cache Performance of DBMS
1. Improving the Cache Performance of DBMS
- Instructor: Prof. Walid Aref
- Presented by: Reynold Cheng
- Date: 24th March, 2003
2. Talk Outline
- Study the efficiency of DBMS in utilizing a modern processor
- Identify processor/main memory bottlenecks in database queries
- Discuss a page layout scheme for reducing cache misses
3. DBMS on Main Memory
- As main memory becomes more abundant, a larger portion of the database is kept in main memory
- DBMS accesses less data from disk
- DBMS operations become more main-memory bound
- Main memory, instead of disk, can become the new performance bottleneck
4. Memory Hierarchies
5. Processor/Main Memory Performance Gap
- Main memory speed improves much more slowly than processor speed
- A cache miss is very expensive
- 1 memory access costs roughly 1000 instruction opportunities on a Pentium II Xeon
- Does a modern DBMS suffer from excessive cache misses?
6. A Study on the Processing Time of DBMS [1]
- A modern platform has
- Sophisticated execution hardware and overlapping techniques
- Fast, non-blocking caches and memory
- However, DBMS does not take full advantage of these compared with scientific workloads
- A study of the performance of four commercial database systems on one platform
- Identifies the bottlenecks in processing time when executing simple queries
7. Execution Time Breakdown
- The execution time of a query in a pipelined model can be classified into 4 components
- Useful computation time
- Memory stalls
- Branch mispredictions
- Hardware resource waiting time (e.g., instruction dependency, functional unit unavailability)
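The four-way split above can be sketched as a simple accounting identity. This is an illustrative Python sketch, not the paper's methodology; `breakdown` is a hypothetical helper and the cycle counts below are invented, since real values come from hardware performance counters.

```python
def breakdown(computation, mem_stall, branch_stall, resource_stall):
    """Split total query execution cycles into the four components:
    useful computation, memory stalls, branch mispredictions, and
    hardware resource stalls. Returns each component's fraction."""
    total = computation + mem_stall + branch_stall + resource_stall
    return {
        "computation": computation / total,
        "memory": mem_stall / total,
        "branch": branch_stall / total,
        "resource": resource_stall / total,
    }

# Hypothetical cycle counts for one query execution.
parts = breakdown(computation=40, mem_stall=45, branch_stall=5, resource_stall=10)
```

The fractions necessarily sum to 1; the study's point is that on real DBMS workloads the "memory" share dominates.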
8. Execution Time Breakdown (2)
- The processor stalls at least 50% of the time
- Memory stalls are the major bottleneck
9. Breakdown of Memory Delays
- L2 data cache accesses account for 19%–86% of memory stalls
10. Summary of Major Bottlenecks
- 90% of memory stall time is spent on
- Level 1 instruction cache misses
- Level 2 data cache misses
- Branch misprediction penalty
- Resource-related stall time: the most important component is dependency stalls
- Ongoing research aims at reducing the effect of these bottlenecks
11. Data Page Layout
- Traditional schemes
- Slotted Pages (NSM)
- Vertical Partitioning (DSM)
- Both suffer from data cache miss problems
- Partition Attributes Across (PAX) [2]
- A cache-conscious scheme
- A nice mix of the advantages of NSM and DSM
12. Scheme 1: Slotted Pages
- N-ary Storage Model (NSM)
- Stores table records sequentially
- Used by all commercial DBMSs
- High intra-record locality
- Attributes of the same record are put together
- Poor cache performance
13. NSM: Example Page Layout
14. NSM: Evaluating a Query
15. Problems of NSM
- Each cache miss also brings into the cache the other values stored next to the referenced attribute (e.g., next to age)
- Wastes useful cache space on non-referenced data
- Incurs unnecessary accesses to main memory
- Each record results in a cache miss
- Key problem: inter-record locality is not maintained
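The cache-line arithmetic behind these problems can be sketched under the experimental setup used later in the talk (4 numeric values fit one 32-byte cache line). Both helpers below are hypothetical, for illustration only:

```python
CACHE_LINE_VALUES = 4  # assumption: 4 numeric values per 32-byte cache line

def nsm_lines_touched(n_records, n_attrs):
    """Under NSM a single-attribute scan still fetches every record,
    because the record's other attributes share its cache line(s)."""
    lines_per_record = -(-n_attrs // CACHE_LINE_VALUES)  # ceiling division
    return n_records * lines_per_record

def ideal_lines_touched(n_records):
    """If only the scanned attribute were fetched, consecutive values
    would pack CACHE_LINE_VALUES per line."""
    return -(-n_records // CACHE_LINE_VALUES)
```

For 1000 records of 4 attributes, NSM touches 1000 cache lines where an attribute-contiguous layout would touch 250; the 4x gap is exactly the non-referenced data dragged into the cache.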
16. Scheme 2: Vertical Partitioning
- Decomposition Storage Model (DSM)
- Partitions an n-attribute relation into n single-attribute sub-relations
- Each sub-relation contains 2 attributes
- Logical record id
- Attribute value
- The whole relation can be reconstructed by joining the sub-relations on record id
17. DSM: Example Page Layout
- (Figure: sub-relation pages pairing record ids R1, R2, R3 with single-attribute values such as 1237, 4322, 1563, 7658)
18. Advantages of DSM
- During a single-attribute scan, far fewer cache misses than NSM
- Higher I/O performance
- Inter-record locality
- Values of the same attribute are put together
- Solves the problem suffered by NSM
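The DSM decomposition and its inverse can be sketched in a few lines. This is a minimal illustration with dense record ids; `decompose` and `reconstruct` are hypothetical helpers, not any real DBMS API:

```python
def decompose(relation):
    """DSM: partition an n-attribute relation into n sub-relations of
    (logical record id, attribute value) pairs."""
    n_attrs = len(relation[0])
    return [
        [(rid, rec[a]) for rid, rec in enumerate(relation)]
        for a in range(n_attrs)
    ]

def reconstruct(subrelations):
    """Rebuild full records by joining the sub-relations on record id
    (a simple positional join here, since the ids are dense)."""
    n_records = len(subrelations[0])
    return [
        tuple(sub[rid][1] for sub in subrelations)
        for rid in range(n_records)
    ]
```

Each sub-relation stores one attribute contiguously (good for scans), but `reconstruct` must visit every sub-relation, which is the cost the next slide complains about.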
19. Problems of DSM
- Performance worsens significantly for queries that involve multiple attributes
- The DBMS must join the sub-relations on the record id to reconstruct a record
- The time spent joining sub-relations increases with the number of attributes in the result relation
- High record reconstruction cost
- Intra-record locality is destroyed
20. Summary of NSM and DSM
- Any better scheme to balance these factors?
21. Scheme 3: PAX (Partition Attributes Across) [2]
- (Figure: a PAX page divided into per-attribute minipages)
22. PAX: Evaluating a Query
23. Advantages of PAX
- Maintains high inter-record locality
- Fewer cache misses per record
- Eliminates unnecessary memory accesses
- Low record reconstruction cost
- Keeps a record's fields on one page
- Does not affect I/O performance
- Only changes the page layout
- Orthogonal to other design decisions
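A minimal sketch of the PAX idea, assuming fixed-length numeric attributes (real PAX pages also carry headers and V-minipages for variable-length fields); `pax_page` and `pax_record` are hypothetical helpers:

```python
def pax_page(records):
    """Lay out one page PAX-style: all fields of every record stay on
    the same page, but each attribute's values are grouped into their
    own contiguous minipage."""
    n_attrs = len(records[0])
    return [[rec[a] for rec in records] for a in range(n_attrs)]

def pax_record(page, rid):
    """Record reconstruction stays cheap: one value from each minipage,
    and every minipage lives on the same page."""
    return tuple(minipage[rid] for minipage in page)
```

A single-attribute scan reads only that attribute's minipage (DSM-like cache behavior), while rebuilding a record never leaves the page (NSM-like reconstruction cost).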
24. Comparing NSM, DSM, and PAX
25. Detailed Design of NSM
- One offset per record
- One offset for each variable-length attribute in each record
26. Detailed Design of PAX
27. Experimental Setup
- Relation R is main-memory resident with numeric fields; exactly 4 values fit into a 32-byte cache line
- Simple query:
- select avg(ai)
- from R
- where aj > Lo and aj < Hi
- PII Xeon with Windows NT
- 16KB L1-I, 16KB L1-D, 512KB L2, 512MB RAM
- Xeon's hardware counters are used
- Tested with the Shore Storage Manager (behavior similar to commercial DBMS)
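The probe query above can be sketched over a PAX-style page (here simply a list of per-attribute minipages, as in the earlier layout description); `avg_ai_where_aj` is a hypothetical helper, and the experiments themselves were of course run inside Shore, not in Python:

```python
def avg_ai_where_aj(page, i, j, lo, hi):
    """Evaluate `select avg(ai) from R where aj > lo and aj < hi` over
    a PAX-style page (a list of per-attribute minipages). Only the
    minipages for attributes i and j are touched, so non-referenced
    attributes never enter the cache."""
    ai, aj = page[i], page[j]
    hits = [ai[r] for r, v in enumerate(aj) if lo < v < hi]
    return sum(hits) / len(hits) if hits else None
```

Under NSM the same loop would walk whole records, pulling every attribute through the cache just to test `aj`.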
28. Effect of I/O
- PAX optimizes data cache behavior once the data page is available from disk
- Does not affect I/O performance
- PAX is orthogonal to any additional optimizations that enhance I/O performance
- When I/O latency dominates execution time, the performance of PAX and NSM converges
29. Sensitivity to the Number of Attributes
- DSM is very sensitive to the number of attributes: record reconstruction cost increases with the number of attributes
30. Effect on Data Cache Accesses
- PAX saves 70% of NSM's data cache penalty
- PAX reduces misses at both L1 and L2
- PAX is less sensitive to selectivity
31. Time and Sensitivity Analysis
- PAX incurs 75% less memory penalty than NSM (10% of execution time)
- Execution times converge as the number of attributes increases (because the minipages become narrower)
32. DSS Workload Evaluation
- Compare PAX and NSM running the TPC-H decision support workload
- DSS applications are memory- and computation-intensive, rather than I/O-intensive
- Queries execute projections, selections, aggregates, and joins
- 100MB, 200MB and 500MB TPC-H databases
33. Speedup of TPC-H Queries
- PAX improves performance over NSM even with I/O considered
- Speedup is smaller with larger DB sizes
34. Data Update Issues
- Policy: in-place update
- Variable-length shift if the new data item is larger than the replaced one
- PAX shifts half the data of the V-minipage
- NSM shifts half the data of the page
- Update statement:
- update R
- set ap = ap + b
- where aq > Lo and aq < Hi
35. Effect of Updates on Queries
- PAX always speeds queries up (by 7–17%)
- Performance converges as projectivity increases
- At higher selectivities, write-backs dominate the speedup
36. Conclusions
- We discussed bottlenecks in query execution time
- We studied how the data layout can be redesigned to reduce L2 data cache misses
- NSM suffers from poor cache performance, while DSM has a large record reconstruction cost
- PAX outperforms both schemes on simple queries over memory-resident relations and on the TPC-H workload
- PAX is easy to adopt in commercial DBMSs
37. Other Issues of Main Memory Databases [3]
- Cache-conscious indexes: redesign database indexes to have cache-friendly properties
- Commit: may need to write to disk more often, since main memory loses data in system crashes; a serious bottleneck
- Concurrency control: since locks may not be held long, lock contention may not be as important; lock granularity can be increased
- Recovery: needs more time to recover
38. References
1. A. Ailamaki, D. J. DeWitt, M. D. Hill and D. Wood. DBMSs on a Modern Processor: Where Does Time Go? Proceedings of the 25th VLDB Conference, Scotland, 1999.
2. A. Ailamaki, D. J. DeWitt, M. D. Hill and M. Skounakis. Weaving Relations for Cache Performance. Proceedings of the 27th VLDB Conference, Italy, 2001.
3. H. Garcia-Molina and K. Salem. Main Memory Database Systems: An Overview. TKDE, 4(6), 1992.