Title: Efficient query execution on modern hardware with MonetDB/X100
1. Efficient query execution on modern hardware with MonetDB/X100
2. Thesis outline
- Introduction
- Modern hardware evolution
- Databases on modern hardware
- MonetDB/X100
- Vectorized in-cache processing
- Lightweight RAM-cache compression
- Cooperative scans
3. Introduction
- In the introductory chapters we provide the motivation for this research, as well as the background, where we concentrate on the ways existing database systems use modern hardware.
4. Introduction
- Hardware evolves at a rapid pace
- Database architectures evolve slowly
- Don't exploit the hardware well
- Performance compared to specialized solutions gets worse
- Goal: a new database architecture
- Works well on modern hardware
- Competitive with specialized solutions
5. Modern hardware
- In this section we show how hardware evolution results in significant changes in computer parameters and calls for new design and programming techniques.
6. Modern hardware evolution
- Three focus areas in this thesis
- Super-scalar CPUs
- Hierarchical memory systems
- Hard-disk evolution
- Areas left for future research
- Multi-cores (SMT, CMP)
- Hybrid CPUs (Cell)
- Alternative storage (NAND, MEMS)
7. Super-scalar CPUs
- CPU pipeline evolution
- Pipelined CPUs
- Deeper pipelines
- More computing units
- SIMD
- Main pipeline hazards
- Data-dependent computations
- Branches
- Function calls (esp. dynamic dispatch)
8. Exploiting super-scalar CPUs
- Targeted at computation-intensive tasks
- Main compiler techniques
- Loop-unrolling
- Software-pipelining
- SIMDization
- Function inlining
- Branch hints
- Most need multiple tuples to work on
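As a sketch of why these techniques need multiple tuples to work on, the hypothetical C fragment below contrasts a naive summation (one long add dependency chain) with a 4x-unrolled version that keeps four independent accumulators, letting a super-scalar CPU overlap the additions; function names and the unroll factor are illustrative, not from the thesis.

```c
#include <stddef.h>

/* Naive sum: every addition depends on the previous one, so the CPU
   cannot overlap them even if it has several ALUs. */
long sum_naive(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Unrolled 4x with independent accumulators: the four dependency
   chains can execute in parallel on a super-scalar CPU. */
long sum_unrolled(const int *v, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++) s0 += v[i];  /* remainder */
    return s0 + s1 + s2 + s3;
}
```

The transformation only pays off when n is large, which is exactly why a tuple-at-a-time interface cannot benefit from it.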
9. Hierarchical memory systems
- Increasing relative memory latency
- Cache memories
- Multiple levels with different parameters
- Organization (cache lines, associativity)
- Cache control: prefetching, special ops
- Virtual memory
- TLB (often forgotten)
10. Hard-disk evolution
- Latency vs. bandwidth gap
- Bandwidth improves
- Random-access latency roughly fixed
- Disk vs Processor gap
- CPUs improve faster than both random and
sequential disk bandwidth
11. Databases on modern hardware
- This section presents how databases are traditionally organized and what problems they have in efficiently exploiting modern hardware. We also discuss existing architecture-conscious database techniques.
12. Databases on modern hardware
- Database architecture overview
- Execution layer
- Volcano model
- MonetDB model
- Architecture-conscious improvements
- Storage layer
13. Volcano model
- Tuple-at-a-time pipeline
- Large interpretation overhead
- Context switches, I-cache misses
- Poor CPU utilization
- Originally not cache-conscious
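A minimal sketch of the tuple-at-a-time interface behind these problems: each operator exposes a next() call that produces one tuple per invocation, so every tuple pays for a virtual call and predicate interpretation. The struct and function names here are hypothetical, chosen only to illustrate the model.

```c
#include <stddef.h>

typedef struct Tuple { int cols[4]; } Tuple;

/* Volcano-style operator: next() returns 1 and fills *out,
   or 0 at end-of-stream. */
typedef struct Operator {
    int (*next)(struct Operator *self, Tuple *out);
    struct Operator *child;
    void *state;
} Operator;

/* Leaf: scans an in-memory integer array into column 0. */
typedef struct { const int *data; size_t n, pos; } ScanState;

int scan_next(Operator *self, Tuple *out) {
    ScanState *st = (ScanState *)self->state;
    if (st->pos >= st->n) return 0;
    out->cols[0] = st->data[st->pos++];
    return 1;
}

/* Select: pulls tuples from its child one at a time and forwards
   only those matching the predicate - note the per-tuple function
   call and interpretation overhead this model implies. */
typedef struct { int col; int lo; } SelectState;

int select_next(Operator *self, Tuple *out) {
    SelectState *st = (SelectState *)self->state;
    while (self->child->next(self->child, out)) {
        if (out->cols[st->col] >= st->lo)
            return 1;   /* pass the tuple up the pipeline */
    }
    return 0;           /* end of stream */
}
```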
14. MonetDB model
- Column-at-a-time processing
- Efficient on modern CPUs
- Intermediate result materialization
- Reduced performance
- Main-memory limitation
- Complicated for multi-attribute queries
15. Architecture-conscious DB
- Reducing interpretation overheads
- Improving D-cache behavior
- Efficient computation
- Novel architectures (not in scope of this thesis)
- SMT, CMP
- GPUs, NPUs, Cell
16. Reducing interpretation overheads
- Buffer operator (Zhou)
- Staged databases (Harizopoulos)
- STEPS
- QPipe
- Block-oriented processing (Padmanabhan)
- Implicitly required by many other techniques
17. Improving D-cache behavior
- Cache-conscious data structures
- B-trees (Rao)
- PAX (Ailamaki)
- Cache-conscious processing
- Prefetching (Chen)
- Partitioning (Boncz, Manegold)
18. Efficient computation
- SIMD in databases (Zhou)
- Selection evaluation (Ross)
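The selection-evaluation idea can be sketched as follows: a branching selection mispredicts heavily around 50% selectivity, while a predicated variant always stores the candidate index and lets the comparison result advance the output cursor, removing the branch entirely. This is an illustrative sketch in the spirit of Ross's technique, not code from the thesis.

```c
#include <stddef.h>

/* Branching version: one hard-to-predict branch per value. */
size_t sel_branch(size_t *out, const int *col, size_t n, int lo) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] >= lo) out[k++] = i;
    return k;
}

/* Branch-free (predicated) version: always store the index and add
   the 0/1 comparison result to the output cursor. Its performance is
   independent of selectivity because nothing can mispredict. */
size_t sel_nobranch(size_t *out, const int *col, size_t n, int lo) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k] = i;
        k += (col[i] >= lo);
    }
    return k;
}
```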
19. Architecture-conscious DB: summary
- Lots of techniques
- Typically analyzed in isolation
- Often do not fit the tuple-at-a-time model
- Common requirement: block processing
- Motivates a block-processing-oriented architecture
20. Storage layer
- NSM vs DSM vs PAX
- Heavy use of random-access indices
- Little focus on scan-based queries
21. MonetDB/X100
- The analysis in the previous chapter motivates research into a new database architecture that works efficiently on modern hardware. The main part of the thesis proposes a new approach implemented in MonetDB/X100, first giving a system overview and then describing the proposed techniques in detail.
22. MonetDB/X100
- Introduction
- MonetDB/X100 architecture
- Execution layer
- Storage layer
- Detailed description
- Vectorized in-cache execution
- Lightweight compression
- Cooperative scans
23. MonetDB/X100 history
- MSc on parallel execution in MonetDB
- Suggested pipelined execution to reduce memory traffic on SMPs
- Outcome: vectorized in-cache execution
- High performance in memory
- Problem: too fast for disk
- Shifted research focus to the storage layer
- Bandwidth improvements necessary
24. MonetDB/X100 focus
- In this thesis, we focus on
- Efficient single-threaded execution
- Read-only storage
- We do not cover
- Query parallelization
- Query optimization
- Updates (under development)
25. MonetDB/X100 rationale
- Take the best of two worlds
- Scalability from Volcano
- Raw performance from MonetDB
- Optimize for modern hardware
- Cache-aware design
26. Architecture overview
- Memory-to-cache decompression
27. Execution layer
28. Vectorized execution basics
- Decomposition of processing
- Generic operators
- Type-specific primitives (90% of CPU time)
- Data in vectors
- Allows efficient array-like processing
- Impact of vector size
- Demonstrates the problems of the Volcano and MonetDB approaches
- Motivates staying in the CPU cache
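A minimal sketch of what such a type-specific primitive looks like: a tight, fully typed loop over one cache-resident vector, with all interpretation amortized to once per vector rather than once per tuple. The vector size and primitive name are illustrative assumptions, not the actual X100 source.

```c
#include <stddef.h>

enum { VECTOR_SIZE = 1024 };   /* tuned so vectors fit the CPU cache */

/* Hypothetical X100-style map primitive: because the loop is free of
   per-tuple interpretation, the compiler can unroll, software-pipeline,
   and SIMDize it. Returns the number of values produced. */
size_t map_add_int_col_int_col(int *restrict res,
                               const int *restrict a,
                               const int *restrict b,
                               size_t n) {
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] + b[i];
    return n;
}
```

Compare with the Volcano model: the same addition there would cost a function call and type dispatch per tuple; here those costs occur once per vector of up to VECTOR_SIZE values.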
29. Execution layer details
- Query language: physical algebra
- Type system
- Property system
- Simple operator examples
- Project
- Select and selection vectors
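Selection vectors can be sketched as follows: Select emits a list of qualifying positions instead of copying the qualifying values, and downstream primitives accept that list and touch only the listed positions. The function names below are hypothetical illustrations of the mechanism.

```c
#include <stddef.h>

/* Select primitive: fills sel[] with the positions of values below
   the bound, leaving the input vector untouched (no copying). */
size_t select_lt_int(int *sel, const int *col, size_t n, int hi) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] < hi) sel[k++] = (int)i;
    return k;
}

/* Downstream primitive in "selected" form: it processes only the
   positions listed in the selection vector. */
void map_double_int_sel(int *res, const int *col,
                        const int *sel, size_t k) {
    for (size_t j = 0; j < k; j++)
        res[sel[j]] = 2 * col[sel[j]];
}
```

The benefit is that a selective filter costs only an index list, not a materialized copy of every surviving attribute vector.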
30. Execution layer performance
- 100GB TPC-H in-memory
- Details for TPC-H Q1
- Primitive profiling
- In-memory processing bandwidth
- Hard to scale to disk
31. Storage layer: ColumnBM
- Focus on scan-intensive tasks
- Simple scan-targeted index structures
- Focus on high bandwidth
- Column storage
- Large I/O units
- Proposed improvements
- Lightweight compression
- Cooperative scans
32.
- In the following three chapters we discuss in detail the three major technical contributions of this thesis:
- Vectorized in-cache execution
- Lightweight RAM-cache compression
- Cooperative scans
33. Vectorized in-cache execution
- In this chapter we discuss in detail various aspects of the proposed execution model, and show how a vectorized execution layer can be implemented. This part uses CIDR'05 and DaMoN'06, but adds a lot of new material. Hence, it is the focus of this presentation.
34. Vectorized in-cache execution
- Benefits
- Challenges and solutions
- Vectorized operators: examples
35. ViCE: benefits
- Reduced interpretation overhead
- Good I-cache behavior
- Intermediate processing unit
- Allows many arch-conscious optimizations
- High performance primitives
- Performance easy to profile and improve
- Exploit modern CPU potential
- Potential for future platforms
36. ViCE: challenges and solutions
- Vectorizing relational operators
- Efficient primitive implementation
- Resource management
37. Vectorizing relational operators
- Handling multiple attributes
- Pivot vectors
- Decomposition into independent primitives
- Phase separation (like loop fission)
- Branch separation
38. Efficient primitive implementation
- High primitive specialization
- Heavy macro expansions
- Primitive generation
- Multiple implementations (optional)
- CPU-efficient programming
- Reducing data/control dependencies
- SIMDization
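Primitive generation through heavy macro expansion can be sketched like this: one template expands into a family of type- and operation-specialized loops, so the specialized primitives need not be written by hand. The macro and primitive names are illustrative, not the thesis's actual generator.

```c
#include <stddef.h>

/* One template, many specialized primitives: each expansion yields a
   fully typed tight loop the compiler can optimize aggressively. */
#define DEFINE_MAP_PRIMITIVE(NAME, TYPE, OP)                          \
    size_t NAME(TYPE *res, const TYPE *a, const TYPE *b, size_t n) {  \
        for (size_t i = 0; i < n; i++)                                \
            res[i] = a[i] OP b[i];                                    \
        return n;                                                     \
    }

DEFINE_MAP_PRIMITIVE(map_add_int,  int,  +)
DEFINE_MAP_PRIMITIVE(map_mul_int,  int,  *)
DEFINE_MAP_PRIMITIVE(map_sub_long, long, -)
```

In a real system the same expansion scheme covers the full cross-product of operations and types, which is why generation beats hand-writing.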
39. Resource management
- Choosing a vector size
- Fitting into L1 or L2
- Discussion of memory usage
- Vectors' memory footprint is negligible
40. Vectorized relational operators
- Hash-based operators
- Order-aware operators
- Other operators
- NULL, overflow and variable-width-type handling
41. Hash-based operators
- CPU-efficient hashing (DaMoN'06)
- Cache-efficient partitioning (DaMoN'06)
- Example implementation: hash join
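A toy sketch of a bucket-chained hash join in the vectorized spirit: the build side inserts keys into chained buckets, and the probe side resolves a whole vector of keys per call rather than one tuple at a time. Sizes, the trivial hash function, and all names are simplifying assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { NBUCKETS = 8, MAXKEYS = 64 };   /* toy-sized, power-of-two buckets */

typedef struct {
    int first[NBUCKETS];   /* bucket -> position of newest key, -1 if empty */
    int next[MAXKEYS];     /* position -> next position in chain, -1 ends it */
    int key[MAXKEYS];      /* build-side keys in insertion order */
    int n;
} HashTable;

void ht_init(HashTable *ht) {
    memset(ht->first, -1, sizeof ht->first);
    ht->n = 0;
}

void ht_insert(HashTable *ht, int k) {
    int b = (unsigned)k & (NBUCKETS - 1);   /* trivial hash for the sketch */
    ht->key[ht->n] = k;
    ht->next[ht->n] = ht->first[b];
    ht->first[b] = ht->n++;
}

/* Probe a whole vector of keys: match[i] gets the build-side position
   joining with probe key i, or -1 if there is no partner. */
void ht_probe_vec(const HashTable *ht, const int *probe,
                  int *match, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int pos = ht->first[(unsigned)probe[i] & (NBUCKETS - 1)];
        while (pos != -1 && ht->key[pos] != probe[i])
            pos = ht->next[pos];
        match[i] = pos;
    }
}
```

The position lists produced by the probe can then drive fetch primitives that gather the payload columns, keeping each phase a simple vector loop.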
42. Order-aware operators
- Simple order-aware aggregation
- Merge operators
- Handling multi-attribute keys
- Example implementation: merge join
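A merge join over two sorted key columns can be sketched as a single sequential pass that advances whichever cursor holds the smaller key and records a pair of positions on equality. This toy version assumes unique keys on both sides; the names are illustrative.

```c
#include <stddef.h>

/* Merge join on two sorted, duplicate-free key columns: one pass,
   purely sequential access, no hashing. Writes matching position
   pairs into lpos/rpos and returns the number of matches. */
size_t merge_join(const int *l, size_t ln, const int *r, size_t rn,
                  int *lpos, int *rpos) {
    size_t i = 0, j = 0, k = 0;
    while (i < ln && j < rn) {
        if (l[i] < r[j]) {
            i++;                       /* left key too small, advance */
        } else if (l[i] > r[j]) {
            j++;                       /* right key too small, advance */
        } else {
            lpos[k] = (int)i++;        /* equal keys: emit a match */
            rpos[k] = (int)j++;
            k++;
        }
    }
    return k;
}
```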
43. Other operators
- Other relational operators
- TopN: an example of "do the frequent task fast, accept slow rare cases"
- RadixSort
- Internal operators
- Chunk: adapts the vector size
- Materialize: allows e.g. CartesianProduct
44.
- The next two chapters discuss two techniques that improve data delivery rates for scan-based applications. These chapters represent relatively standalone techniques (though they fit nicely into MonetDB/X100) and are very close to the papers they are based on, hence their discussion here is minimal.
45. Lightweight compression
- ICDE'06, with minimal changes
46. Lightweight compression
- Standard compression too slow
- Proposed approach
- Data-specific algorithms
- CPU-tuned decompression
- Into-cache on-demand decompression
- Faster, allows more data in memory
- Fits MonetDB/X100 perfectly
- Improves TPC-H and IR tasks
- Partially applicable to any database
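To give a flavor of such data-specific, CPU-tuned schemes, here is a simplified frame-of-reference sketch: values are stored as small offsets from a per-block base, and decompression is a branch-free tight loop cheap enough to run on demand when data moves from the RAM cache into the CPU cache. The thesis's actual algorithms (e.g. handling of out-of-range values as patched exceptions) are more elaborate; this toy assumes all offsets fit in 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame-of-reference encode: store each value as an 8-bit offset from
   a per-block base. Assumes (in[i] - base) fits in 0..255. */
void for_encode(uint8_t *out, const int *in, size_t n, int base) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(in[i] - base);
}

/* Decode: a branch-free loop that trades a little CPU work for a 4x
   reduction in memory traffic on this toy layout. */
void for_decode(int *out, const uint8_t *in, size_t n, int base) {
    for (size_t i = 0; i < n; i++)
        out[i] = base + in[i];
}
```

Because the decode loop is so cheap, decompression bandwidth can exceed disk and even memory bandwidth, which is what makes into-cache on-demand decompression pay off.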
47. Cooperative scans
- VLDB'07, with minimal changes
48. Cooperative scans
- Importance of multi-query scans
- Existing optimization techniques
- Static behavior
- Limited gains
- Proposed "relevance" technique
- Dynamic behavior
- Better in both NSM and DSM
- Applicable to any database
49. Outro
- Optional: an evaluation chapter, where we discuss in detail the performance on the TPC-H and TREC Terabyte benchmarks, but it probably won't fit
- Conclusions: by redesigning the database architecture in the execution and storage layers, we allowed databases to make better use of modern hardware and improved their performance.