Transcript and Presenter's Notes

Title: Efficient query execution on modern hardware with MonetDB/X100


1
Efficient query execution on modern hardware
with MonetDB/X100
  • Marcin Zukowski

2
Thesis outline
  • Introduction
  • Modern hardware evolution
  • Databases on modern hardware
  • MonetDB/X100
  • Vectorized in-cache processing
  • Lightweight RAM-cache compression
  • Cooperative scans

3
Introduction
  • In the introductory chapters we provide the
    motivation for this research, as well as the
    background, concentrating on the ways existing
    database systems use modern hardware.

4
Introduction
  • Hardware evolves at a rapid pace
  • Database architectures evolve slowly
  • Don't exploit the hardware well
  • Performance compared to specialized solutions
    gets worse
  • Goal: a new database architecture
  • Works well on modern hardware
  • Competitive with specialized solutions

5
Modern hardware
  • In this section we show how hardware evolution
    results in significant changes in computer
    characteristics and calls for new design and
    programming techniques.

6
Modern hardware evolution
  • Three focus areas in this thesis
  • Super-scalar CPUs
  • Hierarchical memory systems
  • Hard-disk evolution
  • Areas left for future research
  • Multi-cores (SMT, CMP)
  • Hybrid CPUs (Cell)
  • Alternative storage (NAND, MEMS)

7
Super-scalar CPUs
  • CPU pipeline evolution
  • Pipelined CPUs
  • Deeper pipelines
  • More computing units
  • SIMD
  • Main pipeline hazards
  • Data-dependent computations
  • Branches
  • Function calls (esp. dynamic dispatch)

8
Exploiting super-scalar CPUs
  • Targeted at computation-intensive tasks
  • Main compiler techniques
  • Loop-unrolling
  • Software-pipelining
  • SIMDization
  • Function inlining
  • Branch hints
  • Most need multiple tuples to work on (see the sketch below)
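
The compiler techniques above only pay off when a primitive receives many tuples per call. A minimal sketch of the contrast, with illustrative function names (not from the thesis):

    #include <stddef.h>

    /* One tuple per call: the call boundary blocks unrolling and SIMD. */
    double tax_one(double price) { return price * 0.19; }

    /* Many tuples per call: a simple, independent loop the compiler can
     * unroll, software-pipeline and map onto SIMD units. */
    void tax_vector(const double *price, double *tax, size_t n) {
        for (size_t i = 0; i < n; i++)
            tax[i] = price[i] * 0.19;
    }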

9
Hierarchical memory systems
  • Increasing relative memory latency
  • Cache memories
  • Multiple levels with different parameters
  • Organization (cache lines, associativity)
  • Cache control: prefetching, special ops
  • Virtual memory
  • TLB (often forgotten)

10
Hard-disk evolution
  • Latency vs bandwidth gap
  • Bandwidth improves
  • Random-access latency stays roughly fixed
  • Disk vs Processor gap
  • CPUs improve faster than both random and
    sequential disk bandwidth

11
Databases on modern hardware
  • This section presents how databases are
    traditionally organized and what problems they
    have with efficiently exploiting modern hardware.
    We also discuss existing architecture-conscious
    database techniques.

12
Databases on modern hardware
  • Database architecture overview
  • Execution layer
  • Volcano model
  • MonetDB model
  • Architecture-conscious improvements
  • Storage layer

13
Volcano model
  • Tuple-at-a-time pipeline (iterator sketch below)
  • Large interpretation overhead
  • Context switches, I-cache misses
  • Poor CPU utilization
  • Originally not cache-conscious
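
As a reference point, a minimal sketch of the classic Volcano iterator interface (open/next/close); the struct layout is illustrative, only the one-tuple-per-next() contract matters:

    #include <stdbool.h>

    typedef struct Tuple    Tuple;     /* opaque single-tuple record */
    typedef struct Operator Operator;

    struct Operator {
        void (*open)(Operator *self);
        bool (*next)(Operator *self, Tuple *out);  /* one tuple per call */
        void (*close)(Operator *self);
        Operator *child;     /* producer in the pipeline */
        void     *state;     /* operator-private state */
    };

Every produced tuple goes through such a (typically virtual) next() call, which is where the per-tuple interpretation overhead and I-cache misses come from.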

14
MonetDB model
  • Column-at-a-time processing (sketch below)
  • Efficient on modern CPUs
  • Intermediate result materialization
  • Reduced performance
  • Main-memory limitation
  • Complicated for multi-attribute queries
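
For contrast, a minimal sketch of column-at-a-time processing with full materialization, assuming simple integer columns (illustrative, not MonetDB's actual BAT interface):

    #include <stddef.h>
    #include <stdlib.h>

    /* Multiply an entire column by a constant; the whole intermediate
     * result is materialized before the next operator runs, so large
     * intermediates flow through main memory instead of the cache. */
    int *col_mul_const(const int *col, size_t n, int c) {
        int *res = malloc(n * sizeof(int));
        for (size_t i = 0; i < n; i++)
            res[i] = col[i] * c;
        return res;
    }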

15
Architecture-conscious DB
  • Reducing interpretation overheads
  • Improving D-cache behavior
  • Efficient computation
  • Novel architectures (not in scope of this thesis)
  • SMT, CMP
  • GPUs, NPUs, Cell

16
Reducing interpretation overheads
  • Buffer operator [Zhou]
  • Staged databases [Harizopoulos]
  • STEPS
  • Q-Pipe
  • Block-oriented processing [Padmanabhan]
  • Implicitly required by many other techniques

17
Improving D-cache behavior
  • Cache-conscious data structures
  • B-Trees [Rao]
  • PAX [Ailamaki]
  • Cache-conscious processing
  • Prefetching [Chen]
  • Partitioning [Boncz/Manegold]

18
Efficient computation
  • SIMD in databases [Zhou]
  • Selection evaluation [Ross]

19
Arch-conscious DB - summary
  • Lots of techniques
  • Typically, analyzed in isolation
  • Often, do not fit the tuple-at-a-time model
  • Common requirement: block processing
  • Motivates a block-processing-oriented architecture

20
Storage layer
  • NSM vs DSM vs PAX
  • Heavy use of random-access indices
  • Little focus on scan-based queries

21
MonetDB/X100
  • The analysis in the previous chapter motivates
    research into a new database architecture that
    works efficiently on modern hardware. The main
    part of the thesis proposes a new approach
    implemented in MonetDB/X100, first giving a
    system overview and then describing some proposed
    techniques in detail.

22
MonetDB/X100
  • Introduction
  • MonetDB/X100 architecture
  • Execution layer
  • Storage layer
  • Detailed description
  • Vectorized in-cache execution
  • Lightweight compression
  • Cooperative scans

23
MonetDB/X100 history
  • MSc work on parallel execution in MonetDB
  • Suggested pipelined execution to reduce memory
    traffic on SMPs
  • Outcome: vectorized in-cache execution
  • High performance in memory
  • Problem: too fast for disk
  • Shifted research focus to the storage layer
  • Bandwidth improvements necessary

24
MonetDB/X100 focus
  • In this thesis, we focus on
  • Efficient single-threaded execution
  • Read-only storage
  • We do not address
  • Query parallelization
  • Query optimization
  • Updates (under development)

25
MonetDB/X100 rationale
  • Take the best of two worlds
  • Scalability from Volcano
  • Raw performance from MonetDB
  • Optimize for modern hardware
  • Cache-aware design

26
Architecture overview
  • Memory-to-cache decompression
27
Execution layer
28
Vectorized execution basics
  • Decomposition of processing
  • Generic operators
  • Type-specific primitives (90% of CPU time)
  • Data in vectors
  • Allows efficient array-like processing (primitive
    sketch below)
  • Impact of vector size
  • Demonstrates the problems of the Volcano and
    MonetDB approaches
  • Motivates staying in the CPU cache
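
A minimal sketch of such a type-specific primitive, in the spirit of the X100 primitives but with an illustrative signature: a generic operator calls it on small vectors (on the order of a thousand values) that stay resident in the CPU cache.

    #include <stddef.h>

    /* Element-wise multiplication of two float vectors. The loop is
     * free of branches and function calls, so the compiler can unroll
     * and SIMDize it. */
    size_t map_mul_flt_col_flt_col(size_t n, float *res,
                                   const float *a, const float *b) {
        for (size_t i = 0; i < n; i++)
            res[i] = a[i] * b[i];
        return n;
    }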

29
Execution layer details
  • Query language: physical algebra
  • Type system
  • Property system
  • Simple operator examples
  • Project
  • Select and selection vectors (sketch below)
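
A minimal sketch of how selection vectors work, with illustrative identifiers: Select fills a vector of qualifying positions instead of copying data, and later primitives take that vector as an optional argument.

    #include <stddef.h>
    #include <stdint.h>

    /* Produce the positions of values greater than a constant. */
    size_t sel_gt_int_col_int_val(size_t n, uint32_t *sel,
                                  const int32_t *col, int32_t val) {
        size_t k = 0;
        for (size_t i = 0; i < n; i++)
            if (col[i] > val)
                sel[k++] = (uint32_t)i;
        return k;                      /* number of selected positions */
    }

    /* A map primitive applied only to the selected positions. */
    void map_add_sel(size_t k, const uint32_t *sel, int32_t *res,
                     const int32_t *a, const int32_t *b) {
        for (size_t i = 0; i < k; i++) {
            uint32_t j = sel[i];
            res[j] = a[j] + b[j];
        }
    }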

30
Execution layer performance
  • 100GB TPC-H in-memory
  • Details for TPC-H Q1
  • Primitive profiling
  • In-memory processing bandwidth
  • Hard to scale to disk

31
Storage layer - ColumnBM
  • Focus on scan-intensive tasks
  • Simple scan-targeted index structures
  • Focus on high bandwidth
  • Column storage
  • Large I/O units
  • Proposed improvements
  • Lightweight compression
  • Cooperative scans

32
  • In the following three chapters we discuss in
    detail the three major technical contributions of
    this thesis
  • Vectorized in-cache execution
  • Lightweight RAM-cache compression
  • Cooperative scans

33
Vectorized in-cache execution
  • In this chapter we discuss in detail various
    aspects of the proposed execution model, and show
    how a vectorized execution layer can be
    implemented. This part uses the CIDR'05 and
    DaMoN'06 papers, but adds a lot of new material.
    Hence, it is the focus of this presentation.

34
Vectorized in-cache execution
  • Benefits
  • Challenges and solutions
  • Vectorized operators - examples

35
ViCE Benefits
  • Reduced interpretation overhead
  • Good I-cache behavior
  • Intermediate processing unit
  • Allows many arch-conscious optimizations
  • High performance primitives
  • Performance easy to profile and improve
  • Exploit modern CPU potential
  • Potential for future platforms

36
ViCE Challenges and solutions
  • Vectorizing relational operators
  • Efficient primitive implementation
  • Resource management

37
Vectorizing relational operators
  • Handling multiple attributes
  • Pivot vectors
  • Decomposition into independent primitives
  • Phase separation (like loop fission; see the sketch below)
  • Branch separation
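
A minimal sketch of phase separation for vectorized hash aggregation, under strong simplifying assumptions (one vector of at most 1024 tuples, a power-of-two table, collision handling omitted for brevity): each phase is its own simple loop, instead of one tuple-at-a-time loop that hashes, probes and updates in a single pass.

    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_BITS 10                    /* 1024 aggregate slots */

    void hash_agg_sum(size_t n, const int32_t *key, const int64_t *val,
                      int64_t *sums /* (1 << TABLE_BITS) entries */) {
        uint32_t pos[1024];                  /* per-vector work area, n <= 1024 */
        /* Phase 1: hash all keys of the vector. */
        for (size_t i = 0; i < n; i++)
            pos[i] = ((uint32_t)key[i] * 2654435761u) >> (32 - TABLE_BITS);
        /* Phase 2: update the aggregates in a second simple loop. */
        for (size_t i = 0; i < n; i++)
            sums[pos[i]] += val[i];
    }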

38
Efficient primitive implementation
  • High primitive specialization
  • Heavy macro expansions
  • Primitive generation (macro sketch below)
  • Multiple implementations (optional)
  • CPU-efficient programming
  • Reducing data/control dependencies
  • SIMDization
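
A hypothetical sketch of primitive generation through macro expansion (the real generator and naming scheme in MonetDB/X100 differ): one template expands into a family of type-specialized primitives, so no per-value type interpretation remains at run time.

    #include <stddef.h>
    #include <stdint.h>

    #define DEFINE_MAP_BINARY(NAME, TYPE, OP)                           \
    size_t NAME(size_t n, TYPE *res, const TYPE *a, const TYPE *b) {    \
        for (size_t i = 0; i < n; i++)                                  \
            res[i] = a[i] OP b[i];                                      \
        return n;                                                       \
    }

    /* Expands into three specialized, compiler-friendly primitives. */
    DEFINE_MAP_BINARY(map_add_int_col_int_col, int32_t, +)
    DEFINE_MAP_BINARY(map_add_dbl_col_dbl_col, double,  +)
    DEFINE_MAP_BINARY(map_mul_int_col_int_col, int32_t, *)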

39
Resource management
  • Choosing a vector size
  • Fitting into L1 or L2
  • Discussion of memory usage
  • Vector memory is negligible (see the estimate below)
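
A back-of-the-envelope estimate of the vector size, with illustrative numbers: keep all vectors that are alive in the query plan inside the target cache level.

    #include <stdio.h>
    #include <stddef.h>

    int main(void) {
        size_t cache_budget = 64 * 1024;  /* assumed cache budget in bytes */
        size_t live_vectors = 16;         /* vectors alive in the plan    */
        size_t value_width  = 8;          /* bytes per value              */
        size_t vector_size  = cache_budget / (live_vectors * value_width);
        printf("vector size: %zu values\n", vector_size);   /* 512 */
        return 0;
    }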

40
Vectorized relational operators
  • Hash-based operators
  • Order-aware operators
  • Other operators
  • NULL, overflow and variable-width-type handling

41
Hash-based operators
  • CPU-efficient hashing [DaMoN'06]
  • Cache-efficient partitioning [DaMoN'06]
  • Example implementation: hash-join (probe sketch
    below)
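
A hypothetical sketch of a vectorized probe over a bucket-chained hash table (data layout and names are illustrative): all keys of a vector are hashed first, then chain heads are fetched, then the remaining unresolved positions are tracked in a selection vector while following the chains.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t *bucket;   /* chain heads; 0 = empty, else build index + 1 */
        uint32_t *next;     /* chain links, same encoding                   */
        int32_t  *key;      /* build-side keys                              */
        uint32_t  mask;     /* table size - 1 (power of two)                */
    } HashTable;

    /* For each probe key, store build index + 1 of the match (0 = miss). */
    void hash_probe(const HashTable *ht, size_t n,
                    const int32_t *probe_key, uint32_t *match) {
        uint32_t todo[1024];                 /* selection vector, n <= 1024 */
        size_t k = 0;
        /* Phase 1: hash all keys and fetch the chain heads. */
        for (size_t i = 0; i < n; i++) {
            uint32_t h = ((uint32_t)probe_key[i] * 2654435761u) & ht->mask;
            match[i] = ht->bucket[h];
            if (match[i]) todo[k++] = (uint32_t)i;
        }
        /* Phase 2: follow chains for positions not yet resolved. */
        while (k > 0) {
            size_t k2 = 0;
            for (size_t j = 0; j < k; j++) {
                uint32_t i = todo[j];
                if (ht->key[match[i] - 1] != probe_key[i]) {
                    match[i] = ht->next[match[i] - 1];
                    if (match[i]) todo[k2++] = i;
                }
            }
            k = k2;
        }
    }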

42
Order-aware operators
  • Simple order-aware aggregation (sketch below)
  • Merge operators
  • Handling multi-attribute keys
  • Example implementation: merge-join
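
A minimal sketch of order-aware aggregation on an already sorted key column (state layout and callback are illustrative): groups are detected by comparing with the previous key, so no hash table is needed; the state survives across vectors, and the caller emits the last group when the stream ends.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        bool    has_group;
        int32_t cur_key;
        int64_t cur_sum;
    } OrdAggState;

    /* Process one vector; emit (key, sum) when a group closes. */
    void ord_agg_sum(OrdAggState *st, size_t n,
                     const int32_t *key, const int64_t *val,
                     void (*emit)(int32_t key, int64_t sum)) {
        for (size_t i = 0; i < n; i++) {
            if (!st->has_group || key[i] != st->cur_key) {
                if (st->has_group)
                    emit(st->cur_key, st->cur_sum);
                st->cur_key   = key[i];
                st->cur_sum   = 0;
                st->has_group = true;
            }
            st->cur_sum += val[i];
        }
    }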

43
Other operators
  • Other relational operators
  • TopN: an example of "do the frequent task fast,
    accept slow rare cases"
  • RadixSort
  • Internal operators
  • Chunk: adapts the vector size
  • Materialize: allows e.g. CartesianProduct

44
  • The next two chapters discuss two techniques
    that improve data delivery rates for scan-based
    applications. These chapters represent relatively
    standalone techniques (though they fit nicely into
    MonetDB/X100) and stay very close to the papers
    they are based on, hence their discussion here is
    minimal.

45
Lightweight compression
  • Based on the ICDE'06 paper, with minimal changes

46
Lightweight compression
  • Standard compression too slow
  • Proposed approach
  • Data-specific algorithms
  • CPU-tuned decompression (see the sketch below)
  • Into-cache, on-demand decompression
  • Faster, allows more data in memory
  • Fits MonetDB/X100 perfectly
  • Improves TPC-H and IR tasks
  • Partially applicable to any database
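
A minimal sketch in the spirit of the proposed schemes (the thesis develops PFOR and related algorithms): decompression is a tight, branch-free loop that expands small fixed-width codes against a frame-of-reference base. This simplified version uses byte-wide codes and omits the exception mechanism for out-of-range values.

    #include <stddef.h>
    #include <stdint.h>

    /* Decode n values: out[i] = base + code[i]. */
    void for_decode(size_t n, int32_t base,
                    const uint8_t *code, int32_t *out) {
        for (size_t i = 0; i < n; i++)
            out[i] = base + (int32_t)code[i];
    }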

47
Cooperative scans
  • Based on the VLDB'07 paper, with minimal changes

48
Cooperative scans
  • Importance of multi-query scans
  • Existing optimization techniques
  • Static behavior
  • Limited gains
  • Proposed relevance technique (scheduler sketch
    below)
  • Dynamic behavior
  • Better in both NSM and DSM
  • Applicable to any database
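
A hypothetical sketch of a relevance-style chunk scheduler: instead of every query scanning in table order, the scheduler picks the chunk wanted by the most active scans, preferring chunks that are already cached. The scoring rule and data layout are simplified illustrations of the idea, not the exact policy from the thesis.

    #include <stddef.h>
    #include <stdbool.h>

    #define MAX_CHUNKS  1024
    #define MAX_QUERIES 64

    typedef struct {
        bool cached[MAX_CHUNKS];               /* chunk resident in the buffer? */
        bool needed[MAX_QUERIES][MAX_CHUNKS];  /* query q still needs chunk c?  */
        int  n_chunks, n_queries;
    } ScanState;

    /* Return the next chunk to deliver, or -1 if nothing is needed. */
    int pick_next_chunk(const ScanState *s) {
        int best = -1, best_score = -1;
        for (int c = 0; c < s->n_chunks; c++) {
            int interested = 0;
            for (int q = 0; q < s->n_queries; q++)
                interested += s->needed[q][c];
            if (interested == 0)
                continue;
            int score = 2 * interested + (s->cached[c] ? 1 : 0);
            if (score > best_score) { best_score = score; best = c; }
        }
        return best;
    }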

49
Outro
  • Optional: an evaluation chapter, where we discuss
    in detail the performance of the TPC-H and Terabyte
    TREC benchmarks, but it probably won't fit
  • Conclusions: by redesigning the database
    architecture in the execution and storage layers,
    we allow databases to make better use of modern
    hardware and improve their performance
