Title: Efficient query execution on modern hardware with MonetDB/X100
1. Efficient query execution on modern hardware with MonetDB/X100
2. Thesis outline
- Introduction
- Modern hardware evolution
- Databases on modern hardware
- MonetDB/X100
- Vectorized in-cache processing
- Lightweight RAM-cache compression
- Cooperative scans
3. Introduction
- In the introductory chapters we provide the motivation for this research, as well as the background, where we concentrate on the ways existing database systems use modern hardware.
4. Introduction
- Hardware evolves at a rapid pace
- Database architectures evolve slowly
- Don't exploit the hardware well
- Performance compared to specialized solutions gets worse
- Goal: a new database architecture
- Works well on modern hardware
- Competitive with specialized solutions
5. Modern hardware
- In this section we show how hardware evolution results in significant changes in computer parameters and calls for new design and programming techniques.
6. Modern hardware evolution
- Three focus areas in this thesis
- Super-scalar CPUs
- Hierarchical memory systems
- Hard-disk evolution
- Areas left for future research
- Multi-cores (SMT, CMP)
- Hybrid CPUs (Cell)
- Alternative storage (NAND, MEMS)
7. Super-scalar CPUs
- CPU pipeline evolution
- Pipelined CPUs
- Deeper pipelines
- More computing units
- SIMD
- Main pipeline hazards
- Data-dependent computations
- Branches
- Function calls (esp. dynamic dispatch)
8. Exploiting super-scalar CPUs
- Targeted at computation-intensive tasks
- Main compiler techniques
- Loop-unrolling
- Software-pipelining
- SIMDization
- Function inlining
- Branch hints
- Most need multiple tuples to work on
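As a sketch of why these techniques need multiple tuples to work on, the hypothetical C fragment below contrasts a naive summation (one long add dependency chain) with a 4x-unrolled version that keeps four independent accumulators, letting a super-scalar CPU overlap the additions; function names and the unroll factor are illustrative, not from the thesis.

```c
#include <stddef.h>

/* Naive sum: every addition depends on the previous one, so the CPU
   cannot overlap them even if it has several ALUs. */
long sum_naive(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Unrolled 4x with independent accumulators: the four dependency
   chains can execute in parallel on a super-scalar CPU. */
long sum_unrolled(const int *v, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++) s0 += v[i];  /* remainder */
    return s0 + s1 + s2 + s3;
}
```

The transformation only pays off when n is large, which is exactly why a tuple-at-a-time interface cannot benefit from it.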
9. Hierarchical memory systems
- Increasing relative memory latency
- Cache memories
- Multiple levels with different parameters
- Organization (cache lines, associativity)
- Cache control: prefetching, special ops
- Virtual memory
- TLB (often forgotten)
10. Hard-disk evolution
- Latency vs. bandwidth gap
- Bandwidth improves
- Random-access latency roughly fixed
- Disk vs Processor gap
- CPUs improve faster than both random and
sequential disk bandwidth
11. Databases on modern hardware
- This section presents how databases are traditionally organized and what problems they have in efficiently exploiting modern hardware. We also discuss existing architecture-conscious database techniques.
12. Databases on modern hardware
- Database architecture overview
- Execution layer
- Volcano model
- MonetDB model
- Architecture-conscious improvements
- Storage layer
13. Volcano model
- Tuple-at-a-time pipeline
- Large interpretation overhead
- Context switches, I-cache misses
- Poor CPU utilization
- Originally not cache-conscious
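A minimal sketch of the tuple-at-a-time interface behind these problems: each operator exposes a next() call that produces one tuple per invocation, so every tuple pays for a virtual call and predicate interpretation. The struct and function names here are hypothetical, chosen only to illustrate the model.

```c
#include <stddef.h>

typedef struct Tuple { int cols[4]; } Tuple;

/* Volcano-style operator: next() returns 1 and fills *out,
   or 0 at end-of-stream. */
typedef struct Operator {
    int (*next)(struct Operator *self, Tuple *out);
    struct Operator *child;
    void *state;
} Operator;

/* Leaf: scans an in-memory integer array into column 0. */
typedef struct { const int *data; size_t n, pos; } ScanState;

int scan_next(Operator *self, Tuple *out) {
    ScanState *st = (ScanState *)self->state;
    if (st->pos >= st->n) return 0;
    out->cols[0] = st->data[st->pos++];
    return 1;
}

/* Select: pulls tuples from its child one at a time and forwards
   only those matching the predicate - note the per-tuple function
   call and interpretation overhead this model implies. */
typedef struct { int col; int lo; } SelectState;

int select_next(Operator *self, Tuple *out) {
    SelectState *st = (SelectState *)self->state;
    while (self->child->next(self->child, out)) {
        if (out->cols[st->col] >= st->lo)
            return 1;   /* pass the tuple up the pipeline */
    }
    return 0;           /* end of stream */
}
```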
14. MonetDB model
- Column-at-a-time processing
- Efficient on modern CPUs
- Intermediate result materialization
- Reduced performance
- Main-memory limitation
- Complicated for multi-attribute queries
15. Architecture-conscious DB
- Reducing interpretation overheads
- Improving D-cache behavior
- Efficient computation
- Novel architectures (not in scope of this thesis)
- SMT, CMP
- GPUs, NPUs, Cell
16. Reducing interpretation overheads
- Buffer operator (Zhou)
- Staged databases (Harizopoulos)
- STEPS
- QPipe
- Block-oriented processing (Padmanabhan)
- Implicitly required by many other techniques
17. Improving D-cache behavior
- Cache-conscious data structures
- B-trees (Rao)
- PAX (Ailamaki)
- Cache-conscious processing
- Prefetching (Chen)
- Partitioning (Boncz, Manegold)
18. Efficient computation
- SIMD in databases (Zhou)
- Selection evaluation (Ross)
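The selection-evaluation idea can be sketched as follows: a branching selection mispredicts heavily around 50% selectivity, while a predicated variant always stores the candidate index and lets the comparison result advance the output cursor, removing the branch entirely. This is an illustrative sketch in the spirit of Ross's technique, not code from the thesis.

```c
#include <stddef.h>

/* Branching version: one hard-to-predict branch per value. */
size_t sel_branch(size_t *out, const int *col, size_t n, int lo) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] >= lo) out[k++] = i;
    return k;
}

/* Branch-free (predicated) version: always store the index and add
   the 0/1 comparison result to the output cursor. Its performance is
   independent of selectivity because nothing can mispredict. */
size_t sel_nobranch(size_t *out, const int *col, size_t n, int lo) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k] = i;
        k += (col[i] >= lo);
    }
    return k;
}
```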
19. Architecture-conscious DB: summary
- Lots of techniques
- Typically analyzed in isolation
- Often do not fit the tuple-at-a-time model
- Common requirement: block processing
- Motivates a block-processing-oriented architecture
20. Storage layer
- NSM vs DSM vs PAX
- Heavy use of random-access indices
- Little focus on scan-based queries
21. MonetDB/X100
- The analysis in the previous chapter motivates research into a new database architecture that works efficiently on modern hardware. The main part of the thesis proposes a new approach implemented in MonetDB/X100, first giving a system overview and then describing the proposed techniques in detail.
22. MonetDB/X100
- Introduction
- MonetDB/X100 architecture
- Execution layer
- Storage layer
- Detailed description
- Vectorized in-cache execution
- Lightweight compression
- Cooperative scans
23. MonetDB/X100 history
- MSc on parallel execution in MonetDB
- Suggested pipelined execution to reduce memory traffic on SMPs
- Outcome: vectorized in-cache execution
- High performance in memory
- Problem: too fast for disk
- Shifted research focus to the storage layer
- Bandwidth improvements necessary
24. MonetDB/X100 focus
- In this thesis, we focus on
- Efficient single-threaded execution
- Read-only storage
- We do not cover
- Query parallelization
- Query optimization
- Updates (under development)
25. MonetDB/X100 rationale
- Take the best of two worlds
- Scalability from Volcano
- Raw performance from MonetDB
- Optimize for modern hardware
- Cache-aware design
26. Architecture overview
- Memory-to-cache decompression
27. Execution layer
28. Vectorized execution basics
- Decomposition of processing
- Generic operators
- Type-specific primitives (90% of CPU time)
- Data in vectors
- Allows efficient array-like processing
- Impact of vector size
- Demonstrates the problems of the Volcano and MonetDB approaches
- Motivates staying in the CPU cache
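A minimal sketch of what such a type-specific primitive looks like: a tight, fully typed loop over one cache-resident vector, with all interpretation amortized to once per vector rather than once per tuple. The vector size and primitive name are illustrative assumptions, not the actual X100 source.

```c
#include <stddef.h>

enum { VECTOR_SIZE = 1024 };   /* tuned so vectors fit the CPU cache */

/* Hypothetical X100-style map primitive: because the loop is free of
   per-tuple interpretation, the compiler can unroll, software-pipeline,
   and SIMDize it. Returns the number of values produced. */
size_t map_add_int_col_int_col(int *restrict res,
                               const int *restrict a,
                               const int *restrict b,
                               size_t n) {
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] + b[i];
    return n;
}
```

Compare with the Volcano model: the same addition there would cost a function call and type dispatch per tuple; here those costs occur once per vector of up to VECTOR_SIZE values.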
29. Execution layer details
- Query language: physical algebra
- Type system
- Property system
- Simple operator examples
- Project
- Select and selection vectors
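Selection vectors can be sketched as follows: Select emits a list of qualifying positions instead of copying the qualifying values, and downstream primitives accept that list and touch only the listed positions. The function names below are hypothetical illustrations of the mechanism.

```c
#include <stddef.h>

/* Select primitive: fills sel[] with the positions of values below
   the bound, leaving the input vector untouched (no copying). */
size_t select_lt_int(int *sel, const int *col, size_t n, int hi) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] < hi) sel[k++] = (int)i;
    return k;
}

/* Downstream primitive in "selected" form: it processes only the
   positions listed in the selection vector. */
void map_double_int_sel(int *res, const int *col,
                        const int *sel, size_t k) {
    for (size_t j = 0; j < k; j++)
        res[sel[j]] = 2 * col[sel[j]];
}
```

The benefit is that a selective filter costs only an index list, not a materialized copy of every surviving attribute vector.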
30. Execution layer performance
- 100GB TPC-H in-memory
- Details for TPC-H Q1
- Primitive profiling
- In-memory processing bandwidth
- Hard to scale to disk
31. Storage layer: ColumnBM
- Focus on scan-intensive tasks
- Simple scan-targeted index structures
- Focus on high bandwidth
- Column storage
- Large I/O units
- Proposed improvements
- Lightweight compression
- Cooperative scans
32.
- In the following three chapters we discuss in detail the three major technical contributions of this thesis:
- Vectorized in-cache execution
- Lightweight RAM-cache compression
- Cooperative scans
33. Vectorized in-cache execution
- In this chapter we discuss in detail various aspects of the proposed execution model, and show how a vectorized execution layer can be implemented. This part uses CIDR'05 and DaMoN'06, but adds a lot of new material. Hence, it is the focus of this presentation.
34. Vectorized in-cache execution
- Benefits
- Challenges and solutions
- Vectorized operators: examples
35. ViCE: benefits
- Reduced interpretation overhead
- Good I-cache behavior
- Intermediate processing unit
- Allows many arch-conscious optimizations
- High performance primitives
- Performance easy to profile and improve
- Exploit modern CPU potential
- Potential for future platforms
36. ViCE: challenges and solutions
- Vectorizing relational operators
- Efficient primitive implementation
- Resource management
37. Vectorizing relational operators
- Handling multiple attributes
- Pivot vectors
- Decomposition into independent primitives
- Phase separation (like loop fission)
- Branch separation
38. Efficient primitive implementation
- High primitive specialization
- Heavy macro expansions
- Primitive generation
- Multiple implementations (optional)
- CPU-efficient programming
- Reducing data/control dependencies
- SIMDization
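Primitive generation through heavy macro expansion can be sketched like this: one template expands into a family of type- and operation-specialized loops, so the specialized primitives need not be written by hand. The macro and primitive names are illustrative, not the thesis's actual generator.

```c
#include <stddef.h>

/* One template, many specialized primitives: each expansion yields a
   fully typed tight loop the compiler can optimize aggressively. */
#define DEFINE_MAP_PRIMITIVE(NAME, TYPE, OP)                          \
    size_t NAME(TYPE *res, const TYPE *a, const TYPE *b, size_t n) {  \
        for (size_t i = 0; i < n; i++)                                \
            res[i] = a[i] OP b[i];                                    \
        return n;                                                     \
    }

DEFINE_MAP_PRIMITIVE(map_add_int,  int,  +)
DEFINE_MAP_PRIMITIVE(map_mul_int,  int,  *)
DEFINE_MAP_PRIMITIVE(map_sub_long, long, -)
```

In a real system the same expansion scheme covers the full cross-product of operations and types, which is why generation beats hand-writing.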
39. Resource management
- Choosing a vector size
- Fitting into L1 or L2
- Discussion of memory usage
- Vectors' memory footprint is negligible
40. Vectorized relational operators
- Hash-based operators
- Order-aware operators
- Other operators
- NULL, overflow and variable-width-type handling
41. Hash-based operators
- CPU-efficient hashing (DaMoN'06)
- Cache-efficient partitioning (DaMoN'06)
- Example implementation: hash join
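A toy sketch of a bucket-chained hash join in the vectorized spirit: the build side inserts keys into chained buckets, and the probe side resolves a whole vector of keys per call rather than one tuple at a time. Sizes, the trivial hash function, and all names are simplifying assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { NBUCKETS = 8, MAXKEYS = 64 };   /* toy-sized, power-of-two buckets */

typedef struct {
    int first[NBUCKETS];   /* bucket -> position of newest key, -1 if empty */
    int next[MAXKEYS];     /* position -> next position in chain, -1 ends it */
    int key[MAXKEYS];      /* build-side keys in insertion order */
    int n;
} HashTable;

void ht_init(HashTable *ht) {
    memset(ht->first, -1, sizeof ht->first);
    ht->n = 0;
}

void ht_insert(HashTable *ht, int k) {
    int b = (unsigned)k & (NBUCKETS - 1);   /* trivial hash for the sketch */
    ht->key[ht->n] = k;
    ht->next[ht->n] = ht->first[b];
    ht->first[b] = ht->n++;
}

/* Probe a whole vector of keys: match[i] gets the build-side position
   joining with probe key i, or -1 if there is no partner. */
void ht_probe_vec(const HashTable *ht, const int *probe,
                  int *match, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int pos = ht->first[(unsigned)probe[i] & (NBUCKETS - 1)];
        while (pos != -1 && ht->key[pos] != probe[i])
            pos = ht->next[pos];
        match[i] = pos;
    }
}
```

The position lists produced by the probe can then drive fetch primitives that gather the payload columns, keeping each phase a simple vector loop.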
42. Order-aware operators
- Simple order-aware aggregation
- Merge operators
- Handling multi-attribute keys
- Example implementation: merge join
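A merge join over two sorted key columns can be sketched as a single sequential pass that advances whichever cursor holds the smaller key and records a pair of positions on equality. This toy version assumes unique keys on both sides; the names are illustrative.

```c
#include <stddef.h>

/* Merge join on two sorted, duplicate-free key columns: one pass,
   purely sequential access, no hashing. Writes matching position
   pairs into lpos/rpos and returns the number of matches. */
size_t merge_join(const int *l, size_t ln, const int *r, size_t rn,
                  int *lpos, int *rpos) {
    size_t i = 0, j = 0, k = 0;
    while (i < ln && j < rn) {
        if (l[i] < r[j]) {
            i++;                       /* left key too small, advance */
        } else if (l[i] > r[j]) {
            j++;                       /* right key too small, advance */
        } else {
            lpos[k] = (int)i++;        /* equal keys: emit a match */
            rpos[k] = (int)j++;
            k++;
        }
    }
    return k;
}
```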
43. Other operators
- Other relational operators
- TopN: an example of "do the frequent task fast, accept slow rare cases"
- RadixSort
- Internal operators
- Chunk: adapts the vector size
- Materialize: allows e.g. CartesianProduct
44.
- The next two chapters discuss two techniques that improve data delivery rates for scan-based applications. These chapters represent relatively standalone techniques (though they fit nicely into MonetDB/X100) and are very close to the papers they are based on, hence their discussion here is minimal.
45. Lightweight compression
- ICDE'06, with minimal changes
46. Lightweight compression
- Standard compression too slow
- Proposed approach
- Data-specific algorithms
- CPU-tuned decompression
- Into-cache on-demand decompression
- Faster, allows more data in memory
- Fits MonetDB/X100 perfectly
- Improves TPC-H and IR tasks
- Partially applicable to any database
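To give a flavor of such data-specific, CPU-tuned schemes, here is a simplified frame-of-reference sketch: values are stored as small offsets from a per-block base, and decompression is a branch-free tight loop cheap enough to run on demand when data moves from the RAM cache into the CPU cache. The thesis's actual algorithms (e.g. handling of out-of-range values as patched exceptions) are more elaborate; this toy assumes all offsets fit in 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame-of-reference encode: store each value as an 8-bit offset from
   a per-block base. Assumes (in[i] - base) fits in 0..255. */
void for_encode(uint8_t *out, const int *in, size_t n, int base) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(in[i] - base);
}

/* Decode: a branch-free loop that trades a little CPU work for a 4x
   reduction in memory traffic on this toy layout. */
void for_decode(int *out, const uint8_t *in, size_t n, int base) {
    for (size_t i = 0; i < n; i++)
        out[i] = base + in[i];
}
```

Because the decode loop is so cheap, decompression bandwidth can exceed disk and even memory bandwidth, which is what makes into-cache on-demand decompression pay off.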
47. Cooperative scans
- VLDB'07, with minimal changes
48. Cooperative scans
- Importance of multi-query scans
- Existing optimization techniques
- Static behavior
- Limited gains
- Proposed "relevance" technique
- Dynamic behavior
- Better in both NSM and DSM
- Applicable to any database
49. Outro
- Optional: an evaluation chapter, where we discuss in detail the performance on the TPC-H and TREC Terabyte benchmarks, but it probably won't fit
- Conclusions: by redesigning the database architecture in the execution and storage layers, we allowed databases to make better use of modern hardware and improved their performance.