The Analytic DBMS Market(s) - PowerPoint PPT Presentation

About This Presentation
Title:

The Analytic DBMS Market(s)

Description:

Blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk) ... Sub-DBMS file managers (e.g. SimpleDB, some MySQL uses) Science DBMS ... – PowerPoint PPT presentation

Slides: 27
Provided by: CurtAlfr7
Category:
Tags: dbms | analytic | market | mysql


Transcript and Presenter's Notes

Title: The Analytic DBMS Market(s)


1
The Analytic DBMS Market(s): New opportunities with new technology
by Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact _at_monash.com
http://www.monash.com
http://www.DBMS2.com
2
Curt Monash
  • Analyst since 1981
  • Covered DBMS since the pre-relational days
  • Also analytics, search, etc.
  • Own firm since 1987
  • Publicly available research
  • Blogs, including DBMS2 (www.DBMS2.com -- the
    source for most of this talk)
  • Feed at www.monash.com/blogs.html
  • White papers and more at www.monash.com
  • User and vendor consulting

3
Our agenda
  • Why there are specialty analytic DBMS
  • It's not just the analytic area
  • Hardware issues
  • Tips for choosing among them
  • Segments and priorities
  • The selection process

4
Database diversity
  • High-end e-commerce
  • 100-terabyte analytics
  • High-volume call center
  • Media-heavy web startup
  • Simple departmental application
  • (and many more)

5
11 kinds of data management software
  1. High-end OLTP/general-purpose DBMS
  2. Mid-range OLTP/general-purpose DBMS
  3. Row-based analytic RDBMS
  4. Column- or array-based analytic RDBMS
  5. Text search engines
  6. XML and OO DBMS (but these may merge with search)
  7. RDF and other graphical DBMS (but these may merge
    with relational)
  8. Event/stream processing engines (aka CEP)
  9. Embedded DBMS for devices
  10. Sub-DBMS file managers (e.g. SimpleDB, some MySQL
    uses)
  11. Science DBMS

6
Why are there specialized analytic DBMS?
  • General-purpose database managers are optimized
    for updating short rows
  • not for analytic query performance
  • 10-100X price/performance differences are not
    uncommon
  • At issue is the interplay between storage,
    processors, and RAM

7
Moore's Law, Kryder's Law, and a huge exception
  • Growth factors
  • Transistors/chip: >100,000 since 1971
  • Disk density: >100,000,000 since 1956
  • Disk speed: 12.5 since 1956
  • The disk speed barrier dominates everything!
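
One way to compare the three growth factors above is to count how many times each quantity has doubled. The sketch below uses only the factors quoted on the slide; the doubling-count framing and all names in the code are added for illustration and are not part of the original talk.

```python
import math

# Growth factors as quoted on the slide. The dictionary keys are
# illustrative names, not terms from the talk.
growth_factors = {
    "transistors_per_chip": 100_000,   # ">100,000 since 1971"
    "disk_density": 100_000_000,       # ">100,000,000 since 1956"
    "disk_speed": 12.5,                # "12.5 since 1956"
}

def doublings(factor: float) -> float:
    """Number of times a quantity doubled to grow by `factor`."""
    return math.log2(factor)

doubling_counts = {name: doublings(f) for name, f in growth_factors.items()}

for name, count in doubling_counts.items():
    print(f"{name}: about {count:.1f} doublings")
```

Seen this way, disk speed has doubled fewer than four times, while transistor counts and disk density have each doubled well over a dozen times.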

8
The 1,000,000:1 disk-speed barrier
  • RAM access times: 5-7.5 nanoseconds
  • CPU clock speed: <1 nanosecond
  • Interprocessor communication can be 1,000X
    slower than on-chip
  • Disk seek times: 2.5-3 milliseconds
  • Limit = ½ rotation
  • i.e., 1/30,000 minutes
  • i.e., 1/500 seconds = 2 ms
  • Tiering brings it closer to 1,000:1 in practice,
    but even so the difference is VERY BIG
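
The arithmetic behind the half-rotation limit and the size of the gap can be checked with a short sketch. It assumes a 15,000 RPM drive, which is what the slide's "1/30,000 minutes" for half a rotation implies; the variable names are illustrative.

```python
# Assumption: a 15,000 RPM drive (implied by "1/30,000 minutes" per
# half rotation on the slide, but not stated there explicitly).
RPM = 15_000

def half_rotation_seconds(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution."""
    return 0.5 * 60.0 / rpm

rotational_limit_s = half_rotation_seconds(RPM)  # 0.002 s, i.e. 2 ms

# Figures from the slide, converted to seconds.
disk_seek_s = 2.5e-3   # low end of "2.5-3 milliseconds"
ram_access_s = 5e-9    # low end of "5-7.5 nanoseconds"

disk_to_ram_ratio = disk_seek_s / ram_access_s

print(f"half rotation: {rotational_limit_s * 1e3:.1f} ms")
print(f"disk seek vs RAM access: about {disk_to_ram_ratio:,.0f}:1")
```

With these inputs the disk-to-RAM gap comes out in the hundreds of thousands to one, and it exceeds a million to one when the disk seek is compared with a sub-nanosecond CPU clock instead.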

9
Hardware strategies to optimize analytic I/O
  • Lots of RAM
  • Parallel disk access!!!
  • Lots of networking
  • Tuned MPP (Massively Parallel Processing) is the
    key
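
A rough sketch of why parallel disk access matters: the time for a full scan drops in proportion to the number of disks read at once. The 100 TB size echoes the "100-terabyte analytics" case from the earlier slide; the 100 MB/s per-disk rate and the perfect linear scaling are assumptions made only for illustration.

```python
TB = 10**12

def scan_seconds(data_bytes, disks, bytes_per_sec_per_disk):
    """Time to read everything once if all disks stream in parallel.

    Assumes perfect scaling, with no network or CPU bottleneck.
    """
    return data_bytes / (disks * bytes_per_sec_per_disk)

DATA_BYTES = 100 * TB      # the "100-terabyte analytics" case
DISK_RATE = 100 * 10**6    # assumed 100 MB/s sequential read per disk

scan_times = {n: scan_seconds(DATA_BYTES, n, DISK_RATE) for n in (1, 100, 1000)}

for n, seconds in scan_times.items():
    print(f"{n:>5} disks: {seconds / 3600:8.1f} hours")
```

Real systems fall short of this ideal, which is one reason the slide stresses networking and tuning alongside raw disk counts.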

10
Software strategies to optimize analytic I/O
  • Minimize data returned
  • Classic query optimization
  • Minimize index accesses
  • Page size
  • Precalculate results
  • Materialized views
  • OLAP cubes
  • Return data sequentially
  • Store data in columns
  • Stash data in RAM

11
16 contenders
  • Aster Data
  • Dataupia
  • Exasol
  • Greenplum
  • HP Neoview
  • IBM DB2 BCUs
  • Infobright
  • Kickfire
  • Kognitio
  • Microsoft Madison
  • Netezza
  • Oracle Exadata
  • ParAccel
  • Sybase IQ
  • Teradata
  • Vertica

12
Varied approaches
  • 3 are trying to meld OLTP and analytic processing
  • 2 have very specialized hardware
  • 1 is purely RAM-centric
  • Several use InfiniBand; several stress GigE
    switches
  • 6 are columnar
  • 2 stress cloud/DaaS

13
Segmentation made simple
  • One database to rule them all
  • One analytic database to rule them all
  • Frontline analytic database
  • Very, very big analytic database
  • Big analytic database handled very
    cost-effectively

14
7 more precise segmentation issues
  • What is your tolerance for specialized hardware?
  • What is your tolerance for set-up effort?
  • What is your tolerance for ongoing administrative
    burden?
  • What are your insert and update requirements?
  • At what volumes will you run fairly simple
    queries?
  • What are your complex queries like?
  • and, most important,
  • Are you madly in love with your current DBMS?

15
Specialized hardware
  • Custom or unusual chips (rare)
  • Custom or unusual interconnects
  • Fixed configurations of common parts

16
Set-up effort
  • Hardware acquisition and installation
  • Database and index design
  • Data cleaning and integration
  • Porting of existing applications

17
Ongoing administration
  • Part of the set-up effort also translates to an
    ongoing administrative burden
  • Indexes, materialized views, cubes, etc.
  • unless the DBMS architecture minimizes their
    use

18
Inserts and updates
  • Finally we get to the performance criteria
  • Batch load
  • ELT (or ETLT) vs. pure ETL
  • Mini-batches or trickle feeds
  • True transactional updates

19
Concurrent queries
  • Major use cases
  • Traditional BI
  • Customer-facing apps
  • Product maturity is often key

20
Complex queries
  • This is where the glamour is
  • MPP to speed up I/O
  • Clever answers to the data redistribution problem
  • Table scans vs. random access
  • Columns vs. rows
  • Aggressive use of RAM
  • Compression (saving on disk cost isn't the point)
  • and fast analytics even beyond the queries
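
The data redistribution problem arises in MPP joins: rows that need to be joined may sit on different nodes. One common answer, sketched below, is to distribute both tables by the join key so matching rows land on the same node and each node can join locally. This is a generic illustration, not a description of any particular vendor's method; the node count, table contents, and function names are hypothetical.

```python
from collections import defaultdict

N_NODES = 3  # hypothetical cluster size

def distribute(rows, key, n_nodes=N_NODES):
    """Assign each row to a node by taking its join key modulo n_nodes."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[row[key] % n_nodes].append(row)
    return nodes

def local_join(orders, customers):
    """Join the rows that live on one node; no cross-node traffic needed."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        (o["order_id"], by_id[o["customer_id"]]["name"])
        for o in orders
        if o["customer_id"] in by_id
    ]

customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(6)]
orders = [{"order_id": 100 + i, "customer_id": i % 6} for i in range(10)]

# Both tables are distributed on the same key, so they are co-located.
customer_nodes = distribute(customers, "customer_id")
order_nodes = distribute(orders, "customer_id")

joined = []
for node in range(N_NODES):
    joined.extend(local_join(order_nodes[node], customer_nodes[node]))

print(f"{len(joined)} joined rows across {N_NODES} nodes")
```

When a query joins on a different key than the one used for distribution, rows have to be moved between nodes at query time, which is where the interconnect and the "clever answers" come in.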

21
The analytic DBMS selection process
  • Figure out what you're trying to buy
  • Make a short list
  • Do free POCs
  • Evaluate and decide

22
Figure out what you're trying to buy
  • Inventory your use cases
  • Current
  • Known future
  • Wish-list/dream-list future
  • Set constraints
  • People and platforms
  • Money
  • Establish target SLAs
  • Must-haves
  • Nice-to-haves

23
Short list basics
  • You might as well consider the incumbent(s)
  • Cash cost is an easy filter to apply
  • What is the crux of the deployment effort?
  • References can be scarce

24
Free POCs are a great invention
  • Most of the effort is in the set-up
  • The better you match your use cases, the more
    reliable the POC is
  • You might as well do POCs for several vendors
    at (almost) the same time!
  • Where is the POC being held?
  • Can you plan this yourself, or do you need
    outside help?

25
Evaluate and decide
  • It all comes down to
  • Cost
  • Speed
  • Risk
  • and in some cases
  • Time to value
  • Upside

26
Further information
Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact _at_monash.com
http://www.monash.com
http://www.DBMS2.com