Title: The Analytic DBMS Market(s)
1 The Analytic DBMS Market(s)
New opportunities with new technology
by Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact _at_monash.com
http://www.monash.com
http://www.DBMS2.com
2 Curt Monash
- Analyst since 1981
- Covered DBMS since the pre-relational days
- Also analytics, search, etc.
- Own firm since 1987
- Publicly available research
- Blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk)
- Feed at www.monash.com/blogs.html
- White papers and more at www.monash.com
- User and vendor consulting
3 Our agenda
- Why there are specialty analytic DBMS
- It's not just the analytic area
- Hardware issues
- Tips for choosing among them
- Segments and priorities
- The selection process
4 Database diversity
- High-end e-commerce
- 100-terabyte analytics
- High-volume call center
- Media-heavy web startup
- Simple departmental application
- (and many more)
5 11 kinds of data management software
- High-end OLTP/general-purpose DBMS
- Mid-range OLTP/general-purpose DBMS
- Row-based analytic RDBMS
- Column- or array-based analytic RDBMS
- Text search engines
- XML and OO DBMS (but these may merge with search)
- RDF and other graph DBMS (but these may merge with relational)
- Event/stream processing engines (aka CEP)
- Embedded DBMS for devices
- Sub-DBMS file managers (e.g., SimpleDB, some MySQL uses)
- Science DBMS
6 Why are there specialized analytic DBMS?
- General-purpose database managers are optimized for updating short rows -- not for analytic query performance
- 10-100X price/performance differences are not uncommon
- At issue is the interplay between storage, processors, and RAM
7 Moore's Law, Kryder's Law, and a huge exception
- Growth factors
- Transistors/chip: >100,000X since 1971
- Disk density: >100,000,000X since 1956
- Disk speed: ~12.5X since 1956
- The disk speed barrier dominates everything!
8 The 1,000,000:1 disk-speed barrier
- RAM access times: 5-7.5 nanoseconds
- CPU clock speed: <1 nanosecond
- Interprocessor communication can be 1,000X slower than on-chip
- Disk seek times: 2.5-3 milliseconds
- Rotational limit: ½ rotation (at 15,000 RPM)
- i.e., 1/30,000 of a minute
- i.e., 1/500 of a second = 2 ms
- Tiering brings it closer to 1,000:1 in practice, but even so the difference is VERY BIG (see the sketch below)
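To make the ratio concrete, here is a minimal back-of-the-envelope sketch in Python, using the latency figures quoted above rounded to midpoints; the 15,000 RPM drive is an assumption for illustration:

```python
# Back-of-the-envelope latency ratios, from the figures on this slide.
# All values in seconds; a 15,000 RPM drive is assumed for illustration.
ram_access = 6e-9      # midpoint of the 5-7.5 ns RAM range
cpu_cycle = 1e-9       # <1 ns CPU clock
disk_seek = 2.75e-3    # midpoint of the 2.5-3 ms seek range

rotations_per_sec = 15_000 / 60
half_rotation = 0.5 / rotations_per_sec          # 1/500 s = 2 ms

print(f"half rotation: {half_rotation * 1e3:.1f} ms")
print(f"seek vs. RAM:  {disk_seek / ram_access:,.0f}:1")   # ~458,000:1
print(f"seek vs. CPU:  {disk_seek / cpu_cycle:,.0f}:1")    # ~2,750,000:1
```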
9 Hardware strategies to optimize analytic I/O
- Lots of RAM
- Parallel disk access!!!
- Lots of networking
- Tuned MPP (Massively Parallel Processing) is the key (see the scan-time sketch below)
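For a rough feel of why parallel disk access earns its exclamation points, a sketch with assumed numbers -- a 100 TB table (as on slide 4) and roughly 100 MB/s of sequential throughput per drive:

```python
# Why parallel disk access is the key: time to scan 100 TB.
# The ~100 MB/s per-drive sequential throughput is illustrative.
TABLE_TB = 100
DRIVE_MBPS = 100

bytes_total = TABLE_TB * 1e12
one_drive_sec = bytes_total / (DRIVE_MBPS * 1e6)
print(f"1 drive:     {one_drive_sec / 3600:.0f} hours")   # ~278 hours
for drives in (100, 1000):
    t = one_drive_sec / drives  # perfect parallelism assumed
    print(f"{drives} drives: {t / 60:.0f} minutes")       # ~167 min / ~17 min
```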
10 Software strategies to optimize analytic I/O
- Minimize data returned
- Classic query optimization
- Minimize index accesses
- Page size
- Precalculate results
- Materialized views
- OLAP cubes
- Return data sequentially
- Store data in columns (sketched below)
- Stash data in RAM
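A minimal sketch of the columnar point, with a made-up table: an aggregate over one column only has to read that column, while a row store drags every field -- including wide ones -- through the I/O path:

```python
# Row-store vs. column-store scan, a toy sketch.
# The table and its columns are invented for illustration.
rows = [
    {"order_id": i, "customer": i % 100, "amount": float(i), "notes": "x" * 200}
    for i in range(10_000)
]

# Row store: answering SUM(amount) still reads every column of every
# row, including the wide "notes" field.
row_bytes_scanned = sum(len(str(r)) for r in rows)

# Column store: the same query touches only the "amount" column.
amounts = [r["amount"] for r in rows]   # one column, stored contiguously
col_bytes_scanned = sum(len(str(a)) for a in amounts)

print(f"SUM(amount) = {sum(amounts):,.0f}")
print(f"row-store bytes scanned:    ~{row_bytes_scanned:,}")
print(f"column-store bytes scanned: ~{col_bytes_scanned:,}")
```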
11 16 contenders
- Aster Data
- Dataupia
- Exasol
- Greenplum
- HP Neoview
- IBM DB2 BCUs
- Infobright
- Kickfire
- Kognitio
- Microsoft Madison
- Netezza
- Oracle Exadata
- ParAccel
- Sybase IQ
- Teradata
- Vertica
12 Varied approaches
- 3 are trying to meld OLTP and analytic processing
- 2 have very specialized hardware
- 1 is purely RAM-centric
- Several use InfiniBand; several stress GigE switches
- 6 are columnar
- 2 stress cloud/DaaS
13 Segmentation made simple
- One database to rule them all
- One analytic database to rule them all
- Frontline analytic database
- Very, very big analytic database
- Big analytic database handled very cost-effectively
14 7 more precise segmentation issues
- What is your tolerance for specialized hardware?
- What is your tolerance for set-up effort?
- What is your tolerance for ongoing administrative burden?
- What are your insert and update requirements?
- At what volumes will you run fairly simple queries?
- What are your complex queries like?
- and, most important,
- Are you madly in love with your current DBMS?
15 Specialized hardware
- Custom or unusual chips (rare)
- Custom or unusual interconnects
- Fixed configurations of common parts
16 Set-up effort
- Hardware acquisition and installation
- Database and index design
- Data cleaning and integration
- Porting of existing applications
17 Ongoing administration
- Part of the set-up effort also translates to an ongoing administrative burden
- Indexes, materialized views, cubes, etc.
- unless the DBMS architecture minimizes their use
18 Inserts and updates
- Finally we get to the performance criteria
- Batch load
- ELT (or ETLT) vs. pure ETL (sketched below)
- Mini-batches or trickle feeds
- True transactional updates
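To illustrate the ETL-vs.-ELT distinction: in ETL the loading tool transforms data before it reaches the DBMS; in ELT raw data is loaded first and transformed inside the database, where an MPP engine can parallelize the work. A toy sketch using SQLite in place of a real analytic DBMS; the tables and the dollars-to-cents "transform" are invented:

```python
import sqlite3

# Toy ETL vs. ELT sketch; table names and the transform are made up.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE staging (id INTEGER, amount REAL)")
db.execute("CREATE TABLE facts (id INTEGER, cents INTEGER)")

records = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 3.50}]

# ETL: transform in the loading tool, then load finished rows.
cleaned = [(r["id"], round(r["amount"] * 100)) for r in records]
db.executemany("INSERT INTO facts VALUES (?, ?)", cleaned)

# ELT: load raw rows first, then transform inside the DBMS,
# where a parallel engine can do the heavy lifting.
db.executemany("INSERT INTO staging VALUES (?, ?)",
               [(r["id"], r["amount"]) for r in records])
db.execute("INSERT INTO facts SELECT id, CAST(amount * 100 AS INTEGER) FROM staging")

# facts now holds each row twice -- once per path.
print(db.execute("SELECT * FROM facts").fetchall())
```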
19 Concurrent queries
- Major use cases
- Traditional BI
- Customer-facing apps
- Product maturity is often key
20 Complex queries
- This is where the glamour is
- MPP to speed up I/O
- Clever answers to the data redistribution problem (sketched below)
- Table scans vs. random access
- Columns vs. rows
- Aggressive use of RAM
- Compression (saving on disk cost isn't the point)
- and fast analytics even beyond the queries
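To illustrate the data redistribution problem: joining tables that are spread across an MPP cluster means shipping rows so that matching join keys land on the same node, and that network shuffle is what the interconnect and the optimizer have to make cheap. A toy sketch, with a hypothetical 4-node cluster and made-up tables:

```python
# Hash redistribution for a distributed join -- a toy sketch.
# The node count and table contents are invented for illustration.
NODES = 4

def node_for(key):
    # Route each row to the node owning its join key's hash bucket.
    return hash(key) % NODES

# Two tables, each initially scattered across the cluster by other keys.
orders = [("cust7", 19.99), ("cust3", 5.00), ("cust7", 2.50)]
customers = [("cust7", "Alice"), ("cust3", "Bob")]

# Redistribute: every row travels (over the interconnect!) to the node
# that owns its join key. This shuffle is the expensive step.
buckets = {n: {"orders": [], "customers": []} for n in range(NODES)}
for row in orders:
    buckets[node_for(row[0])]["orders"].append(row)
for row in customers:
    buckets[node_for(row[0])]["customers"].append(row)

# Now each node joins its own bucket locally, with no further traffic.
for n, b in buckets.items():
    local = [(c[1], o[1]) for c in b["customers"]
             for o in b["orders"] if c[0] == o[0]]
    if local:
        print(f"node {n}: {local}")
```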
21 The analytic DBMS selection process
- Figure out what you're trying to buy
- Make a short list
- Do free POCs
- Evaluate and decide
22 Figure out what you're trying to buy
- Inventory your use cases
- Current
- Known future
- Wish-list/dream-list future
- Set constraints
- People and platforms
- Money
- Establish target SLAs
- Must-haves
- Nice-to-haves
23 Short list basics
- You might as well consider the incumbent(s)
- Cash cost is an easy filter to apply
- What is the crux of the deployment effort?
- References can be scarce
24 Free POCs are a great invention
- Most of the effort is in the set-up
- The better you match your use cases, the more reliable the POC is
- You might as well do POCs for several vendors at (almost) the same time!
- Where is the POC being held?
- Can you plan this yourself, or do you need outside help?
25 Evaluate and decide
- It all comes down to
- Cost
- Speed
- Risk
- and in some cases
- Time to value
- Upside
26 Further information
Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact _at_monash.com
http://www.monash.com
http://www.DBMS2.com