How to Select an Analytic DBMS - PowerPoint PPT Presentation

About This Presentation
Title:

How to Select an Analytic DBMS

Description:

Very, very big analytic database. Big analytic database handled very cost-effectively ... Big Data. Operational analytics. Metrics a first cut. Total user data ... – PowerPoint PPT presentation

Number of Views:1021
Avg rating:3.0/5.0
Slides: 41
Provided by: CurtAlfr7
Category:
Tags: dbms | analytic | big | select

less

Transcript and Presenter's Notes

Title: How to Select an Analytic DBMS


1
How to Select an Analytic DBMS DRAFT!! by Curt
A. Monash, Ph.D. President, Monash
Research Editor, DBMS2 contact
_at_monash.com http//www.monash.com http//www.DBMS2
.com
2
Curt Monash
  • Analyst since 1981, own firm since 1987
  • Covered DBMS since the pre-relational days
  • Also analytics, search, etc.
  • Publicly available research
  • Blogs, including DBMS2 (www.DBMS2.com -- the
    source for most of this talk)
  • Feed at www.monash.com/blogs.html
  • White papers and more at www.monash.com
  • User and vendor consulting

3
Our agenda
  • Why are there such things as specialized analytic
    DBMS?
  • What are the major analytic DBMS product
    alternatives?
  • What are the most relevant differentiations among
    analytic DBMS users?
  • Whats the best process for selecting an analytic
    DBMS?

4
Why are there specialized analytic DBMS?
  • General-purpose database managers are optimized
    for updating short rows
  • not for analytic query performance
  • 10-100X price/performance differences are not
    uncommon
  • At issue is the interplay between storage,
    processors, and RAM

5
Moores Law, Kryders Law, and a huge exception
  • Growth factors
  • Transistors/chip
  • gt100,000 since 1971
  • Disk density gt100,000,000 since 1956
  • Disk speed
  • 12.5 since 1956
  • The disk speed barrier dominates everything!

6
The 1,000,0001 disk-speed barrier
  • RAM access times 5-7.5 nanoseconds
  • CPU clock speed lt1 nanosecond
  • Interprocessor communication can be 1,000X
    slower than on-chip
  • Disk seek times 2.5-3 milliseconds
  • Limit ½ rotation
  • i.e., 1/30,000 minutes
  • i.e., 1/500 seconds 2 ms
  • Tiering brings it closer to 1,0001 in practice,
    but even so the difference is VERY BIG

7
Software strategies to optimize analytic I/O
  • Minimize data returned
  • Classic query optimization
  • Minimize index accesses
  • Page size
  • Precalculate results
  • Materialized views
  • OLAP cubes
  • Return data sequentially
  • Store data in columns
  • Stash data in RAM

8
Hardware strategies to optimize analytic I/O
  • Lots of RAM
  • Parallel disk access!!!
  • Lots of networking
  • Tuned MPP (Massively Parallel Processing) is the
    key

9
Specialty hardware strategies
  • Custom or unusual chips (rare)
  • Custom or unusual interconnects
  • Fixed configurations of common parts
  • Appliances or recommended configurations
  • And theres also SaaS

10
18 contenders (and there are more)
  • Aster Data
  • Dataupia
  • Exasol
  • Greenplum
  • HP Neoview
  • IBM DB2 BCUs
  • Infobright/MySQL
  • Kickfire/MySQL
  • Kognitio
  • Microsoft Madison
  • Netezza
  • Oracle Exadata
  • Oracle w/o Exadata
  • ParAccel
  • SQL Server w/o Madison
  • Sybase IQ
  • Teradata
  • Vertica

11
General areas of feature differentiation
  • Query performance
  • Update/load performance
  • Compatibilities
  • Advanced analytics
  • Alternate datatypes
  • Manageability and availability
  • Encryption and security

12
Major analytic DBMS product groupings
  • Architecture is a hot subject
  • Traditional OLTP
  • Row-based MPP
  • Columnar
  • (Not covered tonight) MOLAP/array-based

13
Traditional OLTP examples
  • Oracle (especially pre-Exadata)
  • IBM DB2 (especially mainframe)
  • Microsoft SQL Server (pre-Madison)

14
Analytic optimizations for OLTP DBMS
  • Two major kinds of precalculation
  • Star indexes
  • Materialized views
  • Other specialized indexes
  • Query optimization tools
  • OLAP extensions
  • SQL 2003
  • Other embedded analytics

15
Drawbacks
  • Complexity and people cost
  • Hardware cost
  • Software cost
  • Absolute performance

16
Legitimate use scenarios
  • When TCO isnt an issue
  • Undemanding performance (and therefore
    administration too)
  • When specialized features matter
  • OLTP-like
  • Integrated MOLAP
  • Edge-case analytics
  • Rigid enterprise standards
  • Small enterprise/true single-instance

17
Row-based MPP examples
  • Teradata
  • DB2 (open systems version)
  • Netezza
  • Oracle Exadata (sort of)
  • DATAllegro/Microsoft Madison
  • Greenplum
  • Aster Data
  • Kognitio
  • HP Neoview

18
Typical design choices in row-based MPP
  • Random (hashed or round-robin) data
    distribution among nodes
  • Large block sizes
  • Suitable for scans rather than random accesses
  • Limited indexing alternatives
  • Or little optimization for using the full boat
  • Carefully balanced hardware
  • High-end networking

19
Tradeoffs among row MPP alternatives
  • Enterprise standards
  • Vendor size
  • Hardware lock-in
  • Total system price
  • Features

20
Columnar DBMS examples
  • Sybase IQ
  • SAND
  • Vertica
  • ParAccel
  • InfoBright
  • Kickfire
  • Exasol
  • MonetDB
  • SAP BI Accelerator (sort of)

21
Columnar pros and cons
  • Bulk retrieval is faster
  • Pinpoint I/O is slower
  • Compression is easier
  • Memory-centric processing is easier

22
Segmentation a first cut
  • One database to rule them all
  • One analytic database to rule them all
  • Frontline analytic database
  • Very, very big analytic database
  • Big analytic database handled very
    cost-effectively

23
Basics of systematic segmentation
  • Use cases
  • Metrics
  • Platform preferences

24
Use cases a first cut
  • Light reporting
  • Diverse EDW
  • Big Data
  • Operational analytics

25
Metrics a first cut
  • Total user data
  • Below 1-2 TB, references abound
  • 10 TB is another major breakpoint
  • Total concurrent users
  • 5, 15, 50, or 500?
  • Data freshness
  • Hours
  • Minutes
  • Seconds

26
Basic platform issues
  • Enterprise standards
  • Appliance-friendliness
  • Need for MPP?
  • (SaaS)

27
The selection process in a nutshell
  • Figure out what youre trying to buy
  • Make a shortlist
  • Do free POCs
  • Evaluate and decide
  • The only part thats even slightly specific to
    the analytic DBMS category

28
Figure out what youre trying to buy
  • Inventory your use cases
  • Current
  • Known future
  • Wish-list/dream-list future
  • Set constraints
  • People and platforms
  • Money
  • Establish target SLAs
  • Must-haves
  • Nice-to-haves

29
Use-case checklist -- generalities
  • Database growth
  • As time goes by
  • More detail
  • New data sources
  • Users (human)
  • Users/usage (automated)
  • Freshness (data and query results)

30
Use-case checklist traditional BI
  • Reports
  • Today
  • Future
  • Dashboards and alerts
  • Today
  • Future
  • Latency
  • Ad-hoc
  • Users
  • Now that we have great response time

31
Use-case checklist data mining
  • How much do you think it would improve results to
  • Run more models?
  • Model on more data?
  • Add more variables?
  • Increase model complexity?
  • Which of those can the DBMS help with anyway?
  • What about scoring?
  • Real-time
  • Other latency issues

32
SLA realism
  • What kind of turnaround truly matters?
  • Customer or customer-facing users
  • Executive users
  • Analyst users
  • How bad is downtime?
  • Customer or customer-facing users
  • Executive users
  • Analyst users

33
Short list constraints
  • Cash cost
  • But purchases are heavily negotiated
  • Deployment effort
  • Appliances can be good
  • Platform politics
  • Appliances can be bad
  • You might as well consider incumbent(s)

34
Filling out the shortlist
  • Who matches your requirements in theory?
  • What kinds of evidence do you require?
  • References?
  • How many?
  • How relevant?
  • A careful POC?
  • Analyst recommendations?
  • General buzz?

35
A checklist for shortlists
  • What is your tolerance for specialized hardware?
  • What is your tolerance for set-up effort?
  • What is your tolerance for ongoing administrative
    burden?
  • What are your insert and update requirements?
  • At what volumes will you run fairly simple
    queries?
  • What are your complex queries like?
  • and, most important,
  • Are you madly in love with your current DBMS?

36
Proof-of-Concept basics
  • The better you match your use cases, the more
    reliable the POC is
  • Most of the effort is in the set-up
  • You might as well do POCs for several vendors
    at (almost) the same time!
  • Where is the POC being held?

37
The three big POC challenges
  • Getting data
  • Real?
  • Politics
  • Privacy
  • Synthetic?
  • Hybrid?
  • Picking queries
  • And more?
  • Realistic simulation(s)
  • Workload
  • Platform
  • Talent

38
POC tips
  • Dont underestimate requirements
  • Dont overestimate requirements
  • Get SOME data ASAP
  • Dont leave the vendor in control
  • Test what youll be buying
  • Use the baseball bat

39
Evaluate and decide
  • It all comes down to
  • Cost
  • Speed
  • Risk
  • and in some cases
  • Time to value
  • Upside

40
Further information Curt A. Monash,
Ph.D. President, Monash Research Editor,
DBMS2 contact _at_monash.com http//www.monash.com h
ttp//www.DBMS2.com
Write a Comment
User Comments (0)
About PowerShow.com