Title: How to Select an Analytic DBMS
1 How to Select an Analytic DBMS DRAFT!! by Curt
A. Monash, Ph.D. President, Monash
Research Editor, DBMS2 contact
_at_monash.com http//www.monash.com http//www.DBMS2
.com
2Curt Monash
- Analyst since 1981, own firm since 1987
- Covered DBMS since the pre-relational days
- Also analytics, search, etc.
- Publicly available research
- Blogs, including DBMS2 (www.DBMS2.com -- the
source for most of this talk) - Feed at www.monash.com/blogs.html
- White papers and more at www.monash.com
- User and vendor consulting
3Our agenda
- Why are there such things as specialized analytic
DBMS? - What are the major analytic DBMS product
alternatives? - What are the most relevant differentiations among
analytic DBMS users? - Whats the best process for selecting an analytic
DBMS?
4Why are there specialized analytic DBMS?
- General-purpose database managers are optimized
for updating short rows - not for analytic query performance
- 10-100X price/performance differences are not
uncommon - At issue is the interplay between storage,
processors, and RAM
5 Moores Law, Kryders Law, and a huge exception
- Growth factors
- Transistors/chip
- gt100,000 since 1971
- Disk density gt100,000,000 since 1956
- Disk speed
- 12.5 since 1956
- The disk speed barrier dominates everything!
6 The 1,000,0001 disk-speed barrier
- RAM access times 5-7.5 nanoseconds
- CPU clock speed lt1 nanosecond
- Interprocessor communication can be 1,000X
slower than on-chip - Disk seek times 2.5-3 milliseconds
- Limit ½ rotation
- i.e., 1/30,000 minutes
- i.e., 1/500 seconds 2 ms
- Tiering brings it closer to 1,0001 in practice,
but even so the difference is VERY BIG
7 Software strategies to optimize analytic I/O
- Minimize data returned
- Classic query optimization
- Minimize index accesses
- Page size
- Precalculate results
- Materialized views
- OLAP cubes
- Return data sequentially
- Store data in columns
- Stash data in RAM
8Hardware strategies to optimize analytic I/O
- Lots of RAM
- Parallel disk access!!!
- Lots of networking
- Tuned MPP (Massively Parallel Processing) is the
key
9Specialty hardware strategies
- Custom or unusual chips (rare)
- Custom or unusual interconnects
- Fixed configurations of common parts
- Appliances or recommended configurations
- And theres also SaaS
1018 contenders (and there are more)
- Aster Data
- Dataupia
- Exasol
- Greenplum
- HP Neoview
- IBM DB2 BCUs
- Infobright/MySQL
- Kickfire/MySQL
- Kognitio
- Microsoft Madison
- Netezza
- Oracle Exadata
- Oracle w/o Exadata
- ParAccel
- SQL Server w/o Madison
- Sybase IQ
- Teradata
- Vertica
11General areas of feature differentiation
- Query performance
- Update/load performance
- Compatibilities
- Advanced analytics
- Alternate datatypes
- Manageability and availability
- Encryption and security
12Major analytic DBMS product groupings
- Architecture is a hot subject
- Traditional OLTP
- Row-based MPP
- Columnar
- (Not covered tonight) MOLAP/array-based
13Traditional OLTP examples
- Oracle (especially pre-Exadata)
- IBM DB2 (especially mainframe)
- Microsoft SQL Server (pre-Madison)
14Analytic optimizations for OLTP DBMS
- Two major kinds of precalculation
- Star indexes
- Materialized views
- Other specialized indexes
- Query optimization tools
- OLAP extensions
- SQL 2003
- Other embedded analytics
15Drawbacks
- Complexity and people cost
- Hardware cost
- Software cost
- Absolute performance
16Legitimate use scenarios
- When TCO isnt an issue
- Undemanding performance (and therefore
administration too) - When specialized features matter
- OLTP-like
- Integrated MOLAP
- Edge-case analytics
- Rigid enterprise standards
- Small enterprise/true single-instance
17Row-based MPP examples
- Teradata
- DB2 (open systems version)
- Netezza
- Oracle Exadata (sort of)
- DATAllegro/Microsoft Madison
- Greenplum
- Aster Data
- Kognitio
- HP Neoview
18Typical design choices in row-based MPP
- Random (hashed or round-robin) data
distribution among nodes - Large block sizes
- Suitable for scans rather than random accesses
- Limited indexing alternatives
- Or little optimization for using the full boat
- Carefully balanced hardware
- High-end networking
19Tradeoffs among row MPP alternatives
- Enterprise standards
- Vendor size
- Hardware lock-in
- Total system price
- Features
20Columnar DBMS examples
- Sybase IQ
- SAND
- Vertica
- ParAccel
- InfoBright
- Kickfire
- Exasol
- MonetDB
- SAP BI Accelerator (sort of)
21Columnar pros and cons
- Bulk retrieval is faster
- Pinpoint I/O is slower
- Compression is easier
- Memory-centric processing is easier
22Segmentation a first cut
- One database to rule them all
- One analytic database to rule them all
- Frontline analytic database
- Very, very big analytic database
- Big analytic database handled very
cost-effectively
23Basics of systematic segmentation
- Use cases
- Metrics
- Platform preferences
24Use cases a first cut
- Light reporting
- Diverse EDW
- Big Data
- Operational analytics
25Metrics a first cut
- Total user data
- Below 1-2 TB, references abound
- 10 TB is another major breakpoint
- Total concurrent users
- 5, 15, 50, or 500?
- Data freshness
- Hours
- Minutes
- Seconds
26Basic platform issues
- Enterprise standards
- Appliance-friendliness
- Need for MPP?
- (SaaS)
27The selection process in a nutshell
- Figure out what youre trying to buy
- Make a shortlist
- Do free POCs
- Evaluate and decide
- The only part thats even slightly specific to
the analytic DBMS category
28Figure out what youre trying to buy
- Inventory your use cases
- Current
- Known future
- Wish-list/dream-list future
- Set constraints
- People and platforms
- Money
- Establish target SLAs
- Must-haves
- Nice-to-haves
29Use-case checklist -- generalities
- Database growth
- As time goes by
- More detail
- New data sources
- Users (human)
- Users/usage (automated)
- Freshness (data and query results)
30Use-case checklist traditional BI
- Reports
- Today
- Future
- Dashboards and alerts
- Today
- Future
- Latency
- Ad-hoc
- Users
- Now that we have great response time
31Use-case checklist data mining
- How much do you think it would improve results to
- Run more models?
- Model on more data?
- Add more variables?
- Increase model complexity?
- Which of those can the DBMS help with anyway?
- What about scoring?
- Real-time
- Other latency issues
32SLA realism
- What kind of turnaround truly matters?
- Customer or customer-facing users
- Executive users
- Analyst users
- How bad is downtime?
- Customer or customer-facing users
- Executive users
- Analyst users
33Short list constraints
- Cash cost
- But purchases are heavily negotiated
- Deployment effort
- Appliances can be good
- Platform politics
- Appliances can be bad
- You might as well consider incumbent(s)
34Filling out the shortlist
- Who matches your requirements in theory?
- What kinds of evidence do you require?
- References?
- How many?
- How relevant?
- A careful POC?
- Analyst recommendations?
- General buzz?
35A checklist for shortlists
- What is your tolerance for specialized hardware?
- What is your tolerance for set-up effort?
- What is your tolerance for ongoing administrative
burden? - What are your insert and update requirements?
- At what volumes will you run fairly simple
queries? - What are your complex queries like?
- and, most important,
- Are you madly in love with your current DBMS?
36Proof-of-Concept basics
- The better you match your use cases, the more
reliable the POC is - Most of the effort is in the set-up
- You might as well do POCs for several vendors
at (almost) the same time! - Where is the POC being held?
37The three big POC challenges
- Getting data
- Real?
- Politics
- Privacy
- Synthetic?
- Hybrid?
- Picking queries
- And more?
- Realistic simulation(s)
- Workload
- Platform
- Talent
38POC tips
- Dont underestimate requirements
- Dont overestimate requirements
- Get SOME data ASAP
- Dont leave the vendor in control
- Test what youll be buying
- Use the baseball bat
39Evaluate and decide
- It all comes down to
- Cost
- Speed
- Risk
- and in some cases
- Time to value
- Upside
40 Further information Curt A. Monash,
Ph.D. President, Monash Research Editor,
DBMS2 contact _at_monash.com http//www.monash.com h
ttp//www.DBMS2.com