Title: Innovations in Database Technology
IRMAC BI/DW SIG, May 28, 2009
Agenda
- About Infobright
- Data Warehousing Challenge
- Use Cases
- Infobright Approach
- Infobright Architecture
- Infobright Versions and System Requirements
About Infobright
Data Warehousing Challenges
- More Kinds of Output Needed by More Users, More Quickly
- More Data, More Data Sources
- Limited Resources and Budget
[Diagram: streams of binary data flowing into the warehouse from real-time data feeds, multiple databases, and external sources]
Traditional Data Warehousing
- Labor intensive: heavy indexing and partitioning
- Hardware intensive: massive storage, big servers
- Expensive and complex
Data Warehousing: Raising the Bar
New Demands
- Larger transaction volumes driven by the internet
- Impact of cloud computing
- More -> faster -> cheaper
Data Warehousing Matures
- Near real-time updates
- Integration with master data management
- Data mining using discrete business transactions
- Provision of data for business-critical applications
Early Data Warehouse Characteristics
- Integration of internal systems
- Monthly and weekly loads
- Heavy use of aggregates
Use Cases
Infobright is a good fit for:
- Loading millions of transactions within a limited batch window
- Summarizing transactional data for trend analysis
- Extracting transactional detail based on specific constraints
- Ad hoc query support across many dimensional attributes
Avoid using Infobright for:
- Real-time transactional updates (operational data entry)
- Full data extracts (select * from ...)
- Row-based operations that need to access all columns of a table; these are typically better suited to row-based databases
Customer Experience: Load Speed
Business Requirement
- Mavenir: OEM customer deploying a worldwide telco application
- Application provides operators with access to detailed SMS traffic
- Needed a low-cost solution with the ability to load 20K records per second
- Peak of 70M messages per hour during Chinese New Year
Solution
- Custom front end developed using the MySQL JDBC driver
- Completed design, test and deployment in < 3 months with no assistance from Infobright
- Allowed for expansion from 7 to 90 days of online SMS history
- Supports plan for 70% annual growth
- Rollout to allow for 120 concurrent users
Customer Experience: Query Performance
Business Requirement
- Sulake: online social networking service with 126M users across 31 countries
- 990M page impressions per month
- Need to quickly analyze online spend on a daily basis to enhance the online experience and drive additional revenue
- Existing InnoDB solution was unable to process business queries in a reasonable time frame (queries taking hours to complete)
- Business opportunities were being lost due to the inability to analyze subscriber behavior using transactions
Solution
- Customer used the existing data model and deployed the application using Business Objects Data Integrator for ETL and Web Intelligence for BI
- Existing ETL workflows were converted to Infobright in less than 4 weeks without assistance
- Historically long-running queries (hours) now run in minutes and seconds
- Additional benefits due to compression were a reduced need for disk storage and an overall reduction in I/O and network traffic
Customer Experience: TCO
Business Requirement
- A global provider of electronic trading solutions across 22 time zones and 700 financial exchanges
- Wanted to expand analytical access to financial transactions to include both current (30 days) and archived transactions (4 years)
- Expansion of the existing Sybase solution was too costly
Solution
- Infobright was able to achieve performance benchmarks within the first 3 days of a proof of concept using production data
- 28,000 records per second load speed
- Join of a 100M-row table with a 30M-row table -> 400K rows, returned in 185 seconds
- Additional queries that did not complete using Sybase finished in minutes using Infobright
- Final solution deployed using Pentaho Kettle for ETL and Crystal Reports for BI
- Success with a modest data size (150GB) has opened opportunities for additional, more detailed transactional analysis
Customer Experience: Query Performance and TCO
Business Requirement
- TradeDoubler: based in Sweden, a global digital marketing company serving 1600 online advertisers across Europe and Asia
- TradeDoubler optimizes Web marketing campaigns by analyzing Web clicks, impressions and purchases
- Analyzing terabytes of data about the results of its programs is central to the company's success
- Selected Infobright for rapid analytical results, seamless interoperability with their MySQL database, and low TCO
Solution
- Deployed solution using a single $12,500 Dell server with 8 CPU cores and 16 GB RAM
- Used Pentaho Kettle for ETL and Jaspersoft Server Pro Reports for BI
- Needed to process and analyze 20 billion online transactions/month
- In the POC, loaded > 3.2 billion rows at > 300,000 rows/second
- In production, achieved 30x data compression
- Extremely fast query speed: 3 queries that previously did not return now complete within a minute
Introducing Infobright
Column Orientation
- Smarter architecture
- Load data and go
- No indices or partitions to build and maintain
- Knowledge Grid automatically updated as data packs are created or updated
- Super-compact data footprint can leverage off-the-shelf hardware
Knowledge Grid: statistics and metadata describing the super-compressed data
Data Packs: data stored in manageably sized, highly compressed data packs, compressed using algorithms tailored to data type
Column vs. Row-Oriented
Data Packs and Compression
- Each data pack contains 65,536 data values
- Compression is applied to each individual data pack
- The compression algorithm varies depending on data type and data distribution
[Diagram: each column is stored as a series of 64K-value data packs]
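As a rough illustration of the idea above (not Infobright's actual compressor, whose algorithms are proprietary), the sketch below splits one column into 65,536-value packs and picks a compression strategy per pack based on the data type; the delta-plus-zlib and plain zlib codecs are stand-ins chosen purely for the example.

import struct
import zlib

PACK_SIZE = 65_536  # one data pack holds 65,536 values from a single column

def make_packs(values):
    """Split one column's values into consecutive packs of up to 65,536 values."""
    for i in range(0, len(values), PACK_SIZE):
        yield values[i:i + PACK_SIZE]

def compress_pack(pack):
    """Compress a single pack, choosing a strategy by data type (illustrative only)."""
    if all(isinstance(v, int) for v in pack):
        # Integer pack: delta-encode first (deltas are often small), then deflate.
        deltas = [pack[0]] + [b - a for a, b in zip(pack, pack[1:])]
        raw = struct.pack(f"{len(deltas)}q", *deltas)
    else:
        # String or mixed pack: join values and deflate.
        raw = "\x00".join(str(v) for v in pack).encode("utf-8")
    return zlib.compress(raw, 9)

column = list(range(1_000_000))  # e.g. a monotonically increasing key column
packs = [compress_pack(p) for p in make_packs(column)]
print(len(packs), "packs,", sum(len(p) for p in packs), "compressed bytes")

Because each pack is compressed independently, a query only needs to decompress the packs it actually touches.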
Knowledge Grid
This metadata layer is roughly 1% of the size of the compressed data volume.
A Simple Query Using the Knowledge Grid
SELECT COUNT(*) FROM employees
WHERE salary > 50000 AND age < 65
  AND job = 'Shipping' AND city = 'TORONTO';
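A sketch of how such a query can use the Knowledge Grid: each pack's stored min/max lets the engine classify the pack as irrelevant, wholly relevant, or suspect for a condition like salary > 50000 without decompressing it. The PackNode structure and names below are illustrative, not Infobright internals.

from dataclasses import dataclass

@dataclass
class PackNode:          # per-pack metadata kept in the Knowledge Grid
    min_val: int
    max_val: int
    row_count: int

def classify(pack: PackNode, threshold: int) -> str:
    """Classify a pack for the condition 'value > threshold' using metadata only."""
    if pack.max_val <= threshold:
        return "irrelevant"          # no row in this pack can match
    if pack.min_val > threshold:
        return "relevant"            # every row in this pack matches
    return "suspect"                 # must decompress and check row by row

grid = [PackNode(12_000, 48_000, 65_536),
        PackNode(30_000, 90_000, 65_536),
        PackNode(51_000, 120_000, 65_536)]

for node in grid:
    print(classify(node, 50_000), node)

Only the suspect packs need to be decompressed and scanned; irrelevant packs are skipped entirely and relevant packs can be counted from metadata alone.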
A Join Query Using the Knowledge Grid
SELECT MIN(sale), MAX(discount), name
FROM carsales, salesperson
WHERE carsales.id = salesperson.id
  AND carsales.prov = 'ON'
  AND carsales.date = '2008-02-29'
GROUP BY name;
- Any subsequent queries will be able to use the PPN to resolve joins between the Car Sales and Sales Person tables
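One way to picture the PPN idea mentioned above (purely illustrative; the data structure below is not Infobright's): once the first join has run, the engine can remember which pairs of packs from the two tables actually produced matching rows, so later joins skip pack pairs known not to match.

# Illustrative sketch: remember which (carsales pack, salesperson pack) pairs
# produced matches during the first join, and reuse that knowledge afterwards
# so whole 64K x 64K pack combinations are eliminated without decompression.
matching_pack_pairs = set()

def record_join(carsales_pack_id: int, salesperson_pack_id: int) -> None:
    """Called while executing the first join whenever two packs produce rows."""
    matching_pack_pairs.add((carsales_pack_id, salesperson_pack_id))

def packs_to_scan(carsales_packs, salesperson_packs):
    """On subsequent joins, only visit pack pairs that matched before."""
    return [(c, s) for c in carsales_packs for s in salesperson_packs
            if (c, s) in matching_pack_pairs]

record_join(0, 3)
record_join(2, 1)
print(packs_to_scan(range(4), range(4)))   # -> [(0, 3), (2, 1)]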
Infobright Embedded With MySQL
MySQL/Infobright Architecture
Infobright ships with the full MySQL binaries. The MySQL architecture is used to support database components such as connectors, security and memory management.
[Architecture diagram components:]
- Connectors: Native C API, JDBC, ODBC, .NET, PHP, Python, Perl, Ruby, VB
- Connection pool: authentication, thread reuse, connection limits, check memory, caches
- Management services and utilities
- SQL interface, parser, caches and buffers
- MySQL optimizer and MySQL loader
- Infobright Optimizer and Executor
- Infobright Loader / Unloader
- MyISAM: views, users, permissions, table definitions
Optimized SQL for Infobright
The Infobright Optimizer supports a large amount of MySQL syntax and functions. When the optimizer encounters SQL syntax that is not supported, the query is executed using the MySQL optimizer instead.
Infobright Optimized SQL:
- Select statements
- Comparison operators
- Logical operators
- String comparison functions (LIKE, ...)
- Aggregate functions
- Arithmetic operators
- Data Manipulation Language (INSERT/UPDATE/DELETE)
- Data Definition Language (CREATE, DROP)
- String functions
- Date/time functions
- Numeric functions
- Trigonometric functions
- CASE statements
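Since Infobright reuses the MySQL connection layer, any standard MySQL client library can submit queries built from the constructs listed above. The sketch below uses the pymysql package with an invented sales table, placeholder credentials, and port 5029 (a common Infobright default); all of these are assumptions for illustration only.

import pymysql

# Placeholder host, port, credentials and schema; adjust to your installation.
conn = pymysql.connect(host="127.0.0.1", port=5029,
                       user="report", password="secret", database="dw")
try:
    with conn.cursor() as cur:
        # Aggregates, comparison/logical operators, LIKE and GROUP BY --
        # all items from the supported-syntax list above.
        cur.execute("""
            SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
            FROM sales
            WHERE order_date >= '2009-01-01'
              AND product LIKE 'SMS%'
            GROUP BY region
            ORDER BY revenue DESC
        """)
        for region, orders, revenue in cur.fetchall():
            print(region, orders, revenue)
finally:
    conn.close()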
Infobright Data Types
Most of the data types expected for a MySQL database engine are fully supported, across the numeric, string and date/time categories. The data types that are currently not implemented within Infobright include BLOB, ENUM, SET and auto-increment.
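As a sketch of what this means in practice, the hypothetical table below uses only supported numeric, string and date/time types and avoids BLOB, ENUM, SET and AUTO_INCREMENT. The table layout, credentials and the BRIGHTHOUSE engine name are assumptions to check against your Infobright version.

import pymysql

conn = pymysql.connect(host="127.0.0.1", port=5029,
                       user="dba", password="secret", database="dw")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE sms_traffic (
            msg_id      BIGINT NOT NULL,      -- assigned by the loader, not AUTO_INCREMENT
            sent_at     DATETIME NOT NULL,
            operator    VARCHAR(64),
            dest_region CHAR(2),
            charge      DECIMAL(10,4)
        ) ENGINE=BRIGHTHOUSE
    """)
conn.close()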
ETL Integration
Leverage existing IT tools and resources for fast, simple deployments and low TCO
- Increased efficiency with popular platforms
- Deeper ETL integration: Jaspersoft, Talend, Pentaho
- Leverages end-to-end data management provided by ETL tools
- Improved support for Data Manipulation Language (DML)
Data Loading With/Without Custom ETL Connectors
- Loading Infobright tables with custom connectors:
- Kettle from Pentaho
- Talend ETL from Talend
- Jaspersoft ETL (Talend) from Jaspersoft
- Two ways to invoke the Infobright loader without connectors:
- Generate a CSV or binary file and invoke the Infobright loader to load the file
- Named pipe technique (see the sketch after this list):
- Create a named pipe (e.g. mkfifo /home/mysql/s_mysession1.pipe)
- Launch the Infobright loader in the background to read from the pipe
- Launch the ETL process that writes data to the named pipe
- When the ETL process runs, records written to the named pipe are read by the loader and written to an Infobright database table
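A minimal sketch of the named-pipe technique described above: create a FIFO, start a loader process in the background that reads from it, then have the ETL side write CSV records into the pipe. The mysql-client LOAD DATA invocation, table name, port and credentials are placeholder assumptions; substitute your own loader call.

import csv
import os
import subprocess

PIPE = "/home/mysql/s_mysession1.pipe"

# 1. Create the named pipe (mkfifo).
if not os.path.exists(PIPE):
    os.mkfifo(PIPE)

# 2. Launch the loader in the background, pointed at the pipe. Here the mysql
#    client runs a LOAD DATA statement whose input file is the FIFO.
load_sql = f"LOAD DATA INFILE '{PIPE}' INTO TABLE sms_traffic FIELDS TERMINATED BY ','"
loader = subprocess.Popen(["mysql", "-h", "127.0.0.1", "-P", "5029",
                           "-u", "loader", "-psecret", "dw", "-e", load_sql])

# 3. The ETL side writes records to the pipe; the loader consumes them as they
#    arrive. Closing the pipe signals end of input.
with open(PIPE, "w", newline="") as pipe:
    writer = csv.writer(pipe, lineterminator="\n")
    for i in range(1000):
        writer.writerow([i, "2009-05-28 12:00:00", "operator-a", "ON", "0.0150"])

loader.wait()
os.remove(PIPE)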
Infobright Versions and System Requirements
Comparison of ICE (Infobright Community Edition) and IEE (Infobright Enterprise Edition)
System Requirements
For More Information
Bob Newell, Data Warehouse Evangelist: Bob.Newell@infobright.com
Or join our open source community at www.infobright.org
Query Performance
[Chart: query times shown in minutes, seconds and milliseconds]