Exploiting MySQL 5.1



1
  • Exploiting MySQL 5.1
  • For Advanced Business Intelligence Applications

Matt Casters, Chief Architect, Data Integration
and Kettle Project Founder
MySQL User Conference, Tuesday April 24, 2007
2
Agenda
  • Great news
  • Pentaho Data Integration introduction
  • Pentaho Data Integration MySQL support
  • Table partitioning
  • Database partitioning
  • Clustering
  • Metadata to the rescue
  • Pentaho company overview & platform demo
  • Questions & closing

3
Great news
  • The San Francisco Giants beat the Arizona
    Diamondbacks last Saturday
  • Barry Zito pitched a great game
  • The hot-dogs were great

4
Pentaho news
  • Mondrian 2.3 was released a few weeks ago
  • Detailed cache control (clearing)
  • Sun Microsystems and Pentaho announced that Pentaho
    Reporting will be used to drive Business
    Intelligence requirements in Open Office 2.3, to
    be released in the second half of this year.
  • Tomorrow we'll do another one.

5
Pentaho Data Integration introduction
  • From source systems
  • to the data warehouse
  • to reports
  • to analyses
  • to dashboard reports
  • to better information

6
Data Transformation and Integration Examples
  • Data filtering
  • Is not null, greater than, less than, includes
  • Field manipulation
  • Trimming, padding, upper- and lowercase conversion
  • Data calculations
  • +, -, ×, /, average, absolute value, arctangent,
    natural logarithm
  • Date manipulation
  • First day of month, last day of month, add
    months, week of year, day of year
  • Data type conversion
  • String to number, number to string, date to
    number
  • Merging fields & splitting fields
  • Looking up data
  • Look up in a database, in a text file, an Excel
    sheet, …

7
Pentaho Data Integration (Kettle) Components
  • Spoon
  • Connect to data sources
  • Define transformation rules and design target
    schema(s)
  • Graphical job execution workflow engine for
    defining multi-stage and conditional
    transformation jobs
  • Pan
  • Command-line execution of single, pre-defined
    transformation jobs
  • Kitchen
  • Scheduler for multi-stage jobs
  • Carte
  • Remote execution slave server with embedded web
    server
  • Pentaho BI Platform
  • Integrated scheduling of transformations or jobs
  • Ability to call real-time transformations and use
    output in reports and dashboards

8
Data Transformation and Integration Examples
  • Demo time: load 1M rows into a database table

9
Case Study: Pentaho Data Integration
  • Organization: Flemish Government Traffic Centre
  • Use case: Monitoring the state of the road
    network
  • Application requirement: Integrate
    minute-by-minute data from 570 highway locations
    for analysis (1550 lanes)
  • Technical challenges: Large volume of data, more
    than 2.5 billion rows
  • Business usage: Users can now compare traffic
    speeds based on weather conditions, time of day,
    date, season
  • Best practices
  • Clearly understand business user requirements
    first
  • There are often multiple ways to solve data
    integration problems, so consider the long-term
    need when choosing the right way

10
Summary and Resources
  • Pentaho and MySQL can help you manage
    your data infrastructure
  • Extraction, Transformation and Loading for Data
    Warehousing and Data Migration
  • kettle.pentaho.org
  • Kettle project homepage
  • kettle.javaforge.com
  • Kettle community website: forum, source,
    documentation, tech tips, samples, …
  • www.pentaho.org/download/
  • All Pentaho modules, pre-configured with sample
    data
  • Developer forums, documentation
  • Ventana Research Open Source BI Survey
  • www.mysql.com
  • White paper - http://dev.mysql.com/tech-resources/articles/mysql_5.0_pentaho.html
  • Kettle Webinar - http://www.mysql.com/news-and-events/on-demand-webinars/pentaho-2006-09-19.php
  • Roland Bouman blog on Pentaho Data Integration
    and MySQL
  • http://rpbouman.blogspot.com/2006/06/pentaho-data-integration-kettle-turns.html

11
Pentaho Data Integration MySQL support
  • Reading MySQL Result Streaming (cursor
    emulation) support
  • Writing MySQL dialects for data types
  • Job entry Bulk Loader of text files for MySQL
  • Job entry Bulk writer to a text file for MySQL

12
Table partitioning
  • Challenges
  • More data is being gathered all the time
  • Data is coming from more sources than ever
  • Faster access to stored information is becoming
    more important
  • More people require concurrent access to the data
  • The old solution
  • Split up the data into different tables
  • SALES_2003
  • SALES_2004
  • SALES_2006
  • Hardcode reports to use one or the other table
  • Lots of database management issues
  • High report maintenance costs
  • Nothing is automatic, everything is manual

13
Table partitioning
  • The new solution
  • MySQL 5.1 partitioned tables
  • Automatic split up of data into smaller chunks
  • SQL engine that can automatically pull results
    back together

14
Table partitioning
(Diagram: one large sales table split into several smaller partitions)
15
Table partitioning
  • Advantages
  • Reduces query time by reducing the amount of data
    to plough through.
  • Increases performance by
  • Pruning the list of partitions to search
    automatically; this is done by the MySQL
    query optimizer based on the query that's being
    issued.
  • Massive reduction in I/O
  • Smaller partitioned indexes leading to faster
    index tree traversal
  • Allowing parallel access to the different
    partitions
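
The pruning described above can be observed directly: MySQL 5.1 supports EXPLAIN PARTITIONS, whose "partitions" column lists the partitions the optimizer will actually scan. A minimal sketch against the sales table defined on the next slides (the exact partition names in the output depend on your ranges):

```sql
-- Ask the optimizer which partitions it will touch; with the
-- RANGE( YEAR(sales_date) ) scheme, only the partition covering
-- the year 2001 should appear in the "partitions" column.
EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM sales
WHERE sales_date >= '2001-01-01'
  AND sales_date <  '2002-01-01';
```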

16
Table partitioning
  • How do I create a partitioned table?
  • CREATE TABLE sales
  • (
  • sales_date TIMESTAMP,
  • ordernr INT,
  • customernr INT,
  • productnr INT,
  • discount_pct TINYINT
  • )
  • PARTITION BY RANGE( YEAR(sales_date) )
  • (
  • PARTITION P2000 VALUES LESS THAN (2001),
  • PARTITION P2001 VALUES LESS THAN (2002)
  • )

17
Table partitioning
  • How do I add another partition to the table?
  • ALTER TABLE sales
  • ADD PARTITION
  • (
  • PARTITION P2002 VALUES LESS THAN (2003)
  • )

18
Table partitioning
  • How do I drop a partition from the table?
  • ALTER TABLE sales
  • DROP PARTITION P2002

19
Table partitioning
  • Retrieving partition information
    mysql> desc information_schema.partitions;
    +-------------------------------+--------------+------+-----+---------+-------+
    | Field                         | Type         | Null | Key | Default | Extra |
    +-------------------------------+--------------+------+-----+---------+-------+
    | TABLE_CATALOG                 | varchar(512) | YES  |     | NULL    |       |
    | TABLE_SCHEMA                  | varchar(64)  | NO   |     |         |       |
    | TABLE_NAME                    | varchar(64)  | NO   |     |         |       |
    | PARTITION_NAME                | varchar(64)  | YES  |     | NULL    |       |
    | SUBPARTITION_NAME             | varchar(64)  | YES  |     | NULL    |       |
    | PARTITION_ORDINAL_POSITION    | bigint(21)   | YES  |     | NULL    |       |
    | SUBPARTITION_ORDINAL_POSITION | bigint(21)   | YES  |     | NULL    |       |
    | PARTITION_METHOD              | varchar(12)  | YES  |     | NULL    |       |
    | SUBPARTITION_METHOD           | varchar(12)  | YES  |     | NULL    |       |
    | PARTITION_EXPRESSION          | longtext     | YES  |     | NULL    |       |
    | SUBPARTITION_EXPRESSION       | longtext     | YES  |     | NULL    |       |
    | PARTITION_DESCRIPTION         | longtext     | YES  |     | NULL    |       |
    | TABLE_ROWS                    | bigint(21)   | NO   |     | 0       |       |
    +-------------------------------+--------------+------+-----+---------+-------+

20
Table partitioning
  • Retrieving partition information
  • SELECT partition_name
  • FROM information_schema.partitions
  • WHERE table_name = 'sales'

    +----------------+
    | partition_name |
    +----------------+
    | P2000          |
    | P2001          |
    | P2002          |
    | P2003          |
    | P2004          |
    | P2005          |
    | P2006          |
    | P2007          |
    | P2008          |
    | P2009          |
    | P2010          |
    +----------------+
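
The same INFORMATION_SCHEMA table also carries per-partition row counts, which is handy for sanity-checking a load. A sketch (note that TABLE_ROWS is only an estimate for some storage engines):

```sql
-- How many rows landed in each partition of the sales table?
SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE()
  AND table_name   = 'sales'
ORDER BY partition_ordinal_position;
```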

21
Table partitioning
  • Use-case: Flemish Traffic Centre
  • Medium-sized data set: 514M rows
  • Partitioned by week
  • PARTITION BY RANGE ( YEAR(mtime)*100 +
    WEEK(mtime) )
  • 80GB of data in hundreds of partitions
  • 1 row of data: speed, number of vehicles (5
    classes), etc.
  • Per minute
  • Per lane of traffic
  • For 1550 detectors
  • → 815M rows of data per year
  • A test query aggregating speed counts per road
    position per minute
  • Gave back 642,319 rows
  • 9 seconds for MySQL to get the result
  • 29 seconds to pass the data to the client over
    JDBC

22
Table partitioning
  • Should we do table partitioning? Yes!
  • However
  • Even though you can do sub-partitioning, you need
    to partition the data in one certain way, by
    range, by list,
  • The entry point for optimization is always going
    to be that partitioning method
  • Pruning is always going to be based on that
    partitioning method
  • Queries across the whole data set don't always
    benefit (but don't pay a penalty either)
  • More detailed information
  • Partitioning in MySQL 5.1 and Onwards
  • Mikael Ronström, Senior Software Architect,
    MySQL AB
  • → Tomorrow, 11:50am to 12:35pm, Ballroom A
  • How can we improve upon this?

23
Database partitioning
  • Challenge
  • Dealing with even greater data sets
  • The need to present results even quicker
  • The old solution
  • Large boxes
  • Huge amounts of RAM
  • Very fast disk systems (Fibre-attached, SAN, …)
  • Multi-CPU monsters
  • → Very expensive and complicated systems
  • The new solution
  • 2 or more MySQL servers
  • Basic, off the shelf, hardware
  • Standard software
  • Simple configurations
  • Pentaho Data Integration + MySQL 5.1

24
Database partitioning
(Diagram: data split across four MySQL databases: DB1, DB2, DB3, DB4)
25
Database partitioning
  • Split the data over several databases
  • Pick a splitting mechanism in advance
  • Divide the load over the databases
  • Make the databases smaller
  • Increase the query performance
  • Increase the load performance
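
The "pick a splitting mechanism in advance" step above is the crux: every loader and every reader must agree on which database owns which rows. A minimal sketch in Python, assuming a hypothetical setup where rows are routed by detector number across three databases (the field names and modulo rule are illustrative, not from the deck):

```python
def route(detector_id: int, n_databases: int) -> int:
    """Return the index of the database that owns this detector's rows.

    The splitting rule is fixed up front, so loads and queries
    always agree on where a given detector's data lives.
    """
    return detector_id % n_databases

# Route a handful of illustrative rows across 3 databases.
rows = [{"detector": d, "speed": 90 + d} for d in range(6)]
by_db: dict[int, list] = {}
for row in rows:
    by_db.setdefault(route(row["detector"], 3), []).append(row)

# Each database now holds only its own detectors, so a query for one
# detector touches exactly one (smaller, faster) database.
```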

26
Database partitioning
  • How do you set it up?
  • First create a partitioned / clustered database
    connection
  • Then create a Partitioning Schema

27
Database partitioning
  • Demo time
  • Creating partitions partitioned
  • Loading data partitioned
  • Reading back data partitioned
  • Reading back data partitioned and ordered

28
Database partitioning
  • Use-case: Flemish Traffic Centre
  • Medium-sized data set: 514M rows
  • Same partitioning as before
  • Same table layout, engine and configuration
  • Partitioned over 3 databases on detector (per
    lane)
  • A test query aggregating speed counts per road
    position per minute
  • Gave back 642,319 rows
  • 3 seconds for MySQL to get the result
  • 10 seconds to pass the data to the client over
    JDBC
  • → Demonstrating almost linear scalability!

29
Database partitioning
  • Should we do database partitioning? Yes!
  • However
  • The scalability is limited to the raw CPU power
    that is available on a single system.
  • This poses a scalability problem
  • when dealing with larger numbers of partitions
  • Reading back large volumes of data
  • → Enter Pentaho Data Integration Clustering

30
Pentaho Data Integration Clustering
(Diagram: a transformation on server X distributed across clustered slave servers)
31
Pentaho Data Integration Clustering
  • The challenge is to eliminate CPU bottlenecks
  • Pentaho Data Integration Carte comes to the
    rescue
  • Can be started up on any system turning it into a
    Slave Server
  • The combination of database partitioning and PDI
    Clustering
  • Will split up the number of database partitions
    among the slave servers
  • Will split the total CPU load over the slave
    servers
  • Will split the total I/O load over the database
    partitions (MySQL instances)

32
Pentaho Data Integration Clustering
  • Demo time
  • Start up 2 slave servers
  • Run a step across the 2 servers
  • Monitor

33
Pentaho Metadata to the rescue
  • The problem
  • Reporting becomes harder on a database
    partitioned system
  • Instead of 1 database, you read from a bunch of
    them
  • This is where our new project, Pentaho Metadata,
    comes in
  • Due in a few months
  • Auto-generates SQL now
  • Will auto-generate PDI transformations in the
    near future
  • Will auto-deliver the data to your report in the
    platform
  • Will make it very easy for your users to create
    drag and drop reports

34
Pentaho Introduction
  • World's most popular enterprise open source BI
    suite
  • 2 million lifetime downloads, averaging 100K /
    month
  • Founded in 2004; pioneer in professional open
    source BI
  • Key Projects
  • JFreeReport: Reporting
  • Kettle: Data Integration
  • Mondrian: OLAP
  • Pentaho BI Platform
  • Weka: Data Mining
  • Management and Board
  • Proven BI veterans from Business Objects, Cognos,
    Hyperion, SAS, Oracle
  • Open source leaders - Larry Augustin, New
    Enterprise Associates, Index Ventures
  • MySQL Gold Partner

35
Pentaho Introduction
(Diagram: Pentaho BI stack, from operational data (sales, marketing, inventory, financial, production) through aggregates and reports to analysis and strategic scorecards)
36
Pentaho Introduction
  • Demo and overview

37
Questions and Closing
  • ?

38
Birds of a Feather
  • MySQL Data Warehousing and BI
  • Tonight at 7:30
  • Camino Real
  • With
  • Chief Geek James Dixon, Senior Architect & CTO,
    Pentaho
  • Julian Hyde, OLAP Architect and Mondrian Project
    Founder, Pentaho
  • Brian Miezejewski, Principal Consultant, MySQL
  • Roland Bouman, Certification Developer, MySQL
  • MySELF
  • A few bottles of beer