Exploiting MySQL 5.1



1
  • Exploiting MySQL 5.1
  • For Advanced Business Intelligence Applications

Matt Casters, Chief Architect, Data Integration
and Kettle Project Founder
MySQL User Conference, Tuesday April 24, 2007
2
Agenda
  • Great news
  • Pentaho Data Integration introduction
  • Pentaho Data Integration MySQL support
  • Table partitioning
  • Database partitioning
  • Clustering
  • Metadata to the rescue
  • Pentaho company overview & platform demo
  • Questions & closing

3
Great news
  • The San Francisco Giants beat the Arizona
    Diamondbacks last Saturday
  • Barry Zito pitched a great game
  • The hot-dogs were great

4
Pentaho news
  • Mondrian 2.3 was released a few weeks ago
  • Detailed cache control (clearing)
  • Sun Microsystems and Pentaho announced that Pentaho
    Reporting will be used to drive Business
    Intelligence requirements in Open Office 2.3, to
    be released in the second half of this year.
  • Tomorrow we'll do another one.

5
Pentaho Data Integration introduction
  • From source systems
  • to the data warehouse
  • to reports
  • to analyses
  • to dashboard reports
  • to better information

6
Data Transformation and Integration Examples
  • Data filtering
  • Is not null, greater than, less than, includes
  • Field manipulation
  • Trimming, padding, upper- and lowercase conversion
  • Data calculations
  • +, -, ×, /, average, absolute value, arctangent,
    natural logarithm
  • Date manipulation
  • First day of month, last day of month, add
    months, week of year, day of year
  • Data type conversion
  • String to number, number to string, date to
    number
  • Merging fields & splitting fields
  • Looking up data
  • Look up in a database, in a text file, an Excel
    sheet, …

7
Pentaho Data Integration (Kettle) Components
  • Spoon
  • Connect to data sources
  • Define transformation rules and design target
    schema(s)
  • Graphical job execution workflow engine for
    defining multi-stage and conditional
    transformation jobs
  • Pan
  • Command-line execution of single, pre-defined
    transformation jobs
  • Kitchen
  • Scheduler for multi-stage jobs
  • Carte
  • Remote execution slave server with embedded web
    server
  • Pentaho BI Platform
  • Integrated scheduling of transformations or jobs
  • Ability to call real-time transformations and use
    output in reports and dashboards

8
Data Transformation and Integration Examples
  • Demo time: load 1M rows into a database table

9
Case Study: Pentaho Data Integration
  • Organization: Flemish Government Traffic Centre
  • Use case: Monitoring the state of the road
    network
  • Application requirement: Integrate
    minute-by-minute data from 570 highway locations
    for analysis (1550 lanes)
  • Technical challenges: Large volume of data, more
    than 2.5 billion rows
  • Business usage: Users can now compare traffic
    speeds based on weather conditions, time of day,
    date, season
  • Best practices
  • Clearly understand business user requirements
    first
  • There are often multiple ways to solve data
    integration problems, so consider the long-term
    need when choosing the right way

10
Summary and Resources
  • Pentaho and MySQL can help you manage
    your data infrastructure
  • Extraction, Transformation and Loading for Data
    Warehousing and Data Migration
  • kettle.pentaho.org
  • Kettle project homepage
  • kettle.javaforge.com
  • Kettle community website: forum, source,
    documentation, tech tips, samples, …
  • www.pentaho.org/download/
  • All Pentaho modules, pre-configured with sample
    data
  • Developer forums, documentation
  • Ventana Research Open Source BI Survey
  • www.mysql.com
  • White paper - http://dev.mysql.com/tech-resources/articles/mysql_5.0_pentaho.html
  • Kettle Webinar - http://www.mysql.com/news-and-events/on-demand-webinars/pentaho-2006-09-19.php
  • Roland Bouman blog on Pentaho Data Integration
    and MySQL
  • http://rpbouman.blogspot.com/2006/06/pentaho-data-integration-kettle-turns.html

11
Pentaho Data Integration MySQL support
  • Reading MySQL Result Streaming (cursor
    emulation) support
  • Writing MySQL dialects for data types
  • Job entry Bulk Loader of text files for MySQL
  • Job entry Bulk writer to a text file for MySQL

12
Table partitioning
  • Challenges
  • More data is being gathered all the time
  • Data is coming from more sources than ever
  • Faster access to stored information is becoming
    more important
  • More people require concurrent access to the data
  • The old solution
  • Split up the data into different tables
  • SALES_2003
  • SALES_2004
  • SALES_2006
  • Hardcode reports to use one or the other table
  • Lots of database management issues
  • High report maintenance costs
  • Nothing is automatic, everything is manual

13
Table partitioning
  • The new solution
  • MySQL 5.1 partitioned tables
  • Automatic split up of data into smaller chunks
  • SQL engine that can automatically pull results
    back together

14
Table partitioning
(Diagram: one large sales table split into several smaller partitions)
15
Table partitioning
  • Advantages
  • Reduces query time by reducing the amount of data
    to plough through.
  • Increases performance by
  • Pruning the list of partitions to search
    automatically; this is done by the MySQL
    query optimizer based on the query that's being
    issued.
  • Massive reduction in I/O
  • Smaller partitioned indexes leading to faster
    index tree traversal
  • Allowing parallel access to the different
    partitions
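
The pruning described above can be observed directly: MySQL 5.1 supports EXPLAIN PARTITIONS, whose "partitions" column lists the partitions the optimizer will actually scan. A minimal sketch against the sales table defined on the next slides (the exact partition names in the output depend on your ranges):

```sql
-- Ask the optimizer which partitions it will touch; with the
-- RANGE( YEAR(sales_date) ) scheme, only the partition covering
-- the year 2001 should appear in the "partitions" column.
EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM sales
WHERE sales_date >= '2001-01-01'
  AND sales_date <  '2002-01-01';
```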

16
Table partitioning
  • How do I create a partitioned table?
  • CREATE TABLE sales
  • (
  • sales_date TIMESTAMP,
  • ordernr INT,
  • customernr INT,
  • productnr INT,
  • discount_pct TINYINT
  • )
  • PARTITION BY RANGE( YEAR(sales_date) )
  • (
  • PARTITION P2000 VALUES LESS THAN (2001),
  • PARTITION P2001 VALUES LESS THAN (2002)
  • )

17
Table partitioning
  • How do I add another partition to the table?
  • ALTER TABLE sales
  • ADD PARTITION
  • (
  • PARTITION P2002 VALUES LESS THAN (2003)
  • )

18
Table partitioning
  • How do I drop a partition from the table?
  • ALTER TABLE sales
  • DROP PARTITION P2002

19
Table partitioning
  • Retrieving partition information
    mysql> desc information_schema.partitions;
    +-------------------------------+--------------+------+-----+---------+-------+
    | Field                         | Type         | Null | Key | Default | Extra |
    +-------------------------------+--------------+------+-----+---------+-------+
    | TABLE_CATALOG                 | varchar(512) | YES  |     | NULL    |       |
    | TABLE_SCHEMA                  | varchar(64)  | NO   |     |         |       |
    | TABLE_NAME                    | varchar(64)  | NO   |     |         |       |
    | PARTITION_NAME                | varchar(64)  | YES  |     | NULL    |       |
    | SUBPARTITION_NAME             | varchar(64)  | YES  |     | NULL    |       |
    | PARTITION_ORDINAL_POSITION    | bigint(21)   | YES  |     | NULL    |       |
    | SUBPARTITION_ORDINAL_POSITION | bigint(21)   | YES  |     | NULL    |       |
    | PARTITION_METHOD              | varchar(12)  | YES  |     | NULL    |       |
    | SUBPARTITION_METHOD           | varchar(12)  | YES  |     | NULL    |       |
    | PARTITION_EXPRESSION          | longtext     | YES  |     | NULL    |       |
    | SUBPARTITION_EXPRESSION       | longtext     | YES  |     | NULL    |       |
    | PARTITION_DESCRIPTION         | longtext     | YES  |     | NULL    |       |
    | TABLE_ROWS                    | bigint(21)   | NO   |     | 0       |       |
    +-------------------------------+--------------+------+-----+---------+-------+

20
Table partitioning
  • Retrieving partition information
  • SELECT partition_name
  • FROM information_schema.partitions
  • WHERE table_name = 'sales'

    +----------------+
    | partition_name |
    +----------------+
    | P2000          |
    | P2001          |
    | P2002          |
    | P2003          |
    | P2004          |
    | P2005          |
    | P2006          |
    | P2007          |
    | P2008          |
    | P2009          |
    | P2010          |
    +----------------+
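
The same INFORMATION_SCHEMA table also carries per-partition row counts, which is handy for sanity-checking a load. A sketch (note that TABLE_ROWS is only an estimate for some storage engines):

```sql
-- How many rows landed in each partition of the sales table?
SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE()
  AND table_name   = 'sales'
ORDER BY partition_ordinal_position;
```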

21
Table partitioning
  • Use-case: Flemish Traffic Centre
  • Medium-sized data set: 514M rows
  • Partitioned by week
  • PARTITION BY RANGE ( YEAR(mtime)*100 +
    WEEK(mtime) )
  • 80GB of data in hundreds of partitions
  • 1 row of data: speed, number of vehicles (5
    classes), etc.
  • Per minute
  • Per lane of traffic
  • For 1550 detectors
  • → 815M rows of data per year
  • A test query aggregating speed counts per road
    position per minute
  • Gave back 642,319 rows
  • 9 seconds for MySQL to get the result
  • 29 seconds to pass the data to the client over
    JDBC

22
Table partitioning
  • Should we do table partitioning? Yes!
  • However
  • Even though you can do sub-partitioning, you need
    to partition the data in one certain way, by
    range, by list,
  • The entry point for optimization is always going
    to be that partitioning method
  • Pruning is always going to be based on that
    partitioning method
  • Queries across the whole data set don't always
    benefit (but don't pay a penalty either)
  • More detailed information
  • Partitioning in MySQL 5.1 and Onwards
  • Mikael Ronström, Senior Software Architect,
    MySQL AB
  • → Tomorrow, 11:50am to 12:35pm, Ballroom A
  • How can we improve upon this?

23
Database partitioning
  • Challenge
  • Dealing with even greater data sets
  • The need to present results even quicker
  • The old solution
  • Large boxes
  • Huge amounts of RAM
  • Very fast disk systems (Fibre-attached, SAN, …)
  • Multi-CPU monsters
  • → Very expensive and complicated systems
  • The new solution
  • 2 or more MySQL servers
  • Basic, off the shelf, hardware
  • Standard software
  • Simple configurations
  • Pentaho Data Integration + MySQL 5.1

24
Database partitioning
(Diagram: data split across four MySQL databases: DB1, DB2, DB3, DB4)
25
Database partitioning
  • Split the data over several databases
  • Pick a splitting mechanism in advance
  • Divide the load over the databases
  • Make the databases smaller
  • Increase the query performance
  • Increase the load performance
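
The "pick a splitting mechanism in advance" step above is the crux: every loader and every reader must agree on which database owns which rows. A minimal sketch in Python, assuming a hypothetical setup where rows are routed by detector number across three databases (the field names and modulo rule are illustrative, not from the deck):

```python
def route(detector_id: int, n_databases: int) -> int:
    """Return the index of the database that owns this detector's rows.

    The splitting rule is fixed up front, so loads and queries
    always agree on where a given detector's data lives.
    """
    return detector_id % n_databases

# Route a handful of illustrative rows across 3 databases.
rows = [{"detector": d, "speed": 90 + d} for d in range(6)]
by_db: dict[int, list] = {}
for row in rows:
    by_db.setdefault(route(row["detector"], 3), []).append(row)

# Each database now holds only its own detectors, so a query for one
# detector touches exactly one (smaller, faster) database.
```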

26
Database partitioning
  • How do you set it up?
  • First create a partitioned / clustered database
    connection
  • Then create a Partitioning Schema

27
Database partitioning
  • Demo time
  • Creating partitions partitioned
  • Loading data partitioned
  • Reading back data partitioned
  • Reading back data partitioned and ordered

28
Database partitioning
  • Use-case: Flemish Traffic Centre
  • Medium-sized data set: 514M rows
  • Same partitioning as before
  • Same table layout, engine and configuration
  • Partitioned over 3 databases on detector (per
    lane)
  • A test query aggregating speed counts per road
    position per minute
  • Gave back 642,319 rows
  • 3 seconds for MySQL to get the result
  • 10 seconds to pass the data to the client over
    JDBC
  • → Demonstrating almost linear scalability!

29
Database partitioning
  • Should we do database partitioning? Yes!
  • However
  • The scalability is limited to the raw CPU power
    that is available on a single system.
  • This poses a scalability problem
  • when dealing with larger numbers of partitions
  • Reading back large volumes of data
  • → Enter Pentaho Data Integration Clustering

30
Pentaho Data Integration Clustering
(Diagram: a transformation on server X distributed across clustered slave servers)
31
Pentaho Data Integration Clustering
  • The challenge is to eliminate CPU bottlenecks
  • Pentaho Data Integration Carte comes to the
    rescue
  • Can be started up on any system turning it into a
    Slave Server
  • The combination of database partitioning and PDI
    Clustering
  • Will split up the number of database partitions
    among the slave servers
  • Will split the total CPU load over the slave
    servers
  • Will split the total I/O load over the database
    partitions (MySQL instances)

32
Pentaho Data Integration Clustering
  • Demo time
  • Start up 2 slave servers
  • Run a step across the 2 servers
  • Monitor

33
Pentaho Metadata to the rescue
  • The problem
  • Reporting becomes harder on a database
    partitioned system
  • Instead of 1 database, you read from a bunch of
    them
  • This is where our new project, Pentaho Metadata,
    comes in
  • Due in a few months
  • Auto-generates SQL now
  • Will auto-generate PDI transformations in the
    near future
  • Will auto-deliver the data to your report in the
    platform
  • Will make it very easy for your users to create
    drag and drop reports

34
Pentaho Introduction
  • World's most popular enterprise open source BI
    suite
  • 2 million lifetime downloads, averaging 100K /
    month
  • Founded in 2004; pioneer in professional open
    source BI
  • Key Projects
  • JFreeReport: Reporting
  • Kettle: Data Integration
  • Mondrian: OLAP
  • Pentaho BI Platform
  • Weka: Data Mining
  • Management and Board
  • Proven BI veterans from Business Objects, Cognos,
    Hyperion, SAS, Oracle
  • Open source leaders - Larry Augustin, New
    Enterprise Associates, Index Ventures
  • MySQL Gold Partner

35
Pentaho Introduction
(Diagram: Pentaho BI stack, from operational data (sales, marketing, inventory, financial, production) through aggregates and reports to analysis and strategic scorecards)
36
Pentaho Introduction
  • Demo and overview

37
Questions and Closing
  • ?

38
Birds of a Feather
  • MySQL Data Warehousing and BI
  • Tonight at 7:30
  • Camino Real
  • With
  • Chief Geek James Dixon, Senior Architect & CTO,
    Pentaho
  • Julian Hyde, OLAP Architect and Mondrian Project
    Founder, Pentaho
  • Brian Miezejewski, Principal Consultant, MySQL
  • Roland Bouman, Certification Developer, MySQL
  • MySELF
  • A few bottles of beer