Migrating from PostgreSQL to MySQL at Cocolog - PowerPoint PPT Presentation

About This Presentation
Title:

Migrating from PostgreSQL to MySQL at Cocolog

Description:

Migrating from PostgreSQL to MySQL at Cocolog Naoto Yokoyama, NIFTY Corporation Garth Webb, Six Apart Lisa Phillips, Six Apart Credits: Kenji Hirohama, Sumisho ... – PowerPoint PPT presentation

Number of Views:85
Avg rating:3.0/5.0
Slides: 48
Provided by: 209857
Category:

less

Transcript and Presenter's Notes

Title: Migrating from PostgreSQL to MySQL at Cocolog


1
Migrating from PostgreSQL to MySQL at Cocolog
Naoto Yokoyama, NIFTY Corporation Garth Webb, Six
Apart Lisa Phillips, Six Apart Credits Kenji
Hirohama, Sumisho Computer Systems Corp.
2
Agenda
  • 1. What is Cocolog
  • 2. History of Cocolog
  • 3. DBP Database Partitioning
  • 4. Migration From PostgreSQL to MySQL

3
1. What is Cocolog
4
What is Cocolog
  • NIFTY Corporation
  • Established in 1986
  • A Fujitsu Group Company
  • NIFTY-Serve (licensed and interconnected with
    CompuServe)
  • One of the largest ISPs in Japan
  • Cocolog
  • First blog community at a Japanese ISP
  • Based on TypePad technology by SixApart
  • Several hundred million PV/month
  • History
  • Dec/02/2003 Cocolog for ISP users launch
  • Nov/24/2005 Cocolog Free for free launch
  • April/05/2007 Cocolog for Mobile Phone launch

5
Cocolog (Screenshot of home page)
2008/04 700 Thousand Users
6
Cocolog (Screenshot of home page)
TypePad
Cocolog
7
Cocolog template sets
8
Cocolog Growth (User) Cocolog Cocolog Free
phase1
phase2
phase3
phase4
9
Cocolog Growth (Entry) Cocolog Cocolog Free
phase1
phase2
phase3
phase4
10
Technology at Cocolog
  • Core System
  • Linux 2.4/2.6
  • Apache 1.3/2.0/2.2 mod_perl
  • Perl 5.8CPAN
  • PostgreSQL 8.1
  • MySQL 5.0
  • memcached/TheSchwartz/cfengine
  • Eco System
  • LAMP,LAPP,RubyActiveRecord, Capistrano
  • Etc...

11
Monitoring
  • Management Tool
  • Proprietary in-house development with PostgreSQL,
    PHP, and Perl
  • Monitoring points (order of priority)?
  • response time of each post
  • number of spam comments/trackbacks
  • number of comments/trackbacks
  • source IP address of spam
  • number of entries
  • number of comments via mobile devices
  • page views via mobile devices
  • time of batch completion
  • amount of API usage
  • bandwidth usage
  • DB
  • Disk I/O
  • Memory and CPU usage
  • time of VACUUM analyze
  • APP
  • number of active processes

Service
APL
DB
Hard
12
Tips for migration
  • Troubles with PostreSQL 7.4-8.1Linux 2.4/2.6
  • VACUUM
  • Data size
  • Character set
  • Cleaning data
  • Troubles with MySQL
  • convert_tz function
  • sort order

13
2. History of Cocolog
14
Phase1 2003/12(Entry 0.04Million)
Before DBP10servers
Postgre SQL
Register
Static contents Published
NAS
WEB
15
Phase2 2004/12 (Entry 7Million)
Before DBP50servers
Rich template
Publish Book
Tel Operator Support
Postgre SQL
Register
Podcast Portal Profile Etc..
TypePad
Static contents Published
NAS
WEB
16
Phase2 - Problems
  • The system is tightly coupled.
  • Database server is receiving from multiple
    points.
  • It is difficult to change the system design and
    database schema.

17
Phase3 2006/3 (Entry 12Million)
Before DBP200servers
Rich template
Publish Book
Web-API
memcached
Tel Operator Support
Postgre SQL
Register
TypePad
Podcast Portal Profile Etc..
Static contents Published
NAS
WEB
18
Phase4 2007/4 (Entry 16Million)
Before DBP300servers
Rich template
Publish Book
Web-API
Typepad
memcached
Tel Operator Support
Postgre SQL
Register
Atom
Static contents Published
Mobile WEB
NAS
WEB
19
Now 2008/4
After DBP150servers
Rich template
Publish Book
Web-API
memcached
Tel Operator Support
Multi MySQL
Register
Typepad
Atom
Static contents Published
Mobile WEB
NAS
WEB
20
3. TypePad Database Partitioning
21
Steps for Transitioning
  • Server Preparation Hardware and software
    setup
  • Global Write Write user information to the
    global DB
  • Global Read Read/write user information on
    the global DB
  • Move Sequence Table sequences served by
    global DB
  • User Data Move Move user data to user
    partitions
  • New User Partition All new users saved
    directly to user partition 1
  • New User Strategy Decide on a strategy for
    the new user partition
  • Non User Data Move Move all non-user owned
    data

22
TypePad Overview (PreDBP)?
Internet
Mobile Blog Readers
Blog Owners
https(443)?
smtp(25) / pop(110)?
Blog Readers
http(80)?
ApplicationServer
WebServer
TypeCastServer
MailServer
smtp(25) / pop(110)?
memcached(11211)?
http(80) atom api
postgres(5432)?
nfs(2049)?
Storage
Database(Postgres)?
ATOMServer
MEMCACHED
ADMIN(CRON)Server
Dedicated Server for TypeCast (via ATOM)?
Cron Server for periodic asynchronous tasks
Data Caching servers to reduce DB load
Static Content (HTML, Images, etc)?
23
Why Partition?
TypePad
TypePad
TypePad
TypePad
TypePad
TypePad
TypePad
TypePad
GlobalRole
Non-UserRole
User Role (User1)?
User Role (User2)?
User Role (User0)?
Non-User Role
User Role (User3)?
Current setup
After DBP
All inquires (access) go to one DB(Postgres)
Inquiries (access) are divided among several
DB(MySQL)
24
Server Preparation
Information that does not need to be partitioned
(such as session information)?
TypePad
GlobalRole
Non-UserRole
SchwartzDB
Maintains user mapping and primary key generation
User Role (User1)?
Stores job details
User Role (User2)?
User information is partitioned
DB(PostgreSQL)?
Server for executing Jobs
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Non-User Role
DB(MySQL)? for partitioned data
Asynchronous Job Server
Current Setup
New expanded setup
?Grey areas are not used in current steps
25
Global WriteCreating the user map
TypePad
?
GlobalRole
Non-UserRole
SchwartzDB
?
Maintains user mapping and primary key generation
User Role (User1)?
User Role (User2)?
DB(PostgreSQL)?
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Non-User Role
DB(MySQL)? for partitioned data
Asynchronous Job Server
Explanation ?For new registrations only,
uniquely identifying user data is written to the
global DB ?This same data continues to be
written to the existing DB
?Grey areas are not used in current steps
26
Global ReadUse the user map to find the user
partition
TypePad
?
GlobalRole
Non-UserRole
SchwartzDB
?
?
Maintains user mapping and primary key generation
User Role (User1)?
Migrate existing user data
User Role (User2)?
DB(PostgreSQL)?
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Non-User Role
DB(MySQL)? for partitioned data
Asynchronous Job Server
Explanation ?Migrate existing user data to the
global DB ?At start of the request, the
application queries global DB for the location of
user data ?The application then talks to this
DB for all queries about this user. At this
stage the global DB points to the user0 partition
in all cases.
?Grey areas are not used in current steps
27
Move SequenceMigrating primary key generation
TypePad
?
GlobalRole
Non-UserRole
SchwartzDB
?
Maintains user mapping and primary key generation
User Role (User1)?
Migrate sequence management
User Role (User2)?
DB(PostgreSQL)?
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Non-User Role
DB(MySQL)? for partitioned data
Asynchronous Job Server
Explanation ?Postgres sequences (for generating
unique primary keys) are migrated to tables on
the global DB that act as pseudo-sequences. ?
Application requests new primary keys from global
DB rather than the user partition.
?Grey areas are not used in current steps
28
User Data MoveMoving user data to the new
user-role partitions
TypePad
?
GlobalRole
Non-UserRole
SchwartzDB
?
Maintains user mapping and primary key generation
User Role (User1)?
Stores job details
?
User Role (User2)?
User information is partitioned
DB(PostgreSQL)?
Server for executing Jobs
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Migrating each user data
Non-User Role
?
DB(MySQL)? for partitioned data
Explanation ?Existing users that should be
migrated by Job Server are submitted as new
Schwartz jobs. User data is then migrated
asynchronously ?If a comment arrives while the
user is being migrated, it is saved in the
Schwartz DB to be published later. ?After being
migrated all user data will exist on the
user-role DB partitions ?Once all user data is
migrated, only non-user data is on Postgres
DB(MySQL)? for partitioned data
?Grey areas are not used in current steps
29
New User PartitionNew registrations are created
on one user role partition
TypePad
GlobalRole
Non-UserRole
SchwartzDB
?
Maintains user mapping and primary key generation
User Role (User1)?
?
User Role (User2)?
User information is partitioned
DB(PostgreSQL)?
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Non-User Role
DB(MySQL)? for partitioned data
Asynchronous Job Server
Explanation ?When new users register, user data
is written to a user role partition. ?Non-user
data continues to be served off Postgres
?Grey areas are not used in current steps
30
New User StrategyPick a scheme for distributing
new users
TypePad
GlobalRole
Non-UserRole
SchwartzDB
?
Maintains user mapping and primary key generation
User Role (User1)?
?
User Role (User2)?
User information is partitioned
DB(PostgreSQL)?
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Non-User Role
DB(MySQL)? for partitioned data
Asynchronous Job Server
Explanation ?When new users register, user data
is written to one of the user role partitions,
depending on a set distribution method (round
robin, random, etc) ?Non-user data continues to
be served off Postgres
?Grey areas are not used in current steps
31
Non User Data MoveMigrate data that cannot be
partitioned by user
Information that does not need to be partitioned
(such as session information)?
TypePad
GlobalRole
Non-UserRole
SchwartzDB
Maintains user mapping and primary key generation
User Role (User1)?
User Role (User2)?
User information is partitioned
DB(PostgreSQL)?
User Role (User0)?
User Role (User3)?
Migrate non-User data
Job Server TypePad Schwartz
Non-User Role
?
DB(MySQL)? for partitioned data
Asynchronous Job Server
Explanation ?Migrate non-user role data left on
PostgreSQL to the MySQL side.
?Grey areas are not used in current steps
32
Data migration done
Information that does not need to be partitioned
(such as session information)?
TypePad
?
GlobalRole
Non-UserRole
SchwartzDB
?
Maintains user mapping and primary key generation
User Role (User1)?
Stores job details
User Role (User2)?
User information is partitioned
DB(Postgres)?
Server for executing Jobs
User Role (User0)?
User Role (User3)?
Job Server TypePad Schwartz
Non-User Role
DB(MySQL)? for partitioned data
?
Asynchronous Job Server
Explanation ?All data access is now done
through MySQL ?Continue to use The Schwartz for
asynchronous jobs
?Grey areas are not used in current steps
33
The New TypePad configuration
Internet
Blog Readers
Mobile Blog Readers
Blog Owners (management interface)?
https(443)?
smtp(25) / pop(110)?
http(80)?
ApplicationServer
WebServer
TypeCastServer
MailServer
http(80) atom api
smtp(25) / pop(110)?
memcached(11211)?
MySQL(3306)?
nfs(2049)?
Storage
Database(MySQL)?
ATOMServer
MEMCACHED
ADMIN(CRON)Server
JobServer
Dedicated Server for TypeCast (via ATOM)?
Cron Server for periodic asynchronous tasks
Data Caching servers to reduce DB load
TheSchwartz server for running ad-hoc jobs
asynchronously
Static Content (HTML, Images, etc)?
34
4. Migration from PostgreSQL to MySQL
35
History of scale up PostgreSQL server, Before DBP
  • DB Node Spec History

Time OS(RedHat) CPU Xeon MEM DiskArray
2003/12 2007/11 7.4(2.4.9) 1.8GHz/512k1 1GB No
2003/12 2007/11 ES2.1(2.4.9) 3.2GHz/1M2 4GB No
2003/12 2007/11 ES2.1(2.4.9) 3.2GHz/1M2 4GB Yes
2003/12 2007/11 AS2.1(2.4.9) 3.2GHz/1M4 12GB Yes
2003/12 2007/11 AS4 (2.6.9) 3.2GHz/1M4 12GB Yes
2003/12 2007/11 AS4 (2.6.9) MP3.3GHz/1M4 ?2Core4? 16GB Yes
36
History of scale up PostgreSQL server, Before DBP
  • DB DiskArray Spec
  • FUJITSU ETERNUS8000
  • Best I/O transaction performance in the world
  • 146GB (15 krpm) 32disk with RAID - 10
  • MultiPath FibreChannel 4Gbps
  • QuickOPC (One Point Copy)
  • OPC copy functions let you create a duplicate
    copy of any data from the original at any chosen
    time.

http//www.computers.us.fujitsu.com/www/products_s
torage.shtml?products/storage/fujitsu/e8000/e8000
37
Scale out MySQL servers, After DBP
  • A role configuration
  • Each role is configured as HA cluster
  • HA Software NEC ClusterPro
  • Shared Storage

38
Scale out MySQL servers, After DBP
heart beat
FibreChannel SAN
DiskArray
PostgreSQL

TypePad Application
39
Scale out MySQL servers, After DBP
  • Backup
  • Replication w/ Hot backup

40
Scale out MySQL servers, After DBP
heart beat
FibreChannel SAN
DiskArray
PostgreSQL
mysqld
mysqld
mysqld

rep
rep
rep
mysqld
mysqld
mysqld
TypePad Application
opc
MySQL BackupRole
41
Troubles with PostreSQL 7.4 8.1
  • Data size
  • over 100 GB
  • 40 is index
  • Severe Data Fragmentation
  • VACUUM
  • VACUUM analyze cause the performance problem
  • Takes too long to VACUUM large amounts of data
  • dump/restore is the only solution for
    de-fragmentation
  • Auto VACUUM
  • We dont use Auto VACUUM since we are worried
    about latent response time

42
Troubles with PostgreSQL 7.4 8.1
  • Character set
  • PostgreSQL allow the out of boundary UTF-8
    Japanese extended character sets and multi bytes
    character sets which normally should come back
    with an error - instead of accepting them.

43
Cleaning data
  • Removing characters set that are out of the
    boundries UTF-8 character sets.
  • Steps
  • PostgreSQL.dumpALL
  • Split for Piconv
  • UTF8 -gt UCS2 -gt UTF8 Merge
  • PostgreSQL.restore

dump
44
Migration from PostgreSQL to MySQL using TypePad
script
  • Steps
  • PostgreSQL -gt PerlObject tmp publish
  • -gt MySQL -gt PerlObject last publish
  • diff tmp last Object (data check)
  • diff tmp last publish (file check)

data check
Object
Object
TypePad
TypePad
PostgreSQL
Document
Document
File check
tmp
last
45
Troubles with MySQL
  • convert_tz function
  • doesn't support the input value outside the scope
    of Unix Time
  • sort order
  • different sort order without order by clause

46
Cocolog Future Plans
  • Dynamic
  • Job queue

47
Consulting by
  • Sumisho Computer Systems Corp.
  • System Integrator
  • first and best partner of MySQL in Japan since
    2003
  • provide MySQL consulting, support, training
    service
  • HA
  • Maintenance
  • online backup
  • Japanese character support

48
Questions
Write a Comment
User Comments (0)
About PowerShow.com