Title: Global Peer-to-Peer Replication Utilizing Sybase Replication Server
DM208 Global Peer-to-Peer Data Replication
Utilizing Sybase Replication Server
Mich Talebzadeh, Consultant, Peridale Systems
mitch.talebzadeh@db.com
Aimed at an Audience with
- Full Familiarity with Sybase ASE
- Working Knowledge of Sybase's Replication Server Products
- Knowledge of UNIX and Shell Scripting Useful
Topics Covered
- Reasons for Peer-to-Peer Replication
- Project Issues to be Addressed
- Planning for a Replication System
- Issues to be Addressed
- Peer-to-Peer Replication Implementation
Topics Covered
- Tuning Replication Server for Better Performance
- Monitoring the Latency and Delivery of Data
- Use Replication Server to Measure the Volume of
Replication Traffic on a Daily Basis
Reasons for Peer-to-Peer Replication
- Need to Provide Business Solutions to a Community Scattered Around the Globe
- Limitations of Traditional Techniques Using a Central Database Repository in a Major Location with Applications Connected Remotely are:
  - Limited Application Performance Because of the Geographical Distance
  - Degradation in Network Response When the Traffic over the WAN Gets Heavy. For Example, Remote DSS Versus Local Transaction Inputs
Reasons for Peer-to-Peer Replication
- The Data Server Becomes a Bottleneck as a Larger Business Community Contends for Access
- Data Becomes Unavailable When There Is a Network Failure
- Impact on Maintenance Tasks When Remote Users Are Logged In
Reasons for Peer-to-Peer Replication
- A Typical Third-Party Application May Not Access Data in the Most Efficient Way. For Example, if the Application Makes a Large Number of Discrete Queries to the Database, Connection Latency Between the Application and the Data Server Could Cause Start-Up Delays
- If the Application Were to Crash, the System Becomes Unusable to Traders and Has an Unacceptable Business Impact
Project Issues to Be Addressed
- The Package's Eligibility for Replication
- The Availability of the Required Bandwidth
- The Technical Knowledge of the Package or Application to Be Replicated
- Local DBA Support for Sybase Replication
Planning for a Replication System
- Start a Pilot Project with One-Way Replication Between Two Sites (Not to Be Confused with Sybase Warm Standby)
- Establish the Volume of Data Growth and the Daily Deltas on the Replicated Database
- Establish the Bandwidth Between the Two Replicate Sites. You May Have to Improve the Bandwidth
- Identify External Feeders to the Database
Planning for a Replication System
- The Best Option Is to Set Up the Pilot on the Local and Remote Sites; Otherwise Use Two Different Servers Locally, Linked via a WAN Simulator
- Decide on What to Replicate:
  - Replicate Tables and Turn off Triggers for Replicated Transactions
  - Do Not Replicate Tables and Let Triggers Insert the Records
  - Replicate Stored Procedures
Issues to Be Addressed
- If Replicating Tables:
  - Which Tables Are You Replicating?
  - Does the Table Being Replicated Meet the Criteria for Replication?
  - Does the Table Have a Primary Key?
  - How Is the Unique ID on the Table Generated?
  - Determine the Rows and Columns to Be Replicated
Issues to Be Addressed
- Time Zone Dependencies and the Handling of Timestamps. Discuss and Agree with the Business on How to Handle These. Remember, Remote Locations Mean Different Time Zones. This Has Serious Business Implications
- Managing Conflicts in Peer-to-Peer Replication:
  - Managing Inserts
  - Conflicting Updates
- Let Us Take a Look at These Site-Specific Solutions
Issues to Be Addressed
- Managing Inserts
  - A typical local table will include inserts from the local application in addition to inserts delivered via replicated transactions. In a peer-to-peer set-up both tables are bi-directional.
  - Designers tend to use unique IDs to uniquely identify records in a table. Primary keys or unique clustered indexes are usually built on the unique ID.
Issues to Be Addressed
- The unique ID for a given table tends to be a monotonically increasing number, a 32-bit integer, stored in a table, the so-called table_next_row_id, and retrieved from this table.
- Inter-site conflicts occur when rows are inserted in a local table and distributed to the remote table. If the remote table already has a record with the same unique ID, the replicated insert will be rejected, and the data at the remote table will be inconsistent with the local table.
- Possible solutions are:
Issues to Be Addressed
- Add a location key to the tables if not already there. Include the location key in the primary key for replicated tables. This is useful if the application is at the design stage. As replication implementation is normally an afterthought, this approach may not be possible without a substantial change to the database schema and code. Beware of maintenance and retrofitting.
- Localise the table_next_row_id table and DO NOT replicate it. Allocate ranges of the next_row_id column for each location. A 32-bit integer provides the ability to store up to 2 billion unique values. For the lifetime of a typical application this range is far more than enough to cater for all sites.
Issues to Be Addressed
- For example, you can allocate the following ranges (a minimal ID-allocation sketch follows this slide):
  - Location: Reserved range for the next_row_id column
  - London: 1 to 100,000,000
  - Hong Kong: > 100,000,000 and < 150,000,000
  - Tokyo: > 150,000,000 and < 200,000,000
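A minimal sketch of per-site ID allocation, assuming a localised, non-replicated table_next_row_id table; the table name, procedure name and range values here are illustrative, not part of the original package:

    -- Seed each site's range once at installation time, e.g. Hong Kong:
    update table_next_row_id
    set    next_row_id = 100000001      -- start of the Hong Kong range
    where  table_name  = 'tran_history'
    go

    -- All local inserts then draw IDs through one procedure:
    create procedure get_next_row_id
        @table_name varchar(30),
        @row_id     int output
    as
    begin transaction
        -- the update takes an exclusive lock, serialising concurrent callers
        update table_next_row_id
        set    next_row_id = next_row_id + 1
        where  table_name  = @table_name
        select @row_id = next_row_id
        from   table_next_row_id
        where  table_name  = @table_name
    commit transaction
    go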
Issues to Be Addressed
- Conflicting Updates
  - The best way to handle conflicting updates from different sites is to construct the application and the environment so that conflicts cannot happen. However, in the majority of cases one needs to rely on business rules to reduce or eliminate the conflicts. These rules are application specific. For example, for a trading system these could be:
Issues to Be Addressed
- On performing simultaneous new trades on the same holding at different sites, problems will occur in calculated fields, such as quantity and P&L. Furthermore, there are no signals which warn the users when the problem occurs. The adopted solution is to recalculate these fields (nightly) so that the following day, when the portfolio is loaded, they will have the correct figures.
- However, we have not yet encountered such a problem.
Issues to Be Addressed
- On simultaneous updates of the same order: this could happen due to a mistake. The business rule is that whoever created the order should be the one who updates it. If this happens, the quantity (P&L) data will be out of sync. Again, there will be no warning message to indicate this, and it will be very difficult for IT support to detect. The traders/users will inform application support that the P&L or quantity is wrong. The application support group needs to check the historical order and amend it appropriately. Once this is carried out, the correct details will be replicated to the other sites and the databases will be in sync again.
Issues to Be Addressed
- Make Sure that the IT and Business Managers, plus Application Support, are Aware that Technology is Not a Substitute for Business Rules, and that Ultimately They are Responsible for the Consistency of the Data.
Peer-to-Peer Replication Implementation
- For Simplicity We Illustrate the Peer-to-Peer Implementation Among Three Sites, Although This Model Can Be Expanded Further to Include Additional Sites
- We Have the Following
Peer-to-Peer Replication Implementation (three-site topology diagram)
Peer-to-Peer Replication Set-Up (diagram)
Peer-to-Peer Replication Implementation
- Basics
  - Create Replication Servers on All Three Sites. Each Replication Server Will Handle the Local RSSD and the Local Database to Be Replicated
  - Choose as the ID Server the Site Where the Replication and Support Knowledge is Highest
  - Ideally All Sites Should Have Sybase sa and root Access to All Servers. As a Minimum the ID Server Site Should Have Access to All of These
Peer-to-Peer Replication Implementation
- Create Diagnostics Run Files for the Replication Servers so the DBA May Observe Each Replicated Transaction Performed by the Server (Invaluable in Identifying Problems).
- This is Achieved by Replacing the repserver Binary with the repserver.diag Binary in the Run File and Adding the Following Entry to the Replication Server .cfg File:
  - trace=DSI,DSI_CMD_DUMP
Peer-to-Peer Replication Implementation
- Create the error class rs_sqlserver_error_class (the default SQL Server error class) in the ID replication server only.
- This will handle SQL Server errors in the Replication Server. The default error action for all errors returned by SQL Server is to stop replication! You can assign actions to the created error class (sketched below). Very important!
- Error actions are stored in the table rs_erroractions. Use rs_helperror error_no, v to get information about the errors.
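A hedged sketch of assigning an error action, run in the ID Replication Server; the error number (5701, "changed database context") is purely illustrative:

    assign action ignore
    for rs_sqlserver_error_class
    to 5701
    go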
Peer-to-Peer Replication Implementation
- Turn off trigger settings for each replicated database in the repserver controlling that database. Use the configure connection command with the dsi_keep_triggers option set to off. For example, in lon_rep_server run the following command:
  - configure connection to london_sql_server.db set dsi_keep_triggers to 'off'
- Create interfaces files with all the SQL Server and repserver entries for all sites. Ensure that the application servers also have all the relevant information for all SQL Servers
Peer-to-Peer Replication Implementation
- Creating Direct Routes
  - In our triangle diagram, we need to create routes in order for our three replication servers to send messages to the destination replication servers.
  - A route is a one-way message stream that sends requests from one replication server to another, carrying data modification commands, replicated functions and stored procedures. In this design the routes are created as follows:
Peer-to-Peer Replication Implementation
- Route type   Source           Destination
- Direct       lon_rep_server   hk_rep_server
- Direct       lon_rep_server   tyo_rep_server
- Direct       hk_rep_server    lon_rep_server
- Direct       hk_rep_server    tyo_rep_server
- Direct       tyo_rep_server   lon_rep_server
- Direct       tyo_rep_server   hk_rep_server
Peer-to-Peer Replication Implementation
- For example, in lon_rep_server run the following commands to create routes to hk_rep_server and tyo_rep_server respectively:
  - create route to hk_rep_server set username hk_rep_server_rsi set password hk_rep_server_rsi_ps
  - create route to tyo_rep_server set username tyo_rep_server_rsi set password tyo_rep_server_rsi_ps
Peer-to-Peer Replication Implementation
- Here hk_rep_server_rsi is the RSI username already created by rs_init when you created hk_rep_server, and hk_rep_server_rsi_ps is the default password for that user, etc. Use rs_helproute in any RSSD to check the status of the routes created (see the usage sketch below).
- Keep all passwords below 30 characters. If you want to change the password for the RSI user etc., do so when creating the repserver. It makes life easier later.
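A brief usage sketch of rs_helproute; the RSSD name assumes the usual <repserver>_RSSD naming convention and is illustrative:

    use lon_rep_server_RSSD
    go
    rs_helproute
    go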
Peer-to-Peer Replication Implementation
- Loading the Database to Be Replicated
  - In order to perform the initial load of the database to be replicated, you may consider the following steps
  - Decide where you are going to load your initial database. In our case we chose London as the starting point. Otherwise you may have to merge databases, which requires a good migration plan
  - dbcc the database and perform update statistics in London
  - Review all the primary keys for the tables to be replicated
Peer-to-Peer Replication Implementation
- Turn off all replication flags in the user tables. Use sp_setreptable table_name, false (the pre-dump steps are consolidated in the sketch after this list)
- Do dbcc settrunc(ltm, ignore) on the database
- Dump the transaction log with truncate_only
- Dump the database to the dump directory
- Ftp the dump file to the remote servers. You may consider zipping the file first
- On the BCP site load the database
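A minimal consolidated sketch of the pre-dump sequence in London; the database, table and dump-file names are illustrative:

    use db
    go
    -- turn off the replicate flag on each user table in turn
    sp_setreptable tran_history, 'false'
    go
    -- clear the secondary (LTM) truncation point
    dbcc settrunc(ltm, ignore)
    go
    dump transaction db with truncate_only
    go
    dump database db to '/dumps/db.dmp'
    go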
Peer-to-Peer Replication Implementation
- Load the database from the dump file in the remote locations
- Localise the so-called local tables. For example, if you have table_next_row_id, set the next_row_id column to the appropriate starting values for the location etc.
- Adding databases is quite straightforward (a create connection sketch follows). For example:
  - Add the London production database to the replication system using lon_rep_server
  - Add the Hong Kong database to the replication system using hk_rep_server
  - Add the Tokyo database to the replication system using tyo_rep_server
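Databases are normally added through rs_init; a hedged manual equivalent uses create connection, where the maintenance user name and password are illustrative:

    create connection to london_sql_server.db
    set error class rs_sqlserver_error_class
    set function string class rs_sqlserver_function_class
    set username db_maint
    set password db_maint_ps
    with log transfer on
    go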
Peer-to-Peer Replication Implementation
- Creating Replication and Subscription Definitions
  - All the information you need is in the database to be replicated
  - Use the local repserver and SQL Server to create replication definitions for the database to be replicated (an example definition follows this slide)
  - You will need to extract information from the system tables sysobjects, syscolumns, systypes and syskeys, provided that you have defined primary keys for your tables!
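A hedged sketch of one generated replication definition; the server name and column list are illustrative, with the definition named tran_history_db_rd to match the function-string example later:

    create replication definition tran_history_db_rd
    with primary at london_sql_server.db
    with all tables named tran_history
    (time_stamp datetime, action int, table_name varchar(24),
     row_id int, host_name varchar(24), user_name varchar(24),
     pid numeric)
    primary key (row_id)
    go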
Peer-to-Peer Replication Implementation
- I have prepared shell scripts which will automatically generate replication definitions for all tables. The script takes the format
  - genrepdef.ksh -R <REP_SERVER_NAME> -S <SQL_SERVER_NAME> -D <DATABASE_NAME>
- where the parameters refer to the repserver controlling the database, the name of the SQL Server and the database name respectively.
- In the same way, you can create subscription definitions using the script
  - gensubdef.ksh -R <REP_SERVER_NAME> -S <SUBSCRIBER_SERVER> -D <DATABASE_NAME>
Peer-to-Peer Replication Implementation
- Here the parameters refer to the repserver in which the subscriptions are defined, the name of the SQL Server which subscribes to the replication definitions, and the database name respectively.
- Use activate.ksh, validate.ksh and checksub.ksh to complete the subscription process (the underlying commands are sketched below).
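A hedged sketch of the RCL that these scripts wrap, run in the repserver where the subscription is defined; the subscription and server names are illustrative:

    define subscription tran_history_sub
    for tran_history_db_rd
    with replicate at hk_sql_server.db
    go
    activate subscription tran_history_sub
    for tran_history_db_rd
    with replicate at hk_sql_server.db
    go
    -- after the initial data has been materialised:
    validate subscription tran_history_sub
    for tran_history_db_rd
    with replicate at hk_sql_server.db
    go
    check subscription tran_history_sub
    for tran_history_db_rd
    with replicate at hk_sql_server.db
    go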
Peer-to-Peer Replication Implementation
- In summary, for replication definitions we have (summary diagram)
Peer-to-Peer Replication Implementation
- Likewise, for subscriptions we have (summary diagram)
Peer-to-Peer Replication Implementation
- At the end of the subscription definitions you should have two subscriptions for each replication definition. In other words, doing sp_helpsub for each table should give you 3x2 = 6 subscription lines. This should be shown in any RSSD database.
- Use of Function Strings to Apply Local Timestamps
  - Replication Server converts functions to commands for destination data servers, and submits them to these data servers.
Peer-to-Peer Replication Implementation
- For example, a new row inserted in the source table causes Replication Server to distribute an rs_insert function specific to that table to the subscriber databases.
- A possible solution for applying local timestamps at the replicate database would be to modify rs_insert for a given source table to invoke an RPC at the destination database. The RPC in turn inserts the local timestamp into the subscribed table, as shown below
Peer-to-Peer Replication Implementation
- alter function string tran_history_db_rd.rs_insert
  for rs_sqlserver_function_class
  output rpc
  'execute ins_tran_history_sp
      @time_stamp = ?time_stamp!new?,
      @action = ?action!new?,
      @table_name = ?table_name!new?,
      @row_id = ?row_id!new?,
      @host_name = ?host_name!new?,
      @user_name = ?user_name!new?,
      @pid = ?pid!new?'
- go
Peer-to-Peer Replication Implementation
- create procedure ins_tran_history_sp
  ( @time_stamp datetime null,
    @action int null,
    @table_name varchar(24) null,
    @row_id int null,
    @host_name varchar(24) null,
    @user_name varchar(24) null,
    @pid numeric(35) null )
  as
  begin transaction
  insert into tran_history
  ( time_stamp, action, table_name, row_id, host_name, user_name, pid )
  values
  -- getdate() applies the local (replicate-site) timestamp in place
  -- of the replicated @time_stamp value
  ( getdate(), @action, @table_name, @row_id,
    @host_name, @user_name, @pid )
  commit transaction
- go
Tuning Replication Server for Better Performance
- There are some configuration parameters that can be altered in order to get better performance from the Replication Servers
- init_sqm_write_delay - the stable queue manager waits at least init_sqm_write_delay milliseconds for a block to fill before it writes the block to the correct queue on the stable device (default is 1000). Try decreasing this parameter.
- init_sqm_max_write_delay - a flush to the queue is guaranteed to happen after waiting init_sqm_max_write_delay milliseconds if the DSI or RSI thread reading the queue is unable to connect to the target or has been suspended (default is 10000). Decrease this parameter if required.
Tuning Replication Server for Better Performance
- sqt_max_cache_size - you need to increase this value if there are a lot of open transactions and/or large transactions. Memory for sqt_max_cache_size is taken from the global memory pool (default is 131072 bytes)
- batch_sz - the larger the batch_sz, the less often the truncation point is updated (default is 1000 commands). Try increasing this value (a configure sketch follows)
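A hedged sketch of applying these settings with configure replication server; the values are illustrative, and the init_ parameters take effect only after the repserver is restarted:

    configure replication server set init_sqm_write_delay to '500'
    go
    configure replication server set init_sqm_max_write_delay to '5000'
    go
    configure replication server set sqt_max_cache_size to '2097152'
    go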
Monitoring the Latency and Delivery of Data
- It is common practice for DBAs to set up a DBA-specific table in the replicated database and check the latency and health of the replication system by updating data in this table. In its simplest form, one can insert or update records in this table and see if the data is being replicated to the other sites. The time taken for the data to get to the remote site gives an indication of latency (a minimal heartbeat sketch follows).
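A minimal heartbeat sketch along these lines; the table and column names are illustrative:

    create table rep_heartbeat
    ( site      varchar(24) not null,
      beat_time datetime    not null )
    go
    -- run periodically at each site, e.g. from a cron job:
    insert into rep_heartbeat values ('LONDON', getdate())
    go
    -- at a remote site, compare the latest replicated beat with the
    -- local clock (after adjusting for any clock difference):
    select site, latency_sec = datediff(ss, max(beat_time), getdate())
    from rep_heartbeat
    group by site
    go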
Monitoring the Latency and Delivery of Data
- Although the above method can be used as a simple monitoring tool, it suffers from the following drawbacks
  - It is not an indicator of whether the business transactions arrive at the remote sites in a timely manner
  - The DBA table is normally used by one application, for example a cron job issuing a simple transaction. User applications create a far larger number of transactions, and the data delivery is impacted by the following
Monitoring the Latency and Delivery of Data
- Concurrency
- The size of the user tables
- The locking mechanism employed by the application
- It should be noted that the maintenance user trying to deliver the replicated data could be blocked by local users. If the statistics on the user tables are not current, the replicated data may take longer to be delivered, resulting in remote users being blocked waiting for locks to be released.
Monitoring the Latency and Delivery of Data
- It is important to get a more realistic estimate of replication delivery. A possible solution would be to look at the entries in an audit or transaction history table in the replicate database and check the delivery timestamp. By comparing the timestamps of delivered records, and adjusting for the clock difference between servers, the latency can be estimated (a query sketch follows).
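A hedged query sketch, assuming the tran_history audit table carries both an origin timestamp replicated from the source (origin_time, an illustrative column) and the local delivery timestamp applied by ins_tran_history_sp:

    select table_name,
           avg_latency_sec = avg(datediff(ss, origin_time, time_stamp))
    from   tran_history
    where  time_stamp >= dateadd(hh, -1, getdate())  -- last hour only
    group by table_name
    go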
Use Replication Server to Measure the Volume of Replication Traffic
- Use the Replication Server command admin who, sqm
- Take the column First Seg.Block. This is the first undeleted segment and block number in the queue. As we go through the day the Seg.Block value increases. When the Replication Server is shut down these values are reset. Take readings at different time intervals; the difference between the values read gives an indication of the data passed through a replication queue over that time.
- You need to convert the blocks read to MB
Example readings (queue type 0 = Received/inbound, 1 = Sent/outbound):

Queue  Server.Database  Current Read  Previous Read  Start-of-Day   Received/Sent
                        (Seg.Blk)     (Seg.Blk)      Read (Seg.Blk) today (MB)
105:0  hk_server.db     40196.52      40196.35       40171.60       24.87
105:1  hk_server.db      8487.27       8487.24        8472.60       14.48
116:0  bcp_server.db    34872.64      34872.47       34844.20       28.69
116:1  bcp_server.db     5245.16       5245.13        5236.18        8.97
117:0  lon_server.db     3742.25       3742.25        3733.60        8.45
117:1  lon_server.db    45706.58      45705.62       45660.20       46.59
118:0  ny_server.db     21153.49      21153.32       21124.23       29.41
118:1  ny_server.db      2709.48       2709.45        2702.48        7.00
119:0  tyo_server.db    14081.53      14081.36       14053.30       28.36
119:1  tyo_server.db     2664.40       2664.37        2656.28        8.19
Use Replication Server to Measure the Volume of Replication Traffic
- Stable queues are composed of segments stored on disk partitions
- Each segment has a megabyte of message space, divided into 64 blocks of 16K
- Take a reading at the start of day (say 11PM) and again during the day. Then (a worked example follows):
  - No_of_16K_Blocks_cur = (int(Seg.Block_cur) - 1) * 64 + (Seg.Block_cur - int(Seg.Block_cur)) * 100
  - No_of_16K_Blocks_start = (int(Seg.Block_start) - 1) * 64 + (Seg.Block_start - int(Seg.Block_start)) * 100
  - Traffic today (MB) = (No_of_16K_Blocks_cur - No_of_16K_Blocks_start) / 64
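A worked sketch of the formula in T-SQL, using the hk_server.db inbound readings from the table above:

    declare @cur float, @strt float
    select @cur = 40196.52, @strt = 40171.60
    select traffic_MB =
           ( ((floor(@cur)  - 1) * 64 + (@cur  - floor(@cur))  * 100)
           - ((floor(@strt) - 1) * 64 + (@strt - floor(@strt)) * 100) ) / 64
    go
    -- returns 24.875, i.e. the 24.87 MB shown for hk_server.db above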
Total messages attributed to the different servers up to now are:

London Production                 55.05 MB
Hong Kong                         39.36 MB
Tokyo                             36.55 MB
New York                          36.41 MB
London bcp                        37.66 MB
DBA-specific traffic (estimate)   44.84 MB
Application traffic              160.17 MB
Sum total, all sites             205.02 MB