Title: E121 Physically Partitioning Very Large Tables
1E121 Physically Partitioning Very Large Tables
- Russ Wheaton
- DBA Advisor - FedEx Services
- Revenue Systems Engineering
- russman_at_wheatonworld.com
2What is Physical Partitioning?
- Break a table into multiple smaller tables.
- Each table has the exact same schema.
- Data is routed to a specific table based on a
partition key or range. - This is NOT Sybase Partitioning which creates
additional page chains for a single table.
3What is Physical Partitioning (cont)?
4Why Partition?
- Data Volume
- Purge Considerations
- Distribution Considerations
5When to Partition
- No single answer or magic numbers.
- Indicators that you should consider a
partitioning scheme are - Data is transient in nature.
- Transactional purge becomes impractical.
- Table size becomes unwieldy.
- As an alternative approach to distribution.
- Production support is getting too boring and its
time to shake things up a little.
6When Not to Partition
- Use common sense.
- Dont partition for the sake of partitioning.
- Avoid too many partition sets in any one
database. - Consider partitioning at the database level
instead. - The sheer number of objects will become a
management nightmare. - Dont partition to manage a poor database design.
Redesign it. - When DML will regularly affect multiple
partitions. - When the partitioning key itself is subject to
regular updates.
7Partitioning Concepts
- Partition Key
- Control Structures
- Over flow Table
- Partitioning Abstraction
- To the developer
- To the user
- Nomenclature
- Table Rotation / Purge / Archive
8Sybase Features Employed
- UNION in VIEWS
- Execute Immediate
9Partition Key
- Must uniquely identify a row across all tables.
- Must be easily range-able.
- Good candidates for a purge-able partition
include - Date ranges
- By day, month, year, etc.
- Other candidates are
- Account number ranges
- Geographical ranges
- Product code ranges
- You get the idea
10Control Structures - Control Table
- Coordinates the placement of the data.
- Stores the range values for the partition key.
- Identifies if a partitioned table is purge-able.
11Control Structures - Control Procedure
- Gate keeper for a partitioned table set.
- Given the partition key, check the control table
for the correct target table. - If no table is found, or the table is marked for
purge, the overflow table is returned.
CREATE PROCEDURE ap_authors_control_sel
_at_au_id char(10), _at_table_name varchar(30)
output AS DECLARE _at_purge_flag CHAR(1) SELECT
_at_table_name table_name , _at_purge_flag
purge_flag FROM authors_control WHERE _at_au_id
BETWEEN partition_key_start AND
partition_key_end IF _at__at_rowcount 0 OR
_at_purge_flag ltgt 'N' BEGIN SELECT _at_table_name
table_name FROM authors_control
WHERE purge_flag 'O' END
12Overflow Table
- Exact same schema as the other partitioned
tables. - It is the dumping ground for data falling
outside the established partition key range. - This type of thing can occur primarily due to
- Misadministration of the partitioning control
data. - Failed purge that has left a table marked as
inactive. - Corrupted or unexpected data coming into the
system. - One goal of a partitioning scheme is to be able
to accommodate any data in the same way a
non-partitioned table would.
13Abstraction
- A goal of a partitioning scheme is toabstract
the underlying complexity from the application
developers and the users. - Allow application developers to code against the
partitioned scheme as if it were one table. - Allow users to query the partitioned tables as if
they were one logical table. - To accomplish this, there are rules that
developers must adhere to.
14Abstraction Application Development
- Developers can be shielded from the partitioning
scheme during most of their development. - Application stored procedures need to integrate
with the DBA provided partitioning abstraction
stored procedures. - The partitioning scheme should abstract low level
data access stored procedures for - Insert
- Update
- Delete
- These stored procedures must affect one and one
partition.
15Abstraction Application Development
CREATE PROCEDURE ap_authors_01_i _at_au_id
CHAR(10) ,_at_au_lname VARCHAR(40)
,_at_au_fname VARCHAR(20) ,_at_phone
CHAR(12) ,_at_address VARCHAR(40) ,_at_city
VARCHAR(20) ,_at_state CHAR(2)
,_at_country VARCHAR(12) ,_at_postalcode
CHAR(10) AS INSERT INTO authors_01 (au_id,
au_lname, au_fname, phone, address, city,
state, country, postalcode,
last_update_date, last_update_user) VALUES
(_at_au_id, _at_au_lname, _at_au_fname, _at_phone,
_at_address, _at_city, _at_state, _at_country,
_at_postalcode, getdate(), suser_name())
- Each table in the partition set will have its own
update, delete and insert stored procedure.
16Abstraction Application Development
- The following objects will be created in our
example.
17Abstraction Application Development
- This is where we hide the complexity, and
everything starts to look simple. - The single access into the partitioned table set
is through the top level stored procedures - ap_authors_i_ptn
- ap_authors_u_ptn
- ap_authors_d_ptn
CREATE PROCEDURE ap_authors_i_ptn _at_au_id
CHAR(10) ,_at_au_lname VARCHAR(40)
,_at_au_fname VARCHAR(20) ,_at_phone
CHAR(12) ,_at_address VARCHAR(40) ,_at_city
VARCHAR(20) ,_at_state CHAR(2)
,_at_country VARCHAR(12) ,_at_postalcode
CHAR(10) AS DECLARE _at_table_name VARCHAR(30)
, _at_proc_name VARCHAR(30) , _at_rtncode
INT EXEC ap_authors_control_sel _at_au_id,
_at_table_name OUT SELECT _at_proc_name ap_
_at_table_name _i EXEC
_at_rtncode _at_proc_name _at_au_id, _at_au_lname,
_at_au_fname, _at_phone, _at_address, _at_city,
_at_state, _at_country, _at_postalcode RETURN
_at_rtncode
18Abstraction Application Development
- Case statement method.
- Less flexible.
- Requires more maintenance.
- This is the 11.9 approach.
CREATE PROCEDURE ap_authors_i_ptn . . . same as
previous slide AS . . .Get _at_table_name same
as previous slide IF _at_table_name authors_01
EXEC _at_rtncode ap_authors_01_i _at_au_id,
_at_au_lname, _at_au_fname, _at_phone, _at_address,
_at_city, _at_state, _at_country, _at_postalcode ELSE
IF _at_table_name authors_02 EXEC _at_rtncode
ap_authors_02_i _at_au_id, _at_au_lname,
_at_au_fname, _at_phone, _at_address, _at_city,
_at_state, _at_country, _at_postalcode ELSE IF
_at_table_name authors_03 EXEC _at_rtncode
ap_authors_03_i _at_au_id, _at_au_lname,
_at_au_fname, _at_phone, _at_address, _at_city,
_at_state, _at_country, _at_postalcode ELSE IF
_at_table_name authors_ov EXEC _at_rtncode
ap_authors_ov_i _at_au_id, _at_au_lname,
_at_au_fname, _at_phone, _at_address, _at_city,
_at_state, _at_country, _at_postalcode RETURN
_at_rtncode
19Abstraction Benchmark
- Benchmark of both approachs
- Conducted using
- RedHat LINUX 7.2
- Sybase 12.5
- Pentium II 300 Mhz
- 512 Meg RAM
- All tables start off empty.
- No indexes, triggers, or RI.
- 1000 inserts, commit frequency of 1.
20Abstraction Benchmark (cont)
21Abstraction Users
- Present the user with a single logical view of
the partitioned table set. - Use Sybase UNION in VIEW feature.
- The user simply executes the query against
v_authors_ptn !!
CREATE VIEW v_authors_ptn AS SELECT FROM
authors_01 UNION ALL SELECT FROM
authors_02 UNION ALL SELECT FROM
authors_03 UNION ALL SELECT FROM authors_ov
22Nomenclature
- A properly named partitioning scheme can make or
break the flexibility and the maintainability. - Choose an adequate suffix size (two digits allow
99 tables in the partition set). - Suffix the overflow table with something that
does not collide with other tables. For example,
suffixing the overflow table _99 or _00 could
cause problems.
23Nomenclature (cont)
- Try to name control tables so that they include
the name of the table they are designed to
control. - Include a prefix or suffix in stored procedures
and views to indicate that the object is acting
on a partitioned table set.
24Pros / Cons
- PROs
- Ease of maintenance. Stuff runs faster on smaller
tables. - Backup and Recovery.
- Consistency checking.
- Index maintenance.
- DOL table maintenance.
- Truncate purge instead of transactional delete.
- Active vs. inactive data separation.
25Pros / Cons
- CONs
- Many more objects to manage.
- Complexity not hidden by database engine.
- UNION in VIEWs are not parallel.
- UNION in VIEWs are read only.
- May become native to ASE in future releases.
26Questions?