Title: WOUG PRESENTATION MATERIAL
1Implementing Oracle Table Data Compression
Case Study James Chance Answerport,
Inc. jchance_at_answerport.com www.answerport.com
2- Carl Dudley, University of Wolverhampton, UK,
- Oracle Open World Session 2004
- Table Compression in Oracle9i Release2
- An Oracle White Paper May 2002
- Secrets of Oracle9i Database Table Compression.
- Oracle Case Study
- Compressing Data for Space and Speed.
- Technology Data Warehousing
- Oracle Magazine
- My own findings while implementing on
environments - Oracle 9.2.0.5, Block Sizes 8 and 16 on HPUX
64 Bit RISC - Baan IVC4 SVP 15
- Oracle 9.2.0.5, Block Size 8 TRU64 UNIX
- Baan IVC4 SVP 13
3How Data Compression Works
- Eliminates duplicate values in a database block
- Uses a lossless dictionary-based compression
algorithm - Compressed data, stored in a database block, is
self-contained - The algorithm decides whether to compress or not
too compress - The duplicate values are for a particular column
based on column length
- and number of occurrences
- All subsequent occurrences of the duplicate
value are replaced with a - reference stored in a symbol table
- Only entire columns or sequences of columns are
compressed
4How Data Compression Works
Source Secrets of Oracle9i Database Table
Compression
5Potential Benefits of Data Compression
- Reduce Physical Space
- Reduce disk cache
- Reduce buffer cache
- Increase / Improve query performance
- Data is read and cached compressed
- Data is decompressed at data access time
- More compressed data can fit into the same
amount of - buffer cache
- Reduce I/O reads from disk
6Potential Benefits of Data Compression
Source Secrets of Oracle9i Database Table
Compression
7Potential Issues with Data Compression
- Increase in CPU Usage
- Larger Block Size increases benefit may need to
- migrate / convert database
- DML Updates will uncompress and allocate new
blocks - PCTFREE should be 0
- Standard DMLs do not compress data
- Some tables may not compress
- Fragmentation may increase between compression
cycles - Bulk Loads overhead will increase
8Oracles Documented Findings
- Bulk Loads may be impacted as much as 50
- INSERT operations, such as parallel INSERT or
CREATE TABLE - AS SELECT operations, and INSERT with an APPEND
hint (a.k.a. - direct path INSERT) go through compression, and
are subject to the - same bulk load performance
- DELETE operations are 10 faster
- UPDATE operations are 10-20 slower for
compressed tables on - average
- Querying compressed data is virtually as fast as
querying uncompressed - data in most cases
- Simple table scans that are not in the buffer
cache may be 3-4 times - faster if the compression ratio is 41
9Data Compression Terms
- Compression Factor (CF) the ratio between the
number of blocks required to store the
noncompressed table compared to the number of
blocks needed for the compressed version - Space Savings (SS) the ratio between the number
of noncompressed blocks (of the noncompressed
table) minus the number of compressed blocks (of
the compressed table) to the number of
noncompressed blocks (of the noncompressed table)
10When to use Data Compression
- Read Only or Mostly Read Tables
- Data Warehouse
- OLAP
- Very Large Databases.
- Larger Block Size 16 and higher (minimum 8)
- Large Materialized Views
- I/O Bottlenecks
- CF is at least 21 or SS is a significant value.
11How to Apply Data Compression
- Turning Compression on for a Table
- Create Table lttable_namegt Compress
- Alter Table lttable_namegt Compress
- Future bulk loads may be compressed existing
data is not - Compressing Data via bulk loads
- Create Table lttable_namegt Compress AS Select ...
- Insert into lttable_namegt select ... parallel
- Insert / Append / into lttable_namegt select ...
- SQLLoad Direct
- OCI API Load Direct
- Using the Move option
- Alter Table lttable_namegt Move Compress
- No additional copy created but temp space and
exclusive table level lock required. Indexes will
become unuseable - Alter Table lttable_namegt
- Move Partition ltpartition_namegt Compress
- Existing data and bulk loads are compressed in a
specific partition - Tablespace Default
- CreateAlter Tablespace lttablespace_namegt
Default - CompressNocompress ...
12How to Apply Data Compression
- Online with DBMS_REDEFINITION Package
- Sort data
- May cause slower queries
- Space gains may not be significant
- May cause lower space savings
- Sequencing Columns placing duplicate value
columns side by side - May cause slower queries
- Space gains may not be significant
- May cause lower space savings
- User_Tables / All_tables view identifies
compression in 9.2.0.5 and 10g.
13Finding the Best Tables to Compress
SQL To Identify Large Tables
- SELECT owner
- ,name
- ,SUM(gb)
- ,SUM(pct)
- FROM (SELECT owner
- ,name
- ,TO_CHAR(gb,'999.99') gb
- ,TO_CHAR((RATIO_TO_REPORT(gb)
OVER())100,'999,999,999.99') pct - FROM (SELECT owner
- ,SUBSTR(segment_name,1,30) name
- ,SUM(bytes/(102410241024)) gb
- FROM dba_segments
- WHERE segment_type IN ('TABLE','TABLE
PARTITION') - GROUP BY owner
- ,segment_name
- )
- )
- WHERE pct gt 3
Source Oracle Open World Session 2004
14 Finding the Best Tables to Compress
Script to Predict the Compression Factor
- CREATE OR REPLACE FUNCTION compression_ratio
(tabname VARCHAR2) - RETURN NUMBER IS
- pct NUMBER 0.000099 -- sample
percentage - blkcnt NUMBER 0 -- original
block count (should be lt 10K) - blkcntc NUMBER -- compressed block count
- BEGIN
- EXECUTE IMMEDIATE ' CREATE TABLE
temp_uncompressed PCTFREE 0 AS SELECT
FROM ' tabname '
WHERE ROWNUM lt 1' - WHILE ((pct lt 100) AND (blkcnt lt 1000)) LOOP --
until gt 1000 blocks in sample - EXECUTE IMMEDIATE 'TRUNCATE TABLE
temp_uncompressed' - EXECUTE IMMEDIATE 'INSERT INTO
temp_uncompressed SELECT FROM '
tabname ' SAMPLE BLOCK ('
pct ',10)' - EXECUTE IMMEDIATE 'SELECT COUNT(DISTINCT(dbms_r
owid.rowid_block_number(rowid))) - FROM temp_uncompressed'
INTO blkcnt - pct pct 10
- END LOOP
- EXECUTE IMMEDIATE 'CREATE TABLE temp_compressed
COMPRESS AS SELECT FROM
temp_uncompressed' - EXECUTE IMMEDIATE 'SELECT COUNT(DISTINCT(dbms_ro
wid.rowid_block_number(rowid))) - FROM temp_compressed' INTO
blkcntc - EXECUTE IMMEDIATE 'DROP TABLE temp_compressed'
- EXECUTE IMMEDIATE 'DROP TABLE
temp_uncompressed'
Source Table Compression in Oracle9i Release2
15 Finding Duplicate Columns Values
Script to determine unique value count
- Drop Table Compress_Unique
-
- Create table Compress_Unique as select
table_name,column_name,0 Unique_value ,
Data_length, Column_ID - from user_tab_columns where rownum lt 0
- create or replace procedure Pre_Compression(itname
varchar2,ircnt number) is - TRUE integer 1
- FALSE integer 0
- Record_Found integer 1
- wsql varchar2(32767)
- ucnt number
- rcnt number
-
- cursor C1 is
- Select table_name from user_tables
- Where table_name like upper(itname)
-
- cursor C2 (itable user_tab_columns.table_namety
pe) is
16 Finding Duplicate Columns Values
Script to determine unique value count (cont)
- begin
- for c1r in c1
- Loop
- wsql 'select count() from '
c1r.table_name ' where rownum lt '
to_char(ircnt1) - execute immediate wsql into rcnt
- if (rcnt ircnt) then
- for c2r in c2 (c1r.table_name)
- Loop
- wsql 'select count(distinct '
c2r.column_name ') from ' c2r.table_name
' where rownum lt ' - to_char(ircnt1)
- execute immediate wsql into ucnt
- Delete Compress_Unique
- Where table_name c2r.table_name
- and column_name
c2r.column_name - 34 insert into Compress_Unique
values(c2r.table_name,c2r.column_name,ucnt,c2r.dat
a_length,c2r.column_id) - end loop
- end if
- end loop
- end
17 Finding Duplicate Columns Values
Script to determine unique value count (Cont)
- TABLE_NAME COLUMN UNIQUE_DATA_LENGTH
COLUMN_ID - ------------ ------- -----------------------------
- TTFACP200666 TRAGR 1 16
58 ltlt Best One 1 Column Value - TTFACP200666 TDISP 1 10
62 - TTFACP200666 TPDOC 1 8
80 - TTFACP200666 TPROJ 1 6
60 - TTFACP200666 TTTYP 1 3
1 - TTFACP200666 TCCRS 1 3
52 - TTFACP200666 TOTYP 1 3
54 - TTFACP200666 TBLOC 1 3
64 - TTFACP200666 TBREF 1 3
66 - TTFACP200666 TBANK 1 3
67 - TTFACP200666 TREAS 1 3
68 - TTFACP200666 TCVAT 2 9
19 - TTFACP200666 TCCUR 2 3
11 - TTFACP200666 TPAYM 2 3
59 - TTFACP200666 TTAPR 2 3
87 - TTFACP200666 TTYPA 2 3
93 - TTFACP200666 TVATC 3 3
18
18 Finding Duplicate Columns Values
Script to determine unique value count (Cont)
- TABLE_NAME COLUMN UNIQUE DATA_LENGTH
COLUMN_ID - ------------ ------- ------ -----------
---------- - TTFGLD106666 TDIM1 1 6
14 ltlt Best One 1 Column Value - TTFGLD106666 TDIM2 1 6
16 - TTFGLD106666 TDIM3 1 6
18 - TTFGLD106666 TDIM4 1 6
20 - TTFGLD106666 TDIM5 1 6
22 - TTFGLD106666 TCUNO 1 6
44 - TTFGLD106666 TOTYP 1 3
1 - TTFGLD106666 TFTYP 1 3
7 - TTFGLD106666 TCCUR 2 3
25 - TTFGLD106666 TCCTY 3 3
30 - TTFGLD106666 TCVAT 4 9
31 - TTFGLD106666 TCTYP 4 3
52 - TTFGLD106666 TUSER 6 16
47 - TTFGLD106666 TVLAC 43 12
33 - TTFGLD106666 TLEAC 44 12
11 - TTFGLD106666 TSUNO 584 6
45 - TTFGLD106666 TREFR 4310 30
23 ltlt Worst Many Column Values
19 Observations by Oracle
Source Secrets of Oracle9i Database Table
Compression
20 Observations by Oracle
Star Query Performance
Source Secrets of Oracle9i Database Table
Compression
21 Observations by Oracle
TPC-H Queries The TPC-H schema comprises eight
tables, two of which are compressed in the test,
The CF was 12.
Source Secrets of Oracle9i Database Table
Compression
22Observations by Carl Dudley
SELECT table_name,compressed,num_rows FROM
my_user_tables TABLE_NAME COMPRESSED
NUM_ROWS ---------- ---------- -------- EC
DISABLED 229376 EC1 ENABLED
229376
SELECT COUNT(ename) uncompressed FROM
ec UNCOMPRESSED ------------ 229376
Elapsed 000004.00
SELECT COUNT(ename) compressed FROM
ec1 COMPRESSED ----------
229376 Elapsed 000002.07
Source Oracle Open World Session 2004
23Observations by Carl Dudley
- Updates, and conventional single and multi-row
inserts are NOT compressed - UPDATE
- Wholesale updates lead to large increases in
storage (gt250) - Performance impact on UPDATEs can be around 400
- Rows are migrated to new blocks (default value of
PCTFREE is 0) - DELETE
- Performance impact of around 15 for compressed
rows - Creating a compressed table can take 50 longer
- Compressed tables cannot be modified in 9.2.0.1
- ORA-22856 cannot add column to object tables
Source Oracle Open World Session 2004
24 My Observations HPUX 64 BIT RISC
Standard Bulk Load Compression
TABLE TTFACP200666 Create Table
ttfacp200666_comp Tablespace BAANBTP_40K_666_D
Compress as select from TTFACP200666 select
segment_name,sum(bytes) from user_segments where
segment_name like 'TTFACP200666' and
segment_type 'TABLE' group by
segment_name TTFACP200666
43130880 TTFACP200666_COMP
6635520
25 My ObservationsHPUX 64 BIT RISC
Bulk Load Compression Reposition Columns
TABLE TTFACP200666 Create Table
TTFACP200666_COMP_COLPOS tablespace
BAANBTP_40K_666_D compress as Select TTTYP,
TLINE, TRATF, TSPOT, TSVAM, TSVAH, TLPDT,
TLAPA, TLAPI, TLAPH, TDID2, TDC2A, TDC2I,
TDC2H, TDID3, TDC3A, TDC3I, TDC3H,
TPADA, TPADI, TPADH, TBASI, TCDAM, TTORE,
TBACO, TBACA, TAPPR, TCCRS, TOTYP, TOINV,
TSUBC, TBLAC, TRAGR, TPROJ, TDISP, TBLOC,
TBDAT, TBREF, TBANK, TREAS, TPOST, TPDAT,
TBALC, TBALH, TLAMT, TBALA, TBAHC, TYEAR,
TPDOC, TVATY, TPAPR, TLVAT, TTEXT,
TREFCNTD, TREFCNTU, TCCUR, TRATE, TCVAT,
TPAYM, TRECD, TTAPR, TRCPT, TTYPA, TLOCO,
TVATC, TSTAP, TTDOC, TTPAY, TSTEP, TPDIF,
TUSER, TPROD, TVATP, TCPAY, TDID1, TDC1A,
TDC1I, TDC1H, TVATA, TVATI, TVATH, TBTNO,
TSUNO, TDUED, TLIQD, TORNO, TLINK,
TDAPR, TDOCD, TLINO, TDOCA, TDOCN, TREFR,
TNINV, TISUP, TAMNT, TAMTI, TAMTH from
TTFACP200666 select segment_name,sum(bytes)
from user_segments where segment_name
like 'TTFACP200666' and segment_type
'TABLE' group by segment_name TTFACP20066
6
43130880 TTFACP200666_COMP
6635520 TTFACP200666_COMP_COLPOS
6594560
26 My ObservationsHPUX 64 BIT RISC
Standard Bulk Load Compression Sorting Data
TABLE TTFACP200666 Create Table
ttfacp200666_comp Tablespace BAANBTP_40K_666_D Com
press as select from TTFACP200666 order by
TCPAY, TDID1, TDC1A, TDC1I, TDC1H, TVATA,
TVATI, TVATH, TBTNO,TSUNO, TDUED, TLIQD,
TORNO, TLINK, TDAPR, TDOCD, TLINO, TDOCA,
TDOCN, TREFR, TNINV, TISUP, TAMNT, TAMTI,
TAMTH select segment_name,sum(bytes) from
user_segments where segment_name like
'TTFACP200666' and segment_type
'TABLE' group by segment_name TTFACP200666
43130880 TTFACP200666_COMP
6635520 TTFACP200666_COMP_COLPOS
6594560 TTFACP20066
6_COMP_SORT
6225920
27 My Observations HPUX 64 BIT RISC
Bulk Load Compression Reposition Columns and
Sorting Data
TABLE TTFACP200666 Create Table
TTFACP200666_COMP_COLPOS tablespace
BAANBTP_40K_666_D compress as Select TTTYP,
TLINE, TRATF, TSPOT, TSVAM, TSVAH, TLPDT,
TLAPA, TLAPI, TLAPH, TDID2, TDC2A, TDC2I,
TDC2H, TDID3, TDC3A, TDC3I, TDC3H,
TPADA, TPADI, TPADH, TBASI, TCDAM, TTORE,
TBACO, TBACA, TAPPR, TCCRS, TOTYP, TOINV,
TSUBC, TBLAC, TRAGR, TPROJ, TDISP, TBLOC,
TBDAT, TBREF, TBANK, TREAS, TPOST, TPDAT,
TBALC, TBALH, TLAMT, TBALA, TBAHC, TYEAR,
TPDOC, TVATY, TPAPR, TLVAT, TTEXT,
TREFCNTD, TREFCNTU, TCCUR, TRATE, TCVAT,
TPAYM, TRECD, TTAPR, TRCPT, TTYPA, TLOCO,
TVATC, TSTAP, TTDOC, TTPAY, TSTEP, TPDIF,
TUSER, TPROD, TVATP, TCPAY, TDID1, TDC1A,
TDC1I, TDC1H, TVATA, TVATI, TVATH, TBTNO,
TSUNO, TDUED, TLIQD, TORNO, TLINK,
TDAPR, TDOCD, TLINO, TDOCA, TDOCN, TREFR,
TNINV, TISUP, TAMNT, TAMTI, TAMTH from
TTFACP200666 order by TCPAY, TDID1, TDC1A,
TDC1I, TDC1H, TVATA, TVATI, TVATH,
TBTNO,TSUNO, TDUED, TLIQD, TORNO, TLINK,
TDAPR, TDOCD, TLINO, TDOCA, TDOCN, TREFR,
TNINV, TISUP, TAMNT, TAMTI, TAMTH
select segment_name,sum(bytes) from user_segments
where segment_name like 'TTFACP200666'
and segment_type 'TABLE' group
by segment_name TTFACP200666
43130880 TTFACP200666_COMP
6635520 TTFACP200666_C
OMP_COLPOS
6594560 TTFACP200666_COMP_SORT
6225920 TTFACP200666_COMP_CPSRT
6103040
28 My Observations HPUX 64 BIT RISC
Bulk Load Test
TABLE TTFACP200666 Created a non-compressed
table w/o indexes Create table TTFACP200666_test
tablespace BAANBTP_40K_666_D nocompress as
select from TTFACP200666 Table created.
Elapsed 000003.56 TTFACP200666
43130880 TTFACP200666_TEST
47964160 Previous TTFACP200666_COMP Bulk
Load Time was Elapsed 000005.64 40 Increase
in Bulk Loads
29 My Observations HPUX 64 BIT RISC
First Query test with best compression
TABLE TTFACP200666 select count() from
TTFACP200666_COMP_CPSRT where trefr like
'9903' 967 Elapsed 000000.32 select
count() from TTFACP200666_test where trefr like
'9903 967 Elapsed 000000.18 43.75
Increase In Query Time
30 My Observations HPUX 64 BIT RISC
TABLE TTFACP200666 select TVATP,count() from
TTFACP200666_test group by tvatp 1
4569 2 7511 3
7444 4 5994 5 8689
6 7637 7 9357
8 8967 9 8088 10
6373 11 10496 12
7444 12 rows selected. Elapsed 000000.26
Second Query test with best compression
select TVATP,count() from TTFACP200666_COMP_CPSR
T group by tvatp 1 4569
2 7511 3 7444 4
5994 5 8689 6
7637 7 9357 8 8967
9 8088 10 6373
11 10496 12 7444 12 rows
selected. Elapsed 000000.48 45.83
Increase In Query Time
31 My Observations HPUX 64 BIT RISC
Third Query test with standard compression. No
Repositioning Columns and No Sorting Data
TABLE TTFACP200666 select count() from
TTFACP200666_COMP where trefr like '9903' 967
Times were Elapsed 000000.15 Elapsed
000000.12 Elapsed 000000.10 select count()
from TTFACP200666_test where trefr like '9903
967 Elapsed 000000.19 Elapsed
000000.18 Elapsed 000000.17 28 Decrease In
Query Scan Time
32 My Observations HPUX 64 BIT RISC
Random Baan IVC4 Sessions Tested with standard
compression
- Select for update no change or slightly faster
- Single update need more testing
- Single insert no change
- Session queries no change or slightly faster
- Batch jobs no change need more testing
- No Change in Performance
- More Testing is needed
33 My Observations HPUX 64 BIT RISC
Single Inserts Test
TABLE TTFACP200666 Insert into
TTFACP200666_COMP select .. From where rownum lt
2 1 row inserted. Times were Elapsed
000000.08 Elapsed 000000.06 Elapsed
000000.05 Insert into TTFACP200666_test
select .. From where rownum lt 2 Elapsed
000000.06 Elapsed 000000.05 Elapsed
000000.05 No Change In Single Insert
Performance
34 My Observations HPUX 64 BIT RISC
Delete Test
TABLE TTFACP200666 Delete TTFACP200666_COMP
where trefr like '9903' 967 Rows Delete
Times were Elapsed 000001.88 Elapsed
000001.96 Delete TTFACP200666_test where
trefr like '9903 967 Rows Delete Times were
Elapsed 000002.01 Elapsed 000001.87 No
Change In Delete Performance
35 My Observations HPUX 64 BIT RISC
Update Test
Updates scheduled for testing 1/24/05 1/28/05
36 My Observations TRU64 UNIX
Standard Bulk Load Compression
TABLE TTFGLD410100 create table
ttfgld410100_comp compress tablespace
bp_t100_data nologging as select from
ttfgld410100 Table created. Elapsed
002833.66 select segment_name,to_char(sum(bytes
)/(10241024),'9999.99') MB from user_segments
where segment_name like TTFGLD410100' group by
segment_name TTFGLD410100
6197.50 MB TTFGLD410100_COMP
1201.44 MB
CF Ratio 61
37 My Observations TRU64 UNIX
TABLE TTFGLD410100 Select
Performance
select count() from ttfgld410100_comp 16589287
Elapsed 000038.70 select tocom,count()
from ttfgld410100_comp group by tocom 100
6919020 101 3494419 102 4537215 103
1638633 Elapsed 000142.83 select count()
from ttfgld410100_comp where tocom 101 and
ttror 9 and tfitr 5 and ttrdt
'01-NOV-02' and ttrtm 76097 and tsern
1 and tline 0 1 Elapsed 000134.41
select count() from ttfgld410100 16589287
Elapsed 000102.26 select tocom,count()
from ttfgld410100 group by tocom 100
6919020 101 3494419 102 4537215 103
1638633 Elapsed 000223.56 select count()
from ttfgld410100 where tocom 101 and ttror
9 and tfitr 5 and ttrdt
'01-NOV-02' and ttrtm 76097 and tsern 1
and tline 0 1 Elapsed 000938.41
38 My Observations TRU64 UNIX
TABLE TTFGLD410100 Update
Performance
update ttfgld410100_comp set tamnt tamnt
.001 where tocom 102 and ttror 23
and tfitr 2 and ttrdt '03-FEB-03' and
ttrtm 6624 and tsern 4 and tline
0 1 row updated. Elapsed 000000.20 update
ttfgld410100_comp set tamnt tamnt .001
where tocom 102 and ttror 23 and
tfitr 2 and ttrdt '03-FEB-03 1912
rows updated. Elapsed 000001.01
update ttfgld410100 set tamnt tamnt
.001 where tocom 102 and ttror 23
and tfitr 2 and ttrdt '03-FEB-03' and
ttrtm 6624 and tsern 4 and tline
0 1 row updated. Elapsed 000000.21 update
ttfgld410100 set tamnt tamnt .001 where
tocom 102 and ttror 23 and tfitr
2 and ttrdt '03-FEB-03 1912 rows
updated. Elapsed 000000.56
39 My Observations TRU64 UNIX
TABLE TTFGLD410100 Bulk Update
Performance
update ttfgld410100 set tamnt tamnt
.001 where tocom 102 and ttror 23
and tfitr 2 and ttrdt between '01-FEB-03'
and '15-FEB-03 23146 rows
updated. Elapsed 000009.46
update ttfgld410100_comp set tamnt tamnt
.001 where tocom 102 and ttror 23
and tfitr 2 and ttrdt between '01-FEB-03'
and '15-FEB-03 23146 rows
updated. Elapsed 000009.33
40 My Observations HPUX 64 BIT RISC
Best Compression Strangeness
TABLE TTFGLD106666 TTFGLD106666
187801600 TTFGLD106666_COMP
29655040 TTFGLD106666_COMP
_COLPOS
29655040 TTFGLD106666_COMP_SORT
39567360 TTFGLD106666_COMP_
CPSRT
33832960 Standard Compressing with Alter Table
Move yielded the best Compression
41 Summary
- Table compression may reduce physical space gt50
- Bulk loads times may increase by 40
- Sorted data results in better compression
- Reposition columns results in better compression
- Reposition columns and sorting data results in
best - compression
- Single select or select for update no change or
faster - Table scan may increase from 10 - 25
- Single insert no change in performance
- Delete no change in performance
- Update No Change in performance need more testing
- Bulk Update No Change in performance need more
testing - Performance degradation with too much
compression - Only with standard compression not with best
compression
42Thank You James Chance Answerport,
Inc. jchance_at_answerport.com www.answerport.co
m