Title: Oracle Data Archiving Taming the Beast
1Oracle Data ArchivingTaming the Beast
Dave Moore Neon Enterprise Software
2Agenda
Archiving Defined Requirements and Solutions
Oracle Archiving Strategies Oracle Row Removal
Options Oracle Post Archive Operations
3Dave
- Oracle ACE
- Using Oracle since 1991
- Product Author at Neon Enterprise Software
- Creator of OracleUtilities.com
- Author of Oracle Utilities from Rampant Tech
Press - Core competencies include performance, utilities
and data management
4Database Archiving
Database Archiving The process of removing
selected data records from operational databases
that are not expected to be referenced again and
storing them in an archive data store where they
can be retrieved if needed.
Purge
5Trends Impacting Archive Needs
Data Retention Issues Volume of data Length of
retention requirement Varied types of
data Security issues
6 Archiving All Types of Data
Paper Blueprints Forms Claims
Word Excel PDF XML
IMS DB2 ORACLE SYBASE SQL Server IDMS
VSAM Programs UNIX Files
Outlook Lotus Notes Attachments
Sound Pictures Video
7Data Archiving and ILM
Create
Discard
Operational
Reference
Archive
Needed for completing business transactions
Needed for reporting or expected queries
Needed for compliance and business protection
Mandatory Retention Period
8Some Sample Regulations Impacting Data Retention
9What Does It All Mean?
- Enterprises must recognize that there is a
business value in organizing their information
and data. - Organizations that fail to respond run the risk
of seeing more of their cases decided on
questions of process rather than merit. - (Gartner,
20-April-2007, Research Note G00148170
Cost of E-Discovery
Threatens to Skew Justice System)
10Operational Efficiency
- Database Archiving can be undertaken to improve
operational efficiency - Large volumes of data can interfere with
production operations - efficiency of transactions
- efficiency of utilities BACKUP/RESTORE, REORG,
etc. - Storage
- Gartner databases copied an average of 6 times!
11What Solutions Are Out There?
- Keep Data in Operational Database
- Problems with authenticity of large amounts of
data over long retention times - Store Data in UNLOAD files (or backups)
- Problems with schema change and reading archived
data using backups poses even more serious
problems - Move Data to a Parallel Reference Database
- Combines problems of the previous two
- Move Data to a Database Archive
12Components of aDatabase Archiving Solution
Data Recall
13Archiving Requirements
- Policy based archiving logical selection
- Keep data for very long periods of time
- Store very large amounts of data in archive
- Maintain Archives for ever changing operational
systems - Become independent from Applications/DBMS/Systems
- Protect authenticity of data
- Access data when needed as needed
- Discard data after retention period automatically
14Policy based archiving
- Why
- Business objects are archived, not files
- Rules for when something is ready can be complex
- Data ready to be archived is distributed over
database - Implications
- User must provide policies for when something is
to be archived - How
- Full metadata description of data
- Flexible specification of policy WHERE clause
15 For Example
Parts Master is the parent table to all other
tables
PARTS MASTER
Part Number Type Description
Unit Type Cost Price
Substitute Parts
16Keep Data for a Long Time
- Why retention requirements in decades
- Implications
- Archive will outlive applications/DBMS/systems
that generated them - Archive will outlive people who designed and
managed operational systems - Archive will outlive media we store it on
- How
- Unique data store
- Application/DBMS/system independence
- Metadata independence
- Continuous management of storage
- Continuous management of archive content
17Maintain Archive for Changing Operational Systems
- Why
- Metadata changes frequently
- Applications are re-engineered periodically
- Change DBMS platform
- Change System platform
- Replace with new application
- Consolidate after merger or acquisition
- Implications
- Archive must support multiple variations of an
application - Archive must deal with metadata changes
- How
- Manage applications as major archive streams
having multiple minor streams with metadata
differences - Achieve independence from operating environment
18Achieve Metadata Independence
- Why
- Operational metadata is inadequate
- Operational metadata changes
- Operational systems keep only the current
metadata - Data in archive often does not mirror data in
operational structures - Implications
- Archive must encapsulate metadata
- Metadata must be improved
- How
- Metadata Capture, Validate, Enhance capabilities
- Store structure that encapsulates with data
- Keeps multiple versions of metadata
19Protect Authenticity of Data
- Why
- Potential use in lawsuits/ investigations
- Potential use in business analysis
- Implications
- Protect from unwanted changes
- Show original input
- Cannot be managed in operational environment
- How
- SQL Access that does not support I/U/D
- Do not modify archive data on metadata changes
- Encryption as stored
- Checksum for detection of sabotage
- Limit access to functions
- Audit use of functions
- Maintain offsite backup copies for restore if
sabotaged
20Access Data Directly From Archive
- Why
- Cannot depend on application environment
- Implications
- Full access capability within archive system
- How
- Industry standard interface (e.g. JDBC)
- LOAD format output for
- For load into a database
- May be different from source database
- Requires full and accurate metadata
- Ability to review metadata
- Ability to function across metadata changes
21Discard Function
- Why
- Legal exposure for data kept too long
- Implications
- Data cannot be kept in archive beyond retention
period - Must be removed with no exposure to forensic
software - How
- Policy based discard
- System level function
- Tightly controlled and audited
- True zero out capability
- Discard from backups as well
22Database or Archive?
Keep in DB
Keep in Archive
Performance
Space
Compliance
23Based on Data Availability
Purge
Keep in DB
Keep in Archive
Must be Available to App
Must be Available
Must Be Secure
Not Needed
24Oracle Archiving Strategies
- Designed Up Front (Yeah, right)
- Determined by Application Owner
- Implemented by ____________
- Utilize Oracle Features
25Finding Large Tables
- DBA_SEGMENTS (bytes)
- DBA_TABLES (num_rows)
- or based on I/O
26Rolling Windows
- Self Managing
- Mostly based on DATE
- Utilize DBMS Features
- Partitioning
- Transportable Tablespaces
- Exchange Partition
- Set tablespace read only
- Expdp
- Copy export file and data file
27Rolling Windows via Partitioning
P1
P47
Probably Never Accessed
Rarely Accessed
Heavily Accessed
Data Profile
Cheap as you can get
Not so fast or expensive
Fast, expensive
Storage Profile
Read / Write
Read Only / Compressed
Read Only
28Why not use transportable tablespaces or Oracle
exports for data retention?
29The Problem with Oracle Files
- Transportable Tablespaces
- Exports
- Backups
Oracle
Export Files Datafiles
Import Trans Tsp
Version 16Z
Year 2007
Year 2030
Not a good method for LT Data Retention
30Partitioning (Old ways)
- Range Partitioning
- Data is distributed based on partition key range
of values usually a date. - Good When Data is date-based.
31Partitioning (Old Ways)
- Hash Partitioning
- Uses hash algorithm to create equally sized
buckets of data. - Good When No natural partition key and desire
I/O balancing (hot spots).
32Partitioning (Old Ways)
- List Partitioning
- Data is distributed based on LIST of values in
partition key. - Good When Have short list of values (States,
Regions, Account Types)
33Partitioning (New Ways 11G)
- Interval Partitioning
- Initial Partition is created manually, the rest
are automatically created as new data arrives. - Good When Need a rolling window!
34Partitioning (New Ways 11G)
- REF partitioning
- Related Tables benefit from same partitioning
strategy, whether column exists in children or
not! - Good When Desire related data to be partitioned
in the same manner. -
35Partitioning (New Ways 11G)
- Virtual Column Partitioning
- Partition key may be based on virtual column
- Good When Virtual column is required for
partition key. -
36Rows Gotta Go
37Row Removal Options
- SQL DELETE
- CTAS / DROP / RENAME
- TRUNCATE
- Row Marking
38SQL DELETE
- Good for small number of rows
- RI handled automatically
- Oracle was born to DELETE, better than any
PL/SQL that you write. - Issue with Un-indexed Foreign Keys ?
39DELETE Optimization
- Work in batches, committing (only when
programmatically DELETING) - Use parallel DML (Partitioned tables only)
- Drop Indexes before (if possible)
- Index FK columns
40CTAS
- Works well for PURGE, not archive
- Perfect when you want to keep low percentage of
rows in the table - Doesnt handle RI no DELETE was issued.
- Process
- Create table with rows you want to keep
- Drop old table
- Rename table
- Recreate indexes
- create table new_table unrecoverable as select
from old_table where ...
41TRUNCATE
- Congratulations if your application lends itself
to TRUNCATE without losing new data - What about RI?
- May truncate or drop individual partitions
42DROP
- DROP PARTITION
- What would you do before you drop it?
- Exchange partition with table
- Transportable tablespace.
43Things to Remember
- Benchmark the best way for you
- Benchmark against real data if possible
- Use parallel DML
-
44Design Summary
- Create an architecture that lends itself to
aging, archiving, deleting - This architecture should compensate for business
requirements - For instance, customer orders not accessible
after 6 months or - top query performance needed for all ACTIVE
accounts etc - Implement it THE EASY PART
45Post Archive Challenges
46Post Archive Challenges
I have successfully deleted 10 billion rows from
the table. HoooAhhhh! Performance will be
great, space will be available, and I will get
credit for optimizing our data warehouse
application, saving the company billions of
dollars
47 2 Days Later
48Post Archive Challenges
Hmmmmm. It looks like - Queries are not any
faster . . . - The Select count() took the same
amount of time . . . - Space was not freed in
Oracle (DBA_FREE_SPACE) . . . - Space was not
freed in the operating system . . . WHY NOT
????? Where are the benefits ???
49From Swiss to Provolone
After Maintenance
After DELETE
50Post Archive Challenges
- Statistics are not fresh
- High Water Marks are very high
- Space has not been freed within Oracle (if
thats what you want) - Space has not been freed to the OS
51Refresh Statistics
- Help the optimizer, easy enough
- dbms_stats provides many options
52Automatic Stats
- Recommended by Oracle
- Calls DBMS_STATS_JOB_PROC
- Enabled via
-
- Begin
- dbms_auto_task_admin.enable(
- client_name gt auto optimizer stats
collection, - operation gt NULL,
- window_name gt NULL)
- END
- /
-
53When do you go manual ?
- High transaction DELETEs or TRUNCATEs
- Bulk loads which add more than 10 of table size
- So theres our answer go manual.
54How do we Gather Them?
- NOT the Analyze Command
- Instead DBMS_STATS package
- exec dbms_stats.gather_table_stats(ownname gt
'BDB', tabname gt 'MASTER', estimate_percent gt
dbms_stats.auto_sample_size) -
55High Water Mark
56High Water Mark
57Reset High Water Mark (HWM)
- DROP or TRUNCATE
- Multiple OTHER ways to do this depending on
version - In v9 alter table move tablespace tsp name
- Row movement must be enabled
- Tablespace must be a LMT
- Can move into same tablespace
- Will occupy 2X space temporarily
- Must then rebuild indexes
- In v10 alter table lttable_namegt shrink space
58 59Create table, check space
SQLgt create table space_example as select from
dba_source Table created. SQLgt select
count() from space_example
COUNT() ---------- 296463 SQLgt exec
dbms_space.unused_space(DAVE',
'SPACE_EXAMPLE') Total blocks 6328 Unused
blocks 1 Unused bytes 8192 Last Used Block
55 Last Used Block ID 10377 Last Used Ext File
ID 4
60Check datafile space
Size Current Poss. FILE_NAME
Poss. Size
Savings ------------------------------------------
-------- -------- -------- -------- /export/home/o
ra102/oradata/ora102/qasb001.dbf 29
46 17 /export/home/ora102/oradata/ora102/e
xample01.dbf 69 100
31 /export/home/ora102/oradata/ora102/qasb002.dbf
41 41 0 /export/home/ora102
/oradata/ora102/system01.dbf 493 500
7 /export/home/ora102/oradata/ora102/sysaux
01.dbf 430 430
0 /export/home/ora102/oradata/ora102/undotbs01.dbf
91 175 84 /export/home/ora102/
oradata/ora102/users01.dbf 44 83
39 /export/home/ora102/oradata/ora102/test.db
f 51 70 19
61Delete rows, check space
SQLgt delete from space_example 296463 rows
deleted. SQLgt commit SQLgt exec
dbms_space.unused_space(DAVE', 'SPACE_EXAMPLE')
Total blocks 6328 Unused blocks 1 Unused
bytes 8192 Last Used Block 55 Last Used Block
ID 10377 Last Used Ext File ID 4
Nothing Changed !
62Shrink it, check space
SQLgt alter table space_example enable row
movement SQLgt alter table space_example shrink
space SQLgt exec dbms_space.unused_space('BDB',
'SPACE_EXAMPLE') Total blocks 8 Unused
blocks 4 Unused bytes 32768 Last Used Block
4 Last Used Block ID 5129 Last Used Ext File ID
4
Space Freed From Table, but still in Oracle
63Check space again
Size Current Poss. FILE_NAME
Poss. Size
Savings ------------------------------------------
-------- -------- -------- -------- /export/home/o
ra102/oradata/ora102/qasb001.dbf 29
46 17 /export/home/ora102/oradata/ora102/e
xample01.dbf 69 100
31 /export/home/ora102/oradata/ora102/qasb002.dbf
41 41 0 /export/home/ora102
/oradata/ora102/system01.dbf 493 500
7 /export/home/ora102/oradata/ora102/sysaux
01.dbf 430 430
0 /export/home/ora102/oradata/ora102/undotbs01.dbf
171 175 4 /export/home/ora102/
oradata/ora102/users01.dbf 44 83
39 /export/home/ora102/oradata/ora102/test.db
f 1 70 69
This datafile should be resized to save 69 MB
SQLgt alter database datafile '/export/home/ora102/
oradata/ora102/test.dbf' resize 1m
64Free the Space
- Space is still reserved for future inserts and
updates, just not freed back to the OS - Space will not be automatically freed confirm
by checking DBA_FREE_SPACE - Ways to set it free
- drop
- truncate
- alter table move
- alter table shrink space
65Unindexed Foreign Keys Example
1 Million Rows
PARENT
COL1
ON DELETE CASCADE
COL1 COL1_PARENT
1 Million Rows
SQLgt DELETE FROM PARENT WHERE COL1 lt 1000
Fky.sql
66Before Index
delete from parent where col1 lt 1000 call
count cpu elapsed disk query
current rows ------- ------ --------
---------- ---------- ---------- ----------
---------- Parse 1 0.01 0.08
2 27 0 0 Execute
1 0.90 0.80 4 2208799
6062 999 Fetch 0 0.00
0.00 0 0 0
0 ------- ------ -------- ----------
---------- ---------- ----------
---------- total 2 0.91 0.88
6 2208826 6062
999 delete from "DAVE"."CHILD" where
"COL1_PARENT" 1 call count cpu
elapsed disk query current
rows ------- ------ -------- ----------
---------- ---------- ----------
---------- Parse 1 0.00 0.00
0 0 0 0 Execute
999 285.94 293.11 1543900 2208789
1029 999 Fetch 0 0.00
0.00 0 0 0
0 ------- ------ -------- ----------
---------- ---------- ----------
---------- total 1000 285.94 293.11
1543900 2208789 1029 999
67SQLgt create index prnt_ndx on child(col1_parent)
delete from parent where col1 lt 1000 call
count cpu elapsed disk query
current rows ------- ------ --------
---------- ---------- ---------- ----------
---------- Parse 1 0.00 0.00
0 0 0 0 Execute
1 0.53 0.47 7 13
7053 999 Fetch 0 0.00
0.00 0 0 0
0 ------- ------ -------- ----------
---------- ---------- ----------
---------- total 2 0.53 0.47
7 13 7053
999 delete from "DAVE"."CHILD" where
"COL1_PARENT" 1 call count cpu
elapsed disk query current
rows ------- ------ -------- ----------
---------- ---------- ----------
---------- Parse 1 0.00 0.00
0 0 0 0 Execute
999 0.42 0.46 2 3002
4058 999 Fetch 0 0.00
0.00 0 0 0
0 ------- ------ -------- ----------
---------- ---------- ----------
---------- total 1000 0.42 0.46
2 3002 4058 999
68Unindexed Foreign Keys
- Problem is not limited to DELETE statements
- Search database for unindexed FK columns
- Script is on asktom
- Search for unindex.sql
69Summary Points
- Create sound Archiving strategy based on Oracle
technical features as well as business and/or
legal requirements - Leverage partitioning
- Move partitions to cheap disk when appropriate
- Make partitions read only and compressed
- Remove data via DROP or TRUNCATE if possible
- If SQL DELETE, make sure to perform maintenance
operations - Consider 3rd party solutions
70Questions?
Well done is better than well said
Ben Franklin