Title: Archiving
1Archiving and Restoring
2TOC
- Terms and History
- Disaster Recovery Planning
- Backup and Restore Procedures
- Architecture (XPS differences)
- The grab bag
3Terminology
- Serial Backup
- Archives the entire system at a single point in time using only one data stream
- Parallel Backup
- Archives the requested dbspaces one at a time to N data streams
- External Backup
- Allows a third-party application to back up the database server while maintaining logical consistency
4Terminology
- Cold Restore
- Restoring the server when the database engine is offline
- Warm Restore
- Restores of dbspaces which occur while the database engine is online
- Mixed Restore
- A cold restore of a set of dbspaces followed by a warm restore of other dbspaces
5Terminology
- Imported Restore
- Transferring an archive taken on one computer and restoring it on a second computer
- Point-in-Time Restore
- Restoring the entire system to a single point in time
- Restartable Restore
- Allows the DBA to pick up the restore from the failure point
6Early Backup and Restore History
- 1.X Turbo
- Only Quiescent mode archives
- 4.X named OnLine for advanced archiving technology
- 5.X same core technology
- limitations revealed (scalability, extensibility)
7DSA Backup and Restore History
- 6.0 new client/server model developed
- 7.1 and 7.20 same core technology
- 7.21 new client (onbar)
- 7.3 server API re-write
- 9.2 onbar usability features added
8Pre-DSA Archive Bad Grammar Archive
- Archive Checkpoint (get timestamp)
- Free extents recorded
- Reserve pages saved
- Chunks backed-up by ascending chunk number
- Pages modified during archive are placed in the physical log
- tbtape routinely scans physical log for unarchived before-images
- Pages placed directly to tape
9Pre-DSA Restore
- Begins with OnLine off-line
- Reads configuration file, matches params to config params of archive tape
- Zero out logs (physical and logical)
- Validate size of all chunks
- Read tape, copying pages based on their address
directly to disk
10DSA Archive Architecture: Major Differences
- True client-server architecture
- Archived pages logically grouped by dbspaces
- Granularity of creations
- Granularity of restores
- Warm restores
- Physical log pages kept in temp tables
11Server Algorithm Changes: Good Grammar Archive
- List is made of all pages that should be archived
- Cost vs Benefit
- Before images are queued by the modifier
- A new thread is responsible for the before image
handling
12Disaster Recovery
13What is a Successful Recovery?
- Successful recovery is defined by your business
needs
14Goals For Recovery
- Determine acceptable recovery time
- How long can your business function without the data?
- How long can your production system be down during a restore?
15Determine Acceptable Data Loss
Type
Time
Quantity
Distribution
16Recovery Strategy
Plan Recovery Goals
Tune the Strategy
Select Tools
Analyze/Test the Strategy
Implement The Strategy
17Data Layout
- Poor data layout can hurt BAR performance
- Isolating the different types of data can facilitate restore priority
- Example
- 8 dbspaces each with 2 chunks, but one dbspace has 68 chunks
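The effect of a skewed layout on a parallel backup can be sketched as a simple scheduling estimate. This is only an illustration under stated assumptions: chunks are treated as equal-sized units of work, a dbspace is never split across data streams, and the function name and the 4-stream count are hypothetical, not taken from any Informix tool.

```python
import heapq

def parallel_backup_makespan(chunks_per_dbspace, streams):
    """Estimate elapsed backup time in chunk units.

    Assumption: each dbspace goes to exactly one stream, largest first,
    always onto the currently least-loaded stream.
    """
    loads = [0] * streams
    heapq.heapify(loads)
    for size in sorted(chunks_per_dbspace, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + size)
    return max(loads)

# Hypothetical layouts holding the same 84 chunks in total.
skewed = [2] * 8 + [68]   # eight small dbspaces plus one 68-chunk dbspace
even = [12] * 7           # seven dbspaces of 12 chunks each

print(parallel_backup_makespan(skewed, streams=4))
print(parallel_backup_makespan(even, streams=4))
```

In this model the skewed layout can never finish faster than its 68-chunk dbspace, no matter how many streams are added.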
18Data Layout Examples
- Important, frequently modified data in its own dbspaces
- important data such as orders should go in dbspace_orders
- a dbspace containing zipcodes and other lightly modified data can be backed up less frequently
19Right, Fast or Cheap?
20Select Tools
- Load/Unload
- High Performance Loader (HPL)
- dbexport/dbimport
- dbschema
- SQL load/unload
- onload/onunload
- dbload
- Customer ESQL programs
- Backup Utilities
- ontape
- ON-Bar
- External Backup/Restore
- Fault Tolerance Mechanisms
- Mirroring
- High Availability Data Replication (HDR)
- Enterprise Data Replication (DR)
21Ontape Backup Features
- Backup at the Server level
- Support for incremental backups
- Manual or continuous logical log backup
- Restore entire system or single dbspace
- Backup is self describing
22On-Bar Backup Features
- Parallel backup and restore
- System and dbspace level backup and restore
- Support for incremental backups
- Manual or automatic backup of logical logs
- Instance point-in-time recovery
- Open interface for communication with storage
managers (XBSA)
23External Backup Features
- EBR allows administrators to make a consistent copy of their dbspaces using external tools
- Used with many third-party backup products
- Allows for both cold and warm restores
24EBR - Examples
- Planned uses
- File system snapshots
- Breaking of mirrors
- Third party raw backup
- Basic Steps
- Block coserver(s) at checkpoint
- Backup dbspaces using third party tools
- Unblock coserver(s)
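The basic steps above (block, back up, unblock) can be sketched as a wrapper that always unblocks the server, even when the third-party copy fails. The three callables are hypothetical placeholders for whatever commands a site actually uses; no real server or storage-manager command is assumed here.

```python
from contextlib import contextmanager

@contextmanager
def blocked_server(block, unblock):
    """Hold the server blocked at a checkpoint for the duration of the body."""
    block()
    try:
        yield
    finally:
        # Runs even if the copy raises, so the server is not left blocked.
        unblock()

def external_backup(block, copy_dbspaces, unblock):
    """Block, copy dbspaces with an external tool, then unblock."""
    with blocked_server(block, unblock):
        copy_dbspaces()

# Usage with stand-in callables that only record the order of events.
events = []
external_backup(
    block=lambda: events.append("block"),
    copy_dbspaces=lambda: events.append("copy"),
    unblock=lambda: events.append("unblock"),
)
print(events)
```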
25Restoring
- Logical Logs required
- Restore looks hung, nothing's happening
- Handling unanticipated problems
26Logical Logs Required for a Restore
- Cold Parallel Restore
- Starting log is the log that contains the begin of the oldest active transaction when the first archive checkpoint occurred
- At least the logical log that contains the last archive checkpoint
- Cold Whole System (Non-Parallel)
- No logical logs required
- Logs included with archive
27Logical Logs Required for a Restore
- Warm Restore
- Starting log is the log that contains the begin of the oldest active transaction when the first archive checkpoint occurred
- All logs to the current point in time
- If you are using DR then you must include the replay point
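The rules on the two slides above can be sketched as small helpers that return the minimum set of logical logs for a restore. This is a sketch, assuming logical logs are identified by consecutive integer IDs; the function and parameter names are hypothetical.

```python
def logs_for_cold_parallel_restore(oldest_begin_log, last_checkpoint_log):
    """Minimum logs: from the log holding the oldest open transaction's begin
    (at the first archive checkpoint) through the log holding the last
    archive checkpoint."""
    if last_checkpoint_log < oldest_begin_log:
        raise ValueError("last checkpoint log precedes the starting log")
    return list(range(oldest_begin_log, last_checkpoint_log + 1))

def logs_for_warm_restore(oldest_begin_log, current_log, replay_log=None):
    """All logs from the starting log to the current log. If a replication
    replay point is supplied, the range is widened to include it."""
    start = oldest_begin_log if replay_log is None else min(oldest_begin_log, replay_log)
    if current_log < start:
        raise ValueError("current log precedes the starting log")
    return list(range(start, current_log + 1))

print(logs_for_cold_parallel_restore(10, 12))
print(logs_for_warm_restore(11, 13))
```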
28Example of Logical Logs Required for a Restore
[Figure: timeline of logical logs 10 through 13 with begin-work (B) markers, showing the oldest begin work and the archive checkpoint]
Logs Required
- Cold restore (all): Logs 10-12, optional Log 13
- Warm restore: Logs 11 onward, no optional logs
29Restartable vs. Suspended Restore
- Restartable Restore
- When the database engine prematurely shuts down, the engine may be restarted in recovery mode
- Suspended Restore
- When the archive client receives an error which is restartable and the database engine does not shut down
30Restartable Restore
- Turned OFF by default
- What can restart when?
- Whole system
- Partial Restore
- Logical Recovery from a cold restore
- Only available with On-BAR
- onbar -RESTART
31Architecture
- Overview
- Archive Clients
- Moving Data
- IDS
- XPS
- Server Threads
- XPS Architecture
32What Pages are Sent to the Archive
- If a page's timestamp is older than maxstamp and newer than minstamp, it is put to tape
- If a page's timestamp is greater than the current stamp, but older than minstamp, it is put to tape, and its timestamp is updated to maxstamp-1
- Pages newer than maxstamp, but older than the current stamp, are considered to be modified after the archive started, and are ignored.
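The three rules above can be sketched with modular arithmetic. This is a hedged illustration, assuming timestamps live on a 32-bit wheel (so minstamp sits half a wheel behind maxstamp); the function name, return labels, and boundary handling are assumptions, not the server's actual code.

```python
WHEEL = 1 << 32   # assumed size of the timestamp wheel
HALF = 1 << 31    # distance from maxstamp back to minstamp

def classify_page(stamp, max_stamp, current_stamp):
    """Return (action, resulting_stamp) for one page.

    max_stamp is the timestamp at the start of the archive and
    current_stamp is the timestamp now; both may have wrapped.
    """
    age = (current_stamp - stamp) % WHEEL              # how far behind "now"
    since_start = (current_stamp - max_stamp) % WHEEL  # length of the archive so far
    if age < since_start:
        # Newer than maxstamp: modified after the archive started, ignored here.
        return ("skip", stamp)
    behind_max = (max_stamp - stamp) % WHEEL
    if behind_max < HALF:
        # Between minstamp and maxstamp: put to tape unchanged.
        return ("archive", stamp)
    # Older than minstamp: put to tape and refresh the stamp to maxstamp-1.
    return ("archive_and_refresh", (max_stamp - 1) % WHEEL)

print(classify_page(500, max_stamp=1000, current_stamp=1500))
print(classify_page(1200, max_stamp=1000, current_stamp=1500))
print(classify_page(HALF + 500, max_stamp=1000, current_stamp=1500))
```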
33Understanding Timestamps
0
Max-Stamp
Not Archived
Current Stamp
34OnLine Wheel-O-Death
[Figure: timestamp wheel, starting at 0]
- Min-Stamp: the timestamp 50% away from Max-Stamp, i.e. Max-Stamp - 2GB
- Max-Stamp: the timestamp at the start of the archive
- Current Stamp: the timestamp at the current point in time
- Not Archived (region label)
- All pages in the red region have their timestamp updated along with being archived
35Archive Clients
EBR
Onbar
SMV
Common Archive Code
XBSA
XBSA
Ontape
36DSA Client Server Model
SQLI/ASF Network Connection
Archive Client
Archive BE
Streams Local Connection
37Moving Data between Client and Server
ONINIT
SQLI Requests Archive Data Buffer
SQLI Returns Shared Memory Address
Archive Client
Shared Memory
38Moving Data between Client/Server
- The size of the buffers used to transmit data
- ontape - controlled by the onconfig's TAPEBLK
- onbar - BAR_XFER_BUFSIZE - maximum size is one online page smaller than 64 KB
- The number of buffers
- ontape
- onbar - BAR_XPORT_COUNT min 3 max 99
- Monitoring the data transfer
- onstat -g stq
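The limits quoted above can be worked through as a short calculation. This is a sketch only: the helper names are hypothetical, and the 2 KB page size in the usage line is an example value rather than something the slides state.

```python
def max_transfer_buffer_bytes(page_size_bytes):
    """Largest transfer buffer: one online page smaller than 64 KB."""
    limit = 64 * 1024
    if page_size_bytes <= 0 or page_size_bytes >= limit:
        raise ValueError("page size must be between 1 byte and 64 KB")
    return limit - page_size_bytes

def clamp_buffer_count(requested):
    """Keep a requested buffer count inside the quoted 3..99 range."""
    return max(3, min(99, requested))

page = 2048  # example page size
largest = max_transfer_buffer_bytes(page)
print(largest, largest // page)   # bytes, and the same limit in whole pages
print(clamp_buffer_count(200))
```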
39What Data is Shipped to the Archive Client
- Server sends raw online pages just like they
exist on disk
40Example of onstat -g stq
- Stream Queue (session 11 cnt 10) 0ad91400 1ada1400 2adb1400 3adc1400 4add1400 5ade1400 6adf1400 7ae01400 8ae11400 9ae21400
- Full Queue (cnt 0 waiters 0) 00 1ada1400 2adb1400 3adc1400 4add1400 5ade1400 6adf1400 7ae01400 8ae11400
- Empty Queue (cnt 0 waiters 1)
- Stream Queue (session 10 cnt 10) 0ac8d400 1ac9d400 2acad400 3acbd400 4accd400 5acdd400 6aced400 7acfd400 8ad0d400 9ad1d400
- Full Queue (cnt 9 waiters 0) 0ac9d400 1acad400 20 3accd400 4acdd400 5aced400 6acfd400 7ad0d400 8ad1d400
- Empty Queue (cnt 0 waiters 1)
41Server Threads
- ontape
- Scanner
- Before Image Processor
42Ontape Thread
- Always called ontape regardless of archive client
- Responsible for all communication to archive
client
43Scanner Thread (arc_backup1)
- The dummy thread, geared for I/O performance and not thinking
- Handed a list of pages to back up
- Scans data from disk into shared memory buffers
- Makes NO decisions about the data
- Ensures the page address is correct
44Before Image Processor Thread (arc_backup2)
- Monitors the before image queues
- Determines if the before image needs to be saved or discarded
- Drains the before image memory queue by storing the page images into temp tables
- Creates multiple temp tables if required
45XPS Difference Architecture Overview
- Basic XPS Architecture
- Client Sub-Systems
- Server Sub-Systems
- Differences
- sysutils
- configuration
46Basic XPS Architecture
Storage Manager 1
Coserver 4
Storage Manager 2
Coserver 3
OnLine XPS
Coserver 2
Coserver 1
onbar
47Client Sub-Systems
48Client Sub-Systems
Storage Manager 1
onbar_w
Coserver 4
Coserver 3
OnLine XPS
Coserver 2
Coserver 1
onbar
onbar_d
49Server Sub-Systems
- ASF/local streams
- Send/Receive commands and data buffers
- Backup Scheduler (BUS)
- distributes tasks to workers
- XBAR
- communicates between coservers
- RSAM
- only sees a single coserver
- manages all I/O to disk (dbspaces/chunks)
New
New
50XBAR
- Interfaces with both BUS and RSAM
- Manages distributed execution of backups and restores
- transfers data from the object's coserver (coserver where the dbspace/chunk exists) to onbar_w's coserver (output coserver)
- Uses XMF between coservers
- Uses local stream between onbar_w and output coserver
51Backup Scheduler (BUS)
- Manages user requests, workers, storage managers and coservers
- Farms out work to onbar_w
- Reports success or failure to onbar_d after each work item has been attempted
- onbar_w creates a new worker queue in the BUS when it is started
52XBAR/BUS support in SMI
- New tables for BUS data structures
- sysbusession: list of sessions
- sysbuobject: what's in the queue
- sysbuobjses: for which session
- sysbusm: BAR_SM paragraphs
- sysbusmdbspace: space to BAR_SM map
- sysbusmlog: logstream to BAR_SM map
- sysbusmworker: worker to BAR_SM map
- sysbuworker: info about each onbar_w
53Moving Data between Client/Server Version 8
SQLI
Storage Manager 1
onbar_w
Coserver 4
Coserver 3
OnLine XPS
Coserver 2
Coserver 1
Shared Memory
SQLI
onbar
onbar_d
54Difference Between 8 and 7
- Multiple Nodes
- Non-locality of devices and data
- Backup data may be shipped between nodes
- Multiple Storage Managers
- One storage manager can serve the entire system
- Multiple storage managers can eliminate
performance bottlenecks for large systems
55Difference immediately seen by DBAs
- Command line is slightly different
- Configuration parameters are very different
- Version 7 has 6 configuration parameters, none needs to be set
- Version 8 has 15 configuration parameters, most must be configured
56Difference immediately seen by DBAs
- sysutils has more columns
- Emergency bootfiles
- more columns
- 1 boot file per coserver
- Merge boot files
- Additional onstat options
57arc_very_old_pages(): Why do it?
58arc_very_old_pages()
- Permanent solution 1
- No longer use timestamps for recovery
- Disk timestamps do not need to be refreshed
- Memory and disk timestamp are different
- Bitmaps used to keep track of foreground writes
- Permanent solution 2
- Multiple instances of the same page in the physical log
- Only the oldest instance of a page is restored during physical recovery
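The "oldest instance wins" rule of permanent solution 2 can be sketched as a first-occurrence filter. This is a minimal sketch, assuming the physical log is read oldest entry first and each entry is a (page address, before-image) pair; the names are hypothetical.

```python
def pages_to_restore(physical_log):
    """Pick the before-image to restore for each page address.

    physical_log: iterable of (page_address, image), oldest entry first.
    Later images of an already-seen page are skipped.
    """
    oldest = {}
    for address, image in physical_log:
        # setdefault keeps the first (oldest) image seen for this address.
        oldest.setdefault(address, image)
    return oldest

log = [(0x200235, "v1"), (0x20023D, "a1"), (0x200235, "v2"), (0x200235, "v3")]
print(pages_to_restore(log))
```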
59 7.31 Solution 1
Physical Recovery Started at Page(11065).
Physical Recovery Complete 0 Pages Examined 0 Pages Restored.
60 9.21 Solution 2
Physical Recovery Started at Page(11065).
Physical Recovery Complete 0 Pages Examined 0 Pages Restored.
61Override Internal Error Checks
- The -O option is much like -f for UNIX rm
- Does many different things
- Allows restore of a space that is still on-line
- Creates a filesystem entry for each chunk if there isn't one
- Allows expiration of objects from sysutils and the storage manager that may be needed in a restore
62Archive Utilities
- Explaining onstat and oncheck options
- onstat -d
- onstat -g arc
- onstat -g stq
- Validating Archive
- Managing the archive catalogs
63onstat -g arc
num  DBSpace   Q Size  Q Len  Buffer  partnum   size  scanner
2    dbspace1  92      0      4       0x100085  240   0x2033ee
3    dbspace2  69      0      1       0x100084  150   0x302f1a
Dbspaces - Archive Status
name      number  level  date             log  log-position
rootdbs   1       0      10/04/2001.1017  5    0x10b608
dbspace1  2       0      10/04/2001.1017  5    0x10b608
dbspace2  3       0      10/04/2001.1017  5    0x10b608
sbspace1  4       0      10/04/2001.1017  5    0x10b608
sbspace2  5       0      10/04/2001.1017  5    0x10b608
64onstat -d information
- D Chunk is down
- L Storage space is being logically restored
- O Chunk is online
- P Storage Space is physically restored
- R Storage space is being restored
65oncheck -pr
- Validating PAGE_1DBSP PAGE_2DBSP...
- DBspace number 2
- DBspace name dbspace1
- . . . . .
- DBspace archive status
- Archive Level 0
- Real Time Archive Began 10/04/2001 103309
- Time Stamp Archive Began 306128
- Logical Log Unique Id 6
- Logical Log Position 0x3d2018
- Archive Level 1
- Real Time Archive Began 10/04/2001 103528
- Time Stamp Archive Began 323695
- Logical Log Unique Id 8
- Logical Log Position 0x208018
66Validating Archives
- Utilizes an executable called archecker
67Validating Archives
- What is actually validated
- What other information is there for me
- What else can go wrong with my validated restore
- How do I validate my archives
68What is actually validated
- Format of each page on the archive is checked (similar to oncheck -cd)
- Tape control pages are sanity checked
- Each table is checked, ensuring all pages of the table exist on the archive tape
- Reserve page format is validated
- Each chunk free list is verified
- Table extents are checked for overlap (oncheck -pe)
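The extent-overlap check in the last bullet can be sketched as a sort-and-sweep over (name, start address, length) triples. This is an illustration of the idea only, not archecker's implementation; treating the length as a page count added to the start address is an assumption.

```python
def find_overlaps(extents):
    """Report pairs of extents that overlap.

    extents: iterable of (name, start_address, length_in_pages).
    Each extent is compared with the extent that reaches furthest so far,
    so an overlap is reported against that extent only.
    """
    overlaps = []
    furthest = None  # extent with the highest end address seen so far
    for ext in sorted(extents, key=lambda e: e[1]):
        name, start, length = ext
        if furthest is not None and start < furthest[1] + furthest[2]:
            overlaps.append((furthest[0], name))
        if furthest is None or start + length > furthest[1] + furthest[2]:
            furthest = ext
    return overlaps

clean = [
    ("sysprocedures", 0x00200235, 8),
    ("sysprocbody", 0x0020023D, 32),
    ("sysprocauth", 0x0020025D, 8),
]
print(find_overlaps(clean))
print(find_overlaps([("a", 100, 10), ("b", 105, 10)]))
```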
69Other Information for the DBA
- AC_MSGPATH - Message log for archecker
- AC_STORAGE/INFO
- extent list for each dbspace, oncheck -pe DBS.dbspace_
- time to process each tape/object
- Information about the number and type of pages processed profile.pid
- AC_STORAGE/SAVE
- contains a binary image of control information
70Profile Information
- Profile Information
-
- Total pages processed 51227
- Total Data pages 49327
- Total index pages 828
- Total smart blob pages 6
- Total blob space pages 0
- Total partition pages 328
- Total chunk free list pages 5
- Total Reserve pages 12
- Total bit map pages 335
- MORE . . .
71Extent Information
- db1sysprocedures 0x00200235 8
- db1sysprocbody 0x0020023D 32
- db1sysprocauth 0x0020025D 8
- db1sysprocedures 0x00200265 8
- db1sysprocbody 0x0020026D 32
- db1t1 0x0020028D 24344
- FREE 0x002061A5 3
72Validating Archives
- ontape
- archecker -tdvs
- AC_TAPEBLK, AC_TAPEDEV
- onbar
- onbar -r -v (version 7.3X)
- onbar -v (9.20 8.30)
- onbar -b -v (8.30)
73onsmsync
- Adds objects from ixbar files to sysutils
- Removes objects from sysutils
- Three expiration policies
- -g remove older than the Nth generation
- -t remove from before a datetime
- -i remove older than an interval
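The three expiration policies can be sketched as a selection over backup completion times. This is a simplified model, not onsmsync itself: it assumes one timestamp per backup generation of a single object, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

def select_expired(backup_times, generations=None, before=None, interval=None, now=None):
    """Return the backup times that a single policy would expire, newest first.

    generations: keep the N newest backups, expire the rest
    before:      expire backups taken before this datetime
    interval:    expire backups older than this timedelta, measured from now
    """
    chosen = [p for p in (generations, before, interval) if p is not None]
    if len(chosen) != 1:
        raise ValueError("choose exactly one expiration policy")
    newest_first = sorted(backup_times, reverse=True)
    if generations is not None:
        return newest_first[generations:]
    if before is not None:
        return [t for t in newest_first if t < before]
    cutoff = (now or datetime.now()) - interval
    return [t for t in newest_first if t < cutoff]

times = [datetime(2001, 10, day) for day in (1, 2, 3, 4)]
print(select_expired(times, generations=2))
print(select_expired(times, before=datetime(2001, 10, 3)))
```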
74Understand ixBar Files
- Server name
- object name
- object type
- is_serial
- action id
- archive level
- SMV copy id high
- SMV copy id low
- Backup start date
- Backup start time
- Backup end date
- Backup end time
75Storage Manager Snafus
- Timeout of onbar
- Error 131 Object not found
- Salvaging logs and getting wrong object
76Recovery Snafus
- Check the devices are linked properly
- KAIO only uses raw I/O
- overlapping data
- While restoring database appears hung
77Preparing to Call Support
78Restore seems Hung
- The tape is done
- onstat -D shows no I/O
- Very little CPU activity
- While the system clears the physical and logical
logs there is very little activity and the system
appears to be hung.
79Improvements
- A message into the online log indicating this
phase of the restore started and completed.
- The use of intelligent parallelism to clear all the logs in a single chunk with one thread. One disk clear thread per chunk.
Clearing the physical and logical logs has started
Cleared 2100 MB of the physical and logical logs in 612 seconds
80Parallel Archive Procedures
- The archive is broken down into archive jobs with each dbspace being its own backup
- An onbar_d is started to back up a single dbspace
- Connects to database server and storage manager requesting the backup session
- Updates sysutils and ixbar file
81Parallel Restore Procedures