Title: Oracle DataGuard Support Issues
1Oracle DataGuardSupport Issues
- Brian HitchcockOCP 10g DBA
- Sun Microsystems
- brian.hitchcock_at_sun.com
- brhora_at_aol.com
www.brianhitchcock.net
Brian Hitchcock October 23, 2007
Page 1
2Oracle DataGuard
- Focus on Logical Standby Support Issues
- Maintains a standby database
- Archived redo logs on primary
- Sent to standby and applied
- Simple idea
- Many configuration options
- Can become very complicated
3DataGuard
- Must be SYS to make changes
- Sqlplus / as sysdba
- Changes to DataGuard standby database
- Some cant be made while apply process running
- Change Guard status
- Support Issues
- Create physical standby
- Convert to logical standby
- After logical standby is running
- Refresh process
4DataGuard Errors
- DataGuard reports lot of errors
- Standby database alert log
- Many are for normal operation
- Why reported as errors?
- Monitoring of db alert log
- Will report these errors
- Hard to filter out normal errors
5Create Physical Standby
- On Primary database
- Enable Forced Logging
- Create password file
- Setup init.ora/spfile parameters
- Cant connect to standby
- SYS password
- Verify archiving enabled
- Backup db (hot or cold)
- Create standby control file
6Create Physical Standby
- On Standby database
- Copy db backup files from primary
- Copy standby control file from primary
- Setup init.ora/spfile parameters
- Start physical standby db
- Trace file
- Verify physical standby working
- May not see redo logs, register them
- Redo logs not deleted, use RMAN
7Convert to Logical Standby
- On Primary database
- Build LogMiner dictionary
- On Standby database
- Stop redo apply
- Errors, no impact
- Convert database to logical standby
- Two trace files
- Restart db
- Open resetlogs
- Verify logical standby working
8Logical Standby is Running
- Business requirements
- Standby frozen most of the day
- Standby catches up once per day
- Alert log messages while catching up
- Disk space for archived redo logs
- Other issues
- Apply process is slow
- How to detect, resolve
- Primary versus Standby backups
- Impact, resolution
9Logical Standby is Running
- Other Issues
- Constraint violations
- Errors, resolution
- No data found
- Errors, resolution
- ORA-16211
- Errors, Oracle Support
- Primary db XDB schema issues
- Fixed on primary, errors on standby
10Logical Standby is Running
- Other issues
- ORA-07445
- Refresh cures all
- Refresh process
- After refresh
- ORA-16211 unsupported record found in the
archived redo log - Compile invalid objects
- Import into standby database
11Primary Cant Connect
- Standby not available
- Reported on primary production database
- ORACLE not available
- Looks like production primary is down
- Your monitoring may need to be adjusted
Thu Oct 18 165920 2007 Error 1034 received
logging on to the standby Thu Oct 18 165920
2007 Errors in file /shared/orahome01/admin/BRHPR
OD/bdump/brhprod_arc1_2635.trc ORA-01034
ORACLE not available PINGARC1 Heartbeat
failed to connect to standby BRHPRSB'. Error is
1034.
12SYS Password Issue
- Primary tries to connect to standby
Mon Oct 8 153136 2007 Error 1017 received
logging on to the standby -----------------------
------------------------------------- Check that
the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED
or EXCLUSIVE, and that the SYS password is same
in the password files. returning error ORA-16191
-------------------------------------------------
----------- Mon Oct 8 153136 2007 Errors in
file /orahome01/admin/BRHBETA/bdump/brhbeta_arc0_2
309.trc ORA-16191 Primary log shipping client
not logged on standby PINGARC0 Heartbeat
failed to connect to standby BRHBRSB'. Error is
16191.
13SYS Password Issue
- Verify SYS password is the same
- On primary and standby
- Sqlplus sys/ltpasswordgt
- Verify password file has same password
- On primary and standby
- Cat ORACLE_HOME/dbs/orapwltSIDgt
- Refresh password file
- Alter user SYS identified by ltpasswordgt
- Update password file
14DataGuard Trace File
- Physical Standby
- Start log apply process
- Trace file created
- Stops when log apply process stops
- See file contents later
15Cant See Redo Logs
- Physical Standby
- Creating or Refreshing standby
- Primary configured, sending redo logs
- Standby not yet created/running
- Standby may not register redo logs
- Our scripts maintain primary archived redo logs
- Compress to save disk space, delete after 2 days
- Manually register
- Alter database register logfile ltlogfilegt
- DataGuard applies redo log
16Cant See Redo Logs
BRHBETAgt select from varchive_gap
THREAD LOW_SEQUENCE HIGH_SEQUENCE ----------
------------- -------------- 1 1959 1976
BRHBETAgt select sequence, applied from
varchived_log order by sequence SEQUENCE
APP ---------- --- 1956 YES 1957 YES
1958 YES 1977 NO 1978 NO 1979 NO 1980
NO 1981 NO 1982 NO 1983 NO 1984 NO
11 rows selected.
1959 thru 1976 on standby
17Redo Logs Not Deleted
- Physical Standby
- After applied to standby
- Unlike logical standby
- SQL apply process does delete them
- Use RMAN
- Possible disk space issues on standby
- How long will you need to store redo logs?
- Not an issue if converting to logical soon
18Stop Physical Standby
- Log Apply Process start
- Starts trace file
- When physical standby first created
- Ends when log apply stops
- Normal processing
- Trace file looks like a problem
19Turn Off Apply Process
- Physical standby
- Turn off apply process
- Regular maintenance
- Converting to Logical Standby
- Generates error
- Why is this an error?
- Typical of DataGuard
- Everything seems to be an error
- Even when it is perfectly routine
- Makes support more difficult
- When is an error something to worry about?
20Standby Alert Log
Tue Oct 9 163434 2007 Physical Standby
Database mounted. Completed ALTER DATABASE
MOUNT Tue Oct 9 163436 2007 alter database
recover managed standby database disconnect from
session Tue Oct 9 163436 2007 Attempt to
start background Managed Standby Recovery process
(BRHBETA) MRP0 started with pid11, OS id13474
Tue Oct 9 163436 2007 MRP0 Background
Managed Standby Recovery process started
(BRHBETA) Managed Standby Recovery not using
Real Time Apply parallel recovery started with 7
processes
Log apply process started when physical standby
created
21Standby Alert Log
Wed Oct 10 101515 2007 alter database recover
managed standby database cancel Wed Oct 10
101519 2007 MRP0 Background Media Recovery
cancelled with status 16037 Wed Oct 10 101519
2007 Errors in file /orahome01/admin/BRHBETA/bdum
p/brhbeta_mrp0_13474.trc ORA-16037 user
requested cancel of managed recovery operation
Recovery interrupted! Wed Oct 10 101520 2007
Errors in file /orahome01/admin/BRHBETA/bdump/brh
beta_mrp0_13474.trc ORA-16037 user requested
cancel of managed recovery operation Wed Oct 10
101520 2007 MRP0 Background Media Recovery
process shutdown (BRHBETA) Wed Oct 10 101521
2007 Managed Standby Recovery Canceled (BRHBETA)
Wed Oct 10 101521 2007 Completed alter
database recover managed standby database cancel
Log apply process stopped preparing to convert
to logical standby
22Trace File
more /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0
_13474.trc /orahome01/admin/BRHBETA/bdump/brhbeta
_mrp0_13474.trc Oracle Database 10g Enterprise
Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Data Mining
options ORACLE_HOME /orahome01/product/10.2.0
System name SunOS Node name brh-beta1-zone04
Release 5.10 Version Generic_118833-36
Machine sun4u Instance name BRHBETA Redo
thread mounted by this instance 1 Oracle
process number 11 Unix process pid 13474,
image oracle_at_beta1-zone04 (MRP0) SERVICE
NAME() 2007-10-09 163436.298 SESSION
ID(394.1) 2007-10-09 163436.298 ARCH
Connecting to console port... 2007-10-09
163436.299 60639 kcrr.c MRP0 Background
Managed Standby Recovery process started
Start applying redo logs to physical standby
23Trace File
2007-10-09 163441.302 1018 krsm.c Managed
Recovery Initialization posted. 2007-10-09
163441.303 60639 kcrr.c Managed Standby
Recovery not using Real Time Apply Recovery
target incarnation 2, activation ID 0 Influx
buffer limit 27762 (50 x 55524) Successfully
allocated 7 recovery slaves Using 158 overflow
buffers per recovery slave Start recovery at
thread 1 ckpt scn 8257757517457 logseq 1956 block
5 2007-10-09 163442.124 Media Recovery
add redo thread 1 2007-10-09 163442.124
1018 krsm.c Managed Recovery Active posted.
ORA-00367 checksum error in log file header
ORA-00305 log 1 of thread 1 inconsistent
belongs to another database ORA-00312 online
log 1 thread 1 '/shared/oralogs01/BRHBETA/redo01a
.log' 2007-10-09 163442.147 60639 kcrr.c
Clearing online redo logfile 1
/shared/oralogs01/BRHBETA/redo01a.log
2007-10-09 163615.066 2007-10-09
163615.066 60639 kcrr.c Clearing online redo
logfile 1 complete ORA-00367 checksum error in
log file header ORA-00305 log 2 of thread 1
inconsistent belongs to another database
ORA-00312 online log 2 thread 1
'/shared/oralogs01/BRHBETA/redo02a.log'
Recreating redo logs
24Trace File
2007-10-09 163615.100 60639 kcrr.c
Clearing online redo logfile 2
/shared/oralogs01/BRHBETA/redo02a.log
2007-10-09 163751.473 2007-10-09
163751.473 60639 kcrr.c Clearing online redo
logfile 2 complete ORA-00367 checksum error in
log file header ORA-00305 log 3 of thread 1
inconsistent belongs to another database
ORA-00312 online log 3 thread 1
'/shared/oradata02/BRHBETA/redo03b.log'
2007-10-09 163751.479 60639 kcrr.c Clearing
online redo logfile 3 /shared/oradata02/BRHBETA/re
do03b.log 2007-10-09 163926.048
2007-10-09 163926.048 60639 kcrr.c Clearing
online redo logfile 3 complete ORA-00367
checksum error in log file header ORA-00305 log
4 of thread 1 inconsistent belongs to another
database ORA-00312 online log 4 thread 1
'/shared/oradata02/BRHBETA/redo04b.log'
2007-10-09 163926.488 60639 kcrr.c Clearing
online redo logfile 4 /shared/oradata02/BRHBETA/re
do04b.log 2007-10-09 164100.447
2007-10-09 164100.447 60639 kcrr.c Clearing
online redo logfile 4 complete 2007-10-09
164100.469 60639 kcrr.c Media Recovery Waiting
for thread 1 sequence 1956
25Trace File
Applying redo logs to physical standby
2007-10-09 164100.469 60639 kcrr.c
Fetching gap sequence in thread 1, gap sequence
1956-1976 2007-10-09 164130.782
-------------------------------------------------
---------- Check that the CONTROL_FILE_RECORD_KEE
P_TIME initialization parameter is defined to a
value that is sufficiently large enough to
maintain adequate log switch information to
resolve archivelog gaps. -----------------------
------------------------------------
2007-10-09 165431.045 2007-10-09
165431.045 60639 kcrr.c Fetching gap sequence
in thread 1, gap sequence 1956-1956
2007-10-09 165501.154 -------------------------
---------------------------------- Check that
the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is
sufficiently large enough to maintain adequate
log switch information to resolve archivelog
gaps. -------------------------------------------
---------------- 2007-10-09 165631.179
Media Recovery Log /oraarch01/BRHBETA/LOG_1956_1_
629245032.arc 2007-10-09 165633.431 Media
Recovery Log /oraarch01/BRHBETA/LOG_1957_1_6292450
32.arc 2007-10-09 165644.495
26Trace File
2007-10-09 165644.495 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 1958
2007-10-09 165644.495 60639 kcrr.c Fetching
gap sequence in thread 1, gap sequence 1958-1976
2007-10-09 165714.647 --------------------
--------------------------------------- Check
that the CONTROL_FILE_RECORD_KEEP_TIME
initialization parameter is defined to a value
that is sufficiently large enough to maintain
adequate log switch information to resolve
archivelog gaps. -------------------------------
---------------------------- 2007-10-09
170514.785 Media Recovery Log
/oraarch01/BRHBETA/LOG_1958_1_629245032.arc
2007-10-09 170518.043 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 1959
2007-10-09 170518.043 60639 kcrr.c Fetching
gap sequence in thread 1, gap sequence 1959-1976
2007-10-09 170548.284 --------------------
--------------------------------------- Check
that the CONTROL_FILE_RECORD_KEEP_TIME
initialization parameter is defined to a value
that is sufficiently large enough to maintain
adequate log switch information to resolve
archivelog gaps. -------------------------------
----------------------------
27Trace File
2007-10-09 170718.309 Media Recovery Log
/oraarch01/BRHBETA/LOG_1959_1_629245032.arc
2007-10-09 170721.114 Media Recovery Log
/oraarch01/BRHBETA/LOG_1960_1_629245032.arc
2007-10-09 170722.945 Media Recovery Log
/oraarch01/BRHBETA/LOG_1961_1_629245032.arc
2007-10-09 170727.300 Media Recovery Log
/oraarch01/BRHBETA/LOG_1962_1_629245032.arc
2007-10-09 170729.637 Media Recovery Log
/oraarch01/BRHBETA/LOG_1963_1_629245032.arc
2007-10-09 170729.709 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 1964
2007-10-09 170729.709 60639 kcrr.c Fetching
gap sequence in thread 1, gap sequence 1964-1976
2007-10-09 170759.858 --------------------
--------------------------------------- Check
that the CONTROL_FILE_RECORD_KEEP_TIME
initialization parameter is defined to a value
that is sufficiently large enough to maintain
adequate log switch information to resolve
archivelog gaps. -------------------------------
---------------------------- 2007-10-09
170829.866 Media Recovery Log
/oraarch01/BRHBETA/LOG_1964_1_629245032.arc
28Trace File
2007-10-09 170831.924 Media Recovery Log
/oraarch01/BRHBETA/LOG_1965_1_629245032.arc
2007-10-09 170912.510 Media Recovery Log
/oraarch01/BRHBETA/LOG_1966_1_629245032.arc
2007-10-09 170921.050 Media Recovery Log
/oraarch01/BRHBETA/LOG_1967_1_629245032.arc
2007-10-09 170940.234 Media Recovery Log
/oraarch01/BRHBETA/LOG_1968_1_629245032.arc
2007-10-09 170945.055 Media Recovery Log
/oraarch01/BRHBETA/LOG_1969_1_629245032.arc
2007-10-09 170950.572 Media Recovery Log
/oraarch01/BRHBETA/LOG_1970_1_629245032.arc
2007-10-09 170958.968 Media Recovery Log
/oraarch01/BRHBETA/LOG_1971_1_629245032.arc
2007-10-09 171003.922 Media Recovery Log
/oraarch01/BRHBETA/LOG_1972_1_629245032.arc
2007-10-09 171013.196 Media Recovery Log
/oraarch01/BRHBETA/LOG_1973_1_629245032.arc
2007-10-09 171021.927 Media Recovery Log
/oraarch01/BRHBETA/LOG_1974_1_629245032.arc
2007-10-09 171034.064 Media Recovery Log
/oraarch01/BRHBETA/LOG_1975_1_629245032.arc
2007-10-09 171042.420 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 1976
29Trace File
2007-10-09 171042.421 60639 kcrr.c
Fetching gap sequence in thread 1, gap sequence
1976-1976 2007-10-09 171112.538
-------------------------------------------------
---------- Check that the CONTROL_FILE_RECORD_KEE
P_TIME initialization parameter is defined to a
value that is sufficiently large enough to
maintain adequate log switch information to
resolve archivelog gaps. -----------------------
------------------------------------
2007-10-09 171242.563 Media Recovery Log
/oraarch01/BRHBETA/LOG_1976_1_629245032.arc
2007-10-09 171245.563 Media Recovery Log
/oraarch01/BRHBETA/LOG_1977_1_629245032.arc
2007-10-09 171248.534 Media Recovery Log
/oraarch01/BRHBETA/LOG_1978_1_629245032.arc
2007-10-09 171300.505 Media Recovery Log
/oraarch01/BRHBETA/LOG_1979_1_629245032.arc
2007-10-09 171302.054 Media Recovery Log
/oraarch01/BRHBETA/LOG_1980_1_629245032.arc
2007-10-09 171303.231 Media Recovery Log
/oraarch01/BRHBETA/LOG_1981_1_629245032.arc
2007-10-09 171303.902 Media Recovery Log
/oraarch01/BRHBETA/LOG_1982_1_629245032.arc
30Trace File
2007-10-09 171304.492 Media Recovery Log
/oraarch01/BRHBETA/LOG_1983_1_629245032.arc
2007-10-09 171308.171 Media Recovery Log
/oraarch01/BRHBETA/LOG_1984_1_629245032.arc
2007-10-09 171326.860 2007-10-09
171326.860 60639 kcrr.c Media Recovery Waiting
for thread 1 sequence 1985 2007-10-09
171607.172 Media Recovery Log
/oraarch01/BRHBETA/LOG_1985_1_629245032.arc
2007-10-09 171608.067 Media Recovery Log
/oraarch01/BRHBETA/LOG_1986_1_629245032.arc
2007-10-09 171608.131 Media Recovery Log
/oraarch01/BRHBETA/LOG_1987_1_629245032.arc
2007-10-09 171608.195 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 1988
2007-10-09 171613.202 Media Recovery Log
/oraarch01/BRHBETA/LOG_1988_1_629245032.arc
2007-10-09 171613.268 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 1989
2007-10-09 211401.119 Media Recovery Log
/oraarch01/BRHBETA/LOG_1989_1_629245032.arc
2007-10-09 211416.922 2007-10-09
211416.922 60639 kcrr.c Media Recovery Waiting
for thread 1 sequence 1990
31Trace File
2007-10-10 093233.399 2007-10-10
093233.399 60639 kcrr.c Fetching gap sequence
in thread 1, gap sequence 1990-1990
2007-10-10 093305.187 Media Recovery Log
/oraarch01/BRHBETA/LOG_1990_1_629245032.arc
2007-10-10 093322.505 Media Recovery Log
/oraarch01/BRHBETA/LOG_1991_1_629245032.arc
2007-10-10 093322.570 Media Recovery Log
/oraarch01/BRHBETA/LOG_1992_1_629245032.arc
2007-10-10 093322.631 Media Recovery Log
/oraarch01/BRHBETA/LOG_1993_1_629245032.arc
2007-10-10 093322.693 Media Recovery Log
/oraarch01/BRHBETA/LOG_1994_1_629245032.arc
2007-10-10 093322.761 Media Recovery Log
/oraarch01/BRHBETA/LOG_1995_1_629245032.arc
2007-10-10 093322.807 Media Recovery Log
/oraarch01/BRHBETA/LOG_1996_1_629245032.arc
2007-10-10 093322.864 Media Recovery Log
/oraarch01/BRHBETA/LOG_1997_1_629245032.arc
2007-10-10 093322.918 Media Recovery Log
/oraarch01/BRHBETA/LOG_1998_1_629245032.arc
2007-10-10 093323.199 Media Recovery Log
/oraarch01/BRHBETA/LOG_1999_1_629245032.arc
32Trace File
Stop Log Apply Process Ready to convert to
Logical Standby
2007-10-10 093323.255 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 2000
2007-10-10 101107.685 Media Recovery Log
/oraarch01/BRHBETA/LOG_2000_1_629245032.arc
2007-10-10 101108.422 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 2001
2007-10-10 101448.843 Media Recovery Log
/oraarch01/BRHBETA/LOG_2001_1_629245032.arc
2007-10-10 101449.013 60639 kcrr.c Media
Recovery Waiting for thread 1 sequence 2002
2007-10-10 101519.072 2007-10-10
101519.072 60639 kcrr.c MRP0 Background Media
Recovery cancelled with status 16037 ORA-16037
user requested cancel of managed recovery
operation ----- Redo read statistics for thread
1 ----- Read rate (ASYNC) 619732Kb in 63640.12s
gt 0.01 Mb/sec Total physical reads 619732Kb
Longest record 28Kb, moves 0/2001133 (0)
Change moves 779641/4101685 (19), moved 141Mb
Longest LWN 1023Kb, moves 117/175493 (0),
moved 23Mb Last redo scn 0x0782.a8f27f37
(8257761607479) ---------------------------------
------------- 2007-10-10 101519.088 Media
Recovery drop redo thread 1
33Trace File
2007-10-10 101520.864 1018 krsm.c Managed
Recovery Not Active posted. ORA-16037 user
requested cancel of managed recovery operation
ARCH Connecting to console port...
2007-10-10 101520.871 60639 kcrr.c MRP0
Background Media Recovery process shutdown
2007-10-10 101520.871 1018 krsm.c
oraarch01/BRHBETA
34Convert to Logical Standby
- SQL Apply Process
- When applying redo logs
- Generates 2 trace files
- What are they?
- Trace files
- One shows start of kcrrwkx
- Second shows end of kcrrwkx
- What are these for?
- Neither show up in alert log
- Both continue as long as SQL apply process runs
35First Trace File
/orahome01/admin/BRHBETA/bdump more
brhbeta_arc0_13168.trc /orahome01/admin/BRHBETA/b
dump/brhbeta_arc0_13168.trc Oracle Database 10g
Enterprise Edition Release 10.2.0.2.0 - 64bit
Production With the Partitioning, OLAP and Data
Mining options ORACLE_HOME /orahome01/product/1
0.2.0 System name SunOS Node name
brh-beta1-zone04 Release 5.10 Version
Generic_118833-36 Machine sun4u Instance name
BRHBETA Redo thread mounted by this instance 1
Oracle process number 24 Unix process pid
13168, image oracle_at_beta1-zone04 (ARC0)
SERVICE NAME() 2007-10-10 104026.358
SESSION ID(188.2) 2007-10-10 104026.358
kcrrwkx nothing to do (start) 2007-10-10
104526.240 kcrrwkx nothing to do (start)
2007-10-10 104635.388 kcrrwkx nothing to do
(end) ... ...
36Second Trace File
/orahome01/admin/BRHBETA/bdump more
brhbeta_arc1_13170.trc /orahome01/admin/BRHBETA/b
dump/brhbeta_arc1_13170.trc Oracle Database 10g
Enterprise Edition Release 10.2.0.2.0 - 64bit
Production With the Partitioning, OLAP and Data
Mining options ORACLE_HOME /orahome01/product/1
0.2.0 System name SunOS Node name
brh-beta1-zone04 Release 5.10 Version
Generic_118833-36 Machine sun4u Instance name
BRHBETA Redo thread mounted by this instance 1
Oracle process number 9 Unix process pid
13170, image oracle_at_beta1-zone04 (ARC1)
SERVICE NAME() 2007-10-10 104026.358
SESSION ID(396.1) 2007-10-10 104026.358
kcrrwkx nothing to do (start) 2007-10-10
104126.315 kcrrwkx nothing to do (end)
2007-10-10 104226.322 kcrrwkx nothing to do
(end)
37DataGuard Likes to Chat
- Physical Standby
- While applying archived redo logs
- Trace file documents everything standby does
- Logical Standby
- Once converted to logical standby
- Two trace files generated
- Contain messages for start/stop of each log apply
- Why are these generated?
- Why not have DataGuard alert logs?
- Trace files tell me that something is wrong
38Normal Operation
- Logical Standby catching up to Primary
- Apply process turned off during the day
- Catches up at night
- Apply process failed
- Catch up after fix (skip table in the example)
- Typical alert log messages
- Redo log from primary registered with DG
- Redo logs applied to standby
- Redo logs deleted from standby
39Standby Catching Up
Tue Oct 16 151322 2007 Completed ALTER
DATABASE STOP LOGICAL STANDBY APPLY Tue Oct 16
151416 2007 Incremental checkpoint up to RBA
0x7.a0aa2.0, current log tail at RBA
0x7.b8e2c.0 Tue Oct 16 151445 2007 ALTER
DATABASE START LOGICAL STANDBY APPLY Tue Oct 16
151445 2007 ALTER DATABASE START LOGICAL
STANDBY APPLY (BRHBETA) Tue Oct 16 151445 2007
No optional part Attempt to start background
Logical Standby process LSP0 started with
pid21, OS id5041 LOGSTDBY status ORA-16111
log mining and apply setting up Tue Oct 16
151446 2007 LOGMINER Parameters summary for
session 1 LOGMINER Number of processes 3,
Transaction Chunk Size 201 LOGMINER Memory
Size 30M, Checkpoint interval 150M Tue Oct
16 151446 2007 Completed ALTER DATABASE START
LOGICAL STANDBY APPLY LOGMINER session 1,
builder process P001 started with pid7 OS
id10018 LOGMINER session 1, reader process
P000 started with pid34 OS id10014 LOGMINER
session 1, preparer process P002 started with
pid36 OS id10020 LSP2 started with pid23, OS
id5043
Stop SQL Apply process
Start SQL Apply process after skipping table
40Standby Catching Up
Tue Oct 16 151448 2007 LOGMINER Begin mining
logfile /oraarch01/BRHBETA/LOG_2048_1_629245032.a
rc LOGSTDBY Analyzer process P003 started with
pid13 OS id10051 Tue Oct 16 151448 2007
LOGMINER Turning ON Log Auto Delete LOGSTDBY
Apply process P004 started with pid40 OS
id10054 LOGSTDBY Apply process P006 started
with pid42 OS id10062 LOGSTDBY Apply process
P007 started with pid17 OS id10064 LOGSTDBY
Apply process P005 started with pid15 OS
id10060 Tue Oct 16 152202 2007 Beginning log
switch checkpoint up to RBA 0x8.2.10, SCN
8295181217591 Thread 1 advanced to log sequence
8 Current log 4 seq 8 mem 0
/shared/oradata02/BRHBETA/redo04b.log Current
log 4 seq 8 mem 1 /shared/oralogs01/BRHBETA/re
do04a.log Tue Oct 16 152528 2007 Completed
checkpoint up to RBA 0x8.2.10, SCN
8295181217591 Tue Oct 16 153432 2007
Incremental checkpoint up to RBA 0x8.4cbae.0,
current log tail at RBA 0x8.65553.0 Tue Oct 16
154240 2007 LOGMINER End mining logfile
/oraarch01/BRHBETA/LOG_2048_1_629245032.arc Tue
Oct 16 154240 2007 LOGMINER Begin mining
logfile /oraarch01/BRHBETA/LOG_2049_1_629245032.a
rc ... ... ...
Processing redo logs
41Standby Catching Up
Tue Oct 16 172048 2007 LOGMINER End mining
logfile /oraarch01/BRHBETA/LOG_2049_1_629245032.a
rc Tue Oct 16 172048 2007 LOGMINER Begin
mining logfile /oraarch01/BRHBETA/LOG_2050_1_6292
45032.arc Tue Oct 16 172054 2007 LOGMINER
End mining logfile /oraarch01/BRHBETA/LOG_2050_1_
629245032.arc ... ... ... Tue Oct 16 183913
2007 LOGMINER Log Auto Delete - deleting
/oraarch01/BRHBETA/LOG_2048_1_629245032.arc
Deleted file /oraarch01/BRHBETA/LOG_2048_1_629245
032.arc ... ... ... Tue Oct 16 184340 2007
LOGMINER Begin mining logfile
/oraarch01/BRHBETA/LOG_2082_1_629245032.arc Tue
Oct 16 184359 2007 LOGMINER End mining
logfile /oraarch01/BRHBETA/LOG_2082_1_629245032.a
rc Tue Oct 16 184359 2007 LOGMINER Begin
mining logfile /oraarch01/BRHBETA/LOG_2083_1_6292
45032.arc
Processing redo logs
Deleting redo logs
Processing redo logs
42Standby Catching Up
Tue Oct 16 184401 2007 LOGMINER Log Auto
Delete - deleting /oraarch01/BRHBETA/LOG_2056_1_6
29245032.arc Deleted file /oraarch01/BRHBETA/LOG_
2056_1_629245032.arc Tue Oct 16 184401 2007
LOGMINER Log Auto Delete - deleting
/oraarch01/BRHBETA/LOG_2057_1_629245032.arc
Deleted file /oraarch01/BRHBETA/LOG_2057_1_629245
032.arc Tue Oct 16 184401 2007 LOGMINER Log
Auto Delete - deleting /oraarch01/BRHBETA/LOG_205
8_1_629245032.arc Deleted file
/oraarch01/BRHBETA/LOG_2058_1_629245032.arc ...
Deleting redo logs
43Standby Catching Up
Tue Oct 16 184415 2007 LOGMINER Begin mining
logfile /oraarch01/BRHBETA/LOG_2087_1_629245032.a
rc Tue Oct 16 184837 2007 Completed
checkpoint up to RBA 0xa.2.10, SCN
8295181577382 Tue Oct 16 185518 2007
Incremental checkpoint up to RBA 0xa.12dad.0,
current log tail at RBA 0xa.1314b.0 Tue Oct 16
190131 2007 RFS1 No standby redo logfiles
created RFS1 Archived Log '/oraarch01/BRHBETA
/LOG_2153_1_629245032.arc' Tue Oct 16 190132
2007 RFS LogMiner Registered logfile
/oraarch01/BRHBETA/LOG_2153_1_629245032.arc to
LogMiner session id 1 Tue Oct 16 191522 2007
Incremental checkpoint up to RBA 0xa.142b2.0,
current log tail at RBA 0xa.143fe.0 Tue Oct 16
192901 2007 LSP0 warning -- apply server 2,
sid 384 waiting on user sid 196 for event (since
0 seconds) Tue Oct 16 192901 2007 LOGMINER
End mining logfile /oraarch01/BRHBETA/LOG_2087_1_
629245032.arc ...
Standby is at 2087
Primary is at 2153
44Standby Catching Up
Unsupported DDL Standby doesnt execute
Tue Oct 16 193058 2007 LOGSTDBY stmt CREATE
PFILE '/tmp/datatools/BRHBETA.PFILE.19144.119241
3665' FROM SPFILE LOGSTDBY status ORA-16226
DDL skipped due to lack of support LOGSTDBY id
XID 0x0003.02d.00013e70, hSCN 0x0782.a9c2fdb8,
lSCN 0x0782.a9c2fdb8, Thread 1, RBA LOGSTDBY
stmt create pfile'/orahome01/oradba/tmp/ora_adm_
sqlbt_bkp.tmp1.17449.BRHBETA' from spfile
LOGSTDBY status ORA-16226 DDL skipped due to
lack of support LOGSTDBY id XID
0x000b.001.000126cf, hSCN 0x0782.a9c2fe15, lSCN
0x0782.a9c2fe15, Thread 1, RBA LOGSTDBY stmt
CREATE PFILE '/tmp/datatools/BRHBETA.PFILE.19695
.1192413687' FROM SPFILE LOGSTDBY status
ORA-16226 DDL skipped due to lack of support
LOGSTDBY id XID 0x0003.00c.00013e62, hSCN
0x0782.a9c2fe4a, lSCN 0x0782.a9c2fe4a, Thread 1,
RBA LOGSTDBY stmt ALTER DATABASE BACKUP
CONTROLFILE TO '/tmp/datatools/dtodump_LOGSTDBY
status ORA-16226 DDL skipped due to lack of
support LOGSTDBY id XID 0x0009.007.00011453,
hSCN 0x0782.a9c2feb4, lSCN 0x0782.a9c2feb4,
Thread 1, RBA Tue Oct 16 193058 2007 ALTER
TABLESPACE "SYSTEM" BEGIN BACKUP Completed
ALTER TABLESPACE "SYSTEM" BEGIN BACKUP Tue Oct
16 193058 2007 ALTER TABLESPACE "SYSTEM" END
BACKUP Completed ALTER TABLESPACE "SYSTEM" END
BACKUP ...
45Standby Catching Up
Standby catches up at 2158
Tue Oct 16 212919 2007 LOGMINER Begin mining
logfile /oraarch01/BRHBETA/LOG_2157_1_629245032.a
rc Tue Oct 16 213003 2007 LOGMINER End
mining logfile /oraarch01/BRHBETA/LOG_2157_1_6292
45032.arc Tue Oct 16 213552 2007 Incremental
checkpoint up to RBA 0xa.f41b7.0, current log
tail at RBA 0xa.f41cc.0 Tue Oct 16 215556
2007 Incremental checkpoint up to RBA
0xa.f43b5.0, current log tail at RBA
0xa.f43b5.0 Tue Oct 16 221116 2007 RFS1
No standby redo logfiles created RFS1
Archived Log '/oraarch01/BRHBETA/LOG_2158_1_62924
5032.arc' Tue Oct 16 221116 2007 RFS
LogMiner Registered logfile /oraarch01/BRHBETA/L
OG_2158_1_629245032.arc to LogMiner session id
1 Tue Oct 16 221116 2007 LOGMINER Begin
mining logfile /oraarch01/BRHBETA/LOG_2158_1_6292
45032.arc Tue Oct 16 221120 2007 LOGMINER
End mining logfile /oraarch01/BRHBETA/LOG_2158_1_
629245032.arc
46Archived Redo Logs
- Logical Standby
- After applied to standby
- SQL apply process does delete them
- Unlike physical standby
- Possible disk space issues on standby
- How long will you need to store redo logs?
- If standby frozen all day
- Weekends? Holidays?
- If standby fails
- How many days to fix failures?
47Archived Redo Logs
- How long are redo logs available on primary?
- If not on disk when needed for standby
- Recover from backup
- Dataguard may not see these redo logs
- Register redo logs
- Logical standby
- Also generates its own archived redo logs
- Needed to recover standby db
- Unique standby db objects?
48SQL Apply Process Slow
- Detect long-running transaction
- Compute estimate of time to complete
- Identify and skip problem table
49Long Running Transaction
- Standby Alert Log
- SQL apply process applying redo log 2049
- Doesnt move on within a few minutes
- Current time is Tue Oct 16 080955 2007
- Shows start time for this redo log
- Has been processing for over 24 hours
Mon Oct 15 055229 2007 LOGMINER Begin mining
logfile /oraarch01/BRHBETA/LOG_2049_1_629245032.a
rc
50Long Running Transaction
- What is apply process doing?
- Check redo logs waiting to be applied
- Where is processing in current redo log?
- How long to complete current redo log?
51Long Running Transaction
alter session set nls_date_format 'DD-Mon-YYYY
hh24miss' column first_change format
99999999999999999999 column next_change format
99999999999999999999 column resetlogs_change
format 99999999999999999999 select from
dba_logstdby_log BRHBETAgt SELECT TYPE, STATUS,
HIGH_SCN FROM VLOGSTDBY TYPE
------------------------------ STATUS
-------------------------------------------------
--------------------------------------------------
----------- HIGH_SCN ---------------------
COORDINATOR ORA-16116 no work available
8257767540953 READER ORA-16127 stalled
waiting for additional transactions to be applied
8257767541085 BUILDER ORA-16127 stalled
waiting for additional transactions to be applied
8257767540965
52Long Running Transaction
PREPARER ORA-16127 stalled waiting for
additional transactions to be applied
8257767540965 ANALYZER ORA-16117 processing
8257767540953 APPLIER ORA-16116 no work
available 8257767539467 APPLIER ORA-16116 no
work available 8257767512259 APPLIER
ORA-16113 applying change to table or sequence
"BRH"."XXSUN_BRH_COMPS_INT" 8257767539247
APPLIER ORA-16116 no work available
8257767512262 9 rows selected. BRHBETAgt
53Long Running Transaction
BRHBETAgt select from dba_logstdby_log THREAD
RESETLOGS_CHANGE RESETLOGS_ID SEQUENCE
FIRST_CHANGE NEXT_CHANGE ----------
--------------------- ------------ ----------
--------------------- ---------------------
-------------------------- ---------- FILE_NAME
TIMESTAMP DIC DIC APPLIED 1 8257200902826
629245032 2048 8257767447753 8257767534297
12-Oct-2007 224105 12-Oct-2007 231823
/oraarch01/BRHBETA/LOG_2048_1_629245032.arc
12-Oct-2007 221908 NO NO CURRENT 1
8257200902826 629245032 2049 8257767534297
8257767754044 12-Oct-2007 231823 13-Oct-2007
001105 /oraarch01/BRHBETA/LOG_2049_1_629245032.
arc 12-Oct-2007 231218 NO NO CURRENT 1
8257200902826 629245032 2050 8257767754044
8257767922751 13-Oct-2007 001105 13-Oct-2007
011105 /oraarch01/BRHBETA/LOG_2050_1_629245032.
arc 13-Oct-2007 001106 NO NO NO ... ... ...
1 8257200902826 629245032 2140 8257781397314
8257781562968 16-Oct-2007 074115 16-Oct-2007
084115 /oraarch01/BRHBETA/LOG_2140_1_629245032.
arc 16-Oct-2007 074116 NO NO NO 93 rows
selected.
54Long Running Transaction
- redo log 2049 goes from
- SCN 8257767534297 to SCN 8257767754044
- Check again -- Tue Oct 16 150429 MST 2007
- Compute Estimate
- Tue Oct 16 080955 -- Tue Oct 16 111739
- APPLIER has moved from 39247 to 39991
- 3 hours --gt roughly 750 SCNs, 250 per hour
- it still needs to go from 539991 to 754044
- over 200,000 SCNs -- at 250 per hour,
- this would take 800 hours --gt 33 days
APPLIER ORA-16113 applying change to table or
sequence "BRH"."XXSUN_BRH_COMPS_INT"
8257767540857
55Long Running Transaction
- Check again -- Tue Oct 16 150429 MST 2007
- APPLIER has moved
- 39991 to 40857
- in the last 4 hours, 866 SCNs,
- roughly in line with 250/hr we computed earlier
APPLIER ORA-16113 applying change to table or
sequence "BRH"."XXSUN_BRH_COMPS_INT"
8257767540857
56Long Running Transaction
- This is truly awful science!
- Assumes all SCNs take same amount of time
- If processing takes more than a few minutes
- Compute estimate
- Confirm that it will take a long time
- Compare with business requirements for standby
- Must be in synch once per day
- Decide to skip table
- If table required, must wait or full refresh
57Long Running Transaction
- Skip table
- SQL Apply Process restarts with redo 2048
- Standby catches up quickly
ALTER DATABASE STOP LOGICAL STANDBY APPLY
EXECUTE DBMS_LOGSTDBY.SKIP - (stmt gt 'DML' ,
- schema_name gt 'BRH' , - object_name gt
'XXSUN_BRH_COMPS_INT', - proc_name gt null)
ALTER DATABASE START LOGICAL STANDBY APPLY
58Primary/Standby Interactions
- Logical standby backup starts
- Tablespaces put into backup mode
- Apply process applies redo logs from primary
- Contain transactions for primary backup
- Tries to put tablespaces into backup mode
- Apply process fails
- Wait for standby backup to finish
- Restart apply process
- Disable standby backups when catching up
- Apply process runs longer than normal
59Unique Constraint Violation
- Oracle calls this
- Oscillating updates
- Oracle docs explain this (I cant)
- Or primary update really did fail
- And was rolled back on primary db
- Fails and rolls back in standby db
- SQL apply process restarts
- Automatically
- No need to do anything
60Unique Constraint Violation
Tue Oct 16 212342 2007 LOGMINER Begin mining
logfile /oraarch01/BRHBETA/LOG_2147_1_629245032.a
rc ... ... ... Tue Oct 16 212431 2007
LOGSTDBY stmt insert into "APPLSYS"."WF_LOCAL_RO
LES" values COL1" 'Value1', COL2"
'Value2', COL3" IS NULL, LOGSTDBY status
ORA-00001 unique constraint (APPLSYS.WF_LOCAL_ROL
ES_U1) violated LOGSTDBY id XID
0x0009.016.00011548, hSCN 0x0782.aa32b533, lSCN
0x0782.aa32b533, Thread 1, RBA Tue Oct 16
212520 2007 LOGMINER End mining logfile
/oraarch01/BRHBETA/LOG_2147_1_629245032.arc
SQL Apply Process continues processing
61No Data Found
- What does it mean?
- When DataGuard updates standby
- Brings update from primary
- Brings pre-update data from primary
- On standby, DataGuard compares
- Pre-update data from primary
- Current data on standby
- If they dont agree
- DataGuard wont apply the update on standby
- SQL apply process fails
62No Data Found
Wed Sep 19 120923 2007 LOGSTDBY stmt update
"PO"."PO_LINE_LOCATIONS_ALL" ... ...SQL, values
... LOGSTDBY status ORA-01403 no data found
LOGSTDBY id XID 0x0008.01e.0000c437, hSCN
0x0789.eacde6c1, lSCN 0x0789.eacde6c1 LOGSTDBY
Apply process P007 pid29 OS id3447 stopped Wed
Sep 19 120923 2007 Errors in file
/shared/orahome01/admin/BRHPRSB/bdump/brhprsb_lsp0
_12386.trc ORA-12801 error signaled in
parallel query server P004 ORA-01403 no data
found LOGSTDBY Analyzer process P003 pid24 OS
id3439 stopped LOGSTDBY Apply process P006
pid27 OS id3445 stopped LOGSTDBY Apply process
P005 pid26 OS id3443 stopped
63No Data Found
- What happened?
- For some reason
- Table data not the same primary vs standby
- How could this happen?
- Logical standby is read-write
- SYS can change anything at any time
- SYS left guard status at NONE
- Other db users can make changes in standby
- How to fix?
- Skip table
- Refresh logical standby
64No Data Found
- Logical Standby
- No way to find out what happened
- No utility to verify primary, standby in synch
- Differences can exist for a long time
- Wont cause error until table updated on primary
- Logical Standby for reporting?
- Can you depend on this for your reports?
- How do you know what is in the standby?
- What has been skipped?
65Primary Schema Issues
- Primary db
- XDB schema reinstalled
- Create java class (loads java class from
filesystem) - Standby db
- Transactions came through to standby
- Standby doesnt have java class files
- Apply process fails
- Identify and skip transaction(s)
66ORA-07445 Errors
- SR opened
- Results
- Known bug fixed in 11g
- Apply patch on standby
- Impact
- None, no affect on standby
- Apply patch?
- No refresh would wipe out patch
- Dont want to patch primary db
- Primary doesnt have this error
67ORA-07445 Errors
SQL Apply Process stops
Tue Oct 16 212750 2007 Errors in file
/orahome01/admin/BRHBETA/bdump/brhbeta_p004_6577.t
rc ORA-07445 exception encountered core dump
krvsmso()1212 SIGSEGV Address not mapped to
object Tue Oct 16 212906 2007 Errors in
file /orahome01/admin/BRHBETA/bdump/brhbeta_lsp0_5
041.trc ORA-12805 parallel query server died
unexpectedly Tue Oct 16 212906 2007 TLCR
process death detected. Shutting down TLCR
logminer process death detected, exiting logical
standby LOGSTDBY Analyzer process P003 pid13 OS
id10051 stopped LOGSTDBY Apply process P005
pid15 OS id10060 stopped LOGSTDBY Apply
process P006 pid42 OS id10062 stopped LOGSTDBY
Apply process P007 pid17 OS id10064 stopped
Tue Oct 16 212906 2007 LOGSTDBY status
ORA-16222 automatic Logical Standby retry of
last action LOGSTDBY status ORA-16111 log
mining and apply setting up
Logical Standby is not for the faint of heart!
SQL Apply Process automatically restarts
68ORA-07445 Errors
Tue Oct 16 212907 2007 LOGMINER Parameters
summary for session 1 LOGMINER Number of
processes 3, Transaction Chunk Size 201
LOGMINER Memory Size 30M, Checkpoint interval
150M LOGMINER session 1, builder process
P001 started with pid7 OS id10018 LOGMINER
session 1, reader process P000 started with
pid34 OS id10014 LOGMINER session 1,
preparer process P002 started with pid36 OS
id10020 Tue Oct 16 212910 2007 LOGMINER
Begin mining logfile /oraarch01/BRHBETA/LOG_2147_
1_629245032.arc Tue Oct 16 212910 2007
LOGMINER Turning ON Log Auto Delete LOGSTDBY
Analyzer process P003 started with pid13 OS
id10051 LOGSTDBY Apply process P006 started
with pid42 OS id10062 LOGSTDBY Apply process
P004 started with pid30 OS id10219 LOGSTDBY
Apply process P005 started with pid15 OS
id10060 LOGSTDBY Apply process P007 started
with pid17 OS id10064
SQL Apply Process continues processing
69Refresh Process
- Export unique standby db objects
- Scripts to recreate
- Backup primary db
- Create standby control file
- Recover primary db backup on standby
- Use standby control file
- Create physical standby
- Convert to logical standby
- Import unique standby db objects
- Recreate with scripts
70Unsupported Record
- ORA-16211
- SQL apply process fails
- Must skip table or refresh standby
- Oracle SR tells me to
- Add all column supplemental log group to table
- Rebuild standby
- Or reinstantiate the table
- Needed for each table
- Not an easy process
71Unsupported Record
Thu Oct 11 101158 2007 LOGMINER Log Auto
Delete - deleting /oraarch01/BRHBETA/LOG_2005_1_6
29245032.arc Deleted file /oraarch01/BRHBETA/LOG_
2005_1_629245032.arc Thu Oct 11 101555 2007
LOGMINER WARNING - Invalidated 4 LCRs Thu
Oct 11 102029 2007 LOGSTDBY stmt
"BRH"."XXSUN_INV_ITEMS_INT" unsupported
LOGSTDBY status ORA-16211 unsupported record
found in the archived redo log ORA-06512 at
"SYS.DBMS_INTERNAL_LOGSTDBY", line 4717
ORA-06512 at line 1 LOGSTDBY id XID
0x0009.02e.0001127d, hSCN 0x0782.a9016545, lSCN
0x0782.a9016545, Thread 1 LOGSTDBY Apply
process P007 pid23 OS id16578 stopped Thu Oct
11 102029 2007 Errors in file
/orahome01/admin/BRHBETA/bdump/brhbeta_lsp0_13625.
trc ORA-12801 error signaled in parallel query
server P007 ORA-16211 unsupported record found
in the archived redo log LOGSTDBY Analyzer
process P003 pid19 OS id16570 stopped LOGSTDBY
Apply process P005 pid21 OS id16574 stopped
LOGSTDBY Apply process P006 pid36 OS id16576
stopped LOGSTDBY Apply process P004 pid34 OS
id16572 stopped
72Unsupported Record
- What causes this?
- Metalink 304061.1
- Possible causes
- Direct path insert on partitioned table
- Table has 500 columns
- Is this a standby?
- At any time this error may happen
- How to predict/prevent?
73Compile Invalid Objects
- In Logical Standby
- Execute utlrp.sql
- 2 hours go by
- Not much changed
- Disable Guard for session
- Alter session disable guard
- alter database guard standby
- Recompile runs in 2 minutes
- Alter session enable guard
- When normal things dont work
- Perhaps guard enabled is the problem
- Guard level is the problem (all vs standby)
74Import Into Logical Standby
- For recompile we used
- Alter session disable guard
- Refresh Logical Standby
- Unique db objects exported before refresh
- Must be imported after refresh
- Import doesnt use SQLPlus session
- Alter database guard standby
75Conclusion
- Logical standby
- Lots of errors
- Many require refreshing standby
- Lots of DBA support needed
- For all of this support
- What do you have?
- Do you know what is in the standby
- Reporting?
76Conclusion
- Physical standby
- Is solid, dependable
- No issues
- Logical standby
- Is it really a standby?
- Is it ready for failover?
- Is it providing complete data for reports?
- Lots of issues
- Is it worth the effort/risk?