Title: Recovery
1Recovery
2Implementing atomicity
- Note: when a transaction commits, the portion of
the system implementing durability ensures the
transaction's effects are recorded in persistent
storage.
- However, while a transaction is active (not yet
committed), failure of the transaction is a real
problem for atomicity -- the DB is left in an
inconsistent state. --> NOT GOOD!
3Reasons for Rollback
- Any sort of system SW or HW crash.
- Transaction abort
  - User initiated
  - Transaction, T, itself - e.g., error handling.
  - System - e.g.,
    - T is involved in deadlock
    - Letting T complete may lead to inconsistency
      (i.e., violate the consistency property).
4How to rollback in immediate update systems
- Immediate update system
- If T's request to write x is granted, x is
immediately updated in the DB.
- If T's request to read x is granted, the value of
x is returned.
- Note, concurrency control is relied upon to
prevent reads of data that were written by
uncommitted transactions.
- Immediate update systems maintain a log of
records.
5Log in immediate update system
- Only append log records -- never change or
delete.
- System uses log to maintain atomicity and
durability.
- For durability
- Log used to restore effects of committed
transactions.
- Log is a sequential file on disk
- Often, multiple copies kept on separate
non-volatile storage.
6Log - first assume all log records only on disk.
- Update record (for writes)
- Before image (aka undo record)
- Transaction id - the transaction executing the
write.
- To roll back T, scan log backwards starting from
last record. Write the before image of each of
T's log records to DB.
- To improve performance (avoid a scan of complete
log), have each T write a begin record when T
starts.
- Also to improve performance, have the log records
of a particular T be linked together
(stack-wise).
- If transaction commits, write a commit record.
- If transaction aborts, do rollback, then write an
abort record.
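The rollback procedure above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the names `LogRecord`, `Log`, `write`, and `rollback` are assumptions made for illustration, the DB is a plain dictionary, and the log lives in memory rather than on disk.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class LogRecord:
    kind: str                   # "begin", "update", "commit", or "abort"
    tid: int                    # id of the transaction that wrote the record
    item: Optional[str] = None  # data item written (update records only)
    before: Any = None          # before image, aka undo record


class Log:
    def __init__(self):
        self.records = []       # append-only: never changed or deleted

    def append(self, rec):
        self.records.append(rec)


def write(db, log, tid, item, new_value):
    # Record the before image, then update the DB immediately.
    log.append(LogRecord("update", tid, item, db.get(item)))
    db[item] = new_value


def rollback(db, log, tid):
    # Scan backward from the last record, restoring tid's before images.
    # The begin record lets the scan stop early instead of reading the
    # complete log.
    for rec in reversed(list(log.records)):
        if rec.tid != tid:
            continue
        if rec.kind == "begin":
            break
        if rec.kind == "update":
            db[rec.item] = rec.before
    log.append(LogRecord("abort", tid))
```

The sketch filters by transaction id during the scan; the per-transaction (stack-wise) linking of records mentioned above would avoid even visiting other transactions' records.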
7Savepoint record
- To increase flexibility in doing rollbacks, a
transaction can specify a savepoint during its
execution. -- Then one can do a partial rollback
to a specified savepoint (especially useful for
transaction error handling).
- Savepoint record contains transaction id,
savepoint id (and any other useful information).
- To roll back to a specified savepoint, scan log
backward to the specified savepoint record,
applying the before-images to the DB.
8Example of use of savepoint
- begin_transaction()
- stmt1
- sp1 = create_savepoint()
- stmt2
- sp2 = create_savepoint()
- if (cond1) rollback(sp1)
- else if (cond2) rollback(sp2)
- else
- commit()
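A partial rollback to a savepoint might look like the following. This is a sketch under stated assumptions: the `Transaction` class and its method names are hypothetical, log records are plain tuples, and the log is an in-memory list.

```python
class Transaction:
    """Toy transaction over a dict DB and a list log (illustrative only)."""

    def __init__(self, db, log, tid):
        self.db, self.log, self.tid = db, log, tid
        self.next_sp = 0
        log.append(("begin", tid))

    def write(self, item, value):
        # Update record: (kind, transaction id, item, before image).
        self.log.append(("update", self.tid, item, self.db.get(item)))
        self.db[item] = value

    def create_savepoint(self):
        # Savepoint record: (kind, transaction id, savepoint id).
        self.next_sp += 1
        self.log.append(("savepoint", self.tid, self.next_sp))
        return self.next_sp

    def rollback(self, sp):
        # Scan backward to the requested savepoint record, applying the
        # before image of each of this transaction's update records.
        for rec in reversed(list(self.log)):
            if rec[1] != self.tid:
                continue
            if rec[0] == "savepoint" and rec[2] == sp:
                break
            if rec[0] == "update":
                self.db[rec[2]] = rec[3]
```

Rolling back to `sp2` undoes only the writes made after `sp2`; a later rollback to `sp1` also undoes the writes made between `sp1` and `sp2`.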
9Concurrent transactions
- In order to use the log, the system must
determine which transactions have completed
(committed or aborted), and which are active.
- All active transactions need to be aborted.
10What does commit mean here?
- If commit record has not been written to log and
database fails, then the transaction will be
rolled back.
- SO! Commit means the commit record has been
written to the log.
11Checkpoints
- A checkpoint record lists all currently active
transactions (written to the log, e.g., by the
transaction manager).
- To use checkpoint record, scan backward to most
recent checkpoint record. If T is listed there
and there has been no completion for T (abort or
commit) seen so far, then backward scan continues.
12Log example
- B1 // T1 begin
- B2
- U1 // T1 update
- C1 // T1 commit
- CK T2 // checkpoint
- U2 // (a)
- U2 // (b)
- <<fail!>> --> scan back, undo (b), undo (a),
discover only T2 is active, ignore C1, ignore U1,
stop at B2.
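The backward scan in this example can be sketched as follows. The function name `undo_active` and the tuple layout of the records are assumptions for illustration: `("B", tid)`, `("U", tid, item, before_image)`, `("C", tid)`, `("A", tid)`, and `("CK", [tids])`.

```python
def undo_active(db, log):
    """Roll back every transaction that was active at the crash."""
    completed = set()   # commit/abort seen while scanning back to checkpoint
    pending = set()     # active transactions whose begin is not yet reached
    active = set()      # every transaction found to be active at the crash
    seen_checkpoint = False

    for rec in reversed(log):
        kind = rec[0]
        if kind == "CK":
            if not seen_checkpoint:
                seen_checkpoint = True
                # Listed transactions with no completion seen are active.
                pending |= set(rec[1]) - completed
                active |= pending
        elif kind in ("C", "A"):
            if not seen_checkpoint:
                completed.add(rec[1])
        elif kind == "U":
            _, tid, item, before = rec
            if not seen_checkpoint and tid not in completed:
                pending.add(tid)
                active.add(tid)
            if tid in pending:
                db[item] = before       # apply the before image
        elif kind == "B":
            if not seen_checkpoint and rec[1] not in completed:
                active.add(rec[1])
            pending.discard(rec[1])     # scan for this transaction is done
        if seen_checkpoint and not pending:
            break                       # all begin records reached
    return active


# The log from the example above; items and values are made up.
log = [("B", 1), ("B", 2), ("U", 1, "x", 0), ("C", 1),
       ("CK", [2]), ("U", 2, "y", 0), ("U", 2, "y", 5)]
db = {"x": 1, "y": 7}
rolled_back = undo_active(db, log)
```

Here the scan undoes both T2 updates, ignores C1 and U1 because T1 is not listed as active, and stops at B2.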
13Another log example
- . . .
- B2
- B3 /\ ok, T3 scan complete
- B1 /\ ok, T1 scan complete
- C2 // T2 commit /\ ignore
- B5 /\ ignore
- U3 /\ undo
- U5 /\ ignore
- A5 // T5 abort /\ ignore
- CK T4, T1, T3 /\ only T1, T3 matter
- U1 /\ undo
- U4 /\ can ignore
- B6 /\ done with T6
- C4 // T4 commit /\ T4 completed!
- U6 /\ T6 active -undo
- U1 /\ T1 active -undo
- <<fail>>
14Yet another log example
- . . . /\ continue for T1
- B6 /\ T6 scan done
- U5 /\ ignore
- U4 /\ ignore
- CK 1, 4, 5, 6 /\ only T1, T6 matter
- A5 /\ T5 done
- U4 /\ ignore
- C4 /\ T4 done
- U6 /\ T6 active -undo
- <<fail>>
15Write-ahead log
- MUST always write log before DB is updated.
- Suppose we don't do write-ahead: T executes update
--> first change DB, then write log.
- If crash between change DB and write log, there
is no way to recover DB to a consistent state.
- Suppose we do write-ahead: T executes update -->
first write log, then change DB.
- If crash between write log and change DB, the
recovery will write the before image (which is
the same as is currently stored in DB).
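The two orderings can be compared with a small simulation. This is only a sketch: `Crash`, `update`, and `recover` are hypothetical names, the crash is simulated with an exception, and recovery assumes every transaction in the log was still active.

```python
class Crash(Exception):
    """Simulated system failure."""


def update(db, log, tid, item, new, write_ahead, crash_between=False):
    before = db[item]
    if write_ahead:
        log.append((tid, item, before))   # log first ...
        if crash_between:
            raise Crash
        db[item] = new                    # ... then the DB
    else:
        db[item] = new                    # DB first ...
        if crash_between:
            raise Crash
        log.append((tid, item, before))   # ... then the log


def recover(db, log):
    # Assumption: every logged transaction was active, so undo everything.
    for _tid, item, before in reversed(log):
        db[item] = before


# Write-ahead: the crash leaves a log record but an unchanged DB.
db_wal, log_wal = {"x": 1}, []
try:
    update(db_wal, log_wal, 1, "x", 2, write_ahead=True, crash_between=True)
except Crash:
    pass
recover(db_wal, log_wal)      # rewrites the value already stored: harmless

# No write-ahead: the crash leaves a changed DB and no log record.
db_bad, log_bad = {"x": 1}, []
try:
    update(db_bad, log_bad, 1, "x", 2, write_ahead=False, crash_between=True)
except Crash:
    pass
recover(db_bad, log_bad)      # nothing to undo: uncommitted value remains
```

After recovery `db_wal` still holds the old value, while `db_bad` is stuck with the uncommitted write because the before image was never logged.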
16Performance stinks because each DB write requires
two I/O writes!
- Use volatile storage for the last part of the log
-- log buffer.
- Log buffer periodically flushed to log.
- When system crashes, the log buffer is not
available.
- Note, using cache is analogous
- Want cache to improve performance, but
- Cache data (DB and maybe log buffer) are lost.
17Modify previous scheme for log buffer and cache
- Recall, must write record to log before writing
to DB.
- So, a dirty page in cache is not written to DB
until after the log buffer containing the record
for the corresponding data item is appended to log.
- Either
- Append record to log buffer. Eventually the
buffer is flushed and can write dirty cache page.
- Append record to log buffer, then immediately
write log buffer. AKA forced.
- For a normal (unforced) write, DMA can proceed
concurrently with transaction execution.
- BUT! for a forced write, cannot return from disk
write system call until the write is complete.
18Alternative implementation for log buffer and
cache
- Add (overhead) data
- Add a log sequence number (LSN) to each log
record.
- For each DB page, store the LSN of the log record
for the most recent change to the DB page.
19Continuing alternative implementation
- When space needed in cache, choose a dirty page,
P, to write out
- Determine if log buffer contains the update
record whose LSN is the LSN stored in P.
- If so, must force write log buffer before P is
written to DB.
- If not, the log on mass storage is already up to
date wrt P.
20Example
- DISK
-   DB                  Log
-   Page  LSN           LSN  record
-   O     3             1
-   P     95            99   U5(m)
-   Q     3
- VOLATILE
-   Cache                              Log Buffer
-   Page          LSN                  LSN  record
-   P (x, y, z)   95, 101, 102, 103    100  . . .
-   Q (a, b, c)   99                   101  U1(x)
-   O (l, m, n)   3                    102  U2(y)
-                                      103  U2(x)
- To remove clean O, no change to DB.
- To remove dirty Q, cache->Q->LSN <= log's
maximum LSN (on disk). Therefore can just write
out Q.
- To remove dirty P, cache->P->LSN > log's
maximum LSN (on disk). Therefore, must force Log
Buffer (from beginning to P->LSN), then can write
out P.
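The eviction rule from this example can be sketched as below. The names `LogManager` and `evict` are hypothetical, pages are plain dictionaries, and the LSN values are taken from the example.

```python
class LogManager:
    def __init__(self):
        self.disk = []      # (lsn, record) pairs on mass storage
        self.buffer = []    # (lsn, record) pairs in the volatile log buffer

    def append(self, lsn, record):
        self.buffer.append((lsn, record))

    def max_disk_lsn(self):
        return self.disk[-1][0] if self.disk else 0

    def force(self, up_to_lsn):
        # Move buffered records up to and including up_to_lsn to disk.
        keep = []
        for lsn, rec in self.buffer:
            (self.disk if lsn <= up_to_lsn else keep).append((lsn, rec))
        self.buffer = keep


def evict(name, cache, db_disk, log):
    page = cache.pop(name)
    if not page["dirty"]:
        return "dropped"                  # clean page: no change to DB
    if page["lsn"] > log.max_disk_lsn():
        log.force(page["lsn"])            # log must reach disk first
        result = "forced log, then wrote page"
    else:
        result = "wrote page"             # log on disk is up to date
    db_disk[name] = {"lsn": page["lsn"], "data": page["data"]}
    return result


log = LogManager()
log.disk = [(99, "U5(m)")]
for lsn, rec in [(100, ". . ."), (101, "U1(x)"), (102, "U2(y)"), (103, "U2(x)")]:
    log.append(lsn, rec)

cache = {
    "P": {"lsn": 103, "dirty": True, "data": ("x", "y", "z")},
    "Q": {"lsn": 99, "dirty": True, "data": ("a", "b", "c")},
    "O": {"lsn": 3, "dirty": False, "data": ("l", "m", "n")},
}
db_disk = {"O": {"lsn": 3}, "P": {"lsn": 95}, "Q": {"lsn": 3}}

results = [evict(name, cache, db_disk, log) for name in ("O", "Q", "P")]
```

Only the eviction of P triggers a forced write of the log buffer; O is simply dropped and Q is written without touching the log.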
21Force policy (on commits)
- Force policy
- T wants to commit, but first!
- If T's last update is still in log buffer, force
log buffer. (before image is durable)
- Pages (dirty) in cache updated by T are forced
(new values durable)
- Then, log T's commit into log buffer.
- When that part of the log buffer is written, then
T is durable.
23For no-force commit policy: new log record type,
after-image
- After-image (aka redo record) is a copy of the
new value of the item.
- The motivation for having after-image in log is
to improve disk access performance. I.e., new
data is durable if the log buffer has been
written out (even though the page in cache has
not). So, there is no required order between
writing the commit record (to disk) and writing
the dirty page.
25Three pass recovery: Do, Undo, Redo
- Pass 1: Scan log backward to the most recent
checkpoint (determining which transactions to
roll back, i.e., are active at crash).
- Pass 2: Replay log from checkpoint. For update
records (committed, aborted and active) update
corresponding items in DB (use after-image). Now
DB is up-to-date wrt all changes prior to crash.
- Pass 3: Scan backward to roll back all
transactions active at the time of crash. Use
before-image to reverse DB value. This pass ends
when the begin records of all rolled-back
transactions have been reached.
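The three passes can be sketched as one function. This is a simplified illustration, not a definitive implementation: the name `three_pass_recover` and the tuple record layout (`("U", tid, item, before, after)`, `("B", tid)`, `("C", tid)`, `("A", tid)`, `("CK", [tids])`) are assumptions, and the DB on disk is assumed to already reflect every update logged before the checkpoint.

```python
def three_pass_recover(db, log):
    # Pass 1: scan backward to the most recent checkpoint and work out
    # which transactions were active at the crash.
    completed, active = set(), set()
    ck_index = 0
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] in ("C", "A"):
            completed.add(rec[1])
        elif rec[0] in ("B", "U"):
            if rec[1] not in completed:
                active.add(rec[1])
        elif rec[0] == "CK":
            active |= set(rec[1]) - completed
            ck_index = i
            break

    # Pass 2: replay forward from the checkpoint, applying the after-image
    # of every update record (committed, aborted and active alike).
    for rec in log[ck_index:]:
        if rec[0] == "U":
            db[rec[2]] = rec[4]

    # Pass 3: scan backward, applying before-images of active
    # transactions, until all of their begin records have been reached.
    pending = set(active)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] == "U" and rec[1] in pending:
            db[rec[2]] = rec[3]
        elif rec[0] == "B":
            pending.discard(rec[1])
    return active


log = [("B", 1), ("U", 1, "x", 0, 1), ("B", 2), ("CK", [1, 2]),
       ("U", 2, "y", 0, 5), ("U", 1, "x", 1, 2), ("C", 1),
       ("U", 2, "y", 5, 6)]
db = {"x": 1, "y": 0}     # disk state at the crash (made-up values)
rolled_back = three_pass_recover(db, log)
```

In this made-up log T1 committed, so its after-image survives, while T2 was still active and is rolled back to its original value.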
26Caveat to Do, Undo, Redo
- Checkpoint followed by T update, then T abort
- Update was rolled back in the DB before abort was
logged. So on recovery the update is restored
(Pass 2), but not rolled back (T is not active).
- To fix, an abort that had updated x needs TWO
records in the log
- update (xold, xnew), followed by
- compensation (xnew, xold).
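A short sketch shows why the compensation record matters during the replay pass. The function names `redo` and `abort` and the tuple record layout are assumptions for illustration; each update or compensation record carries (old value, new value) in its last two fields.

```python
def redo(db, log):
    # Replay pass: apply the second value of every update and
    # compensation record, in log order.
    for rec in log:
        if rec[0] in ("update", "compensation"):
            _kind, _tid, item, _old, new = rec
            db[item] = new


def abort(db, log, tid):
    # Roll back tid, logging a compensation record for each undone
    # update, then log the abort.
    for rec in reversed(list(log)):
        if rec[0] == "update" and rec[1] == tid:
            _kind, _tid, item, old, new = rec
            db[item] = old
            log.append(("compensation", tid, item, new, old))
    log.append(("abort", tid))


db = {"x": 1}
log = [("begin", 1), ("update", 1, "x", 1, 2)]
db["x"] = 2                 # the update itself
abort(db, log, 1)           # x is back to 1 and the log says so

# Replay after a crash, with and without the compensation record.
with_comp = {"x": 1}
redo(with_comp, log)

without_comp = {"x": 1}
redo(without_comp, [r for r in log if r[0] != "compensation"])
```

With the compensation record, replay reapplies the update and then reverses it; without it, replay leaves the aborted transaction's value in the DB.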
28
- Class discussion of ARIES.