Title: Ch. 10. Transaction Manager Concepts
Transaction Manager Concepts
- The transaction manager (TM) furnishes the A, C, and D of ACID.
  - It provides the all-or-nothing property (atomicity) by undoing aborted transactions, redoing committed ones, and coordinating commitment with other TMs if a transaction happens to be distributed.
  - It provides consistency by aborting any transactions that fail to pass the RM consistency tests at commit.
  - It provides durability by forcing all log records of committed transactions to durable memory as part of commit processing, redoing any recently committed work at restart.
- The TM together with the log manager and the lock manager supplies the mechanism to build RMs and computations with the ACID properties.
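Normal Execution
[Figure: interaction of the TM, the RMs, the lock manager, and the log manager during normal execution. A new transaction calls Begin_Work( ) and receives a TRID. Work requests go to the RMs' normal functions; the RMs also provide the callback functions UNDO, REDO, and COMMIT. Commit steps: 1. Want to commit (Commit_Work( )). 2. Commit phase 1? 3. YES to phase 1. 4. Write commit log record [...]. 5. Commit phase 2. 6. Acknowledge.]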
Transaction Abort
[Figure: interaction of the application, the TM, the RMs, and the log manager during abort. Steps shown: 1. Rollback transaction (Rollback_Work( )). 2. Read the transaction's log records. 5. Write abort records.]
- Note: Rollback to a savepoint has similar logic.
DO-UNDO-REDO Protocol
- The DO-UNDO-REDO protocol is a programming style for RMs implementing transactional objects:
  - DO program
  - UNDO program
  - REDO program
- RMs have the following structure:
  - DO: Old State -> New State + Log Record
  - UNDO: New State + Log Record -> Old State
  - REDO: Old State + Log Record -> New State
- Within the RM, the normal function is the DO program; the callback functions are the UNDO and REDO programs.
Restart
- The TM regularly invokes checkpoints during normal processing: it informs each RM to checkpoint its state to persistent memory.
- At restart, the transaction manager scans the log table forward from the most recent checkpoint to the end.
- For each transaction that has not committed (e.g., T?), the TM calls the UNDO( ) callback of the RMs to undo it to the most recent persistent savepoint.
[Figure: timeline of transactions T1, T2, and T3 relative to the checkpoint and the crash.]
Value Logging
- Each log record contains the old and the new states of the object.
- The UNDO program sets the object to the old state.
- The REDO program sets the object to the new state.
- Example:
    struct value_log_record_for_page_update
    {
        int      opcode;               /* opcode will say page update */
        filename fname;                /* name of file that was updated */
        long     pageno;               /* page that was updated */
        char     old_value[PAGESIZE];  /* old value of page */
        char     new_value[PAGESIZE];  /* new value of page */
    };
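
A minimal sketch of how DO, UNDO, and REDO might look under value logging. The names do_update, undo_update, and redo_update, the trimmed-down record layout, and the tiny PAGESIZE are assumptions made for illustration; they are not taken from the slides.

```c
#include <assert.h>
#include <string.h>

#define PAGESIZE 16   /* tiny page size, chosen only for illustration */

/* Hypothetical, trimmed-down value log record: old and new page images. */
struct value_log_record {
    long pageno;
    char old_value[PAGESIZE];
    char new_value[PAGESIZE];
};

/* DO: transform the page and fill in the log record that describes the change. */
void do_update(char *page, long pageno, const char *new_image,
               struct value_log_record *rec)
{
    assert(page != NULL && new_image != NULL && rec != NULL);
    rec->pageno = pageno;
    memcpy(rec->old_value, page, PAGESIZE);       /* remember the old state */
    memcpy(rec->new_value, new_image, PAGESIZE);  /* remember the new state */
    memcpy(page, new_image, PAGESIZE);            /* old state -> new state */
}

/* UNDO: set the page to the old state recorded in the log record. */
void undo_update(char *page, const struct value_log_record *rec)
{
    memcpy(page, rec->old_value, PAGESIZE);
}

/* REDO: set the page to the new state recorded in the log record. */
void redo_update(char *page, const struct value_log_record *rec)
{
    memcpy(page, rec->new_value, PAGESIZE);
}
```

Because UNDO and REDO simply overwrite the page with a stored image, repeating either one leaves the page in the same state.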
Logical Logging
- Value logging is often called physical logging because it records the physical addresses and values of objects.
- Logical (or operation) logging records the name of an UNDO-REDO function and its parameters.
- It assumes that each action is atomic and that in each failure situation the system state will be action consistent: each logical action will have been completely done or completely undone.
Logical Logging (Cont'd)
- Problem: Partially complete actions can fail, and the UNDO of these partial actions will not be presented with an action-consistent state.
[Figure: a transaction consisting of logical action 1 and logical action 2, each made up of step 1, step 2, and step 3.]
Physiological Logging: Motivation
- Physiological logging is a compromise between logical and physical logging. It uses logical logging where possible.
- These are the ideas that motivate physiological logging:
  - Page actions: Complex actions can be structured as a sequence of page actions.
  - Mini-transactions: Page actions can be structured as mini-transactions that use logical logging.
    - When the action completes, the object is updated.
    - An UNDO-REDO log record is created to cover that action.
    - These actions are atomic, consistent, and isolated.
Physiological Logging: Motivation (Cont'd)
- Log-object consistency: It is possible to structure the system so that at restart, the persistent state is page-action consistent.
- The log can then be used to transform this action-consistent state into a transaction-consistent state at restart.
- Note: Physiological log records are physical to a page, and logical within a page.
Physiological Logging: An Example
- Consider the insert that has the following logical log record:
    <insert op, tablename = A, record value = r>
[Figure: Table T (File A) with two indexes, index 1 and index 2, stored in File B and File C and built on Key B and Key C.]
Physiological Logging: An Example (Cont'd)
- This insert operation involves three page actions (we assume that B-tree splits do not happen). The corresponding physiological record bodies are:
    <insert op, base filename = A, page number = 508, record value = r>
    <insert op, index filename = B, page number = 72, index record value = s>
    <insert op, index filename = C, page number = 94, index record value = t>
- Fundamental idea: Log records are generated on a per-page basis. Log records are designed to make logical transformations of pages.
Physiological Logging: During Online Operation
- We call normal operations without failures online operations.
- To allow updates, all page changes must be structured as mini-transactions of this form:
    Mini_trans()
    {
        lock the object in exclusive mode
        transform the object
        generate an UNDO-REDO log record
        unlock the object
    }
Physiological Logging: During Online Operation (Cont'd)
- The mini-transaction approach ensures online consistency:
  - Page-action consistency: Volatile and persistent memory are in a page-consistent state, and each page reflects the most recent updates to it.
  - Log consistency: The log contains a history of all updates to pages.
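One-Bit Resource Mgr: Requirements
- This RM manages an array of bits stored in a single page. Each bit is either free (TRUE) or busy (FALSE).
[Figure: the one-bit RM holds one page containing the page lsn and the bit array. Bits 1, 2, and 4 are F; bit 3 changes from T to F; bit 5 changes from F to T. Client 1 calls get_bit( ) and receives 3; Client 2 calls give_bit(5). The labels "locked" and "unlocked" mark the lock state of bits.]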
One-Bit Resource Mgr: Requirements (Cont'd)
- Requirements:
  - Page consistency:
    - No clean free bit has been given to any transaction.
    - Every clean busy bit has been given to exactly one transaction.
    - Dirty bits are locked in exclusive mode by the transaction that modified them.
    - The log sequence number (page lsn) reflects the most recent log record for this page.
  - Log consistency:
    - The log contains a log record for every completed mini-transaction update to the page.
One-Bit Resource Mgr: give_bit( )
    give_bit (int i)                          /* free a bit */
    {
        get XLOCK on the bit
        if the XLOCK is granted
        then
        {
            get the page semaphore
            free the bit
            generate log record saying bit is free
            write log record and update lsn   /* page is now consistent */
            free page semaphore
        }
        else
            abort caller's transaction        /* caller does not own the bit */
    }
One-Bit Resource Mgr: give_bit( ) (Cont'd)
- Note: This code has all the elements of a mini-transaction.
  - It is well formed and two-phased with respect to the page semaphore.
  - It provides a page action-consistent transformation of the page.
One-Bit Resource Mgr: get_bit( )
    get_bit (void)       /* allocate a free bit to the caller and return the bit index */
    {
        get the page semaphore
        repeat until end of bit array
        {
            find the next free bit and XLOCK it
            if lock is granted                    /* the bit is free */
            then
            {
                mark the bit busy
                generate log record describing update
                write log record and update lsn   /* page is now consistent */
                give up semaphore
                return the bit index to caller
            }
        }
        if no free bits were found during the repeat loop
        then
        {
            give up semaphore
            abort transaction
            return -1 to caller
        }
    }
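
The get_bit( ) and give_bit( ) logic can be approximated by the single-threaded sketch below. It rests on several assumptions that are not in the slides: the lock manager is reduced to a lock_owner array that never waits, the page semaphore is a plain flag, the log manager only hands out LSNs, bit indexes start at 0, and a failed call reports the failure to the caller instead of aborting the transaction itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NBITS 8          /* assumed size of the bit array */
#define NO_OWNER 0       /* TRIDs are assumed to be nonzero */

/* Hypothetical page layout: TRUE means free, FALSE means busy. */
struct bit_page {
    long lsn;               /* LSN of the most recent log record for the page */
    bool free_bit[NBITS];
    bool sem_held;          /* stands in for the page semaphore */
};

static int  lock_owner[NBITS];   /* stands in for the lock manager */
static long next_lsn = 1;        /* stands in for the log manager */

/* Request an exclusive lock on bit i for trid; fail instead of waiting. */
static bool xlock(int i, int trid)
{
    if (lock_owner[i] != NO_OWNER && lock_owner[i] != trid)
        return false;
    lock_owner[i] = trid;
    return true;
}

/* Pretend to write a log record and return its LSN. */
static long write_log_record(void) { return next_lsn++; }

/* Allocate a free bit to trid; return its index, or -1 if none is available. */
int get_bit(struct bit_page *p, int trid)
{
    assert(!p->sem_held);
    p->sem_held = true;                          /* fix the page */
    for (int i = 0; i < NBITS; i++) {
        if (p->free_bit[i] && xlock(i, trid)) {  /* free and not locked by others */
            p->free_bit[i] = false;              /* mark the bit busy */
            p->lsn = write_log_record();         /* log the update, advance page lsn */
            p->sem_held = false;                 /* page is consistent again: unfix */
            return i;
        }
    }
    p->sem_held = false;
    return -1;           /* the caller would abort its transaction here */
}

/* Free bit i on behalf of trid; return false if trid cannot lock the bit. */
bool give_bit(struct bit_page *p, int i, int trid)
{
    assert(i >= 0 && i < NBITS);
    if (!xlock(i, trid))
        return false;    /* the caller would abort its transaction here */
    assert(!p->sem_held);
    p->sem_held = true;
    p->free_bit[i] = true;
    p->lsn = write_log_record();
    p->sem_held = false;
    return true;
}
```

Note that a bit freed by an uncommitted transaction stays locked, so get_bit( ) skips it for other transactions: this is the dirty-bit requirement at work.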
The FIX Rule
- While the semaphore is set, the page is said to be fixed, and releasing the page is called unfixing it.
- Fix rule:
  - Get the page semaphore in exclusive mode prior to altering the page.
  - Get the semaphore in shared or exclusive mode prior to reading the page.
  - Hold the semaphores until the page and log are again consistent, and the read or update is complete.
The FIX Rule (Cont'd)
- Note: This is just two-phase locking at the page-semaphore level.
- The isolation theorem tells us that all read and write actions on the page will be isolated.
- Page updates are actually mini-transactions.
- When the page is unfixed, the page should be consistent and the log record should allow UNDO or REDO of the page transformation.
Multi-Page Actions
- Some actions modify several pages at once.
- Examples:
  - Inserting a multi-page record.
  - Splitting a B-tree node.
- These actions are structured as follows:
  - Fix all the relevant pages.
  - Do all the modifications and generate many log records.
  - Unfix the pages.
Dealing with Failures
- Page actions provide page consistency even if they fault.
  - Copy the page at the beginning of the page action.
  - Then, if anything goes wrong with the page action prior to writing the log record, the page action just returns the page to its original values by copying it back.
- Complex operations depend on transaction UNDO to roll back.
  - Each complex action should start by declaring a savepoint.
  - If anything goes wrong during a page action, the operation first makes that page consistent.
  - The action can then call Rollback_Work( ) to return to the savepoint.
- Note: The savepoint wraps the complex action within a subtransaction so that the complex action can be undone if it fails.
Online Consistency and Restart Consistency
- Online log consistency requires that the volatile log contain all log records up to and including VVlsn:
    VVlsn ≤ VLlsn
[Figure: four sequences ordered by lsn, which serves as a time stamp: volatile page versions (up to VVlsn), volatile log records (up to VLlsn), durable log records (up to DLlsn), and persistent page versions (up to PPlsn).]
Online Consistency and Restart Consistency (Cont'd)
- Restart consistency ensures that if a transaction has committed with commit_lsn, then that commit record is in the durable log:
    commit_lsn ≤ DLlsn
- In addition, restart consistency guarantees that if version X of the volatile copy overwrites the durable copy, then the log records for version X are already present in the durable log:
    VVlsn ≤ DLlsn
- Note: At restart, all volatile memory is reset and must be reconstructed from persistent memory. We must have:
    PPlsn ≤ DLlsn
    commit_lsn ≤ DLlsn
Write Ahead Log (WAL) Protocol
- Protocol:
  - Each volatile page has an LSN field naming the log record of the most recent update to the page.
  - Each update must maintain the page LSN field.
  - When a page is about to be copied to persistent memory, the copier must first use the log manager to copy all log records up to and including the page's LSN to durable memory (force them).
  - Once the force completes, the volatile version of the page can overwrite the persistent version of the page.
  - The page must be fixed during the writes and during the copies, to guarantee page-action consistency.
- Effect: The log records of a page must be moved to durable memory prior to overwriting the page in persistent memory.
Force-Log-at-Commit
- Question: What if no pages were copied to persistent memory, and the transaction committed?
  - If the system were to restart immediately, there would be no record of the transaction's updates, and the transaction could not be redone.
- Solution: the force-log-at-commit rule.
  - Rule: The transaction's log records must be moved to durable memory as part of commit.
  - Implementation: When a transaction commits, the TM writes a commit log record and requests the log manager to flush the log.
  - As a consequence, all the log records prior to the commit record are flushed as well.
Physiological Logging Summary
- The RM must observe the following three rules:
  - Fix rule: Cover all page reads and page writes with the page semaphore.
  - Write-ahead log (WAL): Force the page's log records prior to overwriting its persistent copy.
  - Force-log-at-commit: Force the transaction's log records as part of commit.
- Note: Many systems use the physiological design.
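
The WAL and force-log-at-commit rules can be illustrated with a small sketch. Everything here is an assumption made for illustration: the log manager is reduced to two counters (vl_lsn for the volatile log, dl_lsn for the durable log), a page is a single integer plus its LSN, and the fix rule is not modeled because the sketch is single-threaded.

```c
#include <assert.h>

/* Hypothetical log manager state: highest LSN in the volatile and durable logs. */
struct log_mgr { long vl_lsn; long dl_lsn; };

/* Hypothetical page: its LSN field plus some content. */
struct page { long lsn; int value; };

/* Force all log records up to and including lsn to durable memory. */
static void log_force(struct log_mgr *lm, long lsn)
{
    assert(lsn <= lm->vl_lsn);   /* cannot force records that were never written */
    if (lm->dl_lsn < lsn)
        lm->dl_lsn = lsn;
}

/* Update the volatile page: write a log record and maintain the page LSN. */
void page_update(struct log_mgr *lm, struct page *vp, int value)
{
    vp->value = value;
    vp->lsn = ++lm->vl_lsn;
}

/* WAL: force the page's log records, then overwrite the persistent copy. */
void page_write(struct log_mgr *lm, const struct page *vp, struct page *pp)
{
    log_force(lm, vp->lsn);
    *pp = *vp;
}

/* Force-log-at-commit: write the commit record and flush the log up to it. */
long commit_work(struct log_mgr *lm)
{
    long commit_lsn = ++lm->vl_lsn;
    log_force(lm, commit_lsn);   /* earlier records become durable as well */
    return commit_lsn;
}
```

After page_write( ) the persistent page LSN never exceeds dl_lsn, and after commit_work( ) the commit LSN never exceeds dl_lsn, which are the two restart-consistency conditions.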
UNDO Compensation Log Records
- Question: What should the page LSN become when an action is undone?
  - If subsequent updates to the page by other transactions have advanced the log sequence number, the LSN should not be set back to its original value.
- Strategy: The UNDO looks just like a new action that generates a new log record, called a compensation log record.
  - This approach makes page LSNs monotonic, an essential property for write-ahead logging.
  - A transaction that produced n new log records during forward processing will produce n new log records when the transaction is aborted.
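
A minimal sketch of UNDO with compensation log records, assuming an in-memory log array, a page with two integer slots, and value-style log records. The names update, abort_transaction, and the record layout are hypothetical.

```c
#include <assert.h>

#define MAXLOG 32
#define NSLOTS 2

enum rec_kind { UPDATE, COMPENSATION };

/* Hypothetical log record carrying old and new values of one page slot. */
struct log_rec {
    long lsn;
    enum rec_kind kind;
    int trid;
    int slot;
    int old_value;
    int new_value;
};

struct log  { struct log_rec rec[MAXLOG]; int n; };
struct page { long lsn; int slot[NSLOTS]; };

/* Append a record to the log and return its LSN (1, 2, 3, ...). */
static long append(struct log *lg, enum rec_kind kind, int trid,
                   int slot, int old_value, int new_value)
{
    assert(lg->n < MAXLOG);
    struct log_rec *r = &lg->rec[lg->n++];
    r->lsn = lg->n;
    r->kind = kind;
    r->trid = trid;
    r->slot = slot;
    r->old_value = old_value;
    r->new_value = new_value;
    return r->lsn;
}

/* Forward processing: log the update, apply it, advance the page LSN. */
void update(struct log *lg, struct page *p, int trid, int slot, int new_value)
{
    long lsn = append(lg, UPDATE, trid, slot, p->slot[slot], new_value);
    p->slot[slot] = new_value;
    p->lsn = lsn;
}

/* Abort: read the log backwards; each UNDO writes a compensation record,
 * so the page LSN only moves forward. */
void abort_transaction(struct log *lg, struct page *p, int trid)
{
    for (int i = lg->n - 1; i >= 0; i--) {
        struct log_rec r = lg->rec[i];   /* copy: the log grows while we scan */
        if (r.trid != trid || r.kind != UPDATE)
            continue;
        p->slot[r.slot] = r.old_value;
        p->lsn = append(lg, COMPENSATION, trid, r.slot, r.new_value, r.old_value);
    }
}
```

In this sketch a transaction with two forward updates produces exactly two compensation records on abort, and another transaction's update to the same page is left untouched.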
Idempotence and Testable
- Idempotent operation: If the UNDO or REDO operation can be repeated an arbitrary number of times and still result in the correct state, the operation is idempotent.
- Example:
  - The operation "move the reactor rods to position 35" is idempotent.
  - The operation "move the reactor rods down 2 cm" is not idempotent.
- Note: Repeated REDOs can arise from repeated failures.
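
The reactor-rod example can be written down directly. The function names and the integer position are assumptions for illustration only.

```c
#include <assert.h>

/* "Move the reactor rods to position 35": the result does not depend on
 * the starting position, so repeating the call changes nothing. */
void move_rods_to(int *position, int target)
{
    assert(position);
    *position = target;
}

/* "Move the reactor rods down 2 cm": each repetition moves the rods
 * further, so the operation is not idempotent. */
void move_rods_down(int *position, int delta)
{
    assert(position);
    *position -= delta;
}
```

Starting from any position, two calls to move_rods_to(&pos, 35) leave pos at 35, whereas two calls to move_rods_down(&pos, 2) lower it by 4.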
Idempotence and Testable (Cont'd)
- Testable state: If the old and new states can be discriminated by the system, the state is testable.
[Figure: a test applied to an unknown state determines whether it is the old state or the new state.]
- If an operation is not idempotent and the state is not testable, the operation cannot be made atomic.
Idempotence of Physiological REDO
- Repeated REDOs can arise from repeated failures during restart.
- Example: Suppose the following physiological log record were redone many times:
    <insert op, base filename, page number, record value>
- If no special care were taken, this repeated REDO would result in many inserts of the record into the page.
Idempotence of Physiological REDO (Cont'd)
- The following logic makes physiological REDOs idempotent:
    idempotent_physiologic_redo (page, logrec)
    {
        if (page_lsn < logrec_lsn)
            redo (page, logrec);
    }
- Note: The first successful REDO will advance the page LSN and cause all subsequent REDOs of this log record to be null operations.
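
A runnable sketch of the same test, assuming a hypothetical page that stores integer records and a log record that carries only an LSN and the value to insert.

```c
#include <assert.h>

#define MAXRECS 8

/* Hypothetical page: page LSN plus a small array of inserted records. */
struct page    { long lsn; int nrecs; int recs[MAXRECS]; };

/* Hypothetical physiological log record for an insert into this page. */
struct log_rec { long lsn; int record_value; };

/* Plain REDO of the insert: not idempotent on its own. */
static void redo(struct page *p, const struct log_rec *r)
{
    assert(p->nrecs < MAXRECS);
    p->recs[p->nrecs++] = r->record_value;
    p->lsn = r->lsn;              /* the first successful REDO advances the page LSN */
}

/* Apply the record only if the page has not yet seen it. */
void idempotent_physiologic_redo(struct page *p, const struct log_rec *r)
{
    if (p->lsn < r->lsn)
        redo(p, r);
}
```

Calling idempotent_physiologic_redo( ) several times with the same record inserts the value once; later calls find the page LSN already at the record's LSN and do nothing.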
The Need for the 2-Phase Commit Protocol
- Cancel key: The client may hit the cancel key at any time during the transaction.
- Server logic: A server may require that a certain set of steps be performed in order to make a complete transaction.
  - Example: At commit, many forms-processing systems check the completeness of the data.
- Integrity check: SQL has the option to defer referential integrity checks to transaction commit. If any integrity checks are violated at commit, the transaction's changes cannot be committed, and SQL wants to abort the transaction.
The Need for the 2-Phase Commit Protocol (Cont'd)
- Field calls: It is possible that field calls cannot acquire the locks or that the predicates become false at the end of the transaction. In such cases, the RM wants to abort the transaction.
- 2-Phase Commit Protocol: When a transaction is about to commit, each participant in the transaction is given a chance to vote on whether the transaction is a consistent state transformation. If all the RMs vote yes, the transaction can commit. If any vote no, the transaction is aborted.
2-Phase Commit: Commit
- Phase I
  - Prepare: Invoke each RM asking for its vote.
  - Decide: If all vote yes, durably write the transaction commit log record.
- Note: The commit record write is what makes a transaction atomic and durable. If the system fails prior to that instant, the transaction will be undone at restart; otherwise, phase 2 will be carried forward by the restart logic.
2-Phase Commit: Commit (Cont'd)
- Phase II
  - Commit: Invoke each RM telling it the commit decision.
    - Note: The RM can now release locks, deliver real messages, and perform other clean-up tasks.
  - Complete: When all acknowledge the commit message, write a commit completion record to the log, indicating that phase 2 ended. When the completion message is durable, deallocate the live transaction state.
- Note: The phase 2 completion record is used at restart to indicate that the RMs have all been informed about the transaction.
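
The two phases can be sketched for a single TM and its local RMs. This is only an illustration under stated assumptions: each RM is reduced to a fixed vote and a state field, the log is an in-memory array, durability and acknowledgements are not modeled, and on a no vote the sketch simply marks every RM aborted and writes an abort record.

```c
#include <assert.h>
#include <stdbool.h>

enum rm_state { RM_ACTIVE, RM_COMMITTED, RM_ABORTED };
enum log_kind { LOG_COMMIT, LOG_COMPLETE, LOG_ABORT };

/* Hypothetical RM: a predetermined vote plus its final state. */
struct rm     { bool votes_yes; enum rm_state state; };

/* Hypothetical TM log: just the kinds of records written, in order. */
struct tm_log { enum log_kind rec[8]; int n; };

static void log_write(struct tm_log *lg, enum log_kind k)
{
    assert(lg->n < 8);
    lg->rec[lg->n++] = k;
}

/* Returns true if the transaction committed, false if it aborted. */
bool commit_work(struct rm rms[], int n, struct tm_log *lg)
{
    /* Phase 1, prepare: ask each RM for its vote. */
    bool all_yes = true;
    for (int i = 0; i < n && all_yes; i++)
        all_yes = rms[i].votes_yes;

    if (!all_yes) {                       /* any no vote: abort */
        for (int i = 0; i < n; i++)
            rms[i].state = RM_ABORTED;
        log_write(lg, LOG_ABORT);
        return false;
    }

    /* Phase 1, decide: the commit record is the commit point. */
    log_write(lg, LOG_COMMIT);

    /* Phase 2, commit: tell each RM the decision. */
    for (int i = 0; i < n; i++)
        rms[i].state = RM_COMMITTED;

    /* Phase 2, complete: record that all RMs have been informed. */
    log_write(lg, LOG_COMPLETE);
    return true;
}
```

The order of the log writes mirrors the slides: nothing is committed until the commit record exists, and the completion record is written only after every RM has been told.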
Performance Advantage of Logging
- Commit copies no objects, only log records, to durable memory.
- Logging converts random write I/Os to sequential write I/Os.
2-Phase Commit: Abort
- If any RM votes no during the prepare step, or if it does not respond at all, then the transaction cannot commit.
- The simplest thing to do in this case is to roll back the transaction by calling Abort_work( ).
2-Phase Commit: Abort (Cont'd)
- The logic for Abort_work( ) is as follows:
  - Undo: Read the transaction's log backwards, issuing UNDO of each record. The RM that wrote the record is invoked to undo the operation.
  - Broadcast: At each savepoint, invoke each RM telling it the transaction is at the savepoint.
  - Abort: Write the transaction abort record to the log (UNDO of begin_work( )).
  - Complete: Write a complete record to the log indicating that abort ended. Deallocate the live transaction state.
Transaction Trees
- How does a transaction manager first hear about a distributed transaction?
- There are two cases:
  - Outgoing case: A local transaction sends a request to another node.
  - Incoming case: A new transaction request arrives from a remote transaction manager.
Transaction Trees (Cont'd)
- The TMs involved in a transaction form the transaction tree.
[Figure: a transaction tree of TMs connected by sessions, each TM with its local RMs. The root TM (coordinator) performs the original Begin_Work( ); the TMs below it are participants.]
- The TM highlighted in the figure has one incoming session and two outgoing sessions.
- It is both a participant (on the incoming session) and a coordinator (on the outgoing sessions).
Distributed 2-Phase Commit: Commit Coordinator
- The root commit coordinator executes the following logic when a successful commit_work( ) is invoked on a distributed transaction:
  - Local prepare: Invoke each local RM to prepare for commit.
  - Distributed prepare: Send a prepare request on each of the transaction's outgoing sessions.
  - Decide: If all RMs vote yes and all outgoing sessions respond yes, then durably write the transaction commit log record containing a list of participating RMs and TMs.
Distributed 2-Phase Commit: Commit Coordinator (Cont'd)
  - Commit: Invoke each participating RM, telling it the commit decision. Send a commit message on each of the transaction's outgoing sessions.
  - Complete: When all local RMs and all outgoing sessions acknowledge commit, write a completion record to the log indicating that phase 2 completed.
  - When the completion record is durable, deallocate the live transaction state.
Distributed 2-Phase Commit: Commit Participant
- When the prepare message arrives, the participant executes the following logic, Prepare( ):
  - Local prepare: Invoke each local RM to prepare for commit.
  - Distributed prepare: Send prepare requests on the outgoing sessions.
  - Decide: If all RMs vote yes and all outgoing sessions respond yes, then the local node is almost prepared.
  - Prepared: Durably write the transaction prepare log record containing a list of participating RMs, participating TMs, and the parent TM.
  - Respond: Send yes as the response (vote) to the prepare message on the incoming session.
  - Wait: Wait (forever) for a commit message from the coordinator.
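
The participant's Prepare( ) steps can be sketched as a recursive walk over the transaction tree. The tm_node structure is hypothetical: local RM votes are collapsed into one flag, an outgoing session is a pointer to the child TM, setting prepared stands in for durably writing the prepare log record, and the return value stands in for the vote sent on the incoming session. The wait for the coordinator's commit message is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SESSIONS 4

/* Hypothetical participant TM in the transaction tree. */
struct tm_node {
    bool local_rms_vote_yes;                 /* combined vote of the local RMs */
    struct tm_node *outgoing[MAX_SESSIONS];  /* TMs reached by outgoing sessions */
    int n_outgoing;
    bool prepared;                           /* prepare record written */
};

/* Returns the vote this TM sends back on its incoming session. */
bool prepare(struct tm_node *tm)
{
    assert(tm->n_outgoing <= MAX_SESSIONS);

    /* Local prepare: every local RM must vote yes. */
    if (!tm->local_rms_vote_yes)
        return false;

    /* Distributed prepare: every outgoing session must respond yes. */
    for (int i = 0; i < tm->n_outgoing; i++)
        if (!prepare(tm->outgoing[i]))
            return false;

    /* Prepared: write the prepare record, then respond yes. */
    tm->prepared = true;
    return true;
}
```

A TM in the middle of the tree votes yes only after all of its own subtree has prepared, which is why a single no vote anywhere below it propagates up to the root.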