Automating the Detection of Snapshot Isolation Anomalies - PowerPoint PPT Presentation

1 / 41

About This Presentation

Title:

Automating the Detection of Snapshot Isolation Anomalies

Description:

It specifies that all transactions occur in a completely isolated fashion. ... To read a data item, it obtains a shared lock on it, and reads the latest version. ... – PowerPoint PPT presentation

Number of Views:74

Avg rating:3.0/5.0

Slides: 42

Provided by: ajitav

Category:

more less

Transcript and Presenter's Notes

Title: Automating the Detection of Snapshot Isolation Anomalies

1
Automating the Detection ofSnapshot Isolation
Anomalies

Sudhir Jorwekar (IIT Bombay)
Alan Fekete (Univ. Sydney)
Krithi Ramamritham (IIT Bombay)
S. Sudarshan (IIT Bombay)

Presented By Ajitav Sahoo (08305011)
2
Motivation

Non-serializable executions are possible in
Snapshot Isolation.
Many industry applications run on systems that
use Snapshot Isolation as the isolation level
(highest level of consistency).
e.g. Oracle, PostgreSQL, SQL Server etc.
Theory for identifying such anomalies already
exists (Needs manual analysis)

3
Agenda

Background on OLTP and Concurrency control
Introduction to Snapshot Isolation Protocol
Examples of SI-Anomalies
Existing Theory for Detecting SI-Anomalies
Analyzing the transaction programs
Reducing the false positive
Results
Conclusion

4
OLTP Environment

Data stored in DBMS
Integrity constraints applied
Static collection of application programs
Possibly parameterized
Can query and/or modify database
Written to assume integrity constraints
Users run application programs
Same program run many times with different
parameters, perhaps concurrently

5
Concurrency Problems

Interleaving of read/writes by different
applications can leave data not matching.
Violation of Integrity constraints
Usual Problems
Lost update (txn states not present in final
state)
Inconsistent read (partial effect of txn seen)
DBMS must control concurrent execution of
transactions, to ensure consistency.

6
Serializability

A schedule/history is serializable,
provided it is equivalent (in outputs and in
impact on final state) to
an execution with each application running alone
one after another serially.
It specifies that all transactions occur in a
completely isolated fashion.
If the system detects a concurrent txn in
progress which would violate the serializability,
it must force that txn to roll back.

Highest level of Consistency
7
Two-Phase Locking Protocol

This is a protocol which ensures
conflict-serializable schedules.
Phase 1 Growing Phase
transaction may obtain locks
transaction may not release locks
Phase 2 Shrinking Phase
transaction may release locks
transaction may not obtain locks
Although assures serializability, doesnt ensure
freedom from deadlocks and cascading roll-back is
possible.

8
Two-Phase Locking Protocol (contd..)

Strict Two-phase locking
Modified version of 2PL
A transaction must hold all its exclusive locks
till it commits/aborts.
Rigorous Two-phase locking
Even stricter than strict-2PL
Here all locks are held till commit/abort.
Multiversion Two-phase locking
keep old versions of data item to increase
concurrency.
Differentiates between read-only transactions and
update transactions

9
Multiversion Two-Phase Locking

Update transactions acquire read and write locks,
and hold all locks up to the end of the
transaction. That is, update transactions follow
rigorous two-phase locking.
Each successful write results in the creation of
a new version of the data item written.
To read a data item, it obtains a shared lock on
it, and reads the latest version.
Read-only transactions are assigned a timestamp
by reading the current value of ts-counter
before they start execution they follow the
multiversion timestamp-ordering protocol for
performing reads.

10
Other Weaker Isolation Levels

Read Uncommitted (one txn may see uncommitted
changes by some other txn).
Read Committed (data records retrieved by a query
are not prevented from modification by some other
transaction).
Repeatable Read (the txn acquires read locks on
all retrieved data, but does not acquire range
locks.
These weaker levels bring anomalies like
Nonrepeatable read, Dirty read, and Phantom read.

11
Drawbacks of Serializability

Best known way to ensure serializable execution
is strict 2PL, which get locks before reading or
writing.
Application programs are blocked (even read-only)
and leads to low concurrency.

Holds locks until application commits.
Throughput is drastically lowered.
12
What is Snapshot Isolation?

A transaction T executing with Snapshot Isolation
takes snapshot of committed data at start
always reads/modifies data in its own snapshot
updates of concurrent transactions are not
visible to T
writes of T complete when it commits
Intuition snapshot should be consistent, if the
database was consistent before.
Read doesnt give current value (instead gives
value at it was when transaction started)
Not equivalent to a serial execution
In a serial execution, one transaction would see
the other

13
Implementation of Snapshot Isolation
T1 T2 T3
R(Y)?0 Commit
Start R(X) ? 0 W(Y 1)
W(X2) W(Z3) Commit
R(Z) ? 0 R(Y) ? 1 W(X3) Commit-Req Abort

Commits only if no other concurrent transaction
has already written data that T1 intends to
write.
(First Committer wins)
Most implementations use exclusive locks on
modified rows
(First Updater wins Policy)

Concurrent updates not visible Own updates are
visible Not first-committer of X Serialization
error, T2 is rolled back
14
First Committer Wins
T1 deposits 40 in X T2 deposits 70 in X
R(X,100)
R(X,100)
W(X,170)
W(X,140)
Commit
Commit Request Serialization problem detected by SI, ABORT!
15
Benefits of SI

No extra storage for multiple versions.
Reading is never blocked, even by concurrent
writer. So, throughput is good.
Performance similar to Read Committed
Better concurrency than serializable isolation
level
Prevents classical anomalies
No lost update (because first committer wins)
No inconsistent read (all txn reads see the same
set of complete txn)

16
Snapshot Isolation Anomalies

SI breaks serializability when transactions
modify.
Among concurrent transaction, neither sees the
effect of the other.
Doesnt always give serializable execution,
integrity constraints can be violated.
Two common types of anomalies Fekete et al.
SIGMOD05
Write skew anomaly
Read-only Transactional anomaly

17
Anomaly Write Skew (with Updates)
Constraint XYgt0 Initially, X 100 and Y 0
T1 withdraw 70 from X T2 withdraw 90 from Y
R(X,100)
R(Y,0)
R(X,100)
R(Y,0)
W(Y,-90)
W(X,30)
Commit
X Y -60
Skewed Write
18
Anomaly Write Skew (with Inserts)

A voucher with unique voucher is to be created
for every bill
Programmer codes
m select max(vno)
insert new tuple (billno, voucherm1)
Let max(vno)10 and new vouchers for bill
numbers X and Y are to be created

T1 Insert bill no.X T2 Insert bill no.Y
R(max(vno),10)
R(max(vno),10)
Insert(X,11)
Insert(Y,11)
Commit
Commit
Duplicate voucher created
Skewed Write
19
Detecting Anomalies
Goal is to ensure that every possible execution
in given application is serializable (not just a
particular execution).

Application consists of transaction programs
from which different transactions are generated
depending on
the parameter values (input)
the control structures (decision based on value)
Transactions might interleave in different ways.
Hence, it is infeasible to enumerate every
possible execution.

20
Detecting Anomalies Static Analysis

SDG Static Dependency Graph Fekete et al.
TODS05
Nodes Transaction Programs as nodes.
Edges Let T1 and T2 be any execution instances
of transaction program P1and P2 respectively
P1 ? P2 if there can exist some T1 that conflicts
with some T2
it is marked vulnerable if dependency does not
prevent concurrent execution

VUL
P1
P2
R
21
Conflicts under SI

Read Dependency (WR) P1? P2
P1 modifies a data item, which is seen by P2s
read.
P1 must completely precede P2 (because of
snapshot read)
Write Dependency (WW) P1? P2
P1 modifies a data item, that is later modified
by P2
P1 must completely precede P2 (because of
first-committer-wins)
Antidependency (RW) P1? P2
P1 reads a data item, and doesnt see P2s
modifications
Either P1 precedes P2, or P1 and P2 are
concurrent.

22
Static Analysis (contd)

Conditions for Vulnerability
RW conflict from T1 to T2, without WW conflict
T1 and T2 are concurrent.
T1 T2 and T1 T2
Pivot (Dangerous Structure)
A transaction program P is a pivot in a SDG, if
there is a cycle containing subpath with

Pivot
P
RW
WW
Q
S
R
VUL
VUL
P1
P2
R
P3
R
23
Static Analysis (contd)

Theorem Fekete TODS05
Absence of pivot in the SDG(A) ? serializable
execution under SI.
What if pivot exists in the SDG ?
Modify some application programs without changing
business logic.
Change at least one vulnerable edge to be
non-vulnerable.
Not always possible, also may have different
impacts on concurrency.

24
Avoiding Anomalies

S2PL for Pivots
Assumption is that the SI transactions obtain
write locks for data items written.
Introduces commit order on vulnerable edge.
Materializing Conflicts
Make both transactions write a common data item
(in a new table).
converts vulnerable edge to non-vulnerable (or
protected).
Promoting reads to writes
Promoting reads in P1 to writes will cause T1 T2
converts outgoing vulnerable edge to
non-vulnerable (or protected).
select for update (in case of Oracle)

WW
25
Transaction Programs in SQL

Identifying Set of Transaction Programs (SQL)
Program Analysis.
May not be possible for large applications.
SQL traces at backend.
May not cover all the transaction programs.
Characteristics of Transaction Programs (in SQL)
SQL statements
SELECT, INSERT, DELETE etc.
Parameterization
WHERE colUserInput

26
Mini Banking example

Schema
account (accno, balance, acctype)
customer (id, name, address)
owner (id,accno)
txn (txnid, txntype, accno, id, amount,
timestamp)
batchaudit (bid, start, end, inamount, outamount)
Transactions
Creating new Account (CA1, CA2)
Update Customer Info (UCI)
Deposit (DEP)
Withdraw (W1, W2)
End-of-day Audit (EOD)

27
Identifying Dependencies

Syntactic read and write sets
rset(P) is the set of columns read by transaction
program P
wset(P) is the set of columns written by P.
For UCI transaction shown below
begin
select from customer where idid
update customer set name?, address? where
idid
commit

rset(UCI) customer.id, customer.name,
customer.address wset(UCI) customer.name,
customer.address
28
Syntactic Column-based Analysis

Column-based Syntactic Dependency Graph (CSDG)
Nodes are transaction programs
An edge Pi ? Pj is marked when
rset(Pi) ? wset(Pj) ? ? or wset(Pi) ? rset(Pj) ?
? or
wset(Pi) ? wset(Pj) ? ?
An edge Pi ? Pj is marked when as
pseudovulnerable (PVUL) whenever
rset(Pi) ? wset(Pj) ? ?
P is a syntactic pseudopivot, if some cycles of
edges in CSDG contains a subpath

PVUL
PVUL
P1
P2
R
P3
R
29
Syntactic Analysis (contd)

An edge in CSDG is pseudovulnerable, when
corresponding edge in SDG is vulnerable.
Every pivot is a syntactic pseudopivot. but not
vice versa
Syntactic column-based analysis is safe.
No false negatives (where a potential anomaly is
not identified)

Theorem If a set of transaction programs
contains no syntactic pseudopivots, then every
execution under SI will in fact be serializable.
30
False Positives
Syntactic Pseudopivot
Many transactions which can never cause any
anomaly are detected as syntactic pseudopivot.
False Positives
Pseudovulnerable
CSDG for Banking Application
31
Eliminating False Positives 1 Modification
Protected Readset (MPR)

UCI has a pseudovulnerable self edge.
due to syntactic conflict between select and
update.
seems to imply that two copies of UCI could
create an anomaly.
But selected row is updated subsequently so first
committer wins, the other aborts.
Lowest level data item in PostgreSQL and Oracle
Row.
rset and wset are in terms of rows.
First Committer Wins rule applied at row level.

32
Modification Protected Readset (MPR)

MPR-Select
A statement S in a transaction program P1 is said
to be MPR w.r.t transaction program P2, if either
rset(S) ? wset(P2) ? ?, or
the WHERE clause predicate C used in S is stable
w.r.t P2.
Stable Predicate
Predicate C used in P1 is stable w.r.t P2, iff
for every possible schedule H containing
execution instances of P1 and P2 as T1 and T2
resp, the set of rows identified by C in T1
doesnt depend on the serializable order.

33
MPR Transactions

MPR Transaction
A transaction program P1 is said to be MPR w.r.t
P2 if, every select query as well as every
subquery of an insert, delete or update in P1 is
an MPR-Select w.r.t P2.
WHERE clause predicates of every update/delete
statement in P1 are stable w.r.t. P2.
If a transaction program P1 is MPR w.r.t P2, and
if the DBMS uses row-level granularity for the
first-committer-wins check, the edge from P1 to
P2 can not be vulnerable. (Theorem)

34
Eliminating False Positives 2New Identifier
Generation Test

begin
select max(accno)1 as m from account
insert into account(accno, balance, type) values
(m, 0, type)
commit
rset(CA1) account.accno, wset(CA1)
account.
New Identifier Generation Analysis
for assigning new primary key (numeric)
if two transactions read same max value and
create same identifier, SI will not prevent
concurrent execution
but primary key or referential constraint will!
Explicitly check Select-max conflict, outside
snapshot

35
Eliminating False Positives 3Existence Check
Before Insert
begin select accno as found from account where
accnom if(foundnull) insert into
account values (m, 0, type) else print
Error Requested account number is already in
use endif commit

rset(CA1) account.accno, wset(CA1)
account.
Select with given PK ... if not found (Insert
values with same PK)
Select using primary key can not conflict with
Insert of other transaction having same pattern.

36
After Eliminating False Positives

Eliminated False Positives
UCI MPR
DEP MPR
CA1 CA2 NIGA / ECIA

Remaining Syntactic Pseudopivot
37
Tool for Analyzing an application

The automated tool has the following flow of
activities
Find the set of transaction programs.
Extract the syntactic read and write sets.
Create CSDG using conservative syntactic analysis
and detect syntactic pseudopivots.
Eliminate/Reduce false positives.
Select appropriate techniques to avoid anomalies
(currently done manually).
After using the techniques to avoid anomalies,
we can rerun the analysis to check whether they
worked.

38
Experimental Results
TPC-C Mini Bank Acad. Finance
Distinct Transactions 7 7 26 34
Syntactic Pseudopivots detected 4 7 25 34
EFP1 MPR Detected 3 2 11 4
EFP2 New Identifier Generation Protection detected 0 2 3 3
EFP3 Existence check before Insert protection 0 0 2 0
39
Conclusion

Theory of Syntactic Analysis to obtain a superset
of transactions that may cause anomalies.
Studied some general patterns of false positives
and proposed sufficient conditions for
identifying such transactions.
Developed a tool that can automate the testing of
database applications for safety against SI
anomalies
identified some genuine problems in production
code.
Implementation issues discussed in paper.

40
Thank You!
41
References

S. Jorwekar, A. Fekete, K. Ramamritham, S.
Sudarshan in VLDB 2007 Automating the Detection
of Snapshot Isolation Anomalies
A. Fekete, D. Liarokapis, E. O'Neil, P. ONeil,
D. Shasha in TODS2005 Making Snapshot Isolation
Serializable
A. Fekete, E. O'Neil, P. ONeil in SIGMOD 2004
A Read-only Transaction Anomaly under Snapshot
Isolation

Write a Comment

User Comments (0)