Title: Distributed Databases (DDBs)
1 Distributed Databases (DDBs)
2 Distributed Databases
- Distributed systems goal - to offer local DB autonomy at geographically distributed locations
- Multiple CPUs, each with its own DBMS, but the data is distributed
3 Advantages of DDBs
- Distributed nature of some DB applications (e.g., bank branches)
- Increased reliability and availability under site failure - also replicate data at more than 1 site
- Data sharing, but also local control
- Improved performance - smaller DBs exist at each site
4 Disadvantages - Increased complexity
- Additional functions needed
- global vs. local queries
- access remote sites, transmit queries and data
- keep track of data and replication
- execution strategies if data is at more than 1 site
- which copy to access
- maintain consistency of copies
- recover from site crashes
5 Architectures
- Parallel systems goal - to construct a faster centralized computer
- Distributed systems goal - to offer local DB autonomy at geographically distributed locations
6 Parallel DBSs
- Shared-memory multiprocessor - get N times as much work with N CPUs
- MIMD, SIMD - equal access to the same data, massively parallel
- Shared-nothing parallel - data split among CPUs, each with its own memory, divide the work of transactions, communicate over high-speed networks
               LANs - homogeneous machines
               a CPU plus its memory - called a site
7 Distributed DBSs (DDBSs)
- Distributed DB - shared nothing
- lower communication rates, WANs, heterogeneous machines
- Homogeneous DDBS - the same DBMS at every site
- Heterogeneous DDBS - different DBMSs; need ODBC, standard SQL
8 Heterogeneous distributed
- Data is distributed and each site has its own DBMS
- ORACLE at one site, DB2 at another, etc.
- need ODBC, standard SQL
- usually a transaction manager is responsible for cooperation among sites
- need a TP monitor (software) to tie together heterogeneous DBs - uses RPCs - supplies needed data conversion and access to data at other sites
9 Multidatabase
- Autonomous preexisting DBs form a new database
- a collection of cooperating DBSs that are heterogeneous
- Each DB has its own local users, local transparency, and DBA - a very high degree of local autonomy
- Need an additional interface to allow users to access global data
10 Federated DB
- A federated DB is a multidatabase
- keeps a partial view of the total schema
- Each DB specifies an export schema (view)
- Each DB keeps track of the schemas it can import
- Appears centralized to local autonomous users
- Appears distributed to global users
11 DDBS
- Issues in DDBSs in the slides that follow
12 To process a query
- Must use a data dictionary that includes info on data distribution among servers
- Ensure atomicity
- Parse the user query
- decompose it into independent site queries
- send each site query to the appropriate server site
- each site processes its local query and sends the result to the result site
- the result site combines the results of the subqueries
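The decompose/send/combine flow above can be sketched as a scatter/gather loop. This is a minimal illustration, not a real DBMS component: the data dictionary contents, site names, and tuple layout are all hypothetical.

```python
# Minimal sketch of distributed query processing: decompose a query into
# per-site subqueries, run each one "locally", and combine at the result site.
# The data dictionary, site names, and tuples are hypothetical.

DATA_DICTIONARY = {  # which horizontal fragment lives at which site
    "site2": [{"ssn": 1, "dno": 4, "salary": 4000}],
    "site3": [{"ssn": 2, "dno": 5, "salary": 6000}],
}

def run_site_query(site, predicate):
    """Each site processes its local fragment and returns its result."""
    return [t for t in DATA_DICTIONARY[site] if predicate(t)]

def process_query(predicate):
    """Result site: send the subquery to every site, combine the results."""
    result = []
    for site in DATA_DICTIONARY:          # site queries are independent
        result.extend(run_site_query(site, predicate))
    return result

print(process_query(lambda t: t["salary"] < 5000))
```

In a real system the per-site calls would be network messages executed in parallel, and the result site would also handle global operations such as ORDER BY and COUNT.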
13 Data fragments
- Can store a whole relation at a site
- or
- Data fragments
- logical units of the DB assigned for storage at various sites
- horizontal fragmentation - a subset of the tuples in the relation (select)
- vertical fragmentation - keeps only certain attributes of the relation (project); needs a PK
14 Fragments cont'd
- Horizontal fragments
- complete - a set of fragments whose conditions include every tuple
- disjoint - each tuple is a member of only 1 fragment
       e.g., salary < 5000 and dno = 4
- Complete vertical fragmentation: L1 U L2 U ... U Ln = attributes of R
                       Li intersect Lj = PK(R) for i != j
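The fragmentation rules above can be checked mechanically. The sketch below uses a tiny hypothetical EMP relation (PK = ssn) to build horizontal fragments with the slide's example condition and to verify the vertical-fragmentation conditions.

```python
# Sketch of the correctness rules for fragments, with hypothetical tuples.
# Horizontal fragments are selections: "complete" means every tuple falls in
# some fragment; "disjoint" means no tuple is in two fragments. Vertical
# fragments are projections whose attribute sets cover R and pairwise
# intersect only in the primary key.

EMP = [
    {"ssn": 1, "dno": 4, "salary": 4000},
    {"ssn": 2, "dno": 5, "salary": 6000},
]

# Horizontal fragmentation by the slide's example condition and its negation:
frag1 = [t for t in EMP if t["salary"] < 5000 and t["dno"] == 4]
frag2 = [t for t in EMP if not (t["salary"] < 5000 and t["dno"] == 4)]

complete = all(t in frag1 or t in frag2 for t in EMP)
disjoint = not any(t in frag1 and t in frag2 for t in EMP)

# Vertical fragmentation with PK(R) = {ssn}:
L1, L2 = {"ssn", "dno"}, {"ssn", "salary"}
covers_R = L1 | L2 == {"ssn", "dno", "salary"}   # L1 U L2 = attributes of R
pk_only_overlap = L1 & L2 == {"ssn"}             # Li intersect Lj = PK(R)

print(complete, disjoint, covers_R, pk_only_overlap)
```

Because every vertical fragment carries the PK, the original relation can be reconstructed by joining the fragments on ssn.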
15 Replication
- Full vs. partial replication
- Which copy to access?
- Improves performance for global queries, but updates are a problem
- Must ensure consistency of replicated copies of data
16 Example fragmentation/replication
- Example of fragments for a company DB
     site 1 - company headquarters gets the entire DB
     sites 2, 3 - fragments based on dept. no.
17 Concurrency Control and Recovery in Distributed Databases
- Distributed transactions inherit the ACID properties
- Problems:
- failure at individual sites and communication links
- distributed deadlock (usually handled by timeout)
- distributed commit (use 2PC)
- concurrency control with multiple copies of data
18 Distributed Commit
- If updating data at different sites for the same transaction, when to commit?
- difficult part - commit together so we have atomicity
- Observation - 3 states of basic transactions
           active - can enter either of the other 2 states
           committed - cannot leave this state
           aborted - cannot leave this state
19 Commit problems
- To commit a distributed transaction, need a coordinator (originator) site (S1 in our example)
- Given transaction T with components T1 and T2
- where T1 executes at site S1
- T2 executes at site S2
- a) if S1 commits locally, it sends commit to S2
- BUT if S2 crashes, T2 aborts before completing its subtask - the transaction is not atomic
- b) if both T1 and T2 are ready to commit, S1 sends a message to commit
- BUT if T2 crashes during commit, not atomic if it can't recover
20 To solve the problem
- How to solve this problem? Need an additional state - the prepared state
- when T is done and ready to commit, it goes to the prepared state
- in the prepared state, a transaction can go to either the commit or the abort state
- (place a prepare record in the log buffer, forced to the log file - the transaction can then be reconstructed)
21 Use Two-Phase Commit (2PC)
- Used by commercial DBSs to achieve coordinated commit of distributed transactions
- Need a coordinator - usually a site that is participating in the transaction
22 2PC
- Phase 1
- When each subquery concludes, it signals the coordinator
- The coordinator sends messages to all sites asking them to prepare their site transactions
- Each site writes its changes to the log file and sends "ready to commit"
- if the log write fails, send "cannot commit"; else "can commit"
- If any site's response is "unsuccessful", the coordinator sends messages to all sites to abort
- roll back locally
23 2PC
- Phase 2
- If all prepare requests received successful responses, the coordinator sends commit
- Sites record the commit in the log and update the DB
- subquery concluded -> prepare for commit -> ready to commit
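The two phases can be condensed into a small decision function. This is a sketch of the message logic only, under the simplifying assumption that a site's vote is determined by whether its log write succeeds; real coordinators also handle timeouts and logging of the decision itself.

```python
# Minimal sketch of the 2PC decision described above. Participants and
# their "log write succeeded" flags are hypothetical stand-ins.

def two_phase_commit(participants):
    """participants: dict of site -> True if the site's prepare (forced log
    write) succeeds. Returns the decision applied at every site."""
    # Phase 1: coordinator asks every site to prepare; each site forces its
    # changes to the log and votes "ready to commit" or "cannot commit".
    votes = dict(participants)
    # Phase 2: all ready -> global commit; any failure -> global abort.
    decision = "commit" if all(votes.values()) else "abort"
    # Coordinator broadcasts the decision; each site records and applies it.
    return {site: decision for site in participants}

print(two_phase_commit({"S1": True, "S2": True}))
print(two_phase_commit({"S1": True, "S2": False}))
```

A single "cannot commit" vote forces every site to roll back locally, which is exactly what gives the distributed transaction atomicity.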
24 Solved by 2PC?
- a) if S1 commits locally, it sends commit to S2
- BUT if S2 crashes, T2 aborts and it is not atomic
- With 2PC:
- if T1 completes the transaction
- T1 is prepared
- If T2 crashes
- T2 is unsuccessful
- T1 can still abort from the prepared state
25 Solved by 2PC?
- b) if both T1 and T2 are ready to commit, S1 sends a message to commit
- BUT if T2 crashes before the commit completes, not atomic
- With 2PC:
- if both T1 and T2 successfully complete the transaction
- T1 and T2 are both prepared
- If T2 crashes, it can recover to the prepared state and commit
26 Any problems with 2PC?
- Blocking seems to be the problem
- In most practical applications, the blocking caused by 2PC is rare
- Therefore, most systems use 2PC
- However, when could 2PC block?
- When the coordinator site fails
27 3PC
- However, when could 2PC block?
- The coordinator site fails
- Some participant has declared ready to commit
- This is a problem if that site is holding a lock
- It must wait (blocked) until the coordinator recovers (or a new one is elected)
- 3PC was proposed for this problem
- 3PC
- requires an extra round of messages
- adds another state: pre_commit
28 3PC
- Phase 1 is the same as in 2PC
- The coordinator sends vote_req to all sites
- Sites respond vote_commit or vote_abort
- If one aborts, all abort
- Phase 2
- If all reply vote_commit, the coordinator sends precommit (prepare_to_commit)
- Sites that responded vote_commit wait
- If they receive precommit, they send an ack
- Else abort
29 3PC
- Phase 3
- The coordinator collects all acks
- Decides commit
- Broadcasts global_commit
- A site receiving global_commit commits
30 3PC
- Eliminates blocking due to coordinator failure when
- a site has sent vote_commit but not received prepare_to_commit
- a site has received prepare_to_commit but not global_commit
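The reason the extra pre_commit state removes these two blocking cases can be shown as a decision rule. This is a deliberately simplified sketch (it ignores elections and network partitions): a site stuck in "wait" can safely abort because no one can have committed before all precommit acks were collected, and a site in "pre_commit" knows every participant voted commit.

```python
# Simplified sketch of 3PC termination after a coordinator failure.
# State names follow the slides; the rules are a simplification that
# assumes no network partition (a partition would require the quorum
# rules on the next slides).

def recovery_decision(state):
    """Decision a site can safely reach after the coordinator fails."""
    if state == "wait":        # sent vote_commit, no precommit received:
        return "abort"         # no ack was sent, so no one has committed
    if state == "pre_commit":  # precommit received, no global_commit yet:
        return "commit"        # all sites are known to have voted commit
    if state == "committed":
        return "commit"        # cannot leave the committed state
    return "abort"             # aborted, or never voted

print([recovery_decision(s) for s in ("wait", "pre_commit", "committed")])
```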
31 Quorum-based 3PC
- So what does adding the additional pre_commit state do?
- If in the pre_commit state, can elect a new coordinator if it fails
- The new coordinator collects states from the sites and tries to resolve the transaction
- Otherwise, the new coordinator tries to establish a quorum
32 Quorum-based 3PC - establishing a quorum
- Every site is assigned a vote Vi
- Total votes V
- Abort and commit quorums are Va and Vc
- Va + Vc > V
- Before commit, must obtain a commit quorum
- at least one site in pre_commit, and (sites in wait + sites in pre_commit) form the quorum
- Before abort, must obtain an abort quorum
- (sites in wait + sites in pre_abort) form the quorum
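The condition Va + Vc > V is what makes the scheme safe: any set of sites large enough to abort must share at least one site with any set large enough to commit, so the two decisions can never both form. A tiny check, with hypothetical vote weights:

```python
# Sketch of the quorum condition Va + Vc > V. Vote weights are hypothetical.

def quorums_safe(votes, Va, Vc):
    """True if the chosen abort/commit quorums must always intersect."""
    V = sum(votes.values())   # total votes across all sites
    return Va + Vc > V        # overlapping quorums -> no split decision

votes = {"S1": 1, "S2": 1, "S3": 1}          # V = 3
print(quorums_safe(votes, Va=2, Vc=2))       # 2 + 2 > 3: safe
print(quorums_safe(votes, Va=1, Vc=2))       # 1 + 2 = 3: unsafe
```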
33 Distributed concurrency control using locking
- Centralized vs. distributed solutions
34 Primary site technique
- Primary site technique
- all distinguished copies are at this site
- the site acts as coordinator for all DB items
- Read lock from the primary site, then access the local copy
- if write lock, the DBMS must update all copies
- disadvantages
- all locking requests are sent to the same site - bottleneck
- if failure at the primary site? must abort and restart all transactions
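The primary site's role can be pictured as a single shared-read/exclusive-write lock table. This is a sketch of that idea only; the class name, API, and non-blocking "return False" behavior are hypothetical (a real lock manager would queue the request).

```python
# Sketch of the primary site technique: one site holds all lock state and
# coordinates every lock request. The API is hypothetical; "False" stands
# in for "request blocks/queues".

class PrimarySite:
    def __init__(self):
        self.locks = {}                    # item -> ("read", count) or ("write",)

    def read_lock(self, item):
        """Shared lock: grant unless a writer holds the item."""
        mode = self.locks.get(item)
        if mode is None:
            self.locks[item] = ("read", 1)
            return True
        if mode[0] == "read":
            self.locks[item] = ("read", mode[1] + 1)
            return True
        return False                       # write lock held

    def write_lock(self, item):
        """Exclusive lock: any existing lock blocks a writer."""
        if item in self.locks:
            return False
        self.locks[item] = ("write",)      # holder must then update ALL copies
        return True

primary = PrimarySite()
print(primary.read_lock("X"), primary.read_lock("X"), primary.write_lock("X"))
```

Every site in the system calls into this one table, which is exactly the bottleneck (and single point of failure) the slide lists as the disadvantage.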
35 Primary site with backup
- Primary site with backup
- all lock information is at both sites, so failure is less of a problem
- disadvantage
- slower - 2 sites to communicate with
36 Primary copy technique
- Primary copy technique - distributes responsibility
- distinguished copies of different items are at different sites
- (can also choose a new coordinator after a failure - an election in which a site chooses itself as the new site)
37 Voting
- Voting - no distinguished copy
- a lock request is sent to all sites with a copy of the data
- if granted the lock by a majority of the copies, hold the lock and inform the sites with copies that the lock is granted
- if you don't get a majority, time out and send a cancel request to all sites
- disadvantage
- lots of message traffic
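The majority rule can be sketched with one function. The per-site lock tables below are hypothetical stand-ins for the messages a real system would exchange (which is why the method generates so much traffic).

```python
# Sketch of the voting method: the lock request goes to every site holding
# a copy; the lock is held only if a majority of copies grant it.
# Per-site lock tables simulate each site's grant decision.

def request_lock(item, site_tables):
    """site_tables: one lock table (dict) per copy-holding site."""
    grants = [t for t in site_tables if item not in t]   # sites that grant
    if len(grants) > len(site_tables) / 2:               # strict majority
        for t in grants:
            t[item] = "locked"   # inform granting sites the lock is held
        return True
    return False                 # no majority: time out, cancel the request

tables = [{}, {}, {"X": "locked"}]   # 3 copies; one site already locked X
print(request_lock("X", tables))     # 2 of 3 grant -> lock held
```

Because any two majorities of the copies intersect, two conflicting transactions can never both hold the lock.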
38 Additional Issues in DDBSs
- Distributed transaction if the data is distributed
- The transaction manager (TM) knows where the data is stored
- The TM decides which site starts the local components of the transaction
- Distributed query processing
39 Query Processing
- Query parallelism
- If the query is read-only - no 2PC needed
- if shared nothing and horizontally fragmented
- decompose the query into subqueries that can be processed in parallel
- one site (the result site) accepts all the results and puts them together
- Must also do order by, count
40 Distributed Query Processing
- If the data is not fragmented
- process whatever operations you can locally (select, project)
- A query can involve joins between data at multiple sites
- Must transfer data over the network
- Transfer of data is the highest cost
- Algorithms to minimize the amount of data transferred (NP-hard)
41 Distributed Query Processing
- Assume R1 join R2 -> Site 3
- with R1 at S1 and R2 at S2
- Can do
- R1 -> S3 and R2 -> S3, do the join at S3
- R1 -> S2, execute the join at S2, send the result to S3
- R2 -> S1, execute the join at S1, send the result to S3
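Under the assumption that cost is proportional to bytes shipped, the three strategies can be compared directly. The relation sizes below are hypothetical placeholders (they match the example a few slides later):

```python
# Sketch comparing the three transfer strategies for R1 join R2 with the
# result needed at S3. Sizes in bytes are hypothetical; transfer cost is
# assumed proportional to bytes shipped.

def strategy_costs(size_r1, size_r2, size_result):
    return {
        "ship R1 and R2 to S3": size_r1 + size_r2,        # join at S3
        "R1 -> S2, result -> S3": size_r1 + size_result,  # join at S2
        "R2 -> S1, result -> S3": size_r2 + size_result,  # join at S1
    }

costs = strategy_costs(size_r1=1_000_000, size_r2=250_000, size_result=50_000)
best = min(costs, key=costs.get)
print(costs, best)
```

With these sizes, shipping the smaller relation R2 to S1 and sending only the join result onward is the cheapest of the three whole-relation plans.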
42 Semi-join
- π dname, lname (Dept ⋈ mgrssn=ssn Emp)
- Assume Dept is R1 (at S1) and Emp is R2 (at S2)
- 1) project the join attribute of Emp at S2, resulting in E'; send it to S1. E' is (π ssn Emp)
- 2) E' ⋈ Dept at S1. Project the result attributes and join attributes, resulting in E''; send it to S2
- E'' is (π mgrssn, dname (Dept ⋈ mgrssn=ssn E'))
- 3) π dname, lname (E'' ⋈ mgrssn=ssn Emp) and send the result to S3
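The three steps can be traced on tiny hypothetical tables. Only the projected columns cross the network at each step, which is the whole point of the semi-join.

```python
# Sketch of the semi-join steps above with tiny hypothetical tables:
# ship only the join attribute of Emp to the Dept site, join there, and
# ship back just the attributes needed to finish the join.

DEPT = [{"dname": "HR",  "mgrssn": 1},      # R1, at S1
        {"dname": "Eng", "mgrssn": 2}]
EMP  = [{"ssn": 1, "lname": "Smith"},       # R2, at S2
        {"ssn": 3, "lname": "Jones"}]

# 1) E' = project the join attribute of Emp at S2; send E' to S1
e1 = [{"ssn": t["ssn"]} for t in EMP]

# 2) E'' = join E' with Dept at S1; keep result + join attributes; send to S2
e2 = [{"mgrssn": d["mgrssn"], "dname": d["dname"]}
      for d in DEPT if any(t["ssn"] == d["mgrssn"] for t in e1)]

# 3) finish the join with Emp at S2; the final result goes to S3
result = [{"dname": d["dname"], "lname": e["lname"]}
          for d in e2 for e in EMP if e["ssn"] == d["mgrssn"]]
print(result)
```

Here only the ssn list and the one matching (mgrssn, dname) pair travel between sites, instead of either full relation.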
43 Distributed Query Processing
- Hevner & Yao
- Cost
- time to compute results
- time to transmit data
- Local data requires only processing time
- Assume transmission time is the most costly
- Goal is to minimize transmission time
44 Distributed Query Processing
- Assume you have R3 = R1 ⋈ att=att R2
- R1 is at S1, R2 at S2, and the result relation R3 is sent to result site S
- Amount of data to transmit
- Assume R1 is 1M bytes
- R2 is 250K bytes
- R3 is 50K bytes
- Using semi-join
- π att R1 is 100K
- π att R2 is 50K
- π att (R1 ⋈ R2) is 10K
- π att (R2 ⋈ R1) is 25K
45 Distributed Query Processing
- Show all the possibilities for the previous problem and determine the best distributed query processing strategy