Distributed Databases DDBs - PowerPoint PPT Presentation

1 / 51

About This Presentation

Title:

Distributed Databases DDBs

Description:

If shared nothing & horizontally fragmented: Select name, ... If horizontally fragmented. decompose query into subqueries that can be processed in parallel ... – PowerPoint PPT presentation

Number of Views:212

Avg rating:3.0/5.0

Slides: 52

Provided by: Vrb

Category:

more less

Transcript and Presenter's Notes

Title: Distributed Databases DDBs

1
Distributed Databases (DDBs)
2
Distributed Databases

Distributed Systems goal
to offer local DB autonomy at geographically
distributed locations
Multiple CPU's each has DBMS, but data
distributed

3
Advantages of DDBs

Distributed nature of some DB applications (bank
branches)
Increased reliability and availability if site
failure - also replicate data at gt 1 site
Data sharing but also local control
Improved performance - smaller DBs exist at each
site

4
Disadvantages Increased complexity

Additional functions needed
global vs. local queries
access remote sites, transmit queries and data
keep track of data and replication
execution strategies if data at gt 1 site
which copy to access
maintain consistency of copies
recover from site crashes

5
Architectures

Parallel Systems goal to construct a faster
centralized computer
Distributed Systems goal to offer local DB
autonomy at geographically distributed locations

6
Parallel DBSs

Shared-memory multiprocessor
get N times as much work with N CPU's access
MIMD, SIMD - equal access to same data, massively
parallel
Parallel shared nothing
data split among CPUs, each has own CPU, divide
work for transactions, communicate over high
speed networks LANs -
homogeneous machines CPU
memory - called a site

7
Distributed DBSs (DDBS)

Distributed DB - share nothing
lower communication rates
WAN
heterogeneous machines
Homogeneous DDBS
homogeneous same DBMSs
Heterogeneous DDBS
different DBMSs - need ODBC, standard SQL

8
Heterogeneous distributed DBSs HDDBs

Data distributed and each site has own DBMS
ORACLE at one site, DB2 at another, etc.
need ODBC, standard SQL
usually transaction manager responsible for
cooperation among sites
must coordinate distributed transaction
need data conversion and to access data at other
sites

9
Multidatabase

autonomous preexisting DBs form new database
collection of cooperating DBSs that are
heterogeneous
Each DB has its own local users, local
transparency and DBA
very high degree of local autonomy
need additional interface to allow users to
access global data

10
Federated DB

federated DB is a multidatabase that bridges
heterogeneity fully integrated
keeps a partial view of total schema
Each DB specifies import/export schema (view)
appears centralized for local autonomous users
appears distributed for global users

11
DDBS

Issues in DDBS in slides that follow

12
To process a query

Must use data dictionary that includes info on
data distribution among servers
Parse user query
decomposed into independent site queries
each site query sent to appropriate server site
site processes local query, sends result to
result site
result site combines results of subqueries
Ensure atomicity

13
Data fragments

Can distribute a whole relation at a site
or
Data fragments
logical units of the DB assigned for storage at
various sites
horizontal fragmentation - subset of tuples in
the relation (select)
vertical fragmentation - keeps only certain
attributes of relation (project) need a PK

14
Fragments contd

horizontal fragments
complete - set of fragments whose conditions
include every tuple
disjoint - tuples only member of 1 fragment
salary lt 5000 and dno4
Complete vertical fragment L1 U L2 U ...
Ln - attributes of R Li
intersect Lj PK(R)

15
Replication

Full vs. partial replication
Which copy to access
Improves performance for global queries but
updates a problem
Ensure consistency of replicated copies of data

16
Example replication/fragmentation

Example of fragments for company DB site 1
- company headquarters gets entire DB
site 2, 3 horizontal fragments based on
dept. no.

17
Concurrency Control and Recovery in Distributed
Databases

Distributed transactions inherit ACID properties
problems
failure at individual sites and communication
links
distributed deadlock (usually timeout)
distributed commit (use 2PC)
concurrency control with multiple copies of data

18
Distributed Commit

if updating data at different sites for same
transaction, when to commit?
difficult part - commit together so have
atomicity
Observation - 3 states of basic transactions
            active - can enter other 2 states
            committed - cannot leave this state
            aborted - cannot leave this state

19
States

commit
Active
abort

20
Commit problems

To commit a distributed transaction need a
coordinator (originator) site (S1 in our example)
Given transaction T with with components T1 and
T2
where T1 executes at site S1
T2 executes at site S2
a) if S1 commits locally, sends commit message
to S2
BUT if S2 crashes, T2 aborts before completes
subtask not atomic
b) if both T1 and T2 ready to commit S1 sends
message to commit
BUT if T2 crashes during commit, not atomic if
cant recover

21
To solve problem

How to solve these problems?
Need an additional state
prepare state
when T is done and ready to commit, it goes to
the prepare state
in the prepare state, a transaction can go to
the commit or abort state
(place prepare log on log buffer, forced to log
file - can reconstruct the transaction)

22
States

commit
Active Prepare to commit
abort

23
Use Two-Phase Commit (2PC)

Used by commercial DBS to achieve coordinated
commit of distributed transactions
Need a coordinator - usually a site that is
participating in the transaction
Only needed if updates

24
2PC

Phase 1
When each subquery concludes, signal coordinator
If receive done from all, coordinator sends
prepare to commit to all sites
Else coordinate signals abort to all.
Each site writes changes and Prepare to log
file, send ready to commit, site enters
prepared state
if log fails or cannot commit for other reasons,
send cannot commit, else can commit
If dont hear from everyone within a time out
interval assume not OK, send abort to all-
rollback locally

25
2PC

Phase 2
If all prepare requests successful responses,
coordinator sends commit
Sites record Commit in log, update DB
If any site response is unsuccessful, coordinator
sends messages to all sites to abort -
rollback locally
If dont hear from everyone within a time out
interval assume not OK, send abort to all-
rollback locally
subquery concluded ready to commit
(prepared state)
commit record commit

26
Solved by 2PC?

a) if S1 commits locally, sends commit to S2
BUT if S2 crashes, T2 aborts and it is not
atomic
With 2PC
if T1 completes Transaction
T1 is prepared
If T2 crashes
T2 is unsuccessful
T1 can still abort from prepared state

27
Solved by 2PC?

b) If both T1 and T2 ready to commit, S1 sends
message to commit
BUT if T2 crashes before commit completed, not
atomic
With 2PC
if both T1 and T2 successfully complete
Transaction
T2 and T1 are both prepared
If T2 crashes can recover prepared state and
commit

28
Any problems with 2PC?

A blocking protocol (block until commit or
rollback received)
In most practical applications, blocking caused
by 2PC are rare
Therefore, most systems use 2PC
However, when could 2PC block?

29
2PC blocking

However, when could 2PC block?
Problem can occur when site in prepared state
Site voted to commit
Cannot commit until receive commit from
coordinator
Cannot change vote and abort
Participant is blocked until receive message from
coordinator
(or coordinator can block while waiting for
replies from participants what about time out?)
Solution?

30
Three-phase commit

3PC
Requires an extra round of messages
Adds another state pre_commit in between ready
to commit (prepared) state and commit state
When in pre-commit state
can still abort
can contact other sites

31
3PC

Phase 1 the same as 2PC
Coordinator sends vote_req to all sites
Sites respond commit or abort
If one aborts, all abort
Phase 2
If all reply vote_commit, coordinator sends
precommit (prepare_to_commit)
Sites that responded commit, wait
If receive precommit, send ack
Else abort

32
3PC

Phase 3
Coordinator collects all acks
Decides commit
Broadcasts global_commit
Site receiving global_commit, commits
subquery concluded ready to commit
precommit
commit record commit

33
3PC

Eliminates blocking due to coordinator failure
when
Site sent vote_commit, but not received
prepare_to_commit
Site can abort if do not hear from coordinator
Site received prepare_to_commit but not
global_commit
Knows everyone voted to commit, knows it will
eventually commit, can act independently,
communicate with other sites

34
3PC

If in pre-commit state
can elect new coordinator if it fails
New coordinator collects states from sites and
tries to resolve transaction
Otherwise new coordinator tries to establish
quorum

35
Quorum based 3PC establishing quorum

Every site assigned a vote Vi
Total votes V
Abort and commit quorum are Va and Vc
VaVcgt V
Before commit, must obtain commit quorum
At least one site in pre-commit and (sites in
wait sites in pre_commit) form quorum
Before abort, must obtain abort quorum
(Sites in wait pre-abort) for quorum

36
Distributed concurrency control using locking

Centralized vs. distributed solutions
Definition distinguished copy
Locks associated with this copy

37
Primary site technique

Primary site technique
Distinguished copy of all data at this site
All locks kept at this site, all requests sent
here
Site acts as coordinator for all DB items
If write lock, DBMS must update all copies
If read lock from primary site, can access local
copy
disadv
all locking requests sent to same site -
bottleneck
if failure at primary site? must abort and
restart all Ts

38
Primary site with backup

Primary site with backup
all lock information at both sites, so failure
is less of a problem
disadv
slower - 2 sites to communicate with

39
Primary copy technique

Primary copy technique - distribute
responsibility
distinguished copies of different items at
different sites
can also choose a new coordinator if a failure -
election to choose self as new site

40
Voting

Voting - no distinguished copy
lock request sent to all sites with copy of data
if granted lock by majority of copies, hold lock
and inform sites with copies lock is granted
if don't get a majority, timeout and cancel
request sent to all sites
disadv
lots of message traffic

41
Additional Issues in DDBSs

Distributed transaction if data distributed
Transaction manager (TM) knows where data is
stored
TM decides which site starts local components of
transaction
Distributed query processing

42
Query Parallelism

Decompose query into parts that can be executed
in parallel at several sites
Intra query parallelism
If shared nothing horizontally fragmented
Select name, phone from account where age gt 65
Decompose into K different queries
Result site accepts all and puts together (order
by, count)

43
Query Parallelism

What if a join?
Difficult problem if table fragments at different
sites
Must get all values of joint attributes at one
site
Then broadcast to relevant sites value of join
attribute
If site 1 has values 1-10 and site 2 has 11-20,
only set to those sites
Result tuples returned, join performed
Example A XB with tuples A1B1 A2B2
A3B3 tuples A1,A2,A3 sent to S1 S1 sends FK to
B1,B2,B3 sites B1,B2,B3 site send tuples where
PKA1,A2,A3 join performed at S1

44
Distributed Query Processing

If Query (read-only) - no 2PC needed
If horizontally fragmented
decompose query into subqueries that can be
processed in parallel
one site (result site) accepts all results and
puts together
Must also do order by, count

45
Distributed Query Processing

If the data is not fragmented
process whatever operations you can locally
(select, project)
Query can involve joins between data at multiple
sites
Must transfer data over network
Transfer of data highest cost
Algorithms to minimize amount of data transferred
(NP-hard)

46
Distributed Query Processing

Assume R1 X R2 -gt Site 3
with R1 at S1 and R2 at S2
Can do
R1 -gt S3 and R2 -gt S3, do join at S3
R1-gt S2, execute join at S2, result to S3
R2-gt S1, execute join at S1, result to S3

47
Semi-join X

p dname, lname Dept Xmgrssn Emp
Assume Dept is R1 and Emp is R2
1) project join attribute of Emp at S2 resulting
in E', send to S1. E' is (p ssn Emp)
2) E' X Dept at S1. Project result attributes
and join attributes resulting in E'', send to S2
E' is (p mgr, dname (Dept Xmgrssn E'))
3) p dname, lname (E'' Xmgrssn Emp) and send
result to S3

48
Distributed Query Processing