Distributed databases - PowerPoint PPT Presentation

Slides: 31

Provided by: maria111


Title: Distributed databases

1
Distributed databases
3
2
Outline

generalities
objectives
problems

3
1
4
Introduction
DBMS in its own right
5
Introduction

distributed database: a collection of connected sites
each site is a DB in its own right (1)
has its own DBMS and its own users
operations can be performed locally as if the DB were not distributed
the sites collaborate (transparently from the user's point of view)
the union of all DBs = the DB of the whole organisation (institution)
(as opposed to (1))
physical or logical distribution
strict homogeneity (assumption)

6
Motivation

advantages
matches the structure of the organisation
example
efficiency of processing
data is stored close to where it is used
increased accessibility
remote DBs can be accessed
disadvantage
complexity

7
Implementations (systems)

commercial
ORACLE (Oracle Corporation)
INGRES/STAR (Ask Group Inc. Ingres Division)
DB2 (IBM)
all of them provide some support for distributed databases

8
Fundamental principle

a distributed DB system should look to the user
exactly like a non-distributed DB system

9
2
10
Objectives
local autonomy
no reliance on a central site
location independence
fragmentation independence
replication independence
distributed query processing
distributed transaction management
11
Objectives are

not independent from each other
not exhaustive
sometimes contradicting
different degrees of importance (for the user)

12
Local autonomy

all operations at a certain site are fully
controlled by that site
not achievable (why?)
therefore, autonomy should be achieved to the
maximum extent possible
local data is locally owned and managed
local data belongs to the local server even if it
is accessible from other servers
security, integrity, ..., are the responsibility
of the local server

13
No reliance on a central site

reasons
bottleneck
vulnerability
conclusion
all sites must be equal

14
Location independence

users should not have to know where data is
physically stored
why do you think this is needed?
think of application programs
what does this objective look like?

15
Data fragmentation

data fragmentation: a relation can be divided into fragments for storage purposes
motivation: performance - data is stored where it is most used
definition
fragment: any subrelation derivable via restriction (horizontal fragment) or projection (vertical fragment)
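A minimal sketch of a projection (vertical) fragment, assuming relations are modelled as Python lists of dicts; `project` and `natural_join` are hypothetical helper names, not part of any DBMS API, and the sample employees are invented. Each fragment keeps the key `Emp_id` (an assumption of this sketch), which is what allows the original relation to be rebuilt by joining the fragments.

```python
Row = dict[str, object]
Relation = list[Row]


def project(rel: Relation, attrs: list[str]) -> Relation:
    """Vertical fragment: keep only the given attributes, removing duplicates."""
    seen: set[tuple] = set()
    out: Relation = []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: row[a] for a in attrs})
    return out


def natural_join(r: Relation, s: Relation) -> Relation:
    """Join two relations on every attribute they share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]


# Hypothetical sample relation.
emp: Relation = [
    {"Emp_id": 1, "Name": "Ann", "Dept_id": "Sales", "Salary": 30000},
    {"Emp_id": 2, "Name": "Bob", "Dept_id": "Dev", "Salary": 35000},
]

# Two vertical fragments; both keep the key Emp_id,
# so joining them on Emp_id reconstructs Emp without loss.
emp_public = project(emp, ["Emp_id", "Name", "Dept_id"])
emp_pay = project(emp, ["Emp_id", "Salary"])
rebuilt = natural_join(emp_public, emp_pay)
```

If the key were dropped from one of the fragments, the join would have no common attribute to match on and would pair every row with every other, so the reconstruction would no longer be lossless.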

16
Data fragmentation - example
FRAGMENT Emp INTO
    Lo_Emp AT SITE London WHERE Dept_id = 'Sales',
    Le_Emp AT SITE Leeds WHERE Dept_id = 'Dev' ;
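The FRAGMENT statement can be mimicked in a few lines of Python, as a sketch assuming each site is simply a dict of named relations; the sample employees and the `sites`/`fragments` structures are invented for illustration. Each fragment is a restriction of Emp, and because the two predicates are disjoint and (in this sample) cover every row, Emp is recovered as the UNION of the fragments.

```python
# Hypothetical sample data: each row of Emp is a dict.
emp = [
    {"Emp_id": 1, "Name": "Ann", "Dept_id": "Sales"},
    {"Emp_id": 2, "Name": "Bob", "Dept_id": "Dev"},
    {"Emp_id": 3, "Name": "Cy", "Dept_id": "Sales"},
]

# Fragment definitions: name -> (site, restriction predicate),
# mirroring FRAGMENT ... AT SITE ... WHERE ...
fragments = {
    "Lo_Emp": ("London", lambda r: r["Dept_id"] == "Sales"),
    "Le_Emp": ("Leeds", lambda r: r["Dept_id"] == "Dev"),
}

# Store each fragment at its site (a site is modelled as a dict of relations).
sites: dict[str, dict[str, list[dict]]] = {}
for name, (site, pred) in fragments.items():
    sites.setdefault(site, {})[name] = [r for r in emp if pred(r)]


def reconstruct() -> list[dict]:
    """Emp = Lo_Emp UNION Le_Emp.

    The fragments are disjoint here, so concatenating them
    is enough; no duplicate elimination is needed.
    """
    rows = []
    for name, (site, _) in fragments.items():
        rows.extend(sites[site][name])
    return sorted(rows, key=lambda r: r["Emp_id"])
```

After running it, `sites["London"]["Lo_Emp"]` holds employees 1 and 3, `sites["Leeds"]["Le_Emp"]` holds employee 2, and `reconstruct()` returns the original three rows.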
17
Fragmentation independence / transparency

users should perceive data as if it were not
fragmented
why?
it is the optimiser's responsibility to determine
which fragments need to be physically accessed
similar to views
retrieving
updating (JOIN and UNION views)
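To illustrate what "the optimiser determines which fragments need to be physically accessed" can mean, here is a toy sketch. It assumes each horizontal fragment of Emp is described in the catalogue by the set of Dept_id values it holds (as with Lo_Emp and Le_Emp); `FRAGMENT_DEPTS` and `fragments_to_access` are hypothetical names. A query restricted to one department is sent only to the fragments that can contain matching rows, while the user still writes the query against Emp.

```python
from typing import Optional

# Hypothetical catalogue entry: fragment -> (site, Dept_id values it holds).
FRAGMENT_DEPTS: dict[str, tuple[str, frozenset[str]]] = {
    "Lo_Emp": ("London", frozenset({"Sales"})),
    "Le_Emp": ("Leeds", frozenset({"Dev"})),
}


def fragments_to_access(dept: Optional[str]) -> list[tuple[str, str]]:
    """Return the (fragment, site) pairs a query on Emp must touch.

    dept is the value in a "WHERE Dept_id = <dept>" condition,
    or None if the query has no such condition (all fragments are needed).
    """
    return [
        (name, site)
        for name, (site, depts) in FRAGMENT_DEPTS.items()
        if dept is None or dept in depts
    ]
```

For example, `fragments_to_access("Dev")` yields only `("Le_Emp", "Leeds")`, so London is never contacted; an unrestricted query touches both fragments.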

18
Data replication

copies of the same fragment can exist at
different sites
reasons
better availability
better performance
disadvantage
update propagation

19
Replication independence / transparency

users should not have to be aware of data
replication
it is the optimiser's responsibility to choose
which replica to use
commercial systems
do not fully support replication independence
(update problems) - primary copy approach
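A toy model of the primary copy approach, as a sketch only: one copy of the fragment is designated primary, every update is applied there first and then propagated to the other copies, and reads may use any copy (here, the local one if the reading site has one). The `ReplicatedFragment` class, the site names and the synchronous propagation loop are illustrative assumptions; if propagation is deferred instead, replicas can briefly disagree with the primary.

```python
class ReplicatedFragment:
    """One fragment stored at several sites, updated through a primary copy."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # Every site, the primary included, holds its own copy: key -> value.
        self.copies: dict[str, dict[str, int]] = {
            site: {} for site in [primary, *replicas]
        }

    def update(self, key: str, value: int) -> None:
        # 1. Apply the update at the primary copy.
        self.copies[self.primary][key] = value
        # 2. Propagate it to every other copy (synchronously, for simplicity).
        for site, copy in self.copies.items():
            if site != self.primary:
                copy[key] = value

    def read(self, key: str, at_site: str) -> int:
        # Replication independence: the caller never names a replica.
        # Use the local copy if there is one, otherwise the primary.
        site = at_site if at_site in self.copies else self.primary
        return self.copies[site][key]
```

Usage: after `frag = ReplicatedFragment("London", ["Leeds", "York"])` and `frag.update("x", 5)`, a read issued at Leeds is served from the Leeds copy, while a read from a site holding no copy falls back to London.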

20
Distributed query processing

the system must have set-level operators
one record at a time - too many messages (network traffic)
relational (set-at-a-time) systems - recommended
optimisation
particularly relevant!
find the best way to move data across the network
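The message-traffic argument can be made concrete with a back-of-the-envelope count. This sketch assumes each remote request and each reply is one message, that a set-level request returns all matching records in a single reply, and a latency of 0.1 s per message; all three figures are illustrative assumptions, not measurements.

```python
def record_at_a_time_messages(n_records: int) -> int:
    """One request plus one reply for every record fetched."""
    return 2 * n_records


def set_level_messages(n_records: int) -> int:
    """One request naming the whole set plus one reply carrying it.

    Assumes the result fits in a single reply, so n_records does not
    affect the count in this simplified model.
    """
    return 2


def latency_cost(messages: int, latency_s: float = 0.1) -> float:
    """Seconds spent on per-message latency alone (assumed 0.1 s each)."""
    return messages * latency_s
```

For 1000 matching records the record-at-a-time approach needs 2000 messages (about 200 s of latency under these assumptions) against 2 messages (about 0.2 s) for a set-level operator.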

21
3
22
Problems

aim: minimise network utilisation

problems arise due to network utilisation, in:
query processing
catalogue management
update propagation
recovery control
concurrency control
23
Query processing

in a distributed environment
query execution is distributed
query optimisation is distributed
global optimisation
local optimisation
example
query on relation R issued at site X
part of R, say Ry, stored at Y
part of R, say Rz, stored at Z
where is the query going to be executed?
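One simple way to approach the question, sketched under the assumption that the only cost that matters is the amount of data shipped across the network: executing the whole query at a site requires shipping every fragment not stored there, plus shipping the result back to X if that site is not X. The sizes are invented for illustration, and real optimisers also consider other strategies (for example, evaluating part of the query at Y and Z and shipping only partial results).

```python
# Hypothetical sizes (in KB) for the example: R = Ry UNION Rz.
FRAGMENT_SITE = {"Ry": "Y", "Rz": "Z"}
FRAGMENT_SIZE_KB = {"Ry": 800, "Rz": 200}
RESULT_SIZE_KB = 50   # assumed size of the final answer
ORIGIN = "X"          # site where the query was issued


def shipping_cost(exec_site: str) -> int:
    """KB moved over the network if the whole query runs at exec_site."""
    cost = sum(
        size
        for frag, size in FRAGMENT_SIZE_KB.items()
        if FRAGMENT_SITE[frag] != exec_site
    )
    if exec_site != ORIGIN:
        cost += RESULT_SIZE_KB  # ship the answer back to the user at X
    return cost


best = min(["X", "Y", "Z"], key=shipping_cost)
```

With these numbers the costs are 1000 KB at X, 250 KB at Y and 850 KB at Z, so the sketch picks Y: it is cheaper to move the small fragment to the large one than to move both fragments to the user.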

24
Catalogue management

what additional data does the catalogue include?
fragmentation, replication ...
where should the catalogue be stored?
centralised
fully replicated
loss of autonomy - update propagation!
partitioned
non-local operations - very expensive!
combination of first and third
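A sketch of the extra information a distributed catalogue has to hold, assuming a simple in-memory layout: for each relation, its fragments; for each fragment, its defining condition, its primary site and any replica sites. The names `FragmentEntry`, `CATALOGUE` and `sites_for` are hypothetical. A lookup of this kind is what lets the system offer location independence: the user names `Emp` and the catalogue supplies the sites. Where this structure itself lives (one central copy, a copy at every site, or each site keeping only the entries for its own data) is exactly the storage question listed above.

```python
from dataclasses import dataclass, field


@dataclass
class FragmentEntry:
    name: str
    definition: str  # e.g. the restriction that defines the fragment
    primary_site: str
    replica_sites: list[str] = field(default_factory=list)


# relation name -> its fragments (fragmentation + replication information)
CATALOGUE: dict[str, list[FragmentEntry]] = {
    "Emp": [
        FragmentEntry("Lo_Emp", "Dept_id = 'Sales'", "London", ["Leeds"]),
        FragmentEntry("Le_Emp", "Dept_id = 'Dev'", "Leeds"),
    ],
}


def sites_for(relation: str) -> dict[str, list[str]]:
    """Map each fragment of a relation to every site holding a copy of it."""
    return {
        f.name: [f.primary_site, *f.replica_sites]
        for f in CATALOGUE[relation]
    }
```

Here `sites_for("Emp")` reports that Lo_Emp is held at London (primary) and Leeds, and Le_Emp only at Leeds.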

25
Central Catalogue

all updates, including local updates, have to be
recorded in the central catalogue
disadvantages
bottleneck
conflicts with the "no reliance on a central
site" objective

26
Fully Replicated Catalogue