Title: Distributed databases
Outline
- generalities
- objectives
- problems
Part 1: Generalities
Introduction
[Diagram: connected sites, each running a DBMS in its own right]
Introduction (continued)
- distributed database = a collection of connected sites
  - each site is a DB in its own right (1)
    - it has its own DBMS and its own users
    - operations can be performed locally as if the DB were not distributed
  - the sites collaborate (transparently from the users' point of view)
  - the union of all the DBs = the DB of the whole organisation (institution), as opposed to (1)
- the distribution can be physical or logical
- strict homogeneity (assumption): all sites run the same DBMS
Motivation
- advantages
  - matches the structure of the organisation
    - example
  - efficiency of processing
    - data is stored close to where it is used
  - increased accessibility
    - remote DBs can be accessed
- disadvantage
  - complexity
Implementations (systems)
- commercial
  - ORACLE (Oracle Corporation)
  - INGRES/STAR (Ask Group Inc., Ingres Division)
  - DB2 (IBM)
- they all provide some features for distributed databases
Fundamental principle
- to the user, a distributed DB system should look exactly like a non-distributed DB system
Part 2: Objectives
Objectives
- local autonomy
- no reliance on a central site
- location independence
- fragmentation independence
- replication independence
- distributed query processing
- distributed transaction management
The objectives are
- not independent of each other
- not exhaustive
- sometimes contradictory
- of differing importance (for the user)
Local autonomy
- all operations at a given site are fully controlled by that site
  - not fully achievable (why?)
  - therefore, autonomy should be achieved to the maximum extent possible
- local data is locally owned and managed
  - local data belongs to the local server even if it is accessible from other servers
  - security, integrity, ... are the responsibility of the local server
No reliance on a central site
- reasons
  - a central site would be a bottleneck
  - vulnerability: if the central site fails, the whole system fails
- conclusion
  - all sites must be equal
Location independence
- users should not have to know where data is physically stored
  - why do you think this is needed?
  - think of application programs
- what does this objective look like?
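One way to picture this objective is to have application code name only a relation and let the system resolve the site through a catalogue. The sketch below is a minimal illustration under that assumption. `CATALOGUE`, `fetch` and the in-memory `sites` dictionary are hypothetical stand-ins and do not belong to any real DBMS. Relocating `Emp` changes the data and the catalogue entry, while the call in the application stays the same.

```python
# Hypothetical global catalogue: relation name -> site that stores it.
CATALOGUE = {"Emp": "London", "Dept": "Leeds"}

def fetch(relation, sites):
    """Application code names only the relation; the catalogue supplies the site."""
    site = CATALOGUE[relation]
    return sites[site][relation]

# Hypothetical in-memory stand-in for the data held at each site.
sites = {"London": {"Emp": ["Ann"]}, "Leeds": {"Dept": ["Sales"]}}
before = fetch("Emp", sites)

# Relocate Emp to Leeds: only the stored data and the catalogue entry change.
sites["Leeds"]["Emp"] = sites["London"].pop("Emp")
CATALOGUE["Emp"] = "Leeds"
after = fetch("Emp", sites)   # same call as before the move
```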
Data fragmentation
- data fragmentation
  - a system supports data fragmentation if a relation can be divided into fragments for storage purposes
  - motivation: performance (data is stored where it is most used)
- definition
  - fragment = any subrelation derivable via restriction or projection
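The definition can be made concrete with a small sketch. It assumes a hypothetical representation in which a relation is a list of Python dicts. Restriction gives horizontal fragments, which are rebuilt with UNION. Projection gives vertical fragments, which are rebuilt with a JOIN. The rebuild is only lossless because every vertical fragment keeps the key `Emp_id`. The attribute names and sample rows are illustrative.

```python
from typing import Callable

Row = dict[str, object]   # hypothetical tuple representation

def restrict(relation: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Horizontal fragment: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation: list[Row], attributes: list[str]) -> list[Row]:
    """Vertical fragment: keep only the named attributes, dropping duplicate tuples."""
    result: list[Row] = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

def join_on(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Join on a single key attribute, used to rebuild vertical fragments."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

emp = [
    {"Emp_id": 1, "Name": "Ann", "Dept_id": "Sales", "Salary": 30000},
    {"Emp_id": 2, "Name": "Bob", "Dept_id": "Dev", "Salary": 35000},
]

# Horizontal fragmentation by restriction; the two predicates cover every tuple.
sales = restrict(emp, lambda r: r["Dept_id"] == "Sales")
rest = restrict(emp, lambda r: r["Dept_id"] != "Sales")

# Vertical fragmentation by projection; both fragments keep the key Emp_id.
names = project(emp, ["Emp_id", "Name", "Dept_id"])
pay = project(emp, ["Emp_id", "Salary"])
```

In this data, `sales + rest` (the UNION) and `join_on(names, pay, "Emp_id")` (the JOIN) each give back the original `emp`.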
Data fragmentation - example
FRAGMENT Emp INTO
  Lo_Emp AT SITE London WHERE Dept_id = 'Sales',
  Le_Emp AT SITE Leeds WHERE Dept_id = 'Dev'
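The sketch below is a hypothetical in-memory model of the FRAGMENT statement on this slide. It decides which fragment and site should store each Emp tuple. The `FRAGMENTS` table and the `place` function are illustrative names only. The sketch also brings out one point: a tuple whose Dept_id is neither 'Sales' nor 'Dev' matches no fragment. For the fragmentation to be complete, the predicates must therefore cover every possible Dept_id value.

```python
# Hypothetical model of: FRAGMENT Emp INTO Lo_Emp AT SITE London ..., Le_Emp AT SITE Leeds ...
FRAGMENTS = {
    "Lo_Emp": ("London", lambda row: row["Dept_id"] == "Sales"),
    "Le_Emp": ("Leeds", lambda row: row["Dept_id"] == "Dev"),
}

def place(row):
    """Return the (fragment, site) pairs whose predicate the tuple satisfies."""
    return [(name, site) for name, (site, pred) in FRAGMENTS.items() if pred(row)]
```

For example, `place({"Emp_id": 1, "Dept_id": "Sales"})` yields `[("Lo_Emp", "London")]`, while a tuple with `Dept_id` `"HR"` yields an empty list.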
Fragmentation independence / transparency
- users should perceive data as if it were not fragmented
  - why?
- it is the optimiser's responsibility to determine which fragments need to be physically accessed
- similar to views: the user sees the UNION (horizontal fragments) or the JOIN (vertical fragments) of the fragments
  - retrieving
  - updating (JOIN and UNION views)
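The optimiser's task of finding which fragments must actually be read can be sketched as fragment pruning. The sketch assumes a hypothetical catalogue that records, for each horizontal fragment of Emp, the Dept_id values it holds and the site that stores it. A query restricted on Dept_id then needs only the matching fragments. A query without such a restriction needs all of them, combined with UNION. `EMP_FRAGMENTS` and `fragments_to_access` are illustrative names.

```python
# Hypothetical fragment catalogue for the Lo_Emp / Le_Emp example.
EMP_FRAGMENTS = {
    "Lo_Emp": {"site": "London", "dept_ids": {"Sales"}},
    "Le_Emp": {"site": "Leeds", "dept_ids": {"Dev"}},
}

def fragments_to_access(dept_id=None):
    """Fragments that can contain tuples matching a Dept_id restriction.

    dept_id=None means the query does not restrict Dept_id, so every
    fragment must be read and the partial results combined with UNION.
    """
    return sorted(
        name for name, info in EMP_FRAGMENTS.items()
        if dept_id is None or dept_id in info["dept_ids"]
    )
```

The user queries Emp in both cases and never names Lo_Emp or Le_Emp, which is the point of fragmentation independence.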
Data replication
- copies of the same fragment can exist at different sites
- reasons
  - better availability
  - better performance
- disadvantage
  - update propagation: every update must reach all copies
Replication independence / transparency
- users should not have to be aware of data replication
- it is the optimiser's responsibility to choose which replica to use
- commercial systems
  - do not fully support replication independence (update problems)
  - use a primary copy scheme
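The optimiser's choice of replica can be sketched as follows. The sketch assumes it picks the cheapest copy among those currently available, using an access cost from the querying site. That rule is a simplification chosen for illustration; a real optimiser may weigh other factors. `REPLICAS`, `choose_replica` and the cost figures are all hypothetical.

```python
# Hypothetical replica map: fragment -> sites holding a copy.
REPLICAS = {"Lo_Emp": ["London", "Leeds", "York"]}

def choose_replica(fragment, cost, available):
    """Pick the cheapest available site holding a copy of the fragment."""
    candidates = [s for s in REPLICAS[fragment] if s in available]
    if not candidates:
        raise RuntimeError(f"no available replica of {fragment}")
    return min(candidates, key=lambda s: cost[s])

# Hypothetical access costs as seen from a query issued at Leeds.
cost_from_leeds = {"London": 10, "Leeds": 0, "York": 5}
first = choose_replica("Lo_Emp", cost_from_leeds, {"London", "Leeds", "York"})
fallback = choose_replica("Lo_Emp", cost_from_leeds, {"London", "York"})
```

Here `first` is the local copy at Leeds. If Leeds is unavailable, `fallback` switches to York. This illustrates the availability benefit of replication without the user ever naming a copy.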
Distributed query processing
- the system must have set-level operators
  - one record at a time means too many messages (network traffic)
  - relational operators, which work on whole sets, are well suited
- optimisation
  - particularly relevant!
  - find the best way to move data across the network
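A rough message count shows why record-at-a-time access is costly across a network. The model below is an assumption for illustration only. Record-at-a-time access costs one request and one reply per tuple, plus a final request that is answered with "no more tuples". A set-level query costs one request, and the result is shipped back in messages that each hold up to a hypothetical `tuples_per_message` tuples.

```python
import math

def record_at_a_time(n_tuples: int) -> int:
    """One request/reply pair per tuple, plus a final pair signalling the end."""
    return 2 * (n_tuples + 1)

def set_level(n_tuples: int, tuples_per_message: int = 100) -> int:
    """One request carrying the whole query; the result comes back in as
    few messages as the assumed message capacity allows (at least one)."""
    return 1 + max(1, math.ceil(n_tuples / tuples_per_message))
```

Under these assumptions, fetching 1000 tuples takes 2002 messages record by record, but only 11 with a single set-level request.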
Part 3: Problems
Problems
- aim
  - minimise network utilisation
- problems occur
  - due to network utilisation
- areas affected
  - query processing
  - catalogue management
  - update propagation
  - recovery control
  - concurrency control
Query processing
- in a distributed environment
  - query execution is distributed
  - query optimisation is distributed
    - global optimisation
    - local optimisation
- example
  - query on relation R issued at site X
  - part of R, say Ry, is stored at Y
  - part of R, say Rz, is stored at Z
  - where is the query going to be executed?
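One simple way to answer "where?" is to count the data that has to be moved. The sketch assumes the whole query runs at one site. Every fragment not stored at that site is shipped to it, and the result is then shipped back to X if necessary. Sizes are in arbitrary units and chosen only for illustration. The sketch deliberately ignores strategies such as restricting Ry and Rz locally first, which a real optimiser would also consider.

```python
def shipping_cost(site, sizes, result_size, origin="X"):
    """Data moved if the whole query runs at `site`: every fragment not
    stored there is shipped to it, then the result goes back to `origin`."""
    moved = sum(size for s, size in sizes.items() if s != site)
    if site != origin:
        moved += result_size
    return moved

def best_site(sizes, result_size, origin="X"):
    """Candidate sites are the origin plus every site holding a fragment."""
    candidates = sorted(set(sizes) | {origin})
    return min(candidates, key=lambda s: shipping_cost(s, sizes, result_size, origin))

# Hypothetical sizes: Ry is large, Rz is small.
sizes = {"Y": 1_000_000, "Z": 10_000}
```

With a small result (500 units), `best_site(sizes, 500)` chooses Y, because only Rz and the result move. With a very large result (5,000,000 units), running at X becomes cheapest.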
Catalogue management
- what other data does the catalogue include?
  - fragmentation, replication, ...
- where should the catalogue be stored?
  - centralised
  - fully replicated
    - loss of autonomy; update propagation!
  - partitioned (each site keeps its own catalogue)
    - non-local operations are very expensive!
  - combination of the first and third (local catalogues plus a central copy)
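The four storage options can be compared with a deliberately crude message count. This is a hypothetical model for illustration, not a measurement. Every remote contact is counted as one request plus one reply. The partitioned case assumes the worst, where every other site has to be asked. All counts are seen from an ordinary, non-central site in a network of `n_sites` sites.

```python
def catalogue_messages(strategy: str, n_sites: int) -> tuple[int, int, int]:
    """Rough message counts under a hypothetical model:
    (look up a local object, look up a remote object, record a catalogue change)."""
    others = n_sites - 1
    if strategy == "centralised":
        return 2, 2, 2                 # every lookup and change goes to the central site
    if strategy == "fully replicated":
        return 0, 0, 2 * others        # lookups are local; a change must reach every copy
    if strategy == "partitioned":
        return 0, 2 * others, 0        # worst case: ask every other site for a remote object
    if strategy == "local + central":
        return 0, 2, 2                 # local lookups stay local; remote ones ask the centre
    raise ValueError(f"unknown strategy: {strategy}")
```

With 10 sites, the fully replicated catalogue spends 18 messages per change, and the partitioned one up to 18 per remote lookup. This mirrors the "update propagation!" and "very expensive!" remarks on the slide.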
Central catalogue
- all updates, including local updates, have to be recorded in the central catalogue
- disadvantages
  - bottleneck
  - conflicts with the "no reliance on a central site" objective
Fully replicated catalogue
- the entire database catalogue (not only the local one) is stored at each site
- every update has to be recorded at each site
- disadvantages
  - loss of local autonomy
  - updates consume time and network traffic
Update propagation
- problems caused by replication
  - data might become less available: an update cannot complete while a site holding a copy is unavailable
- primary copy scheme
  - one copy of each replicated object is designated its primary copy (unique)
  - the primary copies of different objects exist at different sites (distributed)
  - an update is logically complete once the primary copy has been updated
  - the site holding the primary copy then has to propagate the update to the other copies
  - violation of local autonomy
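A minimal sketch of the primary copy scheme, assuming a single replicated value and a queue of pending updates kept by the primary site. `ReplicatedItem`, `update` and `propagate` are hypothetical names. The update is logically complete as soon as the primary copy changes. Until the primary site propagates, a read at another site still returns the old value.

```python
from dataclasses import dataclass, field

@dataclass
class ReplicatedItem:
    primary_site: str
    sites: dict[str, object]            # site -> value currently stored there
    pending: list[tuple[str, object]] = field(default_factory=list)

    def update(self, value):
        """Logically complete once the primary copy has been changed."""
        self.sites[self.primary_site] = value
        for site in self.sites:
            if site != self.primary_site:
                self.pending.append((site, value))   # queued for later propagation

    def propagate(self):
        """The primary site pushes the queued updates to the other copies."""
        for site, value in self.pending:
            self.sites[site] = value
        self.pending.clear()

item = ReplicatedItem(primary_site="London", sites={"London": 100, "Leeds": 100, "York": 100})
item.update(120)
stale_read = item.sites["Leeds"]   # still 100: propagation has not run yet
item.propagate()
```

The delay between `update` and `propagate` is the window in which copies disagree. The fact that the primary site pushes changes into other sites' data is where the loss of local autonomy comes from.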
Concurrency control
- locking
  - overhead: increased number of messages
- primary copy strategy
  - only the primary copy is locked
  - the primary copy's site propagates the update
  - (severe) loss of autonomy
- global deadlock
  - two sites interlocked (each waiting for the other)
  - cannot be detected using the local wait-for graph of either site alone; detection requires exchanging wait-for information between sites, hence communication overhead
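The sketch below shows why a global deadlock escapes local detection. Transaction T1 waits for T2 at site A, and T2 waits for T1 at site B. Neither local wait-for graph contains a cycle, but their union does. The graphs are plain dicts (edge T -> U meaning "T waits for U"), and the example transactions are hypothetical. The cycle test is an ordinary depth-first search. In a real system, `merge` corresponds to shipping the local graphs over the network, which is the communication overhead mentioned on the slide.

```python
def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Depth-first search for a cycle in a wait-for graph (edge T -> U: T waits for U)."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour: dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, set()):
            state = colour.get(nxt, WHITE)
            if state == GREY:             # back edge: cycle found
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))

def merge(*graphs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union of local wait-for graphs (requires exchanging them between sites)."""
    merged: dict[str, set[str]] = {}
    for g in graphs:
        for node, targets in g.items():
            merged.setdefault(node, set()).update(targets)
    return merged

site_a = {"T1": {"T2"}}   # at site A, T1 waits for a lock held by T2
site_b = {"T2": {"T1"}}   # at site B, T2 waits for a lock held by T1
```

`has_cycle(site_a)` and `has_cycle(site_b)` are both false. `has_cycle(merge(site_a, site_b))` is true.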
Conclusion
- generalities
- objectives in brief
- problems in brief