Distributed databases - PowerPoint PPT Presentation

Slides: 31
Provided by: maria111

1
Distributed databases
3
2
Outline
  • generalities
  • objectives
  • problems

3
1
4
Introduction
DBMS in its own right
5
Introduction
  • distributed database: a collection of connected
    sites
  • each site is a DB in its own right (1)
  • has its own DBMS and its own users
  • operations can be performed locally as if the DB
    were not distributed
  • the sites collaborate (transparently from the
    users' point of view)
  • the union of all DBs is the DB of the whole
    organisation (institution)
  • (as opposed to (1))
  • physical or logical distribution
  • strict homogeneity (assumption)

6
Motivation
  • advantages
  • matches the structure of the organisation
  • example
  • efficiency of processing
  • stored closely to where it is being used
  • increased accessibility
  • remote DBs can be accessed
  • disadvantage
  • complexity

7
Implementations (systems)
  • commercial
  • ORACLE (Oracle Corporation)
  • INGRES/STAR (Ask Group Inc. Ingres Division)
  • DB2 (IBM)
  • they all provide some sort of features for
    distributed databases

8
Fundamental principle
  • a distributed DB system should look to the user
    exactly as a non-distributed DB system

9
2
10
Objectives
  • local autonomy
  • no reliance on a central site
  • location independence
  • fragmentation independence
  • replication independence
  • distributed query processing
  • distributed transaction management

11
Objectives are
  • not independent from each other
  • not exhaustive
  • sometimes contradictory
  • of differing importance (for the user)

12
Local autonomy
  • all operations at a certain site are fully
    controlled by that site
  • not achievable (why?)
  • therefore, autonomy should be achieved to the
    maximum extent possible
  • local data is locally owned and managed
  • local data belongs to the local server even if it
    is accessible from other servers
  • security, integrity, ..., are the
    responsibility of the local server

13
No reliance on a central site
  • reasons
  • bottleneck
  • vulnerability
  • conclusion
  • all sites must be equal

14
Location independence
  • users should not have to know where data is
    physically stored
  • why do you think this is needed?
  • think of application programs
  • what does this objective look like?

15
Data fragmentation
  • data fragmentation
  • a relation can be divided into fragments for
    storage purposes
  • motivation: performance - data is stored where it
    is most often used
  • definition
  • fragment: any subrelation derivable via
    restriction or projection

16
Data fragmentation - example
FRAGMENT Emp INTO
    Lo_Emp AT SITE London WHERE Dept_id = 'Sales',
    Le_Emp AT SITE Leeds WHERE Dept_id = 'Dev'
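
The fragmentation example above can be mimicked in a few lines of Python. This is only a sketch, not the syntax of any real DBMS: the sample rows, the `restrict` and `reconstruct` helpers and the `salary` attribute are assumptions made for illustration. It shows horizontal fragments being derived by restriction and the original relation being recovered as their union.

```python
# Hypothetical sample rows; attribute names loosely follow the slide's example.
emp = [
    {"emp_id": 1, "dept_id": "Sales", "salary": 40},
    {"emp_id": 2, "dept_id": "Dev", "salary": 45},
    {"emp_id": 3, "dept_id": "Sales", "salary": 50},
]


def restrict(relation, predicate):
    """Horizontal fragment: the rows of the relation that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One fragment per site, each defined by a restriction on dept_id.
fragments = {
    "London": restrict(emp, lambda r: r["dept_id"] == "Sales"),
    "Leeds": restrict(emp, lambda r: r["dept_id"] == "Dev"),
}


def reconstruct(fragments):
    """Rebuild the relation as the union of its horizontal fragments."""
    rows = [row for frag in fragments.values() for row in frag]
    return sorted(rows, key=lambda r: r["emp_id"])


rebuilt = reconstruct(fragments)
```

Because the two restrictions are disjoint and together cover every row, the union loses nothing; fragments derived by projection would instead be recombined with a join on the key.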
17
Fragmentation independence / transparency
  • users should perceive data as if it were not
    fragmented
  • why?
  • it is the optimiser's responsibility to determine
    which fragments need to be physically accessed
  • similar to views
  • retrieving
  • updating (JOIN and UNION views)

18
Data replication
  • copies of the same fragment can exist at
    different sites
  • reasons
  • better availability
  • better performance
  • disadvantage
  • update propagation

19
Replication independence / transparency
  • users should not have to be aware of data
    replication
  • it is the optimiser's responsibility to choose
    which replica to use
  • commercial systems
  • not full support for replication independence
    (update problems) - primary copy

20
Distributed query processing
  • the system must have set-level operators
  • one record at a time - too many messages
    (traffic)
  • a relational system is therefore indicated
  • optimisation
  • particularly relevant!
  • find best way to move data across the network

21
3
22
Problems
  • aim
  • minimise network utilisation
  • problems occur
  • due to network utilisation

  • query processing
  • catalogue management
  • update propagation
  • recovery control
  • concurrency control

23
Query processing
  • in a distributed environment
  • query execution is distributed
  • query optimisation is distributed
  • global optimisation
  • local optimisation
  • example
  • query on relation R issued at site X
  • part of R, say Ry, stored at Y
  • part of R, say Rz, stored at Z
  • where is the query going to be executed?

24
Catalogue management
  • what other data does the catalogue include?
  • fragmentation, replication ...
  • where should the catalogue be stored?
  • centralised
  • fully replicated
  • loss of autonomy - update propagation!
  • partitioned
  • non local operations - very expensive!
  • combination of the first and third options
    (centralised and partitioned)

25
Central Catalogue
  • all updates, including local updates, have to be
    recorded in the central catalogue
  • disadvantages
  • bottleneck
  • conflicts with the "no reliance on a central
    site" objective

26
Fully Replicated Catalogue
  • the entire database catalogue (not only the local
    one) is stored at each site
  • every time an update is made, it has to be
    recorded at each site
  • disadvantages
  • loss of local autonomy
  • updates consume time and network traffic

27
Update propagation
  • problems because of replication
  • data might become less available
  • primary copy scheme
  • one copy is designated primary copy (unique)
  • primary copies exist at different sites
    (distributed)
  • an update is logically complete if the primary
    copy has been updated
  • the site holding the primary copy would have to
    propagate the updates
  • violation of local autonomy

28
Concurrency control
  • locking
  • overhead - increased number of messages
  • primary copy strategy
  • locking only the primary copy
  • the primary copy's site will propagate the update
  • loss of autonomy (severely)
  • global deadlock
  • two interlocked (waiting for each other) sites
  • cannot be detected using any single site's local
    wait-for graph - therefore, communication overhead

30
Conclusion
  • generalities
  • objectives in brief
  • problems in brief