DATABASE SYSTEMS - 10p Course No. 2AD235 Spring 2002 - PowerPoint PPT Presentation

1
DATABASE SYSTEMS - 10p, Course No. 2AD235, Spring 2002
  • A second course on development of database
    systems
  • Kjell Orsborn, Uppsala Database Laboratory,
    Department of Information Technology,
    Uppsala University, Uppsala, Sweden

2
Introduction to Distributed DBMSs (Elmasri/Navathe
ch. 24)
  • Distributed DBMS (ch. 24.4 and 24.5 are omitted)
  • Kjell Orsborn, Uppsala Database Laboratory,
    Department of Information Technology,
    Uppsala University, Uppsala, Sweden

3
Distributed DBMSs
  • A distributed database (DDB) is a collection of
    several logically interrelated databases
    distributed over a computer network including a
    number of computers (nodes).
  • A distributed database management system (DDBMS)
    is a software system that permits management of
    DDBs and that makes the distribution transparent
    for the user.
  • A DDB is not:
      • a collection of files (needs structure and a DB
        manager)
      • a client-server interface to a database
          • data on one node, clients on other nodes in
            the network
          • (almost) every centralized DBMS has a
            client-server interface

4
Background
  • What is a Distributed System?
  • A Distributed System is a number of autonomous
    computers communicating over a network with
    software for integrated tasks.
  • Examples of Distributed Systems
  • Sun's Network File System (NFS), a distributed
    file system

5
Distributed DBMSs . . .
[Figure: two diagrams, each showing Nodes 1 to 5 connected by a
communication network: (a) a distributed database over several
nodes in a network; (b) a centralized database in a network.]

6
Centralized Database Server
  • Stream (row-by-row) based client-server
    interfaces
      • DBMS-specific interfaces
      • Compiler-integrated interfaces (embedded SQL)
      • ODBC: SQL-based standardized subroutine call
        library (Microsoft)
      • JDBC: ODBC for Java (not Microsoft)
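
The row-by-row style of such call-level interfaces can be sketched with Python's standard sqlite3 module. This is only an illustration: SQLite is an embedded engine rather than a database server, and the employee table and its rows are made up for the example. The cursor pattern (send an SQL string, then consume the result one row at a time) is the same one that ODBC- and JDBC-style libraries expose.

```python
import sqlite3

# In-memory database so the sketch is self-contained; a real client would
# connect to a remote server through that server's driver instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", "R&D"), ("Bo", "Sales"), ("Cy", "R&D")],
)

# The query is passed as an SQL string; the result comes back as a stream
# that the client consumes one row at a time through a cursor.
cursor = conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("R&D",)
)
names = []
for (name,) in cursor:
    names.append(name)

conn.close()
print(names)
```

The client never sees the whole table, only the rows that the query streams back through the cursor.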

7
Distributed Databases
  • Database seen as one unit: queries and updates
    go to ONE database.
  • Data in database transparently distributed over
    many DB nodes.
  • Manual partitioning or fragmentation of data
    tables.
  • DBMS automatically optimizes queries and updates
    to distributed database.

8
Multi-Databases
  • Database seen as several heterogeneous units
  • Multi-database query language needed to combine
    data from the databases.
  • Primitives needed to integrate (combine, fuse)
    data from the databases.
  • Special query optimization techniques to deal
    with heterogeneity and dynamism.

9
Example of Multi-Database
  • Automatic Teller Machines, ATMs

10
Fragmentation of data
  • data fragmentation (data partitioning)
      • division of data sets (e.g. a relation) into
        several pieces (fragments) transparently stored
        on several different nodes
      • increased accessibility and performance
  • several types of fragmentation
      • horizontal fragmentation
      • vertical fragmentation
      • mixed fragmentation
  • good when nodes are far apart
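
A minimal sketch of horizontal and vertical fragmentation, assuming a relation represented as a list of Python dicts. The employee relation, its attributes, and the helper names are invented for illustration. Horizontal fragments are rebuilt by union, vertical fragments by a join on the key.

```python
# Hypothetical relation: each row is a dict keyed by attribute name.
employee = [
    {"id": 1, "name": "Ann", "city": "Uppsala", "salary": 100},
    {"id": 2, "name": "Bo", "city": "Lund", "salary": 90},
    {"id": 3, "name": "Cy", "city": "Uppsala", "salary": 80},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the fragment's defining predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, attributes, key="id"):
    """Keep only some columns; the key is retained so rows can be rejoined."""
    return [{a: row[a] for a in (key, *attributes)} for row in rows]


# Horizontal fragmentation: rows split by city.
uppsala = horizontal_fragment(employee, lambda r: r["city"] == "Uppsala")
elsewhere = horizontal_fragment(employee, lambda r: r["city"] != "Uppsala")

# Vertical fragmentation: columns split into two fragments sharing the key.
public = vertical_fragment(employee, ["name", "city"])
payroll = vertical_fragment(employee, ["salary"])

# Reconstruction: union of the horizontal fragments ...
rebuilt_h = sorted(uppsala + elsewhere, key=lambda r: r["id"])

# ... and a join on the key for the vertical fragments.
payroll_by_id = {row["id"]: row for row in payroll}
rebuilt_v = [{**row, **payroll_by_id[row["id"]]} for row in public]
```

Mixed fragmentation would apply one helper to the output of the other, for example a vertical split of the Uppsala fragment.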

11
Replication of data
  • copies of the same data on several nodes
  • increased reliability and access performance
  • more complex updating, transaction handling, and
    recovery
      • updates must be propagated to each replica!
      • special procedures after failures to restore
        consistency
      • more problematic transaction synchronization!
  • types of replication
      • full replication (whole db at each node)
      • no replication (each fragment only at one node)
      • partial replication (certain fragments
        replicated)
  • not necessary to replicate all tables
  • full replication often not realistic!
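
The trade-off between read availability and update complexity can be sketched with a read-one/write-all policy. This policy is an assumption chosen for illustration; the slides do not prescribe a particular replica control protocol, and the class and node names are hypothetical.

```python
class ReplicatedItem:
    """One logical data item stored as copies on several nodes (hypothetical)."""

    def __init__(self, nodes, value):
        self.copies = {node: value for node in nodes}
        self.up = set(nodes)  # nodes currently reachable

    def fail(self, node):
        self.up.discard(node)

    def recover(self, node):
        # Under write-all no update can happen while a node is down,
        # so the recovering copy is still current in this simplified model.
        self.up.add(node)

    def read(self):
        # Read-one: any reachable replica can answer.
        if not self.up:
            raise RuntimeError("no replica available")
        return self.copies[min(self.up)]

    def write(self, value):
        # Write-all: the update must reach every replica, so a single
        # unreachable node blocks the update.
        if self.up != set(self.copies):
            raise RuntimeError("replica unavailable; update rejected")
        for node in self.copies:
            self.copies[node] = value
```

With three replicas, a read still succeeds after one node fails, while a write is rejected until that node is back. Protocols that allow writes during failures need the extra recovery procedures mentioned above.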

12
Transparency in a DDBMS
  • By transparency we here mean the hiding of basic
    implementation details from one abstraction level
    to another.
  • Data independence
      • logical data independence
      • physical data independence
  • Network transparency
      • protects the user from operational details of
        the network
      • hides the existence of a network
      • no machine names in database table references
      • location transparency
      • naming transparency
  • Replication transparency
      • user should not experience data replicas
      • automatic handling of updates, such as replica
        propagation
      • automatic handling of node crashes
  • Fragmentation transparency
      • hides the existence of fragments
      • e.g. that a logical relation is horizontally
        fragmented into local physical tables
      • handling of transformation of global queries to
        fragmented queries
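
Fragmentation transparency can be sketched as a catalog lookup: the user names only the logical relation, and the system decides which fragments to query. The catalog layout, node names, and data below are assumptions made for the example, not the structure of any particular DDBMS.

```python
# Hypothetical catalog: the logical relation "employee" is horizontally
# fragmented by city, one fragment per node.
CATALOG = {
    "employee": [
        {"node": "node1", "city": "Uppsala",
         "rows": [{"name": "Ann", "city": "Uppsala"},
                  {"name": "Cy", "city": "Uppsala"}]},
        {"node": "node2", "city": "Lund",
         "rows": [{"name": "Bo", "city": "Lund"}]},
    ]
}


def query_employees(city=None):
    """Global query on the logical relation; returns (rows, nodes visited)."""
    rows, visited = [], []
    for fragment in CATALOG["employee"]:
        # Skip fragments whose defining predicate cannot match the query.
        if city is not None and fragment["city"] != city:
            continue
        visited.append(fragment["node"])
        rows.extend(fragment["rows"])
    return rows, visited


all_rows, all_nodes = query_employees()
lund_rows, lund_nodes = query_employees(city="Lund")
```

The caller never mentions node1 or node2. A query restricted to Lund touches only the node that stores the Lund fragment.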

13
Advantages of Distributed DBMSs . . .
  • Data sharing
      • uniform interface and sharing of data through the
        DDBMS
      • natural to distribute certain database
        applications
  • Increased reliability
      • redundancy increases security and accessibility
      • crashes less severe (if the application is not
        dependent on non-local data)
  • Local independence
      • allows sharing of data but keeps local control of
        data
  • Improved performance
      • avoids unnecessary data transfer
  • Expandability
      • easy to add new nodes (not always linear scale-up
        due to central directory)
  • Local autonomy
      • local control
      • local policies

14
Problems with Distributed DBMSs . . .
  • Complexity
      • database administration becomes more complex
        (such as recovery)
      • increased complexity of system design,
        implementation and maintenance
  • Security
      • keeping a network secure is harder
      • networking is a known problem
  • Distributed administration
      • less control and more meetings
  • Cost
      • hardware, software, development/maintenance

15
Problems with Distributed DBMSs . . .
  • Distributed schema management
      • schema is accessed whenever an SQL query is
        issued!
      • global directory -> central database becomes a
        hot spot
      • local directories -> data replication
      • -> since the schema is not updated often but
        needs to be accessed very often, it is normally
        fully replicated by the DDBMS
  • Distributed concurrency control
      • consistency of replicas: mutual consistency
  • Distributed deadlock management
  • Reliability of DDBMS
      • consistency of replicas
      • bring up (fragmented) database at failed sites
  • OS support
      • multiple layers of network software
16
Additional functionality required by DDBMS
  • Access of physically divided databases: schema
    management
  • Handling of distribution and replication of data
      • for example, which copy of the data should be
        used
  • Handling of consistency of replicated data
  • Handling of distributed queries
  • Handling of distributed transactions (over
    several network nodes)
  • Handling of recovery/restart from crashes (of
    nodes) and new types of errors such as
    communication errors/failures.
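
A transaction that spans several nodes must end the same way on all of them. One widely used technique for this is two-phase commit. It is not named on the slides, so the sketch below is an illustrative assumption, with hypothetical class and node names and no handling of timeouts or coordinator crashes.

```python
class Participant:
    """A node taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the work can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed on every node."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: send the same decision to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


ok_nodes = [Participant("n1"), Participant("n2")]
committed = two_phase_commit(ok_nodes)

mixed_nodes = [Participant("n1"), Participant("n2", can_commit=False)]
aborted = not two_phase_commit(mixed_nodes)
```

A single "no" vote aborts the transaction everywhere, so no node is left with a partial result.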

17
Distributed database design
  • Goal
      • to minimize the combined cost of maintaining
        data while achieving efficient communication and
        good performance for transactions.
  • Problems
      • where (on which node/nodes) shall data and
        applications be placed
          • partitioning of data (split data into
            distributed partitions)
          • replication of data (copies of data on
            several nodes)
          • NP-complete optimization problem.
      • distributed query processing
          • automatically done by the distributed query
            processor of the DDBMS
          • analyze query --> distributed execution plan
          • factors
              • data replication
              • data availability
              • communication costs
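
The placement problem can be sketched as an exhaustive search over all assignments of fragments to nodes. The cost model (a fixed cost per remote access, zero for local access, no replication) and all numbers are assumptions made for illustration. The search visits len(NODES) ** len(FRAGMENTS) candidates, which is why real designs rely on heuristics.

```python
from itertools import product

NODES = ["n1", "n2"]
FRAGMENTS = ["f1", "f2"]

# Hypothetical workload: ACCESS[(node, fragment)] is the number of
# accesses to that fragment issued at that node.
ACCESS = {
    ("n1", "f1"): 10, ("n2", "f1"): 2,
    ("n1", "f2"): 1, ("n2", "f2"): 8,
}
REMOTE_COST = 5  # assumed cost of one remote access; local access costs 0


def placement_cost(placement):
    """Total communication cost when each fragment lives on one node."""
    return sum(
        count * REMOTE_COST
        for (node, fragment), count in ACCESS.items()
        if placement[fragment] != node
    )


def best_placement():
    """Try every assignment of fragments to nodes and keep the cheapest."""
    candidates = (
        dict(zip(FRAGMENTS, choice))
        for choice in product(NODES, repeat=len(FRAGMENTS))
    )
    return min(candidates, key=placement_cost)


best = best_placement()
best_cost = placement_cost(best)
```

For this workload the cheapest placement puts each fragment on the node that accesses it most often. Allowing replication would add the cost of update propagation to the model.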